Database outputs #5

Merged
merged 7 commits into from
Dec 4, 2024

Conversation

hannahker
Collaborator

Porting over code from https://github.com/OCHA-DAP/ds-floodexposure-monitoring-app that should live as a pipeline instead. This is a quick-and-dirty copy-and-paste exercise -- there are still a number of things we'll need to improve to make this our production pipeline.

@hannahker hannahker changed the base branch from main to exposure-pipeline November 20, 2024 20:14
@hannahker hannahker requested a review from t-downing November 20, 2024 20:14
Collaborator

@t-downing t-downing left a comment


Nice! In addition to the point about storing the CODAB locally, I think we just need to add a line to the workflow .yml to run pipelines/update_database.py, so this runs each day as well.

Then, as discussed, we can do a separate PR to bypass the intermediate .parquet storage and just write straight to the DB. But we can merge this one first to get the DB write happening automatically each day, because I noticed it's getting out of date in the app.
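The workflow change described above could look roughly like the sketch below. This is an assumption-laden excerpt: the workflow name, schedule, and setup steps are hypothetical, and only the final step (running pipelines/update_database.py daily) comes from this conversation.

```yaml
# Hypothetical excerpt of the daily workflow .yml; only the last step
# (running pipelines/update_database.py) is what this PR proposes to add.
name: daily-update
on:
  schedule:
    - cron: "0 6 * * *"  # assumed daily schedule
jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      # New step: write the flood exposure outputs to the database each day
      - run: python pipelines/update_database.py
```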

@hannahker
Collaborator Author

@t-downing I've addressed the database issues here and repopulated the flood_exposure database.

> Then, as discussed, we can do a separate PR to bypass the intermediate .parquet storage and just write straight to the DB. But we can merge this one first to get the DB write happening automatically each day, because I noticed it's getting out of date in the app.

I think we'll want to work on the migration from parquet and the daily updates together. Rerunning this update_database.py script as-is will rewrite data from all dates to the database, which takes some time and is quite redundant. It'll take a bit of refactoring to process the parquet files to only pull in the latest data, so at that point we may as well just refactor to get rid of the parquet files entirely. I'd vote for addressing those both in a separate PR once this one is merged.
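The incremental-update idea above (pulling in only the latest data rather than rewriting all dates) could be sketched as follows. This is a minimal illustration, not the repo's actual code: the function name, the `date` column, and the schema are assumptions, since the real logic lives in update_database.py.

```python
from typing import Optional

import pandas as pd


def filter_new_rows(
    parquet_df: pd.DataFrame, latest_in_db: Optional[pd.Timestamp]
) -> pd.DataFrame:
    """Keep only rows newer than what is already stored in the database.

    `parquet_df` is assumed to have a `date` column; `latest_in_db` is the
    max date currently in the table (None if the table is empty).
    """
    if latest_in_db is None:
        # Empty table: the full historical rewrite is unavoidable once.
        return parquet_df
    # Subsequent daily runs only append rows past the stored high-water mark.
    return parquet_df[parquet_df["date"] > latest_in_db]
```

In practice the high-water mark would come from a `SELECT MAX(date)` against the table, so each daily run appends a handful of rows instead of rewriting every date.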

@t-downing t-downing linked an issue Dec 2, 2024 that may be closed by this pull request
Collaborator

@t-downing t-downing left a comment


We can merge this now as an intermediate step to switching to the new database table

@hannahker hannahker merged commit 11f497b into exposure-pipeline Dec 4, 2024
@hannahker hannahker deleted the database-outputs branch December 4, 2024 21:09
Development

Successfully merging this pull request may close these issues:

sudden drop in aggregated values