Data: Start using WHO as main source #2792
scripts/src/cowidev/cmd/check.py
@@ -13,14 +13,21 @@
 VAX_URL = "https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/vaccinations.csv"
 TESTING_URL = "https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/testing/covid-testing-all-observations.csv"
 HOSP_URL = "https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/hospitalizations/covid-hospitalizations.csv"
-FULL_URL = "https://covid.ourworldindata.org/data/owid-covid-data.csv"
+FULL_URL_CSV = "https://covid.ourworldindata.org/data/owid-covid-data.csv"
Sorry to bother. I am not sure whether this specific file represents the finalized dataset mirroring the WHO data, but I thought I should point out what looks like a large discrepancy in the total number of cases in China, if I am reading this correctly. Here is what I have graphed for 2020 through 27 February 2023:
This ends at a cumulative total of 2,023,904 cases from 22 January 2020 to 27 February 2023.
However, the WHO dashboard (https://covid19.who.int/region/wpro/country/cn) shows 99,030,129 confirmed cases between 3 January 2020 and 6:06pm CET, 28 February 2023.
That is a discrepancy of 97,006,225 cases, or roughly a 49-fold difference in total cases.
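The arithmetic above can be checked with a short Python sketch. The two totals are the figures quoted in this thread, not values fetched live. The helper `latest_total_cases` is a hypothetical name (not part of cowidev) showing one way to pull the latest cumulative figure for a location out of a CSV shaped like owid-covid-data.csv, assuming its `location`, `date` and `total_cases` columns.

```python
import csv
import io
from typing import Optional, Tuple

# Figures quoted in this thread; nothing is fetched live.
JHU_CHINA_TOTAL = 2_023_904   # owid-covid-data.csv (JHU-based), 22 Jan 2020 to 27 Feb 2023
WHO_CHINA_TOTAL = 99_030_129  # WHO dashboard, 3 Jan 2020 to 28 Feb 2023


def latest_total_cases(csv_text: str, location: str) -> Optional[int]:
    """Return total_cases on the latest date that has a value for `location`.

    Assumes `location`, `date` (YYYY-MM-DD) and `total_cases` columns.
    Returns None if the location has no non-empty total_cases value.
    """
    latest_date, latest_total = "", None
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["location"] != location or not row["total_cases"]:
            continue  # other countries, or dates with no reported total
        if row["date"] > latest_date:  # ISO dates sort correctly as strings
            # Totals may be written as floats (e.g. "2023904.0"), so parse via float.
            latest_date, latest_total = row["date"], int(float(row["total_cases"]))
    return latest_total


def compare_totals(smaller: int, larger: int) -> Tuple[int, float]:
    """Absolute gap and ratio between two cumulative case totals."""
    return larger - smaller, larger / smaller


diff, ratio = compare_totals(JHU_CHINA_TOTAL, WHO_CHINA_TOTAL)
print(f"difference: {diff:,}")  # difference: 97,006,225
print(f"ratio: {ratio:.1f}x")   # ratio: 48.9x
```

To check the real file, one would pass its text (downloaded from the owid-covid-data.csv URL in the diff) to `latest_total_cases(text, "China")` and feed the result into `compare_totals`.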
I am also not sure whether this affects the following file, which, from what I can tell in the README, appears to be an R tracking filter:
scripts/src/cowidev/megafile/steps/core.py
file_url="https://github.com/crondonm/TrackingR/raw/main/Estimates-Database/database_7.csv"
Full disclosure (and again, sorry to bother; I hope this is helpful): there is a prediction contest run by the University of Texas at Austin that I am taking part in, and a large number of its participants are currently watching this repo. Here is the discussion, in case you are interested: https://salemcenter.manifold.markets/SalemCenter/china-reaches-100000-covid-cases-by
Hi @pwdel
Thanks for getting in touch. I just scrolled through the latest comments on Manifold and can see how complicated this situation is. We won't offer an opinion on how the market should be resolved, but in case it helps, here is some more information.
The data published by Johns Hopkins University (JHU) currently shows 2,023,904 confirmed cases in China.
The data published by the WHO currently shows a total of 99,030,129 confirmed cases in China.
The reasons for the discrepancy between the two sources are explained by the JHU team here.
Note that we (Our World in Data) do not collect data on confirmed cases ourselves; we've always relied on third-party sources for this data.
The file you mentioned (https://covid.ourworldindata.org/data/owid-covid-data.csv) is our primary COVID dataset, aggregating data from multiple sources. This file still relies on JHU data for confirmed cases and will do so until 8 March.
On 8 March, we'll merge our pull request to start relying on WHO data instead, and the entire time series (all the way back to early 2020) will be updated in this file.
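The switch described above, where the whole series is replaced rather than only new dates being appended, can be sketched with plain dictionaries. This is not how cowidev implements it: the function `swap_case_source`, the (location, date) keying, and the choice to leave dates the WHO series does not cover empty (rather than falling back to JHU) are all assumptions for illustration, and the numbers in the usage lines are toy values.

```python
from typing import Dict, Tuple

# Hypothetical layout: one row per (location, date), each row a dict of columns.
Key = Tuple[str, str]


def swap_case_source(
    dataset: Dict[Key, dict], who_cases: Dict[Key, int]
) -> Dict[Key, dict]:
    """Return a copy of `dataset` with total_cases taken from `who_cases`.

    Every date is rewritten, not just new ones, so no series mixes two sources.
    Dates the WHO series lacks get None instead of the old figure.
    """
    updated: Dict[Key, dict] = {}
    for key, row in dataset.items():
        new_row = dict(row)  # copy, so other columns (testing, vaccinations, ...) are kept
        new_row["total_cases"] = who_cases.get(key)
        updated[key] = new_row
    # The new source may cover dates the old one did not, so add those rows too.
    for (location, date), cases in who_cases.items():
        if (location, date) not in updated:
            updated[(location, date)] = {"location": location, "date": date, "total_cases": cases}
    return updated


# Toy data: location "A" with made-up totals, purely to show the mechanics.
old = {
    ("A", "2023-02-27"): {"location": "A", "date": "2023-02-27", "total_cases": 100, "people_vaccinated": 1},
    ("A", "2023-02-28"): {"location": "A", "date": "2023-02-28", "total_cases": 110, "people_vaccinated": 1},
}
new = swap_case_source(old, {("A", "2023-02-27"): 250, ("A", "2020-01-03"): 5})
print(new[("A", "2023-02-27")]["total_cases"])  # 250
```

Blanking uncovered dates instead of keeping the old value is one way to guarantee a single source per series; keeping the old figure would be simpler but would silently splice two sources together.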
I hope this helps!
@edomt thank you so much for the fantastic work you have done. You are obviously under no obligation to solve other institutions' problems, and in truth I feel bad about even approaching you about this. Your answer completely clears up the question I had, though. Thank you.
Main and internal datasets will use WHO's data instead of JHU's.