Data: Start using WHO as main source #2792

lucasrodes · 2023-02-27T15:23:01Z

Main and internal datasets will use WHO's data instead of JHU's.


pwdel · 2023-03-01T22:54:14Z

scripts/src/cowidev/cmd/check.py

@@ -13,14 +13,21 @@
 VAX_URL = "https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/vaccinations.csv"
 TESTING_URL = "https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/testing/covid-testing-all-observations.csv"
 HOSP_URL = "https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/hospitalizations/covid-hospitalizations.csv"
-FULL_URL = "https://covid.ourworldindata.org/data/owid-covid-data.csv"
+FULL_URL_CSV = "https://covid.ourworldindata.org/data/owid-covid-data.csv"


Sorry to bother you. I am not sure whether this specific file represents the finalized new dataset mirroring the WHO dataset, but I wanted to point out what looks like a large discrepancy in the total number of cases in China, if I am reading this correctly. I graphed the series for China from 2020 through Feb 27th, 2023.

The series ends at 2,023,904 total cases for the period from 1-22-2020 to 2-27-2023.

However looking on the WHO dashboard here: https://covid19.who.int/region/wpro/country/cn

Their accounting appears to show 99,030,129 confirmed cases between 3 January 2020 and 6:06pm CET, 28 February 2023.

This represents a discrepancy of 97,006,225, or roughly a 49-fold difference in total cases.
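As a quick check of the two totals quoted above, the gap and the ratio can be recomputed directly. This is only an arithmetic sketch; the variable names are illustrative and not taken from the repository.

```python
# Totals quoted in this thread (JHU-based OWID series vs. WHO dashboard).
jhu_total = 2_023_904
who_total = 99_030_129

# Absolute gap between the two sources.
difference = who_total - jhu_total

# How many times larger the WHO figure is than the JHU figure.
ratio = who_total / jhu_total

print(f"difference: {difference:,}")   # difference: 97,006,225
print(f"ratio: {ratio:.1f}x")          # ratio: 48.9x
```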

I am not sure whether this also affects the following file, which appears to be an R tracking filter from what I can tell in the ReadMe:

scripts/src/cowidev/megafile/steps/core.py

file_url="https://github.com/crondonm/TrackingR/raw/main/Estimates-Database/database_7.csv"

Full disclosure, and again, sorry to bother you; hopefully I am being helpful here. There is a prediction contest run by the University of Texas at Austin that I am taking part in, and a large number of participants are currently watching this repo. Here is the discussion in case you are interested: https://salemcenter.manifold.markets/SalemCenter/china-reaches-100000-covid-cases-by

edomt · 2023-03-02

Hi @pwdel

Thanks for getting in touch. I just scrolled through the latest comments on Manifold and can see how complicated this situation is. We won't give any particular opinion on how to handle the resolution of the market, but if it helps, here's more information below.

The data published by Johns Hopkins University (JHU) currently shows 2,023,904 confirmed cases in China.
The data published by the WHO currently shows a total of 99,030,129 confirmed cases in China.
The reasons for the discrepancy between the two sources are explained by the JHU team here.
Note that we (Our World in Data) do not collect data on confirmed cases ourselves; we've always relied on third-party sources for this data.

The file you mentioned (https://covid.ourworldindata.org/data/owid-covid-data.csv) is our primary COVID dataset aggregating data from multiple sources. This file still relies on the JHU data for confirmed cases and will do so until 8 March.

On 8 March, we'll merge our pull request to start relying on WHO data instead, and the entire time series (all the way back to early 2020) will be updated in this file.
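For readers who want to see which total the file reports for a country at a given moment, here is a minimal sketch. It assumes the CSV keeps its `location`, `date`, and `total_cases` columns; the helper `latest_total_cases` and the toy rows in `SAMPLE` are hypothetical and are not real data or part of the repository.

```python
import csv
import io


def latest_total_cases(csv_text, location):
    """Return (date, total_cases) for the most recent non-empty row of `location`.

    Returns None if the location has no row with a total_cases value.
    """
    latest = None
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Skip other locations and rows where the value is missing.
        if row["location"] != location or not row["total_cases"]:
            continue
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        if latest is None or row["date"] > latest[0]:
            latest = (row["date"], float(row["total_cases"]))
    return latest


# Toy rows for illustration only (not real figures).
SAMPLE = """iso_code,location,date,total_cases
CHN,China,2020-01-22,10
CHN,China,2020-01-23,
CHN,China,2020-01-24,20
USA,United States,2020-01-24,5
"""

result = latest_total_cases(SAMPLE, "China")
print(result)
```

To run this against the published file, the text could be fetched first (for example with `urllib.request.urlopen`) from the URL mentioned above and passed in as `csv_text`. The value returned would reflect whichever source the file relies on at the time of download.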

I hope this helps!

pwdel · 2023-03-02

@edomt thank you so much for the fantastic work you have done. Obviously you are under no obligation to solve any other institution's problems at all, and in truth I feel bad about even approaching you about this. I think your answer completely clears up the question I had though, thank you.

enhance(case,death): overwrite data with new source

d8a1bd1

lucasrodes linked an issue Feb 27, 2023 that may be closed by this pull request

Cases and deaths: New source, technical details #2785

Closed

9 tasks

lucasrodes mentioned this pull request Feb 27, 2023

Cases and deaths: New source, technical details #2785

Closed

9 tasks

lucasrodes added 2 commits February 27, 2023 22:26

Merge branch 'master' into enhance/case-death

069fcc8

Merge branch 'master' of https://github.com/owid/covid-19-data into e…

a6bc7cc

…nhance/case-death

pwdel reviewed Mar 1, 2023


Merge branch 'master' into enhance/case-death

d82db45

lucasrodes merged commit 2810f8f into master Mar 8, 2023


lucasrodes commented Feb 27, 2023

pwdel Mar 1, 2023

edomt Mar 2, 2023 (edited)

pwdel Mar 2, 2023
