deprecate oxcgrt
lucasrodes committed Aug 1, 2024
1 parent eb95cf2 commit c9c6ffc
Showing 11 changed files with 25 additions and 304 deletions.
50 changes: 25 additions & 25 deletions scripts/docs/data-pipeline.md
@@ -1,4 +1,5 @@
# Data pipeline

To produce [our dataset](../dataset) we are constantly developing our dedicated library [cowidev](../cowidev/index). This library provides us with the
command tool [`cowid`](../cowidev/cowid-api) which eases:

@@ -7,8 +8,8 @@ command tool [`cowid`](../cowidev/cowid-api) which eases:

Consequently, the dataset is updated multiple times a day (_at least_ at 06:00 and 18:00 UTC), using the latest generated intermediate datasets.


## Overview

The dataset pipeline is built from several pipelines, which are executed independently and whose outputs are combined in
a final step. The complexity of the pipelines varies. For instance, for vaccinations, testing and hospitalization
we are responsible for collecting, processing and publishing the data but for cases/deaths we leave the collection step to the [WHO](https://data.who.int/dashboards/covid19/cases) and then transform and publish the data. Note
@@ -17,28 +18,30 @@ that on 23 June 2022, we stopped adding new data points to our COVID-19 testing
The table below lists all the constituent pipelines, along with their execution frequencies and
tasks.

| **Pipeline** | **Frequency** | **Tasks** |
| -------------------------------------------------------------- | -------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [Vaccinations](#vaccinations-pipeline) | every weekday at 12:00 UTC | {abbr}`Collection (Scraping primary sources (e.g. country governmental sites) and extracting relevant datapoints.)`, {abbr}`transformation (Transforming and cleaning the downloaded data into a human-readable format.)`, {abbr}`presentation (Presenting the cleaned data to the public (e.g. charts, dataset files, etc.).)` |
| [Testing](#testing-pipeline) | Phased out ([read more](https://github.com/owid/covid-19-data/discussions/2667)) | Collection, transformation, presentation |
| [Hospitalization & ICU](#hospitalization-icu-pipeline) | daily at 06:00 and 18:00 UTC | Collection, transformation, presentation |
| [Cases & Deaths](#cases-deaths-pipeline) | daily (multiple times) | Transformation, presentation |
| [Excess mortality](#excess-mortality-pipeline) | weekly | Transformation, presentation |
| [Variants](#variants-pipeline) | daily at 20:00 UTC | Transformation, presentation |
| [Reproduction rate](#reproduction-rate-pipeline) | daily | Presentation |
| [Policy responses (OxCGRT)](#policy-responses-oxcgrt-pipeline) | daily | Transformation, presentation |
| [Public monitor (YouGov)](#public-monitor-yougov-pipeline) | weekly | Transformation, presentation |

You can find all the automation details [in this file](https://github.com/owid/covid-19-data/blob/master/scripts/scripts/autoupdate.sh).
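
As a rough illustration of the time-gated scheduling described in the table, the sketch below maps a few pipelines to their listed UTC hours and reports which ones would be due at a given moment. It is only a sketch under stated assumptions: `SCHEDULE` and `due_pipelines` are hypothetical names that do not exist in cowidev, the hours are transcribed from the table, and the actual automation script may gate its steps differently.

```python
from datetime import datetime, timezone

# Hypothetical schedule transcribed from the table above; the real
# automation script may use different hours or conditions.
# Each entry: pipeline name -> (UTC hours, weekdays-only flag).
SCHEDULE = {
    "vaccinations": ((12,), True),
    "hospitalization": ((6, 18), False),
    "variants": ((20,), False),
}


def due_pipelines(now: datetime) -> list[str]:
    """Return the pipelines whose scheduled UTC hour matches `now`."""
    now = now.astimezone(timezone.utc)
    is_weekday = now.weekday() < 5  # Monday=0 ... Friday=4
    return [
        name
        for name, (hours, weekdays_only) in SCHEDULE.items()
        if now.hour in hours and (is_weekday or not weekdays_only)
    ]


if __name__ == "__main__":
    print(due_pipelines(datetime.now(timezone.utc)))
```

Keeping the schedule in a single mapping makes it easy to compare against the table; pipelines listed only as "daily" or "weekly" are left out because the table gives no specific hour for them.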

## Vaccinations pipeline

The vaccination pipeline is probably the most complete one, where we scrape and extract data for each country in the
dataset.

The pipeline is executed manually by [@edomt](https://github.com/edomt) or [@lucasrodes](https://github.com/lucasrodes)
every weekday (i.e. Monday to Friday) before 12:00 UTC.

### Execution steps

```
# Download/scrape data
cowid vax get
@@ -59,6 +62,7 @@ cowid vax export
```

## Testing pipeline

We scrape and process data for multiple countries, similarly to the vaccinations pipeline. The pipeline is executed manually, by [@camapel](https://github.com/camapel) on Mondays and Fridays.

:::{warning}
@@ -76,7 +80,9 @@ cowid testing get
```{seealso}
[Intermediate datasets](https://github.com/owid/covid-19-data/tree/master/public/data/testing)
```

## Hospitalization & ICU pipeline

We scrape and process the data in the same way as for testing and vaccinations. The pipeline is run daily.

### Execution steps
@@ -95,24 +101,25 @@ cowid hosp grapher-io
```

## Cases & Deaths pipeline

We source case and death figures from the [COVID-19 Dashboard by the WHO](https://data.who.int/dashboards/covid19/cases). We transform some of the variables and
re-publish the dataset.

### Execution steps

```
# Generate dataset
cowid casedeath generate
```


```{seealso}
[Intermediate datasets](https://github.com/owid/covid-19-data/tree/master/public/data/cases_deaths).
```

## Excess Mortality pipeline

The pipeline is manually executed once a week. The reported all-cause mortality data is from the [Human Mortality Database](https://www.mortality.org/) (HMD) Short-term Mortality Fluctuations project and the [World Mortality Dataset](https://github.com/akarlinsky/world_mortality) (WMD). Both sources are updated weekly. We also present estimates of excess deaths globally that are [published by _The Economist_](https://github.com/TheEconomist/covid-19-the-economist-global-excess-deaths-model).

### Execution steps

@@ -127,7 +134,9 @@ cowid xm generate
```

## Variants pipeline

We run this pipeline daily.

### Execution steps

```
@@ -143,24 +152,15 @@ The data on variants and sequencing is indeed no longer available to download.
It is published by GISAID under a license that doesn't allow us to redistribute it.
Please visit [the data publisher's website](https://www.gisaid.org/) for more details. You may want to register an account there if you're really interested in using this data.
```

## Reproduction rate pipeline

We source the data from [crondonm/TrackingR/](https://github.com/crondonm/TrackingR/).

```{seealso}
[_Tracking R of COVID-19 A New Real-Time Estimation Using the Kalman Filter_](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0244474), by Francisco Arroyo, Francisco Bullano, Simas Kucinskas, and Carlos Rondón-Moreno
```
## Policy responses (OxCGRT) pipeline

```
# Get the data
cowid oxcgrt get
# Update Grapher files
cowid oxcgrt grapher-io
```



## Public monitor (YouGov) pipeline

1 change: 0 additions & 1 deletion scripts/docs/environment.md
@@ -216,7 +216,6 @@ Commands:
casedeath COVID-19 Cases/Deaths data pipeline.
variants COVID-19 Variants data pipeline.
xm COVID-19 Excess Mortality data pipeline.
oxcgrt COVID-19 stringency index (by OxCGRT) data pipeline.
sweden COVID-19 Sweden data pipeline.
uk-nations COVID-19 UK Nations data pipeline.
check COVID-19 data pipeline checks.
19 changes: 0 additions & 19 deletions scripts/scripts/autoupdate.sh
@@ -152,25 +152,6 @@ if [ $hour == 21 ] ; then
git_push "xm"
fi

# =====================================================================
# Policy responses
OXCGRT_CSV_PATH=./scripts/input/bsg/latest.csv
hour=$(date +%H)
if [ $hour == 23 ] ; then
  # Download CSV
  cowid --server oxcgrt get
  # If there are any unstaged changes in the repo, then the
  # CSV has changed, and we need to run the update script.
  if has_changed $OXCGRT_CSV_PATH; then
    echo "Generating OxCGRT export..."
    cowid --server oxcgrt grapher-io
    git_push "oxcgrt"
  else
    echo "OxCGRT export is up to date"
  fi
else
  echo "OxCGRT CSV was recently updated; skipping download"
fi


# =====================================================================
2 changes: 0 additions & 2 deletions scripts/src/cowidev/cmd/__main__.py
@@ -9,7 +9,6 @@
from cowidev.cmd.cases_deaths import click_cases_deaths
from cowidev.cmd.xm import click_xm
from cowidev.cmd.variants import click_variants
from cowidev.cmd.oxcgrt import click_oxcgrt
from cowidev.cmd.megafile import click_megafile
from cowidev.cmd.sweden import click_sweden
from cowidev.cmd.uk_nations import click_uk_nations
@@ -57,7 +56,6 @@ def cli(ctx, parallel, n_jobs, server):
cli.add_command(click_cases_deaths)
cli.add_command(click_variants)
cli.add_command(click_xm)
cli.add_command(click_oxcgrt)
cli.add_command(click_sweden)
cli.add_command(click_uk_nations)
cli.add_command(click_check)
47 changes: 0 additions & 47 deletions scripts/src/cowidev/cmd/oxcgrt.py

This file was deleted.

Empty file.
24 changes: 0 additions & 24 deletions scripts/src/cowidev/oxcgrt/__main__.py

This file was deleted.

21 changes: 0 additions & 21 deletions scripts/src/cowidev/oxcgrt/_parser.py

This file was deleted.

60 changes: 0 additions & 60 deletions scripts/src/cowidev/oxcgrt/etl.py

This file was deleted.
