refactor(vax): vax -> cowidev.vax
lucas rg committed Aug 10, 2021
1 parent 690cd5e commit cd8e9ab
Showing 7 changed files with 101 additions and 60 deletions.
35 changes: 24 additions & 11 deletions scripts/README.md
@@ -1,26 +1,39 @@
# Development
[![Data](https://img.shields.io/badge/go_to-public_data-purple)](../../../public/data/)
[![Vaccinations docs](https://img.shields.io/badge/vaccination-docs-0055ff)](docs/VACCINATIONS_README.md)
[![Vaccinations docs](https://img.shields.io/badge/vaccination-docs-0055ff)](docs/vaccinations/README.md)
[![Testing docs](https://img.shields.io/badge/testing-docs-0055ff)](scripts/testing/README.md)

Here you will find all the different scripts and tools that we use to generate [the data](https://github.com/owid/covid-19-data/tree/master/public/data).
Here you will find all the different scripts and tools that we use to generate [the
data](https://github.com/owid/covid-19-data/tree/master/public/data).

As there are several metrics being reported, each with its independent pipeline, the overall data pipeline can seem a bit
complex. Therefore, this file attempts to explain the most relevant processes in use as we believe that transparency
is a must and that it can help developers in contributing to the project.
Currently, most of the pipelines have been integrated into our [`cowidev`](src/cowidev) library. For details about the
library structure and guidelines to contribute please refer to the [library documentation](docs/README.md).

Currently, we are trying to have all different pipelines in our [`cowidev`](src/cowidev) library.

## Folders
|Folder|Description |
|------|-----------------------------|
|[`grapher`](grapher)|Contains output files that power our [_grapher_](https://ourworldindata.org/owid-grapher) visualizations|
|[`input`](input)|External files used to compute derived metrics, such as X-per capita, and aggregate groups, such as 'Asia', etc.|
|[`notebook`](notebooks)|Notebooks used for development purposes (not maintained).|
|[`src`](src)|`cowidev` library. It contains the code for almost all project's pipelines.|
|[`docs/`](docs)|Development documentation.|
|[`grapher/`](grapher)|Hosts internal files that power our [_grapher_](https://ourworldindata.org/owid-grapher) visualizations.|
|[`input/`](input)|External files used to compute derived metrics, such as X-per capita, and aggregate groups, such as 'Asia', etc.|
|[`output/`](output)|Temporary files. Only for development purposes. Use it at your own risk.|
|[`src/cowidev/`](src/cowidev)|`cowidev` library. It contains the code for almost all project's pipelines.|
|[`scripts`](scripts)|Legacy folder. Contains some parts of the code, such as the COVID-19 testing collection scripts. The code is a mixture of R and Python scripts.|

Note that the folder [public/data](../public/data) is not to be modified, as it contains output files generated by this
pipeline. Exceptions may include output folder refactor and others.
pipeline.

## Library `cowidev`
Install it by running `pip install .` from this directory.

Currently it contains the code for the following data pipelines:

- Excess mortality
- Google Mobility
- OxCGRT
- Variants
- Vaccination
- YouGov

## Vaccination data
> 📁 Find it at [`scripts/vaccinations/`](scripts/vaccinations)
51 changes: 51 additions & 0 deletions scripts/docs/README.md
@@ -0,0 +1,51 @@
## Set up Development environment
### Python version
Make sure you have a working environment with Python 3 installed. We use Python >= 3.7.

You can check this with:

```
python --version
```
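
The same check can be done from inside Python, which is handy in a setup script. This is a minimal sketch, not part of `cowidev`: the helper name `python_is_supported` is hypothetical, and the `(3, 7)` minimum is taken from the requirement stated above.

```python
import sys

# Minimum version stated in this README (Python >= 3.7).
MIN_PYTHON = (3, 7)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if (major, minor) of version_info meets the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


if not python_is_supported():
    found = ".".join(str(part) for part in sys.version_info[:3])
    print(f"Python >= 3.7 is required, found {found}")
```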

### Install library
In your environment (shell), cd to the project directory and install the library in development mode. That is, run:

```
$ pip install -e .
```

In addition to `cowidev` package, this will install the command tool `cowid-vax`, which is required
to run the data pipeline.

### Required configuration

#### Environment variables
- `{OWID_COVID_PROJECT_DIR}`: Path to the local project directory. E.g. `/Users/username/projects/covid-19-data`.
- `{OWID_COVID_VAX_CREDENTIALS_FILE}` (vaccinations): Path to the credentials file (this is internal). Google-related fields require a valid OAuth JSON credentials file (see [gsheets
documentation](https://gsheets.readthedocs.io/en/stable/#quickstart)). The credentials file should have the following structure:
```json
{
"greece_api_token": "[GREECE_API_TOKEN]",
"owid_cloud_table_post": "[OWID_CLOUD_TABLE_POST]",
"google_credentials": "[CREDENTIALS_JSON_PATH]",
"google_spreadsheet_vax_id": "[SHEET_ID]",
"twitter_consumer_key": "[TWITTER_CONSUMER_KEY]",
"twitter_consumer_secret": "[TWITTER_CONSUMER_SECRET]"
}
```
- `{OWID_COVID_VAX_CONFIG_FILE}` (vaccinations): Path to `config.yaml` file required for vaccination pipeline.
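
Before running the pipeline, it can help to confirm that the variables listed above are set. The sketch below is an illustration only: `missing_env_vars` is a hypothetical helper, not a `cowidev` function, and it only checks that each variable is non-empty, not that the path exists.

```python
import os

# Variable names as listed in this section.
REQUIRED_VARS = (
    "OWID_COVID_PROJECT_DIR",
    "OWID_COVID_VAX_CREDENTIALS_FILE",
    "OWID_COVID_VAX_CONFIG_FILE",
)


def missing_env_vars(environ=None, required=REQUIRED_VARS):
    """Return the names in `required` that are unset or empty in `environ`."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```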

#### Credentials file
The environment variable `${OWID_COVID_VAX_CREDENTIALS_FILE}` corresponds to the path to the credentials file. This is internal. Google-related fields require a valid OAuth JSON credentials file (see [gsheets
documentation](https://gsheets.readthedocs.io/en/stable/#quickstart)). The file should have the following structure:
```json
{
"greece_api_token": "[GREECE_API_TOKEN]",
"owid_cloud_table_post": "[OWID_CLOUD_TABLE_POST]",
"google_credentials": "[CREDENTIALS_JSON_PATH]",
"google_spreadsheet_vax_id": "[SHEET_ID]",
"twitter_consumer_key": "[TWITTER_CONSUMER_KEY]",
"twitter_consumer_secret": "[TWITTER_CONSUMER_SECRET]"
}
```
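
A credentials file with this structure could be loaded and sanity-checked roughly as follows. This is a sketch under assumptions: `missing_credential_fields` and `load_credentials` are hypothetical names, not the library's API, and the check only confirms that the six keys shown above are present.

```python
import json

# Keys taken from the JSON structure shown above.
EXPECTED_KEYS = {
    "greece_api_token",
    "owid_cloud_table_post",
    "google_credentials",
    "google_spreadsheet_vax_id",
    "twitter_consumer_key",
    "twitter_consumer_secret",
}


def missing_credential_fields(credentials):
    """Return a sorted list of expected keys absent from the credentials dict."""
    return sorted(EXPECTED_KEYS - set(credentials))


def load_credentials(path):
    """Read the JSON file at `path` and fail early if expected keys are missing."""
    with open(path, encoding="utf-8") as handle:
        credentials = json.load(handle)
    missing = missing_credential_fields(credentials)
    if missing:
        raise KeyError(f"Missing credential fields: {missing}")
    return credentials
```

In practice the path would come from the environment, for example `load_credentials(os.environ["OWID_COVID_VAX_CREDENTIALS_FILE"])` after `import os`.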
File renamed without changes.
@@ -1,9 +1,8 @@
# Vaccination update automation
[![Python 3"](https://img.shields.io/badge/python-3.7|3.8|3.9-blue.svg?&logo=python&logoColor=yellow)](https://www.python.org/downloads/release/python-3)
[![Contribute](https://img.shields.io/badge/-contribute-0055ff)](CONTRIBUTE.md)
[![Data](https://img.shields.io/badge/public-data-purple)](../../public/data/)
[![Data](https://img.shields.io/badge/public-data-purple)](../../../public/data/)

**THIS FILE IS BEING RE-WRITTEN**

Vaccination data is updated on a daily basis. For some countries, the update is done by means of an automated process,
while others require some manual work. To keep track of the currently automated processes, check [this
@@ -15,18 +14,17 @@ table](automation_state.csv).
3. [The data pipeline](#3-the-data-pipeline)
4. [Other functions](#4-other-functions)
5. [Contribute](CONTRIBUTE.md)
6. [FAQs](#6-FAQs)
6. [FAQs](#6-faqs)

## 1. Vaccination pipeline files
This directory contains the following files:


| File name | Description |
| ----------- | ----------- |
| [`output/`](../output/vaccination/) | Temporary automated imports are placed here. |
| [`src/cowidev/vax/`](src/cowidev/vax) | Scripts to automate country data imports. |
| [`output/vaccinations/`](../../output/vaccinations/) | Temporary automated imports are placed here. |
| [`src/cowidev/vax/`](../../src/cowidev/vax) | Scripts to automate country data imports. |
| [`config.yaml`](config.yaml) | Data pipeline configuration. |
| [`us_states/input/`](us_states/input) | Data for US-state vaccination data updates. |
| [`MANIFEST.in`](MANIFEST.IN), [`setup.py`](setup.py), [`requirements.txt`](requirements.txt), [`requirements-flake.txt`](requirements-flake.txt) | Library development related files |
| [`automation_state.csv`](automation_state.csv) | Lists if country process is automated (TRUE) or not (FALSE). |
| [`source_table.html`](source_table.html) | HTML table with country source URLs. Shown at [OWID's website](https://ourworldindata.org/covid-vaccinations#source-information-country-by-country). |
@@ -36,33 +34,14 @@ _*Only most relevant files have been listed_


## 2. Development environment
<details closed>
<details open>
<summary>Show steps ...</summary>
Follow the steps below to correctly set up your virtual environment.

### Python version
Make sure you have a working environment with Python 3 installed. We use Python >= 3.7.

You can check this with:

```
python --version
```

### Install library
In your environment (shell), install the library in development mode. That is, run:

```
$ pip install -e .
```

In addition to `owid-covid19-vaccination-dev` package, this will install the command tool `cowid-vax`, which is required
to run the data pipeline.

### Configuration file

To correctly run the data pipeline, make sure to have a valid _configuration file_. We currently use
[config.yaml](config.yaml). This file contains data used throughout the different pipeline stages.
A valid _configuration file_ is required to run the vaccination pipeline. In addition, you must have environment
variable `{OWID_COVID_VAX_CONFIG_FILE}` pointing to the aforementioned _configuration file_. We currently use
[config.yaml](../../config.yaml). This file contains data used throughout the different pipeline stages.

```yaml
global:
@@ -101,7 +80,7 @@ Our current configuration requires to previously set environment variables `${OW

```sh
export OWID_COVID_PROJECT_DIR=/Users/username/projects/covid-19-data
export OWID_COVID_VAX_CREDENTIALS_FILE=${OWID_COVID_PROJECT_DIR}/scripts/scripts/vaccinations/vax_dataset_config.json
export OWID_COVID_VAX_CREDENTIALS_FILE=${OWID_COVID_PROJECT_DIR}/scripts/vax_dataset_config.json
```

### Credentials file
@@ -242,18 +221,16 @@ Once the automation is successfully executed, the following files and directorie

| File name | Description |
| ----------- | ----------- |
| [`vaccinations.csv`](../../public/data/vaccinations/vaccinations.csv) | Main output with vaccination data of all countries. |
| [`vaccinations.json`](../../public/data/vaccinations/vaccinations.json) | Same as `vaccinations.csv` but in JSON format. |
| [`vaccinations-by-manufacturer.csv`](../../public/data/vaccinations/vaccinations-by-manufacturer.csv) | Secondary output with vaccination by manufacturer for a select number of countries. |
| [`country_data/`](../../public/data/vaccinations/country_data/) | Individual country CSV files. |
| [`locations.csv`](../../public/data/vaccinations/locations.csv) | Country-level metadata. |
| [`source_table.csv`](../output/vaccinations/source_table.html) | HTML table with country source URLs. Shown at [OWID's website](https://ourworldindata.org/covid-vaccinations#source-information-country-by-country) |
| [`automation_state.csv`](../output/vaccinations/automation_state.csv) | Lists if country process is automated (TRUE) or not (FALSE). |
| [`COVID-19 - Vaccinations.csv`](../grapher/COVID-19%20-%20Vaccinations.csv) | Internal file for OWID grapher on vaccinations. |
| [`COVID-19 - Vaccinations by manufacturer.csv`](../grapher/COVID-19%20-%20Vaccinations%20by%20manufacturer.csv) | Internal file for OWID grapher on vaccinations by manufacturer. |

| [`vaccinations.csv`](../../../public/data/vaccinations/vaccinations.csv) | Main output with vaccination data of all countries. |
| [`vaccinations.json`](../../../public/data/vaccinations/vaccinations.json) | Same as `vaccinations.csv` but in JSON format. |
| [`vaccinations-by-manufacturer.csv`](../../../public/data/vaccinations/vaccinations-by-manufacturer.csv) | Secondary output with vaccination by manufacturer for a select number of countries. |
| [`country_data/`](../../../public/data/vaccinations/country_data/) | Individual country CSV files. |
| [`locations.csv`](../../../public/data/vaccinations/locations.csv) | Country-level metadata. |
| [`source_table.csv`](../../output/vaccinations/source_table.html) | HTML table with country source URLs. Shown at [OWID's website](https://ourworldindata.org/covid-vaccinations#source-information-country-by-country) |
| [`automation_state.csv`](../../output/vaccinations/automation_state.csv) | Lists if country process is automated (TRUE) or not (FALSE). |
| [`COVID-19 - Vaccinations.csv`](../../grapher/COVID-19%20-%20Vaccinations.csv) | Internal file for OWID grapher on vaccinations. |
| [`COVID-19 - Vaccinations by manufacturer.csv`](../../grapher/COVID-19%20-%20Vaccinations%20by%20manufacturer.csv) | Internal file for OWID grapher on vaccinations by manufacturer. |

_You can find more information about these files [here](../../public/data/vaccinations/README.md)_.

#### Notes

@@ -327,7 +304,7 @@ Countries are given from the one with the least to the one with the most number o


## 5. Contribute
We welcome contributions! Read more in [CONTRIBUTE](VACCINATIONS_CONTRIBUTE.md)
We welcome contributions! Read more in [CONTRIBUTE](vaccinations/CONTRIBUTE.md)
## 6. FAQs

### Any question or suggestion?
File renamed without changes.
4 changes: 2 additions & 2 deletions scripts/scripts/vaccinations/README.md
@@ -3,5 +3,5 @@ This directory is deprecated.
- `src/vax` → [`scripts/src/cowidev/vax`](../../../scripts/src/cowidev/vax/)
- `output` → [`scripts/output/vaccinations`](../../../scripts/output/vaccinations/). Main data for country files live at
[`scripts/output/vaccinations/main_data`](../../../scripts/output/vaccinations/main_data)
- `CONTRIBUTE.md` → [`scripts/docs/VACCINATIONS_CONTRIBUTE.md`](../../docs/VACCINATIONS_CONTRIBUTE.md)
- `README.md` → [`scripts/docs/VACCINATIONS_README.md`](../../docs/VACCINATIONS_README.md)
- `CONTRIBUTE.md` → [`scripts/docs/vaccinations/CONTRIBUTE.md`](../../docs/vaccinations/CONTRIBUTE.md)
- `README.md` → [`scripts/docs/vaccinations/README.md`](../../docs/vaccinations/README.md)
@@ -12,10 +12,10 @@ cowid-vax export

# Git push
git add output/*
git add source_table.html
git add automation_state.csv
git add ../../grapher/*
git add ../../../public/data/*
git add output/vaccinations/*
git add output/vaccinations/source_table.html
git add output/vaccinations/automation_state.csv
git add grapher/*
git add ../public/data/*
git commit -m 'data(vax): update'
git push origin master
