- English version
- Russian version (TODO)
- Kazakh version (TODO)
This is an open-source project fully hosted on GitHub. All contributions are welcome, whether they are bug reports, data engineering, or documentation. Details on the different kinds of contributions follow.
Ideas for datasets can be submitted in this issue tracker. Please provide as much detail as possible so that other people can easily understand the idea. Before creating an issue for your idea, make sure it doesn't already exist.
When you start working on a new dataset, you normally won't have an existing repository. In that case, you can create one under your own GitHub account and transfer it to our organization.
If you find a bug in an existing dataset repository, you can open a pull request and assign one of the members of this organization as a reviewer. Once it is reviewed and approved, we will make sure it is merged into the main branch.
You can also report an issue with a specific dataset by opening a new issue in the relevant repository. Please don't use the `ideas` repository for issues with an existing dataset.
Each repository holds a single dataset. For instance, this is a population dataset: https://github.com/open-data-kazakhstan/population. A dataset repository should have the following structure:
- `README.md`: Markdown documentation for the dataset. For example: https://github.com/open-data-kazakhstan/geo-boundaries-kz/blob/master/README.md
- `datapackage.json`: metadata for the dataset. See the metadata specification section for more details.
- `data/`: a directory where data files (e.g., `csv` files) are stored.
- `scripts/`: a directory where the scripts that generate (e.g., process or transform) the data files are stored.
- `.github/workflows/`: a directory where we define GitHub Actions (data pipelines).
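The layout above can be checked or bootstrapped with a few lines of Python. This is only a sketch using the standard library: the helper names `scaffold_dataset_repo` and `missing_entries` are hypothetical and not part of any project tooling, and the files it creates are empty placeholders.

```python
from pathlib import Path

# Entries taken from the structure list above.
DIRECTORIES = ["data", "scripts", ".github/workflows"]
FILES = ["README.md", "datapackage.json"]


def scaffold_dataset_repo(root):
    """Create the expected directories and empty placeholder files under root."""
    root = Path(root)
    for directory in DIRECTORIES:
        (root / directory).mkdir(parents=True, exist_ok=True)
    for filename in FILES:
        # touch() leaves an existing file's contents untouched.
        (root / filename).touch(exist_ok=True)
    return root


def missing_entries(root):
    """Return the expected directories and files that are absent under root."""
    root = Path(root)
    missing = [d + "/" for d in DIRECTORIES if not (root / d).is_dir()]
    missing += [f for f in FILES if not (root / f).is_file()]
    return missing
```

Calling `missing_entries(".")` from the root of a dataset repository returns an empty list when every expected entry is present.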
We describe data using the Frictionless Data specification.
TODO: how to generate `datapackage.json`.
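As a starting point, a minimal descriptor could be generated with the Python standard library alone. The sketch below is not the project's official tooling: the function names, the `data/population.csv` path, and the choice to type every column as `string` are assumptions made for illustration. It reads the CSV header and fills in the `name`, `resources`, `path`, and `schema.fields` properties of a Data Package descriptor.

```python
import csv
import json
from pathlib import Path


def build_descriptor(package_name, repo_root, data_file="data/population.csv"):
    """Build a minimal Data Package descriptor for one CSV file.

    data_file is relative to repo_root, where datapackage.json will live.
    The default file name is a hypothetical example.
    """
    relative_path = Path(data_file)
    with (Path(repo_root) / relative_path).open(newline="", encoding="utf-8") as f:
        header = next(csv.reader(f))  # the first row holds the column names

    return {
        "name": package_name,
        "resources": [
            {
                # The resource name is derived from the file name here;
                # adjust it if the file name is not a valid identifier.
                "name": relative_path.stem,
                "path": relative_path.as_posix(),
                "format": "csv",
                "schema": {
                    # Assumption: every column is typed as "string".
                    # Replace with "integer", "number", "date", etc. by hand.
                    "fields": [{"name": column, "type": "string"} for column in header]
                },
            }
        ],
    }


def write_descriptor(descriptor, repo_root):
    """Write the descriptor to datapackage.json in the repository root."""
    target = Path(repo_root) / "datapackage.json"
    target.write_text(
        json.dumps(descriptor, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return target
```

For example, `write_descriptor(build_descriptor("population", "."), ".")` run from the repository root would produce a descriptor for `data/population.csv`. The field types should then be reviewed by hand, since the sketch does not infer them.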
We use CSV format for tabular data.
By default, we use the Python programming language and the Dataflows library.
TODO