- Clone this repository:
  git clone https://github.com/pycontw/pycon-etl
- Create a new branch:
  git checkout -b <branch-name>
- Make your changes.
  NOTE: We are still using Airflow v1, so please read the official Apache Airflow v1.10.15 documentation to ensure your changes are compatible with our current version.
  If your task uses an external service, add the required connection and variables in the Airflow UI.
- Test your changes in your local environment:
  - Ensure the DAG file is loaded successfully.
  - Verify that the task runs successfully.
  - Confirm that your code is correctly formatted and linted.
  - Check that all necessary dependencies are included in requirements.txt.
- Push your branch:
  git push origin <branch-name>
- Create a Pull Request (PR).
- Wait for the review and merge.
- Write any necessary documentation.
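
For the local check that DAG files load successfully, a minimal sketch is shown here. It is not part of this repository: the function name `find_broken_dag_files` and the `dags` directory name in the usage note are assumptions made for illustration. It imports each Python file with the standard library only, so it catches syntax errors and missing imports but does not validate the DAG objects themselves. Airflow's own `DagBag` class, which records failures in its `import_errors` attribute, remains the authoritative check inside a running Airflow environment.

```python
import importlib.util
import sys
from pathlib import Path


def find_broken_dag_files(dags_dir):
    """Import every .py file under dags_dir and collect the ones that fail.

    Returns a dict mapping file path (str) to a short error message.
    Note: importing a file executes its top-level code.
    """
    dags_path = Path(dags_dir).resolve()
    # Airflow puts the DAG folder on sys.path, so mirror that for local imports.
    if str(dags_path) not in sys.path:
        sys.path.insert(0, str(dags_path))

    errors = {}
    for index, path in enumerate(sorted(dags_path.rglob("*.py"))):
        # Use a unique throwaway module name so files do not shadow each other.
        module_name = f"_dag_load_check_{index}"
        spec = importlib.util.spec_from_file_location(module_name, path)
        module = importlib.util.module_from_spec(spec)
        try:
            spec.loader.exec_module(module)
        except Exception as exc:  # any exception means the file would not load
            errors[str(path)] = f"{type(exc).__name__}: {exc}"
    return errors
```

Calling `find_broken_dag_files("dags")` from the repository root (assuming the DAG files live in a `dags` directory) returns an empty dict when every file imports cleanly.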
Please follow GitLab Flow; otherwise, your changes will not pass the Docker Hub CI.
Airflow dependencies are managed by requirements.txt and constraints-3.8.txt via pip. It is not recommended to use poetry or other tools.
constraints-3.8.txt is used to pin the version of the Airflow dependencies, and requirements.txt is used to install user-defined dependencies.
Please add or update dependencies in requirements.txt. Do not modify constraints-3.8.txt unless Airflow is updated.
For more information, refer to the Airflow Installation Documentation.
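
Because constraints-3.8.txt pins versions and requirements.txt adds user-defined dependencies, a package pinned to different versions in the two files is likely to make the install fail. The sketch below is one way to spot such conflicts before committing. It is not part of the repository: the helper names `parse_pins` and `find_conflicts` are hypothetical, the sample package versions are made up, and only simple `name==version` lines are handled (other specifiers are ignored).

```python
import re


def parse_pins(text):
    """Return {normalized_name: version} for every `name==version` line."""
    pins = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        match = re.match(r"^([A-Za-z0-9._-]+)(?:\[[^\]]*\])?==([^\s;]+)", line)
        if match:
            # Normalize names so "Flask_Admin" and "flask-admin" compare equal.
            name = re.sub(r"[-_.]+", "-", match.group(1)).lower()
            pins[name] = match.group(2)
    return pins


def find_conflicts(requirements_text, constraints_text):
    """Map each package pinned differently in the two files to both versions."""
    required = parse_pins(requirements_text)
    constrained = parse_pins(constraints_text)
    return {
        name: (version, constrained[name])
        for name, version in required.items()
        if name in constrained and constrained[name] != version
    }


# Hypothetical sample contents; real files would be read from disk.
sample_requirements = "requests==2.25.0\n"
sample_constraints = "requests==2.25.1\n"
print(find_conflicts(sample_requirements, sample_constraints))
# → {'requests': ('2.25.0', '2.25.1')}
```

An empty result means no directly conflicting pins were found; it does not replace an actual pip install against both files.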
- Please refer to this article for naming guidelines.
- Examples:
  - ods/opening_crawler: Crawlers written by @Rain. These openings can be used for the recruitment board, which was implemented by @tai271828 and @stacy.
  - ods/survey_cake: A manually triggered uploader that uploads questionnaires to BigQuery. The uploader should be invoked after we receive the SurveyCake questionnaire.
Please use make format to format your code before committing; otherwise, the CI will fail.
It is recommended to use Commitizen.
Please check the .github/workflows directory for details.