Airflow Data Pipeline is a project aimed at demonstrating the setup and usage of Apache Airflow for orchestrating data workflows. This README provides instructions for installing Apache Airflow using Docker and examples of basic usage.
Imagine you have a task to hit an API endpoint that provides various data in JSON format, including details about upcoming space launches. You use the following command to retrieve the data:
$ curl -L "https://ll.thespacedevs.com/2.0.0/launch/upcoming"
The response contains a list of upcoming launches, each with details such as ID, URL, launch name, scheduled launch time, and associated images.
{
"results": [
{
"id": "528b72ff-e47e-46a3-b7ad-23b2ffcec2f2",
"url": "https://.../528b72ff-e47e-46a3-b7ad-23b2ffcec2f2/",
"launch_library_id": 2103,
"name": "Falcon 9 Block 5 | NROL-108",
"net": "2020-12-19T14:00:00Z",
"window_end": "2020-12-19T17:00:00Z",
"window_start": "2020-12-19T14:00:00Z",
"image": "https://spacelaunchnow-prod-east.nyc3.digitaloceanspaces.com/media/launch_images/falcon2520925_image_20201217060406.jpeg",
"infographic": ".../falcon2520925_infographic_20201217162942.png"
},
...
]
}
This task might seem simple, but it's actually composed of three complex sub-tasks. Managing these tasks concurrently could become difficult, especially if one task depends on the output of another. This is where Apache Airflow comes in. It allows you to define dependent tasks as a pipeline and execute them in an orderly manner. Additionally, with Airflow's ability to treat pipelines as code, you can easily manage and version control your data processing workflows.
Your ultimate goal is not just to retrieve this data, but also to notify the person who requested it that the task completed successfully. This means you need to process the JSON response, extract the relevant information, and possibly send a notification to the user indicating that the task finished. A sketch of what this pipeline could look like as an Airflow DAG is shown below.
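The following is a minimal sketch of such a pipeline, not the DAG shipped in this repository: the `dag_id`, the `/tmp` file paths, and the final "notify" step (a simple `echo`) are illustrative assumptions, and you would adapt them to your own setup.

```python
import datetime
import json
import pathlib

import requests
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

# Illustrative dag_id and file paths -- adjust to your environment.
with DAG(
    dag_id="download_rocket_launches",
    start_date=datetime.datetime(2024, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:

    # Sub-task 1: fetch the upcoming-launches JSON (same call as the curl command above).
    download_launches = BashOperator(
        task_id="download_launches",
        bash_command='curl -o /tmp/launches.json -L "https://ll.thespacedevs.com/2.0.0/launch/upcoming"',
    )

    def _get_pictures():
        """Sub-task 2: parse the response and download each launch image."""
        pathlib.Path("/tmp/images").mkdir(parents=True, exist_ok=True)
        with open("/tmp/launches.json") as f:
            launches = json.load(f)
        for image_url in (launch["image"] for launch in launches["results"]):
            filename = image_url.split("/")[-1]
            response = requests.get(image_url)
            with open(f"/tmp/images/{filename}", "wb") as img:
                img.write(response.content)

    get_pictures = PythonOperator(
        task_id="get_pictures",
        python_callable=_get_pictures,
    )

    # Sub-task 3: "notify" the user that the run succeeded.
    notify = BashOperator(
        task_id="notify",
        bash_command='echo "There are now $(ls /tmp/images/ | wc -l) images in /tmp/images."',
    )

    # The dependency chain turns three loose steps into one ordered pipeline.
    download_launches >> get_pictures >> notify
```

Because the dependencies are declared explicitly (`download_launches >> get_pictures >> notify`), Airflow only starts a task once everything it depends on has succeeded, and a failed run can be retried from the failed task rather than from the beginning.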
- Docker installed on your machine. You can download Docker from the official Docker website.
git clone https://github.com/paresh2806/Docker_Airflow.git apache-airflow
- In docker-compose.yaml, change AIRFLOW__CORE__LOAD_EXAMPLES from 'true' to 'false' so that Airflow's bundled example DAGs are not loaded and only the DAGs from this repository appear in the UI
- Initialize the Airflow metadata database
docker-compose up airflow-init
- Start all Airflow services
docker-compose up
- To clean up the volumes and shut down the containers
docker-compose down -v
Log in to the Airflow web UI (served at http://localhost:8080 by default) with:
- Username: airflow
- Password: airflow
Two example DAGs are provided, ordered by complexity.
The example DAGs can be found in the /dags folder. The simpler one just prints a statement pre-written by the user; a rough sketch of such a DAG follows.
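As a sketch only (the `dag_id`, schedule, and the echoed message are assumptions, not the exact contents of the repository's DAG file), a "print a statement" DAG can be as small as this:

```python
import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical dag_id; the echoed text stands in for whatever statement the user pre-writes.
with DAG(
    dag_id="print_statement_example",
    start_date=datetime.datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    print_statement = BashOperator(
        task_id="print_statement",
        bash_command='echo "Hello from Airflow!"',
    )
```

Any file placed in the mounted /dags folder that defines a DAG object like this should be picked up by the scheduler and appear in the web UI, where it can be triggered manually or left to run on its schedule.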
After triggering the DAGs several times, the results of each run can be inspected in the Airflow UI.
Your contributions are what make the open-source community an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement".
Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Information about the project's license and any usage restrictions.