If you're looking for Airflow videos from the 2022 edition, check the 2022 cohort folder.
If you're looking for Prefect videos from the 2023 edition, check the 2023 cohort folder.

Week 2: Workflow Orchestration

Welcome to Week 2 of the Data Engineering Zoomcamp! 🚀😤 This week, we'll be covering workflow orchestration with Mage.

Mage is an open-source, hybrid framework for transforming and integrating data. ✨

This week, you'll learn how to use the Mage platform to author and share magical data pipelines. This will all be covered in the course, but if you'd like to learn a bit more about Mage, check out our docs here.

📕 Course Resources

2.2.1 - 📯 Intro to Orchestration

In this section, we'll cover the basics of workflow orchestration. We'll discuss what it is, why it's important, and how it can be used to build data pipelines.

Videos

  • What is Orchestration?

Resources

2.2.2 - 🧙‍♂️ Intro to Mage

In this section, we'll introduce the Mage platform. We'll cover what makes Mage different from other orchestrators, the fundamental concepts behind Mage, and how to get started. To cap it off, we'll spin Mage up via Docker 🐳 and run a simple pipeline.

Videos

  • What is Mage?
  • Configuring Mage
  • A Simple Pipeline

Resources

2.2.3 - 🐘 ETL: API to Postgres

Hooray! Mage is up and running. Now, let's build a real pipeline. In this section, we'll build a simple ETL pipeline that loads data from an API into a Postgres database. Our database will run locally in Docker, but the setup is the same as if it were running in the cloud.
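To make the extract-transform-load shape concrete, here's a minimal sketch of the pattern this section builds. It is not the course's Mage pipeline: the "API response" is a hard-coded list of fake ride records, and an in-memory sqlite3 database stands in for Postgres so the example runs anywhere. In Mage, each of these functions would live in its own loader/transformer/exporter block.

```python
import sqlite3

def extract():
    # Stand-in for the API call; in the course this data comes over HTTP.
    return [
        {"ride_id": 1, "distance": 2.5},
        {"ride_id": 2, "distance": 0.0},
        {"ride_id": 3, "distance": 7.1},
    ]

def transform(rows):
    # Example cleaning step: drop rides with zero distance.
    return [r for r in rows if r["distance"] > 0]

def load(rows, conn):
    # sqlite3 is a stand-in here; the course exports to Postgres.
    conn.execute("CREATE TABLE IF NOT EXISTS rides (ride_id INTEGER, distance REAL)")
    conn.executemany("INSERT INTO rides VALUES (:ride_id, :distance)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
print(conn.execute("SELECT COUNT(*) FROM rides").fetchone()[0])  # 2
```

The three-stage split is the point: each stage has one job, so you can swap the source, the cleaning rules, or the destination independently.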

Videos

  • Configuring Postgres
  • Writing an ETL Pipeline

Resources

2.2.4 - 🤓 ETL: API to GCS

Ok, so we've written data locally to a database, but what about the cloud? In this tutorial, we'll walk through the process of using Mage to extract, transform, and load data from an API to Google Cloud Storage (GCS).

We'll cover writing both partitioned and unpartitioned data to GCS and discuss why you might choose one over the other. Many data teams start by extracting data from a source and writing it to a data lake before loading it into a structured destination, like a database.
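Here's a small sketch of the difference between the two layouts, using local CSV files instead of GCS and Parquet so it stays runnable (the rows, paths, and `date=` partition key are made up for illustration). An unpartitioned write is one big file; a partitioned write splits rows into one directory per key, Hive-style, so downstream readers can scan only the partitions they need.

```python
import csv
from pathlib import Path
from tempfile import mkdtemp

rows = [
    {"date": "2024-01-01", "ride_id": 1},
    {"date": "2024-01-01", "ride_id": 2},
    {"date": "2024-01-02", "ride_id": 3},
]

root = Path(mkdtemp())  # stand-in for a GCS bucket

# Unpartitioned: a single file holding every row.
with open(root / "rides.csv", "w", newline="") as f:
    w = csv.DictWriter(f, fieldnames=["date", "ride_id"])
    w.writeheader()
    w.writerows(rows)

# Partitioned: one directory per date (date=YYYY-MM-DD/),
# so a query for one day touches only that day's files.
for row in rows:
    part_dir = root / f"date={row['date']}"
    part_dir.mkdir(exist_ok=True)
    path = part_dir / "part.csv"
    is_new = not path.exists()
    with open(path, "a", newline="") as f:
        w = csv.DictWriter(f, fieldnames=["date", "ride_id"])
        if is_new:
            w.writeheader()
        w.writerow(row)

print(sorted(p.name for p in root.iterdir()))
# ['date=2024-01-01', 'date=2024-01-02', 'rides.csv']
```

The course itself uses pyarrow to write partitioned Parquet to GCS; the directory layout it produces has the same shape as above.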

Videos

  • Configuring GCP
  • Writing an ETL Pipeline

Resources

2.2.5 - 🔍 ETL: GCS to BigQuery

Now that we've written data to GCS, let's load it into BigQuery. In this section, we'll walk through the process of using Mage to load our data from GCS to BigQuery. This closely mirrors a very common data engineering workflow: loading data from a data lake into a data warehouse.
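The lake-to-warehouse pattern can be sketched in miniature: sweep every partition file in the lake into one queryable warehouse table. This is a runnable stand-in only (a temp directory plays the lake, sqlite3 plays BigQuery, and the partition files are fabricated); the course does this with Mage's GCS and BigQuery integrations.

```python
import csv
import sqlite3
from pathlib import Path
from tempfile import mkdtemp

# Fake "lake": two date partitions, as 2.2.4 would have produced them.
lake = Path(mkdtemp())
for day, ids in [("2024-01-01", [1, 2]), ("2024-01-02", [3])]:
    d = lake / f"date={day}"
    d.mkdir()
    with open(d / "part.csv", "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["date", "ride_id"])
        w.writerows([[day, i] for i in ids])

# "Warehouse" load: gather every partition into a single table.
wh = sqlite3.connect(":memory:")  # stand-in for BigQuery
wh.execute("CREATE TABLE rides (date TEXT, ride_id INTEGER)")
for part in lake.glob("date=*/part.csv"):
    with open(part, newline="") as f:
        wh.executemany("INSERT INTO rides VALUES (:date, :ride_id)", csv.DictReader(f))
wh.commit()
print(wh.execute("SELECT COUNT(*) FROM rides").fetchone()[0])  # 3
```

Once the rows land in the warehouse table, analysts query one table instead of globbing partition files, which is exactly why this lake-to-warehouse step is so common.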

Videos

  • Writing an ETL Pipeline

2.2.6 - 👨‍💻 Parameterized Execution

By now you're familiar with building pipelines, but what about adding parameters? In this video, we'll discuss some built-in runtime variables that exist in Mage and show you how to define your own! We'll also cover how to use these variables to parameterize your pipelines.
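The core idea looks like this: a Mage block receives runtime variables through `**kwargs`, and you use them to change the block's behavior per run. The sketch below is runnable outside Mage, so `execution_date` is passed in explicitly here; inside Mage the orchestrator injects it. The `rides/...` output path is a made-up example.

```python
from datetime import datetime

def export_data(data, **kwargs):
    # Mage-style block: runtime variables arrive in **kwargs.
    execution_date = kwargs["execution_date"]
    # Parameterize the output path by the run's date, so each
    # scheduled run writes to its own partition.
    return f"rides/{execution_date:%Y/%m/%d}/daily.parquet"

print(export_data(None, execution_date=datetime(2024, 1, 15)))
# rides/2024/01/15/daily.parquet
```

Parameterizing by execution date is what makes backfills possible: re-running the pipeline for an old date writes to that date's partition instead of clobbering today's.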

Videos

  • Parameterized Execution

Resources

2.2.7 - 🤖 Deployment (Optional)

In this section, we'll cover deploying Mage using Terraform and Google Cloud. This section is optional: it's not necessary for learning Mage, but it will be helpful if you're interested in creating a fully deployed project. If you're using Mage in your final project, you'll need to deploy it to the cloud.

Videos

  • Deployment Prerequisites
  • Google Cloud Permissions
  • Deploying to Google Cloud

Resources

Additional Mage Guides

2.2.8 - 🧱 Advanced Blocks (Optional)

Our final learning section, on advanced block methods, is also optional. We'll cover dynamic blocks, conditional blocks, replica blocks, and callback blocks. These are advanced topics, but they're also very powerful and can take your pipelines to the next level.
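To give a feel for the most commonly used of these, here's a pure-Python illustration of the *idea* behind a dynamic block: one upstream block emits N items, and the orchestrator runs the downstream block once per item. This is only the conceptual shape, not Mage's actual block API; the source names are invented.

```python
def dynamic_block():
    # Emits one item per data source to process. In Mage, a dynamic
    # block's output fans out into one downstream run per item.
    return [{"source": "green"}, {"source": "yellow"}, {"source": "fhv"}]

def downstream_block(item):
    return f"processed {item['source']}"

# The orchestrator performs this fan-out for you.
results = [downstream_block(item) for item in dynamic_block()]
print(results)  # ['processed green', 'processed yellow', 'processed fhv']
```

The payoff is that the number of downstream runs is decided at runtime, by the data, rather than hard-coded into the pipeline graph.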

Videos

  • Advanced Blocks

Resources

2.2.9 - 🗒️ Homework

We've prepared a brief homework assignment to help you practice what you've learned. Give it a go and feel free to reach out to us on Slack if you have any questions! You can also find the solutions in the solutions section.

Videos

  • Homework Overview

Resources

2.2.10 - 👣 Next Steps

Congratulations! You've completed Week 2 of the Data Engineering Zoomcamp. We hope you've enjoyed learning about Mage and that you're excited to use it in your final project. If you have any questions, feel free to reach out to us on Slack. Be sure to check out our "Next Steps" video for some inspiration for the rest of your journey 😄.

Videos

  • Next Steps

Resources

📑 Additional Resources

✅ Solutions and Examples

If you're looking for the solutions or completed examples from the course, you can take a look at the solutions branch of the course repo.

git checkout solutions

Running docker compose up on the solutions branch will start the container with the solutions loaded. Note: this will overwrite the files in your local repo. Be sure to commit your files to a separate branch if you'd like to save your work.

Navigate to http://localhost:6789 in your browser to see the solutions. Optionally, use tag sorting to group solutions by tag.

Community notes

Did you take notes? You can share them here:

  • 2024 notes
  • 2023 notes
  • 2022 notes

Most of these notes are about Airflow, but you might find them useful.