dbt and Airflow Playground

Provides a Dockerised environment, an initial database schema, and fake data to play around with dbt (data build tool) and Apache Airflow.

A Postgres database container is created by the Docker Compose file.

Database setup is performed on startup by applying database migrations that create the schema and populate it with fake data.

Liquibase is used as the database migration tool; it runs in its own Docker container against the database, starting only once the database reports a healthy state.
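
Once the stack is running, the Liquibase container's output is the place to confirm the migrations applied. A quick check, assuming the service is named liquibase in docker-compose.yml:

# Show the migration output; substitute the actual service name
# from docker-compose.yml if it differs.
docker compose logs liquibase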

dbt

dbt provides functionality to transform data for analysis or consumption. It is generally used for the Transform part of ELT pipelines.

Airflow

Apache Airflow provides batch workflow functionality including scheduling and monitoring. Extract and Load pipelines can be written and run in Airflow to provide the source data to be Transformed by dbt.

Setup

Environment variables

Create a .env file in the project root directory based on the .env_example file.

Generate a secure password that is compatible with connection parameters:

openssl rand -base64 48 | tr -dc 'a-zA-Z0-9' | head -c 64
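
For example, to generate the password and append it to the .env file in one step (DB_PASSWORD is the variable name referenced by the dbt profile below):

# Generate a 64-character alphanumeric password and store it in .env
echo "DB_PASSWORD=$(openssl rand -base64 48 | tr -dc 'a-zA-Z0-9' | head -c 64)" >> .env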

Docker Compose

Set up the Docker environment by first building the images defined in the docker-compose.yml file.

docker compose build

Start up the Docker environment to create the database, run the initial Liquibase migrations, and set up Airflow.

docker compose up -d
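
To verify that the services started and that the database has reached a healthy state:

# List the running services along with their health status
docker compose ps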

Set up dbt

dbt commands can be run within the dbt_playground directory to debug, test and run models.

Create the profile that dbt will use to connect to the source database.

If you haven't installed dbt previously, create a .dbt folder in your home directory and create a profiles.yml file:

cd ~
mkdir .dbt
cd .dbt
touch profiles.yml

Add a profile for the dbt_playground project to the profiles.yml file:

dbt_playground:
  outputs:
    dev:
      dbname: <DB_NAME from your .env file>
      host: localhost
      pass: <DB_PASSWORD from your .env file>
      port: 5432
      schema: dbt
      threads: 1
      type: postgres
      user: <DB_USER from your .env file>
  target: dev

From the learn-dbt project directory, change into the dbt_playground directory:

cd dbt_playground

Test the setup and database connection by running:

dbt debug

Install the dbt dependencies:

dbt deps

Load the seed data:

dbt seed

Run the dbt models:

dbt run
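
To iterate on a single model rather than the whole project, dbt's node selection flag can be used (my_model is a placeholder for one of the playground's model names):

# Run only the named model
dbt run --select my_model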

Build the documentation:

dbt docs generate

View the documentation (dbt docs serve defaults to port 8080, which conflicts with Airflow, so this only works while Airflow isn't running):

dbt docs serve
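
If Airflow is running, pass an alternative port instead:

# Serve the docs on a port that doesn't clash with Airflow's 8080
dbt docs serve --port 8001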

Using the playground

Some examples of using the playground.

Using Airflow

When the docker environment is running, the Airflow UI can be accessed at: http://localhost:8080

Sign in with username: airflow, password: airflow
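
If the UI doesn't load, the webserver's health endpoint is a quick way to check that Airflow is up:

# Returns JSON describing metadatabase and scheduler health
curl http://localhost:8080/health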

Create a database connection for the example ons_postcode_pipeline.py to use:

  • Navigate to Admin > Connections
  • Add a new connection of type postgres
  • Enter the connection_id as "postgres_dag_connection" to match the one set in the ons_postcode_pipeline.py file
  • Use the connection parameters from the project .env file (or create the connection from the command line, as sketched below)
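
The same connection can also be created with the Airflow CLI; a sketch, assuming the webserver service is named airflow-webserver and that the database is reachable as postgres on the Docker network:

# Create the connection used by ons_postcode_pipeline.py.
# Substitute the placeholder values with those from your .env file.
docker compose exec airflow-webserver airflow connections add postgres_dag_connection \
  --conn-type postgres \
  --conn-host postgres \
  --conn-port 5432 \
  --conn-login "<DB_USER>" \
  --conn-password "<DB_PASSWORD>" \
  --conn-schema "<DB_NAME>"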
