This is a demo project to create an ELT pipeline using Airbyte, dbt, Snowflake and Dagster.
Accompanying presentation here.

- Create a new Snowflake account here.
- Run the SQL code `snowflake_airbyte_setup.sql` to configure an Airbyte account in Snowflake.
- Export the following environment variables:
  - `export AIRBYTE_PASSWORD=your_snowflake_password_for_airbyte_account`
  - `export SNOWFLAKE_ACCOUNT_ID=your_snowflake_account_id`
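The rest of the pipeline reads these credentials from the environment, so it is worth confirming they are actually set before continuing. A minimal sanity check (only the variable names above come from this setup; how the repo itself consumes them is not shown here):

```python
import os

# Fail fast if the credentials the pipeline expects are missing.
for var in ("AIRBYTE_PASSWORD", "SNOWFLAKE_ACCOUNT_ID"):
    if not os.getenv(var):
        raise SystemExit(f"environment variable {var} is not set")
print("Snowflake credentials found in the environment")
```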
- Install the Python dependencies: `pip install -r requirements.txt`
- Create the mock source database:
  - Install PostgreSQL.
  - Create a new database on your localhost called `dvdrental`.
  - Unzip `dvdrental.zip`.
  - Use PgAdmin4 to restore `dvdrental`.
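If you want to confirm the restore worked without opening PgAdmin4, you can query the database directly. A small sketch using `psycopg2`; the connection credentials are assumptions, so adjust them to your local PostgreSQL setup:

```python
import psycopg2

# Connect to the restored sample database on localhost.
# user/password are placeholders; use whatever your local Postgres requires.
conn = psycopg2.connect(
    dbname="dvdrental",
    host="localhost",
    user="postgres",
    password="postgres",
)
with conn, conn.cursor() as cur:
    # The dvdrental sample database ships with a `customer` table.
    cur.execute("SELECT count(*) FROM customer")
    print("customer rows:", cur.fetchone()[0])
conn.close()
```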
- In Airbyte, create a source for the PostgreSQL database `dvdrental`:
  - host: `host.docker.internal`
- Create a destination for the Snowflake database.
- Create a connection between `dvdrental` and `snowflake`:
  - Namespace Custom Format: `<your_destination_schema>`
- Run the sync job.
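The sync can also be triggered outside the Airbyte UI. The sketch below is an assumption-heavy illustration of calling Airbyte's connection-sync endpoint on a local deployment: the base URL, port, any authentication, and the connection ID all depend on your Airbyte version and setup, so treat every value as a placeholder.

```python
import requests

# Placeholders: check your Airbyte deployment for the actual API base URL
# and the connection ID of the dvdrental -> Snowflake connection.
AIRBYTE_API = "http://localhost:8000/api/v1"
CONNECTION_ID = "<your_connection_id>"

resp = requests.post(
    f"{AIRBYTE_API}/connections/sync",
    json={"connectionId": CONNECTION_ID},
)
resp.raise_for_status()
print(resp.json())
```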
- Log in to your Snowflake account.
- Go to `worksheets` > `+ worksheet`.
- On the top right, select the role `ACCOUNTADMIN.AIRBYTE_WAREHOUSE`.
- On the top left of the worksheet, select `AIRBYTE_DATABASE.AIRBYTE_SCHEMA`.
- Query one of the tables synced from Airbyte, e.g. `select * from customer`.
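The same check can be run from Python with the Snowflake connector, reusing the environment variables exported earlier. This is only a sketch: the user name (`AIRBYTE_USER` below) is an assumption; use whichever Airbyte user `snowflake_airbyte_setup.sql` created in your account.

```python
import os

import snowflake.connector

# The user name is an assumption; use the Airbyte user created by
# snowflake_airbyte_setup.sql in your account.
conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT_ID"],
    user="AIRBYTE_USER",
    password=os.environ["AIRBYTE_PASSWORD"],
    warehouse="AIRBYTE_WAREHOUSE",
    database="AIRBYTE_DATABASE",
    schema="AIRBYTE_SCHEMA",
)
cur = conn.cursor()
cur.execute("select * from customer limit 5")
for row in cur.fetchall():
    print(row)
cur.close()
conn.close()
```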
- `cd` to `transform/dw`.
- Execute the commands `dbt docs generate` and `dbt docs serve` to create the dbt docs and view the lineage graph.
- Execute the command `dbt build` to run and test the dbt models.
- `cd` to `orchestrate/elt`.
- Execute the command `dagit` to launch the Dagit user interface.
- Go to `workspace` > `jobs` > `elt_job` > `launchpad` > `launch run` to launch the run.
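For reference, here is a stripped-down sketch of how a Dagster job chaining the two phases can be wired. It is not the repo's actual `elt_job` in `orchestrate/elt` (which presumably triggers the Airbyte sync and runs the dbt project for real); it only illustrates the op/job structure that Dagit loads and launches.

```python
import subprocess

from dagster import job, op


@op
def airbyte_sync_done():
    # Placeholder for the extract/load phase: the real project would trigger
    # the Airbyte connection configured above here.
    return True


@op
def dbt_build(upstream):
    # Transform phase: run and test the dbt models, same as `dbt build` above.
    subprocess.run(["dbt", "build"], cwd="transform/dw", check=True)


@job
def elt_job():
    dbt_build(airbyte_sync_done())
```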