An open source repository for pulling Canvas Data 2 data into a dbt project.
This project assumes that you will use Google Cloud Storage (GCS) for external file storage and BigQuery for the database layer. Before beginning, you will need to:
- create a Google Cloud project
- create a service account and download a credentials file
- set up the gcloud CLI
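
The prerequisites above can be sketched with the gcloud CLI roughly as follows. The project and service-account names here are illustrative assumptions, not values this repo requires:

```shell
# Hypothetical project and service-account names -- substitute your own.
PROJECT_ID="my-canvas-data-project"
SA_NAME="canvasdata2-loader"

# Guarded so the sketch is harmless on machines without gcloud installed.
if command -v gcloud >/dev/null; then
  gcloud projects create "$PROJECT_ID"
  gcloud config set project "$PROJECT_ID"
  gcloud iam service-accounts create "$SA_NAME"
  # Download a JSON credentials file for the service account.
  gcloud iam service-accounts keys create credentials.json \
    --iam-account="$SA_NAME@$PROJECT_ID.iam.gserviceaccount.com"
fi
```

Keep the downloaded `credentials.json` outside version control.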
Once you have your Google Cloud environment set up, you can:
- Create a GCS bucket. Within your bucket, create a folder called `canvasdata2_api`. Grant `Storage Legacy Bucket Reader` and `Storage Object Admin` access to your service account.
- Set all environment variables and save them to `.env`. See the `.env-sample` file for a template. DO NOT COMMIT SENSITIVE DATA TO GIT.
- Install a Python virtual environment using `poetry install`.
- Install dbt packages using `dbt deps`.
- There is a "chicken and egg" problem when using Dagster to view dbt assets for the first time, since the dbt assets rely on Canvas data that has not been created yet. To load some initial data, run `DAGSTER_DBT_PARSE_PROJECT_ON_LOAD=1 dagster dev -f dagster/repository.py` and execute the `all_canvasdata2_job` job.
- Once you have successfully loaded your first Canvas data, exit the Dagster webserver and build your dbt project using `dbt run-operation stage_external_sources` and then `dbt build`.
- Enter the Dagster webserver again (`dagster dev -f dagster/repository.py`), and you should now see your assets.
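
The `.env-sample` file in the repo is the authoritative template for the environment-variables step. As a rough sketch only, a filled-in `.env` might look like the following; every key name and value here is an illustrative assumption, not the project's actual schema:

```shell
# Hypothetical .env layout -- key names are assumptions for illustration;
# copy the real keys from .env-sample instead.
GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json  # service account key file
GCS_BUCKET=my-canvasdata2-bucket                          # bucket with the canvasdata2_api folder
CANVAS_API_CLIENT_ID=changeme                             # Canvas Data 2 API credential
CANVAS_API_CLIENT_SECRET=changeme                         # Canvas Data 2 API credential
```

As the steps above stress, this file holds secrets and must never be committed to git.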