Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use dbt package for tests and reload manifest #66

Merged
merged 9 commits into from
Mar 19, 2024

Conversation

cnolanminich
Copy link
Contributor

This PR does 2 things related to our use of dbt in the hooli project:

  • demonstrates how to use a test from dbt-expectations. I chose a test that is slightly silly, but I think the idea of showing a text-based comparison is powerful (like, for websites / names / other columns folks might want to test on).
  • I used the examples in the Dagster & dbt university course (Lesson 4, "speeding up the development cycle) to re-parse the dbt project when the definitions are reloaded. I left make manifest in because we will still need it for deployment for production

One note: I tried to use the new experimental DbtArtifacts class from 1.6.9 but it did not work -- it looks like it will need to expose all options of the DbtCliResource (right now I don't see a good way to set the profiles-dir argument in that class). Will post about it and see if that's expected.

The code that didn't work looked like this:

dbt_artifacts = DbtArtifacts(
    project_dir=DBT_PROJECT_DIR,
    #profiles_dir=DBT_PROFILES_DIR, #does not work is not a positional arg
    #target="BRANCH",  # does not work / is not a positional arg
    prepare_command=["--quiet","parse","--target BRANCH"],
)
DBT_MANIFEST = dbt_artifacts.manifest_path

@cnolanminich cnolanminich requested review from slopp and izzye84 March 8, 2024 20:50
Copy link

github-actions bot commented Mar 8, 2024

Your pull request is automatically being deployed to Dagster Cloud.

Location Status Link Updated
demo_assets View in Cloud Mar 19, 2024 at 03:44 PM (UTC)
snowflake_insights View in Cloud Mar 19, 2024 at 03:44 PM (UTC)
basics View in Cloud Mar 19, 2024 at 03:44 PM (UTC)
data-eng-pipeline View in Cloud Mar 19, 2024 at 03:44 PM (UTC)
batch_enrichment View in Cloud Mar 19, 2024 at 03:44 PM (UTC)

@slopp
Copy link
Contributor

slopp commented Mar 8, 2024

The name of this asset check is kind of a bummer:

Screen Shot 2024-03-08 at 2 50 50 PM

Do you know what it looks like if it fails?

@@ -64,6 +64,12 @@ def get_env():
# Similar to having different dbt targets, here we create the resource
# configuration by environment

dbt_resource = DbtCliResource(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little confusing to me, this resource is only used in the local case right? And only for manifest generation? Whereas the other resources here, including the other use of dbt, are all specific to local, branch, or prod.

I'd maybe consider moving this to within the if block you added

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great callout, wasn't satisfied with this overall (I was following the examples in Dagster & dbt for consistency, but what I actually wanted was to use the DbtArtifact class I mentioned in the comments. With an assist in Slack I was able to get that up and running 🎉 ,so this code is gone now

@@ -24,6 +24,7 @@
)
from dagster_dbt.asset_decorator import dbt_assets
from dagster._utils import file_relative_path
from ..resources import dbt_resource
Copy link
Contributor

@slopp slopp Mar 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would generally prefer absolute reference paths

hooli_data_eng.resources

or whatever

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code is gone now but 👍 for the future

@cnolanminich
Copy link
Contributor Author

The name of this asset check is kind of a bummer:

I definitely agree -- this is a dbt thing, so in the spirit of matching what you would see this test called in dbt I wasn't sure if we should change it. I'm open to a re-naming it, especially for demo purposes.

Do you know what it looks like if it fails?

Yeah, it's relatively uninformative tbh -- right now it just tells you it fails. If we want to get a bit more fancy, we could likely read in the run_results.json along the lines that @izzye84 did in #63 but to read in the test results to show how many rows failed. It won't show any more than that, since that's all dbt gives us, but it's certainly better than what is currently here

@cnolanminich
Copy link
Contributor Author

added the column schema info (pairs nicely with this PR as it adds a dbt package dagster to emit the schema.

Taken from here: https://docs.dagster.io/integrations/dbt/reference#emit-column-schema-as-materialization-metadata-

It just works! Will want to see it in Snowflake as well as duckdb

image

- dbt_expectations.expect_column_values_to_match_like_pattern_list:
like_pattern_list: ["%Sport%","%Co%","%Ltd%", "%Shop%"]
match_on: any # (Optional. Default is 'any', which applies an 'OR' for each pattern. If 'all', it applies an 'AND' for each regex.)
alias: example_alias
Copy link
Collaborator

@izzye84 izzye84 Mar 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it the alias didn't work? (I wasn't sure if it would with tests)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have uncommitted changes for how to do it with dbt models and asset names, but yeah it looks like it would require changes to the Dagster-dbt library for test result aliases.

Same story for test result metadata, though I think we could get it by accessing the raw_event_stream and then appending the output to the AssetCheckResult

@slopp
Copy link
Contributor

slopp commented Mar 16, 2024 via email

@cnolanminich cnolanminich merged commit f0a83d4 into master Mar 19, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants