Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update dagster asset check to fail"better" #65

Merged
merged 3 commits into from
Mar 11, 2024

Conversation

cnolanminich
Copy link
Contributor

This PR updates the asset check on the RAW_DATA.users asset in a few ways

  • compares the types to a set (previously it would almost always fail because lists are unordered)
  • forces it to fail by omitting one of the observed companies in the expected companies, and by adding an expected company that is not in observed
  • this one I want feedback on -- I added a markdown data element (see the image below) after following along with some of the cool metadata output types @slopp wrote up in this GitHub Discussion. I wanted to show the "in expected not observed" and "in observed not expected" in some sort of way to show off how powerful the metadata exposures can be, but the data frame --> markdown I did here leaves a lot to be desired.
image

Will pull this out of draft mode once we get some consensus on the metadata output for this

Copy link

github-actions bot commented Mar 8, 2024

Your pull request is automatically being deployed to Dagster Cloud.

Location Status Link Updated
demo_assets View in Cloud Mar 11, 2024 at 06:22 PM (UTC)
snowflake_insights View in Cloud Mar 11, 2024 at 06:22 PM (UTC)
basics View in Cloud Mar 11, 2024 at 06:22 PM (UTC)
data-eng-pipeline View in Cloud Mar 11, 2024 at 06:22 PM (UTC)
batch_enrichment View in Cloud Mar 11, 2024 at 06:22 PM (UTC)

@slopp
Copy link
Contributor

slopp commented Mar 8, 2024

Yea I like the idea of using markdown output metadata in an asset check, but I think this particular data frame is fairly confusing because what you are presenting isn't really a dataframe, the rows aren't meaningful across columns.

I think of this data check more as looking for bad company data, not necessarily checking that every expected company was observed. So maybe we could simplify it as:

MetadataValue.md(f"Observed the following unexpected companies: {list(unique_companies - expected_companies)}") 

@cnolanminich
Copy link
Contributor Author

Oh yeah I agree! I'll push that change and then this one will be in good shape

@cnolanminich
Copy link
Contributor Author

Fixed!
image

@cnolanminich cnolanminich marked this pull request as ready for review March 11, 2024 14:09
Copy link
Contributor

@slopp slopp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one comment on removing what is now dead code

hooli_data_eng/assets/raw_data/__init__.py Outdated Show resolved Hide resolved
@cnolanminich cnolanminich merged commit 88c4300 into master Mar 11, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants