-
Thank you for this write-up! The factory you propose is similar to something we considered initially when designing checks. We liked how easy it was to add checks across your assets, but we were worried about parameterization. The current factory pattern is maximally flexible: you build each check from whatever input you choose. With the other approach we'd have to provide the APIs to parameterize checks, and those could get more complicated as we improve checks (e.g. by adding support for checks on multiple assets). I wonder if there's a way to achieve some of the ergonomics you want with the current approach. Re:
I'm curious how you'd feel about something like this:

    import pandas as pd
    from dagster import (
        AssetCheckResult,
        AssetsDefinition,
        Definitions,
        asset,
        asset_check,
        load_asset_checks_from_current_module,
        load_assets_from_current_module,
    )

    def create_is_key_unique_check(target_asset: AssetsDefinition, column_to_validate: str):
        @asset_check(asset=target_asset, name=f"{column_to_validate}_is_unique")
        def is_key_unique(asset_to_check: pd.DataFrame):
            # Count rows whose value in the validated column repeats an earlier row.
            num_duplicates = asset_to_check.duplicated(subset=[column_to_validate]).values.sum()
            return AssetCheckResult(passed=bool(num_duplicates == 0))

        return is_key_unique

where you'd use it next to each asset:

    @asset
    def my_asset(): ...

    check_my_asset_id = create_is_key_unique_check(target_asset=my_asset, column_to_validate='id')

    @asset
    def my_other_asset(): ...

    check_my_other_asset_id = create_is_key_unique_check(target_asset=my_other_asset, column_to_validate='id')
    check_my_other_asset_other_column = create_is_key_unique_check(target_asset=my_other_asset, column_to_validate='other_column')

and load both of these with

    defs = Definitions(
        assets=load_assets_from_current_module(),
        asset_checks=load_asset_checks_from_current_module(),
    )
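If writing one assignment per asset and column feels repetitive, the factory calls could be driven from a single mapping. The following is only a library-free sketch of that idea, not Dagster code: the factory here is a plain-Python stand-in that takes an asset name and a list of row dicts, and `COLUMNS_TO_VALIDATE` is a hypothetical name introduced for illustration.

```python
from typing import Callable

Row = dict[str, object]


def create_is_key_unique_check(asset_name: str, column_to_validate: str) -> Callable[[list[Row]], bool]:
    """Stand-in factory (not the Dagster one): returns a check closed over one column."""

    def is_key_unique(rows: list[Row]) -> bool:
        values = [row[column_to_validate] for row in rows]
        # The check passes only when no value appears more than once.
        return len(values) == len(set(values))

    # Give each generated check a distinct, descriptive name.
    is_key_unique.__name__ = f"{asset_name}_{column_to_validate}_is_unique"
    return is_key_unique


# Hypothetical mapping: asset name -> columns that must be unique.
COLUMNS_TO_VALIDATE = {
    "my_asset": ["id"],
    "my_other_asset": ["id", "other_column"],
}

# One check per (asset, column) pair, generated instead of hand-written.
checks = [
    create_is_key_unique_check(asset_name, column)
    for asset_name, columns in COLUMNS_TO_VALIDATE.items()
    for column in columns
]
check_names = [check.__name__ for check in checks]
```

The trade-off is the one raised elsewhere in this thread: the mapping lives in one place rather than next to each asset.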
-
I found it hard to understand why the asset check feature is so limiting.
In my mind, most (90%) of the checks are similar across all assets, and then you create a few that target something unique that you want to validate for a specific asset. In general, once I build an asset check I'll probably apply it to all my assets: for example nulls, duplicates, difference from the previous value, quantity exceeds xyz, increase from a median value, etc.
It would be nice if we could configure fields close to the data, in our @asset decorator, that would be passed to the asset checks. What I mean is that, instead of generating a list of things and providing that list to a factory like the one described here, or using the factory pattern defined in the docs, we could configure the list of checks with the necessary parameters in the asset decorator.
It could be something like a dict where each key is the name of the asset check to run and each value is the parameter to look at.
This would be similar to the feature Dagster already has called "partition_expr", where you define the field used for partitioning in the metadata dict.
For most of the use cases I have found for the asset check feature, configuring the checks directly in the asset function doesn't really make sense, as the function will grow larger and larger each time I want to validate something.
https://docs.dagster.io/concepts/assets/asset-checks#checks-that-execute-in-the-same-op-that-materializes-the-asset
Lastly, the factory solution works but isn't really scalable IMO, since we would end up with a big list that we need to keep around and manage instead of specifying the checks at the asset level.
If I have 100 assets that I want to validate for nulls, I would need to generate a list of 100 elements in check_blobs. Configuring the different elements closer to the asset would make more sense IMO.
In those configs, the only things that change are the asset name, which could be taken directly from the name of the function decorated with @asset, and the SQL query. In that example, the table and the where condition are different. But since we can leverage the I/O manager instead of writing the query directly, we could replace the SQL query with a parameter, in this case the names of the columns, and pass that to the @asset_check function.
I hope my proposal was mostly clear. Don't hesitate to ask questions or to point me to an already available feature that I missed.
Have a great day.