-
This is really neat. One comment, though: it looks like this isn't compatible with existing assets? For example, if we had many assets defined using @asset decorators, would these need to be migrated back to @op-based definitions? I know there are also some outstanding GH issues about being able to control execution for ops within a given graph-backed asset itself (for example, running all steps in a single process so that data can be passed around in memory). Given how different the computational demands of an expensive computation, a check, and a promotion can be, this seems like it could be important.
-
Oh great, sorry I missed that. To make my second point clearer, suppose I perform a computation with the result held in memory and want to run a check on it without serializing the result in some way. I can use Dagster's built-in in-memory IO manager, but only if I use a single-process executor. However, that might not be the executor that I want to use across different assets in a given job. This issue already exists with graph-backed assets, but at least from my perspective, it would come up quite often with asset checks.
-
The above examples are for checks that execute in their own Ops. If you're willing to execute the check in the same Op as the asset, blocking is a little easier. Example:

    from dagster import AssetCheckResult, AssetCheckSpec, Output, asset

    @asset(check_specs=[AssetCheckSpec(name='my_check', asset='asset_with_blocking_check_in_same_op')])
    def asset_with_blocking_check_in_same_op():
        # materialize your asset
        yield Output('whatever your asset returns')
        # report the check
        check_success: bool = ...
        yield AssetCheckResult(success=check_success)
        # if the check failed, raise an error to block downstreams
        if not check_success:
            raise AnyExceptionClass()

and modified for the stage-then-promote pattern:

    @asset(check_specs=[AssetCheckSpec(name='my_check', asset='asset_with_blocking_check_in_same_op')])
    def asset_with_blocking_check_in_same_op():
        # stage
        asset_output = 'whatever your asset returns'
        # report the check
        check_success: bool = ...
        yield AssetCheckResult(success=check_success)
        # if the check failed, raise an error to block materialization
        if not check_success:
            raise AnyExceptionClass()
        # otherwise, promote
        yield Output(asset_output)

The downside is that you can't run the check without also running the asset.
-
Would |
-
YES! This is looking neat! I would love to have the same functionality in the |
-
Asset checks are a new experimental Dagster feature. This discussion covers the APIs for blocking materializations with checks.
Where we started
Our initial designs had two types of asset checks:
What we heard:
Now
In response we made a few changes.
Set severity at runtime
We moved severity from @asset_check or AssetCheckSpec (the definition objects) to AssetCheckResult.
Severity doesn't impact control flow
We removed the feature where setting ERROR severity blocked downstream assets. This follows from setting severity at runtime: you have to decide up front whether the downstream assets will wait for the check to complete. By the time the AssetCheckResult is returned, a downstream asset could have already finished.
Custom control flow with @graph_assets
We decided to support user-defined logic for blocking via graph-backed assets. This approach is more verbose but more powerful than the previous WARN vs ERROR option.
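To illustrate why blocking has to be decided before the run rather than from a runtime severity, here is a rough, framework-independent sketch using Python's standard graphlib module. It is not how Dagster builds its execution plan; the step names and the ready_after_asset helper are hypothetical. The point is only that "downstream waits for the check" is an edge in the dependency graph, so it must exist before any step executes.

```python
from graphlib import TopologicalSorter


def dependencies(blocking: bool) -> dict[str, set[str]]:
    """Map each (hypothetical) step to the steps that must finish first."""
    return {
        "upstream_asset": set(),
        "my_check": {"upstream_asset"},
        # The blocking decision is an edge in the graph, fixed before the run.
        "downstream_asset": (
            {"upstream_asset", "my_check"} if blocking else {"upstream_asset"}
        ),
    }


def ready_after_asset(blocking: bool) -> set[str]:
    """Return the steps that may start once upstream_asset has finished."""
    sorter = TopologicalSorter(dependencies(blocking))
    sorter.prepare()
    first = sorter.get_ready()  # only upstream_asset has no dependencies
    sorter.done(*first)
    return set(sorter.get_ready())


if __name__ == "__main__":
    print(ready_after_asset(blocking=False))
    print(ready_after_asset(blocking=True))
```

Without the extra edge, the check and the downstream asset become ready at the same moment, so the downstream asset may finish before the check reports anything. With the edge, only the check is ready.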
Note that these examples use managed inputs and outputs. This isn't a requirement for checks, and we'll follow up with examples that don't use IO managers.
Here's an asset that blocks downstream assets if the check fails:
It's also possible to switch the order, and check before materializing. Here's an example that creates some staged data, runs some tests, and promotes the data to an actual materialization if the check passes.
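As a framework-independent illustration of the stage, check, promote ordering, here is a minimal plain-Python sketch. It does not use Dagster's API: the stage_check_promote function, the CheckFailed exception, the JSON file layout, and the "every row needs an id" rule are all assumptions made for the example.

```python
import json
import os
import tempfile
from pathlib import Path


class CheckFailed(Exception):
    """Raised to stop promotion when the staged data fails its check."""


def stage_check_promote(rows: list, target: Path) -> None:
    # Stage: write next to the target so os.replace stays on one filesystem.
    fd, staged_name = tempfile.mkstemp(dir=target.parent, suffix=".staged")
    staged = Path(staged_name)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(rows, f)
        # Check: hypothetical rule, every staged row needs a non-null "id".
        staged_rows = json.loads(staged.read_text())
        if not all(row.get("id") is not None for row in staged_rows):
            raise CheckFailed("staged data has rows without an id")
        # Promote: swap the staged file into place in one step.
        os.replace(staged, target)
    finally:
        # Remove the staged file if it was not promoted.
        staged.unlink(missing_ok=True)
```

If the check fails, the previous contents of the target are left untouched and the exception propagates, which is the same role the raised error plays in blocking a materialization.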
Future
We'd love to get feedback on this approach.
Once we have more feedback about common blocking patterns, it's likely that we'll introduce a more concise way to define them on @asset_check.