Asset checks: severity and blocking materializations #16569

johannkm · 2023-09-15T20:16:42Z

Asset checks are a new experimental Dagster feature. This discussion covers the APIs for blocking materializations with checks.

Where we started

Our initial designs had two types of asset checks:

checks that raised warnings, and didn't impact control flow
checks that raised errors, and blocked downstream assets from running.

@asset_check(
    asset=orders,
    severity=AssetCheckSeverity.ERROR, # or WARN
)
def orders_id_has_no_nulls(context):
    return AssetCheckResult(success=True)

What we heard:

I don't want to define severity up front. E.g. I have a threshold value for warn, and a higher threshold for error
I want to set error severity without necessarily blocking downstream assets
I want to use checks to block the materialization of the asset itself, not only downstream assets

Now

In response we made a few changes.

Set severity at runtime
We moved severity from @asset_check or AssetCheckSpec (the definition objects) to AssetCheckResult.

@asset_check(asset=my_asset)
def always_pass_check():
    return AssetCheckResult(
        success=True,
        severity=AssetCheckSeverity.WARN
    )
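To make the "threshold for warn, higher threshold for error" feedback concrete, here is a minimal sketch of choosing severity at runtime. It is plain Python, not Dagster: `Severity` and `CheckResult` are hypothetical stand-ins for AssetCheckSeverity and AssetCheckResult, and the `null_rate_check` function and its threshold values are assumptions made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    """Hypothetical stand-in for AssetCheckSeverity."""
    WARN = "WARN"
    ERROR = "ERROR"


@dataclass
class CheckResult:
    """Hypothetical stand-in for AssetCheckResult."""
    success: bool
    severity: Severity


def null_rate_check(null_count: int, row_count: int,
                    warn_at: float = 0.01, error_at: float = 0.05) -> CheckResult:
    """Pick the severity from the observed null rate (thresholds are illustrative)."""
    rate = null_count / row_count if row_count else 0.0
    if rate >= error_at:
        return CheckResult(success=False, severity=Severity.ERROR)
    if rate >= warn_at:
        return CheckResult(success=False, severity=Severity.WARN)
    return CheckResult(success=True, severity=Severity.WARN)
```

The same check body can report WARN or ERROR depending on the data it sees, which a severity fixed on the definition could not express.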

Severity doesn't impact control flow
We removed the behavior where setting ERROR severity blocked downstream assets. This follows from setting severity at runtime: whether downstream assets wait for a check to complete has to be decided up front, and by the time the AssetCheckResult is returned, a downstream asset could already have finished.

Custom control flow with @graph_asset
We decided to support user-defined logic for blocking via graph-backed assets. This approach is more verbose but more powerful than the previous WARN vs ERROR option.

Note that these examples use managed inputs and outputs. This isn't a requirement for checks, and we'll follow up with examples that don't use IO managers.

Here's an asset that blocks downstream assets if the check fails:

@op
def get_data_op():
    """
    Do the actual work to materialize
    """
    return "foobar"


@op(out={"result": Out(), "blocking_graph_asset_random_fail_check": Out()})
def blocking_check_op(data):
    """
    Check the materialized data. We pass data through this Op so that downstream
    assets will wait for the check to complete before executing.
    """
    yield Output(data)

    success = ...
    yield AssetCheckResult(success=success) # We currently need to yield here, but may simplify this in the future

    if not success:
        raise Exception("The check failed, so block downstreams!")


@graph_asset(
    group_name="asset_checks",
    check_specs=[
        AssetCheckSpec(
            name="random_fail_check",
            asset="blocking_graph_asset",
        )
    ],
)
def blocking_graph_asset():
    """
    An asset that materializes data and then runs a check on it. If the check fails, it will raise an
    exception so that downstreams don't execute.
    """
    data = get_data_op()
    return blocking_check_op(data)
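The ordering in the example above (emit the output, report the check, then raise) can be sketched without Dagster. This is only a toy model of that order of events, not Dagster's execution engine: `run_steps`, `blocking_asset`, and `downstream_asset` are hypothetical names, and the check is hard-coded to fail.

```python
from typing import Callable, List


def run_steps(steps: List[Callable[[], None]]) -> List[str]:
    """Run steps in order and stop at the first one that raises."""
    log: List[str] = []
    for step in steps:
        try:
            step()
            log.append(f"{step.__name__}: ok")
        except Exception as exc:
            log.append(f"{step.__name__}: failed ({exc})")
            break  # later (downstream) steps never start
    return log


events: List[str] = []


def blocking_asset() -> None:
    events.append("materialized")               # mirrors `yield Output(data)`
    success = False                             # pretend the check failed
    events.append(f"check success={success}")   # mirrors `yield AssetCheckResult(...)`
    if not success:
        raise Exception("The check failed, so block downstreams!")


def downstream_asset() -> None:
    events.append("downstream ran")


log = run_steps([blocking_asset, downstream_asset])
```

In this model the check result is still recorded before the exception is raised, and the downstream step never runs.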

It's also possible to switch the order, and check before materializing. Here's an example that creates some staged data, runs some tests, and promotes the data to an actual materialization if the check passes.

@op
def create_staged_data():
    return "foo"


@op
def test_staged_data(staged_data):
    result = AssetCheckResult(success=...,)
    yield result
    if not result.success:
        raise Exception("Raising an exception to block promotion.")


@op(ins={"staged_asset": In(), "check_result": In(Nothing)})
def promote_staged_asset(staged_asset):
    return staged_asset


@graph_asset(
    group_name="asset_checks",
    check_specs=[
        AssetCheckSpec("random_fail_and_raise_check", asset="stage_then_promote_graph_asset")
    ],
)
def stage_then_promote_graph_asset():
    staged_data = create_staged_data()
    check_result = test_staged_data(staged_data)
    return {
        "result": promote_staged_asset(staged_data, check_result),
        "stage_then_promote_graph_asset_random_fail_and_raise_check": check_result,
    }
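Stripped of the Dagster wiring, the stage-then-promote pattern looks roughly like the sketch below. It assumes a dict as the storage layer; `stage_then_promote`, the `__staged` key suffix, and the example checks are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict


def stage_then_promote(store: Dict[str, str], key: str, new_value: str,
                       check: Callable[[str], bool]) -> None:
    """Write to a staging key, test it, and only then overwrite the real key."""
    staging_key = f"{key}__staged"
    store[staging_key] = new_value              # create staged data
    try:
        if not check(store[staging_key]):       # test the staged data
            raise ValueError("Raising an exception to block promotion.")
        store[key] = store[staging_key]         # promote
    finally:
        store.pop(staging_key, None)            # always clean up the staged copy


store = {"orders": "v1"}
stage_then_promote(store, "orders", "v2", check=lambda v: v.startswith("v"))

try:
    stage_then_promote(store, "orders", "", check=lambda v: bool(v))
    promotion_blocked = False
except ValueError:
    promotion_blocked = True
```

When the check fails, the previously promoted value stays in place, which is the property that distinguishes this pattern from checking after materialization.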

Future

We'd love to get feedback on this approach.

Once we have more feedback about common blocking patterns, it's likely that we'll introduce a more concise way to define them on @asset_check.

erinov1 · 2023-09-22T16:28:12Z

This is really neat. One comment, though: it looks like this isn't compatible with existing assets? For example, if we had many assets defined using @-asset decorators, would these need to be migrated back to @op-based definitions?

I know there are also some outstanding GH issues about being able to control execution for ops within a given graph-backed asset itself (for example running all steps in a single process so that data can be passed around in-memory). Given the heterogeneity of the computational demands of an expensive computation vs. check vs. promotion, this seems like it could be important.

1 reply

johannkm Sep 22, 2023
Author

Here's a prototype of a workaround for that: #16612. We can add a factory method that turns your @-asset definitions into graph assets with a blocking check.

erinov1 · 2023-09-22T16:38:55Z

> Here's a prototype of a workaround for that: #16612. We can add a factory method that turns your @-asset definitions into graph assets with a blocking check

Oh great, sorry I missed that.

To make my second point clearer, suppose I perform a computation with the result held in memory and want to run a check on it without serializing the result in some way. I can use Dagster's built-in in-memory IO manager, but only if I use a single-process executor. However, that might not be the executor that I want to use across different assets in a given job. This issue already exists with graph-backed assets, but at least from my perspective, it would come up quite often with asset checks.

1 reply

johannkm Sep 25, 2023
Author

When you want to hold the result in memory, the easiest option is to compute the asset and the check in the same Op. I shared examples of writing blocking checks this way in the comment below: #16569 (comment).

johannkm · 2023-09-25T20:37:34Z

johannkm
Sep 25, 2023
Author

The above examples are for checks that execute in their own Ops. If you're willing to execute the check in the same Op as the asset, blocking is a little easier. Example:

@asset(check_specs=[AssetCheckSpec(name='my_check', asset='asset_with_blocking_check_in_same_op')])
def asset_with_blocking_check_in_same_op():
    # materialize your asset
    yield Output('whatever your asset returns')

    # report the check
    check_success: bool = ...
    yield AssetCheckResult(success=check_success)

    # if the check failed, raise an error to block downstreams
    if not check_success:
        raise AnyExceptionClass()

and modified for the stage-then-promote pattern:

@asset(check_specs=[AssetCheckSpec(name='my_check', asset='asset_with_blocking_check_in_same_op')])
def asset_with_blocking_check_in_same_op():
    # stage
    asset_output = 'whatever your asset returns'

    # report the check
    check_success: bool = ...
    yield AssetCheckResult(success=check_success)

    # if the check failed, raise an error to block materialization
    if not check_success:
        raise AnyExceptionClass()
        
    # otherwise, promote
    yield Output(asset_output)

The downside is that you can't run the check without also running the asset.

0 replies

danielgafni · 2023-09-26T13:02:31Z

danielgafni
Sep 26, 2023
Collaborator

Would AssetCheckSeverity interact with AutoMaterializePolicy in the same way? E.g. failed ERROR being blocking and failed WARNING being non-blocking?

3 replies

johannkm Sep 26, 2023
Author

Severity no longer impacts blocking behavior; it's currently replaced with the graph asset approach above. ~~AutoMaterializePolicys will work with the graph asset approach.~~

Edit: AMPs do not currently support blocking checks. See #17332.

danielgafni Sep 26, 2023
Collaborator

Oh ok! This works fine for me

danielgafni Sep 26, 2023
Collaborator

Hmm checking right now in Dagster Cloud with a long-running @asset_check.
Seems like the downstream assets are not being automatically materialized.

mgierada · 2023-12-05T08:09:37Z

YES! This is looking neat! I would love to have the same functionality in the @dbt_asset.

0 replies
