Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(asset-checks): allow asset check subsetting #16672

Merged
merged 3 commits into from
Sep 27, 2023

Conversation

rexledesma
Copy link
Contributor

@rexledesma rexledesma commented Sep 20, 2023

Summary & Motivation

Build on #16610 and #16219 to enable asset check subselection. This change was painful and a doozy.

To wrangle this complexity in the future we need to:

  • Encompass a succinct and interpretable version of XXXSelection to seamlessly enable the asset selection and asset check selection use case
  • Standardize our usage of None, [], {}, when indicating that a selection is defaulting to select all the objects (assets and asset checks).
    • Rather than using None, an explicit sentinel value should be used (e.g. XXXSelection.ALL, so that the value indicates its usage. Likewise, [] and {} should be replaced with XXXSelection.EMPTY).

How I Tested These Changes

  • When materializing a subsetted asset, there are four cases:
    • Materialize without a selection
    • Materialize with a selection of assets and checks
    • Materialize with only a selection of assets
    • Materialize with only a selection of checks
  • Remove any usages of AssetSelection that break the new subsetting invariants (e.g. Add a checks only job to toys #16638)
  • Existing pytest
  • Stack on dbt implementation

@rexledesma
Copy link
Contributor Author

rexledesma commented Sep 20, 2023

@rexledesma rexledesma force-pushed the rl/allow-subsetted-asset-checks branch 6 times, most recently from cf5a9fa to c2e5215 Compare September 21, 2023 01:34
@rexledesma rexledesma requested review from schrockn and johannkm and removed request for schrockn September 21, 2023 01:39
@rexledesma rexledesma marked this pull request as ready for review September 21, 2023 01:39
@rexledesma rexledesma requested a review from schrockn September 21, 2023 01:43
@rexledesma rexledesma self-assigned this Sep 21, 2023
@johannkm johannkm requested a review from sryza September 21, 2023 02:06
@@ -254,18 +257,25 @@ def __init__(
backfill_policy, "backfill_policy", BackfillPolicy
)

if selected_asset_keys is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

context to this change for Nick and Sandy: currently for multi assets we just filter asset checks down to the ones that target selected_asset_keys. That's still the default because AssetSelection includes checks for assets by default, but it will be passed in over selected_asset_check_handles now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be passed in over selected_asset_check_handles now

What precisely do you mean by this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ben is going to start sending explicit check selections from the front end, instead of defaulting to None

Copy link
Contributor

@johannkm johannkm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tysm for putting this together. Let's get a look from @schrockn or @sryza then merge

@rexledesma rexledesma force-pushed the rl/allow-subsetted-asset-checks branch from c2e5215 to 1d634c7 Compare September 21, 2023 20:19
@@ -794,21 +807,25 @@ def build_asset_selection_job(
build_source_asset_observation_job,
)

included_assets = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little more verbose, but I think it would make this code more bug-proof to explicitly setting each of these in each of the branches below, instead of setting them up here and overwriting them below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the most bulletproof thing here is to make a NamedTuple with these properties.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then you can also extract out functions that return that object as appopriate

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took Sandy's suggestion, I think it's a reasonable improvement. I think we just need to prioritize removing the special handling for None

python_modules/dagster/dagster/_core/definitions/assets.py Outdated Show resolved Hide resolved

@public
@property
def check_handles(self) -> AbstractSet[Tuple[AssetKey, str]]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should expose AssetCheckHandle publicly rather than tuples. Also, I don't think we need both a check_handles and an asset_check_handles property.

Copy link
Member

@schrockn schrockn Sep 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I pushed back on a type since I didn't want to intoroduce a new concept, "handles", to the public API. I think we should keep that nomenclature internal. However a public property called check_handles does the same thing.

How about we do selected_check_specs and return a list of AssetCheckSpec?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel strongly about the name at all, but I feel pretty strongly that we should expose a type for this. By the fact that a check is globally identifiable by its (asset key, check name), the concept is inherently present in the ontology. Hiding it will just require a set of contortions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any other analogous concepts? Searching for a name here.

It is our first instance of an asset-scoped identifier, although PartitionKey might count.

What about AssetCheckKey? This stays consistent with the naming that a "Key" is a globally unique identifier for a spec in this system.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like AssetCheckKey.

We use AssetKeyPartitionKey internally, which would correspond to something like AssetKeyCheckName, though a bit of a mouthful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

key + check name are guaranteed unique across checks. We concat them to create Op names which is where you can hit a conflict

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still confused. Asset keys should be globally unique within a deployment. Can you provide a concrete example of a collision?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • asset_key = "a", check_name = "b_c"
  • asset_key = "a_b", check_name = "c"

Copy link
Member

@schrockn schrockn Sep 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should mangle the name at least a little bit to make that exceeding unlikely.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll pick this conversation back up in a thread

@github-actions
Copy link

github-actions bot commented Sep 22, 2023

Deploy preview for dagster-university ready!

✅ Preview
https://dagster-university-ijbw93eps-elementl.vercel.app
https://rl-allow-subsetted-asset-checks.dagster-university.dagster-docs.io

Built with commit d64839e.
This pull request is being automatically deployed with vercel-action

@github-actions
Copy link

github-actions bot commented Sep 22, 2023

Deploy preview for dagit-storybook ready!

✅ Preview
https://dagit-storybook-bimn05r5g-elementl.vercel.app
https://rl-allow-subsetted-asset-checks.components-storybook.dagster-docs.io

Built with commit 0d3830a.
This pull request is being automatically deployed with vercel-action

Copy link
Member

@schrockn schrockn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly about the "handles" question

@@ -794,21 +807,25 @@ def build_asset_selection_job(
build_source_asset_observation_job,
)

included_assets = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the most bulletproof thing here is to make a NamedTuple with these properties.

@@ -794,21 +807,25 @@ def build_asset_selection_job(
build_source_asset_observation_job,
)

included_assets = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then you can also extract out functions that return that object as appopriate

@@ -254,18 +257,25 @@ def __init__(
backfill_policy, "backfill_policy", BackfillPolicy
)

if selected_asset_keys is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be passed in over selected_asset_check_handles now

What precisely do you mean by this?


@public
@property
def check_handles(self) -> AbstractSet[Tuple[AssetKey, str]]:
Copy link
Member

@schrockn schrockn Sep 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I pushed back on a type since I didn't want to intoroduce a new concept, "handles", to the public API. I think we should keep that nomenclature internal. However a public property called check_handles does the same thing.

How about we do selected_check_specs and return a list of AssetCheckSpec?

@rexledesma rexledesma force-pushed the rl/allow-subsetted-asset-checks branch from db87ca0 to 13dfb5f Compare September 25, 2023 13:20
@github-actions
Copy link

github-actions bot commented Sep 25, 2023

Deploy preview for dagster-docs ready!

Preview available at https://dagster-docs-bb31ie9cm-elementl.vercel.app
https://rl-allow-subsetted-asset-checks.dagster.dagster-docs.io

Direct link to changed pages:

@rexledesma rexledesma force-pushed the rl/allow-subsetted-asset-checks branch from 13dfb5f to ea29644 Compare September 25, 2023 13:28
@johannkm johannkm force-pushed the rl/allow-subsetted-asset-checks branch from ea29644 to 0d3830a Compare September 26, 2023 03:55
johannkm added a commit that referenced this pull request Sep 26, 2023
@johannkm johannkm force-pushed the rl/allow-subsetted-asset-checks branch 2 times, most recently from f4d0c7e to 12b4e44 Compare September 26, 2023 17:19
@johannkm johannkm assigned johannkm and unassigned rexledesma Sep 26, 2023
@johannkm johannkm force-pushed the rl/allow-subsetted-asset-checks branch 8 times, most recently from dc2c0f5 to 6770d5d Compare September 26, 2023 20:16
@johannkm johannkm requested review from schrockn and sryza September 26, 2023 20:16
@johannkm
Copy link
Contributor

Updated to use AssetCheckKeys, and cleaned up some of the branching/added comments

@rexledesma rexledesma force-pushed the rl/allow-subsetted-asset-checks branch from 6770d5d to d405c22 Compare September 26, 2023 22:08
Comment on lines +1112 to +1118
def subset_for(
self,
selected_asset_keys: AbstractSet[AssetKey],
selected_asset_check_keys: Optional[AbstractSet[AssetCheckKey]],
) -> "AssetsDefinition":
"""Create a subset of this AssetsDefinition that will only materialize the assets and checks
in the selected set.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While outside the scope of this PR, and we need to move forward, this function definitely makes me think we need a unified concept of some sort of "keyed entity" (spec?) in the asset graph so that we don't have to thread another parallel set of arguments around.

@schrockn schrockn self-requested a review September 27, 2023 00:57
@schrockn
Copy link
Member

So we need to move forward here, but it definitely concerns me that we increasing the complexity of AssetsDefinition and also adding another surface area for selection bugs and surface area.

We should do a fast-follow refactor once things have calmed down a bit, especially to consolidate asset selection and asset check selection into a single object to thread around. However time is short, and we have a release cut to make.

spice_must_flow

@johannkm johannkm force-pushed the rl/allow-subsetted-asset-checks branch from d405c22 to dda4b01 Compare September 27, 2023 01:42
@johannkm johannkm merged commit e69cd3f into master Sep 27, 2023
@johannkm johannkm deleted the rl/allow-subsetted-asset-checks branch September 27, 2023 01:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants