[single-implicit-asset-job] methods on CodeLocation to get job partitions, partition tags, and partition config, with asset selection #23491
Conversation
force-pushed from f2cc723 to 0fdc2d3
force-pushed from 41e2efe to 9e9936a
force-pushed from 43f584a to 34a855a
force-pushed from 93579be to 8500be5
sandy can you say more about what we are doing here to ensure that selected_asset_keys is only used/needed when there are newer versions of user code? how does that requirement get threaded through?
force-pushed from 8500be5 to 28f571b
Why it's only needed when there are newer versions of user code:
1.8.1 will be the first Dagster version where it's possible for a job to have multiple PartitionsDefinitions. In 1.8.1, it will be possible for an asset job to target assets with different PartitionsDefinitions.
Why it's only used when there are newer versions of user code
This might reveal a naive misunderstanding of how things work, but I thought the answer to this was basically "only versions of user code on 1.8.1 or after will know about this field".
"selected_asset_keys is not needed and can be safely ignored prior to 1.8.1 " answers my question - this scheme works as long as the grpc client isn't relying on the selected_asset_keys to do filtering, even on older versions. (For example, if this were like the asset_selection argument which causes certain runs to subset the assets that are being materialized - it would be bad if that argument was sometimes dropped on the floor by the server and ignored, because then you would launch a run for every asset instead of just the assets being selected - but it sounds like this argument is not like that) |
force-pushed from 34a855a to 5ed7b66
force-pushed from 28f571b to fc88adb
I'm not actually seeing anything set the new selected_asset_keys field other than deriving it from the legacy partition set - is there still a change coming that has the frontend vary that field in some way that it wasn't before?
job_name: str,
partition_name: str,
selected_asset_keys: Optional[AbstractSet[AssetKey]],
Is there any other validation that you could do here to validate that it's a valid job name / selected asset keys combo? Like, if selected_asset_keys is None, that the job name has some property that only partition set jobs have?
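For illustration, a hypothetical version of such a cross-check; the parameters (`known_job_names`, `asset_keys_by_job`) are stand-ins for whatever repository metadata is actually available at this call site, not existing Dagster APIs, and asset keys are represented as plain strings for brevity.

```python
from typing import AbstractSet, Mapping, Optional, Set


def validate_job_partition_args(
    job_name: str,
    selected_asset_keys: Optional[AbstractSet[str]],
    known_job_names: Set[str],
    asset_keys_by_job: Mapping[str, Set[str]],
) -> None:
    # Reject job names the repository doesn't know about.
    if job_name not in known_job_names:
        raise ValueError(f"Unknown job {job_name!r}")
    # When a selection is provided, require that every selected key is
    # actually targeted by the named job.
    if selected_asset_keys is not None:
        extra = set(selected_asset_keys) - set(asset_keys_by_job.get(job_name, set()))
        if extra:
            raise ValueError(
                f"Asset keys {sorted(extra)} are not targeted by job {job_name!r}"
            )
```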
python_modules/dagster/dagster/_core/remote_representation/code_location.py (resolved)
python_modules/dagster/dagster/_core/remote_representation/code_location.py (outdated, resolved)
partitions_def.validate_partition_key(
    partition_name, dynamic_partitions_store=instance
)
was removing this part intentional / related to this PR?
It was intentional, though I should have flagged it and could be easily nudged to break it out into its own PR.
If you look at `_instance_from_ref_for_dynamic_partitions`, there's this weirdness right now where we only construct an instance ref if the `PartitionsDefinition` is a `DynamicPartitionsDefinition`, because otherwise we want the user to not need a dagster.yaml in their code server.
HOWEVER, part of what's happening in this PR is that we abstract away the `PartitionsDefinition` from this code path (this allows a future where someone executes a run for "2024-04-23" for asset1 and asset2, but those assets have different daily `PartitionsDefinition`s with different start dates). So now we don't have the `PartitionsDefinition` to look at and see if it's dynamic.
We could add some sort of `JobDefinition.partitions_def_is_dynamic(selected_asset_keys)` method to retain the existing behavior. HOWEVER HOWEVER, the reason I deleted this code is that I question whether this validation is really important. It results in an extra round trip to the DB. And this weirdness with the instance ref. And the backend code paths that invoke it already know the partition exists.
        ],
    )
):
    def __new__(
        cls,
        repository_origin: RemoteRepositoryOrigin,
        job_name: str,
This will break when users upgrade their dagit/daemon versions without also upgrading their user code. We should make sure that we are prepared for user reports of that, and maybe ensure that our docs are clear about exactly what our back-compat guarantees are in OSS - in the past we have been kind of wishy-washy about it, and I think we even have tests in our backcompat suite that exercise new versions of OSS with old versions of dagit.
Maybe it's something we could detect in dagit and warn about so that the resulting error is less cryptic? Code servers tell us what dagster versions they are using.
This will break when users upgrade their dagit/daemon versions without also upgrading their user code
Do you mean it will break when users upgrade their user code without upgrading their dagit/daemon versions?
Yes sorry, I had that reversed
maybe ensure that our docs are clear about exactly what our back-compat guarantees are in OSS
Here's a PR that adds a section on this: #23734.
Maybe it's something we could detect in dagit and warn about so that the resulting error is less cryptic?
Thoughts on where we would add this? The gRPC call that (re)loads the definitions? Do you think we should add this before merging this PR?
I don’t think it needs to block the PR - if there’s a known exception it will emit, we could catch it and add “this error may be due to version skew” to the error message.
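A minimal sketch of that suggestion, with a hypothetical `load_definitions_via_grpc` callable standing in for the real call site and a plain `Exception` standing in for the known exception type; the actual names in Dagster may differ.

```python
def load_with_version_skew_hint(load_definitions_via_grpc):
    try:
        return load_definitions_via_grpc()
    except Exception as e:
        # Re-raise with the friendlier hint suggested above; the original
        # error is preserved as the cause.
        raise RuntimeError(
            f"{e} This error may be due to version skew between the Dagster "
            "webserver/daemon and the code server."
        ) from e
```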
force-pushed from 5ed7b66 to af98201
force-pushed from fc88adb to 1f24881
…to grpc partition api params (#23527)

## Summary & Motivation

"Partition set"s are an eliminated concept that lives on like a ghost inside our host processes and gRPC APIs. This PR is the bottom of a stack that makes a dent at removing it.

When a job has a `PartitionsDefinition`, we also automatically generate a `PartitionSetSnap` whose name is f"{job_name}_partition_set". This is a relic from a past era when jobs could have multiple partition sets. This partition set name is then referenced from host processes when they need to get partition-related information for the job:
- The set of partitions
- The tags corresponding to a particular partition
- The config corresponding to a particular partition

This is problematic for two reasons:
- The redundant and vestigial "partition set" concept is threaded through a bunch of our host process code
- When an asset job targets assets with different PartitionsDefinitions, there is no partition set for it. However, we still need to be able to get the partitions, tags, and config for subsets of assets, e.g. to populate the Launchpad. This functionality in the Launchpad has been broken since the single-implicit-asset-job PR was merged.

### Backwards compatibility

After this change (and some of the changes stacked on top), there will be two kinds of code locations:
1. Code locations that don't know about these new params and never have jobs that include assets with different `PartitionsDefinition`s.
   - These code locations will continue to use the `partition_set_name` param, which continues to be supplied by host processes, to find the associated job and fetch the associated data.
2. Code locations that do know about these new params and may have jobs that include assets with different `PartitionsDefinition`s.
   - These code locations will make use of the new params.

## How I Tested These Changes

Up-stack: #23491
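To make the two request shapes concrete, an illustrative sketch follows; `PartitionInfoArgs`, its field names, and the helper function are assumptions based on the description above, not the exact Dagster gRPC types.

```python
from typing import AbstractSet, NamedTuple, Optional

from dagster import AssetKey


def legacy_partition_set_name(job_name: str) -> str:
    # The auto-generated name that host processes historically used to look up
    # partition info for a job.
    return f"{job_name}_partition_set"


class PartitionInfoArgs(NamedTuple):
    # Older code locations resolve the job via partition_set_name; newer code
    # locations can use job_name plus an optional asset selection instead.
    partition_set_name: Optional[str] = None
    job_name: Optional[str] = None
    selected_asset_keys: Optional[AbstractSet[AssetKey]] = None
```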
force-pushed from 1f24881 to 9be36bd
Oops yes, I should have linked here: #23494. And then @bengotow has a branch for the frontend changes.
…` to `GrapheneJob` (#23494)

## Summary & Motivation

This enables the frontend to query for partition information with respect to an asset selection within a job. This enables a world where jobs can contain assets with different `PartitionsDefinition`s. It's used by @bengotow's `partition-set-to-job/graphql-ui` branch.

Sits atop #23491.

## How I Tested These Changes
…itions, partition tags, and partition config, with asset selection (#23491)

## Summary & Motivation

This builds on top of #23527 to do two related things:
- On the host process side, refactors `CodeLocation` methods and call sites to use job names and selected asset keys instead of partition set names.
- On the user code process side, updates the implementations of grpc APIs that fetch partition information to pass selected asset keys to the `JobDefinition`.

## How I Tested These Changes

(cherry picked from commit 18993a4)
…` to `GrapheneJob` (#23494)

## Summary & Motivation

This enables the frontend to query for partition information with respect to an asset selection within a job. This enables a world where jobs can contain assets with different `PartitionsDefinition`s. It's used by @bengotow's `partition-set-to-job/graphql-ui` branch.

Sits atop #23491.

## How I Tested These Changes

(cherry picked from commit 3bd53c0)
…3807)

## Summary & Motivation

While working on #23491 and #23494, I wrote a set of tests that validate operations like `get_tags_for_partition` over GraphQL and gRPC when an asset job includes assets with different `PartitionsDefinition`s and an asset selection is provided. However, I had to take them out due to the way the stack was sequenced. This PR adds them back in.

## How I Tested These Changes

## Changelog

NOCHANGELOG
Summary & Motivation
This builds on top of #23527 to do two related things:
- On the host process side, refactors `CodeLocation` methods and call sites to use job names and selected asset keys instead of partition set names (see the sketch below).
- On the user code process side, updates the implementations of grpc APIs that fetch partition information to pass selected asset keys to the `JobDefinition`.
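For reference, a sketch of the refactored host-process call shape described in the first bullet; the method name and parameter list are assumptions pieced together from the diff snippets earlier in this thread, not the verbatim `CodeLocation` API.

```python
from abc import ABC, abstractmethod
from typing import AbstractSet, Mapping, Optional

from dagster import AssetKey


class CodeLocationLike(ABC):
    @abstractmethod
    def get_partition_tags(
        self,
        job_name: str,  # replaces the old partition_set_name parameter
        partition_name: str,
        selected_asset_keys: Optional[AbstractSet[AssetKey]],
        instance,  # DagsterInstance, used when dynamic partitions are involved
    ) -> Mapping[str, str]:
        """Return run tags for one partition of the job, scoped to the selected
        assets when a selection is provided."""
        ...
```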
How I Tested These Changes