-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[external-assets] ObserveResult #17824
Conversation
Current dependencies on/for this PR:
This stack of pull requests is managed by Graphite. |
b48ebba
to
a7fb194
Compare
ad61e04
to
84ab166
Compare
if source_asset.observe_fn is None | ||
else {} | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated, but this block was duplicated
corresponding AssetMaterialization event. | ||
data_version (Optional[DataVersion]): The data version of the asset that was observed. | ||
""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check_results
and data_version
were missing from MaterializeResult
docstring
84ab166
to
5503f7f
Compare
This PR really begs the question of what is the difference between an observation and a materialization? We have parallel objects in multiple layers (ObserveResult and MaterializeResult; AssetMaterialization and AssetObservation) and in fact you have a codepath which relies on them having the same I wonder if there is at a minimum an internal factor here that can reconcile the codepaths better and also keep observations and materialization more in sync Perhaps the right mental model is that a materialization contains an instance of an observation. |
Yes-- I was thinking about this while writing the PR. Here is my mental model:
So the ontology is fuzzy around external assets. If I were designing this de novo I would require all modeled assets to have an In the current state, I'm not sure.
I thought about factoring out a common
Seems reasonable but it is kind of a confusing interpretation to apply to existing event logs. |
5503f7f
to
33d754c
Compare
Deploy preview for dagster-university ready! ✅ Preview Built with commit 33d754c. |
Deploy preview for dagit-storybook ready! ✅ Preview Built with commit 33d754c. |
33d754c
to
fc5f310
Compare
Deploy preview for dagster-docs ready! Preview available at https://dagster-docs-8tvk2r8n0-elementl.vercel.app Direct link to changed pages: |
fc5f310
to
375c613
Compare
f" asset_key, options are: {context.per_invocation_properties.assets_def.keys}" | ||
) | ||
|
||
|
||
def _output_name_for_result_obj( | ||
event: MaterializeResult, | ||
event: Union[MaterializeResult, ObserveResult], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes me think we should add a marker interface AssetResult
that both MaterializeResult
and ObserveResult
implement for code paths like this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One reason I didn't do something like this is because there is also AssetCheckResult
which would not be included so is kind of confusing, but I'm open to it-- do you think that's preferable to a common base class?
OK I see below comment on base class now
@@ -1,7 +1,7 @@ | |||
from typing import NamedTuple, Optional, Sequence | |||
|
|||
import dagster._check as check | |||
from dagster._annotations import PublicAttr | |||
from dagster._annotations import PublicAttr, experimental | |||
from dagster._core.definitions.asset_check_result import AssetCheckResult | |||
from dagster._core.definitions.data_version import DataVersion | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking at this let's add the base class. I think `MaterializeResult` and `ObserveResult` can just trivially inherit
class AssetResult(NamedTuple(...)):
# all the stuff
class MaterializeResult(AssetResult):
pass
class ObserveResult(AssetResult):
pass
```<!--__GRAPHITE_HTML_TAG_START__--><span class='graphite__hidden'><br/><br/>See this comment inline on <a href="https://app.graphite.dev/github/pr/dagster-io/dagster/17824?utm_source=unchanged-line-comment">Graphite</a>.</span>
<!--__GRAPHITE_HTML_TAG_END__-->
@@ -168,10 +168,12 @@ def _filter_expected_output_defs( | |||
result_tuple = ( | |||
(result,) if not isinstance(result, tuple) or is_named_tuple_instance(result) else result | |||
) | |||
materialize_results = [x for x in result_tuple if isinstance(x, MaterializeResult)] | |||
materialize_or_observe_results = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah asset_results
is better
python_modules/dagster/dagster/_core/execution/plan/compute_generator.py
Show resolved
Hide resolved
# This is a temporary workaround to prevent duplicate observation events from external | ||
# observable assets that were auto-converted from source assets. These assets yield | ||
# observation events through the context in their body, and will continue to do so until we | ||
# can convert them to using ObserveResult, which requires a solution to partition-scoped |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add linear tasks for these follow ups?
1f3aa24
to
08f4a01
Compare
542b6a7
to
80b7df4
Compare
dc76212
to
1625c87
Compare
@@ -359,7 +359,7 @@ def dagster_internal_init( | |||
is_subset=is_subset, | |||
) | |||
|
|||
def __call__(self, *args: object, **kwargs: object) -> object: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this change in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it slipped in, removed
if execution_type == AssetExecutionType.MATERIALIZATION | ||
else () | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would strongly prefer to have this structured so that there is no elif
to demonstrate to the code reader that under no circumstances will these blocks be silently skipped.. The check against UNEXECUTABLE
in elif
is unnecessary given the invariant at top of function.
If you still want to check against unexecutable down here, do so as an invariant.
else:
check.invariant(execution_type != AssetExecutionType.UNEXECUTABLE)
yield from (...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, changed to else
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok this looks good. Please heed final comment about the elif
in core execution.
Also @sryza definitely want your signoff on this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add this to the api docs, right?
|
||
@experimental | ||
class ObserveResult(AssetResult): | ||
"""An object representing a successful observation of an asset. These can be returned from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These comments aren't accurate, right? (I know that in the future they may become accurate if we choose to go the execution_type
route.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comments are accurate, with the caveat that you need a special metadata key/value pair (setting execution type to OBSERVATION) for it to work. I added some clarification in the docstring.
@@ -330,7 +330,9 @@ | |||
make_values_resource as make_values_resource, | |||
resource as resource, | |||
) | |||
from dagster._core.definitions.result import MaterializeResult as MaterializeResult | |||
from dagster._core.definitions.result import ( | |||
MaterializeResult as MaterializeResult, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should ObserveResult
be added here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We decided to leave it private for the immediate future
1625c87
to
1670c1f
Compare
When we make it public |
Summary & Motivation
Add
ObserveResult
counterpart toMaterializeResult
and a common base classAssetResult
. The base class was added becauseObserveResult
andMaterializeResult
are currently exactly the same data structure, and much of the existing machinery used for auto-generating materializations to be used to generate observations.ObserveResult
is converted toOutput
at the same point thatMaterializeResult
is converted toOutput
.The initial plan here was to include in this PR a change to the source asset observation function implementation to use
ObserveResult
. However, this is not possible until we find a solution to the "partition-scoped metadata/data version" question (see https://github.com/dagster-io/internal/discussions/7529), becauseObserveResult
goes throughOutput
(which does not currently support partition-scoped data versions), and observable source assets can return partition-scoped data versions.Therefore this just adds
ObserveResult
and the capability to create an observable source asset equivalent from an external asset. There is some workaround logic for auto-converted source-assets that prevents auto-generation ofAssetObservation
.How I Tested These Changes
New unit tests.