[external-assets] ObserveResult #17824

smackesey · 2023-11-08T17:21:44Z

Summary & Motivation

Add an ObserveResult counterpart to MaterializeResult and a common base class, AssetResult. The base class was added because ObserveResult and MaterializeResult are currently exactly the same data structure, and much of the existing machinery for auto-generating materializations can be reused to generate observations. ObserveResult is converted to Output at the same point that MaterializeResult is converted to Output.

The initial plan here was to include in this PR a change to the source asset observation function implementation to use ObserveResult. However, this is not possible until we find a solution to the "partition-scoped metadata/data version" question (see https://github.com/dagster-io/internal/discussions/7529), because ObserveResult goes through Output (which does not currently support partition-scoped data versions), and observable source assets can return partition-scoped data versions.

Therefore this just adds ObserveResult and the capability to create an observable source asset equivalent from an external asset. There is some workaround logic for auto-converted source-assets that prevents auto-generation of AssetObservation.
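For illustration only, here is a minimal sketch of the shape described above: one shared base with two trivial subclasses. This is not the Dagster source. The field names and types are simplified stand-ins, and `describe` is a hypothetical helper that stands in for the shared conversion path.

```python
from typing import Any, Mapping, NamedTuple, Optional


class AssetResult(NamedTuple):
    """Hypothetical, simplified stand-in for the shared base class."""

    asset_key: Optional[str] = None
    metadata: Optional[Mapping[str, Any]] = None
    data_version: Optional[str] = None


class MaterializeResult(AssetResult):
    """Same fields as the base; only the type differs."""


class ObserveResult(AssetResult):
    """Same fields as the base; only the type differs."""


def describe(result: AssetResult) -> str:
    # One code path handles both result types; the concrete class only
    # decides which kind of event is reported.
    kind = "observation" if isinstance(result, ObserveResult) else "materialization"
    return f"{kind} of {result.asset_key} (data_version={result.data_version})"
```

Because both subclasses are plain tuples underneath, instances with equal fields compare equal across the two types, so any code that needs to tell them apart should use `isinstance` rather than `==`.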

How I Tested These Changes

New unit tests.

smackesey · 2023-11-08T17:21:55Z

Current dependencies on/for this PR:

master
- PR [external-assets] ObserveResult #17824 👈
  - PR [external assets] source assets -> external assets #18217

This stack of pull requests is managed by Graphite.

smackesey · 2023-11-22T14:32:33Z

python_modules/dagster/dagster/_core/definitions/external_asset.py

-        if source_asset.observe_fn is None
-        else {}
-    )
-


unrelated, but this block was duplicated

smackesey · 2023-11-22T14:34:09Z

python_modules/dagster/dagster/_core/definitions/result.py

+            corresponding AssetMaterialization event.
+        data_version (Optional[DataVersion]): The data version of the asset that was observed.
+    """
+


check_results and data_version were missing from MaterializeResult docstring

schrockn · 2023-11-27T18:42:30Z

This PR really begs the question of what is the difference between an observation and a materialization? We have parallel objects in multiple layers (ObserveResult and MaterializeResult; AssetMaterialization and AssetObservation) and in fact you have a codepath which relies on them having the same __init__ arguments.

I wonder if there is at a minimum an internal factor here that can reconcile the codepaths better and also keep observations and materialization more in sync

Perhaps the right mental model is that a materialization contains an instance of an observation.

smackesey · 2023-11-29T00:38:10Z

This PR really begs the question of what is the difference between an observation and a materialization?

Yes-- I was thinking about this while writing the PR. Here is my mental model:

A Dagster deployment models some set of data entities ("assets").
Every asset modeled has an asset key as identifier. The set of assets modeled by a Dagster deployment is the union of all asset keys that have either: (a) a corresponding AssetsDefinition/SourceAsset; or (b) at least one event in the log.
There are "materializable assets" (i.e. standard SDAs) and "external assets". External assets here include both assets with a corresponding definition and those with no definition but at least one event in the log. The meaning of AssetMaterialization and AssetObservation events differs for the two kinds of assets.
For materializable assets:
- The asset state specifies either one or zero values (zero if it has no AssetMaterialization on record) at any given time.
- The most recent AssetMaterialization event represents the computation of the asset's current value.
- AssetObservation events provide additional information about the value represented by the closest preceding materialization.
For external assets:
- The asset is assumed to always correspond to a value, even if there is no AssetMaterialization on record.
- AssetMaterializations represent computations of the asset's value, but there is no guarantee that all such computations have corresponding AssetMaterialization events. It is not necessary for there to be any AssetMaterialization event on record.
- Consequently, we can't draw any relationship between AssetObservation and AssetMaterialization events like we can with materializable assets. All we can say about an AssetObservation is that it corresponds to the value at the time the observation was conducted-- but whether it corresponds to the current value is unknown.
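The distinction above can be sketched in code. This is only a toy model of the mental model, not of the Dagster event log: `Event` and `materialization_described_by` are hypothetical names, and events are reduced to a kind and a timestamp.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    kind: str  # "materialization" or "observation"
    timestamp: float


def materialization_described_by(
    events: List[Event], observation: Event, is_external: bool
) -> Optional[Event]:
    """Return the materialization an observation describes, if that can be known."""
    if is_external:
        # External assets may be recomputed without any event being logged, so
        # no relationship to a recorded materialization can be assumed.
        return None
    preceding = [
        e
        for e in events
        if e.kind == "materialization" and e.timestamp <= observation.timestamp
    ]
    # For materializable assets, the closest preceding materialization wins.
    return max(preceding, key=lambda e: e.timestamp, default=None)
```

For a materializable asset the function resolves an observation to a specific materialization (or to none if the asset has never been materialized); for an external asset it always returns `None`, which is the "unknown" case described in the last bullet.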

So the ontology is fuzzy around external assets. If I were designing this de novo I would require all modeled assets to have an AssetMaterialization, and then the meaning of AssetObservation would be clear in that it always contains information about the value represented by the preceding materialization.

In the current state, I'm not sure.

I wonder if there is at a minimum an internal factor here that can reconcile the codepaths better and also keep observations and materialization more in sync

I thought about factoring out a common AssetExecutionResult class for MaterializeResult and ObserveResult, but that would seem like it should apply to AssetCheckResult as well-- but this doesn't have a data_version. So I thought it clearer to not do this.

Perhaps the right mental model is that a materialization contains an instance of an observation.

Seems reasonable but it is kind of a confusing interpretation to apply to existing event logs.

python_modules/dagster/dagster/_core/execution/plan/execute_step.py

schrockn · 2024-02-01T18:23:51Z

python_modules/dagster/dagster/_core/definitions/op_invocation.py

        f" asset_key, options are: {context.per_invocation_properties.assets_def.keys}"
    )


 def _output_name_for_result_obj(
-    event: MaterializeResult,
+    event: Union[MaterializeResult, ObserveResult],


This makes me think we should add a marker interface AssetResult that both MaterializeResult and ObserveResult implement for code paths like this

One reason I didn't do something like this is because there is also AssetCheckResult which would not be included so is kind of confusing, but I'm open to it-- do you think that's preferable to a common base class?

OK I see below comment on base class now

schrockn · 2024-02-01T18:27:24Z

python_modules/dagster/dagster/_core/definitions/result.py

@@ -1,7 +1,7 @@
 from typing import NamedTuple, Optional, Sequence

 import dagster._check as check
-from dagster._annotations import PublicAttr
+from dagster._annotations import PublicAttr, experimental
 from dagster._core.definitions.asset_check_result import AssetCheckResult
 from dagster._core.definitions.data_version import DataVersion



[Re: lines 15 to 15]

looking at this let's add the base class. I think `MaterializeResult` and `ObserveResult` can just trivially inherit

```
class AssetResult(NamedTuple(...)):
    # all the stuff

class MaterializeResult(AssetResult):
    pass

class ObserveResult(AssetResult):
    pass
```

schrockn · 2024-02-01T18:27:48Z

python_modules/dagster/dagster/_core/execution/plan/compute_generator.py

@@ -168,10 +168,12 @@ def _filter_expected_output_defs(
    result_tuple = (
        (result,) if not isinstance(result, tuple) or is_named_tuple_instance(result) else result
    )
-    materialize_results = [x for x in result_tuple if isinstance(x, MaterializeResult)]
+    materialize_or_observe_results = [


yeah asset_results is better
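With a common base class, the filtering step in the diff above reduces to a single `isinstance` check. The following is a self-contained sketch under that assumption, not the actual `compute_generator.py` code: the classes are simplified stand-ins, `collect_asset_results` is a hypothetical name, and the `_fields` check is a rough substitute for `is_named_tuple_instance`.

```python
from typing import Any, List, NamedTuple, Optional


class AssetResult(NamedTuple):
    """Simplified stand-in for the shared base class."""

    asset_key: Optional[str] = None


class MaterializeResult(AssetResult):
    pass


class ObserveResult(AssetResult):
    pass


def collect_asset_results(result: Any) -> List[AssetResult]:
    # A bare result (including any named tuple instance) is wrapped in a
    # 1-tuple; a plain tuple is treated as a sequence of results.
    is_named_tuple = isinstance(result, tuple) and hasattr(result, "_fields")
    result_tuple = (result,) if not isinstance(result, tuple) or is_named_tuple else result
    # Both result types are picked up by one check against the base class.
    return [x for x in result_tuple if isinstance(x, AssetResult)]
```

The named-tuple check matters because the result objects are themselves tuples; without it, a single returned `ObserveResult` would be iterated field by field instead of being treated as one result.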

python_modules/dagster/dagster/_core/execution/plan/compute_generator.py

python_modules/dagster/dagster/_core/execution/plan/execute_step.py

schrockn · 2024-02-01T18:50:16Z

python_modules/dagster/dagster/_core/execution/plan/execute_step.py

+        # This is a temporary workaround to prevent duplicate observation events from external
+        # observable assets that were auto-converted from source assets. These assets yield
+        # observation events through the context in their body, and will continue to do so until we
+        # can convert them to using ObserveResult, which requires a solution to partition-scoped


Can you add linear tasks for these follow ups?

python_modules/dagster/dagster/_core/execution/plan/execute_step.py

schrockn · 2024-02-02T21:58:30Z

python_modules/dagster/dagster/_core/definitions/assets.py

@@ -359,7 +359,7 @@ def dagster_internal_init(
            is_subset=is_subset,
        )

-    def __call__(self, *args: object, **kwargs: object) -> object:


Why is this change in this PR?

it slipped in, removed

schrockn · 2024-02-02T22:04:41Z

python_modules/dagster/dagster/_core/execution/plan/execute_step.py

-            if execution_type == AssetExecutionType.MATERIALIZATION
-            else ()
-        )
+


Would strongly prefer to have this structured so that there is no elif, to demonstrate to the code reader that under no circumstances will these blocks be silently skipped. The check against UNEXECUTABLE in elif is unnecessary given the invariant at top of function.

If you still want to check against unexecutable down here, do so as an invariant.

    else:
        check.invariant(execution_type != AssetExecutionType.UNEXECUTABLE)
        yield from (...)

good point, changed to else
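A minimal sketch of the structure being asked for, assuming a three-valued execution type. The enum below is a local stand-in rather than an import from Dagster, `events_for` is a hypothetical function, and a plain `assert` is used in place of `check.invariant` to keep the example free of dependencies.

```python
from enum import Enum
from typing import Iterator


class AssetExecutionType(Enum):
    MATERIALIZATION = "MATERIALIZATION"
    OBSERVATION = "OBSERVATION"
    UNEXECUTABLE = "UNEXECUTABLE"


def events_for(execution_type: AssetExecutionType, asset_key: str) -> Iterator[str]:
    # Invariant at the top of the function: unexecutable assets never get here.
    if execution_type == AssetExecutionType.UNEXECUTABLE:
        raise ValueError("unexecutable assets cannot emit events")

    if execution_type == AssetExecutionType.MATERIALIZATION:
        yield f"materialization:{asset_key}"
    else:
        # Restating the invariant instead of using `elif` shows the reader
        # that exactly one of the two branches always runs.
        assert execution_type != AssetExecutionType.UNEXECUTABLE
        yield f"observation:{asset_key}"
```

With an `elif execution_type == OBSERVATION` branch and no `else`, a future fourth enum value would fall through and yield nothing; the `if`/`else` form makes that impossible.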

schrockn

Ok this looks good. Please heed final comment about the elif in core execution.

Also @sryza definitely want your signoff on this change.

sryza

We should add this to the api docs, right?

sryza · 2024-02-02T22:10:16Z

python_modules/dagster/dagster/_core/definitions/result.py

+
+@experimental
+class ObserveResult(AssetResult):
+    """An object representing a successful observation of an asset. These can be returned from


These comments aren't accurate, right? (I know that in the future they may become accurate if we choose to go the execution_type route.)

The comments are accurate, with the caveat that you need a special metadata key/value pair (setting execution type to OBSERVATION) for it to work. I added some clarification in the docstring.

sryza · 2024-02-02T22:10:59Z

python_modules/dagster/dagster/__init__.py

@@ -330,7 +330,9 @@
    make_values_resource as make_values_resource,
    resource as resource,
 )
-from dagster._core.definitions.result import MaterializeResult as MaterializeResult
+from dagster._core.definitions.result import (
+    MaterializeResult as MaterializeResult,


Should ObserveResult be added here?

We decided to leave it private for the immediate future

smackesey · 2024-02-05T14:39:57Z

We should add this to the api docs, right?

When we make it public

smackesey changed the base branch from master to sean/metadata-by-partition November 20, 2023 16:56

smackesey force-pushed the sean/external-assets-observe-result branch from b48ebba to a7fb194 Compare November 20, 2023 16:56

This was referenced Nov 20, 2023

MetadataUserInput -> RawMetadataMapping #18156

Merged

Add MetadataByPartition #18157

Draft

smackesey changed the base branch from sean/metadata-by-partition to master November 22, 2023 13:18

smackesey force-pushed the sean/external-assets-observe-result branch 2 times, most recently from ad61e04 to 84ab166 Compare November 22, 2023 14:14

smackesey commented Nov 22, 2023

smackesey force-pushed the sean/external-assets-observe-result branch from 84ab166 to 5503f7f Compare November 22, 2023 14:55

smackesey marked this pull request as ready for review November 22, 2023 15:28

smackesey requested a review from schrockn November 22, 2023 15:28

smackesey mentioned this pull request Nov 22, 2023

[external assets] source assets -> external assets #18217

Closed

smackesey force-pushed the sean/external-assets-observe-result branch from 5503f7f to 33d754c Compare January 29, 2024 22:39

smackesey force-pushed the sean/external-assets-observe-result branch from 33d754c to fc5f310 Compare February 1, 2024 17:21

smackesey requested a review from erinkcochran87 as a code owner February 1, 2024 17:21

smackesey changed the base branch from master to sean/bump-ruff February 1, 2024 17:21

This was referenced Feb 1, 2024

[pyright] 1.1.339 -> 1.1.349 #19482

Merged

[ruff] 0.1.7 -> 0.2.0 #19531

Merged

smackesey force-pushed the sean/external-assets-observe-result branch from fc5f310 to 375c613 Compare February 1, 2024 17:45

schrockn requested changes Feb 1, 2024

smackesey force-pushed the sean/bump-ruff branch from 1f3aa24 to 08f4a01 Compare February 2, 2024 15:35

smackesey force-pushed the sean/external-assets-observe-result branch 2 times, most recently from 542b6a7 to 80b7df4 Compare February 2, 2024 16:04

smackesey changed the base branch from sean/bump-ruff to master February 2, 2024 16:04

smackesey force-pushed the sean/external-assets-observe-result branch 3 times, most recently from dc76212 to 1625c87 Compare February 2, 2024 19:44

smackesey requested a review from schrockn February 2, 2024 19:59

erinkcochran87 removed their request for review February 2, 2024 20:24

schrockn reviewed Feb 2, 2024

schrockn approved these changes Feb 2, 2024

sryza reviewed Feb 2, 2024

[external-assets] ObserveResult

1670c1f

smackesey force-pushed the sean/external-assets-observe-result branch from 1625c87 to 1670c1f Compare February 5, 2024 14:33

smackesey merged commit 7072555 into master Feb 5, 2024
1 check passed

smackesey deleted the sean/external-assets-observe-result branch February 5, 2024 15:21
