Describe External Assets; rename create_unexecutable_observable_assets_def to external_assets_from_specs #16754

schrockn · 2023-09-25T09:06:12Z

Summary & Motivation

This PR renames create_unexecutable_observable_assets_def to external_assets_from_specs. I also want this PR to serve as the final discussion on naming this feature and to document this capability. The verbiage in the docblock:

Create an external assets definition from a sequence of asset specs.

An external asset is an asset that is not materialized by Dagster, but is tracked in the
asset graph and asset catalog.

A common use case for external assets is modeling data produced by a process not
under Dagster's control. For example, a daily drop of a file from a third party in S3.

In most systems these are described as sources. This includes Dagster, which has
:py:class:`SourceAsset`. It will be supplanted by external assets in the near-term
future, as external assets are a superset of the functionality
of Source Assets.

External assets can act as sources, but that is not their only use.

In particular, external assets can themselves have lineage (specified through the
``deps`` argument of :py:class:`AssetSpec`) and can depend on other external assets.
External assets are not allowed to depend on non-external assets.
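
For illustration only (not part of the docblock): a minimal sketch of the dependency rule described above, written against a toy model rather than Dagster itself. `ToySpec`, its `external` flag, and `check_external_deps` are hypothetical names invented for this sketch; Dagster's real `AssetSpec` and its validation may look quite different.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class ToySpec:
    """Hypothetical stand-in for an asset spec; this is not Dagster's AssetSpec."""

    key: str
    deps: Tuple[str, ...] = ()
    external: bool = True  # assumption: a flag marking assets not materialized by Dagster


def check_external_deps(specs: Sequence[ToySpec]) -> None:
    """Raise ValueError if an external spec depends on a non-external spec."""
    by_key = {spec.key: spec for spec in specs}
    for spec in specs:
        if not spec.external:
            continue
        for dep_key in spec.deps:
            dep = by_key.get(dep_key)
            # Unknown keys are ignored here; only known non-external deps are rejected.
            if dep is not None and not dep.external:
                raise ValueError(
                    f"external asset {spec.key!r} depends on non-external asset {dep_key!r}"
                )


# An external asset depending on another external asset is allowed.
raw_file = ToySpec(key="s3_daily_drop")
cleaned = ToySpec(key="vendor_cleaned", deps=("s3_daily_drop",))
check_external_deps([raw_file, cleaned])
```

In this toy model, adding an external spec whose `deps` names a spec with `external=False` makes `check_external_deps` raise.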

The user can emit `AssetMaterialization`, `AssetObservation`, and `AssetCheckEvaluation`
events attached to external assets. Dagster also now supports "runless" events,
which enable many use cases that were previously not possible. Runless events
are events generated outside the context of a particular run (for example, in a
sensor or by a script), allowing for greater flexibility in event generation.

This can be done in a few ways:

Note to reviewers: this is an in-progress doc block, and the items below will have links and examples.

1) DagsterInstance exposes `report_runless_event` that can be used to generate events for
    external assets directly on an instance. See docs.
2) Sensors can build these events and return them using :py:class:`SensorResult`. A use
    case for this is using a sensor to continuously monitor the metadata exhaust from
    an external system and inserting events that
    reflect that exhaust. See docs.
3) Dagster Cloud exposes a REST API for ingesting runless events. Users can copy and
    paste a curl command into their external computations (such as an Airflow operator)
    to register metadata associated with those computations. See docs.
4) Dagster ops can generate these events directly, either by yielding them or by calling
    ``log_event`` on :py:class:`OpExecutionContext`. Use cases for this include
    querying metadata in an external system when doing so in a sensor is too expensive, and
    adapting pure op-based Dagster code to take advantage of asset-oriented lineage,
    observability, and data quality features, without having to port it wholesale
    to `@asset`- and `@multi_asset`-based code.
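
For illustration only (not part of the docblock): a toy sketch of what "runless" means in the list above, assuming an event is runless exactly when it carries no run id. `ToyEvent`, `ToyEventLog`, and the `run_id` field are hypothetical names for this sketch and do not reflect how Dagster stores events.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ToyEvent:
    """Hypothetical event record; not a Dagster class."""

    asset_key: str
    kind: str  # e.g. "materialization" or "observation"
    run_id: Optional[str] = None  # assumption: None marks a runless event


class ToyEventLog:
    """In-memory stand-in for an event log."""

    def __init__(self) -> None:
        self._events: List[ToyEvent] = []

    def report(self, event: ToyEvent) -> None:
        self._events.append(event)

    def runless_events(self, asset_key: str) -> List[ToyEvent]:
        # Keep only events for this asset that were reported outside of any run.
        return [
            e for e in self._events if e.asset_key == asset_key and e.run_id is None
        ]


log = ToyEventLog()
# Reported by a script or sensor: no run is involved.
log.report(ToyEvent("s3_daily_drop", "materialization"))
# Reported by an op while a run is executing: tied to that run.
log.report(ToyEvent("s3_daily_drop", "observation", run_id="run_123"))
```

Under this assumption, routes 1 through 3 in the list would produce events like the first one, while route 4 (events yielded from an op) would produce events tied to a run, like the second.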

This feature set allows users to use Dagster as an observability, lineage, and
data quality tool for assets that are not materialized by Dagster. In addition to
traditional use cases like sources, this feature can model entire lineage graphs of
assets that are scheduled and materialized by other tools and workflow engines. This
allows users to use Dagster as a cross-cutting observability tool without migrating
their entire data platform to a single orchestration engine.

External assets do not have all the features of normal assets: they cannot be
materialized ad hoc by Dagster (this is disabled in the UI); cannot be backfilled; cannot
be scheduled using auto-materialize policies; and opt out of other features around
direct materialization, both now and in the future. External assets also provide fewer
guarantees around the correctness of their information in the asset
catalog. In other words, in exchange for the flexibility, Dagster provides fewer guardrails
for external assets than for assets that are materialized by Dagster, and there is an increased
chance that they will insert nonsensical information into the asset catalog, potentially
eroding trust.

Suggesting alternative language in this docblock is the best way to talk about an alternative name IMO.

How I Tested These Changes

schrockn · 2023-09-25T09:06:23Z

Current dependencies on/for this PR:

master
- PR Describe External Assets; rename create_unexecutable_observable_assets_def to external_assets_from_specs #16754 👈

This comment was auto-generated by Graphite.

github-actions · 2023-09-26T10:09:16Z

Deploy preview for dagit-storybook ready!

✅ Preview
https://dagit-storybook-cybu18mzv-elementl.vercel.app
https://rename-to-external.components-storybook.dagster-docs.io

Built with commit 71e5d29.
This pull request is being automatically deployed with vercel-action

github-actions · 2023-09-26T10:09:38Z

Deploy preview for dagit-core-storybook ready!

✅ Preview
https://dagit-core-storybook-fdlpk1l54-elementl.vercel.app
https://rename-to-external.core-storybook.dagster-docs.io

Built with commit 2629ca6.
This pull request is being automatically deployed with vercel-action

github-actions · 2023-09-26T10:12:10Z

Deploy preview for dagster-docs ready!

Preview available at https://dagster-docs-obbuf9osz-elementl.vercel.app
https://rename-to-external.dagster.dagster-docs.io

petehunt

I have long thought that external assets would be the right name for this feature so I am supportive! This feature resembles BigQuery's external tables feature so I think it will resonate with practitioners.

alangenfeld

The reservations around "external" have just been the existing meaning in the code base, but those artifacts are not part of the public API so can be renamed.

I think external is much better than unexecutable & observable. I don't think there have been any other real contenders.

I think the external language works well when describing these. In replacing source asset we have some dynamic scoping for what "external" refers to, but I think it still makes sense (outside of Dagster vs. outside of a particular code location).

jamiedemaria · 2023-09-26T15:47:28Z

python_modules/dagster/dagster/_core/definitions/observable_asset.py

 from dagster._core.definitions.decorators.asset_decorator import asset, multi_asset
 from dagster._core.definitions.source_asset import SourceAsset
 from dagster._core.errors import DagsterInvariantViolationError


-def create_unexecutable_observable_assets_def(specs: Sequence[AssetSpec]):
+def external_assets_def_from_specs(specs: Sequence[AssetSpec]) -> AssetsDefinition:
+    """Create an external assets definition from a sequence of asset specs.


what's a use case where you would create an external asset from multiple AssetSpecs?

as an aside, doing this

a1 = AssetSpec(key="table_1")
a2 = AssetSpec(key="table_2")
a3 = AssetSpec(key="table_3", deps=[a1])
external_assets = external_assets_def_from_specs([a1, a2, a3])

causes errors when running the UI

dagster._core.errors.DagsterInvariantViolationError: Expected Tuple annotation for multiple outputs, but received non-tuple annotation.

This is inconsequential to the behavior of this system. This is just a weird outcropping of the fact that our core abstraction is pluralized. I could change it to produce N asset defs if so desired.

I might not fully understand, but the assets in the fn name isn't my concern, it's that you can pass a list of AssetSpecs. I don't really get why you would ever pass a list, so I'm wondering why it can't be

def external_assets_def_from_spec(spec: AssetSpec) -> AssetsDefinition:

This is more about convenience than anything. People will construct N asset specs to represent the graph and I want them to be able to coerce them into a form that can be threaded to Definitions in a single fn call.

ok that makes sense. I will note that with just the doc block and function signature, I was pretty unsure about what would happen if I passed a list of AssetSpecs. And it does still seem to throw an error if you try to pass a list (but that's fixable). I think the initial confusion can be remedied by addressing what happens when you pass multiple AssetSpecs in the doc block, or by generating an AssetsDefinition per spec. While the latter is maybe less efficient/convenient, it fits my intuitive mental model of what I would expect this function to do, rather than a single multi_asset, which feels like it should consist of assets that are related to each other in some way.

I'm totally cool with creating multiple assets_def. @sryza is that what you would prefer as well?

Yeah that feels more grokkable to me. And I agree that it's ergonomic for it to take a list, rather than requiring the user to write a comprehension.

Cool. Makes sense to me.

Actually this was a very good call. Error message if you somehow get to the user-defined execute function will be much more clear.
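
A minimal sketch of the shape agreed on in this thread: the helper still accepts a list of specs for convenience, but returns one definition per spec instead of a single multi-asset definition. `ToySpec`, `ToyAssetsDef`, and `toy_external_assets_from_specs` are hypothetical stand-ins written for this sketch; they are not the Dagster classes or the merged implementation.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class ToySpec:
    """Hypothetical stand-in for AssetSpec."""

    key: str
    deps: Tuple[str, ...] = ()


@dataclass(frozen=True)
class ToyAssetsDef:
    """Hypothetical stand-in for AssetsDefinition; holds the specs it covers."""

    specs: Tuple[ToySpec, ...]


def toy_external_assets_from_specs(specs: Sequence[ToySpec]) -> List[ToyAssetsDef]:
    # One definition per spec, so unrelated assets are never bundled together.
    return [ToyAssetsDef(specs=(spec,)) for spec in specs]


a1 = ToySpec("table_1")
a2 = ToySpec("table_2")
a3 = ToySpec("table_3", deps=("table_1",))
defs = toy_external_assets_from_specs([a1, a2, a3])
```

The caller still makes a single function call for the whole graph, and each returned definition covers exactly one asset.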

sryza · 2023-09-26T15:58:40Z

Interestingly, we already use the "external" language heavily in the concept docs where we present source assets: https://docs.dagster.io/concepts/assets/software-defined-assets#defining-external-asset-dependencies. This helps me feel comfortable with this change.

However, it's a little difficult to understand the impact of this change without understanding how it affects the UI and other parts of how we communicate about Dagster. E.g. are there a set of places in the docs where we'll want to add text like "if the asset is an external asset, then it behaves like X instead?"

I feel optimistic about the "external" naming, but the above gives me some apprehension about committing to it before we have the breathing room to understand the user-facing impact a little bit more.

A separate, more tactical reservation I have is that it feels weird to build an AssetsDefinition that has multiple of these. It's functionally equivalent to a single AssetsDefinition per asset spec, right? Allowing multiple to go on a single AssetsDefinition exposes a degree of freedom that might lead users to believe there's a difference in behavior.

All that said, exposing a helper function seems pretty low risk.

schrockn · 2023-09-26T16:23:39Z

> A separate, more tactical reservation I have is that it feels weird to build an AssetsDefinition that has multiple of these. It's functionally equivalent to a single AssetsDefinition per asset spec, right? Allowing multiple to go on a single AssetsDefinition exposes a degree of freedom that might lead users to believe there's a difference in behavior.

#16754 (comment)

schrockn · 2023-09-26T16:33:37Z

> All that said, exposing a helper function seems pretty low risk.

👍🏻 It is not even essential that this go out with 1.5. It is purely additive on top so can be in a follow on. What is important is aligning on visioning and naming prior to launch week event.

PedramNavid · 2023-09-26T17:08:26Z

Nothing new for me to add, the name is a big improvement over unexecutable. I think there are some questions around docs and how we want to communicate this to our users. I think API docs are part of it, but probably will warrant further prose too.

schrockn · 2023-09-26T17:30:46Z

Roger that. It appears we have consensus on external. We will land this at a reasonable pace, after the pipes renaming has landed. It does not need to land in 1.5 to talk about it at Launch Week.

However I want to make it clear that I have very high conviction that this structure replaces source assets and observable source assets. Post launch week, as part of our quality/consolidation push, we will do the following:

- Embark on eliminating SourceAsset from the core of the framework.
- Enable "active observation" (Dagster actively emitting Asset Observations for an external asset) of external assets in vanilla sensors and schedules.
- Consolidate Materialize and Observe Sources into a single button, Execute, that covers both traditional assets and actively observed external assets.
- Fit actively observed external assets into whatever AMP evolves into.

This has been a laceration in our product and architecture for too long, and I look forward to eliminating it.

…ssets_def_from_specs to external_asset_def_from_specs docs snapshot snaps

github-actions · 2023-09-27T01:47:37Z

Deploy preview for dagster-university ready!

✅ Preview
https://dagster-university-gyi8ec01v-elementl.vercel.app
https://rename-to-external.dagster-university.dagster-docs.io

Built with commit 2629ca6.
This pull request is being automatically deployed with vercel-action

schrockn

I'm going to land this but not export it top-level.

## Summary & Motivation Adds an External Assets concept page (motivation described in #16754). This also contains a code change necessary because of the bug demonstrated in #17077. ## How I Tested These Changes BK. Also loaded examples in `dagster dev` --------- Co-authored-by: Erin Cochran <[email protected]> Co-authored-by: Yuhan Luo <[email protected]>

schrockn changed the title ~~rename create_unexecutable_observable_assets_def to create_external_assets_def_from_specs~~ rename create_unexecutable_observable_assets_def to external_assets_def_from_specs Sep 26, 2023

schrockn force-pushed the rename-to-external branch from 4dc593c to 71e5d29 Compare September 26, 2023 10:04

schrockn changed the title ~~rename create_unexecutable_observable_assets_def to external_assets_def_from_specs~~ Describe External Assets; rename create_unexecutable_observable_assets_def to external_assets_def_from_specs Sep 26, 2023

schrockn marked this pull request as ready for review September 26, 2023 10:14

schrockn requested review from petehunt, sryza, PedramNavid, erinkcochran87 and alangenfeld September 26, 2023 10:14

petehunt approved these changes Sep 26, 2023

alangenfeld approved these changes Sep 26, 2023

jamiedemaria reviewed Sep 26, 2023

schrockn added 2 commits September 26, 2023 21:33

rename create_unexecutable_observable_assets_def to create_external_a…

04aa9df

…ssets_def_from_specs to external_asset_def_from_specs docs snapshot snaps

cp

2629ca6

schrockn force-pushed the rename-to-external branch from ca4d19c to 2629ca6 Compare September 27, 2023 01:45

move files

e940cdd

schrockn changed the title ~~Describe External Assets; rename create_unexecutable_observable_assets_def to external_assets_def_from_specs~~ Describe External Assets; rename create_unexecutable_observable_assets_def to external_assets_from_specs Sep 27, 2023

pyright

3f46b35

schrockn commented Sep 27, 2023

schrockn merged commit 56219ae into master Sep 27, 2023

schrockn deleted the rename-to-external branch September 27, 2023 09:39

schrockn mentioned this pull request Oct 9, 2023

External Assets Concept Page #16935

Merged

sryza mentioned this pull request Jun 6, 2024

feat(looker): convert looker assets to external assets #22322

Closed
