[module-loaders] [rfc] Definitions from module loader #26546

dpeng817 · 2024-12-17T17:56:22Z

Summary & Motivation

This PR is a prototype of a top-level API to load dagster definitions from across a module into a single returned Definitions object. It is not intended for landing yet. Just want to align on direction.

Why add this API?

For any user who is bought into a module structure making use of load_assets_from_x, we are currently making their life unnecessarily more difficult by not bringing in other objects that are scoped at module-load time. Those users agree - we've received requests for an API to do that numerous times.

In the absence of a compelling replacement for our current project structure, I think the existence of this API is a good stopgap to improve module ergonomics.

Why are resources, loggers, and executor provided as args?

It seems like the most straightforward way to support these objects without some sort of additional magic. Since we force you to provide a key in addition to the class itself, there's currently no module-scoped pattern that matches how these objects are passed as a Definitions argument. So it seems reasonable to accept these as parameters.

The other approach would be to allow users to define resources as module-level variables and use the variable name as the key. But I figure the more conservative approach here leaves us room to also do that in the future (can imagine some sort of combination step).

dbt = DbtCliResource(...)
# equivalent of
{"dbt": DbtCliResource(...)}
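To make the variable-name-as-key alternative concrete, here is a minimal sketch. It is not what this PR implements: `FakeResource` and `resources_from_module` are hypothetical stand-ins invented for illustration, and no Dagster APIs are used.

```python
import types
from typing import Any, Dict


class FakeResource:
    """Hypothetical stand-in for a resource class such as DbtCliResource."""

    def __init__(self, label: str) -> None:
        self.label = label


def resources_from_module(module: types.ModuleType) -> Dict[str, Any]:
    """Collect module-level FakeResource instances, keyed by variable name."""
    return {
        attr: value
        for attr, value in vars(module).items()
        if isinstance(value, FakeResource) and not attr.startswith("_")
    }


# Build a throwaway module so the example is self-contained.
demo = types.ModuleType("demo_defs")
demo.dbt = FakeResource("dbt cli")
demo.not_a_resource = 42

resources = resources_from_module(demo)
print(sorted(resources))
```

Under this scheme, renaming the variable silently changes the resource key, which is one reason explicit parameters are the more conservative starting point.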

For executor, the fact that we only accept one per module is kind of unique. I don't think a user would define an executor in the same way that they define a sensor, schedule, or job, for example, where the definition is inlined to the module. I think it makes more sense for the user to need to provide this explicitly.

Why does this return a Definitions object?

I don't think there is any reasonable alternative. The existing load_assets_from_x is nice because the whole list it returns can be passed directly to a Definitions object. That would not be the case if we returned a flat list of heterogeneous Dagster objects, unless we added some sort of from_list constructor or something. Returning a Definitions object seems more straightforward.

It also gives us the opportunity to paper over some stuff if we so choose. We can, for example, automatically coerce source assets and AssetSpec objects into resolved AssetsDefinitions.
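As a rough illustration of why a flat heterogeneous list needs a sorting step before it can populate a Definitions-like object, the sketch below buckets module attributes by type. Every name in it (`FakeAsset`, `FakeSensor`, `FakeJob`, `FakeDefinitions`, `load_fake_definitions_from_module`) is a hypothetical stand-in, not the Dagster implementation.

```python
import types
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


class FakeAsset:
    """Hypothetical stand-in for an asset definition."""


class FakeSensor:
    """Hypothetical stand-in for a sensor definition."""


class FakeJob:
    """Hypothetical stand-in for a job definition."""


@dataclass
class FakeDefinitions:
    """Hypothetical stand-in for Definitions: one typed field per object kind."""

    assets: List[FakeAsset] = field(default_factory=list)
    sensors: List[FakeSensor] = field(default_factory=list)
    jobs: List[FakeJob] = field(default_factory=list)
    resources: Dict[str, Any] = field(default_factory=dict)


def load_fake_definitions_from_module(
    module: types.ModuleType,
    resources: Optional[Dict[str, Any]] = None,
) -> FakeDefinitions:
    """Scan a module and route each recognized object into its typed field."""
    defs = FakeDefinitions(resources=dict(resources or {}))
    buckets = {FakeAsset: defs.assets, FakeSensor: defs.sensors, FakeJob: defs.jobs}
    for value in vars(module).values():
        for kind, bucket in buckets.items():
            if isinstance(value, kind):
                bucket.append(value)
                break  # each object belongs to at most one bucket
    return defs


# Throwaway module standing in for a code location.
demo = types.ModuleType("demo_code_location")
demo.my_asset = FakeAsset()
demo.my_sensor = FakeSensor()
demo.my_job = FakeJob()
demo.unrelated = 42  # ignored: not a recognized definition type

defs = load_fake_definitions_from_module(demo, resources={"dbt": object()})
print(len(defs.assets), len(defs.sensors), len(defs.jobs), sorted(defs.resources))
```

Doing the bucketing inside the loader is also where any coercion step could live, since the loader sees each object before it reaches the returned container.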

Should this thing take in other `Definitions` objects?

Right now, I think no. While Definitions.merge exists, it's obscure and undocumented, and I think most users think of a single Definitions object as being synonymous with a code location.

What's the intended design pattern for using this?

I think the intended use case would be to make one call at the top level of the module for a given code location, and automatically load in all of your dagster defs.

defs = load_definitions_from_module(current_module, resources=..., loggers=...)
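One practical consideration for the pattern above, assuming `current_module` is obtained via `sys.modules[__name__]`: a loader that scans module attributes at call time can only see names bound before the call, so the call presumably belongs at the end of the module. The sketch below demonstrates this with hypothetical stand-ins (`FakeAsset`, `fake_asset_names`, `import_from_source`); it does not use the real loader.

```python
import sys
import types
from typing import List


class FakeAsset:
    """Hypothetical stand-in for an asset definition."""


def fake_asset_names(module: types.ModuleType) -> List[str]:
    """Stand-in loader: names of FakeAsset instances currently bound on the module."""
    return sorted(
        name for name, value in vars(module).items() if isinstance(value, FakeAsset)
    )


# Source of a pretend code-location module that calls the loader mid-file.
SOURCE = """
import sys
first_asset = FakeAsset()
seen = fake_asset_names(sys.modules[__name__])
late_asset = FakeAsset()
"""


def import_from_source(name: str, source: str) -> types.ModuleType:
    """Execute source text as a module registered in sys.modules."""
    module = types.ModuleType(name)
    module.FakeAsset = FakeAsset
    module.fake_asset_names = fake_asset_names
    sys.modules[name] = module
    exec(source, module.__dict__)
    return module


demo = import_from_source("demo_code_location", SOURCE)
print(demo.seen)               # names visible at the time of the mid-file call
print(fake_asset_names(demo))  # names visible once the module has fully executed
```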

How I Tested These Changes

I added a new test which operates on all test specs and calls this function, and another that handles the resource and logger cases.

dpeng817 · 2024-12-17T17:56:52Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

[module-loaders] [rfc] Definitions from module loader #26546 👈 (View in Graphite)
[module-loaders] Delete extra_source_assets param from assets_from_modules #26494 : 9 other dependent PRs (#26484 , #26498 , #26514 and 6 others)
master


schrockn · 2024-12-18T15:52:09Z

I think this is an excellent change. We can use this in components as well. cc: @OwenKephart

schrockn · 2024-12-18T15:52:55Z

I also think passing resources in like this is a good compromise.

yuhan

Approving to unblock the landing. Since this is structured as an internal API, I don’t think the stakes are high enough for us to bikeshed on the names forever.

A couple of open discussions in this stack:

- [edited] I had another open question, but on second thought I think passing resources is smart, especially in a vary-resources-per-env world, where you wouldn't want to blindly load every resource.
  - (Excluding loggers makes sense to me. As a side note, I almost wonder if we could long-term move loggers to be more devops-y, perhaps as part of Components, rather than at the code location definitions level. But this is a separate note that doesn't have to be addressed right now.)
- Naming #1: load_definitions_from_module vs load_defs_from_module: I think definitions is clearer and more descriptive, and it matches load_assets_. Just use the full name.
- Naming #2: Type hints: LoadableAssetObject vs LoadableAssetDef: I vote for Def. I don't think users care about the implementation detail that Def may not equal Union[AssetsDef, AssetSpec, SourceAsset], and neither would I. To me, they all fall under the umbrella of Def, similar to JobDef, ScheduleDef, etc.


schrockn

@dpeng817 when is this getting into master?

dpeng817 · 2024-12-20T13:37:35Z

@schrockn ideally today. There have been a ton of test failures to chase down, but I'll be sure to let you know when it does.

dpeng817 mentioned this pull request Dec 17, 2024

[module-loaders] Genericize object list functionality to take in all sensors, jobs, schedules #26545

Merged

dpeng817 force-pushed the genericize_object_list branch from 4b0a6f9 to 6b95559 Compare December 18, 2024 02:53

dpeng817 force-pushed the dpeng817/defs_from_module branch from b4b713d to c2a380b Compare December 18, 2024 02:53

dpeng817 force-pushed the genericize_object_list branch from 6b95559 to 85e25a2 Compare December 18, 2024 03:35

dpeng817 force-pushed the dpeng817/defs_from_module branch from c2a380b to 6999642 Compare December 18, 2024 03:35

dpeng817 changed the title ~~Definitions from module loader~~ [rfc] Definitions from module loader Dec 18, 2024

dpeng817 requested review from OwenKephart, schrockn and yuhan December 18, 2024 15:48

dpeng817 marked this pull request as ready for review December 18, 2024 15:48

dpeng817 changed the title ~~[rfc] Definitions from module loader~~ [module-loaders] [rfc] Definitions from module loader Dec 18, 2024

dpeng817 force-pushed the genericize_object_list branch from 85e25a2 to 46ad68c Compare December 18, 2024 18:03

dpeng817 force-pushed the dpeng817/defs_from_module branch from 6999642 to 0bf8e09 Compare December 18, 2024 18:03

dpeng817 force-pushed the genericize_object_list branch from 46ad68c to b2b3568 Compare December 18, 2024 20:25

dpeng817 force-pushed the dpeng817/defs_from_module branch from 0bf8e09 to 29d067a Compare December 18, 2024 20:25

dpeng817 force-pushed the genericize_object_list branch from 6cb2f15 to 498fa70 Compare December 19, 2024 01:51

dpeng817 force-pushed the dpeng817/defs_from_module branch from 2ec94b9 to dd4427a Compare December 19, 2024 01:51

dpeng817 force-pushed the genericize_object_list branch from 498fa70 to c8a751e Compare December 19, 2024 02:06

dpeng817 force-pushed the dpeng817/defs_from_module branch from dd4427a to 7cf84ab Compare December 19, 2024 02:06

yuhan approved these changes Dec 19, 2024


dpeng817 force-pushed the genericize_object_list branch from c8a751e to bbc7c21 Compare December 19, 2024 14:19

dpeng817 force-pushed the dpeng817/defs_from_module branch from 7cf84ab to 560ae27 Compare December 19, 2024 14:19

dpeng817 force-pushed the genericize_object_list branch from bbc7c21 to fc5fbc5 Compare December 19, 2024 14:24

dpeng817 force-pushed the dpeng817/defs_from_module branch from 560ae27 to 74be923 Compare December 19, 2024 14:25

dpeng817 force-pushed the genericize_object_list branch from fc5fbc5 to d42506a Compare December 19, 2024 15:54

dpeng817 force-pushed the dpeng817/defs_from_module branch from 74be923 to 2df711a Compare December 19, 2024 15:54

dpeng817 force-pushed the genericize_object_list branch from d42506a to 4591c5c Compare December 19, 2024 16:36

dpeng817 force-pushed the dpeng817/defs_from_module branch from 2df711a to 3d97d0f Compare December 19, 2024 16:37

dpeng817 force-pushed the genericize_object_list branch from 4591c5c to 3c41b3c Compare December 19, 2024 16:40

dpeng817 force-pushed the dpeng817/defs_from_module branch from 3d97d0f to 8140fc9 Compare December 19, 2024 16:40

dpeng817 force-pushed the genericize_object_list branch from 3c41b3c to 429aa3a Compare December 19, 2024 16:41

dpeng817 force-pushed the dpeng817/defs_from_module branch from 8140fc9 to 4067f7c Compare December 19, 2024 16:42

dpeng817 force-pushed the genericize_object_list branch from 429aa3a to ba32822 Compare December 19, 2024 16:53

dpeng817 force-pushed the dpeng817/defs_from_module branch from 4067f7c to 8a3850b Compare December 19, 2024 16:53

dpeng817 force-pushed the genericize_object_list branch from ba32822 to 588f9ee Compare December 19, 2024 16:58

dpeng817 force-pushed the dpeng817/defs_from_module branch from 8a3850b to cb5939d Compare December 19, 2024 16:58

Base automatically changed from genericize_object_list to dpeng817/delete_extra_source_assets December 19, 2024 17:07

Definitions from module loader

6090815

dpeng817 force-pushed the dpeng817/defs_from_module branch from cb5939d to 6090815 Compare December 19, 2024 17:08

dpeng817 merged commit e9365bb into dpeng817/delete_extra_source_assets Dec 19, 2024
1 check was pending

dpeng817 deleted the dpeng817/defs_from_module branch December 19, 2024 17:09

schrockn reviewed Dec 20, 2024
