[module-loaders] [rfc] Definitions from module loader #26546

Merged

Conversation

dpeng817
Contributor

@dpeng817 dpeng817 commented Dec 17, 2024

Summary & Motivation

This PR is a prototype of a top-level API to load dagster definitions from across a module into a single returned Definitions object. It is not intended for landing yet. Just want to align on direction.

Why add this API?

For any user who is bought into a module structure making use of load_assets_from_x, we are currently making their life unnecessarily more difficult by not bringing in other objects that are scoped at module-load time. Those users agree - we've received requests for an API to do that numerous times.

In the absence of a compelling replacement for our current project structure, I think this API is a good stopgap to improve module ergonomics.

Why are resources, loggers, and executor provided as args?

It seems like the most straightforward way to support these objects without some sort of additional magic. Since we force you to provide a key in addition to the class itself, there's not currently a module-scoped pattern that matches how these objects are defined on a Definitions argument. So it seems reasonable to accept these as parameters.

The other approach would be to allow users to configure resources as variables and use the key as a variable. But I figure the more conservative approach here leaves us room to also do that in the future (can imagine some sort of combination step).

dbt = DbtCliResource(...)
# equivalent of
{"dbt": DbtCliResource(...)}
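To make the trade-off concrete, here is a minimal sketch of that alternative, in which each module-level variable name doubles as the resource key. It is not Dagster code: `FakeResource` and `infer_resources_from_module` are hypothetical names invented for illustration, and the throwaway module is built in memory only so the example is self-contained.

```python
from types import ModuleType


class FakeResource:
    """Hypothetical stand-in for a resource class such as DbtCliResource."""

    def __init__(self, label: str) -> None:
        self.label = label


def infer_resources_from_module(module: ModuleType) -> dict:
    """Hypothetical helper: use each module-level variable name as the resource key."""
    return {
        var_name: value
        for var_name, value in vars(module).items()
        if isinstance(value, FakeResource)
    }


# Build a throwaway module in memory so the sketch runs on its own.
example_module = ModuleType("example_module")
example_module.dbt = FakeResource("dbt cli")
example_module.not_a_resource = 42  # ignored: not a FakeResource

inferred = infer_resources_from_module(example_module)
explicit = {"dbt": example_module.dbt}
```

In this sketch the inferred mapping and the explicit dict hold the same key and object, which is the equivalence the snippet above describes. The explicit-argument approach in this PR simply skips the inference step.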

For executor, the fact that we only accept one per module is kind of unique. I don't think a user would define an executor in the same way that they define a sensor, schedule, or job, for example, where the definition is inlined to the module. I think it makes more sense for the user to need to provide this explicitly.

Why does this return a Definitions object?

I don't think there is any reasonable alternative. The existing load_assets_from_x is nice because the whole list it returns can be passed directly to a Definitions object. That would not be the case if we returned a flat list of heterogeneous Dagster objects, unless we added some sort of from_list constructor. But I think returning a Definitions object is more straightforward.

It also gives us the opportunity to paper over some things if we so choose; for example, we can automatically coerce source assets and AssetSpec objects into resolved AssetsDefinitions.
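For illustration, here is a rough sketch of why a flat heterogeneous list needs an extra grouping step before it can populate a container with typed fields. Everything here is a hypothetical stand-in (`FakeAsset`, `FakeSensor`, `FakeJob`, `FakeDefinitions`, and the `from_list` constructor are not Dagster APIs); it only shows the kind of dispatch-by-type a `from_list` constructor would have to do.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FakeAsset:
    name: str


@dataclass(frozen=True)
class FakeSensor:
    name: str


@dataclass(frozen=True)
class FakeJob:
    name: str


@dataclass
class FakeDefinitions:
    """Hypothetical container with one typed field per kind of object."""

    assets: list = field(default_factory=list)
    sensors: list = field(default_factory=list)
    jobs: list = field(default_factory=list)

    @classmethod
    def from_list(cls, objects: list) -> "FakeDefinitions":
        """Sort a flat, mixed list into the typed fields, rejecting unknown types."""
        defs = cls()
        for obj in objects:
            if isinstance(obj, FakeAsset):
                defs.assets.append(obj)
            elif isinstance(obj, FakeSensor):
                defs.sensors.append(obj)
            elif isinstance(obj, FakeJob):
                defs.jobs.append(obj)
            else:
                raise TypeError(f"Unsupported object type: {type(obj).__name__}")
        return defs


mixed = [FakeAsset("orders"), FakeJob("nightly_job"), FakeAsset("customers"), FakeSensor("s3_sensor")]
grouped = FakeDefinitions.from_list(mixed)
```

Returning the container directly, as this PR does, keeps that grouping inside the loader rather than pushing it onto the caller.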

Should this thing take in other Definitions objects?

Right now, I think no. While Definitions.merge exists, it's obscure and not documented, and I think most users think of a single Definitions object as being synonymous with a code location.

What's the intended design pattern for using this?

I think the intended use case is a single call at the top level of the module for a given code location, which automatically loads in all of your Dagster defs.

defs = load_definitions_from_module(current_module, resources=..., loggers=...)
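As a rough end-to-end picture of that call shape, the sketch below scans a module for definition objects while passing resources and loggers through unchanged. It is an assumption-laden stand-in, not the implementation in this PR: `FakeSchedule` and `load_fake_definitions_from_module` are hypothetical names, and the return value is a plain dict rather than a real `Definitions` object.

```python
from types import ModuleType
from typing import Any, Mapping, Optional


class FakeSchedule:
    """Hypothetical stand-in for a module-scoped definition (not a Dagster class)."""

    def __init__(self, name: str) -> None:
        self.name = name


def load_fake_definitions_from_module(
    module: ModuleType,
    resources: Optional[Mapping[str, Any]] = None,
    loggers: Optional[Mapping[str, Any]] = None,
) -> dict:
    """Collect scanned objects from the module; keyed objects come from the caller."""
    schedules = [v for v in vars(module).values() if isinstance(v, FakeSchedule)]
    return {
        "schedules": schedules,
        "resources": dict(resources or {}),
        "loggers": dict(loggers or {}),
    }


# Throwaway in-memory module standing in for a code location's top-level module.
project = ModuleType("project")
project.nightly = FakeSchedule("nightly")
project.helper_constant = 3  # ignored: not a definition object

fake_defs = load_fake_definitions_from_module(project, resources={"dbt": object()})
```

The point of the split is that objects discoverable by type are found by scanning, while objects that need a key are supplied by the caller.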

How I Tested These Changes

I added a new test that runs over all test specs and calls this function, and another that covers the resource and logger cases.

Contributor Author

dpeng817 commented Dec 17, 2024

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@dpeng817 dpeng817 force-pushed the genericize_object_list branch from 4b0a6f9 to 6b95559 Compare December 18, 2024 02:53
@dpeng817 dpeng817 force-pushed the dpeng817/defs_from_module branch from b4b713d to c2a380b Compare December 18, 2024 02:53
@dpeng817 dpeng817 force-pushed the genericize_object_list branch from 6b95559 to 85e25a2 Compare December 18, 2024 03:35
@dpeng817 dpeng817 force-pushed the dpeng817/defs_from_module branch from c2a380b to 6999642 Compare December 18, 2024 03:35
@dpeng817 dpeng817 changed the title Definitions from module loader [rfc] Definitions from module loader Dec 18, 2024
@dpeng817 dpeng817 marked this pull request as ready for review December 18, 2024 15:48
@dpeng817 dpeng817 changed the title [rfc] Definitions from module loader [module-loaders] [rfc] Definitions from module loader Dec 18, 2024
@schrockn
Member

I think this is an excellent change. We can use this in components as well. cc: @OwenKephart

@schrockn
Member

I also think passing resources in like this is a good compromise.

@dpeng817 dpeng817 force-pushed the genericize_object_list branch from 85e25a2 to 46ad68c Compare December 18, 2024 18:03
@dpeng817 dpeng817 force-pushed the dpeng817/defs_from_module branch from 6999642 to 0bf8e09 Compare December 18, 2024 18:03
@dpeng817 dpeng817 force-pushed the genericize_object_list branch from 46ad68c to b2b3568 Compare December 18, 2024 20:25
@dpeng817 dpeng817 force-pushed the dpeng817/defs_from_module branch from 0bf8e09 to 29d067a Compare December 18, 2024 20:25
@dpeng817 dpeng817 force-pushed the genericize_object_list branch from 6cb2f15 to 498fa70 Compare December 19, 2024 01:51
@dpeng817 dpeng817 force-pushed the dpeng817/defs_from_module branch from 2ec94b9 to dd4427a Compare December 19, 2024 01:51
@dpeng817 dpeng817 force-pushed the genericize_object_list branch from 498fa70 to c8a751e Compare December 19, 2024 02:06
@dpeng817 dpeng817 force-pushed the dpeng817/defs_from_module branch from dd4427a to 7cf84ab Compare December 19, 2024 02:06
Contributor

@yuhan yuhan left a comment

Approving to unblock the landing. Since this is structured as an internal API, I don’t think the stakes are high enough for us to bikeshed on the names forever.

A couple of open discussions in this stack:

  • [edited] I had another open question, but on second thought I think passing resources is smart, especially in a vary-resources-per-env world: you wouldn't want to blindly load every resource.
    • (Excluding loggers makes sense to me. As a side note, I wonder whether we could eventually move loggers to be more devops-y, perhaps as part of Components, rather than at the code-location definitions level. That is a separate note that doesn't have to be addressed right now.)
  • Naming #1: load_definitions_from_module vs load_defs_from_module: I think definitions is clearer and more descriptive, and it matches load_assets_. I'd simply use the full name.
  • Naming #2: Type hints: LoadableAssetObject vs LoadableAssetDef: I vote for Def. I don't think users care about the implementation detail that Def may not equal Union[AssetsDef, AssetSpec, SourceAsset], and neither would I. To me, they all fall under the umbrella of Def, similar to JobDef, ScheduleDef, etc.

@dpeng817 dpeng817 force-pushed the genericize_object_list branch from c8a751e to bbc7c21 Compare December 19, 2024 14:19
@dpeng817 dpeng817 force-pushed the dpeng817/defs_from_module branch from 7cf84ab to 560ae27 Compare December 19, 2024 14:19
@dpeng817 dpeng817 force-pushed the genericize_object_list branch from bbc7c21 to fc5fbc5 Compare December 19, 2024 14:24
@dpeng817 dpeng817 force-pushed the dpeng817/defs_from_module branch from 560ae27 to 74be923 Compare December 19, 2024 14:25
@dpeng817 dpeng817 force-pushed the genericize_object_list branch from fc5fbc5 to d42506a Compare December 19, 2024 15:54
@dpeng817 dpeng817 force-pushed the dpeng817/defs_from_module branch from 74be923 to 2df711a Compare December 19, 2024 15:54
@dpeng817 dpeng817 force-pushed the genericize_object_list branch from d42506a to 4591c5c Compare December 19, 2024 16:36
@dpeng817 dpeng817 force-pushed the dpeng817/defs_from_module branch from 2df711a to 3d97d0f Compare December 19, 2024 16:37
@dpeng817 dpeng817 force-pushed the genericize_object_list branch from 4591c5c to 3c41b3c Compare December 19, 2024 16:40
@dpeng817 dpeng817 force-pushed the dpeng817/defs_from_module branch from 3d97d0f to 8140fc9 Compare December 19, 2024 16:40
@dpeng817 dpeng817 force-pushed the genericize_object_list branch from 3c41b3c to 429aa3a Compare December 19, 2024 16:41
@dpeng817 dpeng817 force-pushed the dpeng817/defs_from_module branch from 8140fc9 to 4067f7c Compare December 19, 2024 16:42
@dpeng817 dpeng817 force-pushed the genericize_object_list branch from 429aa3a to ba32822 Compare December 19, 2024 16:53
@dpeng817 dpeng817 force-pushed the dpeng817/defs_from_module branch from 4067f7c to 8a3850b Compare December 19, 2024 16:53
@dpeng817 dpeng817 force-pushed the genericize_object_list branch from ba32822 to 588f9ee Compare December 19, 2024 16:58
@dpeng817 dpeng817 force-pushed the dpeng817/defs_from_module branch from 8a3850b to cb5939d Compare December 19, 2024 16:58
Base automatically changed from genericize_object_list to dpeng817/delete_extra_source_assets December 19, 2024 17:07
@dpeng817 dpeng817 force-pushed the dpeng817/defs_from_module branch from cb5939d to 6090815 Compare December 19, 2024 17:08
@dpeng817 dpeng817 merged commit e9365bb into dpeng817/delete_extra_source_assets Dec 19, 2024
1 check was pending
@dpeng817 dpeng817 deleted the dpeng817/defs_from_module branch December 19, 2024 17:09
dpeng817 added a commit that referenced this pull request Dec 19, 2024
dpeng817 added a commit that referenced this pull request Dec 19, 2024
Member

@schrockn schrockn left a comment

@dpeng817 when is this getting into master?

@dpeng817
Contributor Author

dpeng817 commented Dec 20, 2024

@schrockn ideally today. There's been a ton of test failures to chase down. But I'll be sure to let you know when it does
