-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[module-loaders] [rfc] Definitions from module loader #26546
[module-loaders] [rfc] Definitions from module loader #26546
Conversation
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking. |
4b0a6f9
to
6b95559
Compare
b4b713d
to
c2a380b
Compare
6b95559
to
85e25a2
Compare
c2a380b
to
6999642
Compare
I think this an excellent change. We can use this in components as well. cc: @OwenKephart |
I also think passing resources in like this is a good compromise. |
85e25a2
to
46ad68c
Compare
6999642
to
0bf8e09
Compare
46ad68c
to
b2b3568
Compare
0bf8e09
to
29d067a
Compare
6cb2f15
to
498fa70
Compare
2ec94b9
to
dd4427a
Compare
498fa70
to
c8a751e
Compare
dd4427a
to
7cf84ab
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving to unblock the landing. Since this is structured as an internal API, I don’t think the stakes are high enough for us to bikeshed on the names forever.
A couple of open discussions in this stack:
- [edited] I had another open question but on a second thought, I think passing resources is smart, esp in a vary-resources-per-env world, you wouldn't want to blindly load every resource.
- (Excluding loggers makes sense to me. As a side note, I almost wonder if we could long-term move loggers to be more devops-y - perhaps as part of Components - rather than at the code location definitions level. But this is just a separate note that doesn't have to be addressed right now)
- Naming Separate Pandas Dataframe Solid into Two Sources #1: load_definitions_from_module vs load_defs_from_module: I think definitions is clearer and more descriptive, and it matches load_assets_. just simply full name.
- Naming Purge qhp from git history #2: Type hints:
LoadableAssetObject
vsLoadableAssetDef
: I vote forDef
. I don’t think users care about the implementation details thatDef
may not equalUnion[AssetsDef, AssetSpec, SourceAsset]
and neither would I. To me, they all fall under the umbrella ofDef
, similar toJobDef
,ScheduleDef
, etc.
c8a751e
to
bbc7c21
Compare
7cf84ab
to
560ae27
Compare
bbc7c21
to
fc5fbc5
Compare
560ae27
to
74be923
Compare
fc5fbc5
to
d42506a
Compare
74be923
to
2df711a
Compare
d42506a
to
4591c5c
Compare
2df711a
to
3d97d0f
Compare
4591c5c
to
3c41b3c
Compare
3d97d0f
to
8140fc9
Compare
3c41b3c
to
429aa3a
Compare
8140fc9
to
4067f7c
Compare
429aa3a
to
ba32822
Compare
4067f7c
to
8a3850b
Compare
ba32822
to
588f9ee
Compare
8a3850b
to
cb5939d
Compare
cb5939d
to
6090815
Compare
e9365bb
into
dpeng817/delete_extra_source_assets
## Summary & Motivation This PR is a prototype of a top-level API to load dagster definitions from across a module into a single returned `Definitions` object. It is not intended for landing yet. Just want to align on direction. ### Why add this API? For any user who is bought into a module structure making use of `load_assets_from_x`, we are currently making their life unnecessarily more difficult by not bringing in other objects that are scoped at module-load time. Those users agree - we've received requests for an API to do that numerous times. In absence of a compelling replacement for our current project structure, I think the existence of this API is a good stopgap to improve module ergonomics. ### Why are resources, loggers, and executor provided as args? It seems like the most straightforward way to support these objects without some sort of additional magic. Since we force you to provide a key in addition to the class itself, there's not currently a module-scoped pattern that matches how these objects are defined on a `Definitions` argument. So it seems reasonable to accept these as parameters. The other approach would be to allow users to configure resources as variables and use the key as a variable. But I figure the more conservative approach here leaves us room to also do that in the future (can imagine some sort of combination step). ```python dbt = DbtCliResource(...) # equivalent of {"dbt": DbtCliResource(...)} ``` For executor, the fact that we only accept one per module is kind of unique. I don't think a user would define an executor in the same way that they define a sensor, schedule, or job, for example, where the definition is inlined to the module. I think it makes more sense for the user to need to provide this explicitly. ### Why does this return a Definitions object I don't think there is any reasonable alternative. The existing `load_assets_from_x` is nice because that whole list can be provided directly to a `Definitions` object - this is not the case for if we provided a flat list of heterogeneous types of Dagster objects - this could not be provided directly to a `Definitions` object. Unless we added some sort of `from_list` constructor or something. But I think this is more straightforward. It also gives us the opportunity to potentially paper over some stuff; if we so choose - we can, for example, automatically coerce source assets and AssetSpec objects into resolved AssetsDefinitions. ### Should this thing take in other `Definitions` objects? Right now, I think no. While `Definitions.merge` exists, it's obscured and not documented, I think most users think of a single `Definitions` object as being synonymous with a code location. ### What's the intended design pattern for using this? I think the intended use case for this would be to provide one call at the top level of the module of a given code location; and automatically load in all of your dagster defs. ``` defs = load_definitions_from_module(current_module, resources=..., loggers=...) ``` ## How I Tested These Changes I added a new test which operates on all test specs and calls this fxn, and also one for handling the resource and logger cases.
## Summary & Motivation This PR is a prototype of a top-level API to load dagster definitions from across a module into a single returned `Definitions` object. It is not intended for landing yet. Just want to align on direction. ### Why add this API? For any user who is bought into a module structure making use of `load_assets_from_x`, we are currently making their life unnecessarily more difficult by not bringing in other objects that are scoped at module-load time. Those users agree - we've received requests for an API to do that numerous times. In absence of a compelling replacement for our current project structure, I think the existence of this API is a good stopgap to improve module ergonomics. ### Why are resources, loggers, and executor provided as args? It seems like the most straightforward way to support these objects without some sort of additional magic. Since we force you to provide a key in addition to the class itself, there's not currently a module-scoped pattern that matches how these objects are defined on a `Definitions` argument. So it seems reasonable to accept these as parameters. The other approach would be to allow users to configure resources as variables and use the key as a variable. But I figure the more conservative approach here leaves us room to also do that in the future (can imagine some sort of combination step). ```python dbt = DbtCliResource(...) # equivalent of {"dbt": DbtCliResource(...)} ``` For executor, the fact that we only accept one per module is kind of unique. I don't think a user would define an executor in the same way that they define a sensor, schedule, or job, for example, where the definition is inlined to the module. I think it makes more sense for the user to need to provide this explicitly. ### Why does this return a Definitions object I don't think there is any reasonable alternative. The existing `load_assets_from_x` is nice because that whole list can be provided directly to a `Definitions` object - this is not the case for if we provided a flat list of heterogeneous types of Dagster objects - this could not be provided directly to a `Definitions` object. Unless we added some sort of `from_list` constructor or something. But I think this is more straightforward. It also gives us the opportunity to potentially paper over some stuff; if we so choose - we can, for example, automatically coerce source assets and AssetSpec objects into resolved AssetsDefinitions. ### Should this thing take in other `Definitions` objects? Right now, I think no. While `Definitions.merge` exists, it's obscured and not documented, I think most users think of a single `Definitions` object as being synonymous with a code location. ### What's the intended design pattern for using this? I think the intended use case for this would be to provide one call at the top level of the module of a given code location; and automatically load in all of your dagster defs. ``` defs = load_definitions_from_module(current_module, resources=..., loggers=...) ``` ## How I Tested These Changes I added a new test which operates on all test specs and calls this fxn, and also one for handling the resource and logger cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dpeng817 when is this getting into master?
@schrockn ideally today. There's been a ton of test failures to chase down. But I'll be sure to let you know when it does |
Summary & Motivation
This PR is a prototype of a top-level API to load dagster definitions from across a module into a single returned
Definitions
object. It is not intended for landing yet. Just want to align on direction.Why add this API?
For any user who is bought into a module structure making use of
load_assets_from_x
, we are currently making their life unnecessarily more difficult by not bringing in other objects that are scoped at module-load time. Those users agree - we've received requests for an API to do that numerous times.In absence of a compelling replacement for our current project structure, I think the existence of this API is a good stopgap to improve module ergonomics.
Why are resources, loggers, and executor provided as args?
It seems like the most straightforward way to support these objects without some sort of additional magic. Since we force you to provide a key in addition to the class itself, there's not currently a module-scoped pattern that matches how these objects are defined on a
Definitions
argument. So it seems reasonable to accept these as parameters.The other approach would be to allow users to configure resources as variables and use the key as a variable. But I figure the more conservative approach here leaves us room to also do that in the future (can imagine some sort of combination step).
For executor, the fact that we only accept one per module is kind of unique. I don't think a user would define an executor in the same way that they define a sensor, schedule, or job, for example, where the definition is inlined to the module. I think it makes more sense for the user to need to provide this explicitly.
Why does this return a Definitions object
I don't think there is any reasonable alternative. The existing
load_assets_from_x
is nice because that whole list can be provided directly to aDefinitions
object - this is not the case for if we provided a flat list of heterogeneous types of Dagster objects - this could not be provided directly to aDefinitions
object. Unless we added some sort offrom_list
constructor or something. But I think this is more straightforward.It also gives us the opportunity to potentially paper over some stuff; if we so choose - we can, for example, automatically coerce source assets and AssetSpec objects into resolved AssetsDefinitions.
Should this thing take in other
Definitions
objects?Right now, I think no. While
Definitions.merge
exists, it's obscured and not documented, I think most users think of a singleDefinitions
object as being synonymous with a code location.What's the intended design pattern for using this?
I think the intended use case for this would be to provide one call at the top level of the module of a given code location; and automatically load in all of your dagster defs.
How I Tested These Changes
I added a new test which operates on all test specs and calls this fxn, and also one for handling the resource and logger cases.