Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[module-loaders] Delete extra_source_assets param from assets_from_modules #26494

Merged
merged 12 commits into from
Dec 20, 2024

Conversation

dpeng817
Copy link
Contributor

@dpeng817 dpeng817 commented Dec 13, 2024

This PR was used to accumulate the upstack.

Summary & Motivation

How I Tested These Changes

Changelog

Insert changelog entry or delete this section.

Copy link
Contributor Author

dpeng817 commented Dec 13, 2024

This stack of pull requests is managed by Graphite. Learn more about stacking.

Copy link
Contributor

@OwenKephart OwenKephart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah i see -- sadly this is a public API so we can't actually delete this arg, right?

we could at least make it a hidden param

@dpeng817
Copy link
Contributor Author

@OwenKephart i don't think this is a public api actually. Searched for reference at top level and couldn't find. Was added pre 1.0

@OwenKephart
Copy link
Contributor

@dpeng817 huh I swear I tried from dagster import assets_from_modules last week and it worked but maybe I mixed it up with something else -- seems good to ditch

@dpeng817 dpeng817 marked this pull request as ready for review December 16, 2024 18:43
@OwenKephart OwenKephart self-requested a review December 16, 2024 18:50
Copy link

github-actions bot commented Dec 18, 2024

Deploy preview for dagster-docs ready!

Preview available at https://dagster-docs-fmnqhegjp-elementl.vercel.app
https://dpeng817-delete-extra-source-assets.dagster.dagster-docs.io

Direct link to changed pages:

@dpeng817 dpeng817 force-pushed the dpeng817/delete_extra_source_assets branch from 2219a42 to 2853ef1 Compare December 18, 2024 18:36
@dpeng817 dpeng817 closed this Dec 18, 2024
@dpeng817 dpeng817 reopened this Dec 18, 2024
@dpeng817 dpeng817 force-pushed the dpeng817/delete_extra_source_assets branch 4 times, most recently from 8e93c2e to 0291fcc Compare December 19, 2024 00:27
@dpeng817 dpeng817 force-pushed the dpeng817/delete_extra_source_assets branch 7 times, most recently from 1b50ebe to 5ddc747 Compare December 19, 2024 16:40
## Summary & Motivation
This code path was highly crufty - code smell here is "iterating through
big list and doing a ton of things at once". Solution is to create a
lookup table of cacheable properties and imperatively call properties as
you need them (stole this pattern from @schrockn when we were working on
airlift stuff) I think the code is a lot clearer now.

I also got rid of an error message we were devoting a lot of lines to
that felt very outdated; nobody is using with_resources, and
repository_def is misleading.

I think we also end up catching a few hidden error states; for example,
we weren't checking assetsdefs against sourceassets with the same key,
sourceasset collision checking felt more like a side effect.

## How I Tested These Changes
Existing tests
## Summary & Motivation
We can further simplify the load_asset_from_x code path by making the
asset checks loader return AssetsDefinitions, and changing the
assertions to just make sure that the returned assets only contain
checks.

## How I Tested These Changes
Altered existing tests
#26529)

## Summary & Motivation
This was one of the most confusing parts of the original code path; that
asset check key changing was happening _implicitly_ within
with_attributes when you changed output_asset_keys.

Instead, provide a top-level, explicit way to remap asset check keys. I
think the old behavior was highly mysterious. I found myself questioning
what the heck was happening. This is much more; "what it says on the
tin."

## How I Tested These Changes
Existing tests.
## Summary & Motivation
Consolidate module loader code and tests into a single directory for
easier reuse between them.
)

## Summary & Motivation
The load_assets_from_x file was getting quite large, split it up into
different files for easier reuse across the module.
## Summary & Motivation
The object list utils were previously defining behavior mostly in terms
of assets. Genericize the utilities to instead refer to "DagsterObjects"
where appropriate; in preparation for handling other types of dagster
definitions.

## How I Tested These Changes
Existing tests
@dpeng817 dpeng817 force-pushed the dpeng817/delete_extra_source_assets branch from 0405939 to ca38954 Compare December 19, 2024 16:58
…sensors, jobs, schedules (#26545)

## Summary & Motivation
Actually genericize `ModuleScopedDagsterObjects` to handle taking in all
types of Dagster objects - sensors, schedules, and jobs. I explicitly
left out resources and loggers, because I don't think it makes sense to
scoop those up at module load - since they need to be associated with a
key.

## How I Tested These Changes
Added a new test file `test_module_loaders` which scaffolds out a module
fake for a given test spec, and ensures that either an error is thrown
or the correct number of attributes are accessible on the underlying
list.
@dpeng817 dpeng817 force-pushed the dpeng817/delete_extra_source_assets branch from e9365bb to f160acc Compare December 19, 2024 19:28
## Summary & Motivation
This PR is a prototype of a top-level API to load dagster definitions
from across a module into a single returned `Definitions` object. It is
not intended for landing yet. Just want to align on direction.

### Why add this API?
For any user who is bought into a module structure making use of
`load_assets_from_x`, we are currently making their life unnecessarily
more difficult by not bringing in other objects that are scoped at
module-load time. Those users agree - we've received requests for an API
to do that numerous times.

In absence of a compelling replacement for our current project
structure, I think the existence of this API is a good stopgap to
improve module ergonomics.

### Why are resources, loggers, and executor provided as args?
It seems like the most straightforward way to support these objects
without some sort of additional magic. Since we force you to provide a
key in addition to the class itself, there's not currently a
module-scoped pattern that matches how these objects are defined on a
`Definitions` argument. So it seems reasonable to accept these as
parameters.

The other approach would be to allow users to configure resources as
variables and use the key as a variable. But I figure the more
conservative approach here leaves us room to also do that in the future
(can imagine some sort of combination step).
```python
dbt = DbtCliResource(...)
# equivalent of
{"dbt": DbtCliResource(...)}
```

For executor, the fact that we only accept one per module is kind of
unique. I don't think a user would define an executor in the same way
that they define a sensor, schedule, or job, for example, where the
definition is inlined to the module. I think it makes more sense for the
user to need to provide this explicitly.

### Why does this return a Definitions object
I don't think there is any reasonable alternative. The existing
`load_assets_from_x` is nice because that whole list can be provided
directly to a `Definitions` object - this is not the case for if we
provided a flat list of heterogeneous types of Dagster objects - this
could not be provided directly to a `Definitions` object. Unless we
added some sort of `from_list` constructor or something. But I think
this is more straightforward.

It also gives us the opportunity to potentially paper over some stuff;
if we so choose - we can, for example, automatically coerce source
assets and AssetSpec objects into resolved AssetsDefinitions.

### Should this thing take in other `Definitions` objects?
Right now, I think no. While `Definitions.merge` exists, it's obscured
and not documented, I think most users think of a single `Definitions`
object as being synonymous with a code location.

### What's the intended design pattern for using this?
I think the intended use case for this would be to provide one call at
the top level of the module of a given code location; and automatically
load in all of your dagster defs.

```
defs = load_definitions_from_module(current_module, resources=..., loggers=...)
```

## How I Tested These Changes
I added a new test which operates on all test specs and calls this fxn,
and also one for handling the resource and logger cases.
@dpeng817 dpeng817 force-pushed the dpeng817/delete_extra_source_assets branch from f160acc to cf8c840 Compare December 19, 2024 20:10
@dpeng817 dpeng817 merged commit 802ef32 into master Dec 20, 2024
1 of 2 checks passed
@dpeng817 dpeng817 deleted the dpeng817/delete_extra_source_assets branch December 20, 2024 15:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants