Replies: 13 comments 32 replies
-
Will there be some sort of functionality to replace multiple repositories in a
load_from:
- python_package: my_first_package
- python_package: my_second_package
In other words, how is this fan-out supported in a local OSS instance: Since a
-
If we have a code location with hundreds of jobs, it’s unclear to me what the recommended mechanism is for grouping them in dagit with this change. Repositories may have an odd name, but they work very well for that purpose and having the “folders” in dagit is great. Should we split into multiple python modules/code locations to keep this behavior? Could those still be served by one user code deployment? (Apologies if this was answered and I just missed it)
-
This looks very promising! I like the simplification a lot! One of my first questions is how this will impact projects where there are multiple different python environments. I know there's some work that's been done on enabling custom docker images and environments in dagster. E.g., assuming we use poetry managed dependencies, a tree like:
How would I instruct dagster to look at domain_a pyproject.toml and domain_b pyproject.toml? From my understanding of the proposal we only cover pyproject.toml -> domain_a, domain_b and not "domain_a->pyproject.toml, domain_b->pyproject.toml"
-
At a glance it makes sense to me that How should we handle users including two
-
I really like this. The reduction in concepts is great and think the IIUC a top level
# my_project/__init__.py
defs = Definitions(
    assets=[*load_assets_from_dbt_project(), *load_assets_from_some_warehouse()],
    resources={"my_resource": expensive_computation()},
)
Will this make In general I think it is good to have imports not do too much and have a separate phase for one time initialization. It allows tools or repls to import code and introspect it. One simple alternative is to use a function instead:
# my_project/__init__.py
def defs():
    return Definitions(...)
Another option may be to separate the construction of the
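To make the import-time concern above concrete, here is a minimal stdlib-only sketch of the function-based alternative. FakeDefinitions and expensive_computation are hypothetical stand-ins, not Dagster APIs; the only point is when the expensive call runs.

```python
from dataclasses import dataclass, field

calls = []  # records each time the expensive setup runs


def expensive_computation():
    """Hypothetical stand-in for slow, one-time initialization."""
    calls.append("ran")
    return object()


@dataclass
class FakeDefinitions:
    """Hypothetical stand-in for dagster.Definitions (not the real class)."""

    resources: dict = field(default_factory=dict)


def defs():
    # Nothing expensive happens until a tool explicitly calls defs().
    return FakeDefinitions(resources={"my_resource": expensive_computation()})


# Defining (or importing) the function does not trigger the setup.
calls_before = len(calls)

# Only the explicit call performs the one-time initialization.
loaded = defs()
calls_after = len(calls)
```

With a module-level assignment instead, expensive_computation would run as a side effect of the import itself, which is the behavior the comment is asking about.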
-
This sounds like a great change. The new proposal covers our (pretty simple) use cases, and the UI simplifications do seem easier to navigate.
-
How would loading assets with different resource configurations for the same keys work in this model? Examples that we have of doing this now are things like:
-
Thanks for working on the mental load! I think it will be important for Dagster adoption, as its inclination to introduce innovative and powerful concepts has the negative side effect of creating a steeper learning curve. It reminds me when you "hid"
In our on-premise environment, we have been using workspace for a very clear data side reason: separating the definitions used for two strictly separated VLANs that handle different assets and resources. We have one instance of Dagster running in each VLAN, each instance loading only what's relevant to its VLAN. Each VLAN's code is also grouped inside two distinct Dagster repositories. So in our case a Dagster workspace and a Dagster repository are equivalent, therefore one of the two levels is redundant, but we still need one to separate both sides.
While the assets are separated by directories, we keep both code bases in a single git repository and a single Python package because many other aspects are shared, including a script to start all the orchestration services. In this script, depending on the value of an environment variable, we will start Dagster by pointing to one of the two workspaces:
if [[ $VLAN == "special" ]]; then
    workspace_path="special_workspace.yaml"
else
    workspace_path="workspace.yaml"
fi
dagster-daemon run -w "$workspace_path" &
dagit -w "$workspace_path"
How could we handle our use case with your proposal?
-
Something particularly nice about this proposal is the subsuming of
-
Quick update. We've decided to keep the
-
I like the sound of it, even though I'd agree with a few other comments talking about the added friction in having only one repository/definitions object per gRPC server. I especially like having all the information in the
-
@schrockn, what are your thoughts on how this will affect the Kubernetes deployment model?
-
This feature is now live and #11167 and our examples and most of our content now is written in terms of Quick notes on changes as a result of this RFC.
-
Summary
Dagster has some duplicative and unnecessary concepts, which can introduce friction to the onboarding process and add to the cognitive load of existing users. Workspaces and repositories stick out as problem areas. Workspaces don't do that much (they are a list of code locations) but occupy a lot of headspace in our UI and tools. Repositories can cause immense confusion as (1) they collide with the notion of GitHub repositories but have a different scope, leading to confusing states where you can have multiple Dagster repositories within a single GitHub repository, and (2) for most users they did not provide much value beyond the code location organizational unit, leaving them with the question of why they even exist in the first place.
For these reasons, we are eliminating the concepts of Workspace and Repository from the UI and adding a new entry point API, Definitions, to replace @repository. There will be a single layer of hierarchy of code locations in our tools.
Note: this is not a breaking change and will not require any code changes from existing users.
API
Instead of the @repository decorator to group Dagster definitions together, there will be a Definitions class. Dagster tools will autodiscover a Definitions object in a module when it is set to the defs variable.
Example:
Rather than the previous API:
A few notable changes:
- Definitions has typed, named arguments. This is better for discovery, documentation, and robustness.
- Definitions takes a top-level resources argument, which is automatically applied to assets, rather than requiring the use of with_resources.
- There is one Definitions object per code location, whereas before there could be multiple repositories in the code location.
- Definitions can accept raw objects, not just resource definitions. Instead of wrapping a value in a resource definition, it can be passed directly.
Workspace
A workspace is defined as a set of code locations. In our UIs it served very little purpose and did little else other than confuse users. The workspace.yaml file had some real utility in local development, as it allowed a user to use the CLI or dagit without any additional arguments.
Our project templates now include a pyproject.toml file. This file is part of the Python standard (see PEP). Its primary purpose is to replace setup.py, but it is increasingly standard to use this file to configure any tool that interacts with your Python code, and it has an extensibility point designed for that purpose. We propose using that file instead of workspace.yaml to enable automatic loading.
Whereas before one would have a workspace.yaml file, now you will have a section in pyproject.toml.
For OSS deployments the workspace was used to point Dagit at gRPC servers. This will continue to be the supported way to do that.
What about existing code
You will not need to change any code to adjust to this change. The UI will treat existing repositories as a code location with the name repo_name@location_name.
If you have multiple repositories within a single code location, there will be some potentially counterintuitive behavior in our operational tools. For example, when you click "reload" on a code location from the Code Locations tab, it will in fact reload two entries in this table as they share a process.
We do recommend migrating code, as we will be moving our documentation and content to refer to this new construct and will avoid using the terms Workspace and Repository in documentation and our tools.
If you want to hide the repository name in the UI without migrating to Definitions, you can simply set your repository name to "__repository__" if the repository is the only one in the code location.
There are no immediate plans to eliminate @repository.
Alternatives considered
Definitions set to variable rather than register_definitions
Right now the API is to set the constructed Definitions object on a variable at module scope called defs. Our tools search for this name and only this name.
Alternative approaches considered included a function that registered definitions.
While this had some advantages, in the end we found the registration of global state to be, for lack of a better term, distastefully stateful.
However, that approach does avoid the error case where one forgets to assign the Definitions object to a variable and it silently fails.
CodeLocation rather than Definitions
We considered creating an object that represented a code location directly. However, this introduced the concept of code location very early in tutorials. Additionally, some of the information about a code location is determined by the tools that load it. The mental model here is that the Definitions object is the set of definitions that will be loaded within a code location. In the UI a code location is a live-running process, not just a set of definitions.
Conclusion
We're excited to move forward with this change as we believe it removes substantial cognitive load at low cost. We're looking for feedback about this change. We plan on shipping it (with feedback incorporated) Dec 8th.