Love it! Been using dataclasses and such a lot in our dagster repo internally and this definitely kicks that sort of pattern up a level! Can't wait to use it!
-
I really like this change, feels very natural as the typing system also pushed me towards using pydantic for my inputs and outputs, so this extension really helps. Especially like the added visibility over resource configs and types. Always felt a little odd to specify resource keys but not have to specify the class/superclass for them. My only question is related to a use case I've had for a while that led me down the path of some interesting workarounds: will this new system include first-class support for supplying default configs to assets without having to go the Editing to add: As a followup, are io_managers also going to be defined within the asset parameters, something along the lines of:
-
This looks amazing! We are heavy users of pydantic at our company, so this is one of the things we've been wishing for :)
-
There's a lot to like here! What happens for assets which will still use Relatedly - if I have a signature like
-
This RFC describes config changes for user-side code, particularly for OpConfigs and Resources. What about the configuration objects passed in other places? For example, JobDefinitions accept a

Another type of config passed in through dictionaries is config for libraries. For example, to set up the container configuration with dagster-k8s used in an asset_job, the configuration gets passed in via `tags`:

```python
from dagster import define_asset_job

precovery_index_job = define_asset_job(
    "k8s_asset_name",
    selection="k8s_asset",
    tags={
        "dagster-k8s/config": {
            "container_config": {
                "resources": {
                    "requests": {
                        "cpu": "2000m",
                        "memory": "10Gi",
                        "ephemeral-storage": "1T",
                    },
                },
                "volume_mounts": [{"mount_path": "/tmp/run/", "name": "run-volume"}],
            },
            "pod_spec_config": {
                "volumes": [{"name": "run-volume", "empty_dir": {"size_limit": "1T"}}],
            },
        }
    },
)
```

Will the schemas for libraries like dagster-k8s be definable with this new system? That would be really really really helpful for clarifying interfaces.
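To picture what a typed schema could buy here, the sketch below describes the same tag payload with standard-library dataclasses (standing in for the Pydantic models the RFC uses) and serializes it back into the nested dict. It is dependency-free and hypothetical: `K8sRunConfig`, `ContainerConfig`, `PodSpecConfig`, and `VolumeMount` are illustrative names, not a dagster-k8s API.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List

# Hypothetical typed schema for the "dagster-k8s/config" tag above.
# Illustrative only: these classes are not part of dagster-k8s.


@dataclass
class VolumeMount:
    mount_path: str
    name: str


@dataclass
class ContainerConfig:
    resources: Dict[str, Dict[str, str]] = field(default_factory=dict)
    volume_mounts: List[VolumeMount] = field(default_factory=list)


@dataclass
class PodSpecConfig:
    volumes: List[dict] = field(default_factory=list)


@dataclass
class K8sRunConfig:
    container_config: ContainerConfig = field(default_factory=ContainerConfig)
    pod_spec_config: PodSpecConfig = field(default_factory=PodSpecConfig)

    def to_tags(self) -> Dict[str, dict]:
        # asdict() recurses into nested dataclasses (including those inside
        # lists), yielding the plain nested dict the tag value expects.
        return {"dagster-k8s/config": asdict(self)}


run_volume_config = K8sRunConfig(
    container_config=ContainerConfig(
        resources={"requests": {"cpu": "2000m", "memory": "10Gi", "ephemeral-storage": "1T"}},
        volume_mounts=[VolumeMount(mount_path="/tmp/run/", name="run-volume")],
    ),
    pod_spec_config=PodSpecConfig(
        volumes=[{"name": "run-volume", "empty_dir": {"size_limit": "1T"}}],
    ),
)
tags = run_volume_config.to_tags()
```

A typo such as `VolumeMount(mount_pth=...)` fails with a `TypeError` when the object is constructed, and editors can autocomplete the field names.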
-
These are super exciting changes! In PUDL we're currently converting our existing pydantic models to use dagster config types. This works but would be much more straightforward if dagster objects could be configured using pydantic classes. @benpankow you mentioned potentially adding validation methods to the Config classes. If the Config classes are now pydantic classes, can we validate the config using the standard pydantic validators?
-
This looks great! This makes it much plainer how asset and op/job files should be written (import a resource --> define a config class --> specify asset parameters --> write op job). I can also see this leading to much more use of configuration in general, which I have found challenging to work with previously. The more I lean on the configuration API, the more I expect it will be useful to view what is being passed implicitly at runtime. This might be outside the scope of this proposal, but it would be useful to also have:
Great work!
-
I was hoping for more advanced config validation functionality (we currently use an in-house dataclass-to-dagster-config conversion function to achieve the same kind of functionality described here). Things like min/max value constraints, or even fully custom validation functions on configuration. We currently do this as a first step in our workflows, but it would be so much nicer if problems were already signalled to the user before the workflow can even be launched.
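A minimal, dependency-free sketch of the range checks described above, using a standard-library dataclass whose `__post_init__` rejects out-of-range values when the config object is constructed. `ScrapeConfig` and its bounds are hypothetical. Pydantic, which the RFC's `Config` classes build on, expresses the same kind of constraint declaratively with `Field(ge=..., le=...)`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScrapeConfig:
    """Hypothetical config with range-checked fields."""

    batch_size: int
    retry_count: int = 3

    def __post_init__(self) -> None:
        # Runs at construction time, i.e. before anything is launched with it.
        if not 1 <= self.batch_size <= 10_000:
            raise ValueError(f"batch_size must be between 1 and 10000, got {self.batch_size}")
        if not 0 <= self.retry_count <= 10:
            raise ValueError(f"retry_count must be between 0 and 10, got {self.retry_count}")
```

`ScrapeConfig(batch_size=500)` succeeds, while `ScrapeConfig(batch_size=0)` raises a `ValueError` naming the offending field.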
-
Awesome news! I'm excited to try the new "Pythonic config" and "Pythonic resources". It's great to see improvements to the configuration and resources systems that simplify things and improve ergonomics and observability. Thanks for your hard work!
-
I have a suggestion for improvement regarding error messages for invalid configs.
-
I also noticed that there was no mention of how inputs would be configured with the new config API, and I couldn't find any support for inputs in the `RunConfig` class. Since we use inputs for graphs and configure them when triggering jobs (e.g.
-
```python
from dagster import Config, asset


class AnAssetConfig(Config):
    a_string: str
    an_int: int


@asset(config=AnAssetConfig)  # <=== specify the config class here in some way?
def an_asset(a_string: str, an_int: int):
    ...
```

Especially if resources are now just viewed as class instances that can be passed as arguments, one can further decouple business logic from dagster itself and simplify testing. (Admittedly that seems easy for users to implement themselves with a new decorator.)
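Taking up the last point, a decorator that unpacks a config object into keyword arguments fits in a few lines. The sketch below is hypothetical and framework-free: it uses a standard-library dataclass instead of Dagster's `Config`, and `unpack_config` is not a Dagster API. It stops short of `@asset` integration, which would likely also need the wrapper to expose a `config`-annotated parameter that Dagster can discover.

```python
import functools
from dataclasses import dataclass, fields


def unpack_config(func):
    """Hypothetical decorator: call `func` with the config object's fields
    as keyword arguments instead of with the config object itself."""

    @functools.wraps(func)
    def wrapper(config):
        kwargs = {f.name: getattr(config, f.name) for f in fields(config)}
        return func(**kwargs)

    return wrapper


@dataclass
class AnAssetConfig:
    a_string: str
    an_int: int


@unpack_config
def an_asset(a_string: str, an_int: int) -> str:
    # Plain business logic with no framework types, easy to unit test.
    return a_string * an_int
```

The undecorated logic stays testable on its own, while the wrapper adapts it to a single-config calling convention.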
-
Is there a way to partially configure the new Pythonic resources, or to override just a piece of their configuration at launch? Previously I was using `.configured()` and passing most of what I needed in. I tried `.configure_at_launch()` with most of my values passed through, assuming it would work like `configured()` did on old-style resources, but now when I go to the launchpad I get nothing, and if I try to override just the remaining configuration value, Dagit asks me to reconfigure everything.
-
When would the `validate` method be run for `Validated` resources? I'm thinking probably at resource initialization time, to account for secret rotation between deployment and runs, etc., but I could also see it happening at code location loading time. Either way, I think it's important to have it specified. If it's at resource initialization time, then I think that also makes it a good candidate for establishing connections, creating DB engines, or otherwise warming up resources via `functools.cached_property` or other methods, but then `validate` seems maybe too specific? Am I thinking along the right track, or would we want another extension point for those sorts of things?
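To make the warm-up idea concrete, here is a minimal sketch of a resource-like class that opens its connection lazily with `functools.cached_property`, so a `validate()` call at initialization time would also warm the connection for later use. It is a plain Python class using the standard-library `sqlite3` module, not a Dagster `ConfigurableResource`; on a Pydantic-based resource, `cached_property` may need extra model configuration depending on the Pydantic version. `DatabaseResource` is a hypothetical name.

```python
import sqlite3
from functools import cached_property


class DatabaseResource:
    """Hypothetical resource: the connection is opened on first use and
    reused for the lifetime of the instance."""

    def __init__(self, path: str) -> None:
        self.path = path

    @cached_property
    def connection(self) -> sqlite3.Connection:
        # Runs once per instance; later accesses return the cached object.
        return sqlite3.connect(self.path)

    def validate(self) -> bool:
        # A cheap round-trip query doubles as a connectivity check and,
        # as a side effect, warms the cached connection.
        try:
            return self.connection.execute("SELECT 1").fetchone() == (1,)
        except sqlite3.Error:
            return False
```

Calling `DatabaseResource(":memory:").validate()` returns `True` and leaves the opened connection cached on the instance.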
-
Summary
Recently, the Dagster team has been focused on tackling programming ergonomics issues which are a hurdle for users. The first of these changes was a move from `@repository` decorators to the `Definitions` API.

Clear areas for improvement are the configuration and resources systems. The Dagster config system relies heavily on string indirection, requires users to learn a custom type system, and doesn't take advantage of Python's native typing system for parameterizing or accessing values.
Alongside similar challenges with typing and indirection, a lack of observability into the resource system has become apparent. With users deploying Dagster across environments ranging from a local machine to branch deployments to production, configuration of external services via resources is increasingly important. These resources until now have remained hidden away, an implementation detail rather than a first-class object in the Dagster UI.
We are introducing a new ergonomic layer onto the Dagster config and resources system, referred to here as “Pythonic config” and “Pythonic resources,” with the goal of increasing ergonomic accessibility, simplicity, and observability.
This discussion will provide a brief overview of the API changes which will launch as experimental with the 1.2 release scheduled for March 9. The goal of this document is to gather early feedback and answer questions about these APIs before they launch.
After the 1.2 release in early March, this system will be generally available to the community, at which point we hope to get users actively using the new APIs. We have strong conviction that these APIs are a marked improvement, and will transition towards using them as the new default for examples and content alongside the 1.3 release, scheduled for mid-April.
For those interested in delving into implementation details or more complex use-cases, see the further reading section below.
Note: This is not a breaking change; existing code will continue to work. For the foreseeable future, this ergonomic layer is an opt-in API.
Notable changes
Python typing frontend for the configuration system
Config schemas for assets and ops can now be defined using a Pydantic frontend. Strongly typed config objects can be accessed through the `config` parameter to asset and op functions rather than pulled from the `context`. These config objects can be directly constructed when launching runs from code.
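The before/after contrast can be illustrated without Dagster installed. In Dagster itself, the new style is an asset function declared as `def greeting(config: GreetingConfig)` with `GreetingConfig` subclassing `dagster.Config`. The toy below imitates that with a standard-library dataclass standing in for the Pydantic-based `Config` and a hypothetical `invoke_with_config` runner that finds the config class through the parameter's type annotation; it shows the idea, not Dagster's implementation.

```python
import inspect
from dataclasses import dataclass


@dataclass
class Config:
    """Stand-in for a config base class (a Pydantic model in Dagster)."""


@dataclass
class GreetingConfig(Config):
    name: str
    repeat: int = 1


def greeting(config: GreetingConfig) -> str:
    # Typed attribute access replaces context.op_config["name"].
    return " ".join([f"Hello, {config.name}!"] * config.repeat)


def invoke_with_config(fn, raw_config: dict):
    """Toy runner: find the parameter annotated with a Config subclass,
    build that class from the raw run config, then call the function."""
    for param in inspect.signature(fn).parameters.values():
        annotation = param.annotation
        if inspect.isclass(annotation) and issubclass(annotation, Config):
            # Unknown or misspelled keys raise TypeError here, before fn runs.
            return fn(**{param.name: annotation(**raw_config)})
    raise TypeError(f"{fn.__name__} has no Config-annotated parameter")
```

Inside the function, `config.name` is a typed attribute that editors and type checkers can follow, whereas a misspelled `context.op_config["nmae"]` lookup only fails at runtime. The config object can also be built directly, e.g. `greeting(GreetingConfig(name="x"))`, which mirrors constructing config objects when launching runs from code.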
Here we can see a number of advantages:

- No need to learn `config_schema` and the custom Dagster config schema system.
- No need to use `context` to obtain config.
- No more pulling values out of the `op_config` dictionary or building untyped `config` input when launching runs in code.

Class-based resources
Resources and their config schemas can be defined by subclassing a new `ConfiguredResource` class, also a Pydantic model. Assets and ops can encode their resource dependencies in annotated parameters to the asset or op functions.
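In the same toy style, the sketch below shows a resource as a plain class with typed fields and an asset-like function that declares its dependency through an annotated parameter. `Resource`, `WarehouseResource`, `orders_table`, and `invoke_with_resources` are hypothetical stand-ins (released Dagster versions expose the base class as `ConfigurableResource`); this toy binds resources by parameter name and checks each instance against the annotation.

```python
import inspect
from dataclasses import dataclass


@dataclass
class Resource:
    """Stand-in for a class-based resource base (a Pydantic model in Dagster)."""


@dataclass
class WarehouseResource(Resource):
    schema_name: str

    def table(self, name: str) -> str:
        return f"{self.schema_name}.{name}"


def orders_table(warehouse: WarehouseResource) -> str:
    # The dependency is declared by the annotation: no required_resource_keys
    # and no context.resources lookup.
    return warehouse.table("orders")


def invoke_with_resources(fn, resources: dict):
    """Toy runner: bind each Resource-annotated parameter to the instance
    stored under the same name, checking it against the annotation."""
    kwargs = {}
    for param in inspect.signature(fn).parameters.values():
        annotation = param.annotation
        if inspect.isclass(annotation) and issubclass(annotation, Resource):
            instance = resources[param.name]
            if not isinstance(instance, annotation):
                raise TypeError(f"{param.name!r} must be a {annotation.__name__}")
            kwargs[param.name] = instance
    return fn(**kwargs)
```

Because the annotation names the class, a wrongly typed binding is rejected before the function body runs, and the function itself can be unit tested by passing a `WarehouseResource` directly.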
This change also leads to a number of simplifications:

- No need for the `@resource` decorator.
- No need for `required_resource_keys`, in favor of strongly typed resource parameters.
- No need to access `context` to use resources in the op or asset body.

Resources in the UI
To enhance observability of resources and their configuration, a resource page will be added to the Dagster UI.
The resource page surfaces resource metadata, including descriptions of the resource, the assets and ops which rely on it, and the set of config keys and values in the current deployment environment. Values sourced from environment variables are specifically highlighted. The resource page makes it easier to tell at-a-glance what external services your Dagster instance is configured to interact with.
Environment variables and secrets
Often, resources will be configured with secret values (such as a password to a database) or values which vary between deployed environments (such as a target database schema). These values are most often provided through environment variables. The current config syntax allows sourcing config values via environment variables with [StringSource](https://docs.dagster.io/_apidocs/config#dagster.StringSource), but a new API makes this even more natural.

An instance of the `EnvVar` class can be substituted for any string config value, meaning users never have to think about `StringSource`. Lookup of the environment variable is deferred until runtime, when the value of the env var is substituted in the resource’s config. `EnvVar` also provides improved observability, with config sourced from env vars specifically highlighted in the resource UI.

Potential extension points - resource validation
With resources reframed as Python classes and as first-class APIs, it’s easier for us to layer optional functionality into resources. One use-case we’ve considered is an optional `Validated` interface which resources could implement. This would allow resources to provide a sort of status check that config is valid and that the resource works.

For example, a resource connecting to an outside SaaS service could return whether it can connect successfully or has a credentials error. This can make deploying code to a production environment easier, and could quickly highlight issues such as missing or misconfigured env vars.
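The RFC leaves the shape of `Validated` open. As a hedged sketch, it could be modeled as an opt-in `typing.Protocol` whose `validate()` method returns a status, with a deployment-time check that visits only the resources implementing it. `ValidationResult`, `SaaSClient`, `LocalCache`, and `check_resources` are hypothetical names, not Dagster APIs.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol, runtime_checkable


@dataclass(frozen=True)
class ValidationResult:
    ok: bool
    message: str = ""


@runtime_checkable
class Validated(Protocol):
    """Hypothetical opt-in interface for resources that can self-check."""

    def validate(self) -> ValidationResult: ...


@dataclass
class SaaSClient:
    """Hypothetical SaaS resource; a real one would try an authenticated call."""

    api_token: Optional[str]

    def validate(self) -> ValidationResult:
        if not self.api_token:
            return ValidationResult(False, "api_token is empty (missing env var?)")
        return ValidationResult(True)


@dataclass
class LocalCache:
    """Hypothetical resource that does not opt in to validation."""

    directory: str


def check_resources(resources: Dict[str, object]) -> Dict[str, ValidationResult]:
    # Only resources that implement validate() are checked; others are skipped.
    return {
        key: resource.validate()
        for key, resource in resources.items()
        if isinstance(resource, Validated)
    }
```

A status map like this is the kind of data a resource page could display next to each resource, e.g. flagging the `SaaSClient` whose token env var is unset.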
Existing codebases
No changes are needed to existing code - the current config and resource APIs will continue to work alongside this new ergonomic layer.
Users may choose to try out or gradually adopt this system by adding Pythonic config to ops and assets incrementally.
Existing, function-style resources can be adapted into strictly typed Pythonic resources using adapter classes. For more details, see our Pythonic resources docs.
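The adapter classes themselves are described in the Pythonic resources docs. The general adapter idea can be sketched without Dagster: a typed class fronts an unchanged function-style factory and converts its fields back into the config dict that factory expects. All names below (`ApiClient`, `legacy_api_client`, `ApiClientAdapter`) are hypothetical and do not reflect Dagster's adapter API.

```python
from dataclasses import asdict, dataclass


class ApiClient:
    """Hypothetical client object produced by an existing resource."""

    def __init__(self, base_url: str, timeout: int) -> None:
        self.base_url = base_url
        self.timeout = timeout


def legacy_api_client(resource_config: dict) -> ApiClient:
    # Existing function-style factory reading an untyped config dict.
    return ApiClient(resource_config["base_url"], resource_config["timeout"])


@dataclass
class ApiClientAdapter:
    """Hypothetical adapter: typed fields in front, unchanged factory behind."""

    base_url: str
    timeout: int = 30

    def create(self) -> ApiClient:
        # Convert the typed fields back into the dict the legacy factory expects.
        return legacy_api_client(asdict(self))
```

The legacy factory is untouched, so code paths that still call it directly keep working while new code gains typed, defaulted fields.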
Further reading
For more details on advanced use-cases and implementation details, we have published a set of experimental docs outlining the new config and resource APIs: