-
Yeah, this would definitely be useful. It could really help folks who aren't as confident with Python programming use Dagster. I love Dagster, but it is certainly not very accessible to folks who are new to Python and just want to string together some basic tasks that are already defined externally.
-
How about defining assets as reusable Python components, e.g. a factory for an asset or something similar? That way you get sufficiently high-level configuration, but also extensibility if needed, and types as well.
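A minimal sketch of what such a factory could look like. Every name here (TableCopyConfig, make_table_copy_asset, the fields) is hypothetical and chosen only for illustration, and the sketch stays on the standard library, so the Dagster wiring is left as a comment rather than shown:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TableCopyConfig:
    """Hypothetical high-level, typed configuration for one asset."""

    name: str
    source_table: str
    row_limit: int = 1000


def make_table_copy_asset(config: TableCopyConfig) -> Callable[[], str]:
    """Build a compute function from typed configuration.

    Assumption: in a real Dagster project the returned function would be
    wrapped with Dagster's asset decorator. It stays a plain function here
    so the sketch runs without Dagster installed.
    """

    def compute() -> str:
        # Placeholder work: a real asset would run this query, not return it.
        return f"SELECT * FROM {config.source_table} LIMIT {config.row_limit}"

    # Name the function after the configured asset name.
    compute.__name__ = config.name
    return compute


orders = make_table_copy_asset(
    TableCopyConfig(name="orders_copy", source_table="raw.orders")
)
```

The dataclass is the high-level configuration surface, and the type checker sees every field. Anyone who needs more can still write a plain function next to the factory-generated ones.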
-
YAML is also useful for reviewing the pipeline with a non-technical stakeholder. Some industries (e.g. Financial Services) actually require us data folks to walk through the whole pipeline end-to-end with our non-technical counterparts. But the part I'm actually excited about is the combination of multiple defs (YAML + Python, or multiple YAMLs, or multiple Pythons) 😄 (Edited: I see some of us already asked for combining multiple defs - #21092)
-
My two cents: YAML is terrible, and pushing it as a viable option makes me sad. If you haven't seen this document, enjoy - https://ruudvanasseldonk.com/2023/01/11/the-yaml-document-from-hell.
For the dbt example this doesn't seem too crazy, because the dbt stuff is already not really in Python, so it's just some more YAML to expose yet more YAML that already exists. I guess it could also make Pipes easier for wiring up other non-Python assets. But then how does Dagster know what the inputs and outputs look like? How can it update the catalog? I suppose a lot of that happens inside of Pipes, but I'm not sure.
Can you reference the assets in YAML from Python code and vice versa? I'm pretty sure the answer is yes, by string keys, but you're likely not going to get any assistance from your IDE or your type checker. Unless Dagster is also going to start providing an editor inside the webserver that uses instance-level knowledge rather than source-code-level knowledge to assist, but I feel like that's potentially playing a dangerous game and could further divide the open source and cloud offerings.
But for general assets this just seems sad. I probably sound like a broken record, but one of the best things about Dagster is how it pushes users into better practices. In my opinion we've already lost some of that with Pipes and the push away from IO managers (though I realize IO managers are often a burden, I think they are a nice concept to default to), and this goes even further. I also feel like if this starts up, you'll shortly be getting requests to add functionality to the YAML, as that always seems to happen. CloudFormation is a particularly egregious example of this.
-
I'm actually doing exactly this, yet I am against this feature (or at best indifferent). Why? I am doing this because I want to control the API I expose to my stakeholders. A generic API will never be able to give me this kind of control; it will always expose either too little or too much configuration. I doubt it will be possible to hit anyone's sweet spot here. By contrast, writing a few factory methods is really not that much effort.
-
Motivation
YAML. A very polarizing topic in the data community. Many hate it. Many more use it.
Many data organizations define parts of their data pipelines in configuration languages like YAML. These languages are particularly useful for:
We've observed that many organizations build layers on top of Dagster that consume YAML. We’re trying to determine whether – and how – to make these uses more first-class in Dagster, and to help users avoid reinventing the wheel every time they want to implement them.
What this could look like
What might this look like on top of Dagster? Some basic requirements:
A high-level sketch:
A dagster-yaml library includes a load_defs_from_yaml function, which reads a directory of YAML files and uses their contents to generate a Dagster Definitions object.
load_defs_from_yaml accepts a list of plugins, which specify YAML interfaces for building Dagster definitions.
dagster-dbt and dagster-shell expose built-in plugins. Users can also define their own plugins.
Example
definitions.py
definitions.py (alternative with both YAML defs and Python defs)
my_analytics_defs.yaml:
my_recommender_model_defs.yaml:
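To make the plugin mechanism from the high-level sketch a bit more concrete, here is a rough standard-library-only illustration of how plugin dispatch might work. It is not the proposed dagster-yaml API: AssetStub, DefsStub, Plugin, ShellPlugin, build_defs and the shell_assets section name are all hypothetical stand-ins, and the documents are passed in as already-parsed dictionaries instead of being read from YAML files.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol


@dataclass(frozen=True)
class AssetStub:
    """Hypothetical stand-in for a Dagster asset definition."""

    key: str
    command: str


@dataclass
class DefsStub:
    """Hypothetical stand-in for a Dagster Definitions object."""

    assets: list[AssetStub] = field(default_factory=list)


class Plugin(Protocol):
    """Assumed plugin shape: owns one top-level section of a document."""

    section: str

    def build(self, config: dict[str, Any]) -> list[AssetStub]: ...


class ShellPlugin:
    """Hypothetical plugin turning shell command entries into assets."""

    section = "shell_assets"

    def build(self, config: dict[str, Any]) -> list[AssetStub]:
        return [
            AssetStub(key=name, command=spec["command"])
            for name, spec in config.items()
        ]


def build_defs(documents: list[dict[str, Any]], plugins: list[Plugin]) -> DefsStub:
    """Route each top-level section of each document to its plugin."""
    by_section = {plugin.section: plugin for plugin in plugins}
    defs = DefsStub()
    for document in documents:
        for section, config in document.items():
            if section not in by_section:
                # Fail loudly instead of silently ignoring unknown sections.
                raise ValueError(f"no plugin registered for section {section!r}")
            defs.assets.extend(by_section[section].build(config))
    return defs


# One parsed document, standing in for the contents of a YAML file.
documents = [{"shell_assets": {"daily_export": {"command": "./export.sh"}}}]
defs = build_defs(documents, [ShellPlugin()])
```

Rejecting unknown sections outright is one possible design choice. It keeps the surface exposed to YAML authors explicit, which speaks to the concern raised in the replies about controlling how much configuration gets exposed.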
Let us know what you think
As usual, we would love your feedback. Do you write your own definition factories using YAML? What kinds of tasks do you use it for? What kinds of difficulties do you face? What functionality would you find useful?