Step Launchers supersession announcement #25685

danielgafni · 2024-11-01T19:17:42Z

danielgafni
Nov 1, 2024
Collaborator

StepLauncher supersession announcement

TLDR

The StepLauncher is being superseded with Dagster Pipes. We recommend users start exploring Dagster Pipes and consider migrating (see the Spark example) existing StepLauncher usages to Pipes. While StepLauncher will remain available, this feature will no longer be receiving active development.

Users who prefer Step Launchers to Pipes because of some features like IOManager integration are welcome to do so, given they accept the increased development and deployment complexity.

Context

StepLauncher was an experimental Dagster resource which could be used to run Dagster steps in remote environments such as Spark. The following step launchers were implemented:

dagster_databricks.databricks_pyspark_step_launcher
dagster_aws.emr.emr_pyspark_step_launcher

The goal of step launchers was to seamlessly execute Dagster code in Spark or another remote environment. In particular, a StepLauncher is responsible for:

Setting up the remote environment - uploading Dagster code and installing dependencies
Executing the op/asset remotely
Fetching logs and Dagster events from the remote process and making them available to Dagster.

However, this power came at the cost of significant implementation complexity and feature coupling. If you've been using step launchers, especially in Spark, you might wonder why we're moving away from this approach. Historically, we tried to integrate business logic and orchestration in external runtimes like Spark using framework-level abstractions. While StepLauncher aimed to make remote execution easier, it faced several challenges that made adoption difficult:

Rigid Code Structure: You had to structure your deployed code around Dagster code locations and definitions, which could be impractical if you had existing Spark jobs or dependency conflicts.
Deployment Limitations: StepLauncher managed code deployment at runtime, which often conflicted with DevOps processes. Many users preferred managing deployments during push time instead.
Complex Customization: Writing custom step launchers was complex, making them difficult to extend and use effectively.
Runtime dependencies: The StepLauncher required to have dagster installed in the remote environment, which might introduce version conflicts or increase the complexity of the deployment process.
Usage with other programming languages: because of the previous point, it was not possible to use StepLauncher with programming languages other than Python, which is a blocker for popular Scala/Java Spark workloads.

For those who do not want to structure their Spark business logic around Dagster definitions, we believe that Dagster Pipes -- a more composable and lightweight solution -- is the right path forward.

Dagster Pipes

Dagster Pipes is a wire protocol that handles parameter/context passing to the remote process, and log/metadata gathering from the remote process. This approach aligns better with Dagster's philosophy of modularity and extensibility, enabling users to create more flexible and powerful remote execution solutions, albeit with some additional setup responsibility.

Some of the improvements over StepLauncher are:

➕ Increased composability: Pipes components can be mixed and matched, allowing for use in a wide variety of environments.

➕ Decreased complexity: individual Pipes components can be implemented and tested in isolation, making it easier to develop and maintain custom solutions. This modular approach allows for greater flexibility and adaptability to specific use cases.

➕ Improved extensibility: Users can easily extend the existing existing family of Pipes components to meet their unique requirements, fostering a more diverse ecosystem of integrations.

➕ Lightweight: Pipes can execute unmodified scripts. In order to send additional Dagster events (such as Dagster metadata or asset check results) back to the orchestration process, a zero-dependency (and single-file) dagster-pipes Python package can be installed in the remote environment.

➕ Language-agnostic: Implementing Dagster Pipes in additional programming languages is very tractable. Dagster customers have already done this. Right now we only have an official implementation for Python, but we will be adding support to more languages in the near future (JVM languages being in progress).

As Pipes are more lightweight and give you greater control, they also come with some responsibilities:

➖ Pipes do not automatically set up the remote environment. This responsibility now falls to the user, typically handled through CI/CD processes.

➖ Pipes to not automatically execute op/asset body. Instead, the users are typically expected to provide an external script which will be launched by Pipes.

Pipes Clients

We have implemented a set of opinionated Pipes clients on top of the Pipes framework for some popular services.

References

Spark Step Launchers -> Pipes migration guide

dinis-rodrigues · 2024-11-05T23:53:51Z

dinis-rodrigues
Nov 5, 2024

This will imply major major breaking changes in the future. From the release notes the step launcher:

will not be removed from the codebase until Dagster 2.0 is released

Meaning that from v2 support it could be removed entirely.
Organizations currently using step launchers and io_managers (like us) will need to refactor the entire codebase to work with dagster pipes.

Yes the step launcher has its drawbacks, but it is easily customizable (like dagster pipes), by extending its class in my opinion.
If we compare the two, the boiler plate code which is needed to setup dagster pipes, rounds up to the same as a custom inherited step launcher.

And the biggest drawback of them all is the deprecation of io_manager with dagster pipes. For example, many spark jobs rely on intermediate outputs (instead of cache) to optimize the spark job. We for example, have intermediate outputs in all jobs, because we have complex queries which require this optimization.

Having to configure each op with a hardcoded "io_manager" in the end / beginning, just to make things work, is just friction to developers.

I think the idea behind dagster pipes is good, but it feels like we had a battletested steplauncher and now we are "obligated" to use a new less-featured almost blank canvas tool. For newcomers in dagster, this might be okay, but for existing users, its a barrier.

If the step launcher had its flaws, it should've been a matter of improving the existing solution, instead of building from the ground up a new one and making existing workflows deprecated.

Maybe this opinion is due to my lack of experience and experimentation with pipes, but just the thought of having to refactor everything, makes me demotivated just to try it

3 replies

danielgafni Nov 6, 2024
Collaborator Author

Hey!

Thank you for the feedback, that's exactly why this discussion was created.

will not be removed from the codebase until Dagster 2.0 is released

This wording was poor, I'll make sure to change it. Two points here:

We don't have any timeline or plans for Dagster 2.0. This means this can be considered as a very distant point in time.
Actually, we aren't aiming to remove Step Launchers even in 2.0. There are no reasons for doing this. So you could keep using them as long as you'd like to. After all, they are just normal resources (just like Pipes) and aren't dependent on any specific framework features. We are planning to stop active development of Step Launchers, but they aren't going anywhere.

Having to configure each op with a hardcoded "io_manager" in the end / beginning, just to make things work, is just friction to developers.

I very much agree with that. I'll let @schrockn elaborate

CC @yuhan feedback above ^^

schrockn Nov 8, 2024
Maintainer

@dinis-rodrigues thanks for the feedback.

For people like yourselves who are "all in" on step launchers to the point of building their own step launchers, our recommendation is not to port to Pipes. It is working for you now. For people who are willing commit to step launchers and to have it own deployment, install dagster in the remote environment, etc, step launchers provide more features. We will clarify the wording of the discussion on this point.

dinis-rodrigues Nov 8, 2024

Let me just clarify that I'm all for it on the new dagster-pipes features. I would love to see those same features on step launchers without losing the io_manager, for example.

We extended the functionality of the existing step launchers to achieve some extra/custom functionality.
We are all-in on step launchers because we have been all-in on dagster from the beginning. Step launchers were what was available, and we just built from that.

We've been closely following the new features, and I just raised this awareness because of possible major refactoring.
In the near future we want and will use/extend dagster-pipes. But with the io_manager "issue", it's not feasible at the moment.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Step Launchers supersession announcement #25685

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 1 comment 3 replies

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Select a reply

Step Launchers supersession announcement #25685

danielgafni Nov 1, 2024 Collaborator

StepLauncher supersession announcement

TLDR

Context

Dagster Pipes

Pipes Clients

Replies: 1 comment · 3 replies

dinis-rodrigues Nov 5, 2024

danielgafni Nov 6, 2024 Collaborator Author

schrockn Nov 8, 2024 Maintainer

dinis-rodrigues Nov 8, 2024

danielgafni
Nov 1, 2024
Collaborator

Replies: 1 comment 3 replies

dinis-rodrigues
Nov 5, 2024

danielgafni Nov 6, 2024
Collaborator Author

schrockn Nov 8, 2024
Maintainer