How to write a step launcher #3201

sryza · 2020-11-06T16:41:38Z


sryza · 2020-12-16T23:17:07Z


Step launchers allow users to control how specific steps within a run are launched. For example, the emr_pyspark_step_launcher runs the op on an EMR cluster instead of in the process where the executor would normally run it.

Writing a step launcher is not the only way to run code remotely. You can also have the body of an op invoke a remote execution. create_databricks_job_op (https://docs.dagster.io/_apidocs/libraries/dagster-databricks#dagster_databricks.create_databricks_job_op) is an example of this approach. The advantage of using a step launcher is that your op expresses pure business logic, which makes it much easier to test.

Writing a step launcher isn't simple, because running code remotely isn't simple.

Writing a StepLauncher means implementing the launch_step method of the StepLauncher abstract class.
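As a rough illustration of that shape, here is a minimal sketch. It uses a stand-in base class (StepLauncherSketch, a hypothetical name) instead of importing dagster, and it passes a plain dict where a real launcher would receive a step execution context. The real launch_step signature may differ between Dagster releases, so treat the argument list and the string "events" below as assumptions.

```python
from abc import ABC, abstractmethod
from typing import Any, Iterator


class StepLauncherSketch(ABC):
    """Stand-in for Dagster's StepLauncher base class (not the real class)."""

    @abstractmethod
    def launch_step(self, step_context: Any) -> Iterator[Any]:
        """Run one step somewhere else and yield the events it produced."""


class EchoStepLauncher(StepLauncherSketch):
    """Toy launcher: pretends to run the step and reports two events."""

    def launch_step(self, step_context: Any) -> Iterator[str]:
        yield f"STEP_START {step_context['step_key']}"
        # A real launcher would ship the step to a remote process here
        # and relay the events that process reports back.
        yield f"STEP_SUCCESS {step_context['step_key']}"


# The dict is a placeholder for the step context Dagster would pass in.
events = list(EchoStepLauncher().launch_step({"step_key": "my_op"}))
print(events)
```

The point is only that launch_step is a generator: whatever happens remotely, the launcher's job is to yield the resulting events back to the run.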

Writing a step launcher involves:

1. Serializing the step context that's passed in to launch_step. You can call the step_context_to_step_run_ref function in the dagster._core.execution.plan.external_step module to create a serializable version of a step context (a step run ref).
2. Getting a Python process up and running on the machine where you want to run the step. For Spark, this often involves spark-submit.
3. Transferring the serialized step run ref to the remote Python process. A good way to do this is to write it to a file in some filesystem that the remote Python process can access. The remote process can convert the serialized step run ref back into a step context with the step_run_ref_to_step_context function in the same module.
4. Within the remote process, executing the step code. Here's an example: https://github.com/dagster-io/dagster/blob/master/python_modules/dagster/dagster/_core/execution/plan/local_external_step_main.py.
5. Passing Dagster events back from the remote Python process to the step launcher.
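The round trip described in the steps above can be sketched with only the standard library. This is not Dagster code: a plain dict stands in for the step run ref, a local subprocess stands in for the remote machine, a temporary directory stands in for the shared filesystem, and the file names and event strings are made up for illustration.

```python
import pickle
import subprocess
import sys
import tempfile
from pathlib import Path

# Code the "remote" process runs. It loads the serialized reference,
# "executes" the step, and writes the resulting events to the shared dir.
REMOTE_MAIN = """
import pickle, sys
from pathlib import Path

scratch = Path(sys.argv[1])
ref = pickle.loads((scratch / "step_run_ref.pkl").read_bytes())
# A real remote main would rebuild a step context from the ref here.
events = [f"STEP_START {ref['step_key']}", f"STEP_SUCCESS {ref['step_key']}"]
(scratch / "events.pkl").write_bytes(pickle.dumps(events))
"""


def launch_step_sketch(step_run_ref: dict, scratch_dir: Path) -> list:
    # Serialize the reference to a location both sides can read.
    (scratch_dir / "step_run_ref.pkl").write_bytes(pickle.dumps(step_run_ref))
    # Start the other Python process and wait for it to finish.
    subprocess.run(
        [sys.executable, "-c", REMOTE_MAIN, str(scratch_dir)], check=True
    )
    # Read back the events the other process recorded.
    return pickle.loads((scratch_dir / "events.pkl").read_bytes())


with tempfile.TemporaryDirectory() as tmp:
    events = launch_step_sketch({"step_key": "my_op", "run_config": {}}, Path(tmp))

print(events)
```

In a real launcher the subprocess call would be replaced by whatever starts Python on the target machine (spark-submit, for example), and the temporary directory by storage that both sides can reach.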

The LocalExternalStepLauncher is a "simple" step launcher implementation that provides a starting point.

2 replies

sharangSharma54 Aug 25, 2022

The links to local_external_step_main are broken. Could we have the updated links? I was unable to find the resource myself.

sryza Aug 25, 2022
Author

Thanks for catching this, @sharangSharma54. I just updated them.

michael-bily · 2023-03-09T14:12:07Z


Hello,

I have started developing my own StepLauncher for EMR Serverless, based on your implementation for plain EMR.

However, when I start a job on the EMR side, it fails while resolving the StepRunRef to a StepExecutionContext in dagster._core.execution.plan.external_step.step_run_ref_to_step_context. (Please see the attached stack trace.)
emr_serverless_execution_stack_trace.txt

Further down the line, it tries to import my Definitions to look up the pipeline, assets, resources, etc., and a DagsterInvalidConfigError is thrown because some jobs lack their required run_config. These configs are normally read from a file, which is obviously missing from the EMR cluster, and the run_config provided to the Run is not propagated.

When I manually supply the config file to the EMR job, it succeeds. However, this decouples it from the config I have during the job run and does not allow me to change it from the UI.

Do you have any solution for passing run_config from the Run to the remote execution?

Thank you very much.

2 replies

sryza Mar 16, 2023
Author

Hi @michael-bily - StepRunRef.run_config should contain the config provided to the run.
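A small sketch of what that implies on the remote side: read the config from the deserialized ref rather than from a file on the cluster. StepRunRefSketch below is a hypothetical stand-in with just two fields, not the real StepRunRef, and JSON is used for the handoff only to keep the example self-contained. The config keys and values are invented.

```python
import json
from typing import Any, Mapping, NamedTuple


class StepRunRefSketch(NamedTuple):
    """Stand-in with two of the fields a step run ref carries."""

    step_key: str
    run_config: Mapping[str, Any]


# Launcher side: the ref already holds the config supplied to the run.
ref = StepRunRefSketch(
    step_key="my_op",
    run_config={"ops": {"my_op": {"config": {"date": "2023-03-09"}}}},
)
payload = json.dumps(ref._asdict())

# Remote side: rebuild the ref and use its run_config directly,
# so no separate config file has to exist on the cluster.
remote_ref = StepRunRefSketch(**json.loads(payload))
op_config = remote_ref.run_config["ops"][remote_ref.step_key]["config"]
print(op_config)
```

If the remote process still fails on missing config, printing the ref's run_config right after deserializing it is a quick way to check whether the value actually made it across.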

a-meledin Nov 10, 2023

Can someone kindly provide an example of how to use a custom step launcher with @op or @asset and @job? Thank you.
