What happens between `STEP_WORKER_STARTING` and `STEP_WORKER_STARTED`? #18014

alangenfeld · 2023-11-14T20:07:14Z

dagsir[bot] (bot) · Nov 14, 2023

Dumb question: what takes so long between `STEP_WORKER_STARTING` and `STEP_WORKER_STARTED`? I'm seeing something like 5 minutes of downtime in between before my jobs start.

Does it rebuild the `Definitions` every time? Does most of the time go to the forkserver? Is there a way to speed up the forkserver?

The question was originally asked in Dagster Slack.

Answered by alangenfeld


alangenfeld (Maintainer) · 2023-11-14T20:28:31Z

> what takes so long between `STEP_WORKER_STARTING` and `STEP_WORKER_STARTED`? I'm seeing something like 5 minutes of downtime in between before my jobs start

With the default multiprocess executor, `STEP_WORKER_STARTING` is emitted just before a subprocess is started, and `STEP_WORKER_STARTED` is emitted once that subprocess has finished initializing and is executing Dagster framework code.
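To put a number on that gap for each step, one could pair the two events by step key and subtract their timestamps. The sketch below assumes a simplified, hypothetical record shape of `(step_key, event_type, timestamp)` tuples; real Dagster event log entries carry more fields and are read through Dagster's own APIs, which are not shown here.

```python
from datetime import datetime, timedelta

# Hypothetical, simplified event record: (step_key, event_type, timestamp).
# Real Dagster event log entries look different; this shape is an assumption.
EventRecord = tuple[str, str, datetime]


def worker_startup_latency(events: list[EventRecord]) -> dict[str, timedelta]:
    """Per step, the time from STEP_WORKER_STARTING to STEP_WORKER_STARTED."""
    starting: dict[str, datetime] = {}
    latency: dict[str, timedelta] = {}
    for step_key, event_type, ts in events:
        if event_type == "STEP_WORKER_STARTING":
            starting[step_key] = ts
        elif event_type == "STEP_WORKER_STARTED" and step_key in starting:
            latency[step_key] = ts - starting[step_key]
    return latency


if __name__ == "__main__":
    t0 = datetime(2023, 11, 14, 20, 0, 0)
    events = [
        ("my_step", "STEP_WORKER_STARTING", t0),
        ("my_step", "STEP_WORKER_STARTED", t0 + timedelta(minutes=5)),
    ]
    print(worker_startup_latency(events))  # {'my_step': datetime.timedelta(seconds=300)}
```

Steps that have started but not yet reported `STEP_WORKER_STARTED` are simply left out of the result.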

> Does it rebuild the `Definitions` every time?

Yep. We do not attempt to serialize the code or `Definitions` objects. Instead, a pointer to the file or module to import in order to load the code is passed down to the subprocess.
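A minimal, stdlib-only sketch of the pointer-passing pattern: the parent hands a fresh interpreter nothing but a module name, and the child performs the import itself. The stdlib module `json` stands in for a user's code location, and `load_in_fresh_process` is a hypothetical helper, not Dagster's internal mechanism. The measured time covers interpreter startup plus the import, which is the kind of cost each freshly started worker pays when the imported code is heavy.

```python
import subprocess
import sys
import time

# Hypothetical child program: it receives only a module *name* (the pointer),
# imports that module itself, and echoes the imported module's name back.
CHILD_CODE = """
import importlib, sys
mod = importlib.import_module(sys.argv[1])
print(mod.__name__)
"""


def load_in_fresh_process(module_name: str) -> tuple[str, float]:
    """Start a new interpreter, pass it a module name, and time the round trip."""
    start = time.perf_counter()
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, module_name],
        capture_output=True,
        text=True,
        check=True,
    )
    elapsed = time.perf_counter() - start
    return result.stdout.strip(), elapsed


if __name__ == "__main__":
    name, seconds = load_in_fresh_process("json")
    print(name, f"{seconds:.3f}s")
```

Because nothing but a string crosses the process boundary, any expensive work done at import time (building `Definitions`, connecting to services, loading large files) is repeated in every process that performs the import.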

If you are using the `forkserver` `start_method`, this import should happen only once, in the template process, and the forked copies will not have to repeat it.

> Does most of the time go to the forkserver? Is there a way to speed up the forkserver?

When using forkserver, the first subprocess, which starts the server, has to pay the cost of starting the template process. Subsequent subprocesses should start much faster.
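A back-of-the-envelope model of the difference, using made-up costs: under `spawn` every worker pays the full import, while under `forkserver` only the first worker waits for the template process to import the code and later workers pay just the cost of a fork. The function name and the numbers are illustrative assumptions, and the model ignores concurrency limits and interpreter startup overhead.

```python
def per_worker_startup(
    num_workers: int, import_cost: float, fork_cost: float, start_method: str
) -> list[float]:
    """Hypothetical per-worker startup delay, in seconds, for each start method."""
    if num_workers <= 0:
        return []
    if start_method == "spawn":
        # Every fresh interpreter re-imports the user's code.
        return [import_cost] * num_workers
    if start_method == "forkserver":
        # The first worker waits while the template process imports the code;
        # later workers are forked from that already-loaded template.
        return [import_cost + fork_cost] + [fork_cost] * (num_workers - 1)
    raise ValueError(f"unknown start_method: {start_method!r}")


if __name__ == "__main__":
    for method in ("spawn", "forkserver"):
        delays = per_worker_startup(10, import_cost=30.0, fork_cost=0.5, start_method=method)
        print(method, delays[0], sum(delays))
    # spawn 30.0 300.0
    # forkserver 30.5 35.0
```

With these invented numbers, a 30 s import adds up to roughly 300 s of aggregate startup work under `spawn` versus about 35 s under `forkserver`, while the first worker's delay is about the same either way.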

That said, it's not possible to say for certain where the time is being spent without measuring. A profiling tool like https://github.com/benfred/py-spy is recommended. More details are available in #14771.
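py-spy samples a running process from the outside, so it can observe the worker subprocesses without code changes. If the suspicion is specifically that importing the code location is slow, a stdlib-only alternative (swapped in here, not what the answer recommends) is to profile the import with `cProfile`. `profile_import` is a hypothetical helper, and `json` stands in for the real module.

```python
import cProfile
import importlib
import io
import pstats
import sys


def profile_import(module_name: str, top_n: int = 10) -> str:
    """Profile importing module_name and return the top entries by cumulative time."""
    # Evict only the top-level module so its body runs again; dependencies that
    # are already imported stay cached, so a fresh interpreter gives a truer picture.
    sys.modules.pop(module_name, None)
    with cProfile.Profile() as profiler:
        importlib.import_module(module_name)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top_n)
    return buf.getvalue()


if __name__ == "__main__":
    print(profile_import("json"))
```

For a quicker per-module breakdown, running the interpreter with `-X importtime` (Python 3.7+) prints import timings to stderr without any code changes.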
