What happens between STEP_WORKER_STARTING and STEP_WORKER_STARTED? #18014
Dumb question: what takes so long between STEP_WORKER_STARTING and STEP_WORKER_STARTED? I'm seeing about 5 minutes of downtime in between before my jobs start. Does it rebuild the Definitions every time? Does most of the time go to the forkserver? Is there a way to speed up the forkserver? The question was originally asked in Dagster Slack.
Replies: 1 comment
In the context of the default multiprocess executor, STEP_WORKER_STARTING happens before a subprocess is started, and STEP_WORKER_STARTED happens after that subprocess has finished initializing and is now executing Dagster framework code.
Yep, we do not attempt to serialize the code / definitions objects. Instead, a pointer to the file/module to import to load the code is passed down to the subprocess. If you are using the forkserver start_method, then this should only happen once in the template process, and the forked copies will not have to process the import again.
When using forkserver, the first subprocess that starts the server has to pay the cost of starting the template process. Subsequent subprocesses should be much faster. That said, it's not possible to say for certain where the time is being spent without measuring. A profiling tool like https://github.com/benfred/py-spy is recommended. Some more details are available at #14771.
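Since the answer comes down to measuring, one rough first check is to time how long a fresh Python process takes to import the module that holds your definitions. That approximates the import cost a newly started step worker pays when it is not forked from an already-warmed template process. The sketch below uses only the standard library and is not Dagster's own mechanism; the helper names and the stand-in module name json are assumptions for illustration, so substitute the module your code location actually loads.

```python
import subprocess
import sys
import time


def time_bare_interpreter() -> float:
    """Seconds needed to start a fresh Python interpreter that does nothing."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", "pass"], check=True)
    return time.perf_counter() - start


def time_cold_import(module_name: str) -> float:
    """Seconds needed to start a fresh interpreter and import module_name.

    The module name is passed as an argument rather than interpolated into
    the command string, so unusual names cannot alter the executed code.
    """
    start = time.perf_counter()
    subprocess.run(
        [
            sys.executable,
            "-c",
            "import importlib, sys; importlib.import_module(sys.argv[1])",
            module_name,
        ],
        check=True,
    )
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical stand-in: replace "json" with the module that defines
    # your Dagster definitions.
    module = "json"
    baseline = time_bare_interpreter()
    total = time_cold_import(module)
    print(f"bare interpreter start: {baseline:.2f}s")
    print(f"start + import {module}: {total:.2f}s")
    print(f"approximate import cost: {max(total - baseline, 0.0):.2f}s")
```

If the import alone accounts for most of the gap, the time is going into loading your code rather than into the framework. For a breakdown of where the time goes inside that import, py-spy remains the better tool.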