-
-
Notifications
You must be signed in to change notification settings - Fork 723
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multiple worker executors #4869
Conversation
a04c133
to
758fa9e
Compare
I'm excited by the potential introduced by this PR :-) |
In general this looks good to me. I like the clean generalization. If you wanted to maintain backwards compatibility, and also maybe slightly reduce the cost of updating tests (which seems minimal anyway) then we might consider adding the following property class Worker:
@property
def executor(self):
return self.executors["default"] |
I suspect this also closes #4560 |
I did this initially but decide against it to limit the API maintenance. But of cause, if people are using |
I know of groups that I wouldn't be surprised to learn that they used it. I think that providing the property is a good idea for backwards compatibility. I don't think that we lose anything from it. |
Alright, added the property and a test |
This reverts commit 8bc91de.
This seems great to me. +1 I don't know enough about the current state of testing to know if the current failure is relevant, but it looks familiar to me, so I suspect that it's fine. For future work I think that we might want to start thinking about how to expose this through configuration options. Maybe we optionally accept strings instead of |
1 similar comment
This seems great to me. +1 I don't know enough about the current state of testing to know if the current failure is relevant, but it looks familiar to me, so I suspect that it's fine. For future work I think that we might want to start thinking about how to expose this through configuration options. Maybe we optionally accept strings instead of |
8647726
to
cd48a6c
Compare
cd48a6c
to
d0f3841
Compare
@jrbourbeau do you recognize the CI errors? I cannot reproduce them locally :/ |
|
In that case, this PR is ready for review :) |
I kicked the CI to rerun the tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your work on this @madsbk! This looks really nice, the diff is much smaller than I was expecting : )
if executor is utils._offload_executor: | ||
continue # Never shutdown the offload executor | ||
if isinstance(executor, ThreadPoolExecutor): | ||
executor._work_queue.queue.clear() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious why we're doing this step manually (I realize this predates this PR -- this isn't meant to be a blocker for merging the changes here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is because even with shutdown(wait=False)
, the ThreadPoolExecutor
will still run pending tasks. By calling executor._work_queue.queue.clear()
, we cancel them before shutting down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. Thanks for clarifying! FWIW it looks like a cancel_futures=
keyword was added to Executor.shutdown
in Python 3.9. Just pointing this out as something we can use in the future once 3.9 is our minimum Python version
distributed/worker.py
Outdated
executor._work_queue.queue.clear() | ||
executor.shutdown(wait=executor_wait, timeout=timeout) | ||
else: | ||
executor.shutdown(wait=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also pass wait=executor_wait
and timeout=timeout
keywords here like we do above? For example, I might have a custom ProcessPoolExecutor
I'm using and want to have wait=True
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added wait=executor_wait
but timeout=timeout
isn't a valid concurrent.futures.Executor
argument. It is introduced in ThreadPoolExecutor
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @madsbk! This is in
(the CI failures here are known flaky tests)
This is the first iteration of implementing multiple worker executors as discussed in #4655
black distributed
/flake8 distributed
/isort distributed
The idea is to send task annotations to the workers and let workers have a dict of executors
name -> ThreadPoolExecutor
. The default value of this dict is:If nothing is specified through the
executor
annotation, the worker will use the"default"
executor.NB: this doesn't control the number of concurrent tasks running on a worker. For now, use the already available
resource
management.Example of running CPU tasks in the default
ThreadPoolExecutor
and GPU tasks in an dedicatedThreadPoolExecutor
:Output