Detect GPU tasks by inspecting inputs/outputs #4656

Open
mrocklin opened this issue Mar 31, 2021 · 4 comments

@mrocklin
Member

It would be useful to automagically detect which tasks engaged the GPU. This would allow us to more easily use both the CPU and GPU in mixed workloads, and require less configuration by the user. Unfortunately automatically detecting GPU tasks is hard.

There are a few approaches to this:

  1. Years ago I tried achieving this by inspecting the serialized form of the task for text like b"cudf" or b"torch". This was surprisingly effective, but also kludgy as heck.
  2. Libraries like cudf could annotate layers, though this may help less with PyTorch and delayed/futures.
  3. Users can handle this themselves with annotations and resource restrictions (see the sketch just after this list).
  4. New idea! We could learn this by looking at the inputs and outputs of a function for common protocols like __cuda_array_interface__ and send that information back to the scheduler.
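
To make option 3 concrete, here is a hedged sketch of the user-side workaround using the public dask.annotate API with worker resources (the "GPU" resource name is arbitrary and only means something if workers were started with a matching --resources flag):

```python
import dask
import dask.array as da
import cupy  # assumes a CUDA-capable environment

x = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))

# Only workers started with a matching resource, e.g.
#   dask-worker scheduler:8786 --resources "GPU=1"
# are eligible to run the annotated tasks.
with dask.annotate(resources={"GPU": 1}):
    y = x.map_blocks(cupy.asarray)  # move each block onto the GPU
```

Depending on the Dask version, low-level graph optimization can drop annotations, which is part of why this approach demands extra care from the user.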

So, to restate the new idea: whenever a task created a result that engaged the __cuda_array_interface__ protocol, we would include that information as we sent it up to the scheduler. This probably requires a new attribute like cuda_nbytes on the TaskState (which I'm personally fine with). The scheduler would watch for this signal and, if it occurred, flip a cuda flag on the TaskPrefix. That flag would then be sent down to all of the workers, which might push matching tasks to run in a different ThreadPoolExecutor (see #4655).
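
A minimal sketch of that worker-side detection, assuming a hypothetical helper; cuda_nbytes is just the attribute name floated above, and nothing here is distributed's actual API:

```python
import math

def cuda_result_nbytes(obj):
    """Return device-memory size in bytes if ``obj`` implements
    __cuda_array_interface__, else None (hypothetical helper)."""
    cai = getattr(obj, "__cuda_array_interface__", None)
    if cai is None:
        return None
    itemsize = int(cai["typestr"][2:])  # e.g. "<f8" -> 8 bytes per item
    return math.prod(cai["shape"]) * itemsize

# On task completion the worker could attach this to the task-finished
# message it already sends, e.g.:
#   nbytes = cuda_result_nbytes(result)
#   if nbytes is not None:
#       msg["cuda_nbytes"] = nbytes
```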

This would misallocate the first few tasks to the CPU Executor, but mostly it would do the right thing, and it wouldn't require any intervention from the user.

cc @dask/gpu

@jakirkham
Member

I think @ayushdg has been exploring using annotations for heterogeneous cluster use cases. So he may have thoughts on that approach as well 🙂

@kkraus14
Member

I think only using task results is going to cause problems. One thing we see our users do somewhat commonly is to self-contain all of their GPU work within a task, where neither the inputs nor the outputs of the task are GPU objects.
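
For illustration, the pattern being described looks something like this (sketched with CuPy, assuming it is available); inspecting the task's inputs or outputs alone would classify it as CPU-only:

```python
import numpy as np

def self_contained_gpu_task(arr: np.ndarray) -> np.ndarray:
    import cupy
    gpu_arr = cupy.asarray(arr)   # host -> device copy
    out = cupy.fft.fft(gpu_arr)   # the actual work runs on the GPU
    return cupy.asnumpy(out)      # device -> host: the result is NumPy again,
                                  # so __cuda_array_interface__ never shows up
                                  # on the task's inputs or outputs
```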

@mrocklin
Member Author

Hrm, good point. Maybe there is a holistic "try many different things" mix of approaches that we use to cover this space.

@mrocklin
Member Author

Regardless, we should probably think about annotating tasks with a cuda flag, likely on the TaskPrefix, and make sure that it propagates down to the Worker. Even if this doesn't do anything yet, it might be useful to see how the plumbing there could work.
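
A rough sketch of that plumbing; all of the names here (the cuda flag, the broadcast hook) are hypothetical, not distributed's actual internals:

```python
class TaskPrefix:
    """Stand-in for the scheduler's per-prefix state (sketch only)."""

    def __init__(self, name):
        self.name = name
        self.cuda = False  # flipped once any task in this prefix reports GPU use

def on_task_finished(scheduler, prefix, msg):
    # If a worker reported device memory for this result, mark the whole
    # prefix as CUDA and notify every worker, so that future tasks with this
    # prefix can be routed to a GPU executor (see #4655).
    if msg.get("cuda_nbytes") and not prefix.cuda:
        prefix.cuda = True
        scheduler.broadcast({"op": "cuda-prefix", "prefix": prefix.name})
```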
