[FEA] Async mode for cudf.series operations #13087

lmeyerov · 2023-04-07T17:09:44Z

Is your feature request related to a problem? Please describe.

We get wide dataframes in situations like machine learning (easily 1-5K cols) and genomics (10K+ cols), and while there is some speedup from cudf (say 2-3X), it'd be easy to get to the 10X+ level with much higher GPU utilization if we could spawn concurrent tasks for each column . Getting this all the way to the df level seems tricky, but async primitives at the column level would get us far.

One Python-native idea is doing via async/await, when one cudf operation is getting scheduled, allocated, & run, we can be scheduling the next, and ideally, cudf can run them independently . It smoothed out 2-3 years ago in python + javascript as a popular native choice, and has since been a lot more popular in pydata, e.g., langchain just rewrote to support async versions of all methods. Ex: https://trends.google.com/trends/explore?date=all&q=async%20await&hl=en . Separately, there's heightened value for pydata dashboarding scenarios like plotly, streamlit, etc as these ecosystem increasingly build for async io underneath as well.

(Another idea with precedent is a lazy mode similar to haskell or dask, discussed below as well)

Describe the solution you'd like

I'd like to be do something like:

async def f(s: cudf.Series) -> cudf.Series:
    # async mode for core series operations lets other f() calls proceed while this runs
    s2 = await  s.stra.hex_to_int('AABBCC')
   
    # math can be clean and enable the same
    # if we're super clever, this may even unlock query plan optimizations like fusion in the future
    async with cudf.async.binop_mode:
        s3_a = s2 + 1 / 3
        s3 = await s3_a

   return s3
  
cols2 = await async.gather([  f(df[col]) for col in df ])

Describe alternatives you've considered

Use existing abstractions

In theory we can setup threads or multiple dask workers, but (1) both are super awkward, (2) underneath, cudf will not do concurrent jobs

Lazy cudf

Another thought is to create a lazy mode for cudf. This has precedent with Haskell, and in modern pydata land, more so with polars. Dask does this too, and we'd use it if that can work, but it's awkward -- I haven't used, but polars sounds to be more friendly in practice:

def f(s: cudf.Series) -> cudf.Series:
    # explicitly lazy ops
    s2 = s.str_lazy.hex_to_int('AABBCC')

    # binops know they're lazy
    s3 = s2 + 1 / 3

    return s3

# force with async friendliness  
cols2 = await cudf_client.compute_async([  f(df[col]) for col in df ])

Underneath, cudf can reinvent async/io, dask, or whatever

Additional context

Slack thread: https://rapids-goai.slack.com/archives/C5E06F4DC/p1680710488795869

The text was updated successfully, but these errors were encountered:

lmeyerov · 2023-04-07T17:28:08Z

From Slack: There's thinking of starting at the level of pylibcudf cpython bindings layer

I can imagine there may be some worthwhile cudf internals that can be the first consumer of such things. Ex: Maybe some cuml kernels, or df.str.xyz df-level ones like hashing.

GregoryKimball · 2023-06-05T18:39:26Z

Thank you @lmeyerov for raising this, and thank you for joining into the Slack discussion.

Here are some points from the discussion:

RAPIDS is working on a new pylibcudf API that has low-level bindings to libcudf and will enable CUDA stream control. We expect that the appropriate usage pattern of streams in Python could accelerate your high-column-count transformations, even if it's not host-async
asyncio is associated with high overhead for short-running tasks, and most column transformations are likely to be small tasks
When CUDA streams are exposed in the python API, users will be able to write their own asyncio coroutines.

lmeyerov · 2023-06-06T21:59:22Z

Yes, that'll be interesting to us wrt some of the hot loops in https://github.com/graphistry/cu-cat: parallel per-column tasks, some sparse matrix math, ...

vyasr · 2025-01-31T00:14:31Z

Now that pylibcudf is close to being a usable standalone option it would be feasible for someone to build an async API on top of it. We should be adding stream ordering soon (#17620), which should make pylibcudf primitives sufficient to work with an async API. I'm still not sure that async/await is quite the right layer to put on top of stream-ordered operations, but it's worth exploring as a possibility.

lmeyerov · 2025-01-31T02:18:48Z

Yeah I think it is still interesting to us from letting the cpu continue while GPU is blocked (eg, unblock a web server response handler), vs as a way to have task parallel GPU tasks for more GPU saturation. Think web apps, dashboards, streaming ETL orchestration, etc.

We care more about the former in our scenario. Arguably, getting native python structured concurrency primitives for the former can start setting up native python support for more targeted task parallel pattern etc. instructions, but not obvious to me... Keith's old discussions here on stream experiences seem relevant...

lmeyerov added Needs Triage Need team to review and classify feature request New feature or request labels Apr 7, 2023

GregoryKimball added 0 - Waiting on Author Waiting for author to respond to review 0 - Backlog In queue waiting for assignment and removed Needs Triage Need team to review and classify 0 - Waiting on Author Waiting for author to respond to review labels Jun 5, 2023

GregoryKimball added this to the Enable streams milestone Jun 6, 2023

GregoryKimball added libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API. labels Jun 6, 2023

GregoryKimball mentioned this issue Aug 18, 2023

[FEA] Introduce the pylibcudf API and subpackage #13921

Closed

vyasr mentioned this issue Feb 27, 2024

[FEA] Expose stream-ordered APIs in pylibcudf #15163

Open

vyasr added this to cuDF Python Nov 5, 2024

github-project-automation bot moved this to Todo in cuDF Python Nov 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEA] Async mode for cudf.series operations #13087

[FEA] Async mode for cudf.series operations #13087

lmeyerov commented Apr 7, 2023 •

edited

Loading

lmeyerov commented Apr 7, 2023

GregoryKimball commented Jun 5, 2023 •

edited

Loading

lmeyerov commented Jun 6, 2023

vyasr commented Jan 31, 2025

lmeyerov commented Jan 31, 2025 •

edited

Loading

[FEA] Async mode for cudf.series operations #13087

[FEA] Async mode for cudf.series operations #13087

Comments

lmeyerov commented Apr 7, 2023 • edited Loading

lmeyerov commented Apr 7, 2023

GregoryKimball commented Jun 5, 2023 • edited Loading

lmeyerov commented Jun 6, 2023

vyasr commented Jan 31, 2025

lmeyerov commented Jan 31, 2025 • edited Loading

lmeyerov commented Apr 7, 2023 •

edited

Loading

GregoryKimball commented Jun 5, 2023 •

edited

Loading

lmeyerov commented Jan 31, 2025 •

edited

Loading