[FEA]: Introduce cache-modified input iterator into cuda.parallel #2536

gevtushenko · 2024-10-10T16:48:12Z

Is this a duplicate?

I confirmed there appear to be no duplicate issues for this request and that I agree to the Code of Conduct

Area

Not sure

Is your feature request related to a problem? Please describe.

Usage of cuda.parallel in applications like llm.c (example) is currently blocked by the lack of support for cache-modified iterators.

Describe the solution you'd like

We need a functional alternative to the cache-modified iterator in cuda.parallel.itertools. The design might follow the API that @fbusato came up with in #2487. For instance:

d_input = cp.array([8, 6, 7, 5, 3, 0, 9], dtype=dtype)
d_streaming_input = cudax.itertools.accessor(d_input, "eviction_policy::no_allocation")
cudax.reduce(d_streaming_input)

should lead to streaming loads of d_input (ld.global.cs instruction in PTX)
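To make the intended behavior concrete, here is a rough pure-Python mock of the policy-to-instruction mapping. It is only a sketch and not the cuda.parallel API: `Accessor`, `accessor`, `ptx_load`, and both dictionaries are hypothetical names introduced for illustration. The modifier names are those of CUB's `cub::CacheLoadModifier` enum, the PTX cache operators come from the PTX ISA, and the link from `"eviction_policy::no_allocation"` to `ld.global.cs` is the one requested above.

```python
from dataclasses import dataclass
from typing import Sequence

# PTX global-load cache operators, keyed by the corresponding
# cub::CacheLoadModifier enumerator name.
PTX_LOAD_FOR_MODIFIER = {
    "LOAD_CA": "ld.global.ca",  # cache at all levels
    "LOAD_CG": "ld.global.cg",  # cache at global level (L2), bypass L1
    "LOAD_CS": "ld.global.cs",  # cache streaming, data likely read once
    "LOAD_CV": "ld.global.cv",  # do not cache, fetch again
}

# Assumption: the policy string from this issue selects streaming loads.
MODIFIER_FOR_POLICY = {
    "eviction_policy::no_allocation": "LOAD_CS",
}


@dataclass(frozen=True)
class Accessor:
    """Hypothetical wrapper pairing an input with a load policy."""

    data: Sequence
    policy: str

    def ptx_load(self) -> str:
        # Resolve the policy to the PTX load the kernel should emit.
        return PTX_LOAD_FOR_MODIFIER[MODIFIER_FOR_POLICY[self.policy]]


def accessor(data: Sequence, policy: str) -> Accessor:
    """Hypothetical factory mirroring the call shape proposed above."""
    if policy not in MODIFIER_FOR_POLICY:
        raise ValueError(f"unsupported policy: {policy!r}")
    return Accessor(data, policy)


streaming = accessor([8, 6, 7, 5, 3, 0, 9], "eviction_policy::no_allocation")
print(streaming.ptx_load())
```

In a real implementation the wrapper would carry this choice into the generated device code rather than return a string; the mock only shows which load the policy is expected to select.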

Describe alternatives you've considered

No response

Additional context

No response

gevtushenko added the feature request New feature or request. label Oct 10, 2024

github-project-automation bot added this to CCCL Oct 10, 2024

github-project-automation bot moved this to Todo in CCCL Oct 10, 2024

gevtushenko assigned rwgk Oct 10, 2024

jollylili added the 2.8.0 target for 2.8.0 release label Nov 15, 2024

jollylili moved this from Todo to In Progress in CCCL Nov 21, 2024

rwgk mentioned this issue Dec 6, 2024

[WIP] Support fancy iterators in cuda.parallel #2788

Merged

gevtushenko closed this as completed in #2788 Dec 6, 2024

github-project-automation bot moved this from In Progress to Done in CCCL Dec 6, 2024
