Area
Not sure
Is your feature request related to a problem? Please describe.
Using cuda.parallel in applications like llm.c (example) is currently blocked by the lack of support for cache-modified iterators.
Describe the solution you'd like
We need a functional alternative to the cache-modified iterator in `cuda.parallel.itertools`. The design might follow the API that @fbusato came up with in #2487. For instance:
should lead to streaming loads of `d_input` (the `ld.global.cs` instruction in PTX).
Describe alternatives you've considered
No response
Additional context
No response
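As a hedged, host-only illustration of the request above, the sketch below shows the two pieces of information a cache-modified input iterator has to carry: the input data and a cache-load modifier, which decides which PTX global load is emitted on dereference. `CacheModifiedInput`, its parameters, and the modifier keys (borrowed from CUB's `LOAD_CA`/`LOAD_CG`/`LOAD_CS`/`LOAD_CV`) are hypothetical and not part of cuda.parallel; the class reads from a host sequence and only reports the PTX instruction it stands for, it does not generate GPU code.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical mapping from CUB-style modifier names to PTX cache operators.
_PTX_CACHE_OPS = {
    "default": "",  # no explicit operator; PTX treats a plain ld as .ca
    "ca": ".ca",    # cache at all levels, likely accessed again
    "cg": ".cg",    # cache in L2 and below, not L1
    "cs": ".cs",    # cache streaming, likely accessed once
    "cv": ".cv",    # don't cache, fetch again on every load
}

# PTX type suffixes for a few common dtype names.
_PTX_TYPES = {"float32": "f32", "float64": "f64", "int32": "s32", "int64": "s64"}


@dataclass(frozen=True)
class CacheModifiedInput:
    """Hypothetical iterator wrapper: input data plus a cache-load modifier."""

    data: Sequence[float]
    dtype: str = "float32"
    modifier: str = "default"

    def __post_init__(self) -> None:
        # Reject modifiers and dtypes the mapping tables do not cover.
        if self.modifier not in _PTX_CACHE_OPS:
            raise ValueError(f"unknown cache modifier: {self.modifier!r}")
        if self.dtype not in _PTX_TYPES:
            raise ValueError(f"unsupported dtype: {self.dtype!r}")

    @property
    def load_instruction(self) -> str:
        # The PTX global load a device-side dereference would use.
        return f"ld.global{_PTX_CACHE_OPS[self.modifier]}.{_PTX_TYPES[self.dtype]}"

    def __getitem__(self, i: int) -> float:
        # Host-side stand-in for the device load.
        return self.data[i]

    def __len__(self) -> int:
        return len(self.data)


# Usage: a streaming wrapper around a (host) input sequence.
it = CacheModifiedInput([1.0, 2.0, 3.0], dtype="float32", modifier="cs")
print(it.load_instruction)                 # ld.global.cs.f32
print(sum(it[i] for i in range(len(it))))  # 6.0
```

Keeping the modifier as part of the iterator's (immutable) state, rather than a flag on each algorithm call, mirrors how CUB encodes it as a template parameter of `cub::CacheModifiedInputIterator`, so any algorithm that consumes the iterator inherits the load behavior.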