Issue using cub reduce on more elements than fit into a 4-byte integer #129

felipeblazing · 2018-02-20T18:51:56Z

cub::DeviceReduce::Reduce(void *d_temp_storage, size_t &temp_storage_bytes, InputIteratorT d_in, OutputIteratorT d_out, int num_items, ReductionOpT reduction_op, T init, cudaStream_t stream=0, bool debug_synchronous=false)

My issue seems to be that num_items is of type int, so when I try to reduce more elements than a 4-byte signed integer can hold (more than 2^31 - 1 = 2,147,483,647), the count overflows and the code doesn't work properly. Given that GPU memory keeps growing and that we can now oversubscribe GPU memory using cudaMallocManaged, are there any plans to change that parameter to accept size_t?
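For illustration, a minimal host-side guard, assuming the caller keeps the element count in a size_t: it checks whether the count fits the `int num_items` parameter before calling the int-count overload, instead of letting the conversion silently wrap. `FitsInIntCount` and `CheckedIntCount` are hypothetical helpers, not part of CUB.

```cuda
#include <climits>
#include <cstddef>

// The int-count overloads take `int num_items`, so any count above
// INT_MAX (2^31 - 1) cannot be passed through them without narrowing.
inline bool FitsInIntCount(std::size_t num_items)
{
    return num_items <= static_cast<std::size_t>(INT_MAX);
}

// Hypothetical helper: narrow a size_t count to int only when it is
// representable, and report failure instead of silently wrapping.
inline bool CheckedIntCount(std::size_t num_items, int *out)
{
    if (!FitsInIntCount(num_items))
        return false;
    *out = static_cast<int>(num_items);
    return true;
}
```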

dumerrill · 2018-05-30T19:53:33Z

So, an explanation and a solution:

CUB's algorithms actually aren't hard-coded for 32-bit counts; the offset type is a template parameter. However, 64-bit offsets take twice as many registers as 32-bit offsets, and many of the algorithms (prefix sum, radix sort, etc.) carry offsets and counts throughout their bookkeeping, so specializing for 64-bit counts often reduces performance: the extra register-file pressure lowers occupancy. So the outer interface specializes everything for 32-bit int counts, because most people aren't reducing, scanning, or sorting more than ~2 billion items.
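The layering described above can be sketched in plain host code (it also compiles as CUDA host code). This is a hypothetical analogue, not CUB's source: `SumGenericOffset` stands in for the generic layer templated on `OffsetT`, and `SumIntOffset` stands in for the public entry point that fixes `OffsetT = int`.

```cuda
// Hypothetical analogue of the layering described above (not CUB's code).
// Inner layer: every index and count uses OffsetT, chosen at compile time.
template <typename T, typename OffsetT>
T SumGenericOffset(const T *data, OffsetT num_items)
{
    T total = T();
    for (OffsetT i = 0; i < num_items; ++i)  // loop index has OffsetT's width
        total += data[i];
    return total;
}

// Outer layer: the convenience entry point fixes OffsetT = int, the way the
// public entry points forward `int num_items` to the dispatch layer.
template <typename T>
T SumIntOffset(const T *data, int num_items)
{
    return SumGenericOffset<T, int>(data, num_items);
}
```

Callers who need more than `INT_MAX` items can name the inner template with a wider `OffsetT` directly, which is what the reply below suggests doing with CUB's dispatch layer.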

If you want to use a 64-bit count (e.g., int64_t, size_t, or similar), you can simply invoke the more generic interface underneath. See:

https://github.com/NVlabs/cub/blob/1.8.0/cub/device/device_reduce.cuh#L148

for example (which is where the outer interface specializes int as the offset type). Let me know if that unsticks you,

Duane
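A minimal sketch of calling the generic layer directly with a 64-bit offset type, assuming CUB 1.8.0 (the version linked above), where `cub::DispatchReduce` takes four template parameters (`InputIteratorT`, `OutputIteratorT`, `OffsetT`, `ReductionOpT`) and `Dispatch` takes nine arguments ending in `stream` and `debug_synchronous`. `DispatchReduce` is an internal interface whose parameter list changed in later CUB releases, so check it against the headers you actually build with. The wrapper name `SumWithInt64Offsets` is hypothetical. It uses `long long` rather than `size_t` because CUB's own comments describe `OffsetT` as a signed integer type.

```cuda
#include <cuda_runtime.h>
#include <cub/device/device_reduce.cuh>  // also brings in cub::DispatchReduce

// Hypothetical wrapper: sum num_items values using 64-bit signed offsets by
// calling the dispatch layer directly (assumes the CUB 1.8.0 signature).
template <typename T>
cudaError_t SumWithInt64Offsets(const T *d_in, T *d_out, long long num_items,
                                cudaStream_t stream = 0)
{
    typedef long long OffsetT;  // 64-bit signed offsets instead of int
    typedef cub::DispatchReduce<const T *, T *, OffsetT, cub::Sum> DispatchT;

    void  *d_temp_storage     = nullptr;
    size_t temp_storage_bytes = 0;

    // Pass 1: with d_temp_storage == nullptr, Dispatch only reports how many
    // bytes of temporary storage it needs.
    cudaError_t err = DispatchT::Dispatch(d_temp_storage, temp_storage_bytes,
                                          d_in, d_out, num_items, cub::Sum(),
                                          T(), stream,
                                          false /* debug_synchronous */);
    if (err != cudaSuccess) return err;
    err = cudaMalloc(&d_temp_storage, temp_storage_bytes);
    if (err != cudaSuccess) return err;

    // Pass 2: run the reduction with the allocated temporary storage.
    err = DispatchT::Dispatch(d_temp_storage, temp_storage_bytes, d_in, d_out,
                              num_items, cub::Sum(), T(), stream, false);

    // Wait for the kernels before releasing their temporary storage.
    cudaError_t sync_err = cudaStreamSynchronize(stream);
    cudaFree(d_temp_storage);
    return err != cudaSuccess ? err : sync_err;
}
```

The offset type only widens the count; the sum itself is accumulated in the output type, so `T` also has to be wide enough for the result.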

felipeblazing · 2018-05-31T15:39:31Z

OK, yes, I see how to do this now. This does unstick us, thank you.

Felipe

jakirkham · 2020-02-25T19:30:19Z

So maybe I'm missing something here, but it appears that num_items is still int. Is there a way to relax that constraint? It would be useful to have something like size_t here instead. Thoughts? 🙂

leofang · 2020-02-25T19:39:14Z

@jakirkham I guess what @dumerrill meant is to invoke DispatchReduce::Dispatch() ourselves with num_items being size_t?

jakirkham · 2020-02-25T19:49:01Z

Ah sorry. I got overly focused on the highlighted line. So IIUC we should be looking here. Is that right?

leofang · 2020-02-25T20:23:42Z

Yeah I guess so.

jakirkham · 2020-02-25T20:35:50Z

Also, if 32-bit offsets are significantly more performant than 64-bit ones, what is the recommended way to do reductions whose size exceeds the range of a 32-bit signed integer?

Additionally, I understand that 32-bit has special value here, but why is a signed type used instead of an unsigned one? Switching would double the range of allowed values without increasing the number of bits used.
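On the first question: besides widening `OffsetT` as suggested earlier in the thread, one workaround that stays on the public int-count API is to split the input into chunks of at most `INT_MAX` items, reduce each chunk into a partial result, and then reduce the partials. This is a swapped-in technique, not something CUB provides. It works for associative operators such as sum, costs one extra small reduction pass, and assumes `T` is wide enough to hold the final result. `ChunkedDeviceSum` is a hypothetical name, and the `max_chunk` parameter exists only so the chunking can be exercised on small inputs.

```cuda
#include <algorithm>
#include <climits>
#include <cuda_runtime.h>
#include <cub/device/device_reduce.cuh>

// Hypothetical sketch: sum num_items values (num_items may exceed INT_MAX)
// using only the int-count cub::DeviceReduce::Sum, one chunk at a time.
template <typename T>
cudaError_t ChunkedDeviceSum(const T *d_in, T *d_out, long long num_items,
                             int max_chunk = INT_MAX, cudaStream_t stream = 0)
{
    if (num_items <= 0 || max_chunk <= 0) return cudaErrorInvalidValue;

    const long long num_chunks = (num_items + max_chunk - 1) / max_chunk;
    if (num_chunks > INT_MAX) return cudaErrorInvalidValue;
    const int first_chunk =
        static_cast<int>(std::min<long long>(num_items, max_chunk));
    const int last_chunk =
        static_cast<int>(num_items - (num_chunks - 1) * max_chunk);

    T     *d_partials = nullptr;  // one partial sum per chunk
    void  *d_temp     = nullptr;  // temporary storage shared by all calls
    size_t temp_bytes = 0;
    cudaError_t err = cudaMalloc(&d_partials, num_chunks * sizeof(T));

    // Size the temporary storage for the largest call shape used below.
    const int chunk_sizes[2] = { first_chunk, last_chunk };
    for (int i = 0; i < 2 && err == cudaSuccess; ++i) {
        size_t need = 0;
        err = cub::DeviceReduce::Sum(nullptr, need, d_in, d_partials,
                                     chunk_sizes[i], stream);
        temp_bytes = std::max(temp_bytes, need);
    }
    if (err == cudaSuccess) {
        size_t need = 0;
        err = cub::DeviceReduce::Sum(nullptr, need, d_partials, d_out,
                                     static_cast<int>(num_chunks), stream);
        temp_bytes = std::max(temp_bytes, need);
    }
    if (err == cudaSuccess) err = cudaMalloc(&d_temp, temp_bytes);

    // Phase 1: reduce each chunk (at most max_chunk items) into d_partials[c].
    for (long long c = 0; c < num_chunks && err == cudaSuccess; ++c) {
        size_t bytes = temp_bytes;
        const int n = (c == num_chunks - 1) ? last_chunk : max_chunk;
        err = cub::DeviceReduce::Sum(d_temp, bytes, d_in + c * max_chunk,
                                     d_partials + c, n, stream);
    }
    // Phase 2: reduce the per-chunk partials into *d_out.
    if (err == cudaSuccess) {
        size_t bytes = temp_bytes;
        err = cub::DeviceReduce::Sum(d_temp, bytes, d_partials, d_out,
                                     static_cast<int>(num_chunks), stream);
    }

    // Wait for outstanding work before freeing the buffers it uses.
    cudaError_t sync_err = cudaStreamSynchronize(stream);
    if (err == cudaSuccess) err = sync_err;
    cudaFree(d_temp);
    cudaFree(d_partials);
    return err;
}
```

With the default `max_chunk`, an input of 3 billion elements becomes two chunks, so the overhead over a single 64-bit-offset reduction is one extra launch plus a reduction over two partials.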

alliepiper · 2020-10-20T21:27:21Z

Closing as this is part of a larger issue being tracked in #212.

felipeblazing changed the title from "Issue using cub reduce on more than elements than fit into a 4 bit integer" to "Issue using cub reduce on more than elements than fit into a 4 byte integer" on Feb 20, 2018

jakirkham mentioned this issue on Feb 25, 2020: Type x_size as size_t, cupy/cupy#3117 (Closed)

jakirkham mentioned this issue on Mar 5, 2020: About unified memory in Cupy, cupy/cupy#3127 (Closed)

leofang mentioned this issue on Apr 28, 2020: CUB device_reduce and size_t, cupy/cupy#3309 (Open)

alliepiper mentioned this issue on Oct 13, 2020: Transparent support for 64-bit indexing in device algorithms #212 (Closed)

alliepiper closed this as completed on Oct 20, 2020

maltenbergert mentioned this issue on Jun 7, 2021: thrust::sort fails for > 2.1B keys, NVIDIA/thrust#1453 (Closed)