-
Notifications
You must be signed in to change notification settings - Fork 12
CUDA
A simple library-less CUDA implementation of DeviceRadixSort and OneSweep. This implementation is purely a demo. Its main purpose is to demystify the implmentations of the algorithms, while also providing data on the performance of the implementations in this library. It is not intended for production or use, instead proper implementatios can be found in the CUB library.
GPUSortingCUDA includes:
-
DeviceRadixSort: a general purpose 8-bit LSD radix sort using "reduce-then-scan" to perform the inter-threadblock prefix sum of digit counts. Although slower than OneSweep, use this whenever portability is a concern.
-
OneSweep: a general purpose 8-bit LSD radix sort using "chained-scan-with-decoupled-lookback" to perfom the the inter-threadblock prefix sum of digit counts.
GPUSortingCUDA currently supports:
- keys only sorting
- ascending order sorting
- a maximum sorting size of
$2^{32} - 128$ for DeviceRadixSort - a maxmimum sorting size of
$2^{30} - 128$ for OneSweep - Thearling and Smith entropy controlled benchmarking.
uint32_t
-
Clone or fork the repository.
-
Download and install CUDA toolkit 12.3.2
-
Download and install Visual Studio 2019 or greater.
-
Open the solution file with Visual studio. If on Visual 2022, updating the platform toolset from v142 to v143 should work without issue.
-
In Visual Studio go to Project > Properties > Configuration Properties > CUDA C/C++ > Device > Code Generation
-
Change
compute_x, sm_x
to match the compute capability of your graphics card. You can lookup your graphics card's compute capability here. For example my 2080 super is compute capability 7.5 so I set the properties tocompute_75, sm_75
-
Build and run the solution.
Create a sort object by calling the constructor for its wrapper class, and pass in the desired size of allocation for GPU resources:
OneSweepDispatcher* oneSweep = new OneSweepDispatcher(1 << 28);
Then run the tests:
oneSweep->TestAll();
Then benchmark the performance. Use the ENTROPY_PRESET
enum to control the entropy of the distribution:
oneSweep->BatchTiming(1 << 28, 50, 10, ENTROPY_PRESET_1);