Thomas Smith edited this page Mar 9, 2024 · 7 revisions

Overview

A headless demo for D3D12. The host code has no safety rails whatsoever. A packaged release is planned, but not in the immediate future.

Supported Features

GPUSortingD3D12 includes:

  • DeviceRadixSort: a general-purpose 8-bit LSD radix sort using "reduce-then-scan" to perform the inter-threadblock prefix sum of digit counts. It is slower than OneSweep, but use it whenever portability is a concern.

  • OneSweep: a general-purpose 8-bit LSD radix sort using "chained-scan-with-decoupled-lookback" to perform the inter-threadblock prefix sum of digit counts. Use this if targeting desktop setups with somewhat recent hardware.
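As a point of reference for what both sorts compute, here is a minimal single-threaded CPU sketch of an 8-bit LSD radix sort on 32-bit keys. It is not the GPUSortingD3D12 implementation: the function name `LsdRadixSort8` is hypothetical, and the sequential prefix sum stands in for the inter-threadblock scan that DeviceRadixSort and OneSweep perform in parallel.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference, not part of GPUSortingD3D12.
// Sorts 32-bit keys in ascending order using four passes of 8 bits each,
// starting from the least significant digit.
inline void LsdRadixSort8(std::vector<uint32_t>& keys)
{
    constexpr uint32_t kRadix = 256;            // 2^8 buckets per pass
    constexpr uint32_t kRadixMask = kRadix - 1;
    std::vector<uint32_t> alt(keys.size());     // ping-pong buffer

    for (uint32_t shift = 0; shift < 32; shift += 8)
    {
        // 1. Histogram: count how often each digit value occurs.
        std::array<std::size_t, kRadix> offsets{};
        for (uint32_t k : keys)
            ++offsets[(k >> shift) & kRadixMask];

        // 2. Exclusive prefix sum: turn counts into starting offsets.
        //    On the GPU this is the step done across threadblocks.
        std::size_t sum = 0;
        for (std::size_t& o : offsets)
        {
            const std::size_t count = o;
            o = sum;
            sum += count;
        }

        // 3. Stable scatter: keys with equal digits keep their order,
        //    which is what makes the LSD approach correct.
        for (uint32_t k : keys)
            alt[offsets[(k >> shift) & kRadixMask]++] = k;

        keys.swap(alt);
    }
    // Four passes is an even number of swaps, so the result is in `keys`.
}
```

A sketch like this can serve as a host-side check when validating GPU output on small inputs.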

GPUSortingD3D12 currently supports:

  • keys only sorting
  • key-value pair sorting
  • ascending order sorting
  • descending order sorting
  • a maximum sorting size of $2^{32} - 128$ for DeviceRadixSort
  • a maximum sorting size of $2^{30} - 128$ for OneSweep
  • Thearling and Smith entropy-controlled benchmarking

GPUSortingD3D12 currently supports the following data types for keys and values:

  • uint32_t
  • int32_t
  • float
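A radix sort orders keys by their raw bits, so signed integers and floats need an order-preserving mapping to unsigned integers first. The sketch below shows the commonly used bit transforms; the function names are hypothetical, and it is an assumption that GPUSortingD3D12 does something equivalent in its shaders.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers, not part of GPUSortingD3D12.

// Maps a float to a uint32_t whose unsigned order matches the float order.
// Negative values: flip every bit. Non-negative values: flip the sign bit.
inline uint32_t FloatToSortableBits(float f)
{
    uint32_t u;
    std::memcpy(&u, &f, sizeof(u));  // well-defined bit reinterpretation
    const uint32_t mask = (u & 0x80000000u) ? 0xFFFFFFFFu : 0x80000000u;
    return u ^ mask;
}

// Inverse of FloatToSortableBits.
inline float SortableBitsToFloat(uint32_t u)
{
    const uint32_t mask = (u & 0x80000000u) ? 0x80000000u : 0xFFFFFFFFu;
    u ^= mask;
    float f;
    std::memcpy(&f, &u, sizeof(f));
    return f;
}

// Maps an int32_t to a uint32_t with matching order by flipping the sign bit.
inline uint32_t Int32ToSortableBits(int32_t i)
{
    return static_cast<uint32_t>(i) ^ 0x80000000u;
}
```

With these mappings, the same unsigned sorting passes can handle all three key types. NaN values end up at the extremes of the order depending on their sign bit.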

Getting Started

  1. Clone or fork the repository.

  2. Download and install Visual Studio 2019 or greater.

  3. Open the solution file with Visual Studio. If using Visual Studio 2022, updating the platform toolset from v142 to v143 should work without issue.

  4. Build and run the solution.

Example Use

To use GPUSortingD3D12, create a device and poll its capabilities like so:

winrt::com_ptr<ID3D12Device> device = InitDevice();
DeviceInfo deviceInfo = GetDeviceInfo(device.get());

Then use the device pointer and information struct to create a sorting object. Use the enums found in GPUSorting.h to define the capabilities of the sorting object. These parameters cannot be changed without creating a new object.

DeviceRadixSort* dvr = new DeviceRadixSort(
        device, 
        deviceInfo,
        GPU_SORTING_ASCENDING,
        GPU_SORTING_KEY_FLOAT32,
        GPU_SORTING_PAYLOAD_UINT32);

Now run TestAll() to ensure the sort is running correctly:

dvr->TestAll();

To time the sort's performance, run BatchTiming(). To control the distribution of the keys generated for the tests a la Thearling and Smith, use the ENTROPY_PRESET enum. For a uniform random distribution with no change to the entropy, use ENTROPY_PRESET_1.

uint32_t testSize = 1 << 20;
uint32_t testSeed = 12345;
uint32_t testIterations = 100;
dvr->BatchTiming(testSize, testIterations, testSeed, ENTROPY_PRESET_1);
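For context on the entropy presets, the Thearling and Smith method lowers key entropy by bitwise-ANDing several uniform random values together, since each extra AND halves the probability that a given bit is set. The host-side sketch below illustrates the idea. The function name and the `andCount` parameter are hypothetical, and how the ENTROPY_PRESET values map onto AND counts in GPUSortingD3D12 is not specified here.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper, not part of GPUSortingD3D12.
// Generates `size` keys, each the bitwise AND of `andCount` uniform random
// 32-bit values. andCount == 1 yields a plain uniform distribution; larger
// values skew bits toward zero and so reduce the entropy of the keys.
inline std::vector<uint32_t> GenerateEntropyControlledKeys(
    std::size_t size, uint32_t andCount, uint32_t seed)
{
    std::mt19937 rng(seed);  // produces 32-bit values, seeded for repeatability
    std::vector<uint32_t> keys(size);
    for (uint32_t& k : keys)
    {
        k = 0xFFFFFFFFu;  // identity element for AND
        for (uint32_t i = 0; i < andCount; ++i)
            k &= static_cast<uint32_t>(rng());
    }
    return keys;
}
```

With `andCount` set to 2, each bit is set with probability 1/4, so duplicate digits become more common and the digit histograms more skewed than in the uniform case.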

Portability Concerns

OneSweep does not work on WARP.
