Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: UniqueByKey ,This function needs to be sorted in order to calculate accurately #3232

Closed
1 task done
mcmingchang opened this issue Jan 2, 2025 · 1 comment
Closed
1 task done
Labels
bug Something isn't working right.

Comments

@mcmingchang
Copy link

Is this a duplicate?

Type of Bug

Something else

Component

CUB

Describe the bug

I created 110000 random values based on uint64_t, with no size limit on the values. This array needs to be sorted from small to large in order to obtain the correct result.

How to Reproduce

global void voxel_pts(Eigen::Vector3f* pts_o, Eigen::Vector3f* min_pts, uint64_t* hash, uint64_t* idx_sort, float voxel_size, size_t pt_num) {
int ind = blockIdx.x * blockDim.x + threadIdx.x;
uint64_t x, y, z, hash_seed = 0;
if (ind < pt_num) {
idx_sort[ind] = ind;
x = round((pts_o[ind].x() - min_pts[0].x()) / voxel_size);
y = round((pts_o[ind].y() - min_pts[0].y()) / voxel_size);
z = round((pts_o[ind].z() - min_pts[0].z()) / voxel_size);
hash[ind] = ((((14695981039346656037 ^ x) * 1099511628211) ^ y) * 1099511628211) ^ z;
}
}

unique_storage = nullptr; sort_storage = nullptr; scan_storage = nullptr;
unique_bytes = 0; sort_bytes = 0; scan_bytes = 0;

cub::DeviceRadixSort::SortPairs(sort_storage, sort_bytes, hash, hash, idx_sort, idx_sort, pt_num);
cudaMalloc(&sort_storage, sort_bytes);
cub::DeviceRadixSort::SortPairs(sort_storage, sort_bytes, hash, hash, idx_sort, idx_sort, pt_num);

cub::DeviceSelect::UniqueByKey(unique_storage, unique_bytes, hash, idx_sort, hash, idx_sort, unique_num, pt_num);
cudaMalloc(&unique_storage, unique_bytes);
cub::DeviceSelect::UniqueByKey(unique_storage, unique_bytes, hash, idx_sort, hash, idx_sort, unique_num, pt_num);
cudaMemcpy(h_unique_num, unique_num, sizeof(uint64_t), cudaMemcpyDeviceToHost);

Expected behavior

Calculate to obtain the correct result.

The original number of points was 110000, but after calculation, the error output was 100000, and the correct number is 30000.

Reproduction link

No response

Operating System

No response

nvidia-smi output

Thu Jan 2 15:01:40 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 566.36 Driver Version: 566.36 CUDA Version: 12.7 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Driver-Model | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4060 ... WDDM | 00000000:01:00.0 Off | N/A |
| N/A 40C P8 1W / 105W | 0MiB / 8188MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+

NVCC version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Oct_30_01:18:48_Pacific_Daylight_Time_2024
Cuda compilation tools, release 12.6, V12.6.85
Build cuda_12.6.r12.6/compiler.35059454_0

@mcmingchang mcmingchang added the bug Something isn't working right. label Jan 2, 2025
@github-project-automation github-project-automation bot moved this to Todo in CCCL Jan 2, 2025
@elstehle
Copy link
Contributor

elstehle commented Jan 2, 2025

Thank you for reporting this issue. From your description, it seems like the core of the problem might be a misunderstanding regarding the behavior of cub::DeviceSelect::UniqueByKey. Specifically, it appears that you may expect UniqueByKey to work correctly even when the input sequence is not sorted.

However, as noted in the documentation:

"Given an input sequence d_keys_in and d_values_in with runs of key-value pairs with consecutive equal-valued keys, only the first key and its value from each run is selectively copied to d_keys_out and d_values_out."

This means that UniqueByKey assumes the input sequence (d_keys_in and d_values_in) contains runs of consecutive equal-valued keys. If the input is not sorted, the results may not be as expected. This behavior is similar to std::unique.

If the input sequence is unsorted, you'll need to sort it first (as you've done using cub::DeviceRadixSort::SortPairs) before calling UniqueByKey. This ensures the keys are grouped consecutively, allowing UniqueByKey to work as intended.

Please let us know if this clarifies your concern or if you’re observing an issue outside of this expected behavior.

@github-project-automation github-project-automation bot moved this from Todo to Done in CCCL Jan 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working right.
Projects
Archived in project
Development

No branches or pull requests

2 participants