Unifies workload generation for DeviceMerge benchmarks #3645

Merged on Feb 14, 2025 (1 commit)


elstehle
Collaborator

@elstehle elstehle commented Feb 3, 2025

Description

PR #3529 added benchmarks for DeviceMerge. While reviewing that PR, @gevtushenko noted that workload generation could be unified between the benchmark for merging keys and the benchmark for merging key-value pairs.

This PR unifies the workload generation across the two files.
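
As a rough illustration of what sharing workload generation between the two benchmarks can look like, here is a minimal host-only sketch. It is not the code from this PR: the helper name `generate_merge_inputs`, its parameters, and the key value range are all hypothetical, and the real benchmarks generate their inputs on the device with the benchmark helpers in the repository. The idea is only that both the keys benchmark and the key-value benchmark call one function that splits a total element count into two sorted input ranges.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Hypothetical shared helper (an assumption for illustration, not the CCCL API).
// Returns two sorted key ranges whose sizes sum to `total`; the first range
// holds roughly `lhs_fraction * total` elements.
template <typename KeyT>
std::pair<std::vector<KeyT>, std::vector<KeyT>>
generate_merge_inputs(std::size_t total, double lhs_fraction, std::uint32_t seed)
{
  assert(lhs_fraction >= 0.0 && lhs_fraction <= 1.0);

  // Size of the first input; the second input receives the remainder.
  const std::size_t lhs_size =
    std::min(total, static_cast<std::size_t>(static_cast<double>(total) * lhs_fraction));

  // Fill one buffer with pseudo-random keys (the range 0..1000 is arbitrary).
  std::mt19937 rng(seed);
  std::uniform_int_distribution<int> dist(0, 1000);
  std::vector<KeyT> keys(total);
  for (auto& key : keys)
  {
    key = static_cast<KeyT>(dist(rng));
  }

  // Split the buffer and sort each half, since a merge expects sorted inputs.
  const auto split = keys.begin() + static_cast<std::ptrdiff_t>(lhs_size);
  std::vector<KeyT> lhs(keys.begin(), split);
  std::vector<KeyT> rhs(split, keys.end());
  std::sort(lhs.begin(), lhs.end());
  std::sort(rhs.begin(), rhs.end());

  return {std::move(lhs), std::move(rhs)};
}
```

In this sketch a keys-only benchmark would use the two returned ranges directly, while a key-value benchmark would call the same helper and allocate value buffers of matching sizes, so the split and sorting logic lives in one place.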

@elstehle elstehle requested a review from a team as a code owner February 3, 2025 06:07
@elstehle elstehle requested a review from gevtushenko February 3, 2025 06:08
Contributor

github-actions bot commented Feb 3, 2025

🟩 CI finished in 1h 06m: Pass: 100%/90 | Total: 14h 57m | Avg: 9m 58s | Max: 34m 14s | Hits: 413%/12742
  • 🟩 cub: Pass: 100%/44 | Total: 7h 48m | Avg: 10m 38s | Max: 28m 00s | Hits: 539%/3512

    🟩 cpu
      🟩 amd64              Pass: 100%/42  | Total:  7h 36m | Avg: 10m 52s | Max: 28m 00s | Hits: 539%/3512  
      🟩 arm64              Pass: 100%/2   | Total: 11m 57s | Avg:  5m 58s | Max:  6m 12s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 51m 24s | Avg: 10m 16s | Max: 26m 20s | Hits: 539%/878   
      🟩 12.5               Pass: 100%/2   | Total: 19m 07s | Avg:  9m 33s | Max:  9m 45s
      🟩 12.8               Pass: 100%/37  | Total:  6h 37m | Avg: 10m 45s | Max: 28m 00s | Hits: 539%/2634  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  9m 17s | Avg:  4m 38s | Max:  4m 41s
      🟩 nvcc12.0           Pass: 100%/5   | Total: 51m 24s | Avg: 10m 16s | Max: 26m 20s | Hits: 539%/878   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 19m 07s | Avg:  9m 33s | Max:  9m 45s
      🟩 nvcc12.8           Pass: 100%/35  | Total:  6h 28m | Avg: 11m 06s | Max: 28m 00s | Hits: 539%/2634  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 17s | Avg:  4m 38s | Max:  4m 41s
      🟩 nvcc               Pass: 100%/42  | Total:  7h 39m | Avg: 10m 55s | Max: 28m 00s | Hits: 539%/3512  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 24m 33s | Avg:  6m 08s | Max:  6m 19s
      🟩 Clang15            Pass: 100%/2   | Total: 12m 36s | Avg:  6m 18s | Max:  6m 36s
      🟩 Clang16            Pass: 100%/2   | Total: 12m 15s | Avg:  6m 07s | Max:  6m 13s
      🟩 Clang17            Pass: 100%/2   | Total: 13m 02s | Avg:  6m 31s | Max:  6m 36s
      🟩 Clang18            Pass: 100%/7   | Total:  1h 09m | Avg:  9m 57s | Max: 23m 09s
      🟩 GCC7               Pass: 100%/2   | Total: 12m 29s | Avg:  6m 14s | Max:  6m 22s
      🟩 GCC8               Pass: 100%/1   | Total:  6m 37s | Avg:  6m 37s | Max:  6m 37s
      🟩 GCC9               Pass: 100%/2   | Total: 13m 10s | Avg:  6m 35s | Max:  6m 40s
      🟩 GCC10              Pass: 100%/2   | Total: 13m 31s | Avg:  6m 45s | Max:  7m 06s
      🟩 GCC11              Pass: 100%/2   | Total: 12m 35s | Avg:  6m 17s | Max:  6m 20s
      🟩 GCC12              Pass: 100%/2   | Total: 13m 18s | Avg:  6m 39s | Max:  6m 51s
      🟩 GCC13              Pass: 100%/10  | Total:  2h 15m | Avg: 13m 34s | Max: 24m 26s
      🟩 MSVC14.29          Pass: 100%/2   | Total: 53m 48s | Avg: 26m 54s | Max: 27m 28s | Hits: 539%/1756  
      🟩 MSVC14.39          Pass: 100%/2   | Total: 55m 52s | Avg: 27m 56s | Max: 28m 00s | Hits: 539%/1756  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 19m 07s | Avg:  9m 33s | Max:  9m 45s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  2h 12m | Avg:  7m 46s | Max: 23m 09s
      🟩 GCC                Pass: 100%/21  | Total:  3h 27m | Avg:  9m 52s | Max: 24m 26s
      🟩 MSVC               Pass: 100%/4   | Total:  1h 49m | Avg: 27m 25s | Max: 28m 00s | Hits: 539%/3512  
      🟩 NVHPC              Pass: 100%/2   | Total: 19m 07s | Avg:  9m 33s | Max:  9m 45s
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 29m 38s | Avg: 14m 49s | Max: 24m 26s
      🟩 rtx2080            Pass: 100%/34  | Total:  5h 03m | Avg:  8m 56s | Max: 28m 00s | Hits: 539%/3512  
      🟩 rtxa6000           Pass: 100%/8   | Total:  2h 14m | Avg: 16m 51s | Max: 23m 09s
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  5h 22m | Avg:  8m 42s | Max: 28m 00s | Hits: 539%/3512  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 19m 50s | Avg: 19m 50s | Max: 19m 50s
      🟩 GraphCapture       Pass: 100%/1   | Total: 15m 37s | Avg: 15m 37s | Max: 15m 37s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 10m | Avg: 23m 25s | Max: 24m 26s
      🟩 TestGPU            Pass: 100%/2   | Total: 40m 12s | Avg: 20m 06s | Max: 21m 01s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 29m 38s | Avg: 14m 49s | Max: 24m 26s
      🟩 90;90a;100         Pass: 100%/1   | Total:  7m 18s | Avg:  7m 18s | Max:  7m 18s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  3h 12m | Avg:  9m 37s | Max: 28m 00s | Hits: 539%/2634  
      🟩 20                 Pass: 100%/24  | Total:  4h 35m | Avg: 11m 29s | Max: 27m 52s | Hits: 539%/878   
    
  • 🟩 thrust: Pass: 100%/43 | Total: 6h 36m | Avg: 9m 13s | Max: 34m 14s | Hits: 365%/9230

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 17m 36s | Avg:  8m 48s | Max: 11m 14s
    🟩 cpu
      🟩 amd64              Pass: 100%/41  | Total:  6h 26m | Avg:  9m 26s | Max: 34m 14s | Hits: 365%/9230  
      🟩 arm64              Pass: 100%/2   | Total:  9m 57s | Avg:  4m 58s | Max:  5m 14s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 44m 10s | Avg:  8m 50s | Max: 22m 54s | Hits: 365%/1846  
      🟩 12.5               Pass: 100%/2   | Total: 31m 15s | Avg: 15m 37s | Max: 15m 56s
      🟩 12.8               Pass: 100%/36  | Total:  5h 21m | Avg:  8m 55s | Max: 34m 14s | Hits: 365%/7384  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 45s | Avg:  5m 22s | Max:  5m 38s
      🟩 nvcc12.0           Pass: 100%/5   | Total: 44m 10s | Avg:  8m 50s | Max: 22m 54s | Hits: 365%/1846  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 31m 15s | Avg: 15m 37s | Max: 15m 56s
      🟩 nvcc12.8           Pass: 100%/34  | Total:  5h 10m | Avg:  9m 08s | Max: 34m 14s | Hits: 365%/7384  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 45s | Avg:  5m 22s | Max:  5m 38s
      🟩 nvcc               Pass: 100%/41  | Total:  6h 25m | Avg:  9m 24s | Max: 34m 14s | Hits: 365%/9230  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 21m 52s | Avg:  5m 28s | Max:  5m 50s
      🟩 Clang15            Pass: 100%/2   | Total: 12m 01s | Avg:  6m 00s | Max:  6m 05s
      🟩 Clang16            Pass: 100%/2   | Total: 12m 01s | Avg:  6m 00s | Max:  6m 03s
      🟩 Clang17            Pass: 100%/2   | Total: 11m 04s | Avg:  5m 32s | Max:  5m 37s
      🟩 Clang18            Pass: 100%/7   | Total: 45m 32s | Avg:  6m 30s | Max: 10m 29s
      🟩 GCC7               Pass: 100%/2   | Total: 11m 14s | Avg:  5m 37s | Max:  5m 52s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 46s | Avg:  5m 46s | Max:  5m 46s
      🟩 GCC9               Pass: 100%/2   | Total: 11m 36s | Avg:  5m 48s | Max:  6m 08s
      🟩 GCC10              Pass: 100%/2   | Total: 11m 27s | Avg:  5m 43s | Max:  5m 47s
      🟩 GCC11              Pass: 100%/2   | Total: 12m 12s | Avg:  6m 06s | Max:  6m 22s
      🟩 GCC12              Pass: 100%/2   | Total: 12m 48s | Avg:  6m 24s | Max:  6m 29s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 02m | Avg:  7m 45s | Max: 11m 39s
      🟩 MSVC14.29          Pass: 100%/2   | Total: 49m 21s | Avg: 24m 40s | Max: 26m 27s | Hits: 365%/3692  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  1h 26m | Avg: 28m 50s | Max: 34m 14s | Hits: 365%/5538  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 31m 15s | Avg: 15m 37s | Max: 15m 56s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  1h 42m | Avg:  6m 01s | Max: 10m 29s
      🟩 GCC                Pass: 100%/19  | Total:  2h 07m | Avg:  6m 41s | Max: 11m 39s
      🟩 MSVC               Pass: 100%/5   | Total:  2h 15m | Avg: 27m 10s | Max: 34m 14s | Hits: 365%/9230  
      🟩 NVHPC              Pass: 100%/2   | Total: 31m 15s | Avg: 15m 37s | Max: 15m 56s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/33  | Total:  4h 26m | Avg:  8m 04s | Max: 26m 27s | Hits: 365%/5538  
      🟩 rtx4090            Pass: 100%/10  | Total:  2h 10m | Avg: 13m 00s | Max: 34m 14s | Hits: 365%/3692  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  5h 13m | Avg:  8m 27s | Max: 27m 54s | Hits: 365%/7384  
      🟩 TestCPU            Pass: 100%/3   | Total: 50m 07s | Avg: 16m 42s | Max: 34m 14s | Hits: 365%/1846  
      🟩 TestGPU            Pass: 100%/3   | Total: 33m 22s | Avg: 11m 07s | Max: 11m 39s
    🟩 sm
      🟩 90;90a;100         Pass: 100%/1   | Total:  6m 25s | Avg:  6m 25s | Max:  6m 25s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  3h 02m | Avg:  9m 07s | Max: 26m 27s | Hits: 365%/5538  
      🟩 20                 Pass: 100%/21  | Total:  3h 16m | Avg:  9m 21s | Max: 34m 14s | Hits: 365%/3692  
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 7m 00s | Avg: 3m 30s | Max: 4m 56s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  7m 00s | Avg:  3m 30s | Max:  4m 56s
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total:  7m 00s | Avg:  3m 30s | Max:  4m 56s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total:  7m 00s | Avg:  3m 30s | Max:  4m 56s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  7m 00s | Avg:  3m 30s | Max:  4m 56s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  7m 00s | Avg:  3m 30s | Max:  4m 56s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  7m 00s | Avg:  3m 30s | Max:  4m 56s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total:  7m 00s | Avg:  3m 30s | Max:  4m 56s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 04s | Avg:  2m 04s | Max:  2m 04s
      🟩 Test               Pass: 100%/1   | Total:  4m 56s | Avg:  4m 56s | Max:  4m 56s
    
  • 🟩 python: Pass: 100%/1 | Total: 25m 41s | Avg: 25m 41s | Max: 25m 41s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 25m 41s | Avg: 25m 41s | Max: 25m 41s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 25m 41s | Avg: 25m 41s | Max: 25m 41s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 25m 41s | Avg: 25m 41s | Max: 25m 41s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 25m 41s | Avg: 25m 41s | Max: 25m 41s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 25m 41s | Avg: 25m 41s | Max: 25m 41s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 25m 41s | Avg: 25m 41s | Max: 25m 41s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 25m 41s | Avg: 25m 41s | Max: 25m 41s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 25m 41s | Avg: 25m 41s | Max: 25m 41s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 90)

# Runner
65 linux-amd64-cpu16
9 windows-amd64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
4 linux-arm64-cpu16
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1
1 linux-amd64-gpu-h100-latest-1

@elstehle elstehle merged commit 60994b0 into NVIDIA:main Feb 14, 2025
105 of 108 checks passed
davebayer pushed a commit to davebayer/cccl that referenced this pull request Feb 20, 2025