PTX: Add tcgen05 instructions #3607

Merged: 11 commits, Jan 30, 2025

Conversation

@bernhardmgruber commented Jan 30, 2025

@bernhardmgruber marked this pull request as ready for review January 30, 2025 12:11
@NVIDIA deleted a comment from copy-pr-bot (bot) Jan 30, 2025

🟩 CI finished in 1h 28m: Pass: 100%/152 | Total: 3d 00h | Avg: 28m 37s | Max: 1h 13m | Hits: 416%/21675
  • 🟩 cub: Pass: 100%/44 | Total: 1d 15h | Avg: 53m 36s | Max: 1h 13m | Hits: 159%/3552

    🟩 cpu
      🟩 amd64              Pass: 100%/42  | Total:  1d 13h | Avg: 53m 18s | Max:  1h 13m | Hits: 159%/3552  
      🟩 arm64              Pass: 100%/2   | Total:  1h 59m | Avg: 59m 45s | Max: 59m 55s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  5h 01m | Avg:  1h 00m | Max:  1h 02m | Hits: 159%/888   
      🟩 12.5               Pass: 100%/2   | Total:  2h 10m | Avg:  1h 05m | Max:  1h 05m
      🟩 12.6               Pass: 100%/37  | Total:  1d 08h | Avg: 52m 04s | Max:  1h 13m | Hits: 159%/2664  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 07m
      🟩 nvcc12.0           Pass: 100%/5   | Total:  5h 01m | Avg:  1h 00m | Max:  1h 02m | Hits: 159%/888   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 10m | Avg:  1h 05m | Max:  1h 05m
      🟩 nvcc12.6           Pass: 100%/35  | Total:  1d 05h | Avg: 51m 12s | Max:  1h 13m | Hits: 159%/2664  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 07m
      🟩 nvcc               Pass: 100%/42  | Total:  1d 13h | Avg: 52m 56s | Max:  1h 13m | Hits: 159%/3552  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  3h 57m | Avg: 59m 16s | Max:  1h 02m
      🟩 Clang15            Pass: 100%/2   | Total:  1h 59m | Avg: 59m 40s | Max:  1h 00m
      🟩 Clang16            Pass: 100%/2   | Total:  1h 54m | Avg: 57m 25s | Max: 58m 12s
      🟩 Clang17            Pass: 100%/2   | Total:  1h 56m | Avg: 58m 14s | Max:  1h 01m
      🟩 Clang18            Pass: 100%/7   | Total:  6h 02m | Avg: 51m 43s | Max:  1h 07m
      🟩 GCC7               Pass: 100%/2   | Total:  1h 57m | Avg: 58m 38s | Max: 59m 18s
      🟩 GCC8               Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
      🟩 GCC9               Pass: 100%/2   | Total:  1h 59m | Avg: 59m 39s | Max:  1h 01m
      🟩 GCC10              Pass: 100%/2   | Total:  1h 59m | Avg: 59m 59s | Max:  1h 02m
      🟩 GCC11              Pass: 100%/2   | Total:  1h 56m | Avg: 58m 03s | Max: 59m 30s
      🟩 GCC12              Pass: 100%/4   | Total:  2h 57m | Avg: 44m 26s | Max:  1h 03m
      🟩 GCC13              Pass: 100%/8   | Total:  4h 53m | Avg: 36m 39s | Max:  1h 02m
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 13m | Avg:  1h 06m | Max:  1h 13m | Hits: 159%/1776  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 19m | Avg:  1h 09m | Max:  1h 11m | Hits: 158%/1776  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 10m | Avg:  1h 05m | Max:  1h 05m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 15h 49m | Avg: 55m 52s | Max:  1h 07m
      🟩 GCC                Pass: 100%/21  | Total: 16h 45m | Avg: 47m 52s | Max:  1h 03m
      🟩 MSVC               Pass: 100%/4   | Total:  4h 32m | Avg:  1h 08m | Max:  1h 13m | Hits: 159%/3552  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 10m | Avg:  1h 05m | Max:  1h 05m
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 55m 40s | Avg: 27m 50s | Max: 29m 24s
      🟩 rtxa6000           Pass: 100%/8   | Total:  4h 11m | Avg: 31m 25s | Max: 58m 19s
      🟩 v100               Pass: 100%/34  | Total:  1d 10h | Avg:  1h 00m | Max:  1h 13m | Hits: 159%/3552  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 12h | Avg: 59m 17s | Max:  1h 13m | Hits: 159%/3552  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 19m 42s | Avg: 19m 42s | Max: 19m 42s
      🟩 GraphCapture       Pass: 100%/1   | Total: 15m 29s | Avg: 15m 29s | Max: 15m 29s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 23m | Avg: 27m 48s | Max: 29m 24s
      🟩 TestGPU            Pass: 100%/2   | Total: 46m 18s | Avg: 23m 09s | Max: 23m 26s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 55m 40s | Avg: 27m 50s | Max: 29m 24s
      🟩 90a                Pass: 100%/1   | Total: 26m 03s | Avg: 26m 03s | Max: 26m 03s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 20h 32m | Avg:  1h 01m | Max:  1h 13m | Hits: 159%/2664  
      🟩 20                 Pass: 100%/24  | Total: 18h 46m | Avg: 46m 55s | Max:  1h 11m | Hits: 158%/888   
    
  • 🟩 libcudacxx: Pass: 100%/43 | Total: 7h 12m | Avg: 10m 03s | Max: 34m 59s | Hits: 680%/10217

    🟩 cpu
      🟩 amd64              Pass: 100%/41  | Total:  7h 05m | Avg: 10m 22s | Max: 34m 59s | Hits: 680%/10217 
      🟩 arm64              Pass: 100%/2   | Total:  7m 26s | Avg:  3m 43s | Max:  3m 49s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 54m 09s | Avg: 10m 49s | Max: 22m 57s | Hits: 671%/2509  
      🟩 12.5               Pass: 100%/2   | Total: 43m 15s | Avg: 21m 37s | Max: 34m 59s
      🟩 12.6               Pass: 100%/36  | Total:  5h 35m | Avg:  9m 18s | Max: 28m 06s | Hits: 683%/7708  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/4   | Total:  1h 06m | Avg: 16m 33s | Max: 20m 10s
      🟩 nvcc12.0           Pass: 100%/5   | Total: 54m 09s | Avg: 10m 49s | Max: 22m 57s | Hits: 671%/2509  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 43m 15s | Avg: 21m 37s | Max: 34m 59s
      🟩 nvcc12.6           Pass: 100%/32  | Total:  4h 28m | Avg:  8m 24s | Max: 28m 06s | Hits: 683%/7708  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/4   | Total:  1h 06m | Avg: 16m 33s | Max: 20m 10s
      🟩 nvcc               Pass: 100%/39  | Total:  6h 06m | Avg:  9m 23s | Max: 34m 59s | Hits: 680%/10217 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 17m 15s | Avg:  4m 18s | Max:  4m 39s
      🟩 Clang15            Pass: 100%/2   | Total: 21m 28s | Avg: 10m 44s | Max: 16m 27s
      🟩 Clang16            Pass: 100%/2   | Total:  9m 11s | Avg:  4m 35s | Max:  4m 46s
      🟩 Clang17            Pass: 100%/2   | Total: 18m 21s | Avg:  9m 10s | Max: 13m 48s
      🟩 Clang18            Pass: 100%/8   | Total:  1h 28m | Avg: 11m 01s | Max: 20m 10s
      🟩 GCC7               Pass: 100%/2   | Total: 36m 23s | Avg: 18m 11s | Max: 19m 14s
      🟩 GCC8               Pass: 100%/1   | Total:  3m 46s | Avg:  3m 46s | Max:  3m 46s
      🟩 GCC9               Pass: 100%/2   | Total:  8m 15s | Avg:  4m 07s | Max:  4m 17s
      🟩 GCC10              Pass: 100%/2   | Total:  8m 13s | Avg:  4m 06s | Max:  4m 18s
      🟩 GCC11              Pass: 100%/2   | Total:  8m 16s | Avg:  4m 08s | Max:  4m 10s
      🟩 GCC12              Pass: 100%/2   | Total:  8m 25s | Avg:  4m 12s | Max:  4m 14s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 01m | Avg:  7m 44s | Max: 16m 34s
      🟩 MSVC14.29          Pass: 100%/2   | Total: 46m 50s | Avg: 23m 25s | Max: 23m 53s | Hits: 677%/5028  
      🟩 MSVC14.39          Pass: 100%/2   | Total: 52m 47s | Avg: 26m 23s | Max: 28m 06s | Hits: 683%/5189  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 43m 15s | Avg: 21m 37s | Max: 34m 59s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/18  | Total:  2h 34m | Avg:  8m 34s | Max: 20m 10s
      🟩 GCC                Pass: 100%/19  | Total:  2h 15m | Avg:  7m 07s | Max: 19m 14s
      🟩 MSVC               Pass: 100%/4   | Total:  1h 39m | Avg: 24m 54s | Max: 28m 06s | Hits: 680%/10217 
      🟩 NVHPC              Pass: 100%/2   | Total: 43m 15s | Avg: 21m 37s | Max: 34m 59s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/6   | Total: 59m 24s | Avg:  9m 54s | Max: 16m 34s
      🟩 v100               Pass: 100%/37  | Total:  6h 13m | Avg: 10m 05s | Max: 34m 59s | Hits: 680%/10217 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total:  6h 20m | Avg: 10m 00s | Max: 34m 59s | Hits: 680%/10217 
      🟩 NVRTC              Pass: 100%/2   | Total: 32m 14s | Avg: 16m 07s | Max: 16m 34s
      🟩 Test               Pass: 100%/2   | Total: 18m 04s | Avg:  9m 02s | Max:  9m 05s
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  1m 55s | Avg:  1m 55s | Max:  1m 55s
    🟩 sm
      🟩 75                 Pass: 100%/2   | Total: 32m 14s | Avg: 16m 07s | Max: 16m 34s
      🟩 90                 Pass: 100%/1   | Total: 12m 29s | Avg: 12m 29s | Max: 12m 29s
      🟩 90a                Pass: 100%/2   | Total: 20m 51s | Avg: 10m 25s | Max: 13m 47s
    🟩 std
      🟩 17                 Pass: 100%/21  | Total:  3h 47m | Avg: 10m 51s | Max: 24m 41s | Hits: 679%/7547  
      🟩 20                 Pass: 100%/21  | Total:  3h 22m | Avg:  9m 39s | Max: 34m 59s | Hits: 682%/2670  
    
  • 🟩 thrust: Pass: 100%/42 | Total: 23h 31m | Avg: 33m 35s | Max: 1h 08m | Hits: 177%/7384

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 39m 11s | Avg: 19m 35s | Max: 27m 57s
    🟩 cpu
      🟩 amd64              Pass: 100%/40  | Total: 22h 31m | Avg: 33m 47s | Max:  1h 08m | Hits: 177%/7384  
      🟩 arm64              Pass: 100%/2   | Total: 59m 35s | Avg: 29m 47s | Max: 31m 33s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 20m | Avg: 40m 07s | Max:  1h 08m | Hits: 177%/1846  
      🟩 12.5               Pass: 100%/2   | Total:  1h 46m | Avg: 53m 02s | Max: 53m 56s
      🟩 12.6               Pass: 100%/35  | Total: 18h 24m | Avg: 31m 33s | Max:  1h 06m | Hits: 177%/5538  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 53m 54s | Avg: 26m 57s | Max: 27m 27s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 20m | Avg: 40m 07s | Max:  1h 08m | Hits: 177%/1846  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 46m | Avg: 53m 02s | Max: 53m 56s
      🟩 nvcc12.6           Pass: 100%/33  | Total: 17h 30m | Avg: 31m 49s | Max:  1h 06m | Hits: 177%/5538  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 53m 54s | Avg: 26m 57s | Max: 27m 27s
      🟩 nvcc               Pass: 100%/40  | Total: 22h 37m | Avg: 33m 55s | Max:  1h 08m | Hits: 177%/7384  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 13m | Avg: 33m 15s | Max: 34m 25s
      🟩 Clang15            Pass: 100%/2   | Total:  1h 04m | Avg: 32m 29s | Max: 32m 35s
      🟩 Clang16            Pass: 100%/2   | Total:  1h 05m | Avg: 32m 52s | Max: 34m 29s
      🟩 Clang17            Pass: 100%/2   | Total:  1h 06m | Avg: 33m 01s | Max: 33m 37s
      🟩 Clang18            Pass: 100%/7   | Total:  2h 45m | Avg: 23m 39s | Max: 33m 23s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 02m | Avg: 31m 19s | Max: 32m 53s
      🟩 GCC8               Pass: 100%/1   | Total: 32m 12s | Avg: 32m 12s | Max: 32m 12s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 13m | Avg: 36m 38s | Max: 42m 05s
      🟩 GCC10              Pass: 100%/2   | Total:  1h 11m | Avg: 35m 49s | Max: 37m 47s
      🟩 GCC11              Pass: 100%/2   | Total:  1h 02m | Avg: 31m 10s | Max: 31m 27s
      🟩 GCC12              Pass: 100%/2   | Total:  1h 07m | Avg: 33m 55s | Max: 35m 53s
      🟩 GCC13              Pass: 100%/8   | Total:  2h 57m | Avg: 22m 13s | Max: 35m 25s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 13m | Avg:  1h 06m | Max:  1h 08m | Hits: 177%/3692  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 06m | Hits: 177%/3692  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 46m | Avg: 53m 02s | Max: 53m 56s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  8h 15m | Avg: 29m 08s | Max: 34m 29s
      🟩 GCC                Pass: 100%/19  | Total:  9h 07m | Avg: 28m 49s | Max: 42m 05s
      🟩 MSVC               Pass: 100%/4   | Total:  4h 21m | Avg:  1h 05m | Max:  1h 08m | Hits: 177%/7384  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 46m | Avg: 53m 02s | Max: 53m 56s
    🟩 gpu
      🟩 rtx4090            Pass: 100%/8   | Total:  2h 24m | Avg: 18m 03s | Max: 35m 25s
      🟩 v100               Pass: 100%/34  | Total: 21h 06m | Avg: 37m 15s | Max:  1h 08m | Hits: 177%/7384  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total: 22h 41m | Avg: 36m 48s | Max:  1h 08m | Hits: 177%/7384  
      🟩 TestCPU            Pass: 100%/2   | Total: 15m 59s | Avg:  7m 59s | Max:  8m 07s
      🟩 TestGPU            Pass: 100%/3   | Total: 33m 09s | Avg: 11m 03s | Max: 11m 25s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total: 20m 08s | Avg: 20m 08s | Max: 20m 08s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 12h 57m | Avg: 38m 53s | Max:  1h 08m | Hits: 177%/5538  
      🟩 20                 Pass: 100%/20  | Total:  9h 54m | Avg: 29m 42s | Max:  1h 06m | Hits: 177%/1846  
    
  • 🟩 cudax: Pass: 100%/20 | Total: 1h 51m | Avg: 5m 34s | Max: 13m 10s | Hits: 383%/522

    🟩 cpu
      🟩 amd64              Pass: 100%/16  | Total:  1h 37m | Avg:  6m 04s | Max: 13m 10s | Hits: 383%/522   
      🟩 arm64              Pass: 100%/4   | Total: 14m 09s | Avg:  3m 32s | Max:  3m 40s
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total:  9m 27s | Avg:  9m 27s | Max:  9m 27s | Hits: 383%/261   
      🟩 12.5               Pass: 100%/2   | Total: 12m 36s | Avg:  6m 18s | Max:  6m 25s
      🟩 12.6               Pass: 100%/17  | Total:  1h 29m | Avg:  5m 15s | Max: 13m 10s | Hits: 383%/261   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total:  9m 27s | Avg:  9m 27s | Max:  9m 27s | Hits: 383%/261   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 12m 36s | Avg:  6m 18s | Max:  6m 25s
      🟩 nvcc12.6           Pass: 100%/17  | Total:  1h 29m | Avg:  5m 15s | Max: 13m 10s | Hits: 383%/261   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/20  | Total:  1h 51m | Avg:  5m 34s | Max: 13m 10s | Hits: 383%/522   
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  3m 53s | Avg:  3m 53s | Max:  3m 53s
      🟩 Clang15            Pass: 100%/1   | Total:  4m 08s | Avg:  4m 08s | Max:  4m 08s
      🟩 Clang16            Pass: 100%/1   | Total:  4m 03s | Avg:  4m 03s | Max:  4m 03s
      🟩 Clang17            Pass: 100%/1   | Total:  4m 02s | Avg:  4m 02s | Max:  4m 02s
      🟩 Clang18            Pass: 100%/4   | Total: 22m 34s | Avg:  5m 38s | Max: 11m 31s
      🟩 GCC10              Pass: 100%/1   | Total:  4m 01s | Avg:  4m 01s | Max:  4m 01s
      🟩 GCC11              Pass: 100%/1   | Total:  4m 02s | Avg:  4m 02s | Max:  4m 02s
      🟩 GCC12              Pass: 100%/2   | Total: 17m 24s | Avg:  8m 42s | Max: 13m 10s
      🟩 GCC13              Pass: 100%/4   | Total: 13m 37s | Avg:  3m 24s | Max:  3m 40s
      🟩 MSVC14.36          Pass: 100%/1   | Total:  9m 27s | Avg:  9m 27s | Max:  9m 27s | Hits: 383%/261   
      🟩 MSVC14.39          Pass: 100%/1   | Total: 11m 40s | Avg: 11m 40s | Max: 11m 40s | Hits: 383%/261   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 12m 36s | Avg:  6m 18s | Max:  6m 25s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total: 38m 40s | Avg:  4m 50s | Max: 11m 31s
      🟩 GCC                Pass: 100%/8   | Total: 39m 04s | Avg:  4m 53s | Max: 13m 10s
      🟩 MSVC               Pass: 100%/2   | Total: 21m 07s | Avg: 10m 33s | Max: 11m 40s | Hits: 383%/522   
      🟩 NVHPC              Pass: 100%/2   | Total: 12m 36s | Avg:  6m 18s | Max:  6m 25s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/4   | Total: 32m 54s | Avg:  8m 13s | Max: 13m 10s
      🟩 v100               Pass: 100%/16  | Total:  1h 18m | Avg:  4m 54s | Max: 11m 40s | Hits: 383%/522   
    🟩 jobs
      🟩 Build              Pass: 100%/18  | Total:  1h 26m | Avg:  4m 49s | Max: 11m 40s | Hits: 383%/522   
      🟩 Test               Pass: 100%/2   | Total: 24m 41s | Avg: 12m 20s | Max: 13m 10s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  3m 08s | Avg:  3m 08s | Max:  3m 08s
      🟩 90a                Pass: 100%/1   | Total:  3m 24s | Avg:  3m 24s | Max:  3m 24s
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 16m 16s | Avg:  4m 04s | Max:  6m 11s
      🟩 20                 Pass: 100%/16  | Total:  1h 35m | Avg:  5m 56s | Max: 13m 10s | Hits: 383%/522   
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 7m 24s | Avg: 3m 42s | Max: 5m 11s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  7m 24s | Avg:  3m 42s | Max:  5m 11s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  7m 24s | Avg:  3m 42s | Max:  5m 11s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  7m 24s | Avg:  3m 42s | Max:  5m 11s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  7m 24s | Avg:  3m 42s | Max:  5m 11s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  7m 24s | Avg:  3m 42s | Max:  5m 11s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  7m 24s | Avg:  3m 42s | Max:  5m 11s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total:  7m 24s | Avg:  3m 42s | Max:  5m 11s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 13s | Avg:  2m 13s | Max:  2m 13s
      🟩 Test               Pass: 100%/1   | Total:  5m 11s | Avg:  5m 11s | Max:  5m 11s
    
  • 🟩 python: Pass: 100%/1 | Total: 29m 58s | Avg: 29m 58s | Max: 29m 58s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 29m 58s | Avg: 29m 58s | Max: 29m 58s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 29m 58s | Avg: 29m 58s | Max: 29m 58s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 29m 58s | Avg: 29m 58s | Max: 29m 58s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 29m 58s | Avg: 29m 58s | Max: 29m 58s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 29m 58s | Avg: 29m 58s | Max: 29m 58s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 29m 58s | Avg: 29m 58s | Max: 29m 58s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 29m 58s | Avg: 29m 58s | Max: 29m 58s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 29m 58s | Avg: 29m 58s | Max: 29m 58s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
+/- libcu++
CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 152)

# Runner
110 linux-amd64-cpu16
14 windows-amd64-cpu16
10 linux-arm64-cpu16
8 linux-amd64-gpu-rtx2080-latest-1
6 linux-amd64-gpu-rtxa6000-latest-1
3 linux-amd64-gpu-rtx4090-latest-1
1 linux-amd64-gpu-h100-latest-1

@miscco merged commit 38983eb into NVIDIA:main Jan 30, 2025
165 of 169 checks passed

Backport failed for branch/2.8.x, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally.

git fetch origin branch/2.8.x
git worktree add -d .worktree/backport-3607-to-branch/2.8.x origin/branch/2.8.x
cd .worktree/backport-3607-to-branch/2.8.x
git checkout -b backport-3607-to-branch/2.8.x
ancref=$(git merge-base afa2ca25d00fc9bd8037b3b2ca064f2c18708bfc 2a51ac3d7ee1d8e4dc296cd6205e831798837a3d)
git cherry-pick -x $ancref..2a51ac3d7ee1d8e4dc296cd6205e831798837a3d

@bernhardmgruber deleted the ptx_tcgen05 branch January 30, 2025 15:42
bernhardmgruber added a commit that referenced this pull request Jan 31, 2025
* ptx: Add tcgen05.alloc

* ptx: Add tcgen05.commit

* ptx: Add tcgen05.cp

* ptx: Add tcgen05.fence

* ptx: Add tcgen05.ld

* ptx: Add tcgen05.mma

* ptx: Add tcgen05.mma.ws

* ptx: Add tcgen05.shift

* ptx: Add tcgen05.st

* ptx: Add tcgen05.wait

* fix docs

---------

Co-authored-by: Allard Hendriksen