Sync ptx helpers with libcudaptx #3564

Merged: 1 commit, Jan 28, 2025

bernhardmgruber
Contributor

No description provided.

@bernhardmgruber bernhardmgruber requested a review from a team as a code owner January 28, 2025 14:42
@bernhardmgruber bernhardmgruber changed the title Sync ptx_dot_variants.h with libcuda-ptx Sync ptx helpers with libcudaptx Jan 28, 2025
@@ -22,6 +22,7 @@
# pragma system_header
#endif // no system header

#include <cuda/std/__type_traits/enable_if.h>
Contributor Author

Note: we are going to use this in generated functions which will be proposed in subsequent PRs.
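
The generated functions themselves are not part of this PR, so the following is only a sketch of the kind of constraint `enable_if` makes possible: a template that accepts any 4-byte, trivially copyable type and is removed from overload resolution otherwise. The names `as_b32`, `accepts_b32` and `check_b32_constraint` are hypothetical, and `std::enable_if_t` stands in for `cuda::std::enable_if_t` so the example builds on the host without CCCL.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <utility>

// Hypothetical wrapper (not from this PR): accepts any trivially copyable
// 4-byte type and returns its raw bit pattern. std::enable_if_t stands in
// for cuda::std::enable_if_t so the sketch builds without CCCL.
template <typename B32,
          std::enable_if_t<sizeof(B32) == 4 && std::is_trivially_copyable<B32>::value, bool> = true>
std::uint32_t as_b32(const B32& value)
{
  std::uint32_t bits = 0;
  std::memcpy(&bits, &value, sizeof(bits));
  return bits;
}

// Detection helper: true only when as_b32(T) survives overload resolution.
template <typename T, typename = void>
struct accepts_b32 : std::false_type {};

template <typename T>
struct accepts_b32<T, decltype(void(as_b32(std::declval<const T&>())))> : std::true_type {};

// Hypothetical usage check: 4-byte types are accepted, 8-byte types are not.
inline void check_b32_constraint()
{
  assert(as_b32(std::uint32_t{42}) == 42u);
  assert(accepts_b32<float>::value);
  assert(!accepts_b32<double>::value);
}
```

With a constraint of this shape, passing a `double` or a `char` is rejected at compile time with "no matching function" rather than being silently truncated or widened.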

Contributor

@ahendriksen ahendriksen left a comment

LGTM! One question.

Contributor

🟩 CI finished in 3h 54m: Pass: 100%/152 | Total: 3d 01h | Avg: 29m 01s | Max: 1h 14m | Hits: 414%/21515
  • 🟩 cub: Pass: 100%/44 | Total: 1d 15h | Avg: 54m 03s | Max: 1h 14m | Hits: 159%/3552

    🟩 cpu
      🟩 amd64              Pass: 100%/42  | Total:  1d 13h | Avg: 53m 37s | Max:  1h 14m | Hits: 159%/3552  
      🟩 arm64              Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 03m
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  5h 08m | Avg:  1h 01m | Max:  1h 02m | Hits: 159%/888   
      🟩 12.5               Pass: 100%/2   | Total:  2h 20m | Avg:  1h 10m | Max:  1h 14m
      🟩 12.6               Pass: 100%/37  | Total:  1d 08h | Avg: 52m 09s | Max:  1h 14m | Hits: 159%/2664  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 01m | Avg:  1h 00m | Max:  1h 01m
      🟩 nvcc12.0           Pass: 100%/5   | Total:  5h 08m | Avg:  1h 01m | Max:  1h 02m | Hits: 159%/888   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 20m | Avg:  1h 10m | Max:  1h 14m
      🟩 nvcc12.6           Pass: 100%/35  | Total:  1d 06h | Avg: 51m 40s | Max:  1h 14m | Hits: 159%/2664  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 01m | Avg:  1h 00m | Max:  1h 01m
      🟩 nvcc               Pass: 100%/42  | Total:  1d 13h | Avg: 53m 44s | Max:  1h 14m | Hits: 159%/3552  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  4h 05m | Avg:  1h 01m | Max:  1h 02m
      🟩 Clang15            Pass: 100%/2   | Total:  1h 59m | Avg: 59m 48s | Max:  1h 00m
      🟩 Clang16            Pass: 100%/2   | Total:  1h 59m | Avg: 59m 32s | Max:  1h 00m
      🟩 Clang17            Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 02m
      🟩 Clang18            Pass: 100%/7   | Total:  5h 36m | Avg: 48m 06s | Max:  1h 03m
      🟩 GCC7               Pass: 100%/2   | Total:  1h 57m | Avg: 58m 58s | Max:  1h 00m
      🟩 GCC8               Pass: 100%/1   | Total:  1h 01m | Avg:  1h 01m | Max:  1h 01m
      🟩 GCC9               Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 03m
      🟩 GCC10              Pass: 100%/2   | Total:  2h 01m | Avg:  1h 00m | Max:  1h 02m
      🟩 GCC11              Pass: 100%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 03m
      🟩 GCC12              Pass: 100%/4   | Total:  2h 46m | Avg: 41m 32s | Max:  1h 00m
      🟩 GCC13              Pass: 100%/8   | Total:  4h 58m | Avg: 37m 21s | Max:  1h 02m
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 12m | Avg:  1h 06m | Max:  1h 10m | Hits: 159%/1776  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 28m | Avg:  1h 14m | Max:  1h 14m | Hits: 159%/1776  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 20m | Avg:  1h 10m | Max:  1h 14m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 15h 44m | Avg: 55m 33s | Max:  1h 03m
      🟩 GCC                Pass: 100%/21  | Total: 16h 51m | Avg: 48m 10s | Max:  1h 03m
      🟩 MSVC               Pass: 100%/4   | Total:  4h 41m | Avg:  1h 10m | Max:  1h 14m | Hits: 159%/3552  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 20m | Avg:  1h 10m | Max:  1h 14m
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 47m 01s | Avg: 23m 30s | Max: 27m 32s
      🟩 v100               Pass: 100%/42  | Total:  1d 14h | Avg: 55m 30s | Max:  1h 14m | Hits: 159%/3552  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 13h | Avg:  1h 00m | Max:  1h 14m | Hits: 159%/3552  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 20m 05s | Avg: 20m 05s | Max: 20m 05s
      🟩 GraphCapture       Pass: 100%/1   | Total: 17m 09s | Avg: 17m 09s | Max: 17m 09s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 06m | Avg: 22m 07s | Max: 26m 19s
      🟩 TestGPU            Pass: 100%/2   | Total: 47m 52s | Avg: 23m 56s | Max: 28m 22s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 47m 01s | Avg: 23m 30s | Max: 27m 32s
      🟩 90a                Pass: 100%/1   | Total: 23m 58s | Avg: 23m 58s | Max: 23m 58s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 20h 52m | Avg:  1h 02m | Max:  1h 14m | Hits: 159%/2664  
      🟩 20                 Pass: 100%/24  | Total: 18h 46m | Avg: 46m 55s | Max:  1h 14m | Hits: 158%/888   
    
  • 🟩 libcudacxx: Pass: 100%/43 | Total: 7h 21m | Avg: 10m 16s | Max: 29m 18s | Hits: 680%/10065

    🟩 cpu
      🟩 amd64              Pass: 100%/41  | Total:  7h 02m | Avg: 10m 17s | Max: 29m 18s | Hits: 680%/10065 
      🟩 arm64              Pass: 100%/2   | Total: 19m 31s | Avg:  9m 45s | Max: 15m 54s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 37m 53s | Avg:  7m 34s | Max: 22m 38s | Hits: 680%/2471  
      🟩 12.5               Pass: 100%/2   | Total: 18m 14s | Avg:  9m 07s | Max:  9m 28s
      🟩 12.6               Pass: 100%/36  | Total:  6h 25m | Avg: 10m 42s | Max: 29m 18s | Hits: 680%/7594  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/4   | Total:  1h 13m | Avg: 18m 19s | Max: 22m 03s
      🟩 nvcc12.0           Pass: 100%/5   | Total: 37m 53s | Avg:  7m 34s | Max: 22m 38s | Hits: 680%/2471  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 18m 14s | Avg:  9m 07s | Max:  9m 28s
      🟩 nvcc12.6           Pass: 100%/32  | Total:  5h 12m | Avg:  9m 45s | Max: 29m 18s | Hits: 680%/7594  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/4   | Total:  1h 13m | Avg: 18m 19s | Max: 22m 03s
      🟩 nvcc               Pass: 100%/39  | Total:  6h 08m | Avg:  9m 26s | Max: 29m 18s | Hits: 680%/10065 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 16m 27s | Avg:  4m 06s | Max:  4m 22s
      🟩 Clang15            Pass: 100%/2   | Total:  9m 38s | Avg:  4m 49s | Max:  4m 54s
      🟩 Clang16            Pass: 100%/2   | Total:  9m 29s | Avg:  4m 44s | Max:  5m 01s
      🟩 Clang17            Pass: 100%/2   | Total:  9m 22s | Avg:  4m 41s | Max:  4m 45s
      🟩 Clang18            Pass: 100%/8   | Total:  2h 03m | Avg: 15m 22s | Max: 24m 08s
      🟩 GCC7               Pass: 100%/2   | Total:  7m 33s | Avg:  3m 46s | Max:  4m 05s
      🟩 GCC8               Pass: 100%/1   | Total:  4m 16s | Avg:  4m 16s | Max:  4m 16s
      🟩 GCC9               Pass: 100%/2   | Total:  8m 06s | Avg:  4m 03s | Max:  4m 13s
      🟩 GCC10              Pass: 100%/2   | Total:  8m 17s | Avg:  4m 08s | Max:  4m 09s
      🟩 GCC11              Pass: 100%/2   | Total:  8m 44s | Avg:  4m 22s | Max:  4m 27s
      🟩 GCC12              Pass: 100%/2   | Total:  8m 40s | Avg:  4m 20s | Max:  4m 42s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 49m | Avg: 13m 43s | Max: 29m 18s
      🟩 MSVC14.29          Pass: 100%/2   | Total: 47m 09s | Avg: 23m 34s | Max: 24m 31s | Hits: 680%/4952  
      🟩 MSVC14.39          Pass: 100%/2   | Total: 52m 59s | Avg: 26m 29s | Max: 27m 27s | Hits: 679%/5113  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 18m 14s | Avg:  9m 07s | Max:  9m 28s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/18  | Total:  2h 47m | Avg:  9m 19s | Max: 24m 08s
      🟩 GCC                Pass: 100%/19  | Total:  2h 35m | Avg:  8m 10s | Max: 29m 18s
      🟩 MSVC               Pass: 100%/4   | Total:  1h 40m | Avg: 25m 02s | Max: 27m 27s | Hits: 680%/10065 
      🟩 NVHPC              Pass: 100%/2   | Total: 18m 14s | Avg:  9m 07s | Max:  9m 28s
    🟩 gpu
      🟩 v100               Pass: 100%/43  | Total:  7h 21m | Avg: 10m 16s | Max: 29m 18s | Hits: 680%/10065 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total:  5h 38m | Avg:  8m 54s | Max: 27m 27s | Hits: 680%/10065 
      🟩 NVRTC              Pass: 100%/2   | Total: 52m 28s | Avg: 26m 14s | Max: 29m 18s
      🟩 Test               Pass: 100%/2   | Total: 48m 36s | Avg: 24m 18s | Max: 24m 28s
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 04s | Avg:  2m 04s | Max:  2m 04s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total: 15m 44s | Avg: 15m 44s | Max: 15m 44s
      🟩 90a                Pass: 100%/2   | Total: 21m 33s | Avg: 10m 46s | Max: 13m 37s
    🟩 std
      🟩 17                 Pass: 100%/21  | Total:  3h 21m | Avg:  9m 35s | Max: 25m 32s | Hits: 680%/7433  
      🟩 20                 Pass: 100%/21  | Total:  3h 58m | Avg: 11m 20s | Max: 29m 18s | Hits: 679%/2632  
    
  • 🟩 thrust: Pass: 100%/42 | Total: 23h 35m | Avg: 33m 42s | Max: 1h 08m | Hits: 177%/7376

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 37m 48s | Avg: 18m 54s | Max: 26m 22s
    🟩 cpu
      🟩 amd64              Pass: 100%/40  | Total: 22h 36m | Avg: 33m 55s | Max:  1h 08m | Hits: 177%/7376  
      🟩 arm64              Pass: 100%/2   | Total: 58m 55s | Avg: 29m 27s | Max: 30m 54s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 16m | Avg: 39m 17s | Max: 57m 55s | Hits: 177%/1844  
      🟩 12.5               Pass: 100%/2   | Total:  1h 59m | Avg: 59m 52s | Max:  1h 00m
      🟩 12.6               Pass: 100%/35  | Total: 18h 19m | Avg: 31m 24s | Max:  1h 08m | Hits: 177%/5532  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 54m 11s | Avg: 27m 05s | Max: 27m 11s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 16m | Avg: 39m 17s | Max: 57m 55s | Hits: 177%/1844  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 59m | Avg: 59m 52s | Max:  1h 00m
      🟩 nvcc12.6           Pass: 100%/33  | Total: 17h 25m | Avg: 31m 40s | Max:  1h 08m | Hits: 177%/5532  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 54m 11s | Avg: 27m 05s | Max: 27m 11s
      🟩 nvcc               Pass: 100%/40  | Total: 22h 41m | Avg: 34m 02s | Max:  1h 08m | Hits: 177%/7376  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 18m | Avg: 34m 38s | Max: 35m 40s
      🟩 Clang15            Pass: 100%/2   | Total:  1h 06m | Avg: 33m 12s | Max: 33m 45s
      🟩 Clang16            Pass: 100%/2   | Total:  1h 01m | Avg: 30m 38s | Max: 31m 51s
      🟩 Clang17            Pass: 100%/2   | Total:  1h 05m | Avg: 32m 33s | Max: 34m 32s
      🟩 Clang18            Pass: 100%/7   | Total:  2h 46m | Avg: 23m 48s | Max: 33m 11s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 07m | Avg: 33m 41s | Max: 33m 59s
      🟩 GCC8               Pass: 100%/1   | Total: 30m 42s | Avg: 30m 42s | Max: 30m 42s
      🟩 GCC9               Pass: 100%/2   | Total:  1h 07m | Avg: 33m 37s | Max: 34m 09s
      🟩 GCC10              Pass: 100%/2   | Total:  1h 03m | Avg: 31m 39s | Max: 32m 05s
      🟩 GCC11              Pass: 100%/2   | Total:  1h 11m | Avg: 35m 31s | Max: 35m 50s
      🟩 GCC12              Pass: 100%/2   | Total:  1h 05m | Avg: 32m 38s | Max: 33m 24s
      🟩 GCC13              Pass: 100%/8   | Total:  3h 03m | Avg: 22m 58s | Max: 37m 39s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 59m | Avg: 59m 42s | Max:  1h 01m | Hits: 177%/3688  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 09m | Avg:  1h 04m | Max:  1h 08m | Hits: 177%/3688  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 59m | Avg: 59m 52s | Max:  1h 00m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  8h 17m | Avg: 29m 17s | Max: 35m 40s
      🟩 GCC                Pass: 100%/19  | Total:  9h 08m | Avg: 28m 52s | Max: 37m 39s
      🟩 MSVC               Pass: 100%/4   | Total:  4h 09m | Avg:  1h 02m | Max:  1h 08m | Hits: 177%/7376  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 59m | Avg: 59m 52s | Max:  1h 00m
    🟩 gpu
      🟩 v100               Pass: 100%/42  | Total: 23h 35m | Avg: 33m 42s | Max:  1h 08m | Hits: 177%/7376  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total: 22h 42m | Avg: 36m 49s | Max:  1h 08m | Hits: 177%/7376  
      🟩 TestCPU            Pass: 100%/2   | Total: 15m 30s | Avg:  7m 45s | Max:  8m 10s
      🟩 TestGPU            Pass: 100%/3   | Total: 37m 44s | Avg: 12m 34s | Max: 13m 43s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total: 20m 15s | Avg: 20m 15s | Max: 20m 15s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 12h 44m | Avg: 38m 13s | Max:  1h 01m | Hits: 177%/5532  
      🟩 20                 Pass: 100%/20  | Total: 10h 13m | Avg: 30m 40s | Max:  1h 08m | Hits: 177%/1844  
    
  • 🟩 cudax: Pass: 100%/20 | Total: 2h 02m | Avg: 6m 08s | Max: 18m 30s | Hits: 383%/522

    🟩 cpu
      🟩 amd64              Pass: 100%/16  | Total:  1h 48m | Avg:  6m 48s | Max: 18m 30s | Hits: 383%/522   
      🟩 arm64              Pass: 100%/4   | Total: 14m 00s | Avg:  3m 30s | Max:  3m 39s
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total:  9m 59s | Avg:  9m 59s | Max:  9m 59s | Hits: 383%/261   
      🟩 12.5               Pass: 100%/2   | Total: 12m 25s | Avg:  6m 12s | Max:  6m 22s
      🟩 12.6               Pass: 100%/17  | Total:  1h 40m | Avg:  5m 54s | Max: 18m 30s | Hits: 383%/261   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total:  9m 59s | Avg:  9m 59s | Max:  9m 59s | Hits: 383%/261   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 12m 25s | Avg:  6m 12s | Max:  6m 22s
      🟩 nvcc12.6           Pass: 100%/17  | Total:  1h 40m | Avg:  5m 54s | Max: 18m 30s | Hits: 383%/261   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/20  | Total:  2h 02m | Avg:  6m 08s | Max: 18m 30s | Hits: 383%/522   
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  3m 56s | Avg:  3m 56s | Max:  3m 56s
      🟩 Clang15            Pass: 100%/1   | Total:  4m 05s | Avg:  4m 05s | Max:  4m 05s
      🟩 Clang16            Pass: 100%/1   | Total:  4m 09s | Avg:  4m 09s | Max:  4m 09s
      🟩 Clang17            Pass: 100%/1   | Total:  3m 46s | Avg:  3m 46s | Max:  3m 46s
      🟩 Clang18            Pass: 100%/4   | Total: 29m 20s | Avg:  7m 20s | Max: 18m 30s
      🟩 GCC10              Pass: 100%/1   | Total:  3m 47s | Avg:  3m 47s | Max:  3m 47s
      🟩 GCC11              Pass: 100%/1   | Total:  3m 49s | Avg:  3m 49s | Max:  3m 49s
      🟩 GCC12              Pass: 100%/2   | Total: 21m 01s | Avg: 10m 30s | Max: 16m 45s
      🟩 GCC13              Pass: 100%/4   | Total: 13m 49s | Avg:  3m 27s | Max:  3m 39s
      🟩 MSVC14.36          Pass: 100%/1   | Total:  9m 59s | Avg:  9m 59s | Max:  9m 59s | Hits: 383%/261   
      🟩 MSVC14.39          Pass: 100%/1   | Total: 12m 47s | Avg: 12m 47s | Max: 12m 47s | Hits: 383%/261   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 12m 25s | Avg:  6m 12s | Max:  6m 22s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total: 45m 16s | Avg:  5m 39s | Max: 18m 30s
      🟩 GCC                Pass: 100%/8   | Total: 42m 26s | Avg:  5m 18s | Max: 16m 45s
      🟩 MSVC               Pass: 100%/2   | Total: 22m 46s | Avg: 11m 23s | Max: 12m 47s | Hits: 383%/522   
      🟩 NVHPC              Pass: 100%/2   | Total: 12m 25s | Avg:  6m 12s | Max:  6m 22s
    🟩 gpu
      🟩 v100               Pass: 100%/20  | Total:  2h 02m | Avg:  6m 08s | Max: 18m 30s | Hits: 383%/522   
    🟩 jobs
      🟩 Build              Pass: 100%/18  | Total:  1h 27m | Avg:  4m 52s | Max: 12m 47s | Hits: 383%/522   
      🟩 Test               Pass: 100%/2   | Total: 35m 15s | Avg: 17m 37s | Max: 18m 30s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  3m 27s | Avg:  3m 27s | Max:  3m 27s
      🟩 90a                Pass: 100%/1   | Total:  3m 13s | Avg:  3m 13s | Max:  3m 13s
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 16m 48s | Avg:  4m 12s | Max:  6m 22s
      🟩 20                 Pass: 100%/16  | Total:  1h 46m | Avg:  6m 37s | Max: 18m 30s | Hits: 383%/522   
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 9m 29s | Avg: 4m 44s | Max: 7m 14s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  9m 29s | Avg:  4m 44s | Max:  7m 14s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total:  9m 29s | Avg:  4m 44s | Max:  7m 14s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total:  9m 29s | Avg:  4m 44s | Max:  7m 14s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  9m 29s | Avg:  4m 44s | Max:  7m 14s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  9m 29s | Avg:  4m 44s | Max:  7m 14s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  9m 29s | Avg:  4m 44s | Max:  7m 14s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total:  9m 29s | Avg:  4m 44s | Max:  7m 14s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 15s | Avg:  2m 15s | Max:  2m 15s
      🟩 Test               Pass: 100%/1   | Total:  7m 14s | Avg:  7m 14s | Max:  7m 14s
    
  • 🟩 python: Pass: 100%/1 | Total: 43m 44s | Avg: 43m 44s | Max: 43m 44s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 43m 44s | Avg: 43m 44s | Max: 43m 44s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 43m 44s | Avg: 43m 44s | Max: 43m 44s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 43m 44s | Avg: 43m 44s | Max: 43m 44s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 43m 44s | Avg: 43m 44s | Max: 43m 44s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 43m 44s | Avg: 43m 44s | Max: 43m 44s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 43m 44s | Avg: 43m 44s | Max: 43m 44s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 43m 44s | Avg: 43m 44s | Max: 43m 44s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 43m 44s | Avg: 43m 44s | Max: 43m 44s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
+/- libcu++
CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 152)

# Runner
110 linux-amd64-cpu16
17 linux-amd64-gpu-v100-latest-1
14 windows-amd64-cpu16
10 linux-arm64-cpu16
1 linux-amd64-gpu-h100-latest-1-testing

@bernhardmgruber bernhardmgruber merged commit e08bda4 into NVIDIA:main Jan 28, 2025
173 of 176 checks passed
Contributor

Git push to origin failed for branch/2.8.x with exitcode 128

@bernhardmgruber bernhardmgruber deleted the sync_ptx branch January 29, 2025 07:50
bernhardmgruber added a commit to bernhardmgruber/cccl that referenced this pull request Jan 29, 2025
davebayer pushed a commit to davebayer/cccl that referenced this pull request Jan 29, 2025
miscco pushed a commit that referenced this pull request Jan 31, 2025
* Sync ptx_dot_variants.h with libcuda-ptx (#3564)

* Update ptx_isa.h to include 8.6 and 8.7 (#3563)

* PTX: Update generated files with Blackwell instructions (#3568)

* ptx: Update existing instructions
* ptx: Add new instructions
* Fix returning error out values
See:
- https://gitlab-master.nvidia.com/CCCL/libcuda-ptx/-/merge_requests/74
- https://gitlab-master.nvidia.com/CCCL/libcuda-ptx/-/merge_requests/73
* ptx: Fix out var declaration
See  https://gitlab-master.nvidia.com/CCCL/libcuda-ptx/-/merge_requests/75
* mbarrier.{test,try}_wait: Fix test. Wrong files were included.
* docs: Fix special registers include
* Allow non-included documentation pages
* Workaround NVRTC

Co-authored-by: Allard Hendriksen <[email protected]>

* PTX: Remove internal instructions (#3583)

* barrier.cluster.aligned: Remove
This is not supposed to be exposed in CCCL.

* elect.sync: Remove
Not ready for inclusion yet. This needs to handle the optional extra
output mask as well.

* mapa: Remove
This has compiler bugs. We should use intrinsics instead.

Co-authored-by: Allard Hendriksen <[email protected]>

* PTX: Update existing instructions (#3584)

* mbarrier.expect_tx: Add missing source and test
It was already documented(!)

* cp.async.bulk.tensor: Add .{gather,scatter}4
* fence: Add .sync_restrict, .proxy.async.sync_restrict

Co-authored-by: Allard Hendriksen <[email protected]>

* PTX: Add clusterlaunchcontrol (#3589)

Co-authored-by: Allard Hendriksen <[email protected]>

* PTX: Add cp.async.mbarrier.arrive{.noinc} (#3602)

Co-authored-by: Allard Hendriksen <[email protected]>

* PTX: Add multimem instructions (#3603)

* Add multimem.ld_reduce
* Add multimem.red
* Add multimem.st

Co-authored-by: Allard Hendriksen <[email protected]>

* PTX: Add st.bulk (#3604)

Co-authored-by: Allard Hendriksen <[email protected]>

* PTX: Add tcgen05 instructions (#3607)

* ptx: Add tcgen05.alloc

* ptx: Add tcgen05.commit

* ptx: Add tcgen05.cp

* ptx: Add tcgen05.fence

* ptx: Add tcgen05.ld

* ptx: Add tcgen05.mma

* ptx: Add tcgen05.mma.ws

* ptx: Add tcgen05.shift

* ptx: Add tcgen05.st

* ptx: Add tcgen05.wait

* fix docs

---------

Co-authored-by: Allard Hendriksen <[email protected]>

---------

Co-authored-by: Allard Hendriksen <[email protected]>