PTX: Update generated files with Blackwell instructions #3568

Merged: 8 commits into NVIDIA:main on Jan 29, 2025

@bernhardmgruber (Contributor) commented Jan 28, 2025

This PR updates the generated files for the PTX support in libcu++. A non-generated test and a documentation file were adapted to account for some changes in file names.

copy-pr-bot (bot) commented Jan 28, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

@bernhardmgruber (Contributor Author) commented:

/ok to test

@bernhardmgruber changed the title from "ptx: Update existing instructions" to "ptx: Update generated files" Jan 29, 2025
@bernhardmgruber marked this pull request as ready for review January 29, 2025 09:38
@bernhardmgruber changed the title from "ptx: Update generated files" to "PTX: Update generated files" Jan 29, 2025
@bernhardmgruber (Contributor Author) commented Jan 29, 2025

Is there any chance we can temporarily suppress these docs build warnings:

Errors (from: '/home/runner/work/cccl/cccl/docs/_build/docs/sphinx_warnings.txt'):
/home/runner/work/cccl/cccl/docs/libcudacxx/ptx/instructions/generated/barrier_cluster_aligned.rst: WARNING: document isn't included in any toctree
/home/runner/work/cccl/cccl/docs/libcudacxx/ptx/instructions/generated/clusterlaunchcontrol.rst: WARNING: document isn't included in any toctree
/home/runner/work/cccl/cccl/docs/libcudacxx/ptx/instructions/generated/cp_async_bulk_tensor_gather_scatter.rst: WARNING: document isn't included in any toctree
/home/runner/work/cccl/cccl/docs/libcudacxx/ptx/instructions/generated/cp_async_mbarrier_arrive.rst: WARNING: document isn't included in any toctree
/home/runner/work/cccl/cccl/docs/libcudacxx/ptx/instructions/generated/cp_async_mbarrier_arrive_noinc.rst: WARNING: document isn't included in any toctree
...

The documentation for these instructions will be added in subsequent PRs.

@@ -54,7 +54,7 @@ api_output_directory = "api"
 use_fast_doxygen_conversion = true
 sphinx_generate_doxygen_groups = true
 sphinx_generate_doxygen_pages = true
-sphinx_exclude_patterns = []
+sphinx_exclude_patterns = ['ptx/instructions/generated']
@bernhardmgruber (Contributor Author):

For now, I am suppressing the warnings about .rst files that are not included in any toctree.

@bernhardmgruber (Contributor Author) commented:

I am getting failures from NVRTC:

  1: /home/coder/cccl/libcudacxx/test/libcudacxx/cuda/ptx/generated/cp_async_bulk_multicast.h(33): error: expected a ")"
  1:     NV_IF_TARGET(
  1:     ^
  1: 
  1: /home/coder/cccl/libcudacxx/test/libcudacxx/cuda/ptx/generated/cp_async_bulk_multicast.h(33): error: expected a ")"
  1:     NV_IF_TARGET(
  1:     ^
  1: 
  1: /home/coder/cccl/libcudacxx/test/libcudacxx/cuda/ptx/generated/cp_async_bulk_multicast.h(33): error: identifier "_NV_IF__NV_TARGET_BOOL_NV_HAS_FEATURE_SM_100a" is undefined
  1:     NV_IF_TARGET(
  1:     ^
  1: 
  1: /home/coder/cccl/libcudacxx/test/libcudacxx/cuda/ptx/generated/cp_async_bulk_multicast.h(33): error: expected an expression
  1:     NV_IF_TARGET(
  1:     ^

Comment on lines +15 to +34
#ifdef __CUDACC_RTC__
# ifndef NV_HAS_FEATURE_SM_100a
# define NV_HAS_FEATURE_SM_100a __NV_HAS_FEATURE_SM_100a
# if (defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 1000) && defined(__CUDA_ARCH_FEAT_SM100_ALL))
# define _NV_TARGET_BOOL___NV_HAS_FEATURE_SM_100a 1
# else
# define _NV_TARGET_BOOL___NV_HAS_FEATURE_SM_100a 0
# endif
# endif // NV_HAS_FEATURE_SM_100a

// Re-enable sm_101a support in nvcc.
# ifndef NV_HAS_FEATURE_SM_101a
# define NV_HAS_FEATURE_SM_101a __NV_HAS_FEATURE_SM_101a
# if (defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 1010) && defined(__CUDA_ARCH_FEAT_SM101_ALL))
# define _NV_TARGET_BOOL___NV_HAS_FEATURE_SM_101a 1
# else
# define _NV_TARGET_BOOL___NV_HAS_FEATURE_SM_101a 0
# endif
# endif // NV_HAS_FEATURE_SM_101a
#endif // __CUDACC_RTC__
@bernhardmgruber (Contributor Author):

Since NVRTC does not use our __target_macros header, I have to define some of its macros in the tests as a workaround.
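The NVRTC errors above come from the way NV_IF_TARGET resolves its query through token pasting. The following is a simplified, hypothetical re-creation of that mechanism in plain C++; all DEMO_* names are made up for illustration and are not the actual <nv/target> internals. It sketches why the query macro itself must be defined: if DEMO_HAS_FEATURE_SM_100a were not a macro, the pastes would produce DEMO_IF_DEMO_BOOL_DEMO_HAS_FEATURE_SM_100a, an undefined identifier, which mirrors the _NV_IF__NV_TARGET_BOOL_NV_HAS_FEATURE_SM_100a error that NVRTC reported.

```cpp
#include <cassert>

// Hypothetical, simplified re-creation of the NV_IF_TARGET dispatch.
// All DEMO_* names are illustrative; they are NOT the real <nv/target> macros.

// Two-level concatenation so that arguments are macro-expanded before pasting.
#define DEMO_CAT_IMPL(a, b) a##b
#define DEMO_CAT(a, b)      DEMO_CAT_IMPL(a, b)

// 1. The public query name expands to an internal token.
#define DEMO_HAS_FEATURE_SM_100a DEMO_INTERNAL_SM_100a

// 2. DEMO_BOOL_ + internal token names a 0/1 constant. As in the workaround,
//    an arch-specific ("a") feature needs both a minimum __CUDA_ARCH__
//    (1000 corresponds to sm_100) and the feature macro.
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 1000) && defined(__CUDA_ARCH_FEAT_SM100_ALL)
#  define DEMO_BOOL_DEMO_INTERNAL_SM_100a 1
#else
#  define DEMO_BOOL_DEMO_INTERNAL_SM_100a 0
#endif

// 3. DEMO_IF_ + 0/1 selects which branch survives preprocessing.
#define DEMO_IF_1(if_true, if_false) if_true
#define DEMO_IF_0(if_true, if_false) if_false
#define DEMO_IF_TARGET(query, if_true, if_false) \
  DEMO_CAT(DEMO_IF_, DEMO_CAT(DEMO_BOOL_, query))(if_true, if_false)

// If DEMO_HAS_FEATURE_SM_100a were not defined (as under NVRTC without the
// workaround), the pastes would yield DEMO_IF_DEMO_BOOL_DEMO_HAS_FEATURE_SM_100a,
// an undefined identifier, analogous to the reported NVRTC error.

// Returns 1 only when compiled for a device target with sm_100a features.
inline int takes_sm100a_path()
{
  return DEMO_IF_TARGET(DEMO_HAS_FEATURE_SM_100a, 1, 0);
}
```

Compiled as ordinary host C++, __CUDA_ARCH__ is undefined, so takes_sm100a_path() returns 0. The two-level DEMO_CAT is the usual idiom for forcing macro expansion of an argument before ## is applied, which is what lets the public query name resolve to its internal token before the 0/1 lookup.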

@bernhardmgruber enabled auto-merge (squash) January 29, 2025 18:36
@bernhardmgruber changed the title from "PTX: Update generated files" to "PTX: Update generated files with Blackwell instructions" Jan 29, 2025
Contributor

🟩 CI finished in 3h 05m: Pass: 100%/152 | Total: 1d 07h | Avg: 12m 22s | Max: 1h 09m | Hits: 449%/21523
  • 🟩 cub: Pass: 100%/44 | Total: 12h 28m | Avg: 17m 01s | Max: 1h 09m | Hits: 277%/3552

    🟩 cpu
      🟩 amd64              Pass: 100%/42  | Total: 12h 19m | Avg: 17m 35s | Max:  1h 09m | Hits: 277%/3552  
      🟩 arm64              Pass: 100%/2   | Total:  9m 49s | Avg:  4m 54s | Max:  4m 59s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 15m | Avg: 15m 07s | Max: 54m 27s | Hits: 279%/888   
      🟩 12.5               Pass: 100%/2   | Total:  2h 17m | Avg:  1h 08m | Max:  1h 09m
      🟩 12.6               Pass: 100%/37  | Total:  8h 56m | Avg: 14m 29s | Max:  1h 02m | Hits: 276%/2664  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  9m 01s | Avg:  4m 30s | Max:  4m 32s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 15m | Avg: 15m 07s | Max: 54m 27s | Hits: 279%/888   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 17m | Avg:  1h 08m | Max:  1h 09m
      🟩 nvcc12.6           Pass: 100%/35  | Total:  8h 47m | Avg: 15m 03s | Max:  1h 02m | Hits: 276%/2664  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 01s | Avg:  4m 30s | Max:  4m 32s
      🟩 nvcc               Pass: 100%/42  | Total: 12h 19m | Avg: 17m 37s | Max:  1h 09m | Hits: 277%/3552  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 21m 16s | Avg:  5m 19s | Max:  5m 34s
      🟩 Clang15            Pass: 100%/2   | Total: 11m 39s | Avg:  5m 49s | Max:  5m 55s
      🟩 Clang16            Pass: 100%/2   | Total: 11m 27s | Avg:  5m 43s | Max:  5m 46s
      🟩 Clang17            Pass: 100%/2   | Total: 11m 44s | Avg:  5m 52s | Max:  5m 55s
      🟩 Clang18            Pass: 100%/7   | Total:  1h 27m | Avg: 12m 26s | Max: 35m 36s
      🟩 GCC7               Pass: 100%/2   | Total: 11m 04s | Avg:  5m 32s | Max:  5m 45s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 45s | Avg:  5m 45s | Max:  5m 45s
      🟩 GCC9               Pass: 100%/2   | Total: 11m 02s | Avg:  5m 31s | Max:  5m 39s
      🟩 GCC10              Pass: 100%/2   | Total: 11m 41s | Avg:  5m 50s | Max:  5m 52s
      🟩 GCC11              Pass: 100%/2   | Total: 11m 54s | Avg:  5m 57s | Max:  6m 07s
      🟩 GCC12              Pass: 100%/4   | Total: 35m 58s | Avg:  8m 59s | Max: 19m 07s
      🟩 GCC13              Pass: 100%/8   | Total:  2h 23m | Avg: 17m 59s | Max: 38m 28s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 54m | Avg: 57m 06s | Max: 59m 46s | Hits: 287%/1776  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 02m | Hits: 266%/1776  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 17m | Avg:  1h 08m | Max:  1h 09m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  2h 23m | Avg:  8m 25s | Max: 35m 36s
      🟩 GCC                Pass: 100%/21  | Total:  3h 51m | Avg: 11m 01s | Max: 38m 28s
      🟩 MSVC               Pass: 100%/4   | Total:  3h 57m | Avg: 59m 18s | Max:  1h 02m | Hits: 277%/3552  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 17m | Avg:  1h 08m | Max:  1h 09m
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 23m 25s | Avg: 11m 42s | Max: 19m 07s
      🟩 v100               Pass: 100%/42  | Total: 12h 05m | Avg: 17m 16s | Max:  1h 09m | Hits: 277%/3552  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  9h 05m | Avg: 14m 43s | Max:  1h 09m | Hits: 277%/3552  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 26m 21s | Avg: 26m 21s | Max: 26m 21s
      🟩 GraphCapture       Pass: 100%/1   | Total: 20m 04s | Avg: 20m 04s | Max: 20m 04s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 23m | Avg: 27m 46s | Max: 37m 51s
      🟩 TestGPU            Pass: 100%/2   | Total:  1h 14m | Avg: 37m 02s | Max: 38m 28s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 23m 25s | Avg: 11m 42s | Max: 19m 07s
      🟩 90a                Pass: 100%/1   | Total:  4m 21s | Avg:  4m 21s | Max:  4m 21s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  5h 34m | Avg: 16m 43s | Max:  1h 09m | Hits: 296%/2664  
      🟩 20                 Pass: 100%/24  | Total:  6h 54m | Avg: 17m 15s | Max:  1h 07m | Hits: 220%/888   
    
  • 🟩 libcudacxx: Pass: 100%/43 | Total: 6h 49m | Avg: 9m 31s | Max: 28m 07s | Hits: 688%/10065

    🟩 cpu
      🟩 amd64              Pass: 100%/41  | Total:  6h 42m | Avg:  9m 48s | Max: 28m 07s | Hits: 688%/10065 
      🟩 arm64              Pass: 100%/2   | Total:  7m 11s | Avg:  3m 35s | Max:  3m 45s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 37m 30s | Avg:  7m 30s | Max: 22m 34s | Hits: 688%/2471  
      🟩 12.5               Pass: 100%/2   | Total: 18m 30s | Avg:  9m 15s | Max: 10m 01s
      🟩 12.6               Pass: 100%/36  | Total:  5h 53m | Avg:  9m 49s | Max: 28m 07s | Hits: 688%/7594  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/4   | Total:  1h 10m | Avg: 17m 43s | Max: 22m 44s
      🟩 nvcc12.0           Pass: 100%/5   | Total: 37m 30s | Avg:  7m 30s | Max: 22m 34s | Hits: 688%/2471  
      🟩 nvcc12.5           Pass: 100%/2   | Total: 18m 30s | Avg:  9m 15s | Max: 10m 01s
      🟩 nvcc12.6           Pass: 100%/32  | Total:  4h 42m | Avg:  8m 49s | Max: 28m 07s | Hits: 688%/7594  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/4   | Total:  1h 10m | Avg: 17m 43s | Max: 22m 44s
      🟩 nvcc               Pass: 100%/39  | Total:  5h 38m | Avg:  8m 40s | Max: 28m 07s | Hits: 688%/10065 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 17m 01s | Avg:  4m 15s | Max:  4m 33s
      🟩 Clang15            Pass: 100%/2   | Total:  9m 00s | Avg:  4m 30s | Max:  4m 36s
      🟩 Clang16            Pass: 100%/2   | Total:  9m 09s | Avg:  4m 34s | Max:  4m 46s
      🟩 Clang17            Pass: 100%/2   | Total:  8m 56s | Avg:  4m 28s | Max:  4m 35s
      🟩 Clang18            Pass: 100%/8   | Total:  1h 52m | Avg: 14m 00s | Max: 28m 03s
      🟩 GCC7               Pass: 100%/2   | Total:  6m 58s | Avg:  3m 29s | Max:  3m 29s
      🟩 GCC8               Pass: 100%/1   | Total:  3m 45s | Avg:  3m 45s | Max:  3m 45s
      🟩 GCC9               Pass: 100%/2   | Total:  7m 09s | Avg:  3m 34s | Max:  3m 48s
      🟩 GCC10              Pass: 100%/2   | Total:  8m 22s | Avg:  4m 11s | Max:  4m 20s
      🟩 GCC11              Pass: 100%/2   | Total:  8m 26s | Avg:  4m 13s | Max:  4m 23s
      🟩 GCC12              Pass: 100%/2   | Total:  8m 33s | Avg:  4m 16s | Max:  4m 29s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 28m | Avg: 11m 07s | Max: 27m 12s
      🟩 MSVC14.29          Pass: 100%/2   | Total: 47m 04s | Avg: 23m 32s | Max: 24m 30s | Hits: 688%/4952  
      🟩 MSVC14.39          Pass: 100%/2   | Total: 55m 35s | Avg: 27m 47s | Max: 28m 07s | Hits: 687%/5113  
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 18m 30s | Avg:  9m 15s | Max: 10m 01s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/18  | Total:  2h 36m | Avg:  8m 40s | Max: 28m 03s
      🟩 GCC                Pass: 100%/19  | Total:  2h 12m | Avg:  6m 57s | Max: 27m 12s
      🟩 MSVC               Pass: 100%/4   | Total:  1h 42m | Avg: 25m 39s | Max: 28m 07s | Hits: 688%/10065 
      🟩 NVHPC              Pass: 100%/2   | Total: 18m 30s | Avg:  9m 15s | Max: 10m 01s
    🟩 gpu
      🟩 v100               Pass: 100%/43  | Total:  6h 49m | Avg:  9m 31s | Max: 28m 07s | Hits: 688%/10065 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total:  5h 07m | Avg:  8m 05s | Max: 28m 07s | Hits: 688%/10065 
      🟩 NVRTC              Pass: 100%/2   | Total: 45m 02s | Avg: 22m 31s | Max: 22m 45s
      🟩 Test               Pass: 100%/2   | Total: 55m 15s | Avg: 27m 37s | Max: 28m 03s
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  1m 55s | Avg:  1m 55s | Max:  1m 55s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total: 14m 08s | Avg: 14m 08s | Max: 14m 08s
      🟩 90a                Pass: 100%/2   | Total: 17m 59s | Avg:  8m 59s | Max: 14m 10s
    🟩 std
      🟩 17                 Pass: 100%/21  | Total:  3h 07m | Avg:  8m 56s | Max: 27m 28s | Hits: 688%/7433  
      🟩 20                 Pass: 100%/21  | Total:  3h 39m | Avg: 10m 28s | Max: 28m 07s | Hits: 686%/2632  
    
  • 🟩 thrust: Pass: 100%/42 | Total: 9h 08m | Avg: 13m 02s | Max: 58m 32s | Hits: 210%/7384

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 21m 37s | Avg: 10m 48s | Max: 15m 27s
    🟩 cpu
      🟩 amd64              Pass: 100%/40  | Total:  8h 58m | Avg: 13m 27s | Max: 58m 32s | Hits: 210%/7384  
      🟩 arm64              Pass: 100%/2   | Total:  9m 31s | Avg:  4m 45s | Max:  5m 00s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 01m | Avg: 12m 12s | Max: 40m 26s | Hits: 200%/1846  
      🟩 12.5               Pass: 100%/2   | Total:  1h 53m | Avg: 56m 38s | Max: 58m 32s
      🟩 12.6               Pass: 100%/35  | Total:  6h 13m | Avg: 10m 40s | Max: 52m 10s | Hits: 213%/5538  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 36s | Avg:  5m 18s | Max:  5m 22s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 01m | Avg: 12m 12s | Max: 40m 26s | Hits: 200%/1846  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 53m | Avg: 56m 38s | Max: 58m 32s
      🟩 nvcc12.6           Pass: 100%/33  | Total:  6h 03m | Avg: 11m 00s | Max: 52m 10s | Hits: 213%/5538  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 36s | Avg:  5m 18s | Max:  5m 22s
      🟩 nvcc               Pass: 100%/40  | Total:  8h 57m | Avg: 13m 26s | Max: 58m 32s | Hits: 210%/7384  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 21m 49s | Avg:  5m 27s | Max:  5m 51s
      🟩 Clang15            Pass: 100%/2   | Total: 11m 03s | Avg:  5m 31s | Max:  5m 41s
      🟩 Clang16            Pass: 100%/2   | Total: 11m 33s | Avg:  5m 46s | Max:  5m 51s
      🟩 Clang17            Pass: 100%/2   | Total: 11m 31s | Avg:  5m 45s | Max:  5m 46s
      🟩 Clang18            Pass: 100%/7   | Total: 53m 50s | Avg:  7m 41s | Max: 19m 40s
      🟩 GCC7               Pass: 100%/2   | Total: 10m 37s | Avg:  5m 18s | Max:  5m 39s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 25s | Avg:  5m 25s | Max:  5m 25s
      🟩 GCC9               Pass: 100%/2   | Total: 10m 35s | Avg:  5m 17s | Max:  5m 29s
      🟩 GCC10              Pass: 100%/2   | Total: 11m 21s | Avg:  5m 40s | Max:  5m 50s
      🟩 GCC11              Pass: 100%/2   | Total: 11m 28s | Avg:  5m 44s | Max:  5m 48s
      🟩 GCC12              Pass: 100%/2   | Total: 11m 59s | Avg:  5m 59s | Max:  6m 07s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 08m | Avg:  8m 37s | Max: 16m 37s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 31m | Avg: 45m 59s | Max: 51m 32s | Hits: 221%/3692  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 42m | Avg: 51m 18s | Max: 52m 10s | Hits: 200%/3692  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 53m | Avg: 56m 38s | Max: 58m 32s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  1h 49m | Avg:  6m 27s | Max: 19m 40s
      🟩 GCC                Pass: 100%/19  | Total:  2h 10m | Avg:  6m 51s | Max: 16m 37s
      🟩 MSVC               Pass: 100%/4   | Total:  3h 14m | Avg: 48m 38s | Max: 52m 10s | Hits: 210%/7384  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 53m | Avg: 56m 38s | Max: 58m 32s
    🟩 gpu
      🟩 v100               Pass: 100%/42  | Total:  9h 08m | Avg: 13m 02s | Max: 58m 32s | Hits: 210%/7384  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  8h 00m | Avg: 12m 59s | Max: 58m 32s | Hits: 210%/7384  
      🟩 TestCPU            Pass: 100%/2   | Total: 15m 46s | Avg:  7m 53s | Max:  8m 17s
      🟩 TestGPU            Pass: 100%/3   | Total: 51m 44s | Avg: 17m 14s | Max: 19m 40s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total:  4m 42s | Avg:  4m 42s | Max:  4m 42s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  4h 50m | Avg: 14m 31s | Max: 58m 32s | Hits: 215%/5538  
      🟩 20                 Pass: 100%/20  | Total:  3h 55m | Avg: 11m 47s | Max: 54m 45s | Hits: 196%/1846  
    
  • 🟩 cudax: Pass: 100%/20 | Total: 1h 53m | Avg: 5m 41s | Max: 19m 13s | Hits: 384%/522

    🟩 cpu
      🟩 amd64              Pass: 100%/16  | Total:  1h 43m | Avg:  6m 28s | Max: 19m 13s | Hits: 384%/522   
      🟩 arm64              Pass: 100%/4   | Total: 10m 19s | Avg:  2m 34s | Max:  2m 37s
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total: 10m 13s | Avg: 10m 13s | Max: 10m 13s | Hits: 384%/261   
      🟩 12.5               Pass: 100%/2   | Total: 12m 20s | Avg:  6m 10s | Max:  6m 17s
      🟩 12.6               Pass: 100%/17  | Total:  1h 31m | Avg:  5m 22s | Max: 19m 13s | Hits: 384%/261   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total: 10m 13s | Avg: 10m 13s | Max: 10m 13s | Hits: 384%/261   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 12m 20s | Avg:  6m 10s | Max:  6m 17s
      🟩 nvcc12.6           Pass: 100%/17  | Total:  1h 31m | Avg:  5m 22s | Max: 19m 13s | Hits: 384%/261   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/20  | Total:  1h 53m | Avg:  5m 41s | Max: 19m 13s | Hits: 384%/522   
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  3m 12s | Avg:  3m 12s | Max:  3m 12s
      🟩 Clang15            Pass: 100%/1   | Total:  3m 18s | Avg:  3m 18s | Max:  3m 18s
      🟩 Clang16            Pass: 100%/1   | Total:  3m 27s | Avg:  3m 27s | Max:  3m 27s
      🟩 Clang17            Pass: 100%/1   | Total:  3m 06s | Avg:  3m 06s | Max:  3m 06s
      🟩 Clang18            Pass: 100%/4   | Total: 26m 36s | Avg:  6m 39s | Max: 18m 11s
      🟩 GCC10              Pass: 100%/1   | Total:  2m 59s | Avg:  2m 59s | Max:  2m 59s
      🟩 GCC11              Pass: 100%/1   | Total:  2m 56s | Avg:  2m 56s | Max:  2m 56s
      🟩 GCC12              Pass: 100%/2   | Total: 22m 44s | Avg: 11m 22s | Max: 19m 13s
      🟩 GCC13              Pass: 100%/4   | Total: 10m 46s | Avg:  2m 41s | Max:  2m 52s
      🟩 MSVC14.36          Pass: 100%/1   | Total: 10m 13s | Avg: 10m 13s | Max: 10m 13s | Hits: 384%/261   
      🟩 MSVC14.39          Pass: 100%/1   | Total: 12m 11s | Avg: 12m 11s | Max: 12m 11s | Hits: 384%/261   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 12m 20s | Avg:  6m 10s | Max:  6m 17s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total: 39m 39s | Avg:  4m 57s | Max: 18m 11s
      🟩 GCC                Pass: 100%/8   | Total: 39m 25s | Avg:  4m 55s | Max: 19m 13s
      🟩 MSVC               Pass: 100%/2   | Total: 22m 24s | Avg: 11m 12s | Max: 12m 11s | Hits: 384%/522   
      🟩 NVHPC              Pass: 100%/2   | Total: 12m 20s | Avg:  6m 10s | Max:  6m 17s
    🟩 gpu
      🟩 v100               Pass: 100%/20  | Total:  1h 53m | Avg:  5m 41s | Max: 19m 13s | Hits: 384%/522   
    🟩 jobs
      🟩 Build              Pass: 100%/18  | Total:  1h 16m | Avg:  4m 14s | Max: 12m 11s | Hits: 384%/522   
      🟩 Test               Pass: 100%/2   | Total: 37m 24s | Avg: 18m 42s | Max: 19m 13s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  2m 52s | Avg:  2m 52s | Max:  2m 52s
      🟩 90a                Pass: 100%/1   | Total:  2m 45s | Avg:  2m 45s | Max:  2m 45s
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 14m 15s | Avg:  3m 33s | Max:  6m 17s
      🟩 20                 Pass: 100%/16  | Total:  1h 39m | Avg:  6m 13s | Max: 19m 13s | Hits: 384%/522   
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 10m 28s | Avg: 5m 14s | Max: 8m 26s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 10m 28s | Avg:  5m 14s | Max:  8m 26s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total: 10m 28s | Avg:  5m 14s | Max:  8m 26s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total: 10m 28s | Avg:  5m 14s | Max:  8m 26s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 10m 28s | Avg:  5m 14s | Max:  8m 26s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 10m 28s | Avg:  5m 14s | Max:  8m 26s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 10m 28s | Avg:  5m 14s | Max:  8m 26s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total: 10m 28s | Avg:  5m 14s | Max:  8m 26s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 02s | Avg:  2m 02s | Max:  2m 02s
      🟩 Test               Pass: 100%/1   | Total:  8m 26s | Avg:  8m 26s | Max:  8m 26s
    
  • 🟩 python: Pass: 100%/1 | Total: 50m 12s | Avg: 50m 12s | Max: 50m 12s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 50m 12s | Avg: 50m 12s | Max: 50m 12s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 50m 12s | Avg: 50m 12s | Max: 50m 12s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 50m 12s | Avg: 50m 12s | Max: 50m 12s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 50m 12s | Avg: 50m 12s | Max: 50m 12s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 50m 12s | Avg: 50m 12s | Max: 50m 12s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 50m 12s | Avg: 50m 12s | Max: 50m 12s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 50m 12s | Avg: 50m 12s | Max: 50m 12s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 50m 12s | Avg: 50m 12s | Max: 50m 12s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
+/- libcu++
CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 152)

# Runner
110 linux-amd64-cpu16
17 linux-amd64-gpu-v100-latest-1
14 windows-amd64-cpu16
10 linux-arm64-cpu16
1 linux-amd64-gpu-h100-latest-1

@bernhardmgruber merged commit d21e0c9 into NVIDIA:main Jan 29, 2025
164 of 167 checks passed
Contributor

Backport failed for branch/2.8.x, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally.

git fetch origin branch/2.8.x
git worktree add -d .worktree/backport-3568-to-branch/2.8.x origin/branch/2.8.x
cd .worktree/backport-3568-to-branch/2.8.x
git checkout -b backport-3568-to-branch/2.8.x
ancref=$(git merge-base d0f254490bad268887e33266dc64a0722318ef30 deebc024508aac9b20012ac1e972afa4437e92f5)
git cherry-pick -x $ancref..deebc024508aac9b20012ac1e972afa4437e92f5

@bernhardmgruber deleted the ptx_update branch January 29, 2025 22:06
bernhardmgruber added a commit that referenced this pull request Jan 31, 2025
* ptx: Update existing instructions
* ptx: Add new instructions
* Fix returning error out values
See:
- https://gitlab-master.nvidia.com/CCCL/libcuda-ptx/-/merge_requests/74
- https://gitlab-master.nvidia.com/CCCL/libcuda-ptx/-/merge_requests/73
* ptx: Fix out var declaration
See https://gitlab-master.nvidia.com/CCCL/libcuda-ptx/-/merge_requests/75
* mbarrier.{test,try}_wait: Fix test. Wrong files were included.
* docs: Fix special registers include
* Allow non-included documentation pages
* Workaround NVRTC

Co-authored-by: Allard Hendriksen <[email protected]>
Projects
Status: Done
3 participants