Forward declare half types in cuda::ptx #2981

Merged
bernhardmgruber merged 1 commit into NVIDIA:main on Nov 28, 2024

Conversation

ahendriksen
Contributor

Description

closes #2933

Forward declare half types to avoid including the (expensive) fp16 headers.
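
As a minimal sketch of the general technique (not the actual change in this PR): a forward declaration lets a header mention a type by pointer or reference without pulling in the header that defines it. The names `demo_half` and `raw_bits` below are hypothetical stand-ins chosen for illustration; they are not identifiers from cuda::ptx or the fp16 headers.

```cpp
#include <cstdint>

// Forward declaration only: no heavy header is included at this point.
// `demo_half` is a hypothetical stand-in for a type such as __half.
struct demo_half;

// An incomplete type is sufficient to declare functions that take it
// by pointer or reference.
std::uint16_t raw_bits(const demo_half& h);

// --- In real code, everything below would live in the expensive header
// --- and only be seen by translation units that actually need it.
struct demo_half {
  std::uint16_t bits;
};

std::uint16_t raw_bits(const demo_half& h) {
  // The definition needs the complete type because it reads a member.
  return h.bits;
}
```

Only code that needs the size or the members of the type has to see the full definition; declarations alone parse without it, which is presumably where the compile-time saving comes from.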

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.


copy-pr-bot bot commented Nov 28, 2024

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@ahendriksen ahendriksen marked this pull request as ready for review November 28, 2024 09:04
@ahendriksen ahendriksen requested review from a team as code owners November 28, 2024 09:04
@bernhardmgruber
Contributor

bernhardmgruber commented Nov 28, 2024

That's great work! Compiling:

#include <cuda/ptx>
int main() {}

with
nvcc -fdevice-time-trace trace -Icccl/cub -Icccl/thrust -Icccl/libcudacxx/include ./ptx.cu

now shows this trace:
[image: device time trace for ptx.cu]

ptx.h takes 12ms to parse. Great work!

Contributor

🟩 CI finished in 1h 46m: Pass: 100%/396 | Total: 7d 15h | Avg: 27m 43s | Max: 1h 13m | Hits: 81%/22090
  • 🟩 libcudacxx: Pass: 100%/118 | Total: 1d 00h | Avg: 12m 42s | Max: 34m 08s | Hits: 97%/9546

    🟩 cpu
      🟩 amd64              Pass: 100%/110 | Total: 23h 21m | Avg: 12m 44s | Max: 34m 08s | Hits:  97%/9546  
      🟩 arm64              Pass: 100%/8   | Total:  1h 37m | Avg: 12m 12s | Max: 25m 20s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  2h 48m | Avg: 11m 15s | Max: 24m 27s | Hits:  97%/2199  
      🟩 11.8               Pass: 100%/3   | Total:  1h 16m | Avg: 25m 34s | Max: 34m 08s
      🟩 12.5               Pass: 100%/4   | Total: 46m 01s | Avg: 11m 30s | Max: 20m 17s
      🟩 12.6               Pass: 100%/96  | Total: 20h 07m | Avg: 12m 34s | Max: 30m 02s | Hits:  97%/7347  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/12  | Total:  2h 24m | Avg: 12m 03s | Max: 19m 45s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  2h 48m | Avg: 11m 15s | Max: 24m 27s | Hits:  97%/2199  
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 16m | Avg: 25m 34s | Max: 34m 08s
      🟩 nvcc12.5           Pass: 100%/4   | Total: 46m 01s | Avg: 11m 30s | Max: 20m 17s
      🟩 nvcc12.6           Pass: 100%/84  | Total: 17h 43m | Avg: 12m 39s | Max: 30m 02s | Hits:  97%/7347  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/12  | Total:  2h 24m | Avg: 12m 03s | Max: 19m 45s
      🟩 nvcc               Pass: 100%/106 | Total: 22h 34m | Avg: 12m 46s | Max: 34m 08s | Hits:  97%/9546  
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  1h 39m | Avg: 16m 33s | Max: 28m 00s
      🟩 Clang10            Pass: 100%/3   | Total: 37m 09s | Avg: 12m 23s | Max: 18m 31s
      🟩 Clang11            Pass: 100%/4   | Total:  1h 01m | Avg: 15m 24s | Max: 28m 23s
      🟩 Clang12            Pass: 100%/4   | Total: 54m 58s | Avg: 13m 44s | Max: 25m 55s
      🟩 Clang13            Pass: 100%/4   | Total:  1h 13m | Avg: 18m 22s | Max: 26m 37s
      🟩 Clang14            Pass: 100%/4   | Total: 16m 42s | Avg:  4m 10s | Max:  4m 21s
      🟩 Clang15            Pass: 100%/4   | Total: 32m 58s | Avg:  8m 14s | Max: 17m 37s
      🟩 Clang16            Pass: 100%/4   | Total: 16m 54s | Avg:  4m 13s | Max:  4m 40s
      🟩 Clang17            Pass: 100%/4   | Total:  1h 11m | Avg: 17m 47s | Max: 27m 53s
      🟩 Clang18            Pass: 100%/18  | Total:  3h 15m | Avg: 10m 52s | Max: 19m 45s
      🟩 GCC6               Pass: 100%/2   | Total: 19m 22s | Avg:  9m 41s | Max: 16m 33s
      🟩 GCC7               Pass: 100%/6   | Total: 29m 12s | Avg:  4m 52s | Max: 12m 36s
      🟩 GCC8               Pass: 100%/6   | Total:  1h 02m | Avg: 10m 22s | Max: 24m 27s
      🟩 GCC9               Pass: 100%/6   | Total:  1h 13m | Avg: 12m 10s | Max: 25m 58s
      🟩 GCC10              Pass: 100%/4   | Total: 38m 23s | Avg:  9m 35s | Max: 26m 13s
      🟩 GCC11              Pass: 100%/7   | Total:  2h 04m | Avg: 17m 46s | Max: 34m 08s
      🟩 GCC12              Pass: 100%/4   | Total:  1h 17m | Avg: 19m 29s | Max: 27m 19s
      🟩 GCC13              Pass: 100%/17  | Total:  3h 54m | Avg: 13m 47s | Max: 27m 21s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 16m | Avg: 25m 28s | Max: 30m 02s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 18m 56s | Avg: 18m 56s | Max: 18m 56s | Hits:  97%/2199  
      🟩 MSVC14.29          Pass: 100%/2   | Total: 25m 05s | Avg: 12m 32s | Max: 12m 48s | Hits:  97%/4743  
      🟩 MSVC14.39          Pass: 100%/1   | Total: 13m 48s | Avg: 13m 48s | Max: 13m 48s | Hits:  97%/2604  
      🟩 NVHPC24.7          Pass: 100%/4   | Total: 46m 01s | Avg: 11m 30s | Max: 20m 17s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/55  | Total: 11h 00m | Avg: 12m 00s | Max: 28m 23s
      🟩 GCC                Pass: 100%/52  | Total: 10h 59m | Avg: 12m 40s | Max: 34m 08s
      🟩 Intel              Pass: 100%/3   | Total:  1h 16m | Avg: 25m 28s | Max: 30m 02s
      🟩 MSVC               Pass: 100%/4   | Total: 57m 49s | Avg: 14m 27s | Max: 18m 56s | Hits:  97%/9546  
      🟩 NVHPC              Pass: 100%/4   | Total: 46m 01s | Avg: 11m 30s | Max: 20m 17s
    🟩 gpu
      🟩 v100               Pass: 100%/118 | Total:  1d 00h | Avg: 12m 42s | Max: 34m 08s | Hits:  97%/9546  
    🟩 jobs
      🟩 Build              Pass: 100%/110 | Total: 22h 31m | Avg: 12m 16s | Max: 34m 08s | Hits:  97%/9546  
      🟩 NVRTC              Pass: 100%/4   | Total:  1h 28m | Avg: 22m 02s | Max: 27m 21s
      🟩 Test               Pass: 100%/3   | Total: 57m 53s | Avg: 19m 17s | Max: 22m 02s
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 15s | Avg:  2m 15s | Max:  2m 15s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 16m | Avg: 25m 34s | Max: 34m 08s
      🟩 90                 Pass: 100%/4   | Total: 40m 25s | Avg: 10m 06s | Max: 11m 38s
      🟩 90a                Pass: 100%/8   | Total:  1h 06m | Avg:  8m 16s | Max: 11m 59s
    🟩 std
      🟩 11                 Pass: 100%/32  | Total:  5h 13m | Avg:  9m 47s | Max: 24m 27s
      🟩 14                 Pass: 100%/32  | Total:  6h 03m | Avg: 11m 22s | Max: 25m 50s | Hits:  97%/4492  
      🟩 17                 Pass: 100%/30  | Total:  7h 51m | Avg: 15m 42s | Max: 34m 08s | Hits:  97%/2450  
      🟩 20                 Pass: 100%/23  | Total:  5h 48m | Avg: 15m 09s | Max: 28m 23s | Hits:  97%/2604  
    
  • 🟩 thrust: Pass: 100%/111 | Total: 2d 11h | Avg: 31m 59s | Max: 1h 00m | Hits: 70%/9260

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 41m 28s | Avg: 20m 44s | Max: 30m 11s
    🟩 cpu
      🟩 amd64              Pass: 100%/103 | Total:  2d 07h | Avg: 32m 03s | Max:  1h 00m | Hits:  70%/9260  
      🟩 arm64              Pass: 100%/8   | Total:  4h 08m | Avg: 31m 02s | Max: 38m 34s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  7h 54m | Avg: 31m 36s | Max: 53m 56s | Hits:  63%/1852  
      🟩 11.8               Pass: 100%/3   | Total:  2h 01m | Avg: 40m 29s | Max: 44m 02s
      🟩 12.5               Pass: 100%/4   | Total:  3h 24m | Avg: 51m 14s | Max: 55m 01s
      🟩 12.6               Pass: 100%/89  | Total:  1d 21h | Avg: 30m 53s | Max:  1h 00m | Hits:  72%/7408  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/4   | Total:  1h 45m | Avg: 26m 27s | Max: 30m 58s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  7h 54m | Avg: 31m 36s | Max: 53m 56s | Hits:  63%/1852  
      🟩 nvcc11.8           Pass: 100%/3   | Total:  2h 01m | Avg: 40m 29s | Max: 44m 02s
      🟩 nvcc12.5           Pass: 100%/4   | Total:  3h 24m | Avg: 51m 14s | Max: 55m 01s
      🟩 nvcc12.6           Pass: 100%/85  | Total:  1d 20h | Avg: 31m 06s | Max:  1h 00m | Hits:  72%/7408  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/4   | Total:  1h 45m | Avg: 26m 27s | Max: 30m 58s
      🟩 nvcc               Pass: 100%/107 | Total:  2d 09h | Avg: 32m 11s | Max:  1h 00m | Hits:  70%/9260  
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  3h 00m | Avg: 30m 03s | Max: 33m 38s
      🟩 Clang10            Pass: 100%/3   | Total:  2h 00m | Avg: 40m 10s | Max: 46m 55s
      🟩 Clang11            Pass: 100%/4   | Total:  2h 03m | Avg: 30m 50s | Max: 32m 49s
      🟩 Clang12            Pass: 100%/4   | Total:  2h 17m | Avg: 34m 23s | Max: 41m 47s
      🟩 Clang13            Pass: 100%/4   | Total:  2h 08m | Avg: 32m 13s | Max: 38m 13s
      🟩 Clang14            Pass: 100%/4   | Total:  2h 07m | Avg: 31m 48s | Max: 36m 44s
      🟩 Clang15            Pass: 100%/4   | Total:  2h 05m | Avg: 31m 28s | Max: 33m 59s
      🟩 Clang16            Pass: 100%/4   | Total:  2h 06m | Avg: 31m 33s | Max: 33m 29s
      🟩 Clang17            Pass: 100%/4   | Total:  2h 07m | Avg: 31m 50s | Max: 34m 57s
      🟩 Clang18            Pass: 100%/11  | Total:  4h 32m | Avg: 24m 45s | Max: 32m 43s
      🟩 GCC6               Pass: 100%/2   | Total: 55m 51s | Avg: 27m 55s | Max: 29m 12s
      🟩 GCC7               Pass: 100%/6   | Total:  3h 11m | Avg: 31m 53s | Max: 35m 17s
      🟩 GCC8               Pass: 100%/6   | Total:  2h 58m | Avg: 29m 47s | Max: 33m 51s
      🟩 GCC9               Pass: 100%/6   | Total:  3h 11m | Avg: 31m 51s | Max: 34m 57s
      🟩 GCC10              Pass: 100%/4   | Total:  2h 09m | Avg: 32m 23s | Max: 34m 58s
      🟩 GCC11              Pass: 100%/7   | Total:  4h 17m | Avg: 36m 46s | Max: 44m 02s
      🟩 GCC12              Pass: 100%/4   | Total:  2h 24m | Avg: 36m 13s | Max: 47m 20s
      🟩 GCC13              Pass: 100%/16  | Total:  5h 57m | Avg: 22m 18s | Max: 38m 47s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 07m | Avg: 42m 21s | Max: 46m 02s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 53m 56s | Avg: 53m 56s | Max: 53m 56s | Hits:  63%/1852  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 48m | Avg: 54m 12s | Max: 55m 15s | Hits:  63%/3704  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 20m | Avg: 40m 25s | Max:  1h 00m | Hits:  81%/3704  
      🟩 NVHPC24.7          Pass: 100%/4   | Total:  3h 24m | Avg: 51m 14s | Max: 55m 01s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/48  | Total:  1d 00h | Avg: 30m 37s | Max: 46m 55s
      🟩 GCC                Pass: 100%/51  | Total:  1d 01h | Avg: 29m 31s | Max: 47m 20s
      🟩 Intel              Pass: 100%/3   | Total:  2h 07m | Avg: 42m 21s | Max: 46m 02s
      🟩 MSVC               Pass: 100%/5   | Total:  4h 03m | Avg: 48m 38s | Max:  1h 00m | Hits:  70%/9260  
      🟩 NVHPC              Pass: 100%/4   | Total:  3h 24m | Avg: 51m 14s | Max: 55m 01s
    🟩 gpu
      🟩 v100               Pass: 100%/111 | Total:  2d 11h | Avg: 31m 59s | Max:  1h 00m | Hits:  70%/9260  
    🟩 jobs
      🟩 Build              Pass: 100%/103 | Total:  2d 09h | Avg: 33m 35s | Max:  1h 00m | Hits:  63%/7408  
      🟩 TestCPU            Pass: 100%/4   | Total: 44m 26s | Avg: 11m 06s | Max: 20m 35s | Hits:  99%/1852  
      🟩 TestGPU            Pass: 100%/4   | Total: 46m 47s | Avg: 11m 41s | Max: 12m 17s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  2h 01m | Avg: 40m 29s | Max: 44m 02s
      🟩 90a                Pass: 100%/4   | Total:  1h 14m | Avg: 18m 43s | Max: 21m 21s
    🟩 std
      🟩 11                 Pass: 100%/30  | Total: 13h 13m | Avg: 26m 26s | Max: 45m 07s
      🟩 14                 Pass: 100%/29  | Total: 17h 03m | Avg: 35m 18s | Max: 53m 56s | Hits:  63%/3704  
      🟩 17                 Pass: 100%/27  | Total: 16h 06m | Avg: 35m 47s | Max: 55m 15s | Hits:  63%/1852  
      🟩 20                 Pass: 100%/23  | Total: 12h 05m | Avg: 31m 33s | Max:  1h 00m | Hits:  81%/3704  
    
  • 🟩 cub: Pass: 100%/110 | Total: 3d 21h | Avg: 51m 07s | Max: 1h 13m | Hits: 65%/3028

    🟩 cpu
      🟩 amd64              Pass: 100%/102 | Total:  3d 14h | Avg: 50m 50s | Max:  1h 13m | Hits:  65%/3028  
      🟩 arm64              Pass: 100%/8   | Total:  7h 17m | Avg: 54m 44s | Max: 56m 51s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total: 11h 48m | Avg: 47m 15s | Max: 58m 38s | Hits:  65%/757   
      🟩 11.8               Pass: 100%/3   | Total:  3h 32m | Avg:  1h 10m | Max:  1h 13m
      🟩 12.5               Pass: 100%/4   | Total:  4h 00m | Avg:  1h 00m | Max:  1h 01m
      🟩 12.6               Pass: 100%/88  | Total:  3d 02h | Avg: 50m 41s | Max:  1h 12m | Hits:  65%/2271  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/4   | Total:  3h 44m | Avg: 56m 07s | Max: 58m 22s
      🟩 nvcc11.1           Pass: 100%/15  | Total: 11h 48m | Avg: 47m 15s | Max: 58m 38s | Hits:  65%/757   
      🟩 nvcc11.8           Pass: 100%/3   | Total:  3h 32m | Avg:  1h 10m | Max:  1h 13m
      🟩 nvcc12.5           Pass: 100%/4   | Total:  4h 00m | Avg:  1h 00m | Max:  1h 01m
      🟩 nvcc12.6           Pass: 100%/84  | Total:  2d 22h | Avg: 50m 26s | Max:  1h 12m | Hits:  65%/2271  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/4   | Total:  3h 44m | Avg: 56m 07s | Max: 58m 22s
      🟩 nvcc               Pass: 100%/106 | Total:  3d 17h | Avg: 50m 56s | Max:  1h 13m | Hits:  65%/3028  
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  5h 18m | Avg: 53m 02s | Max:  1h 09m
      🟩 Clang10            Pass: 100%/3   | Total:  2h 40m | Avg: 53m 28s | Max: 57m 39s
      🟩 Clang11            Pass: 100%/4   | Total:  3h 37m | Avg: 54m 22s | Max: 58m 05s
      🟩 Clang12            Pass: 100%/4   | Total:  3h 43m | Avg: 55m 58s | Max: 59m 14s
      🟩 Clang13            Pass: 100%/4   | Total:  3h 35m | Avg: 53m 55s | Max: 58m 42s
      🟩 Clang14            Pass: 100%/4   | Total:  3h 40m | Avg: 55m 00s | Max: 57m 27s
      🟩 Clang15            Pass: 100%/4   | Total:  3h 30m | Avg: 52m 42s | Max: 58m 04s
      🟩 Clang16            Pass: 100%/4   | Total:  3h 41m | Avg: 55m 25s | Max: 59m 47s
      🟩 Clang17            Pass: 100%/4   | Total:  3h 26m | Avg: 51m 44s | Max: 53m 25s
      🟩 Clang18            Pass: 100%/11  | Total:  8h 59m | Avg: 49m 03s | Max: 58m 22s
      🟩 GCC6               Pass: 100%/2   | Total:  1h 30m | Avg: 45m 08s | Max: 46m 29s
      🟩 GCC7               Pass: 100%/6   | Total:  4h 46m | Avg: 47m 46s | Max: 51m 14s
      🟩 GCC8               Pass: 100%/6   | Total:  4h 54m | Avg: 49m 06s | Max: 53m 17s
      🟩 GCC9               Pass: 100%/6   | Total:  5h 02m | Avg: 50m 23s | Max: 53m 51s
      🟩 GCC10              Pass: 100%/4   | Total:  3h 38m | Avg: 54m 38s | Max: 59m 31s
      🟩 GCC11              Pass: 100%/7   | Total:  7h 24m | Avg:  1h 03m | Max:  1h 13m
      🟩 GCC12              Pass: 100%/4   | Total:  3h 36m | Avg: 54m 04s | Max: 57m 06s
      🟩 GCC13              Pass: 100%/16  | Total:  9h 43m | Avg: 36m 27s | Max: 58m 28s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 51m | Avg: 57m 00s | Max: 58m 50s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 58m 38s | Avg: 58m 38s | Max: 58m 38s | Hits:  65%/757   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 55m | Avg: 57m 36s | Max: 57m 50s | Hits:  65%/1514  
      🟩 MSVC14.39          Pass: 100%/1   | Total:  1h 07m | Avg:  1h 07m | Max:  1h 07m | Hits:  65%/757   
      🟩 NVHPC24.7          Pass: 100%/4   | Total:  4h 00m | Avg:  1h 00m | Max:  1h 01m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/48  | Total:  1d 18h | Avg: 52m 48s | Max:  1h 09m
      🟩 GCC                Pass: 100%/51  | Total:  1d 16h | Avg: 47m 46s | Max:  1h 13m
      🟩 Intel              Pass: 100%/3   | Total:  2h 51m | Avg: 57m 00s | Max: 58m 50s
      🟩 MSVC               Pass: 100%/4   | Total:  4h 01m | Avg:  1h 00m | Max:  1h 07m | Hits:  65%/3028  
      🟩 NVHPC              Pass: 100%/4   | Total:  4h 00m | Avg:  1h 00m | Max:  1h 01m
    🟩 gpu
      🟩 v100               Pass: 100%/110 | Total:  3d 21h | Avg: 51m 07s | Max:  1h 13m | Hits:  65%/3028  
    🟩 jobs
      🟩 Build              Pass: 100%/102 | Total:  3d 18h | Avg: 53m 14s | Max:  1h 13m | Hits:  65%/3028  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 18m 52s | Avg: 18m 52s | Max: 18m 52s
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 46s | Avg: 16m 46s | Max: 16m 46s
      🟩 HostLaunch         Pass: 100%/3   | Total: 55m 38s | Avg: 18m 32s | Max: 20m 26s
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 42m | Avg: 34m 01s | Max: 49m 34s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  3h 32m | Avg:  1h 10m | Max:  1h 13m
      🟩 90a                Pass: 100%/4   | Total:  1h 39m | Avg: 24m 52s | Max: 32m 08s
    🟩 std
      🟩 11                 Pass: 100%/30  | Total:  1d 01h | Avg: 50m 34s | Max:  1h 13m
      🟩 14                 Pass: 100%/29  | Total:  1d 01h | Avg: 53m 17s | Max:  1h 09m | Hits:  65%/1514  
      🟩 17                 Pass: 100%/27  | Total: 23h 44m | Avg: 52m 45s | Max:  1h 09m | Hits:  65%/757   
      🟩 20                 Pass: 100%/24  | Total: 18h 56m | Avg: 47m 20s | Max:  1h 12m | Hits:  65%/757   
    
  • 🟩 cudax: Pass: 100%/54 | Total: 4h 42m | Avg: 5m 13s | Max: 22m 03s | Hits: 88%/256

    🟩 cpu
      🟩 amd64              Pass: 100%/50  | Total:  4h 27m | Avg:  5m 20s | Max: 22m 03s | Hits:  88%/256   
      🟩 arm64              Pass: 100%/4   | Total: 15m 10s | Avg:  3m 47s | Max:  4m 07s
    🟩 ctk
      🟩 12.0               Pass: 100%/19  | Total:  1h 32m | Avg:  4m 52s | Max: 15m 26s | Hits:  88%/128   
      🟩 12.5               Pass: 100%/2   | Total: 12m 52s | Avg:  6m 26s | Max:  6m 30s
      🟩 12.6               Pass: 100%/33  | Total:  2h 56m | Avg:  5m 21s | Max: 22m 03s | Hits:  88%/128   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/19  | Total:  1h 32m | Avg:  4m 52s | Max: 15m 26s | Hits:  88%/128   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 12m 52s | Avg:  6m 26s | Max:  6m 30s
      🟩 nvcc12.6           Pass: 100%/33  | Total:  2h 56m | Avg:  5m 21s | Max: 22m 03s | Hits:  88%/128   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/54  | Total:  4h 42m | Avg:  5m 13s | Max: 22m 03s | Hits:  88%/256   
    🟩 cxx
      🟩 Clang9             Pass: 100%/2   | Total:  7m 18s | Avg:  3m 39s | Max:  4m 00s
      🟩 Clang10            Pass: 100%/2   | Total:  8m 11s | Avg:  4m 05s | Max:  4m 45s
      🟩 Clang11            Pass: 100%/4   | Total: 13m 56s | Avg:  3m 29s | Max:  3m 42s
      🟩 Clang12            Pass: 100%/4   | Total: 13m 31s | Avg:  3m 22s | Max:  3m 37s
      🟩 Clang13            Pass: 100%/4   | Total: 14m 05s | Avg:  3m 31s | Max:  3m 51s
      🟩 Clang14            Pass: 100%/4   | Total: 26m 22s | Avg:  6m 35s | Max: 15m 26s
      🟩 Clang15            Pass: 100%/2   | Total:  7m 31s | Avg:  3m 45s | Max:  3m 49s
      🟩 Clang16            Pass: 100%/4   | Total: 15m 51s | Avg:  3m 57s | Max:  4m 07s
      🟩 Clang17            Pass: 100%/2   | Total:  7m 27s | Avg:  3m 43s | Max:  3m 51s
      🟩 Clang18            Pass: 100%/2   | Total: 18m 26s | Avg:  9m 13s | Max: 14m 54s
      🟩 GCC9               Pass: 100%/2   | Total:  7m 25s | Avg:  3m 42s | Max:  4m 08s
      🟩 GCC10              Pass: 100%/4   | Total: 14m 27s | Avg:  3m 36s | Max:  3m 53s
      🟩 GCC11              Pass: 100%/4   | Total: 13m 38s | Avg:  3m 24s | Max:  3m 33s
      🟩 GCC12              Pass: 100%/7   | Total:  1h 13m | Avg: 10m 30s | Max: 22m 03s
      🟩 GCC13              Pass: 100%/3   | Total: 10m 02s | Avg:  3m 20s | Max:  3m 33s
      🟩 MSVC14.36          Pass: 100%/1   | Total:  8m 45s | Avg:  8m 45s | Max:  8m 45s | Hits:  88%/128   
      🟩 MSVC14.39          Pass: 100%/1   | Total:  9m 08s | Avg:  9m 08s | Max:  9m 08s | Hits:  88%/128   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 12m 52s | Avg:  6m 26s | Max:  6m 30s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/30  | Total:  2h 12m | Avg:  4m 25s | Max: 15m 26s
      🟩 GCC                Pass: 100%/20  | Total:  1h 59m | Avg:  5m 57s | Max: 22m 03s
      🟩 MSVC               Pass: 100%/2   | Total: 17m 53s | Avg:  8m 56s | Max:  9m 08s | Hits:  88%/256   
      🟩 NVHPC              Pass: 100%/2   | Total: 12m 52s | Avg:  6m 26s | Max:  6m 30s
    🟩 gpu
      🟩 v100               Pass: 100%/54  | Total:  4h 42m | Avg:  5m 13s | Max: 22m 03s | Hits:  88%/256   
    🟩 jobs
      🟩 Build              Pass: 100%/49  | Total:  3h 13m | Avg:  3m 56s | Max:  9m 08s | Hits:  88%/256   
      🟩 Test               Pass: 100%/5   | Total:  1h 29m | Avg: 17m 49s | Max: 22m 03s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  2m 54s | Avg:  2m 54s | Max:  2m 54s
      🟩 90a                Pass: 100%/1   | Total:  3m 05s | Avg:  3m 05s | Max:  3m 05s
    🟩 std
      🟩 17                 Pass: 100%/29  | Total:  2h 17m | Avg:  4m 45s | Max: 21m 39s
      🟩 20                 Pass: 100%/25  | Total:  2h 24m | Avg:  5m 47s | Max: 22m 03s | Hits:  88%/256   
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 10m 23s | Avg: 5m 11s | Max: 8m 05s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 10m 23s | Avg:  5m 11s | Max:  8m 05s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total: 10m 23s | Avg:  5m 11s | Max:  8m 05s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total: 10m 23s | Avg:  5m 11s | Max:  8m 05s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 10m 23s | Avg:  5m 11s | Max:  8m 05s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 10m 23s | Avg:  5m 11s | Max:  8m 05s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 10m 23s | Avg:  5m 11s | Max:  8m 05s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total: 10m 23s | Avg:  5m 11s | Max:  8m 05s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 18s | Avg:  2m 18s | Max:  2m 18s
      🟩 Test               Pass: 100%/1   | Total:  8m 05s | Avg:  8m 05s | Max:  8m 05s
    
  • 🟩 python: Pass: 100%/1 | Total: 14m 43s | Avg: 14m 43s | Max: 14m 43s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 14m 43s | Avg: 14m 43s | Max: 14m 43s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 14m 43s | Avg: 14m 43s | Max: 14m 43s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 14m 43s | Avg: 14m 43s | Max: 14m 43s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 14m 43s | Avg: 14m 43s | Max: 14m 43s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 14m 43s | Avg: 14m 43s | Max: 14m 43s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 14m 43s | Avg: 14m 43s | Max: 14m 43s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 14m 43s | Avg: 14m 43s | Max: 14m 43s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 14m 43s | Avg: 14m 43s | Max: 14m 43s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
+/- libcu++
CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 396)

# Runner
327 linux-amd64-cpu16
28 linux-arm64-cpu16
26 linux-amd64-gpu-v100-latest-1
15 windows-amd64-cpu16

@bernhardmgruber bernhardmgruber merged commit af0a8bb into NVIDIA:main Nov 28, 2024
414 checks passed
@miscco
Contributor

miscco commented Nov 30, 2024

I am not a fan of this, because it might break existing code that checks whether those names are available. We should guard the forward declarations with the feature macros and only replace the include with the forward declaration.

@bernhardmgruber
Contributor

Can you elaborate on what you mean by "those names"? Are you referring to whether the types __half and __nv_bfloat16 are visible and could be used in SFINAE?

So we should guard the forward declarations with _LIBCUDACXX_HAS_NVFP16 etc.?
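
To make the SFINAE question concrete, here is a hedged sketch of one way user code could observe a forward declaration: a completeness check based on `sizeof`. The trait `is_complete` and the types `demo_half` and `demo_float` are hypothetical names for this example; whether any real code probes `__half` this way is an assumption, not something established in this thread.

```cpp
#include <type_traits>

// Hypothetical stand-in for a type that is only forward declared.
struct demo_half;

// Hypothetical stand-in for a type whose full definition is visible.
struct demo_float {
  float value;
};

// Primary template: chosen when sizeof(T) is ill-formed (incomplete type).
template <class T, class = void>
struct is_complete : std::false_type {};

// Partial specialization: chosen when sizeof(T) is well-formed.
template <class T>
struct is_complete<T, decltype(void(sizeof(T)))> : std::true_type {};

static_assert(!is_complete<demo_half>::value,
              "a forward-declared type is visible but incomplete");
static_assert(is_complete<demo_float>::value,
              "a fully defined type is complete");
```

The point of the sketch is that a forward-declared name is visible to name lookup while the type remains incomplete, so code that branches on either property can behave differently once a declaration appears. Note that a trait like this is fragile in practice: its answer for a given type is fixed at the first instantiation, even if the type is completed later in the translation unit.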

@miscco
Contributor

miscco commented Dec 2, 2024

Yes, unconditionally forward declaring names might break code. We should guard that with the feature macros.
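
A rough sketch of what such a guard could look like, assuming the macro `_LIBCUDACXX_HAS_NVFP16` named earlier in the thread and assuming `__half` is declared as a `struct`. The flag `sketch_half_forward_declared` is a hypothetical helper added only so the effect of the guard can be observed; it is not part of libcu++, and this is not the code of the follow-up PR.

```cpp
// Only introduce the name when the feature macro says fp16 support exists.
// In a translation unit that does not define the macro, __half is left
// undeclared, exactly as it was before the forward declaration was added.
#if defined(_LIBCUDACXX_HAS_NVFP16)
struct __half;  // assumption: matches the class-key of the real definition
constexpr bool sketch_half_forward_declared = true;   // hypothetical flag
#else
constexpr bool sketch_half_forward_declared = false;  // hypothetical flag
#endif
```

Compiled standalone, without libcu++ defining the macro, the `#else` branch is taken and no name is added to the global namespace.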

@bernhardmgruber
Contributor

PR up: #2998

davebayer pushed a commit to davebayer/cccl that referenced this pull request Dec 2, 2024
Development

Successfully merging this pull request may close these issues.

[BUG]: cuda::ptx takes long to compile
3 participants