Support FP16 traits on CTK 12.0 #3535
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

/ok to test
```diff
@@ -53,7 +53,7 @@ _CCCL_INLINE_VAR constexpr bool __is_extended_floating_point_v
 #  endif // !_CCCL_NO_INLINE_VARIABLES
 #endif // !_CCCL_NO_VARIABLE_TEMPLATES

-#if defined(_LIBCUDACXX_HAS_NVFP16)
+#if defined(_CCCL_HAS_FP16)
```
This will not work, because above we guard with `_LIBCUDACXX_HAS_NVFP16`.

Please also change the include guard.
> This will not work, because above we guard with `_LIBCUDACXX_HAS_NVFP16`.
>
> Please also change the include guard.

Can you elaborate? The include guard `_LIBCUDACXX___TYPE_TRAITS_IS_EXTENDED_FLOATING_POINT_H` looks good to me.
Oh, I meant: please also change the guard for the `<cuda_fp16.h>` include.
Force-pushed from 8653d48 to 6cade3d
/ok to test
Force-pushed from 6cade3d to d2f5a82
/ok to test
🟨 CI finished in 5h 49m: Pass: 86%/153 | Total: 2d 00h | Avg: 18m 53s | Max: 1h 15m | Hits: 414%/8457

| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |

Modifications in project or dependencies?

| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |

🏃 Runner counts (total jobs: 153)

| # | Runner |
|---|---|
| 110 | linux-amd64-cpu16 |
| 17 | linux-amd64-gpu-v100-latest-1 |
| 15 | windows-amd64-cpu16 |
| 10 | linux-arm64-cpu16 |
| 1 | linux-amd64-gpu-h100-latest-1-testing |
Force-pushed from d2f5a82 to 3aa0c00
/ok to test
🟨 CI finished in 1h 57m: Pass: 93%/153 | Total: 1d 16h | Avg: 15m 52s | Max: 1h 17m | Hits: 82%/11450

| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |

Modifications in project or dependencies?

| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |

🏃 Runner counts (total jobs: 153)

| # | Runner |
|---|---|
| 110 | linux-amd64-cpu16 |
| 17 | linux-amd64-gpu-v100-latest-1 |
| 15 | windows-amd64-cpu16 |
| 10 | linux-arm64-cpu16 |
| 1 | linux-amd64-gpu-h100-latest-1-testing |
Force-pushed from 3aa0c00 to 4aaedf9
/ok to test
🟨 CI finished in 1d 01h: Pass: 99%/152 | Total: 1d 13h | Avg: 14m 39s | Max: 1h 15m | Hits: 436%/21515

| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |

Modifications in project or dependencies?

| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |

🏃 Runner counts (total jobs: 152)

| # | Runner |
|---|---|
| 110 | linux-amd64-cpu16 |
| 17 | linux-amd64-gpu-v100-latest-1 |
| 14 | windows-amd64-cpu16 |
| 10 | linux-arm64-cpu16 |
| 1 | linux-amd64-gpu-h100-latest-1-testing |
Force-pushed from 4aaedf9 to 86457ca
🟨 CI finished in 2h 57m: Pass: 99%/152 | Total: 1d 13h | Avg: 14m 44s | Max: 1h 16m | Hits: 398%/21523

| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |

Modifications in project or dependencies?

| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |

🏃 Runner counts (total jobs: 152)

| # | Runner |
|---|---|
| 110 | linux-amd64-cpu16 |
| 17 | linux-amd64-gpu-v100-latest-1 |
| 14 | windows-amd64-cpu16 |
| 10 | linux-arm64-cpu16 |
| 1 | linux-amd64-gpu-h100-latest-1 |
🟩 CI finished in 6h 38m: Pass: 100%/152 | Total: 1d 13h | Avg: 14m 59s | Max: 1h 16m | Hits: 398%/21523

| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |

Modifications in project or dependencies?

| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |

🏃 Runner counts (total jobs: 152)

| # | Runner |
|---|---|
| 110 | linux-amd64-cpu16 |
| 17 | linux-amd64-gpu-v100-latest-1 |
| 14 | windows-amd64-cpu16 |
| 10 | linux-arm64-cpu16 |
| 1 | linux-amd64-gpu-h100-latest-1 |
Backport to `branch/2.8.x` failed. Please cherry-pick the changes locally:

```shell
git fetch origin branch/2.8.x
git worktree add -d .worktree/backport-3535-to-branch/2.8.x origin/branch/2.8.x
cd .worktree/backport-3535-to-branch/2.8.x
git checkout -b backport-3535-to-branch/2.8.x
ancref=$(git merge-base 09b12009d906bdb69f9da60de5196991d0610f9e 86457ca76fdf502d4932d372e6cd49ca751e575f)
git cherry-pick -x $ancref..86457ca76fdf502d4932d372e6cd49ca751e575f
```
* Support FP16 traits on CTK 12.0
* Only enable constexpr limits when supported
* Support float_eq on CTK < 12.2
libcu++ uses the macros `_CCCL_HAS_NV[FP|BF]16` to indicate the general availability of FP16 types (the CTK headers are present), and `_LIBCUDACXX_HAS_NV[FP|BF]16` to indicate that those types are actually supported in various places. This leads to problematic results in downstream code.

For example, in the CUB test, `_CCCL_HAS_NVFP16` is always defined for the supported CTKs, but `_LIBCUDACXX_HAS_NVFP16` only on CTK >= 12.2. Therefore, `cuda::is_floating_point<__half>` is `false` for CTK < 12.2 and `true` afterwards. However, any code relying on `_CCCL_HAS_NVFP16` for `__half` support will likely run into surprising results.

This PR enables all FP16 traits already when `_CCCL_HAS_NVFP16` is defined.