Implement bit sized int types #3956
base: main
/ok to test |
🟨 CI finished in 1h 29m: Pass: 29%/158 | Total: 2d 06h | Avg: 20m 30s | Max: 1h 19m | Hits: 30%/53485
| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |

Modifications in project or dependencies?

| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |

🏃 Runner counts (total jobs: 158)
# | Runner |
---|---|
111 | linux-amd64-cpu16 |
15 | windows-amd64-cpu16 |
10 | linux-arm64-cpu16 |
8 | linux-amd64-gpu-rtx2080-latest-1 |
6 | linux-amd64-gpu-rtxa6000-latest-1 |
5 | linux-amd64-gpu-h100-latest-1 |
3 | linux-amd64-gpu-rtx4090-latest-1 |
|
I love to see |
also, please note that |
GCC <10 does not support |
/ok to test |
🟩 CI finished in 1h 45m: Pass: 100%/158 | Total: 3d 19h | Avg: 34m 38s | Max: 1h 19m | Hits: 35%/249051
/ok to test |
🟨 CI finished in 1h 06m: Pass: 98%/158 | Total: 1d 06h | Avg: 11m 36s | Max: 53m 05s | Hits: 77%/249031
asm("mov.b64 {%0, %1}, %2;" : "=r"(__hi), "=r"(__lo) : "l"(__val));
asm("prmt.b32 %0, %0, 0, 0x0123;" : "+r"(__hi));
asm("prmt.b32 %0, %0, 0, 0x0123;" : "+r"(__lo));
_CCCL_NODISCARD _CCCL_HIDE_FROM_ABI _CCCL_DEVICE uint16_t __byteswap_impl_device(uint16_t __val) noexcept
What about using `if constexpr` instead of function overloads?
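One way to read this suggestion, as a minimal sketch only: a single function template that branches on `sizeof(T)` with `if constexpr`, instead of one overload per integer width. The name `example_byteswap` is hypothetical and not taken from the PR, and the body is plain C++17 host code rather than the builtin or PTX paths the PR uses.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical helper, not part of the PR: one template that branches on the
// operand size at compile time instead of one overload per integer width.
template <class T>
constexpr T example_byteswap(T val) noexcept
{
  static_assert(std::is_unsigned<T>::value, "sketch handles unsigned integers only");
  if constexpr (sizeof(T) == 1)
  {
    return val; // a single byte has nothing to swap
  }
  else if constexpr (sizeof(T) == 2)
  {
    return static_cast<T>((val << 8) | (val >> 8));
  }
  else if constexpr (sizeof(T) == 4)
  {
    return ((val & 0x000000FFu) << 24) | ((val & 0x0000FF00u) << 8)
         | ((val & 0x00FF0000u) >> 8) | ((val & 0xFF000000u) >> 24);
  }
  else
  {
    static_assert(sizeof(T) == 8, "unsupported width in this sketch");
    // Swap each 32-bit half, then exchange the halves.
    const auto lo = example_byteswap(static_cast<std::uint32_t>(val));
    const auto hi = example_byteswap(static_cast<std::uint32_t>(val >> 32));
    return (static_cast<T>(lo) << 32) | static_cast<T>(hi);
  }
}

// The function is constexpr, so it can be checked at compile time.
static_assert(example_byteswap<std::uint16_t>(0x1234) == 0x3412, "16-bit swap");
static_assert(example_byteswap<std::uint32_t>(0x12345678u) == 0x78563412u, "32-bit swap");
```

The trade-off is that every width lives in one body, so per-width attributes or target-specific guards have to move inside the branches.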
}
return __impl_recursive<uint16_t>(__val);
#endif // !_CCCL_BUILTIN_BSWAP32
# if __cccl_ptx_isa >= 200
I'm starting to think that the `__cccl_ptx_isa` guard is not needed even here.
# if _CCCL_COMPILER(MSVC)
NV_IF_TARGET(NV_IS_HOST, return _byteswap_ulong(__val);)
Suggested change:
- NV_IF_TARGET(NV_IS_HOST, return _byteswap_ulong(__val);)
+ NV_IF_TARGET(NV_IS_HOST, return ::_byteswap_ulong(__val);)
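The review does not state a reason for the leading `::`. One common motivation, assumed here, is that a qualified call always resolves at global scope and cannot be captured by a same-named declaration in an enclosing namespace. A portable sketch with hypothetical names (`example_swap`, `example_lib`) standing in for the MSVC intrinsic:

```cpp
// Hypothetical stand-in for a global intrinsic such as _byteswap_ulong.
inline unsigned example_swap(unsigned) { return 1u; }

namespace example_lib
{
// A same-named declaration inside the library namespace hides the global one
// for unqualified lookup.
inline unsigned example_swap(unsigned) { return 2u; }

// Unqualified call: lookup stops at example_lib::example_swap.
inline unsigned call_unqualified(unsigned v) { return example_swap(v); }

// Qualified call: always the function declared at global scope.
inline unsigned call_qualified(unsigned v) { return ::example_swap(v); }
} // namespace example_lib
```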
}
return __result;
__result <<= __shift;
__result |= (__val >> (__i * __shift)) & static_cast<_Tp>(_CUDA_VSTD::numeric_limits<uint8_t>::max());
Suggested change:
- __result |= (__val >> (__i * __shift)) & static_cast<_Tp>(_CUDA_VSTD::numeric_limits<uint8_t>::max());
+ __result |= (__val >> (__i * __shift)) & _Tp{numeric_limits<uint8_t>::max()};
Or:
- __result |= (__val >> (__i * __shift)) & static_cast<_Tp>(_CUDA_VSTD::numeric_limits<uint8_t>::max());
+ __result |= (__val >> (__i * __shift)) & static_cast<_Tp>(~_Tp{0});
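A caveat worth checking before picking between the two spellings: they are not interchangeable. `_Tp{numeric_limits<uint8_t>::max()}` is `0xFF` widened to `_Tp`, whereas `~_Tp{0}` sets every bit of `_Tp`, so the two masks agree only when `_Tp` is one byte wide. A small sketch, using `std::` in place of `_CUDA_VSTD::` and hypothetical helper names:

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helpers for comparing the two mask spellings from the review.
template <class T>
constexpr T low_byte_mask() noexcept
{
  return T{std::numeric_limits<std::uint8_t>::max()}; // 0xFF widened to T
}

template <class T>
constexpr T all_ones_mask() noexcept
{
  return static_cast<T>(~T{0}); // every bit of T set
}

static_assert(low_byte_mask<std::uint32_t>() == 0xFFu, "selects only the lowest byte");
static_assert(all_ones_mask<std::uint32_t>() == 0xFFFFFFFFu, "keeps all bits, so the & is a no-op");
static_assert(low_byte_mask<std::uint8_t>() == all_ones_mask<std::uint8_t>(), "identical only for 8-bit T");
```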
constexpr auto __shift = numeric_limits<uint8_t>::digits;

_Tp __result{};
for (size_t __i{}; __i < sizeof(__val); ++__i)
We would need a portable `#pragma unroll` here.
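A rough sketch of what such a portable unroll hint could look like around the byte loop. The macro name `EXAMPLE_PRAGMA_UNROLL`, the compiler checks, and the unroll factor are assumptions for illustration; the library may already provide its own macro for this, and the loop below is a host-only reconstruction of the fragment under review, not the PR's code.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical portability macro (not from the PR): expand to whichever unroll
// pragma the compiler understands, or to nothing.
#if defined(__CUDACC__) || defined(__clang__)
#  define EXAMPLE_PRAGMA_UNROLL _Pragma("unroll")
#elif defined(__GNUC__) && __GNUC__ >= 8
#  define EXAMPLE_PRAGMA_UNROLL _Pragma("GCC unroll 16")
#else
#  define EXAMPLE_PRAGMA_UNROLL
#endif

template <class T>
T example_byteswap_loop(T val) noexcept
{
  constexpr auto shift = std::numeric_limits<std::uint8_t>::digits; // 8 bits per byte
  T result{};
  EXAMPLE_PRAGMA_UNROLL
  for (std::size_t i = 0; i < sizeof(val); ++i)
  {
    // Make room for the next byte, then append byte i of the input.
    result = static_cast<T>(result << shift);
    result = static_cast<T>(result | ((val >> (i * shift)) & T{0xFF}));
  }
  return result;
}
```

The trip count is `sizeof(val)`, a compile-time constant, so the pragma is only a hint; optimizers often unroll such loops on their own.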
__always_false trait
__uint_t in byteswap and fix unchecked ptx ISA and device's SM version
cuda/std/__type_traits/make_32_64_or_128_bit.h module