-
Notifications
You must be signed in to change notification settings - Fork 188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable including atomics and friends in TUs that do not support them. #1736
base: main
Are you sure you want to change the base?
Conversation
🟩 CI Results [ Failed: 0 | Passed: 302 | Total: 302 ]
|
# | Runner |
---|---|
232 | linux-amd64-cpu16 |
28 | linux-amd64-gpu-v100-latest-1 |
24 | linux-arm64-cpu16 |
18 | windows-amd64-cpu16 |
👃 Inspect Changes
Modifications in project?
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
CUB | |
Thrust | |
CUDA Experimental |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot, that has been a long desired feature.
That said, we are missing
- Enabling all tests that are currently guarded by
pre-sm-70
- Adding a test set that ensures that we do fail when passed a lower architecture
(return __atomic_load_n_cuda(__a->get(), static_cast<__memory_order_underlying_t>(__order), _Sco{});), | ||
NV_IS_DEVICE, | ||
(__atomic_is_not_supported_pre_sm_60();), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might need to make this a template that returns __atomic_underlying_t<_Sto>
because otherwise we might have no return in that function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or:
(__atomic_is_not_supported_pre_sm_60();), | |
(__atomic_is_not_supported_pre_sm_60(); return {};), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment applies to a lot more places below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we just should mark the undefined function [[noinline]]
, that should kill any potential for stupid warnings.
I dont think so. With us moving towards linker errors we should be able to use NV_IF_TARGET to test this within the current framework |
Agreed with @miscco's review. |
Sure, you'd just need to rewrite all of the current |
It depends a bit on the tests, but we have moved towards defining a single test function anyway, which makes this rather easy to do |
@@ -64,11 +64,17 @@ struct __atomic_storage | |||
} | |||
}; | |||
|
|||
#if defined(_CCCL_CUDA_COMPILER) | |||
extern "C" _CCCL_DEVICE void __atomic_is_not_supported_pre_sm_60(); | |||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#endif | |
#endif // _CCCL_CUDA_COMPILER |
(return __atomic_load_n_cuda(__a->get(), static_cast<__memory_order_underlying_t>(__order), _Sco{});), | ||
NV_IS_DEVICE, | ||
(__atomic_is_not_supported_pre_sm_60();), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment applies to a lot more places below.
(return __atomic_load_n_cuda(__a->get(), static_cast<__memory_order_underlying_t>(__order), _Sco{});), | ||
NV_IS_DEVICE, | ||
(__atomic_is_not_supported_pre_sm_60();), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we just should mark the undefined function [[noinline]]
, that should kill any potential for stupid warnings.
@@ -45,7 +45,7 @@ function(libcudacxx_add_public_header_test header) | |||
target_compile_options(headertest_${header_name} PRIVATE ${headertest_warning_levels_device}) | |||
|
|||
# Ensure that if this is an atomic header, we only include the right architectures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is stale with these changes.
🟨 CI finished in 4h 11m: Pass: 99%/417 | Total: 3d 05h | Avg: 11m 11s | Max: 1h 00m | Hits: 94%/524401
|
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
CUB | |
Thrust | |
CUDA Experimental | |
pycuda |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
+/- | pycuda |
🏃 Runner counts (total jobs: 417)
# | Runner |
---|---|
305 | linux-amd64-cpu16 |
61 | linux-amd64-gpu-v100-latest-1 |
28 | linux-arm64-cpu16 |
23 | windows-amd64-cpu16 |
When can we expect this to be merged? |
/ok to test |
🟨 CI finished in 7h 05m: Pass: 99%/417 | Total: 2d 09h | Avg: 8m 14s | Max: 38m 15s | Hits: 97%/526130
|
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
CUB | |
Thrust | |
CUDA Experimental | |
pycuda |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
+/- | pycuda |
🏃 Runner counts (total jobs: 417)
# | Runner |
---|---|
305 | linux-amd64-cpu16 |
61 | linux-amd64-gpu-v100-latest-1 |
28 | linux-arm64-cpu16 |
23 | windows-amd64-cpu16 |
🟩 CI finished in 3d 00h: Pass: 100%/417 | Total: 2d 09h | Avg: 8m 12s | Max: 38m 15s | Hits: 97%/526135
|
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
CUB | |
Thrust | |
CUDA Experimental | |
pycuda |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
+/- | pycuda |
🏃 Runner counts (total jobs: 417)
# | Runner |
---|---|
305 | linux-amd64-cpu16 |
61 | linux-amd64-gpu-v100-latest-1 |
28 | linux-arm64-cpu16 |
23 | windows-amd64-cpu16 |
There are still issues with the solution we've come up with here. It's possible to defer the linker error to runtime when the TU is compiled from PTX to SASS during JIT or with |
Description
This removes most
#error
protections from atomics headers. This allows them to be included in TUs where they previously would force a failure if the__CUDA_ARCH__
list included an architecture that was not supported.Link time failures are emitted when compiling for architectures where the API is not supported.
For example:
results in:
closes: #1083
Checklist