-
Notifications
You must be signed in to change notification settings - Fork 188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unify assert handling in cccl #2382
Conversation
649bf5d
to
694ac3a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work so far, thx! I see no changes to the build system tough. What are users expected to define now to get assertions? DEBUG
, _CCCL_ENABLE_DEBUG_MODE
, or _CCCL_ENABLE_ASSERTIONS
? And why do we need a distinction between a debug mode and assertions enabled?
{ | ||
#ifdef _LIBCUDACXX_ENABLE_DEBUG_MODE | ||
if (!__libcpp_is_constant_evaluated()) | ||
{ | ||
__get_db()->__iterator_copy(this, _CUDA_VSTD::addressof(__u)); | ||
} | ||
#endif | ||
} | ||
#ifdef _LIBCUDACXX_ENABLE_DEBUG_MODE | ||
_LIBCUDACXX_HIDE_FROM_ABI _CCCL_CONSTEXPR_CXX14 __wrap_iter(const __wrap_iter& __x) | ||
: __i_(__x.base()) | ||
{ | ||
if (!__libcpp_is_constant_evaluated()) | ||
{ | ||
__get_db()->__iterator_copy(this, _CUDA_VSTD::addressof(__x)); | ||
} | ||
} | ||
_LIBCUDACXX_HIDE_FROM_ABI _CCCL_CONSTEXPR_CXX14 __wrap_iter& operator=(const __wrap_iter& __x) | ||
{ | ||
if (this != _CUDA_VSTD::addressof(__x)) | ||
{ | ||
if (!__libcpp_is_constant_evaluated()) | ||
{ | ||
__get_db()->__iterator_copy(this, _CUDA_VSTD::addressof(__x)); | ||
} | ||
__i_ = __x.__i_; | ||
} | ||
return *this; | ||
} | ||
_LIBCUDACXX_HIDE_FROM_ABI _CCCL_CONSTEXPR_CXX20 ~__wrap_iter() | ||
{ | ||
if (!__libcpp_is_constant_evaluated()) | ||
{ | ||
__get_db()->__erase_i(this); | ||
} | ||
} | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Why remove all this? And why as part of this PR? It seems unrelated to the assert unification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats something i had in that branch lying around. We need to get rid of all the old debug handling
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For next time: please ship this is a separate PR.
@@ -39,6 +39,11 @@ else() # NOT LIBCUDACXX_TEST_WITH_NVRTC | |||
set(LIBCUDACXX_TEST_COMPILER_FLAGS "-DLIBCUDACXX_ENABLE_EXPERIMENTAL_MEMORY_RESOURCE") | |||
endif() | |||
|
|||
# enable exceptions and assertions in tests | |||
string(APPEND LIBCUDACXX_TEST_COMPILER_FLAGS | |||
" -DLIBCUDACXX_ENABLE_EXCEPTIONS" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enabling exceptions seems unrelated to this PR? Why was it added? Oversight?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to enable assertions globally, so we can just as well enable exceptions too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can imagine this introducing all kinds of new side effects and possible requiring additional fixes unrelated to assertions. Can we please do it in a separate PR?
would make sense to add an "optimization mode"? namely, mapping assertions to |
694ac3a
to
30abcd9
Compare
That is exceptionally difficult. On one hand, __builtin_assume is not supported, but more importantly, even clang does not enable it because there are performance implications |
@miscco can you update the PR description with a summary of the approach we're taking to unify things and what the default behavior will be? |
This is mindblowing. I always found user-provided assumptions very useful for optimization, but I understand that this doesn't apply to everything. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The big problem here is that __cccl/assert.h
doesn't handle nvc++ -stdpar
.
# define _CCCL_VERIFY(expression, message) \ | ||
_CCCL_DIAG_PUSH _CCCL_DIAG_SUPPRESS_ICC(4190) _CCCL_ASSERT_IMPL(expression, message) _CCCL_DIAG_POP | ||
#else // ^^^ _CCCL_COMPILER_ICC ^^^ / vvv !_CCCL_COMPILER_ICC vvv | ||
# define _CCCL_VERIFY(expression, message) _CCCL_ASSERT_IMPL(expression, message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If __CUDA_ARCH__
is defined but CCCL_ENABLE_DEVICE_ASSERTIONS
is not defined, then _CCCL_ASSERT_IMPL
will be defined as empty, so _CCCL_VERIFY(expression, message)
will expand to (expression, message)
. That will compile and behave correctly, but it's not what you want.
If neither __CUDA_ARCH__
nor CCCL_ENABLE_HOST_ASSERTIONS
are defined, then _CCCL_ASSERT_IMPL
is not defined at all, so _CCCL_VERIFY(expression, message)
will expand to _CCCL_ASSERT_IMPL(expression, message)
, which is a compilation error.
Since _CCCL_VERIFY
is always defined, _CCCL_ASSERT_IMPL
must always be defined. If it can't be defined as something meaningful, then it should be #define _CCCL_ASSERT_IMPL(expression, message) ((void)0)
#endif // !CCCL_ENABLE_DEVICE_ASSERTIONS | ||
|
||
//! Use internal nvcc implementation on device or the host library for clang-cuda | ||
#ifdef __CUDA_ARCH__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file, cuda/std/__cccl/assert.h
, ignores NVC++ entirely, and won't work correctly for nvc++ -stdpar=gpu
. The file assumes that host and device code are compiled separately, and that __CUDA_ARCH__
is defined when compiling device code and __CUDA_ARCH__
is not defined when compiling host code.
nvc++ -stdpar=gpu
compiles both host and device code in a single pass and does not define __CUDA_ARCH__
. It is probably necessary to give NVC++ special handling, separate from all the logic that is here. _CCCL_ASSERT_IMPL
would be defined as something that works in both host and device code with NVC++. (__assert_fail
might work in both host and device code with NVC++, but that should be verified.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reworked the approach to use the host standard library on host and only use the special fallback for nvcc on device.
The _CCCL_ASSERT_IMPL
now forwards to _NV_IF_TARGET
dce12bb
to
8e3ed7e
Compare
🟨 CI finished in 6h 25m: Pass: 93%/433 | Total: 3d 05h | Avg: 10m 48s | Max: 1h 30m | Hits: 99%/24636
|
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
Thrust | |
+/- | CUDA Experimental |
pycuda | |
CUDA C Core Library |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
+/- | pycuda |
+/- | CUDA C Core Library |
🏃 Runner counts (total jobs: 433)
# | Runner |
---|---|
320 | linux-amd64-cpu16 |
62 | linux-amd64-gpu-v100-latest-1 |
28 | linux-arm64-cpu16 |
23 | windows-amd64-cpu16 |
7689a19
to
4500e82
Compare
f4feb1a
to
0bf8c0e
Compare
🟨 CI finished in 8h 46m: Pass: 95%/433 | Total: 6d 21h | Avg: 22m 56s | Max: 1h 54m | Hits: 78%/24636
|
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
pycuda | |
CUDA C Core Library |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
+/- | pycuda |
+/- | CUDA C Core Library |
🏃 Runner counts (total jobs: 433)
# | Runner |
---|---|
320 | linux-amd64-cpu16 |
62 | linux-amd64-gpu-v100-latest-1 |
28 | linux-arm64-cpu16 |
23 | windows-amd64-cpu16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only looked at cuda/std/__cccl/assert.h
. It looks fine to me. Thank you for using NV_IF_TARGET
only for NVC++. I was going to suggest that.
232c40f
to
c0b019e
Compare
What is the default state for these flags and how does it interact with |
I need to write some documentation dont I The default is that those flags are not defined. We only define them when the user defines |
🟨 CI finished in 52m 10s: Pass: 97%/364 | Total: 1d 20h | Avg: 7m 17s | Max: 43m 21s | Hits: 77%/23013
|
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
Thrust | |
+/- | CUDA Experimental |
pycuda | |
CUDA C Core Library |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
+/- | pycuda |
+/- | CUDA C Core Library |
🏃 Runner counts (total jobs: 364)
# | Runner |
---|---|
297 | linux-amd64-cpu16 |
28 | linux-arm64-cpu16 |
24 | linux-amd64-gpu-v100-latest-1 |
15 | windows-amd64-cpu16 |
76b8117
to
fa5ebe7
Compare
🟨 CI finished in 2h 03m: Pass: 97%/364 | Total: 4d 00h | Avg: 15m 57s | Max: 1h 24m | Hits: 15%/25679
|
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
Thrust | |
+/- | CUDA Experimental |
pycuda | |
CUDA C Core Library |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
+/- | pycuda |
+/- | CUDA C Core Library |
🏃 Runner counts (total jobs: 364)
# | Runner |
---|---|
297 | linux-amd64-cpu16 |
28 | linux-arm64-cpu16 |
24 | linux-amd64-gpu-v100-latest-1 |
15 | windows-amd64-cpu16 |
🟨 CI finished in 1h 39m: Pass: 97%/364 | Total: 2d 08h | Avg: 9m 18s | Max: 1h 25m | Hits: 74%/25679
|
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
Thrust | |
+/- | CUDA Experimental |
pycuda | |
CUDA C Core Library |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
+/- | pycuda |
+/- | CUDA C Core Library |
🏃 Runner counts (total jobs: 364)
# | Runner |
---|---|
297 | linux-amd64-cpu16 |
28 | linux-arm64-cpu16 |
24 | linux-amd64-gpu-v100-latest-1 |
15 | windows-amd64-cpu16 |
dddcf29
to
2386df3
Compare
🟨 CI finished in 1h 43m: Pass: 98%/364 | Total: 7d 21h | Avg: 31m 16s | Max: 1h 15m | Hits: 15%/25687
|
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
Thrust | |
+/- | CUDA Experimental |
pycuda | |
CUDA C Core Library |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
+/- | pycuda |
+/- | CUDA C Core Library |
🏃 Runner counts (total jobs: 364)
# | Runner |
---|---|
297 | linux-amd64-cpu16 |
28 | linux-arm64-cpu16 |
24 | linux-amd64-gpu-v100-latest-1 |
15 | windows-amd64-cpu16 |
1c75ba5
to
3e00163
Compare
We currently do not have proper assertions within CCCL. There are different approaches in cub thrust and libcu++, some of which are completely broken. This tries to rework the assertion handlers so that they work uniformly everywhere and can be selectively enabled.
🟨 CI finished in 2h 05m: Pass: 99%/364 | Total: 6d 07h | Avg: 24m 57s | Max: 1h 20m | Hits: 9%/25720
|
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
pycuda | |
CUDA C Core Library |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
+/- | pycuda |
+/- | CUDA C Core Library |
🏃 Runner counts (total jobs: 364)
# | Runner |
---|---|
297 | linux-amd64-cpu16 |
28 | linux-arm64-cpu16 |
24 | linux-amd64-gpu-v100-latest-1 |
15 | windows-amd64-cpu16 |
I cannot deal with it in the interpreter
🟩 CI finished in 2h 03m: Pass: 100%/364 | Total: 2d 19h | Avg: 11m 11s | Max: 1h 23m | Hits: 9%/25720
|
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
pycuda | |
CUDA C Core Library |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
+/- | pycuda |
+/- | CUDA C Core Library |
🏃 Runner counts (total jobs: 364)
# | Runner |
---|---|
297 | linux-amd64-cpu16 |
28 | linux-arm64-cpu16 |
24 | linux-amd64-gpu-v100-latest-1 |
15 | windows-amd64-cpu16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good. a big improvement.
fixes #2381
This works toward unification of assert handling in CCCL
Currently we do not enable assertions even in debug mode, which is ... unfortunate (See #2381)
For libcu++ the user must define
_LIBCUDACXX_ENABLE_DEBUG_MODE
or_LIBCUDACXX_ENABLE_ASSERTIONS
None of those macros are user facing or documented anywhere, which makes it really difficult to use. Furthermore, this only works on libcu++ and not Thrust and CUB. However, we want a delightful experience across all our libraries.
To improve the situation and give our users the choice we move the assertion handling into a global CCCL wide facility.
The proposal is to add three distinct, user facing flags:
CCCL_ENABLE_HOST_ASSERTIONS
This enables use of assertions in host code onlyCCCL_ENABLE_DEVICE_ASSERTIONS
This enables use of assertions in device code onlyCCCL_ENABLE_ASSERTIONS
This enables use of assertions in both host and device codeThis allows the user to select what kind of assertions they want to use and whether they want to take the compile-time / performance hit of device assertions.
Finally, we also provide internal facilities to actually use those assertions, namely
_CCCL_VERIFY
This is always on and reserved for absolutely critical correctness checks._CCCL_ASSERT
This is conditionally on depending on the user provided flag