Atomics backend refactor #1631
Conversation
Force-pushed from e085e18 to a780c26 (compare)
* This matches other implementations.
I like this a lot.
There is still some nitty-gritty correctness work we need to do, but it's already looking great.
* We previously defined, or *clobbered*, the existing STL definitions.
* See: `ATOMIC_BOOL_LOCK_FREE` -> `LIBCUDACXX_ATOMIC_BOOL_LOCK_FREE` (sketched below).
Co-authored-by: Michael Schellenberger Costa <[email protected]>
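A minimal sketch of the renaming pattern being described here. The include guard and the value `2` (the standard's "always lock-free" value) are illustrative assumptions; the real definitions live in the atomics backend headers:

```cpp
// Before: defining the standard macro name could clobber the STL's
// own definition when <atomic> is also in play:
//   #define ATOMIC_BOOL_LOCK_FREE 2   // collides with the STL macro
//
// After: a library-private prefix leaves the standard macro untouched.
#ifndef LIBCUDACXX_ATOMIC_BOOL_LOCK_FREE
#  define LIBCUDACXX_ATOMIC_BOOL_LOCK_FREE 2 // value assumed for this sketch
#endif
```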
```cpp
// Reinterpret an atomic object through its alignment wrapper so the
// compiler builtins see a suitably aligned type. One overload per
// cv-qualification.
template <typename _Tp>
__atomic_alignment_wrapper<__remove_cv_t<_Tp>>& __atomic_auto_align(_Tp* __a)
{
  using __aligned_t = __atomic_alignment_wrapper<__remove_cv_t<_Tp>>;
  return *reinterpret_cast<__aligned_t*>(__a);
}
template <typename _Tp>
const __atomic_alignment_wrapper<__remove_cv_t<_Tp>>& __atomic_auto_align(const _Tp* __a)
{
  using __aligned_t = const __atomic_alignment_wrapper<__remove_cv_t<_Tp>>;
  return *reinterpret_cast<__aligned_t*>(__a);
}
template <typename _Tp>
volatile __atomic_alignment_wrapper<__remove_cv_t<_Tp>>& __atomic_auto_align(volatile _Tp* __a)
{
  using __aligned_t = volatile __atomic_alignment_wrapper<__remove_cv_t<_Tp>>;
  return *reinterpret_cast<__aligned_t*>(__a);
}
template <typename _Tp>
const volatile __atomic_alignment_wrapper<__remove_cv_t<_Tp>>& __atomic_auto_align(const volatile _Tp* __a)
{
  using __aligned_t = const volatile __atomic_alignment_wrapper<__remove_cv_t<_Tp>>;
  return *reinterpret_cast<__aligned_t*>(__a);
}

// Guard ifdef for lock free query in case it is assigned elsewhere (MSVC/CUDA)
inline void __atomic_thread_fence_host(memory_order __order)
{
  __atomic_thread_fence(__atomic_order_to_int(__order));
}

inline void __atomic_signal_fence_host(memory_order __order)
{
  __atomic_signal_fence(__atomic_order_to_int(__order));
}

template <typename _Tp, typename _Up>
inline void __atomic_store_host(_Tp* __a, _Up __val, memory_order __order)
{
  __atomic_store(
    &__atomic_auto_align<_Tp>(__a), &__atomic_auto_align<__remove_cv_t<_Tp>>(&__val), __atomic_order_to_int(__order));
}

template <typename _Tp>
inline auto __atomic_load_host(_Tp* __a, memory_order __order) -> __remove_cv_t<_Tp>
{
  __remove_cv_t<_Tp> __ret;
  __atomic_load(
    &__atomic_auto_align<_Tp>(__a), &__atomic_auto_align<__remove_cv_t<_Tp>>(&__ret), __atomic_order_to_int(__order));
  return __ret;
}

template <typename _Tp, typename _Up>
inline auto __atomic_exchange_host(_Tp* __a, _Up __val, memory_order __order) -> __remove_cv_t<_Tp>
{
  __remove_cv_t<_Tp> __ret;
  __atomic_exchange(&__atomic_auto_align<_Tp>(__a),
                    &__atomic_auto_align<__remove_cv_t<_Tp>>(&__val),
                    &__atomic_auto_align<__remove_cv_t<_Tp>>(&__ret),
                    __atomic_order_to_int(__order));
  return __ret;
}
```
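For context, a hypothetical call site for these helpers (not from the PR; it just shows the intended shape of the API, assuming the `memory_order` enumerators are in scope):

```cpp
// The helpers take the atomic's storage pointer plus a memory_order
// and route through the GCC __atomic_* builtins shown above.
int __x = 0;
__atomic_store_host(&__x, 42, memory_order_release);
int __seen = __atomic_load_host(&__x, memory_order_acquire);
int __old  = __atomic_exchange_host(&__x, 7, memory_order_acq_rel);
```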
@miscco @griwes
Time for excuses: why did I write `__atomic_auto_align`?
Because alignment information is lost as we descend into the abyss, and if we kept linking that way we'd end up requiring libatomic. There could be a better way to write this. Any ideas?
https://gcc.godbolt.org/z/vdxffdnKc
This is where all that wrap/unwrap nastiness came from as well; it was just written more or less poorly on the first pass.
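A minimal sketch of the underlying issue (names here are hypothetical stand-ins for `__atomic_alignment_wrapper`): GCC's `__atomic_*` builtins emit out-of-line libatomic calls when they cannot prove the operand is suitably aligned, so casting through an over-aligned wrapper restores that guarantee and keeps the operation inline.

```cpp
#include <cstdint>

// Hypothetical stand-in for __atomic_alignment_wrapper: pad the
// alignment up to the object's size so the builtin can assume a
// naturally aligned, lock-free-capable operand.
template <typename T>
struct alignas(sizeof(T)) aligned_for_atomics {
  T value;
};

std::uint64_t load_relaxed(std::uint64_t* p) {
  // Without the cast, a compiler that has lost the alignment of *p may
  // conservatively call __atomic_load_8 from libatomic; with it, the
  // load can be emitted inline.
  auto* w = reinterpret_cast<aligned_for_atomics<std::uint64_t>*>(p);
  std::uint64_t out;
  __atomic_load(&w->value, &out, __ATOMIC_RELAXED);
  return out;
}
```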
🟨 CI Results [ Failed: 21 | Passed: 281 | Total: 302 ]

| # | Runner |
|---|---|
| 232 | linux-amd64-cpu16 |
| 28 | linux-amd64-gpu-v100-latest-1 |
| 24 | linux-arm64-cpu16 |
| 18 | windows-amd64-cpu16 |

👃 Inspect Changes

Modifications in project?

| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| | CUB |
| | Thrust |

Modifications in project or dependencies?

| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
🟨 CI Results [ Failed: 20 | Passed: 282 | Total: 302 ]

| # | Runner |
|---|---|
| 232 | linux-amd64-cpu16 |
| 28 | linux-amd64-gpu-v100-latest-1 |
| 24 | linux-arm64-cpu16 |
| 18 | windows-amd64-cpu16 |

👃 Inspect Changes

Modifications in project?

| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| | CUB |
| | Thrust |

Modifications in project or dependencies?

| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
🟨 CI Results [ Failed: 2 | Passed: 300 | Total: 302 ]

| # | Runner |
|---|---|
| 232 | linux-amd64-cpu16 |
| 28 | linux-amd64-gpu-v100-latest-1 |
| 24 | linux-arm64-cpu16 |
| 18 | windows-amd64-cpu16 |

👃 Inspect Changes

Modifications in project?

| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| | CUB |
| | Thrust |

Modifications in project or dependencies?

| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
pre-commit.ci autofix

/ok to test
🟩 CI Results [ Failed: 0 | Passed: 302 | Total: 302 ]

| # | Runner |
|---|---|
| 232 | linux-amd64-cpu16 |
| 28 | linux-amd64-gpu-v100-latest-1 |
| 24 | linux-arm64-cpu16 |
| 18 | windows-amd64-cpu16 |

👃 Inspect Changes

Modifications in project?

| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| | CUB |
| | Thrust |

Modifications in project or dependencies?

| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
The `trie_mt.cpp` example doesn't use libcu++, so `LIBCUDACXX_ATOMIC_VAR_INIT` is undeclared:

```
/root/cccl/libcudacxx/examples/trie_mt.cpp:39:30: error: 'LIBCUDACXX_ATOMIC_VAR_INIT' was not declared in this scope
   39 |     std::atomic<trie*> ptr = LIBCUDACXX_ATOMIC_VAR_INIT(nullptr);
      |                              ^~~~~~~~~~~~~~~~~~~~~~~~~~
/root/cccl/libcudacxx/examples/trie_mt.cpp:41:29: error: 'LIBCUDACXX_ATOMIC_VAR_INIT' was not declared in this scope
   41 |     std::atomic_flag flag = LIBCUDACXX_ATOMIC_VAR_INIT(0);
      |                             ^~~~~~~~~~~~~~~~~~~~~~~~~~
/root/cccl/libcudacxx/examples/trie_mt.cpp:43:28: error: 'LIBCUDACXX_ATOMIC_VAR_INIT' was not declared in this scope
   43 |     std::atomic<int> count = LIBCUDACXX_ATOMIC_VAR_INIT(0);
      |                              ^~~~~~~~~~~~~~~~~~~~~~~~~~
gmake[2]: *** [CMakeFiles/trie_mt.dir/build.make:76: CMakeFiles/trie_mt.dir/trie_mt.cpp.o] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:117: CMakeFiles/trie_mt.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
```

This issue was introduced in PR NVIDIA#1631 at commit 12c2892.
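A possible fix, sketched here rather than taken from the PR: since `trie_mt.cpp` compiles against the host standard library, it can drop the macro and use plain initialization (`ATOMIC_VAR_INIT` is deprecated in C++20 anyway; `ATOMIC_FLAG_INIT` is the conventional initializer for `std::atomic_flag`). The declarations below are shown at namespace scope for brevity:

```cpp
#include <atomic>

struct trie; // forward declaration, as in the example

// Direct initialization replaces the LIBCUDACXX_ATOMIC_VAR_INIT macro.
std::atomic<trie*> ptr{nullptr};
std::atomic_flag   flag = ATOMIC_FLAG_INIT;
std::atomic<int>   count{0};
```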
Description
closes: #1615
Codegen before and after these changes is identical. The next step is to verify this with additional FileCheck tests.
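A sketch of what such a FileCheck test might look like (the RUN line and CHECK patterns are hypothetical; real tests would use the libcu++ test suite's lit configuration):

```cpp
// RUN: %cxx -O2 -S -emit-llvm %s -o - | FileCheck %s
#include <cuda/std/atomic>

// CHECK-LABEL: @_Z12load_acquire
// CHECK: load atomic i32, {{.*}} acquire
int load_acquire(cuda::std::atomic<int>& a) {
  return a.load(cuda::std::memory_order_acquire);
}
```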
Checklist