replace `_CCCL_ALWAYS_INLINE` with `_CCCL_FORCEINLINE` #2439
🟨 CI finished in 1h 19m: Pass: 99%/368 | Total: 2d 03h | Avg: 8m 22s | Max: 49m 36s | Hits: 74%/25647
| | Project |
|---|---|
| +/- | CCCL Infrastructure |
| +/- | libcu++ |
| | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| | pycuda |
| | CUDA C Core Library |

Modifications in project or dependencies?

| | Project |
|---|---|
| +/- | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | pycuda |
| +/- | CUDA C Core Library |

🏃 Runner counts (total jobs: 368)

| # | Runner |
|---|---|
| 297 | linux-amd64-cpu16 |
| 28 | linux-amd64-gpu-v100-latest-1 |
| 28 | linux-arm64-cpu16 |
| 15 | windows-amd64-cpu16 |
🟩 CI finished in 2h 32m: Pass: 100%/368 | Total: 2d 03h | Avg: 8m 24s | Max: 49m 36s | Hits: 74%/25647
  template <class Vector>
- void TestTransformInputOutputIterator()
+ THRUST_DISABLE_BROKEN_GCC_VECTORIZER void TestTransformInputOutputIterator()
This fixes our tests, but won't gcc still be miscompiling Thrust for users?
That is nothing we can change. I want to note that this is exceptionally fickle and dependent on exact sizes and optimization settings, so I don't see anything we can do there.
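For readers unfamiliar with this kind of workaround, a macro like `THRUST_DISABLE_BROKEN_GCC_VECTORIZER` can be built on GCC's per-function `optimize` attribute. The sketch below is an assumption about how such a macro might look, not the definition used in Thrust; `DISABLE_GCC_VECTORIZER` and `sum_transformed` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for THRUST_DISABLE_BROKEN_GCC_VECTORIZER (an assumption,
// not the real definition): on GCC, turn off tree vectorization for a single
// function; on other compilers, expand to nothing.
#if defined(__GNUC__) && !defined(__clang__) && !defined(__INTEL_COMPILER)
#  define DISABLE_GCC_VECTORIZER __attribute__((optimize("no-tree-vectorize")))
#else
#  define DISABLE_GCC_VECTORIZER
#endif

// The attribute is attached to this one function only.
DISABLE_GCC_VECTORIZER int sum_transformed(const std::vector<int>& input)
{
  int total = 0;
  for (std::size_t i = 0; i < input.size(); ++i)
  {
    total += input[i] * 2 + 1; // a simple loop GCC would normally try to vectorize
  }
  return total;
}
```

Because an attribute of this kind is attached to individual functions, it only covers the annotated tests, which is the concern raised above about user code.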
@@ -39,7 +39,7 @@ struct wrapped_function

  _CCCL_EXEC_CHECK_DISABLE
  template <typename... Ts>
- _CCCL_FORCEINLINE _CCCL_HOST_DEVICE Result operator()(Ts&&... args) const
+ inline _CCCL_HOST_DEVICE Result operator()(Ts&&... args) const
@miscco at least locally, this change avoids the gcc optimizer issue.
That is awesome. I will test whether we can avoid that hack in the other tests too. Will file a separate PR though
looks like this is still failing with gcc 12 😿
https://github.com/NVIDIA/cccl/actions/runs/10967082905/job/30507674109?pr=2439
it fixed almost all of the failures tho. just two remain. i can track those down when i get back from vacation.
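To make the change in this thread concrete, here is a host-only sketch of a forwarding wrapper whose call operator is marked plain `inline`. It is a simplified illustration, not the actual `wrapped_function`: `wrapped_function_sketch`, `SKETCH_HOST_DEVICE`, and `add_ints` are hypothetical names, and the execution-space macro is stubbed out so the code builds with an ordinary host compiler.

```cpp
#include <utility>

// Stubbed so the sketch compiles with a plain host compiler. In CCCL the
// corresponding macro adds CUDA execution-space annotations (assumption: empty here).
#define SKETCH_HOST_DEVICE

// Hypothetical, simplified wrapper: stores a callable and converts its result.
template <typename Function, typename Result>
struct wrapped_function_sketch
{
  Function f;

  // Plain `inline` leaves the inlining decision to the optimizer, whereas a
  // force-inline attribute asks the compiler to inline at every call site.
  template <typename... Ts>
  inline SKETCH_HOST_DEVICE Result operator()(Ts&&... args) const
  {
    return static_cast<Result>(f(std::forward<Ts>(args)...));
  }
};

// Small callable used to exercise the wrapper.
struct add_ints
{
  int operator()(int a, int b) const
  {
    return a + b;
  }
};
```

With this shape, `wrapped_function_sketch<add_ints, long>{add_ints{}}(2, 3)` forwards both arguments and returns the sum as a `long`.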
/ok to test
🟨 CI finished in 2h 00m: Pass: 99%/368 | Total: 7d 00h | Avg: 27m 29s | Max: 1h 25m | Hits: 54%/25647
🟨 CI finished in 2d 08h: Pass: 99%/368 | Total: 7d 00h | Avg: 27m 25s | Max: 1h 25m | Hits: 54%/25647
/ok to test
🟩 CI finished in 1h 49m: Pass: 100%/370 | Total: 7d 15h | Avg: 29m 44s | Max: 1h 14m | Hits: 9%/25696
🏃 Runner counts (total jobs: 370)
# | Runner |
---|---|
297 | linux-amd64-cpu16 |
30 | linux-amd64-gpu-v100-latest-1 |
28 | linux-arm64-cpu16 |
15 | windows-amd64-cpu16 |
Description

cccl has `_CCCL_FORCEINLINE` and `_CCCL_ALWAYS_INLINE`. there should be only one. also, `_CCCL_FORCEINLINE` currently expands to `inline` when not using a CUDA compiler. that is unexpected. it should expand to either `__attribute__((always_inline))` or `__forceinline` depending on which is supported by the host compiler.

closes #2438
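The host-compiler selection described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the definition in CCCL: `SKETCH_FORCEINLINE` and `twice` are hypothetical names, and the compiler checks are a simplification of whatever detection the library actually uses.

```cpp
// Hypothetical macro name; a sketch of the selection described above, not the
// definition that ships in CCCL.
#if defined(_MSC_VER)
// MSVC-compatible compilers provide the __forceinline keyword.
#  define SKETCH_FORCEINLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
// GCC-compatible compilers provide the always_inline attribute.
#  define SKETCH_FORCEINLINE __inline__ __attribute__((always_inline))
#else
// Fallback for unknown compilers: plain inline is only a hint.
#  define SKETCH_FORCEINLINE inline
#endif

// Example function annotated with the sketch macro.
SKETCH_FORCEINLINE int twice(int x)
{
  return 2 * x;
}
```

The fallback branch reproduces the behaviour the description calls unexpected, so in this sketch it is reserved for compilers that offer neither spelling.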
This PR moves the definition of `_CCCL_FORCEINLINE` from `execution_space.h` to `visibility.h`. it also changes the definition to expand directly to either `__inline__ __attribute__((always_inline))` or `__forceinline` rather than indirectly through the `__forceinline__` macro defined in `host_defines.h`.

Checklist