-
Notifications
You must be signed in to change notification settings - Fork 190
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reorganize PTX headers to match generator #2925
Conversation
8118516
to
e77f1d2
Compare
🟨 CI finished in 3h 06m: Pass: 99%/396 | Total: 4d 14h | Avg: 16m 47s | Max: 1h 11m | Hits: 75%/21915
|
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
CUB | |
Thrust | |
CUDA Experimental | |
python | |
CCCL C Parallel Library | |
Catch2Helper |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
+/- | python |
+/- | CCCL C Parallel Library |
+/- | Catch2Helper |
🏃 Runner counts (total jobs: 396)
# | Runner |
---|---|
327 | linux-amd64-cpu16 |
28 | linux-arm64-cpu16 |
26 | linux-amd64-gpu-v100-latest-1 |
15 | windows-amd64-cpu16 |
e77f1d2
to
542f32d
Compare
What would be the issue of adding header guards around the .inc files? The code is autogenerated, so we should be able to generate those as well |
If you promise me to never need to change the header guard because you want to move the generated files to a different place, then I am fine! |
8a35ce5
to
e55e5f8
Compare
Can I add |
Dont you already have the name of the instrinsic stored when printing out? |
🟨 CI finished in 3h 04m: Pass: 96%/396 | Total: 3d 22h | Avg: 14m 17s | Max: 56m 30s | Hits: 83%/22038
|
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
CUB | |
Thrust | |
CUDA Experimental | |
python | |
CCCL C Parallel Library | |
Catch2Helper |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
+/- | python |
+/- | CCCL C Parallel Library |
+/- | Catch2Helper |
🏃 Runner counts (total jobs: 396)
# | Runner |
---|---|
327 | linux-amd64-cpu16 |
28 | linux-arm64-cpu16 |
26 | linux-amd64-gpu-v100-latest-1 |
15 | windows-amd64-cpu16 |
Alright, I can add a header guard. But if you would want to do it properly, we would also need to put the generated code into the right namespace, which means surrounding it with |
__cuda_ptx_tensormap_cp_fenceproxy_is_not_supported_before_SM_90__();)); | ||
} | ||
#endif // __cccl_ptx_isa >= 830 | ||
#include <cuda/__ptx/instructions/generated/tensormap_cp_fenceproxy.inc> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with this being a temporary step and merging it as is, but I do not see a reason why a code generator that does generate a tensormap_cp_fenceproxy.inc
cannot generate a tensormap_cp_fenceproxy.h
with an #ifdef _CUDA_PTX_TENSORMAP_CP_FENCEPROXY_H
at the beginning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cannot generate a tensormap_cp_fenceproxy.h with an #ifdef _CUDA_PTX_TENSORMAP_CP_FENCEPROXY_H at the beginning
That will break, because that macro is already defined in this header. The idea is to maximize the decoupling between the generated code and CCCL. CCCL is not the only customer of this generated code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@miscco I will add the header guards in a follow-up PR.
@ahendriksen I think adding header guards will still support your internal use cases. But we can discuss this offline.
That is almost certainly not going to happen and depends solely on us. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly LGTM! Thanks to CI, I found that mbarrier_test_wait was missing some content. Do you think that could have happened elsewhere as well?
libcudacxx/include/cuda/__ptx/instructions/generated/mbarrier_test_wait.inc
Show resolved
Hide resolved
e55e5f8
to
11041a7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
🟩 CI finished in 1h 47m: Pass: 100%/396 | Total: 6d 20h | Avg: 24m 53s | Max: 1h 11m | Hits: 66%/22042
|
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
CUB | |
Thrust | |
CUDA Experimental | |
python | |
CCCL C Parallel Library | |
Catch2Helper |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
+/- | python |
+/- | CCCL C Parallel Library |
+/- | Catch2Helper |
🏃 Runner counts (total jobs: 396)
# | Runner |
---|---|
327 | linux-amd64-cpu16 |
28 | linux-arm64-cpu16 |
26 | linux-amd64-gpu-v100-latest-1 |
15 | windows-amd64-cpu16 |
This PR splits the
cuda::ptx
headers into the generated and manually written part. It's just moving around content and renaming files. No other changes.