Sync ptx helpers with libcudaptx #3564
Conversation
@@ -22,6 +22,7 @@
 # pragma system_header
 #endif // no system header

+#include <cuda/std/__type_traits/enable_if.h>
Note: we are going to use this in generated functions which will be proposed in subsequent PRs.
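As a minimal sketch of that idea (the wrapper name `red_add` and its body are hypothetical, not the generated code; only the `cuda::std::enable_if_t` constraint is the point), a generated function could restrict itself to the operand types its PTX variants actually cover:

```cuda
#include <cuda/std/type_traits>

// Hypothetical wrapper: the real generated functions live in cuda::ptx and
// emit inline PTX; here atomicAdd merely stands in for that asm body.
template <typename T,
          cuda::std::enable_if_t<cuda::std::is_same_v<T, unsigned int>
                                   || cuda::std::is_same_v<T, unsigned long long>,
                                 int> = 0>
__device__ T red_add(T* addr, T value)
{
  // Only the operand widths the (assumed) PTX variants support can
  // instantiate this template; everything else is rejected up front.
  return atomicAdd(addr, value);
}
```

With the constraint in the template signature, a call with an unsupported operand type fails at overload resolution rather than deep inside an inline-asm body.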
LGTM! One question.
🟩 CI finished in 3h 54m: Pass: 100%/152 | Total: 3d 01h | Avg: 29m 01s | Max: 1h 14m | Hits: 414%/21515
Modifications in project?

|     | Project |
|-----|---------|
|     | CCCL Infrastructure |
| +/- | libcu++ |
|     | CUB |
|     | Thrust |
|     | CUDA Experimental |
|     | python |
|     | CCCL C Parallel Library |
|     | Catch2Helper |

Modifications in project or dependencies?

|     | Project |
|-----|---------|
|     | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |
🏃 Runner counts (total jobs: 152)

| #   | Runner |
|-----|--------|
| 110 | linux-amd64-cpu16 |
| 17  | linux-amd64-gpu-v100-latest-1 |
| 14  | windows-amd64-cpu16 |
| 10  | linux-arm64-cpu16 |
| 1   | linux-amd64-gpu-h100-latest-1-testing |
Git push to origin failed for branch/2.8.x with exitcode 128
* Sync ptx_dot_variants.h with libcuda-ptx (#3564)
* Update ptx_isa.h to include 8.6 and 8.7 (#3563)
* PTX: Update generated files with Blackwell instructions (#3568)
  * ptx: Update existing instructions
  * ptx: Add new instructions
  * Fix returning error out values. See:
    * https://gitlab-master.nvidia.com/CCCL/libcuda-ptx/-/merge_requests/74
    * https://gitlab-master.nvidia.com/CCCL/libcuda-ptx/-/merge_requests/73
  * ptx: Fix out var declaration. See https://gitlab-master.nvidia.com/CCCL/libcuda-ptx/-/merge_requests/75
  * mbarrier.{test,try}_wait: Fix test; wrong files were included.
  * docs: Fix special registers include
  * Allow non-included documentation pages
  * Workaround NVRTC
* PTX: Remove internal instructions (#3583)
  * barrier.cluster.aligned: Remove; this is not supposed to be exposed in CCCL.
  * elect.sync: Remove; not ready for inclusion yet. This needs to handle the optional extra output mask as well.
  * mapa: Remove; this has compiler bugs. We should use intrinsics instead.
* PTX: Update existing instructions (#3584)
  * mbarrier.expect_tx: Add missing source and test (it was already documented!)
  * cp.async.bulk.tensor: Add .{gather,scatter}4
  * fence: Add .sync_restrict, .proxy.async.sync_restrict
* PTX: Add clusterlaunchcontrol (#3589)
* PTX: Add cp.async.mbarrier.arrive{.noinc} (#3602)
* PTX: Add multimem instructions (#3603)
  * Add multimem.ld_reduce
  * Add multimem.red
  * Add multimem.st
* PTX: Add st.bulk (#3604)
* PTX: Add tcgen05 instructions (#3607)
  * ptx: Add tcgen05.alloc
  * ptx: Add tcgen05.commit
  * ptx: Add tcgen05.cp
  * ptx: Add tcgen05.fence
  * ptx: Add tcgen05.ld
  * ptx: Add tcgen05.mma
  * ptx: Add tcgen05.mma.ws
  * ptx: Add tcgen05.shift
  * ptx: Add tcgen05.st
  * ptx: Add tcgen05.wait
  * fix docs

Co-authored-by: Allard Hendriksen <[email protected]>
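For readers unfamiliar with the `cuda::ptx` layer these commits extend, a rough usage sketch follows. The tag-object calling convention is how the wrappers select the `.sem`/`.scope`/`.space` variant of an instruction; the exact `mbarrier_arrive_expect_tx` signature below is my best understanding (PTX ISA 8.0+, SM_90 target), not something asserted by this PR:

```cuda
#include <cuda/ptx>
#include <cstdint>

// Tell an mbarrier object in shared memory to expect `bytes` worth of
// asynchronous transactions before the current phase can complete.
// The tag objects pick the .release / .cta / .shared::cta variant.
__device__ void expect_bulk_copy(std::uint64_t* smem_barrier, std::uint32_t bytes)
{
  cuda::ptx::mbarrier_arrive_expect_tx(
    cuda::ptx::sem_release, cuda::ptx::scope_cta, cuda::ptx::space_shared,
    smem_barrier, bytes);
}
```

The newly added instructions in the list above (multimem.*, st.bulk, tcgen05.*) appear to follow the same general pattern: one wrapper per supported variant, selected through these tag types and guarded by the required PTX ISA version.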