Add b200 policies for partition.three_way #3708
Conversation
```cpp
// template <class Input, class OffsetT>
// struct sm100_tuning<Input, OffsetT, input_size::_1, offset_size::_4>
// {
//   // trp_0.ipt_12.tpb_256.ns_792.dcid_6.l2w_365 1.063960 0.978016 1.072833 1.301435
//   static constexpr int items = 12;
//   static constexpr int threads = 256;
//   static constexpr BlockLoadAlgorithm load_algorithm = BLOCK_LOAD_DIRECT;
//   using delay_constructor = exponential_backon_jitter_constructor_t<792, 365>;
// };

// template <class Input, class OffsetT>
// struct sm100_tuning<Input, OffsetT, input_size::_2, offset_size::_4>
// {
//   // trp_1.ipt_14.tpb_288.ns_496.dcid_6.l2w_400 1.170449 1.123515 1.170428 1.252066
//   static constexpr int items = 14;
//   static constexpr int threads = 288;
//   static constexpr BlockLoadAlgorithm load_algorithm = BLOCK_LOAD_WARP_TRANSPOSE;
//   using delay_constructor = exponential_backon_jitter_constructor_t<496, 400>;
// };
```
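For context, the string in each benchmark comment encodes the tuning parameters. Below is a sketch of how the first tuning would read if enabled, with the encoding decoded inline. The scaffold declarations are hypothetical stand-ins so the snippet compiles on its own (the real definitions live in CUB's headers), and the reading of `dcid` as a delay-constructor id and of the four trailing numbers as measured speedup ratios is an assumption.

```cpp
// Hypothetical scaffold; the real definitions live in CUB's headers.
enum BlockLoadAlgorithm { BLOCK_LOAD_DIRECT, BLOCK_LOAD_WARP_TRANSPOSE };
template <int Delay, int L2Write>
struct exponential_backon_jitter_constructor_t {};
enum class input_size { _1, _2, _4, _8 };
enum class offset_size { _4, _8 };

template <class Input, class OffsetT, input_size, offset_size>
struct sm100_tuning; // primary template left undefined

// trp_0.ipt_12.tpb_256.ns_792.dcid_6.l2w_365 decodes (inferred from the
// snippets above) as:
//   trp_0   -> BLOCK_LOAD_DIRECT (trp_1 would be BLOCK_LOAD_WARP_TRANSPOSE)
//   ipt_12  -> 12 items per thread
//   tpb_256 -> 256 threads per block
//   ns_792, l2w_365 -> exponential_backon_jitter_constructor_t<792, 365>
//   dcid_6  -> delay-constructor id (assumption)
template <class Input, class OffsetT>
struct sm100_tuning<Input, OffsetT, input_size::_1, offset_size::_4>
{
  static constexpr int items   = 12;
  static constexpr int threads = 256;
  static constexpr BlockLoadAlgorithm load_algorithm = BLOCK_LOAD_DIRECT;
  using delay_constructor = exponential_backon_jitter_constructor_t<792, 365>;
};
```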
Should we be explicit and name these the same as SM90, as in the other tunings?
The more I think about it, the less I like constructs like:

```cpp
// default back to SM90 tuning
template <....>
struct sm100_tuning<...> : sm90_tuning<...> {};
```

Tunings can work differently for each algorithm or architecture, so here `sm100_tuning` can have different template arguments than `sm90_tuning`, or different data members. Also, if `sm100_tuning` had a `::value`, it could be interpreted differently than `sm90_tuning::value` by the selection logic in the policy hub. Therefore, IMO, the best option is to not provide a template specialization at all and let SFINAE fail to find an `sm100_tuning`, falling back naturally.
But I could add a comment here.
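A minimal sketch of the fallback mechanism being described, under stated assumptions: `select_tuning` and the `::items` probe are illustrative names, not CUB's actual policy-hub API, and the tuning values are placeholders.

```cpp
#include <type_traits>

// Stand-in tunings (hypothetical values).
template <class Input, class OffsetT>
struct sm90_tuning { static constexpr int items = 10; };

// Empty primary template: combinations without a B200 tuning expose no
// members, so the probe below fails and selection falls back to SM90.
template <class Input, class OffsetT>
struct sm100_tuning {};

// Only some combinations receive an SM100 specialization:
template <>
struct sm100_tuning<int, unsigned> { static constexpr int items = 12; };

// Hypothetical selector sketching the policy-hub logic: prefer Tuning if it
// provides ::items; otherwise substitution fails and Fallback is used.
template <class Tuning, class Fallback, class = void>
struct select_tuning { using type = Fallback; };

template <class Tuning, class Fallback>
struct select_tuning<Tuning, Fallback, std::void_t<decltype(Tuning::items)>>
{ using type = Tuning; };

// SM100 specialization exists -> it is selected:
static_assert(select_tuning<sm100_tuning<int, unsigned>,
                            sm90_tuning<int, unsigned>>::type::items == 12, "");
// No SM100 specialization -> SFINAE falls back to SM90:
static_assert(select_tuning<sm100_tuning<char, unsigned>,
                            sm90_tuning<char, unsigned>>::type::items == 10, "");
```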
Force-pushed from 634e19f to 1cd873e (compare).
🟩 CI finished in 1h 23m: Pass: 100%/90 | Total: 19h 18m | Avg: 12m 52s | Max: 39m 52s | Hits: 94%/132225
Modifications in project?

| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| +/- | CUB |
| | Thrust |
| | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |
Modifications in project or dependencies?

| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| | CUDA Experimental |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |
🏃 Runner counts (total jobs: 90)

| # | Runner |
|---|---|
| 65 | linux-amd64-cpu16 |
| 9 | windows-amd64-cpu16 |
| 6 | linux-amd64-gpu-rtxa6000-latest-1 |
| 4 | linux-arm64-cpu16 |
| 3 | linux-amd64-gpu-rtx4090-latest-1 |
| 2 | linux-amd64-gpu-rtx2080-latest-1 |
| 1 | linux-amd64-gpu-h100-latest-1 |
Successfully created backport PR.

(cherry picked from commit 9b7333b)
Co-authored-by: Bernhard Manfred Gruber <[email protected]>
Co-authored-by: Michael Schellenberger Costa <[email protected]>
Pulled out the already-approved tunings from #3617 to make progress.