Fix for in-place DeviceSelect & thrust::remove_if #1782

Merged on Jul 9, 2024 (25 commits)

elstehle (Collaborator) commented May 27, 2024

Description

Closes #1730

Currently, the in-place versions of the algorithms named in the title (DeviceSelect and thrust::remove_if) can run into a race condition in which a thread block's input items have already been overwritten by one of the subsequent thread blocks. More specifically, the race affects attributes of an item that are not needed to evaluate whether the item is selected, when the algorithm is compiled with a tuning policy that does not use shared memory during the BlockLoad stage.

All of the following conditions must be met for the race to be present:

  • The items have attributes that are not needed to evaluate whether an item is selected (e.g., see [BUG]: Intermittent wrong output from thrust::remove_if under heavy GPU loading #1730).
  • The decoupled look-back's value type is a primitive type (cub::Traits<T>::PRIMITIVE); for other, "more complex" types we emit a st.release.
  • The algorithm is compiled with a BlockLoad algorithm that doesn't load all of the items' data into shared memory (e.g., BLOCK_LOAD_DIRECT).
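
To make the first condition concrete, here is a minimal host-side sketch in plain C++ (std::remove_if, not Thrust or CUB). The Item type, its fields, and the KeyIsNegative predicate are hypothetical names chosen for illustration. The only point is that payload is never read by the predicate, so it is an attribute that is "not needed to evaluate whether an item is selected".

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical item type: the predicate reads `key` only,
// `payload` is carried along but never inspected during selection.
struct Item {
  std::int32_t key;
  std::int64_t payload;
};

// Predicate that depends on `key` alone.
struct KeyIsNegative {
  bool operator()(const Item& item) const { return item.key < 0; }
};

// Removes items with a negative key in place (host-side analogue).
// Returns the number of items that were kept.
inline std::size_t remove_negative_keys(std::vector<Item>& items) {
  auto new_end = std::remove_if(items.begin(), items.end(), KeyIsNegative{});
  items.erase(new_end, items.end());
  return items.size();
}
```

On the device, an in-place compaction over items of this shape is presumably the kind of case where the payload could be read after a successor thread block has already overwritten it.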

For stream compaction to work in-place, a thread block must have loaded its items before it signals to successor thread blocks the number of items it selected (i.e., the "aggregate" or "partial" in the decoupled look-back). That count is the only information successor thread blocks need to infer their offset, and receiving it unblocks them to write out their stream-compacted items. To make sure in-place stream compaction works as expected for tuning policies with BLOCK_LOAD_DIRECT, we need a device-wide memory barrier between BlockLoad(...).Load(items) and tile_state.SetPartial().

Unless we load the items into shared memory during the BlockLoad stage, the CTA_SYNC() (__syncthreads()) is not sufficient, as it is only a memory barrier with regard to other threads in the same thread block.
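
The ordering requirement can be sketched on the host with C++ atomics. This is an analogue under stated assumptions, not the CUB implementation: each std::thread stands in for a thread block, TileState and compact_even_in_place are hypothetical names, and the release store plays the role that the description above assigns to the device-wide barrier between the load and SetPartial().

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a tile's look-back state; -1 means "not published yet".
struct TileState {
  std::atomic<int> aggregate{-1};
};

// Keeps the even values of `data`, compacting in place with one thread per tile.
// Returns the number of selected items.
inline std::size_t compact_even_in_place(std::vector<int>& data, std::size_t tile_size) {
  assert(tile_size > 0);
  const std::size_t num_items = data.size();
  const std::size_t num_tiles = (num_items + tile_size - 1) / tile_size;
  std::vector<TileState> tiles(num_tiles);
  std::vector<std::thread> workers;

  for (std::size_t t = 0; t < num_tiles; ++t) {
    workers.emplace_back([&data, &tiles, num_items, tile_size, t] {
      const std::size_t begin = t * tile_size;
      const std::size_t end = std::min(begin + tile_size, num_items);

      // 1. Load this tile's items and keep the selected ones in private storage.
      std::vector<int> selected;
      for (std::size_t i = begin; i < end; ++i) {
        if (data[i] % 2 == 0) selected.push_back(data[i]);
      }

      // 2. Publish the aggregate. The release store orders the loads of step 1
      //    before it, so a successor that observes the value cannot overwrite
      //    input this tile has yet to read.
      tiles[t].aggregate.store(static_cast<int>(selected.size()),
                               std::memory_order_release);

      // 3. Look back: sum the aggregates of all predecessor tiles.
      std::size_t offset = 0;
      for (std::size_t p = 0; p < t; ++p) {
        int partial = tiles[p].aggregate.load(std::memory_order_acquire);
        while (partial < 0) {
          std::this_thread::yield();
          partial = tiles[p].aggregate.load(std::memory_order_acquire);
        }
        offset += static_cast<std::size_t>(partial);
      }

      // 4. Write the selected items in place, possibly over a predecessor's input.
      std::copy(selected.begin(), selected.end(),
                data.begin() + static_cast<std::ptrdiff_t>(offset));
    });
  }
  for (std::thread& worker : workers) worker.join();

  std::size_t total = 0;
  for (const TileState& tile : tiles) {
    total += static_cast<std::size_t>(tile.aggregate.load());
  }
  data.resize(total);
  return total;
}
```

In this sketch the write in step 4 is safe only because step 2 comes after step 1 in every predecessor. If the publish were allowed to become visible before the loads completed, a successor could start writing too early, which is the failure described above.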

Checklist - Post-Load acquire introduction (aeff76e)

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • Verify SASS remains the same for all our CUB benchmarks, except for function signature changes in the SelectIf kernel template, which was extended with the MayAlias template parameter.
  • Run benchmarks for in-place compaction to measure the performance difference.
  • Confirm the race condition is fixed both (a) when loading via shared memory and (b) when not loading via shared memory.

elstehle (Collaborator, Author) commented May 27, 2024

Comparing performance before and after adding a threadfence, compiled for the targeted architecture.

Select.If - Tesla V100-SXM2-32GB

T{ct} OffsetT{ct} Elements{io} Entropy Ref Time Ref Noise Cmp Time Cmp Noise Diff %Diff Status
I8 I32 2^16 1 8.912 us 8.58% 9.099 us 8.08% 0.187 us 2.10% PASS
I8 I32 2^20 1 15.293 us 4.58% 15.708 us 4.12% 0.414 us 2.71% PASS
I8 I32 2^24 1 106.539 us 0.98% 115.718 us 1.00% 9.180 us 8.62% FAIL
I8 I32 2^28 1 1.581 ms 0.50% 1.729 ms 0.50% 148.923 us 9.42% FAIL
I8 I32 2^16 0.544 8.770 us 8.29% 8.924 us 7.26% 0.154 us 1.76% PASS
I8 I32 2^20 0.544 14.762 us 4.53% 15.448 us 4.10% 0.686 us 4.64% FAIL
I8 I32 2^24 0.544 98.968 us 0.94% 108.532 us 0.75% 9.565 us 9.66% FAIL
I8 I32 2^28 0.544 1.451 ms 0.50% 1.605 ms 0.50% 154.071 us 10.62% FAIL
I8 I32 2^16 0 8.502 us 7.23% 8.728 us 7.50% 0.226 us 2.66% PASS
I8 I32 2^20 0 14.359 us 4.55% 14.844 us 4.38% 0.486 us 3.38% PASS
I8 I32 2^24 0 90.417 us 0.81% 99.528 us 0.69% 9.111 us 10.08% FAIL
I8 I32 2^28 0 1.291 ms 0.46% 1.441 ms 0.50% 150.519 us 11.66% FAIL
I8 I64 2^16 1 8.965 us 6.84% 9.179 us 5.87% 0.214 us 2.39% PASS
I8 I64 2^20 1 15.673 us 4.10% 15.930 us 4.18% 0.257 us 1.64% PASS
I8 I64 2^24 1 111.868 us 0.87% 119.288 us 0.84% 7.420 us 6.63% FAIL
I8 I64 2^28 1 1.661 ms 0.50% 1.789 ms 0.50% 128.483 us 7.74% FAIL
I8 I64 2^16 0.544 8.893 us 7.34% 9.142 us 6.28% 0.249 us 2.80% PASS
I8 I64 2^20 0.544 15.170 us 4.08% 15.688 us 4.26% 0.518 us 3.41% PASS
I8 I64 2^24 0.544 105.205 us 0.81% 112.016 us 0.68% 6.810 us 6.47% FAIL
I8 I64 2^28 0.544 1.551 ms 0.50% 1.663 ms 0.50% 111.505 us 7.19% FAIL
I8 I64 2^16 0 8.703 us 8.24% 8.870 us 7.99% 0.167 us 1.92% PASS
I8 I64 2^20 0 14.839 us 4.75% 15.088 us 4.30% 0.249 us 1.68% PASS
I8 I64 2^24 0 95.512 us 0.71% 102.365 us 0.70% 6.853 us 7.17% FAIL
I8 I64 2^28 0 1.366 ms 0.50% 1.485 ms 0.50% 118.661 us 8.68% FAIL
I16 I32 2^16 1 9.028 us 6.83% 9.188 us 5.94% 0.160 us 1.77% PASS
I16 I32 2^20 1 16.606 us 3.59% 16.921 us 4.21% 0.315 us 1.90% PASS
I16 I32 2^24 1 125.902 us 1.15% 133.191 us 1.13% 7.289 us 5.79% FAIL
I16 I32 2^28 1 1.873 ms 0.50% 1.989 ms 0.50% 115.531 us 6.17% FAIL
I16 I32 2^16 0.544 8.950 us 7.57% 9.086 us 7.09% 0.135 us 1.51% PASS
I16 I32 2^20 0.544 16.164 us 3.91% 16.595 us 3.99% 0.431 us 2.67% PASS
I16 I32 2^24 0.544 115.171 us 1.05% 122.809 us 1.01% 7.638 us 6.63% FAIL
I16 I32 2^28 0.544 1.701 ms 0.50% 1.816 ms 0.50% 115.545 us 6.79% FAIL
I16 I32 2^16 0 8.635 us 8.25% 8.814 us 7.41% 0.179 us 2.08% PASS
I16 I32 2^20 0 15.912 us 4.21% 16.145 us 3.69% 0.233 us 1.46% PASS
I16 I32 2^24 0 95.870 us 0.73% 105.086 us 0.71% 9.216 us 9.61% FAIL
I16 I32 2^28 0 1.349 ms 0.50% 1.513 ms 0.50% 163.377 us 12.11% FAIL
I16 I64 2^16 1 9.016 us 7.24% 9.291 us 5.60% 0.275 us 3.05% PASS
I16 I64 2^20 1 16.798 us 3.86% 17.119 us 3.84% 0.321 us 1.91% PASS
I16 I64 2^24 1 128.175 us 0.99% 135.671 us 0.97% 7.496 us 5.85% FAIL
I16 I64 2^28 1 1.904 ms 0.50% 2.019 ms 0.50% 114.364 us 6.01% FAIL
I16 I64 2^16 0.544 8.978 us 6.69% 9.220 us 6.03% 0.243 us 2.70% PASS
I16 I64 2^20 0.544 16.520 us 3.82% 16.980 us 3.91% 0.460 us 2.78% PASS
I16 I64 2^24 0.544 118.118 us 0.88% 125.303 us 0.88% 7.185 us 6.08% FAIL
I16 I64 2^28 0.544 1.741 ms 0.50% 1.852 ms 0.50% 110.498 us 6.35% FAIL
I16 I64 2^16 0 8.777 us 7.58% 8.969 us 6.92% 0.191 us 2.18% PASS
I16 I64 2^20 0 16.226 us 3.87% 16.477 us 4.28% 0.251 us 1.55% PASS
I16 I64 2^24 0 100.467 us 0.66% 107.935 us 0.65% 7.468 us 7.43% FAIL
I16 I64 2^28 0 1.419 ms 0.50% 1.553 ms 0.50% 134.076 us 9.45% FAIL
I32 I32 2^16 1 9.036 us 6.74% 9.384 us 6.72% 0.347 us 3.84% PASS
I32 I32 2^20 1 19.874 us 4.34% 20.109 us 3.50% 0.235 us 1.18% PASS
I32 I32 2^24 1 184.482 us 0.85% 190.320 us 1.21% 5.839 us 3.16% FAIL
I32 I32 2^28 1 2.823 ms 0.62% 2.911 ms 0.65% 88.147 us 3.12% FAIL
I32 I32 2^16 0.544 9.070 us 6.71% 9.457 us 5.97% 0.388 us 4.27% PASS
I32 I32 2^20 0.544 19.686 us 3.11% 20.061 us 3.58% 0.375 us 1.90% PASS
I32 I32 2^24 0.544 155.884 us 1.14% 161.834 us 1.42% 5.950 us 3.82% FAIL
I32 I32 2^28 0.544 2.332 ms 0.50% 2.422 ms 0.50% 90.603 us 3.89% FAIL
I32 I32 2^16 0 8.830 us 7.39% 8.993 us 6.19% 0.163 us 1.84% PASS
I32 I32 2^20 0 18.978 us 3.33% 19.083 us 3.85% 0.105 us 0.55% PASS
I32 I32 2^24 0 113.539 us 1.01% 118.506 us 1.01% 4.967 us 4.37% FAIL
I32 I32 2^28 0 1.580 ms 1.01% 1.662 ms 0.79% 82.477 us 5.22% FAIL
I32 I64 2^16 1 9.162 us 5.73% 9.654 us 6.73% 0.492 us 5.37% PASS
I32 I64 2^20 1 20.099 us 3.84% 20.136 us 3.34% 0.036 us 0.18% PASS
I32 I64 2^24 1 186.217 us 1.00% 192.459 us 1.44% 6.242 us 3.35% FAIL
I32 I64 2^28 1 2.845 ms 0.61% 2.961 ms 0.65% 115.749 us 4.07% FAIL
I32 I64 2^16 0.544 9.205 us 5.84% 9.525 us 6.77% 0.320 us 3.48% PASS
I32 I64 2^20 0.544 20.076 us 3.21% 20.255 us 3.65% 0.179 us 0.89% PASS
I32 I64 2^24 0.544 157.555 us 1.07% 163.415 us 1.32% 5.861 us 3.72% FAIL
I32 I64 2^28 0.544 2.357 ms 0.50% 2.450 ms 0.50% 92.733 us 3.93% FAIL
I32 I64 2^16 0 8.937 us 7.42% 9.027 us 6.74% 0.090 us 1.01% PASS
I32 I64 2^20 0 19.320 us 3.60% 19.467 us 3.22% 0.148 us 0.76% PASS
I32 I64 2^24 0 117.161 us 0.99% 121.330 us 0.94% 4.169 us 3.56% FAIL
I32 I64 2^28 0 1.642 ms 0.89% 1.702 ms 0.77% 60.250 us 3.67% FAIL
I64 I32 2^16 1 10.166 us 6.21% 10.544 us 6.41% 0.379 us 3.73% PASS
I64 I32 2^20 1 29.646 us 2.64% 30.024 us 2.99% 0.378 us 1.27% PASS
I64 I32 2^24 1 349.804 us 0.56% 355.649 us 0.63% 5.846 us 1.67% FAIL
I64 I32 2^28 1 5.472 ms 0.50% 5.563 ms 0.50% 91.410 us 1.67% FAIL
I64 I32 2^16 0.544 10.472 us 5.86% 10.994 us 6.49% 0.522 us 4.98% PASS
I64 I32 2^20 0.544 28.024 us 2.91% 28.268 us 3.08% 0.244 us 0.87% PASS
I64 I32 2^24 0.544 281.239 us 0.62% 286.075 us 0.81% 4.836 us 1.72% FAIL
I64 I32 2^28 0.544 4.339 ms 0.50% 4.410 ms 0.50% 70.634 us 1.63% FAIL
I64 I32 2^16 0 9.773 us 6.62% 10.168 us 6.22% 0.395 us 4.04% PASS
I64 I32 2^20 0 27.180 us 2.79% 27.634 us 2.90% 0.454 us 1.67% PASS
I64 I32 2^24 0 192.678 us 0.88% 197.576 us 0.93% 4.897 us 2.54% FAIL
I64 I32 2^28 0 2.832 ms 0.88% 2.911 ms 0.77% 78.442 us 2.77% FAIL
I64 I64 2^16 1 10.579 us 6.03% 11.077 us 6.48% 0.498 us 4.71% PASS
I64 I64 2^20 1 30.587 us 2.42% 30.687 us 2.49% 0.100 us 0.33% PASS
I64 I64 2^24 1 357.968 us 0.64% 365.810 us 0.79% 7.841 us 2.19% FAIL
I64 I64 2^28 1 5.594 ms 0.50% 5.728 ms 0.50% 134.466 us 2.40% FAIL
I64 I64 2^16 0.544 10.144 us 6.04% 10.556 us 6.55% 0.411 us 4.05% PASS
I64 I64 2^20 0.544 28.717 us 3.23% 28.868 us 3.13% 0.152 us 0.53% PASS
I64 I64 2^24 0.544 291.287 us 0.79% 299.833 us 0.98% 8.547 us 2.93% FAIL
I64 I64 2^28 0.544 4.503 ms 0.50% 4.648 ms 0.50% 145.339 us 3.23% FAIL
I64 I64 2^16 0 10.352 us 5.39% 10.651 us 6.81% 0.299 us 2.89% PASS
I64 I64 2^20 0 28.169 us 2.64% 28.055 us 3.21% -0.114 us -0.40% PASS
I64 I64 2^24 0 205.067 us 0.79% 211.705 us 0.76% 6.638 us 3.24% FAIL
I64 I64 2^28 0 3.046 ms 0.73% 3.153 ms 0.62% 107.727 us 3.54% FAIL
I128 I32 2^16 1 12.566 us 5.21% 13.126 us 5.30% 0.561 us 4.46% PASS
I128 I32 2^20 1 40.083 us 1.70% 40.682 us 1.74% 0.599 us 1.50% PASS
I128 I32 2^24 1 393.489 us 0.54% 417.517 us 0.50% 24.029 us 6.11% FAIL
I128 I32 2^28 1 6.069 ms 0.50% 6.443 ms 0.50% 373.328 us 6.15% FAIL
I128 I32 2^16 0.544 12.594 us 5.43% 13.127 us 5.17% 0.534 us 4.24% PASS
I128 I32 2^20 0.544 40.083 us 1.78% 40.746 us 1.95% 0.663 us 1.65% PASS
I128 I32 2^24 0.544 393.475 us 0.60% 417.545 us 0.50% 24.070 us 6.12% FAIL
I128 I32 2^28 0.544 6.069 ms 0.50% 6.443 ms 0.50% 373.411 us 6.15% FAIL
I128 I32 2^16 0 12.542 us 4.92% 13.108 us 4.75% 0.566 us 4.51% PASS
I128 I32 2^20 0 39.995 us 1.68% 40.677 us 1.85% 0.682 us 1.70% FAIL
I128 I32 2^24 0 393.323 us 0.53% 417.526 us 0.50% 24.203 us 6.15% FAIL
I128 I32 2^28 0 6.069 ms 0.50% 6.443 ms 0.50% 373.458 us 6.15% FAIL
I128 I64 2^16 1 12.001 us 5.42% 12.576 us 5.40% 0.575 us 4.80% PASS
I128 I64 2^20 1 41.167 us 1.60% 41.400 us 1.71% 0.234 us 0.57% PASS
I128 I64 2^24 1 415.244 us 0.50% 435.304 us 0.41% 20.060 us 4.83% FAIL
I128 I64 2^28 1 6.418 ms 0.50% 6.742 ms 0.50% 323.727 us 5.04% FAIL
I128 I64 2^16 0.544 12.055 us 5.35% 12.654 us 5.46% 0.599 us 4.97% PASS
I128 I64 2^20 0.544 41.133 us 1.51% 41.490 us 2.03% 0.357 us 0.87% PASS
I128 I64 2^24 0.544 415.328 us 0.50% 435.348 us 0.40% 20.020 us 4.82% FAIL
I128 I64 2^28 0.544 6.418 ms 0.50% 6.742 ms 0.50% 323.551 us 5.04% FAIL
I128 I64 2^16 0 12.059 us 5.56% 12.653 us 5.36% 0.594 us 4.93% PASS
I128 I64 2^20 0 41.177 us 1.41% 41.572 us 1.96% 0.395 us 0.96% PASS
I128 I64 2^24 0 415.291 us 0.50% 435.331 us 0.42% 20.040 us 4.83% FAIL
I128 I64 2^28 0 6.419 ms 0.50% 6.742 ms 0.50% 323.852 us 5.05% FAIL
F32 I32 2^16 1 9.396 us 6.82% 9.330 us 6.60% -0.065 us -0.70% PASS
F32 I32 2^20 1 19.837 us 3.13% 19.958 us 3.57% 0.121 us 0.61% PASS
F32 I32 2^24 1 184.529 us 0.91% 190.288 us 1.25% 5.758 us 3.12% FAIL
F32 I32 2^28 1 2.960 ms 0.67% 3.020 ms 0.66% 60.203 us 2.03% FAIL
F32 I32 2^16 0.544 8.842 us 7.86% 8.970 us 7.18% 0.128 us 1.45% PASS
F32 I32 2^20 0.544 18.993 us 3.18% 19.265 us 3.67% 0.272 us 1.43% PASS
F32 I32 2^24 0.544 128.972 us 1.08% 133.893 us 1.15% 4.921 us 3.82% FAIL
F32 I32 2^28 0.544 1.858 ms 0.77% 1.926 ms 0.67% 67.216 us 3.62% FAIL
F32 I32 2^16 0 8.802 us 6.88% 8.944 us 6.95% 0.142 us 1.61% PASS
F32 I32 2^20 0 18.928 us 3.83% 18.982 us 3.74% 0.054 us 0.29% PASS
F32 I32 2^24 0 113.272 us 1.13% 118.799 us 1.03% 5.527 us 4.88% FAIL
F32 I32 2^28 0 1.580 ms 1.01% 1.662 ms 0.80% 82.529 us 5.22% FAIL
F32 I64 2^16 1 9.398 us 6.38% 9.377 us 7.18% -0.020 us -0.22% PASS
F32 I64 2^20 1 20.335 us 3.99% 20.434 us 4.50% 0.099 us 0.49% PASS
F32 I64 2^24 1 185.982 us 0.96% 192.968 us 1.47% 6.985 us 3.76% FAIL
F32 I64 2^28 1 2.969 ms 0.67% 3.041 ms 0.67% 71.424 us 2.41% FAIL
F32 I64 2^16 0.544 8.958 us 6.71% 9.156 us 6.20% 0.198 us 2.21% PASS
F32 I64 2^20 0.544 19.403 us 3.40% 19.717 us 3.39% 0.314 us 1.62% PASS
F32 I64 2^24 0.544 132.361 us 1.23% 136.217 us 1.22% 3.856 us 2.91% FAIL
F32 I64 2^28 0.544 1.902 ms 0.69% 1.964 ms 0.64% 61.788 us 3.25% FAIL
F32 I64 2^16 0 9.009 us 6.55% 9.216 us 6.23% 0.207 us 2.29% PASS
F32 I64 2^20 0 19.284 us 3.55% 19.635 us 3.69% 0.351 us 1.82% PASS
F32 I64 2^24 0 117.281 us 0.98% 121.540 us 0.97% 4.259 us 3.63% FAIL
F32 I64 2^28 0 1.645 ms 0.88% 1.705 ms 0.77% 60.682 us 3.69% FAIL
F64 I32 2^16 1 9.970 us 6.71% 10.656 us 6.52% 0.686 us 6.88% FAIL
F64 I32 2^20 1 29.566 us 2.56% 30.029 us 2.98% 0.463 us 1.57% PASS
F64 I32 2^24 1 349.673 us 0.50% 355.488 us 0.65% 5.814 us 1.66% FAIL
F64 I32 2^28 1 5.471 ms 0.50% 5.560 ms 0.50% 89.456 us 1.64% FAIL
F64 I32 2^16 0.544 9.776 us 6.74% 10.225 us 6.68% 0.449 us 4.60% PASS
F64 I32 2^20 0.544 26.970 us 2.33% 27.067 us 3.04% 0.097 us 0.36% PASS
F64 I32 2^24 0.544 224.209 us 0.74% 227.678 us 0.84% 3.468 us 1.55% FAIL
F64 I32 2^28 0.544 3.384 ms 0.50% 3.429 ms 0.50% 44.776 us 1.32% FAIL
F64 I32 2^16 0 9.795 us 6.91% 10.232 us 6.27% 0.437 us 4.46% PASS
F64 I32 2^20 0 27.198 us 2.76% 27.270 us 2.68% 0.072 us 0.26% PASS
F64 I32 2^24 0 192.582 us 0.85% 197.339 us 0.91% 4.757 us 2.47% FAIL
F64 I32 2^28 0 2.833 ms 0.88% 2.907 ms 0.77% 74.390 us 2.63% FAIL
F64 I64 2^16 1 10.453 us 5.95% 10.848 us 6.55% 0.394 us 3.77% PASS
F64 I64 2^20 1 30.257 us 2.34% 30.516 us 2.52% 0.258 us 0.85% PASS
F64 I64 2^24 1 357.008 us 0.58% 365.797 us 0.83% 8.789 us 2.46% FAIL
F64 I64 2^28 1 5.579 ms 0.50% 5.727 ms 0.50% 148.084 us 2.65% FAIL
F64 I64 2^16 0.544 10.167 us 6.22% 10.599 us 6.78% 0.432 us 4.25% PASS
F64 I64 2^20 0.544 27.649 us 2.35% 27.181 us 2.77% -0.468 us -1.69% PASS
F64 I64 2^24 0.544 234.540 us 0.87% 238.254 us 0.81% 3.714 us 1.58% FAIL
F64 I64 2^28 0.544 3.568 ms 0.50% 3.616 ms 0.50% 48.448 us 1.36% FAIL
F64 I64 2^16 0 10.140 us 5.75% 10.494 us 6.41% 0.354 us 3.49% PASS
F64 I64 2^20 0 27.881 us 2.73% 27.380 us 2.66% -0.501 us -1.80% PASS
F64 I64 2^24 0 204.907 us 0.81% 211.228 us 0.77% 6.321 us 3.09% FAIL
F64 I64 2^28 0 3.046 ms 0.75% 3.147 ms 0.62% 101.815 us 3.34% FAIL
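
For reading the table above: the Diff and %Diff columns are consistent with Cmp Time minus Ref Time and with that difference relative to Ref Time. The Status column appears to flag a row as FAIL when the absolute %Diff exceeds the smaller of the two noise figures. That rule is inferred from the rows, not taken from the benchmark tool's documentation, and the function names below are hypothetical. A small sketch of the arithmetic:

```cpp
#include <algorithm>
#include <cmath>

// Relative change of `cmp_time` versus `ref_time`, in percent (positive means slower).
// Both times must be in the same unit.
inline double percent_diff(double ref_time, double cmp_time) {
  return (cmp_time - ref_time) / ref_time * 100.0;
}

// Assumed rule, inferred from the table rows: the change counts as significant
// when its magnitude exceeds the smaller of the two noise percentages.
inline bool exceeds_noise(double pct_diff, double ref_noise_pct, double cmp_noise_pct) {
  return std::fabs(pct_diff) > std::min(ref_noise_pct, cmp_noise_pct);
}
```

For example, the I8/I32/2^24 row at entropy 1 goes from 106.539 us to 115.718 us, which is about 8.62% and well above the roughly 1% noise, so it is flagged. Under this reading, FAIL marks a measurable slowdown, not a correctness failure.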

Select.Flagged - Tesla V100-SXM2-32GB

[0] Tesla V100-SXM2-32GB

T{ct} OffsetT{ct} Elements{io} Entropy Ref Time Ref Noise Cmp Time Cmp Noise Diff %Diff Status
I8 I32 2^16 1 8.968 us 8.86% 9.498 us 7.74% 0.530 us 5.91% PASS
I8 I32 2^20 1 16.185 us 4.20% 17.055 us 4.30% 0.869 us 5.37% FAIL
I8 I32 2^24 1 120.488 us 0.87% 128.596 us 0.76% 8.108 us 6.73% FAIL
I8 I32 2^28 1 1.807 ms 0.50% 1.941 ms 0.50% 134.174 us 7.42% FAIL
I8 I32 2^16 0.544 8.842 us 8.21% 8.991 us 7.44% 0.149 us 1.68% PASS
I8 I32 2^20 0.544 15.875 us 4.22% 16.739 us 3.54% 0.863 us 5.44% FAIL
I8 I32 2^24 0.544 117.490 us 0.97% 126.136 us 0.81% 8.647 us 7.36% FAIL
I8 I32 2^28 0.544 1.734 ms 0.51% 1.874 ms 0.50% 140.140 us 8.08% FAIL
I8 I32 2^16 0 8.594 us 7.53% 8.766 us 8.01% 0.172 us 2.00% PASS
I8 I32 2^20 0 15.336 us 4.57% 16.506 us 3.97% 1.170 us 7.63% FAIL
I8 I32 2^24 0 103.213 us 0.80% 112.879 us 0.71% 9.666 us 9.37% FAIL
I8 I32 2^28 0 1.486 ms 0.10% 1.644 ms 0.11% 157.688 us 10.61% FAIL
I8 I64 2^16 1 8.855 us 7.52% 9.299 us 6.28% 0.444 us 5.02% PASS
I8 I64 2^20 1 16.947 us 4.01% 17.745 us 3.72% 0.798 us 4.71% FAIL
I8 I64 2^24 1 131.561 us 0.67% 140.788 us 0.57% 9.227 us 7.01% FAIL
I8 I64 2^28 1 1.991 ms 0.50% 2.144 ms 0.50% 152.858 us 7.68% FAIL
I8 I64 2^16 0.544 8.832 us 7.83% 9.185 us 6.56% 0.353 us 4.00% PASS
I8 I64 2^20 0.544 16.608 us 4.14% 17.544 us 3.54% 0.936 us 5.64% FAIL
I8 I64 2^24 0.544 128.598 us 0.76% 137.702 us 0.68% 9.105 us 7.08% FAIL
I8 I64 2^28 0.544 1.920 ms 0.50% 2.067 ms 0.50% 147.296 us 7.67% FAIL
I8 I64 2^16 0 8.652 us 8.26% 9.015 us 6.77% 0.363 us 4.20% PASS
I8 I64 2^20 0 15.849 us 4.19% 16.728 us 4.05% 0.879 us 5.54% FAIL
I8 I64 2^24 0 114.232 us 0.71% 123.151 us 0.67% 8.919 us 7.81% FAIL
I8 I64 2^28 0 1.662 ms 0.08% 1.807 ms 0.10% 144.943 us 8.72% FAIL
I16 I32 2^16 1 8.941 us 7.29% 9.210 us 6.12% 0.269 us 3.00% PASS
I16 I32 2^20 1 17.656 us 3.49% 18.537 us 3.15% 0.882 us 5.00% FAIL
I16 I32 2^24 1 137.352 us 0.97% 143.125 us 1.06% 5.773 us 4.20% FAIL
I16 I32 2^28 1 2.053 ms 0.50% 2.142 ms 0.50% 88.655 us 4.32% FAIL
I16 I32 2^16 0.544 8.993 us 7.49% 9.148 us 6.70% 0.155 us 1.72% PASS
I16 I32 2^20 0.544 17.457 us 4.05% 18.242 us 3.54% 0.785 us 4.49% FAIL
I16 I32 2^24 0.544 132.951 us 1.08% 140.627 us 1.09% 7.676 us 5.77% FAIL
I16 I32 2^28 0.544 1.973 ms 0.55% 2.076 ms 0.56% 102.638 us 5.20% FAIL
I16 I32 2^16 0 8.729 us 8.85% 8.896 us 7.41% 0.167 us 1.91% PASS
I16 I32 2^20 0 16.906 us 4.28% 17.573 us 3.49% 0.667 us 3.94% FAIL
I16 I32 2^24 0 105.762 us 0.79% 114.450 us 0.68% 8.688 us 8.21% FAIL
I16 I32 2^28 0 1.477 ms 0.13% 1.619 ms 0.13% 142.273 us 9.63% FAIL
I16 I64 2^16 1 9.233 us 6.72% 9.331 us 5.48% 0.098 us 1.07% PASS
I16 I64 2^20 1 18.815 us 3.65% 19.361 us 3.95% 0.546 us 2.90% PASS
I16 I64 2^24 1 144.780 us 0.91% 159.057 us 0.76% 14.277 us 9.86% FAIL
I16 I64 2^28 1 2.162 ms 0.50% 2.396 ms 0.50% 233.361 us 10.79% FAIL
I16 I64 2^16 0.544 9.072 us 6.99% 9.244 us 5.93% 0.172 us 1.89% PASS
I16 I64 2^20 0.544 18.352 us 4.13% 18.692 us 3.55% 0.339 us 1.85% PASS
I16 I64 2^24 0.544 140.339 us 0.90% 155.306 us 0.77% 14.967 us 10.67% FAIL
I16 I64 2^28 0.544 2.088 ms 0.51% 2.337 ms 0.50% 249.516 us 11.95% FAIL
I16 I64 2^16 0 8.793 us 7.98% 9.104 us 6.38% 0.311 us 3.54% PASS
I16 I64 2^20 0 17.499 us 3.97% 18.087 us 3.87% 0.588 us 3.36% PASS
I16 I64 2^24 0 115.227 us 0.78% 130.556 us 0.62% 15.329 us 13.30% FAIL
I16 I64 2^28 0 1.635 ms 0.10% 1.904 ms 0.09% 268.747 us 16.44% FAIL
I32 I32 2^16 1 9.431 us 6.60% 9.736 us 6.77% 0.305 us 3.24% PASS
I32 I32 2^20 1 21.706 us 3.07% 22.719 us 3.11% 1.013 us 4.66% FAIL
I32 I32 2^24 1 207.507 us 0.82% 213.549 us 0.97% 6.042 us 2.91% FAIL
I32 I32 2^28 1 3.186 ms 0.55% 3.248 ms 0.59% 61.900 us 1.94% FAIL
I32 I32 2^16 0.544 9.473 us 6.93% 9.646 us 6.89% 0.173 us 1.82% PASS
I32 I32 2^20 0.544 21.681 us 3.26% 22.808 us 3.06% 1.127 us 5.20% FAIL
I32 I32 2^24 0.544 184.719 us 1.06% 193.191 us 1.28% 8.471 us 4.59% FAIL
I32 I32 2^28 0.544 2.795 ms 0.50% 2.909 ms 0.50% 113.939 us 4.08% FAIL
I32 I32 2^16 0 9.067 us 7.71% 9.359 us 7.03% 0.292 us 3.22% PASS
I32 I32 2^20 0 20.829 us 3.16% 22.036 us 3.17% 1.207 us 5.80% FAIL
I32 I32 2^24 0 131.303 us 0.83% 135.584 us 0.78% 4.281 us 3.26% FAIL
I32 I32 2^28 0 1.834 ms 0.16% 1.900 ms 0.16% 65.846 us 3.59% FAIL
I32 I64 2^16 1 9.640 us 7.28% 9.596 us 6.65% -0.044 us -0.46% PASS
I32 I64 2^20 1 22.407 us 3.52% 22.311 us 3.40% -0.096 us -0.43% PASS
I32 I64 2^24 1 210.662 us 0.78% 217.703 us 0.99% 7.041 us 3.34% FAIL
I32 I64 2^28 1 3.223 ms 0.55% 3.336 ms 0.55% 113.104 us 3.51% FAIL
I32 I64 2^16 0.544 9.563 us 6.50% 9.429 us 6.55% -0.135 us -1.41% PASS
I32 I64 2^20 0.544 22.181 us 3.56% 22.420 us 3.40% 0.239 us 1.08% PASS
I32 I64 2^24 0.544 190.016 us 1.00% 200.693 us 1.15% 10.677 us 5.62% FAIL
I32 I64 2^28 0.544 2.873 ms 0.50% 3.044 ms 0.50% 170.261 us 5.93% FAIL
I32 I64 2^16 0 9.243 us 6.79% 9.278 us 5.97% 0.035 us 0.38% PASS
I32 I64 2^20 0 21.224 us 3.60% 21.764 us 3.47% 0.539 us 2.54% PASS
I32 I64 2^24 0 137.559 us 0.78% 146.899 us 0.65% 9.340 us 6.79% FAIL
I32 I64 2^28 0 1.944 ms 0.16% 2.121 ms 0.12% 177.032 us 9.11% FAIL
I64 I32 2^16 1 10.504 us 6.91% 10.514 us 6.34% 0.010 us 0.09% PASS
I64 I32 2^20 1 31.662 us 2.47% 32.551 us 2.32% 0.889 us 2.81% FAIL
I64 I32 2^24 1 378.684 us 0.56% 387.398 us 0.75% 8.713 us 2.30% FAIL
I64 I32 2^28 1 5.917 ms 0.50% 6.064 ms 0.50% 147.519 us 2.49% FAIL
I64 I32 2^16 0.544 10.212 us 6.50% 10.500 us 6.41% 0.288 us 2.82% PASS
I64 I32 2^20 0.544 29.567 us 2.51% 30.684 us 2.50% 1.116 us 3.78% FAIL
I64 I32 2^24 0.544 317.750 us 0.73% 326.572 us 0.91% 8.822 us 2.78% FAIL
I64 I32 2^28 0.544 4.910 ms 0.50% 5.055 ms 0.50% 144.807 us 2.95% FAIL
I64 I32 2^16 0 9.994 us 6.83% 10.303 us 5.92% 0.309 us 3.09% PASS
I64 I32 2^20 0 28.928 us 2.71% 29.853 us 2.70% 0.925 us 3.20% FAIL
I64 I32 2^24 0 217.336 us 0.52% 224.209 us 0.58% 6.873 us 3.16% FAIL
I64 I32 2^28 0 3.230 ms 0.13% 3.333 ms 0.13% 102.461 us 3.17% FAIL
I64 I64 2^16 1 9.930 us 7.08% 10.803 us 6.51% 0.873 us 8.79% FAIL
I64 I64 2^20 1 31.342 us 2.54% 32.495 us 2.50% 1.154 us 3.68% FAIL
I64 I64 2^24 1 379.375 us 0.57% 390.142 us 0.84% 10.767 us 2.84% FAIL
I64 I64 2^28 1 5.916 ms 0.50% 6.095 ms 0.50% 178.760 us 3.02% FAIL
I64 I64 2^16 0.544 9.786 us 7.31% 10.681 us 6.44% 0.895 us 9.15% FAIL
I64 I64 2^20 0.544 29.703 us 3.04% 30.512 us 2.69% 0.809 us 2.73% FAIL
I64 I64 2^24 0.544 317.789 us 0.70% 327.787 us 0.85% 9.998 us 3.15% FAIL
I64 I64 2^28 0.544 4.909 ms 0.50% 5.076 ms 0.50% 167.527 us 3.41% FAIL
I64 I64 2^16 0 9.740 us 6.82% 10.438 us 6.11% 0.699 us 7.17% FAIL
I64 I64 2^20 0 28.939 us 2.48% 29.784 us 2.32% 0.846 us 2.92% FAIL
I64 I64 2^24 0 221.492 us 0.54% 228.875 us 0.49% 7.383 us 3.33% FAIL
I64 I64 2^28 0 3.309 ms 0.12% 3.411 ms 0.11% 102.915 us 3.11% FAIL
I128 I32 2^16 1 12.917 us 5.35% 13.548 us 4.98% 0.632 us 4.89% PASS
I128 I32 2^20 1 54.920 us 1.48% 55.072 us 1.85% 0.152 us 0.28% PASS
I128 I32 2^24 1 739.042 us 0.50% 751.141 us 0.57% 12.099 us 1.64% FAIL
I128 I32 2^28 1 11.707 ms 0.50% 11.931 ms 0.50% 223.542 us 1.91% FAIL
I128 I32 2^16 0.544 12.877 us 5.66% 13.334 us 4.94% 0.456 us 3.54% PASS
I128 I32 2^20 0.544 47.881 us 2.01% 48.755 us 2.00% 0.874 us 1.82% PASS
I128 I32 2^24 0.544 607.521 us 0.68% 619.392 us 0.75% 11.871 us 1.95% FAIL
I128 I32 2^28 0.544 9.573 ms 0.50% 9.774 ms 0.50% 201.799 us 2.11% FAIL
I128 I32 2^16 0 12.708 us 5.33% 13.141 us 5.61% 0.432 us 3.40% PASS
I128 I32 2^20 0 41.695 us 1.88% 42.299 us 1.76% 0.604 us 1.45% PASS
I128 I32 2^24 0 413.564 us 0.38% 434.250 us 0.31% 20.686 us 5.00% FAIL
I128 I32 2^28 0 6.370 ms 0.08% 6.705 ms 0.08% 335.925 us 5.27% FAIL
I128 I64 2^16 1 12.474 us 5.29% 13.167 us 4.76% 0.693 us 5.56% FAIL
I128 I64 2^20 1 54.632 us 1.64% 55.377 us 1.84% 0.745 us 1.36% PASS
I128 I64 2^24 1 744.716 us 0.50% 759.217 us 0.59% 14.501 us 1.95% FAIL
I128 I64 2^28 1 11.789 ms 0.50% 12.048 ms 0.50% 259.238 us 2.20% FAIL
I128 I64 2^16 0.544 12.464 us 5.05% 12.995 us 5.30% 0.531 us 4.26% PASS
I128 I64 2^20 0.544 48.528 us 1.63% 49.392 us 1.98% 0.864 us 1.78% FAIL
I128 I64 2^24 0.544 616.587 us 0.66% 629.460 us 0.68% 12.874 us 2.09% FAIL
I128 I64 2^28 0.544 9.712 ms 0.50% 9.946 ms 0.50% 234.648 us 2.42% FAIL
I128 I64 2^16 0 12.279 us 5.05% 12.703 us 6.12% 0.424 us 3.45% PASS
I128 I64 2^20 0 42.418 us 1.70% 43.146 us 1.64% 0.727 us 1.71% FAIL
I128 I64 2^24 0 431.022 us 0.31% 446.924 us 0.26% 15.902 us 3.69% FAIL
I128 I64 2^28 0 6.662 ms 0.07% 6.920 ms 0.06% 257.548 us 3.87% FAIL
F32 I32 2^16 1 9.353 us 7.35% 9.680 us 6.83% 0.327 us 3.49% PASS
F32 I32 2^20 1 21.654 us 3.25% 22.933 us 3.26% 1.279 us 5.91% FAIL
F32 I32 2^24 1 207.630 us 0.82% 213.642 us 0.95% 6.012 us 2.90% FAIL
F32 I32 2^28 1 3.185 ms 0.56% 3.247 ms 0.59% 61.836 us 1.94% FAIL
F32 I32 2^16 0.544 9.468 us 6.79% 9.506 us 6.40% 0.039 us 0.41% PASS
F32 I32 2^20 0.544 21.738 us 3.19% 22.732 us 3.15% 0.994 us 4.57% FAIL
F32 I32 2^24 0.544 184.812 us 1.04% 193.011 us 1.27% 8.198 us 4.44% FAIL
F32 I32 2^28 0.544 2.796 ms 0.50% 2.909 ms 0.50% 113.690 us 4.07% FAIL
F32 I32 2^16 0 9.152 us 6.81% 9.270 us 6.65% 0.118 us 1.29% PASS
F32 I32 2^20 0 20.757 us 3.06% 21.913 us 3.28% 1.156 us 5.57% FAIL
F32 I32 2^24 0 131.182 us 0.82% 135.503 us 0.82% 4.321 us 3.29% FAIL
F32 I32 2^28 0 1.834 ms 0.18% 1.901 ms 0.18% 66.163 us 3.61% FAIL
F32 I64 2^16 1 9.581 us 7.36% 9.616 us 6.51% 0.036 us 0.37% PASS
F32 I64 2^20 1 22.103 us 3.54% 22.647 us 3.31% 0.543 us 2.46% PASS
F32 I64 2^24 1 210.402 us 0.79% 217.726 us 1.00% 7.324 us 3.48% FAIL
F32 I64 2^28 1 3.222 ms 0.55% 3.336 ms 0.55% 113.503 us 3.52% FAIL
F32 I64 2^16 0.544 9.603 us 7.17% 9.467 us 6.80% -0.136 us -1.42% PASS
F32 I64 2^20 0.544 22.201 us 3.71% 22.714 us 4.59% 0.513 us 2.31% PASS
F32 I64 2^24 0.544 189.620 us 1.02% 199.942 us 1.12% 10.322 us 5.44% FAIL
F32 I64 2^28 0.544 2.873 ms 0.50% 3.043 ms 0.50% 170.245 us 5.93% FAIL
F32 I64 2^16 0 9.292 us 6.90% 9.424 us 6.41% 0.132 us 1.42% PASS
F32 I64 2^20 0 21.237 us 3.50% 21.904 us 3.19% 0.668 us 3.14% PASS
F32 I64 2^24 0 137.274 us 0.76% 147.125 us 0.70% 9.851 us 7.18% FAIL
F32 I64 2^28 0 1.944 ms 0.14% 2.122 ms 0.12% 177.713 us 9.14% FAIL
F64 I32 2^16 1 10.381 us 6.37% 10.529 us 6.75% 0.147 us 1.42% PASS
F64 I32 2^20 1 31.717 us 2.34% 32.602 us 2.43% 0.885 us 2.79% FAIL
F64 I32 2^24 1 378.763 us 0.57% 387.486 us 0.71% 8.723 us 2.30% FAIL
F64 I32 2^28 1 5.917 ms 0.50% 6.065 ms 0.50% 147.912 us 2.50% FAIL
F64 I32 2^16 0.544 10.542 us 6.62% 10.450 us 5.86% -0.092 us -0.87% PASS
F64 I32 2^20 0.544 29.626 us 2.52% 30.642 us 2.73% 1.017 us 3.43% FAIL
F64 I32 2^24 0.544 317.616 us 0.72% 326.596 us 0.87% 8.980 us 2.83% FAIL
F64 I32 2^28 0.544 4.911 ms 0.50% 5.054 ms 0.50% 143.797 us 2.93% FAIL
F64 I32 2^16 0 10.042 us 6.41% 10.331 us 6.53% 0.289 us 2.88% PASS
F64 I32 2^20 0 28.956 us 2.42% 29.831 us 2.63% 0.876 us 3.02% FAIL
F64 I32 2^24 0 217.447 us 0.52% 224.264 us 0.58% 6.818 us 3.14% FAIL
F64 I32 2^28 0 3.231 ms 0.12% 3.332 ms 0.12% 101.283 us 3.13% FAIL
F64 I64 2^16 1 10.119 us 6.49% 10.626 us 5.93% 0.507 us 5.01% PASS
F64 I64 2^20 1 31.479 us 2.45% 32.458 us 2.35% 0.979 us 3.11% FAIL
F64 I64 2^24 1 379.315 us 0.56% 390.081 us 0.81% 10.766 us 2.84% FAIL
F64 I64 2^28 1 5.916 ms 0.50% 6.095 ms 0.50% 178.840 us 3.02% FAIL
F64 I64 2^16 0.544 10.253 us 6.61% 10.593 us 5.90% 0.340 us 3.31% PASS
F64 I64 2^20 0.544 29.804 us 2.84% 30.389 us 2.54% 0.584 us 1.96% PASS
F64 I64 2^24 0.544 317.599 us 0.67% 327.729 us 0.84% 10.130 us 3.19% FAIL
F64 I64 2^28 0.544 4.909 ms 0.50% 5.076 ms 0.50% 166.845 us 3.40% FAIL
F64 I64 2^16 0 9.717 us 6.68% 10.360 us 5.73% 0.643 us 6.62% FAIL
F64 I64 2^20 0 28.901 us 2.46% 29.741 us 2.29% 0.840 us 2.91% FAIL
F64 I64 2^24 0 221.525 us 0.49% 228.770 us 0.46% 7.244 us 3.27% FAIL
F64 I64 2^28 0 3.309 ms 0.12% 3.411 ms 0.13% 102.454 us 3.10% FAIL

elstehle (Collaborator, Author) commented

Comparing performance before and after the change, compiled for the targeted architecture.

Select.If - NVIDIA A100-PCIE-40GB

[0] NVIDIA A100-PCIE-40GB

T{ct} OffsetT{ct} Elements{io} Entropy Ref Time Ref Noise Cmp Time Cmp Noise Diff %Diff Status
I8 I32 2^16 1 14.003 us 10.59% 14.326 us 10.51% 0.323 us 2.30% PASS
I8 I32 2^20 1 16.561 us 2.95% 16.863 us 3.24% 0.303 us 1.83% PASS
I8 I32 2^24 1 71.453 us 1.01% 73.136 us 1.07% 1.683 us 2.36% FAIL
I8 I32 2^28 1 915.396 us 0.11% 924.950 us 0.33% 9.554 us 1.04% FAIL
I8 I32 2^16 0.544 13.733 us 3.97% 13.789 us 4.07% 0.056 us 0.41% PASS
I8 I32 2^20 0.544 15.951 us 3.67% 16.185 us 3.34% 0.234 us 1.47% PASS
I8 I32 2^24 0.544 68.066 us 1.06% 69.339 us 0.89% 1.273 us 1.87% FAIL
I8 I32 2^28 0.544 848.127 us 0.45% 893.979 us 0.30% 45.852 us 5.41% FAIL
I8 I32 2^16 0 13.194 us 3.54% 13.290 us 3.78% 0.096 us 0.73% PASS
I8 I32 2^20 0 14.543 us 3.27% 14.651 us 3.61% 0.107 us 0.74% PASS
I8 I32 2^24 0 56.416 us 1.29% 58.472 us 1.31% 2.056 us 3.64% FAIL
I8 I32 2^28 0 658.149 us 0.13% 665.887 us 0.50% 7.738 us 1.18% FAIL
I8 I64 2^16 1 10.744 us 5.11% 10.954 us 5.11% 0.210 us 1.96% PASS
I8 I64 2^20 1 16.208 us 3.11% 16.455 us 2.92% 0.247 us 1.52% PASS
I8 I64 2^24 1 98.434 us 0.70% 105.275 us 0.77% 6.841 us 6.95% FAIL
I8 I64 2^28 1 1.437 ms 0.26% 1.553 ms 0.24% 116.434 us 8.10% FAIL
I8 I64 2^16 0.544 10.696 us 5.31% 10.956 us 4.86% 0.260 us 2.43% PASS
I8 I64 2^20 0.544 16.002 us 3.55% 16.360 us 3.01% 0.358 us 2.24% PASS
I8 I64 2^24 0.544 95.497 us 0.76% 101.129 us 0.87% 5.633 us 5.90% FAIL
I8 I64 2^28 0.544 1.385 ms 0.26% 1.490 ms 0.27% 104.969 us 7.58% FAIL
I8 I64 2^16 0 10.196 us 4.57% 10.374 us 4.24% 0.178 us 1.75% PASS
I8 I64 2^20 0 15.608 us 3.27% 15.804 us 3.70% 0.196 us 1.26% PASS
I8 I64 2^24 0 91.231 us 0.71% 96.614 us 0.81% 5.383 us 5.90% FAIL
I8 I64 2^28 0 1.281 ms 0.50% 1.393 ms 0.35% 112.541 us 8.79% FAIL
I16 I32 2^16 1 10.983 us 4.82% 11.045 us 4.62% 0.062 us 0.56% PASS
I16 I32 2^20 1 18.496 us 2.93% 19.112 us 2.90% 0.616 us 3.33% FAIL
I16 I32 2^24 1 111.354 us 0.54% 112.868 us 0.60% 1.515 us 1.36% FAIL
I16 I32 2^28 1 1.582 ms 0.07% 1.593 ms 0.11% 11.225 us 0.71% FAIL
I16 I32 2^16 0.544 11.523 us 4.57% 11.597 us 4.42% 0.074 us 0.64% PASS
I16 I32 2^20 0.544 18.395 us 2.89% 18.940 us 2.94% 0.545 us 2.96% FAIL
I16 I32 2^24 0.544 106.319 us 0.81% 109.550 us 0.81% 3.232 us 3.04% FAIL
I16 I32 2^28 0.544 1.487 ms 0.36% 1.560 ms 0.16% 73.625 us 4.95% FAIL
I16 I32 2^16 0 11.019 us 4.54% 11.083 us 4.61% 0.064 us 0.58% PASS
I16 I32 2^20 0 16.909 us 3.43% 17.193 us 3.07% 0.284 us 1.68% PASS
I16 I32 2^24 0 90.747 us 0.63% 91.090 us 0.53% 0.342 us 0.38% PASS
I16 I32 2^28 0 1.255 ms 0.28% 1.275 ms 0.07% 19.550 us 1.56% FAIL
I16 I64 2^16 1 10.767 us 5.33% 11.021 us 4.62% 0.254 us 2.36% PASS
I16 I64 2^20 1 17.485 us 2.98% 17.795 us 3.11% 0.310 us 1.77% PASS
I16 I64 2^24 1 106.389 us 0.68% 113.040 us 1.00% 6.651 us 6.25% FAIL
I16 I64 2^28 1 1.544 ms 0.27% 1.653 ms 0.25% 108.572 us 7.03% FAIL
I16 I64 2^16 0.544 10.822 us 5.14% 11.126 us 4.53% 0.305 us 2.81% PASS
I16 I64 2^20 0.544 17.368 us 2.64% 17.916 us 3.00% 0.547 us 3.15% FAIL
I16 I64 2^24 0.544 103.990 us 0.87% 109.129 us 0.94% 5.138 us 4.94% FAIL
I16 I64 2^28 0.544 1.491 ms 0.30% 1.592 ms 0.25% 101.094 us 6.78% FAIL
I16 I64 2^16 0 10.816 us 5.26% 10.876 us 5.29% 0.060 us 0.55% PASS
I16 I64 2^20 0 16.761 us 3.21% 17.238 us 2.90% 0.477 us 2.85% PASS
I16 I64 2^24 0 98.174 us 0.79% 102.736 us 0.99% 4.563 us 4.65% FAIL
I16 I64 2^28 0 1.356 ms 0.50% 1.453 ms 0.35% 96.181 us 7.09% FAIL
I32 I32 2^16 1 10.802 us 5.53% 11.041 us 4.55% 0.240 us 2.22% PASS
I32 I32 2^20 1 17.680 us 3.19% 17.937 us 3.18% 0.257 us 1.46% PASS
I32 I32 2^24 1 117.972 us 2.52% 122.942 us 4.55% 4.970 us 4.21% FAIL
I32 I32 2^28 1 1.686 ms 0.75% 1.758 ms 1.41% 72.262 us 4.29% FAIL
I32 I32 2^16 0.544 10.497 us 4.70% 10.917 us 5.08% 0.420 us 4.00% PASS
I32 I32 2^20 0.544 17.420 us 3.16% 17.763 us 3.23% 0.343 us 1.97% PASS
I32 I32 2^24 0.544 100.675 us 2.20% 106.385 us 2.20% 5.710 us 5.67% FAIL
I32 I32 2^28 0.544 1.348 ms 0.69% 1.401 ms 0.96% 52.663 us 3.91% FAIL
I32 I32 2^16 0 9.993 us 5.48% 10.154 us 4.64% 0.161 us 1.61% PASS
I32 I32 2^20 0 17.586 us 19.34% 17.253 us 8.35% -0.333 us -1.89% PASS
I32 I32 2^24 0 82.159 us 2.07% 84.814 us 2.96% 2.655 us 3.23% FAIL
I32 I32 2^28 0 932.078 us 0.71% 988.164 us 0.80% 56.087 us 6.02% FAIL
I32 I64 2^16 1 10.954 us 5.06% 11.092 us 4.54% 0.138 us 1.26% PASS
I32 I64 2^20 1 19.086 us 2.82% 19.605 us 2.61% 0.519 us 2.72% FAIL
I32 I64 2^24 1 126.630 us 0.86% 130.365 us 0.93% 3.734 us 2.95% FAIL
I32 I64 2^28 1 1.870 ms 0.35% 1.937 ms 0.32% 66.480 us 3.55% FAIL
I32 I64 2^16 0.544 10.859 us 5.41% 11.231 us 4.18% 0.372 us 3.42% PASS
I32 I64 2^20 0.544 18.805 us 2.88% 19.237 us 2.65% 0.432 us 2.30% PASS
I32 I64 2^24 0.544 117.265 us 0.86% 122.339 us 1.00% 5.075 us 4.33% FAIL
I32 I64 2^28 0.544 1.710 ms 0.48% 1.786 ms 0.41% 76.582 us 4.48% FAIL
I32 I64 2^16 0 10.346 us 4.31% 10.599 us 5.19% 0.253 us 2.45% PASS
I32 I64 2^20 0 18.268 us 2.86% 18.669 us 2.57% 0.401 us 2.19% PASS
I32 I64 2^24 0 105.070 us 0.86% 109.843 us 0.73% 4.773 us 4.54% FAIL
I32 I64 2^28 0 1.433 ms 0.50% 1.526 ms 0.43% 92.698 us 6.47% FAIL
I64 I32 2^16 1 10.832 us 5.21% 11.248 us 4.48% 0.416 us 3.84% PASS
I64 I32 2^20 1 23.997 us 2.55% 23.766 us 2.48% -0.231 us -0.96% PASS
I64 I32 2^24 1 210.850 us 0.77% 214.249 us 1.11% 3.398 us 1.61% FAIL
I64 I32 2^28 1 3.185 ms 0.50% 3.230 ms 0.50% 45.208 us 1.42% FAIL
I64 I32 2^16 0.544 11.070 us 4.94% 11.088 us 4.56% 0.018 us 0.17% PASS
I64 I32 2^20 0.544 23.345 us 2.31% 23.337 us 3.05% -0.007 us -0.03% PASS
I64 I32 2^24 0.544 174.978 us 0.85% 177.655 us 1.12% 2.676 us 1.53% FAIL
I64 I32 2^28 0.544 2.532 ms 0.50% 2.573 ms 0.50% 40.790 us 1.61% FAIL
I64 I32 2^16 0 10.724 us 5.47% 10.710 us 5.32% -0.013 us -0.12% PASS
I64 I32 2^20 0 22.017 us 2.55% 21.899 us 2.69% -0.118 us -0.53% PASS
I64 I32 2^24 0 126.339 us 1.10% 128.704 us 1.49% 2.365 us 1.87% FAIL
I64 I32 2^28 0 1.612 ms 0.63% 1.661 ms 0.79% 48.768 us 3.03% FAIL
I64 I64 2^16 1 11.439 us 4.09% 11.624 us 4.82% 0.185 us 1.62% PASS
I64 I64 2^20 1 26.201 us 2.36% 26.918 us 2.21% 0.718 us 2.74% FAIL
I64 I64 2^24 1 232.302 us 0.60% 238.925 us 0.67% 6.623 us 2.85% FAIL
I64 I64 2^28 1 3.559 ms 0.45% 3.664 ms 0.32% 104.862 us 2.95% FAIL
I64 I64 2^16 0.544 11.441 us 4.39% 11.572 us 4.28% 0.131 us 1.14% PASS
I64 I64 2^20 0.544 25.943 us 2.04% 26.668 us 2.03% 0.725 us 2.79% FAIL
I64 I64 2^24 0.544 207.195 us 0.56% 216.482 us 0.63% 9.287 us 4.48% FAIL
I64 I64 2^28 0.544 3.138 ms 0.25% 3.277 ms 0.23% 139.576 us 4.45% FAIL
I64 I64 2^16 0 11.340 us 4.31% 11.459 us 4.49% 0.119 us 1.05% PASS
I64 I64 2^20 0 25.170 us 2.27% 25.708 us 2.26% 0.539 us 2.14% PASS
I64 I64 2^24 0 181.791 us 0.53% 191.160 us 0.62% 9.369 us 5.15% FAIL
I64 I64 2^28 0 2.653 ms 0.50% 2.827 ms 0.43% 173.655 us 6.54% FAIL
I128 I32 2^16 1 12.977 us 4.24% 12.915 us 4.32% -0.062 us -0.47% PASS
I128 I32 2^20 1 40.951 us 1.84% 42.435 us 2.38% 1.484 us 3.62% FAIL
I128 I32 2^24 1 427.279 us 2.77% 440.225 us 3.02% 12.947 us 3.03% FAIL
I128 I32 2^28 1 6.582 ms 0.80% 6.775 ms 0.80% 192.812 us 2.93% FAIL
I128 I32 2^16 0.544 12.735 us 4.56% 12.525 us 4.26% -0.210 us -1.65% PASS
I128 I32 2^20 0.544 36.219 us 2.55% 36.966 us 2.40% 0.747 us 2.06% PASS
I128 I32 2^24 0.544 335.504 us 1.05% 345.505 us 2.20% 10.001 us 2.98% FAIL
I128 I32 2^28 0.544 5.081 ms 0.50% 5.237 ms 0.67% 155.983 us 3.07% FAIL
I128 I32 2^16 0 11.941 us 4.68% 11.964 us 4.92% 0.023 us 0.19% PASS
I128 I32 2^20 0 32.979 us 2.32% 33.000 us 2.12% 0.021 us 0.06% PASS
I128 I32 2^24 0 225.813 us 0.68% 234.537 us 0.84% 8.724 us 3.86% FAIL
I128 I32 2^28 0 3.175 ms 0.50% 3.320 ms 0.50% 144.796 us 4.56% FAIL
I128 I64 2^16 1 13.888 us 4.96% 13.856 us 4.23% -0.032 us -0.23% PASS
I128 I64 2^20 1 43.101 us 1.81% 44.190 us 1.91% 1.089 us 2.53% FAIL
I128 I64 2^24 1 485.846 us 0.49% 501.259 us 0.42% 15.413 us 3.17% FAIL
I128 I64 2^28 1 7.619 ms 0.39% 7.865 ms 0.31% 245.257 us 3.22% FAIL
I128 I64 2^16 0.544 13.515 us 3.62% 13.434 us 3.87% -0.080 us -0.59% PASS
I128 I64 2^20 0.544 41.715 us 1.83% 42.390 us 1.89% 0.675 us 1.62% PASS
I128 I64 2^24 0.544 447.034 us 0.42% 464.824 us 0.38% 17.790 us 3.98% FAIL
I128 I64 2^28 0.544 6.960 ms 0.25% 7.252 ms 0.20% 291.424 us 4.19% FAIL
I128 I64 2^16 0 15.888 us 34.21% 13.708 us 4.45% -2.180 us -13.72% FAIL
I128 I64 2^20 0 39.887 us 2.15% 40.225 us 1.91% 0.339 us 0.85% PASS
I128 I64 2^24 0 397.473 us 0.36% 412.908 us 0.46% 15.435 us 3.88% FAIL
I128 I64 2^28 0 6.129 ms 0.49% 6.385 ms 0.35% 256.153 us 4.18% FAIL
F32 I32 2^16 1 11.051 us 4.55% 10.888 us 5.42% -0.163 us -1.47% PASS
F32 I32 2^20 1 17.516 us 2.99% 17.733 us 3.19% 0.217 us 1.24% PASS
F32 I32 2^24 1 117.833 us 2.43% 122.766 us 4.77% 4.933 us 4.19% FAIL
F32 I32 2^28 1 1.697 ms 0.92% 1.765 ms 1.31% 67.986 us 4.01% FAIL
F32 I32 2^16 0.544 10.106 us 5.13% 10.284 us 4.26% 0.179 us 1.77% PASS
F32 I32 2^20 0.544 17.096 us 4.58% 17.642 us 3.90% 0.546 us 3.20% PASS
F32 I32 2^24 0.544 85.431 us 2.10% 88.044 us 2.89% 2.613 us 3.06% FAIL
F32 I32 2^28 0.544 1.015 ms 0.68% 1.064 ms 0.55% 49.253 us 4.85% FAIL
F32 I32 2^16 0 9.894 us 5.61% 10.132 us 5.02% 0.238 us 2.40% PASS
F32 I32 2^20 0 18.221 us 23.24% 17.400 us 9.32% -0.821 us -4.51% PASS
F32 I32 2^24 0 82.215 us 2.24% 84.631 us 2.86% 2.416 us 2.94% FAIL
F32 I32 2^28 0 930.426 us 0.55% 988.323 us 0.81% 57.897 us 6.22% FAIL
F32 I64 2^16 1 10.979 us 4.83% 11.289 us 3.91% 0.310 us 2.82% PASS
F32 I64 2^20 1 19.126 us 2.76% 19.535 us 2.43% 0.409 us 2.14% PASS
F32 I64 2^24 1 126.971 us 0.89% 130.462 us 0.96% 3.491 us 2.75% FAIL
F32 I64 2^28 1 1.875 ms 0.24% 1.937 ms 0.50% 62.148 us 3.31% FAIL
F32 I64 2^16 0.544 10.757 us 5.42% 11.033 us 4.79% 0.275 us 2.56% PASS
F32 I64 2^20 0.544 18.527 us 2.79% 18.861 us 3.11% 0.334 us 1.80% PASS
F32 I64 2^24 0.544 108.536 us 0.84% 112.933 us 1.00% 4.398 us 4.05% FAIL
F32 I64 2^28 0.544 1.517 ms 0.43% 1.597 ms 0.40% 80.000 us 5.27% FAIL
F32 I64 2^16 0 10.346 us 4.28% 10.902 us 5.20% 0.556 us 5.37% FAIL
F32 I64 2^20 0 18.266 us 2.92% 18.693 us 2.87% 0.427 us 2.34% PASS
F32 I64 2^24 0 105.444 us 0.84% 109.709 us 0.78% 4.266 us 4.05% FAIL
F32 I64 2^28 0 1.445 ms 0.50% 1.529 ms 0.40% 83.864 us 5.81% FAIL
F64 I32 2^16 1 10.800 us 5.07% 11.403 us 4.30% 0.604 us 5.59% FAIL
F64 I32 2^20 1 23.925 us 2.55% 23.718 us 2.59% -0.207 us -0.87% PASS
F64 I32 2^24 1 210.758 us 0.75% 213.802 us 1.07% 3.044 us 1.44% FAIL
F64 I32 2^28 1 3.185 ms 0.50% 3.230 ms 0.50% 44.668 us 1.40% FAIL
F64 I32 2^16 0.544 10.768 us 5.53% 10.646 us 5.35% -0.122 us -1.13% PASS
F64 I32 2^20 0.544 22.167 us 2.85% 22.324 us 2.89% 0.157 us 0.71% PASS
F64 I32 2^24 0.544 137.520 us 1.21% 139.555 us 1.46% 2.035 us 1.48% FAIL
F64 I32 2^28 0.544 1.846 ms 0.50% 1.892 ms 0.63% 45.714 us 2.48% FAIL
F64 I32 2^16 0 10.564 us 5.11% 10.446 us 4.81% -0.118 us -1.11% PASS
F64 I32 2^20 0 21.976 us 2.61% 21.674 us 2.55% -0.301 us -1.37% PASS
F64 I32 2^24 0 126.097 us 1.14% 127.951 us 1.49% 1.854 us 1.47% FAIL
F64 I32 2^28 0 1.607 ms 0.92% 1.656 ms 0.85% 48.949 us 3.05% FAIL
F64 I64 2^16 1 11.543 us 4.54% 11.689 us 4.87% 0.146 us 1.27% PASS
F64 I64 2^20 1 26.137 us 2.32% 26.968 us 2.15% 0.830 us 3.18% FAIL
F64 I64 2^24 1 231.907 us 0.62% 238.530 us 0.70% 6.624 us 2.86% FAIL
F64 I64 2^28 1 3.567 ms 0.40% 3.655 ms 0.25% 88.042 us 2.47% FAIL
F64 I64 2^16 0.544 11.250 us 4.32% 11.386 us 4.13% 0.136 us 1.21% PASS
F64 I64 2^20 0.544 25.499 us 2.03% 26.665 us 2.31% 1.166 us 4.57% FAIL
F64 I64 2^24 0.544 189.150 us 0.58% 197.122 us 0.64% 7.972 us 4.21% FAIL
F64 I64 2^28 0.544 2.796 ms 0.50% 2.934 ms 0.34% 137.801 us 4.93% FAIL
F64 I64 2^16 0 11.405 us 4.47% 11.638 us 4.58% 0.233 us 2.04% PASS
F64 I64 2^20 0 25.005 us 2.24% 25.606 us 2.61% 0.602 us 2.41% FAIL
F64 I64 2^24 0 180.254 us 0.57% 189.613 us 0.62% 9.359 us 5.19% FAIL
F64 I64 2^28 0 2.629 ms 0.50% 2.802 ms 0.42% 172.879 us 6.58% FAIL

Select.Flagged - NVIDIA A100-PCIE-40GB

[0] NVIDIA A100-PCIE-40GB

T{ct} OffsetT{ct} Elements{io} Entropy Ref Time Ref Noise Cmp Time Cmp Noise Diff %Diff Status
I8 I32 2^16 1 10.550 us 10.37% 11.009 us 16.12% 0.459 us 4.35% PASS
I8 I32 2^20 1 15.326 us 3.35% 15.887 us 3.90% 0.561 us 3.66% FAIL
I8 I32 2^24 1 77.358 us 3.26% 82.103 us 4.19% 4.745 us 6.13% FAIL
I8 I32 2^28 1 1.043 ms 0.50% 1.118 ms 0.73% 74.777 us 7.17% FAIL
I8 I32 2^16 0.544 10.788 us 5.26% 10.942 us 4.93% 0.154 us 1.43% PASS
I8 I32 2^20 0.544 15.230 us 3.03% 15.646 us 3.30% 0.416 us 2.73% PASS
I8 I32 2^24 0.544 76.144 us 2.11% 80.767 us 2.21% 4.623 us 6.07% FAIL
I8 I32 2^28 0.544 1.017 ms 0.55% 1.094 ms 0.72% 77.114 us 7.58% FAIL
I8 I32 2^16 0 10.428 us 4.59% 10.554 us 4.84% 0.127 us 1.21% PASS
I8 I32 2^20 0 14.967 us 5.05% 15.327 us 3.82% 0.360 us 2.41% PASS
I8 I32 2^24 0 69.982 us 2.12% 73.301 us 3.19% 3.319 us 4.74% FAIL
I8 I32 2^28 0 866.153 us 0.21% 918.086 us 0.27% 51.933 us 6.00% FAIL
I8 I64 2^16 1 10.908 us 4.97% 11.144 us 4.16% 0.236 us 2.16% PASS
I8 I64 2^20 1 17.424 us 2.91% 17.918 us 3.17% 0.494 us 2.84% PASS
I8 I64 2^24 1 107.996 us 0.63% 113.144 us 0.93% 5.148 us 4.77% FAIL
I8 I64 2^28 1 1.562 ms 0.29% 1.653 ms 0.26% 91.281 us 5.84% FAIL
I8 I64 2^16 0.544 10.889 us 5.04% 11.076 us 4.64% 0.187 us 1.72% PASS
I8 I64 2^20 0.544 17.289 us 2.62% 17.676 us 2.87% 0.387 us 2.24% PASS
I8 I64 2^24 0.544 105.647 us 0.85% 110.765 us 0.94% 5.118 us 4.84% FAIL
I8 I64 2^28 0.544 1.525 ms 0.47% 1.616 ms 0.37% 90.833 us 5.96% FAIL
I8 I64 2^16 0 10.240 us 3.96% 10.422 us 4.49% 0.182 us 1.78% PASS
I8 I64 2^20 0 16.855 us 3.45% 17.290 us 3.23% 0.435 us 2.58% PASS
I8 I64 2^24 0 101.577 us 0.70% 106.896 us 0.84% 5.319 us 5.24% FAIL
I8 I64 2^28 0 1.406 ms 0.10% 1.498 ms 0.12% 91.279 us 6.49% FAIL
I16 I32 2^16 1 11.346 us 4.13% 11.864 us 4.72% 0.518 us 4.56% FAIL
I16 I32 2^20 1 17.175 us 4.31% 17.757 us 3.53% 0.581 us 3.39% PASS
I16 I32 2^24 1 100.276 us 1.91% 95.588 us 2.53% -4.688 us -4.67% FAIL
I16 I32 2^28 1 1.336 ms 0.36% 1.269 ms 0.40% -67.190 us -5.03% FAIL
I16 I32 2^16 0.544 11.666 us 4.70% 12.260 us 3.98% 0.593 us 5.09% FAIL
I16 I32 2^20 0.544 16.571 us 3.29% 17.085 us 3.35% 0.514 us 3.10% PASS
I16 I32 2^24 0.544 95.417 us 1.92% 90.798 us 2.44% -4.619 us -4.84% FAIL
I16 I32 2^28 0.544 1.248 ms 0.50% 1.165 ms 0.39% -82.988 us -6.65% FAIL
I16 I32 2^16 0 10.870 us 5.21% 11.374 us 3.85% 0.504 us 4.64% FAIL
I16 I32 2^20 0 16.404 us 3.30% 17.037 us 3.54% 0.633 us 3.86% FAIL
I16 I32 2^24 0 84.071 us 2.11% 79.902 us 2.78% -4.170 us -4.96% FAIL
I16 I32 2^28 0 1.037 ms 0.24% 938.336 us 0.43% -98.709 us -9.52% FAIL
I16 I64 2^16 1 10.695 us 5.20% 11.102 us 4.57% 0.407 us 3.81% PASS
I16 I64 2^20 1 18.518 us 2.93% 19.119 us 2.77% 0.601 us 3.25% FAIL
I16 I64 2^24 1 116.678 us 0.84% 121.883 us 0.96% 5.204 us 4.46% FAIL
I16 I64 2^28 1 1.696 ms 0.29% 1.777 ms 0.26% 81.068 us 4.78% FAIL
I16 I64 2^16 0.544 10.609 us 5.14% 10.979 us 4.78% 0.370 us 3.49% PASS
I16 I64 2^20 0.544 18.384 us 2.45% 18.899 us 2.92% 0.515 us 2.80% FAIL
I16 I64 2^24 0.544 114.292 us 0.81% 119.623 us 0.90% 5.331 us 4.66% FAIL
I16 I64 2^28 0.544 1.656 ms 0.43% 1.736 ms 0.41% 79.875 us 4.82% FAIL
I16 I64 2^16 0 10.696 us 5.28% 11.085 us 4.58% 0.389 us 3.63% PASS
I16 I64 2^20 0 17.845 us 3.19% 18.220 us 2.95% 0.375 us 2.10% PASS
I16 I64 2^24 0 107.377 us 0.76% 112.512 us 0.95% 5.135 us 4.78% FAIL
I16 I64 2^28 0 1.484 ms 0.12% 1.572 ms 0.14% 88.041 us 5.93% FAIL
I32 I32 2^16 1 10.617 us 5.36% 11.016 us 4.73% 0.399 us 3.76% PASS
I32 I32 2^20 1 19.408 us 2.75% 20.367 us 2.91% 0.959 us 4.94% FAIL
I32 I32 2^24 1 137.828 us 1.01% 144.374 us 1.37% 6.545 us 4.75% FAIL
I32 I32 2^28 1 1.953 ms 0.50% 2.047 ms 0.57% 94.395 us 4.83% FAIL
I32 I32 2^16 0.544 10.468 us 4.74% 10.768 us 5.31% 0.300 us 2.87% PASS
I32 I32 2^20 0.544 19.095 us 3.06% 20.093 us 3.78% 0.998 us 5.22% FAIL
I32 I32 2^24 0.544 124.197 us 1.21% 130.603 us 1.48% 6.406 us 5.16% FAIL
I32 I32 2^28 0.544 1.694 ms 0.50% 1.811 ms 0.50% 116.534 us 6.88% FAIL
I32 I32 2^16 0 10.018 us 4.88% 10.266 us 4.59% 0.248 us 2.48% PASS
I32 I32 2^20 0 18.045 us 3.29% 18.504 us 2.86% 0.459 us 2.54% PASS
I32 I32 2^24 0 108.258 us 1.09% 111.968 us 1.45% 3.710 us 3.43% FAIL
I32 I32 2^28 0 1.392 ms 0.18% 1.480 ms 0.21% 88.184 us 6.34% FAIL
I32 I64 2^16 1 11.141 us 4.37% 11.378 us 4.41% 0.237 us 2.13% PASS
I32 I64 2^20 1 20.818 us 2.50% 20.953 us 2.55% 0.135 us 0.65% PASS
I32 I64 2^24 1 141.316 us 0.66% 144.585 us 0.68% 3.269 us 2.31% FAIL
I32 I64 2^28 1 2.059 ms 0.25% 2.123 ms 0.33% 64.328 us 3.12% FAIL
I32 I64 2^16 0.544 11.284 us 3.52% 11.303 us 3.50% 0.019 us 0.17% PASS
I32 I64 2^20 0.544 20.760 us 2.40% 20.879 us 2.53% 0.119 us 0.57% PASS
I32 I64 2^24 0.544 131.145 us 0.75% 135.526 us 0.83% 4.381 us 3.34% FAIL
I32 I64 2^28 0.544 1.857 ms 0.46% 1.932 ms 0.42% 75.349 us 4.06% FAIL
I32 I64 2^16 0 10.646 us 5.35% 10.799 us 5.12% 0.153 us 1.43% PASS
I32 I64 2^20 0 19.968 us 2.66% 20.118 us 2.97% 0.150 us 0.75% PASS
I32 I64 2^24 0 116.104 us 0.71% 120.232 us 0.88% 4.128 us 3.56% FAIL
I32 I64 2^28 0 1.579 ms 0.13% 1.659 ms 0.15% 79.681 us 5.04% FAIL
I64 I32 2^16 1 11.029 us 4.80% 11.216 us 4.26% 0.188 us 1.70% PASS
I64 I32 2^20 1 27.381 us 2.16% 28.194 us 2.24% 0.814 us 2.97% FAIL
I64 I32 2^24 1 232.335 us 1.19% 243.800 us 3.40% 11.466 us 4.93% FAIL
I64 I32 2^28 1 3.487 ms 0.55% 3.672 ms 0.91% 185.850 us 5.33% FAIL
I64 I32 2^16 0.544 11.294 us 4.37% 11.290 us 4.21% -0.005 us -0.04% PASS
I64 I32 2^20 0.544 26.590 us 2.06% 27.128 us 2.42% 0.539 us 2.03% PASS
I64 I32 2^24 0.544 193.905 us 1.03% 200.667 us 1.62% 6.762 us 3.49% FAIL
I64 I32 2^28 0.544 2.815 ms 0.53% 2.924 ms 0.79% 108.853 us 3.87% FAIL
I64 I32 2^16 0 10.779 us 5.41% 10.723 us 5.30% -0.056 us -0.52% PASS
I64 I32 2^20 0 24.282 us 2.48% 24.749 us 2.57% 0.467 us 1.92% PASS
I64 I32 2^24 0 148.208 us 0.77% 154.013 us 1.07% 5.805 us 3.92% FAIL
I64 I32 2^28 0 1.972 ms 0.17% 2.080 ms 0.30% 108.251 us 5.49% FAIL
I64 I64 2^16 1 11.677 us 4.66% 11.843 us 4.89% 0.166 us 1.42% PASS
I64 I64 2^20 1 27.792 us 1.86% 28.562 us 1.97% 0.770 us 2.77% FAIL
I64 I64 2^24 1 245.119 us 0.53% 250.969 us 0.62% 5.850 us 2.39% FAIL
I64 I64 2^28 1 3.735 ms 0.50% 3.823 ms 0.49% 87.727 us 2.35% FAIL
I64 I64 2^16 0.544 11.712 us 4.93% 11.883 us 4.87% 0.171 us 1.46% PASS
I64 I64 2^20 0.544 27.297 us 2.19% 28.330 us 2.08% 1.033 us 3.78% FAIL
I64 I64 2^24 0.544 217.224 us 0.58% 226.457 us 0.61% 9.233 us 4.25% FAIL
I64 I64 2^28 0.544 3.235 ms 0.50% 3.399 ms 0.43% 163.751 us 5.06% FAIL
I64 I64 2^16 0 11.519 us 4.33% 11.842 us 4.77% 0.323 us 2.80% PASS
I64 I64 2^20 0 26.621 us 2.26% 27.374 us 2.28% 0.753 us 2.83% FAIL
I64 I64 2^24 0 191.486 us 0.51% 200.126 us 0.62% 8.640 us 4.51% FAIL
I64 I64 2^28 0 2.787 ms 0.13% 2.943 ms 0.14% 155.775 us 5.59% FAIL
I128 I32 2^16 1 12.278 us 4.09% 12.584 us 4.34% 0.306 us 2.49% PASS
I128 I32 2^20 1 40.328 us 1.81% 40.735 us 1.66% 0.407 us 1.01% PASS
I128 I32 2^24 1 426.357 us 0.56% 442.825 us 1.05% 16.468 us 3.86% FAIL
I128 I32 2^28 1 6.605 ms 0.50% 6.875 ms 0.50% 270.120 us 4.09% FAIL
I128 I32 2^16 0.544 12.255 us 4.08% 12.675 us 4.46% 0.420 us 3.43% PASS
I128 I32 2^20 0.544 37.975 us 1.92% 38.504 us 1.83% 0.529 us 1.39% PASS
I128 I32 2^24 0.544 345.927 us 0.69% 357.985 us 1.20% 12.058 us 3.49% FAIL
I128 I32 2^28 0.544 5.232 ms 0.50% 5.399 ms 0.50% 167.768 us 3.21% FAIL
I128 I32 2^16 0 11.840 us 4.88% 12.098 us 4.68% 0.258 us 2.18% PASS
I128 I32 2^20 0 35.066 us 2.06% 35.104 us 1.86% 0.038 us 0.11% PASS
I128 I32 2^24 0 234.184 us 0.79% 246.398 us 0.99% 12.214 us 5.22% FAIL
I128 I32 2^28 0 3.333 ms 0.16% 3.498 ms 0.24% 165.087 us 4.95% FAIL
I128 I64 2^16 1 13.868 us 4.04% 14.180 us 4.05% 0.312 us 2.25% PASS
I128 I64 2^20 1 43.816 us 1.58% 45.447 us 1.85% 1.631 us 3.72% FAIL
I128 I64 2^24 1 491.981 us 0.50% 508.654 us 0.43% 16.673 us 3.39% FAIL
I128 I64 2^28 1 7.708 ms 0.46% 7.969 ms 0.36% 261.033 us 3.39% FAIL
I128 I64 2^16 0.544 13.439 us 3.68% 13.799 us 4.32% 0.360 us 2.68% PASS
I128 I64 2^20 0.544 42.957 us 3.84% 44.550 us 4.07% 1.593 us 3.71% PASS
I128 I64 2^24 0.544 449.942 us 0.43% 469.523 us 0.39% 19.581 us 4.35% FAIL
I128 I64 2^28 0.544 7.014 ms 0.45% 7.325 ms 0.36% 311.103 us 4.44% FAIL
I128 I64 2^16 0 14.666 us 23.28% 15.752 us 29.31% 1.087 us 7.41% PASS
I128 I64 2^20 0 40.049 us 1.55% 41.397 us 1.85% 1.348 us 3.37% FAIL
I128 I64 2^24 0 397.476 us 0.34% 415.143 us 0.51% 17.667 us 4.44% FAIL
I128 I64 2^28 0 6.109 ms 0.10% 6.396 ms 0.11% 287.378 us 4.70% FAIL
F32 I32 2^16 1 11.108 us 4.45% 11.313 us 3.90% 0.205 us 1.84% PASS
F32 I32 2^20 1 19.752 us 2.75% 20.315 us 2.91% 0.564 us 2.85% FAIL
F32 I32 2^24 1 137.973 us 1.01% 144.387 us 1.42% 6.414 us 4.65% FAIL
F32 I32 2^28 1 1.948 ms 0.50% 2.041 ms 0.50% 93.589 us 4.80% FAIL
F32 I32 2^16 0.544 10.518 us 5.06% 10.886 us 5.29% 0.367 us 3.49% PASS
F32 I32 2^20 0.544 19.152 us 3.25% 19.804 us 3.48% 0.653 us 3.41% FAIL
F32 I32 2^24 0.544 124.317 us 1.18% 130.768 us 1.47% 6.451 us 5.19% FAIL
F32 I32 2^28 0.544 1.694 ms 0.50% 1.811 ms 0.50% 116.890 us 6.90% FAIL
F32 I32 2^16 0 10.079 us 5.26% 10.410 us 4.51% 0.330 us 3.28% PASS
F32 I32 2^20 0 18.077 us 3.23% 18.776 us 3.06% 0.699 us 3.87% FAIL
F32 I32 2^24 0 108.270 us 1.06% 112.225 us 1.42% 3.955 us 3.65% FAIL
F32 I32 2^28 0 1.391 ms 0.19% 1.481 ms 0.21% 89.383 us 6.42% FAIL
F32 I64 2^16 1 11.168 us 4.32% 11.582 us 4.81% 0.414 us 3.71% PASS
F32 I64 2^20 1 20.750 us 2.56% 21.095 us 2.58% 0.345 us 1.66% PASS
F32 I64 2^24 1 141.341 us 0.66% 144.912 us 0.70% 3.571 us 2.53% FAIL
F32 I64 2^28 1 2.050 ms 0.39% 2.110 ms 0.26% 59.936 us 2.92% FAIL
F32 I64 2^16 0.544 11.350 us 3.75% 11.603 us 4.78% 0.253 us 2.23% PASS
F32 I64 2^20 0.544 20.667 us 2.45% 20.948 us 2.86% 0.282 us 1.36% PASS
F32 I64 2^24 0.544 131.247 us 0.76% 135.918 us 0.85% 4.671 us 3.56% FAIL
F32 I64 2^28 0.544 1.857 ms 0.47% 1.933 ms 0.43% 75.756 us 4.08% FAIL
F32 I64 2^16 0 11.030 us 4.63% 11.340 us 3.82% 0.310 us 2.81% PASS
F32 I64 2^20 0 19.882 us 2.77% 20.193 us 2.61% 0.311 us 1.56% PASS
F32 I64 2^24 0 116.187 us 0.73% 120.631 us 0.88% 4.444 us 3.82% FAIL
F32 I64 2^28 0 1.579 ms 0.12% 1.659 ms 0.15% 80.105 us 5.07% FAIL
F64 I32 2^16 1 11.418 us 4.04% 11.761 us 4.85% 0.343 us 3.00% PASS
F64 I32 2^20 1 27.530 us 2.43% 28.444 us 2.96% 0.915 us 3.32% FAIL
F64 I32 2^24 1 232.420 us 1.34% 244.087 us 2.88% 11.667 us 5.02% FAIL
F64 I32 2^28 1 3.489 ms 0.55% 3.674 ms 0.90% 184.551 us 5.29% FAIL
F64 I32 2^16 0.544 10.880 us 5.22% 11.202 us 4.28% 0.322 us 2.96% PASS
F64 I32 2^20 0.544 26.797 us 2.10% 27.385 us 2.32% 0.588 us 2.19% FAIL
F64 I32 2^24 0.544 193.974 us 1.07% 200.739 us 1.62% 6.766 us 3.49% FAIL
F64 I32 2^28 0.544 2.815 ms 0.53% 2.925 ms 0.82% 109.445 us 3.89% FAIL
F64 I32 2^16 0 10.368 us 4.71% 10.643 us 5.40% 0.275 us 2.65% PASS
F64 I32 2^20 0 26.214 us 20.57% 24.966 us 2.56% -1.248 us -4.76% FAIL
F64 I32 2^24 0 147.941 us 0.77% 154.369 us 1.06% 6.428 us 4.34% FAIL
F64 I32 2^28 0 1.971 ms 0.17% 2.080 ms 0.23% 109.155 us 5.54% FAIL
F64 I64 2^16 1 11.522 us 4.60% 12.045 us 4.46% 0.523 us 4.54% FAIL
F64 I64 2^20 1 27.626 us 1.76% 28.816 us 1.95% 1.190 us 4.31% FAIL
F64 I64 2^24 1 245.109 us 0.54% 251.156 us 0.62% 6.047 us 2.47% FAIL
F64 I64 2^28 1 3.737 ms 0.50% 3.829 ms 0.35% 92.013 us 2.46% FAIL
F64 I64 2^16 0.544 11.550 us 4.62% 12.049 us 4.71% 0.500 us 4.33% PASS
F64 I64 2^20 0.544 27.294 us 2.07% 28.566 us 1.92% 1.271 us 4.66% FAIL
F64 I64 2^24 0.544 217.308 us 0.53% 226.582 us 0.62% 9.274 us 4.27% FAIL
F64 I64 2^28 0.544 3.234 ms 0.50% 3.398 ms 0.47% 163.952 us 5.07% FAIL
F64 I64 2^16 0 11.530 us 4.38% 12.090 us 4.38% 0.560 us 4.85% FAIL
F64 I64 2^20 0 26.495 us 2.13% 27.618 us 2.12% 1.123 us 4.24% FAIL
F64 I64 2^24 0 191.519 us 0.49% 200.335 us 0.58% 8.816 us 4.60% FAIL
F64 I64 2^28 0 2.787 ms 0.13% 2.943 ms 0.14% 155.876 us 5.59% FAIL


🟩 CI Results: Pass: 100%/198 | Total Time: 3d 14h | Avg Time: 26m 14s | Hits: 64%/118084
  • 🟩 thrust: Pass: 100%/99 | Total Time: 1d 21h | Avg Time: 27m 52s | Hits: 37%/50817

    🟩 cpu
      🟩 amd64              Pass: 100%/91  | Total Time:  1d 18h | Avg Time: 27m 45s | Hits:  38%/46709 
      🟩 arm64              Pass: 100%/8   | Total Time:  3h 54m | Avg Time: 29m 16s | Hits:  32%/4108  
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total Time:  6h 29m | Avg Time: 25m 59s | Hits:  32%/7700  
      🟩 11.8               Pass: 100%/3   | Total Time:  1h 40m | Avg Time: 33m 26s | Hits:  33%/1542  
      🟩 12.4               Pass: 100%/81  | Total Time:  1d 13h | Avg Time: 28m 01s | Hits:  39%/41575 
    🟩 cudacxx_full
      🟩 clang-cuda16       Pass: 100%/2   | Total Time: 53m 00s | Avg Time: 26m 30s | Hits:  31%/1026  
      🟩 nvcc11.1           Pass: 100%/15  | Total Time:  6h 29m | Avg Time: 25m 59s | Hits:  32%/7700  
      🟩 nvcc11.8           Pass: 100%/3   | Total Time:  1h 40m | Avg Time: 33m 26s | Hits:  33%/1542  
      🟩 nvcc12.4           Pass: 100%/79  | Total Time:  1d 12h | Avg Time: 28m 03s | Hits:  39%/40549 
    🟩 cudacxx_name
      🟩 clang-cuda         Pass: 100%/2   | Total Time: 53m 00s | Avg Time: 26m 30s | Hits:  31%/1026  
      🟩 nvcc               Pass: 100%/97  | Total Time:  1d 21h | Avg Time: 27m 54s | Hits:  38%/49791 
    🟩 cxx_full
      🟩 clang9             Pass: 100%/6   | Total Time:  2h 27m | Avg Time: 24m 32s | Hits:  33%/3078  
      🟩 clang10            Pass: 100%/3   | Total Time:  1h 24m | Avg Time: 28m 03s | Hits:  33%/1539  
      🟩 clang11            Pass: 100%/4   | Total Time:  1h 49m | Avg Time: 27m 17s | Hits:  32%/2052  
      🟩 clang12            Pass: 100%/4   | Total Time:  2h 04m | Avg Time: 31m 11s | Hits:  32%/2052  
      🟩 clang13            Pass: 100%/4   | Total Time:  1h 53m | Avg Time: 28m 18s | Hits:  32%/2052  
      🟩 clang14            Pass: 100%/4   | Total Time:  1h 54m | Avg Time: 28m 37s | Hits:  32%/2052  
      🟩 clang15            Pass: 100%/4   | Total Time:  1h 59m | Avg Time: 29m 50s | Hits:  32%/2052  
      🟩 clang16            Pass: 100%/14  | Total Time:  5h 19m | Avg Time: 22m 50s | Hits:  51%/7182  
      🟩 gcc6               Pass: 100%/2   | Total Time: 44m 34s | Avg Time: 22m 17s | Hits:  33%/1026  
      🟩 gcc7               Pass: 100%/6   | Total Time:  2h 44m | Avg Time: 27m 20s | Hits:  33%/3084  
      🟩 gcc8               Pass: 100%/6   | Total Time:  3h 01m | Avg Time: 30m 10s | Hits:  33%/3084  
      🟩 gcc9               Pass: 100%/6   | Total Time:  2h 30m | Avg Time: 25m 04s | Hits:  33%/3084  
      🟩 gcc10              Pass: 100%/4   | Total Time:  2h 00m | Avg Time: 30m 10s | Hits:  32%/2056  
      🟩 gcc11              Pass: 100%/7   | Total Time:  3h 41m | Avg Time: 31m 36s | Hits:  32%/3598  
      🟩 gcc12              Pass: 100%/16  | Total Time:  5h 23m | Avg Time: 20m 14s | Hits:  49%/8224  
      🟩 Intel2023.2.0      Pass: 100%/3   | Total Time:  1h 44m | Avg Time: 34m 51s | Hits:  33%/1548  
      🟩 MSVC14.16          Pass: 100%/1   | Total Time: 52m 41s | Avg Time: 52m 41s | Hits:  28%/509   
      🟩 MSVC14.29          Pass: 100%/2   | Total Time:  1h 44m | Avg Time: 52m 17s | Hits:  28%/1018  
      🟩 MSVC14.39          Pass: 100%/3   | Total Time:  2h 40m | Avg Time: 53m 26s | Hits:  28%/1527  
    🟩 cxx_name
      🟩 clang              Pass: 100%/43  | Total Time: 18h 51m | Avg Time: 26m 19s | Hits:  39%/22059 
      🟩 gcc                Pass: 100%/47  | Total Time: 20h 05m | Avg Time: 25m 39s | Hits:  38%/24156 
      🟩 Intel              Pass: 100%/3   | Total Time:  1h 44m | Avg Time: 34m 51s | Hits:  33%/1548  
      🟩 MSVC               Pass: 100%/6   | Total Time:  5h 17m | Avg Time: 52m 55s | Hits:  28%/3054  
    🟩 gpu
      🟩 v100               Pass: 100%/99  | Total Time:  1d 21h | Avg Time: 27m 52s | Hits:  37%/50817 
    🟩 jobs
      🟩 build              Pass: 100%/91  | Total Time:  1d 20h | Avg Time: 29m 23s | Hits:  32%/46709 
      🟩 test               Pass: 100%/8   | Total Time:  1h 25m | Avg Time: 10m 39s | Hits:  99%/4108  
    🟩 os
      🟩 ubuntu18.04        Pass: 100%/14  | Total Time:  5h 37m | Avg Time: 24m 05s | Hits:  33%/7191  
      🟩 ubuntu20.04        Pass: 100%/35  | Total Time: 16h 56m | Avg Time: 29m 02s | Hits:  32%/17968 
      🟩 ubuntu22.04        Pass: 100%/44  | Total Time: 18h 08m | Avg Time: 24m 44s | Hits:  44%/22604 
      🟩 windows2022        Pass: 100%/6   | Total Time:  5h 17m | Avg Time: 52m 55s | Hits:  28%/3054  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total Time:  1h 40m | Avg Time: 33m 26s | Hits:  33%/1542  
      🟩 90a                Pass: 100%/4   | Total Time: 57m 51s | Avg Time: 14m 27s | Hits:  32%/2056  
    🟩 std
      🟩 11                 Pass: 100%/26  | Total Time: 10h 29m | Avg Time: 24m 11s | Hits:  42%/13354 
      🟩 14                 Pass: 100%/29  | Total Time: 14h 29m | Avg Time: 29m 58s | Hits:  35%/14881 
      🟩 17                 Pass: 100%/28  | Total Time: 13h 36m | Avg Time: 29m 08s | Hits:  35%/14372 
      🟩 20                 Pass: 100%/16  | Total Time:  7h 25m | Avg Time: 27m 50s | Hits:  39%/8210  
    
  • 🟩 cub: Pass: 100%/99 | Total Time: 1d 16h | Avg Time: 24m 37s | Hits: 84%/67267

    🟩 cpu
      🟩 amd64              Pass: 100%/91  | Total Time:  1d 13h | Avg Time: 24m 39s | Hits:  85%/61635 
      🟩 arm64              Pass: 100%/8   | Total Time:  3h 12m | Avg Time: 24m 06s | Hits:  82%/5632  
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total Time:  4h 29m | Avg Time: 17m 58s | Hits:  85%/9350  
      🟩 11.8               Pass: 100%/3   | Total Time:  1h 14m | Avg Time: 24m 52s | Hits:  82%/2112  
      🟩 12.4               Pass: 100%/81  | Total Time:  1d 10h | Avg Time: 25m 50s | Hits:  84%/55805 
    🟩 cudacxx_full
      🟩 clang-cuda16       Pass: 100%/2   | Total Time: 24m 08s | Avg Time: 12m 04s | Hits:  86%/1116  
      🟩 nvcc11.1           Pass: 100%/15  | Total Time:  4h 29m | Avg Time: 17m 58s | Hits:  85%/9350  
      🟩 nvcc11.8           Pass: 100%/3   | Total Time:  1h 14m | Avg Time: 24m 52s | Hits:  82%/2112  
      🟩 nvcc12.4           Pass: 100%/79  | Total Time:  1d 10h | Avg Time: 26m 11s | Hits:  84%/54689 
    🟩 cudacxx_name
      🟩 clang-cuda         Pass: 100%/2   | Total Time: 24m 08s | Avg Time: 12m 04s | Hits:  86%/1116  
      🟩 nvcc               Pass: 100%/97  | Total Time:  1d 16h | Avg Time: 24m 52s | Hits:  84%/66151 
    🟩 cxx_full
      🟩 clang9             Pass: 100%/6   | Total Time:  1h 47m | Avg Time: 17m 58s | Hits:  84%/4002  
      🟩 clang10            Pass: 100%/3   | Total Time:  1h 01m | Avg Time: 20m 34s | Hits:  83%/2118  
      🟩 clang11            Pass: 100%/4   | Total Time:  1h 28m | Avg Time: 22m 05s | Hits:  83%/2824  
      🟩 clang12            Pass: 100%/4   | Total Time:  1h 14m | Avg Time: 18m 40s | Hits:  83%/2824  
      🟩 clang13            Pass: 100%/4   | Total Time:  1h 35m | Avg Time: 23m 45s | Hits:  83%/2824  
      🟩 clang14            Pass: 100%/4   | Total Time:  1h 25m | Avg Time: 21m 19s | Hits:  83%/2824  
      🟩 clang15            Pass: 100%/4   | Total Time:  1h 22m | Avg Time: 20m 33s | Hits:  83%/2816  
      🟩 clang16            Pass: 100%/14  | Total Time:  6h 53m | Avg Time: 29m 31s | Hits:  88%/9564  
      🟩 gcc6               Pass: 100%/2   | Total Time: 32m 34s | Avg Time: 16m 17s | Hits:  85%/1256  
      🟩 gcc7               Pass: 100%/6   | Total Time:  1h 52m | Avg Time: 18m 40s | Hits:  84%/4005  
      🟩 gcc8               Pass: 100%/6   | Total Time:  3h 35m | Avg Time: 35m 59s | Hits:  84%/4005  
      🟩 gcc9               Pass: 100%/6   | Total Time:  1h 48m | Avg Time: 18m 08s | Hits:  84%/4005  
      🟩 gcc10              Pass: 100%/4   | Total Time:  1h 36m | Avg Time: 24m 02s | Hits:  82%/2824  
      🟩 gcc11              Pass: 100%/7   | Total Time:  2h 48m | Avg Time: 24m 05s | Hits:  82%/4928  
      🟩 gcc12              Pass: 100%/16  | Total Time:  7h 17m | Avg Time: 27m 19s | Hits:  86%/11264 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total Time:  1h 11m | Avg Time: 23m 44s | Hits:  85%/1890  
      🟩 MSVC14.16          Pass: 100%/1   | Total Time: 38m 24s | Avg Time: 38m 24s | Hits:  85%/549   
      🟩 MSVC14.29          Pass: 100%/2   | Total Time: 53m 39s | Avg Time: 26m 49s | Hits:  85%/1098  
      🟩 MSVC14.39          Pass: 100%/3   | Total Time:  1h 34m | Avg Time: 31m 23s | Hits:  85%/1647  
    🟩 cxx_name
      🟩 clang              Pass: 100%/43  | Total Time: 16h 48m | Avg Time: 23m 27s | Hits:  85%/29796 
      🟩 gcc                Pass: 100%/47  | Total Time: 19h 31m | Avg Time: 24m 55s | Hits:  84%/32287 
      🟩 Intel              Pass: 100%/3   | Total Time:  1h 11m | Avg Time: 23m 44s | Hits:  85%/1890  
      🟩 MSVC               Pass: 100%/6   | Total Time:  3h 06m | Avg Time: 31m 02s | Hits:  85%/3294  
    🟩 gpu
      🟩 v100               Pass: 100%/99  | Total Time:  1d 16h | Avg Time: 24m 37s | Hits:  84%/67267 
    🟩 jobs
      🟩 build              Pass: 100%/91  | Total Time:  1d 09h | Avg Time: 22m 03s | Hits:  83%/61635 
      🟩 test               Pass: 100%/8   | Total Time:  7h 10m | Avg Time: 53m 48s | Hits:  99%/5632  
    🟩 os
      🟩 ubuntu18.04        Pass: 100%/14  | Total Time:  3h 51m | Avg Time: 16m 31s | Hits:  85%/8801  
      🟩 ubuntu20.04        Pass: 100%/35  | Total Time: 14h 07m | Avg Time: 24m 12s | Hits:  83%/24710 
      🟩 ubuntu22.04        Pass: 100%/44  | Total Time: 19h 32m | Avg Time: 26m 38s | Hits:  86%/30462 
      🟩 windows2022        Pass: 100%/6   | Total Time:  3h 06m | Avg Time: 31m 02s | Hits:  85%/3294  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total Time:  1h 14m | Avg Time: 24m 52s | Hits:  82%/2112  
      🟩 90a                Pass: 100%/4   | Total Time: 43m 29s | Avg Time: 10m 52s | Hits:  82%/2816  
    🟩 std
      🟩 11                 Pass: 100%/26  | Total Time: 10h 58m | Avg Time: 25m 19s | Hits:  84%/17873 
      🟩 14                 Pass: 100%/29  | Total Time: 11h 20m | Avg Time: 23m 28s | Hits:  84%/19520 
      🟩 17                 Pass: 100%/28  | Total Time: 10h 45m | Avg Time: 23m 03s | Hits:  84%/18901 
      🟩 20                 Pass: 100%/16  | Total Time:  7h 32m | Avg Time: 28m 18s | Hits:  85%/10973 
    

🏃‍ Runner counts (total jobs: 198)

# Runner
154 linux-amd64-cpu16
16 linux-arm64-cpu16
16 linux-amd64-gpu-v100-latest-1
12 windows-amd64-cpu16

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental

@lilohuang

Hi @elstehle ,

This is a great finding, but I'm surprised to learn that there is an in-place version of thrust::copy_if. I thought the thrust::copy_if output buffer typically wasn't allowed to overlap with the input buffer; perhaps I misunderstood this.

thrust::copy_if() API documentation states:

Preconditions: The ranges [first, last) and [result, result + (last - first)) shall not overlap.

If so, I assume if I use a separate buffer (non-inplace) to store the output while calling thrust::copy_if, I won't encounter this issue. Please correct me if I'm wrong.

Thank you very much,
Lilo

@elstehle changed the title from "[DRAFT] Fix for in-place DeviceSelect, thrust::copy_if, and thrust::remove_if" to "[DRAFT] Fix for in-place DeviceSelect & thrust::remove_if" on Jun 2, 2024
@elstehle
Collaborator Author

elstehle commented Jun 2, 2024

> This is a great finding, but I'm surprised to learn that there is an in-place version of thrust::copy_if. I thought the thrust::copy_if output buffer typically wasn't allowed to overlap with the input buffer; perhaps I misunderstood this.

Sorry for the confusion. You are correct that thrust::copy_if does not support in-place stream compaction, and I think the benchmark results above highlight why we want to keep it that way: we want to avoid needless performance degradation for out-of-place usage of thrust::copy_if.

However, I believe in-place stream compaction is a reasonable feature request, so I have opened #1799.

> If so, I assume if I use a separate buffer (non-inplace) to store the output while calling thrust::copy_if, I won't encounter this issue. Please correct me if I'm wrong.

This is exactly right: out-of-place usage is not affected by the referenced issue.

Contributor

github-actions bot commented Jun 4, 2024

🟨 CI Results: Pass: 97%/198 | Total Time: 4d 02h | Avg Time: 29m 54s | Hits: 32%/115429
  • 🟨 thrust: Pass: 96%/99 | Total Time: 1d 17h | Avg Time: 25m 25s | Hits: 23%/49278

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  96%/91  | Total Time:  1d 14h | Avg Time: 25m 25s | Hits:  23%/45170 
      🟩 arm64              Pass: 100%/8   | Total Time:  3h 23m | Avg Time: 25m 25s | Hits:  15%/4108  
    🔍 ctk: 12.4 🔍
      🟩 11.1               Pass: 100%/15  | Total Time:  6h 01m | Avg Time: 24m 06s | Hits:  16%/7700  
      🟩 11.8               Pass: 100%/3   | Total Time:  1h 44m | Avg Time: 34m 54s | Hits:  16%/1542  
      🔍 12.4               Pass:  96%/81  | Total Time:  1d 10h | Avg Time: 25m 19s | Hits:  24%/40036 
    🔍 cxx_name: clang 🔍
      🔍 clang              Pass:  93%/43  | Total Time: 16h 16m | Avg Time: 22m 42s | Hits:  24%/20520 
      🟩 gcc                Pass: 100%/47  | Total Time: 19h 01m | Avg Time: 24m 17s | Hits:  23%/24156 
      🟩 Intel              Pass: 100%/3   | Total Time:  1h 40m | Avg Time: 33m 34s | Hits:  16%/1548  
      🟩 MSVC               Pass: 100%/6   | Total Time:  4h 58m | Avg Time: 49m 40s | Hits:  19%/3054  
    🔍 jobs: build 🔍
      🔍 build              Pass:  96%/91  | Total Time:  1d 16h | Avg Time: 26m 26s | Hits:  16%/45170 
      🟩 test               Pass: 100%/8   | Total Time:  1h 50m | Avg Time: 13m 49s | Hits:  99%/4108  
    🔍 os: ubuntu22.04 🔍
      🟩 ubuntu18.04        Pass: 100%/14  | Total Time:  5h 14m | Avg Time: 22m 27s | Hits:  16%/7191  
      🟩 ubuntu20.04        Pass: 100%/35  | Total Time: 15h 17m | Avg Time: 26m 12s | Hits:  15%/17968 
      🔍 ubuntu22.04        Pass:  93%/44  | Total Time: 16h 27m | Avg Time: 22m 26s | Hits:  32%/21065 
      🟩 windows2022        Pass: 100%/6   | Total Time:  4h 58m | Avg Time: 49m 40s | Hits:  19%/3054  
    🟨 cudacxx_full
      🟥 clang-cuda16       Pass:   0%/2   | Total Time:  4m 20s | Avg Time:  2m 10s
      🟩 nvcc11.1           Pass: 100%/15  | Total Time:  6h 01m | Avg Time: 24m 06s | Hits:  16%/7700  
      🟩 nvcc11.8           Pass: 100%/3   | Total Time:  1h 44m | Avg Time: 34m 54s | Hits:  16%/1542  
      🟨 nvcc12.4           Pass:  98%/79  | Total Time:  1d 10h | Avg Time: 25m 54s | Hits:  24%/40036 
    🟨 cxx_full
      🟩 clang9             Pass: 100%/6   | Total Time:  2h 28m | Avg Time: 24m 48s | Hits:  16%/3078  
      🟩 clang10            Pass: 100%/3   | Total Time:  1h 21m | Avg Time: 27m 10s | Hits:  16%/1539  
      🟩 clang11            Pass: 100%/4   | Total Time:  1h 43m | Avg Time: 25m 49s | Hits:  15%/2052  
      🟩 clang12            Pass: 100%/4   | Total Time:  1h 45m | Avg Time: 26m 24s | Hits:  15%/2052  
      🟩 clang13            Pass: 100%/4   | Total Time:  1h 38m | Avg Time: 24m 44s | Hits:  15%/2052  
      🟩 clang14            Pass: 100%/4   | Total Time:  1h 43m | Avg Time: 25m 47s | Hits:  15%/2052  
      🟨 clang15            Pass:  75%/4   | Total Time:  1h 19m | Avg Time: 19m 55s | Hits:  16%/1539  
      🟨 clang16            Pass:  85%/14  | Total Time:  4h 15m | Avg Time: 18m 15s | Hits:  43%/6156  
      🟩 gcc6               Pass: 100%/2   | Total Time: 44m 25s | Avg Time: 22m 12s | Hits:  17%/1026  
      🟩 gcc7               Pass: 100%/6   | Total Time:  2h 28m | Avg Time: 24m 40s | Hits:  16%/3084  
      🟩 gcc8               Pass: 100%/6   | Total Time:  2h 23m | Avg Time: 23m 52s | Hits:  16%/3084  
      🟩 gcc9               Pass: 100%/6   | Total Time:  2h 28m | Avg Time: 24m 48s | Hits:  16%/3084  
      🟩 gcc10              Pass: 100%/4   | Total Time:  1h 45m | Avg Time: 26m 28s | Hits:  15%/2056  
      🟩 gcc11              Pass: 100%/7   | Total Time:  3h 29m | Avg Time: 29m 58s | Hits:  15%/3598  
      🟩 gcc12              Pass: 100%/16  | Total Time:  5h 41m | Avg Time: 21m 20s | Hits:  36%/8224  
      🟩 Intel2023.2.0      Pass: 100%/3   | Total Time:  1h 40m | Avg Time: 33m 34s | Hits:  16%/1548  
      🟩 MSVC14.16          Pass: 100%/1   | Total Time: 47m 06s | Avg Time: 47m 06s | Hits:  19%/509   
      🟩 MSVC14.29          Pass: 100%/2   | Total Time:  1h 37m | Avg Time: 48m 46s | Hits:  19%/1018  
      🟩 MSVC14.39          Pass: 100%/3   | Total Time:  2h 33m | Avg Time: 51m 08s | Hits:  19%/1527  
    🟨 std
      🟩 11                 Pass: 100%/26  | Total Time:  9h 35m | Avg Time: 22m 07s | Hits:  26%/13354 
      🟨 14                 Pass:  96%/29  | Total Time: 12h 55m | Avg Time: 26m 45s | Hits:  20%/14368 
      🟨 17                 Pass:  96%/28  | Total Time: 12h 39m | Avg Time: 27m 07s | Hits:  20%/13859 
      🟨 20                 Pass:  93%/16  | Total Time:  6h 46m | Avg Time: 25m 24s | Hits:  25%/7697  
    🟨 gpu
      🟨 v100               Pass:  96%/99  | Total Time:  1d 17h | Avg Time: 25m 25s | Hits:  23%/49278 
    🟨 cudacxx_name
      🟥 clang-cuda         Pass:   0%/2   | Total Time:  4m 20s | Avg Time:  2m 10s
      🟨 nvcc               Pass:  98%/97  | Total Time:  1d 17h | Avg Time: 25m 54s | Hits:  23%/49278 
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total Time:  1h 44m | Avg Time: 34m 54s | Hits:  16%/1542  
      🟩 90a                Pass: 100%/4   | Total Time:  1h 02m | Avg Time: 15m 37s | Hits:  15%/2056  
    
  • 🟨 cub: Pass: 97%/99 | Total Time: 2d 08h | Avg Time: 34m 23s | Hits: 39%/66151

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  97%/91  | Total Time:  2d 04h | Avg Time: 34m 31s | Hits:  40%/60519 
      🟩 arm64              Pass: 100%/8   | Total Time:  4h 22m | Avg Time: 32m 52s | Hits:  34%/5632  
    🔍 ctk: 12.4 🔍
      🟩 11.1               Pass: 100%/15  | Total Time:  6h 42m | Avg Time: 26m 50s | Hits:  32%/9350  
      🟩 11.8               Pass: 100%/3   | Total Time:  2h 09m | Avg Time: 43m 08s | Hits:  34%/2112  
      🔍 12.4               Pass:  97%/81  | Total Time:  1d 23h | Avg Time: 35m 27s | Hits:  41%/54689 
    🚨 cudacxx_full: clang-cuda16 🚨
      🔥 clang-cuda16       Pass:   0%/2   | Total Time:  9m 33s | Avg Time:  4m 46s
      🟩 nvcc11.1           Pass: 100%/15  | Total Time:  6h 42m | Avg Time: 26m 50s | Hits:  32%/9350  
      🟩 nvcc11.8           Pass: 100%/3   | Total Time:  2h 09m | Avg Time: 43m 08s | Hits:  34%/2112  
      🟩 nvcc12.4           Pass: 100%/79  | Total Time:  1d 23h | Avg Time: 36m 14s | Hits:  41%/54689 
    🚨 cudacxx_name: clang-cuda 🚨
      🔥 clang-cuda         Pass:   0%/2   | Total Time:  9m 33s | Avg Time:  4m 46s
      🟩 nvcc               Pass: 100%/97  | Total Time:  2d 08h | Avg Time: 34m 59s | Hits:  39%/66151 
    🔍 cxx_full: clang16 🔍
      🟩 clang9             Pass: 100%/6   | Total Time:  2h 53m | Avg Time: 28m 53s | Hits:  33%/4002  
      🟩 clang10            Pass: 100%/3   | Total Time:  1h 42m | Avg Time: 34m 11s | Hits:  34%/2118  
      🟩 clang11            Pass: 100%/4   | Total Time:  2h 06m | Avg Time: 31m 35s | Hits:  34%/2824  
      🟩 clang12            Pass: 100%/4   | Total Time:  2h 08m | Avg Time: 32m 07s | Hits:  34%/2824  
      🟩 clang13            Pass: 100%/4   | Total Time:  2h 05m | Avg Time: 31m 22s | Hits:  34%/2824  
      🟩 clang14            Pass: 100%/4   | Total Time:  2h 13m | Avg Time: 33m 15s | Hits:  34%/2824  
      🟩 clang15            Pass: 100%/4   | Total Time:  2h 10m | Avg Time: 32m 32s | Hits:  34%/2816  
      🔍 clang16            Pass:  85%/14  | Total Time:  8h 45m | Avg Time: 37m 33s | Hits:  56%/8448  
      🟩 gcc6               Pass: 100%/2   | Total Time: 49m 52s | Avg Time: 24m 56s | Hits:  31%/1256  
      🟩 gcc7               Pass: 100%/6   | Total Time:  2h 51m | Avg Time: 28m 35s | Hits:  33%/4005  
      🟩 gcc8               Pass: 100%/6   | Total Time:  2h 57m | Avg Time: 29m 31s | Hits:  33%/4005  
      🟩 gcc9               Pass: 100%/6   | Total Time:  2h 59m | Avg Time: 29m 58s | Hits:  32%/4005  
      🟩 gcc10              Pass: 100%/4   | Total Time:  2h 19m | Avg Time: 34m 52s | Hits:  34%/2824  
      🟩 gcc11              Pass: 100%/7   | Total Time:  4h 21m | Avg Time: 37m 24s | Hits:  34%/4928  
      🟩 gcc12              Pass: 100%/16  | Total Time: 10h 14m | Avg Time: 38m 22s | Hits:  50%/11264 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total Time:  1h 45m | Avg Time: 35m 02s | Hits:  31%/1890  
      🟩 MSVC14.16          Pass: 100%/1   | Total Time: 45m 55s | Avg Time: 45m 55s | Hits:  40%/549   
      🟩 MSVC14.29          Pass: 100%/2   | Total Time:  1h 22m | Avg Time: 41m 27s | Hits:  40%/1098  
      🟩 MSVC14.39          Pass: 100%/3   | Total Time:  2h 11m | Avg Time: 43m 50s | Hits:  40%/1647  
    🔍 cxx_name: clang 🔍
      🔍 clang              Pass:  95%/43  | Total Time:  1d 00h | Avg Time: 33m 36s | Hits:  41%/28680 
      🟩 gcc                Pass: 100%/47  | Total Time:  1d 02h | Avg Time: 33m 54s | Hits:  39%/32287 
      🟩 Intel              Pass: 100%/3   | Total Time:  1h 45m | Avg Time: 35m 02s | Hits:  31%/1890  
      🟩 MSVC               Pass: 100%/6   | Total Time:  4h 20m | Avg Time: 43m 23s | Hits:  40%/3294  
    🔍 jobs: build 🔍
      🔍 build              Pass:  97%/91  | Total Time:  1d 23h | Avg Time: 31m 38s | Hits:  34%/60519 
      🟩 test               Pass: 100%/8   | Total Time:  8h 44m | Avg Time:  1h 05m | Hits:  99%/5632  
    🔍 os: ubuntu22.04 🔍
      🟩 ubuntu18.04        Pass: 100%/14  | Total Time:  5h 56m | Avg Time: 25m 28s | Hits:  31%/8801  
      🟩 ubuntu20.04        Pass: 100%/35  | Total Time: 19h 10m | Avg Time: 32m 52s | Hits:  34%/24710 
      🔍 ubuntu22.04        Pass:  95%/44  | Total Time:  1d 03h | Avg Time: 37m 12s | Hits:  46%/29346 
      🟩 windows2022        Pass: 100%/6   | Total Time:  4h 20m | Avg Time: 43m 23s | Hits:  40%/3294  
    🟨 std
      🟩 11                 Pass: 100%/26  | Total Time: 14h 22m | Avg Time: 33m 09s | Hits:  39%/17873 
      🟩 14                 Pass: 100%/29  | Total Time: 16h 41m | Avg Time: 34m 32s | Hits:  39%/19520 
      🟨 17                 Pass:  96%/28  | Total Time: 15h 59m | Avg Time: 34m 15s | Hits:  39%/18343 
      🟨 20                 Pass:  93%/16  | Total Time:  9h 41m | Avg Time: 36m 19s | Hits:  43%/10415 
    🟨 gpu
      🟨 v100               Pass:  97%/99  | Total Time:  2d 08h | Avg Time: 34m 23s | Hits:  39%/66151 
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total Time:  2h 09m | Avg Time: 43m 08s | Hits:  34%/2112  
      🟩 90a                Pass: 100%/4   | Total Time:  1h 14m | Avg Time: 18m 40s | Hits:  34%/2816  
    

🏃‍ Runner counts (total jobs: 198)

# Runner
154 linux-amd64-cpu16
16 linux-arm64-cpu16
16 linux-amd64-gpu-v100-latest-1
12 windows-amd64-cpu16

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental

@elstehle elstehle marked this pull request as ready for review June 10, 2024 13:10
@elstehle elstehle requested review from a team as code owners June 10, 2024 13:10
@elstehle elstehle changed the title from "[DRAFT] Fix for in-place DeviceSelect & thrust::remove_if" to "Fix for in-place DeviceSelect & thrust::remove_if" Jun 10, 2024
Collaborator Author

elstehle commented Jun 10, 2024

Edit: Turns out, this branch was still based off of 7fe0eb4 (main, May 24) and the sass comparison was against e734d68 (main, Jun 7). After bisecting, the sass delta, which I had reported in my original comment (see below), was introduced by 733eb94. Those sass changes only affected kernels other than DeviceSelectSweepKernel.

Original comment:

In its current version, the sass unfortunately changed for algorithms (i.e., may_alias=false) that we want to remain unchanged, as they should not be affected by these changes.

I will move the PR back to Draft stage until I am able to restore sass compatibility.

@elstehle elstehle marked this pull request as draft June 10, 2024 17:35
Contributor

🟨 CI Results: Pass: 99%/198 | Total Time: 3d 00h | Avg Time: 21m 53s | Hits: 68%/117380
  • 🟩 thrust: Pass: 100%/99 | Total Time: 1d 15h | Avg Time: 24m 00s | Hits: 43%/50817

    🟩 cpu
      🟩 amd64              Pass: 100%/91  | Total Time:  1d 15h | Avg Time: 25m 51s | Hits:  38%/46709 
      🟩 arm64              Pass: 100%/8   | Total Time: 22m 57s | Avg Time:  2m 52s | Hits:  99%/4108  
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total Time:  5h 57m | Avg Time: 23m 50s | Hits:  31%/7700  
      🟩 11.8               Pass: 100%/3   | Total Time:  1h 40m | Avg Time: 33m 20s | Hits:  33%/1542  
      🟩 12.4               Pass: 100%/81  | Total Time:  1d 07h | Avg Time: 23m 41s | Hits:  45%/41575 
    🟩 cudacxx_full
      🟩 clang-cuda16       Pass: 100%/2   | Total Time: 46m 45s | Avg Time: 23m 22s | Hits:  31%/1026  
      🟩 nvcc11.1           Pass: 100%/15  | Total Time:  5h 57m | Avg Time: 23m 50s | Hits:  31%/7700  
      🟩 nvcc11.8           Pass: 100%/3   | Total Time:  1h 40m | Avg Time: 33m 20s | Hits:  33%/1542  
      🟩 nvcc12.4           Pass: 100%/79  | Total Time:  1d 07h | Avg Time: 23m 41s | Hits:  46%/40549 
    🟩 cudacxx_name
      🟩 clang-cuda         Pass: 100%/2   | Total Time: 46m 45s | Avg Time: 23m 22s | Hits:  31%/1026  
      🟩 nvcc               Pass: 100%/97  | Total Time:  1d 14h | Avg Time: 24m 01s | Hits:  43%/49791 
    🟩 cxx_full
      🟩 clang9             Pass: 100%/6   | Total Time:  2h 19m | Avg Time: 23m 16s | Hits:  33%/3078  
      🟩 clang10            Pass: 100%/3   | Total Time:  1h 20m | Avg Time: 26m 48s | Hits:  33%/1539  
      🟩 clang11            Pass: 100%/4   | Total Time:  1h 44m | Avg Time: 26m 08s | Hits:  32%/2052  
      🟩 clang12            Pass: 100%/4   | Total Time:  1h 42m | Avg Time: 25m 42s | Hits:  32%/2052  
      🟩 clang13            Pass: 100%/4   | Total Time:  1h 40m | Avg Time: 25m 12s | Hits:  32%/2052  
      🟩 clang14            Pass: 100%/4   | Total Time:  1h 41m | Avg Time: 25m 20s | Hits:  32%/2052  
      🟩 clang15            Pass: 100%/4   | Total Time:  1h 42m | Avg Time: 25m 43s | Hits:  32%/2052  
      🟩 clang16            Pass: 100%/14  | Total Time:  3h 36m | Avg Time: 15m 25s | Hits:  70%/7182  
      🟩 gcc6               Pass: 100%/2   | Total Time: 42m 52s | Avg Time: 21m 26s | Hits:  33%/1026  
      🟩 gcc7               Pass: 100%/6   | Total Time:  2h 23m | Avg Time: 23m 53s | Hits:  30%/3084  
      🟩 gcc8               Pass: 100%/6   | Total Time:  2h 25m | Avg Time: 24m 11s | Hits:  33%/3084  
      🟩 gcc9               Pass: 100%/6   | Total Time:  2h 29m | Avg Time: 24m 50s | Hits:  33%/3084  
      🟩 gcc10              Pass: 100%/4   | Total Time:  1h 47m | Avg Time: 26m 53s | Hits:  32%/2056  
      🟩 gcc11              Pass: 100%/7   | Total Time:  3h 26m | Avg Time: 29m 31s | Hits:  32%/3598  
      🟩 gcc12              Pass: 100%/16  | Total Time:  3h 57m | Avg Time: 14m 51s | Hits:  66%/8224  
      🟩 Intel2023.2.0      Pass: 100%/3   | Total Time:  1h 36m | Avg Time: 32m 08s | Hits:  33%/1548  
      🟩 MSVC14.16          Pass: 100%/1   | Total Time: 47m 11s | Avg Time: 47m 11s | Hits:  28%/509   
      🟩 MSVC14.29          Pass: 100%/2   | Total Time:  1h 39m | Avg Time: 49m 56s | Hits:  28%/1018  
      🟩 MSVC14.39          Pass: 100%/3   | Total Time:  2h 32m | Avg Time: 50m 45s | Hits:  28%/1527  
    🟩 cxx_name
      🟩 clang              Pass: 100%/43  | Total Time: 15h 48m | Avg Time: 22m 03s | Hits:  45%/22059 
      🟩 gcc                Pass: 100%/47  | Total Time: 17h 12m | Avg Time: 21m 57s | Hits:  43%/24156 
      🟩 Intel              Pass: 100%/3   | Total Time:  1h 36m | Avg Time: 32m 08s | Hits:  33%/1548  
      🟩 MSVC               Pass: 100%/6   | Total Time:  4h 59m | Avg Time: 49m 53s | Hits:  28%/3054  
    🟩 gpu
      🟩 v100               Pass: 100%/99  | Total Time:  1d 15h | Avg Time: 24m 00s | Hits:  43%/50817 
    🟩 jobs
      🟩 build              Pass: 100%/91  | Total Time:  1d 13h | Avg Time: 24m 58s | Hits:  38%/46709 
      🟩 test               Pass: 100%/8   | Total Time:  1h 43m | Avg Time: 12m 59s | Hits:  99%/4108  
    🟩 os
      🟩 ubuntu18.04        Pass: 100%/14  | Total Time:  5h 10m | Avg Time: 22m 10s | Hits:  32%/7191  
      🟩 ubuntu20.04        Pass: 100%/35  | Total Time: 15h 07m | Avg Time: 25m 55s | Hits:  32%/17968 
      🟩 ubuntu22.04        Pass: 100%/44  | Total Time: 14h 19m | Avg Time: 19m 32s | Hits:  57%/22604 
      🟩 windows2022        Pass: 100%/6   | Total Time:  4h 59m | Avg Time: 49m 53s | Hits:  28%/3054  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total Time:  1h 40m | Avg Time: 33m 20s | Hits:  33%/1542  
      🟩 90a                Pass: 100%/4   | Total Time: 59m 54s | Avg Time: 14m 58s | Hits:  32%/2056  
    🟩 std
      🟩 11                 Pass: 100%/26  | Total Time:  8h 32m | Avg Time: 19m 43s | Hits:  46%/13354 
      🟩 14                 Pass: 100%/29  | Total Time: 12h 39m | Avg Time: 26m 10s | Hits:  40%/14881 
      🟩 17                 Pass: 100%/28  | Total Time: 12h 15m | Avg Time: 26m 17s | Hits:  40%/14372 
      🟩 20                 Pass: 100%/16  | Total Time:  6h 08m | Avg Time: 23m 02s | Hits:  48%/8210  
    
  • 🟨 cub: Pass: 98%/99 | Total Time: 1d 08h | Avg Time: 19m 45s | Hits: 88%/66563

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  98%/91  | Total Time:  1d 07h | Avg Time: 20m 58s | Hits:  87%/60931 
      🟩 arm64              Pass: 100%/8   | Total Time: 47m 24s | Avg Time:  5m 55s | Hits:  97%/5632  
    🔍 ctk: 12.4 🔍
      🟩 11.1               Pass: 100%/15  | Total Time:  3h 24m | Avg Time: 13m 38s | Hits:  88%/9350  
      🟩 11.8               Pass: 100%/3   | Total Time: 57m 35s | Avg Time: 19m 11s | Hits:  87%/2112  
      🔍 12.4               Pass:  98%/81  | Total Time:  1d 04h | Avg Time: 20m 54s | Hits:  88%/55101 
    🔍 cudacxx_full: nvcc12.4 🔍
      🟩 clang-cuda16       Pass: 100%/2   | Total Time:  6m 13s | Avg Time:  3m 06s | Hits: 100%/1116  
      🟩 nvcc11.1           Pass: 100%/15  | Total Time:  3h 24m | Avg Time: 13m 38s | Hits:  88%/9350  
      🟩 nvcc11.8           Pass: 100%/3   | Total Time: 57m 35s | Avg Time: 19m 11s | Hits:  87%/2112  
      🔍 nvcc12.4           Pass:  98%/79  | Total Time:  1d 04h | Avg Time: 21m 21s | Hits:  88%/53985 
    🔍 cudacxx_name: nvcc 🔍
      🟩 clang-cuda         Pass: 100%/2   | Total Time:  6m 13s | Avg Time:  3m 06s | Hits: 100%/1116  
      🔍 nvcc               Pass:  98%/97  | Total Time:  1d 08h | Avg Time: 20m 06s | Hits:  88%/65447 
    🔍 cxx_full: clang16 🔍
      🟩 clang9             Pass: 100%/6   | Total Time:  1h 35m | Avg Time: 15m 57s | Hits:  84%/4002  
      🟩 clang10            Pass: 100%/3   | Total Time: 58m 28s | Avg Time: 19m 29s | Hits:  83%/2118  
      🟩 clang11            Pass: 100%/4   | Total Time:  1h 11m | Avg Time: 17m 56s | Hits:  83%/2824  
      🟩 clang12            Pass: 100%/4   | Total Time:  1h 12m | Avg Time: 18m 07s | Hits:  83%/2824  
      🟩 clang13            Pass: 100%/4   | Total Time:  1h 12m | Avg Time: 18m 12s | Hits:  83%/2824  
      🟩 clang14            Pass: 100%/4   | Total Time: 47m 08s | Avg Time: 11m 47s | Hits:  91%/2824  
      🟩 clang15            Pass: 100%/4   | Total Time: 47m 50s | Avg Time: 11m 57s | Hits:  91%/2816  
      🔍 clang16            Pass:  92%/14  | Total Time:  5h 32m | Avg Time: 23m 44s | Hits:  94%/8860  
      🟩 gcc6               Pass: 100%/2   | Total Time: 18m 44s | Avg Time:  9m 22s | Hits:  98%/1256  
      🟩 gcc7               Pass: 100%/6   | Total Time:  1h 09m | Avg Time: 11m 34s | Hits:  92%/4005  
      🟩 gcc8               Pass: 100%/6   | Total Time:  1h 19m | Avg Time: 13m 13s | Hits:  90%/4005  
      🟩 gcc9               Pass: 100%/6   | Total Time:  1h 48m | Avg Time: 18m 00s | Hits:  77%/4005  
      🟩 gcc10              Pass: 100%/4   | Total Time:  1h 06m | Avg Time: 16m 30s | Hits:  85%/2824  
      🟩 gcc11              Pass: 100%/7   | Total Time:  2h 00m | Avg Time: 17m 12s | Hits:  86%/4928  
      🟩 gcc12              Pass: 100%/16  | Total Time:  7h 44m | Avg Time: 29m 00s | Hits:  92%/11264 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total Time:  1h 03m | Avg Time: 21m 10s | Hits:  85%/1890  
      🟩 MSVC14.16          Pass: 100%/1   | Total Time: 32m 39s | Avg Time: 32m 39s | Hits:  85%/549   
      🟩 MSVC14.29          Pass: 100%/2   | Total Time: 54m 35s | Avg Time: 27m 17s | Hits:  85%/1098  
      🟩 MSVC14.39          Pass: 100%/3   | Total Time:  1h 21m | Avg Time: 27m 00s | Hits:  85%/1647  
    🔍 cxx_name: clang 🔍
      🔍 clang              Pass:  97%/43  | Total Time: 13h 18m | Avg Time: 18m 34s | Hits:  88%/29092 
      🟩 gcc                Pass: 100%/47  | Total Time: 15h 26m | Avg Time: 19m 42s | Hits:  88%/32287 
      🟩 Intel              Pass: 100%/3   | Total Time:  1h 03m | Avg Time: 21m 10s | Hits:  85%/1890  
      🟩 MSVC               Pass: 100%/6   | Total Time:  2h 48m | Avg Time: 28m 02s | Hits:  85%/3294  
    🔍 jobs: test 🔍
      🟩 build              Pass: 100%/91  | Total Time: 23h 21m | Avg Time: 15m 24s | Hits:  87%/61635 
      🔍 test               Pass:  87%/8   | Total Time:  9h 14m | Avg Time:  1h 09m | Hits:  99%/4928  
    🔍 os: ubuntu22.04 🔍
      🟩 ubuntu18.04        Pass: 100%/14  | Total Time:  2h 52m | Avg Time: 12m 17s | Hits:  88%/8801  
      🟩 ubuntu20.04        Pass: 100%/35  | Total Time:  9h 47m | Avg Time: 16m 47s | Hits:  85%/24710 
      🔍 ubuntu22.04        Pass:  97%/44  | Total Time: 17h 08m | Avg Time: 23m 22s | Hits:  91%/29758 
      🟩 windows2022        Pass: 100%/6   | Total Time:  2h 48m | Avg Time: 28m 02s | Hits:  85%/3294  
    🔍 std: 14 🔍
      🟩 11                 Pass: 100%/26  | Total Time:  8h 07m | Avg Time: 18m 45s | Hits:  89%/17873 
      🔍 14                 Pass:  96%/29  | Total Time:  9h 03m | Avg Time: 18m 45s | Hits:  87%/18816 
      🟩 17                 Pass: 100%/28  | Total Time:  8h 53m | Avg Time: 19m 03s | Hits:  87%/18901 
      🟩 20                 Pass: 100%/16  | Total Time:  6h 31m | Avg Time: 24m 26s | Hits:  89%/10973 
    🟨 gpu
      🟨 v100               Pass:  98%/99  | Total Time:  1d 08h | Avg Time: 19m 45s | Hits:  88%/66563 
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total Time: 57m 35s | Avg Time: 19m 11s | Hits:  87%/2112  
      🟩 90a                Pass: 100%/4   | Total Time: 33m 37s | Avg Time:  8m 24s | Hits:  90%/2816  
    

🏃‍ Runner counts (total jobs: 198)

# Runner
154 linux-amd64-cpu16
16 linux-arm64-cpu16
16 linux-amd64-gpu-v100-latest-1
12 windows-amd64-cpu16

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental

Contributor

🟨 CI finished in 12h 50m: Pass: 99%/249 | Total: 1d 07h | Avg: 7m 36s | Max: 47m 49s | Hits: 99%/246608
  • 🟨 cub: Pass: 98%/131 | Total: 20h 38m | Avg: 9m 27s | Max: 47m 49s | Hits: 98%/107342

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  98%/123 | Total: 19h 53m | Avg:  9m 42s | Max: 47m 49s | Hits:  98%/100534
      🟩 arm64              Pass: 100%/8   | Total: 45m 47s | Avg:  5m 43s | Max:  7m 46s | Hits:  99%/6808  
    🔍 ctk: 12.4 🔍
      🟩 11.1               Pass: 100%/15  | Total:  1h 29m | Avg:  5m 56s | Max: 25m 52s | Hits:  97%/11554 
      🟩 11.8               Pass: 100%/3   | Total: 15m 43s | Avg:  5m 14s | Max:  5m 30s | Hits:  99%/2553  
      🔍 12.4               Pass:  98%/113 | Total: 18h 53m | Avg: 10m 02s | Max: 47m 49s | Hits:  98%/93235 
    🔍 cudacxx_full: nvcc12.4 🔍
      🟩 clang-cuda17       Pass: 100%/2   | Total:  7m 39s | Avg:  3m 49s | Max:  3m 51s | Hits: 100%/1408  
      🟩 nvcc11.1           Pass: 100%/15  | Total:  1h 29m | Avg:  5m 56s | Max: 25m 52s | Hits:  97%/11554 
      🟩 nvcc11.8           Pass: 100%/3   | Total: 15m 43s | Avg:  5m 14s | Max:  5m 30s | Hits:  99%/2553  
      🔍 nvcc12.4           Pass:  98%/111 | Total: 18h 46m | Avg: 10m 08s | Max: 47m 49s | Hits:  98%/91827 
    🔍 cudacxx_name: nvcc 🔍
      🟩 clang-cuda         Pass: 100%/2   | Total:  7m 39s | Avg:  3m 49s | Max:  3m 51s | Hits: 100%/1408  
      🔍 nvcc               Pass:  98%/129 | Total: 20h 31m | Avg:  9m 32s | Max: 47m 49s | Hits:  98%/105934
    🔍 cxx_full: clang17 🔍
      🟩 clang9             Pass: 100%/6   | Total: 28m 04s | Avg:  4m 40s | Max:  5m 46s | Hits:  99%/4884  
      🟩 clang10            Pass: 100%/3   | Total: 16m 19s | Avg:  5m 26s | Max:  5m 31s | Hits:  99%/2559  
      🟩 clang11            Pass: 100%/4   | Total: 20m 26s | Avg:  5m 06s | Max:  5m 18s | Hits:  99%/3412  
      🟩 clang12            Pass: 100%/4   | Total: 20m 31s | Avg:  5m 07s | Max:  5m 24s | Hits:  99%/3412  
      🟩 clang13            Pass: 100%/4   | Total: 19m 55s | Avg:  4m 58s | Max:  5m 10s | Hits:  99%/3412  
      🟩 clang14            Pass: 100%/4   | Total: 20m 05s | Avg:  5m 01s | Max:  5m 09s | Hits:  99%/3412  
      🟩 clang15            Pass: 100%/4   | Total: 20m 21s | Avg:  5m 05s | Max:  5m 17s | Hits:  99%/3404  
      🟩 clang16            Pass: 100%/4   | Total: 20m 27s | Avg:  5m 06s | Max:  5m 18s | Hits:  99%/3404  
      🔍 clang17            Pass:  92%/26  | Total:  5h 53m | Avg: 13m 36s | Max: 47m 49s | Hits:  99%/20130 
      🟩 gcc6               Pass: 100%/2   | Total:  7m 41s | Avg:  3m 50s | Max:  3m 59s | Hits:  99%/1550  
      🟩 gcc7               Pass: 100%/6   | Total: 25m 36s | Avg:  4m 16s | Max:  5m 12s | Hits:  99%/4887  
      🟩 gcc8               Pass: 100%/6   | Total: 52m 12s | Avg:  8m 42s | Max: 31m 17s | Hits:  92%/4887  
      🟩 gcc9               Pass: 100%/6   | Total: 48m 26s | Avg:  8m 04s | Max: 25m 52s | Hits:  93%/4887  
      🟩 gcc10              Pass: 100%/4   | Total: 20m 43s | Avg:  5m 10s | Max:  5m 27s | Hits:  99%/3412  
      🟩 gcc11              Pass: 100%/7   | Total: 36m 28s | Avg:  5m 12s | Max:  5m 30s | Hits:  99%/5957  
      🟩 gcc12              Pass: 100%/4   | Total: 21m 02s | Avg:  5m 15s | Max:  5m 30s | Hits:  99%/3404  
      🟩 gcc13              Pass: 100%/28  | Total:  6h 59m | Avg: 14m 58s | Max: 45m 50s | Hits:  98%/23828 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total: 15m 41s | Avg:  5m 13s | Max:  5m 18s | Hits: 100%/2331  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 14m 06s | Avg: 14m 06s | Max: 14m 06s | Hits:  98%/695   
      🟩 MSVC14.29          Pass: 100%/2   | Total: 22m 52s | Avg: 11m 26s | Max: 11m 46s | Hits:  98%/1390  
      🟩 MSVC14.39          Pass: 100%/3   | Total: 34m 50s | Avg: 11m 36s | Max: 11m 48s | Hits:  98%/2085  
    🔍 cxx_name: clang 🔍
      🔍 clang              Pass:  96%/59  | Total:  8h 40m | Avg:  8m 48s | Max: 47m 49s | Hits:  99%/48029 
      🟩 gcc                Pass: 100%/63  | Total: 10h 31m | Avg: 10m 01s | Max: 45m 50s | Hits:  97%/52812 
      🟩 Intel              Pass: 100%/3   | Total: 15m 41s | Avg:  5m 13s | Max:  5m 18s | Hits: 100%/2331  
      🟩 MSVC               Pass: 100%/6   | Total:  1h 11m | Avg: 11m 58s | Max: 14m 06s | Hits:  98%/4170  
    🔍 jobs: HostLaunch 🔍
      🟩 Build              Pass: 100%/99  | Total:  9h 35m | Avg:  5m 48s | Max: 31m 17s | Hits:  98%/81812 
      🟩 DeviceLaunch       Pass: 100%/8   | Total:  2h 49m | Avg: 21m 08s | Max: 30m 24s | Hits:  99%/6808  
      🟩 GraphCapture       Pass: 100%/8   | Total:  2h 25m | Avg: 18m 13s | Max: 45m 50s | Hits:  95%/6808  
      🔍 HostLaunch         Pass:  75%/8   | Total:  1h 59m | Avg: 14m 54s | Max: 29m 04s | Hits:  99%/5106  
      🟩 TestGPU            Pass: 100%/8   | Total:  3h 49m | Avg: 28m 42s | Max: 47m 49s | Hits:  99%/6808  
    🔍 os: ubuntu22.04 🔍
      🟩 ubuntu18.04        Pass: 100%/14  | Total:  1h 15m | Avg:  5m 21s | Max: 25m 52s | Hits:  97%/10859 
      🟩 ubuntu20.04        Pass: 100%/35  | Total:  3h 24m | Avg:  5m 51s | Max: 31m 17s | Hits:  98%/29855 
      🔍 ubuntu22.04        Pass:  97%/76  | Total: 14h 47m | Avg: 11m 40s | Max: 47m 49s | Hits:  99%/62458 
      🟩 windows2022        Pass: 100%/6   | Total:  1h 11m | Avg: 11m 58s | Max: 14m 06s | Hits:  98%/4170  
    🟨 std
      🟩 11                 Pass: 100%/34  | Total:  4h 42m | Avg:  8m 18s | Max: 25m 52s | Hits:  98%/28503 
      🟩 14                 Pass: 100%/37  | Total:  5h 35m | Avg:  9m 03s | Max: 47m 49s | Hits:  99%/30588 
      🟨 17                 Pass:  97%/36  | Total:  5h 44m | Avg:  9m 34s | Max: 32m 31s | Hits:  98%/28971 
      🟨 20                 Pass:  95%/24  | Total:  4h 36m | Avg: 11m 31s | Max: 45m 50s | Hits:  97%/19280 
    🟨 gpu
      🟨 v100               Pass:  98%/131 | Total: 20h 38m | Avg:  9m 27s | Max: 47m 49s | Hits:  98%/107342
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total: 15m 43s | Avg:  5m 14s | Max:  5m 30s | Hits:  99%/2553  
      🟩 90a                Pass: 100%/4   | Total: 15m 51s | Avg:  3m 57s | Max:  4m 08s | Hits:  99%/3404  
    
  • 🟩 thrust: Pass: 100%/118 | Total: 10h 53m | Avg: 5m 32s | Max: 18m 37s | Hits: 99%/139266

    🟩 cpu
      🟩 amd64              Pass: 100%/110 | Total: 10h 19m | Avg:  5m 37s | Max: 18m 37s | Hits:  99%/129822
      🟩 arm64              Pass: 100%/8   | Total: 34m 27s | Avg:  4m 18s | Max:  4m 33s | Hits:  99%/9444  
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  1h 02m | Avg:  4m 09s | Max: 15m 57s | Hits:  99%/17705 
      🟩 11.8               Pass: 100%/3   | Total: 11m 10s | Avg:  3m 43s | Max:  3m 48s | Hits:  99%/3543  
      🟩 12.4               Pass: 100%/100 | Total:  9h 40m | Avg:  5m 48s | Max: 18m 37s | Hits:  99%/118018
    🟩 cudacxx_full
      🟩 clang-cuda17       Pass: 100%/2   | Total:  7m 53s | Avg:  3m 56s | Max:  4m 01s | Hits: 100%/2360  
      🟩 nvcc11.1           Pass: 100%/15  | Total:  1h 02m | Avg:  4m 09s | Max: 15m 57s | Hits:  99%/17705 
      🟩 nvcc11.8           Pass: 100%/3   | Total: 11m 10s | Avg:  3m 43s | Max:  3m 48s | Hits:  99%/3543  
      🟩 nvcc12.4           Pass: 100%/98  | Total:  9h 32m | Avg:  5m 50s | Max: 18m 37s | Hits:  99%/115658
    🟩 cudacxx_name
      🟩 clang-cuda         Pass: 100%/2   | Total:  7m 53s | Avg:  3m 56s | Max:  4m 01s | Hits: 100%/2360  
      🟩 nvcc               Pass: 100%/116 | Total: 10h 46m | Avg:  5m 34s | Max: 18m 37s | Hits:  99%/136906
    🟩 cxx_full
      🟩 clang9             Pass: 100%/6   | Total: 23m 20s | Avg:  3m 53s | Max:  4m 19s | Hits: 100%/7080  
      🟩 clang10            Pass: 100%/3   | Total: 13m 01s | Avg:  4m 20s | Max:  4m 29s | Hits: 100%/3540  
      🟩 clang11            Pass: 100%/4   | Total: 15m 28s | Avg:  3m 52s | Max:  4m 08s | Hits: 100%/4720  
      🟩 clang12            Pass: 100%/4   | Total: 15m 15s | Avg:  3m 48s | Max:  3m 59s | Hits: 100%/4720  
      🟩 clang13            Pass: 100%/4   | Total: 14m 51s | Avg:  3m 42s | Max:  3m 49s | Hits: 100%/4720  
      🟩 clang14            Pass: 100%/4   | Total: 15m 03s | Avg:  3m 45s | Max:  3m 57s | Hits: 100%/4720  
      🟩 clang15            Pass: 100%/4   | Total: 15m 43s | Avg:  3m 55s | Max:  4m 11s | Hits: 100%/4720  
      🟩 clang16            Pass: 100%/4   | Total: 15m 17s | Avg:  3m 49s | Max:  4m 00s | Hits: 100%/4720  
      🟩 clang17            Pass: 100%/18  | Total:  1h 59m | Avg:  6m 39s | Max: 17m 34s | Hits: 100%/21240 
      🟩 gcc6               Pass: 100%/2   | Total:  6m 39s | Avg:  3m 19s | Max:  3m 22s | Hits:  99%/2360  
      🟩 gcc7               Pass: 100%/6   | Total: 20m 18s | Avg:  3m 23s | Max:  3m 47s | Hits:  99%/7086  
      🟩 gcc8               Pass: 100%/6   | Total: 20m 40s | Avg:  3m 26s | Max:  3m 50s | Hits:  99%/7086  
      🟩 gcc9               Pass: 100%/6   | Total: 22m 31s | Avg:  3m 45s | Max:  4m 14s | Hits:  99%/7086  
      🟩 gcc10              Pass: 100%/4   | Total: 15m 33s | Avg:  3m 53s | Max:  4m 01s | Hits:  99%/4724  
      🟩 gcc11              Pass: 100%/7   | Total: 26m 46s | Avg:  3m 49s | Max:  4m 06s | Hits:  99%/8267  
      🟩 gcc12              Pass: 100%/4   | Total: 16m 20s | Avg:  4m 05s | Max:  4m 15s | Hits:  99%/4724  
      🟩 gcc13              Pass: 100%/20  | Total:  2h 13m | Avg:  6m 41s | Max: 17m 20s | Hits:  99%/23620 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total: 14m 24s | Avg:  4m 48s | Max:  4m 56s | Hits: 100%/3549  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 15m 57s | Avg: 15m 57s | Max: 15m 57s | Hits:  98%/1176  
      🟩 MSVC14.29          Pass: 100%/2   | Total: 23m 52s | Avg: 11m 56s | Max: 11m 59s | Hits:  98%/2352  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  1h 29m | Avg: 14m 52s | Max: 18m 37s | Hits:  98%/7056  
    🟩 cxx_name
      🟩 clang              Pass: 100%/51  | Total:  4h 07m | Avg:  4m 51s | Max: 17m 34s | Hits: 100%/60180 
      🟩 gcc                Pass: 100%/55  | Total:  4h 22m | Avg:  4m 46s | Max: 17m 20s | Hits:  99%/64953 
      🟩 Intel              Pass: 100%/3   | Total: 14m 24s | Avg:  4m 48s | Max:  4m 56s | Hits: 100%/3549  
      🟩 MSVC               Pass: 100%/9   | Total:  2h 09m | Avg: 14m 20s | Max: 18m 37s | Hits:  98%/10584 
    🟩 gpu
      🟩 v100               Pass: 100%/118 | Total: 10h 53m | Avg:  5m 32s | Max: 18m 37s | Hits:  99%/139266
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  7h 13m | Avg:  4m 22s | Max: 15m 57s | Hits:  99%/116850
      🟩 TestCPU            Pass: 100%/11  | Total:  1h 46m | Avg:  9m 38s | Max: 18m 37s | Hits:  99%/12972 
      🟩 TestGPU            Pass: 100%/8   | Total:  1h 54m | Avg: 14m 15s | Max: 17m 34s | Hits:  99%/9444  
    🟩 os
      🟩 ubuntu18.04        Pass: 100%/14  | Total: 46m 29s | Avg:  3m 19s | Max:  3m 39s | Hits:  99%/16529 
      🟩 ubuntu20.04        Pass: 100%/35  | Total:  2h 16m | Avg:  3m 53s | Max:  4m 29s | Hits:  99%/41313 
      🟩 ubuntu22.04        Pass: 100%/60  | Total:  5h 42m | Avg:  5m 42s | Max: 17m 34s | Hits:  99%/70840 
      🟩 windows2022        Pass: 100%/9   | Total:  2h 09m | Avg: 14m 20s | Max: 18m 37s | Hits:  98%/10584 
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total: 11m 10s | Avg:  3m 43s | Max:  3m 48s | Hits:  99%/3543  
      🟩 90a                Pass: 100%/4   | Total: 13m 12s | Avg:  3m 18s | Max:  3m 25s | Hits:  99%/4724  
    🟩 std
      🟩 11                 Pass: 100%/30  | Total:  2h 18m | Avg:  4m 36s | Max: 17m 20s | Hits:  99%/35418 
      🟩 14                 Pass: 100%/34  | Total:  3h 18m | Avg:  5m 50s | Max: 16m 51s | Hits:  99%/40122 
      🟩 17                 Pass: 100%/33  | Total:  3h 03m | Avg:  5m 33s | Max: 18m 37s | Hits:  99%/38946 
      🟩 20                 Pass: 100%/21  | Total:  2h 13m | Avg:  6m 22s | Max: 17m 36s | Hits:  99%/24780 
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental

🏃‍ Runner counts (total jobs: 249)

# Runner
178 linux-amd64-cpu16
40 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16

Collaborator

@gevtushenko gevtushenko left a comment

Awesome work! I have a few questions that we should consider before merging this.

cub/cub/agent/single_pass_scan_operators.cuh (outdated, resolved)
cub/cub/agent/single_pass_scan_operators.cuh (outdated, resolved)
cub/cub/agent/single_pass_scan_operators.cuh (outdated, resolved)
cub/cub/agent/single_pass_scan_operators.cuh (outdated, resolved)
cub/benchmarks/bench/select/if.cu (outdated, resolved)
@elstehle elstehle requested a review from gevtushenko June 28, 2024 20:59
Contributor

🟨 CI finished in 3h 15m: Pass: 99%/249 | Total: 4d 20h | Avg: 28m 10s | Max: 1h 19m | Hits: 65%/247587
  • 🟨 cub: Pass: 99%/131 | Total: 2d 17h | Avg: 30m 04s | Max: 54m 32s | Hits: 58%/108321

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  99%/123 | Total:  2d 13h | Avg: 29m 49s | Max: 54m 32s | Hits:  59%/101505
      🟩 arm64              Pass: 100%/8   | Total:  4h 32m | Avg: 34m 04s | Max: 36m 16s | Hits:  45%/6816  
    🔍 ctk: 12.4 🔍
      🟩 11.1               Pass: 100%/15  | Total:  6h 52m | Avg: 27m 30s | Max: 47m 44s | Hits:  43%/11568 
      🟩 11.8               Pass: 100%/3   | Total:  2h 12m | Avg: 44m 16s | Max: 47m 28s | Hits:  44%/2556  
      🔍 12.4               Pass:  99%/113 | Total:  2d 08h | Avg: 30m 02s | Max: 54m 32s | Hits:  60%/94197 
    🔍 cudacxx: nvcc12.4 🔍
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 42m 44s | Avg: 21m 22s | Max: 22m 30s | Hits:  47%/1408  
      🟩 nvcc11.1           Pass: 100%/15  | Total:  6h 52m | Avg: 27m 30s | Max: 47m 44s | Hits:  43%/11568 
      🟩 nvcc11.8           Pass: 100%/3   | Total:  2h 12m | Avg: 44m 16s | Max: 47m 28s | Hits:  44%/2556  
      🔍 nvcc12.4           Pass:  99%/111 | Total:  2d 07h | Avg: 30m 12s | Max: 54m 32s | Hits:  60%/92789 
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total: 42m 44s | Avg: 21m 22s | Max: 22m 30s | Hits:  47%/1408  
      🔍 nvcc               Pass:  99%/129 | Total:  2d 16h | Avg: 30m 12s | Max: 54m 32s | Hits:  58%/106913
    🔍 cxx: GCC13 🔍
      🟩 Clang9             Pass: 100%/6   | Total:  2h 57m | Avg: 29m 37s | Max: 35m 45s | Hits:  44%/4890  
      🟩 Clang10            Pass: 100%/3   | Total:  1h 42m | Avg: 34m 11s | Max: 36m 13s | Hits:  45%/2562  
      🟩 Clang11            Pass: 100%/4   | Total:  2h 14m | Avg: 33m 36s | Max: 34m 17s | Hits:  45%/3416  
      🟩 Clang12            Pass: 100%/4   | Total:  2h 18m | Avg: 34m 33s | Max: 35m 24s | Hits:  45%/3416  
      🟩 Clang13            Pass: 100%/4   | Total:  2h 14m | Avg: 33m 39s | Max: 35m 13s | Hits:  45%/3416  
      🟩 Clang14            Pass: 100%/4   | Total:  2h 16m | Avg: 34m 08s | Max: 35m 49s | Hits:  45%/3416  
      🟩 Clang15            Pass: 100%/4   | Total:  2h 18m | Avg: 34m 32s | Max: 36m 02s | Hits:  45%/3408  
      🟩 Clang16            Pass: 100%/4   | Total:  2h 12m | Avg: 33m 01s | Max: 34m 43s | Hits:  45%/3408  
      🟩 Clang17            Pass: 100%/26  | Total: 10h 42m | Avg: 24m 42s | Max: 36m 10s | Hits:  79%/21856 
      🟩 GCC6               Pass: 100%/2   | Total: 54m 41s | Avg: 27m 20s | Max: 28m 06s | Hits:  43%/1552  
      🟩 GCC7               Pass: 100%/6   | Total:  3h 02m | Avg: 30m 22s | Max: 36m 52s | Hits:  44%/4893  
      🟩 GCC8               Pass: 100%/6   | Total:  2h 59m | Avg: 29m 59s | Max: 36m 51s | Hits:  44%/4893  
      🟩 GCC9               Pass: 100%/6   | Total:  3h 05m | Avg: 30m 50s | Max: 35m 10s | Hits:  44%/4893  
      🟩 GCC10              Pass: 100%/4   | Total:  2h 20m | Avg: 35m 04s | Max: 37m 40s | Hits:  44%/3416  
      🟩 GCC11              Pass: 100%/7   | Total:  4h 31m | Avg: 38m 45s | Max: 47m 28s | Hits:  44%/5964  
      🟩 GCC12              Pass: 100%/4   | Total:  2h 21m | Avg: 35m 21s | Max: 36m 35s | Hits:  44%/3408  
      🔍 GCC13              Pass:  96%/28  | Total: 10h 41m | Avg: 22m 54s | Max: 36m 16s | Hits:  75%/23004 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 58m | Avg: 39m 25s | Max: 40m 19s | Hits:  43%/2340  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 47m 44s | Avg: 47m 44s | Max: 47m 44s | Hits:  46%/695   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 34m | Avg: 47m 01s | Max: 48m 18s | Hits:  46%/1390  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 27m | Avg: 49m 13s | Max: 54m 32s | Hits:  46%/2085  
    🔍 cxx_family: GCC 🔍
      🟩 Clang              Pass: 100%/59  | Total:  1d 04h | Avg: 29m 26s | Max: 36m 13s | Hits:  60%/49788 
      🔍 GCC                Pass:  98%/63  | Total:  1d 05h | Avg: 28m 30s | Max: 47m 28s | Hits:  58%/52023 
      🟩 Intel              Pass: 100%/3   | Total:  1h 58m | Avg: 39m 25s | Max: 40m 19s | Hits:  43%/2340  
      🟩 MSVC               Pass: 100%/6   | Total:  4h 49m | Avg: 48m 14s | Max: 54m 32s | Hits:  46%/4170  
    🔍 jobs: GraphCapture 🔍
      🟩 Build              Pass: 100%/99  | Total:  2d 07h | Avg: 33m 33s | Max: 54m 32s | Hits:  44%/81909 
      🟩 DeviceLaunch       Pass: 100%/8   | Total:  2h 27m | Avg: 18m 28s | Max: 22m 08s | Hits:  99%/6816  
      🔍 GraphCapture       Pass:  87%/8   | Total:  1h 51m | Avg: 13m 58s | Max: 19m 22s | Hits:  99%/5964  
      🟩 HostLaunch         Pass: 100%/8   | Total:  2h 44m | Avg: 20m 31s | Max: 29m 08s | Hits:  99%/6816  
      🟩 TestGPU            Pass: 100%/8   | Total:  3h 13m | Avg: 24m 14s | Max: 29m 44s | Hits:  99%/6816  
    🔍 std: 17 🔍
      🟩 11                 Pass: 100%/34  | Total: 16h 36m | Avg: 29m 18s | Max: 47m 28s | Hits:  57%/28539 
      🟩 14                 Pass: 100%/37  | Total: 19h 11m | Avg: 31m 07s | Max: 48m 35s | Hits:  57%/30624 
      🔍 17                 Pass:  97%/36  | Total: 17h 49m | Avg: 29m 42s | Max: 48m 18s | Hits:  56%/29005 
      🟩 20                 Pass: 100%/24  | Total: 12h 02m | Avg: 30m 06s | Max: 54m 32s | Hits:  63%/20153 
    🟨 gpu
      🟨 v100               Pass:  99%/131 | Total:  2d 17h | Avg: 30m 04s | Max: 54m 32s | Hits:  58%/108321
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  2h 12m | Avg: 44m 16s | Max: 47m 28s | Hits:  44%/2556  
      🟩 90a                Pass: 100%/4   | Total:  1h 14m | Avg: 18m 40s | Max: 19m 15s | Hits:  44%/3408  
    
  • 🟩 thrust: Pass: 100%/118 | Total: 2d 03h | Avg: 26m 03s | Max: 1h 19m | Hits: 71%/139266

    🟩 cpu
      🟩 amd64              Pass: 100%/110 | Total:  1d 23h | Avg: 26m 06s | Max:  1h 19m | Hits:  72%/129822
      🟩 arm64              Pass: 100%/8   | Total:  3h 22m | Avg: 25m 18s | Max: 27m 56s | Hits:  66%/9444  
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  6h 14m | Avg: 24m 57s | Max: 49m 26s | Hits:  66%/17705 
      🟩 11.8               Pass: 100%/3   | Total:  1h 49m | Avg: 36m 24s | Max: 40m 05s | Hits:  66%/3543  
      🟩 12.4               Pass: 100%/100 | Total:  1d 19h | Avg: 25m 54s | Max:  1h 19m | Hits:  72%/118018
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 51m 04s | Avg: 25m 32s | Max: 26m 57s | Hits:  66%/2360  
      🟩 nvcc11.1           Pass: 100%/15  | Total:  6h 14m | Avg: 24m 57s | Max: 49m 26s | Hits:  66%/17705 
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 49m | Avg: 36m 24s | Max: 40m 05s | Hits:  66%/3543  
      🟩 nvcc12.4           Pass: 100%/98  | Total:  1d 18h | Avg: 25m 54s | Max:  1h 19m | Hits:  72%/115658
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 51m 04s | Avg: 25m 32s | Max: 26m 57s | Hits:  66%/2360  
      🟩 nvcc               Pass: 100%/116 | Total:  2d 02h | Avg: 26m 03s | Max:  1h 19m | Hits:  71%/136906
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  2h 26m | Avg: 24m 24s | Max: 29m 58s | Hits:  66%/7080  
      🟩 Clang10            Pass: 100%/3   | Total:  1h 18m | Avg: 26m 06s | Max: 28m 17s | Hits:  66%/3540  
      🟩 Clang11            Pass: 100%/4   | Total:  1h 46m | Avg: 26m 30s | Max: 27m 58s | Hits:  66%/4720  
      🟩 Clang12            Pass: 100%/4   | Total:  1h 42m | Avg: 25m 42s | Max: 27m 24s | Hits:  66%/4720  
      🟩 Clang13            Pass: 100%/4   | Total:  1h 44m | Avg: 26m 14s | Max: 30m 02s | Hits:  66%/4720  
      🟩 Clang14            Pass: 100%/4   | Total:  1h 48m | Avg: 27m 14s | Max: 29m 28s | Hits:  66%/4720  
      🟩 Clang15            Pass: 100%/4   | Total:  1h 42m | Avg: 25m 43s | Max: 27m 29s | Hits:  66%/4720  
      🟩 Clang16            Pass: 100%/4   | Total:  1h 42m | Avg: 25m 39s | Max: 27m 33s | Hits:  66%/4720  
      🟩 Clang17            Pass: 100%/18  | Total:  6h 53m | Avg: 22m 59s | Max:  1h 19m | Hits:  81%/21240 
      🟩 GCC6               Pass: 100%/2   | Total: 47m 18s | Avg: 23m 39s | Max: 25m 24s | Hits:  67%/2360  
      🟩 GCC7               Pass: 100%/6   | Total:  2h 28m | Avg: 24m 47s | Max: 29m 24s | Hits:  66%/7086  
      🟩 GCC8               Pass: 100%/6   | Total:  2h 29m | Avg: 24m 56s | Max: 29m 06s | Hits:  66%/7086  
      🟩 GCC9               Pass: 100%/6   | Total:  2h 33m | Avg: 25m 39s | Max: 29m 46s | Hits:  66%/7086  
      🟩 GCC10              Pass: 100%/4   | Total:  1h 54m | Avg: 28m 36s | Max: 31m 26s | Hits:  66%/4724  
      🟩 GCC11              Pass: 100%/7   | Total:  3h 37m | Avg: 31m 05s | Max: 40m 05s | Hits:  66%/8267  
      🟩 GCC12              Pass: 100%/4   | Total:  1h 52m | Avg: 28m 14s | Max: 29m 56s | Hits:  66%/4724  
      🟩 GCC13              Pass: 100%/20  | Total:  6h 31m | Avg: 19m 35s | Max: 30m 28s | Hits:  78%/23620 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 40m | Avg: 33m 30s | Max: 36m 39s | Hits:  67%/3549  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 49m 26s | Avg: 49m 26s | Max: 49m 26s | Hits:  64%/1176  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 42m | Avg: 51m 02s | Max: 53m 21s | Hits:  64%/2352  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  3h 39m | Avg: 36m 30s | Max: 57m 46s | Hits:  81%/7056  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/51  | Total: 21h 06m | Avg: 24m 50s | Max:  1h 19m | Hits:  71%/60180 
      🟩 GCC                Pass: 100%/55  | Total: 22h 16m | Avg: 24m 17s | Max: 40m 05s | Hits:  70%/64953 
      🟩 Intel              Pass: 100%/3   | Total:  1h 40m | Avg: 33m 30s | Max: 36m 39s | Hits:  67%/3549  
      🟩 MSVC               Pass: 100%/9   | Total:  6h 10m | Avg: 41m 10s | Max: 57m 46s | Hits:  76%/10584 
    🟩 gpu
      🟩 v100               Pass: 100%/118 | Total:  2d 03h | Avg: 26m 03s | Max:  1h 19m | Hits:  71%/139266
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  1d 21h | Avg: 27m 38s | Max: 57m 46s | Hits:  66%/116850
      🟩 TestCPU            Pass: 100%/11  | Total:  2h 12m | Avg: 12m 02s | Max: 30m 28s | Hits:  96%/12972 
      🟩 TestGPU            Pass: 100%/8   | Total:  3h 24m | Avg: 25m 34s | Max:  1h 19m | Hits:  99%/9444  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 49m | Avg: 36m 24s | Max: 40m 05s | Hits:  66%/3543  
      🟩 90a                Pass: 100%/4   | Total: 59m 15s | Avg: 14m 48s | Max: 16m 00s | Hits:  66%/4724  
    🟩 std
      🟩 11                 Pass: 100%/30  | Total: 10h 49m | Avg: 21m 38s | Max: 30m 35s | Hits:  72%/35418 
      🟩 14                 Pass: 100%/34  | Total: 15h 32m | Avg: 27m 25s | Max: 53m 21s | Hits:  70%/40122 
      🟩 17                 Pass: 100%/33  | Total: 15h 04m | Avg: 27m 23s | Max: 51m 23s | Hits:  69%/38946 
      🟩 20                 Pass: 100%/21  | Total:  9h 48m | Avg: 28m 01s | Max:  1h 19m | Hits:  73%/24780 
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental

🏃‍ Runner counts (total jobs: 249)

# Runner
178 linux-amd64-cpu16
40 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16

Contributor

🟩 CI finished in 7h 06m: Pass: 100%/249 | Total: 4d 21h | Avg: 28m 12s | Max: 1h 19m | Hits: 65%/248439
  • 🟩 cub: Pass: 100%/131 | Total: 2d 17h | Avg: 30m 08s | Max: 54m 32s | Hits: 58%/109173

    🟩 cpu
      🟩 amd64              Pass: 100%/123 | Total:  2d 13h | Avg: 29m 53s | Max: 54m 32s | Hits:  59%/102357
      🟩 arm64              Pass: 100%/8   | Total:  4h 32m | Avg: 34m 04s | Max: 36m 16s | Hits:  45%/6816  
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  6h 52m | Avg: 27m 30s | Max: 47m 44s | Hits:  43%/11568 
      🟩 11.8               Pass: 100%/3   | Total:  2h 12m | Avg: 44m 16s | Max: 47m 28s | Hits:  44%/2556  
      🟩 12.4               Pass: 100%/113 | Total:  2d 08h | Avg: 30m 07s | Max: 54m 32s | Hits:  60%/95049 
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 42m 44s | Avg: 21m 22s | Max: 22m 30s | Hits:  47%/1408  
      🟩 nvcc11.1           Pass: 100%/15  | Total:  6h 52m | Avg: 27m 30s | Max: 47m 44s | Hits:  43%/11568 
      🟩 nvcc11.8           Pass: 100%/3   | Total:  2h 12m | Avg: 44m 16s | Max: 47m 28s | Hits:  44%/2556  
      🟩 nvcc12.4           Pass: 100%/111 | Total:  2d 08h | Avg: 30m 16s | Max: 54m 32s | Hits:  61%/93641 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 42m 44s | Avg: 21m 22s | Max: 22m 30s | Hits:  47%/1408  
      🟩 nvcc               Pass: 100%/129 | Total:  2d 17h | Avg: 30m 17s | Max: 54m 32s | Hits:  58%/107765
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  2h 57m | Avg: 29m 37s | Max: 35m 45s | Hits:  44%/4890  
      🟩 Clang10            Pass: 100%/3   | Total:  1h 42m | Avg: 34m 11s | Max: 36m 13s | Hits:  45%/2562  
      🟩 Clang11            Pass: 100%/4   | Total:  2h 14m | Avg: 33m 36s | Max: 34m 17s | Hits:  45%/3416  
      🟩 Clang12            Pass: 100%/4   | Total:  2h 18m | Avg: 34m 33s | Max: 35m 24s | Hits:  45%/3416  
      🟩 Clang13            Pass: 100%/4   | Total:  2h 14m | Avg: 33m 39s | Max: 35m 13s | Hits:  45%/3416  
      🟩 Clang14            Pass: 100%/4   | Total:  2h 16m | Avg: 34m 08s | Max: 35m 49s | Hits:  45%/3416  
      🟩 Clang15            Pass: 100%/4   | Total:  2h 18m | Avg: 34m 32s | Max: 36m 02s | Hits:  45%/3408  
      🟩 Clang16            Pass: 100%/4   | Total:  2h 12m | Avg: 33m 01s | Max: 34m 43s | Hits:  45%/3408  
      🟩 Clang17            Pass: 100%/26  | Total: 10h 42m | Avg: 24m 42s | Max: 36m 10s | Hits:  79%/21856 
      🟩 GCC6               Pass: 100%/2   | Total: 54m 41s | Avg: 27m 20s | Max: 28m 06s | Hits:  43%/1552  
      🟩 GCC7               Pass: 100%/6   | Total:  3h 02m | Avg: 30m 22s | Max: 36m 52s | Hits:  44%/4893  
      🟩 GCC8               Pass: 100%/6   | Total:  2h 59m | Avg: 29m 59s | Max: 36m 51s | Hits:  44%/4893  
      🟩 GCC9               Pass: 100%/6   | Total:  3h 05m | Avg: 30m 50s | Max: 35m 10s | Hits:  44%/4893  
      🟩 GCC10              Pass: 100%/4   | Total:  2h 20m | Avg: 35m 04s | Max: 37m 40s | Hits:  44%/3416  
      🟩 GCC11              Pass: 100%/7   | Total:  4h 31m | Avg: 38m 45s | Max: 47m 28s | Hits:  44%/5964  
      🟩 GCC12              Pass: 100%/4   | Total:  2h 21m | Avg: 35m 21s | Max: 36m 35s | Hits:  44%/3408  
      🟩 GCC13              Pass: 100%/28  | Total: 10h 50m | Avg: 23m 12s | Max: 36m 16s | Hits:  76%/23856 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 58m | Avg: 39m 25s | Max: 40m 19s | Hits:  43%/2340  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 47m 44s | Avg: 47m 44s | Max: 47m 44s | Hits:  46%/695   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 34m | Avg: 47m 01s | Max: 48m 18s | Hits:  46%/1390  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 27m | Avg: 49m 13s | Max: 54m 32s | Hits:  46%/2085  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/59  | Total:  1d 04h | Avg: 29m 26s | Max: 36m 13s | Hits:  60%/49788 
      🟩 GCC                Pass: 100%/63  | Total:  1d 06h | Avg: 28m 39s | Max: 47m 28s | Hits:  58%/52875 
      🟩 Intel              Pass: 100%/3   | Total:  1h 58m | Avg: 39m 25s | Max: 40m 19s | Hits:  43%/2340  
      🟩 MSVC               Pass: 100%/6   | Total:  4h 49m | Avg: 48m 14s | Max: 54m 32s | Hits:  46%/4170  
    🟩 gpu
      🟩 v100               Pass: 100%/131 | Total:  2d 17h | Avg: 30m 08s | Max: 54m 32s | Hits:  58%/109173
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  2d 07h | Avg: 33m 33s | Max: 54m 32s | Hits:  44%/81909 
      🟩 DeviceLaunch       Pass: 100%/8   | Total:  2h 27m | Avg: 18m 28s | Max: 22m 08s | Hits:  99%/6816  
      🟩 GraphCapture       Pass: 100%/8   | Total:  2h 00m | Avg: 15m 03s | Max: 19m 22s | Hits:  99%/6816  
      🟩 HostLaunch         Pass: 100%/8   | Total:  2h 44m | Avg: 20m 31s | Max: 29m 08s | Hits:  99%/6816  
      🟩 TestGPU            Pass: 100%/8   | Total:  3h 13m | Avg: 24m 14s | Max: 29m 44s | Hits:  99%/6816  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  2h 12m | Avg: 44m 16s | Max: 47m 28s | Hits:  44%/2556  
      🟩 90a                Pass: 100%/4   | Total:  1h 14m | Avg: 18m 40s | Max: 19m 15s | Hits:  44%/3408  
    🟩 std
      🟩 11                 Pass: 100%/34  | Total: 16h 36m | Avg: 29m 18s | Max: 47m 28s | Hits:  57%/28539 
      🟩 14                 Pass: 100%/37  | Total: 19h 11m | Avg: 31m 07s | Max: 48m 35s | Hits:  57%/30624 
      🟩 17                 Pass: 100%/36  | Total: 17h 58m | Avg: 29m 57s | Max: 48m 18s | Hits:  57%/29857 
      🟩 20                 Pass: 100%/24  | Total: 12h 02m | Avg: 30m 06s | Max: 54m 32s | Hits:  63%/20153 
    
  • 🟩 thrust: Pass: 100%/118 | Total: 2d 03h | Avg: 26m 03s | Max: 1h 19m | Hits: 71%/139266

    🟩 cpu
      🟩 amd64              Pass: 100%/110 | Total:  1d 23h | Avg: 26m 06s | Max:  1h 19m | Hits:  72%/129822
      🟩 arm64              Pass: 100%/8   | Total:  3h 22m | Avg: 25m 18s | Max: 27m 56s | Hits:  66%/9444  
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  6h 14m | Avg: 24m 57s | Max: 49m 26s | Hits:  66%/17705 
      🟩 11.8               Pass: 100%/3   | Total:  1h 49m | Avg: 36m 24s | Max: 40m 05s | Hits:  66%/3543  
      🟩 12.4               Pass: 100%/100 | Total:  1d 19h | Avg: 25m 54s | Max:  1h 19m | Hits:  72%/118018
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 51m 04s | Avg: 25m 32s | Max: 26m 57s | Hits:  66%/2360  
      🟩 nvcc11.1           Pass: 100%/15  | Total:  6h 14m | Avg: 24m 57s | Max: 49m 26s | Hits:  66%/17705 
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 49m | Avg: 36m 24s | Max: 40m 05s | Hits:  66%/3543  
      🟩 nvcc12.4           Pass: 100%/98  | Total:  1d 18h | Avg: 25m 54s | Max:  1h 19m | Hits:  72%/115658
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 51m 04s | Avg: 25m 32s | Max: 26m 57s | Hits:  66%/2360  
      🟩 nvcc               Pass: 100%/116 | Total:  2d 02h | Avg: 26m 03s | Max:  1h 19m | Hits:  71%/136906
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  2h 26m | Avg: 24m 24s | Max: 29m 58s | Hits:  66%/7080  
      🟩 Clang10            Pass: 100%/3   | Total:  1h 18m | Avg: 26m 06s | Max: 28m 17s | Hits:  66%/3540  
      🟩 Clang11            Pass: 100%/4   | Total:  1h 46m | Avg: 26m 30s | Max: 27m 58s | Hits:  66%/4720  
      🟩 Clang12            Pass: 100%/4   | Total:  1h 42m | Avg: 25m 42s | Max: 27m 24s | Hits:  66%/4720  
      🟩 Clang13            Pass: 100%/4   | Total:  1h 44m | Avg: 26m 14s | Max: 30m 02s | Hits:  66%/4720  
      🟩 Clang14            Pass: 100%/4   | Total:  1h 48m | Avg: 27m 14s | Max: 29m 28s | Hits:  66%/4720  
      🟩 Clang15            Pass: 100%/4   | Total:  1h 42m | Avg: 25m 43s | Max: 27m 29s | Hits:  66%/4720  
      🟩 Clang16            Pass: 100%/4   | Total:  1h 42m | Avg: 25m 39s | Max: 27m 33s | Hits:  66%/4720  
      🟩 Clang17            Pass: 100%/18  | Total:  6h 53m | Avg: 22m 59s | Max:  1h 19m | Hits:  81%/21240 
      🟩 GCC6               Pass: 100%/2   | Total: 47m 18s | Avg: 23m 39s | Max: 25m 24s | Hits:  67%/2360  
      🟩 GCC7               Pass: 100%/6   | Total:  2h 28m | Avg: 24m 47s | Max: 29m 24s | Hits:  66%/7086  
      🟩 GCC8               Pass: 100%/6   | Total:  2h 29m | Avg: 24m 56s | Max: 29m 06s | Hits:  66%/7086  
      🟩 GCC9               Pass: 100%/6   | Total:  2h 33m | Avg: 25m 39s | Max: 29m 46s | Hits:  66%/7086  
      🟩 GCC10              Pass: 100%/4   | Total:  1h 54m | Avg: 28m 36s | Max: 31m 26s | Hits:  66%/4724  
      🟩 GCC11              Pass: 100%/7   | Total:  3h 37m | Avg: 31m 05s | Max: 40m 05s | Hits:  66%/8267  
      🟩 GCC12              Pass: 100%/4   | Total:  1h 52m | Avg: 28m 14s | Max: 29m 56s | Hits:  66%/4724  
      🟩 GCC13              Pass: 100%/20  | Total:  6h 31m | Avg: 19m 35s | Max: 30m 28s | Hits:  78%/23620 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 40m | Avg: 33m 30s | Max: 36m 39s | Hits:  67%/3549  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 49m 26s | Avg: 49m 26s | Max: 49m 26s | Hits:  64%/1176  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 42m | Avg: 51m 02s | Max: 53m 21s | Hits:  64%/2352  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  3h 39m | Avg: 36m 30s | Max: 57m 46s | Hits:  81%/7056  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/51  | Total: 21h 06m | Avg: 24m 50s | Max:  1h 19m | Hits:  71%/60180 
      🟩 GCC                Pass: 100%/55  | Total: 22h 16m | Avg: 24m 17s | Max: 40m 05s | Hits:  70%/64953 
      🟩 Intel              Pass: 100%/3   | Total:  1h 40m | Avg: 33m 30s | Max: 36m 39s | Hits:  67%/3549  
      🟩 MSVC               Pass: 100%/9   | Total:  6h 10m | Avg: 41m 10s | Max: 57m 46s | Hits:  76%/10584 
    🟩 gpu
      🟩 v100               Pass: 100%/118 | Total:  2d 03h | Avg: 26m 03s | Max:  1h 19m | Hits:  71%/139266
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  1d 21h | Avg: 27m 38s | Max: 57m 46s | Hits:  66%/116850
      🟩 TestCPU            Pass: 100%/11  | Total:  2h 12m | Avg: 12m 02s | Max: 30m 28s | Hits:  96%/12972 
      🟩 TestGPU            Pass: 100%/8   | Total:  3h 24m | Avg: 25m 34s | Max:  1h 19m | Hits:  99%/9444  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 49m | Avg: 36m 24s | Max: 40m 05s | Hits:  66%/3543  
      🟩 90a                Pass: 100%/4   | Total: 59m 15s | Avg: 14m 48s | Max: 16m 00s | Hits:  66%/4724  
    🟩 std
      🟩 11                 Pass: 100%/30  | Total: 10h 49m | Avg: 21m 38s | Max: 30m 35s | Hits:  72%/35418 
      🟩 14                 Pass: 100%/34  | Total: 15h 32m | Avg: 27m 25s | Max: 53m 21s | Hits:  70%/40122 
      🟩 17                 Pass: 100%/33  | Total: 15h 04m | Avg: 27m 23s | Max: 51m 23s | Hits:  69%/38946 
      🟩 20                 Pass: 100%/21  | Total:  9h 48m | Avg: 28m 01s | Max:  1h 19m | Hits:  73%/24780 
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental

🏃‍ Runner counts (total jobs: 249)

# Runner
178 linux-amd64-cpu16
40 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16

Contributor

github-actions bot commented Jul 3, 2024

🟩 CI finished in 3h 38m: Pass: 100%/249 | Total: 4d 17h | Avg: 27m 18s | Max: 50m 52s | Hits: 66%/248439
  • 🟩 cub: Pass: 100%/131 | Total: 2d 17h | Avg: 29m 49s | Max: 48m 29s | Hits: 58%/109173

    🟩 cpu
      🟩 amd64              Pass: 100%/123 | Total:  2d 12h | Avg: 29m 33s | Max: 48m 29s | Hits:  59%/102357
      🟩 arm64              Pass: 100%/8   | Total:  4h 30m | Avg: 33m 52s | Max: 37m 01s | Hits:  45%/6816  
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  6h 45m | Avg: 27m 00s | Max: 47m 45s | Hits:  43%/11568 
      🟩 11.8               Pass: 100%/3   | Total:  2h 14m | Avg: 44m 54s | Max: 46m 12s | Hits:  44%/2556  
      🟩 12.4               Pass: 100%/113 | Total:  2d 08h | Avg: 29m 48s | Max: 48m 29s | Hits:  60%/95049 
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 42m 25s | Avg: 21m 12s | Max: 21m 53s | Hits:  47%/1408  
      🟩 nvcc11.1           Pass: 100%/15  | Total:  6h 45m | Avg: 27m 00s | Max: 47m 45s | Hits:  43%/11568 
      🟩 nvcc11.8           Pass: 100%/3   | Total:  2h 14m | Avg: 44m 54s | Max: 46m 12s | Hits:  44%/2556  
      🟩 nvcc12.4           Pass: 100%/111 | Total:  2d 07h | Avg: 29m 57s | Max: 48m 29s | Hits:  61%/93641 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 42m 25s | Avg: 21m 12s | Max: 21m 53s | Hits:  47%/1408  
      🟩 nvcc               Pass: 100%/129 | Total:  2d 16h | Avg: 29m 57s | Max: 48m 29s | Hits:  58%/107765
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  2h 57m | Avg: 29m 39s | Max: 34m 43s | Hits:  44%/4890  
      🟩 Clang10            Pass: 100%/3   | Total:  1h 39m | Avg: 33m 14s | Max: 33m 51s | Hits:  45%/2562  
      🟩 Clang11            Pass: 100%/4   | Total:  2h 18m | Avg: 34m 44s | Max: 37m 11s | Hits:  45%/3416  
      🟩 Clang12            Pass: 100%/4   | Total:  2h 12m | Avg: 33m 05s | Max: 34m 09s | Hits:  45%/3416  
      🟩 Clang13            Pass: 100%/4   | Total:  2h 13m | Avg: 33m 27s | Max: 36m 45s | Hits:  45%/3416  
      🟩 Clang14            Pass: 100%/4   | Total:  2h 10m | Avg: 32m 33s | Max: 33m 47s | Hits:  45%/3416  
      🟩 Clang15            Pass: 100%/4   | Total:  2h 10m | Avg: 32m 31s | Max: 33m 27s | Hits:  45%/3408  
      🟩 Clang16            Pass: 100%/4   | Total:  2h 10m | Avg: 32m 31s | Max: 34m 01s | Hits:  45%/3408  
      🟩 Clang17            Pass: 100%/26  | Total: 11h 01m | Avg: 25m 25s | Max: 33m 52s | Hits:  79%/21856 
      🟩 GCC6               Pass: 100%/2   | Total: 50m 35s | Avg: 25m 17s | Max: 25m 38s | Hits:  43%/1552  
      🟩 GCC7               Pass: 100%/6   | Total:  2h 53m | Avg: 28m 58s | Max: 32m 50s | Hits:  44%/4893  
      🟩 GCC8               Pass: 100%/6   | Total:  2h 52m | Avg: 28m 45s | Max: 32m 49s | Hits:  44%/4893  
      🟩 GCC9               Pass: 100%/6   | Total:  3h 03m | Avg: 30m 34s | Max: 37m 46s | Hits:  44%/4893  
      🟩 GCC10              Pass: 100%/4   | Total:  2h 20m | Avg: 35m 03s | Max: 37m 28s | Hits:  44%/3416  
      🟩 GCC11              Pass: 100%/7   | Total:  4h 30m | Avg: 38m 38s | Max: 46m 12s | Hits:  44%/5964  
      🟩 GCC12              Pass: 100%/4   | Total:  2h 17m | Avg: 34m 24s | Max: 34m 51s | Hits:  44%/3408  
      🟩 GCC13              Pass: 100%/28  | Total: 11h 01m | Avg: 23m 37s | Max: 37m 01s | Hits:  76%/23856 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 49m | Avg: 36m 27s | Max: 38m 17s | Hits:  43%/2340  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 47m 45s | Avg: 47m 45s | Max: 47m 45s | Hits:  46%/695   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 28m | Avg: 44m 00s | Max: 44m 09s | Hits:  46%/1390  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 17m | Avg: 45m 59s | Max: 48m 29s | Hits:  46%/2085  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/59  | Total:  1d 04h | Avg: 29m 23s | Max: 37m 11s | Hits:  60%/49788 
      🟩 GCC                Pass: 100%/63  | Total:  1d 05h | Avg: 28m 24s | Max: 46m 12s | Hits:  58%/52875 
      🟩 Intel              Pass: 100%/3   | Total:  1h 49m | Avg: 36m 27s | Max: 38m 17s | Hits:  43%/2340  
      🟩 MSVC               Pass: 100%/6   | Total:  4h 33m | Avg: 45m 37s | Max: 48m 29s | Hits:  46%/4170  
    🟩 gpu
      🟩 v100               Pass: 100%/131 | Total:  2d 17h | Avg: 29m 49s | Max: 48m 29s | Hits:  58%/109173
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  2d 05h | Avg: 32m 42s | Max: 48m 29s | Hits:  44%/81909 
      🟩 DeviceLaunch       Pass: 100%/8   | Total:  2h 40m | Avg: 20m 01s | Max: 23m 07s | Hits:  99%/6816  
      🟩 GraphCapture       Pass: 100%/8   | Total:  2h 11m | Avg: 16m 29s | Max: 19m 49s | Hits:  99%/6816  
      🟩 HostLaunch         Pass: 100%/8   | Total:  2h 33m | Avg: 19m 08s | Max: 21m 31s | Hits:  99%/6816  
      🟩 TestGPU            Pass: 100%/8   | Total:  3h 44m | Avg: 28m 04s | Max: 31m 10s | Hits:  99%/6816  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  2h 14m | Avg: 44m 54s | Max: 46m 12s | Hits:  44%/2556  
      🟩 90a                Pass: 100%/4   | Total:  1h 12m | Avg: 18m 02s | Max: 18m 37s | Hits:  44%/3408  
    🟩 std
      🟩 11                 Pass: 100%/34  | Total: 16h 35m | Avg: 29m 16s | Max: 46m 12s | Hits:  57%/28539 
      🟩 14                 Pass: 100%/37  | Total: 19h 03m | Avg: 30m 54s | Max: 48m 29s | Hits:  57%/30624 
      🟩 17                 Pass: 100%/36  | Total: 17h 58m | Avg: 29m 58s | Max: 44m 38s | Hits:  57%/29857 
      🟩 20                 Pass: 100%/24  | Total: 11h 29m | Avg: 28m 44s | Max: 46m 20s | Hits:  63%/20153 
    
  • 🟩 thrust: Pass: 100%/118 | Total: 2d 00h | Avg: 24m 31s | Max: 50m 52s | Hits: 71%/139266

    🟩 cpu
      🟩 amd64              Pass: 100%/110 | Total:  1d 20h | Avg: 24m 28s | Max: 50m 52s | Hits:  72%/129822
      🟩 arm64              Pass: 100%/8   | Total:  3h 21m | Avg: 25m 14s | Max: 27m 44s | Hits:  66%/9444  
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  6h 04m | Avg: 24m 16s | Max: 45m 20s | Hits:  66%/17705 
      🟩 11.8               Pass: 100%/3   | Total:  1h 45m | Avg: 35m 02s | Max: 36m 49s | Hits:  66%/3543  
      🟩 12.4               Pass: 100%/100 | Total:  1d 16h | Avg: 24m 14s | Max: 50m 52s | Hits:  72%/118018
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 47m 00s | Avg: 23m 30s | Max: 24m 13s | Hits:  66%/2360  
      🟩 nvcc11.1           Pass: 100%/15  | Total:  6h 04m | Avg: 24m 16s | Max: 45m 20s | Hits:  66%/17705 
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 45m | Avg: 35m 02s | Max: 36m 49s | Hits:  66%/3543  
      🟩 nvcc12.4           Pass: 100%/98  | Total:  1d 15h | Avg: 24m 15s | Max: 50m 52s | Hits:  73%/115658
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 47m 00s | Avg: 23m 30s | Max: 24m 13s | Hits:  66%/2360  
      🟩 nvcc               Pass: 100%/116 | Total:  1d 23h | Avg: 24m 32s | Max: 50m 52s | Hits:  72%/136906
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  2h 28m | Avg: 24m 42s | Max: 29m 07s | Hits:  66%/7080  
      🟩 Clang10            Pass: 100%/3   | Total:  1h 19m | Avg: 26m 23s | Max: 28m 37s | Hits:  66%/3540  
      🟩 Clang11            Pass: 100%/4   | Total:  1h 42m | Avg: 25m 30s | Max: 28m 59s | Hits:  66%/4720  
      🟩 Clang12            Pass: 100%/4   | Total:  1h 40m | Avg: 25m 06s | Max: 27m 12s | Hits:  66%/4720  
      🟩 Clang13            Pass: 100%/4   | Total:  1h 43m | Avg: 25m 46s | Max: 27m 36s | Hits:  66%/4720  
      🟩 Clang14            Pass: 100%/4   | Total:  1h 44m | Avg: 26m 03s | Max: 27m 35s | Hits:  66%/4720  
      🟩 Clang15            Pass: 100%/4   | Total:  1h 42m | Avg: 25m 42s | Max: 28m 26s | Hits:  66%/4720  
      🟩 Clang16            Pass: 100%/4   | Total:  1h 40m | Avg: 25m 12s | Max: 27m 26s | Hits:  66%/4720  
      🟩 Clang17            Pass: 100%/18  | Total:  5h 30m | Avg: 18m 22s | Max: 27m 57s | Hits:  81%/21240 
      🟩 GCC6               Pass: 100%/2   | Total: 43m 38s | Avg: 21m 49s | Max: 23m 39s | Hits:  67%/2360  
      🟩 GCC7               Pass: 100%/6   | Total:  2h 26m | Avg: 24m 23s | Max: 31m 51s | Hits:  66%/7086  
      🟩 GCC8               Pass: 100%/6   | Total:  2h 25m | Avg: 24m 12s | Max: 26m 47s | Hits:  66%/7086  
      🟩 GCC9               Pass: 100%/6   | Total:  2h 29m | Avg: 24m 59s | Max: 28m 09s | Hits:  66%/7086  
      🟩 GCC10              Pass: 100%/4   | Total:  1h 48m | Avg: 27m 00s | Max: 28m 58s | Hits:  66%/4724  
      🟩 GCC11              Pass: 100%/7   | Total:  3h 36m | Avg: 30m 52s | Max: 36m 49s | Hits:  66%/8267  
      🟩 GCC12              Pass: 100%/4   | Total:  1h 54m | Avg: 28m 42s | Max: 33m 27s | Hits:  66%/4724  
      🟩 GCC13              Pass: 100%/20  | Total:  5h 54m | Avg: 17m 44s | Max: 27m 44s | Hits:  79%/23620 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 33m | Avg: 31m 05s | Max: 33m 17s | Hits:  67%/3549  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 45m 20s | Avg: 45m 20s | Max: 45m 20s | Hits:  64%/1176  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 35m | Avg: 47m 39s | Max: 48m 28s | Hits:  64%/2352  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  3h 29m | Avg: 34m 51s | Max: 50m 52s | Hits:  81%/7056  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/51  | Total: 19h 31m | Avg: 22m 58s | Max: 29m 07s | Hits:  71%/60180 
      🟩 GCC                Pass: 100%/55  | Total: 21h 18m | Avg: 23m 15s | Max: 36m 49s | Hits:  71%/64953 
      🟩 Intel              Pass: 100%/3   | Total:  1h 33m | Avg: 31m 05s | Max: 33m 17s | Hits:  67%/3549  
      🟩 MSVC               Pass: 100%/9   | Total:  5h 49m | Avg: 38m 51s | Max: 50m 52s | Hits:  76%/10584 
    🟩 gpu
      🟩 v100               Pass: 100%/118 | Total:  2d 00h | Avg: 24m 31s | Max: 50m 52s | Hits:  71%/139266
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  1d 20h | Avg: 26m 58s | Max: 50m 52s | Hits:  66%/116850
      🟩 TestCPU            Pass: 100%/11  | Total:  1h 46m | Avg:  9m 43s | Max: 20m 11s | Hits:  99%/12972 
      🟩 TestGPU            Pass: 100%/8   | Total:  1h 56m | Avg: 14m 33s | Max: 16m 34s | Hits:  99%/9444  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 45m | Avg: 35m 02s | Max: 36m 49s | Hits:  66%/3543  
      🟩 90a                Pass: 100%/4   | Total:  1h 01m | Avg: 15m 19s | Max: 16m 30s | Hits:  66%/4724  
    🟩 std
      🟩 11                 Pass: 100%/30  | Total: 10h 27m | Avg: 20m 54s | Max: 32m 49s | Hits:  72%/35418 
      🟩 14                 Pass: 100%/34  | Total: 14h 37m | Avg: 25m 49s | Max: 48m 34s | Hits:  70%/40122 
      🟩 17                 Pass: 100%/33  | Total: 14h 36m | Avg: 26m 33s | Max: 50m 52s | Hits:  71%/38946 
      🟩 20                 Pass: 100%/21  | Total:  8h 32m | Avg: 24m 23s | Max: 50m 50s | Hits:  73%/24780 
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental

🏃‍ Runner counts (total jobs: 249)

# Runner
178 linux-amd64-cpu16
40 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16

@jrhemstad jrhemstad added the 2.6.0 Targeted for 2.6.0 release label Jul 3, 2024
@elstehle
Collaborator Author

elstehle commented Jul 4, 2024

For the latest commit, aeff76e:

  • SASS for all algorithms in our CUB benchmarks, except for DeviceSelect<MayAlias=true>, stayed the same.
  • Benchmark results for DeviceSelect<MayAlias=true> can be found below.
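
As a reading aid for the Diff, %Diff and Status columns of the results, here is a minimal sketch of how they appear to relate to the Ref/Cmp columns. The `BenchRow` struct and the helper names are hypothetical, and the PASS/FAIL rule (|%Diff| compared against the smaller of the two noise values) is an assumption inferred from the rows shown, not something stated in this comment.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <string>

// Hypothetical container for one table row. Times must share one unit
// (the table mixes us and ms); noise is a fraction, e.g. 0.0356 for 3.56%.
struct BenchRow
{
  double ref_time;
  double ref_noise;
  double cmp_time;
  double cmp_noise;
};

// Diff column: compared time minus reference time (positive means slower).
inline double diff(const BenchRow& r)
{
  return r.cmp_time - r.ref_time;
}

// %Diff column, expressed as a fraction of the reference time.
inline double frac_diff(const BenchRow& r)
{
  assert(r.ref_time > 0.0);
  return diff(r) / r.ref_time;
}

// Assumed rule, inferred from the data: a row passes when the relative
// change does not exceed the smaller of the two noise figures.
inline std::string status(const BenchRow& r)
{
  const double min_noise = std::min(r.ref_noise, r.cmp_noise);
  return std::abs(frac_diff(r)) <= min_noise ? "PASS" : "FAIL";
}
```

For example, the I8/I32 in-place row at 2^20 elements and entropy 1 (15.201 us vs. 16.716 us, noise 6.24% and 3.56%) gives a relative change of roughly 10%, which exceeds 3.56% and is therefore reported as FAIL. Values recomputed from the rounded table entries can differ from the printed Diff in the last digit.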

Select.If - Tesla V100-SXM2-32GB

T{ct} OffsetT{ct} IsInPlace{ct} Elements{io} Entropy Ref Time Ref Noise Cmp Time Cmp Noise Diff %Diff Status
I8 I32 cuda::std::__4::integral_constant<bool, false> 2^16 1 9.007 us 11.14% 8.900 us 9.40% -0.107 us -1.19% PASS
I8 I32 cuda::std::__4::integral_constant<bool, false> 2^20 1 15.431 us 6.21% 15.290 us 4.31% -0.141 us -0.91% PASS
I8 I32 cuda::std::__4::integral_constant<bool, false> 2^24 1 106.602 us 1.16% 106.607 us 0.99% 0.005 us 0.01% PASS
I8 I32 cuda::std::__4::integral_constant<bool, false> 2^28 1 1.582 ms 0.50% 1.582 ms 0.50% -0.004 us -0.00% PASS
I8 I32 cuda::std::__4::integral_constant<bool, false> 2^16 0.544 8.909 us 11.30% 8.745 us 7.71% -0.164 us -1.84% PASS
I8 I32 cuda::std::__4::integral_constant<bool, false> 2^20 0.544 14.776 us 6.39% 14.673 us 4.22% -0.104 us -0.70% PASS
I8 I32 cuda::std::__4::integral_constant<bool, false> 2^24 0.544 99.508 us 1.22% 99.539 us 0.95% 0.031 us 0.03% PASS
I8 I32 cuda::std::__4::integral_constant<bool, false> 2^28 0.544 1.452 ms 0.50% 1.452 ms 0.50% -0.203 us -0.01% PASS
I8 I32 cuda::std::__4::integral_constant<bool, false> 2^16 0 8.681 us 12.13% 8.522 us 8.96% -0.160 us -1.84% PASS
I8 I32 cuda::std::__4::integral_constant<bool, false> 2^20 0 14.494 us 6.75% 14.376 us 5.30% -0.118 us -0.81% PASS
I8 I32 cuda::std::__4::integral_constant<bool, false> 2^24 0 90.669 us 1.17% 90.674 us 0.76% 0.005 us 0.01% PASS
I8 I32 cuda::std::__4::integral_constant<bool, false> 2^28 0 1.291 ms 0.46% 1.291 ms 0.46% -0.071 us -0.01% PASS
I8 I32 cuda::std::__4::integral_constant<bool, true> 2^16 1 8.881 us 11.96% 9.217 us 7.13% 0.336 us 3.78% PASS
I8 I32 cuda::std::__4::integral_constant<bool, true> 2^20 1 15.201 us 6.24% 16.716 us 3.56% 1.514 us 9.96% FAIL
I8 I32 cuda::std::__4::integral_constant<bool, true> 2^24 1 106.884 us 1.13% 130.418 us 0.97% 23.534 us 22.02% FAIL
I8 I32 cuda::std::__4::integral_constant<bool, true> 2^28 1 1.588 ms 0.50% 1.970 ms 0.50% 381.817 us 24.04% FAIL
I8 I32 cuda::std::__4::integral_constant<bool, true> 2^16 0.544 8.807 us 10.74% 9.091 us 6.10% 0.284 us 3.22% PASS
I8 I32 cuda::std::__4::integral_constant<bool, true> 2^20 0.544 14.614 us 5.59% 16.189 us 4.74% 1.575 us 10.78% FAIL
I8 I32 cuda::std::__4::integral_constant<bool, true> 2^24 0.544 99.812 us 1.08% 120.872 us 0.99% 21.060 us 21.10% FAIL
I8 I32 cuda::std::__4::integral_constant<bool, true> 2^28 0.544 1.459 ms 0.50% 1.808 ms 0.50% 348.988 us 23.93% FAIL
I8 I32 cuda::std::__4::integral_constant<bool, true> 2^16 0 8.556 us 10.55% 8.857 us 8.39% 0.301 us 3.52% PASS
I8 I32 cuda::std::__4::integral_constant<bool, true> 2^20 0 14.264 us 6.21% 15.734 us 4.08% 1.470 us 10.31% FAIL
I8 I32 cuda::std::__4::integral_constant<bool, true> 2^24 0 90.695 us 1.03% 108.666 us 0.90% 17.971 us 19.81% FAIL
I8 I32 cuda::std::__4::integral_constant<bool, true> 2^28 0 1.293 ms 0.48% 1.586 ms 0.45% 292.777 us 22.65% FAIL
I8 I64 cuda::std::__4::integral_constant<bool, false> 2^16 1 8.992 us 10.90% 8.927 us 7.19% -0.065 us -0.73% PASS
I8 I64 cuda::std::__4::integral_constant<bool, false> 2^20 1 15.746 us 6.16% 15.658 us 5.04% -0.089 us -0.56% PASS
I8 I64 cuda::std::__4::integral_constant<bool, false> 2^24 1 112.095 us 0.91% 111.839 us 0.83% -0.256 us -0.23% PASS
I8 I64 cuda::std::__4::integral_constant<bool, false> 2^28 1 1.661 ms 0.50% 1.661 ms 0.50% -0.014 us -0.00% PASS
I8 I64 cuda::std::__4::integral_constant<bool, false> 2^16 0.544 9.037 us 9.73% 8.844 us 8.08% -0.193 us -2.14% PASS
I8 I64 cuda::std::__4::integral_constant<bool, false> 2^20 0.544 15.293 us 6.06% 15.044 us 4.12% -0.249 us -1.63% PASS
I8 I64 cuda::std::__4::integral_constant<bool, false> 2^24 0.544 105.595 us 0.99% 105.342 us 0.72% -0.253 us -0.24% PASS
I8 I64 cuda::std::__4::integral_constant<bool, false> 2^28 0.544 1.553 ms 0.50% 1.551 ms 0.50% -1.704 us -0.11% PASS
I8 I64 cuda::std::__4::integral_constant<bool, false> 2^16 0 8.755 us 10.97% 8.613 us 8.53% -0.142 us -1.63% PASS
I8 I64 cuda::std::__4::integral_constant<bool, false> 2^20 0 14.901 us 6.37% 14.745 us 4.46% -0.157 us -1.05% PASS
I8 I64 cuda::std::__4::integral_constant<bool, false> 2^24 0 95.914 us 0.99% 95.482 us 0.86% -0.432 us -0.45% PASS
I8 I64 cuda::std::__4::integral_constant<bool, false> 2^28 0 1.366 ms 0.50% 1.366 ms 0.50% -0.172 us -0.01% PASS
I8 I64 cuda::std::__4::integral_constant<bool, true> 2^16 1 9.045 us 10.33% 9.250 us 6.63% 0.205 us 2.26% PASS
I8 I64 cuda::std::__4::integral_constant<bool, true> 2^20 1 15.536 us 5.19% 16.757 us 3.73% 1.221 us 7.86% FAIL
I8 I64 cuda::std::__4::integral_constant<bool, true> 2^24 1 112.301 us 1.05% 131.931 us 1.06% 19.629 us 17.48% FAIL
I8 I64 cuda::std::__4::integral_constant<bool, true> 2^28 1 1.667 ms 0.50% 1.995 ms 0.50% 327.554 us 19.65% FAIL
I8 I64 cuda::std::__4::integral_constant<bool, true> 2^16 0.544 8.993 us 10.72% 9.186 us 7.10% 0.193 us 2.15% PASS
I8 I64 cuda::std::__4::integral_constant<bool, true> 2^20 0.544 15.061 us 5.97% 16.360 us 4.11% 1.299 us 8.62% FAIL
I8 I64 cuda::std::__4::integral_constant<bool, true> 2^24 0.544 105.852 us 0.85% 123.472 us 0.73% 17.620 us 16.65% FAIL
I8 I64 cuda::std::__4::integral_constant<bool, true> 2^28 0.544 1.562 ms 0.50% 1.850 ms 0.50% 288.585 us 18.48% FAIL
I8 I64 cuda::std::__4::integral_constant<bool, true> 2^16 0 8.641 us 11.08% 8.971 us 8.32% 0.330 us 3.82% PASS
I8 I64 cuda::std::__4::integral_constant<bool, true> 2^20 0 14.509 us 5.74% 15.818 us 4.62% 1.309 us 9.02% FAIL
I8 I64 cuda::std::__4::integral_constant<bool, true> 2^24 0 95.546 us 0.95% 110.062 us 0.64% 14.516 us 15.19% FAIL
I8 I64 cuda::std::__4::integral_constant<bool, true> 2^28 0 1.367 ms 0.50% 1.598 ms 0.50% 231.192 us 16.91% FAIL
I16 I32 cuda::std::__4::integral_constant<bool, false> 2^16 1 9.076 us 9.46% 9.224 us 6.13% 0.147 us 1.62% PASS
I16 I32 cuda::std::__4::integral_constant<bool, false> 2^20 1 16.587 us 5.77% 16.572 us 3.68% -0.015 us -0.09% PASS
I16 I32 cuda::std::__4::integral_constant<bool, false> 2^24 1 125.800 us 1.24% 126.143 us 1.11% 0.343 us 0.27% PASS
I16 I32 cuda::std::__4::integral_constant<bool, false> 2^28 1 1.873 ms 0.50% 1.873 ms 0.50% 0.193 us 0.01% PASS
I16 I32 cuda::std::__4::integral_constant<bool, false> 2^16 0.544 9.098 us 10.03% 9.033 us 7.38% -0.066 us -0.72% PASS
I16 I32 cuda::std::__4::integral_constant<bool, false> 2^20 0.544 16.305 us 5.83% 16.281 us 3.81% -0.024 us -0.15% PASS
I16 I32 cuda::std::__4::integral_constant<bool, false> 2^24 0.544 115.662 us 1.16% 115.605 us 1.09% -0.057 us -0.05% PASS
I16 I32 cuda::std::__4::integral_constant<bool, false> 2^28 0.544 1.702 ms 0.50% 1.702 ms 0.50% -0.028 us -0.00% PASS
I16 I32 cuda::std::__4::integral_constant<bool, false> 2^16 0 8.738 us 11.17% 8.681 us 7.56% -0.057 us -0.65% PASS
I16 I32 cuda::std::__4::integral_constant<bool, false> 2^20 0 16.013 us 5.96% 15.905 us 4.99% -0.108 us -0.67% PASS
I16 I32 cuda::std::__4::integral_constant<bool, false> 2^24 0 95.845 us 0.83% 95.773 us 0.69% -0.073 us -0.08% PASS
I16 I32 cuda::std::__4::integral_constant<bool, false> 2^28 0 1.349 ms 0.50% 1.349 ms 0.50% -0.205 us -0.02% PASS
I16 I32 cuda::std::__4::integral_constant<bool, true> 2^16 1 9.290 us 9.78% 9.419 us 6.98% 0.129 us 1.39% PASS
I16 I32 cuda::std::__4::integral_constant<bool, true> 2^20 1 16.379 us 5.24% 17.882 us 3.78% 1.503 us 9.18% FAIL
I16 I32 cuda::std::__4::integral_constant<bool, true> 2^24 1 125.515 us 1.23% 148.870 us 1.14% 23.355 us 18.61% FAIL
I16 I32 cuda::std::__4::integral_constant<bool, true> 2^28 1 1.870 ms 0.50% 2.247 ms 0.50% 376.694 us 20.14% FAIL
I16 I32 cuda::std::__4::integral_constant<bool, true> 2^16 0.544 8.967 us 10.96% 9.302 us 5.65% 0.335 us 3.73% PASS
I16 I32 cuda::std::__4::integral_constant<bool, true> 2^20 0.544 16.083 us 5.42% 17.760 us 3.87% 1.677 us 10.43% FAIL
I16 I32 cuda::std::__4::integral_constant<bool, true> 2^24 0.544 115.538 us 1.29% 137.238 us 1.07% 21.699 us 18.78% FAIL
I16 I32 cuda::std::__4::integral_constant<bool, true> 2^28 0.544 1.703 ms 0.50% 2.054 ms 0.50% 351.280 us 20.63% FAIL
I16 I32 cuda::std::__4::integral_constant<bool, true> 2^16 0 8.619 us 11.07% 8.884 us 7.71% 0.264 us 3.07% PASS
I16 I32 cuda::std::__4::integral_constant<bool, true> 2^20 0 15.684 us 5.61% 17.120 us 4.29% 1.436 us 9.15% FAIL
I16 I32 cuda::std::__4::integral_constant<bool, true> 2^24 0 95.589 us 1.06% 115.920 us 0.79% 20.331 us 21.27% FAIL
I16 I32 cuda::std::__4::integral_constant<bool, true> 2^28 0 1.346 ms 0.50% 1.685 ms 0.50% 338.197 us 25.12% FAIL
I16 I64 cuda::std::__4::integral_constant<bool, false> 2^16 1 9.442 us 9.14% 9.060 us 7.27% -0.382 us -4.05% PASS
I16 I64 cuda::std::__4::integral_constant<bool, false> 2^20 1 16.781 us 5.58% 16.821 us 3.87% 0.040 us 0.24% PASS
I16 I64 cuda::std::__4::integral_constant<bool, false> 2^24 1 128.234 us 1.14% 128.370 us 0.96% 0.136 us 0.11% PASS
I16 I64 cuda::std::__4::integral_constant<bool, false> 2^28 1 1.904 ms 0.50% 1.905 ms 0.50% 0.175 us 0.01% PASS
I16 I64 cuda::std::__4::integral_constant<bool, false> 2^16 0.544 9.077 us 9.76% 9.057 us 6.52% -0.019 us -0.21% PASS
I16 I64 cuda::std::__4::integral_constant<bool, false> 2^20 0.544 16.648 us 4.96% 16.683 us 4.22% 0.035 us 0.21% PASS
I16 I64 cuda::std::__4::integral_constant<bool, false> 2^24 0.544 118.221 us 1.21% 118.322 us 0.87% 0.101 us 0.09% PASS
I16 I64 cuda::std::__4::integral_constant<bool, false> 2^28 0.544 1.743 ms 0.50% 1.743 ms 0.50% 0.247 us 0.01% PASS
I16 I64 cuda::std::__4::integral_constant<bool, false> 2^16 0 8.813 us 10.81% 8.784 us 8.14% -0.030 us -0.33% PASS
I16 I64 cuda::std::__4::integral_constant<bool, false> 2^20 0 16.289 us 5.41% 16.237 us 4.04% -0.052 us -0.32% PASS
I16 I64 cuda::std::__4::integral_constant<bool, false> 2^24 0 100.343 us 0.98% 100.208 us 0.68% -0.135 us -0.13% PASS
I16 I64 cuda::std::__4::integral_constant<bool, false> 2^28 0 1.419 ms 0.50% 1.419 ms 0.50% -0.165 us -0.01% PASS
I16 I64 cuda::std::__4::integral_constant<bool, true> 2^16 1 9.163 us 9.84% 9.560 us 6.91% 0.397 us 4.34% PASS
I16 I64 cuda::std::__4::integral_constant<bool, true> 2^20 1 16.755 us 5.83% 18.175 us 3.96% 1.420 us 8.48% FAIL
I16 I64 cuda::std::__4::integral_constant<bool, true> 2^24 1 128.191 us 1.08% 148.892 us 0.88% 20.701 us 16.15% FAIL
I16 I64 cuda::std::__4::integral_constant<bool, true> 2^28 1 1.906 ms 0.50% 2.241 ms 0.50% 334.878 us 17.57% FAIL
I16 I64 cuda::std::__4::integral_constant<bool, true> 2^16 0.544 9.094 us 10.12% 9.468 us 6.78% 0.375 us 4.12% PASS
I16 I64 cuda::std::__4::integral_constant<bool, true> 2^20 0.544 16.608 us 5.82% 17.967 us 4.35% 1.359 us 8.18% FAIL
I16 I64 cuda::std::__4::integral_constant<bool, true> 2^24 0.544 118.511 us 1.11% 137.718 us 0.80% 19.207 us 16.21% FAIL
I16 I64 cuda::std::__4::integral_constant<bool, true> 2^28 0.544 1.747 ms 0.50% 2.061 ms 0.50% 313.239 us 17.93% FAIL
I16 I64 cuda::std::__4::integral_constant<bool, true> 2^16 0 8.780 us 10.87% 9.112 us 7.28% 0.333 us 3.79% PASS
I16 I64 cuda::std::__4::integral_constant<bool, true> 2^20 0 16.130 us 5.77% 17.346 us 3.54% 1.216 us 7.54% FAIL
I16 I64 cuda::std::__4::integral_constant<bool, true> 2^24 0 100.348 us 1.06% 115.834 us 0.69% 15.486 us 15.43% FAIL
I16 I64 cuda::std::__4::integral_constant<bool, true> 2^28 0 1.420 ms 0.50% 1.674 ms 0.50% 254.693 us 17.94% FAIL
I32 I32 cuda::std::__4::integral_constant<bool, false> 2^16 1 9.087 us 9.50% 9.100 us 6.54% 0.013 us 0.15% PASS
I32 I32 cuda::std::__4::integral_constant<bool, false> 2^20 1 19.727 us 4.18% 19.902 us 2.95% 0.175 us 0.89% PASS
I32 I32 cuda::std::__4::integral_constant<bool, false> 2^24 1 184.678 us 0.94% 184.797 us 0.89% 0.119 us 0.06% PASS
I32 I32 cuda::std::__4::integral_constant<bool, false> 2^28 1 2.822 ms 0.62% 2.822 ms 0.62% -0.019 us -0.00% PASS
I32 I32 cuda::std::__4::integral_constant<bool, false> 2^16 0.544 9.084 us 10.31% 9.121 us 6.97% 0.038 us 0.42% PASS
I32 I32 cuda::std::__4::integral_constant<bool, false> 2^20 0.544 19.709 us 4.36% 19.754 us 3.18% 0.045 us 0.23% PASS
I32 I32 cuda::std::__4::integral_constant<bool, false> 2^24 0.544 156.173 us 1.19% 156.400 us 1.15% 0.228 us 0.15% PASS
I32 I32 cuda::std::__4::integral_constant<bool, false> 2^28 0.544 2.331 ms 0.50% 2.332 ms 0.50% 0.268 us 0.01% PASS
I32 I32 cuda::std::__4::integral_constant<bool, false> 2^16 0 8.811 us 10.47% 8.845 us 7.67% 0.034 us 0.39% PASS
I32 I32 cuda::std::__4::integral_constant<bool, false> 2^20 0 18.760 us 4.75% 18.830 us 3.38% 0.069 us 0.37% PASS
I32 I32 cuda::std::__4::integral_constant<bool, false> 2^24 0 113.675 us 1.10% 113.924 us 1.00% 0.249 us 0.22% PASS
I32 I32 cuda::std::__4::integral_constant<bool, false> 2^28 0 1.580 ms 1.01% 1.580 ms 1.00% 0.211 us 0.01% PASS
I32 I32 cuda::std::__4::integral_constant<bool, true> 2^16 1 9.211 us 9.99% 9.522 us 7.28% 0.311 us 3.38% PASS
I32 I32 cuda::std::__4::integral_constant<bool, true> 2^20 1 19.330 us 4.83% 20.825 us 3.21% 1.496 us 7.74% FAIL
I32 I32 cuda::std::__4::integral_constant<bool, true> 2^24 1 184.238 us 0.95% 202.337 us 1.22% 18.099 us 9.82% FAIL
I32 I32 cuda::std::__4::integral_constant<bool, true> 2^28 1 2.823 ms 0.61% 3.117 ms 0.58% 294.152 us 10.42% FAIL
I32 I32 cuda::std::__4::integral_constant<bool, true> 2^16 0.544 9.222 us 9.56% 9.565 us 7.44% 0.343 us 3.72% PASS
I32 I32 cuda::std::__4::integral_constant<bool, true> 2^20 0.544 19.430 us 4.31% 20.810 us 3.08% 1.380 us 7.10% FAIL
I32 I32 cuda::std::__4::integral_constant<bool, true> 2^24 0.544 155.713 us 1.17% 176.721 us 1.21% 21.009 us 13.49% FAIL
I32 I32 cuda::std::__4::integral_constant<bool, true> 2^28 0.544 2.329 ms 0.50% 2.678 ms 0.50% 348.622 us 14.97% FAIL
I32 I32 cuda::std::__4::integral_constant<bool, true> 2^16 0 8.950 us 10.60% 9.259 us 7.39% 0.308 us 3.45% PASS
I32 I32 cuda::std::__4::integral_constant<bool, true> 2^20 0 18.403 us 4.97% 19.856 us 3.98% 1.453 us 7.90% FAIL
I32 I32 cuda::std::__4::integral_constant<bool, true> 2^24 0 113.154 us 1.11% 131.679 us 0.98% 18.525 us 16.37% FAIL
I32 I32 cuda::std::__4::integral_constant<bool, true> 2^28 0 1.577 ms 1.00% 1.884 ms 0.66% 306.858 us 19.45% FAIL
I32 I64 cuda::std::__4::integral_constant<bool, false> 2^16 1 9.556 us 10.15% 9.282 us 6.07% -0.275 us -2.87% PASS
I32 I64 cuda::std::__4::integral_constant<bool, false> 2^20 1 20.286 us 4.29% 20.178 us 3.73% -0.108 us -0.53% PASS
I32 I64 cuda::std::__4::integral_constant<bool, false> 2^24 1 186.279 us 1.06% 186.296 us 0.99% 0.016 us 0.01% PASS
I32 I64 cuda::std::__4::integral_constant<bool, false> 2^28 1 2.845 ms 0.61% 2.845 ms 0.61% -0.075 us -0.00% PASS
I32 I64 cuda::std::__4::integral_constant<bool, false> 2^16 0.544 9.158 us 9.66% 9.349 us 7.11% 0.191 us 2.09% PASS
I32 I64 cuda::std::__4::integral_constant<bool, false> 2^20 0.544 19.940 us 4.34% 20.234 us 3.84% 0.294 us 1.47% PASS
I32 I64 cuda::std::__4::integral_constant<bool, false> 2^24 0.544 158.219 us 1.18% 157.395 us 1.12% -0.824 us -0.52% PASS
I32 I64 cuda::std::__4::integral_constant<bool, false> 2^28 0.544 2.357 ms 0.50% 2.357 ms 0.50% -0.289 us -0.01% PASS
I32 I64 cuda::std::__4::integral_constant<bool, false> 2^16 0 8.898 us 10.14% 8.910 us 7.63% 0.012 us 0.14% PASS
I32 I64 cuda::std::__4::integral_constant<bool, false> 2^20 0 19.121 us 4.63% 19.203 us 3.40% 0.081 us 0.42% PASS
I32 I64 cuda::std::__4::integral_constant<bool, false> 2^24 0 117.220 us 1.09% 117.232 us 1.07% 0.012 us 0.01% PASS
I32 I64 cuda::std::__4::integral_constant<bool, false> 2^28 0 1.642 ms 0.88% 1.642 ms 0.87% 0.097 us 0.01% PASS
I32 I64 cuda::std::__4::integral_constant<bool, true> 2^16 1 9.601 us 9.71% 9.630 us 7.01% 0.030 us 0.31% PASS
I32 I64 cuda::std::__4::integral_constant<bool, true> 2^20 1 19.685 us 4.33% 20.800 us 3.69% 1.115 us 5.66% FAIL
I32 I64 cuda::std::__4::integral_constant<bool, true> 2^24 1 185.464 us 1.07% 200.556 us 1.18% 15.092 us 8.14% FAIL
I32 I64 cuda::std::__4::integral_constant<bool, true> 2^28 1 2.844 ms 0.60% 3.080 ms 0.57% 235.917 us 8.29% FAIL
I32 I64 cuda::std::__4::integral_constant<bool, true> 2^16 0.544 9.216 us 9.41% 9.851 us 7.16% 0.635 us 6.89% PASS
I32 I64 cuda::std::__4::integral_constant<bool, true> 2^20 0.544 19.648 us 4.19% 20.840 us 3.25% 1.192 us 6.07% FAIL
I32 I64 cuda::std::__4::integral_constant<bool, true> 2^24 0.544 157.396 us 1.18% 176.233 us 1.09% 18.837 us 11.97% FAIL
I32 I64 cuda::std::__4::integral_constant<bool, true> 2^28 0.544 2.352 ms 0.50% 2.666 ms 0.50% 313.266 us 13.32% FAIL
I32 I64 cuda::std::__4::integral_constant<bool, true> 2^16 0 9.038 us 10.37% 9.448 us 6.43% 0.410 us 4.54% PASS
I32 I64 cuda::std::__4::integral_constant<bool, true> 2^20 0 18.767 us 4.70% 20.207 us 3.98% 1.439 us 7.67% FAIL
I32 I64 cuda::std::__4::integral_constant<bool, true> 2^24 0 116.400 us 1.07% 132.050 us 0.86% 15.651 us 13.45% FAIL
I32 I64 cuda::std::__4::integral_constant<bool, true> 2^28 0 1.639 ms 0.88% 1.891 ms 0.59% 252.971 us 15.44% FAIL
I64 I32 cuda::std::__4::integral_constant<bool, false> 2^16 1 10.156 us 9.39% 10.174 us 6.65% 0.018 us 0.18% PASS
I64 I32 cuda::std::__4::integral_constant<bool, false> 2^20 1 29.541 us 3.10% 29.546 us 2.51% 0.005 us 0.02% PASS
I64 I32 cuda::std::__4::integral_constant<bool, false> 2^24 1 349.572 us 0.58% 349.468 us 0.50% -0.104 us -0.03% PASS
I64 I32 cuda::std::__4::integral_constant<bool, false> 2^28 1 5.472 ms 0.50% 5.472 ms 0.50% -0.069 us -0.00% PASS
I64 I32 cuda::std::__4::integral_constant<bool, false> 2^16 0.544 10.577 us 9.61% 10.532 us 6.89% -0.045 us -0.42% PASS
I64 I32 cuda::std::__4::integral_constant<bool, false> 2^20 0.544 28.041 us 3.48% 28.151 us 2.82% 0.110 us 0.39% PASS
I64 I32 cuda::std::__4::integral_constant<bool, false> 2^24 0.544 281.177 us 0.67% 281.079 us 0.61% -0.098 us -0.03% PASS
I64 I32 cuda::std::__4::integral_constant<bool, false> 2^28 0.544 4.339 ms 0.50% 4.339 ms 0.50% 0.414 us 0.01% PASS
I64 I32 cuda::std::__4::integral_constant<bool, false> 2^16 0 9.914 us 9.06% 9.877 us 6.99% -0.036 us -0.37% PASS
I64 I32 cuda::std::__4::integral_constant<bool, false> 2^20 0 27.064 us 3.54% 27.703 us 3.38% 0.638 us 2.36% PASS
I64 I32 cuda::std::__4::integral_constant<bool, false> 2^24 0 192.591 us 0.98% 192.688 us 0.97% 0.097 us 0.05% PASS
I64 I32 cuda::std::__4::integral_constant<bool, false> 2^28 0 2.833 ms 0.88% 2.833 ms 0.88% 0.077 us 0.00% PASS
I64 I32 cuda::std::__4::integral_constant<bool, true> 2^16 1 10.214 us 8.21% 10.605 us 5.88% 0.391 us 3.83% PASS
I64 I32 cuda::std::__4::integral_constant<bool, true> 2^20 1 29.476 us 3.90% 30.299 us 2.68% 0.823 us 2.79% FAIL
I64 I32 cuda::std::__4::integral_constant<bool, true> 2^24 1 349.741 us 0.61% 373.575 us 1.05% 23.834 us 6.81% FAIL
I64 I32 cuda::std::__4::integral_constant<bool, true> 2^28 1 5.477 ms 0.50% 5.866 ms 0.50% 388.650 us 7.10% FAIL
I64 I32 cuda::std::__4::integral_constant<bool, true> 2^16 0.544 10.582 us 7.84% 11.063 us 6.18% 0.481 us 4.55% PASS
I64 I32 cuda::std::__4::integral_constant<bool, true> 2^20 0.544 27.939 us 3.34% 29.521 us 3.21% 1.582 us 5.66% FAIL
I64 I32 cuda::std::__4::integral_constant<bool, true> 2^24 0.544 280.994 us 0.74% 310.603 us 0.93% 29.610 us 10.54% FAIL
I64 I32 cuda::std::__4::integral_constant<bool, true> 2^28 0.544 4.339 ms 0.50% 4.823 ms 0.50% 483.804 us 11.15% FAIL
I64 I32 cuda::std::__4::integral_constant<bool, true> 2^16 0 9.888 us 9.98% 10.322 us 5.13% 0.434 us 4.39% PASS
I64 I32 cuda::std::__4::integral_constant<bool, true> 2^20 0 27.005 us 3.50% 28.126 us 3.11% 1.121 us 4.15% FAIL
I64 I32 cuda::std::__4::integral_constant<bool, true> 2^24 0 192.298 us 0.91% 219.359 us 0.89% 27.061 us 14.07% FAIL
I64 I32 cuda::std::__4::integral_constant<bool, true> 2^28 0 2.826 ms 0.89% 3.275 ms 0.64% 448.628 us 15.87% FAIL
I64 I64 cuda::std::__4::integral_constant<bool, false> 2^16 1 10.723 us 9.01% 10.641 us 6.18% -0.082 us -0.76% PASS
I64 I64 cuda::std::__4::integral_constant<bool, false> 2^20 1 30.655 us 3.12% 30.421 us 2.72% -0.234 us -0.76% PASS
I64 I64 cuda::std::__4::integral_constant<bool, false> 2^24 1 357.695 us 0.68% 357.707 us 0.68% 0.013 us 0.00% PASS
I64 I64 cuda::std::__4::integral_constant<bool, false> 2^28 1 5.594 ms 0.50% 5.594 ms 0.50% -0.107 us -0.00% PASS
I64 I64 cuda::std::__4::integral_constant<bool, false> 2^16 0.544 10.307 us 8.96% 10.293 us 5.60% -0.014 us -0.14% PASS
I64 I64 cuda::std::__4::integral_constant<bool, false> 2^20 0.544 28.926 us 3.21% 28.965 us 3.23% 0.040 us 0.14% PASS
I64 I64 cuda::std::__4::integral_constant<bool, false> 2^24 0.544 291.111 us 0.80% 291.240 us 0.78% 0.129 us 0.04% PASS
I64 I64 cuda::std::__4::integral_constant<bool, false> 2^28 0.544 4.506 ms 0.50% 4.505 ms 0.50% -0.563 us -0.01% PASS
I64 I64 cuda::std::__4::integral_constant<bool, false> 2^16 0 10.427 us 8.20% 10.377 us 6.86% -0.050 us -0.48% PASS
I64 I64 cuda::std::__4::integral_constant<bool, false> 2^20 0 27.917 us 4.04% 27.829 us 2.50% -0.088 us -0.31% PASS
I64 I64 cuda::std::__4::integral_constant<bool, false> 2^24 0 205.002 us 0.85% 205.133 us 0.82% 0.131 us 0.06% PASS
I64 I64 cuda::std::__4::integral_constant<bool, false> 2^28 0 3.046 ms 0.73% 3.046 ms 0.74% 0.075 us 0.00% PASS
I64 I64 cuda::std::__4::integral_constant<bool, true> 2^16 1 10.662 us 8.70% 11.186 us 6.03% 0.524 us 4.92% PASS
I64 I64 cuda::std::__4::integral_constant<bool, true> 2^20 1 30.334 us 3.39% 31.449 us 2.43% 1.115 us 3.67% FAIL
I64 I64 cuda::std::__4::integral_constant<bool, true> 2^24 1 357.806 us 0.65% 380.408 us 0.90% 22.601 us 6.32% FAIL
I64 I64 cuda::std::__4::integral_constant<bool, true> 2^28 1 5.594 ms 0.50% 5.972 ms 0.50% 377.699 us 6.75% FAIL
I64 I64 cuda::std::__4::integral_constant<bool, true> 2^16 0.544 10.262 us 8.80% 10.829 us 6.69% 0.568 us 5.53% PASS
I64 I64 cuda::std::__4::integral_constant<bool, true> 2^20 0.544 28.833 us 4.16% 30.139 us 2.81% 1.306 us 4.53% FAIL
I64 I64 cuda::std::__4::integral_constant<bool, true> 2^24 0.544 290.620 us 0.80% 320.918 us 0.82% 30.298 us 10.43% FAIL
I64 I64 cuda::std::__4::integral_constant<bool, true> 2^28 0.544 4.499 ms 0.50% 4.988 ms 0.50% 489.262 us 10.88% FAIL
I64 I64 cuda::std::__4::integral_constant<bool, true> 2^16 0 10.432 us 8.99% 10.953 us 6.46% 0.521 us 4.99% PASS
I64 I64 cuda::std::__4::integral_constant<bool, true> 2^20 0 27.746 us 3.14% 28.971 us 2.26% 1.225 us 4.42% FAIL
I64 I64 cuda::std::__4::integral_constant<bool, true> 2^24 0 204.764 us 0.92% 231.714 us 0.72% 26.950 us 13.16% FAIL
I64 I64 cuda::std::__4::integral_constant<bool, true> 2^28 0 3.043 ms 0.74% 3.484 ms 0.56% 441.080 us 14.50% FAIL
I128 I32 cuda::std::__4::integral_constant<bool, false> 2^16 1 12.641 us 8.39% 12.581 us 5.84% -0.060 us -0.47% PASS
I128 I32 cuda::std::__4::integral_constant<bool, false> 2^20 1 39.876 us 2.39% 39.822 us 1.72% -0.054 us -0.14% PASS
I128 I32 cuda::std::__4::integral_constant<bool, false> 2^24 1 394.223 us 0.59% 394.020 us 0.54% -0.203 us -0.05% PASS
I128 I32 cuda::std::__4::integral_constant<bool, false> 2^28 1 6.070 ms 0.50% 6.070 ms 0.50% -0.136 us -0.00% PASS
I128 I32 cuda::std::__4::integral_constant<bool, false> 2^16 0.544 12.641 us 7.56% 12.590 us 5.75% -0.050 us -0.40% PASS
I128 I32 cuda::std::__4::integral_constant<bool, false> 2^20 0.544 39.922 us 2.85% 39.844 us 1.87% -0.079 us -0.20% PASS
I128 I32 cuda::std::__4::integral_constant<bool, false> 2^24 0.544 393.988 us 0.55% 394.046 us 0.56% 0.058 us 0.01% PASS
I128 I32 cuda::std::__4::integral_constant<bool, false> 2^28 0.544 6.070 ms 0.50% 6.070 ms 0.50% 0.037 us 0.00% PASS
I128 I32 cuda::std::__4::integral_constant<bool, false> 2^16 0 12.602 us 7.70% 12.569 us 5.22% -0.032 us -0.26% PASS
I128 I32 cuda::std::__4::integral_constant<bool, false> 2^20 0 39.894 us 2.25% 39.834 us 1.74% -0.060 us -0.15% PASS
I128 I32 cuda::std::__4::integral_constant<bool, false> 2^24 0 393.885 us 0.56% 393.952 us 0.58% 0.067 us 0.02% PASS
I128 I32 cuda::std::__4::integral_constant<bool, false> 2^28 0 6.069 ms 0.50% 6.070 ms 0.50% 0.373 us 0.01% PASS
I128 I32 cuda::std::__4::integral_constant<bool, true> 2^16 1 12.565 us 7.45% 13.224 us 5.27% 0.659 us 5.24% PASS
I128 I32 cuda::std::__4::integral_constant<bool, true> 2^20 1 39.734 us 2.45% 43.211 us 1.95% 3.477 us 8.75% FAIL
I128 I32 cuda::std::__4::integral_constant<bool, true> 2^24 1 393.148 us 0.57% 465.553 us 0.53% 72.404 us 18.42% FAIL
I128 I32 cuda::std::__4::integral_constant<bool, true> 2^28 1 6.056 ms 0.50% 7.251 ms 0.50% 1.195 ms 19.73% FAIL
I128 I32 cuda::std::__4::integral_constant<bool, true> 2^16 0.544 12.545 us 7.03% 13.226 us 4.90% 0.681 us 5.43% FAIL
I128 I32 cuda::std::__4::integral_constant<bool, true> 2^20 0.544 39.770 us 2.62% 43.260 us 2.02% 3.490 us 8.77% FAIL
I128 I32 cuda::std::__4::integral_constant<bool, true> 2^24 0.544 392.997 us 0.58% 465.384 us 0.55% 72.387 us 18.42% FAIL
I128 I32 cuda::std::__4::integral_constant<bool, true> 2^28 0.544 6.056 ms 0.50% 7.251 ms 0.50% 1.195 ms 19.73% FAIL
I128 I32 cuda::std::__4::integral_constant<bool, true> 2^16 0 12.521 us 6.79% 13.232 us 5.39% 0.711 us 5.68% FAIL
I128 I32 cuda::std::__4::integral_constant<bool, true> 2^20 0 39.748 us 2.53% 43.242 us 1.97% 3.494 us 8.79% FAIL
I128 I32 cuda::std::__4::integral_constant<bool, true> 2^24 0 393.161 us 0.58% 465.331 us 0.50% 72.170 us 18.36% FAIL
I128 I32 cuda::std::__4::integral_constant<bool, true> 2^28 0 6.056 ms 0.50% 7.252 ms 0.50% 1.196 ms 19.74% FAIL
I128 I64 cuda::std::__4::integral_constant<bool, false> 2^16 1 12.075 us 8.34% 12.043 us 5.91% -0.032 us -0.26% PASS
I128 I64 cuda::std::__4::integral_constant<bool, false> 2^20 1 40.972 us 2.18% 40.946 us 1.76% -0.026 us -0.06% PASS
I128 I64 cuda::std::__4::integral_constant<bool, false> 2^24 1 415.609 us 0.50% 415.617 us 0.50% 0.008 us 0.00% PASS
I128 I64 cuda::std::__4::integral_constant<bool, false> 2^28 1 6.420 ms 0.50% 6.420 ms 0.50% -0.156 us -0.00% PASS
I128 I64 cuda::std::__4::integral_constant<bool, false> 2^16 0.544 12.191 us 7.96% 12.121 us 5.79% -0.070 us -0.58% PASS
I128 I64 cuda::std::__4::integral_constant<bool, false> 2^20 0.544 41.120 us 2.35% 41.022 us 1.86% -0.099 us -0.24% PASS
I128 I64 cuda::std::__4::integral_constant<bool, false> 2^24 0.544 415.724 us 0.50% 415.608 us 0.50% -0.116 us -0.03% PASS
I128 I64 cuda::std::__4::integral_constant<bool, false> 2^28 0.544 6.420 ms 0.50% 6.420 ms 0.50% -0.012 us -0.00% PASS
I128 I64 cuda::std::__4::integral_constant<bool, false> 2^16 0 12.157 us 7.92% 12.099 us 6.11% -0.057 us -0.47% PASS
I128 I64 cuda::std::__4::integral_constant<bool, false> 2^20 0 41.074 us 2.13% 40.998 us 1.84% -0.076 us -0.18% PASS
I128 I64 cuda::std::__4::integral_constant<bool, false> 2^24 0 415.625 us 0.50% 415.623 us 0.50% -0.002 us -0.00% PASS
I128 I64 cuda::std::__4::integral_constant<bool, false> 2^28 0 6.420 ms 0.50% 6.420 ms 0.50% 0.134 us 0.00% PASS
I128 I64 cuda::std::__4::integral_constant<bool, true> 2^16 1 12.262 us 7.01% 13.002 us 4.77% 0.740 us 6.03% FAIL
I128 I64 cuda::std::__4::integral_constant<bool, true> 2^20 1 41.046 us 2.60% 44.094 us 2.09% 3.049 us 7.43% FAIL
I128 I64 cuda::std::__4::integral_constant<bool, true> 2^24 1 414.718 us 0.50% 477.074 us 0.48% 62.356 us 15.04% FAIL
I128 I64 cuda::std::__4::integral_constant<bool, true> 2^28 1 6.405 ms 0.50% 7.418 ms 0.50% 1.013 ms 15.82% FAIL
I128 I64 cuda::std::__4::integral_constant<bool, true> 2^16 0.544 12.280 us 7.11% 13.014 us 5.67% 0.734 us 5.98% FAIL
I128 I64 cuda::std::__4::integral_constant<bool, true> 2^20 0.544 41.036 us 2.41% 44.092 us 1.96% 3.056 us 7.45% FAIL
I128 I64 cuda::std::__4::integral_constant<bool, true> 2^24 0.544 414.723 us 0.58% 476.940 us 0.50% 62.217 us 15.00% FAIL
I128 I64 cuda::std::__4::integral_constant<bool, true> 2^28 0.544 6.405 ms 0.50% 7.418 ms 0.50% 1.013 ms 15.82% FAIL
I128 I64 cuda::std::__4::integral_constant<bool, true> 2^16 0 12.200 us 7.09% 12.954 us 5.81% 0.755 us 6.19% FAIL
I128 I64 cuda::std::__4::integral_constant<bool, true> 2^20 0 40.967 us 2.42% 44.007 us 1.61% 3.040 us 7.42% FAIL
I128 I64 cuda::std::__4::integral_constant<bool, true> 2^24 0 414.601 us 0.50% 476.925 us 0.49% 62.324 us 15.03% FAIL
I128 I64 cuda::std::__4::integral_constant<bool, true> 2^28 0 6.404 ms 0.50% 7.418 ms 0.50% 1.013 ms 15.82% FAIL
F32 I32 cuda::std::__4::integral_constant<bool, false> 2^16 1 9.090 us 10.09% 9.084 us 6.98% -0.005 us -0.06% PASS
F32 I32 cuda::std::__4::integral_constant<bool, false> 2^20 1 19.736 us 4.04% 19.834 us 3.73% 0.098 us 0.50% PASS
F32 I32 cuda::std::__4::integral_constant<bool, false> 2^24 1 184.820 us 1.03% 184.908 us 0.91% 0.087 us 0.05% PASS
F32 I32 cuda::std::__4::integral_constant<bool, false> 2^28 1 2.960 ms 0.67% 2.960 ms 0.67% 0.058 us 0.00% PASS
F32 I32 cuda::std::__4::integral_constant<bool, false> 2^16 0.544 8.918 us 10.77% 8.956 us 7.62% 0.038 us 0.42% PASS
F32 I32 cuda::std::__4::integral_constant<bool, false> 2^20 0.544 19.030 us 4.89% 19.078 us 3.87% 0.047 us 0.25% PASS
F32 I32 cuda::std::__4::integral_constant<bool, false> 2^24 0.544 128.787 us 1.16% 128.853 us 1.07% 0.065 us 0.05% PASS
F32 I32 cuda::std::__4::integral_constant<bool, false> 2^28 0.544 1.859 ms 0.79% 1.858 ms 0.79% -0.177 us -0.01% PASS
F32 I32 cuda::std::__4::integral_constant<bool, false> 2^16 0 8.824 us 10.89% 8.855 us 8.03% 0.031 us 0.35% PASS
F32 I32 cuda::std::__4::integral_constant<bool, false> 2^20 0 18.777 us 4.48% 18.815 us 3.55% 0.038 us 0.20% PASS
F32 I32 cuda::std::__4::integral_constant<bool, false> 2^24 0 113.562 us 1.08% 113.590 us 0.99% 0.028 us 0.02% PASS
F32 I32 cuda::std::__4::integral_constant<bool, false> 2^28 0 1.580 ms 1.00% 1.580 ms 1.00% 0.020 us 0.00% PASS
F32 I32 cuda::std::__4::integral_constant<bool, true> 2^16 1 9.213 us 9.50% 9.456 us 6.99% 0.242 us 2.63% PASS
F32 I32 cuda::std::__4::integral_constant<bool, true> 2^20 1 19.300 us 4.53% 20.691 us 4.14% 1.391 us 7.21% FAIL
F32 I32 cuda::std::__4::integral_constant<bool, true> 2^24 1 184.421 us 0.94% 202.357 us 1.23% 17.936 us 9.73% FAIL
F32 I32 cuda::std::__4::integral_constant<bool, true> 2^28 1 2.955 ms 0.66% 3.195 ms 0.59% 239.910 us 8.12% FAIL
F32 I32 cuda::std::__4::integral_constant<bool, true> 2^16 0.544 9.024 us 10.13% 9.195 us 6.72% 0.171 us 1.90% PASS
F32 I32 cuda::std::__4::integral_constant<bool, true> 2^20 0.544 18.608 us 4.46% 20.212 us 3.40% 1.603 us 8.62% FAIL
F32 I32 cuda::std::__4::integral_constant<bool, true> 2^24 0.544 128.213 us 1.22% 145.994 us 1.19% 17.780 us 13.87% FAIL
F32 I32 cuda::std::__4::integral_constant<bool, true> 2^28 0.544 1.854 ms 0.77% 2.146 ms 0.71% 291.596 us 15.73% FAIL
F32 I32 cuda::std::__4::integral_constant<bool, true> 2^16 0 8.931 us 9.48% 9.134 us 7.89% 0.204 us 2.28% PASS
F32 I32 cuda::std::__4::integral_constant<bool, true> 2^20 0 18.366 us 5.05% 19.719 us 3.25% 1.352 us 7.36% FAIL
F32 I32 cuda::std::__4::integral_constant<bool, true> 2^24 0 113.159 us 1.19% 131.735 us 1.05% 18.576 us 16.42% FAIL
F32 I32 cuda::std::__4::integral_constant<bool, true> 2^28 0 1.578 ms 1.00% 1.884 ms 0.66% 306.240 us 19.41% FAIL
F32 I64 cuda::std::__4::integral_constant<bool, false> 2^16 1 9.334 us 9.11% 9.198 us 5.90% -0.136 us -1.45% PASS
F32 I64 cuda::std::__4::integral_constant<bool, false> 2^20 1 20.174 us 5.03% 20.096 us 4.17% -0.078 us -0.39% PASS
F32 I64 cuda::std::__4::integral_constant<bool, false> 2^24 1 186.293 us 1.06% 186.392 us 1.03% 0.099 us 0.05% PASS
F32 I64 cuda::std::__4::integral_constant<bool, false> 2^28 1 2.969 ms 0.67% 2.969 ms 0.66% 0.192 us 0.01% PASS
F32 I64 cuda::std::__4::integral_constant<bool, false> 2^16 0.544 8.996 us 9.30% 9.002 us 7.68% 0.007 us 0.07% PASS
F32 I64 cuda::std::__4::integral_constant<bool, false> 2^20 0.544 19.309 us 4.20% 19.312 us 3.46% 0.003 us 0.01% PASS
F32 I64 cuda::std::__4::integral_constant<bool, false> 2^24 0.544 131.875 us 1.31% 131.866 us 1.27% -0.009 us -0.01% PASS
F32 I64 cuda::std::__4::integral_constant<bool, false> 2^28 0.544 1.902 ms 0.70% 1.902 ms 0.68% -0.085 us -0.00% PASS
F32 I64 cuda::std::__4::integral_constant<bool, false> 2^16 0 8.960 us 10.61% 8.971 us 9.02% 0.011 us 0.12% PASS
F32 I64 cuda::std::__4::integral_constant<bool, false> 2^20 0 19.053 us 5.00% 19.074 us 3.74% 0.021 us 0.11% PASS
F32 I64 cuda::std::__4::integral_constant<bool, false> 2^24 0 117.438 us 1.28% 117.481 us 1.02% 0.043 us 0.04% PASS
F32 I64 cuda::std::__4::integral_constant<bool, false> 2^28 0 1.645 ms 0.88% 1.645 ms 0.88% -0.022 us -0.00% PASS
F32 I64 cuda::std::__4::integral_constant<bool, true> 2^16 1 9.189 us 8.36% 9.609 us 7.31% 0.419 us 4.56% PASS
F32 I64 cuda::std::__4::integral_constant<bool, true> 2^20 1 19.579 us 4.44% 20.791 us 3.14% 1.212 us 6.19% FAIL
F32 I64 cuda::std::__4::integral_constant<bool, true> 2^24 1 185.899 us 1.04% 201.170 us 1.21% 15.271 us 8.21% FAIL
F32 I64 cuda::std::__4::integral_constant<bool, true> 2^28 1 2.964 ms 0.66% 3.157 ms 0.58% 193.144 us 6.52% FAIL
F32 I64 cuda::std::__4::integral_constant<bool, true> 2^16 0.544 8.980 us 9.85% 9.404 us 6.42% 0.424 us 4.72% PASS
F32 I64 cuda::std::__4::integral_constant<bool, true> 2^20 0.544 18.899 us 5.35% 20.350 us 3.17% 1.452 us 7.68% FAIL
F32 I64 cuda::std::__4::integral_constant<bool, true> 2^24 0.544 131.479 us 1.38% 145.287 us 1.09% 13.808 us 10.50% FAIL
F32 I64 cuda::std::__4::integral_constant<bool, true> 2^28 0.544 1.898 ms 0.69% 2.130 ms 0.67% 231.565 us 12.20% FAIL
F32 I64 cuda::std::__4::integral_constant<bool, true> 2^16 0 8.987 us 10.25% 9.316 us 6.14% 0.329 us 3.66% PASS
F32 I64 cuda::std::__4::integral_constant<bool, true> 2^20 0 18.671 us 4.86% 19.853 us 3.88% 1.182 us 6.33% FAIL
F32 I64 cuda::std::__4::integral_constant<bool, true> 2^24 0 116.948 us 1.09% 132.369 us 0.88% 15.421 us 13.19% FAIL
F32 I64 cuda::std::__4::integral_constant<bool, true> 2^28 0 1.642 ms 0.89% 1.898 ms 0.58% 256.094 us 15.59% FAIL
F64 I32 cuda::std::__4::integral_constant<bool, false> 2^16 1 10.168 us 9.38% 10.246 us 7.35% 0.078 us 0.76% PASS
F64 I32 cuda::std::__4::integral_constant<bool, false> 2^20 1 29.440 us 3.05% 29.382 us 2.83% -0.058 us -0.20% PASS
F64 I32 cuda::std::__4::integral_constant<bool, false> 2^24 1 349.573 us 0.58% 349.559 us 0.50% -0.014 us -0.00% PASS
F64 I32 cuda::std::__4::integral_constant<bool, false> 2^28 1 5.471 ms 0.50% 5.471 ms 0.50% -0.029 us -0.00% PASS
F64 I32 cuda::std::__4::integral_constant<bool, false> 2^16 0.544 9.833 us 9.91% 9.662 us 7.82% -0.171 us -1.74% PASS
F64 I32 cuda::std::__4::integral_constant<bool, false> 2^20 0.544 26.817 us 3.49% 26.751 us 2.66% -0.065 us -0.24% PASS
F64 I32 cuda::std::__4::integral_constant<bool, false> 2^24 0.544 224.507 us 0.86% 224.448 us 0.78% -0.059 us -0.03% PASS
F64 I32 cuda::std::__4::integral_constant<bool, false> 2^28 0.544 3.384 ms 0.50% 3.384 ms 0.50% -0.220 us -0.01% PASS
F64 I32 cuda::std::__4::integral_constant<bool, false> 2^16 0 9.747 us 9.66% 9.639 us 7.67% -0.108 us -1.11% PASS
F64 I32 cuda::std::__4::integral_constant<bool, false> 2^20 0 27.028 us 3.81% 26.923 us 2.57% -0.105 us -0.39% PASS
F64 I32 cuda::std::__4::integral_constant<bool, false> 2^24 0 192.588 us 0.97% 192.669 us 0.91% 0.080 us 0.04% PASS
F64 I32 cuda::std::__4::integral_constant<bool, false> 2^28 0 2.833 ms 0.88% 2.833 ms 0.88% 0.019 us 0.00% PASS
F64 I32 cuda::std::__4::integral_constant<bool, true> 2^16 1 10.355 us 8.89% 11.014 us 6.97% 0.659 us 6.37% PASS
F64 I32 cuda::std::__4::integral_constant<bool, true> 2^20 1 29.343 us 3.16% 30.409 us 2.69% 1.066 us 3.63% FAIL
F64 I32 cuda::std::__4::integral_constant<bool, true> 2^24 1 349.703 us 0.61% 373.643 us 0.99% 23.940 us 6.85% FAIL
F64 I32 cuda::std::__4::integral_constant<bool, true> 2^28 1 5.476 ms 0.50% 5.863 ms 0.50% 387.350 us 7.07% FAIL
F64 I32 cuda::std::__4::integral_constant<bool, true> 2^16 0.544 9.936 us 8.76% 10.482 us 6.55% 0.546 us 5.50% PASS
F64 I32 cuda::std::__4::integral_constant<bool, true> 2^20 0.544 26.761 us 3.82% 28.094 us 2.79% 1.332 us 4.98% FAIL
F64 I32 cuda::std::__4::integral_constant<bool, true> 2^24 0.544 224.301 us 0.84% 250.054 us 0.96% 25.753 us 11.48% FAIL
F64 I32 cuda::std::__4::integral_constant<bool, true> 2^28 0.544 3.381 ms 0.50% 3.803 ms 0.50% 422.081 us 12.48% FAIL
F64 I32 cuda::std::__4::integral_constant<bool, true> 2^16 0 9.821 us 9.16% 10.241 us 6.56% 0.421 us 4.28% PASS
F64 I32 cuda::std::__4::integral_constant<bool, true> 2^20 0 27.019 us 3.92% 28.123 us 3.23% 1.103 us 4.08% FAIL
F64 I32 cuda::std::__4::integral_constant<bool, true> 2^24 0 192.345 us 1.04% 219.372 us 0.90% 27.027 us 14.05% FAIL
F64 I32 cuda::std::__4::integral_constant<bool, true> 2^28 0 2.827 ms 0.89% 3.274 ms 0.64% 447.008 us 15.81% FAIL
F64 I64 cuda::std::__4::integral_constant<bool, false> 2^16 1 10.473 us 8.21% 10.543 us 6.21% 0.070 us 0.67% PASS
F64 I64 cuda::std::__4::integral_constant<bool, false> 2^20 1 30.201 us 3.02% 30.117 us 2.72% -0.084 us -0.28% PASS
F64 I64 cuda::std::__4::integral_constant<bool, false> 2^24 1 356.777 us 0.59% 356.719 us 0.62% -0.058 us -0.02% PASS
F64 I64 cuda::std::__4::integral_constant<bool, false> 2^28 1 5.579 ms 0.50% 5.579 ms 0.50% -0.056 us -0.00% PASS
F64 I64 cuda::std::__4::integral_constant<bool, false> 2^16 0.544 10.262 us 8.14% 10.232 us 6.04% -0.030 us -0.30% PASS
F64 I64 cuda::std::__4::integral_constant<bool, false> 2^20 0.544 27.376 us 3.87% 27.363 us 2.95% -0.012 us -0.04% PASS
F64 I64 cuda::std::__4::integral_constant<bool, false> 2^24 0.544 235.683 us 0.94% 235.588 us 0.87% -0.095 us -0.04% PASS
F64 I64 cuda::std::__4::integral_constant<bool, false> 2^28 0.544 3.568 ms 0.50% 3.568 ms 0.50% -0.384 us -0.01% PASS
F64 I64 cuda::std::__4::integral_constant<bool, false> 2^16 0 10.229 us 8.82% 10.180 us 6.67% -0.049 us -0.48% PASS
F64 I64 cuda::std::__4::integral_constant<bool, false> 2^20 0 27.559 us 3.11% 27.542 us 2.42% -0.017 us -0.06% PASS
F64 I64 cuda::std::__4::integral_constant<bool, false> 2^24 0 204.945 us 0.84% 204.917 us 0.82% -0.028 us -0.01% PASS
F64 I64 cuda::std::__4::integral_constant<bool, false> 2^28 0 3.046 ms 0.74% 3.046 ms 0.74% -0.063 us -0.00% PASS
F64 I64 cuda::std::__4::integral_constant<bool, true> 2^16 1 10.799 us 8.74% 11.257 us 7.03% 0.458 us 4.24% PASS
F64 I64 cuda::std::__4::integral_constant<bool, true> 2^20 1 30.393 us 2.96% 31.365 us 2.47% 0.972 us 3.20% FAIL
F64 I64 cuda::std::__4::integral_constant<bool, true> 2^24 1 357.692 us 0.67% 380.848 us 0.95% 23.155 us 6.47% FAIL
F64 I64 cuda::std::__4::integral_constant<bool, true> 2^28 1 5.592 ms 0.50% 5.977 ms 0.50% 384.659 us 6.88% FAIL
F64 I64 cuda::std::__4::integral_constant<bool, true> 2^16 0.544 10.390 us 9.12% 10.878 us 6.73% 0.488 us 4.70% PASS
F64 I64 cuda::std::__4::integral_constant<bool, true> 2^20 0.544 27.362 us 3.22% 28.456 us 2.49% 1.094 us 4.00% FAIL
F64 I64 cuda::std::__4::integral_constant<bool, true> 2^24 0.544 234.751 us 0.83% 258.557 us 0.91% 23.806 us 10.14% FAIL
F64 I64 cuda::std::__4::integral_constant<bool, true> 2^28 0.544 3.554 ms 0.50% 3.955 ms 0.50% 400.615 us 11.27% FAIL
F64 I64 cuda::std::__4::integral_constant<bool, true> 2^16 0 10.272 us 8.69% 10.721 us 6.58% 0.449 us 4.38% PASS
F64 I64 cuda::std::__4::integral_constant<bool, true> 2^20 0 27.522 us 3.43% 28.585 us 2.53% 1.063 us 3.86% FAIL
F64 I64 cuda::std::__4::integral_constant<bool, true> 2^24 0 204.282 us 0.89% 229.501 us 0.77% 25.219 us 12.35% FAIL
F64 I64 cuda::std::__4::integral_constant<bool, true> 2^28 0 3.036 ms 0.75% 3.454 ms 0.60% 417.787 us 13.76% FAIL
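
For readers skimming the PASS/FAIL column: in the rows posted here, `Status` looks consistent with a simple rule where a row is flagged FAIL when the absolute `%Diff` exceeds the smaller of `Ref Noise` and `Cmp Noise`. That rule is inferred from the numbers in these tables, not taken from the comparison tool's source, so the snippet below is only a minimal sketch under that assumption. The helper names `pct_diff` and `status` are made up for illustration.

```python
# Minimal sketch: reproduce the %Diff and Status columns from the tables.
# ASSUMPTION: Status is FAIL when |%Diff| > min(Ref Noise, Cmp Noise).
# This is inferred from the posted rows, not from the tool's documentation.

def pct_diff(ref_time, cmp_time):
    """Relative change of the compared time versus the reference, in percent."""
    return (cmp_time - ref_time) / ref_time * 100.0


def status(ref_noise_pct, cmp_noise_pct, diff_pct):
    """Classify a row, treating the smaller noise figure as the threshold."""
    threshold = min(ref_noise_pct, cmp_noise_pct)
    return "PASS" if abs(diff_pct) <= threshold else "FAIL"


if __name__ == "__main__":
    # (ref noise %, cmp noise %, %Diff, status as posted), copied from the tables.
    rows = [
        (7.01, 4.77, 6.03, "FAIL"),   # I128 I64 in-place, 2^16, entropy 1
        (8.89, 6.97, 6.37, "PASS"),   # F64 I32 in-place, 2^16, entropy 1
        (9.11, 5.90, -1.45, "PASS"),  # F32 I64 out-of-place, 2^16, entropy 1
    ]
    for ref_noise, cmp_noise, diff, posted in rows:
        print(diff, status(ref_noise, cmp_noise, diff), "posted:", posted)
```

Under this reading, the small in-place inputs (2^16) mostly pass only because their noise is high (around 6-8%), not because the added barrier is free at that size.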

Select.Flagged - Tesla V100-SXM2-32GB

T{ct} OffsetT{ct} IsInPlace{ct} Elements{io} Entropy Ref Time Ref Noise Cmp Time Cmp Noise Diff %Diff Status
I8 I32 cuda::std::__4::integral_constant<bool, false> 2^16 1 8.914 us 9.49% 9.036 us 8.40% 0.122 us 1.36% PASS
I8 I32 cuda::std::__4::integral_constant<bool, false> 2^20 1 16.298 us 5.87% 16.257 us 5.53% -0.041 us -0.25% PASS
I8 I32 cuda::std::__4::integral_constant<bool, false> 2^24 1 120.410 us 0.95% 120.345 us 0.86% -0.065 us -0.05% PASS
I8 I32 cuda::std::__4::integral_constant<bool, false> 2^28 1 1.805 ms 0.50% 1.805 ms 0.50% 0.045 us 0.00% PASS
I8 I32 cuda::std::__4::integral_constant<bool, false> 2^16 0.544 8.783 us 10.78% 8.906 us 8.33% 0.123 us 1.39% PASS
I8 I32 cuda::std::__4::integral_constant<bool, false> 2^20 0.544 15.903 us 5.68% 15.911 us 5.39% 0.008 us 0.05% PASS
I8 I32 cuda::std::__4::integral_constant<bool, false> 2^24 0.544 120.355 us 0.86% 120.415 us 0.95% 0.060 us 0.05% PASS
I8 I32 cuda::std::__4::integral_constant<bool, false> 2^28 0.544 1.733 ms 0.52% 1.733 ms 0.51% -0.176 us -0.01% PASS
I8 I32 cuda::std::__4::integral_constant<bool, false> 2^16 0 8.566 us 10.81% 8.629 us 7.93% 0.063 us 0.73% PASS
I8 I32 cuda::std::__4::integral_constant<bool, false> 2^20 0 15.379 us 5.59% 15.328 us 3.75% -0.051 us -0.33% PASS
I8 I32 cuda::std::__4::integral_constant<bool, false> 2^24 0 103.673 us 0.93% 103.681 us 0.84% 0.008 us 0.01% PASS
I8 I32 cuda::std::__4::integral_constant<bool, false> 2^28 0 1.487 ms 0.12% 1.487 ms 0.10% -0.076 us -0.01% PASS
I8 I32 cuda::std::__4::integral_constant<bool, true> 2^16 1 9.129 us 9.95% 9.401 us 7.16% 0.272 us 2.98% PASS
I8 I32 cuda::std::__4::integral_constant<bool, true> 2^20 1 16.560 us 5.20% 18.294 us 4.26% 1.734 us 10.47% FAIL
I8 I32 cuda::std::__4::integral_constant<bool, true> 2^24 1 121.442 us 1.06% 142.335 us 0.86% 20.893 us 17.20% FAIL
I8 I32 cuda::std::__4::integral_constant<bool, true> 2^28 1 1.817 ms 0.50% 2.151 ms 0.50% 334.285 us 18.40% FAIL
I8 I32 cuda::std::__4::integral_constant<bool, true> 2^16 0.544 9.055 us 9.17% 9.292 us 6.48% 0.237 us 2.62% PASS
I8 I32 cuda::std::__4::integral_constant<bool, true> 2^20 0.544 16.223 us 5.19% 17.938 us 3.77% 1.715 us 10.57% FAIL
I8 I32 cuda::std::__4::integral_constant<bool, true> 2^24 0.544 121.616 us 1.14% 141.595 us 0.91% 19.979 us 16.43% FAIL
I8 I32 cuda::std::__4::integral_constant<bool, true> 2^28 0.544 1.749 ms 0.52% 2.092 ms 0.50% 342.680 us 19.59% FAIL
I8 I32 cuda::std::__4::integral_constant<bool, true> 2^16 0 8.727 us 10.27% 8.914 us 7.87% 0.187 us 2.15% PASS
I8 I32 cuda::std::__4::integral_constant<bool, true> 2^20 0 15.601 us 5.73% 17.215 us 4.45% 1.614 us 10.34% FAIL
I8 I32 cuda::std::__4::integral_constant<bool, true> 2^24 0 104.299 us 0.94% 122.782 us 0.75% 18.483 us 17.72% FAIL
I8 I32 cuda::std::__4::integral_constant<bool, true> 2^28 0 1.498 ms 0.14% 1.791 ms 0.17% 293.387 us 19.59% FAIL
I8 I64 cuda::std::__4::integral_constant<bool, false> 2^16 1 9.014 us 9.14% 8.922 us 7.31% -0.092 us -1.02% PASS
I8 I64 cuda::std::__4::integral_constant<bool, false> 2^20 1 17.074 us 4.92% 17.030 us 4.15% -0.043 us -0.25% PASS
I8 I64 cuda::std::__4::integral_constant<bool, false> 2^24 1 131.937 us 0.78% 132.036 us 0.63% 0.099 us 0.07% PASS
I8 I64 cuda::std::__4::integral_constant<bool, false> 2^28 1 1.991 ms 0.50% 1.991 ms 0.50% 0.271 us 0.01% PASS
I8 I64 cuda::std::__4::integral_constant<bool, false> 2^16 0.544 8.893 us 10.78% 8.950 us 8.60% 0.057 us 0.64% PASS
I8 I64 cuda::std::__4::integral_constant<bool, false> 2^20 0.544 16.782 us 5.70% 16.843 us 4.15% 0.060 us 0.36% PASS
I8 I64 cuda::std::__4::integral_constant<bool, false> 2^24 0.544 131.679 us 0.85% 131.770 us 0.79% 0.090 us 0.07% PASS
I8 I64 cuda::std::__4::integral_constant<bool, false> 2^28 0.544 1.919 ms 0.50% 1.919 ms 0.50% 0.217 us 0.01% PASS
I8 I64 cuda::std::__4::integral_constant<bool, false> 2^16 0 8.710 us 10.74% 8.751 us 8.47% 0.041 us 0.47% PASS
I8 I64 cuda::std::__4::integral_constant<bool, false> 2^20 0 15.955 us 6.13% 15.972 us 4.21% 0.017 us 0.11% PASS
I8 I64 cuda::std::__4::integral_constant<bool, false> 2^24 0 114.285 us 0.70% 114.355 us 0.81% 0.070 us 0.06% PASS
I8 I64 cuda::std::__4::integral_constant<bool, false> 2^28 0 1.662 ms 0.08% 1.662 ms 0.10% 0.010 us 0.00% PASS
I8 I64 cuda::std::__4::integral_constant<bool, true> 2^16 1 9.146 us 9.08% 9.543 us 7.06% 0.397 us 4.34% PASS
I8 I64 cuda::std::__4::integral_constant<bool, true> 2^20 1 16.883 us 5.17% 18.343 us 4.45% 1.459 us 8.64% FAIL
I8 I64 cuda::std::__4::integral_constant<bool, true> 2^24 1 126.447 us 0.96% 144.267 us 0.71% 17.820 us 14.09% FAIL
I8 I64 cuda::std::__4::integral_constant<bool, true> 2^28 1 1.903 ms 0.50% 2.181 ms 0.50% 278.194 us 14.62% FAIL
I8 I64 cuda::std::__4::integral_constant<bool, true> 2^16 0.544 9.077 us 10.18% 9.464 us 6.54% 0.387 us 4.27% PASS
I8 I64 cuda::std::__4::integral_constant<bool, true> 2^20 0.544 16.545 us 5.01% 18.092 us 3.63% 1.547 us 9.35% FAIL
I8 I64 cuda::std::__4::integral_constant<bool, true> 2^24 0.544 126.564 us 0.88% 143.767 us 0.88% 17.203 us 13.59% FAIL
I8 I64 cuda::std::__4::integral_constant<bool, true> 2^28 0.544 1.828 ms 0.50% 2.123 ms 0.50% 295.200 us 16.15% FAIL
I8 I64 cuda::std::__4::integral_constant<bool, true> 2^16 0 8.832 us 10.10% 9.036 us 7.15% 0.204 us 2.31% PASS
I8 I64 cuda::std::__4::integral_constant<bool, true> 2^20 0 16.032 us 5.64% 17.334 us 4.25% 1.302 us 8.12% FAIL
I8 I64 cuda::std::__4::integral_constant<bool, true> 2^24 0 109.697 us 0.86% 124.912 us 0.68% 15.216 us 13.87% FAIL
I8 I64 cuda::std::__4::integral_constant<bool, true> 2^28 0 1.576 ms 0.12% 1.825 ms 0.11% 249.471 us 15.83% FAIL
I16 I32 cuda::std::__4::integral_constant<bool, false> 2^16 1 9.095 us 9.76% 8.997 us 7.18% -0.097 us -1.07% PASS
I16 I32 cuda::std::__4::integral_constant<bool, false> 2^20 1 17.774 us 4.89% 17.622 us 3.59% -0.152 us -0.85% PASS
I16 I32 cuda::std::__4::integral_constant<bool, false> 2^24 1 137.066 us 1.14% 137.021 us 1.03% -0.045 us -0.03% PASS
I16 I32 cuda::std::__4::integral_constant<bool, false> 2^28 1 2.051 ms 0.50% 2.052 ms 0.50% 0.410 us 0.02% PASS
I16 I32 cuda::std::__4::integral_constant<bool, false> 2^16 0.544 9.066 us 9.79% 8.995 us 6.34% -0.071 us -0.78% PASS
I16 I32 cuda::std::__4::integral_constant<bool, false> 2^20 0.544 17.312 us 4.59% 17.258 us 4.52% -0.053 us -0.31% PASS
I16 I32 cuda::std::__4::integral_constant<bool, false> 2^24 0.544 132.787 us 1.18% 132.795 us 1.09% 0.008 us 0.01% PASS
I16 I32 cuda::std::__4::integral_constant<bool, false> 2^28 0.544 1.971 ms 0.56% 1.970 ms 0.55% -0.481 us -0.02% PASS
I16 I32 cuda::std::__4::integral_constant<bool, false> 2^16 0 8.746 us 11.41% 8.676 us 8.10% -0.071 us -0.81% PASS
I16 I32 cuda::std::__4::integral_constant<bool, false> 2^20 0 17.075 us 4.90% 17.129 us 3.61% 0.054 us 0.31% PASS
I16 I32 cuda::std::__4::integral_constant<bool, false> 2^24 0 106.139 us 1.09% 106.202 us 0.70% 0.063 us 0.06% PASS
I16 I32 cuda::std::__4::integral_constant<bool, false> 2^28 0 1.476 ms 0.14% 1.476 ms 0.15% 0.015 us 0.00% PASS
I16 I32 cuda::std::__4::integral_constant<bool, true> 2^16 1 9.117 us 8.97% 9.551 us 7.54% 0.434 us 4.76% PASS
I16 I32 cuda::std::__4::integral_constant<bool, true> 2^20 1 17.803 us 5.49% 19.636 us 2.78% 1.833 us 10.29% FAIL
I16 I32 cuda::std::__4::integral_constant<bool, true> 2^24 1 143.685 us 0.92% 163.002 us 1.07% 19.317 us 13.44% FAIL
I16 I32 cuda::std::__4::integral_constant<bool, true> 2^28 1 2.162 ms 0.50% 2.465 ms 0.50% 303.622 us 14.05% FAIL
I16 I32 cuda::std::__4::integral_constant<bool, true> 2^16 0.544 9.024 us 9.44% 9.432 us 7.73% 0.408 us 4.52% PASS
I16 I32 cuda::std::__4::integral_constant<bool, true> 2^20 0.544 17.154 us 5.41% 18.922 us 3.59% 1.769 us 10.31% FAIL
I16 I32 cuda::std::__4::integral_constant<bool, true> 2^24 0.544 138.911 us 1.14% 158.745 us 1.00% 19.834 us 14.28% FAIL
I16 I32 cuda::std::__4::integral_constant<bool, true> 2^28 0.544 2.083 ms 0.56% 2.407 ms 0.52% 323.552 us 15.53% FAIL
I16 I32 cuda::std::__4::integral_constant<bool, true> 2^16 0 8.754 us 10.42% 9.119 us 6.69% 0.365 us 4.16% PASS
I16 I32 cuda::std::__4::integral_constant<bool, true> 2^20 0 16.812 us 5.41% 18.568 us 3.52% 1.756 us 10.44% FAIL
I16 I32 cuda::std::__4::integral_constant<bool, true> 2^24 0 112.614 us 0.80% 133.348 us 0.70% 20.734 us 18.41% FAIL
I16 I32 cuda::std::__4::integral_constant<bool, true> 2^28 0 1.594 ms 0.13% 1.940 ms 0.17% 346.239 us 21.72% FAIL
I16 I64 cuda::std::__4::integral_constant<bool, false> 2^16 1 9.159 us 9.73% 9.254 us 6.77% 0.096 us 1.04% PASS
I16 I64 cuda::std::__4::integral_constant<bool, false> 2^20 1 18.807 us 4.83% 19.049 us 3.88% 0.242 us 1.28% PASS
I16 I64 cuda::std::__4::integral_constant<bool, false> 2^24 1 144.760 us 0.98% 144.643 us 0.82% -0.117 us -0.08% PASS
I16 I64 cuda::std::__4::integral_constant<bool, false> 2^28 1 2.163 ms 0.50% 2.163 ms 0.50% -0.190 us -0.01% PASS
I16 I64 cuda::std::__4::integral_constant<bool, false> 2^16 0.544 9.137 us 8.87% 9.048 us 7.18% -0.089 us -0.98% PASS
I16 I64 cuda::std::__4::integral_constant<bool, false> 2^20 0.544 18.416 us 5.30% 18.331 us 3.66% -0.085 us -0.46% PASS
I16 I64 cuda::std::__4::integral_constant<bool, false> 2^24 0.544 140.166 us 1.01% 139.951 us 0.84% -0.215 us -0.15% PASS
I16 I64 cuda::std::__4::integral_constant<bool, false> 2^28 0.544 2.087 ms 0.51% 2.086 ms 0.50% -0.502 us -0.02% PASS
I16 I64 cuda::std::__4::integral_constant<bool, false> 2^16 0 8.937 us 10.44% 8.782 us 7.56% -0.155 us -1.73% PASS
I16 I64 cuda::std::__4::integral_constant<bool, false> 2^20 0 17.762 us 5.34% 17.590 us 4.43% -0.171 us -0.96% PASS
I16 I64 cuda::std::__4::integral_constant<bool, false> 2^24 0 115.686 us 0.94% 115.442 us 0.65% -0.244 us -0.21% PASS
I16 I64 cuda::std::__4::integral_constant<bool, false> 2^28 0 1.635 ms 0.10% 1.634 ms 0.09% -0.127 us -0.01% PASS
I16 I64 cuda::std::__4::integral_constant<bool, true> 2^16 1 9.298 us 8.82% 9.551 us 7.24% 0.253 us 2.72% PASS
I16 I64 cuda::std::__4::integral_constant<bool, true> 2^20 1 18.430 us 4.72% 19.600 us 2.75% 1.170 us 6.35% FAIL
I16 I64 cuda::std::__4::integral_constant<bool, true> 2^24 1 146.846 us 0.87% 164.037 us 0.86% 17.191 us 11.71% FAIL
I16 I64 cuda::std::__4::integral_constant<bool, true> 2^28 1 2.211 ms 0.50% 2.485 ms 0.50% 274.573 us 12.42% FAIL
I16 I64 cuda::std::__4::integral_constant<bool, true> 2^16 0.544 9.194 us 9.09% 9.433 us 6.27% 0.239 us 2.60% PASS
I16 I64 cuda::std::__4::integral_constant<bool, true> 2^20 0.544 17.957 us 4.84% 19.125 us 4.10% 1.168 us 6.50% FAIL
I16 I64 cuda::std::__4::integral_constant<bool, true> 2^24 0.544 142.385 us 0.96% 160.276 us 0.89% 17.892 us 12.57% FAIL
I16 I64 cuda::std::__4::integral_constant<bool, true> 2^28 0.544 2.134 ms 0.51% 2.425 ms 0.50% 291.188 us 13.64% FAIL
I16 I64 cuda::std::__4::integral_constant<bool, true> 2^16 0 8.882 us 10.28% 9.217 us 6.60% 0.335 us 3.77% PASS
I16 I64 cuda::std::__4::integral_constant<bool, true> 2^20 0 17.155 us 5.38% 18.556 us 3.07% 1.401 us 8.16% FAIL
I16 I64 cuda::std::__4::integral_constant<bool, true> 2^24 0 117.623 us 0.91% 134.665 us 0.64% 17.042 us 14.49% FAIL
I16 I64 cuda::std::__4::integral_constant<bool, true> 2^28 0 1.680 ms 0.09% 1.962 ms 0.12% 281.743 us 16.77% FAIL
I32 I32 cuda::std::__4::integral_constant<bool, false> 2^16 1 9.497 us 9.15% 9.502 us 7.09% 0.006 us 0.06% PASS
I32 I32 cuda::std::__4::integral_constant<bool, false> 2^20 1 21.831 us 4.11% 21.834 us 3.05% 0.004 us 0.02% PASS
I32 I32 cuda::std::__4::integral_constant<bool, false> 2^24 1 207.161 us 0.81% 207.165 us 0.79% 0.004 us 0.00% PASS
I32 I32 cuda::std::__4::integral_constant<bool, false> 2^28 1 3.174 ms 0.56% 3.174 ms 0.56% 0.110 us 0.00% PASS
I32 I32 cuda::std::__4::integral_constant<bool, false> 2^16 0.544 9.470 us 9.56% 9.466 us 8.31% -0.004 us -0.04% PASS
I32 I32 cuda::std::__4::integral_constant<bool, false> 2^20 0.544 21.976 us 4.06% 21.943 us 3.61% -0.033 us -0.15% PASS
I32 I32 cuda::std::__4::integral_constant<bool, false> 2^24 0.544 185.022 us 1.04% 185.077 us 1.02% 0.055 us 0.03% PASS
I32 I32 cuda::std::__4::integral_constant<bool, false> 2^28 0.544 2.796 ms 0.50% 2.796 ms 0.50% -0.580 us -0.02% PASS
I32 I32 cuda::std::__4::integral_constant<bool, false> 2^16 0 9.082 us 9.55% 9.076 us 7.57% -0.005 us -0.06% PASS
I32 I32 cuda::std::__4::integral_constant<bool, false> 2^20 0 20.810 us 4.65% 20.773 us 2.81% -0.037 us -0.18% PASS
I32 I32 cuda::std::__4::integral_constant<bool, false> 2^24 0 130.582 us 0.88% 130.379 us 0.84% -0.203 us -0.16% PASS
I32 I32 cuda::std::__4::integral_constant<bool, false> 2^28 0 1.831 ms 0.17% 1.830 ms 0.15% -0.093 us -0.01% PASS
I32 I32 cuda::std::__4::integral_constant<bool, true> 2^16 1 9.494 us 9.07% 9.675 us 7.39% 0.181 us 1.91% PASS
I32 I32 cuda::std::__4::integral_constant<bool, true> 2^20 1 21.617 us 4.29% 22.717 us 2.97% 1.100 us 5.09% FAIL
I32 I32 cuda::std::__4::integral_constant<bool, true> 2^24 1 209.253 us 0.84% 224.991 us 1.03% 15.738 us 7.52% FAIL
I32 I32 cuda::std::__4::integral_constant<bool, true> 2^28 1 3.208 ms 0.53% 3.468 ms 0.50% 260.109 us 8.11% FAIL
I32 I32 cuda::std::__4::integral_constant<bool, true> 2^16 0.544 9.466 us 9.43% 9.600 us 7.07% 0.134 us 1.42% PASS
I32 I32 cuda::std::__4::integral_constant<bool, true> 2^20 0.544 21.625 us 4.47% 22.913 us 4.02% 1.288 us 5.96% FAIL
I32 I32 cuda::std::__4::integral_constant<bool, true> 2^24 0.544 188.181 us 1.04% 207.204 us 1.00% 19.023 us 10.11% FAIL
I32 I32 cuda::std::__4::integral_constant<bool, true> 2^28 0.544 2.847 ms 0.50% 3.149 ms 0.50% 302.452 us 10.62% FAIL
I32 I32 cuda::std::__4::integral_constant<bool, true> 2^16 0 8.991 us 10.24% 9.390 us 6.57% 0.399 us 4.44% PASS
I32 I32 cuda::std::__4::integral_constant<bool, true> 2^20 0 20.585 us 4.43% 21.790 us 3.14% 1.206 us 5.86% FAIL
I32 I32 cuda::std::__4::integral_constant<bool, true> 2^24 0 133.353 us 0.93% 151.234 us 0.79% 17.880 us 13.41% FAIL
I32 I32 cuda::std::__4::integral_constant<bool, true> 2^28 0 1.885 ms 0.15% 2.184 ms 0.20% 298.346 us 15.82% FAIL
I32 I64 cuda::std::__4::integral_constant<bool, false> 2^16 1 9.687 us 9.76% 9.715 us 7.79% 0.028 us 0.29% PASS
I32 I64 cuda::std::__4::integral_constant<bool, false> 2^20 1 22.153 us 4.42% 22.217 us 3.56% 0.064 us 0.29% PASS
I32 I64 cuda::std::__4::integral_constant<bool, false> 2^24 1 209.579 us 0.82% 209.542 us 0.88% -0.036 us -0.02% PASS
I32 I64 cuda::std::__4::integral_constant<bool, false> 2^28 1 3.213 ms 0.55% 3.213 ms 0.55% -0.090 us -0.00% PASS
I32 I64 cuda::std::__4::integral_constant<bool, false> 2^16 0.544 9.574 us 10.03% 9.445 us 7.04% -0.130 us -1.35% PASS
I32 I64 cuda::std::__4::integral_constant<bool, false> 2^20 0.544 22.150 us 4.43% 22.076 us 3.88% -0.073 us -0.33% PASS
I32 I64 cuda::std::__4::integral_constant<bool, false> 2^24 0.544 190.030 us 1.14% 189.921 us 1.02% -0.109 us -0.06% PASS
I32 I64 cuda::std::__4::integral_constant<bool, false> 2^28 0.544 2.873 ms 0.50% 2.873 ms 0.50% 0.042 us 0.00% PASS
I32 I64 cuda::std::__4::integral_constant<bool, false> 2^16 0 9.283 us 9.58% 9.250 us 7.03% -0.032 us -0.35% PASS
I32 I64 cuda::std::__4::integral_constant<bool, false> 2^20 0 21.446 us 4.06% 21.366 us 3.53% -0.080 us -0.37% PASS
I32 I64 cuda::std::__4::integral_constant<bool, false> 2^24 0 137.004 us 0.88% 136.850 us 0.77% -0.154 us -0.11% PASS
I32 I64 cuda::std::__4::integral_constant<bool, false> 2^28 0 1.945 ms 0.15% 1.946 ms 0.14% 0.195 us 0.01% PASS
I32 I64 cuda::std::__4::integral_constant<bool, true> 2^16 1 9.580 us 9.54% 9.852 us 7.50% 0.272 us 2.84% PASS
I32 I64 cuda::std::__4::integral_constant<bool, true> 2^20 1 21.865 us 4.07% 22.948 us 3.17% 1.083 us 4.95% FAIL
I32 I64 cuda::std::__4::integral_constant<bool, true> 2^24 1 210.739 us 0.84% 224.008 us 0.89% 13.269 us 6.30% FAIL
I32 I64 cuda::std::__4::integral_constant<bool, true> 2^28 1 3.233 ms 0.51% 3.439 ms 0.50% 205.544 us 6.36% FAIL
I32 I64 cuda::std::__4::integral_constant<bool, true> 2^16 0.544 9.425 us 9.43% 9.687 us 7.37% 0.262 us 2.78% PASS
I32 I64 cuda::std::__4::integral_constant<bool, true> 2^20 0.544 22.234 us 4.90% 23.272 us 3.83% 1.038 us 4.67% FAIL
I32 I64 cuda::std::__4::integral_constant<bool, true> 2^24 0.544 190.206 us 1.03% 207.140 us 1.00% 16.934 us 8.90% FAIL
I32 I64 cuda::std::__4::integral_constant<bool, true> 2^28 0.544 2.875 ms 0.50% 3.149 ms 0.50% 274.202 us 9.54% FAIL
I32 I64 cuda::std::__4::integral_constant<bool, true> 2^16 0 9.113 us 9.59% 9.567 us 7.64% 0.453 us 4.97% PASS
I32 I64 cuda::std::__4::integral_constant<bool, true> 2^20 0 20.935 us 4.39% 22.105 us 3.46% 1.170 us 5.59% FAIL
I32 I64 cuda::std::__4::integral_constant<bool, true> 2^24 0 137.221 us 0.99% 152.410 us 0.70% 15.189 us 11.07% FAIL
I32 I64 cuda::std::__4::integral_constant<bool, true> 2^28 0 1.949 ms 0.16% 2.198 ms 0.14% 248.294 us 12.74% FAIL
I64 I32 cuda::std::__4::integral_constant<bool, false> 2^16 1 10.518 us 8.55% 10.476 us 7.15% -0.042 us -0.40% PASS
I64 I32 cuda::std::__4::integral_constant<bool, false> 2^20 1 31.926 us 2.95% 31.905 us 2.47% -0.021 us -0.07% PASS
I64 I32 cuda::std::__4::integral_constant<bool, false> 2^24 1 377.979 us 0.64% 377.836 us 0.58% -0.143 us -0.04% PASS
I64 I32 cuda::std::__4::integral_constant<bool, false> 2^28 1 5.926 ms 0.50% 5.926 ms 0.50% -0.359 us -0.01% PASS
I64 I32 cuda::std::__4::integral_constant<bool, false> 2^16 0.544 10.233 us 8.40% 10.219 us 6.71% -0.014 us -0.13% PASS
I64 I32 cuda::std::__4::integral_constant<bool, false> 2^20 0.544 29.777 us 3.07% 29.731 us 2.69% -0.046 us -0.15% PASS
I64 I32 cuda::std::__4::integral_constant<bool, false> 2^24 0.544 316.915 us 0.81% 316.757 us 0.77% -0.158 us -0.05% PASS
I64 I32 cuda::std::__4::integral_constant<bool, false> 2^28 0.544 4.918 ms 0.50% 4.919 ms 0.50% 0.043 us 0.00% PASS
I64 I32 cuda::std::__4::integral_constant<bool, false> 2^16 0 10.034 us 9.18% 10.004 us 6.49% -0.030 us -0.30% PASS
I64 I32 cuda::std::__4::integral_constant<bool, false> 2^20 0 29.086 us 3.20% 29.067 us 3.12% -0.019 us -0.07% PASS
I64 I32 cuda::std::__4::integral_constant<bool, false> 2^24 0 217.329 us 0.57% 217.196 us 0.57% -0.134 us -0.06% PASS
I64 I32 cuda::std::__4::integral_constant<bool, false> 2^28 0 3.233 ms 0.13% 3.233 ms 0.13% -0.152 us -0.00% PASS
I64 I32 cuda::std::__4::integral_constant<bool, true> 2^16 1 10.637 us 8.69% 10.948 us 6.35% 0.312 us 2.93% PASS
I64 I32 cuda::std::__4::integral_constant<bool, true> 2^20 1 32.104 us 3.14% 33.831 us 3.13% 1.727 us 5.38% FAIL
I64 I32 cuda::std::__4::integral_constant<bool, true> 2^24 1 379.048 us 0.64% 405.881 us 0.87% 26.833 us 7.08% FAIL
I64 I32 cuda::std::__4::integral_constant<bool, true> 2^28 1 5.942 ms 0.50% 6.364 ms 0.50% 422.929 us 7.12% FAIL
I64 I32 cuda::std::__4::integral_constant<bool, true> 2^16 0.544 10.313 us 8.22% 10.709 us 7.41% 0.396 us 3.84% PASS
I64 I32 cuda::std::__4::integral_constant<bool, true> 2^20 0.544 30.123 us 3.19% 32.174 us 3.45% 2.051 us 6.81% FAIL
I64 I32 cuda::std::__4::integral_constant<bool, true> 2^24 0.544 317.541 us 0.76% 348.690 us 0.85% 31.149 us 9.81% FAIL
I64 I32 cuda::std::__4::integral_constant<bool, true> 2^28 0.544 4.927 ms 0.50% 5.427 ms 0.50% 500.284 us 10.15% FAIL
I64 I32 cuda::std::__4::integral_constant<bool, true> 2^16 0 10.021 us 8.44% 10.402 us 6.88% 0.381 us 3.81% PASS
I64 I32 cuda::std::__4::integral_constant<bool, true> 2^20 0 29.298 us 3.40% 31.012 us 2.57% 1.714 us 5.85% FAIL
I64 I32 cuda::std::__4::integral_constant<bool, true> 2^24 0 216.701 us 0.64% 245.581 us 0.70% 28.880 us 13.33% FAIL
I64 I32 cuda::std::__4::integral_constant<bool, true> 2^28 0 3.223 ms 0.13% 3.687 ms 0.18% 464.623 us 14.42% FAIL
I64 I64 cuda::std::__4::integral_constant<bool, false> 2^16 1 10.052 us 8.63% 9.861 us 6.76% -0.191 us -1.90% PASS
I64 I64 cuda::std::__4::integral_constant<bool, false> 2^20 1 31.559 us 3.20% 31.412 us 2.39% -0.147 us -0.47% PASS
I64 I64 cuda::std::__4::integral_constant<bool, false> 2^24 1 377.742 us 0.59% 377.851 us 0.53% 0.109 us 0.03% PASS
I64 I64 cuda::std::__4::integral_constant<bool, false> 2^28 1 5.927 ms 0.50% 5.927 ms 0.50% -0.072 us -0.00% PASS
I64 I64 cuda::std::__4::integral_constant<bool, false> 2^16 0.544 9.821 us 8.53% 9.940 us 6.83% 0.119 us 1.21% PASS
I64 I64 cuda::std::__4::integral_constant<bool, false> 2^20 0.544 29.529 us 3.40% 29.599 us 2.89% 0.070 us 0.24% PASS
I64 I64 cuda::std::__4::integral_constant<bool, false> 2^24 0.544 316.296 us 0.68% 316.405 us 0.69% 0.109 us 0.03% PASS
I64 I64 cuda::std::__4::integral_constant<bool, false> 2^28 0.544 4.919 ms 0.50% 4.919 ms 0.50% 0.256 us 0.01% PASS
I64 I64 cuda::std::__4::integral_constant<bool, false> 2^16 0 9.588 us 10.02% 9.633 us 7.61% 0.045 us 0.47% PASS
I64 I64 cuda::std::__4::integral_constant<bool, false> 2^20 0 28.784 us 3.26% 28.850 us 2.76% 0.066 us 0.23% PASS
I64 I64 cuda::std::__4::integral_constant<bool, false> 2^24 0 221.116 us 0.52% 221.122 us 0.50% 0.006 us 0.00% PASS
I64 I64 cuda::std::__4::integral_constant<bool, false> 2^28 0 3.312 ms 0.13% 3.312 ms 0.12% 0.068 us 0.00% PASS
I64 I64 cuda::std::__4::integral_constant<bool, true> 2^16 1 10.828 us 8.76% 11.446 us 6.10% 0.618 us 5.71% PASS
I64 I64 cuda::std::__4::integral_constant<bool, true> 2^20 1 32.202 us 2.82% 33.978 us 2.53% 1.776 us 5.51% FAIL
I64 I64 cuda::std::__4::integral_constant<bool, true> 2^24 1 381.326 us 0.66% 402.863 us 0.89% 21.537 us 5.65% FAIL
I64 I64 cuda::std::__4::integral_constant<bool, true> 2^28 1 5.981 ms 0.50% 6.317 ms 0.50% 336.380 us 5.62% FAIL
I64 I64 cuda::std::__4::integral_constant<bool, true> 2^16 0.544 11.070 us 8.04% 11.335 us 5.41% 0.265 us 2.40% PASS
I64 I64 cuda::std::__4::integral_constant<bool, true> 2^20 0.544 30.208 us 3.36% 32.285 us 3.10% 2.077 us 6.88% FAIL
I64 I64 cuda::std::__4::integral_constant<bool, true> 2^24 0.544 321.095 us 0.82% 348.733 us 0.81% 27.638 us 8.61% FAIL
I64 I64 cuda::std::__4::integral_constant<bool, true> 2^28 0.544 4.981 ms 0.50% 5.421 ms 0.50% 440.383 us 8.84% FAIL
I64 I64 cuda::std::__4::integral_constant<bool, true> 2^16 0 10.564 us 8.74% 10.933 us 6.79% 0.368 us 3.49% PASS
I64 I64 cuda::std::__4::integral_constant<bool, true> 2^20 0 29.573 us 3.14% 31.261 us 2.46% 1.689 us 5.71% FAIL
I64 I64 cuda::std::__4::integral_constant<bool, true> 2^24 0 221.903 us 0.52% 248.289 us 0.54% 26.386 us 11.89% FAIL
I64 I64 cuda::std::__4::integral_constant<bool, true> 2^28 0 3.316 ms 0.12% 3.737 ms 0.14% 421.455 us 12.71% FAIL
I128 I32 cuda::std::__4::integral_constant<bool, false> 2^16 1 13.052 us 7.31% 13.067 us 6.38% 0.016 us 0.12% PASS
I128 I32 cuda::std::__4::integral_constant<bool, false> 2^20 1 53.976 us 2.01% 53.970 us 1.57% -0.006 us -0.01% PASS
I128 I32 cuda::std::__4::integral_constant<bool, false> 2^24 1 738.310 us 0.46% 738.431 us 0.48% 0.121 us 0.02% PASS
I128 I32 cuda::std::__4::integral_constant<bool, false> 2^28 1 11.705 ms 0.50% 11.705 ms 0.50% 0.212 us 0.00% PASS
I128 I32 cuda::std::__4::integral_constant<bool, false> 2^16 0.544 12.870 us 7.76% 12.745 us 5.59% -0.124 us -0.97% PASS
I128 I32 cuda::std::__4::integral_constant<bool, false> 2^20 0.544 47.696 us 2.26% 47.814 us 2.13% 0.118 us 0.25% PASS
I128 I32 cuda::std::__4::integral_constant<bool, false> 2^24 0.544 606.811 us 0.68% 607.020 us 0.68% 0.209 us 0.03% PASS
I128 I32 cuda::std::__4::integral_constant<bool, false> 2^28 0.544 9.572 ms 0.50% 9.571 ms 0.50% -0.545 us -0.01% PASS
I128 I32 cuda::std::__4::integral_constant<bool, false> 2^16 0 12.712 us 7.15% 12.605 us 5.97% -0.107 us -0.85% PASS
I128 I32 cuda::std::__4::integral_constant<bool, false> 2^20 0 41.788 us 2.70% 41.681 us 2.12% -0.107 us -0.26% PASS
I128 I32 cuda::std::__4::integral_constant<bool, false> 2^24 0 413.667 us 0.39% 413.871 us 0.36% 0.204 us 0.05% PASS
I128 I32 cuda::std::__4::integral_constant<bool, false> 2^28 0 6.372 ms 0.08% 6.372 ms 0.09% 0.843 us 0.01% PASS
I128 I32 cuda::std::__4::integral_constant<bool, true> 2^16 1 13.018 us 6.82% 13.888 us 5.68% 0.870 us 6.68% FAIL
I128 I32 cuda::std::__4::integral_constant<bool, true> 2^20 1 53.800 us 2.16% 57.750 us 2.10% 3.950 us 7.34% FAIL
I128 I32 cuda::std::__4::integral_constant<bool, true> 2^24 1 737.848 us 0.47% 797.278 us 0.64% 59.430 us 8.05% FAIL
I128 I32 cuda::std::__4::integral_constant<bool, true> 2^28 1 11.698 ms 0.50% 12.630 ms 0.50% 931.807 us 7.97% FAIL
I128 I32 cuda::std::__4::integral_constant<bool, true> 2^16 0.544 12.818 us 7.38% 13.616 us 5.93% 0.799 us 6.23% FAIL
I128 I32 cuda::std::__4::integral_constant<bool, true> 2^20 0.544 46.942 us 2.32% 51.647 us 2.13% 4.705 us 10.02% FAIL
I128 I32 cuda::std::__4::integral_constant<bool, true> 2^24 0.544 606.174 us 0.69% 674.760 us 0.70% 68.586 us 11.31% FAIL
I128 I32 cuda::std::__4::integral_constant<bool, true> 2^28 0.544 9.560 ms 0.50% 10.662 ms 0.50% 1.103 ms 11.53% FAIL
I128 I32 cuda::std::__4::integral_constant<bool, true> 2^16 0 12.697 us 7.15% 13.384 us 4.81% 0.687 us 5.41% FAIL
I128 I32 cuda::std::__4::integral_constant<bool, true> 2^20 0 41.548 us 1.96% 45.190 us 1.93% 3.642 us 8.77% FAIL
I128 I32 cuda::std::__4::integral_constant<bool, true> 2^24 0 412.133 us 0.39% 480.678 us 0.44% 68.545 us 16.63% FAIL
I128 I32 cuda::std::__4::integral_constant<bool, true> 2^28 0 6.349 ms 0.09% 7.483 ms 0.12% 1.133 ms 17.85% FAIL
I128 I64 cuda::std::__4::integral_constant<bool, false> 2^16 1 12.595 us 6.73% 12.687 us 5.45% 0.092 us 0.73% PASS
I128 I64 cuda::std::__4::integral_constant<bool, false> 2^20 1 54.426 us 1.80% 54.526 us 1.63% 0.100 us 0.18% PASS
I128 I64 cuda::std::__4::integral_constant<bool, false> 2^24 1 744.302 us 0.50% 744.240 us 0.50% -0.062 us -0.01% PASS
I128 I64 cuda::std::__4::integral_constant<bool, false> 2^28 1 11.787 ms 0.50% 11.787 ms 0.50% 0.033 us 0.00% PASS
I128 I64 cuda::std::__4::integral_constant<bool, false> 2^16 0.544 12.493 us 6.84% 12.394 us 5.20% -0.099 us -0.79% PASS
I128 I64 cuda::std::__4::integral_constant<bool, false> 2^20 0.544 47.800 us 2.55% 48.634 us 2.32% 0.834 us 1.75% PASS
I128 I64 cuda::std::__4::integral_constant<bool, false> 2^24 0.544 615.550 us 0.68% 615.273 us 0.65% -0.278 us -0.05% PASS
I128 I64 cuda::std::__4::integral_constant<bool, false> 2^28 0.544 9.709 ms 0.50% 9.709 ms 0.50% 0.035 us 0.00% PASS
I128 I64 cuda::std::__4::integral_constant<bool, false> 2^16 0 12.378 us 6.47% 12.235 us 4.91% -0.143 us -1.15% PASS
I128 I64 cuda::std::__4::integral_constant<bool, false> 2^20 0 42.668 us 2.26% 42.604 us 1.99% -0.064 us -0.15% PASS
I128 I64 cuda::std::__4::integral_constant<bool, false> 2^24 0 431.237 us 0.38% 431.117 us 0.36% -0.120 us -0.03% PASS
I128 I64 cuda::std::__4::integral_constant<bool, false> 2^28 0 6.666 ms 0.07% 6.666 ms 0.07% -0.013 us -0.00% PASS
I128 I64 cuda::std::__4::integral_constant<bool, true> 2^16 1 12.672 us 7.05% 13.466 us 4.98% 0.794 us 6.26% FAIL
I128 I64 cuda::std::__4::integral_constant<bool, true> 2^20 1 54.391 us 1.78% 57.538 us 1.81% 3.147 us 5.79% FAIL
I128 I64 cuda::std::__4::integral_constant<bool, true> 2^24 1 742.842 us 0.49% 788.839 us 0.61% 45.997 us 6.19% FAIL
I128 I64 cuda::std::__4::integral_constant<bool, true> 2^28 1 11.774 ms 0.50% 12.470 ms 0.50% 696.084 us 5.91% FAIL
I128 I64 cuda::std::__4::integral_constant<bool, true> 2^16 0.544 12.471 us 6.61% 13.193 us 4.96% 0.722 us 5.79% FAIL
I128 I64 cuda::std::__4::integral_constant<bool, true> 2^20 0.544 47.804 us 2.56% 52.308 us 2.00% 4.504 us 9.42% FAIL
I128 I64 cuda::std::__4::integral_constant<bool, true> 2^24 0.544 614.566 us 0.67% 672.616 us 0.63% 58.050 us 9.45% FAIL
I128 I64 cuda::std::__4::integral_constant<bool, true> 2^28 0.544 9.692 ms 0.50% 10.618 ms 0.50% 926.413 us 9.56% FAIL
I128 I64 cuda::std::__4::integral_constant<bool, true> 2^16 0 12.312 us 7.11% 13.109 us 5.23% 0.796 us 6.47% FAIL
I128 I64 cuda::std::__4::integral_constant<bool, true> 2^20 0 42.546 us 2.59% 45.749 us 2.10% 3.203 us 7.53% FAIL
I128 I64 cuda::std::__4::integral_constant<bool, true> 2^24 0 429.194 us 0.32% 491.004 us 0.32% 61.810 us 14.40% FAIL
I128 I64 cuda::std::__4::integral_constant<bool, true> 2^28 0 6.632 ms 0.07% 7.632 ms 0.08% 1.000 ms 15.09% FAIL
F32 I32 cuda::std::__4::integral_constant<bool, false> 2^16 1 9.488 us 9.89% 9.524 us 6.99% 0.036 us 0.38% PASS
F32 I32 cuda::std::__4::integral_constant<bool, false> 2^20 1 21.830 us 4.02% 21.884 us 3.59% 0.053 us 0.24% PASS
F32 I32 cuda::std::__4::integral_constant<bool, false> 2^24 1 207.294 us 0.86% 207.319 us 0.80% 0.025 us 0.01% PASS
F32 I32 cuda::std::__4::integral_constant<bool, false> 2^28 1 3.174 ms 0.56% 3.174 ms 0.56% 0.015 us 0.00% PASS
F32 I32 cuda::std::__4::integral_constant<bool, false> 2^16 0.544 9.734 us 9.98% 9.795 us 7.76% 0.061 us 0.62% PASS
F32 I32 cuda::std::__4::integral_constant<bool, false> 2^20 0.544 21.942 us 4.28% 21.946 us 3.46% 0.004 us 0.02% PASS
F32 I32 cuda::std::__4::integral_constant<bool, false> 2^24 0.544 185.041 us 1.05% 185.014 us 1.03% -0.027 us -0.01% PASS
F32 I32 cuda::std::__4::integral_constant<bool, false> 2^28 0.544 2.796 ms 0.50% 2.796 ms 0.50% -0.226 us -0.01% PASS
F32 I32 cuda::std::__4::integral_constant<bool, false> 2^16 0 9.041 us 9.95% 9.148 us 7.37% 0.108 us 1.19% PASS
F32 I32 cuda::std::__4::integral_constant<bool, false> 2^20 0 20.787 us 4.17% 20.785 us 3.45% -0.001 us -0.01% PASS
F32 I32 cuda::std::__4::integral_constant<bool, false> 2^24 0 130.512 us 0.94% 130.351 us 0.84% -0.160 us -0.12% PASS
F32 I32 cuda::std::__4::integral_constant<bool, false> 2^28 0 1.831 ms 0.18% 1.830 ms 0.16% -0.471 us -0.03% PASS
F32 I32 cuda::std::__4::integral_constant<bool, true> 2^16 1 9.490 us 9.34% 9.849 us 7.86% 0.359 us 3.79% PASS
F32 I32 cuda::std::__4::integral_constant<bool, true> 2^20 1 21.651 us 4.70% 22.783 us 3.08% 1.133 us 5.23% FAIL
F32 I32 cuda::std::__4::integral_constant<bool, true> 2^24 1 209.725 us 0.84% 225.322 us 1.01% 15.597 us 7.44% FAIL
F32 I32 cuda::std::__4::integral_constant<bool, true> 2^28 1 3.205 ms 0.53% 3.468 ms 0.50% 262.680 us 8.20% FAIL
F32 I32 cuda::std::__4::integral_constant<bool, true> 2^16 0.544 9.676 us 9.91% 9.905 us 6.84% 0.230 us 2.37% PASS
F32 I32 cuda::std::__4::integral_constant<bool, true> 2^20 0.544 21.802 us 4.61% 23.133 us 3.75% 1.331 us 6.11% FAIL
F32 I32 cuda::std::__4::integral_constant<bool, true> 2^24 0.544 188.262 us 1.07% 206.613 us 1.02% 18.350 us 9.75% FAIL
F32 I32 cuda::std::__4::integral_constant<bool, true> 2^28 0.544 2.847 ms 0.50% 3.150 ms 0.50% 303.125 us 10.65% FAIL
F32 I32 cuda::std::__4::integral_constant<bool, true> 2^16 0 9.062 us 9.72% 9.347 us 6.59% 0.285 us 3.15% PASS
F32 I32 cuda::std::__4::integral_constant<bool, true> 2^20 0 20.680 us 4.38% 21.757 us 3.94% 1.077 us 5.21% FAIL
F32 I32 cuda::std::__4::integral_constant<bool, true> 2^24 0 133.447 us 0.85% 151.220 us 0.76% 17.773 us 13.32% FAIL
F32 I32 cuda::std::__4::integral_constant<bool, true> 2^28 0 1.885 ms 0.15% 2.184 ms 0.16% 298.568 us 15.84% FAIL
F32 I64 cuda::std::__4::integral_constant<bool, false> 2^16 1 9.778 us 9.55% 9.663 us 8.03% -0.116 us -1.18% PASS
F32 I64 cuda::std::__4::integral_constant<bool, false> 2^20 1 22.494 us 4.59% 22.345 us 3.68% -0.150 us -0.66% PASS
F32 I64 cuda::std::__4::integral_constant<bool, false> 2^24 1 209.628 us 0.86% 209.679 us 0.80% 0.051 us 0.02% PASS
F32 I64 cuda::std::__4::integral_constant<bool, false> 2^28 1 3.213 ms 0.54% 3.213 ms 0.55% -0.108 us -0.00% PASS
F32 I64 cuda::std::__4::integral_constant<bool, false> 2^16 0.544 9.547 us 9.74% 9.606 us 7.85% 0.059 us 0.62% PASS
F32 I64 cuda::std::__4::integral_constant<bool, false> 2^20 0.544 22.416 us 4.70% 22.186 us 3.35% -0.230 us -1.02% PASS
F32 I64 cuda::std::__4::integral_constant<bool, false> 2^24 0.544 189.858 us 1.08% 189.806 us 0.99% -0.052 us -0.03% PASS
F32 I64 cuda::std::__4::integral_constant<bool, false> 2^28 0.544 2.873 ms 0.50% 2.873 ms 0.50% -0.340 us -0.01% PASS
F32 I64 cuda::std::__4::integral_constant<bool, false> 2^16 0 9.211 us 9.87% 9.266 us 7.58% 0.055 us 0.60% PASS
F32 I64 cuda::std::__4::integral_constant<bool, false> 2^20 0 21.264 us 4.09% 21.225 us 3.92% -0.039 us -0.18% PASS
F32 I64 cuda::std::__4::integral_constant<bool, false> 2^24 0 136.972 us 0.94% 136.913 us 0.81% -0.059 us -0.04% PASS
F32 I64 cuda::std::__4::integral_constant<bool, false> 2^28 0 1.945 ms 0.16% 1.945 ms 0.15% -0.437 us -0.02% PASS
F32 I64 cuda::std::__4::integral_constant<bool, true> 2^16 1 9.514 us 9.99% 9.962 us 6.92% 0.448 us 4.70% PASS
F32 I64 cuda::std::__4::integral_constant<bool, true> 2^20 1 22.028 us 4.60% 23.254 us 4.04% 1.226 us 5.57% FAIL
F32 I64 cuda::std::__4::integral_constant<bool, true> 2^24 1 210.701 us 0.91% 224.307 us 0.94% 13.606 us 6.46% FAIL
F32 I64 cuda::std::__4::integral_constant<bool, true> 2^28 1 3.233 ms 0.51% 3.439 ms 0.50% 205.962 us 6.37% FAIL
F32 I64 cuda::std::__4::integral_constant<bool, true> 2^16 0.544 9.550 us 8.12% 9.842 us 8.17% 0.292 us 3.06% PASS
F32 I64 cuda::std::__4::integral_constant<bool, true> 2^20 0.544 21.888 us 3.60% 23.140 us 3.36% 1.252 us 5.72% FAIL
F32 I64 cuda::std::__4::integral_constant<bool, true> 2^24 0.544 190.309 us 1.02% 207.127 us 1.00% 16.818 us 8.84% FAIL
F32 I64 cuda::std::__4::integral_constant<bool, true> 2^28 0.544 2.875 ms 0.50% 3.150 ms 0.50% 274.132 us 9.53% FAIL
F32 I64 cuda::std::__4::integral_constant<bool, true> 2^16 0 9.176 us 7.14% 9.552 us 7.80% 0.375 us 4.09% PASS
F32 I64 cuda::std::__4::integral_constant<bool, true> 2^20 0 21.073 us 3.34% 22.104 us 3.50% 1.032 us 4.90% FAIL
F32 I64 cuda::std::__4::integral_constant<bool, true> 2^24 0 137.320 us 0.83% 152.563 us 0.77% 15.243 us 11.10% FAIL
F32 I64 cuda::std::__4::integral_constant<bool, true> 2^28 0 1.950 ms 0.15% 2.198 ms 0.15% 248.403 us 12.74% FAIL
F64 I32 cuda::std::__4::integral_constant<bool, false> 2^16 1 10.466 us 5.85% 10.310 us 5.47% -0.155 us -1.48% PASS
F64 I32 cuda::std::__4::integral_constant<bool, false> 2^20 1 32.007 us 2.54% 31.908 us 2.38% -0.099 us -0.31% PASS
F64 I32 cuda::std::__4::integral_constant<bool, false> 2^24 1 377.823 us 0.57% 377.850 us 0.57% 0.027 us 0.01% PASS
F64 I32 cuda::std::__4::integral_constant<bool, false> 2^28 1 5.926 ms 0.50% 5.926 ms 0.50% -0.067 us -0.00% PASS
F64 I32 cuda::std::__4::integral_constant<bool, false> 2^16 0.544 10.686 us 6.56% 10.336 us 6.58% -0.351 us -3.28% PASS
F64 I32 cuda::std::__4::integral_constant<bool, false> 2^20 0.544 29.754 us 2.67% 30.188 us 2.83% 0.435 us 1.46% PASS
F64 I32 cuda::std::__4::integral_constant<bool, false> 2^24 0.544 316.933 us 0.75% 316.882 us 0.79% -0.051 us -0.02% PASS
F64 I32 cuda::std::__4::integral_constant<bool, false> 2^28 0.544 4.918 ms 0.50% 4.919 ms 0.50% 0.297 us 0.01% PASS
F64 I32 cuda::std::__4::integral_constant<bool, false> 2^16 0 10.080 us 6.79% 9.917 us 7.17% -0.163 us -1.62% PASS
F64 I32 cuda::std::__4::integral_constant<bool, false> 2^20 0 29.114 us 2.24% 29.035 us 2.40% -0.078 us -0.27% PASS
F64 I32 cuda::std::__4::integral_constant<bool, false> 2^24 0 217.259 us 0.57% 217.277 us 0.49% 0.018 us 0.01% PASS
F64 I32 cuda::std::__4::integral_constant<bool, false> 2^28 0 3.233 ms 0.12% 3.232 ms 0.12% -0.766 us -0.02% PASS
F64 I32 cuda::std::__4::integral_constant<bool, true> 2^16 1 10.397 us 6.12% 10.933 us 6.89% 0.537 us 5.16% PASS
F64 I32 cuda::std::__4::integral_constant<bool, true> 2^20 1 32.010 us 2.37% 33.800 us 2.71% 1.790 us 5.59% FAIL
F64 I32 cuda::std::__4::integral_constant<bool, true> 2^24 1 378.756 us 0.59% 405.616 us 0.91% 26.860 us 7.09% FAIL
F64 I32 cuda::std::__4::integral_constant<bool, true> 2^28 1 5.941 ms 0.50% 6.365 ms 0.50% 424.126 us 7.14% FAIL
F64 I32 cuda::std::__4::integral_constant<bool, true> 2^16 0.544 10.522 us 6.90% 11.088 us 6.87% 0.566 us 5.38% PASS
F64 I32 cuda::std::__4::integral_constant<bool, true> 2^20 0.544 29.932 us 2.39% 32.130 us 3.05% 2.198 us 7.34% FAIL
F64 I32 cuda::std::__4::integral_constant<bool, true> 2^24 0.544 317.432 us 0.74% 348.707 us 0.87% 31.275 us 9.85% FAIL
F64 I32 cuda::std::__4::integral_constant<bool, true> 2^28 0.544 4.927 ms 0.50% 5.427 ms 0.50% 500.120 us 10.15% FAIL
F64 I32 cuda::std::__4::integral_constant<bool, true> 2^16 0 9.942 us 6.38% 10.538 us 5.94% 0.596 us 6.00% FAIL
F64 I32 cuda::std::__4::integral_constant<bool, true> 2^20 0 29.197 us 3.05% 31.094 us 2.47% 1.897 us 6.50% FAIL
F64 I32 cuda::std::__4::integral_constant<bool, true> 2^24 0 216.607 us 0.61% 245.595 us 0.70% 28.988 us 13.38% FAIL
F64 I32 cuda::std::__4::integral_constant<bool, true> 2^28 0 3.223 ms 0.11% 3.688 ms 0.19% 465.826 us 14.45% FAIL
F64 I64 cuda::std::__4::integral_constant<bool, false> 2^16 1 9.906 us 6.72% 10.007 us 6.63% 0.100 us 1.01% PASS
F64 I64 cuda::std::__4::integral_constant<bool, false> 2^20 1 31.451 us 2.91% 31.492 us 2.31% 0.041 us 0.13% PASS
F64 I64 cuda::std::__4::integral_constant<bool, false> 2^24 1 377.775 us 0.54% 377.852 us 0.55% 0.077 us 0.02% PASS
F64 I64 cuda::std::__4::integral_constant<bool, false> 2^28 1 5.926 ms 0.50% 5.927 ms 0.50% 0.138 us 0.00% PASS
F64 I64 cuda::std::__4::integral_constant<bool, false> 2^16 0.544 9.915 us 7.07% 9.781 us 7.92% -0.134 us -1.35% PASS
F64 I64 cuda::std::__4::integral_constant<bool, false> 2^20 0.544 29.602 us 2.98% 30.115 us 2.73% 0.512 us 1.73% PASS
F64 I64 cuda::std::__4::integral_constant<bool, false> 2^24 0.544 316.427 us 0.65% 316.611 us 0.70% 0.184 us 0.06% PASS
F64 I64 cuda::std::__4::integral_constant<bool, false> 2^28 0.544 4.919 ms 0.50% 4.920 ms 0.50% 0.387 us 0.01% PASS
F64 I64 cuda::std::__4::integral_constant<bool, false> 2^16 0 9.681 us 7.55% 9.576 us 7.56% -0.105 us -1.09% PASS
F64 I64 cuda::std::__4::integral_constant<bool, false> 2^20 0 28.881 us 2.39% 28.872 us 2.48% -0.009 us -0.03% PASS
F64 I64 cuda::std::__4::integral_constant<bool, false> 2^24 0 221.119 us 0.54% 221.288 us 0.52% 0.169 us 0.08% PASS
F64 I64 cuda::std::__4::integral_constant<bool, false> 2^28 0 3.311 ms 0.11% 3.311 ms 0.11% -0.276 us -0.01% PASS
F64 I64 cuda::std::__4::integral_constant<bool, true> 2^16 1 10.842 us 6.84% 11.476 us 6.34% 0.634 us 5.85% PASS
F64 I64 cuda::std::__4::integral_constant<bool, true> 2^20 1 32.175 us 2.24% 33.865 us 2.74% 1.690 us 5.25% FAIL
F64 I64 cuda::std::__4::integral_constant<bool, true> 2^24 1 381.725 us 0.65% 402.896 us 0.83% 21.171 us 5.55% FAIL
F64 I64 cuda::std::__4::integral_constant<bool, true> 2^28 1 5.981 ms 0.50% 6.317 ms 0.50% 335.479 us 5.61% FAIL
F64 I64 cuda::std::__4::integral_constant<bool, true> 2^16 0.544 10.737 us 6.50% 11.206 us 5.20% 0.470 us 4.37% PASS
F64 I64 cuda::std::__4::integral_constant<bool, true> 2^20 0.544 30.316 us 2.63% 32.825 us 2.65% 2.509 us 8.28% FAIL
F64 I64 cuda::std::__4::integral_constant<bool, true> 2^24 0.544 321.029 us 0.76% 348.792 us 0.85% 27.763 us 8.65% FAIL
F64 I64 cuda::std::__4::integral_constant<bool, true> 2^28 0.544 4.981 ms 0.50% 5.422 ms 0.50% 440.682 us 8.85% FAIL
F64 I64 cuda::std::__4::integral_constant<bool, true> 2^16 0 10.429 us 6.40% 11.047 us 5.53% 0.618 us 5.93% FAIL
F64 I64 cuda::std::__4::integral_constant<bool, true> 2^20 0 29.420 us 2.62% 31.346 us 2.88% 1.926 us 6.55% FAIL
F64 I64 cuda::std::__4::integral_constant<bool, true> 2^24 0 221.825 us 0.49% 248.314 us 0.54% 26.489 us 11.94% FAIL
F64 I64 cuda::std::__4::integral_constant<bool, true> 2^28 0 3.315 ms 0.11% 3.737 ms 0.14% 421.507 us 12.71% FAIL

github-actions bot commented Jul 4, 2024

🟩 CI finished in 4h 41m: Pass: 100%/249 | Total: 5d 02h | Avg: 29m 30s | Max: 1h 03m | Hits: 36%/248433
  • 🟩 cub: Pass: 100%/131 | Total: 2d 20h | Avg: 31m 09s | Max: 49m 35s | Hits: 36%/109167

    🟩 cpu
      🟩 amd64              Pass: 100%/123 | Total:  2d 15h | Avg: 30m 53s | Max: 49m 35s | Hits:  38%/102351
      🟩 arm64              Pass: 100%/8   | Total:  4h 41m | Avg: 35m 14s | Max: 38m 01s | Hits:   6%/6816  
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  7h 32m | Avg: 30m 09s | Max: 47m 15s | Hits:  34%/11568 
      🟩 11.8               Pass: 100%/3   | Total:  2h 15m | Avg: 45m 03s | Max: 46m 15s | Hits:  36%/2556  
      🟩 12.5               Pass: 100%/113 | Total:  2d 10h | Avg: 30m 54s | Max: 49m 35s | Hits:  36%/95043 
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 44m 51s | Avg: 22m 25s | Max: 23m 18s | Hits:  24%/1408  
      🟩 nvcc11.1           Pass: 100%/15  | Total:  7h 32m | Avg: 30m 09s | Max: 47m 15s | Hits:  34%/11568 
      🟩 nvcc11.8           Pass: 100%/3   | Total:  2h 15m | Avg: 45m 03s | Max: 46m 15s | Hits:  36%/2556  
      🟩 nvcc12.5           Pass: 100%/111 | Total:  2d 09h | Avg: 31m 04s | Max: 49m 35s | Hits:  36%/93635 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 44m 51s | Avg: 22m 25s | Max: 23m 18s | Hits:  24%/1408  
      🟩 nvcc               Pass: 100%/129 | Total:  2d 19h | Avg: 31m 17s | Max: 49m 35s | Hits:  36%/107759
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  3h 08m | Avg: 31m 22s | Max: 38m 21s | Hits:  22%/4890  
      🟩 Clang10            Pass: 100%/3   | Total:  1h 51m | Avg: 37m 10s | Max: 39m 04s | Hits:  11%/2562  
      🟩 Clang11            Pass: 100%/4   | Total:  2h 15m | Avg: 33m 51s | Max: 36m 09s | Hits:  17%/3416  
      🟩 Clang12            Pass: 100%/4   | Total:  2h 16m | Avg: 34m 02s | Max: 35m 11s | Hits:  17%/3416  
      🟩 Clang13            Pass: 100%/4   | Total:  2h 14m | Avg: 33m 30s | Max: 34m 21s | Hits:  17%/3416  
      🟩 Clang14            Pass: 100%/4   | Total:  2h 21m | Avg: 35m 19s | Max: 37m 27s | Hits:  17%/3416  
      🟩 Clang15            Pass: 100%/4   | Total:  2h 14m | Avg: 33m 36s | Max: 34m 48s | Hits:   8%/3408  
      🟩 Clang16            Pass: 100%/4   | Total:  2h 17m | Avg: 34m 28s | Max: 36m 13s | Hits:   8%/3408  
      🟩 Clang17            Pass: 100%/26  | Total: 11h 17m | Avg: 26m 03s | Max: 36m 29s | Hits:  67%/21856 
      🟩 GCC6               Pass: 100%/2   | Total: 56m 54s | Avg: 28m 27s | Max: 28m 30s | Hits:  34%/1552  
      🟩 GCC7               Pass: 100%/6   | Total:  3h 16m | Avg: 32m 48s | Max: 37m 39s | Hits:  21%/4893  
      🟩 GCC8               Pass: 100%/6   | Total:  3h 10m | Avg: 31m 45s | Max: 34m 32s | Hits:  21%/4893  
      🟩 GCC9               Pass: 100%/6   | Total:  3h 13m | Avg: 32m 13s | Max: 36m 30s | Hits:  21%/4893  
      🟩 GCC10              Pass: 100%/4   | Total:  2h 21m | Avg: 35m 27s | Max: 35m 57s | Hits:  16%/3416  
      🟩 GCC11              Pass: 100%/7   | Total:  4h 35m | Avg: 39m 24s | Max: 46m 15s | Hits:  19%/5964  
      🟩 GCC12              Pass: 100%/4   | Total:  2h 28m | Avg: 37m 06s | Max: 37m 39s | Hits:   6%/3408  
      🟩 GCC13              Pass: 100%/28  | Total: 11h 20m | Avg: 24m 17s | Max: 38m 48s | Hits:  60%/23856 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 58m | Avg: 39m 30s | Max: 41m 52s | Hits:   2%/2334  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 47m 15s | Avg: 47m 15s | Max: 47m 15s | Hits:  35%/695   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 33m | Avg: 46m 51s | Max: 48m 13s | Hits:   0%/1390  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 20m | Avg: 46m 53s | Max: 49m 35s | Hits:   0%/2085  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/59  | Total:  1d 05h | Avg: 30m 27s | Max: 39m 04s | Hits:  38%/49788 
      🟩 GCC                Pass: 100%/63  | Total:  1d 07h | Avg: 29m 54s | Max: 46m 15s | Hits:  37%/52875 
      🟩 Intel              Pass: 100%/3   | Total:  1h 58m | Avg: 39m 30s | Max: 41m 52s | Hits:   2%/2334  
      🟩 MSVC               Pass: 100%/6   | Total:  4h 41m | Avg: 46m 56s | Max: 49m 35s | Hits:   6%/4170  
    🟩 gpu
      🟩 v100               Pass: 100%/131 | Total:  2d 20h | Avg: 31m 09s | Max: 49m 35s | Hits:  36%/109167
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  2d 08h | Avg: 34m 27s | Max: 49m 35s | Hits:  14%/81903 
      🟩 DeviceLaunch       Pass: 100%/8   | Total:  2h 33m | Avg: 19m 09s | Max: 24m 24s | Hits:  99%/6816  
      🟩 GraphCapture       Pass: 100%/8   | Total:  2h 11m | Avg: 16m 24s | Max: 19m 12s | Hits:  99%/6816  
      🟩 HostLaunch         Pass: 100%/8   | Total:  2h 37m | Avg: 19m 44s | Max: 24m 11s | Hits:  99%/6816  
      🟩 TestGPU            Pass: 100%/8   | Total:  3h 46m | Avg: 28m 19s | Max: 36m 29s | Hits:  99%/6816  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  2h 15m | Avg: 45m 03s | Max: 46m 15s | Hits:  36%/2556  
      🟩 90a                Pass: 100%/4   | Total:  1h 17m | Avg: 19m 29s | Max: 20m 13s | Hits:   5%/3408  
    🟩 std
      🟩 11                 Pass: 100%/34  | Total: 17h 09m | Avg: 30m 16s | Max: 44m 37s | Hits:  34%/28537 
      🟩 14                 Pass: 100%/37  | Total: 19h 46m | Avg: 32m 04s | Max: 48m 13s | Hits:  33%/30622 
      🟩 17                 Pass: 100%/36  | Total: 19h 07m | Avg: 31m 51s | Max: 45m 29s | Hits:  32%/29855 
      🟩 20                 Pass: 100%/24  | Total: 11h 57m | Avg: 29m 53s | Max: 49m 35s | Hits:  48%/20153 
    
  • 🟩 thrust: Pass: 100%/118 | Total: 2d 06h | Avg: 27m 40s | Max: 1h 03m | Hits: 36%/139266

    🟩 cpu
      🟩 amd64              Pass: 100%/110 | Total:  2d 02h | Avg: 27m 41s | Max:  1h 03m | Hits:  37%/129822
      🟩 arm64              Pass: 100%/8   | Total:  3h 39m | Avg: 27m 29s | Max: 32m 14s | Hits:  30%/9444  
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  7h 18m | Avg: 29m 12s | Max: 55m 42s | Hits:  14%/17705 
      🟩 11.8               Pass: 100%/3   | Total:  1h 55m | Avg: 38m 38s | Max: 44m 27s | Hits:  16%/3543  
      🟩 12.5               Pass: 100%/100 | Total:  1d 21h | Avg: 27m 07s | Max:  1h 03m | Hits:  40%/118018
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 49m 08s | Avg: 24m 34s | Max: 24m 45s | Hits:  61%/2360  
      🟩 nvcc11.1           Pass: 100%/15  | Total:  7h 18m | Avg: 29m 12s | Max: 55m 42s | Hits:  14%/17705 
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 55m | Avg: 38m 38s | Max: 44m 27s | Hits:  16%/3543  
      🟩 nvcc12.5           Pass: 100%/98  | Total:  1d 20h | Avg: 27m 10s | Max:  1h 03m | Hits:  40%/115658
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 49m 08s | Avg: 24m 34s | Max: 24m 45s | Hits:  61%/2360  
      🟩 nvcc               Pass: 100%/116 | Total:  2d 05h | Avg: 27m 44s | Max:  1h 03m | Hits:  36%/136906
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  2h 45m | Avg: 27m 35s | Max: 31m 30s | Hits:  11%/7080  
      🟩 Clang10            Pass: 100%/3   | Total:  1h 30m | Avg: 30m 18s | Max: 33m 07s | Hits:  10%/3540  
      🟩 Clang11            Pass: 100%/4   | Total:  2h 02m | Avg: 30m 41s | Max: 32m 06s | Hits:  11%/4720  
      🟩 Clang12            Pass: 100%/4   | Total:  1h 59m | Avg: 29m 47s | Max: 32m 04s | Hits:  11%/4720  
      🟩 Clang13            Pass: 100%/4   | Total:  1h 59m | Avg: 29m 53s | Max: 31m 24s | Hits:  11%/4720  
      🟩 Clang14            Pass: 100%/4   | Total:  1h 44m | Avg: 26m 04s | Max: 27m 23s | Hits:  58%/4720  
      🟩 Clang15            Pass: 100%/4   | Total:  1h 46m | Avg: 26m 44s | Max: 29m 51s | Hits:  58%/4720  
      🟩 Clang16            Pass: 100%/4   | Total:  1h 46m | Avg: 26m 32s | Max: 29m 30s | Hits:  58%/4720  
      🟩 Clang17            Pass: 100%/18  | Total:  5h 28m | Avg: 18m 15s | Max: 28m 12s | Hits:  77%/21240 
      🟩 GCC6               Pass: 100%/2   | Total: 53m 09s | Avg: 26m 34s | Max: 29m 01s | Hits:  16%/2360  
      🟩 GCC7               Pass: 100%/6   | Total:  2h 48m | Avg: 28m 08s | Max: 30m 50s | Hits:  14%/7086  
      🟩 GCC8               Pass: 100%/6   | Total:  2h 55m | Avg: 29m 17s | Max: 34m 00s | Hits:  14%/7086  
      🟩 GCC9               Pass: 100%/6   | Total:  2h 59m | Avg: 29m 50s | Max: 34m 48s | Hits:  14%/7086  
      🟩 GCC10              Pass: 100%/4   | Total:  2h 08m | Avg: 32m 02s | Max: 35m 35s | Hits:  13%/4724  
      🟩 GCC11              Pass: 100%/7   | Total:  3h 51m | Avg: 33m 07s | Max: 44m 27s | Hits:  33%/8267  
      🟩 GCC12              Pass: 100%/4   | Total:  2h 19m | Avg: 34m 53s | Max: 40m 26s | Hits:  13%/4724  
      🟩 GCC13              Pass: 100%/20  | Total:  6h 37m | Avg: 19m 52s | Max: 32m 14s | Hits:  52%/23620 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 06m | Avg: 42m 08s | Max: 48m 46s | Hits:   2%/3549  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 55m 42s | Avg: 55m 42s | Max: 55m 42s | Hits:   0%/1176  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 50m | Avg: 55m 12s | Max: 56m 41s | Hits:   0%/2352  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  3h 55m | Avg: 39m 19s | Max:  1h 03m | Hits:  49%/7056  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/51  | Total: 21h 03m | Avg: 24m 46s | Max: 33m 07s | Hits:  45%/60180 
      🟩 GCC                Pass: 100%/55  | Total:  1d 00h | Avg: 26m 47s | Max: 44m 27s | Hits:  30%/64953 
      🟩 Intel              Pass: 100%/3   | Total:  2h 06m | Avg: 42m 08s | Max: 48m 46s | Hits:   2%/3549  
      🟩 MSVC               Pass: 100%/9   | Total:  6h 42m | Avg: 44m 40s | Max:  1h 03m | Hits:  33%/10584 
    🟩 gpu
      🟩 v100               Pass: 100%/118 | Total:  2d 06h | Avg: 27m 40s | Max:  1h 03m | Hits:  36%/139266
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  2d 02h | Avg: 30m 45s | Max:  1h 03m | Hits:  24%/116850
      🟩 TestCPU            Pass: 100%/11  | Total:  1h 42m | Avg:  9m 17s | Max: 18m 52s | Hits:  99%/12972 
      🟩 TestGPU            Pass: 100%/8   | Total:  1h 58m | Avg: 14m 47s | Max: 17m 15s | Hits:  99%/9444  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 55m | Avg: 38m 38s | Max: 44m 27s | Hits:  16%/3543  
      🟩 90a                Pass: 100%/4   | Total:  1h 17m | Avg: 19m 27s | Max: 21m 29s | Hits:   1%/4724  
    🟩 std
      🟩 11                 Pass: 100%/30  | Total: 11h 44m | Avg: 23m 29s | Max: 35m 26s | Hits:  36%/35418 
      🟩 14                 Pass: 100%/34  | Total: 16h 40m | Avg: 29m 26s | Max: 57m 56s | Hits:  32%/40122 
      🟩 17                 Pass: 100%/33  | Total: 16h 28m | Avg: 29m 56s | Max:  1h 03m | Hits:  34%/38946 
      🟩 20                 Pass: 100%/21  | Total:  9h 32m | Avg: 27m 16s | Max: 59m 58s | Hits:  47%/24780 
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental

🏃‍ Runner counts (total jobs: 249)

# Runner
178 linux-amd64-cpu16
40 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16

github-actions bot commented Jul 5, 2024

🟩 CI finished in 1h 53m: Pass: 100%/249 | Total: 1d 11h | Avg: 8m 27s | Max: 29m 01s | Hits: 97%/248433
  • 🟩 cub: Pass: 100%/131 | Total: 23h 34m | Avg: 10m 47s | Max: 28m 27s | Hits: 97%/109167

    🟩 cpu
      🟩 amd64              Pass: 100%/123 | Total: 22h 09m | Avg: 10m 48s | Max: 28m 27s | Hits:  97%/102351
      🟩 arm64              Pass: 100%/8   | Total:  1h 24m | Avg: 10m 32s | Max: 11m 04s | Hits:  96%/6816  
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  1h 04m | Avg:  4m 18s | Max: 15m 00s | Hits:  99%/11568 
      🟩 11.8               Pass: 100%/3   | Total: 32m 08s | Avg: 10m 42s | Max: 10m 57s | Hits:  96%/2556  
      🟩 12.5               Pass: 100%/113 | Total: 21h 57m | Avg: 11m 39s | Max: 28m 27s | Hits:  97%/95043 
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total:  7m 01s | Avg:  3m 30s | Max:  3m 33s | Hits: 100%/1408  
      🟩 nvcc11.1           Pass: 100%/15  | Total:  1h 04m | Avg:  4m 18s | Max: 15m 00s | Hits:  99%/11568 
      🟩 nvcc11.8           Pass: 100%/3   | Total: 32m 08s | Avg: 10m 42s | Max: 10m 57s | Hits:  96%/2556  
      🟩 nvcc12.5           Pass: 100%/111 | Total: 21h 50m | Avg: 11m 48s | Max: 28m 27s | Hits:  97%/93635 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  7m 01s | Avg:  3m 30s | Max:  3m 33s | Hits: 100%/1408  
      🟩 nvcc               Pass: 100%/129 | Total: 23h 27m | Avg: 10m 54s | Max: 28m 27s | Hits:  97%/107759
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total: 38m 51s | Avg:  6m 28s | Max:  9m 41s | Hits:  98%/4890  
      🟩 Clang10            Pass: 100%/3   | Total: 28m 01s | Avg:  9m 20s | Max:  9m 57s | Hits:  96%/2562  
      🟩 Clang11            Pass: 100%/4   | Total: 34m 24s | Avg:  8m 36s | Max:  8m 42s | Hits:  96%/3416  
      🟩 Clang12            Pass: 100%/4   | Total: 34m 52s | Avg:  8m 43s | Max:  8m 55s | Hits:  96%/3416  
      🟩 Clang13            Pass: 100%/4   | Total: 34m 49s | Avg:  8m 42s | Max:  8m 50s | Hits:  96%/3416  
      🟩 Clang14            Pass: 100%/4   | Total: 35m 50s | Avg:  8m 57s | Max:  9m 37s | Hits:  96%/3416  
      🟩 Clang15            Pass: 100%/4   | Total: 34m 51s | Avg:  8m 42s | Max:  8m 58s | Hits:  96%/3408  
      🟩 Clang16            Pass: 100%/4   | Total: 35m 35s | Avg:  8m 53s | Max:  9m 13s | Hits:  96%/3408  
      🟩 Clang17            Pass: 100%/26  | Total:  6h 24m | Avg: 14m 47s | Max: 28m 27s | Hits:  98%/21856 
      🟩 GCC6               Pass: 100%/2   | Total:  6m 43s | Avg:  3m 21s | Max:  3m 26s | Hits:  99%/1552  
      🟩 GCC7               Pass: 100%/6   | Total: 36m 17s | Avg:  6m 02s | Max:  8m 44s | Hits:  97%/4893  
      🟩 GCC8               Pass: 100%/6   | Total: 36m 41s | Avg:  6m 06s | Max:  8m 56s | Hits:  97%/4893  
      🟩 GCC9               Pass: 100%/6   | Total: 37m 01s | Avg:  6m 10s | Max:  9m 00s | Hits:  97%/4893  
      🟩 GCC10              Pass: 100%/4   | Total: 37m 02s | Avg:  9m 15s | Max:  9m 45s | Hits:  96%/3416  
      🟩 GCC11              Pass: 100%/7   | Total:  1h 07m | Avg:  9m 41s | Max: 10m 57s | Hits:  96%/5964  
      🟩 GCC12              Pass: 100%/4   | Total: 36m 17s | Avg:  9m 04s | Max:  9m 36s | Hits:  96%/3408  
      🟩 GCC13              Pass: 100%/28  | Total:  6h 50m | Avg: 14m 39s | Max: 28m 19s | Hits:  98%/23856 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total: 15m 42s | Avg:  5m 14s | Max:  5m 31s | Hits: 100%/2334  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 15m 00s | Avg: 15m 00s | Max: 15m 00s | Hits:  98%/695   
      🟩 MSVC14.29          Pass: 100%/2   | Total: 20m 29s | Avg: 10m 14s | Max: 10m 16s | Hits:  98%/1390  
      🟩 MSVC14.39          Pass: 100%/3   | Total: 32m 40s | Avg: 10m 53s | Max: 11m 19s | Hits:  98%/2085  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/59  | Total: 11h 01m | Avg: 11m 13s | Max: 28m 27s | Hits:  97%/49788 
      🟩 GCC                Pass: 100%/63  | Total: 11h 08m | Avg: 10m 36s | Max: 28m 19s | Hits:  97%/52875 
      🟩 Intel              Pass: 100%/3   | Total: 15m 42s | Avg:  5m 14s | Max:  5m 31s | Hits: 100%/2334  
      🟩 MSVC               Pass: 100%/6   | Total:  1h 08m | Avg: 11m 21s | Max: 15m 00s | Hits:  98%/4170  
    🟩 gpu
      🟩 v100               Pass: 100%/131 | Total: 23h 34m | Avg: 10m 47s | Max: 28m 27s | Hits:  97%/109167
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total: 13h 24m | Avg:  8m 07s | Max: 15m 00s | Hits:  97%/81903 
      🟩 DeviceLaunch       Pass: 100%/8   | Total:  2h 20m | Avg: 17m 35s | Max: 22m 09s | Hits:  99%/6816  
      🟩 GraphCapture       Pass: 100%/8   | Total:  2h 02m | Avg: 15m 20s | Max: 19m 09s | Hits:  99%/6816  
      🟩 HostLaunch         Pass: 100%/8   | Total:  2h 20m | Avg: 17m 31s | Max: 21m 39s | Hits:  99%/6816  
      🟩 TestGPU            Pass: 100%/8   | Total:  3h 25m | Avg: 25m 41s | Max: 28m 27s | Hits:  99%/6816  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total: 32m 08s | Avg: 10m 42s | Max: 10m 57s | Hits:  96%/2556  
      🟩 90a                Pass: 100%/4   | Total: 23m 35s | Avg:  5m 53s | Max:  6m 15s | Hits:  96%/3408  
    🟩 std
      🟩 11                 Pass: 100%/34  | Total:  5h 37m | Avg:  9m 55s | Max: 24m 19s | Hits:  97%/28537 
      🟩 14                 Pass: 100%/37  | Total:  6h 30m | Avg: 10m 33s | Max: 26m 01s | Hits:  97%/30622 
      🟩 17                 Pass: 100%/36  | Total:  6h 17m | Avg: 10m 29s | Max: 26m 04s | Hits:  97%/29855 
      🟩 20                 Pass: 100%/24  | Total:  5h 08m | Avg: 12m 51s | Max: 28m 27s | Hits:  97%/20153 
    
  • 🟩 thrust: Pass: 100%/118 | Total: 11h 31m | Avg: 5m 51s | Max: 29m 01s | Hits: 98%/139266

    🟩 cpu
      🟩 amd64              Pass: 100%/110 | Total: 10h 42m | Avg:  5m 50s | Max: 29m 01s | Hits:  98%/129822
      🟩 arm64              Pass: 100%/8   | Total: 49m 28s | Avg:  6m 11s | Max: 24m 59s | Hits:  90%/9444  
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  1h 46m | Avg:  7m 05s | Max: 29m 01s | Hits:  92%/17705 
      🟩 11.8               Pass: 100%/3   | Total: 10m 22s | Avg:  3m 27s | Max:  3m 34s | Hits:  99%/3543  
      🟩 12.5               Pass: 100%/100 | Total:  9h 35m | Avg:  5m 45s | Max: 24m 59s | Hits:  98%/118018
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total:  7m 15s | Avg:  3m 37s | Max:  3m 40s | Hits: 100%/2360  
      🟩 nvcc11.1           Pass: 100%/15  | Total:  1h 46m | Avg:  7m 05s | Max: 29m 01s | Hits:  92%/17705 
      🟩 nvcc11.8           Pass: 100%/3   | Total: 10m 22s | Avg:  3m 27s | Max:  3m 34s | Hits:  99%/3543  
      🟩 nvcc12.5           Pass: 100%/98  | Total:  9h 27m | Avg:  5m 47s | Max: 24m 59s | Hits:  98%/115658
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  7m 15s | Avg:  3m 37s | Max:  3m 40s | Hits: 100%/2360  
      🟩 nvcc               Pass: 100%/116 | Total: 11h 24m | Avg:  5m 54s | Max: 29m 01s | Hits:  98%/136906
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total: 22m 56s | Avg:  3m 49s | Max:  4m 17s | Hits: 100%/7080  
      🟩 Clang10            Pass: 100%/3   | Total: 12m 23s | Avg:  4m 07s | Max:  4m 20s | Hits: 100%/3540  
      🟩 Clang11            Pass: 100%/4   | Total: 15m 08s | Avg:  3m 47s | Max:  4m 02s | Hits: 100%/4720  
      🟩 Clang12            Pass: 100%/4   | Total: 14m 53s | Avg:  3m 43s | Max:  4m 01s | Hits: 100%/4720  
      🟩 Clang13            Pass: 100%/4   | Total: 14m 55s | Avg:  3m 43s | Max:  3m 49s | Hits: 100%/4720  
      🟩 Clang14            Pass: 100%/4   | Total: 14m 30s | Avg:  3m 37s | Max:  3m 39s | Hits: 100%/4720  
      🟩 Clang15            Pass: 100%/4   | Total: 14m 38s | Avg:  3m 39s | Max:  3m 47s | Hits: 100%/4720  
      🟩 Clang16            Pass: 100%/4   | Total: 15m 04s | Avg:  3m 46s | Max:  4m 02s | Hits: 100%/4720  
      🟩 Clang17            Pass: 100%/18  | Total:  1h 53m | Avg:  6m 18s | Max: 14m 55s | Hits: 100%/21240 
      🟩 GCC6               Pass: 100%/2   | Total:  6m 06s | Avg:  3m 03s | Max:  3m 13s | Hits:  99%/2360  
      🟩 GCC7               Pass: 100%/6   | Total: 39m 42s | Avg:  6m 37s | Max: 23m 01s | Hits:  94%/7086  
      🟩 GCC8               Pass: 100%/6   | Total: 20m 09s | Avg:  3m 21s | Max:  3m 40s | Hits:  99%/7086  
      🟩 GCC9               Pass: 100%/6   | Total: 46m 45s | Avg:  7m 47s | Max: 29m 01s | Hits:  86%/7086  
      🟩 GCC10              Pass: 100%/4   | Total: 15m 11s | Avg:  3m 47s | Max:  3m 56s | Hits:  99%/4724  
      🟩 GCC11              Pass: 100%/7   | Total: 28m 57s | Avg:  4m 08s | Max:  6m 52s | Hits:  96%/8267  
      🟩 GCC12              Pass: 100%/4   | Total: 15m 31s | Avg:  3m 52s | Max:  4m 11s | Hits:  99%/4724  
      🟩 GCC13              Pass: 100%/20  | Total:  2h 22m | Avg:  7m 08s | Max: 24m 59s | Hits:  95%/23620 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total: 14m 03s | Avg:  4m 41s | Max:  4m 56s | Hits: 100%/3549  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 15m 42s | Avg: 15m 42s | Max: 15m 42s | Hits:  98%/1176  
      🟩 MSVC14.29          Pass: 100%/2   | Total: 21m 56s | Avg: 10m 58s | Max: 11m 05s | Hits:  98%/2352  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  1h 27m | Avg: 14m 33s | Max: 19m 50s | Hits:  98%/7056  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/51  | Total:  3h 57m | Avg:  4m 39s | Max: 14m 55s | Hits: 100%/60180 
      🟩 GCC                Pass: 100%/55  | Total:  5h 15m | Avg:  5m 43s | Max: 29m 01s | Hits:  95%/64953 
      🟩 Intel              Pass: 100%/3   | Total: 14m 03s | Avg:  4m 41s | Max:  4m 56s | Hits: 100%/3549  
      🟩 MSVC               Pass: 100%/9   | Total:  2h 04m | Avg: 13m 53s | Max: 19m 50s | Hits:  98%/10584 
    🟩 gpu
      🟩 v100               Pass: 100%/118 | Total: 11h 31m | Avg:  5m 51s | Max: 29m 01s | Hits:  98%/139266
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  8h 03m | Avg:  4m 52s | Max: 29m 01s | Hits:  97%/116850
      🟩 TestCPU            Pass: 100%/11  | Total:  1h 41m | Avg:  9m 12s | Max: 19m 50s | Hits:  99%/12972 
      🟩 TestGPU            Pass: 100%/8   | Total:  1h 47m | Avg: 13m 24s | Max: 15m 06s | Hits:  99%/9444  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total: 10m 22s | Avg:  3m 27s | Max:  3m 34s | Hits:  99%/3543  
      🟩 90a                Pass: 100%/4   | Total: 13m 15s | Avg:  3m 18s | Max:  3m 27s | Hits:  99%/4724  
    🟩 std
      🟩 11                 Pass: 100%/30  | Total:  2h 27m | Avg:  4m 54s | Max: 24m 59s | Hits:  97%/35418 
      🟩 14                 Pass: 100%/34  | Total:  3h 53m | Avg:  6m 52s | Max: 29m 01s | Hits:  96%/40122 
      🟩 17                 Pass: 100%/33  | Total:  3h 00m | Avg:  5m 27s | Max: 19m 50s | Hits:  99%/38946 
      🟩 20                 Pass: 100%/21  | Total:  2h 10m | Avg:  6m 13s | Max: 17m 27s | Hits:  98%/24780 
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental

🏃 Runner counts (total jobs: 249)

# Runner
178 linux-amd64-cpu16
40 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16

thrust/thrust/system/cuda/detail/remove.h (outdated, resolved)
cub/cub/device/dispatch/dispatch_select_if.cuh (outdated, resolved)
cub/cub/agent/single_pass_scan_operators.cuh (outdated, resolved)

github-actions bot commented Jul 6, 2024

🟨 CI finished in 1h 36m: Pass: 99%/249 | Total: 2d 22h | Avg: 17m 02s | Max: 39m 06s | Hits: 84%/247581
  • 🟨 cub: Pass: 99%/131 | Total: 1d 02h | Avg: 12m 18s | Max: 35m 51s | Hits: 94%/108315

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  99%/123 | Total:  1d 01h | Avg: 12m 37s | Max: 35m 51s | Hits:  93%/101499
      🟩 arm64              Pass: 100%/8   | Total:  1h 00m | Avg:  7m 35s | Max:  8m 03s | Hits:  98%/6816  
    🔍 ctk: 12.5 🔍
      🟩 11.1               Pass: 100%/15  | Total:  1h 25m | Avg:  5m 43s | Max: 16m 50s | Hits:  98%/11568 
      🟩 11.8               Pass: 100%/3   | Total: 22m 13s | Avg:  7m 24s | Max:  7m 41s | Hits:  98%/2556  
      🔍 12.5               Pass:  99%/113 | Total:  1d 01h | Avg: 13m 18s | Max: 35m 51s | Hits:  93%/94191 
    🔍 cudacxx: nvcc12.5 🔍
      🟩 ClangCUDA17        Pass: 100%/2   | Total:  9m 51s | Avg:  4m 55s | Max:  4m 59s | Hits:  98%/1408  
      🟩 nvcc11.1           Pass: 100%/15  | Total:  1h 25m | Avg:  5m 43s | Max: 16m 50s | Hits:  98%/11568 
      🟩 nvcc11.8           Pass: 100%/3   | Total: 22m 13s | Avg:  7m 24s | Max:  7m 41s | Hits:  98%/2556  
      🔍 nvcc12.5           Pass:  99%/111 | Total:  1d 00h | Avg: 13m 28s | Max: 35m 51s | Hits:  93%/92783 
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 51s | Avg:  4m 55s | Max:  4m 59s | Hits:  98%/1408  
      🔍 nvcc               Pass:  99%/129 | Total:  1d 02h | Avg: 12m 25s | Max: 35m 51s | Hits:  94%/106907
    🔍 cxx: Clang17 🔍
      🟩 Clang9             Pass: 100%/6   | Total: 36m 35s | Avg:  6m 05s | Max:  7m 24s | Hits:  98%/4890  
      🟩 Clang10            Pass: 100%/3   | Total: 22m 47s | Avg:  7m 35s | Max:  8m 02s | Hits:  98%/2562  
      🟩 Clang11            Pass: 100%/4   | Total: 27m 23s | Avg:  6m 50s | Max:  7m 17s | Hits:  98%/3416  
      🟩 Clang12            Pass: 100%/4   | Total: 26m 40s | Avg:  6m 40s | Max:  6m 49s | Hits:  98%/3416  
      🟩 Clang13            Pass: 100%/4   | Total: 26m 50s | Avg:  6m 42s | Max:  6m 56s | Hits:  98%/3416  
      🟩 Clang14            Pass: 100%/4   | Total: 27m 11s | Avg:  6m 47s | Max:  6m 58s | Hits:  98%/3416  
      🟩 Clang15            Pass: 100%/4   | Total: 27m 06s | Avg:  6m 46s | Max:  7m 21s | Hits:  98%/3408  
      🟩 Clang16            Pass: 100%/4   | Total: 26m 52s | Avg:  6m 43s | Max:  6m 50s | Hits:  98%/3408  
      🔍 Clang17            Pass:  96%/26  | Total:  7h 48m | Avg: 18m 01s | Max: 31m 24s | Hits:  92%/21004 
      🟩 GCC6               Pass: 100%/2   | Total:  9m 20s | Avg:  4m 40s | Max:  4m 41s | Hits:  98%/1552  
      🟩 GCC7               Pass: 100%/6   | Total: 58m 11s | Avg:  9m 41s | Max: 15m 26s | Hits:  86%/4893  
      🟩 GCC8               Pass: 100%/6   | Total: 56m 57s | Avg:  9m 29s | Max: 14m 27s | Hits:  86%/4893  
      🟩 GCC9               Pass: 100%/6   | Total:  1h 04m | Avg: 10m 44s | Max: 16m 36s | Hits:  84%/4893  
      🟩 GCC10              Pass: 100%/4   | Total: 26m 35s | Avg:  6m 38s | Max:  6m 55s | Hits:  97%/3416  
      🟩 GCC11              Pass: 100%/7   | Total: 49m 43s | Avg:  7m 06s | Max:  7m 41s | Hits:  98%/5964  
      🟩 GCC12              Pass: 100%/4   | Total: 27m 47s | Avg:  6m 56s | Max:  7m 17s | Hits:  97%/3408  
      🟩 GCC13              Pass: 100%/28  | Total:  8h 40m | Avg: 18m 34s | Max: 35m 51s | Hits:  92%/23856 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total: 24m 22s | Avg:  8m 07s | Max:  8m 17s | Hits:  98%/2334  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 16m 50s | Avg: 16m 50s | Max: 16m 50s | Hits:  97%/695   
      🟩 MSVC14.29          Pass: 100%/2   | Total: 26m 05s | Avg: 13m 02s | Max: 13m 03s | Hits:  97%/1390  
      🟩 MSVC14.39          Pass: 100%/3   | Total: 42m 20s | Avg: 14m 06s | Max: 14m 35s | Hits:  97%/2085  
    🔍 cxx_family: Clang 🔍
      🔍 Clang              Pass:  98%/59  | Total: 11h 30m | Avg: 11m 41s | Max: 31m 24s | Hits:  95%/48936 
      🟩 GCC                Pass: 100%/63  | Total: 13h 33m | Avg: 12m 54s | Max: 35m 51s | Hits:  92%/52875 
      🟩 Intel              Pass: 100%/3   | Total: 24m 22s | Avg:  8m 07s | Max:  8m 17s | Hits:  98%/2334  
      🟩 MSVC               Pass: 100%/6   | Total:  1h 25m | Avg: 14m 12s | Max: 16m 50s | Hits:  97%/4170  
    🔍 jobs: GraphCapture 🔍
      🟩 Build              Pass: 100%/99  | Total: 15h 32m | Avg:  9m 24s | Max: 29m 05s | Hits:  92%/81903 
      🟩 DeviceLaunch       Pass: 100%/8   | Total:  2h 42m | Avg: 20m 17s | Max: 25m 54s | Hits:  99%/6816  
      🔍 GraphCapture       Pass:  87%/8   | Total:  2h 09m | Avg: 16m 08s | Max: 22m 56s | Hits:  99%/5964  
      🟩 HostLaunch         Pass: 100%/8   | Total:  2h 35m | Avg: 19m 24s | Max: 25m 48s | Hits:  99%/6816  
      🟩 TestGPU            Pass: 100%/8   | Total:  3h 54m | Avg: 29m 15s | Max: 35m 51s | Hits:  99%/6816  
    🔍 std: 14 🔍
      🟩 11                 Pass: 100%/34  | Total:  6h 38m | Avg: 11m 43s | Max: 35m 48s | Hits:  93%/28537 
      🔍 14                 Pass:  97%/37  | Total:  6h 59m | Avg: 11m 19s | Max: 28m 24s | Hits:  94%/29770 
      🟩 17                 Pass: 100%/36  | Total:  7h 51m | Avg: 13m 06s | Max: 35m 51s | Hits:  94%/29855 
      🟩 20                 Pass: 100%/24  | Total:  5h 23m | Avg: 13m 27s | Max: 29m 05s | Hits:  94%/20153 
    🟨 gpu
      🟨 v100               Pass:  99%/131 | Total:  1d 02h | Avg: 12m 18s | Max: 35m 51s | Hits:  94%/108315
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total: 22m 13s | Avg:  7m 24s | Max:  7m 41s | Hits:  98%/2556  
      🟩 90a                Pass: 100%/4   | Total: 20m 00s | Avg:  5m 00s | Max:  5m 30s | Hits:  97%/3408  
    
  • 🟩 thrust: Pass: 100%/118 | Total: 1d 19h | Avg: 22m 16s | Max: 39m 06s | Hits: 76%/139266

    🟩 cpu
      🟩 amd64              Pass: 100%/110 | Total:  1d 16h | Avg: 22m 08s | Max: 39m 06s | Hits:  76%/129822
      🟩 arm64              Pass: 100%/8   | Total:  3h 13m | Avg: 24m 12s | Max: 26m 41s | Hits:  69%/9444  
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  4h 22m | Avg: 17m 29s | Max: 37m 02s | Hits:  79%/17705 
      🟩 11.8               Pass: 100%/3   | Total:  1h 35m | Avg: 31m 58s | Max: 35m 13s | Hits:  69%/3543  
      🟩 12.5               Pass: 100%/100 | Total:  1d 13h | Avg: 22m 42s | Max: 39m 06s | Hits:  75%/118018
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 48m 37s | Avg: 24m 18s | Max: 25m 39s | Hits:  68%/2360  
      🟩 nvcc11.1           Pass: 100%/15  | Total:  4h 22m | Avg: 17m 29s | Max: 37m 02s | Hits:  79%/17705 
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 35m | Avg: 31m 58s | Max: 35m 13s | Hits:  69%/3543  
      🟩 nvcc12.5           Pass: 100%/98  | Total:  1d 13h | Avg: 22m 40s | Max: 39m 06s | Hits:  76%/115658
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 48m 37s | Avg: 24m 18s | Max: 25m 39s | Hits:  68%/2360  
      🟩 nvcc               Pass: 100%/116 | Total:  1d 19h | Avg: 22m 14s | Max: 39m 06s | Hits:  76%/136906
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  1h 59m | Avg: 19m 52s | Max: 25m 22s | Hits:  74%/7080  
      🟩 Clang10            Pass: 100%/3   | Total:  1h 19m | Avg: 26m 38s | Max: 29m 00s | Hits:  70%/3540  
      🟩 Clang11            Pass: 100%/4   | Total:  1h 36m | Avg: 24m 13s | Max: 26m 22s | Hits:  69%/4720  
      🟩 Clang12            Pass: 100%/4   | Total:  1h 38m | Avg: 24m 43s | Max: 26m 05s | Hits:  69%/4720  
      🟩 Clang13            Pass: 100%/4   | Total:  1h 39m | Avg: 24m 46s | Max: 27m 24s | Hits:  69%/4720  
      🟩 Clang14            Pass: 100%/4   | Total:  1h 37m | Avg: 24m 27s | Max: 26m 05s | Hits:  70%/4720  
      🟩 Clang15            Pass: 100%/4   | Total:  1h 40m | Avg: 25m 09s | Max: 28m 23s | Hits:  69%/4720  
      🟩 Clang16            Pass: 100%/4   | Total:  1h 35m | Avg: 23m 59s | Max: 25m 51s | Hits:  69%/4720  
      🟩 Clang17            Pass: 100%/18  | Total:  5h 11m | Avg: 17m 19s | Max: 26m 45s | Hits:  83%/21240 
      🟩 GCC6               Pass: 100%/2   | Total: 30m 47s | Avg: 15m 23s | Max: 16m 18s | Hits:  79%/2360  
      🟩 GCC7               Pass: 100%/6   | Total:  2h 03m | Avg: 20m 34s | Max: 26m 04s | Hits:  74%/7086  
      🟩 GCC8               Pass: 100%/6   | Total:  1h 58m | Avg: 19m 49s | Max: 25m 50s | Hits:  74%/7086  
      🟩 GCC9               Pass: 100%/6   | Total:  2h 06m | Avg: 21m 03s | Max: 27m 49s | Hits:  74%/7086  
      🟩 GCC10              Pass: 100%/4   | Total:  1h 48m | Avg: 27m 08s | Max: 30m 23s | Hits:  69%/4724  
      🟩 GCC11              Pass: 100%/7   | Total:  3h 23m | Avg: 29m 02s | Max: 35m 13s | Hits:  69%/8267  
      🟩 GCC12              Pass: 100%/4   | Total:  1h 45m | Avg: 26m 18s | Max: 28m 10s | Hits:  69%/4724  
      🟩 GCC13              Pass: 100%/20  | Total:  5h 43m | Avg: 17m 10s | Max: 27m 54s | Hits:  81%/23620 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 33m | Avg: 31m 13s | Max: 34m 18s | Hits:  70%/3549  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 37m 02s | Avg: 37m 02s | Max: 37m 02s | Hits:  75%/1176  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 12m | Avg: 36m 28s | Max: 37m 12s | Hits:  75%/2352  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  2h 45m | Avg: 27m 34s | Max: 39m 06s | Hits:  87%/7056  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/51  | Total: 18h 20m | Avg: 21m 34s | Max: 29m 00s | Hits:  75%/60180 
      🟩 GCC                Pass: 100%/55  | Total: 19h 19m | Avg: 21m 05s | Max: 35m 13s | Hits:  76%/64953 
      🟩 Intel              Pass: 100%/3   | Total:  1h 33m | Avg: 31m 13s | Max: 34m 18s | Hits:  70%/3549  
      🟩 MSVC               Pass: 100%/9   | Total:  4h 35m | Avg: 30m 36s | Max: 39m 06s | Hits:  83%/10584 
    🟩 gpu
      🟩 v100               Pass: 100%/118 | Total:  1d 19h | Avg: 22m 16s | Max: 39m 06s | Hits:  76%/139266
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  1d 16h | Avg: 24m 27s | Max: 39m 06s | Hits:  71%/116850
      🟩 TestCPU            Pass: 100%/11  | Total:  1h 41m | Avg:  9m 15s | Max: 18m 25s | Hits:  99%/12972 
      🟩 TestGPU            Pass: 100%/8   | Total:  1h 46m | Avg: 13m 17s | Max: 15m 10s | Hits:  99%/9444  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 35m | Avg: 31m 58s | Max: 35m 13s | Hits:  69%/3543  
      🟩 90a                Pass: 100%/4   | Total: 56m 34s | Avg: 14m 08s | Max: 15m 35s | Hits:  69%/4724  
    🟩 std
      🟩 11                 Pass: 100%/30  | Total:  9h 30m | Avg: 19m 00s | Max: 27m 51s | Hits:  77%/35418 
      🟩 14                 Pass: 100%/34  | Total: 13h 16m | Avg: 23m 26s | Max: 37m 12s | Hits:  75%/40122 
      🟩 17                 Pass: 100%/33  | Total: 12h 56m | Avg: 23m 32s | Max: 36m 01s | Hits:  75%/38946 
      🟩 20                 Pass: 100%/21  | Total:  8h 05m | Avg: 23m 06s | Max: 39m 06s | Hits:  76%/24780 
    



github-actions bot commented Jul 6, 2024

🟩 CI finished in 2h 16m: Pass: 100%/249 | Total: 2d 22h | Avg: 17m 04s | Max: 39m 06s | Hits: 84%/248433
  • 🟩 cub: Pass: 100%/131 | Total: 1d 03h | Avg: 12m 22s | Max: 35m 51s | Hits: 94%/109167

    🟩 cpu
      🟩 amd64              Pass: 100%/123 | Total:  1d 02h | Avg: 12m 41s | Max: 35m 51s | Hits:  93%/102351
      🟩 arm64              Pass: 100%/8   | Total:  1h 00m | Avg:  7m 35s | Max:  8m 03s | Hits:  98%/6816  
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  1h 25m | Avg:  5m 43s | Max: 16m 50s | Hits:  98%/11568 
      🟩 11.8               Pass: 100%/3   | Total: 22m 13s | Avg:  7m 24s | Max:  7m 41s | Hits:  98%/2556  
      🟩 12.5               Pass: 100%/113 | Total:  1d 01h | Avg: 13m 23s | Max: 35m 51s | Hits:  93%/95043 
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total:  9m 51s | Avg:  4m 55s | Max:  4m 59s | Hits:  98%/1408  
      🟩 nvcc11.1           Pass: 100%/15  | Total:  1h 25m | Avg:  5m 43s | Max: 16m 50s | Hits:  98%/11568 
      🟩 nvcc11.8           Pass: 100%/3   | Total: 22m 13s | Avg:  7m 24s | Max:  7m 41s | Hits:  98%/2556  
      🟩 nvcc12.5           Pass: 100%/111 | Total:  1d 01h | Avg: 13m 32s | Max: 35m 51s | Hits:  93%/93635 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 51s | Avg:  4m 55s | Max:  4m 59s | Hits:  98%/1408  
      🟩 nvcc               Pass: 100%/129 | Total:  1d 02h | Avg: 12m 29s | Max: 35m 51s | Hits:  94%/107759
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total: 36m 35s | Avg:  6m 05s | Max:  7m 24s | Hits:  98%/4890  
      🟩 Clang10            Pass: 100%/3   | Total: 22m 47s | Avg:  7m 35s | Max:  8m 02s | Hits:  98%/2562  
      🟩 Clang11            Pass: 100%/4   | Total: 27m 23s | Avg:  6m 50s | Max:  7m 17s | Hits:  98%/3416  
      🟩 Clang12            Pass: 100%/4   | Total: 26m 40s | Avg:  6m 40s | Max:  6m 49s | Hits:  98%/3416  
      🟩 Clang13            Pass: 100%/4   | Total: 26m 50s | Avg:  6m 42s | Max:  6m 56s | Hits:  98%/3416  
      🟩 Clang14            Pass: 100%/4   | Total: 27m 11s | Avg:  6m 47s | Max:  6m 58s | Hits:  98%/3416  
      🟩 Clang15            Pass: 100%/4   | Total: 27m 06s | Avg:  6m 46s | Max:  7m 21s | Hits:  98%/3408  
      🟩 Clang16            Pass: 100%/4   | Total: 26m 52s | Avg:  6m 43s | Max:  6m 50s | Hits:  98%/3408  
      🟩 Clang17            Pass: 100%/26  | Total:  7h 56m | Avg: 18m 20s | Max: 31m 24s | Hits:  92%/21856 
      🟩 GCC6               Pass: 100%/2   | Total:  9m 20s | Avg:  4m 40s | Max:  4m 41s | Hits:  98%/1552  
      🟩 GCC7               Pass: 100%/6   | Total: 58m 11s | Avg:  9m 41s | Max: 15m 26s | Hits:  86%/4893  
      🟩 GCC8               Pass: 100%/6   | Total: 56m 57s | Avg:  9m 29s | Max: 14m 27s | Hits:  86%/4893  
      🟩 GCC9               Pass: 100%/6   | Total:  1h 04m | Avg: 10m 44s | Max: 16m 36s | Hits:  84%/4893  
      🟩 GCC10              Pass: 100%/4   | Total: 26m 35s | Avg:  6m 38s | Max:  6m 55s | Hits:  97%/3416  
      🟩 GCC11              Pass: 100%/7   | Total: 49m 43s | Avg:  7m 06s | Max:  7m 41s | Hits:  98%/5964  
      🟩 GCC12              Pass: 100%/4   | Total: 27m 47s | Avg:  6m 56s | Max:  7m 17s | Hits:  97%/3408  
      🟩 GCC13              Pass: 100%/28  | Total:  8h 40m | Avg: 18m 34s | Max: 35m 51s | Hits:  92%/23856 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total: 24m 22s | Avg:  8m 07s | Max:  8m 17s | Hits:  98%/2334  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 16m 50s | Avg: 16m 50s | Max: 16m 50s | Hits:  97%/695   
      🟩 MSVC14.29          Pass: 100%/2   | Total: 26m 05s | Avg: 13m 02s | Max: 13m 03s | Hits:  97%/1390  
      🟩 MSVC14.39          Pass: 100%/3   | Total: 42m 20s | Avg: 14m 06s | Max: 14m 35s | Hits:  97%/2085  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/59  | Total: 11h 38m | Avg: 11m 50s | Max: 31m 24s | Hits:  95%/49788 
      🟩 GCC                Pass: 100%/63  | Total: 13h 33m | Avg: 12m 54s | Max: 35m 51s | Hits:  92%/52875 
      🟩 Intel              Pass: 100%/3   | Total: 24m 22s | Avg:  8m 07s | Max:  8m 17s | Hits:  98%/2334  
      🟩 MSVC               Pass: 100%/6   | Total:  1h 25m | Avg: 14m 12s | Max: 16m 50s | Hits:  97%/4170  
    🟩 gpu
      🟩 v100               Pass: 100%/131 | Total:  1d 03h | Avg: 12m 22s | Max: 35m 51s | Hits:  94%/109167
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total: 15h 32m | Avg:  9m 24s | Max: 29m 05s | Hits:  92%/81903 
      🟩 DeviceLaunch       Pass: 100%/8   | Total:  2h 42m | Avg: 20m 17s | Max: 25m 54s | Hits:  99%/6816  
      🟩 GraphCapture       Pass: 100%/8   | Total:  2h 17m | Avg: 17m 10s | Max: 22m 56s | Hits:  99%/6816  
      🟩 HostLaunch         Pass: 100%/8   | Total:  2h 35m | Avg: 19m 24s | Max: 25m 48s | Hits:  99%/6816  
      🟩 TestGPU            Pass: 100%/8   | Total:  3h 54m | Avg: 29m 15s | Max: 35m 51s | Hits:  99%/6816  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total: 22m 13s | Avg:  7m 24s | Max:  7m 41s | Hits:  98%/2556  
      🟩 90a                Pass: 100%/4   | Total: 20m 00s | Avg:  5m 00s | Max:  5m 30s | Hits:  97%/3408  
    🟩 std
      🟩 11                 Pass: 100%/34  | Total:  6h 38m | Avg: 11m 43s | Max: 35m 48s | Hits:  93%/28537 
      🟩 14                 Pass: 100%/37  | Total:  7h 07m | Avg: 11m 33s | Max: 28m 24s | Hits:  94%/30622 
      🟩 17                 Pass: 100%/36  | Total:  7h 51m | Avg: 13m 06s | Max: 35m 51s | Hits:  94%/29855 
      🟩 20                 Pass: 100%/24  | Total:  5h 23m | Avg: 13m 27s | Max: 29m 05s | Hits:  94%/20153 
    
  • 🟩 thrust: Pass: 100%/118 | Total: 1d 19h | Avg: 22m 16s | Max: 39m 06s | Hits: 76%/139266

    🟩 cpu
      🟩 amd64              Pass: 100%/110 | Total:  1d 16h | Avg: 22m 08s | Max: 39m 06s | Hits:  76%/129822
      🟩 arm64              Pass: 100%/8   | Total:  3h 13m | Avg: 24m 12s | Max: 26m 41s | Hits:  69%/9444  
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  4h 22m | Avg: 17m 29s | Max: 37m 02s | Hits:  79%/17705 
      🟩 11.8               Pass: 100%/3   | Total:  1h 35m | Avg: 31m 58s | Max: 35m 13s | Hits:  69%/3543  
      🟩 12.5               Pass: 100%/100 | Total:  1d 13h | Avg: 22m 42s | Max: 39m 06s | Hits:  75%/118018
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 48m 37s | Avg: 24m 18s | Max: 25m 39s | Hits:  68%/2360  
      🟩 nvcc11.1           Pass: 100%/15  | Total:  4h 22m | Avg: 17m 29s | Max: 37m 02s | Hits:  79%/17705 
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 35m | Avg: 31m 58s | Max: 35m 13s | Hits:  69%/3543  
      🟩 nvcc12.5           Pass: 100%/98  | Total:  1d 13h | Avg: 22m 40s | Max: 39m 06s | Hits:  76%/115658
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 48m 37s | Avg: 24m 18s | Max: 25m 39s | Hits:  68%/2360  
      🟩 nvcc               Pass: 100%/116 | Total:  1d 19h | Avg: 22m 14s | Max: 39m 06s | Hits:  76%/136906
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  1h 59m | Avg: 19m 52s | Max: 25m 22s | Hits:  74%/7080  
      🟩 Clang10            Pass: 100%/3   | Total:  1h 19m | Avg: 26m 38s | Max: 29m 00s | Hits:  70%/3540  
      🟩 Clang11            Pass: 100%/4   | Total:  1h 36m | Avg: 24m 13s | Max: 26m 22s | Hits:  69%/4720  
      🟩 Clang12            Pass: 100%/4   | Total:  1h 38m | Avg: 24m 43s | Max: 26m 05s | Hits:  69%/4720  
      🟩 Clang13            Pass: 100%/4   | Total:  1h 39m | Avg: 24m 46s | Max: 27m 24s | Hits:  69%/4720  
      🟩 Clang14            Pass: 100%/4   | Total:  1h 37m | Avg: 24m 27s | Max: 26m 05s | Hits:  70%/4720  
      🟩 Clang15            Pass: 100%/4   | Total:  1h 40m | Avg: 25m 09s | Max: 28m 23s | Hits:  69%/4720  
      🟩 Clang16            Pass: 100%/4   | Total:  1h 35m | Avg: 23m 59s | Max: 25m 51s | Hits:  69%/4720  
      🟩 Clang17            Pass: 100%/18  | Total:  5h 11m | Avg: 17m 19s | Max: 26m 45s | Hits:  83%/21240 
      🟩 GCC6               Pass: 100%/2   | Total: 30m 47s | Avg: 15m 23s | Max: 16m 18s | Hits:  79%/2360  
      🟩 GCC7               Pass: 100%/6   | Total:  2h 03m | Avg: 20m 34s | Max: 26m 04s | Hits:  74%/7086  
      🟩 GCC8               Pass: 100%/6   | Total:  1h 58m | Avg: 19m 49s | Max: 25m 50s | Hits:  74%/7086  
      🟩 GCC9               Pass: 100%/6   | Total:  2h 06m | Avg: 21m 03s | Max: 27m 49s | Hits:  74%/7086  
      🟩 GCC10              Pass: 100%/4   | Total:  1h 48m | Avg: 27m 08s | Max: 30m 23s | Hits:  69%/4724  
      🟩 GCC11              Pass: 100%/7   | Total:  3h 23m | Avg: 29m 02s | Max: 35m 13s | Hits:  69%/8267  
      🟩 GCC12              Pass: 100%/4   | Total:  1h 45m | Avg: 26m 18s | Max: 28m 10s | Hits:  69%/4724  
      🟩 GCC13              Pass: 100%/20  | Total:  5h 43m | Avg: 17m 10s | Max: 27m 54s | Hits:  81%/23620 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 33m | Avg: 31m 13s | Max: 34m 18s | Hits:  70%/3549  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 37m 02s | Avg: 37m 02s | Max: 37m 02s | Hits:  75%/1176  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 12m | Avg: 36m 28s | Max: 37m 12s | Hits:  75%/2352  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  2h 45m | Avg: 27m 34s | Max: 39m 06s | Hits:  87%/7056  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/51  | Total: 18h 20m | Avg: 21m 34s | Max: 29m 00s | Hits:  75%/60180 
      🟩 GCC                Pass: 100%/55  | Total: 19h 19m | Avg: 21m 05s | Max: 35m 13s | Hits:  76%/64953 
      🟩 Intel              Pass: 100%/3   | Total:  1h 33m | Avg: 31m 13s | Max: 34m 18s | Hits:  70%/3549  
      🟩 MSVC               Pass: 100%/9   | Total:  4h 35m | Avg: 30m 36s | Max: 39m 06s | Hits:  83%/10584 
    🟩 gpu
      🟩 v100               Pass: 100%/118 | Total:  1d 19h | Avg: 22m 16s | Max: 39m 06s | Hits:  76%/139266
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  1d 16h | Avg: 24m 27s | Max: 39m 06s | Hits:  71%/116850
      🟩 TestCPU            Pass: 100%/11  | Total:  1h 41m | Avg:  9m 15s | Max: 18m 25s | Hits:  99%/12972 
      🟩 TestGPU            Pass: 100%/8   | Total:  1h 46m | Avg: 13m 17s | Max: 15m 10s | Hits:  99%/9444  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 35m | Avg: 31m 58s | Max: 35m 13s | Hits:  69%/3543  
      🟩 90a                Pass: 100%/4   | Total: 56m 34s | Avg: 14m 08s | Max: 15m 35s | Hits:  69%/4724  
    🟩 std
      🟩 11                 Pass: 100%/30  | Total:  9h 30m | Avg: 19m 00s | Max: 27m 51s | Hits:  77%/35418 
      🟩 14                 Pass: 100%/34  | Total: 13h 16m | Avg: 23m 26s | Max: 37m 12s | Hits:  75%/40122 
      🟩 17                 Pass: 100%/33  | Total: 12h 56m | Avg: 23m 32s | Max: 36m 01s | Hits:  75%/38946 
      🟩 20                 Pass: 100%/21  | Total:  8h 05m | Avg: 23m 06s | Max: 39m 06s | Hits:  76%/24780 
    



github-actions bot commented Jul 8, 2024

🟩 CI finished in 3h 25m: Pass: 100%/249 | Total: 3d 15h | Avg: 20m 58s | Max: 53m 03s | Hits: 82%/248433
  • 🟩 cub: Pass: 100%/131 | Total: 1d 14h | Avg: 17m 47s | Max: 34m 58s | Hits: 92%/109167

    🟩 cpu
      🟩 amd64              Pass: 100%/123 | Total:  1d 12h | Avg: 17m 34s | Max: 32m 33s | Hits:  92%/102351
      🟩 arm64              Pass: 100%/8   | Total:  2h 49m | Avg: 21m 10s | Max: 34m 58s | Hits:  86%/6816  
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  3h 41m | Avg: 14m 46s | Max: 32m 33s | Hits:  92%/11568 
      🟩 11.8               Pass: 100%/3   | Total:  1h 07m | Avg: 22m 35s | Max: 23m 43s | Hits:  89%/2556  
      🟩 12.5               Pass: 100%/113 | Total:  1d 10h | Avg: 18m 03s | Max: 34m 58s | Hits:  92%/95043 
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 21m 42s | Avg: 10m 51s | Max: 11m 13s | Hits:  93%/1408  
      🟩 nvcc11.1           Pass: 100%/15  | Total:  3h 41m | Avg: 14m 46s | Max: 32m 33s | Hits:  92%/11568 
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 07m | Avg: 22m 35s | Max: 23m 43s | Hits:  89%/2556  
      🟩 nvcc12.5           Pass: 100%/111 | Total:  1d 09h | Avg: 18m 11s | Max: 34m 58s | Hits:  92%/93635 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 21m 42s | Avg: 10m 51s | Max: 11m 13s | Hits:  93%/1408  
      🟩 nvcc               Pass: 100%/129 | Total:  1d 14h | Avg: 17m 53s | Max: 34m 58s | Hits:  92%/107759
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  1h 33m | Avg: 15m 31s | Max: 17m 57s | Hits:  91%/4890  
      🟩 Clang10            Pass: 100%/3   | Total: 53m 54s | Avg: 17m 58s | Max: 19m 02s | Hits:  90%/2562  
      🟩 Clang11            Pass: 100%/4   | Total:  1h 08m | Avg: 17m 00s | Max: 17m 19s | Hits:  90%/3416  
      🟩 Clang12            Pass: 100%/4   | Total:  1h 09m | Avg: 17m 28s | Max: 19m 19s | Hits:  90%/3416  
      🟩 Clang13            Pass: 100%/4   | Total:  1h 11m | Avg: 17m 57s | Max: 19m 43s | Hits:  90%/3416  
      🟩 Clang14            Pass: 100%/4   | Total:  1h 08m | Avg: 17m 13s | Max: 19m 02s | Hits:  90%/3416  
      🟩 Clang15            Pass: 100%/4   | Total:  1h 11m | Avg: 17m 47s | Max: 19m 23s | Hits:  90%/3408  
      🟩 Clang16            Pass: 100%/4   | Total:  1h 09m | Avg: 17m 21s | Max: 18m 00s | Hits:  90%/3408  
      🟩 Clang17            Pass: 100%/26  | Total:  7h 33m | Avg: 17m 26s | Max: 25m 22s | Hits:  96%/21856 
      🟩 GCC6               Pass: 100%/2   | Total: 29m 31s | Avg: 14m 45s | Max: 15m 33s | Hits:  92%/1552  
      🟩 GCC7               Pass: 100%/6   | Total:  1h 29m | Avg: 14m 50s | Max: 17m 05s | Hits:  90%/4893  
      🟩 GCC8               Pass: 100%/6   | Total:  1h 32m | Avg: 15m 28s | Max: 18m 03s | Hits:  90%/4893  
      🟩 GCC9               Pass: 100%/6   | Total:  1h 37m | Avg: 16m 14s | Max: 21m 02s | Hits:  90%/4893  
      🟩 GCC10              Pass: 100%/4   | Total:  1h 15m | Avg: 18m 52s | Max: 21m 13s | Hits:  89%/3416  
      🟩 GCC11              Pass: 100%/7   | Total:  2h 17m | Avg: 19m 37s | Max: 23m 43s | Hits:  89%/5964  
      🟩 GCC12              Pass: 100%/4   | Total:  1h 10m | Avg: 17m 36s | Max: 18m 25s | Hits:  89%/3408  
      🟩 GCC13              Pass: 100%/28  | Total:  8h 15m | Avg: 17m 41s | Max: 34m 58s | Hits:  94%/23856 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total: 59m 09s | Avg: 19m 43s | Max: 21m 34s | Hits:  92%/2334  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 32m 33s | Avg: 32m 33s | Max: 32m 33s | Hits:  92%/695   
      🟩 MSVC14.29          Pass: 100%/2   | Total: 51m 08s | Avg: 25m 34s | Max: 26m 17s | Hits:  92%/1390  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  1h 20m | Avg: 26m 48s | Max: 28m 04s | Hits:  92%/2085  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/59  | Total: 16h 59m | Avg: 17m 17s | Max: 25m 22s | Hits:  92%/49788 
      🟩 GCC                Pass: 100%/63  | Total: 18h 07m | Avg: 17m 15s | Max: 34m 58s | Hits:  92%/52875 
      🟩 Intel              Pass: 100%/3   | Total: 59m 09s | Avg: 19m 43s | Max: 21m 34s | Hits:  92%/2334  
      🟩 MSVC               Pass: 100%/6   | Total:  2h 44m | Avg: 27m 21s | Max: 32m 33s | Hits:  92%/4170  
    🟩 gpu
      🟩 v100               Pass: 100%/131 | Total:  1d 14h | Avg: 17m 47s | Max: 34m 58s | Hits:  92%/109167
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  1d 05h | Avg: 17m 40s | Max: 34m 58s | Hits:  90%/81903 
      🟩 DeviceLaunch       Pass: 100%/8   | Total:  2h 16m | Avg: 17m 04s | Max: 19m 28s | Hits:  99%/6816  
      🟩 GraphCapture       Pass: 100%/8   | Total:  1h 53m | Avg: 14m 07s | Max: 15m 26s | Hits:  99%/6816  
      🟩 HostLaunch         Pass: 100%/8   | Total:  2h 16m | Avg: 17m 00s | Max: 19m 47s | Hits:  99%/6816  
      🟩 TestGPU            Pass: 100%/8   | Total:  3h 14m | Avg: 24m 20s | Max: 26m 47s | Hits:  99%/6816  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 07m | Avg: 22m 35s | Max: 23m 43s | Hits:  89%/2556  
      🟩 90a                Pass: 100%/4   | Total: 38m 52s | Avg:  9m 43s | Max:  9m 51s | Hits:  89%/3408  
    🟩 std
      🟩 11                 Pass: 100%/34  | Total:  9h 40m | Avg: 17m 04s | Max: 25m 19s | Hits:  92%/28537 
      🟩 14                 Pass: 100%/37  | Total: 11h 31m | Avg: 18m 41s | Max: 34m 58s | Hits:  91%/30622 
      🟩 17                 Pass: 100%/36  | Total: 10h 31m | Avg: 17m 31s | Max: 27m 01s | Hits:  92%/29855 
      🟩 20                 Pass: 100%/24  | Total:  7h 07m | Avg: 17m 49s | Max: 25m 21s | Hits:  93%/20153 
    
  • 🟩 thrust: Pass: 100%/118 | Total: 2d 00h | Avg: 24m 29s | Max: 53m 03s | Hits: 75%/139266

    🟩 cpu
      🟩 amd64              Pass: 100%/110 | Total:  1d 20h | Avg: 24m 21s | Max: 53m 03s | Hits:  75%/129822
      🟩 arm64              Pass: 100%/8   | Total:  3h 31m | Avg: 26m 29s | Max: 30m 00s | Hits:  70%/9444  
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  5h 54m | Avg: 23m 39s | Max: 45m 19s | Hits:  70%/17705 
      🟩 11.8               Pass: 100%/3   | Total:  1h 49m | Avg: 36m 30s | Max: 42m 09s | Hits:  70%/3543  
      🟩 12.5               Pass: 100%/100 | Total:  1d 16h | Avg: 24m 15s | Max: 53m 03s | Hits:  76%/118018
    🟩 cudacxx
      🟩 ClangCUDA17        Pass: 100%/2   | Total: 52m 24s | Avg: 26m 12s | Max: 28m 11s | Hits:  70%/2360  
      🟩 nvcc11.1           Pass: 100%/15  | Total:  5h 54m | Avg: 23m 39s | Max: 45m 19s | Hits:  70%/17705 
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 49m | Avg: 36m 30s | Max: 42m 09s | Hits:  70%/3543  
      🟩 nvcc12.5           Pass: 100%/98  | Total:  1d 15h | Avg: 24m 13s | Max: 53m 03s | Hits:  76%/115658
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 52m 24s | Avg: 26m 12s | Max: 28m 11s | Hits:  70%/2360  
      🟩 nvcc               Pass: 100%/116 | Total:  1d 23h | Avg: 24m 28s | Max: 53m 03s | Hits:  75%/136906
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  2h 18m | Avg: 23m 04s | Max: 26m 27s | Hits:  70%/7080  
      🟩 Clang10            Pass: 100%/3   | Total:  1h 20m | Avg: 26m 46s | Max: 30m 15s | Hits:  70%/3540  
      🟩 Clang11            Pass: 100%/4   | Total:  1h 42m | Avg: 25m 38s | Max: 27m 28s | Hits:  70%/4720  
      🟩 Clang12            Pass: 100%/4   | Total:  1h 42m | Avg: 25m 41s | Max: 28m 11s | Hits:  70%/4720  
      🟩 Clang13            Pass: 100%/4   | Total:  1h 42m | Avg: 25m 42s | Max: 26m 52s | Hits:  70%/4720  
      🟩 Clang14            Pass: 100%/4   | Total:  1h 45m | Avg: 26m 16s | Max: 28m 29s | Hits:  70%/4720  
      🟩 Clang15            Pass: 100%/4   | Total:  1h 45m | Avg: 26m 15s | Max: 31m 23s | Hits:  70%/4720  
      🟩 Clang16            Pass: 100%/4   | Total:  1h 41m | Avg: 25m 19s | Max: 27m 28s | Hits:  70%/4720  
      🟩 Clang17            Pass: 100%/18  | Total:  5h 29m | Avg: 18m 19s | Max: 28m 23s | Hits:  83%/21240 
      🟩 GCC6               Pass: 100%/2   | Total: 46m 33s | Avg: 23m 16s | Max: 25m 49s | Hits:  71%/2360  
      🟩 GCC7               Pass: 100%/6   | Total:  2h 23m | Avg: 23m 56s | Max: 28m 34s | Hits:  70%/7086  
      🟩 GCC8               Pass: 100%/6   | Total:  2h 22m | Avg: 23m 42s | Max: 26m 37s | Hits:  70%/7086  
      🟩 GCC9               Pass: 100%/6   | Total:  2h 26m | Avg: 24m 29s | Max: 28m 22s | Hits:  70%/7086  
      🟩 GCC10              Pass: 100%/4   | Total:  1h 54m | Avg: 28m 40s | Max: 33m 09s | Hits:  70%/4724  
      🟩 GCC11              Pass: 100%/7   | Total:  3h 36m | Avg: 30m 59s | Max: 42m 09s | Hits:  70%/8267  
      🟩 GCC12              Pass: 100%/4   | Total:  1h 52m | Avg: 28m 01s | Max: 33m 44s | Hits:  70%/4724  
      🟩 GCC13              Pass: 100%/20  | Total:  5h 56m | Avg: 17m 50s | Max: 30m 48s | Hits:  82%/23620 
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 40m | Avg: 33m 33s | Max: 37m 45s | Hits:  71%/3549  
      🟩 MSVC14.16          Pass: 100%/1   | Total: 45m 19s | Avg: 45m 19s | Max: 45m 19s | Hits:  68%/1176  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 35m | Avg: 47m 53s | Max: 50m 17s | Hits:  68%/2352  
      🟩 MSVC14.39          Pass: 100%/6   | Total:  3h 20m | Avg: 33m 27s | Max: 53m 03s | Hits:  83%/7056  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/51  | Total: 19h 28m | Avg: 22m 54s | Max: 31m 23s | Hits:  75%/60180 
      🟩 GCC                Pass: 100%/55  | Total: 21h 19m | Avg: 23m 16s | Max: 42m 09s | Hits:  74%/64953 
      🟩 Intel              Pass: 100%/3   | Total:  1h 40m | Avg: 33m 33s | Max: 37m 45s | Hits:  71%/3549  
      🟩 MSVC               Pass: 100%/9   | Total:  5h 41m | Avg: 37m 58s | Max: 53m 03s | Hits:  78%/10584 
    🟩 gpu
      🟩 v100               Pass: 100%/118 | Total:  2d 00h | Avg: 24m 29s | Max: 53m 03s | Hits:  75%/139266
    🟩 jobs
      🟩 Build              Pass: 100%/99  | Total:  1d 20h | Avg: 27m 08s | Max: 53m 03s | Hits:  70%/116850
      🟩 TestCPU            Pass: 100%/11  | Total:  1h 40m | Avg:  9m 06s | Max: 18m 06s | Hits:  99%/12972 
      🟩 TestGPU            Pass: 100%/8   | Total:  1h 42m | Avg: 12m 51s | Max: 14m 42s | Hits:  99%/9444  
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 49m | Avg: 36m 30s | Max: 42m 09s | Hits:  70%/3543  
      🟩 90a                Pass: 100%/4   | Total: 57m 22s | Avg: 14m 20s | Max: 15m 05s | Hits:  70%/4724  
    🟩 std
      🟩 11                 Pass: 100%/30  | Total: 10h 17m | Avg: 20m 34s | Max: 31m 01s | Hits:  76%/35418 
      🟩 14                 Pass: 100%/34  | Total: 14h 34m | Avg: 25m 42s | Max: 45m 41s | Hits:  74%/40122 
      🟩 17                 Pass: 100%/33  | Total: 14h 36m | Avg: 26m 33s | Max: 53m 03s | Hits:  74%/38946 
      🟩 20                 Pass: 100%/21  | Total:  8h 42m | Avg: 24m 54s | Max: 50m 04s | Hits:  77%/24780 
    

👃 Inspect Changes

Modifications in project?

    Project
    CCCL Infrastructure
    libcu++
+/- CUB
+/- Thrust
    CUDA Experimental

Modifications in project or dependencies?

    Project
    CCCL Infrastructure
    libcu++
+/- CUB
+/- Thrust
    CUDA Experimental

🏃‍ Runner counts (total jobs: 249)

# Runner
178 linux-amd64-cpu16
40 linux-amd64-gpu-v100-latest-1
16 linux-arm64-cpu16
15 windows-amd64-cpu16

@elstehle elstehle merged commit 3797dc3 into NVIDIA:main Jul 9, 2024
257 of 261 checks passed
Labels: 2.6.0 (Targeted for 2.6.0 release)
Development

Successfully merging this pull request may close these issues.

[BUG]: Intermittent wrong output from thrust::remove_if under heavy GPU loading
6 participants