Adds support for large number of items to DeviceScan::*ByKey family of algorithms #2477

Merged 9 commits on Oct 8, 2024

@elstehle elstehle commented Sep 28, 2024

Description

Closes #2458

Previously, the ScanByKey tile state comprised (1) the accumulated value and (2) a key of type OffsetT. ReduceByKey uses the OffsetT part to determine the output offsets of the reduced aggregates (i.e., the number of unique keys preceding the current input). For ScanByKey this information is irrelevant, because it writes exactly one partial sum per input item. The only remaining use of the OffsetT in ScanByKey is to carry the head flags, i.e., the information on where a new segment begins, in order to implement something like this:

retval.value = (second.key) ? second.value : op(first.value, second.value);

This PR therefore changes the key type of the tile state from OffsetT to int.
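To illustrate why an int head flag is enough, here is a minimal sequential sketch of a segmented scan operator built around the expression above. The names `FlagValuePair`, `SegmentedOp`, and `sequential_inclusive_scan_by_key` are hypothetical and only serve this example; they are not the types used in the PR, and the flag-combining rule (`first.key | second.key`) is an assumption chosen to keep the operator associative.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical pair type: `key` is an int head flag (non-zero marks the start
// of a new segment), `value` is the running partial result.
template <typename ValueT>
struct FlagValuePair
{
  int key;
  ValueT value;
};

// Hypothetical segmented operator: restarts the accumulation whenever the
// right-hand operand begins a new segment.
template <typename ValueT, typename BinaryOp>
struct SegmentedOp
{
  BinaryOp op;

  FlagValuePair<ValueT> operator()(const FlagValuePair<ValueT>& first,
                                   const FlagValuePair<ValueT>& second) const
  {
    FlagValuePair<ValueT> retval;
    // Assumption: remember that a segment boundary occurred anywhere in the range.
    retval.key   = first.key | second.key;
    retval.value = second.key ? second.value : op(first.value, second.value);
    return retval;
  }
};

// Sequential reference: one partial result is written per input item, so no
// output offset has to be tracked alongside the value.
template <typename KeyT, typename ValueT, typename BinaryOp>
std::vector<ValueT> sequential_inclusive_scan_by_key(const std::vector<KeyT>& keys,
                                                     const std::vector<ValueT>& values,
                                                     BinaryOp op)
{
  std::vector<ValueT> out(values.size());
  SegmentedOp<ValueT, BinaryOp> seg_op{op};
  FlagValuePair<ValueT> running{};
  for (std::size_t i = 0; i < values.size(); ++i)
  {
    // Head flag: the first item and every item whose key differs from its predecessor.
    const int head_flag = (i == 0 || !(keys[i] == keys[i - 1])) ? 1 : 0;
    const FlagValuePair<ValueT> item{head_flag, values[i]};
    running = (i == 0) ? item : seg_op(running, item);
    out[i]  = running.value;
  }
  return out;
}

// Usage: keys {1,1,2,2,2,3} with values {1,2,3,4,5,6} restart the sum at each new key.
inline void example_check()
{
  const std::vector<int> keys{1, 1, 2, 2, 2, 3};
  const std::vector<int> values{1, 2, 3, 4, 5, 6};
  const auto out = sequential_inclusive_scan_by_key(keys, values, std::plus<int>{});
  assert((out == std::vector<int>{1, 3, 3, 7, 12, 6}));
}
```

The flag only ever holds "segment boundary seen or not", so its width is independent of the number of input items.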

Performance results

Note: we plan to use choose_offset_t in the DeviceScan::*ByKey interface, so u32 and u64 are the columns of interest.
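As a rough sketch of what an offset-selection trait of this kind might look like: the code below assumes that index types of up to 4 bytes map to a 32-bit unsigned offset and wider ones to a 64-bit unsigned offset, which would match u32 and u64 being the relevant columns. The name `choose_offset_sketch_t` is hypothetical and is not the library's definition.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical trait (assumption, not the library's implementation): pick an
// unsigned offset type based on the width of the user-provided num_items type.
template <typename NumItemsT>
struct choose_offset_sketch
{
  static_assert(std::is_integral<NumItemsT>::value, "NumItemsT must be an integral type");
  using type =
    typename std::conditional<sizeof(NumItemsT) <= 4, std::uint32_t, std::uint64_t>::type;
};

template <typename NumItemsT>
using choose_offset_sketch_t = typename choose_offset_sketch<NumItemsT>::type;

// Under this assumption, signed and unsigned inputs of the same width share one
// offset type, so only two kernel variants need to be compiled.
static_assert(std::is_same<choose_offset_sketch_t<std::int32_t>, std::uint32_t>::value, "");
static_assert(std::is_same<choose_offset_sketch_t<std::uint32_t>, std::uint32_t>::value, "");
static_assert(std::is_same<choose_offset_sketch_t<std::int64_t>, std::uint64_t>::value, "");
static_assert(std::is_same<choose_offset_sketch_t<std::uint64_t>, std::uint64_t>::value, "");
```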

Summary: performance of this PR with different offset types, compared against previous main with offset type i32:

| | Diff u32 vs main.i32 (any num items) | Diff u32 vs main.i32 (2^28 num items) | Diff i64 vs main.i32 (any num items) | Diff i64 vs main.i32 (2^28 num items) | Diff u64 vs main.i32 (any num items) | Diff u64 vs main.i32 (2^28 num items) |
|---|---|---|---|---|---|---|
| min | 88.48% | 88.48% | 91.18% | 91.18% | 91.16% | 91.16% |
| max | 104.77% | 104.61% | 109.70% | 106.96% | 109.86% | 107.01% |
| avg | 99.96% | 99.51% | 101.41% | 101.08% | 101.41% | 101.08% |
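For reading the detailed per-row results that follow, the sketch below reproduces how the Diff and %Diff columns relate to the Ref Time and Cmp Time columns. The PASS/FAIL rule in it is an assumption inferred from the rows shown: a row appears to be marked FAIL whenever the magnitude of %Diff exceeds the smaller of the two noise values, regardless of whether the change is a speedup or a slowdown. The names `Comparison` and `compare_rows` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical result record for one row of the comparison.
struct Comparison
{
  double diff;     // Cmp Time - Ref Time, in the unit of the inputs
  double pct_diff; // diff relative to Ref Time, in percent
  bool pass;       // assumed rule: |pct_diff| within the smaller noise value
};

// Times must be given in the same unit; noise values are in percent.
inline Comparison compare_rows(double ref_time, double ref_noise_pct,
                               double cmp_time, double cmp_noise_pct)
{
  const double diff      = cmp_time - ref_time;
  const double pct_diff  = diff / ref_time * 100.0;
  const double min_noise = std::min(ref_noise_pct, cmp_noise_pct);
  // Assumption: exceeding the noise floor in either direction counts as FAIL.
  return Comparison{diff, pct_diff, std::fabs(pct_diff) <= min_noise};
}

// Usage with two rows from the results (I8 / I8 / I32 at 2^16 and 2^28 elements).
inline void example_check()
{
  const Comparison small = compare_rows(10.444, 2.07, 10.254, 2.10);
  assert(std::fabs(small.pct_diff - (-1.82)) < 0.01); // within noise
  assert(small.pass);

  const Comparison large = compare_rows(948.508, 0.50, 958.601, 0.50);
  assert(std::fabs(large.pct_diff - 1.06) < 0.01);    // above noise
  assert(!large.pass);
}
```

Under this reading, a FAIL next to a negative %Diff indicates a speedup that is larger than the measurement noise, not a regression.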
Detailed H100 exclusive.by_key results, all offset types versus main
KeyT{ct} ValueT{ct} OffsetT{ct} Elements{io} Ref Time Ref Noise Cmp Time Cmp Noise Diff %Diff Status
I8 I8 I32 2^16 10.444 us 2.07% 10.254 us 2.10% -0.190 us -1.82% PASS
I8 I8 I32 2^20 14.606 us 1.54% 14.408 us 1.45% -0.198 us -1.36% PASS
I8 I8 I32 2^24 70.487 us 0.97% 70.839 us 0.94% 0.352 us 0.50% PASS
I8 I8 I32 2^28 948.508 us 0.50% 958.601 us 0.50% 10.093 us 1.06% FAIL
I8 I8 U32 2^16 10.673 us 2.26% 10.438 us 2.38% -0.235 us -2.20% PASS
I8 I8 U32 2^20 14.648 us 1.66% 14.410 us 1.47% -0.238 us -1.62% FAIL
I8 I8 U32 2^24 70.938 us 0.94% 70.737 us 0.95% -0.201 us -0.28% PASS
I8 I8 U32 2^28 951.433 us 0.50% 958.797 us 0.50% 7.364 us 0.77% FAIL
I8 I8 I64 2^16 10.751 us 2.71% 10.305 us 2.52% -0.446 us -4.14% FAIL
I8 I8 I64 2^20 14.916 us 1.54% 14.607 us 1.59% -0.310 us -2.08% FAIL
I8 I8 I64 2^24 76.837 us 0.73% 72.740 us 0.81% -4.096 us -5.33% FAIL
I8 I8 I64 2^28 1.058 ms 0.46% 988.581 us 0.49% -69.322 us -6.55% FAIL
I8 I8 U64 2^16 10.735 us 2.24% 10.216 us 2.00% -0.519 us -4.83% FAIL
I8 I8 U64 2^20 14.850 us 2.00% 14.646 us 1.72% -0.204 us -1.37% PASS
I8 I8 U64 2^24 76.538 us 0.78% 72.669 us 0.80% -3.869 us -5.05% FAIL
I8 I8 U64 2^28 1.055 ms 0.47% 988.742 us 0.50% -65.914 us -6.25% FAIL
I8 I16 I32 2^16 11.392 us 2.06% 11.803 us 2.06% 0.411 us 3.61% FAIL
I8 I16 I32 2^20 15.403 us 1.46% 15.558 us 1.67% 0.155 us 1.00% PASS
I8 I16 I32 2^24 88.372 us 0.90% 88.717 us 0.88% 0.346 us 0.39% PASS
I8 I16 I32 2^28 1.189 ms 0.50% 1.192 ms 0.50% 2.920 us 0.25% PASS
I8 I16 U32 2^16 11.420 us 1.90% 11.836 us 2.00% 0.416 us 3.64% FAIL
I8 I16 U32 2^20 15.548 us 1.39% 15.709 us 1.73% 0.161 us 1.04% PASS
I8 I16 U32 2^24 80.497 us 1.19% 80.614 us 1.15% 0.117 us 0.14% PASS
I8 I16 U32 2^28 1.053 ms 0.70% 1.052 ms 0.70% -0.521 us -0.05% PASS
I8 I16 I64 2^16 11.951 us 1.65% 12.246 us 2.10% 0.295 us 2.47% FAIL
I8 I16 I64 2^20 15.780 us 1.56% 15.892 us 1.77% 0.113 us 0.71% PASS
I8 I16 I64 2^24 90.673 us 0.87% 89.055 us 0.89% -1.618 us -1.78% FAIL
I8 I16 I64 2^28 1.223 ms 0.50% 1.191 ms 0.50% -31.750 us -2.60% FAIL
I8 I16 U64 2^16 11.920 us 1.64% 12.234 us 2.23% 0.314 us 2.63% FAIL
I8 I16 U64 2^20 15.760 us 1.55% 15.873 us 1.74% 0.113 us 0.71% PASS
I8 I16 U64 2^24 90.917 us 0.88% 89.035 us 0.92% -1.882 us -2.07% FAIL
I8 I16 U64 2^28 1.228 ms 0.50% 1.191 ms 0.50% -37.171 us -3.03% FAIL
I8 I32 I32 2^16 10.916 us 1.97% 11.672 us 1.99% 0.756 us 6.92% FAIL
I8 I32 I32 2^20 17.849 us 1.45% 18.428 us 1.46% 0.580 us 3.25% FAIL
I8 I32 I32 2^24 103.109 us 1.06% 107.384 us 0.80% 4.275 us 4.15% FAIL
I8 I32 I32 2^28 1.485 ms 0.71% 1.525 ms 0.65% 40.731 us 2.74% FAIL
I8 I32 U32 2^16 11.068 us 1.92% 11.383 us 1.65% 0.315 us 2.84% FAIL
I8 I32 U32 2^20 17.992 us 1.48% 18.143 us 1.43% 0.150 us 0.84% PASS
I8 I32 U32 2^24 103.279 us 1.03% 107.032 us 0.83% 3.753 us 3.63% FAIL
I8 I32 U32 2^28 1.485 ms 0.70% 1.525 ms 0.67% 40.449 us 2.72% FAIL
I8 I32 I64 2^16 11.676 us 1.90% 11.362 us 2.04% -0.314 us -2.69% FAIL
I8 I32 I64 2^20 18.221 us 1.36% 18.733 us 1.45% 0.512 us 2.81% FAIL
I8 I32 I64 2^24 113.216 us 0.79% 106.501 us 0.72% -6.715 us -5.93% FAIL
I8 I32 I64 2^28 1.597 ms 0.68% 1.523 ms 0.65% -74.126 us -4.64% FAIL
I8 I32 U64 2^16 11.390 us 1.98% 11.443 us 2.14% 0.054 us 0.47% PASS
I8 I32 U64 2^20 18.115 us 1.58% 18.650 us 1.45% 0.535 us 2.95% FAIL
I8 I32 U64 2^24 105.933 us 0.94% 106.423 us 0.71% 0.490 us 0.46% PASS
I8 I32 U64 2^28 1.521 ms 0.68% 1.523 ms 0.66% 1.775 us 0.12% PASS
I8 I64 I32 2^16 11.353 us 2.18% 11.422 us 2.41% 0.070 us 0.61% PASS
I8 I64 I32 2^20 20.815 us 1.39% 20.789 us 1.43% -0.026 us -0.13% PASS
I8 I64 I32 2^24 175.564 us 1.02% 175.622 us 1.03% 0.057 us 0.03% PASS
I8 I64 I32 2^28 2.630 ms 0.54% 2.631 ms 0.53% 0.888 us 0.03% PASS
I8 I64 U32 2^16 11.425 us 2.43% 11.366 us 2.16% -0.059 us -0.51% PASS
I8 I64 U32 2^20 20.764 us 1.30% 20.851 us 1.36% 0.086 us 0.42% PASS
I8 I64 U32 2^24 175.610 us 1.03% 175.615 us 1.01% 0.005 us 0.00% PASS
I8 I64 U32 2^28 2.630 ms 0.53% 2.630 ms 0.54% 0.480 us 0.02% PASS
I8 I64 I64 2^16 13.212 us 1.76% 11.650 us 2.44% -1.562 us -11.82% FAIL
I8 I64 I64 2^20 27.640 us 1.49% 21.230 us 1.41% -6.411 us -23.19% FAIL
I8 I64 I64 2^24 230.116 us 0.56% 176.152 us 1.03% -53.964 us -23.45% FAIL
I8 I64 I64 2^28 3.408 ms 0.50% 2.633 ms 0.53% -775.531 us -22.76% FAIL
I8 I64 U64 2^16 13.198 us 1.87% 11.990 us 2.42% -1.208 us -9.15% FAIL
I8 I64 U64 2^20 27.645 us 1.55% 21.026 us 1.50% -6.619 us -23.94% FAIL
I8 I64 U64 2^24 228.994 us 0.56% 176.024 us 1.00% -52.970 us -23.13% FAIL
I8 I64 U64 2^28 3.388 ms 0.50% 2.632 ms 0.53% -756.052 us -22.31% FAIL
I8 I128 I32 2^16 16.125 us 1.79% 15.823 us 1.83% -0.301 us -1.87% FAIL
I8 I128 I32 2^20 40.420 us 1.66% 36.649 us 1.70% -3.771 us -9.33% FAIL
I8 I128 I32 2^24 378.670 us 0.46% 346.441 us 0.76% -32.229 us -8.51% FAIL
I8 I128 I32 2^28 5.838 ms 0.11% 5.263 ms 0.22% -575.406 us -9.86% FAIL
I8 I128 U32 2^16 15.849 us 1.50% 16.015 us 1.68% 0.166 us 1.05% PASS
I8 I128 U32 2^20 40.424 us 1.64% 36.693 us 1.79% -3.732 us -9.23% FAIL
I8 I128 U32 2^24 378.732 us 0.46% 346.390 us 0.79% -32.342 us -8.54% FAIL
I8 I128 U32 2^28 5.840 ms 0.13% 5.265 ms 0.20% -575.131 us -9.85% FAIL
I8 I128 I64 2^16 15.517 us 1.54% 15.833 us 1.58% 0.315 us 2.03% FAIL
I8 I128 I64 2^20 40.548 us 1.78% 36.881 us 1.69% -3.667 us -9.04% FAIL
I8 I128 I64 2^24 379.454 us 0.45% 346.525 us 0.76% -32.929 us -8.68% FAIL
I8 I128 I64 2^28 5.851 ms 0.13% 5.272 ms 0.20% -578.183 us -9.88% FAIL
I8 I128 U64 2^16 15.764 us 1.58% 15.880 us 1.61% 0.116 us 0.74% PASS
I8 I128 U64 2^20 40.529 us 1.66% 36.698 us 1.71% -3.831 us -9.45% FAIL
I8 I128 U64 2^24 379.760 us 0.46% 346.772 us 0.80% -32.988 us -8.69% FAIL
I8 I128 U64 2^28 5.856 ms 0.12% 5.271 ms 0.22% -585.116 us -9.99% FAIL
I16 I8 I32 2^16 10.444 us 2.47% 10.174 us 2.46% -0.270 us -2.59% FAIL
I16 I8 I32 2^20 16.160 us 1.64% 16.325 us 1.41% 0.164 us 1.02% PASS
I16 I8 I32 2^24 77.884 us 0.68% 80.326 us 0.53% 2.442 us 3.14% FAIL
I16 I8 I32 2^28 1.037 ms 0.50% 1.083 ms 0.41% 45.774 us 4.41% FAIL
I16 I8 U32 2^16 10.389 us 2.64% 10.177 us 2.37% -0.211 us -2.03% PASS
I16 I8 U32 2^20 16.156 us 1.62% 16.208 us 1.52% 0.051 us 0.32% PASS
I16 I8 U32 2^24 77.612 us 0.71% 80.178 us 0.53% 2.565 us 3.31% FAIL
I16 I8 U32 2^28 1.030 ms 0.50% 1.083 ms 0.42% 52.674 us 5.11% FAIL
I16 I8 I64 2^16 10.640 us 2.55% 10.358 us 2.44% -0.282 us -2.65% FAIL
I16 I8 I64 2^20 16.232 us 1.69% 16.299 us 1.41% 0.068 us 0.42% PASS
I16 I8 I64 2^24 82.322 us 0.61% 80.286 us 0.51% -2.036 us -2.47% FAIL
I16 I8 I64 2^28 1.117 ms 0.45% 1.084 ms 0.39% -33.278 us -2.98% FAIL
I16 I8 U64 2^16 10.600 us 2.69% 10.312 us 2.57% -0.289 us -2.72% FAIL
I16 I8 U64 2^20 16.335 us 1.75% 16.339 us 1.43% 0.004 us 0.02% PASS
I16 I8 U64 2^24 82.462 us 0.63% 80.328 us 0.54% -2.133 us -2.59% FAIL
I16 I8 U64 2^28 1.118 ms 0.45% 1.084 ms 0.39% -34.317 us -3.07% FAIL
I16 I16 I32 2^16 11.509 us 2.33% 11.303 us 2.16% -0.206 us -1.79% PASS
I16 I16 I32 2^20 16.620 us 1.32% 16.575 us 1.25% -0.045 us -0.27% PASS
I16 I16 I32 2^24 90.001 us 1.45% 89.570 us 1.30% -0.431 us -0.48% PASS
I16 I16 I32 2^28 1.172 ms 0.83% 1.171 ms 0.82% -0.054 us -0.00% PASS
I16 I16 U32 2^16 11.716 us 1.98% 11.683 us 2.05% -0.033 us -0.28% PASS
I16 I16 U32 2^20 16.572 us 1.67% 16.625 us 1.58% 0.053 us 0.32% PASS
I16 I16 U32 2^24 89.877 us 1.44% 89.762 us 1.33% -0.115 us -0.13% PASS
I16 I16 U32 2^28 1.170 ms 0.83% 1.170 ms 0.82% 0.196 us 0.02% PASS
I16 I16 I64 2^16 11.753 us 2.45% 11.736 us 2.38% -0.017 us -0.14% PASS
I16 I16 I64 2^20 16.555 us 1.68% 16.522 us 1.58% -0.032 us -0.19% PASS
I16 I16 I64 2^24 95.041 us 1.21% 89.823 us 1.40% -5.218 us -5.49% FAIL
I16 I16 I64 2^28 1.233 ms 0.76% 1.168 ms 0.83% -65.790 us -5.33% FAIL
I16 I16 U64 2^16 11.781 us 2.43% 11.741 us 2.40% -0.040 us -0.34% PASS
I16 I16 U64 2^20 16.496 us 1.68% 16.481 us 1.66% -0.015 us -0.09% PASS
I16 I16 U64 2^24 94.829 us 1.22% 89.842 us 1.41% -4.987 us -5.26% FAIL
I16 I16 U64 2^28 1.233 ms 0.79% 1.168 ms 0.85% -65.241 us -5.29% FAIL
I16 I32 I32 2^16 11.811 us 2.18% 11.532 us 2.79% -0.280 us -2.37% FAIL
I16 I32 I32 2^20 17.691 us 1.53% 17.580 us 1.37% -0.110 us -0.62% PASS
I16 I32 I32 2^24 111.346 us 1.25% 108.039 us 1.19% -3.308 us -2.97% FAIL
I16 I32 I32 2^28 1.597 ms 0.69% 1.566 ms 0.74% -31.246 us -1.96% FAIL
I16 I32 U32 2^16 11.851 us 2.13% 11.323 us 2.40% -0.528 us -4.46% FAIL
I16 I32 U32 2^20 17.937 us 1.33% 17.683 us 1.25% -0.254 us -1.41% FAIL
I16 I32 U32 2^24 111.456 us 1.26% 108.183 us 1.16% -3.272 us -2.94% FAIL
I16 I32 U32 2^28 1.597 ms 0.69% 1.566 ms 0.73% -30.978 us -1.94% FAIL
I16 I32 I64 2^16 11.809 us 2.21% 11.762 us 2.04% -0.047 us -0.40% PASS
I16 I32 I64 2^20 18.173 us 1.36% 17.893 us 1.34% -0.280 us -1.54% FAIL
I16 I32 I64 2^24 115.074 us 0.98% 114.450 us 1.07% -0.624 us -0.54% PASS
I16 I32 I64 2^28 1.641 ms 0.66% 1.637 ms 0.65% -3.882 us -0.24% PASS
I16 I32 U64 2^16 11.756 us 2.21% 11.770 us 2.14% 0.014 us 0.12% PASS
I16 I32 U64 2^20 17.986 us 1.49% 17.882 us 1.49% -0.104 us -0.58% PASS
I16 I32 U64 2^24 114.672 us 1.08% 114.360 us 1.05% -0.312 us -0.27% PASS
I16 I32 U64 2^28 1.640 ms 0.67% 1.637 ms 0.65% -3.471 us -0.21% PASS
I16 I64 I32 2^16 11.706 us 2.49% 11.752 us 2.40% 0.046 us 0.40% PASS
I16 I64 I32 2^20 21.664 us 1.32% 21.729 us 1.34% 0.065 us 0.30% PASS
I16 I64 I32 2^24 183.793 us 1.01% 183.955 us 1.02% 0.163 us 0.09% PASS
I16 I64 I32 2^28 2.745 ms 0.50% 2.746 ms 0.50% 0.603 us 0.02% PASS
I16 I64 U32 2^16 11.739 us 2.33% 11.717 us 2.20% -0.022 us -0.19% PASS
I16 I64 U32 2^20 21.660 us 1.38% 21.658 us 1.42% -0.002 us -0.01% PASS
I16 I64 U32 2^24 183.891 us 1.01% 183.952 us 1.02% 0.061 us 0.03% PASS
I16 I64 U32 2^28 2.746 ms 0.50% 2.746 ms 0.50% 0.218 us 0.01% PASS
I16 I64 I64 2^16 13.643 us 1.68% 12.154 us 2.02% -1.490 us -10.92% FAIL
I16 I64 I64 2^20 28.559 us 1.43% 21.621 us 1.51% -6.938 us -24.29% FAIL
I16 I64 I64 2^24 234.724 us 0.58% 184.042 us 1.00% -50.683 us -21.59% FAIL
I16 I64 I64 2^28 3.462 ms 0.50% 2.747 ms 0.50% -714.823 us -20.65% FAIL
I16 I64 U64 2^16 13.575 us 1.93% 12.117 us 2.21% -1.458 us -10.74% FAIL
I16 I64 U64 2^20 28.387 us 1.44% 21.595 us 1.37% -6.791 us -23.92% FAIL
I16 I64 U64 2^24 233.131 us 0.57% 184.056 us 1.00% -49.075 us -21.05% FAIL
I16 I64 U64 2^28 3.435 ms 0.50% 2.747 ms 0.50% -688.225 us -20.04% FAIL
I16 I128 I32 2^16 16.177 us 1.54% 15.961 us 1.50% -0.217 us -1.34% PASS
I16 I128 I32 2^20 38.777 us 1.97% 38.106 us 1.81% -0.671 us -1.73% PASS
I16 I128 I32 2^24 360.280 us 0.70% 355.293 us 0.87% -4.987 us -1.38% FAIL
I16 I128 I32 2^28 5.501 ms 0.18% 5.419 ms 0.23% -82.494 us -1.50% FAIL
I16 I128 U32 2^16 16.268 us 1.83% 15.762 us 1.66% -0.506 us -3.11% FAIL
I16 I128 U32 2^20 38.458 us 1.44% 38.162 us 1.77% -0.296 us -0.77% PASS
I16 I128 U32 2^24 360.421 us 0.71% 355.608 us 0.86% -4.812 us -1.34% FAIL
I16 I128 U32 2^28 5.501 ms 0.17% 5.421 ms 0.24% -80.608 us -1.47% FAIL
I16 I128 I64 2^16 15.921 us 1.53% 15.985 us 1.55% 0.064 us 0.40% PASS
I16 I128 I64 2^20 41.403 us 1.38% 38.180 us 1.45% -3.223 us -7.78% FAIL
I16 I128 I64 2^24 382.156 us 0.55% 355.526 us 0.72% -26.630 us -6.97% FAIL
I16 I128 I64 2^28 5.887 ms 0.15% 5.419 ms 0.20% -467.638 us -7.94% FAIL
I16 I128 U64 2^16 15.916 us 1.52% 16.000 us 1.53% 0.084 us 0.52% PASS
I16 I128 U64 2^20 41.487 us 1.45% 38.300 us 1.59% -3.186 us -7.68% FAIL
I16 I128 U64 2^24 382.504 us 0.56% 355.456 us 0.76% -27.048 us -7.07% FAIL
I16 I128 U64 2^28 5.891 ms 0.15% 5.421 ms 0.21% -470.221 us -7.98% FAIL
I32 I8 I32 2^16 10.302 us 2.04% 10.343 us 2.12% 0.042 us 0.40% PASS
I32 I8 I32 2^20 16.555 us 1.30% 16.740 us 1.35% 0.185 us 1.12% PASS
I32 I8 I32 2^24 90.851 us 1.09% 91.085 us 1.09% 0.234 us 0.26% PASS
I32 I8 I32 2^28 1.158 ms 0.74% 1.162 ms 0.73% 3.370 us 0.29% PASS
I32 I8 U32 2^16 10.636 us 2.25% 10.619 us 2.59% -0.017 us -0.16% PASS
I32 I8 U32 2^20 16.938 us 1.54% 16.852 us 1.53% -0.086 us -0.51% PASS
I32 I8 U32 2^24 91.267 us 1.10% 91.326 us 1.10% 0.059 us 0.06% PASS
I32 I8 U32 2^28 1.160 ms 0.73% 1.162 ms 0.73% 2.202 us 0.19% PASS
I32 I8 I64 2^16 10.748 us 2.48% 10.715 us 2.21% -0.033 us -0.31% PASS
I32 I8 I64 2^20 17.118 us 1.72% 17.445 us 1.51% 0.327 us 1.91% FAIL
I32 I8 I64 2^24 94.859 us 0.92% 92.314 us 0.92% -2.545 us -2.68% FAIL
I32 I8 I64 2^28 1.235 ms 0.56% 1.180 ms 0.64% -54.801 us -4.44% FAIL
I32 I8 U64 2^16 10.749 us 2.43% 10.822 us 2.49% 0.073 us 0.68% PASS
I32 I8 U64 2^20 17.145 us 1.67% 17.287 us 1.44% 0.142 us 0.83% PASS
I32 I8 U64 2^24 94.536 us 0.92% 92.238 us 0.93% -2.298 us -2.43% FAIL
I32 I8 U64 2^28 1.229 ms 0.58% 1.180 ms 0.65% -48.678 us -3.96% FAIL
I32 I16 I32 2^16 11.347 us 2.01% 11.635 us 2.28% 0.288 us 2.54% FAIL
I32 I16 I32 2^20 17.504 us 1.39% 17.772 us 1.47% 0.268 us 1.53% FAIL
I32 I16 I32 2^24 100.892 us 1.16% 105.108 us 1.27% 4.216 us 4.18% FAIL
I32 I16 I32 2^28 1.404 ms 0.75% 1.443 ms 0.79% 39.782 us 2.83% FAIL
I32 I16 U32 2^16 11.415 us 2.02% 11.466 us 2.07% 0.051 us 0.45% PASS
I32 I16 U32 2^20 17.603 us 1.38% 17.780 us 1.25% 0.177 us 1.00% PASS
I32 I16 U32 2^24 101.019 us 1.15% 105.183 us 1.24% 4.164 us 4.12% FAIL
I32 I16 U32 2^28 1.403 ms 0.75% 1.444 ms 0.82% 41.200 us 2.94% FAIL
I32 I16 I64 2^16 11.829 us 2.10% 11.545 us 2.12% -0.285 us -2.41% FAIL
I32 I16 I64 2^20 17.479 us 1.49% 17.178 us 1.29% -0.302 us -1.73% FAIL
I32 I16 I64 2^24 109.023 us 1.06% 100.336 us 1.15% -8.687 us -7.97% FAIL
I32 I16 I64 2^28 1.448 ms 0.81% 1.401 ms 0.73% -47.301 us -3.27% FAIL
I32 I16 U64 2^16 11.772 us 2.18% 11.400 us 1.89% -0.371 us -3.15% FAIL
I32 I16 U64 2^20 17.389 us 1.22% 17.164 us 1.17% -0.225 us -1.29% FAIL
I32 I16 U64 2^24 108.986 us 1.08% 100.393 us 1.17% -8.594 us -7.89% FAIL
I32 I16 U64 2^28 1.448 ms 0.81% 1.401 ms 0.77% -47.365 us -3.27% FAIL
I32 I32 I32 2^16 11.703 us 1.69% 11.722 us 1.57% 0.019 us 0.16% PASS
I32 I32 I32 2^20 18.877 us 1.28% 19.004 us 1.24% 0.126 us 0.67% PASS
I32 I32 I32 2^24 129.443 us 0.98% 129.519 us 1.02% 0.076 us 0.06% PASS
I32 I32 I32 2^28 1.905 ms 0.54% 1.905 ms 0.54% 0.530 us 0.03% PASS
I32 I32 U32 2^16 11.692 us 1.53% 11.687 us 1.73% -0.005 us -0.04% PASS
I32 I32 U32 2^20 18.918 us 1.30% 18.973 us 1.24% 0.055 us 0.29% PASS
I32 I32 U32 2^24 129.540 us 0.99% 129.421 us 1.00% -0.119 us -0.09% PASS
I32 I32 U32 2^28 1.907 ms 0.55% 1.905 ms 0.54% -1.404 us -0.07% PASS
I32 I32 I64 2^16 11.997 us 1.88% 11.937 us 1.97% -0.060 us -0.50% PASS
I32 I32 I64 2^20 19.465 us 1.25% 18.899 us 1.19% -0.566 us -2.91% FAIL
I32 I32 I64 2^24 137.268 us 0.62% 136.152 us 0.76% -1.116 us -0.81% FAIL
I32 I32 I64 2^28 2.045 ms 0.53% 2.037 ms 0.54% -7.754 us -0.38% PASS
I32 I32 U64 2^16 11.973 us 1.93% 11.883 us 1.87% -0.090 us -0.75% PASS
I32 I32 U64 2^20 19.210 us 1.20% 19.008 us 1.28% -0.202 us -1.05% PASS
I32 I32 U64 2^24 137.247 us 0.68% 136.242 us 0.76% -1.005 us -0.73% FAIL
I32 I32 U64 2^28 2.045 ms 0.55% 2.038 ms 0.54% -7.292 us -0.36% PASS
I32 I64 I32 2^16 12.079 us 1.88% 12.470 us 2.07% 0.391 us 3.24% FAIL
I32 I64 I32 2^20 22.541 us 1.18% 22.814 us 1.29% 0.273 us 1.21% FAIL
I32 I64 I32 2^24 198.042 us 0.87% 198.111 us 0.86% 0.069 us 0.03% PASS
I32 I64 I32 2^28 2.938 ms 0.50% 2.939 ms 0.50% 0.276 us 0.01% PASS
I32 I64 U32 2^16 12.189 us 2.00% 12.454 us 2.05% 0.265 us 2.17% FAIL
I32 I64 U32 2^20 22.460 us 1.15% 22.880 us 1.29% 0.420 us 1.87% FAIL
I32 I64 U32 2^24 197.892 us 0.90% 197.984 us 0.90% 0.091 us 0.05% PASS
I32 I64 U32 2^28 2.939 ms 0.50% 2.939 ms 0.50% -0.278 us -0.01% PASS
I32 I64 I64 2^16 13.578 us 1.89% 12.688 us 2.26% -0.890 us -6.56% FAIL
I32 I64 I64 2^20 27.498 us 1.19% 23.100 us 1.30% -4.398 us -16.00% FAIL
I32 I64 I64 2^24 230.477 us 0.62% 200.895 us 0.92% -29.582 us -12.84% FAIL
I32 I64 I64 2^28 3.481 ms 0.50% 2.965 ms 0.50% -516.036 us -14.82% FAIL
I32 I64 U64 2^16 13.568 us 1.90% 12.646 us 2.15% -0.922 us -6.79% FAIL
I32 I64 U64 2^20 27.532 us 1.14% 23.041 us 1.36% -4.490 us -16.31% FAIL
I32 I64 U64 2^24 230.378 us 0.61% 200.767 us 0.89% -29.611 us -12.85% FAIL
I32 I64 U64 2^28 3.480 ms 0.50% 2.965 ms 0.50% -515.033 us -14.80% FAIL
I32 I128 I32 2^16 16.130 us 1.81% 16.262 us 1.79% 0.132 us 0.82% PASS
I32 I128 I32 2^20 39.356 us 1.55% 39.263 us 1.48% -0.094 us -0.24% PASS
I32 I128 I32 2^24 373.408 us 0.63% 367.760 us 0.68% -5.648 us -1.51% FAIL
I32 I128 I32 2^28 5.694 ms 0.18% 5.605 ms 0.17% -88.660 us -1.56% FAIL
I32 I128 U32 2^16 16.314 us 1.70% 15.752 us 1.51% -0.561 us -3.44% FAIL
I32 I128 U32 2^20 39.712 us 1.66% 38.896 us 1.37% -0.816 us -2.06% FAIL
I32 I128 U32 2^24 373.390 us 0.61% 367.196 us 0.67% -6.194 us -1.66% FAIL
I32 I128 U32 2^28 5.699 ms 0.15% 5.606 ms 0.17% -93.490 us -1.64% FAIL
I32 I128 I64 2^16 16.494 us 1.87% 16.202 us 1.75% -0.292 us -1.77% FAIL
I32 I128 I64 2^20 42.494 us 1.42% 39.057 us 1.55% -3.437 us -8.09% FAIL
I32 I128 I64 2^24 397.089 us 0.54% 367.948 us 0.67% -29.141 us -7.34% FAIL
I32 I128 I64 2^28 6.100 ms 0.14% 5.611 ms 0.16% -489.070 us -8.02% FAIL
I32 I128 U64 2^16 16.486 us 1.96% 16.176 us 1.67% -0.309 us -1.88% FAIL
I32 I128 U64 2^20 42.231 us 1.47% 38.988 us 1.48% -3.243 us -7.68% FAIL
I32 I128 U64 2^24 397.325 us 0.53% 367.900 us 0.66% -29.425 us -7.41% FAIL
I32 I128 U64 2^28 6.106 ms 0.15% 5.611 ms 0.17% -495.070 us -8.11% FAIL
I64 I8 I32 2^16 11.070 us 2.22% 10.819 us 2.23% -0.251 us -2.27% FAIL
I64 I8 I32 2^20 20.509 us 1.70% 20.673 us 1.67% 0.164 us 0.80% PASS
I64 I8 I32 2^24 128.937 us 0.81% 127.485 us 0.74% -1.452 us -1.13% FAIL
I64 I8 I32 2^28 1.731 ms 0.61% 1.718 ms 0.60% -13.500 us -0.78% FAIL
I64 I8 U32 2^16 11.062 us 2.28% 10.774 us 2.07% -0.288 us -2.60% FAIL
I64 I8 U32 2^20 20.521 us 1.82% 20.565 us 1.70% 0.044 us 0.21% PASS
I64 I8 U32 2^24 129.366 us 0.82% 127.608 us 0.74% -1.758 us -1.36% FAIL
I64 I8 U32 2^28 1.732 ms 0.62% 1.718 ms 0.60% -14.162 us -0.82% FAIL
I64 I8 I64 2^16 10.947 us 2.56% 10.720 us 2.17% -0.227 us -2.08% PASS
I64 I8 I64 2^20 21.056 us 1.65% 20.930 us 1.71% -0.126 us -0.60% PASS
I64 I8 I64 2^24 131.495 us 0.79% 129.328 us 0.76% -2.167 us -1.65% FAIL
I64 I8 I64 2^28 1.748 ms 0.56% 1.728 ms 0.59% -20.029 us -1.15% FAIL
I64 I8 U64 2^16 11.097 us 2.41% 11.016 us 2.40% -0.081 us -0.73% PASS
I64 I8 U64 2^20 20.933 us 1.79% 20.967 us 1.69% 0.035 us 0.17% PASS
I64 I8 U64 2^24 131.489 us 0.81% 129.384 us 0.78% -2.105 us -1.60% FAIL
I64 I8 U64 2^28 1.748 ms 0.56% 1.728 ms 0.61% -19.829 us -1.13% FAIL
I64 I16 I32 2^16 11.497 us 2.19% 11.423 us 2.24% -0.074 us -0.64% PASS
I64 I16 I32 2^20 21.754 us 1.99% 21.678 us 2.03% -0.076 us -0.35% PASS
I64 I16 I32 2^24 143.218 us 1.38% 143.433 us 1.43% 0.215 us 0.15% PASS
I64 I16 I32 2^28 2.129 ms 0.55% 2.131 ms 0.55% 2.128 us 0.10% PASS
I64 I16 U32 2^16 11.527 us 2.09% 11.333 us 2.39% -0.195 us -1.69% PASS
I64 I16 U32 2^20 21.872 us 1.99% 20.889 us 1.36% -0.982 us -4.49% FAIL
I64 I16 U32 2^24 143.855 us 1.48% 138.756 us 1.33% -5.099 us -3.54% FAIL
I64 I16 U32 2^28 2.134 ms 0.55% 2.040 ms 0.58% -93.562 us -4.39% FAIL
I64 I16 I64 2^16 11.734 us 1.92% 11.519 us 2.50% -0.214 us -1.83% PASS
I64 I16 I64 2^20 22.328 us 1.87% 22.080 us 1.90% -0.248 us -1.11% PASS
I64 I16 I64 2^24 145.029 us 1.35% 143.686 us 1.44% -1.343 us -0.93% PASS
I64 I16 I64 2^28 2.137 ms 0.55% 2.133 ms 0.56% -3.704 us -0.17% PASS
I64 I16 U64 2^16 11.710 us 2.27% 11.550 us 2.20% -0.160 us -1.36% PASS
I64 I16 U64 2^20 23.079 us 1.58% 21.974 us 2.08% -1.105 us -4.79% FAIL
I64 I16 U64 2^24 161.873 us 1.03% 143.618 us 1.45% -18.255 us -11.28% FAIL
I64 I16 U64 2^28 2.304 ms 0.50% 2.132 ms 0.55% -171.686 us -7.45% FAIL
I64 I32 I32 2^16 11.496 us 1.84% 11.403 us 1.71% -0.094 us -0.81% PASS
I64 I32 I32 2^20 23.222 us 1.18% 23.786 us 1.18% 0.565 us 2.43% FAIL
I64 I32 I32 2^24 171.077 us 1.09% 171.205 us 1.07% 0.128 us 0.08% PASS
I64 I32 I32 2^28 2.532 ms 0.50% 2.532 ms 0.50% 0.014 us 0.00% PASS
I64 I32 U32 2^16 11.726 us 2.23% 11.523 us 1.90% -0.203 us -1.73% PASS
I64 I32 U32 2^20 23.390 us 1.12% 23.790 us 1.21% 0.401 us 1.71% FAIL
I64 I32 U32 2^24 171.248 us 1.11% 171.214 us 1.06% -0.035 us -0.02% PASS
I64 I32 U32 2^28 2.532 ms 0.50% 2.532 ms 0.50% -0.531 us -0.02% PASS
I64 I32 I64 2^16 12.149 us 1.83% 11.717 us 1.77% -0.432 us -3.55% FAIL
I64 I32 I64 2^20 23.452 us 1.37% 23.458 us 1.23% 0.006 us 0.03% PASS
I64 I32 I64 2^24 173.890 us 1.14% 173.561 us 1.09% -0.329 us -0.19% PASS
I64 I32 I64 2^28 2.591 ms 0.50% 2.588 ms 0.50% -3.548 us -0.14% PASS
I64 I32 U64 2^16 11.969 us 1.98% 11.742 us 1.73% -0.226 us -1.89% FAIL
I64 I32 U64 2^20 23.349 us 1.37% 23.440 us 1.15% 0.091 us 0.39% PASS
I64 I32 U64 2^24 173.789 us 1.11% 173.539 us 1.09% -0.250 us -0.14% PASS
I64 I32 U64 2^28 2.591 ms 0.50% 2.588 ms 0.50% -2.690 us -0.10% PASS
I64 I64 I32 2^16 12.547 us 1.91% 12.053 us 2.04% -0.494 us -3.93% FAIL
I64 I64 I32 2^20 26.449 us 1.07% 25.630 us 1.13% -0.819 us -3.10% FAIL
I64 I64 I32 2^24 242.541 us 0.87% 238.161 us 0.95% -4.381 us -1.81% FAIL
I64 I64 I32 2^28 3.651 ms 0.32% 3.581 ms 0.32% -70.466 us -1.93% FAIL
I64 I64 U32 2^16 12.110 us 2.08% 12.286 us 2.29% 0.176 us 1.45% PASS
I64 I64 U32 2^20 25.473 us 1.08% 25.851 us 1.21% 0.378 us 1.48% FAIL
I64 I64 U32 2^24 238.171 us 0.94% 238.574 us 0.95% 0.403 us 0.17% PASS
I64 I64 U32 2^28 3.581 ms 0.33% 3.581 ms 0.30% 0.257 us 0.01% PASS
I64 I64 I64 2^16 13.810 us 1.76% 12.537 us 2.41% -1.274 us -9.22% FAIL
I64 I64 I64 2^20 33.004 us 1.73% 28.399 us 2.18% -4.605 us -13.95% FAIL
I64 I64 I64 2^24 275.017 us 0.59% 246.215 us 0.91% -28.802 us -10.47% FAIL
I64 I64 I64 2^28 4.185 ms 0.16% 3.736 ms 0.28% -449.836 us -10.75% FAIL
I64 I64 U64 2^16 13.815 us 1.82% 12.534 us 2.60% -1.281 us -9.27% FAIL
I64 I64 U64 2^20 32.955 us 1.65% 28.468 us 2.39% -4.487 us -13.62% FAIL
I64 I64 U64 2^24 274.967 us 0.60% 246.056 us 0.88% -28.911 us -10.51% FAIL
I64 I64 U64 2^28 4.184 ms 0.18% 3.736 ms 0.27% -447.552 us -10.70% FAIL
I64 I128 I32 2^16 15.673 us 1.21% 16.161 us 1.54% 0.488 us 3.12% FAIL
I64 I128 I32 2^20 43.491 us 1.70% 45.161 us 1.39% 1.670 us 3.84% FAIL
I64 I128 I32 2^24 421.192 us 0.59% 427.930 us 0.53% 6.738 us 1.60% FAIL
I64 I128 I32 2^28 6.494 ms 0.15% 6.587 ms 0.13% 92.916 us 1.43% FAIL
I64 I128 U32 2^16 15.663 us 1.22% 16.249 us 1.38% 0.586 us 3.74% FAIL
I64 I128 U32 2^20 43.459 us 1.60% 45.171 us 1.44% 1.712 us 3.94% FAIL
I64 I128 U32 2^24 420.981 us 0.64% 427.696 us 0.52% 6.715 us 1.59% FAIL
I64 I128 U32 2^28 6.495 ms 0.14% 6.589 ms 0.13% 94.045 us 1.45% FAIL
I64 I128 I64 2^16 15.942 us 1.30% 16.485 us 1.54% 0.544 us 3.41% FAIL
I64 I128 I64 2^20 44.996 us 1.52% 45.117 us 1.46% 0.121 us 0.27% PASS
I64 I128 I64 2^24 427.595 us 0.52% 427.321 us 0.52% -0.274 us -0.06% PASS
I64 I128 I64 2^28 6.589 ms 0.11% 6.579 ms 0.12% -9.897 us -0.15% FAIL
I64 I128 U64 2^16 16.213 us 1.35% 16.294 us 1.45% 0.081 us 0.50% PASS
I64 I128 U64 2^20 44.929 us 1.59% 45.093 us 1.48% 0.164 us 0.36% PASS
I64 I128 U64 2^24 427.290 us 0.54% 427.018 us 0.51% -0.273 us -0.06% PASS
I64 I128 U64 2^28 6.584 ms 0.10% 6.578 ms 0.14% -6.152 us -0.09% PASS
I128 I8 I32 2^16 12.093 us 2.15% 12.252 us 2.27% 0.159 us 1.32% PASS
I128 I8 I32 2^20 28.275 us 1.43% 28.871 us 1.29% 0.597 us 2.11% FAIL
I128 I8 I32 2^24 204.030 us 1.21% 203.265 us 1.11% -0.765 us -0.38% PASS
I128 I8 I32 2^28 3.014 ms 0.50% 3.001 ms 0.50% -13.640 us -0.45% PASS
I128 I8 U32 2^16 12.148 us 1.87% 12.256 us 1.95% 0.108 us 0.89% PASS
I128 I8 U32 2^20 28.531 us 1.44% 28.938 us 1.30% 0.407 us 1.43% FAIL
I128 I8 U32 2^24 203.971 us 1.21% 203.276 us 1.17% -0.695 us -0.34% PASS
I128 I8 U32 2^28 3.011 ms 0.50% 3.001 ms 0.50% -10.023 us -0.33% PASS
I128 I8 I64 2^16 12.278 us 2.32% 12.368 us 1.94% 0.089 us 0.73% PASS
I128 I8 I64 2^20 29.192 us 1.30% 28.958 us 1.37% -0.234 us -0.80% PASS
I128 I8 I64 2^24 224.953 us 1.06% 204.034 us 1.14% -20.918 us -9.30% FAIL
I128 I8 I64 2^28 3.276 ms 0.50% 3.001 ms 0.50% -275.555 us -8.41% FAIL
I128 I8 U64 2^16 12.249 us 2.23% 12.329 us 2.15% 0.080 us 0.66% PASS
I128 I8 U64 2^20 29.406 us 1.23% 29.116 us 1.34% -0.291 us -0.99% PASS
I128 I8 U64 2^24 225.019 us 1.08% 203.937 us 1.15% -21.082 us -9.37% FAIL
I128 I8 U64 2^28 3.277 ms 0.50% 3.001 ms 0.50% -276.116 us -8.43% FAIL
I128 I16 I32 2^16 12.024 us 1.91% 12.109 us 1.96% 0.086 us 0.71% PASS
I128 I16 I32 2^20 27.949 us 1.43% 27.979 us 1.45% 0.030 us 0.11% PASS
I128 I16 I32 2^24 220.376 us 1.06% 220.536 us 1.08% 0.160 us 0.07% PASS
I128 I16 I32 2^28 3.268 ms 0.50% 3.270 ms 0.50% 2.055 us 0.06% PASS
I128 I16 U32 2^16 12.026 us 2.00% 12.066 us 1.99% 0.039 us 0.33% PASS
I128 I16 U32 2^20 28.035 us 1.32% 28.037 us 1.41% 0.002 us 0.01% PASS
I128 I16 U32 2^24 220.406 us 1.07% 220.500 us 1.08% 0.094 us 0.04% PASS
I128 I16 U32 2^28 3.268 ms 0.50% 3.269 ms 0.50% 1.232 us 0.04% PASS
I128 I16 I64 2^16 12.242 us 1.97% 12.187 us 1.99% -0.054 us -0.45% PASS
I128 I16 I64 2^20 30.499 us 1.95% 28.059 us 1.39% -2.440 us -8.00% FAIL
I128 I16 I64 2^24 285.868 us 0.94% 220.701 us 1.08% -65.167 us -22.80% FAIL
I128 I16 I64 2^28 4.292 ms 0.27% 3.272 ms 0.50% -1019.727 us -23.76% FAIL
I128 I16 U64 2^16 12.030 us 1.96% 12.022 us 1.93% -0.008 us -0.07% PASS
I128 I16 U64 2^20 30.534 us 1.98% 28.063 us 1.41% -2.471 us -8.09% FAIL
I128 I16 U64 2^24 285.816 us 0.98% 220.842 us 1.09% -64.974 us -22.73% FAIL
I128 I16 U64 2^28 4.294 ms 0.24% 3.271 ms 0.50% -1022.896 us -23.82% FAIL
I128 I32 I32 2^16 11.827 us 2.02% 11.825 us 1.90% -0.002 us -0.02% PASS
I128 I32 I32 2^20 29.523 us 1.63% 29.690 us 1.61% 0.167 us 0.56% PASS
I128 I32 I32 2^24 257.535 us 1.06% 257.746 us 1.05% 0.212 us 0.08% PASS
I128 I32 I32 2^28 3.880 ms 0.34% 3.880 ms 0.35% -0.428 us -0.01% PASS
I128 I32 U32 2^16 12.203 us 2.11% 11.853 us 2.02% -0.351 us -2.87% FAIL
I128 I32 U32 2^20 29.892 us 1.66% 29.620 us 1.62% -0.272 us -0.91% PASS
I128 I32 U32 2^24 257.860 us 1.03% 257.758 us 1.03% -0.102 us -0.04% PASS
I128 I32 U32 2^28 3.879 ms 0.33% 3.880 ms 0.35% 1.616 us 0.04% PASS
I128 I32 I64 2^16 12.382 us 2.36% 12.055 us 2.23% -0.327 us -2.64% FAIL
I128 I32 I64 2^20 30.288 us 1.51% 30.011 us 1.56% -0.276 us -0.91% PASS
I128 I32 I64 2^24 258.018 us 1.05% 257.895 us 1.02% -0.122 us -0.05% PASS
I128 I32 I64 2^28 3.881 ms 0.39% 3.882 ms 0.38% 0.787 us 0.02% PASS
I128 I32 U64 2^16 12.579 us 2.67% 12.326 us 2.40% -0.253 us -2.01% PASS
I128 I32 U64 2^20 30.314 us 1.63% 29.955 us 1.60% -0.359 us -1.18% PASS
I128 I32 U64 2^24 258.425 us 1.02% 258.008 us 1.06% -0.417 us -0.16% PASS
I128 I32 U64 2^28 3.885 ms 0.40% 3.882 ms 0.38% -2.553 us -0.07% PASS
I128 I64 I32 2^16 13.080 us 2.03% 12.738 us 1.79% -0.342 us -2.61% FAIL
I128 I64 I32 2^20 35.242 us 1.69% 34.734 us 1.88% -0.507 us -1.44% PASS
I128 I64 I32 2^24 322.030 us 0.84% 322.089 us 0.86% 0.059 us 0.02% PASS
I128 I64 I32 2^28 4.925 ms 0.18% 4.927 ms 0.20% 2.360 us 0.05% PASS
I128 I64 U32 2^16 13.087 us 2.30% 12.713 us 1.70% -0.374 us -2.86% FAIL
I128 I64 U32 2^20 35.270 us 1.69% 34.933 us 1.96% -0.337 us -0.96% PASS
I128 I64 U32 2^24 322.096 us 0.84% 321.995 us 0.85% -0.101 us -0.03% PASS
I128 I64 U32 2^28 4.924 ms 0.23% 4.925 ms 0.22% 1.543 us 0.03% PASS
I128 I64 I64 2^16 15.109 us 1.75% 13.082 us 2.13% -2.027 us -13.41% FAIL
I128 I64 I64 2^20 42.387 us 1.41% 35.333 us 1.59% -7.055 us -16.64% FAIL
I128 I64 I64 2^24 364.170 us 0.57% 321.987 us 0.84% -42.183 us -11.58% FAIL
I128 I64 I64 2^28 5.627 ms 0.13% 4.925 ms 0.24% -702.252 us -12.48% FAIL
I128 I64 U64 2^16 15.124 us 1.86% 13.013 us 2.05% -2.111 us -13.96% FAIL
I128 I64 U64 2^20 42.380 us 1.36% 35.035 us 1.97% -7.345 us -17.33% FAIL
I128 I64 U64 2^24 364.199 us 0.57% 321.964 us 0.84% -42.235 us -11.60% FAIL
I128 I64 U64 2^28 5.625 ms 0.13% 4.924 ms 0.21% -701.083 us -12.46% FAIL
I128 I128 I32 2^16 17.782 us 1.33% 17.479 us 1.34% -0.302 us -1.70% FAIL
I128 I128 I32 2^20 50.161 us 1.34% 49.363 us 1.40% -0.797 us -1.59% FAIL
I128 I128 I32 2^24 494.844 us 0.51% 489.726 us 0.57% -5.118 us -1.03% FAIL
I128 I128 I32 2^28 7.620 ms 0.13% 7.537 ms 0.16% -82.767 us -1.09% FAIL
I128 I128 U32 2^16 17.726 us 1.71% 17.672 us 1.76% -0.054 us -0.31% PASS
I128 I128 U32 2^20 49.878 us 1.39% 49.382 us 1.49% -0.495 us -0.99% PASS
I128 I128 U32 2^24 494.366 us 0.53% 489.907 us 0.60% -4.460 us -0.90% FAIL
I128 I128 U32 2^28 7.609 ms 0.12% 7.537 ms 0.18% -72.558 us -0.95% FAIL
I128 I128 I64 2^16 17.791 us 1.45% 17.258 us 1.28% -0.533 us -3.00% FAIL
I128 I128 I64 2^20 50.557 us 1.47% 49.413 us 1.77% -1.144 us -2.26% FAIL
I128 I128 I64 2^24 496.353 us 0.51% 489.969 us 0.59% -6.384 us -1.29% FAIL
I128 I128 I64 2^28 7.635 ms 0.16% 7.537 ms 0.19% -97.640 us -1.28% FAIL
I128 I128 U64 2^16 17.933 us 1.39% 17.204 us 1.32% -0.729 us -4.07% FAIL
I128 I128 U64 2^20 50.260 us 1.46% 49.159 us 1.60% -1.101 us -2.19% FAIL
I128 I128 U64 2^24 495.430 us 0.54% 489.789 us 0.59% -5.641 us -1.14% FAIL
I128 I128 U64 2^28 7.629 ms 0.13% 7.539 ms 0.15% -90.809 us -1.19% FAIL
F32 I8 I32 2^16 10.319 us 2.31% 10.340 us 2.37% 0.022 us 0.21% PASS
F32 I8 I32 2^20 16.562 us 1.31% 16.604 us 1.31% 0.042 us 0.25% PASS
F32 I8 I32 2^24 91.604 us 1.10% 91.370 us 1.11% -0.234 us -0.26% PASS
F32 I8 I32 2^28 1.170 ms 0.72% 1.167 ms 0.72% -3.597 us -0.31% PASS
F32 I8 U32 2^16 10.321 us 2.31% 10.289 us 2.14% -0.032 us -0.31% PASS
F32 I8 U32 2^20 16.672 us 1.29% 16.735 us 1.39% 0.064 us 0.38% PASS
F32 I8 U32 2^24 91.747 us 1.06% 91.467 us 1.14% -0.281 us -0.31% PASS
F32 I8 U32 2^28 1.171 ms 0.71% 1.166 ms 0.71% -5.098 us -0.44% PASS
F32 I8 I64 2^16 10.477 us 2.13% 10.543 us 2.66% 0.065 us 0.62% PASS
F32 I8 I64 2^20 16.884 us 1.47% 17.009 us 1.41% 0.125 us 0.74% PASS
F32 I8 I64 2^24 95.015 us 0.90% 91.842 us 0.93% -3.173 us -3.34% FAIL
F32 I8 I64 2^28 1.242 ms 0.55% 1.177 ms 0.65% -64.359 us -5.18% FAIL
F32 I8 U64 2^16 10.532 us 2.10% 10.530 us 2.38% -0.002 us -0.02% PASS
F32 I8 U64 2^20 16.878 us 1.50% 16.980 us 1.45% 0.102 us 0.61% PASS
F32 I8 U64 2^24 94.636 us 0.95% 91.868 us 0.94% -2.768 us -2.93% FAIL
F32 I8 U64 2^28 1.234 ms 0.59% 1.177 ms 0.65% -57.045 us -4.62% FAIL
F32 I16 I32 2^16 10.948 us 2.05% 11.046 us 1.93% 0.098 us 0.90% PASS
F32 I16 I32 2^20 17.168 us 1.35% 17.232 us 1.30% 0.064 us 0.37% PASS
F32 I16 I32 2^24 100.424 us 1.15% 100.356 us 1.12% -0.067 us -0.07% PASS
F32 I16 I32 2^28 1.401 ms 0.76% 1.400 ms 0.74% -1.169 us -0.08% PASS
F32 I16 U32 2^16 11.018 us 1.94% 11.417 us 2.11% 0.400 us 3.63% FAIL
F32 I16 U32 2^20 17.261 us 1.47% 17.414 us 1.58% 0.153 us 0.89% PASS
F32 I16 U32 2^24 100.538 us 1.15% 100.524 us 1.16% -0.014 us -0.01% PASS
F32 I16 U32 2^28 1.402 ms 0.75% 1.401 ms 0.74% -1.828 us -0.13% PASS
F32 I16 I64 2^16 11.718 us 1.86% 11.803 us 2.13% 0.085 us 0.72% PASS
F32 I16 I64 2^20 17.771 us 1.23% 17.594 us 1.45% -0.178 us -1.00% PASS
F32 I16 I64 2^24 118.681 us 1.18% 100.915 us 1.14% -17.766 us -14.97% FAIL
F32 I16 I64 2^28 1.563 ms 0.77% 1.401 ms 0.74% -162.016 us -10.36% FAIL
F32 I16 U64 2^16 11.624 us 1.83% 11.904 us 2.09% 0.280 us 2.41% FAIL
F32 I16 U64 2^20 17.755 us 1.26% 17.653 us 1.50% -0.103 us -0.58% PASS
F32 I16 U64 2^24 118.682 us 1.20% 101.026 us 1.13% -17.656 us -14.88% FAIL
F32 I16 U64 2^28 1.564 ms 0.76% 1.401 ms 0.76% -163.196 us -10.43% FAIL
F32 I32 I32 2^16 11.666 us 1.63% 12.003 us 1.69% 0.337 us 2.89% FAIL
F32 I32 I32 2^20 18.891 us 1.25% 19.203 us 1.47% 0.311 us 1.65% FAIL
F32 I32 I32 2^24 129.387 us 1.01% 129.493 us 0.98% 0.106 us 0.08% PASS
F32 I32 I32 2^28 1.905 ms 0.54% 1.905 ms 0.56% -0.436 us -0.02% PASS
F32 I32 U32 2^16 11.645 us 1.76% 12.027 us 1.85% 0.381 us 3.28% FAIL
F32 I32 U32 2^20 18.853 us 1.18% 19.228 us 1.46% 0.375 us 1.99% FAIL
F32 I32 U32 2^24 129.348 us 1.00% 129.482 us 1.00% 0.134 us 0.10% PASS
F32 I32 U32 2^28 1.905 ms 0.55% 1.904 ms 0.55% -1.119 us -0.06% PASS
F32 I32 I64 2^16 11.885 us 1.85% 12.250 us 2.16% 0.364 us 3.07% FAIL
F32 I32 I64 2^20 19.403 us 1.17% 19.035 us 1.32% -0.368 us -1.90% FAIL
F32 I32 I64 2^24 137.192 us 0.64% 132.015 us 1.03% -5.177 us -3.77% FAIL
F32 I32 I64 2^28 2.046 ms 0.54% 1.957 ms 0.50% -89.328 us -4.37% FAIL
F32 I32 U64 2^16 11.930 us 1.91% 12.281 us 2.43% 0.352 us 2.95% FAIL
F32 I32 U64 2^20 19.248 us 1.30% 19.041 us 1.40% -0.207 us -1.07% PASS
F32 I32 U64 2^24 137.223 us 0.67% 132.043 us 1.01% -5.180 us -3.77% FAIL
F32 I32 U64 2^28 2.045 ms 0.54% 1.957 ms 0.50% -88.340 us -4.32% FAIL
F32 I64 I32 2^16 12.200 us 1.93% 12.520 us 2.22% 0.320 us 2.63% FAIL
F32 I64 I32 2^20 22.444 us 1.18% 22.763 us 1.25% 0.319 us 1.42% FAIL
F32 I64 I32 2^24 197.939 us 0.87% 198.108 us 0.86% 0.169 us 0.09% PASS
F32 I64 I32 2^28 2.939 ms 0.50% 2.939 ms 0.50% 0.270 us 0.01% PASS
F32 I64 U32 2^16 12.058 us 2.38% 12.342 us 2.66% 0.284 us 2.35% PASS
F32 I64 U32 2^20 22.462 us 1.12% 22.785 us 1.29% 0.324 us 1.44% FAIL
F32 I64 U32 2^24 197.973 us 0.87% 198.048 us 0.87% 0.075 us 0.04% PASS
F32 I64 U32 2^28 2.939 ms 0.50% 2.939 ms 0.50% 0.073 us 0.00% PASS
F32 I64 I64 2^16 13.551 us 2.00% 12.720 us 2.20% -0.831 us -6.13% FAIL
F32 I64 I64 2^20 27.504 us 1.29% 23.009 us 1.37% -4.495 us -16.34% FAIL
F32 I64 I64 2^24 230.314 us 0.64% 200.598 us 0.91% -29.715 us -12.90% FAIL
F32 I64 I64 2^28 3.480 ms 0.50% 2.963 ms 0.50% -516.399 us -14.84% FAIL
F32 I64 U64 2^16 13.595 us 1.87% 12.690 us 2.09% -0.905 us -6.65% FAIL
F32 I64 U64 2^20 27.519 us 1.20% 23.022 us 1.20% -4.497 us -16.34% FAIL
F32 I64 U64 2^24 230.357 us 0.62% 200.600 us 0.90% -29.758 us -12.92% FAIL
F32 I64 U64 2^28 3.479 ms 0.50% 2.963 ms 0.50% -516.140 us -14.83% FAIL
F32 I128 I32 2^16 16.473 us 1.69% 16.444 us 1.73% -0.029 us -0.18% PASS
F32 I128 I32 2^20 39.735 us 1.63% 38.876 us 1.60% -0.859 us -2.16% FAIL
F32 I128 I32 2^24 373.537 us 0.65% 367.449 us 0.69% -6.088 us -1.63% FAIL
F32 I128 I32 2^28 5.697 ms 0.15% 5.606 ms 0.16% -90.659 us -1.59% FAIL
F32 I128 U32 2^16 16.486 us 1.83% 16.574 us 1.81% 0.087 us 0.53% PASS
F32 I128 U32 2^20 39.721 us 1.49% 38.901 us 1.62% -0.819 us -2.06% FAIL
F32 I128 U32 2^24 373.729 us 0.63% 367.730 us 0.69% -5.999 us -1.61% FAIL
F32 I128 U32 2^28 5.699 ms 0.16% 5.603 ms 0.18% -96.102 us -1.69% FAIL
F32 I128 I64 2^16 16.326 us 1.99% 16.393 us 1.75% 0.067 us 0.41% PASS
F32 I128 I64 2^20 42.456 us 1.34% 39.478 us 1.52% -2.978 us -7.01% FAIL
F32 I128 I64 2^24 397.178 us 0.53% 368.477 us 0.70% -28.701 us -7.23% FAIL
F32 I128 I64 2^28 6.101 ms 0.13% 5.608 ms 0.18% -492.408 us -8.07% FAIL
F32 I128 U64 2^16 16.340 us 1.95% 16.561 us 1.57% 0.221 us 1.35% PASS
F32 I128 U64 2^20 42.228 us 1.52% 39.214 us 1.45% -3.015 us -7.14% FAIL
F32 I128 U64 2^24 397.383 us 0.53% 368.289 us 0.69% -29.094 us -7.32% FAIL
F32 I128 U64 2^28 6.105 ms 0.11% 5.608 ms 0.18% -496.111 us -8.13% FAIL
F64 I8 I32 2^16 10.768 us 2.59% 10.986 us 2.42% 0.218 us 2.02% PASS
F64 I8 I32 2^20 20.592 us 1.72% 20.846 us 1.80% 0.254 us 1.23% PASS
F64 I8 I32 2^24 128.115 us 0.79% 128.907 us 0.79% 0.792 us 0.62% PASS
F64 I8 I32 2^28 1.720 ms 0.58% 1.721 ms 0.59% 0.885 us 0.05% PASS
F64 I8 U32 2^16 10.904 us 2.20% 10.895 us 2.40% -0.008 us -0.08% PASS
F64 I8 U32 2^20 20.772 us 1.74% 20.822 us 1.76% 0.050 us 0.24% PASS
F64 I8 U32 2^24 128.161 us 0.76% 128.733 us 0.79% 0.572 us 0.45% PASS
F64 I8 U32 2^28 1.720 ms 0.59% 1.720 ms 0.59% 0.513 us 0.03% PASS
F64 I8 I64 2^16 10.993 us 2.82% 10.983 us 2.57% -0.009 us -0.08% PASS
F64 I8 I64 2^20 21.664 us 1.77% 20.969 us 1.82% -0.694 us -3.21% FAIL
F64 I8 I64 2^24 131.977 us 0.67% 130.355 us 0.84% -1.622 us -1.23% FAIL
F64 I8 I64 2^28 1.740 ms 0.54% 1.732 ms 0.61% -8.441 us -0.49% PASS
F64 I8 U64 2^16 10.998 us 2.58% 11.082 us 2.38% 0.084 us 0.76% PASS
F64 I8 U64 2^20 21.123 us 1.67% 20.847 us 1.85% -0.276 us -1.31% PASS
F64 I8 U64 2^24 131.129 us 0.78% 130.214 us 0.82% -0.915 us -0.70% PASS
F64 I8 U64 2^28 1.735 ms 0.57% 1.732 ms 0.60% -2.575 us -0.15% PASS
F64 I16 I32 2^16 11.346 us 2.15% 11.364 us 2.14% 0.018 us 0.16% PASS
F64 I16 I32 2^20 20.776 us 1.23% 21.742 us 1.97% 0.965 us 4.65% FAIL
F64 I16 I32 2^24 138.521 us 1.34% 142.958 us 1.50% 4.437 us 3.20% FAIL
F64 I16 I32 2^28 2.038 ms 0.56% 2.129 ms 0.54% 90.730 us 4.45% FAIL
F64 I16 U32 2^16 11.355 us 2.52% 11.350 us 2.14% -0.004 us -0.04% PASS
F64 I16 U32 2^20 20.896 us 1.33% 20.841 us 1.18% -0.055 us -0.26% PASS
F64 I16 U32 2^24 138.624 us 1.37% 138.838 us 1.35% 0.213 us 0.15% PASS
F64 I16 U32 2^28 2.037 ms 0.56% 2.040 ms 0.56% 2.079 us 0.10% PASS
F64 I16 I64 2^16 11.577 us 2.60% 11.477 us 2.20% -0.100 us -0.86% PASS
F64 I16 I64 2^20 22.760 us 1.70% 21.990 us 2.05% -0.770 us -3.38% FAIL
F64 I16 I64 2^24 160.889 us 1.03% 143.459 us 1.53% -17.431 us -10.83% FAIL
F64 I16 I64 2^28 2.300 ms 0.50% 2.134 ms 0.56% -166.230 us -7.23% FAIL
F64 I16 U64 2^16 11.361 us 2.35% 11.479 us 2.28% 0.118 us 1.03% PASS
F64 I16 U64 2^20 21.989 us 1.83% 22.062 us 1.88% 0.072 us 0.33% PASS
F64 I16 U64 2^24 144.080 us 1.50% 143.468 us 1.53% -0.611 us -0.42% PASS
F64 I16 U64 2^28 2.134 ms 0.54% 2.134 ms 0.56% -0.010 us -0.00% PASS
F64 I32 I32 2^16 11.465 us 2.02% 11.762 us 2.00% 0.297 us 2.59% FAIL
F64 I32 I32 2^20 22.704 us 1.21% 23.102 us 1.31% 0.399 us 1.76% FAIL
F64 I32 I32 2^24 170.608 us 1.10% 170.909 us 1.11% 0.301 us 0.18% PASS
F64 I32 I32 2^28 2.529 ms 0.50% 2.530 ms 0.50% 1.094 us 0.04% PASS
F64 I32 U32 2^16 11.469 us 1.76% 11.839 us 1.85% 0.370 us 3.23% FAIL
F64 I32 U32 2^20 22.796 us 1.32% 23.118 us 1.28% 0.322 us 1.41% FAIL
F64 I32 U32 2^24 170.760 us 1.12% 170.893 us 1.12% 0.133 us 0.08% PASS
F64 I32 U32 2^28 2.529 ms 0.50% 2.530 ms 0.50% 0.487 us 0.02% PASS
F64 I32 I64 2^16 11.836 us 2.15% 11.971 us 2.20% 0.135 us 1.14% PASS
F64 I32 I64 2^20 23.071 us 1.28% 23.771 us 1.26% 0.701 us 3.04% FAIL
F64 I32 I64 2^24 173.553 us 1.11% 173.626 us 1.09% 0.073 us 0.04% PASS
F64 I32 I64 2^28 2.588 ms 0.50% 2.586 ms 0.50% -2.599 us -0.10% PASS
F64 I32 U64 2^16 11.705 us 2.03% 11.982 us 1.85% 0.278 us 2.37% FAIL
F64 I32 U64 2^20 23.055 us 1.24% 23.719 us 1.28% 0.664 us 2.88% FAIL
F64 I32 U64 2^24 173.580 us 1.14% 173.621 us 1.10% 0.041 us 0.02% PASS
F64 I32 U64 2^28 2.588 ms 0.50% 2.586 ms 0.50% -2.688 us -0.10% PASS
F64 I64 I32 2^16 12.481 us 1.97% 12.308 us 1.85% -0.173 us -1.38% PASS
F64 I64 I32 2^20 26.300 us 1.13% 25.777 us 1.17% -0.523 us -1.99% FAIL
F64 I64 I32 2^24 242.262 us 0.95% 241.238 us 0.91% -1.025 us -0.42% PASS
F64 I64 I32 2^28 3.645 ms 0.31% 3.637 ms 0.32% -8.758 us -0.24% PASS
F64 I64 U32 2^16 12.268 us 2.32% 12.195 us 1.58% -0.073 us -0.59% PASS
F64 I64 U32 2^20 28.271 us 2.15% 25.863 us 1.11% -2.408 us -8.52% FAIL
F64 I64 U32 2^24 245.599 us 0.88% 241.397 us 0.92% -4.202 us -1.71% FAIL
F64 I64 U32 2^28 3.723 ms 0.29% 3.636 ms 0.29% -87.282 us -2.34% FAIL
F64 I64 I64 2^16 14.067 us 1.92% 12.270 us 2.48% -1.797 us -12.77% FAIL
F64 I64 I64 2^20 33.130 us 1.65% 28.152 us 2.34% -4.977 us -15.02% FAIL
F64 I64 I64 2^24 274.104 us 0.59% 245.446 us 0.92% -28.658 us -10.46% FAIL
F64 I64 I64 2^28 4.180 ms 0.16% 3.727 ms 0.27% -452.734 us -10.83% FAIL
F64 I64 U64 2^16 14.296 us 1.86% 12.121 us 2.86% -2.174 us -15.21% FAIL
F64 I64 U64 2^20 34.082 us 1.38% 28.091 us 2.22% -5.992 us -17.58% FAIL
F64 I64 U64 2^24 280.107 us 0.56% 245.375 us 0.94% -34.732 us -12.40% FAIL
F64 I64 U64 2^28 4.191 ms 0.16% 3.728 ms 0.26% -463.764 us -11.06% FAIL
F64 I128 I32 2^16 15.882 us 1.50% 15.873 us 1.25% -0.009 us -0.06% PASS
F64 I128 I32 2^20 43.565 us 1.71% 44.854 us 1.35% 1.289 us 2.96% FAIL
F64 I128 I32 2^24 420.377 us 0.59% 426.824 us 0.54% 6.447 us 1.53% FAIL
F64 I128 I32 2^28 6.478 ms 0.17% 6.574 ms 0.14% 95.998 us 1.48% FAIL
F64 I128 U32 2^16 15.579 us 1.51% 15.870 us 1.48% 0.291 us 1.87% FAIL
F64 I128 U32 2^20 43.593 us 1.68% 44.737 us 1.34% 1.144 us 2.62% FAIL
F64 I128 U32 2^24 421.147 us 0.62% 426.781 us 0.53% 5.634 us 1.34% FAIL
F64 I128 U32 2^28 6.481 ms 0.16% 6.575 ms 0.13% 94.541 us 1.46% FAIL
F64 I128 I64 2^16 15.847 us 1.42% 15.885 us 1.42% 0.038 us 0.24% PASS
F64 I128 I64 2^20 44.602 us 1.46% 44.702 us 1.36% 0.101 us 0.23% PASS
F64 I128 I64 2^24 426.795 us 0.53% 427.097 us 0.52% 0.302 us 0.07% PASS
F64 I128 I64 2^28 6.576 ms 0.12% 6.578 ms 0.12% 2.222 us 0.03% PASS
F64 I128 U64 2^16 16.059 us 1.37% 15.953 us 1.28% -0.106 us -0.66% PASS
F64 I128 U64 2^20 44.429 us 1.50% 44.664 us 1.35% 0.235 us 0.53% PASS
F64 I128 U64 2^24 426.429 us 0.55% 427.061 us 0.54% 0.632 us 0.15% PASS
F64 I128 U64 2^28 6.574 ms 0.13% 6.577 ms 0.14% 3.545 us 0.05% PASS
C64 I8 I32 2^16 10.478 us 2.01% 10.638 us 2.35% 0.161 us 1.53% PASS
C64 I8 I32 2^20 20.245 us 1.65% 20.382 us 1.71% 0.137 us 0.68% PASS
C64 I8 I32 2^24 129.141 us 0.82% 128.988 us 0.78% -0.153 us -0.12% PASS
C64 I8 I32 2^28 1.731 ms 0.60% 1.729 ms 0.60% -2.262 us -0.13% PASS
C64 I8 U32 2^16 10.508 us 2.34% 10.547 us 2.40% 0.038 us 0.36% PASS
C64 I8 U32 2^20 20.484 us 1.66% 20.498 us 1.67% 0.014 us 0.07% PASS
C64 I8 U32 2^24 129.022 us 0.79% 128.939 us 0.80% -0.084 us -0.06% PASS
C64 I8 U32 2^28 1.731 ms 0.60% 1.729 ms 0.59% -1.509 us -0.09% PASS
C64 I8 I64 2^16 11.021 us 2.44% 10.960 us 2.16% -0.060 us -0.55% PASS
C64 I8 I64 2^20 22.177 us 1.61% 20.697 us 1.61% -1.480 us -6.67% FAIL
C64 I8 I64 2^24 131.416 us 0.64% 129.173 us 0.79% -2.243 us -1.71% FAIL
C64 I8 I64 2^28 1.727 ms 0.54% 1.731 ms 0.61% 4.387 us 0.25% PASS
C64 I8 U64 2^16 11.282 us 2.20% 10.937 us 2.03% -0.344 us -3.05% FAIL
C64 I8 U64 2^20 21.037 us 1.62% 20.650 us 1.59% -0.388 us -1.84% FAIL
C64 I8 U64 2^24 130.006 us 0.76% 129.149 us 0.79% -0.857 us -0.66% PASS
C64 I8 U64 2^28 1.730 ms 0.59% 1.732 ms 0.61% 1.574 us 0.09% PASS
C64 I16 I32 2^16 11.579 us 2.46% 11.323 us 2.10% -0.256 us -2.21% FAIL
C64 I16 I32 2^20 20.934 us 1.20% 20.651 us 1.20% -0.284 us -1.35% FAIL
C64 I16 I32 2^24 138.713 us 1.34% 138.636 us 1.36% -0.077 us -0.06% PASS
C64 I16 I32 2^28 2.041 ms 0.57% 2.042 ms 0.57% 0.323 us 0.02% PASS
C64 I16 U32 2^16 11.631 us 2.28% 11.294 us 2.30% -0.337 us -2.90% FAIL
C64 I16 U32 2^20 20.905 us 1.48% 20.588 us 1.27% -0.316 us -1.51% FAIL
C64 I16 U32 2^24 138.852 us 1.35% 138.710 us 1.35% -0.141 us -0.10% PASS
C64 I16 U32 2^28 2.041 ms 0.58% 2.042 ms 0.57% 0.657 us 0.03% PASS
C64 I16 I64 2^16 11.605 us 2.57% 11.296 us 2.12% -0.309 us -2.66% FAIL
C64 I16 I64 2^20 22.213 us 1.84% 21.779 us 1.80% -0.434 us -1.95% FAIL
C64 I16 I64 2^24 145.163 us 1.43% 143.753 us 1.49% -1.410 us -0.97% PASS
C64 I16 I64 2^28 2.139 ms 0.56% 2.138 ms 0.57% -1.107 us -0.05% PASS
C64 I16 U64 2^16 11.618 us 2.56% 11.258 us 2.17% -0.361 us -3.10% FAIL
C64 I16 U64 2^20 23.068 us 1.58% 21.811 us 1.81% -1.257 us -5.45% FAIL
C64 I16 U64 2^24 162.073 us 1.05% 143.838 us 1.47% -18.235 us -11.25% FAIL
C64 I16 U64 2^28 2.308 ms 0.50% 2.138 ms 0.55% -169.538 us -7.35% FAIL
C64 I32 I32 2^16 11.308 us 1.86% 11.359 us 1.72% 0.051 us 0.46% PASS
C64 I32 I32 2^20 22.712 us 1.15% 22.917 us 1.10% 0.205 us 0.90% PASS
C64 I32 I32 2^24 170.984 us 1.11% 171.042 us 1.12% 0.058 us 0.03% PASS
C64 I32 I32 2^28 2.532 ms 0.50% 2.532 ms 0.50% 0.074 us 0.00% PASS
C64 I32 U32 2^16 11.584 us 1.85% 11.505 us 1.85% -0.078 us -0.68% PASS
C64 I32 U32 2^20 22.951 us 1.10% 23.042 us 1.21% 0.091 us 0.40% PASS
C64 I32 U32 2^24 171.027 us 1.11% 171.079 us 1.10% 0.052 us 0.03% PASS
C64 I32 U32 2^28 2.533 ms 0.50% 2.532 ms 0.50% -1.029 us -0.04% PASS
C64 I32 I64 2^16 11.754 us 2.33% 11.758 us 2.34% 0.003 us 0.03% PASS
C64 I32 I64 2^20 23.078 us 1.38% 22.977 us 1.25% -0.102 us -0.44% PASS
C64 I32 I64 2^24 173.615 us 1.13% 173.372 us 1.12% -0.243 us -0.14% PASS
C64 I32 I64 2^28 2.589 ms 0.50% 2.588 ms 0.50% -0.715 us -0.03% PASS
C64 I32 U64 2^16 11.741 us 2.11% 11.726 us 1.90% -0.015 us -0.12% PASS
C64 I32 U64 2^20 22.982 us 1.35% 22.958 us 1.31% -0.024 us -0.10% PASS
C64 I32 U64 2^24 173.560 us 1.13% 173.445 us 1.11% -0.115 us -0.07% PASS
C64 I32 U64 2^28 2.590 ms 0.50% 2.588 ms 0.50% -1.358 us -0.05% PASS
C64 I64 I32 2^16 12.072 us 2.43% 12.510 us 2.43% 0.438 us 3.63% FAIL
C64 I64 I32 2^20 25.525 us 1.17% 25.466 us 1.16% -0.059 us -0.23% PASS
C64 I64 I32 2^24 238.085 us 0.94% 238.381 us 0.97% 0.296 us 0.12% PASS
C64 I64 I32 2^28 3.581 ms 0.36% 3.580 ms 0.35% -1.675 us -0.05% PASS
C64 I64 U32 2^16 12.422 us 2.70% 12.557 us 2.51% 0.135 us 1.08% PASS
C64 I64 U32 2^20 25.814 us 1.27% 25.519 us 1.21% -0.295 us -1.14% PASS
C64 I64 U32 2^24 238.506 us 0.94% 238.310 us 0.96% -0.196 us -0.08% PASS
C64 I64 U32 2^28 3.579 ms 0.33% 3.579 ms 0.33% -0.774 us -0.02% PASS
C64 I64 I64 2^16 14.142 us 1.96% 12.279 us 2.32% -1.863 us -13.18% FAIL
C64 I64 I64 2^20 33.302 us 1.75% 28.000 us 2.14% -5.302 us -15.92% FAIL
C64 I64 I64 2^24 277.142 us 0.59% 245.432 us 0.91% -31.710 us -11.44% FAIL
C64 I64 I64 2^28 4.197 ms 0.17% 3.730 ms 0.27% -467.248 us -11.13% FAIL
C64 I64 U64 2^16 14.102 us 1.89% 12.160 us 2.32% -1.943 us -13.77% FAIL
C64 I64 U64 2^20 33.376 us 1.77% 28.042 us 2.21% -5.335 us -15.98% FAIL
C64 I64 U64 2^24 276.960 us 0.59% 245.588 us 0.91% -31.373 us -11.33% FAIL
C64 I64 U64 2^28 4.195 ms 0.18% 3.730 ms 0.28% -465.840 us -11.10% FAIL
C64 I128 I32 2^16 16.149 us 1.44% 16.155 us 1.19% 0.005 us 0.03% PASS
C64 I128 I32 2^20 44.151 us 1.72% 45.076 us 1.42% 0.925 us 2.10% FAIL
C64 I128 I32 2^24 422.365 us 0.61% 427.718 us 0.53% 5.353 us 1.27% FAIL
C64 I128 I32 2^28 6.501 ms 0.16% 6.589 ms 0.13% 88.192 us 1.36% FAIL
C64 I128 U32 2^16 15.974 us 1.48% 16.000 us 1.42% 0.026 us 0.16% PASS
C64 I128 U32 2^20 44.049 us 1.65% 44.896 us 1.36% 0.847 us 1.92% FAIL
C64 I128 U32 2^24 422.281 us 0.60% 427.846 us 0.52% 5.565 us 1.32% FAIL
C64 I128 U32 2^28 6.495 ms 0.14% 6.587 ms 0.12% 91.905 us 1.41% FAIL
C64 I128 I64 2^16 16.326 us 1.49% 15.961 us 1.24% -0.365 us -2.23% FAIL
C64 I128 I64 2^20 45.103 us 1.53% 45.014 us 1.47% -0.089 us -0.20% PASS
C64 I128 I64 2^24 428.805 us 0.53% 427.678 us 0.55% -1.126 us -0.26% PASS
C64 I128 I64 2^28 6.606 ms 0.13% 6.592 ms 0.14% -13.971 us -0.21% FAIL
C64 I128 U64 2^16 16.004 us 1.17% 16.252 us 1.50% 0.248 us 1.55% FAIL
C64 I128 U64 2^20 44.970 us 1.49% 45.194 us 1.39% 0.224 us 0.50% PASS
C64 I128 U64 2^24 428.220 us 0.52% 427.975 us 0.54% -0.245 us -0.06% PASS
C64 I128 U64 2^28 6.597 ms 0.12% 6.589 ms 0.14% -7.979 us -0.12% FAIL
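
The status column above is easy to misread, since many `FAIL` rows are speedups. The rows are consistent with a simple rule: a row passes when the relative difference stays within the smaller of the two noise values, and fails otherwise, regardless of sign. The sketch below restates that reading. It assumes the columns after the element count are reference time and noise, compared time and noise, absolute difference, relative difference, and status. `ComparisonRow`, `percent_diff`, and `status` are hypothetical names used only for illustration, not part of CUB or the benchmark tooling.

```cpp
#include <algorithm>
#include <cmath>
#include <string>

// One row of the comparison table (hypothetical type for illustration).
// Both times must use the same unit; noise values are given in percent.
struct ComparisonRow
{
  double ref_time;  // time measured on the reference build
  double ref_noise; // noise of the reference measurement, in percent
  double cmp_time;  // time measured on the compared build
  double cmp_noise; // noise of the compared measurement, in percent
};

// Relative difference of the compared time against the reference, in percent.
// Negative values mean the compared build was faster.
inline double percent_diff(const ComparisonRow& row)
{
  return (row.cmp_time - row.ref_time) / row.ref_time * 100.0;
}

// Assumed rule: "PASS" if the difference is within the smaller noise value,
// "FAIL" otherwise. A "FAIL" therefore flags any significant change,
// including improvements.
inline std::string status(const ComparisonRow& row)
{
  const double min_noise = std::min(row.ref_noise, row.cmp_noise);
  return std::abs(percent_diff(row)) <= min_noise ? "PASS" : "FAIL";
}
```

For example, the `F32 I16 I64 2^24` row (118.681 us vs. 100.915 us, noise 1.18% and 1.14%) yields about -14.97% and is reported as `FAIL` even though the compared build is faster.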

Summary: performance of the different offset types relative to i32, all measured after the changes in this PR:

| | Diff u32 vs i32 (any num items) | Diff u32 vs i32 (2^28 num items) | Diff i64 vs i32 (any num items) | Diff i64 vs i32 (2^28 num items) | Diff u64 vs i32 (any num items) | Diff u64 vs i32 (2^28 num items) |
|---|---|---|---|---|---|---|
| min | 88.26% | 88.26% | 95.46% | 97.09% | 95.51% | 97.09% |
| max | 103.36% | 100.07% | 110.80% | 106.93% | 111.07% | 106.98% |
| avg | 99.77% | 99.49% | 101.22% | 101.07% | 101.22% | 101.07% |
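
The summary values appear to be ratios of GPU time: the time measured with a given offset type divided by the time measured with `i32` for the same key type, value type, and element count, reduced to a minimum, maximum, and average. The sketch below shows that reduction under those assumptions. Treating the average as a plain arithmetic mean is a guess, and `TimingPair`, `RatioSummary`, and `summarize_ratios` are hypothetical names.

```cpp
#include <algorithm>
#include <vector>

// GPU times of one configuration, measured with the offset type under test
// (candidate) and with i32 (baseline). Both must use the same unit.
struct TimingPair
{
  double candidate;
  double baseline;
};

// Minimum, maximum, and arithmetic mean of candidate/baseline, in percent.
struct RatioSummary
{
  double min;
  double max;
  double avg;
};

inline RatioSummary summarize_ratios(const std::vector<TimingPair>& pairs)
{
  if (pairs.empty())
  {
    return {0.0, 0.0, 0.0}; // nothing to summarize
  }

  double lo  = pairs.front().candidate / pairs.front().baseline * 100.0;
  double hi  = lo;
  double sum = 0.0;
  for (const TimingPair& p : pairs)
  {
    const double ratio = p.candidate / p.baseline * 100.0;
    lo = std::min(lo, ratio);
    hi = std::max(hi, ratio);
    sum += ratio;
  }
  return {lo, hi, sum / static_cast<double>(pairs.size())};
}
```

As a plausibility check, the `I8 I16 2^28` rows of the detailed table report 1.052 ms for `U32` and 1.192 ms for `I32`, a ratio of about 88.26%, which matches the minimum in the u32 columns.
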
Detailed H100 exclusive.by_key results: absolute timings for all offset types after this PR
KeyT{ct} ValueT{ct} OffsetT{ct} Elements{io} Samples CPU Time Noise GPU Time Noise Elem/s GlobalMem BW BWUtil
I8 I8 I32 2^16 = 65536 48768x 16.308 us 59.15% 10.254 us 2.10% 6.391G 19.174 GB/s 0.94%
I8 I8 I32 2^20 = 1048576 34704x 20.502 us 42.36% 14.408 us 1.45% 72.777G 218.332 GB/s 10.71%
I8 I8 I32 2^24 = 16777216 7072x 77.137 us 8.95% 70.839 us 0.94% 236.837G 710.510 GB/s 34.85%
I8 I8 I32 2^28 = 268435456 694x 964.693 us 0.81% 958.601 us 0.50% 280.028G 840.085 GB/s 41.20%
I8 I8 U32 2^16 = 65536 47904x 16.656 us 59.67% 10.438 us 2.38% 6.278G 18.835 GB/s 0.92%
I8 I8 U32 2^20 = 1048576 34704x 20.479 us 42.19% 14.410 us 1.47% 72.767G 218.300 GB/s 10.71%
I8 I8 U32 2^24 = 16777216 7072x 76.895 us 8.77% 70.737 us 0.95% 237.177G 711.530 GB/s 34.90%
I8 I8 U32 2^28 = 268435456 756x 964.890 us 0.81% 958.797 us 0.50% 279.971G 839.913 GB/s 41.19%
I8 I8 I64 2^16 = 65536 48528x 16.535 us 60.60% 10.305 us 2.52% 6.359G 19.078 GB/s 0.94%
I8 I8 I64 2^20 = 1048576 34240x 20.832 us 42.71% 14.607 us 1.59% 71.787G 215.360 GB/s 10.56%
I8 I8 I64 2^24 = 16777216 6880x 78.828 us 8.42% 72.740 us 0.81% 230.645G 691.936 GB/s 33.93%
I8 I8 I64 2^28 = 268435456 506x 994.717 us 0.80% 988.581 us 0.49% 271.536G 814.608 GB/s 39.95%
I8 I8 U64 2^16 = 65536 48944x 16.339 us 60.06% 10.216 us 2.00% 6.415G 19.245 GB/s 0.94%
I8 I8 U64 2^20 = 1048576 34144x 20.844 us 42.42% 14.646 us 1.72% 71.595G 214.784 GB/s 10.53%
I8 I8 U64 2^24 = 16777216 6896x 78.711 us 8.36% 72.669 us 0.80% 230.872G 692.617 GB/s 33.97%
I8 I8 U64 2^28 = 268435456 506x 994.879 us 0.80% 988.742 us 0.50% 271.492G 814.476 GB/s 39.94%
I8 I16 I32 2^16 = 65536 42368x 18.028 us 52.84% 11.803 us 2.06% 5.552G 27.762 GB/s 1.36%
I8 I16 I32 2^20 = 1048576 32144x 21.792 us 40.15% 15.558 us 1.67% 67.398G 336.992 GB/s 16.53%
I8 I16 I32 2^24 = 16777216 5648x 94.800 us 6.92% 88.717 us 0.88% 189.108G 945.542 GB/s 46.37%
I8 I16 I32 2^28 = 268435456 1635x 1.198 ms 0.73% 1.192 ms 0.50% 225.277G 1.126 TB/s 55.24%
I8 I16 U32 2^16 = 65536 42256x 17.965 us 51.87% 11.836 us 2.00% 5.537G 27.685 GB/s 1.36%
I8 I16 U32 2^20 = 1048576 31840x 22.009 us 40.19% 15.709 us 1.73% 66.749G 333.745 GB/s 16.37%
I8 I16 U32 2^24 = 16777216 6208x 86.753 us 7.72% 80.614 us 1.15% 208.118G 1.041 TB/s 51.03%
I8 I16 U32 2^28 = 268435456 2576x 1.059 ms 0.91% 1.052 ms 0.70% 255.060G 1.275 TB/s 62.54%
I8 I16 I64 2^16 = 65536 40832x 18.464 us 50.87% 12.246 us 2.10% 5.352G 26.759 GB/s 1.31%
I8 I16 I64 2^20 = 1048576 31472x 22.103 us 39.17% 15.892 us 1.77% 65.980G 329.901 GB/s 16.18%
I8 I16 I64 2^24 = 16777216 5616x 95.164 us 6.93% 89.055 us 0.89% 188.392G 941.958 GB/s 46.20%
I8 I16 I64 2^28 = 268435456 1680x 1.197 ms 0.73% 1.191 ms 0.50% 225.402G 1.127 TB/s 55.27%
I8 I16 U64 2^16 = 65536 40880x 18.334 us 49.96% 12.234 us 2.23% 5.357G 26.785 GB/s 1.31%
I8 I16 U64 2^20 = 1048576 31504x 22.214 us 40.04% 15.873 us 1.74% 66.062G 330.312 GB/s 16.20%
I8 I16 U64 2^24 = 16777216 5616x 95.162 us 6.95% 89.035 us 0.92% 188.435G 942.175 GB/s 46.21%
I8 I16 U64 2^28 = 268435456 1652x 1.197 ms 0.72% 1.191 ms 0.50% 225.421G 1.127 TB/s 55.28%
I8 I32 I32 2^16 = 65536 42848x 17.926 us 53.68% 11.672 us 1.99% 5.615G 50.533 GB/s 2.48%
I8 I32 I32 2^20 = 1048576 27136x 24.663 us 33.91% 18.428 us 1.46% 56.900G 512.098 GB/s 25.11%
I8 I32 I32 2^24 = 16777216 4672x 113.439 us 5.70% 107.384 us 0.80% 156.235G 1.406 TB/s 68.96%
I8 I32 I32 2^28 = 268435456 2656x 1.532 ms 0.77% 1.525 ms 0.65% 175.971G 1.584 TB/s 77.67%
I8 I32 U32 2^16 = 65536 43936x 17.477 us 53.65% 11.383 us 1.65% 5.757G 51.817 GB/s 2.54%
I8 I32 U32 2^20 = 1048576 27568x 24.204 us 33.48% 18.143 us 1.43% 57.796G 520.160 GB/s 25.51%
I8 I32 U32 2^24 = 16777216 4672x 113.167 us 5.80% 107.032 us 0.83% 156.749G 1.411 TB/s 69.19%
I8 I32 U32 2^28 = 268435456 2544x 1.531 ms 0.78% 1.525 ms 0.67% 175.984G 1.584 TB/s 77.68%
I8 I32 I64 2^16 = 65536 44016x 17.557 us 54.64% 11.362 us 2.04% 5.768G 51.914 GB/s 2.55%
I8 I32 I64 2^20 = 1048576 26704x 24.810 us 32.52% 18.733 us 1.45% 55.974G 503.767 GB/s 24.71%
I8 I32 I64 2^24 = 16777216 4704x 112.668 us 5.84% 106.501 us 0.72% 157.531G 1.418 TB/s 69.53%
I8 I32 I64 2^28 = 268435456 2512x 1.529 ms 0.76% 1.523 ms 0.65% 176.270G 1.586 TB/s 77.80%
I8 I32 U64 2^16 = 65536 43696x 17.634 us 54.22% 11.443 us 2.14% 5.727G 51.543 GB/s 2.53%
I8 I32 U64 2^20 = 1048576 26816x 24.722 us 32.63% 18.650 us 1.45% 56.224G 506.016 GB/s 24.82%
I8 I32 U64 2^24 = 16777216 4704x 112.535 us 5.79% 106.423 us 0.71% 157.647G 1.419 TB/s 69.58%
I8 I32 U64 2^28 = 268435456 2560x 1.529 ms 0.77% 1.523 ms 0.66% 176.267G 1.586 TB/s 77.80%
I8 I64 I32 2^16 = 65536 43776x 17.580 us 54.04% 11.422 us 2.41% 5.738G 97.538 GB/s 4.78%
I8 I64 I32 2^20 = 1048576 24064x 26.841 us 29.18% 20.789 us 1.43% 50.438G 857.454 GB/s 42.05%
I8 I64 I32 2^24 = 16777216 2848x 181.734 us 3.64% 175.622 us 1.03% 95.530G 1.624 TB/s 79.65%
I8 I64 I32 2^28 = 268435456 2240x 2.637 ms 0.58% 2.631 ms 0.53% 102.043G 1.735 TB/s 85.08%
I8 I64 U32 2^16 = 65536 44000x 17.542 us 54.46% 11.366 us 2.16% 5.766G 98.023 GB/s 4.81%
I8 I64 U32 2^20 = 1048576 23984x 26.880 us 29.00% 20.851 us 1.36% 50.290G 854.928 GB/s 41.93%
I8 I64 U32 2^24 = 16777216 2848x 181.701 us 3.62% 175.615 us 1.01% 95.534G 1.624 TB/s 79.65%
I8 I64 U32 2^28 = 268435456 2224x 2.636 ms 0.59% 2.630 ms 0.54% 102.052G 1.735 TB/s 85.08%
I8 I64 I64 2^16 = 65536 42928x 17.797 us 52.89% 11.650 us 2.44% 5.625G 95.632 GB/s 4.69%
I8 I64 I64 2^20 = 1048576 23568x 27.376 us 29.03% 21.230 us 1.41% 49.392G 839.665 GB/s 41.18%
I8 I64 I64 2^24 = 16777216 2848x 182.382 us 3.69% 176.152 us 1.03% 95.243G 1.619 TB/s 79.41%
I8 I64 I64 2^28 = 268435456 2272x 2.639 ms 0.58% 2.633 ms 0.53% 101.968G 1.733 TB/s 85.01%
I8 I64 U64 2^16 = 65536 41712x 18.257 us 52.38% 11.990 us 2.42% 5.466G 92.917 GB/s 4.56%
I8 I64 U64 2^20 = 1048576 23792x 27.150 us 29.19% 21.026 us 1.50% 49.872G 847.818 GB/s 41.58%
I8 I64 U64 2^24 = 16777216 2848x 182.123 us 3.61% 176.024 us 1.00% 95.312G 1.620 TB/s 79.46%
I8 I64 U64 2^28 = 268435456 2272x 2.639 ms 0.58% 2.632 ms 0.53% 101.973G 1.734 TB/s 85.02%
I8 I128 I32 2^16 = 65536 31600x 22.000 us 39.11% 15.823 us 1.83% 4.142G 136.677 GB/s 6.70%
I8 I128 I32 2^20 = 1048576 13648x 42.747 us 16.77% 36.649 us 1.70% 28.612G 944.182 GB/s 46.31%
I8 I128 I32 2^24 = 16777216 1456x 352.572 us 1.93% 346.441 us 0.76% 48.427G 1.598 TB/s 78.38%
I8 I128 I32 2^28 = 268435456 96x 5.269 ms 0.24% 5.263 ms 0.22% 51.005G 1.683 TB/s 82.55%
I8 I128 U32 2^16 = 65536 31232x 22.257 us 39.06% 16.015 us 1.68% 4.092G 135.042 GB/s 6.62%
I8 I128 U32 2^20 = 1048576 13632x 42.754 us 16.63% 36.693 us 1.79% 28.577G 943.049 GB/s 46.25%
I8 I128 U32 2^24 = 16777216 1456x 352.325 us 1.90% 346.390 us 0.79% 48.434G 1.598 TB/s 78.39%
I8 I128 U32 2^28 = 268435456 95x 5.271 ms 0.23% 5.265 ms 0.20% 50.987G 1.683 TB/s 82.52%
I8 I128 I64 2^16 = 65536 31584x 21.830 us 37.95% 15.833 us 1.58% 4.139G 136.596 GB/s 6.70%
I8 I128 I64 2^20 = 1048576 13568x 43.049 us 16.83% 36.881 us 1.69% 28.431G 938.231 GB/s 46.01%
I8 I128 I64 2^24 = 16777216 1456x 352.526 us 1.90% 346.525 us 0.76% 48.416G 1.598 TB/s 78.36%
I8 I128 I64 2^28 = 268435456 95x 5.278 ms 0.23% 5.272 ms 0.20% 50.913G 1.680 TB/s 82.40%
I8 I128 U64 2^16 = 65536 31488x 22.024 us 38.79% 15.880 us 1.61% 4.127G 136.192 GB/s 6.68%
I8 I128 U64 2^20 = 1048576 13632x 42.826 us 16.79% 36.698 us 1.71% 28.573G 942.906 GB/s 46.24%
I8 I128 U64 2^24 = 16777216 1456x 352.760 us 1.90% 346.772 us 0.80% 48.381G 1.597 TB/s 78.30%
I8 I128 U64 2^28 = 268435456 95x 5.277 ms 0.25% 5.271 ms 0.22% 50.925G 1.681 TB/s 82.42%
I16 I8 I32 2^16 = 65536 49152x 16.097 us 58.35% 10.174 us 2.46% 6.442G 25.767 GB/s 1.26%
I16 I8 I32 2^20 = 1048576 30640x 22.487 us 37.82% 16.325 us 1.41% 64.233G 256.933 GB/s 12.60%
I16 I8 I32 2^24 = 16777216 6240x 86.356 us 7.54% 80.326 us 0.53% 208.865G 835.458 GB/s 40.97%
I16 I8 I32 2^28 = 268435456 462x 1.089 ms 0.69% 1.083 ms 0.41% 247.847G 991.388 GB/s 48.62%
I16 I8 U32 2^16 = 65536 49136x 16.175 us 59.08% 10.177 us 2.37% 6.439G 25.758 GB/s 1.26%
I16 I8 U32 2^20 = 1048576 30864x 22.267 us 37.45% 16.208 us 1.52% 64.696G 258.783 GB/s 12.69%
I16 I8 U32 2^24 = 16777216 6240x 86.099 us 7.41% 80.178 us 0.53% 209.251G 837.004 GB/s 41.05%
I16 I8 U32 2^28 = 268435456 462x 1.089 ms 0.70% 1.083 ms 0.42% 247.948G 991.792 GB/s 48.64%
I16 I8 I64 2^16 = 65536 48288x 16.337 us 57.84% 10.358 us 2.44% 6.327G 25.309 GB/s 1.24%
I16 I8 I64 2^20 = 1048576 30688x 22.457 us 37.85% 16.299 us 1.41% 64.333G 257.331 GB/s 12.62%
I16 I8 I64 2^24 = 16777216 6240x 86.240 us 7.44% 80.286 us 0.51% 208.968G 835.873 GB/s 40.99%
I16 I8 I64 2^28 = 268435456 462x 1.089 ms 0.68% 1.084 ms 0.39% 247.741G 990.966 GB/s 48.60%
I16 I8 U64 2^16 = 65536 48496x 16.302 us 58.24% 10.312 us 2.57% 6.356G 25.422 GB/s 1.25%
I16 I8 U64 2^20 = 1048576 30608x 22.425 us 37.31% 16.339 us 1.43% 64.177G 256.708 GB/s 12.59%
I16 I8 U64 2^24 = 16777216 6240x 86.272 us 7.43% 80.328 us 0.54% 208.858G 835.431 GB/s 40.97%
I16 I8 U64 2^28 = 268435456 462x 1.090 ms 0.69% 1.084 ms 0.39% 247.720G 990.881 GB/s 48.60%
I16 I16 I32 2^16 = 65536 44240x 17.298 us 53.13% 11.303 us 2.16% 5.798G 34.788 GB/s 1.71%
I16 I16 I32 2^20 = 1048576 30176x 22.621 us 36.56% 16.575 us 1.25% 63.263G 379.579 GB/s 18.62%
I16 I16 I32 2^24 = 16777216 5584x 95.666 us 6.94% 89.570 us 1.30% 187.309G 1.124 TB/s 55.12%
I16 I16 I32 2^28 = 268435456 2576x 1.177 ms 0.97% 1.171 ms 0.82% 229.146G 1.375 TB/s 67.43%
I16 I16 U32 2^16 = 65536 42800x 17.919 us 53.50% 11.683 us 2.05% 5.610G 33.658 GB/s 1.65%
I16 I16 U32 2^20 = 1048576 30080x 22.801 us 37.23% 16.625 us 1.58% 63.072G 378.432 GB/s 18.56%
I16 I16 U32 2^24 = 16777216 5584x 95.958 us 7.04% 89.762 us 1.33% 186.907G 1.121 TB/s 55.00%
I16 I16 U32 2^28 = 268435456 2544x 1.176 ms 0.97% 1.170 ms 0.82% 229.445G 1.377 TB/s 67.52%
I16 I16 I64 2^16 = 65536 42608x 17.945 us 53.03% 11.736 us 2.38% 5.584G 33.506 GB/s 1.64%
I16 I16 I64 2^20 = 1048576 30272x 22.696 us 37.45% 16.522 us 1.58% 63.464G 380.781 GB/s 18.67%
I16 I16 I64 2^24 = 16777216 5568x 96.026 us 7.06% 89.823 us 1.40% 186.781G 1.121 TB/s 54.96%
I16 I16 I64 2^28 = 268435456 2512x 1.174 ms 0.98% 1.168 ms 0.83% 229.886G 1.379 TB/s 67.65%
I16 I16 U64 2^16 = 65536 42592x 17.949 us 53.00% 11.741 us 2.40% 5.582G 33.491 GB/s 1.64%
I16 I16 U64 2^20 = 1048576 30352x 22.642 us 37.46% 16.481 us 1.66% 63.622G 381.734 GB/s 18.72%
I16 I16 U64 2^24 = 16777216 5568x 96.045 us 7.06% 89.842 us 1.41% 186.741G 1.120 TB/s 54.95%
I16 I16 U64 2^28 = 268435456 2560x 1.174 ms 1.00% 1.168 ms 0.85% 229.918G 1.380 TB/s 67.65%
I16 I32 I32 2^16 = 65536 43360x 17.764 us 54.19% 11.532 us 2.79% 5.683G 56.830 GB/s 2.79%
I16 I32 I32 2^20 = 1048576 28448x 23.831 us 35.63% 17.580 us 1.37% 59.645G 596.452 GB/s 29.25%
I16 I32 I32 2^24 = 16777216 4640x 114.106 us 5.75% 108.039 us 1.19% 155.289G 1.553 TB/s 76.16%
I16 I32 I32 2^28 = 268435456 2352x 1.572 ms 0.84% 1.566 ms 0.74% 171.437G 1.714 TB/s 84.08%
I16 I32 U32 2^16 = 65536 44160x 17.480 us 54.52% 11.323 us 2.40% 5.788G 57.880 GB/s 2.84%
I16 I32 U32 2^20 = 1048576 28288x 23.919 us 35.33% 17.683 us 1.25% 59.297G 592.969 GB/s 29.08%
I16 I32 U32 2^24 = 16777216 4624x 114.263 us 5.75% 108.183 us 1.16% 155.081G 1.551 TB/s 76.06%
I16 I32 U32 2^28 = 268435456 2416x 1.572 ms 0.83% 1.566 ms 0.73% 171.451G 1.715 TB/s 84.08%
I16 I32 I64 2^16 = 65536 42512x 17.889 us 52.24% 11.762 us 2.04% 5.572G 55.718 GB/s 2.73%
I16 I32 I64 2^20 = 1048576 27952x 24.133 us 34.94% 17.893 us 1.34% 58.602G 586.020 GB/s 28.74%
I16 I32 I64 2^24 = 16777216 4384x 120.558 us 5.45% 114.450 us 1.07% 146.589G 1.466 TB/s 71.89%
I16 I32 I64 2^28 = 268435456 2464x 1.643 ms 0.76% 1.637 ms 0.65% 163.968G 1.640 TB/s 80.41%
I16 I32 U64 2^16 = 65536 42496x 17.956 us 52.66% 11.770 us 2.14% 5.568G 55.679 GB/s 2.73%
I16 I32 U64 2^20 = 1048576 27968x 24.006 us 34.31% 17.882 us 1.49% 58.639G 586.387 GB/s 28.76%
I16 I32 U64 2^24 = 16777216 4384x 120.476 us 5.45% 114.360 us 1.05% 146.705G 1.467 TB/s 71.95%
I16 I32 U64 2^28 = 268435456 2464x 1.643 ms 0.75% 1.637 ms 0.65% 163.998G 1.640 TB/s 80.43%
I16 I64 I32 2^16 = 65536 42560x 17.969 us 53.01% 11.752 us 2.40% 5.577G 100.380 GB/s 4.92%
I16 I64 I32 2^20 = 1048576 23024x 27.886 us 28.41% 21.729 us 1.34% 48.257G 868.635 GB/s 42.60%
I16 I64 I32 2^24 = 16777216 2720x 190.143 us 3.52% 183.955 us 1.02% 91.203G 1.642 TB/s 80.51%
I16 I64 I32 2^28 = 268435456 2009x 2.752 ms 0.55% 2.746 ms 0.50% 97.765G 1.760 TB/s 86.30%
I16 I64 U32 2^16 = 65536 42688x 17.959 us 53.39% 11.717 us 2.20% 5.593G 100.681 GB/s 4.94%
I16 I64 U32 2^20 = 1048576 23088x 27.677 us 27.86% 21.658 us 1.42% 48.416G 871.482 GB/s 42.74%
I16 I64 U32 2^24 = 16777216 2720x 190.044 us 3.47% 183.952 us 1.02% 91.205G 1.642 TB/s 80.51%
I16 I64 U32 2^28 = 268435456 1948x 2.752 ms 0.55% 2.746 ms 0.50% 97.762G 1.760 TB/s 86.30%
I16 I64 I64 2^16 = 65536 41152x 18.310 us 50.76% 12.154 us 2.02% 5.392G 97.062 GB/s 4.76%
I16 I64 I64 2^20 = 1048576 23136x 27.845 us 28.86% 21.621 us 1.51% 48.498G 872.965 GB/s 42.81%
I16 I64 I64 2^24 = 16777216 2720x 190.252 us 3.52% 184.042 us 1.00% 91.160G 1.641 TB/s 80.47%
I16 I64 I64 2^28 = 268435456 2068x 2.753 ms 0.55% 2.747 ms 0.50% 97.704G 1.759 TB/s 86.25%
I16 I64 U64 2^16 = 65536 41280x 18.413 us 52.07% 12.117 us 2.21% 5.409G 97.358 GB/s 4.77%
I16 I64 U64 2^20 = 1048576 23168x 27.667 us 28.18% 21.595 us 1.37% 48.556G 874.012 GB/s 42.86%
I16 I64 U64 2^24 = 16777216 2720x 190.196 us 3.49% 184.056 us 1.00% 91.153G 1.641 TB/s 80.47%
I16 I64 U64 2^28 = 268435456 2064x 2.753 ms 0.55% 2.747 ms 0.50% 97.735G 1.759 TB/s 86.28%
I16 I128 I32 2^16 = 65536 31328x 22.103 us 38.57% 15.961 us 1.50% 4.106G 139.609 GB/s 6.85%
I16 I128 I32 2^20 = 1048576 13136x 44.132 us 15.95% 38.106 us 1.81% 27.517G 935.593 GB/s 45.88%
I16 I128 I32 2^24 = 16777216 1408x 361.350 us 1.92% 355.293 us 0.87% 47.221G 1.606 TB/s 78.74%
I16 I128 I32 2^28 = 268435456 93x 5.425 ms 0.26% 5.419 ms 0.23% 49.537G 1.684 TB/s 82.60%
I16 I128 U32 2^16 = 65536 31728x 21.887 us 38.95% 15.762 us 1.66% 4.158G 141.371 GB/s 6.93%
I16 I128 U32 2^20 = 1048576 13104x 44.206 us 15.96% 38.162 us 1.77% 27.477G 934.222 GB/s 45.82%
I16 I128 U32 2^24 = 16777216 1408x 361.656 us 1.91% 355.608 us 0.86% 47.179G 1.604 TB/s 78.67%
I16 I128 U32 2^28 = 268435456 93x 5.427 ms 0.27% 5.421 ms 0.24% 49.521G 1.684 TB/s 82.57%
I16 I128 I64 2^16 = 65536 31280x 22.090 us 38.27% 15.985 us 1.55% 4.100G 139.395 GB/s 6.84%
I16 I128 I64 2^20 = 1048576 13104x 44.211 us 15.88% 38.180 us 1.45% 27.464G 933.781 GB/s 45.80%
I16 I128 I64 2^24 = 16777216 1408x 361.553 us 1.84% 355.526 us 0.72% 47.190G 1.604 TB/s 78.69%
I16 I128 I64 2^28 = 268435456 93x 5.425 ms 0.23% 5.419 ms 0.20% 49.536G 1.684 TB/s 82.60%
I16 I128 U64 2^16 = 65536 31264x 22.104 us 38.23% 16.000 us 1.53% 4.096G 139.264 GB/s 6.83%
I16 I128 U64 2^20 = 1048576 13056x 44.412 us 16.05% 38.300 us 1.59% 27.378G 930.839 GB/s 45.65%
I16 I128 U64 2^24 = 16777216 1408x 361.389 us 1.84% 355.456 us 0.76% 47.199G 1.605 TB/s 78.70%
I16 I128 U64 2^28 = 268435456 93x 5.427 ms 0.24% 5.421 ms 0.21% 49.518G 1.684 TB/s 82.57%
I32 I8 I32 2^16 = 65536 48352x 16.358 us 58.28% 10.343 us 2.12% 6.336G 38.017 GB/s 1.86%
I32 I8 I32 2^20 = 1048576 29872x 22.877 us 36.74% 16.740 us 1.35% 62.639G 375.835 GB/s 18.43%
I32 I8 I32 2^24 = 16777216 5504x 97.060 us 6.66% 91.085 us 1.09% 184.193G 1.105 TB/s 54.20%
I32 I8 I32 2^28 = 268435456 2768x 1.168 ms 0.90% 1.162 ms 0.73% 231.080G 1.386 TB/s 68.00%
I32 I8 U32 2^16 = 65536 47088x 16.772 us 58.07% 10.619 us 2.59% 6.172G 37.030 GB/s 1.82%
I32 I8 U32 2^20 = 1048576 29680x 23.098 us 37.14% 16.852 us 1.53% 62.221G 373.327 GB/s 18.31%
I32 I8 U32 2^24 = 16777216 5488x 97.465 us 6.82% 91.326 us 1.10% 183.707G 1.102 TB/s 54.06%
I32 I8 U32 2^28 = 268435456 2752x 1.168 ms 0.91% 1.162 ms 0.73% 230.957G 1.386 TB/s 67.96%
I32 I8 I64 2^16 = 65536 46672x 16.847 us 57.35% 10.715 us 2.21% 6.116G 36.697 GB/s 1.80%
I32 I8 I64 2^20 = 1048576 28672x 23.678 us 35.80% 17.445 us 1.51% 60.109G 360.653 GB/s 17.69%
I32 I8 I64 2^24 = 16777216 5424x 98.394 us 6.66% 92.314 us 0.92% 181.741G 1.090 TB/s 53.48%
I32 I8 I64 2^28 = 268435456 2784x 1.186 ms 0.83% 1.180 ms 0.64% 227.449G 1.365 TB/s 66.93%
I32 I8 U64 2^16 = 65536 46208x 16.968 us 56.93% 10.822 us 2.49% 6.056G 36.336 GB/s 1.78%
I32 I8 U64 2^20 = 1048576 28928x 23.538 us 36.24% 17.287 us 1.44% 60.657G 363.940 GB/s 17.85%
I32 I8 U64 2^24 = 16777216 5424x 98.383 us 6.74% 92.238 us 0.93% 181.891G 1.091 TB/s 53.52%
I32 I8 U64 2^28 = 268435456 2848x 1.187 ms 0.84% 1.180 ms 0.65% 227.426G 1.365 TB/s 66.92%
I32 I16 I32 2^16 = 65536 42976x 17.776 us 52.90% 11.635 us 2.28% 5.633G 45.060 GB/s 2.21%
I32 I16 I32 2^20 = 1048576 28144x 23.976 us 34.99% 17.772 us 1.47% 59.002G 472.015 GB/s 23.15%
I32 I16 I32 2^24 = 16777216 4768x 111.207 us 5.95% 105.108 us 1.27% 159.619G 1.277 TB/s 62.63%
I32 I16 I32 2^28 = 268435456 2224x 1.450 ms 0.90% 1.443 ms 0.79% 185.985G 1.488 TB/s 72.97%
I32 I16 U32 2^16 = 65536 43616x 17.623 us 53.82% 11.466 us 2.07% 5.716G 45.727 GB/s 2.24%
I32 I16 U32 2^20 = 1048576 28128x 24.061 us 35.40% 17.780 us 1.25% 58.976G 471.806 GB/s 23.14%
I32 I16 U32 2^24 = 16777216 4768x 111.247 us 5.91% 105.183 us 1.24% 159.504G 1.276 TB/s 62.58%
I32 I16 U32 2^28 = 268435456 2064x 1.451 ms 0.92% 1.444 ms 0.82% 185.840G 1.487 TB/s 72.91%
I32 I16 I64 2^16 = 65536 43312x 17.703 us 53.47% 11.545 us 2.12% 5.677G 45.414 GB/s 2.23%
I32 I16 I64 2^20 = 1048576 29120x 23.182 us 35.01% 17.178 us 1.29% 61.043G 488.344 GB/s 23.95%
I32 I16 I64 2^24 = 16777216 4992x 106.364 us 6.12% 100.336 us 1.15% 167.210G 1.338 TB/s 65.60%
I32 I16 I64 2^28 = 268435456 2192x 1.407 ms 0.85% 1.401 ms 0.73% 191.616G 1.533 TB/s 75.18%
I32 I16 U64 2^16 = 65536 43872x 17.481 us 53.45% 11.400 us 1.89% 5.749G 45.989 GB/s 2.26%
I32 I16 U64 2^20 = 1048576 29136x 23.230 us 35.41% 17.164 us 1.17% 61.090G 488.720 GB/s 23.97%
I32 I16 U64 2^24 = 16777216 4992x 106.544 us 6.25% 100.393 us 1.17% 167.116G 1.337 TB/s 65.57%
I32 I16 U64 2^28 = 268435456 2240x 1.407 ms 0.88% 1.401 ms 0.77% 191.616G 1.533 TB/s 75.18%
I32 I32 I32 2^16 = 65536 42656x 17.864 us 52.49% 11.722 us 1.57% 5.591G 67.091 GB/s 3.29%
I32 I32 I32 2^20 = 1048576 26320x 25.001 us 31.62% 19.004 us 1.24% 55.177G 662.127 GB/s 32.47%
I32 I32 I32 2^24 = 16777216 3872x 135.509 us 4.74% 129.519 us 1.02% 129.534G 1.554 TB/s 76.23%
I32 I32 I32 2^28 = 268435456 2080x 1.911 ms 0.63% 1.905 ms 0.54% 140.894G 1.691 TB/s 82.92%
I32 I32 U32 2^16 = 65536 42784x 17.778 us 52.20% 11.687 us 1.73% 5.607G 67.290 GB/s 3.30%
I32 I32 U32 2^20 = 1048576 26368x 25.031 us 32.01% 18.973 us 1.24% 55.266G 663.194 GB/s 32.52%
I32 I32 U32 2^24 = 16777216 3872x 135.505 us 4.81% 129.421 us 1.00% 129.633G 1.556 TB/s 76.29%
I32 I32 U32 2^28 = 268435456 2096x 1.911 ms 0.63% 1.905 ms 0.54% 140.900G 1.691 TB/s 82.92%
I32 I32 I64 2^16 = 65536 41888x 18.080 us 51.55% 11.937 us 1.97% 5.490G 65.881 GB/s 3.23%
I32 I32 I64 2^20 = 1048576 26464x 24.882 us 31.71% 18.899 us 1.19% 55.484G 665.804 GB/s 32.65%
I32 I32 I64 2^24 = 16777216 3680x 142.181 us 4.50% 136.152 us 0.76% 123.224G 1.479 TB/s 72.52%
I32 I32 I64 2^28 = 268435456 2080x 2.044 ms 0.62% 2.037 ms 0.54% 131.752G 1.581 TB/s 77.54%
I32 I32 U64 2^16 = 65536 42080x 17.933 us 50.99% 11.883 us 1.87% 5.515G 66.180 GB/s 3.25%
I32 I32 U64 2^20 = 1048576 26320x 25.088 us 32.08% 19.008 us 1.28% 55.166G 661.995 GB/s 32.47%
I32 I32 U64 2^24 = 16777216 3680x 142.391 us 4.60% 136.242 us 0.76% 123.143G 1.478 TB/s 72.47%
I32 I32 U64 2^28 = 268435456 1984x 2.044 ms 0.62% 2.038 ms 0.54% 131.723G 1.581 TB/s 77.52%
I32 I64 I32 2^16 = 65536 40112x 18.628 us 49.47% 12.470 us 2.07% 5.255G 105.110 GB/s 5.15%
I32 I64 I32 2^20 = 1048576 21920x 28.970 us 27.06% 22.814 us 1.29% 45.962G 919.243 GB/s 45.08%
I32 I64 I32 2^24 = 16777216 2528x 204.316 us 3.25% 198.111 us 0.86% 84.686G 1.694 TB/s 83.06%
I32 I64 I32 2^28 = 268435456 995x 2.945 ms 0.54% 2.939 ms 0.50% 91.343G 1.827 TB/s 89.59%
I32 I64 U32 2^16 = 65536 40160x 18.674 us 50.05% 12.454 us 2.05% 5.262G 105.241 GB/s 5.16%
I32 I64 U32 2^20 = 1048576 21856x 29.032 us 26.94% 22.880 us 1.29% 45.829G 916.578 GB/s 44.95%
I32 I64 U32 2^24 = 16777216 2528x 203.983 us 3.16% 197.984 us 0.90% 84.740G 1.695 TB/s 83.12%
I32 I64 U32 2^28 = 268435456 1023x 2.945 ms 0.54% 2.939 ms 0.50% 91.343G 1.827 TB/s 89.59%
I32 I64 I64 2^16 = 65536 39424x 18.809 us 48.35% 12.688 us 2.26% 5.165G 103.308 GB/s 5.07%
I32 I64 I64 2^20 = 1048576 21648x 29.380 us 27.25% 23.100 us 1.30% 45.393G 907.865 GB/s 44.52%
I32 I64 I64 2^24 = 16777216 2496x 207.015 us 3.18% 200.895 us 0.92% 83.512G 1.670 TB/s 81.91%
I32 I64 I64 2^28 = 268435456 1001x 2.971 ms 0.54% 2.965 ms 0.50% 90.525G 1.811 TB/s 88.79%
I32 I64 U64 2^16 = 65536 39552x 18.818 us 48.92% 12.646 us 2.15% 5.182G 103.646 GB/s 5.08%
I32 I64 U64 2^20 = 1048576 21712x 29.233 us 26.93% 23.041 us 1.36% 45.508G 910.167 GB/s 44.64%
I32 I64 U64 2^24 = 16777216 2496x 206.823 us 3.15% 200.767 us 0.89% 83.566G 1.671 TB/s 81.97%
I32 I64 U64 2^28 = 268435456 1044x 2.971 ms 0.54% 2.965 ms 0.50% 90.537G 1.811 TB/s 88.80%
I32 I128 I32 2^16 = 65536 30752x 22.260 us 36.98% 16.262 us 1.79% 4.030G 145.077 GB/s 7.11%
I32 I128 I32 2^20 = 1048576 12736x 45.471 us 15.90% 39.263 us 1.48% 26.707G 961.437 GB/s 47.15%
I32 I128 I32 2^24 = 16777216 1360x 373.788 us 1.78% 367.760 us 0.68% 45.620G 1.642 TB/s 80.54%
I32 I128 I32 2^28 = 268435456 90x 5.611 ms 0.20% 5.605 ms 0.17% 47.889G 1.724 TB/s 84.55%
I32 I128 U32 2^16 = 65536 31744x 21.750 us 38.17% 15.752 us 1.51% 4.160G 149.776 GB/s 7.35%
I32 I128 U32 2^20 = 1048576 12864x 44.952 us 15.65% 38.896 us 1.37% 26.959G 970.512 GB/s 47.60%
I32 I128 U32 2^24 = 16777216 1376x 373.302 us 1.79% 367.196 us 0.67% 45.690G 1.645 TB/s 80.67%
I32 I128 U32 2^28 = 268435456 90x 5.612 ms 0.20% 5.606 ms 0.17% 47.887G 1.724 TB/s 84.55%
I32 I128 I64 2^16 = 65536 30864x 22.322 us 37.86% 16.202 us 1.75% 4.045G 145.616 GB/s 7.14%
I32 I128 I64 2^20 = 1048576 12816x 45.075 us 15.50% 39.057 us 1.55% 26.847G 966.509 GB/s 47.40%
I32 I128 I64 2^24 = 16777216 1360x 373.989 us 1.78% 367.948 us 0.67% 45.597G 1.641 TB/s 80.50%
I32 I128 I64 2^28 = 268435456 90x 5.617 ms 0.20% 5.611 ms 0.16% 47.842G 1.722 TB/s 84.47%
I32 I128 U64 2^16 = 65536 30912x 22.292 us 37.90% 16.176 us 1.67% 4.051G 145.850 GB/s 7.15%
I32 I128 U64 2^20 = 1048576 12832x 45.051 us 15.64% 38.988 us 1.48% 26.895G 968.225 GB/s 47.48%
I32 I128 U64 2^24 = 16777216 1360x 373.941 us 1.77% 367.900 us 0.66% 45.603G 1.642 TB/s 80.51%
I32 I128 U64 2^28 = 268435456 90x 5.617 ms 0.20% 5.611 ms 0.17% 47.840G 1.722 TB/s 84.46%
I64 I8 I32 2^16 = 65536 46224x 16.977 us 57.04% 10.819 us 2.23% 6.057G 60.574 GB/s 2.97%
I64 I8 I32 2^20 = 1048576 24192x 26.761 us 29.54% 20.673 us 1.67% 50.722G 507.216 GB/s 24.88%
I64 I8 I32 2^24 = 16777216 3936x 133.593 us 4.85% 127.485 us 0.74% 131.601G 1.316 TB/s 64.54%
I64 I8 I32 2^28 = 268435456 2736x 1.724 ms 0.69% 1.718 ms 0.60% 156.251G 1.563 TB/s 76.63%
I64 I8 U32 2^16 = 65536 46416x 16.940 us 57.36% 10.774 us 2.07% 6.083G 60.830 GB/s 2.98%
I64 I8 U32 2^20 = 1048576 24320x 26.771 us 30.27% 20.565 us 1.70% 50.989G 509.889 GB/s 25.01%
I64 I8 U32 2^24 = 16777216 3920x 133.638 us 4.79% 127.608 us 0.74% 131.474G 1.315 TB/s 64.48%
I64 I8 U32 2^28 = 268435456 2720x 1.724 ms 0.70% 1.718 ms 0.60% 156.277G 1.563 TB/s 76.64%
I64 I8 I64 2^16 = 65536 46656x 16.790 us 56.74% 10.720 us 2.17% 6.113G 61.134 GB/s 3.00%
I64 I8 I64 2^20 = 1048576 23904x 27.207 us 30.08% 20.930 us 1.71% 50.100G 501.004 GB/s 24.57%
I64 I8 I64 2^24 = 16777216 3872x 135.440 us 4.80% 129.328 us 0.76% 129.726G 1.297 TB/s 63.62%
I64 I8 I64 2^28 = 268435456 2720x 1.734 ms 0.69% 1.728 ms 0.59% 155.367G 1.554 TB/s 76.20%
I64 I8 U64 2^16 = 65536 45392x 17.213 us 56.38% 11.016 us 2.40% 5.949G 59.492 GB/s 2.92%
I64 I8 U64 2^20 = 1048576 23856x 27.225 us 29.94% 20.967 us 1.69% 50.010G 500.102 GB/s 24.53%
I64 I8 U64 2^24 = 16777216 3872x 135.510 us 4.81% 129.384 us 0.78% 129.670G 1.297 TB/s 63.59%
I64 I8 U64 2^28 = 268435456 2752x 1.734 ms 0.71% 1.728 ms 0.61% 155.369G 1.554 TB/s 76.20%
I64 I16 I32 2^16 = 65536 43776x 17.581 us 54.02% 11.423 us 2.24% 5.737G 68.848 GB/s 3.38%
I64 I16 I32 2^20 = 1048576 23072x 27.947 us 29.03% 21.678 us 2.03% 48.371G 580.452 GB/s 28.47%
I64 I16 I32 2^24 = 16777216 3488x 149.561 us 4.51% 143.433 us 1.43% 116.969G 1.404 TB/s 68.84%
I64 I16 I32 2^28 = 268435456 1840x 2.138 ms 0.63% 2.131 ms 0.55% 125.949G 1.511 TB/s 74.12%
I64 I16 U32 2^16 = 65536 44128x 17.482 us 54.38% 11.333 us 2.39% 5.783G 69.395 GB/s 3.40%
I64 I16 U32 2^20 = 1048576 23936x 27.171 us 30.14% 20.889 us 1.36% 50.196G 602.356 GB/s 29.54%
I64 I16 U32 2^24 = 16777216 3616x 144.892 us 4.63% 138.756 us 1.33% 120.912G 1.451 TB/s 71.16%
I64 I16 U32 2^28 = 268435456 1936x 2.046 ms 0.66% 2.040 ms 0.58% 131.583G 1.579 TB/s 77.44%
I64 I16 I64 2^16 = 65536 43408x 17.650 us 53.38% 11.519 us 2.50% 5.689G 68.272 GB/s 3.35%
I64 I16 I64 2^20 = 1048576 22656x 28.299 us 28.28% 22.080 us 1.90% 47.491G 569.887 GB/s 27.95%
I64 I16 I64 2^24 = 16777216 3488x 149.725 us 4.45% 143.686 us 1.44% 116.763G 1.401 TB/s 68.72%
I64 I16 I64 2^28 = 268435456 1792x 2.139 ms 0.63% 2.133 ms 0.56% 125.858G 1.510 TB/s 74.07%
I64 I16 U64 2^16 = 65536 43296x 17.643 us 52.88% 11.550 us 2.20% 5.674G 68.088 GB/s 3.34%
I64 I16 U64 2^20 = 1048576 22768x 28.237 us 28.62% 21.974 us 2.08% 47.719G 572.624 GB/s 28.08%
I64 I16 U64 2^24 = 16777216 3488x 149.739 us 4.51% 143.618 us 1.45% 116.818G 1.402 TB/s 68.75%
I64 I16 U64 2^28 = 268435456 1792x 2.138 ms 0.62% 2.132 ms 0.55% 125.888G 1.511 TB/s 74.09%
I64 I32 I32 2^16 = 65536 43856x 17.477 us 53.37% 11.403 us 1.71% 5.747G 91.958 GB/s 4.51%
I64 I32 I32 2^20 = 1048576 21024x 29.804 us 25.35% 23.786 us 1.18% 44.083G 705.326 GB/s 34.59%
I64 I32 I32 2^24 = 16777216 2928x 177.250 us 3.69% 171.205 us 1.07% 97.995G 1.568 TB/s 76.89%
I64 I32 I32 2^28 = 268435456 733x 2.538 ms 0.56% 2.532 ms 0.50% 106.008G 1.696 TB/s 83.18%
I64 I32 U32 2^16 = 65536 43408x 17.637 us 53.17% 11.523 us 1.90% 5.688G 91.002 GB/s 4.46%
I64 I32 U32 2^20 = 1048576 21024x 29.897 us 25.73% 23.790 us 1.21% 44.076G 705.214 GB/s 34.59%
I64 I32 U32 2^24 = 16777216 2928x 177.353 us 3.74% 171.214 us 1.06% 97.990G 1.568 TB/s 76.89%
I64 I32 U32 2^28 = 268435456 708x 2.538 ms 0.55% 2.532 ms 0.50% 106.026G 1.696 TB/s 83.20%
I64 I32 I64 2^16 = 65536 42688x 17.896 us 52.83% 11.717 us 1.77% 5.593G 89.492 GB/s 4.39%
I64 I32 I64 2^20 = 1048576 21328x 29.456 us 25.62% 23.458 us 1.23% 44.700G 715.198 GB/s 35.08%
I64 I32 I64 2^24 = 16777216 2896x 179.596 us 3.65% 173.561 us 1.09% 96.665G 1.547 TB/s 75.85%
I64 I32 I64 2^28 = 268435456 669x 2.594 ms 0.55% 2.588 ms 0.50% 103.728G 1.660 TB/s 81.39%
I64 I32 U64 2^16 = 65536 42592x 17.837 us 51.98% 11.742 us 1.73% 5.581G 89.298 GB/s 4.38%
I64 I32 U64 2^20 = 1048576 21344x 29.509 us 25.95% 23.440 us 1.15% 44.735G 715.763 GB/s 35.10%
I64 I32 U64 2^24 = 16777216 2896x 179.616 us 3.67% 173.539 us 1.09% 96.677G 1.547 TB/s 75.86%
I64 I32 U64 2^28 = 268435456 684x 2.594 ms 0.55% 2.588 ms 0.50% 103.723G 1.660 TB/s 81.39%
I64 I64 I32 2^16 = 65536 41488x 18.188 us 51.00% 12.053 us 2.04% 5.437G 130.491 GB/s 6.40%
I64 I64 I32 2^20 = 1048576 19520x 31.622 us 23.43% 25.630 us 1.13% 40.913G 981.903 GB/s 48.16%
I64 I64 I32 2^24 = 16777216 2112x 244.200 us 2.71% 238.161 us 0.95% 70.445G 1.691 TB/s 82.92%
I64 I64 I32 2^28 = 268435456 140x 3.587 ms 0.36% 3.581 ms 0.32% 74.966G 1.799 TB/s 88.24%
I64 I64 U32 2^16 = 65536 40704x 18.528 us 50.92% 12.286 us 2.29% 5.334G 128.021 GB/s 6.28%
I64 I64 U32 2^20 = 1048576 19344x 31.949 us 23.64% 25.851 us 1.21% 40.562G 973.492 GB/s 47.74%
I64 I64 U32 2^24 = 16777216 2096x 244.630 us 2.71% 238.574 us 0.95% 70.323G 1.688 TB/s 82.77%
I64 I64 U32 2^28 = 268435456 140x 3.587 ms 0.35% 3.581 ms 0.30% 74.965G 1.799 TB/s 88.24%
I64 I64 I64 2^16 = 65536 39888x 18.729 us 49.51% 12.537 us 2.41% 5.228G 125.462 GB/s 6.15%
I64 I64 I64 2^20 = 1048576 17616x 34.609 us 22.01% 28.399 us 2.18% 36.923G 886.159 GB/s 43.46%
I64 I64 I64 2^24 = 16777216 2032x 252.470 us 2.70% 246.215 us 0.91% 68.141G 1.635 TB/s 80.20%
I64 I64 I64 2^28 = 268435456 134x 3.742 ms 0.32% 3.736 ms 0.28% 71.859G 1.725 TB/s 84.58%
I64 I64 U64 2^16 = 65536 39904x 18.773 us 49.91% 12.534 us 2.60% 5.229G 125.487 GB/s 6.15%
I64 I64 U64 2^20 = 1048576 17568x 34.661 us 21.91% 28.468 us 2.39% 36.834G 884.014 GB/s 43.35%
I64 I64 U64 2^24 = 16777216 2048x 252.097 us 2.61% 246.056 us 0.88% 68.185G 1.636 TB/s 80.25%
I64 I64 U64 2^28 = 268435456 134x 3.743 ms 0.32% 3.736 ms 0.27% 71.842G 1.724 TB/s 84.56%
I64 I128 I32 2^16 = 65536 30944x 22.232 us 37.63% 16.161 us 1.54% 4.055G 162.206 GB/s 7.96%
I64 I128 I32 2^20 = 1048576 11072x 51.399 us 13.90% 45.161 us 1.39% 23.219G 928.746 GB/s 45.55%
I64 I128 I32 2^24 = 16777216 1184x 433.972 us 1.51% 427.930 us 0.53% 39.205G 1.568 TB/s 76.91%
I64 I128 I32 2^28 = 268435456 76x 6.593 ms 0.16% 6.587 ms 0.13% 40.753G 1.630 TB/s 79.95%
I64 I128 U32 2^16 = 65536 30784x 22.407 us 37.98% 16.249 us 1.38% 4.033G 161.325 GB/s 7.91%
I64 I128 U32 2^20 = 1048576 11072x 51.255 us 13.56% 45.171 us 1.44% 23.213G 928.534 GB/s 45.54%
I64 I128 U32 2^24 = 16777216 1184x 433.649 us 1.49% 427.696 us 0.52% 39.227G 1.569 TB/s 76.95%
I64 I128 U32 2^28 = 268435456 76x 6.596 ms 0.17% 6.589 ms 0.13% 40.738G 1.630 TB/s 79.92%
I64 I128 I64 2^16 = 65536 30336x 22.478 us 36.43% 16.485 us 1.54% 3.975G 159.016 GB/s 7.80%
I64 I128 I64 2^20 = 1048576 11088x 51.268 us 13.73% 45.117 us 1.46% 23.241G 929.654 GB/s 45.59%
I64 I128 I64 2^24 = 16777216 1184x 433.344 us 1.50% 427.321 us 0.52% 39.261G 1.570 TB/s 77.02%
I64 I128 I64 2^28 = 268435456 77x 6.585 ms 0.16% 6.579 ms 0.12% 40.804G 1.632 TB/s 80.04%
I64 I128 U64 2^16 = 65536 30688x 22.382 us 37.45% 16.294 us 1.45% 4.022G 160.882 GB/s 7.89%
I64 I128 U64 2^20 = 1048576 11104x 51.386 us 14.05% 45.093 us 1.48% 23.254G 930.150 GB/s 45.62%
I64 I128 U64 2^24 = 16777216 1184x 433.047 us 1.50% 427.018 us 0.51% 39.289G 1.572 TB/s 77.07%
I64 I128 U64 2^28 = 268435456 77x 6.584 ms 0.17% 6.578 ms 0.14% 40.810G 1.632 TB/s 80.06%
I128 I8 I32 2^16 = 65536 40816x 18.407 us 50.35% 12.252 us 2.27% 5.349G 96.283 GB/s 4.72%
I128 I8 I32 2^20 = 1048576 17328x 35.171 us 21.89% 28.871 us 1.29% 36.319G 653.738 GB/s 32.06%
I128 I8 I32 2^24 = 16777216 2464x 209.382 us 3.21% 203.265 us 1.11% 82.539G 1.486 TB/s 72.86%
I128 I8 I32 2^28 = 268435456 692x 3.007 ms 0.54% 3.001 ms 0.50% 89.454G 1.610 TB/s 78.97%
I128 I8 U32 2^16 = 65536 40800x 18.425 us 50.44% 12.256 us 1.95% 5.347G 96.248 GB/s 4.72%
I128 I8 U32 2^20 = 1048576 17280x 35.187 us 21.66% 28.938 us 1.30% 36.235G 652.234 GB/s 31.99%
I128 I8 U32 2^24 = 16777216 2464x 209.375 us 3.22% 203.276 us 1.17% 82.534G 1.486 TB/s 72.86%
I128 I8 U32 2^28 = 268435456 689x 3.007 ms 0.54% 3.001 ms 0.50% 89.460G 1.610 TB/s 78.97%
I128 I8 I64 2^16 = 65536 40432x 18.475 us 49.50% 12.368 us 1.94% 5.299G 95.381 GB/s 4.68%
I128 I8 I64 2^20 = 1048576 17280x 35.207 us 21.65% 28.958 us 1.37% 36.210G 651.787 GB/s 31.97%
I128 I8 I64 2^24 = 16777216 2464x 210.128 us 3.20% 204.034 us 1.14% 82.227G 1.480 TB/s 72.59%
I128 I8 I64 2^28 = 268435456 596x 3.007 ms 0.54% 3.001 ms 0.50% 89.463G 1.610 TB/s 78.98%
I128 I8 U64 2^16 = 65536 40560x 18.529 us 50.40% 12.329 us 2.15% 5.316G 95.679 GB/s 4.69%
I128 I8 U64 2^20 = 1048576 17184x 35.353 us 21.49% 29.116 us 1.34% 36.014G 648.251 GB/s 31.79%
I128 I8 U64 2^24 = 16777216 2464x 210.182 us 3.28% 203.937 us 1.15% 82.266G 1.481 TB/s 72.62%
I128 I8 U64 2^28 = 268435456 663x 3.007 ms 0.54% 3.001 ms 0.50% 89.444G 1.610 TB/s 78.96%
I128 I16 I32 2^16 = 65536 41296x 18.201 us 50.38% 12.109 us 1.96% 5.412G 108.241 GB/s 5.31%
I128 I16 I32 2^20 = 1048576 17872x 33.969 us 21.50% 27.979 us 1.45% 37.477G 749.539 GB/s 36.76%
I128 I16 I32 2^24 = 16777216 2272x 226.620 us 2.97% 220.536 us 1.08% 76.075G 1.521 TB/s 74.62%
I128 I16 I32 2^28 = 268435456 300x 3.276 ms 0.53% 3.270 ms 0.50% 82.091G 1.642 TB/s 80.52%
I128 I16 U32 2^16 = 65536 41440x 18.206 us 51.00% 12.066 us 1.99% 5.431G 108.630 GB/s 5.33%
I128 I16 U32 2^20 = 1048576 17840x 34.073 us 21.60% 28.037 us 1.41% 37.400G 748.001 GB/s 36.68%
I128 I16 U32 2^24 = 16777216 2272x 226.563 us 2.96% 220.500 us 1.08% 76.087G 1.522 TB/s 74.63%
I128 I16 U32 2^28 = 268435456 351x 3.275 ms 0.53% 3.269 ms 0.50% 82.113G 1.642 TB/s 80.54%
I128 I16 I64 2^16 = 65536 41040x 18.302 us 50.26% 12.187 us 1.99% 5.377G 107.548 GB/s 5.27%
I128 I16 I64 2^20 = 1048576 17824x 34.216 us 22.01% 28.059 us 1.39% 37.370G 747.410 GB/s 36.65%
I128 I16 I64 2^24 = 16777216 2272x 226.711 us 2.93% 220.701 us 1.08% 76.018G 1.520 TB/s 74.56%
I128 I16 I64 2^28 = 268435456 390x 3.278 ms 0.53% 3.272 ms 0.50% 82.043G 1.641 TB/s 80.47%
I128 I16 U64 2^16 = 65536 41600x 18.082 us 50.50% 12.022 us 1.93% 5.451G 109.027 GB/s 5.35%
I128 I16 U64 2^20 = 1048576 17824x 34.135 us 21.70% 28.063 us 1.41% 37.365G 747.291 GB/s 36.65%
I128 I16 U64 2^24 = 16777216 2272x 226.767 us 2.90% 220.842 us 1.09% 75.969G 1.519 TB/s 74.51%
I128 I16 U64 2^28 = 268435456 358x 3.277 ms 0.54% 3.271 ms 0.50% 82.066G 1.641 TB/s 80.49%
I128 I32 I32 2^16 = 65536 42288x 17.822 us 50.79% 11.825 us 1.90% 5.542G 133.010 GB/s 6.52%
I128 I32 I32 2^20 = 1048576 16848x 35.881 us 20.94% 29.690 us 1.61% 35.317G 847.619 GB/s 41.57%
I128 I32 I32 2^24 = 16777216 1952x 263.772 us 2.56% 257.746 us 1.05% 65.092G 1.562 TB/s 76.61%
I128 I32 I32 2^28 = 268435456 129x 3.886 ms 0.39% 3.880 ms 0.35% 69.193G 1.661 TB/s 81.44%
I128 I32 U32 2^16 = 65536 42192x 17.968 us 51.72% 11.853 us 2.02% 5.529G 132.702 GB/s 6.51%
I128 I32 U32 2^20 = 1048576 16896x 35.796 us 20.94% 29.620 us 1.62% 35.401G 849.618 GB/s 41.67%
I128 I32 U32 2^24 = 16777216 1952x 263.773 us 2.55% 257.758 us 1.03% 65.089G 1.562 TB/s 76.61%
I128 I32 U32 2^28 = 268435456 129x 3.887 ms 0.39% 3.880 ms 0.35% 69.177G 1.660 TB/s 81.42%
I128 I32 I64 2^16 = 65536 41488x 18.082 us 50.12% 12.055 us 2.23% 5.436G 130.475 GB/s 6.40%
I128 I32 I64 2^20 = 1048576 16672x 36.116 us 20.43% 30.011 us 1.56% 34.939G 838.545 GB/s 41.12%
I128 I32 I64 2^24 = 16777216 1952x 263.987 us 2.58% 257.895 us 1.02% 65.054G 1.561 TB/s 76.57%
I128 I32 I64 2^28 = 268435456 129x 3.888 ms 0.41% 3.882 ms 0.38% 69.152G 1.660 TB/s 81.39%
I128 I32 U64 2^16 = 65536 40576x 18.520 us 50.38% 12.326 us 2.40% 5.317G 127.610 GB/s 6.26%
I128 I32 U64 2^20 = 1048576 16704x 36.058 us 20.47% 29.955 us 1.60% 35.005G 840.120 GB/s 41.20%
I128 I32 U64 2^24 = 16777216 1952x 264.130 us 2.60% 258.008 us 1.06% 65.026G 1.561 TB/s 76.54%
I128 I32 U64 2^28 = 268435456 129x 3.888 ms 0.41% 3.882 ms 0.38% 69.146G 1.660 TB/s 81.39%
I128 I64 I32 2^16 = 65536 39264x 18.918 us 48.63% 12.738 us 1.79% 5.145G 164.637 GB/s 8.07%
I128 I64 I32 2^20 = 1048576 14400x 40.794 us 17.57% 34.734 us 1.88% 30.188G 966.031 GB/s 47.38%
I128 I64 I32 2^24 = 16777216 1568x 328.208 us 2.09% 322.089 us 0.86% 52.089G 1.667 TB/s 81.75%
I128 I64 I32 2^28 = 268435456 102x 4.933 ms 0.24% 4.927 ms 0.20% 54.483G 1.743 TB/s 85.50%
I128 I64 U32 2^16 = 65536 39344x 18.857 us 48.43% 12.713 us 1.70% 5.155G 164.964 GB/s 8.09%
I128 I64 U32 2^20 = 1048576 14320x 41.049 us 17.64% 34.933 us 1.96% 30.017G 960.545 GB/s 47.11%
I128 I64 U32 2^24 = 16777216 1568x 328.147 us 2.09% 321.995 us 0.85% 52.104G 1.667 TB/s 81.77%
I128 I64 U32 2^28 = 268435456 102x 4.932 ms 0.25% 4.925 ms 0.22% 54.500G 1.744 TB/s 85.53%
I128 I64 I64 2^16 = 65536 38224x 19.266 us 47.38% 13.082 us 2.13% 5.010G 160.305 GB/s 7.86%
I128 I64 I64 2^20 = 1048576 14160x 41.378 us 17.21% 35.333 us 1.59% 29.677G 949.666 GB/s 46.57%
I128 I64 I64 2^24 = 16777216 1568x 328.094 us 2.08% 321.987 us 0.84% 52.105G 1.667 TB/s 81.77%
I128 I64 I64 2^28 = 268435456 102x 4.931 ms 0.27% 4.925 ms 0.24% 54.503G 1.744 TB/s 85.53%
I128 I64 U64 2^16 = 65536 38432x 19.123 us 47.06% 13.013 us 2.05% 5.036G 161.158 GB/s 7.90%
I128 I64 U64 2^20 = 1048576 14272x 41.143 us 17.57% 35.035 us 1.97% 29.930G 957.754 GB/s 46.97%
I128 I64 U64 2^24 = 16777216 1568x 328.080 us 2.08% 321.964 us 0.84% 52.109G 1.667 TB/s 81.78%
I128 I64 U64 2^28 = 268435456 102x 4.930 ms 0.25% 4.924 ms 0.21% 54.513G 1.744 TB/s 85.55%
I128 I128 I32 2^16 = 65536 28608x 23.649 us 35.37% 17.479 us 1.34% 3.749G 179.969 GB/s 8.83%
I128 I128 I32 2^20 = 1048576 10144x 55.418 us 12.36% 49.363 us 1.40% 21.242G 1.020 TB/s 50.00%
I128 I128 I32 2^24 = 16777216 1024x 495.761 us 1.36% 489.726 us 0.57% 34.258G 1.644 TB/s 80.65%
I128 I128 I32 2^28 = 268435456 67x 7.543 ms 0.18% 7.537 ms 0.16% 35.616G 1.710 TB/s 83.84%
I128 I128 U32 2^16 = 65536 28304x 23.815 us 34.86% 17.672 us 1.76% 3.709G 178.010 GB/s 8.73%
I128 I128 U32 2^20 = 1048576 10128x 55.476 us 12.44% 49.382 us 1.49% 21.234G 1.019 TB/s 49.99%
I128 I128 U32 2^24 = 16777216 1024x 495.832 us 1.35% 489.907 us 0.60% 34.246G 1.644 TB/s 80.62%
I128 I128 U32 2^28 = 268435456 67x 7.543 ms 0.20% 7.537 ms 0.18% 35.617G 1.710 TB/s 83.84%
I128 I128 I64 2^16 = 65536 28976x 23.289 us 35.04% 17.258 us 1.28% 3.797G 182.277 GB/s 8.94%
I128 I128 I64 2^20 = 1048576 10128x 55.565 us 12.59% 49.413 us 1.77% 21.221G 1.019 TB/s 49.95%
I128 I128 I64 2^24 = 16777216 1024x 495.909 us 1.35% 489.969 us 0.59% 34.241G 1.644 TB/s 80.61%
I128 I128 I64 2^28 = 268435456 67x 7.543 ms 0.20% 7.537 ms 0.19% 35.615G 1.710 TB/s 83.84%
I128 I128 U64 2^16 = 65536 29072x 23.275 us 35.38% 17.204 us 1.32% 3.809G 182.848 GB/s 8.97%
I128 I128 U64 2^20 = 1048576 10176x 55.234 us 12.48% 49.159 us 1.60% 21.330G 1.024 TB/s 50.21%
I128 I128 U64 2^24 = 16777216 1024x 495.676 us 1.34% 489.789 us 0.59% 34.254G 1.644 TB/s 80.64%
I128 I128 U64 2^28 = 268435456 67x 7.545 ms 0.17% 7.539 ms 0.15% 35.608G 1.709 TB/s 83.82%
F32 I8 I32 2^16 = 65536 48368x 16.363 us 58.38% 10.340 us 2.37% 6.338G 38.027 GB/s 1.86%
F32 I8 I32 2^20 = 1048576 30128x 22.661 us 36.53% 16.604 us 1.31% 63.150G 378.902 GB/s 18.58%
F32 I8 I32 2^24 = 16777216 5488x 97.259 us 6.55% 91.370 us 1.11% 183.619G 1.102 TB/s 54.03%
F32 I8 I32 2^28 = 268435456 2736x 1.173 ms 0.89% 1.167 ms 0.72% 230.045G 1.380 TB/s 67.69%
F32 I8 U32 2^16 = 65536 48608x 16.293 us 58.46% 10.289 us 2.14% 6.370G 38.218 GB/s 1.87%
F32 I8 U32 2^20 = 1048576 29888x 22.938 us 37.15% 16.735 us 1.39% 62.656G 375.938 GB/s 18.44%
F32 I8 U32 2^24 = 16777216 5472x 97.482 us 6.68% 91.467 us 1.14% 183.425G 1.101 TB/s 53.97%
F32 I8 U32 2^28 = 268435456 2768x 1.172 ms 0.88% 1.166 ms 0.71% 230.222G 1.381 TB/s 67.74%
F32 I8 I64 2^16 = 65536 47440x 16.590 us 57.51% 10.543 us 2.66% 6.216G 37.297 GB/s 1.83%
F32 I8 I64 2^20 = 1048576 29408x 23.115 us 35.97% 17.009 us 1.41% 61.650G 369.898 GB/s 18.14%
F32 I8 I64 2^24 = 16777216 5456x 97.829 us 6.59% 91.842 us 0.93% 182.675G 1.096 TB/s 53.75%
F32 I8 I64 2^28 = 268435456 2832x 1.184 ms 0.84% 1.177 ms 0.65% 227.983G 1.368 TB/s 67.09%
F32 I8 U64 2^16 = 65536 47488x 16.478 us 56.60% 10.530 us 2.38% 6.224G 37.343 GB/s 1.83%
F32 I8 U64 2^20 = 1048576 29456x 23.188 us 36.64% 16.980 us 1.45% 61.752G 370.514 GB/s 18.17%
F32 I8 U64 2^24 = 16777216 5456x 97.936 us 6.68% 91.868 us 0.94% 182.623G 1.096 TB/s 53.74%
F32 I8 U64 2^28 = 268435456 2816x 1.183 ms 0.83% 1.177 ms 0.65% 227.993G 1.368 TB/s 67.09%
F32 I16 I32 2^16 = 65536 45280x 17.088 us 54.81% 11.046 us 1.93% 5.933G 47.463 GB/s 2.33%
F32 I16 I32 2^20 = 1048576 29024x 23.310 us 35.33% 17.232 us 1.30% 60.850G 486.798 GB/s 23.87%
F32 I16 I32 2^24 = 16777216 4992x 106.295 us 6.03% 100.356 us 1.12% 167.176G 1.337 TB/s 65.59%
F32 I16 I32 2^28 = 268435456 2272x 1.406 ms 0.86% 1.400 ms 0.74% 191.698G 1.534 TB/s 75.21%
F32 I16 U32 2^16 = 65536 43808x 17.590 us 54.18% 11.417 us 2.11% 5.740G 45.920 GB/s 2.25%
F32 I16 U32 2^20 = 1048576 28720x 23.574 us 35.45% 17.414 us 1.58% 60.215G 481.720 GB/s 23.62%
F32 I16 U32 2^24 = 16777216 4976x 106.720 us 6.28% 100.524 us 1.16% 166.898G 1.335 TB/s 65.48%
F32 I16 U32 2^28 = 268435456 2240x 1.407 ms 0.85% 1.401 ms 0.74% 191.666G 1.533 TB/s 75.20%
F32 I16 I64 2^16 = 65536 42368x 18.037 us 52.94% 11.803 us 2.13% 5.553G 44.421 GB/s 2.18%
F32 I16 I64 2^20 = 1048576 28432x 23.797 us 35.34% 17.594 us 1.45% 59.600G 476.798 GB/s 23.38%
F32 I16 I64 2^24 = 16777216 4960x 107.110 us 6.25% 100.915 us 1.14% 166.251G 1.330 TB/s 65.23%
F32 I16 I64 2^28 = 268435456 2304x 1.407 ms 0.86% 1.401 ms 0.74% 191.555G 1.532 TB/s 75.15%
F32 I16 U64 2^16 = 65536 42016x 18.174 us 52.79% 11.904 us 2.09% 5.506G 44.045 GB/s 2.16%
F32 I16 U64 2^20 = 1048576 28336x 23.793 us 34.86% 17.653 us 1.50% 59.400G 475.202 GB/s 23.31%
F32 I16 U64 2^24 = 16777216 4960x 107.283 us 6.31% 101.026 us 1.13% 166.067G 1.329 TB/s 65.16%
F32 I16 U64 2^28 = 268435456 2256x 1.407 ms 0.88% 1.401 ms 0.76% 191.599G 1.533 TB/s 75.17%
F32 I32 I32 2^16 = 65536 41664x 18.248 us 52.11% 12.003 us 1.69% 5.460G 65.518 GB/s 3.21%
F32 I32 I32 2^20 = 1048576 26048x 25.351 us 32.09% 19.203 us 1.47% 54.606G 655.269 GB/s 32.14%
F32 I32 I32 2^24 = 16777216 3872x 135.653 us 4.87% 129.493 us 0.98% 129.560G 1.555 TB/s 76.25%
F32 I32 I32 2^28 = 268435456 2128x 1.911 ms 0.64% 1.905 ms 0.56% 140.941G 1.691 TB/s 82.95%
F32 I32 U32 2^16 = 65536 41584x 18.295 us 52.24% 12.027 us 1.85% 5.449G 65.390 GB/s 3.21%
F32 I32 U32 2^20 = 1048576 26016x 25.425 us 32.30% 19.228 us 1.46% 54.533G 654.395 GB/s 32.09%
F32 I32 U32 2^24 = 16777216 3872x 135.675 us 4.89% 129.482 us 1.00% 129.572G 1.555 TB/s 76.25%
F32 I32 U32 2^28 = 268435456 2160x 1.910 ms 0.64% 1.904 ms 0.55% 140.960G 1.692 TB/s 82.96%
F32 I32 I64 2^16 = 65536 40832x 18.497 us 51.10% 12.250 us 2.16% 5.350G 64.200 GB/s 3.15%
F32 I32 I64 2^20 = 1048576 26272x 25.145 us 32.15% 19.035 us 1.32% 55.086G 661.035 GB/s 32.42%
F32 I32 I64 2^24 = 16777216 3792x 138.153 us 4.77% 132.015 us 1.03% 127.086G 1.525 TB/s 74.79%
F32 I32 I64 2^28 = 268435456 2120x 1.963 ms 0.59% 1.957 ms 0.50% 137.195G 1.646 TB/s 80.74%
F32 I32 U64 2^16 = 65536 40720x 18.486 us 50.63% 12.281 us 2.43% 5.336G 64.035 GB/s 3.14%
F32 I32 U64 2^20 = 1048576 26272x 25.177 us 32.30% 19.041 us 1.40% 55.068G 660.817 GB/s 32.41%
F32 I32 U64 2^24 = 16777216 3792x 138.262 us 4.82% 132.043 us 1.01% 127.058G 1.525 TB/s 74.78%
F32 I32 U64 2^28 = 268435456 2115x 1.963 ms 0.59% 1.957 ms 0.50% 137.191G 1.646 TB/s 80.74%
F32 I64 I32 2^16 = 65536 39936x 18.788 us 50.18% 12.520 us 2.22% 5.234G 104.690 GB/s 5.13%
F32 I64 I32 2^20 = 1048576 21968x 28.957 us 27.27% 22.763 us 1.25% 46.066G 921.316 GB/s 45.18%
F32 I64 I32 2^24 = 16777216 2528x 204.141 us 3.17% 198.108 us 0.86% 84.687G 1.694 TB/s 83.07%
F32 I64 I32 2^28 = 268435456 981x 2.945 ms 0.54% 2.939 ms 0.50% 91.337G 1.827 TB/s 89.59%
F32 I64 U32 2^16 = 65536 40528x 18.457 us 49.68% 12.342 us 2.66% 5.310G 106.202 GB/s 5.21%
F32 I64 U32 2^20 = 1048576 21952x 29.086 us 27.71% 22.785 us 1.29% 46.020G 920.402 GB/s 45.14%
F32 I64 U32 2^24 = 16777216 2528x 204.238 us 3.25% 198.048 us 0.87% 84.713G 1.694 TB/s 83.09%
F32 I64 U32 2^28 = 268435456 994x 2.945 ms 0.54% 2.939 ms 0.50% 91.338G 1.827 TB/s 89.59%
F32 I64 I64 2^16 = 65536 39312x 18.875 us 48.51% 12.720 us 2.20% 5.152G 103.046 GB/s 5.05%
F32 I64 I64 2^20 = 1048576 21744x 29.195 us 26.95% 23.009 us 1.37% 45.572G 911.450 GB/s 44.70%
F32 I64 I64 2^24 = 16777216 2496x 206.610 us 3.13% 200.598 us 0.91% 83.636G 1.673 TB/s 82.03%
F32 I64 I64 2^28 = 268435456 1008x 2.969 ms 0.54% 2.963 ms 0.50% 90.593G 1.812 TB/s 88.86%
F32 I64 U64 2^16 = 65536 39408x 18.817 us 48.37% 12.690 us 2.09% 5.164G 103.287 GB/s 5.07%
F32 I64 U64 2^20 = 1048576 21728x 29.245 us 27.11% 23.022 us 1.20% 45.547G 910.938 GB/s 44.67%
F32 I64 U64 2^24 = 16777216 2496x 206.681 us 3.17% 200.600 us 0.90% 83.635G 1.673 TB/s 82.03%
F32 I64 U64 2^28 = 268435456 942x 2.969 ms 0.54% 2.963 ms 0.50% 90.588G 1.812 TB/s 88.85%
F32 I128 I32 2^16 = 65536 30416x 22.560 us 37.30% 16.444 us 1.73% 3.985G 143.478 GB/s 7.04%
F32 I128 I32 2^20 = 1048576 12864x 44.908 us 15.61% 38.876 us 1.60% 26.972G 970.999 GB/s 47.62%
F32 I128 I32 2^24 = 16777216 1376x 373.454 us 1.77% 367.449 us 0.69% 45.659G 1.644 TB/s 80.61%
F32 I128 I32 2^28 = 268435456 90x 5.612 ms 0.20% 5.606 ms 0.16% 47.884G 1.724 TB/s 84.54%
F32 I128 U32 2^16 = 65536 30176x 22.693 us 37.00% 16.574 us 1.81% 3.954G 142.353 GB/s 6.98%
F32 I128 U32 2^20 = 1048576 12864x 45.035 us 15.87% 38.901 us 1.62% 26.955G 970.372 GB/s 47.59%
F32 I128 U32 2^24 = 16777216 1360x 373.861 us 1.81% 367.730 us 0.69% 45.624G 1.642 TB/s 80.55%
F32 I128 U32 2^28 = 268435456 90x 5.609 ms 0.21% 5.603 ms 0.18% 47.906G 1.725 TB/s 84.58%
F32 I128 I64 2^16 = 65536 30512x 22.615 us 38.03% 16.393 us 1.75% 3.998G 143.922 GB/s 7.06%
F32 I128 I64 2^20 = 1048576 12672x 45.512 us 15.37% 39.478 us 1.52% 26.561G 956.186 GB/s 46.89%
F32 I128 I64 2^24 = 16777216 1360x 374.516 us 1.78% 368.477 us 0.70% 45.531G 1.639 TB/s 80.39%
F32 I128 I64 2^28 = 268435456 90x 5.614 ms 0.21% 5.608 ms 0.18% 47.864G 1.723 TB/s 84.51%
F32 I128 U64 2^16 = 65536 30208x 22.717 us 37.24% 16.561 us 1.57% 3.957G 142.465 GB/s 6.99%
F32 I128 U64 2^20 = 1048576 12752x 45.462 us 16.02% 39.214 us 1.45% 26.740G 962.645 GB/s 47.21%
F32 I128 U64 2^24 = 16777216 1360x 374.377 us 1.79% 368.289 us 0.69% 45.555G 1.640 TB/s 80.43%
F32 I128 U64 2^28 = 268435456 90x 5.615 ms 0.22% 5.608 ms 0.18% 47.863G 1.723 TB/s 84.50%
F64 I8 I32 2^16 = 65536 45520x 17.107 us 55.85% 10.986 us 2.42% 5.965G 59.652 GB/s 2.93%
F64 I8 I32 2^20 = 1048576 24000x 27.038 us 29.79% 20.846 us 1.80% 50.302G 503.022 GB/s 24.67%
F64 I8 I32 2^24 = 16777216 3888x 134.905 us 4.73% 128.907 us 0.79% 130.150G 1.301 TB/s 63.83%
F64 I8 I32 2^28 = 268435456 2720x 1.727 ms 0.69% 1.721 ms 0.59% 155.978G 1.560 TB/s 76.50%
F64 I8 U32 2^16 = 65536 45904x 17.065 us 56.75% 10.895 us 2.40% 6.015G 60.151 GB/s 2.95%
F64 I8 U32 2^20 = 1048576 24016x 27.122 us 30.35% 20.822 us 1.76% 50.360G 503.599 GB/s 24.70%
F64 I8 U32 2^24 = 16777216 3888x 134.859 us 4.83% 128.733 us 0.79% 130.326G 1.303 TB/s 63.92%
F64 I8 U32 2^28 = 268435456 2720x 1.727 ms 0.69% 1.720 ms 0.59% 156.037G 1.560 TB/s 76.52%
F64 I8 I64 2^16 = 65536 45536x 17.136 us 56.16% 10.983 us 2.57% 5.967G 59.669 GB/s 2.93%
F64 I8 I64 2^20 = 1048576 23856x 27.230 us 29.96% 20.969 us 1.82% 50.006G 500.057 GB/s 24.52%
F64 I8 I64 2^24 = 16777216 3840x 136.430 us 4.74% 130.355 us 0.84% 128.704G 1.287 TB/s 63.12%
F64 I8 I64 2^28 = 268435456 2736x 1.738 ms 0.71% 1.732 ms 0.61% 154.995G 1.550 TB/s 76.01%
F64 I8 U64 2^16 = 65536 45120x 17.186 us 55.22% 11.082 us 2.38% 5.914G 59.137 GB/s 2.90%
F64 I8 U64 2^20 = 1048576 24000x 27.142 us 30.29% 20.847 us 1.85% 50.298G 502.980 GB/s 24.67%
F64 I8 U64 2^24 = 16777216 3840x 136.365 us 4.80% 130.214 us 0.82% 128.844G 1.288 TB/s 63.19%
F64 I8 U64 2^28 = 268435456 2704x 1.738 ms 0.70% 1.732 ms 0.60% 154.981G 1.550 TB/s 76.01%
F64 I16 I32 2^16 = 65536 44000x 17.538 us 54.45% 11.364 us 2.14% 5.767G 69.204 GB/s 3.39%
F64 I16 I32 2^20 = 1048576 23008x 27.955 us 28.69% 21.742 us 1.97% 48.229G 578.746 GB/s 28.38%
F64 I16 I32 2^24 = 16777216 3504x 149.011 us 4.50% 142.958 us 1.50% 117.358G 1.408 TB/s 69.07%
F64 I16 I32 2^28 = 268435456 1856x 2.135 ms 0.61% 2.129 ms 0.54% 126.109G 1.513 TB/s 74.22%
F64 I16 U32 2^16 = 65536 44064x 17.520 us 54.47% 11.350 us 2.14% 5.774G 69.287 GB/s 3.40%
F64 I16 U32 2^20 = 1048576 24000x 27.087 us 30.03% 20.841 us 1.18% 50.312G 603.747 GB/s 29.61%
F64 I16 U32 2^24 = 16777216 3616x 144.964 us 4.62% 138.838 us 1.35% 120.840G 1.450 TB/s 71.12%
F64 I16 U32 2^28 = 268435456 2000x 2.046 ms 0.64% 2.040 ms 0.56% 131.617G 1.579 TB/s 77.46%
F64 I16 I64 2^16 = 65536 43568x 17.654 us 53.94% 11.477 us 2.20% 5.710G 68.523 GB/s 3.36%
F64 I16 I64 2^20 = 1048576 22752x 28.184 us 28.27% 21.990 us 2.05% 47.684G 572.209 GB/s 28.06%
F64 I16 I64 2^24 = 16777216 3488x 149.486 us 4.48% 143.459 us 1.53% 116.948G 1.403 TB/s 68.83%
F64 I16 I64 2^28 = 268435456 1760x 2.140 ms 0.63% 2.134 ms 0.56% 125.783G 1.509 TB/s 74.02%
F64 I16 U64 2^16 = 65536 43568x 17.609 us 53.51% 11.479 us 2.28% 5.709G 68.510 GB/s 3.36%
F64 I16 U64 2^20 = 1048576 22672x 28.238 us 28.10% 22.062 us 1.88% 47.529G 570.353 GB/s 27.97%
F64 I16 U64 2^24 = 16777216 3488x 149.662 us 4.59% 143.468 us 1.53% 116.940G 1.403 TB/s 68.82%
F64 I16 U64 2^28 = 268435456 1872x 2.140 ms 0.63% 2.134 ms 0.56% 125.782G 1.509 TB/s 74.02%
F64 I32 I32 2^16 = 65536 42512x 18.032 us 53.42% 11.762 us 2.00% 5.572G 89.153 GB/s 4.37%
F64 I32 I32 2^20 = 1048576 21648x 29.203 us 26.46% 23.102 us 1.31% 45.388G 726.209 GB/s 35.62%
F64 I32 I32 2^24 = 16777216 2928x 177.038 us 3.76% 170.909 us 1.11% 98.165G 1.571 TB/s 77.03%
F64 I32 I32 2^28 = 268435456 697x 2.536 ms 0.55% 2.530 ms 0.50% 106.103G 1.698 TB/s 83.26%
F64 I32 U32 2^16 = 65536 42240x 17.934 us 51.57% 11.839 us 1.85% 5.535G 88.566 GB/s 4.34%
F64 I32 U32 2^20 = 1048576 21632x 29.304 us 26.83% 23.118 us 1.28% 45.358G 725.720 GB/s 35.59%
F64 I32 U32 2^24 = 16777216 2928x 177.077 us 3.79% 170.893 us 1.12% 98.174G 1.571 TB/s 77.04%
F64 I32 U32 2^28 = 268435456 802x 2.536 ms 0.55% 2.530 ms 0.50% 106.119G 1.698 TB/s 83.27%
F64 I32 I64 2^16 = 65536 41776x 18.240 us 52.48% 11.971 us 2.20% 5.474G 87.590 GB/s 4.30%
F64 I32 I64 2^20 = 1048576 21040x 29.865 us 25.70% 23.771 us 1.26% 44.111G 705.771 GB/s 34.61%
F64 I32 I64 2^24 = 16777216 2880x 179.776 us 3.71% 173.626 us 1.09% 96.629G 1.546 TB/s 75.82%
F64 I32 I64 2^28 = 268435456 632x 2.592 ms 0.55% 2.586 ms 0.50% 103.814G 1.661 TB/s 81.46%
F64 I32 U64 2^16 = 65536 41728x 18.156 us 51.63% 11.982 us 1.85% 5.469G 87.510 GB/s 4.29%
F64 I32 U64 2^20 = 1048576 21088x 29.929 us 26.25% 23.719 us 1.28% 44.207G 707.318 GB/s 34.69%
F64 I32 U64 2^24 = 16777216 2880x 179.814 us 3.74% 173.621 us 1.10% 96.631G 1.546 TB/s 75.83%
F64 I32 U64 2^28 = 268435456 543x 2.592 ms 0.55% 2.586 ms 0.50% 103.820G 1.661 TB/s 81.47%
F64 I64 I32 2^16 = 65536 40624x 18.450 us 50.01% 12.308 us 1.85% 5.324G 127.788 GB/s 6.27%
F64 I64 I32 2^20 = 1048576 19408x 31.822 us 23.52% 25.777 us 1.17% 40.679G 976.288 GB/s 47.88%
F64 I64 I32 2^24 = 16777216 2080x 247.317 us 2.68% 241.238 us 0.91% 69.546G 1.669 TB/s 81.86%
F64 I64 I32 2^28 = 268435456 138x 3.643 ms 0.36% 3.637 ms 0.32% 73.812G 1.771 TB/s 86.88%
F64 I64 U32 2^16 = 65536 41008x 18.345 us 50.53% 12.195 us 1.58% 5.374G 128.980 GB/s 6.33%
F64 I64 U32 2^20 = 1048576 19344x 31.893 us 23.38% 25.863 us 1.11% 40.544G 973.056 GB/s 47.72%
F64 I64 U32 2^24 = 16777216 2080x 247.514 us 2.70% 241.397 us 0.92% 69.500G 1.668 TB/s 81.80%
F64 I64 U32 2^28 = 268435456 138x 3.642 ms 0.33% 3.636 ms 0.29% 73.833G 1.772 TB/s 86.90%
F64 I64 I64 2^16 = 65536 40752x 18.403 us 50.11% 12.270 us 2.48% 5.341G 128.184 GB/s 6.29%
F64 I64 I64 2^20 = 1048576 17776x 34.344 us 22.15% 28.152 us 2.34% 37.247G 893.919 GB/s 43.84%
F64 I64 I64 2^24 = 16777216 2048x 251.470 us 2.62% 245.446 us 0.92% 68.354G 1.640 TB/s 80.45%
F64 I64 I64 2^28 = 268435456 135x 3.733 ms 0.32% 3.727 ms 0.27% 72.022G 1.729 TB/s 84.77%
F64 I64 U64 2^16 = 65536 41264x 18.192 us 50.24% 12.121 us 2.86% 5.407G 129.759 GB/s 6.36%
F64 I64 U64 2^20 = 1048576 17808x 34.279 us 22.17% 28.091 us 2.22% 37.328G 895.875 GB/s 43.94%
F64 I64 U64 2^24 = 16777216 2048x 251.309 us 2.60% 245.375 us 0.94% 68.374G 1.641 TB/s 80.48%
F64 I64 U64 2^28 = 268435456 135x 3.734 ms 0.31% 3.728 ms 0.26% 72.011G 1.728 TB/s 84.76%
F64 I128 I32 2^16 = 65536 31504x 21.909 us 38.11% 15.873 us 1.25% 4.129G 165.154 GB/s 8.10%
F64 I128 I32 2^20 = 1048576 11152x 50.999 us 13.78% 44.854 us 1.35% 23.377G 935.094 GB/s 45.86%
F64 I128 I32 2^24 = 16777216 1184x 432.819 us 1.51% 426.824 us 0.54% 39.307G 1.572 TB/s 77.11%
F64 I128 I32 2^28 = 268435456 77x 6.580 ms 0.16% 6.574 ms 0.14% 40.832G 1.633 TB/s 80.10%
F64 I128 U32 2^16 = 65536 31520x 21.810 us 37.51% 15.870 us 1.48% 4.130G 165.187 GB/s 8.10%
F64 I128 U32 2^20 = 1048576 11184x 50.854 us 13.76% 44.737 us 1.34% 23.439G 937.549 GB/s 45.98%
F64 I128 U32 2^24 = 16777216 1184x 432.717 us 1.49% 426.781 us 0.53% 39.311G 1.572 TB/s 77.12%
F64 I128 U32 2^28 = 268435456 77x 6.581 ms 0.16% 6.575 ms 0.13% 40.826G 1.633 TB/s 80.09%
F64 I128 I64 2^16 = 65536 31488x 21.919 us 38.07% 15.885 us 1.42% 4.126G 165.021 GB/s 8.09%
F64 I128 I64 2^20 = 1048576 11200x 50.726 us 13.55% 44.702 us 1.36% 23.457G 938.276 GB/s 46.02%
F64 I128 I64 2^24 = 16777216 1184x 432.977 us 1.47% 427.097 us 0.52% 39.282G 1.571 TB/s 77.06%
F64 I128 I64 2^28 = 268435456 77x 6.584 ms 0.15% 6.578 ms 0.12% 40.806G 1.632 TB/s 80.05%
F64 I128 U64 2^16 = 65536 31344x 21.937 us 37.57% 15.953 us 1.28% 4.108G 164.318 GB/s 8.06%
F64 I128 U64 2^20 = 1048576 11200x 50.773 us 13.76% 44.664 us 1.35% 23.477G 939.074 GB/s 46.05%
F64 I128 U64 2^24 = 16777216 1184x 432.994 us 1.49% 427.061 us 0.54% 39.285G 1.571 TB/s 77.07%
F64 I128 U64 2^28 = 268435456 77x 6.583 ms 0.16% 6.577 ms 0.14% 40.812G 1.632 TB/s 80.06%
C64 I8 I32 2^16 = 65536 47008x 16.671 us 56.85% 10.638 us 2.35% 6.160G 61.605 GB/s 3.02%
C64 I8 I32 2^20 = 1048576 24544x 26.455 us 29.89% 20.382 us 1.71% 51.446G 514.461 GB/s 25.23%
C64 I8 I32 2^24 = 16777216 3888x 134.915 us 4.67% 128.988 us 0.78% 130.068G 1.301 TB/s 63.79%
C64 I8 I32 2^28 = 268435456 2752x 1.735 ms 0.70% 1.729 ms 0.60% 155.257G 1.553 TB/s 76.14%
C64 I8 U32 2^16 = 65536 47424x 16.479 us 56.36% 10.547 us 2.40% 6.214G 62.139 GB/s 3.05%
C64 I8 U32 2^20 = 1048576 24400x 26.584 us 29.77% 20.498 us 1.67% 51.155G 511.552 GB/s 25.09%
C64 I8 U32 2^24 = 16777216 3888x 135.032 us 4.80% 128.939 us 0.80% 130.118G 1.301 TB/s 63.81%
C64 I8 U32 2^28 = 268435456 2768x 1.735 ms 0.69% 1.729 ms 0.59% 155.243G 1.552 TB/s 76.14%
C64 I8 I64 2^16 = 65536 45632x 17.109 us 56.21% 10.960 us 2.16% 5.979G 59.794 GB/s 2.93%
C64 I8 I64 2^20 = 1048576 24160x 26.776 us 29.45% 20.697 us 1.61% 50.663G 506.634 GB/s 24.85%
C64 I8 I64 2^24 = 16777216 3872x 135.302 us 4.82% 129.173 us 0.79% 129.882G 1.299 TB/s 63.70%
C64 I8 I64 2^28 = 268435456 2720x 1.737 ms 0.70% 1.731 ms 0.61% 155.038G 1.550 TB/s 76.03%
C64 I8 U64 2^16 = 65536 45728x 17.098 us 56.45% 10.937 us 2.03% 5.992G 59.920 GB/s 2.94%
C64 I8 U64 2^20 = 1048576 24224x 26.718 us 29.47% 20.650 us 1.59% 50.779G 507.794 GB/s 24.90%
C64 I8 U64 2^24 = 16777216 3872x 135.260 us 4.80% 129.149 us 0.79% 129.905G 1.299 TB/s 63.71%
C64 I8 U64 2^28 = 268435456 2672x 1.738 ms 0.70% 1.732 ms 0.61% 155.014G 1.550 TB/s 76.02%
C64 I16 I32 2^16 = 65536 44160x 17.451 us 54.24% 11.323 us 2.10% 5.788G 69.455 GB/s 3.41%
C64 I16 I32 2^20 = 1048576 24224x 26.730 us 29.51% 20.651 us 1.20% 50.777G 609.322 GB/s 29.88%
C64 I16 I32 2^24 = 16777216 3616x 144.772 us 4.64% 138.636 us 1.36% 121.017G 1.452 TB/s 71.22%
C64 I16 I32 2^28 = 268435456 2064x 2.048 ms 0.64% 2.042 ms 0.57% 131.487G 1.578 TB/s 77.38%
C64 I16 U32 2^16 = 65536 44288x 17.424 us 54.41% 11.294 us 2.30% 5.803G 69.636 GB/s 3.42%
C64 I16 U32 2^20 = 1048576 24288x 26.771 us 30.10% 20.588 us 1.27% 50.930G 611.163 GB/s 29.97%
C64 I16 U32 2^24 = 16777216 3616x 144.741 us 4.56% 138.710 us 1.35% 120.952G 1.451 TB/s 71.18%
C64 I16 U32 2^28 = 268435456 1968x 2.048 ms 0.65% 2.042 ms 0.57% 131.466G 1.578 TB/s 77.37%
C64 I16 I64 2^16 = 65536 44272x 17.315 us 53.40% 11.296 us 2.12% 5.802G 69.618 GB/s 3.41%
C64 I16 I64 2^20 = 1048576 22960x 27.925 us 28.32% 21.779 us 1.80% 48.146G 577.749 GB/s 28.33%
C64 I16 I64 2^24 = 16777216 3488x 149.729 us 4.42% 143.753 us 1.49% 116.709G 1.401 TB/s 68.68%
C64 I16 I64 2^28 = 268435456 1808x 2.144 ms 0.64% 2.138 ms 0.57% 125.545G 1.507 TB/s 73.88%
C64 I16 U64 2^16 = 65536 44416x 17.301 us 53.81% 11.258 us 2.17% 5.822G 69.858 GB/s 3.43%
C64 I16 U64 2^20 = 1048576 22928x 27.965 us 28.32% 21.811 us 1.81% 48.076G 576.908 GB/s 28.29%
C64 I16 U64 2^24 = 16777216 3488x 149.865 us 4.44% 143.838 us 1.47% 116.640G 1.400 TB/s 68.64%
C64 I16 U64 2^28 = 268435456 1872x 2.144 ms 0.62% 2.138 ms 0.55% 125.547G 1.507 TB/s 73.89%
C64 I32 I32 2^16 = 65536 44032x 17.360 us 52.91% 11.359 us 1.72% 5.769G 92.312 GB/s 4.53%
C64 I32 I32 2^20 = 1048576 21824x 28.952 us 26.39% 22.917 us 1.10% 45.755G 732.087 GB/s 35.90%
C64 I32 I32 2^24 = 16777216 2928x 177.081 us 3.71% 171.042 us 1.12% 98.088G 1.569 TB/s 76.97%
C64 I32 I32 2^28 = 268435456 759x 2.538 ms 0.55% 2.532 ms 0.50% 105.998G 1.696 TB/s 83.17%
C64 I32 U32 2^16 = 65536 43472x 17.625 us 53.29% 11.505 us 1.85% 5.696G 91.139 GB/s 4.47%
C64 I32 U32 2^20 = 1048576 21712x 29.000 us 25.92% 23.042 us 1.21% 45.507G 728.111 GB/s 35.71%
C64 I32 U32 2^24 = 16777216 2928x 177.092 us 3.69% 171.079 us 1.10% 98.067G 1.569 TB/s 76.95%
C64 I32 U32 2^28 = 268435456 746x 2.538 ms 0.55% 2.532 ms 0.50% 106.015G 1.696 TB/s 83.19%
C64 I32 I64 2^16 = 65536 42528x 17.867 us 52.09% 11.758 us 2.34% 5.574G 89.182 GB/s 4.37%
C64 I32 I64 2^20 = 1048576 21776x 29.058 us 26.54% 22.977 us 1.25% 45.637G 730.186 GB/s 35.81%
C64 I32 I64 2^24 = 16777216 2896x 179.466 us 3.69% 173.372 us 1.12% 96.770G 1.548 TB/s 75.93%
C64 I32 I64 2^28 = 268435456 628x 2.594 ms 0.55% 2.588 ms 0.50% 103.708G 1.659 TB/s 81.38%
C64 I32 U64 2^16 = 65536 42640x 17.877 us 52.54% 11.726 us 1.90% 5.589G 89.420 GB/s 4.39%
C64 I32 U64 2^20 = 1048576 21792x 28.893 us 25.90% 22.958 us 1.31% 45.673G 730.765 GB/s 35.84%
C64 I32 U64 2^24 = 16777216 2896x 179.415 us 3.62% 173.445 us 1.11% 96.729G 1.548 TB/s 75.90%
C64 I32 U64 2^28 = 268435456 675x 2.594 ms 0.55% 2.588 ms 0.50% 103.709G 1.659 TB/s 81.38%
C64 I64 I32 2^16 = 65536 39984x 18.565 us 48.52% 12.510 us 2.43% 5.239G 125.731 GB/s 6.17%
C64 I64 I32 2^20 = 1048576 19648x 31.555 us 23.97% 25.466 us 1.16% 41.176G 988.214 GB/s 48.46%
C64 I64 I32 2^24 = 16777216 2112x 244.527 us 2.76% 238.381 us 0.97% 70.380G 1.689 TB/s 82.84%
C64 I64 I32 2^28 = 268435456 140x 3.585 ms 0.39% 3.580 ms 0.35% 74.992G 1.800 TB/s 88.27%
C64 I64 U32 2^16 = 65536 39824x 18.688 us 48.97% 12.557 us 2.51% 5.219G 125.257 GB/s 6.14%
C64 I64 U32 2^20 = 1048576 19600x 31.551 us 23.70% 25.519 us 1.21% 41.089G 986.142 GB/s 48.36%
C64 I64 U32 2^24 = 16777216 2112x 244.426 us 2.74% 238.310 us 0.96% 70.401G 1.690 TB/s 82.86%
C64 I64 U32 2^28 = 268435456 140x 3.585 ms 0.37% 3.579 ms 0.33% 75.010G 1.800 TB/s 88.29%
C64 I64 I64 2^16 = 65536 40736x 18.415 us 50.10% 12.279 us 2.32% 5.337G 128.098 GB/s 6.28%
C64 I64 I64 2^20 = 1048576 17872x 34.028 us 21.67% 28.000 us 2.14% 37.449G 898.770 GB/s 44.08%
C64 I64 I64 2^24 = 16777216 2048x 251.546 us 2.66% 245.432 us 0.91% 68.358G 1.641 TB/s 80.46%
C64 I64 I64 2^28 = 268435456 135x 3.736 ms 0.31% 3.730 ms 0.27% 71.967G 1.727 TB/s 84.71%
C64 I64 U64 2^16 = 65536 41120x 18.294 us 50.56% 12.160 us 2.32% 5.390G 129.349 GB/s 6.34%
C64 I64 U64 2^20 = 1048576 17840x 34.122 us 21.83% 28.042 us 2.21% 37.393G 897.439 GB/s 44.01%
C64 I64 U64 2^24 = 16777216 2048x 251.688 us 2.65% 245.588 us 0.91% 68.315G 1.640 TB/s 80.41%
C64 I64 U64 2^28 = 268435456 135x 3.736 ms 0.32% 3.730 ms 0.28% 71.976G 1.727 TB/s 84.72%
C64 I128 I32 2^16 = 65536 30960x 22.269 us 37.91% 16.155 us 1.19% 4.057G 162.271 GB/s 7.96%
C64 I128 I32 2^20 = 1048576 11104x 51.236 us 13.76% 45.076 us 1.42% 23.262G 930.495 GB/s 45.63%
C64 I128 I32 2^24 = 16777216 1184x 433.662 us 1.49% 427.718 us 0.53% 39.225G 1.569 TB/s 76.95%
C64 I128 I32 2^28 = 268435456 76x 6.595 ms 0.16% 6.589 ms 0.13% 40.739G 1.630 TB/s 79.92%
C64 I128 U32 2^16 = 65536 31264x 22.046 us 37.87% 16.000 us 1.42% 4.096G 163.840 GB/s 8.04%
C64 I128 U32 2^20 = 1048576 11152x 51.014 us 13.71% 44.896 us 1.36% 23.356G 934.225 GB/s 45.82%
C64 I128 U32 2^24 = 16777216 1184x 433.775 us 1.48% 427.846 us 0.52% 39.213G 1.569 TB/s 76.92%
C64 I128 U32 2^28 = 268435456 76x 6.593 ms 0.15% 6.587 ms 0.12% 40.752G 1.630 TB/s 79.94%
C64 I128 I64 2^16 = 65536 31328x 21.967 us 37.71% 15.961 us 1.24% 4.106G 164.239 GB/s 8.05%
C64 I128 I64 2^20 = 1048576 11120x 51.219 us 13.88% 45.014 us 1.47% 23.294G 931.776 GB/s 45.70%
C64 I128 I64 2^24 = 16777216 1184x 433.745 us 1.52% 427.678 us 0.55% 39.229G 1.569 TB/s 76.95%
C64 I128 I64 2^28 = 268435456 76x 6.598 ms 0.17% 6.592 ms 0.14% 40.722G 1.629 TB/s 79.88%
C64 I128 U64 2^16 = 65536 30768x 22.376 us 37.77% 16.252 us 1.50% 4.032G 161.297 GB/s 7.91%
C64 I128 U64 2^20 = 1048576 11072x 51.378 us 13.77% 45.194 us 1.39% 23.202G 928.068 GB/s 45.51%
C64 I128 U64 2^24 = 16777216 1184x 433.911 us 1.49% 427.975 us 0.54% 39.201G 1.568 TB/s 76.90%
C64 I128 U64 2^28 = 268435456 76x 6.595 ms 0.17% 6.589 ms 0.14% 40.739G 1.630 TB/s 79.92%

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@elstehle elstehle requested review from a team as code owners September 28, 2024 06:54

- using ScanTileStateT = ReduceByKeyScanTileState<AccumT, OffsetT>;
+ using ScanTileStateT = ReduceByKeyScanTileState<AccumT, int>;
Collaborator Author

@elstehle elstehle Sep 28, 2024

I tried making this bool, but performance dropped by 20% for some workloads. Using int with the bitwise OR operator, |, preserved both the semantics and the performance, and made the algorithms' performance almost independent of the offset type.
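To make the comment above concrete, here is a minimal host-side sketch of a segmented-sum combine step that carries the head flag as an int and merges flags with |. It is an illustration under assumptions, not the code from this PR: the names FlagValue, combine, and inclusive_sum_by_flag are hypothetical, the value type is fixed to long, and the scan runs sequentially on the CPU rather than through CUB's tile state.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the (head flag, partial sum) pair carried between tiles.
struct FlagValue
{
  int flag;   // non-zero if a segment head occurs anywhere in the combined range
  long value; // partial sum of the last (possibly incomplete) segment in the range
};

// Combine step following the expression quoted in the PR description:
//   retval.value = (second.key) ? second.value : op(first.value, second.value);
// The flags are merged with bitwise OR on int, as described in the comment above.
inline FlagValue combine(const FlagValue& first, const FlagValue& second)
{
  FlagValue result;
  result.flag  = first.flag | second.flag;
  result.value = second.flag ? second.value : first.value + second.value;
  return result;
}

// Sequential reference: inclusive sum that restarts wherever head_flags[i] != 0.
inline std::vector<long> inclusive_sum_by_flag(const std::vector<int>& head_flags,
                                               const std::vector<long>& values)
{
  assert(head_flags.size() == values.size());
  std::vector<long> out(values.size());
  FlagValue running{0, 0};
  for (std::size_t i = 0; i < values.size(); ++i)
  {
    running = combine(running, FlagValue{head_flags[i], values[i]});
    out[i]  = running.value;
  }
  return out;
}
```

With head flags {1, 0, 0, 1, 0} and values {1, 2, 3, 4, 5}, this sketch yields {1, 3, 6, 4, 9}. Note that the flag never needs to count anything; it only records whether a segment boundary was seen, which is why its width can be decoupled from the offset type.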

Contributor

🟩 CI finished in 1h 30m: Pass: 100%/208 | Total: 5d 13h | Avg: 38m 39s | Max: 1h 05m | Hits: 63%/14066
  • 🟩 cub: Pass: 100%/104 | Total: 3d 10h | Avg: 47m 34s | Max: 1h 05m | Hits: 29%/2916

    🟩 cpu
      🟩 amd64              Pass: 100%/96  | Total:  3d 03h | Avg: 47m 04s | Max:  1h 05m | Hits:  29%/2916  
      🟩 arm64              Pass: 100%/8   | Total:  7h 08m | Avg: 53m 35s | Max: 55m 37s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total: 11h 25m | Avg: 45m 43s | Max:  1h 02m | Hits:  29%/729   
      🟩 11.8               Pass: 100%/3   | Total:  3h 15m | Avg:  1h 05m | Max:  1h 05m
      🟩 12.6               Pass: 100%/86  | Total:  2d 19h | Avg: 47m 16s | Max:  1h 03m | Hits:  29%/2187  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 46m | Avg: 53m 27s | Max: 53m 40s
      🟩 nvcc11.1           Pass: 100%/15  | Total: 11h 25m | Avg: 45m 43s | Max:  1h 02m | Hits:  29%/729   
      🟩 nvcc11.8           Pass: 100%/3   | Total:  3h 15m | Avg:  1h 05m | Max:  1h 05m
      🟩 nvcc12.6           Pass: 100%/84  | Total:  2d 17h | Avg: 47m 07s | Max:  1h 03m | Hits:  29%/2187  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 46m | Avg: 53m 27s | Max: 53m 40s
      🟩 nvcc               Pass: 100%/102 | Total:  3d 08h | Avg: 47m 27s | Max:  1h 05m | Hits:  29%/2916  
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  4h 53m | Avg: 48m 59s | Max: 53m 59s
      🟩 Clang10            Pass: 100%/3   | Total:  2h 32m | Avg: 50m 41s | Max: 53m 01s
      🟩 Clang11            Pass: 100%/4   | Total:  3h 16m | Avg: 49m 03s | Max: 49m 14s
      🟩 Clang12            Pass: 100%/4   | Total:  3h 16m | Avg: 49m 10s | Max: 49m 54s
      🟩 Clang13            Pass: 100%/4   | Total:  3h 25m | Avg: 51m 17s | Max: 54m 33s
      🟩 Clang14            Pass: 100%/4   | Total:  3h 21m | Avg: 50m 24s | Max: 55m 22s
      🟩 Clang15            Pass: 100%/4   | Total:  3h 24m | Avg: 51m 12s | Max: 54m 39s
      🟩 Clang16            Pass: 100%/4   | Total:  3h 18m | Avg: 49m 33s | Max: 51m 35s
      🟩 Clang17            Pass: 100%/4   | Total:  3h 16m | Avg: 49m 01s | Max: 49m 24s
      🟩 Clang18            Pass: 100%/9   | Total:  6h 51m | Avg: 45m 41s | Max: 53m 40s
      🟩 GCC6               Pass: 100%/2   | Total:  1h 30m | Avg: 45m 28s | Max: 47m 07s
      🟩 GCC7               Pass: 100%/6   | Total:  4h 40m | Avg: 46m 48s | Max: 54m 22s
      🟩 GCC8               Pass: 100%/6   | Total:  4h 48m | Avg: 48m 01s | Max: 56m 22s
      🟩 GCC9               Pass: 100%/6   | Total:  4h 47m | Avg: 47m 51s | Max: 56m 43s
      🟩 GCC10              Pass: 100%/4   | Total:  3h 26m | Avg: 51m 40s | Max: 56m 36s
      🟩 GCC11              Pass: 100%/7   | Total:  6h 39m | Avg: 57m 04s | Max:  1h 05m
      🟩 GCC12              Pass: 100%/4   | Total:  3h 22m | Avg: 50m 37s | Max: 52m 54s
      🟩 GCC13              Pass: 100%/16  | Total:  8h 41m | Avg: 32m 35s | Max: 55m 37s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 44m | Avg: 54m 56s | Max: 59m 06s
      🟩 MSVC14.16          Pass: 100%/1   | Total:  1h 02m | Avg:  1h 02m | Max:  1h 02m | Hits:  29%/729   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 01m | Hits:  29%/1458  
      🟩 MSVC14.39          Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m | Hits:  29%/729   
    🟩 cxx_family
      🟩 Clang              Pass: 100%/46  | Total:  1d 13h | Avg: 49m 02s | Max: 55m 22s
      🟩 GCC                Pass: 100%/51  | Total:  1d 13h | Avg: 44m 39s | Max:  1h 05m
      🟩 Intel              Pass: 100%/3   | Total:  2h 44m | Avg: 54m 56s | Max: 59m 06s
      🟩 MSVC               Pass: 100%/4   | Total:  4h 08m | Avg:  1h 02m | Max:  1h 03m | Hits:  29%/2916  
    🟩 gpu
      🟩 v100               Pass: 100%/104 | Total:  3d 10h | Avg: 47m 34s | Max:  1h 05m | Hits:  29%/2916  
    🟩 jobs
      🟩 Build              Pass: 100%/96  | Total:  3d 07h | Avg: 49m 52s | Max:  1h 05m | Hits:  29%/2916  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 17m 57s | Avg: 17m 57s | Max: 17m 57s
      🟩 GraphCapture       Pass: 100%/1   | Total: 15m 00s | Avg: 15m 00s | Max: 15m 00s
      🟩 HostLaunch         Pass: 100%/3   | Total: 52m 15s | Avg: 17m 25s | Max: 18m 12s
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 13m | Avg: 24m 26s | Max: 25m 31s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  3h 15m | Avg:  1h 05m | Max:  1h 05m
      🟩 90a                Pass: 100%/4   | Total:  1h 28m | Avg: 22m 10s | Max: 25m 18s
    🟩 std
      🟩 11                 Pass: 100%/28  | Total: 21h 46m | Avg: 46m 39s | Max:  1h 04m
      🟩 14                 Pass: 100%/27  | Total: 22h 31m | Avg: 50m 03s | Max:  1h 05m | Hits:  29%/1458  
      🟩 17                 Pass: 100%/26  | Total: 21h 48m | Avg: 50m 20s | Max:  1h 05m | Hits:  29%/729   
      🟩 20                 Pass: 100%/23  | Total: 16h 20m | Avg: 42m 36s | Max:  1h 03m | Hits:  29%/729   
    
  • 🟩 thrust: Pass: 100%/103 | Total: 2d 03h | Avg: 29m 47s | Max: 1h 03m | Hits: 72%/11150

    🟩 cpu
      🟩 amd64              Pass: 100%/95  | Total:  1d 23h | Avg: 29m 50s | Max:  1h 03m | Hits:  72%/11150 
      🟩 arm64              Pass: 100%/8   | Total:  3h 53m | Avg: 29m 09s | Max: 33m 11s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  7h 25m | Avg: 29m 41s | Max:  1h 03m | Hits:  65%/2230  
      🟩 11.8               Pass: 100%/3   | Total:  1h 52m | Avg: 37m 26s | Max: 41m 16s
      🟩 12.6               Pass: 100%/85  | Total:  1d 17h | Avg: 29m 32s | Max: 58m 57s | Hits:  74%/8920  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 48m 56s | Avg: 24m 28s | Max: 24m 54s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  7h 25m | Avg: 29m 41s | Max:  1h 03m | Hits:  65%/2230  
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 52m | Avg: 37m 26s | Max: 41m 16s
      🟩 nvcc12.6           Pass: 100%/83  | Total:  1d 17h | Avg: 29m 40s | Max: 58m 57s | Hits:  74%/8920  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 48m 56s | Avg: 24m 28s | Max: 24m 54s
      🟩 nvcc               Pass: 100%/101 | Total:  2d 02h | Avg: 29m 54s | Max:  1h 03m | Hits:  72%/11150 
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  2h 48m | Avg: 28m 04s | Max: 32m 25s
      🟩 Clang10            Pass: 100%/3   | Total:  1h 30m | Avg: 30m 08s | Max: 32m 53s
      🟩 Clang11            Pass: 100%/4   | Total:  2h 03m | Avg: 30m 58s | Max: 35m 32s
      🟩 Clang12            Pass: 100%/4   | Total:  1h 56m | Avg: 29m 04s | Max: 31m 03s
      🟩 Clang13            Pass: 100%/4   | Total:  2h 00m | Avg: 30m 11s | Max: 35m 47s
      🟩 Clang14            Pass: 100%/4   | Total:  2h 03m | Avg: 30m 48s | Max: 32m 19s
      🟩 Clang15            Pass: 100%/4   | Total:  1h 58m | Avg: 29m 42s | Max: 32m 13s
      🟩 Clang16            Pass: 100%/4   | Total:  1h 59m | Avg: 29m 47s | Max: 32m 21s
      🟩 Clang17            Pass: 100%/4   | Total:  2h 02m | Avg: 30m 34s | Max: 37m 01s
      🟩 Clang18            Pass: 100%/9   | Total:  3h 36m | Avg: 24m 02s | Max: 31m 52s
      🟩 GCC6               Pass: 100%/2   | Total: 51m 12s | Avg: 25m 36s | Max: 28m 20s
      🟩 GCC7               Pass: 100%/6   | Total:  2h 50m | Avg: 28m 26s | Max: 32m 23s
      🟩 GCC8               Pass: 100%/6   | Total:  2h 57m | Avg: 29m 32s | Max: 35m 18s
      🟩 GCC9               Pass: 100%/6   | Total:  2h 54m | Avg: 29m 06s | Max: 34m 04s
      🟩 GCC10              Pass: 100%/4   | Total:  2h 10m | Avg: 32m 42s | Max: 36m 39s
      🟩 GCC11              Pass: 100%/7   | Total:  4h 00m | Avg: 34m 17s | Max: 41m 16s
      🟩 GCC12              Pass: 100%/4   | Total:  2h 21m | Avg: 35m 24s | Max: 39m 54s
      🟩 GCC13              Pass: 100%/14  | Total:  4h 49m | Avg: 20m 39s | Max: 34m 17s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 55m | Avg: 38m 29s | Max: 44m 19s
      🟩 MSVC14.16          Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m | Hits:  65%/2230  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 54m | Avg: 57m 01s | Max: 57m 22s | Hits:  65%/4460  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 21m | Avg: 40m 36s | Max: 58m 57s | Hits:  82%/4460  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/46  | Total: 21h 59m | Avg: 28m 41s | Max: 37m 01s
      🟩 GCC                Pass: 100%/49  | Total: 22h 55m | Avg: 28m 04s | Max: 41m 16s
      🟩 Intel              Pass: 100%/3   | Total:  1h 55m | Avg: 38m 29s | Max: 44m 19s
      🟩 MSVC               Pass: 100%/5   | Total:  4h 18m | Avg: 51m 42s | Max:  1h 03m | Hits:  72%/11150 
    🟩 gpu
      🟩 v100               Pass: 100%/103 | Total:  2d 03h | Avg: 29m 47s | Max:  1h 03m | Hits:  72%/11150 
    🟩 jobs
      🟩 Build              Pass: 100%/96  | Total:  2d 01h | Avg: 31m 05s | Max:  1h 03m | Hits:  65%/8920  
      🟩 TestCPU            Pass: 100%/4   | Total: 43m 29s | Avg: 10m 52s | Max: 22m 15s | Hits:  99%/2230  
      🟩 TestGPU            Pass: 100%/3   | Total: 40m 26s | Avg: 13m 28s | Max: 15m 05s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 52m | Avg: 37m 26s | Max: 41m 16s
      🟩 90a                Pass: 100%/4   | Total:  1h 09m | Avg: 17m 16s | Max: 19m 01s
    🟩 std
      🟩 11                 Pass: 100%/28  | Total: 11h 16m | Avg: 24m 09s | Max: 32m 03s
      🟩 14                 Pass: 100%/27  | Total: 14h 47m | Avg: 32m 52s | Max:  1h 03m | Hits:  65%/4460  
      🟩 17                 Pass: 100%/26  | Total: 14h 31m | Avg: 33m 30s | Max: 56m 41s | Hits:  65%/2230  
      🟩 20                 Pass: 100%/22  | Total: 10h 33m | Avg: 28m 48s | Max: 58m 57s | Hits:  82%/4460  
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 23m 00s | Avg: 23m 00s | Max: 23m 00s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 23m 00s | Avg: 23m 00s | Max: 23m 00s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 23m 00s | Avg: 23m 00s | Max: 23m 00s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 23m 00s | Avg: 23m 00s | Max: 23m 00s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 23m 00s | Avg: 23m 00s | Max: 23m 00s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 23m 00s | Avg: 23m 00s | Max: 23m 00s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 23m 00s | Avg: 23m 00s | Max: 23m 00s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 23m 00s | Avg: 23m 00s | Max: 23m 00s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 23m 00s | Avg: 23m 00s | Max: 23m 00s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
pycuda
CUDA C Core Library

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- pycuda
+/- CUDA C Core Library

🏃‍ Runner counts (total jobs: 208)

# Runner
171 linux-amd64-cpu16
16 linux-arm64-cpu16
12 linux-amd64-gpu-v100-latest-1
9 windows-amd64-cpu16

Contributor

🟩 CI finished in 1h 29m: Pass: 100%/208 | Total: 5d 14h | Avg: 38m 49s | Max: 1h 09m | Hits: 63%/14066
  • 🟩 cub: Pass: 100%/104 | Total: 3d 10h | Avg: 47m 49s | Max: 1h 09m | Hits: 29%/2916

    🟩 cpu
      🟩 amd64              Pass: 100%/96  | Total:  3d 03h | Avg: 47m 22s | Max:  1h 09m | Hits:  29%/2916  
      🟩 arm64              Pass: 100%/8   | Total:  7h 05m | Avg: 53m 08s | Max: 54m 27s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total: 11h 09m | Avg: 44m 37s | Max: 56m 22s | Hits:  29%/729   
      🟩 11.8               Pass: 100%/3   | Total:  3h 19m | Avg:  1h 06m | Max:  1h 09m
      🟩 12.6               Pass: 100%/86  | Total:  2d 20h | Avg: 47m 43s | Max:  1h 06m | Hits:  29%/2187  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 54m | Avg: 57m 13s | Max:  1h 00m
      🟩 nvcc11.1           Pass: 100%/15  | Total: 11h 09m | Avg: 44m 37s | Max: 56m 22s | Hits:  29%/729   
      🟩 nvcc11.8           Pass: 100%/3   | Total:  3h 19m | Avg:  1h 06m | Max:  1h 09m
      🟩 nvcc12.6           Pass: 100%/84  | Total:  2d 18h | Avg: 47m 29s | Max:  1h 06m | Hits:  29%/2187  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 54m | Avg: 57m 13s | Max:  1h 00m
      🟩 nvcc               Pass: 100%/102 | Total:  3d 08h | Avg: 47m 38s | Max:  1h 09m | Hits:  29%/2916  
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  4h 38m | Avg: 46m 26s | Max: 49m 53s
      🟩 Clang10            Pass: 100%/3   | Total:  2h 35m | Avg: 51m 56s | Max: 56m 10s
      🟩 Clang11            Pass: 100%/4   | Total:  3h 15m | Avg: 48m 57s | Max: 49m 24s
      🟩 Clang12            Pass: 100%/4   | Total:  3h 20m | Avg: 50m 03s | Max: 52m 12s
      🟩 Clang13            Pass: 100%/4   | Total:  3h 26m | Avg: 51m 38s | Max: 53m 46s
      🟩 Clang14            Pass: 100%/4   | Total:  3h 37m | Avg: 54m 29s | Max: 58m 48s
      🟩 Clang15            Pass: 100%/4   | Total:  3h 30m | Avg: 52m 31s | Max: 57m 09s
      🟩 Clang16            Pass: 100%/4   | Total:  3h 17m | Avg: 49m 16s | Max: 49m 39s
      🟩 Clang17            Pass: 100%/4   | Total:  3h 29m | Avg: 52m 16s | Max: 56m 11s
      🟩 Clang18            Pass: 100%/9   | Total:  6h 58m | Avg: 46m 31s | Max:  1h 00m
      🟩 GCC6               Pass: 100%/2   | Total:  1h 26m | Avg: 43m 05s | Max: 43m 10s
      🟩 GCC7               Pass: 100%/6   | Total:  4h 51m | Avg: 48m 37s | Max: 55m 39s
      🟩 GCC8               Pass: 100%/6   | Total:  4h 40m | Avg: 46m 47s | Max: 49m 43s
      🟩 GCC9               Pass: 100%/6   | Total:  4h 48m | Avg: 48m 00s | Max: 57m 57s
      🟩 GCC10              Pass: 100%/4   | Total:  3h 18m | Avg: 49m 43s | Max: 50m 43s
      🟩 GCC11              Pass: 100%/7   | Total:  6h 48m | Avg: 58m 18s | Max:  1h 09m
      🟩 GCC12              Pass: 100%/4   | Total:  3h 27m | Avg: 51m 56s | Max: 57m 54s
      🟩 GCC13              Pass: 100%/16  | Total:  8h 34m | Avg: 32m 10s | Max: 54m 27s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 39m | Avg: 53m 13s | Max: 54m 21s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 56m 22s | Avg: 56m 22s | Max: 56m 22s | Hits:  29%/729   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 07m | Avg:  1h 03m | Max:  1h 06m | Hits:  29%/1458  
      🟩 MSVC14.39          Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m | Hits:  29%/729   
    🟩 cxx_family
      🟩 Clang              Pass: 100%/46  | Total:  1d 14h | Avg: 49m 47s | Max:  1h 00m
      🟩 GCC                Pass: 100%/51  | Total:  1d 13h | Avg: 44m 37s | Max:  1h 09m
      🟩 Intel              Pass: 100%/3   | Total:  2h 39m | Avg: 53m 13s | Max: 54m 21s
      🟩 MSVC               Pass: 100%/4   | Total:  4h 07m | Avg:  1h 01m | Max:  1h 06m | Hits:  29%/2916  
    🟩 gpu
      🟩 v100               Pass: 100%/104 | Total:  3d 10h | Avg: 47m 49s | Max:  1h 09m | Hits:  29%/2916  
    🟩 jobs
      🟩 Build              Pass: 100%/96  | Total:  3d 08h | Avg: 50m 10s | Max:  1h 09m | Hits:  29%/2916  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 19m 35s | Avg: 19m 35s | Max: 19m 35s
      🟩 GraphCapture       Pass: 100%/1   | Total: 17m 27s | Avg: 17m 27s | Max: 17m 27s
      🟩 HostLaunch         Pass: 100%/3   | Total: 53m 30s | Avg: 17m 50s | Max: 19m 52s
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 05m | Avg: 21m 58s | Max: 24m 03s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  3h 19m | Avg:  1h 06m | Max:  1h 09m
      🟩 90a                Pass: 100%/4   | Total:  1h 26m | Avg: 21m 38s | Max: 22m 09s
    🟩 std
      🟩 11                 Pass: 100%/28  | Total: 21h 32m | Avg: 46m 08s | Max:  1h 04m
      🟩 14                 Pass: 100%/27  | Total: 22h 42m | Avg: 50m 27s | Max:  1h 06m | Hits:  29%/1458  
      🟩 17                 Pass: 100%/26  | Total: 22h 14m | Avg: 51m 19s | Max:  1h 09m | Hits:  29%/729   
      🟩 20                 Pass: 100%/23  | Total: 16h 24m | Avg: 42m 48s | Max:  1h 03m | Hits:  29%/729   
    
  • 🟩 thrust: Pass: 100%/103 | Total: 2d 03h | Avg: 29m 59s | Max: 1h 03m | Hits: 72%/11150

    🟩 cpu
      🟩 amd64              Pass: 100%/95  | Total:  1d 23h | Avg: 30m 02s | Max:  1h 03m | Hits:  72%/11150 
      🟩 arm64              Pass: 100%/8   | Total:  3h 54m | Avg: 29m 21s | Max: 32m 41s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  7h 21m | Avg: 29m 25s | Max: 55m 00s | Hits:  65%/2230  
      🟩 11.8               Pass: 100%/3   | Total:  1h 54m | Avg: 38m 03s | Max: 41m 23s
      🟩 12.6               Pass: 100%/85  | Total:  1d 18h | Avg: 29m 48s | Max:  1h 03m | Hits:  74%/8920  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 52m 50s | Avg: 26m 25s | Max: 28m 57s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  7h 21m | Avg: 29m 25s | Max: 55m 00s | Hits:  65%/2230  
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 54m | Avg: 38m 03s | Max: 41m 23s
      🟩 nvcc12.6           Pass: 100%/83  | Total:  1d 17h | Avg: 29m 53s | Max:  1h 03m | Hits:  74%/8920  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 52m 50s | Avg: 26m 25s | Max: 28m 57s
      🟩 nvcc               Pass: 100%/101 | Total:  2d 02h | Avg: 30m 03s | Max:  1h 03m | Hits:  72%/11150 
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  2h 56m | Avg: 29m 20s | Max: 35m 49s
      🟩 Clang10            Pass: 100%/3   | Total:  1h 30m | Avg: 30m 10s | Max: 32m 51s
      🟩 Clang11            Pass: 100%/4   | Total:  2h 00m | Avg: 30m 09s | Max: 32m 05s
      🟩 Clang12            Pass: 100%/4   | Total:  1h 57m | Avg: 29m 20s | Max: 31m 16s
      🟩 Clang13            Pass: 100%/4   | Total:  2h 05m | Avg: 31m 24s | Max: 37m 28s
      🟩 Clang14            Pass: 100%/4   | Total:  2h 03m | Avg: 30m 49s | Max: 32m 46s
      🟩 Clang15            Pass: 100%/4   | Total:  1h 59m | Avg: 29m 50s | Max: 32m 11s
      🟩 Clang16            Pass: 100%/4   | Total:  1h 57m | Avg: 29m 29s | Max: 32m 03s
      🟩 Clang17            Pass: 100%/4   | Total:  2h 02m | Avg: 30m 32s | Max: 32m 31s
      🟩 Clang18            Pass: 100%/9   | Total:  3h 37m | Avg: 24m 10s | Max: 31m 49s
      🟩 GCC6               Pass: 100%/2   | Total: 53m 52s | Avg: 26m 56s | Max: 28m 43s
      🟩 GCC7               Pass: 100%/6   | Total:  2h 56m | Avg: 29m 24s | Max: 34m 45s
      🟩 GCC8               Pass: 100%/6   | Total:  2h 50m | Avg: 28m 20s | Max: 31m 49s
      🟩 GCC9               Pass: 100%/6   | Total:  3h 00m | Avg: 30m 02s | Max: 37m 51s
      🟩 GCC10              Pass: 100%/4   | Total:  2h 02m | Avg: 30m 36s | Max: 33m 35s
      🟩 GCC11              Pass: 100%/7   | Total:  3h 57m | Avg: 33m 54s | Max: 41m 23s
      🟩 GCC12              Pass: 100%/4   | Total:  2h 11m | Avg: 32m 53s | Max: 39m 43s
      🟩 GCC13              Pass: 100%/14  | Total:  5h 17m | Avg: 22m 41s | Max: 35m 32s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 52m | Avg: 37m 29s | Max: 40m 24s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 55m 00s | Avg: 55m 00s | Max: 55m 00s | Hits:  65%/2230  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 59m | Avg: 59m 41s | Max:  1h 03m | Hits:  65%/4460  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 21m | Avg: 40m 57s | Max: 59m 13s | Hits:  82%/4460  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/46  | Total: 22h 10m | Avg: 28m 55s | Max: 37m 28s
      🟩 GCC                Pass: 100%/49  | Total: 23h 09m | Avg: 28m 21s | Max: 41m 23s
      🟩 Intel              Pass: 100%/3   | Total:  1h 52m | Avg: 37m 29s | Max: 40m 24s
      🟩 MSVC               Pass: 100%/5   | Total:  4h 16m | Avg: 51m 15s | Max:  1h 03m | Hits:  72%/11150 
    🟩 gpu
      🟩 v100               Pass: 100%/103 | Total:  2d 03h | Avg: 29m 59s | Max:  1h 03m | Hits:  72%/11150 
    🟩 jobs
      🟩 Build              Pass: 100%/96  | Total:  2d 01h | Avg: 31m 06s | Max:  1h 03m | Hits:  65%/8920  
      🟩 TestCPU            Pass: 100%/4   | Total:  1h 07m | Avg: 16m 50s | Max: 30m 10s | Hits:  99%/2230  
      🟩 TestGPU            Pass: 100%/3   | Total: 34m 58s | Avg: 11m 39s | Max: 12m 28s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 54m | Avg: 38m 03s | Max: 41m 23s
      🟩 90a                Pass: 100%/4   | Total:  1h 12m | Avg: 18m 03s | Max: 20m 23s
    🟩 std
      🟩 11                 Pass: 100%/28  | Total: 11h 53m | Avg: 25m 28s | Max: 33m 48s
      🟩 14                 Pass: 100%/27  | Total: 14h 52m | Avg: 33m 03s | Max:  1h 03m | Hits:  65%/4460  
      🟩 17                 Pass: 100%/26  | Total: 14h 12m | Avg: 32m 47s | Max: 55m 52s | Hits:  65%/2230  
      🟩 20                 Pass: 100%/22  | Total: 10h 30m | Avg: 28m 40s | Max: 59m 13s | Hits:  82%/4460  
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 13m 28s | Avg: 13m 28s | Max: 13m 28s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 13m 28s | Avg: 13m 28s | Max: 13m 28s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 13m 28s | Avg: 13m 28s | Max: 13m 28s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 13m 28s | Avg: 13m 28s | Max: 13m 28s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 13m 28s | Avg: 13m 28s | Max: 13m 28s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 13m 28s | Avg: 13m 28s | Max: 13m 28s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 13m 28s | Avg: 13m 28s | Max: 13m 28s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 13m 28s | Avg: 13m 28s | Max: 13m 28s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 13m 28s | Avg: 13m 28s | Max: 13m 28s
    

👃 Inspect Changes

Modifications in project?

Project               Modified
CCCL Infrastructure
libcu++
CUB                   +/-
Thrust
CUDA Experimental
pycuda
CUDA C Core Library

Modifications in project or dependencies?

Project               Modified
CCCL Infrastructure
libcu++
CUB                   +/-
Thrust                +/-
CUDA Experimental
pycuda                +/-
CUDA C Core Library   +/-

🏃‍ Runner counts (total jobs: 208)

# Runner
171 linux-amd64-cpu16
16 linux-arm64-cpu16
12 linux-amd64-gpu-v100-latest-1
9 windows-amd64-cpu16

cub/cub/device/device_scan.cuh (review thread, resolved)
cub/cub/thread/thread_operators.cuh (review thread, outdated, resolved)
@elstehle requested a review from a team as a code owner on October 8, 2024, 12:28
Contributor

github-actions bot commented Oct 8, 2024

🟩 CI finished in 1h 39m: Pass: 100%/208 | Total: 5d 17h | Avg: 39m 36s | Max: 1h 13m | Hits: 65%/16011
  • 🟩 cub: Pass: 100%/104 | Total: 3d 11h | Avg: 47m 57s | Max: 1h 13m | Hits: 29%/2916

    🟩 cpu
      🟩 amd64              Pass: 100%/96  | Total:  3d 03h | Avg: 47m 25s | Max:  1h 13m | Hits:  29%/2916  
      🟩 arm64              Pass: 100%/8   | Total:  7h 15m | Avg: 54m 26s | Max: 59m 00s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total: 11h 13m | Avg: 44m 53s | Max: 57m 19s | Hits:  29%/729   
      🟩 11.8               Pass: 100%/3   | Total:  3h 25m | Avg:  1h 08m | Max:  1h 13m
      🟩 12.6               Pass: 100%/86  | Total:  2d 20h | Avg: 47m 46s | Max:  1h 13m | Hits:  29%/2187  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  1h 56m | Avg: 58m 03s | Max: 59m 35s
      🟩 nvcc11.1           Pass: 100%/15  | Total: 11h 13m | Avg: 44m 53s | Max: 57m 19s | Hits:  29%/729   
      🟩 nvcc11.8           Pass: 100%/3   | Total:  3h 25m | Avg:  1h 08m | Max:  1h 13m
      🟩 nvcc12.6           Pass: 100%/84  | Total:  2d 18h | Avg: 47m 32s | Max:  1h 13m | Hits:  29%/2187  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 56m | Avg: 58m 03s | Max: 59m 35s
      🟩 nvcc               Pass: 100%/102 | Total:  3d 09h | Avg: 47m 45s | Max:  1h 13m | Hits:  29%/2916  
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  4h 54m | Avg: 49m 05s | Max: 56m 48s
      🟩 Clang10            Pass: 100%/3   | Total:  2h 28m | Avg: 49m 38s | Max: 50m 07s
      🟩 Clang11            Pass: 100%/4   | Total:  3h 18m | Avg: 49m 30s | Max: 50m 05s
      🟩 Clang12            Pass: 100%/4   | Total:  3h 22m | Avg: 50m 41s | Max: 55m 05s
      🟩 Clang13            Pass: 100%/4   | Total:  3h 16m | Avg: 49m 01s | Max: 49m 10s
      🟩 Clang14            Pass: 100%/4   | Total:  3h 24m | Avg: 51m 07s | Max: 57m 14s
      🟩 Clang15            Pass: 100%/4   | Total:  3h 24m | Avg: 51m 10s | Max: 57m 31s
      🟩 Clang16            Pass: 100%/4   | Total:  3h 18m | Avg: 49m 32s | Max: 50m 12s
      🟩 Clang17            Pass: 100%/4   | Total:  3h 25m | Avg: 51m 26s | Max: 56m 00s
      🟩 Clang18            Pass: 100%/9   | Total:  7h 07m | Avg: 47m 29s | Max: 59m 35s
      🟩 GCC6               Pass: 100%/2   | Total:  1h 26m | Avg: 43m 06s | Max: 43m 12s
      🟩 GCC7               Pass: 100%/6   | Total:  4h 39m | Avg: 46m 32s | Max: 50m 58s
      🟩 GCC8               Pass: 100%/6   | Total:  4h 40m | Avg: 46m 49s | Max: 49m 12s
      🟩 GCC9               Pass: 100%/6   | Total:  4h 46m | Avg: 47m 48s | Max: 56m 07s
      🟩 GCC10              Pass: 100%/4   | Total:  3h 36m | Avg: 54m 12s | Max: 58m 39s
      🟩 GCC11              Pass: 100%/7   | Total:  6h 54m | Avg: 59m 12s | Max:  1h 13m
      🟩 GCC12              Pass: 100%/4   | Total:  3h 28m | Avg: 52m 02s | Max: 54m 53s
      🟩 GCC13              Pass: 100%/16  | Total:  8h 32m | Avg: 32m 01s | Max: 55m 50s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  2h 49m | Avg: 56m 34s | Max:  1h 00m
      🟩 MSVC14.16          Pass: 100%/1   | Total: 57m 19s | Avg: 57m 19s | Max: 57m 19s | Hits:  29%/729   
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 01m | Avg:  1h 00m | Max:  1h 00m | Hits:  29%/1458  
      🟩 MSVC14.39          Pass: 100%/1   | Total:  1h 13m | Avg:  1h 13m | Max:  1h 13m | Hits:  29%/729   
    🟩 cxx_family
      🟩 Clang              Pass: 100%/46  | Total:  1d 14h | Avg: 49m 35s | Max: 59m 35s
      🟩 GCC                Pass: 100%/51  | Total:  1d 14h | Avg: 44m 48s | Max:  1h 13m
      🟩 Intel              Pass: 100%/3   | Total:  2h 49m | Avg: 56m 34s | Max:  1h 00m
      🟩 MSVC               Pass: 100%/4   | Total:  4h 12m | Avg:  1h 03m | Max:  1h 13m | Hits:  29%/2916  
    🟩 gpu
      🟩 v100               Pass: 100%/104 | Total:  3d 11h | Avg: 47m 57s | Max:  1h 13m | Hits:  29%/2916  
    🟩 jobs
      🟩 Build              Pass: 100%/96  | Total:  3d 08h | Avg: 50m 25s | Max:  1h 13m | Hits:  29%/2916  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 16m 13s | Avg: 16m 13s | Max: 16m 13s
      🟩 GraphCapture       Pass: 100%/1   | Total: 13m 59s | Avg: 13m 59s | Max: 13m 59s
      🟩 HostLaunch         Pass: 100%/3   | Total: 52m 27s | Avg: 17m 29s | Max: 19m 07s
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 04m | Avg: 21m 38s | Max: 23m 50s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  3h 25m | Avg:  1h 08m | Max:  1h 13m
      🟩 90a                Pass: 100%/4   | Total:  1h 28m | Avg: 22m 07s | Max: 24m 20s
    🟩 std
      🟩 11                 Pass: 100%/28  | Total: 21h 52m | Avg: 46m 51s | Max:  1h 13m
      🟩 14                 Pass: 100%/27  | Total: 22h 45m | Avg: 50m 35s | Max:  1h 06m | Hits:  29%/1458  
      🟩 17                 Pass: 100%/26  | Total: 21h 39m | Avg: 49m 58s | Max:  1h 05m | Hits:  29%/729   
      🟩 20                 Pass: 100%/23  | Total: 16h 50m | Avg: 43m 56s | Max:  1h 13m | Hits:  29%/729   
    
  • 🟩 thrust: Pass: 100%/103 | Total: 2d 05h | Avg: 31m 24s | Max: 1h 09m | Hits: 74%/13095

    🟩 cpu
      🟩 amd64              Pass: 100%/95  | Total:  2d 01h | Avg: 31m 28s | Max:  1h 09m | Hits:  74%/13095 
      🟩 arm64              Pass: 100%/8   | Total:  4h 04m | Avg: 30m 33s | Max: 34m 15s
    🟩 ctk
      🟩 11.1               Pass: 100%/15  | Total:  7h 43m | Avg: 30m 54s | Max: 57m 53s | Hits:  67%/2619  
      🟩 11.8               Pass: 100%/3   | Total:  1h 59m | Avg: 39m 59s | Max: 45m 49s
      🟩 12.6               Pass: 100%/85  | Total:  1d 20h | Avg: 31m 11s | Max:  1h 09m | Hits:  75%/10476 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 52m 17s | Avg: 26m 08s | Max: 26m 15s
      🟩 nvcc11.1           Pass: 100%/15  | Total:  7h 43m | Avg: 30m 54s | Max: 57m 53s | Hits:  67%/2619  
      🟩 nvcc11.8           Pass: 100%/3   | Total:  1h 59m | Avg: 39m 59s | Max: 45m 49s
      🟩 nvcc12.6           Pass: 100%/83  | Total:  1d 19h | Avg: 31m 18s | Max:  1h 09m | Hits:  75%/10476 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 52m 17s | Avg: 26m 08s | Max: 26m 15s
      🟩 nvcc               Pass: 100%/101 | Total:  2d 05h | Avg: 31m 30s | Max:  1h 09m | Hits:  74%/13095 
    🟩 cxx
      🟩 Clang9             Pass: 100%/6   | Total:  3h 05m | Avg: 30m 50s | Max: 35m 04s
      🟩 Clang10            Pass: 100%/3   | Total:  1h 34m | Avg: 31m 27s | Max: 34m 18s
      🟩 Clang11            Pass: 100%/4   | Total:  2h 06m | Avg: 31m 37s | Max: 35m 30s
      🟩 Clang12            Pass: 100%/4   | Total:  2h 07m | Avg: 31m 58s | Max: 37m 11s
      🟩 Clang13            Pass: 100%/4   | Total:  2h 06m | Avg: 31m 35s | Max: 35m 30s
      🟩 Clang14            Pass: 100%/4   | Total:  2h 08m | Avg: 32m 04s | Max: 34m 43s
      🟩 Clang15            Pass: 100%/4   | Total:  2h 03m | Avg: 30m 49s | Max: 33m 26s
      🟩 Clang16            Pass: 100%/4   | Total:  2h 10m | Avg: 32m 40s | Max: 38m 21s
      🟩 Clang17            Pass: 100%/4   | Total:  2h 13m | Avg: 33m 27s | Max: 39m 01s
      🟩 Clang18            Pass: 100%/9   | Total:  3h 45m | Avg: 25m 00s | Max: 33m 50s
      🟩 GCC6               Pass: 100%/2   | Total: 55m 10s | Avg: 27m 35s | Max: 30m 36s
      🟩 GCC7               Pass: 100%/6   | Total:  3h 05m | Avg: 30m 50s | Max: 37m 13s
      🟩 GCC8               Pass: 100%/6   | Total:  3h 02m | Avg: 30m 27s | Max: 35m 47s
      🟩 GCC9               Pass: 100%/6   | Total:  3h 07m | Avg: 31m 15s | Max: 40m 00s
      🟩 GCC10              Pass: 100%/4   | Total:  2h 09m | Avg: 32m 27s | Max: 34m 58s
      🟩 GCC11              Pass: 100%/7   | Total:  4h 16m | Avg: 36m 39s | Max: 45m 49s
      🟩 GCC12              Pass: 100%/4   | Total:  2h 23m | Avg: 35m 47s | Max: 41m 55s
      🟩 GCC13              Pass: 100%/14  | Total:  5h 12m | Avg: 22m 20s | Max: 36m 28s
      🟩 Intel2023.2.0      Pass: 100%/3   | Total:  1h 53m | Avg: 37m 55s | Max: 42m 04s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 57m 53s | Avg: 57m 53s | Max: 57m 53s | Hits:  67%/2619  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 55m | Avg: 57m 57s | Max: 59m 32s | Hits:  67%/5238  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 33m | Avg: 46m 39s | Max:  1h 09m | Hits:  83%/5238  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/46  | Total: 23h 21m | Avg: 30m 27s | Max: 39m 01s
      🟩 GCC                Pass: 100%/49  | Total:  1d 00h | Avg: 29m 38s | Max: 45m 49s
      🟩 Intel              Pass: 100%/3   | Total:  1h 53m | Avg: 37m 55s | Max: 42m 04s
      🟩 MSVC               Pass: 100%/5   | Total:  4h 27m | Avg: 53m 25s | Max:  1h 09m | Hits:  74%/13095 
    🟩 gpu
      🟩 v100               Pass: 100%/103 | Total:  2d 05h | Avg: 31m 24s | Max:  1h 09m | Hits:  74%/13095 
    🟩 jobs
      🟩 Build              Pass: 100%/96  | Total:  2d 04h | Avg: 32m 45s | Max:  1h 09m | Hits:  67%/10476 
      🟩 TestCPU            Pass: 100%/4   | Total: 48m 38s | Avg: 12m 09s | Max: 23m 34s | Hits:  99%/2619  
      🟩 TestGPU            Pass: 100%/3   | Total: 42m 00s | Avg: 14m 00s | Max: 14m 33s
    🟩 sm
      🟩 60;70;80;90        Pass: 100%/3   | Total:  1h 59m | Avg: 39m 59s | Max: 45m 49s
      🟩 90a                Pass: 100%/4   | Total:  1h 15m | Avg: 18m 50s | Max: 20m 11s
    🟩 std
      🟩 11                 Pass: 100%/28  | Total: 11h 54m | Avg: 25m 31s | Max: 34m 00s
      🟩 14                 Pass: 100%/27  | Total: 15h 21m | Avg: 34m 07s | Max: 57m 53s | Hits:  67%/5238  
      🟩 17                 Pass: 100%/26  | Total: 15h 19m | Avg: 35m 22s | Max: 59m 32s | Hits:  67%/2619  
      🟩 20                 Pass: 100%/22  | Total: 11h 19m | Avg: 30m 53s | Max:  1h 09m | Hits:  83%/5238  
    
  • 🟩 pycuda: Pass: 100%/1 | Total: 14m 25s | Avg: 14m 25s | Max: 14m 25s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 14m 25s | Avg: 14m 25s | Max: 14m 25s
    🟩 ctk
      🟩 12.5               Pass: 100%/1   | Total: 14m 25s | Avg: 14m 25s | Max: 14m 25s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/1   | Total: 14m 25s | Avg: 14m 25s | Max: 14m 25s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 14m 25s | Avg: 14m 25s | Max: 14m 25s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 14m 25s | Avg: 14m 25s | Max: 14m 25s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 14m 25s | Avg: 14m 25s | Max: 14m 25s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 14m 25s | Avg: 14m 25s | Max: 14m 25s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 14m 25s | Avg: 14m 25s | Max: 14m 25s
    

👃 Inspect Changes

Modifications in project?

Project               Modified
CCCL Infrastructure
libcu++
CUB                   +/-
Thrust                +/-
CUDA Experimental
pycuda
CUDA C Core Library

Modifications in project or dependencies?

Project               Modified
CCCL Infrastructure
libcu++
CUB                   +/-
Thrust                +/-
CUDA Experimental
pycuda                +/-
CUDA C Core Library   +/-

🏃‍ Runner counts (total jobs: 208)

# Runner
171 linux-amd64-cpu16
16 linux-arm64-cpu16
12 linux-amd64-gpu-v100-latest-1
9 windows-amd64-cpu16

@elstehle merged commit 951c822 into NVIDIA:main on Oct 8, 2024
224 checks passed

Merging this pull request may close the following issue: Add support for large num_items to DeviceScan::*ByKey
2 participants