Fix for in-place DeviceSelect & thrust::remove_if #1782
Comparing before/after adding a
T{ct} | OffsetT{ct} | Elements{io} | Entropy | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
---|---|---|---|---|---|---|---|---|---|---|
I8 | I32 | 2^16 | 1 | 8.912 us | 8.58% | 9.099 us | 8.08% | 0.187 us | 2.10% | PASS |
I8 | I32 | 2^20 | 1 | 15.293 us | 4.58% | 15.708 us | 4.12% | 0.414 us | 2.71% | PASS |
I8 | I32 | 2^24 | 1 | 106.539 us | 0.98% | 115.718 us | 1.00% | 9.180 us | 8.62% | FAIL |
I8 | I32 | 2^28 | 1 | 1.581 ms | 0.50% | 1.729 ms | 0.50% | 148.923 us | 9.42% | FAIL |
I8 | I32 | 2^16 | 0.544 | 8.770 us | 8.29% | 8.924 us | 7.26% | 0.154 us | 1.76% | PASS |
I8 | I32 | 2^20 | 0.544 | 14.762 us | 4.53% | 15.448 us | 4.10% | 0.686 us | 4.64% | FAIL |
I8 | I32 | 2^24 | 0.544 | 98.968 us | 0.94% | 108.532 us | 0.75% | 9.565 us | 9.66% | FAIL |
I8 | I32 | 2^28 | 0.544 | 1.451 ms | 0.50% | 1.605 ms | 0.50% | 154.071 us | 10.62% | FAIL |
I8 | I32 | 2^16 | 0 | 8.502 us | 7.23% | 8.728 us | 7.50% | 0.226 us | 2.66% | PASS |
I8 | I32 | 2^20 | 0 | 14.359 us | 4.55% | 14.844 us | 4.38% | 0.486 us | 3.38% | PASS |
I8 | I32 | 2^24 | 0 | 90.417 us | 0.81% | 99.528 us | 0.69% | 9.111 us | 10.08% | FAIL |
I8 | I32 | 2^28 | 0 | 1.291 ms | 0.46% | 1.441 ms | 0.50% | 150.519 us | 11.66% | FAIL |
I8 | I64 | 2^16 | 1 | 8.965 us | 6.84% | 9.179 us | 5.87% | 0.214 us | 2.39% | PASS |
I8 | I64 | 2^20 | 1 | 15.673 us | 4.10% | 15.930 us | 4.18% | 0.257 us | 1.64% | PASS |
I8 | I64 | 2^24 | 1 | 111.868 us | 0.87% | 119.288 us | 0.84% | 7.420 us | 6.63% | FAIL |
I8 | I64 | 2^28 | 1 | 1.661 ms | 0.50% | 1.789 ms | 0.50% | 128.483 us | 7.74% | FAIL |
I8 | I64 | 2^16 | 0.544 | 8.893 us | 7.34% | 9.142 us | 6.28% | 0.249 us | 2.80% | PASS |
I8 | I64 | 2^20 | 0.544 | 15.170 us | 4.08% | 15.688 us | 4.26% | 0.518 us | 3.41% | PASS |
I8 | I64 | 2^24 | 0.544 | 105.205 us | 0.81% | 112.016 us | 0.68% | 6.810 us | 6.47% | FAIL |
I8 | I64 | 2^28 | 0.544 | 1.551 ms | 0.50% | 1.663 ms | 0.50% | 111.505 us | 7.19% | FAIL |
I8 | I64 | 2^16 | 0 | 8.703 us | 8.24% | 8.870 us | 7.99% | 0.167 us | 1.92% | PASS |
I8 | I64 | 2^20 | 0 | 14.839 us | 4.75% | 15.088 us | 4.30% | 0.249 us | 1.68% | PASS |
I8 | I64 | 2^24 | 0 | 95.512 us | 0.71% | 102.365 us | 0.70% | 6.853 us | 7.17% | FAIL |
I8 | I64 | 2^28 | 0 | 1.366 ms | 0.50% | 1.485 ms | 0.50% | 118.661 us | 8.68% | FAIL |
I16 | I32 | 2^16 | 1 | 9.028 us | 6.83% | 9.188 us | 5.94% | 0.160 us | 1.77% | PASS |
I16 | I32 | 2^20 | 1 | 16.606 us | 3.59% | 16.921 us | 4.21% | 0.315 us | 1.90% | PASS |
I16 | I32 | 2^24 | 1 | 125.902 us | 1.15% | 133.191 us | 1.13% | 7.289 us | 5.79% | FAIL |
I16 | I32 | 2^28 | 1 | 1.873 ms | 0.50% | 1.989 ms | 0.50% | 115.531 us | 6.17% | FAIL |
I16 | I32 | 2^16 | 0.544 | 8.950 us | 7.57% | 9.086 us | 7.09% | 0.135 us | 1.51% | PASS |
I16 | I32 | 2^20 | 0.544 | 16.164 us | 3.91% | 16.595 us | 3.99% | 0.431 us | 2.67% | PASS |
I16 | I32 | 2^24 | 0.544 | 115.171 us | 1.05% | 122.809 us | 1.01% | 7.638 us | 6.63% | FAIL |
I16 | I32 | 2^28 | 0.544 | 1.701 ms | 0.50% | 1.816 ms | 0.50% | 115.545 us | 6.79% | FAIL |
I16 | I32 | 2^16 | 0 | 8.635 us | 8.25% | 8.814 us | 7.41% | 0.179 us | 2.08% | PASS |
I16 | I32 | 2^20 | 0 | 15.912 us | 4.21% | 16.145 us | 3.69% | 0.233 us | 1.46% | PASS |
I16 | I32 | 2^24 | 0 | 95.870 us | 0.73% | 105.086 us | 0.71% | 9.216 us | 9.61% | FAIL |
I16 | I32 | 2^28 | 0 | 1.349 ms | 0.50% | 1.513 ms | 0.50% | 163.377 us | 12.11% | FAIL |
I16 | I64 | 2^16 | 1 | 9.016 us | 7.24% | 9.291 us | 5.60% | 0.275 us | 3.05% | PASS |
I16 | I64 | 2^20 | 1 | 16.798 us | 3.86% | 17.119 us | 3.84% | 0.321 us | 1.91% | PASS |
I16 | I64 | 2^24 | 1 | 128.175 us | 0.99% | 135.671 us | 0.97% | 7.496 us | 5.85% | FAIL |
I16 | I64 | 2^28 | 1 | 1.904 ms | 0.50% | 2.019 ms | 0.50% | 114.364 us | 6.01% | FAIL |
I16 | I64 | 2^16 | 0.544 | 8.978 us | 6.69% | 9.220 us | 6.03% | 0.243 us | 2.70% | PASS |
I16 | I64 | 2^20 | 0.544 | 16.520 us | 3.82% | 16.980 us | 3.91% | 0.460 us | 2.78% | PASS |
I16 | I64 | 2^24 | 0.544 | 118.118 us | 0.88% | 125.303 us | 0.88% | 7.185 us | 6.08% | FAIL |
I16 | I64 | 2^28 | 0.544 | 1.741 ms | 0.50% | 1.852 ms | 0.50% | 110.498 us | 6.35% | FAIL |
I16 | I64 | 2^16 | 0 | 8.777 us | 7.58% | 8.969 us | 6.92% | 0.191 us | 2.18% | PASS |
I16 | I64 | 2^20 | 0 | 16.226 us | 3.87% | 16.477 us | 4.28% | 0.251 us | 1.55% | PASS |
I16 | I64 | 2^24 | 0 | 100.467 us | 0.66% | 107.935 us | 0.65% | 7.468 us | 7.43% | FAIL |
I16 | I64 | 2^28 | 0 | 1.419 ms | 0.50% | 1.553 ms | 0.50% | 134.076 us | 9.45% | FAIL |
I32 | I32 | 2^16 | 1 | 9.036 us | 6.74% | 9.384 us | 6.72% | 0.347 us | 3.84% | PASS |
I32 | I32 | 2^20 | 1 | 19.874 us | 4.34% | 20.109 us | 3.50% | 0.235 us | 1.18% | PASS |
I32 | I32 | 2^24 | 1 | 184.482 us | 0.85% | 190.320 us | 1.21% | 5.839 us | 3.16% | FAIL |
I32 | I32 | 2^28 | 1 | 2.823 ms | 0.62% | 2.911 ms | 0.65% | 88.147 us | 3.12% | FAIL |
I32 | I32 | 2^16 | 0.544 | 9.070 us | 6.71% | 9.457 us | 5.97% | 0.388 us | 4.27% | PASS |
I32 | I32 | 2^20 | 0.544 | 19.686 us | 3.11% | 20.061 us | 3.58% | 0.375 us | 1.90% | PASS |
I32 | I32 | 2^24 | 0.544 | 155.884 us | 1.14% | 161.834 us | 1.42% | 5.950 us | 3.82% | FAIL |
I32 | I32 | 2^28 | 0.544 | 2.332 ms | 0.50% | 2.422 ms | 0.50% | 90.603 us | 3.89% | FAIL |
I32 | I32 | 2^16 | 0 | 8.830 us | 7.39% | 8.993 us | 6.19% | 0.163 us | 1.84% | PASS |
I32 | I32 | 2^20 | 0 | 18.978 us | 3.33% | 19.083 us | 3.85% | 0.105 us | 0.55% | PASS |
I32 | I32 | 2^24 | 0 | 113.539 us | 1.01% | 118.506 us | 1.01% | 4.967 us | 4.37% | FAIL |
I32 | I32 | 2^28 | 0 | 1.580 ms | 1.01% | 1.662 ms | 0.79% | 82.477 us | 5.22% | FAIL |
I32 | I64 | 2^16 | 1 | 9.162 us | 5.73% | 9.654 us | 6.73% | 0.492 us | 5.37% | PASS |
I32 | I64 | 2^20 | 1 | 20.099 us | 3.84% | 20.136 us | 3.34% | 0.036 us | 0.18% | PASS |
I32 | I64 | 2^24 | 1 | 186.217 us | 1.00% | 192.459 us | 1.44% | 6.242 us | 3.35% | FAIL |
I32 | I64 | 2^28 | 1 | 2.845 ms | 0.61% | 2.961 ms | 0.65% | 115.749 us | 4.07% | FAIL |
I32 | I64 | 2^16 | 0.544 | 9.205 us | 5.84% | 9.525 us | 6.77% | 0.320 us | 3.48% | PASS |
I32 | I64 | 2^20 | 0.544 | 20.076 us | 3.21% | 20.255 us | 3.65% | 0.179 us | 0.89% | PASS |
I32 | I64 | 2^24 | 0.544 | 157.555 us | 1.07% | 163.415 us | 1.32% | 5.861 us | 3.72% | FAIL |
I32 | I64 | 2^28 | 0.544 | 2.357 ms | 0.50% | 2.450 ms | 0.50% | 92.733 us | 3.93% | FAIL |
I32 | I64 | 2^16 | 0 | 8.937 us | 7.42% | 9.027 us | 6.74% | 0.090 us | 1.01% | PASS |
I32 | I64 | 2^20 | 0 | 19.320 us | 3.60% | 19.467 us | 3.22% | 0.148 us | 0.76% | PASS |
I32 | I64 | 2^24 | 0 | 117.161 us | 0.99% | 121.330 us | 0.94% | 4.169 us | 3.56% | FAIL |
I32 | I64 | 2^28 | 0 | 1.642 ms | 0.89% | 1.702 ms | 0.77% | 60.250 us | 3.67% | FAIL |
I64 | I32 | 2^16 | 1 | 10.166 us | 6.21% | 10.544 us | 6.41% | 0.379 us | 3.73% | PASS |
I64 | I32 | 2^20 | 1 | 29.646 us | 2.64% | 30.024 us | 2.99% | 0.378 us | 1.27% | PASS |
I64 | I32 | 2^24 | 1 | 349.804 us | 0.56% | 355.649 us | 0.63% | 5.846 us | 1.67% | FAIL |
I64 | I32 | 2^28 | 1 | 5.472 ms | 0.50% | 5.563 ms | 0.50% | 91.410 us | 1.67% | FAIL |
I64 | I32 | 2^16 | 0.544 | 10.472 us | 5.86% | 10.994 us | 6.49% | 0.522 us | 4.98% | PASS |
I64 | I32 | 2^20 | 0.544 | 28.024 us | 2.91% | 28.268 us | 3.08% | 0.244 us | 0.87% | PASS |
I64 | I32 | 2^24 | 0.544 | 281.239 us | 0.62% | 286.075 us | 0.81% | 4.836 us | 1.72% | FAIL |
I64 | I32 | 2^28 | 0.544 | 4.339 ms | 0.50% | 4.410 ms | 0.50% | 70.634 us | 1.63% | FAIL |
I64 | I32 | 2^16 | 0 | 9.773 us | 6.62% | 10.168 us | 6.22% | 0.395 us | 4.04% | PASS |
I64 | I32 | 2^20 | 0 | 27.180 us | 2.79% | 27.634 us | 2.90% | 0.454 us | 1.67% | PASS |
I64 | I32 | 2^24 | 0 | 192.678 us | 0.88% | 197.576 us | 0.93% | 4.897 us | 2.54% | FAIL |
I64 | I32 | 2^28 | 0 | 2.832 ms | 0.88% | 2.911 ms | 0.77% | 78.442 us | 2.77% | FAIL |
I64 | I64 | 2^16 | 1 | 10.579 us | 6.03% | 11.077 us | 6.48% | 0.498 us | 4.71% | PASS |
I64 | I64 | 2^20 | 1 | 30.587 us | 2.42% | 30.687 us | 2.49% | 0.100 us | 0.33% | PASS |
I64 | I64 | 2^24 | 1 | 357.968 us | 0.64% | 365.810 us | 0.79% | 7.841 us | 2.19% | FAIL |
I64 | I64 | 2^28 | 1 | 5.594 ms | 0.50% | 5.728 ms | 0.50% | 134.466 us | 2.40% | FAIL |
I64 | I64 | 2^16 | 0.544 | 10.144 us | 6.04% | 10.556 us | 6.55% | 0.411 us | 4.05% | PASS |
I64 | I64 | 2^20 | 0.544 | 28.717 us | 3.23% | 28.868 us | 3.13% | 0.152 us | 0.53% | PASS |
I64 | I64 | 2^24 | 0.544 | 291.287 us | 0.79% | 299.833 us | 0.98% | 8.547 us | 2.93% | FAIL |
I64 | I64 | 2^28 | 0.544 | 4.503 ms | 0.50% | 4.648 ms | 0.50% | 145.339 us | 3.23% | FAIL |
I64 | I64 | 2^16 | 0 | 10.352 us | 5.39% | 10.651 us | 6.81% | 0.299 us | 2.89% | PASS |
I64 | I64 | 2^20 | 0 | 28.169 us | 2.64% | 28.055 us | 3.21% | -0.114 us | -0.40% | PASS |
I64 | I64 | 2^24 | 0 | 205.067 us | 0.79% | 211.705 us | 0.76% | 6.638 us | 3.24% | FAIL |
I64 | I64 | 2^28 | 0 | 3.046 ms | 0.73% | 3.153 ms | 0.62% | 107.727 us | 3.54% | FAIL |
I128 | I32 | 2^16 | 1 | 12.566 us | 5.21% | 13.126 us | 5.30% | 0.561 us | 4.46% | PASS |
I128 | I32 | 2^20 | 1 | 40.083 us | 1.70% | 40.682 us | 1.74% | 0.599 us | 1.50% | PASS |
I128 | I32 | 2^24 | 1 | 393.489 us | 0.54% | 417.517 us | 0.50% | 24.029 us | 6.11% | FAIL |
I128 | I32 | 2^28 | 1 | 6.069 ms | 0.50% | 6.443 ms | 0.50% | 373.328 us | 6.15% | FAIL |
I128 | I32 | 2^16 | 0.544 | 12.594 us | 5.43% | 13.127 us | 5.17% | 0.534 us | 4.24% | PASS |
I128 | I32 | 2^20 | 0.544 | 40.083 us | 1.78% | 40.746 us | 1.95% | 0.663 us | 1.65% | PASS |
I128 | I32 | 2^24 | 0.544 | 393.475 us | 0.60% | 417.545 us | 0.50% | 24.070 us | 6.12% | FAIL |
I128 | I32 | 2^28 | 0.544 | 6.069 ms | 0.50% | 6.443 ms | 0.50% | 373.411 us | 6.15% | FAIL |
I128 | I32 | 2^16 | 0 | 12.542 us | 4.92% | 13.108 us | 4.75% | 0.566 us | 4.51% | PASS |
I128 | I32 | 2^20 | 0 | 39.995 us | 1.68% | 40.677 us | 1.85% | 0.682 us | 1.70% | FAIL |
I128 | I32 | 2^24 | 0 | 393.323 us | 0.53% | 417.526 us | 0.50% | 24.203 us | 6.15% | FAIL |
I128 | I32 | 2^28 | 0 | 6.069 ms | 0.50% | 6.443 ms | 0.50% | 373.458 us | 6.15% | FAIL |
I128 | I64 | 2^16 | 1 | 12.001 us | 5.42% | 12.576 us | 5.40% | 0.575 us | 4.80% | PASS |
I128 | I64 | 2^20 | 1 | 41.167 us | 1.60% | 41.400 us | 1.71% | 0.234 us | 0.57% | PASS |
I128 | I64 | 2^24 | 1 | 415.244 us | 0.50% | 435.304 us | 0.41% | 20.060 us | 4.83% | FAIL |
I128 | I64 | 2^28 | 1 | 6.418 ms | 0.50% | 6.742 ms | 0.50% | 323.727 us | 5.04% | FAIL |
I128 | I64 | 2^16 | 0.544 | 12.055 us | 5.35% | 12.654 us | 5.46% | 0.599 us | 4.97% | PASS |
I128 | I64 | 2^20 | 0.544 | 41.133 us | 1.51% | 41.490 us | 2.03% | 0.357 us | 0.87% | PASS |
I128 | I64 | 2^24 | 0.544 | 415.328 us | 0.50% | 435.348 us | 0.40% | 20.020 us | 4.82% | FAIL |
I128 | I64 | 2^28 | 0.544 | 6.418 ms | 0.50% | 6.742 ms | 0.50% | 323.551 us | 5.04% | FAIL |
I128 | I64 | 2^16 | 0 | 12.059 us | 5.56% | 12.653 us | 5.36% | 0.594 us | 4.93% | PASS |
I128 | I64 | 2^20 | 0 | 41.177 us | 1.41% | 41.572 us | 1.96% | 0.395 us | 0.96% | PASS |
I128 | I64 | 2^24 | 0 | 415.291 us | 0.50% | 435.331 us | 0.42% | 20.040 us | 4.83% | FAIL |
I128 | I64 | 2^28 | 0 | 6.419 ms | 0.50% | 6.742 ms | 0.50% | 323.852 us | 5.05% | FAIL |
F32 | I32 | 2^16 | 1 | 9.396 us | 6.82% | 9.330 us | 6.60% | -0.065 us | -0.70% | PASS |
F32 | I32 | 2^20 | 1 | 19.837 us | 3.13% | 19.958 us | 3.57% | 0.121 us | 0.61% | PASS |
F32 | I32 | 2^24 | 1 | 184.529 us | 0.91% | 190.288 us | 1.25% | 5.758 us | 3.12% | FAIL |
F32 | I32 | 2^28 | 1 | 2.960 ms | 0.67% | 3.020 ms | 0.66% | 60.203 us | 2.03% | FAIL |
F32 | I32 | 2^16 | 0.544 | 8.842 us | 7.86% | 8.970 us | 7.18% | 0.128 us | 1.45% | PASS |
F32 | I32 | 2^20 | 0.544 | 18.993 us | 3.18% | 19.265 us | 3.67% | 0.272 us | 1.43% | PASS |
F32 | I32 | 2^24 | 0.544 | 128.972 us | 1.08% | 133.893 us | 1.15% | 4.921 us | 3.82% | FAIL |
F32 | I32 | 2^28 | 0.544 | 1.858 ms | 0.77% | 1.926 ms | 0.67% | 67.216 us | 3.62% | FAIL |
F32 | I32 | 2^16 | 0 | 8.802 us | 6.88% | 8.944 us | 6.95% | 0.142 us | 1.61% | PASS |
F32 | I32 | 2^20 | 0 | 18.928 us | 3.83% | 18.982 us | 3.74% | 0.054 us | 0.29% | PASS |
F32 | I32 | 2^24 | 0 | 113.272 us | 1.13% | 118.799 us | 1.03% | 5.527 us | 4.88% | FAIL |
F32 | I32 | 2^28 | 0 | 1.580 ms | 1.01% | 1.662 ms | 0.80% | 82.529 us | 5.22% | FAIL |
F32 | I64 | 2^16 | 1 | 9.398 us | 6.38% | 9.377 us | 7.18% | -0.020 us | -0.22% | PASS |
F32 | I64 | 2^20 | 1 | 20.335 us | 3.99% | 20.434 us | 4.50% | 0.099 us | 0.49% | PASS |
F32 | I64 | 2^24 | 1 | 185.982 us | 0.96% | 192.968 us | 1.47% | 6.985 us | 3.76% | FAIL |
F32 | I64 | 2^28 | 1 | 2.969 ms | 0.67% | 3.041 ms | 0.67% | 71.424 us | 2.41% | FAIL |
F32 | I64 | 2^16 | 0.544 | 8.958 us | 6.71% | 9.156 us | 6.20% | 0.198 us | 2.21% | PASS |
F32 | I64 | 2^20 | 0.544 | 19.403 us | 3.40% | 19.717 us | 3.39% | 0.314 us | 1.62% | PASS |
F32 | I64 | 2^24 | 0.544 | 132.361 us | 1.23% | 136.217 us | 1.22% | 3.856 us | 2.91% | FAIL |
F32 | I64 | 2^28 | 0.544 | 1.902 ms | 0.69% | 1.964 ms | 0.64% | 61.788 us | 3.25% | FAIL |
F32 | I64 | 2^16 | 0 | 9.009 us | 6.55% | 9.216 us | 6.23% | 0.207 us | 2.29% | PASS |
F32 | I64 | 2^20 | 0 | 19.284 us | 3.55% | 19.635 us | 3.69% | 0.351 us | 1.82% | PASS |
F32 | I64 | 2^24 | 0 | 117.281 us | 0.98% | 121.540 us | 0.97% | 4.259 us | 3.63% | FAIL |
F32 | I64 | 2^28 | 0 | 1.645 ms | 0.88% | 1.705 ms | 0.77% | 60.682 us | 3.69% | FAIL |
F64 | I32 | 2^16 | 1 | 9.970 us | 6.71% | 10.656 us | 6.52% | 0.686 us | 6.88% | FAIL |
F64 | I32 | 2^20 | 1 | 29.566 us | 2.56% | 30.029 us | 2.98% | 0.463 us | 1.57% | PASS |
F64 | I32 | 2^24 | 1 | 349.673 us | 0.50% | 355.488 us | 0.65% | 5.814 us | 1.66% | FAIL |
F64 | I32 | 2^28 | 1 | 5.471 ms | 0.50% | 5.560 ms | 0.50% | 89.456 us | 1.64% | FAIL |
F64 | I32 | 2^16 | 0.544 | 9.776 us | 6.74% | 10.225 us | 6.68% | 0.449 us | 4.60% | PASS |
F64 | I32 | 2^20 | 0.544 | 26.970 us | 2.33% | 27.067 us | 3.04% | 0.097 us | 0.36% | PASS |
F64 | I32 | 2^24 | 0.544 | 224.209 us | 0.74% | 227.678 us | 0.84% | 3.468 us | 1.55% | FAIL |
F64 | I32 | 2^28 | 0.544 | 3.384 ms | 0.50% | 3.429 ms | 0.50% | 44.776 us | 1.32% | FAIL |
F64 | I32 | 2^16 | 0 | 9.795 us | 6.91% | 10.232 us | 6.27% | 0.437 us | 4.46% | PASS |
F64 | I32 | 2^20 | 0 | 27.198 us | 2.76% | 27.270 us | 2.68% | 0.072 us | 0.26% | PASS |
F64 | I32 | 2^24 | 0 | 192.582 us | 0.85% | 197.339 us | 0.91% | 4.757 us | 2.47% | FAIL |
F64 | I32 | 2^28 | 0 | 2.833 ms | 0.88% | 2.907 ms | 0.77% | 74.390 us | 2.63% | FAIL |
F64 | I64 | 2^16 | 1 | 10.453 us | 5.95% | 10.848 us | 6.55% | 0.394 us | 3.77% | PASS |
F64 | I64 | 2^20 | 1 | 30.257 us | 2.34% | 30.516 us | 2.52% | 0.258 us | 0.85% | PASS |
F64 | I64 | 2^24 | 1 | 357.008 us | 0.58% | 365.797 us | 0.83% | 8.789 us | 2.46% | FAIL |
F64 | I64 | 2^28 | 1 | 5.579 ms | 0.50% | 5.727 ms | 0.50% | 148.084 us | 2.65% | FAIL |
F64 | I64 | 2^16 | 0.544 | 10.167 us | 6.22% | 10.599 us | 6.78% | 0.432 us | 4.25% | PASS |
F64 | I64 | 2^20 | 0.544 | 27.649 us | 2.35% | 27.181 us | 2.77% | -0.468 us | -1.69% | PASS |
F64 | I64 | 2^24 | 0.544 | 234.540 us | 0.87% | 238.254 us | 0.81% | 3.714 us | 1.58% | FAIL |
F64 | I64 | 2^28 | 0.544 | 3.568 ms | 0.50% | 3.616 ms | 0.50% | 48.448 us | 1.36% | FAIL |
F64 | I64 | 2^16 | 0 | 10.140 us | 5.75% | 10.494 us | 6.41% | 0.354 us | 3.49% | PASS |
F64 | I64 | 2^20 | 0 | 27.881 us | 2.73% | 27.380 us | 2.66% | -0.501 us | -1.80% | PASS |
F64 | I64 | 2^24 | 0 | 204.907 us | 0.81% | 211.228 us | 0.77% | 6.321 us | 3.09% | FAIL |
F64 | I64 | 2^28 | 0 | 3.046 ms | 0.75% | 3.147 ms | 0.62% | 101.815 us | 3.34% | FAIL |
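The Diff, %Diff, and Status columns can be reproduced from the other four measurement columns. The sketch below is a reading aid, not the benchmark tool's code: the `Row` struct and the function names are hypothetical, times are assumed to be normalized to microseconds first (the table mixes us and ms), and the Status rule (PASS when the absolute %Diff does not exceed the smaller of the two noise figures) is inferred from the rows shown here rather than quoted from the tool.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// One row of the comparison table (hypothetical type for illustration).
// Times are normalized to microseconds; noise is in percent, as printed.
struct Row {
    double ref_time_us;
    double ref_noise_pct;
    double cmp_time_us;
    double cmp_noise_pct;
};

// Diff column: Cmp Time minus Ref Time (positive means the new build is slower).
double diff_us(const Row& r) { return r.cmp_time_us - r.ref_time_us; }

// %Diff column: Diff relative to Ref Time, in percent.
double pct_diff(const Row& r) {
    assert(r.ref_time_us > 0.0);  // a zero reference time has no relative diff
    return 100.0 * diff_us(r) / r.ref_time_us;
}

// Assumed Status rule: PASS when |%Diff| stays within the smaller noise figure.
bool passes(const Row& r) {
    return std::fabs(pct_diff(r)) <= std::min(r.ref_noise_pct, r.cmp_noise_pct);
}
```

Under this reading, the first row (I8, I32, 2^16, entropy 1) passes because its 2.10% difference is well inside the roughly 8% noise, while the 2^24 row fails because its 8.62% difference is far above the roughly 1% noise. A FAIL therefore marks a difference that is distinguishable from noise, which at large input sizes can be a slowdown of only a few percent. Recomputing from the rounded printed values may disagree with the table for rows that sit very close to the threshold.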
Select.Flagged - Tesla V100-SXM2-32GB
[0] Tesla V100-SXM2-32GB
T{ct} | OffsetT{ct} | Elements{io} | Entropy | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
---|---|---|---|---|---|---|---|---|---|---|
I8 | I32 | 2^16 | 1 | 8.968 us | 8.86% | 9.498 us | 7.74% | 0.530 us | 5.91% | PASS |
I8 | I32 | 2^20 | 1 | 16.185 us | 4.20% | 17.055 us | 4.30% | 0.869 us | 5.37% | FAIL |
I8 | I32 | 2^24 | 1 | 120.488 us | 0.87% | 128.596 us | 0.76% | 8.108 us | 6.73% | FAIL |
I8 | I32 | 2^28 | 1 | 1.807 ms | 0.50% | 1.941 ms | 0.50% | 134.174 us | 7.42% | FAIL |
I8 | I32 | 2^16 | 0.544 | 8.842 us | 8.21% | 8.991 us | 7.44% | 0.149 us | 1.68% | PASS |
I8 | I32 | 2^20 | 0.544 | 15.875 us | 4.22% | 16.739 us | 3.54% | 0.863 us | 5.44% | FAIL |
I8 | I32 | 2^24 | 0.544 | 117.490 us | 0.97% | 126.136 us | 0.81% | 8.647 us | 7.36% | FAIL |
I8 | I32 | 2^28 | 0.544 | 1.734 ms | 0.51% | 1.874 ms | 0.50% | 140.140 us | 8.08% | FAIL |
I8 | I32 | 2^16 | 0 | 8.594 us | 7.53% | 8.766 us | 8.01% | 0.172 us | 2.00% | PASS |
I8 | I32 | 2^20 | 0 | 15.336 us | 4.57% | 16.506 us | 3.97% | 1.170 us | 7.63% | FAIL |
I8 | I32 | 2^24 | 0 | 103.213 us | 0.80% | 112.879 us | 0.71% | 9.666 us | 9.37% | FAIL |
I8 | I32 | 2^28 | 0 | 1.486 ms | 0.10% | 1.644 ms | 0.11% | 157.688 us | 10.61% | FAIL |
I8 | I64 | 2^16 | 1 | 8.855 us | 7.52% | 9.299 us | 6.28% | 0.444 us | 5.02% | PASS |
I8 | I64 | 2^20 | 1 | 16.947 us | 4.01% | 17.745 us | 3.72% | 0.798 us | 4.71% | FAIL |
I8 | I64 | 2^24 | 1 | 131.561 us | 0.67% | 140.788 us | 0.57% | 9.227 us | 7.01% | FAIL |
I8 | I64 | 2^28 | 1 | 1.991 ms | 0.50% | 2.144 ms | 0.50% | 152.858 us | 7.68% | FAIL |
I8 | I64 | 2^16 | 0.544 | 8.832 us | 7.83% | 9.185 us | 6.56% | 0.353 us | 4.00% | PASS |
I8 | I64 | 2^20 | 0.544 | 16.608 us | 4.14% | 17.544 us | 3.54% | 0.936 us | 5.64% | FAIL |
I8 | I64 | 2^24 | 0.544 | 128.598 us | 0.76% | 137.702 us | 0.68% | 9.105 us | 7.08% | FAIL |
I8 | I64 | 2^28 | 0.544 | 1.920 ms | 0.50% | 2.067 ms | 0.50% | 147.296 us | 7.67% | FAIL |
I8 | I64 | 2^16 | 0 | 8.652 us | 8.26% | 9.015 us | 6.77% | 0.363 us | 4.20% | PASS |
I8 | I64 | 2^20 | 0 | 15.849 us | 4.19% | 16.728 us | 4.05% | 0.879 us | 5.54% | FAIL |
I8 | I64 | 2^24 | 0 | 114.232 us | 0.71% | 123.151 us | 0.67% | 8.919 us | 7.81% | FAIL |
I8 | I64 | 2^28 | 0 | 1.662 ms | 0.08% | 1.807 ms | 0.10% | 144.943 us | 8.72% | FAIL |
I16 | I32 | 2^16 | 1 | 8.941 us | 7.29% | 9.210 us | 6.12% | 0.269 us | 3.00% | PASS |
I16 | I32 | 2^20 | 1 | 17.656 us | 3.49% | 18.537 us | 3.15% | 0.882 us | 5.00% | FAIL |
I16 | I32 | 2^24 | 1 | 137.352 us | 0.97% | 143.125 us | 1.06% | 5.773 us | 4.20% | FAIL |
I16 | I32 | 2^28 | 1 | 2.053 ms | 0.50% | 2.142 ms | 0.50% | 88.655 us | 4.32% | FAIL |
I16 | I32 | 2^16 | 0.544 | 8.993 us | 7.49% | 9.148 us | 6.70% | 0.155 us | 1.72% | PASS |
I16 | I32 | 2^20 | 0.544 | 17.457 us | 4.05% | 18.242 us | 3.54% | 0.785 us | 4.49% | FAIL |
I16 | I32 | 2^24 | 0.544 | 132.951 us | 1.08% | 140.627 us | 1.09% | 7.676 us | 5.77% | FAIL |
I16 | I32 | 2^28 | 0.544 | 1.973 ms | 0.55% | 2.076 ms | 0.56% | 102.638 us | 5.20% | FAIL |
I16 | I32 | 2^16 | 0 | 8.729 us | 8.85% | 8.896 us | 7.41% | 0.167 us | 1.91% | PASS |
I16 | I32 | 2^20 | 0 | 16.906 us | 4.28% | 17.573 us | 3.49% | 0.667 us | 3.94% | FAIL |
I16 | I32 | 2^24 | 0 | 105.762 us | 0.79% | 114.450 us | 0.68% | 8.688 us | 8.21% | FAIL |
I16 | I32 | 2^28 | 0 | 1.477 ms | 0.13% | 1.619 ms | 0.13% | 142.273 us | 9.63% | FAIL |
I16 | I64 | 2^16 | 1 | 9.233 us | 6.72% | 9.331 us | 5.48% | 0.098 us | 1.07% | PASS |
I16 | I64 | 2^20 | 1 | 18.815 us | 3.65% | 19.361 us | 3.95% | 0.546 us | 2.90% | PASS |
I16 | I64 | 2^24 | 1 | 144.780 us | 0.91% | 159.057 us | 0.76% | 14.277 us | 9.86% | FAIL |
I16 | I64 | 2^28 | 1 | 2.162 ms | 0.50% | 2.396 ms | 0.50% | 233.361 us | 10.79% | FAIL |
I16 | I64 | 2^16 | 0.544 | 9.072 us | 6.99% | 9.244 us | 5.93% | 0.172 us | 1.89% | PASS |
I16 | I64 | 2^20 | 0.544 | 18.352 us | 4.13% | 18.692 us | 3.55% | 0.339 us | 1.85% | PASS |
I16 | I64 | 2^24 | 0.544 | 140.339 us | 0.90% | 155.306 us | 0.77% | 14.967 us | 10.67% | FAIL |
I16 | I64 | 2^28 | 0.544 | 2.088 ms | 0.51% | 2.337 ms | 0.50% | 249.516 us | 11.95% | FAIL |
I16 | I64 | 2^16 | 0 | 8.793 us | 7.98% | 9.104 us | 6.38% | 0.311 us | 3.54% | PASS |
I16 | I64 | 2^20 | 0 | 17.499 us | 3.97% | 18.087 us | 3.87% | 0.588 us | 3.36% | PASS |
I16 | I64 | 2^24 | 0 | 115.227 us | 0.78% | 130.556 us | 0.62% | 15.329 us | 13.30% | FAIL |
I16 | I64 | 2^28 | 0 | 1.635 ms | 0.10% | 1.904 ms | 0.09% | 268.747 us | 16.44% | FAIL |
I32 | I32 | 2^16 | 1 | 9.431 us | 6.60% | 9.736 us | 6.77% | 0.305 us | 3.24% | PASS |
I32 | I32 | 2^20 | 1 | 21.706 us | 3.07% | 22.719 us | 3.11% | 1.013 us | 4.66% | FAIL |
I32 | I32 | 2^24 | 1 | 207.507 us | 0.82% | 213.549 us | 0.97% | 6.042 us | 2.91% | FAIL |
I32 | I32 | 2^28 | 1 | 3.186 ms | 0.55% | 3.248 ms | 0.59% | 61.900 us | 1.94% | FAIL |
I32 | I32 | 2^16 | 0.544 | 9.473 us | 6.93% | 9.646 us | 6.89% | 0.173 us | 1.82% | PASS |
I32 | I32 | 2^20 | 0.544 | 21.681 us | 3.26% | 22.808 us | 3.06% | 1.127 us | 5.20% | FAIL |
I32 | I32 | 2^24 | 0.544 | 184.719 us | 1.06% | 193.191 us | 1.28% | 8.471 us | 4.59% | FAIL |
I32 | I32 | 2^28 | 0.544 | 2.795 ms | 0.50% | 2.909 ms | 0.50% | 113.939 us | 4.08% | FAIL |
I32 | I32 | 2^16 | 0 | 9.067 us | 7.71% | 9.359 us | 7.03% | 0.292 us | 3.22% | PASS |
I32 | I32 | 2^20 | 0 | 20.829 us | 3.16% | 22.036 us | 3.17% | 1.207 us | 5.80% | FAIL |
I32 | I32 | 2^24 | 0 | 131.303 us | 0.83% | 135.584 us | 0.78% | 4.281 us | 3.26% | FAIL |
I32 | I32 | 2^28 | 0 | 1.834 ms | 0.16% | 1.900 ms | 0.16% | 65.846 us | 3.59% | FAIL |
I32 | I64 | 2^16 | 1 | 9.640 us | 7.28% | 9.596 us | 6.65% | -0.044 us | -0.46% | PASS |
I32 | I64 | 2^20 | 1 | 22.407 us | 3.52% | 22.311 us | 3.40% | -0.096 us | -0.43% | PASS |
I32 | I64 | 2^24 | 1 | 210.662 us | 0.78% | 217.703 us | 0.99% | 7.041 us | 3.34% | FAIL |
I32 | I64 | 2^28 | 1 | 3.223 ms | 0.55% | 3.336 ms | 0.55% | 113.104 us | 3.51% | FAIL |
I32 | I64 | 2^16 | 0.544 | 9.563 us | 6.50% | 9.429 us | 6.55% | -0.135 us | -1.41% | PASS |
I32 | I64 | 2^20 | 0.544 | 22.181 us | 3.56% | 22.420 us | 3.40% | 0.239 us | 1.08% | PASS |
I32 | I64 | 2^24 | 0.544 | 190.016 us | 1.00% | 200.693 us | 1.15% | 10.677 us | 5.62% | FAIL |
I32 | I64 | 2^28 | 0.544 | 2.873 ms | 0.50% | 3.044 ms | 0.50% | 170.261 us | 5.93% | FAIL |
I32 | I64 | 2^16 | 0 | 9.243 us | 6.79% | 9.278 us | 5.97% | 0.035 us | 0.38% | PASS |
I32 | I64 | 2^20 | 0 | 21.224 us | 3.60% | 21.764 us | 3.47% | 0.539 us | 2.54% | PASS |
I32 | I64 | 2^24 | 0 | 137.559 us | 0.78% | 146.899 us | 0.65% | 9.340 us | 6.79% | FAIL |
I32 | I64 | 2^28 | 0 | 1.944 ms | 0.16% | 2.121 ms | 0.12% | 177.032 us | 9.11% | FAIL |
I64 | I32 | 2^16 | 1 | 10.504 us | 6.91% | 10.514 us | 6.34% | 0.010 us | 0.09% | PASS |
I64 | I32 | 2^20 | 1 | 31.662 us | 2.47% | 32.551 us | 2.32% | 0.889 us | 2.81% | FAIL |
I64 | I32 | 2^24 | 1 | 378.684 us | 0.56% | 387.398 us | 0.75% | 8.713 us | 2.30% | FAIL |
I64 | I32 | 2^28 | 1 | 5.917 ms | 0.50% | 6.064 ms | 0.50% | 147.519 us | 2.49% | FAIL |
I64 | I32 | 2^16 | 0.544 | 10.212 us | 6.50% | 10.500 us | 6.41% | 0.288 us | 2.82% | PASS |
I64 | I32 | 2^20 | 0.544 | 29.567 us | 2.51% | 30.684 us | 2.50% | 1.116 us | 3.78% | FAIL |
I64 | I32 | 2^24 | 0.544 | 317.750 us | 0.73% | 326.572 us | 0.91% | 8.822 us | 2.78% | FAIL |
I64 | I32 | 2^28 | 0.544 | 4.910 ms | 0.50% | 5.055 ms | 0.50% | 144.807 us | 2.95% | FAIL |
I64 | I32 | 2^16 | 0 | 9.994 us | 6.83% | 10.303 us | 5.92% | 0.309 us | 3.09% | PASS |
I64 | I32 | 2^20 | 0 | 28.928 us | 2.71% | 29.853 us | 2.70% | 0.925 us | 3.20% | FAIL |
I64 | I32 | 2^24 | 0 | 217.336 us | 0.52% | 224.209 us | 0.58% | 6.873 us | 3.16% | FAIL |
I64 | I32 | 2^28 | 0 | 3.230 ms | 0.13% | 3.333 ms | 0.13% | 102.461 us | 3.17% | FAIL |
I64 | I64 | 2^16 | 1 | 9.930 us | 7.08% | 10.803 us | 6.51% | 0.873 us | 8.79% | FAIL |
I64 | I64 | 2^20 | 1 | 31.342 us | 2.54% | 32.495 us | 2.50% | 1.154 us | 3.68% | FAIL |
I64 | I64 | 2^24 | 1 | 379.375 us | 0.57% | 390.142 us | 0.84% | 10.767 us | 2.84% | FAIL |
I64 | I64 | 2^28 | 1 | 5.916 ms | 0.50% | 6.095 ms | 0.50% | 178.760 us | 3.02% | FAIL |
I64 | I64 | 2^16 | 0.544 | 9.786 us | 7.31% | 10.681 us | 6.44% | 0.895 us | 9.15% | FAIL |
I64 | I64 | 2^20 | 0.544 | 29.703 us | 3.04% | 30.512 us | 2.69% | 0.809 us | 2.73% | FAIL |
I64 | I64 | 2^24 | 0.544 | 317.789 us | 0.70% | 327.787 us | 0.85% | 9.998 us | 3.15% | FAIL |
I64 | I64 | 2^28 | 0.544 | 4.909 ms | 0.50% | 5.076 ms | 0.50% | 167.527 us | 3.41% | FAIL |
I64 | I64 | 2^16 | 0 | 9.740 us | 6.82% | 10.438 us | 6.11% | 0.699 us | 7.17% | FAIL |
I64 | I64 | 2^20 | 0 | 28.939 us | 2.48% | 29.784 us | 2.32% | 0.846 us | 2.92% | FAIL |
I64 | I64 | 2^24 | 0 | 221.492 us | 0.54% | 228.875 us | 0.49% | 7.383 us | 3.33% | FAIL |
I64 | I64 | 2^28 | 0 | 3.309 ms | 0.12% | 3.411 ms | 0.11% | 102.915 us | 3.11% | FAIL |
I128 | I32 | 2^16 | 1 | 12.917 us | 5.35% | 13.548 us | 4.98% | 0.632 us | 4.89% | PASS |
I128 | I32 | 2^20 | 1 | 54.920 us | 1.48% | 55.072 us | 1.85% | 0.152 us | 0.28% | PASS |
I128 | I32 | 2^24 | 1 | 739.042 us | 0.50% | 751.141 us | 0.57% | 12.099 us | 1.64% | FAIL |
I128 | I32 | 2^28 | 1 | 11.707 ms | 0.50% | 11.931 ms | 0.50% | 223.542 us | 1.91% | FAIL |
I128 | I32 | 2^16 | 0.544 | 12.877 us | 5.66% | 13.334 us | 4.94% | 0.456 us | 3.54% | PASS |
I128 | I32 | 2^20 | 0.544 | 47.881 us | 2.01% | 48.755 us | 2.00% | 0.874 us | 1.82% | PASS |
I128 | I32 | 2^24 | 0.544 | 607.521 us | 0.68% | 619.392 us | 0.75% | 11.871 us | 1.95% | FAIL |
I128 | I32 | 2^28 | 0.544 | 9.573 ms | 0.50% | 9.774 ms | 0.50% | 201.799 us | 2.11% | FAIL |
I128 | I32 | 2^16 | 0 | 12.708 us | 5.33% | 13.141 us | 5.61% | 0.432 us | 3.40% | PASS |
I128 | I32 | 2^20 | 0 | 41.695 us | 1.88% | 42.299 us | 1.76% | 0.604 us | 1.45% | PASS |
I128 | I32 | 2^24 | 0 | 413.564 us | 0.38% | 434.250 us | 0.31% | 20.686 us | 5.00% | FAIL |
I128 | I32 | 2^28 | 0 | 6.370 ms | 0.08% | 6.705 ms | 0.08% | 335.925 us | 5.27% | FAIL |
I128 | I64 | 2^16 | 1 | 12.474 us | 5.29% | 13.167 us | 4.76% | 0.693 us | 5.56% | FAIL |
I128 | I64 | 2^20 | 1 | 54.632 us | 1.64% | 55.377 us | 1.84% | 0.745 us | 1.36% | PASS |
I128 | I64 | 2^24 | 1 | 744.716 us | 0.50% | 759.217 us | 0.59% | 14.501 us | 1.95% | FAIL |
I128 | I64 | 2^28 | 1 | 11.789 ms | 0.50% | 12.048 ms | 0.50% | 259.238 us | 2.20% | FAIL |
I128 | I64 | 2^16 | 0.544 | 12.464 us | 5.05% | 12.995 us | 5.30% | 0.531 us | 4.26% | PASS |
I128 | I64 | 2^20 | 0.544 | 48.528 us | 1.63% | 49.392 us | 1.98% | 0.864 us | 1.78% | FAIL |
I128 | I64 | 2^24 | 0.544 | 616.587 us | 0.66% | 629.460 us | 0.68% | 12.874 us | 2.09% | FAIL |
I128 | I64 | 2^28 | 0.544 | 9.712 ms | 0.50% | 9.946 ms | 0.50% | 234.648 us | 2.42% | FAIL |
I128 | I64 | 2^16 | 0 | 12.279 us | 5.05% | 12.703 us | 6.12% | 0.424 us | 3.45% | PASS |
I128 | I64 | 2^20 | 0 | 42.418 us | 1.70% | 43.146 us | 1.64% | 0.727 us | 1.71% | FAIL |
I128 | I64 | 2^24 | 0 | 431.022 us | 0.31% | 446.924 us | 0.26% | 15.902 us | 3.69% | FAIL |
I128 | I64 | 2^28 | 0 | 6.662 ms | 0.07% | 6.920 ms | 0.06% | 257.548 us | 3.87% | FAIL |
F32 | I32 | 2^16 | 1 | 9.353 us | 7.35% | 9.680 us | 6.83% | 0.327 us | 3.49% | PASS |
F32 | I32 | 2^20 | 1 | 21.654 us | 3.25% | 22.933 us | 3.26% | 1.279 us | 5.91% | FAIL |
F32 | I32 | 2^24 | 1 | 207.630 us | 0.82% | 213.642 us | 0.95% | 6.012 us | 2.90% | FAIL |
F32 | I32 | 2^28 | 1 | 3.185 ms | 0.56% | 3.247 ms | 0.59% | 61.836 us | 1.94% | FAIL |
F32 | I32 | 2^16 | 0.544 | 9.468 us | 6.79% | 9.506 us | 6.40% | 0.039 us | 0.41% | PASS |
F32 | I32 | 2^20 | 0.544 | 21.738 us | 3.19% | 22.732 us | 3.15% | 0.994 us | 4.57% | FAIL |
F32 | I32 | 2^24 | 0.544 | 184.812 us | 1.04% | 193.011 us | 1.27% | 8.198 us | 4.44% | FAIL |
F32 | I32 | 2^28 | 0.544 | 2.796 ms | 0.50% | 2.909 ms | 0.50% | 113.690 us | 4.07% | FAIL |
F32 | I32 | 2^16 | 0 | 9.152 us | 6.81% | 9.270 us | 6.65% | 0.118 us | 1.29% | PASS |
F32 | I32 | 2^20 | 0 | 20.757 us | 3.06% | 21.913 us | 3.28% | 1.156 us | 5.57% | FAIL |
F32 | I32 | 2^24 | 0 | 131.182 us | 0.82% | 135.503 us | 0.82% | 4.321 us | 3.29% | FAIL |
F32 | I32 | 2^28 | 0 | 1.834 ms | 0.18% | 1.901 ms | 0.18% | 66.163 us | 3.61% | FAIL |
F32 | I64 | 2^16 | 1 | 9.581 us | 7.36% | 9.616 us | 6.51% | 0.036 us | 0.37% | PASS |
F32 | I64 | 2^20 | 1 | 22.103 us | 3.54% | 22.647 us | 3.31% | 0.543 us | 2.46% | PASS |
F32 | I64 | 2^24 | 1 | 210.402 us | 0.79% | 217.726 us | 1.00% | 7.324 us | 3.48% | FAIL |
F32 | I64 | 2^28 | 1 | 3.222 ms | 0.55% | 3.336 ms | 0.55% | 113.503 us | 3.52% | FAIL |
F32 | I64 | 2^16 | 0.544 | 9.603 us | 7.17% | 9.467 us | 6.80% | -0.136 us | -1.42% | PASS |
F32 | I64 | 2^20 | 0.544 | 22.201 us | 3.71% | 22.714 us | 4.59% | 0.513 us | 2.31% | PASS |
F32 | I64 | 2^24 | 0.544 | 189.620 us | 1.02% | 199.942 us | 1.12% | 10.322 us | 5.44% | FAIL |
F32 | I64 | 2^28 | 0.544 | 2.873 ms | 0.50% | 3.043 ms | 0.50% | 170.245 us | 5.93% | FAIL |
F32 | I64 | 2^16 | 0 | 9.292 us | 6.90% | 9.424 us | 6.41% | 0.132 us | 1.42% | PASS |
F32 | I64 | 2^20 | 0 | 21.237 us | 3.50% | 21.904 us | 3.19% | 0.668 us | 3.14% | PASS |
F32 | I64 | 2^24 | 0 | 137.274 us | 0.76% | 147.125 us | 0.70% | 9.851 us | 7.18% | FAIL |
F32 | I64 | 2^28 | 0 | 1.944 ms | 0.14% | 2.122 ms | 0.12% | 177.713 us | 9.14% | FAIL |
F64 | I32 | 2^16 | 1 | 10.381 us | 6.37% | 10.529 us | 6.75% | 0.147 us | 1.42% | PASS |
F64 | I32 | 2^20 | 1 | 31.717 us | 2.34% | 32.602 us | 2.43% | 0.885 us | 2.79% | FAIL |
F64 | I32 | 2^24 | 1 | 378.763 us | 0.57% | 387.486 us | 0.71% | 8.723 us | 2.30% | FAIL |
F64 | I32 | 2^28 | 1 | 5.917 ms | 0.50% | 6.065 ms | 0.50% | 147.912 us | 2.50% | FAIL |
F64 | I32 | 2^16 | 0.544 | 10.542 us | 6.62% | 10.450 us | 5.86% | -0.092 us | -0.87% | PASS |
F64 | I32 | 2^20 | 0.544 | 29.626 us | 2.52% | 30.642 us | 2.73% | 1.017 us | 3.43% | FAIL |
F64 | I32 | 2^24 | 0.544 | 317.616 us | 0.72% | 326.596 us | 0.87% | 8.980 us | 2.83% | FAIL |
F64 | I32 | 2^28 | 0.544 | 4.911 ms | 0.50% | 5.054 ms | 0.50% | 143.797 us | 2.93% | FAIL |
F64 | I32 | 2^16 | 0 | 10.042 us | 6.41% | 10.331 us | 6.53% | 0.289 us | 2.88% | PASS |
F64 | I32 | 2^20 | 0 | 28.956 us | 2.42% | 29.831 us | 2.63% | 0.876 us | 3.02% | FAIL |
F64 | I32 | 2^24 | 0 | 217.447 us | 0.52% | 224.264 us | 0.58% | 6.818 us | 3.14% | FAIL |
F64 | I32 | 2^28 | 0 | 3.231 ms | 0.12% | 3.332 ms | 0.12% | 101.283 us | 3.13% | FAIL |
F64 | I64 | 2^16 | 1 | 10.119 us | 6.49% | 10.626 us | 5.93% | 0.507 us | 5.01% | PASS |
F64 | I64 | 2^20 | 1 | 31.479 us | 2.45% | 32.458 us | 2.35% | 0.979 us | 3.11% | FAIL |
F64 | I64 | 2^24 | 1 | 379.315 us | 0.56% | 390.081 us | 0.81% | 10.766 us | 2.84% | FAIL |
F64 | I64 | 2^28 | 1 | 5.916 ms | 0.50% | 6.095 ms | 0.50% | 178.840 us | 3.02% | FAIL |
F64 | I64 | 2^16 | 0.544 | 10.253 us | 6.61% | 10.593 us | 5.90% | 0.340 us | 3.31% | PASS |
F64 | I64 | 2^20 | 0.544 | 29.804 us | 2.84% | 30.389 us | 2.54% | 0.584 us | 1.96% | PASS |
F64 | I64 | 2^24 | 0.544 | 317.599 us | 0.67% | 327.729 us | 0.84% | 10.130 us | 3.19% | FAIL |
F64 | I64 | 2^28 | 0.544 | 4.909 ms | 0.50% | 5.076 ms | 0.50% | 166.845 us | 3.40% | FAIL |
F64 | I64 | 2^16 | 0 | 9.717 us | 6.68% | 10.360 us | 5.73% | 0.643 us | 6.62% | FAIL |
F64 | I64 | 2^20 | 0 | 28.901 us | 2.46% | 29.741 us | 2.29% | 0.840 us | 2.91% | FAIL |
F64 | I64 | 2^24 | 0 | 221.525 us | 0.49% | 228.770 us | 0.46% | 7.244 us | 3.27% | FAIL |
F64 | I64 | 2^28 | 0 | 3.309 ms | 0.12% | 3.411 ms | 0.13% | 102.454 us | 3.10% | FAIL |
Comparing before/after, compiling for targeted architecture.
T{ct} | OffsetT{ct} | Elements{io} | Entropy | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
---|---|---|---|---|---|---|---|---|---|---|
I8 | I32 | 2^16 | 1 | 14.003 us | 10.59% | 14.326 us | 10.51% | 0.323 us | 2.30% | PASS |
I8 | I32 | 2^20 | 1 | 16.561 us | 2.95% | 16.863 us | 3.24% | 0.303 us | 1.83% | PASS |
I8 | I32 | 2^24 | 1 | 71.453 us | 1.01% | 73.136 us | 1.07% | 1.683 us | 2.36% | FAIL |
I8 | I32 | 2^28 | 1 | 915.396 us | 0.11% | 924.950 us | 0.33% | 9.554 us | 1.04% | FAIL |
I8 | I32 | 2^16 | 0.544 | 13.733 us | 3.97% | 13.789 us | 4.07% | 0.056 us | 0.41% | PASS |
I8 | I32 | 2^20 | 0.544 | 15.951 us | 3.67% | 16.185 us | 3.34% | 0.234 us | 1.47% | PASS |
I8 | I32 | 2^24 | 0.544 | 68.066 us | 1.06% | 69.339 us | 0.89% | 1.273 us | 1.87% | FAIL |
I8 | I32 | 2^28 | 0.544 | 848.127 us | 0.45% | 893.979 us | 0.30% | 45.852 us | 5.41% | FAIL |
I8 | I32 | 2^16 | 0 | 13.194 us | 3.54% | 13.290 us | 3.78% | 0.096 us | 0.73% | PASS |
I8 | I32 | 2^20 | 0 | 14.543 us | 3.27% | 14.651 us | 3.61% | 0.107 us | 0.74% | PASS |
I8 | I32 | 2^24 | 0 | 56.416 us | 1.29% | 58.472 us | 1.31% | 2.056 us | 3.64% | FAIL |
I8 | I32 | 2^28 | 0 | 658.149 us | 0.13% | 665.887 us | 0.50% | 7.738 us | 1.18% | FAIL |
I8 | I64 | 2^16 | 1 | 10.744 us | 5.11% | 10.954 us | 5.11% | 0.210 us | 1.96% | PASS |
I8 | I64 | 2^20 | 1 | 16.208 us | 3.11% | 16.455 us | 2.92% | 0.247 us | 1.52% | PASS |
I8 | I64 | 2^24 | 1 | 98.434 us | 0.70% | 105.275 us | 0.77% | 6.841 us | 6.95% | FAIL |
I8 | I64 | 2^28 | 1 | 1.437 ms | 0.26% | 1.553 ms | 0.24% | 116.434 us | 8.10% | FAIL |
I8 | I64 | 2^16 | 0.544 | 10.696 us | 5.31% | 10.956 us | 4.86% | 0.260 us | 2.43% | PASS |
I8 | I64 | 2^20 | 0.544 | 16.002 us | 3.55% | 16.360 us | 3.01% | 0.358 us | 2.24% | PASS |
I8 | I64 | 2^24 | 0.544 | 95.497 us | 0.76% | 101.129 us | 0.87% | 5.633 us | 5.90% | FAIL |
I8 | I64 | 2^28 | 0.544 | 1.385 ms | 0.26% | 1.490 ms | 0.27% | 104.969 us | 7.58% | FAIL |
I8 | I64 | 2^16 | 0 | 10.196 us | 4.57% | 10.374 us | 4.24% | 0.178 us | 1.75% | PASS |
I8 | I64 | 2^20 | 0 | 15.608 us | 3.27% | 15.804 us | 3.70% | 0.196 us | 1.26% | PASS |
I8 | I64 | 2^24 | 0 | 91.231 us | 0.71% | 96.614 us | 0.81% | 5.383 us | 5.90% | FAIL |
I8 | I64 | 2^28 | 0 | 1.281 ms | 0.50% | 1.393 ms | 0.35% | 112.541 us | 8.79% | FAIL |
I16 | I32 | 2^16 | 1 | 10.983 us | 4.82% | 11.045 us | 4.62% | 0.062 us | 0.56% | PASS |
I16 | I32 | 2^20 | 1 | 18.496 us | 2.93% | 19.112 us | 2.90% | 0.616 us | 3.33% | FAIL |
I16 | I32 | 2^24 | 1 | 111.354 us | 0.54% | 112.868 us | 0.60% | 1.515 us | 1.36% | FAIL |
I16 | I32 | 2^28 | 1 | 1.582 ms | 0.07% | 1.593 ms | 0.11% | 11.225 us | 0.71% | FAIL |
I16 | I32 | 2^16 | 0.544 | 11.523 us | 4.57% | 11.597 us | 4.42% | 0.074 us | 0.64% | PASS |
I16 | I32 | 2^20 | 0.544 | 18.395 us | 2.89% | 18.940 us | 2.94% | 0.545 us | 2.96% | FAIL |
I16 | I32 | 2^24 | 0.544 | 106.319 us | 0.81% | 109.550 us | 0.81% | 3.232 us | 3.04% | FAIL |
I16 | I32 | 2^28 | 0.544 | 1.487 ms | 0.36% | 1.560 ms | 0.16% | 73.625 us | 4.95% | FAIL |
I16 | I32 | 2^16 | 0 | 11.019 us | 4.54% | 11.083 us | 4.61% | 0.064 us | 0.58% | PASS |
I16 | I32 | 2^20 | 0 | 16.909 us | 3.43% | 17.193 us | 3.07% | 0.284 us | 1.68% | PASS |
I16 | I32 | 2^24 | 0 | 90.747 us | 0.63% | 91.090 us | 0.53% | 0.342 us | 0.38% | PASS |
I16 | I32 | 2^28 | 0 | 1.255 ms | 0.28% | 1.275 ms | 0.07% | 19.550 us | 1.56% | FAIL |
I16 | I64 | 2^16 | 1 | 10.767 us | 5.33% | 11.021 us | 4.62% | 0.254 us | 2.36% | PASS |
I16 | I64 | 2^20 | 1 | 17.485 us | 2.98% | 17.795 us | 3.11% | 0.310 us | 1.77% | PASS |
I16 | I64 | 2^24 | 1 | 106.389 us | 0.68% | 113.040 us | 1.00% | 6.651 us | 6.25% | FAIL |
I16 | I64 | 2^28 | 1 | 1.544 ms | 0.27% | 1.653 ms | 0.25% | 108.572 us | 7.03% | FAIL |
I16 | I64 | 2^16 | 0.544 | 10.822 us | 5.14% | 11.126 us | 4.53% | 0.305 us | 2.81% | PASS |
I16 | I64 | 2^20 | 0.544 | 17.368 us | 2.64% | 17.916 us | 3.00% | 0.547 us | 3.15% | FAIL |
I16 | I64 | 2^24 | 0.544 | 103.990 us | 0.87% | 109.129 us | 0.94% | 5.138 us | 4.94% | FAIL |
I16 | I64 | 2^28 | 0.544 | 1.491 ms | 0.30% | 1.592 ms | 0.25% | 101.094 us | 6.78% | FAIL |
I16 | I64 | 2^16 | 0 | 10.816 us | 5.26% | 10.876 us | 5.29% | 0.060 us | 0.55% | PASS |
I16 | I64 | 2^20 | 0 | 16.761 us | 3.21% | 17.238 us | 2.90% | 0.477 us | 2.85% | PASS |
I16 | I64 | 2^24 | 0 | 98.174 us | 0.79% | 102.736 us | 0.99% | 4.563 us | 4.65% | FAIL |
I16 | I64 | 2^28 | 0 | 1.356 ms | 0.50% | 1.453 ms | 0.35% | 96.181 us | 7.09% | FAIL |
I32 | I32 | 2^16 | 1 | 10.802 us | 5.53% | 11.041 us | 4.55% | 0.240 us | 2.22% | PASS |
I32 | I32 | 2^20 | 1 | 17.680 us | 3.19% | 17.937 us | 3.18% | 0.257 us | 1.46% | PASS |
I32 | I32 | 2^24 | 1 | 117.972 us | 2.52% | 122.942 us | 4.55% | 4.970 us | 4.21% | FAIL |
I32 | I32 | 2^28 | 1 | 1.686 ms | 0.75% | 1.758 ms | 1.41% | 72.262 us | 4.29% | FAIL |
I32 | I32 | 2^16 | 0.544 | 10.497 us | 4.70% | 10.917 us | 5.08% | 0.420 us | 4.00% | PASS |
I32 | I32 | 2^20 | 0.544 | 17.420 us | 3.16% | 17.763 us | 3.23% | 0.343 us | 1.97% | PASS |
I32 | I32 | 2^24 | 0.544 | 100.675 us | 2.20% | 106.385 us | 2.20% | 5.710 us | 5.67% | FAIL |
I32 | I32 | 2^28 | 0.544 | 1.348 ms | 0.69% | 1.401 ms | 0.96% | 52.663 us | 3.91% | FAIL |
I32 | I32 | 2^16 | 0 | 9.993 us | 5.48% | 10.154 us | 4.64% | 0.161 us | 1.61% | PASS |
I32 | I32 | 2^20 | 0 | 17.586 us | 19.34% | 17.253 us | 8.35% | -0.333 us | -1.89% | PASS |
I32 | I32 | 2^24 | 0 | 82.159 us | 2.07% | 84.814 us | 2.96% | 2.655 us | 3.23% | FAIL |
I32 | I32 | 2^28 | 0 | 932.078 us | 0.71% | 988.164 us | 0.80% | 56.087 us | 6.02% | FAIL |
I32 | I64 | 2^16 | 1 | 10.954 us | 5.06% | 11.092 us | 4.54% | 0.138 us | 1.26% | PASS |
I32 | I64 | 2^20 | 1 | 19.086 us | 2.82% | 19.605 us | 2.61% | 0.519 us | 2.72% | FAIL |
I32 | I64 | 2^24 | 1 | 126.630 us | 0.86% | 130.365 us | 0.93% | 3.734 us | 2.95% | FAIL |
I32 | I64 | 2^28 | 1 | 1.870 ms | 0.35% | 1.937 ms | 0.32% | 66.480 us | 3.55% | FAIL |
I32 | I64 | 2^16 | 0.544 | 10.859 us | 5.41% | 11.231 us | 4.18% | 0.372 us | 3.42% | PASS |
I32 | I64 | 2^20 | 0.544 | 18.805 us | 2.88% | 19.237 us | 2.65% | 0.432 us | 2.30% | PASS |
I32 | I64 | 2^24 | 0.544 | 117.265 us | 0.86% | 122.339 us | 1.00% | 5.075 us | 4.33% | FAIL |
I32 | I64 | 2^28 | 0.544 | 1.710 ms | 0.48% | 1.786 ms | 0.41% | 76.582 us | 4.48% | FAIL |
I32 | I64 | 2^16 | 0 | 10.346 us | 4.31% | 10.599 us | 5.19% | 0.253 us | 2.45% | PASS |
I32 | I64 | 2^20 | 0 | 18.268 us | 2.86% | 18.669 us | 2.57% | 0.401 us | 2.19% | PASS |
I32 | I64 | 2^24 | 0 | 105.070 us | 0.86% | 109.843 us | 0.73% | 4.773 us | 4.54% | FAIL |
I32 | I64 | 2^28 | 0 | 1.433 ms | 0.50% | 1.526 ms | 0.43% | 92.698 us | 6.47% | FAIL |
I64 | I32 | 2^16 | 1 | 10.832 us | 5.21% | 11.248 us | 4.48% | 0.416 us | 3.84% | PASS |
I64 | I32 | 2^20 | 1 | 23.997 us | 2.55% | 23.766 us | 2.48% | -0.231 us | -0.96% | PASS |
I64 | I32 | 2^24 | 1 | 210.850 us | 0.77% | 214.249 us | 1.11% | 3.398 us | 1.61% | FAIL |
I64 | I32 | 2^28 | 1 | 3.185 ms | 0.50% | 3.230 ms | 0.50% | 45.208 us | 1.42% | FAIL |
I64 | I32 | 2^16 | 0.544 | 11.070 us | 4.94% | 11.088 us | 4.56% | 0.018 us | 0.17% | PASS |
I64 | I32 | 2^20 | 0.544 | 23.345 us | 2.31% | 23.337 us | 3.05% | -0.007 us | -0.03% | PASS |
I64 | I32 | 2^24 | 0.544 | 174.978 us | 0.85% | 177.655 us | 1.12% | 2.676 us | 1.53% | FAIL |
I64 | I32 | 2^28 | 0.544 | 2.532 ms | 0.50% | 2.573 ms | 0.50% | 40.790 us | 1.61% | FAIL |
I64 | I32 | 2^16 | 0 | 10.724 us | 5.47% | 10.710 us | 5.32% | -0.013 us | -0.12% | PASS |
I64 | I32 | 2^20 | 0 | 22.017 us | 2.55% | 21.899 us | 2.69% | -0.118 us | -0.53% | PASS |
I64 | I32 | 2^24 | 0 | 126.339 us | 1.10% | 128.704 us | 1.49% | 2.365 us | 1.87% | FAIL |
I64 | I32 | 2^28 | 0 | 1.612 ms | 0.63% | 1.661 ms | 0.79% | 48.768 us | 3.03% | FAIL |
I64 | I64 | 2^16 | 1 | 11.439 us | 4.09% | 11.624 us | 4.82% | 0.185 us | 1.62% | PASS |
I64 | I64 | 2^20 | 1 | 26.201 us | 2.36% | 26.918 us | 2.21% | 0.718 us | 2.74% | FAIL |
I64 | I64 | 2^24 | 1 | 232.302 us | 0.60% | 238.925 us | 0.67% | 6.623 us | 2.85% | FAIL |
I64 | I64 | 2^28 | 1 | 3.559 ms | 0.45% | 3.664 ms | 0.32% | 104.862 us | 2.95% | FAIL |
I64 | I64 | 2^16 | 0.544 | 11.441 us | 4.39% | 11.572 us | 4.28% | 0.131 us | 1.14% | PASS |
I64 | I64 | 2^20 | 0.544 | 25.943 us | 2.04% | 26.668 us | 2.03% | 0.725 us | 2.79% | FAIL |
I64 | I64 | 2^24 | 0.544 | 207.195 us | 0.56% | 216.482 us | 0.63% | 9.287 us | 4.48% | FAIL |
I64 | I64 | 2^28 | 0.544 | 3.138 ms | 0.25% | 3.277 ms | 0.23% | 139.576 us | 4.45% | FAIL |
I64 | I64 | 2^16 | 0 | 11.340 us | 4.31% | 11.459 us | 4.49% | 0.119 us | 1.05% | PASS |
I64 | I64 | 2^20 | 0 | 25.170 us | 2.27% | 25.708 us | 2.26% | 0.539 us | 2.14% | PASS |
I64 | I64 | 2^24 | 0 | 181.791 us | 0.53% | 191.160 us | 0.62% | 9.369 us | 5.15% | FAIL |
I64 | I64 | 2^28 | 0 | 2.653 ms | 0.50% | 2.827 ms | 0.43% | 173.655 us | 6.54% | FAIL |
I128 | I32 | 2^16 | 1 | 12.977 us | 4.24% | 12.915 us | 4.32% | -0.062 us | -0.47% | PASS |
I128 | I32 | 2^20 | 1 | 40.951 us | 1.84% | 42.435 us | 2.38% | 1.484 us | 3.62% | FAIL |
I128 | I32 | 2^24 | 1 | 427.279 us | 2.77% | 440.225 us | 3.02% | 12.947 us | 3.03% | FAIL |
I128 | I32 | 2^28 | 1 | 6.582 ms | 0.80% | 6.775 ms | 0.80% | 192.812 us | 2.93% | FAIL |
I128 | I32 | 2^16 | 0.544 | 12.735 us | 4.56% | 12.525 us | 4.26% | -0.210 us | -1.65% | PASS |
I128 | I32 | 2^20 | 0.544 | 36.219 us | 2.55% | 36.966 us | 2.40% | 0.747 us | 2.06% | PASS |
I128 | I32 | 2^24 | 0.544 | 335.504 us | 1.05% | 345.505 us | 2.20% | 10.001 us | 2.98% | FAIL |
I128 | I32 | 2^28 | 0.544 | 5.081 ms | 0.50% | 5.237 ms | 0.67% | 155.983 us | 3.07% | FAIL |
I128 | I32 | 2^16 | 0 | 11.941 us | 4.68% | 11.964 us | 4.92% | 0.023 us | 0.19% | PASS |
I128 | I32 | 2^20 | 0 | 32.979 us | 2.32% | 33.000 us | 2.12% | 0.021 us | 0.06% | PASS |
I128 | I32 | 2^24 | 0 | 225.813 us | 0.68% | 234.537 us | 0.84% | 8.724 us | 3.86% | FAIL |
I128 | I32 | 2^28 | 0 | 3.175 ms | 0.50% | 3.320 ms | 0.50% | 144.796 us | 4.56% | FAIL |
I128 | I64 | 2^16 | 1 | 13.888 us | 4.96% | 13.856 us | 4.23% | -0.032 us | -0.23% | PASS |
I128 | I64 | 2^20 | 1 | 43.101 us | 1.81% | 44.190 us | 1.91% | 1.089 us | 2.53% | FAIL |
I128 | I64 | 2^24 | 1 | 485.846 us | 0.49% | 501.259 us | 0.42% | 15.413 us | 3.17% | FAIL |
I128 | I64 | 2^28 | 1 | 7.619 ms | 0.39% | 7.865 ms | 0.31% | 245.257 us | 3.22% | FAIL |
I128 | I64 | 2^16 | 0.544 | 13.515 us | 3.62% | 13.434 us | 3.87% | -0.080 us | -0.59% | PASS |
I128 | I64 | 2^20 | 0.544 | 41.715 us | 1.83% | 42.390 us | 1.89% | 0.675 us | 1.62% | PASS |
I128 | I64 | 2^24 | 0.544 | 447.034 us | 0.42% | 464.824 us | 0.38% | 17.790 us | 3.98% | FAIL |
I128 | I64 | 2^28 | 0.544 | 6.960 ms | 0.25% | 7.252 ms | 0.20% | 291.424 us | 4.19% | FAIL |
I128 | I64 | 2^16 | 0 | 15.888 us | 34.21% | 13.708 us | 4.45% | -2.180 us | -13.72% | FAIL |
I128 | I64 | 2^20 | 0 | 39.887 us | 2.15% | 40.225 us | 1.91% | 0.339 us | 0.85% | PASS |
I128 | I64 | 2^24 | 0 | 397.473 us | 0.36% | 412.908 us | 0.46% | 15.435 us | 3.88% | FAIL |
I128 | I64 | 2^28 | 0 | 6.129 ms | 0.49% | 6.385 ms | 0.35% | 256.153 us | 4.18% | FAIL |
F32 | I32 | 2^16 | 1 | 11.051 us | 4.55% | 10.888 us | 5.42% | -0.163 us | -1.47% | PASS |
F32 | I32 | 2^20 | 1 | 17.516 us | 2.99% | 17.733 us | 3.19% | 0.217 us | 1.24% | PASS |
F32 | I32 | 2^24 | 1 | 117.833 us | 2.43% | 122.766 us | 4.77% | 4.933 us | 4.19% | FAIL |
F32 | I32 | 2^28 | 1 | 1.697 ms | 0.92% | 1.765 ms | 1.31% | 67.986 us | 4.01% | FAIL |
F32 | I32 | 2^16 | 0.544 | 10.106 us | 5.13% | 10.284 us | 4.26% | 0.179 us | 1.77% | PASS |
F32 | I32 | 2^20 | 0.544 | 17.096 us | 4.58% | 17.642 us | 3.90% | 0.546 us | 3.20% | PASS |
F32 | I32 | 2^24 | 0.544 | 85.431 us | 2.10% | 88.044 us | 2.89% | 2.613 us | 3.06% | FAIL |
F32 | I32 | 2^28 | 0.544 | 1.015 ms | 0.68% | 1.064 ms | 0.55% | 49.253 us | 4.85% | FAIL |
F32 | I32 | 2^16 | 0 | 9.894 us | 5.61% | 10.132 us | 5.02% | 0.238 us | 2.40% | PASS |
F32 | I32 | 2^20 | 0 | 18.221 us | 23.24% | 17.400 us | 9.32% | -0.821 us | -4.51% | PASS |
F32 | I32 | 2^24 | 0 | 82.215 us | 2.24% | 84.631 us | 2.86% | 2.416 us | 2.94% | FAIL |
F32 | I32 | 2^28 | 0 | 930.426 us | 0.55% | 988.323 us | 0.81% | 57.897 us | 6.22% | FAIL |
F32 | I64 | 2^16 | 1 | 10.979 us | 4.83% | 11.289 us | 3.91% | 0.310 us | 2.82% | PASS |
F32 | I64 | 2^20 | 1 | 19.126 us | 2.76% | 19.535 us | 2.43% | 0.409 us | 2.14% | PASS |
F32 | I64 | 2^24 | 1 | 126.971 us | 0.89% | 130.462 us | 0.96% | 3.491 us | 2.75% | FAIL |
F32 | I64 | 2^28 | 1 | 1.875 ms | 0.24% | 1.937 ms | 0.50% | 62.148 us | 3.31% | FAIL |
F32 | I64 | 2^16 | 0.544 | 10.757 us | 5.42% | 11.033 us | 4.79% | 0.275 us | 2.56% | PASS |
F32 | I64 | 2^20 | 0.544 | 18.527 us | 2.79% | 18.861 us | 3.11% | 0.334 us | 1.80% | PASS |
F32 | I64 | 2^24 | 0.544 | 108.536 us | 0.84% | 112.933 us | 1.00% | 4.398 us | 4.05% | FAIL |
F32 | I64 | 2^28 | 0.544 | 1.517 ms | 0.43% | 1.597 ms | 0.40% | 80.000 us | 5.27% | FAIL |
F32 | I64 | 2^16 | 0 | 10.346 us | 4.28% | 10.902 us | 5.20% | 0.556 us | 5.37% | FAIL |
F32 | I64 | 2^20 | 0 | 18.266 us | 2.92% | 18.693 us | 2.87% | 0.427 us | 2.34% | PASS |
F32 | I64 | 2^24 | 0 | 105.444 us | 0.84% | 109.709 us | 0.78% | 4.266 us | 4.05% | FAIL |
F32 | I64 | 2^28 | 0 | 1.445 ms | 0.50% | 1.529 ms | 0.40% | 83.864 us | 5.81% | FAIL |
F64 | I32 | 2^16 | 1 | 10.800 us | 5.07% | 11.403 us | 4.30% | 0.604 us | 5.59% | FAIL |
F64 | I32 | 2^20 | 1 | 23.925 us | 2.55% | 23.718 us | 2.59% | -0.207 us | -0.87% | PASS |
F64 | I32 | 2^24 | 1 | 210.758 us | 0.75% | 213.802 us | 1.07% | 3.044 us | 1.44% | FAIL |
F64 | I32 | 2^28 | 1 | 3.185 ms | 0.50% | 3.230 ms | 0.50% | 44.668 us | 1.40% | FAIL |
F64 | I32 | 2^16 | 0.544 | 10.768 us | 5.53% | 10.646 us | 5.35% | -0.122 us | -1.13% | PASS |
F64 | I32 | 2^20 | 0.544 | 22.167 us | 2.85% | 22.324 us | 2.89% | 0.157 us | 0.71% | PASS |
F64 | I32 | 2^24 | 0.544 | 137.520 us | 1.21% | 139.555 us | 1.46% | 2.035 us | 1.48% | FAIL |
F64 | I32 | 2^28 | 0.544 | 1.846 ms | 0.50% | 1.892 ms | 0.63% | 45.714 us | 2.48% | FAIL |
F64 | I32 | 2^16 | 0 | 10.564 us | 5.11% | 10.446 us | 4.81% | -0.118 us | -1.11% | PASS |
F64 | I32 | 2^20 | 0 | 21.976 us | 2.61% | 21.674 us | 2.55% | -0.301 us | -1.37% | PASS |
F64 | I32 | 2^24 | 0 | 126.097 us | 1.14% | 127.951 us | 1.49% | 1.854 us | 1.47% | FAIL |
F64 | I32 | 2^28 | 0 | 1.607 ms | 0.92% | 1.656 ms | 0.85% | 48.949 us | 3.05% | FAIL |
F64 | I64 | 2^16 | 1 | 11.543 us | 4.54% | 11.689 us | 4.87% | 0.146 us | 1.27% | PASS |
F64 | I64 | 2^20 | 1 | 26.137 us | 2.32% | 26.968 us | 2.15% | 0.830 us | 3.18% | FAIL |
F64 | I64 | 2^24 | 1 | 231.907 us | 0.62% | 238.530 us | 0.70% | 6.624 us | 2.86% | FAIL |
F64 | I64 | 2^28 | 1 | 3.567 ms | 0.40% | 3.655 ms | 0.25% | 88.042 us | 2.47% | FAIL |
F64 | I64 | 2^16 | 0.544 | 11.250 us | 4.32% | 11.386 us | 4.13% | 0.136 us | 1.21% | PASS |
F64 | I64 | 2^20 | 0.544 | 25.499 us | 2.03% | 26.665 us | 2.31% | 1.166 us | 4.57% | FAIL |
F64 | I64 | 2^24 | 0.544 | 189.150 us | 0.58% | 197.122 us | 0.64% | 7.972 us | 4.21% | FAIL |
F64 | I64 | 2^28 | 0.544 | 2.796 ms | 0.50% | 2.934 ms | 0.34% | 137.801 us | 4.93% | FAIL |
F64 | I64 | 2^16 | 0 | 11.405 us | 4.47% | 11.638 us | 4.58% | 0.233 us | 2.04% | PASS |
F64 | I64 | 2^20 | 0 | 25.005 us | 2.24% | 25.606 us | 2.61% | 0.602 us | 2.41% | FAIL |
F64 | I64 | 2^24 | 0 | 180.254 us | 0.57% | 189.613 us | 0.62% | 9.359 us | 5.19% | FAIL |
F64 | I64 | 2^28 | 0 | 2.629 ms | 0.50% | 2.802 ms | 0.42% | 172.879 us | 6.58% | FAIL |
Select.Flagged - NVIDIA A100-PCIE-40GB
[0] NVIDIA A100-PCIE-40GB
T{ct} | OffsetT{ct} | Elements{io} | Entropy | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
---|---|---|---|---|---|---|---|---|---|---|
I8 | I32 | 2^16 | 1 | 10.550 us | 10.37% | 11.009 us | 16.12% | 0.459 us | 4.35% | PASS |
I8 | I32 | 2^20 | 1 | 15.326 us | 3.35% | 15.887 us | 3.90% | 0.561 us | 3.66% | FAIL |
I8 | I32 | 2^24 | 1 | 77.358 us | 3.26% | 82.103 us | 4.19% | 4.745 us | 6.13% | FAIL |
I8 | I32 | 2^28 | 1 | 1.043 ms | 0.50% | 1.118 ms | 0.73% | 74.777 us | 7.17% | FAIL |
I8 | I32 | 2^16 | 0.544 | 10.788 us | 5.26% | 10.942 us | 4.93% | 0.154 us | 1.43% | PASS |
I8 | I32 | 2^20 | 0.544 | 15.230 us | 3.03% | 15.646 us | 3.30% | 0.416 us | 2.73% | PASS |
I8 | I32 | 2^24 | 0.544 | 76.144 us | 2.11% | 80.767 us | 2.21% | 4.623 us | 6.07% | FAIL |
I8 | I32 | 2^28 | 0.544 | 1.017 ms | 0.55% | 1.094 ms | 0.72% | 77.114 us | 7.58% | FAIL |
I8 | I32 | 2^16 | 0 | 10.428 us | 4.59% | 10.554 us | 4.84% | 0.127 us | 1.21% | PASS |
I8 | I32 | 2^20 | 0 | 14.967 us | 5.05% | 15.327 us | 3.82% | 0.360 us | 2.41% | PASS |
I8 | I32 | 2^24 | 0 | 69.982 us | 2.12% | 73.301 us | 3.19% | 3.319 us | 4.74% | FAIL |
I8 | I32 | 2^28 | 0 | 866.153 us | 0.21% | 918.086 us | 0.27% | 51.933 us | 6.00% | FAIL |
I8 | I64 | 2^16 | 1 | 10.908 us | 4.97% | 11.144 us | 4.16% | 0.236 us | 2.16% | PASS |
I8 | I64 | 2^20 | 1 | 17.424 us | 2.91% | 17.918 us | 3.17% | 0.494 us | 2.84% | PASS |
I8 | I64 | 2^24 | 1 | 107.996 us | 0.63% | 113.144 us | 0.93% | 5.148 us | 4.77% | FAIL |
I8 | I64 | 2^28 | 1 | 1.562 ms | 0.29% | 1.653 ms | 0.26% | 91.281 us | 5.84% | FAIL |
I8 | I64 | 2^16 | 0.544 | 10.889 us | 5.04% | 11.076 us | 4.64% | 0.187 us | 1.72% | PASS |
I8 | I64 | 2^20 | 0.544 | 17.289 us | 2.62% | 17.676 us | 2.87% | 0.387 us | 2.24% | PASS |
I8 | I64 | 2^24 | 0.544 | 105.647 us | 0.85% | 110.765 us | 0.94% | 5.118 us | 4.84% | FAIL |
I8 | I64 | 2^28 | 0.544 | 1.525 ms | 0.47% | 1.616 ms | 0.37% | 90.833 us | 5.96% | FAIL |
I8 | I64 | 2^16 | 0 | 10.240 us | 3.96% | 10.422 us | 4.49% | 0.182 us | 1.78% | PASS |
I8 | I64 | 2^20 | 0 | 16.855 us | 3.45% | 17.290 us | 3.23% | 0.435 us | 2.58% | PASS |
I8 | I64 | 2^24 | 0 | 101.577 us | 0.70% | 106.896 us | 0.84% | 5.319 us | 5.24% | FAIL |
I8 | I64 | 2^28 | 0 | 1.406 ms | 0.10% | 1.498 ms | 0.12% | 91.279 us | 6.49% | FAIL |
I16 | I32 | 2^16 | 1 | 11.346 us | 4.13% | 11.864 us | 4.72% | 0.518 us | 4.56% | FAIL |
I16 | I32 | 2^20 | 1 | 17.175 us | 4.31% | 17.757 us | 3.53% | 0.581 us | 3.39% | PASS |
I16 | I32 | 2^24 | 1 | 100.276 us | 1.91% | 95.588 us | 2.53% | -4.688 us | -4.67% | FAIL |
I16 | I32 | 2^28 | 1 | 1.336 ms | 0.36% | 1.269 ms | 0.40% | -67.190 us | -5.03% | FAIL |
I16 | I32 | 2^16 | 0.544 | 11.666 us | 4.70% | 12.260 us | 3.98% | 0.593 us | 5.09% | FAIL |
I16 | I32 | 2^20 | 0.544 | 16.571 us | 3.29% | 17.085 us | 3.35% | 0.514 us | 3.10% | PASS |
I16 | I32 | 2^24 | 0.544 | 95.417 us | 1.92% | 90.798 us | 2.44% | -4.619 us | -4.84% | FAIL |
I16 | I32 | 2^28 | 0.544 | 1.248 ms | 0.50% | 1.165 ms | 0.39% | -82.988 us | -6.65% | FAIL |
I16 | I32 | 2^16 | 0 | 10.870 us | 5.21% | 11.374 us | 3.85% | 0.504 us | 4.64% | FAIL |
I16 | I32 | 2^20 | 0 | 16.404 us | 3.30% | 17.037 us | 3.54% | 0.633 us | 3.86% | FAIL |
I16 | I32 | 2^24 | 0 | 84.071 us | 2.11% | 79.902 us | 2.78% | -4.170 us | -4.96% | FAIL |
I16 | I32 | 2^28 | 0 | 1.037 ms | 0.24% | 938.336 us | 0.43% | -98.709 us | -9.52% | FAIL |
I16 | I64 | 2^16 | 1 | 10.695 us | 5.20% | 11.102 us | 4.57% | 0.407 us | 3.81% | PASS |
I16 | I64 | 2^20 | 1 | 18.518 us | 2.93% | 19.119 us | 2.77% | 0.601 us | 3.25% | FAIL |
I16 | I64 | 2^24 | 1 | 116.678 us | 0.84% | 121.883 us | 0.96% | 5.204 us | 4.46% | FAIL |
I16 | I64 | 2^28 | 1 | 1.696 ms | 0.29% | 1.777 ms | 0.26% | 81.068 us | 4.78% | FAIL |
I16 | I64 | 2^16 | 0.544 | 10.609 us | 5.14% | 10.979 us | 4.78% | 0.370 us | 3.49% | PASS |
I16 | I64 | 2^20 | 0.544 | 18.384 us | 2.45% | 18.899 us | 2.92% | 0.515 us | 2.80% | FAIL |
I16 | I64 | 2^24 | 0.544 | 114.292 us | 0.81% | 119.623 us | 0.90% | 5.331 us | 4.66% | FAIL |
I16 | I64 | 2^28 | 0.544 | 1.656 ms | 0.43% | 1.736 ms | 0.41% | 79.875 us | 4.82% | FAIL |
I16 | I64 | 2^16 | 0 | 10.696 us | 5.28% | 11.085 us | 4.58% | 0.389 us | 3.63% | PASS |
I16 | I64 | 2^20 | 0 | 17.845 us | 3.19% | 18.220 us | 2.95% | 0.375 us | 2.10% | PASS |
I16 | I64 | 2^24 | 0 | 107.377 us | 0.76% | 112.512 us | 0.95% | 5.135 us | 4.78% | FAIL |
I16 | I64 | 2^28 | 0 | 1.484 ms | 0.12% | 1.572 ms | 0.14% | 88.041 us | 5.93% | FAIL |
I32 | I32 | 2^16 | 1 | 10.617 us | 5.36% | 11.016 us | 4.73% | 0.399 us | 3.76% | PASS |
I32 | I32 | 2^20 | 1 | 19.408 us | 2.75% | 20.367 us | 2.91% | 0.959 us | 4.94% | FAIL |
I32 | I32 | 2^24 | 1 | 137.828 us | 1.01% | 144.374 us | 1.37% | 6.545 us | 4.75% | FAIL |
I32 | I32 | 2^28 | 1 | 1.953 ms | 0.50% | 2.047 ms | 0.57% | 94.395 us | 4.83% | FAIL |
I32 | I32 | 2^16 | 0.544 | 10.468 us | 4.74% | 10.768 us | 5.31% | 0.300 us | 2.87% | PASS |
I32 | I32 | 2^20 | 0.544 | 19.095 us | 3.06% | 20.093 us | 3.78% | 0.998 us | 5.22% | FAIL |
I32 | I32 | 2^24 | 0.544 | 124.197 us | 1.21% | 130.603 us | 1.48% | 6.406 us | 5.16% | FAIL |
I32 | I32 | 2^28 | 0.544 | 1.694 ms | 0.50% | 1.811 ms | 0.50% | 116.534 us | 6.88% | FAIL |
I32 | I32 | 2^16 | 0 | 10.018 us | 4.88% | 10.266 us | 4.59% | 0.248 us | 2.48% | PASS |
I32 | I32 | 2^20 | 0 | 18.045 us | 3.29% | 18.504 us | 2.86% | 0.459 us | 2.54% | PASS |
I32 | I32 | 2^24 | 0 | 108.258 us | 1.09% | 111.968 us | 1.45% | 3.710 us | 3.43% | FAIL |
I32 | I32 | 2^28 | 0 | 1.392 ms | 0.18% | 1.480 ms | 0.21% | 88.184 us | 6.34% | FAIL |
I32 | I64 | 2^16 | 1 | 11.141 us | 4.37% | 11.378 us | 4.41% | 0.237 us | 2.13% | PASS |
I32 | I64 | 2^20 | 1 | 20.818 us | 2.50% | 20.953 us | 2.55% | 0.135 us | 0.65% | PASS |
I32 | I64 | 2^24 | 1 | 141.316 us | 0.66% | 144.585 us | 0.68% | 3.269 us | 2.31% | FAIL |
I32 | I64 | 2^28 | 1 | 2.059 ms | 0.25% | 2.123 ms | 0.33% | 64.328 us | 3.12% | FAIL |
I32 | I64 | 2^16 | 0.544 | 11.284 us | 3.52% | 11.303 us | 3.50% | 0.019 us | 0.17% | PASS |
I32 | I64 | 2^20 | 0.544 | 20.760 us | 2.40% | 20.879 us | 2.53% | 0.119 us | 0.57% | PASS |
I32 | I64 | 2^24 | 0.544 | 131.145 us | 0.75% | 135.526 us | 0.83% | 4.381 us | 3.34% | FAIL |
I32 | I64 | 2^28 | 0.544 | 1.857 ms | 0.46% | 1.932 ms | 0.42% | 75.349 us | 4.06% | FAIL |
I32 | I64 | 2^16 | 0 | 10.646 us | 5.35% | 10.799 us | 5.12% | 0.153 us | 1.43% | PASS |
I32 | I64 | 2^20 | 0 | 19.968 us | 2.66% | 20.118 us | 2.97% | 0.150 us | 0.75% | PASS |
I32 | I64 | 2^24 | 0 | 116.104 us | 0.71% | 120.232 us | 0.88% | 4.128 us | 3.56% | FAIL |
I32 | I64 | 2^28 | 0 | 1.579 ms | 0.13% | 1.659 ms | 0.15% | 79.681 us | 5.04% | FAIL |
I64 | I32 | 2^16 | 1 | 11.029 us | 4.80% | 11.216 us | 4.26% | 0.188 us | 1.70% | PASS |
I64 | I32 | 2^20 | 1 | 27.381 us | 2.16% | 28.194 us | 2.24% | 0.814 us | 2.97% | FAIL |
I64 | I32 | 2^24 | 1 | 232.335 us | 1.19% | 243.800 us | 3.40% | 11.466 us | 4.93% | FAIL |
I64 | I32 | 2^28 | 1 | 3.487 ms | 0.55% | 3.672 ms | 0.91% | 185.850 us | 5.33% | FAIL |
I64 | I32 | 2^16 | 0.544 | 11.294 us | 4.37% | 11.290 us | 4.21% | -0.005 us | -0.04% | PASS |
I64 | I32 | 2^20 | 0.544 | 26.590 us | 2.06% | 27.128 us | 2.42% | 0.539 us | 2.03% | PASS |
I64 | I32 | 2^24 | 0.544 | 193.905 us | 1.03% | 200.667 us | 1.62% | 6.762 us | 3.49% | FAIL |
I64 | I32 | 2^28 | 0.544 | 2.815 ms | 0.53% | 2.924 ms | 0.79% | 108.853 us | 3.87% | FAIL |
I64 | I32 | 2^16 | 0 | 10.779 us | 5.41% | 10.723 us | 5.30% | -0.056 us | -0.52% | PASS |
I64 | I32 | 2^20 | 0 | 24.282 us | 2.48% | 24.749 us | 2.57% | 0.467 us | 1.92% | PASS |
I64 | I32 | 2^24 | 0 | 148.208 us | 0.77% | 154.013 us | 1.07% | 5.805 us | 3.92% | FAIL |
I64 | I32 | 2^28 | 0 | 1.972 ms | 0.17% | 2.080 ms | 0.30% | 108.251 us | 5.49% | FAIL |
I64 | I64 | 2^16 | 1 | 11.677 us | 4.66% | 11.843 us | 4.89% | 0.166 us | 1.42% | PASS |
I64 | I64 | 2^20 | 1 | 27.792 us | 1.86% | 28.562 us | 1.97% | 0.770 us | 2.77% | FAIL |
I64 | I64 | 2^24 | 1 | 245.119 us | 0.53% | 250.969 us | 0.62% | 5.850 us | 2.39% | FAIL |
I64 | I64 | 2^28 | 1 | 3.735 ms | 0.50% | 3.823 ms | 0.49% | 87.727 us | 2.35% | FAIL |
I64 | I64 | 2^16 | 0.544 | 11.712 us | 4.93% | 11.883 us | 4.87% | 0.171 us | 1.46% | PASS |
I64 | I64 | 2^20 | 0.544 | 27.297 us | 2.19% | 28.330 us | 2.08% | 1.033 us | 3.78% | FAIL |
I64 | I64 | 2^24 | 0.544 | 217.224 us | 0.58% | 226.457 us | 0.61% | 9.233 us | 4.25% | FAIL |
I64 | I64 | 2^28 | 0.544 | 3.235 ms | 0.50% | 3.399 ms | 0.43% | 163.751 us | 5.06% | FAIL |
I64 | I64 | 2^16 | 0 | 11.519 us | 4.33% | 11.842 us | 4.77% | 0.323 us | 2.80% | PASS |
I64 | I64 | 2^20 | 0 | 26.621 us | 2.26% | 27.374 us | 2.28% | 0.753 us | 2.83% | FAIL |
I64 | I64 | 2^24 | 0 | 191.486 us | 0.51% | 200.126 us | 0.62% | 8.640 us | 4.51% | FAIL |
I64 | I64 | 2^28 | 0 | 2.787 ms | 0.13% | 2.943 ms | 0.14% | 155.775 us | 5.59% | FAIL |
I128 | I32 | 2^16 | 1 | 12.278 us | 4.09% | 12.584 us | 4.34% | 0.306 us | 2.49% | PASS |
I128 | I32 | 2^20 | 1 | 40.328 us | 1.81% | 40.735 us | 1.66% | 0.407 us | 1.01% | PASS |
I128 | I32 | 2^24 | 1 | 426.357 us | 0.56% | 442.825 us | 1.05% | 16.468 us | 3.86% | FAIL |
I128 | I32 | 2^28 | 1 | 6.605 ms | 0.50% | 6.875 ms | 0.50% | 270.120 us | 4.09% | FAIL |
I128 | I32 | 2^16 | 0.544 | 12.255 us | 4.08% | 12.675 us | 4.46% | 0.420 us | 3.43% | PASS |
I128 | I32 | 2^20 | 0.544 | 37.975 us | 1.92% | 38.504 us | 1.83% | 0.529 us | 1.39% | PASS |
I128 | I32 | 2^24 | 0.544 | 345.927 us | 0.69% | 357.985 us | 1.20% | 12.058 us | 3.49% | FAIL |
I128 | I32 | 2^28 | 0.544 | 5.232 ms | 0.50% | 5.399 ms | 0.50% | 167.768 us | 3.21% | FAIL |
I128 | I32 | 2^16 | 0 | 11.840 us | 4.88% | 12.098 us | 4.68% | 0.258 us | 2.18% | PASS |
I128 | I32 | 2^20 | 0 | 35.066 us | 2.06% | 35.104 us | 1.86% | 0.038 us | 0.11% | PASS |
I128 | I32 | 2^24 | 0 | 234.184 us | 0.79% | 246.398 us | 0.99% | 12.214 us | 5.22% | FAIL |
I128 | I32 | 2^28 | 0 | 3.333 ms | 0.16% | 3.498 ms | 0.24% | 165.087 us | 4.95% | FAIL |
I128 | I64 | 2^16 | 1 | 13.868 us | 4.04% | 14.180 us | 4.05% | 0.312 us | 2.25% | PASS |
I128 | I64 | 2^20 | 1 | 43.816 us | 1.58% | 45.447 us | 1.85% | 1.631 us | 3.72% | FAIL |
I128 | I64 | 2^24 | 1 | 491.981 us | 0.50% | 508.654 us | 0.43% | 16.673 us | 3.39% | FAIL |
I128 | I64 | 2^28 | 1 | 7.708 ms | 0.46% | 7.969 ms | 0.36% | 261.033 us | 3.39% | FAIL |
I128 | I64 | 2^16 | 0.544 | 13.439 us | 3.68% | 13.799 us | 4.32% | 0.360 us | 2.68% | PASS |
I128 | I64 | 2^20 | 0.544 | 42.957 us | 3.84% | 44.550 us | 4.07% | 1.593 us | 3.71% | PASS |
I128 | I64 | 2^24 | 0.544 | 449.942 us | 0.43% | 469.523 us | 0.39% | 19.581 us | 4.35% | FAIL |
I128 | I64 | 2^28 | 0.544 | 7.014 ms | 0.45% | 7.325 ms | 0.36% | 311.103 us | 4.44% | FAIL |
I128 | I64 | 2^16 | 0 | 14.666 us | 23.28% | 15.752 us | 29.31% | 1.087 us | 7.41% | PASS |
I128 | I64 | 2^20 | 0 | 40.049 us | 1.55% | 41.397 us | 1.85% | 1.348 us | 3.37% | FAIL |
I128 | I64 | 2^24 | 0 | 397.476 us | 0.34% | 415.143 us | 0.51% | 17.667 us | 4.44% | FAIL |
I128 | I64 | 2^28 | 0 | 6.109 ms | 0.10% | 6.396 ms | 0.11% | 287.378 us | 4.70% | FAIL |
F32 | I32 | 2^16 | 1 | 11.108 us | 4.45% | 11.313 us | 3.90% | 0.205 us | 1.84% | PASS |
F32 | I32 | 2^20 | 1 | 19.752 us | 2.75% | 20.315 us | 2.91% | 0.564 us | 2.85% | FAIL |
F32 | I32 | 2^24 | 1 | 137.973 us | 1.01% | 144.387 us | 1.42% | 6.414 us | 4.65% | FAIL |
F32 | I32 | 2^28 | 1 | 1.948 ms | 0.50% | 2.041 ms | 0.50% | 93.589 us | 4.80% | FAIL |
F32 | I32 | 2^16 | 0.544 | 10.518 us | 5.06% | 10.886 us | 5.29% | 0.367 us | 3.49% | PASS |
F32 | I32 | 2^20 | 0.544 | 19.152 us | 3.25% | 19.804 us | 3.48% | 0.653 us | 3.41% | FAIL |
F32 | I32 | 2^24 | 0.544 | 124.317 us | 1.18% | 130.768 us | 1.47% | 6.451 us | 5.19% | FAIL |
F32 | I32 | 2^28 | 0.544 | 1.694 ms | 0.50% | 1.811 ms | 0.50% | 116.890 us | 6.90% | FAIL |
F32 | I32 | 2^16 | 0 | 10.079 us | 5.26% | 10.410 us | 4.51% | 0.330 us | 3.28% | PASS |
F32 | I32 | 2^20 | 0 | 18.077 us | 3.23% | 18.776 us | 3.06% | 0.699 us | 3.87% | FAIL |
F32 | I32 | 2^24 | 0 | 108.270 us | 1.06% | 112.225 us | 1.42% | 3.955 us | 3.65% | FAIL |
F32 | I32 | 2^28 | 0 | 1.391 ms | 0.19% | 1.481 ms | 0.21% | 89.383 us | 6.42% | FAIL |
F32 | I64 | 2^16 | 1 | 11.168 us | 4.32% | 11.582 us | 4.81% | 0.414 us | 3.71% | PASS |
F32 | I64 | 2^20 | 1 | 20.750 us | 2.56% | 21.095 us | 2.58% | 0.345 us | 1.66% | PASS |
F32 | I64 | 2^24 | 1 | 141.341 us | 0.66% | 144.912 us | 0.70% | 3.571 us | 2.53% | FAIL |
F32 | I64 | 2^28 | 1 | 2.050 ms | 0.39% | 2.110 ms | 0.26% | 59.936 us | 2.92% | FAIL |
F32 | I64 | 2^16 | 0.544 | 11.350 us | 3.75% | 11.603 us | 4.78% | 0.253 us | 2.23% | PASS |
F32 | I64 | 2^20 | 0.544 | 20.667 us | 2.45% | 20.948 us | 2.86% | 0.282 us | 1.36% | PASS |
F32 | I64 | 2^24 | 0.544 | 131.247 us | 0.76% | 135.918 us | 0.85% | 4.671 us | 3.56% | FAIL |
F32 | I64 | 2^28 | 0.544 | 1.857 ms | 0.47% | 1.933 ms | 0.43% | 75.756 us | 4.08% | FAIL |
F32 | I64 | 2^16 | 0 | 11.030 us | 4.63% | 11.340 us | 3.82% | 0.310 us | 2.81% | PASS |
F32 | I64 | 2^20 | 0 | 19.882 us | 2.77% | 20.193 us | 2.61% | 0.311 us | 1.56% | PASS |
F32 | I64 | 2^24 | 0 | 116.187 us | 0.73% | 120.631 us | 0.88% | 4.444 us | 3.82% | FAIL |
F32 | I64 | 2^28 | 0 | 1.579 ms | 0.12% | 1.659 ms | 0.15% | 80.105 us | 5.07% | FAIL |
F64 | I32 | 2^16 | 1 | 11.418 us | 4.04% | 11.761 us | 4.85% | 0.343 us | 3.00% | PASS |
F64 | I32 | 2^20 | 1 | 27.530 us | 2.43% | 28.444 us | 2.96% | 0.915 us | 3.32% | FAIL |
F64 | I32 | 2^24 | 1 | 232.420 us | 1.34% | 244.087 us | 2.88% | 11.667 us | 5.02% | FAIL |
F64 | I32 | 2^28 | 1 | 3.489 ms | 0.55% | 3.674 ms | 0.90% | 184.551 us | 5.29% | FAIL |
F64 | I32 | 2^16 | 0.544 | 10.880 us | 5.22% | 11.202 us | 4.28% | 0.322 us | 2.96% | PASS |
F64 | I32 | 2^20 | 0.544 | 26.797 us | 2.10% | 27.385 us | 2.32% | 0.588 us | 2.19% | FAIL |
F64 | I32 | 2^24 | 0.544 | 193.974 us | 1.07% | 200.739 us | 1.62% | 6.766 us | 3.49% | FAIL |
F64 | I32 | 2^28 | 0.544 | 2.815 ms | 0.53% | 2.925 ms | 0.82% | 109.445 us | 3.89% | FAIL |
F64 | I32 | 2^16 | 0 | 10.368 us | 4.71% | 10.643 us | 5.40% | 0.275 us | 2.65% | PASS |
F64 | I32 | 2^20 | 0 | 26.214 us | 20.57% | 24.966 us | 2.56% | -1.248 us | -4.76% | FAIL |
F64 | I32 | 2^24 | 0 | 147.941 us | 0.77% | 154.369 us | 1.06% | 6.428 us | 4.34% | FAIL |
F64 | I32 | 2^28 | 0 | 1.971 ms | 0.17% | 2.080 ms | 0.23% | 109.155 us | 5.54% | FAIL |
F64 | I64 | 2^16 | 1 | 11.522 us | 4.60% | 12.045 us | 4.46% | 0.523 us | 4.54% | FAIL |
F64 | I64 | 2^20 | 1 | 27.626 us | 1.76% | 28.816 us | 1.95% | 1.190 us | 4.31% | FAIL |
F64 | I64 | 2^24 | 1 | 245.109 us | 0.54% | 251.156 us | 0.62% | 6.047 us | 2.47% | FAIL |
F64 | I64 | 2^28 | 1 | 3.737 ms | 0.50% | 3.829 ms | 0.35% | 92.013 us | 2.46% | FAIL |
F64 | I64 | 2^16 | 0.544 | 11.550 us | 4.62% | 12.049 us | 4.71% | 0.500 us | 4.33% | PASS |
F64 | I64 | 2^20 | 0.544 | 27.294 us | 2.07% | 28.566 us | 1.92% | 1.271 us | 4.66% | FAIL |
F64 | I64 | 2^24 | 0.544 | 217.308 us | 0.53% | 226.582 us | 0.62% | 9.274 us | 4.27% | FAIL |
F64 | I64 | 2^28 | 0.544 | 3.234 ms | 0.50% | 3.398 ms | 0.47% | 163.952 us | 5.07% | FAIL |
F64 | I64 | 2^16 | 0 | 11.530 us | 4.38% | 12.090 us | 4.38% | 0.560 us | 4.85% | FAIL |
F64 | I64 | 2^20 | 0 | 26.495 us | 2.13% | 27.618 us | 2.12% | 1.123 us | 4.24% | FAIL |
F64 | I64 | 2^24 | 0 | 191.519 us | 0.49% | 200.335 us | 0.58% | 8.816 us | 4.60% | FAIL |
F64 | I64 | 2^28 | 0 | 2.787 ms | 0.13% | 2.943 ms | 0.14% | 155.876 us | 5.59% | FAIL |
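A note on reading the Status column in the tables above: judging from the rows themselves (this is inferred from the data, not taken from the benchmarking tool's source), a row appears to be marked FAIL whenever the absolute %Diff exceeds the smaller of the two noise values. Under that reading, FAIL flags a difference above the noise floor, including speedups such as the I16 | I32 rows of the Select.Flagged table, rather than a failed test. A minimal C++ sketch of that assumed rule, with hypothetical helper names:

```cpp
#include <algorithm>
#include <cmath>
#include <string>

// Hypothetical helpers for illustration; they are not part of any benchmarking tool.

// Relative change of the compared time versus the reference time, in percent.
double percent_diff(double ref_time, double cmp_time) {
  return (cmp_time - ref_time) / ref_time * 100.0;
}

// Assumed rule (inferred from the table rows): a row passes when the absolute
// relative difference stays within the smaller of the two noise figures.
std::string compare_status(double pct_diff, double ref_noise_pct, double cmp_noise_pct) {
  const double min_noise = std::min(ref_noise_pct, cmp_noise_pct);
  return std::abs(pct_diff) <= min_noise ? "PASS" : "FAIL";
}
```

For example, a row with a %Diff of 5.01% and noise values of 6.49% and 5.93% passes, while a row with a %Diff of -13.72% and noise values of 34.21% and 4.45% fails even though the compared run was faster.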
🟩 CI Results: Pass: 100%/198 | Total Time: 3d 14h | Avg Time: 26m 14s | Hits: 64%/118084
# | Runner |
---|---|
154 | linux-amd64-cpu16 |
16 | linux-arm64-cpu16 |
16 | linux-amd64-gpu-v100-latest-1 |
12 | windows-amd64-cpu16 |
👃 Inspect Changes
Modifications in project?
Project | Modified |
---|---|
CCCL Infrastructure | |
libcu++ | |
CUB | +/- |
Thrust | |
CUDA Experimental | |
Modifications in project or dependencies?
Project | Modified |
---|---|
CCCL Infrastructure | |
libcu++ | |
CUB | +/- |
Thrust | +/- |
CUDA Experimental | |
Hi @elstehle, this is a great finding, but I'm surprised to learn that there is an in-place version of thrust::copy_if. I thought the thrust::copy_if output buffer wasn't allowed to overlap with the input buffer; perhaps I misunderstood this. The thrust::copy_if() API documentation states:
If so, I assume that if I use a separate (out-of-place) buffer to store the output when calling thrust::copy_if, I won't encounter this issue. Please correct me if I'm wrong. Thank you very much.
Title changed: "DeviceSelect, thrust::copy_if, and thrust::remove_if" to "DeviceSelect & thrust::remove_if"
Sorry for the confusion. You are correct that […]. However, I believe in-place stream compaction is a reasonable feature request, so I have opened #1799.
This is exactly right: out-of-place usage is not affected by the referenced issue.
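To make the in-place versus out-of-place distinction concrete, here is a minimal host-side sketch using the C++ standard library algorithms that Thrust's API is modeled on. It is an analogy under that assumption, not the Thrust or CUB implementation; is_even and both function names are hypothetical.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical predicate for illustration: select even values.
bool is_even(int x) { return x % 2 == 0; }

// Out-of-place compaction: selected items are written to a separate buffer,
// so the input and output ranges never overlap (as std::copy_if requires).
std::vector<int> compact_out_of_place(const std::vector<int>& in) {
  std::vector<int> out;
  out.reserve(in.size());
  std::copy_if(in.begin(), in.end(), std::back_inserter(out), is_even);
  return out;
}

// In-place compaction: std::remove_if moves the kept items to the front of
// the same buffer and returns the new logical end; erase trims the tail.
void compact_in_place(std::vector<int>& data) {
  auto new_end = std::remove_if(data.begin(), data.end(),
                                [](int x) { return !is_even(x); });
  data.erase(new_end, data.end());
}
```

Both functions keep the same elements in the same order; the difference is only whether the result lands in a fresh buffer or overwrites the input.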
🟨 CI Results: Pass: 97%/198 | Total Time: 4d 02h | Avg Time: 29m 54s | Hits: 32%/115429
|
# | Runner |
---|---|
154 | linux-amd64-cpu16 |
16 | linux-arm64-cpu16 |
16 | linux-amd64-gpu-v100-latest-1 |
12 | windows-amd64-cpu16 |
👃 Inspect Changes
Modifications in project?
 | Project |
---|---|
 | CCCL Infrastructure |
 | libcu++ |
+/- | CUB |
+/- | Thrust |
 | CUDA Experimental |
Edit: It turns out that this branch was still based on 7fe0eb4 (main, May 24), whereas the SASS comparison was against e734d68 (main, Jun 7). After bisecting, I found that the SASS delta I had reported in my original comment (see below) was introduced by 733eb94. Those SASS changes only affected kernels other than
Original comment:
🟨 CI Results: Pass: 99%/198 | Total Time: 3d 00h | Avg Time: 21m 53s | Hits: 68%/117380
🟨 CI finished in 12h 50m: Pass: 99%/249 | Total: 1d 07h | Avg: 7m 36s | Max: 47m 49s | Hits: 99%/246608
🏃 Runner counts (total jobs: 249)
# | Runner |
---|---|
178 | linux-amd64-cpu16 |
40 | linux-amd64-gpu-v100-latest-1 |
16 | linux-arm64-cpu16 |
15 | windows-amd64-cpu16 |
Awesome work! I have a few questions that we should consider before merging this.
🟨 CI finished in 3h 15m: Pass: 99%/249 | Total: 4d 20h | Avg: 28m 10s | Max: 1h 19m | Hits: 65%/247587
🟩 CI finished in 7h 06m: Pass: 100%/249 | Total: 4d 21h | Avg: 28m 12s | Max: 1h 19m | Hits: 65%/248439
…el template instantiations
🟩 CI finished in 3h 38m: Pass: 100%/249 | Total: 4d 17h | Avg: 27m 18s | Max: 50m 52s | Hits: 66%/248439
For the latest commit, aeff76e
T{ct} | OffsetT{ct} | IsInPlace{ct} | Elements{io} | Entropy | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
---|---|---|---|---|---|---|---|---|---|---|---|
I8 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 1 | 9.007 us | 11.14% | 8.900 us | 9.40% | -0.107 us | -1.19% | PASS |
I8 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 1 | 15.431 us | 6.21% | 15.290 us | 4.31% | -0.141 us | -0.91% | PASS |
I8 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 1 | 106.602 us | 1.16% | 106.607 us | 0.99% | 0.005 us | 0.01% | PASS |
I8 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 1 | 1.582 ms | 0.50% | 1.582 ms | 0.50% | -0.004 us | -0.00% | PASS |
I8 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0.544 | 8.909 us | 11.30% | 8.745 us | 7.71% | -0.164 us | -1.84% | PASS |
I8 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0.544 | 14.776 us | 6.39% | 14.673 us | 4.22% | -0.104 us | -0.70% | PASS |
I8 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0.544 | 99.508 us | 1.22% | 99.539 us | 0.95% | 0.031 us | 0.03% | PASS |
I8 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0.544 | 1.452 ms | 0.50% | 1.452 ms | 0.50% | -0.203 us | -0.01% | PASS |
I8 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0 | 8.681 us | 12.13% | 8.522 us | 8.96% | -0.160 us | -1.84% | PASS |
I8 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0 | 14.494 us | 6.75% | 14.376 us | 5.30% | -0.118 us | -0.81% | PASS |
I8 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0 | 90.669 us | 1.17% | 90.674 us | 0.76% | 0.005 us | 0.01% | PASS |
I8 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0 | 1.291 ms | 0.46% | 1.291 ms | 0.46% | -0.071 us | -0.01% | PASS |
I8 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 1 | 8.881 us | 11.96% | 9.217 us | 7.13% | 0.336 us | 3.78% | PASS |
I8 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 1 | 15.201 us | 6.24% | 16.716 us | 3.56% | 1.514 us | 9.96% | FAIL |
I8 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 1 | 106.884 us | 1.13% | 130.418 us | 0.97% | 23.534 us | 22.02% | FAIL |
I8 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 1 | 1.588 ms | 0.50% | 1.970 ms | 0.50% | 381.817 us | 24.04% | FAIL |
I8 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0.544 | 8.807 us | 10.74% | 9.091 us | 6.10% | 0.284 us | 3.22% | PASS |
I8 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0.544 | 14.614 us | 5.59% | 16.189 us | 4.74% | 1.575 us | 10.78% | FAIL |
I8 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0.544 | 99.812 us | 1.08% | 120.872 us | 0.99% | 21.060 us | 21.10% | FAIL |
I8 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0.544 | 1.459 ms | 0.50% | 1.808 ms | 0.50% | 348.988 us | 23.93% | FAIL |
I8 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0 | 8.556 us | 10.55% | 8.857 us | 8.39% | 0.301 us | 3.52% | PASS |
I8 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0 | 14.264 us | 6.21% | 15.734 us | 4.08% | 1.470 us | 10.31% | FAIL |
I8 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0 | 90.695 us | 1.03% | 108.666 us | 0.90% | 17.971 us | 19.81% | FAIL |
I8 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0 | 1.293 ms | 0.48% | 1.586 ms | 0.45% | 292.777 us | 22.65% | FAIL |
I8 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 1 | 8.992 us | 10.90% | 8.927 us | 7.19% | -0.065 us | -0.73% | PASS |
I8 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 1 | 15.746 us | 6.16% | 15.658 us | 5.04% | -0.089 us | -0.56% | PASS |
I8 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 1 | 112.095 us | 0.91% | 111.839 us | 0.83% | -0.256 us | -0.23% | PASS |
I8 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 1 | 1.661 ms | 0.50% | 1.661 ms | 0.50% | -0.014 us | -0.00% | PASS |
I8 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0.544 | 9.037 us | 9.73% | 8.844 us | 8.08% | -0.193 us | -2.14% | PASS |
I8 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0.544 | 15.293 us | 6.06% | 15.044 us | 4.12% | -0.249 us | -1.63% | PASS |
I8 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0.544 | 105.595 us | 0.99% | 105.342 us | 0.72% | -0.253 us | -0.24% | PASS |
I8 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0.544 | 1.553 ms | 0.50% | 1.551 ms | 0.50% | -1.704 us | -0.11% | PASS |
I8 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0 | 8.755 us | 10.97% | 8.613 us | 8.53% | -0.142 us | -1.63% | PASS |
I8 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0 | 14.901 us | 6.37% | 14.745 us | 4.46% | -0.157 us | -1.05% | PASS |
I8 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0 | 95.914 us | 0.99% | 95.482 us | 0.86% | -0.432 us | -0.45% | PASS |
I8 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0 | 1.366 ms | 0.50% | 1.366 ms | 0.50% | -0.172 us | -0.01% | PASS |
I8 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 1 | 9.045 us | 10.33% | 9.250 us | 6.63% | 0.205 us | 2.26% | PASS |
I8 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 1 | 15.536 us | 5.19% | 16.757 us | 3.73% | 1.221 us | 7.86% | FAIL |
I8 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 1 | 112.301 us | 1.05% | 131.931 us | 1.06% | 19.629 us | 17.48% | FAIL |
I8 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 1 | 1.667 ms | 0.50% | 1.995 ms | 0.50% | 327.554 us | 19.65% | FAIL |
I8 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0.544 | 8.993 us | 10.72% | 9.186 us | 7.10% | 0.193 us | 2.15% | PASS |
I8 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0.544 | 15.061 us | 5.97% | 16.360 us | 4.11% | 1.299 us | 8.62% | FAIL |
I8 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0.544 | 105.852 us | 0.85% | 123.472 us | 0.73% | 17.620 us | 16.65% | FAIL |
I8 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0.544 | 1.562 ms | 0.50% | 1.850 ms | 0.50% | 288.585 us | 18.48% | FAIL |
I8 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0 | 8.641 us | 11.08% | 8.971 us | 8.32% | 0.330 us | 3.82% | PASS |
I8 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0 | 14.509 us | 5.74% | 15.818 us | 4.62% | 1.309 us | 9.02% | FAIL |
I8 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0 | 95.546 us | 0.95% | 110.062 us | 0.64% | 14.516 us | 15.19% | FAIL |
I8 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0 | 1.367 ms | 0.50% | 1.598 ms | 0.50% | 231.192 us | 16.91% | FAIL |
I16 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 1 | 9.076 us | 9.46% | 9.224 us | 6.13% | 0.147 us | 1.62% | PASS |
I16 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 1 | 16.587 us | 5.77% | 16.572 us | 3.68% | -0.015 us | -0.09% | PASS |
I16 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 1 | 125.800 us | 1.24% | 126.143 us | 1.11% | 0.343 us | 0.27% | PASS |
I16 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 1 | 1.873 ms | 0.50% | 1.873 ms | 0.50% | 0.193 us | 0.01% | PASS |
I16 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0.544 | 9.098 us | 10.03% | 9.033 us | 7.38% | -0.066 us | -0.72% | PASS |
I16 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0.544 | 16.305 us | 5.83% | 16.281 us | 3.81% | -0.024 us | -0.15% | PASS |
I16 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0.544 | 115.662 us | 1.16% | 115.605 us | 1.09% | -0.057 us | -0.05% | PASS |
I16 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0.544 | 1.702 ms | 0.50% | 1.702 ms | 0.50% | -0.028 us | -0.00% | PASS |
I16 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0 | 8.738 us | 11.17% | 8.681 us | 7.56% | -0.057 us | -0.65% | PASS |
I16 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0 | 16.013 us | 5.96% | 15.905 us | 4.99% | -0.108 us | -0.67% | PASS |
I16 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0 | 95.845 us | 0.83% | 95.773 us | 0.69% | -0.073 us | -0.08% | PASS |
I16 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0 | 1.349 ms | 0.50% | 1.349 ms | 0.50% | -0.205 us | -0.02% | PASS |
I16 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 1 | 9.290 us | 9.78% | 9.419 us | 6.98% | 0.129 us | 1.39% | PASS |
I16 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 1 | 16.379 us | 5.24% | 17.882 us | 3.78% | 1.503 us | 9.18% | FAIL |
I16 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 1 | 125.515 us | 1.23% | 148.870 us | 1.14% | 23.355 us | 18.61% | FAIL |
I16 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 1 | 1.870 ms | 0.50% | 2.247 ms | 0.50% | 376.694 us | 20.14% | FAIL |
I16 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0.544 | 8.967 us | 10.96% | 9.302 us | 5.65% | 0.335 us | 3.73% | PASS |
I16 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0.544 | 16.083 us | 5.42% | 17.760 us | 3.87% | 1.677 us | 10.43% | FAIL |
I16 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0.544 | 115.538 us | 1.29% | 137.238 us | 1.07% | 21.699 us | 18.78% | FAIL |
I16 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0.544 | 1.703 ms | 0.50% | 2.054 ms | 0.50% | 351.280 us | 20.63% | FAIL |
I16 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0 | 8.619 us | 11.07% | 8.884 us | 7.71% | 0.264 us | 3.07% | PASS |
I16 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0 | 15.684 us | 5.61% | 17.120 us | 4.29% | 1.436 us | 9.15% | FAIL |
I16 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0 | 95.589 us | 1.06% | 115.920 us | 0.79% | 20.331 us | 21.27% | FAIL |
I16 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0 | 1.346 ms | 0.50% | 1.685 ms | 0.50% | 338.197 us | 25.12% | FAIL |
I16 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 1 | 9.442 us | 9.14% | 9.060 us | 7.27% | -0.382 us | -4.05% | PASS |
I16 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 1 | 16.781 us | 5.58% | 16.821 us | 3.87% | 0.040 us | 0.24% | PASS |
I16 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 1 | 128.234 us | 1.14% | 128.370 us | 0.96% | 0.136 us | 0.11% | PASS |
I16 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 1 | 1.904 ms | 0.50% | 1.905 ms | 0.50% | 0.175 us | 0.01% | PASS |
I16 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0.544 | 9.077 us | 9.76% | 9.057 us | 6.52% | -0.019 us | -0.21% | PASS |
I16 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0.544 | 16.648 us | 4.96% | 16.683 us | 4.22% | 0.035 us | 0.21% | PASS |
I16 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0.544 | 118.221 us | 1.21% | 118.322 us | 0.87% | 0.101 us | 0.09% | PASS |
I16 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0.544 | 1.743 ms | 0.50% | 1.743 ms | 0.50% | 0.247 us | 0.01% | PASS |
I16 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0 | 8.813 us | 10.81% | 8.784 us | 8.14% | -0.030 us | -0.33% | PASS |
I16 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0 | 16.289 us | 5.41% | 16.237 us | 4.04% | -0.052 us | -0.32% | PASS |
I16 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0 | 100.343 us | 0.98% | 100.208 us | 0.68% | -0.135 us | -0.13% | PASS |
I16 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0 | 1.419 ms | 0.50% | 1.419 ms | 0.50% | -0.165 us | -0.01% | PASS |
I16 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 1 | 9.163 us | 9.84% | 9.560 us | 6.91% | 0.397 us | 4.34% | PASS |
I16 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 1 | 16.755 us | 5.83% | 18.175 us | 3.96% | 1.420 us | 8.48% | FAIL |
I16 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 1 | 128.191 us | 1.08% | 148.892 us | 0.88% | 20.701 us | 16.15% | FAIL |
I16 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 1 | 1.906 ms | 0.50% | 2.241 ms | 0.50% | 334.878 us | 17.57% | FAIL |
I16 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0.544 | 9.094 us | 10.12% | 9.468 us | 6.78% | 0.375 us | 4.12% | PASS |
I16 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0.544 | 16.608 us | 5.82% | 17.967 us | 4.35% | 1.359 us | 8.18% | FAIL |
I16 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0.544 | 118.511 us | 1.11% | 137.718 us | 0.80% | 19.207 us | 16.21% | FAIL |
I16 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0.544 | 1.747 ms | 0.50% | 2.061 ms | 0.50% | 313.239 us | 17.93% | FAIL |
I16 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0 | 8.780 us | 10.87% | 9.112 us | 7.28% | 0.333 us | 3.79% | PASS |
I16 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0 | 16.130 us | 5.77% | 17.346 us | 3.54% | 1.216 us | 7.54% | FAIL |
I16 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0 | 100.348 us | 1.06% | 115.834 us | 0.69% | 15.486 us | 15.43% | FAIL |
I16 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0 | 1.420 ms | 0.50% | 1.674 ms | 0.50% | 254.693 us | 17.94% | FAIL |
I32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 1 | 9.087 us | 9.50% | 9.100 us | 6.54% | 0.013 us | 0.15% | PASS |
I32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 1 | 19.727 us | 4.18% | 19.902 us | 2.95% | 0.175 us | 0.89% | PASS |
I32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 1 | 184.678 us | 0.94% | 184.797 us | 0.89% | 0.119 us | 0.06% | PASS |
I32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 1 | 2.822 ms | 0.62% | 2.822 ms | 0.62% | -0.019 us | -0.00% | PASS |
I32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0.544 | 9.084 us | 10.31% | 9.121 us | 6.97% | 0.038 us | 0.42% | PASS |
I32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0.544 | 19.709 us | 4.36% | 19.754 us | 3.18% | 0.045 us | 0.23% | PASS |
I32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0.544 | 156.173 us | 1.19% | 156.400 us | 1.15% | 0.228 us | 0.15% | PASS |
I32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0.544 | 2.331 ms | 0.50% | 2.332 ms | 0.50% | 0.268 us | 0.01% | PASS |
I32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0 | 8.811 us | 10.47% | 8.845 us | 7.67% | 0.034 us | 0.39% | PASS |
I32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0 | 18.760 us | 4.75% | 18.830 us | 3.38% | 0.069 us | 0.37% | PASS |
I32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0 | 113.675 us | 1.10% | 113.924 us | 1.00% | 0.249 us | 0.22% | PASS |
I32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0 | 1.580 ms | 1.01% | 1.580 ms | 1.00% | 0.211 us | 0.01% | PASS |
I32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 1 | 9.211 us | 9.99% | 9.522 us | 7.28% | 0.311 us | 3.38% | PASS |
I32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 1 | 19.330 us | 4.83% | 20.825 us | 3.21% | 1.496 us | 7.74% | FAIL |
I32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 1 | 184.238 us | 0.95% | 202.337 us | 1.22% | 18.099 us | 9.82% | FAIL |
I32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 1 | 2.823 ms | 0.61% | 3.117 ms | 0.58% | 294.152 us | 10.42% | FAIL |
I32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0.544 | 9.222 us | 9.56% | 9.565 us | 7.44% | 0.343 us | 3.72% | PASS |
I32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0.544 | 19.430 us | 4.31% | 20.810 us | 3.08% | 1.380 us | 7.10% | FAIL |
I32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0.544 | 155.713 us | 1.17% | 176.721 us | 1.21% | 21.009 us | 13.49% | FAIL |
I32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0.544 | 2.329 ms | 0.50% | 2.678 ms | 0.50% | 348.622 us | 14.97% | FAIL |
I32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0 | 8.950 us | 10.60% | 9.259 us | 7.39% | 0.308 us | 3.45% | PASS |
I32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0 | 18.403 us | 4.97% | 19.856 us | 3.98% | 1.453 us | 7.90% | FAIL |
I32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0 | 113.154 us | 1.11% | 131.679 us | 0.98% | 18.525 us | 16.37% | FAIL |
I32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0 | 1.577 ms | 1.00% | 1.884 ms | 0.66% | 306.858 us | 19.45% | FAIL |
I32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 1 | 9.556 us | 10.15% | 9.282 us | 6.07% | -0.275 us | -2.87% | PASS |
I32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 1 | 20.286 us | 4.29% | 20.178 us | 3.73% | -0.108 us | -0.53% | PASS |
I32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 1 | 186.279 us | 1.06% | 186.296 us | 0.99% | 0.016 us | 0.01% | PASS |
I32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 1 | 2.845 ms | 0.61% | 2.845 ms | 0.61% | -0.075 us | -0.00% | PASS |
I32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0.544 | 9.158 us | 9.66% | 9.349 us | 7.11% | 0.191 us | 2.09% | PASS |
I32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0.544 | 19.940 us | 4.34% | 20.234 us | 3.84% | 0.294 us | 1.47% | PASS |
I32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0.544 | 158.219 us | 1.18% | 157.395 us | 1.12% | -0.824 us | -0.52% | PASS |
I32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0.544 | 2.357 ms | 0.50% | 2.357 ms | 0.50% | -0.289 us | -0.01% | PASS |
I32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0 | 8.898 us | 10.14% | 8.910 us | 7.63% | 0.012 us | 0.14% | PASS |
I32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0 | 19.121 us | 4.63% | 19.203 us | 3.40% | 0.081 us | 0.42% | PASS |
I32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0 | 117.220 us | 1.09% | 117.232 us | 1.07% | 0.012 us | 0.01% | PASS |
I32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0 | 1.642 ms | 0.88% | 1.642 ms | 0.87% | 0.097 us | 0.01% | PASS |
I32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 1 | 9.601 us | 9.71% | 9.630 us | 7.01% | 0.030 us | 0.31% | PASS |
I32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 1 | 19.685 us | 4.33% | 20.800 us | 3.69% | 1.115 us | 5.66% | FAIL |
I32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 1 | 185.464 us | 1.07% | 200.556 us | 1.18% | 15.092 us | 8.14% | FAIL |
I32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 1 | 2.844 ms | 0.60% | 3.080 ms | 0.57% | 235.917 us | 8.29% | FAIL |
I32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0.544 | 9.216 us | 9.41% | 9.851 us | 7.16% | 0.635 us | 6.89% | PASS |
I32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0.544 | 19.648 us | 4.19% | 20.840 us | 3.25% | 1.192 us | 6.07% | FAIL |
I32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0.544 | 157.396 us | 1.18% | 176.233 us | 1.09% | 18.837 us | 11.97% | FAIL |
I32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0.544 | 2.352 ms | 0.50% | 2.666 ms | 0.50% | 313.266 us | 13.32% | FAIL |
I32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0 | 9.038 us | 10.37% | 9.448 us | 6.43% | 0.410 us | 4.54% | PASS |
I32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0 | 18.767 us | 4.70% | 20.207 us | 3.98% | 1.439 us | 7.67% | FAIL |
I32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0 | 116.400 us | 1.07% | 132.050 us | 0.86% | 15.651 us | 13.45% | FAIL |
I32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0 | 1.639 ms | 0.88% | 1.891 ms | 0.59% | 252.971 us | 15.44% | FAIL |
I64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 1 | 10.156 us | 9.39% | 10.174 us | 6.65% | 0.018 us | 0.18% | PASS |
I64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 1 | 29.541 us | 3.10% | 29.546 us | 2.51% | 0.005 us | 0.02% | PASS |
I64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 1 | 349.572 us | 0.58% | 349.468 us | 0.50% | -0.104 us | -0.03% | PASS |
I64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 1 | 5.472 ms | 0.50% | 5.472 ms | 0.50% | -0.069 us | -0.00% | PASS |
I64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0.544 | 10.577 us | 9.61% | 10.532 us | 6.89% | -0.045 us | -0.42% | PASS |
I64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0.544 | 28.041 us | 3.48% | 28.151 us | 2.82% | 0.110 us | 0.39% | PASS |
I64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0.544 | 281.177 us | 0.67% | 281.079 us | 0.61% | -0.098 us | -0.03% | PASS |
I64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0.544 | 4.339 ms | 0.50% | 4.339 ms | 0.50% | 0.414 us | 0.01% | PASS |
I64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0 | 9.914 us | 9.06% | 9.877 us | 6.99% | -0.036 us | -0.37% | PASS |
I64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0 | 27.064 us | 3.54% | 27.703 us | 3.38% | 0.638 us | 2.36% | PASS |
I64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0 | 192.591 us | 0.98% | 192.688 us | 0.97% | 0.097 us | 0.05% | PASS |
I64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0 | 2.833 ms | 0.88% | 2.833 ms | 0.88% | 0.077 us | 0.00% | PASS |
I64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 1 | 10.214 us | 8.21% | 10.605 us | 5.88% | 0.391 us | 3.83% | PASS |
I64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 1 | 29.476 us | 3.90% | 30.299 us | 2.68% | 0.823 us | 2.79% | FAIL |
I64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 1 | 349.741 us | 0.61% | 373.575 us | 1.05% | 23.834 us | 6.81% | FAIL |
I64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 1 | 5.477 ms | 0.50% | 5.866 ms | 0.50% | 388.650 us | 7.10% | FAIL |
I64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0.544 | 10.582 us | 7.84% | 11.063 us | 6.18% | 0.481 us | 4.55% | PASS |
I64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0.544 | 27.939 us | 3.34% | 29.521 us | 3.21% | 1.582 us | 5.66% | FAIL |
I64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0.544 | 280.994 us | 0.74% | 310.603 us | 0.93% | 29.610 us | 10.54% | FAIL |
I64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0.544 | 4.339 ms | 0.50% | 4.823 ms | 0.50% | 483.804 us | 11.15% | FAIL |
I64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0 | 9.888 us | 9.98% | 10.322 us | 5.13% | 0.434 us | 4.39% | PASS |
I64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0 | 27.005 us | 3.50% | 28.126 us | 3.11% | 1.121 us | 4.15% | FAIL |
I64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0 | 192.298 us | 0.91% | 219.359 us | 0.89% | 27.061 us | 14.07% | FAIL |
I64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0 | 2.826 ms | 0.89% | 3.275 ms | 0.64% | 448.628 us | 15.87% | FAIL |
I64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 1 | 10.723 us | 9.01% | 10.641 us | 6.18% | -0.082 us | -0.76% | PASS |
I64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 1 | 30.655 us | 3.12% | 30.421 us | 2.72% | -0.234 us | -0.76% | PASS |
I64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 1 | 357.695 us | 0.68% | 357.707 us | 0.68% | 0.013 us | 0.00% | PASS |
I64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 1 | 5.594 ms | 0.50% | 5.594 ms | 0.50% | -0.107 us | -0.00% | PASS |
I64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0.544 | 10.307 us | 8.96% | 10.293 us | 5.60% | -0.014 us | -0.14% | PASS |
I64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0.544 | 28.926 us | 3.21% | 28.965 us | 3.23% | 0.040 us | 0.14% | PASS |
I64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0.544 | 291.111 us | 0.80% | 291.240 us | 0.78% | 0.129 us | 0.04% | PASS |
I64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0.544 | 4.506 ms | 0.50% | 4.505 ms | 0.50% | -0.563 us | -0.01% | PASS |
I64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0 | 10.427 us | 8.20% | 10.377 us | 6.86% | -0.050 us | -0.48% | PASS |
I64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0 | 27.917 us | 4.04% | 27.829 us | 2.50% | -0.088 us | -0.31% | PASS |
I64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0 | 205.002 us | 0.85% | 205.133 us | 0.82% | 0.131 us | 0.06% | PASS |
I64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0 | 3.046 ms | 0.73% | 3.046 ms | 0.74% | 0.075 us | 0.00% | PASS |
I64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 1 | 10.662 us | 8.70% | 11.186 us | 6.03% | 0.524 us | 4.92% | PASS |
I64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 1 | 30.334 us | 3.39% | 31.449 us | 2.43% | 1.115 us | 3.67% | FAIL |
I64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 1 | 357.806 us | 0.65% | 380.408 us | 0.90% | 22.601 us | 6.32% | FAIL |
I64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 1 | 5.594 ms | 0.50% | 5.972 ms | 0.50% | 377.699 us | 6.75% | FAIL |
I64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0.544 | 10.262 us | 8.80% | 10.829 us | 6.69% | 0.568 us | 5.53% | PASS |
I64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0.544 | 28.833 us | 4.16% | 30.139 us | 2.81% | 1.306 us | 4.53% | FAIL |
I64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0.544 | 290.620 us | 0.80% | 320.918 us | 0.82% | 30.298 us | 10.43% | FAIL |
I64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0.544 | 4.499 ms | 0.50% | 4.988 ms | 0.50% | 489.262 us | 10.88% | FAIL |
I64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0 | 10.432 us | 8.99% | 10.953 us | 6.46% | 0.521 us | 4.99% | PASS |
I64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0 | 27.746 us | 3.14% | 28.971 us | 2.26% | 1.225 us | 4.42% | FAIL |
I64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0 | 204.764 us | 0.92% | 231.714 us | 0.72% | 26.950 us | 13.16% | FAIL |
I64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0 | 3.043 ms | 0.74% | 3.484 ms | 0.56% | 441.080 us | 14.50% | FAIL |
I128 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 1 | 12.641 us | 8.39% | 12.581 us | 5.84% | -0.060 us | -0.47% | PASS |
I128 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 1 | 39.876 us | 2.39% | 39.822 us | 1.72% | -0.054 us | -0.14% | PASS |
I128 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 1 | 394.223 us | 0.59% | 394.020 us | 0.54% | -0.203 us | -0.05% | PASS |
I128 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 1 | 6.070 ms | 0.50% | 6.070 ms | 0.50% | -0.136 us | -0.00% | PASS |
I128 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0.544 | 12.641 us | 7.56% | 12.590 us | 5.75% | -0.050 us | -0.40% | PASS |
I128 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0.544 | 39.922 us | 2.85% | 39.844 us | 1.87% | -0.079 us | -0.20% | PASS |
I128 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0.544 | 393.988 us | 0.55% | 394.046 us | 0.56% | 0.058 us | 0.01% | PASS |
I128 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0.544 | 6.070 ms | 0.50% | 6.070 ms | 0.50% | 0.037 us | 0.00% | PASS |
I128 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0 | 12.602 us | 7.70% | 12.569 us | 5.22% | -0.032 us | -0.26% | PASS |
I128 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0 | 39.894 us | 2.25% | 39.834 us | 1.74% | -0.060 us | -0.15% | PASS |
I128 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0 | 393.885 us | 0.56% | 393.952 us | 0.58% | 0.067 us | 0.02% | PASS |
I128 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0 | 6.069 ms | 0.50% | 6.070 ms | 0.50% | 0.373 us | 0.01% | PASS |
I128 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 1 | 12.565 us | 7.45% | 13.224 us | 5.27% | 0.659 us | 5.24% | PASS |
I128 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 1 | 39.734 us | 2.45% | 43.211 us | 1.95% | 3.477 us | 8.75% | FAIL |
I128 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 1 | 393.148 us | 0.57% | 465.553 us | 0.53% | 72.404 us | 18.42% | FAIL |
I128 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 1 | 6.056 ms | 0.50% | 7.251 ms | 0.50% | 1.195 ms | 19.73% | FAIL |
I128 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0.544 | 12.545 us | 7.03% | 13.226 us | 4.90% | 0.681 us | 5.43% | FAIL |
I128 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0.544 | 39.770 us | 2.62% | 43.260 us | 2.02% | 3.490 us | 8.77% | FAIL |
I128 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0.544 | 392.997 us | 0.58% | 465.384 us | 0.55% | 72.387 us | 18.42% | FAIL |
I128 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0.544 | 6.056 ms | 0.50% | 7.251 ms | 0.50% | 1.195 ms | 19.73% | FAIL |
I128 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0 | 12.521 us | 6.79% | 13.232 us | 5.39% | 0.711 us | 5.68% | FAIL |
I128 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0 | 39.748 us | 2.53% | 43.242 us | 1.97% | 3.494 us | 8.79% | FAIL |
I128 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0 | 393.161 us | 0.58% | 465.331 us | 0.50% | 72.170 us | 18.36% | FAIL |
I128 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0 | 6.056 ms | 0.50% | 7.252 ms | 0.50% | 1.196 ms | 19.74% | FAIL |
I128 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 1 | 12.075 us | 8.34% | 12.043 us | 5.91% | -0.032 us | -0.26% | PASS |
I128 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 1 | 40.972 us | 2.18% | 40.946 us | 1.76% | -0.026 us | -0.06% | PASS |
I128 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 1 | 415.609 us | 0.50% | 415.617 us | 0.50% | 0.008 us | 0.00% | PASS |
I128 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 1 | 6.420 ms | 0.50% | 6.420 ms | 0.50% | -0.156 us | -0.00% | PASS |
I128 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0.544 | 12.191 us | 7.96% | 12.121 us | 5.79% | -0.070 us | -0.58% | PASS |
I128 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0.544 | 41.120 us | 2.35% | 41.022 us | 1.86% | -0.099 us | -0.24% | PASS |
I128 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0.544 | 415.724 us | 0.50% | 415.608 us | 0.50% | -0.116 us | -0.03% | PASS |
I128 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0.544 | 6.420 ms | 0.50% | 6.420 ms | 0.50% | -0.012 us | -0.00% | PASS |
I128 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0 | 12.157 us | 7.92% | 12.099 us | 6.11% | -0.057 us | -0.47% | PASS |
I128 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0 | 41.074 us | 2.13% | 40.998 us | 1.84% | -0.076 us | -0.18% | PASS |
I128 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0 | 415.625 us | 0.50% | 415.623 us | 0.50% | -0.002 us | -0.00% | PASS |
I128 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0 | 6.420 ms | 0.50% | 6.420 ms | 0.50% | 0.134 us | 0.00% | PASS |
I128 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 1 | 12.262 us | 7.01% | 13.002 us | 4.77% | 0.740 us | 6.03% | FAIL |
I128 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 1 | 41.046 us | 2.60% | 44.094 us | 2.09% | 3.049 us | 7.43% | FAIL |
I128 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 1 | 414.718 us | 0.50% | 477.074 us | 0.48% | 62.356 us | 15.04% | FAIL |
I128 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 1 | 6.405 ms | 0.50% | 7.418 ms | 0.50% | 1.013 ms | 15.82% | FAIL |
I128 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0.544 | 12.280 us | 7.11% | 13.014 us | 5.67% | 0.734 us | 5.98% | FAIL |
I128 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0.544 | 41.036 us | 2.41% | 44.092 us | 1.96% | 3.056 us | 7.45% | FAIL |
I128 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0.544 | 414.723 us | 0.58% | 476.940 us | 0.50% | 62.217 us | 15.00% | FAIL |
I128 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0.544 | 6.405 ms | 0.50% | 7.418 ms | 0.50% | 1.013 ms | 15.82% | FAIL |
I128 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0 | 12.200 us | 7.09% | 12.954 us | 5.81% | 0.755 us | 6.19% | FAIL |
I128 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0 | 40.967 us | 2.42% | 44.007 us | 1.61% | 3.040 us | 7.42% | FAIL |
I128 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0 | 414.601 us | 0.50% | 476.925 us | 0.49% | 62.324 us | 15.03% | FAIL |
I128 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0 | 6.404 ms | 0.50% | 7.418 ms | 0.50% | 1.013 ms | 15.82% | FAIL |
F32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 1 | 9.090 us | 10.09% | 9.084 us | 6.98% | -0.005 us | -0.06% | PASS |
F32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 1 | 19.736 us | 4.04% | 19.834 us | 3.73% | 0.098 us | 0.50% | PASS |
F32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 1 | 184.820 us | 1.03% | 184.908 us | 0.91% | 0.087 us | 0.05% | PASS |
F32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 1 | 2.960 ms | 0.67% | 2.960 ms | 0.67% | 0.058 us | 0.00% | PASS |
F32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0.544 | 8.918 us | 10.77% | 8.956 us | 7.62% | 0.038 us | 0.42% | PASS |
F32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0.544 | 19.030 us | 4.89% | 19.078 us | 3.87% | 0.047 us | 0.25% | PASS |
F32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0.544 | 128.787 us | 1.16% | 128.853 us | 1.07% | 0.065 us | 0.05% | PASS |
F32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0.544 | 1.859 ms | 0.79% | 1.858 ms | 0.79% | -0.177 us | -0.01% | PASS |
F32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0 | 8.824 us | 10.89% | 8.855 us | 8.03% | 0.031 us | 0.35% | PASS |
F32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0 | 18.777 us | 4.48% | 18.815 us | 3.55% | 0.038 us | 0.20% | PASS |
F32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0 | 113.562 us | 1.08% | 113.590 us | 0.99% | 0.028 us | 0.02% | PASS |
F32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0 | 1.580 ms | 1.00% | 1.580 ms | 1.00% | 0.020 us | 0.00% | PASS |
F32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 1 | 9.213 us | 9.50% | 9.456 us | 6.99% | 0.242 us | 2.63% | PASS |
F32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 1 | 19.300 us | 4.53% | 20.691 us | 4.14% | 1.391 us | 7.21% | FAIL |
F32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 1 | 184.421 us | 0.94% | 202.357 us | 1.23% | 17.936 us | 9.73% | FAIL |
F32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 1 | 2.955 ms | 0.66% | 3.195 ms | 0.59% | 239.910 us | 8.12% | FAIL |
F32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0.544 | 9.024 us | 10.13% | 9.195 us | 6.72% | 0.171 us | 1.90% | PASS |
F32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0.544 | 18.608 us | 4.46% | 20.212 us | 3.40% | 1.603 us | 8.62% | FAIL |
F32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0.544 | 128.213 us | 1.22% | 145.994 us | 1.19% | 17.780 us | 13.87% | FAIL |
F32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0.544 | 1.854 ms | 0.77% | 2.146 ms | 0.71% | 291.596 us | 15.73% | FAIL |
F32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0 | 8.931 us | 9.48% | 9.134 us | 7.89% | 0.204 us | 2.28% | PASS |
F32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0 | 18.366 us | 5.05% | 19.719 us | 3.25% | 1.352 us | 7.36% | FAIL |
F32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0 | 113.159 us | 1.19% | 131.735 us | 1.05% | 18.576 us | 16.42% | FAIL |
F32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0 | 1.578 ms | 1.00% | 1.884 ms | 0.66% | 306.240 us | 19.41% | FAIL |
F32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 1 | 9.334 us | 9.11% | 9.198 us | 5.90% | -0.136 us | -1.45% | PASS |
F32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 1 | 20.174 us | 5.03% | 20.096 us | 4.17% | -0.078 us | -0.39% | PASS |
F32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 1 | 186.293 us | 1.06% | 186.392 us | 1.03% | 0.099 us | 0.05% | PASS |
F32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 1 | 2.969 ms | 0.67% | 2.969 ms | 0.66% | 0.192 us | 0.01% | PASS |
F32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0.544 | 8.996 us | 9.30% | 9.002 us | 7.68% | 0.007 us | 0.07% | PASS |
F32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0.544 | 19.309 us | 4.20% | 19.312 us | 3.46% | 0.003 us | 0.01% | PASS |
F32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0.544 | 131.875 us | 1.31% | 131.866 us | 1.27% | -0.009 us | -0.01% | PASS |
F32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0.544 | 1.902 ms | 0.70% | 1.902 ms | 0.68% | -0.085 us | -0.00% | PASS |
F32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0 | 8.960 us | 10.61% | 8.971 us | 9.02% | 0.011 us | 0.12% | PASS |
F32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0 | 19.053 us | 5.00% | 19.074 us | 3.74% | 0.021 us | 0.11% | PASS |
F32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0 | 117.438 us | 1.28% | 117.481 us | 1.02% | 0.043 us | 0.04% | PASS |
F32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0 | 1.645 ms | 0.88% | 1.645 ms | 0.88% | -0.022 us | -0.00% | PASS |
F32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 1 | 9.189 us | 8.36% | 9.609 us | 7.31% | 0.419 us | 4.56% | PASS |
F32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 1 | 19.579 us | 4.44% | 20.791 us | 3.14% | 1.212 us | 6.19% | FAIL |
F32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 1 | 185.899 us | 1.04% | 201.170 us | 1.21% | 15.271 us | 8.21% | FAIL |
F32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 1 | 2.964 ms | 0.66% | 3.157 ms | 0.58% | 193.144 us | 6.52% | FAIL |
F32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0.544 | 8.980 us | 9.85% | 9.404 us | 6.42% | 0.424 us | 4.72% | PASS |
F32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0.544 | 18.899 us | 5.35% | 20.350 us | 3.17% | 1.452 us | 7.68% | FAIL |
F32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0.544 | 131.479 us | 1.38% | 145.287 us | 1.09% | 13.808 us | 10.50% | FAIL |
F32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0.544 | 1.898 ms | 0.69% | 2.130 ms | 0.67% | 231.565 us | 12.20% | FAIL |
F32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0 | 8.987 us | 10.25% | 9.316 us | 6.14% | 0.329 us | 3.66% | PASS |
F32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0 | 18.671 us | 4.86% | 19.853 us | 3.88% | 1.182 us | 6.33% | FAIL |
F32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0 | 116.948 us | 1.09% | 132.369 us | 0.88% | 15.421 us | 13.19% | FAIL |
F32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0 | 1.642 ms | 0.89% | 1.898 ms | 0.58% | 256.094 us | 15.59% | FAIL |
F64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 1 | 10.168 us | 9.38% | 10.246 us | 7.35% | 0.078 us | 0.76% | PASS |
F64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 1 | 29.440 us | 3.05% | 29.382 us | 2.83% | -0.058 us | -0.20% | PASS |
F64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 1 | 349.573 us | 0.58% | 349.559 us | 0.50% | -0.014 us | -0.00% | PASS |
F64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 1 | 5.471 ms | 0.50% | 5.471 ms | 0.50% | -0.029 us | -0.00% | PASS |
F64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0.544 | 9.833 us | 9.91% | 9.662 us | 7.82% | -0.171 us | -1.74% | PASS |
F64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0.544 | 26.817 us | 3.49% | 26.751 us | 2.66% | -0.065 us | -0.24% | PASS |
F64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0.544 | 224.507 us | 0.86% | 224.448 us | 0.78% | -0.059 us | -0.03% | PASS |
F64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0.544 | 3.384 ms | 0.50% | 3.384 ms | 0.50% | -0.220 us | -0.01% | PASS |
F64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0 | 9.747 us | 9.66% | 9.639 us | 7.67% | -0.108 us | -1.11% | PASS |
F64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0 | 27.028 us | 3.81% | 26.923 us | 2.57% | -0.105 us | -0.39% | PASS |
F64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0 | 192.588 us | 0.97% | 192.669 us | 0.91% | 0.080 us | 0.04% | PASS |
F64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0 | 2.833 ms | 0.88% | 2.833 ms | 0.88% | 0.019 us | 0.00% | PASS |
F64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 1 | 10.355 us | 8.89% | 11.014 us | 6.97% | 0.659 us | 6.37% | PASS |
F64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 1 | 29.343 us | 3.16% | 30.409 us | 2.69% | 1.066 us | 3.63% | FAIL |
F64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 1 | 349.703 us | 0.61% | 373.643 us | 0.99% | 23.940 us | 6.85% | FAIL |
F64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 1 | 5.476 ms | 0.50% | 5.863 ms | 0.50% | 387.350 us | 7.07% | FAIL |
F64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0.544 | 9.936 us | 8.76% | 10.482 us | 6.55% | 0.546 us | 5.50% | PASS |
F64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0.544 | 26.761 us | 3.82% | 28.094 us | 2.79% | 1.332 us | 4.98% | FAIL |
F64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0.544 | 224.301 us | 0.84% | 250.054 us | 0.96% | 25.753 us | 11.48% | FAIL |
F64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0.544 | 3.381 ms | 0.50% | 3.803 ms | 0.50% | 422.081 us | 12.48% | FAIL |
F64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0 | 9.821 us | 9.16% | 10.241 us | 6.56% | 0.421 us | 4.28% | PASS |
F64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0 | 27.019 us | 3.92% | 28.123 us | 3.23% | 1.103 us | 4.08% | FAIL |
F64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0 | 192.345 us | 1.04% | 219.372 us | 0.90% | 27.027 us | 14.05% | FAIL |
F64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0 | 2.827 ms | 0.89% | 3.274 ms | 0.64% | 447.008 us | 15.81% | FAIL |
F64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 1 | 10.473 us | 8.21% | 10.543 us | 6.21% | 0.070 us | 0.67% | PASS |
F64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 1 | 30.201 us | 3.02% | 30.117 us | 2.72% | -0.084 us | -0.28% | PASS |
F64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 1 | 356.777 us | 0.59% | 356.719 us | 0.62% | -0.058 us | -0.02% | PASS |
F64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 1 | 5.579 ms | 0.50% | 5.579 ms | 0.50% | -0.056 us | -0.00% | PASS |
F64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0.544 | 10.262 us | 8.14% | 10.232 us | 6.04% | -0.030 us | -0.30% | PASS |
F64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0.544 | 27.376 us | 3.87% | 27.363 us | 2.95% | -0.012 us | -0.04% | PASS |
F64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0.544 | 235.683 us | 0.94% | 235.588 us | 0.87% | -0.095 us | -0.04% | PASS |
F64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0.544 | 3.568 ms | 0.50% | 3.568 ms | 0.50% | -0.384 us | -0.01% | PASS |
F64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0 | 10.229 us | 8.82% | 10.180 us | 6.67% | -0.049 us | -0.48% | PASS |
F64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0 | 27.559 us | 3.11% | 27.542 us | 2.42% | -0.017 us | -0.06% | PASS |
F64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0 | 204.945 us | 0.84% | 204.917 us | 0.82% | -0.028 us | -0.01% | PASS |
F64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0 | 3.046 ms | 0.74% | 3.046 ms | 0.74% | -0.063 us | -0.00% | PASS |
F64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 1 | 10.799 us | 8.74% | 11.257 us | 7.03% | 0.458 us | 4.24% | PASS |
F64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 1 | 30.393 us | 2.96% | 31.365 us | 2.47% | 0.972 us | 3.20% | FAIL |
F64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 1 | 357.692 us | 0.67% | 380.848 us | 0.95% | 23.155 us | 6.47% | FAIL |
F64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 1 | 5.592 ms | 0.50% | 5.977 ms | 0.50% | 384.659 us | 6.88% | FAIL |
F64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0.544 | 10.390 us | 9.12% | 10.878 us | 6.73% | 0.488 us | 4.70% | PASS |
F64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0.544 | 27.362 us | 3.22% | 28.456 us | 2.49% | 1.094 us | 4.00% | FAIL |
F64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0.544 | 234.751 us | 0.83% | 258.557 us | 0.91% | 23.806 us | 10.14% | FAIL |
F64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0.544 | 3.554 ms | 0.50% | 3.955 ms | 0.50% | 400.615 us | 11.27% | FAIL |
F64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0 | 10.272 us | 8.69% | 10.721 us | 6.58% | 0.449 us | 4.38% | PASS |
F64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0 | 27.522 us | 3.43% | 28.585 us | 2.53% | 1.063 us | 3.86% | FAIL |
F64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0 | 204.282 us | 0.89% | 229.501 us | 0.77% | 25.219 us | 12.35% | FAIL |
F64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0 | 3.036 ms | 0.75% | 3.454 ms | 0.60% | 417.787 us | 13.76% | FAIL |
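
For anyone reading the columns above: %Diff is the Cmp Time minus Ref Time difference, expressed relative to Ref Time. The Status values in these rows are consistent with a rule that marks a row FAIL when the absolute %Diff exceeds the smaller of the two noise figures. That rule is inferred from the rows themselves and is not stated in this thread, so the sketch below is an assumption, not the comparison script's actual code. The function names `percent_diff` and `status` are hypothetical.

```python
def percent_diff(ref_time, cmp_time):
    """Relative change of Cmp Time versus Ref Time, in percent.

    Both times must be in the same unit (for example, microseconds).
    """
    return (cmp_time - ref_time) / ref_time * 100.0


def status(ref_time, cmp_time, ref_noise_pct, cmp_noise_pct):
    """Assumed rule: FAIL when |%Diff| exceeds the smaller noise figure."""
    min_noise = min(ref_noise_pct, cmp_noise_pct)
    diff_pct = abs(percent_diff(ref_time, cmp_time))
    return "PASS" if diff_pct <= min_noise else "FAIL"


# Row: I128 | I32 | in-place | 2^24 | entropy 1 (times in us)
print(round(percent_diff(393.148, 465.553), 2),
      status(393.148, 465.553, 0.57, 0.53))

# Row: I128 | I32 | in-place | 2^16 | entropy 1 (times in us)
print(round(percent_diff(12.565, 13.224), 2),
      status(12.565, 13.224, 7.45, 5.27))
```

Because the table prints rounded times, recomputing %Diff from the displayed values can differ from the printed %Diff in the last digit. Under this reading, the small 2^16 in-place rows pass mostly because their noise is high (5 to 10%), not because the slowdown is absent.
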
Select.Flagged - Tesla V100-SXM2-32GB
T{ct} | OffsetT{ct} | IsInPlace{ct} | Elements{io} | Entropy | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
---|---|---|---|---|---|---|---|---|---|---|---|
I8 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 1 | 8.914 us | 9.49% | 9.036 us | 8.40% | 0.122 us | 1.36% | PASS |
I8 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 1 | 16.298 us | 5.87% | 16.257 us | 5.53% | -0.041 us | -0.25% | PASS |
I8 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 1 | 120.410 us | 0.95% | 120.345 us | 0.86% | -0.065 us | -0.05% | PASS |
I8 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 1 | 1.805 ms | 0.50% | 1.805 ms | 0.50% | 0.045 us | 0.00% | PASS |
I8 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0.544 | 8.783 us | 10.78% | 8.906 us | 8.33% | 0.123 us | 1.39% | PASS |
I8 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0.544 | 15.903 us | 5.68% | 15.911 us | 5.39% | 0.008 us | 0.05% | PASS |
I8 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0.544 | 120.355 us | 0.86% | 120.415 us | 0.95% | 0.060 us | 0.05% | PASS |
I8 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0.544 | 1.733 ms | 0.52% | 1.733 ms | 0.51% | -0.176 us | -0.01% | PASS |
I8 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0 | 8.566 us | 10.81% | 8.629 us | 7.93% | 0.063 us | 0.73% | PASS |
I8 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0 | 15.379 us | 5.59% | 15.328 us | 3.75% | -0.051 us | -0.33% | PASS |
I8 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0 | 103.673 us | 0.93% | 103.681 us | 0.84% | 0.008 us | 0.01% | PASS |
I8 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0 | 1.487 ms | 0.12% | 1.487 ms | 0.10% | -0.076 us | -0.01% | PASS |
I8 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 1 | 9.129 us | 9.95% | 9.401 us | 7.16% | 0.272 us | 2.98% | PASS |
I8 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 1 | 16.560 us | 5.20% | 18.294 us | 4.26% | 1.734 us | 10.47% | FAIL |
I8 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 1 | 121.442 us | 1.06% | 142.335 us | 0.86% | 20.893 us | 17.20% | FAIL |
I8 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 1 | 1.817 ms | 0.50% | 2.151 ms | 0.50% | 334.285 us | 18.40% | FAIL |
I8 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0.544 | 9.055 us | 9.17% | 9.292 us | 6.48% | 0.237 us | 2.62% | PASS |
I8 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0.544 | 16.223 us | 5.19% | 17.938 us | 3.77% | 1.715 us | 10.57% | FAIL |
I8 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0.544 | 121.616 us | 1.14% | 141.595 us | 0.91% | 19.979 us | 16.43% | FAIL |
I8 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0.544 | 1.749 ms | 0.52% | 2.092 ms | 0.50% | 342.680 us | 19.59% | FAIL |
I8 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0 | 8.727 us | 10.27% | 8.914 us | 7.87% | 0.187 us | 2.15% | PASS |
I8 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0 | 15.601 us | 5.73% | 17.215 us | 4.45% | 1.614 us | 10.34% | FAIL |
I8 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0 | 104.299 us | 0.94% | 122.782 us | 0.75% | 18.483 us | 17.72% | FAIL |
I8 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0 | 1.498 ms | 0.14% | 1.791 ms | 0.17% | 293.387 us | 19.59% | FAIL |
I8 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 1 | 9.014 us | 9.14% | 8.922 us | 7.31% | -0.092 us | -1.02% | PASS |
I8 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 1 | 17.074 us | 4.92% | 17.030 us | 4.15% | -0.043 us | -0.25% | PASS |
I8 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 1 | 131.937 us | 0.78% | 132.036 us | 0.63% | 0.099 us | 0.07% | PASS |
I8 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 1 | 1.991 ms | 0.50% | 1.991 ms | 0.50% | 0.271 us | 0.01% | PASS |
I8 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0.544 | 8.893 us | 10.78% | 8.950 us | 8.60% | 0.057 us | 0.64% | PASS |
I8 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0.544 | 16.782 us | 5.70% | 16.843 us | 4.15% | 0.060 us | 0.36% | PASS |
I8 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0.544 | 131.679 us | 0.85% | 131.770 us | 0.79% | 0.090 us | 0.07% | PASS |
I8 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0.544 | 1.919 ms | 0.50% | 1.919 ms | 0.50% | 0.217 us | 0.01% | PASS |
I8 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0 | 8.710 us | 10.74% | 8.751 us | 8.47% | 0.041 us | 0.47% | PASS |
I8 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0 | 15.955 us | 6.13% | 15.972 us | 4.21% | 0.017 us | 0.11% | PASS |
I8 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0 | 114.285 us | 0.70% | 114.355 us | 0.81% | 0.070 us | 0.06% | PASS |
I8 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0 | 1.662 ms | 0.08% | 1.662 ms | 0.10% | 0.010 us | 0.00% | PASS |
I8 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 1 | 9.146 us | 9.08% | 9.543 us | 7.06% | 0.397 us | 4.34% | PASS |
I8 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 1 | 16.883 us | 5.17% | 18.343 us | 4.45% | 1.459 us | 8.64% | FAIL |
I8 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 1 | 126.447 us | 0.96% | 144.267 us | 0.71% | 17.820 us | 14.09% | FAIL |
I8 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 1 | 1.903 ms | 0.50% | 2.181 ms | 0.50% | 278.194 us | 14.62% | FAIL |
I8 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0.544 | 9.077 us | 10.18% | 9.464 us | 6.54% | 0.387 us | 4.27% | PASS |
I8 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0.544 | 16.545 us | 5.01% | 18.092 us | 3.63% | 1.547 us | 9.35% | FAIL |
I8 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0.544 | 126.564 us | 0.88% | 143.767 us | 0.88% | 17.203 us | 13.59% | FAIL |
I8 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0.544 | 1.828 ms | 0.50% | 2.123 ms | 0.50% | 295.200 us | 16.15% | FAIL |
I8 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0 | 8.832 us | 10.10% | 9.036 us | 7.15% | 0.204 us | 2.31% | PASS |
I8 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0 | 16.032 us | 5.64% | 17.334 us | 4.25% | 1.302 us | 8.12% | FAIL |
I8 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0 | 109.697 us | 0.86% | 124.912 us | 0.68% | 15.216 us | 13.87% | FAIL |
I8 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0 | 1.576 ms | 0.12% | 1.825 ms | 0.11% | 249.471 us | 15.83% | FAIL |
I16 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 1 | 9.095 us | 9.76% | 8.997 us | 7.18% | -0.097 us | -1.07% | PASS |
I16 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 1 | 17.774 us | 4.89% | 17.622 us | 3.59% | -0.152 us | -0.85% | PASS |
I16 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 1 | 137.066 us | 1.14% | 137.021 us | 1.03% | -0.045 us | -0.03% | PASS |
I16 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 1 | 2.051 ms | 0.50% | 2.052 ms | 0.50% | 0.410 us | 0.02% | PASS |
I16 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0.544 | 9.066 us | 9.79% | 8.995 us | 6.34% | -0.071 us | -0.78% | PASS |
I16 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0.544 | 17.312 us | 4.59% | 17.258 us | 4.52% | -0.053 us | -0.31% | PASS |
I16 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0.544 | 132.787 us | 1.18% | 132.795 us | 1.09% | 0.008 us | 0.01% | PASS |
I16 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0.544 | 1.971 ms | 0.56% | 1.970 ms | 0.55% | -0.481 us | -0.02% | PASS |
I16 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0 | 8.746 us | 11.41% | 8.676 us | 8.10% | -0.071 us | -0.81% | PASS |
I16 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0 | 17.075 us | 4.90% | 17.129 us | 3.61% | 0.054 us | 0.31% | PASS |
I16 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0 | 106.139 us | 1.09% | 106.202 us | 0.70% | 0.063 us | 0.06% | PASS |
I16 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0 | 1.476 ms | 0.14% | 1.476 ms | 0.15% | 0.015 us | 0.00% | PASS |
I16 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 1 | 9.117 us | 8.97% | 9.551 us | 7.54% | 0.434 us | 4.76% | PASS |
I16 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 1 | 17.803 us | 5.49% | 19.636 us | 2.78% | 1.833 us | 10.29% | FAIL |
I16 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 1 | 143.685 us | 0.92% | 163.002 us | 1.07% | 19.317 us | 13.44% | FAIL |
I16 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 1 | 2.162 ms | 0.50% | 2.465 ms | 0.50% | 303.622 us | 14.05% | FAIL |
I16 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0.544 | 9.024 us | 9.44% | 9.432 us | 7.73% | 0.408 us | 4.52% | PASS |
I16 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0.544 | 17.154 us | 5.41% | 18.922 us | 3.59% | 1.769 us | 10.31% | FAIL |
I16 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0.544 | 138.911 us | 1.14% | 158.745 us | 1.00% | 19.834 us | 14.28% | FAIL |
I16 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0.544 | 2.083 ms | 0.56% | 2.407 ms | 0.52% | 323.552 us | 15.53% | FAIL |
I16 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0 | 8.754 us | 10.42% | 9.119 us | 6.69% | 0.365 us | 4.16% | PASS |
I16 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0 | 16.812 us | 5.41% | 18.568 us | 3.52% | 1.756 us | 10.44% | FAIL |
I16 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0 | 112.614 us | 0.80% | 133.348 us | 0.70% | 20.734 us | 18.41% | FAIL |
I16 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0 | 1.594 ms | 0.13% | 1.940 ms | 0.17% | 346.239 us | 21.72% | FAIL |
I16 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 1 | 9.159 us | 9.73% | 9.254 us | 6.77% | 0.096 us | 1.04% | PASS |
I16 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 1 | 18.807 us | 4.83% | 19.049 us | 3.88% | 0.242 us | 1.28% | PASS |
I16 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 1 | 144.760 us | 0.98% | 144.643 us | 0.82% | -0.117 us | -0.08% | PASS |
I16 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 1 | 2.163 ms | 0.50% | 2.163 ms | 0.50% | -0.190 us | -0.01% | PASS |
I16 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0.544 | 9.137 us | 8.87% | 9.048 us | 7.18% | -0.089 us | -0.98% | PASS |
I16 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0.544 | 18.416 us | 5.30% | 18.331 us | 3.66% | -0.085 us | -0.46% | PASS |
I16 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0.544 | 140.166 us | 1.01% | 139.951 us | 0.84% | -0.215 us | -0.15% | PASS |
I16 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0.544 | 2.087 ms | 0.51% | 2.086 ms | 0.50% | -0.502 us | -0.02% | PASS |
I16 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0 | 8.937 us | 10.44% | 8.782 us | 7.56% | -0.155 us | -1.73% | PASS |
I16 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0 | 17.762 us | 5.34% | 17.590 us | 4.43% | -0.171 us | -0.96% | PASS |
I16 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0 | 115.686 us | 0.94% | 115.442 us | 0.65% | -0.244 us | -0.21% | PASS |
I16 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0 | 1.635 ms | 0.10% | 1.634 ms | 0.09% | -0.127 us | -0.01% | PASS |
I16 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 1 | 9.298 us | 8.82% | 9.551 us | 7.24% | 0.253 us | 2.72% | PASS |
I16 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 1 | 18.430 us | 4.72% | 19.600 us | 2.75% | 1.170 us | 6.35% | FAIL |
I16 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 1 | 146.846 us | 0.87% | 164.037 us | 0.86% | 17.191 us | 11.71% | FAIL |
I16 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 1 | 2.211 ms | 0.50% | 2.485 ms | 0.50% | 274.573 us | 12.42% | FAIL |
I16 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0.544 | 9.194 us | 9.09% | 9.433 us | 6.27% | 0.239 us | 2.60% | PASS |
I16 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0.544 | 17.957 us | 4.84% | 19.125 us | 4.10% | 1.168 us | 6.50% | FAIL |
I16 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0.544 | 142.385 us | 0.96% | 160.276 us | 0.89% | 17.892 us | 12.57% | FAIL |
I16 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0.544 | 2.134 ms | 0.51% | 2.425 ms | 0.50% | 291.188 us | 13.64% | FAIL |
I16 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0 | 8.882 us | 10.28% | 9.217 us | 6.60% | 0.335 us | 3.77% | PASS |
I16 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0 | 17.155 us | 5.38% | 18.556 us | 3.07% | 1.401 us | 8.16% | FAIL |
I16 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0 | 117.623 us | 0.91% | 134.665 us | 0.64% | 17.042 us | 14.49% | FAIL |
I16 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0 | 1.680 ms | 0.09% | 1.962 ms | 0.12% | 281.743 us | 16.77% | FAIL |
I32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 1 | 9.497 us | 9.15% | 9.502 us | 7.09% | 0.006 us | 0.06% | PASS |
I32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 1 | 21.831 us | 4.11% | 21.834 us | 3.05% | 0.004 us | 0.02% | PASS |
I32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 1 | 207.161 us | 0.81% | 207.165 us | 0.79% | 0.004 us | 0.00% | PASS |
I32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 1 | 3.174 ms | 0.56% | 3.174 ms | 0.56% | 0.110 us | 0.00% | PASS |
I32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0.544 | 9.470 us | 9.56% | 9.466 us | 8.31% | -0.004 us | -0.04% | PASS |
I32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0.544 | 21.976 us | 4.06% | 21.943 us | 3.61% | -0.033 us | -0.15% | PASS |
I32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0.544 | 185.022 us | 1.04% | 185.077 us | 1.02% | 0.055 us | 0.03% | PASS |
I32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0.544 | 2.796 ms | 0.50% | 2.796 ms | 0.50% | -0.580 us | -0.02% | PASS |
I32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0 | 9.082 us | 9.55% | 9.076 us | 7.57% | -0.005 us | -0.06% | PASS |
I32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0 | 20.810 us | 4.65% | 20.773 us | 2.81% | -0.037 us | -0.18% | PASS |
I32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0 | 130.582 us | 0.88% | 130.379 us | 0.84% | -0.203 us | -0.16% | PASS |
I32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0 | 1.831 ms | 0.17% | 1.830 ms | 0.15% | -0.093 us | -0.01% | PASS |
I32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 1 | 9.494 us | 9.07% | 9.675 us | 7.39% | 0.181 us | 1.91% | PASS |
I32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 1 | 21.617 us | 4.29% | 22.717 us | 2.97% | 1.100 us | 5.09% | FAIL |
I32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 1 | 209.253 us | 0.84% | 224.991 us | 1.03% | 15.738 us | 7.52% | FAIL |
I32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 1 | 3.208 ms | 0.53% | 3.468 ms | 0.50% | 260.109 us | 8.11% | FAIL |
I32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0.544 | 9.466 us | 9.43% | 9.600 us | 7.07% | 0.134 us | 1.42% | PASS |
I32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0.544 | 21.625 us | 4.47% | 22.913 us | 4.02% | 1.288 us | 5.96% | FAIL |
I32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0.544 | 188.181 us | 1.04% | 207.204 us | 1.00% | 19.023 us | 10.11% | FAIL |
I32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0.544 | 2.847 ms | 0.50% | 3.149 ms | 0.50% | 302.452 us | 10.62% | FAIL |
I32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0 | 8.991 us | 10.24% | 9.390 us | 6.57% | 0.399 us | 4.44% | PASS |
I32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0 | 20.585 us | 4.43% | 21.790 us | 3.14% | 1.206 us | 5.86% | FAIL |
I32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0 | 133.353 us | 0.93% | 151.234 us | 0.79% | 17.880 us | 13.41% | FAIL |
I32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0 | 1.885 ms | 0.15% | 2.184 ms | 0.20% | 298.346 us | 15.82% | FAIL |
I32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 1 | 9.687 us | 9.76% | 9.715 us | 7.79% | 0.028 us | 0.29% | PASS |
I32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 1 | 22.153 us | 4.42% | 22.217 us | 3.56% | 0.064 us | 0.29% | PASS |
I32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 1 | 209.579 us | 0.82% | 209.542 us | 0.88% | -0.036 us | -0.02% | PASS |
I32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 1 | 3.213 ms | 0.55% | 3.213 ms | 0.55% | -0.090 us | -0.00% | PASS |
I32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0.544 | 9.574 us | 10.03% | 9.445 us | 7.04% | -0.130 us | -1.35% | PASS |
I32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0.544 | 22.150 us | 4.43% | 22.076 us | 3.88% | -0.073 us | -0.33% | PASS |
I32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0.544 | 190.030 us | 1.14% | 189.921 us | 1.02% | -0.109 us | -0.06% | PASS |
I32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0.544 | 2.873 ms | 0.50% | 2.873 ms | 0.50% | 0.042 us | 0.00% | PASS |
I32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0 | 9.283 us | 9.58% | 9.250 us | 7.03% | -0.032 us | -0.35% | PASS |
I32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0 | 21.446 us | 4.06% | 21.366 us | 3.53% | -0.080 us | -0.37% | PASS |
I32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0 | 137.004 us | 0.88% | 136.850 us | 0.77% | -0.154 us | -0.11% | PASS |
I32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0 | 1.945 ms | 0.15% | 1.946 ms | 0.14% | 0.195 us | 0.01% | PASS |
I32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 1 | 9.580 us | 9.54% | 9.852 us | 7.50% | 0.272 us | 2.84% | PASS |
I32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 1 | 21.865 us | 4.07% | 22.948 us | 3.17% | 1.083 us | 4.95% | FAIL |
I32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 1 | 210.739 us | 0.84% | 224.008 us | 0.89% | 13.269 us | 6.30% | FAIL |
I32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 1 | 3.233 ms | 0.51% | 3.439 ms | 0.50% | 205.544 us | 6.36% | FAIL |
I32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0.544 | 9.425 us | 9.43% | 9.687 us | 7.37% | 0.262 us | 2.78% | PASS |
I32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0.544 | 22.234 us | 4.90% | 23.272 us | 3.83% | 1.038 us | 4.67% | FAIL |
I32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0.544 | 190.206 us | 1.03% | 207.140 us | 1.00% | 16.934 us | 8.90% | FAIL |
I32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0.544 | 2.875 ms | 0.50% | 3.149 ms | 0.50% | 274.202 us | 9.54% | FAIL |
I32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0 | 9.113 us | 9.59% | 9.567 us | 7.64% | 0.453 us | 4.97% | PASS |
I32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0 | 20.935 us | 4.39% | 22.105 us | 3.46% | 1.170 us | 5.59% | FAIL |
I32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0 | 137.221 us | 0.99% | 152.410 us | 0.70% | 15.189 us | 11.07% | FAIL |
I32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0 | 1.949 ms | 0.16% | 2.198 ms | 0.14% | 248.294 us | 12.74% | FAIL |
I64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 1 | 10.518 us | 8.55% | 10.476 us | 7.15% | -0.042 us | -0.40% | PASS |
I64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 1 | 31.926 us | 2.95% | 31.905 us | 2.47% | -0.021 us | -0.07% | PASS |
I64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 1 | 377.979 us | 0.64% | 377.836 us | 0.58% | -0.143 us | -0.04% | PASS |
I64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 1 | 5.926 ms | 0.50% | 5.926 ms | 0.50% | -0.359 us | -0.01% | PASS |
I64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0.544 | 10.233 us | 8.40% | 10.219 us | 6.71% | -0.014 us | -0.13% | PASS |
I64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0.544 | 29.777 us | 3.07% | 29.731 us | 2.69% | -0.046 us | -0.15% | PASS |
I64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0.544 | 316.915 us | 0.81% | 316.757 us | 0.77% | -0.158 us | -0.05% | PASS |
I64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0.544 | 4.918 ms | 0.50% | 4.919 ms | 0.50% | 0.043 us | 0.00% | PASS |
I64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0 | 10.034 us | 9.18% | 10.004 us | 6.49% | -0.030 us | -0.30% | PASS |
I64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0 | 29.086 us | 3.20% | 29.067 us | 3.12% | -0.019 us | -0.07% | PASS |
I64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0 | 217.329 us | 0.57% | 217.196 us | 0.57% | -0.134 us | -0.06% | PASS |
I64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0 | 3.233 ms | 0.13% | 3.233 ms | 0.13% | -0.152 us | -0.00% | PASS |
I64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 1 | 10.637 us | 8.69% | 10.948 us | 6.35% | 0.312 us | 2.93% | PASS |
I64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 1 | 32.104 us | 3.14% | 33.831 us | 3.13% | 1.727 us | 5.38% | FAIL |
I64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 1 | 379.048 us | 0.64% | 405.881 us | 0.87% | 26.833 us | 7.08% | FAIL |
I64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 1 | 5.942 ms | 0.50% | 6.364 ms | 0.50% | 422.929 us | 7.12% | FAIL |
I64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0.544 | 10.313 us | 8.22% | 10.709 us | 7.41% | 0.396 us | 3.84% | PASS |
I64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0.544 | 30.123 us | 3.19% | 32.174 us | 3.45% | 2.051 us | 6.81% | FAIL |
I64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0.544 | 317.541 us | 0.76% | 348.690 us | 0.85% | 31.149 us | 9.81% | FAIL |
I64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0.544 | 4.927 ms | 0.50% | 5.427 ms | 0.50% | 500.284 us | 10.15% | FAIL |
I64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0 | 10.021 us | 8.44% | 10.402 us | 6.88% | 0.381 us | 3.81% | PASS |
I64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0 | 29.298 us | 3.40% | 31.012 us | 2.57% | 1.714 us | 5.85% | FAIL |
I64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0 | 216.701 us | 0.64% | 245.581 us | 0.70% | 28.880 us | 13.33% | FAIL |
I64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0 | 3.223 ms | 0.13% | 3.687 ms | 0.18% | 464.623 us | 14.42% | FAIL |
I64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 1 | 10.052 us | 8.63% | 9.861 us | 6.76% | -0.191 us | -1.90% | PASS |
I64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 1 | 31.559 us | 3.20% | 31.412 us | 2.39% | -0.147 us | -0.47% | PASS |
I64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 1 | 377.742 us | 0.59% | 377.851 us | 0.53% | 0.109 us | 0.03% | PASS |
I64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 1 | 5.927 ms | 0.50% | 5.927 ms | 0.50% | -0.072 us | -0.00% | PASS |
I64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0.544 | 9.821 us | 8.53% | 9.940 us | 6.83% | 0.119 us | 1.21% | PASS |
I64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0.544 | 29.529 us | 3.40% | 29.599 us | 2.89% | 0.070 us | 0.24% | PASS |
I64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0.544 | 316.296 us | 0.68% | 316.405 us | 0.69% | 0.109 us | 0.03% | PASS |
I64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0.544 | 4.919 ms | 0.50% | 4.919 ms | 0.50% | 0.256 us | 0.01% | PASS |
I64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0 | 9.588 us | 10.02% | 9.633 us | 7.61% | 0.045 us | 0.47% | PASS |
I64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0 | 28.784 us | 3.26% | 28.850 us | 2.76% | 0.066 us | 0.23% | PASS |
I64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0 | 221.116 us | 0.52% | 221.122 us | 0.50% | 0.006 us | 0.00% | PASS |
I64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0 | 3.312 ms | 0.13% | 3.312 ms | 0.12% | 0.068 us | 0.00% | PASS |
I64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 1 | 10.828 us | 8.76% | 11.446 us | 6.10% | 0.618 us | 5.71% | PASS |
I64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 1 | 32.202 us | 2.82% | 33.978 us | 2.53% | 1.776 us | 5.51% | FAIL |
I64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 1 | 381.326 us | 0.66% | 402.863 us | 0.89% | 21.537 us | 5.65% | FAIL |
I64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 1 | 5.981 ms | 0.50% | 6.317 ms | 0.50% | 336.380 us | 5.62% | FAIL |
I64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0.544 | 11.070 us | 8.04% | 11.335 us | 5.41% | 0.265 us | 2.40% | PASS |
I64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0.544 | 30.208 us | 3.36% | 32.285 us | 3.10% | 2.077 us | 6.88% | FAIL |
I64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0.544 | 321.095 us | 0.82% | 348.733 us | 0.81% | 27.638 us | 8.61% | FAIL |
I64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0.544 | 4.981 ms | 0.50% | 5.421 ms | 0.50% | 440.383 us | 8.84% | FAIL |
I64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0 | 10.564 us | 8.74% | 10.933 us | 6.79% | 0.368 us | 3.49% | PASS |
I64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0 | 29.573 us | 3.14% | 31.261 us | 2.46% | 1.689 us | 5.71% | FAIL |
I64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0 | 221.903 us | 0.52% | 248.289 us | 0.54% | 26.386 us | 11.89% | FAIL |
I64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0 | 3.316 ms | 0.12% | 3.737 ms | 0.14% | 421.455 us | 12.71% | FAIL |
I128 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 1 | 13.052 us | 7.31% | 13.067 us | 6.38% | 0.016 us | 0.12% | PASS |
I128 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 1 | 53.976 us | 2.01% | 53.970 us | 1.57% | -0.006 us | -0.01% | PASS |
I128 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 1 | 738.310 us | 0.46% | 738.431 us | 0.48% | 0.121 us | 0.02% | PASS |
I128 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 1 | 11.705 ms | 0.50% | 11.705 ms | 0.50% | 0.212 us | 0.00% | PASS |
I128 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0.544 | 12.870 us | 7.76% | 12.745 us | 5.59% | -0.124 us | -0.97% | PASS |
I128 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0.544 | 47.696 us | 2.26% | 47.814 us | 2.13% | 0.118 us | 0.25% | PASS |
I128 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0.544 | 606.811 us | 0.68% | 607.020 us | 0.68% | 0.209 us | 0.03% | PASS |
I128 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0.544 | 9.572 ms | 0.50% | 9.571 ms | 0.50% | -0.545 us | -0.01% | PASS |
I128 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0 | 12.712 us | 7.15% | 12.605 us | 5.97% | -0.107 us | -0.85% | PASS |
I128 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0 | 41.788 us | 2.70% | 41.681 us | 2.12% | -0.107 us | -0.26% | PASS |
I128 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0 | 413.667 us | 0.39% | 413.871 us | 0.36% | 0.204 us | 0.05% | PASS |
I128 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0 | 6.372 ms | 0.08% | 6.372 ms | 0.09% | 0.843 us | 0.01% | PASS |
I128 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 1 | 13.018 us | 6.82% | 13.888 us | 5.68% | 0.870 us | 6.68% | FAIL |
I128 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 1 | 53.800 us | 2.16% | 57.750 us | 2.10% | 3.950 us | 7.34% | FAIL |
I128 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 1 | 737.848 us | 0.47% | 797.278 us | 0.64% | 59.430 us | 8.05% | FAIL |
I128 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 1 | 11.698 ms | 0.50% | 12.630 ms | 0.50% | 931.807 us | 7.97% | FAIL |
I128 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0.544 | 12.818 us | 7.38% | 13.616 us | 5.93% | 0.799 us | 6.23% | FAIL |
I128 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0.544 | 46.942 us | 2.32% | 51.647 us | 2.13% | 4.705 us | 10.02% | FAIL |
I128 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0.544 | 606.174 us | 0.69% | 674.760 us | 0.70% | 68.586 us | 11.31% | FAIL |
I128 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0.544 | 9.560 ms | 0.50% | 10.662 ms | 0.50% | 1.103 ms | 11.53% | FAIL |
I128 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0 | 12.697 us | 7.15% | 13.384 us | 4.81% | 0.687 us | 5.41% | FAIL |
I128 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0 | 41.548 us | 1.96% | 45.190 us | 1.93% | 3.642 us | 8.77% | FAIL |
I128 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0 | 412.133 us | 0.39% | 480.678 us | 0.44% | 68.545 us | 16.63% | FAIL |
I128 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0 | 6.349 ms | 0.09% | 7.483 ms | 0.12% | 1.133 ms | 17.85% | FAIL |
I128 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 1 | 12.595 us | 6.73% | 12.687 us | 5.45% | 0.092 us | 0.73% | PASS |
I128 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 1 | 54.426 us | 1.80% | 54.526 us | 1.63% | 0.100 us | 0.18% | PASS |
I128 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 1 | 744.302 us | 0.50% | 744.240 us | 0.50% | -0.062 us | -0.01% | PASS |
I128 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 1 | 11.787 ms | 0.50% | 11.787 ms | 0.50% | 0.033 us | 0.00% | PASS |
I128 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0.544 | 12.493 us | 6.84% | 12.394 us | 5.20% | -0.099 us | -0.79% | PASS |
I128 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0.544 | 47.800 us | 2.55% | 48.634 us | 2.32% | 0.834 us | 1.75% | PASS |
I128 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0.544 | 615.550 us | 0.68% | 615.273 us | 0.65% | -0.278 us | -0.05% | PASS |
I128 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0.544 | 9.709 ms | 0.50% | 9.709 ms | 0.50% | 0.035 us | 0.00% | PASS |
I128 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0 | 12.378 us | 6.47% | 12.235 us | 4.91% | -0.143 us | -1.15% | PASS |
I128 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0 | 42.668 us | 2.26% | 42.604 us | 1.99% | -0.064 us | -0.15% | PASS |
I128 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0 | 431.237 us | 0.38% | 431.117 us | 0.36% | -0.120 us | -0.03% | PASS |
I128 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0 | 6.666 ms | 0.07% | 6.666 ms | 0.07% | -0.013 us | -0.00% | PASS |
I128 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 1 | 12.672 us | 7.05% | 13.466 us | 4.98% | 0.794 us | 6.26% | FAIL |
I128 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 1 | 54.391 us | 1.78% | 57.538 us | 1.81% | 3.147 us | 5.79% | FAIL |
I128 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 1 | 742.842 us | 0.49% | 788.839 us | 0.61% | 45.997 us | 6.19% | FAIL |
I128 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 1 | 11.774 ms | 0.50% | 12.470 ms | 0.50% | 696.084 us | 5.91% | FAIL |
I128 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0.544 | 12.471 us | 6.61% | 13.193 us | 4.96% | 0.722 us | 5.79% | FAIL |
I128 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0.544 | 47.804 us | 2.56% | 52.308 us | 2.00% | 4.504 us | 9.42% | FAIL |
I128 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0.544 | 614.566 us | 0.67% | 672.616 us | 0.63% | 58.050 us | 9.45% | FAIL |
I128 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0.544 | 9.692 ms | 0.50% | 10.618 ms | 0.50% | 926.413 us | 9.56% | FAIL |
I128 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0 | 12.312 us | 7.11% | 13.109 us | 5.23% | 0.796 us | 6.47% | FAIL |
I128 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0 | 42.546 us | 2.59% | 45.749 us | 2.10% | 3.203 us | 7.53% | FAIL |
I128 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0 | 429.194 us | 0.32% | 491.004 us | 0.32% | 61.810 us | 14.40% | FAIL |
I128 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0 | 6.632 ms | 0.07% | 7.632 ms | 0.08% | 1.000 ms | 15.09% | FAIL |
F32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 1 | 9.488 us | 9.89% | 9.524 us | 6.99% | 0.036 us | 0.38% | PASS |
F32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 1 | 21.830 us | 4.02% | 21.884 us | 3.59% | 0.053 us | 0.24% | PASS |
F32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 1 | 207.294 us | 0.86% | 207.319 us | 0.80% | 0.025 us | 0.01% | PASS |
F32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 1 | 3.174 ms | 0.56% | 3.174 ms | 0.56% | 0.015 us | 0.00% | PASS |
F32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0.544 | 9.734 us | 9.98% | 9.795 us | 7.76% | 0.061 us | 0.62% | PASS |
F32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0.544 | 21.942 us | 4.28% | 21.946 us | 3.46% | 0.004 us | 0.02% | PASS |
F32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0.544 | 185.041 us | 1.05% | 185.014 us | 1.03% | -0.027 us | -0.01% | PASS |
F32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0.544 | 2.796 ms | 0.50% | 2.796 ms | 0.50% | -0.226 us | -0.01% | PASS |
F32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0 | 9.041 us | 9.95% | 9.148 us | 7.37% | 0.108 us | 1.19% | PASS |
F32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0 | 20.787 us | 4.17% | 20.785 us | 3.45% | -0.001 us | -0.01% | PASS |
F32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0 | 130.512 us | 0.94% | 130.351 us | 0.84% | -0.160 us | -0.12% | PASS |
F32 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0 | 1.831 ms | 0.18% | 1.830 ms | 0.16% | -0.471 us | -0.03% | PASS |
F32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 1 | 9.490 us | 9.34% | 9.849 us | 7.86% | 0.359 us | 3.79% | PASS |
F32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 1 | 21.651 us | 4.70% | 22.783 us | 3.08% | 1.133 us | 5.23% | FAIL |
F32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 1 | 209.725 us | 0.84% | 225.322 us | 1.01% | 15.597 us | 7.44% | FAIL |
F32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 1 | 3.205 ms | 0.53% | 3.468 ms | 0.50% | 262.680 us | 8.20% | FAIL |
F32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0.544 | 9.676 us | 9.91% | 9.905 us | 6.84% | 0.230 us | 2.37% | PASS |
F32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0.544 | 21.802 us | 4.61% | 23.133 us | 3.75% | 1.331 us | 6.11% | FAIL |
F32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0.544 | 188.262 us | 1.07% | 206.613 us | 1.02% | 18.350 us | 9.75% | FAIL |
F32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0.544 | 2.847 ms | 0.50% | 3.150 ms | 0.50% | 303.125 us | 10.65% | FAIL |
F32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0 | 9.062 us | 9.72% | 9.347 us | 6.59% | 0.285 us | 3.15% | PASS |
F32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0 | 20.680 us | 4.38% | 21.757 us | 3.94% | 1.077 us | 5.21% | FAIL |
F32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0 | 133.447 us | 0.85% | 151.220 us | 0.76% | 17.773 us | 13.32% | FAIL |
F32 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0 | 1.885 ms | 0.15% | 2.184 ms | 0.16% | 298.568 us | 15.84% | FAIL |
F32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 1 | 9.778 us | 9.55% | 9.663 us | 8.03% | -0.116 us | -1.18% | PASS |
F32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 1 | 22.494 us | 4.59% | 22.345 us | 3.68% | -0.150 us | -0.66% | PASS |
F32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 1 | 209.628 us | 0.86% | 209.679 us | 0.80% | 0.051 us | 0.02% | PASS |
F32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 1 | 3.213 ms | 0.54% | 3.213 ms | 0.55% | -0.108 us | -0.00% | PASS |
F32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0.544 | 9.547 us | 9.74% | 9.606 us | 7.85% | 0.059 us | 0.62% | PASS |
F32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0.544 | 22.416 us | 4.70% | 22.186 us | 3.35% | -0.230 us | -1.02% | PASS |
F32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0.544 | 189.858 us | 1.08% | 189.806 us | 0.99% | -0.052 us | -0.03% | PASS |
F32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0.544 | 2.873 ms | 0.50% | 2.873 ms | 0.50% | -0.340 us | -0.01% | PASS |
F32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0 | 9.211 us | 9.87% | 9.266 us | 7.58% | 0.055 us | 0.60% | PASS |
F32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0 | 21.264 us | 4.09% | 21.225 us | 3.92% | -0.039 us | -0.18% | PASS |
F32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0 | 136.972 us | 0.94% | 136.913 us | 0.81% | -0.059 us | -0.04% | PASS |
F32 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0 | 1.945 ms | 0.16% | 1.945 ms | 0.15% | -0.437 us | -0.02% | PASS |
F32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 1 | 9.514 us | 9.99% | 9.962 us | 6.92% | 0.448 us | 4.70% | PASS |
F32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 1 | 22.028 us | 4.60% | 23.254 us | 4.04% | 1.226 us | 5.57% | FAIL |
F32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 1 | 210.701 us | 0.91% | 224.307 us | 0.94% | 13.606 us | 6.46% | FAIL |
F32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 1 | 3.233 ms | 0.51% | 3.439 ms | 0.50% | 205.962 us | 6.37% | FAIL |
F32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0.544 | 9.550 us | 8.12% | 9.842 us | 8.17% | 0.292 us | 3.06% | PASS |
F32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0.544 | 21.888 us | 3.60% | 23.140 us | 3.36% | 1.252 us | 5.72% | FAIL |
F32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0.544 | 190.309 us | 1.02% | 207.127 us | 1.00% | 16.818 us | 8.84% | FAIL |
F32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0.544 | 2.875 ms | 0.50% | 3.150 ms | 0.50% | 274.132 us | 9.53% | FAIL |
F32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0 | 9.176 us | 7.14% | 9.552 us | 7.80% | 0.375 us | 4.09% | PASS |
F32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0 | 21.073 us | 3.34% | 22.104 us | 3.50% | 1.032 us | 4.90% | FAIL |
F32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0 | 137.320 us | 0.83% | 152.563 us | 0.77% | 15.243 us | 11.10% | FAIL |
F32 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0 | 1.950 ms | 0.15% | 2.198 ms | 0.15% | 248.403 us | 12.74% | FAIL |
F64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 1 | 10.466 us | 5.85% | 10.310 us | 5.47% | -0.155 us | -1.48% | PASS |
F64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 1 | 32.007 us | 2.54% | 31.908 us | 2.38% | -0.099 us | -0.31% | PASS |
F64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 1 | 377.823 us | 0.57% | 377.850 us | 0.57% | 0.027 us | 0.01% | PASS |
F64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 1 | 5.926 ms | 0.50% | 5.926 ms | 0.50% | -0.067 us | -0.00% | PASS |
F64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0.544 | 10.686 us | 6.56% | 10.336 us | 6.58% | -0.351 us | -3.28% | PASS |
F64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0.544 | 29.754 us | 2.67% | 30.188 us | 2.83% | 0.435 us | 1.46% | PASS |
F64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0.544 | 316.933 us | 0.75% | 316.882 us | 0.79% | -0.051 us | -0.02% | PASS |
F64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0.544 | 4.918 ms | 0.50% | 4.919 ms | 0.50% | 0.297 us | 0.01% | PASS |
F64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0 | 10.080 us | 6.79% | 9.917 us | 7.17% | -0.163 us | -1.62% | PASS |
F64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0 | 29.114 us | 2.24% | 29.035 us | 2.40% | -0.078 us | -0.27% | PASS |
F64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0 | 217.259 us | 0.57% | 217.277 us | 0.49% | 0.018 us | 0.01% | PASS |
F64 | I32 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0 | 3.233 ms | 0.12% | 3.232 ms | 0.12% | -0.766 us | -0.02% | PASS |
F64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 1 | 10.397 us | 6.12% | 10.933 us | 6.89% | 0.537 us | 5.16% | PASS |
F64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 1 | 32.010 us | 2.37% | 33.800 us | 2.71% | 1.790 us | 5.59% | FAIL |
F64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 1 | 378.756 us | 0.59% | 405.616 us | 0.91% | 26.860 us | 7.09% | FAIL |
F64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 1 | 5.941 ms | 0.50% | 6.365 ms | 0.50% | 424.126 us | 7.14% | FAIL |
F64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0.544 | 10.522 us | 6.90% | 11.088 us | 6.87% | 0.566 us | 5.38% | PASS |
F64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0.544 | 29.932 us | 2.39% | 32.130 us | 3.05% | 2.198 us | 7.34% | FAIL |
F64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0.544 | 317.432 us | 0.74% | 348.707 us | 0.87% | 31.275 us | 9.85% | FAIL |
F64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0.544 | 4.927 ms | 0.50% | 5.427 ms | 0.50% | 500.120 us | 10.15% | FAIL |
F64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0 | 9.942 us | 6.38% | 10.538 us | 5.94% | 0.596 us | 6.00% | FAIL |
F64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0 | 29.197 us | 3.05% | 31.094 us | 2.47% | 1.897 us | 6.50% | FAIL |
F64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0 | 216.607 us | 0.61% | 245.595 us | 0.70% | 28.988 us | 13.38% | FAIL |
F64 | I32 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0 | 3.223 ms | 0.11% | 3.688 ms | 0.19% | 465.826 us | 14.45% | FAIL |
F64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 1 | 9.906 us | 6.72% | 10.007 us | 6.63% | 0.100 us | 1.01% | PASS |
F64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 1 | 31.451 us | 2.91% | 31.492 us | 2.31% | 0.041 us | 0.13% | PASS |
F64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 1 | 377.775 us | 0.54% | 377.852 us | 0.55% | 0.077 us | 0.02% | PASS |
F64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 1 | 5.926 ms | 0.50% | 5.927 ms | 0.50% | 0.138 us | 0.00% | PASS |
F64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0.544 | 9.915 us | 7.07% | 9.781 us | 7.92% | -0.134 us | -1.35% | PASS |
F64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0.544 | 29.602 us | 2.98% | 30.115 us | 2.73% | 0.512 us | 1.73% | PASS |
F64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0.544 | 316.427 us | 0.65% | 316.611 us | 0.70% | 0.184 us | 0.06% | PASS |
F64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0.544 | 4.919 ms | 0.50% | 4.920 ms | 0.50% | 0.387 us | 0.01% | PASS |
F64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^16 | 0 | 9.681 us | 7.55% | 9.576 us | 7.56% | -0.105 us | -1.09% | PASS |
F64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^20 | 0 | 28.881 us | 2.39% | 28.872 us | 2.48% | -0.009 us | -0.03% | PASS |
F64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^24 | 0 | 221.119 us | 0.54% | 221.288 us | 0.52% | 0.169 us | 0.08% | PASS |
F64 | I64 | cuda::std::__4::integral_constant<bool, false> | 2^28 | 0 | 3.311 ms | 0.11% | 3.311 ms | 0.11% | -0.276 us | -0.01% | PASS |
F64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 1 | 10.842 us | 6.84% | 11.476 us | 6.34% | 0.634 us | 5.85% | PASS |
F64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 1 | 32.175 us | 2.24% | 33.865 us | 2.74% | 1.690 us | 5.25% | FAIL |
F64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 1 | 381.725 us | 0.65% | 402.896 us | 0.83% | 21.171 us | 5.55% | FAIL |
F64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 1 | 5.981 ms | 0.50% | 6.317 ms | 0.50% | 335.479 us | 5.61% | FAIL |
F64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0.544 | 10.737 us | 6.50% | 11.206 us | 5.20% | 0.470 us | 4.37% | PASS |
F64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0.544 | 30.316 us | 2.63% | 32.825 us | 2.65% | 2.509 us | 8.28% | FAIL |
F64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0.544 | 321.029 us | 0.76% | 348.792 us | 0.85% | 27.763 us | 8.65% | FAIL |
F64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0.544 | 4.981 ms | 0.50% | 5.422 ms | 0.50% | 440.682 us | 8.85% | FAIL |
F64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^16 | 0 | 10.429 us | 6.40% | 11.047 us | 5.53% | 0.618 us | 5.93% | FAIL |
F64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^20 | 0 | 29.420 us | 2.62% | 31.346 us | 2.88% | 1.926 us | 6.55% | FAIL |
F64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^24 | 0 | 221.825 us | 0.49% | 248.314 us | 0.54% | 26.489 us | 11.94% | FAIL |
F64 | I64 | cuda::std::__4::integral_constant<bool, true> | 2^28 | 0 | 3.315 ms | 0.11% | 3.737 ms | 0.14% | 421.507 us | 12.71% | FAIL |
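The Diff, %Diff and Status columns can be recomputed from the timing and noise columns. The sketch below assumes the rule that the rows above are consistent with: Diff is Cmp Time minus Ref Time, %Diff is Diff relative to Ref Time, and a row passes when the magnitude of %Diff does not exceed the smaller of the two noise values. The `Row` and `Result` structs and the `compare` function are hypothetical names used for illustration only, not part of the benchmark tooling.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical record for one table row: times in microseconds, noise in percent.
struct Row {
  double ref_time_us;
  double ref_noise_pct;
  double cmp_time_us;
  double cmp_noise_pct;
};

// Hypothetical result holding the three derived columns.
struct Result {
  double diff_us;   // Cmp Time - Ref Time
  double diff_pct;  // 100 * Diff / Ref Time
  bool pass;        // assumed rule: |%Diff| <= min(Ref Noise, Cmp Noise)
};

Result compare(const Row& r) {
  const double diff = r.cmp_time_us - r.ref_time_us;
  const double pct = 100.0 * diff / r.ref_time_us;
  const double min_noise = std::min(r.ref_noise_pct, r.cmp_noise_pct);
  return {diff, pct, std::fabs(pct) <= min_noise};
}
```

For example, `compare({10.842, 6.84, 11.476, 6.34})` yields a difference of about 0.634 us (roughly 5.85%), which stays below the smaller noise value of 6.34% and is therefore a pass, matching the F64 / I64 / 2^16 / entropy 1 row. `compare({221.825, 0.49, 248.314, 0.54})` yields roughly 11.94%, far above the 0.49% noise, matching the corresponding FAIL row.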
🟩 CI finished in 4h 41m: Pass: 100%/249 | Total: 5d 02h | Avg: 29m 30s | Max: 1h 03m | Hits: 36%/248433
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| | CUDA Experimental |
Modifications in project or dependencies?
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| | CUDA Experimental |
🏃 Runner counts (total jobs: 249)
# | Runner |
---|---|
178 | linux-amd64-cpu16 |
40 | linux-amd64-gpu-v100-latest-1 |
16 | linux-arm64-cpu16 |
15 | windows-amd64-cpu16 |
🟩 CI finished in 1h 53m: Pass: 100%/249 | Total: 1d 11h | Avg: 8m 27s | Max: 29m 01s | Hits: 97%/248433
🟨 CI finished in 1h 36m: Pass: 99%/249 | Total: 2d 22h | Avg: 17m 02s | Max: 39m 06s | Hits: 84%/247581
🟩 CI finished in 2h 16m: Pass: 100%/249 | Total: 2d 22h | Avg: 17m 04s | Max: 39m 06s | Hits: 84%/248433
🟩 CI finished in 3h 25m: Pass: 100%/249 | Total: 3d 15h | Avg: 20m 58s | Max: 53m 03s | Hits: 82%/248433
Description
Closes #1730
Currently, the in-place versions of the mentioned algorithms may run into a race condition in which a thread block's input items have already been overwritten by one of the subsequent thread blocks. More specifically, this can happen if there are attributes of an item that are not needed to evaluate whether the item is selected, and the code is compiled with a tuning policy that does not use shared memory during the `BlockLoad` stage.
All of the following conditions must be met for the race to be present:
- `cub::Traits<T>::PRIMITIVE`, as for other "more complex" types we emit a `st.release`.
- `BlockLoad` algorithm that doesn't load all of the items' data into shared memory (e.g., `BLOCK_LOAD_DIRECT`).
In order for stream compaction to work in-place, we need to make sure a thread block has loaded its items before it signals to successor thread blocks the number of items it selected (i.e., the "aggregate" or "partial" in the decoupled look-back). That count is the only information successor thread blocks need to infer their offset, which unblocks them to write out their stream-compacted items. To make sure in-place stream compaction works as expected for tuning policies with `BLOCK_LOAD_DIRECT`, we need a device-wide memory barrier between `BlockLoad(...).Load(items)` and `tile_state.SetPartial()`.
Unless we're loading items to shared memory during the `BlockLoad` stage, the `CTA_SYNC()` (`__syncthreads()`) is not sufficient, as that is only a memory barrier with regard to other threads in the thread block.
Checklist
- Post-Load acquire introduction (aeff76e)
- `MayAlias` template parameter.
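The ordering requirement from the description (a tile must finish loading its input before it publishes the count that unblocks its successors) can be illustrated with a host-side analogy. This is only a sketch under stated assumptions, not the CUB implementation: each `std::thread` stands in for one thread block, a tile waits for its direct predecessor's inclusive prefix instead of performing a full decoupled look-back, and a release/acquire pair on a `std::atomic` plays the role that the device-wide barrier between `BlockLoad(...).Load(items)` and `tile_state.SetPartial()` plays on the GPU. The names `compact_in_place` and `inclusive_prefix` are hypothetical.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Host-side analogy: one std::thread per tile (stand-in for a thread block).
// Assumption: chained prefix instead of a full decoupled look-back.
template <typename T, typename Pred>
std::size_t compact_in_place(std::vector<T>& data, std::size_t tile_size, Pred keep) {
  const std::size_t n = data.size();
  const std::size_t num_tiles = (n + tile_size - 1) / tile_size;
  constexpr std::size_t not_ready = static_cast<std::size_t>(-1);
  std::vector<std::atomic<std::size_t>> inclusive_prefix(num_tiles);
  for (auto& p : inclusive_prefix) p.store(not_ready, std::memory_order_relaxed);

  auto tile_body = [&](std::size_t tile) {
    const std::size_t begin = tile * tile_size;
    const std::size_t end = std::min(begin + tile_size, n);

    // 1. Load and filter this tile's items into private storage
    //    (the analogue of loading items into registers).
    std::vector<T> selected;
    for (std::size_t i = begin; i < end; ++i)
      if (keep(data[i])) selected.push_back(data[i]);

    // 2. Wait for the predecessor's prefix; the acquire load pairs with
    //    the predecessor's release store in step 3.
    std::size_t offset = 0;
    if (tile > 0) {
      while ((offset = inclusive_prefix[tile - 1].load(std::memory_order_acquire)) == not_ready)
        std::this_thread::yield();
    }

    // 3. Publish our prefix only after step 1. The release store keeps this
    //    tile's reads of its input ordered before any successor's writes.
    inclusive_prefix[tile].store(offset + selected.size(), std::memory_order_release);

    // 4. Write the selected items; the destination may alias predecessors' input.
    std::copy(selected.begin(), selected.end(),
              data.begin() + static_cast<std::ptrdiff_t>(offset));
  };

  std::vector<std::thread> tiles;
  for (std::size_t t = 0; t < num_tiles; ++t) tiles.emplace_back(tile_body, t);
  for (auto& t : tiles) t.join();
  return num_tiles == 0 ? 0 : inclusive_prefix[num_tiles - 1].load();
}
```

Calling `compact_in_place(v, 128, pred)` on a vector of 1000 elements runs eight tiles and returns the number of selected items, which end up at the front of `v` in their original order. If step 3 were moved before step 1, a successor could start overwriting the front of the buffer while an earlier tile was still reading it, which is the kind of race the description attributes to the missing barrier.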