test pr please ignore #2587

Open
wants to merge 1 commit into
base: main

@pbalcer (Contributor) commented Jan 20, 2025

No description provided.

@pbalcer requested a review from a team as a code owner on January 20, 2025 10:23

Compute Benchmarks level_zero run (with params: --filter "Velocity|llama|memory_benchmark_sycl"):
https://github.com/oneapi-src/unified-runtime/actions/runs/12865994266

Compute Benchmarks level_zero run (--filter "Velocity|llama|memory_benchmark_sycl"):
https://github.com/oneapi-src/unified-runtime/actions/runs/12865994266
Job status: success. Test status: success.

Summary

Total 19 benchmarks in mean.
Geomean 104.308%.
Improved 6 Regressed 1 (threshold 2.00%)

(result is better)
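
The summary figures can be cross-checked from the per-benchmark "Relative perf" values in the group tables. The following is a minimal sketch, assuming the geomean is a plain geometric mean over the 19 compared benchmarks and that a benchmark counts as improved or regressed when its change exceeds the 2.00% threshold in either direction. The function and variable names are illustrative and are not taken from the benchmark scripts.

```python
import math

# Relative perf (%) of the 19 compared benchmarks, copied from the group tables.
relative_perf = [
    164.13, 104.75, 104.37, 100.52,                  # memory (4)
    124.03, 104.75, 102.94, 101.16, 100.98,          # Velocity-Bench (9)
    100.97, 100.82, 100.24, 87.18,
    101.11, 100.56, 100.43, 100.36, 99.83, 99.36,    # llama.cpp (6)
]


def geomean(values):
    """Geometric mean, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


THRESHOLD = 2.0  # percent, as stated in the summary

# A change is the relative perf minus 100%.
improved = sum(1 for r in relative_perf if r - 100.0 > THRESHOLD)
regressed = sum(1 for r in relative_perf if r - 100.0 < -THRESHOLD)

print(f"Total {len(relative_perf)} benchmarks in mean.")
print(f"Geomean {geomean(relative_perf):.3f}%.")
print(f"Improved {improved} Regressed {regressed} (threshold {THRESHOLD:.2f}%)")
```

Because the inputs are already rounded to two decimals, the computed geomean is only expected to agree with the reported 104.308% to within rounding.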

Performance change in benchmark groups

Relative perf in group memory (4): 115.889%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 | 133.924000 μs | 219.808 μs | 164.13% | 64.13% | ++++++++++ |
| memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 | 5.599000 μs | 5.865 μs | 104.75% | 4.75% | + |
| memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 | 3.176000 GB/s | 3.043 GB/s | 104.37% | 4.37% | + |
| memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 | 253.546000 μs | 254.865 μs | 100.52% | 0.52% | . |

Relative perf in group Velocity-Bench (9): 102.191%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| Velocity-Bench Easywave | 233.000000 ms | 289.000 ms | 124.03% | 24.03% | ++++ |
| Velocity-Bench Sobel Filter | 593.005000 ms | 621.173 ms | 104.75% | 4.75% | + |
| Velocity-Bench svm | 0.136100 s | 0.140 s | 102.94% | 2.94% | . |
| Velocity-Bench dl-cifar | 23.698200 s | 23.972 s | 101.16% | 1.16% | . |
| Velocity-Bench QuickSilver | 118.600000 MMS/CTT | 117.450 MMS/CTT | 100.98% | 0.98% | . |
| Velocity-Bench Hashtable | 359.537028 M keys/sec | 356.084 M keys/sec | 100.97% | 0.97% | . |
| Velocity-Bench CudaSift | 202.681000 ms | 204.342 ms | 100.82% | 0.82% | . |
| Velocity-Bench Bitcracker | 35.033700 s | 35.119 s | 100.24% | 0.24% | . |
| Velocity-Bench dl-mnist | 2.730 s | 2.380000 s | 87.18% | -12.82% | -- |

Relative perf in group llama.cpp (6): 100.274%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| llama.cpp Prompt Processing Batched 512 | 431.144378 token/s | 426.428 token/s | 101.11% | 1.11% | . |
| llama.cpp Text Generation Batched 512 | 62.828598 token/s | 62.478 token/s | 100.56% | 0.56% | . |
| llama.cpp Text Generation Batched 256 | 62.794807 token/s | 62.525 token/s | 100.43% | 0.43% | . |
| llama.cpp Text Generation Batched 128 | 62.756959 token/s | 62.531 token/s | 100.36% | 0.36% | . |
| llama.cpp Prompt Processing Batched 256 | 870.777 token/s | 872.219855 token/s | 99.83% | -0.17% | . |
| llama.cpp Prompt Processing Batched 128 | 825.121 token/s | 830.457525 token/s | 99.36% | -0.64% | . |

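
The "Relative perf" column appears to depend on the unit. For time-based results (μs, ms, s), where lower is better, the values are consistent with baseline divided by this PR. For throughput results (GB/s, token/s), where higher is better, they are consistent with this PR divided by baseline. The sketch below reproduces three rows under that assumption. The helper `relative_perf` and its `lower_is_better` flag are hypothetical names used for illustration only.

```python
def relative_perf(this_pr, baseline, lower_is_better):
    """Relative performance in percent; above 100 means this PR is better."""
    ratio = baseline / this_pr if lower_is_better else this_pr / baseline
    return 100.0 * ratio


# Values copied from the tables above.
memcpy_h2d = relative_perf(133.924, 219.808, lower_is_better=True)       # μs
dl_mnist = relative_perf(2.730, 2.380, lower_is_better=True)             # s
llama_pp512 = relative_perf(431.144378, 426.428, lower_is_better=False)  # token/s

rows = [
    ("QueueInOrderMemcpy from Host to Device", memcpy_h2d),
    ("dl-mnist", dl_mnist),
    ("Prompt Processing Batched 512", llama_pp512),
]
for name, value in rows:
    # The "Change" column is the relative perf minus 100%.
    print(f"{name}: {value:.2f}% ({value - 100.0:+.2f}%)")
```
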
Relative perf in group api (12): cannot calculate
Benchmark This PR baseline Relative perf Change -
api_overhead_benchmark_l0 SubmitKernel out of order - 11.848000 μs
api_overhead_benchmark_l0 SubmitKernel in order - 11.745000 μs
api_overhead_benchmark_sycl SubmitKernel out of order - 23.710000 μs
api_overhead_benchmark_sycl SubmitKernel in order - 24.891000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 - 2.143000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 - 1.702000 μs
api_overhead_benchmark_ur SubmitKernel out of order CPU count - 105463.000000 instr
api_overhead_benchmark_ur SubmitKernel out of order - 15.623000 μs
api_overhead_benchmark_ur SubmitKernel in order CPU count - 110815.000000 instr
api_overhead_benchmark_ur SubmitKernel in order - 16.859000 μs
api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count - 123991.000000 instr
api_overhead_benchmark_ur SubmitKernel in order with measure completion - 21.425000 μs
Relative perf in group miscellaneous (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
miscellaneous_benchmark_sycl VectorSum - 861.253000 bw GB/s
Relative perf in group multithread (10): cannot calculate
Benchmark This PR baseline Relative perf Change -
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1 - 6931.139000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1 - 17007.721000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1 - 47383.460000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1 - 2073.904000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1 - 7868.958000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1 - 9035.852000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1 - 27237.512000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1 - 1194.467000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events - 42860.412000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events - 113343.613000 μs
Relative perf in group Runtime (8): cannot calculate
Benchmark This PR baseline Relative perf Change -
Runtime_IndependentDAGTaskThroughput_SingleTask - 253.100000 ms
Runtime_IndependentDAGTaskThroughput_BasicParallelFor - 273.484000 ms
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor - 271.662000 ms
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor - 272.505000 ms
Runtime_DAGTaskThroughput_SingleTask - 1691.410000 ms
Runtime_DAGTaskThroughput_BasicParallelFor - 1756.502000 ms
Runtime_DAGTaskThroughput_HierarchicalParallelFor - 1721.262000 ms
Runtime_DAGTaskThroughput_NDRangeParallelFor - 1694.375000 ms
Relative perf in group MicroBench (14): cannot calculate
Benchmark This PR baseline Relative perf Change -
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous - 5.188000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous - 4.967000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous - 4.769000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous - 4.866000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous - 618.226000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous - 618.268000 ms
MicroBench_HostDeviceBandwidth_1D_H2D_Strided - 4.919000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Strided - 5.115000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Strided - 5.140000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Strided - 5.113000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Strided - 617.772000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Strided - 617.628000 ms
MicroBench_LocalMem_int32_4096 - 29.834000 ms
MicroBench_LocalMem_fp32_4096 - 29.857000 ms
Relative perf in group Pattern (10): cannot calculate
Benchmark This PR baseline Relative perf Change -
Pattern_Reduction_NDRange_int32 - 16.971000 ms
Pattern_Reduction_Hierarchical_int32 - 17.024000 ms
Pattern_SegmentedReduction_NDRange_int16 - 2.263000 ms
Pattern_SegmentedReduction_NDRange_int32 - 2.164000 ms
Pattern_SegmentedReduction_NDRange_int64 - 2.333000 ms
Pattern_SegmentedReduction_NDRange_fp32 - 2.163000 ms
Pattern_SegmentedReduction_Hierarchical_int16 - 11.801000 ms
Pattern_SegmentedReduction_Hierarchical_int32 - 11.587000 ms
Pattern_SegmentedReduction_Hierarchical_int64 - 11.777000 ms
Pattern_SegmentedReduction_Hierarchical_fp32 - 11.588000 ms
Relative perf in group ScalarProduct (6): cannot calculate
Benchmark This PR baseline Relative perf Change -
ScalarProduct_NDRange_int32 - 3.734000 ms
ScalarProduct_NDRange_int64 - 5.456000 ms
ScalarProduct_NDRange_fp32 - 3.767000 ms
ScalarProduct_Hierarchical_int32 - 10.555000 ms
ScalarProduct_Hierarchical_int64 - 11.508000 ms
ScalarProduct_Hierarchical_fp32 - 10.174000 ms
Relative perf in group USM (7): cannot calculate
Benchmark This PR baseline Relative perf Change -
USM_Allocation_latency_fp32_device - 0.068000 ms
USM_Allocation_latency_fp32_host - 37.633000 ms
USM_Allocation_latency_fp32_shared - 0.057000 ms
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch - 1.717000 ms
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch - 1.085000 ms
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch - 1.889000 ms
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch - 1.256000 ms
Relative perf in group VectorAddition (3): cannot calculate
Benchmark This PR baseline Relative perf Change -
VectorAddition_int32 - 1.510000 ms
VectorAddition_int64 - 3.066000 ms
VectorAddition_fp32 - 1.460000 ms
Relative perf in group Polybench (3): cannot calculate
Benchmark This PR baseline Relative perf Change -
Polybench_2mm - 1.221000 ms
Polybench_3mm - 1.730000 ms
Polybench_Atax - 6.855000 ms
Relative perf in group Kmeans (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
Kmeans_fp32 - 16.091000 ms
Relative perf in group LinearRegressionCoeff (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
LinearRegressionCoeff_fp32 - 908.423000 ms
Relative perf in group MolecularDynamics (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
MolecularDynamics - 0.030000 ms
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (4): cannot calculate
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/0/4096/iterations:200000/threads:4 glibc - 2475.310000 ns
alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider - 2120.000000 ns
alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool<os_provider> - 3068.370000 ns
alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool<os_provider> - 283.309000 ns
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (4): cannot calculate
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/0/4096/iterations:200000/threads:1 glibc - 706.837000 ns
alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider - 197.281000 ns
alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool<os_provider> - 268.948000 ns
alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool<os_provider> - 213.433000 ns
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (4): cannot calculate
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc - 1259.770000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider - 1854.120000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool<os_provider> - 3771.150000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool<os_provider> - 253.839000 ns
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (4): cannot calculate
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc - 726.627000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider - 195.246000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool<os_provider> - 308.264000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool<os_provider> - 206.713000 ns
Relative perf in group alloc/min (4): cannot calculate
Benchmark This PR baseline Relative perf Change -
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc - 803.081000 ns
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc - 177.090000 ns
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool<os_provider> - 978.697000 ns
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool<os_provider> - 975.381000 ns
Relative perf in group multiple (12): cannot calculate
Benchmark This PR baseline Relative perf Change -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc - 33503.600000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc - 4251.600000 ns
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc - 141113.000000 ns
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc - 30214.100000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool<os_provider> - 1170470.000000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool<os_provider> - 165011.000000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider - 1151930.000000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider - 145356.000000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool<os_provider> - 42332.700000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool<os_provider> - 15330.800000 ns
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool<os_provider> - 75942.600000 ns
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool<os_provider> - 25425.600000 ns

Details

Benchmark details - environment, command, output...
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Device destinationPlacement=Device size=1KB count=100),254.966,253.546,1.61%,249.381,446.321,[CPU],[us]
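
The CSV row above carries one more field than its header: the trailing `[us]` unit follows the `Type` column. The Median value (253.546) matches the "This PR" entry for this benchmark in the summary, which suggests the median is the figure being reported. A minimal sketch of reading such a row, assuming the column layout shown above:

```python
import csv

header = "TestCase,Mean,Median,StdDev,Min,Max,Type"
row = (
    "QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Device "
    "destinationPlacement=Device size=1KB count=100),"
    "254.966,253.546,1.61%,249.381,446.321,[CPU],[us]"
)

names = next(csv.reader([header]))
fields = next(csv.reader([row]))

# Pair header names with values; the extra trailing field holds the unit.
record = dict(zip(names, fields))
unit = fields[len(names)].strip("[]")

median = float(record["Median"])
print(f"{median:.6f} {unit}")
```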

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Host destinationPlacement=Device size=1KB count=100),134.914,133.924,2.31%,132.271,349.675,[CPU],[us]

memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueMemcpy(api=sycl sourcePlacement=Device destinationPlacement=Device size=1KB),5.791,5.599,13.45%,5.199,65.839,[CPU],[us]

memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros --multiplier=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
StreamMemory(api=sycl type=Triad size=10KB useEvents=0 contents=Zeros memoryPlacement=Device multiplier=1),3.158,3.176,3.70%,0.384,3.444,[CPU],[GB/s]

Velocity-Bench Hashtable

Environment Variables:

Command:

/home/pmdk/bench_workdir/hashtable/hashtable_sycl --no-verify

Output:

hashtable - total time for whole calculation: 0.373307 s
359.537028 million keys/second
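
The throughput line in this output matches the 359.537028 M keys/sec reported for Hashtable in the summary. A minimal sketch of extracting both numbers with regular expressions, assuming the output format shown above (the patterns are illustrative, not the ones used by the benchmark scripts):

```python
import re

output = """hashtable - total time for whole calculation: 0.373307 s
359.537028 million keys/second"""

# Capture the number that follows the fixed label, and the one before the rate unit.
time_match = re.search(r"total time for whole calculation: ([0-9.]+) s", output)
rate_match = re.search(r"([0-9.]+) million keys/second", output)

total_time_s = float(time_match.group(1))
throughput_mkeys = float(rate_match.group(1))

print(f"time: {total_time_s} s, throughput: {throughput_mkeys} M keys/sec")
```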

Velocity-Bench Bitcracker

Environment Variables:

Command:

/home/pmdk/bench_workdir/bitcracker/bitcracker -f /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Output:

---------> BitCracker: BitLocker password cracking tool <---------

==================================
Retrieving Info

Reading hash file "/home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

              Attack

================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range:
npknpByH7N2m3OnLNH1X9DJxLrzIFWk
.....
dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!

time to subtract from total: 0.00411306 s
bitcracker - total time for whole calculation: 35.0337 s

Velocity-Bench CudaSift

Environment Variables:

Command:

/home/pmdk/bench_workdir/cudaSift/cudaSift

Output:

UNKN:

UNKN: ==================================================
UNKN: User input parameters:
UNKN: Trace: ../../inputData
UNKN: ==================================================
UNKN:

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1079 1266 29.2968% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1263 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1117 1270 30.3285% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1231 1276 33.4238% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1234 1274 33.5053% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1263 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1199 1254 32.555% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1129 1273 30.6544% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1092 1264 29.6497% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1095 1268 29.7312% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1267 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1117 1271 30.3285% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1208 1266 32.7993% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1216 1252 33.0166% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1130 1266 30.6815% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1109 1266 30.1113% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1175 1255 31.9033% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1224 1259 33.2338% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1092 1259 29.6497% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1265 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1171 1254 31.7947% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1268 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1215 1251 32.9894% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1266 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1230 1271 33.3967% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1265 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1102 1250 29.9213% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1078 1279 29.2696% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1209 1255 32.8265% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1135 1272 30.8173% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1116 1265 30.3014% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1242 1274 33.7225% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1115 1264 30.2742% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1122 1258 30.4643% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1266 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1241 1273 33.6954% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1082 1255 29.3782% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1125 1253 30.5458% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1231 1264 33.4238% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1100 1269 29.867% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1213 1258 32.9351% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1222 1255 33.1795% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1140 1265 30.953% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1237 1273 33.5868% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1269 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1104 1263 29.9756% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1221 1258 33.1523% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1236 1270 33.5596% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1228 1264 33.3424% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1061 1271 28.808% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Avg workload time = 202.681 ms

Velocity-Bench Easywave

Environment Variables:

Command:

/home/pmdk/bench_workdir/easywave/easyWave_sycl -grid /home/pmdk/bench_workdir/data/easywave/examples/e2Asean.grd -source /home/pmdk/bench_workdir/data/easywave/examples/BengkuluSept2007.flt -time 120

Output:

MAIN: Starting SYCL main program
MAIN: Attempting to clean up previous eWave tsunami files
MAIN: Clean up completed
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.31294)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
MAIN: Program successfully completed

Velocity-Bench QuickSilver

Environment Variables:

QS_DEVICE=GPU

Command:

/home/pmdk/bench_workdir/QuickSilver/qs -i /home/pmdk/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Output:

Copyright (c) 2016
Lawrence Livermore National Security, LLC
All Rights Reserved
Quicksilver Version :
Quicksilver Git Hash :
MPI Version : 3.0
Number of MPI ranks : 1
Number of OpenMP Threads: 1
Number of OpenMP CPUs : 1

Loading params
Finished loading params
Simulation:
dt: 1e-08
fMax: 0.1
inputFile: /home/pmdk/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
energySpectrum:
boundaryCondition: octant
loadBalance: 1
cycleTimers: 0
debugThreads: 0
lx: 100
ly: 100
lz: 100
nParticles: 10000000
batchSize: 0
nBatches: 10
nSteps: 10
nx: 10
ny: 10
nz: 10
seed: 1029384756
xDom: 0
yDom: 0
zDom: 0
eMax: 20
eMin: 1e-09
nGroups: 230
lowWeightCutoff: 0.001
bTally: 1
fTally: 1
cTally: 1
coralBenchmark: 0
crossSectionsOut:

Geometry:
material: sourceMaterial
shape: brick
xMax: 100
xMin: 0
yMax: 100
yMin: 0
zMax: 100
zMin: 0

Material:
name: sourceMaterial
mass: 1000
nIsotopes: 10
nReactions: 9
sourceRate: 1e+10
totalCrossSection: 0.1
absorptionCrossSection: flat
fissionCrossSection: flat
scatteringCrossSection: flat
absorptionCrossSectionRatio: 0
fissionCrossSectionRatio: 0
scatteringCrossSectionRatio: 1

CrossSection:
name: flat
A: 0
B: 0
C: 0
D: 0
E: 1
nuBar: 2.4
setting GPU
setting parameters
Building partition 0
Building partition 1
Building partition 2
Building partition 3
Building MC_Domain 0
Building MC_Domain 1
Building MC_Domain 2
Building MC_Domain 3
Starting Consistency Check
Finished Consistency Check
Finished initMesh
Started copyMaterialDatabase_device
Finished copyMaterialDatabase_device
Finished copyNuclearData_device
Finished copyDomainDevice
cycle start source rr split absorb scatter fission produce collisn escape census num_seg scalar_flux cycleInit cycleTracking cycleFinalize
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 4.328980e-01 6.103600e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 3.604810e-01 7.423830e-01 0.000000e+00
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 3.363200e-01 7.603240e-01 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 3.675710e-01 8.242040e-01 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 3.639380e-01 7.936470e-01 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 3.320220e-01 7.813230e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 3.324010e-01 7.623190e-01 0.000000e+00
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 3.349380e-01 7.812370e-01 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 3.326040e-01 7.819730e-01 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 3.334550e-01 7.578150e-01 0.000000e+00

Timer report (all values cumulative; times in microseconds)
Timer name                 Calls   Min         Avg         Max         Stddev      Efficiency rating
main                       1       1.112e+07   1.112e+07   1.112e+07   0.000e+00   100.00
cycleInit                  10      3.527e+06   3.527e+06   3.527e+06   0.000e+00   100.00
cycleTracking              10      7.596e+06   7.596e+06   7.596e+06   0.000e+00   100.00
cycleTracking_Kernel       104     4.916e+06   4.916e+06   4.916e+06   0.000e+00   100.00
cycleTracking_MPI          117     2.011e+05   2.011e+05   2.011e+05   0.000e+00   100.00
cycleTracking_Test_Done    0       0.000e+00   0.000e+00   0.000e+00   0.000e+00   0.00
cycleFinalize              20      3.990e+02   3.990e+02   3.990e+02   0.000e+00   100.00
Figure Of Merit 118.60 [Num Mega Segments / Cycle Tracking Time]
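
The Figure Of Merit line can be cross-checked against the numbers in this output. The sketch below assumes that "Num Mega Segments" is the sum of the `num_seg` column over the ten cycles, in millions, and that "Cycle Tracking Time" is the cumulative `cycleTracking` timer converted to seconds. Both readings are inferred from the label and the values, not from the benchmark's source; the function and constant names are illustrative.

```python
# num_seg column from the ten cycle rows of the output above.
NUM_SEGMENTS = [
    55527935, 94633679, 95010375, 94953591, 94599337,
    94148236, 93689264, 93216931, 92768273, 92324678,
]

# Cumulative cycleTracking timer from the timer report, in microseconds.
CYCLE_TRACKING_US = 7.596e6


def figure_of_merit(num_segments, tracking_us):
    """Mega segments per second of cycle tracking time (assumed definition)."""
    mega_segments = sum(num_segments) / 1e6
    tracking_s = tracking_us / 1e6
    return mega_segments / tracking_s


fom = figure_of_merit(NUM_SEGMENTS, CYCLE_TRACKING_US)
print(f"{fom:.2f}")  # prints 118.60
```

Under these assumptions the result agrees with the reported 118.60 to two decimal places.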

Velocity-Bench Sobel Filter

Environment Variables:

OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Command:

/home/pmdk/bench_workdir/sobel_filter/sobel_filter -i /home/pmdk/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Output:

SYMN: Welcome to the SYCL version of Sobel filter workload.
SYMN: Input image file: /home/pmdk/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png
SYMN: Launching SYCL kernel with # of iterations: 5
time to subtract from total: 7.48054 s
sobelfilter - total time for whole calculation: 0.593005 s

Velocity-Bench dl-cifar

Environment Variables:

Command:

/home/pmdk/bench_workdir/dl-cifar/dl-cifar_sycl

Output:

	Welcome to DL-CIFAR workload: SYCL version.

=======================================================================
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.31294)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.31294)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero

WL PARAMS:

WL PARAMS: ==================================================
WL PARAMS: User input parameters:
WL PARAMS: Trace: notrace
WL PARAMS: DL NW size type: WORKLOAD_DEFAULT_SIZE
WL PARAMS: ==================================================
WL PARAMS:

dataFileReadTimer->getTotalOpTime(): 8.7e-05 s
dl-cifar - total time for whole calculation: 23.6982 s

Velocity-Bench dl-mnist

Environment Variables:

NEOReadDebugKeys=1
DisableScratchPages=0

Command:

/home/pmdk/bench_workdir/dl-mnist/dl-mnist-sycl -conv_algo ONEDNN_AUTO

Output:

	Welcome to DL-MNIST workload: SYCL version.

=======================================================================
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.31294)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.31294)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero

WL PARAMS:

WL PARAMS: ==================================================
WL PARAMS: User input parameters:
WL PARAMS: Trace: notrace
WL PARAMS: Tensor management policy: per_layer
WL PARAMS: Convolution algorithm: ONEDNN_AUTO
WL PARAMS: Dataset reader format: NCHW
WL PARAMS: Dry run: YES
WL PARAMS: OneDNN Conv PD memory format: ONEDNN_CONVPD_ANY
WL PARAMS: No of iterations for inference: 400
WL PARAMS: ==================================================
WL PARAMS:

dl-mnist - total time for whole calculation: 2.73 s

Velocity-Bench svm

Environment Variables:

Command:

/home/pmdk/bench_workdir/svm/svm_sycl /home/pmdk/bench_workdir/velocity-bench-repo/svm/SYCL/a9a /home/pmdk/bench_workdir/velocity-bench-repo/svm/SYCL/a.m

Output:

Number of args 3
Using cuSVM (Carpenter)...

Buffering input text file (6989624 B).
Load Done
Starting Training
_C 1.000000
Workgroup Size: 1024
nbrCtas 80
elemsPerCta 1248
threadsPerCta 128
Total run time: 0.064777 seconds
Iter:100
M:97683
N:123
Train done. Calulate Vector counts
Training done

Loading elapsed time : 0.0638 s
Processing elapsed time : 0.0701 s
Storing elapsed time : 0.0023 s
Total elapsed time : 0.1361 s
Result's are correct: 0.0551

llama.cpp Prompt Processing Batched 128

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

Output:

build_commit,build_number,cuda,vulkan,kompute,metal,sycl,rpc,gpu_blas,blas,cpu_info,gpu_info,model_filename,model_type,model_size,model_n_params,n_batch,n_ubatch,n_threads,cpu_mask,cpu_strict,poll,type_k,type_v,n_gpu_layers,split_mode,main_gpu,no_kv_offload,flash_attn,tensor_split,use_mmap,embeddings,n_prompt,n_gen,test_time,avg_ns,stddev_ns,avg_ts,stddev_ts
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","128","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T10:44:24Z","620516854","1258772","825.121317","1.672776"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","128","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T10:44:30Z","2040546361","2382184","62.728366","0.073113"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","256","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T10:44:40Z","588169198","3616369","870.524086","5.342074"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","256","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T10:44:43Z","2041094631","2104230","62.711501","0.064610"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","512","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T10:44:54Z","1199599219","5951935","426.817630","2.120240"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","512","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T10:45:01Z","2039763703","1386355","62.752390","0.042622"
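
As a sanity check on the CSV above, the `avg_ts` column (tokens per second) can be roughly recomputed from `n_prompt`, `n_gen` and `avg_ns`. The sketch below uses only a trimmed subset of the columns, with values copied from the first two data rows. It assumes `avg_ts` is a mean of per-repetition throughputs, which would explain why the recomputed figure matches only approximately; the helper name `tokens_per_second` is illustrative.

```python
import csv
import io

# Trimmed to the columns used here; values copied from the first two
# data rows of the llama-bench CSV output above.
CSV_TEXT = '''n_batch,n_prompt,n_gen,avg_ns,avg_ts
"128","512","0","620516854","825.121317"
"128","0","128","2040546361","62.728366"
'''


def tokens_per_second(row):
    """Tokens processed divided by the mean run time in seconds."""
    tokens = int(row["n_prompt"]) + int(row["n_gen"])
    return tokens / (int(row["avg_ns"]) / 1e9)


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
for row in rows:
    kind = "prompt processing" if int(row["n_prompt"]) > 0 else "text generation"
    print(kind, round(tokens_per_second(row), 2), "reported:", row["avg_ts"])
```

Rows with `n_prompt` of 512 and `n_gen` of 0 measure prompt processing, and rows with `n_gen` of 128 measure text generation, which is why each batch size appears twice.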

llama.cpp Text Generation Batched 128

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

Output:

build_commit,build_number,cuda,vulkan,kompute,metal,sycl,rpc,gpu_blas,blas,cpu_info,gpu_info,model_filename,model_type,model_size,model_n_params,n_batch,n_ubatch,n_threads,cpu_mask,cpu_strict,poll,type_k,type_v,n_gpu_layers,split_mode,main_gpu,no_kv_offload,flash_attn,tensor_split,use_mmap,embeddings,n_prompt,n_gen,test_time,avg_ns,stddev_ns,avg_ts,stddev_ts
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","128","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T10:46:49Z","622374982","15354734","823.043546","19.677805"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","128","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T10:46:57Z","2039619767","3689979","62.756959","0.113266"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","256","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T10:47:07Z","590807100","3235090","866.631865","4.737120"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","256","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T10:47:11Z","2036308627","1012105","62.858853","0.031175"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","512","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T10:47:21Z","1205494828","7645015","424.735565","2.702280"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","512","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T10:47:28Z","2036831696","1149439","62.842714","0.035395"

llama.cpp Prompt Processing Batched 256

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

Output:

build_commit,build_number,cuda,vulkan,kompute,metal,sycl,rpc,gpu_blas,blas,cpu_info,gpu_info,model_filename,model_type,model_size,model_n_params,n_batch,n_ubatch,n_threads,cpu_mask,cpu_strict,poll,type_k,type_v,n_gpu_layers,split_mode,main_gpu,no_kv_offload,flash_attn,tensor_split,use_mmap,embeddings,n_prompt,n_gen,test_time,avg_ns,stddev_ns,avg_ts,stddev_ts
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","128","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T10:45:12Z","611723946","1513093","836.982939","2.070649"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","128","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T10:45:16Z","2034504723","3914442","62.914761","0.120766"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","256","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T10:45:27Z","588006272","4327260","870.776647","6.393052"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","256","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T10:45:30Z","2038385602","1105861","62.794807","0.034032"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","512","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T10:45:40Z","1187537973","1051367","431.144378","0.381549"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","512","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T10:45:48Z","2036080891","1669583","62.865905","0.051525"

llama.cpp Text Generation Batched 256

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

Output:

build_commit,build_number,cuda,vulkan,kompute,metal,sycl,rpc,gpu_blas,blas,cpu_info,gpu_info,model_filename,model_type,model_size,model_n_params,n_batch,n_ubatch,n_threads,cpu_mask,cpu_strict,poll,type_k,type_v,n_gpu_layers,split_mode,main_gpu,no_kv_offload,flash_attn,tensor_split,use_mmap,embeddings,n_prompt,n_gen,test_time,avg_ns,stddev_ns,avg_ts,stddev_ts
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","128","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T10:45:12Z","611723946","1513093","836.982939","2.070649"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","128","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T10:45:16Z","2034504723","3914442","62.914761","0.120766"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","256","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T10:45:27Z","588006272","4327260","870.776647","6.393052"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","256","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T10:45:30Z","2038385602","1105861","62.794807","0.034032"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","512","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T10:45:40Z","1187537973","1051367","431.144378","0.381549"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","512","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T10:45:48Z","2036080891","1669583","62.865905","0.051525"

llama.cpp Prompt Processing Batched 512

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

Output:

build_commit,build_number,cuda,vulkan,kompute,metal,sycl,rpc,gpu_blas,blas,cpu_info,gpu_info,model_filename,model_type,model_size,model_n_params,n_batch,n_ubatch,n_threads,cpu_mask,cpu_strict,poll,type_k,type_v,n_gpu_layers,split_mode,main_gpu,no_kv_offload,flash_attn,tensor_split,use_mmap,embeddings,n_prompt,n_gen,test_time,avg_ns,stddev_ns,avg_ts,stddev_ts
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","128","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T10:45:12Z","611723946","1513093","836.982939","2.070649"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","128","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T10:45:16Z","2034504723","3914442","62.914761","0.120766"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","256","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T10:45:27Z","588006272","4327260","870.776647","6.393052"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","256","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T10:45:30Z","2038385602","1105861","62.794807","0.034032"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","512","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T10:45:40Z","1187537973","1051367","431.144378","0.381549"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","512","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T10:45:48Z","2036080891","1669583","62.865905","0.051525"

llama.cpp Text Generation Batched 512

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

Output:

build_commit,build_number,cuda,vulkan,kompute,metal,sycl,rpc,gpu_blas,blas,cpu_info,gpu_info,model_filename,model_type,model_size,model_n_params,n_batch,n_ubatch,n_threads,cpu_mask,cpu_strict,poll,type_k,type_v,n_gpu_layers,split_mode,main_gpu,no_kv_offload,flash_attn,tensor_split,use_mmap,embeddings,n_prompt,n_gen,test_time,avg_ns,stddev_ns,avg_ts,stddev_ts
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","128","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T10:45:59Z","800981973","419087399","733.624096","225.559274"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","128","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T10:46:07Z","2040098646","3348632","62.742199","0.102781"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","256","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T10:46:18Z","582747125","1350197","878.600987","2.037306"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","256","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T10:46:21Z","2037016342","2492470","62.837077","0.076754"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","512","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T10:46:31Z","1182129718","2992373","433.118815","1.096055"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","512","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T10:46:39Z","2037289954","1693822","62.828598","0.052196"

Compute Benchmarks level_zero run (with params: --filter "SubmitKernel"):
https://github.com/oneapi-src/unified-runtime/actions/runs/12866385766

Compute Benchmarks level_zero run (--filter "SubmitKernel"):
https://github.com/oneapi-src/unified-runtime/actions/runs/12866385766
Job status: success. Test status: success.

Summary

Total 9 benchmarks in mean.
Geomean 101.398%.
Improved 3 Regressed 0 (threshold 2.00%)

(result is better)

Performance change in benchmark groups

Relative perf in group api (12): 101.398%
Benchmark This PR baseline Relative perf Change -
api_overhead_benchmark_l0 SubmitKernel out of order 11.388000 μs 11.848 μs 104.04% 4.04% ++++++++++
api_overhead_benchmark_ur SubmitKernel in order 16.398000 μs 16.859 μs 102.81% 2.81% +++++++
api_overhead_benchmark_ur SubmitKernel in order with measure completion 20.953000 μs 21.425 μs 102.25% 2.25% ++++++
api_overhead_benchmark_sycl SubmitKernel in order 24.452000 μs 24.891 μs 101.80% 1.80% .
api_overhead_benchmark_sycl SubmitKernel out of order 23.390000 μs 23.710 μs 101.37% 1.37% .
api_overhead_benchmark_ur SubmitKernel out of order CPU count 104523.000000 instr 105463.000 instr 100.90% 0.90% .
api_overhead_benchmark_ur SubmitKernel in order CPU count 109983.000000 instr 110815.000 instr 100.76% 0.76% .
api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count 123156.000000 instr 123991.000 instr 100.68% 0.68% .
api_overhead_benchmark_ur SubmitKernel out of order 15.927 μs 15.623000 μs 98.09% -1.91% .
api_overhead_benchmark_l0 SubmitKernel in order - 11.745000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 - 2.143000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 - 1.702000 μs
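
The summary figures for the api group can be reproduced from the nine rows that have both a "This PR" and a baseline value. The sketch below assumes that, for these lower-is-better metrics (μs and instruction counts), relative perf is baseline divided by this PR, and that a change counts as improved or regressed only when it exceeds the 2.00% threshold. The ratio direction is inferred from the numbers in the table, not from the benchmark scripts.

```python
import math

# (this_pr, baseline) pairs for the nine comparable api-group rows,
# copied from the table above. All metrics are lower-is-better.
PAIRS = [
    (11.388, 11.848),
    (16.398, 16.859),
    (20.953, 21.425),
    (24.452, 24.891),
    (23.390, 23.710),
    (104523.0, 105463.0),
    (109983.0, 110815.0),
    (123156.0, 123991.0),
    (15.927, 15.623),
]
THRESHOLD = 0.02  # the 2.00% threshold stated in the summary


def relative_perf(this_pr, baseline):
    """Assumed definition for lower-is-better metrics: baseline / this PR."""
    return baseline / this_pr


def geomean(values):
    """Geometric mean computed through the mean of logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


ratios = [relative_perf(pr, base) for pr, base in PAIRS]
improved = sum(1 for r in ratios if r - 1.0 > THRESHOLD)
regressed = sum(1 for r in ratios if 1.0 - r > THRESHOLD)

print(f"Geomean {geomean(ratios) * 100:.3f}%")
print(f"Improved {improved} Regressed {regressed}")
```

With these inputs the counts match the summary (3 improved, 0 regressed), and the geomean lands close to the reported 101.398%; a small difference is expected because the baseline values in the table are rounded.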
Relative perf in group memory (4): cannot calculate
Benchmark This PR baseline Relative perf Change -
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 - 254.865000 μs
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 - 219.808000 μs
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 - 5.865000 μs
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 - 3.043000 GB/s
Relative perf in group miscellaneous (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
miscellaneous_benchmark_sycl VectorSum - 861.253000 bw GB/s
Relative perf in group multithread (10): cannot calculate
Benchmark This PR baseline Relative perf Change -
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1 - 6931.139000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1 - 17007.721000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1 - 47383.460000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1 - 2073.904000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1 - 7868.958000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1 - 9035.852000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1 - 27237.512000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1 - 1194.467000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events - 42860.412000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events - 113343.613000 μs
Relative perf in group Velocity-Bench (9): cannot calculate
Benchmark This PR baseline Relative perf Change -
Velocity-Bench Hashtable - 356.084148 M keys/sec
Velocity-Bench Bitcracker - 35.118800 s
Velocity-Bench CudaSift - 204.342000 ms
Velocity-Bench Easywave - 289.000000 ms
Velocity-Bench QuickSilver - 117.450000 MMS/CTT
Velocity-Bench Sobel Filter - 621.173000 ms
Velocity-Bench dl-cifar - 23.972100 s
Velocity-Bench dl-mnist - 2.380000 s
Velocity-Bench svm - 0.140100 s
Relative perf in group Runtime (8): cannot calculate
Benchmark This PR baseline Relative perf Change -
Runtime_IndependentDAGTaskThroughput_SingleTask - 253.100000 ms
Runtime_IndependentDAGTaskThroughput_BasicParallelFor - 273.484000 ms
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor - 271.662000 ms
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor - 272.505000 ms
Runtime_DAGTaskThroughput_SingleTask - 1691.410000 ms
Runtime_DAGTaskThroughput_BasicParallelFor - 1756.502000 ms
Runtime_DAGTaskThroughput_HierarchicalParallelFor - 1721.262000 ms
Runtime_DAGTaskThroughput_NDRangeParallelFor - 1694.375000 ms
Relative perf in group MicroBench (14): cannot calculate
Benchmark This PR baseline Relative perf Change -
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous - 5.188000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous - 4.967000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous - 4.769000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous - 4.866000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous - 618.226000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous - 618.268000 ms
MicroBench_HostDeviceBandwidth_1D_H2D_Strided - 4.919000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Strided - 5.115000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Strided - 5.140000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Strided - 5.113000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Strided - 617.772000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Strided - 617.628000 ms
MicroBench_LocalMem_int32_4096 - 29.834000 ms
MicroBench_LocalMem_fp32_4096 - 29.857000 ms
Relative perf in group Pattern (10): cannot calculate
Benchmark This PR baseline Relative perf Change -
Pattern_Reduction_NDRange_int32 - 16.971000 ms
Pattern_Reduction_Hierarchical_int32 - 17.024000 ms
Pattern_SegmentedReduction_NDRange_int16 - 2.263000 ms
Pattern_SegmentedReduction_NDRange_int32 - 2.164000 ms
Pattern_SegmentedReduction_NDRange_int64 - 2.333000 ms
Pattern_SegmentedReduction_NDRange_fp32 - 2.163000 ms
Pattern_SegmentedReduction_Hierarchical_int16 - 11.801000 ms
Pattern_SegmentedReduction_Hierarchical_int32 - 11.587000 ms
Pattern_SegmentedReduction_Hierarchical_int64 - 11.777000 ms
Pattern_SegmentedReduction_Hierarchical_fp32 - 11.588000 ms
Relative perf in group ScalarProduct (6): cannot calculate
Benchmark This PR baseline Relative perf Change -
ScalarProduct_NDRange_int32 - 3.734000 ms
ScalarProduct_NDRange_int64 - 5.456000 ms
ScalarProduct_NDRange_fp32 - 3.767000 ms
ScalarProduct_Hierarchical_int32 - 10.555000 ms
ScalarProduct_Hierarchical_int64 - 11.508000 ms
ScalarProduct_Hierarchical_fp32 - 10.174000 ms
Relative perf in group USM (7): cannot calculate
Benchmark This PR baseline Relative perf Change -
USM_Allocation_latency_fp32_device - 0.068000 ms
USM_Allocation_latency_fp32_host - 37.633000 ms
USM_Allocation_latency_fp32_shared - 0.057000 ms
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch - 1.717000 ms
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch - 1.085000 ms
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch - 1.889000 ms
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch - 1.256000 ms
Relative perf in group VectorAddition (3): cannot calculate
Benchmark This PR baseline Relative perf Change -
VectorAddition_int32 - 1.510000 ms
VectorAddition_int64 - 3.066000 ms
VectorAddition_fp32 - 1.460000 ms
Relative perf in group Polybench (3): cannot calculate
Benchmark This PR baseline Relative perf Change -
Polybench_2mm - 1.221000 ms
Polybench_3mm - 1.730000 ms
Polybench_Atax - 6.855000 ms
Relative perf in group Kmeans (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
Kmeans_fp32 - 16.091000 ms
Relative perf in group LinearRegressionCoeff (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
LinearRegressionCoeff_fp32 - 908.423000 ms
Relative perf in group MolecularDynamics (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
MolecularDynamics - 0.030000 ms
Relative perf in group llama.cpp (6): cannot calculate
Benchmark This PR baseline Relative perf Change -
llama.cpp Prompt Processing Batched 128 - 830.457525 token/s
llama.cpp Text Generation Batched 128 - 62.530663 token/s
llama.cpp Prompt Processing Batched 256 - 872.219855 token/s
llama.cpp Text Generation Batched 256 - 62.524658 token/s
llama.cpp Prompt Processing Batched 512 - 426.427709 token/s
llama.cpp Text Generation Batched 512 - 62.477744 token/s
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (4): cannot calculate
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/0/4096/iterations:200000/threads:4 glibc - 2475.310000 ns
alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider - 2120.000000 ns
alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool<os_provider> - 3068.370000 ns
alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool<os_provider> - 283.309000 ns
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (4): cannot calculate
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/0/4096/iterations:200000/threads:1 glibc - 706.837000 ns
alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider - 197.281000 ns
alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool<os_provider> - 268.948000 ns
alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool<os_provider> - 213.433000 ns
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (4): cannot calculate
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc - 1259.770000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider - 1854.120000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool<os_provider> - 3771.150000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool<os_provider> - 253.839000 ns
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (4): cannot calculate
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc - 726.627000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider - 195.246000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool<os_provider> - 308.264000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool<os_provider> - 206.713000 ns
Relative perf in group alloc/min (4): cannot calculate
Benchmark This PR baseline Relative perf Change -
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc - 803.081000 ns
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc - 177.090000 ns
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool<os_provider> - 978.697000 ns
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool<os_provider> - 975.381000 ns
Relative perf in group multiple (12): cannot calculate
Benchmark This PR baseline Relative perf Change -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc - 33503.600000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc - 4251.600000 ns
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc - 141113.000000 ns
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc - 30214.100000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool<os_provider> - 1170470.000000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool<os_provider> - 165011.000000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider - 1151930.000000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider - 145356.000000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool<os_provider> - 42332.700000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool<os_provider> - 15330.800000 ns
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool<os_provider> - 75942.600000 ns
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool<os_provider> - 25425.600000 ns

Details

Benchmark details - environment, command, output...
api_overhead_benchmark_l0 SubmitKernel out of order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=l0 Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),11.435,11.388,6.12%,10.820,212.273,[CPU],[us]

api_overhead_benchmark_sycl SubmitKernel out of order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),23.523,23.390,3.81%,22.519,252.495,[CPU],[us]

api_overhead_benchmark_sycl SubmitKernel in order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),24.580,24.452,3.58%,23.492,245.074,[CPU],[us]

api_overhead_benchmark_ur SubmitKernel out of order CPU count

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),104583.873,104523.000,6.04%,104415.000,2099442.000,[CPU],hw instructions [count]
SubmitKernel(api=ur Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),16.069,15.847,282.52%,15.040,14371.492,[CPU],time [us]

api_overhead_benchmark_ur SubmitKernel out of order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),104583.874,104523.000,6.04%,104415.000,2099442.000,[CPU],hw instructions [count]
SubmitKernel(api=ur Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),16.098,15.927,284.81%,15.236,14514.255,[CPU],time [us]

api_overhead_benchmark_ur SubmitKernel in order CPU count

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),110050.519,109983.000,4.01%,109983.000,1499807.000,[CPU],hw instructions [count]
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),16.625,16.398,298.47%,15.648,15707.647,[CPU],time [us]

api_overhead_benchmark_ur SubmitKernel in order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),110050.519,109983.000,4.01%,109983.000,1499807.000,[CPU],hw instructions [count]
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),16.625,16.398,298.47%,15.648,15707.647,[CPU],time [us]

api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=1),123141.185,123156.000,3.78%,122528.000,1587858.000,[CPU],hw instructions [count]
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=1),21.221,21.037,211.06%,19.404,14181.116,[CPU],time [us]

api_overhead_benchmark_ur SubmitKernel in order with measure completion

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=1),123132.624,123156.000,3.78%,122528.000,1585974.000,[CPU],hw instructions [count]
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=1),21.162,20.953,237.63%,19.300,15920.068,[CPU],time [us]

Compute Benchmarks level_zero run (with params: --filter "Velocity|memory_benchmark_sycl|SubmitKernel"):
https://github.com/oneapi-src/unified-runtime/actions/runs/12866751038

Compute Benchmarks level_zero run (--filter "Velocity|memory_benchmark_sycl|SubmitKernel"):
https://github.com/oneapi-src/unified-runtime/actions/runs/12866751038
Job status: success. Test status: success.

Summary

Total 22 benchmarks in mean.
Geomean 103.257%.
Improved 7 Regressed 2 (threshold 2.00%)

(result is better)
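
The per-group "Relative perf" figures and the improved/regressed counts can be reproduced from the table values. The sketch below is a reconstruction inferred from the numbers in this comment, not the benchmark script's actual code. It assumes that relative perf is baseline / this PR for time-like units (lower is better) and this PR / baseline for throughput units (higher is better), that the group figure is the geometric mean of those ratios, and that the 2.00% threshold is applied to the ratio. The names `relative_perf`, `geomean` and `classify` are hypothetical.

```python
import math

# (this_pr, baseline, higher_is_better) copied from the "memory" group table.
memory_group = [
    (134.451, 219.808, False),  # QueueInOrderMemcpy from Host to Device, μs
    (5.615, 5.865, False),      # QueueMemcpy from Device to Device, μs
    (3.176, 3.043, True),       # StreamMemory Triad, GB/s
    (254.378, 254.865, False),  # QueueInOrderMemcpy from Device to Device, μs
]


def relative_perf(this_pr, baseline, higher_is_better):
    """Return a ratio where values above 1.0 mean this PR is better."""
    if higher_is_better:
        return this_pr / baseline
    return baseline / this_pr


def geomean(values):
    """Geometric mean, computed via the mean of the logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def classify(ratio, threshold=0.02):
    """Label a ratio relative to the assumed 2.00% threshold."""
    if ratio > 1 + threshold:
        return "improved"
    if ratio < 1 - threshold:
        return "regressed"
    return "unchanged"


ratios = [relative_perf(*row) for row in memory_group]
labels = [classify(r) for r in ratios]
group_geomean = geomean(ratios)

for ratio, label in zip(ratios, labels):
    print(f"{ratio * 100:.2f}% {label}")
print(f"group geomean: {group_geomean * 100:.3f}%")
```

Under these assumptions the four ratios round to the percentages shown in the memory table, and the group geomean lands close to the reported 115.598%.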

Performance change in benchmark groups

Relative perf in group api (12): 99.953%
Benchmark | This PR | baseline | Relative perf | Change | -
--- | --- | --- | --- | --- | ---
api_overhead_benchmark_ur SubmitKernel in order | 16.505000 μs | 16.859 μs | 102.14% | 2.14% | .
api_overhead_benchmark_ur SubmitKernel in order with measure completion | 21.064000 μs | 21.425 μs | 101.71% | 1.71% | .
api_overhead_benchmark_l0 SubmitKernel out of order | 11.733000 μs | 11.848 μs | 100.98% | 0.98% | .
api_overhead_benchmark_ur SubmitKernel out of order CPU count | 104523.000000 instr | 105463.000 instr | 100.90% | 0.90% | .
api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count | 123029.000000 instr | 123991.000 instr | 100.78% | 0.78% | .
api_overhead_benchmark_ur SubmitKernel in order CPU count | 109983.000000 instr | 110815.000 instr | 100.76% | 0.76% | .
api_overhead_benchmark_sycl SubmitKernel in order | 24.777000 μs | 24.891 μs | 100.46% | 0.46% | .
api_overhead_benchmark_ur SubmitKernel out of order | 15.656 μs | 15.623000 μs | 99.79% | -0.21% | .
api_overhead_benchmark_sycl SubmitKernel out of order | 25.658 μs | 23.710000 μs | 92.41% | -7.59% | -
api_overhead_benchmark_l0 SubmitKernel in order | - | 11.745000 μs | | |
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 | - | 2.143000 μs | | |
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 | - | 1.702000 μs | | |
Relative perf in group memory (4): 115.598%
Benchmark | This PR | baseline | Relative perf | Change | -
--- | --- | --- | --- | --- | ---
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 | 134.451000 μs | 219.808 μs | 163.49% | 63.49% | ++++++++++
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 | 5.615000 μs | 5.865 μs | 104.45% | 4.45% | +
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 | 3.176000 GB/s | 3.043 GB/s | 104.37% | 4.37% | +
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 | 254.378000 μs | 254.865 μs | 100.19% | 0.19% | .
Relative perf in group Velocity-Bench (9): 101.449%
Benchmark | This PR | baseline | Relative perf | Change | -
--- | --- | --- | --- | --- | ---
Velocity-Bench Easywave | 235.000000 ms | 289.000 ms | 122.98% | 22.98% | ++++
Velocity-Bench svm | 0.135800 s | 0.140 s | 103.17% | 3.17% | .
Velocity-Bench Sobel Filter | 608.972000 ms | 621.173 ms | 102.00% | 2.00% | .
Velocity-Bench dl-cifar | 23.819900 s | 23.972 s | 100.64% | 0.64% | .
Velocity-Bench CudaSift | 203.112000 ms | 204.342 ms | 100.61% | 0.61% | .
Velocity-Bench Hashtable | 356.246919 M keys/sec | 356.084 M keys/sec | 100.05% | 0.05% | .
Velocity-Bench Bitcracker | 35.105800 s | 35.119 s | 100.04% | 0.04% | .
Velocity-Bench QuickSilver | 117.360 MMS/CTT | 117.450000 MMS/CTT | 99.92% | -0.08% | .
Velocity-Bench dl-mnist | 2.740 s | 2.380000 s | 86.86% | -13.14% | --
Relative perf in group miscellaneous (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
miscellaneous_benchmark_sycl VectorSum - 861.253000 bw GB/s
Relative perf in group multithread (10): cannot calculate
Benchmark This PR baseline Relative perf Change -
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1 - 6931.139000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1 - 17007.721000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1 - 47383.460000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1 - 2073.904000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1 - 7868.958000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1 - 9035.852000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1 - 27237.512000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1 - 1194.467000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events - 42860.412000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events - 113343.613000 μs
Relative perf in group Runtime (8): cannot calculate
Benchmark This PR baseline Relative perf Change -
Runtime_IndependentDAGTaskThroughput_SingleTask - 253.100000 ms
Runtime_IndependentDAGTaskThroughput_BasicParallelFor - 273.484000 ms
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor - 271.662000 ms
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor - 272.505000 ms
Runtime_DAGTaskThroughput_SingleTask - 1691.410000 ms
Runtime_DAGTaskThroughput_BasicParallelFor - 1756.502000 ms
Runtime_DAGTaskThroughput_HierarchicalParallelFor - 1721.262000 ms
Runtime_DAGTaskThroughput_NDRangeParallelFor - 1694.375000 ms
Relative perf in group MicroBench (14): cannot calculate
Benchmark This PR baseline Relative perf Change -
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous - 5.188000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous - 4.967000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous - 4.769000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous - 4.866000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous - 618.226000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous - 618.268000 ms
MicroBench_HostDeviceBandwidth_1D_H2D_Strided - 4.919000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Strided - 5.115000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Strided - 5.140000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Strided - 5.113000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Strided - 617.772000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Strided - 617.628000 ms
MicroBench_LocalMem_int32_4096 - 29.834000 ms
MicroBench_LocalMem_fp32_4096 - 29.857000 ms
Relative perf in group Pattern (10): cannot calculate
Benchmark This PR baseline Relative perf Change -
Pattern_Reduction_NDRange_int32 - 16.971000 ms
Pattern_Reduction_Hierarchical_int32 - 17.024000 ms
Pattern_SegmentedReduction_NDRange_int16 - 2.263000 ms
Pattern_SegmentedReduction_NDRange_int32 - 2.164000 ms
Pattern_SegmentedReduction_NDRange_int64 - 2.333000 ms
Pattern_SegmentedReduction_NDRange_fp32 - 2.163000 ms
Pattern_SegmentedReduction_Hierarchical_int16 - 11.801000 ms
Pattern_SegmentedReduction_Hierarchical_int32 - 11.587000 ms
Pattern_SegmentedReduction_Hierarchical_int64 - 11.777000 ms
Pattern_SegmentedReduction_Hierarchical_fp32 - 11.588000 ms
Relative perf in group ScalarProduct (6): cannot calculate
Benchmark This PR baseline Relative perf Change -
ScalarProduct_NDRange_int32 - 3.734000 ms
ScalarProduct_NDRange_int64 - 5.456000 ms
ScalarProduct_NDRange_fp32 - 3.767000 ms
ScalarProduct_Hierarchical_int32 - 10.555000 ms
ScalarProduct_Hierarchical_int64 - 11.508000 ms
ScalarProduct_Hierarchical_fp32 - 10.174000 ms
Relative perf in group USM (7): cannot calculate
Benchmark This PR baseline Relative perf Change -
USM_Allocation_latency_fp32_device - 0.068000 ms
USM_Allocation_latency_fp32_host - 37.633000 ms
USM_Allocation_latency_fp32_shared - 0.057000 ms
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch - 1.717000 ms
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch - 1.085000 ms
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch - 1.889000 ms
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch - 1.256000 ms
Relative perf in group VectorAddition (3): cannot calculate
Benchmark This PR baseline Relative perf Change -
VectorAddition_int32 - 1.510000 ms
VectorAddition_int64 - 3.066000 ms
VectorAddition_fp32 - 1.460000 ms
Relative perf in group Polybench (3): cannot calculate
Benchmark This PR baseline Relative perf Change -
Polybench_2mm - 1.221000 ms
Polybench_3mm - 1.730000 ms
Polybench_Atax - 6.855000 ms
Relative perf in group Kmeans (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
Kmeans_fp32 - 16.091000 ms
Relative perf in group LinearRegressionCoeff (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
LinearRegressionCoeff_fp32 - 908.423000 ms
Relative perf in group MolecularDynamics (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
MolecularDynamics - 0.030000 ms
Relative perf in group llama.cpp (6): cannot calculate
Benchmark This PR baseline Relative perf Change -
llama.cpp Prompt Processing Batched 128 - 830.457525 token/s
llama.cpp Text Generation Batched 128 - 62.530663 token/s
llama.cpp Prompt Processing Batched 256 - 872.219855 token/s
llama.cpp Text Generation Batched 256 - 62.524658 token/s
llama.cpp Prompt Processing Batched 512 - 426.427709 token/s
llama.cpp Text Generation Batched 512 - 62.477744 token/s
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (4): cannot calculate
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/0/4096/iterations:200000/threads:4 glibc - 2475.310000 ns
alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider - 2120.000000 ns
alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool<os_provider> - 3068.370000 ns
alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool<os_provider> - 283.309000 ns
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (4): cannot calculate
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/0/4096/iterations:200000/threads:1 glibc - 706.837000 ns
alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider - 197.281000 ns
alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool<os_provider> - 268.948000 ns
alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool<os_provider> - 213.433000 ns
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (4): cannot calculate
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc - 1259.770000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider - 1854.120000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool<os_provider> - 3771.150000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool<os_provider> - 253.839000 ns
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (4): cannot calculate
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc - 726.627000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider - 195.246000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool<os_provider> - 308.264000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool<os_provider> - 206.713000 ns
Relative perf in group alloc/min (4): cannot calculate
Benchmark This PR baseline Relative perf Change -
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc - 803.081000 ns
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc - 177.090000 ns
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool<os_provider> - 978.697000 ns
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool<os_provider> - 975.381000 ns
Relative perf in group multiple (12): cannot calculate
Benchmark This PR baseline Relative perf Change -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc - 33503.600000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc - 4251.600000 ns
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc - 141113.000000 ns
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc - 30214.100000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool<os_provider> - 1170470.000000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool<os_provider> - 165011.000000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider - 1151930.000000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider - 145356.000000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool<os_provider> - 42332.700000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool<os_provider> - 15330.800000 ns
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool<os_provider> - 75942.600000 ns
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool<os_provider> - 25425.600000 ns

Details

Benchmark details - environment, command, output...
api_overhead_benchmark_l0 SubmitKernel out of order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=l0 Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),11.783,11.733,2.97%,11.110,83.040,[CPU],[us]
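
The CSV row above can be split into named fields to see which statistic feeds the summary table. The sketch below is a hypothetical parser, not the benchmark tooling's own code. It assumes each result row carries eight fields, with the trailing unit field having no header of its own, and `parse_result` is an illustrative name. The Median field (11.733) matches the 11.733000 μs shown for this benchmark in the summary table, which suggests, without confirming, that the summary reports the median.

```python
import csv

# Row copied verbatim from the Output section above.
ROW = (
    "SubmitKernel(api=l0 Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 "
    "KernelExecTime=1 MeasureCompletion=0),11.783,11.733,2.97%,11.110,"
    "83.040,[CPU],[us]"
)


def parse_result(row):
    """Split one result row into typed fields (assumes eight fields)."""
    fields = next(csv.reader([row]))
    name, mean, median, stddev, low, high, kind, unit = fields
    return {
        "name": name,
        "mean": float(mean),
        "median": float(median),
        # StdDev is printed as a percentage of the mean, e.g. "2.97%".
        "stddev_pct": float(stddev.rstrip("%")),
        "min": float(low),
        "max": float(high),
        "type": kind,
        "unit": unit,
    }


result = parse_result(ROW)
print(result["median"], result["unit"])
```

The same function also accepts the two-row `api_overhead_benchmark_ur` outputs, where the last field reads `hw instructions [count]` or `time [us]` instead of a bare unit.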

api_overhead_benchmark_sycl SubmitKernel out of order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),25.907,25.658,3.95%,23.797,179.374,[CPU],[us]

api_overhead_benchmark_sycl SubmitKernel in order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),24.934,24.777,2.14%,23.888,83.642,[CPU],[us]

memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Device destinationPlacement=Device size=1KB count=100),255.832,254.378,1.74%,249.176,505.539,[CPU],[us]

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Host destinationPlacement=Device size=1KB count=100),145.645,134.451,24.16%,132.947,313.311,[CPU],[us]

memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueMemcpy(api=sycl sourcePlacement=Device destinationPlacement=Device size=1KB),5.911,5.615,13.82%,5.084,63.310,[CPU],[us]

memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros --multiplier=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
StreamMemory(api=sycl type=Triad size=10KB useEvents=0 contents=Zeros memoryPlacement=Device multiplier=1),3.160,3.176,3.24%,0.339,3.407,[CPU],[GB/s]

api_overhead_benchmark_ur SubmitKernel out of order CPU count

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),104583.877,104523.000,6.04%,104415.000,2099442.000,[CPU],hw instructions [count]
SubmitKernel(api=ur Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),15.890,15.656,301.30%,15.064,15155.306,[CPU],time [us]

api_overhead_benchmark_ur SubmitKernel out of order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),104583.877,104523.000,6.04%,104415.000,2099442.000,[CPU],hw instructions [count]
SubmitKernel(api=ur Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),15.890,15.656,301.30%,15.064,15155.306,[CPU],time [us]

api_overhead_benchmark_ur SubmitKernel in order CPU count

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),110050.519,109983.000,4.01%,109983.000,1499807.000,[CPU],hw instructions [count]
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),16.635,16.425,266.56%,15.642,14038.336,[CPU],time [us]

api_overhead_benchmark_ur SubmitKernel in order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),110050.519,109983.000,4.01%,109983.000,1499807.000,[CPU],hw instructions [count]
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),16.798,16.505,431.54%,15.627,22939.313,[CPU],time [us]

api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=1),123100.819,123029.000,3.78%,122528.000,1587544.000,[CPU],hw instructions [count]
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=1),21.277,20.976,359.27%,19.335,24192.964,[CPU],time [us]

api_overhead_benchmark_ur SubmitKernel in order with measure completion

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=1),123031.776,122842.000,3.79%,122215.000,1589428.000,[CPU],hw instructions [count]
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=1),21.480,21.064,349.89%,19.849,23786.082,[CPU],time [us]

Velocity-Bench Hashtable

Environment Variables:

Command:

/home/pmdk/bench_workdir/hashtable/hashtable_sycl --no-verify

Output:

hashtable - total time for whole calculation: 0.376755 s
356.246919 million keys/second

Velocity-Bench Bitcracker

Environment Variables:

Command:

/home/pmdk/bench_workdir/bitcracker/bitcracker -f /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Output:

---------> BitCracker: BitLocker password cracking tool <---------

==================================
Retrieving Info

Reading hash file "/home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

              Attack

================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range:
npknpByH7N2m3OnLNH1X9DJxLrzIFWk
.....
dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!

time to subtract from total: 0.00392376 s
bitcracker - total time for whole calculation: 35.1058 s

Velocity-Bench CudaSift

Environment Variables:

Command:

/home/pmdk/bench_workdir/cudaSift/cudaSift

Output:

UNKN:

UNKN: ==================================================
UNKN: User input parameters:
UNKN: Trace: ../../inputData
UNKN: ==================================================
UNKN:

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1166 1261 31.659% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1234 1268 33.5053% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1059 1261 28.7537% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1218 1252 33.0709% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1261 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1212 1271 32.908% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1235 1269 33.5324% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1260 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1265 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1230 1265 33.3967% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1230 1262 33.3967% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1261 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1217 1253 33.0437% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1095 1271 29.7312% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1265 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1262 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1080 1266 29.3239% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1230 1264 33.3967% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1218 1257 33.0709% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1260 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1218 1254 33.0709% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1202 1262 32.6364% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1219 1252 33.098% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1238 1272 33.6139% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1262 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1234 1266 33.5053% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1267 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1219 1255 33.098% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1106 1254 30.0299% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1236 1269 33.5596% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1162 1263 31.5504% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1230 1265 33.3967% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1219 1250 33.098% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1220 1255 33.1252% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1227 1262 33.3152% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1224 1260 33.2338% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1214 1256 32.9623% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1231 1265 33.4238% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1219 1252 33.098% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1240 1274 33.6682% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1101 1264 29.8941% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1223 1258 33.2066% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1153 1254 31.306% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1163 1259 31.5775% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1239 1276 33.6411% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1106 1252 30.0299% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1260 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1030 1254 27.9663% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1123 1257 30.4914% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1241 1279 33.6954% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Avg workload time = 203.112 ms

Velocity-Bench Easywave

Environment Variables:

Command:

/home/pmdk/bench_workdir/easywave/easyWave_sycl -grid /home/pmdk/bench_workdir/data/easywave/examples/e2Asean.grd -source /home/pmdk/bench_workdir/data/easywave/examples/BengkuluSept2007.flt -time 120

Output:

MAIN: Starting SYCL main program
MAIN: Attempting to clean up previous eWave tsunami files
MAIN: Clean up completed
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.31294)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
MAIN: Program successfully completed

Velocity-Bench QuickSilver

Environment Variables:

QS_DEVICE=GPU

Command:

/home/pmdk/bench_workdir/QuickSilver/qs -i /home/pmdk/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Output:

Copyright (c) 2016
Lawrence Livermore National Security, LLC
All Rights Reserved
Quicksilver Version :
Quicksilver Git Hash :
MPI Version : 3.0
Number of MPI ranks : 1
Number of OpenMP Threads: 1
Number of OpenMP CPUs : 1

Loading params
Finished loading params
Simulation:
dt: 1e-08
fMax: 0.1
inputFile: /home/pmdk/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
energySpectrum:
boundaryCondition: octant
loadBalance: 1
cycleTimers: 0
debugThreads: 0
lx: 100
ly: 100
lz: 100
nParticles: 10000000
batchSize: 0
nBatches: 10
nSteps: 10
nx: 10
ny: 10
nz: 10
seed: 1029384756
xDom: 0
yDom: 0
zDom: 0
eMax: 20
eMin: 1e-09
nGroups: 230
lowWeightCutoff: 0.001
bTally: 1
fTally: 1
cTally: 1
coralBenchmark: 0
crossSectionsOut:

Geometry:
material: sourceMaterial
shape: brick
xMax: 100
xMin: 0
yMax: 100
yMin: 0
zMax: 100
zMin: 0

Material:
name: sourceMaterial
mass: 1000
nIsotopes: 10
nReactions: 9
sourceRate: 1e+10
totalCrossSection: 0.1
absorptionCrossSection: flat
fissionCrossSection: flat
scatteringCrossSection: flat
absorptionCrossSectionRatio: 0
fissionCrossSectionRatio: 0
scatteringCrossSectionRatio: 1

CrossSection:
name: flat
A: 0
B: 0
C: 0
D: 0
E: 1
nuBar: 2.4
setting GPU
setting parameters
Building partition 0
Building partition 1
Building partition 2
Building partition 3
Building MC_Domain 0
Building MC_Domain 1
Building MC_Domain 2
Building MC_Domain 3
Starting Consistency Check
Finished Consistency Check
Finished initMesh
Started copyMaterialDatabase_device
Finished copyMaterialDatabase_device
Finished copyNuclearData_device
Finished copyDomainDevice
cycle start source rr split absorb scatter fission produce collisn escape census num_seg scalar_flux cycleInit cycleTracking cycleFinalize
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 4.348360e-01 6.249330e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 3.390690e-01 7.631610e-01 0.000000e+00
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 3.378300e-01 7.792570e-01 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 3.701710e-01 8.456640e-01 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 3.350090e-01 7.885810e-01 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 3.355830e-01 7.634300e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 3.345930e-01 7.624400e-01 0.000000e+00
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 3.346920e-01 7.831200e-01 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 3.338210e-01 7.899690e-01 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 3.348170e-01 7.756750e-01 0.000000e+00

Timer Cumulative Cumulative Cumulative Cumulative Cumulative Cumulative
Name number microSecs microSecs microSecs microSecs Efficiency
of calls min avg max stddev Rating
main 1 1.117e+07 1.117e+07 1.117e+07 0.000e+00 100.00
cycleInit 10 3.490e+06 3.490e+06 3.490e+06 0.000e+00 100.00
cycleTracking 10 7.676e+06 7.676e+06 7.676e+06 0.000e+00 100.00
cycleTracking_Kernel 104 4.914e+06 4.914e+06 4.914e+06 0.000e+00 100.00
cycleTracking_MPI 117 2.079e+05 2.079e+05 2.079e+05 0.000e+00 100.00
cycleTracking_Test_Done 0 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.00
cycleFinalize 20 4.100e+02 4.100e+02 4.100e+02 0.000e+00 100.00
Figure Of Merit 117.36 [Num Mega Segments / Cycle Tracking Time]

Velocity-Bench Sobel Filter

Environment Variables:

OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Command:

/home/pmdk/bench_workdir/sobel_filter/sobel_filter -i /home/pmdk/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Output:

SYMN: Welcome to the SYCL version of Sobel filter workload.
SYMN: Input image file: /home/pmdk/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png
SYMN: Launching SYCL kernel with # of iterations: 5
time to subtract from total: 7.53547 s
sobelfilter - total time for whole calculation: 0.608972 s

Velocity-Bench dl-cifar

Environment Variables:

Command:

/home/pmdk/bench_workdir/dl-cifar/dl-cifar_sycl

Output:

	Welcome to DL-CIFAR workload: SYCL version.

=======================================================================
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.31294)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.31294)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero

WL PARAMS:

WL PARAMS: ==================================================
WL PARAMS: User input parameters:
WL PARAMS: Trace: notrace
WL PARAMS: DL NW size type: WORKLOAD_DEFAULT_SIZE
WL PARAMS: ==================================================
WL PARAMS:

dataFileReadTimer->getTotalOpTime(): 8.5e-05 s
dl-cifar - total time for whole calculation: 23.8199 s

Velocity-Bench dl-mnist

Environment Variables:

NEOReadDebugKeys=1
DisableScratchPages=0

Command:

/home/pmdk/bench_workdir/dl-mnist/dl-mnist-sycl -conv_algo ONEDNN_AUTO

Output:

	Welcome to DL-MNIST workload: SYCL version.

=======================================================================
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.31294)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.31294)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero

WL PARAMS:

WL PARAMS: ==================================================
WL PARAMS: User input parameters:
WL PARAMS: Trace: notrace
WL PARAMS: Tensor management policy: per_layer
WL PARAMS: Convolution algorithm: ONEDNN_AUTO
WL PARAMS: Dataset reader format: NCHW
WL PARAMS: Dry run: YES
WL PARAMS: OneDNN Conv PD memory format: ONEDNN_CONVPD_ANY
WL PARAMS: No of iterations for inference: 400
WL PARAMS: ==================================================
WL PARAMS:

dl-mnist - total time for whole calculation: 2.74 s

Velocity-Bench svm

Environment Variables:

Command:

/home/pmdk/bench_workdir/svm/svm_sycl /home/pmdk/bench_workdir/velocity-bench-repo/svm/SYCL/a9a /home/pmdk/bench_workdir/velocity-bench-repo/svm/SYCL/a.m

Output:

Number of args 3
Using cuSVM (Carpenter)...

Buffering input text file (6989624 B).
Load Done
Starting Training
_C 1.000000
Workgroup Size: 1024
nbrCtas 80
elemsPerCta 1248
threadsPerCta 128
Total run time: 0.064632 seconds
Iter:100
M:97683
N:123
Train done. Calulate Vector counts
Training done

Loading elapsed time : 0.0636 s
Processing elapsed time : 0.0699 s
Storing elapsed time : 0.0023 s
Total elapsed time : 0.1358 s
Result's are correct: 0.0551

Compute Benchmarks level_zero run (with params: ):
https://github.com/oneapi-src/unified-runtime/actions/runs/12870157551

Compute Benchmarks level_zero run ():
https://github.com/oneapi-src/unified-runtime/actions/runs/12870157551
Job status: success. Test status: success.

Summary

Total 137 benchmarks in mean.
Geomean 100.146%.
Improved 15 Regressed 12 (threshold 2.00%)

(result is better)

Performance change in benchmark groups

Relative perf in group api (12): 100.818%
Benchmark This PR baseline Relative perf Change -
api_overhead_benchmark_ur SubmitKernel in order 16.402000 μs 16.925 μs 103.19% 3.19% +
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 2.129000 μs 2.193 μs 103.01% 3.01% +
api_overhead_benchmark_sycl SubmitKernel out of order 23.394000 μs 23.959 μs 102.42% 2.42% +
api_overhead_benchmark_l0 SubmitKernel in order 11.556000 μs 11.680 μs 101.07% 1.07% .
api_overhead_benchmark_sycl SubmitKernel in order 24.633000 μs 24.896 μs 101.07% 1.07% .
api_overhead_benchmark_ur SubmitKernel out of order 15.687000 μs 15.814 μs 100.81% 0.81% .
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 1.701000 μs 1.708 μs 100.41% 0.41% .
api_overhead_benchmark_ur SubmitKernel out of order CPU count 105443.000000 instr 105463.000 instr 100.02% 0.02% .
api_overhead_benchmark_ur SubmitKernel in order CPU count 110795.000000 instr 110815.000 instr 100.02% 0.02% .
api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count 124285.000 instr 123991.000000 instr 99.76% -0.24% .
api_overhead_benchmark_ur SubmitKernel in order with measure completion 21.687 μs 21.536000 μs 99.30% -0.70% .
api_overhead_benchmark_l0 SubmitKernel out of order 11.968 μs 11.830000 μs 98.85% -1.15% .
Relative perf in group memory (4): 99.375%
| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 | 253.995000 μs | 258.713 μs | 101.86% | 1.86% | . |
| memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 | 3.078000 GB/s | 3.040 GB/s | 101.25% | 1.25% | . |
| memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 | 225.911 μs | 219.696000 μs | 97.25% | -2.75% | - |
| memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 | 5.970 μs | 5.805000 μs | 97.24% | -2.76% | - |
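
The memory group figure can be reproduced from its four rows. The following is a minimal Python sketch, not the benchmark scripts' own code. It assumes that relative performance is baseline / this PR for time units (lower is better) and this PR / baseline for throughput units (higher is better), and that the group value is the geometric mean of the per-row ratios. The helper names are hypothetical; the 2% threshold is the one stated in the Summary.

```python
import math

def relative_perf(this_pr, baseline, lower_is_better):
    """Ratio > 1 means this PR is better (assumed convention, see lead-in)."""
    return baseline / this_pr if lower_is_better else this_pr / baseline

def geomean(values):
    """Geometric mean computed through logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def classify(rel, threshold=0.02):
    """Label a ratio against the 2.00% threshold from the Summary."""
    if rel > 1 + threshold:
        return "improved"
    if rel < 1 - threshold:
        return "regressed"
    return "unchanged"

# (this PR, baseline, lower_is_better) copied from the memory group rows.
memory_group = [
    (253.995, 258.713, True),   # QueueInOrderMemcpy Device to Device, μs
    (3.078, 3.040, False),      # StreamMemory Triad, GB/s
    (225.911, 219.696, True),   # QueueInOrderMemcpy Host to Device, μs
    (5.970, 5.805, True),       # QueueMemcpy Device to Device, μs
]

rels = [relative_perf(*row) for row in memory_group]
labels = [classify(r) for r in rels]
group_perf = geomean(rels) * 100

for rel, label in zip(rels, labels):
    print(f"{rel * 100:.2f}% {label}")
print(f"group: {group_perf:.3f}%")
```

Under these assumptions the per-row ratios and the group value agree with the reported numbers to within rounding, and the two rows below the threshold correspond to the rows marked `-`.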
Relative perf in group miscellaneous (1): 99.932%
Benchmark This PR baseline Relative perf Change -
miscellaneous_benchmark_sycl VectorSum 860.664 bw GB/s 860.076000 bw GB/s 99.93% -0.07% .
Relative perf in group multithread (10): 99.322%
Benchmark This PR baseline Relative perf Change -
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events 110692.272000 μs 111451.459 μs 100.69% 0.69% .
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1 2034.738000 μs 2042.798 μs 100.40% 0.40% .
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1 26733.793000 μs 26812.066 μs 100.29% 0.29% .
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1 6941.165000 μs 6959.764 μs 100.27% 0.27% .
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1 8887.982 μs 8876.618000 μs 99.87% -0.13% .
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1 47448.720 μs 46954.509000 μs 98.96% -1.04% .
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1 7879.238 μs 7786.575000 μs 98.82% -1.18% .
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1 1208.126 μs 1193.812000 μs 98.82% -1.18% .
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events 43105.091 μs 42531.905000 μs 98.67% -1.33% .
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1 17601.132 μs 16985.886000 μs 96.50% -3.50% -
Relative perf in group graph (10): 100.175%
Benchmark This PR baseline Relative perf Change -
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:100 56587.543000 μs 57360.560 μs 101.37% 1.37% .
graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:0, numKernels:10 5588.057000 μs 5639.481 μs 100.92% 0.92% .
graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:10 72589.318000 μs 72707.625 μs 100.16% 0.16% .
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:10 71756.219000 μs 71873.055 μs 100.16% 0.16% .
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:10 5598.098000 μs 5600.217 μs 100.04% 0.04% .
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100 353396.532000 μs 353438.463 μs 100.01% 0.01% .
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:10 62.392000 μs 62.395 μs 100.00% 0.00% .
graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100 353207.353 μs 353207.153000 μs 100.00% -0.00% .
graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:1, numKernels:10 54.117 μs 53.918000 μs 99.63% -0.37% .
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:100 679.483 μs 675.864000 μs 99.47% -0.53% .
Relative perf in group Velocity-Bench (9): 102.320%
| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| Velocity-Bench Easywave | 235.000000 ms | 291.000 ms | 123.83% | 23.83% | ++++++++++ |
| Velocity-Bench dl-cifar | 23.582200 s | 23.759 s | 100.75% | 0.75% | . |
| Velocity-Bench svm | 0.140900 s | 0.142 s | 100.50% | 0.50% | . |
| Velocity-Bench CudaSift | 204.304000 ms | 204.802 ms | 100.24% | 0.24% | . |
| Velocity-Bench QuickSilver | 117.510 MMS/CTT | 117.680000 MMS/CTT | 99.86% | -0.14% | . |
| Velocity-Bench Hashtable | 353.075 M keys/sec | 353.997735 M keys/sec | 99.74% | -0.26% | . |
| Velocity-Bench Bitcracker | 35.640 s | 35.494200 s | 99.59% | -0.41% | . |
| Velocity-Bench dl-mnist | 2.390 s | 2.380000 s | 99.58% | -0.42% | . |
| Velocity-Bench Sobel Filter | 621.619 ms | 615.543000 ms | 99.02% | -0.98% | . |
Relative perf in group Runtime (8): 100.326%
Benchmark This PR baseline Relative perf Change -
Runtime_DAGTaskThroughput_HierarchicalParallelFor 1704.355000 ms 1721.794 ms 101.02% 1.02% .
Runtime_DAGTaskThroughput_BasicParallelFor 1741.403000 ms 1754.165 ms 100.73% 0.73% .
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor 271.586000 ms 273.534 ms 100.72% 0.72% .
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor 272.753000 ms 274.696 ms 100.71% 0.71% .
Runtime_DAGTaskThroughput_SingleTask 1672.866000 ms 1684.141 ms 100.67% 0.67% .
Runtime_DAGTaskThroughput_NDRangeParallelFor 1677.413000 ms 1684.655 ms 100.43% 0.43% .
Runtime_IndependentDAGTaskThroughput_SingleTask 255.285000 ms 255.725 ms 100.17% 0.17% .
Runtime_IndependentDAGTaskThroughput_BasicParallelFor 281.035 ms 275.902000 ms 98.17% -1.83% .
Relative perf in group MicroBench (14): 99.928%
Benchmark This PR baseline Relative perf Change -
MicroBench_HostDeviceBandwidth_1D_H2D_Strided 4.754000 ms 4.876 ms 102.57% 2.57% +
MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous 4.676000 ms 4.702 ms 100.56% 0.56% .
MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous 4.792000 ms 4.814 ms 100.46% 0.46% .
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous 4.899000 ms 4.911 ms 100.24% 0.24% .
MicroBench_LocalMem_int32_4096 29.845000 ms 29.859 ms 100.05% 0.05% .
MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous 618.153000 ms 618.182 ms 100.00% 0.00% .
MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous 618.165000 ms 618.175 ms 100.00% 0.00% .
MicroBench_HostDeviceBandwidth_3D_D2H_Strided 617.551 ms 617.548000 ms 100.00% -0.00% .
MicroBench_HostDeviceBandwidth_2D_D2H_Strided 617.549 ms 617.536000 ms 100.00% -0.00% .
MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous 4.796 ms 4.794000 ms 99.96% -0.04% .
MicroBench_LocalMem_fp32_4096 29.894 ms 29.835000 ms 99.80% -0.20% .
MicroBench_HostDeviceBandwidth_3D_H2D_Strided 5.055 ms 5.026000 ms 99.43% -0.57% .
MicroBench_HostDeviceBandwidth_2D_H2D_Strided 5.195 ms 5.125000 ms 98.65% -1.35% .
MicroBench_HostDeviceBandwidth_1D_D2H_Strided 4.984 ms 4.852000 ms 97.35% -2.65% -
Relative perf in group Pattern (10): 99.859%
Benchmark This PR baseline Relative perf Change -
Pattern_Reduction_Hierarchical_int32 16.846000 ms 16.851 ms 100.03% 0.03% .
Pattern_SegmentedReduction_Hierarchical_int64 11.773000 ms 11.774 ms 100.01% 0.01% .
Pattern_SegmentedReduction_NDRange_int16 2.265000 ms 2.265 ms 100.00% 0.00% .
Pattern_SegmentedReduction_Hierarchical_fp32 11.592 ms 11.590000 ms 99.98% -0.02% .
Pattern_SegmentedReduction_Hierarchical_int16 11.809 ms 11.804000 ms 99.96% -0.04% .
Pattern_SegmentedReduction_Hierarchical_int32 11.591 ms 11.585000 ms 99.95% -0.05% .
Pattern_SegmentedReduction_NDRange_int64 2.339 ms 2.337000 ms 99.91% -0.09% .
Pattern_SegmentedReduction_NDRange_int32 2.172 ms 2.170000 ms 99.91% -0.09% .
Pattern_SegmentedReduction_NDRange_fp32 2.168 ms 2.164000 ms 99.82% -0.18% .
Pattern_Reduction_NDRange_int32 16.767 ms 16.604000 ms 99.03% -0.97% .
Relative perf in group ScalarProduct (6): 100.128%
Benchmark This PR baseline Relative perf Change -
ScalarProduct_NDRange_int32 3.759000 ms 3.777 ms 100.48% 0.48% .
ScalarProduct_NDRange_int64 5.476000 ms 5.499 ms 100.42% 0.42% .
ScalarProduct_Hierarchical_int64 11.514000 ms 11.530 ms 100.14% 0.14% .
ScalarProduct_NDRange_fp32 3.756000 ms 3.757 ms 100.03% 0.03% .
ScalarProduct_Hierarchical_int32 10.546 ms 10.537000 ms 99.91% -0.09% .
ScalarProduct_Hierarchical_fp32 10.166 ms 10.145000 ms 99.79% -0.21% .
Relative perf in group USM (7): 97.661%
| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch | 1.871000 ms | 1.883 ms | 100.64% | 0.64% | . |
| USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch | 1.091000 ms | 1.095 ms | 100.37% | 0.37% | . |
| USM_Allocation_latency_fp32_shared | 0.057000 ms | 0.057 ms | 100.00% | 0.00% | . |
| USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch | 1.252 ms | 1.249000 ms | 99.76% | -0.24% | . |
| USM_Allocation_latency_fp32_host | 37.607 ms | 37.401000 ms | 99.45% | -0.55% | . |
| USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch | 1.718 ms | 1.703000 ms | 99.13% | -0.87% | . |
| USM_Allocation_latency_fp32_device | 0.068 ms | 0.058000 ms | 85.29% | -14.71% | ------ |
Relative perf in group VectorAddition (3): 99.674%
| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| VectorAddition_int32 | 1.474000 ms | 1.476 ms | 100.14% | 0.14% | . |
| VectorAddition_int64 | 3.071000 ms | 3.072 ms | 100.03% | 0.03% | . |
| VectorAddition_fp32 | 1.490 ms | 1.473000 ms | 98.86% | -1.14% | . |
Relative perf in group Polybench (3): 99.250%
| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| Polybench_2mm | 1.211000 ms | 1.215 ms | 100.33% | 0.33% | . |
| Polybench_3mm | 1.733 ms | 1.729000 ms | 99.77% | -0.23% | . |
| Polybench_Atax | 6.872 ms | 6.712000 ms | 97.67% | -2.33% | - |
Relative perf in group Kmeans (1): 99.876%
Benchmark This PR baseline Relative perf Change -
Kmeans_fp32 16.111 ms 16.091000 ms 99.88% -0.12% .
Relative perf in group LinearRegressionCoeff (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
LinearRegressionCoeff_fp32 946.725000 ms -
Relative perf in group MolecularDynamics (1): 103.333%
Benchmark This PR baseline Relative perf Change -
MolecularDynamics 0.030000 ms 0.031 ms 103.33% 3.33% +
Relative perf in group llama.cpp (6): 100.149%
| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| llama.cpp Prompt Processing Batched 512 | 432.355821 token/s | 428.269 token/s | 100.95% | 0.95% | . |
| llama.cpp Prompt Processing Batched 128 | 822.752124 token/s | 820.039 token/s | 100.33% | 0.33% | . |
| llama.cpp Text Generation Batched 512 | 62.578197 token/s | 62.521 token/s | 100.09% | 0.09% | . |
| llama.cpp Text Generation Batched 256 | 62.551 token/s | 62.555854 token/s | 99.99% | -0.01% | . |
| llama.cpp Text Generation Batched 128 | 62.500 token/s | 62.541124 token/s | 99.93% | -0.07% | . |
| llama.cpp Prompt Processing Batched 256 | 865.855 token/s | 869.339926 token/s | 99.60% | -0.40% | . |
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (4): 100.540%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider 2101.150000 ns 2178.240 ns 103.67% 3.67% ++
alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool<os_provider> 302.779000 ns 303.268 ns 100.16% 0.16% .
alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool<os_provider> 3081.310 ns 3080.300000 ns 99.97% -0.03% .
alloc/size:10000/0/4096/iterations:200000/threads:4 glibc 2747.770 ns 2704.770000 ns 98.44% -1.56% .
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (4): 100.095%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/0/4096/iterations:200000/threads:1 glibc 699.049000 ns 722.081 ns 103.29% 3.29% +
alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool<os_provider> 274.456 ns 273.931000 ns 99.81% -0.19% .
alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool<os_provider> 215.631 ns 213.298000 ns 98.92% -1.08% .
alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider 193.666 ns 190.623000 ns 98.43% -1.57% .
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (4): 99.893%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider 1878.420000 ns 1927.360 ns 102.61% 2.61% +
alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool<os_provider> 266.920 ns 266.688000 ns 99.91% -0.09% .
alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc 1447.100 ns 1440.300000 ns 99.53% -0.47% .
alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool<os_provider> 3435.180 ns 3352.290000 ns 97.59% -2.41% -
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (4): 103.902%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc 747.634000 ns 836.799 ns 111.93% 11.93% +++++
alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider 190.321000 ns 199.070 ns 104.60% 4.60% ++
alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool<os_provider> 205.886000 ns 206.587 ns 100.34% 0.34% .
alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool<os_provider> 308.951 ns 306.518000 ns 99.21% -0.79% .
Relative perf in group alloc/min (4): 100.425%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool<os_provider> | 1035.670000 ns | 1068.480 ns | 103.17% | 3.17% | + |
| alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc | 174.121000 ns | 176.254 ns | 101.23% | 1.23% | . |
| alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool<os_provider> | 972.724 ns | 963.990000 ns | 99.10% | -0.90% | . |
| alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc | 834.343 ns | 819.953000 ns | 98.28% | -1.72% | . |

Relative perf in group multiple (12): 99.406%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc | 30334.100000 ns | 31773.300 ns | 104.74% | 4.74% | ++ |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool<os_provider> | 14974.000000 ns | 15455.100 ns | 103.21% | 3.21% | + |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc | 4141.490000 ns | 4273.390 ns | 103.18% | 3.18% | + |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool<os_provider> | 41802.500000 ns | 42492.700 ns | 101.65% | 1.65% | . |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider | 1215620.000 ns | 1213060.000000 ns | 99.79% | -0.21% | . |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool<os_provider> | 166132.000 ns | 164285.000000 ns | 98.89% | -1.11% | . |
| multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc | 141286.000 ns | 139112.000000 ns | 98.46% | -1.54% | . |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc | 32428.800 ns | 31736.100000 ns | 97.86% | -2.14% | - |
| multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool<os_provider> | 26177.100 ns | 25433.600000 ns | 97.16% | -2.84% | - |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool<os_provider> | 1206500.000 ns | 1168340.000000 ns | 96.84% | -3.16% | - |
| multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool<os_provider> | 76653.200 ns | 73568.200000 ns | 95.98% | -4.02% | -- |
| multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider | 145134.000 ns | 138786.000000 ns | 95.63% | -4.37% | -- |
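
The "Relative perf" figures above appear to be baseline divided by this PR for lower-is-better timings, with each group figure being the geometric mean of its rows. The report does not show the script that produces them, so the sketch below is an assumption that happens to reproduce the numbers for the `alloc/size:10000/100000/4096/iterations:200000/threads:4` group; the function names are illustrative only.

```python
import math

# (this_pr_ns, baseline_ns) pairs copied from the
# alloc/size:10000/100000/4096/iterations:200000/threads:4 group.
rows = {
    "os_provider": (1878.420, 1927.360),
    "scalable_pool<os_provider>": (266.920, 266.688),
    "glibc": (1447.100, 1440.300),
    "proxy_pool<os_provider>": (3435.180, 3352.290),
}


def relative_perf(this_pr, baseline):
    """Assumed formula for a lower-is-better metric: above 1.0 means the PR is faster."""
    return baseline / this_pr


def geomean(values):
    """Geometric mean computed via the mean of logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


per_row = {name: relative_perf(pr, base) for name, (pr, base) in rows.items()}
group = geomean(per_row.values())

for name, ratio in per_row.items():
    print(f"{name}: {ratio:.2%}")
print(f"group: {group:.3%}")
```

Under this assumption the per-row ratios match the "Relative perf" column and the group value agrees with the reported 99.893% to within rounding. For higher-is-better metrics such as GB/s the ratio would presumably be inverted.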

Details

Benchmark details - environment, command, output...
api_overhead_benchmark_l0 SubmitKernel out of order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=l0 Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),12.012,11.968,2.82%,11.233,69.468,[CPU],[us]
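
For post-processing these logs, note that each result row carries eight comma-separated fields while the header names only seven: the trailing field holds the unit (`[us]`, `[GB/s]`, `hw instructions [count]`). A minimal parsing sketch follows; the `BenchResult` class and `parse_row` helper are hypothetical names and not part of compute-benchmarks.

```python
import csv
from dataclasses import dataclass


@dataclass
class BenchResult:
    test_case: str
    mean: float
    median: float
    stddev_pct: float  # StdDev column with the trailing '%' removed
    minimum: float
    maximum: float
    kind: str  # e.g. "[CPU]" or "[GPU]"
    unit: str  # e.g. "[us]"; this field has no name in the header


def parse_row(line: str) -> BenchResult:
    """Parse one result row as printed with --csv --noHeaders."""
    fields = next(csv.reader([line]))
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    name, mean, median, stddev, lo, hi, kind, unit = fields
    return BenchResult(
        name,
        float(mean),
        float(median),
        float(stddev.rstrip("%")),
        float(lo),
        float(hi),
        kind,
        unit,
    )


row = (
    "SubmitKernel(api=l0 Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 "
    "KernelExecTime=1 MeasureCompletion=0),12.012,11.968,2.82%,11.233,69.468,[CPU],[us]"
)
result = parse_row(row)
print(result.median, result.unit)
```

The sketch relies on the test-case name containing spaces but no commas, which holds for every row in this log.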

api_overhead_benchmark_l0 SubmitKernel in order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=l0 Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),11.591,11.556,6.40%,10.537,221.052,[CPU],[us]

api_overhead_benchmark_sycl SubmitKernel out of order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),23.528,23.394,3.75%,22.417,243.740,[CPU],[us]

api_overhead_benchmark_sycl SubmitKernel in order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),24.784,24.633,3.85%,23.791,276.889,[CPU],[us]

memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Device destinationPlacement=Device size=1KB count=100),255.709,253.995,1.85%,251.328,550.369,[CPU],[us]

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Host destinationPlacement=Device size=1KB count=100),252.272,225.911,26.23%,222.585,524.580,[CPU],[us]

memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueMemcpy(api=sycl sourcePlacement=Device destinationPlacement=Device size=1KB),6.263,5.970,15.88%,5.602,57.263,[CPU],[us]

memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros --multiplier=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
StreamMemory(api=sycl type=Triad size=10KB useEvents=0 contents=Zeros memoryPlacement=Device multiplier=1),3.015,3.078,6.62%,0.410,3.359,[CPU],[GB/s]

api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Device dst=Device size=1KB ioq=0),2.136,2.129,7.32%,1.938,34.415,[CPU],[us]

api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Host dst=Host size=1KB ioq=1),1.713,1.701,11.34%,1.600,57.139,[CPU],[us]

miscellaneous_benchmark_sycl VectorSum

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
VectorSum(api=sycl numberOfElementsX=512 numberOfElementsY=256 numberOfElementsZ=256),860.165,860.664,0.43%,812.850,875.942,[GPU],bw [GB/s]

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=1 --DstUSM=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
MemcpyExecute(api=ur Ioq=1 NumOpsPerThread=400 NumThreads=1 AllocSize=102400 MeasureCompletion=1 UseEvents=1 UseQueuePerThread=1 SrcUSM=1 DstUSM=1),6968.084,6941.165,1.03%,6913.055,7150.846,[CPU],[us]

multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=8 --NumOpsPerThread=100 --iterations=10 --SrcUSM=1 --DstUSM=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
MemcpyExecute(api=ur Ioq=1 NumOpsPerThread=100 NumThreads=8 AllocSize=102400 MeasureCompletion=1 UseEvents=1 UseQueuePerThread=1 SrcUSM=1 DstUSM=1),17434.809,17601.132,3.62%,15852.152,18061.541,[CPU],[us]

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=8 --NumOpsPerThread=400 --iterations=1000 --SrcUSM=1 --DstUSM=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
MemcpyExecute(api=ur Ioq=1 NumOpsPerThread=400 NumThreads=8 AllocSize=1024 MeasureCompletion=1 UseEvents=1 UseQueuePerThread=1 SrcUSM=1 DstUSM=1),47410.429,47448.720,1.80%,44013.739,50987.821,[CPU],[us]

multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=16 --NumOpsPerThread=10 --iterations=10000 --SrcUSM=1 --DstUSM=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
MemcpyExecute(api=ur Ioq=1 NumOpsPerThread=10 NumThreads=16 AllocSize=1024 MeasureCompletion=1 UseEvents=1 UseQueuePerThread=1 SrcUSM=1 DstUSM=1),2092.685,2034.738,28.20%,1535.141,16885.418,[CPU],[us]

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=0 --DstUSM=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
MemcpyExecute(api=ur Ioq=1 NumOpsPerThread=400 NumThreads=1 AllocSize=102400 MeasureCompletion=1 UseEvents=1 UseQueuePerThread=1 SrcUSM=0 DstUSM=1),7895.899,7879.238,1.31%,7744.120,8111.320,[CPU],[us]

multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=8 --NumOpsPerThread=100 --iterations=10 --SrcUSM=0 --DstUSM=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
MemcpyExecute(api=ur Ioq=1 NumOpsPerThread=100 NumThreads=8 AllocSize=102400 MeasureCompletion=1 UseEvents=1 UseQueuePerThread=1 SrcUSM=0 DstUSM=1),9042.895,8887.982,3.72%,8742.078,9917.728,[CPU],[us]

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=8 --NumOpsPerThread=400 --iterations=1000 --SrcUSM=0 --DstUSM=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
MemcpyExecute(api=ur Ioq=1 NumOpsPerThread=400 NumThreads=8 AllocSize=1024 MeasureCompletion=1 UseEvents=1 UseQueuePerThread=1 SrcUSM=0 DstUSM=1),26866.108,26733.793,1.90%,25479.431,28637.016,[CPU],[us]

multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=16 --NumOpsPerThread=10 --iterations=10000 --SrcUSM=0 --DstUSM=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
MemcpyExecute(api=ur Ioq=1 NumOpsPerThread=10 NumThreads=16 AllocSize=1024 MeasureCompletion=1 UseEvents=1 UseQueuePerThread=1 SrcUSM=0 DstUSM=1),1299.045,1208.126,45.20%,900.795,15315.649,[CPU],[us]

multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=0 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=1 --NumOpsPerThread=4096 --iterations=10 --SrcUSM=0 --DstUSM=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
MemcpyExecute(api=ur Ioq=1 NumOpsPerThread=4096 NumThreads=1 AllocSize=1024 MeasureCompletion=1 UseEvents=0 UseQueuePerThread=1 SrcUSM=0 DstUSM=1),43097.096,43105.091,0.29%,42901.875,43287.280,[CPU],[us]

multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=0 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=4 --NumOpsPerThread=4096 --iterations=10 --SrcUSM=0 --DstUSM=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
MemcpyExecute(api=ur Ioq=1 NumOpsPerThread=4096 NumThreads=4 AllocSize=1024 MeasureCompletion=1 UseEvents=0 UseQueuePerThread=1 SrcUSM=0 DstUSM=1),110660.144,110692.272,0.28%,110007.456,111104.259,[CPU],[us]

graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=10 --withGraphs=0

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SinKernelGraph(api=sycl numKernels=10 withGraphs=0),71769.121,71756.219,0.05%,71741.040,71947.059,[CPU],[us]

graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=10 --withGraphs=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SinKernelGraph(api=sycl numKernels=10 withGraphs=1),72592.110,72589.318,0.01%,72575.564,72621.579,[CPU],[us]

graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=100 --withGraphs=0

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SinKernelGraph(api=sycl numKernels=100 withGraphs=0),353464.191,353396.532,0.08%,353238.544,354535.400,[CPU],[us]

graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=100 --withGraphs=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SinKernelGraph(api=sycl numKernels=100 withGraphs=1),353199.314,353207.353,0.02%,353051.667,353363.242,[CPU],[us]

graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:1, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=1 --ioq=0 --numKernels=10

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitExecGraph(api=sycl measureSubmit=1 numKernels=10 ioq=0),55.934,54.117,9.84%,52.238,89.722,[CPU],[us]

graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=1 --ioq=1 --numKernels=10

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitExecGraph(api=sycl measureSubmit=1 numKernels=10 ioq=1),63.955,62.392,9.68%,59.764,109.538,[CPU],[us]

graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:100

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=1 --ioq=1 --numKernels=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitExecGraph(api=sycl measureSubmit=1 numKernels=100 ioq=1),681.045,679.483,2.51%,656.979,734.479,[CPU],[us]

graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:0, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=0 --ioq=0 --numKernels=10

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitExecGraph(api=sycl measureSubmit=0 numKernels=10 ioq=0),5601.653,5588.057,0.98%,5520.960,5859.589,[CPU],[us]

graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=0 --ioq=1 --numKernels=10

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitExecGraph(api=sycl measureSubmit=0 numKernels=10 ioq=1),5601.889,5598.098,0.70%,5531.067,5707.358,[CPU],[us]

graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:100

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=0 --ioq=1 --numKernels=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitExecGraph(api=sycl measureSubmit=0 numKernels=100 ioq=1),56783.839,56587.543,0.81%,56238.511,58288.181,[CPU],[us]

api_overhead_benchmark_ur SubmitKernel out of order CPU count

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),105503.688,105443.000,6.01%,105335.000,2106286.000,[CPU],hw instructions [count]
SubmitKernel(api=ur Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),16.015,15.777,303.57%,15.131,15388.617,[CPU],time [us]

api_overhead_benchmark_ur SubmitKernel out of order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),105503.688,105443.000,6.01%,105335.000,2106286.000,[CPU],hw instructions [count]
SubmitKernel(api=ur Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),15.956,15.687,360.76%,15.072,18218.340,[CPU],time [us]

api_overhead_benchmark_ur SubmitKernel in order CPU count

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),110862.264,110795.000,3.98%,110795.000,1500602.000,[CPU],hw instructions [count]
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),16.614,16.402,267.75%,15.595,14082.042,[CPU],time [us]

api_overhead_benchmark_ur SubmitKernel in order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),110862.264,110795.000,3.98%,110795.000,1500602.000,[CPU],hw instructions [count]
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),16.614,16.402,267.75%,15.595,14082.042,[CPU],time [us]

api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=1),124511.348,124285.000,3.81%,123657.000,1592424.000,[CPU],hw instructions [count]
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=1),21.998,21.687,212.19%,20.011,14778.469,[CPU],time [us]

api_overhead_benchmark_ur SubmitKernel in order with measure completion

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=1),124511.348,124285.000,3.81%,123657.000,1592424.000,[CPU],hw instructions [count]
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=1),21.998,21.687,212.19%,20.011,14778.469,[CPU],time [us]

Velocity-Bench Hashtable

Environment Variables:

Command:

/home/pmdk/bench_workdir/hashtable/hashtable_sycl --no-verify

Output:

hashtable - total time for whole calculation: 0.38014 s
353.074645 million keys/second

Velocity-Bench Bitcracker

Environment Variables:

Command:

/home/pmdk/bench_workdir/bitcracker/bitcracker -f /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Output:

---------> BitCracker: BitLocker password cracking tool <---------

==================================
Retrieving Info

Reading hash file "/home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

              Attack

================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range:
npknpByH7N2m3OnLNH1X9DJxLrzIFWk
.....
dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!

time to subtract from total: 0.00384029 s
bitcracker - total time for whole calculation: 35.6396 s

Velocity-Bench CudaSift

Environment Variables:

Command:

/home/pmdk/bench_workdir/cudaSift/cudaSift

Output:

UNKN:

UNKN: ==================================================
UNKN: User input parameters:
UNKN: Trace: ../../inputData
UNKN: ==================================================
UNKN:

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1225 1257 33.2609% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1121 1266 30.4371% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1267 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1221 1255 33.1523% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1223 1256 33.2066% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1036 1258 28.1292% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1220 1250 33.1252% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1112 1270 30.1928% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1265 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1068 1263 28.9981% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1259 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1099 1256 29.8398% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1112 1251 30.1928% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1209 1273 32.8265% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1262 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1069 1266 29.0253% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1223 1261 33.2066% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1237 1271 33.5868% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1225 1260 33.2609% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1091 1249 29.6226% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1040 1262 28.2379% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1243 1276 33.7497% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1222 1255 33.1795% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1222 1256 33.1795% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1208 1259 32.7993% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1239 1273 33.6411% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1152 1270 31.2788% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1090 1253 29.5954% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1118 1256 30.3557% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1098 1272 29.8127% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1227 1263 33.3152% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1103 1269 29.9484% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1224 1256 33.2338% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1230 1270 33.3967% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1270 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1163 1268 31.5775% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1234 1267 33.5053% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1104 1269 29.9756% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1068 1262 28.9981% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1100 1256 29.867% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1225 1263 33.2609% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1114 1260 30.2471% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1193 1264 32.3921% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1261 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1167 1254 31.6861% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1262 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1242 1280 33.7225% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1088 1255 29.5411% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1084 1269 29.4325% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1160 1258 31.4961% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Avg workload time = 204.304 ms

Velocity-Bench Easywave

Environment Variables:

Command:

/home/pmdk/bench_workdir/easywave/easyWave_sycl -grid /home/pmdk/bench_workdir/data/easywave/examples/e2Asean.grd -source /home/pmdk/bench_workdir/data/easywave/examples/BengkuluSept2007.flt -time 120

Output:

MAIN: Starting SYCL main program
MAIN: Attempting to clean up previous eWave tsunami files
MAIN: Clean up completed
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.0)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
MAIN: Program successfully completed

Velocity-Bench QuickSilver

Environment Variables:

QS_DEVICE=GPU

Command:

/home/pmdk/bench_workdir/QuickSilver/qs -i /home/pmdk/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Output:

Copyright (c) 2016
Lawrence Livermore National Security, LLC
All Rights Reserved
Quicksilver Version :
Quicksilver Git Hash :
MPI Version : 3.0
Number of MPI ranks : 1
Number of OpenMP Threads: 1
Number of OpenMP CPUs : 1

Loading params
Finished loading params
Simulation:
dt: 1e-08
fMax: 0.1
inputFile: /home/pmdk/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
energySpectrum:
boundaryCondition: octant
loadBalance: 1
cycleTimers: 0
debugThreads: 0
lx: 100
ly: 100
lz: 100
nParticles: 10000000
batchSize: 0
nBatches: 10
nSteps: 10
nx: 10
ny: 10
nz: 10
seed: 1029384756
xDom: 0
yDom: 0
zDom: 0
eMax: 20
eMin: 1e-09
nGroups: 230
lowWeightCutoff: 0.001
bTally: 1
fTally: 1
cTally: 1
coralBenchmark: 0
crossSectionsOut:

Geometry:
material: sourceMaterial
shape: brick
xMax: 100
xMin: 0
yMax: 100
yMin: 0
zMax: 100
zMin: 0

Material:
name: sourceMaterial
mass: 1000
nIsotopes: 10
nReactions: 9
sourceRate: 1e+10
totalCrossSection: 0.1
absorptionCrossSection: flat
fissionCrossSection: flat
scatteringCrossSection: flat
absorptionCrossSectionRatio: 0
fissionCrossSectionRatio: 0
scatteringCrossSectionRatio: 1

CrossSection:
name: flat
A: 0
B: 0
C: 0
D: 0
E: 1
nuBar: 2.4
setting GPU
setting parameters
Building partition 0
Building partition 1
Building partition 2
Building partition 3
Building MC_Domain 0
Building MC_Domain 1
Building MC_Domain 2
Building MC_Domain 3
Starting Consistency Check
Finished Consistency Check
Finished initMesh
Started copyMaterialDatabase_device
Finished copyMaterialDatabase_device
Finished copyNuclearData_device
Finished copyDomainDevice
cycle start source rr split absorb scatter fission produce collisn escape census num_seg scalar_flux cycleInit cycleTracking cycleFinalize
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 3.770570e-01 6.237680e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 3.395570e-01 7.631070e-01 0.000000e+00
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 3.363820e-01 7.788010e-01 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 3.697570e-01 8.329790e-01 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 3.344320e-01 7.885440e-01 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 3.334600e-01 7.641250e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 3.340610e-01 7.632370e-01 0.000000e+00
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 3.333700e-01 7.844630e-01 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 3.326560e-01 7.909960e-01 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 3.329760e-01 7.761500e-01 0.000000e+00

Timer                       Cumulative   Cumulative   Cumulative   Cumulative   Cumulative   Cumulative
Name                            number    microSecs    microSecs    microSecs    microSecs   Efficiency
                              of calls          min          avg          max       stddev       Rating
main                                 1    1.109e+07    1.109e+07    1.109e+07    0.000e+00       100.00
cycleInit                           10    3.424e+06    3.424e+06    3.424e+06    0.000e+00       100.00
cycleTracking                       10    7.666e+06    7.666e+06    7.666e+06    0.000e+00       100.00
cycleTracking_Kernel               104    4.920e+06    4.920e+06    4.920e+06    0.000e+00       100.00
cycleTracking_MPI                  117    1.940e+05    1.940e+05    1.940e+05    0.000e+00       100.00
cycleTracking_Test_Done              0    0.000e+00    0.000e+00    0.000e+00    0.000e+00         0.00
cycleFinalize                       20    3.980e+02    3.980e+02    3.980e+02    0.000e+00       100.00
Figure Of Merit 117.51 [Num Mega Segments / Cycle Tracking Time]
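
The Figure Of Merit above can be roughly cross-checked against the per-cycle table. The sketch below assumes (the log does not confirm this) that "Num Mega Segments" is the sum of the num_seg column over all ten cycles divided by 1e6, and that "Cycle Tracking Time" is the cumulative cycleTracking timer converted to seconds. The timer is printed with only four significant digits, so the result is expected to agree only approximately with the reported 117.51.

```python
# num_seg column copied from the ten cycle rows of the QuickSilver output.
num_seg = [
    55527935, 94633679, 95010375, 94953591, 94599337,
    94148236, 93689264, 93216931, 92768273, 92324678,
]

# Cumulative cycleTracking time from the timer table, in microseconds.
# Only four significant digits are printed in the log.
cycle_tracking_us = 7.666e6

# Assumption: FOM = (total segments / 1e6) / (cycle tracking time in seconds).
mega_segments = sum(num_seg) / 1e6
fom = mega_segments / (cycle_tracking_us / 1e6)

print(f"total segments: {sum(num_seg)}")
print(f"figure of merit (approx.): {fom:.2f}")
```

The small difference from the logged value is consistent with rounding of the printed timer, not necessarily with a different formula.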

Velocity-Bench Sobel Filter

Environment Variables:

OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Command:

/home/pmdk/bench_workdir/sobel_filter/sobel_filter -i /home/pmdk/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Output:

SYMN: Welcome to the SYCL version of Sobel filter workload.
SYMN: Input image file: /home/pmdk/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png
SYMN: Launching SYCL kernel with # of iterations: 5
time to subtract from total: 7.54239 s
sobelfilter - total time for whole calculation: 0.621619 s

Velocity-Bench dl-cifar

Environment Variables:

Command:

/home/pmdk/bench_workdir/dl-cifar/dl-cifar_sycl

Output:

	Welcome to DL-CIFAR workload: SYCL version.

=======================================================================
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.0)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.0)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero

WL PARAMS:

WL PARAMS: ==================================================
WL PARAMS: User input parameters:
WL PARAMS: Trace: notrace
WL PARAMS: DL NW size type: WORKLOAD_DEFAULT_SIZE
WL PARAMS: ==================================================
WL PARAMS:

dataFileReadTimer->getTotalOpTime(): 8.4e-05 s
dl-cifar - total time for whole calculation: 23.5822 s

Velocity-Bench dl-mnist

Environment Variables:

NEOReadDebugKeys=1
DisableScratchPages=0

Command:

/home/pmdk/bench_workdir/dl-mnist/dl-mnist-sycl -conv_algo ONEDNN_AUTO

Output:

	Welcome to DL-MNIST workload: SYCL version.

=======================================================================
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.0)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.0)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero

WL PARAMS:

WL PARAMS: ==================================================
WL PARAMS: User input parameters:
WL PARAMS: Trace: notrace
WL PARAMS: Tensor management policy: per_layer
WL PARAMS: Convolution algorithm: ONEDNN_AUTO
WL PARAMS: Dataset reader format: NCHW
WL PARAMS: Dry run: YES
WL PARAMS: OneDNN Conv PD memory format: ONEDNN_CONVPD_ANY
WL PARAMS: No of iterations for inference: 400
WL PARAMS: ==================================================
WL PARAMS:

dl-mnist - total time for whole calculation: 2.39 s

Velocity-Bench svm

Environment Variables:

Command:

/home/pmdk/bench_workdir/svm/svm_sycl /home/pmdk/bench_workdir/velocity-bench-repo/svm/SYCL/a9a /home/pmdk/bench_workdir/velocity-bench-repo/svm/SYCL/a.m

Output:

Number of args 3
Using cuSVM (Carpenter)...

Buffering input text file (6989624 B).
Load Done
Starting Training
_C 1.000000
Workgroup Size: 1024
nbrCtas 80
elemsPerCta 1248
threadsPerCta 128
Total run time: 0.068670 seconds
Iter:100
M:97683
N:123
Train done. Calulate Vector counts
Training done

Loading elapsed time : 0.0647 s
Processing elapsed time : 0.0739 s
Storing elapsed time : 0.0023 s
Total elapsed time : 0.1409 s
Result's are correct: 0.0551

Runtime_IndependentDAGTaskThroughput_SingleTask

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_SingleTask', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.257646', '0.255285', '0.253539', '0.253539 0.255285 0.264114', '0.005669', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
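
Each sycl-bench result row in this log is printed as a Python-style list of strings, so it can be parsed with `ast.literal_eval`. The sketch below parses the row above and recomputes the summary statistics from the three raw samples. The column meanings used here (index 11 = mean, 12 = median, 13 = minimum, 14 = raw samples, 15 = sample standard deviation) are inferred from the numbers themselves and are an assumption, not something the log states.

```python
import ast
import statistics

# Result row copied verbatim from the log output above.
row_text = (
    "['Runtime_IndependentDAGTaskThroughput_SingleTask', 'PASS', "
    "'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', "
    "'256', '32768', '0.257646', '0.255285', '0.253539', "
    "'0.253539 0.255285 0.264114', '0.005669', "
    "'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', "
    "'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']"
)

row = ast.literal_eval(row_text)  # list of 31 strings

# Assumed column layout (inferred, not documented in the log).
samples = [float(s) for s in row[14].split()]

mean = statistics.mean(samples)
median = statistics.median(samples)
fastest = min(samples)
stdev = statistics.stdev(samples)  # sample standard deviation (n - 1)

print(row[0], row[1])
print(f"mean={mean:.6f} median={median:.6f} min={fastest:.6f} stdev={stdev:.6f}")
```

For this row the recomputed values agree with the logged fields to the printed precision, which supports the assumed layout for this row only.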

Runtime_IndependentDAGTaskThroughput_BasicParallelFor

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_BasicParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.281219', '0.281035', '0.271973', '0.271973 0.281035 0.290650', '0.009340', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.272665', '0.271586', '0.270977', '0.270977 0.271586 0.275432', '0.002416', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.282471', '0.272753', '0.269816', '0.269816 0.272753 0.304842', '0.019430', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_SingleTask

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_SingleTask', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.674034', '1.672866', '1.671368', '1.671368 1.672866 1.677868', '0.003404', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_BasicParallelFor

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_BasicParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.741795', '1.741403', '1.740537', '1.740537 1.741403 1.743446', '0.001494', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_HierarchicalParallelFor

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_HierarchicalParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.707536', '1.704355', '1.703413', '1.703413 1.704355 1.714839', '0.006342', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_NDRangeParallelFor

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_NDRangeParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.677884', '1.677413', '1.675330', '1.675330 1.677413 1.680910', '0.002820', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005847', '0.004899', '0.004690', '0.004690 0.004899 0.007950', '0.001825', '26.650298', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004820', '0.004796', '0.004660', '0.004660 0.004796 0.005003', '0.000173', '26.823798', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004722', '0.004676', '0.004670', '0.004670 0.004676 0.004820', '0.000085', '26.766177', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004786', '0.004792', '0.004762', '0.004762 0.004792 0.004805', '0.000022', '26.248896', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.618154', '0.618153', '0.618137', '0.618137 0.618153 0.618170', '0.000017', '0.202221', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.618160', '0.618165', '0.618142', '0.618142 0.618165 0.618174', '0.000016', '0.202219', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_1D_H2D_Strided

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004744', '0.004754', '0.004444', '0.004444 0.004754 0.005034', '0.000295', '28.127889', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_2D_H2D_Strided

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005170', '0.005195', '0.005049', '0.005049 0.005195 0.005265', '0.000111', '24.758658', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_3D_H2D_Strided

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005037', '0.005055', '0.004946', '0.004946 0.005055 0.005110', '0.000084', '25.275232', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_1D_D2H_Strided

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005042', '0.004984', '0.004946', '0.004946 0.004984 0.005195', '0.000134', '25.271655', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_2D_D2H_Strided

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.617552', '0.617549', '0.617547', '0.617547 0.617549 0.617560', '0.000007', '0.202414', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_3D_D2H_Strided

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.617551', '0.617551', '0.617546', '0.617546 0.617551 0.617556', '0.000005', '0.202414', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_LocalMem_int32_4096

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/LocalMem_multi.csv --size=10240000

Output:

['MicroBench_LocalMem_int32_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.029843', '0.029845', '0.029833', '0.029833 0.029845 0.029850', '0.000009', '10458.273149', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '312.000000']

MicroBench_LocalMem_fp32_4096

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/LocalMem_multi.csv --size=10240000

Output:

['MicroBench_LocalMem_fp32_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.029885', '0.029894', '0.029846', '0.029846 0.029894 0.029915', '0.000036', '10453.832708', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '312.000000']

Pattern_Reduction_NDRange_int32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Pattern_Reduction_multi.csv --size=10240000

Output:

['Pattern_Reduction_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.016724', '0.016767', '0.016468', '0.016468 0.016767 0.016938', '0.000238', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_Reduction_Hierarchical_int32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Pattern_Reduction_multi.csv --size=10240000

Output:

['Pattern_Reduction_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.016899', '0.016846', '0.016749', '0.016749 0.016846 0.017102', '0.000182', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_int32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.003759', '0.003759', '0.003757', '0.003757 0.003759 0.003761', '0.000002', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_int64

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.005473', '0.005476', '0.005449', '0.005449 0.005476 0.005496', '0.000024', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_fp32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.003778', '0.003756', '0.003747', '0.003747 0.003756 0.003830', '0.000045', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_int32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.010530', '0.010546', '0.010491', '0.010491 0.010546 0.010552', '0.000033', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_int64

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.011514', '0.011514', '0.011478', '0.011478 0.011514 0.011548', '0.000035', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_fp32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.010163', '0.010166', '0.010128', '0.010128 0.010166 0.010195', '0.000034', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int16

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_int16', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.002273', '0.002265', '0.002263', '0.002263 0.002265 0.002291', '0.000015', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.002169', '0.002172', '0.002165', '0.002165 0.002172 0.002172', '0.000004', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int64

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.002341', '0.002339', '0.002337', '0.002337 0.002339 0.002346', '0.000005', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_fp32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.002169', '0.002168', '0.002161', '0.002161 0.002168 0.002179', '0.000009', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int16

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_int16', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.011808', '0.011809', '0.011804', '0.011804 0.011809 0.011811', '0.000004', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.011598', '0.011591', '0.011586', '0.011586 0.011591 0.011619', '0.000018', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int64

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.011776', '0.011773', '0.011761', '0.011761 0.011773 0.011794', '0.000017', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_fp32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.011594', '0.011592', '0.011589', '0.011589 0.011592 0.011602', '0.000007', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_device

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

Output:

['USM_Allocation_latency_fp32_device', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.000062', '0.000068', '0.000049', '0.000049 0.000068 0.000070', '0.000012', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_host

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

Output:

['USM_Allocation_latency_fp32_host', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.037602', '0.037607', '0.037580', '0.037580 0.037607 0.037618', '0.000019', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_shared

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

Output:

['USM_Allocation_latency_fp32_shared', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.000059', '0.000057', '0.000055', '0.000055 0.000057 0.000063', '0.000004', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.002255', '0.001718', '0.001704', '0.001704 0.001718 0.003342', '0.000941', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']
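
The row above is the one result in this group with a visible outlier: two samples are near 0.0017 and one is 0.003342, which pulls the mean well above the median. A minimal Python sketch for re-deriving the summary statistics from the raw samples follows. The column positions (mean, median, min, samples, stddev at indices 11 to 15) are an assumption inferred from the values, since the log prints no header, and the stddev is assumed to be the sample (n-1) standard deviation.

```python
import ast
import statistics

# Row copied from the log output above (a Python-style list of strings).
raw = (
    "['USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch', 'PASS', "
    "'Intel(R) Data Center GPU Max 1100', "
    "'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', "
    "'0.002255', '0.001718', '0.001704', '0.001704 0.001718 0.003342', '0.000941', "
    "'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', "
    "'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']"
)
row = ast.literal_eval(raw)

# Assumed column layout (inferred from the data, not from a documented header).
MEAN, MEDIAN, MIN, SAMPLES, STDDEV = 11, 12, 13, 14, 15

samples = [float(s) for s in row[SAMPLES].split()]

reported = {
    "mean": float(row[MEAN]),
    "median": float(row[MEDIAN]),
    "min": float(row[MIN]),
    "stddev": float(row[STDDEV]),
}
recomputed = {
    "mean": statistics.mean(samples),
    "median": statistics.median(samples),
    "min": min(samples),
    "stddev": statistics.stdev(samples),  # sample standard deviation (n-1)
}

for key in reported:
    print(f"{key:>6}: reported={reported[key]:.6f} recomputed={recomputed[key]:.6f}")
```

With only three runs per benchmark, the median is probably the more stable figure to compare across runs when one sample is this far off.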

USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001103', '0.001091', '0.001087', '0.001087 0.001091 0.001132', '0.000025', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001896', '0.001871', '0.001865', '0.001865 0.001871 0.001951', '0.000048', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001257', '0.001252', '0.001251', '0.001251 0.001252 0.001269', '0.000010', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

VectorAddition_int32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/VectorAddition_multi.csv --size=102400000

Output:

['VectorAddition_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.001482', '0.001474', '0.001449', '0.001449 0.001474 0.001521', '0.000036', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

VectorAddition_int64

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/VectorAddition_multi.csv --size=102400000

Output:

['VectorAddition_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.003074', '0.003071', '0.003070', '0.003070 0.003071 0.003080', '0.000006', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

VectorAddition_fp32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/VectorAddition_multi.csv --size=102400000

Output:

['VectorAddition_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.001472', '0.001490', '0.001436', '0.001436 0.001490 0.001491', '0.000031', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_2mm

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/2mm --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/2mm.csv --size=512

Output:

['Polybench_2mm', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.001214', '0.001211', '0.001206', '0.001206 0.001211 0.001226', '0.000010', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_3mm

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/3mm --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/3mm.csv --size=512

Output:

['Polybench_3mm', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.001735', '0.001733', '0.001722', '0.001722 0.001733 0.001748', '0.000013', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Atax

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/atax --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Atax.csv --size=8192

Output:

['Polybench_Atax', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.006875', '0.006872', '0.006859', '0.006859 0.006872 0.006892', '0.000017', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Kmeans_fp32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/kmeans --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/Kmeans.csv --size=700000000

Output:

['Kmeans_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '700000000', '0.016103', '0.016111', '0.016083', '0.016083 0.016111 0.016116', '0.000018', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

LinearRegressionCoeff_fp32

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/lin_reg_coeff --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/LinearRegressionCoeff.csv --size=1638400000

Output:

['LinearRegressionCoeff_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1638400000', '0.949226', '0.946725', '0.945995', '0.945995 0.946725 0.954959', '0.004978', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MolecularDynamics

Environment Variables:

Command:

/home/pmdk/bench_workdir/sycl-bench-build/mol_dyn --warmup-run --num-runs=3 --output=/home/pmdk/bench_workdir/MolecularDynamics.csv --size=8196

Output:

['MolecularDynamics', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8196', '0.000038', '0.000030', '0.000026', '0.000026 0.000030 0.000057', '0.000016', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

llama.cpp Prompt Processing Batched 128

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

Output:

build_commit,build_number,cuda,vulkan,kompute,metal,sycl,rpc,gpu_blas,blas,cpu_info,gpu_info,model_filename,model_type,model_size,model_n_params,n_batch,n_ubatch,n_threads,cpu_mask,cpu_strict,poll,type_k,type_v,n_gpu_layers,split_mode,main_gpu,no_kv_offload,flash_attn,tensor_split,use_mmap,embeddings,n_prompt,n_gen,test_time,avg_ns,stddev_ns,avg_ts,stddev_ts
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","128","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T15:03:37Z","622303813","1299155","822.752124","1.717802"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","128","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T15:03:43Z","2048663037","2986076","62.479878","0.090892"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","256","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T15:03:54Z","596520718","2379024","858.321389","3.410786"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","256","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T15:03:57Z","2049720141","2416722","62.447619","0.073491"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","512","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T15:04:08Z","1204354130","6127642","425.132887","2.151750"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","512","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T15:04:15Z","2051853700","2265839","62.382676","0.068861"

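The llama-bench CSV above can be cross-checked by deriving throughput from `avg_ns` and comparing it with the reported `avg_ts`. The sketch below uses Python's standard `csv` module on the header and the first two rows of that output. It assumes `avg_ts` is tokens per second and that the token count for a row is `n_prompt + n_gen`; small differences are expected if `avg_ts` is averaged per run rather than computed from the mean time.

```python
import csv
import io

# Header and first two data rows copied from the llama-bench output above.
LOG = '''build_commit,build_number,cuda,vulkan,kompute,metal,sycl,rpc,gpu_blas,blas,cpu_info,gpu_info,model_filename,model_type,model_size,model_n_params,n_batch,n_ubatch,n_threads,cpu_mask,cpu_strict,poll,type_k,type_v,n_gpu_layers,split_mode,main_gpu,no_kv_offload,flash_attn,tensor_split,use_mmap,embeddings,n_prompt,n_gen,test_time,avg_ns,stddev_ns,avg_ts,stddev_ts
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","128","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T15:03:37Z","622303813","1299155","822.752124","1.717802"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","128","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T15:03:43Z","2048663037","2986076","62.479878","0.090892"
'''

results = []
for rec in csv.DictReader(io.StringIO(LOG)):
    n_prompt = int(rec["n_prompt"])
    n_gen = int(rec["n_gen"])
    # Assumption: tokens processed in a row = prompt tokens + generated tokens.
    tokens = n_prompt + n_gen
    derived_ts = tokens / (int(rec["avg_ns"]) / 1e9)
    reported_ts = float(rec["avg_ts"])
    kind = "prompt processing" if n_prompt > 0 else "text generation"
    results.append((kind, int(rec["n_batch"]), derived_ts, reported_ts))

for kind, n_batch, derived_ts, reported_ts in results:
    print(f"{kind}, n_batch={n_batch}: derived={derived_ts:.3f} t/s, "
          f"reported={reported_ts:.3f} t/s")
```

Rows with `n_prompt` of 512 and `n_gen` of 0 are the prompt-processing measurements, and rows with `n_gen` of 128 are the text-generation measurements, so the same CSV serves both benchmark names.
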
llama.cpp Text Generation Batched 128

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

Output:

build_commit,build_number,cuda,vulkan,kompute,metal,sycl,rpc,gpu_blas,blas,cpu_info,gpu_info,model_filename,model_type,model_size,model_n_params,n_batch,n_ubatch,n_threads,cpu_mask,cpu_strict,poll,type_k,type_v,n_gpu_layers,split_mode,main_gpu,no_kv_offload,flash_attn,tensor_split,use_mmap,embeddings,n_prompt,n_gen,test_time,avg_ns,stddev_ns,avg_ts,stddev_ts
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","128","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T15:04:26Z","621947956","2920827","823.234469","3.850140"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","128","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T15:04:30Z","2048014579","3517272","62.499702","0.107110"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","256","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T15:04:40Z","586768102","5699282","872.641722","8.406182"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","256","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T15:04:44Z","2044662201","1836865","62.602068","0.056182"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","512","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T15:04:54Z","1184225951","4892642","432.355821","1.784400"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","512","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T15:05:02Z","2045441863","1628977","62.578197","0.049786"

llama.cpp Prompt Processing Batched 256

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

Output:

build_commit,build_number,cuda,vulkan,kompute,metal,sycl,rpc,gpu_blas,blas,cpu_info,gpu_info,model_filename,model_type,model_size,model_n_params,n_batch,n_ubatch,n_threads,cpu_mask,cpu_strict,poll,type_k,type_v,n_gpu_layers,split_mode,main_gpu,no_kv_offload,flash_attn,tensor_split,use_mmap,embeddings,n_prompt,n_gen,test_time,avg_ns,stddev_ns,avg_ts,stddev_ts
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","128","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T15:02:50Z","633359859","30468306","809.860501","38.283399"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","128","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T15:02:54Z","2010781553","1512675","63.656869","0.047867"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","256","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T15:03:05Z","591351606","4620453","865.855011","6.696115"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","256","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T15:03:09Z","2046343761","1896019","62.550628","0.057892"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","512","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T15:03:19Z","1180475504","4885943","433.729461","1.792069"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","512","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T15:03:26Z","2045428964","931251","62.578571","0.028470"

llama.cpp Text Generation Batched 256

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

Output:

build_commit,build_number,cuda,vulkan,kompute,metal,sycl,rpc,gpu_blas,blas,cpu_info,gpu_info,model_filename,model_type,model_size,model_n_params,n_batch,n_ubatch,n_threads,cpu_mask,cpu_strict,poll,type_k,type_v,n_gpu_layers,split_mode,main_gpu,no_kv_offload,flash_attn,tensor_split,use_mmap,embeddings,n_prompt,n_gen,test_time,avg_ns,stddev_ns,avg_ts,stddev_ts
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","128","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T15:02:50Z","633359859","30468306","809.860501","38.283399"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","128","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T15:02:54Z","2010781553","1512675","63.656869","0.047867"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","256","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T15:03:05Z","591351606","4620453","865.855011","6.696115"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","256","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T15:03:09Z","2046343761","1896019","62.550628","0.057892"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","512","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T15:03:19Z","1180475504","4885943","433.729461","1.792069"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","512","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T15:03:26Z","2045428964","931251","62.578571","0.028470"

llama.cpp Prompt Processing Batched 512

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

Output:

build_commit,build_number,cuda,vulkan,kompute,metal,sycl,rpc,gpu_blas,blas,cpu_info,gpu_info,model_filename,model_type,model_size,model_n_params,n_batch,n_ubatch,n_threads,cpu_mask,cpu_strict,poll,type_k,type_v,n_gpu_layers,split_mode,main_gpu,no_kv_offload,flash_attn,tensor_split,use_mmap,embeddings,n_prompt,n_gen,test_time,avg_ns,stddev_ns,avg_ts,stddev_ts
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","128","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T15:04:26Z","621947956","2920827","823.234469","3.850140"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","128","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T15:04:30Z","2048014579","3517272","62.499702","0.107110"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","256","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T15:04:40Z","586768102","5699282","872.641722","8.406182"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","256","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T15:04:44Z","2044662201","1836865","62.602068","0.056182"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","512","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T15:04:54Z","1184225951","4892642","432.355821","1.784400"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","512","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T15:05:02Z","2045441863","1628977","62.578197","0.049786"

llama.cpp Text Generation Batched 512

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

Output:

build_commit,build_number,cuda,vulkan,kompute,metal,sycl,rpc,gpu_blas,blas,cpu_info,gpu_info,model_filename,model_type,model_size,model_n_params,n_batch,n_ubatch,n_threads,cpu_mask,cpu_strict,poll,type_k,type_v,n_gpu_layers,split_mode,main_gpu,no_kv_offload,flash_attn,tensor_split,use_mmap,embeddings,n_prompt,n_gen,test_time,avg_ns,stddev_ns,avg_ts,stddev_ts
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","128","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T15:04:26Z","621947956","2920827","823.234469","3.850140"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","128","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T15:04:30Z","2048014579","3517272","62.499702","0.107110"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","256","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T15:04:40Z","586768102","5699282","872.641722","8.406182"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","256","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T15:04:44Z","2044662201","1836865","62.602068","0.056182"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","512","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T15:04:54Z","1184225951","4892642","432.355821","1.784400"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","512","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T15:05:02Z","2045441863","1628977","62.578197","0.049786"

alloc/size:10000/0/4096/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2747.77,1933.64,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,698.678,698.677,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1447.1,1353.82,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,735.912,735.88,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,834.343,769.453,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,174.121,174.117,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2101.15,2100.61,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,193.666,193.66,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1985.72,1984.97,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,190.321,190.315,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2941.16,2891.7,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,267.18,267.175,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3268.18,3223.81,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,303.07,303.062,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,306.998,297.981,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,209.25,209.253,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,266.92,262.003,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,205.886,205.886,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,1035.67,999.825,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,972.724,972.712,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,32428.8,30573.3,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4212.23,4212.06,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,141388,89050.2,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,30442.6,30442.3,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.19206e+06,1.19142e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,166132,166130,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.21562e+06,1.21484e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,143494,143489,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,41802.5,41274.9,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,14974,14973.5,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,77366.2,77352.6,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,25767.2,25766.6,ns,,,,,

alloc/size:10000/0/4096/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2425.59,1834.06,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,699.049,699.052,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1235.82,1187.64,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,747.634,747.636,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,786.691,759.934,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,173.599,173.601,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,1759.39,1758.87,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,188.753,188.748,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1839.45,1838.58,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,193.229,193.223,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,3480.42,3454.72,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,283.232,283.174,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3435.18,3389.11,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,315.928,315.921,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,302.779,294.883,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,215.631,215.597,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,269.363,261.992,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,207.218,207.218,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,1084.56,1076.55,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,948.116,948.064,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,30483.7,28922.7,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4097.34,4097.21,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,139796,88907.8,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,30334.1,30333.9,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.39495e+06,1.39427e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,168558,168557,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.41112e+06,1.40808e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,152141,152138,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,41409.5,40791.1,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,14493.2,14492.7,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,76653.2,75841.2,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,26177.1,26176.6,ns,,,,,

alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2747.77,1933.64,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,698.678,698.677,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1447.1,1353.82,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,735.912,735.88,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,834.343,769.453,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,174.121,174.117,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2101.15,2100.61,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,193.666,193.66,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1985.72,1984.97,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,190.321,190.315,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2941.16,2891.7,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,267.18,267.175,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3268.18,3223.81,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,303.07,303.062,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,306.998,297.981,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,209.25,209.253,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,266.92,262.003,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,205.886,205.886,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,1035.67,999.825,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,972.724,972.712,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,32428.8,30573.3,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4212.23,4212.06,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,141388,89050.2,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,30442.6,30442.3,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.19206e+06,1.19142e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,166132,166130,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.21562e+06,1.21484e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,143494,143489,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,41802.5,41274.9,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,14974,14973.5,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,77366.2,77352.6,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,25767.2,25766.6,ns,,,,,

alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2425.59,1834.06,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,699.049,699.052,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1235.82,1187.64,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,747.634,747.636,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,786.691,759.934,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,173.599,173.601,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,1759.39,1758.87,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,188.753,188.748,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1839.45,1838.58,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,193.229,193.223,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,3480.42,3454.72,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,283.232,283.174,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3435.18,3389.11,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,315.928,315.921,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,302.779,294.883,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,215.631,215.597,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,269.363,261.992,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,207.218,207.218,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,1084.56,1076.55,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,948.116,948.064,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,30483.7,28922.7,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4097.34,4097.21,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,139796,88907.8,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,30334.1,30333.9,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.39495e+06,1.39427e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,168558,168557,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.41112e+06,1.40808e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,152141,152138,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,41409.5,40791.1,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,14493.2,14492.7,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,76653.2,75841.2,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,26177.1,26176.6,ns,,,,,

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2747.77,1933.64,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,698.678,698.677,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1447.1,1353.82,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,735.912,735.88,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,834.343,769.453,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,174.121,174.117,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2101.15,2100.61,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,193.666,193.66,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1985.72,1984.97,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,190.321,190.315,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2941.16,2891.7,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,267.18,267.175,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3268.18,3223.81,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,303.07,303.062,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,306.998,297.981,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,209.25,209.253,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,266.92,262.003,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,205.886,205.886,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,1035.67,999.825,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,972.724,972.712,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,32428.8,30573.3,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4212.23,4212.06,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,141388,89050.2,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,30442.6,30442.3,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.19206e+06,1.19142e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,166132,166130,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.21562e+06,1.21484e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,143494,143489,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,41802.5,41274.9,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,14974,14973.5,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,77366.2,77352.6,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,25767.2,25766.6,ns,,,,,

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2747.77,1933.64,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,698.678,698.677,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1447.1,1353.82,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,735.912,735.88,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,834.343,769.453,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,174.121,174.117,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2101.15,2100.61,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,193.666,193.66,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1985.72,1984.97,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,190.321,190.315,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2941.16,2891.7,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,267.18,267.175,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3268.18,3223.81,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,303.07,303.062,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,306.998,297.981,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,209.25,209.253,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,266.92,262.003,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,205.886,205.886,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,1035.67,999.825,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,972.724,972.712,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,32428.8,30573.3,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4212.23,4212.06,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,141388,89050.2,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,30442.6,30442.3,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.19206e+06,1.19142e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,166132,166130,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.21562e+06,1.21484e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,143494,143489,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,41802.5,41274.9,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,14974,14973.5,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,77366.2,77352.6,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,25767.2,25766.6,ns,,,,,

alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2747.77,1933.64,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,698.678,698.677,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1447.1,1353.82,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,735.912,735.88,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,834.343,769.453,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,174.121,174.117,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2101.15,2100.61,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,193.666,193.66,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1985.72,1984.97,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,190.321,190.315,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2941.16,2891.7,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,267.18,267.175,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3268.18,3223.81,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,303.07,303.062,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,306.998,297.981,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,209.25,209.253,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,266.92,262.003,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,205.886,205.886,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,1035.67,999.825,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,972.724,972.712,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,32428.8,30573.3,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4212.23,4212.06,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,141388,89050.2,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,30442.6,30442.3,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.19206e+06,1.19142e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,166132,166130,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.21562e+06,1.21484e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,143494,143489,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,41802.5,41274.9,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,14974,14973.5,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,77366.2,77352.6,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,25767.2,25766.6,ns,,,,,

alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2747.77,1933.64,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,698.678,698.677,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1447.1,1353.82,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,735.912,735.88,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,834.343,769.453,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,174.121,174.117,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2101.15,2100.61,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,193.666,193.66,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1985.72,1984.97,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,190.321,190.315,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2941.16,2891.7,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,267.18,267.175,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3268.18,3223.81,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,303.07,303.062,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,306.998,297.981,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,209.25,209.253,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,266.92,262.003,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,205.886,205.886,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,1035.67,999.825,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,972.724,972.712,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,32428.8,30573.3,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4212.23,4212.06,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,141388,89050.2,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,30442.6,30442.3,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.19206e+06,1.19142e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,166132,166130,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.21562e+06,1.21484e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,143494,143489,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,41802.5,41274.9,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,14974,14973.5,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,77366.2,77352.6,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,25767.2,25766.6,ns,,,,,

alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2852.83,1924.5,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,716.912,716.914,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1496.34,1351.76,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,750.102,750.066,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,841.451,772.855,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,177.158,177.154,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2104.52,2101.4,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,194.158,194.152,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1878.42,1878.39,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,188.552,188.545,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,3081.31,3052.73,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,274.456,274.333,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3740.52,3700.36,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,308.951,308.945,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,297.927,296.998,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,235.405,235.399,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,263.409,261.482,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,194.962,194.921,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,977.995,971.456,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,972.724,972.676,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,33026.3,31220,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4141.49,4141.39,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,141286,89571.5,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,29928.2,29927.9,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.2065e+06,1.20608e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,162706,162704,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.20948e+06,1.20757e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,145134,145133,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,42847.6,42031.2,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,15353.3,15352.8,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,72980.4,72960.8,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,26771.6,26771.2,ns,,,,,

alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2747.77,1933.64,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,698.678,698.677,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1447.1,1353.82,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,735.912,735.88,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,834.343,769.453,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,174.121,174.117,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2101.15,2100.61,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,193.666,193.66,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1985.72,1984.97,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,190.321,190.315,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2941.16,2891.7,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,267.18,267.175,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3268.18,3223.81,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,303.07,303.062,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,306.998,297.981,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,209.25,209.253,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,266.92,262.003,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,205.886,205.886,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,1035.67,999.825,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,972.724,972.712,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,32428.8,30573.3,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4212.23,4212.06,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,141388,89050.2,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,30442.6,30442.3,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.19206e+06,1.19142e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,166132,166130,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.21562e+06,1.21484e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,143494,143489,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,41802.5,41274.9,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,14974,14973.5,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,77366.2,77352.6,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,25767.2,25766.6,ns,,,,,
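
For readers who want to compare the `threads:4` and `threads:1` rows in the CSV output above, here is a minimal sketch. It assumes the output is saved as text with the header row shown; the helper names `load_real_times` and `thread_scaling` are hypothetical and not part of the benchmark tooling. The two sample rows are copied from the output above.

```python
import csv
import io

# Header plus two rows copied from the benchmark output above.
SAMPLE = '''name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2101.15,2100.61,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,193.666,193.66,ns,,,,,
'''


def load_real_times(text):
    """Map each benchmark name to its real_time column as a float."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["name"]: float(row["real_time"]) for row in reader}


def thread_scaling(times, threads_hi="threads:4", threads_lo="threads:1"):
    """Return {base name: real_time(hi) / real_time(lo)} for rows that have both variants."""
    ratios = {}
    for name, hi_time in times.items():
        if not name.endswith("/" + threads_hi):
            continue
        base = name[: -len(threads_hi)]  # keeps the trailing "/"
        lo_name = base + threads_lo
        if lo_name in times:
            ratios[base.rstrip("/")] = hi_time / times[lo_name]
    return ratios


times = load_real_times(SAMPLE)
ratios = thread_scaling(times)
for base, ratio in ratios.items():
    print(f"{base}: {ratio:.2f}x")
```

The ratio is only a rough indicator of how the reported per-iteration time changes between the two thread counts. It does not account for run-to-run noise, which is visible in the differing numbers across the runs listed here.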

alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2852.83,1924.5,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,716.912,716.914,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1496.34,1351.76,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,750.102,750.066,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,841.451,772.855,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,177.158,177.154,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2104.52,2101.4,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,194.158,194.152,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1878.42,1878.39,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,188.552,188.545,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,3081.31,3052.73,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,274.456,274.333,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3740.52,3700.36,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,308.951,308.945,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,297.927,296.998,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,235.405,235.399,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,263.409,261.482,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,194.962,194.921,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,977.995,971.456,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,972.724,972.676,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,33026.3,31220,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4141.49,4141.39,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,141286,89571.5,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,29928.2,29927.9,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.2065e+06,1.20608e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,162706,162704,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.20948e+06,1.20757e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,145134,145133,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,42847.6,42031.2,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,15353.3,15352.8,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,72980.4,72960.8,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,26771.6,26771.2,ns,,,,,

alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2852.83,1924.5,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,716.912,716.914,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1496.34,1351.76,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,750.102,750.066,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,841.451,772.855,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,177.158,177.154,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2104.52,2101.4,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,194.158,194.152,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1878.42,1878.39,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,188.552,188.545,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,3081.31,3052.73,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,274.456,274.333,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3740.52,3700.36,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,308.951,308.945,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,297.927,296.998,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,235.405,235.399,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,263.409,261.482,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,194.962,194.921,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,977.995,971.456,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,972.724,972.676,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,33026.3,31220,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4141.49,4141.39,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,141286,89571.5,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,29928.2,29927.9,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.2065e+06,1.20608e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,162706,162704,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.20948e+06,1.20757e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,145134,145133,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,42847.6,42031.2,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,15353.3,15352.8,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,72980.4,72960.8,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,26771.6,26771.2,ns,,,,,

alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2425.59,1834.06,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,699.049,699.052,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1235.82,1187.64,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,747.634,747.636,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,786.691,759.934,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,173.599,173.601,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,1759.39,1758.87,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,188.753,188.748,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1839.45,1838.58,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,193.229,193.223,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,3480.42,3454.72,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,283.232,283.174,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3435.18,3389.11,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,315.928,315.921,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,302.779,294.883,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,215.631,215.597,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,269.363,261.992,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,207.218,207.218,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,1084.56,1076.55,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,948.116,948.064,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,30483.7,28922.7,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4097.34,4097.21,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,139796,88907.8,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,30334.1,30333.9,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.39495e+06,1.39427e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,168558,168557,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.41112e+06,1.40808e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,152141,152138,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,41409.5,40791.1,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,14493.2,14492.7,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,76653.2,75841.2,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,26177.1,26176.6,ns,,,,,

alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2852.83,1924.5,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,716.912,716.914,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1496.34,1351.76,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,750.102,750.066,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,841.451,772.855,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,177.158,177.154,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2104.52,2101.4,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,194.158,194.152,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1878.42,1878.39,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,188.552,188.545,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,3081.31,3052.73,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,274.456,274.333,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3740.52,3700.36,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,308.951,308.945,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,297.927,296.998,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,235.405,235.399,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,263.409,261.482,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,194.962,194.921,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,977.995,971.456,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,972.724,972.676,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,33026.3,31220,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4141.49,4141.39,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,141286,89571.5,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,29928.2,29927.9,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.2065e+06,1.20608e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,162706,162704,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.20948e+06,1.20757e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,145134,145133,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,42847.6,42031.2,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,15353.3,15352.8,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,72980.4,72960.8,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,26771.6,26771.2,ns,,,,,

alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2425.59,1834.06,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,699.049,699.052,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1235.82,1187.64,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,747.634,747.636,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,786.691,759.934,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,173.599,173.601,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,1759.39,1758.87,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,188.753,188.748,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1839.45,1838.58,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,193.229,193.223,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,3480.42,3454.72,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,283.232,283.174,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3435.18,3389.11,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,315.928,315.921,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,302.779,294.883,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,215.631,215.597,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,269.363,261.992,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,207.218,207.218,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,1084.56,1076.55,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,948.116,948.064,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,30483.7,28922.7,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4097.34,4097.21,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,139796,88907.8,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,30334.1,30333.9,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.39495e+06,1.39427e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,168558,168557,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.41112e+06,1.40808e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,152141,152138,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,41409.5,40791.1,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,14493.2,14492.7,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,76653.2,75841.2,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,26177.1,26176.6,ns,,,,,

alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2425.59,1834.06,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,699.049,699.052,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1235.82,1187.64,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,747.634,747.636,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,786.691,759.934,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,173.599,173.601,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,1759.39,1758.87,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,188.753,188.748,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1839.45,1838.58,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,193.229,193.223,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,3480.42,3454.72,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,283.232,283.174,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3435.18,3389.11,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,315.928,315.921,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,302.779,294.883,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,215.631,215.597,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,269.363,261.992,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,207.218,207.218,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,1084.56,1076.55,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,948.116,948.064,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,30483.7,28922.7,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4097.34,4097.21,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,139796,88907.8,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,30334.1,30333.9,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.39495e+06,1.39427e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,168558,168557,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.41112e+06,1.40808e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,152141,152138,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,41409.5,40791.1,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,14493.2,14492.7,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,76653.2,75841.2,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,26177.1,26176.6,ns,,,,,

alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2747.77,1933.64,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,698.678,698.677,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1447.1,1353.82,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,735.912,735.88,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,834.343,769.453,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,174.121,174.117,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2101.15,2100.61,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,193.666,193.66,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1985.72,1984.97,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,190.321,190.315,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2941.16,2891.7,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,267.18,267.175,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3268.18,3223.81,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,303.07,303.062,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,306.998,297.981,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,209.25,209.253,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,266.92,262.003,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,205.886,205.886,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,1035.67,999.825,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,972.724,972.712,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,32428.8,30573.3,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4212.23,4212.06,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,141388,89050.2,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,30442.6,30442.3,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.19206e+06,1.19142e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,166132,166130,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.21562e+06,1.21484e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,143494,143489,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,41802.5,41274.9,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,14974,14973.5,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,77366.2,77352.6,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,25767.2,25766.6,ns,,,,,

alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2747.77,1933.64,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,698.678,698.677,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1447.1,1353.82,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,735.912,735.88,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,834.343,769.453,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,174.121,174.117,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2101.15,2100.61,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,193.666,193.66,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1985.72,1984.97,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,190.321,190.315,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2941.16,2891.7,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,267.18,267.175,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3268.18,3223.81,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,303.07,303.062,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,306.998,297.981,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,209.25,209.253,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,266.92,262.003,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,205.886,205.886,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,1035.67,999.825,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,972.724,972.712,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,32428.8,30573.3,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4212.23,4212.06,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,141388,89050.2,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,30442.6,30442.3,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.19206e+06,1.19142e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,166132,166130,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.21562e+06,1.21484e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,143494,143489,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,41802.5,41274.9,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,14974,14973.5,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,77366.2,77352.6,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,25767.2,25766.6,ns,,,,,

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2747.77,1933.64,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,698.678,698.677,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1447.1,1353.82,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,735.912,735.88,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,834.343,769.453,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,174.121,174.117,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2101.15,2100.61,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,193.666,193.66,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1985.72,1984.97,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,190.321,190.315,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2941.16,2891.7,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,267.18,267.175,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3268.18,3223.81,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,303.07,303.062,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,306.998,297.981,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,209.25,209.253,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,266.92,262.003,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,205.886,205.886,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,1035.67,999.825,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,972.724,972.712,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,32428.8,30573.3,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4212.23,4212.06,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,141388,89050.2,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,30442.6,30442.3,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.19206e+06,1.19142e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,166132,166130,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.21562e+06,1.21484e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,143494,143489,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,41802.5,41274.9,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,14974,14973.5,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,77366.2,77352.6,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,25767.2,25766.6,ns,,,,,

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2852.83,1924.5,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,716.912,716.914,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1496.34,1351.76,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,750.102,750.066,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,841.451,772.855,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,177.158,177.154,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2104.52,2101.4,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,194.158,194.152,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1878.42,1878.39,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,188.552,188.545,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,3081.31,3052.73,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,274.456,274.333,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3740.52,3700.36,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,308.951,308.945,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,297.927,296.998,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,235.405,235.399,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,263.409,261.482,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,194.962,194.921,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,977.995,971.456,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,972.724,972.676,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,33026.3,31220,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4141.49,4141.39,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,141286,89571.5,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,29928.2,29927.9,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.2065e+06,1.20608e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,162706,162704,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.20948e+06,1.20757e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,145134,145133,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,42847.6,42031.2,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,15353.3,15352.8,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,72980.4,72960.8,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,26771.6,26771.2,ns,,,,,

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2747.77,1933.64,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,698.678,698.677,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1447.1,1353.82,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,735.912,735.88,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,834.343,769.453,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,174.121,174.117,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2101.15,2100.61,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,193.666,193.66,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1985.72,1984.97,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,190.321,190.315,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2941.16,2891.7,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,267.18,267.175,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3268.18,3223.81,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,303.07,303.062,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,306.998,297.981,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,209.25,209.253,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,266.92,262.003,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,205.886,205.886,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,1035.67,999.825,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,972.724,972.712,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,32428.8,30573.3,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4212.23,4212.06,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,141388,89050.2,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,30442.6,30442.3,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.19206e+06,1.19142e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,166132,166130,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.21562e+06,1.21484e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,143494,143489,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,41802.5,41274.9,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,14974,14973.5,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,77366.2,77352.6,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,25767.2,25766.6,ns,,,,,

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2852.83,1924.5,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,716.912,716.914,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1496.34,1351.76,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,750.102,750.066,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,841.451,772.855,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,177.158,177.154,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2104.52,2101.4,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,194.158,194.152,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1878.42,1878.39,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,188.552,188.545,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,3081.31,3052.73,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,274.456,274.333,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3740.52,3700.36,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,308.951,308.945,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,297.927,296.998,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,235.405,235.399,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,263.409,261.482,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,194.962,194.921,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,977.995,971.456,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,972.724,972.676,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,33026.3,31220,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4141.49,4141.39,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,141286,89571.5,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,29928.2,29927.9,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.2065e+06,1.20608e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,162706,162704,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.20948e+06,1.20757e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,145134,145133,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,42847.6,42031.2,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,15353.3,15352.8,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,72980.4,72960.8,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,26771.6,26771.2,ns,,,,,

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2852.83,1924.5,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,716.912,716.914,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1496.34,1351.76,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,750.102,750.066,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,841.451,772.855,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,177.158,177.154,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2104.52,2101.4,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,194.158,194.152,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1878.42,1878.39,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,188.552,188.545,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,3081.31,3052.73,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,274.456,274.333,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3740.52,3700.36,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,308.951,308.945,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,297.927,296.998,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,235.405,235.399,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,263.409,261.482,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,194.962,194.921,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,977.995,971.456,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,972.724,972.676,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,33026.3,31220,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4141.49,4141.39,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,141286,89571.5,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,29928.2,29927.9,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.2065e+06,1.20608e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,162706,162704,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.20948e+06,1.20757e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,145134,145133,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,42847.6,42031.2,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,15353.3,15352.8,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,72980.4,72960.8,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,26771.6,26771.2,ns,,,,,

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2425.59,1834.06,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,699.049,699.052,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1235.82,1187.64,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,747.634,747.636,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,786.691,759.934,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,173.599,173.601,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,1759.39,1758.87,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,188.753,188.748,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1839.45,1838.58,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,193.229,193.223,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,3480.42,3454.72,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,283.232,283.174,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3435.18,3389.11,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,315.928,315.921,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,302.779,294.883,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,215.631,215.597,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,269.363,261.992,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,207.218,207.218,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,1084.56,1076.55,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,948.116,948.064,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,30483.7,28922.7,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4097.34,4097.21,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,139796,88907.8,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,30334.1,30333.9,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.39495e+06,1.39427e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,168558,168557,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.41112e+06,1.40808e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,152141,152138,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,41409.5,40791.1,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,14493.2,14492.7,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,76653.2,75841.2,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,26177.1,26176.6,ns,,,,,

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2852.83,1924.5,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,716.912,716.914,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1496.34,1351.76,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,750.102,750.066,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,841.451,772.855,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,177.158,177.154,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2104.52,2101.4,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,194.158,194.152,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1878.42,1878.39,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,188.552,188.545,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,3081.31,3052.73,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,274.456,274.333,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3740.52,3700.36,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,308.951,308.945,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,297.927,296.998,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,235.405,235.399,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,263.409,261.482,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,194.962,194.921,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,977.995,971.456,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,972.724,972.676,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,33026.3,31220,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4141.49,4141.39,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,141286,89571.5,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,29928.2,29927.9,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.2065e+06,1.20608e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,162706,162704,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.20948e+06,1.20757e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,145134,145133,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,42847.6,42031.2,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,15353.3,15352.8,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,72980.4,72960.8,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,26771.6,26771.2,ns,,,,,

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2747.77,1933.64,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,698.678,698.677,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1447.1,1353.82,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,735.912,735.88,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,834.343,769.453,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,174.121,174.117,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2101.15,2100.61,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,193.666,193.66,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1985.72,1984.97,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,190.321,190.315,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2941.16,2891.7,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,267.18,267.175,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3268.18,3223.81,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,303.07,303.062,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,306.998,297.981,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,209.25,209.253,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,266.92,262.003,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,205.886,205.886,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,1035.67,999.825,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,972.724,972.712,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,32428.8,30573.3,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4212.23,4212.06,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,141388,89050.2,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,30442.6,30442.3,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.19206e+06,1.19142e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,166132,166130,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.21562e+06,1.21484e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,143494,143489,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,41802.5,41274.9,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,14974,14973.5,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,77366.2,77352.6,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,25767.2,25766.6,ns,,,,,

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2747.77,1933.64,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,698.678,698.677,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1447.1,1353.82,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,735.912,735.88,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,834.343,769.453,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,174.121,174.117,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2101.15,2100.61,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,193.666,193.66,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1985.72,1984.97,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,190.321,190.315,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2941.16,2891.7,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,267.18,267.175,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3268.18,3223.81,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,303.07,303.062,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,306.998,297.981,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,209.25,209.253,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,266.92,262.003,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,205.886,205.886,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,1035.67,999.825,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,972.724,972.712,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,32428.8,30573.3,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4212.23,4212.06,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,141388,89050.2,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,30442.6,30442.3,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.19206e+06,1.19142e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,166132,166130,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.21562e+06,1.21484e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,143494,143489,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,41802.5,41274.9,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,14974,14973.5,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,77366.2,77352.6,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,25767.2,25766.6,ns,,,,,

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2852.83,1924.5,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,716.912,716.914,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1496.34,1351.76,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,750.102,750.066,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,841.451,772.855,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,177.158,177.154,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2104.52,2101.4,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,194.158,194.152,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1878.42,1878.39,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,188.552,188.545,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,3081.31,3052.73,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,274.456,274.333,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3740.52,3700.36,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,308.951,308.945,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,297.927,296.998,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,235.405,235.399,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,263.409,261.482,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,194.962,194.921,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,977.995,971.456,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,972.724,972.676,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,33026.3,31220,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4141.49,4141.39,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,141286,89571.5,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,29928.2,29927.9,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.2065e+06,1.20608e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,162706,162704,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.20948e+06,1.20757e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,145134,145133,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,42847.6,42031.2,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,15353.3,15352.8,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,72980.4,72960.8,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,26771.6,26771.2,ns,,,,,

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2747.77,1933.64,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,698.678,698.677,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1447.1,1353.82,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,735.912,735.88,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,834.343,769.453,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,174.121,174.117,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2101.15,2100.61,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,193.666,193.66,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1985.72,1984.97,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,190.321,190.315,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2941.16,2891.7,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,267.18,267.175,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3268.18,3223.81,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,303.07,303.062,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,306.998,297.981,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,209.25,209.253,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,266.92,262.003,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,205.886,205.886,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,1035.67,999.825,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,972.724,972.712,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,32428.8,30573.3,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4212.23,4212.06,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,141388,89050.2,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,30442.6,30442.3,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.19206e+06,1.19142e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,166132,166130,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.21562e+06,1.21484e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,143494,143489,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,41802.5,41274.9,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,14974,14973.5,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,77366.2,77352.6,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,25767.2,25766.6,ns,,,,,

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2747.77,1933.64,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,698.678,698.677,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1447.1,1353.82,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,735.912,735.88,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,834.343,769.453,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,174.121,174.117,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2101.15,2100.61,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,193.666,193.66,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1985.72,1984.97,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,190.321,190.315,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2941.16,2891.7,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,267.18,267.175,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3268.18,3223.81,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,303.07,303.062,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,306.998,297.981,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,209.25,209.253,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,266.92,262.003,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,205.886,205.886,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,1035.67,999.825,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,972.724,972.712,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,32428.8,30573.3,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4212.23,4212.06,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,141388,89050.2,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,30442.6,30442.3,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.19206e+06,1.19142e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,166132,166130,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.21562e+06,1.21484e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,143494,143489,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,41802.5,41274.9,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,14974,14973.5,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,77366.2,77352.6,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,25767.2,25766.6,ns,,,,,

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2425.59,1834.06,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,699.049,699.052,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1235.82,1187.64,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,747.634,747.636,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,786.691,759.934,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,173.599,173.601,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,1759.39,1758.87,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,188.753,188.748,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1839.45,1838.58,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,193.229,193.223,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,3480.42,3454.72,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,283.232,283.174,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3435.18,3389.11,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,315.928,315.921,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,302.779,294.883,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,215.631,215.597,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,269.363,261.992,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,207.218,207.218,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,1084.56,1076.55,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,948.116,948.064,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,30483.7,28922.7,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4097.34,4097.21,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,139796,88907.8,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,30334.1,30333.9,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.39495e+06,1.39427e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,168558,168557,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.41112e+06,1.40808e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,152141,152138,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,41409.5,40791.1,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,14493.2,14492.7,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,76653.2,75841.2,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,26177.1,26176.6,ns,,,,,

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2425.59,1834.06,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,699.049,699.052,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1235.82,1187.64,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,747.634,747.636,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,786.691,759.934,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,173.599,173.601,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,1759.39,1758.87,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,188.753,188.748,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1839.45,1838.58,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,193.229,193.223,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,3480.42,3454.72,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,283.232,283.174,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3435.18,3389.11,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,315.928,315.921,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,302.779,294.883,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,215.631,215.597,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,269.363,261.992,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,207.218,207.218,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,1084.56,1076.55,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,948.116,948.064,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,30483.7,28922.7,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4097.34,4097.21,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,139796,88907.8,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,30334.1,30333.9,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.39495e+06,1.39427e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,168558,168557,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.41112e+06,1.40808e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,152141,152138,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,41409.5,40791.1,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,14493.2,14492.7,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,76653.2,75841.2,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,26177.1,26176.6,ns,,,,,

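The CSV blocks above all share the header `name,iterations,real_time,cpu_time,time_unit,...`. As a minimal sketch, and not part of the benchmark tooling itself, rows in this shape can be loaded with Python's standard `csv` module. The two sample rows are copied from the last output block; the helper name `load_real_times` is hypothetical.

```python
import csv
import io

# Header and two rows copied verbatim from the last CSV block above.
SAMPLE = '''name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,152141,152138,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,14493.2,14492.7,ns,,,,,
'''


def load_real_times(text):
    """Map each benchmark name to its real_time as a float (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["name"]: float(row["real_time"]) for row in reader}


times = load_real_times(SAMPLE)
os_ns = times["os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1"]
pool_ns = times["scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1"]

# Ratio of wall-clock time per iteration between the two single-threaded rows.
ratio = os_ns / pool_ns
print(f"os_provider takes {ratio:.1f}x the real_time of scalable_pool in this run")
```

`float()` also accepts the scientific notation used in some rows (for example `1.39495e+06`), so the same helper should work on the full output.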
Compute Benchmarks level_zero run (with params: --filter "QueueInOrderMemcpy"):
https://github.com/oneapi-src/unified-runtime/actions/runs/12927369918

Compute Benchmarks level_zero run (--filter "QueueInOrderMemcpy"):
https://github.com/oneapi-src/unified-runtime/actions/runs/12927369918
Job status: success. Test status: success.

Summary

Total 2 benchmarks in mean.
Geomean 128.146%.
Improved 1 Regressed 0 (threshold 2.00%)

(result is better)

Performance change in benchmark groups

Relative perf in group memory (4): 128.146%
Benchmark This PR baseline Relative perf Change -
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 134.572000 μs 218.485 μs 162.36% 62.36% ++++++++++
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 250.326000 μs 253.192 μs 101.14% 1.14% .
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 - 5.831000 μs
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 - 3.082000 GB/s
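
The summary figures can be reproduced from the two QueueInOrderMemcpy rows in the memory group. The sketch below is an illustration rather than the reporting script itself. It assumes that for time-based results lower is better, so relative performance is baseline divided by this PR, and that a result counts as improved or regressed only when it moves past the 2.00% threshold. Both assumptions are inferred from the numbers in this comment.

```python
import math

# (benchmark, this PR in μs, baseline in μs), copied from the memory group table.
ROWS = [
    ("QueueInOrderMemcpy from Host to Device, size 1024", 134.572, 218.485),
    ("QueueInOrderMemcpy from Device to Device, size 1024", 250.326, 253.192),
]
THRESHOLD = 0.02  # the 2.00% threshold quoted in the summary

# Assumption: lower time is better, so relative perf = baseline / this PR.
ratios = [baseline / this_pr for _, this_pr, baseline in ROWS]

# Geometric mean of the per-benchmark ratios.
geomean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Assumption: only changes larger than the threshold are counted.
improved = sum(r > 1 + THRESHOLD for r in ratios)
regressed = sum(r < 1 - THRESHOLD for r in ratios)

for (name, _, _), r in zip(ROWS, ratios):
    print(f"{name}: {r * 100:.2f}%")
print(f"Geomean {geomean * 100:.3f}%, improved {improved}, regressed {regressed}")
```

Under these assumptions the ratios come out at 162.36% and 101.14%, and the geometric mean at 128.146%, with one improved and no regressed results, which agrees with the summary.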
Relative perf in group api (12): cannot calculate
Benchmark This PR baseline Relative perf Change -
api_overhead_benchmark_l0 SubmitKernel out of order - 11.741000 μs
api_overhead_benchmark_l0 SubmitKernel in order - 11.419000 μs
api_overhead_benchmark_sycl SubmitKernel out of order - 23.423000 μs
api_overhead_benchmark_sycl SubmitKernel in order - 24.707000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 - 2.161000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 - 1.705000 μs
api_overhead_benchmark_ur SubmitKernel out of order CPU count - 105463.000000 instr
api_overhead_benchmark_ur SubmitKernel out of order - 15.763000 μs
api_overhead_benchmark_ur SubmitKernel in order CPU count - 110815.000000 instr
api_overhead_benchmark_ur SubmitKernel in order - 16.783000 μs
api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count - 123991.000000 instr
api_overhead_benchmark_ur SubmitKernel in order with measure completion - 21.455000 μs
Relative perf in group miscellaneous (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
miscellaneous_benchmark_sycl VectorSum - 859.195000 bw GB/s
Relative perf in group multithread (10): cannot calculate
Benchmark This PR baseline Relative perf Change -
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1 - 6931.007000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1 - 17308.085000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1 - 47028.145000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1 - 2056.843000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1 - 7746.524000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1 - 8886.251000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1 - 26759.017000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1 - 1198.342000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events - 42714.145000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events - 114836.190000 μs
Relative perf in group graph (10): cannot calculate
Benchmark This PR baseline Relative perf Change -
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:10 - 71756.948000 μs
graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:10 - 72702.628000 μs
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100 - 353472.301000 μs
graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100 - 353385.778000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:1, numKernels:10 - 53.970000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:10 - 62.094000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:100 - 678.072000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:0, numKernels:10 - 5605.418000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:10 - 5585.609000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:100 - 57222.637000 μs
Relative perf in group Velocity-Bench (9): cannot calculate
Benchmark This PR baseline Relative perf Change -
Velocity-Bench Hashtable - 352.453804 M keys/sec
Velocity-Bench Bitcracker - 35.732500 s
Velocity-Bench CudaSift - 204.556000 ms
Velocity-Bench Easywave - 233.000000 ms
Velocity-Bench QuickSilver - 117.650000 MMS/CTT
Velocity-Bench Sobel Filter - 625.818000 ms
Velocity-Bench dl-cifar - 23.906300 s
Velocity-Bench dl-mnist - 2.390000 s
Velocity-Bench svm - 0.138800 s
Relative perf in group Runtime (8): cannot calculate
Benchmark This PR baseline Relative perf Change -
Runtime_IndependentDAGTaskThroughput_SingleTask - 258.612000 ms
Runtime_IndependentDAGTaskThroughput_BasicParallelFor - 287.688000 ms
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor - 275.171000 ms
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor - 276.140000 ms
Runtime_DAGTaskThroughput_SingleTask - 1694.837000 ms
Runtime_DAGTaskThroughput_BasicParallelFor - 1764.397000 ms
Runtime_DAGTaskThroughput_HierarchicalParallelFor - 1738.675000 ms
Runtime_DAGTaskThroughput_NDRangeParallelFor - 1707.643000 ms
Relative perf in group MicroBench (14): cannot calculate
Benchmark This PR baseline Relative perf Change -
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous - 4.852000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous - 4.832000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous - 4.719000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous - 4.795000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous - 618.126000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous - 618.173000 ms
MicroBench_HostDeviceBandwidth_1D_H2D_Strided - 4.825000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Strided - 5.159000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Strided - 5.103000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Strided - 5.031000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Strided - 617.702000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Strided - 617.530000 ms
MicroBench_LocalMem_int32_4096 - 29.834000 ms
MicroBench_LocalMem_fp32_4096 - 29.869000 ms
Relative perf in group Pattern (10): cannot calculate
Benchmark This PR baseline Relative perf Change -
Pattern_Reduction_NDRange_int32 - 17.062000 ms
Pattern_Reduction_Hierarchical_int32 - 16.673000 ms
Pattern_SegmentedReduction_NDRange_int16 - 2.267000 ms
Pattern_SegmentedReduction_NDRange_int32 - 2.169000 ms
Pattern_SegmentedReduction_NDRange_int64 - 2.337000 ms
Pattern_SegmentedReduction_NDRange_fp32 - 2.163000 ms
Pattern_SegmentedReduction_Hierarchical_int16 - 11.805000 ms
Pattern_SegmentedReduction_Hierarchical_int32 - 11.582000 ms
Pattern_SegmentedReduction_Hierarchical_int64 - 11.765000 ms
Pattern_SegmentedReduction_Hierarchical_fp32 - 11.586000 ms
Relative perf in group ScalarProduct (6): cannot calculate
Benchmark This PR baseline Relative perf Change -
ScalarProduct_NDRange_int32 - 3.731000 ms
ScalarProduct_NDRange_int64 - 5.455000 ms
ScalarProduct_NDRange_fp32 - 3.759000 ms
ScalarProduct_Hierarchical_int32 - 10.530000 ms
ScalarProduct_Hierarchical_int64 - 11.517000 ms
ScalarProduct_Hierarchical_fp32 - 10.158000 ms
Relative perf in group USM (7): cannot calculate
Benchmark This PR baseline Relative perf Change -
USM_Allocation_latency_fp32_device - 0.070000 ms
USM_Allocation_latency_fp32_host - 37.440000 ms
USM_Allocation_latency_fp32_shared - 0.058000 ms
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch - 1.749000 ms
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch - 1.117000 ms
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch - 1.874000 ms
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch - 1.264000 ms
Relative perf in group VectorAddition (3): cannot calculate
Benchmark This PR baseline Relative perf Change -
VectorAddition_int32 - 1.442000 ms
VectorAddition_int64 - 3.162000 ms
VectorAddition_fp32 - 1.476000 ms
Relative perf in group Polybench (3): cannot calculate
Benchmark This PR baseline Relative perf Change -
Polybench_2mm - 1.225000 ms
Polybench_3mm - 1.736000 ms
Polybench_Atax - 6.898000 ms
Relative perf in group Kmeans (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
Kmeans_fp32 - 16.081000 ms
Relative perf in group LinearRegressionCoeff (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
LinearRegressionCoeff_fp32 - 994.314000 ms
Relative perf in group MolecularDynamics (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
MolecularDynamics - 0.030000 ms
Relative perf in group llama.cpp (6): cannot calculate
Benchmark This PR baseline Relative perf Change -
llama.cpp Prompt Processing Batched 128 - 819.567376 token/s
llama.cpp Text Generation Batched 128 - 62.459005 token/s
llama.cpp Prompt Processing Batched 256 - 867.008645 token/s
llama.cpp Text Generation Batched 256 - 62.493712 token/s
llama.cpp Prompt Processing Batched 512 - 424.730218 token/s
llama.cpp Text Generation Batched 512 - 62.448657 token/s
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (4): cannot calculate
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/0/4096/iterations:200000/threads:4 glibc - 2613.500000 ns
alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider - 2217.470000 ns
alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool<os_provider> - 3317.100000 ns
alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool<os_provider> - 310.122000 ns
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (4): cannot calculate
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/0/4096/iterations:200000/threads:1 glibc - 704.764000 ns
alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider - 193.084000 ns
alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool<os_provider> - 287.614000 ns
alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool<os_provider> - 212.209000 ns
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (4): cannot calculate
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc - 1251.740000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider - 2023.360000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool<os_provider> - 3438.070000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool<os_provider> - 283.524000 ns
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (4): cannot calculate
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc - 742.676000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider - 191.882000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool<os_provider> - 315.228000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool<os_provider> - 205.946000 ns
Relative perf in group alloc/min (4): cannot calculate
Benchmark This PR baseline Relative perf Change -
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc - 848.832000 ns
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc - 178.321000 ns
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool<os_provider> - 1212.920000 ns
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool<os_provider> - 969.864000 ns
Relative perf in group multiple (12): cannot calculate
Benchmark This PR baseline Relative perf Change -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc - 32562.900000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc - 4280.750000 ns
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc - 138769.000000 ns
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc - 30836.400000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool<os_provider> - 1155430.000000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool<os_provider> - 162481.000000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider - 1173940.000000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider - 149287.000000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool<os_provider> - 43668.900000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool<os_provider> - 15731.700000 ns
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool<os_provider> - 74953.300000 ns
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool<os_provider> - 26434.600000 ns

Details

Benchmark details - environment, command, output...
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Device destinationPlacement=Device size=1KB count=100),251.774,250.326,1.67%,245.491,462.437,[CPU],[us]

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Host destinationPlacement=Device size=1KB count=100),135.567,134.572,2.16%,133.391,323.030,[CPU],[us]
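
For anyone post-processing these logs, the CSV rows above can be read with a few lines of Python. This is only a minimal sketch: the `BenchResult` class and `parse_result_line` function are hypothetical names and not part of the benchmark scripts. It assumes each row follows the `TestCase,Mean,Median,StdDev,Min,Max,Type` header with one extra trailing unit field, as in the output shown here.

```python
from dataclasses import dataclass


@dataclass
class BenchResult:
    # Hypothetical container; field names mirror the CSV header above.
    test_case: str
    mean: float
    median: float
    stddev_pct: float
    min: float
    max: float
    kind: str
    unit: str


def parse_result_line(line: str) -> BenchResult:
    # Split from the right so that commas inside the test-case label,
    # if any ever appear, do not shift the numeric columns.
    name, mean, median, stddev, lo, hi, kind, unit = line.strip().rsplit(",", 7)
    return BenchResult(
        test_case=name,
        mean=float(mean),
        median=float(median),
        stddev_pct=float(stddev.rstrip("%")),  # "2.16%" -> 2.16
        min=float(lo),
        max=float(hi),
        kind=kind.strip("[]"),  # "[CPU]" -> "CPU"
        unit=unit.strip("[]"),  # "[us]" -> "us"
    )


row = (
    "QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Host "
    "destinationPlacement=Device size=1KB count=100),"
    "135.567,134.572,2.16%,133.391,323.030,[CPU],[us]"
)
result = parse_result_line(row)
print(result.mean, result.unit)
```

Splitting from the right keeps the parser independent of the label format, which contains spaces and parentheses.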
