test pr please ignore #2587

pbalcer · 2025-01-20T10:23:34Z

No description provided.

github-actions · 2025-01-20T10:28:03Z

Compute Benchmarks level_zero run (with params: --filter "Velocity|llama|memory_benchmark_sycl"):
https://github.com/oneapi-src/unified-runtime/actions/runs/12865994266

github-actions · 2025-01-20T10:48:30Z

Compute Benchmarks level_zero run (--filter "Velocity|llama|memory_benchmark_sycl"):
https://github.com/oneapi-src/unified-runtime/actions/runs/12865994266
Job status: success. Test status: success.

Summary

Total 19 benchmarks in mean.
Geomean 104.308%.
Improved 6 Regressed 1 (threshold 2.00%)

(result is better)

Performance change in benchmark groups

Relative perf in group memory (4): 115.889%

Benchmark	This PR	baseline	Relative perf	Change	-
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024	133.924000 μs	219.808 μs	164.13%	64.13%	++++++++++
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024	5.599000 μs	5.865 μs	104.75%	4.75%	+
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240	3.176000 GB/s	3.043 GB/s	104.37%	4.37%	+
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024	253.546000 μs	254.865 μs	100.52%	0.52%	.
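
The "Relative perf" column in the memory table above appears consistent with dividing baseline by PR time for duration units and PR by baseline for throughput units. The sketch below reproduces the four rows under that assumption. The `TIME_UNITS` set and the `relative_perf` helper are illustrative names, not taken from the benchmark scripts.

```python
# Units treated as durations, where lower is better. Every other unit is
# treated as throughput, where higher is better. This split is an assumption
# inferred from the table, not read from the benchmark scripts.
TIME_UNITS = {"ns", "μs", "ms", "s"}


def relative_perf(this_pr: float, baseline: float, unit: str) -> float:
    """Return relative performance in percent; above 100 means the PR is faster."""
    if unit in TIME_UNITS:
        return baseline / this_pr * 100.0
    return this_pr / baseline * 100.0


# (benchmark, this PR, baseline, unit) copied from the memory group table.
rows = [
    ("QueueInOrderMemcpy from Host to Device, size 1024", 133.924, 219.808, "μs"),
    ("QueueMemcpy from Device to Device, size 1024", 5.599, 5.865, "μs"),
    ("StreamMemory, placement Device, type Triad, size 10240", 3.176, 3.043, "GB/s"),
    ("QueueInOrderMemcpy from Device to Device, size 1024", 253.546, 254.865, "μs"),
]

for name, this_pr, baseline, unit in rows:
    rel = relative_perf(this_pr, baseline, unit)
    # "Change" is the relative perf minus 100 percentage points.
    print(f"{name}: {rel:.2f}% ({rel - 100.0:+.2f}%)")
```

With this convention a value above 100% always means the PR is faster, whether the benchmark reports a time or a bandwidth.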

Relative perf in group Velocity-Bench (9): 102.191%

Benchmark	This PR	baseline	Relative perf	Change	-
Velocity-Bench Easywave	233.000000 ms	289.000 ms	124.03%	24.03%	++++
Velocity-Bench Sobel Filter	593.005000 ms	621.173 ms	104.75%	4.75%	+
Velocity-Bench svm	0.136100 s	0.140 s	102.94%	2.94%	.
Velocity-Bench dl-cifar	23.698200 s	23.972 s	101.16%	1.16%	.
Velocity-Bench QuickSilver	118.600000 MMS/CTT	117.450 MMS/CTT	100.98%	0.98%	.
Velocity-Bench Hashtable	359.537028 M keys/sec	356.084 M keys/sec	100.97%	0.97%	.
Velocity-Bench CudaSift	202.681000 ms	204.342 ms	100.82%	0.82%	.
Velocity-Bench Bitcracker	35.033700 s	35.119 s	100.24%	0.24%	.
Velocity-Bench dl-mnist	2.730 s	2.380000 s	87.18%	-12.82%	--

Relative perf in group llama.cpp (6): 100.274%

Benchmark	This PR	baseline	Relative perf	Change	-
llama.cpp Prompt Processing Batched 512	431.144378 token/s	426.428 token/s	101.11%	1.11%	.
llama.cpp Text Generation Batched 512	62.828598 token/s	62.478 token/s	100.56%	0.56%	.
llama.cpp Text Generation Batched 256	62.794807 token/s	62.525 token/s	100.43%	0.43%	.
llama.cpp Text Generation Batched 128	62.756959 token/s	62.531 token/s	100.36%	0.36%	.
llama.cpp Prompt Processing Batched 256	870.777 token/s	872.219855 token/s	99.83%	-0.17%	.
llama.cpp Prompt Processing Batched 128	825.121 token/s	830.457525 token/s	99.36%	-0.64%	.
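
The summary figures (geomean, improved and regressed counts) can be rechecked from the three tables above. The sketch below is a minimal reconstruction, assuming the geomean is the geometric mean of the 19 "Relative perf" values and that a benchmark counts as improved or regressed when it deviates from 100% by more than the 2.00% threshold. The inputs are the rounded percentages from the tables, so results may differ from the bot's in the last digit.

```python
import math

# "Relative perf" column (percent) for the 19 benchmarks that ran in this PR.
relative_perf = {
    "memory": [164.13, 104.75, 104.37, 100.52],
    "Velocity-Bench": [124.03, 104.75, 102.94, 101.16, 100.98,
                       100.97, 100.82, 100.24, 87.18],
    "llama.cpp": [101.11, 100.56, 100.43, 100.36, 99.83, 99.36],
}
THRESHOLD = 2.0  # percentage points, as stated in the summary


def geomean(values):
    """Geometric mean computed through logarithms to avoid large products."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


all_values = [v for group in relative_perf.values() for v in group]
overall = geomean(all_values)
improved = sum(1 for v in all_values if v - 100.0 > THRESHOLD)
regressed = sum(1 for v in all_values if 100.0 - v > THRESHOLD)

for group, values in relative_perf.items():
    print(f"{group} ({len(values)}): {geomean(values):.3f}%")
print(f"Total {len(all_values)} benchmarks, geomean {overall:.3f}%")
print(f"Improved {improved} Regressed {regressed}")
```

Groups whose "This PR" column is "-" are left out because no ratio can be formed for them, which matches the "cannot calculate" entries.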

Relative perf in group api (12): cannot calculate

Benchmark	This PR	baseline
api_overhead_benchmark_l0 SubmitKernel out of order	-	11.848000 μs
api_overhead_benchmark_l0 SubmitKernel in order	-	11.745000 μs
api_overhead_benchmark_sycl SubmitKernel out of order	-	23.710000 μs
api_overhead_benchmark_sycl SubmitKernel in order	-	24.891000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024	-	2.143000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024	-	1.702000 μs
api_overhead_benchmark_ur SubmitKernel out of order CPU count	-	105463.000000 instr
api_overhead_benchmark_ur SubmitKernel out of order	-	15.623000 μs
api_overhead_benchmark_ur SubmitKernel in order CPU count	-	110815.000000 instr
api_overhead_benchmark_ur SubmitKernel in order	-	16.859000 μs
api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count	-	123991.000000 instr
api_overhead_benchmark_ur SubmitKernel in order with measure completion	-	21.425000 μs

Relative perf in group miscellaneous (1): cannot calculate

Benchmark	This PR	baseline
miscellaneous_benchmark_sycl VectorSum	-	861.253000 bw GB/s

Relative perf in group multithread (10): cannot calculate

Benchmark	This PR	baseline
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1	-	6931.139000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1	-	17007.721000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1	-	47383.460000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1	-	2073.904000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1	-	7868.958000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1	-	9035.852000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1	-	27237.512000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1	-	1194.467000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events	-	42860.412000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events	-	113343.613000 μs

Relative perf in group Runtime (8): cannot calculate

Benchmark	This PR	baseline
Runtime_IndependentDAGTaskThroughput_SingleTask	-	253.100000 ms
Runtime_IndependentDAGTaskThroughput_BasicParallelFor	-	273.484000 ms
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor	-	271.662000 ms
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor	-	272.505000 ms
Runtime_DAGTaskThroughput_SingleTask	-	1691.410000 ms
Runtime_DAGTaskThroughput_BasicParallelFor	-	1756.502000 ms
Runtime_DAGTaskThroughput_HierarchicalParallelFor	-	1721.262000 ms
Runtime_DAGTaskThroughput_NDRangeParallelFor	-	1694.375000 ms

Relative perf in group MicroBench (14): cannot calculate

Benchmark	This PR	baseline
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous	-	5.188000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous	-	4.967000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous	-	4.769000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous	-	4.866000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous	-	618.226000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous	-	618.268000 ms
MicroBench_HostDeviceBandwidth_1D_H2D_Strided	-	4.919000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Strided	-	5.115000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Strided	-	5.140000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Strided	-	5.113000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Strided	-	617.772000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Strided	-	617.628000 ms
MicroBench_LocalMem_int32_4096	-	29.834000 ms
MicroBench_LocalMem_fp32_4096	-	29.857000 ms

Relative perf in group Pattern (10): cannot calculate

Benchmark	This PR	baseline
Pattern_Reduction_NDRange_int32	-	16.971000 ms
Pattern_Reduction_Hierarchical_int32	-	17.024000 ms
Pattern_SegmentedReduction_NDRange_int16	-	2.263000 ms
Pattern_SegmentedReduction_NDRange_int32	-	2.164000 ms
Pattern_SegmentedReduction_NDRange_int64	-	2.333000 ms
Pattern_SegmentedReduction_NDRange_fp32	-	2.163000 ms
Pattern_SegmentedReduction_Hierarchical_int16	-	11.801000 ms
Pattern_SegmentedReduction_Hierarchical_int32	-	11.587000 ms
Pattern_SegmentedReduction_Hierarchical_int64	-	11.777000 ms
Pattern_SegmentedReduction_Hierarchical_fp32	-	11.588000 ms

Relative perf in group ScalarProduct (6): cannot calculate

Benchmark	This PR	baseline
ScalarProduct_NDRange_int32	-	3.734000 ms
ScalarProduct_NDRange_int64	-	5.456000 ms
ScalarProduct_NDRange_fp32	-	3.767000 ms
ScalarProduct_Hierarchical_int32	-	10.555000 ms
ScalarProduct_Hierarchical_int64	-	11.508000 ms
ScalarProduct_Hierarchical_fp32	-	10.174000 ms

Relative perf in group USM (7): cannot calculate

Benchmark	This PR	baseline
USM_Allocation_latency_fp32_device	-	0.068000 ms
USM_Allocation_latency_fp32_host	-	37.633000 ms
USM_Allocation_latency_fp32_shared	-	0.057000 ms
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch	-	1.717000 ms
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch	-	1.085000 ms
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch	-	1.889000 ms
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch	-	1.256000 ms

Relative perf in group VectorAddition (3): cannot calculate

Benchmark	This PR	baseline
VectorAddition_int32	-	1.510000 ms
VectorAddition_int64	-	3.066000 ms
VectorAddition_fp32	-	1.460000 ms

Relative perf in group Polybench (3): cannot calculate

Benchmark	This PR	baseline
Polybench_2mm	-	1.221000 ms
Polybench_3mm	-	1.730000 ms
Polybench_Atax	-	6.855000 ms

Relative perf in group Kmeans (1): cannot calculate

Benchmark	This PR	baseline
Kmeans_fp32	-	16.091000 ms

Relative perf in group LinearRegressionCoeff (1): cannot calculate

Benchmark	This PR	baseline
LinearRegressionCoeff_fp32	-	908.423000 ms

Relative perf in group MolecularDynamics (1): cannot calculate

Benchmark	This PR	baseline
MolecularDynamics	-	0.030000 ms

Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (4): cannot calculate

Benchmark	This PR	baseline
alloc/size:10000/0/4096/iterations:200000/threads:4 glibc	-	2475.310000 ns
alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider	-	2120.000000 ns
alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool<os_provider>	-	3068.370000 ns
alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool<os_provider>	-	283.309000 ns

Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (4): cannot calculate

Benchmark	This PR	baseline
alloc/size:10000/0/4096/iterations:200000/threads:1 glibc	-	706.837000 ns
alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider	-	197.281000 ns
alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool<os_provider>	-	268.948000 ns
alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool<os_provider>	-	213.433000 ns

Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (4): cannot calculate

Benchmark	This PR	baseline
alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc	-	1259.770000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider	-	1854.120000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool<os_provider>	-	3771.150000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool<os_provider>	-	253.839000 ns

Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (4): cannot calculate

Benchmark	This PR	baseline
alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc	-	726.627000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider	-	195.246000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool<os_provider>	-	308.264000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool<os_provider>	-	206.713000 ns

Relative perf in group alloc/min (4): cannot calculate

Benchmark	This PR	baseline
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc	-	803.081000 ns
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc	-	177.090000 ns
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool<os_provider>	-	978.697000 ns
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool<os_provider>	-	975.381000 ns

Relative perf in group multiple (12): cannot calculate

Benchmark	This PR	baseline
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc	-	33503.600000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc	-	4251.600000 ns
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc	-	141113.000000 ns
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc	-	30214.100000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool<os_provider>	-	1170470.000000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool<os_provider>	-	165011.000000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider	-	1151930.000000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider	-	145356.000000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool<os_provider>	-	42332.700000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool<os_provider>	-	15330.800000 ns
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool<os_provider>	-	75942.600000 ns
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool<os_provider>	-	25425.600000 ns

Output:

---------> BitCracker: BitLocker password cracking tool <---------

==================================
Retrieving Info

Reading hash file "/home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

              Attack

================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range:
npknpByH7N2m3OnLNH1X9DJxLrzIFWk
.....
dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!

time to subtract from total: 0.00411306 s
bitcracker - total time for whole calculation: 35.0337 s

Velocity-Bench CudaSift

Environment Variables:

Command:

/home/pmdk/bench_workdir/cudaSift/cudaSift

Output:

UNKN:

UNKN: ==================================================
UNKN: User input parameters:
UNKN: Trace: ../../inputData
UNKN: ==================================================
UNKN:

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1079 1266 29.2968% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1263 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1117 1270 30.3285% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1231 1276 33.4238% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1234 1274 33.5053% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1263 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1199 1254 32.555% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1129 1273 30.6544% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1092 1264 29.6497% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1095 1268 29.7312% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1267 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1117 1271 30.3285% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1208 1266 32.7993% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1216 1252 33.0166% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1130 1266 30.6815% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1109 1266 30.1113% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1175 1255 31.9033% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1224 1259 33.2338% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1092 1259 29.6497% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1265 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1171 1254 31.7947% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1268 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1215 1251 32.9894% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1266 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1230 1271 33.3967% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1265 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1102 1250 29.9213% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1078 1279 29.2696% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1209 1255 32.8265% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1135 1272 30.8173% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1116 1265 30.3014% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1242 1274 33.7225% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1115 1264 30.2742% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1122 1258 30.4643% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1266 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1241 1273 33.6954% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1082 1255 29.3782% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1125 1253 30.5458% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1231 1264 33.4238% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1100 1269 29.867% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1213 1258 32.9351% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1222 1255 33.1795% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1140 1265 30.953% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1237 1273 33.5868% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1269 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1104 1263 29.9756% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1221 1258 33.1523% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1236 1270 33.5596% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1228 1264 33.3424% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1061 1271 28.808% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Avg workload time = 202.681 ms

Velocity-Bench Easywave

Environment Variables:

Command:

/home/pmdk/bench_workdir/easywave/easyWave_sycl -grid /home/pmdk/bench_workdir/data/easywave/examples/e2Asean.grd -source /home/pmdk/bench_workdir/data/easywave/examples/BengkuluSept2007.flt -time 120

Output:

MAIN: Starting SYCL main program
MAIN: Attempting to clean up previous eWave tsunami files
MAIN: Clean up completed
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.31294)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
MAIN: Program successfully completed

Velocity-Bench QuickSilver

Environment Variables:

QS_DEVICE=GPU

Command:

/home/pmdk/bench_workdir/QuickSilver/qs -i /home/pmdk/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Output:

Copyright (c) 2016
Lawrence Livermore National Security, LLC
All Rights Reserved
Quicksilver Version :
Quicksilver Git Hash :
MPI Version : 3.0
Number of MPI ranks : 1
Number of OpenMP Threads: 1
Number of OpenMP CPUs : 1

Loading params
Finished loading params
Simulation:
dt: 1e-08
fMax: 0.1
inputFile: /home/pmdk/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
energySpectrum:
boundaryCondition: octant
loadBalance: 1
cycleTimers: 0
debugThreads: 0
lx: 100
ly: 100
lz: 100
nParticles: 10000000
batchSize: 0
nBatches: 10
nSteps: 10
nx: 10
ny: 10
nz: 10
seed: 1029384756
xDom: 0
yDom: 0
zDom: 0
eMax: 20
eMin: 1e-09
nGroups: 230
lowWeightCutoff: 0.001
bTally: 1
fTally: 1
cTally: 1
coralBenchmark: 0
crossSectionsOut:

Geometry:
material: sourceMaterial
shape: brick
xMax: 100
xMin: 0
yMax: 100
yMin: 0
zMax: 100
zMin: 0

Material:
name: sourceMaterial
mass: 1000
nIsotopes: 10
nReactions: 9
sourceRate: 1e+10
totalCrossSection: 0.1
absorptionCrossSection: flat
fissionCrossSection: flat
scatteringCrossSection: flat
absorptionCrossSectionRatio: 0
fissionCrossSectionRatio: 0
scatteringCrossSectionRatio: 1

CrossSection:
name: flat
A: 0
B: 0
C: 0
D: 0
E: 1
nuBar: 2.4
setting GPU
setting parameters
Building partition 0
Building partition 1
Building partition 2
Building partition 3
Building MC_Domain 0
Building MC_Domain 1
Building MC_Domain 2
Building MC_Domain 3
Starting Consistency Check
Finished Consistency Check
Finished initMesh
Started copyMaterialDatabase_device
Finished copyMaterialDatabase_device
Finished copyNuclearData_device
Finished copyDomainDevice
cycle start source rr split absorb scatter fission produce collisn escape census num_seg scalar_flux cycleInit cycleTracking cycleFinalize
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 4.328980e-01 6.103600e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 3.604810e-01 7.423830e-01 0.000000e+00
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 3.363200e-01 7.603240e-01 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 3.675710e-01 8.242040e-01 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 3.639380e-01 7.936470e-01 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 3.320220e-01 7.813230e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 3.324010e-01 7.623190e-01 0.000000e+00
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 3.349380e-01 7.812370e-01 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 3.326040e-01 7.819730e-01 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 3.334550e-01 7.578150e-01 0.000000e+00

Timer Name	Cumulative number of calls	Cumulative microSecs min	Cumulative microSecs avg	Cumulative microSecs max	Cumulative microSecs stddev	Cumulative Efficiency Rating
main	1	1.112e+07	1.112e+07	1.112e+07	0.000e+00	100.00
cycleInit	10	3.527e+06	3.527e+06	3.527e+06	0.000e+00	100.00
cycleTracking	10	7.596e+06	7.596e+06	7.596e+06	0.000e+00	100.00
cycleTracking_Kernel	104	4.916e+06	4.916e+06	4.916e+06	0.000e+00	100.00
cycleTracking_MPI	117	2.011e+05	2.011e+05	2.011e+05	0.000e+00	100.00
cycleTracking_Test_Done	0	0.000e+00	0.000e+00	0.000e+00	0.000e+00	0.00
cycleFinalize	20	3.990e+02	3.990e+02	3.990e+02	0.000e+00	100.00
Figure Of Merit 118.60 [Num Mega Segments / Cycle Tracking Time]

Velocity-Bench Sobel Filter

Environment Variables:

OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Command:

/home/pmdk/bench_workdir/sobel_filter/sobel_filter -i /home/pmdk/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Output:

SYMN: Welcome to the SYCL version of Sobel filter workload.
SYMN: Input image file: /home/pmdk/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png
SYMN: Launching SYCL kernel with # of iterations: 5
time to subtract from total: 7.48054 s
sobelfilter - total time for whole calculation: 0.593005 s

Velocity-Bench dl-cifar

Environment Variables:

Command:

/home/pmdk/bench_workdir/dl-cifar/dl-cifar_sycl

Output:

	Welcome to DL-CIFAR workload: SYCL version.

=======================================================================
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.31294)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.31294)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero

WL PARAMS:

WL PARAMS: ==================================================
WL PARAMS: User input parameters:
WL PARAMS: Trace: notrace
WL PARAMS: DL NW size type: WORKLOAD_DEFAULT_SIZE
WL PARAMS: ==================================================
WL PARAMS:

dataFileReadTimer->getTotalOpTime(): 8.7e-05 s
dl-cifar - total time for whole calculation: 23.6982 s

Velocity-Bench dl-mnist

Environment Variables:

NEOReadDebugKeys=1
DisableScratchPages=0

Command:

/home/pmdk/bench_workdir/dl-mnist/dl-mnist-sycl -conv_algo ONEDNN_AUTO

Output:

	Welcome to DL-MNIST workload: SYCL version.

=======================================================================
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.31294)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.31294)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero

WL PARAMS:

WL PARAMS: ==================================================
WL PARAMS: User input parameters:
WL PARAMS: Trace: notrace
WL PARAMS: Tensor management policy: per_layer
WL PARAMS: Convolution algorithm: ONEDNN_AUTO
WL PARAMS: Dataset reader format: NCHW
WL PARAMS: Dry run: YES
WL PARAMS: OneDNN Conv PD memory format: ONEDNN_CONVPD_ANY
WL PARAMS: No of iterations for inference: 400
WL PARAMS: ==================================================
WL PARAMS:

dl-mnist - total time for whole calculation: 2.73 s

Velocity-Bench svm

Environment Variables:

Command:

/home/pmdk/bench_workdir/svm/svm_sycl /home/pmdk/bench_workdir/velocity-bench-repo/svm/SYCL/a9a /home/pmdk/bench_workdir/velocity-bench-repo/svm/SYCL/a.m

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

Output:

build_commit,build_number,cuda,vulkan,kompute,metal,sycl,rpc,gpu_blas,blas,cpu_info,gpu_info,model_filename,model_type,model_size,model_n_params,n_batch,n_ubatch,n_threads,cpu_mask,cpu_strict,poll,type_k,type_v,n_gpu_layers,split_mode,main_gpu,no_kv_offload,flash_attn,tensor_split,use_mmap,embeddings,n_prompt,n_gen,test_time,avg_ns,stddev_ns,avg_ts,stddev_ts
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","128","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T10:45:59Z","800981973","419087399","733.624096","225.559274"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","128","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T10:46:07Z","2040098646","3348632","62.742199","0.102781"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","256","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T10:46:18Z","582747125","1350197","878.600987","2.037306"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","256","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T10:46:21Z","2037016342","2492470","62.837077","0.076754"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","512","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","512","0","2025-01-20T10:46:31Z","1182129718","2992373","433.118815","1.096055"
"1ee9eea0","4073","0","0","0","0","1","0","1","1","INTEL(R) XEON(R) PLATINUM 8580","Intel(R) Data Center GPU Max 1100","/home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf","phi3 3B Q4_K - Medium","2392493568","3821079552","512","512","56","0x0","0","50","f16","f16","99","layer","0","0","0","0.00","1","0","0","128","2025-01-20T10:46:39Z","2037289954","1693822","62.828598","0.052196"
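
The six CSV rows above appear to correspond to the six llama.cpp entries in the summary tables (Prompt Processing / Text Generation, batched 128, 256 and 512, in token/s). A minimal Python sketch of that mapping follows. The naming rule (`n_gen == 0` means a prompt-processing run) and the helper `label` are assumptions for illustration, not taken from the benchmark scripts, and the sample is trimmed to the four columns used.

```python
import csv
import io

# Trimmed copy of the llama-bench CSV above: only the four columns used here.
SAMPLE = """n_batch,n_prompt,n_gen,avg_ts
128,512,0,733.624096
128,0,128,62.742199
256,512,0,878.600987
256,0,128,62.837077
512,512,0,433.118815
512,0,128,62.828598
"""


def label(row):
    """Hypothetical naming rule: n_gen == 0 marks a prompt-processing run."""
    kind = "Prompt Processing" if int(row["n_gen"]) == 0 else "Text Generation"
    return f"llama.cpp {kind} Batched {row['n_batch']}"


# Map each benchmark name to its average throughput (avg_ts column).
results = {
    label(row): float(row["avg_ts"])
    for row in csv.DictReader(io.StringIO(SAMPLE))
}

for name, tokens_per_s in results.items():
    print(f"{name}\t{tokens_per_s:.6f} token/s")
```

`csv.DictReader` also handles the fully quoted fields of the original output, so the same loop should work on the untrimmed file.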

github-actions · 2025-01-20T10:50:27Z

Compute Benchmarks level_zero run (with params: --filter "SubmitKernel"):
https://github.com/oneapi-src/unified-runtime/actions/runs/12866385766

github-actions · 2025-01-20T10:57:22Z

Compute Benchmarks level_zero run (--filter "SubmitKernel"):
https://github.com/oneapi-src/unified-runtime/actions/runs/12866385766
Job status: success. Test status: success.

Summary

Total 9 benchmarks in mean.
Geomean 101.398%.
Improved 3 Regressed 0 (threshold 2.00%)

(result is better)

Performance change in benchmark groups

Relative perf in group api (12): 101.398%

Benchmark	This PR	baseline	Relative perf	Change	-
api_overhead_benchmark_l0 SubmitKernel out of order	11.388000 μs	11.848 μs	104.04%	4.04%	++++++++++
api_overhead_benchmark_ur SubmitKernel in order	16.398000 μs	16.859 μs	102.81%	2.81%	+++++++
api_overhead_benchmark_ur SubmitKernel in order with measure completion	20.953000 μs	21.425 μs	102.25%	2.25%	++++++
api_overhead_benchmark_sycl SubmitKernel in order	24.452000 μs	24.891 μs	101.80%	1.80%	.
api_overhead_benchmark_sycl SubmitKernel out of order	23.390000 μs	23.710 μs	101.37%	1.37%	.
api_overhead_benchmark_ur SubmitKernel out of order CPU count	104523.000000 instr	105463.000 instr	100.90%	0.90%	.
api_overhead_benchmark_ur SubmitKernel in order CPU count	109983.000000 instr	110815.000 instr	100.76%	0.76%	.
api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count	123156.000000 instr	123991.000 instr	100.68%	0.68%	.
api_overhead_benchmark_ur SubmitKernel out of order	15.927 μs	15.623000 μs	98.09%	-1.91%	.
api_overhead_benchmark_l0 SubmitKernel in order	-	11.745000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024	-	2.143000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024	-	1.702000 μs
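
The Summary figures can be re-derived from the nine comparable rows of the api table. The sketch below assumes that relative perf is baseline / PR for lower-is-better metrics (μs, instr) and that the overall figure is a geometric mean. Both are inferred from the numbers in this comment, not from the benchmark scripts, and the names `rows`, `relative_perf` and `THRESHOLD` are illustrative.

```python
import math

# (benchmark, this PR, baseline) copied from the api table above.
# All nine metrics are lower-is-better (microseconds or instruction counts).
rows = [
    ("l0 SubmitKernel out of order", 11.388, 11.848),
    ("ur SubmitKernel in order", 16.398, 16.859),
    ("ur SubmitKernel in order with measure completion", 20.953, 21.425),
    ("sycl SubmitKernel in order", 24.452, 24.891),
    ("sycl SubmitKernel out of order", 23.390, 23.710),
    ("ur SubmitKernel out of order CPU count", 104523.0, 105463.0),
    ("ur SubmitKernel in order CPU count", 109983.0, 110815.0),
    ("ur SubmitKernel in order with measure completion CPU count", 123156.0, 123991.0),
    ("ur SubmitKernel out of order", 15.927, 15.623),
]

THRESHOLD = 2.0  # percent, as stated in the Summary


def relative_perf(this_pr, baseline, lower_is_better=True):
    """Relative performance in percent; above 100 means the PR is better."""
    ratio = baseline / this_pr if lower_is_better else this_pr / baseline
    return 100.0 * ratio


rel = [relative_perf(pr, base) for _, pr, base in rows]

# Geometric mean of the per-benchmark ratios.
geomean = math.exp(sum(math.log(r) for r in rel) / len(rel))

# A benchmark counts as improved or regressed only beyond the threshold.
improved = sum(1 for r in rel if r - 100.0 > THRESHOLD)
regressed = sum(1 for r in rel if 100.0 - r > THRESHOLD)

print(f"Geomean {geomean:.3f}%  Improved {improved}  Regressed {regressed}")
```

For higher-is-better metrics such as GB/s or token/s, the ratio would be inverted by passing `lower_is_better=False`.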

Relative perf in group memory (4): cannot calculate

Benchmark	This PR	baseline
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024	-	254.865000 μs
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024	-	219.808000 μs
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024	-	5.865000 μs
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240	-	3.043000 GB/s

Relative perf in group miscellaneous (1): cannot calculate

Benchmark	This PR	baseline
miscellaneous_benchmark_sycl VectorSum	-	861.253000 bw GB/s

Relative perf in group multithread (10): cannot calculate

Benchmark	This PR	baseline
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1	-	6931.139000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1	-	17007.721000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1	-	47383.460000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1	-	2073.904000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1	-	7868.958000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1	-	9035.852000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1	-	27237.512000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1	-	1194.467000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events	-	42860.412000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events	-	113343.613000 μs

Relative perf in group Velocity-Bench (9): cannot calculate

Benchmark	This PR	baseline
Velocity-Bench Hashtable	-	356.084148 M keys/sec
Velocity-Bench Bitcracker	-	35.118800 s
Velocity-Bench CudaSift	-	204.342000 ms
Velocity-Bench Easywave	-	289.000000 ms
Velocity-Bench QuickSilver	-	117.450000 MMS/CTT
Velocity-Bench Sobel Filter	-	621.173000 ms
Velocity-Bench dl-cifar	-	23.972100 s
Velocity-Bench dl-mnist	-	2.380000 s
Velocity-Bench svm	-	0.140100 s

Relative perf in group Runtime (8): cannot calculate

Benchmark	This PR	baseline
Runtime_IndependentDAGTaskThroughput_SingleTask	-	253.100000 ms
Runtime_IndependentDAGTaskThroughput_BasicParallelFor	-	273.484000 ms
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor	-	271.662000 ms
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor	-	272.505000 ms
Runtime_DAGTaskThroughput_SingleTask	-	1691.410000 ms
Runtime_DAGTaskThroughput_BasicParallelFor	-	1756.502000 ms
Runtime_DAGTaskThroughput_HierarchicalParallelFor	-	1721.262000 ms
Runtime_DAGTaskThroughput_NDRangeParallelFor	-	1694.375000 ms

Relative perf in group MicroBench (14): cannot calculate

Benchmark	This PR	baseline
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous	-	5.188000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous	-	4.967000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous	-	4.769000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous	-	4.866000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous	-	618.226000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous	-	618.268000 ms
MicroBench_HostDeviceBandwidth_1D_H2D_Strided	-	4.919000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Strided	-	5.115000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Strided	-	5.140000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Strided	-	5.113000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Strided	-	617.772000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Strided	-	617.628000 ms
MicroBench_LocalMem_int32_4096	-	29.834000 ms
MicroBench_LocalMem_fp32_4096	-	29.857000 ms

Relative perf in group Pattern (10): cannot calculate

Benchmark	This PR	baseline
Pattern_Reduction_NDRange_int32	-	16.971000 ms
Pattern_Reduction_Hierarchical_int32	-	17.024000 ms
Pattern_SegmentedReduction_NDRange_int16	-	2.263000 ms
Pattern_SegmentedReduction_NDRange_int32	-	2.164000 ms
Pattern_SegmentedReduction_NDRange_int64	-	2.333000 ms
Pattern_SegmentedReduction_NDRange_fp32	-	2.163000 ms
Pattern_SegmentedReduction_Hierarchical_int16	-	11.801000 ms
Pattern_SegmentedReduction_Hierarchical_int32	-	11.587000 ms
Pattern_SegmentedReduction_Hierarchical_int64	-	11.777000 ms
Pattern_SegmentedReduction_Hierarchical_fp32	-	11.588000 ms

Relative perf in group ScalarProduct (6): cannot calculate

Benchmark	This PR	baseline
ScalarProduct_NDRange_int32	-	3.734000 ms
ScalarProduct_NDRange_int64	-	5.456000 ms
ScalarProduct_NDRange_fp32	-	3.767000 ms
ScalarProduct_Hierarchical_int32	-	10.555000 ms
ScalarProduct_Hierarchical_int64	-	11.508000 ms
ScalarProduct_Hierarchical_fp32	-	10.174000 ms

Relative perf in group USM (7): cannot calculate

Benchmark	This PR	baseline
USM_Allocation_latency_fp32_device	-	0.068000 ms
USM_Allocation_latency_fp32_host	-	37.633000 ms
USM_Allocation_latency_fp32_shared	-	0.057000 ms
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch	-	1.717000 ms
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch	-	1.085000 ms
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch	-	1.889000 ms
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch	-	1.256000 ms

Relative perf in group VectorAddition (3): cannot calculate

Benchmark	This PR	baseline
VectorAddition_int32	-	1.510000 ms
VectorAddition_int64	-	3.066000 ms
VectorAddition_fp32	-	1.460000 ms

Relative perf in group Polybench (3): cannot calculate

Benchmark	This PR	baseline
Polybench_2mm	-	1.221000 ms
Polybench_3mm	-	1.730000 ms
Polybench_Atax	-	6.855000 ms

Relative perf in group Kmeans (1): cannot calculate

Benchmark	This PR	baseline
Kmeans_fp32	-	16.091000 ms

Relative perf in group LinearRegressionCoeff (1): cannot calculate

Benchmark	This PR	baseline
LinearRegressionCoeff_fp32	-	908.423000 ms

Relative perf in group MolecularDynamics (1): cannot calculate

Benchmark	This PR	baseline
MolecularDynamics	-	0.030000 ms

Relative perf in group llama.cpp (6): cannot calculate

Benchmark	This PR	baseline
llama.cpp Prompt Processing Batched 128	-	830.457525 token/s
llama.cpp Text Generation Batched 128	-	62.530663 token/s
llama.cpp Prompt Processing Batched 256	-	872.219855 token/s
llama.cpp Text Generation Batched 256	-	62.524658 token/s
llama.cpp Prompt Processing Batched 512	-	426.427709 token/s
llama.cpp Text Generation Batched 512	-	62.477744 token/s

Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (4): cannot calculate

Benchmark	This PR	baseline
alloc/size:10000/0/4096/iterations:200000/threads:4 glibc	-	2475.310000 ns
alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider	-	2120.000000 ns
alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool<os_provider>	-	3068.370000 ns
alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool<os_provider>	-	283.309000 ns

Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (4): cannot calculate

Benchmark	This PR	baseline
alloc/size:10000/0/4096/iterations:200000/threads:1 glibc	-	706.837000 ns
alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider	-	197.281000 ns
alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool<os_provider>	-	268.948000 ns
alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool<os_provider>	-	213.433000 ns

Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (4): cannot calculate

Benchmark	This PR	baseline
alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc	-	1259.770000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider	-	1854.120000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool<os_provider>	-	3771.150000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool<os_provider>	-	253.839000 ns

Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (4): cannot calculate

Benchmark	This PR	baseline
alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc	-	726.627000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider	-	195.246000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool<os_provider>	-	308.264000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool<os_provider>	-	206.713000 ns

Relative perf in group alloc/min (4): cannot calculate

Benchmark	This PR	baseline
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc	-	803.081000 ns
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc	-	177.090000 ns
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool<os_provider>	-	978.697000 ns
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool<os_provider>	-	975.381000 ns

Relative perf in group multiple (12): cannot calculate

Benchmark	This PR	baseline
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc	-	33503.600000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc	-	4251.600000 ns
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc	-	141113.000000 ns
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc	-	30214.100000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool<os_provider>	-	1170470.000000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool<os_provider>	-	165011.000000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider	-	1151930.000000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider	-	145356.000000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool<os_provider>	-	42332.700000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool<os_provider>	-	15330.800000 ns
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool<os_provider>	-	75942.600000 ns
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool<os_provider>	-	25425.600000 ns

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=1),123132.624,123156.000,3.78%,122528.000,1585974.000,[CPU],hw instructions [count]
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=1),21.162,20.953,237.63%,19.300,15920.068,[CPU],time [us]
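
The values reported in the api table for this test case (20.953 μs and 123156 instr) match the Median column of the raw output above, not the Mean. A minimal parsing sketch follows; note that each data row carries eight comma-separated fields against seven header names, because the Type column is followed by a unit. The `parse` helper and its field names are illustrative assumptions.

```python
HEADER = "TestCase,Mean,Median,StdDev,Min,Max,Type"
LINES = [
    "SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=1),123132.624,123156.000,3.78%,122528.000,1585974.000,[CPU],hw instructions [count]",
    "SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=1),21.162,20.953,237.63%,19.300,15920.068,[CPU],time [us]",
]


def parse(line):
    """Split one result row; the test-case name here contains no commas."""
    name, mean, median, stddev, lowest, highest, *rest = line.split(",")
    return {
        "name": name,
        "mean": float(mean),
        "median": float(median),
        "stddev_pct": float(stddev.rstrip("%")),
        "min": float(lowest),
        "max": float(highest),
        "unit": rest[-1],  # trailing field after the Type column
    }


parsed = [parse(line) for line in LINES]
for row in parsed:
    print(f"{row['median']} ({row['unit']}), max/median = {row['max'] / row['median']:.1f}")
```

The timing row has a maximum several hundred times its median, so a single outlier dominates the standard deviation while leaving the median nearly untouched.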

github-actions · 2025-01-20T11:11:51Z

Compute Benchmarks level_zero run (with params: --filter "Velocity|memory_benchmark_sycl|SubmitKernel"):
https://github.com/oneapi-src/unified-runtime/actions/runs/12866751038

github-actions · 2025-01-20T11:26:40Z

Compute Benchmarks level_zero run (--filter "Velocity|memory_benchmark_sycl|SubmitKernel"):
https://github.com/oneapi-src/unified-runtime/actions/runs/12866751038
Job status: success. Test status: success.

Summary

Total 22 benchmarks in mean.
Geomean 103.257%.
Improved 7 Regressed 2 (threshold 2.00%)

(result is better)

Performance change in benchmark groups

Relative perf in group api (12): 99.953%

Benchmark	This PR	baseline	Relative perf	Change	-
api_overhead_benchmark_ur SubmitKernel in order	16.505000 μs	16.859 μs	102.14%	2.14%	.
api_overhead_benchmark_ur SubmitKernel in order with measure completion	21.064000 μs	21.425 μs	101.71%	1.71%	.
api_overhead_benchmark_l0 SubmitKernel out of order	11.733000 μs	11.848 μs	100.98%	0.98%	.
api_overhead_benchmark_ur SubmitKernel out of order CPU count	104523.000000 instr	105463.000 instr	100.90%	0.90%	.
api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count	123029.000000 instr	123991.000 instr	100.78%	0.78%	.
api_overhead_benchmark_ur SubmitKernel in order CPU count	109983.000000 instr	110815.000 instr	100.76%	0.76%	.
api_overhead_benchmark_sycl SubmitKernel in order	24.777000 μs	24.891 μs	100.46%	0.46%	.
api_overhead_benchmark_ur SubmitKernel out of order	15.656 μs	15.623000 μs	99.79%	-0.21%	.
api_overhead_benchmark_sycl SubmitKernel out of order	25.658 μs	23.710000 μs	92.41%	-7.59%	-
api_overhead_benchmark_l0 SubmitKernel in order	-	11.745000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024	-	2.143000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024	-	1.702000 μs

Relative perf in group memory (4): 115.598%

Benchmark	This PR	baseline	Relative perf	Change	-
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024	134.451000 μs	219.808 μs	163.49%	63.49%	++++++++++
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024	5.615000 μs	5.865 μs	104.45%	4.45%	+
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240	3.176000 GB/s	3.043 GB/s	104.37%	4.37%	+
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024	254.378000 μs	254.865 μs	100.19%	0.19%	.

Relative perf in group Velocity-Bench (9): 101.449%

Benchmark	This PR	baseline	Relative perf	Change	-
Velocity-Bench Easywave	235.000000 ms	289.000 ms	122.98%	22.98%	++++
Velocity-Bench svm	0.135800 s	0.140 s	103.17%	3.17%	.
Velocity-Bench Sobel Filter	608.972000 ms	621.173 ms	102.00%	2.00%	.
Velocity-Bench dl-cifar	23.819900 s	23.972 s	100.64%	0.64%	.
Velocity-Bench CudaSift	203.112000 ms	204.342 ms	100.61%	0.61%	.
Velocity-Bench Hashtable	356.246919 M keys/sec	356.084 M keys/sec	100.05%	0.05%	.
Velocity-Bench Bitcracker	35.105800 s	35.119 s	100.04%	0.04%	.
Velocity-Bench QuickSilver	117.360 MMS/CTT	117.450000 MMS/CTT	99.92%	-0.08%	.
Velocity-Bench dl-mnist	2.740 s	2.380000 s	86.86%	-13.14%	--

Relative perf in group miscellaneous (1): cannot calculate

Benchmark	This PR	baseline
miscellaneous_benchmark_sycl VectorSum	-	861.253000 bw GB/s

Relative perf in group multithread (10): cannot calculate

Benchmark	This PR	baseline
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1	-	6931.139000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1	-	17007.721000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1	-	47383.460000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1	-	2073.904000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1	-	7868.958000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1	-	9035.852000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1	-	27237.512000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1	-	1194.467000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events	-	42860.412000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events	-	113343.613000 μs

Relative perf in group Runtime (8): cannot calculate

Benchmark	This PR	baseline
Runtime_IndependentDAGTaskThroughput_SingleTask	-	253.100000 ms
Runtime_IndependentDAGTaskThroughput_BasicParallelFor	-	273.484000 ms
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor	-	271.662000 ms
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor	-	272.505000 ms
Runtime_DAGTaskThroughput_SingleTask	-	1691.410000 ms
Runtime_DAGTaskThroughput_BasicParallelFor	-	1756.502000 ms
Runtime_DAGTaskThroughput_HierarchicalParallelFor	-	1721.262000 ms
Runtime_DAGTaskThroughput_NDRangeParallelFor	-	1694.375000 ms

Relative perf in group MicroBench (14): cannot calculate

Benchmark	This PR	baseline
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous	-	5.188000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous	-	4.967000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous	-	4.769000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous	-	4.866000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous	-	618.226000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous	-	618.268000 ms
MicroBench_HostDeviceBandwidth_1D_H2D_Strided	-	4.919000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Strided	-	5.115000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Strided	-	5.140000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Strided	-	5.113000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Strided	-	617.772000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Strided	-	617.628000 ms
MicroBench_LocalMem_int32_4096	-	29.834000 ms
MicroBench_LocalMem_fp32_4096	-	29.857000 ms

Relative perf in group Pattern (10): cannot calculate

Benchmark	This PR	baseline
Pattern_Reduction_NDRange_int32	-	16.971000 ms
Pattern_Reduction_Hierarchical_int32	-	17.024000 ms
Pattern_SegmentedReduction_NDRange_int16	-	2.263000 ms
Pattern_SegmentedReduction_NDRange_int32	-	2.164000 ms
Pattern_SegmentedReduction_NDRange_int64	-	2.333000 ms
Pattern_SegmentedReduction_NDRange_fp32	-	2.163000 ms
Pattern_SegmentedReduction_Hierarchical_int16	-	11.801000 ms
Pattern_SegmentedReduction_Hierarchical_int32	-	11.587000 ms
Pattern_SegmentedReduction_Hierarchical_int64	-	11.777000 ms
Pattern_SegmentedReduction_Hierarchical_fp32	-	11.588000 ms

Relative perf in group ScalarProduct (6): cannot calculate

Benchmark	This PR	baseline
ScalarProduct_NDRange_int32	-	3.734000 ms
ScalarProduct_NDRange_int64	-	5.456000 ms
ScalarProduct_NDRange_fp32	-	3.767000 ms
ScalarProduct_Hierarchical_int32	-	10.555000 ms
ScalarProduct_Hierarchical_int64	-	11.508000 ms
ScalarProduct_Hierarchical_fp32	-	10.174000 ms

Relative perf in group USM (7): cannot calculate

Benchmark	This PR	baseline
USM_Allocation_latency_fp32_device	-	0.068000 ms
USM_Allocation_latency_fp32_host	-	37.633000 ms
USM_Allocation_latency_fp32_shared	-	0.057000 ms
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch	-	1.717000 ms
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch	-	1.085000 ms
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch	-	1.889000 ms
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch	-	1.256000 ms

Relative perf in group VectorAddition (3): cannot calculate

Benchmark	This PR	baseline
VectorAddition_int32	-	1.510000 ms
VectorAddition_int64	-	3.066000 ms
VectorAddition_fp32	-	1.460000 ms

Relative perf in group Polybench (3): cannot calculate

Benchmark	This PR	baseline
Polybench_2mm	-	1.221000 ms
Polybench_3mm	-	1.730000 ms
Polybench_Atax	-	6.855000 ms

Relative perf in group Kmeans (1): cannot calculate

Benchmark	This PR	baseline
Kmeans_fp32	-	16.091000 ms

Relative perf in group LinearRegressionCoeff (1): cannot calculate

Benchmark	This PR	baseline
LinearRegressionCoeff_fp32	-	908.423000 ms

Relative perf in group MolecularDynamics (1): cannot calculate

Benchmark	This PR	baseline
MolecularDynamics	-	0.030000 ms

Relative perf in group llama.cpp (6): cannot calculate

Benchmark	This PR	baseline
llama.cpp Prompt Processing Batched 128	-	830.457525 token/s
llama.cpp Text Generation Batched 128	-	62.530663 token/s
llama.cpp Prompt Processing Batched 256	-	872.219855 token/s
llama.cpp Text Generation Batched 256	-	62.524658 token/s
llama.cpp Prompt Processing Batched 512	-	426.427709 token/s
llama.cpp Text Generation Batched 512	-	62.477744 token/s

Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (4): cannot calculate

Benchmark	This PR	baseline
alloc/size:10000/0/4096/iterations:200000/threads:4 glibc	-	2475.310000 ns
alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider	-	2120.000000 ns
alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool<os_provider>	-	3068.370000 ns
alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool<os_provider>	-	283.309000 ns

Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (4): cannot calculate

Benchmark	This PR	baseline
alloc/size:10000/0/4096/iterations:200000/threads:1 glibc	-	706.837000 ns
alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider	-	197.281000 ns
alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool<os_provider>	-	268.948000 ns
alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool<os_provider>	-	213.433000 ns

Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (4): cannot calculate

Benchmark	This PR	baseline
alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc	-	1259.770000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider	-	1854.120000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool<os_provider>	-	3771.150000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool<os_provider>	-	253.839000 ns

Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (4): cannot calculate

Benchmark	This PR	baseline
alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc	-	726.627000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider	-	195.246000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool<os_provider>	-	308.264000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool<os_provider>	-	206.713000 ns

Relative perf in group alloc/min (4): cannot calculate

Benchmark	This PR	baseline
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc	-	803.081000 ns
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc	-	177.090000 ns
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool<os_provider>	-	978.697000 ns
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool<os_provider>	-	975.381000 ns

Relative perf in group multiple (12): cannot calculate

Benchmark	This PR	baseline
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc	-	33503.600000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc	-	4251.600000 ns
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc	-	141113.000000 ns
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc	-	30214.100000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool<os_provider>	-	1170470.000000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool<os_provider>	-	165011.000000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider	-	1151930.000000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider	-	145356.000000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool<os_provider>	-	42332.700000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool<os_provider>	-	15330.800000 ns
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool<os_provider>	-	75942.600000 ns
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool<os_provider>	-	25425.600000 ns

Output:

---------> BitCracker: BitLocker password cracking tool <---------

==================================
Retrieving Info

Reading hash file "/home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

              Attack

================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range:
npknpByH7N2m3OnLNH1X9DJxLrzIFWk
.....
dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!

time to subtract from total: 0.00392376 s
bitcracker - total time for whole calculation: 35.1058 s

Velocity-Bench CudaSift

Environment Variables:

Command:

/home/pmdk/bench_workdir/cudaSift/cudaSift

Output:

UNKN:

UNKN: ==================================================
UNKN: User input parameters:
UNKN: Trace: ../../inputData
UNKN: ==================================================
UNKN:

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1166 1261 31.659% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1234 1268 33.5053% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1059 1261 28.7537% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1218 1252 33.0709% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1261 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1212 1271 32.908% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1235 1269 33.5324% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1260 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1265 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1230 1265 33.3967% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1230 1262 33.3967% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1261 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1217 1253 33.0437% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1095 1271 29.7312% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1265 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1262 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1080 1266 29.3239% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1230 1264 33.3967% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1218 1257 33.0709% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1260 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1218 1254 33.0709% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1202 1262 32.6364% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1219 1252 33.098% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1238 1272 33.6139% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1262 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1234 1266 33.5053% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1267 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1219 1255 33.098% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1106 1254 30.0299% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1236 1269 33.5596% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1162 1263 31.5504% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1230 1265 33.3967% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1219 1250 33.098% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1220 1255 33.1252% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1227 1262 33.3152% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1224 1260 33.2338% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1214 1256 32.9623% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1231 1265 33.4238% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1219 1252 33.098% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1240 1274 33.6682% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1101 1264 29.8941% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1223 1258 33.2066% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1153 1254 31.306% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1163 1259 31.5775% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1239 1276 33.6411% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1106 1252 30.0299% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1260 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1030 1254 27.9663% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1123 1257 30.4914% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1241 1279 33.6954% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Avg workload time = 203.112 ms

Velocity-Bench Easywave

Environment Variables:

Command:

/home/pmdk/bench_workdir/easywave/easyWave_sycl -grid /home/pmdk/bench_workdir/data/easywave/examples/e2Asean.grd -source /home/pmdk/bench_workdir/data/easywave/examples/BengkuluSept2007.flt -time 120

Output:

MAIN: Starting SYCL main program
MAIN: Attempting to clean up previous eWave tsunami files
MAIN: Clean up completed
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.31294)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
MAIN: Program successfully completed

Velocity-Bench QuickSilver

Environment Variables:

QS_DEVICE=GPU

Command:

/home/pmdk/bench_workdir/QuickSilver/qs -i /home/pmdk/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Output:

Copyright (c) 2016
Lawrence Livermore National Security, LLC
All Rights Reserved
Quicksilver Version :
Quicksilver Git Hash :
MPI Version : 3.0
Number of MPI ranks : 1
Number of OpenMP Threads: 1
Number of OpenMP CPUs : 1

Loading params
Finished loading params
Simulation:
dt: 1e-08
fMax: 0.1
inputFile: /home/pmdk/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
energySpectrum:
boundaryCondition: octant
loadBalance: 1
cycleTimers: 0
debugThreads: 0
lx: 100
ly: 100
lz: 100
nParticles: 10000000
batchSize: 0
nBatches: 10
nSteps: 10
nx: 10
ny: 10
nz: 10
seed: 1029384756
xDom: 0
yDom: 0
zDom: 0
eMax: 20
eMin: 1e-09
nGroups: 230
lowWeightCutoff: 0.001
bTally: 1
fTally: 1
cTally: 1
coralBenchmark: 0
crossSectionsOut:

Geometry:
material: sourceMaterial
shape: brick
xMax: 100
xMin: 0
yMax: 100
yMin: 0
zMax: 100
zMin: 0

Material:
name: sourceMaterial
mass: 1000
nIsotopes: 10
nReactions: 9
sourceRate: 1e+10
totalCrossSection: 0.1
absorptionCrossSection: flat
fissionCrossSection: flat
scatteringCrossSection: flat
absorptionCrossSectionRatio: 0
fissionCrossSectionRatio: 0
scatteringCrossSectionRatio: 1

CrossSection:
name: flat
A: 0
B: 0
C: 0
D: 0
E: 1
nuBar: 2.4
setting GPU
setting parameters
Building partition 0
Building partition 1
Building partition 2
Building partition 3
Building MC_Domain 0
Building MC_Domain 1
Building MC_Domain 2
Building MC_Domain 3
Starting Consistency Check
Finished Consistency Check
Finished initMesh
Started copyMaterialDatabase_device
Finished copyMaterialDatabase_device
Finished copyNuclearData_device
Finished copyDomainDevice
cycle start source rr split absorb scatter fission produce collisn escape census num_seg scalar_flux cycleInit cycleTracking cycleFinalize
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 4.348360e-01 6.249330e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 3.390690e-01 7.631610e-01 0.000000e+00
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 3.378300e-01 7.792570e-01 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 3.701710e-01 8.456640e-01 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 3.350090e-01 7.885810e-01 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 3.355830e-01 7.634300e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 3.345930e-01 7.624400e-01 0.000000e+00
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 3.346920e-01 7.831200e-01 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 3.338210e-01 7.899690e-01 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 3.348170e-01 7.756750e-01 0.000000e+00

Timer Cumulative Cumulative Cumulative Cumulative Cumulative Cumulative
Name number microSecs microSecs microSecs microSecs Efficiency
of calls min avg max stddev Rating
main 1 1.117e+07 1.117e+07 1.117e+07 0.000e+00 100.00
cycleInit 10 3.490e+06 3.490e+06 3.490e+06 0.000e+00 100.00
cycleTracking 10 7.676e+06 7.676e+06 7.676e+06 0.000e+00 100.00
cycleTracking_Kernel 104 4.914e+06 4.914e+06 4.914e+06 0.000e+00 100.00
cycleTracking_MPI 117 2.079e+05 2.079e+05 2.079e+05 0.000e+00 100.00
cycleTracking_Test_Done 0 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.00
cycleFinalize 20 4.100e+02 4.100e+02 4.100e+02 0.000e+00 100.00
Figure Of Merit 117.36 [Num Mega Segments / Cycle Tracking Time]
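
The figure of merit can be cross-checked against the per-cycle table in the same log. The sketch below is not part of QuickSilver; it assumes, from the header row, that `num_seg` is the 13th column and `cycleTracking` the 16th, and that the per-cycle `cycleTracking` values are in seconds (their sum agrees with the 7.676e+06 microSecs in the timer table). Variable names are illustrative.

```python
# (num_seg, cycleTracking in seconds) for cycles 0..9, copied from the log above.
cycles = [
    (55527935, 6.249330e-01),
    (94633679, 7.631610e-01),
    (95010375, 7.792570e-01),
    (94953591, 8.456640e-01),
    (94599337, 7.885810e-01),
    (94148236, 7.634300e-01),
    (93689264, 7.624400e-01),
    (93216931, 7.831200e-01),
    (92768273, 7.899690e-01),
    (92324678, 7.756750e-01),
]

# "Num Mega Segments": total segments across all cycles, in millions.
total_mega_segments = sum(num_seg for num_seg, _ in cycles) / 1e6

# "Cycle Tracking Time": total time spent in cycleTracking, in seconds.
total_tracking_seconds = sum(seconds for _, seconds in cycles)

figure_of_merit = total_mega_segments / total_tracking_seconds
print(f"Figure Of Merit {figure_of_merit:.2f}")  # prints "Figure Of Merit 117.36"
```

Under these assumptions the result agrees with the value QuickSilver printed for this run.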

Velocity-Bench Sobel Filter

Environment Variables:

OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Command:

/home/pmdk/bench_workdir/sobel_filter/sobel_filter -i /home/pmdk/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Output:

SYMN: Welcome to the SYCL version of Sobel filter workload.
SYMN: Input image file: /home/pmdk/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png
SYMN: Launching SYCL kernel with # of iterations: 5
time to subtract from total: 7.53547 s
sobelfilter - total time for whole calculation: 0.608972 s

Velocity-Bench dl-cifar

Environment Variables:

Command:

/home/pmdk/bench_workdir/dl-cifar/dl-cifar_sycl

Output:

	Welcome to DL-CIFAR workload: SYCL version.

=======================================================================
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.31294)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.31294)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero

WL PARAMS:

WL PARAMS: ==================================================
WL PARAMS: User input parameters:
WL PARAMS: Trace: notrace
WL PARAMS: DL NW size type: WORKLOAD_DEFAULT_SIZE
WL PARAMS: ==================================================
WL PARAMS:

dataFileReadTimer->getTotalOpTime(): 8.5e-05 s
dl-cifar - total time for whole calculation: 23.8199 s

Velocity-Bench dl-mnist

Environment Variables:

NEOReadDebugKeys=1
DisableScratchPages=0

Command:

/home/pmdk/bench_workdir/dl-mnist/dl-mnist-sycl -conv_algo ONEDNN_AUTO

Output:

	Welcome to DL-MNIST workload: SYCL version.

=======================================================================
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.31294)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.31294)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero

WL PARAMS:

WL PARAMS: ==================================================
WL PARAMS: User input parameters:
WL PARAMS: Trace: notrace
WL PARAMS: Tensor management policy: per_layer
WL PARAMS: Convolution algorithm: ONEDNN_AUTO
WL PARAMS: Dataset reader format: NCHW
WL PARAMS: Dry run: YES
WL PARAMS: OneDNN Conv PD memory format: ONEDNN_CONVPD_ANY
WL PARAMS: No of iterations for inference: 400
WL PARAMS: ==================================================
WL PARAMS:

dl-mnist - total time for whole calculation: 2.74 s

Velocity-Bench svm

Environment Variables:

Command:

/home/pmdk/bench_workdir/svm/svm_sycl /home/pmdk/bench_workdir/velocity-bench-repo/svm/SYCL/a9a /home/pmdk/bench_workdir/velocity-bench-repo/svm/SYCL/a.m

Output:

Number of args 3
Using cuSVM (Carpenter)...

Buffering input text file (6989624 B).
Load Done
Starting Training
_C 1.000000
Workgroup Size: 1024
nbrCtas 80
elemsPerCta 1248
threadsPerCta 128
Total run time: 0.064632 seconds
Iter:100
M:97683
N:123
Train done. Calulate Vector counts
Training done

Loading elapsed time : 0.0636 s
Processing elapsed time : 0.0699 s
Storing elapsed time : 0.0023 s
Total elapsed time : 0.1358 s
Result's are correct: 0.0551

github-actions · 2025-01-20T14:28:56Z

Compute Benchmarks level_zero run (with params: ):
https://github.com/oneapi-src/unified-runtime/actions/runs/12870157551

github-actions · 2025-01-20T15:06:47Z

Compute Benchmarks level_zero run ():
https://github.com/oneapi-src/unified-runtime/actions/runs/12870157551
Job status: success. Test status: success.

Summary

Total 137 benchmarks in mean.
Geomean 100.146%.
Improved 15 Regressed 12 (threshold 2.00%)

(result is better)

Performance change in benchmark groups

Relative perf in group api (12): 100.818%

Benchmark	This PR	baseline	Relative perf	Change	-
api_overhead_benchmark_ur SubmitKernel in order	16.402000 μs	16.925 μs	103.19%	3.19%	+
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024	2.129000 μs	2.193 μs	103.01%	3.01%	+
api_overhead_benchmark_sycl SubmitKernel out of order	23.394000 μs	23.959 μs	102.42%	2.42%	+
api_overhead_benchmark_l0 SubmitKernel in order	11.556000 μs	11.680 μs	101.07%	1.07%	.
api_overhead_benchmark_sycl SubmitKernel in order	24.633000 μs	24.896 μs	101.07%	1.07%	.
api_overhead_benchmark_ur SubmitKernel out of order	15.687000 μs	15.814 μs	100.81%	0.81%	.
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024	1.701000 μs	1.708 μs	100.41%	0.41%	.
api_overhead_benchmark_ur SubmitKernel out of order CPU count	105443.000000 instr	105463.000 instr	100.02%	0.02%	.
api_overhead_benchmark_ur SubmitKernel in order CPU count	110795.000000 instr	110815.000 instr	100.02%	0.02%	.
api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count	124285.000 instr	123991.000000 instr	99.76%	-0.24%	.
api_overhead_benchmark_ur SubmitKernel in order with measure completion	21.687 μs	21.536000 μs	99.30%	-0.70%	.
api_overhead_benchmark_l0 SubmitKernel out of order	11.968 μs	11.830000 μs	98.85%	-1.15%	.

Relative perf in group memory (4): 99.375%

Benchmark	This PR	baseline	Relative perf	Change	-
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024	253.995000 μs	258.713 μs	101.86%	1.86%	.
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240	3.078000 GB/s	3.040 GB/s	101.25%	1.25%	.
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024	225.911 μs	219.696000 μs	97.25%	-2.75%	-
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024	5.970 μs	5.805000 μs	97.24%	-2.76%	-
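
The numbers in this table are consistent with "Relative perf" being baseline / PR for time-like units and PR / baseline for throughput units, and with the group figure being the geometric mean of the per-benchmark ratios. The sketch below is a reconstruction from the reported values, not the benchmark script itself; the function names and the set of higher-is-better units are assumptions.

```python
import math

# Units where a larger value is better (assumed from the tables in this report).
# Anything else is treated as a time, where a smaller value is better.
HIGHER_IS_BETTER_UNITS = {"GB/s", "token/s", "M keys/sec", "MMS/CTT"}


def relative_perf(pr_value, baseline_value, unit):
    """Return the PR's performance relative to baseline (1.0 means equal)."""
    if unit in HIGHER_IS_BETTER_UNITS:
        return pr_value / baseline_value
    return baseline_value / pr_value


def geomean(ratios):
    """Geometric mean, computed in log space to avoid overflow on long lists."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# (This PR, baseline, unit) for the four rows of the memory group above.
memory_rows = [
    (253.995, 258.713, "μs"),
    (3.078, 3.040, "GB/s"),
    (225.911, 219.696, "μs"),
    (5.970, 5.805, "μs"),
]

ratios = [relative_perf(pr, base, unit) for pr, base, unit in memory_rows]
for ratio in ratios:
    print(f"{ratio * 100:.2f}%")
print(f"group: {geomean(ratios) * 100:.3f}%")
```

With the rounded inputs from the table, the per-row ratios reproduce the "Relative perf" column, and the group value comes out within rounding of the reported 99.375%.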

Relative perf in group miscellaneous (1): 99.932%

Benchmark	This PR	baseline	Relative perf	Change	-
miscellaneous_benchmark_sycl VectorSum	860.664 bw GB/s	860.076000 bw GB/s	99.93%	-0.07%	.

Relative perf in group multithread (10): 99.322%

Benchmark	This PR	baseline	Relative perf	Change	-
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events	110692.272000 μs	111451.459 μs	100.69%	0.69%	.
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1	2034.738000 μs	2042.798 μs	100.40%	0.40%	.
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1	26733.793000 μs	26812.066 μs	100.29%	0.29%	.
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1	6941.165000 μs	6959.764 μs	100.27%	0.27%	.
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1	8887.982 μs	8876.618000 μs	99.87%	-0.13%	.
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1	47448.720 μs	46954.509000 μs	98.96%	-1.04%	.
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1	7879.238 μs	7786.575000 μs	98.82%	-1.18%	.
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1	1208.126 μs	1193.812000 μs	98.82%	-1.18%	.
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events	43105.091 μs	42531.905000 μs	98.67%	-1.33%	.
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1	17601.132 μs	16985.886000 μs	96.50%	-3.50%	-

Relative perf in group graph (10): 100.175%

Benchmark	This PR	baseline	Relative perf	Change	-
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:100	56587.543000 μs	57360.560 μs	101.37%	1.37%	.
graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:0, numKernels:10	5588.057000 μs	5639.481 μs	100.92%	0.92%	.
graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:10	72589.318000 μs	72707.625 μs	100.16%	0.16%	.
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:10	71756.219000 μs	71873.055 μs	100.16%	0.16%	.
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:10	5598.098000 μs	5600.217 μs	100.04%	0.04%	.
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100	353396.532000 μs	353438.463 μs	100.01%	0.01%	.
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:10	62.392000 μs	62.395 μs	100.00%	0.00%	.
graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100	353207.353 μs	353207.153000 μs	100.00%	-0.00%	.
graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:1, numKernels:10	54.117 μs	53.918000 μs	99.63%	-0.37%	.
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:100	679.483 μs	675.864000 μs	99.47%	-0.53%	.

Relative perf in group Velocity-Bench (9): 102.320%

Benchmark	This PR	baseline	Relative perf	Change	-
Velocity-Bench Easywave	235.000000 ms	291.000 ms	123.83%	23.83%	++++++++++
Velocity-Bench dl-cifar	23.582200 s	23.759 s	100.75%	0.75%	.
Velocity-Bench svm	0.140900 s	0.142 s	100.50%	0.50%	.
Velocity-Bench CudaSift	204.304000 ms	204.802 ms	100.24%	0.24%	.
Velocity-Bench QuickSilver	117.510 MMS/CTT	117.680000 MMS/CTT	99.86%	-0.14%	.
Velocity-Bench Hashtable	353.075 M keys/sec	353.997735 M keys/sec	99.74%	-0.26%	.
Velocity-Bench Bitcracker	35.640 s	35.494200 s	99.59%	-0.41%	.
Velocity-Bench dl-mnist	2.390 s	2.380000 s	99.58%	-0.42%	.
Velocity-Bench Sobel Filter	621.619 ms	615.543000 ms	99.02%	-0.98%	.

Relative perf in group Runtime (8): 100.326%

Benchmark	This PR	baseline	Relative perf	Change	-
Runtime_DAGTaskThroughput_HierarchicalParallelFor	1704.355000 ms	1721.794 ms	101.02%	1.02%	.
Runtime_DAGTaskThroughput_BasicParallelFor	1741.403000 ms	1754.165 ms	100.73%	0.73%	.
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor	271.586000 ms	273.534 ms	100.72%	0.72%	.
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor	272.753000 ms	274.696 ms	100.71%	0.71%	.
Runtime_DAGTaskThroughput_SingleTask	1672.866000 ms	1684.141 ms	100.67%	0.67%	.
Runtime_DAGTaskThroughput_NDRangeParallelFor	1677.413000 ms	1684.655 ms	100.43%	0.43%	.
Runtime_IndependentDAGTaskThroughput_SingleTask	255.285000 ms	255.725 ms	100.17%	0.17%	.
Runtime_IndependentDAGTaskThroughput_BasicParallelFor	281.035 ms	275.902000 ms	98.17%	-1.83%	.

Relative perf in group MicroBench (14): 99.928%

Benchmark	This PR	baseline	Relative perf	Change	-
MicroBench_HostDeviceBandwidth_1D_H2D_Strided	4.754000 ms	4.876 ms	102.57%	2.57%	+
MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous	4.676000 ms	4.702 ms	100.56%	0.56%	.
MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous	4.792000 ms	4.814 ms	100.46%	0.46%	.
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous	4.899000 ms	4.911 ms	100.24%	0.24%	.
MicroBench_LocalMem_int32_4096	29.845000 ms	29.859 ms	100.05%	0.05%	.
MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous	618.153000 ms	618.182 ms	100.00%	0.00%	.
MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous	618.165000 ms	618.175 ms	100.00%	0.00%	.
MicroBench_HostDeviceBandwidth_3D_D2H_Strided	617.551 ms	617.548000 ms	100.00%	-0.00%	.
MicroBench_HostDeviceBandwidth_2D_D2H_Strided	617.549 ms	617.536000 ms	100.00%	-0.00%	.
MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous	4.796 ms	4.794000 ms	99.96%	-0.04%	.
MicroBench_LocalMem_fp32_4096	29.894 ms	29.835000 ms	99.80%	-0.20%	.
MicroBench_HostDeviceBandwidth_3D_H2D_Strided	5.055 ms	5.026000 ms	99.43%	-0.57%	.
MicroBench_HostDeviceBandwidth_2D_H2D_Strided	5.195 ms	5.125000 ms	98.65%	-1.35%	.
MicroBench_HostDeviceBandwidth_1D_D2H_Strided	4.984 ms	4.852000 ms	97.35%	-2.65%	-

Relative perf in group Pattern (10): 99.859%

Benchmark	This PR	baseline	Relative perf	Change	-
Pattern_Reduction_Hierarchical_int32	16.846000 ms	16.851 ms	100.03%	0.03%	.
Pattern_SegmentedReduction_Hierarchical_int64	11.773000 ms	11.774 ms	100.01%	0.01%	.
Pattern_SegmentedReduction_NDRange_int16	2.265000 ms	2.265 ms	100.00%	0.00%	.
Pattern_SegmentedReduction_Hierarchical_fp32	11.592 ms	11.590000 ms	99.98%	-0.02%	.
Pattern_SegmentedReduction_Hierarchical_int16	11.809 ms	11.804000 ms	99.96%	-0.04%	.
Pattern_SegmentedReduction_Hierarchical_int32	11.591 ms	11.585000 ms	99.95%	-0.05%	.
Pattern_SegmentedReduction_NDRange_int64	2.339 ms	2.337000 ms	99.91%	-0.09%	.
Pattern_SegmentedReduction_NDRange_int32	2.172 ms	2.170000 ms	99.91%	-0.09%	.
Pattern_SegmentedReduction_NDRange_fp32	2.168 ms	2.164000 ms	99.82%	-0.18%	.
Pattern_Reduction_NDRange_int32	16.767 ms	16.604000 ms	99.03%	-0.97%	.

Relative perf in group ScalarProduct (6): 100.128%

Benchmark	This PR	baseline	Relative perf	Change	-
ScalarProduct_NDRange_int32	3.759000 ms	3.777 ms	100.48%	0.48%	.
ScalarProduct_NDRange_int64	5.476000 ms	5.499 ms	100.42%	0.42%	.
ScalarProduct_Hierarchical_int64	11.514000 ms	11.530 ms	100.14%	0.14%	.
ScalarProduct_NDRange_fp32	3.756000 ms	3.757 ms	100.03%	0.03%	.
ScalarProduct_Hierarchical_int32	10.546 ms	10.537000 ms	99.91%	-0.09%	.
ScalarProduct_Hierarchical_fp32	10.166 ms	10.145000 ms	99.79%	-0.21%	.

Relative perf in group USM (7): 97.661%

Benchmark	This PR	baseline	Relative perf	Change	-
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch	1.871000 ms	1.883 ms	100.64%	0.64%	.
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch	1.091000 ms	1.095 ms	100.37%	0.37%	.
USM_Allocation_latency_fp32_shared	0.057000 ms	0.057 ms	100.00%	0.00%	.
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch	1.252 ms	1.249000 ms	99.76%	-0.24%	.
USM_Allocation_latency_fp32_host	37.607 ms	37.401000 ms	99.45%	-0.55%	.
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch	1.718 ms	1.703000 ms	99.13%	-0.87%	.
USM_Allocation_latency_fp32_device	0.068 ms	0.058000 ms	85.29%	-14.71%	------

Relative perf in group VectorAddition (3): 99.674%

Benchmark	This PR	baseline	Relative perf	Change	-
VectorAddition_int32	1.474000 ms	1.476 ms	100.14%	0.14%	.
VectorAddition_int64	3.071000 ms	3.072 ms	100.03%	0.03%	.
VectorAddition_fp32	1.490 ms	1.473000 ms	98.86%	-1.14%	.

Relative perf in group Polybench (3): 99.250%

Benchmark	This PR	baseline	Relative perf	Change	-
Polybench_2mm	1.211000 ms	1.215 ms	100.33%	0.33%	.
Polybench_3mm	1.733 ms	1.729000 ms	99.77%	-0.23%	.
Polybench_Atax	6.872 ms	6.712000 ms	97.67%	-2.33%	-

Relative perf in group Kmeans (1): 99.876%

Benchmark	This PR	baseline	Relative perf	Change	-
Kmeans_fp32	16.111 ms	16.091000 ms	99.88%	-0.12%	.

Relative perf in group LinearRegressionCoeff (1): cannot calculate

Benchmark	This PR	baseline	Relative perf	Change	-
LinearRegressionCoeff_fp32	946.725000 ms	-

Relative perf in group MolecularDynamics (1): 103.333%

Benchmark	This PR	baseline	Relative perf	Change	-
MolecularDynamics	0.030000 ms	0.031 ms	103.33%	3.33%	+

Relative perf in group llama.cpp (6): 100.149%

Benchmark	This PR	baseline	Relative perf	Change	-
llama.cpp Prompt Processing Batched 512	432.355821 token/s	428.269 token/s	100.95%	0.95%	.
llama.cpp Prompt Processing Batched 128	822.752124 token/s	820.039 token/s	100.33%	0.33%	.
llama.cpp Text Generation Batched 512	62.578197 token/s	62.521 token/s	100.09%	0.09%	.
llama.cpp Text Generation Batched 256	62.551 token/s	62.555854 token/s	99.99%	-0.01%	.
llama.cpp Text Generation Batched 128	62.500 token/s	62.541124 token/s	99.93%	-0.07%	.
llama.cpp Prompt Processing Batched 256	865.855 token/s	869.339926 token/s	99.60%	-0.40%	.

Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (4): 100.540%

Benchmark	This PR	baseline	Relative perf	Change	-
alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider	2101.150000 ns	2178.240 ns	103.67%	3.67%	++
alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool<os_provider>	302.779000 ns	303.268 ns	100.16%	0.16%	.
alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool<os_provider>	3081.310 ns	3080.300000 ns	99.97%	-0.03%	.
alloc/size:10000/0/4096/iterations:200000/threads:4 glibc	2747.770 ns	2704.770000 ns	98.44%	-1.56%	.

Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (4): 100.095%

Benchmark	This PR	baseline	Relative perf	Change	-
alloc/size:10000/0/4096/iterations:200000/threads:1 glibc	699.049000 ns	722.081 ns	103.29%	3.29%	+
alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool<os_provider>	274.456 ns	273.931000 ns	99.81%	-0.19%	.
alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool<os_provider>	215.631 ns	213.298000 ns	98.92%	-1.08%	.
alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider	193.666 ns	190.623000 ns	98.43%	-1.57%	.

Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (4): 99.893%

Benchmark	This PR	baseline	Relative perf	Change	-
alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider	1878.420000 ns	1927.360 ns	102.61%	2.61%	+
alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool<os_provider>	266.920 ns	266.688000 ns	99.91%	-0.09%	.
alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc	1447.100 ns	1440.300000 ns	99.53%	-0.47%	.
alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool<os_provider>	3435.180 ns	3352.290000 ns	97.59%	-2.41%	-

Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (4): 103.902%

Benchmark	This PR	baseline	Relative perf	Change	-
alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc	747.634000 ns	836.799 ns	111.93%	11.93%	+++++
alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider	190.321000 ns	199.070 ns	104.60%	4.60%	++
alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool<os_provider>	205.886000 ns	206.587 ns	100.34%	0.34%	.
alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool<os_provider>	308.951 ns	306.518000 ns	99.21%	-0.79%	.

Relative perf in group alloc/min (4): 100.425%

Benchmark	This PR	baseline	Relative perf	Change	-
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool<os_provider>	1035.670000 ns	1068.480 ns	103.17%	3.17%	+
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc	174.121000 ns	176.254 ns	101.23%	1.23%	.
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool<os_provider>	972.724 ns	963.990000 ns	99.10%	-0.90%	.
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc	834.343 ns	819.953000 ns	98.28%	-1.72%	.

Relative perf in group multiple (12): 99.406%

Benchmark	This PR	baseline	Relative perf	Change	-
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc	30334.100000 ns	31773.300 ns	104.74%	4.74%	++
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool<os_provider>	14974.000000 ns	15455.100 ns	103.21%	3.21%	+
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc	4141.490000 ns	4273.390 ns	103.18%	3.18%	+
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool<os_provider>	41802.500000 ns	42492.700 ns	101.65%	1.65%	.
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider	1215620.000 ns	1213060.000000 ns	99.79%	-0.21%	.
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool<os_provider>	166132.000 ns	164285.000000 ns	98.89%	-1.11%	.
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc	141286.000 ns	139112.000000 ns	98.46%	-1.54%	.
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc	32428.800 ns	31736.100000 ns	97.86%	-2.14%	-
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool<os_provider>	26177.100 ns	25433.600000 ns	97.16%	-2.84%	-
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool<os_provider>	1206500.000 ns	1168340.000000 ns	96.84%	-3.16%	-
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool<os_provider>	76653.200 ns	73568.200000 ns	95.98%	-4.02%	--
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider	145134.000 ns	138786.000000 ns	95.63%	-4.37%	--

Output:

---------> BitCracker: BitLocker password cracking tool <---------

==================================
Retrieving Info

Reading hash file "/home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

              Attack

================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range:
npknpByH7N2m3OnLNH1X9DJxLrzIFWk
.....
dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!

time to subtract from total: 0.00384029 s
bitcracker - total time for whole calculation: 35.6396 s

Velocity-Bench CudaSift

Environment Variables:

Command:

/home/pmdk/bench_workdir/cudaSift/cudaSift

Output:

UNKN:

UNKN: ==================================================
UNKN: User input parameters:
UNKN: Trace: ../../inputData
UNKN: ==================================================
UNKN:

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1225 1257 33.2609% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1121 1266 30.4371% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1267 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1221 1255 33.1523% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1223 1256 33.2066% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1036 1258 28.1292% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1220 1250 33.1252% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1112 1270 30.1928% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1265 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1068 1263 28.9981% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1259 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1099 1256 29.8398% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1112 1251 30.1928% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1209 1273 32.8265% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1262 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1069 1266 29.0253% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1223 1261 33.2066% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1237 1271 33.5868% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1225 1260 33.2609% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1091 1249 29.6226% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1040 1262 28.2379% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1243 1276 33.7497% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1222 1255 33.1795% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1222 1256 33.1795% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1208 1259 32.7993% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1239 1273 33.6411% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1152 1270 31.2788% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1090 1253 29.5954% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1118 1256 30.3557% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1098 1272 29.8127% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1227 1263 33.3152% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1103 1269 29.9484% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1224 1256 33.2338% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1230 1270 33.3967% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1270 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1163 1268 31.5775% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1234 1267 33.5053% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1104 1269 29.9756% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1068 1262 28.9981% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1100 1256 29.867% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1225 1263 33.2609% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1114 1260 30.2471% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1193 1264 32.3921% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1261 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1167 1254 31.6861% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1262 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1242 1280 33.7225% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1088 1255 29.5411% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1084 1269 29.4325% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1160 1258 31.4961% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Avg workload time = 204.304 ms

Velocity-Bench Easywave

Environment Variables:

Command:

/home/pmdk/bench_workdir/easywave/easyWave_sycl -grid /home/pmdk/bench_workdir/data/easywave/examples/e2Asean.grd -source /home/pmdk/bench_workdir/data/easywave/examples/BengkuluSept2007.flt -time 120

Output:

MAIN: Starting SYCL main program
MAIN: Attempting to clean up previous eWave tsunami files
MAIN: Clean up completed
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.0)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
MAIN: Program successfully completed

Velocity-Bench QuickSilver

Environment Variables:

QS_DEVICE=GPU

Command:

/home/pmdk/bench_workdir/QuickSilver/qs -i /home/pmdk/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Output:

Copyright (c) 2016
Lawrence Livermore National Security, LLC
All Rights Reserved
Quicksilver Version :
Quicksilver Git Hash :
MPI Version : 3.0
Number of MPI ranks : 1
Number of OpenMP Threads: 1
Number of OpenMP CPUs : 1

Loading params
Finished loading params
Simulation:
dt: 1e-08
fMax: 0.1
inputFile: /home/pmdk/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
energySpectrum:
boundaryCondition: octant
loadBalance: 1
cycleTimers: 0
debugThreads: 0
lx: 100
ly: 100
lz: 100
nParticles: 10000000
batchSize: 0
nBatches: 10
nSteps: 10
nx: 10
ny: 10
nz: 10
seed: 1029384756
xDom: 0
yDom: 0
zDom: 0
eMax: 20
eMin: 1e-09
nGroups: 230
lowWeightCutoff: 0.001
bTally: 1
fTally: 1
cTally: 1
coralBenchmark: 0
crossSectionsOut:

Geometry:
material: sourceMaterial
shape: brick
xMax: 100
xMin: 0
yMax: 100
yMin: 0
zMax: 100
zMin: 0

Material:
name: sourceMaterial
mass: 1000
nIsotopes: 10
nReactions: 9
sourceRate: 1e+10
totalCrossSection: 0.1
absorptionCrossSection: flat
fissionCrossSection: flat
scatteringCrossSection: flat
absorptionCrossSectionRatio: 0
fissionCrossSectionRatio: 0
scatteringCrossSectionRatio: 1

CrossSection:
name: flat
A: 0
B: 0
C: 0
D: 0
E: 1
nuBar: 2.4
setting GPU
setting parameters
Building partition 0
Building partition 1
Building partition 2
Building partition 3
Building MC_Domain 0
Building MC_Domain 1
Building MC_Domain 2
Building MC_Domain 3
Starting Consistency Check
Finished Consistency Check
Finished initMesh
Started copyMaterialDatabase_device
Finished copyMaterialDatabase_device
Finished copyNuclearData_device
Finished copyDomainDevice
cycle start source rr split absorb scatter fission produce collisn escape census num_seg scalar_flux cycleInit cycleTracking cycleFinalize
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 3.770570e-01 6.237680e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 3.395570e-01 7.631070e-01 0.000000e+00
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 3.363820e-01 7.788010e-01 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 3.697570e-01 8.329790e-01 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 3.344320e-01 7.885440e-01 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 3.334600e-01 7.641250e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 3.340610e-01 7.632370e-01 0.000000e+00
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 3.333700e-01 7.844630e-01 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 3.326560e-01 7.909960e-01 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 3.329760e-01 7.761500e-01 0.000000e+00

Timer Cumulative Cumulative Cumulative Cumulative Cumulative Cumulative
Name number microSecs microSecs microSecs microSecs Efficiency
of calls min avg max stddev Rating
main 1 1.109e+07 1.109e+07 1.109e+07 0.000e+00 100.00
cycleInit 10 3.424e+06 3.424e+06 3.424e+06 0.000e+00 100.00
cycleTracking 10 7.666e+06 7.666e+06 7.666e+06 0.000e+00 100.00
cycleTracking_Kernel 104 4.920e+06 4.920e+06 4.920e+06 0.000e+00 100.00
cycleTracking_MPI 117 1.940e+05 1.940e+05 1.940e+05 0.000e+00 100.00
cycleTracking_Test_Done 0 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.00
cycleFinalize 20 3.980e+02 3.980e+02 3.980e+02 0.000e+00 100.00
Figure Of Merit 117.51 [Num Mega Segments / Cycle Tracking Time]

Velocity-Bench Sobel Filter

Environment Variables:

OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Command:

/home/pmdk/bench_workdir/sobel_filter/sobel_filter -i /home/pmdk/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Output:

SYMN: Welcome to the SYCL version of Sobel filter workload.
SYMN: Input image file: /home/pmdk/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png
SYMN: Launching SYCL kernel with # of iterations: 5
time to subtract from total: 7.54239 s
sobelfilter - total time for whole calculation: 0.621619 s

Velocity-Bench dl-cifar

Environment Variables:

Command:

/home/pmdk/bench_workdir/dl-cifar/dl-cifar_sycl

Output:

	Welcome to DL-CIFAR workload: SYCL version.

=======================================================================
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.0)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.0)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero

WL PARAMS:

WL PARAMS: ==================================================
WL PARAMS: User input parameters:
WL PARAMS: Trace: notrace
WL PARAMS: DL NW size type: WORKLOAD_DEFAULT_SIZE
WL PARAMS: ==================================================
WL PARAMS:

dataFileReadTimer->getTotalOpTime(): 8.4e-05 s
dl-cifar - total time for whole calculation: 23.5822 s

Velocity-Bench dl-mnist

Environment Variables:

NEOReadDebugKeys=1
DisableScratchPages=0

Command:

/home/pmdk/bench_workdir/dl-mnist/dl-mnist-sycl -conv_algo ONEDNN_AUTO

Output:

	Welcome to DL-MNIST workload: SYCL version.

=======================================================================
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.0)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.6.0)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero

WL PARAMS:

WL PARAMS: ==================================================
WL PARAMS: User input parameters:
WL PARAMS: Trace: notrace
WL PARAMS: Tensor management policy: per_layer
WL PARAMS: Convolution algorithm: ONEDNN_AUTO
WL PARAMS: Dataset reader format: NCHW
WL PARAMS: Dry run: YES
WL PARAMS: OneDNN Conv PD memory format: ONEDNN_CONVPD_ANY
WL PARAMS: No of iterations for inference: 400
WL PARAMS: ==================================================
WL PARAMS:

dl-mnist - total time for whole calculation: 2.39 s

Velocity-Bench svm

Environment Variables:

Command:

/home/pmdk/bench_workdir/svm/svm_sycl /home/pmdk/bench_workdir/velocity-bench-repo/svm/SYCL/a9a /home/pmdk/bench_workdir/velocity-bench-repo/svm/SYCL/a.m

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

Output:

name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,2425.59,1834.06,ns,,,,,
"glibc/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,699.049,699.052,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1235.82,1187.64,ns,,,,,
"glibc/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,747.634,747.636,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,786.691,759.934,ns,,,,,
"glibc/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,173.599,173.601,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,1759.39,1758.87,ns,,,,,
"os_provider/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,188.753,188.748,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,1839.45,1838.58,ns,,,,,
"os_provider/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,193.229,193.223,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,3480.42,3454.72,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,283.232,283.174,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,3435.18,3389.11,ns,,,,,
"proxy_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,315.928,315.921,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:4",800000,302.779,294.883,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/0/4096/iterations:200000/threads:1",200000,215.631,215.597,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:4",800000,269.363,261.992,ns,,,,,
"scalable_pool<os_provider>/alloc/size:10000/100000/4096/iterations:200000/threads:1",200000,207.218,207.218,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4",800000,1084.56,1076.55,ns,,,,,
"scalable_pool<os_provider>/alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1",200000,948.116,948.064,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,30483.7,28922.7,ns,,,,,
"glibc/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,4097.34,4097.21,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,139796,88907.8,ns,,,,,
"glibc/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,30334.1,30333.9,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.39495e+06,1.39427e+06,ns,,,,,
"proxy_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,168558,168557,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,1.41112e+06,1.40808e+06,ns,,,,,
"os_provider/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,152141,152138,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:4",8000,41409.5,40791.1,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/size:10000/4096/iterations:2000/threads:1",2000,14493.2,14492.7,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4",8000,76653.2,75841.2,ns,,,,,
"scalable_pool<os_provider>/multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1",2000,26177.1,26176.6,ns,,,,,

github-actions · 2025-01-23T10:37:14Z

Compute Benchmarks level_zero run (with params: --filter "QueueInOrderMemcpy"):
https://github.com/oneapi-src/unified-runtime/actions/runs/12927369918

github-actions · 2025-01-23T10:43:21Z

Compute Benchmarks level_zero run (--filter "QueueInOrderMemcpy"):
https://github.com/oneapi-src/unified-runtime/actions/runs/12927369918
Job status: success. Test status: success.

Summary

Total 2 benchmarks in mean.
Geomean 128.146%.
Improved 1 Regressed 0 (threshold 2.00%)

(result is better)

Performance change in benchmark groups

Relative perf in group memory (4): 128.146%

Benchmark	This PR	baseline	Relative perf	Change	-
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024	134.572000 μs	218.485 μs	162.36%	62.36%	++++++++++
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024	250.326000 μs	253.192 μs	101.14%	1.14%	.
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024	-	5.831000 μs
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240	-	3.082000 GB/s
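
The summary figures for this run can be reproduced from the two rows that have results in both columns. The sketch below assumes the geomean is the geometric mean of the per-row relative performance, and that a row counts as improved or regressed only when its change exceeds the 2.00% threshold. Both assumptions are inferred from the reported numbers, and the variable names are hypothetical:

```python
import math

THRESHOLD = 2.0  # percent, as stated in the summary


def geomean(values):
    """Geometric mean, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# (This PR in μs, baseline in μs) for the two QueueInOrderMemcpy rows above.
pairs = [(134.572, 218.485), (250.326, 253.192)]

# Time is lower-is-better, so relative perf is baseline / This PR.
rel = [base / pr * 100.0 for pr, base in pairs]

improved = sum(1 for r in rel if r - 100.0 > THRESHOLD)
regressed = sum(1 for r in rel if 100.0 - r > THRESHOLD)

print(f"Geomean {geomean(rel):.3f}%, improved {improved}, regressed {regressed}")
```

Rows where the This PR column is "-" were not run under the `--filter "QueueInOrderMemcpy"` selection and are left out of the calculation here.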

Relative perf in group api (12): cannot calculate

Benchmark	This PR	baseline
api_overhead_benchmark_l0 SubmitKernel out of order	-	11.741000 μs
api_overhead_benchmark_l0 SubmitKernel in order	-	11.419000 μs
api_overhead_benchmark_sycl SubmitKernel out of order	-	23.423000 μs
api_overhead_benchmark_sycl SubmitKernel in order	-	24.707000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024	-	2.161000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024	-	1.705000 μs
api_overhead_benchmark_ur SubmitKernel out of order CPU count	-	105463.000000 instr
api_overhead_benchmark_ur SubmitKernel out of order	-	15.763000 μs
api_overhead_benchmark_ur SubmitKernel in order CPU count	-	110815.000000 instr
api_overhead_benchmark_ur SubmitKernel in order	-	16.783000 μs
api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count	-	123991.000000 instr
api_overhead_benchmark_ur SubmitKernel in order with measure completion	-	21.455000 μs

Relative perf in group miscellaneous (1): cannot calculate

Benchmark	This PR	baseline
miscellaneous_benchmark_sycl VectorSum	-	859.195000 bw GB/s

Relative perf in group multithread (10): cannot calculate

Benchmark	This PR	baseline
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1	-	6931.007000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1	-	17308.085000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1	-	47028.145000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1	-	2056.843000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1	-	7746.524000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1	-	8886.251000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1	-	26759.017000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1	-	1198.342000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events	-	42714.145000 μs
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events	-	114836.190000 μs

Relative perf in group graph (10): cannot calculate

Benchmark	This PR	baseline
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:10	-	71756.948000 μs
graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:10	-	72702.628000 μs
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100	-	353472.301000 μs
graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100	-	353385.778000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:1, numKernels:10	-	53.970000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:10	-	62.094000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:100	-	678.072000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:0, numKernels:10	-	5605.418000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:10	-	5585.609000 μs
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:100	-	57222.637000 μs

Relative perf in group Velocity-Bench (9): cannot calculate

Benchmark	This PR	baseline
Velocity-Bench Hashtable	-	352.453804 M keys/sec
Velocity-Bench Bitcracker	-	35.732500 s
Velocity-Bench CudaSift	-	204.556000 ms
Velocity-Bench Easywave	-	233.000000 ms
Velocity-Bench QuickSilver	-	117.650000 MMS/CTT
Velocity-Bench Sobel Filter	-	625.818000 ms
Velocity-Bench dl-cifar	-	23.906300 s
Velocity-Bench dl-mnist	-	2.390000 s
Velocity-Bench svm	-	0.138800 s

Relative perf in group Runtime (8): cannot calculate

Benchmark	This PR	baseline
Runtime_IndependentDAGTaskThroughput_SingleTask	-	258.612000 ms
Runtime_IndependentDAGTaskThroughput_BasicParallelFor	-	287.688000 ms
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor	-	275.171000 ms
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor	-	276.140000 ms
Runtime_DAGTaskThroughput_SingleTask	-	1694.837000 ms
Runtime_DAGTaskThroughput_BasicParallelFor	-	1764.397000 ms
Runtime_DAGTaskThroughput_HierarchicalParallelFor	-	1738.675000 ms
Runtime_DAGTaskThroughput_NDRangeParallelFor	-	1707.643000 ms

Relative perf in group MicroBench (14): cannot calculate

Benchmark	This PR	baseline
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous	-	4.852000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous	-	4.832000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous	-	4.719000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous	-	4.795000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous	-	618.126000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous	-	618.173000 ms
MicroBench_HostDeviceBandwidth_1D_H2D_Strided	-	4.825000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Strided	-	5.159000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Strided	-	5.103000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Strided	-	5.031000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Strided	-	617.702000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Strided	-	617.530000 ms
MicroBench_LocalMem_int32_4096	-	29.834000 ms
MicroBench_LocalMem_fp32_4096	-	29.869000 ms

Relative perf in group Pattern (10): cannot calculate

Benchmark	This PR	baseline
Pattern_Reduction_NDRange_int32	-	17.062000 ms
Pattern_Reduction_Hierarchical_int32	-	16.673000 ms
Pattern_SegmentedReduction_NDRange_int16	-	2.267000 ms
Pattern_SegmentedReduction_NDRange_int32	-	2.169000 ms
Pattern_SegmentedReduction_NDRange_int64	-	2.337000 ms
Pattern_SegmentedReduction_NDRange_fp32	-	2.163000 ms
Pattern_SegmentedReduction_Hierarchical_int16	-	11.805000 ms
Pattern_SegmentedReduction_Hierarchical_int32	-	11.582000 ms
Pattern_SegmentedReduction_Hierarchical_int64	-	11.765000 ms
Pattern_SegmentedReduction_Hierarchical_fp32	-	11.586000 ms

Relative perf in group ScalarProduct (6): cannot calculate

Benchmark	This PR	baseline
ScalarProduct_NDRange_int32	-	3.731000 ms
ScalarProduct_NDRange_int64	-	5.455000 ms
ScalarProduct_NDRange_fp32	-	3.759000 ms
ScalarProduct_Hierarchical_int32	-	10.530000 ms
ScalarProduct_Hierarchical_int64	-	11.517000 ms
ScalarProduct_Hierarchical_fp32	-	10.158000 ms

Relative perf in group USM (7): cannot calculate

Benchmark	This PR	baseline
USM_Allocation_latency_fp32_device	-	0.070000 ms
USM_Allocation_latency_fp32_host	-	37.440000 ms
USM_Allocation_latency_fp32_shared	-	0.058000 ms
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch	-	1.749000 ms
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch	-	1.117000 ms
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch	-	1.874000 ms
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch	-	1.264000 ms

Relative perf in group VectorAddition (3): cannot calculate

Benchmark	This PR	baseline
VectorAddition_int32	-	1.442000 ms
VectorAddition_int64	-	3.162000 ms
VectorAddition_fp32	-	1.476000 ms

Relative perf in group Polybench (3): cannot calculate

Benchmark	This PR	baseline
Polybench_2mm	-	1.225000 ms
Polybench_3mm	-	1.736000 ms
Polybench_Atax	-	6.898000 ms

Relative perf in group Kmeans (1): cannot calculate

Benchmark	This PR	baseline
Kmeans_fp32	-	16.081000 ms

Relative perf in group LinearRegressionCoeff (1): cannot calculate

Benchmark	This PR	baseline
LinearRegressionCoeff_fp32	-	994.314000 ms

Relative perf in group MolecularDynamics (1): cannot calculate

Benchmark	This PR	baseline
MolecularDynamics	-	0.030000 ms

Relative perf in group llama.cpp (6): cannot calculate

Benchmark	This PR	baseline
llama.cpp Prompt Processing Batched 128	-	819.567376 token/s
llama.cpp Text Generation Batched 128	-	62.459005 token/s
llama.cpp Prompt Processing Batched 256	-	867.008645 token/s
llama.cpp Text Generation Batched 256	-	62.493712 token/s
llama.cpp Prompt Processing Batched 512	-	424.730218 token/s
llama.cpp Text Generation Batched 512	-	62.448657 token/s

Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (4): cannot calculate

Benchmark	This PR	baseline
alloc/size:10000/0/4096/iterations:200000/threads:4 glibc	-	2613.500000 ns
alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider	-	2217.470000 ns
alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool<os_provider>	-	3317.100000 ns
alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool<os_provider>	-	310.122000 ns

Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (4): cannot calculate

Benchmark	This PR	baseline
alloc/size:10000/0/4096/iterations:200000/threads:1 glibc	-	704.764000 ns
alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider	-	193.084000 ns
alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool<os_provider>	-	287.614000 ns
alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool<os_provider>	-	212.209000 ns

Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (4): cannot calculate

Benchmark	This PR	baseline
alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc	-	1251.740000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider	-	2023.360000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool<os_provider>	-	3438.070000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool<os_provider>	-	283.524000 ns

Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (4): cannot calculate

Benchmark	This PR	baseline
alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc	-	742.676000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider	-	191.882000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool<os_provider>	-	315.228000 ns
alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool<os_provider>	-	205.946000 ns

Relative perf in group alloc/min (4): cannot calculate

Benchmark	This PR	baseline
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc	-	848.832000 ns
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc	-	178.321000 ns
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool<os_provider>	-	1212.920000 ns
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool<os_provider>	-	969.864000 ns

Relative perf in group multiple (12): cannot calculate

Benchmark	This PR	baseline
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc	-	32562.900000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc	-	4280.750000 ns
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc	-	138769.000000 ns
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc	-	30836.400000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool<os_provider>	-	1155430.000000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool<os_provider>	-	162481.000000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider	-	1173940.000000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider	-	149287.000000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool<os_provider>	-	43668.900000 ns
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool<os_provider>	-	15731.700000 ns
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool<os_provider>	-	74953.300000 ns
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool<os_provider>	-	26434.600000 ns
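Every group above reports "cannot calculate" because the "This PR" column is empty ("-"), so there is nothing to compare the baseline against. The report does not show how the bot derives its percentages; the sketch below is an assumption inferred from the summary rows at the top of this report, where relative perf equals baseline / PR for time-like units (lower is better) and PR / baseline for throughput units (higher is better), and the group figure matches the geometric mean of its rows. The function names `relative_perf` and `group_relative_perf` are hypothetical.

```python
import math
from typing import Optional, Sequence


def relative_perf(pr: Optional[float], baseline: Optional[float],
                  lower_is_better: bool = True) -> Optional[float]:
    """Return relative performance in percent, or None if it cannot be computed.

    Assumption: >100% means the PR is better than the baseline.
    """
    if pr is None or baseline is None or pr <= 0 or baseline <= 0:
        return None  # a missing value ("-") makes the ratio undefined
    ratio = baseline / pr if lower_is_better else pr / baseline
    return ratio * 100.0


def group_relative_perf(values: Sequence[Optional[float]]) -> Optional[float]:
    """Geometric mean of the per-benchmark percentages that are available."""
    present = [v for v in values if v is not None]
    if not present:
        return None  # reported as "cannot calculate"
    return math.exp(sum(math.log(v) for v in present) / len(present))


# Rows from the "memory" group in the summary: (This PR, baseline, lower_is_better)
memory_rows = [
    (133.924, 219.808, True),   # QueueInOrderMemcpy Host to Device, in μs
    (5.599, 5.865, True),       # QueueMemcpy Device to Device, in μs
    (3.176, 3.043, False),      # StreamMemory Triad, in GB/s
    (253.546, 254.865, True),   # QueueInOrderMemcpy Device to Device, in μs
]
memory_group = group_relative_perf([relative_perf(*row) for row in memory_rows])

# The Kmeans group above has a baseline (16.081 ms) but no PR value.
kmeans_group = group_relative_perf([relative_perf(None, 16.081)])

print(f"memory: {memory_group:.3f}%")
print("Kmeans:", "cannot calculate" if kmeans_group is None else kmeans_group)
```

Under these assumptions the memory group reproduces the figure from the summary to within rounding, while any group whose rows all lack a PR value falls through to "cannot calculate".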

Details

Benchmark details - environment, command, output...

memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Device destinationPlacement=Device size=1KB count=100),251.774,250.326,1.67%,245.491,462.437,[CPU],[us]

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Host destinationPlacement=Device size=1KB count=100),135.567,134.572,2.16%,133.391,323.030,[CPU],[us]
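The `--csv --noHeaders` output above can be read back into typed values for further analysis. This is a minimal sketch, not the parser used by the benchmark scripts: the `Result` class and `parse_row` function are hypothetical names, and the field layout is assumed from the two sample rows shown here. Note that the header lists seven columns while each data row has eight fields, because the trailing `Type` column is followed by a separate unit field such as `[us]`.

```python
import csv
from dataclasses import dataclass


@dataclass
class Result:
    test_case: str
    mean: float
    median: float
    stddev_pct: float  # the StdDev column is a percentage, e.g. "2.16%"
    minimum: float
    maximum: float
    kind: str          # e.g. "CPU"
    unit: str          # e.g. "us"


def parse_row(line: str) -> Result:
    """Parse one data row of the CSV output shown above."""
    fields = next(csv.reader([line]))
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    name, mean, median, stddev, lo, hi, kind, unit = fields
    return Result(
        test_case=name,
        mean=float(mean),
        median=float(median),
        stddev_pct=float(stddev.rstrip("%")),
        minimum=float(lo),
        maximum=float(hi),
        kind=kind.strip("[]"),
        unit=unit.strip("[]"),
    )


row = ("QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Host "
       "destinationPlacement=Device size=1KB count=100),"
       "135.567,134.572,2.16%,133.391,323.030,[CPU],[us]")
result = parse_row(row)
print(result.mean, result.median, result.unit)
```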

test pr please ignore

452c4a4

pbalcer requested a review from a team as a code owner January 20, 2025 10:23

==================================
Retrieving Info

Reading hash file "/home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range:
npknpByH7N2m3OnLNH1X9DJxLrzIFWk
.....
dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!
