Prefill performance differs between llama-bench and llama-cli when tested on Mac M4 Pro using CPU. #14276

v0jiuqi · 2025-06-19T10:04:48Z

I measured performance with a 512-token prompt using both llama-bench and llama-cli. With a single thread, llama-bench reports better performance than llama-cli. What could be the reason for this difference?
test command:

./bin/llama-cli -m ../../../llm-model/qwen2.5-3b-q41.gguf -p "a prompt with 512 tokens" -n 128 -no-cnv --cpu-strict 1 -ngl 0 -t 1
./bin/llama-bench -m ../../../llm-model/qwen2.5-3b-q41.gguf -t 1 --n-gpu-layers 0

test result:
231.98 ± 0.91 - 12.66 ± 0.00 (llama-bench)
183.56 - 12.08 (llama-cli)
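
To put the gap in relative terms, here is a small sketch that computes how much faster llama-bench is for each pair of numbers. It assumes the first number in each row is prefill throughput and the second is generation throughput, both in tokens per second; the post does not label the columns, so treat that reading as an assumption. The helper name relative_gap is only for illustration.

```python
def relative_gap(bench: float, cli: float) -> float:
    """Return how much faster `bench` is than `cli`, as a percentage of `cli`."""
    return (bench - cli) / cli * 100.0


# Values copied from the results above. Reading the first column as prefill
# and the second as generation throughput is an assumption, not stated in the post.
results = {
    "prefill": (231.98, 183.56),
    "generation": (12.66, 12.08),
}

gaps = {name: relative_gap(bench, cli) for name, (bench, cli) in results.items()}

for name, gap in gaps.items():
    print(f"{name}: llama-bench is {gap:.1f}% faster than llama-cli")
```

Under that reading, the difference is concentrated in the first column: roughly a quarter faster there, versus a few percent in the second column.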

Replies: 0 comments
