llama-server prompt cache support #10937
-
Does llama-server support prompt caching between requests, similar to llama-cli's --prompt-cache file? I have a use case where the prefix stays the same between requests, with just a few characters changing at the end.
Replies: 3 comments
-
Oh wait, maybe that's on by default.
-
Yes, it's on by default. If you submit multiple requests sequentially with the same prefix, the prompt decoding for that common prefix will be reused for subsequent requests.
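For illustration, here is a minimal sketch of two sequential requests sharing a prefix against a local server; the port, prompts, and `n_predict` value are placeholders, and on older builds you may need to set `cache_prompt` explicitly rather than relying on the default:

```python
import requests

SERVER = "http://localhost:8080"  # assumed llama-server address

shared_prefix = "You are a helpful assistant. Here is a long document: ..."

for suffix in ("\nQuestion: summarize it.", "\nQuestion: list key terms."):
    resp = requests.post(
        f"{SERVER}/completion",
        json={
            "prompt": shared_prefix + suffix,
            "n_predict": 64,
            # Explicit for clarity; recent builds default this to true,
            # so only the changed suffix is re-decoded on the second call.
            "cache_prompt": True,
        },
    )
    resp.raise_for_status()
    print(resp.json()["content"])
```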
-
Is it easy to implement multi-cache support? If we have calls from different agents, then instead of cache_prompt: true we could pass a number that sets the maximum number of cached prompts.
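Something close to this can already be approximated: starting the server with multiple parallel slots (e.g. `llama-server -m model.gguf -np 4`) gives each slot its own cache, and a request can pin itself to a slot. A rough sketch, assuming a recent build where the request field is named `id_slot` (older builds used `slot_id`); the slot assignment and prompts are illustrative:

```python
import requests

SERVER = "http://localhost:8080"  # assumed llama-server address

def agent_completion(agent_slot: int, prompt: str) -> str:
    """Pin a completion to one slot so each agent reuses its own cached prefix."""
    resp = requests.post(
        f"{SERVER}/completion",
        json={
            "prompt": prompt,
            "n_predict": 64,
            "cache_prompt": True,
            "id_slot": agent_slot,  # route this agent's requests to a fixed slot
        },
    )
    resp.raise_for_status()
    return resp.json()["content"]

# Two agents with different long prefixes, each hitting its own slot:
print(agent_completion(0, "Agent 0 system prompt ... first question"))
print(agent_completion(1, "Agent 1 system prompt ... first question"))
```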