Skip to content

kv-cells : fix tracking of seq_pos during cache reuse #14339

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 23, 2025

Conversation

ggerganov
Copy link
Member

fix #14298

We tracked the sequence positions present in the KV cells using an std::set because of the assumption that a position would only be present once at a maximum. However, during cache reuse, we do the following sequence of operations:

const int64_t kv_shift = (int64_t) head_p - (int64_t) head_c;
llama_memory_seq_rm (llama_get_memory(ctx), slot.id, head_p, head_c);
llama_memory_seq_add(llama_get_memory(ctx), slot.id, head_c, head_c + n_match, kv_shift);

I.e. we remove a chunk of positions in [p0, p0 + match) and then move another chunk [p1, p1 + match) into its place by adding shift = p0 - p1 to all positions in that chunk. During this "movement", depending on the order that we apply the shift, a position could occur more than once for a short time, which breaks the assumption of std::set. To fix that, we replace std::set with std::map and count the occurrences of each position, erasing the elements when the count reaches 0.

@ggerganov ggerganov requested a review from ngxson as a code owner June 23, 2025 09:27
@ggerganov ggerganov merged commit 7b50d58 into master Jun 23, 2025
47 checks passed
@ggerganov ggerganov deleted the gg/kv-cells-fix-seq-pos branch June 23, 2025 09:27
@LostRuins LostRuins mentioned this pull request Jun 24, 2025
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Misc. bug: Completion fails with error 500
1 participant