
[KVCache] Per Layer Sliding Window #17928


Open: wants to merge 2 commits into main

Conversation

joshua-j-hong
Contributor

Adds per-layer sliding window functionality to the KV cache. Two major bugs remain and should be resolved before merging:

  1. Throughput drops significantly, even before the sequence reaches the sliding window length.
  2. Results are incorrect once the sequence exceeds the sliding window length, potentially due to page indexing or some other KV cache issue.
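
For reference, here is a minimal sketch of the semantics a per-layer sliding window is generally expected to have. It is not the code in this PR and does not use the MLC-LLM or TVM APIs; the function names and the `layer_windows` configuration are hypothetical. Each layer either attends to the full sequence (`None`) or only to its most recent `window` positions. Under this assumption, a sliding-window layer behaves exactly like a full-attention layer until the sequence grows past the window, which is why a slowdown before that point is unexpected.

```python
from typing import List, Optional, Sequence


def visible_range(seq_len: int, window: Optional[int]) -> range:
    """Return the token positions the newest token may attend to in one layer.

    `window=None` means full attention. Otherwise only the last `window`
    positions are visible once the sequence is longer than the window.
    """
    if window is None or seq_len <= window:
        # Window not reached yet (or no window): identical to full attention.
        return range(0, seq_len)
    return range(seq_len - window, seq_len)


def per_layer_visible(seq_len: int,
                      layer_windows: Sequence[Optional[int]]) -> List[range]:
    """Apply `visible_range` independently to every layer."""
    return [visible_range(seq_len, w) for w in layer_windows]


# Hypothetical model: layers alternate between a 4-token window and full attention.
layer_windows = [4, None, 4, None]

short = per_layer_visible(3, layer_windows)  # sequence shorter than the window
long = per_layer_visible(6, layer_windows)   # sequence longer than the window
```

With a sequence of 3 tokens, every layer sees positions 0 through 2. With 6 tokens, the windowed layers see only positions 2 through 5 while the full-attention layers still see 0 through 5. A correctness check past the window length could compare the cache contents against ranges computed this way.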

joshua-j-hong changed the title from "KV Cache Per Layer Sliding Window" to "[KVCache] Per Layer Sliding Window" on May 7, 2025
joshua-j-hong force-pushed the jjhong_KV_alt_sliding_window branch from 3fb27bb to 936d500 on May 8, 2025 03:42
@joshua-j-hong
Contributor Author

joshua-j-hong commented May 8, 2025

Further testing and investigation turned up an additional MLC-LLM/TVM bug involving excessive prefilling. It occurs even without the per-layer sliding window changes in this PR and may be what is causing the inference slowdown.
