
One cache class to rule them all #40276

Open · wants to merge 10 commits into main
Conversation

@Cyrilvallez (Member) commented Aug 19, 2025

What does this PR do?

Now that the cache logic has moved to a per-layer approach and a DynamicSliding layer has been added, duplicating the number of Cache classes is pointless and brings a lot of unnecessary complexity.

This PR deprecates all existing Cache classes, keeping only 3:

  • DynamicCache -> each layer type is inferred from the config and correctly dispatched
  • StaticCache -> each layer type is inferred from the config and correctly dispatched
  • QuantizedCache -> only has 1 layer type anyway

This simplifies things considerably: we had created many classes that were identical up to the layer type.
Offloading is now ALWAYS possible, independently of the layer combination, which was not the case before: OffloadedCache only worked with full layers, and we had 2 different static offloaded classes depending on whether the model was hybrid or full-attention only, while fully sliding models such as Mistral could not be offloaded with the static implementation at all.
Finally, cache_implementation="static" will now instantiate the correct layer types for any model, which is not the case currently (it always used full layers, even when the model uses sliding layers, wasting memory).
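The dispatch described above can be sketched roughly as follows. This is a minimal, self-contained illustration of the idea (one cache class that picks per-layer storage from the config), not the actual transformers implementation; names such as `layer_types`, `DynamicLayer`, and `DynamicSlidingLayer` are hypothetical stand-ins here.

```python
# Hypothetical sketch: one cache class whose per-layer storage is
# inferred from the model config (full vs. sliding attention layers).
from dataclasses import dataclass, field


@dataclass
class Config:
    # e.g. a hybrid model alternating full and sliding attention
    layer_types: list = field(default_factory=lambda: ["full", "sliding"])
    sliding_window: int = 2


class DynamicLayer:
    """Unbounded per-layer KV storage (full attention)."""
    def __init__(self):
        self.keys = []

    def update(self, key):
        self.keys.append(key)
        return self.keys


class DynamicSlidingLayer(DynamicLayer):
    """Keeps only the last `window` entries (sliding attention)."""
    def __init__(self, window):
        super().__init__()
        self.window = window

    def update(self, key):
        self.keys = (self.keys + [key])[-self.window:]
        return self.keys


class DynamicCache:
    """Single cache class: layer types are dispatched from the config."""
    def __init__(self, config):
        self.layers = [
            DynamicSlidingLayer(config.sliding_window)
            if t == "sliding" else DynamicLayer()
            for t in config.layer_types
        ]

    def update(self, layer_idx, key):
        return self.layers[layer_idx].update(key)


cache = DynamicCache(Config())
for step in range(3):
    cache.update(0, step)  # full layer: grows without bound
    cache.update(1, step)  # sliding layer: capped at sliding_window
print(cache.layers[0].keys)  # [0, 1, 2]
print(cache.layers[1].keys)  # [1, 2]
```

With a single class like this, the layer combination (full, sliding, or hybrid) no longer determines which cache class the user must pick; the config decides per layer.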

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


[For maintainers] Suggested jobs to run (before merge)

run-slow: bamba, gpt_neo, granitemoehybrid, kyutai_speech_to_text, moshi, paligemma, phimoe, qwen2_moe

@Cyrilvallez
Member Author

cc @gante @manueldeprada as well!

2 participants