One cache class to rule them all #40276

Cyrilvallez · 2025-08-19T11:01:36Z

What does this PR do?

Now that we have moved the cache logic to a per-layer approach and added the DynamicSliding layer, multiplying the number of Cache classes is pointless and brings a lot of unnecessary complexity.

This PR deprecates all Cache classes except 3:

DynamicCache -> each layer type is inferred from config and correctly dispatched
StaticCache -> each layer type is inferred from config and correctly dispatched
QuantizedCache -> only has 1 layer type anyway
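To illustrate what "inferred from config and correctly dispatched" means, here is a minimal toy sketch. It is not the transformers implementation: `ToyConfig`, `FullLayer`, `SlidingLayer` and `ToyCache` are hypothetical names, and the sliding layer is simplified to keeping only the last `window` entries.

```python
from dataclasses import dataclass, field


# Toy stand-ins, NOT the transformers classes: every name here is hypothetical.
@dataclass
class ToyConfig:
    layer_types: list          # one entry per layer, e.g. "full_attention"
    sliding_window: int = 4


@dataclass
class FullLayer:
    keys: list = field(default_factory=list)

    def update(self, new_keys):
        # A full layer keeps every key it has ever seen.
        self.keys.extend(new_keys)
        return self.keys


@dataclass
class SlidingLayer:
    window: int
    keys: list = field(default_factory=list)

    def update(self, new_keys):
        # A sliding layer keeps only the most recent `window` keys.
        self.keys.extend(new_keys)
        self.keys = self.keys[-self.window:]
        return self.keys


class ToyCache:
    """Single cache class: the layer type is chosen per layer from the config."""

    def __init__(self, config):
        self.layers = [
            SlidingLayer(config.sliding_window)
            if layer_type == "sliding_attention"
            else FullLayer()
            for layer_type in config.layer_types
        ]

    def update(self, new_keys, layer_idx):
        return self.layers[layer_idx].update(new_keys)
```

With this shape, a hybrid model and a fully sliding model use the same cache class; only the per-layer list differs.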

This simplifies things quite a lot: we had previously created many classes that were identical up to the layer type.
Also, offloading is now ALWAYS possible, independently of the layer combination, which was not the case before. OffloadedCache only worked with full layers, and there were 2 different classes for StaticOffloaded (one for hybrid structures, one for full layers only), yet fully sliding models such as Mistral could not be offloaded with the static implementation.
Finally, cache_implementation="static" will now instantiate the correct layer types for any model, which is not the case currently: it always uses full layers, even if the model uses sliding layers, which wastes memory.
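As a rough illustration of the memory point, the sketch below counts pre-allocated cache positions per layer. It assumes that a static sliding layer needs at most `sliding_window` positions while a static full layer needs `max_cache_len`; the function name and all numbers are hypothetical and not taken from any specific model.

```python
def static_cache_slots(layer_types, max_cache_len, sliding_window, respect_layer_types):
    """Count pre-allocated token positions summed over all layers.

    Hypothetical helper: assumes a sliding layer needs at most
    `sliding_window` positions and a full layer needs `max_cache_len`.
    """
    total = 0
    for layer_type in layer_types:
        if respect_layer_types and layer_type == "sliding_attention":
            total += min(sliding_window, max_cache_len)
        else:
            # Treated as a full layer, whatever the model actually uses.
            total += max_cache_len
    return total


# Illustrative values only: a fully sliding model with 32 layers.
layer_types = ["sliding_attention"] * 32
all_full = static_cache_slots(layer_types, 32768, 4096, respect_layer_types=False)
per_type = static_cache_slots(layer_types, 32768, 4096, respect_layer_types=True)
```

Under these assumed values, allocating only full layers reserves 8 times as many positions as allocating per layer type.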

HuggingFaceDocBuilderDev · 2025-08-19T11:13:11Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

github-actions · 2025-08-19T15:15:35Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: bamba, gpt_neo, granitemoehybrid, kyutai_speech_to_text, moshi, paligemma, phimoe, qwen2_moe

Cyrilvallez · 2025-08-20T10:24:25Z

cc @gante @manueldeprada as well!

remove all classes (d987e8a)

Cyrilvallez added 8 commits August 19, 2025 13:40

fix generate (dd692d7)
start replacing everywhere (ebc8f4d)
finish removing everywhere (467e22f)
typo (00dd439)
typo (24650df)
fix (25673c0)
typo (4d51b82)
remove num_layers=1 (a512b99)
CI (62745ff)
