One cache class to rule them all #40276
What does this PR do?
Now that we have moved the cache logic to a per-layer approach and added the DynamicSliding layer, multiplying the number of Cache classes is mostly useless and brings a lot of unnecessary complexity.
This PR deprecates all Cache classes to keep only 3:
This simplifies things quite a lot, as we had previously created many classes that were identical except for the layer type.
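The per-layer idea can be sketched as a toy model. This is a hypothetical, simplified illustration, not the transformers implementation: the class names `FullLayer`, `SlidingLayer` and `Cache`, and the `"full_attention"`/`"sliding_attention"` layer-type strings, are assumptions. It shows one cache class delegating to a list of layer objects, so any combination of layer types works without defining a new class.

```python
from typing import List

# Hypothetical sketch only: these names are NOT the transformers API.
# Integers stand in for the key/value tensors a real cache would store.


class FullLayer:
    """Full attention: keeps every token seen so far."""

    def __init__(self):
        self.tokens: List[int] = []

    def update(self, new_tokens: List[int]) -> List[int]:
        self.tokens.extend(new_tokens)
        return self.tokens


class SlidingLayer:
    """Sliding-window attention: keeps only the last `window` tokens."""

    def __init__(self, window: int):
        self.window = window
        self.tokens: List[int] = []

    def update(self, new_tokens: List[int]) -> List[int]:
        self.tokens = (self.tokens + new_tokens)[-self.window:]
        return self.tokens


class Cache:
    """A single cache class: per-layer behaviour comes from the layer objects."""

    def __init__(self, layer_types: List[str], sliding_window: int):
        self.layers = [
            SlidingLayer(sliding_window) if t == "sliding_attention" else FullLayer()
            for t in layer_types
        ]

    def update(self, new_tokens: List[int], layer_idx: int) -> List[int]:
        return self.layers[layer_idx].update(new_tokens)


# A hybrid model needs no dedicated "hybrid" cache class.
cache = Cache(["full_attention", "sliding_attention"], sliding_window=3)
for step in range(5):
    for i in range(2):
        cache.update([step], i)
print([layer.tokens for layer in cache.layers])  # [[0, 1, 2, 3, 4], [2, 3, 4]]
```

Supporting a new model layout then only means passing a different `layer_types` list, rather than writing another cache class.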
Also, offloading is now ALWAYS possible, independently of the layer combination, which was not the case before: `OffloadedCache` would only work with full layers, and we had 2 different classes for `StaticOffloaded`, depending on whether the model was hybrid or used only full layers, but fully sliding models such as Mistral could not be offloaded with the static implementation.

Finally, `cache_implementation="static"` will now instantiate the correct layer types for any model, which is not the case currently (it always uses only full layers, even if the model uses sliding layers, wasting memory).
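The offloading and static-allocation points can be illustrated together with a toy model. Again, this is a hypothetical sketch, not the transformers code: `StaticLayer`, `StaticCache`, `preallocated_slots`, the string devices and the evict-previous/prefetch-next offloading scheme are all assumptions made for illustration. It shows two things: each layer's preallocated size is derived from its type (a sliding layer only needs `min(sliding_window, max_cache_len)` slots), and offloading only touches the generic layer interface, so it behaves the same for full, sliding or mixed layers.

```python
from typing import List

# Hypothetical sketch only: NOT the transformers API. Integers stand in for
# key/value tensors, and "cpu"/"cuda" are plain strings standing in for devices.


class StaticLayer:
    """A preallocated, fixed-size buffer. Full and sliding layers differ only in capacity."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.slots: List[int] = [0] * capacity  # allocated once, up front
        self.seen = 0                           # number of tokens written so far
        self.device = "cpu"

    def update(self, token: int) -> None:
        # Writes at position seen % capacity: a sliding layer overwrites its oldest
        # slot once the window is full; a full layer is assumed never to receive
        # more than max_cache_len tokens, so it never wraps.
        self.slots[self.seen % self.capacity] = token
        self.seen += 1


class StaticCache:
    def __init__(self, layer_types: List[str], max_cache_len: int,
                 sliding_window: int, offloading: bool = False):
        # Size each layer from its type instead of using full layers everywhere.
        self.layers = [
            StaticLayer(min(sliding_window, max_cache_len)
                        if t == "sliding_attention" else max_cache_len)
            for t in layer_types
        ]
        self.offloading = offloading
        for layer in self.layers:
            layer.device = "cpu" if offloading else "cuda"

    def preallocated_slots(self) -> int:
        return sum(layer.capacity for layer in self.layers)

    def resident_layers(self) -> int:
        return sum(layer.device == "cuda" for layer in self.layers)

    def _move(self, idx: int, device: str) -> None:
        self.layers[idx % len(self.layers)].device = device

    def update(self, token: int, layer_idx: int) -> None:
        if self.offloading:
            # The same three moves apply whatever the layer type: evict the previous
            # layer, make sure the current one is resident, prefetch the next one.
            self._move(layer_idx - 1, "cpu")
            self._move(layer_idx, "cuda")
            self._move(layer_idx + 1, "cuda")
        self.layers[layer_idx].update(token)


# Illustrative fully sliding model: 4 layers, window of 4, max_cache_len of 16.
# `all_full` mimics the previous behaviour (full layers only).
all_full = StaticCache(["full_attention"] * 4, max_cache_len=16, sliding_window=4)
by_type = StaticCache(["sliding_attention"] * 4, max_cache_len=16,
                      sliding_window=4, offloading=True)
print(all_full.preallocated_slots(), by_type.preallocated_slots())  # 64 16

peak = 0
for step in range(10):      # 10 decoding steps
    for i in range(4):      # one update per layer per step
        by_type.update(step, i)
        peak = max(peak, by_type.resident_layers())
print(peak, by_type.layers[0].slots)  # 2 [8, 9, 6, 7]
```

With these illustrative numbers, using full layers everywhere reserves 64 slots where 16 suffice; for a fully sliding model the waste factor is `max_cache_len / sliding_window`. Meanwhile, at most 2 layers are ever "on device" during offloading, regardless of the layer types.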