[generate] handle support for cache classes when num enc layers != num dec layers #40277


Open
gante wants to merge 5 commits into main

Conversation

gante (Member) commented on Aug 19, 2025

What does this PR do?

I think this bug has been present since EncoderDecoderCache was added: we naively assumed that # encoder layers == # decoder layers, as we used config.num_hidden_layers (the # of encoder layers in encoder-decoder models) to instantiate both caches 👀

This PR adds the missing logic:

  • we can now specify which part of the config we want to pull with get_text_config() (previously we could only isolate the decoder). We also update num_hidden_layers accordingly, so that the caches see the right number of layers.
  • we pull the right config (encoder vs decoder) before parameterizing EncoderDecoderCache in generate (see the sketch after this list)
  • (adds tests)
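
To make the layer-count asymmetry concrete, here is a minimal sketch that only inspects the config of the checkpoint used in the reproduction below; it is illustrative and not the code added by this PR.

from transformers import AutoConfig

config = AutoConfig.from_pretrained("sshleifer/distilbart-cnn-12-6")

# The encoder and decoder stacks have different depths in this checkpoint.
print(config.encoder_layers)      # 12
print(config.decoder_layers)      # 6

# On `main`, both halves of the EncoderDecoderCache were sized from
# config.num_hidden_layers, which reports the encoder depth for
# encoder-decoder models like BART, so the decoder-side caches ended up
# with the wrong number of layers. With this PR, get_text_config() can
# isolate the encoder or the decoder side (adjusting num_hidden_layers)
# before each cache is built in generate.
print(config.num_hidden_layers)   # 12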

Fixes #40120


Example of a failing checkpoint, taken from #40120 (it needs num encoder layers != num decoder layers):

import torch
from transformers import AutoTokenizer, pipeline

torch_dtype = torch.float16
device_map = "cpu"
model_kwargs = dict(torch_dtype=torch_dtype, device_map=device_map)
model_id = "sshleifer/distilbart-cnn-12-6"  # 12 encoder layers, 6 decoder layers
tokenizer = AutoTokenizer.from_pretrained(model_id)
generator = pipeline(
    "summarization",
    model=model_id,
    tokenizer=tokenizer,
    model_kwargs=model_kwargs,
)
generation_config = generator.model.generation_config
generation_config.do_sample = True
generation_config.use_cache = True
generation_config.temperature = 1.0
generation_config.num_beams = 1
generation_config.max_new_tokens = 100
generation_config.min_new_tokens = 100
generation_config.top_p = 1.0
generation_config.cache_implementation = "static"

prompt = "I like math"

output = generator(prompt, batch_size=1, generation_config=generation_config)  # the cache will have an incorrect number of layers on `main`, but it runs
output = generator(prompt, batch_size=1, generation_config=generation_config)  # crashes on `main`

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

gante (Member, Author) commented on Aug 20, 2025

@Cyrilvallez CI green now 👌

(possibly there are a few more edge cases, but I don't think it's worth going the extra mile to find them all)

manueldeprada (Contributor) left a comment


LGTM, the only comment I have is that it would be clearer to name it cross_attention_cache instead of encoder_cache, and likewise self_attention_cache instead of decoder_cache.
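
For reference, those names match the arguments of EncoderDecoderCache itself; a minimal sketch (illustrative only, with DynamicCache standing in for whatever cache classes generate actually builds):

from transformers import DynamicCache, EncoderDecoderCache

self_attention_cache = DynamicCache()   # K/V of the decoder's self-attention layers
cross_attention_cache = DynamicCache()  # K/V projected from the encoder output
cache = EncoderDecoderCache(self_attention_cache, cross_attention_cache)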

gante (Member, Author) commented on Aug 20, 2025

@manueldeprada like this? (see latest changes)


[For maintainers] Suggested jobs to run (before merge)

run-slow: colqwen2, dia, qwen2_5_omni, t5gemma
