Avoid creating tensor in CosmosAttnProcessor2_0 (#11761) #11763


Open · wants to merge 1 commit into main

Conversation

chenxiao111222

What does this PR do?

Fixes #11761

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@a-r-r-o-w
Member

Just wanted to make a note that the reason this has to be a tensor is because it seemingly breaks ONNX export. I had it implemented the same way earlier but changed to this after suggestions from the nvidia team. cc @asfiyab-nvidia

@a-r-r-o-w a-r-r-o-w requested a review from yiyixuxu June 21, 2025 03:59
@yiyixuxu
Collaborator

ohh thanks for the info @a-r-r-o-w
10% speed difference is really not small though (if confirmed)

but I think this size ratio is actually determined by config (inner_dim vs inner_kv_dim) and won't vary at run time, no?

@a-r-r-o-w
Member

@yiyixuxu Yeah, it shouldn't vary and we can compute this beforehand. I think the problem stemmed from using an integer (or int-like type) to do the repeat_interleave instead of a tensor. So, it doesn't matter if we compute it with query.idx(...) / key.idx(...) or pre-calculate the ratio. I'm not sure about the details, but it looks like there were a few similar issues (pytorch/pytorch#100429, for example) which have been marked as resolved. They are very simple examples though, so this "mark dynamic" thing probably does not work with Cosmos (I'm too unfamiliar with ONNX to comment on this).

I don't think most of our model definitions are ONNX-compatible anyway (has this been checked before?), so I think it might be okay to make a note and break this compatibility?
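The int-vs-tensor distinction discussed above can be sketched as follows. This is a minimal illustration, not the actual CosmosAttnProcessor2_0 code: the helper names and shapes are assumptions, and the point is only that the repeat ratio is fixed by config, so it can be passed to `repeat_interleave` either as a plain Python int or wrapped in a tensor (the latter reportedly being what keeps ONNX export working, at a small runtime cost):

```python
import torch

# Hypothetical sketch: in grouped-query attention, key/value heads are
# repeated to match the query head count. The ratio is determined by the
# model config (inner_dim vs inner_kv_dim) and does not vary at runtime.
def repeat_kv(key: torch.Tensor, num_heads: int) -> torch.Tensor:
    ratio = num_heads // key.shape[1]          # plain Python int
    return key.repeat_interleave(ratio, dim=1)

# Same operation with the ratio wrapped in a scalar tensor, as in the
# code path this PR proposes to avoid:
def repeat_kv_tensor(key: torch.Tensor, num_heads: int) -> torch.Tensor:
    ratio = torch.tensor(num_heads // key.shape[1], device=key.device)
    return key.repeat_interleave(ratio, dim=1)

k = torch.randn(2, 4, 16, 8)                   # (batch, kv_heads, seq, head_dim)
assert torch.equal(repeat_kv(k, 16), repeat_kv_tensor(k, 16))
```

Both variants produce identical results in eager mode; the difference only matters for tracing/export.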

@yiyixuxu
Collaborator

ohh thanks @a-r-r-o-w

do you have the original conversation about the onnx export breaking? we could try to look into a solution if there is a reproducible script for the issue

otherwise, I think the easiest way is to put the two code paths behind an `if torch.onnx.is_in_onnx_export():` / `else` branch. Let me know what you think!

@a-r-r-o-w
Member

@yiyixuxu Unfortunately, there's not much to gather from the original conversation. You can find it here: #10660 (comment)

Your suggestion sounds good to me

@yiyixuxu
Collaborator

cc @chenxiao111222 can you add an `if torch.onnx.is_in_onnx_export():` check and keep the original code path there?
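The suggested guard might look like the sketch below. This is illustrative only: the function name and tensor shapes are assumptions, not the real processor code. It keeps the tensor-based repeats solely during ONNX export tracing and uses a plain int otherwise:

```python
import torch

# Hypothetical sketch of the proposed guard: use the cheap int code path
# at runtime, and fall back to the original tensor-based repeats only
# while torch.onnx export tracing is in progress.
def expand_kv(key: torch.Tensor, num_heads: int) -> torch.Tensor:
    ratio = num_heads // key.shape[1]
    if torch.onnx.is_in_onnx_export():
        # original code path: tensor repeats reportedly keep ONNX export working
        ratio = torch.tensor(ratio, device=key.device)
    return key.repeat_interleave(ratio, dim=1)

k = torch.randn(1, 2, 4, 8)                    # (batch, kv_heads, seq, head_dim)
assert expand_kv(k, 8).shape == (1, 8, 4, 8)
```

Outside of export, `torch.onnx.is_in_onnx_export()` returns False, so the int path is taken and no extra tensor is allocated.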
