Avoid creating tensor in CosmosAttnProcessor2_0 (#11761) #11763


Open · wants to merge 1 commit into main

Conversation

chenxiao111222

What does this PR do?

Fixes #11761

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@a-r-r-o-w
Member

Just wanted to make a note that the reason this has to be a tensor is because it seemingly breaks ONNX export. I had it implemented the same way earlier but changed to this after suggestions from the nvidia team. cc @asfiyab-nvidia

@a-r-r-o-w a-r-r-o-w requested a review from yiyixuxu June 21, 2025 03:59
@yiyixuxu
Collaborator

ohh thanks for the info @a-r-r-o-w
10% speed difference is really not small though (if confirmed)

but I think this size ratio is actually determined by config (inner_dim vs inner_kv_dim) and won't vary at run time, no?

@a-r-r-o-w
Member

@yiyixuxu Yeah, it shouldn't vary and we can compute this beforehand. I think the problem stemmed from using an integer (or int-like type) to do the repeat_interleave instead of a tensor. So, it doesn't matter if we compute it with query.idx(...) / key.idx(...) or pre-calculate the ratio. I'm not sure about the details, but it looks like there were a few similar issues (pytorch/pytorch#100429, for example) which have been marked as resolved. They are very simple examples though, so this "mark dynamic" thing probably does not work with Cosmos (I'm too unfamiliar with ONNX to comment on this).

I don't think most of our model definitions are ONNX-compatible anyway (has this been checked before?), so I think it might be okay to make a note and break this compatibility?
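The int-vs-tensor distinction discussed above can be sketched as follows. This is a minimal illustration, not the actual CosmosAttnProcessor2_0 code: the helper names and shapes are assumptions, and the point is only that the repeat ratio is fixed by config, so it can be passed to `repeat_interleave` either as a plain Python int or wrapped in a tensor (the latter reportedly being what keeps ONNX export working, at a small runtime cost):

```python
import torch

# Hypothetical sketch: in grouped-query attention, key/value heads are
# repeated to match the query head count. The ratio is determined by the
# model config (inner_dim vs inner_kv_dim) and does not vary at runtime.
def repeat_kv(key: torch.Tensor, num_heads: int) -> torch.Tensor:
    ratio = num_heads // key.shape[1]          # plain Python int
    return key.repeat_interleave(ratio, dim=1)

# Same operation with the ratio wrapped in a scalar tensor, as in the
# code path this PR proposes to avoid:
def repeat_kv_tensor(key: torch.Tensor, num_heads: int) -> torch.Tensor:
    ratio = torch.tensor(num_heads // key.shape[1], device=key.device)
    return key.repeat_interleave(ratio, dim=1)

k = torch.randn(2, 4, 16, 8)                   # (batch, kv_heads, seq, head_dim)
assert torch.equal(repeat_kv(k, 16), repeat_kv_tensor(k, 16))
```

Both variants produce identical results in eager mode; the difference only matters for tracing/export.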

@yiyixuxu
Collaborator

ohh thanks @a-r-r-o-w

do you have the original conversation about the onnx export breaking? we could try to look into a solution if there is a reproducible script for the issue

otherwise, I think the easiest way is to put the two code paths behind an `if torch.onnx.is_in_onnx_export():` / `else` branch. Let me know what you think!

@a-r-r-o-w
Member

@yiyixuxu Unfortunately, there's not much to gather from the original conversation. You can find it here: #10660 (comment)

Your suggestion sounds good to me

@yiyixuxu
Collaborator

cc @chenxiao111222 can you add an `if torch.onnx.is_in_onnx_export():` check and keep the original code path there?
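The suggested guard might look like the sketch below. This is illustrative only: the function name and tensor shapes are assumptions, not the real processor code. It keeps the tensor-based repeats solely during ONNX export tracing and uses a plain int otherwise:

```python
import torch

# Hypothetical sketch of the proposed guard: use the cheap int code path
# at runtime, and fall back to the original tensor-based repeats only
# while torch.onnx export tracing is in progress.
def expand_kv(key: torch.Tensor, num_heads: int) -> torch.Tensor:
    ratio = num_heads // key.shape[1]
    if torch.onnx.is_in_onnx_export():
        # original code path: tensor repeats reportedly keep ONNX export working
        ratio = torch.tensor(ratio, device=key.device)
    return key.repeat_interleave(ratio, dim=1)

k = torch.randn(1, 2, 4, 8)                    # (batch, kv_heads, seq, head_dim)
assert expand_kv(k, 8).shape == (1, 8, 4, 8)
```

Outside of export, `torch.onnx.is_in_onnx_export()` returns False, so the int path is taken and no extra tensor is allocated.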
