Skip to content

[v0.9.1][Bugfix] Reset all unused positions to prevent out-of-bounds in GatherV3 #1397

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 26, 2025

Conversation

yiz-liu
Copy link
Contributor

@yiz-liu yiz-liu commented Jun 24, 2025

What this PR does / why we need it?

Reset all unused positions in NPUModelRunner to prevent out-of-bounds asserts in the GatherV3 operator.

Currently, in get_splitfuse_attn_mask, the position tensor may contain values that exceed the dimensions of the attention mask, triggering a GatherV3 boundary check failure. These invalid indices originate from stale “dirty” entries left over in position due to padding logic in the ACL graph. Specifically, in _process_reqs, the variable num_input_tokens is always greater than or equal to total_num_scheduled_tokens, so any positions not explicitly cleared from a previous batch will persist and cause this sporadic error.

BTW, in the original vLLM implementation, masks are constructed internally using other args, so these lingering values do not surface. However, on the Ascend platform—where split-fuse attention requires externally supplied masks—these residual indices become critical and lead to this elusive, hard-to-reproduce failure.

The fix is to explicitly reset or zero out all unused entries in the position tensor before passing it to GatherV3, ensuring that every index lies within the valid range of the attention mask.

Does this PR introduce any user-facing change?

How was this patch tested?

@yiz-liu
Copy link
Contributor Author

yiz-liu commented Jun 24, 2025

This pull request addresses the following issues:
#1324 , #1115 , #1038

@Yikun
Copy link
Collaborator

Yikun commented Jun 24, 2025

Please also submit the PR to main

@ApsarasX
Copy link
Collaborator

LGTM

@Yikun Yikun added accuracy-test enable all accuracy test for PR ready-for-test start test by label for PR and removed ready-for-test start test by label for PR accuracy-test enable all accuracy test for PR labels Jun 24, 2025
@wangxiyuan wangxiyuan changed the title [Bugfix] Fix GatherV3 bug [v0.9.1][Bugfix] Fix GatherV3 bug Jun 25, 2025
@Yikun Yikun changed the title [v0.9.1][Bugfix] Fix GatherV3 bug [v0.9.1][Bugfix] Reset all unused positions to prevent out-of-bounds in GatherV3 Jun 25, 2025
@wangxiyuan wangxiyuan merged commit bf17152 into vllm-project:v0.9.1-dev Jun 26, 2025
17 of 19 checks passed
@yiz-liu yiz-liu deleted the fix-gatherv3 branch June 26, 2025 09:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-for-test start test by label for PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants