
Add int4 > bf16 PTX asm support #224


Open: wants to merge 1 commit into main

Conversation


@ggengnv ggengnv commented May 15, 2025

For int4_gemm, enable a fast int4-to-bf16 upcast using inline PTX asm on NVIDIA GPUs.
On the problem sizes benchmarked, this gives a ~1.5x-2x perf improvement on B200.
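
For readers unfamiliar with the operation, here is a minimal pure-Python reference sketch of what an int4-to-bf16 upcast computes. It is not the PR's PTX implementation, which is not shown in this conversation. The function names are hypothetical, and the low-nibble-first packing order is an assumption that may not match the layout int4_gemm uses.

```python
import struct


def unpack_int4_pair(byte):
    """Split one packed byte into two signed 4-bit integers.

    Assumption: the low nibble holds the first element.
    """
    lo = byte & 0xF
    hi = (byte >> 4) & 0xF
    # Sign-extend: nibble values 8..15 encode -8..-1 in two's complement.
    lo = lo - 16 if lo >= 8 else lo
    hi = hi - 16 if hi >= 8 else hi
    return lo, hi


def int_to_bf16_bits(value):
    """Return the 16-bit bfloat16 pattern for a small integer.

    bfloat16 is the upper 16 bits of the float32 encoding. Integers in
    [-8, 7] are exactly representable, so truncation loses nothing here.
    """
    f32_bits = struct.unpack("<I", struct.pack("<f", float(value)))[0]
    return f32_bits >> 16


def upcast_packed_int4(packed):
    """Upcast packed int4 bytes to a list of bfloat16 bit patterns."""
    out = []
    for byte in packed:
        lo, hi = unpack_int4_pair(byte)
        out.append(int_to_bf16_bits(lo))
        out.append(int_to_bf16_bits(hi))
    return out


if __name__ == "__main__":
    # 0x7F packs (-1, 7); 0x80 packs (0, -8) under the assumed nibble order.
    print([hex(b) for b in upcast_packed_int4(bytes([0x7F, 0x80]))])
```

A GPU kernel performs the same mapping on many elements per instruction. A scalar reference like this is mainly useful for checking a fast path against known bit patterns.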

Contributor

@xuzhao9 xuzhao9 left a comment


Awesome, thanks!

@ggengnv ggengnv force-pushed the int4-upcast-asm branch from 8c8bcea to 3b10b81 Compare May 15, 2025 17:47
Author

ggengnv commented May 15, 2025

Re-pushed to fix styling.

@facebook-github-bot
Contributor

@xuzhao9 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Author

ggengnv commented Jun 5, 2025

Hi @xuzhao9, is there anything I can do to help merge this PR?
I saw that some internal tests are failing. Do you think they're spurious errors or actual failures?


3 participants