[mxfp8 moe training] use dim1 cast cuda kernel in bwd #2897
Conversation
As of commit e64d4b5 with merge base 83a20c7.
stack-info: PR: #2897, branch: danielvegamyhre/stack/64
    MXGemmKernelChoice,
    ScaleCalculationMode,
)
from torchao.prototype.mx_formats.mx_linear import _to_mxfp8_dim1_kernel_wrapper
nit: move to utils file if you want to reuse it
done
Stacked PRs:
[mxfp8 moe training] use dim1 cast cuda kernel in bwd
This way we can eliminate the .contiguous() call and use the dim1 cast CUDA kernel, which is the fastest option (vs. inductor codegen or Triton).