[mxfp8 moe training] refactor all var names with suffix _mx to _data for clarity #2879
base: main
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2879
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV
There is 1 currently active SEV. If your PR is affected, please view it below:
✅ No Failures
As of commit fb1628c with merge base 15a6de6.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed 36e6bc7 to ca146f1
…or clarity stack-info: PR: #2879, branch: danielvegamyhre/stack/60
sg, but if you're using A_scales, I'd say the corresponding name is something like A_data, not A_fp8.
Yeah, hmm, scales and data are both fp8, just in different formats, so I can see how this could still be unclear. The thing is, we already have
@@ -291,13 +291,13 @@ def forward(
 ctx.out_dtype = out_dtype
 ctx.emulated = emulated

- # A_mx shape: (M, K)
+ # A_fp8 shape: (M, K)
I'm not a fan of A_fp8; this isn't going to scale to other dtypes. A_data is IMO better and more representative of the MX format being composed of raw data and scales.
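To make the data/scale split concrete, here is a minimal pure-Python sketch of how an MX tensor decomposes into A_data and A_scales. It is not torchao's implementation: the helpers to_mx_sketch and from_mx_sketch are hypothetical, it assumes the OCP MX block size of 32 with float8_e4m3fn elements, stores each block's scale as a power-of-two exponent (what an E8M0 scale encodes), and skips the actual fp8 rounding of the data.

```python
import math
from typing import List, Tuple

BLOCK_SIZE = 32       # assumed MX block size (OCP MX spec default)
F8E4M3_EMAX = 8       # largest unbiased exponent of float8_e4m3fn
F8E4M3_MAX = 448.0    # largest finite float8_e4m3fn value


def to_mx_sketch(A: List[float], block_size: int = BLOCK_SIZE) -> Tuple[List[float], List[int]]:
    """Hypothetical helper: split A into (A_data, A_scales).

    A_scales holds one power-of-two exponent per block of `block_size` elements.
    A_data holds the elements divided by their block's scale, clamped to the
    e4m3 range. Real fp8 rounding is NOT applied in this sketch.
    """
    A_data: List[float] = []
    A_scales: List[int] = []
    for start in range(0, len(A), block_size):
        block = A[start:start + block_size]
        amax = max(abs(x) for x in block)
        # Shared exponent: floor(log2(amax)) minus the element format's emax.
        # math.frexp returns (m, e) with amax == m * 2**e and 0.5 <= m < 1.
        exp = (math.frexp(amax)[1] - 1 - F8E4M3_EMAX) if amax > 0 else 0
        A_scales.append(exp)
        scale = 2.0 ** exp
        # Saturate to the largest finite e4m3 value.
        A_data.extend(max(-F8E4M3_MAX, min(F8E4M3_MAX, x / scale)) for x in block)
    return A_data, A_scales


def from_mx_sketch(A_data: List[float], A_scales: List[int], block_size: int = BLOCK_SIZE) -> List[float]:
    """Hypothetical helper: dequantize by multiplying each element by its block scale."""
    return [x * 2.0 ** A_scales[i // block_size] for i, x in enumerate(A_data)]


A_data, A_scales = to_mx_sketch([1.0, 2.0, 3.0, 4.0], block_size=2)
print(A_scales)  # [-7, -6]
print(A_data)    # [128.0, 256.0, 192.0, 256.0]
```

The two returned values are exactly the two pieces the naming debate is about: A_scales is the per-block scale, and A_data is the raw element payload, which in a real MXFP8 tensor would be cast to an fp8 dtype.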
stamping because prototype
Force-pushed ca146f1 to ac6f4f1
Force-pushed ac6f4f1 to 9f8ac6b
Force-pushed 9f8ac6b to c7e1840
Force-pushed c7e1840 to fb1628c
Stacked PRs:
[mxfp8 moe training] refactor all var names with suffix _mx to _fp8 for clarity