[mxfp8 moe training] integrate triton kernels for converting scales to blocked format #2902

danielvegamyhre · 2025-08-28T19:47:02Z

Conflicts due to bad git state... will clean up

Summary

Wrap kernels in custom ops for torch.compile composability
Replace uses of the torch_to_blocked_... functions with the new Triton kernels added in this stack ([mxfp8 moe training] add per group blocked scale kernels #2886)
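The PR does not spell out what the "blocked format" for scales is. Below is a minimal sketch that assumes it is the 128x4-tile swizzled layout commonly required for MX (block-scaled) scale factors in cuBLAS-style GEMMs. Under that layout, the scale matrix is zero-padded to multiples of 128 rows and 4 columns, and each 128x4 tile is stored as one contiguous 512-element chunk. Inside a tile, the rows are interleaved in groups of 32. The names `blocked_offset` and `to_blocked_reference` are hypothetical. They are not the `torch_to_blocked_...` functions this PR replaces.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative ints."""
    return -(-a // b)


def blocked_offset(row: int, col: int, n_col_blocks: int) -> int:
    """Position of scale (row, col) in the flattened 128x4-tiled layout (assumed layout)."""
    row_block, i = divmod(row, 128)  # which 128-row tile, and the row inside it
    col_block, j = divmod(col, 4)    # which 4-column tile, and the column inside it
    tile = row_block * n_col_blocks + col_block  # tiles are ordered row-major
    # Inside a 512-element tile, row i is split into (i // 32, i % 32) and the
    # tile is stored as 32 rows of 16 values (4 row-groups x 4 columns).
    return tile * 512 + (i % 32) * 16 + (i // 32) * 4 + j


def to_blocked_reference(scales, pad_value=0):
    """Pad a rows x cols list-of-lists to whole 128x4 tiles and swizzle it (hypothetical helper)."""
    rows = len(scales)
    cols = len(scales[0]) if rows else 0
    n_row_blocks, n_col_blocks = ceil_div(rows, 128), ceil_div(cols, 4)
    out = [pad_value] * (n_row_blocks * n_col_blocks * 512)
    for r in range(rows):
        for c in range(cols):
            out[blocked_offset(r, c, n_col_blocks)] = scales[r][c]
    return out
```

Because every output position is a closed-form function of `(row, col)`, the conversion maps naturally onto a single kernel where each program handles one tile. That is one plausible reason to replace a chain of pad/view/permute ops with a dedicated Triton kernel.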

Test plan

sanitize pytest test/prototype/moe_training/test_training.py -s


pytorch-bot · 2025-08-28T19:47:06Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2902

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit beed882 with merge base f0cca99:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

danielvegamyhre added 4 commits August 28, 2025 12:30

[mxfp8 moe training] add per group blocked scale kernels

0312c7e

stack-info: PR: #2886, branch: danielvegamyhre/stack/62

[mxfp8 moe training] add triton kernel for blocked swizzled 3d weight scales

6d41af1

stack-info: PR: #2894, branch: danielvegamyhre/stack/63

[mxfp8 moe training] use dim1 cast cuda kernel in bwd

a12b575

stack-info: PR: #2897, branch: danielvegamyhre/stack/64

[mxfp8 moe training] integrate triton kernels for converting scales to blocked format

beed882
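The 3D weight-scale kernel in this stack is described only by its commit title. Below is a minimal sketch that assumes each expert's (rows, cols) scale matrix is swizzled independently into the 128x4-tile layout and kept as one flat row per expert. It shows how a 2D pad/permute recipe extends to an (E, rows, cols) tensor by adding a leading expert dimension, with no Python loop over experts. The name `to_blocked_per_expert` is hypothetical.

```python
import torch


def _ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def to_blocked_per_expert(scales: torch.Tensor) -> torch.Tensor:
    """Swizzle (E, rows, cols) scales into one flat 128x4-tiled buffer per expert.

    Hypothetical helper: assumes every expert's scales are laid out independently.
    """
    num_experts, rows, cols = scales.shape
    n_row_blocks, n_col_blocks = _ceil_div(rows, 128), _ceil_div(cols, 4)
    # Zero-pad each expert's scale matrix up to whole 128x4 tiles.
    padded = scales.new_zeros((num_experts, n_row_blocks * 128, n_col_blocks * 4))
    padded[:, :rows, :cols] = scales
    # (E, rb, 128, cb, 4) -> (E, rb, cb, 128, 4): gather each tile's elements together.
    tiles = padded.view(num_experts, n_row_blocks, 128, n_col_blocks, 4).permute(0, 1, 3, 2, 4)
    # Split a tile's 128 rows into 4 groups of 32, then interleave to 32 x (4 * 4).
    swizzled = tiles.reshape(num_experts, -1, 4, 32, 4).transpose(2, 3)
    return swizzled.reshape(num_experts, -1)
```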

danielvegamyhre added the mx and topic: not user facing labels Aug 28, 2025

meta-cla bot added the CLA Signed label Aug 28, 2025

danielvegamyhre requested review from vkuzo and drisspg August 28, 2025 19:47

danielvegamyhre force-pushed the danielvegamyhre/stack/64 branch 2 times, most recently from 2ea5a0a to 843448d August 28, 2025 21:42

danielvegamyhre changed the base branch from danielvegamyhre/stack/64 to danielvegamyhre/stack/65 August 28, 2025 23:43

danielvegamyhre changed the base branch from danielvegamyhre/stack/65 to main August 28, 2025 23:44

danielvegamyhre marked this pull request as draft August 29, 2025 17:23
