Conversation
@danielvegamyhre (Contributor) commented Aug 22, 2025

Stacked PRs:


[mxfp8 moe] add support for fbgemm 2d-3d mx8mx8bf16 grouped gemm

Summary

  • fbgemm recently added a 2d-3d mxfp8 grouped gemm in FBGEMM#4710 (Enable MXFP8 grouped GEMM)
  • This PR integrates the new gemm into the MoE training code base for the following 2d-3d grouped gemms:
    • output = input @ weight^T
    • grad_input = grad_output @ weight
  • Add new mxfp8 utils to_blocked_per_group_2d (for input scales) and to_blocked_per_group_3d (for weight scales). These are pytorch reference implementations that are not performant. We can implement equivalent triton kernels for them later.
  • Notes on fbgemm API and pytorch grouped mm API:
    • x must have shape (Mg, K) and be row major / contiguous
    • x scales must be preprocessed into the per-group blocked layout and be contiguous
    • weights must have shape (E, N, K) and be row major / contiguous
    • weight scales must be preprocessed into the per-group blocked layout and be contiguous
    • group_sizes is a vector containing the number of rows in each token group of the x tensor
    • starting_row_after_padding corresponds to the x_scales tensor and must have size len(group_sizes) + 1; the first entry is always 0, and entry i is the starting row of group i in the x_scales tensor AFTER padding
  • Refactor _emulated_mxfp8_scaled_grouped_mm_2d_3d to have the same function signature and input constraints as the fbgemm API
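To make the starting_row_after_padding constraint concrete, here is a minimal plain-Python sketch of how such a vector could be derived from the group sizes. The helper name and the alignment value of 128 (blocked scale layouts pad the row dimension to a multiple of the tile height) are assumptions for illustration, not the fbgemm API:

```python
def starting_rows_after_padding(group_sizes, alignment=128):
    # Hypothetical reference helper: each group's row count in the
    # x_scales tensor is padded up to a multiple of `alignment`.
    # Entry 0 is always 0; entry i is the start row of group i after
    # padding; the final entry is the total padded row count.
    starts = [0]
    for size in group_sizes:
        padded = ((size + alignment - 1) // alignment) * alignment
        starts.append(starts[-1] + padded)
    return starts

# e.g. token groups of 100, 300, and 50 rows, padded to multiples of 128:
print(starting_rows_after_padding([100, 300, 50]))  # [0, 128, 512, 640]
```

Note the output has len(group_sizes) + 1 entries, matching the constraint above.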

Test plan

  • pytest test/prototype/moe_training/test_scaled_grouped_mm.py -k mx
  • pytest test/prototype/moe_training/test_training.py -k mx
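For reference, the 2d-3d grouped gemm these tests exercise computes output = input @ weight^T independently per token group. A minimal plain-Python sketch (ignoring quantization and scales; function names are hypothetical) is:

```python
def matmul_t(a, b):
    # (m, k) @ (n, k)^T -> (m, n)
    return [[sum(x * y for x, y in zip(row, col)) for col in b] for row in a]

def grouped_mm_2d_3d(x, w, group_sizes):
    # x: (Mg, K) with token groups stacked along dim 0; w: (E, N, K).
    # Computes out[g] = x[g] @ w[g]^T per group, concatenated along dim 0.
    out, start = [], 0
    for g, size in enumerate(group_sizes):
        out.extend(matmul_t(x[start:start + size], w[g]))
        start += size
    return out

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # Mg=3, K=2
w = [[[1.0, 2.0]], [[3.0, 4.0]]]          # E=2, N=1, K=2
print(grouped_mm_2d_3d(x, w, [2, 1]))     # [[1.0], [2.0], [7.0]]
```

The real kernel fuses this per-group loop into one GPU launch; the sketch only shows the shape semantics.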

pytorch-bot (bot) commented Aug 22, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2848

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ You can merge normally! (1 Unrelated Failure)

As of commit f70cc90 with merge base 8722c0c:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

danielvegamyhre added commits referencing this pull request (stack-info: PR: #2848, branch: danielvegamyhre/stack/55) and force-pushed the danielvegamyhre/stack/55 branch several times between Aug 22 and Aug 26, 2025 (a5a29db → cabb470)
@meta-cla bot added the CLA Signed label Aug 22, 2025
@danielvegamyhre added the topic: not user facing label Aug 23, 2025
@danielvegamyhre changed the title to [mxfp8 moe] add support for fbgemm 2d-3d mx8mx8bf16 grouped gemm, using uniform group sizes, then back to [mxfp8 moe] add support for fbgemm 2d-3d mx8mx8bf16 grouped gemm Aug 23, 2025

pytorch-bot (bot) commented Aug 25, 2025

This PR needs to be approved by an authorized maintainer before merge.
@danielvegamyhre (Contributor, Author) commented:

@drisspg @vkuzo this is working for all tested cases now and is ready for review/land

@@ -402,12 +400,30 @@ def backward(ctx, grad_out: torch.Tensor):
 def _emulated_mxfp8_scaled_grouped_mm_2d_3d(
     A_mx: torch.Tensor,
     A_scale: torch.Tensor,
-    B_t_mx: torch.Tensor,
-    B_t_scale: torch.Tensor,
+    B_mx: torch.Tensor,
Contributor commented:
nit: _mx should be reserved for a combination of raw data and scale; if B_mx is just the data, it would be better to call it something else

blocked_scales: Tensor
start_row_after_padding: Tensor of shape (num_groups + 1,) which contains the start row after padding for each group.
"""
from fbgemm_gpu.experimental.gemm.triton_gemm.fp4_quantize import _to_blocked
Contributor commented:
is this the same function as the one we have in torchao?

def to_blocked(input_matrix, use_triton_kernel: bool = False) -> Tensor:

Contributor (Author) replied:

Will test if they're the same and replace if we can. I was having trouble getting the kernel working without CUDA errors, so I was trying to minimize differences between the fbgemm unit test code and this torchao code path.

Contributor replied:
should be the exact same

@vkuzo (Contributor) left a comment:

stamping since this is prototype

stack-info: PR: #2848, branch: danielvegamyhre/stack/55
@danielvegamyhre force-pushed the danielvegamyhre/stack/55 branch from 0dd17b5 to f70cc90 on August 27, 2025 16:23
@danielvegamyhre merged commit 15a6de6 into main Aug 27, 2025
19 of 20 checks passed
Labels
ciflow/rocm · ciflow/4xh100 · CLA Signed · mx · topic: not user facing
Projects
None yet
3 participants