[MoE] Cleanup MoE examples #1576

kylesayrs · 2025-06-20T17:53:25Z

Purpose

- Update MoE examples to reflect the latest MoE models
- Remove redundant MoE examples and standardize the examples around W4A16

Prerequisites

Changes

- Just four examples:
  - deepseekv2_5_example.py
  - deepseekv3_example.py
  - mixtral_example.py
  - qwen_example.py
- Update examples tests to run the mixtral and qwen examples, but not the deepseek examples

Signed-off-by: Kyle Sayers <[email protected]>

gemini-code-assist

Summary of Changes

Hello @kylesayrs, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on cleaning up and modernizing the Mixture of Experts (MoE) quantization examples. It updates the models used, standardizes quantization configurations, and significantly improves the handling of model input devices within the calibration pipelines, making the examples more robust and aligned with current best practices.

Highlights

- Updated MoE Examples: The DeepSeek and Mixtral quantization examples have been updated to use newer model IDs (DeepSeek-V2.5, Mixtral-8x7B-Instruct-v0.1), reduced calibration sample counts (from 2048 to 512), and standardized W4A16 quantization schemes. Outdated comments and version checks related to transformers library compatibility have also been removed.
- Improved Model Device Handling: Calibration pipelines (basic, layer_sequential, sequential) now dynamically infer the model's input device using a new infer_model_device utility. This enhances flexibility and robustness, especially for models with offloaded parameters, and allows for explicit device specification via a new model_input_device argument in DatasetArguments.
- Standardized Generation and Saving: Sample generation code across examples has been simplified for consistency, using a common input format and max_new_tokens=100. Save directory names now consistently reflect the W4A16 quantization scheme.
- Test Suite Updates: The test suite for MoE examples has been updated to reflect the new example filenames, ensuring that the deepseekv2_5_example.py, mixtral_example.py, and qwen_example.py scripts are properly included in automated testing.
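
The device handling described in the highlights can be illustrated with a minimal sketch. This is not the `infer_model_device` implementation from this PR: the function name `infer_input_device`, the `override` parameter, and the rule of skipping `meta` placeholders are assumptions made for illustration. The sketch only relies on the model exposing `parameters()` whose items carry a `.device`, as `torch.nn.Module` does.

```python
from typing import Any, Optional


def infer_input_device(model: Any, override: Optional[Any] = None) -> Any:
    """Pick the device that calibration inputs should be moved to.

    Hypothetical helper, not llm-compressor's actual utility.
    """
    # An explicit user choice always wins (the optional override).
    if override is not None:
        return override

    for param in model.parameters():
        device = param.device
        # torch.device exposes `.type`; plain strings are split on ":".
        device_type = getattr(device, "type", str(device).split(":")[0])
        # Assumption: offloaded weights show up as "meta" placeholders, so skip them.
        if device_type != "meta":
            return device

    # No usable parameter was found: fall back to the CPU.
    return "cpu"
```

With PyTorch installed, a calibration batch could then be moved with something like `{k: v.to(infer_input_device(model)) for k, v in batch.items()}`.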

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

github-actions · 2025-06-20T17:54:31Z

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

gemini-code-assist

Code Review

This pull request provides a nice cleanup for the Mixture of Experts (MoE) examples, updating them to use more recent models and simplifying the quantization recipes. The most significant improvement is the robust handling of device placement for calibration data across various pipelines, which is a great enhancement.

My main feedback is to avoid hardcoding the "cuda" device in the sample generation part of the example scripts. Inferring the device from the model would make the examples more portable and robust. I've left specific suggestions on how to achieve this in the relevant files.
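
One way to act on this suggestion, sketched under assumptions: the helper name `generate_sample` is hypothetical, and the prompt in the usage note is only a placeholder. The sketch relies on `model.device`, `tokenizer(..., return_tensors="pt")`, `BatchEncoding.to`, `model.generate`, and `tokenizer.decode` from Hugging Face transformers, and moves the inputs to wherever the model lives instead of to a hardcoded "cuda".

```python
from typing import Any


def generate_sample(
    model: Any, tokenizer: Any, prompt: str, max_new_tokens: int = 100
) -> str:
    """Generate a short completion without hardcoding the device."""
    # Tokenize, then move the inputs to the model's own device rather than "cuda".
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode the first (and only) sequence in the batch.
    return tokenizer.decode(output[0])
```

In an example script this might be called as `print(generate_sample(model, tokenizer, "Hello my name is"))` after quantization. For models split across several devices, `model.device` may not be the right target, so treat this as a starting point rather than a general solution.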

examples/quantizing_moe/deepseekv2_5_example.py

examples/quantizing_moe/mixtral_example.py

examples/quantizing_moe/qwen_example.py

dsikka

Just an FYI: both FP8 and W4A16 were popular enough to end up with their own examples as a convenience (especially since FP8 can be data-free and is the most popular on vLLM, so it's helpful to have that to point to).

I would keep 1-2 FP8 examples, at least for the most popular models.

kylesayrs added 14 commits June 19, 2025 10:55

deepseekv3

b30eade

Signed-off-by: Kyle Sayers <[email protected]>

remove dreg

a957f2f

Signed-off-by: Kyle Sayers <[email protected]>

reformat example

2fd2a25

Signed-off-by: Kyle Sayers <[email protected]>

wip: clean up moe examples

b8b217c

Signed-off-by: Kyle Sayers <[email protected]>

remove deepseek2.5 for now

43bc91d

Signed-off-by: Kyle Sayers <[email protected]>

update readme

7d8ed36

Signed-off-by: Kyle Sayers <[email protected]>

infer model device with optional override

b7273a9

Signed-off-by: Kyle Sayers <[email protected]>

handle nullable dataset_args

afebe2e

Signed-off-by: Kyle Sayers <[email protected]>

update docstrings, comments

ab3aa3e

Signed-off-by: Kyle Sayers <[email protected]>

rename files, update examples tests

e9e30c3

Signed-off-by: Kyle Sayers <[email protected]>

rebase on main

6bf5acb

Signed-off-by: Kyle Sayers <[email protected]>

clean examples

e77a31b

Signed-off-by: Kyle Sayers <[email protected]>

revert examples changes

366ac25

Signed-off-by: Kyle Sayers <[email protected]>

revert extra examples

c44da34

Signed-off-by: Kyle Sayers <[email protected]>

gemini-code-assist bot reviewed Jun 20, 2025

revert examples changes

2db2789

Signed-off-by: Kyle Sayers <[email protected]>

kylesayrs added 2 commits June 20, 2025 13:55

remove extra examples

0dc2381

Signed-off-by: Kyle Sayers <[email protected]>

revert examples tests changes

b70aba7

Signed-off-by: Kyle Sayers <[email protected]>

gemini-code-assist bot reviewed Jun 20, 2025

examples/quantizing_moe/deepseekv2_5_example.py (resolved)

examples/quantizing_moe/mixtral_example.py (resolved)

examples/quantizing_moe/qwen_example.py (resolved)

kylesayrs added 3 commits June 20, 2025 13:58

Revert "revert extra examples"

5e5657b

This reverts commit c44da34.

Merge branch 'kylesayrs/deepseek-v3' into kylesayrs/cleanup-moe-examples

735c317

clean up examples

4812350

Signed-off-by: Kyle Sayers <[email protected]>

kylesayrs changed the base branch from main to kylesayrs/deepseekv2.5 June 20, 2025 18:04

dsikka reviewed Jun 20, 2025

Base automatically changed from kylesayrs/deepseekv2.5 to main June 20, 2025 21:32
