Add megablocks support for MLP MoE #2

Open · wants to merge 2 commits into main
Conversation

@Spico197 (Collaborator) commented on Dec 7, 2024

What's New

Add MegaBlocks support for MLP MoE. The dumping & reloading test passes, verified by observing a continuous loss decline, but downstream metrics have not been evaluated yet, so please use this feature with caution.

  1. Add a conversion script from the dense LLaMA model: smoe/utils/expert_construction/convert_llama_to_mixtral_mb.py
  2. Add moe_type="megablocks" support to smoe/models/mixtral/modeling_mixtral.py (see the usage sketch after this list)
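
Below is a minimal usage sketch of how a converted checkpoint might be loaded with the MegaBlocks path enabled. Only the moe_type="megablocks" switch and the script/module paths come from this PR; the import path, the HF-style class names (MixtralConfig, MixtralForCausalLM), and the checkpoint directory are assumptions for illustration.

```python
# Minimal sketch (assumed API, not taken verbatim from this PR):
# load a checkpoint produced by convert_llama_to_mixtral_mb.py and
# enable the MegaBlocks MoE code path in modeling_mixtral.py.
from smoe.models.mixtral import MixtralConfig, MixtralForCausalLM  # assumed import path

ckpt_dir = "outputs/llama3-8b-mixtral-mb"  # hypothetical conversion output directory

config = MixtralConfig.from_pretrained(ckpt_dir)
config.moe_type = "megablocks"  # the new switch introduced by this PR

model = MixtralForCausalLM.from_pretrained(ckpt_dir, config=config)
model.cuda().eval()
```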

Performance Test

  • Experiments are conducted on 4×A100 GPUs with parameters converted from LLaMA-3-8B (8 experts, top-2 routing).
  • The dataset is composed of 50 samples from OpenHermes-2.5.
  • Batch size = 2, gradient accumulation = 4, sequence length = 4096.
| Setting | Tokens/GPU/Second |
| --- | --- |
| w/o MegaBlocks | 13485 |
| w/ MegaBlocks | 19051 |
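
For reference, this corresponds to a throughput ratio of 19051 / 13485 ≈ 1.41, i.e., roughly a 41% speedup from enabling MegaBlocks under this setting.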

@Spico197 requested a review from XiaoYee on Dec 7, 2024 at 18:44