
Fix issues with Llama HF->NeoX conversion #1345


Merged
5 commits merged into main from fix_llama_neox_conversion on May 9, 2025

Conversation

aurelion-source
Contributor

@aurelion-source commented Mar 10, 2025

Resolves #1337 and #1342

  • Following #1315 (fix for GQA issue #1314), the GQA code no longer splits heads based on num_q_heads. This PR updates tools/ckpts/convert_hf_llama_to_neox.py to concatenate the tp-partitioned q, k, and v weights without per-head splitting (see the first sketch below).
  • The current RMSNorm implementation incorrectly adds epsilon to the RMS instead of the variance. The fix is the same as in #1342 (RMSNorm epsilon implementation), but keeps compatibility with partial RMSNorm (see the second sketch below).

- Fixes GQA conversion by concatenating tp-partitioned q, k, and v weights instead of splitting by heads first for GQA
- Fixes RMSNorm implementation by adding epsilon to the variance instead of adding it directly to RMS
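
To make the GQA change concrete, here is a minimal sketch (not the literal code in tools/ckpts/convert_hf_llama_to_neox.py; the helper name and shapes are illustrative): each of Q, K, and V is split along its output dimension across tensor-parallel ranks, and each rank's whole slices are concatenated, with no per-head interleaving.

```python
import torch

def fuse_qkv_per_tp_rank(hf_q, hf_k, hf_v, tp):
    """Sketch: HF Llama q/k/v projection weights -> NeoX-style fused query_key_value.

    hf_q has shape [num_q_heads * head_dim, hidden]; hf_k and hf_v have shape
    [num_kv_heads * head_dim, hidden]. Each projection is split along its
    output dimension across `tp` ranks, and each rank's Q, K, and V slices
    are concatenated whole -- no splitting by heads first.
    """
    q_parts = torch.chunk(hf_q, tp, dim=0)
    k_parts = torch.chunk(hf_k, tp, dim=0)
    v_parts = torch.chunk(hf_v, tp, dim=0)
    return [torch.cat(parts, dim=0) for parts in zip(q_parts, k_parts, v_parts)]
```

And a minimal sketch of the RMSNorm epsilon fix (again illustrative, not the repo's code): the corrected form adds eps to the mean of squares before the square root, matching the HF Llama reference, while the previous behaviour added it to the RMS itself.

```python
import torch

def rms_norm_fixed(x, weight, eps=1e-5):
    # eps is added to the variance (mean of squares) before rsqrt.
    variance = x.pow(2).mean(-1, keepdim=True)
    return weight * (x * torch.rsqrt(variance + eps))

def rms_norm_old(x, weight, eps=1e-5):
    # Previous behaviour: eps added to the RMS after the square root,
    # which diverges from the HF implementation for small-norm activations.
    rms = x.pow(2).mean(-1, keepdim=True).sqrt()
    return weight * (x / (rms + eps))
```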
@aflah02
Contributor

aflah02 commented Mar 10, 2025

Hi @aurelion-source
I think there is a similar issue in the reverse direction (NeoX -> HF) caused by the same changes: when I convert a GQA model to HF, it generates gibberish until I modify the conversion file. The RMSNorm implementation also differs, which causes discrepancies between the Llama HF class and NeoX.
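
(For anyone trying to reproduce this, a rough way to sanity-check a converted checkpoint is to compare greedy generations from the original HF model and the round-tripped one; the paths and prompt below are placeholders, not from this PR.)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def sample(model_dir, prompt="The capital of France is"):
    tok = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto")
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
    return tok.decode(out[0], skip_special_tokens=True)

# A faithful conversion should give (near-)identical greedy continuations.
print(sample("meta-llama/Llama-3.1-8B"))            # placeholder: original HF checkpoint
print(sample("/path/to/neox_roundtrip_hf_export"))  # placeholder: NeoX -> HF converted checkpoint
```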

@aurelion-source
Contributor Author

> Hi @aurelion-source, I think there is a similar issue in the reverse direction (NeoX -> HF) caused by the same changes: when I convert a GQA model to HF, it generates gibberish until I modify the conversion file. The RMSNorm implementation also differs, which causes discrepancies between the Llama HF class and NeoX.

Hi @aflah02, thanks for pointing out the issue with the NeoX -> HF conversion. I've updated and tested the script to fix the problem.
This PR also addresses the RMSNorm discrepancies to align with other implementations. Let me know if you run into any issues when you get a chance to test this.

@aflah02
Contributor

aflah02 commented Apr 5, 2025

Thanks @aurelion-source
I'll try to test this next week and get back

One question: if I use the fused RMSNorm kernel, will it be compatible with the HF version, or is it equivalent to the NeoX version?

@Shetano
Contributor

Shetano commented Apr 14, 2025

> Thanks @aurelion-source I'll try to test this next week and get back
>
> One question: if I use the fused RMSNorm kernel, will it be compatible with the HF version, or is it equivalent to the NeoX version?

Yes, it should be compatible with HF. It uses apex's fused RMSNorm implementation.
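
(If you want to verify that locally, here is a rough parity check; it assumes NVIDIA Apex is installed and exposes apex.normalization.FusedRMSNorm, and is only a sketch rather than anything from the NeoX test suite.)

```python
import torch
from apex.normalization import FusedRMSNorm  # assumes an Apex build with the fused RMSNorm module

hidden = 4096
x = torch.randn(2, 8, hidden, device="cuda")

fused = FusedRMSNorm(hidden, eps=1e-5).cuda()

def ref_rms_norm(x, weight, eps=1e-5):
    # HF Llama-style reference: eps on the variance before rsqrt.
    variance = x.pow(2).mean(-1, keepdim=True)
    return weight * (x * torch.rsqrt(variance + eps))

with torch.no_grad():
    torch.testing.assert_close(fused(x), ref_rms_norm(x, fused.weight), rtol=1e-5, atol=1e-5)
```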

Quentin-Anthony previously approved these changes May 9, 2025
Quentin-Anthony merged commit 4c9f108 into main May 9, 2025
1 of 4 checks passed
Quentin-Anthony deleted the fix_llama_neox_conversion branch May 9, 2025 15:40
Development

Successfully merging this pull request may close these issues.

convert_hf_llama_to_neox.py appears incompatible with LLAMA-3.1-70B