
Fix issues with Llama HF->NeoX conversion #1345


Merged
5 commits merged into main from fix_llama_neox_conversion on May 9, 2025

Conversation

aurelion-source
Contributor

@aurelion-source commented Mar 10, 2025

Resolves #1337 and #1342

  • Following #1315 (fix for GQA issue #1314), the GQA code no longer splits heads based on num_q_heads. This PR updates tools/ckpts/convert_hf_llama_to_neox.py to concatenate the tp-partitioned q, k, and v weights without per-head splitting (see the first sketch below).
  • The current RMSNorm implementation incorrectly adds epsilon to the RMS instead of the variance. The fix is the same as in #1342 (RMSNorm epsilon implementation), but keeps compatibility with partial RMSNorm (see the second sketch below).

- Fixes GQA conversion by concatenating tp-partitioned q, k, and v weights instead of splitting by heads first for GQA
- Fixes RMSNorm implementation by adding epsilon to the variance instead of adding it directly to RMS
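
To make the GQA change concrete, here is a minimal sketch (not the literal code in tools/ckpts/convert_hf_llama_to_neox.py; the helper name and shapes are illustrative): each of Q, K, and V is split along its output dimension across tensor-parallel ranks, and each rank's whole slices are concatenated, with no per-head interleaving.

```python
import torch

def fuse_qkv_per_tp_rank(hf_q, hf_k, hf_v, tp):
    """Sketch: HF Llama q/k/v projection weights -> NeoX-style fused query_key_value.

    hf_q has shape [num_q_heads * head_dim, hidden]; hf_k and hf_v have shape
    [num_kv_heads * head_dim, hidden]. Each projection is split along its
    output dimension across `tp` ranks, and each rank's Q, K, and V slices
    are concatenated whole -- no splitting by heads first.
    """
    q_parts = torch.chunk(hf_q, tp, dim=0)
    k_parts = torch.chunk(hf_k, tp, dim=0)
    v_parts = torch.chunk(hf_v, tp, dim=0)
    return [torch.cat(parts, dim=0) for parts in zip(q_parts, k_parts, v_parts)]
```

And a minimal sketch of the RMSNorm epsilon fix (again illustrative, not the repo's code): the corrected form adds eps to the mean of squares before the square root, matching the HF Llama reference, while the previous behaviour added it to the RMS itself.

```python
import torch

def rms_norm_fixed(x, weight, eps=1e-5):
    # eps is added to the variance (mean of squares) before rsqrt.
    variance = x.pow(2).mean(-1, keepdim=True)
    return weight * (x * torch.rsqrt(variance + eps))

def rms_norm_old(x, weight, eps=1e-5):
    # Previous behaviour: eps added to the RMS after the square root,
    # which diverges from the HF implementation for small-norm activations.
    rms = x.pow(2).mean(-1, keepdim=True).sqrt()
    return weight * (x / (rms + eps))
```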
@aflah02
Contributor

aflah02 commented Mar 10, 2025

Hi @aurelion-source
I think there is a similar issue in the reverse direction (NeoX -> HF) caused by the same changes: when I convert a GQA model to HF, it generates gibberish until I modify the conversion file. The RMSNorm implementation also differs, which causes discrepancies between the Llama HF class and NeoX.
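
(For anyone trying to reproduce this, a rough way to sanity-check a converted checkpoint is to compare greedy generations from the original HF model and the round-tripped one; the paths and prompt below are placeholders, not from this PR.)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def sample(model_dir, prompt="The capital of France is"):
    tok = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto")
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
    return tok.decode(out[0], skip_special_tokens=True)

# A faithful conversion should give (near-)identical greedy continuations.
print(sample("meta-llama/Llama-3.1-8B"))            # placeholder: original HF checkpoint
print(sample("/path/to/neox_roundtrip_hf_export"))  # placeholder: NeoX -> HF converted checkpoint
```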

@aurelion-source
Contributor Author

> Hi @aurelion-source, I think there is a similar issue in the reverse direction (NeoX -> HF) caused by the same changes: when I convert a GQA model to HF, it generates gibberish until I modify the conversion file. The RMSNorm implementation also differs, which causes discrepancies between the Llama HF class and NeoX.

Hi @aflah02, thanks for pointing out the issue with the NeoX -> HF conversion. I've updated and tested the script to fix the problem.
This PR also addresses the RMSNorm discrepancies to align with other implementations. Let me know if you run into any issues when you get a chance to test this.

@aflah02
Contributor

aflah02 commented Apr 5, 2025

Thanks @aurelion-source
I'll try to test this next week and get back

One question: if I use the fused RMSNorm kernel, will it be compatible with the HF version, or is it equivalent to the NeoX version?

@Shetano
Contributor

Shetano commented Apr 14, 2025

> Thanks @aurelion-source I'll try to test this next week and get back
>
> One question: if I use the fused RMSNorm kernel, will it be compatible with the HF version, or is it equivalent to the NeoX version?

Yes, it should be compatible with HF. It uses apex's fused RMSNorm implementation.
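
(If you want to verify that locally, here is a rough parity check; it assumes NVIDIA Apex is installed and exposes apex.normalization.FusedRMSNorm, and is only a sketch rather than anything from the NeoX test suite.)

```python
import torch
from apex.normalization import FusedRMSNorm  # assumes an Apex build with the fused RMSNorm module

hidden = 4096
x = torch.randn(2, 8, hidden, device="cuda")

fused = FusedRMSNorm(hidden, eps=1e-5).cuda()

def ref_rms_norm(x, weight, eps=1e-5):
    # HF Llama-style reference: eps on the variance before rsqrt.
    variance = x.pow(2).mean(-1, keepdim=True)
    return weight * (x * torch.rsqrt(variance + eps))

with torch.no_grad():
    torch.testing.assert_close(fused(x), ref_rms_norm(x, fused.weight), rtol=1e-5, atol=1e-5)
```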

Quentin-Anthony previously approved these changes May 9, 2025
Quentin-Anthony merged commit 4c9f108 into main May 9, 2025
1 of 4 checks passed
Quentin-Anthony deleted the fix_llama_neox_conversion branch May 9, 2025 15:40
Development

Successfully merging this pull request may close these issues.

convert_hf_llama_to_neox.py appears incompatible with LLAMA-3.1-70B