Add NearSwap #480

popbyte · 2025-01-05T06:06:05Z

NearSwap Algorithm

NearSwap keeps most of the base model's weights, but where a weight is similar in the base and secondary models, it is interpolated toward the secondary model's value. A parameter t sets the similarity threshold: when the distance between the two values is below t, the weight from the secondary model is used.
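
Read together with the review discussion further down this thread, the rule can be written per weight as follows. This is a sketch rather than a statement of the merged code: the symbols $v_0$ (base weight), $v_1$ (secondary weight), and $w$ (interpolation weight) are notation chosen here, and $t > 0$ is assumed.

$$
w = \operatorname{clamp}\!\left(\frac{t}{\lvert v_0 - v_1 \rvert},\, 0,\, 1\right),
\qquad
\text{out} = (1 - w)\, v_0 + w\, v_1
$$

$$
\text{out} =
\begin{cases}
v_1 & \text{if } \lvert v_0 - v_1 \rvert \le t \\
v_0 + t \cdot \operatorname{sgn}(v_1 - v_0) & \text{otherwise}
\end{cases}
$$

Under these assumptions, a weight that differs by more than t moves toward the secondary value by exactly t, which is why most of the base model is retained when t is small.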

This PR implements the NearSwap algorithm from here

popbyte · 2025-01-05T11:17:05Z

CI should now pass; please review.


ElliotStein · 2025-01-06T14:58:00Z

Thanks for the PR! I’ve made a couple of changes:
1. I swapped out NumPy for PyTorch to avoid unnecessary CPU-GPU data transfers.
2. I restored the lweights = t / lweights line, which I believe is necessary since the t parameter wasn’t being used otherwise.

This looks like a solid implementation of the algorithm described at QuartetAnemoi-70B-t0.0001.

To clarify the algorithm (given base_model and secondary_model, with V0 a weight from base_model and V1 the corresponding weight from secondary_model):
• Use weights from secondary_model directly when they are close to those in base_model (within threshold t).
• Interpolate between base_model and secondary_model otherwise, with t/abs(V0-V1) as the interpolation weight.
• The further apart the weights are, the more the interpolation favours the base_model.
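
For illustration, the three points above can be sketched on plain Python floats. This is a minimal sketch, not the code merged in this PR: the names `nearswap_scalar` and `nearswap` are hypothetical, the PR's implementation works on PyTorch tensors, and treating identical weights as interpolation weight 1 is an assumption about how the zero-distance case is handled.

```python
from typing import List


def nearswap_scalar(t: float, v0: float, v1: float) -> float:
    """Illustrative per-weight NearSwap; v0 is the base weight, v1 the secondary weight."""
    distance = abs(v0 - v1)
    if distance == 0.0:
        # Assumption: identical weights get weight 1 (both values are the same anyway).
        weight = 1.0
    else:
        # t / distance exceeds 1 when the weights are closer than t, so clamp to [0, 1].
        weight = min(max(t / distance, 0.0), 1.0)
    # Linear interpolation: weight 0 keeps the base value, weight 1 takes the secondary value.
    return (1.0 - weight) * v0 + weight * v1


def nearswap(t: float, base: List[float], secondary: List[float]) -> List[float]:
    """Apply the scalar rule element by element to two equally sized weight lists."""
    if len(base) != len(secondary):
        raise ValueError("base and secondary must have the same length")
    return [nearswap_scalar(t, a, b) for a, b in zip(base, secondary)]


if __name__ == "__main__":
    # First pair is within t = 0.1 (secondary value taken); second pair is far apart
    # (result stays close to the base value).
    print(nearswap(0.1, [1.0, 0.0], [1.05, 10.0]))
```

With t = 0.1, a pair such as (1.0, 1.05) resolves to the secondary value, while a pair such as (0.0, 10.0) stays close to the base value.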

Is that understanding correct?

popbyte · 2025-01-07T09:51:39Z

> I restored the lweights = t / lweights line, which I believe is necessary since the t parameter wasn’t being used otherwise

Oops, that was accidentally removed.

> Is that understanding correct?

Yes.

cg123 · 2025-01-25T07:17:20Z

Thanks for the PR! I've updated it to work with the new merge method registry.

add nearswap

1a656dd

popbyte marked this pull request as draft January 5, 2025 06:08

add tests and pass those tests

905c328

popbyte marked this pull request as ready for review January 5, 2025 06:47

popbyte added 4 commits January 5, 2025 08:12

threshold should not accept np.ndarray

2808892

remove group_label function as it is unused

92f7f3b

update README

0d68bb9

this commit makes CI pass

a03b696

ElliotStein added 4 commits January 6, 2025 12:37

Rewrite NearSwap:

28f10c8

1) Undo functional changes to nearswap in commit 905c328 - the t / lweight step is necessary for this to make sense. 2) All in pytorch to avoid unnecessary CPU-GPU transfer. 3) added minor detail to error messages

link to original repo

045dd2b

linear interp was the wrong way around - if the intention is more similar weights -> more bias towards v1

1e59788

revert previous commit

a9e884d

cg123 added 3 commits January 23, 2025 22:25

Merge branch 'main' into main

3aeb1e8

Merge branch 'main' into main

c1a81f7

Add nearswap to registry

976361b

cg123 merged commit 84c83f8 into arcee-ai:main Jan 25, 2025
4 checks passed
