Add NearSwap #480
CI should now pass, please review.
1) Undo functional changes to nearswap in commit 905c328; the t / lweight step is necessary for this to make sense. 2) Do everything in PyTorch to avoid unnecessary CPU-GPU transfers. 3) Add minor detail to error messages.
…ilar weights -> more bias towards v1
Thanks for the PR! I’ve made a couple of changes: This looks like a solid implementation of the algorithm described at QuartetAnemoi-70B-t0.0001. To clarify the algorithm (Given base_model and secondary_model): Is that understanding correct?
Oops, that was accidentally removed.
Yes.
Thanks for the PR! I've updated it to work with the new merge method registry.
NearSwap Algorithm
NearSwap retains most of the weights of the base model, but where a weight is similar between the two models, it is interpolated toward the secondary model's value. A parameter t specifies the similarity threshold: when the distance between two values is below t, the weight from the secondary model is used.
This PR implements the NearSwap algorithm from here
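One possible reading of the description above, combined with the t / lweight step mentioned in the commit notes, can be sketched in plain Python. This is only an illustration under assumptions, not the code in this PR: the function name `nearswap_sketch` is hypothetical, it works on flat lists of floats rather than tensors, it assumes t > 0, and treating identical values as fully "near" is a choice made here to avoid dividing by zero.

```python
def nearswap_sketch(base, secondary, t):
    """Blend two equal-length lists of floats, element by element.

    Hypothetical helper for illustration only; assumes t > 0.
    """
    if len(base) != len(secondary):
        raise ValueError("base and secondary must have the same length")

    merged = []
    for v0, v1 in zip(base, secondary):
        distance = abs(v0 - v1)
        if distance == 0.0:
            # Assumption: identical values count as fully "near".
            weight = 1.0
        else:
            # Interpolation weight t / distance, capped at 1 so that values
            # closer than t take the secondary model's value outright.
            weight = min(t / distance, 1.0)
        # Linear interpolation from the base value toward the secondary value.
        merged.append(v0 + weight * (v1 - v0))
    return merged


if __name__ == "__main__":
    base = [1.0, 1.0, 2.0]
    secondary = [1.0005, 3.0, 2.0]
    print(nearswap_sketch(base, secondary, t=0.001))
```

In this sketch the first pair differs by less than t, so the secondary value is taken; the second pair differs by far more than t, so the result stays close to the base value; the third pair is identical and is left unchanged.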