Best alpha/cpe values while using extended context length on Exllama? #227
Replies: 1 comment
-
For linear-scaled models, use whatever the model was finetuned with. So SuperHOT 8k should be cpe 4, SuperHOT 16k should be cpe 8, and so on; usually [finetune length] / [base model length] = cpe.

NTK scaling is more complicated. Unlike linear scaling, it doesn't need a finetune to work, and there are several variants now that all behave a little differently. "NTKv1" (--alpha), which ExLlama currently uses, is kinda hard to predict. A ppl test is pretty reliable, but only if your test is actually calculating ppl near the end of the context window and not on shorter sequences. For example, the ppl benchmark built into ExLlama tests an average sequence length of around 150 tokens, which is obviously nowhere near the point where the model starts losing it if max length is set too high. Otherwise, if you're using it on a regular base-length model, you can just work out empirically where the model starts freaking out, then dial alpha up a bit or context down a bit. Do note that alpha is a float, so you can pass in something like 2.5.

Oh, and you probably shouldn't mix --compress_pos_emb and --alpha.
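To make the two knobs concrete, here's a small sketch of the arithmetic. The NTKv1 base adjustment below is the commonly cited base * alpha^(d / (d - 2)) formula; check your loader's source to confirm it applies the same convention, since not every implementation does.

```python
# Rough sketch of the two scaling schemes described above.

def linear_cpe(finetune_len: int, base_len: int) -> float:
    """compress_pos_emb for a linear-scaled finetune (e.g. SuperHOT)."""
    return finetune_len / base_len

def ntk_v1_base(alpha: float, head_dim: int = 128, rope_base: float = 10000.0) -> float:
    """RoPE base after NTKv1 --alpha scaling (head_dim=128 for LLaMA)."""
    return rope_base * alpha ** (head_dim / (head_dim - 2))

print(linear_cpe(8192, 2048))   # SuperHOT 8k on a 2k base model -> cpe 4.0
print(linear_cpe(16384, 2048))  # SuperHOT 16k -> cpe 8.0
print(ntk_v1_base(2.5))         # alpha is a float, so 2.5 is fine -> ~25366
```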
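If you want to sanity-check the ppl point, something like this scores only full-length windows, so the number actually reflects positions near the end of the context instead of an average over short sequences. It's a sketch assuming an HF-style causal LM (not ExLlama's built-in benchmark); the model ID and eval file are placeholders.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-model-here"  # placeholder, swap in your own model
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

ids = tok(open("eval.txt").read(), return_tensors="pt").input_ids[0]
window = 8192  # the extended max length you're testing
nlls, count = [], 0

with torch.no_grad():
    # Non-overlapping full windows only; partial tails are skipped on purpose.
    for start in range(0, ids.numel() - window, window):
        chunk = ids[start:start + window].unsqueeze(0).to(model.device)
        out = model(chunk, labels=chunk)  # HF shifts labels internally
        nlls.append(out.loss * (window - 1))
        count += window - 1

print("ppl over full windows:", math.exp(torch.stack(nlls).sum().item() / count))
```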
-
Curious what values work best for y'all when using SuperHOT and non-SuperHOT models with ExLlama. Trying an 8K SuperHOT model with cpe 4 works great, but for some reason it starts losing coherence after a while when using alpha 4 instead. Also, how do alpha and cpe interact with each other? Is ppl the best way to optimize the parameters?
Tbh, just looking for some discussion on what y'all have found works best (would love info on 8k/16k contexts).