license: llama2
Quants for Sao10K's WinterGoddess 1.4 70b model: https://huggingface.co/Sao10K/WinterGoddess-1.4x-70B-L2
With a twist: the model I used comes from a third party, and has been tweaked with limarpv3 and a Linear Rope 8 training to reach 32k context (with even better results at rope 4 and rope 2, and maybe other lower ropes as well).
I don't know who did the job, only that I found this Q4_K_S quant of it hanging around without an FP16 version: https://huggingface.co/mishima/WinterGoddess-1.4x-limarpv3-70B-L2-32k.GGUF
So I made a Q8_0 out of it (the best basis to requantize from afterwards), and requantized that into Q3_K_S and Q2_K for my needs.
Lower quants (SOTA 2-bit) to come if I'm able to make an iMatrix on my config (64GB RAM).
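For reference, here is a minimal sketch of that requant chain done with llama.cpp's `quantize` tool, just wrapped in Python. This is my assumption of the steps, not the exact commands of whoever made the original quant: binary name, paths and file names are illustrative and tied to a b1924-era build.

```python
# Sketch of the requant chain: Q4_K_S -> Q8_0 -> Q3_K_S / Q2_K.
# Assumes a llama.cpp build (b1924 era) with the `quantize` binary in ./,
# and the source Q4_K_S GGUF in the current directory. Paths are examples.
import subprocess

SRC = "WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf"
Q8  = "WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant.Q8_0.gguf"

# Step 1: re-expand the Q4_K_S into a Q8_0 intermediate.
# --allow-requantize is needed because the input is already quantized.
subprocess.run(["./quantize", "--allow-requantize", SRC, Q8, "Q8_0"], check=True)

# Step 2: quantize the Q8_0 down to the smaller K-quants.
for qtype in ("Q3_K_S", "Q2_K"):
    out = f"WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant.{qtype}.gguf"
    subprocess.run(["./quantize", Q8, out, qtype], check=True)
```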
And as a bonus to play with it, my KoboldCPP_-v1.55.1.b1933-_Frankenstein from 21/01/2024: https://github.com/Nexesenex/kobold.cpp/releases/tag/v1.55.1_b1933
Edit: Due to a CPU (i7-6700K) poorly suited to AI purposes, and only 36GB of VRAM, I remade Q3_K_S and Q2_K with a small iMatrix of ctx 32 with 25 chunks (so, 800 tokens). Good news: it lowers the perplexity by:
More than 3% with linear rope 8 (Pos Compress Embeddings) on Q2_K
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,6.2489,512
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K.gguf,-,wikitext,6.0482,512
More than 2% with linear rope 4 on Q2_K
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.8859,512
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K.gguf,-,wikitext,4.7739,512
More than 1.5% with linear rope 2 on Q2_K
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.5030,512
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K.gguf,-,wikitext,4.42,512
More than 1% with linear rope 8 on Q3_K_S
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q3_K_S.gguf,-,wikitext,5.6127,512
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q3_K_S.gguf,-,wikitext,5.5461,512
A Q3_K_M with iMatrix has been added, as well as a Q2_K_S.
Linear rope 2.5:
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K_S.gguf,-,wikitext,4.6789,512
Edit: A Q3_K_XS, a new quant offered in LlamaCPP, is on the way, with an iMatrix of ctx 32 with 2500 chunks (so, 80,000 tokens).
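If you want to redo the iMatrix step yourself, here is a rough sketch of how such matrices can be generated and applied with llama.cpp's `imatrix` and `quantize` tools. The calibration file name, the -ngl value and the exact flags are my assumptions for a b1924-era build; check your own build's help output.

```python
# Sketch: build an importance matrix on a small calibration text, then
# apply it when requantizing. File names and -ngl value are illustrative.
import subprocess

MODEL = "WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant.Q8_0.gguf"
CALIB = "wiki.train.raw"  # any representative text file

# ctx 32 with 25 chunks = 800 tokens (the "c32_ch25" matrix);
# bump --chunks to 2500 for the 80,000-token matrix.
subprocess.run([
    "./imatrix", "-m", MODEL, "-f", CALIB,
    "-o", "imatrix-c32_ch25.dat",
    "-c", "32", "--chunks", "25",
    "-ngl", "40",          # offload what fits in your VRAM
], check=True)

# Requantize with the matrix applied.
subprocess.run([
    "./quantize", "--imatrix", "imatrix-c32_ch25.dat",
    MODEL,
    "WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K.gguf",
    "Q2_K",
], check=True)
```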
Interestingly, linear rope 2.5 (and linear rope 1.6 as well, after further testing) is almost lossless compared to linear rope 2, while 3 and 3.2 are quite good. Here are the values with the normal Q2_K:
- Linear rope 2.5 (max context 10240) : WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.5246,512
- Linear rope 3 (max context 12288) : WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.6203,512
- Linear rope 3.2 (max context 13107) : WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.6679,512
And for the adventurous, linear rope 10 (max context 40960): WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,7.1577,512
- Minus 3% with my Q2_K with the c32_ch25 iMatrix: WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K.gguf,-,wikitext,6.9405,512
So the linear rope, at least on this model, is flexible, and you can lower it to get the best perplexity for your target max context.
All these results are reproducible, with lower deltas between them, on Q3_K_S, and I suppose on other quants as well.
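To spell out the arithmetic: on a Llama 2 base context of 4096 tokens, the linear rope factor simply multiplies the usable context, and corresponds to `--rope-freq-scale 1/factor` in llama.cpp / KoboldCPP. A small helper, assuming that 4096 base for this L2 model:

```python
# Linear rope factor -> usable context and llama.cpp --rope-freq-scale.
# Assumes the Llama 2 native context of 4096 tokens.
BASE_CTX = 4096

def linear_rope(factor: float) -> dict:
    return {
        "max_context": int(BASE_CTX * factor),
        "rope_freq_scale": 1.0 / factor,
    }

for f in (1, 2, 2.5, 3, 3.2, 4, 8, 10):
    info = linear_rope(f)
    print(f"rope {f:>4}: ctx {info['max_context']:>6}, "
          f"--rope-freq-scale {info['rope_freq_scale']:.4f}")
# e.g. rope 2.5 -> 10240, rope 3 -> 12288, rope 3.2 -> 13107, rope 10 -> 40960,
# matching the max context values listed above.
```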
Then, I wonder about applying an NTK rope on top of it to extend it further, even if it screws with the integrity of numbers in chat. Multiply a linear rope (2, 4, 8, whatever) by 5888 (Alpha 1.6, or RBF 16119.8), 6144 (Alpha 1.8, or RBF 18168.7), and even 7424 (Alpha 2.2, or RBF 22277), to get a further boost in max context size. Example with linear 8 and Alpha 2.2/RBF 22277: 8*7424 = 59392. It's only theoretical of course, but worth testing.
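For what it's worth, the RBF values quoted above are consistent with the usual NTK-alpha conversion base = 10000 * alpha^(d/(d-2)) with a head dimension d of 128. A quick check, reusing the per-rope context figures from the paragraph above (which are the author's numbers, not something I derived):

```python
# NTK alpha -> rope frequency base (RBF), using the common formula
# base = 10000 * alpha ** (d / (d - 2)) with head dim d = 128.
# The per-linear-rope context figures are the ones quoted above.
HEAD_DIM = 128

def alpha_to_base(alpha: float) -> float:
    return 10000.0 * alpha ** (HEAD_DIM / (HEAD_DIM - 2))

for alpha, ctx_per_rope in ((1.6, 5888), (1.8, 6144), (2.2, 7424)):
    base = alpha_to_base(alpha)
    print(f"alpha {alpha}: RBF ~{base:.1f}, "
          f"theoretical ctx at linear rope 8 = {8 * ctx_per_rope}")
# alpha 1.6 -> ~16119.8 ; alpha 1.8 -> ~18168.7 ; alpha 2.2 -> ~22277
# 8 * 7424 = 59392, the figure quoted above for linear 8 + alpha 2.2.
```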
Original 70b 4k model perplexity:
- WinterGoddess-1.4x-70B-L2.Q3_K_M.gguf,-,wikitext,3.7428,512,PEC1
Benchmarks of the original Q4_K_S quant I found:
Linear rope 8, rope base 10000:
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.2177,4096
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.1324,6144
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.3923,2048
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.4945,1536
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.6700,1024
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,5.2577,512
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,hellaswag,84.5,,400
Linear rope 4, rope base 10000:
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,3.5762,2048
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.1235,512
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,hellaswag,87.25,,400
Linear rope 2, rope base 10000:
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,3.3394 *327,2048
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,3.8254,512
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,hellaswag,88,,400
Linear rope 1, rope base 10000:
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,hellaswag,85,,400
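For completeness, lines like the above come out of llama.cpp's `perplexity` tool. A rough sketch of such runs, shown here for linear rope 8; the flags and test file names are assumptions based on b1924-era builds and may need adjusting to your setup.

```python
# Sketch of the measurement runs behind the CSV-style lines above.
# Assumes llama.cpp's `perplexity` binary and local test files; adjust
# -ngl and the context size (-c) to your hardware and target.
import subprocess

MODEL = "WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf"

# Wikitext perplexity at ctx 512, linear rope 8 (scale 1/8 = 0.125).
subprocess.run([
    "./perplexity", "-m", MODEL, "-f", "wiki.test.raw",
    "-c", "512", "--rope-freq-scale", "0.125", "-ngl", "40",
], check=True)

# Hellaswag, 400 tasks, same rope settings.
subprocess.run([
    "./perplexity", "-m", MODEL, "-f", "hellaswag_val_full.txt",
    "--hellaswag", "--hellaswag-tasks", "400",
    "--rope-freq-scale", "0.125", "-ngl", "40",
], check=True)
```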