---
license: llama2
---

Quants for Sao10K's model WinterGoddess 1.4x 70B: https://huggingface.co/Sao10K/WinterGoddess-1.4x-70B-L2

With a twist: the model I used comes from a third party and has been tweaked with limarpv3 and a Linear Rope 8 training to reach 32k context (with even better results at rope 4 and rope 2, and maybe other lower ropes as well).

I don't know who did the job, only that I found this Q4_K_S quant of it hanging around without an FP16: https://huggingface.co/mishima/WinterGoddess-1.4x-limarpv3-70B-L2-32k.GGUF

So I made a Q8_0 out of it (the best base to requantize from afterwards), and requantized it to Q3_K_S and Q2_K for my needs.
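For reference, a minimal sketch of that requantization path with llama.cpp's quantize tool, driven from Python. The binary name (recent builds ship it as llama-quantize), paths and file names are assumptions; check `--help` on your own build before relying on the flags.

```python
# Hedged sketch: upcast the found Q4_K_S to Q8_0, then requantize down.
# Binary path and file names are assumptions; --allow-requantize is needed
# because the source file is already quantized.
import subprocess

QUANTIZE = "./quantize"  # or ./llama-quantize on recent llama.cpp builds
SRC = "WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf"
Q8  = "WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q8_0.gguf"

# Step 1: Q4_K_S -> Q8_0 intermediate.
subprocess.run([QUANTIZE, "--allow-requantize", SRC, Q8, "Q8_0"], check=True)

# Step 2: Q8_0 -> the smaller target types.
for target in ("Q3_K_S", "Q2_K"):
    out = Q8.replace("Q8_0", target)
    subprocess.run([QUANTIZE, "--allow-requantize", Q8, out, target], check=True)
```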

Lower quants (SOTA 2-bit) to come if I'm able to make an iMatrix on my config (64GB RAM).

And as a bonus to play with it, my KoboldCPP_-v1.55.1.b1933-_Frankenstein build from 21/01/2024: https://github.com/Nexesenex/kobold.cpp/releases/tag/v1.55.1_b1933


Edit: due to a CPU (i7-6700K) that is poor for AI purposes, and only 36GB of VRAM, I remade Q3_K_S and Q2_K with a small iMatrix of ctx 32 over 25 chunks (so, 640 tokens). And good news, it lowers the perplexity (see the sketch after the list below) by:

More than 3% in Rope 8 on Q2_K

WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,6.2489,512
WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K.gguf,-,wikitext,6.0482,512

More than 2% in Rope 4 on Q2_K

WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.8859,512
WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K.gguf,-,wikitext,4.7739,512

More than 1.5% in Rope 2 on Q2_K

WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.5030,512
WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K.gguf,-,wikitext,4.42,512

More than 1% with Rope 8 on Q3_K_S

WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q3_K_S.gguf,-,wikitext,5.6127,512
WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q3_K_S.gguf,-,wikitext,5.5461,512

A Q3_K_M with iMatrix has been added as well.
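As promised above, here is a minimal sketch of how such a tiny iMatrix run and the iMatrix-aware requantization can be reproduced with llama.cpp's imatrix and quantize tools, plus a check of the perplexity deltas listed above. The calibration file, output names and exact flags are assumptions (verify them against your llama.cpp build); the computed deltas simply restate the numbers quoted in this section.

```python
# Hedged sketch: a small iMatrix (ctx 32, 25 chunks = 640 tokens) and an
# iMatrix-aware requantization. Binary names, flags and file names are
# assumptions; check --help on your llama.cpp build.
import subprocess

Q8      = "WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q8_0.gguf"
IMATRIX = "imatrix-c32_ch25.dat"

subprocess.run(["./imatrix", "-m", Q8, "-f", "calibration.txt",  # hypothetical calibration text
                "-o", IMATRIX, "-c", "32", "--chunks", "25"], check=True)

for target in ("Q2_K", "Q3_K_S", "Q3_K_M"):
    out = Q8.replace("Q8_0", f"iMat-c32_ch25-{target}")
    subprocess.run(["./quantize", "--allow-requantize", "--imatrix", IMATRIX,
                    Q8, out, target], check=True)

# Perplexity deltas recomputed from the wikitext/512 values above:
pairs = {"Rope 8, Q2_K":   (6.2489, 6.0482),
         "Rope 4, Q2_K":   (4.8859, 4.7739),
         "Rope 2, Q2_K":   (4.5030, 4.4200),
         "Rope 8, Q3_K_S": (5.6127, 5.5461)}
for name, (plain, imat) in pairs.items():
    print(f"{name}: -{(plain - imat) / plain * 100:.1f}%")
# -> about -3.2%, -2.3%, -1.8%, -1.2%
```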


Interestingly, Rope 2.5 is almost lossless compared to Rope 2, while 3 and 3.2 are quite good as well. Here are the values with the normal Q2_K:

Rope 2.5 (max context 10240) : WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.5246,512

Rope 3 (max context 12288) : WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.6203,512

Rope 3.2 (max context 13107) : WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.6679,512

And for the adventurous, Rope 10 (max context 40960): WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,7.1577,512

- Minus 3% with my Q2_K with the c32_ch25 iMatrix: WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K.gguf,-,wikitext,6.9405,512

So the linear rope, at least on this model, is flexible, and you can lower it to get the best perplexity for your max context.
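To make that concrete, a small sketch assuming the usual Llama 2 native context of 4096 tokens (so max context ≈ 4096 × linear rope factor, which matches the context figures quoted above), paired with the measured wikitext/512 perplexities of the plain Q2_K:

```python
# Linear rope on a Llama 2 base (native 4096 ctx): max context is roughly
# 4096 * rope factor. Perplexities (wikitext, 512) of the plain Q2_K are
# copied from the measurements above.
NATIVE_CTX = 4096
measured_ppl = {2: 4.5030, 2.5: 4.5246, 3: 4.6203, 3.2: 4.6679,
                4: 4.8859, 8: 6.2489, 10: 7.1577}

for factor, ppl in measured_ppl.items():
    print(f"rope {factor:>4}: max ctx ~{int(NATIVE_CTX * factor):>5}, ppl {ppl}")
# Pick the smallest factor whose max ctx still covers your use case.
```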

Then, I wonder about applying an NTK rope on top of it to extend it further (even if it screws with the integrity of numbers in chat). Multiply the linear rope factor (2, 4, 8, whatever) by 5888 (Alpha 1.6, or RBF 16119.8), 6144 (Alpha 1.8, or RBF 18168.7), or even 7424 (Alpha 2.2, or RBF 22277) to get a further boost in max context size. Example with Linear 8 and Alpha 2.2/RBF 22277: 8 * 7424 = 59392. It's only theoretical of course, but worth testing.
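A small worked version of that arithmetic. The alpha-to-base-frequency conversion below is the usual NTK-aware formula, base × alpha^(d/(d-2)) with head dimension 128, which reproduces the RBF values quoted above; the per-alpha context figures (5888 / 6144 / 7424) are simply taken from the paragraph and, like the whole idea, remain theoretical.

```python
# NTK alpha -> rope base frequency (NTK-aware formula, head dim 128), and the
# theoretical max context when stacked on top of a linear rope factor.
HEAD_DIM = 128
BASE_FREQ = 10000.0

def ntk_base(alpha: float) -> float:
    return BASE_FREQ * alpha ** (HEAD_DIM / (HEAD_DIM - 2))

# Per-alpha context figures quoted above (tokens reachable with NTK alone).
ntk_ctx = {1.6: 5888, 1.8: 6144, 2.2: 7424}

for alpha, ctx in ntk_ctx.items():
    print(f"alpha {alpha}: base freq ~{ntk_base(alpha):.1f}, ctx {ctx}")
# -> ~16119.8, ~18168.7, ~22277.1 : matches the RBF values above.

# Stacking on a linear rope factor, e.g. Linear 8 + Alpha 2.2:
print(8 * ntk_ctx[2.2])  # 59392 theoretical max context
```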


Benchmarks of the original Q4_K_S quant I found:

Rope 8, base 10000

WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.2177,4096
WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.1324,6144
WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.3923,2048
WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.4945,1536
WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.6700,1024
WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,5.2577,512
WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,hellaswag,84.5,,400

Rope 4, base 10000

WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,3.5762,2048
WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.1235,512
WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,hellaswag,87.25,,400

Rope 2, base 10000

WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,3.3394 *327,2048
WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,3.8254,512
WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,hellaswag,88,,400

Rope 1, base 10000

WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,hellaswag,85,,400
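For anyone scripting over these result lines, my reading of the comma-separated fields is: model file, a placeholder, dataset, score (perplexity for wikitext, accuracy for hellaswag), then either the context size or, for hellaswag, an empty field plus the task count. A hedged parsing sketch under that assumption:

```python
# Hedged parser for the result lines above; the field layout is my own
# reading of them, not an official format.
def parse_result(line: str) -> dict:
    parts = [p.strip() for p in line.split(",")]
    rec = {"file": parts[0], "dataset": parts[2],
           "score": float(parts[3].split()[0])}  # handles "3.3394 *327"
    if rec["dataset"] == "hellaswag":
        rec["tasks"] = int(parts[-1])
    else:
        rec["ctx"] = int(parts[4])
    return rec

print(parse_result("WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.2177,4096"))
# -> {'file': '...Q4_K_S.gguf', 'dataset': 'wikitext', 'score': 4.2177, 'ctx': 4096}
```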