Update README.md
README.md
CHANGED
@@ -15,7 +15,8 @@ And a bonus to play with it, my KoboldCPP_-_v1.55.1.b1933_-_Frankenstein from th

-----

Edit : Due to a CPU (i7-6700k) that is poor for AI purposes, and only 36GB of VRAM, I remade Q3_K_S and Q2_K with a small iMatrix of ctx 32 with 25 chunks (so, 640 tokens).

And good news, it lowers the perplexity by :

More than 3% with Rope 8 on Q2_K

@@ -37,6 +38,10 @@ More than 1% with Rope 8 on Q3_K_S
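If you want to reproduce this kind of requant, here is a minimal sketch of the usual llama.cpp workflow (build a small importance matrix, then quantize with it). The binary names, flags and file paths below are assumptions, not the exact commands used for these files; check the --help of your build:

```python
# Minimal sketch of an iMatrix requant with llama.cpp tools.
# Binary names, flags and paths are assumptions; adjust to your build/files.
import subprocess

MODEL_F16 = "WinterGoddess-70B-f16.gguf"   # hypothetical source model path
CALIB_TXT = "calibration.txt"              # hypothetical calibration text
IMATRIX   = "imatrix-c32_ch25.dat"

# 1) Build the importance matrix with a small context (32) over 25 chunks.
subprocess.run([
    "./imatrix",
    "-m", MODEL_F16,
    "-f", CALIB_TXT,
    "-o", IMATRIX,
    "-c", "32",          # context size per chunk
    "--chunks", "25",    # number of chunks to process
], check=True)

# 2) Quantize using that matrix (here to Q2_K; same idea for Q3_K_S / Q3_K_M).
subprocess.run([
    "./quantize",
    "--imatrix", IMATRIX,
    MODEL_F16,
    "WinterGoddess-iMat-c32_ch25-Q2_K.gguf",  # hypothetical output name
    "Q2_K",
], check=True)
```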

WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q3_K_S.gguf,-,wikitext,5.6127,512
WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q3_K_S.gguf,-,wikitext,5.5461,512
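
As a quick sanity check, the relative drop can be computed straight from the wikitext values listed here (and from the Rope 10 Q2_K pair further down); plain arithmetic, nothing assumed beyond the figures themselves:

```python
# Relative perplexity drop between the regular quant and the iMatrix requant,
# using the wikitext values quoted in this README.
def ppl_drop(regular: float, imatrix: float) -> float:
    return (regular - imatrix) / regular * 100.0

print(f"Q3_K_S: {ppl_drop(5.6127, 5.5461):.2f}% lower")          # ~1.19%
print(f"Q2_K (Rope 10): {ppl_drop(7.1577, 6.9405):.2f}% lower")  # ~3.03%
```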

A Q3_K_M with iMatrix has been added as well.

-----

Interestingly, Rope 2.5 is almost lossless compared to Rope 2, while 3 and 3.2 are quite good. Here are the values with the normal Q2_K :

Rope 2.5 (max context 10240) : WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.5246,512

@@ -45,7 +50,8 @@ Rope 3 (max context 12288) : WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b

Rope 3.2 (max context 13107) : WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.6679,512

And for the adventurous, Rope 10 (max context 40960) : WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,7.1577,512

- Minus 3% with my Q2_K with the c32_ch25 iMatrix : WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K.gguf,-,wikitext,6.9405,512

So the linear rope, at least on this model, is flexible, and you can lower it to get the best perplexity for your max context.
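
The max context figures quoted above are simply the linear rope factor times the native context, assuming the Llama 2 base of 4096 tokens; a small helper to pick a factor for a target context:

```python
# Max usable context for a linear rope factor, assuming a native context of
# 4096 tokens (Llama 2). Matches the figures quoted above:
# 2.5 -> 10240, 3 -> 12288, 3.2 -> 13107, 8 -> 32768, 10 -> 40960.
NATIVE_CTX = 4096

def max_context(rope_factor: float) -> int:
    return int(NATIVE_CTX * rope_factor)

def rope_factor_for(target_ctx: int) -> float:
    # Lowest linear factor that still covers the target context.
    return target_ctx / NATIVE_CTX

for factor in (2.5, 3, 3.2, 8, 10):
    print(factor, max_context(factor))
```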