Update README.md
README.md
@@ -13,6 +13,20 @@ Lower quants (SOTA 2 bits) to come if I'm able to make an iMatrix on my config

-----

Edit: Due to a CPU (i7-6700K) that is poorly suited to AI work, and only 36GB of VRAM, I remade Q3_K_S and Q2_K with a small iMatrix of ctx 32 with 25 chunks (so, 640 tokens), which lowers the perplexity by:

More than 3% with Rope 8 on Q2_K:

WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,6.2489,512
WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K.gguf,-,wikitext,6.0482,512

More than 1% with Rope 8 on Q3_K_S:

WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q3_K_S.gguf,-,wikitext,5.6127,512
WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q3_K_S.gguf,-,wikitext,5.5461,512
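
For reference, the gains quoted above work out from these figures as (6.2489 - 6.0482) / 6.2489 ≈ 3.2% on Q2_K and (5.6127 - 5.5461) / 5.6127 ≈ 1.2% on Q3_K_S.

Below is a minimal sketch of how such an iMatrix requant and perplexity run can be reproduced with the b1924-era llama.cpp tools (imatrix, quantize, perplexity); the file names and calibration text are placeholders rather than files from this repo, and "Rope 8 10000" is read here as linear rope scale 8 over base 10000:

```bash
# Sketch only: placeholder paths, flags as in b1924-era llama.cpp.

# 1) Small importance matrix: ctx 32, 25 chunks, from a short calibration text.
./imatrix -m wintergoddess-70b-32k-f16.gguf -f calibration.txt \
  -o wintergoddess-c32_ch25.imatrix -c 32 --chunks 25

# 2) Requantize the low-bit quants with that iMatrix.
for q in Q2_K Q3_K_S; do
  ./quantize --imatrix wintergoddess-c32_ch25.imatrix \
    wintergoddess-70b-32k-f16.gguf wintergoddess-imat-c32_ch25-$q.gguf $q
done

# 3) Wikitext perplexity at ctx 512, rope scale 8 (freq scale 0.125) over base 10000.
./perplexity -m wintergoddess-imat-c32_ch25-Q2_K.gguf -f wiki.test.raw -c 512 \
  --rope-freq-base 10000 --rope-freq-scale 0.125
```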
-----
Benchmarks of the original Q4_K_S quant I found:

Rope 8 10000