Update README.md
Edit: Due to a CPU (i7-6700k) that's poor for AI purposes, and only 36GB of VRAM, I remade Q3_K_S and Q2_K with a small iMatrix of ctx 32 with 25 chunks (so, 640 tokens).
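For the record, a tiny iMatrix like that can be built with llama.cpp's imatrix tool and applied at requant time. A minimal sketch, not my exact commands; the model and calibration paths are hypothetical:

```python
import subprocess

# Sketch: build a small importance matrix (ctx 32, 25 chunks), then requant.
# Binary locations, model path, and calibration file are hypothetical.
subprocess.run([
    "./imatrix",
    "-m", "WinterGoddess-70B-f16.gguf",   # hypothetical f16 source model
    "-f", "calibration.txt",              # hypothetical calibration text
    "-o", "imatrix-c32_ch25.dat",
    "-c", "32",                           # context size per chunk
    "--chunks", "25",                     # number of chunks to process
], check=True)

subprocess.run([
    "./quantize",
    "--imatrix", "imatrix-c32_ch25.dat",  # apply the iMatrix while quantizing
    "WinterGoddess-70B-f16.gguf",
    "WinterGoddess-70B-iMat-c32_ch25-Q2_K.gguf",
    "Q2_K",
], check=True)
```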

And good news: it lowers the perplexity by:

More than 3% with linear rope 8 (Pos Compress Embeddings) on Q2_K:
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,6.2489,512
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K.gguf,-,wikitext,6.0482,512

More than 2% with linear rope 4 on Q2_K:
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.8859,512
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K.gguf,-,wikitext,4.7739,512

More than 1.5% with linear rope 2 on Q2_K:
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.5030,512
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K.gguf,-,wikitext,4.42,512

More than 1% with linear rope 8 on Q3_K_S:
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q3_K_S.gguf,-,wikitext,5.6127,512
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q3_K_S.gguf,-,wikitext,5.5461,512
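
Those percentages are just the relative drop in wikitext perplexity between each plain quant and its iMatrix requant; a quick check of the arithmetic, using the values above:

```python
# Relative perplexity drop, plain quant vs. iMatrix requant (values from above).
pairs = {
    "linear rope 8, Q2_K":   (6.2489, 6.0482),
    "linear rope 4, Q2_K":   (4.8859, 4.7739),
    "linear rope 2, Q2_K":   (4.5030, 4.4200),
    "linear rope 8, Q3_K_S": (5.6127, 5.5461),
}
for name, (plain, imat) in pairs.items():
    print(f"{name}: -{(plain - imat) / plain * 100:.2f}%")
# linear rope 8, Q2_K: -3.21%     linear rope 4, Q2_K: -2.29%
# linear rope 2, Q2_K: -1.84%     linear rope 8, Q3_K_S: -1.19%
```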

A Q3_K_M with iMatrix has been added as well, and a Q2_K_S is on the way.

Rope 2.5:
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K_S.gguf,-,wikitext,4.6789,512

-----

Interestingly, linear rope 2.5 (and linear rope 1.6 as well, after further testing) is almost lossless compared to linear rope 2, while 3 and 3.2 remain quite good. Here are the values with the normal Q2_K:

- Linear rope 2.5 (max context 10240): WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.5246,512
- Linear rope 3 (max context 12288): WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.6203,512
- Linear rope 3.2 (max context 13107): WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.6679,512
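
The max-context figures in parentheses are simply Llama 2's native 4096 context multiplied by the linear rope factor (truncated to an integer):

```python
# Max usable context = native context (4096 for Llama 2) * linear rope factor.
NATIVE_CTX = 4096
for factor in (1.6, 2.0, 2.5, 3.0, 3.2, 8.0, 10.0):
    print(f"linear rope {factor}: max context {int(NATIVE_CTX * factor)}")
# 2.5 -> 10240, 3.0 -> 12288, 3.2 -> 13107, 8.0 -> 32768, 10.0 -> 40960
```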

And for the adventurous, linear rope 10 (max context 40960): WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,7.1577,512
- Minus 3% with my Q2_K with the c32_ch25 iMatrix: WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K.gguf,-,wikitext,6.9405,512

It's only theoretical of course, but worth testing.

-----
|
63 |
|
64 |
+
Original 70b 4k model perplexity :
|
65 |
+
- WinterGoddess-1.4x-70B-L2.Q3_K_M.gguf,-,wikitext,3.7428,512,PEC1
|
66 |
+
|
67 |
Benchs of the original Q4_K_S quant I found :
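
These were measured with llama.cpp's perplexity tool; a sketch of one such run at linear rope 8 (paths are hypothetical, and the exact rope flags may vary by build):

```python
import subprocess

# Wikitext perplexity at linear rope 8, base 10000, evaluation ctx 512.
# Model and dataset paths are hypothetical; adjust to your setup.
subprocess.run([
    "./perplexity",
    "-m", "WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf",
    "-f", "wikitext-2-raw/wiki.test.raw",
    "-c", "512",                  # evaluation context size
    "--rope-scale", "8",          # linear RoPE factor (if your build has this flag)
    "--rope-freq-base", "10000",
], check=True)
```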

Linear rope 8 10000:
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.2177,4096
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.1324,6144
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.3923,2048
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.4945,1536
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.6700,1024
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,5.2577,512
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,hellaswag,84.5,,400

Linear rope 4 10000:
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,3.5762,2048
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.1235,512
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,hellaswag,87.25,,400

Linear rope 2 10000:
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,3.3394 *327,2048
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,3.8254,512
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,hellaswag,88,,400

Linear rope 1 10000:
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,hellaswag,85,,400
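
The bench lines above follow a loose CSV convention (model, -, test, score, context; hellaswag lines carry an extra empty field). A small helper to tabulate them, should you want to compare runs:

```python
# Parse a bench line like "model,-,wikitext,5.2577,512" into a dict.
def parse_bench(line: str) -> dict:
    parts = [p.strip() for p in line.lstrip("- ").split(",")]
    return {
        "model": parts[0],
        "test": parts[2],
        "score": float(parts[3].split()[0]),  # tolerate notes like "3.3394 *327"
        "ctx": int(parts[-1]),
    }

print(parse_bench("WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,5.2577,512"))
# {'model': '...Q4_K_S.gguf', 'test': 'wikitext', 'score': 5.2577, 'ctx': 512}
```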
|