Update README.md
Browse files
README.md
CHANGED
@@ -11,6 +11,8 @@ So I made a Q8_0 out of it (best way to requantize after), and requantized it in
 
 Lowers quants (SOTA 2 bits) to come if I'm able to make an iMatrix on my config (64GB RAM).
 
+And a bonus to play with it, my KoboldCPP_-_v1.55.1.b1933_-_Frankenstein from the 21/01/2024: https://github.com/Nexesenex/kobold.cpp/releases/tag/v1.55.1_b1933
+
 -----
 
 Edit: Due to a poor CPU (i7-6700k) for AI purposes, and only 36GB of VRAM, I remade Q3_K_S and Q2_K with a small iMatrix of ctx 32 with 25 chunks (so, 640 tokens), and it lowers the perplexity by:
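The workflow described in the README (build an importance matrix on a Q8_0 requant, then use it to produce the low-bit quants) can be sketched roughly as below. This is a hedged illustration, not the author's exact invocation: the calibration file name, output paths, and model names are placeholders, and the `imatrix`/`quantize` tool flags are as shipped in llama.cpp builds from that period.

```shell
# Sketch of the iMatrix + requantize workflow (paths/filenames are hypothetical).
# 1) Compute an importance matrix on the Q8_0 requant, with a small context
#    (-c 32) and a limited number of chunks (--chunks 25) to fit modest hardware.
./imatrix -m model-Q8_0.gguf -f calibration.txt -o model.imatrix -c 32 --chunks 25

# 2) Requantize from Q8_0 down to the target low-bit types, guided by the matrix.
./quantize --imatrix model.imatrix model-Q8_0.gguf model-Q3_K_S.gguf Q3_K_S
./quantize --imatrix model.imatrix model-Q8_0.gguf model-Q2_K.gguf   Q2_K
```

The small context and chunk count trade calibration quality for speed and memory, which matches the README's note about limited CPU and RAM; a larger calibration run would normally be preferred when the hardware allows it.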