Custom Quants for MistralAI Mistral Large v2 123b

IQ4_XXSR: basically IQ4_XS, but with attn_q in IQ3_S, attn_v in Q6_K, and token_embed in Q6_0.
Yes, you read that correctly: Q6_0, the last traditional quant from Ikawrakow, not available in Llama.cpp mainline.

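For a rough sense of the size trade-off this mix makes, here is a back-of-the-envelope sketch. The bits-per-weight figures are the nominal llama.cpp values for each quant type (Q6_0's is taken by analogy with the Q_0 family, since it is not in mainline), and the tensor-share fractions are purely illustrative guesses, not measured from Mistral Large v2:

```python
# Nominal bits-per-weight for the quant types used in the IQ4_XXSR mix.
# Q6_0 is assumed at 6.5 bpw by analogy with Q4_0 (4.5) and Q5_0 (5.5).
BPW = {
    "IQ4_XS": 4.25,    # bulk of the model
    "IQ3_S":  3.4375,  # attn_q override (smaller)
    "Q6_K":   6.5625,  # attn_v override (larger)
    "Q6_0":   6.5,     # token_embed override (ik_llama.cpp only)
}

def mix_bpw(shares: dict) -> float:
    """Weighted-average bpw for a tensor-share mix (shares sum to 1.0)."""
    return sum(BPW[t] * s for t, s in shares.items())

# Illustrative shares only, NOT measured from the actual model:
shares = {"IQ4_XS": 0.88, "IQ3_S": 0.05, "Q6_K": 0.05, "Q6_0": 0.02}
print(round(mix_bpw(shares), 3))  # roughly 4.37 bpw overall
```

The point of the mix: spending extra bits on attn_v and the token embeddings is largely paid for by shrinking attn_q, so the overall size stays close to plain IQ4_XS.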
WARNING:
Compatible only with IK_Llama.cpp and Croco.cpp (my fork of the great KoboldCpp). I'll release .exe builds soon, but it already works (at least on Windows) for those who can compile.
https://github.com/Nexesenex/croco.cpp

Overall, maybe it's time for the Llama.cpp team to have a look at Ikawrakow's latest work and offer him terms of cooperation, so we can once again enjoy SOTA quants in Llama.cpp.
https://github.com/ikawrakow/ik_llama.cpp

Because the situation is becoming grotesque: we are massively quantizing models with non-SOTA quants while better options are within reach.
Thousands of terabytes of storage space, along with our compute and our time, are wasted because of this situation.