Custom Quants for MistralAI Mistral Large v2 123b

IQ4_XXSR: basically IQ4_XS, but with attn_q in IQ3_S, attn_v in Q6_K, and token_embed in Q6_0.
Yes, you read that correctly: Q6_0, the last traditional quant from Ikawrakow, not available in Llama.cpp mainline.

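For a rough sense of the size trade-off this mix makes, here is a back-of-the-envelope sketch. The bits-per-weight figures are the nominal llama.cpp values for each quant type (Q6_0's is taken by analogy with the Q_0 family, since it is not in mainline), and the tensor-share fractions are purely illustrative guesses, not measured from Mistral Large v2:

```python
# Nominal bits-per-weight for the quant types used in the IQ4_XXSR mix.
# Q6_0 is assumed at 6.5 bpw by analogy with Q4_0 (4.5) and Q5_0 (5.5).
BPW = {
    "IQ4_XS": 4.25,    # bulk of the model
    "IQ3_S":  3.4375,  # attn_q override (smaller)
    "Q6_K":   6.5625,  # attn_v override (larger)
    "Q6_0":   6.5,     # token_embed override (ik_llama.cpp only)
}

def mix_bpw(shares: dict) -> float:
    """Weighted-average bpw for a tensor-share mix (shares sum to 1.0)."""
    return sum(BPW[t] * s for t, s in shares.items())

# Illustrative shares only, NOT measured from the actual model:
shares = {"IQ4_XS": 0.88, "IQ3_S": 0.05, "Q6_K": 0.05, "Q6_0": 0.02}
print(round(mix_bpw(shares), 3))  # roughly 4.37 bpw overall
```

The point of the mix: spending extra bits on attn_v and the token embeddings is largely paid for by shrinking attn_q, so the overall size stays close to plain IQ4_XS.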
WARNING:
Compatible only with IK_Llama.cpp and Croco.cpp (my fork of the great KoboldCpp). I'll release .exe builds soon, but it already works (at least on Windows) for those who can compile.
https://github.com/Nexesenex/croco.cpp

Overall, maybe it's time for the Llama.cpp team to have a look at Ikawrakow's latest work and offer him terms of cooperation, so we can once again enjoy SOTA quants in Llama.cpp.
https://github.com/ikawrakow/ik_llama.cpp

Because the situation is becoming grotesque: we are massively quantizing models with non-SOTA quants while better options are within reach.
Thousands of terabytes of storage space, along with our compute and our time, are wasted because of this situation.