gobean committed
Commit f6d5df3
Parent(s): b960622

Update README.md

Files changed (1): README.md (+6 -1)

README.md CHANGED
@@ -1,6 +1,8 @@
 ---
 license: apache-2.0
 ---
+Update: Someone requested q4_0, q5_0, and q6_k. They have been added, and q5_0 is my new favorite for this and any Mixtral derivative. Try it. Something about the 'k' process ever so slightly alters Mixtrals. Compare if you don't believe me.
+
 
 These are the quantized GGUF files for [Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1).
 
@@ -14,5 +16,8 @@ The quality of the llamafiles generated from these freshly converted GGUFs were
 
 These three were most interesting because:
 - q3-k-m: can fit entirely on a 4090 (24GB VRAM), very fast inference
+- q4-0: for some reason, this is better quality than q4-k-m.
 - q4-k-m: the widely accepted standard as "good enough" and general favorite for most models, but in this case it does not fit on a 4090
-- q5-k-m: my favorite for smaller models, larger - provides a reference for "what if you have more than just a bit that won't fit on the gpu"
+- q5-0: *recommended* - for some reason, this is better quality than q5-k-m.
+- q5-k-m: my favorite for smaller models; for a model this large, it provides a reference for "what if you have more than just a bit that won't fit on the GPU"
+- q6-k: lower perplexity, but I don't like the output style
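
The bullets above turn on whether a given quant fits entirely in a 4090's 24 GB of VRAM or has to be partially offloaded. As a rough illustration (not part of the model card, which doesn't prescribe a loader), here is a minimal sketch using llama-cpp-python with partial GPU offload; the filename, context size, and layer count are assumptions to adjust for whichever file you download and however much VRAM you have.

```python
# Minimal sketch, assuming llama-cpp-python built with GPU support.
# The filename and n_gpu_layers value are placeholders, not recommendations
# from the card -- point it at the quant you downloaded and tune the layer
# count to what actually fits in your VRAM.
from llama_cpp import Llama

llm = Llama(
    model_path="mixtral-8x7b-instruct-v0.1.Q5_0.gguf",  # hypothetical local filename
    n_ctx=4096,        # context window
    n_gpu_layers=20,   # offload only some layers when the whole model won't fit
)

out = llm(
    "[INST] Summarize what GGUF quantization does. [/INST]",  # Mixtral-Instruct prompt format
    max_tokens=128,
)
print(out["choices"][0]["text"])
```

Setting n_gpu_layers=-1 offloads every layer, which matches the "fits entirely on a 4090" case for q3-k-m; the larger quants discussed above are where a partial value like the placeholder here comes into play.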