gobean committed
Commit f6d5df3
Parent(s): b960622

Update README.md

Files changed (1): README.md (+6 -1)

README.md CHANGED
@@ -1,6 +1,8 @@
 ---
 license: apache-2.0
 ---
+Update: Someone requested q4_0, q5_0, and q6_k. They have been added, and q5_0 is my new favorite for this and any Mixtral derivative. Try it. Something about the 'k' process ever so slightly alters Mixtrals. Compare if you don't believe me.
+
 
 These are the quantized GGUF files for [Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1).
 
@@ -14,5 +16,8 @@ The quality of the llamafiles generated from these freshly converted GGUFs were
 
 These three were most interesting because:
 - q3-k-m: can fit entirely on a 4090 (24GB VRAM), very fast inference
+- q4-0: for some reason, this is better quality than q4-k-m.
 - q4-k-m: the widely accepted standard as "good enough" and general favorite for most models, but in this case it does not fit on a 4090
-- q5-k-m: my favorite for smaller models, larger - provides a reference for "what if you have more than just a bit that won't fit on the gpu"
+- q5-0: *recommended* - for some reason, this is better quality than q5-k-m.
+- q5-k-m: my favorite for smaller models; for a model this large, it provides a reference for "what if you have more than just a bit that won't fit on the GPU"
+- q6-k: lower perplexity, but I don't like the output style
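
The bullets above turn on whether a given quant fits entirely in a 4090's 24 GB of VRAM or has to be partially offloaded. As a rough illustration (not part of the model card, which doesn't prescribe a loader), here is a minimal sketch using llama-cpp-python with partial GPU offload; the filename, context size, and layer count are assumptions to adjust for whichever file you download and however much VRAM you have.

```python
# Minimal sketch, assuming llama-cpp-python built with GPU support.
# The filename and n_gpu_layers value are placeholders, not recommendations
# from the card -- point it at the quant you downloaded and tune the layer
# count to what actually fits in your VRAM.
from llama_cpp import Llama

llm = Llama(
    model_path="mixtral-8x7b-instruct-v0.1.Q5_0.gguf",  # hypothetical local filename
    n_ctx=4096,        # context window
    n_gpu_layers=20,   # offload only some layers when the whole model won't fit
)

out = llm(
    "[INST] Summarize what GGUF quantization does. [/INST]",  # Mixtral-Instruct prompt format
    max_tokens=128,
)
print(out["choices"][0]["text"])
```

Setting n_gpu_layers=-1 offloads every layer, which matches the "fits entirely on a 4090" case for q3-k-m; the larger quants discussed above are where a partial value like the placeholder here comes into play.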