Update README.md
---
license: apache-2.0
---

Update: Someone requested q4_0, q5_0, and q6_k. Added, and q5_0 is my new favorite for this and any Mixtral derivative. Try it. Something about the 'k' quantization process ever so slightly alters Mixtral's output. Compare them if you don't believe me.
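If you want to run that comparison yourself, here is a minimal sketch using llama-cpp-python; the filenames are placeholders, so swap in the actual GGUFs from this repo:

```python
# A minimal sketch of a q5_0 vs q5-k-m comparison, assuming llama-cpp-python
# is installed; the filenames below are placeholders for this repo's GGUFs.
from llama_cpp import Llama

PROMPT = "[INST] Explain mixture-of-experts routing in one paragraph. [/INST]"

for path in ("mixtral-8x7b-instruct-v0.1.Q5_0.gguf",
             "mixtral-8x7b-instruct-v0.1.Q5_K_M.gguf"):
    llm = Llama(model_path=path, n_ctx=2048, n_gpu_layers=16, seed=42)
    # temperature=0.0 makes generation deterministic, so any difference in
    # the outputs comes from the quantization, not from sampling noise
    out = llm(PROMPT, max_tokens=200, temperature=0.0)
    print(f"--- {path} ---")
    print(out["choices"][0]["text"].strip())
    del llm  # free the weights before loading the next quant
```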
These are the quantized GGUF files for [Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1).
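To grab a single quant without cloning the whole repo, something like this should work with huggingface_hub; the repo id and filename below are placeholders for this repo's actual ones:

```python
# Sketch: pull a single GGUF from the Hub instead of cloning the repo.
# repo_id and filename are placeholders; substitute this repo's actual ones.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="someuser/Mixtral-8x7B-Instruct-v0.1-GGUF",  # placeholder
    filename="mixtral-8x7b-instruct-v0.1.Q5_0.gguf",     # placeholder
)
print(path)  # local cache path of the downloaded file
```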

The quality of the llamafiles generated from these freshly converted GGUFs were […]

These were the most interesting because:
- q3-k-m: can fit entirely on a 4090 (24 GB VRAM), very fast inference
- q4-0: for some reason, this is better quality than q4-k-m
- q4-k-m: the widely accepted standard as "good enough" and the general favorite for most models, but in this case it does not fit on a 4090
- q5-0: **recommended**. For some reason, this is better quality than q5-k-m
- q5-k-m: my favorite for smaller models; at this size it mostly serves as a reference for "what if more than just a bit won't fit on the GPU" (see the offload sketch after this list)
- q6-k: lower perplexity, but I don't like the output style
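For the quants that don't fit entirely in 24 GB, here is a minimal sketch of partial GPU offload with llama-cpp-python; the filename is a placeholder, and n_gpu_layers is a starting guess you tune to your card:

```python
# Minimal sketch of partial GPU offload for a quant that does not fully fit
# on a 4090; the filename is a placeholder, and n_gpu_layers needs tuning.
from llama_cpp import Llama

llm = Llama(
    model_path="mixtral-8x7b-instruct-v0.1.Q5_0.gguf",  # placeholder name
    n_ctx=4096,
    n_gpu_layers=20,  # Mixtral has 32 blocks; raise/lower to fit your VRAM
)

out = llm("[INST] Write a haiku about quantization. [/INST]", max_tokens=64)
print(out["choices"][0]["text"].strip())
```

With q3-k-m, which does fit entirely, n_gpu_layers=-1 offloads the whole model and gives the fast inference mentioned above.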