---
license: apache-2.0
---

These are the quantized GGUF files for [Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1).

They were converted from Mistral's safetensors and quantized on April 3, 2024.
This matters because some of the GGUF files for Mixtral 8x7B were created as soon as llama.cpp supported the MoE architecture, while that support still had bugs.
Those bugs have since been patched.
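
For reference, here is a minimal sketch of the convert-and-quantize workflow, assuming a llama.cpp checkout from around that date (newer releases renamed these tools to `convert_hf_to_gguf.py` and `llama-quantize`). The paths and filenames are illustrative, not the exact ones used here.

```python
# Sketch of the conversion + quantization workflow with llama.cpp tooling
# (circa April 2024). Paths and filenames below are assumptions.
import subprocess

model_dir = "Mixtral-8x7B-Instruct-v0.1"          # local safetensors download
f16_gguf = "mixtral-8x7b-instruct-v0.1.f16.gguf"  # intermediate full-precision GGUF

# Step 1: convert the safetensors checkpoint to an f16 GGUF.
subprocess.run(
    ["python", "convert.py", model_dir, "--outtype", "f16", "--outfile", f16_gguf],
    check=True,
)

# Step 2: quantize the f16 GGUF down to each target type.
for qtype in ("Q3_K_M", "Q4_K_M", "Q5_K_M"):
    out = f"mixtral-8x7b-instruct-v0.1.{qtype}.gguf"
    subprocess.run(["./quantize", f16_gguf, out, qtype], check=True)
```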

These are here for reference, comparison, and any future work.

The quality of the llamafiles generated from these freshly converted GGUFs was noticeably better than that of llamafiles generated from the other GGUFs on HF.

These three were the most interesting because:

- q3-k-m: fits entirely on a 4090 (24 GB VRAM), giving very fast inference
- q4-k-m: the widely accepted "good enough" standard, but in this case it does not fit on a 4090
- q5-k-m: my favorite for smaller models; for a model this large, it provides a reference for "what if more than just a bit of the model won't fit on the GPU" (see the offload sketch below)
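
To make the fit-on-GPU tradeoff concrete, here is a minimal sketch using the llama-cpp-python bindings (one possible runtime, not the only one; the filename and settings are illustrative):

```python
# Loading a quantized Mixtral GGUF with configurable GPU offload.
# Filename and parameter values are assumptions for illustration.
from llama_cpp import Llama

llm = Llama(
    model_path="mixtral-8x7b-instruct-v0.1.Q3_K_M.gguf",
    n_gpu_layers=-1,  # -1 = offload all layers; Q3_K_M fits fully in 24 GB.
                      # For Q4_K_M/Q5_K_M on a 4090, set a smaller layer count
                      # so the rest runs on CPU.
    n_ctx=4096,       # context window
)

out = llm("[INST] Explain GGUF quantization in one sentence. [/INST]", max_tokens=128)
print(out["choices"][0]["text"])
```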