---
license: apache-2.0
---

These are the quantized GGUF files for [Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1).

They were converted from Mistral's safetensors and quantized on April 3, 2024. The date matters: some of the GGUF files for Mixtral 8x7B were created as soon as llama.cpp first supported the MoE architecture, and that early support still had bugs. Those bugs have since been patched.
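
For reference, a minimal sketch of that conversion workflow, wrapped in Python. It assumes a llama.cpp checkout from around that date; the paths, filenames, and tool names (`convert.py`, `quantize`) reflect the tooling of the time and are assumptions, not a record of the exact commands used here (later llama.cpp versions renamed both tools):

```python
# Sketch of the safetensors -> f16 GGUF -> quantized GGUF workflow with llama.cpp.
# Assumes a llama.cpp checkout circa April 2024; script and binary names changed
# later (convert.py -> convert_hf_to_gguf.py, quantize -> llama-quantize).
import subprocess

HF_MODEL_DIR = "Mixtral-8x7B-Instruct-v0.1"   # local clone of the HF repo (assumption)
F16_GGUF = "mixtral-8x7b-instruct-f16.gguf"   # intermediate full-precision GGUF

# Step 1: convert the safetensors checkpoint to a single f16 GGUF file.
subprocess.run(
    ["python", "llama.cpp/convert.py", HF_MODEL_DIR,
     "--outtype", "f16", "--outfile", F16_GGUF],
    check=True,
)

# Step 2: quantize the f16 GGUF to each quant type discussed below.
for quant in ["Q3_K_M", "Q4_K_M", "Q5_K_M"]:
    out = f"mixtral-8x7b-instruct-{quant.lower()}.gguf"
    subprocess.run(["llama.cpp/quantize", F16_GGUF, out, quant], check=True)
```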

These are here for reference, comparison, and any future work.

The quality of the llamafiles generated from these freshly converted GGUFs was noticeably better than that of llamafiles generated from the other GGUFs on HF.

These three were the most interesting (see the loading sketch after the list):

- q3-k-m: fits entirely on a 4090 (24 GB VRAM), so inference is very fast
- q4-k-m: the widely accepted "good enough" standard, but in this case it does not fit on a 4090
- q5-k-m: my favorite for smaller models; here it is larger and provides a reference for "what if more than just a bit won't fit on the GPU"
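
For anyone comparing GPU fit, here is a minimal loading sketch using the llama-cpp-python bindings. The filename, context size, and layer count are assumptions: `n_gpu_layers=-1` offloads every layer, which only works when the whole model fits in VRAM (the q3-k-m case), while a partial layer count splits the model between GPU and CPU (the q4-k-m and q5-k-m cases):

```python
# Minimal inference sketch with llama-cpp-python (pip install llama-cpp-python).
# The filename and context size are assumptions; adjust to the GGUF you download.
from llama_cpp import Llama

llm = Llama(
    model_path="mixtral-8x7b-instruct-q3_k_m.gguf",
    n_gpu_layers=-1,  # offload all layers: viable for q3-k-m on a 24 GB 4090
    n_ctx=4096,       # context window; larger values cost more VRAM
)

# For q4-k-m or q5-k-m on a 4090, pass a partial count instead (e.g.
# n_gpu_layers=20); llama.cpp keeps the remaining layers on the CPU.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain mixture-of-experts in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```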