---
license: apache-2.0
---
These are the quantized GGUF files for [Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1).
They were converted from Mistral's safetensors and quantized on April 3, 2024.
This matters because some of the GGUF files for Mixtral 8x7B were created as soon as llama.cpp supported the MoE architecture, while that support still had bugs.
Those bugs have since been patched.
These are here for reference, comparison, and any future work.
The quality of the llamafiles generated from these freshly converted GGUFs was noticeably better than that of llamafiles generated from the other GGUFs on HF.
These three were the most interesting:
- q3-k-m: fits entirely on a 4090 (24 GB VRAM), very fast inference
- q4-k-m: widely accepted as the "good enough" standard, but in this case it does not fit on a 4090
- q5-k-m: my favorite for smaller models; here it is larger, providing a reference for what happens when more than just a bit won't fit on the GPU
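
As a quick sanity check, here is a minimal sketch of loading one of these files with llama-cpp-python; the filename below is an assumption, so substitute the actual GGUF name from this repo:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Hypothetical filename -- replace with the actual q3-k-m GGUF in this repo.
llm = Llama(
    model_path="mixtral-8x7b-instruct-v0.1.Q3_K_M.gguf",
    n_gpu_layers=-1,  # offload all layers; q3-k-m fits in 24 GB VRAM
    n_ctx=4096,
)

# Mixtral Instruct uses the [INST] ... [/INST] prompt format.
out = llm("[INST] Summarize MoE routing in one sentence. [/INST]", max_tokens=128)
print(out["choices"][0]["text"])
```

For q4-k-m and q5-k-m, which do not fit entirely in 24 GB, set `n_gpu_layers` to a positive layer count instead of `-1` to split the model between GPU and CPU.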