
Experimental quants of a 4-expert MoE Mixtral in various GGUF formats.

Original model used for custom quants: NeverSleep/Mistral-11B-SynthIAirOmniMix
https://huggingface.co/NeverSleep/Mistral-11B-SynthIAirOmniMix

The goal is the best-performing MoE under 10 GB.

Experimental q8 and q4 files are included for training/finetuning too.

No sparsity tricks yet.

The 8.4 GB custom 2-bit quant works fine up to a 512-token context, then starts looping.

Install llama.cpp from GitHub, download the quant, and run the server:

```shell
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make -j
wget https://huggingface.co/nisten/quad-mixtrals-gguf/resolve/main/4mixq2.gguf
./server -m 4mixq2.gguf --host "my.internal.ip.or.my.cloud.host.name.goes.here.com" -c 512
```

Limit output to 500 tokens per generation.
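One way to cap output at 500 tokens is the `n_predict` field of the llama.cpp server's `/completion` endpoint. The helper below is a minimal sketch, assuming a default host/port; the prompt is a placeholder:

```python
import json

def completion_request(prompt, n_predict=500):
    """Build the JSON body for llama.cpp server's /completion endpoint.

    n_predict caps generation length; 500 keeps a request inside the
    512-token window where this 2-bit quant behaves well.
    """
    return json.dumps({"prompt": prompt, "n_predict": n_predict})

# Send it to the running server (host/port here are assumptions), e.g.:
#   curl http://my.internal.ip:8080/completion -d "$BODY"
print(completion_request("Write a haiku about quantization."))
```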

GGUF model size: 24.2B params
Architecture: llama
Quantizations available: 3-bit, 4-bit, 5-bit, 6-bit, 8-bit, 16-bit
