
I know, I know, 6.5bpw is enough for perfection. For some people.

But for those who want the best quant they can load, here's an 8bpw one. It makes a difference for me, I think.

I tweaked the exl2 quant parameters a bit because I run 6-8k contexts:

```
python3 convert.py -i ../models/Smaug-Mixtral_v0.1 -o smaug_mixtral -cf smaug_mixtral -l 4096 -b 8 -hb 8 -ss 4096
```
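
For the inference side, here's a minimal sketch of loading the resulting quant with the exllamav2 Python API. The model directory, context length, prompt, and sampling values are assumptions; adjust them to your setup.

```python
# Minimal sketch: load the 8bpw exl2 quant with exllamav2 (assumed paths/settings).
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "smaug_mixtral"   # wherever you downloaded the quant
config.prepare()
config.max_seq_len = 8192            # matches the 6-8k contexts mentioned above

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)           # split the weights across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7            # placeholder sampling settings

print(generator.generate_simple("Smaug looked at the adventurer and said", settings, 200))
```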
