vLLM 4Bit?

#1
by winsonsou - opened

Hello! How do we run this in vLLM with 4-bit quantization? Would you happen to have the command to run it? Btw, thanks for the model! <3

To my understanding, this is quantized using bitsandbytes; for vLLM we need one of these formats instead: AWQ or GPTQ.
I'm looking forward to a GPTQ version too!
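
For reference, here is a minimal sketch of how an AWQ- or GPTQ-quantized checkpoint would typically be loaded with vLLM's Python API once one is available. The model ID below is a placeholder, not an actual released quant.

```python
# Minimal sketch: loading an AWQ-quantized checkpoint with vLLM.
# The repo ID is hypothetical; substitute the real AWQ/GPTQ repo once it exists.
from vllm import LLM, SamplingParams

llm = LLM(
    model="someorg/model-AWQ",  # placeholder AWQ checkpoint
    quantization="awq",         # use "gptq" for a GPTQ checkpoint
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Hello, how are you?"], params)
print(outputs[0].outputs[0].text)
```

The equivalent OpenAI-compatible server would be started along the lines of `python -m vllm.entrypoints.openai.api_server --model <repo> --quantization awq`.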

Unofficial Mistral Community org

I will work on it later today :)
