Does this quantized model adequately work?

#1 by Dtree07 - opened

Sorry to bother you. I just wanted to deploy this model on a GPU with 16 GB of VRAM, but the outputs are completely garbled. Has anyone managed to run this model successfully? Any suggestions would be very valuable to me, thank you so much!
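For context, a typical loading path for a GPTQ checkpoint like this looks roughly as follows (a sketch only: the repo id is a placeholder, and it assumes `optimum` and `auto-gptq` are installed):

```python
# Minimal sketch: load a GPTQ-quantized causal LM and generate a short completion.
# The repo id below is hypothetical; replace it with the actual checkpoint name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kaitchup/Mistral-7B-gptq-2bit"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # place layers on the available GPU
    torch_dtype=torch.float16,  # compute dtype for dequantized activations
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```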

The Kaitchup org

No, it doesn't work well. 2-bit GPTQ quantization of 7B models currently yields bad models; for this model size, 4-bit is usually the minimum recommended.
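For reference, a rough sketch of 4-bit GPTQ quantization through transformers' `GPTQConfig` (requires `optimum` and `auto-gptq`); the model id and calibration dataset here are illustrative, not necessarily the settings used for this repo:

```python
# Sketch: quantize Mistral 7B to 4-bit with GPTQ while loading, then save it.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

gptq_config = GPTQConfig(
    bits=4,        # 4-bit is usually the practical minimum for 7B models
    dataset="c4",  # calibration samples drawn from C4
    tokenizer=tokenizer,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=gptq_config,  # calibration and quantization happen during loading
)
model.save_pretrained("Mistral-7B-v0.1-gptq-4bit")
```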

Thanks a lot. Actually, the 3-bit quantized version also runs, but I still need to check whether the outputs are satisfactory. By the way, will a better-performing 2-bit quantized Mistral be uploaded in the near future?

The Kaitchup org

Better 2-bit versions of Mistral will surely be made in the future, but probably not with GPTQ.

Yep. The newly released HQQ is worth a try: I was able to run the much larger Mixtral quantized to 2-bit with it. Thanks for your reply, it's really kind of you. ♥
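One way to try 2-bit HQQ today is through transformers' `HqqConfig` integration (requires the `hqq` package). This is only a hedged sketch, not the exact setup used above; the model id and group size are illustrative, and Mixtral still needs substantial VRAM even at 2-bit:

```python
# Sketch: on-the-fly 2-bit HQQ quantization of Mixtral while loading.
from transformers import AutoModelForCausalLM, AutoTokenizer, HqqConfig

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"

quant_config = HqqConfig(nbits=2, group_size=16)  # 2-bit weights, small quantization groups

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quant_config,  # HQQ quantizes the weights while loading
)
```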
