Can we get the GPTQ quantized model?

#5
by TheYuriLover - opened

Hello,

I noticed that you have experimented with quantization for your model, but the results were not as good as expected:

"When quantized to 4 bits, the model demonstrates unusual behavior, possibly due to its complexity. We suggest using a minimum quantization of 8 bits, although this has not been tested."

I recommend trying the newer GPTQ quantization method with the combined options "act-order" + "true-sequential" + "groupsize 128".
Together, these options bring the 4-bit model's output quality much closer to that of the original 16-bit model.

Check out the following link for more information: https://github.com/qwopqwop200/GPTQ-for-LLaMa/tree/triton
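For reference, the quantization step with that branch looks roughly like the following (a sketch based on the repo's README; `${MODEL_DIR}` and the output filename are placeholders, and exact flag names may differ between branches):

```bash
# Quantize a LLaMA checkpoint to 4-bit GPTQ, calibrating on the C4 dataset,
# with act-order, true-sequential, and groupsize 128 enabled.
CUDA_VISIBLE_DEVICES=0 python llama.py ${MODEL_DIR} c4 \
    --wbits 4 \
    --true-sequential \
    --act-order \
    --groupsize 128 \
    --save_safetensors model-4bit-128g.safetensors
```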

If you decide to apply this quantization method, please consider uploading the resulting model to your repository. That way, both users with high-end hardware (running the 16-bit weights) and those on lower-end machines (running the 4-bit weights) can enjoy your models.

Best regards,

TheYuriLover changed discussion title from "Can we get the GPTQ quantized version?" to "Can we get the GPTQ quantized model?"

+1
