May I ask why the GPTQ version is slow?

by lynngao815 - opened

Thank you very much for your work on this GPTQ version! It makes the model more convenient to use on a consumer GPU and more affordable to fine-tune. However, I am a little bit confused about why the 4-bit version is slower. I am not very familiar with the computer science fundamentals, but isn't lower precision supposed to be faster? I would really appreciate it if someone could explain this to me!
