AWQ

#1
by Naugustogi - opened

I would like to see a version of this model using this new compression technique: https://arxiv.org/abs/2306.00978 (AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration).

Yeah, I'm looking into it. The issue is that right now there's no support in the popular UIs, so it would be custom Python code only (roughly along the lines of the sketch below).

The AutoGPTQ team are discussing adding AWQ support to AutoGPTQ, which would be amazing and would instantly bring support to text-generation-webui.
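
For anyone curious what that custom Python code might involve, here's a minimal sketch of the core idea from the paper, not the authors' actual code: scale up salient weight channels according to observed activation magnitude, quantize round-to-nearest, then fold the inverse scale back in. The function name `awq_style_quantize`, the `alpha` value, and the scale normalization are my assumptions for illustration:

```python
import torch

def awq_style_quantize(weight: torch.Tensor, act_sample: torch.Tensor,
                       n_bits: int = 4, alpha: float = 0.5) -> torch.Tensor:
    """weight: (out_features, in_features); act_sample: (tokens, in_features)."""
    # Per-input-channel activation magnitude: salient channels get a larger scale
    # (alpha=0.5 is an assumed default, not taken from the paper's code)
    act_mag = act_sample.abs().mean(dim=0)                      # (in_features,)
    scale = act_mag.clamp(min=1e-5) ** alpha
    scale = scale / (scale.max() * scale.min()).sqrt()          # normalize around 1

    w_scaled = weight * scale                                   # protect salient channels
    # Symmetric per-output-channel round-to-nearest quantization
    qmax = 2 ** (n_bits - 1) - 1
    step = w_scaled.abs().amax(dim=1, keepdim=True) / qmax
    w_q = torch.clamp(torch.round(w_scaled / step), -qmax - 1, qmax)
    return w_q * step / scale                                   # dequantize, undo scaling

# Quick check: output error vs. plain round-to-nearest (uniform scale)
torch.manual_seed(0)
W = torch.randn(64, 128)
X = torch.randn(256, 128) * (torch.rand(128) * 4)   # uneven channel magnitudes
W_awq = awq_style_quantize(W, X)
W_rtn = awq_style_quantize(W, torch.ones(1, 128))   # uniform scale ~ plain RTN
print("AWQ-style MSE:", (X @ W_awq.T - X @ W.T).pow(2).mean().item())
print("Plain RTN MSE:", (X @ W_rtn.T - X @ W.T).pow(2).mean().item())
```

The real method also does group-wise quantization and searches over the scaling exponent per layer; this just shows the activation-aware scaling trick in isolation.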

Would this be applicable to models running on CPU too?

I haven't investigated it much yet, but I doubt it - it's still PyTorch-based, so it's going to be terribly slow unless it's on a GPU.

Hmm, okay. I hope someone will figure out some compression that could be used for CPU models too. Maybe something that would let us fit those sweet 65B models into 16 GB of RAM. >:-) Oh well, a man can dream. :D
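
For what it's worth, quick back-of-the-envelope math (weights only, ignoring the KV cache and runtime overhead) shows why that's a dream:

```python
# Rough weight-only memory footprint of a 65B model
params = 65e9
for bits in (16, 8, 4, 2):
    print(f"{bits:>2}-bit: {params * bits / 8 / 1024**3:6.1f} GiB")
# 16-bit: 121.1 GiB | 8-bit: 60.5 GiB | 4-bit: 30.3 GiB | 2-bit: 15.1 GiB
```

So even a solid 4-bit quant is roughly double the 16 GB budget; you'd need around 2 bits per parameter just for the weights.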

Naugustogi changed discussion status to closed
Naugustogi changed discussion status to open
