AWQ

#1
by Naugustogi - opened

I would like to see a version of this model using this new compression technique: https://arxiv.org/abs/2306.00978 (AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration).

Yeah, I'm looking into it. The issue is that right now there's no support in the popular UIs, so it would be custom Python code only (roughly along the lines of the sketch below).

The AutoGPTQ team are discussing adding AWQ support to AutoGPTQ, which would be amazing and would instantly bring support to text-generation-webui.
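
For anyone curious what that custom Python code might involve, here's a minimal sketch of the core idea from the paper, not the authors' actual code: scale up salient weight channels according to observed activation magnitude, quantize round-to-nearest, then fold the inverse scale back in. The function name `awq_style_quantize`, the `alpha` value, and the scale normalization are my assumptions for illustration:

```python
import torch

def awq_style_quantize(weight: torch.Tensor, act_sample: torch.Tensor,
                       n_bits: int = 4, alpha: float = 0.5) -> torch.Tensor:
    """weight: (out_features, in_features); act_sample: (tokens, in_features)."""
    # Per-input-channel activation magnitude: salient channels get a larger scale
    # (alpha=0.5 is an assumed default, not taken from the paper's code)
    act_mag = act_sample.abs().mean(dim=0)                      # (in_features,)
    scale = act_mag.clamp(min=1e-5) ** alpha
    scale = scale / (scale.max() * scale.min()).sqrt()          # normalize around 1

    w_scaled = weight * scale                                   # protect salient channels
    # Symmetric per-output-channel round-to-nearest quantization
    qmax = 2 ** (n_bits - 1) - 1
    step = w_scaled.abs().amax(dim=1, keepdim=True) / qmax
    w_q = torch.clamp(torch.round(w_scaled / step), -qmax - 1, qmax)
    return w_q * step / scale                                   # dequantize, undo scaling

# Quick check: output error vs. plain round-to-nearest (uniform scale)
torch.manual_seed(0)
W = torch.randn(64, 128)
X = torch.randn(256, 128) * (torch.rand(128) * 4)   # uneven channel magnitudes
W_awq = awq_style_quantize(W, X)
W_rtn = awq_style_quantize(W, torch.ones(1, 128))   # uniform scale ~ plain RTN
print("AWQ-style MSE:", (X @ W_awq.T - X @ W.T).pow(2).mean().item())
print("Plain RTN MSE:", (X @ W_rtn.T - X @ W.T).pow(2).mean().item())
```

The real method also does group-wise quantization and searches over the scaling exponent per layer; this just shows the activation-aware scaling trick in isolation.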

Would this be applicable to models running on CPU too?

I haven't investigated it much yet, but I doubt it - it's still PyTorch-based, so it's going to be terribly slow unless it's on a GPU.

Hmm, okay. I hope someone will figure out some compression that could be used for CPU models too. Maybe something that would let us fit those sweet 65B models into 16 GB of RAM. >:-) Oh well, a man can dream. :D
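
For what it's worth, quick back-of-the-envelope math (weights only, ignoring the KV cache and runtime overhead) shows why that's a dream:

```python
# Rough weight-only memory footprint of a 65B model
params = 65e9
for bits in (16, 8, 4, 2):
    print(f"{bits:>2}-bit: {params * bits / 8 / 1024**3:6.1f} GiB")
# 16-bit: 121.1 GiB | 8-bit: 60.5 GiB | 4-bit: 30.3 GiB | 2-bit: 15.1 GiB
```

So even a solid 4-bit quant is roughly double the 16 GB budget; you'd need around 2 bits per parameter just for the weights.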

Naugustogi changed discussion status to closed
Naugustogi changed discussion status to open
