GPTQ / AWQ

#2 by agahebr - opened

Hi!

I was wondering if you're planning on adding AWQ/GPTQ support for this model? I'd usually check TheBloke's page, but he seems to have been AFK lately.

Owner

I'll consider it. The problem with AWQ/GPTQ is that those formats are less compatible and flexible than GGUF/EXL2, where you can find a quant in exactly the size that fits your VRAM.
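To put rough numbers on that, here's a back-of-the-envelope sketch (weights only, excluding KV cache and runtime overhead; the bits-per-weight figures are approximations, not exact quant sizes):

```python
# Back-of-the-envelope weights-only size estimates for a 120B-parameter model.
# These exclude KV cache, activations, and runtime overhead, so actual VRAM
# usage is somewhat higher than the numbers printed here.

PARAMS = 120e9          # parameter count (e.g. a 120B model)
VRAM_BUDGET_GB = 48.0   # the kind of budget discussed below

def weights_gb(bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in gigabytes (10^9 bytes)."""
    return PARAMS * bits_per_weight / 8 / 1e9

quants = [
    ("GPTQ/AWQ 4-bit", 4.0),         # fixed-width formats, little room to shrink
    ("EXL2 3.0 bpw", 3.0),           # EXL2 allows arbitrary fractional bpw
    ("EXL2 2.4 bpw", 2.4),
    ("GGUF IQ2_XS (~2.3 bpw)", 2.3),
]

for label, bpw in quants:
    size = weights_gb(bpw)
    verdict = "fits" if size < VRAM_BUDGET_GB else "does not fit"
    print(f"{label:25s} ~{size:5.1f} GB -> {verdict} in {VRAM_BUDGET_GB:.0f} GB")
```

At 4 bits per weight, a 120B model's weights alone already blow past a 48 GB budget, which is exactly why the fractional-bpw flexibility of EXL2 and the very low-bit GGUF quants matters here.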

In production, I use vLLM (the excellent aphrodite-engine fork) for fast parallel inference, but since I only have 48 GB VRAM on my systems, for Miquliz 120B I use EXL2 with ExLlamaV2 or the new 2-bit GGUF imatrix quants with llama.cpp/KoboldCpp. So I don't think AWQ/GPTQ is a good fit for 120B models as of now, and producing those quants would take a huge amount of my limited resources. (I miss TheBloke, too!)
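For anyone wanting to try that route, loading such a GGUF quant via the llama-cpp-python bindings looks roughly like this (a minimal sketch; the filename, context size, and prompt are placeholders, not my exact setup):

```python
# Minimal sketch of loading a 2-bit imatrix GGUF quant with the
# llama-cpp-python bindings. The filename, context size, and prompt are
# placeholders for illustration only.
from llama_cpp import Llama

llm = Llama(
    model_path="./miquliz-120b-v2.0.IQ2_XS.gguf",  # hypothetical local file
    n_gpu_layers=-1,  # offload as many layers as possible to the GPU(s)
    n_ctx=4096,       # context window; raise it if VRAM allows
)

out = llm("Q: Why use imatrix quants for very large models? A:", max_tokens=64)
print(out["choices"][0]["text"])
```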
