GPTQ AND AWQ Support for ZeroGPU

#117
by akhil2808 - opened
ZeroGPU Explorers org
edited Oct 9, 2024


Hey, I was wondering if ZeroGPU supports AWQ and GPTQ quantization, considering that they are dedicated GPU quantization types. I tried a lot of different ways to host my Qwen2-VL 72B Instruct AWQ model, but nothing seems to work. If anyone could lend me a hand with this issue, I would be really thankful.

akhil2808 changed discussion status to closed
akhil2808 changed discussion status to open
ZeroGPU Explorers org

.

I'll try to help debug it if I can see the code.
However, it is not always possible to fix it, since the specifications have changed considerably from the previous ZeroGPU Spaces...

ZeroGPU Explorers org

Also, the main question is: are the GPTQ and AWQ formats even supported by ZeroGPU?

I committed a version that boots.
However, inference does not work.

Maybe it would work if the entire AWQ model were small enough to load onto the GPU, but when I tried that with the 70B model, it crashed due to lack of VRAM.🤢
A similar algorithm that came out recently managed to work in a ZeroGPU Space. I'm not sure which one it was...

Edit:
I remember now, it was AQLM.
https://discuss.huggingface.co/t/error-running-model-in-zerogpu/109819

ZeroGPU Explorers org

Also, the main question is: are the GPTQ and AWQ formats even supported by ZeroGPU?

The format itself is supported anyway.
What's not supported is loading 72B vision models on ZeroGPU, probably.
Quantized or not.

ZeroGPU Explorers org

@xi0v Oh, is there any rule like that? You can't load models beyond a certain parameter count? Because this is less than a 13 billion parameter model, which I think is small enough to fit on the 80GB VRAM A100 that ZeroGPU uses under the hood.

ZeroGPU Explorers org

@John6666 I don't think a 13 billion parameter model should be throwing an OOM error on an 80GB A100; that's unlikely.

I thought the available VRAM was 40 GB? It's 80GB on the GPU specs, though.

ZeroGPU Explorers org
edited Oct 9, 2024

@xi0v Oh, is there any rule like that? You can't load models beyond a certain parameter count? Because this is less than a 13 billion parameter model, which I think is small enough to fit on the 80GB VRAM A100 that ZeroGPU uses under the hood.

Well, ZeroGPU is limited in terms of computational power (hence it being free but with a quota), and ZeroGPU uses a 40GB A100, not an 80GB one, if I recall correctly. 13B models work with no problem. What you tried to use is a 72B model with vision capabilities (which makes it need even more computational power to run).
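
For a rough sense of scale, here is a back-of-the-envelope sketch of how much memory the weights alone would take. It assumes 4 bits per weight for AWQ/GPTQ and 16 bits for fp16, and it ignores quantization scales and zero points, activations, the KV cache, image inputs, and the CUDA context, so real usage will be higher. The 40 GB figure is only what was recalled above, not a confirmed spec, and `weight_memory_gb` is a hypothetical helper, not part of any library.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in decimal GB."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumed budget, taken from the recollection in this thread (unconfirmed).
ASSUMED_VRAM_GB = 40

estimates = {
    "13B fp16": weight_memory_gb(13e9, 16),
    "72B fp16": weight_memory_gb(72e9, 16),
    "72B 4-bit (AWQ/GPTQ)": weight_memory_gb(72e9, 4),
}

for name, gb in estimates.items():
    headroom = ASSUMED_VRAM_GB - gb
    print(f"{name}: ~{gb:.0f} GB of weights, ~{headroom:.0f} GB left of {ASSUMED_VRAM_GB} GB")
```

Under these assumptions, a 4-bit 72B model needs roughly 36 GB for weights, which would leave only a few GB of a 40 GB budget for everything else. That is consistent with the out-of-memory crash reported earlier, though it does not prove that was the cause.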
