Do ZeroGPU inference GPUs not support 8-bit/4-bit models?

#22
by Ayushnangia - opened
ZeroGPU Explorers org

I have been trying to run a 4-bit version of a model, but at inference it always gives me this error.

What should I do?
(I attached an image of the error since it is too long to paste.)

image.png

Here is the Space link: https://huggingface.co/spaces/Ayushnangia/Try_mixtral

ZeroGPU Explorers org

I have the same problem here; this may be intentional: https://huggingface.co/spaces/archit11/Llama-3-70B/discussions/1

ZeroGPU Explorers org

https://huggingface.co/spaces/eswardivi/AIO_Chat_llama3_8B

With the BNB config in the Space above, I am able to use both 4-bit and 8-bit quantization.
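
For reference, here is a minimal sketch of how a bitsandbytes quantization config can be passed when loading a model in a Space like that one. The model ID and exact parameter values are assumptions, not copied from that Space's code; check its app.py for the actual setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed checkpoint; swap in your own

# 4-bit NF4 quantization via bitsandbytes; use load_in_8bit=True instead for 8-bit
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```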

ZeroGPU Explorers org

image.png
It seems like the quantization config is fixed by the model?
