Quantized GGUF / EXL2 please?

#1
by siddhesh22 - opened

Possible to quantize this? Would appreciate it, only have 12GB VRAM.

Owner

This model was trained and validated under 4-bit quantization with bitsandbytes, using double quantization and the NF4 format.
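
For reference, here is a minimal sketch of loading a model with that bitsandbytes configuration via the standard transformers API. The repo ID below is a placeholder, and depending on the repo you may also need `trust_remote_code=True`:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization,
# matching the training/validation setup described above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "owner/model-id",  # placeholder; replace with this repo's actual ID
    quantization_config=bnb_config,
    device_map="auto",
)
```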

Actually, I have not quantized this model with GGUF or EXL2, only with bitsandbytes. Even so, in my experience 12GB of VRAM is insufficient despite 4-bit quantization, because four additional computer vision models must be loaded alongside it.

Therefore, I think it may be impossible to run with only 12GB of VRAM.

@BK-Lee How many GB of VRAM are enough to run it locally?
