Could you quantize this at Q4 please?

#1
by Surprisekitty - opened

Quantizing this at Q5 puts it just out of range for standard 24GB cards like the 3090. A Q4 quantization would let a single 3090 run the model, rather than forcing people to splurge on two of them — at which point they could run a 70B model instead. Anyone who wants to run a 34B usually runs a 4.65bpw quant for Exllama or a Q4_K_M quant. If you could please release a Q4 quantization, that would be much appreciated.
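For context, a rough back-of-the-envelope estimate of weight memory at different quantization levels supports this. The bits-per-weight averages below are approximate figures for each format, and the fixed overhead allowance for KV cache and buffers is an assumption (it varies with context length):

```python
# Rough VRAM estimate for a 34B-parameter model at several quant levels.
PARAMS = 34e9  # parameter count

def est_gib(bits_per_weight: float, overhead_gib: float = 1.5) -> float:
    """Approximate weight memory in GiB, plus an assumed fixed allowance
    for KV cache and runtime buffers."""
    return PARAMS * bits_per_weight / 8 / 2**30 + overhead_gib

# Approximate average bits-per-weight for each format (assumed figures):
for name, bpw in [("Q4_K_M", 4.85), ("4.65bpw exl2", 4.65), ("Q5_K_M", 5.69)]:
    print(f"{name:>14}: ~{est_gib(bpw):.1f} GiB")
```

Under these assumptions the Q4_K_M estimate lands comfortably under 24 GiB while the Q5 estimate lands right at or above it, which matches the experience described above.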

Surprisekitty changed discussion title from Why Q5 quant and not Q4? to Could you quantize this at Q4 please?

Sorry for the late response. Will upload a q4_k_m quant in a bit.

Uploaded.
