Ideal quantization levels
Have there been any GGUF quant level tests on Medius? Some people find Llama performs just as well at Q4_K_M as at Q5_K_M, and I'm wondering if this holds true for Medius?
I'm interested in this too. I've been playing with IQ3_XS and IQ4_XS and can't notice crazy differences, but I need to do more testing. The IQ quants fit in 8 GB of VRAM.
It depends on the quantization algorithm and how accurate it is. For GGUF quants, I'd say it's fair to assume Q4_K_M gives a good size-to-capability ratio. However, I'd venture that if you upcast the embeddings to F32 and quantize the output tensors to Q8_0, you'd get an even more thoughtful and capable model. I haven't trained an imatrix on the F32 version of this model to quantize and verify that yet, but it has held true for every other model I've converted and quantized, so I'd posit it's more than likely true here as well.
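For anyone wanting to try that recipe, here's a rough sketch using llama.cpp's `llama-imatrix` and `llama-quantize` tools. The file names (`medius-f32.gguf`, `calibration.txt`, etc.) are placeholders, not anything official for this model:

```shell
# 1. Train an importance matrix on the full-precision GGUF.
#    calibration.txt is any representative text corpus you choose.
./llama-imatrix -m medius-f32.gguf -f calibration.txt -o medius.imatrix

# 2. Quantize, keeping the token embeddings at F32 and the
#    output tensor at Q8_0, while the rest goes to Q4_K_M.
./llama-quantize \
    --imatrix medius.imatrix \
    --token-embedding-type f32 \
    --output-tensor-type q8_0 \
    medius-f32.gguf medius-q4_k_m.gguf q4_k_m
```

The resulting file will be somewhat larger than a plain Q4_K_M, since the embedding and output tensors are the biggest single tensors in most models.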