Ideal quantization levels
Have there been any GGUF quant level tests on Medius? Some people find Llama performs just as well at Q4_K_M as at Q5_K_M, and I'm wondering if this holds true for Medius?
I'm interested in this too. I've been playing with IQ3_XS and IQ4_XS and can't notice crazy differences, but I need to do more testing. The IQ quants fit in 8 GB of VRAM.
It depends on the quantization algorithm and how accurate it is. For GGUF quants, I'd say it's fair to assume Q4_K_M gives a good size-to-capability ratio. However, I'd venture that if you upcast the embeddings to F32 and quantize the output tensors to Q8_0, you'd get an even more thoughtful and capable model. I haven't trained an imatrix on the F32 version of this model to quantize and verify that yet, but it has held true for every other model I've converted and quantized, so I'd posit it's more than likely true here as well.
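For anyone wanting to try that recipe, here's a rough sketch using llama.cpp's `llama-imatrix` and `llama-quantize` tools. The file names (`medius-f32.gguf`, `calibration.txt`, etc.) are placeholders, not anything official for this model:

```shell
# 1. Train an importance matrix on the full-precision GGUF.
#    calibration.txt is any representative text corpus you choose.
./llama-imatrix -m medius-f32.gguf -f calibration.txt -o medius.imatrix

# 2. Quantize, keeping the token embeddings at F32 and the
#    output tensor at Q8_0, while the rest goes to Q4_K_M.
./llama-quantize \
    --imatrix medius.imatrix \
    --token-embedding-type f32 \
    --output-tensor-type q8_0 \
    medius-f32.gguf medius-q4_k_m.gguf q4_k_m
```

The resulting file will be somewhat larger than a plain Q4_K_M, since the embedding and output tensors are the biggest single tensors in most models.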