What is the difference between gpt4-x-vicuna-13B.ggmlv3.q5_1.bin and gpt4-x-vicuna-13B.ggmlv3.q5_K_M.bin?

by MorphzZ - opened

Hi Bloke, could you tell me the difference between these two?

q5_K_M is one of the new "k-quant" formats, which are generally considered better than the older ones. It should have slightly better accuracy than q5_1, and will likely run faster as well. The k-quant family also offers a wide range of sizes, so you can pick the one that best fits your hardware and gives you the best compromise between accuracy and speed.

I may stop releasing the 'old' quant formats (q4_0, q4_1, q5_0, q5_1 and q8_0) sometime soon.

TL;DR: there's not a huge difference between those two, so don't sweat the choice. In general, though, I'd now recommend q5_K_M or q6_K over q5_1.
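If it helps with picking a size for your hardware, here is a rough sketch of the arithmetic, not an exact tool. The bits-per-weight figures are assumptions based on my understanding of the llama.cpp block layouts and are not taken from the files in this repo. The function names are made up for illustration. Note that the `_M` variants mix tensor types (for example, q5_K_M stores some tensors at a higher precision than plain q5_K), so treat the plain q5_K estimate as a lower bound for q5_K_M.

```python
# Approximate bits per weight. These numbers are ASSUMPTIONS for illustration;
# real files differ slightly (metadata, mixed tensor types in _S/_M variants).
LEGACY_BITS_PER_WEIGHT = {
    "q5_1": 6.0,  # 32 weights x 5 bits, plus an fp16 scale and fp16 min per block
}
K_QUANT_BITS_PER_WEIGHT = {
    "q3_K": 3.4375,
    "q4_K": 4.5,
    "q5_K": 5.5,
    "q6_K": 6.5625,
}


def estimate_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough model file size in GB (10^9 bytes) for a given bits-per-weight."""
    return n_params * bits_per_weight / 8 / 1e9


def largest_format_that_fits(n_params, budget_gb, table=K_QUANT_BITS_PER_WEIGHT):
    """Return the highest-precision format whose estimate fits the budget, or None."""
    fitting = [
        (bpw, name)
        for name, bpw in table.items()
        if estimate_size_gb(n_params, bpw) <= budget_gb
    ]
    return max(fitting)[1] if fitting else None


if __name__ == "__main__":
    n_params = 13e9  # a 13B model
    for name, bpw in {**LEGACY_BITS_PER_WEIGHT, **K_QUANT_BITS_PER_WEIGHT}.items():
        print(f"{name}: ~{estimate_size_gb(n_params, bpw):.2f} GB")
    print("Best k-quant under 10 GB:", largest_format_that_fits(n_params, 10.0))
```

Keep in mind this only estimates the file size. At runtime you also need memory for the context, so leave some headroom above the number it prints.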
