Q5_1 & Q5_K_M quant

#1 opened by muzika38

Can you also upload the Q5_1 & Q5_K_M for mixtral-instruct please?

muzika38 changed discussion title from Q5_1 quant to Q5_1 & Q5_K_M quant

With this quantization approach Q5_1 is about the same as Q5_0, and Q5_K_M is about the same as Q5_K_S.

Interesting... Is this exclusive to MoE models? On the OpenHermes model, the quantization error percentage of Q5_1 differs from Q5_0 by more than a factor of two.

You can see in the OpenHermes table that the Q5_K_M quantization error is about the same as Q5_K_S. Q5_1 has always behaved erratically: for some models its quantization error is significantly higher than Q5_0's. With this new quantization that utilizes an "importance matrix", Q5_1 behaves much better, in the sense that it is as good as, or better than, Q5_0. In the case of the base and instruct-tuned Mixtral-8x7B models it is about the same as Q5_0. I'm not sure whether this is related to the MoE architecture; too few models have been quantized this way so far to draw that conclusion.
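To make the terms concrete for anyone following along: Q5_0 stores a single scale per block of weights, Q5_1 additionally stores a per-block minimum, and the importance-matrix quantization weights the rounding error by per-weight statistics gathered from calibration data, so precision goes to the weights that matter most. The Python sketch below is only a toy illustration of that idea, not the actual llama.cpp kernels; the block size, rounding, and the random "importance" values are simplified assumptions.

```python
# Toy illustration of Q5_0-style (scale only) vs Q5_1-style (scale + minimum)
# block quantization, and of measuring error with an importance weighting.
# This is NOT the real llama.cpp implementation, just a sketch of the concept.
import numpy as np

def quant_q5_0(block):
    """Toy symmetric 5-bit quantization: one scale per block, grid [-16, 15]."""
    amax = np.max(np.abs(block))
    scale = amax / 15.0 if amax > 0 else 1.0
    q = np.clip(np.round(block / scale), -16, 15)
    return q * scale

def quant_q5_1(block):
    """Toy asymmetric 5-bit quantization: scale plus minimum per block, grid [0, 31]."""
    lo, hi = float(block.min()), float(block.max())
    scale = (hi - lo) / 31.0 if hi > lo else 1.0
    q = np.clip(np.round((block - lo) / scale), 0, 31)
    return q * scale + lo

def weighted_rmse(orig, deq, importance):
    """Quantization error weighted per weight -- the rough idea behind the importance matrix."""
    return float(np.sqrt(np.sum(importance * (orig - deq) ** 2) / np.sum(importance)))

rng = np.random.default_rng(0)
block = rng.normal(size=32).astype(np.float32)    # one 32-weight block
importance = rng.uniform(0.1, 1.0, size=32)       # made-up stand-in for activation statistics

for name, fn in (("Q5_0", quant_q5_0), ("Q5_1", quant_q5_1)):
    deq = fn(block)
    plain = float(np.sqrt(np.mean((block - deq) ** 2)))
    weighted = weighted_rmse(block, deq, importance)
    print(f"{name}: plain rmse = {plain:.5f}, importance-weighted rmse = {weighted:.5f}")
```

In the actual workflow the importance matrix is computed from calibration text with llama.cpp's imatrix tool and then supplied to the quantization tool; the exact binary names and flags depend on the llama.cpp version you have built.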
