2-bit compression may still face some performance limitations

Opened by DesperateZero

I tried the smallest iq2_xxs model and it's great - able to run on a single 24GB VRAM GPU. However, when subjectively comparing it to the Qwen1.5-72B-Chat model in Qwen's official Hugging Face Space, I did notice a slight quality drop. Could the iMat 3-bit series strike the right balance?
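For anyone who wants to reproduce the single-GPU setup, here's a minimal sketch using llama-cpp-python; the filename and prompt are placeholders, and it assumes the IQ2_XXS file fits entirely in 24GB with full offload:

```python
from llama_cpp import Llama  # pip install llama-cpp-python (built with GPU support)

# Placeholder filename: point this at the downloaded IQ2_XXS GGUF.
llm = Llama(
    model_path="qwen1.5-72b-chat-iq2_xxs.gguf",
    n_gpu_layers=-1,  # offload every layer to the single 24GB GPU
    n_ctx=2048,       # modest context to leave VRAM headroom for the KV cache
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "用一句话介绍一下你自己。"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```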

@DesperateZero I'll upload new quants whose imatrix was trained for longer on wiki text, and I've moved the previous quants into a folder. This way we can compare the two imatrix trainings.
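For context, regenerating quants with a longer imatrix run looks roughly like this with llama.cpp's tools; binary names vary by llama.cpp version (older builds call them ./imatrix and ./quantize), and all filenames here are assumptions:

```python
import subprocess

SRC = "qwen1.5-72b-chat-f16.gguf"  # assumed full-precision source GGUF

# 1. Compute the importance matrix from calibration text (e.g. wiki text);
#    feeding more text means a longer imatrix "training" run.
subprocess.run(
    ["./llama-imatrix", "-m", SRC, "-f", "wiki.train.raw", "-o", "imatrix.dat"],
    check=True,
)

# 2. Requantize the source model using that imatrix.
subprocess.run(
    ["./llama-quantize", "--imatrix", "imatrix.dat", SRC,
     "qwen1.5-72b-chat-iq2_xxs.gguf", "IQ2_XXS"],
    check=True,
)
```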

I have tested the iq3_xxs and found that it answers questions more logically than the iq2_xxs does. Unfortunately, due to the constraints of my current hardware, I haven't been able to test other versions. I intend to run subjective tests on additional versions, and to compare their performance with the official k-quants GGUFs, once I can use the company's machine with 48GB of VRAM. My tests were conducted in Chinese, and I am curious whether the imatrix from wiki has any effect on subjective impressions, and if so, how significant that effect might be.
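One way to keep such subjective comparisons fair is to run the same Chinese prompt through each quant with greedy decoding, so any differences come from the quantization rather than sampling. A minimal sketch, again with placeholder filenames and prompt:

```python
from llama_cpp import Llama

PROMPT = "请用三句话解释什么是模型量化。"  # example Chinese test question

for path in ("qwen1.5-72b-chat-iq2_xxs.gguf", "qwen1.5-72b-chat-iq3_xxs.gguf"):
    llm = Llama(model_path=path, n_gpu_layers=-1, n_ctx=2048, verbose=False)
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": PROMPT}],
        max_tokens=256,
        temperature=0.0,  # greedy decoding: differences reflect the quant, not sampling
    )
    print(path, "->", out["choices"][0]["message"]["content"])
    del llm  # free VRAM before loading the next quant
```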
