IQ1_S not usable

#1
by Kalemnor - opened

I think IQ1_S is not usable on lower-parameter models because the perplexity will skyrocket, but from what I've read it seems fine on 33B or larger models.
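If you want to sanity-check a quant's perplexity yourself, here is a minimal sketch assuming llama-cpp-python (the model file name is hypothetical); a proper measurement would use llama.cpp's perplexity tool over a standard corpus like wikitext-2.

```python
# Rough perplexity probe for a single GGUF quant, assuming llama-cpp-python
# is installed (pip install llama-cpp-python). The model path is hypothetical.
import math
from llama_cpp import Llama

llm = Llama(
    model_path="mistral-7b-instruct.IQ1_S.gguf",  # hypothetical file
    logits_all=True,   # required so echoed prompt tokens come back with logprobs
    verbose=False,
)

# A real test would score a varied corpus, not a toy paragraph like this one.
sample = (
    "The James Webb Space Telescope observes in the infrared, which lets it "
    "peer through dust clouds that block visible light. Its primary mirror is "
    "made of eighteen hexagonal beryllium segments coated in gold."
)
out = llm(sample, max_tokens=1, echo=True, logprobs=1, temperature=0.0)

# token_logprobs echoes the prompt tokens; the first entry is None.
lps = [lp for lp in out["choices"][0]["logprobs"]["token_logprobs"] if lp is not None]
ppl = math.exp(-sum(lps) / len(lps))
print(f"perplexity over {len(lps)} tokens: {ppl:.2f}")
```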

Owner

Perplexity is one of the main factors, but I just do not like the output of quantizations below IQ4_NL, at least for 7B.

I mainly use 7B and 10.7B models, so I might be biased.

From my testing, anything below IQ4_XS starts having a harder time following instructions that were easily handled at higher bpw.
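A crude way to see this for yourself is to run the same instruction through two quants of the same model and compare the outputs. A sketch assuming llama-cpp-python, with hypothetical file names:

```python
# Quick A/B check: the same instruction through two quants of one model,
# assuming llama-cpp-python. Both GGUF file names are hypothetical.
from llama_cpp import Llama

PROMPT = "Answer with exactly three bullet points: why is the sky blue?"

for path in ["solar-10.7b.Q5_K_M.gguf", "solar-10.7b.IQ1_S.gguf"]:
    llm = Llama(model_path=path, n_ctx=2048, verbose=False)
    # Greedy decoding so differences come from the quant, not sampling noise.
    out = llm(PROMPT, max_tokens=128, temperature=0.0)
    print(f"--- {path} ---")
    print(out["choices"][0]["text"].strip())
```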

If I want speed I use IQ4_NL, otherwise I use Q4_K_M and Q5_K_M.

However, if you just want to mess around on lower-spec machines, 3 bpw quantizations are sufficient.
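For a rough sense of what bpw means for file size: parameters times bits-per-weight, divided by 8, gives bytes. The bpw figures below are approximate values as reported by llama.cpp's quantize listing.

```python
# Back-of-the-envelope GGUF size estimate: params * bits-per-weight / 8.
# The bpw values are approximate figures from llama.cpp's quantize listing.
QUANTS = {"IQ1_S": 1.56, "IQ4_XS": 4.25, "IQ4_NL": 4.50, "Q4_K_M": 4.85, "Q5_K_M": 5.69}

def est_size_gb(n_params: float, bpw: float) -> float:
    return n_params * bpw / 8 / 1e9  # bytes -> decimal GB

for name, bpw in QUANTS.items():
    print(f"7B at {name} ({bpw} bpw): ~{est_size_gb(7e9, bpw):.1f} GB")
```

For a 7B model that works out to roughly 1.4 GB at IQ1_S versus about 5 GB at Q5_K_M, which is why the very low quants are tempting on constrained hardware despite the quality loss.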

I use SOLAR 10.7B, so I understand your point. I think there is no point using IQ1_S on 7B models: not only do you not want a perplexity around 30, there is almost no use case unless you want to run them on mobile hardware, and in that case it would be better to use dedicated small models. So I agree in general with what you say.
I usually go for Q5_K_M on 10.7B or lower. I've yet to try the IQ models; my argument is based on what I've read. By the way, I was also interested in IQ4_NL. Does Ollama support inference with it?

Owner

I stopped using Ollama a while back, but before I did I saw some PRs working on it, so it might support them now.

Right now I just use Koboldcpp; they function almost the same since both use llama.cpp as the backend. The only thing I miss is the seamless model switching.

Kalemnor changed discussion status to closed
