Is IQ1_S broken? If so, why list it here?

#9
by stduhpf - opened

The model card says, "If your last resort is to use an IQ1 quant then go for IQ1_M." I was planning on downloading the IQ1_S model so that I would have enough free memory left for a decent context window on my 32 GB machine.
But this line makes me hesitate, especially since my internet connection is slow and this download will take a while.

If anyone tried it, could you tell me if it's working decently? And is it worth using an IQ1 of this model versus a Q3 or Q4 of the 34B command-r?

Take a look at this. I ran the same prompt on both quants with a seed and temperature of 0: https://drive.google.com/file/d/131UOH-laXSn5SbbKfUr-zHGa4RwOEIK0/view?usp=sharing
Clearly IQ1_M performs much better than IQ1_S here, but I also understand this is a limited test. I am planning to run a perplexity test and post the results once it is fixed upstream.
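
For anyone unfamiliar with the metric: perplexity is the exponential of the average negative log-likelihood per token, so lower is better. Here is a minimal sketch of that formula, assuming you already have per-token log-probabilities from the model. The `perplexity` helper and the numbers are made up for illustration. They are not measurements of IQ1_S or IQ1_M.

```python
import math

def perplexity(token_logprobs):
    """Return exp(-mean(log p)) over a sequence of natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical inputs: a model that assigns probability 0.25 to every token
# versus one that assigns 0.5 to every token.
uncertain = [math.log(0.25)] * 8
confident = [math.log(0.5)] * 8

print(perplexity(uncertain))  # higher perplexity: the model is less sure
print(perplexity(confident))  # lower perplexity: the model is more sure
```

A quant that degrades the model badly should show a noticeably higher perplexity on the same text than a healthier quant, which makes this a broader check than a single prompt.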
