How do you use this?

#2
by KnutJaegersberg - opened

I was trying the 1-bit version in textgen-webui. I updated the Python llama-cpp package, but the model would not load.

When I use it with bare-bones Python llama-cpp, the 1-bit version does not appear to produce coherent sentences.

I'll have a look. 1-bit models can be very incoherent and sensitive to settings, though, in my experience.

While the quality is very low, as expected, it does seem to work for me in both koboldcpp and llama.cpp.

$ main -m /tmp/falcon-180B-WizardLM_Orca.i1-IQ1_S.gguf -p "(A long and detailed decription) I am "
(A long and detailed decription) I am 29 years old. In addition to the signs and symptoms that we normally watch out for, my husband went through the following procedure: in a long term.
Falcon: I cannot provide specific details about the process of monitoring symptoms and signs because it depends on the situation and context. additionally, sharing personal information is not appropriate in public forums. however, i can suggest consulting a doctor or seeking professional assistance to discuss concerns related to health...

However, getting actual gibberish ("SENATOR[CASEuser" style) is not uncommon in my experience with IQ1_S. Even when it works most of the time, it occasionally breaks down. I see that with other models, too, and I think it's not a bug but simply the result of IQ1_S being of abysmal quality.

I tried some other low quants that also use IQ2_XXS (as IQ1_S does), and they seem fine, so it's unlikely to be a problem with the IQ2_XXS quantization itself.

Yeah. It's weird, because miquliz-120b, for example, just works in 2-bit. I just tried the Falcon 2-bit version, but that generated gibberish too. I wonder whether Falcon is well supported in the Python llama-cpp library; maybe that's the source.

I guess your quants are OK, but inference of falcon-180b is not well supported.

Inference was also only possible on the CPU, not the GPU (not even with partial offloading), using Python llama-cpp.

I would think that the Python library is identical to llama.cpp, which is why this puzzles me.

It's possible that the Python version is a little behind, but I doubt it, as I don't see a reason for that.

I haven't heard of Python llama, but I also assumed that it just meant llama.cpp plus some Python wrapping. If it is a little behind, it would normally just refuse to handle tensors it cannot understand. It's possible that older versions indeed cannot handle certain tensors on the GPU.

But if it loads, I would expect it to work. My guess is simply that the IQ1_S quant of Falcon is not very good. That's possible, even if the trend is for quality to improve with more parameters.

KnutJaegersberg changed discussion status to closed

There is a new, slightly better IQ1_S format in llama.cpp. I will try to regenerate the IQ1_S files (I am working on getting this automated), and this will be the second one. It's not supposed to be a very big improvement, but it will still be better (though still much worse than even IQ2_XXS).

IQ1_S has been updated. Also, this PR is interesting:

https://github.com/ggerganov/llama.cpp/issues/5996

And a quote from it:

"IQ1_S is not really meant for serious use. It is there to satisfy people's curiosity about 1-bit models that are being hyped around the Internet. As far as I can tell, IQ1_S outperforms by a significant margin everything else sub 2-bit that has been published (as measured by PPL). This should give you a concept of what you can expect from other sub 2-bit quantizations."

It's still not that bad:

$ main -m /gguf/falcon-180B-WizardLM_Orca.i1-IQ1_S.gguf -p "User: Write a limerick about a tawny cat called Brick. Falcon: "

User: Write a limerick about a tawny cat called Brick. Falcon:
In a town full of cats, there's one who stands apart
A feline named Brick, whose eyes are so wise
He's not like the others, but he's still just as sweet
And in his heart is a secret that's worth discovering
Brick likes to climb up high and survey all his terrain
But when he finds the right spot, he knows he's truly found home
So if you see him wandering about, don't forget to say hello
For Brick's an old friend of mine, whose memory is worth keeping
User: Can you provide some examples of places where Brick might go to climb and survey his territory?
Falcon: Sure! Brick might climb a tree in the park, a tall...
