
first gguf ...nice

#1 by mirek190 - opened

First GGUF ... nice

Why did you add the old Q5?
The old Q5 is even worse than Q5_K_S.

Someone asked me to, saying it performs better (in terms of speed) than the new k-quants on their old hardware.

OK ... just to inform you: even Q4_K_M has lower perplexity than the old Q5, and Q4_K_M is much faster than the old Q5.

I see ... he was comparing the old Q5 to Q5_K_S, but not to Q4_K_M, which is faster than the old Q5 and has lower perplexity.

https://github.com/ggerganov/llama.cpp/pull/1508

https://github.com/ggerganov/llama.cpp/pull/1684
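(If anyone wants to check the perplexity numbers for themselves, here is a minimal sketch using llama.cpp's `perplexity` example binary on two quantised files. It assumes you have built llama.cpp locally and have a standard test text such as wikitext-2; all paths and file names below are placeholders.)

```python
# Hypothetical sketch: compare perplexity of two quantisations with
# llama.cpp's `perplexity` binary. Paths/filenames are placeholders.
import subprocess

MODELS = {
    "Q5_0 (old-style quant)": "models/llama-7b.Q5_0.gguf",
    "Q4_K_M (k-quant)":       "models/llama-7b.Q4_K_M.gguf",
}
TEST_FILE = "wikitext-2-raw/wiki.test.raw"  # common perplexity test set

for label, path in MODELS.items():
    print(f"=== {label} ===")
    # The binary streams a running perplexity estimate as it evaluates the file.
    subprocess.run(["./perplexity", "-m", path, "-f", TEST_FILE], check=True)
```

The lower final perplexity (closer to the unquantised model) is the better quant, independent of which one runs faster on a given CPU.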

Just a silly question from a newbie, probably better suited for Reddit: I have been following the development of the new compressed formats since the beginning, mostly the GGML formats, because I only have CPU + RAM to work with, and I totally embrace them. But with this new integrated metadata, do you think the new GGUF format will be backwards compatible with older versions? Was the lack of backwards compatibility between GGML versions because llama.cpp couldn't recognise the older versions, or was it a deliberate choice? I often go back to older Vicuna versions for nostalgia value (they were the first models to really surprise me), so I switch between llama.cpp versions. Will you convert GGUF versions of the other older models in your list with good scores on the leaderboard, or will you only do new ones from now on? Thanks in advance for your work.

You can always convert the older models to the newer format yourself.
I do not see a problem.
Apart from that, as far as I know TheBloke will convert all his older models to the new GGUF format anyway.

Apart from that ... what's the problem with keeping an old llama.exe version? (It is a single file of around 660 KB.)


Any guidance on that conversion? I'm just now starting to read up on the new format. I do hope it's far better, otherwise it's silly to be inventing new formats just for fun.
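(Not official instructions, just a rough sketch of the usual path: convert the original HF checkpoint to an f16 GGUF with llama.cpp's convert.py, then quantise it with the quantize tool. All directory names and output file names below are placeholders; run it from the llama.cpp repo after building.)

```python
# Hypothetical sketch of producing a GGUF with llama.cpp's own tools.
import subprocess

HF_MODEL_DIR = "models/Llama-2-7b-hf"       # original HF checkpoint (placeholder)
F16_GGUF     = "llama-2-7b.f16.gguf"        # intermediate unquantised GGUF
QUANT_GGUF   = "llama-2-7b.Q4_K_M.gguf"     # final quantised output

# 1) Convert the original weights to a float16 GGUF.
subprocess.run(
    ["python", "convert.py", HF_MODEL_DIR, "--outtype", "f16", "--outfile", F16_GGUF],
    check=True,
)

# 2) Quantise the f16 GGUF to the desired type (Q4_K_M here).
subprocess.run(["./quantize", F16_GGUF, QUANT_GGUF, "Q4_K_M"], check=True)
```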

llama.cpp not providing backwards compatibility is a deliberate choice. They prefer to keep moving forward, and not spend time maintaining older code. This does make it somewhat harder for users.

Other clients/libraries do provide backwards compatibility. KoboldCpp, for example, supports GGUF (as of today), GGML v3 (the most recent version, the one we've been using since May), and older versions of GGML as well. Its developer aims to keep supporting as wide a range of files as possible, for as long as possible.

Yes I will be adding GGUFs for older repos soon, hopefully starting tomorrow.

I'm not able to download it using the textgen webui; it just says "done" immediately, having downloaded everything except the large files. Are other people experiencing this? It's probably an issue on their end since it happens for me on all GGUF repos; I'm just curious whether I'm alone.

Edit: I guess it's intentional. They say GGML is not supported for direct download either, but that does work, which is what got me confused.

Yeah it's intentional - text-gen-ui doesn't support GGUF files yet. I think it's looking for filenames ending in .bin or with 'ggml' in their name, and not finding any here.

I have a GGML repo for this model too, you should use that until text-gen-ui supports GGUF, which I imagine it will in a few days.
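(As a stopgap until the UI supports GGUF, one way to fetch a file manually is with the huggingface_hub library; this only covers the download step, it won't make text-gen-ui load the file. Repo id and file name below are placeholders, taken from whatever is listed on the repo's "Files" tab.)

```python
# Hypothetical sketch: download a single GGUF file from a repo by hand.
from huggingface_hub import hf_hub_download

local_path = hf_hub_download(
    repo_id="TheBloke/SomeModel-GGUF",   # placeholder repo id
    filename="somemodel.Q4_K_M.gguf",    # placeholder file name
)
# The file lands in the local HF cache; copy or symlink it into your
# client's models directory (e.g. text-generation-webui/models) as needed.
print("Downloaded to:", local_path)
```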
