Please

#1 by Aryanne - opened

Can you quantize conceptofmind/LLongMA-3b and conceptofmind/Flan-Open-Llama-3b? I can't find them anywhere. And with k-quants too?

I don't really see what the origin or license is for those models.

But otherwise, is there anything difficult about converting these models to ggml? I think it should be possible to use the tools included in llama.cpp right now without any hacking or patching (which was not true when I first uploaded the 3B models).
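For reference, the stock llama.cpp workflow at the time looked roughly like the sketch below. The model directory and output file names are examples, not the exact commands used here:

```shell
# Rough sketch of a GGML conversion with stock llama.cpp tools (mid-2023 era).
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make quantize

# Convert the Hugging Face checkpoint to an f16 GGML file
# (../LLongMA-3b is an example path to the downloaded model).
python3 convert.py ../LLongMA-3b --outtype f16 --outfile llongma-3b-f16.bin

# Quantize to a "classic" 4-bit format; no custom build needed for this.
./quantize llongma-3b-f16.bin llongma-3b-q4_0.bin q4_0
```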

K-quants for 3B models are problematic right now because they require a special build of llama.cpp to work.

I found more info on:
https://twitter.com/EnricoShippole/status/1672274141255180288?t=iDgpZy2ggF3xt9I4TlhTug&s=19

I asked you because I don't have a computer available at the moment to quantize, and I don't know how to. Thanks for answering 🤗

I think it would be possible to quantize with k-quants; TheBloke did it with https://huggingface.co/TheBloke/Flan-OpenLlama-7B-GGML

K-quants are not well supported for 3B models (their tensor row sizes are not divisible by the standard 256-element super-block). You basically have to compile a custom version of llama.cpp, which will probably be quite hard for you to do.
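If memory serves, the special build meant compiling with the smaller 64-element k-quant super-blocks; the sketch below assumes the `LLAMA_QKK_64` build option from that era and reuses the example file names from above:

```shell
# Hypothetical sketch: build llama.cpp with 64-element k-quant super-blocks,
# which the 3B models need because their row sizes aren't multiples of 256.
make clean
make LLAMA_QKK_64=1 main quantize

# Produce a k-quant file with this custom build.
./quantize llongma-3b-f16.bin llongma-3b-q4_k_m.bin Q4_K_M
```

Note that files quantized this way only load in a llama.cpp binary compiled with the same option, which is why distributing them is awkward.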

I will try to add the models, though.

Alright, here they are:

I did manage to get LLongMA-3b working with the 8K context, but I needed to apply patches from GitHub, as they are not merged yet.
Not sure how helpful that is for you.
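Presumably these are the scaled-RoPE (linear interpolation) patches; once that support landed in llama.cpp, running the 8K model looked roughly like this. The flag names and the 0.25 scale factor (2048 base context stretched to 8192) are my assumptions, not taken from this thread:

```shell
# Hedged sketch: run the model with an 8K context via linear RoPE scaling.
# --rope-freq-scale 0.25 assumes interpolation from 2048 to 8192 tokens.
./main -m llongma-3b-q4_0.bin -c 8192 --rope-freq-scale 0.25 \
       -p "Once upon a time"
```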

SlyEcho changed discussion status to closed

Thanks 🙌, I'm going to test on koboldcpp

All worked perfectly, thanks! If you can, could you convert this too: https://huggingface.co/syzymon/long_llama_3b

That model would only work with 2048 tokens in the current llama.cpp code, so I don't see the point right now.

Maybe later.
