Need a 7B Q2_K.gguf

#2
by boqsc - opened

Please make a Q2_K GGUF.

What's the use case for a Q2_K 7B model? Wouldn't performance degradation be extreme?

That's true, but sometimes the degradation isn't too bad and it still produces decent answers.
A Q2_K 7B model is simply faster at local inference. That's the main case. It's also smaller on disk and uses less RAM.

I've also noticed that Q2_K sometimes produces rawer, more random output instead of long, boring answers.

The last case is to compare for myself and see whether it's better at all than the Herman one:
https://huggingface.co/TheBloke/dolphin-2.2.1-AshhLimaRP-Mistral-7B-GGUF
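
For context, a 7B model at Q2_K is roughly 3 GB on disk versus about 4.4 GB at Q4_K_M, which is where the size and RAM savings come from. Below is a minimal sketch of running such a file locally with llama-cpp-python; the filename is hypothetical, so substitute the actual file from the repo:

```python
# Minimal sketch: run a Q2_K GGUF locally with llama-cpp-python.
# The model filename below is hypothetical; use the actual file from the repo.
from llama_cpp import Llama

llm = Llama(
    model_path="limamono-mistral-7b.Q2_K.gguf",  # hypothetical path
    n_ctx=4096,     # context window
    n_threads=8,    # CPU threads; tune for your machine
)

out = llm("Once upon a time,", max_tokens=128, temperature=0.8)
print(out["choices"][0]["text"])
```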

Uploading it now; it should be online soon. Since I have limited upload bandwidth and planned to update the repository on a regular basis at least for the next few weeks, I wanted to avoid having to make too many versions.
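
For anyone mirroring quants themselves, a minimal sketch of pushing a GGUF to a Hub repo with huggingface_hub; both the local path and the repo id below are hypothetical placeholders:

```python
# Minimal sketch: upload a quantized GGUF to a Hugging Face repo.
# Both the local filename and repo_id are hypothetical placeholders.
from huggingface_hub import HfApi

api = HfApi()  # assumes you are already logged in (huggingface-cli login)
api.upload_file(
    path_or_fileobj="limamono-mistral-7b.Q2_K.gguf",
    path_in_repo="limamono-mistral-7b.Q2_K.gguf",
    repo_id="lemonilia/Limamono-Mistral-7B-GGUF",
)
```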

lemonilia changed discussion status to closed
lemonilia changed discussion status to open

The current version of Limamono was fine-tuned on the base Mistral-7B model, by the way, so I don't expect it to follow general instructions very well unless they are "roleplayed" in the specified novel/book/forum RP format.
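
In practice that means wrapping requests in a roleplay-shaped prompt rather than issuing bare instructions. A hypothetical sketch follows; the template is illustrative only, and the actual Limamono format is whatever the model card specifies:

```python
# Hypothetical forum-RP style prompt wrapper. The actual Limamono template
# may differ; check the model card for the exact expected format.
def wrap_forum_rp(scenario: str, history: str, character: str) -> str:
    return (
        f"{scenario}\n\n"   # scene/character description
        f"{history}\n"      # prior posts in the thread
        f"{character}:"     # cue the model to write the next post
    )

prompt = wrap_forum_rp(
    scenario="Alice is a sarcastic barista in a small coffee shop.",
    history="Bob: Morning! One espresso, please.",
    character="Alice",
)
# Feed `prompt` to the model instead of a bare instruction like
# "Write a dialogue between a barista and a customer."
```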

lemonilia changed discussion status to closed
