Why "llama-2-7b-chat.Q8_0.gguf" model is not recommended

#21 · opened by AhmetOnur

I have enough compute power, but why is "llama-2-7b-chat.Q8_0.gguf" marked as "not recommended"?

I'm not a hundred percent sure, so take my answer with a grain of salt, but it is the least powerful of the Llama-2 Chat models. First of all, it's 7B parameters (all of the GGUF models in this repo are), so it's significantly less powerful than the 13B and 70B models; it is also the smallest of the 7B models, as there are larger versions even of that.
This isn't to say you shouldn't use it; it just may not fit your purposes as well as many other models if your task is resource intensive. If you have the computing power, a larger model is probably better.

Thank you for the suggestion, but I would like to clarify that 'Q8_0' is the largest model in that repo.
Even if it's the largest model, why does it have a 'not-recommended' tag?

(screenshot attached: Screenshot 2024-02-22 at 11.27.53.png)

@AhmetOnur Q8 is the largest of the 7B quantizations, but it's not really recommended because Q6 is similar in quality to Q8 while being faster and taking up less storage.
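For example, with llama-cpp-python the only thing that changes between the two is the file you load. This is a minimal sketch; the Q6_K filename is an assumption based on the naming pattern of the Q8_0 file mentioned above:

```python
# Minimal sketch using llama-cpp-python: switching between Q8_0 and Q6_K
# is just a matter of pointing at a different GGUF file.
from llama_cpp import Llama

# model_path = "llama-2-7b-chat.Q8_0.gguf"  # largest 7B quant
model_path = "llama-2-7b-chat.Q6_K.gguf"    # assumed filename; similar quality, smaller and faster

llm = Llama(model_path=model_path, n_ctx=2048)

out = llm("Q: What does Q6_K mean in a GGUF filename? A:", max_tokens=64)
print(out["choices"][0]["text"])
```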

Also, instead of running a Q8, you can just run a 13B model at Q4, as JakeStBu said.

It will be better quality than a 7B Q8.
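If you have the memory headroom for a 7B at Q8_0, loading a 13B at Q4 is the same pattern. This is just a sketch; the 13B filename below is an assumption, so check the actual repo for the exact name:

```python
# Rough sketch: running a 13B chat model at Q4_K_M instead of a 7B at Q8_0.
# The filename is assumed; verify it against the 13B GGUF repo you download from.
from llama_cpp import Llama

llm = Llama(
    model_path="llama-2-13b-chat.Q4_K_M.gguf",  # assumed filename
    n_ctx=2048,
    n_gpu_layers=-1,  # offload all layers to the GPU if you have the VRAM
)

out = llm("Q: Why might a 13B Q4 beat a 7B Q8? A:", max_tokens=64)
print(out["choices"][0]["text"])
```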

Llama 2 Chat is a kinda-OK and very censored model (it will refuse to tell you how to kill a process in Linux), so it's not really the best unless you want that.

A better bet might be quantized Mistral 7B models, as they are as good as 13B models but smaller and faster.
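If you want to try that, something like the following should work. The repo id and filename are assumptions (check the Hub for the exact names), and the [INST] wrapper is Mistral Instruct's prompt format:

```python
# Rough sketch: downloading and running a quantized Mistral 7B Instruct GGUF.
# Repo id and filename are assumptions; verify them on the Hugging Face Hub.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

path = hf_hub_download(
    repo_id="TheBloke/Mistral-7B-Instruct-v0.2-GGUF",  # assumed repo
    filename="mistral-7b-instruct-v0.2.Q4_K_M.gguf",   # assumed filename
)

llm = Llama(model_path=path, n_ctx=4096)

# Mistral Instruct expects prompts wrapped in [INST] ... [/INST]
out = llm("[INST] How do I kill a process in Linux? [/INST]", max_tokens=128)
print(out["choices"][0]["text"])
```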
