Q8?

opened by FiditeNemini

Sorry to be a bother, but would you perhaps have Q8 quants of the Vicuna 13B files available, please? Fantastic model!

Owner

Hi,
I'm just uploading a Q6_K; I'd recommend it over a Q8 quant in every case, at least as long as llama.cpp has no native fp8 cuBLAS support built in (not a huge thing to do, but not on anyone's todo list at the moment).
Q6_K has the same real-world perplexity as Q8, and in every test I've seen, the same as fp16 too. Q5_K is already very close to that.
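In case it helps, here's a minimal sketch of loading a Q6_K file with llama-cpp-python; the filename, context size, and prompt format below are placeholder assumptions, not the actual files or settings from this repo:

```python
# Minimal sketch: running a Q6_K quant via llama-cpp-python.
# The filename "vicuna-13b.Q6_K.gguf" is hypothetical; substitute the real file.
from llama_cpp import Llama

llm = Llama(
    model_path="vicuna-13b.Q6_K.gguf",  # hypothetical path to the Q6_K quant
    n_ctx=2048,        # context window size
    n_gpu_layers=-1,   # offload all layers to GPU if built with CUDA support
)

# Vicuna-style prompt format (an assumption; check the model card for the exact template).
out = llm(
    "USER: Summarize why K-quants are preferred over Q8.\nASSISTANT:",
    max_tokens=64,
    temperature=0.7,
)
print(out["choices"][0]["text"])
```

To verify the perplexity claim yourself, llama.cpp also ships a `perplexity` tool that you can run against each quant on the same test file and compare the scores.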

Thank you so much, very much appreciated!
