Q8?

opened by FiditeNemini

Sorry to be a bother, but would you perhaps have Q8 quants of the Vicuna 13B files available, please? Fantastic model!

Owner

Hi,
I'm just uploading a Q6_K; I'd recommend it over a Q8 quant in every case, at least as long as llama.cpp has no native fp8 cuBLAS support built in (not a huge thing to do, but not on anyone's todo list at the moment).
Q6_K has the same real-world perplexity as Q8, and in every test I've seen, the same as fp16 too. Q5_K is already very close to that.
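In case it helps, here's a minimal sketch of loading a Q6_K file with llama-cpp-python; the filename, context size, and prompt format below are placeholder assumptions, not the actual files or settings from this repo:

```python
# Minimal sketch: running a Q6_K quant via llama-cpp-python.
# The filename "vicuna-13b.Q6_K.gguf" is hypothetical; substitute the real file.
from llama_cpp import Llama

llm = Llama(
    model_path="vicuna-13b.Q6_K.gguf",  # hypothetical path to the Q6_K quant
    n_ctx=2048,        # context window size
    n_gpu_layers=-1,   # offload all layers to GPU if built with CUDA support
)

# Vicuna-style prompt format (an assumption; check the model card for the exact template).
out = llm(
    "USER: Summarize why K-quants are preferred over Q8.\nASSISTANT:",
    max_tokens=64,
    temperature=0.7,
)
print(out["choices"][0]["text"])
```

To verify the perplexity claim yourself, llama.cpp also ships a `perplexity` tool that you can run against each quant on the same test file and compare the scores.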

Thank you so much, very much appreciated!
