Perplexity scores for a Herd of 7B Llamas
- Perplexities calculated using build = 635 (5c64a09) of llama.cpp and the first 406 lines of wiki.test.raw.
- Previous perplexity benchmarking for llamas indicated that 406 lines is enough to compare different sizes and quantization levels.

Full perplexity on wiki.test.raw is under 7.0 (just barely) for the F16 version.
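In case anyone wants to reproduce the setup, here is a minimal sketch of the procedure (the perplexity binary location, model filename, and output filename are placeholders; adjust to your local build):

```python
import subprocess

# Keep only the first 406 lines of wiki.test.raw (enough, per earlier
# llama benchmarking, to compare sizes and quantization levels).
with open("wiki.test.raw", encoding="utf-8") as src:
    head = src.readlines()[:406]
with open("wiki.test.406.raw", "w", encoding="utf-8") as dst:
    dst.writelines(head)

# Run the llama.cpp perplexity tool on the truncated file.
# The binary path and model filename below are placeholders.
subprocess.run(
    ["./perplexity",
     "-m", "models/open-llama-7b/ggml-model-q4_0.bin",
     "-f", "wiki.test.406.raw"],
    check=True,
)
```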
I will have all the numbers shortly.
Numbers are now up in the README.
Could you use dashed lines to separate the classes of the quantized models, e.g. "---" for q8, "-.-" for q5, "-..-" for q4, etc.? It's hard for me to tell which line is which from the colors.
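Something along these lines would do it with matplotlib (the CSV layout and column names here are only guesses, since the underlying data isn't published):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical CSV layout: first column is the chunk index, then one
# perplexity column per model (e.g. "f16", "q8_0", "q5_1", "q4_0").
df = pd.read_csv("perplexities.csv")

# Map each quantization class to its own dash pattern so the curves
# stay distinguishable without relying on color alone.
styles = {"f16": "solid", "q8": "--", "q5": "-.", "q4": ":"}

for column in df.columns[1:]:
    family = column.split("_")[0]  # "q4_0" -> "q4"
    plt.plot(
        df.iloc[:, 0],
        df[column],
        linestyle=styles.get(family, "solid"),
        label=column,
    )

plt.xlabel("chunk")
plt.ylabel("perplexity")
plt.legend()
plt.show()
```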
If it helps, the legend is ordered from highest to lowest perplexity. If you can't tell the difference between the lines, you can assume there's not much observable difference in the quality of the models.
I published a more general plot here:
https://github.com/openlm-research/open_llama/discussions/41#discussion-5277699
Hugging Face didn't allow me to upload the .csv file the plot is generated from.
Thanks for the scores, that's very helpful!