Perplexity scores for a Herd of 7B Llamas

#1
by flyingkiwiguy - opened
  1. Perplexities calculated using build = 635 (5c64a09) of llama.cpp and the first 406 lines of wiki.test.raw
  2. Previous perplexity benchmarking for llamas indicated that 406 lines is enough to compare different sizes and quantization levels
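For reference, a minimal sketch of how these numbers could be reproduced, assuming the `perplexity` binary built from llama.cpp at that revision and local copies of the quantized models and wiki.test.raw (all file paths below are placeholders):

```python
import subprocess
from pathlib import Path

# Keep only the first 406 lines of wiki.test.raw, matching the setup above.
lines = Path("wiki.test.raw").read_text(encoding="utf-8").splitlines(keepends=True)
Path("wiki.test.406.raw").write_text("".join(lines[:406]), encoding="utf-8")

# Hypothetical model paths; actual file names depend on how the models were quantized.
models = [
    "models/7B/ggml-model-f16.bin",
    "models/7B/ggml-model-q8_0.bin",
    "models/7B/ggml-model-q5_1.bin",
    "models/7B/ggml-model-q4_0.bin",
]

for model in models:
    # llama.cpp's perplexity tool prints per-chunk and final perplexity as it runs.
    subprocess.run(["./perplexity", "-m", model, "-f", "wiki.test.406.raw"], check=True)
```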

[Plot: per-chunk perplexity for each quantization level of the 7B model]


Full perplexity on wiki.test.raw is under 7.0 (just barely) for the F16 version.

I will have all the numbers shortly.

Numbers are now up in the README.

Could you use dashed line styles to distinguish the classes of quantized models, e.g. "---" for q8, "-.-" for q5, "-..-" for q4, etc.? It's hard for me to tell which line is which from the colors alone.
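Something along these lines would do it with matplotlib, mapping each quantization class to its own dash pattern; the per-chunk perplexity values here are made-up placeholders, not the thread's actual data:

```python
import matplotlib.pyplot as plt

# Placeholder per-chunk perplexities for a few quantization levels.
curves = {
    "f16":  [6.5, 6.3, 6.1, 6.0],
    "q8_0": [6.5, 6.3, 6.1, 6.0],
    "q5_1": [6.6, 6.4, 6.2, 6.1],
    "q4_0": [6.8, 6.6, 6.4, 6.3],
}

# One dash pattern per quantization class, so lines can be told apart
# without relying on color alone.
linestyles = {"f16": "-", "q8": "--", "q5": "-.", "q4": (0, (3, 1, 1, 1))}

for name, ppl in curves.items():
    quant_class = name.split("_")[0]
    plt.plot(range(1, len(ppl) + 1), ppl,
             linestyle=linestyles.get(quant_class, ":"), label=name)

plt.xlabel("chunk")
plt.ylabel("perplexity")
plt.legend()
plt.show()
```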


If it helps, the legend is ordered from highest to lowest perplexity. If you can't tell the lines apart, you can assume there's not much observable difference in the quality of the models.

I published a more general plot here:

https://github.com/openlm-research/open_llama/discussions/41#discussion-5277699

Hugging Face didn't allow me to upload the .csv file the plot is generated from.

Thanks for the scores, that's very helpful!
