Perplexity scores for a Herd of 3B Llamas

#2 · opened by flyingkiwiguy
  1. Perplexities were calculated using build 635 (5c64a09) of llama.cpp and the first 406 lines of wiki.test.raw (a reproduction sketch follows below the list).
  2. Previous perplexity benchmarking for llamas indicated that 406 lines is enough to compare different model sizes and quantization levels.
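For anyone wanting to reproduce these numbers, here is a minimal sketch, assuming a local llama.cpp checkout built at that commit and a hypothetical quantized model filename: it trims wiki.test.raw to its first 406 lines and runs llama.cpp's `perplexity` tool on the truncated file.

```python
# Sketch of one perplexity run, assuming llama.cpp is built in the current
# directory. The model filename below is hypothetical.
import subprocess
from pathlib import Path

WIKI_RAW = Path("wiki.test.raw")          # full WikiText-2 test set
WIKI_406 = Path("wiki.test.406.raw")      # first 406 lines only
MODEL = Path("open-llama-3b-q4_0.bin")    # hypothetical model file

# Keep only the first 406 lines of wiki.test.raw.
with WIKI_RAW.open(encoding="utf-8") as src, WIKI_406.open("w", encoding="utf-8") as dst:
    for i, line in enumerate(src):
        if i >= 406:
            break
        dst.write(line)

# Run llama.cpp's perplexity tool on the truncated file.
subprocess.run(
    ["./perplexity", "-m", str(MODEL), "-f", str(WIKI_406), "-c", "512"],
    check=True,
)
```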

[Plot: perplexity results for the 3B Open LLaMA models at different quantization levels]

Using just 406 lines would save me a lot of time. But 3B is pretty fast to compute anyway.

And here's a comparison across model sizes for selected quantization levels (note: the x-axis is now on a log scale so the trend from doubling the context size is easier to see):
[Plot: perplexity vs. context size across model sizes, selected quantization levels, log-scale x-axis]
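For reference, a minimal sketch of how a chart like this could be drawn with matplotlib: perplexity vs. context size on a log-2 x-axis, one line per model/quantization. The numbers below are placeholders, not the measurements from this thread.

```python
# Hedged plotting sketch; perplexity values are placeholders only.
import matplotlib.pyplot as plt

context_sizes = [512, 1024, 2048]
series = {
    "open-llama-3b q4_0 (placeholder)": [9.0, 8.5, 8.2],
    "open-llama-7b q4_0 (placeholder)": [7.5, 7.1, 6.9],
}

fig, ax = plt.subplots()
for label, ppl in series.items():
    ax.plot(context_sizes, ppl, marker="o", label=label)

ax.set_xscale("log", base=2)  # doublings of context size become evenly spaced
ax.set_xticks(context_sizes)
ax.set_xticklabels([str(c) for c in context_sizes])
ax.set_xlabel("context size (tokens)")
ax.set_ylabel("perplexity (wiki.test.raw, first 406 lines)")
ax.legend()
plt.show()
```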

I iterated through most of your models at the three context sizes to get a complete picture of how well Open LLaMA holds up across quantization levels (a sketch of that kind of sweep follows below). There's still plenty of room for Open LLaMA to reach FB LLaMA quality.
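A hedged sketch of such a sweep, assuming hypothetical model filenames per quantization level: it loops over quantization levels and context sizes, runs llama.cpp's `perplexity` tool for each combination, and saves the raw output for later plotting.

```python
# Sweep sketch; quantization set and model naming scheme are assumptions.
import itertools
import subprocess
from pathlib import Path

QUANTS = ["q4_0", "q4_1", "q5_0", "q5_1", "q8_0", "f16"]  # assumed quant levels
CONTEXTS = [512, 1024, 2048]
OUT_DIR = Path("ppl_logs")
OUT_DIR.mkdir(exist_ok=True)

for quant, ctx in itertools.product(QUANTS, CONTEXTS):
    model = f"open-llama-3b-{quant}.bin"                  # hypothetical filename
    log_file = OUT_DIR / f"{quant}_ctx{ctx}.log"
    with log_file.open("w") as log:
        # One perplexity run per (quantization, context size) pair.
        subprocess.run(
            ["./perplexity", "-m", model, "-f", "wiki.test.406.raw", "-c", str(ctx)],
            stdout=log, stderr=subprocess.STDOUT, check=True,
        )
```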

EDIT: I posted the same plot, along with the data from the 100 or so runs I did, to https://github.com/openlm-research/open_llama/discussions/41

@flyingkiwiguy These graphs are great!

There are also people over in this discussion who may be interested in your graphs:

https://github.com/ggerganov/llama.cpp/issues/1291

flyingkiwiguy changed discussion title from Perplexity scores for a Herd of 7B Llamas to Perplexity scores for a Herd of 3B Llamas
