Benchmark of different GGML versions

#2
by aiapprentice101

Hi,

Awesome work! Do you happen to have any benchmarks for the different GGML versions of this model? It would be great to see how much performance deterioration we get from these quantization techniques.

@aiapprentice101 Did you see https://oobabooga.github.io/blog/posts/perplexities/ ? It's not about this model specifically, but it covers the relative performance of different GGML and GPTQ quants.
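
If you'd rather put a number on a specific checkpoint yourself instead of relying on the blog's figures, a rough perplexity measurement for any transformers-loadable version (the fp16 or GPTQ repos; GGML files would need llama.cpp's own perplexity tool instead) could look like the sketch below. The model id, test file, and window size are placeholders, not something from this thread:

```python
# Minimal perplexity sketch for a transformers-loadable checkpoint.
# Assumptions: model_id and the wikitext test file path are placeholders;
# a GPTQ repo additionally needs auto-gptq/optimum installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/or/repo-of-the-checkpoint"   # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
model.eval()

# Any held-out text works; wikitext-2's test split is the usual choice.
text = open("wiki.test.raw").read()
input_ids = tokenizer(text, return_tensors="pt").input_ids

stride = 2048            # evaluate in non-overlapping 2048-token windows
nll_sum, n_tokens = 0.0, 0

for start in range(0, input_ids.size(1) - 1, stride):
    ids = input_ids[:, start:start + stride].to(model.device)
    with torch.no_grad():
        # Passing labels=ids returns the mean next-token cross-entropy
        # over this window.
        loss = model(ids, labels=ids).loss
    window_targets = ids.size(1) - 1   # first token has no prediction target
    nll_sum += loss.item() * window_targets
    n_tokens += window_targets

print("perplexity:", torch.exp(torch.tensor(nll_sum / n_tokens)).item())
```

Running the same script against two quantizations of the same model gives a quick, if coarse, sense of how much quality each one gives up.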

Thank you @mike-ravkine. The post seems to claim GPTQ is the best in terms of quality. However, when I test the GPTQ version of Llama-2 (also from TheBloke), I get very poor performance: the model doesn't follow the one-shot instructions and generates random output.
