---
license: apache-2.0
---

# ggml versions of OpenLLaMa 3B

## Use with llama.cpp

Support for this model is now merged into the llama.cpp master branch.
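
One way to try these files is through the llama-cpp-python bindings to llama.cpp. This is only a minimal sketch: the file name below is hypothetical (use whichever quantization you downloaded from this repo), and pre-GGUF ggml files like these require an older llama-cpp-python release.

```python
# Minimal sketch, not from the original card: load a ggml quantization of
# OpenLLaMA 3B with llama-cpp-python (an older release that still reads
# ggml files). The model file name is an assumption.
from llama_cpp import Llama

llm = Llama(model_path="open-llama-3b-q5_1.bin", n_ctx=512)

out = llm(
    "Q: What is the capital of France? A:",
    max_tokens=32,
    stop=["Q:", "\n"],
)
print(out["choices"][0]["text"])
```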

## Newer quantizations

There are now more quantization types in llama.cpp, some lower than 4 bits. Currently these are not supported for this model, possibly because some weight tensors have shapes that are not divisible by 256.
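
As a rough illustration of that constraint, the newer k-quant formats in llama.cpp pack weights in blocks of 256, but (assuming the published OpenLLaMA 3B shapes) the model's hidden size of 3200 and FFN size of 8640 do not divide evenly by 256:

```python
# Sketch of the shape check. Assumptions: OpenLLaMA 3B uses hidden size 3200
# and FFN size 8640, and llama.cpp k-quants use a block size (QK_K) of 256.
QK_K = 256

for name, dim in [("hidden size", 3200), ("FFN size", 8640)]:
    print(f"{name} = {dim}: {dim} % {QK_K} = {dim % QK_K}")
# Neither remainder is zero, so the rows cannot be split into whole
# 256-weight blocks, which would explain why these types are unsupported.
```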

## Perplexity on wiki.test.raw

The 600BT and 1000BT columns are the OpenLLaMA 3B checkpoints trained on 600 billion and 1000 billion tokens, respectively.

| Q    | chunk | 600BT  | 1000BT |
|------|-------|--------|--------|
| F16  | [616] | 8.4656 | 7.7861 |
| Q8_0 | [616] | 8.4667 | 7.7874 |
| Q5_1 | [616] | 8.5072 | 7.8424 |
| Q5_0 | [616] | 8.5156 | 7.8474 |
| Q4_1 | [616] | 8.6102 | 8.0483 |
| Q4_0 | [616] | 8.6674 | 8.0962 |
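
For reference, perplexity here is the standard measure: the exponential of the mean negative log-likelihood per token over the test set (lower is better). A minimal sketch of the computation, using made-up per-token log-probabilities:

```python
import math

def perplexity(token_logprobs):
    """exp of the mean negative log-likelihood per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Toy example with invented log-probabilities, just to show the formula.
print(perplexity([-2.1, -1.8, -2.4, -2.0]))  # exp(2.075) ≈ 7.96
```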