ggml versions of OpenLLaMa 3B
- Version: 1T tokens final version
- Project: OpenLLaMA: An Open Reproduction of LLaMA
- Model: openlm-research/open_llama_3b
- llama.cpp: build 607 (ffb06a3) or later
Use with llama.cpp
Support is now merged into the master branch.
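A typical workflow is to quantize the F16 ggml file with llama.cpp's `quantize` tool and then run it with `main`. A minimal sketch; the file names and prompt below are placeholders, not part of this release:

```shell
# Quantize the F16 ggml model to 4-bit (q4_0).
# Paths are examples; point them at your local files.
./quantize models/open-llama-3b-f16.bin models/open-llama-3b-q4_0.bin q4_0

# Run a short completion with the quantized model.
./main -m models/open-llama-3b-q4_0.bin -p "Building a website can be done in" -n 64
```

Any of the quantization types listed in the perplexity table (q8_0, q5_1, q5_0, q4_1, q4_0) can be substituted in the `quantize` step.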
Newer quantizations
llama.cpp now offers additional quantization types, some below 4 bits. These are currently not supported for this model, likely because some of its tensors have dimensions that are not divisible by 256, which the newer k-quant formats require.
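The divisibility constraint can be checked directly. A minimal sketch, assuming OpenLLaMA 3B's hidden size of 3200 and feed-forward size of 8640 (taken from the openlm-research config; treat these values as assumptions):

```shell
# Check whether candidate tensor dimensions are divisible by 256,
# the block size the newer k-quant formats expect.
for dim in 3200 8640; do
  if [ $((dim % 256)) -eq 0 ]; then
    echo "$dim: divisible by 256"
  else
    echo "$dim: not divisible by 256 (remainder $((dim % 256)))"
  fi
done
```

Neither dimension is a multiple of 256, which is consistent with the k-quant types not working on this model.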
Perplexity on wiki.test.raw
| Quantization | Chunks | 600B-token checkpoint | 1000B-token checkpoint |
|---|---|---|---|
| F16 | 616 | 8.4656 | 7.7861 |
| Q8_0 | 616 | 8.4667 | 7.7874 |
| Q5_1 | 616 | 8.5072 | 7.8424 |
| Q5_0 | 616 | 8.5156 | 7.8474 |
| Q4_1 | 616 | 8.6102 | 8.0483 |
| Q4_0 | 616 | 8.6674 | 8.0962 |
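These numbers can be reproduced with llama.cpp's `perplexity` tool run over wiki.test.raw. A sketch, assuming the model file name and paths below (they are placeholders, not part of this release):

```shell
# Compute perplexity over wiki.test.raw with a quantized model.
# The tool reports a running average over 616 chunks for this file.
./perplexity -m models/open-llama-3b-q4_0.bin -f wiki.test.raw
```

Expect lower perplexity from the 1000B-token checkpoint than the 600B-token one at every quantization level, as the table shows.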