GGUF versions of OpenLLaMa 3B

Newer quantizations

llama.cpp now offers additional quantization types (the k-quants), including some below 4 bits. These are not yet supported for this model, possibly because some of its weight tensors have row sizes that are not divisible by 256, the k-quant block size.
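
To make the constraint concrete, here is a quick stand-alone check, assuming the dimensions published in the OpenLLaMA 3B config (hidden_size 3200, intermediate_size 8640) and llama.cpp's k-quant block size of 256:

```python
# k-quant formats in llama.cpp pack weights in blocks of QK_K = 256,
# so every quantized tensor row length must be divisible by 256.
QK_K = 256

# Row sizes assumed from the published OpenLLaMA 3B config.
dims = {"hidden_size": 3200, "intermediate_size": 8640}

for name, size in dims.items():
    print(f"{name} = {size}: divisible by {QK_K}? {size % QK_K == 0}")
# Both report False, which is consistent with k-quants being unavailable here.
```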

Perplexity on wiki.test.406

Coming soon...

Format: GGUF
Model size: 3.43B params
Architecture: llama

Available quantizations: 4-bit, 5-bit, 8-bit, and 16-bit.
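
These files work with llama.cpp and its bindings. As a minimal sketch, one of them could be loaded with the llama-cpp-python package; the filename below is hypothetical, so substitute the actual file downloaded from this repo:

```python
# Minimal sketch using the llama-cpp-python bindings.
# "open-llama-3b-q4_0.gguf" is an assumed filename -- replace it with
# the actual GGUF file downloaded from this repo.
from llama_cpp import Llama

llm = Llama(model_path="open-llama-3b-q4_0.gguf")
out = llm("Q: What is the capital of France? A:", max_tokens=16)
print(out["choices"][0]["text"])
```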
