---
license: apache-2.0
---

# ggml versions of Flan-Open-Llama-3b

- Announcement: [Tweet by @EnricoShippole](https://twitter.com/EnricoShippole/status/1661756166248996867) ("open-source")
- Model: [conceptofmind/Flan-Open-Llama-3b](https://huggingface.co/conceptofmind/Flan-Open-Llama-3b)
- Base Model: [openlm-research/open_llama_3b](https://huggingface.co/openlm-research/open_llama_3b), from [OpenLLaMA: An Open Reproduction of LLaMA](https://github.com/openlm-research/open_llama) (Apache 2.0)
- Dataset: [FLAN](https://github.com/google-research/FLAN) (Apache 2.0)
- [llama.cpp](https://github.com/ggerganov/llama.cpp): build 607 (ffb06a3) or later
- Type: instruct

## Use with llama.cpp

Support for this model is now merged into the master branch of llama.cpp. An example invocation is sketched at the end of this card.

## K-quants

llama.cpp now offers additional quantization types, some below 4 bits per weight. For technical reasons they are currently not well supported for this model. If you want to use them, you have to build llama.cpp (from build 829 (ff5d58f)) with the `LLAMA_QKK_64` Make or CMake variable enabled (see PR [#2001](https://github.com/ggerganov/llama.cpp/pull/2001)). You can then quantize the F16 version (or possibly the Q8_0 version) to the k-quant type you want, as sketched below.
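A minimal sketch of that K-quants workflow follows. The file names are placeholders for the ggml files in this repo, and the commands assume a llama.cpp checkout at or after build 829; check PR #2001 for the exact build options.

```sh
# Build llama.cpp with 64-element k-quant blocks, as required for models whose
# tensor sizes do not fit the default k-quant block size (see PR #2001):
make clean
make LLAMA_QKK_64=1

# Alternatively, with CMake:
#   cmake -B build -DLLAMA_QKK_64=ON
#   cmake --build build --config Release

# Quantize the F16 ggml file to a k-quant type, e.g. Q4_K_M
# (file names are placeholders; use the actual files from this repo):
./quantize flan-open-llama-3b-f16.bin flan-open-llama-3b-q4_k_m.bin Q4_K_M
```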
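For the "Use with llama.cpp" section above, a basic inference call might look like the following. The model file name and prompt are placeholders; adjust them to the ggml file you downloaded from this repo.

```sh
# Run the instruct model on a simple prompt with llama.cpp's main example:
./main -m flan-open-llama-3b-q5_1.bin -n 128 -p "Answer the question: What is the capital of France?"
```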