---
license: apache-2.0
---
# ggml versions of Flan-Open-Llama-3b
- Announcement: [Tweet by @EnricoShippole](https://twitter.com/EnricoShippole/status/1661756166248996867) ("open-source")
- Model: [conceptofmind/Flan-Open-Llama-3b](https://huggingface.co/conceptofmind/Flan-Open-Llama-3b)
- Base Model: [openlm-research/open_llama_3b](https://huggingface.co/openlm-research/open_llama_3b) ([OpenLLaMA: An Open Reproduction of LLaMA](https://github.com/openlm-research/open_llama), Apache 2.0)
- Dataset: [FLAN](https://github.com/google-research/FLAN) (Apache 2.0)
- [llama.cpp](https://github.com/ggerganov/llama.cpp): build 607 (ffb06a3) or later
- Type: instruct
## Use with llama.cpp
Support is now merged into the master branch, so a regular build of llama.cpp (build 607 or later) can load these files directly.
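As a quick sanity check, a minimal run might look like the sketch below. The model filename and prompt are placeholders, not the exact names shipped in this repo:

```sh
# Build llama.cpp (build 607 / ffb06a3 or later) and run an instruct-style prompt.
# The .bin filename is an assumption; use whichever ggml file you downloaded from this repo.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
./main -m ./models/flan-open-llama-3b-q4_0.bin \
  -p "Answer the following question. What is the capital of France?" \
  -n 128
```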
## K-quants
There are now more quantization types in llama.cpp (the k-quants), some using fewer than 4 bits per weight.
They are currently not well supported for this model, because k-quants default to 256-element super-blocks and some of this model's tensor dimensions are not a multiple of 256.
If you want to use them, you have to build llama.cpp (from build 829 (ff5d58f)) with the `LLAMA_QKK_64` Make or CMake variable enabled, which switches to 64-element super-blocks (see PR [#2001](https://github.com/ggerganov/llama.cpp/pull/2001)).
You can then quantize the F16 version (or, failing that, the Q8_0 version) to the k-quant type you want, as sketched below.
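A minimal sketch of that workflow, assuming a Make-based build (for CMake, pass `-DLLAMA_QKK_64=ON` instead); both filenames are placeholders for the actual files in this repo:

```sh
# Rebuild with 64-element super-blocks enabled (required for this model's k-quants).
make clean
make LLAMA_QKK_64=1
# Quantize the F16 ggml file down to a k-quant type (Q4_K_M here as an example).
./quantize ./models/flan-open-llama-3b-f16.bin \
  ./models/flan-open-llama-3b-Q4_K_M.bin Q4_K_M
```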