---
license: apache-2.0
---

# ggml versions of Flan-Open-Llama-3b

- Announcement: [Tweet by @EnricoShippole](https://twitter.com/EnricoShippole/status/1661756166248996867) ("open-source")
- Model: [conceptofmind/Flan-Open-Llama-3b](https://huggingface.co/conceptofmind/Flan-Open-Llama-3b)
- Base Model: [openlm-research/open_llama_3b](https://huggingface.co/openlm-research/open_llama_3b), from [OpenLLaMA: An Open Reproduction of LLaMA](https://github.com/openlm-research/open_llama) (Apache 2.0)
- Dataset: [FLAN](https://github.com/google-research/FLAN) (Apache 2.0)
- [llama.cpp](https://github.com/ggerganov/llama.cpp): build 607 (ffb06a3) or later
- Type: instruct

## Use with llama.cpp

Support for this model is now merged into the master branch of llama.cpp. An example invocation is sketched at the end of this card.

## K-quants

llama.cpp now offers additional quantization types, some below 4 bits per weight. For technical reasons they are currently not well supported for this model. If you want to use them, you have to build llama.cpp (from build 829 (ff5d58f)) with the `LLAMA_QKK_64` Make or CMake variable enabled (see PR [#2001](https://github.com/ggerganov/llama.cpp/pull/2001)). You can then quantize the F16 version (or possibly the Q8_0 version) to the k-quant type you want, as sketched below.
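A minimal sketch of that K-quants workflow follows. The file names are placeholders for the ggml files in this repo, and the commands assume a llama.cpp checkout at or after build 829; check PR #2001 for the exact build options.

```sh
# Build llama.cpp with 64-element k-quant blocks, as required for models whose
# tensor sizes do not fit the default k-quant block size (see PR #2001):
make clean
make LLAMA_QKK_64=1

# Alternatively, with CMake:
#   cmake -B build -DLLAMA_QKK_64=ON
#   cmake --build build --config Release

# Quantize the F16 ggml file to a k-quant type, e.g. Q4_K_M
# (file names are placeholders; use the actual files from this repo):
./quantize flan-open-llama-3b-f16.bin flan-open-llama-3b-q4_k_m.bin Q4_K_M
```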
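For the "Use with llama.cpp" section above, a basic inference call might look like the following. The model file name and prompt are placeholders; adjust them to the ggml file you downloaded from this repo.

```sh
# Run the instruct model on a simple prompt with llama.cpp's main example:
./main -m flan-open-llama-3b-q5_1.bin -n 128 -p "Answer the question: What is the capital of France?"
```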