---
license: apache-2.0
---

# ggml versions of LLongMA-3b

- Announcement: [Tweet by @EnricoShippole](https://twitter.com/EnricoShippole/status/1677346578720256000)

- Model: [conceptofmind/LLongMA-3b](https://huggingface.co/conceptofmind/LLongMA-3b) (license not specified)

- Base Model: [openlm-research/open_llama_3b](https://huggingface.co/openlm-research/open_llama_3b), project: [OpenLLaMA: An Open Reproduction of LLaMA](https://github.com/openlm-research/open_llama) (Apache 2.0)

- Tuning dataset: [togethercomputer/RedPajama-Data-1T](https://huggingface.co/togethercomputer/RedPajama-Data-1T) (various licenses)

- [llama.cpp](https://github.com/ggerganov/llama.cpp): 3B model size requires build 607 (ffb06a3) or later; extended context: not yet supported (see below)

- Context length: 8192 tokens (extended from the base model's 2048)

- Type: foundational

## Extended context

This model extends its context by [scaling the position index](https://kaiokendev.github.io/context) in the RoPE algorithm by a factor of 1/4, stretching the 2048-token context of the original LLaMA models to 8192 tokens.
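
As a rough sketch of that linear position interpolation (not the exact implementation), RoPE keeps its usual frequencies but sees every position index compressed by the scale factor:

```latex
% RoPE frequency for dimension pair i (d = head dimension):
\theta_i = 10000^{-2i/d}
% Rotation angle for a token at position m, original vs. interpolated (s = 1/4):
\phi_{m,i} = m\,\theta_i
\qquad\longrightarrow\qquad
\phi'_{m,i} = (s\,m)\,\theta_i = \tfrac{m}{4}\,\theta_i
% So position 8192 is rotated no further than position 2048 was in the base model.
```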

For best results, the model should undergo an additional fine-tuning step.

This was done by training on 1 billion tokens of the RedPajama-1T dataset (the full OpenLLaMA 3B pre-training used 1 trillion tokens).

Enabling this in llama.cpp is an ongoing development effort.

You can track it in PR [#2054](https://github.com/ggerganov/llama.cpp/pull/2054).

Once that work lands, extended context should be enabled with the flags `-c 8192 --rope-freq-scale 0.25` (the exact flags may still change while the PR is open).
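
For example, a run with the extended context might then look like this (a sketch only; the model filename is a placeholder, and the flags are subject to the PR above):

```bash
# Hypothetical invocation once extended-context support is merged.
# "LLongMA-3b-q4_0.bin" stands in for whichever ggml file you downloaded.
./main -m ./LLongMA-3b-q4_0.bin \
       -c 8192 --rope-freq-scale 0.25 \
       -n 256 -p "Once upon a time"
```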

## K-quants

There are now additional quantization types (the k-quants) in llama.cpp, some using fewer than 4 bits per weight.

Currently these are not well supported for this model, because the 3B model's tensor dimensions are not multiples of 256, the default k-quant super-block size.

If you want to use them, you have to build llama.cpp (build 829 (ff5d58f) or later) with the `LLAMA_QKK_64` Make or CMake variable enabled, which switches to a super-block size of 64 (see PR [#2001](https://github.com/ggerganov/llama.cpp/pull/2001)).

You can then quantize the F16 version (or possibly the Q8_0 version) to the k-quant type you want.
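
A possible sequence, assuming a Make-based build and using placeholder file names:

```bash
# Build llama.cpp with 64-element k-quant super-blocks (needed for this 3B model).
make clean
make LLAMA_QKK_64=1

# Quantize the F16 ggml file to a k-quant type, e.g. Q4_K_M.
./quantize ./LLongMA-3b-f16.bin ./LLongMA-3b-q4_k_m.bin Q4_K_M
```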