remove BOS from llama.cpp example (automatically added by llama.cpp)
README.md CHANGED
@@ -93,7 +93,7 @@ Generated importance matrix file: [Cerebrum-1.0-8x7b.imatrix.dat](https://huggin
 Make sure you are using `llama.cpp` from commit [0becb22](https://github.com/ggerganov/llama.cpp/commit/0becb22ac05b6542bd9d5f2235691aa1d3d4d307) or later.
 
 ```shell
-./main -ngl 33 -m Cerebrum-1.0-8x7b.IQ2_XS.gguf --override-kv llama.expert_used_count=int:3 --color -c 16384 --temp 0.7 --repeat-penalty 1.0 -n -1 -p "
+./main -ngl 33 -m Cerebrum-1.0-8x7b.IQ2_XS.gguf --override-kv llama.expert_used_count=int:3 --color -c 16384 --temp 0.7 --repeat-penalty 1.0 -n -1 -p "A chat between a user and a thinking artificial intelligence assistant. The assistant describes its thought process and gives helpful and detailed answers to the user's questions.\nUser: {prompt}\nAI:"
 ```
 
 Change `-ngl 33` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
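The note about `-ngl` in the changed README can be applied like so; this is a sketch assuming the GGUF file is in the current directory, with `{prompt}` left as the placeholder the README's prompt template uses:

```shell
# CPU-only run: the -ngl flag is dropped entirely, per the README's note
# ("Remove it if you don't have GPU acceleration").
# Replace {prompt} with your actual question before running.
./main -m Cerebrum-1.0-8x7b.IQ2_XS.gguf \
  --override-kv llama.expert_used_count=int:3 \
  --color -c 16384 --temp 0.7 --repeat-penalty 1.0 -n -1 \
  -p "A chat between a user and a thinking artificial intelligence assistant. The assistant describes its thought process and gives helpful and detailed answers to the user's questions.\nUser: {prompt}\nAI:"
```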
|