Update README.md
README.md CHANGED
@@ -84,6 +84,8 @@ These quantised GGUFv2 files are compatible with llama.cpp from August 27th onwards

They are also compatible with many third party UIs and libraries - please see the list at the top of this README.

+Sequence length note: The model will work at sequence lengths of 4096 or lower. GGUF does not yet have support for the new sliding window sequence length mode, so longer sequence lengths are not supported.
+
## Explanation of quantisation methods
<details>
  <summary>Click to see details</summary>
@@ -186,12 +188,12 @@ Windows Command Line users: You can set the environment variable by running `set

Make sure you are using `llama.cpp` from commit [d0cee0d](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later.

```shell
-./main -ngl 32 -m mistral-7b-v0.1.Q4_K_M.gguf --color -c
+./main -ngl 32 -m mistral-7b-v0.1.Q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "{prompt}"
```

Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.

-
+Sequence length can be 4096 or lower. Mistral's sliding window sequence length is not yet supported in llama.cpp, so sequence lengths longer than 4096 are not supported.

If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`
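For reference, a chat-style invocation built from the command above might look like the following. This is an untested sketch: it assumes `-i -ins` simply takes the place of the `-p "{prompt}"` argument, and `-ngl`/`-c` should be adjusted for your hardware and the 4096 sequence limit.

```shell
# Interactive instruct mode: -i -ins replaces -p "{prompt}"
./main -ngl 32 -m mistral-7b-v0.1.Q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -i -ins
```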
@@ -207,6 +209,8 @@ You can use GGUF models from Python using the [llama-cpp-python](https://github.

### How to load this model in Python code, using ctransformers

+Note: I have not tested ctransformers with Mistral models, but it may work if you set the `model_type` to `llama`.
+
#### First install the package

Run one of the following commands, according to your system:
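A minimal sketch of what that workaround might look like in Python, assuming the standard ctransformers `AutoModelForCausalLM` interface and this repo's file names (untested with Mistral, per the note above):

```python
from ctransformers import AutoModelForCausalLM

# model_type="llama" is the workaround suggested in the note above;
# the repo id and gpu_layers value are illustrative assumptions.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Mistral-7B-v0.1-GGUF",           # assumed Hugging Face repo id
    model_file="mistral-7b-v0.1.Q4_K_M.gguf",  # quantised file used in the examples above
    model_type="llama",                        # no dedicated "mistral" type yet
    gpu_layers=32,                             # set to 0 if you have no GPU acceleration
)

print(llm("AI is going to"))
```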