TheBloke committed
Commit 634ad31
1 Parent(s): c5e21e9

Upload README.md

Files changed (1): README.md (+3 -3)

README.md CHANGED
@@ -204,12 +204,12 @@ Windows Command Line users: You can set the environment variable by running `set
 Make sure you are using `llama.cpp` from commit [d0cee0d](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later.
 
 ```shell
-./main -ngl 35 -m capytessborosyi-34b-200k-dare-ties.Q4_K_M.gguf --color -c 3072 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "SYSTEM: {system_message}\nUSER: {prompt}\nASSISTANT:"
+./main -ngl 35 -m capytessborosyi-34b-200k-dare-ties.Q4_K_M.gguf --color -c 200000 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "SYSTEM: {system_message}\nUSER: {prompt}\nASSISTANT:"
 ```
 
 Change `-ngl 35` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
 
-Change `-c 3072` to the desired sequence length. For extended sequence models - eg 8K, 16K, 32K - the necessary RoPE scaling parameters are read from the GGUF file and set by llama.cpp automatically. Note that longer sequence lengths require much more resources, so you may need to reduce this value.
+Change `-c 200000` to the desired sequence length. For extended sequence models - eg 8K, 16K, 32K - the necessary RoPE scaling parameters are read from the GGUF file and set by llama.cpp automatically. Note that longer sequence lengths require much more resources, so you may need to reduce this value.
 
 If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`.
 
@@ -258,7 +258,7 @@ from llama_cpp import Llama
 # Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
 llm = Llama(
   model_path="./capytessborosyi-34b-200k-dare-ties.Q4_K_M.gguf", # Download the model file first
-  n_ctx=3072, # The max sequence length to use - note that longer sequence lengths require much more resources
+  n_ctx=200000, # The max sequence length to use - note that longer sequence lengths require much more resources
   n_threads=8, # The number of CPU threads to use, tailor to your system and the resulting performance
   n_gpu_layers=35 # The number of layers to offload to GPU, if you have GPU acceleration available
 )
 
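The "much more resources" warning in the changed lines is easy to quantify: llama.cpp's KV cache grows linearly with `-c`/`n_ctx`. Below is a minimal back-of-the-envelope sketch, assuming the Yi-34B-200K geometry this merge is based on (60 layers, GQA with 8 KV heads of dimension 128, fp16 cache); these architecture numbers are assumptions, not values read from the GGUF.

```python
# Rough KV-cache size estimate as a function of context length.
# Architecture constants are ASSUMED for a Yi-34B-200K base model
# (60 layers, 8 KV heads via GQA, head dim 128); check the GGUF
# metadata for the authoritative values.
N_LAYERS = 60
N_KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_ELEM = 2  # fp16 entries for both K and V

def kv_cache_bytes(n_ctx: int) -> int:
    # 2x for the separate K and V tensors, one entry per
    # layer * kv-head * head-dim * cached position.
    return 2 * BYTES_PER_ELEM * N_LAYERS * N_KV_HEADS * HEAD_DIM * n_ctx

for n_ctx in (3072, 32768, 200000):
    print(f"n_ctx={n_ctx:>6}: ~{kv_cache_bytes(n_ctx) / 2**30:.1f} GiB")
```

Under these assumptions the cache alone is roughly 0.7 GiB at the old `-c 3072` but about 46 GiB at the new `-c 200000`, which is why the README advises reducing the value if resources are tight.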
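The diff ends at constructing `llm`, so for completeness here is a hedged usage sketch with llama-cpp-python's high-level completion API; the prompt text and the `max_tokens`/`stop` values are illustrative, not part of the commit.

```python
# Single-shot completion, continuing from the llm = Llama(...) call above.
output = llm(
    "SYSTEM: You are a helpful assistant.\nUSER: Write a haiku about llamas.\nASSISTANT:",
    max_tokens=512,   # illustrative cap on generated tokens
    stop=["USER:"],   # stop before the model begins a new turn
    echo=False,       # return only the completion, not the prompt
)
print(output["choices"][0]["text"])
```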
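For the chat-style use that the README handles with llama.cpp's `-i -ins` flags, the rough counterpart in llama-cpp-python is its OpenAI-style `create_chat_completion` method; a sketch with illustrative message contents:

```python
# Multi-turn chat via the OpenAI-style wrapper; approximately the
# llama-cpp-python analogue of running ./main with `-i -ins`.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarise this model card in two sentences."},
    ],
    max_tokens=256,  # illustrative cap
)
print(response["choices"][0]["message"]["content"])
```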