MaziyarPanahi committed
Commit 32fbb7e
Parent: 7f54f76

Update README.md (#2)

- Update README.md (0628142de750a2f3db4142eb4a970783f3cb36a1)

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -97,7 +97,7 @@ pip3 install huggingface-hub
 Then you can download any individual model file to the current directory, at high speed, with a command like this:
 
 ```shell
-huggingface-cli download MaziyarPanahi/jaskier-7b-dpo-v5.6-GGUF jaskier-7b-dpo-v5.6-GGUF.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
+huggingface-cli download MaziyarPanahi/jaskier-7b-dpo-v5.6-GGUF jaskier-7b-dpo-v5.6.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
 ```
 </details>
 <details>
@@ -120,7 +120,7 @@ pip3 install hf_transfer
 And set environment variable `HF_HUB_ENABLE_HF_TRANSFER` to `1`:
 
 ```shell
-HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download MaziyarPanahi/jaskier-7b-dpo-v5.6-GGUF jaskier-7b-dpo-v5.6-GGUF.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
+HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download MaziyarPanahi/jaskier-7b-dpo-v5.6-GGUF jaskier-7b-dpo-v5.6.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
 ```
 
 Windows Command Line users: You can set the environment variable by running `set HF_HUB_ENABLE_HF_TRANSFER=1` before the download command.
@@ -131,7 +131,7 @@ Windows Command Line users: You can set the environment variable by running `set
 Make sure you are using `llama.cpp` from commit [d0cee0d](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later.
 
 ```shell
-./main -ngl 35 -m jaskier-7b-dpo-v5.6-GGUF.Q4_K_M.gguf --color -c 32768 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "<|im_start|>system
+./main -ngl 35 -m jaskier-7b-dpo-v5.6.Q4_K_M.gguf --color -c 32768 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "<|im_start|>system
 {system_message}<|im_end|>
 <|im_start|>user
 {prompt}<|im_end|>
@@ -188,7 +188,7 @@ from llama_cpp import Llama
 
 # Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
 llm = Llama(
-  model_path="./jaskier-7b-dpo-v5.6-GGUF.Q4_K_M.gguf", # Download the model file first
+  model_path="./jaskier-7b-dpo-v5.6.Q4_K_M.gguf", # Download the model file first
   n_ctx=32768, # The max sequence length to use - note that longer sequence lengths require much more resources
   n_threads=8, # The number of CPU threads to use, tailor to your system and the resulting performance
   n_gpu_layers=35 # The number of layers to offload to GPU, if you have GPU acceleration available
@@ -208,7 +208,7 @@ output = llm(
 
 # Chat Completion API
 
-llm = Llama(model_path="./jaskier-7b-dpo-v5.6-GGUF.Q4_K_M.gguf", chat_format="llama-2") # Set chat_format according to the model you are using
+llm = Llama(model_path="./jaskier-7b-dpo-v5.6.Q4_K_M.gguf", chat_format="llama-2") # Set chat_format according to the model you are using
 llm.create_chat_completion(
     messages = [
         {"role": "system", "content": "You are a story writing assistant."},