MaziyarPanahi committed · Commit 8c6897e · verified · 1 Parent(s): ab7db8d

38d3509ce766513f2f506d576c12c8ee6150c9208cc80644b969f9b9a2f7d5d5

Files changed (1)
  1. README.md +8 -7
README.md CHANGED
@@ -10,6 +10,7 @@ tags:
 - GGUF
 - transformers
 - pytorch
+- safetensors
 - llama
 - text-generation
 - en
@@ -98,7 +99,7 @@ pip3 install huggingface-hub
 Then you can download any individual model file to the current directory, at high speed, with a command like this:
 
 ```shell
-huggingface-cli download MaziyarPanahi/tulu-2-dpo-13b-GGUF tulu-2-dpo-13b-GGUF.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
+huggingface-cli download MaziyarPanahi/tulu-2-dpo-13b-GGUF tulu-2-dpo-13b.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
 ```
 </details>
 <details>
@@ -121,7 +122,7 @@ pip3 install hf_transfer
 And set environment variable `HF_HUB_ENABLE_HF_TRANSFER` to `1`:
 
 ```shell
-HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download MaziyarPanahi/tulu-2-dpo-13b-GGUF tulu-2-dpo-13b-GGUF.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
+HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download MaziyarPanahi/tulu-2-dpo-13b-GGUF tulu-2-dpo-13b.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
 ```
 
 Windows Command Line users: You can set the environment variable by running `set HF_HUB_ENABLE_HF_TRANSFER=1` before the download command.
@@ -132,7 +133,7 @@ Windows Command Line users: You can set the environment variable by running `set
 Make sure you are using `llama.cpp` from commit [d0cee0d](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later.
 
 ```shell
-./main -ngl 35 -m tulu-2-dpo-13b-GGUF.Q4_K_M.gguf --color -c 32768 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "<|im_start|>system
+./main -ngl 35 -m tulu-2-dpo-13b.Q4_K_M.gguf --color -c 32768 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "<|im_start|>system
 {system_message}<|im_end|>
 <|im_start|>user
 {prompt}<|im_end|>
@@ -149,7 +150,7 @@ For other parameters and how to use them, please refer to [the llama.cpp documen
 
 ## How to run in `text-generation-webui`
 
-Further instructions can be found in the text-generation-webui documentation, here: [text-generation-webui/docs/04 ‐ Model Tab.md](https://github.com/oobabooga/text-generation-webui/blob/main/docs/04%20%E2%80%90%20Model%20Tab.md#llamacpp).
+Further instructions can be found in the text-generation-webui documentation, here: [text-generation-webui/docs/04 ‐ Model Tab.md](https://github.com/oobabooga/text-generation-webui/blob/main/docs/04%20-%20Model%20Tab.md#llamacpp).
 
 ## How to run from Python code
 
@@ -157,7 +158,7 @@ You can use GGUF models from Python using the [llama-cpp-python](https://github.
 
 ### How to load this model in Python code, using llama-cpp-python
 
-For full documentation, please see: [llama-cpp-python docs](https://abetlen.github.io/llama-cpp-python/).
+For full documentation, please see: [llama-cpp-python docs](https://github.com/abetlen/llama-cpp-python/).
 
 #### First install the package
 
@@ -189,7 +190,7 @@ from llama_cpp import Llama
 
 # Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
 llm = Llama(
-  model_path="./tulu-2-dpo-13b-GGUF.Q4_K_M.gguf", # Download the model file first
+  model_path="./tulu-2-dpo-13b.Q4_K_M.gguf", # Download the model file first
   n_ctx=32768, # The max sequence length to use - note that longer sequence lengths require much more resources
   n_threads=8, # The number of CPU threads to use, tailor to your system and the resulting performance
   n_gpu_layers=35 # The number of layers to offload to GPU, if you have GPU acceleration available
@@ -209,7 +210,7 @@ output = llm(
 
 # Chat Completion API
 
-llm = Llama(model_path="./tulu-2-dpo-13b-GGUF.Q4_K_M.gguf", chat_format="llama-2") # Set chat_format according to the model you are using
+llm = Llama(model_path="./tulu-2-dpo-13b.Q4_K_M.gguf", chat_format="llama-2") # Set chat_format according to the model you are using
 llm.create_chat_completion(
     messages = [
         {"role": "system", "content": "You are a story writing assistant."},
 