johnrachwanpruna committed
Commit b42492e
1 Parent(s): 438f966

Update README.md

Files changed (1):
  1. README.md +61 -61
README.md CHANGED
@@ -136,67 +136,67 @@ The following clients/libraries will automatically download models for you, prov
 
  You can use GGUF models from Python using the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) or [ctransformers](https://github.com/marella/ctransformers) libraries. Note that at the time of writing (Nov 27th 2023), ctransformers has not been updated for some time and is not compatible with some recent models. Therefore I recommend you use llama-cpp-python.
 
+ ### How to load this model in Python code, using llama-cpp-python
+
+ For full documentation, please see: [llama-cpp-python docs](https://abetlen.github.io/llama-cpp-python/).
+
+ #### First install the package
+
+ Run one of the following commands, according to your system:
+
+ ```shell
+ # Base llama-cpp-python with no GPU acceleration
+ pip install llama-cpp-python
+ # With NVidia CUDA acceleration
+ CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
+ # Or with OpenBLAS acceleration
+ CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python
+ # Or with CLBlast acceleration
+ CMAKE_ARGS="-DLLAMA_CLBLAST=on" pip install llama-cpp-python
+ # Or with AMD ROCm GPU acceleration (Linux only)
+ CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python
+ # Or with Metal GPU acceleration (macOS only)
+ CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python
+
+ # On Windows, set the CMAKE_ARGS variable in PowerShell before installing; e.g. for NVidia CUDA:
+ $env:CMAKE_ARGS = "-DLLAMA_CUBLAS=on"
+ pip install llama-cpp-python
+ ```
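+
+ To quickly sanity-check the install, here is a minimal sketch; it only assumes the llama-cpp-python package installed above, and the backend note refers to llama.cpp's own startup log:
+
+ ```python
+ import llama_cpp
+
+ # Print the installed package version to confirm the wheel built and imports cleanly.
+ print(llama_cpp.__version__)
+
+ # Loading any model with verbose=True (the default) makes llama.cpp print its
+ # system info, which shows whether the acceleration backend you compiled for
+ # (CUDA, Metal, OpenBLAS, ...) is actually active.
+ ```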
+
+ #### Simple llama-cpp-python example code
+
+ ```python
+ from llama_cpp import Llama
+
+ # Set n_gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
+ llm = Llama(
+   model_path="./phi-2.IQ3_M.gguf",  # Download the model file first
+   n_ctx=2048,       # The max sequence length to use - phi-2 was trained with a 2048-token context, and longer values need more resources
+   n_threads=8,      # The number of CPU threads to use, tailor to your system and the resulting performance
+   n_gpu_layers=35   # The number of layers to offload to GPU, if you have GPU acceleration available
+ )
+
+ # Simple inference example
+ output = llm(
+   "Instruct: {prompt}\nOutput:",  # Prompt in phi-2's instruct format - check the model card if unsure
+   max_tokens=512,                 # Generate up to 512 tokens
+   stop=["<|endoftext|>"],         # Example stop token - not necessarily correct for this specific model! Please check before using.
+   echo=True                       # Whether to echo the prompt
+ )
+
+ # Chat Completion API
+
+ llm = Llama(model_path="./phi-2.IQ3_M.gguf", chat_format="llama-2")  # Set chat_format according to the model you are using
+ llm.create_chat_completion(
+     messages = [
+         {"role": "system", "content": "You are a story writing assistant."},
+         {
+             "role": "user",
+             "content": "Write a story about llamas."
+         }
+     ]
+ )
+ ```
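+
+ Both calls return OpenAI-style dictionaries. A minimal sketch of reading the results back, assuming the standard llama-cpp-python return layout (the stream=True variation is optional and shown only for illustration):
+
+ ```python
+ # Completion API: the generated text is under choices[0]["text"].
+ print(output["choices"][0]["text"])
+
+ # Chat Completion API: the reply is under choices[0]["message"]["content"].
+ chat = llm.create_chat_completion(
+     messages=[{"role": "user", "content": "Write a story about llamas."}]
+ )
+ print(chat["choices"][0]["message"]["content"])
+
+ # With stream=True the call yields partial chunks rather than one dict,
+ # which lets you print tokens as they are generated.
+ for chunk in llm("Instruct: Write a haiku about llamas.\nOutput:", max_tokens=64, stream=True):
+     print(chunk["choices"][0]["text"], end="", flush=True)
+ ```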
 
  - **Option D** - Running with LangChain
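+
+ Option D is only named in this hunk; as a minimal sketch, LangChain's LlamaCpp wrapper can point at the same GGUF file. This assumes the late-2023 langchain import path (newer releases expose the class from langchain_community.llms instead), and the parameter values simply mirror the example above:
+
+ ```python
+ from langchain.llms import LlamaCpp
+
+ # Wrap the local GGUF file so it can be used anywhere LangChain expects an LLM.
+ llm = LlamaCpp(
+     model_path="./phi-2.IQ3_M.gguf",  # same local file as in the llama-cpp-python example
+     n_ctx=2048,                       # phi-2's trained context length
+     n_gpu_layers=35,                  # set to 0 on CPU-only systems
+     max_tokens=256,
+ )
+
+ print(llm("Instruct: Write a story about llamas.\nOutput:"))
+ ```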