CISCai committed 2ab92e1 (1 parent: dd8e76d)

Updated with llama-cpp-python example

Files changed (1):
  1. README.md +88 -0
README.md CHANGED
@@ -117,6 +117,94 @@ There is a similar option for V-cache (`-ctv`), however that is [not working yet
 
 For other parameters and how to use them, please refer to [the llama.cpp documentation](https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md)
 
+ ## How to run from Python code
+
+ You can use GGUF models from Python using the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) library.
+
+ ### How to load this model in Python code, using llama-cpp-python
+
+ For full documentation, please see: [llama-cpp-python docs](https://llama-cpp-python.readthedocs.io/en/latest/).
+
+ #### First install the package
+
+ Run one of the following commands, according to your system:
+
+ ```shell
+ # Prebuilt wheel with basic CPU support
+ pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
+ # Prebuilt wheel with NVidia CUDA acceleration (cu121 shown; use cu122 etc. to match your CUDA version)
+ pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
+ # Prebuilt wheel with Metal GPU acceleration
+ pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/metal
+ # Build base version with no GPU acceleration
+ pip install llama-cpp-python
+ # With NVidia CUDA acceleration
+ CMAKE_ARGS="-DLLAMA_CUDA=on" pip install llama-cpp-python
+ # Or with OpenBLAS acceleration
+ CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python
+ # Or with CLBlast acceleration
+ CMAKE_ARGS="-DLLAMA_CLBLAST=on" pip install llama-cpp-python
+ # Or with AMD ROCm GPU acceleration (Linux only)
+ CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python
+ # Or with Metal GPU acceleration (macOS only)
+ CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python
+ # Or with Vulkan acceleration
+ CMAKE_ARGS="-DLLAMA_VULKAN=on" pip install llama-cpp-python
+ # Or with Kompute acceleration
+ CMAKE_ARGS="-DLLAMA_KOMPUTE=on" pip install llama-cpp-python
+ # Or with SYCL acceleration
+ CMAKE_ARGS="-DLLAMA_SYCL=on -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx" pip install llama-cpp-python
+
+ # On Windows, set CMAKE_ARGS in PowerShell before installing; e.g. for NVidia CUDA:
+ $env:CMAKE_ARGS = "-DLLAMA_CUDA=on"
+ pip install llama-cpp-python
+ ```
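The install variants above differ only in which prebuilt-wheel index URL is passed to pip, or which `CMAKE_ARGS` value is exported for a source build. A minimal sketch of that mapping, using a hypothetical `pip_command` helper (not part of llama-cpp-python; backend names are illustrative):

```python
# Sketch of how the install commands above are assembled: prebuilt backends
# select a wheel index URL, source-built backends set CMAKE_ARGS instead.
# `pip_command` is a hypothetical helper, not part of llama-cpp-python.
WHEEL_INDEX = "https://abetlen.github.io/llama-cpp-python/whl/"
PREBUILT = {"cpu": "cpu", "cuda": "cu121", "metal": "metal"}
SOURCE_CMAKE_ARGS = {
    "cuda": "-DLLAMA_CUDA=on",
    "openblas": "-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS",
    "rocm": "-DLLAMA_HIPBLAS=on",
    "vulkan": "-DLLAMA_VULKAN=on",
}

def pip_command(backend: str, prebuilt: bool = True):
    """Return (argv, extra_env) for installing llama-cpp-python for a backend."""
    argv = ["pip", "install", "llama-cpp-python"]
    env = {}
    if prebuilt and backend in PREBUILT:
        # Prebuilt wheels: add the matching extra index URL.
        argv += ["--extra-index-url", WHEEL_INDEX + PREBUILT[backend]]
    elif backend in SOURCE_CMAKE_ARGS:
        # Source build: pip picks up CMAKE_ARGS from the environment.
        env["CMAKE_ARGS"] = SOURCE_CMAKE_ARGS[backend]
    return argv, env

print(pip_command("cuda"))
```

This is only the command-construction step; actually running it (e.g. via `subprocess`) is left to the reader.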
+
+ #### Simple llama-cpp-python example code
+
+ ```python
+ from llama_cpp import Llama
+
+ # Chat Completion API
+
+ llm = Llama(model_path="./gorilla-openfunctions-v2.IQ3_M.gguf", n_gpu_layers=33, n_ctx=16384)
+ print(llm.create_chat_completion(
+     messages = [
+         {
+             "role": "user",
+             "content": "What's the weather like in Oslo?"
+         }
+     ],
+     tools=[{
+         "type": "function",
+         "function": {
+             "name": "get_current_weather",
+             "description": "Get the current weather in a given location",
+             "parameters": {
+                 "type": "object",
+                 "properties": {
+                     "location": {
+                         "type": "string",
+                         "description": "The city and state, e.g. San Francisco, CA"
+                     },
+                     "unit": {
+                         "type": "string",
+                         "enum": [ "celsius", "fahrenheit" ]
+                     }
+                 },
+                 "required": [ "location" ]
+             }
+         }
+     }],
+     tool_choice={
+         "type": "function",
+         "function": {
+             "name": "get_current_weather"
+         }
+     }
+ ))
+ ```
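The `create_chat_completion` call above returns an OpenAI-style completion dict rather than plain text when the model emits a tool call. A sketch of pulling the function name and arguments out of such a response, using a hand-written sample dict in that shape (the exact fields can vary between llama-cpp-python versions):

```python
import json

# Hand-written sample response in the OpenAI-compatible shape that
# create_chat_completion returns for a tool call; the real call above
# produces a dict like this (fields may vary by version).
response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "tool_calls": [{
                "type": "function",
                "function": {
                    "name": "get_current_weather",
                    "arguments": '{"location": "Oslo, Norway", "unit": "celsius"}',
                },
            }],
        },
    }],
}

call = response["choices"][0]["message"]["tool_calls"][0]["function"]
args = json.loads(call["arguments"])  # arguments arrive as a JSON string
print(call["name"], args)
```

The decoded `args` dict can then be passed to your own weather-lookup function, and its result fed back to the model as a follow-up message.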
+
 <!-- README_GGUF.md-how-to-run end -->
 
 <!-- original-model-card start -->