CISCai committed 2ab92e1 (1 parent: dd8e76d)

Updated with llama-cpp-python example

Files changed (1):
  1. README.md +88 -0
README.md CHANGED
@@ -117,6 +117,94 @@ There is a similar option for V-cache (`-ctv`), however that is [not working yet
 
 For other parameters and how to use them, please refer to [the llama.cpp documentation](https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md)
 
+ ## How to run from Python code
+
+ You can use GGUF models from Python using the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) library.
+
+ ### How to load this model in Python code, using llama-cpp-python
+
+ For full documentation, please see: [llama-cpp-python docs](https://llama-cpp-python.readthedocs.io/en/latest/).
+
+ #### First install the package
+
+ Run one of the following commands, according to your system:
+
+ ```shell
+ # Prebuilt wheel with basic CPU support
+ pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
+ # Prebuilt wheel with NVidia CUDA acceleration (cu121 shown; use cu122 etc. to match your CUDA version)
+ pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
+ # Prebuilt wheel with Metal GPU acceleration
+ pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/metal
+ # Build base version with no GPU acceleration
+ pip install llama-cpp-python
+ # With NVidia CUDA acceleration
+ CMAKE_ARGS="-DLLAMA_CUDA=on" pip install llama-cpp-python
+ # Or with OpenBLAS acceleration
+ CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python
+ # Or with CLBlast acceleration
+ CMAKE_ARGS="-DLLAMA_CLBLAST=on" pip install llama-cpp-python
+ # Or with AMD ROCm GPU acceleration (Linux only)
+ CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python
+ # Or with Metal GPU acceleration (macOS only)
+ CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python
+ # Or with Vulkan acceleration
+ CMAKE_ARGS="-DLLAMA_VULKAN=on" pip install llama-cpp-python
+ # Or with Kompute acceleration
+ CMAKE_ARGS="-DLLAMA_KOMPUTE=on" pip install llama-cpp-python
+ # Or with SYCL acceleration
+ CMAKE_ARGS="-DLLAMA_SYCL=on -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx" pip install llama-cpp-python
+
+ # On Windows, set CMAKE_ARGS in PowerShell before installing; e.g. for NVidia CUDA:
+ $env:CMAKE_ARGS = "-DLLAMA_CUDA=on"
+ pip install llama-cpp-python
+ ```
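The install variants above differ only in which prebuilt-wheel index URL is passed to pip, or which `CMAKE_ARGS` value is exported for a source build. A minimal sketch of that mapping, using a hypothetical `pip_command` helper (not part of llama-cpp-python; backend names are illustrative):

```python
# Sketch of how the install commands above are assembled: prebuilt backends
# select a wheel index URL, source-built backends set CMAKE_ARGS instead.
# `pip_command` is a hypothetical helper, not part of llama-cpp-python.
WHEEL_INDEX = "https://abetlen.github.io/llama-cpp-python/whl/"
PREBUILT = {"cpu": "cpu", "cuda": "cu121", "metal": "metal"}
SOURCE_CMAKE_ARGS = {
    "cuda": "-DLLAMA_CUDA=on",
    "openblas": "-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS",
    "rocm": "-DLLAMA_HIPBLAS=on",
    "vulkan": "-DLLAMA_VULKAN=on",
}

def pip_command(backend: str, prebuilt: bool = True):
    """Return (argv, extra_env) for installing llama-cpp-python for a backend."""
    argv = ["pip", "install", "llama-cpp-python"]
    env = {}
    if prebuilt and backend in PREBUILT:
        # Prebuilt wheels: add the matching extra index URL.
        argv += ["--extra-index-url", WHEEL_INDEX + PREBUILT[backend]]
    elif backend in SOURCE_CMAKE_ARGS:
        # Source build: pip picks up CMAKE_ARGS from the environment.
        env["CMAKE_ARGS"] = SOURCE_CMAKE_ARGS[backend]
    return argv, env

print(pip_command("cuda"))
```

This is only the command-construction step; actually running it (e.g. via `subprocess`) is left to the reader.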
+
+ #### Simple llama-cpp-python example code
+
+ ```python
+ from llama_cpp import Llama
+
+ # Chat Completion API
+
+ llm = Llama(model_path="./gorilla-openfunctions-v2.IQ3_M.gguf", n_gpu_layers=33, n_ctx=16384)
+ print(llm.create_chat_completion(
+     messages = [
+         {
+             "role": "user",
+             "content": "What's the weather like in Oslo?"
+         }
+     ],
+     tools=[{
+         "type": "function",
+         "function": {
+             "name": "get_current_weather",
+             "description": "Get the current weather in a given location",
+             "parameters": {
+                 "type": "object",
+                 "properties": {
+                     "location": {
+                         "type": "string",
+                         "description": "The city and state, e.g. San Francisco, CA"
+                     },
+                     "unit": {
+                         "type": "string",
+                         "enum": [ "celsius", "fahrenheit" ]
+                     }
+                 },
+                 "required": [ "location" ]
+             }
+         }
+     }],
+     tool_choice={
+         "type": "function",
+         "function": {
+             "name": "get_current_weather"
+         }
+     }
+ ))
+ ```
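The `create_chat_completion` call above returns an OpenAI-style completion dict rather than plain text when the model emits a tool call. A sketch of pulling the function name and arguments out of such a response, using a hand-written sample dict in that shape (the exact fields can vary between llama-cpp-python versions):

```python
import json

# Hand-written sample response in the OpenAI-compatible shape that
# create_chat_completion returns for a tool call; the real call above
# produces a dict like this (fields may vary by version).
response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "tool_calls": [{
                "type": "function",
                "function": {
                    "name": "get_current_weather",
                    "arguments": '{"location": "Oslo, Norway", "unit": "celsius"}',
                },
            }],
        },
    }],
}

call = response["choices"][0]["message"]["tool_calls"][0]["function"]
args = json.loads(call["arguments"])  # arguments arrive as a JSON string
print(call["name"], args)
```

The decoded `args` dict can then be passed to your own weather-lookup function, and its result fed back to the model as a follow-up message.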
+
 <!-- README_GGUF.md-how-to-run end -->
 
 <!-- original-model-card start -->