Make sure to enable repeat-penalty for this model (the latest llama.cpp has it disabled by default).
README.md CHANGED
````diff
@@ -95,7 +95,7 @@ Generated importance matrix file: [gorilla-openfunctions-v2.imatrix.dat](https:/
 Make sure you are using `llama.cpp` from commit [0becb22](https://github.com/ggerganov/llama.cpp/commit/0becb22ac05b6542bd9d5f2235691aa1d3d4d307) or later.
 
 ```shell
-./main -ngl 33 -m gorilla-openfunctions-v2.IQ3_M.gguf --color -c 16384 --temp 0 -p "You are an AI programming assistant, utilizing the Gorilla LLM model, developed by Gorilla LLM, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer.\n### Instruction: <<function>>{functions}\n<<question>>{prompt}\n### Response: "
+./main -ngl 33 -m gorilla-openfunctions-v2.IQ3_M.gguf --color -c 16384 --temp 0 --repeat-penalty 1.1 -p "You are an AI programming assistant, utilizing the Gorilla LLM model, developed by Gorilla LLM, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer.\n### Instruction: <<function>>{functions}\n<<question>>{prompt}\n### Response: "
 ```
 
 Change `-ngl 33` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
@@ -159,7 +159,7 @@ from llama_cpp import Llama
 
 # Chat Completion API
 
-llm = Llama(model_path="./gorilla-openfunctions-v2.IQ3_M.gguf", n_gpu_layers=33, n_ctx=16384)
+llm = Llama(model_path="./gorilla-openfunctions-v2.IQ3_M.gguf", n_gpu_layers=33, n_ctx=16384, temperature=0.0, repeat_penalty=1.1)
 print(llm.create_chat_completion(
     messages = [
         {
````
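For anyone following the Python route, here is a minimal self-contained sketch of the updated settings in use end to end. It is illustrative, not part of the commit: the example question is a placeholder, and it passes `temperature` and `repeat_penalty` per call to `create_chat_completion` (which accepts them as sampling parameters) rather than to the `Llama` constructor as the diff does.

```python
from llama_cpp import Llama

# Load the quantized model. n_gpu_layers=33 offloads all layers to the GPU;
# remove it if you don't have GPU acceleration. n_ctx=16384 matches the
# context size used in the CLI example above.
llm = Llama(
    model_path="./gorilla-openfunctions-v2.IQ3_M.gguf",
    n_gpu_layers=33,
    n_ctx=16384,
)

# Sampling parameters can be supplied per request; repeat_penalty=1.1
# re-enables the repetition penalty this commit is about.
response = llm.create_chat_completion(
    messages=[
        # Placeholder prompt -- real usage should follow the model's
        # <<function>>{functions} / <<question>>{prompt} format.
        {"role": "user", "content": "What's the weather like in Boston?"},
    ],
    temperature=0.0,
    repeat_penalty=1.1,
)

print(response["choices"][0]["message"]["content"])
```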