CISCai committed
Commit a2a7102 · verified · 1 parent: 0e8e952

Make sure to enable repeat-penalty for this model (latest llama.cpp has it disabled by default).

Files changed (1): README.md (+2 −2)
README.md CHANGED
@@ -95,7 +95,7 @@ Generated importance matrix file: [gorilla-openfunctions-v2.imatrix.dat](https:/
 Make sure you are using `llama.cpp` from commit [0becb22](https://github.com/ggerganov/llama.cpp/commit/0becb22ac05b6542bd9d5f2235691aa1d3d4d307) or later.
 
 ```shell
-./main -ngl 33 -m gorilla-openfunctions-v2.IQ3_M.gguf --color -c 16384 --temp 0 -p "You are an AI programming assistant, utilizing the Gorilla LLM model, developed by Gorilla LLM, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer.\n### Instruction: <<function>>{functions}\n<<question>>{prompt}\n### Response: "
+./main -ngl 33 -m gorilla-openfunctions-v2.IQ3_M.gguf --color -c 16384 --temp 0 --repeat-penalty 1.1 -p "You are an AI programming assistant, utilizing the Gorilla LLM model, developed by Gorilla LLM, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer.\n### Instruction: <<function>>{functions}\n<<question>>{prompt}\n### Response: "
 ```
 
 Change `-ngl 33` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
@@ -159,7 +159,7 @@ from llama_cpp import Llama
 
 # Chat Completion API
 
-llm = Llama(model_path="./gorilla-openfunctions-v2.IQ3_M.gguf", n_gpu_layers=33, n_ctx=16384)
+llm = Llama(model_path="./gorilla-openfunctions-v2.IQ3_M.gguf", n_gpu_layers=33, n_ctx=16384, temperature=0.0, repeat_penalty=1.1)
 print(llm.create_chat_completion(
     messages = [
       {
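For anyone adapting the Python snippet: sampling settings can also be supplied per request rather than at construction time. Below is a minimal sketch, assuming llama-cpp-python's `create_chat_completion`, which accepts `temperature` and `repeat_penalty` keyword arguments; the message body is a placeholder, since the README's full example is truncated in this diff.

```python
from llama_cpp import Llama

# Load the quantized model; n_gpu_layers mirrors the CLI's -ngl 33.
llm = Llama(
    model_path="./gorilla-openfunctions-v2.IQ3_M.gguf",
    n_gpu_layers=33,
    n_ctx=16384,
)

# Pass the sampling settings with the request itself;
# repeat_penalty=1.1 mirrors the CLI's --repeat-penalty 1.1 above.
response = llm.create_chat_completion(
    messages=[
        # Placeholder message; the README's full example is truncated in this diff.
        {"role": "user", "content": "What does os.path.join do?"}
    ],
    temperature=0.0,
    repeat_penalty=1.1,
)
print(response["choices"][0]["message"]["content"])
```

Passing the penalty per call keeps one loaded model reusable across requests with different sampling settings.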