Make sure to enable repeat-penalty for this model (the latest llama.cpp has it disabled by default).
README.md CHANGED
````diff
@@ -95,7 +95,7 @@ Generated importance matrix file: [gorilla-openfunctions-v2.imatrix.dat](https:/
 Make sure you are using `llama.cpp` from commit [0becb22](https://github.com/ggerganov/llama.cpp/commit/0becb22ac05b6542bd9d5f2235691aa1d3d4d307) or later.
 
 ```shell
-./main -ngl 33 -m gorilla-openfunctions-v2.IQ3_M.gguf --color -c 16384 --temp 0 -p "You are an AI programming assistant, utilizing the Gorilla LLM model, developed by Gorilla LLM, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer.\n### Instruction: <<function>>{functions}\n<<question>>{prompt}\n### Response: "
+./main -ngl 33 -m gorilla-openfunctions-v2.IQ3_M.gguf --color -c 16384 --temp 0 --repeat-penalty 1.1 -p "You are an AI programming assistant, utilizing the Gorilla LLM model, developed by Gorilla LLM, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer.\n### Instruction: <<function>>{functions}\n<<question>>{prompt}\n### Response: "
 ```
 
 Change `-ngl 33` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
@@ -159,7 +159,7 @@ from llama_cpp import Llama
 
 # Chat Completion API
 
-llm = Llama(model_path="./gorilla-openfunctions-v2.IQ3_M.gguf", n_gpu_layers=33, n_ctx=16384)
+llm = Llama(model_path="./gorilla-openfunctions-v2.IQ3_M.gguf", n_gpu_layers=33, n_ctx=16384, temperature=0.0, repeat_penalty=1.1)
 print(llm.create_chat_completion(
     messages = [
         {
````
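For anyone following the Python route, here is a minimal self-contained sketch of the updated settings in use end to end. It is illustrative, not part of the commit: the example question is a placeholder, and it passes `temperature` and `repeat_penalty` per call to `create_chat_completion` (which accepts them as sampling parameters) rather than to the `Llama` constructor as the diff does.

```python
from llama_cpp import Llama

# Load the quantized model. n_gpu_layers=33 offloads all layers to the GPU;
# remove it if you don't have GPU acceleration. n_ctx=16384 matches the
# context size used in the CLI example above.
llm = Llama(
    model_path="./gorilla-openfunctions-v2.IQ3_M.gguf",
    n_gpu_layers=33,
    n_ctx=16384,
)

# Sampling parameters can be supplied per request; repeat_penalty=1.1
# re-enables the repetition penalty this commit is about.
response = llm.create_chat_completion(
    messages=[
        # Placeholder prompt -- real usage should follow the model's
        # <<function>>{functions} / <<question>>{prompt} format.
        {"role": "user", "content": "What's the weather like in Boston?"},
    ],
    temperature=0.0,
    repeat_penalty=1.1,
)

print(response["choices"][0]["message"]["content"])
```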