Tool calls broken; Link to fix
FYI, tool call tokens are broken in the current version of the models in this repo: they are hidden from the output because they are marked as special tokens.
Example:
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>system name=<|plugin|>
[{"name": "generate_image", "description": "Generates an image based on the given text prompt", "parameters": {"type": "object", "properties": {"prompt": {"type": "string", "description": "The text prompt used to guide image generation"}}, "required": ["prompt"]}}]<|im_end|>
<|im_start|>user
Draw a picture of a kitten.<|im_end|>
<|im_start|>assistant
I will call an image generation api to generate image
{"name": "generate_image", "parameters": {"prompt": "A cute and playful kitten with big, round eyes, sitting on a fluffy pillow, in a soft, pastel color palette, impressionism style, high resolution, with a warm, cozy atmosphere."}}
Notice that the tool call tokens are missing. This is the expected output:
<|im_start|>assistant
I will call an image generation api to generate image<|action_start|><|plugin|>
{"name": "generate_image", "parameters": {"prompt": "A cute and playful kitten with big, round eyes, sitting on a fluffy pillow, in a soft, pastel color palette, impressionism style, high resolution, with a warm, cozy atmosphere."}}<|action_end|>
I have posted corrected versions here.
@apresence hi, thank you for the feedback. We'll try to fix it later.
You're welcome to look at the code and settings changes from the repo I stood up.
Let me know if I can help!
Also, if I may, I'd like to share another issue with you. It seems your model uses dynamic RoPE scaling by default. It also supports linear scaling, but when I tried that, the perplexity was quite bad. When running the model in llama.cpp, which does not support dynamic scaling and therefore has to use linear, I notice the quality is worse than with dynamic scaling in transformers. That is, it starts out great, but after a few back-and-forth interactions it begins to degrade, eventually repeating itself and forgetting information, even though the prompt is still well within the configured context length. I'd love to hear your team's insights on the issue, and any ideas about how to address it.
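For reference, this is roughly how I have been selecting the two modes on the transformers side. A minimal sketch, assuming the model follows the usual rope_scaling config convention; the repo id and scaling factor are illustrative:

from transformers import AutoConfig, AutoModelForCausalLM

repo = "internlm/internlm2_5-7b-chat"  # assumed repo id

cfg = AutoConfig.from_pretrained(repo, trust_remote_code=True)

# Dynamic NTK scaling (what the model appears to ship with by default).
cfg.rope_scaling = {"type": "dynamic", "factor": 2.0}  # illustrative factor

# Linear scaling, the only mode llama.cpp can currently emulate.
# cfg.rope_scaling = {"type": "linear", "factor": 2.0}

model = AutoModelForCausalLM.from_pretrained(repo, config=cfg, trust_remote_code=True)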
Thanks for the great model!
@apresence hi,
1. Special tokens missing issue
After verification, there seems to be a misunderstanding about "The tool call tokens are hidden from the output due to being marked as special tokens."
Special tokens are not shown by default. If you add --special to llama-cli, you can see the full output string including the special tokens. You can also check it in the log file if you add --logdir.
Command:
build/bin/llama-cli \
--model internlm2_5-7b-chat-fp16.gguf \
--predict 512 \
--ctx-size 4096 \
--gpu-layers 32 \
--temp 0.8 \
--top-p 0.8 \
--top-k 50 \
--seed 1024 \
--color \
--prompt '<|im_start|>system\nYou are a harmless AI assistant.<|im_end|>\n<|im_start|>system name=<|plugin|>[{"name": "generate_image", "description": "Generates an image based on the given text prompt", "parameters": {"type": "object", "properties": {"prompt": {"type": "string", "description": "The text prompt used to guide image generation"}}, "required": ["prompt"]}}]<|im_end|>\n' \
--interactive \
--multiline-input \
--conversation \
--verbose \
--logdir ./logdir \
--in-suffix "<|im_end|>\n<|im_start|>assistant\n" \
--special
Here is the conversation:
== Running in interactive mode. ==
- Press Ctrl+C to interject at any time.
- To return control to the AI, end your input with '\'.
- To return control without starting a new line, end your input with '/'.
<s><|im_start|>system
You are a harmless AI assistant.<|im_end|>
<|im_start|>system name=<|plugin|>[{"name": "generate_image", "description": "Generates an image based on the given text prompt", "parameters": {"type": "object", "properties": {"prompt": {"type": "string", "description": "The text prompt used to guide image generation"}}, "required": ["prompt"]}}]<|im_end|>
> <|im_start|>user\nDraw a picture of a kitten.
I will call an image generation api to generate image<|action_start|><|plugin|>[{"name": "generate_image", "parameters": {"prompt": "A cute, fluffy kitten with big round eyes, sitting on a soft cushion, warm and cozy, pastel colors, impressionism, high resolution, captured on a DSLR camera, natural lighting, detailed fur texture."}}]<|im_end|>
As you can see, the special tokens are not missing.
2. llama.cpp does not support dynamic RoPE scaling
This is true, and there is an open issue: https://github.com/ggerganov/llama.cpp/issues/8361. No response so far.
Thank you for taking the time to address this topic.
You are right, llama-cli does show the tokens when the --special flag is used. However, I originally discovered the issue with the /completion endpoint of llama-server; I just happened to use llama-cli to demonstrate it because that made it easy to provide output others could follow and verify on their own. As an interesting note, unlike the HF generate() function, I don't see a way to hide or show special tokens for llama-server, either as a command line option (as I show below, --special is ignored) or as a JSON argument in the API call itself. The only way I am aware of to change the behavior is to modify the GGUF metadata. That is exactly what I did, and the reason I posted models with those changes applied.
Let's remove llama-cli from the equation. To that end, I've written and used a little test program that calls the /completion endpoint and demonstrates the issue. Below are clips of the output for different scenarios; I can provide the script and command line parameters upon request.
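For reference, here is a minimal sketch of that kind of client. It is not my exact script; it assumes llama-server is listening on the port shown in the logs below, and it omits the plugin system block for brevity:

import re
import requests

# Chat-format prompt; the "system name=<|plugin|>" block from the tests below is omitted for brevity.
PROMPT = (
    "<|im_start|>system\nYou are InternLM2-Chat, a harmless AI assistant<|im_end|>\n"
    "<|im_start|>user\nI want to know today's weather in Shanghai<|im_end|>\n"
    "<|im_start|>assistant\n"
)
EXPECTED = re.compile(r"<\|action_start\|><\|plugin\|>.*<\|action_end\|>", re.DOTALL)

# llama-server's /completion endpoint takes a raw prompt and returns JSON with a "content" field.
resp = requests.post(
    "http://127.0.0.1:52756/completion",
    json={"prompt": PROMPT, "n_predict": 512, "temperature": 0.8, "stop": ["<|im_end|>"]},
    timeout=120,
)
content = resp.json()["content"]
print(content)
print("PASS" if EXPECTED.search(content) else "FAIL")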
For the record, this is the version of llama-server I used for these tests:
$ ./llama-server --version
version: 3368 (dd07a123)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
Without tool call fix
The tool call tokens are never included, regardless of the --special option.
With --special
[ SYS ] === TEST MODEL: internlm.internlm2_5-7b-chat-q4_k_m.gguf ===
[ SYS ] Args: ['./llama-server', '--model', 'internlm.internlm2_5-7b-chat-q4_k_m.gguf', '--host', '127.0.0.1', '--port', '52756', '--gpu_layers', '32', '--split_mode', 'none', '--special']
[ <<< ] '<|im_start|>system\nYou are InternLM2-Chat, a harmless AI assistant<|im_end|>\n<|im_start|>system name=<|plugin|>\n[\n{\n"name": "get_current_weather",\n"description": "Get the current weather in a given location",\n"parameters": {\n"type": "object",\n"properties": {\n"location": {\n"type": "string",\n"description": "The city and state, e.g. San Francisco, CA",\n},\n"unit": {"type": "string"},\n},\n"required": ["location"],\n},\n}\n]\n<|im_end|>\n<|im_start|>user\nI want to know today\'s weather in Shanghai<|im_end|>\n<|im_start|>assistant\n'
[ SYS ] Expected response pattern: '^.*<\\|action_start\\|><|plugin|>\\n{"name": "get_current_weather", "parameters": {"location": "Shanghai"}}<\\|action_end\\|><\\|im_end\\|>$'
[ >>> ] 'I need to use the get_current_weather function to get the current weather in Shanghai.\n{"name": "get_current_weather", "parameters": {"location": "Shanghai"}}\n'
[ SYS ] Overall result for 'internlm.internlm2_5-7b-chat-q4_k_m.gguf': FAIL
[ SYS ] Reason for result: Response does not match expected pattern
Without --special
[ SYS ] === TEST MODEL: internlm.internlm2_5-7b-chat-q4_k_m.gguf ===
[ SYS ] Args: ['./llama-server', '--model', 'internlm.internlm2_5-7b-chat-q4_k_m.gguf', '--host', '127.0.0.1', '--port', '52756', '--gpu_layers', '32', '--split_mode', 'none']
[ >>> ] '<|im_start|>system\nYou are InternLM2-Chat, a harmless AI assistant<|im_end|>\n<|im_start|>system name=<|plugin|>\n[\n{\n"name": "get_current_weather",\n"description": "Get the current weather in a given location",\n"parameters": {\n"type": "object",\n"properties": {\n"location": {\n"type": "string",\n"description": "The city and state, e.g. San Francisco, CA",\n},\n"unit": {"type": "string"},\n},\n"required": ["location"],\n},\n}\n]\n<|im_end|>\n<|im_start|>user\nI want to know today\'s weather in Shanghai<|im_end|>\n<|im_start|>assistant\n'
[ SYS ] Expected response pattern: '^.*<\\|action_start\\|><|plugin|>\\n{"name": "get_current_weather", "parameters": {"location": "Shanghai"}}<\\|action_end\\|><\\|im_end\\|>$'
[ <<< ] 'To fulfill your request, I need to use the \\"get_current_weather\\" function and provide the location parameter as \\"Shanghai\\". I will also specify the unit of measurement as \\"metric\\" to ensure accuracy.\n{"name": "get_current_weather", "parameters": {"location": "Shanghai", "unit": "metric"}}\n'
[ SYS ] Overall result for 'internlm.internlm2_5-7b-chat-q4_k_m.gguf': FAIL
[ SYS ] Reason for result: Response does not match expected pattern
With tool call fix
The tool call tokens are always included, regardless of the --special option.
With --special
[ SYS ] === TEST MODEL: apresence.internlm2_5-7b-chat-Q4_K_M.gguf ===
[ SYS ] Args: ['./llama-server', '--model', 'apresence.internlm2_5-7b-chat-Q4_K_M.gguf', '--host', '127.0.0.1', '--port', '52756', '--gpu_layers', '32', '--split_mode', 'none', '--special']
[ <<< ] '<|im_start|>system\nYou are InternLM2-Chat, a harmless AI assistant<|im_end|>\n<|im_start|>system name=<|plugin|>\n[\n{\n"name": "get_current_weather",\n"description": "Get the current weather in a given location",\n"parameters": {\n"type": "object",\n"properties": {\n"location": {\n"type": "string",\n"description": "The city and state, e.g. San Francisco, CA",\n},\n"unit": {"type": "string"},\n},\n"required": ["location"],\n},\n}\n]\n<|im_end|>\n<|im_start|>user\nI want to know today\'s weather in Shanghai<|im_end|>\n<|im_start|>assistant\n'
[ SYS ] Expected response pattern: '^.*<\\|action_start\\|><|plugin|>\\n{"name": "get_current_weather", "parameters": {"location": "Shanghai"}}<\\|action_end\\|><\\|im_end\\|>$'
[ >>> ] 'I need to use the get_current_weather function to get the current weather in Shanghai.<|action_start|><|plugin|>\n{"name": "get_current_weather", "parameters": {"location": "Shanghai", "unit": "metric"}}<|action_end|>\n'
[ SYS ] Test Result: PASS
[ <<< ] '<|im_start|>environment name=<|plugin|>\n{"temperature": 22}<|im_end|>\n<|im_start|>assistant\n'
[ SYS ] Expected response pattern: '^.*\\b22\\b.*$'
[ >>> ] 'The temperature is currently at 22 degrees Celsius.'
[ SYS ] Test Result: PASS
[ SYS ] Overall result for 'apresence.internlm2_5-7b-chat-Q4_K_M.gguf': PASS
Without --special
[ SYS ] === TEST MODEL: apresence.internlm2_5-7b-chat-Q4_K_M.gguf ===
[ SYS ] Args: ['./llama-server', '--model', 'apresence.internlm2_5-7b-chat-Q4_K_M.gguf', '--host', '127.0.0.1', '--port', '52756', '--gpu_layers', '32', '--split_mode', 'none']
[ >>> ] '<|im_start|>system\nYou are InternLM2-Chat, a harmless AI assistant<|im_end|>\n<|im_start|>system name=<|plugin|>\n[\n{\n"name": "get_current_weather",\n"description": "Get the current weather in a given location",\n"parameters": {\n"type": "object",\n"properties": {\n"location": {\n"type": "string",\n"description": "The city and state, e.g. San Francisco, CA",\n},\n"unit": {"type": "string"},\n},\n"required": ["location"],\n},\n}\n]\n<|im_end|>\n<|im_start|>user\nI want to know today\'s weather in Shanghai<|im_end|>\n<|im_start|>assistant\n'
[ SYS ] Expected response pattern: '^.*<\\|action_start\\|><|plugin|>\\n{"name": "get_current_weather", "parameters": {"location": "Shanghai"}}<\\|action_end\\|><\\|im_end\\|>$'
[ <<< ] 'I need to use the get_current_weather function to get the current weather in Shanghai.<|action_start|><|plugin|>\n{"name": "get_current_weather", "parameters": {"location": "Shanghai"}}<|action_end|>\n'
[ SYS ] Test Result: PASS
[ >>> ] '<|im_start|>environment name=<|plugin|>\n{"temperature": 22}<|im_end|>\n<|im_start|>assistant\n'
[ SYS ] Expected response pattern: '^.*\\b22\\b.*$'
[ >>> ] "It seems you're interested in the temperature, which is currently at 22 degrees Celsius. How can I assist you further today? Is there a specific task or information you need?"
[ SYS ] Test Result: PASS
[ SYS ] Overall result for 'apresence.internlm2_5-7b-chat-Q4_K_M.gguf': PASS
Just after I took the time to test and write all that up, I received a notification of an update for llama.cpp.
There is a fix planned: #8506.
Once the fix is published and verified to be working, I can take down the copies I set up on HF.
I'm really impressed by how the community comes together for these things. Thanks, everyone!
@apresence hi, thanks for your detailed info. llama-server with --special is fixed in this PR: https://github.com/ggerganov/llama.cpp/pull/8553
I've tested it, and it works with --system-prompt-file instead of the --prompt argument. Maybe an issue should be created on how to correctly use llama-server in llama.cpp.
- create sys-prompt.txt
echo '<|im_start|>system\nYou are InternLM2-Chat, a harmless AI assistant.<|im_end|>\n<|im_start|>system name=<|plugin|>[{"name": "get_current_weather", "parameters": {"required": ["location"], "type": "object", "properties": {"location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"}, "unit": {"type": "string"}}}, "description": "Get the current weather in a given location"}]<|im_end|>\n' >> sys-prompt.txt
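(If your shell's echo does not interpret the \n escapes, the sketch below writes the same file from Python with real newlines. Whether the server wants literal \n or real newlines is an assumption here, so treat this as an alternative, not a correction.)

# Alternative to the echo command above, assuming real newlines are wanted in the file.
tool = (
    '[{"name": "get_current_weather", "parameters": {"required": ["location"], '
    '"type": "object", "properties": {"location": {"type": "string", "description": '
    '"The city and state, e.g. San Francisco, CA"}, "unit": {"type": "string"}}}, '
    '"description": "Get the current weather in a given location"}]'
)
with open("sys-prompt.txt", "w") as f:
    f.write("<|im_start|>system\nYou are InternLM2-Chat, a harmless AI assistant.<|im_end|>\n")
    f.write("<|im_start|>system name=<|plugin|>" + tool + "<|im_end|>\n")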
- start server
CUDA_VISIBLE_DEVICES=2 build/bin/llama-server \
--model internlm2_5-7b-chat-fp16.gguf \
--predict 512 \
--ctx-size 4096 \
--gpu-layers 32 \
--temp 0.8 \
--top-p 0.8 \
--top-k 50 \
--seed 1024 \
--color \
--system-prompt-file sys-prompt.txt \
--interactive \
--multiline-input \
--conversation \
--in-suffix "<|im_end|>\n<|im_start|>assistant\n" \
--special
- call service
from openai import OpenAI

client = OpenAI(
    api_key='YOUR_API_KEY',
    base_url='http://localhost:8080/v1'
)

# Tool definition matching sys-prompt.txt above (it was missing from the original snippet).
tools = [{
    "name": "get_current_weather",
    "description": "Get the current weather in a given location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"},
            "unit": {"type": "string"},
        },
        "required": ["location"],
    },
}]

messages = [{"role": "user", "content": "<|im_start|>user\nI want to know today's weather in Shanghai"}]
model_name = client.models.list().data[0].id
response = client.chat.completions.create(
    model=model_name,
    messages=messages,
    functions=tools,
    temperature=0.8,
    top_p=0.8,
)
print(response.choices[0].message.content)
Results:
I need to use the get_current_weather API to get the weather in Shanghai.<|action_start|><|plugin|>
{"name": "get_current_weather", "parameters": {"location": "Shanghai"}}<|action_end|>
<|im_end|>
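For completeness, a small sketch of how a client might pull the tool call out of a response like the one above; the markers come from the template shown earlier, while the helper name and regex are illustrative:

import json
import re

def extract_tool_call(text):
    """Split a reply into plain text plus the JSON payload between <|action_start|><|plugin|> and <|action_end|>."""
    m = re.search(r"<\|action_start\|><\|plugin\|>\s*(.*?)<\|action_end\|>", text, re.DOTALL)
    if not m:
        return text.strip(), None
    return text[:m.start()].strip(), json.loads(m.group(1))

reply = (
    'I need to use the get_current_weather API to get the weather in Shanghai.<|action_start|><|plugin|>\n'
    '{"name": "get_current_weather", "parameters": {"location": "Shanghai"}}<|action_end|>\n'
)
print(extract_tool_call(reply))  # -> (plain text, parsed tool call dict)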