chat prompt

#6 opened by apepkuss79

What is the chat prompt in plain text? Thanks!

It uses ChatML, correct?

To facilitate a versatile representation of such various tasks, we transform the data samples into the ChatML format.

We adopt a modified version of the ChatML format to enable general tool calling by introducing the β€œenvironment” role.

Pulled from the paper here: https://arxiv.org/abs/2403.17297

These are the ChatML special tokens:

"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|action_start|>",
"<|action_end|>",
"<|interpreter|>",
"<|plugin|>"
],

(from tokenizer_config.json)

I imagine standard ChatML would work, just without function calling
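For reference, here is a minimal sketch of how to dump the plain-text ChatML prompt by letting the tokenizer render it, assuming the chat_template shipped in tokenizer_config.json reflects the intended format (no tool calling involved):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "internlm/internlm2_5-7b-chat", trust_remote_code=True
)
messages = [
    {"role": "system", "content": "You are InternLM2-Chat, a harmless AI assistant"},
    {"role": "user", "content": "I want to know today's weather in Shanghai"},
]
# Render to a plain string instead of token IDs so the prompt can be inspected.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
# Expected shape (standard ChatML):
# <|im_start|>system
# ...<|im_end|>
# <|im_start|>user
# ...<|im_end|>
# <|im_start|>assistant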

I didn't see the details of how to use function calling with internlm-2.5. Where can I find it? Thanks!


Pg. 47 of the paper (screenshot attached).

InternLM2.5 GitHub repo:

InternLM2.5-Chat models have excellent tool-utilization capabilities and can work with function calls in a zero-shot manner. They also support conducting analysis by collecting information from more than 100 web pages. See more examples in the agent section.

https://github.com/InternLM/InternLM/tree/main/agent
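For what it's worth, here is my reading of the full tool-calling layout, pieced together from the paper's appendix, the agent docs, and the special tokens listed above. Treat it as a sketch rather than an official reference; in particular, the "environment" turn carrying the tool result is my interpretation and I haven't verified it end to end:

# Hypothetical round-trip prompt, written out as a Python string for clarity.
TOOL_CALL_SKETCH = """<|im_start|>system
You are InternLM2-Chat, a harmless AI assistant<|im_end|>
<|im_start|>system name=<|plugin|>
[{"name": "get_current_weather", "description": "...", "parameters": {...}}]<|im_end|>
<|im_start|>user
I want to know today's weather in Shanghai<|im_end|>
<|im_start|>assistant
I need to use the get_current_weather function.<|action_start|><|plugin|>
{"name": "get_current_weather", "parameters": {"location": "Shanghai"}}<|action_end|><|im_end|>
<|im_start|>environment name=<|plugin|>
{"temperature": "22C"}<|im_end|>
<|im_start|>assistant
"""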

FWIW, I have tested the latest llama.cpp with various GGUFs of internlm2_5-7b-chat (including the one provided by internlm), as well as HF transformers in both chat and completion modes, and I cannot get tool calls to work as described in the paper.

First, on the GGUF side, the results are entirely inconsistent: the model will often say that it does not have the ability to call whatever tools are defined, and once it has said that, I've never seen it change its mind within that session. HF transformers is more consistent. In llama.cpp I allow a different seed each time; maybe the HF implementation uses a fixed seed?

Second, focusing on HF transformers due to its consistency, the model returns the JSON part of the tool call without the surrounding <|action_start|><|plugin|>...<|action_end|> tags.

It doesn't appear to be a tokenizer issue, because those tokens decode fine:

>>> tokenizer.decode(range(92538, 92544))
'<|plugin|><|interpreter|><|action_end|><|action_start|><|im_end|><|im_start|>'
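For completeness, the reverse lookup should hand back the same IDs, which is an easy way to double-check that the special tokens really are in the vocab:

>>> tokenizer.convert_tokens_to_ids(
...     ["<|plugin|>", "<|interpreter|>", "<|action_end|>",
...      "<|action_start|>", "<|im_end|>", "<|im_start|>"]
... )
[92538, 92539, 92540, 92541, 92542, 92543]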

Tests were performed in a docker container created from the image huggingface/transformers-pytorch-gpu and run on a GeForce RTX 4090.

GitHub Example

The following uses the example prompt given in the InternLM GitHub repo.

By the way, the tool-definition "JSON" in that example isn't actually valid JSON: there are trailing commas after the last item in several objects, e.g. "required": ["location"],},}.
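Quick sanity check with Python's own json module (which rejects trailing commas), using just the offending fragment:

import json

# The trailing comma before '}' makes this invalid JSON, so this raises
# json.JSONDecodeError ("Expecting property name enclosed in double quotes").
json.loads('{"required": ["location"],}')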

Code:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model_path = "internlm/internlm2_5-7b-chat"
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, trust_remote_code=True).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
# This is a hack to stuff the plugin prompt in there without having to deal with all the (de)tokenization
meta_inst = """
You are InternLM2-Chat, a harmless AI assistant<|im_end|>
<|im_start|>system name=<|plugin|>
[
    {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA",
                },
                "unit": {"type": "string"},
            },
            "required": ["location"],
        },
    }
]
""".strip()
model = model.eval()
response, history = model.chat(
    tokenizer=tokenizer,
    query="I want to know today's weather in Shanghai",
    meta_instruction=meta_inst
)
print(response)

To be sure the prompt was exactly correct, I modified the InternLM code to output the prompt just before tokenization, and this is it:

<|im_start|>system
You are InternLM2-Chat, a harmless AI assistant<|im_end|>
<|im_start|>system name=<|plugin|>
[
    {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA",
                },
                "unit": {"type": "string"},
            },
            "required": ["location"],
        },
    }
]
<|im_end|>
<|im_start|>user
I want to know today's weather in Shanghai<|im_end|>
<|im_start|>assistant

Console log:

$ /usr/bin/python3 github-example
Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 8/8 [00:02<00:00,  3.69it/s]
I need to use the get_current_weather function to get the current weather in Shanghai.
{"name": "get_current_weather", "parameters": {"location": "Shanghai"}}

Look ma! No tags!
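To cope with that, I've been pulling the call out with something along these lines (a rough sketch; the helper name is mine, and the tag-less branch just tries the last JSON-looking line):

import json
import re

# Matches the documented tagged form of a tool call.
ACTION_RE = re.compile(
    r"<\|action_start\|><\|plugin\|>(.*?)<\|action_end\|>", re.DOTALL
)

def extract_tool_call(text):
    """Return the parsed tool-call dict, or None if no call is found.

    Handles both the documented tagged form and the bare-JSON form that
    the HF transformers path actually gives me.
    """
    match = ACTION_RE.search(text)
    candidates = [match.group(1)] if match else text.splitlines()
    for line in reversed([c.strip() for c in candidates if c.strip()]):
        if line.startswith("{"):
            try:
                return json.loads(line)
            except json.JSONDecodeError:
                continue
    return None

# Both forms yield the same dict:
# extract_tool_call('<|action_start|><|plugin|>{"name": "get_current_weather", "parameters": {"location": "Shanghai"}}<|action_end|>')
# extract_tool_call('I need to use the get_current_weather function.\n{"name": "get_current_weather", "parameters": {"location": "Shanghai"}}')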

Another Example

This one uses correctly formatted JSON.

Console log:

$ cat > kitten-example << EOF
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model_path = "internlm/internlm2_5-7b-chat"
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, trust_remote_code=True).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
# This is a hack to stuff the plugin prompt in there without having to deal with all the (de)tokenization
meta_inst = """You are a helpful assistant.<|im_end|>
<|im_start|>system name=<|plugin|>
[{"name": "generate_image", "description": "Generates an image based on the given text prompt", "parameters": {"type": "object", "properties": {"prompt": {"type": "string", "description": "The text prompt used to guide image generation"}}, "required": ["prompt"]}}]
""".strip()
model = model.eval()
response, history = model.chat(
    tokenizer=tokenizer,
    query="Draw a picture of a kitten.",
    meta_instruction=meta_inst
)
print(response)
EOF
$ /usr/bin/python3 kitten-example
Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 8/8 [00:02<00:00,  3.69it/s]
I'm calling the API function 'generate_image' with the argument 'prompt' set to 'A kitten'. This API call will generate an image of a kitten based on the provided text prompt. I believe this API call is made because the user expressed interest in seeing a picture of a kitten. By using this function, I can fulfill the user's request and provide them with a visual representation of a kitten.
{"name": "generate_image", "parameters": {"prompt": "A kitten"}}

As an update, it looks like there is some inconsistency in the tool-calling format even in InternLM's own code. Here is an excerpt from lmdeploy/model.py:

        if tools:
            tools_prompt = dict(
                role='system',
                name='plugin',  # only support internlm2
                content=json.dumps(tools, ensure_ascii=False))
            insert_index = 0
            if messages[0]['role'] == 'system':
                insert_index = 1
            messages.insert(insert_index, tools_prompt)

In this code, the format used is <|im_start|>system name=plugin\n...<|im_end|> and not <|im_start|>system name=<|plugin|>\n...<|im_end|>.
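To make the difference concrete, here is roughly what each path should produce for the tool-definition block header (sketch only; I haven't traced lmdeploy's template end to end):

# Header per the GitHub/paper example: the <|plugin|> special token follows "name=".
github_style = '<|im_start|>system name=<|plugin|>\n[... tool JSON ...]<|im_end|>\n'

# Header implied by the lmdeploy excerpt above: the literal string "plugin" follows "name=".
lmdeploy_style = '<|im_start|>system name=plugin\n[... tool JSON ...]<|im_end|>\n'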
