Fixed Tool Calls
This is a version of InternLM 2.5 7B Chat for Transformers with tool calls fixed.
Notes:
- Because the tool tokens ('<|plugin|>', '<|interpreter|>', '<|action_end|>', '<|action_start|>') were flagged as special tokens in the original model, they were being excluded from the output.
- Along the way, I also found spacing inconsistencies and variance between the FAST and SLOW tokenizers. These have all been fixed.
- A repo with fixed llama.cpp/GGUF versions is here.
Please visit the original model card for licensing details, official example code, etc.
Credits
- InternLM on HF for the original model posted here.
- InternLM on GitHub for the detailed chat format info.
General prompt format
<|im_start|>system
{system_prompt}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
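The format above can be assembled by hand as a quick sketch. Note the helper name below is my own illustration, not part of the model's API; in practice the tokenizer's built-in chat template handles this for you.

```python
# Sketch of building the InternLM2 chat format manually.
# build_prompt is a hypothetical helper for illustration only.

def build_prompt(system_prompt: str, user_prompt: str) -> str:
    """Assemble a single-turn prompt in the InternLM2 chat format."""
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_prompt}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

print(build_prompt("You are a helpful assistant.", "Hello!"))
```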
Tool call prompt example
<|im_start|>system
You are InternLM2-Chat, a harmless AI assistant<|im_end|>
<|im_start|>system name=<|plugin|>
[
{
"name": "get_current_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"unit": {"type": "string"},
},
"required": ["location"],
},
}
]
<|im_end|>
<|im_start|>user
I want to know today's weather in Shanghai<|im_end|>
<|im_start|>assistant
Sure, I will search for the weather of Shanghai.<|action_start|><|plugin|>
{"name": "get_current_weather", "parameters": {"location": "Shanghai"}}<|action_end|><|im_end|>
<|im_start|>environment name=<|plugin|>
{"temperature": 22}<|im_end|>
<|im_start|>assistant
The weather in Shanghai is 22 celsius<|im_end|>
This example comes directly from the paper as noted here and outlined further here.
Those with keen eyes will note that the example JSON contains trailing commas, which makes it invalid for most JSON parsers. In my testing, I've used properly formatted JSON without the trailing commas and haven't had any issues.
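Since the fix restores the <|action_start|>/<|action_end|> markers to the output, the tool call can be pulled out of the raw generation with a simple pattern match. This is a minimal sketch of one way to do it; the regex and helper name are my own illustration, not part of this repo.

```python
import json
import re

# Matches the tool-call payload emitted between the plugin action markers.
# The pattern assumes the fixed tokenizer now keeps these tokens in the output.
ACTION_RE = re.compile(
    r"<\|action_start\|><\|plugin\|>\s*(.*?)\s*<\|action_end\|>", re.DOTALL
)

def extract_tool_call(text: str):
    """Return the parsed tool-call dict, or None if no call is present."""
    match = ACTION_RE.search(text)
    return json.loads(match.group(1)) if match else None

output = (
    "Sure, I will search for the weather of Shanghai.<|action_start|><|plugin|>\n"
    '{"name": "get_current_weather", "parameters": {"location": "Shanghai"}}'
    "<|action_end|>"
)
call = extract_tool_call(output)
print(call["name"])  # get_current_weather
```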
Example inference script
I always spend way too much time taking the simple examples on model cards and converting them into a working program. To that end, I have written a working example inference script and included it in this repo.
Highlights:
- Supports 4-bit quantization (enabled by default)
- Includes example tool call (the LLM prompt part of it anyway)
- Has a number of options (not required)
- Use -h or --help to list all the options
- If the model files are not already downloaded, the script will automatically download them
Example Usage (streaming json output):
$ /usr/bin/python3 ./inference_example.py --format json
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████| 8/8 [00:02<00:00, 3.87it/s]
["I", " will", " call", " an", " API", " to", " generate", " an", " image", " of", " a", " k", "itten", ".", "<|action_start|>", "<|plugin|>", "\n", "{\"", "name", "\":", " \"", "generate", "_image", "\",", " \"", "parameters", "\":", " {\"", "prompt", "\":", " \"", "A", " cute", " k", "itten", " with", " big", " eyes", " and", " a", " fl", "uffy", " tail", "\"}}", "<|action_end|>", "\n"]
Support
Want to support my work? Visit my ko-fi page here: https://ko-fi.com/apresence