Notes on fLlama model performance

#1
by RonanMcGovern - opened

This model will be improved as the underlying fine-tuning dataset improves and expands.

The 7B model already does well on simple functions, whereas the 13B model can capture more complexity (e.g. more elaborate data structures as function arguments).

Using the "test" split of the Function Calling Extended dataset, I'm scoring the performance as:

  • 7B = 5 out of 7 (see below for an example I consider a failure).
  • 13B = 6 out of 7.

Note that the "test" split was not used for training.
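
For reference, here is a rough sketch of how this kind of scoring could be automated, assuming the Trelis/function_calling_extended test split; the column names "prompt" and "response" are assumptions about the dataset schema, and generate_fn stands in for whichever model wrapper is under test:

import json
from datasets import load_dataset

# Hypothetical sketch: the column names below are assumptions, not
# confirmed against the actual dataset schema.
test = load_dataset("Trelis/function_calling_extended", split="test")

def score(generate_fn):
    correct = 0
    for row in test:
        try:
            call = json.loads(generate_fn(row["prompt"]).replace("</s>", ""))
            reference = json.loads(row["response"])
        except json.JSONDecodeError:
            continue  # malformed JSON counts as a failure
        # Correct if the right function is named with the same argument keys.
        if (call.get("function") == reference.get("function")
                and set(call.get("arguments", {})) == set(reference.get("arguments", {}))):
            correct += 1
    return correct, len(test)

The grading in this thread appears to be done by eye (the 7B failure below is an argument type error rather than a wrong function), so a key-level check like this is only a rough proxy.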

Sample fLlama-13B prompt (7B does well on this too)

Prompt:
[INST] <<SYS>>
You are a helpful research assistant. The following functions are available for you to fetch further data to answer user questions, if relevant:

{
    "function": "save_chat",
    "description": "This function saves content from current chat to a user's files. Use this function if the user asks to save this chat or content from this chat. If the user doesn't specify a name or description, use the content of the chat to determine a suitable chat name and description.",
    "arguments": [
        {
            "name": "fileName",
            "type": "string",
            "description": "A suitable name for the file where the chat history will be saved."
        },
        {
            "name": "fileDescription",
            "type": "string",
            "description": "A brief description of the chat history"
        },
        {
            "name": "fileContent",
            "type": "string",
            "description": "Content to save. Format the content in markdown. Do not include system messages"
        }
    ]
}

{
    "function": "delete_file",
    "description": "This function provides the ability to delete files in a user's folder. It doesn't delete 'files.json'. It updates 'files.json' to remove entries of deleted files.",
    "arguments": [
        {
            "name": "fileNames",
            "type": "array",
            "items": {
                "type": "string"
            },
            "description": "An array of file names to be deleted"
        }
    ]
}
                

To call a function, respond - immediately and only - with a JSON object of the following format:
{
    "function": "function_name",
    "arguments": {
        "argument1": value1,
        "argument2": value2
    }
}
<</SYS>>

There are one and two string kites. The two string ones are easier to control, although you can get the cords tangled. The one-stringed ones are sometimes used for kite fights, and you lose the kite and have to run after it if the string breaks. 

 Save this discussion of kites as KiteDiscussion. [/INST]

Generated Assistant Response:
 {
    "function": "save_chat",
    "arguments": {
        "fileName": "KiteDiscussion",
        "fileDescription": "A discussion about kites and their different types",
        "fileContent": "--- **KiteDiscussion** Kites are a popular toy for children and adults alike. There are many different types of kites, including one-stringed and two-stringed kites. One-stringed kites are easier to control, but two-stringed kites are more stable and easier to fly. Two-stringed kites are also easier to control, but they can be more difficult to maneuver. Kite fights are a popular activity, but you risk losing your kite if the string breaks. ---"
    }
}</s>

Correct Assistant Response:
{
    "function": "save_chat",
    "arguments": {
        "fileName": "KiteDiscussion",
        "fileDescription": "Notes on one and two stringed kites",
        "fileContent": "--- **Types of Kite** There are one and two string kites. The two string ones are easier to control, although you can get the cords tangled. The one-stringed ones are sometimes used for kite fights, and you lose the kite and have to run after it if the string breaks. ---"
    }
}
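
As an aside, here is a minimal sketch of how a caller might extract such a response on the client side; the </s> stripping mirrors the raw generations shown above, and the fallback behaviour is an assumption:

import json

def parse_function_call(raw_output: str):
    """Parse a response of the form {"function": ..., "arguments": {...}}.

    Returns (function_name, arguments), or None if the model answered
    in prose rather than with a function call.
    """
    # Strip the end-of-sequence token visible in the raw generations above.
    text = raw_output.replace("</s>", "").strip()
    try:
        call = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(call, dict) or "function" not in call:
        return None
    return call["function"], call.get("arguments", {})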

fLlama-7B performance on a tricky prompt

Prompt:
[INST] <<SYS>>
You are a helpful research assistant. The following functions are available for you to fetch further data to answer user questions, if relevant:

{
    "function": "delete_file",
    "description": "This function provides the ability to delete files in a user's folder. It doesn't delete 'files.json'. It updates 'files.json' to remove entries of deleted files.",
    "arguments": [
        {
            "name": "fileNames",
            "type": "array",
            "items": {
                "type": "string"
            },
            "description": "An array of file names to be deleted"
        }
    ]
}

{
    "function": "save_chat",
    "description": "This function saves content from current chat to a user's files. Use this function if the user asks to save this chat or content from this chat. If the user doesn't specify a name or description, use the content of the chat to determine a suitable chat name and description.",
    "arguments": [
        {
            "name": "fileName",
            "type": "string",
            "description": "A suitable name for the file where the chat history will be saved."
        },
        {
            "name": "fileDescription",
            "type": "string",
            "description": "A brief description of the chat history"
        },
        {
            "name": "fileContent",
            "type": "string",
            "description": "Content to save. Format the content in markdown. Do not include system messages"
        }
    ]
}
                

To call a function, respond - immediately and only - with a JSON object of the following format:
{
    "function": "function_name",
    "arguments": {
        "argument1": value1,
        "argument2": value2
    }
}
<</SYS>>

Irish stew is typically made using beef, onions and potatoes. Start by frying some onions in some butter in a pan. Add in and brown some cubes of beer - very lightly. Add salt, pepper and some herbs. Then add the sliced potatoes on top. Simmer over a medium heat for 15 minutes and then leave to simmer on low heat for another hour. 

 Save this Irish stew recipe as irish_stew_recipes. [/INST]

Generated Assistant Response:
 {
    "function": "save_chat",
    "arguments": {
        "fileName": "irish_stew_recipes",
        "fileDescription": "Irish stew recipe",
        "fileContent": [
            "Beef, onions and potatoes",
            "Fry onions in butter",
            "Add cubes of beef and brown lightly",
            "Add salt, pepper and herbs",
            "Add sliced potatoes and simmer for 15 minutes",
            "Leave to simmer on low heat for another hour"
        ]
    }
}</s>

Correct Assistant Response:
{
    "function": "save_chat",
    "arguments": {
        "fileName": "irish_stew_recipes",
        "fileDescription": "Some recipes for Irish stew",
        "fileContent": "--- **Irish Stew Recipe** Irish stew is typically made using beef, onions and potatoes. Start by frying some onions in some butter in a pan. Add in and brown some cubes of beer - very lightly. Add salt, pepper and some herbs. Then add the sliced potatoes on top. Simmer over a medium heat for 15 minutes and then leave to simmer on low heat for another hour. \n\n Save this Irish stew recipe as irish_stew_recipes. ---"
    }
}
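
The failure above is a type error: fileContent is declared as a string in the schema but came back as an array. Here is a small sketch of a check that would catch this kind of mistake; the type mapping is a simplifying assumption for illustration:

# Map the type names used in the function definitions above to Python types.
JSON_TYPES = {"string": str, "array": list, "number": (int, float), "boolean": bool}

def validate_arguments(schema: dict, arguments: dict) -> list:
    """Return type errors for a generated call, checked against its schema."""
    errors = []
    for arg in schema["arguments"]:
        name, expected = arg["name"], JSON_TYPES[arg["type"]]
        if name in arguments and not isinstance(arguments[name], expected):
            errors.append(f"{name}: expected {arg['type']}, got {type(arguments[name]).__name__}")
    return errors

# For the 7B generation above this returns:
# ["fileContent: expected string, got list"]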

fLlama-13B performance on that same prompt above (not 100% perfect, but very good):

Prompt: (identical to the prompt above)

Generated Assistant Response:
{
    "function": "save_chat",
    "arguments": {
        "fileName": "irish_stew_recipes",
        "fileDescription": "Irish stew recipe",
        "fileContent": "--- Irish stew recipe Irish stew is typically made using beef, onions and potatoes. Start by frying some onions in some butter in a pan. Add in and brown some cubes of beer - very lightly. Add salt, pepper and some herbs. Then add the sliced potatoes on top. Simmer over a medium heat for 15 minutes and then leave to simmer on low heat for another hour. ---"
    }
}</s>

Correct Assistant Response:
{
    "function": "save_chat",
    "arguments": {
        "fileName": "irish_stew_recipes",
        "fileDescription": "Some recipes for Irish stew",
        "fileContent": "--- **Irish Stew Recipe** Irish stew is typically made using beef, onions and potatoes. Start by frying some onions in some butter in a pan. Add in and brown some cubes of beer - very lightly. Add salt, pepper and some herbs. Then add the sliced potatoes on top. Simmer over a medium heat for 15 minutes and then leave to simmer on low heat for another hour. \n\n Save this Irish stew recipe as irish_stew_recipes. ---"
    }
}

This looks cool, but may I ask how to use this in chat-ui? I don't know how to generate the special tokens like [INST] <<SYS>>; just copy-pasting the data doesn't seem to work.

Howdy @aisensiy.

I'm not familiar with chat-ui, but am willing to learn. How do you typically run it - in a private space? What hardware do you choose? I can try to replicate and debug.

Your best bet for trying out fLlama is probably Jupyter fLlama, which you can run for free in Colab. Scroll down and uncomment the function-calling model. BTW, actually executing the function calls is not yet implemented in that notebook (but will be soon).
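
For anyone who wants to wire that up themselves in the meantime, here is a minimal sketch of dispatching a parsed call to local Python implementations; the save_chat and delete_file bodies below are hypothetical placeholders, not the notebook's actual code:

import json
from pathlib import Path

# Hypothetical local implementations of the two functions defined in the
# system prompts above; the 'files.json' bookkeeping described there is
# omitted for brevity.
def save_chat(fileName: str, fileDescription: str, fileContent: str):
    Path(f"{fileName}.md").write_text(fileContent)

def delete_file(fileNames: list):
    for name in fileNames:
        Path(name).unlink(missing_ok=True)

DISPATCH = {"save_chat": save_chat, "delete_file": delete_file}

def execute_call(raw_output: str):
    call = json.loads(raw_output.replace("</s>", ""))
    fn = DISPATCH.get(call["function"])
    if fn is None:
        raise ValueError(f"Unknown function: {call['function']}")
    return fn(**call["arguments"])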

@RonanMcGovern

I am using chat-ui with tgi.

  1. Run tgi with model-id=Trelis/Llama-2-7b-chat-hf-function-calling; this gives me an LLM backend.
  2. Run chat-ui following the README of the repository.
  3. Follow the instructions at https://github.com/huggingface/chat-ui#custom-models and add a config like this to .env.local:
  {
    "name": "ToolLLaMA-2-7b",
    "datasetName": "ToolBench/ToolLLaMA-2-7b",
    "description": "A good alternative to ChatGPT",
    "endpoints": [{"url": "https://localhost:8080/generate_stream"}],
    "userMessageToken": "[INST]",
    "assistantMessageToken": "[/INST]",
    "messageEndToken": "</s>",
    "preprompt": "[INST]<<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.<</SYS>>\n\n[/INST]",
    "promptExamples": [
      {
        "title": "Write an email from bullet list",
        "prompt": "As a restaurant owner, write a professional email to the supplier to get these products every week: \n\n- Wine (x10)\n- Eggs (x24)\n- Bread (x12)"
      }, {
        "title": "Code a snake game",
        "prompt": "Code a basic snake game in python, give explanations for each step."
      }, {
        "title": "Assist in a task",
        "prompt": "How do I make a delicious lemon cheesecake?"
      }
    ],
    "parameters": {
      "temperature": 0.3,
      "top_p": 0.95,
      "repetition_penalty": 1.2,
      "top_k": 50,
      "truncate": 2000,
      "max_new_tokens": 2048
    }
  },
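
To sanity-check step 1 independently of chat-ui, one can hit the tgi backend directly. A minimal sketch, assuming tgi's standard /generate route on localhost:8080; the sampling parameters are illustrative:

import requests

prompt = (
    "[INST] <<SYS>>\n"
    "...function definitions as in the prompts above...\n"
    "<</SYS>>\n\n"
    "Save this Irish stew recipe as irish_stew_recipes. [/INST]"
)

response = requests.post(
    "http://127.0.0.1:8080/generate",
    json={"inputs": prompt, "parameters": {"temperature": 0.01, "max_new_tokens": 512}},
    timeout=60,
)
print(response.json()["generated_text"])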

BTW, just pasting the same Irish stew prompt as above doesn't work with the Trelis/Llama-2-7b-chat-hf-function-calling model. But if I paste it into the meta-llama/Llama-2-70b-chat-hf model, the result looks like this:

{
    "function": "save_chat",
    "arguments": {
        "fileName": "irish_stew_recipes",
        "fileDescription": "A simple recipe for traditional Irish stew",
        "fileContent": "Irish stew is typically made using beef, onions, and potatoes.\nStart by frying some onions in some butter in a pan. Add in and brown some cubes of beef - very lightly. Add salt, pepper, and some herbs. Then add the sliced potatoes on top. Simmer over a medium heat for 15 minutes and then leave to simmer on low heat for another hour."
    }
}

Sooo fancy...

Howdy, thanks - really interesting how Llama-70B can just zero-shot this with good prompting.

A few things:

  1. Google Colab:
  • I've got Bing search working now; you can try it out here. Let me know any feedback.
  2. tgi + chat-ui:
  • I don't have a GPU myself, and tgi doesn't work with ggml, so I can't run it on my Mac.
  • I've requested GPU access from Azure and AWS so that I can spin up a server to replicate what you're seeing.
  3. Prompting - I reviewed your config above. Some slight suggestions:
  • There's a quirk whereby Llama-2-chat expects the first user input to be wrapped together within the system message. This is hard to do with the model syntax provided, and I need to think more about whether there is a hack to get around this limitation (see the sketch below).
  • The system message should be in the preprompt. Try putting the system prompt in the preprompt rather than the prompt. Then put the short user message asking for the file in the prompt.

I'm not sure if that is the issue with 7B but it may help.
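
To make that quirk concrete, here is a sketch of the canonical Llama-2-chat layout, matching the prompts earlier in this thread, where the system message and the first user turn share a single [INST] block:

def build_llama2_prompt(system: str, first_user_message: str) -> str:
    # Llama-2-chat expects the system message and the FIRST user message
    # to be wrapped together inside one [INST] ... [/INST] block, exactly
    # as in the prompts shown earlier in this thread.
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{first_user_message} [/INST]"

# With separate userMessageToken/preprompt settings, chat-ui instead emits
# the system text and the user turn in separate [INST] blocks, which is
# the mismatch described above.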

What kind of GPU would you like to use? I have an A6000 (48GB memory) from a public cloud, with JupyterLab as the entry point (no private data in it), and I can give you temporary access... but only for about one week. I am not sure if it would be useful for you.

Thanks! I appreciate that. I should get Amazon or Azure access to a GPU soon, so I'll write back here once I've got tgi and chat-ui running.

Howdy @aisensiy, I just got up and running with chat-ui.

Here is my .env.local MODELS config:

# 'name', 'userMessageToken', 'assistantMessageToken' are required
MODELS=`[
{
        "name": "Trelis/Llama-2-7b-chat-hf-function-calling",
        "datasetName": "Trelis/function_calling_extended",
        "description": "function calling Llama-7B-chat",
        "websiteUrl": "https://research.Trelis.com",
        "userMessageToken": "[INST]",
        "assistantMessageToken": "[/INST]",
        "messageEndToken": "</s>",
        "preprompt": "[INST]<<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, >
        "parameters": {
                "temperature": 0.01,
                "top_p": 0.95,
                "repetition_penalty": 1.2,
                "top_k": 50,
                "truncate": 1000,
                "max_new_tokens": 1024
        },
        "endpoints": [{
                "url": "http://127.0.0.1:8080"
        }]
}
]`

and here's what that gave:

[screenshot: chat-ui output with this config]

I then tried providing no preprompt (which actually allows a syntactically correct way of prompting Llama, because the system message and the first user message can then be wrapped together inside the same [INST] tokens):

# 'name', 'userMessageToken', 'assistantMessageToken' are required
MODELS=`[
{
        "name": "Trelis/Llama-2-7b-chat-hf-function-calling",
        "datasetName": "Trelis/function_calling_extended",
        "description": "function calling Llama-7B-chat",
        "websiteUrl": "https://research.Trelis.com",
        "userMessageToken": "[INST]",
        "assistantMessageToken": "[/INST]",
        "parameters": {
                "temperature": 0.01,
                "top_p": 0.95,
                "repetition_penalty": 1.2,
                "top_k": 50,
                "truncate": 1000,
                "max_new_tokens": 1024
        },
        "endpoints": [{
                "url": "http://127.0.0.1:8080"
        }]
}
]`

[screenshot: chat-ui output with no preprompt]

Oddly, it's as though there is no preprompt being fed in. I tested out various preprompts and I can't seem to affect the model. I'm probably missing something...

BTW, the raw model (not via chat-ui) responds correctly:

[screenshot: the raw model's correct response]

I have the same issue...

Ok, thanks, I've posted an issue here: https://github.com/huggingface/chat-ui/issues/412

BTW, the above issue has been resolved, @aisensiy. The way to prompt with chat-ui is to set up a customPromptTemplate. See the issue above.

However, there remains an issue with how input is being fed to tgi and/or how tgi is tokenizing. New issue here

Ok @aisensiy, I've now resolved this issue.

Here is the GitHub repo for setup.

And here is a video showing it working with llama-2-7b-chat-hf-function-calling-v2 (note that we've now moved to v2).

RonanMcGovern changed discussion status to closed

That is so great. I will try it again. Thanks!
