
# Invoker-13B

## Model Description

Invoker is a suite of large language models based on Llama-2, fine-tuned to plan between calling functions and responding directly. It behaves similarly to OpenAI's function-calling models: given a list of available functions, it intelligently chooses the best one to call and summarizes the function responses.

This model stands out for its ability to decide between making function calls and returning model responses directly. Fine-tuning was performed with a 4096-token sequence length on a machine with 2x A100 80GB GPUs.

For more details, refer to https://github.com/jeffrey-fong/Invoker

## Model Usage

### Prompt Format

The prompt to the model consists of a list of available functions to call, followed by the chat messages.

You must provide the list of functions to the model. All functions passed in should follow the same JSON format as OpenAI function calling. If no functions are to be passed to the model, provide `None` in the Available Functions field.
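For illustration, a function definition in this format might look like the following (the function name and parameters here are hypothetical, in the style of OpenAI's documentation):

```json
{
  "name": "get_current_weather",
  "description": "Get the current weather in a given location",
  "parameters": {
    "type": "object",
    "properties": {
      "location": {
        "type": "string",
        "description": "The city and state, e.g. San Francisco, CA"
      },
      "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
    },
    "required": ["location"]
  }
}
```

The overall prompt then takes one of the two forms below.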

Available Functions:
```json
<function1 name and description>
```
```json
<function2 name and description>
```

```
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. The assistant calls functions with appropriate input when necessary.
USER: <query>
ASSISTANT:
```

or

```
Available Functions:
None

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. The assistant calls functions with appropriate input when necessary.
USER: <query>
ASSISTANT:
```
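Assembling the prompt is plain string formatting. The `build_prompt` helper below is a minimal sketch of one way to do it; it is not part of the model's tooling:

```python
import json

SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions. "
    "The assistant calls functions with appropriate input when necessary."
)

def build_prompt(functions, user_query):
    """Assemble an Invoker prompt from OpenAI-format function dicts (or None)."""
    if functions:
        funcs = "\n".join(f"```json\n{json.dumps(fn, indent=2)}\n```" for fn in functions)
    else:
        funcs = "None"
    return f"Available Functions:\n{funcs}\n\n{SYSTEM}\nUSER: {user_query}\nASSISTANT:"

prompt = build_prompt(None, "Who wrote The Old Man and the Sea?")
```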

### Getting Started

To run inference with the model in full float16 precision, you need approximately one A100 40GB GPU or equivalent: the 13B parameters take about 26 GB in float16 (2 bytes per parameter), plus overhead for activations and the KV cache.

```python
from transformers import LlamaTokenizer, LlamaForCausalLM
import torch

model_path = "jeffrey-fong/invoker-13b"

tokenizer = LlamaTokenizer.from_pretrained(model_path, use_fast=False)
model = LlamaForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, device_map="auto")

# The prompt must follow the format described above; here no functions are provided.
prompt = """Available Functions:
None

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. The assistant calls functions with appropriate input when necessary.
USER: Who wrote The Old Man and the Sea?
ASSISTANT:"""

input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
# Greedy decoding; with do_sample=False, top_p and temperature have no effect.
output_ids = model.generate(input_ids=input_ids, max_new_tokens=512, do_sample=False, top_p=1.0, temperature=0.7)
raw_output = tokenizer.decode(output_ids[0], skip_special_tokens=True)
# Strip the echoed prompt to keep only the newly generated text.
output = raw_output[len(prompt):]
```

## Model Training

The model was trained using QLoRA, which significantly reduces the computational resources required. Training was also accelerated with DeepSpeed ZeRO Stage 2, which provides fast data parallelism.
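As a rough sketch, a QLoRA setup using the peft and bitsandbytes integrations in transformers looks like the following. The hyperparameters and target modules here are illustrative assumptions, not the actual training configuration:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# QLoRA: freeze the base weights in 4-bit and train small LoRA adapters on top.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf", quantization_config=bnb_config, device_map="auto"
)
base = prepare_model_for_kbit_training(base)

lora_config = LoraConfig(
    r=64,                # assumed rank
    lora_alpha=16,       # assumed scaling factor
    lora_dropout=0.05,   # assumed dropout
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed targets
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

Training would then be launched under DeepSpeed ZeRO Stage 2 by passing a ZeRO JSON config to the `deepspeed` argument of the transformers `TrainingArguments`.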

## Training Data

We used a variety of sources to build our training dataset. All the datasets were carefully chosen to improve both the conversational and the function-calling capabilities of the model.

### ToolBench (0830 updated)

ToolBench is an open-source, large-scale, high-quality instruction-tuning (SFT) dataset built to train LLMs with general tool-use capability. It consists of multi-turn conversations in which the assistant, presented with several candidate functions, calls one or more of them before returning its response to the user. We rigorously cleaned the data as follows (a sketch of the first rule follows the list):

- Removed all datapoints that do not end with the assistant returning a summarized response
- Cleaned datapoints that contain unnecessary repeated calls to the same function
- Changed all function names and descriptions to exclude the domain name, so that the functions feel more generic
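The first cleaning rule can be pictured with a small filter. This sketch assumes a hypothetical record layout where each datapoint is a list of chat turns; it is not the authors' actual cleaning code:

```python
# Hypothetical record layout: each datapoint is a list of chat turns.
raw_toolbench = [
    [{"role": "user", "content": "Flip a coin"},
     {"role": "assistant", "function_call": {"name": "flip_coin", "arguments": "{}"}},
     {"role": "assistant", "content": "It landed on heads."}],
    [{"role": "user", "content": "Flip a coin"},
     {"role": "assistant", "function_call": {"name": "flip_coin", "arguments": "{}"}}],  # no summary
]

def ends_with_summary(conversation):
    """Keep conversations whose final turn is a plain assistant response, not a function call."""
    last = conversation[-1]
    return last["role"] == "assistant" and "function_call" not in last

cleaned = [conv for conv in raw_toolbench if ends_with_summary(conv)]  # keeps only the first datapoint
```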

### ShareGPT-34K

ShareGPT-34K is a filtered dataset containing high-quality multi-turn conversations between a user and an assistant. Some of the assistant responses were generated by OpenAI's GPT-3.5-Turbo, while others were generated by GPT-4.

### OASST1

OASST1 is a human-generated and human-annotated assistant-style conversation corpus. We filtered it to keep only the conversations in English.

All the datasets used are released under the Apache-2.0 license. Therefore, the combined training dataset is also released under the same license.
