Model Description

This model was fine-tuned on meta-llama/Meta-Llama-3-8B-Instruct for function calling and json mode.

Usage

JSON Mode

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "hiieu/Meta-Llama-3-8B-Instruct-function-calling-json-mode"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant, answer in JSON with key \"message\""},
    {"role": "user", "content": "Who are you?"},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
# >> {"message": "I am a helpful assistant, with access to a vast amount of information. I can help you with tasks such as answering questions, providing definitions, translating text, and more. Feel free to ask me anything!"}

Function Calling

Function calling requires two step inferences, below is the example:

Step 1:

functions_metadata = [
    {
      "type": "function",
      "function": {
        "name": "get_temperature",
        "description": "get temperature of a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {
              "type": "string",
              "description": "name"
            }
          },
          "required": [
            "city"
          ]
        }
      }
    }
]

messages = [
    { "role": "system", "content": f"""You are a helpful assistant with access to the following functions: \n {str(functions_metadata)}\n\nTo use these functions respond with:\n<functioncall> {{ "name": "function_name", "arguments": {{ "arg_1": "value_1", "arg_1": "value_1", ... }} }} </functioncall>\n\nEdge cases you must handle:\n - If there are no functions that match the user request, you will respond politely that you cannot help."""},
    { "role": "user", "content": "What is the temperature in Tokyo right now?"}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
# >> <functioncall> {"name": "get_temperature", "arguments": '{"city": "Tokyo"}'} </functioncall>"""}

Step 2:

messages = [
    { "role": "system", "content": f"""You are a helpful assistant with access to the following functions: \n {str(functions_metadata)}\n\nTo use these functions respond with:\n<functioncall> {{ "name": "function_name", "arguments": {{ "arg_1": "value_1", "arg_1": "value_1", ... }} }} </functioncall>\n\nEdge cases you must handle:\n - If there are no functions that match the user request, you will respond politely that you cannot help."""},
    { "role": "user", "content": "What is the temperature in Tokyo right now?"},
    # You will get the previous prediction, extract it will the tag <functioncall>
    # execute the function and append it to the messages like below:
    { "role": "assistant", "content": """<functioncall> {"name": "get_temperature", "arguments": '{"city": "Tokyo"}'} </functioncall>"""},    
    { "role": "user", "content": """<function_response> {"temperature":30 C} </function_response>"""}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
# >> The current temperature in Tokyo is 30 degrees Celsius.

Uploaded model

  • Developed by: hiieu

This model was trained 2x faster with Unsloth and Huggingface's TRL library.

Downloads last month
257
Safetensors
Model size
8.03B params
Tensor type
BF16
Β·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for hiieu/Meta-Llama-3-8B-Instruct-function-calling-json-mode

Finetuned
(485)
this model
Adapters
1 model
Finetunes
6 models
Merges
9 models
Quantizations
5 models

Spaces using hiieu/Meta-Llama-3-8B-Instruct-function-calling-json-mode 9