Model

This model is fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct using mlx-lm.

Note: The glaive-function-calling-v2 dataset contains some invalid JSON and uses single quotes around the arguments' values. I have retrained the model on cleaned-up data. If you run into problems with the function-calling JSON format, try the newer version: https://huggingface.co/mzbac/llama-3-8B-Instruct-function-calling-v0.2

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "mzbac/llama-3-8B-Instruct-function-calling"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

tool = {
    "name": "search_web",
    "description": "Perform a web search for a given search terms.",
    "parameter": {
        "type": "object",
        "properties": {
            "search_terms": {
                "type": "array",
                "items": {"type": "string"},
                "description": "The search queries for which the search is performed.",
                "required": True,
            }
        },
    },
}

messages = [
    {
        "role": "system",
        "content": f"You are a helpful assistant with access to the following functions. Use them if required - {str(tool)}",
    },
    {"role": "user", "content": "Today's news in Melbourne, just for your information, today is April 27, 2014."},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

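# Stop on either the tokenizer's EOS token or Llama 3's end-of-turn token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.1,
)
# Decode the full sequence (prompt plus completion), keeping special tokens.
response = outputs[0]
print(tokenizer.decode(response))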

# Example output (sampling is enabled, so results may vary):
#
# <|begin_of_text|><|start_header_id|>system<|end_header_id|>
#
# You are a helpful assistant with access to the following functions. Use them if required - {'name':'search_web', 'description': 'Perform a web search for a given search terms.', 'parameter': {'type': 'object', 'properties': {'search_terms': {'type': 'array', 'items': {'type':'string'}, 'description': 'The search queries for which the search is performed.','required': True}}}}<|eot_id|><|start_header_id|>user<|end_header_id|>
#
# Today's news in Melbourne, just for your information, today is April 27, 2014.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
#
# <functioncall> {"name": "search_web", "arguments": '{"search_terms": ["Melbourne news", "April 27, 2014"]}'}<|eot_id|>
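
The generated call above is not strict JSON, because the arguments value is wrapped in single quotes. The following is a minimal sketch of one way to extract the call from the decoded text. The helper name parse_function_call and its fallback regular expression are illustrative assumptions, not part of this model or of any library, and they only cover the output shape shown above.

```python
import json
import re

FUNCTION_CALL_TAG = "<functioncall>"
EOT_TOKEN = "<|eot_id|>"

# Matches {"name": "...", "arguments": '...'} where the arguments value is a
# single-quoted string wrapping a JSON object (the shape shown in the example).
_SINGLE_QUOTED_CALL = re.compile(
    r"""\{\s*"name"\s*:\s*"(?P<name>[^"]+)"\s*,\s*"arguments"\s*:\s*'(?P<args>.*)'\s*\}""",
    flags=re.DOTALL,
)


def parse_function_call(decoded):
    """Hypothetical helper: return {"name", "arguments"} or None if no call parses."""
    start = decoded.rfind(FUNCTION_CALL_TAG)
    if start == -1:
        return None
    # Keep only the text between the tag and the end-of-turn token.
    payload = decoded[start + len(FUNCTION_CALL_TAG):].split(EOT_TOKEN, 1)[0].strip()

    try:
        call = json.loads(payload)
    except json.JSONDecodeError:
        # Fall back to the single-quoted form.
        match = _SINGLE_QUOTED_CALL.fullmatch(payload)
        if match is None:
            return None
        call = {"name": match.group("name"), "arguments": match.group("args")}

    if not isinstance(call, dict):
        return None

    arguments = call.get("arguments", {})
    if isinstance(arguments, str):
        # The arguments may themselves be a JSON document encoded as a string.
        try:
            arguments = json.loads(arguments)
        except json.JSONDecodeError:
            return None
    return {"name": call.get("name"), "arguments": arguments}
```

With the script above, this could be called as parse_function_call(tokenizer.decode(response)). It returns None when the text contains no call or the arguments cannot be parsed, so the caller can treat the reply as plain text instead.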

Training hyperparameters

lora_config.yaml

# The path to the local model directory or Hugging Face repo.
model: "meta-llama/Meta-Llama-3-8B-Instruct"
# Whether or not to train (boolean)
train: true

# Directory with {train, valid, test}.jsonl files
data: "data"

# The PRNG seed
seed: 0

# Number of layers to fine-tune
lora_layers: 32

# Minibatch size.
batch_size: 1

# Iterations to train for.
iters: 6000

# Number of validation batches, -1 uses the entire validation set.
val_batches: 25

# Adam learning rate.
learning_rate: 1e-6

# Number of training steps between loss reporting.
steps_per_report: 10

# Number of training steps between validations.
steps_per_eval: 200

# Load path to resume training with the given adapter weights.
resume_adapter_file: null

# Save/load path for the trained adapter weights.
adapter_path: "adapters"

# Save the model every N iterations.
save_every: 1000

# Evaluate on the test set after training
test: false

# Number of test set batches, -1 uses the entire test set.
test_batches: 100

# Maximum sequence length.
max_seq_length: 8192

# Use gradient checkpointing to reduce memory use.
grad_checkpoint: false

# LoRA parameters can only be specified in a config file
lora_parameters:
  # The layer keys to apply LoRA to.
  # These will be applied for the last lora_layers
  keys: ['mlp.gate_proj', 'mlp.down_proj', 'self_attn.q_proj', 'mlp.up_proj', 'self_attn.o_proj','self_attn.v_proj', 'self_attn.k_proj']
  rank: 128
  alpha: 256
  scale: 10.0
  dropout: 0.05

# Schedule can only be specified in a config file, uncomment to use.
#lr_schedule:
#  name: cosine_decay
#  warmup: 100 # 0 for no warmup
#  warmup_init: 1e-7 # 0 if not specified
#  arguments: [1e-6, 1000, 1e-7] # passed to scheduler
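
To give a sense of the adapter size implied by the configuration above, here is a rough back-of-the-envelope sketch. It assumes the standard LoRA layout (two low-rank matrices per adapted linear layer, no biases) and the Llama-3-8B layer shapes from the published model config. Neither assumption is stated in this card, and the function name lora_param_count is made up for illustration.

```python
# Assumed Llama-3-8B shapes (from the published model config, not from this card).
HIDDEN = 4096          # model hidden size
INTERMEDIATE = 14336   # MLP intermediate size
KV_DIM = 1024          # 8 key/value heads x 128 head dimension

# (in_features, out_features) for each key listed under lora_parameters.keys.
PROJECTION_SHAPES = {
    "self_attn.q_proj": (HIDDEN, HIDDEN),
    "self_attn.k_proj": (HIDDEN, KV_DIM),
    "self_attn.v_proj": (HIDDEN, KV_DIM),
    "self_attn.o_proj": (HIDDEN, HIDDEN),
    "mlp.gate_proj": (HIDDEN, INTERMEDIATE),
    "mlp.up_proj": (HIDDEN, INTERMEDIATE),
    "mlp.down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank, num_layers, shapes=PROJECTION_SHAPES):
    """Count LoRA weights: each adapted layer adds rank * (in + out) parameters."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in shapes.values())
    return per_layer * num_layers


# rank and lora_layers taken from lora_config.yaml above.
total = lora_param_count(rank=128, num_layers=32)
print(f"{total:,}")  # 335,544,320
```

Under these assumptions the adapters hold about 336 million trainable weights, roughly 4% of the base model's approximately 8 billion parameters.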
