| --- |
| license: llama3 |
| datasets: |
| - mzbac/glaive-function-calling-v2-llama-3-format |
| language: |
| - en |
| --- |
| |
| # Model |
|
|
| This model is fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct for function calling, using LoRA via mlx-lm. |
|
|
| **Note:** The glaive-function-calling-v2 dataset contains some invalid JSON, and some argument values use single quotes. I have retrained the model on cleaned-up data. If you run into problems with the function-calling JSON format, try the newer version: https://huggingface.co/mzbac/llama-3-8B-Instruct-function-calling-v0.2 |
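
The kind of malformed arguments described in the note can be detected, and sometimes recovered, before they reach your function. The following is a minimal sketch, not part of this model or of mlx-lm: `load_arguments` is a hypothetical helper, and the sample strings are made up for illustration. It tries strict JSON first and falls back to Python literal syntax, which accepts single quotes.

```python
import ast
import json


def load_arguments(raw):
    """Parse an arguments string into a dict, or return None if it cannot be parsed.

    Hypothetical helper: strict JSON is tried first, then Python literal
    syntax as a fallback for single-quoted keys and values.
    """
    for parser in (json.loads, ast.literal_eval):
        try:
            value = parser(raw)
        except (ValueError, SyntaxError):
            # json.JSONDecodeError is a subclass of ValueError.
            continue
        if isinstance(value, dict):
            return value
    return None


# Made-up sample strings, not taken from the dataset.
print(load_arguments('{"city": "Melbourne"}'))   # valid JSON
print(load_arguments("{'city': 'Melbourne'}"))   # single quotes
print(load_arguments("{city: Melbourne}"))       # not recoverable
```

The fallback will not rescue strings that mix single quotes with JSON literals such as `true` or `null`; those still return `None` in this sketch.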
| ## Usage |
|
|
| ```python |
| from transformers import AutoTokenizer, AutoModelForCausalLM |
| import torch |
| |
| model_id = "mzbac/llama-3-8B-Instruct-function-calling" |
| tokenizer = AutoTokenizer.from_pretrained(model_id) |
| model = AutoModelForCausalLM.from_pretrained( |
|     model_id, |
|     torch_dtype=torch.bfloat16, |
|     device_map="auto", |
| ) |
| |
| tool = { |
|     "name": "search_web", |
|     "description": "Perform a web search for a given search terms.", |
|     "parameter": { |
|         "type": "object", |
|         "properties": { |
|             "search_terms": { |
|                 "type": "array", |
|                 "items": {"type": "string"}, |
|                 "description": "The search queries for which the search is performed.", |
|                 "required": True, |
|             } |
|         } |
|     }, |
| } |
| |
| messages = [ |
|     { |
|         "role": "system", |
|         "content": f"You are a helpful assistant with access to the following functions. Use them if required - {str(tool)}", |
|     }, |
|     {"role": "user", "content": "Today's news in Melbourne, just for your information, today is April 27, 2014."}, |
| ] |
| |
| input_ids = tokenizer.apply_chat_template( |
|     messages, |
|     add_generation_prompt=True, |
|     return_tensors="pt" |
| ).to(model.device) |
| |
| terminators = [ |
|     tokenizer.eos_token_id, |
|     tokenizer.convert_tokens_to_ids("<|eot_id|>") |
| ] |
| |
| outputs = model.generate( |
|     input_ids, |
|     max_new_tokens=256, |
|     eos_token_id=terminators, |
|     do_sample=True, |
|     temperature=0.1, |
| ) |
| response = outputs[0] |
| print(tokenizer.decode(response)) |
| |
| # <|begin_of_text|><|start_header_id|>system<|end_header_id|> |
| |
| # You are a helpful assistant with access to the following functions. Use them if required - {'name':'search_web', 'description': 'Perform a web search for a given search terms.', 'parameter': {'type': 'object', 'properties': {'search_terms': {'type': 'array', 'items': {'type':'string'}, 'description': 'The search queries for which the search is performed.','required': True}}}}<|eot_id|><|start_header_id|>user<|end_header_id|> |
| |
| # Today's news in Melbourne, just for your information, today is April 27, 2014.<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
| |
| # <functioncall> {"name": "search_web", "arguments": '{"search_terms": ["Melbourne news", "April 27, 2014"]}'}<|eot_id|> |
| ``` |
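
In the sample output above, the `arguments` value is a JSON object wrapped in single quotes, so the payload after `<functioncall>` is not valid JSON as a whole. One possible way to turn it into a function name and an arguments dict is sketched below. `parse_function_call` is a hypothetical helper, not part of transformers or of this model, and the regular expression assumes the output follows exactly the shape shown above.

```python
import json
import re

# Matches the shape shown in the sample output:
#   <functioncall> {"name": "...", "arguments": '{...}'}
# The single-quoted arguments string is captured separately so that it can
# be parsed as JSON on its own.
_CALL_RE = re.compile(
    r'<functioncall>\s*\{\s*"name":\s*"(?P<name>[^"]+)"\s*,\s*'
    r'"arguments":\s*\'(?P<args>.*)\'\s*\}',
    re.DOTALL,
)


def parse_function_call(text):
    """Return (name, arguments) from generated text, or None if no call is found."""
    match = _CALL_RE.search(text)
    if match is None:
        return None
    try:
        arguments = json.loads(match.group("args"))
    except json.JSONDecodeError:
        return None
    return match.group("name"), arguments


# The assistant turn from the sample output above.
sample = (
    '<functioncall> {"name": "search_web", "arguments": '
    '\'{"search_terms": ["Melbourne news", "April 27, 2014"]}\'}<|eot_id|>'
)
print(parse_function_call(sample))
```

To feed only the newly generated text into such a parser, the prompt tokens can be sliced off before decoding, for example with `tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)`.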
| ## Training hyperparameters |
|
| `lora_config.yaml`: |
| ```yaml |
| # The path to the local model directory or Hugging Face repo. |
| model: "meta-llama/Meta-Llama-3-8B-Instruct" |
| # Whether or not to train (boolean) |
| train: true |
| |
| # Directory with {train, valid, test}.jsonl files |
| data: "data" |
| |
| # The PRNG seed |
| seed: 0 |
| |
| # Number of layers to fine-tune |
| lora_layers: 32 |
|
|
| # Minibatch size. |
| batch_size: 1 |
| |
| # Iterations to train for. |
| iters: 6000 |
| |
| # Number of validation batches, -1 uses the entire validation set. |
| val_batches: 25 |
|
|
| # Adam learning rate. |
| learning_rate: 1e-6 |
| |
| # Number of training steps between loss reporting. |
| steps_per_report: 10 |
| |
| # Number of training steps between validations. |
| steps_per_eval: 200 |
| |
| # Load path to resume training with the given adapter weights. |
| resume_adapter_file: null |
| |
| # Save/load path for the trained adapter weights. |
| adapter_path: "adapters" |
|
|
| # Save the model every N iterations. |
| save_every: 1000 |
| |
| # Evaluate on the test set after training |
| test: false |
| |
| # Number of test set batches, -1 uses the entire test set. |
| test_batches: 100 |
|
|
| # Maximum sequence length. |
| max_seq_length: 8192 |
|
|
| # Use gradient checkpointing to reduce memory use. |
| grad_checkpoint: false |
| |
| # LoRA parameters can only be specified in a config file |
| lora_parameters: |
|   # The layer keys to apply LoRA to. |
|   # These will be applied for the last lora_layers |
|   keys: ['mlp.gate_proj', 'mlp.down_proj', 'self_attn.q_proj', 'mlp.up_proj', 'self_attn.o_proj', 'self_attn.v_proj', 'self_attn.k_proj'] |
|   rank: 128 |
|   alpha: 256 |
|   scale: 10.0 |
|   dropout: 0.05 |
|
|
| # Schedule can only be specified in a config file, uncomment to use. |
| #lr_schedule: |
| #  name: cosine_decay |
| #  warmup: 100 # 0 for no warmup |
| #  warmup_init: 1e-7 # 0 if not specified |
| #  arguments: [1e-6, 1000, 1e-7] # passed to scheduler |
| ``` |
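
The `lr_schedule` block is commented out, so this run used the constant `learning_rate` of 1e-6. To see what the optional schedule would do if enabled, here is a plain-Python approximation. It assumes that warmup is a linear ramp from `warmup_init` to the first argument, and that `arguments` are read as (initial rate, decay steps, final rate) for a cosine decay that starts after warmup. `lr_at` is a hypothetical function written for illustration, not the mlx-lm implementation.

```python
import math


def lr_at(step, warmup=100, warmup_init=1e-7, init=1e-6, decay_steps=1000, end=1e-7):
    """Approximate learning rate at a given step (assumed semantics, see above)."""
    if step < warmup:
        # Linear ramp from warmup_init up to the initial rate.
        return warmup_init + (init - warmup_init) * step / warmup
    # Cosine decay from init down to end, then held constant at end.
    progress = min(step - warmup, decay_steps) / decay_steps
    return end + 0.5 * (init - end) * (1 + math.cos(math.pi * progress))


for step in (0, 50, 100, 600, 1100, 6000):
    print(step, lr_at(step))
```

Under these assumptions the rate would reach its floor after 1100 steps and stay there for the remaining iterations of a 6000-iteration run.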