Meta-Llama-3-8B-Instruct_bitsandbytes_4bit fine-tuned on Salesforce/xlam-function-calling-60k

Function-Calling Agent

LoRA Adapter Head

Parameter-Efficient Fine-Tuning (PEFT) of a 4-bit quantized Meta-Llama-3-8B-Instruct on the Salesforce/xlam-function-calling-60k dataset.

Intended uses & limitations

Demonstrates the efficacy of quantization and PEFT. Implemented as a personal project.

How to use

Install Required Libraries

!pip install transformers accelerate "bitsandbytes>0.37.0"
!pip install peft

Setup Adapter with Base Model

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load the 4-bit quantized base model and attach the LoRA adapter.
# device_map="auto" already places the weights on the available GPU(s);
# an explicit .to("cuda") is unnecessary and is not supported for
# 4-bit bitsandbytes models.
base_model = AutoModelForCausalLM.from_pretrained(
    "SwastikM/Meta-Llama-3-8B-Instruct_bitsandbytes_4bit",
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "SwastikM/Meta-Llama3-8B-Chat-Adapter")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

model.eval()
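
If you prefer to quantize the official weights yourself rather than load the pre-quantized repository, a minimal sketch using bitsandbytes NF4 follows. It assumes access to the gated meta-llama/Meta-Llama-3-8B-Instruct checkpoint; the quantization settings are assumptions, not necessarily the exact ones used to produce the pre-quantized repo.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Sketch: quantize the official weights to 4-bit NF4 at load time.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)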

Setup Template and Infer

x1 = {"role": "system", "content": """You are a APIGen Function Calling Tool. You will br provided with a user query and associated tools for answering the query.
    query (string): The query or problem statement.
    tools (array): An array of available tools that can be used to solve the query.
    Each tool is represented as an object with the following properties:
        name (string): The name of the tool.
        description (string): A brief description of what the tool does.
        parameters (object): An object representing the parameters required by the tool.
            Each parameter is represented as a key-value pair, where the key is the parameter name and the value is an object with the following properties:
                type (string): The data type of the parameter (e.g., "int", "float", "list").
                description (string): A brief description of the parameter.
                required (boolean): Indicates whether the parameter is required or optional.
    You will provide the Answer array.
        Answers array provides the specific tool and arguments used to generate each answer."""}
x2 = {"role": "user", "content": None}
x3 = {"role": "assistant", "content": None}
user_template = 'Query: {Q} Tools: {T}'
response_template = '{A}'
Q = "Where can I find live giveaways for beta access and games?"
T = """[{"name": "live_giveaways_by_type", "description": "Retrieve live giveaways from the GamerPower API based on the specified type.", "parameters": {"type": {"description": "The type of giveaways to retrieve (e.g., game, loot, beta).", "type": "str", "default": "game"}}}]"""


x2['content'] = user_template.format(Q=Q, T=T)
prompts = [x1,x2]
input_ids = tokenizer.apply_chat_template(
    prompts,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Stop generation at either the EOS token or Llama 3's end-of-turn token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators
)

# Decode only the newly generated tokens (everything after the prompt).
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
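
For this query the adapter should respond in the answer format of the xlam-function-calling-60k dataset, i.e. a JSON array naming the chosen tool and its arguments. An illustrative (not guaranteed) output:

[{"name": "live_giveaways_by_type", "arguments": {"type": "beta"}}, {"name": "live_giveaways_by_type", "arguments": {"type": "game"}}]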

Size Comparison

The table compares the VRAM required to load and to train the FP16 base model against the 4-bit bitsandbytes-quantized model with PEFT. The base-model values are taken from Hugging Face's Model Memory Calculator.

Model                Total Size   Training Using Adam
Base Model           28.21 GB     56.42 GB
4bitQuantized+PEFT   5.21 GB      13 GB
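
To measure the quantized model's footprint on your own setup, transformers models expose get_memory_footprint(), which returns the size in bytes:

# Reports the in-memory size of the loaded (quantized) model in GB.
print(f"Model footprint: {model.get_memory_footprint() / 1024**3:.2f} GB")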

Training Details

Training Data

Dataset: Salesforce/xlam-function-calling-60k

Trained on the instruction column of 20,000 randomly shuffled examples.
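
A minimal sketch of that subsetting step (the shuffle seed is an assumption):

from datasets import load_dataset

# Load the 60k-example dataset, shuffle, and keep a 20,000-row subset.
dataset = load_dataset("Salesforce/xlam-function-calling-60k", split="train")
dataset = dataset.shuffle(seed=42).select(range(20_000))  # seed is an assumption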

Training Procedure

Custom training loop written with Hugging Face Accelerate; a minimal sketch is shown after the hyperparameter list below.

Training Hyperparameters

  • Optimizer: AdamW
  • lr: 2e-5
  • decay: linear
  • batch_size: 1
  • gradient_accumulation_steps: 2
  • fp16: True
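
A minimal sketch of the Accelerate training loop under these hyperparameters. The names model and train_dataloader are assumed to already exist, and the epoch and warmup counts are illustrative, not the exact values used.

from torch.optim import AdamW
from accelerate import Accelerator
from transformers import get_linear_schedule_with_warmup

# fp16 mixed precision with 2-step gradient accumulation, as listed above.
accelerator = Accelerator(mixed_precision="fp16", gradient_accumulation_steps=2)

# Only the LoRA adapter weights require gradients under PEFT.
optimizer = AdamW((p for p in model.parameters() if p.requires_grad), lr=2e-5)

num_epochs = 1  # assumption
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=0,  # assumption
    num_training_steps=num_epochs * len(train_dataloader),
)

model, optimizer, train_dataloader, scheduler = accelerator.prepare(
    model, optimizer, train_dataloader, scheduler
)

model.train()
for epoch in range(num_epochs):
    for batch in train_dataloader:
        with accelerator.accumulate(model):  # handles gradient accumulation
            loss = model(**batch).loss
            accelerator.backward(loss)
            optimizer.step()
            scheduler.step()
            optimizer.zero_grad()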

LoraConfig

  • r: 8
  • lora_alpha: 32
  • task_type: TaskType.CAUSAL_LM
  • lora_dropout: 0.1
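
In code, this corresponds roughly to the following sketch. Target modules are left to PEFT's defaults for Llama-style models, and the prepare_model_for_kbit_training step is the usual companion for a 4-bit base model; both are assumptions rather than confirmed details of the original script.

from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training

lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    task_type=TaskType.CAUSAL_LM,
)

# Prepare the 4-bit base model for k-bit training, then attach LoRA.
base_model = prepare_model_for_kbit_training(base_model)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable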

Hardware

  • GPU: P100

Model Card Authors

Swastik Maiti
