
metadata

language:
  - en
license: cc-by-4.0
library_name: peft
datasets:
  - Salesforce/xlam-function-calling-60k

Model Details

This is an adapter for meta-llama/Meta-Llama-3-8B, fine-tuned for function calling on the xLAM dataset (Salesforce/xlam-function-calling-60k). The adapter is undertrained; it is intended mainly for testing the function-calling capabilities of LLMs.
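As far as we understand the dataset, each xLAM record pairs a natural-language `query` with a JSON list of available `tools` and a JSON list of expected `answers`, where each answer is an object holding a tool `name` and its `arguments`. The sketch below assumes that layout. The record, the `is_power_of_two` tool, and the `validate_calls` helper are hypothetical illustrations, not taken from the dataset or from this adapter. The helper checks that every call names a declared tool and passes only declared parameters, which is a quick sanity check for model-generated calls.

```python
import json

# Hypothetical record in the assumed xLAM layout: "tools" and "answers" are
# JSON strings. The tool definition and answers are illustrative only.
record = {
    "query": "Check if the numbers 8 and 1233 are powers of two.",
    "tools": json.dumps([
        {
            "name": "is_power_of_two",
            "description": "Checks if a number is a power of two.",
            "parameters": {"num": {"description": "The number to check.", "type": "int"}},
        }
    ]),
    "answers": json.dumps([
        {"name": "is_power_of_two", "arguments": {"num": 8}},
        {"name": "is_power_of_two", "arguments": {"num": 1233}},
    ]),
}


def validate_calls(tools: list[dict], calls: list[dict]) -> list[str]:
    """Return a list of problems found when checking calls against tool specs."""
    # Map each declared tool name to its parameter specification.
    specs = {tool["name"]: tool.get("parameters", {}) for tool in tools}
    problems = []
    for i, call in enumerate(calls):
        name = call.get("name")
        if name not in specs:
            problems.append(f"call {i}: unknown tool {name!r}")
            continue
        # Flag any argument that the tool does not declare.
        for arg in call.get("arguments", {}):
            if arg not in specs[name]:
                problems.append(f"call {i}: unexpected argument {arg!r} for {name!r}")
    return problems


tools = json.loads(record["tools"])
calls = json.loads(record["answers"])
print(validate_calls(tools, calls))  # → []
```

An empty list means every call is consistent with the declared tools; this check does not verify argument types or whether the answer is correct for the query.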

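```python
import importlib.util

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Use bf16 if the GPU supports it, and FlashAttention 2 only if flash-attn is
# also installed (install it beforehand with `pip install flash-attn`).
# Otherwise fall back to fp16 and PyTorch SDPA attention.
use_bf16 = torch.cuda.is_bf16_supported()
compute_dtype = torch.bfloat16 if use_bf16 else torch.float16
has_flash_attn = importlib.util.find_spec("flash_attn") is not None
attn_implementation = "flash_attention_2" if use_bf16 and has_flash_attn else "sdpa"

adapter = "kaitchup/Meta-Llama-3-8B-xLAM-Adapter"
model_name = "meta-llama/Meta-Llama-3-8B"  # gated model: accept its license on the Hub first
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=compute_dtype,
    device_map={"": 0},  # load the whole model on GPU 0
    attn_implementation=attn_implementation,
)

# Load the adapter weights on top of the base model
model = PeftModel.from_pretrained(model, adapter)

prompt = "<user>Check if the numbers 8 and 1233 are powers of two.</user>\n\n<tools>"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
# Greedy decoding: do_sample=False is already deterministic, so no temperature
# is passed (transformers warns when temperature is set without sampling).
outputs = model.generate(
    **inputs,
    do_sample=False,
    max_new_tokens=150,
    pad_token_id=tokenizer.eos_token_id,  # the Llama 3 tokenizer defines no pad token
)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
```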
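The card does not document the exact text the adapter generates after `<tools>`. Assuming the calls appear somewhere in `result` as a JSON list of `{"name": ..., "arguments": ...}` objects (the xLAM answer format), a minimal post-processing sketch could locate that list and execute it against local Python functions. The sample string, the `TOOLS` registry, and the `extract_calls`, `run_calls`, and `is_power_of_two` helpers are hypothetical; taking the first matching JSON list is a heuristic, not documented behavior of the adapter.

```python
import json


def extract_calls(text: str) -> list[dict]:
    """Return the first JSON list of {"name", "arguments"} objects found in text."""
    decoder = json.JSONDecoder()
    start = text.find("[")
    while start != -1:
        try:
            # Try to parse a JSON value beginning at this "[".
            value, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, list) and value and all(
            isinstance(c, dict) and "name" in c and "arguments" in c for c in value
        ):
            return value
        start = text.find("[", start + 1)
    return []


def is_power_of_two(num: int) -> bool:
    # A positive integer is a power of two iff exactly one bit is set.
    return num > 0 and (num & (num - 1)) == 0


# Hypothetical registry mapping tool names to local implementations.
TOOLS = {"is_power_of_two": is_power_of_two}


def run_calls(calls: list[dict]) -> list:
    """Execute each call with the matching local function (KeyError if unknown)."""
    return [TOOLS[c["name"]](**c["arguments"]) for c in calls]


# Illustrative model output; the text the adapter actually produces may differ.
sample = (
    "<user>Check if the numbers 8 and 1233 are powers of two.</user>\n\n<tools>...</tools>\n"
    '[{"name": "is_power_of_two", "arguments": {"num": 8}}, '
    '{"name": "is_power_of_two", "arguments": {"num": 1233}}]'
)
calls = extract_calls(sample)
print(run_calls(calls))  # → [True, False]
```

With a real generation, you would pass `result` instead of `sample`. Because the adapter is undertrained, it may emit unknown tool names or malformed JSON, so checking the extracted calls before executing them is advisable.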

Developed by: The Kaitchup
Language(s) (NLP): English
License: cc-by-4.0