Dataset: glaiveai/glaive-function-calling-v2
How to use roshangrewal/gemma4-e4b-toolcall-v02-lora with PEFT:
from peft import PeftModel
from transformers import AutoModelForCausalLM
base_model = AutoModelForCausalLM.from_pretrained("unsloth/gemma-4-e4b-it-unsloth-bnb-4bit")
model = PeftModel.from_pretrained(base_model, "roshangrewal/gemma4-e4b-toolcall-v02-lora")

How to use roshangrewal/gemma4-e4b-toolcall-v02-lora with Unsloth Studio:
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for roshangrewal/gemma4-e4b-toolcall-v02-lora to start chatting

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for roshangrewal/gemma4-e4b-toolcall-v02-lora to start chatting

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for roshangrewal/gemma4-e4b-toolcall-v02-lora to start chatting
pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
model_name="roshangrewal/gemma4-e4b-toolcall-v02-lora",
max_seq_length=2048,
)

QLoRA adapter for tool-calling, designed to be applied on top of google/gemma-4-E4B-it. It achieves 94.4% accuracy on 1000 diverse tool-calling queries.
For full details, see the merged model card.
from transformers import AutoProcessor, AutoModelForMultimodalLM
from peft import PeftModel
import torch
# Load base + adapter
base = AutoModelForMultimodalLM.from_pretrained(
"google/gemma-4-E4B-it",
torch_dtype=torch.bfloat16,
device_map="auto",
)
model = PeftModel.from_pretrained(base, "roshangrewal/gemma4-e4b-toolcall-v02-lora")
model.eval()
processor = AutoProcessor.from_pretrained("google/gemma-4-E4B-it")
# Define tools and query
tools = [{"type": "function", "function": {
"name": "get_weather",
"description": "Get weather for a city",
"parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}
}}]
messages = [{"role": "user", "content": "What's the weather in Mumbai?"}]
text = processor.apply_chat_template(messages, tools=tools, tokenize=False, add_generation_prompt=True)
inputs = processor(text=text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(processor.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
# <|tool_call>call:get_weather{city:<|"|>Mumbai<|"|>}<tool_call|>
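To act on the generated text, the tool call has to be turned back into a function name and arguments. The following is a minimal sketch that parses the single sample output shown above. The grammar it assumes (`call:<name>{<key>:<|"|>value<|"|>}` with string-valued arguments only) is inferred from that one example and is not a documented specification; `parse_tool_calls` is a hypothetical helper, not part of any library.

```python
import re

# Assumed format, inferred from the sample output above:
#   <|tool_call>call:NAME{KEY:<|"|>VALUE<|"|>,...}<tool_call|>
TOOL_CALL_RE = re.compile(r"<\|tool_call>call:(\w+)\{(.*?)\}<tool_call\|>", re.DOTALL)
# Only string arguments wrapped in <|"|> markers are handled in this sketch.
STRING_ARG_RE = re.compile(r'(\w+):<\|"\|>(.*?)<\|"\|>', re.DOTALL)


def parse_tool_calls(text):
    """Return a list of {"name": ..., "arguments": {...}} dicts found in text."""
    calls = []
    for name, body in TOOL_CALL_RE.findall(text):
        arguments = {key: value for key, value in STRING_ARG_RE.findall(body)}
        calls.append({"name": name, "arguments": arguments})
    return calls


sample = '<|tool_call>call:get_weather{city:<|"|>Mumbai<|"|>}<tool_call|>'
print(parse_tool_calls(sample))
# [{'name': 'get_weather', 'arguments': {'city': 'Mumbai'}}]
```

Numeric, boolean, or nested arguments would need additional handling, since their serialization is not shown here.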
Adapter and training configuration:

| Parameter | Value |
|---|---|
| LoRA rank | 64 |
| LoRA alpha | 128 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Trainable params | 169M (2.08% of 8.1B) |
| Adapter size | ~679 MB |
| Training | 5000 steps, 78K examples, ~85 hours, Unsloth |
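As a rough illustration of what these settings imply, the sketch below collects the rank, alpha, and target modules under the keyword names that PEFT's `LoraConfig` uses (`r`, `lora_alpha`, `target_modules`) and derives two quantities from them. The scaling factor assumes the standard LoRA convention of `lora_alpha / r`; dropout, bias, and other options are not stated in the table and are left out. The trainable fraction is recomputed from the rounded figures, so it only approximates the 2.08% reported above.

```python
# Values taken from the table above; key names follow PEFT's LoraConfig arguments,
# so this dict could be passed as LoraConfig(**lora_kwargs) if peft is installed.
lora_kwargs = {
    "r": 64,
    "lora_alpha": 128,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}

# Standard LoRA scales the low-rank update by alpha / r (an assumption here;
# rank-stabilized LoRA would use a different factor).
scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]

# Fraction of trainable parameters, from the rounded counts in the table.
trainable_fraction = 169e6 / 8.1e9

print(f"LoRA scaling factor: {scaling}")
print(f"Trainable fraction: {trainable_fraction:.2%}")
```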
Accuracy by benchmark category:

| Category | Accuracy |
|---|---|
| Multiple | 95.0% |
| Parallel | 90.0% |
| Simple Python | 88.5% |
| Parallel Multiple | 86.0% |
| Live Simple | 79.8% |
| Non-Live Average | 86.5% |
Accuracy on the 1000 diverse tool-calling queries, by category:

| Category | Accuracy |
|---|---|
| Simple | 100% |
| Complex Params | 100% |
| Many Tools (12+) | 93% |
| Ambiguous | 91.5% |
| No-Tool-Needed | 87.5% |
| OVERALL | 94.4% |
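A quick arithmetic check on the table above: the unweighted mean of the five category accuracies reproduces the overall figure. That is consistent with the categories being equally sized, though the per-category query counts are not given here, so equal sizing is an assumption.

```python
# Category accuracies (percent) copied from the table above.
category_accuracy = {
    "Simple": 100.0,
    "Complex Params": 100.0,
    "Many Tools (12+)": 93.0,
    "Ambiguous": 91.5,
    "No-Tool-Needed": 87.5,
}

# Unweighted mean; matches OVERALL only if every category has the same
# number of queries (assumed, not stated in the card).
unweighted_mean = sum(category_accuracy.values()) / len(category_accuracy)
print(round(unweighted_mean, 1))  # 94.4
```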