Instructions to use roshangrewal/gemma4-e4b-toolcall-v02 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Local Apps Settings
- Unsloth Studio
How to use roshangrewal/gemma4-e4b-toolcall-v02 with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for roshangrewal/gemma4-e4b-toolcall-v02 to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for roshangrewal/gemma4-e4b-toolcall-v02 to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for roshangrewal/gemma4-e4b-toolcall-v02 to start chatting
Load model with FastModel
pip install unsloth from unsloth import FastModel model, tokenizer = FastModel.from_pretrained( model_name="roshangrewal/gemma4-e4b-toolcall-v02", max_seq_length=2048, )
Gemma 4 E4B Fine-Tuned for Tool Calling — 95% accuracy, runs anywhere
Released gemma4-e4b-toolcall-v02 — a production-grade tool-calling model built on Gemma 4 E4B-it (4B params).
Highlights
- 95% on multi-tool selection (BFCL benchmark)
- 90% on parallel function calling
- 88.5% on simple function calling (BFCL official)
- Works with vLLM, Ollama, transformers, llama.cpp
- OpenAI-compatible API out of the box
- Apache 2.0 — fully commercial use
Quick Start
(APIServer pid=542852) INFO 06-16 21:56:39 [api_utils.py:339]
(APIServer pid=542852) INFO 06-16 21:56:39 [api_utils.py:339] █ █ █▄ ▄█
(APIServer pid=542852) INFO 06-16 21:56:39 [api_utils.py:339] ▄▄ ▄█ █ █ █ ▀▄▀ █ version 0.23.0
(APIServer pid=542852) INFO 06-16 21:56:39 [api_utils.py:339] █▄█▀ █ █ █ █ model roshangrewal/gemma4-e4b-toolcall-v02
(APIServer pid=542852) INFO 06-16 21:56:39 [api_utils.py:339] ▀▀ ▀▀▀▀▀ ▀▀▀▀▀ ▀ ▀
(APIServer pid=542852) INFO 06-16 21:56:39 [api_utils.py:339]
(APIServer pid=542852) INFO 06-16 21:56:39 [api_utils.py:273] non-default args: {'model_tag': 'roshangrewal/gemma4-e4b-toolcall-v02', 'enable_auto_tool_choice': True, 'tool_call_parser': 'gemma4', 'model': 'roshangrewal/gemma4-e4b-toolcall-v02'}
What it does
Given available tools and a user query, the model:
- Selects the correct tool from 12+ options (93%)
- Extracts complex parameters from natural language (100%)
- Knows when NOT to call a tool and responds directly (87.5%)
- Handles multi-turn tool chains
- Retains full conversational ability — tool-calling added on top, nothing removed
Available formats
| Format | Link | Use case |
|---|---|---|
| Full model | gemma4-e4b-toolcall-v02 | vLLM, transformers |
| LoRA adapter | gemma4-e4b-toolcall-v02-lora | Lightweight, further fine-tuning |
| GGUF Q8 | gemma4-e4b-toolcall-v02-gguf | Ollama, llama.cpp, LM Studio |
Training
- Method: QLoRA (r=64) with Unsloth, 5000 steps
- Data: 78K examples from NVIDIA Nemotron-SFT-Agentic-v2 + Glaive function-calling
- Hardware: Single NVIDIA A100 80GB GPU, ~35 hours
- Evaluation: 1000-query test dataset included in the repo for reproducibility
BFCL Submission
PR submitted to Berkeley Function Calling Leaderboard: gorilla#1344
Feedback welcome! The model card has full details including what didn't work (DPO, gradient issues with PEFT) and how we solved them.