ToolGrad
[ACL 2026 Finding] ToolGrad: Efficient Tool-use Dataset Generation with Textual "Gradients". GitHub Repo: https://github.com/zhongyi-zhou/toolgrad
ToolGrad 12B is a fine-tuned version of google/gemma-3-12b-it optimized for function calling and tool-use tasks. It is trained on the dataset generated using the method described in our paper ToolGrad: Efficient Tool-use Dataset Generation with Textual "Gradients" (ACL 2026 Finding). The codebase is available at our GitHub Repository.
ToolGrad 12B is intended for single-turn tool-use tasks.
Evaluated on the Berkeley Function Calling Leaderboard (BFCL) v1 & v2:
| Model | Non-live (Overall) | Live (Overall) | Halluc. (Rel.) | Halluc. (Irrel.) |
|---|---|---|---|---|
| Gemma-3 12B | 79.44% | 74.24% | 70.29% | 93.75% |
| ToolGrad 12B | 87.81% ↑ | 78.46% ↑ | 93.75% | 59.27% |
| Model | Non-live Simple | Non-live Multi | Non-live Par | Non-live MultiPar | Live Simple | Live Multi | Live Par | Live MultiPar |
|---|---|---|---|---|---|---|---|---|
| Gemma-3 12B | 76.25% | 94.00% | 91.00% | 56.50% | 85.66% | 71.89% | 87.50% | 45.83% |
| ToolGrad 12B | 75.25% ↓ | 94.00% ↑ | 93.50% ↑ | 88.50% ↑ | 85.66% ↑ | 77.11% ↑ | 75.00% ↓ | 62.50% ↑ |
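As a reading aid, the per-category differences between the two models can be computed directly from the scores in the table above. The following is a minimal sketch, not part of the ToolGrad codebase: the names `CATEGORIES`, `GEMMA_3_12B`, `TOOLGRAD_12B`, and `point_deltas` are illustrative, and the numbers are copied from the table as reported.

```python
# Scores (in percent) copied from the per-category BFCL table above.
# All names here are illustrative and not part of the ToolGrad codebase.
CATEGORIES = [
    "non-live simple", "non-live multi", "non-live par", "non-live multipar",
    "live simple", "live multi", "live par", "live multipar",
]
GEMMA_3_12B = [76.25, 94.00, 91.00, 56.50, 85.66, 71.89, 87.50, 45.83]
TOOLGRAD_12B = [75.25, 94.00, 93.50, 88.50, 85.66, 77.11, 75.00, 62.50]


def point_deltas(base, tuned):
    """Return tuned - base in percentage points, rounded to 2 decimals."""
    return [round(t - b, 2) for b, t in zip(base, tuned)]


# Map each category name to its change relative to the base model.
deltas = dict(zip(CATEGORIES, point_deltas(GEMMA_3_12B, TOOLGRAD_12B)))

for name, delta in deltas.items():
    print(f"{name:<18} {delta:+.2f} pp")
```

Computed this way, the largest gains are in the MultiPar categories, while the live Par category shows the largest drop.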
You can load this model using the transformers library:
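```python
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "zhongyi-zhou/toolgrad-12b"

# Load the processor that ships with the checkpoint.
processor = AutoProcessor.from_pretrained(model_id)

# Load the weights in bfloat16 and let device_map="auto" place them on the available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```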
If you find this work helpful, please cite our paper:
```bibtex
@misc{zhou2026toolgradefficienttoolusedataset,
  title={ToolGrad: Efficient Tool-use Dataset Generation with Textual "Gradients"},
  author={Zhongyi Zhou and Kohei Uehara and Haoyu Zhang and Jingtao Zhou and Lin Gu and Ruofei Du and Zheng Xu and Tatsuya Harada},
  year={2026},
  eprint={2508.04086},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2508.04086},
}
```