Image-Text-to-Text
PEFT
Safetensors
English
lora
agentic
tool-use
function-calling
vision-language
bird-identification
sft
How to use Chinzhu/BirdAgent-Qwen3VL-4B-SFT with PEFT:
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, then attach the LoRA adapter weights on top of it.
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-VL-4B-Instruct")
model = PeftModel.from_pretrained(base_model, "Chinzhu/BirdAgent-Qwen3VL-4B-SFT")
BirdAgent: Qwen3-VL-4B agentic bird identifier (SFT cold-start)
LoRA adapter on Qwen/Qwen3-VL-4B-Instruct.
This is the SFT cold-start stage of BirdAgent (see the flagship Chinzhu/BirdAgent-Qwen3VL-4B GSPO model for the full description, tools, results, and figures).
Paper (under review) · Code · Benchmarks
- Stage: SFT on code-authored blueprint tool-use trajectories; loss on assistant + tool-call tokens only, with tool observations and image-pad tokens masked out (verified before every run). This masking is load-bearing: masking the tool-call turns yields a model less agentic than the base.
- Adapter: LoRA r=64, alpha=128, dropout=0.05 (trained in bf16, adapter saved in fp32); targets the language-model q/k/v/o/gate/up/down_proj modules.
- Result (pooled solve): 0.29 overall, already matching Sonnet-web and above every same-tool API model; the base 4B with the same tools scores 0.06.
- Note: this is an intermediate checkpoint; the released agent is the GSPO model. Same intended use, limitations (tool ceiling, general-VLM format lock), and responsible-use terms as the flagship card.
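The loss masking described in the Stage item can be sketched roughly as follows. This is a minimal illustration, not the BirdAgent training code: the role names (`tool_call`, `tool_observation`), the token ids, the image-pad id, and the helpers `build_labels` and `check_mask` are all hypothetical. The one fixed fact it relies on is that PyTorch's cross-entropy loss skips label `-100` by default.

```python
# Illustrative sketch only: roles, token ids, and helper names are hypothetical.
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss

# Assumption: the trajectory is already tokenized and split into role-tagged
# segments, and only these roles contribute to the loss.
SUPERVISED_ROLES = {"assistant", "tool_call"}


def build_labels(segments, image_pad_id):
    """Return (input_ids, labels); unsupervised positions get IGNORE_INDEX."""
    input_ids, labels = [], []
    for role, token_ids in segments:
        for tok in token_ids:
            input_ids.append(tok)
            keep = role in SUPERVISED_ROLES and tok != image_pad_id
            labels.append(tok if keep else IGNORE_INDEX)
    return input_ids, labels


def check_mask(segments, labels, image_pad_id):
    """Pre-run check: raise if supervision leaks or no tool call is supervised."""
    pos = 0
    tool_call_supervised = False
    for role, token_ids in segments:
        for tok in token_ids:
            supervised = labels[pos] != IGNORE_INDEX
            if supervised and (role not in SUPERVISED_ROLES or tok == image_pad_id):
                raise ValueError(f"token at position {pos} should be masked")
            if supervised and role == "tool_call":
                tool_call_supervised = True
            pos += 1
    # Assumes every training trajectory contains at least one tool call.
    if not tool_call_supervised:
        raise ValueError("no tool-call token contributes to the loss")
    return True


IMAGE_PAD = 99  # hypothetical image-pad token id
segments = [
    ("user", [1, 99, 99, 2]),            # prompt with two image-pad tokens
    ("tool_call", [10, 11]),             # supervised
    ("tool_observation", [20, 21, 22]),  # masked out
    ("assistant", [30, 31]),             # supervised
]
input_ids, labels = build_labels(segments, IMAGE_PAD)
mask_ok = check_mask(segments, labels, IMAGE_PAD)
print(labels)
# [-100, -100, -100, -100, 10, 11, -100, -100, -100, 30, 31]
```

Running a check like `check_mask` before training mirrors the "verified before every run" step: it fails loudly if tool observations or image-pad tokens receive a label, or if the tool-call turns end up masked.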
Citation
@inproceedings{wang2026birdagent,
title = {BirdAgent: A Small Vision--Language Model that Orchestrates
Domain Tools Beats Large Models that Merely Hold Them},
author = {Wang, Xinzhu},
booktitle = {Under review},
year = {2026}
}
Model tree for Chinzhu/BirdAgent-Qwen3VL-4B-SFT
Base model
Qwen/Qwen3-VL-4B-Instruct