BirdAgent: Qwen3-VL-4B agentic bird identifier (SFT cold-start)

LoRA adapter on Qwen/Qwen3-VL-4B-Instruct. This is the SFT cold-start stage of BirdAgent (see the flagship Chinzhu/BirdAgent-Qwen3VL-4B GSPO model for the full description, tools, results, and figures).

📄 Paper (under review) · 💻 Code · 📊 Benchmarks

  • Stage: SFT on code-authored blueprint tool-use trajectories. Loss is computed on assistant and tool-call tokens only; tool observations and image-pad tokens are masked out (verified before every run). Getting this mask right is load-bearing: a run that also masks the tool-call turns yields a model less agentic than the base.
  • Adapter: LoRA r=64, alpha=128, dropout=0.05 (trained bf16, adapter saved fp32); targets language-model q/k/v/o/gate/up/down_proj.
  • Result (pooled solve): 0.29 overall, already matching Sonnet-web and above every same-tool API model; the base 4B with the same tools scores 0.06.
  • Note: this is an intermediate checkpoint; the released agent is the GSPO model. Same intended use, limitations (tool ceiling, general-VLM format lock), and responsible-use terms as the flagship card.
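
The loss masking described in the Stage bullet can be sketched as follows. This is a minimal illustration, not the training code from the BirdAgent repository: the `Segment` type, the role names, the `build_labels` and `verify_mask` helpers, and the example token ids are all hypothetical. The only external convention assumed is that `-100` is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
from typing import List, NamedTuple

# -100 is the default ignore_index of torch.nn.CrossEntropyLoss.
IGNORE_INDEX = -100

# Hypothetical role tags; only these two contribute to the loss.
TRAINABLE_ROLES = {"assistant", "tool_call"}


class Segment(NamedTuple):
    """One tokenized turn of a trajectory (hypothetical structure)."""
    role: str
    token_ids: List[int]


def build_labels(segments: List[Segment], image_pad_id: int) -> List[int]:
    """Copy token ids for trainable turns; mask everything else."""
    labels: List[int] = []
    for seg in segments:
        for tok in seg.token_ids:
            if seg.role in TRAINABLE_ROLES and tok != image_pad_id:
                labels.append(tok)
            else:
                labels.append(IGNORE_INDEX)
    return labels


def verify_mask(segments: List[Segment], labels: List[int], image_pad_id: int) -> None:
    """Pre-run sanity check: raise if the mask is wrong in either direction."""
    total = sum(len(seg.token_ids) for seg in segments)
    if total != len(labels):
        raise ValueError("labels and tokens differ in length")
    pos = 0
    tool_call_supervised = False
    for seg in segments:
        for tok in seg.token_ids:
            supervised = labels[pos] != IGNORE_INDEX
            if supervised and (seg.role not in TRAINABLE_ROLES or tok == image_pad_id):
                raise ValueError(f"token at position {pos} should be masked")
            if supervised and seg.role == "tool_call":
                tool_call_supervised = True
            pos += 1
    has_tool_call = any(s.role == "tool_call" and s.token_ids for s in segments)
    if has_tool_call and not tool_call_supervised:
        raise ValueError("tool-call tokens are masked; they must carry loss")


# Toy trajectory with made-up token ids; 9 stands in for the image-pad id.
IMAGE_PAD = 9
trajectory = [
    Segment("user", [1, IMAGE_PAD, IMAGE_PAD, 2]),
    Segment("tool_call", [3, 4]),
    Segment("tool_observation", [5, 6]),
    Segment("assistant", [7, 8]),
]
labels = build_labels(trajectory, IMAGE_PAD)
verify_mask(trajectory, labels, IMAGE_PAD)
print(labels)  # [-100, -100, -100, -100, 3, 4, -100, -100, 7, 8]
```

The second check in `verify_mask` targets the failure mode named above: a mask that silently drops the tool-call turns still trains without error, so it has to be caught explicitly before the run starts.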

Citation

@inproceedings{wang2026birdagent,
  title     = {BirdAgent: A Small Vision--Language Model that Orchestrates
               Domain Tools Beats Large Models that Merely Hold Them},
  author    = {Wang, Xinzhu},
  booktitle = {Under review},
  year      = {2026}
}