arxiv:2606.30616

Scaling the Horizon, Not the Parameters: Reaching Trillion-Parameter Performance with a 35B Agent

Published on Jun 29

Submitted by shiyang on Jun 30

#3 Paper of the day

Intern Science


Abstract

Agents-A1, a 35B Mixture-of-Experts Agentic Model, achieves trillion-parameter-level performance through long-horizon trajectory scaling and heterogeneous agent ability scaling via a three-stage training approach involving supervised fine-tuning, domain-level teacher models, and multi-teacher distillation.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

We introduce Agents-A1, a 35B Mixture-of-Experts agentic model that reaches trillion-parameter-level performance by scaling the agent horizon. We investigate agent-horizon scaling from two perspectives: scaling long-horizon trajectories and scaling heterogeneous agent abilities. To support this goal, we build a long-horizon knowledge-action infrastructure that connects external knowledge, actions, observations, and verifier outcomes, producing agentic trajectories with an average length of 45K tokens. On top of this infrastructure, we train Agents-A1 with a three-stage recipe. First, we perform full-domain supervised fine-tuning to align the base model with broad agentic behaviors. Second, we train domain-level teacher models to capture specialized expertise in each domain. Third, we propose multi-teacher, domain-routed on-policy distillation with salient vocabulary alignment to improve the efficiency of knowledge transfer across domains, unifying six heterogeneous domains into one deployable student model. Agents-A1 achieves strong and broad performance on long-horizon agent benchmarks. Compared with 1T-parameter models such as Kimi-K2.6 and DeepSeek-V4-pro, Agents-A1 achieves leading results on SEAL-0 (56.4), IFBench (80.6), HiPhO (46.4), FrontierScience-Olympiad (79.0), and MolBench-Bind (56.8), and remains highly competitive on SciCode (44.3), HLE (47.6), and BrowseComp (75.5). We hope this work gives the community a practical path for scaling the horizon: a 35B agent that can match the performance of 1T models on long-horizon tasks.
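To make the third stage more concrete, here is a minimal Python sketch of what a domain-routed, on-policy distillation loss could look like. The abstract does not specify the objective, so several pieces are assumptions made purely for illustration: the per-token reverse KL divergence, the "salient vocabulary" defined as the union of each model's top-k tokens, and every name in the code (`routed_distill_loss`, `salient_indices`, the teacher callables). It should not be read as the authors' implementation.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: a teacher maps a token prefix to next-token logits.
Teacher = Callable[[Sequence[int]], List[float]]


def log_softmax(logits: Sequence[float], indices: Sequence[int]) -> List[float]:
    """Log-probabilities renormalized over the given vocabulary indices."""
    selected = [logits[i] for i in indices]
    peak = max(selected)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in selected))
    return [x - log_z for x in selected]


def salient_indices(teacher_logits: Sequence[float],
                    student_logits: Sequence[float], k: int) -> List[int]:
    """Assumed salience rule: union of the teacher's and student's top-k ids."""
    def top_k(logits: Sequence[float]) -> List[int]:
        order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        return order[:k]
    return sorted(set(top_k(teacher_logits)) | set(top_k(student_logits)))


def reverse_kl(student_logits: Sequence[float],
               teacher_logits: Sequence[float], k: int) -> float:
    """KL(student || teacher), restricted to the salient vocabulary subset."""
    idx = salient_indices(teacher_logits, student_logits, k)
    s = log_softmax(student_logits, idx)
    t = log_softmax(teacher_logits, idx)
    return sum(math.exp(ls) * (ls - lt) for ls, lt in zip(s, t))


def routed_distill_loss(domain: str,
                        student_step_logits: Sequence[Sequence[float]],
                        teachers: Dict[str, Teacher],
                        sampled_tokens: Sequence[int],
                        k: int = 2) -> float:
    """Mean per-token loss against the teacher routed by the task's domain.

    On-policy: `sampled_tokens` come from the student itself, and the teacher
    only scores the prefixes that the student actually generated.
    """
    teacher = teachers[domain]  # domain routing: one teacher per domain
    losses = []
    for step, s_logits in enumerate(student_step_logits):
        t_logits = teacher(sampled_tokens[:step])
        losses.append(reverse_kl(s_logits, t_logits, k))
    return sum(losses) / len(losses)
```

In this sketch, a trajectory tagged with a given domain is only ever compared against that domain's teacher, which is one plausible way a single student could absorb several specialized teachers without averaging their conflicting distributions.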


Community

sY713

Paper submitter about 18 hours ago


edited about 17 hours ago

🚀 We are excited to share Agents-A1 from the Shanghai AI Lab.

Agents-A1 is a 35B MoE agentic model designed to scale long-horizon scientific and engineering capabilities, rather than simply scaling model parameters. It learns from knowledge-action trajectories that connect reasoning, tool use, execution feedback, and verification.

🔬 Agents-A1 shows strong capabilities in scientific reasoning, research-level coding, ML engineering, and scientific tool use. In our technical report, it achieves competitive results on benchmarks such as HLE with tools, HiPhO, FrontierScience, SciCode, MLE-Bench-Lite, MatTools, and MolBench-Bind.

🛠️ We hope Agents-A1 can serve as a practical open model for the community to explore autonomous research workflows, tool-integrated scientific problem solving, and next-generation AI-for-Science agents.