Qwen-Inno-35B-v1

Qwen-Inno-35B-v1 is an educational-agent model post-trained on Qwen3.6-35B-A3B via LoRA. It is designed to serve as the backbone for Inno Agent, an open-source personal learning agent with layered memory, educational post-training, and local deployment.

Model Details

| Attribute | Value |
| --- | --- |
| Base model | Qwen3.6-35B-A3B |
| Parameters | 35B total, ~3B active (MoE) |
| Training method | LoRA-based supervised fine-tuning |
| Context window | 262,144 tokens |
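
As a rough reading of the parameter figures, the sketch below computes the share of parameters active per token and a weight-only memory estimate. The 2 bytes per parameter figure assumes BF16 weights, the helper names are illustrative, and the estimate ignores KV cache and activation memory, so treat it as a lower bound rather than a deployment requirement.

```python
TOTAL_PARAMS = 35e9   # "35B total" from the model details
ACTIVE_PARAMS = 3e9   # "~3B active" per token; approximate in the source


def active_fraction(total: float, active: float) -> float:
    """Share of parameters used for each token in the MoE forward pass."""
    return active / total


def weight_memory_gb(params: float, bytes_per_param: float = 2.0) -> float:
    """Weight-only memory in GB (10^9 bytes). Assumes 2 bytes/param (BF16)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    print(f"active fraction: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")
    print(f"weights only: {weight_memory_gb(TOTAL_PARAMS):.0f} GB")
```

All 35B parameters still have to be resident in memory even though only about 3B are used per token, which is why the memory estimate uses the total count while per-token compute scales with the active count.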

Training Data: Three-Stream Mixture

The training mixture combines three complementary supervision sources:

Stream 1: Educational Data

Extracted from papers and learning materials, structured into examples for:

  • Concept explanation
  • Misconception diagnosis
  • Hinting and scaffolding
  • Exercise generation
  • Answer feedback
  • Learning-plan construction
  • Spaced review

This stream teaches the model how to teach — emphasizing pedagogical intent, difficulty calibration, and learner-facing clarity rather than only answer correctness.

Stream 2: General Chain-of-Thought (distilled from Claude Opus)

High-level reasoning data that preserves transferable capabilities:

  • Decomposing hard questions
  • Following constraints
  • Writing code
  • Solving general benchmark tasks

This prevents the model from becoming a narrow tutoring template while keeping it competent on non-educational tasks that arise during learning sessions (math derivations, programming exercises, tool-oriented problem solving).

Stream 3: De-identified Inno Agent Trajectories

Real system traces capturing behavior unique to the Inno Agent tool surface:

  • Reading the learner profile and compact context pack
  • Archiving materials into the L2 wiki
  • Querying maintained wiki pages
  • Creating Practice Lab workspaces
  • Interpreting terminal run outputs
  • Scheduling review jobs

These trajectories teach both the educational decision policy and the concrete action policy needed by the deployed agent.
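
One way to picture the three-stream mixture is as weighted sampling over three example pools. The sketch below is illustrative only: the stream keys, the `sample_mixture` helper, and especially the mixing weights are hypothetical, since this card does not state the actual proportions or the sampling procedure used in training.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical weights: the card names the streams but not their proportions.
STREAM_WEIGHTS: Dict[str, float] = {
    "educational": 0.4,
    "general_cot": 0.3,
    "agent_trajectories": 0.3,
}


def sample_mixture(
    streams: Dict[str, List[str]],
    weights: Dict[str, float],
    n: int,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Draw n (stream_name, example) pairs, choosing the stream by weight."""
    rng = random.Random(seed)  # local RNG so results are reproducible
    names = list(streams)
    probs = [weights[name] for name in names]
    batch = []
    for _ in range(n):
        name = rng.choices(names, weights=probs, k=1)[0]
        batch.append((name, rng.choice(streams[name])))
    return batch


if __name__ == "__main__":
    pools = {
        "educational": ["explain a concept", "diagnose a misconception"],
        "general_cot": ["solve a coding task"],
        "agent_trajectories": ["archive to wiki", "schedule a review"],
    }
    for name, example in sample_mixture(pools, STREAM_WEIGHTS, n=5):
        print(name, "->", example)
```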

Design Goal

The goal is not to maximize benchmark scores by adding more reasoning tokens. Instead, the objective is to obtain an educational-agent model that:

  1. Remains close to the base model on general capability
  2. Improves on education-oriented evaluation signals
  3. Produces shorter reasoning traces when long deliberation is unnecessary

Benchmark Results

| Benchmark | Qwen3.6-35B-A3B | Qwen-Inno-35B-v1 |
| --- | --- | --- |
| MMLU-Pro | 85.2 | 81.0 |
| MMLU-Redux | 93.3 | 90.6 |
| IF-Eval | 92.4 | 92.2 |
| IF-bench | 65.0 | 65.7 |
| AIME25 | 83.3 | 83.3 |
| MMMU | 81.7 | 79.8 |
| MMMU-Pro | 75.3 | 81.0 |
| RealWorldQA | 85.3 | 80.3 |
| MMBench-EN | 92.8 | 91.6 |
| OCRBench | 90.0 | 88.4 |
| edu-paper-QA | 87.4 | 90.4 |

The post-trained model keeps a comparable overall capability profile while shifting toward educational behavior. It improves on MMMU-Pro (+5.7), edu-paper-QA (+3.0), and IF-bench (+0.7), and is unchanged on AIME25. It regresses on the remaining seven benchmarks: RealWorldQA (−5.0), MMLU-Pro (−4.2), MMLU-Redux (−2.7), MMMU (−1.9), OCRBench (−1.6), MMBench-EN (−1.2), and IF-Eval (−0.2).
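
The per-benchmark differences can be recomputed directly from the table. The short script below copies the scores from the table and groups benchmarks by the sign of the change; the `SCORES` dictionary and function names are just illustrative scaffolding.

```python
# (base, post-trained) scores copied from the benchmark table.
SCORES = {
    "MMLU-Pro": (85.2, 81.0),
    "MMLU-Redux": (93.3, 90.6),
    "IF-Eval": (92.4, 92.2),
    "IF-bench": (65.0, 65.7),
    "AIME25": (83.3, 83.3),
    "MMMU": (81.7, 79.8),
    "MMMU-Pro": (75.3, 81.0),
    "RealWorldQA": (85.3, 80.3),
    "MMBench-EN": (92.8, 91.6),
    "OCRBench": (90.0, 88.4),
    "edu-paper-QA": (87.4, 90.4),
}


def deltas(scores):
    """Post-trained minus base, rounded to one decimal like the table."""
    return {name: round(tuned - base, 1) for name, (base, tuned) in scores.items()}


def split_by_sign(changes):
    """Group benchmark names into improved, unchanged, and regressed."""
    improved = [n for n, d in changes.items() if d > 0]
    unchanged = [n for n, d in changes.items() if d == 0]
    regressed = [n for n, d in changes.items() if d < 0]
    return improved, unchanged, regressed


if __name__ == "__main__":
    changes = deltas(SCORES)
    for name, delta in sorted(changes.items(), key=lambda item: item[1]):
        print(f"{name:<14}{delta:+.1f}")
```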

Note on edu-paper-QA: this is an internal test set built from educational papers, used here as a private education-oriented evaluation signal. It has not yet been publicly released.

Reasoning Length and Efficiency

For deployment, decoding cost matters as much as final accuracy. We compared the median output length and the length of the explicit think segment of the base model and Qwen-Inno-35B-v1 on AIME, MMLU-Pro, HumanEval, and IFBench.

| Dataset | Median output length change |
| --- | --- |
| AIME | −31.8% |
| MMLU-Pro | −55.1% |
| HumanEval | −72.2% |
| IFBench | longer (regression) |

Since most generated tokens live in the think segment rather than the final answer, this reduction translates directly into:

  • Lower decoding cost
  • Shorter user-visible latency
  • Better fit for local or organizational deployment

IFBench exception: Qwen-Inno-35B-v1 reasons longer and has more max-token truncations on instruction-heavy prompts. This suggests targeted filtering or preference optimization is needed so the model learns when to stop deliberating.
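
A measurement of this kind could be reproduced along the following lines. This is a sketch under stated assumptions, not the evaluation script behind the numbers above: it assumes the think segment is wrapped in `<think>...</think>` tags, and it counts whitespace-separated words as a crude stand-in for tokenizer tokens.

```python
import re
from statistics import median
from typing import List, Tuple

# Assumption: reasoning is delimited by <think>...</think> in the raw output.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_think(output: str) -> Tuple[str, str]:
    """Return (think_text, answer_text); think_text is empty if no tags exist."""
    match = THINK_RE.search(output)
    if match is None:
        return "", output.strip()
    answer = (output[:match.start()] + output[match.end():]).strip()
    return match.group(1).strip(), answer


def rough_length(text: str) -> int:
    """Whitespace word count; a real run would use the model tokenizer."""
    return len(text.split())


def median_change_pct(base_lengths: List[int], tuned_lengths: List[int]) -> float:
    """Relative change of the median length, in percent (negative = shorter)."""
    base = median(base_lengths)
    return (median(tuned_lengths) - base) / base * 100.0


if __name__ == "__main__":
    base_outputs = ["<think>a b c d e f</think> 42", "<think>a b c d</think> 7"]
    tuned_outputs = ["<think>a b c</think> 42", "<think>a b</think> 7"]
    base = [rough_length(split_think(o)[0]) for o in base_outputs]
    tuned = [rough_length(split_think(o)[0]) for o in tuned_outputs]
    print(f"think-segment median change: {median_change_pct(base, tuned):+.1f}%")
```

Tracking the think segment separately from the final answer makes it easier to spot cases like IFBench, where the reasoning grows even though the answer itself may not.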

Intended Use

Qwen-Inno-35B-v1 is intended as the backbone of the Inno Agent runtime, where it benefits from external scaffolding:

  • L1 learner profile — durable goals, knowledge states, misconceptions, preferences
  • L2 native wiki — ingested learning materials as browsable pages
  • L3 session records — recent dialogue and tool calls
  • Compact context pack — short, decision-ready learner summary injected per turn
  • Tool surface — learner tools, wiki tools, scheduler, document parser, Practice Lab

A small model does not need to hold the learner's entire history and knowledge in context. Inno Agent's system memory, tools, and context pack provide external structure, so the model can deliver high-quality personalized teaching with far fewer tokens.
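
To make the idea of a compact context pack concrete, here is a minimal sketch of how a durable learner profile and recent session turns might be condensed into a short per-turn prefix. The class, field names, and output format are hypothetical and only mirror the bullets above; they are not the Inno Agent schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LearnerProfile:
    """Stand-in for the L1 learner profile; all field names are hypothetical."""
    goals: List[str] = field(default_factory=list)
    knowledge_states: Dict[str, str] = field(default_factory=dict)
    misconceptions: List[str] = field(default_factory=list)
    preferences: List[str] = field(default_factory=list)


def build_context_pack(profile: LearnerProfile, recent_turns: List[str],
                       max_items: int = 3) -> str:
    """Condense the profile and the latest session turns into a short summary."""
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    known = [f"{topic}={state}" for topic, state in profile.knowledge_states.items()]
    lines = [
        "Goals: " + "; ".join(profile.goals[:max_items]),
        "Knowledge: " + "; ".join(known[:max_items]),
        "Misconceptions: " + "; ".join(profile.misconceptions[:max_items]),
        "Preferences: " + "; ".join(profile.preferences[:max_items]),
        # Only the most recent turns are kept; older ones stay in L3 storage.
        "Recent: " + " | ".join(recent_turns[-max_items:]),
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    profile = LearnerProfile(
        goals=["pass linear algebra"],
        knowledge_states={"eigenvalues": "shaky"},
        misconceptions=["det(A+B) = det(A) + det(B)"],
        preferences=["worked examples"],
    )
    print(build_context_pack(profile, ["asked about eigenvectors"]))
```

Capping every field at `max_items` keeps the pack size bounded regardless of how long the learner's history grows, which is the property the paragraph above relies on.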

Suitable for

  • Personal learning assistants
  • Privacy-sensitive local deployment (school clusters, personal GPUs, organizational servers)
  • Low-latency turn-by-turn tutoring
  • Educational tool-using agents

Not intended for

  • General software-engineering coding-agent workloads (the base model is a better choice)
  • Multi-tenant or group-chat customer-service systems
  • Standalone benchmark maximization without system scaffolding

Limitations

  • This is a preliminary post-training run. RL optimization and learning-outcome studies remain future work.
  • Benchmark changes are not uniform: seven general benchmarks (MMLU-Pro, MMLU-Redux, IF-Eval, MMMU, RealWorldQA, MMBenchEN-DEV-v1.1, OCRBench) regress, by 0.2 to 5.0 points.
  • IFBench reasoning-length regression indicates instruction-following deliberation control is not yet stable.
  • Educational behavior depends on the surrounding Inno Agent memory and tool surface; standalone use will lose the personalization advantages.
  • The edu-paper-QA evaluation is internal and not yet publicly reproducible.

Training Configuration

| Item | Value |
| --- | --- |
| Base model | Qwen3.6-35B-A3B |
| Method | LoRA supervised fine-tuning |
| Data streams | Educational + Opus-distilled CoT + Inno trajectories |
| Optimization stage | Supervised post-training (RL/DPO future work) |
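
For readers unfamiliar with LoRA, the sketch below shows the general technique on a toy matrix: the frozen weight W is adjusted by a low-rank product, W + (alpha / r) · B · A, and only A and B are trained. The rank, alpha, and matrix sizes here are hypothetical; this card does not publish the adapter hyperparameters or target modules.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in A (rank x d_in) plus B (d_out x rank)."""
    return rank * (d_in + d_out)


def matmul(a, b):
    """Plain-Python matrix product for small nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha: float, rank: int):
    """Return W + (alpha / rank) * B @ A, the weight used at inference."""
    scale = alpha / rank
    delta = matmul(b, a)  # (d_out x rank) @ (rank x d_in) -> (d_out x d_in)
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


if __name__ == "__main__":
    # Hypothetical projection size and rank, chosen only for illustration.
    d_in, d_out, rank = 2048, 2048, 16
    full = d_in * d_out
    lora = lora_trainable_params(d_in, d_out, rank)
    print(f"full matrix: {full} params, LoRA: {lora} params ({lora / full:.2%})")
```

The parameter count grows with r · (d_in + d_out) rather than d_in · d_out, which is what makes adapter training on a 35B-parameter base model tractable.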

Citation

If you use Qwen-Inno-35B-v1, please cite the Inno Agent technical report:

@techreport{innoagent2026,
  title  = {Inno Agent: An Open-Source Personal Learning Agent with
            Layered Memory, Educational Post-Training, and Local Deployment},
  author = {Hao Hao and Ye Lu and Ruotong Yang and
            Yongheng Guo and Aimin Zhou},
  institution = {Shanghai Institute of AI for Education},
  year   = {2026}
}
