Qwen-Inno-35B-v1
Qwen-Inno-35B-v1 is an educational-agent model post-trained from Qwen3.6-35B-A3B with LoRA. It is designed to serve as the backbone for Inno Agent, an open-source personal learning agent with layered memory, educational post-training, and local deployment.
Model Details
| Attribute | Value |
|---|---|
| Base model | Qwen3.6-35B-A3B |
| Parameters | 35B total, ~3B active (MoE) |
| Training method | LoRA-based supervised fine-tuning |
| Context window | 262,144 tokens |
Training Data: Three-Stream Mixture
The training mixture combines three complementary supervision sources:
Stream 1: Educational Data
Extracted from papers and learning materials, structured into examples for:
- Concept explanation
- Misconception diagnosis
- Hinting and scaffolding
- Exercise generation
- Answer feedback
- Learning-plan construction
- Spaced review
This stream teaches the model how to teach — emphasizing pedagogical intent, difficulty calibration, and learner-facing clarity rather than only answer correctness.
Stream 2: General Chain-of-Thought (distilled from Claude Opus)
High-level reasoning data that preserves transferable capabilities:
- Decomposing hard questions
- Following constraints
- Writing code
- Solving general benchmark tasks
This prevents the model from becoming a narrow tutoring template while keeping it competent on non-educational tasks that arise during learning sessions (math derivations, programming exercises, tool-oriented problem solving).
Stream 3: De-identified Inno Agent Trajectories
Real system traces capturing behavior unique to the Inno Agent tool surface:
- Reading the learner profile and compact context pack
- Archiving materials into the L2 wiki
- Querying maintained wiki pages
- Creating Practice Lab workspaces
- Interpreting terminal run outputs
- Scheduling review jobs
These trajectories teach both the educational decision policy and the concrete action policy needed by the deployed agent.
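One way to picture a three-stream mixture is weighted sampling over the three sources. The sketch below is illustrative only: the stream names, the mixing weights, and the toy examples are assumptions, since this card does not state the actual proportions or the sampling scheme used in training.

```python
import random

# Hypothetical mixing weights; the real proportions are not given in this card.
STREAM_WEIGHTS = {
    "educational": 0.4,
    "general_cot": 0.3,
    "agent_trajectories": 0.3,
}

# Toy placeholder examples standing in for each stream's training records.
STREAMS = {
    "educational": ["explain a concept", "diagnose a misconception"],
    "general_cot": ["decompose a hard question", "write code"],
    "agent_trajectories": ["query a wiki page", "schedule a review job"],
}


def sample_mixture(streams, weights, n, seed=0):
    """Draw n (stream_name, example) pairs, choosing streams by weight."""
    rng = random.Random(seed)
    names = list(weights)
    picks = rng.choices(names, weights=[weights[k] for k in names], k=n)
    return [(name, rng.choice(streams[name])) for name in picks]


batch = sample_mixture(STREAMS, STREAM_WEIGHTS, n=8)
for name, example in batch:
    print(f"{name:<20}{example}")
```

Fixing the seed keeps the draw reproducible, which makes it easier to compare runs that differ only in the weights.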
Design Goal
The goal is not to maximize benchmark scores by adding more reasoning tokens. Instead, the objective is to obtain an educational-agent model that:
- Remains close to the base model on general capability
- Improves on education-oriented evaluation signals
- Produces shorter reasoning traces when long deliberation is unnecessary
Benchmark Results
| Benchmark | Qwen3.6-35B-A3B | Qwen-Inno-35B-v1 |
|---|---|---|
| MMLU-Pro | 85.2 | 81.0 |
| MMLU-Redux | 93.3 | 90.6 |
| IF-Eval | 92.4 | 92.2 |
| IF-bench | 65.0 | 65.7 |
| AIME25 | 83.3 | 83.3 |
| MMMU | 81.7 | 79.8 |
| MMMU-Pro | 75.3 | 81.0 |
| RealWorldQA | 85.3 | 80.3 |
| MMBench-EN | 92.8 | 91.6 |
| OCRBench | 90.0 | 88.4 |
| edu-paper-QA | 87.4 | 90.4 |
The post-trained model keeps a comparable overall capability profile while shifting toward educational behavior. It improves on MMMU-Pro (+5.7), edu-paper-QA (+3.0), and IF-bench (+0.7), and is unchanged on AIME25. It regresses on MMLU-Pro (−4.2), MMLU-Redux (−2.7), MMMU (−1.9), RealWorldQA (−5.0), MMBench-EN (−1.2), and OCRBench (−1.6), with IF-Eval essentially flat (−0.2).
Note on edu-paper-qa: this is an internal test set built from educational papers, used here as a private education-oriented evaluation signal. It has not yet been publicly released.
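The per-benchmark differences follow directly from the table. The short Python sketch below recomputes them from the scores listed in this card; the variable and function names are illustrative.

```python
# Scores copied from the benchmark table: (base model, Qwen-Inno-35B-v1).
SCORES = {
    "MMLU-Pro": (85.2, 81.0),
    "MMLU-Redux": (93.3, 90.6),
    "IF-Eval": (92.4, 92.2),
    "IF-bench": (65.0, 65.7),
    "AIME25": (83.3, 83.3),
    "MMMU": (81.7, 79.8),
    "MMMU-Pro": (75.3, 81.0),
    "RealWorldQA": (85.3, 80.3),
    "MMBench-EN": (92.8, 91.6),
    "OCRBench": (90.0, 88.4),
    "edu-paper-QA": (87.4, 90.4),
}


def deltas(scores):
    """Post-trained score minus base score, rounded to one decimal place."""
    return {name: round(tuned - base, 1) for name, (base, tuned) in scores.items()}


DELTAS = deltas(SCORES)

# Print benchmarks from largest gain to largest regression.
for name, delta in sorted(DELTAS.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<14}{delta:+.1f}")
```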
Reasoning Length and Efficiency
For deployment, decoding cost matters as much as final accuracy. We compared the median output length and the length of the explicit think segment against the base model on AIME, MMLU-Pro, HumanEval, and IFBench.
| Dataset | Median Output Length Change |
|---|---|
| AIME | −31.8% |
| MMLU-Pro | −55.1% |
| HumanEval | −72.2% |
| IFBench | longer (regression) |
Since most generated tokens live in the think segment rather than the final answer, this reduction translates directly into:
- Lower decoding cost
- Shorter user-visible latency
- Better fit for local or organizational deployment
IFBench exception: Qwen-Inno-35B-v1 reasons longer and has more max-token truncations on instruction-heavy prompts. This suggests targeted filtering or preference optimization is needed so the model learns when to stop deliberating.
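The percentages in the table are changes in median output length. A minimal sketch of such a comparison follows, assuming the change is the relative difference between the two medians and that per-response token counts are available for both models. The function name and the sample numbers are made up for illustration and are not the evaluation data.

```python
from statistics import median


def median_length_change(base_lengths, tuned_lengths):
    """Relative change in median output length, in percent (negative = shorter)."""
    base_med = median(base_lengths)
    tuned_med = median(tuned_lengths)
    return 100.0 * (tuned_med - base_med) / base_med


# Hypothetical per-response token counts, for illustration only.
base = [1200, 1000, 800, 1500, 900]
tuned = [700, 500, 450, 900, 400]

change = median_length_change(base, tuned)
print(f"{change:+.1f}%")
```

The median is less sensitive than the mean to a handful of responses that run into the max-token limit, which matters for cases like the IFBench truncations noted above.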
Intended Use
Qwen-Inno-35B-v1 is intended as the backbone of the Inno Agent runtime, where it benefits from external scaffolding:
- L1 learner profile — durable goals, knowledge states, misconceptions, preferences
- L2 native wiki — ingested learning materials as browsable pages
- L3 session records — recent dialogue and tool calls
- Compact context pack — short, decision-ready learner summary injected per turn
- Tool surface — learner tools, wiki tools, scheduler, document parser, Practice Lab
A small model does not need to hold the learner's entire history and knowledge in context. Inno Agent's system memory, tools, and context pack provide external structure, so the model can complete high-quality personalized teaching with far fewer tokens.
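To make the idea of a compact context pack concrete, here is a rough sketch of how a long-lived learner profile and recent session turns might be condensed into a few lines per turn. The class, field, and function names are hypothetical and do not reflect the actual Inno Agent data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class LearnerProfile:
    """Hypothetical stand-in for the L1 learner profile."""
    goals: list[str] = field(default_factory=list)
    misconceptions: list[str] = field(default_factory=list)
    preferences: list[str] = field(default_factory=list)


def build_context_pack(profile, recent_turns, max_items=2):
    """Condense the profile and recent L3 turns into a short text summary."""
    lines = []
    for label, items in (
        ("Goals", profile.goals),
        ("Misconceptions", profile.misconceptions),
        ("Preferences", profile.preferences),
    ):
        if items:
            # Keep only the first few entries so the pack stays short.
            lines.append(f"{label}: " + "; ".join(items[:max_items]))
    if recent_turns:
        # Keep only the most recent turns from the session record.
        lines.append("Recent: " + " | ".join(recent_turns[-max_items:]))
    return "\n".join(lines)


profile = LearnerProfile(
    goals=["pass linear algebra midterm", "learn NumPy", "review calculus"],
    misconceptions=["confuses rank with dimension"],
    preferences=["short hints before full solutions"],
)
turns = ["asked about eigenvalues", "ran practice script", "got a shape error"]

pack = build_context_pack(profile, turns)
print(pack)
```

Capping each section at a fixed number of items is one simple way to keep the per-turn prompt small; a real system would likely rank items by relevance instead of taking the first few.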
Suitable for
- Personal learning assistants
- Privacy-sensitive local deployment (school clusters, personal GPUs, organizational servers)
- Low-latency turn-by-turn tutoring
- Educational tool-using agents
Not intended for
- General software-engineering coding-agent workloads (the base model is a better choice)
- Multi-tenant or group-chat customer-service systems
- Standalone benchmark maximization without system scaffolding
Limitations
- This is a preliminary post-training run. RL optimization and learning-outcome studies remain future work.
- Benchmark improvements are not uniform: seven general benchmarks (MMLU-Pro, MMLU-Redux, IF-Eval, MMMU, RealWorldQA, MMBench-EN, OCRBench) regress by 0.2 to 5.0 points.
- IFBench reasoning-length regression indicates instruction-following deliberation control is not yet stable.
- Educational behavior depends on the surrounding Inno Agent memory and tool surface; standalone use will lose the personalization advantages.
- The edu-paper-qa evaluation is internal and not yet publicly reproducible.
Training Configuration
| Item | Value |
|---|---|
| Base model | Qwen3.6-35B-A3B |
| Method | LoRA supervised fine-tuning |
| Data streams | Educational + Opus-distilled CoT + Inno trajectories |
| Optimization stage | Supervised post-training (RL/DPO future work) |
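For readers unfamiliar with LoRA: instead of updating a full weight matrix of shape d_out × d_in, it trains two low-rank factors, B (d_out × r) and A (r × d_in), and adds their product to the frozen weight. The sketch below only illustrates the resulting parameter arithmetic. The layer shape and rank are hypothetical, since this card does not state the LoRA rank or the target modules.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical layer shape and rank, chosen only to make the arithmetic concrete.
d_out, d_in, rank = 4096, 4096, 16

full = d_out * d_in
lora = lora_trainable_params(d_out, d_in, rank)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```

With these illustrative numbers the adapter holds under 1% of the parameters of the matrix it modifies, which is why LoRA-style fine-tuning is much cheaper in memory than full fine-tuning.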
Citation
If you use Qwen-Inno-35B-v1, please cite the Inno Agent technical report:
@techreport{innoagent2026,
title = {Inno Agent: An Open-Source Personal Learning Agent with
Layered Memory, Educational Post-Training, and Local Deployment},
author = {Hao Hao and Ye Lu and Ruotong Yang and
Yongheng Guo and Aimin Zhou},
institution = {Shanghai Institute of AI for Education},
year = {2026}
}
Links
- Code: github.com/hhyqhh/inno-agent
- Model: Septend/Qwen-Inno-35B-v1
- Base model: Qwen/Qwen3.6-35B-A3B