Instructions for using Septend/Qwen-Inno-35B-v1 with libraries and local apps.

Libraries

How to use Septend/Qwen-Inno-35B-v1 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Septend/Qwen-Inno-35B-v1")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("Septend/Qwen-Inno-35B-v1")
model = AutoModelForMultimodalLM.from_pretrained("Septend/Qwen-Inno-35B-v1")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Septend/Qwen-Inno-35B-v1 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Septend/Qwen-Inno-35B-v1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Septend/Qwen-Inno-35B-v1",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
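
The same request can also be sent from Python. The following is a minimal sketch using only the standard library; it assumes the vLLM server started above is listening on localhost:8000, and the helper names `build_payload` and `chat` are illustrative, not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "Septend/Qwen-Inno-35B-v1"


def build_payload(prompt: str, image_url: str, model: str = MODEL_ID) -> dict:
    """Build an OpenAI-style chat request with one text part and one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def chat(payload: dict, base_url: str = "http://localhost:8000") -> str:
    """POST the payload to /v1/chat/completions and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumes the standard OpenAI-compatible response shape.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat(build_payload("Describe this image in one sentence.", image_url)))` should print the model's reply. Pointing `base_url` at port 30000 would target an SGLang server started as described in the SGLang section.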

Use Docker

docker model run hf.co/Septend/Qwen-Inno-35B-v1

SGLang

How to use Septend/Qwen-Inno-35B-v1 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Septend/Qwen-Inno-35B-v1" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Septend/Qwen-Inno-35B-v1",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Septend/Qwen-Inno-35B-v1" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Septend/Qwen-Inno-35B-v1",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use Septend/Qwen-Inno-35B-v1 with Docker Model Runner:
```
docker model run hf.co/Septend/Qwen-Inno-35B-v1
```

Qwen-Inno-35B-v1

Qwen-Inno-35B-v1 is an educational-agent model post-trained from Qwen3.6-35B-A3B with LoRA. It is designed as the backbone of Inno Agent, an open-source personal learning agent with layered memory, educational post-training, and local deployment.

Model Details

Attribute	Value
Base model	Qwen3.6-35B-A3B
Parameters	35B total, ~3B active (MoE)
Training method	LoRA-based supervised fine-tuning
Context window	262,144 tokens
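
As a rough sizing aid for local deployment, the weight footprint can be estimated from the total parameter count. The sketch below assumes exactly 35e9 parameters and nominal storage widths (2 bytes for bf16, 1 byte for 8-bit, 0.5 bytes for 4-bit); it ignores the KV cache, activations, and quantization overhead, so real requirements will be higher.

```python
TOTAL_PARAMS = 35e9   # "35B total" from the table above (rounded)
ACTIVE_PARAMS = 3e9   # "~3B active" per token (approximate)

# Assumed storage widths in bytes per parameter; real quantized formats add overhead.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gigabytes(num_params: float, bytes_per_param: float) -> float:
    """Weights-only footprint in GB (10^9 bytes); excludes KV cache and activations."""
    return num_params * bytes_per_param / 1e9


for name, width in BYTES_PER_PARAM.items():
    print(f"{name}: ~{weight_gigabytes(TOTAL_PARAMS, width):.1f} GB of weights")

# Share of parameters used per token in the MoE forward pass.
print(f"active fraction: ~{ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
```

Memory generally scales with the total parameter count because all experts must be available, while per-token compute follows the much smaller active count.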

Training Data: Three-Stream Mixture

The training mixture combines three complementary supervision sources:

Stream 1: Educational Data

Extracted from papers and learning materials, structured into examples for:

Concept explanation
Misconception diagnosis
Hinting and scaffolding
Exercise generation
Answer feedback
Learning-plan construction
Spaced review

This stream teaches the model how to teach: it emphasizes pedagogical intent, difficulty calibration, and learner-facing clarity rather than answer correctness alone.

Stream 2: General Chain-of-Thought (distilled from Claude Opus)

High-level reasoning data that preserves transferable capabilities:

Decomposing hard questions
Following constraints
Writing code
Solving general benchmark tasks

This prevents the model from becoming a narrow tutoring template while keeping it competent on non-educational tasks that arise during learning sessions (math derivations, programming exercises, tool-oriented problem solving).

Stream 3: De-identified Inno Agent Trajectories

Real system traces capturing behavior unique to the Inno Agent tool surface:

Reading the learner profile and compact context pack
Archiving materials into the L2 wiki
Querying maintained wiki pages
Creating Practice Lab workspaces
Interpreting terminal run outputs
Scheduling review jobs

These trajectories teach both the educational decision policy and the concrete action policy needed by the deployed agent.
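
One way to picture a three-stream mixture is as weighted sampling over the streams. The sketch below is purely illustrative: the mixing weights are placeholders (the actual proportions are not reported here), and the stream keys and the function name `sample_stream_schedule` are hypothetical.

```python
import random
from collections import Counter

# Placeholder mixing weights: the actual proportions are not reported.
STREAM_WEIGHTS = {
    "educational": 0.4,
    "general_cot": 0.3,
    "agent_trajectories": 0.3,
}


def sample_stream_schedule(num_examples: int, seed: int = 0) -> list[str]:
    """Return the stream each training example is drawn from, sampled by weight."""
    rng = random.Random(seed)  # fixed seed keeps the schedule reproducible
    names = list(STREAM_WEIGHTS)
    weights = [STREAM_WEIGHTS[name] for name in names]
    return rng.choices(names, weights=weights, k=num_examples)


schedule = sample_stream_schedule(10_000)
print(Counter(schedule))
```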

Design Goal

The goal is not to maximize benchmark scores by adding more reasoning tokens. Instead, the objective is to obtain an educational-agent model that:

Remains close to the base model on general capability
Improves on education-oriented evaluation signals
Produces shorter reasoning traces when long deliberation is unnecessary

Benchmark Results

Benchmark	Qwen3.6-35B-A3B	Qwen-Inno-35B-v1
MMLU-Pro	85.2	81.0
MMLU-Redux	93.3	90.6
IF-Eval	92.4	92.2
IF-bench	65.0	65.7
AIME25	83.3	83.3
MMMU	81.7	79.8
MMMU-Pro	75.3	81.0
RealWorldQA	85.3	80.3
MMBench-EN	92.8	91.6
OCRBench	90.0	88.4
edu-paper-QA	87.4	90.4

The post-trained model keeps a comparable overall capability profile while shifting toward educational behavior. It improves on MMMU-Pro (+5.7), edu-paper-QA (+3.0), and IF-bench (+0.7), is unchanged on AIME25, and regresses on the remaining seven benchmarks by between 0.2 points (IF-Eval) and 5.0 points (RealWorldQA).

Note on edu-paper-QA: this is an internal test set built from educational papers, used here as a private education-oriented evaluation signal. It has not yet been publicly released.
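
The per-benchmark differences can be recomputed directly from the table. This short sketch copies the scores as listed above; the variable names are illustrative.

```python
# (base, post-trained) scores copied from the benchmark table above.
SCORES = {
    "MMLU-Pro": (85.2, 81.0),
    "MMLU-Redux": (93.3, 90.6),
    "IF-Eval": (92.4, 92.2),
    "IF-bench": (65.0, 65.7),
    "AIME25": (83.3, 83.3),
    "MMMU": (81.7, 79.8),
    "MMMU-Pro": (75.3, 81.0),
    "RealWorldQA": (85.3, 80.3),
    "MMBench-EN": (92.8, 91.6),
    "OCRBench": (90.0, 88.4),
    "edu-paper-QA": (87.4, 90.4),
}

# Round to one decimal to remove floating-point noise from the differences.
deltas = {name: round(tuned - base, 1) for name, (base, tuned) in SCORES.items()}

improved = {n: d for n, d in deltas.items() if d > 0}
regressed = {n: d for n, d in deltas.items() if d < 0}
unchanged = [n for n, d in deltas.items() if d == 0]

# Print benchmarks from largest gain to largest drop.
for name, delta in sorted(deltas.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<14}{delta:+.1f}")
```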

Reasoning Length and Efficiency

For deployment, decoding cost matters as much as final accuracy. We compared the median output length and the length of the explicit think segment between the base model and Qwen-Inno-35B-v1 on AIME, MMLU-Pro, HumanEval, and IFBench.

Dataset	Median Output Length Change
AIME	−31.8%
MMLU-Pro	−55.1%
HumanEval	−72.2%
IFBench	longer (regression)

Since most generated tokens live in the think segment rather than the final answer, this reduction translates directly into:

Lower decoding cost
Shorter user-visible latency
Better fit for local or organizational deployment

IFBench exception: on instruction-heavy prompts, Qwen-Inno-35B-v1 reasons longer than the base model and is truncated at the max-token limit more often. This suggests that targeted filtering or preference optimization is needed so the model learns when to stop deliberating.
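
The length comparison in this section can be sketched as follows. This is an illustration under stated assumptions, not the evaluation code used: it assumes generations wrap their reasoning in `<think>...</think>` tags, it counts whitespace-separated tokens as a stand-in for tokenizer counts, and the sample generations are made up.

```python
import re
from statistics import median

# Assumption: reasoning is wrapped in <think>...</think>, followed by the final answer.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_think(output: str) -> tuple[str, str]:
    """Split a generation into (think segment, final answer)."""
    match = THINK_PATTERN.search(output)
    if match is None:
        return "", output.strip()
    return match.group(1).strip(), output[match.end():].strip()


def length(text: str) -> int:
    """Whitespace token count; a real measurement would use the model tokenizer."""
    return len(text.split())


def median_length_change(base_outputs: list[str], tuned_outputs: list[str]) -> float:
    """Relative change in median output length, in percent (negative = shorter)."""
    base_median = median(length(o) for o in base_outputs)
    tuned_median = median(length(o) for o in tuned_outputs)
    return 100.0 * (tuned_median - base_median) / base_median


# Made-up generations, only to exercise the functions.
base = ["<think>a b c d e f</think> 42", "<think>a b c d</think> 7", "<think>a b</think> 1"]
tuned = ["<think>a b c</think> 42", "<think>a b</think> 7", "<think>a</think> 1"]
print(f"{median_length_change(base, tuned):+.1f}%")
```

Applying `length` to the first element returned by `split_think` gives the think-segment length on its own, which is where most of the decoding cost sits.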

Intended Use

Qwen-Inno-35B-v1 is intended as the backbone of the Inno Agent runtime, where it benefits from external scaffolding:

L1 learner profile — durable goals, knowledge states, misconceptions, preferences
L2 native wiki — ingested learning materials as browsable pages
L3 session records — recent dialogue and tool calls
Compact context pack — short, decision-ready learner summary injected per turn
Tool surface — learner tools, wiki tools, scheduler, document parser, Practice Lab

A small model does not need to hold the learner's entire history and knowledge in context. Inno Agent's system memory, tools, and context pack provide external structure, so the model can complete high-quality personalized teaching with far fewer tokens.
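
To make the idea of a compact context pack concrete, here is a minimal sketch. The `LearnerProfile` class, its fields, and `build_context_pack` are hypothetical: the field names mirror the L1 profile description above, not the actual Inno Agent schema.

```python
from dataclasses import dataclass, field


@dataclass
class LearnerProfile:
    """Hypothetical L1 profile; fields mirror the list above, not the real schema."""
    goals: list[str] = field(default_factory=list)
    knowledge_states: dict[str, str] = field(default_factory=dict)
    misconceptions: list[str] = field(default_factory=list)
    preferences: list[str] = field(default_factory=list)


def build_context_pack(profile: LearnerProfile, max_items: int = 2) -> str:
    """Render a short learner summary, keeping at most max_items entries per field."""
    knowledge = [f"{topic}={state}" for topic, state in profile.knowledge_states.items()]
    sections = {
        "Goals": profile.goals,
        "Knowledge": knowledge,
        "Misconceptions": profile.misconceptions,
        "Preferences": profile.preferences,
    }
    # Skip empty sections so the pack stays as short as possible.
    lines = [
        f"{title}: {'; '.join(items[:max_items])}"
        for title, items in sections.items()
        if items
    ]
    return "\n".join(lines)


profile = LearnerProfile(
    goals=["pass linear algebra midterm"],
    knowledge_states={"eigenvalues": "shaky", "determinants": "solid"},
    misconceptions=["treats matrix multiplication as commutative"],
)
system_prompt = "You are a tutor.\n\nLearner context:\n" + build_context_pack(profile)
print(system_prompt)
```

Capping each field keeps the per-turn injection small, which is the point of the design: durable detail lives in the memory layers, and only a decision-ready summary enters the prompt.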

Suitable for

Personal learning assistants
Privacy-sensitive local deployment (school clusters, personal GPUs, organizational servers)
Low-latency turn-by-turn tutoring
Educational tool-using agents

Not intended for

General software-engineering coding-agent workloads (the base model is a better choice)
Multi-tenant or group-chat customer-service systems
Standalone benchmark maximization without system scaffolding

Limitations

This is a preliminary post-training run. RL optimization and learning-outcome studies remain future work.
Benchmark improvements are not uniform: several general benchmarks (MMLU-Pro, MMLU-Redux, IF-Eval, MMMU, RealWorldQA, MMBench-EN, OCRBench) regress by 0.2 to 5.0 points.
The IFBench reasoning-length regression indicates that deliberation control on instruction-following prompts is not yet stable.
Educational behavior depends on the surrounding Inno Agent memory and tool surface; standalone use will lose the personalization advantages.
The edu-paper-QA evaluation is internal and not yet publicly reproducible.

Training Configuration

Item	Value
Base model	Qwen3.6-35B-A3B
Method	LoRA supervised fine-tuning
Data streams	Educational + Opus-distilled CoT + Inno trajectories
Optimization stage	Supervised post-training (RL/DPO future work)
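
For readers unfamiliar with LoRA: the base weight matrix W stays frozen and only a low-rank update BA is trained, where A has shape rank x d_in and B has shape d_out x rank. The sketch below counts the trainable parameters for one adapted matrix; the dimensions and rank are illustrative only, since the adapted layers and rank are not stated here.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * d_in + d_out * rank


def full_params(d_in: int, d_out: int) -> int:
    """Parameters in the frozen dense weight matrix W (d_out x d_in)."""
    return d_in * d_out


# Illustrative shapes only; not the configuration used for this model.
d_in, d_out, rank = 4096, 4096, 16
adapter = lora_trainable_params(d_in, d_out, rank)
dense = full_params(d_in, d_out)
print(f"adapter: {adapter:,} params ({adapter / dense:.2%} of the dense matrix)")
```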

Citation

If you use Qwen-Inno-35B-v1, please cite the Inno Agent technical report:

@techreport{innoagent2026,
  title  = {Inno Agent: An Open-Source Personal Learning Agent with
            Layered Memory, Educational Post-Training, and Local Deployment},
  author = {Hao Hao and Ye Lu and Ruotong Yang and
            Yongheng Guo and Aimin Zhou},
  institution = {Shanghai Institute of AI for Education},
  year   = {2026}
}

Model tree for Septend/Qwen-Inno-35B-v1

Base model: Qwen/Qwen3.6-35B-A3B
Adapters (34), including this model: Septend/Qwen-Inno-35B-v1