Instructions to use phucngodev/Qwopus3.5-9B-Coder-MTP with libraries, inference providers, notebooks, and local apps.

Libraries

How to use phucngodev/Qwopus3.5-9B-Coder-MTP with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="phucngodev/Qwopus3.5-9B-Coder-MTP")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("phucngodev/Qwopus3.5-9B-Coder-MTP", dtype="auto")

llama-cpp-python

How to use phucngodev/Qwopus3.5-9B-Coder-MTP with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="phucngodev/Qwopus3.5-9B-Coder-MTP",
	filename="Qwopus3.5-9B-Coder-MTP-BF16.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)

Local Apps

llama.cpp

How to use phucngodev/Qwopus3.5-9B-Coder-MTP with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf phucngodev/Qwopus3.5-9B-Coder-MTP:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf phucngodev/Qwopus3.5-9B-Coder-MTP:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf phucngodev/Qwopus3.5-9B-Coder-MTP:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf phucngodev/Qwopus3.5-9B-Coder-MTP:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf phucngodev/Qwopus3.5-9B-Coder-MTP:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf phucngodev/Qwopus3.5-9B-Coder-MTP:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf phucngodev/Qwopus3.5-9B-Coder-MTP:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf phucngodev/Qwopus3.5-9B-Coder-MTP:Q4_K_M

Use Docker

docker model run hf.co/phucngodev/Qwopus3.5-9B-Coder-MTP:Q4_K_M


vLLM

How to use phucngodev/Qwopus3.5-9B-Coder-MTP with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "phucngodev/Qwopus3.5-9B-Coder-MTP"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "phucngodev/Qwopus3.5-9B-Coder-MTP",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
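The same request can also be sent from Python. The sketch below uses only the standard library and assumes the server exposes the OpenAI-compatible `/v1/chat/completions` route shown in the curl call. `build_chat_request` and `chat` are hypothetical helper names, and the default `temperature`/`top_p` values simply mirror the sampling settings used in the benchmark section of this card.

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt, temperature=1.0, top_p=0.95, max_tokens=None):
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_p": top_p,
    }
    if max_tokens is not None:
        # Long answers (e.g. DevOps write-ups) may need a larger generation budget.
        payload["max_tokens"] = max_tokens
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url, model, prompt, **kwargs):
    """Send the request and return the assistant's reply text (needs a running server)."""
    request = build_chat_request(base_url, model, prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Because vLLM, SGLang, and llama-server all expose this route, the same helper should work against `http://localhost:8000` (vLLM), `http://localhost:30000` (SGLang), or `http://localhost:8080` (llama-server) by changing only `base_url`.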

Use Docker

docker model run hf.co/phucngodev/Qwopus3.5-9B-Coder-MTP:Q4_K_M

SGLang

How to use phucngodev/Qwopus3.5-9B-Coder-MTP with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "phucngodev/Qwopus3.5-9B-Coder-MTP" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "phucngodev/Qwopus3.5-9B-Coder-MTP",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "phucngodev/Qwopus3.5-9B-Coder-MTP" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "phucngodev/Qwopus3.5-9B-Coder-MTP",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use phucngodev/Qwopus3.5-9B-Coder-MTP with Ollama:
```
ollama run hf.co/phucngodev/Qwopus3.5-9B-Coder-MTP:Q4_K_M
```

Unsloth Studio

How to use phucngodev/Qwopus3.5-9B-Coder-MTP with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for phucngodev/Qwopus3.5-9B-Coder-MTP to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for phucngodev/Qwopus3.5-9B-Coder-MTP to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for phucngodev/Qwopus3.5-9B-Coder-MTP to start chatting

Pi

How to use phucngodev/Qwopus3.5-9B-Coder-MTP with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf phucngodev/Qwopus3.5-9B-Coder-MTP:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "phucngodev/Qwopus3.5-9B-Coder-MTP:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use phucngodev/Qwopus3.5-9B-Coder-MTP with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf phucngodev/Qwopus3.5-9B-Coder-MTP:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default phucngodev/Qwopus3.5-9B-Coder-MTP:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use phucngodev/Qwopus3.5-9B-Coder-MTP with Docker Model Runner:
```
docker model run hf.co/phucngodev/Qwopus3.5-9B-Coder-MTP:Q4_K_M
```

Lemonade

How to use phucngodev/Qwopus3.5-9B-Coder-MTP with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull phucngodev/Qwopus3.5-9B-Coder-MTP:Q4_K_M

Run and chat with the model

lemonade run user.Qwopus3.5-9B-Coder-MTP-Q4_K_M

List all available models

lemonade list

🌟 Qwopus3.5-9B-Coder-MTP (Multi-Token Prediction)

💡 Multi-Token Prediction (MTP) Architecture Overview

What is MTP (Multi-Token Prediction)?

MTP is a technique that has drawn growing attention in Large Language Model (LLM) training and inference in recent years. Unlike traditional autoregressive models, which predict only the single next token at each step (Single-Token Prediction), MTP models are trained to predict several future tokens simultaneously at each position.
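To make the training objective concrete, here is a minimal sketch assuming the common formulation in which head k at position i is trained with cross-entropy on token i + k, and the per-head losses are averaged. Head 1 is ordinary next-token prediction. The function names are hypothetical and this is not the model's actual training code.

```python
import math


def mtp_targets(tokens, num_heads):
    """For each position i, head k (1-based) is trained to predict tokens[i + k].

    Returns (position, head, target_token) triples; positions near the end of
    the sequence simply have fewer valid heads.
    """
    triples = []
    for i in range(len(tokens)):
        for k in range(1, num_heads + 1):
            if i + k < len(tokens):
                triples.append((i, k, tokens[i + k]))
    return triples


def cross_entropy(logits, target):
    """Negative log-softmax probability of `target` under a list of logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def mtp_loss(head_logits, tokens):
    """Average cross-entropy over all (position, head) pairs.

    head_logits[k - 1][i] is the logit vector produced by head k at position i.
    """
    triples = mtp_targets(tokens, len(head_logits))
    if not triples:
        raise ValueError("sequence too short to produce any targets")
    total = sum(cross_entropy(head_logits[k - 1][i], t) for i, k, t in triples)
    return total / len(triples)
```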

This architecture brings two core dimensions of transformation:

Deeper Representation and Planning: Predicting several tokens ahead pushes the model to plan over longer horizons (Long-Horizon Planning) at the representation level. This improves coherence in logic-intensive tasks such as complex coding and multi-step mathematical reasoning, and helps mitigate the "reasoning bubbles" and repetition loops common in traditional autoregressive models.

Inference Speedup (Speculative Decoding): At inference time, the model carries additional lightweight auxiliary prediction heads (Draft Heads, configured as draft=2 in this model). While the backbone generates the current token, the Draft Heads propose the next 2 candidate tokens in parallel at negligible computational cost; the main model then verifies these candidates in a single forward pass. When candidates are accepted, the model emits multiple tokens in one inference step, yielding substantial throughput gains.
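The verification step can be sketched as follows, assuming greedy (argmax) decoding; sampling-based speculative decoding uses a probabilistic acceptance test instead, and real engines differ in details. `verify_draft` is a hypothetical name. With draft=2, one verification step emits between 1 and 3 tokens, which is why low draft acceptance (e.g. on long free-form text) limits the speedup.

```python
def verify_draft(draft, target_argmax):
    """Greedy acceptance rule for speculative decoding.

    draft:         k tokens proposed by the draft heads.
    target_argmax: k + 1 tokens the main model predicts in one forward pass over
                   the context plus the draft; target_argmax[j] is its choice for
                   the j-th new position.
    Returns the tokens emitted this step: the longest prefix of `draft` that agrees
    with the main model, plus the main model's own token at the next position.
    """
    if len(target_argmax) != len(draft) + 1:
        raise ValueError("target_argmax must have exactly len(draft) + 1 entries")
    accepted = []
    for proposed, verified in zip(draft, target_argmax):
        if proposed != verified:
            break
        accepted.append(proposed)
    # The main model's prediction right after the accepted prefix is always valid,
    # so every step emits at least one token.
    accepted.append(target_argmax[len(accepted)])
    return accepted
```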

🚀 Performance Briefing: Base vs MTP (draft=2)

Based on testing across 5 core domains (Logic / Coding / DevOps / Math / Edge) with 30 complex evaluation questions, Qwopus3.5-9B-Coder-MTP (draft=2) shows clear overall gains in both speed and correctness:

⚡ Speed: Overall throughput rose from 4.94 T/s to 6.71 T/s (+35.8%), cutting total latency by 16.4 minutes (about 20% less total time).
🎯 Accuracy & Robustness: Overall accuracy improved from 80.0% to 88.3% (+8.3pp). The model scored 100% in both Coding and Math, the two most demanding task categories, and eliminated the code truncations and repetitive output that the Base model showed in Coding.
📊 Overall Efficiency Index: After weighting correctness against inference time, the overall reasoning efficiency of the MTP model improved by 38.4%.

The evaluation configuration and benchmark framework follow the Unsloth team's official testing of the Qwen series, which reports that draft=2 gives the best performance. For full details, see the official Unsloth MTP Benchmarks.

⚙️ Test Environment & Configuration

To keep the evaluation rigorous, objective, and reproducible, all runs used the same hardware platform and sampling hyperparameters:

🖥️ Compute Platform: GB10 dedicated server platform (NVIDIA GB10 Grace Blackwell).
⚙️ Concurrency Configuration: Requests were issued with a concurrency of 5 to stress-test throughput and stability under conditions approximating real-world multi-user usage.
🛠️ Script Version: Benchlocal Test Suite v1.3.0 inference evaluation script.
🧪 Sampling Hyperparameters:
- Temperature: 1.0 (recommended standard, balancing logical reasoning and creativity).
- Top-p: 0.95 (retains high-probability candidates, filters tail noise, ensuring reasoning accuracy).
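The two sampling settings above can be illustrated with a minimal pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering. Inference engines implement this internally and may differ in tie-breaking and numerical details; `top_p_distribution` is a hypothetical name.

```python
import math


def top_p_distribution(logits, temperature=1.0, top_p=0.95):
    """Return {token_id: probability} after temperature scaling and top-p filtering.

    Tokens are kept in descending order of probability until their cumulative
    mass reaches top_p; the survivors are renormalized to sum to 1.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # softmax with max subtraction for stability
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    ranked = sorted(((e / z, i) for i, e in enumerate(exps)), reverse=True)
    kept, cumulative = [], 0.0
    for p, i in ranked:
        kept.append((i, p))
        cumulative += p
        if cumulative >= top_p:
            break  # the low-probability tail is discarded
    mass = sum(p for _, p in kept)
    return {i: p / mass for i, p in kept}
```

With temperature 1.0 the model's probabilities are used unchanged, and top_p = 0.95 only trims the least likely 5% of probability mass.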

1. Token Volume and Speed Statistics

Token & Speed Details per Question
Question	Category	Base T/s	Base Time (s)	Base Tokens	MTP T/s	MTP Time (s)	MTP Tokens	Speedup (Base Time / MTP Time)
Q1	Logic	4.20	86.80	365	6.10	86.45	527	1.00x
Q2	Logic	4.40	178.70	786	5.80	130.80	759	1.37x
Q3	Logic	4.30	172.66	743	6.80	90.24	614	1.91x
Q4	Logic	4.20	153.05	643	7.90	67.85	536	2.25x
Q5	Logic	4.20	172.33	724	6.70	40.88	274	4.22x
Q6	Coding	4.40	240.96	1060	6.70	160.32	1074	1.50x
Q7	Coding	4.30	244.07	1050	6.20	173.26	1074	1.41x
Q8	Coding	4.30	245.05	1054	6.80	158.92	1081	1.54x
Q9	Coding	4.30	245.46	1055	6.60	162.95	1075	1.51x
Q10	Coding	4.40	241.59	1063	6.20	173.44	1075	1.39x
Q11	Coding	4.20	249.55	1048	6.90	156.09	1077	1.60x
Q12	Coding	4.20	211.45	888	6.50	155.98	1014	1.36x
Q13	Coding	4.30	248.09	1067	6.50	164.91	1072	1.50x
Q14	Coding	4.10	156.12	640	6.30	119.72	754	1.30x
Q15	Coding	4.30	144.47	621	6.40	165.97	1062	0.87x
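The speedup column is a wall-clock ratio, not a throughput ratio, so it also depends on how many tokens each model wrote. The sketch below recomputes both for two rows of the table; `row_metrics` is a hypothetical helper. In Q5 the MTP run wrote 274 tokens against 724, so its wall-clock speedup is far larger than its T/s gain; in Q15 it wrote 1062 tokens against 621, so it finished later despite higher T/s.

```python
def row_metrics(base_time, base_tokens, mtp_time, mtp_tokens):
    """Per-question metrics matching the table columns above (times in seconds)."""
    base_tps = base_tokens / base_time
    mtp_tps = mtp_tokens / mtp_time
    return {
        "base_tps": base_tps,
        "mtp_tps": mtp_tps,
        "tps_ratio": mtp_tps / base_tps,   # pure throughput gain
        "speedup": base_time / mtp_time,   # wall-clock gain, as in the table
    }


q5 = row_metrics(172.33, 724, 40.88, 274)
q15 = row_metrics(144.47, 621, 165.97, 1062)
for name, row in (("Q5", q5), ("Q15", q15)):
    print(f"{name}: T/s ratio {row['tps_ratio']:.2f}x, wall-clock speedup {row['speedup']:.2f}x")
```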

Logic Category (Q1-Q5) Answer Verification
Question ID	Question Summary	Correct Answer	Base	MTP
Q1	17 sheep, all but 9 die: how many are left	9 sheep	PASS	PASS
Q2	30 dollar hotel riddle, where is the 1 dollar	No loss, accounting error	PASS	PASS
Q3	Sequence: 2, 6, 12, 20, 30, ?	42 (n * (n + 1))	PASS	PASS
Q4	Bat + ball = $1.10, bat is $1 more expensive than ball	$0.05	PASS	PASS
Q5	Multiply by 3, add 6, divide by 3, subtract original number	Always 2	PASS	PASS

Logic: Base 5/5 = 100% | MTP 5/5 = 100%
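The answer keys for the arithmetic-style Logic questions can be checked mechanically. This short sketch uses exact rational arithmetic so that no floating-point rounding creeps in; it only verifies the keys and says nothing about how the models were graded.

```python
from fractions import Fraction

# Q3: terms of n * (n + 1) for n = 1..6; the sixth term answers the "?".
sequence = [n * (n + 1) for n in range(1, 7)]
assert sequence[:5] == [2, 6, 12, 20, 30]

# Q4: ball + bat = 1.10 and bat = ball + 1.00, so 2 * ball + 1.00 = 1.10.
ball = (Fraction("1.10") - Fraction("1.00")) / 2


# Q5: ((3x + 6) / 3) - x simplifies to 2 for every integer x.
def trick(x):
    return Fraction(3 * x + 6, 3) - x
```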

Coding Category (Q6-Q15) Answer Verification
Question ID	Question Summary	Base	MTP	Explanation
Q6	Python Fibonacci generator	PARTIAL	PASS	Base Repetition=True; truncated code with logic errors
Q7	Python thread-safe singleton	PARTIAL	PASS	Base Repetition=True, incomplete implementation
Q8	Sort CSV by second column in descending order	PARTIAL	PASS	Base code truncated
Q9	Python HTTP Server	PASS	PASS	Both fully implemented
Q10	Python execution time decorator	PASS	PASS	Both fully implemented
Q11	C++ Binary Search Tree	PASS	PASS	Both fully implemented
Q12	Bash backup script (with date)	PASS	PASS	Both fully implemented
Q13	Python topological sort	PASS	PASS	Both fully implemented
Q14	Node.js Dockerfile	PASS	PASS	Both fully implemented
Q15	SQL second highest salary	PASS	PASS	Both implemented correctly

Coding: Base 7/10 = 70% | MTP 10/10 = 100%

DevOps Category (Q16-Q20) Answer Verification
Question ID	Question Summary	Base	MTP	Explanation
Q16	Nginx reverse proxy & load balancer	PARTIAL	PARTIAL	Both produced a correct configuration skeleton, but the responses were truncated
Q17	Hard Link vs Soft Link	PARTIAL	PASS	Base Repetition=True, has repetitive lines; MTP complete
Q18	crontab every Tuesday at 3:15 AM	PASS	PASS	Both correct: 15 3 * * 2 script.sh
Q19	SSH server security configuration	PARTIAL	PARTIAL	Both responses were truncated
Q20	systemd service restart on failure	PASS	PASS	Both explained correctly

DevOps: Base 2.5/5 = 50% | MTP 3.5/5 = 70%

Math Category (Q21-Q25) Answer Verification
Question ID	Question Summary	Correct Answer	Base	MTP
Q21	Find derivative of f(x) = x^3 * ln(x)	x^2 * (3ln(x) + 1)	PASS	PASS
Q22	System of equations: 2x+y=5, x-y=1	x = 2, y = 1	PASS	PASS
Q23	Probability of rolling a sum of 7 with two dice	1/6 = 16.67%	PASS	PASS
Q24	Integral of e^(2x)	(1/2)e^(2x)+C	PASS	PASS
Q25	Prove sum of first n odd numbers is n^2	Induction / Arithmetic progression	PARTIAL	PASS

Math: Base 4.5/5 = 90% | MTP 5/5 = 100%
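The Math answer keys can likewise be sanity-checked: Q21 numerically with a central finite difference, Q22 and Q23 exactly, and Q25 by brute force over small n (a check, not a proof). The helper names are illustrative only.

```python
import math
from fractions import Fraction
from itertools import product


# Q21: compare the claimed derivative of x^3 * ln(x) with a finite difference.
def f(x):
    return x**3 * math.log(x)


def claimed_derivative(x):
    return x**2 * (3 * math.log(x) + 1)


def central_difference(func, x, h=1e-6):
    return (func(x + h) - func(x - h)) / (2 * h)


# Q22: adding 2x + y = 5 and x - y = 1 gives 3x = 6.
x = Fraction(6, 3)
y = 5 - 2 * x

# Q23: count ordered outcomes of two dice whose faces sum to 7.
p_seven = Fraction(sum(1 for a, b in product(range(1, 7), repeat=2) if a + b == 7), 36)


# Q25: sum of the first n odd numbers, to compare against n^2.
def sum_of_odds(n):
    return sum(2 * k - 1 for k in range(1, n + 1))
```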

Edge Category (Q26-Q30) Answer Verification
Question ID	Question Summary	Base	MTP	Explanation
Q26	Output 'Apple' 5 times	PASS	PASS	Both correctly outputted 5 lines
Q27	Output a phrase 3 times	PASS	PASS	Both correct
Q28	Explain infinity (with forbidden words constraint)	PASS	PARTIAL	MTP Repetition=True, Response truncated
Q29	Generate 5-level nested JSON	PASS	PARTIAL	MTP last item incomplete, Base generated 6 levels
Q30	30 'A's reply with 'B B B'	PASS	PASS	Both correct

Edge: Base 5/5 = 100% | MTP 3/5 = 60%

Overall Accuracy Summary
Category	Questions	Base Correct	Base Accuracy	MTP Correct	MTP Accuracy
Logic	5	5	100%	5	100%
Coding	10	7	70%	10	100%
DevOps	5	2.5	50%	3.5	70%
Math	5	4.5	90%	5	100%
Edge	5	5	100%	3	60%
Total	30	24	80.0%	26.5	88.3%

Reasoning Efficiency Comparison
Efficiency Metric	Base Model	MTP Model	MTP Advantage
Overall Throughput (T/s)	4.94	6.71	+35.8%
Overall Accuracy	80.0%	88.3%	+8.3pp
Total Latency	81.3 min	64.9 min	16.4 min saved
Reasoning Efficiency Index (Accuracy / Latency in s)	1.64e-4	2.27e-4	+38.4%
Correct Answers per 1k Tokens	0.995 Q/kT	1.014 Q/kT	+1.9%
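The derived rows of this table can be recomputed from the raw totals. This sketch assumes latency is converted to seconds for the efficiency index (which reproduces 1.64e-4 and 2.27e-4) and that total generated tokens equal throughput × latency (which reproduces the per-1k-token figures to within rounding); `summary_metrics` is a hypothetical helper. Recomputed from unrounded inputs, the efficiency gain is about +38.3% (the table's +38.4% comes from the rounded index values), and the 16.4-minute saving is about 20% of the Base total.

```python
def summary_metrics(correct, total, throughput_tps, latency_min):
    """Recompute the table's derived metrics from its raw inputs."""
    accuracy = correct / total
    latency_s = latency_min * 60
    tokens = throughput_tps * latency_s  # implied total generated tokens
    return {
        "accuracy": accuracy,
        "efficiency": accuracy / latency_s,  # accuracy per second of latency
        "correct_per_1k_tokens": correct / (tokens / 1000),
    }


base = summary_metrics(24, 30, 4.94, 81.3)
mtp = summary_metrics(26.5, 30, 6.71, 64.9)

throughput_gain = 6.71 / 4.94 - 1              # relative T/s improvement
latency_reduction = (81.3 - 64.9) / 81.3       # share of Base latency saved
efficiency_gain = mtp["efficiency"] / base["efficiency"] - 1
```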

Quality Issues Statistics
Quality Issue	Base Counts	MTP Counts
Repetition (Repetitive output flags)	2 times (Q6, Q17)	2 times (Q6, Q28)
Timeout	0 times	0 times
Incomplete responses / Truncations	~8 occurrences	~4 occurrences
Excessively long reasoning chains	Fewer	More

8. Final Conclusion

Areas where MTP Model Excels

Speed: 35.8% higher overall throughput, with particularly strong gains in Math and Edge tasks.
Coding: 100% complete code outputs, whereas the Base model produced 3 truncated or incomplete outputs (Q6-Q8), two of them flagged for repetition.
Math: 100% accuracy with more systematic reasoning chains.
Efficiency: Overall reasoning efficiency index is 38.4% higher.

Areas for MTP Model Improvement

Edge Task Stability: Truncations occurred in Q28/Q29 because excessively long reasoning chains hit the token limit.
DevOps Long Texts: For long explanatory responses, draft acceptance rates are low, which limits the speedup.

Recommended Scenarios
Scenario	Recommended Model
Code Generation	MTP
Mathematical Reasoning	MTP
Logical Reasoning	Both acceptable
Short-text instructions (Edge)	Base is more stable
DevOps long documents	Both require larger max_tokens

🌟 Qwopus3.5-9B-coder

🚀 Model Fine-Tuning and Logical Alignment (Qwopus3.5-9B-coder)

The base model, Qwopus3.5-9B-v3.5, is already highly capable. Building on it, Qwopus3.5-9B-coder is specifically optimized and fine-tuned for high-performance 🤖 Agentic Coding, complex Tool Calling, and logical reasoning.

💡 Why the 9B Dense Model? We believe the 9B dense architecture is the "sweet spot" for large language models. It runs smoothly at 8-bit precision on entry-level devices with 16GB of RAM, such as standard laptops and the Mac mini, making it lightweight yet highly versatile. Without expensive hardware, it delivers strong results at good inference speeds. In short, we consider Qwen3.5-9B the best open-source model in its class.
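The 16GB claim can be sanity-checked with a back-of-the-envelope estimate, assuming weight memory is roughly parameters × bits per weight / 8 bytes. This is a rough rule of thumb, not a measurement of this model: real quantization formats spend extra bits on scales, and the KV cache and runtime need memory on top. `weight_memory_gb` is a hypothetical helper; the bit widths are the ones listed under hardware compatibility for this repository.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate size of the weights alone, in decimal GB (1 GB = 1e9 bytes).

    Ignores the KV cache, activations, runtime overhead, and the extra metadata
    that real quantization formats store, so actual usage is higher.
    """
    return num_params * bits_per_weight / 8 / 1e9


for bits in (2, 3, 4, 5, 6, 8, 16):
    print(f"{bits:>2}-bit: ~{weight_memory_gb(9e9, bits):.1f} GB")
```

At 8 bits a 9B-parameter model needs about 9 GB for its weights, leaving headroom on a 16GB machine, while 16-bit weights (about 18 GB) would not fit.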

Vision & Tool Calling Support: This model supports visual capabilities and tool calling. To enable vision, please place the mmproj.gguf file from the GGUF repository into the same directory as the main .gguf file.

🛠 Training Strategy

The fine-tuning process combines Trace Inversion data augmentation with high-quality Agent Traces. This approach strengthens the model's ability to solve complex programming tasks and improves its logical coherence and accuracy when using tools.

This model is designed specifically for the following goals:

🧩 More structured and stronger logical reasoning capabilities, reducing repetitive thinking
💻 More powerful capabilities in code writing, debugging, and repository-level task processing
🛠 More stable and accurate Tool Calling capabilities for terminal commands, file operations, and browsers
🔁 Better alignment when distilling from multiple data sources

Community Release Notice: Qwopus3.5-9B-coder is released purely as an experimental community version that explores combining Agent capabilities with deep reasoning, and is intended for research and exploration only.

Warning: Because this model is vertically fine-tuned for programming agents and deep reasoning, and has not undergone comprehensive general performance evaluation, its capabilities in general domains or specific non-programming tasks may suffer from Capability Decay. Users are advised to be aware of its limitations in other scenarios while exploring its core capabilities.

📊 Baseline Performance Comparison

To verify the execution efficiency and logical robustness of Qwopus3.5-9B-coder in actual agent scenarios, we adopted the open-source testing framework benchlocal.

Test Configuration

Hardware Environment: Apple Silicon (Mac)
Inference Backend: LM Studio / MLX / GGUF
Testing Platform: benchlocal - An evaluation suite focusing on local model agent capabilities.
🍎 You can see the actual inference speeds of different model formats on the same device.

🧪 Benchmark Results

1. Complex Agent Performance - HermesAgent-20

The following is the comparative performance under the HermesAgent-20 task set:

HermesAgent-20 Performance Metrics
Model	Test Set	Comprehensive Score	Core Dimensions (M/O/S/S/B)
Qwopus3.5-9B-coder	HermesAgent-20	85	84 / 93 / 88 / 75 / 84
Qwen/Qwen3.5-9B	HermesAgent-20	71	75 / 58 / 100 / 53 / 69
armand0e/Qwen3.5-9B-Agent	HermesAgent-20	68	71 / 83 / 43 / 61 / 80
DJLougen/Harmonic-Hermes-9B	HermesAgent-20	47	60 / 45 / 23 / 69 / 38

2. Tool Call Stability - ToolCall-15

ToolCall-15 is a test set that measures how reliably the model performs tool calls:

ToolCall-15 Stability Metrics
Model	Test Set	Comprehensive Score	Dimension Scores (A/B/C/D/E)
Qwopus3.5-9B-coder	ToolCall-15	100	100 / 100 / 100 / 100 / 100
Qwen/Qwen3.5-9B	ToolCall-15	100	100 / 100 / 100 / 100 / 100
armand0e/Qwen3.5-9B-Agent	ToolCall-15	93	100 / 100 / 100 / 67 / 100

3. Code Debugging & Bug Fixing - BugFind-15

BugFind-15 is a test set containing 15 scenarios from shallow to deep, aiming to evaluate the real debugging capabilities of the model in discovering and fixing syntax, logical errors, and "trap" code in multiple programming languages through deterministic environment runtime verification.

BugFind-15 Performance Metrics
Model	Test Set	Comprehensive Score	Dimension Scores (A/B/C/D/E)
Qwopus3.5-9B-coder	BugFind-15	79	67 / 87 / 100 / 77 / 43
Jackrong/MLX-Qwen3.5-9B-DeepSeek-V4-Flash	BugFind-15	75	67 / 100 / 67 / 57 / 80
armand0e/Qwen3.5-9B-Agent	BugFind-15	58	29 / 87 / 73 / 20 / 67

🪐 SWE-bench Verified Performance (Repository-level Coding Capability)

The following shows the comparative performance on SWE-bench Verified, which evaluates language models on resolving software engineering issues in real-world open-source repositories:

SWE-bench Verified Performance Metrics
Model	Test Set	Comprehensive Score (%)
Claude 4.5 Opus	SWE-bench Verified	80.9
Qwen/Qwen3.5-27B	SWE-bench Verified	75.0
Qwen/Qwen3.6-35B-A3B	SWE-bench Verified	73.4
Qwopus3.5-9B-coder	SWE-bench Verified	53.33
google/gemma-4-31B-it	SWE-bench Verified	52.0
google/gemma-4-26B-A4B	SWE-bench Verified	45.0 - 48.0

⚙️ All tests were run at temperature 1.0, as officially recommended for Qwen3.5. When a test failed because of an error or model issue, it was regenerated up to two more times; if both regenerations also failed, the test was counted as a failure.
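One reading of this retry policy (an initial run plus up to two regenerations) is sketched below. `run_with_retries` is a hypothetical helper, not part of benchlocal, and `attempt` stands in for whatever runs and grades a single test.

```python
def run_with_retries(attempt, max_retries=2):
    """Run `attempt()` (returns True on pass); after a failure, regenerate up to
    `max_retries` more times. The test fails only if every try fails.

    Returns (passed, tries_used).
    """
    for tries in range(1, max_retries + 2):
        if attempt():
            return True, tries
    return False, max_retries + 1
```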

🍎 All screenshots of the test interfaces have been uploaded to the image folder in the repository. Click the link below to view and verify:

🔗 View Test Screenshots

❤️ Thanks to Kyle Hessling for his generous hardware and equipment support. You can follow him for more updates on X / Twitter: @KyleHessling1.

🧪 Core Dataset Usage: Trace Inversion and High-Quality Agent Traces

To overcome the "reasoning bubble" limitation of the model in real-world programming and tool use, and to give it genuine Agent behavior, two core augmented datasets were introduced during training:

1. Reasoning Synthetic Data Combining Trace Inversion

Based on public information, commercial models such as OpenAI's GPT series and Anthropic's Claude series hide the true internal reasoning chains of their models. What is visible through the API or front-end interface is often only a highly compressed summary, which we call a "Reasoning Bubble".

To overcome this limitation, we adopted Trace Inversion. This technique uses an external "surrogate model" to reconstruct a complete, logically coherent reasoning chain from the "question + final answer + compressed reasoning summary" published by a commercial model. The "reasoning bubble", originally just a few sentences with logical leaps, is expanded into a high-quality deep reasoning trace with complete derivation, calculation, and logical verification, giving the model step-by-step learning signals.
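The data flow of Trace Inversion can be sketched in two functions: one builds a training pair for the surrogate (inverter) model, the other splices a reconstructed chain back into an SFT sample. The prompt template, field names, and function names are hypothetical; the card does not publish the actual format, and `compress` stands in for the large model that produces the compressed summaries.

```python
def surrogate_training_pair(question, answer, full_chain, compress):
    """Stage A: one training example for the inverter model.

    `compress` turns a complete reasoning chain into a short summary ("reasoning
    bubble"). The inverter learns the reverse mapping:
    (question, answer, bubble) -> full chain.
    """
    bubble = compress(full_chain)
    source = (
        f"Question:\n{question}\n\n"
        f"Final answer:\n{answer}\n\n"
        f"Reasoning summary:\n{bubble}"
    )
    return {"input": source, "target": full_chain}


def splice_sft_sample(prompt, response, inverted_chain):
    """Stage B: embed the reconstructed chain in <think> tags before the original response."""
    return [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": f"<think>\n{inverted_chain}\n</think>\n\n{response}"},
    ]
```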

2. GLM-5.1 Agent Real Trace Data: lambda/hermes-agent-reasoning-traces

To significantly enhance the model's execution and coding capabilities in real environments, this model additionally introduced the lambda/hermes-agent-reasoning-traces dataset.

Data Source and Scale: This data subset contains approximately 10,000 high-quality multi-turn Tool Calling Trajectories generated with the ZhipuAI GLM-5.1 and kimi-4.6 models.
Real Agent Behavior: Unlike traditional synthetic data, these samples are real Agent conversations. Each sample contains not only the step-by-step reasoning inside <think> tags but also the actual tool execution results, rather than fabricated outputs.
Extensive Domain Coverage:
- Terminal & Coding: Script writing, code debugging, environment configuration, and data processing.
- Repository Tasks: Involving real code repository work, such as bug fixes, refactoring, and code review.
- Browser Automation: Web navigation, scraping, and form filling.
- Agent Tools: Memory persistence, task delegation, skill management, etc.

By learning from these Agent trajectories, which contain real feedback and the reasoning behind each step, Qwopus3.5-9B-coder can approach complex programming and system-operation tasks in a way closer to how human experts think and act.

🗺️ Training Pipeline Overview

The training of this model follows a phased pipeline that integrates Trace Inversion data augmentation with high-quality Agent Trajectory data. The core idea is to expand the highly compressed "reasoning bubbles" of commercial models into full reasoning paths the model can learn from, and to combine them with real agent operation traces to improve the model's logical reasoning and code execution capabilities.

       [ 🗺️ Trace Inversion: Full Process of Data Inversion and "Attack" Distillation ]

  A. Surrogate Model Training
     Open Source Model (GLM-5.1 / DS-V4) ──► Complete Reasoning Chain ──► [ Qwen3-235B Compression ] ──► Reasoning Bubbles
                                       │                                   │
                                       └──────────► [ Training ] ◄─────────┘
                                            (Base: Qwen3-4B-Instruct)
                                            (Result: Trace-Inverter-4B)

  B. Inversion Phase: "Attacking" Claude-4.7-Max
     _______________________________________________________
    |                                                       |
    |  Claude-4.7-Max API ──► Compressed Bubbles + Final Answer |
    |_______________________________________________________|
                      │
                      ▼
    [ 🧠 Trace-Inverter-4B (Logical Reconstructor) ] ────► Synthetic CoT
                      │
                      ▼
    [ 🧩 Data Splicing ] ◄────────── (Original Prompt + Response)
    (Embed the inverted chain of thought into <think> tags, and splice with the original Q&A pair for restoration)
                      │
                      ▼
            (Result: claude-opus-4.6/4.7 Inversion Set)

  C. Final SFT Pipeline
     ___________________________________________
    |                                           |
    |      Base Model (Qwopus3.5-9B-v3.5)       |
    |___________________________________________|
                      │
                      ▼
    [ 📦 Stage 1: Format Establishment and Logic Injection ] ───────► [ 🛠️ Stage 2: Agent Trajectories and Programming Reinforcement ]
     (Integrate inverted reasoning data, stabilize thinking format)        (Introduce GLM-5.1 Agent Trajectories, reinforce interaction and execution)
                      │                                 │
                      │                                 ▼
                      │           __________________________________________________
                      │          |  🔍 Hermes Agent Trace Sample Structure Breakdown (GLM-5.1) |
                      │          |  1. [🛠️ System] -> JSON Tool Definition          |
                      │          |  2. [👤 Human]  -> Initial Task Instruction        |
                      │          |  ┌──────────────────────────────────────────────┐ |
                      │          |  │ 🔁 Multi-turn Loop:                           │ |
                      │          |  │ 3. [🧠 GPT]  -> <think> Logical Reasoning/Reflection │ |
                      │          |  │ 4. [🤖 GPT]  -> Tool Call Execution Action    │ |
                      │          |  │ 5. [⚙️ Tool] -> Real Feedback                 │ |
                      │          |  └──────────────────────────────────────────────┘ |
                      │          |__________________________________________________|
                      │                                 │
                      └────────────────┬────────────────┘
                                       ▼
                      ___________________________________
                     |                                   |
                     |   🌟 Final Model: Qwopus3.5-9B-coder  |
                     |___________________________________|

Because agent trajectory datasets are complex and diverse, the datasets underwent rigorous cleaning and formatting.
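The card does not describe the cleaning step in detail. As one hypothetical example of a structural check, the sketch below validates the turn order shown in the Hermes trace breakdown, assuming ShareGPT-style dicts with a "from" field ("system", "human", "gpt", "tool"); the real schema of lambda/hermes-agent-reasoning-traces may differ, and `validate_trajectory` is an illustrative name.

```python
def validate_trajectory(turns):
    """Check turn order: one "system" turn (tool definitions), one "human" task,
    then a loop in which each "gpt" turn is answered by a "tool" result, ending
    with a final "gpt" turn. Returns a list of problems (empty means OK).
    """
    roles = [turn.get("from") for turn in turns]
    problems = []
    if roles[:2] != ["system", "human"]:
        problems.append("sample must start with a 'system' turn and a 'human' turn")
    previous = "human"
    for index in range(2, len(roles)):
        # After the task or a tool result the model speaks; after the model, a tool answers.
        expected = "gpt" if previous in ("human", "tool") else "tool"
        if roles[index] != expected:
            problems.append(f"turn {index}: expected {expected!r}, got {roles[index]!r}")
        previous = roles[index]
    if len(roles) < 3 or roles[-1] != "gpt":
        problems.append("sample must end with a final 'gpt' turn")
    return problems
```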

🎯 Three-Stage Curriculum Learning

Qwopus3.5-9B-coder adopts a phased reasoning data mixture strategy similar to Curriculum Learning, gradually increasing the difficulty and complexity of training signals:

Early Stage (Format Establishment): Focuses on short-to-medium length reasoning samples with stable formats. The primary goal of this stage is to establish a reliable, structured new reasoning format while avoiding overwhelming the model with extreme complexity.
Middle Stage (Complexity Scaling & Multi-Teacher Distillation): Gradually increases the proportion of complex reasoning samples from multiple teacher models.
- The distillation data comes from stronger models whose style distribution closely matches the base model, keeping the capability gap narrow enough for efficient learning.
Late Stage (Long-Context Reinforcement & Drift Prevention): Reinforces reasoning capabilities in long contexts. Crucially, this stage retains short-sample replay to ensure the model maintains its short-context instruction-following capability and minimizes capability drift.
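A staged data mixture like this can be expressed as per-stage sampling weights. In the sketch below the weights are invented purely for illustration (the card does not publish the real proportions), as are the pool names; it only shows the shape of the idea: complex multi-teacher data grows from early to middle stage, long-context data enters late, and a short-sample share is kept as replay.

```python
import random

# Illustrative weights only; NOT the proportions used to train this model.
STAGES = {
    "early": {"short_medium": 0.8, "complex_multi_teacher": 0.2, "long_context": 0.0},
    "middle": {"short_medium": 0.4, "complex_multi_teacher": 0.6, "long_context": 0.0},
    "late": {"short_medium": 0.2, "complex_multi_teacher": 0.3, "long_context": 0.5},
}


def sample_batch(pools, stage, batch_size, rng):
    """Draw a batch whose source mix follows the stage's weights.

    In the late stage, the short_medium share acts as short-sample replay.
    """
    weights = STAGES[stage]
    names = [name for name, w in weights.items() if w > 0]
    chosen = rng.choices(names, weights=[weights[n] for n in names], k=batch_size)
    return [rng.choice(pools[name]) for name in chosen]
```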

🤝 Collaboration & Training Details

This model is the result of continuous exploration in Agentic AI and reasoning capabilities.

Training Infrastructure & Configuration:

🖥️ Hardware: Local compute devices / Cloud GPUs (e.g. GB10 / H100 / RTX 5090 / A100)
⚙️ Framework: Unsloth for efficient fine-tuning

⚠️ IMPORTANT

Compatibility and Deployment Notice

Tool Calling Format: When using this model for tool calling, use a prompt format and system prompt that match the training data so that its Agent capabilities are activated.

Reasoning Output Extraction: The model's thinking process is typically wrapped in <think> and </think> tags. When deploying to front-end applications, these tags may need to be parsed and hidden.
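A minimal parsing sketch for front-end use is shown below; `split_reasoning` is a hypothetical helper. It also assumes two edge cases that may or may not occur with your chat template: the opening `<think>` tag being part of the prompt (so only `</think>` appears in the output), and generation stopping at max_tokens before `</think>` is emitted.

```python
import re

THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Separate <think>...</think> reasoning from the visible answer.

    Returns (reasoning, answer).
    """
    reasoning_parts = [part.strip() for part in THINK_BLOCK.findall(text)]
    rest = THINK_BLOCK.sub("", text)
    if "</think>" in rest:
        # Opening tag was in the prompt template, so only the closing tag appears.
        head, _, rest = rest.partition("</think>")
        reasoning_parts.insert(0, head.strip())
    if "<think>" in rest:
        # Unclosed tag: generation stopped (e.g. at max_tokens) mid-reasoning.
        rest, _, tail = rest.partition("<think>")
        reasoning_parts.append(tail.strip())
    return "\n".join(reasoning_parts), rest.strip()
```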

📚 Resources & Guides

👉 GitHub Repository: Jackrong-llm-finetuning-guide. Visit the repository to dive into our fine-tuning codebase and guides.

🙏 Acknowledgements

Special thanks to:

The Qwen team for the strong Qwen3.5-9B base model.
Unsloth for efficient fine-tuning frameworks.
Open-source datasets and community contributors.

📖 Citation

@misc{jackrong_qwopus35_9b_coder,
  title        = {Qwopus3.5-9B-coder},
  author       = {Jackrong},
  year         = {2026},
  publisher    = {Hugging Face}
}


Format: GGUF

Model size: 9B params

Architecture: qwen35

Hardware compatibility: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit, 16-bit

Model tree for phucngodev/Qwopus3.5-9B-Coder-MTP

Base model: Qwen/Qwen3.5-9B-Base
Finetuned: Qwen/Qwen3.5-9B
Finetuned: unsloth/Qwen3.5-9B
Finetuned: Jackrong/Qwopus3.5-9B-v3.5
Adapter (4): this model