Instructions to use Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF", dtype="auto")

vLLM

How to use Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
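
The same OpenAI-compatible endpoint can also be called from Python. The sketch below uses only the standard library and sends a text-only coding prompt; the helper names `build_chat_request` and `send` are illustrative, and the port assumes the default `vllm serve` address used in the curl example above. The request is built but not sent, so the snippet runs without a server.

```python
import json
import urllib.request

MODEL_ID = "Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF"


def build_chat_request(prompt: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


req = build_chat_request("Write a Python function that reverses a string.")
print(req.full_url)
# With the server running, uncomment to get a reply:
# print(send(req))
```

For the SGLang server shown further down, pass `base_url="http://localhost:30000"` instead.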

Use Docker

docker model run hf.co/Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF

SGLang

How to use Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Unsloth Studio

How to use Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF to start chatting

Load model with FastModel

# Install Unsloth first (run in a shell): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF",
    max_seq_length=2048,
)

Docker Model Runner
How to use Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF with Docker Model Runner:
```
docker model run hf.co/Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

🪐 Qwopus-3.6-27B-Coder

Coder SFT Release

Agentic Coding & Tool-Use Reasoning Model Fine-Tuned on Qwopus3.6-27B-v2

🧬 Trace Inversion & Negentropy | 🧠 27B Dense Model | ⚡ Agentic Coding | 🛠️ Tool Calling & Agent | 🏆 SWE-bench Verified: 67.0% (off-thinking)

💡 What is Qwopus-3.6-27B-Coder?

🪐 Qwopus-3.6-27B-Coder is a reasoning-enhanced agentic coding model built on top of Qwopus3.6-27B-v2. It inherits the powerful reasoning foundation of the v2 base — which achieved 87.43% MMLU-Pro (300ex) and 75.25% SWE-bench Verified — and further specializes it for agentic code generation, structured tool calling, debugging, and instruction-following in developer workflows. The model is designed to excel at repository-level coding tasks, multi-turn tool orchestration, and complex logical reasoning under realistic agent environments.

🧩 Agentic Coding: Optimized for repository-level coding, debugging, patch generation, and structured multi-step development workflows.

🛠️ Tool Calling: Learns from real agent trajectories with tool definitions, tool calls, and environment feedback for robust multi-turn execution.

🧬 Trace Inversion: Inherits the full Qwopus training recipe with reconstructed step-by-step reasoning trajectories from Claude Opus.

🚀 27B Scale: Dense 27B parameters with native long-context support, delivering deep reasoning with practical single-GPU deployability.

Community Release Notice: Qwopus-3.6-27B-Coder is an experimental community release intended for research, evaluation, and agent workflow exploration. It has not undergone full safety evaluation or broad general-domain benchmarking.

Benchmark Status: The first completed benchmark is SWE-bench Verified full 500 in thinking-off / no-thinking mode, where the Q5_K_M 27B GGUF run resolved 335/500 = 67.0%. Other benchmark suites remain pending and will be updated as testing completes.

💡 1. Base Model, Training Stack & Collaboration

🧠 1.1 Base Model: Qwopus3.6-27B-v2

Qwopus3.6-27B-v2 is a reasoning-enhanced dense language model built on Qwen3.6-27B. Through a multi-stage curriculum learning pipeline and Trace Inversion augmentation, it achieves strong performance across knowledge, coding, and reasoning benchmarks. This coder variant inherits that foundation and extends it with specialized coding and tool-use data.

Attribute	Specifications & Details
🧠 Architecture	Dense Transformer / 27 Billion Parameters
🏢 Base Developer	Alibaba Cloud (DAMO Academy) — Qwen3.6-27B
🎯 Primary Focus	Agentic coding, tool-use stability, code debugging, structured instruction following, repository-level tasks
🧬 Distillation Strategy	Trace Inversion + high-quality agent trajectories + curriculum SFT
📄 Context Window	Native support up to 32K tokens (fine-tuning target); compatible with longer contexts via RoPE/YaRN scaling

🧪 1.2 Hardware Cooperation & Joint Collaboration

This project was built in close collaboration with engineer Kyle Hessling, whose hardware infrastructure and training support made stable 27B-scale fine-tuning and evaluation possible.

👉 You can follow him for hardware and model training updates on X / Twitter: @KyleHessling1

🦥 1.3 Fine-Tuning Framework (Unsloth)

The model training workflow is accelerated and memory-optimized with Unsloth. Special thanks to the Unsloth team for making efficient large-model fine-tuning accessible.

👉 Documentation and fine-tuning guidance: unsloth.ai/docs

⚡ 1.4 MTP Variant: Faster Speculative Decoding

A Multi-Token Prediction (MTP) variant of this model is also available, featuring auxiliary prediction heads (draft=2) for speculative decoding. Based on the Qwopus3.6-27B-v2-MTP benchmark, the MTP variant achieved ~1.66x speedup over standard decoding with preserved accuracy. See the Qwopus3.6-27B-v2-MTP model card for detailed MTP performance analysis.
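
As a rough illustration of why draft heads can speed up decoding, the sketch below computes the expected number of tokens emitted per verification pass. It rests on a simplifying assumption that is not taken from this card: each drafted token is accepted independently with probability `accept_rate`, and acceptance stops at the first rejection. The function name and the example rates are hypothetical. The reported ~1.66x figure is a measured end-to-end result that also reflects draft-head overhead, which this toy model ignores.

```python
def expected_tokens_per_pass(accept_rate: float, draft_len: int = 2) -> float:
    """Expected tokens emitted per target-model pass with `draft_len` drafted tokens.

    Every pass yields at least one token from the target model itself; each
    additional drafted token counts only if all earlier drafts were accepted.
    """
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be in [0, 1]")
    # 1 + a + a^2 + ... + a^draft_len
    return sum(accept_rate ** i for i in range(draft_len + 1))


for rate in (0.0, 0.5, 1.0):
    print(f"accept_rate={rate}: {expected_tokens_per_pass(rate)} tokens per pass")
```

With `draft_len=2`, the value ranges from 1 token per pass (no draft ever accepted) to 3 tokens per pass (every draft accepted).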

🌟 The custom MTP-head processing pipeline is open-sourced in qwen-mtp-gguf. If you find this toolkit helpful, please consider leaving a star on GitHub!

📖 2. Background & Motivation

🎯 2.1 Why a 27B Coder Model?

The Qwopus coder line has demonstrated strong results at the 4B and 9B scales. The 27B coder variant represents a significant leap in reasoning depth, code generation quality, and tool-use robustness. At 27B parameters, the model has sufficient capacity to internalize complex repository structures, multi-file dependencies, and nuanced tool-calling patterns — while remaining deployable on a single GPU (e.g., RTX 5090). This scale bridges the gap between compact local models and expensive API-based solutions, making it suitable for production agentic coding workflows.

🧬 2.2 Trace Inversion & Agent Behavior

Commercial and frontier models often expose only compressed reasoning summaries. Qwopus-style training uses Trace Inversion to reconstruct these compressed "Reasoning Bubbles" into fuller learnable reasoning traces. For coding, this is paired with agent trajectories that include tool definitions, tool calls, and real feedback, teaching the model to reason through interactive work rather than only produce static answers.

This model integrates:

claude-opus-4.6-traceInversion-9000x: 9,000 high-value, fully reconstructed step-by-step reasoning trajectories.
claude-opus-4.7-traceInversion-5000x: 5,000 complex multi-turn logic and mathematics samples optimized for negative entropy reconstruction.
lambda/hermes-agent-reasoning-traces: ~10,000 high-quality multi-turn tool-calling trajectories from GLM-5.1 and kimi-4.6 models.

📦 2.3 Special Dataset: Trace Inversion & Agent Traces

Trace Inversion: Uses a specialized logical reconstructor, Trace-Inverter-4B, to reverse-engineer compressed reasoning bubbles into complete, step-by-step learnable CoT chains. This approach addresses the "Information Entropy Trap" — where direct imitation of compressed summaries leads to reasoning fractures — by ensuring the model learns continuous, rigorous logical derivations.

Agent Traces (lambda/hermes-agent-reasoning-traces): Each sample contains real multi-turn tool execution results (not fabricated outputs), with step-by-step reasoning inside <think> tags. Coverage includes:

Terminal & Coding: Script writing, debugging, environment configuration
Repository Tasks: Bug fixing, refactoring, code review
Browser Automation: Web navigation, scraping, form filling
Agent Tools: Memory persistence, task delegation, skill management

📊 3. Performance Benchmarks

📊 Evaluation & Performance Metrics

First completed result: SWE-bench Verified full 500, evaluated in no-thinking mode for fast local agentic coding.

⚡ No-Thinking SWE-bench Result: This benchmark was intentionally run with thinking disabled. The goal is to show the model's practical coding ability when used as a fast local agent, without relying on long visible reasoning traces. On an RTX 5090 with MTP enabled, the model runs at approximately 100 tokens/sec, making this result especially relevant for interactive development workflows.

SWE-bench Verified: 67.0% (335 / 500 resolved)
Inference Mode: Thinking Off (no visible CoT required)
Local Throughput: ~100 t/s (RTX 5090 + MTP)
Evaluation Build: Q5_K_M (27B GGUF quant)

Evaluation setup: SWE-bench Verified full 500, Qwopus-3.6-27B-Coder Q5_K_M GGUF, thinking-off / no-thinking mode. Final score: 335/500 = 67.0%.

💻 3.1 SWE-bench Verified: Full 500 No-Thinking Result

SWE-bench Verified measures whether a model can solve real GitHub issues by editing repository code and passing the hidden tests. In this run, Qwopus-3.6-27B-Coder solved 335 out of 500 verified tasks while running in no-thinking mode, prioritizing direct action quality and local speed over long explicit reasoning.

Metric	Result	Notes
Final score	335/500 = 67.0%	Full SWE-bench Verified 500-task split
Mode	Thinking off	No long visible chain-of-thought during evaluation
Quantization	Q5_K_M GGUF	Local 27B quantized deployment
Throughput	~100 tokens/sec	Observed on RTX 5090 with MTP enabled

🧩 3.2 Repository-Level Breakdown

The result is strongest on practical library-maintenance tasks such as scikit-learn, xarray, requests, and Django, while also showing solid coverage on symbolic mathematics, test infrastructure, documentation tooling, and plotting libraries.

Repository	Resolved	Rate
scikit-learn	27/32	84%
pydata/xarray	18/22	82%
psf/requests	6/8	75%
django	166/231	72%
sympy	48/75	64%
pytest	12/19	63%
sphinx-doc	26/44	59%
matplotlib	20/34	59%
astropy	9/22	41%
pylint	2/10	20%
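
The per-repository rates can be re-derived from the resolved counts. The short check below recomputes each rate and totals the listed rows; the variable names are illustrative. The listed repositories sum to 334 resolved out of 497 tasks, so 3 tasks of the 500-task split (1 of them resolved, given the 335 headline) fall outside the repositories shown here.

```python
# (resolved, total) per repository, copied from the breakdown above.
breakdown = {
    "scikit-learn": (27, 32),
    "pydata/xarray": (18, 22),
    "psf/requests": (6, 8),
    "django": (166, 231),
    "sympy": (48, 75),
    "pytest": (12, 19),
    "sphinx-doc": (26, 44),
    "matplotlib": (20, 34),
    "astropy": (9, 22),
    "pylint": (2, 10),
}

# Recompute each rate, rounded to a whole percent as in the table.
for repo, (resolved, total) in breakdown.items():
    print(f"{repo:<14} {resolved:>3}/{total:<3} {resolved / total:.0%}")

listed_resolved = sum(resolved for resolved, _ in breakdown.values())
listed_total = sum(total for _, total in breakdown.values())
print("listed rows:", listed_resolved, "/", listed_total)
print("not listed: ", 335 - listed_resolved, "/", 500 - listed_total)
```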

⚖️ 3.3 SWE-bench Verified Reference Comparison

Important comparison note: the reference scores below are from external model reports and are generally thinking-enabled or harness-specific where noted. Qwopus-3.6-27B-Coder is shown here as a no-thinking, quantized local run, so this table should be read as positioning context rather than a strict same-mode leaderboard.

Model	Thinking Mode	SWE-bench Verified	Context
Qwopus-3.6-27B-Coder	Off / No-thinking	67.0	Q5_K_M, RTX 5090 + MTP, ~100 t/s
OpenAI GPT-5	On	70.1	Thinking-on reference
OpenAI GPT-5 mini	On	59.8	Thinking-on reference
OpenAI GPT-5 nano	On	34.8	Thinking-on reference
GLM-4.7	On	70.6	OpenHands reference
GLM-4.5-Air	On	57.6	OpenHands reference
Qwen3-Coder-30B-A3B-Instruct (2025-07)	Off / No-thinking	70.3	No-thinking reference
Claude 4.0 Opus	On	67.6	Thinking-on reference
Claude 4.5 Opus	On	80.9	Thinking-on reference
Qwen3.6-27B	On	77.2	Thinking-on reference
Qwen3.5-397B-A17B	On	76.2	Thinking-on reference
Qwen3.5-27B	On	75.0	Thinking-on reference
Qwen3.6-35B-A3B	On	73.4	Thinking-on reference
Gemma4-31B	On	52.0	Thinking-on reference
Gemma4-26B-A4B	On	17.4	Thinking-on reference

🎮 3.4 Live Thinking-Disabled Demo: Boat Survival

Kyle Hessling also tested Qwopus-3.6-27B-Coder in a small interactive game environment with thinking disabled. The demo is a practical smoke test for fast decision-making, instruction adherence, and local responsiveness beyond static benchmark tables.

Links: Open the Hugging Face Space | View Kyle's reference post

[Screenshot: Boat Survival demo running Qwopus-3.6-27B-Coder with thinking disabled]

Takeaway: The headline is not that this no-thinking local run beats every thinking-enabled frontier reference. The important result is that a quantized 27B local coder can reach 67.0% on the full SWE-bench Verified split while staying fast enough for interactive agent loops. This makes Qwopus-3.6-27B-Coder a practical option for developers who want strong repository-level repair performance without paying the latency cost of long reasoning mode.

🗺️ 4. Training & Data Pipeline Overview

The training process fuses Trace Inversion data augmentation with a Three-Stage Curriculum Learning pipeline. The core engineering focuses on expanding context length gradually while training on reconstructed reasoning traces and real agent trajectories to keep the output format stable.

       [ 🗺️ Trace Inversion: Reconstructing Distillation Workflow ]

  A. Surrogate Model Training (Trace Inverter)
     Open-source Model (GLM-5.1 / DS-V4) ──► Complete Reasoning Chain ──► [ Qwen3-235B Compression ] ──► Reasoning Bubbles
                                              │                                   │
                                              └──────────► [ Training ] ◄─────────┘
                                                   (Base: Qwen3-4B-Instruct)
                                                   (Result: Trace-Inverter-4B)

  B. Inversion Phase: Reconstructing Claude-4.7-Max
     _______________________________________________________
    |                                                       |
    |  Claude-4.7-Max API ──► Compressed Bubbles + Answer   |
    |_______________________________________________________|
                      │
                      ▼
    [ 🧠 Trace-Inverter-4B (Logic Reconstructor) ] ──► Synthetic Deep Reasoning Trace (Learnable CoT)
                      │
                      ▼
    [ 🧩 Data Splicing ] ◄────────── (Original Prompt + Response)
    (Embed reconstructed CoT in <think> tags, splicing with original prompt/response)
                      │
                      ▼
             (Result: claude-opus-4.6/4.7 inverted sets)

  C. Final Coder SFT Curriculum Pipeline
     ___________________________________________
    |                                           |
    |       Base Model (Qwopus3.6-27B-v2)       |
    |___________________________________________|
                      │
                      ▼
    [ 📦 Phase 1: Format Inception ] ──► [ 🛠️ Phase 2: Agent/Coding Expansion ] ──► [ 🚀 Phase 3: Long-Context SFT ]
      ( < 4096 tokens )                     ( 4096 - 8192 tokens )                     ( 8192 - 32K tokens )
      (Stable <think> format)               (Tool traces + coding tasks)               (Long / multi-turn / replay)
                      │                                                                            │
                      └─────────────────────────────┬──────────────────────────────────────────────┘
                                                    ▼
                                   _______________________________________________
                                  |                                               |
                                  |   🌟 Final Model: Qwopus-3.6-27B-Coder        |
                                  |_______________________________________________|

Because agent trajectory datasets vary widely in format, rigorous cleaning and format standardization were applied to ensure data quality.
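
The data-splicing step in part B of the diagram can be pictured as follows. This is a minimal illustration, assuming a chat-style sample with `user` and `assistant` messages; the function name `splice_trace`, the record layout, and the toy sample are hypothetical and are not taken from the released datasets.

```python
def splice_trace(prompt: str, reconstructed_cot: str, response: str) -> list[dict]:
    """Build one SFT sample with the reconstructed reasoning inside <think> tags."""
    assistant_text = (
        f"<think>\n{reconstructed_cot.strip()}\n</think>\n\n{response.strip()}"
    )
    return [
        {"role": "user", "content": prompt.strip()},
        {"role": "assistant", "content": assistant_text},
    ]


sample = splice_trace(
    prompt="What is 12 * 12?",
    reconstructed_cot="12 * 12 = 12 * 10 + 12 * 2 = 120 + 24 = 144.",
    response="144",
)
print(sample[1]["content"])
```

The original prompt and response are left untouched; only the reasoning segment is replaced by the reconstructed trace.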

📚 5. Three-Stage Curriculum Learning

To steadily scale reasoning quality under long-context inference, Qwopus-3.6-27B-Coder uses a curriculum-style data mixture building on the approach proven in the Qwopus coder line. The model is first stabilized on short, clean reasoning samples, then exposed to complex coding and agent traces, and finally reinforced with longer contexts plus replay data.

Curriculum Stage	Focus & Sample Characteristics	Strategy Details
📦 Stage 1: Format Inception	• Limit context within 4,096 tokens • Emphasize stable reasoning templates	Focuses on short-to-medium length, cleanly formatted reasoning samples. The primary goal is to establish reliable structured reasoning output, including stable `<think>` boundaries, before exposing the model to longer chains.
🛠️ Stage 2: Complexity Expansion	• Extend length to 4,096 - 8,192 tokens • Introduce higher-difficulty coding and agent samples	Gradually increases the ratio of complex reasoning chains, code debugging tasks, and multi-turn tool traces. The model learns to connect reasoning, action selection, and environment feedback.
🚀 Stage 3: Long-Context SFT	• Progressively scale samples up to 32K tokens • Use short-sample replay	Pushes the model toward long-context and multi-turn reasoning while replaying high-quality short samples to reduce instruction-following drift. The 32K figure describes the fine-tuning sequence/data mixture target, not a hard architectural limit.
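
One way to picture the length-based staging is a function that assigns each sample to a stage by token count. This is a sketch under stated assumptions: the boundaries follow the table above, but the card does not say how samples of exactly 4,096 or 8,192 tokens were assigned, how tokens were counted, or how samples above 32K were handled. Those choices, the reading of "32K" as 32,768, and the name `assign_stage` are hypothetical. Stage 3's short-sample replay is not modeled.

```python
from typing import Optional

STAGE_1_MAX = 4_096    # Stage 1: samples shorter than this
STAGE_2_MAX = 8_192    # Stage 2: 4,096 up to this (boundary handling assumed)
STAGE_3_MAX = 32_768   # Stage 3: up to "32K", assumed to mean 32 * 1024


def assign_stage(num_tokens: int) -> Optional[int]:
    """Return the curriculum stage (1, 2, or 3) for a sample, or None if too long."""
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    if num_tokens < STAGE_1_MAX:
        return 1
    if num_tokens <= STAGE_2_MAX:
        return 2
    if num_tokens <= STAGE_3_MAX:
        return 3
    return None  # longer than the fine-tuning target; handling is unspecified


for n in (1_000, 4_096, 8_193, 40_000):
    print(n, "->", assign_stage(n))
```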

🎯 6. Recommended Use Cases & Known Limits

✅ Good Fits

Agentic code generation and repository-level debugging
Complex tool-call orchestration
Structured multi-step reasoning
Code review and patch generation
DevOps scripting and automation
Any workflow requiring deep logical reasoning combined with tool execution

❌ Known Limits

As a specialized coder model, it has not undergone comprehensive general-domain safety evaluation. Capability may degrade on non-coding or non-agent tasks. Tool-call behavior depends strongly on prompt format and tool schema consistency. Contexts beyond 32K may require RoPE/YaRN scaling.

Deployment note: The model may emit reasoning inside <think> and </think> tags. Front-end applications and agent frameworks should parse or hide these sections where appropriate. For tool calling, ensure the prompt format and system prompt match the training data configuration to activate agent capabilities.

⚠️ 7. Training & Deployment Notes

Compatibility Notes

Tool Calling Format: To activate the model's agent capabilities, ensure the prompt format and system prompt include appropriate tool definitions and match the training data format.

Reasoning Output Extraction: The model's thinking process is wrapped in <think> and </think> tags. Front-end applications may need to parse and hide these tags.
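
A front end can separate the reasoning from the visible answer with a small parser. The sketch below is one possible approach, assuming complete, non-nested `<think>...</think>` blocks; the function name `split_reasoning` and the sample output are illustrative. A streaming client would additionally need to handle a `<think>` tag that has not been closed yet, which this version does not do.

```python
import re

# Non-greedy match so several separate <think> blocks are handled one by one.
_THINK_BLOCK = re.compile(r"<think>(.*?)</think>", flags=re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, visible_answer) extracted from raw model output."""
    reasoning = "\n".join(block.strip() for block in _THINK_BLOCK.findall(text))
    visible = _THINK_BLOCK.sub("", text).strip()
    return reasoning, visible


raw = (
    "<think>\nThe bug is an off-by-one in the loop bound.\n</think>\n\n"
    "Change `range(n)` to `range(n + 1)`."
)
reasoning, answer = split_reasoning(raw)
print("hidden:", reasoning)
print("shown: ", answer)
```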

Long-Context Usage: For contexts beyond 32K, consider enabling RoPE/YaRN scaling (e.g., --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768 in llama.cpp).
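
The example flags correspond to a 4x extension of the 32,768-token window (4 x 32,768 = 131,072 tokens). The helper below is a small sketch of that relationship: it derives the scale factor from a target context length and assembles the same flags quoted above, plus llama.cpp's `-c` context-size option. The function name `yarn_args` and the `model.gguf` file name are placeholders, not values from this card, and the snippet only prints the command rather than running it.

```python
ORIG_CTX = 32_768  # fine-tuning context target stated in this card


def yarn_args(target_ctx: int, orig_ctx: int = ORIG_CTX) -> list[str]:
    """Return llama.cpp context flags, adding YaRN scaling only when needed."""
    if target_ctx <= orig_ctx:
        return ["-c", str(target_ctx)]
    scale = target_ctx / orig_ctx  # e.g. 131072 / 32768 = 4
    return [
        "-c", str(target_ctx),
        "--rope-scaling", "yarn",
        "--rope-scale", f"{scale:g}",
        "--yarn-orig-ctx", str(orig_ctx),
    ]


# "model.gguf" is a placeholder path for a downloaded quantization.
command = ["llama-server", "-m", "model.gguf", *yarn_args(131_072)]
print(" ".join(command))
```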

📋 8. Benchmark Progress

The first completed evaluation is the no-thinking SWE-bench Verified run reported above. Additional local agentic benchmarks remain pending and will be added after testing.

Benchmark	Status	Result / Reference
SWE-bench Verified	✅ Completed	335/500 = 67.0% (thinking-off, Q5_K_M, RTX 5090 + MTP)
BugFind-15	📋 Pending	9B reference: 79
HermesAgent-20	📋 Pending	9B reference: 85
ToolCall-15	📋 Pending	9B reference: 100
InstructFollow-15	📋 Pending	9B reference: 93

📚 9. Resources & Guides

👉 GitHub Repository: Jackrong-llm-finetuning-guide. Access the repository to dive into the codebase and reproduce our results.

👉 Qwen MTP GGUF Processing Workflow: A custom splitting and merging methodology designed specifically for Qwen series Multi-Token Prediction (MTP) heads.

👉 benchlocal Evaluation Framework: The evaluation framework used to run the local agentic and coding benchmarks.

👉 Qwopus3.6-27B-v2 Model Card: Base model card with full MMLU-Pro, SWE-bench, and throughput benchmarks.

🙏 10. Acknowledgements

Special thanks to:

The Qwen team for providing the powerful Qwen3.6-27B base model.
Unsloth for providing the highly efficient fine-tuning framework.
Kyle Hessling for the close collaboration on hardware, training infrastructure, and evaluation support.
Open-source datasets and community contributors, particularly lambda/hermes-agent-reasoning-traces for the high-quality agent trajectory data.

📖 11. Citation

@misc{jackrong_qwopus36_27b_coder,
  title        = {Qwopus-3.6-27B-Coder},
  author       = {Jackrong},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Jackrong/Qwopus-3.6-27B-Coder}}
}

Downloads last month: 11,291
