- 💡 1. Base Model, Training Stack & Collaboration
- 📖 2. Background & Motivation
- 📊 3. Performance Benchmarks
- 🗺️ 4. Training & Data Pipeline Overview
- 📚 5. Three-Stage Curriculum Learning
- 🎯 6. Recommended Use Cases & Known Limits
- ⚠️ 7. Training & Deployment Notes
- 📋 8. Benchmark Progress
- 📚 9. Resources & Guides
- 🙏 10. Acknowledgements
- 📖 11. Citation
Community Release Notice: Qwopus-3.6-27B-Coder is an experimental community release intended for research, evaluation, and agent workflow exploration. It has not undergone full safety evaluation or broad general-domain benchmarking.
Benchmark Status: The first completed benchmark is the full 500-task SWE-bench Verified set, run in thinking-off (no-thinking) mode, where the Q5_K_M 27B GGUF run resolved 335/500 = 67.0%. Other benchmark suites are still pending; results will be added as testing completes.
💡 1. Base Model, Training Stack & Collaboration
📖 2. Background & Motivation
📊 3. Performance Benchmarks
🗺️ 4. Training & Data Pipeline Overview
The training process fuses Trace Inversion data augmentation with a Three-Stage Curriculum Learning pipeline. The core engineering focuses on expanding context length gradually while training on reconstructed reasoning traces and real agent trajectories to keep the output format stable.
[ 🗺️ Trace Inversion: Reconstructing Distillation Workflow ]
A. Surrogate Model Training (Trace Inverter)
Open-source Model (GLM-5.1 / DS-V4) ──► Complete Reasoning Chain ──► [ Qwen3-235B Compression ] ──► Reasoning Bubbles
│ │
└──────────► [ Training ] ◄─────────┘
(Base: Qwen3-4B-Instruct)
(Result: Trace-Inverter-4B)
B. Inversion Phase: Reconstructing Claude-4.7-Max
_______________________________________________________
| |
| Claude-4.7-Max API ──► Compressed Bubbles + Answer |
|_______________________________________________________|
│
▼
[ 🧠 Trace-Inverter-4B (Logic Reconstructor) ] ──► Synthetic Deep Reasoning Trace (Learnable CoT)
│
▼
[ 🧩 Data Splicing ] ◄────────── (Original Prompt + Response)
(Embed reconstructed CoT in <think> tags, splicing with original prompt/response)
│
▼
(Result: claude-opus-4.6/4.7 inverted sets)
C. Final Coder SFT Curriculum Pipeline
___________________________________________
| |
| Base Model (Qwopus3.6-27B-v2) |
|___________________________________________|
│
▼
[ 📦 Phase 1: Format Inception ] ──► [ 🛠️ Phase 2: Agent/Coding Expansion ] ──► [ 🚀 Phase 3: Long-Context SFT ]
( < 4096 tokens ) ( 4096 - 8192 tokens ) ( 8192 - 32K tokens )
(Stable <think> format) (Tool traces + coding tasks) (Long / multi-turn / replay)
│ │
└─────────────────────────────┬──────────────────────────────────────────────┘
▼
_______________________________________________
| |
| 🌟 Final Model: Qwopus-3.6-27B-Coder |
|_______________________________________________|
Because agent trajectory datasets vary widely in format, they were rigorously cleaned and standardized to a common format to ensure data quality.
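The Data Splicing step in the diagram above (reconstructed CoT embedded in <think> tags, then joined with the original prompt and response) can be illustrated with a minimal sketch. The function name `splice_trace`, the chat-message schema, and the exact whitespace around the tags are assumptions made for illustration; the source does not specify the real sample format.

```python
from typing import Dict, List


def splice_trace(prompt: str, reconstructed_cot: str, response: str) -> List[Dict[str, str]]:
    """Build one chat-style SFT sample (hypothetical schema).

    The reconstructed reasoning trace is wrapped in <think> tags and
    placed in front of the original response in the assistant turn.
    """
    assistant = (
        "<think>\n"
        + reconstructed_cot.strip()
        + "\n</think>\n\n"
        + response.strip()
    )
    return [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": assistant},
    ]


# Illustrative inputs only; not taken from the actual training data.
sample = splice_trace(
    prompt="Reverse a string in Python.",
    reconstructed_cot="Slicing with step -1 walks the string backwards.",
    response="Use s[::-1].",
)
print(sample[1]["content"])
```

Keeping the splice in one small function makes it easy to enforce a single, stable tag layout across every sample, which is the property Phase 1 of the curriculum relies on.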
📚 5. Three-Stage Curriculum Learning
To steadily scale reasoning quality under long-context inference, Qwopus-3.6-27B-Coder uses a curriculum-style data mixture building on the approach proven in the Qwopus coder line. The model is first stabilized on short, clean reasoning samples, then exposed to complex coding and agent traces, and finally reinforced with longer contexts plus replay data.
| Curriculum Stage | Focus & Sample Characteristics | Strategy Details |
|---|---|---|
| 📦 Stage 1: Format Inception | • Limit context within 4,096 tokens • Emphasize stable reasoning templates | Focuses on short-to-medium length, cleanly formatted reasoning samples. The primary goal is to establish reliable structured reasoning output, including stable <think> boundaries, before exposing the model to longer chains. |
| 🛠️ Stage 2: Complexity Expansion | • Extend length to 4,096 - 8,192 tokens • Introduce higher-difficulty coding and agent samples | Gradually increases the ratio of complex reasoning chains, code debugging tasks, and multi-turn tool traces. The model learns to connect reasoning, action selection, and environment feedback. |
| 🚀 Stage 3: Long-Context SFT | • Progressively scale samples up to 32K tokens • Use short-sample replay | Pushes the model toward long-context and multi-turn reasoning while replaying high-quality short samples to reduce instruction-following drift. The 32K figure describes the fine-tuning sequence/data mixture target, not a hard architectural limit. |
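As a rough illustration of the length-based staging in the table, the sketch below buckets samples by token count and builds a Stage 3 mixture with short-sample replay. The treatment of the 4,096 and 8,192 boundaries, the `replay_fraction` value, and the `(sample_id, num_tokens)` tuple layout are all assumptions; the source does not state the actual mixing ratios or data schema.

```python
import random
from typing import List, Optional, Tuple

STAGE_1_MAX = 4096        # Stage 1: strictly below 4,096 tokens
STAGE_2_MAX = 8192        # Stage 2: 4,096 to 8,192 tokens (inclusive here, an assumption)
STAGE_3_MAX = 32 * 1024   # Stage 3: up to 32K tokens

Sample = Tuple[str, int]  # (sample_id, num_tokens), a hypothetical layout


def assign_stage(num_tokens: int) -> Optional[int]:
    """Return the curriculum stage for a sample, or None if it exceeds 32K."""
    if num_tokens < STAGE_1_MAX:
        return 1
    if num_tokens <= STAGE_2_MAX:
        return 2
    if num_tokens <= STAGE_3_MAX:
        return 3
    return None


def build_stage3_mix(samples: List[Sample], replay_fraction: float = 0.1, seed: int = 0) -> List[Sample]:
    """Combine all Stage 3 samples with a random replay of Stage 1 samples.

    replay_fraction is the number of replayed short samples relative to
    the number of long samples (an illustrative knob, not a source value).
    """
    long_samples = [s for s in samples if assign_stage(s[1]) == 3]
    short_samples = [s for s in samples if assign_stage(s[1]) == 1]
    k = min(len(short_samples), int(len(long_samples) * replay_fraction))
    replay = random.Random(seed).sample(short_samples, k)
    return long_samples + replay


data = [("a", 100), ("b", 200), ("c", 10000), ("d", 20000)]
mix = build_stage3_mix(data, replay_fraction=0.5)
print(len(mix))
```

Replaying a fixed fraction of short samples is one simple way to counter the instruction-following drift the table mentions; a real pipeline might weight the replay set by quality instead of sampling uniformly.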
🎯 6. Recommended Use Cases & Known Limits
Deployment note: The model may emit reasoning inside <think> and </think> tags. Front-end applications and agent frameworks should parse or hide these sections where appropriate. For tool calling, ensure the prompt format and system prompt match the training data configuration to activate agent capabilities.
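A minimal sketch of how a front end might separate the reasoning from the visible answer, assuming the model emits well-formed, closed <think>...</think> blocks. The helper name `split_reasoning` is hypothetical, and truncated generations with an unclosed tag are not handled here.

```python
import re
from typing import Tuple

# Non-greedy match so that multiple <think> blocks are handled separately;
# DOTALL lets the reasoning span several lines.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> Tuple[str, str]:
    """Return (reasoning, visible_answer) from raw model output."""
    reasoning = "\n".join(block.strip() for block in THINK_BLOCK.findall(text))
    visible = THINK_BLOCK.sub("", text).strip()
    return reasoning, visible


raw = "<think>\nCheck edge cases first.\n</think>\n\nUse s[::-1]."
reasoning, answer = split_reasoning(raw)
print(answer)
```

Returning the reasoning instead of discarding it lets the application decide whether to hide it, log it, or show it in a collapsible panel.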
⚠️ 7. Training & Deployment Notes
Compatibility Notes
- Tool Calling Format: To activate the model's agent capabilities, ensure the prompt format and system prompt include appropriate tool definitions and match the training data format.
- Reasoning Output Extraction: The model's thinking process is wrapped in <think> and </think> tags. Front-end applications may need to parse and hide these tags.
- Long-Context Usage: For contexts beyond 32K, consider enabling RoPE/YaRN scaling (e.g., --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768 in llama.cpp).
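The example flags in the long-context note correspond to a 4x extension of a 32,768-token original context, i.e. a target of 131,072 tokens. The sketch below derives the scale factor for other targets and assembles the same three llama.cpp flags. The helper names are hypothetical, and rounding the factor up to a whole number is an assumption made for simplicity.

```python
import math
from typing import List


def yarn_scale(target_ctx: int, orig_ctx: int = 32768) -> int:
    """Smallest whole scale factor with orig_ctx * scale >= target_ctx."""
    return max(1, math.ceil(target_ctx / orig_ctx))


def llama_cpp_rope_args(target_ctx: int, orig_ctx: int = 32768) -> List[str]:
    """Assemble the RoPE/YaRN flags quoted in the note for a target context."""
    scale = yarn_scale(target_ctx, orig_ctx)
    return [
        "--rope-scaling", "yarn",
        "--rope-scale", str(scale),
        "--yarn-orig-ctx", str(orig_ctx),
    ]


# 131,072 tokens is 4 x 32,768, which reproduces the flags in the note.
print(" ".join(llama_cpp_rope_args(131072)))
```

Output quality at extended contexts should still be checked empirically, since the fine-tuning mixture described above targets sequences up to 32K.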
📋 8. Benchmark Progress
The first completed evaluation is the no-thinking SWE-bench Verified run reported above. Additional local agentic benchmarks remain pending and will be added after testing.
| Benchmark | Status | Result / Reference |
|---|---|---|
| SWE-bench Verified | ✅ Completed | 335/500 = 67.0% (thinking-off, Q5_K_M, RTX 5090 + MTP) |
| BugFind-15 | 📋 Pending | 9B reference: 79 |
| HermesAgent-20 | 📋 Pending | 9B reference: 85 |
| ToolCall-15 | 📋 Pending | 9B reference: 100 |
| InstructFollow-15 | 📋 Pending | 9B reference: 93 |
📚 9. Resources & Guides
👉 GitHub Repository: Jackrong-llm-finetuning-guide. Access the repository to dive into the codebase and reproduce our results.
👉 Qwen MTP GGUF Processing Workflow: A custom splitting and merging methodology designed specifically for Qwen series Multi-Token Prediction (MTP) heads.
👉 benchlocal Evaluation Framework: The evaluation framework used to run the local agentic and coding benchmarks.
👉 Qwopus3.6-27B-v2 Model Card: Base model card with full MMLU-Pro, SWE-bench, and throughput benchmarks.
🙏 10. Acknowledgements
Special thanks to:
- The Qwen team for providing the powerful Qwen3.6-27B base model.
- Unsloth for providing the highly efficient fine-tuning framework.
- Kyle Hessling for the close collaboration on hardware, training infrastructure, and evaluation support.
- Open-source datasets and community contributors, particularly lambda/hermes-agent-reasoning-traces for the high-quality agent trajectory data.
📖 11. Citation
@misc{jackrong_qwopus36_27b_coder,
title = {Qwopus-3.6-27B-Coder},
author = {Jackrong},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/Jackrong/Qwopus-3.6-27B-Coder}}
}