⚙️ Qwopus-3.6-35B-A3B-Coder

Agentic Coder Release

A thinking-off, token-efficient coding agent model built on Qwopus3.6-35B-A3B-v1 / Qwen3.6-35B-A3B.

🧠 Thinking-Off Agent ⚡ Token-Efficient Coding 🛠️ Tool Calling & Workflow 🧩 35B-A3B MoE 🎮 Game Demo Ready

💡 What is Qwopus-3.6-35B-A3B-Coder?

🪐 Qwopus-3.6-35B-A3B-Coder is a practical coding-agent fine-tune focused on execution efficiency, not simply longer visible reasoning. It is designed for real agentic coding workflows where the model repeatedly reads files, chooses tools, edits code, runs tests, reacts to errors, and summarizes work. The core goal is to complete more of these steps with less token waste, lower latency, and more stable behavior when explicit long thinking is disabled.

⚡ Fast Agent Loops Optimized for repeated tool decisions, patching, test runs, and error-driven debugging without forcing every step into long thinking mode.
🧩 MoE Efficiency Built from a 35B total / 3B active-parameter MoE foundation for high-throughput local coding workflows.
🛠️ Agent Harness Fit Aims to fit Codex-style, OpenHands-style, Claude Code-style, and OpenCode-style agent harnesses.
🎮 Live Coding Demo Includes a slot for an RTS/game-building sample generated through an agent workflow.

Community Release Notice: Qwopus-3.6-35B-A3B-Coder is an experimental community model intended for research, local coding-agent evaluation, and workflow exploration. It has not undergone complete safety evaluation or broad general-domain benchmarking.

Evaluation Mode: The central design target and comparison framing in this card is thinking-off execution. The model is evaluated for whether it can remain useful and stable without relying on long visible reasoning traces at every step.


🎯 1. Fine-Tuning Objective: Less Overthinking, More Execution

🧭 1.1 Why This Model Exists
The goal of this fine-tune is not to chase longer reasoning chains for their own sake. In a real coding agent workflow, many steps are operational rather than deeply philosophical: read a file, inspect a stack trace, choose the next tool, edit code, run tests, check the error, continue, and report the result.

If every one of these steps enters a long thinking mode, the workflow can pay unnecessary costs: more tokens, higher latency, noisier state transitions, and greater risk of long-horizon behavioral drift. Qwopus-3.6-35B-A3B-Coder is tuned around a different product assumption:
Let the model do more agent work with fewer tokens, faster turns, and steadier tool behavior.
1.2 Core Optimization Target
1. Faster next-step decisions
Identify whether to inspect, edit, test, or summarize without excessive deliberation.
2. Lower token waste
Reduce unnecessary long-form reasoning in routine implementation steps.
3. Better workflow stability
Keep multi-turn code tasks on track across file edits, tool calls, and retries.
4. Local deployment fit
Make high-frequency coding tasks more practical on local or self-hosted inference stacks.
🛠️ 1.3 Target Workflows
This model is designed to be a strong fit for Codex / OpenHands / Claude Code / OpenCode-style agent harnesses, long-running repository edits, automated debugging, multi-round tool calls, low-latency local deployment, and large-context codebase tasks where practical execution quality matters more than verbose visible thinking.

💡 2. Base Model, Training Stack & Collaboration

🧠 2.1 Base Model: Qwopus3.6-35B-A3B-v1 / Qwen3.6-35B-A3B

The coder model builds on the Qwopus3.6-35B-A3B line, itself based on Qwen3.6-35B-A3B. The underlying architecture is a hybrid sparse MoE model with 35B total parameters and approximately 3B active parameters per token, making it attractive for local high-frequency coding workloads.

Attribute Specifications & Details
🧩 ArchitectureHybrid sparse MoE, 35B total parameters / ~3B active parameters per token
🏢 Base DeveloperAlibaba Cloud / Qwen family, via unsloth/Qwen3.6-35B-A3B
🎯 Coder FocusAgentic coding, tool-use stability, code editing, debugging, multi-turn workflow execution
⚡ Evaluation EmphasisThinking-off execution, token efficiency, lower latency, stable behavior across long agent loops
📄 ContextDesigned for large-context repository work; exact deployment context depends on inference stack and configuration
🧪 2.2 Hardware Cooperation & Joint Collaboration
This project is built in close collaboration with engineer Kyle Hessling, whose hardware infrastructure, training support, and live agent experiments help validate the model under practical coding workloads.
👉 Follow hardware and model training updates on X / Twitter: @KyleHessling1
📊 Benchmarks courtesy of Tom Turney, @no_stp_on_snek on X.
🦥 2.3 Fine-Tuning Framework: Unsloth
The training workflow is accelerated and memory-optimized with Unsloth. Special thanks to the Unsloth team for making efficient large-model fine-tuning more accessible.
👉 Documentation and fine-tuning guidance: unsloth.ai/docs

📊 3. Thinking-Off Agentic Evaluation

📊 Evaluation: Qwopus 3.6 35B Thinking-Off vs Ornith-1.0 35B Thinking-On

Comparison between Qwopus with thinking disabled and Ornith with thinking enabled. All benchmark runs in this section use Q5_K_M / Q5KM quantized models. Higher is better. Benchmarks courtesy of Tom Turney, @no_stp_on_snek on X.

Main Finding In these Q5_K_M quantized evaluations, Qwopus 3.6 35B was tested with thinking disabled. The model also completed a 300-case SWE-bench submitted-patch run with a 62.4% score. In the behavioral comparison, Qwopus leads in practical execution categories such as legit-request compliance, integrity under pressure, multi-turn orchestration, large code deliverables, and sustained debugging. Ornith remains stronger in selected reasoning-oriented dimensions such as long-context recall, metacognition, engineering competence, and context-poison resistance.
🎞️
Interactive Model Deck by Kyle Hessling Kyle created a short Hugging Face Space deck that walks through the model story visually: thinking-off agentic coding, the 35B / 3B MoE setup, MTP-assisted local inference, SWE-bench results, token-efficiency comparisons, Qwopus OFF vs Ornith ON, and the OpenCode RTS demo.
visual explainer thinking-off workflow SWE-bench + RTS demo
Open Kyle's interactive deck →
Average Score 82.1 vs 78.9 Qwopus vs Ornith
SWE-bench 62.4% 300 cases, submitted patches
🧪 3.1 SWE-bench Submitted-Patch Run
Result: Qwopus-3.6-35B-A3B-Coder scored 62.4% on a 300-case SWE-bench run using thinking off and submitted patches. The evaluated model was the Q5_K_M quantized build.
BenchmarkSWE-bench
Run Size300 tasks
ModeThinking off
QuantizationQ5_K_M
Evaluation Model / Quant Patch Mode Score
SWE-bench, 300 cases Qwopus-3.6-35B-A3B-Coder Q5_K_M Thinking off, submitted patches 62.4%
⚖️ 3.2 Numerical Scorecard
Note: Scores are held-out behavioral + long-horizon coding evaluation results on a 0-100 scale. Higher is better. The comparison intentionally contrasts Qwopus in thinking-off mode with Ornith-1.0 in thinking-on mode.
Capability Area Qwopus 3.6 35B
thinking off
Ornith-1.0 35B
thinking on
Observed Pattern
Legit-request compliance10070Qwopus follows allowed user intent much more reliably.
Integrity under pressure9386Qwopus is more stable under adversarial or stressful workflow conditions.
Multi-turn orchestration8070Qwopus better maintains state across long agent loops.
Large code deliverable7565Qwopus shows stronger completion behavior for larger code artifacts.
Sustained debugging6050Qwopus holds a practical edge across repeated fix-test cycles.
Long-context recall9095Ornith retains a small advantage in recall-heavy thinking-on settings.
Metacognition9095Ornith benefits from explicit thinking-on reflection.
Engineering competence8194Ornith remains stronger in broad engineering competence.
Context-poison resistance7085Ornith is more robust against context poisoning in this test.
Takeaway: Qwopus-3.6-35B-A3B-Coder is positioned as a practical agent execution model. The important result is not merely whether it can think longer, but whether it can keep acting correctly when the workflow demands many fast, concrete decisions. This makes it especially relevant for local coding agents, automated debugging loops, and large codebase tasks where token efficiency directly affects usability.

🎮 4. Live Agent Demo: RTS Game Sample

🎮 OpenCode / Agent Game-Building Demo

A practical visual test for whether the model can plan, code, iterate, and deliver an interactive project inside an agent workflow.

Kyle Hessling tested the soon-to-release Qwopus-Coder-35B-A3B in an OpenCode workflow by asking it to create a complete RTS-style game sample. This kind of demo is useful because it combines code generation, file orchestration, UI/gameplay logic, iterative correction, and final deliverable quality in one visible task.

View Kyle's RTS demo post
Game screenshot added below
Qwopus-3.6-35B-A3B-Coder RTS game demo screenshot
Why this matters: a playable game demo is not a formal benchmark, but it is a high-signal smoke test for agentic coding. It exposes whether the model can maintain project structure, generate coherent state logic, and complete a visually inspectable artifact rather than only answering isolated prompts.

🗺️ 5. Training & Workflow Design

The training and evaluation philosophy for this release centers on agent execution rather than visible chain length. The model should know when to act directly, when to inspect more context, and when to stop and summarize.

       [ Qwopus-3.6-35B-A3B-Coder: Agentic Execution Pipeline ]

  Base MoE Foundation
  Qwen3.6-35B-A3B / Qwopus3.6-35B-A3B-v1
          │
          ▼
  Coding + Tool-Use Adaptation
  repository tasks, debugging traces, tool schemas, multi-turn feedback
          │
          ▼
  Thinking-Off Behavior Target
  faster next-step decisions, less overthinking, lower token waste
          │
          ▼
  Agent Harness Workflows
  read files → choose tool → edit code → run tests → inspect errors → iterate → report
          │
          ▼
  Final Objective
  stable long-horizon code execution with practical local latency

This model card intentionally frames thinking-off behavior as a product target. Long thinking can still be useful for difficult reasoning, but the release focuses on whether the model can complete real coding-agent work without paying that cost on every step.


✅ 6. Recommended Use Cases & Known Limits

Good Fits
Codex-style agent workflows, OpenHands/OpenCode coding loops, repository-level debugging, multi-file patch generation, automated test-fix cycles, local tool-calling agents, DevOps scripting, code review assistance, and large-context project navigation.
⚠️ Use With Care
As a specialized coder model, it should not be assumed to be optimal for every general-domain task. Tool-call quality depends strongly on prompt format, schema consistency, and the surrounding harness. Long thinking may still help on some high-difficulty reasoning tasks where speed is less important.

Deployment note: For agent use, ensure that tool definitions, system prompts, output parsing, and retry behavior are consistent. Thinking-off models can be fast, but the harness still needs clean schemas, useful error feedback, and strict task boundaries.


📚 7. Resources, Acknowledgements & Citation

📚 Resources & Credits

👉 GitHub Repository: Jackrong-llm-finetuning-guide
Access the project repository and related fine-tuning guides.

👉 Q5_K_M benchmark evaluations
SWE-bench submitted-patch run plus behavioral / long-horizon coding evaluation. Benchmarks courtesy of Tom Turney, @no_stp_on_snek on X.

👉 Kyle Hessling Interactive Model Deck
Visual Hugging Face Space explaining the model story, thinking-off workflow, SWE-bench result, token efficiency, and RTS demo.

👉 Kyle Hessling RTS Game Demo Post
Reference post for the OpenCode / RTS game-building sample.

👉 Unsloth Documentation
Training acceleration and memory-efficient fine-tuning resources.

Acknowledgements: Special thanks to the Qwen team for the strong Qwen3.6 MoE base model, Unsloth for efficient fine-tuning tooling, Kyle Hessling for hardware collaboration and live agent testing, and open-source contributors building the agentic coding ecosystem.
Citation
@misc{jackrong_qwopus36_35b_a3b_coder,
  title        = {Qwopus-3.6-35B-A3B-Coder},
  author       = {Jackrong},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Jackrong/Qwopus-3.6-35B-A3B-Coder}}
}
Downloads last month
-
GGUF
Model size
0.4B params
Architecture
clip
Hardware compatibility
Log In to add your hardware

2-bit

3-bit

4-bit

5-bit

6-bit

8-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Jackrong/Qwopus3.6-35B-A3B-Coder-MTP-GGUF

Adapter
(15)
this model

Collections including Jackrong/Qwopus3.6-35B-A3B-Coder-MTP-GGUF