Instructions to use Jackrong/Qwopus3.6-35B-A3B-Coder-FP8 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Jackrong/Qwopus3.6-35B-A3B-Coder-FP8 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Jackrong/Qwopus3.6-35B-A3B-Coder-FP8")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("Jackrong/Qwopus3.6-35B-A3B-Coder-FP8")
model = AutoModelForMultimodalLM.from_pretrained("Jackrong/Qwopus3.6-35B-A3B-Coder-FP8")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Jackrong/Qwopus3.6-35B-A3B-Coder-FP8 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Jackrong/Qwopus3.6-35B-A3B-Coder-FP8"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Jackrong/Qwopus3.6-35B-A3B-Coder-FP8",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
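
The same request can also be sent from Python. This is a minimal sketch using only the standard library. It assumes the server started by `vllm serve` above is listening on localhost:8000. The helper names `build_chat_request` and `ask` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "Jackrong/Qwopus3.6-35B-A3B-Coder-FP8"
# Assumption: the server started by `vllm serve` is reachable at this address.
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST request that the curl example sends."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, base_url: str = BASE_URL) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Calling `ask("What is the capital of France?")` needs a running server. For the SGLang commands further down, passing `base_url="http://localhost:30000/v1"` should target that server instead.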

Use Docker

docker model run hf.co/Jackrong/Qwopus3.6-35B-A3B-Coder-FP8

SGLang

How to use Jackrong/Qwopus3.6-35B-A3B-Coder-FP8 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Jackrong/Qwopus3.6-35B-A3B-Coder-FP8" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Jackrong/Qwopus3.6-35B-A3B-Coder-FP8",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Jackrong/Qwopus3.6-35B-A3B-Coder-FP8" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Jackrong/Qwopus3.6-35B-A3B-Coder-FP8",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use Jackrong/Qwopus3.6-35B-A3B-Coder-FP8 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Jackrong/Qwopus3.6-35B-A3B-Coder-FP8 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Jackrong/Qwopus3.6-35B-A3B-Coder-FP8 to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Jackrong/Qwopus3.6-35B-A3B-Coder-FP8 to start chatting

Load model with FastModel

# Install first (shell): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="Jackrong/Qwopus3.6-35B-A3B-Coder-FP8",
    max_seq_length=2048,
)

Docker Model Runner
How to use Jackrong/Qwopus3.6-35B-A3B-Coder-FP8 with Docker Model Runner:
```
docker model run hf.co/Jackrong/Qwopus3.6-35B-A3B-Coder-FP8
```

⚙️ Qwopus-3.6-35B-A3B-Coder

Agentic Coder Release

A thinking-off, token-efficient coding agent model built on Qwopus3.6-35B-A3B-v1 / Qwen3.6-35B-A3B.

🧠 Thinking-Off Agent · ⚡ Token-Efficient Coding · 🛠️ Tool Calling & Workflow · 🧩 35B-A3B MoE · 🎮 Game Demo Ready

💡 What is Qwopus-3.6-35B-A3B-Coder?

🪐 Qwopus-3.6-35B-A3B-Coder is a practical coding-agent fine-tune focused on execution efficiency, not simply longer visible reasoning. It is designed for real agentic coding workflows where the model repeatedly reads files, chooses tools, edits code, runs tests, reacts to errors, and summarizes work. The core goal is to complete more of these steps with less token waste, lower latency, and more stable behavior when explicit long thinking is disabled.

⚡ Fast Agent Loops: Optimized for repeated tool decisions, patching, test runs, and error-driven debugging without forcing every step into long thinking mode.

🧩 MoE Efficiency: Built from a 35B total / 3B active-parameter MoE foundation for high-throughput local coding workflows.

🛠️ Agent Harness Fit: Aims to fit Codex-style, OpenHands-style, Claude Code-style, and OpenCode-style agent harnesses.

🎮 Live Coding Demo: Includes a slot for an RTS/game-building sample generated through an agent workflow.

Community Release Notice: Qwopus-3.6-35B-A3B-Coder is an experimental community model intended for research, local coding-agent evaluation, and workflow exploration. It has not undergone complete safety evaluation or broad general-domain benchmarking.

Evaluation Mode: The central design target and comparison framing in this card is thinking-off execution. The model is evaluated for whether it can remain useful and stable without relying on long visible reasoning traces at every step.
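
How thinking is switched off depends on the serving stack and the chat template, and this card does not name the mechanism. As a hedged sketch: Qwen3-family chat templates have exposed an `enable_thinking` flag, and vLLM's OpenAI-compatible server accepts a `chat_template_kwargs` field in the request body. Whether this model's template honors the same flag is an assumption to verify against the repository's chat template. The helper name `thinking_off_payload` is illustrative.

```python
import json

MODEL_ID = "Jackrong/Qwopus3.6-35B-A3B-Coder-FP8"


def thinking_off_payload(prompt: str) -> dict:
    """Chat-completions body that asks the chat template to skip thinking."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        # Assumption: the model's chat template reads `enable_thinking`,
        # as Qwen3-family templates do; verify before relying on it.
        "chat_template_kwargs": {"enable_thinking": False},
    }


print(json.dumps(thinking_off_payload("Fix the failing test in utils.py"), indent=2))
```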

🎯 1. Fine-Tuning Objective: Less Overthinking, More Execution

🧭 1.1 Why This Model Exists

The goal of this fine-tune is not to chase longer reasoning chains for their own sake. In a real coding agent workflow, many steps are operational rather than deeply philosophical: read a file, inspect a stack trace, choose the next tool, edit code, run tests, check the error, continue, and report the result.

If every one of these steps enters a long thinking mode, the workflow can pay unnecessary costs: more tokens, higher latency, noisier state transitions, and greater risk of long-horizon behavioral drift. Qwopus-3.6-35B-A3B-Coder is tuned around a different product assumption:

Let the model do more agent work with fewer tokens, faster turns, and steadier tool behavior.
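
A back-of-the-envelope sketch of why this matters. Only the 51.80 tokens/s figure comes from this card (the FP8 validation run), and treating it as single-stream decode speed is an assumption. The step count and per-step token counts are hypothetical, chosen purely for illustration.

```python
# Grounded: throughput reported in this card's FP8 validation section.
TOKENS_PER_SECOND = 51.80


def loop_decode_seconds(steps: int, tokens_per_step: int) -> float:
    """Decode time for an agent loop, ignoring prefill and tool latency."""
    return steps * tokens_per_step / TOKENS_PER_SECOND


# Hypothetical workload: 40 agent steps, 150 action tokens per step,
# versus the same steps each preceded by 800 extra thinking tokens.
direct = loop_decode_seconds(40, 150)
with_thinking = loop_decode_seconds(40, 150 + 800)
print(f"direct: {direct:.0f} s, with thinking: {with_thinking:.0f} s")
print(f"slowdown factor: {with_thinking / direct:.1f}x")
```

Under these made-up numbers the per-step thinking budget dominates total decode time, which is the cost the thinking-off target is meant to avoid on routine steps.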

⚡ 1.2 Core Optimization Target

1. Faster next-step decisions
Identify whether to inspect, edit, test, or summarize without excessive deliberation.

2. Lower token waste
Reduce unnecessary long-form reasoning in routine implementation steps.

3. Better workflow stability
Keep multi-turn code tasks on track across file edits, tool calls, and retries.

4. Local deployment fit
Make high-frequency coding tasks more practical on local or self-hosted inference stacks.

🛠️ 1.3 Target Workflows

This model is designed to be a strong fit for Codex / OpenHands / Claude Code / OpenCode-style agent harnesses, long-running repository edits, automated debugging, multi-round tool calls, low-latency local deployment, and large-context codebase tasks where practical execution quality matters more than verbose visible thinking.

💡 2. Base Model, Training Stack & Collaboration

🧠 2.1 Base Model: Qwopus3.6-35B-A3B-v1 / Qwen3.6-35B-A3B

The coder model builds on the Qwopus3.6-35B-A3B line, itself based on Qwen3.6-35B-A3B. The underlying architecture is a hybrid sparse MoE model with 35B total parameters and approximately 3B active parameters per token, making it attractive for local high-frequency coding workloads.

Attribute	Specifications & Details
🧩 Architecture	Hybrid sparse MoE, 35B total parameters / ~3B active parameters per token
🏢 Base Developer	Alibaba Cloud / Qwen family, via unsloth/Qwen3.6-35B-A3B
🎯 Coder Focus	Agentic coding, tool-use stability, code editing, debugging, multi-turn workflow execution
⚡ Evaluation Emphasis	Thinking-off execution, token efficiency, lower latency, stable behavior across long agent loops
📄 Context	Designed for large-context repository work; exact deployment context depends on inference stack and configuration
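
The headline numbers in the table translate into two quick estimates. This is a rough sketch: the parameter counts are the rounded figures from this card, and the byte counts assume every weight is stored in the named format (scale tensors, any layers kept in higher precision, KV cache, and activations all add to the total).

```python
TOTAL_PARAMS = 35e9   # total parameters (rounded, from the table above)
ACTIVE_PARAMS = 3e9   # approximate active parameters per token

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
fp8_weight_gb = TOTAL_PARAMS * 1 / 1e9    # FP8 stores one byte per weight
bf16_weight_gb = TOTAL_PARAMS * 2 / 1e9   # BF16 stores two bytes per weight

print(f"active fraction per token: {active_fraction:.1%}")
print(f"approx. weight memory: FP8 {fp8_weight_gb:.0f} GB, BF16 {bf16_weight_gb:.0f} GB")
```

Roughly one parameter in twelve participates in each token's forward pass, which is why compute per token is closer to a 3B dense model while memory still has to hold all 35B weights.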

🧪 2.2 Hardware Cooperation & Joint Collaboration

This project is built in close collaboration with engineer Kyle Hessling, whose hardware infrastructure, training support, and live agent experiments help validate the model under practical coding workloads.

👉 Follow hardware and model training updates on X / Twitter: @KyleHessling1

📊 Benchmarks courtesy of Tom Turney, @no_stp_on_snek on X.

🦥 2.3 Fine-Tuning Framework: Unsloth

The training workflow is accelerated and memory-optimized with Unsloth. Special thanks to the Unsloth team for making efficient large-model fine-tuning more accessible.

👉 Documentation and fine-tuning guidance: unsloth.ai/docs

📊 3. Thinking-Off Agentic Evaluation

📊 Evaluation: Qwopus 3.6 35B Thinking-Off vs Ornith-1.0 35B Thinking-On

Comparison between Qwopus with thinking disabled and Ornith with thinking enabled. All benchmark runs in this section use Q5_K_M / Q5KM quantized models. Higher is better. Benchmarks courtesy of Tom Turney, @no_stp_on_snek on X.

⚡ Main Finding: In these Q5_K_M quantized evaluations, Qwopus 3.6 35B was tested with thinking disabled. The model also completed a 300-case SWE-bench submitted-patch run with a 62.4% score. In the behavioral comparison, Qwopus leads in practical execution categories such as legit-request compliance, integrity under pressure, multi-turn orchestration, large code deliverables, and sustained debugging. Ornith remains stronger in selected reasoning-oriented dimensions such as long-context recall, metacognition, engineering competence, and context-poison resistance.

🎞️ Interactive Model Deck by Kyle Hessling: Kyle created a short Hugging Face Space deck that walks through the model story visually: thinking-off agentic coding, the 35B / 3B MoE setup, MTP-assisted local inference, SWE-bench results, token-efficiency comparisons, Qwopus OFF vs Ornith ON, and the OpenCode RTS demo.

visual explainer · thinking-off workflow · SWE-bench + RTS demo

Open Kyle's interactive deck →

Average Score: 82.1 vs 78.9 (Qwopus vs Ornith)

SWE-bench: 62.4% (300 cases, submitted patches)

🧪 3.1 SWE-bench Submitted-Patch Run

Result: Qwopus-3.6-35B-A3B-Coder scored 62.4% on a 300-case SWE-bench run using thinking off and submitted patches. The evaluated model was the Q5_K_M quantized build.

Benchmark: SWE-bench

Run Size: 300 tasks

Mode: Thinking off

Quantization: Q5_K_M

Evaluation	Model / Quant	Patch Mode	Score
SWE-bench, 300 cases	Qwopus-3.6-35B-A3B-Coder Q5_K_M	Thinking off, submitted patches	62.4%

⚖️ 3.2 Numerical Scorecard

Note: Scores are held-out behavioral + long-horizon coding evaluation results on a 0-100 scale. Higher is better. The comparison intentionally contrasts Qwopus in thinking-off mode with Ornith-1.0 in thinking-on mode.

Capability Area	Qwopus 3.6 35B thinking off	Ornith-1.0 35B thinking on	Observed Pattern
Legit-request compliance	100	70	Qwopus follows allowed user intent much more reliably.
Integrity under pressure	93	86	Qwopus is more stable under adversarial or stressful workflow conditions.
Multi-turn orchestration	80	70	Qwopus better maintains state across long agent loops.
Large code deliverable	75	65	Qwopus shows stronger completion behavior for larger code artifacts.
Sustained debugging	60	50	Qwopus holds a practical edge across repeated fix-test cycles.
Long-context recall	90	95	Ornith retains a small advantage in recall-heavy thinking-on settings.
Metacognition	90	95	Ornith benefits from explicit thinking-on reflection.
Engineering competence	81	94	Ornith remains stronger in broad engineering competence.
Context-poison resistance	70	85	Ornith is more robust against context poisoning in this test.
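
The scorecard can be summarized with a few lines of arithmetic. An unweighted mean over the nine rows reproduces the 82.1 vs 78.9 averages reported earlier in this section. That the authors used an unweighted mean is an inference from this match, not something the card states.

```python
# (category, Qwopus thinking-off, Ornith thinking-on), copied from the table above.
SCORES = [
    ("Legit-request compliance", 100, 70),
    ("Integrity under pressure", 93, 86),
    ("Multi-turn orchestration", 80, 70),
    ("Large code deliverable", 75, 65),
    ("Sustained debugging", 60, 50),
    ("Long-context recall", 90, 95),
    ("Metacognition", 90, 95),
    ("Engineering competence", 81, 94),
    ("Context-poison resistance", 70, 85),
]

qwopus_avg = sum(q for _, q, _ in SCORES) / len(SCORES)
ornith_avg = sum(o for _, _, o in SCORES) / len(SCORES)
print(f"averages: Qwopus {qwopus_avg:.1f}, Ornith {ornith_avg:.1f}")  # 82.1 and 78.9

# Signed gap per category, largest Qwopus lead first; positive means Qwopus leads.
for name, q, o in sorted(SCORES, key=lambda row: row[1] - row[2], reverse=True):
    print(f"{name:<28}{q - o:+d}")
```

Qwopus leads in five of the nine categories and Ornith in four, with the widest gaps at legit-request compliance (+30 for Qwopus) and context-poison resistance (+15 for Ornith).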

Takeaway: Qwopus-3.6-35B-A3B-Coder is positioned as a practical agent execution model. The important result is not merely whether it can think longer, but whether it can keep acting correctly when the workflow demands many fast, concrete decisions. This makes it especially relevant for local coding agents, automated debugging loops, and large codebase tasks where token efficiency directly affects usability.

🎮 4. Live Agent Demo: RTS Game Sample

🎮 OpenCode / Agent Game-Building Demo

A practical visual test for whether the model can plan, code, iterate, and deliver an interactive project inside an agent workflow.

Kyle Hessling tested a pre-release build of Qwopus-3.6-35B-A3B-Coder in an OpenCode workflow by asking it to create a complete RTS-style game sample. This kind of demo is useful because it combines code generation, file orchestration, UI/gameplay logic, iterative correction, and final deliverable quality in one visible task.

View Kyle's RTS demo post

[Figure: Qwopus-3.6-35B-A3B-Coder RTS game demo screenshot]

Why this matters: a playable game demo is not a formal benchmark, but it is a high-signal smoke test for agentic coding. It exposes whether the model can maintain project structure, generate coherent state logic, and complete a visually inspectable artifact rather than only answering isolated prompts.

🗺️ 5. Training & Workflow Design

The training and evaluation philosophy for this release centers on agent execution rather than visible chain length. The model should know when to act directly, when to inspect more context, and when to stop and summarize.

       [ Qwopus-3.6-35B-A3B-Coder: Agentic Execution Pipeline ]

  Base MoE Foundation
  Qwen3.6-35B-A3B / Qwopus3.6-35B-A3B-v1
          │
          ▼
  Coding + Tool-Use Adaptation
  repository tasks, debugging traces, tool schemas, multi-turn feedback
          │
          ▼
  Thinking-Off Behavior Target
  faster next-step decisions, less overthinking, lower token waste
          │
          ▼
  Agent Harness Workflows
  read files → choose tool → edit code → run tests → inspect errors → iterate → report
          │
          ▼
  Final Objective
  stable long-horizon code execution with practical local latency

This model card intentionally frames thinking-off behavior as a product target. Long thinking can still be useful for difficult reasoning, but the release focuses on whether the model can complete real coding-agent work without paying that cost on every step.
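
The harness stage of the pipeline above can be sketched as a bounded loop. This is an illustrative skeleton, not the harness used in training or evaluation: `run_tests` and `propose_patch` are hypothetical callables standing in for a real test runner and a model call, and the toy functions exist only so the sketch runs end to end.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentReport:
    passed: bool
    iterations: int
    log: list[str] = field(default_factory=list)


def agent_loop(
    run_tests: Callable[[str], tuple[bool, str]],  # hypothetical: code -> (ok, error text)
    propose_patch: Callable[[str, str], str],      # hypothetical: (code, error) -> new code
    code: str,
    max_iterations: int = 5,
) -> AgentReport:
    """Run tests, feed errors back into the patch step, stop on success or budget."""
    report = AgentReport(passed=False, iterations=0)
    for step in range(1, max_iterations + 1):
        report.iterations = step
        ok, error = run_tests(code)
        if ok:
            report.passed = True
            report.log.append(f"step {step}: tests passed")
            break
        report.log.append(f"step {step}: {error}")
        code = propose_patch(code, error)
    return report


# Toy stand-ins for a test runner and a model-generated patch.
def toy_tests(code: str) -> tuple[bool, str]:
    return ("return a + b" in code, "expected add(2, 3) == 5")


def toy_patch(code: str, error: str) -> str:
    return code.replace("a - b", "a + b")


report = agent_loop(toy_tests, toy_patch, "def add(a, b):\n    return a - b\n")
print(report)
```

The iteration budget is the design point worth noting: a thinking-off model makes each turn cheap, so the harness, not the model, is what bounds how long a failing loop can run.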

✅ 6. Recommended Use Cases & Known Limits

✅ Good Fits

Codex-style agent workflows, OpenHands/OpenCode coding loops, repository-level debugging, multi-file patch generation, automated test-fix cycles, local tool-calling agents, DevOps scripting, code review assistance, and large-context project navigation.

⚠️ Use With Care

As a specialized coder model, it should not be assumed to be optimal for every general-domain task. Tool-call quality depends strongly on prompt format, schema consistency, and the surrounding harness. Long thinking may still help on some high-difficulty reasoning tasks where speed is less important.

Deployment note: For agent use, ensure that tool definitions, system prompts, output parsing, and retry behavior are consistent. Thinking-off models can be fast, but the harness still needs clean schemas, useful error feedback, and strict task boundaries.
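
One way to make "clean schemas" and "useful error feedback" concrete is to validate every tool call before executing it and to return a specific message on failure. The sketch below assumes a hypothetical two-tool registry and a `{"name": ..., "arguments": {...}}` call shape. Real harnesses define their own formats, so treat the names here as placeholders.

```python
import json

# Hypothetical tool registry: tool name -> required argument names and types.
TOOLS = {
    "read_file": {"path": str},
    "run_tests": {"target": str},
}


def validate_tool_call(raw: str) -> tuple[bool, str]:
    """Return (ok, feedback); feedback is a message the harness can send back."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, f"invalid JSON: {exc.msg}"
    if not isinstance(call, dict):
        return False, "tool call must be a JSON object"
    name = call.get("name")
    if name not in TOOLS:
        return False, f"unknown tool {name!r}; available: {sorted(TOOLS)}"
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return False, "arguments must be a JSON object"
    for key, expected in TOOLS[name].items():
        if not isinstance(args.get(key), expected):
            return False, f"argument {key!r} must be {expected.__name__}"
    return True, "ok"


print(validate_tool_call('{"name": "read_file", "arguments": {"path": "src/app.py"}}'))
print(validate_tool_call('{"name": "read_file", "arguments": {}}'))
```

Feeding the failure message back as the next turn gives the model something concrete to correct, rather than a silent retry of the same malformed call.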

📚 7. Resources, Acknowledgements & Citation

📚 Resources & Credits

👉 GitHub Repository: Jackrong-llm-finetuning-guide
Access the project repository and related fine-tuning guides.

👉 Q5_K_M benchmark evaluations
SWE-bench submitted-patch run plus behavioral / long-horizon coding evaluation. Benchmarks courtesy of Tom Turney, @no_stp_on_snek on X.

👉 Kyle Hessling Interactive Model Deck
Visual Hugging Face Space explaining the model story, thinking-off workflow, SWE-bench result, token efficiency, and RTS demo.

👉 Kyle Hessling RTS Game Demo Post
Reference post for the OpenCode / RTS game-building sample.

👉 Unsloth Documentation
Training acceleration and memory-efficient fine-tuning resources.

Acknowledgements: Special thanks to the Qwen team for the strong Qwen3.6 MoE base model, Unsloth for efficient fine-tuning tooling, Kyle Hessling for hardware collaboration and live agent testing, and open-source contributors building the agentic coding ecosystem.

Citation

@misc{jackrong_qwopus36_35b_a3b_coder,
  title        = {Qwopus-3.6-35B-A3B-Coder},
  author       = {Jackrong},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Jackrong/Qwopus3.6-35B-A3B-Coder}}
}

Release variant

Fine-grained FP8 E4M3 vLLM-compatible release of Jackrong/Qwopus3.6-35B-A3B-Coder using the official Qwen3.6-35B-A3B FP8 quantization format. A local vLLM smoke test and a 30-question check were run before upload; answer-only QA passed with no empty answers, no binary/unicode replacement garbage, and no max-token hits.
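
The three answer-only criteria can be expressed as a small gate function. This is a guess at what such checks might look like, not the script used for this release. It assumes responses come from an OpenAI-compatible endpoint, where a generation cut off by the token limit is reported with `finish_reason == "length"`.

```python
REPLACEMENT_CHAR = "\ufffd"  # what undecodable bytes typically turn into


def gate_failures(answer: str, finish_reason: str) -> list[str]:
    """Return the list of failed checks for one response (empty list = pass)."""
    failures = []
    if not answer.strip():
        failures.append("empty answer")
    if REPLACEMENT_CHAR in answer or "\x00" in answer:
        failures.append("binary/unicode replacement garbage")
    if finish_reason == "length":  # generation stopped at the max-token limit
        failures.append("max-token hit")
    return failures


print(gate_failures("Paris is the capital of France.", "stop"))
print(gate_failures("", "length"))
```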

FP8 release validation

This repository is the vLLM-compatible fine-grained FP8 E4M3 release of Jackrong/Qwopus3.6-35B-A3B-Coder.

Target repo: Jackrong/Qwopus3.6-35B-A3B-Coder-FP8
Source repo: Jackrong/Qwopus3.6-35B-A3B-Coder
Format: Qwen3.6 fine-grained FP8 layout with per-expert MoE tensors and *_scale_inv tensors.
Local vLLM smoke test: passed; the model loaded normally and its output did not show binary/unicode replacement garbage.
30-question vLLM test: completed 30/30; answer-only QA passed with no empty answers, no binary/unicode replacement garbage, and no max-token hits.
Observed benchmark throughput: 51.80 tokens/s.
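
For readers unfamiliar with the `*_scale_inv` naming: in block-wise FP8 checkpoints of this style, each weight block is typically stored alongside one scale factor that is multiplied back in when the weights are dequantized. The pure-Python sketch below illustrates that arithmetic on an already-decoded matrix. The 128×128 default block size and the multiply-to-dequantize convention are assumptions based on common fine-grained FP8 layouts; the repository's quantization config is the authority on the actual values. The function name is illustrative.

```python
def dequantize_blockwise(q, scale_inv, block=(128, 128)):
    """Multiply each element of `q` by the scale of the block it falls in.

    q:         2-D list of FP8 values already decoded to Python floats.
    scale_inv: 2-D list with one scale per (row-block, column-block).
    block:     (rows, cols) per block; 128x128 is an assumed default.
    """
    block_rows, block_cols = block
    return [
        [
            value * scale_inv[i // block_rows][j // block_cols]
            for j, value in enumerate(row)
        ]
        for i, row in enumerate(q)
    ]


# Tiny example with 2x2 blocks so the effect is visible.
q = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
scale_inv = [[0.5, 2.0]]  # one row-block, two column-blocks
print(dequantize_blockwise(q, scale_inv, block=(2, 2)))
```

The left 2×2 block is scaled by 0.5 and the right one by 2.0. Per-block scales let each region of a weight matrix use the narrow FP8 range independently, which is the point of a fine-grained layout.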

Local validation artifacts on the release machine:

Smoke log: /workspace/renji-training/logs/qwopus36_35b_coder_fp8_smoke.log
Benchmark report: /workspace/renji-training/Jackrong/Qwopus3.6-35B-A3B-Coder-FP8-vllm/test_data/vllm_fp8_30q_report.md
Answer-only QA report: /workspace/renji-training/Jackrong/Qwopus3.6-35B-A3B-Coder-FP8-vllm/test_data/answer_only_quality_gate.json