Instructions to use osmapi/osmQwopus-3.6-27B-Coder-uncensored-MXFP8 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use osmapi/osmQwopus-3.6-27B-Coder-uncensored-MXFP8 with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("osmapi/osmQwopus-3.6-27B-Coder-uncensored-MXFP8")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

Local Apps

Pi

How to use osmapi/osmQwopus-3.6-27B-Coder-uncensored-MXFP8 with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "osmapi/osmQwopus-3.6-27B-Coder-uncensored-MXFP8"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "osmapi/osmQwopus-3.6-27B-Coder-uncensored-MXFP8"
        }
      ]
    }
  }
}
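
If `~/.pi/agent/models.json` already lists other providers, pasting the block above over the file would drop them. The following is a minimal sketch of merging the entry instead, assuming the file keeps the layout shown above. The helper name `add_mlx_provider` is hypothetical and not part of Pi.

```python
import json
from pathlib import Path

MODEL_ID = "osmapi/osmQwopus-3.6-27B-Coder-uncensored-MXFP8"

def add_mlx_provider(config_path: Path, model_id: str = MODEL_ID) -> dict:
    """Merge the mlx-lm provider entry into models.json, keeping other providers."""
    # Start from the existing file if there is one, otherwise from an empty config.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    providers = config.setdefault("providers", {})
    # Same values as the JSON snippet above; only the "mlx-lm" key is replaced.
    providers["mlx-lm"] = {
        "baseUrl": "http://localhost:8080/v1",
        "api": "openai-completions",
        "apiKey": "none",
        "models": [{"id": model_id}],
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Calling `add_mlx_provider(Path.home() / ".pi" / "agent" / "models.json")` writes the entry in place.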

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use osmapi/osmQwopus-3.6-27B-Coder-uncensored-MXFP8 with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "osmapi/osmQwopus-3.6-27B-Coder-uncensored-MXFP8"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default osmapi/osmQwopus-3.6-27B-Coder-uncensored-MXFP8

Run Hermes

hermes

MLX LM

How to use osmapi/osmQwopus-3.6-27B-Coder-uncensored-MXFP8 with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "osmapi/osmQwopus-3.6-27B-Coder-uncensored-MXFP8"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "osmapi/osmQwopus-3.6-27B-Coder-uncensored-MXFP8"
# Call the OpenAI-compatible server with curl (mlx_lm.server listens on port 8080 by default)
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "osmapi/osmQwopus-3.6-27B-Coder-uncensored-MXFP8",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'
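
The same request can be sent from Python with only the standard library. This is a sketch that assumes the server is reachable at `http://localhost:8080/v1` (the base URL used in the Pi and Hermes configurations above). The helper names `build_chat_request` and `chat` are illustrative.

```python
import json
import urllib.request

MODEL_ID = "osmapi/osmQwopus-3.6-27B-Coder-uncensored-MXFP8"

def build_chat_request(prompt, base_url="http://localhost:8080/v1", model=MODEL_ID):
    """Build a POST request for the OpenAI-style /chat/completions route."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(prompt, **kwargs):
    """Send the request and return the assistant reply (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # OpenAI-compatible responses carry the text under choices[0].message.content.
    return body["choices"][0]["message"]["content"]

# Usage, once mlx_lm.server is running:
#   print(chat("Hello"))
```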

osmapi/osmQwopus-3.6-27B-Coder-uncensored-MXFP8

✅ MTP preserved in fp16 inside this model: native multi-token-prediction speculative decoding works with no external drafter.
✅ Vision tower preserved in fp16.
✅ SSM-sensitive parameters (a_log, dt_bias, conv1d) kept in fp16.

Quantized with mlx-mtp, a pure-Apple-mlx stack (no third-party ML-inference frameworks at runtime).

MXFP8 (8-bit microscaling) MLX quantization of a ZeroFuse-abliterated Qwopus 3.6 27B Coder (Jackrong's agentic-coding SFT of Qwen 3.6 27B × Claude-Opus reasoning distill). Refusals dropped from 86/100 to 8/100 at a KL divergence of 0.007 from the source model. The tensor set is identical to the base model's: 1199 tensors, including 333 vision tensors and 15 MTP tensors. By the osmAPI research team and TERV.Pro student research team.

⚡ TL;DR

Property	Value
Disk size	~29.5 GB
Scheme	MXFP8 (OCP 8-bit microscaling, `group_size=32`, E8M0 scale)
MTP speculative decoding	✅ Native, embedded — no external drafter
Vision	✅ Preserved (333 ViT weights, fp16)
Quantizer	mlx-mtp `quantize(mode=mxfp8)` — pure Apple mlx
Refusal rate (ZeroFuse, n=100)	8/100 (vs source 86/100)
KL divergence vs original	0.007
SWE-bench Verified (base Coder)	67.0% (off-thinking, 335/500)
Recommended RAM	36 GB+ Apple Silicon
Best for	Highest-fidelity local inference · vision · full MX precision
Released by	osmAPI · TERV.Pro

🧬 Lineage

Qwen/Qwen3.6-27B                              (Qwen Team — base multimodal pretrain)
        │
        ▼
Jackrong/Qwopus3.6-27B-v2                     (Jackrong — Claude-Opus reasoning distill)
        │
        ▼
Jackrong/Qwopus3.6-27B-Coder                  (Jackrong — agentic-coding SFT, Trace Inversion)
   ├── Datasets: Claude-opus-4.6-TraceInversion-9000x
   │              Claude-opus-4.7-TraceInversion-5000x
   │              hermes-agent-reasoning-traces
   └── SWE-bench Verified: 67.0% (off-thinking)
        │
        ▼
ZeroFuse abliteration (TPE-100)   (osmAPI · TERV.Pro)
   ├── 100 startup trials
   ├── Best Pareto trial: T98  direction_index=52.43
   └── Refusals 86 → 8/100  KL=0.007
        │
        ▼
MTP restore (mtp.* heads grafted back from original)  (osmAPI · TERV.Pro)
        │
        ▼
MXFP8 (8-bit microscaling) quantization via mlx-mtp (pure Apple mlx)  (osmAPI · TERV.Pro)
   └── LM → mxfp8; vision + MTP head + SSM params → fp16
        │
        ▼
this repo — osmQwopus-3.6-27B-Coder-uncensored-MXFP8

Direct upstream links:

🏛️ Foundation: Qwen/Qwen3.6-27B
🎓 Reasoning distill (v2): Jackrong/Qwopus3.6-27B-v2
🛠️ Coder SFT source: Jackrong/Qwopus3.6-27B-Coder
🔓 Abliteration tool: ZeroFuse by osmAPI
🧮 Quantizer + inference: mlx-mtp (built on Apple MLX)

📊 Abliteration Results

Measured with ZeroFuse on mlabonne/harmful_behaviors (100 hard red-team prompts) and KL divergence on mlabonne/harmless_alpaca.

Stage	Refusals (n=100) ↓	KL divergence ↓
`Jackrong/Qwopus3.6-27B-Coder` (source)	86 / 100	— (reference)
TPE best (T98) — shipped here	8 / 100	0.007

→ 90.7% reduction in refusals with coding capabilities preserved. No SFT / LoRA healing required.
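
The two reported metrics can be reproduced in form with a few lines of arithmetic. The sketch below assumes the refusal figure is a plain relative reduction and that KL is computed between next-token distributions as KL(p || q). How ZeroFuse aggregates KL over prompts and positions is not stated here, so the `kl_divergence` helper is illustrative only.

```python
import math

def relative_reduction(before, after):
    """Fraction by which a count dropped, e.g. refusals before and after abliteration."""
    return (before - after) / before

def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Refusals went from 86 to 8 out of 100 prompts: 78 / 86 of them were removed.
print(f"{relative_reduction(86, 8):.1%}")  # prints 90.7%
```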

🧪 Method

Step 1 — Abliteration (ZeroFuse TPE-100)

Setup — ZeroFuse on MPS (M-series Apple Silicon), 128 GB unified memory, batch_size=32.
TPE optimization — 100 Tree-structured Parzen Estimator trials over ZeroFuse's parameter space (direction_index, attn.o_proj.*, mlp.down_proj.*). Best trial T98 at direction_index=52.43. Only self_attn.o_proj and mlp.down_proj of the 64 decoder layers are orthogonalized — the vision tower (model.visual.*) is untouched.
Auto-save — Pareto-best trial (lowest refusals, then lowest KL) merged into base weights via ZeroFuse's adapter-merge path; saved as BF16 safetensors.
MTP restore — mtp.* heads grafted back verbatim from the original using restore_mtp_coder.py, giving an identical 1199-tensor set to the base model.
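
To illustrate what "orthogonalized" means for `o_proj` and `down_proj`, here is a minimal sketch of the commonly used directional-ablation update W' = W - r rᵀ W, where r is the unit-normalized refusal direction in residual-stream space. It is written in plain Python for readability and assumes this standard form. ZeroFuse's actual weighting and per-layer parameters are not reproduced here.

```python
import math

def orthogonalize(weight, direction):
    """Return W' = W - r r^T W, with r the unit-normalized direction.

    weight: list of rows, shape (d_out, d_in); its output is added to the residual stream.
    direction: length-d_out vector in residual-stream space.
    """
    norm = math.sqrt(sum(x * x for x in direction))
    r = [x / norm for x in direction]
    d_out, d_in = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along the direction.
    along = [sum(r[i] * weight[i][j] for i in range(d_out)) for j in range(d_in)]
    # Subtract that component so the layer can no longer write along r.
    return [[weight[i][j] - r[i] * along[j] for j in range(d_in)] for i in range(d_out)]

W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
r = [1.0, 1.0, 0.0]
W_abl = orthogonalize(W, r)
# After the update, the projection of every column onto r is (numerically) zero.
residual = [sum(ri * row[j] for ri, row in zip(r, W_abl)) for j in range(2)]
```

Rows with no component along the direction (the third row here) are left unchanged, which is why the edit can remove one behavior while leaving most of the weight matrix intact.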

Step 2 — MXFP8 (8-bit microscaling) quantization (mlx-mtp, pure Apple mlx)

from mlx_mtp.quantize import quantize
quantize(src="<MTP-restored bf16 dir>", out="<out dir>", mode="mxfp8")

Tensor-level MX quantization — language-model linears → MXFP8 (8-bit microscaling): each group of 32 weights shares an E8M0 (uint8) exponent scale, giving true 8-bit storage with hardware-accelerated matmul on Apple Silicon. No third-party ML-inference frameworks at runtime — mlx.core.quantize only.
Vision preserved — the entire ViT (model.visual.*, 333 weights) is kept fp16 by mlx-mtp's skip predicate.
MTP preserved, embedded — the MTP head (mtp.*, 15 weights) stays fp16 inside the model. mlx-mtp's engine drives it as a self-drafter: draft one token from the embedded head, verify it in one target forward pass, accept greedily, and on rejection roll back both the KV cache and the Gated-DeltaNet SSM state. Because MXFP8 keeps more weight precision than MXFP4, MTP draft acceptance is expected to be higher than with the MXFP4 variant.
SSM params preserved — the precision-sensitive Qwen3.5 Gated-DeltaNet parameters (a_log, dt_bias, conv1d) are kept in fp16 for stability.
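
The shared-scale idea behind the MX format can be sketched in a few lines. The example follows the recipe in the OCP Microscaling specification and assumes FP8 E4M3 elements (largest normal value 448). It does not call mlx-mtp or `mlx.core.quantize`, and the helper names are illustrative.

```python
import math

GROUP_SIZE = 32
E4M3_EMAX = 8      # assumed element format: E4M3, largest normal 1.75 * 2**8 = 448
E8M0_BIAS = 127    # E8M0 stores only a biased power-of-two exponent

def shared_scale_exponent(group):
    """Unbiased power-of-two exponent shared by one group of weights."""
    amax = max(abs(x) for x in group)
    if amax == 0.0:
        return -E8M0_BIAS  # all-zero group: any scale works, pick the smallest
    return math.floor(math.log2(amax)) - E4M3_EMAX

def encode_e8m0(exponent):
    """Store the exponent as a biased uint8 (no sign bit, no mantissa bits)."""
    return max(0, min(254, exponent + E8M0_BIAS))

def bits_per_weight(element_bits=8, scale_bits=8, group_size=GROUP_SIZE):
    """Storage cost per weight once the shared scale is amortized over the group."""
    return element_bits + scale_bits / group_size

group = [0.01 * (i - 16) for i in range(GROUP_SIZE)]   # largest magnitude is 0.16
exp = shared_scale_exponent(group)                     # floor(log2(0.16)) - 8 = -11
scaled = [x / 2.0 ** exp for x in group]               # these values are then cast to FP8
```

Dividing by the shared scale places the largest element of each group in the range [256, 512), and anything above 448 saturates when cast to E4M3. With one 8-bit scale per 32 weights, the quantized linears cost 8.25 bits per weight.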

📦 Use it

mlx-mtp loads this checkpoint natively (vision + MTP), on Apple mlx only:

pip install git+https://github.com/junainfinity/mlx-mtp.git

Text / Code — vanilla and native-MTP speculative decoding

from mlx_mtp.loader import load
from mlx_mtp.engine import vanilla_generate, mtp_generate

model, processor, config = load("osmapi/osmQwopus-3.6-27B-Coder-uncensored-MXFP8")

prompt = "Write a thread-safe LRU cache in Python with unit tests."

# vanilla autoregressive
print(vanilla_generate(model, processor, config, prompt, max_tokens=1024)["text"])

# native MTP speculative decode (embedded head — no external drafter)
r = mtp_generate(model, processor, config, prompt, max_tokens=1024)
print(r["text"])
print(f"{r['tps']:.1f} tok/s | accept {r['accept_rate']*100:.0f}%")
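
The accept rate printed above comes from a draft-and-verify loop. The toy below sketches that loop with one drafted token per step and greedy acceptance, using stand-in functions instead of a real model. It is not mlx-mtp's engine; `speculative_greedy`, `target_next`, and `draft_next` are hypothetical names.

```python
def speculative_greedy(target_next, draft_next, prompt, max_new):
    """Greedy draft-and-verify with one drafted token per step.

    target_next / draft_next map a token list to the next token id.
    Returns (new_tokens, verify_forwards, accept_rate).
    """
    seq = list(prompt)
    forwards = drafted = accepted = 0
    while len(seq) - len(prompt) < max_new:
        draft = draft_next(seq)            # cheap guess for the next token
        drafted += 1
        forwards += 1                      # one verify pass scores seq and seq + [draft]
        verified = target_next(seq)
        if draft == verified:
            accepted += 1
            bonus = target_next(seq + [draft])   # produced by the same verify pass
            seq += [draft, bonus]                # two tokens for one forward
        else:
            seq.append(verified)           # rejection: drop the draft, keep the target token
    new_tokens = seq[len(prompt):len(prompt) + max_new]
    return new_tokens, forwards, accepted / drafted

# Toy models: the target counts upward; the drafter is wrong after multiples of 3.
target = lambda s: (s[-1] + 1) % 10
drafter = lambda s: (s[-1] + 2) % 10 if s[-1] % 3 == 0 else (s[-1] + 1) % 10

tokens, forwards, rate = speculative_greedy(target, drafter, [0], max_new=6)
print(tokens, forwards, rate)  # [1, 2, 3, 4, 5, 6] 4 0.5
```

The output is identical to plain greedy decoding with the target alone. Only the number of target forwards changes (4 instead of 6 here), which is why a lower accept rate costs speed but not quality. A real engine must also restore the KV cache and SSM state when a draft is rejected, which the toy does not model.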

Vision (preserved ViT)

from mlx_mtp.run import _vision_generate
caption = _vision_generate(model, processor, config,
                           "Describe this screenshot and list any UI bugs.",
                           "screenshot.png", max_tokens=512)
print(caption)

MTP + DFlash hybrid (where a DFlash drafter is available)

from mlx_mtp.dflash import load_dflash_drafter
from mlx_mtp.hybrid import hybrid_generate

drafter, _ = load_dflash_drafter("z-lab/Qwen3.6-27B-DFlash")  # external block-diffusion drafter
print(hybrid_generate(model, processor, config, drafter, prompt, max_tokens=1024)["text"])

Repo	Scheme	Bits	Size	Notes
`osmapi/osmQwopus-3.6-27B-Coder-uncensored-MXFP8`	MXFP8 (8-bit microscaling)	8	~29.5 GB	✅ you are here
`osmapi/osmQwopus-3.6-27B-Coder-uncensored-MXFP4`	MXFP4 (4-bit microscaling)	4	~16.1 GB	
`Jackrong/Qwopus3.6-27B-Coder`	bf16	16	~54 GB	source (not abliterated)

⚠️ Behaviour caveats

Uncensored. Refusal directions were surgically removed; this model will answer prompts the parent would refuse. Use responsibly and within applicable law. Intended for safety research, red-teaming, creative and educational use.
Identity preserved. The model still self-identifies as Qwen (Alibaba Tongyi Lab) — abliteration does not rewrite factual self-knowledge.
Heavy chain-of-thought. Qwopus inherits Claude-Opus's verbose reasoning. For terse code output, add an instruction such as "Be brief. Output only the code, no explanation." to the prompt.
Coder SFT. Fine-tuned for agentic coding (tool-use, debugging, patch generation). General-knowledge tasks may regress vs the v2 base. Vision is preserved structurally but not the SFT focus.
MTP note. The MTP head was trained on the base model's pre-abliteration hidden states, so its draft acceptance after abliteration may be marginally lower. Decoding remains lossless: MTP only proposes tokens, and the (abliterated) main model verifies each one, so a lower acceptance rate affects speed, not output.
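
For the terse-output caveat above, one way to attach the brevity instruction is through the chat messages. This is a sketch only: whether the instruction works better as a system message or appended to the user turn is an assumption to test, and `terse_messages` is a hypothetical helper.

```python
TERSE_INSTRUCTION = "Be brief. Output only the code, no explanation."

def terse_messages(task, as_system=True):
    """Build chat messages that ask the model to skip explanations."""
    if as_system:
        return [
            {"role": "system", "content": TERSE_INSTRUCTION},
            {"role": "user", "content": task},
        ]
    # Alternative: append the instruction to the user turn itself.
    return [{"role": "user", "content": f"{task}\n\n{TERSE_INSTRUCTION}"}]

messages = terse_messages("Write a thread-safe LRU cache in Python.")
# With mlx-lm, these can be passed to tokenizer.apply_chat_template(
#     messages, add_generation_prompt=True) as in the MLX example.
```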

🙏 Credits & Gratitude

We are deeply grateful to everyone whose work made this release possible.

Foundation Model — Qwen Team @ Alibaba Tongyi Lab, for Qwen3.6-27B: a world-class open-weight multimodal foundation with hybrid Gated-DeltaNet attention, 262K context, and an MTP speculative-decoding head. Remarkable work, openly shared.

Claude-Opus Reasoning Distill & Coder SFT — Jackrong, for Qwopus3.6-27B-v2 and the agentic-coding extension Jackrong/Qwopus3.6-27B-Coder. The Trace Inversion recipe and resulting quality are what make this abliteration worth doing.

Abliteration Toolkit — osmAPI, for ZeroFuse, an elegant Optuna-driven refusal-ablation framework (TPE search, KL guardrails, checkpointing, LoRA-merge). This release would not exist without it.

MLX — Apple ML Research, for the MLX framework and its first-class MX quantization modes (MXFP4 / MXFP8) that make 27B inference and quantization on Apple Silicon possible at this quality. mlx-mtp is built on mlx.core / mlx.nn alone.

mlx-mtp (junainfinity) — our own pure–Apple-mlx quantization + inference stack for the osmQwopus / Qwen3.5-family VLMs. It vendors and extends the Qwen3.5 architecture (hybrid Gated-DeltaNet + full attention), the vision tower, and a natively-embedded MTP head, with tensor-level MXFP4/MXFP8 quantization that preserves vision + MTP + SSM at fp16. mlx-mtp on GitHub.

osmAPI & TERV.Pro — abliteration, MTP restoration, quantization, and publication by the osmAPI research team and TERV.Pro student research team. osmAPI builds multi-provider LLM routing for the Indian developer ecosystem — the OpenRouter of India.

📜 License

Apache-2.0, inherited from the foundation (Qwen3.6-27B) and the coder fine-tune (Jackrong/Qwopus3.6-27B-Coder) upstream.

Need a hosted endpoint, custom quant, or enterprise inference? osmAPI — multi-provider LLM routing built for the Indian developer ecosystem.

