Qwopus3.6-27B-v2 · MTPLX 8-bit Quality

The first MTPLX build of Qwopus3.6-27B-v2 — native multi-token-prediction speculative decoding on Apple Silicon, no external drafter, exact rejection sampling (Leviathan–Chen with residual correction), so sampling behaves exactly like normal decoding, just faster.

Forged with mtplx forge build from the original BF16 Jackrong/Qwopus3.6-27B-v2:

  • Body: flat 8-bit MLX affine quantization, group size 64
  • MTP head: preserved in BF16 (mtp_policy: keep_bf16), packed as mtp.safetensors sidecar
  • Size: 29.4 GB
  • Calibrated MTP contract included (mtplx_runtime.json)

Measured performance (Apple M5 Max, 128 GB, MTPLX 1.0.3)

Mode Decode Acceptance by depth
Plain autoregressive ~17.5 tok/s
MTP depth 3 39.0 tok/s (2.2×) 94% / 86% / 74%

Verification suite: long-code-uncapped, 2048-token budget. For reference, the same machine runs Qwopus-v2 Q6_K on llama.cpp with MTP (n=2) at ~24–26 tok/s — this build is ~55% faster at higher body precision.

Usage

brew install youssofal/mtplx/mtplx   # or pipx install mtplx
mtplx pull nom666/Qwopus3.6-27B-v2-MTPLX-8bit-Quality
mtplx quickstart --model nom666/Qwopus3.6-27B-v2-MTPLX-8bit-Quality \
  --depth 3 --paged-kv-quantization q8 --batching-preset agent --reasoning off

Serves OpenAI-compatible (/v1/chat/completions) and Anthropic-compatible (/v1/messages) endpoints with warm-prefix KV reuse, SSD session cache, continuous batching, and vision support. Full 262144-token context; only 16 of 64 layers carry KV (hybrid Gated DeltaNet architecture), so KV at 256K is ~16 GiB BF16 / ~8 GiB q8.

Notes

  • Runtime contract tier is forge-local: verified on the forging machine (M5 Max). MTPLX loads it with an honest provenance note.
  • Thinking/reasoning can be left on; --reasoning off is recommended for terse agentic/coding use.
  • Quantized with MTPLX 1.0.3 Forge. All credit for the fine-tune to Jackrong; base model Qwen3.6-27B (Apache-2.0).
Downloads last month
123
Safetensors
Model size
27B params
Tensor type
BF16
·
U32
·
MLX
Hardware compatibility
Log In to add your hardware

8-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for nom666/Qwopus3.6-27B-v2-MTPLX-8bit-Quality

Quantized
(55)
this model