Qwythos-9B-Claude-Mythos-5-MTPLX

An MTPLX-ready build of empero-ai/Qwythos-9B-Claude-Mythos-5-1M with a working multi-token-prediction (MTP) head, for faster local decoding on Apple Silicon. Drafted tokens are verified with exact rejection sampling, so speculation does not change the output distribution.

  • Body: converted to MLX and quantized to 4-bit (affine, group size 64).
  • MTP head: kept in bf16 (it is small, but its precision drives draft acceptance).
  • On disk: ~5.2 GB.
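
As a rough sanity check on the on-disk figure, the sketch below estimates the size of a 4-bit, group-size-64 body. It assumes each group of 64 weights also stores one 16-bit scale and one 16-bit bias (a common layout for affine quantization, not confirmed for this build) and uses a nominal 9e9 parameters rather than the model's exact count. The function names are illustrative only.

```python
def quantized_bits_per_weight(bits: int, group_size: int,
                              scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Effective bits per weight for affine group quantization.

    Assumption: every group carries one scale and one bias of the given widths.
    """
    return bits + (scale_bits + bias_bits) / group_size


def estimate_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


bpw = quantized_bits_per_weight(bits=4, group_size=64)  # 4 + 32/64 = 4.5
body_gb = estimate_size_gb(9e9, bpw)                    # nominal 9B parameters
print(f"{bpw} bits/weight, about {body_gb:.2f} GB for the body")
```

Under these assumptions the body alone comes to roughly 5.06 GB. The bf16 MTP head and any tensors left unquantized add to that, which is in the same range as the ~5.2 GB listed above.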

Why this exists

The original empero-ai/Qwythos-9B-Claude-Mythos-5-1M safetensors release ships without an MTP head. Only the companion ...-1M-GGUF repo carries a restored Qwen3.5-compatible MTP head (for llama.cpp --spec-type draft-mtp). MTPLX runs MLX models, not GGUF files, and its Forge tool can only carry over an existing MTP head; it does not train one.

This build bridges the gap: the bf16 MTP head was extracted from the GGUF's Qwythos-9B-Claude-Mythos-5-1M-MTP-BF16.gguf (blk.32 nextn tensors), remapped to MLX/MTPLX key names, paired with the full-precision body, and run through mtplx forge build. MTP-contract calibration reached exact agreement (1.0) at depths 1–3, confirming the head matches the trunk.
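
To illustrate why a head that matches the trunk matters, here is a minimal sketch of the standard speculative-decoding acceptance rule (accept a drafted token with probability min(1, p/q), otherwise resample from the normalized residual). This is the textbook rule, not MTPLX's actual implementation; the function name and the toy distributions are hypothetical.

```python
import numpy as np


def accept_or_resample(p_target, q_draft, draft_token, rng):
    """One verification step of speculative rejection sampling.

    p_target: target-model probabilities over the vocabulary
    q_draft:  draft-head probabilities the token was sampled from
    Returns (token, accepted).
    """
    p = np.asarray(p_target, dtype=np.float64)
    q = np.asarray(q_draft, dtype=np.float64)
    # Accept the drafted token with probability min(1, p(x) / q(x)).
    accept_prob = min(1.0, p[draft_token] / q[draft_token])
    if rng.random() < accept_prob:
        return draft_token, True
    # On rejection, resample from the normalized residual max(p - q, 0),
    # which keeps the overall output distributed exactly as p.
    residual = np.maximum(p - q, 0.0)
    residual /= residual.sum()
    return int(rng.choice(len(p), p=residual)), False


rng = np.random.default_rng(0)
# When draft and target agree, the draft is always accepted.
print(accept_or_resample([0.5, 0.5], [0.5, 0.5], 0, rng))
```

The closer the draft distribution is to the target, the more often drafts are accepted and the larger the speedup; a mismatched head still produces correct output, just with more rejections.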

Architecture

  • Base: Qwen3.5-9B (hybrid: 3:1 linear-attention to full-attention)
  • Layers: 32 + 1 MTP layer
  • Hidden / heads: 4096 / 16 attn, 4 KV, head_dim 256
  • Vocab: 248320
  • Max context: 1,048,576 (YaRN)
  • MTPLX arch_id: qwen3-next-mtp

Usage (MTPLX)

mtplx serve --model <this-model-dir> \
  --chat-template-profile tokenizer \
  --reasoning-parser qwen3

Notes:

  • This is a reasoning model: it emits a <think> block before answering, so allow a generous max_tokens (>= 2048).
  • Use --chat-template-profile tokenizer so the model's own chat template (with <think> handling) is used, and --reasoning-parser qwen3 to fold the thinking segment.
  • Default sampling: temperature=0.6, top_p=0.95, top_k=20.
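
For readers unfamiliar with these sampling parameters, the sketch below shows one common way temperature, top_k and top_p reshape a next-token distribution. The order of operations (temperature, then top-k, then top-p on the renormalized top-k set) is an assumption; runtimes differ, and this is not taken from MTPLX. The function name is hypothetical.

```python
import numpy as np


def filtered_probs(logits, temperature=0.6, top_p=0.95, top_k=20):
    """Return the distribution left after temperature, top-k and top-p filtering."""
    # Temperature < 1 sharpens the distribution before the softmax.
    z = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    # Top-k: keep the k most likely tokens, most likely first.
    order = np.argsort(-probs)[:top_k]
    kept = probs[order] / probs[order].sum()
    # Top-p: keep the shortest prefix whose cumulative mass reaches top_p.
    cutoff = int(np.searchsorted(np.cumsum(kept), top_p)) + 1
    order, kept = order[:cutoff], kept[:cutoff]
    out = np.zeros_like(probs)
    out[order] = kept / kept.sum()
    return out


# A strongly peaked toy distribution collapses to a single candidate.
print(filtered_probs([10.0, 0.0, 0.0]))
```

With the defaults above, at most 20 candidates survive, and fewer when a small number of tokens already cover 95% of the probability mass.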

Provenance & license

  • Body: empero-ai/Qwythos-9B-Claude-Mythos-5-1M
  • MTP head: empero-ai/Qwythos-9B-Claude-Mythos-5-1M-GGUF
  • Forged locally with MTPLX Forge (see mtplx_runtime.json for the contract and verification details).
  • License: Apache-2.0, inherited from the base model. All credit for the base model and the restored MTP head belongs to Empero AI.