Ornith-1.0-9B 4-bit (MLX) + MTP head (trained)

A 4-bit MLX build of Ornith-1.0-9B that ships a Multi-Token Prediction (MTP) head as a sidecar file, mtp/weights.safetensors, for native MTP speculative decoding in mlx-serve on Apple Silicon.

Use (mlx-serve, macOS / Apple Silicon)

Point mlx-serve at this folder. MTP is enabled automatically when the sidecar is present; no configuration is needed:

mlx-serve --model ./Ornith-1.0-9B-4bit-MTP-MLX-Serve
# or download via the model browser in MLX Core.app

To opt out, pass --no-mtp, or set enable_mtp: false on an individual request. To increase the draft depth, use --mtp-depth. The base model verifies every drafted token with exact rejection sampling, so the output distribution is unchanged; MTP only makes decoding faster.
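To make the "output distribution unchanged" claim concrete, here is a minimal Python sketch of the standard speculative-sampling verification rule (accept a drafted token with probability min(1, p/q), otherwise resample from the normalized residual max(0, p - q)). It is an illustration of the general technique, not mlx-serve's Zig implementation, whose details may differ; distributions are plain lists of probabilities and all function names are hypothetical.

```python
import random

def sample(probs, rng):
    """Draw an index from a discrete distribution given as a list of probabilities."""
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1  # guard against floating-point round-off

def verify_draft(draft_tokens, q_dists, p_dists, rng):
    """Verify drafted tokens against the target (base) model.

    draft_tokens[i] was sampled from q_dists[i], the draft head's distribution.
    p_dists holds the target distribution at each drafted position plus one
    extra entry for the bonus token emitted when every draft is accepted.
    The emitted tokens are distributed exactly as if sampled from p alone.
    """
    out = []
    for i, x in enumerate(draft_tokens):
        p, q = p_dists[i], q_dists[i]
        if rng.random() < min(1.0, p[x] / q[x]):
            out.append(x)  # draft accepted; move on to the next drafted token
            continue
        # Rejected: resample from the residual max(0, p - q), renormalized,
        # then stop, because later drafts were conditioned on a rejected token.
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        total = sum(residual)
        out.append(sample([r / total for r in residual], rng))
        return out
    # All drafts accepted: emit one extra token from the target distribution.
    out.append(sample(p_dists[len(draft_tokens)], rng))
    return out
```

Each verification pass therefore emits between 1 and depth + 1 tokens, which is where the speedup comes from; rejections cost speed, never correctness.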

What's inside

Base: 4-bit MLX Ornith-1.0-9B (architecture qwen3_5, hidden size 4096, quantization group size 64), repackaged from pavantippannagari/Ornith-1.0-9B-mlx-4Bit.
mtp/weights.safetensors: a KL-distilled MTP head from protoLabsAI/Ornith-1.0-9B-MTP, re-aligned to Ornith; 15 tensors stored unquantized in bf16, which mlx-serve's loadLinear accepts as plain linear layers.
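If you want to sanity-check the sidecar before loading it, the safetensors format makes this cheap: the file starts with an 8-byte little-endian header length followed by a JSON header listing each tensor's dtype and shape. The stdlib-only Python sketch below reads just that header and checks the properties stated in this card (15 tensors, all bf16, fc shaped [2H, H] with H = 4096). The function names and the default fc key "fc.weight" are hypothetical; the exact tensor names expected by src/mtp.zig are not listed here, so inspect the header and pass the real key.

```python
import json
import struct

def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading tensor data."""
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))  # u64 little-endian header length
        header = json.loads(f.read(n))
    header.pop("__metadata__", None)  # optional free-form metadata, not a tensor
    return header

def check_mtp_sidecar(path, hidden=4096, expected_count=15, fc_key="fc.weight"):
    """Hypothetical pre-flight check mirroring this card's claims (not mlx-serve code).

    Returns a list of human-readable problems; an empty list means all checks pass.
    """
    header = read_safetensors_header(path)
    problems = []
    if len(header) != expected_count:
        problems.append(f"expected {expected_count} tensors, found {len(header)}")
    for name, info in header.items():
        if info["dtype"] != "BF16":
            problems.append(f"{name}: dtype {info['dtype']}, expected BF16")
    fc = header.get(fc_key)  # assumed key name; adjust after inspecting the header
    if fc is None:
        problems.append(f"missing {fc_key}")
    elif fc["shape"] != [2 * hidden, hidden]:
        problems.append(f"{fc_key}: shape {fc['shape']}, expected {[2 * hidden, hidden]}")
    return problems
```

For example, print(check_mtp_sidecar("Ornith-1.0-9B-4bit-MTP-MLX-Serve/mtp/weights.safetensors", fc_key=...)) with the real fc tensor name; print(sorted(read_safetensors_header(path))) lists the names first.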

Validation

Tensor names, shapes, and dtypes were verified statically against mlx-serve's src/mtp.zig loader: all 15 expected names are present, the linears are bf16, and the fc weight is laid out as [2H, H] = [8192, 4096] (converted from [H, 2H]). The base model's hidden size matches the head's, and the fc geometry passes validateGeometry. The model has not been run on-device: it was built on a Linux/CUDA machine, and mlx-serve runs only on Apple Silicon, so please confirm the acceptance rate on your Mac. A GGUF sibling, smoke-tested at roughly 0.81–0.83 draft acceptance, is available at giaki3003/Ornith-1.0-9B-MTP-GGUF.
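To translate an acceptance rate into an expected gain, a common back-of-the-envelope model assumes each drafted token is accepted independently with probability alpha; with draft depth k, the expected number of tokens emitted per base-model verification pass is then (1 - alpha^(k+1)) / (1 - alpha). The Python sketch below evaluates this for the GGUF sibling's reported range. Treating that figure as a per-token alpha for this MLX build, and the independence assumption itself, are both simplifications; the real wall-clock speedup is lower because running the draft head also costs time.

```python
def expected_tokens_per_step(alpha, depth):
    """Expected tokens emitted per verification pass under i.i.d. acceptance.

    alpha: assumed per-token acceptance probability (0 <= alpha <= 1).
    depth: number of tokens drafted per step.
    The +1 accounts for the correction or bonus token emitted on every pass.
    """
    if alpha == 1.0:
        return depth + 1.0
    return (1.0 - alpha ** (depth + 1)) / (1.0 - alpha)

for alpha in (0.81, 0.83):
    for depth in (1, 2, 3):
        print(f"alpha={alpha:.2f} depth={depth}: {expected_tokens_per_step(alpha, depth):.2f}")
```

At alpha = 0.82 and depth 1 this gives 1.82 tokens per pass; each extra level of depth adds alpha^k, so the returns diminish as depth grows.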

License: MIT (derivative of MIT-licensed Ornith-1.0-9B).

Model tree: base model deepreinforce-ai/Ornith-1.0-9B; this model (giaki3003/Ornith-1.0-9B-4bit-MTP-MLX-Serve) is one of its quantized derivatives.