Ornith-1.0-9B 4-bit (MLX) + MTP head (trained)

A 4-bit MLX build of Ornith-1.0-9B with a trained Multi-Token-Prediction (MTP) head, shipped as an mtp/weights.safetensors sidecar for native MTP speculative decoding in mlx-serve on Apple Silicon.

Use (mlx-serve, macOS / Apple Silicon)

Point mlx-serve at this folder — MTP auto-enables on sidecar presence (no config needed):

mlx-serve --model ./Ornith-1.0-9B-4bit-MTP-MLX-Serve
# or download via the model browser in MLX Core.app

Opt out globally with --no-mtp, or per request with enable_mtp:false; increase the draft depth with --mtp-depth. The base model verifies every drafted token by exact rejection sampling, so the output distribution is unchanged and only generation speed improves.
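To make the "output distribution unchanged" claim concrete, here is a minimal Python sketch of the standard speculative-sampling acceptance rule for a single drafted token. It is not mlx-serve's code: the function names, the toy three-token vocabulary, and the probabilities are all made up for illustration. The drafted token is accepted with probability min(1, p/q); on rejection, a replacement is drawn from the renormalized residual max(0, p - q).

```python
import random

def verify_draft_token(p, q, drafted, rng=random):
    """One exact rejection-sampling step (illustrative, not mlx-serve's code).

    p: base-model probabilities over the vocabulary
    q: draft-head probabilities over the same vocabulary
    drafted: token index that was sampled from q
    Returns (token, accepted).
    """
    # Accept the drafted token with probability min(1, p/q).
    if rng.random() < min(1.0, p[drafted] / q[drafted]):
        return drafted, True
    # On rejection, resample from the residual max(0, p - q).
    # random.choices normalizes the weights internally.
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    token = rng.choices(range(len(p)), weights=residual, k=1)[0]
    return token, False

def empirical_distribution(p, q, trials=100_000, seed=0):
    """Draft from q, verify against p, and return the observed token frequencies."""
    rng = random.Random(seed)
    counts = [0] * len(p)
    for _ in range(trials):
        drafted = rng.choices(range(len(q)), weights=q, k=1)[0]
        token, _ = verify_draft_token(p, q, drafted, rng)
        counts[token] += 1
    return [c / trials for c in counts]

# Toy example: the draft distribution q differs noticeably from the base p,
# yet the emitted tokens should still follow p up to sampling noise.
base = [0.5, 0.3, 0.2]
draft = [0.2, 0.5, 0.3]
observed = empirical_distribution(base, draft)
```

In this sketch a poor draft distribution only lowers the acceptance rate; the frequencies in `observed` still track `base`, which is why verification affects speed but not output quality.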

What's inside

  • Base: 4-bit MLX Ornith-1.0-9B (qwen3_5 architecture, hidden size 4096, g64), repackaged from pavantippannagari/Ornith-1.0-9B-mlx-4Bit.
  • mtp/weights.safetensors: KL-distilled head re-aligned to Ornith, taken from protoLabsAI/Ornith-1.0-9B-MTP. It holds 15 tensors in bf16; mlx-serve's loadLinear accepts plain bf16 linears.
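If you want to confirm the tensor count and dtype of the sidecar yourself, the safetensors header can be read with the Python standard library alone. The sketch below relies only on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header); the helper names are hypothetical, and the path in the usage comment assumes you run it from the model folder.

```python
import json
import struct

def read_safetensors_header(path):
    """Return {tensor_name: {"dtype", "shape", "data_offsets"}} for a .safetensors file."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header

def summarize(header):
    """Return (number of tensors, sorted list of distinct dtype strings)."""
    dtypes = sorted({entry["dtype"] for entry in header.values()})
    return len(header), dtypes

# Usage (run from the model folder; the path is an assumption about your layout):
#   header = read_safetensors_header("mtp/weights.safetensors")
#   count, dtypes = summarize(header)
#   for name, entry in sorted(header.items()):
#       print(name, entry["dtype"], entry["shape"])
```

For this sidecar, the description above suggests the summary should report 15 tensors with a single BF16 dtype; anything else would be worth checking before serving.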

Validation

Tensor names, shapes, and dtypes were statically verified against the loader in mlx-serve's src/mtp.zig: all 15 names are present, the linears are plain bf16, and fc is [H,2H] → [2H,H] = [8192,4096]. The base hidden size matches the head, and the fc geometry passes validateGeometry. This build has not been run on-device (it was built on a Linux/CUDA box, and mlx-serve is Apple-Silicon only), so confirm the acceptance rate on your Mac. A GGUF sibling (smoke-tested, ~0.81–0.83 draft acceptance) is at giaki3003/Ornith-1.0-9B-MTP-GGUF.
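Once you have measured an acceptance rate, a rough way to interpret it is the expected number of tokens emitted per base-model verification pass. The Python sketch below uses the usual simplification that each drafted token is accepted independently with the same probability. The function name is hypothetical, and treating `depth` as the number of drafted tokens per pass is an assumption about what --mtp-depth controls.

```python
def expected_tokens_per_step(acceptance_rate, depth):
    """Expected tokens emitted per base-model verification pass.

    Simplifying assumption: every drafted token is accepted independently
    with probability `acceptance_rate`. Real acceptance varies with
    position and prompt, so treat the result as a rough guide only.
    """
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be in [0, 1]")
    if depth < 0:
        raise ValueError("depth must be non-negative")
    # The i = 0 term is the one token the base model always contributes;
    # the i >= 1 terms are the chances that the first i drafts all survive.
    return sum(acceptance_rate ** i for i in range(depth + 1))

# Example: compare a few depths at an acceptance rate you measured yourself.
for depth in (1, 2, 3):
    print(depth, round(expected_tokens_per_step(0.8, depth), 3))
```

This counts tokens per pass, not wall-clock speedup, because the draft head adds its own cost. The 0.81–0.83 figure quoted above comes from the GGUF sibling and may not carry over to this build.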

License: MIT (derivative of MIT-licensed Ornith-1.0-9B).
