zonos2-mlx — ready-to-run MLX weights

Pre-converted, pre-quantized MLX weights for Zyphra's ZONOS2 — an 8B-parameter Mixture-of-Experts autoregressive text-to-speech model — running natively on Apple Silicon.

Download and run. No PyTorch in the inference path, no conversion step.

🧠 Model: 16-expert top-1 MoE AR trunk (layer 26 routes top-2) → DAC 44.1 kHz neural codec for the waveform, with an ECAPA-TDNN speaker encoder (+ LDA) for voice cloning from a short reference clip.
🍎 Runtime: sb1992/mlx-zonos2 — a clean-room MLX reimplementation of the inference runtime, gated per-stage against the original PyTorch model.
📦 This repo: the weights only. Three precision tiers, each a self-contained folder.
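
For readers unfamiliar with the "top-1 / top-2" wording, here is a minimal pure-Python sketch of top-k expert routing. It is a generic illustration, not code from the runtime: the function names are hypothetical, renormalizing the weights over the selected experts is a common convention that ZONOS2 may or may not follow, and whether "layer 26" is counted from 0 or 1 is not stated here.

```python
import math

NUM_EXPERTS = 16   # expert count from the model description
TOP2_LAYER = 26    # the one layer described as routing top-2 (index base is an assumption)


def experts_per_token(layer_index: int) -> int:
    """Return how many experts a token visits in a given layer."""
    return 2 if layer_index == TOP2_LAYER else 1


def route(router_logits: list[float], k: int) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their softmax weights.

    Renormalizing over the chosen experts is an assumption for illustration.
    """
    if len(router_logits) != NUM_EXPERTS:
        raise ValueError(f"expected {NUM_EXPERTS} router logits, got {len(router_logits)}")
    peak = max(router_logits)  # subtract the max for numerical stability
    scores = [math.exp(x - peak) for x in router_logits]
    chosen = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:k]
    total = sum(scores[i] for i in chosen)
    return [(i, scores[i] / total) for i in chosen]
```

With top-1 routing the single chosen expert always gets weight 1.0, so only one expert's feed-forward block runs per token in those layers.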

Tiers

Each folder (bf16/, int8/, int4/) is self-contained — it bundles the quantized trunk plus the (tier-independent) DAC codec and ECAPA speaker encoder, so you download one folder and it just runs.

Folder	What's quantized	Folder size	Peak RAM	Target Mac (unified memory)
`bf16/`	nothing (reference)	~14 GB	~44 GB	64 GB
`int8/`	attention/FFN/lm_head + experts int8; router/embeddings/norms bf16	~7.9 GB	~13 GB	32 GB
`int4/`	attention/FFN/lm_head int8; experts gate/up int4, down int8; router/embeddings/norms bf16	~5.7 GB	~10.6 GB	16 GB

Folder size includes the bundled ~315 MB DAC codec + ECAPA speaker encoder (identical across
tiers; Hugging Face Xet de-dups them, so they cost storage only once).
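
The "what's quantized" column can also be read as a bit-width map per component group. The sketch below transcribes the table into a Python dict so the tiers can be compared programmatically. The group names (`expert_gate_up`, `expert_down`, and so on) are illustrative labels chosen for this example, not parameter names from the checkpoint, and 16 is used to mean "left in bf16".

```python
BF16 = 16  # marker for "kept in bf16", not an integer quantization width

# Components every tier leaves in bf16, per the table.
_ALWAYS_BF16 = {"router": BF16, "embeddings": BF16, "norms": BF16}

# Bit width per component group, transcribed from the tiers table.
RECIPE = {
    "bf16": {"attention": BF16, "ffn": BF16, "lm_head": BF16,
             "expert_gate_up": BF16, "expert_down": BF16, **_ALWAYS_BF16},
    "int8": {"attention": 8, "ffn": 8, "lm_head": 8,
             "expert_gate_up": 8, "expert_down": 8, **_ALWAYS_BF16},
    "int4": {"attention": 8, "ffn": 8, "lm_head": 8,
             "expert_gate_up": 4, "expert_down": 8, **_ALWAYS_BF16},
}


def bits_for(tier: str, group: str) -> int:
    """Look up the bit width a tier uses for a component group."""
    return RECIPE[tier][group]


def differing_groups(tier_a: str, tier_b: str) -> list[str]:
    """List the component groups whose bit width differs between two tiers."""
    return sorted(g for g in RECIPE[tier_a] if RECIPE[tier_a][g] != RECIPE[tier_b][g])
```

Comparing `int8` with `int4` this way shows that the two tiers differ only in the expert gate/up projections.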

The MoE experts hold the bulk of the 8B parameters, so the int4 tier takes its savings there: the expert gate and up projections drop to int4, while the lm_head and the more sensitive expert down projection stay int8 and the router stays bf16. This is the MoE quantization recipe that keeps the model intact. All three tiers produce full, intelligible audio, so treat them as equal options and pick by the RAM you have.
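
If you would rather check than guess, the following sketch lists which tiers fit a given amount of unified memory, using the "target Macs" figures from the table as the threshold. The helper names are hypothetical. `installed_memory_gb` shells out to `sysctl -n hw.memsize`, which reports physical memory in bytes on macOS; it will fail on other platforms.

```python
import subprocess

# (folder, target Mac unified memory in GB), taken from the tiers table.
TIER_TARGETS = [("bf16", 64), ("int8", 32), ("int4", 16)]


def tiers_that_fit(unified_memory_gb: float) -> list[str]:
    """Return every tier whose target Mac size is within the given memory."""
    return [folder for folder, target in TIER_TARGETS if unified_memory_gb >= target]


def installed_memory_gb() -> float:
    """Read physical memory on macOS via sysctl and convert bytes to GiB."""
    result = subprocess.run(
        ["sysctl", "-n", "hw.memsize"],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip()) / 2**30


if __name__ == "__main__":
    print(tiers_that_fit(installed_memory_gb()))
```

The function returns all tiers that fit rather than a single "best" one, in keeping with the tiers being equal options.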

Quick start

# 1. get the runtime
git clone https://github.com/sb1992/mlx-zonos2.git
cd mlx-zonos2
uv sync --extra oracle        # `oracle` extra = torchaudio, for enrolling a voice from raw audio

# 2. download one tier (self-contained: trunk + DAC + speaker encoder)
hf download shraey/zonos2-mlx --include "int8/*" --local-dir ./zonos2-mlx-weights

# 3. clone a voice + synthesize
python scripts/zonos2_cli.py \
    --model-dir ./zonos2-mlx-weights/int8 \
    --text "The quick brown fox jumps over the lazy dog." \
    --ref ref.wav \
    --out out.wav

Swap int8 → int4 (16 GB Macs) or bf16 (64 GB Macs) — same flow, just point --model-dir at the folder you downloaded. To grab every tier at once, drop the --include filter.
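
The download step can also be scripted from Python. This is a sketch using `snapshot_download` from the `huggingface_hub` package (the same package that provides the `hf` CLI); `allow_patterns` plays the role of `--include`. The wrapper functions `download_kwargs` and `download_tier` are hypothetical names for this example.

```python
VALID_TIERS = ("bf16", "int8", "int4")
REPO_ID = "shraey/zonos2-mlx"


def download_kwargs(tier: str, local_dir: str = "./zonos2-mlx-weights") -> dict:
    """Build the snapshot_download arguments for a single tier folder."""
    if tier not in VALID_TIERS:
        raise ValueError(f"unknown tier {tier!r}; expected one of {VALID_TIERS}")
    return {
        "repo_id": REPO_ID,
        "allow_patterns": [f"{tier}/*"],  # same filter as --include "int8/*"
        "local_dir": local_dir,
    }


def download_tier(tier: str, local_dir: str = "./zonos2-mlx-weights") -> str:
    """Download one tier and return the local directory path."""
    # Imported lazily so the argument builder works without the package installed.
    from huggingface_hub import snapshot_download
    return snapshot_download(**download_kwargs(tier, local_dir))
```

Calling `download_tier("int4")` should leave the weights under `./zonos2-mlx-weights/int4`, ready to pass as `--model-dir`.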

--ref enrolls a reference clip on the fly (needs the oracle extra for the mel front-end). You can also enroll a voice once into a small .zonos profile and reuse it — then generation is pure-MLX with no torch. See the runtime repo for the Python API, the enroll-once flow, and the full parity report.
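
Without assuming anything about the runtime's Python API, several lines of text can be synthesized by driving the CLI from a short script. The sketch below uses only the four flags shown in the quick start; the function names and the `line_000.wav` output naming are choices made for this example. It assumes it is run from the `mlx-zonos2` checkout, inside the environment created by `uv sync`.

```python
import subprocess
import sys
from pathlib import Path


def cli_argv(model_dir, text, ref, out) -> list[str]:
    """Build the command line for one synthesis call (flags from the quick start)."""
    return [
        sys.executable, "scripts/zonos2_cli.py",
        "--model-dir", str(model_dir),
        "--text", text,
        "--ref", str(ref),
        "--out", str(out),
    ]


def synthesize_all(lines, model_dir, ref, out_dir) -> list[Path]:
    """Run the CLI once per line of text and return the output paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for index, text in enumerate(lines):
        out = out_dir / f"line_{index:03d}.wav"
        # check=True stops the batch on the first failing call.
        subprocess.run(cli_argv(model_dir, text, ref, out), check=True)
        outputs.append(out)
    return outputs
```

Each call starts a fresh process, so the model is reloaded and the reference clip re-enrolled every time. For anything beyond a handful of lines, the enroll-once flow described in the runtime repo is likely the better fit.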

Responsible use

This performs voice cloning — it can reproduce a person's voice from a few seconds of audio. Use it responsibly: no impersonation, fraud, or disinformation; only clone voices you own or have explicit consent for; disclose AI-generated audio wherever it's published. See the runtime repo for the full policy.

Attribution + license

This is a derivative port. The components it builds on are each independently licensed:

ZONOS2: Apache-2.0, © Zyphra. The 8B-MoE model, the DAC 44.1 kHz codec, and the speaker encoder are Zyphra's.
Released checkpoint: this port converts the drbaph/ZONOS2-BF16 release (its speaker encoder is an ECAPA-TDNN, 2048-d).
Porting oracle: the clean plain-torch Zonos2_TTS-ComfyUI fork by Saganaki22 (Apache-2.0), used as the op-for-op reference.
MLX: Apple's ml-explore/mlx.

The MLX port code is licensed Apache-2.0. You must comply with the upstream ZONOS2 license and usage terms for the model weights. Full credit to Zyphra for the model, its training, and the open release — this repo only re-expresses their runtime in MLX.
