zonos2-mlx: ready-to-run MLX weights

Pre-converted, pre-quantized MLX weights for Zyphra's ZONOS2, an 8B-parameter Mixture-of-Experts autoregressive text-to-speech model, running natively on Apple Silicon.

Download and run. No PyTorch in the inference path, no conversion step.

  • 🧠 Model: 16-expert top-1 MoE AR trunk (layer 26 routes top-2) → DAC 44.1 kHz neural codec for the waveform, with an ECAPA-TDNN speaker encoder (+ LDA) for voice cloning from a short reference clip.
  • 🍎 Runtime: sb1992/mlx-zonos2, a clean-room MLX reimplementation of the inference runtime, validated stage by stage against the original PyTorch model.
  • 📦 This repo: the weights only. Three precision tiers, each a self-contained folder.
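To make "top-1" versus "top-2" routing concrete, here is a minimal, generic sketch of top-k expert selection in plain Python. It is not Zyphra's implementation: the softmax scoring, the renormalization over the selected experts, and the names `softmax` and `route` are assumptions chosen for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of router logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route(logits, k=1):
    """Return [(expert_index, weight)] for the k highest-scoring experts.

    Weights are renormalized over the selected experts (an assumption;
    real models differ in how they normalize).
    """
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

NUM_EXPERTS = 16  # from the model description above
logits = [0.0] * NUM_EXPERTS
logits[3], logits[11] = 2.0, 1.0  # made-up router scores for one token

top1 = route(logits, k=1)  # one expert per token (most layers)
top2 = route(logits, k=2)  # two experts per token (layer 26, per the description)
```

With top-1 routing only one of the 16 expert FFNs runs per token, so most expert weights are idle on any given step even though all of them must be resident in memory.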

Tiers

Each folder (bf16/, int8/, int4/) is self-contained: it bundles the trunk at that precision plus the tier-independent DAC codec and ECAPA speaker encoder, so you download one folder and it runs.

| Folder | What's quantized | Folder size | Peak RAM | Target Macs |
|--------|------------------|-------------|----------|-------------|
| bf16/ | nothing (reference) | ~14 GB | ~44 GB | 64 GB |
| int8/ | attention/FFN/lm_head + experts int8; router/embeddings/norms bf16 | ~7.9 GB | ~13 GB | 32 GB |
| int4/ | attention/FFN/lm_head int8; experts gate/up int4, down int8; router/embeddings/norms bf16 | ~5.7 GB | ~10.6 GB | 16 GB |

Folder size includes the bundled ~315 MB DAC codec + ECAPA speaker encoder (identical across tiers; Hugging Face Xet deduplicates them, so they cost storage only once).

In the int4 tier, the MoE experts (the bulk of the 8B parameters) carry the int4 weights; the router/gate, the lm_head, and the sensitive expert down projection stay at int8 or bf16. This is the MoE quantization recipe that keeps the model intact. All three tiers produce full, intelligible audio, so treat them as equal options and pick by the RAM you have.
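Picking a tier by RAM can be sketched as a small lookup. The peak-RAM figures below are copied from the tier table; the 4 GB headroom for the OS and other apps is an assumption, as are the names `TIERS` and `pick_tier`. Preferring the highest precision that fits is just one reasonable policy, since all three tiers are described as equal options.

```python
# (tier folder, approximate peak RAM in GB), highest precision first.
# Figures are taken from the tier table in this card.
TIERS = [
    ("bf16", 44.0),
    ("int8", 13.0),
    ("int4", 10.6),
]

# Assumed headroom for the OS and other apps; tune for your machine.
HEADROOM_GB = 4.0

def pick_tier(total_ram_gb, headroom_gb=HEADROOM_GB):
    """Return the highest-precision tier whose peak RAM fits, or None."""
    for name, peak_gb in TIERS:
        if peak_gb + headroom_gb <= total_ram_gb:
            return name
    return None

for ram in (64, 32, 16, 8):
    print(f"{ram} GB Mac -> {pick_tier(ram)}")
```

Under these assumptions the result lines up with the "Target Macs" column: bf16 on 64 GB, int8 on 32 GB, int4 on 16 GB, and no tier on an 8 GB machine.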

Quick start

# 1. get the runtime
git clone https://github.com/sb1992/mlx-zonos2.git
cd mlx-zonos2
uv sync --extra oracle        # `oracle` extra = torchaudio, for enrolling a voice from raw audio

# 2. download one tier (self-contained: trunk + DAC + speaker encoder)
hf download shraey/zonos2-mlx --include "int8/*" --local-dir ./zonos2-mlx-weights

# 3. clone a voice + synthesize
python scripts/zonos2_cli.py \
    --model-dir ./zonos2-mlx-weights/int8 \
    --text "The quick brown fox jumps over the lazy dog." \
    --ref ref.wav \
    --out out.wav

Swap int8 → int4 (16 GB Macs) or bf16 (64 GB Macs): same flow, just point --model-dir at the folder you downloaded. To grab every tier at once, drop the --include filter.
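If you would rather fetch a tier from Python than from the `hf` CLI, the same filter can be expressed with `huggingface_hub.snapshot_download` and its `allow_patterns` argument. This is a sketch, not part of the runtime repo: the helper names `tier_patterns` and `download_tier` are hypothetical, and it assumes `huggingface_hub` is installed.

```python
VALID_TIERS = ("bf16", "int8", "int4")
REPO_ID = "shraey/zonos2-mlx"

def tier_patterns(tier):
    """Build the glob list equivalent to `--include "<tier>/*"`."""
    if tier not in VALID_TIERS:
        raise ValueError(f"unknown tier {tier!r}; expected one of {VALID_TIERS}")
    return [f"{tier}/*"]

def download_tier(tier, local_dir="./zonos2-mlx-weights"):
    """Download one self-contained tier folder and return the local root path."""
    # Imported lazily so the pattern helper works without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=tier_patterns(tier),
        local_dir=local_dir,
    )

if __name__ == "__main__":
    root = download_tier("int8")
    print(f"pass --model-dir {root}/int8 to the CLI")
```

Omitting `allow_patterns` downloads every tier, which mirrors dropping the `--include` filter.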

--ref enrolls a reference clip on the fly (needs the oracle extra for the mel front-end). You can also enroll a voice once into a small .zonos profile and reuse it; generation is then pure MLX with no torch. See the runtime repo for the Python API, the enroll-once flow, and the full parity report.

Responsible use

This performs voice cloning: it can reproduce a person's voice from a few seconds of audio. Use it responsibly: no impersonation, fraud, or disinformation; only clone voices you own or have explicit consent for; disclose AI-generated audio wherever it's published. See the runtime repo for the full policy.

Attribution + license

This is a derivative port. The components it builds on are each independently licensed:

  • ZONOS2: Apache-2.0, © Zyphra. The 8B-MoE model, the DAC 44.1 kHz codec, and the speaker encoder are Zyphra's.
  • Released checkpoint: this port converts the drbaph/ZONOS2-BF16 release (its speaker encoder is an ECAPA-TDNN, 2048-d).
  • Porting oracle: the clean plain-torch Zonos2_TTS-ComfyUI fork by Saganaki22 (Apache-2.0), used as the op-for-op reference.
  • MLX: Apple's ml-explore/mlx.

The MLX port code is licensed Apache-2.0. You must comply with the upstream ZONOS2 license and usage terms for the model weights. Full credit to Zyphra for the model, its training, and the open release; this repo only re-expresses their runtime in MLX.
