Confucius4-TTS-mlx

An MLX port of netease-youdao/Confucius4-TTS (multilingual, cross-lingual, zero-shot voice-cloning TTS) for Apple Silicon.

The official model is CUDA-only. This repo re-implements the heavy stages in MLX so they run on the Mac GPU (Metal), and each ported stage was validated numerically against the original PyTorch model.
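The comparison scripts aren't included here, but the three agreement metrics reported in the stage table can be sketched in NumPy as below. The definitions are assumptions: argmax agreement as the fraction of decode positions where both models pick the same token (presumably under teacher forcing, so both see the same prefix), relative error as ‖pred − ref‖ / ‖ref‖ over the whole mel tensor, and correlation as Pearson's r over the flattened waveform. The function names are illustrative, not taken from this repo.

```python
import numpy as np


def argmax_agreement(logits_a: np.ndarray, logits_b: np.ndarray) -> float:
    """Fraction of positions where both models pick the same top-1 token.

    Assumes both logit arrays have shape (..., vocab) and were produced from
    the same input prefix (teacher forcing), so positions line up.
    """
    return float(np.mean(np.argmax(logits_a, axis=-1) == np.argmax(logits_b, axis=-1)))


def relative_error(pred: np.ndarray, ref: np.ndarray) -> float:
    """L2 norm of the difference, normalized by the L2 norm of the reference."""
    return float(np.linalg.norm(pred - ref) / np.linalg.norm(ref))


def waveform_correlation(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation between two equal-length waveforms."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])


# Example: compare an "MLX" mel against a "PyTorch" reference (random stand-ins).
rng = np.random.default_rng(0)
ref_mel = rng.standard_normal((80, 200))
mlx_mel = ref_mel + 0.01 * rng.standard_normal(ref_mel.shape)
print(f"mel rel. err: {relative_error(mlx_mel, ref_mel):.2%}")
```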

What runs where

| Stage | Backend | Notes |
|---|---|---|
| Frontend: w2v-bert feature extraction, CAMPPlus speaker encoder, mel spectrogram | PyTorch (MPS) | Not ported; runs once per utterance |
| T2S (GPT-2 decode, KV-cached) | MLX | argmax agreement with PyTorch: 99.2% |
| S2A flow-matching (DiT + WaveNet, 25-step Euler + CFG) | MLX | mel relative error: 0.77% |
| BigVGAN vocoder | MLX | waveform correlation: 0.9998 |
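The "25-step Euler + CFG" S2A entry refers to integrating a learned velocity field from noise toward a mel spectrogram with a fixed-step Euler solver, mixing conditional and unconditional predictions via classifier-free guidance. The NumPy sketch below assumes a uniform time grid on [0, 1] and the common guidance form v = v_uncond + w · (v_cond − v_uncond); the actual time schedule, guidance scale, and the way conditioning is dropped in s2a_mlx.py may differ. `velocity_fn` is a hypothetical stand-in for the DiT + WaveNet estimator.

```python
from typing import Callable

import numpy as np

# Hypothetical estimator signature: (x_t, t, use_condition) -> velocity with x_t's shape.
VelocityFn = Callable[[np.ndarray, float, bool], np.ndarray]


def euler_cfg_sample(
    velocity_fn: VelocityFn,
    x0: np.ndarray,
    n_steps: int = 25,
    guidance_scale: float = 1.0,
) -> np.ndarray:
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 with fixed-step Euler + CFG."""
    x = np.array(x0, dtype=np.float64, copy=True)
    dt = 1.0 / n_steps
    for step in range(n_steps):
        t = step * dt
        v_cond = velocity_fn(x, t, True)     # prediction with conditioning
        v_uncond = velocity_fn(x, t, False)  # prediction with conditioning dropped
        v = v_uncond + guidance_scale * (v_cond - v_uncond)
        x = x + dt * v                       # one explicit Euler step
    return x


# Toy check: a constant velocity field is integrated exactly by Euler.
def toy_velocity(x: np.ndarray, t: float, use_condition: bool) -> np.ndarray:
    return np.full_like(x, 2.0) if use_condition else np.zeros_like(x)


noise = np.zeros((80, 4))
mel = euler_cfg_sample(toy_velocity, noise, n_steps=25, guidance_scale=1.5)
```

Each step costs two estimator calls, which is why implementations often batch the conditional and unconditional passes together.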

Benchmark (Apple M5, 24 GB)

Generating 3.8 s of audio end-to-end takes **8.7 s** (frontend 0.6 s, T2S 4.1 s, S2A 1.4 s, vocoder 2.6 s), compared with ~28.6 s for the original PyTorch pipeline on CPU.
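One way to read these numbers is as a real-time factor (RTF: compute time divided by audio duration, where values below 1 mean faster than real time). The snippet only redoes the arithmetic from the benchmark figures; the dictionary keys are illustrative labels.

```python
# Timings from the benchmark above (seconds).
stage_seconds = {"frontend": 0.6, "t2s": 4.1, "s2a": 1.4, "vocoder": 2.6}
audio_seconds = 3.8      # duration of the generated audio
baseline_seconds = 28.6  # original PyTorch pipeline on CPU

total = sum(stage_seconds.values())
rtf = total / audio_seconds          # > 1 means slower than real time
speedup = baseline_seconds / total   # MLX pipeline vs. CPU baseline
t2s_share = stage_seconds["t2s"] / total

print(f"total={total:.1f} s  RTF={rtf:.2f}  speedup={speedup:.1f}x  T2S share={t2s_share:.0%}")
```

This works out to an RTF of about 2.3 and roughly a 3.3x speedup over the CPU baseline, with T2S decoding accounting for just under half of the total time.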

Contents

  • confucius_mlx/: MLX implementations (t2s_mlx.py, s2a_mlx.py, vocoder_mlx.py)
  • weights/t2s_model.safetensors: T2S weights (F32, loads directly with mx.load)
  • weights/s2a_mlx.safetensors: S2A weights (weight-norm folded)
  • weights/bigvgan_mlx.safetensors: BigVGAN vocoder weights (weight-norm folded, from NVIDIA BigVGAN v2)
  • checkpoints/: tokenizer + w2v-bert normalization stats
  • inference_config.yaml (inference configuration), scripts/convert_bigvgan.py (BigVGAN weight conversion), infer_mlx.py (inference entry point)

scripts/convert_bigvgan.py is included for reproducibility (it re-fetches the original NVIDIA checkpoint and folds the weight norm), but the converted weights already ship in weights/, so you don't need to run it.
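"Folding" here means baking weight normalization into a plain weight tensor. A weight-normed layer stores a magnitude g and a direction v (in PyTorch's classic `torch.nn.utils.weight_norm`, the `weight_g` and `weight_v` tensors) and computes w = g · v / ‖v‖ on every forward pass; folding evaluates that once at conversion time. The NumPy sketch below assumes the default `dim=0` (norm over every axis except the output-channel axis) and also shows the Conv1d layout change MLX needs: MLX's `conv1d` expects weights as (C_out, K, C_in), whereas PyTorch stores (C_out, C_in, K). Whether convert_bigvgan.py is structured this way is an assumption, and the function names are illustrative.

```python
import numpy as np


def fold_weight_norm(weight_g: np.ndarray, weight_v: np.ndarray, dim: int = 0) -> np.ndarray:
    """Return w = g * v / ||v||, with the norm taken over every axis except `dim`."""
    axes = tuple(a for a in range(weight_v.ndim) if a != dim)
    v_norm = np.sqrt(np.sum(weight_v ** 2, axis=axes, keepdims=True))
    return weight_g * weight_v / v_norm


def torch_conv1d_to_mlx(weight: np.ndarray) -> np.ndarray:
    """Reorder a PyTorch Conv1d weight (C_out, C_in, K) to MLX's (C_out, K, C_in)."""
    return np.transpose(weight, (0, 2, 1))


# Example with random tensors shaped like a Conv1d(in=3, out=4, kernel=5) layer.
rng = np.random.default_rng(0)
weight_v = rng.standard_normal((4, 3, 5))
weight_g = rng.uniform(0.5, 2.0, size=(4, 1, 1))  # one magnitude per output channel
folded = torch_conv1d_to_mlx(fold_weight_norm(weight_g, weight_v))
```

After folding, each output channel's weights have norm exactly g, so the MLX model can use ordinary convolutions with no weight-norm logic at inference time. Transposed convolutions (as in BigVGAN's upsampler) use a different layout and are not covered by this sketch.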

Usage

The frontend still relies on the original repo, so clone it alongside this one and install the dependencies:

git clone https://github.com/netease-youdao/Confucius4-TTS.git
pip install mlx torch torchaudio transformers==4.52.4 sentencepiece soundfile librosa pyyaml
python infer_mlx.py --ref voice.wav --text "Xin chào" --lang vi --out out.wav
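To synthesize several lines with the same reference voice, a small driver can call infer_mlx.py once per line. This sketch uses only the four flags shown in the command above; the helper names and the `utt_NNN.wav` output naming are hypothetical. Because each call presumably reloads the models, a long-lived Python process would likely be faster for large batches.

```python
import subprocess
import sys
from pathlib import Path
from typing import Iterable, List, Tuple


def build_command(ref: str, text: str, lang: str, out: Path, script: str = "infer_mlx.py") -> List[str]:
    """Assemble the documented CLI call: --ref, --text, --lang, --out."""
    return [sys.executable, script, "--ref", str(ref), "--text", text,
            "--lang", lang, "--out", str(out)]


def synthesize_batch(ref: str, items: Iterable[Tuple[str, str]], out_dir: str,
                     dry_run: bool = False) -> List[List[str]]:
    """Run infer_mlx.py once per (text, lang) pair; with dry_run, only build the commands."""
    out_path = Path(out_dir)
    if not dry_run:
        out_path.mkdir(parents=True, exist_ok=True)
    commands = []
    for i, (text, lang) in enumerate(items):
        cmd = build_command(ref, text, lang, out_path / f"utt_{i:03d}.wav")
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands
```

Passing arguments as a list avoids shell quoting issues with spaces or non-ASCII text. Call `synthesize_batch("voice.wav", [("Xin chào", "vi")], "outputs", dry_run=True)` to inspect the commands before running them.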

Status / limitations

Research work-in-progress. The frontend (w2v-bert conformer) is intentionally left on PyTorch/MPS. All numbers above come from a single utterance on an Apple M5.

Attribution & license

Model tree for beyoru/Confucius4-TTS-mlx

Finetuned
(3)
this model