Confucius4-TTS-mlx (int8)
8-bit quantized MLX build of netease-youdao/Confucius4-TTS (multilingual + cross-lingual zero-shot voice cloning, 14 languages: zh, en, ja, ko, de, fr, es, id, it, th, pt, ru, ms, vi) for Apple Silicon.
Quantization is 8-bit with group size 64, applied to the T2S body matmuls and
the w2v-bert encoder linear layers. The semantic_head, norms, and embeddings
are kept in fp32, because quantizing the head to 8-bit audibly degrades
pronunciation. The S2A flow and the BigVGAN vocoder are also fp32. Total size
is ~2.6 GB.
- T2S: ~2.64 GB (fp32) -> ~1.2 GB
- w2v-bert: ~1.5 GB (fp32) -> ~0.6 GB
- Speed (Apple M5): real-time factor (RTF) ~1.7 (vs ~2.4 for fp32)
- Quality: matched to fp32 in listening tests
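As a back-of-the-envelope check on the sizes above, the sketch below estimates storage for group-wise 8-bit quantization. It assumes each group of 64 weights also stores one 16-bit scale and one 16-bit bias; that layout is an assumption, not something stated in this card, and the function names are illustrative only.

```python
def quantized_bits_per_weight(bits=8, group_size=64, scale_bits=16, bias_bits=16):
    """Effective bits per weight, including per-group scale and bias.

    scale_bits and bias_bits are assumed values (16-bit each per group).
    """
    return bits + (scale_bits + bias_bits) / group_size


def quantized_size_gb(fp32_size_gb, quantized_fraction, bits=8, group_size=64):
    """Estimated size when only `quantized_fraction` of the fp32 weights
    is quantized and the remainder stays in fp32."""
    bpw = quantized_bits_per_weight(bits, group_size)
    quantized_part = fp32_size_gb * quantized_fraction * bpw / 32
    fp32_part = fp32_size_gb * (1.0 - quantized_fraction)
    return quantized_part + fp32_part


# Lower bound for T2S if every weight were quantized: about 0.70 GB.
t2s_all_quantized = quantized_size_gb(2.64, 1.0)
```

Under these assumptions a fully quantized T2S would come to about 0.70 GB. The reported ~1.2 GB is larger, which is consistent with the semantic_head, norms, and embeddings staying in fp32.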
Usage
Needs the confucius4 model in mlx-audio (PR #799):

from mlx_audio.tts.utils import load

model = load("beyoru/Confucius4-TTS-mlx-int8")
for r in model.generate("Xin chào", ref_audio="voice.wav", lang="vi"):
    ...  # r.audio at 22050 Hz
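To keep the generated audio, one option is to write the chunks to a 16-bit mono WAV file at 22050 Hz using only the standard library. This is a minimal sketch: `write_wav` and `float_to_pcm16` are hypothetical helpers, and it assumes each chunk can be converted to a flat sequence of floats in [-1, 1], which this card does not confirm for `r.audio`.

```python
import struct
import wave

SAMPLE_RATE = 22050  # output rate stated in the usage snippet


def float_to_pcm16(samples):
    """Convert floats (assumed to lie in [-1, 1]) to little-endian 16-bit PCM."""
    pcm = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip out-of-range values
        pcm += struct.pack("<h", int(round(s * 32767)))
    return bytes(pcm)


def write_wav(path, chunks, sample_rate=SAMPLE_RATE):
    """Write an iterable of sample chunks to one mono 16-bit WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        for chunk in chunks:
            wf.writeframes(float_to_pcm16(chunk))


# Hypothetical usage, assuming r.audio converts to a flat list of floats:
# write_wav("out.wav", (list(map(float, r.audio)) for r in model.generate(...)))
```

Writing chunk by chunk avoids holding the full waveform in memory when generation yields several segments.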
Attribution & license
- Model & architecture: netease-youdao/Confucius4-TTS (Apache-2.0)
- Vocoder: NVIDIA BigVGAN v2; speaker encoder: 3D-Speaker CAMPPlus (funasr)
- MLX port by Hert4, released under Apache-2.0.