mlx-indextts2-vietnamese-8bit
This is a converted MLX IndexTTS2 model for Apple Silicon inference with solar2ain/mlx-indextts.
It was prepared for the local /Users/vanch/index-tts IndexTTS2 optimization project, where the goal was stable Vietnamese and multilingual TTS on an M3 Max Mac without PyTorch MPS memory crashes.
Variant
- Profile: Vietnamese
- Precision / quantization: 8bit
- Approx. local size: 2.0 GB
- Source checkpoint directory during conversion: /Users/vanch/index-tts/checkpoints_vi
- Note: Vietnamese model with upstream MLX GPT-only 8-bit quantization.
- Conversion detail: converted with `mlx-indextts convert --quantize 8`. In the current upstream implementation this quantizes only the GPT component; S2Mel and BigVGAN stay fp32.
Expected Files
The repository root is a ready-to-use MLX IndexTTS2 model directory:
- gpt.safetensors
- s2mel.safetensors
- bigvgan.safetensors
- vq2emb.safetensors
- tokenizer.model
- config.yaml
- config.json
- feat1.pt
- feat2.pt
- wav2vec2bert_stats.pt
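A quick completeness check for a local copy (a minimal sketch; it assumes the model was downloaded to models/mlx-indextts2-vietnamese-8bit as in the Usage section below):

```bash
# Report any expected model file that is missing from the local directory.
for f in gpt.safetensors s2mel.safetensors bigvgan.safetensors vq2emb.safetensors \
         tokenizer.model config.yaml config.json feat1.pt feat2.pt wav2vec2bert_stats.pt; do
  [ -f "models/mlx-indextts2-vietnamese-8bit/$f" ] || echo "missing: $f"
done
```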
Usage
Install and use mlx-indextts:
```bash
git clone https://github.com/solar2ain/mlx-indextts.git
cd mlx-indextts
uv sync --extra convert --extra v2

huggingface-cli download vanch007/mlx-indextts2-vietnamese-8bit \
  --local-dir models/mlx-indextts2-vietnamese-8bit \
  --local-dir-use-symlinks False

uv run mlx-indextts generate \
  -m models/mlx-indextts2-vietnamese-8bit \
  -r /path/to/reference_or_speaker.npz \
  -t "Your text here" \
  -o output.wav \
  --memory-limit 24 \
  --diffusion-steps 16
```
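On macOS the result can be auditioned straight from the terminal with the built-in player:

```bash
afplay output.wav
```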
For repeated generation, precompute speaker conditioning first:
```bash
uv run mlx-indextts speaker \
  -m models/mlx-indextts2-vietnamese-8bit \
  -r /path/to/reference.wav \
  -o speaker.npz \
  --memory-limit 24
```
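The resulting speaker.npz can then be passed as the -r argument of generate, which skips re-extracting speaker conditioning on every call. For example, a batch run over a file of sentences (a sketch; sentences.txt, one sentence per line, is an assumed input):

```bash
# Generate one wav per line of sentences.txt, reusing the precomputed speaker.
i=0
while IFS= read -r line; do
  i=$((i + 1))
  uv run mlx-indextts generate \
    -m models/mlx-indextts2-vietnamese-8bit \
    -r speaker.npz \
    -t "$line" \
    -o "out_${i}.wav" \
    --memory-limit 24 \
    --diffusion-steps 16
done < sentences.txt
```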
Benchmark
Benchmarked on a 128 GB unified-memory M3 Max Mac using:
- mlx-indextts from solar2ain/mlx-indextts
- precomputed .npz speaker conditioning
- memory_limit=24GB, diffusion_steps=16
- emotion=calm, emo_alpha=0.6
- the same text set across fp32 / fp16 / 8bit / optimized PyTorch MPS
RTF is the real-time factor, i.e. synthesis time divided by output audio duration; lower is faster:
| Case | fp32 MLX RTF | fp16 MLX RTF | 8bit MLX RTF | PyTorch MPS RTF |
|---|---|---|---|---|
| vi short | 1.562 | 1.471 | 0.976 | 2.329 |
| vi long | 1.557 | 1.500 | 0.965 | 1.822 |
Summary from the local comparison:
- 8bit was the fastest MLX route in this test set.
- fp16 saved space but was slower than fp32 for the standard profile.
- Vietnamese fp16 was slightly faster than Vietnamese fp32, but Vietnamese 8bit was fastest.
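To get a rough RTF reading on your own setup, you can time a run and divide by the output duration. A sketch, with caveats: it assumes a precomputed speaker.npz, parses the "estimated duration" line of macOS afinfo output, and the whole-second timing plus model-load overhead make it only approximate:

```bash
# Rough RTF: wall-clock synthesis time / output audio duration.
start=$(date +%s)
uv run mlx-indextts generate \
  -m models/mlx-indextts2-vietnamese-8bit \
  -r speaker.npz \
  -t "Câu văn dùng để đo tốc độ tổng hợp giọng nói." \
  -o rtf_test.wav \
  --memory-limit 24 \
  --diffusion-steps 16
elapsed=$(( $(date +%s) - start ))
# afinfo prints a line like "estimated duration: 8.123 sec" on macOS.
audio_s=$(afinfo rtf_test.wav | awk '/estimated duration/ {print $3}')
echo "RTF ≈ $(echo "$elapsed / $audio_s" | bc -l)"
```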
ASR Validation
ASR validation with local mlx_whisper + whisper-large-v3-turbo found no empty audio, wrong-language output, or obvious missing sentences. Vietnamese long-form ASR still showed minor tone/word-ending differences, so subjective listening is recommended for production use.
ASR was used only as an automated sanity check. Final production selection should still include human listening, especially for long-form Vietnamese narration.
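A similar automated sanity check can be run on your own outputs with the mlx-whisper package, which transcribes locally on Apple Silicon (a sketch; the --model flag follows that package's documented CLI, but treat the exact flags as an assumption and confirm with mlx_whisper --help):

```bash
pip install mlx-whisper
# Transcribe the generated audio and compare the text against the input sentence.
mlx_whisper output.wav --model mlx-community/whisper-large-v3-turbo
```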
Provenance and Scope
This is an MLX conversion for local Apple Silicon inference, not the original PyTorch release. The original implementation and model family are associated with IndexTTS / IndexTTS2; the MLX runtime used here is solar2ain/mlx-indextts.
The benchmark numbers are environment-specific and should be treated as local M3 Max results, not universal performance guarantees.