TigreGotico/vconnx-chatterbox
ONNX artifacts for the Chatterbox VC (voice-conversion) engine used by vconnx.
Provenance
Derived from onnx-community/chatterbox-onnx
(Apache-2.0) via the vconnx export pipeline. Only the VC path is present —
the LLM / tokenizer components are not included because the VC pipeline does
not require them.
Original model: Chatterbox by Resemble AI (resemble-ai/chatterbox).
Files
| File | Size | Variant | Notes |
|---|---|---|---|
| `speech_encoder.onnx` + `.onnx_data` | ~565 MB | fp32 | From onnx-community upstream |
| `speech_encoder_q8.onnx` | ~216 MB | INT8 | Dynamic quantization (MatMul only) |
| `conditional_decoder.onnx` + `.onnx_data` | ~516 MB | fp32 | From onnx-community upstream |
| `conditional_decoder_q8.onnx` | ~252 MB | INT8 | Dynamic quantization (MatMul only, If-subgraph excluded) |
Total fp32: ~1 081 MB → Total INT8: ~468 MB (57 % reduction)
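The totals follow directly from the per-file sizes listed above. A minimal check, using only those approximate sizes (the variable names are illustrative and not part of vconnx):

```python
# Approximate file sizes in MB, copied from the Files table.
fp32_sizes = {"speech_encoder": 565, "conditional_decoder": 516}
int8_sizes = {"speech_encoder_q8": 216, "conditional_decoder_q8": 252}

total_fp32 = sum(fp32_sizes.values())
total_int8 = sum(int8_sizes.values())

# Relative size reduction when switching from fp32 to INT8.
reduction_pct = 100 * (1 - total_int8 / total_fp32)

print(f"fp32: {total_fp32} MB, INT8: {total_int8} MB, reduction: {reduction_pct:.0f} %")
```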
Quantization notes
speech_encoder_q8.onnx:
- Quantized via `onnxruntime.quantization.quantize_dynamic` (MatMul ops only).
- The two `Gemm` nodes in the S3 VQ codebook (`project_down`) are excluded from quantization because `quantize_dynamic` decomposes `Gemm(transB=1)` without transposing the weight, a known ORT preprocessing bug that corrupts the resulting MatMul node. A pre-quantization patch transposes the weight and rewrites those nodes as `MatMul` + `Add` before the quantizer runs.
- Discrete token outputs match fp32 at runtime (token selection is robust to the remaining float error in the continuous embeddings).
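To see why the transpose matters, here is a small pure-Python illustration of the underlying arithmetic. It is not the vconnx patch or ORT code: the matrices and helper names are made up for the example, and `alpha = beta = 1` is assumed. With `transB=1`, Gemm computes `X @ W^T + bias`, so an equivalent `MatMul` + `Add` has to store the transposed weight.

```python
def matmul(a, b):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def add_bias(m, bias):
    return [[v + b for v, b in zip(row, bias)] for row in m]


def gemm_trans_b(x, w, bias):
    """Gemm with transB=1 and alpha=beta=1: Y = X @ W^T + bias."""
    return add_bias(matmul(x, transpose(w)), bias)


x = [[1.0, 2.0]]
w = [[1.0, 2.0], [3.0, 4.0]]  # stored as (out_features, in_features)
bias = [0.5, -0.5]

reference = gemm_trans_b(x, w, bias)

# Correct rewrite: transpose the weight once, then MatMul followed by Add.
patched = add_bias(matmul(x, transpose(w)), bias)

# Faulty rewrite: MatMul on the untransposed weight gives a different result.
broken = add_bias(matmul(x, w), bias)

print(reference, patched, broken)
```

Because the example weight is square, the faulty rewrite does not fail with a shape error; it silently produces different numbers.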
conditional_decoder_q8.onnx:
- Quantized via `onnxruntime.quantization.quantize_dynamic` (MatMul ops only).
- The decoder contains 20 `If` nodes; the nodes inside their subgraphs are excluded from quantization to avoid a hang in ORT's session initializer.
- WER is identical to fp32 on the vconnx reference clips (8% vs 8%).
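A rough sketch of how a MatMul-only dynamic quantization with subgraph nodes excluded could be set up. This is not the vconnx export pipeline: the helper names are hypothetical, and it assumes the subgraph nodes carry non-empty names, since `nodes_to_exclude` matches nodes by name.

```python
GRAPH_ATTR = 5  # numeric value of onnx.AttributeProto.GRAPH


def subgraph_node_names(graph):
    """Collect names of nodes that live inside control-flow subgraphs (e.g. If branches)."""
    names = []
    for node in graph.node:
        for attr in node.attribute:
            if attr.type == GRAPH_ATTR:
                # Nodes directly inside this branch.
                names.extend(inner.name for inner in attr.g.node if inner.name)
                # Nodes inside any nested control-flow subgraphs.
                names.extend(subgraph_node_names(attr.g))
    return names


def quantize_matmul_only(src_path, dst_path):
    """Dynamically quantize MatMul ops only, skipping nodes found in subgraphs."""
    # Imported lazily so the helper above works without ONNX Runtime installed.
    import onnx
    from onnxruntime.quantization import quantize_dynamic

    # Only the graph structure is needed here, not the external weight data.
    model = onnx.load(src_path, load_external_data=False)
    quantize_dynamic(
        src_path,
        dst_path,
        op_types_to_quantize=["MatMul"],
        nodes_to_exclude=subgraph_node_names(model.graph),
    )
```

A call such as `quantize_matmul_only("conditional_decoder.onnx", "conditional_decoder_q8.onnx")` would write the quantized model next to the original; the paths are placeholders.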
WER results
Measured with faster-whisper base.en on the vconnx reference demo clip
(10.7 s, source.wav):
| Variant | WER | Size (MB) |
|---|---|---|
| fp32 | 8% | 1081 |
| INT8 | 8% | 468 |
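For readers unfamiliar with the metric, WER is the word-level edit distance between a reference transcript and the recognized text, divided by the number of reference words. The sketch below is a generic implementation, not the vconnx evaluation script; the lowercasing and whitespace tokenization are assumptions, since the text normalization used for the numbers above is not stated.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("the quick brown fox", "the quick brown box"))
```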
Gate: INT8 is flagged ⚠ when its WER is above 25% and more than 15 pp worse than fp32. INT8 passes the gate and is the recommended variant.
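The gate can be written as a small predicate. This is a sketch of the rule as stated, with a hypothetical function name; both conditions must hold for the INT8 variant to be flagged.

```python
def int8_flagged(wer_int8_pct: float, wer_fp32_pct: float) -> bool:
    """Return True when INT8 should be flagged under the stated gate."""
    absolute_limit_pct = 25.0  # INT8 WER must exceed this ...
    relative_limit_pp = 15.0   # ... and be this many points worse than fp32
    too_high = wer_int8_pct > absolute_limit_pct
    much_worse = (wer_int8_pct - wer_fp32_pct) > relative_limit_pp
    return too_high and much_worse


# Values from the WER table: 8% for both variants, so INT8 is not flagged.
print(int8_flagged(8.0, 8.0))
```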
Usage
Install vconnx and use the chatterbox engine with quantized=True:
```python
from vconnx import VoiceCloner

cloner = VoiceCloner(engine="chatterbox", quantized=True)
result = cloner.clone_voice("source.wav", "reference.wav", "output.wav")
```
License
Apache-2.0 (inherited from upstream onnx-community/chatterbox-onnx).