How to use hans00/Chatterbox-Multilingual-TTS-GGUF with Chatterbox:
```python
# pip install chatterbox-tts
# Note: this snippet loads the upstream PyTorch checkpoint from ResembleAI/chatterbox
# through the chatterbox-tts package; it does not read the GGUF files in this repo.
import torchaudio as ta
from chatterbox.tts import ChatterboxTTS

model = ChatterboxTTS.from_pretrained(device="cuda")

text = "Ezreal and Jinx teamed up with Ahri, Yasuo, and Teemo to take down the enemy's Nexus in an epic late-game pentakill."
wav = model.generate(text)
ta.save("test-1.wav", wav, model.sr)

# To synthesize with a different voice, pass a reference audio prompt
AUDIO_PROMPT_PATH = "YOUR_FILE.wav"
wav = model.generate(text, audio_prompt_path=AUDIO_PROMPT_PATH)
ta.save("test-2.wav", wav, model.sr)
```
Chatterbox-Multilingual T3 GGUF
End-to-end GGUF conversion of ResembleAI's Chatterbox multilingual T3 (`t3_mtl23ls_v3.safetensors`, covering 23 languages).
This release adopts the native codec_lm split used by CSM, Qwen3-TTS, and MOSS-TTSD, shipping three files:
- Backbone (`chatterbox-mtl-t3-<quant>.gguf`): a stock `llama`-arch GGUF of Chatterbox T3's Llama-520M `tfmr.*` weights (30 layers, hidden size 1024, 16 heads, head_dim=64, MLP 4096, llama3 RoPE scaling, rope_theta=500000, vocab placeholder `tokenizer.ggml.model = "none"`). Runs in stock llama.cpp with `embeddings=true`.
- Codec + codec_lm (`chatterbox-mtl-codec-<quant>.gguf`): Chatterbox S3G (flow-matching decoder + HiFi-GAN vocoder) bundled with the T3 LM-adaptor side (audio embedding table, speech head, text embedding/head, learned positional embeddings, conditioning encoder weights). Runs in codec.cpp as a `parallel_heads_delay` codec_lm with `n_cb=1`.
- S3T tokenizer (`chatterbox-mtl-s3t.gguf`): speech tokenizer needed when registering a custom voice from a reference WAV.
Compared to the earlier release in this repo, the old `t3-*.gguf` (custom shape) and `t3-extras.gguf` files are gone: everything the host runtime needs to drive the LM adaptor is now bundled into `chatterbox-mtl-codec-*.gguf` and exposed through codec.cpp's `codec_lm` API.
Files
Backbone (chatterbox-mtl-t3-<quant>.gguf)
| File | Size |
|---|---|
| `chatterbox-mtl-t3-f32.gguf` | 1.9 GB |
| `chatterbox-mtl-t3-f16.gguf` | 961 MB |
| `chatterbox-mtl-t3-bf16.gguf` | 961 MB |
| `chatterbox-mtl-t3-q8_0.gguf` | 511 MB |
| `chatterbox-mtl-t3-q6_k.gguf` | 395 MB |
| `chatterbox-mtl-t3-q5_1.gguf` | 361 MB |
| `chatterbox-mtl-t3-q5_k_m.gguf` | 340 MB |
| `chatterbox-mtl-t3-q5_k_s.gguf` | 331 MB |
| `chatterbox-mtl-t3-q5_0.gguf` | 331 MB |
| `chatterbox-mtl-t3-q4_1.gguf` | 301 MB |
| `chatterbox-mtl-t3-q4_k_m.gguf` | 289 MB |
| `chatterbox-mtl-t3-q4_k_s.gguf` | 273 MB |
| `chatterbox-mtl-t3-q4_0.gguf` | 271 MB |
| `chatterbox-mtl-t3-q3_k_l.gguf` | 254 MB |
| `chatterbox-mtl-t3-q3_k_m.gguf` | 232 MB |
| `chatterbox-mtl-t3-q3_k_s.gguf` | 207 MB |
| `chatterbox-mtl-t3-q2_k.gguf` | 177 MB |
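As a rough sanity check on the backbone hyperparameters, this sketch estimates the `tfmr.*` parameter count and the resulting unquantized file size. It assumes 16 key/value heads (no grouped-query attention), bias-free projections, a three-matrix SwiGLU MLP, and RMSNorm weights only, with the placeholder vocab contributing no embedding table; none of these details are stated in this card.

```python
# Rough parameter count for the Llama-520M backbone.
# Assumptions (not stated in the card): 16 KV heads (no GQA), no biases,
# SwiGLU MLP with gate/up/down projections, RMSNorm weights only,
# and no token-embedding table because the vocab is a placeholder.
HIDDEN = 1024
N_LAYERS = 30
N_HEADS = 16
HEAD_DIM = 64
MLP = 4096

assert N_HEADS * HEAD_DIM == HIDDEN  # 16 heads x 64 = 1024


def backbone_params() -> int:
    attn = 4 * HIDDEN * HIDDEN   # q, k, v, o projections
    mlp = 3 * HIDDEN * MLP       # gate, up, down projections
    norms = 2 * HIDDEN           # pre-attention and pre-MLP RMSNorm
    per_layer = attn + mlp + norms
    return N_LAYERS * per_layer + HIDDEN  # plus the final norm


params = backbone_params()
f16_mib = params * 2 / 2**20  # 2 bytes per weight
f32_gib = params * 4 / 2**30  # 4 bytes per weight
print(f"{params:,} params, ~{f16_mib:.0f} MiB at f16, ~{f32_gib:.2f} GiB at f32")
```

Under those assumptions the estimate is about 503M parameters, roughly 960 MiB at f16 and 1.88 GiB at f32, which is close to the 961 MB and 1.9 GB listed in the table if those sizes are binary units.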
Codec + codec_lm (chatterbox-mtl-codec-<quant>.gguf)
| File | Size |
|---|---|
| `chatterbox-mtl-codec-f32.gguf` | 572 MB |
| `chatterbox-mtl-codec-f16.gguf` | 317 MB |
| `chatterbox-mtl-codec-q8_0.gguf` | 226 MB |
| `chatterbox-mtl-codec-q5_k_m.gguf` | 190 MB |
| `chatterbox-mtl-codec-q4_k_m.gguf` | 178 MB |
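Because the backbone and codec are published at different sets of precisions, a host picks one of each independently. The helper below is a hypothetical sketch (`chatterbox_files` is not part of any library) that validates a quant pair against the two tables and returns the filenames to fetch, adding the S3T tokenizer only when voice cloning is needed.

```python
# Hypothetical helper: choose the GGUF files for one inference setup.
# Quant lists are copied from the backbone and codec tables.
BACKBONE_QUANTS = {
    "f32", "f16", "bf16", "q8_0", "q6_k", "q5_1", "q5_k_m", "q5_k_s",
    "q5_0", "q4_1", "q4_k_m", "q4_k_s", "q4_0", "q3_k_l", "q3_k_m",
    "q3_k_s", "q2_k",
}
CODEC_QUANTS = {"f32", "f16", "q8_0", "q5_k_m", "q4_k_m"}


def chatterbox_files(backbone_quant: str, codec_quant: str,
                     voice_clone: bool = False) -> list[str]:
    """Return the repo filenames needed for a backbone/codec quant pair."""
    if backbone_quant not in BACKBONE_QUANTS:
        raise ValueError(f"no backbone file for quant {backbone_quant!r}")
    if codec_quant not in CODEC_QUANTS:
        raise ValueError(f"no codec file for quant {codec_quant!r}")
    files = [
        f"chatterbox-mtl-t3-{backbone_quant}.gguf",
        f"chatterbox-mtl-codec-{codec_quant}.gguf",
    ]
    if voice_clone:
        # The S3T tokenizer is only needed to register a custom voice.
        files.append("chatterbox-mtl-s3t.gguf")
    return files
```

Each returned name can then be fetched with `huggingface_hub.hf_hub_download(repo_id="hans00/Chatterbox-Multilingual-TTS-GGUF", filename=name)`.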
S3T speech tokenizer
`chatterbox-mtl-s3t.gguf` (F16, 237 MB): needed for voice cloning. It encodes a reference WAV into the speech-token IDs that the codec_lm consumes when registering a custom speaker. The weights are the same as in English Chatterbox.
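To get a feel for how much a reference clip turns into, this sketch estimates the number of speech tokens the S3T tokenizer would produce. It assumes the upstream Chatterbox S3 tokenizer rates (16 kHz input, 25 speech tokens per second), which this card does not state; `reference_token_budget` and `silent_wav` are hypothetical helpers, and the actual encoding happens inside the runtime.

```python
import io
import wave

# Assumptions (from upstream Chatterbox, not this card): 16 kHz input audio,
# 25 speech tokens per second of audio.
S3T_SAMPLE_RATE = 16_000
S3T_TOKENS_PER_SEC = 25


def reference_token_budget(wav_bytes: bytes) -> tuple[float, int]:
    """Return (duration in seconds, approximate speech-token count) for a PCM WAV.

    A clip at another sample rate would need resampling to 16 kHz before
    encoding; duration, and therefore the estimate, is rate-independent.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        duration = w.getnframes() / w.getframerate()
    return duration, round(duration * S3T_TOKENS_PER_SEC)


def silent_wav(seconds: float, rate: int = S3T_SAMPLE_RATE) -> bytes:
    """Build an in-memory mono 16-bit silent WAV for testing."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit PCM
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


print(reference_token_budget(silent_wav(2.0)))  # (2.0, 50)
```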
Inference shape
Per-frame autoregressive (AR) loop, using single-codebook `parallel_heads_delay`:
```
backbone (Llama-520M, embeddings=true) hidden h
  → codec_lm_step_begin(state, h)
  → codec_lm_step_logits(0) → sample speech token → codec_lm_step_push_code
  → codec_lm_step_finish → codes[1]
  → codec_lm_compose_audio_embd(codes) + speech_pos_emb[step] → next-step embedding
  → feed via b.embd; loop until stop_speech_token (6562)
```
Prompt-prefix assembly (text + conditioning) happens inside codec.cpp's Chatterbox path, so the host application never touches the `lm.chatterbox.*` tensors directly.
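The per-frame loop above can be sketched in Python with toy stand-ins. `ToyBackbone`, `ToyCodecLM`, `speech_pos_emb`, and `generate` are hypothetical: the method names only mirror the `codec_lm_step_*` calls listed above, while the real runtime is codec.cpp's C API (driven from llama.rn), whose exact signatures are not given here. The prompt prefix is reduced to a single vector, and sampling is greedy so the output is deterministic; a real host would use its own sampler.

```python
STOP_SPEECH_TOKEN = 6562
VOCAB = STOP_SPEECH_TOKEN + 1  # hypothetical: just large enough to hold the stop id
HIDDEN = 8                     # toy size; the real backbone hidden size is 1024


class ToyBackbone:
    """Stand-in for llama.cpp run with embeddings=true: embedding in, hidden h out."""

    def forward(self, embd: list[float]) -> list[float]:
        return [0.5 * x for x in embd]


class ToyCodecLM:
    """Hypothetical Python mirror of the codec_lm_step_* calls, with n_cb=1."""

    def __init__(self, n_frames: int):
        self.n_frames = n_frames  # toy: emit the stop token after this many frames
        self.step = 0
        self._h: list[float] = []
        self._codes: list[int] = []

    def step_begin(self, h: list[float]) -> None:
        self._h = h  # a real adaptor would presumably project h via the speech head
        self._codes = []

    def step_logits(self, cb: int) -> list[float]:
        logits = [0.0] * VOCAB
        target = STOP_SPEECH_TOKEN if self.step >= self.n_frames else self.step
        logits[target] = 1.0  # toy one-hot logits
        return logits

    def step_push_code(self, code: int) -> None:
        self._codes.append(code)

    def step_finish(self) -> list[int]:
        self.step += 1
        return list(self._codes)  # one code per frame (single codebook)

    def compose_audio_embd(self, codes: list[int]) -> list[float]:
        return [float(codes[0] % 7)] * HIDDEN  # toy audio-embedding lookup


def speech_pos_emb(step: int) -> list[float]:
    return [0.01 * step] * HIDDEN  # toy learned positional embedding


def generate(backbone, lm, prefix_embd: list[float], max_frames: int = 1000) -> list[int]:
    embd, out = prefix_embd, []
    for step in range(max_frames):
        h = backbone.forward(embd)
        lm.step_begin(h)
        logits = lm.step_logits(0)  # codebook 0, the only one
        code = max(range(len(logits)), key=logits.__getitem__)  # greedy pick
        lm.step_push_code(code)
        codes = lm.step_finish()  # codes[1]
        if codes[0] == STOP_SPEECH_TOKEN:
            break
        out.append(codes[0])
        # Next input: audio embedding of the new code plus its position embedding,
        # which a real host would feed back through the batch's embd field.
        pos = speech_pos_emb(step)
        embd = [a + p for a, p in zip(lm.compose_audio_embd(codes), pos)]
    return out


print(generate(ToyBackbone(), ToyCodecLM(n_frames=5), prefix_embd=[1.0] * HIDDEN))
# [0, 1, 2, 3, 4]
```

The `max_frames` cap is a design choice for the sketch: it guarantees termination even if the model never emits the stop token.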
Sources
- Upstream model: ResembleAI/chatterbox (`t3_mtl23ls_v3.safetensors`)
- Conversion tooling: mybigday/codec.cpp (`prep_chatterbox_t3` + `lm_adaptor/chatterbox.py` + `ChatterboxS3GConverter` with `lm_source`)
- Inference runtime: mybigday/llama.rn
Supported languages
Arabic, Danish, German, Greek, English, Spanish, Finnish, French, Hebrew, Hindi, Italian, Japanese, Korean, Malay, Dutch, Norwegian, Polish, Portuguese, Russian, Swedish, Swahili, Turkish, Chinese (23 total).
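A host application usually needs a language identifier alongside the text. The mapping below pairs the 23 listed languages with their standard ISO 639-1 codes; whether a given runtime expects exactly these identifiers is an assumption, and `check_language` is a hypothetical helper that normalizes regional tags such as `zh-CN` to the base code.

```python
# ISO 639-1 codes for the 23 supported languages (identifier format is an assumption).
SUPPORTED_LANGUAGES = {
    "ar": "Arabic", "da": "Danish", "de": "German", "el": "Greek",
    "en": "English", "es": "Spanish", "fi": "Finnish", "fr": "French",
    "he": "Hebrew", "hi": "Hindi", "it": "Italian", "ja": "Japanese",
    "ko": "Korean", "ms": "Malay", "nl": "Dutch", "no": "Norwegian",
    "pl": "Polish", "pt": "Portuguese", "ru": "Russian", "sv": "Swedish",
    "sw": "Swahili", "tr": "Turkish", "zh": "Chinese",
}


def check_language(code: str) -> str:
    """Normalize a language tag to its base code and reject unsupported ones."""
    norm = code.strip().lower().replace("_", "-").split("-")[0]  # "zh-CN" -> "zh"
    if norm not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language {code!r}")
    return norm
```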