Chatterbox-Multilingual T3 GGUF

End-to-end GGUF conversion of ResembleAI's Chatterbox multilingual T3 (t3_mtl23ls_v3.safetensors, covering 23 languages).

This release adopts the native codec_lm split used by CSM / Qwen3-TTS / MOSS-TTSD:

  • Backbone (chatterbox-mtl-t3-<quant>.gguf): stock llama-arch GGUF of Chatterbox T3's Llama-520M tfmr.* weights (30 layers, hidden 1024, 16 heads, head_dim=64, MLP 4096, llama3 RoPE scaling, rope_theta=500000, placeholder vocab with tokenizer.ggml.model = "none"). Runs in stock llama.cpp with embeddings=true.
  • Codec + codec_lm (chatterbox-mtl-codec-<quant>.gguf): Chatterbox S3G (flow-matching decoder + HiFi-GAN vocoder) bundled with the T3 LM-adaptor side (audio embed table, speech head, text embed/head, learned positional embeddings, cond encoder weights). Runs in codec.cpp as a parallel_heads_delay codec_lm with n_cb=1.
  • S3T tokenizer (chatterbox-mtl-s3t.gguf): speech tokenizer, needed when registering a custom voice from a reference WAV.
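
As a rough cross-check of the backbone dimensions listed above, the transformer weight count can be estimated from the layer shapes alone. The sketch below assumes full multi-head attention (as many key/value heads as query heads), a gated MLP with three projections, and two norm vectors per layer. None of these details is stated in this release, and embedding tables are left out of the count.

```python
# Dimensions quoted in the backbone description.
HIDDEN = 1024
N_LAYERS = 30
N_HEADS = 16
MLP = 4096

# head_dim follows from hidden size and head count.
head_dim = HIDDEN // N_HEADS

# Assumption: q, k, v and o projections are all HIDDEN x HIDDEN (no grouped-query attention).
attn_params = 4 * HIDDEN * HIDDEN

# Assumption: gated MLP with gate, up and down projections.
mlp_params = 3 * HIDDEN * MLP

# Assumption: two norm weight vectors per layer, plus one final norm.
norm_params = 2 * HIDDEN

per_layer = attn_params + mlp_params + norm_params
total_params = N_LAYERS * per_layer + HIDDEN

print(f"head_dim     = {head_dim}")
print(f"per layer    = {per_layer:,}")
print(f"total params = {total_params:,}")
```

Under these assumptions the estimate comes to about 503 million weights, somewhat under the 520M in the model's name. The estimate covers only the transformer layers and the final norm.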

Compared to the earlier release in this repo, the old t3-*.gguf (custom shape) and t3-extras.gguf are gone: everything the host runtime needs to drive the LM adaptor is now bundled into chatterbox-mtl-codec-*.gguf and exposed through codec.cpp's codec_lm API.

Files

Backbone (chatterbox-mtl-t3-<quant>.gguf)

File                            Size
chatterbox-mtl-t3-f32.gguf      1.9 GB
chatterbox-mtl-t3-f16.gguf      961 MB
chatterbox-mtl-t3-bf16.gguf     961 MB
chatterbox-mtl-t3-q8_0.gguf     511 MB
chatterbox-mtl-t3-q6_k.gguf     395 MB
chatterbox-mtl-t3-q5_1.gguf     361 MB
chatterbox-mtl-t3-q5_k_m.gguf   340 MB
chatterbox-mtl-t3-q5_k_s.gguf   331 MB
chatterbox-mtl-t3-q5_0.gguf     331 MB
chatterbox-mtl-t3-q4_1.gguf     301 MB
chatterbox-mtl-t3-q4_k_m.gguf   289 MB
chatterbox-mtl-t3-q4_k_s.gguf   273 MB
chatterbox-mtl-t3-q4_0.gguf     271 MB
chatterbox-mtl-t3-q3_k_l.gguf   254 MB
chatterbox-mtl-t3-q3_k_m.gguf   232 MB
chatterbox-mtl-t3-q3_k_s.gguf   207 MB
chatterbox-mtl-t3-q2_k.gguf     177 MB
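
The listed sizes can be turned into an approximate bits-per-weight figure for each backbone quant. This is only a sketch: it assumes roughly 503 million backbone weights (30 layers, each with four 1024x1024 attention projections and three 1024x4096 MLP projections, plus norm vectors, which is an estimate rather than a figure from this release), and it assumes that "MB" in the table means MiB (2^20 bytes). File metadata is ignored.

```python
# Assumption: estimated backbone weight count, not a number published with the release.
N_PARAMS = 503_378_944

# Assumption: the table's "MB" are MiB.
BYTES_PER_MIB = 1 << 20

# A few backbone sizes copied from the table above, in MB.
SIZES_MB = {
    "f16": 961,
    "q8_0": 511,
    "q6_k": 395,
    "q5_k_m": 340,
    "q4_k_m": 289,
    "q4_0": 271,
    "q3_k_m": 232,
    "q2_k": 177,
}


def bits_per_weight(size_mb: float, n_params: int = N_PARAMS) -> float:
    """Average number of bits stored per weight for a file of the given size."""
    return size_mb * BYTES_PER_MIB * 8 / n_params


bpw = {name: bits_per_weight(size) for name, size in SIZES_MB.items()}

for name, value in bpw.items():
    print(f"{name:8s} {value:5.2f} bits/weight")
```

If the assumptions hold, f16 should land near 16 bits per weight, and the quantized files somewhat above their nominal bit width because of per-block scale data.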

Codec + codec_lm (chatterbox-mtl-codec-<quant>.gguf)

File                               Size
chatterbox-mtl-codec-f32.gguf      572 MB
chatterbox-mtl-codec-f16.gguf      317 MB
chatterbox-mtl-codec-q8_0.gguf     226 MB
chatterbox-mtl-codec-q5_k_m.gguf   190 MB
chatterbox-mtl-codec-q4_k_m.gguf   178 MB

S3T speech tokenizer

chatterbox-mtl-s3t.gguf (F16, 237 MB): needed for voice cloning. It encodes a reference WAV into the speech token IDs that the codec_lm consumes when registering a custom speaker. Same weights as English Chatterbox.

Inference shape

Per-frame AR loop, single-codebook parallel_heads_delay:

backbone (Llama-520M, embeddings=true) hidden h
    → codec_lm_step_begin(state, h)
    → codec_lm_step_logits(0) → sample speech-token → codec_lm_step_push_code
    → codec_lm_step_finish → codes[1]
    → codec_lm_compose_audio_embd(codes) + speech_pos_emb[step] → next-step embedding
    → feed via b.embd; loop until stop_speech_token (6562)

Prompt prefix assembly (text + cond) lives inside codec.cpp's chatterbox path; the host application doesn't directly touch lm.chatterbox.* tensors.
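
The control flow of the loop above can be illustrated with a small, self-contained Python sketch. Everything here is a toy stand-in: ToyBackbone and ToyCodecLM are hypothetical classes whose method names only mirror the codec_lm_* steps listed above, not codec.cpp's actual signatures. The prompt prefix is reduced to a single embedding, sampling is replaced by greedy argmax, and the speech vocabulary size is assumed to be just large enough to hold the stop id.

```python
from typing import List

STOP_SPEECH_TOKEN = 6562                 # stop id from the loop description
N_SPEECH_VOCAB = STOP_SPEECH_TOKEN + 1   # assumption: just large enough to hold the stop id
HIDDEN = 8                               # toy width; the real backbone uses 1024


class ToyBackbone:
    """Hypothetical stand-in for the Llama backbone run with embeddings=true."""

    def __init__(self) -> None:
        self.n_past = 0

    def forward(self, embd: List[float]) -> List[float]:
        # The toy ignores the input and returns a hidden state that encodes position only.
        self.n_past += 1
        return [float(self.n_past)] * HIDDEN


class ToyCodecLM:
    """Hypothetical stand-in for the codec_lm step API (single codebook)."""

    def __init__(self, stop_after: int) -> None:
        self.stop_after = stop_after
        self._h: List[float] = []
        self._codes: List[int] = []

    def step_begin(self, h: List[float]) -> None:
        self._h, self._codes = h, []

    def step_logits(self, codebook: int) -> List[float]:
        assert codebook == 0  # n_cb=1, so only codebook 0 exists
        pos = int(self._h[0])
        target = STOP_SPEECH_TOKEN if pos > self.stop_after else (pos * 37) % STOP_SPEECH_TOKEN
        logits = [0.0] * N_SPEECH_VOCAB
        logits[target] = 1.0
        return logits

    def step_push_code(self, code: int) -> None:
        self._codes.append(code)

    def step_finish(self) -> List[int]:
        return list(self._codes)  # one code per frame

    def compose_audio_embd(self, codes: List[int]) -> List[float]:
        return [float(codes[0] % 7)] * HIDDEN

    def speech_pos_emb(self, step: int) -> List[float]:
        return [0.01 * step] * HIDDEN


def generate(backbone: ToyBackbone, lm: ToyCodecLM,
             prefix_embd: List[float], max_frames: int = 1000) -> List[int]:
    """Per-frame autoregressive loop; returns speech tokens up to the stop token."""
    frames: List[int] = []
    embd = prefix_embd
    for step in range(max_frames):
        h = backbone.forward(embd)                 # backbone hidden state h
        lm.step_begin(h)
        logits = lm.step_logits(0)
        code = max(range(len(logits)), key=logits.__getitem__)  # greedy pick, not sampling
        lm.step_push_code(code)
        codes = lm.step_finish()
        if codes[0] == STOP_SPEECH_TOKEN:
            break
        frames.append(codes[0])
        # Next-step embedding: audio embedding plus learned position for this step.
        audio = lm.compose_audio_embd(codes)
        pos = lm.speech_pos_emb(step)
        embd = [a + p for a, p in zip(audio, pos)]
    return frames


tokens = generate(ToyBackbone(), ToyCodecLM(stop_after=5), [0.0] * HIDDEN)
print(tokens)
```

The max_frames cap is an addition for safety in the sketch, so that a model that never emits the stop token cannot loop forever.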

Supported languages

Arabic, Danish, German, Greek, English, Spanish, Finnish, French, Hebrew, Hindi, Italian, Japanese, Korean, Malay, Dutch, Norwegian, Polish, Portuguese, Russian, Swedish, Swahili, Turkish, Chinese (23 total).
