TADA-3B-ML — GGUF (ggml-quantised)

GGUF / ggml conversion of HumeAI/tada-3b-ml for use with CrispStrobe/CrispASR.

TADA-3B-ML is a 4B-parameter text-to-speech model built on Meta Llama 3.2 3B, with a flow-matching (FM) speech decoder and a custom Hume codec. Its key innovation is 1:1 token alignment: every text token maps to exactly one speech vector (no 7:1 expansion as in Orpheus/SNAC), eliminating transcript hallucination. It supports 10 languages (en, es, ja, zh, de, fr, it, pt, ko, ar) and produces 24 kHz mono output.
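
As a rough illustration of what 1:1 alignment means for sequence length, the sketch below counts autoregressive speech positions for a given number of text tokens. It is a simplified model that applies the 7:1 figure per text token, as the paragraph above frames it. The helper name `speech_positions` and the example token count are made up for illustration.

```python
def speech_positions(num_text_tokens: int, vectors_per_token: int = 1) -> int:
    """Count speech positions under a fixed vectors-per-text-token ratio.

    Simplifying assumption: the ratio is constant for every text token.
    """
    if num_text_tokens < 0 or vectors_per_token < 1:
        raise ValueError("need num_text_tokens >= 0 and vectors_per_token >= 1")
    return num_text_tokens * vectors_per_token


text_tokens = 40  # hypothetical prompt length

aligned = speech_positions(text_tokens, vectors_per_token=1)   # 1:1, as in TADA
expanded = speech_positions(text_tokens, vectors_per_token=7)  # 7:1 comparison case

print(f"1:1 alignment: {aligned} positions")
print(f"7:1 expansion: {expanded} positions")
```

Under these assumptions the aligned sequence is exactly as long as the text, so there is no position at which speech content could be produced without a corresponding text token.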

License: Apache-2.0 / Llama 3.2 Community License ("Built with Llama").

Pair this with the TADA codec decoder, tada-codec-f16.gguf (included in this repo): the talker outputs continuous acoustic vectors, which the codec converts to audio.

Files

File	Quant	Size	Notes
`tada-tts-3b-ml-f16.gguf`	F16	~8.2 GB	Reference quality (LLM + FM head)
`tada-tts-3b-ml-q4_k.gguf`	Q4_K	~2.5 GB	Recommended — good quality, fits 8 GB RAM
`tada-tts-3b-ml-q8_0.gguf`	Q8_0	~4.5 GB	Near-lossless
`tada-codec-f16.gguf`	F16	~1.1 GB	Codec decoder (required companion)
`tada-ref.gguf`	F32	~200 KB	Reference activations for diff harness
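
The model sizes in the table can be sanity-checked from the bits-per-weight of each ggml format. The sketch below is a back-of-the-envelope estimate, not a description of how the files were produced. It assumes a parameter count back-derived from the ~8.2 GB F16 file, assumes GB means 10^9 bytes, and ignores metadata. The names `BITS_PER_WEIGHT`, `estimate_size_gb` and `N_PARAMS` are illustrative.

```python
# Bits per weight, derived from the ggml block layouts:
#   F16  : 2 bytes per weight
#   Q8_0 : 32 int8 weights + one fp16 scale = 34 bytes per 32 weights
#   Q4_K : 144-byte super-block holding 256 weights
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 34 * 8 / 32,    # 8.5
    "Q4_K": 144 * 8 / 256,  # 4.5
}


def estimate_size_gb(n_params: float, quant: str) -> float:
    """Estimated tensor payload in GB (10^9 bytes) if every weight used `quant`."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9


# Assumption: parameter count inferred from the ~8.2 GB F16 file at 2 bytes/weight.
N_PARAMS = 8.2e9 / 2

for quant in ("F16", "Q8_0", "Q4_K"):
    print(f"{quant:5s} ~{estimate_size_gb(N_PARAMS, quant):.2f} GB")
```

The estimates for Q8_0 and Q4_K come out slightly below the listed sizes. One plausible reason is that quantised GGUF files commonly keep some tensors at higher precision, but the per-tensor layout of these particular files is not stated here.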

Architecture

Text Input
  |
BPE Tokenize (Llama-3.2 128K vocab)
  |
Llama-3.2-3B AR Forward (28L, 3072d, 24 heads / 8 KV)
  + acoustic embedding (512d) + time embedding (gray code)
  |-- Each position outputs: hidden state for FM head
  |
VibeVoice Diffusion Head (6L SwiGLU + AdaLN, flow matching)
  |-- Sinusoidal timestep embedding
  |-- 10 Euler ODE steps: noise -> speech vector (528d)
  |
TADA Codec Decoder (6L local-attention + DAC upsampler)
  |-- speech vectors -> 24 kHz PCM
  |
Output: float32 mono @ 24 kHz
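
The "10 Euler ODE steps: noise -> speech vector (528d)" stage in the diagram can be sketched as a fixed-step Euler integrator. This is a minimal illustration of the technique, not the CrispASR implementation: the real head is a learned network conditioned on the LLM hidden state, while `make_toy_velocity` below is a hand-written stand-in, and the time convention (noise at t=0, data at t=1) is an assumption.

```python
import random


def euler_flow_sample(velocity, noise, steps=10):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(noise)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t takes the values 0.0, 0.1, ..., 0.9 for steps=10
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target):
    """Toy stand-in for the learned head (assumption, not the real model).

    Returns the velocity field of a straight-line path that ends at `target`.
    """
    def velocity(x, t):
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return velocity


DIM = 528  # speech-vector width from the diagram
rng = random.Random(0)
target = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]  # pretend "clean" vector
noise = [rng.gauss(0.0, 1.0) for _ in range(DIM)]      # starting point at t=0

vector = euler_flow_sample(make_toy_velocity(target), noise, steps=10)
max_err = max(abs(a - b) for a, b in zip(vector, target))
print(f"dim={len(vector)} max |x - target| = {max_err:.2e}")
```

With this toy field the path is a straight line, so Euler lands on the target up to floating-point rounding. A learned velocity field is generally not straight, which is why the step count trades speed against fidelity.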

Quick start

# 1. Build CrispASR
git clone https://github.com/CrispStrobe/CrispASR
cd CrispASR
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j --target crispasr-cli

# 2. Pull the model + codec
huggingface-cli download cstr/tada-tts-3b-ml-GGUF tada-tts-3b-ml-q4_k.gguf --local-dir .
huggingface-cli download cstr/tada-tts-3b-ml-GGUF tada-codec-f16.gguf --local-dir .

# 3. Synthesise
./build/bin/crispasr --backend tada \
    -m tada-tts-3b-ml-q4_k.gguf \
    --codec-model tada-codec-f16.gguf \
    --tts "Hello, this is a test of the TADA speech synthesis system." \
    --tts-output hello.wav
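
To synthesise several sentences, the command above can be wrapped in a small script. The sketch below only reuses the flags shown in step 3. The helper names `build_tts_command` and `synthesise_all`, the `utt_NNN.wav` naming scheme, and the default paths are assumptions for illustration.

```python
import shlex
import subprocess
from pathlib import Path


def build_tts_command(text, out_wav,
                      binary="./build/bin/crispasr",
                      model="tada-tts-3b-ml-q4_k.gguf",
                      codec="tada-codec-f16.gguf"):
    """Build the argv list for one synthesis call, using only the flags from step 3."""
    return [
        binary, "--backend", "tada",
        "-m", model,
        "--codec-model", codec,
        "--tts", text,
        "--tts-output", str(out_wav),
    ]


def synthesise_all(lines, out_dir="."):
    """Hypothetical helper: run the CLI once per line, writing utt_000.wav, utt_001.wav, ..."""
    outputs = []
    for i, text in enumerate(lines):
        out = Path(out_dir) / f"utt_{i:03d}.wav"
        # check=True raises CalledProcessError if the CLI exits with a non-zero status.
        subprocess.run(build_tts_command(text, out), check=True)
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try without a build.
    print(shlex.join(build_tts_command("Hello from TADA.", "hello.wav")))
```

Passing the arguments as a list avoids shell quoting problems with punctuation in the text. Each call starts a fresh process, so the model is presumably reloaded for every sentence.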

Source model

Upstream: HumeAI/tada-3b-ml (safetensors, ~6.6 GB BF16)
Codec: HumeAI/tada-codec (encoder + decoder)
Paper: arXiv:2602.23068
Converted with: models/convert-tada-to-gguf.py + models/convert-tada-codec-to-gguf.py