Paper: TADA: A Generative Framework for Speech Modeling via Text-Acoustic Dual Alignment (arXiv:2602.23068)
GGUF / ggml conversion of HumeAI/tada-3b-ml for use with CrispStrobe/CrispASR.
TADA-3B-ML is a 4B-param text-to-speech model built on Meta Llama 3.2 3B with a flow-matching (FM) speech decoder and custom Hume codec. Key innovation: 1:1 token alignment — every text token maps to exactly one speech vector (no 7:1 expansion like Orpheus/SNAC), eliminating transcript hallucination. 10 languages (en, es, ja, zh, de, fr, it, pt, ko, ar). 24 kHz mono output.
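
One way to read the 1:1 alignment is as a length invariant: a synthesis pass should yield exactly as many speech vectors as there are text tokens. The sketch below is only an illustration of that invariant. The function name `pair_tokens_with_vectors`, the example token strings, and the placeholder vectors are hypothetical and are not part of the model or of CrispASR; the 528-dimensional width is taken from the architecture diagram.

```python
SPEECH_DIM = 528  # speech vector width quoted in the architecture diagram


def pair_tokens_with_vectors(tokens, vectors):
    """Pair each text token with its speech vector, enforcing the 1:1 invariant."""
    if len(tokens) != len(vectors):
        raise ValueError(
            f"expected one speech vector per token, got "
            f"{len(tokens)} tokens and {len(vectors)} vectors"
        )
    return list(zip(tokens, vectors))


# Hypothetical token strings and placeholder vectors, for illustration only.
tokens = ["Hello", ",", " world"]
vectors = [[0.0] * SPEECH_DIM for _ in tokens]
pairs = pair_tokens_with_vectors(tokens, vectors)
```

Under this reading, a skipped or repeated token would show up as a length mismatch rather than as silently wrong audio.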
License: Apache-2.0 / Llama 3.2 Community License ("Built with Llama").
Pair this with the TADA codec decoder at tada-codec-f16.gguf (included in this repo) — the talker outputs continuous acoustic vectors that the codec converts to audio.
| File | Quant | Size | Notes |
|---|---|---|---|
| tada-tts-3b-ml-f16.gguf | F16 | ~8.2 GB | Reference quality (LLM + FM head) |
| tada-tts-3b-ml-q4_k.gguf | Q4_K | ~2.5 GB | Recommended: good quality, fits 8 GB RAM |
| tada-tts-3b-ml-q8_0.gguf | Q8_0 | ~4.5 GB | Near-lossless |
| tada-codec-f16.gguf | F16 | ~1.1 GB | Codec decoder (required companion) |
| tada-ref.gguf | F32 | ~200 KB | Reference activations for diff harness |
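
As a rough sanity check on the sizes above, the effective bits per weight of each talker file can be estimated from the table. This is a back-of-the-envelope sketch, not a measurement: it assumes 1 GB = 10^9 bytes, that the F16 file consists almost entirely of 2-byte weights (which gives about 4.1 billion parameters), and that metadata overhead is negligible.

```python
# Approximate file sizes in GB, copied from the table above.
SIZES_GB = {"F16": 8.2, "Q8_0": 4.5, "Q4_K": 2.5}

# Assumption: the F16 file is essentially all 2-byte weights.
PARAMS = SIZES_GB["F16"] * 1e9 / 2  # about 4.1e9 parameters


def bits_per_weight(size_gb, params=PARAMS):
    """Effective bits per weight for a file of `size_gb` gigabytes."""
    return size_gb * 1e9 * 8 / params


bpw = {quant: round(bits_per_weight(size), 1) for quant, size in SIZES_GB.items()}
```

With these inputs the estimate comes out at roughly 8.8 bits per weight for Q8_0 and 4.9 for Q4_K. Because the table sizes are themselves approximate, the figures are indicative only.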
Text Input
|
BPE Tokenize (Llama-3.2 128K vocab)
|
Llama-3.2-3B AR Forward (28L, 3072d, 24 heads / 8 KV)
+ acoustic embedding (512d) + time embedding (gray code)
|-- Each position outputs: hidden state for FM head
|
VibeVoice Diffusion Head (6L SwiGLU + AdaLN, flow matching)
|-- Sinusoidal timestep embedding
|-- 10 Euler ODE steps: noise -> speech vector (528d)
|
TADA Codec Decoder (6L local-attention + DAC upsampler)
|-- speech vectors -> 24 kHz PCM
|
Output: float32 mono @ 24 kHz
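
The "10 Euler ODE steps: noise -> speech vector" stage in the diagram can be sketched as a plain fixed-step Euler integrator. This is a minimal illustration, not the CrispASR implementation. `toy_velocity` is a hypothetical stand-in for the diffusion head (which in the real model is conditioned on the LLM hidden state), and the time convention (noise at t = 0, speech vector at t = 1) is an assumption.

```python
import random

SPEECH_DIM = 528  # speech vector width from the diagram
NUM_STEPS = 10    # number of Euler steps from the diagram


def toy_velocity(x, t, target):
    """Hypothetical stand-in for the diffusion head: the exact velocity of a
    straight-line path that reaches `target` at t = 1."""
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def euler_sample(velocity, target, num_steps=NUM_STEPS, seed=0):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with fixed steps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(len(target))]  # Gaussian noise at t = 0
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k / num_steps  # t stays below 1, so toy_velocity never divides by zero
        v = velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


target = [0.5] * SPEECH_DIM  # placeholder "speech vector" for the toy field
sample = euler_sample(toy_velocity, target)
max_err = max(abs(a - b) for a, b in zip(sample, target))
```

With this particular toy field the Euler iterates stay on the straight line between the noise and the target, so the final sample matches the target up to floating-point error. A learned velocity field would only approximate this behaviour.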
# 1. Build CrispASR
git clone https://github.com/CrispStrobe/CrispASR
cd CrispASR
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j --target crispasr-cli
# 2. Pull the model + codec
huggingface-cli download cstr/tada-tts-3b-ml-GGUF tada-tts-3b-ml-q4_k.gguf --local-dir .
huggingface-cli download cstr/tada-tts-3b-ml-GGUF tada-codec-f16.gguf --local-dir .
# 3. Synthesise
./build/bin/crispasr --backend tada \
-m tada-tts-3b-ml-q4_k.gguf \
--codec-model tada-codec-f16.gguf \
--tts "Hello, this is a test of the TADA speech synthesis system." \
--tts-output hello.wav
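
For batch use, the synthesis command from step 3 can be wrapped from Python. The sketch below only reuses the flags shown above. The helper names `build_tts_command` and `synthesise` are hypothetical, and it assumes the binary and both GGUF files sit at the paths used in the steps above.

```python
import subprocess


def build_tts_command(text, out_wav,
                      model="tada-tts-3b-ml-q4_k.gguf",
                      codec="tada-codec-f16.gguf",
                      binary="./build/bin/crispasr"):
    """Assemble the argv list for one synthesis call (flags copied from step 3)."""
    return [
        binary, "--backend", "tada",
        "-m", model,
        "--codec-model", codec,
        "--tts", text,
        "--tts-output", out_wav,
    ]


def synthesise(text, out_wav, **kwargs):
    """Run the CLI; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_tts_command(text, out_wav, **kwargs), check=True)


# Building the command does not run anything; call synthesise(...) to execute it.
cmd = build_tts_command("Hello from Python.", "hello.wav")
```

Passing the arguments as a list avoids shell quoting problems when the text contains quotes or other special characters.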
Conversion scripts: models/convert-tada-to-gguf.py (talker) and models/convert-tada-codec-to-gguf.py (codec).