TADA-3B-ML — GGUF (ggml-quantised)

GGUF / ggml conversion of HumeAI/tada-3b-ml for use with CrispStrobe/CrispASR.

TADA-3B-ML is a 4B-parameter text-to-speech model built on Meta Llama 3.2 3B, with a flow-matching (FM) speech decoder and a custom Hume codec. Its key innovation is 1:1 token alignment: every text token maps to exactly one speech vector (no 7:1 expansion as in Orpheus/SNAC), which eliminates transcript hallucination. It supports 10 languages (en, es, ja, zh, de, fr, it, pt, ko, ar) and produces 24 kHz mono output.

License: Apache-2.0 / Llama 3.2 Community License ("Built with Llama").

Pair this model with the TADA codec decoder, tada-codec-f16.gguf (included in this repo): the talker outputs continuous acoustic vectors, which the codec converts to audio.

Files

| File | Quant | Size | Notes |
|---|---|---|---|
| tada-tts-3b-ml-f16.gguf | F16 | ~8.2 GB | Reference quality (LLM + FM head) |
| tada-tts-3b-ml-q4_k.gguf | Q4_K | ~2.5 GB | Recommended: good quality, fits in 8 GB RAM |
| tada-tts-3b-ml-q8_0.gguf | Q8_0 | ~4.5 GB | Near-lossless |
| tada-codec-f16.gguf | F16 | ~1.1 GB | Codec decoder (required companion) |
| tada-ref.gguf | F32 | ~200 KB | Reference activations for diff harness |
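
As a sanity check on the sizes listed above, here is a rough back-of-the-envelope estimate in Python. It assumes a round 4e9 parameters (taken from the "4B" description, not measured from the files) and counts weights only, using ggml's block layouts for bits per weight. The listed sizes come out slightly larger than these estimates, plausibly because "4B" is rounded and because quantised files usually keep some tensors at higher precision.

```python
# Rough GGUF size estimate from parameter count and bits per weight.
# Bits per weight follow ggml's block layouts: Q8_0 stores 32 weights in
# 34 bytes (8.5 bpw) and Q4_K stores 256 weights in 144 bytes (4.5 bpw).
BITS_PER_WEIGHT = {"F16": 16.0, "Q8_0": 8.5, "Q4_K": 4.5}


def estimate_size_gb(n_params: float, quant: str) -> float:
    """Return the approximate file size in GB (10^9 bytes), weights only."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9


# Assumption: a round 4e9 parameters, not an exact count from the model files.
N_PARAMS = 4e9
estimates = {q: estimate_size_gb(N_PARAMS, q) for q in BITS_PER_WEIGHT}

for quant, size in estimates.items():
    print(f"{quant}: ~{size:.2f} GB")
```

The same function can be reused to judge whether another quantisation would fit a given memory budget, as long as its bits-per-weight figure is added to the table.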

Architecture

Text Input
  |
BPE Tokenize (Llama-3.2 128K vocab)
  |
Llama-3.2-3B AR Forward (28L, 3072d, 24 heads / 8 KV)
  + acoustic embedding (512d) + time embedding (Gray code)
  |-- Each position outputs: hidden state for FM head
  |
VibeVoice Diffusion Head (6L SwiGLU + AdaLN, flow matching)
  |-- Sinusoidal timestep embedding
  |-- 10 Euler ODE steps: noise -> speech vector (528d)
  |
TADA Codec Decoder (6L local-attention + DAC upsampler)
  |-- speech vectors -> 24 kHz PCM
  |
Output: float32 mono @ 24 kHz
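
The "10 Euler ODE steps: noise -> speech vector" stage can be illustrated with a minimal Python sketch of fixed-step Euler integration. This is not the CrispASR implementation: `toy_velocity` is a hypothetical stand-in for the learned FM head (which would also be conditioned on the LLM hidden state), chosen so that the straight-line path ends at a known target and the result is easy to check. Only the dimension (528) and step count (10) come from the diagram above.

```python
import random

SPEECH_DIM = 528  # speech vector width, from the diagram above
N_STEPS = 10      # number of Euler steps, from the diagram above


def euler_sample(velocity_fn, noise, n_steps=N_STEPS):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(noise)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical stand-in for the learned head: the velocity field of a
# straight-line path from the current point to a fixed target vector.
rng = random.Random(0)
target = [rng.gauss(0.0, 1.0) for _ in range(SPEECH_DIM)]


def toy_velocity(x, t):
    # t never reaches 1.0 inside the loop, so the division is safe.
    return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]


noise = [rng.gauss(0.0, 1.0) for _ in range(SPEECH_DIM)]
speech_vector = euler_sample(toy_velocity, noise)
max_error = max(abs(a - b) for a, b in zip(speech_vector, target))
print(len(speech_vector), max_error)
```

With a real learned velocity field the path is not exactly straight, so the number of steps trades synthesis speed against integration error.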

Quick start

# 1. Build CrispASR
git clone https://github.com/CrispStrobe/CrispASR
cd CrispASR
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j --target crispasr-cli

# 2. Pull the model + codec
huggingface-cli download cstr/tada-tts-3b-ml-GGUF tada-tts-3b-ml-q4_k.gguf --local-dir .
huggingface-cli download cstr/tada-tts-3b-ml-GGUF tada-codec-f16.gguf --local-dir .

# 3. Synthesise
./build/bin/crispasr --backend tada \
    -m tada-tts-3b-ml-q4_k.gguf \
    --codec-model tada-codec-f16.gguf \
    --tts "Hello, this is a test of the TADA speech synthesis system." \
    --tts-output hello.wav
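
To synthesise several sentences in one go, the step 3 command can be assembled programmatically. The Python sketch below only builds and prints the command lines; it uses exactly the flags shown above, while the helper name `build_tada_command`, the example sentences, and the output file names are illustrative assumptions.

```python
import shlex


def build_tada_command(text, out_path,
                       model="tada-tts-3b-ml-q4_k.gguf",
                       codec="tada-codec-f16.gguf",
                       binary="./build/bin/crispasr"):
    """Return the argv list for one synthesis call, mirroring step 3 above."""
    return [
        binary, "--backend", "tada",
        "-m", model,
        "--codec-model", codec,
        "--tts", text,
        "--tts-output", out_path,
    ]


# Illustrative inputs; replace with your own text and output paths.
sentences = ["Hello, this is a test.", "This is a second sentence."]
commands = [
    build_tada_command(s, f"line_{i:02d}.wav") for i, s in enumerate(sentences)
]

# Print shell-quoted commands for inspection before running anything.
for cmd in commands:
    print(shlex.join(cmd))
```

Each list can then be passed to `subprocess.run(cmd, check=True)` once the binary and both GGUF files are in place. Passing a list rather than a shell string avoids quoting problems with punctuation in the text.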

Source model

HumeAI/tada-3b-ml
