TADA-1B — GGUF (ggml-quantised)

GGUF / ggml conversion of HumeAI/tada-1b for use with CrispStrobe/CrispASR.

TADA-1B is a text-to-speech model built on Meta Llama 3.2 1B with a flow-matching (FM) speech decoder and a custom Hume codec. TADA uses 1:1 token alignment: every text token maps to exactly one speech vector, and the codec decoder then renders those vectors as 24 kHz mono PCM. This repo packages the talker model, the required codec decoder, and a ready-to-use reference prompt GGUF for CrispASR's tada backend.

License: Llama 3.2 Community License. See the upstream HumeAI/tada-1b model card and LICENSE file for the original model terms.

Pair the talker with tada-codec-f16.gguf (included in this repo). The talker outputs continuous acoustic vectors; the codec converts those vectors to waveform audio.

Files

| File | Quant | Size | Notes |
|---|---|---|---|
| `tada-tts-1b-f16.gguf` | F16 | ~3.1 GB | Reference-quality talker model |
| `tada-tts-1b-q4_k.gguf` | Q4_K | ~1.7 GB | Recommended for CrispASR auto-download |
| `tada-codec-f16.gguf` | F16 | ~1.0 GB | Codec decoder, required companion |
| `tada-ref.gguf` | F32 | ~456 KB | Reference voice prompt for `--voice`; also used by CrispASR's TADA diff harness |

The Q4_K file uses a TADA-aware quantization policy: large transformer block projection matrices are quantized, while talker.token_embd.* and all tada.* tensors are preserved at source precision. This keeps the flow-matching head, acoustic conditioning, and timing path stable.
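
The policy can be pictured as a filter on tensor names. The sketch below is illustrative only: `tada_quant_target` is a hypothetical helper, not part of crispasr-quantize, and the tensor names passed to it in the loop are made up. It also simplifies the rule by treating every tensor outside the two preserved prefixes as a quantization candidate, whereas the actual policy quantizes only the large transformer block projection matrices.

```shell
# Hypothetical helper: classify a tensor name under the policy described above.
# Prints "keep" for tensors preserved at source precision, "candidate" otherwise.
tada_quant_target() {
  case "$1" in
    talker.token_embd.*|tada.*) echo "keep" ;;       # preserved prefixes named in this README
    *)                          echo "candidate" ;;  # simplification: may be quantized to Q4_K
  esac
}

# Example tensor names below are assumptions chosen for illustration.
for name in talker.token_embd.weight tada.example.weight talker.blk.0.attn_q.weight; do
  printf '%-32s %s\n' "$name" "$(tada_quant_target "$name")"
done
```

Matching on name prefixes keeps the rule easy to audit: anything under `tada.*` stays untouched regardless of its shape or size.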

Architecture

Text Input
  |
BPE Tokenize (Llama-3.2 vocab)
  |
Llama-3.2-1B AR Forward
  + acoustic embedding + gray-code time embedding
  |
Flow-Matching Speech Head
  |-- Euler ODE denoising: noise -> speech vector
  |
TADA Codec Decoder
  |-- speech vectors -> 24 kHz PCM
  |
Output: float32 mono @ 24 kHz
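
The Euler denoising step in the diagram can be written as a generic flow-matching update. This is a sketch under assumptions rather than the exact TADA formulation: the symbols $v_\theta$ (learned velocity field), $c_i$ (conditioning derived from the Llama hidden state for text token $i$), $N$ (number of Euler steps), and the convention that $t = 0$ is noise and $t = 1$ is the finished speech vector are all introduced here for illustration.

```latex
x_0 \sim \mathcal{N}(0, I), \qquad
t_k = \frac{k}{N}, \qquad
x_{k+1} = x_k + \frac{1}{N}\, v_\theta\!\left(x_k,\, t_k \mid c_i\right),
\qquad k = 0, \dots, N-1
```

Under these assumptions, $x_N$ is the speech vector for token $i$ that is handed to the codec decoder.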

Quick start

# 1. Build CrispASR
git clone https://github.com/CrispStrobe/CrispASR
cd CrispASR
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j --target crispasr

# 2. Pull the model, codec, and reference prompt
huggingface-cli download cstr/tada-tts-1b-GGUF \
  tada-tts-1b-q4_k.gguf tada-codec-f16.gguf tada-ref.gguf \
  --local-dir .

# 3. Synthesize
./build/bin/crispasr --backend tada-1b --gpu-backend cpu \
  -m tada-tts-1b-q4_k.gguf \
  --codec-model tada-codec-f16.gguf \
  --voice tada-ref.gguf \
  --tts "Please call Stella." \
  --tts-output tada.wav \
  --seed 42

For F16 quality, download tada-tts-1b-f16.gguf as well and pass it to -m in place of tada-tts-1b-q4_k.gguf.

Recent CrispASR builds can also resolve this repo through the model registry:

./build/bin/crispasr --backend tada-1b -m auto --auto-download \
  --voice tada-ref.gguf \
  --tts "Hello from TADA one billion." \
  --tts-output hello.wav

Source model

Upstream: HumeAI/tada-1b
Base model: meta-llama/Llama-3.2-1B
Codec: HumeAI/tada-codec
Paper: arXiv:2602.23068
Converted with: models/convert-tada-to-gguf.py, models/convert-tada-codec-to-gguf.py, and crispasr-quantize
Runtime: CrispStrobe/CrispASR

Validation

The uploaded Q4_K model was smoke-tested locally with CrispASR by synthesizing:

Please call Stella.

and transcribing the generated WAV with ggml-tiny.en.bin (the Whisper tiny.en ggml model); the ASR roundtrip returned:

Please call Stella!
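
The roundtrip differs from the input only in its final punctuation mark. A comparison that ignores case and punctuation makes this kind of check scriptable. The sketch below shows one way to automate it; `normalize_text` and `roundtrip_matches` are hypothetical helpers, not CrispASR commands.

```shell
# Hypothetical helper: lowercase, drop punctuation, collapse whitespace, trim ends.
normalize_text() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -d '[:punct:]' \
    | tr -s '[:space:]' ' ' \
    | sed 's/^ //; s/ $//'
}

# Succeeds (exit status 0) when both strings are equal after normalization.
roundtrip_matches() {
  [ "$(normalize_text "$1")" = "$(normalize_text "$2")" ]
}

if roundtrip_matches "Please call Stella." "Please call Stella!"; then
  echo "match"
else
  echo "mismatch"
fi
```

This only checks that the words survive the roundtrip. It says nothing about voice similarity or audio quality.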

Checksums (SHA-256)

7be26395d37412dff5fd2bbeb47b3f584c3172a4cd0ac3793208c82b107b28cf  tada-tts-1b-f16.gguf
035b6edbf0f58e6e0c5ec77943aec233df1946e68e4b09c2bf002b113abe3a9a  tada-tts-1b-q4_k.gguf
ef5652e7a346c8a55dd6692676da2827320fd141042e87175880e032e1953082  tada-codec-f16.gguf
7efcc96795dd2b27577a4a81eb52d0c3add5ffa67f325fba5a938f3f98067ace  tada-ref.gguf
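
The list above is in the two-column format that `sha256sum --check` accepts, assuming these are SHA-256 digests (each is 64 hex characters long). A minimal sketch for verifying downloads on a system with GNU coreutils follows; `verify_checksums` and the manifest name `SHA256SUMS` are hypothetical.

```shell
# Hypothetical helper: verify the files listed in a checksum manifest.
# $1 is the manifest path; file names inside it are resolved relative to
# the current directory.
# --ignore-missing skips listed files that were not downloaded (for example
# the F16 talker) but still fails if no listed file could be verified.
# --strict also fails when the manifest contains malformed lines.
verify_checksums() {
  sha256sum --check --strict --ignore-missing "$1"
}
```

To use it, save the four checksum lines to a file named `SHA256SUMS` in the download directory and run `verify_checksums SHA256SUMS` from there. On macOS, `shasum -a 256 -c SHA256SUMS` performs the same check.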

Notes

Use a recent CrispASR build with the TADA runtime fixes for prompt timing, codec expansion, and PyTorch-compatible MT19937 noise generation.
tada-ref.gguf is a ready-to-use reference prompt, not a Python-only cache. Pass it directly via --voice.
Custom voice references can be packed with CrispASR's TADA reference conversion tooling.