TADA-1B — GGUF (ggml-quantised)
GGUF / ggml conversion of HumeAI/tada-1b for use with CrispStrobe/CrispASR.
TADA-1B is a text-to-speech model built on Meta Llama 3.2 1B, with a flow-matching (FM) speech decoder and a custom Hume codec. TADA uses 1:1 token alignment: each text token maps to exactly one speech vector, and the codec decoder then renders those vectors as 24 kHz mono PCM. This repo packages the talker model, the required codec decoder, and a ready-to-use reference prompt GGUF for CrispASR's tada backend.
License: Llama 3.2 Community License. See the upstream HumeAI/tada-1b model card and LICENSE file for the original model terms.
Pair the talker with tada-codec-f16.gguf (included in this repo). The talker outputs continuous acoustic vectors; the codec converts those vectors to waveform audio.
Files
| File | Quant | Size | Notes |
|---|---|---|---|
| `tada-tts-1b-f16.gguf` | F16 | ~3.1 GB | Reference-quality talker model |
| `tada-tts-1b-q4_k.gguf` | Q4_K | ~1.7 GB | Recommended for CrispASR auto-download |
| `tada-codec-f16.gguf` | F16 | ~1.0 GB | Codec decoder, required companion |
| `tada-ref.gguf` | F32 | ~456 KB | Reference voice prompt for `--voice`; also used by CrispASR's TADA diff harness |
The Q4_K file uses a TADA-aware quantization policy: large transformer block projection matrices are quantized, while `talker.token_embd.*` and all `tada.*` tensors are preserved at source precision. This keeps the flow-matching head, acoustic conditioning, and timing path stable.
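The policy above can be sketched as a simple name-based rule. This is a minimal illustration, not the actual `crispasr-quantize` logic. Only the `talker.token_embd.*` and `tada.*` exemptions come from this card. The helper name `tada_tensor_qtype`, the example tensor names, and the rule that 1-D tensors (norms, biases) stay unquantized are assumptions.

```bash
# Hypothetical helper: which precision a tensor gets under the policy described above.
# Args: tensor name, number of dimensions. Prints "source" or "q4_k".
tada_tensor_qtype() {
  local name=$1 ndims=$2
  case "$name" in
    talker.token_embd.*|tada.*)
      # Explicitly preserved by the TADA-aware policy (per this card).
      echo source
      return
      ;;
  esac
  if (( ndims < 2 )); then
    echo source   # assumption: 1-D norms/biases are left at source precision
  else
    echo q4_k     # large 2-D projection matrices get quantized
  fi
}
```

Under these assumptions, `tada_tensor_qtype talker.blk.0.attn_q.weight 2` prints `q4_k`, and `tada_tensor_qtype tada.fm_head.weight 2` prints `source`. Both tensor names are illustrative.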
Architecture
```
Text Input
    |
BPE Tokenize (Llama-3.2 vocab)
    |
Llama-3.2-1B AR Forward
  + acoustic embedding + gray-code time embedding
    |
Flow-Matching Speech Head
    |-- Euler ODE denoising: noise -> speech vector
    |
TADA Codec Decoder
    |-- speech vectors -> 24 kHz PCM
    |
Output: float32 mono @ 24 kHz
```
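The flow-matching head produces each speech vector by integrating an ODE from noise with Euler steps. The update below is the generic explicit-Euler scheme for flow matching, written as a sketch. The velocity network $v_\theta$, the conditioning $h$ (taken here to be the LLM hidden state for the current token), the step count $N$, and the uniform time grid are assumptions, not details confirmed by this card.

$$
\begin{aligned}
x_0 &\sim \mathcal{N}(0, I), \qquad \Delta t = \tfrac{1}{N}, \qquad t_k = k\,\Delta t,\\
x_{k+1} &= x_k + \Delta t\, v_\theta(x_k, t_k, h), \qquad k = 0, \dots, N-1.
\end{aligned}
$$

The final state $x_N$ is the speech vector handed to the codec decoder. Because of the 1:1 alignment, this solve presumably runs once per text token.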
Quick start
```bash
# 1. Build CrispASR
git clone https://github.com/CrispStrobe/CrispASR
cd CrispASR
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j --target crispasr

# 2. Pull the model, codec, and reference prompt
huggingface-cli download cstr/tada-tts-1b-GGUF \
  tada-tts-1b-q4_k.gguf tada-codec-f16.gguf tada-ref.gguf \
  --local-dir .

# 3. Synthesize
./build/bin/crispasr --backend tada-1b --gpu-backend cpu \
  -m tada-tts-1b-q4_k.gguf \
  --codec-model tada-codec-f16.gguf \
  --voice tada-ref.gguf \
  --tts "Please call Stella." \
  --tts-output tada.wav \
  --seed 42
```
For F16 quality, replace tada-tts-1b-q4_k.gguf with tada-tts-1b-f16.gguf.
Recent CrispASR builds can also resolve this repo through the model registry:
```bash
./build/bin/crispasr --backend tada-1b -m auto --auto-download \
  --voice tada-ref.gguf \
  --tts "Hello from TADA one billion." \
  --tts-output hello.wav
```
Source model
- Upstream: HumeAI/tada-1b
- Base model: meta-llama/Llama-3.2-1B
- Codec: HumeAI/tada-codec
- Paper: arXiv:2602.23068
- Converted with: `models/convert-tada-to-gguf.py`, `models/convert-tada-codec-to-gguf.py`, and `crispasr-quantize`
- Runtime: CrispStrobe/CrispASR
Validation
The uploaded Q4_K model was smoke-tested locally with CrispASR. The test synthesized "Please call Stella." and transcribed the generated WAV with `ggml-tiny.en.bin`. The ASR round trip returned "Please call Stella!", which matches the input except for the final punctuation mark.
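The round trip differs from the prompt only in its final punctuation. One simple way to score this kind of check automatically is to compare case- and punctuation-normalized strings. The `normalize_text` helper below is a hypothetical sketch and is not the check used for this upload.

```bash
# Hypothetical helper: lowercase, drop ASCII punctuation, collapse whitespace, trim ends.
normalize_text() {
  printf '%s' "$1" |
    tr '[:upper:]' '[:lower:]' |
    tr -d '[:punct:]' |
    tr -s '[:space:]' ' ' |
    sed -e 's/^ //' -e 's/ $//'
}

# Compare the synthesized prompt with the ASR transcript from the validation run.
if [ "$(normalize_text 'Please call Stella.')" = "$(normalize_text 'Please call Stella!')" ]; then
  echo "roundtrip: match"
else
  echo "roundtrip: mismatch"
fi
```

This normalization is deliberately coarse. A stricter check would compute a word error rate instead.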
Checksums
```
7be26395d37412dff5fd2bbeb47b3f584c3172a4cd0ac3793208c82b107b28cf  tada-tts-1b-f16.gguf
035b6edbf0f58e6e0c5ec77943aec233df1946e68e4b09c2bf002b113abe3a9a  tada-tts-1b-q4_k.gguf
ef5652e7a346c8a55dd6692676da2827320fd141042e87175880e032e1953082  tada-codec-f16.gguf
7efcc96795dd2b27577a4a81eb52d0c3add5ffa67f325fba5a938f3f98067ace  tada-ref.gguf
```
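These sums can be checked after download. The sketch below assumes GNU coreutils `sha256sum`, version 8.25 or newer for `--ignore-missing`. The function names `tada_sha256_manifest` and `verify_checksums` are hypothetical. On macOS without coreutils, `shasum -a 256 -c` is the usual substitute.

```bash
# Expected SHA-256 sums from this card, in GNU coreutils "hash  filename" format.
tada_sha256_manifest() {
  cat <<'EOF'
7be26395d37412dff5fd2bbeb47b3f584c3172a4cd0ac3793208c82b107b28cf  tada-tts-1b-f16.gguf
035b6edbf0f58e6e0c5ec77943aec233df1946e68e4b09c2bf002b113abe3a9a  tada-tts-1b-q4_k.gguf
ef5652e7a346c8a55dd6692676da2827320fd141042e87175880e032e1953082  tada-codec-f16.gguf
7efcc96795dd2b27577a4a81eb52d0c3add5ffa67f325fba5a938f3f98067ace  tada-ref.gguf
EOF
}

# Check manifest lines read from stdin against files in DIR (default: current directory).
# --ignore-missing skips files you did not download (e.g. the F16 talker).
verify_checksums() {
  local dir=${1:-.}
  (cd "$dir" && sha256sum --check --strict --ignore-missing -)
}

# Usage: tada_sha256_manifest | verify_checksums /path/to/downloads
```

Each downloaded file is reported as `OK` or `FAILED`. The exit status is non-zero if any file fails, or if none of the listed files is present.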
Notes
- Use a recent CrispASR build with the TADA runtime fixes for prompt timing, codec expansion, and PyTorch-compatible MT19937 noise generation.
- `tada-ref.gguf` is a ready-to-use reference prompt, not a Python-only cache. Pass it directly via `--voice`.
- Custom voice references can be packed with CrispASR's TADA reference conversion tooling.