OuteTTS-0.3-1B GGUF
GGUF quantizations of OuteAI/OuteTTS-0.3-1B for use with CrispASR.
Files
| File | Size | Quant | Notes |
|---|---|---|---|
| outetts-0.3-1b-f16.gguf | 2.4 GB | F16 | Full precision (reference quality) |
| outetts-0.3-1b-q8_0.gguf | 1.2 GB | Q8_0 | Recommended: near-lossless quality |
| outetts-0.3-1b-q5_k.gguf | 783 MB | Q5_K | Good quality, balanced size |
| outetts-0.3-1b-q4_k.gguf | 802 MB | Q4_K | Smallest: verified intelligible (requires CrispASR >= cbe208fa) |
| wavtokenizer-decoder-f16.gguf | 130 MB | F16 | WavTokenizer decoder (required companion) |
Each outetts-0.3-1b-*.gguf file (the "talker") contains the OLMo-1B LLM. The WavTokenizer decoder GGUF is always required alongside it as a companion file. Q8_0 is recommended for the best quality/size balance. Q4_K works but may occasionally shift word boundaries on short utterances.
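Since every setup needs one talker file plus the decoder, the pairing can be sketched in Python. This is only an illustration: the helper names `required_files` and `download` are hypothetical, the file names are copied from the table above, and the download step assumes the `huggingface_hub` package is installed and that the files live in the `cstr/outetts-0.3-1b-GGUF` repository.

```python
from pathlib import Path

# File names copied from the table above.
TALKER_FILES = {
    "f16": "outetts-0.3-1b-f16.gguf",
    "q8_0": "outetts-0.3-1b-q8_0.gguf",
    "q5_k": "outetts-0.3-1b-q5_k.gguf",
    "q4_k": "outetts-0.3-1b-q4_k.gguf",
}
CODEC_FILE = "wavtokenizer-decoder-f16.gguf"
REPO_ID = "cstr/outetts-0.3-1b-GGUF"  # assumption: files are hosted in this repo


def required_files(quant: str = "q8_0") -> list[str]:
    """Return the talker GGUF for `quant` plus the always-required decoder."""
    key = quant.lower()
    if key not in TALKER_FILES:
        raise ValueError(f"unknown quantization {quant!r}; expected one of {sorted(TALKER_FILES)}")
    return [TALKER_FILES[key], CODEC_FILE]


def download(quant: str = "q8_0", local_dir: str = "models") -> list[Path]:
    """Fetch both files (hypothetical helper; needs network access)."""
    # Imported lazily so required_files() works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return [
        Path(hf_hub_download(repo_id=REPO_ID, filename=name, local_dir=local_dir))
        for name in required_files(quant)
    ]
```

Calling `required_files("q4_k")` lists the two files to place next to the CrispASR binary; `download()` is optional and can be replaced by any other way of fetching them.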
Note: Q4_K requires CrispASR commit cbe208fa or later, which fixed a WavTokenizer activation bug (SiLU -> GELU in ResNet blocks) that previously degraded quantized output.
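One way to check the commit requirement is to ask git whether cbe208fa is an ancestor of the current checkout. The sketch below assumes CrispASR was cloned with git and that its history includes that commit; the function names are illustrative and not part of CrispASR.

```python
import subprocess


def ancestor_check_command(commit: str = "cbe208fa", ref: str = "HEAD") -> list[str]:
    """Build the git command that tests whether `commit` is an ancestor of `ref`."""
    return ["git", "merge-base", "--is-ancestor", commit, ref]


def checkout_contains(repo_dir: str, commit: str = "cbe208fa", ref: str = "HEAD") -> bool:
    """Return True if `ref` in `repo_dir` already includes `commit`."""
    result = subprocess.run(
        ancestor_check_command(commit, ref),
        cwd=repo_dir,
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:  # commit is an ancestor of ref
        return True
    if result.returncode == 1:  # commit exists but is not an ancestor
        return False
    # Any other exit status means git itself failed (e.g. unknown commit).
    raise RuntimeError(result.stderr.strip() or "git merge-base failed")
```

If `checkout_contains(".")` returns `False` in the CrispASR source directory, updating the checkout before using the Q4_K file is the safer choice.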
Usage with CrispASR
# Build CrispASR
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc) --target crispasr-cli
# Create a speaker profile from reference audio
python tools/reference_backends/outetts_create_speaker.py \
--audio ref.wav --text "transcript of the reference audio" \
--out speaker.json
# Synthesize with voice cloning
./build/bin/crispasr --backend outetts \
-m outetts-0.3-1b-q8_0.gguf \
--codec-model wavtokenizer-decoder-f16.gguf \
--voice speaker.json \
--tts "Hello, how are you today?" \
--tts-output hello.wav --seed 42
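For batch synthesis, the same command can be driven from Python. This is a minimal sketch that only reuses the flags shown above; `build_tts_command` and `synthesize` are hypothetical helper names, and the default paths assume the files sit in the current directory.

```python
import subprocess


def build_tts_command(
    text: str,
    output_wav: str,
    *,
    model: str = "outetts-0.3-1b-q8_0.gguf",
    codec_model: str = "wavtokenizer-decoder-f16.gguf",
    voice: str = "speaker.json",
    seed: int = 42,
    binary: str = "./build/bin/crispasr",
) -> list[str]:
    """Assemble the argument list for one synthesis call (no shell quoting needed)."""
    return [
        binary,
        "--backend", "outetts",
        "-m", model,
        "--codec-model", codec_model,
        "--voice", voice,
        "--tts", text,
        "--tts-output", output_wav,
        "--seed", str(seed),
    ]


def synthesize(text: str, output_wav: str, **options) -> None:
    """Run CrispASR once; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_tts_command(text, output_wav, **options), check=True)
```

Passing the arguments as a list avoids shell-escaping problems when the text contains quotes or other special characters.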
Architecture
- LLM: OLMo-1B (16 layers, 2048 hidden, 16 MHA heads, SwiGLU FFN, parameter-free LayerNorm, RoPE)
- Codec: WavTokenizer single-codebook VQ-GAN (4096 entries, 512-d, 75 tokens/sec)
- Decoder: Vocos backbone (ConvNeXt + pos_net with GroupNorm/GELU ResBlocks) + ISTFTHead
- Output: 24 kHz mono PCM
- License: CC BY 4.0
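The codec rate and output format above imply some simple arithmetic: at 75 tokens/sec and 24 kHz, each codec token corresponds to 320 output samples on average. The sketch below works that out and adds a small check of a generated file using Python's standard `wave` module. The function names are illustrative, and the check assumes the output is an integer-PCM WAV file.

```python
import math
import wave

SAMPLE_RATE_HZ = 24_000   # output sample rate listed above
TOKENS_PER_SECOND = 75    # WavTokenizer codec rate listed above
SAMPLES_PER_TOKEN = SAMPLE_RATE_HZ // TOKENS_PER_SECOND


def audio_seconds(num_tokens: int) -> float:
    """Duration of audio represented by `num_tokens` codec tokens."""
    return num_tokens / TOKENS_PER_SECOND


def tokens_needed(seconds: float) -> int:
    """Number of codec tokens needed to cover `seconds` of audio."""
    return math.ceil(seconds * TOKENS_PER_SECOND)


def wav_format(path: str) -> tuple[int, int]:
    """Return (channels, sample_rate) of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels(), wav.getframerate()
```

For a file produced with `--tts-output hello.wav`, `wav_format("hello.wav")` would be expected to return `(1, 24000)` if the output matches the format listed above.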
Conversion
# From HuggingFace model
python models/convert-outetts-to-gguf.py \
--input OuteAI/OuteTTS-0.3-1B \
--output outetts-0.3-1b-f16.gguf
python models/convert-wavtokenizer-to-gguf.py \
--input OuteAI/wavtokenizer-large-75token-interface \
--output wavtokenizer-decoder-f16.gguf
# Quantize
crispasr-quantize outetts-0.3-1b-f16.gguf outetts-0.3-1b-q8_0.gguf q8_0
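To produce several quantizations in one go, the quantize step can be looped from Python. This sketch follows the `crispasr-quantize <input> <output> <type>` pattern shown above; only `q8_0` appears in that example, so the type names `q5_k` and `q4_k` are assumptions inferred from the published file names, and `quantize_commands` is a hypothetical helper.

```python
import subprocess

F16_SUFFIX = "-f16.gguf"


def quantize_commands(
    f16_path: str = "outetts-0.3-1b-f16.gguf",
    quants: tuple[str, ...] = ("q8_0", "q5_k", "q4_k"),  # q5_k/q4_k are assumed type names
    tool: str = "crispasr-quantize",
) -> list[list[str]]:
    """Build one quantize command per requested type, naming outputs after the type."""
    if not f16_path.endswith(F16_SUFFIX):
        raise ValueError(f"expected a path ending in {F16_SUFFIX!r}, got {f16_path!r}")
    stem = f16_path[: -len(F16_SUFFIX)]
    return [[tool, f16_path, f"{stem}-{quant}.gguf", quant] for quant in quants]


def quantize_all(**options) -> None:
    """Run each command in turn and stop at the first failure."""
    for command in quantize_commands(**options):
        subprocess.run(command, check=True)
```

Only the talker model is quantized here; the WavTokenizer decoder is distributed as F16 in the table above.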