FastPitch (English) + HiFi-GAN — GGUF (ggml)

GGUF / ggml conversion of nvidia/tts_en_fastpitch + nvidia/tts_hifigan for use with CrispStrobe/CrispASR.

FastPitch is a non-autoregressive parallel TTS model that generates the entire mel spectrogram in a single forward pass (no sampling, no KV cache), making it very fast. The HiFi-GAN vocoder converts the mel to 22050 Hz PCM audio.

Text encoder: 6-layer Transformer (384-d, 1-head, post-norm, Conv1d FFN)
Duration predictor: 2-layer Conv1d stack + linear projection
Pitch predictor: 2-layer Conv1d stack + linear projection
Mel decoder: 6-layer Transformer (same architecture as encoder)
HiFi-GAN vocoder: conv_pre + 4 upsample stages (rates 8,8,2,2) with MRF resblocks + conv_post

Single speaker, English. ~60M parameters total (FastPitch + HiFi-GAN combined in one GGUF).

Released under CC-BY-4.0 (NeMo model license).

Files

File	Quant	Size	Notes
`fastpitch-en-f16.gguf`	F16	~230 MB	Reference quality
`fastpitch-en-q8_0.gguf`	Q8_0	~120 MB	Near-lossless
`fastpitch-en-q4_k.gguf`	Q4_K	~70 MB	Best size/quality balance

Quick start

# 1. Build CrispASR
git clone https://github.com/CrispStrobe/CrispASR
cd CrispASR
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j --target crispasr-cli

# 2. Download model (auto-download also works: -m auto --backend fastpitch)
hf download cstr/fastpitch-en-GGUF fastpitch-en-q8_0.gguf --local-dir .

# 3. Synthesize
./build/bin/crispasr --backend fastpitch -m fastpitch-en-q8_0.gguf \
    --tts "Hello there, how are you doing today?" \
    --tts-output hello.wav

# 4. Verify (ASR roundtrip)
./build/bin/crispasr -m models/ggml-base.en.bin -f hello.wav

Conversion

python models/convert-fastpitch-to-gguf.py \
    --hf-model nvidia/tts_en_fastpitch \
    --hf-vocoder nvidia/tts_hifigan \
    --output fastpitch-en-f16.gguf --ftype f16

Limitations

Single speaker only (the English model has n_speakers=1)
Character-level tokenization (no G2P phoneme conversion yet; proper ARPABET G2P would improve pronunciation of uncommon words)
Deterministic output (no temperature/seed controls — same input always produces same output)
22050 Hz sample rate

Downloads last month: 87

GGUF

Model size

60.3M params

Architecture

fastpitch

Hardware compatibility

8-bit

16-bit

Model tree for cstr/fastpitch-en-GGUF

Base model

nvidia/tts_en_fastpitch

Quantized

(1)

this model