Piper TTS (en_US lessac medium) — GGUF

Native C++ GGUF conversion of the Piper VITS voice en_US-lessac-medium for use with CrispASR.

Files

| File | Size | Description |
|------|------|-------------|
| `piper-en_US-lessac-medium-f16.gguf` | 30 MB | F16 weights (full model) |

At about 30 MB in F16, Piper models are small enough that quantization offers no meaningful savings, so F16 is the only format provided.
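
The size figure can be sanity-checked with a little arithmetic. The sketch below uses the 15.7M parameter count listed for this model and an idealized bits-per-weight figure. The 8-bit variant is hypothetical (no such file is provided), and real quantized formats carry extra per-block overhead, so treat the numbers as rough estimates.

```python
# Back-of-the-envelope weight-size estimate; figures come from this page.
PARAMS = 15.7e6  # parameter count listed for this model


def weights_mib(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in MiB, ignoring metadata and tensor headers."""
    return params * bits_per_weight / 8 / 2**20


f16_mib = weights_mib(PARAMS, 16)
q8_mib = weights_mib(PARAMS, 8)  # hypothetical 8-bit variant, not provided

print(f"F16: {f16_mib:.1f} MiB")
print(f"8-bit (idealized): {q8_mib:.1f} MiB")
print(f"Absolute saving: {f16_mib - q8_mib:.1f} MiB")
```

The F16 estimate lands close to the 30 MB listed in the table, and even halving it would save only about 15 MiB.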

./build/bin/crispasr --backend piper \
    -m piper-en_US-lessac-medium-f16.gguf \
    --tts "Hello, how are you today?" \
    --tts-output hello.wav
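
For scripted use, the same command can be assembled from Python. This is a minimal sketch that uses only the flags shown above; the helper names `build_tts_command` and `synthesize` are illustrative and not part of CrispASR, and the binary path assumes the build layout from the command above.

```python
import subprocess
from pathlib import Path


def build_tts_command(text, out_wav,
                      model="piper-en_US-lessac-medium-f16.gguf",
                      binary="./build/bin/crispasr"):
    """Return the argv list for one synthesis run (flags copied from the command above)."""
    return [
        str(binary), "--backend", "piper",
        "-m", str(model),
        "--tts", text,
        "--tts-output", str(out_wav),
    ]


def synthesize(text, out_wav, **kwargs):
    """Run the CLI and return the output path; raises CalledProcessError on failure."""
    subprocess.run(build_tts_command(text, out_wav, **kwargs), check=True)
    return Path(out_wav)


# Show the command without running it.
print(build_tts_command("Hello, how are you today?", "hello.wav"))
```

Passing the arguments as a list avoids shell quoting problems when the text contains quotes or other special characters.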

Phonemization uses espeak-ng, which must be installed separately (for example, `apt install espeak-ng` on Debian or Ubuntu).
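
A quick pre-flight check can catch a missing espeak-ng before synthesis is attempted. The sketch below assumes that having the `espeak-ng` executable on `PATH` is what matters. This page does not say whether CrispASR calls the executable or links the library, so a negative result here is only a hint.

```python
import shutil


def find_espeak_ng(name="espeak-ng"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(name)


path = find_espeak_ng()
if path is None:
    print("espeak-ng not found on PATH; install it before running synthesis")
else:
    print(f"espeak-ng found at {path}")
```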

VITS (Conditional Variational Autoencoder with Adversarial Learning)
Text encoder: 6-layer relative-position transformer (192-d, 2 heads)
Duration predictor: Stochastic Duration Predictor with rational-quadratic spline flows
Flow: 4 affine coupling blocks with WaveNet conditioning
Decoder: HiFi-GAN (3 upsample stages, 9 MRF resblocks)
Output: 22.05 kHz mono PCM
License: MIT
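
The output format listed above can be verified on a generated file with Python's standard `wave` module. This is a sketch under the assumption that the file is integer PCM; `wave` raises `wave.Error` for other encodings such as float WAV. The helper name `check_wav` is illustrative.

```python
import wave

EXPECTED_RATE = 22050  # 22.05 kHz, from the list above
EXPECTED_CHANNELS = 1  # mono


def check_wav(path):
    """Read the WAV header and compare it with the expected output format."""
    with wave.open(str(path), "rb") as w:
        rate = w.getframerate()
        channels = w.getnchannels()
        frames = w.getnframes()
    return {
        "rate_ok": rate == EXPECTED_RATE,
        "mono": channels == EXPECTED_CHANNELS,
        "seconds": frames / rate if rate else 0.0,
    }
```

For example, `check_wav("hello.wav")` returns a dictionary whose `rate_ok` and `mono` entries should both be true if the file matches the stated format.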

python models/convert-piper-to-gguf.py \
    --onnx en_US-lessac-medium.onnx \
    --output piper-en_US-lessac-medium-f16.gguf
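
After conversion, the file header can be inspected to confirm that a GGUF file was written. The sketch below assumes GGUF version 2 or later, where the header is the 4-byte magic `GGUF`, a little-endian 32-bit version, and two 64-bit counts. The function name `read_gguf_header` is illustrative and is not part of the conversion script.

```python
import struct


def read_gguf_header(path):
    """Parse the fixed 24-byte GGUF header (assumes GGUF v2+ and little-endian)."""
    with open(path, "rb") as f:
        raw = f.read(24)
    if len(raw) < 24 or raw[:4] != b"GGUF":
        raise ValueError(f"{path} does not look like a GGUF file")
    # uint32 version, uint64 tensor count, uint64 metadata key/value count
    version, n_tensors, n_kv = struct.unpack("<IQQ", raw[4:24])
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}
```

Calling `read_gguf_header("piper-en_US-lessac-medium-f16.gguf")` should return a small dictionary; a `ValueError` suggests the conversion did not produce a valid file.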

Format: GGUF
Model size: 15.7M params
Architecture: piper
Hardware compatibility: 16-bit
