Piper TTS (en_US lessac medium) - GGUF

GGUF conversion of the Piper VITS voice en_US-lessac-medium for native C++ inference with CrispASR.

Files

File                                  Size    Description
piper-en_US-lessac-medium-f16.gguf    30 MB   F16 weights (full model)

Piper models are small enough that quantization provides no meaningful savings, so F16 is the only format provided.
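Before pointing CrispASR at a downloaded file, it can help to confirm that the download is a GGUF container at all. The sketch below reads only the fixed-size header. It assumes the little-endian layout used by GGUF version 2 and later (4-byte magic, 32-bit version, 64-bit tensor count, 64-bit metadata key/value count); the helper name read_gguf_header is hypothetical and is not part of CrispASR or Piper.

```python
import struct


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF file.

    Assumes the GGUF v2+ header layout, little-endian:
    magic "GGUF" (4 bytes), uint32 version, uint64 tensor count,
    uint64 metadata key/value count.
    """
    with open(path, "rb") as f:
        head = f.read(24)
    if len(head) < 24 or head[:4] != b"GGUF":
        raise ValueError(f"{path} does not look like a GGUF file")
    version, tensor_count, kv_count = struct.unpack("<IQQ", head[4:24])
    return version, tensor_count, kv_count


# Example (hypothetical call site):
# version, tensors, kv = read_gguf_header("piper-en_US-lessac-medium-f16.gguf")
# print(f"GGUF v{version}: {tensors} tensors, {kv} metadata entries")
```

A truncated or HTML error-page download fails the magic check immediately, which is cheaper than discovering the problem at model load time.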

Usage with CrispASR

./build/bin/crispasr --backend piper \
    -m piper-en_US-lessac-medium-f16.gguf \
    --tts "Hello, how are you today?" \
    --tts-output hello.wav

Phonemization uses espeak-ng, which must be installed separately (for example, apt install espeak-ng on Debian or Ubuntu).
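When the command above is driven from a script, a missing espeak-ng is easier to diagnose if it is checked up front. The following is a minimal sketch that only reuses the flags shown above. It assumes that espeak-ng being discoverable on PATH is a reasonable proxy for it being installed; how CrispASR actually locates espeak-ng is not stated here. The function names build_tts_command and synthesize are hypothetical.

```python
import shutil
import subprocess


def build_tts_command(model, text, out_wav, binary="./build/bin/crispasr"):
    """Assemble the same argument list as the shell command shown above."""
    return [
        binary,
        "--backend", "piper",
        "-m", model,
        "--tts", text,
        "--tts-output", out_wav,
    ]


def synthesize(model, text, out_wav):
    """Run CrispASR once, failing early if espeak-ng is not on PATH.

    Assumption: an espeak-ng executable on PATH means the phonemizer
    is available to CrispASR.
    """
    if shutil.which("espeak-ng") is None:
        raise RuntimeError("espeak-ng not found on PATH; install it first")
    subprocess.run(build_tts_command(model, text, out_wav), check=True)


# Example (hypothetical call site):
# synthesize("piper-en_US-lessac-medium-f16.gguf",
#            "Hello, how are you today?", "hello.wav")
```

Passing the arguments as a list avoids shell quoting problems when the text to synthesize contains quotes or other special characters.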

Architecture

  • VITS (Conditional Variational Autoencoder with Adversarial Learning)
  • Text encoder: 6-layer relative-position transformer (192-d, 2 heads)
  • Duration predictor: Stochastic Duration Predictor with rational-quadratic spline flows
  • Flow: 4 affine coupling blocks with WaveNet conditioning
  • Decoder: HiFi-GAN (3 upsample stages, 9 MRF resblocks)
  • Output: 22.05 kHz mono PCM
  • License: MIT
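The output format listed above (22.05 kHz mono PCM) can be checked on a generated file with Python's standard wave module. This is a sketch under the assumption that the file written by --tts-output is a standard RIFF WAV with integer PCM samples, which is what the wave module can read; the helper name wav_duration_seconds is hypothetical.

```python
import wave


def wav_duration_seconds(path, expected_rate=22050, expected_channels=1):
    """Check sample rate and channel count, then return the duration in seconds."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        channels = w.getnchannels()
        frames = w.getnframes()
    if rate != expected_rate or channels != expected_channels:
        raise ValueError(
            f"expected {expected_rate} Hz / {expected_channels} channel(s), "
            f"got {rate} Hz / {channels} channel(s)"
        )
    return frames / rate


# Example (hypothetical call site):
# print(wav_duration_seconds("hello.wav"))
```

A mismatch here usually means the file is being interpreted with the wrong voice or was resampled by a later processing step.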

Conversion

python models/convert-piper-to-gguf.py \
    --onnx en_US-lessac-medium.onnx \
    --output piper-en_US-lessac-medium-f16.gguf