Veda TTS: LJSpeech (CGN v2, 206M)

Fine-tuned from LibriTTS base (ckpt-15000) on LJSpeech.

Model

Architecture    CGN v2 (autoregressive)
Parameters      206M (1024d / 16L)
Audio codec     SNAC @ 24 kHz (3-level, 4096 codebook)
Text frontend   Flite G2P (ARPAbet)
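
As a rough sanity check on the 206M figure, the sketch below estimates the weight count of 16 layers at width 1024. It assumes a standard transformer block (four attention projections plus a two-layer MLP with a 4x hidden size, no biases or norms counted); the card does not describe CGN v2's internals, so the function name and the block layout are illustrative assumptions only.

```python
def approx_block_params(d_model: int, n_layers: int, mlp_ratio: int = 4) -> int:
    """Approximate weights in a stack of standard transformer blocks.

    Assumption (not from the model card): each block has Q, K, V and output
    projections of shape d_model x d_model, plus an MLP that expands to
    mlp_ratio * d_model and projects back. Biases, norms, and embeddings
    are ignored.
    """
    attention = 4 * d_model * d_model
    mlp = 2 * mlp_ratio * d_model * d_model
    return n_layers * (attention + mlp)


blocks = approx_block_params(d_model=1024, n_layers=16)
print(f"{blocks / 1e6:.1f}M")  # 201.3M
```

Under these assumptions the blocks account for about 201M weights, which would leave a few million for embeddings and output heads. That is in line with the stated 206M but does not confirm the actual layout.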

Training

  • Dataset: LJSpeech (12,445 train / 655 val)
  • Precision: bf16 | LR: 5e-5 (cosine) | Batch: 32
  • Best step: 2500 | Early stopped: step 5000
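
The settings above can be illustrated with a small sketch: a plain cosine decay from the 5e-5 peak, and the number of epochs implied by a given step count. The card does not state the schedule length, warmup, or final learning rate, so `total_steps`, `min_lr`, and both function names are hypothetical; the epoch arithmetic assumes each step consumes one batch of 32 with no gradient accumulation.

```python
import math


def cosine_lr(step: int, total_steps: int,
              peak_lr: float = 5e-5, min_lr: float = 0.0) -> float:
    """Cosine decay from peak_lr to min_lr over total_steps (no warmup assumed)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def epochs_at_step(step: int, batch_size: int = 32,
                   train_examples: int = 12_445) -> float:
    """Passes over the training split after `step` steps (one batch per step assumed)."""
    return step * batch_size / train_examples


# total_steps=5000 is a placeholder horizon, not a value from the card.
for step in (0, 2500, 5000):
    print(step, f"{cosine_lr(step, total_steps=5000):.2e}",
          f"{epochs_at_step(step):.2f} epochs")
```

With these assumptions, the best checkpoint at step 2500 corresponds to about 6.4 epochs and the early stop at step 5000 to about 12.9 epochs.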

Eval (8 synthesis sentences)

Metric      Value
eval_loss   2.727
WER         5.2% (Whisper base.en)
DNSMOS      3.24
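
For readers unfamiliar with the WER figure, here is a minimal sketch of how a word error rate is typically computed: word-level edit distance between the input text and the ASR transcript, divided by the reference length. It assumes the Whisper transcript is already available as a string and uses only lowercasing and whitespace splitting for normalization; the card does not say which normalization or tooling produced the 5.2% figure, and `word_error_rate` is an illustrative name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and hyp[:j]; start with the empty reference prefix.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
print(f"{word_error_rate('the cat sat on the mat', 'the cat sat on mat'):.3f}")
```

With only 8 evaluation sentences, a single misrecognized word shifts the aggregate noticeably, so the reported value is best read as a coarse indicator.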

Demo

Try it


Dataset used to train vijayavedartham/veda-tts-ljspeech