vijayavedartham/veda-tts-ljspeech

speech-synthesis


Veda TTS — LJSpeech (CGN v2, 206M)

Fine-tuned on LJSpeech, starting from the LibriTTS base checkpoint (ckpt-15000).

Model


Architecture	CGN v2 (autoregressive)
Parameters	206M (1024d / 16L)
Audio codec	SNAC @ 24 kHz (3-level, codebook size 4096)
Text frontend	Flite G2P (ARPAbet)
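
As a rough sanity check on the 206M figure, the sketch below estimates the weight count of a 1024-dimensional, 16-layer stack. It assumes a standard transformer block (four d×d attention projections plus a feed-forward network with 4x expansion). The internals of CGN v2 are not described here, so this is an approximation and not the model's actual layout. The function name `estimate_decoder_params` is illustrative only.

```python
def estimate_decoder_params(d_model: int, n_layers: int) -> int:
    """Rough weight count for a standard transformer stack.

    Assumption (not confirmed for CGN v2): each layer holds 4*d^2
    attention weights (Q, K, V, output) and 8*d^2 feed-forward weights
    (4x expansion). Biases, norms and embeddings are ignored.
    """
    attention = 4 * d_model * d_model
    feed_forward = 8 * d_model * d_model
    return n_layers * (attention + feed_forward)


core = estimate_decoder_params(d_model=1024, n_layers=16)
print(f"core layers: {core / 1e6:.1f}M")  # core layers: 201.3M

# Storage estimate, assuming 32-bit floats (4 bytes per parameter).
print(f"fp32 size of 206M params: {206e6 * 4 / 1e9:.2f} GB")  # 0.82 GB
```

Under these assumptions the layers alone account for about 201M weights, which is consistent with a 206M total once embeddings and output heads are included.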

Training

Dataset: LJSpeech — 12,445 train / 655 val
Precision: bf16 | LR: 5e-5 (cosine) | Batch: 32
Best step: 2500 | Early stopped: step 5000
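
To put the step counts in context, the sketch below converts steps to approximate epochs and shows one common form of cosine decay from the 5e-5 peak. It assumes one optimizer step per batch of 32 (no gradient accumulation) and no warmup. The schedule length `TOTAL_STEPS` is a hypothetical value, since the planned number of steps is not stated.

```python
import math

TRAIN_CLIPS = 12_445
BATCH_SIZE = 32
PEAK_LR = 5e-5
TOTAL_STEPS = 10_000  # hypothetical schedule length, not from the model card

# Assumes the final partial batch is kept, hence the ceiling.
steps_per_epoch = math.ceil(TRAIN_CLIPS / BATCH_SIZE)  # 389


def cosine_lr(step: int, total_steps: int = TOTAL_STEPS,
              peak_lr: float = PEAK_LR, min_lr: float = 0.0) -> float:
    """Cosine decay from peak_lr at step 0 to min_lr at total_steps."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for step in (2500, 5000):
    epochs = step / steps_per_epoch
    print(f"step {step}: ~{epochs:.1f} epochs, lr = {cosine_lr(step):.2e}")
```

With these assumptions, the best checkpoint at step 2500 falls after roughly 6.4 passes over the training set, and training stopped after roughly 12.9 passes.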

Eval (8 synthesis sentences)

Metric	Value
eval_loss	2.727
WER	5.2% (Whisper base.en)
DNSMOS	3.24
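
For reference, WER is usually computed as the word-level edit distance between the reference text and the ASR transcript, divided by the number of reference words. The sketch below is a minimal pure-Python version of that definition. The text normalization used for the reported 5.2% is not specified, so the lowercase-and-split step here is an assumption, and the function names are illustrative.

```python
def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over words (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def corpus_wer(references: list[str], hypotheses: list[str]) -> float:
    """Total word errors divided by total reference words across all pairs."""
    errors = 0
    ref_words = 0
    for ref_text, hyp_text in zip(references, hypotheses):
        ref = ref_text.lower().split()  # assumed normalization
        hyp = hyp_text.lower().split()
        errors += word_edit_distance(ref, hyp)
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("references contain no words")
    return errors / ref_words


# Toy example (made-up sentences, not the eval set): one substitution in four words.
print(corpus_wer(["the quick brown fox"], ["the quick brown box"]))  # 0.25
```

With only 8 evaluation sentences, a single misrecognized word shifts the score noticeably, so small WER differences between checkpoints should be read with caution.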


Safetensors: 0.2B params, F32
