IndicF5 Kannada Bedtime v2

Fine-tuned IndicF5 for Kannada bedtime story narration in a parent's cloned voice.

Training

  • Base: ai4bharat/IndicF5 (MIT license)
  • Dataset: SPRINGLab/IndicTTS_Kannada + synthetic bedtime clips (800 clips, ~6h)
  • Method: Full fine-tune of the CFM (conditional flow matching) component only
  • Steps: 500 (best checkpoint at step 500, loss 0.4043)
  • GPU: A100-80GB via Modal

Evaluation (10 sentences × 2 references)

Checkpoint    Silence  Pitch Std  Syll/s  MOS  Flatness  Rank
stock         49.3%    46.4Hz     1.9     3.9  0.629     7th
v3_step0200   52.0%    51.0Hz     2.2     4.2  0.582     5th
v3_step0400   43.8%    41.1Hz     1.7     3.7  0.532     6th
v3_step0600   49.6%    47.8Hz     1.6     3.9  0.578     4th
v3_step0800   47.1%    47.2Hz     2.1     4.0  0.487     3rd
v3_final      43.8%    43.7Hz     2.5     4.2  0.538     2nd
v2_step0500   45.1%    45.4Hz     3.0     4.2  0.491     1st
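
The card does not say how the Silence column was measured. As a rough illustration only, one common approach is to split the audio into short frames and count the fraction whose RMS energy falls below a threshold. The function name `silence_ratio`, the 25 ms frame length, the 0.01 threshold, and the 24 kHz sample rate below are all assumptions made for this sketch, not values taken from the evaluation.

```python
import math


def silence_ratio(samples, sample_rate, frame_ms=25.0, threshold=0.01):
    """Fraction of non-overlapping frames whose RMS is below `threshold`.

    `samples` is a sequence of floats in [-1.0, 1.0]. The frame length and
    threshold are illustrative defaults, not values from the model card.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    if not frames:
        return 0.0
    silent = 0
    for frame in frames:
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms < threshold:
            silent += 1
    return silent / len(frames)


# Synthetic check: half a second of silence followed by half a second of tone.
rate = 24000  # assumed sample rate
quiet = [0.0] * (rate // 2)
tone = [0.5 * math.sin(2 * math.pi * 220 * n / rate) for n in range(rate // 2)]
ratio = silence_ratio(quiet + tone, rate)
```

A measure like this depends heavily on the threshold and frame length, so percentages are only comparable between checkpoints when the same settings are used for all of them.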

Demo Audio

Kannada bedtime stories (fine-tuned)

Full pipeline (translate + narrate)

How to use

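from transformers import AutoModel

# trust_remote_code=True lets transformers run the custom model code shipped with the repository.
model = AutoModel.from_pretrained("sush0401/IndicF5-Kannada-Bedtime-v2", trust_remote_code=True)

# First argument: the Kannada text to narrate. ref_audio_path and ref_text are a clip
# of the voice to clone and the transcript of that clip.
audio = model("ಕನ್ನಡ ಪಠ್ಯ", ref_audio_path="reference.wav", ref_text="Reference transcript")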
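
The snippet above stops once `audio` is returned. A minimal sketch of writing such an array to disk with only the standard library might look like the following. It assumes mono float samples in [-1.0, 1.0] and a 24 kHz sample rate. The card states neither, and `save_wav` is a hypothetical helper, not part of the model's API.

```python
import array
import os
import sys
import tempfile
import wave


def save_wav(samples, path, sample_rate=24000):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM; return the frame count."""
    # Clip to the valid range, then scale to signed 16-bit integers.
    pcm = array.array(
        "h", (int(max(-1.0, min(1.0, float(x))) * 32767) for x in samples)
    )
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())
    return len(pcm)


# Stand-in for the model output so the sketch runs without downloading weights.
demo = [0.0, 0.5, -0.5, 1.0, -1.0]
out_path = os.path.join(tempfile.gettempdir(), "dreamvoice_demo.wav")
written = save_wav(demo, out_path)

# Read the header back to confirm what was written.
with wave.open(out_path, "rb") as check:
    frames, rate_read = check.getnframes(), check.getframerate()
```

With the real model, the call would be `save_wav(audio, "story.wav")`. If the returned array turns out to hold 16-bit integers instead of floats, divide it by 32768 before passing it in.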

Part of DreamVoice

DreamVoice: bedtime stories in a parent's cloned voice.

Model size: 0.4B params (Safetensors, tensor type F32)