IndicF5 Kannada Bedtime v2
Fine-tuned IndicF5 for Kannada bedtime story narration in a parent's cloned voice.
Training
- Base: ai4bharat/IndicF5 (MIT license)
- Dataset: SPRINGLab/IndicTTS_Kannada + synthetic bedtime clips (800 clips, ~6h)
- Method: Full fine-tune of the CFM (conditional flow matching) component only
- Steps: 500 (best checkpoint at step 500, loss 0.4043)
- GPU: A100-80GB via Modal
Evaluation (10 sentences × 2 references)
| Checkpoint | Silence | Pitch Std | Syll/s | MOS | Flatness | Rank |
|---|---|---|---|---|---|---|
| stock | 49.3% | 46.4Hz | 1.9 | 3.9 | 0.629 | 7th |
| v3_step0200 | 52.0% | 51.0Hz | 2.2 | 4.2 | 0.582 | 5th |
| v3_step0400 | 43.8% | 41.1Hz | 1.7 | 3.7 | 0.532 | 6th |
| v3_step0600 | 49.6% | 47.8Hz | 1.6 | 3.9 | 0.578 | 4th |
| v3_step0800 | 47.1% | 47.2Hz | 2.1 | 4.0 | 0.487 | 3rd |
| v3_final | 43.8% | 43.7Hz | 2.5 | 4.2 | 0.538 | 2nd |
| v2_step0500 | 45.1% | 45.4Hz | 3.0 | 4.2 | 0.491 | 1st |
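
The card does not state how the table's metrics were computed. As a rough illustration of what a "Silence" percentage can mean, here is a minimal sketch that counts the fraction of fixed-length frames whose RMS level falls below a threshold. The function name `silence_ratio`, the 25 ms frame length, and the -40 dBFS threshold are all assumptions for illustration, not the settings used for the numbers above.

```python
import math


def silence_ratio(samples, sample_rate, frame_ms=25.0, threshold_db=-40.0):
    """Fraction of non-overlapping frames whose RMS level is below threshold_db.

    samples: sequence of floats in [-1.0, 1.0] (mono audio).
    frame_ms and threshold_db are illustrative defaults, not the card's settings.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    starts = range(0, len(samples) - frame_len + 1, frame_len)
    frames = [samples[i:i + frame_len] for i in starts]
    if not frames:
        return 0.0
    silent = 0
    for frame in frames:
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        # Treat digital silence (rms == 0) as infinitely quiet.
        level_db = 20 * math.log10(rms) if rms > 0 else float("-inf")
        if level_db < threshold_db:
            silent += 1
    return silent / len(frames)


# Synthetic check: one second of silence followed by one second of a 220 Hz tone.
rate = 16000
quiet = [0.0] * rate
tone = [0.5 * math.sin(2 * math.pi * 220 * n / rate) for n in range(rate)]
ratio = silence_ratio(quiet + tone, rate)
```

A frame-level energy threshold is only one possible definition. A voice activity detector or a different threshold would give different percentages, so values from this sketch are not directly comparable to the table.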
Demo Audio
Kannada bedtime stories (fine-tuned)
| Mood | Audio |
|---|---|
| Magical | kannada_magical.wav |
| Funny | kannada_funny.wav |
| Calming | kannada_calming.wav |
| Dreamy | kannada_dreamy.wav |
Full pipeline (translate + narrate)
- pipeline_test_kannada.wav — English story → IndicTrans2 → IndicF5 Kannada narration
- reference_voice.wav — Reference voice used for cloning
How to use
from transformers import AutoModel
model = AutoModel.from_pretrained("sush0401/IndicF5-Kannada-Bedtime-v2", trust_remote_code=True)
audio = model("ಕನ್ನಡ ಪಠ್ಯ", ref_audio_path="reference.wav", ref_text="Reference transcript")
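
The call above returns the waveform in memory and does not write a file. A minimal sketch for saving it as a WAV using only the standard library follows. The helper name `save_wav` is hypothetical, and the 24 kHz default sample rate is an assumption based on the upstream IndicF5 usage example, so check it against the model's actual output rate.

```python
import struct
import wave


def save_wav(samples, path, sample_rate=24000):
    """Write mono audio to a 16-bit PCM WAV file and return the frame count.

    samples: a NumPy array (int16 or float) or a plain sequence of floats
    in [-1.0, 1.0]. The 24 kHz default is an assumption, not a confirmed value.
    """
    if str(getattr(samples, "dtype", "")) == "int16":
        # Already 16-bit PCM: keep the values unchanged.
        pcm = [int(s) for s in samples]
    else:
        # Clip floats to [-1, 1], then scale to the int16 range.
        pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        # WAV stores samples as little-endian signed shorts.
        wav_file.writeframes(struct.pack("<%dh" % len(pcm), *pcm))
    return len(pcm)


# Usage with the model output from the snippet above (not executed here):
# save_wav(audio, "kannada_story.wav")
```

The helper accepts both int16 and float input because the output dtype is not stated on this card.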
Part of DreamVoice
DreamVoice — bedtime stories in a parent's cloned voice.