F5-TTS Hinglish

A fine-tuned F5-TTS model for Hinglish (Hindi-English code-switched) text-to-speech, with zero-shot voice cloning from a short reference clip.

Dataset

Trained on ujs/hinglish: OpenSLR-104 (Hindi-English Code-Switching, IIT Guwahati).

Training Details

  • Base model: SPRINGLab/F5-Hindi-24KHz (151M params, F5-TTS Small)
  • Learning rate: 1e-05
  • Epochs: 10
  • Batch size: 3000 frames/GPU
  • Architecture: DiT (dim=768, depth=18, heads=12) + ConvNeXt V2 (dim=512, layers=4)
  • Audio: 24 kHz, 100-dim mel spectrogram, Vocos vocoder
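To give a feel for what these numbers mean, the sketch below converts the frame-based batch size into seconds of audio and derives the per-head attention width. It assumes a mel hop length of 256 samples, which is not stated in this card (it is the usual F5-TTS setting); if the checkpoint was trained with a different hop length, the durations change accordingly.

```python
# Values taken from the training details above.
SAMPLE_RATE = 24_000      # Hz
FRAMES_PER_GPU = 3000     # batch size in mel frames per GPU
DIT_DIM, DIT_HEADS = 768, 12

# ASSUMPTION: hop length is not given in the model card; 256 is assumed here.
HOP_LENGTH = 256          # audio samples per mel frame


def frames_to_seconds(frames, sample_rate=SAMPLE_RATE, hop_length=HOP_LENGTH):
    """Convert a number of mel frames to seconds of audio."""
    return frames * hop_length / sample_rate


frames_per_second = SAMPLE_RATE / HOP_LENGTH
batch_seconds = frames_to_seconds(FRAMES_PER_GPU)
head_dim = DIT_DIM // DIT_HEADS

print(f"mel frames per second of audio: {frames_per_second}")
print(f"audio per GPU per batch: {batch_seconds} s")
print(f"attention head width: {head_dim}")
```

Under that assumption, a 3000-frame batch holds roughly half a minute of audio per GPU, so each step sees only a handful of utterances.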

Usage

from f5_tts.api import F5TTS
import soundfile as sf

tts = F5TTS(
    model_type="F5-TTS",
    ckpt_file="path/to/model_last.pt",
    vocab_file="path/to/vocab.txt",
)

wav, sr, _ = tts.infer(
    ref_file="reference.wav",       # 3-10 s of target speaker
    ref_text="yaar kya scene hai",  # transcript of ref audio
    gen_text="aaj ki meeting cancel ho gayi, let's go for lunch",
)

sf.write("output.wav", wav, sr)
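
The usage example asks for 3-10 s of reference audio. A minimal sketch for checking a clip before passing it as ref_file is shown here; the helper names and the hard 3-10 s bounds are illustrative and not part of the F5-TTS API. It uses only the standard-library wave module, so it handles uncompressed PCM WAV files only.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def is_usable_reference(path, min_s=3.0, max_s=10.0):
    """Check that a reference clip falls inside the suggested 3-10 s window.

    The bounds mirror the comment in the usage example; they are a
    guideline, not a limit enforced by the model.
    """
    return min_s <= wav_duration_seconds(path) <= max_s
```

A typical call would be `is_usable_reference("reference.wav")` before `tts.infer(...)`; clips outside the window can be trimmed or re-recorded.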