Bhav-TTS: Emotional Bollywood Text-to-Speech

Bhav-TTS is a text-to-speech model fine-tuned from F5-TTS, a flow-matching architecture, and specialized for emotional Indian speech synthesis with Bollywood voice characteristics.

Model Details

Base Model: F5-TTS (SWivid/F5-TTS)
Fine-tuning Dataset: Bollywood emotional audio dataset
Training Steps: 1,200,000+
Model Size: 5.1 GB (model_last.pt)
Sample Rate: 24 kHz
Languages: Hindi, English (Indian English)
Specialization: Emotional speech with Bollywood voice profiles

Features

✨ Emotion-Aware Synthesis

Support for emotion tags in text (<happy>, <sad>, <angry>, etc.)
Trained on emotionally-diverse Bollywood audio
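
As a rough illustration of how tagged input might be pre-processed before synthesis, the sketch below pulls emotion tags out of a string. The helper name `split_emotion_tags` and the `KNOWN_EMOTIONS` set are hypothetical. Only `<happy>`, `<sad>`, and `<angry>` are named above, and how the model itself consumes tags is not specified in this card.

```python
import re

# Only these three tags are named in this model card; extend as needed.
KNOWN_EMOTIONS = {"happy", "sad", "angry"}

TAG_PATTERN = re.compile(r"<([a-z_]+)>")


def split_emotion_tags(text):
    """Return (text_without_known_tags, known_tags_in_order).

    Hypothetical pre-processing helper, not part of Bhav-TTS itself.
    Tags outside KNOWN_EMOTIONS are left in the text untouched.
    """
    tags = []

    def _strip_known(match):
        name = match.group(1)
        if name in KNOWN_EMOTIONS:
            tags.append(name)
            return ""
        return match.group(0)

    cleaned = TAG_PATTERN.sub(_strip_known, text)
    # Collapse whitespace left behind by removed tags.
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return cleaned, tags
```

A helper like this can be handy for logging which emotions a batch of prompts requests, or for rejecting unknown tags before they reach the model.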

🎭 Celebrity Voice Profiles

Shahrukh Khan (Drama)
Amitabh Bachchan (Baritone)
Narendra Modi (Speech)
Zero-shot voice cloning capabilities

🗣️ Multilingual Support

Native Hindi speech synthesis
Indian English accent
Mixed-language capability
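
Mixed-language input typically interleaves Devanagari and Latin script. The sketch below splits such a sentence into per-script runs, for example to inspect code-mixed text before synthesis. The function names `script_of` and `split_by_script` are hypothetical, and nothing here implies the model requires this step.

```python
def script_of(ch):
    """Classify a character as 'devanagari', 'latin', or 'other'."""
    if "\u0900" <= ch <= "\u097F":  # Devanagari Unicode block
        return "devanagari"
    if ch.isascii() and ch.isalpha():
        return "latin"
    return "other"  # digits, punctuation, whitespace, other scripts


def split_by_script(text):
    """Split text into (script, run) pairs.

    Characters classified as 'other' stay attached to the current run.
    """
    runs = []
    current_script, current = None, ""
    for ch in text:
        s = script_of(ch)
        if s == "other" or s == current_script or current_script is None:
            current += ch
            if s != "other":
                current_script = s
        else:
            # Script changed: close the current run and start a new one.
            runs.append((current_script, current.strip()))
            current_script, current = s, ch
    if current.strip():
        runs.append((current_script or "other", current.strip()))
    return runs
```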

⚡ Production Ready

Checkpoint trained to convergence
Integrated into Gradio demo
OpenAI-compatible API support
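
If the OpenAI-compatible endpoint follows the shape of OpenAI's `/v1/audio/speech` route, a request could be built as sketched below. The base URL, port, and model name are placeholders, and whether the server accepts the `voice` value `hi_male` (borrowed from the usage example in this card) is an assumption, so check the actual deployment.

```python
import json
import urllib.request

# Placeholder address: replace with wherever the server is actually hosted.
BASE_URL = "http://localhost:8000"


def build_speech_request(text, voice="hi_male", model="bhav-tts",
                         response_format="wav", base_url=BASE_URL):
    """Build a POST request in the shape of OpenAI's /v1/audio/speech API."""
    payload = {
        "model": model,  # assumed model name
        "input": text,
        "voice": voice,  # assumed voice identifier
        "response_format": response_format,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="output.wav", **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request) as response:
        audio_bytes = response.read()
    with open(out_path, "wb") as f:
        f.write(audio_bytes)
    return out_path
```

Keeping request construction separate from the network call makes the payload easy to inspect without a running server.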

Usage

Basic Inference

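from transformers import AutoModel
import soundfile as sf

# Load the model from the Hugging Face Hub
model = AutoModel.from_pretrained("bhriguverma/bhav-tts")

# Generate speech; the emotion tag is written inline in the text
text = "नमस्ते! मैं बहुत खुश हूँ। <happy>"  # "Hello! I am very happy."
audio = model(text, voice="hi_male")

# Save the waveform at the model's 24 kHz sample rate
sf.write("output.wav", audio, 24000)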

With Reference Speaker

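# Zero-shot voice cloning from a reference clip and its transcript
ref_audio = "speaker_reference.wav"
ref_text = "कुछ संदर्भ टेक्स्ट"  # "Some reference text": the transcript of ref_audio
text = "आपका जवाब यहाँ"  # "Your answer here": the text to synthesize

audio = model(text, ref_audio=ref_audio, ref_text=ref_text)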
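
Upstream F5-TTS guidance suggests keeping reference clips short (roughly under 12 seconds). Assuming Bhav-TTS inherits that behaviour, a quick standard-library check of a reference WAV before cloning might look like this. `describe_wav`, `check_reference`, and the 12-second default are illustrative and not part of the model's API.

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    return rate, duration


def check_reference(path, max_seconds=12.0):
    """Return a list of warnings for a reference clip; empty if it looks fine.

    max_seconds is an assumed limit borrowed from upstream F5-TTS guidance.
    """
    _, duration = describe_wav(path)
    warnings = []
    if duration > max_seconds:
        warnings.append(
            f"clip is {duration:.1f}s; consider trimming to under {max_seconds:.0f}s"
        )
    return warnings
```

Running `check_reference("speaker_reference.wav")` before synthesis gives an early hint when a clip is likely to be truncated.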

Training Configuration

Parameter	Value
Base Architecture	F5-TTS (Diffusion Transformer with ConvNeXt V2 text blocks)
Dataset	Custom Bollywood emotional audio
Sample Count	1,000+ utterances
Training Steps	1,200,000+
Checkpoints	Multi-stage saving enabled

Model Files

model_last.pt (5.1 GB) - Final trained model weights
pretrained_model_1200000.pt (1.3 GB) - Intermediate checkpoint (1.2M steps)
vocab.txt (13.8 KB) - Custom vocabulary/phoneme mapping
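
The two checkpoint sizes invite a back-of-the-envelope check. Assuming weights are stored as 32-bit floats and the sizes are decimal gigabytes (both assumptions, not stated in this card), the smaller file corresponds to roughly 325 million parameters, and the larger file is about four times its size. One possible explanation, unconfirmed here, is that `model_last.pt` also carries training state such as optimizer moments, as full training checkpoints often do.

```python
BYTES_PER_FP32 = 4  # assumption: weights stored as float32


def approx_params(size_gb, bytes_per_param=BYTES_PER_FP32):
    """Estimate parameter count from a file size in decimal gigabytes."""
    return size_gb * 1e9 / bytes_per_param


small_gb = 1.3  # pretrained_model_1200000.pt, as listed above
large_gb = 5.1  # model_last.pt, as listed above

print(f"~{approx_params(small_gb) / 1e6:.0f}M parameters if fp32")
print(f"size ratio: {large_gb / small_gb:.1f}x")
```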

Citation

If you use Bhav-TTS in your research, please cite the F5-TTS paper it builds on:

@article{chen2024f5tts,
  title={F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching},
  author={Chen, Yushen and others},
  journal={arXiv preprint arXiv:2410.06885},
  year={2024}
}

License

MIT License

Related Projects

F5-TTS: https://github.com/SWivid/F5-TTS
Svara-TTS: Multilingual Indic TTS
IndicF5: AI4Bharat's native Indic language variant

Status: ✅ Production Ready | Date: June 30, 2026
