Bhav-TTS: Emotional Bollywood Text-to-Speech

Bhav-TTS is a custom fine-tuned text-to-speech model built on top of F5-TTS (Flow Matching architecture), specialized for emotional Indian speech synthesis with Bollywood voice characteristics.

Model Details

  • Base Model: F5-TTS (SWivid/F5-TTS)
  • Fine-tuning Dataset: Bollywood emotional audio dataset
  • Training Steps: 1,200,000+
  • Model Size: 5.1 GB (model_last.pt)
  • Sample Rate: 24 kHz
  • Languages: Hindi, English (Indian English)
  • Specialization: Emotional speech with Bollywood voice profiles

Features

Emotion-Aware Synthesis

  • Support for emotion tags in text (<happy>, <sad>, <angry>, etc.)
  • Trained on emotionally diverse Bollywood audio
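Emotion tags are plain markup inside the input string, so it can help to check them before synthesis. The sketch below is a minimal example. It assumes tags take the form <name> and that the accepted set is only the three named above. The card says "etc.", so the real list is longer and is not documented here. find_emotion_tags and unknown_tags are hypothetical helpers, not part of Bhav-TTS.

```python
import re

# Assumption: tags are written inline as <name>. The set below only covers the
# examples named in this card; the full tag list is not documented.
KNOWN_EMOTIONS = {"happy", "sad", "angry"}
TAG_PATTERN = re.compile(r"<([a-z_]+)>")


def find_emotion_tags(text: str) -> list[str]:
    """Return emotion tag names in the order they appear in text."""
    return TAG_PATTERN.findall(text)


def unknown_tags(text: str, known: set[str] = KNOWN_EMOTIONS) -> list[str]:
    """Return tags that are not in the known set (e.g. a typo like <hapy>)."""
    return [tag for tag in find_emotion_tags(text) if tag not in known]
```

For example, find_emotion_tags("नमस्ते! मैं बहुत खुश हूँ। <happy>") returns ["happy"], and unknown_tags flags misspelled tags before they reach the model.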

🎭 Celebrity Voice Profiles

  • Shahrukh Khan (Drama)
  • Amitabh Bachchan (Baritone)
  • Narendra Modi (Speech)
  • Zero-shot voice cloning capabilities
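Zero-shot cloning output tends to depend on the reference clip. As a rough sketch, assuming the reference is an uncompressed PCM WAV file, the standard-library wave module can report its length, sample rate and channel count. The 12-second limit is an assumption based on upstream F5-TTS preprocessing, which clips longer references; the exact threshold varies by version. reference_clip_info, check_reference_clip and MAX_REF_SECONDS are hypothetical names.

```python
import wave

# Assumption: upstream F5-TTS clips reference audio longer than roughly 12 s
# during preprocessing; the exact limit depends on the f5-tts version.
MAX_REF_SECONDS = 12.0


def reference_clip_info(path: str) -> tuple[float, int, int]:
    """Return (duration in seconds, sample rate, channel count) of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        channels = wav.getnchannels()
    return frames / rate, rate, channels


def check_reference_clip(path: str) -> list[str]:
    """Return warnings for a reference clip (an empty list if nothing stands out)."""
    duration, _rate, _channels = reference_clip_info(path)
    warnings = []
    if duration > MAX_REF_SECONDS:
        warnings.append(
            f"clip is {duration:.1f} s; consider trimming to <= {MAX_REF_SECONDS:.0f} s"
        )
    return warnings
```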

🗣️ Multilingual Support

  • Native Hindi speech synthesis
  • Indian English accent
  • Mixed-language capability

Production Ready

  • Checkpoint trained to convergence
  • Integrated into a Gradio demo
  • OpenAI-compatible API support
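The card does not document the endpoint behind "OpenAI-compatible API support". If it follows OpenAI's text-to-speech route (POST /v1/audio/speech with a JSON body containing model, input and voice), a request could look like the sketch below. The base URL, the "bhav-tts" model name and the "hi_male" voice are hypothetical placeholders. Only the standard library is used, so the request can be inspected before it is sent.

```python
import json
import urllib.request

# Assumptions (not documented in this card): the server exposes the OpenAI
# speech route /v1/audio/speech; BASE_URL, the "bhav-tts" model name and the
# voice name are hypothetical placeholders.
BASE_URL = "http://localhost:8000/v1"


def build_speech_request(text: str, voice: str, model: str = "bhav-tts",
                         base_url: str = BASE_URL,
                         response_format: str = "wav") -> urllib.request.Request:
    """Build an OpenAI-style text-to-speech request without sending it."""
    payload = {"model": model, "input": text, "voice": voice,
               "response_format": response_format}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/audio/speech",
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, voice: str, out_path: str = "output.wav") -> None:
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, voice)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
```

For example, synthesize("नमस्ते! <happy>", voice="hi_male") would save the server's audio response to output.wav, provided a compatible server is running at BASE_URL.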

Usage

Basic Inference

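from huggingface_hub import hf_hub_download
from f5_tts.api import F5TTS  # pip install f5-tts
import soundfile as sf

# Load model: download the fine-tuned weights and vocabulary from the Hub
ckpt_file = hf_hub_download("bhriguverma/bhav-tts", "model_last.pt")
vocab_file = hf_hub_download("bhriguverma/bhav-tts", "vocab.txt")
# Assumes the checkpoint matches the f5-tts package's default model config;
# pass model=... if your f5-tts version or the checkpoint differs
model = F5TTS(ckpt_file=ckpt_file, vocab_file=vocab_file)

# Generate speech. F5-TTS takes the voice from a reference clip; "hi_male.wav"
# is a hypothetical local file, and ref_text="" lets f5-tts transcribe it
text = "नमस्ते! मैं बहुत खुश हूँ। <happy>"  # "Hello! I am very happy." + emotion tag
wav, sr, _ = model.infer(ref_file="hi_male.wav", ref_text="", gen_text=text)

# Save (24 kHz output)
sf.write("output.wav", wav, sr)

With Reference Speaker

# Zero-shot voice cloning with your own reference clip
ref_audio = "speaker_reference.wav"
ref_text = "कुछ संदर्भ टेक्स्ट"  # exact transcript of ref_audio ("some reference text")
text = "आपका जवाब यहाँ"  # text to synthesize ("your answer here")

wav, sr, _ = model.infer(ref_file=ref_audio, ref_text=ref_text, gen_text=text)
sf.write("cloned.wav", wav, sr)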

Training Configuration

Parameter          Value
Base Architecture  F5-TTS (ConvNeXt V2 + Diffusion Transformer)
Dataset            Custom Bollywood emotional audio
Sample Count       1,000+ utterances
Training Steps     1,200,000+
Checkpoints        Multi-stage saving enabled

Model Files

  • model_last.pt (5.1 GB) - Final trained model weights
  • pretrained_model_1200000.pt (1.3 GB) - Intermediate checkpoint (1.2M steps)
  • vocab.txt (13.8 KB) - Custom vocabulary/phoneme mapping
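The card does not say why model_last.pt (5.1 GB) is roughly four times the size of the 1.2M-step checkpoint. Training checkpoints sometimes bundle optimizer state or an extra EMA copy of the weights alongside the model, and one way to check is to measure each top-level entry. The sketch below assumes the .pt file loads as a nested dict of tensors. It relies only on each tensor exposing .nbytes (true for PyTorch tensors and NumPy arrays). tensor_bytes and size_by_top_level_key are hypothetical helpers, and key names such as "optimizer_state_dict" are not confirmed for these files.

```python
from collections.abc import Mapping

# Assumption: the checkpoint is a (possibly nested) dict whose leaves are
# tensors, numbers or strings; only objects with an .nbytes attribute count.


def tensor_bytes(obj) -> int:
    """Recursively sum .nbytes over every tensor-like value in obj."""
    if isinstance(obj, Mapping):
        return sum(tensor_bytes(v) for v in obj.values())
    if isinstance(obj, (list, tuple)):
        return sum(tensor_bytes(v) for v in obj)
    return getattr(obj, "nbytes", 0)


def size_by_top_level_key(checkpoint: Mapping) -> dict[str, float]:
    """Return the size in MB of each top-level entry, largest first."""
    sizes = {str(k): tensor_bytes(v) / 1e6 for k, v in checkpoint.items()}
    return dict(sorted(sizes.items(), key=lambda kv: kv[1], reverse=True))
```

Assuming PyTorch is installed and enough RAM is available, loading with ckpt = torch.load("model_last.pt", map_location="cpu") and then calling print(size_by_top_level_key(ckpt)) would list the entries from largest to smallest.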

Citation

If you use Bhav-TTS in your research, please cite the underlying F5-TTS paper:

@article{chen2024f5tts,
  title={F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching},
  author={Chen, Yushen and others},
  journal={arXiv preprint arXiv:2410.06885},
  year={2024}
}

License

MIT License

Related Projects


Status: ✅ Production Ready | Date: June 30, 2026
