VITS TTS for Indian Languages

This repository contains a VITS-based Text-to-Speech (TTS) model fine-tuned for Indian languages. The model supports 13 Indian languages and a range of speaking styles and emotions, which makes it suitable for use cases such as conversational AI and audiobooks.


Model Overview

The model ai4bharat/vits_rasa_13 is based on the VITS architecture and supports the following features:

  • Languages: 13 Indian languages (see Supported Languages).
  • Styles: 14 speaking styles and emotions, each selected by a style ID.
  • Speaker IDs: 20 predefined speaker profiles with male and female voices. Not every language has both a male and a female speaker.

Installation

pip install transformers torch soundfile

Usage

Here's a quick example to get started:

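import soundfile as sf
import torch
from transformers import AutoModel, AutoTokenizer

# Use the GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# trust_remote_code=True is needed because the model repo contains custom code.
model = AutoModel.from_pretrained("ai4bharat/vits_rasa_13", trust_remote_code=True).to(device)
tokenizer = AutoTokenizer.from_pretrained("ai4bharat/vits_rasa_13", trust_remote_code=True)

text = "ਕੀ ਮੈਂ ਇਸ ਹਫਤੇ ਦੇ ਅੰਤ ਵਿੱਚ ਰੁੱਝਿਆ ਹੋਇਆ ਹਾਂ?"  # Example text in Punjabi
speaker_id = 16  # PAN_M
style_id = 0  # ALEXA

inputs = tokenizer(text=text, return_tensors="pt").to(device)
with torch.no_grad():  # inference only, so gradients are not needed
    outputs = model(inputs['input_ids'], speaker_id=speaker_id, emotion_id=style_id)

# Move the waveform to the CPU and convert it to NumPy before writing the file.
waveform = outputs.waveform.squeeze().cpu().numpy()
sf.write("audio.wav", waveform, model.config.sampling_rate)
print(outputs.waveform.shape)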

Supported Languages

  • Assamese
  • Bengali
  • Bodo
  • Dogri
  • Kannada
  • Maithili
  • Malayalam
  • Marathi
  • Nepali
  • Punjabi
  • Sanskrit
  • Tamil
  • Telugu

Speaker-Style Identifier Overview

Speaker Name    Speaker ID
ASM_F           0
ASM_M           1
BEN_F           2
BEN_M           3
BRX_F           4
BRX_M           5
DOI_F           6
DOI_M           7
KAN_F           8
KAN_M           9
MAI_M           10
MAL_F           11
MAR_F           12
MAR_M           13
NEP_F           14
PAN_F           15
PAN_M           16
SAN_M           17
TAM_F           18
TEL_F           19

Style Name      Style ID
ALEXA           0
ANGER           1
BB              2
BOOK            3
CONV            4
DIGI            5
DISGUST         6
FEAR            7
HAPPY           8
NEWS            10
SAD             12
SURPRISE        14
UMANG           15
WIKI            16
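
The identifiers listed above can be kept in a small lookup so that calling code refers to speakers and styles by name rather than by bare integers. The following is a minimal sketch, not part of the model repository: SPEAKER_IDS, STYLE_IDS, and resolve_ids are hypothetical names introduced here for illustration, and the values are copied from the tables in this section.

```python
from typing import Tuple

# Hypothetical lookup tables (not part of the model's API); values are
# copied from the speaker and style tables in this README.
SPEAKER_IDS = {
    "ASM_F": 0, "ASM_M": 1, "BEN_F": 2, "BEN_M": 3, "BRX_F": 4,
    "BRX_M": 5, "DOI_F": 6, "DOI_M": 7, "KAN_F": 8, "KAN_M": 9,
    "MAI_M": 10, "MAL_F": 11, "MAR_F": 12, "MAR_M": 13, "NEP_F": 14,
    "PAN_F": 15, "PAN_M": 16, "SAN_M": 17, "TAM_F": 18, "TEL_F": 19,
}

# Style IDs are not contiguous: 9, 11, and 13 do not appear in the table.
STYLE_IDS = {
    "ALEXA": 0, "ANGER": 1, "BB": 2, "BOOK": 3, "CONV": 4, "DIGI": 5,
    "DISGUST": 6, "FEAR": 7, "HAPPY": 8, "NEWS": 10, "SAD": 12,
    "SURPRISE": 14, "UMANG": 15, "WIKI": 16,
}


def resolve_ids(speaker: str, style: str) -> Tuple[int, int]:
    """Return (speaker_id, style_id) for the given names, ignoring case."""
    speaker_key = speaker.upper()
    style_key = style.upper()
    if speaker_key not in SPEAKER_IDS:
        raise KeyError(f"Unknown speaker name: {speaker!r}")
    if style_key not in STYLE_IDS:
        raise KeyError(f"Unknown style name: {style!r}")
    return SPEAKER_IDS[speaker_key], STYLE_IDS[style_key]


speaker_id, style_id = resolve_ids("PAN_M", "ALEXA")
print(speaker_id, style_id)
```

The resulting pair can be passed as speaker_id and emotion_id in the call shown under Usage. The lookup only checks that each name exists; the tables do not state whether every speaker is available in every style, so that is not validated here.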

Citation

If you use this model in your research, please cite:

@article{ai4bharat_vits_rasa_13,
  title={VITS TTS for Indian Languages},
  author={Ashwin Sankar},
  year={2024},
  publisher={Hugging Face}
}