---
license: cc-by-4.0
language:
- as
- bn
- brx
- doi
- kn
- mai
- ml
- mr
- ne
- pa
- sa
- ta
- te
library_name: transformers
pipeline_tag: text-to-speech
tags:
- text-to-speech
---
# VITS TTS for Indian Languages

This repository contains a VITS-based Text-to-Speech (TTS) model fine-tuned for 13 Indian languages. It provides 20 predefined speakers and 14 speaking styles and emotions, which makes it suitable for use cases such as conversational AI and audiobooks.

---

## Model Overview

The model `ai4bharat/vits_rasa_13` is based on the VITS architecture and supports the following features:
- **Languages**: 13 Indian languages (see [Supported Languages](#supported-languages)).
- **Styles**: 14 speaking styles and emotions, selected with the `emotion_id` argument.
- **Speaker IDs**: 20 predefined speaker profiles covering male and female voices, selected with the `speaker_id` argument.

---

## Installation

```bash
pip install transformers torch soundfile
```
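Before running the usage example, you can confirm that its dependencies are importable and whether a CUDA device is visible. This is a minimal sketch using only the standard library's `importlib.util.find_spec`; the helper names `missing_modules` and `cuda_available` are hypothetical and not part of this repository.

```python
import importlib.util

# Modules imported by the usage example (here the import names match the pip package names).
REQUIRED_MODULES = ("transformers", "torch", "soundfile")


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names in `modules` that cannot be found on the current Python path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def cuda_available():
    """Return True only if torch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the check works even without torch installed

    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages found.")
    print("CUDA available:", cuda_available())
```

If CUDA is not available, load the model on `"cpu"` instead of `"cuda"`; inference will be slower but otherwise works the same way.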

---

## Usage

Here's a quick example to get started:

```python
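import soundfile as sf
import torch
from transformers import AutoModel, AutoTokenizer

# Use the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModel.from_pretrained("ai4bharat/vits_rasa_13", trust_remote_code=True).to(device)
tokenizer = AutoTokenizer.from_pretrained("ai4bharat/vits_rasa_13", trust_remote_code=True)

text = "ਕੀ ਮੈਂ ਇਸ ਹਫਤੇ ਦੇ ਅੰਤ ਵਿੱਚ ਰੁੱਝਿਆ ਹੋਇਆ ਹਾਂ?"  # Punjabi: "Am I busy this weekend?"
speaker_id = 16  # PAN_M
style_id = 0  # ALEXA

inputs = tokenizer(text=text, return_tensors="pt").to(device)
with torch.no_grad():  # inference only, so skip gradient tracking
    outputs = model(inputs["input_ids"], speaker_id=speaker_id, emotion_id=style_id)

# soundfile expects a NumPy array in CPU memory, not a (possibly CUDA) torch tensor.
waveform = outputs.waveform.squeeze().cpu().numpy()
sf.write("audio.wav", waveform, model.config.sampling_rate)
print(waveform.shape)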
```
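The usage example squeezes the model output before saving it. If you post-process audio in several places, a small helper can make that conversion explicit and also report the clip length (number of samples divided by the sampling rate). This is a sketch that assumes `outputs.waveform` is shaped `(batch, samples)`, as in the built-in Transformers `VitsModel`; the helper names `to_mono_numpy` and `duration_seconds` are hypothetical.

```python
import numpy as np


def to_mono_numpy(waveform) -> np.ndarray:
    """Convert a (batch, samples) or (samples,) waveform to a 1-D float32 NumPy array.

    Accepts a NumPy array or a torch tensor (anything with .detach().cpu().numpy()).
    Assumption: for a batched input, only the first item is kept.
    """
    if hasattr(waveform, "detach"):
        # torch tensor: drop autograd history and move to CPU memory first
        waveform = waveform.detach().cpu().numpy()
    audio = np.asarray(waveform, dtype=np.float32)
    if audio.ndim == 2:
        audio = audio[0]
    if audio.ndim != 1:
        raise ValueError(f"expected a 1-D or 2-D waveform, got shape {audio.shape}")
    return audio


def duration_seconds(audio: np.ndarray, sampling_rate: int) -> float:
    """Clip length in seconds: number of samples divided by the sampling rate."""
    return audio.shape[0] / sampling_rate
```

With the model from the usage example, call `audio = to_mono_numpy(outputs.waveform)`, then `sf.write("audio.wav", audio, model.config.sampling_rate)` and `duration_seconds(audio, model.config.sampling_rate)`.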

---
## Supported Languages

- Assamese (`as`)
- Bengali (`bn`)
- Bodo (`brx`)
- Dogri (`doi`)
- Kannada (`kn`)
- Maithili (`mai`)
- Malayalam (`ml`)
- Marathi (`mr`)
- Nepali (`ne`)
- Punjabi (`pa`)
- Sanskrit (`sa`)
- Tamil (`ta`)
- Telugu (`te`)

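## Speaker-Style Identifier Overview

Pass a speaker ID as `speaker_id` and a style ID as `emotion_id`. Each speaker name combines a language code with the speaker's gender (`F` for female, `M` for male). Style IDs are not contiguous (9, 11, and 13 do not appear), so look up each ID by name rather than counting rows.

**Speakers** (`speaker_id`)

| Speaker Name | Speaker ID |
|--------------|------------|
| ASM_F        | 0          |
| ASM_M        | 1          |
| BEN_F        | 2          |
| BEN_M        | 3          |
| BRX_F        | 4          |
| BRX_M        | 5          |
| DOI_F        | 6          |
| DOI_M        | 7          |
| KAN_F        | 8          |
| KAN_M        | 9          |
| MAI_M        | 10         |
| MAL_F        | 11         |
| MAR_F        | 12         |
| MAR_M        | 13         |
| NEP_F        | 14         |
| PAN_F        | 15         |
| PAN_M        | 16         |
| SAN_M        | 17         |
| TAM_F        | 18         |
| TEL_F        | 19         |

**Styles** (`emotion_id`)

| Style Name | Style ID |
|------------|----------|
| ALEXA      | 0        |
| ANGER      | 1        |
| BB         | 2        |
| BOOK       | 3        |
| CONV       | 4        |
| DIGI       | 5        |
| DISGUST    | 6        |
| FEAR       | 7        |
| HAPPY      | 8        |
| NEWS       | 10       |
| SAD        | 12       |
| SURPRISE   | 14       |
| UMANG      | 15       |
| WIKI       | 16       |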
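Because the style IDs have gaps, hard-coding numbers is error-prone. A minimal sketch of a name-based lookup, with the IDs copied from the tables in this card, is shown below. The dictionaries and the helpers `resolve_ids` and `speakers_for_prefix` are hypothetical conveniences, not part of the model's API, and `speakers_for_prefix` assumes the part of a speaker name before the underscore is its language code.

```python
from __future__ import annotations

# Speaker IDs copied from the speaker table in this model card.
SPEAKER_IDS = {
    "ASM_F": 0, "ASM_M": 1, "BEN_F": 2, "BEN_M": 3, "BRX_F": 4, "BRX_M": 5,
    "DOI_F": 6, "DOI_M": 7, "KAN_F": 8, "KAN_M": 9, "MAI_M": 10, "MAL_F": 11,
    "MAR_F": 12, "MAR_M": 13, "NEP_F": 14, "PAN_F": 15, "PAN_M": 16,
    "SAN_M": 17, "TAM_F": 18, "TEL_F": 19,
}

# Style IDs copied from the style table; 9, 11 and 13 are absent, so never count rows.
STYLE_IDS = {
    "ALEXA": 0, "ANGER": 1, "BB": 2, "BOOK": 3, "CONV": 4, "DIGI": 5,
    "DISGUST": 6, "FEAR": 7, "HAPPY": 8, "NEWS": 10, "SAD": 12,
    "SURPRISE": 14, "UMANG": 15, "WIKI": 16,
}


def resolve_ids(speaker: str, style: str) -> tuple[int, int]:
    """Map table names (case-insensitive) to (speaker_id, style_id), failing loudly on typos."""
    speaker, style = speaker.upper(), style.upper()
    if speaker not in SPEAKER_IDS:
        raise KeyError(f"unknown speaker {speaker!r}; choose from {sorted(SPEAKER_IDS)}")
    if style not in STYLE_IDS:
        raise KeyError(f"unknown style {style!r}; choose from {sorted(STYLE_IDS)}")
    return SPEAKER_IDS[speaker], STYLE_IDS[style]


def speakers_for_prefix(prefix: str) -> list[str]:
    """List speaker names whose prefix (assumed to be the language code, e.g. "PAN") matches."""
    prefix = prefix.upper()
    return [name for name in SPEAKER_IDS if name.split("_")[0] == prefix]
```

For example, `speaker_id, style_id = resolve_ids("PAN_M", "ALEXA")` returns `(16, 0)`, which you can then pass as `speaker_id=speaker_id, emotion_id=style_id`.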

---

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{ai4bharat_vits_rasa_13,
  title={VITS TTS for Indian Languages},
  author={Ashwin Sankar},
  year={2024},
  howpublished={Hugging Face}
}
```