---
license: cc-by-4.0
language:
- as
- bn
- brx
- doi
- kn
- mai
- ml
- mr
- ne
- pa
- sa
- ta
- te
library_name: transformers
pipeline_tag: text-to-speech
tags:
- text-to-speech
---
# VITS TTS for Indian Languages
This repository contains a VITS-based Text-to-Speech (TTS) model fine-tuned for 13 Indian languages. It supports a range of speaking styles and emotions, which makes it suitable for use cases such as conversational AI and audiobooks.
---
## Model Overview
The model `ai4bharat/vits_rasa_13` is based on the VITS architecture and supports the following features:
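- **Languages**: 13 Indian languages (see Supported Languages).
- **Styles**: 14 speaking styles and emotions, each selected by a style ID.
- **Speaker IDs**: 20 predefined speaker profiles. Seven languages have both a female and a male voice; the other six have a single voice.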
---
## Installation
```bash
pip install transformers torch soundfile
```
---
## Usage
Here's a quick example to get started:
```python
import soundfile as sf
import torch
from transformers import AutoModel, AutoTokenizer

# Use a GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModel.from_pretrained("ai4bharat/vits_rasa_13", trust_remote_code=True).to(device)
tokenizer = AutoTokenizer.from_pretrained("ai4bharat/vits_rasa_13", trust_remote_code=True)

text = "ਕੀ ਮੈਂ ਇਸ ਹਫਤੇ ਦੇ ਅੰਤ ਵਿੱਚ ਰੁੱਝਿਆ ਹੋਇਆ ਹਾਂ?"  # Example text in Punjabi
speaker_id = 16  # PAN_M
style_id = 0  # ALEXA
inputs = tokenizer(text=text, return_tensors="pt").to(device)
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(inputs["input_ids"], speaker_id=speaker_id, emotion_id=style_id)

# soundfile expects a NumPy array in CPU memory, not a (possibly CUDA) tensor
waveform = outputs.waveform.squeeze().cpu().numpy()
sf.write("audio.wav", waveform, model.config.sampling_rate)
print(outputs.waveform.shape)
```
---
## Supported Languages
- `Assamese`
- `Bengali`
- `Bodo`
- `Dogri`
- `Kannada`
- `Maithili`
- `Malayalam`
- `Marathi`
- `Nepali`
- `Punjabi`
- `Sanskrit`
- `Tamil`
- `Telugu`
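
As a rough reference, the language names above can be paired with the language codes in the model card metadata. This is only a sketch: `LANGUAGES` and `speaker_prefix` are hypothetical names, not part of the model repository, and the three-letter prefixes are an assumption inferred from the speaker names in this README (for example, `PAN_M` is used with Punjabi text in the usage example).

```python
# Language code (from the model card metadata) -> (language name, speaker prefix).
# The prefixes are an ASSUMPTION inferred from the speaker names in this README.
LANGUAGES = {
    "as": ("Assamese", "ASM"),
    "bn": ("Bengali", "BEN"),
    "brx": ("Bodo", "BRX"),
    "doi": ("Dogri", "DOI"),
    "kn": ("Kannada", "KAN"),
    "mai": ("Maithili", "MAI"),
    "ml": ("Malayalam", "MAL"),
    "mr": ("Marathi", "MAR"),
    "ne": ("Nepali", "NEP"),
    "pa": ("Punjabi", "PAN"),
    "sa": ("Sanskrit", "SAN"),
    "ta": ("Tamil", "TAM"),
    "te": ("Telugu", "TEL"),
}


def speaker_prefix(language):
    """Return the assumed speaker-name prefix for a language code or name."""
    key = language.strip().lower()
    for code, (name, prefix) in LANGUAGES.items():
        if key in (code, name.lower()):
            return prefix
    raise KeyError(f"Unsupported language: {language!r}")
```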
---
## Speaker-Style Identifier Overview
<div style="display: flex; align-items: flex-start; gap: 20px; margin: 0; padding: 0;">
<table style="margin: 0; padding: 0; border-spacing: 0;">
<tr>
<th>Speaker Name</th>
<th>Speaker ID</th>
</tr>
<tr>
<td>ASM_F</td>
<td>0</td>
</tr>
<tr>
<td>ASM_M</td>
<td>1</td>
</tr>
<tr>
<td>BEN_F</td>
<td>2</td>
</tr>
<tr>
<td>BEN_M</td>
<td>3</td>
</tr>
<tr>
<td>BRX_F</td>
<td>4</td>
</tr>
<tr>
<td>BRX_M</td>
<td>5</td>
</tr>
<tr>
<td>DOI_F</td>
<td>6</td>
</tr>
<tr>
<td>DOI_M</td>
<td>7</td>
</tr>
<tr>
<td>KAN_F</td>
<td>8</td>
</tr>
<tr>
<td>KAN_M</td>
<td>9</td>
</tr>
<tr>
<td>MAI_M</td>
<td>10</td>
</tr>
<tr>
<td>MAL_F</td>
<td>11</td>
</tr>
<tr>
<td>MAR_F</td>
<td>12</td>
</tr>
<tr>
<td>MAR_M</td>
<td>13</td>
</tr>
<tr>
<td>NEP_F</td>
<td>14</td>
</tr>
<tr>
<td>PAN_F</td>
<td>15</td>
</tr>
<tr>
<td>PAN_M</td>
<td>16</td>
</tr>
<tr>
<td>SAN_M</td>
<td>17</td>
</tr>
<tr>
<td>TAM_F</td>
<td>18</td>
</tr>
<tr>
<td>TEL_F</td>
<td>19</td>
</tr>
</table>
<table>
<tr>
<th>Style Name</th>
<th>Style ID</th>
</tr>
<tr>
<td>ALEXA</td>
<td>0</td>
</tr>
<tr>
<td>ANGER</td>
<td>1</td>
</tr>
<tr>
<td>BB</td>
<td>2</td>
</tr>
<tr>
<td>BOOK</td>
<td>3</td>
</tr>
<tr>
<td>CONV</td>
<td>4</td>
</tr>
<tr>
<td>DIGI</td>
<td>5</td>
</tr>
<tr>
<td>DISGUST</td>
<td>6</td>
</tr>
<tr>
<td>FEAR</td>
<td>7</td>
</tr>
<tr>
<td>HAPPY</td>
<td>8</td>
</tr>
<tr>
<td>NEWS</td>
<td>10</td>
</tr>
<tr>
<td>SAD</td>
<td>12</td>
</tr>
<tr>
<td>SURPRISE</td>
<td>14</td>
</tr>
<tr>
<td>UMANG</td>
<td>15</td>
</tr>
<tr>
<td>WIKI</td>
<td>16</td>
</tr>
</table>
</div>
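
For scripting, the two tables above can be mirrored as plain dictionaries. The sketch below is not part of the model repository: `SPEAKER_IDS`, `STYLE_IDS`, and `resolve_ids` are hypothetical names, and the values are copied from the tables. The listed style IDs are not contiguous (9, 11, and 13 do not appear), so looking IDs up by name is likely safer than iterating over a numeric range.

```python
# Speaker name -> speaker ID, copied from the speaker table above.
SPEAKER_IDS = {
    "ASM_F": 0, "ASM_M": 1, "BEN_F": 2, "BEN_M": 3, "BRX_F": 4,
    "BRX_M": 5, "DOI_F": 6, "DOI_M": 7, "KAN_F": 8, "KAN_M": 9,
    "MAI_M": 10, "MAL_F": 11, "MAR_F": 12, "MAR_M": 13, "NEP_F": 14,
    "PAN_F": 15, "PAN_M": 16, "SAN_M": 17, "TAM_F": 18, "TEL_F": 19,
}

# Style name -> style ID, copied from the style table above.
# IDs 9, 11, and 13 are not listed there, so they are absent here too.
STYLE_IDS = {
    "ALEXA": 0, "ANGER": 1, "BB": 2, "BOOK": 3, "CONV": 4, "DIGI": 5,
    "DISGUST": 6, "FEAR": 7, "HAPPY": 8, "NEWS": 10, "SAD": 12,
    "SURPRISE": 14, "UMANG": 15, "WIKI": 16,
}


def resolve_ids(speaker_name, style_name):
    """Translate a speaker name and a style name into (speaker_id, style_id)."""
    speaker_key = speaker_name.strip().upper()
    style_key = style_name.strip().upper()
    if speaker_key not in SPEAKER_IDS:
        raise ValueError(f"Unknown speaker: {speaker_name!r}")
    if style_key not in STYLE_IDS:
        raise ValueError(f"Unknown style: {style_name!r}")
    return SPEAKER_IDS[speaker_key], STYLE_IDS[style_key]


# Hypothetical usage: the pair from the usage example (PAN_M, ALEXA)
speaker_id, style_id = resolve_ids("PAN_M", "ALEXA")
```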
---
## Citation
If you use this model in your research, please cite:
```bibtex
@article{ai4bharat_vits_rasa_13,
title={VITS TTS for Indian Languages},
author={Ashwin Sankar},
year={2024},
publisher={Hugging Face}
}
```