---
license: cc-by-4.0
language:
- as
- bn
- brx
- doi
- kn
- mai
- ml
- mr
- ne
- pa
- sa
- ta
- te
library_name: transformers
pipeline_tag: text-to-speech
tags:
- text-to-speech
---

# VITS TTS for Indian Languages

This repository contains a VITS-based text-to-speech (TTS) model fine-tuned for 13 Indian languages. It covers a range of speaking styles and emotions, which makes it suitable for use cases such as conversational AI and audiobooks.

---

## Model Overview

The model `ai4bharat/vits_rasa_13` is based on the VITS architecture and supports the following features:

- **Languages**: 13 Indian languages, listed under Supported Languages.
- **Styles**: 14 speaking styles and emotions, each selected by a style ID.
- **Speaker IDs**: 20 predefined speakers, male and female, each selected by a speaker ID. Not every language has both a male and a female voice.

---

## Installation

The usage example also needs `soundfile` to write the generated audio to disk:

```bash
pip install transformers torch soundfile
```

---

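## Usage

Here's a quick example to get started. It synthesizes a Punjabi sentence with the `PAN_M` speaker in the `ALEXA` style and writes the result to `audio.wav`:

```python
import soundfile as sf
import torch
from transformers import AutoModel, AutoTokenizer

# Use the GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# trust_remote_code=True runs the model code shipped in the repository.
model = AutoModel.from_pretrained("ai4bharat/vits_rasa_13", trust_remote_code=True).to(device)
tokenizer = AutoTokenizer.from_pretrained("ai4bharat/vits_rasa_13", trust_remote_code=True)

text = "ਕੀ ਮੈਂ ਇਸ ਹਫਤੇ ਦੇ ਅੰਤ ਵਿੱਚ ਰੁੱਝਿਆ ਹੋਇਆ ਹਾਂ?"  # Example text in Punjabi
speaker_id = 16  # PAN_M
style_id = 0  # ALEXA

inputs = tokenizer(text=text, return_tensors="pt").to(device)
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(inputs["input_ids"], speaker_id=speaker_id, emotion_id=style_id)

# soundfile expects a NumPy array, so move the waveform to the CPU first.
waveform = outputs.waveform.squeeze().cpu().numpy()
sf.write("audio.wav", waveform, model.config.sampling_rate)
print(outputs.waveform.shape)
```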

---

## Supported Languages

- `Assamese`
- `Bengali`
- `Bodo`
- `Dogri`
- `Kannada`
- `Maithili`
- `Malayalam`
- `Marathi`
- `Nepali`
- `Punjabi`
- `Sanskrit`
- `Tamil`
- `Telugu`
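
To find the right voice for a language, it can help to map each language name to its metadata language code and to the three-letter prefix used in the speaker names (for example `PAN_M` for Punjabi). The sketch below is a convenience only: `LANGUAGES` and `speaker_prefix` are hypothetical names, not part of the model's API, and the pairing of prefixes with languages is an assumption inferred from the speaker names.

```python
# Language name -> (language code from the model card metadata, speaker-name prefix).
# Assumption: prefixes are paired with languages by reading the speaker names
# (e.g. PAN_M is used with Punjabi text in the usage example).
LANGUAGES = {
    "Assamese": ("as", "ASM"),
    "Bengali": ("bn", "BEN"),
    "Bodo": ("brx", "BRX"),
    "Dogri": ("doi", "DOI"),
    "Kannada": ("kn", "KAN"),
    "Maithili": ("mai", "MAI"),
    "Malayalam": ("ml", "MAL"),
    "Marathi": ("mr", "MAR"),
    "Nepali": ("ne", "NEP"),
    "Punjabi": ("pa", "PAN"),
    "Sanskrit": ("sa", "SAN"),
    "Tamil": ("ta", "TAM"),
    "Telugu": ("te", "TEL"),
}


def speaker_prefix(language: str) -> str:
    """Return the speaker-name prefix for a language name (case-insensitive)."""
    wanted = language.strip().lower()
    for name, (_code, prefix) in LANGUAGES.items():
        if name.lower() == wanted:
            return prefix
    raise ValueError(f"Unsupported language: {language!r}")
```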

---

## Speaker-Style Identifier Overview

<div style="display: flex; align-items: flex-start; gap: 20px; margin: 0; padding: 0;">
<table style="margin: 0; padding: 0; border-spacing: 0;">
<tr><th>Speaker Name</th><th>Speaker ID</th></tr>
<tr><td>ASM_F</td><td>0</td></tr>
<tr><td>ASM_M</td><td>1</td></tr>
<tr><td>BEN_F</td><td>2</td></tr>
<tr><td>BEN_M</td><td>3</td></tr>
<tr><td>BRX_F</td><td>4</td></tr>
<tr><td>BRX_M</td><td>5</td></tr>
<tr><td>DOI_F</td><td>6</td></tr>
<tr><td>DOI_M</td><td>7</td></tr>
<tr><td>KAN_F</td><td>8</td></tr>
<tr><td>KAN_M</td><td>9</td></tr>
<tr><td>MAI_M</td><td>10</td></tr>
<tr><td>MAL_F</td><td>11</td></tr>
<tr><td>MAR_F</td><td>12</td></tr>
<tr><td>MAR_M</td><td>13</td></tr>
<tr><td>NEP_F</td><td>14</td></tr>
<tr><td>PAN_F</td><td>15</td></tr>
<tr><td>PAN_M</td><td>16</td></tr>
<tr><td>SAN_M</td><td>17</td></tr>
<tr><td>TAM_F</td><td>18</td></tr>
<tr><td>TEL_F</td><td>19</td></tr>
</table>
<table>
<tr><th>Style Name</th><th>Style ID</th></tr>
<tr><td>ALEXA</td><td>0</td></tr>
<tr><td>ANGER</td><td>1</td></tr>
<tr><td>BB</td><td>2</td></tr>
<tr><td>BOOK</td><td>3</td></tr>
<tr><td>CONV</td><td>4</td></tr>
<tr><td>DIGI</td><td>5</td></tr>
<tr><td>DISGUST</td><td>6</td></tr>
<tr><td>FEAR</td><td>7</td></tr>
<tr><td>HAPPY</td><td>8</td></tr>
<tr><td>NEWS</td><td>10</td></tr>
<tr><td>SAD</td><td>12</td></tr>
<tr><td>SURPRISE</td><td>14</td></tr>
<tr><td>UMANG</td><td>15</td></tr>
<tr><td>WIKI</td><td>16</td></tr>
</table>
</div>
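
Since the model takes integer IDs, looking them up by name can avoid mistakes. The listed style IDs are not contiguous (9, 11, and 13 do not appear in the table), so a name lookup is safer than iterating over a numeric range. The sketch below copies the two tables above into dictionaries; `SPEAKER_IDS`, `STYLE_IDS`, and `resolve_ids` are hypothetical helper names, not part of the model's API. The returned pair can be passed as `speaker_id` and `emotion_id` in the usage example.

```python
from typing import Tuple

# Copied from the speaker table above.
SPEAKER_IDS = {
    "ASM_F": 0, "ASM_M": 1, "BEN_F": 2, "BEN_M": 3, "BRX_F": 4,
    "BRX_M": 5, "DOI_F": 6, "DOI_M": 7, "KAN_F": 8, "KAN_M": 9,
    "MAI_M": 10, "MAL_F": 11, "MAR_F": 12, "MAR_M": 13, "NEP_F": 14,
    "PAN_F": 15, "PAN_M": 16, "SAN_M": 17, "TAM_F": 18, "TEL_F": 19,
}

# Copied from the style table above; note the gaps at 9, 11, and 13.
STYLE_IDS = {
    "ALEXA": 0, "ANGER": 1, "BB": 2, "BOOK": 3, "CONV": 4, "DIGI": 5,
    "DISGUST": 6, "FEAR": 7, "HAPPY": 8, "NEWS": 10, "SAD": 12,
    "SURPRISE": 14, "UMANG": 15, "WIKI": 16,
}


def resolve_ids(speaker: str, style: str) -> Tuple[int, int]:
    """Map a speaker name and a style name to (speaker_id, style_id)."""
    speaker_key = speaker.strip().upper()
    style_key = style.strip().upper()
    if speaker_key not in SPEAKER_IDS:
        raise ValueError(f"Unknown speaker: {speaker!r}")
    if style_key not in STYLE_IDS:
        raise ValueError(f"Unknown style: {style!r}")
    return SPEAKER_IDS[speaker_key], STYLE_IDS[style_key]


speaker_id, style_id = resolve_ids("PAN_M", "ALEXA")
```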

---

## Citation

If you use this model in your research, please cite:

```bibtex
@article{ai4bharat_vits_rasa_13,
  title={VITS TTS for Indian Languages},
  author={Ashwin Sankar},
  year={2024},
  publisher={Hugging Face}
}
```