---
license: cc-by-4.0
language:
- as
- bn
- brx
- doi
- kn
- mai
- ml
- mr
- ne
- pa
- sa
- ta
- te
library_name: transformers
pipeline_tag: text-to-speech
tags:
- text-to-speech
---

# VITS TTS for Indian Languages

This repository contains a VITS-based Text-to-Speech (TTS) model fine-tuned for Indian languages. The model supports multiple Indian languages and a range of speaking styles and emotions, which makes it suitable for use cases such as conversational AI and audiobooks.

---

## Model Overview

The model `ai4bharat/vits_rasa_13` is based on the VITS architecture and supports the following features:

- **Languages**: The 13 Indian languages listed under Supported Languages.
- **Styles**: Speaking styles and emotions, selected by style ID.
- **Speaker IDs**: Predefined male and female speaker profiles, selected by speaker ID.

---

## Installation

The usage example also imports `soundfile` to write the generated audio, so install it alongside `transformers` and `torch`:

```bash
pip install transformers torch soundfile
```

---

## Usage

Here's a quick example to get started:

```python
import soundfile as sf
from transformers import AutoModel, AutoTokenizer

# The example assumes a CUDA-capable GPU; replace "cuda" with "cpu" otherwise.
model = AutoModel.from_pretrained("ai4bharat/vits_rasa_13", trust_remote_code=True).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("ai4bharat/vits_rasa_13", trust_remote_code=True)

text = "ਕੀ ਮੈਂ ਇਸ ਹਫਤੇ ਦੇ ਅੰਤ ਵਿੱਚ ਰੁੱਝਿਆ ਹੋਇਆ ਹਾਂ?"  # Example text in Punjabi
speaker_id = 16  # PAN_M
style_id = 0  # ALEXA

inputs = tokenizer(text=text, return_tensors="pt").to("cuda")
# The style ID is passed through the `emotion_id` argument.
outputs = model(inputs['input_ids'], speaker_id=speaker_id, emotion_id=style_id)

# Depending on what the model's remote code returns, the waveform may need
# `.detach().cpu().numpy()` before it can be written to disk.
sf.write("audio.wav", outputs.waveform.squeeze(), model.config.sampling_rate)
print(outputs.waveform.shape)
```

---

## Supported Languages

- `Assamese`
- `Bengali`
- `Bodo`
- `Dogri`
- `Kannada`
- `Maithili`
- `Malayalam`
- `Marathi`
- `Nepali`
- `Punjabi`
- `Sanskrit`
- `Tamil`
- `Telugu`

## Speaker-Style Identifier Overview

The speaker columns and the style columns are two independent lists placed side by side; a speaker and a style that share a row are not paired. For example, the usage snippet combines `PAN_M` (speaker ID 16) with `ALEXA` (style ID 0). Style IDs are not contiguous: 9, 11, and 13 do not appear in the list.

| Speaker Name | Speaker ID | Style Name  | Style ID |
|--------------|------------|-------------|----------|
| ASM_F        | 0          | ALEXA       | 0        |
| ASM_M        | 1          | ANGER       | 1        |
| BEN_F        | 2          | BB          | 2        |
| BEN_M        | 3          | BOOK        | 3        |
| BRX_F        | 4          | CONV        | 4        |
| BRX_M        | 5          | DIGI        | 5        |
| DOI_F        | 6          | DISGUST     | 6        |
| DOI_M        | 7          | FEAR        | 7        |
| KAN_F        | 8          | HAPPY       | 8        |
| KAN_M        | 9          | NEWS        | 10       |
| MAI_M        | 10         | SAD         | 12       |
| MAL_F        | 11         | SURPRISE    | 14       |
| MAR_F        | 12         | UMANG       | 15       |
| MAR_M        | 13         | WIKI        | 16       |
| NEP_F        | 14         |             |          |
| PAN_F        | 15         |             |          |
| PAN_M        | 16         |             |          |
| SAN_M        | 17         |             |          |
| TAM_F        | 18         |             |          |
| TEL_F        | 19         |             |          |

---

## Citation

If you use this model in your research, please cite:

```bibtex
@article{ai4bharat_vits_rasa_13,
  title={VITS TTS for Indian Languages},
  author={Ashwin Sankar},
  year={2024},
  publisher={Hugging Face}
}
```
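
---

## Looking Up Speaker and Style IDs by Name

The usage example passes raw integers for `speaker_id` and `emotion_id`, which are easy to mistype. The sketch below is one possible convenience layer, not part of the model's API: the dictionaries are transcribed from the Speaker-Style Identifier Overview table, and `resolve_ids` is a hypothetical helper name introduced here for illustration.

```python
# Name-to-ID tables transcribed from the Speaker-Style Identifier Overview.
SPEAKER_IDS = {
    "ASM_F": 0, "ASM_M": 1, "BEN_F": 2, "BEN_M": 3, "BRX_F": 4,
    "BRX_M": 5, "DOI_F": 6, "DOI_M": 7, "KAN_F": 8, "KAN_M": 9,
    "MAI_M": 10, "MAL_F": 11, "MAR_F": 12, "MAR_M": 13, "NEP_F": 14,
    "PAN_F": 15, "PAN_M": 16, "SAN_M": 17, "TAM_F": 18, "TEL_F": 19,
}

# Style IDs are not contiguous (9, 11, and 13 are absent from the table).
STYLE_IDS = {
    "ALEXA": 0, "ANGER": 1, "BB": 2, "BOOK": 3, "CONV": 4, "DIGI": 5,
    "DISGUST": 6, "FEAR": 7, "HAPPY": 8, "NEWS": 10, "SAD": 12,
    "SURPRISE": 14, "UMANG": 15, "WIKI": 16,
}


def resolve_ids(speaker: str, style: str) -> tuple[int, int]:
    """Hypothetical helper: map names to (speaker_id, style_id).

    Names are matched case-insensitively. Unknown names raise ValueError
    with the list of valid options.
    """
    speaker_key = speaker.strip().upper()
    style_key = style.strip().upper()
    if speaker_key not in SPEAKER_IDS:
        raise ValueError(
            f"Unknown speaker {speaker!r}; expected one of {sorted(SPEAKER_IDS)}"
        )
    if style_key not in STYLE_IDS:
        raise ValueError(
            f"Unknown style {style!r}; expected one of {sorted(STYLE_IDS)}"
        )
    return SPEAKER_IDS[speaker_key], STYLE_IDS[style_key]


speaker_id, style_id = resolve_ids("PAN_M", "ALEXA")
print(speaker_id, style_id)  # 16 0
```

The resulting integers can then be passed to the model call as in the usage example, with `speaker_id=speaker_id` and `emotion_id=style_id`. The helper only checks that each name exists in the table; it makes no claim about which speaker and style combinations the model was trained on.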