Bangla TTS

This Bangla TTS model was trained on a single female speaker using the VITS TTS model, described in the paper "Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech" (reference 2). We used coqui-ai 🐸TTS, a text-to-speech toolkit, for both training and inference.

Contributions

  • We collected various Bangla datasets from the internet, including some data from the Mozilla Common Voice dataset, and trained the model on them.

  • We developed a Bangla VITS TTS (text-to-speech) system, which we trained and used for reading various Bangla
    text with a state-of-the-art (SOTA) Bangla neural voice.

Dataset

The Bangla Text-to-Speech (TTS) team at IIT Madras curated a Bangla speech corpus and processed it for research use. The audio has been downsampled to 22050 Hz, and the annotations have been converted from the original IITM style to the LJSpeech format. The dataset is released together with the weight files of the best-trained models. Please cite the corresponding paper (Paper Link) when using this dataset in your research. Dataset Link
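
The annotation conversion mentioned above can be sketched as follows. This is only an illustration, not the script used for this dataset: it assumes the IITM annotations are Festival-style lines of the form `( utt_id " transcript " )`, and the function name `festival_to_ljspeech` is hypothetical. The output rows follow the LJSpeech `metadata.csv` layout of `id|transcript|normalized transcript`.

```python
import re

# Assumed input format (Festival-style prompt line): ( utt_id " transcript " )
_LINE_RE = re.compile(r'^\(\s*(\S+)\s+"(.*)"\s*\)\s*$')


def festival_to_ljspeech(lines):
    """Convert Festival-style prompt lines to LJSpeech metadata rows."""
    rows = []
    for line in lines:
        match = _LINE_RE.match(line.strip())
        if match is None:
            continue  # skip blank or malformed lines
        utt_id = match.group(1)
        text = match.group(2).strip()
        # No text normalization is applied here, so both text columns are equal.
        rows.append(f"{utt_id}|{text}|{text}")
    return rows


if __name__ == "__main__":
    sample = ['( bn_0001 " আমি বাংলায় গান গাই " )']
    for row in festival_to_ljspeech(sample):
        print(row)
```

Each row would be written to `metadata.csv`, with the matching audio stored as `wavs/<utt_id>.wav` at 22050 Hz.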

Evaluation

Mean Opinion Score (MOS): 4.10 (MOS Calculation method)
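
For readers unfamiliar with the metric, a MOS is conventionally the arithmetic mean of listener ratings on a 1 to 5 scale. The sketch below shows that generic calculation with a normal-approximation 95% confidence interval. It is not necessarily the method linked above, the function name is hypothetical, and the ratings are made-up illustrative values, not data from this evaluation.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, 95% confidence half-width) for ratings on a 1-5 scale."""
    for rating in ratings:
        if not 1 <= rating <= 5:
            raise ValueError(f"Rating out of range: {rating}")
    mos = statistics.mean(ratings)
    if len(ratings) < 2:
        return mos, 0.0  # no spread can be estimated from a single rating
    # Normal approximation: 1.96 standard errors on each side of the mean.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


if __name__ == "__main__":
    # Illustrative ratings only.
    mos, half_width = mean_opinion_score([5, 4, 4, 3, 4])
    print(f"MOS = {mos:.2f} +/- {half_width:.2f}")
```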

Inference

For testing, please see the endpoint integration on GitHub.
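
For local inference, a minimal sketch using the 🐸TTS Python API might look like the following. It assumes the checkpoint and its config file have already been downloaded. The function name `synthesize` and all file paths are hypothetical placeholders, not files documented by this model card.

```python
from pathlib import Path


def synthesize(text, model_path, config_path, out_path="output.wav"):
    """Synthesize `text` to a WAV file from a locally downloaded checkpoint."""
    # Fail early with a clear message if the placeholder paths are wrong.
    for path in (model_path, config_path):
        if not Path(path).is_file():
            raise FileNotFoundError(f"Missing model file: {path}")

    # Imported lazily so the path check runs before the toolkit is loaded.
    from TTS.api import TTS

    tts = TTS(model_path=str(model_path), config_path=str(config_path))
    tts.tts_to_file(text=text, file_path=str(out_path))
    return out_path
```

A call such as `synthesize("আমি বাংলায় গান গাই", "checkpoint.pth", "config.json")` would then write `output.wav`, provided those two files exist and the `TTS` package is installed.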

References:

  1. https://aclanthology.org/2020.lrec-1.789.pdf
  2. https://arxiv.org/pdf/2106.06103.pdf
  3. https://arxiv.org/abs/2005.11129
  4. https://aclanthology.org/2020.emnlp-main.207.pdf
  5. https://github.com/mobassir94