Text-to-Speech (TTS) with VITS trained on Kiswahili and Luganda Common Voice

This repository provides the tools needed for Text-to-Speech (TTS) with Coqui TTS, using a VITS model fine-tuned on Kiswahili and Luganda Common Voice v13 data from six speakers with similar intonation.

The pre-trained model takes text as input and produces an audio waveform as output.

How to Synthesize Speech using our models

First, install the Coqui TTS package:

pip install TTS
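
To confirm that the installation succeeded before loading a model, a quick check along the following lines may help. This is only a sketch: `is_package_installed` is a hypothetical helper name, and it assumes the package is importable under the module name `TTS`.

```python
import importlib.util


def is_package_installed(module_name: str) -> bool:
    """Return True if a top-level module with this name is on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_package_installed("TTS"):
        print("TTS is installed.")
    else:
        print("TTS is not installed; run `pip install TTS` first.")
```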

Perform Text-to-Speech (TTS)

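from TTS.utils.synthesizer import Synthesizer

# Load the fine-tuned VITS checkpoint and its configuration file.
# VITS generates waveforms end to end, so no separate vocoder is passed;
# all other arguments keep their defaults.
synthesizer = Synthesizer(
    tts_checkpoint="<model checkpoint path>",
    tts_config_path="<model configuration file>",
)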

sentence_to_synthesize = "Your Kiswahili or Luganda sentence here"
if sentence_to_synthesize:
    print(sentence_to_synthesize)
    # Generate the waveform; speaker, language, and reference audio are left unset.
    wav = synthesizer.tts(sentence_to_synthesize)
    location = "output.wav"  # Choose a desired name for the output file
    synthesizer.save_wav(wav, location)
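
To synthesize several sentences in one go, the same two calls can be wrapped in a small loop. The following is a minimal sketch, not part of the released code: `synthesize_sentences`, the `outputs` directory, and the `sentence_NNN.wav` naming scheme are all assumptions made for illustration. The helper accepts any object that exposes `tts` and `save_wav`, such as the `synthesizer` created above.

```python
from pathlib import Path


def synthesize_sentences(synthesizer, sentences, output_dir="outputs"):
    """Synthesize each non-empty sentence and save it as a numbered WAV file.

    `synthesizer` is any object exposing `tts(text)` and `save_wav(wav, path)`.
    Returns the list of paths that were written, in input order.
    """
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    saved_paths = []
    for sentence in sentences:
        text = sentence.strip()
        if not text:
            continue  # skip blank entries instead of synthesizing them
        # Number files by how many sentences have been saved so far.
        path = out_dir / f"sentence_{len(saved_paths) + 1:03d}.wav"
        wav = synthesizer.tts(text)
        synthesizer.save_wav(wav, str(path))
        saved_paths.append(path)
    return saved_paths
```

Calling `synthesize_sentences(synthesizer, ["<first sentence>", "<second sentence>"])` would then write `outputs/sentence_001.wav` and `outputs/sentence_002.wav`.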

Limitations

We do not provide any warranty on the performance achieved by this model when used on other datasets.

Citing

Please cite our work if you use our models in your research or business.

@inproceedings{buildingTTS,
  title={Building a Luganda Text-to-Speech Model from Crowdsourced Data},
  author={Kagumire, Sulaiman and Katumba, Andrew and Nakatumba-Nabende, Joyce and Quinn, John},
  booktitle={5th Workshop on African Natural Language Processing},
  year={2024}
}
