---
datasets:
- mozilla-foundation/common_voice_17_0
language:
- lg
base_model:
- speechbrain/tts-tacotron2-ljspeech
pipeline_tag: text-to-speech
metrics:
- mos
---
# Text-to-Speech (TTS) with Tacotron2 trained on Luganda CommonVoice
This repository provides all the necessary tools for Text-to-Speech (TTS) with SpeechBrain.
The pre-trained model takes a short text as input and produces a spectrogram as output. The final waveform can be obtained by applying a vocoder (e.g., HiFi-GAN) to the generated spectrogram.
## Install SpeechBrain
```
pip install speechbrain
```
We encourage you to read the tutorials and learn more about
[SpeechBrain](https://speechbrain.github.io).
### Perform Text-to-Speech (TTS)
```python
import torchaudio
from speechbrain.inference.TTS import Tacotron2
from speechbrain.inference.vocoders import HIFIGAN
# Initialize TTS (Tacotron2) and Vocoder (HiFi-GAN)
tacotron2 = Tacotron2.from_hparams(source="sulaimank/tacotron2-cv-females", savedir="tmpdir_tts")
hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech", savedir="tmpdir_vocoder")
# Running the TTS
mel_output, mel_length, alignment = tacotron2.encode_text("Eddagala eryo lisigala mu nnyaanya okumala wiiki nga bbiri.")
# Running Vocoder (spectrogram-to-waveform)
waveforms = hifi_gan.decode_batch(mel_output)
# Save the waveform at the 22050 Hz sample rate
torchaudio.save("example_TTS.wav", waveforms.squeeze(1), 22050)
```
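Because the model is described as taking a short text, a longer passage may need to be split into sentences before synthesis. The helper below is a minimal sketch and not part of SpeechBrain: the name `split_into_sentences` is hypothetical, and the punctuation-based rule is an assumption that ignores abbreviations and other edge cases.

```python
import re


def split_into_sentences(text):
    """Split a passage into sentences (hypothetical helper, not a SpeechBrain API)."""
    # Split on whitespace that follows '.', '!' or '?', keeping the punctuation.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty strings, e.g. when the input is empty or only whitespace.
    return [part for part in parts if part]


passage = (
    "Eddagala eryo lisigala mu nnyaanya okumala wiiki nga bbiri. "
    "Never odd or even. How much wood would a woodchuck chuck?"
)
sentences = split_into_sentences(passage)
for sentence in sentences:
    print(sentence)
```

Each returned string can then be synthesized with `tacotron2.encode_text` as in the example above.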
To generate spectrograms for multiple sentences in a single batch, you can do the following:
```python
from speechbrain.inference.TTS import Tacotron2

# Load the Tacotron2 model from this repository
tacotron2 = Tacotron2.from_hparams(source="sulaimank/tacotron2-cv-females", savedir="tmpdir_tts")
items = [
    "A quick brown fox jumped over the lazy dog",
    "How much wood would a woodchuck chuck?",
    "Never odd or even"
]
# Encode all sentences in one batch; outputs are padded to the longest item
mel_outputs, mel_lengths, alignments = tacotron2.encode_batch(items)
```
### Limitations
The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.