akera commited on
Commit
ec66938
1 Parent(s): f3aa46c

Create README.md

  1. README.md +70 -0
README.md ADDED
@@ -0,0 +1,70 @@
+ ---
+ language: "en"
+ tags:
+ - text-to-speech
+ - TTS
+ - speech-synthesis
+ - Tacotron2
+ - speechbrain
+ license: "apache-2.0"
+ datasets:
+ - LJSpeech
+ metrics:
+ - mos
+ ---
+
+ # Sunbird AI Text-to-Speech (TTS) model trained on Luganda text
+
+ ### Text-to-Speech (TTS) with Tacotron2 trained on Male Common Voice Recordings
+
+ This repository provides the tools needed for Text-to-Speech (TTS) with SpeechBrain, using a [Tacotron2](https://arxiv.org/abs/1712.05884) model pretrained on [LJSpeech](https://keithito.com/LJ-Speech-Dataset/).
+
+ The pretrained model takes a short text as input and produces a spectrogram as output. To obtain the final waveform, apply a vocoder (e.g., HiFi-GAN) to the generated spectrogram.
+
+
+ ### Install SpeechBrain
+
+ ```
+ pip install speechbrain
+ ```
+
+ We encourage you to read the tutorials and learn more about
+ [SpeechBrain](https://speechbrain.github.io).
+
+ ### Perform Text-to-Speech (TTS)
+
+ ```
+ import torchaudio
+ # On SpeechBrain 1.0 and later, the same classes are also exposed as
+ # speechbrain.inference.TTS.Tacotron2 and speechbrain.inference.vocoders.HIFIGAN.
+ from speechbrain.pretrained import Tacotron2
+ from speechbrain.pretrained import HIFIGAN
+
+ # Initialize TTS (Tacotron2) and vocoder (HiFi-GAN)
+ tacotron2 = Tacotron2.from_hparams(source="speechbrain/tts-tacotron2-ljspeech", savedir="tmpdir_tts")
+ hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech", savedir="tmpdir_vocoder")
+
+ # Run the TTS model (text-to-spectrogram)
+ mel_output, mel_length, alignment = tacotron2.encode_text("Mary had a little lamb")
+
+ # Run the vocoder (spectrogram-to-waveform)
+ waveforms = hifi_gan.decode_batch(mel_output)
+
+ # Save the waveform (22050 Hz is the LJSpeech sample rate)
+ torchaudio.save('example_TTS.wav', waveforms.squeeze(1), 22050)
+ ```
+
+ To synthesize several sentences in a single call, pass a list of strings to `encode_batch`:
+
+ ```
+ from speechbrain.pretrained import Tacotron2
+
+ # Load the same Tacotron2 checkpoint as in the single-sentence example
+ tacotron2 = Tacotron2.from_hparams(source="speechbrain/tts-tacotron2-ljspeech", savedir="tmpdir_tts")
+ items = [
+     "A quick brown fox jumped over the lazy dog",
+     "How much wood would a woodchuck chuck?",
+     "Never odd or even"
+ ]
+ # Returns one spectrogram, length, and alignment per input sentence
+ mel_outputs, mel_lengths, alignments = tacotron2.encode_batch(items)
+ ```
+
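The batch example above stops at the spectrograms. As a minimal sketch of how the batch could be carried through the vocoder and written to disk, the following reuses the model identifiers from the single-sentence example. The helper names `output_paths` and `synthesize_batch`, the `tts_out` directory, and the file naming are hypothetical. The sketch also assumes that `decode_batch` returns a tensor of shape `[batch, 1, time]` and that the outputs come back in the same order as the input sentences, which may depend on the SpeechBrain version.

```python
from pathlib import Path


def output_paths(n_items, out_dir="tts_out", stem="example_TTS"):
    """Hypothetical helper: build one WAV path per synthesized sentence."""
    return [Path(out_dir) / f"{stem}_{i}.wav" for i in range(n_items)]


def synthesize_batch(items, out_dir="tts_out", sample_rate=22050):
    """Sketch: text batch -> spectrograms -> waveforms -> one WAV file each."""
    # Imported here so output_paths() can be used without SpeechBrain installed.
    import torchaudio
    from speechbrain.pretrained import Tacotron2, HIFIGAN

    tacotron2 = Tacotron2.from_hparams(
        source="speechbrain/tts-tacotron2-ljspeech", savedir="tmpdir_tts"
    )
    hifi_gan = HIFIGAN.from_hparams(
        source="speechbrain/tts-hifigan-ljspeech", savedir="tmpdir_vocoder"
    )

    mel_outputs, mel_lengths, alignments = tacotron2.encode_batch(items)
    # Assumption: waveforms has shape [batch, 1, time]; shorter sentences are
    # padded to the longest one, so their files may end with extra audio.
    waveforms = hifi_gan.decode_batch(mel_outputs)

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    paths = output_paths(len(items), out_dir)
    for path, waveform in zip(paths, waveforms):
        # Each waveform is [1, time], the (channels, frames) layout torchaudio expects.
        torchaudio.save(str(path), waveform.cpu(), sample_rate)
    return paths
```

Calling `synthesize_batch(items)` with the list from the example would then write `example_TTS_0.wav`, `example_TTS_1.wav`, and so on under `tts_out/`.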
+ ### Inference on GPU
+
+ To perform inference on the GPU, pass `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
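As an illustration of the `run_opts` argument, the sketch below loads both models on the GPU when one is available and falls back to the CPU otherwise. The helper names `build_run_opts` and `load_tts` are hypothetical, and the model identifiers are the ones used in the single-sentence example; only `run_opts={"device": ...}` comes from the text above.

```python
def build_run_opts(use_gpu):
    """Hypothetical helper: build the run_opts dict passed to from_hparams."""
    return {"device": "cuda"} if use_gpu else {"device": "cpu"}


def load_tts(prefer_gpu=True):
    """Sketch: load Tacotron2 and HiFi-GAN on the GPU if possible, else on the CPU."""
    # Imported here so build_run_opts() can be used without torch installed.
    import torch
    from speechbrain.pretrained import Tacotron2, HIFIGAN

    run_opts = build_run_opts(prefer_gpu and torch.cuda.is_available())
    tacotron2 = Tacotron2.from_hparams(
        source="speechbrain/tts-tacotron2-ljspeech",
        savedir="tmpdir_tts",
        run_opts=run_opts,
    )
    hifi_gan = HIFIGAN.from_hparams(
        source="speechbrain/tts-hifigan-ljspeech",
        savedir="tmpdir_vocoder",
        run_opts=run_opts,
    )
    return tacotron2, hifi_gan
```

When the models run on the GPU, the generated waveform tensor also lives there, so move it back with `.cpu()` before passing it to `torchaudio.save`.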