Krisshvamsi/TTS

@@ -28,28 +28,43 @@ The pre-trained model takes in input a short text and produces a spectrogram in
 ```
 pip install speechbrain
 ```
-### Perform Text-to-Speech (TTS)
 ```python
 import torchaudio
 from TTSModel import TTSModel
-from Models import *
 from speechbrain.inference.vocoders import HIFIGAN
 texts = ["This is a sample text for synthesis."]
 # Initialize TTS (Transformer) and Vocoder (HiFIGAN)
-my_tts_model = TTSModel.from_hparams(source="model_source_path")
 hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech", savedir="tmpdir_vocoder")
 # Running the TTS
-mel_output, mel_length = my_tts_model.encode_text(texts)
 # Running Vocoder (spectrogram-to-waveform)
 waveforms = hifi_gan.decode_batch(mel_output)
 # Save the waveform
 torchaudio.save('example_TTS.wav', waveforms.squeeze(1), 22050)
 ```
 To synthesize several sentences in one call, pass them as items in the list.
@@ -58,26 +73,7 @@ If you want to generate multiple sentences in one-shot, pass the sentences as it
 ### Inference on GPU
 To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
-### Training
-The model was trained with SpeechBrain.
-To train it from scratch follow these steps:
-1. Clone SpeechBrain:
-```bash
-git clone https://github.com/speechbrain/speechbrain/
-```
-2. Install it:
-```bash
-cd speechbrain
-pip install -r requirements.txt
-pip install -e .
-```
-3. Run Training:
-```bash
-cd recipes/LJSpeech/TTS/tacotron2/
-python train.py --device=cuda:0 --max_grad_norm=1.0 --data_folder=/your_folder/LJSpeech-1.1 hparams/train.yaml
-```
 ### Limitations
 The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.

 ```
 pip install speechbrain
 ```
+### Perform Text-to-Speech (TTS) - Running Inference
+To run model inference, pull the inference directory as shown in the cell below.
+Note: Run on a T4 GPU for faster inference.
+```
+!pip install --upgrade --no-cache-dir gdown
+!gdown 1oy8Y5zwkLel7diA63GNCD-6cfoBV4tq7
+!unzip inference.zip
+```
+```python
+%%capture
+!pip install speechbrain
+%cd inference
+```
 ```python
 import torchaudio
 from TTSModel import TTSModel
+from IPython.display import Audio
 from speechbrain.inference.vocoders import HIFIGAN
 texts = ["This is a sample text for synthesis."]
+model_source_path = "/content/inference"
 # Initialize TTS (Transformer) and Vocoder (HiFIGAN)
+my_tts_model = TTSModel.from_hparams(source=model_source_path)
 hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech", savedir="tmpdir_vocoder")
 # Running the TTS
+mel_output = my_tts_model.encode_text(texts)
 # Running Vocoder (spectrogram-to-waveform)
 waveforms = hifi_gan.decode_batch(mel_output)
 # Save the waveform
 torchaudio.save('example_TTS.wav', waveforms.squeeze(1), 22050)
+print("Saved the audio file!")
 ```
 To synthesize several sentences in one call, pass them as items in the list.
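
With more than one sentence in the list, `waveforms.squeeze(1)` has shape `[batch, time]`, which `torchaudio.save` would treat as a single multi-channel file. A minimal sketch of saving each sentence to its own file follows. It assumes `decode_batch` returns a tensor of shape `[batch, 1, time]`; the helper names `output_paths` and `save_batch` and the `tts_out` directory are hypothetical, not part of this repository.

```python
from pathlib import Path


def output_paths(texts, out_dir="tts_out", prefix="example_TTS"):
    """Return one .wav path per input sentence, numbered in input order."""
    return [Path(out_dir) / f"{prefix}_{i}.wav" for i in range(len(texts))]


def save_batch(waveforms, texts, sample_rate=22050, out_dir="tts_out"):
    """Save each item of a [batch, 1, time] waveform tensor to its own file.

    Assumption: `waveforms` comes from `hifi_gan.decode_batch(mel_output)`.
    """
    import torchaudio  # imported lazily so the path helper works without it

    paths = output_paths(texts, out_dir)
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for wav, path in zip(waveforms, paths):
        # Each `wav` has shape [1, time], i.e. (channels, samples).
        torchaudio.save(str(path), wav.cpu(), sample_rate)
    return paths


texts = ["This is the first sentence.", "This is the second one."]
print([p.name for p in output_paths(texts)])
# Hypothetical usage once `waveforms` has been computed for `texts`:
# save_batch(waveforms, texts)
```

Shorter sentences in a batch are likely padded to the length of the longest one, so their files may end with extra trailing audio that needs trimming.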
 ### Inference on GPU
 To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
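
As a small sketch of this option, the snippet below builds the `run_opts` dictionary and falls back to the CPU when CUDA is not available, so the same cell can run on either kind of runtime. The helper name `pick_run_opts` is hypothetical; the commented usage assumes the `TTSModel`, `HIFIGAN` and `model_source_path` names from the inference example.

```python
def pick_run_opts(prefer_gpu=True):
    """Build a `run_opts` dict, falling back to CPU when CUDA is unavailable."""
    try:
        import torch  # normally installed together with SpeechBrain

        has_cuda = torch.cuda.is_available()
    except ImportError:
        has_cuda = False
    device = "cuda" if (prefer_gpu and has_cuda) else "cpu"
    return {"device": device}


run_opts = pick_run_opts()
print(run_opts)

# Hypothetical usage with the objects from the inference example:
# my_tts_model = TTSModel.from_hparams(source=model_source_path, run_opts=run_opts)
# hifi_gan = HIFIGAN.from_hparams(
#     source="speechbrain/tts-hifigan-ljspeech",
#     savedir="tmpdir_vocoder",
#     run_opts=run_opts,
# )
```

When the models run on the GPU, the waveform tensor usually has to be moved back to the CPU (for example with `waveforms.cpu()`) before it is saved.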
+Note: To train the model, see the [TTS_Training_Inference](https://colab.research.google.com/drive/1VYu4kXdgpv7f742QGquA1G4ipD2Kg0kT?usp=sharing) notebook.
 ### Limitations
 The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.