AlexK-PL committed on
Commit
8e7d044
1 Parent(s): 6de61b1

Update app.py

Files changed (1)
  1. app.py +4 -8
app.py CHANGED
@@ -186,16 +186,16 @@ with gr.Blocks() as demo:
  cache_examples=False, )
  gr.Markdown("""
  ### Details and Indications
- This is a Text-to-Speech (TTS) system that consists of two modules: 1) a Tacotron2 replicated model, which generates
- the spectrogram of the speech corresponding to the input text. And 2) a pre-trained HiFiGAN vocoder that maps the
- spectrogram to a digital waveform. Global Style Tokens (GST) have been implemented to catch style information from
+ This is a Text-to-Speech (TTS) system that consists of two modules: 1) a replicated Tacotron2 model, which generates
+ the spectrogram of the speech corresponding to the input text. And 2) a pre-trained HiFiGAN vocoder that maps
+ spectrograms to a digital waveforms. Global Style Tokens (GST) have been implemented to catch style information from
  the female speaker with which the model has been trained (see the links below for more information).
  Please, feel free to play with the GST scores and observe how the synthetic voice spells the input text.
  Keep in mind that GSTs have been trained in an unsupervised way, so there is no specific control of
  style attributes. Moreover, try to balance the GST scores by making them add up to a value close to 1. Below or
  higher than 1 may cause low energy, mispronunciations or distortion.
  You can choose between the HiFiGAN trained vocoder and the iterative algorithm Griffin-Lim, which does not need
- to be trained, but produces a speech quite "robotic".
+ to be trained but produces a "robotic" effect.
 
  ### More Information
  Spectrogram generator has been adapted and trained from the
@@ -216,8 +216,4 @@ with gr.Blocks() as demo:
  <br>
  """)
 
- """Instead of using multiple heads for the attention module, we just set one single
- head for simplicity, ease control purposes, but also to observer whether this attention still
- works with just one head."""
-
  demo.launch()
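
Note on the GST guidance in the Markdown text above: it recommends keeping the GST scores close to a total of 1. This diff does not show how app.py handles the slider values, so the following is only a minimal sketch of one way to rescale them before synthesis. The helper name `normalize_gst_scores` and the example values are hypothetical and do not come from the repository.

```python
from typing import List, Sequence


def normalize_gst_scores(scores: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Rescale GST scores so they sum to 1 (hypothetical helper, not part of app.py)."""
    if not scores:
        raise ValueError("at least one GST score is required")
    total = sum(scores)
    if abs(total) < eps:
        # The scores sum to (almost) zero, so dividing would blow up:
        # fall back to equal weights instead.
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


# Example slider values (assumed for illustration); they sum to 2.0 before rescaling.
raw_scores = [0.8, 0.6, 0.4, 0.2]
balanced = normalize_gst_scores(raw_scores)
```

Dividing by the sum keeps the relative proportions of the tokens while bringing the total to 1. Whether this is preferable to letting users set the total freely is a design choice, since the text above says only that totals far from 1 may cause low energy, mispronunciations or distortion.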