Update app.py
app.py CHANGED
@@ -186,16 +186,16 @@ with gr.Blocks() as demo:
     cache_examples=False, )
     gr.Markdown("""
     ### Details and Indications
-    This is a Text-to-Speech (TTS) system that consists of two modules: 1) a Tacotron2
-    the spectrogram of the speech corresponding to the input text. And 2) a pre-trained HiFiGAN vocoder that maps
-
+    This is a Text-to-Speech (TTS) system that consists of two modules: 1) a replicated Tacotron2 model, which generates
+    the spectrogram of the speech corresponding to the input text. And 2) a pre-trained HiFiGAN vocoder that maps
+    spectrograms to digital waveforms. Global Style Tokens (GST) have been implemented to capture style information from
     the female speaker with which the model has been trained (see the links below for more information).
     Please, feel free to play with the GST scores and observe how the synthetic voice speaks the input text.
     Keep in mind that GSTs have been trained in an unsupervised way, so there is no specific control of
     style attributes. Moreover, try to balance the GST scores by making them add up to a value close to 1. Below or
     higher than 1 may cause low energy, mispronunciations or distortion.
     You can choose between the trained HiFiGAN vocoder and the iterative Griffin-Lim algorithm, which does not need
-    to be trained
+    to be trained but produces a "robotic" effect.

     ### More Information
     Spectrogram generator has been adapted and trained from the
@@ -216,8 +216,4 @@ with gr.Blocks() as demo:
     <br>
     """)

-    """Instead of using multiple heads for the attention module, we just set one single
-    head for simplicity, ease control purposes, but also to observer whether this attention still
-    works with just one head."""
-
     demo.launch()
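The "Details and Indications" text in the diff above describes a two-stage architecture: a Tacotron2-style generator conditioned on Global Style Token (GST) weights produces a mel spectrogram, and a vocoder (the trained HiFiGAN network or untrained Griffin-Lim) turns that spectrogram into a waveform. The following is only a minimal sketch of that flow, not this Space's actual code: the model functions are stand-in stubs, and the renormalization is just one way to follow the "scores should add up to about 1" advice.

```python
# Minimal sketch (not this Space's code) of the two-stage TTS pipeline described above.
# The model functions are stand-in stubs so the sketch runs on its own.
import numpy as np

def tacotron2_with_gst(text: str, gst_scores: np.ndarray) -> np.ndarray:
    """Stub for stage 1: text + style-token weights -> mel spectrogram."""
    n_frames = max(len(text), 1) * 5
    return np.zeros((80, n_frames), dtype=np.float32)  # fake 80-band mel

def hifigan_vocoder(mel: np.ndarray) -> np.ndarray:
    """Stub for stage 2a: trained neural vocoder, mel -> waveform."""
    return np.zeros(mel.shape[1] * 256, dtype=np.float32)

def griffin_lim_vocoder(mel: np.ndarray) -> np.ndarray:
    """Stub for stage 2b: untrained iterative phase reconstruction ("robotic" sound)."""
    return np.zeros(mel.shape[1] * 256, dtype=np.float32)

def synthesize(text: str, gst_scores, vocoder: str = "HiFiGAN", sr: int = 22050):
    # The app text advises keeping the GST scores summing to roughly 1;
    # renormalizing them is one simple way to respect that.
    gst = np.asarray(gst_scores, dtype=np.float32)
    if gst.sum() > 0:
        gst = gst / gst.sum()
    mel = tacotron2_with_gst(text, gst)  # stage 1: text + style -> spectrogram
    wav = hifigan_vocoder(mel) if vocoder == "HiFiGAN" else griffin_lim_vocoder(mel)  # stage 2
    return sr, wav  # (sample_rate, waveform) is the tuple format gr.Audio accepts

print(synthesize("Hello world", [0.4, 0.3, 0.2, 0.1])[1].shape)
```

In the real Space, the stubs would be replaced by the trained Tacotron2/GST and HiFiGAN checkpoints referenced in the "More Information" section, and `synthesize` would be wired to the text box, GST sliders, and vocoder selector of the Gradio demo.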
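The docstring removed in the second hunk notes that the attention over the style tokens uses a single head rather than the multi-head attention of the original GST design, for simplicity and easier control. As a generic illustration of that idea (assumed dimensions and token count, not this repository's module), a single-head style-token attention layer can look like this:

```python
# Generic sketch of single-head attention over Global Style Tokens (GST).
# It illustrates the idea in the removed docstring; dimensions, token count,
# and the tanh on the tokens are assumptions, not this repository's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SingleHeadGSTAttention(nn.Module):
    def __init__(self, num_tokens=4, token_dim=256, query_dim=128):
        super().__init__()
        # Learnable style tokens act as keys/values; they are trained without style labels.
        self.tokens = nn.Parameter(torch.randn(num_tokens, token_dim) * 0.3)
        self.query_proj = nn.Linear(query_dim, token_dim)

    def forward(self, ref_embedding, weights=None):
        # ref_embedding: (batch, query_dim) summary of a reference utterance.
        # weights: optional user-supplied GST scores of shape (batch, num_tokens).
        if weights is None:
            q = self.query_proj(ref_embedding)                      # (batch, token_dim)
            scores = q @ self.tokens.t() / self.tokens.shape[1] ** 0.5
            weights = F.softmax(scores, dim=-1)                     # one attention head
        style_embedding = weights @ torch.tanh(self.tokens)         # weighted sum of tokens
        return style_embedding, weights

# At inference time, slider scores can be passed in place of the learned attention weights.
gst = SingleHeadGSTAttention()
scores = torch.tensor([[0.4, 0.3, 0.2, 0.1]])
style, w = gst(torch.zeros(1, 128), weights=scores)
print(style.shape, w)
```

With a single head, each slider in the demo maps to exactly one learned token, which matches the "ease of control" motivation stated in the removed comment.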