Spaces: AlexK-PL/Tacotron2_GST_eng


AlexK-PL committed on Sep 26, 2023

Commit 8e7d044 (1 parent: 6de61b1)

Update app.py


Files changed (1)

app.py +4 -8

app.py CHANGED

@@ -186,16 +186,16 @@ with gr.Blocks() as demo:
                         cache_examples=False, )
     gr.Markdown("""
     ### Details and Indications
-    This is a Text-to-Speech (TTS) system that consists of two modules: 1) a Tacotron2 replicated model, which generates
-    the spectrogram of the speech corresponding to the input text. And 2) a pre-trained HiFiGAN vocoder that maps the
-    spectrogram to a digital waveform. Global Style Tokens (GST) have been implemented to catch style information from
+    This is a Text-to-Speech (TTS) system that consists of two modules: 1) a replicated Tacotron2 model, which generates
+    the spectrogram of the speech corresponding to the input text. And 2) a pre-trained HiFiGAN vocoder that maps
+    spectrograms to a digital waveforms. Global Style Tokens (GST) have been implemented to catch style information from
     the female speaker with which the model has been trained (see the links below for more information).
     Please, feel free to play with the GST scores and observe how the synthetic voice spells the input text.
     Keep in mind that GSTs have been trained in an unsupervised way, so there is no specific control of
     style attributes. Moreover, try to balance the GST scores by making them add up to a value close to 1. Below or
     higher than 1 may cause low energy, mispronunciations or distortion.
     You can choose between the HiFiGAN trained vocoder and the iterative algorithm Griffin-Lim, which does not need
-    to be trained, but produces a speech quite "robotic".
+    to be trained but produces a "robotic" effect.
     ### More Information
     Spectrogram generator has been adapted and trained from the
@@ -216,8 +216,4 @@ with gr.Blocks() as demo:
     <br>
     """)
-    """Instead of using multiple heads for the attention module, we just set one single
-    head for simplicity, ease control purposes, but also to observer whether this attention still
-    works with just one head."""
 demo.launch()
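The "Details and Indications" text in this diff asks users to balance the GST scores so they add up to a value close to 1. The sketch below shows one way such balancing could be done before synthesis. It assumes the scores are non-negative weights that combine the learned style-token embeddings linearly; `normalize_gst_scores`, `style_embedding` and the toy tokens are hypothetical and are not taken from app.py.

```python
from typing import List, Sequence


def normalize_gst_scores(scores: Sequence[float]) -> List[float]:
    """Rescale non-negative GST scores so they sum to 1 (hypothetical helper)."""
    if not scores:
        raise ValueError("expected at least one GST score")
    if any(s < 0 for s in scores):
        raise ValueError("GST scores are assumed to be non-negative")
    total = sum(scores)
    if total == 0:
        # All sliders at zero: fall back to an even mix of every token.
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def style_embedding(weights: Sequence[float],
                    tokens: Sequence[Sequence[float]]) -> List[float]:
    """Weighted sum of style-token embeddings: sum_k weights[k] * tokens[k]."""
    if len(weights) != len(tokens):
        raise ValueError("need exactly one weight per style token")
    dim = len(tokens[0])
    out = [0.0] * dim
    for w, token in zip(weights, tokens):
        for i in range(dim):
            out[i] += w * token[i]
    return out


if __name__ == "__main__":
    raw = [1.0, 0.5, 0.5]                 # sums to 2.0, i.e. "higher than 1"
    weights = normalize_gst_scores(raw)
    print(weights)                        # [0.5, 0.25, 0.25]
    toy_tokens = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # made-up 2-d tokens
    print(style_embedding(weights, toy_tokens))        # [0.5, 0.25]
```

Rescaling keeps the relative mix of styles the user chose while removing the overall gain, which the indications warn can cause low energy, mispronunciations or distortion when it drifts away from 1.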
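The Griffin-Lim option mentioned in the indications needs no trained weights: it alternates between imposing the target magnitude and re-estimating the phase from the current waveform. Below is a minimal NumPy sketch, assuming a linear-magnitude STFT with a Hann window, `n_fft=256` and a hop of 64 samples (illustrative values, not the Space's settings). Tacotron2-style models typically output mel spectrograms, so in practice the mel frames would first have to be mapped back to linear frequency, a step not shown here.

```python
import numpy as np


def stft(x: np.ndarray, n_fft: int = 256, hop: int = 64) -> np.ndarray:
    """Hann-windowed STFT without padding; returns (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec: np.ndarray, n_fft: int = 256, hop: int = 64) -> np.ndarray:
    """Least-squares inverse STFT via weighted overlap-add."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    length = n_fft + hop * (spec.shape[0] - 1)
    signal = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        signal[i * hop:i * hop + n_fft] += frame * window
        norm[i * hop:i * hop + n_fft] += window ** 2
    nonzero = norm > 1e-8
    signal[nonzero] /= norm[nonzero]
    return signal


def griffin_lim(magnitude: np.ndarray, n_iter: int = 32, n_fft: int = 256,
                hop: int = 64, seed: int = 0) -> np.ndarray:
    """Estimate a waveform whose STFT magnitude approximates `magnitude`."""
    rng = np.random.default_rng(seed)
    # Start from random phases; nothing here is learned from data.
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    signal = istft(magnitude * phase, n_fft, hop)
    for _ in range(n_iter):
        # Keep the phase of the current estimate, re-impose the target magnitude.
        phase = np.exp(1j * np.angle(stft(signal, n_fft, hop)))
        signal = istft(magnitude * phase, n_fft, hop)
    return signal


if __name__ == "__main__":
    sr = 16000
    x = np.sin(2 * np.pi * 440 * np.arange(4096) / sr)  # toy 440 Hz tone
    target = np.abs(stft(x))
    y = griffin_lim(target, n_iter=50)
    err = np.linalg.norm(np.abs(stft(y)) - target) / np.linalg.norm(target)
    print(f"relative magnitude error: {err:.3f}")
```

In the original formulation each iteration does not increase the distance between the estimate's STFT and the target, but the phase is only ever approximated, which is one common explanation for the metallic or "robotic" quality of Griffin-Lim output compared with a neural vocoder such as HiFiGAN.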
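The docstring removed in this commit notes that the GST attention module uses one single head instead of multiple heads. Below is a minimal NumPy sketch of single-head scaled dot-product attention over a bank of style tokens, assuming the reference-encoder output acts as the query, the tokens act as both keys and values, and `w_q`/`w_k` are hypothetical learned projections; it is an illustration, not the Space's implementation.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()


def single_head_style_attention(ref_embedding, style_tokens, w_q, w_k):
    """One attention head over the style-token bank.

    ref_embedding: (d_ref,)           query, e.g. a reference-encoder output
    style_tokens:  (n_tokens, d_tok)  style tokens, used as keys and values
    w_q: (d_ref, d_attn), w_k: (d_tok, d_attn)  hypothetical projections
    Returns the per-token weights and the resulting style embedding.
    """
    query = ref_embedding @ w_q                      # (d_attn,)
    keys = style_tokens @ w_k                        # (n_tokens, d_attn)
    scores = keys @ query / np.sqrt(query.shape[0])  # scaled dot products
    weights = softmax(scores)                        # (n_tokens,), sums to 1
    style = weights @ style_tokens                   # (d_tok,)
    return weights, style


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = np.tanh(rng.normal(size=(10, 16)))      # 10 toy tokens of size 16
    w, s = single_head_style_attention(
        rng.normal(size=8), tokens, rng.normal(size=(8, 4)), rng.normal(size=(16, 4)))
    print(w.round(3), w.sum())
```

With a single head, each token gets exactly one scalar weight, which is convenient when exposing one score per token. Because softmax weights always total 1 during training, this may also be one reason manual GST scores behave best when they add up to roughly 1.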