CEHB committed on
Commit
e970b40
1 Parent(s): 3887b31

Update app.py

  1. app.py +11 -12
app.py CHANGED
@@ -59,24 +59,24 @@ def predict(text, speaker):
  return (16000, speech)
 
 
- title = "SpeechT5: Speech Synthesis"
+ title = "SpeechT5 fine-tuned for Swedish TTS"
 
  description = """
- The <b>SpeechT5</b> model is pre-trained on text as well as speech inputs, with targets that are also a mix of text and speech.
- By pre-training on text and speech at the same time, it learns unified representations for both, resulting in improved modeling capabilities.
- SpeechT5 can be fine-tuned for different speech tasks. This space demonstrates the <b>text-to-speech</b> (TTS) checkpoint for the English language.
- See also the <a href="https://huggingface.co/spaces/Matthijs/speecht5-asr-demo">speech recognition (ASR) demo</a>
- and the <a href="https://huggingface.co/spaces/Matthijs/speecht5-vc-demo">voice conversion demo</a>.
- Refer to <a href="https://colab.research.google.com/drive/1i7I5pzBcU3WDFarDnzweIj4-sVVoIUFJ">this Colab notebook</a> to learn how to fine-tune the SpeechT5 TTS model on your own dataset or language.
- <b>How to use:</b> Enter some English text and choose a speaker. The output is a mel spectrogram, which is converted to a mono 16 kHz waveform by the
- HiFi-GAN vocoder. Because the model always applies random dropout, each attempt will give slightly different results.
- The <em>Surprise Me!</em> option creates a completely randomized speaker.
+ SpeechT5 text-to-speech model fine-tuned on the Swedish language from the
+ Common Voice dataset. Inference runs on a basic CPU (2 vCPU, 16 GB RAM), so
+ please have patience if it takes some time. As a company founded by a female
+ coder, our resources are extremely limited (female founders in tech only get approx.
+ 1% of the venture capital, and the women who receive funding are seldom the
+ ones actually handling the tech). We are in a very biased sphere where
+ female coders' companies seldom get the resources which would normally
+ be necessary to do what they do. The app uses the SpeechT5 model
+ fine-tuned for Swedish by GreenCounsel, available here: [https://huggingface.co/GreenCounsel/speecht5_tts_common_voice_5_sv](https://huggingface.co/GreenCounsel/speecht5_tts_common_voice_5_sv).
  """
 
  article = """
  <div style='margin:20px auto;'>
  <p>References: <a href="https://arxiv.org/abs/2110.07205">SpeechT5 paper</a> |
- <a href="https://github.com/microsoft/SpeechT5/">original GitHub</a> |
+ <a href="https://github.com/microsoft/SpeechT5/">original SpeechT5</a> |
  <a href="https://huggingface.co/mechanicalsea/speecht5-tts">original weights</a></p>
  <pre>
  @article{Ao2021SpeechT5,
@@ -88,7 +88,6 @@ article = """
  year={2021}
  }
  </pre>
- <p>Speaker embeddings were generated from <a href="http://www.festvox.org/cmu_arctic/">CMU ARCTIC</a> using <a href="https://huggingface.co/mechanicalsea/speecht5-vc/blob/main/manifest/utils/prep_cmu_arctic_spkemb.py">this script</a>.</p>
  </div>
  """
 
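
The hunk context shows `predict` returning `(16000, speech)`, which reads as a sample rate paired with a waveform. As a rough illustration of what that pair represents, the sketch below writes such samples to a mono 16-bit WAV using only the Python standard library. It assumes `speech` is a sequence of floats in the range [-1.0, 1.0]; the diff does not show the actual dtype, so this is a guess. The helpers `floats_to_pcm16` and `write_wav` and the 440 Hz test tone are hypothetical and are not part of app.py.

```python
import io
import math
import struct
import wave

# Assumed to match the first element of the tuple returned by predict().
SAMPLE_RATE = 16000


def floats_to_pcm16(samples):
    """Clip floats to [-1.0, 1.0] and pack them as little-endian 16-bit PCM."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack("<%dh" % len(ints), *ints)


def write_wav(target, samples, sample_rate=SAMPLE_RATE):
    """Write mono 16-bit audio to a file path or a binary file-like object."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(floats_to_pcm16(samples))


# Stand-in for model output (assumption): one second of a 440 Hz tone.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE)]
buffer = io.BytesIO()
write_wav(buffer, tone)
```

Passing a file path instead of the in-memory buffer would save the audio to disk, which can be handy for checking the Swedish output outside the web UI.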