add about
src/texts.py: +34 -2
src/texts.py CHANGED
@@ -1,11 +1,43 @@
LLM_BENCHMARKS_TEXT = f"""
-
-
+# About
+
+As many recent Text-to-Speech (TTS) models have shown, synthetic audio can be close to real human speech.
+However, traditional evaluation methods for TTS systems need an update to keep pace with these new developments.
+Our TTSDS benchmark assesses the quality of synthetic speech by considering factors like prosody, speaker identity, and intelligibility.
+By comparing these factors with both real speech and noise datasets, we can better understand how synthetic speech stacks up.
+
+## More information
More details can be found in our paper [*TTSDS -- Text-to-Speech Distribution Score*](https://arxiv.org/abs/2407.12707).

## Reproducibility
To reproduce our results, check out our repository [here](https://github.com/ttsds/ttsds).

+## Credits
+
+
+This benchmark is inspired by [TTS Arena](https://huggingface.co/spaces/TTS-AGI/TTS-Arena), which instead focuses on the subjective evaluation of TTS models.
+Our benchmark would not be possible without the many open-source TTS models on Hugging Face and GitHub.
+Additionally, our benchmark uses the following datasets:
+- [LJSpeech](https://keithito.com/LJ-Speech-Dataset/)
+- [LibriTTS](https://www.openslr.org/60/)
+- [VCTK](https://datashare.ed.ac.uk/handle/10283/2950)
+- [Common Voice](https://commonvoice.mozilla.org/)
+- [ESC-50](https://github.com/karolpiczak/ESC-50)
+And the following metrics/representations/tools:
+- [Wav2Vec2](https://arxiv.org/abs/2006.11477)
+- [HuBERT](https://arxiv.org/abs/2106.07447)
+- [WavLM](https://arxiv.org/abs/2110.13900)
+- [PESQ](https://en.wikipedia.org/wiki/Perceptual_Evaluation_of_Speech_Quality)
+- [VoiceFixer](https://arxiv.org/abs/2204.05841)
+- [WADA SNR](https://www.cs.cmu.edu/~robust/Papers/KimSternIS08.pdf)
+- [Whisper](https://arxiv.org/abs/2212.04356)
+- [Masked Prosody Model](https://huggingface.co/cdminix/masked_prosody_model)
+- [PyWorld](https://github.com/JeremyCCHsu/Python-Wrapper-for-World-Vocoder)
+- [WeSpeaker](https://arxiv.org/abs/2210.17016)
+- [D-Vector](https://github.com/yistLin/dvector)
+
+Authors: Christoph Minixhofer, Ondřej Klejch, and Peter Bell
+of the University of Edinburgh.
"""

EVALUATION_QUEUE_TEXT = """
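For a concrete sense of what the About text means by comparing factor distributions with both real speech and noise datasets, here is a minimal sketch of the scoring idea. It is a simplified reading of the paper, not the TTSDS implementation: the helper `factor_score`, the 1-D Wasserstein distance, the normalization formula, and the Gaussian toy data are all illustrative assumptions, and the real benchmark aggregates many features per factor.

```python
# Sketch of the distribution-score idea (illustrative, not the TTSDS code):
# a synthetic feature distribution is scored by how much closer it sits to a
# real-speech reference than to a noise reference.
import numpy as np
from scipy.stats import wasserstein_distance

def factor_score(synthetic, real, noise):
    """Score one factor's feature values on a 0-100 scale (hypothetical helper).

    100 ~ the synthetic distribution matches real speech; 0 ~ it is no closer
    to real speech than to the noise reference. The exact normalization used
    by TTSDS is defined in the paper and repository; this is a simplification.
    """
    d_real = wasserstein_distance(synthetic, real)    # distance to real speech
    d_noise = wasserstein_distance(synthetic, noise)  # distance to noise data
    return 100.0 * d_noise / (d_real + d_noise)

# Toy per-utterance feature values; in the benchmark these would come from
# representations and tools like Wav2Vec2, PyWorld pitch, or WADA SNR.
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, 1000)
noise = rng.normal(5.0, 2.0, 1000)
synthetic = rng.normal(0.3, 1.1, 1000)

print(f"factor score: {factor_score(synthetic, real, noise):.1f}")
```

Averaging such scores over factors like prosody, speaker identity, and intelligibility would give an overall benchmark score; the repository linked under Reproducibility contains the actual pipeline.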