rjzevallos
committed on
Update app.py
app.py
CHANGED
@@ -18,15 +18,16 @@ METRICS_TAB_TEXT = """
 ## Metrics
 
 Models in the leaderboard are evaluated using several key metrics:
-* **UTMOS** (
+* **UTMOS** (UTokyo-SaruLab Mean Opinion Score),
 * **WER** (Word Error Rate),
 * **STOI** (Short-Time Objective Intelligibility),
 * **PESQ** (Perceptual Evaluation of Speech Quality).
 
-These metrics help evaluate both the accuracy and quality of the model
+These metrics help evaluate both the accuracy and quality of the model.
+
+### UTMOS (UTokyo-SaruLab Mean Opinion Score)[Paper](https://arxiv.org/abs/2204.02152)
+UTMOS is a MOS prediction system. **A higher UTMOS indicates better quality** of the generated voice.
 
-### UTMOS (User-TTS Mean Opinion Score)
-UTMOS is a subjective metric that evaluates the perceived quality of speech generated by a TTS system. **A higher UTMOS indicates better quality** of the generated voice.
 
 ### WER (Word Error Rate)
 WER is a common metric for evaluating speech recognition systems. It measures the percentage of words in the generated transcript that differ from the reference (correct) transcript. **A lower WER value indicates higher accuracy**.
@@ -44,17 +45,15 @@ The WER calculation is done as follows:
 WER = (S + I + D) / N = (1 + 0 + 1) / 6 = 0.333
 ```
 
-### STOI (Short-Time Objective Intelligibility)
+### STOI (Short-Time Objective Intelligibility)[Paper](https://ieeexplore.ieee.org/abstract/document/5495701?casa_token=PLtqLc8KNAgAAAAA:FOLuZ4dgMYsnGb1dQHgqVOouQzRJ3vA5yqj-sbwf8gs9Q-AIDCLkMZzAgzRrAogwwxULK9zsYeE)
 STOI measures the intelligibility of the synthesized speech signal compared to the original signal. **A higher STOI indicates better intelligibility**.
 
-### PESQ (Perceptual Evaluation of Speech Quality)
+### PESQ (Perceptual Evaluation of Speech Quality)[Paper](https://ieeexplore.ieee.org/abstract/document/941023?casa_token=jdtHy84_KhQAAAAA:qHN3WbT6cNdufj6OOn_fn0Je0RedMv-WJCmhQ_3CWy4nMTuDvFMF3KstAmKqLx5suQwdPgGByoY)
 PESQ is a perceptual metric that evaluates the quality of speech in a similar manner to how a human listener would. **A higher PESQ indicates better voice quality**.
 
-## How to Reproduce Our Results
-The ASR Leaderboard will continue as an effort to benchmark open-source TTS models based on the metrics mentioned above. To reproduce these results, check our [GitHub repository](https://github.com/huggingface/open_asr_leaderboard).
 
 ## Benchmark Datasets
-Model performance is evaluated using our test datasets. These datasets cover a variety of domains and acoustic conditions, ensuring a robust evaluation.
+Model performance is evaluated using [our test datasets](https://huggingface.co/spaces/rjzevallos/test_app/blob/main/bsc.txt). These datasets cover a variety of domains and acoustic conditions, ensuring a robust evaluation.
 """
 
 
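The metrics text in this diff keeps the worked WER arithmetic, WER = (S + I + D) / N. As a sanity check on that formula, here is a minimal sketch that computes WER from a word-level edit distance. The function name `word_error_rate` and the sentence pair are illustrative assumptions: the leaderboard's own example sentences and scoring code are not visible in this diff. The pair is chosen so that one substitution and one deletion over six reference words reproduce the 0.333 figure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + I + D) / N, where N is the number of reference words.

    Hypothetical helper for illustration; not taken from app.py.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr

    # The final cell is the minimum total S + I + D.
    return prev[-1] / len(ref)


# Assumed example: "sat" -> "sit" (S = 1), second "the" dropped (D = 1), N = 6.
reference = "the cat sat on the mat"
hypothesis = "the cat sit on mat"
print(f"WER = {word_error_rate(reference, hypothesis):.3f}")
```

The edit distance only guarantees the minimum total number of errors. Splitting that total into separate S, I and D counts would require backtracking through the table, which this sketch leaves out.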