Spaces: rjzevallos/test_app

rjzevallos committed on Nov 25, 2024

Commit bf10d7f · verified · 1 parent: 047ab5b

Update app.py

Files changed (1)

app.py +8 -9

@@ -18,15 +18,16 @@ METRICS_TAB_TEXT = """
 ## Metrics
 Models in the leaderboard are evaluated using several key metrics:
-* **UTMOS** (User-TTS Mean Opinion Score),
 * **WER** (Word Error Rate),
 * **STOI** (Short-Time Objective Intelligibility),
 * **PESQ** (Perceptual Evaluation of Speech Quality).
-These metrics help evaluate both the accuracy and quality of the model, as well as the inference speed.
-### UTMOS (User-TTS Mean Opinion Score)
-UTMOS is a subjective metric that evaluates the perceived quality of speech generated by a TTS system. **A higher UTMOS indicates better quality** of the generated voice.
 ### WER (Word Error Rate)
 WER is a common metric for evaluating speech recognition systems. It measures the percentage of words in the generated transcript that differ from the reference (correct) transcript. **A lower WER value indicates higher accuracy**.
@@ -44,17 +45,15 @@ The WER calculation is done as follows:
 WER = (S + I + D) / N = (1 + 0 + 1) / 6 = 0.333
 ```
-### STOI (Short-Time Objective Intelligibility)
 STOI measures the intelligibility of the synthesized speech signal compared to the original signal. **A higher STOI indicates better intelligibility**.
-### PESQ (Perceptual Evaluation of Speech Quality)
 PESQ is a perceptual metric that evaluates the quality of speech in a similar manner to how a human listener would. **A higher PESQ indicates better voice quality**.
-## How to Reproduce Our Results
-The ASR Leaderboard will continue as an effort to benchmark open-source TTS models based on the metrics mentioned above. To reproduce these results, check our [GitHub repository](https://github.com/huggingface/open_asr_leaderboard).
 ## Benchmark Datasets
-Model performance is evaluated using our test datasets. These datasets cover a variety of domains and acoustic conditions, ensuring a robust evaluation.
 """

@@ -18,15 +18,16 @@ METRICS_TAB_TEXT = """
 ## Metrics
 Models in the leaderboard are evaluated using several key metrics:
+* **UTMOS** (UTokyo-SaruLab Mean Opinion Score),
 * **WER** (Word Error Rate),
 * **STOI** (Short-Time Objective Intelligibility),
 * **PESQ** (Perceptual Evaluation of Speech Quality).
+These metrics help evaluate both the accuracy and quality of the model.
+### UTMOS (UTokyo-SaruLab Mean Opinion Score)[Paper](https://arxiv.org/abs/2204.02152)
+UTMOS is a MOS prediction system. **A higher UTMOS indicates better quality** of the generated voice.
 ### WER (Word Error Rate)
 WER is a common metric for evaluating speech recognition systems. It measures the percentage of words in the generated transcript that differ from the reference (correct) transcript. **A lower WER value indicates higher accuracy**.
@@ -44,17 +45,15 @@ The WER calculation is done as follows:
 WER = (S + I + D) / N = (1 + 0 + 1) / 6 = 0.333
 ```
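
The formula in the context line above, WER = (S + I + D) / N, can be illustrated with a small word-level edit-distance routine. This is a minimal sketch, not the leaderboard's actual implementation: the function names `wer_counts` and `wer` are hypothetical, and the two sentences are made-up examples chosen to give the same counts as the formula (1 substitution, 0 insertions, 1 deletion, 6 reference words), since the diff does not show the original example sentences.

```python
from typing import Tuple


def wer_counts(reference: str, hypothesis: str) -> Tuple[int, int, int, int]:
    """Return (S, I, D, N) for a word-level alignment of hypothesis to reference.

    Hypothetical helper for illustration only. Words are split on whitespace,
    with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()

    # dp[i][j] holds (cost, S, I, D) for aligning ref[:i] with hyp[:j].
    dp = [[(0, 0, 0, 0)] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, 0, i)  # only deletions
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, j, 0)  # only insertions

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost, s, ins, d = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (cost, s, ins, d)          # match, no cost
            else:
                diagonal = (cost + 1, s + 1, ins, d)  # substitution
            cost, s, ins, d = dp[i][j - 1]
            insertion = (cost + 1, s, ins + 1, d)
            cost, s, ins, d = dp[i - 1][j]
            deletion = (cost + 1, s, ins, d + 1)
            # Keep the cheapest of the three candidate alignments.
            dp[i][j] = min(diagonal, insertion, deletion, key=lambda t: t[0])

    _, subs, inserts, deletes = dp[len(ref)][len(hyp)]
    return subs, inserts, deletes, len(ref)


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (S + I + D) / N, where N is the reference word count."""
    subs, inserts, deletes, n = wer_counts(reference, hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")
    return (subs + inserts + deletes) / n


if __name__ == "__main__":
    # Made-up sentences: "sat" -> "sit" is a substitution, the second "the" is deleted.
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"
    s, i, d, n = wer_counts(reference, hypothesis)
    print(f"WER = ({s} + {i} + {d}) / {n} = {wer(reference, hypothesis):.3f}")
    # prints: WER = (1 + 0 + 1) / 6 = 0.333
```

Because N counts reference words only, a hypothesis with many insertions can push WER above 1.0.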
+### STOI (Short-Time Objective Intelligibility)[Paper](https://ieeexplore.ieee.org/abstract/document/5495701?casa_token=PLtqLc8KNAgAAAAA:FOLuZ4dgMYsnGb1dQHgqVOouQzRJ3vA5yqj-sbwf8gs9Q-AIDCLkMZzAgzRrAogwwxULK9zsYeE)
 STOI measures the intelligibility of the synthesized speech signal compared to the original signal. **A higher STOI indicates better intelligibility**.
+### PESQ (Perceptual Evaluation of Speech Quality)[Paper](https://ieeexplore.ieee.org/abstract/document/941023?casa_token=jdtHy84_KhQAAAAA:qHN3WbT6cNdufj6OOn_fn0Je0RedMv-WJCmhQ_3CWy4nMTuDvFMF3KstAmKqLx5suQwdPgGByoY)
 PESQ is a perceptual metric that evaluates the quality of speech in a similar manner to how a human listener would. **A higher PESQ indicates better voice quality**.
 ## Benchmark Datasets
+Model performance is evaluated using [our test datasets](https://huggingface.co/spaces/rjzevallos/test_app/blob/main/bsc.txt). These datasets cover a variety of domains and acoustic conditions, ensuring a robust evaluation.
 """