rjzevallos committed · verified
Commit bf10d7f · 1 Parent(s): 047ab5b

Update app.py

Files changed (1): app.py +8 -9
app.py CHANGED
@@ -18,15 +18,16 @@ METRICS_TAB_TEXT = """
 ## Metrics
 
 Models in the leaderboard are evaluated using several key metrics:
-* **UTMOS** (User-TTS Mean Opinion Score),
+* **UTMOS** (UTokyo-SaruLab Mean Opinion Score),
 * **WER** (Word Error Rate),
 * **STOI** (Short-Time Objective Intelligibility),
 * **PESQ** (Perceptual Evaluation of Speech Quality).
 
-These metrics help evaluate both the accuracy and quality of the model, as well as the inference speed.
+These metrics help evaluate both the accuracy and quality of the model.
+
 
-### UTMOS (User-TTS Mean Opinion Score)
-UTMOS is a subjective metric that evaluates the perceived quality of speech generated by a TTS system. **A higher UTMOS indicates better quality** of the generated voice.
+### UTMOS (UTokyo-SaruLab Mean Opinion Score) [Paper](https://arxiv.org/abs/2204.02152)
+UTMOS is an automatic MOS prediction system. **A higher UTMOS indicates better quality** of the generated voice.
 
 ### WER (Word Error Rate)
 WER is a common metric for evaluating speech recognition systems. It measures the percentage of words in the generated transcript that differ from the reference (correct) transcript. **A lower WER value indicates higher accuracy**.
@@ -44,17 +45,15 @@ The WER calculation is done as follows:
 WER = (S + I + D) / N = (1 + 0 + 1) / 6 = 0.333
 ```
 
-### STOI (Short-Time Objective Intelligibility)
+### STOI (Short-Time Objective Intelligibility) [Paper](https://ieeexplore.ieee.org/abstract/document/5495701?casa_token=PLtqLc8KNAgAAAAA:FOLuZ4dgMYsnGb1dQHgqVOouQzRJ3vA5yqj-sbwf8gs9Q-AIDCLkMZzAgzRrAogwwxULK9zsYeE)
 STOI measures the intelligibility of the synthesized speech signal compared to the original signal. **A higher STOI indicates better intelligibility**.
 
-### PESQ (Perceptual Evaluation of Speech Quality)
+### PESQ (Perceptual Evaluation of Speech Quality) [Paper](https://ieeexplore.ieee.org/abstract/document/941023?casa_token=jdtHy84_KhQAAAAA:qHN3WbT6cNdufj6OOn_fn0Je0RedMv-WJCmhQ_3CWy4nMTuDvFMF3KstAmKqLx5suQwdPgGByoY)
 PESQ is a perceptual metric that evaluates the quality of speech in a similar manner to how a human listener would. **A higher PESQ indicates better voice quality**.
 
-## How to Reproduce Our Results
-The ASR Leaderboard will continue as an effort to benchmark open-source TTS models based on the metrics mentioned above. To reproduce these results, check our [GitHub repository](https://github.com/huggingface/open_asr_leaderboard).
-
 ## Benchmark Datasets
-Model performance is evaluated using our test datasets. These datasets cover a variety of domains and acoustic conditions, ensuring a robust evaluation.
+Model performance is evaluated using [our test datasets](https://huggingface.co/spaces/rjzevallos/test_app/blob/main/bsc.txt). These datasets cover a variety of domains and acoustic conditions, ensuring a robust evaluation.
 """
 
 
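To make the metrics in the updated text easier to try out, the sketches below show one way to compute each of them in Python. None of this code is part of the commit, and the library choices, model entry points, and file paths are assumptions rather than the leaderboard's actual evaluation pipeline.

For UTMOS, one publicly available option is the community SpeechMOS packaging of the UTMOS22 predictor, loaded through torch.hub:

```python
# Hedged sketch: scoring synthesized speech with a UTMOS predictor.
# The torch.hub repo and entry point below are the community SpeechMOS
# packaging of UTMOS22 -- an assumption, not the leaderboard's own code.
import torch
import librosa

predictor = torch.hub.load("tarepan/SpeechMOS:v1.2.0",
                           "utmos22_strong", trust_repo=True)

# "synthesized.wav" is a hypothetical path to a TTS output file
wave, sr = librosa.load("synthesized.wav", sr=None, mono=True)
with torch.no_grad():
    score = predictor(torch.from_numpy(wave).unsqueeze(0), sr)
print(f"UTMOS: {score.item():.2f}")  # roughly 1 (poor) to 5 (excellent)
```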
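The WER formula can be reproduced in a few self-contained lines. The sentences below are invented so the counts match the S = 1, I = 0, D = 1, N = 6 example in the text:

```python
# Minimal sketch of WER via word-level Levenshtein (edit) distance.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                            # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                            # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + sub)   # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)        # (S + I + D) / N

# One substitution ("fox" -> "dog") and one deletion ("high"), N = 6:
print(word_error_rate("the quick brown fox jumps high",
                      "the quick brown dog jumps"))  # 2 / 6 = 0.333...
```

In practice a dedicated library such as jiwer is typically used instead of hand-rolling the alignment.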
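For STOI, the pystoi package is one option (the package and file paths are assumptions; both signals must share a sample rate and be roughly time-aligned):

```python
# Hedged sketch: STOI between reference audio and synthesized audio.
import soundfile as sf
from pystoi import stoi

clean, fs = sf.read("reference.wav")      # hypothetical paths
synth, _ = sf.read("synthesized.wav")

score = stoi(clean, synth, fs, extended=False)
print(f"STOI: {score:.3f}")  # 0..1, higher means more intelligible
```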
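PESQ works much the same way, here via the `pesq` package, a reimplementation of ITU-T P.862 (again an assumption). Note that PESQ only accepts 8 kHz (narrowband) or 16 kHz (wideband) input:

```python
# Hedged sketch: wideband PESQ between reference and synthesized audio.
import soundfile as sf
from pesq import pesq

ref, fs = sf.read("reference.wav")        # hypothetical paths; fs must be
deg, _ = sf.read("synthesized.wav")       # 8000 ("nb") or 16000 ("wb")

score = pesq(fs, ref, deg, mode="wb")
print(f"PESQ: {score:.2f}")  # roughly -0.5 to 4.5, higher is better
```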