comodoro/wav2vec2-xls-r-300m-sk-cv8

Automatic Speech Recognition

mozilla-foundation/common_voice_8_0

robust-speech-event

xlsr-fine-tuning-week

hf-asr-leaderboard

comodoro committed on Feb 1, 2022

Commit 27bf67c · 1 Parent(s): d3ffde3

Fix eval script and readme

Files changed (2)

README.md +3 -3
eval.py +1 -1

README.md CHANGED

@@ -22,10 +22,10 @@ model-index:
     metrics:
        - name: Test WER
          type: wer
-         value: 55.2
+         value: 59.5
        - name: Test CER
          type: cer
-         value: 14.4
+         value: 15.6
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
@@ -77,7 +77,7 @@ print("Reference:", test_dataset[:2]["sentence"])
 The model can be evaluated using the attached `eval.py` script:
 ```
-python eval.py --model_id comodoro/wav2vec2-xls-r-300m-sk-cv8 --dataset mozilla-foundation/common-voice_8_0 --split test --config sk
+python eval.py --model_id comodoro/wav2vec2-xls-r-300m-sk-cv8 --dataset mozilla-foundation/common_voice_8_0 --split test --config sk
 ```
 ## Training and evaluation data
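
The README hunk above changes the reported Test WER and Test CER values. As a rough illustration of what those two numbers measure, here is a minimal pure-Python sketch of word and character error rate based on edit distance. The helper names (`edit_distance`, `wer`, `cer`) are illustrative assumptions; the diff does not show which metric implementation `eval.py` actually uses, and a real evaluation would typically aggregate errors over the whole test split rather than score one sentence.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent; assumes a non-empty reference."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent; assumes a non-empty reference."""
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)
```

Because both metrics compare the prediction against the normalized reference text, a change in how `eval.py` normalizes text can shift the scores even when the model itself is unchanged.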

eval.py CHANGED

@@ -91,7 +91,7 @@ def normalize_text(text: str) -> str:
     text = unicodedata.normalize('NFKC', text)
     # remove punctuation
     text = re.sub(chars_to_ignore_regex, "", text)
-    batch["sentence"] = replace_chars(batch['sentence'])
+    text = replace_chars(text)
     # Let's also make sure we split on all kinds of newlines, spaces, etc...
     text = " ".join(text.split())
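
To make the effect of the `eval.py` change easier to see, here is a self-contained sketch of the fixed function. The removed line wrote to `batch["sentence"]`, but `normalize_text` only receives `text`, so the character replacement was never applied to the string being normalized. The bodies of `chars_to_ignore_regex` and `replace_chars` below are hypothetical stand-ins, since their real definitions are not part of this diff; the final `return` is also an assumption based on the `-> str` signature.

```python
import re
import unicodedata

# Hypothetical stand-in: the real pattern is defined elsewhere in eval.py.
chars_to_ignore_regex = r'[,?.!;:"“”„]'


def replace_chars(text: str) -> str:
    # Hypothetical stand-in: the real mapping is defined elsewhere in eval.py.
    return text.replace("’", "'")


def normalize_text(text: str) -> str:
    text = unicodedata.normalize("NFKC", text)
    # remove punctuation
    text = re.sub(chars_to_ignore_regex, "", text)
    # the fix: operate on the local `text`, not on an out-of-scope `batch`
    text = replace_chars(text)
    # collapse newlines, tabs and repeated spaces into single spaces
    text = " ".join(text.split())
    return text  # assumed from the `-> str` annotation
```

With these stand-ins, `normalize_text("Ahoj,  svet!\n")` yields `"Ahoj svet"`.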