bakrianoo committed on
Commit
7bd1af0
1 Parent(s): 720a1a6

Set a definition for WER
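
For readers unfamiliar with the metric this commit defines: WER is conventionally computed as (substitutions + deletions + insertions) / (number of reference words), so lower is better and a value such as 0.238001 corresponds to 23.80%. The README itself computes it with `jiwer`. The snippet below is only a minimal pure-Python sketch of the same idea, and the function name `word_error_rate` is a hypothetical helper, not part of the repository or of `jiwer`.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration; the README relies on jiwer instead.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein dynamic programming).
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref_words)


# One dropped word out of six reference words.
print(f"WER: {word_error_rate('the cat sat on the mat', 'the cat sat on mat'):.2%}")
```

Because insertions count as errors while the denominator is the reference length, the score can exceed 1.0 when the hypothesis is much longer than the reference.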

Files changed (1)
  1. README.md +8 -4
README.md CHANGED
@@ -41,6 +41,8 @@ Please install:
 
  We evaluated the model against different Arabic-STT Wav2Vec models.
 
  | | Model | [using transliteration](https://pypi.org/project/lang-trans/) | WER | Training Datasets |
  |---:|:--------------------------------------|:---------------------|---------:|---------:|
  | 1 | bakrianoo/sinai-voice-ar-stt | True | 0.238001 |Common Voice 6|
@@ -80,8 +82,8 @@ resamplers = { # all three sampling rates exist in test split
  transformation = jiwer.Compose([
  # normalize some diacritics, remove punctuation, and replace Persian letters with Arabic ones
  jiwer.SubstituteRegexes({
- r'[auiFNKo\\\\\\\\\\\\\\\\~_،؟»\\\\\\\\\\\\\\\\?;:\\\\\\\\\\\\\\\\-,\\\\\\\\\\\\\\\\.؛«!"]': "", "\\\\\\\\\\\\\\\\u06D6": "",
- r"[\\\\\\\\\\\\\\\\|\\\\\\\\\\\\\\\\{]": "A", "p": "h", "ک": "k", "ی": "y"}),
  # default transformation below
  jiwer.RemoveMultipleSpaces(),
  jiwer.Strip(),
@@ -274,8 +276,8 @@ test_split = test_split.map(predict, batched=True, batch_size=16, remove_columns
  transformation = jiwer.Compose([
  # normalize some diacritics, remove punctuation, and replace Persian letters with Arabic ones
  jiwer.SubstituteRegexes({
- r'[auiFNKo\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\~_،؟»\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\?;:\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-,\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\.؛«!"]': "", "\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\u06D6": "",
- r"[\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\|\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\{]": "A", "p": "h", "ک": "k", "ی": "y"}),
  # default transformation below
  jiwer.RemoveMultipleSpaces(),
  jiwer.Strip(),
@@ -293,6 +295,8 @@ print(f"WER: {metrics['wer']:.2%}")
  ```
  **Test Result**: 23.80%
 
 
  ## Other Arabic Voice recognition Models
 
 
  We evaluated the model against different Arabic-STT Wav2Vec models.
 
+ [**WER**: Word Error Rate] The lower the score, the better the model.
+
  | | Model | [using transliteration](https://pypi.org/project/lang-trans/) | WER | Training Datasets |
  |---:|:--------------------------------------|:---------------------|---------:|---------:|
  | 1 | bakrianoo/sinai-voice-ar-stt | True | 0.238001 |Common Voice 6|
 
  transformation = jiwer.Compose([
  # normalize some diacritics, remove punctuation, and replace Persian letters with Arabic ones
  jiwer.SubstituteRegexes({
+ r'[auiFNKo\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\~_،؟»\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\?;:\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-,\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\.؛«!"]': "", "\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\u06D6": "",
+ r"[\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\|\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\{]": "A", "p": "h", "ک": "k", "ی": "y"}),
  # default transformation below
  jiwer.RemoveMultipleSpaces(),
  jiwer.Strip(),
 
  transformation = jiwer.Compose([
  # normalize some diacritics, remove punctuation, and replace Persian letters with Arabic ones
  jiwer.SubstituteRegexes({
+ r'[auiFNKo\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\~_،؟»\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\?;:\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-,\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\.؛«!"]': "", "\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\u06D6": "",
+ r"[\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\|\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\{]": "A", "p": "h", "ک": "k", "ی": "y"}),
  # default transformation below
  jiwer.RemoveMultipleSpaces(),
  jiwer.Strip(),
 
  ```
  **Test Result**: 23.80%
 
+ [**WER**: Word Error Rate] The lower the score, the better the model.
+
 
  ## Other Arabic Voice recognition Models