jonatasgrosman committed
Commit fbecdcb
Parent(s): ded0380

update model

Files changed:
- README.md +21 -13
- config.json +1 -1
- pytorch_model.bin +2 -2
- vocab.json +1 -1
README.md CHANGED

@@ -2,6 +2,7 @@
 language: ar
 datasets:
 - common_voice
+- arabic_speech_corpus
 metrics:
 - wer
 - cer
@@ -24,15 +25,15 @@ model-index:
     metrics:
     - name: Test WER
       type: wer
-      value:
+      value: 39.59
     - name: Test CER
       type: cer
-      value: 18.
+      value: 18.18
 ---
 
 # Wav2Vec2-Large-XLSR-53-Arabic
 
-Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Arabic using the [Common Voice](https://huggingface.co/datasets/common_voice).
+Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Arabic using the [Common Voice](https://huggingface.co/datasets/common_voice) and [Arabic Speech Corpus](https://huggingface.co/datasets/arabic_speech_corpus).
 When using this model, make sure that your speech input is sampled at 16kHz.
 
 The script used for training can be found here: https://github.com/jonatasgrosman/wav2vec2-sprint
@@ -49,7 +50,7 @@ from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
 
 LANG_ID = "ar"
 MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-arabic"
-SAMPLES =
+SAMPLES = 10
 
 test_dataset = load_dataset("common_voice", LANG_ID, split=f"test[:{SAMPLES}]")
 
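The card's usage snippet appears only partially in this diff; for context on the `SAMPLES` change and the 16 kHz requirement mentioned above, here is a minimal transcription sketch. It assumes the standard `transformers` API and `librosa` for audio loading, and `"sample.wav"` is a placeholder path, not a file from the repo:

```python
# Minimal sketch (not the card's full script): transcribe one recording with the
# fine-tuned checkpoint. librosa resamples the file to the 16 kHz the model expects.
import librosa
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-arabic"

processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)

speech, sampling_rate = librosa.load("sample.wav", sr=16_000)  # placeholder path

inputs = processor(speech, sampling_rate=sampling_rate, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits

predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```

The card's own snippet instead pulls the first `SAMPLES = 10` clips of the Common Voice test split via `load_dataset`, as the hunk above shows.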
@@ -81,11 +82,16 @@ for i, predicted_sentence in enumerate(predicted_sentences):
 
 | Reference | Prediction |
 | ------------- | ------------- |
-| ألديك قلم ؟ |
-| ليست هناك مسافة على هذه الأرض أبعد من يوم أمس. | ليست
-| إنك تكبر المشكلة. | إنك تكبر المشكلة
-| يرغب أن يلتقي بك. | يرغب أن يلتقي بك
+| ألديك قلم ؟ | ألديك قلم |
+| ليست هناك مسافة على هذه الأرض أبعد من يوم أمس. | ليست نالك مسافة على هذه الأرض أبعد من يوم الأمس م |
+| إنك تكبر المشكلة. | إنك تكبر المشكلة |
+| يرغب أن يلتقي بك. | يرغب أن يلتقي بك |
 | إنهم لا يعرفون لماذا حتى. | إنهم لا يعرفون لماذا حتى |
+| سيسعدني مساعدتك أي وقت تحب. | سيسئدنيمساعدتك أي وقد تحب |
+| أَحَبُّ نظريّة علمية إليّ هي أن حلقات زحل مكونة بالكامل من الأمتعة المفقودة. | أحب نظرية علمية إلي هي أن حل قتزح المكوينا بالكامل من الأمت عن المفقودة |
+| سأشتري له قلماً. | سأشتري له قلما |
+| أين المشكلة ؟ | أين المشكل |
+| وَلِلَّهِ يَسْجُدُ مَا فِي السَّمَاوَاتِ وَمَا فِي الْأَرْضِ مِنْ دَابَّةٍ وَالْمَلَائِكَةُ وَهُمْ لَا يَسْتَكْبِرُونَ | ولله يسجد ما في السماوات وما في الأرض من دابة والملائكة وهم لا يستكبرون |
 
 ## Evaluation
 
@@ -102,9 +108,11 @@ LANG_ID = "ar"
 MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-arabic"
 DEVICE = "cuda"
 
-CHARS_TO_IGNORE = [",", "?", "¿", ".", "!", "¡", ";", ":", '""', "%", '"', "�", "ʿ", "·", "჻", "~", "՞",
-
-
+CHARS_TO_IGNORE = [",", "?", "¿", ".", "!", "¡", ";", ";", ":", '""', "%", '"', "�", "ʿ", "·", "჻", "~", "՞",
+                   "؟", "،", "।", "॥", "«", "»", "„", "“", "”", "「", "」", "‘", "’", "《", "》", "(", ")", "[", "]",
+                   "{", "}", "=", "`", "_", "+", "<", ">", "…", "–", "°", "´", "ʾ", "‹", "›", "©", "®", "—", "→", "。",
+                   "、", "﹂", "﹁", "‧", "~", "﹏", ",", "{", "}", "(", ")", "[", "]", "【", "】", "‥", "〽",
+                   "『", "』", "〝", "〟", "⟨", "⟩", "〜", ":", "!", "?", "♪", "؛", "/", "\\", "º", "−", "^", "'", "ʻ", "ˆ"]
 
 test_dataset = load_dataset("common_voice", LANG_ID, split="test")
 
@@ -152,11 +160,11 @@ print(f"CER: {cer.compute(predictions=predictions, references=references, chunk_
 
 **Test Result**:
 
-In the table below I report the Word Error Rate (WER) and the Character Error Rate (CER) of the model. I ran the evaluation script described above on other models as well (on 2021-
+In the table below I report the Word Error Rate (WER) and the Character Error Rate (CER) of the model. I ran the evaluation script described above on other models as well (on 2021-05-14). Note that the table below may show different results from those already reported; this may be due to specifics of the other evaluation scripts used.
 
 | Model | WER | CER |
 | ------------- | ------------- | ------------- |
-| jonatasgrosman/wav2vec2-large-xlsr-53-arabic | **
+| jonatasgrosman/wav2vec2-large-xlsr-53-arabic | **39.59%** | **18.18%** |
 | bakrianoo/sinai-voice-ar-stt | 45.30% | 21.84% |
 | othrif/wav2vec2-large-xlsr-arabic | 45.93% | 20.51% |
 | kmfoda/wav2vec2-large-xlsr-arabic | 54.14% | 26.07% |
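The evaluation hunks above show the punctuation list (`CHARS_TO_IGNORE`) growing and the headline scores changing to 39.59% WER / 18.18% CER. A small sketch of the scoring idea follows; it assumes the `evaluate` package (with `jiwer` installed) for the metrics, whereas the card's own script uses its own metric helpers, as the `chunk_size` argument in the hunk header suggests:

```python
# Sketch of the normalize-then-score step, not the card's exact script.
# Punctuation from CHARS_TO_IGNORE is stripped before WER/CER are computed.
import re
import evaluate  # pip install evaluate jiwer

CHARS_TO_IGNORE = [",", "?", "¿", ".", "!", "¡", ";", ":", "%", "�", "؟", "،"]  # abbreviated list
chars_to_ignore_regex = f"[{re.escape(''.join(CHARS_TO_IGNORE))}]"

def normalize(text: str) -> str:
    # drop the ignored characters and normalize case/whitespace
    return re.sub(chars_to_ignore_regex, "", text).lower().strip()

wer = evaluate.load("wer")
cer = evaluate.load("cer")

# one reference/prediction pair taken from the table above
references = [normalize("أين المشكلة ؟")]
predictions = [normalize("أين المشكل")]

print("WER:", wer.compute(predictions=predictions, references=references))
print("CER:", cer.compute(predictions=predictions, references=references))
```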
config.json CHANGED

@@ -72,5 +72,5 @@
   "num_hidden_layers": 24,
   "pad_token_id": 0,
   "transformers_version": "4.5.0.dev0",
-  "vocab_size":
+  "vocab_size": 51
 }
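The only functional change to config.json is the updated `vocab_size`, which should equal the number of entries in the rebuilt vocab.json (ids 0 through 50). A quick consistency check, my own snippet rather than anything in the repo, assuming `huggingface_hub` is installed:

```python
# Verify that config.json's vocab_size matches the token count in vocab.json.
import json
from huggingface_hub import hf_hub_download

REPO_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-arabic"

with open(hf_hub_download(REPO_ID, "config.json"), encoding="utf-8") as f:
    config = json.load(f)
with open(hf_hub_download(REPO_ID, "vocab.json"), encoding="utf-8") as f:
    vocab = json.load(f)

assert config["vocab_size"] == len(vocab) == 51
print("vocab_size:", config["vocab_size"])
```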
pytorch_model.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:a0b26f6d9d3edfde1784aef863c192a8cc1e438a23b45910ab648531ebe1857b
+size 1262142936
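pytorch_model.bin is tracked with Git LFS, so the diff only touches the pointer file: the new `oid` is the SHA-256 of the updated weights and `size` is their byte count. A sketch (mine, assuming the weights have already been downloaded to the current directory) of checking a local copy against the pointer:

```python
# Hash a local pytorch_model.bin and compare it with the oid/size in the LFS pointer.
import hashlib
import os

EXPECTED_OID = "a0b26f6d9d3edfde1784aef863c192a8cc1e438a23b45910ab648531ebe1857b"
EXPECTED_SIZE = 1262142936

path = "pytorch_model.bin"  # assumed local download
digest = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        digest.update(chunk)

print("size ok:", os.path.getsize(path) == EXPECTED_SIZE)
print("sha256 ok:", digest.hexdigest() == EXPECTED_OID)
```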
vocab.json CHANGED

@@ -1 +1 @@
-{"<pad>": 0, "<s>": 1, "</s>": 2, "<unk>": 3, "|": 4, "
+{"<pad>": 0, "<s>": 1, "</s>": 2, "<unk>": 3, "|": 4, "-": 5, "ء": 6, "آ": 7, "أ": 8, "ؤ": 9, "إ": 10, "ئ": 11, "ا": 12, "ب": 13, "ة": 14, "ت": 15, "ث": 16, "ج": 17, "ح": 18, "خ": 19, "د": 20, "ذ": 21, "ر": 22, "ز": 23, "س": 24, "ش": 25, "ص": 26, "ض": 27, "ط": 28, "ظ": 29, "ع": 30, "غ": 31, "ـ": 32, "ف": 33, "ق": 34, "ك": 35, "ل": 36, "م": 37, "ن": 38, "ه": 39, "و": 40, "ى": 41, "ي": 42, "ً": 43, "ٌ": 44, "ٍ": 45, "َ": 46, "ُ": 47, "ِ": 48, "ّ": 49, "ْ": 50}
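The rebuilt vocab.json is the 51-symbol CTC vocabulary: the four special tokens, `|` as the word delimiter, a hyphen, the Arabic letters, tatweel, and the diacritics. A sketch, assuming a local copy of the file shown above, of loading it straight into the CTC tokenizer:

```python
# Build the CTC tokenizer directly from the updated vocab.json.
from transformers import Wav2Vec2CTCTokenizer

tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json",                 # assumed local copy of the file in this commit
    unk_token="<unk>",
    pad_token="<pad>",
    word_delimiter_token="|",
)

print(tokenizer.vocab_size)                        # 51, matching config.json
print(tokenizer.convert_ids_to_tokens([12, 13]))   # the ids mapped to "ا" and "ب" above
```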