jonatasgrosman committed on
Commit fbecdcb
1 Parent(s): ded0380

update model

Files changed (4)
  1. README.md +21 -13
  2. config.json +1 -1
  3. pytorch_model.bin +2 -2
  4. vocab.json +1 -1
README.md CHANGED
@@ -2,6 +2,7 @@
  language: ar
  datasets:
  - common_voice
+ - arabic_speech_corpus
  metrics:
  - wer
  - cer
@@ -24,15 +25,15 @@ model-index:
  metrics:
  - name: Test WER
  type: wer
- value: 40.52
+ value: 39.59
  - name: Test CER
  type: cer
- value: 18.37
+ value: 18.18
  ---

  # Wav2Vec2-Large-XLSR-53-Arabic

- Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Arabic using the [Common Voice](https://huggingface.co/datasets/common_voice).
+ Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Arabic using the [Common Voice](https://huggingface.co/datasets/common_voice) and [Arabic Speech Corpus](https://huggingface.co/datasets/arabic_speech_corpus).
  When using this model, make sure that your speech input is sampled at 16kHz.

  The script used for training can be found here: https://github.com/jonatasgrosman/wav2vec2-sprint
@@ -49,7 +50,7 @@ from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

  LANG_ID = "ar"
  MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-arabic"
- SAMPLES = 5
+ SAMPLES = 10

  test_dataset = load_dataset("common_voice", LANG_ID, split=f"test[:{SAMPLES}]")

@@ -81,11 +82,16 @@ for i, predicted_sentence in enumerate(predicted_sentences):

  | Reference | Prediction |
  | ------------- | ------------- |
- | ألديك قلم ؟ | ألديك قلم |
- | ليست هناك مسافة على هذه الأرض أبعد من يوم أمس. | ليست لنارك مسافة على هذه الأرض أبعد من يوم الأمس |
- | إنك تكبر المشكلة. | إنك تكبر المشكلة ك |
- | يرغب أن يلتقي بك. | يرغب أن يلتقي بك ن |
+ | ألديك قلم ؟ | ألديك قلم |
+ | ليست هناك مسافة على هذه الأرض أبعد من يوم أمس. | ليست نالك مسافة على هذه الأرض أبعد من يوم الأمس م |
+ | إنك تكبر المشكلة. | إنك تكبر المشكلة |
+ | يرغب أن يلتقي بك. | يرغب أن يلتقي بك |
  | إنهم لا يعرفون لماذا حتى. | إنهم لا يعرفون لماذا حتى |
+ | سيسعدني مساعدتك أي وقت تحب. | سيسئدنيمساعدتك أي وقد تحب |
+ | أَحَبُّ نظريّة علمية إليّ هي أن حلقات زحل مكونة بالكامل من الأمتعة المفقودة. | أحب نظرية علمية إلي هي أن حل قتزح المكوينا بالكامل من الأمت عن المفقودة |
+ | سأشتري له قلماً. | سأشتري له قلما |
+ | أين المشكلة ؟ | أين المشكل |
+ | وَلِلَّهِ يَسْجُدُ مَا فِي السَّمَاوَاتِ وَمَا فِي الْأَرْضِ مِنْ دَابَّةٍ وَالْمَلَائِكَةُ وَهُمْ لَا يَسْتَكْبِرُونَ | ولله يسجد ما في السماوات وما في الأرض من دابة والملائكة وهم لا يستكبرون |

  ## Evaluation

@@ -102,9 +108,11 @@ LANG_ID = "ar"
  MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-arabic"
  DEVICE = "cuda"

- CHARS_TO_IGNORE = [",", "?", "¿", ".", "!", "¡", ";", ":", '""', "%", '"', "�", "ʿ", "·", "჻", "~", "՞",
- "؟", "،", "।", "॥", "«", "»", "„", "“", "”", "「", "」", "‘", "’", "《", "》", "(", ")", "[", "]",
- "=", "`", "_", "+", "<", ">", "…", "–", "°", "´", "ʾ", "‹", "›", "©", "®", "—", "→", "。"]
+ CHARS_TO_IGNORE = [",", "?", "¿", ".", "!", "¡", ";", ";", ":", '""', "%", '"', "�", "ʿ", "·", "჻", "~", "՞",
+ "؟", "،", "।", "॥", "«", "»", "„", "“", "”", "「", "」", "‘", "’", "《", "》", "(", ")", "[", "]",
+ "{", "}", "=", "`", "_", "+", "<", ">", "…", "–", "°", "´", "ʾ", "‹", "›", "©", "®", "—", "→", "。",
+ "、", "﹂", "﹁", "‧", "~", "﹏", ",", "{", "}", "(", ")", "[", "]", "【", "】", "‥", "〽",
+ "『", "』", "〝", "〟", "⟨", "⟩", "〜", ":", "!", "?", "♪", "؛", "/", "\\", "º", "−", "^", "'", "ʻ", "ˆ"]

  test_dataset = load_dataset("common_voice", LANG_ID, split="test")

@@ -152,11 +160,11 @@ print(f"CER: {cer.compute(predictions=predictions, references=references, chunk_

  **Test Result**:

- In the table below I report the Word Error Rate (WER) and the Character Error Rate (CER) of the model. I ran the evaluation script described above on other models as well (on 2021-04-21). Note that the table below may show different results from those already reported, this may have been caused due to some specificity of the other evaluation scripts used.
+ In the table below I report the Word Error Rate (WER) and the Character Error Rate (CER) of the model. I ran the evaluation script described above on other models as well (on 2021-05-14). Note that the table below may show different results from those already reported, this may have been caused due to some specificity of the other evaluation scripts used.

  | Model | WER | CER |
  | ------------- | ------------- | ------------- |
- | jonatasgrosman/wav2vec2-large-xlsr-53-arabic | **40.52%** | **18.37%** |
+ | jonatasgrosman/wav2vec2-large-xlsr-53-arabic | **39.59%** | **18.18%** |
  | bakrianoo/sinai-voice-ar-stt | 45.30% | 21.84% |
  | othrif/wav2vec2-large-xlsr-arabic | 45.93% | 20.51% |
  | kmfoda/wav2vec2-large-xlsr-arabic | 54.14% | 26.07% |
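Note on the README diff: the hunks do not show how `CHARS_TO_IGNORE` is consumed by the evaluation script. A minimal sketch of one common approach follows, assuming the list is joined into a single regex character class and removed from each reference sentence before scoring. The name `normalize_reference`, the shortened list, and the whitespace collapsing are illustrative assumptions, not taken from the commit.

```python
import re

# Short illustrative subset of the CHARS_TO_IGNORE list in the diff (assumption:
# the real script uses the full list).
CHARS_TO_IGNORE = [",", "?", ".", "!", ";", ":", '"', "؟", "،", "؛", "«", "»",
                   "(", ")", "[", "]", "/", "\\", "^", "'"]

# re.escape protects characters such as "]", "\\" and "^" that would otherwise
# change the meaning of the character class.
IGNORE_PATTERN = re.compile(f"[{re.escape(''.join(CHARS_TO_IGNORE))}]")


def normalize_reference(sentence: str) -> str:
    """Remove ignored characters, then collapse runs of whitespace (hypothetical helper)."""
    stripped = IGNORE_PATTERN.sub("", sentence)
    return " ".join(stripped.split())


if __name__ == "__main__":
    # Reference sentences taken from the prediction table in the README diff.
    print(normalize_reference("أين المشكلة ؟"))
    print(normalize_reference("إنك تكبر المشكلة."))
```

With punctuation stripped this way, a reference such as `إنك تكبر المشكلة.` can match a prediction that carries no trailing period, which is why the list matters for the reported WER and CER.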
config.json CHANGED
@@ -72,5 +72,5 @@
  "num_hidden_layers": 24,
  "pad_token_id": 0,
  "transformers_version": "4.5.0.dev0",
- "vocab_size": 57
+ "vocab_size": 51
  }
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:4e4fd3ae7807254d9e98c4c08aa436ccf52a5fa88586b380a836d7d89c1a5621
- size 1262167512
+ oid sha256:a0b26f6d9d3edfde1784aef863c192a8cc1e438a23b45910ab648531ebe1857b
+ size 1262142936
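Note on the `pytorch_model.bin` change: the diff only touches a Git LFS pointer (an `oid sha256:` digest and a `size` in bytes), not the weights themselves. As a rough sketch, a downloaded file could be checked against such a pointer as below. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local path in the usage note is an assumption.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and digest named in the pointer."""
    pointer = parse_lfs_pointer(pointer_text)
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False  # cheap check first; avoids hashing a file of the wrong size
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Read in chunks so a file of over a gigabyte is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer content after this commit, copied from the diff.
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:a0b26f6d9d3edfde1784aef863c192a8cc1e438a23b45910ab648531ebe1857b
size 1262142936
"""
```

Usage would look like `matches_pointer("pytorch_model.bin", NEW_POINTER)`, assuming the weights were downloaded to that path.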
vocab.json CHANGED
@@ -1 +1 @@
- {"<pad>": 0, "<s>": 1, "</s>": 2, "<unk>": 3, "|": 4, "ّ": 5, "ٌ": 6, "-": 7, "ر": 8, "ض": 9, "آ": 10, "ط": 11, "ٰ": 12, "ؤ": 13, "و": 14, "ق": 15, "ـ": 16, "ة": 17, "ِ": 18, "د": 19, "ذ": 20, "ز": 21, "ظ": 22, "ل": 23, "س": 24, "ْ": 25, "ُ": 26, "ف": 27, "ب": 28, "ش": 29, "ء": 30, "ۖ": 31, "ه": 32, "ت": 33, "ي": 34, "ج": 35, "ا": 36, "إ": 37, "ئ": 38, "أ": 39, "ك": 40, "ٍ": 41, "ً": 42, "ث": 43, "غ": 44, "خ": 45, "ک": 46, "ى": 47, "ص": 48, "َ": 49, "ی": 50, "ھ": 51, "م": 52, "ع": 53, "ن": 54, "؛": 55, "ح": 56}
 
+ {"<pad>": 0, "<s>": 1, "</s>": 2, "<unk>": 3, "|": 4, "-": 5, "ء": 6, "آ": 7, "أ": 8, "ؤ": 9, "إ": 10, "ئ": 11, "ا": 12, "ب": 13, "ة": 14, "ت": 15, "ث": 16, "ج": 17, "ح": 18, "خ": 19, "د": 20, "ذ": 21, "ر": 22, "ز": 23, "س": 24, "ش": 25, "ص": 26, "ض": 27, "ط": 28, "ظ": 29, "ع": 30, "غ": 31, "ـ": 32, "ف": 33, "ق": 34, "ك": 35, "ل": 36, "م": 37, "ن": 38, "ه": 39, "و": 40, "ى": 41, "ي": 42, "ً": 43, "ٌ": 44, "ٍ": 45, "َ": 46, "ُ": 47, "ِ": 48, "ّ": 49, "ْ": 50}
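Note on `config.json` and `vocab.json`: the commit lowers `vocab_size` from 57 to 51, and the new vocabulary lists ids 0 through 50. A small sketch of how that consistency could be checked is below. The helpers `check_vocab` and `removed_tokens` are hypothetical and operate on already-loaded dictionaries; the toy vocabularies are for illustration only.

```python
def check_vocab(vocab: dict, vocab_size: int) -> list:
    """Return a list of problems; an empty list means vocab and config agree."""
    problems = []
    if len(vocab) != vocab_size:
        problems.append(f"vocab has {len(vocab)} entries, config says {vocab_size}")
    # Ids are expected to be exactly 0..n-1 with no gaps or duplicates.
    if sorted(vocab.values()) != list(range(len(vocab))):
        problems.append("token ids are not contiguous from 0")
    return problems


def removed_tokens(old_vocab: dict, new_vocab: dict) -> list:
    """Tokens present in the old vocabulary but missing from the new one."""
    return sorted(set(old_vocab) - set(new_vocab))


if __name__ == "__main__":
    # Toy vocabularies standing in for the old and new vocab.json contents.
    old = {"<pad>": 0, "|": 1, "ا": 2, "؛": 3}
    new = {"<pad>": 0, "|": 1, "ا": 2}
    print(check_vocab(new, 3))
    print(removed_tokens(old, new))
```

On the real files, the two dictionaries would come from `json.load` on the old and new `vocab.json`, and `vocab_size` from `config.json`.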