jonatasgrosman committed
Commit fbecdcb
Parent(s): ded0380

update model

Files changed:
- README.md +21 -13
- config.json +1 -1
- pytorch_model.bin +2 -2
- vocab.json +1 -1
README.md CHANGED

@@ -2,6 +2,7 @@
 language: ar
 datasets:
 - common_voice
+- arabic_speech_corpus
 metrics:
 - wer
 - cer
@@ -24,15 +25,15 @@ model-index:
     metrics:
     - name: Test WER
       type: wer
-      value:
+      value: 39.59
     - name: Test CER
       type: cer
-      value: 18.
+      value: 18.18
 ---
 
 # Wav2Vec2-Large-XLSR-53-Arabic
 
-Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Arabic using the [Common Voice](https://huggingface.co/datasets/common_voice).
+Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Arabic using the [Common Voice](https://huggingface.co/datasets/common_voice) and [Arabic Speech Corpus](https://huggingface.co/datasets/arabic_speech_corpus).
 When using this model, make sure that your speech input is sampled at 16kHz.
 
 The script used for training can be found here: https://github.com/jonatasgrosman/wav2vec2-sprint
@@ -49,7 +50,7 @@ from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
 
 LANG_ID = "ar"
 MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-arabic"
-SAMPLES =
+SAMPLES = 10
 
 test_dataset = load_dataset("common_voice", LANG_ID, split=f"test[:{SAMPLES}]")
 
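The card's usage snippet appears only partially in this diff; for context on the `SAMPLES` change and the 16 kHz requirement mentioned above, here is a minimal transcription sketch. It assumes the standard `transformers` API and `librosa` for audio loading, and `"sample.wav"` is a placeholder path, not a file from the repo:

```python
# Minimal sketch (not the card's full script): transcribe one recording with the
# fine-tuned checkpoint. librosa resamples the file to the 16 kHz the model expects.
import librosa
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-arabic"

processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)

speech, sampling_rate = librosa.load("sample.wav", sr=16_000)  # placeholder path

inputs = processor(speech, sampling_rate=sampling_rate, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits

predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```

The card's own snippet instead pulls the first `SAMPLES = 10` clips of the Common Voice test split via `load_dataset`, as the hunk above shows.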
@@ -81,11 +82,16 @@ for i, predicted_sentence in enumerate(predicted_sentences):
 
 | Reference | Prediction |
 | ------------- | ------------- |
-| ألديك قلم ؟ |
-| ليست هناك مسافة على هذه الأرض أبعد من يوم أمس. | ليست
-| إنك تكبر المشكلة. | إنك تكبر المشكلة
-| يرغب أن يلتقي بك. | يرغب أن يلتقي بك
+| ألديك قلم ؟ | ألديك قلم |
+| ليست هناك مسافة على هذه الأرض أبعد من يوم أمس. | ليست نالك مسافة على هذه الأرض أبعد من يوم الأمس م |
+| إنك تكبر المشكلة. | إنك تكبر المشكلة |
+| يرغب أن يلتقي بك. | يرغب أن يلتقي بك |
 | إنهم لا يعرفون لماذا حتى. | إنهم لا يعرفون لماذا حتى |
+| سيسعدني مساعدتك أي وقت تحب. | سيسئدنيمساعدتك أي وقد تحب |
+| أَحَبُّ نظريّة علمية إليّ هي أن حلقات زحل مكونة بالكامل من الأمتعة المفقودة. | أحب نظرية علمية إلي هي أن حل قتزح المكوينا بالكامل من الأمت عن المفقودة |
+| سأشتري له قلماً. | سأشتري له قلما |
+| أين المشكلة ؟ | أين المشكل |
+| وَلِلَّهِ يَسْجُدُ مَا فِي السَّمَاوَاتِ وَمَا فِي الْأَرْضِ مِنْ دَابَّةٍ وَالْمَلَائِكَةُ وَهُمْ لَا يَسْتَكْبِرُونَ | ولله يسجد ما في السماوات وما في الأرض من دابة والملائكة وهم لا يستكبرون |
 
 ## Evaluation
 
@@ -102,9 +108,11 @@ LANG_ID = "ar"
 MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-arabic"
 DEVICE = "cuda"
 
-CHARS_TO_IGNORE = [",", "?", "¿", ".", "!", "¡", ";", ":", '""', "%", '"', "�", "ʿ", "·", "჻", "~", "՞",
-
-
+CHARS_TO_IGNORE = [",", "?", "¿", ".", "!", "¡", ";", ";", ":", '""', "%", '"', "�", "ʿ", "·", "჻", "~", "՞",
+                   "؟", "،", "।", "॥", "«", "»", "„", "“", "”", "「", "」", "‘", "’", "《", "》", "(", ")", "[", "]",
+                   "{", "}", "=", "`", "_", "+", "<", ">", "…", "–", "°", "´", "ʾ", "‹", "›", "©", "®", "—", "→", "。",
+                   "、", "﹂", "﹁", "‧", "~", "﹏", ",", "{", "}", "(", ")", "[", "]", "【", "】", "‥", "〽",
+                   "『", "』", "〝", "〟", "⟨", "⟩", "〜", ":", "!", "?", "♪", "؛", "/", "\\", "º", "−", "^", "'", "ʻ", "ˆ"]
 
 test_dataset = load_dataset("common_voice", LANG_ID, split="test")
 
@@ -152,11 +160,11 @@ print(f"CER: {cer.compute(predictions=predictions, references=references, chunk_
 
 **Test Result**:
 
-In the table below I report the Word Error Rate (WER) and the Character Error Rate (CER) of the model. I ran the evaluation script described above on other models as well (on 2021-
+In the table below I report the Word Error Rate (WER) and the Character Error Rate (CER) of the model. I ran the evaluation script described above on other models as well (on 2021-05-14). Note that the table below may show different results from those already reported; this may be due to specifics of the other evaluation scripts used.
 
 | Model | WER | CER |
 | ------------- | ------------- | ------------- |
-| jonatasgrosman/wav2vec2-large-xlsr-53-arabic | **
+| jonatasgrosman/wav2vec2-large-xlsr-53-arabic | **39.59%** | **18.18%** |
 | bakrianoo/sinai-voice-ar-stt | 45.30% | 21.84% |
 | othrif/wav2vec2-large-xlsr-arabic | 45.93% | 20.51% |
 | kmfoda/wav2vec2-large-xlsr-arabic | 54.14% | 26.07% |
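The evaluation hunks above show the punctuation list (`CHARS_TO_IGNORE`) growing and the headline scores changing to 39.59% WER / 18.18% CER. A small sketch of the scoring idea follows; it assumes the `evaluate` package (with `jiwer` installed) for the metrics, whereas the card's own script uses its own metric helpers, as the `chunk_size` argument in the hunk header suggests:

```python
# Sketch of the normalize-then-score step, not the card's exact script.
# Punctuation from CHARS_TO_IGNORE is stripped before WER/CER are computed.
import re
import evaluate  # pip install evaluate jiwer

CHARS_TO_IGNORE = [",", "?", "¿", ".", "!", "¡", ";", ":", "%", "�", "؟", "،"]  # abbreviated list
chars_to_ignore_regex = f"[{re.escape(''.join(CHARS_TO_IGNORE))}]"

def normalize(text: str) -> str:
    # drop the ignored characters and normalize case/whitespace
    return re.sub(chars_to_ignore_regex, "", text).lower().strip()

wer = evaluate.load("wer")
cer = evaluate.load("cer")

# one reference/prediction pair taken from the table above
references = [normalize("أين المشكلة ؟")]
predictions = [normalize("أين المشكل")]

print("WER:", wer.compute(predictions=predictions, references=references))
print("CER:", cer.compute(predictions=predictions, references=references))
```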
config.json CHANGED

@@ -72,5 +72,5 @@
   "num_hidden_layers": 24,
   "pad_token_id": 0,
   "transformers_version": "4.5.0.dev0",
-  "vocab_size":
+  "vocab_size": 51
 }
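The only functional change to config.json is the updated `vocab_size`, which should equal the number of entries in the rebuilt vocab.json (ids 0 through 50). A quick consistency check, my own snippet rather than anything in the repo, assuming `huggingface_hub` is installed:

```python
# Verify that config.json's vocab_size matches the token count in vocab.json.
import json
from huggingface_hub import hf_hub_download

REPO_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-arabic"

with open(hf_hub_download(REPO_ID, "config.json"), encoding="utf-8") as f:
    config = json.load(f)
with open(hf_hub_download(REPO_ID, "vocab.json"), encoding="utf-8") as f:
    vocab = json.load(f)

assert config["vocab_size"] == len(vocab) == 51
print("vocab_size:", config["vocab_size"])
```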
pytorch_model.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:a0b26f6d9d3edfde1784aef863c192a8cc1e438a23b45910ab648531ebe1857b
+size 1262142936
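pytorch_model.bin is tracked with Git LFS, so the diff only touches the pointer file: the new `oid` is the SHA-256 of the updated weights and `size` is their byte count. A sketch (mine, assuming the weights have already been downloaded to the current directory) of checking a local copy against the pointer:

```python
# Hash a local pytorch_model.bin and compare it with the oid/size in the LFS pointer.
import hashlib
import os

EXPECTED_OID = "a0b26f6d9d3edfde1784aef863c192a8cc1e438a23b45910ab648531ebe1857b"
EXPECTED_SIZE = 1262142936

path = "pytorch_model.bin"  # assumed local download
digest = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        digest.update(chunk)

print("size ok:", os.path.getsize(path) == EXPECTED_SIZE)
print("sha256 ok:", digest.hexdigest() == EXPECTED_OID)
```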
vocab.json CHANGED

@@ -1 +1 @@
-{"<pad>": 0, "<s>": 1, "</s>": 2, "<unk>": 3, "|": 4, "
+{"<pad>": 0, "<s>": 1, "</s>": 2, "<unk>": 3, "|": 4, "-": 5, "ء": 6, "آ": 7, "أ": 8, "ؤ": 9, "إ": 10, "ئ": 11, "ا": 12, "ب": 13, "ة": 14, "ت": 15, "ث": 16, "ج": 17, "ح": 18, "خ": 19, "د": 20, "ذ": 21, "ر": 22, "ز": 23, "س": 24, "ش": 25, "ص": 26, "ض": 27, "ط": 28, "ظ": 29, "ع": 30, "غ": 31, "ـ": 32, "ف": 33, "ق": 34, "ك": 35, "ل": 36, "م": 37, "ن": 38, "ه": 39, "و": 40, "ى": 41, "ي": 42, "ً": 43, "ٌ": 44, "ٍ": 45, "َ": 46, "ُ": 47, "ِ": 48, "ّ": 49, "ْ": 50}
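The rebuilt vocab.json is the 51-symbol CTC vocabulary: the four special tokens, `|` as the word delimiter, a hyphen, the Arabic letters, tatweel, and the diacritics. A sketch, assuming a local copy of the file shown above, of loading it straight into the CTC tokenizer:

```python
# Build the CTC tokenizer directly from the updated vocab.json.
from transformers import Wav2Vec2CTCTokenizer

tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json",                 # assumed local copy of the file in this commit
    unk_token="<unk>",
    pad_token="<pad>",
    word_delimiter_token="|",
)

print(tokenizer.vocab_size)                        # 51, matching config.json
print(tokenizer.convert_ids_to_tokens([12, 13]))   # the ids mapped to "ا" and "ب" above
```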