elgeish/wav2vec2-large-xlsr-53-arabic

@@ -25,13 +25,16 @@ model-index:
        - name: Test WER
          type: wer
          value: 26.55
 ---
 # Wav2Vec2-Large-XLSR-53-Arabic
 Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53)
-on Arabic using the [Common Voice](https://huggingface.co/datasets/common_voice)
-and the [Arabic Speech Corpus](https://huggingface.co/datasets/arabic_speech_corpus) datasets.
 When using this model, make sure that your speech input is sampled at 16kHz.
 ## Usage
@@ -174,5 +177,26 @@ print(f"WER: {metrics['wer']:.2%}")
 ## Training
-You can find the script used to produce this model
-[here](https://github.com/elgeish/transformers/blob/cfc0bd01f2ac2ea3a5acc578ef2e204bf4304de7/examples/research_projects/wav2vec2/finetune_base_arabic_speech_corpus.sh).

        - name: Test WER
          type: wer
          value: 26.55
+       - name: Validation WER
+         type: wer
+         value: 23.39
 ---
 # Wav2Vec2-Large-XLSR-53-Arabic
 Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53)
+on Arabic using the `train` splits of [Common Voice](https://huggingface.co/datasets/common_voice)
+and [Arabic Speech Corpus](https://huggingface.co/datasets/arabic_speech_corpus).
 When using this model, make sure that your speech input is sampled at 16kHz.
 ## Usage
 ## Training
+For more details, see [Fine-Tuning with Arabic Speech Corpus](https://github.com/huggingface/transformers/tree/1c06240e1b3477728129bb58e7b6c7734bb5074e/examples/research_projects/wav2vec2#fine-tuning-with-arabic-speech-corpus).
+This model represents Arabic in a format called [Buckwalter transliteration](https://en.wikipedia.org/wiki/Buckwalter_transliteration).
+The Buckwalter format only includes ASCII characters, some of which are non-alpha (e.g., `">"` maps to `"أ"`).
+The [lang-trans](https://github.com/kariminf/lang-trans) package is used to convert (transliterate) Arabic abjad.
+[This script](https://github.com/huggingface/transformers/blob/1c06240e1b3477728129bb58e7b6c7734bb5074e/examples/research_projects/wav2vec2/finetune_large_xlsr_53_arabic_speech_corpus.sh)
+was used to first fine-tune [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53)
+on the `train` split of the [Arabic Speech Corpus](https://huggingface.co/datasets/arabic_speech_corpus) dataset;
+the `validation` split was used for model selection; the resulting model at this point is saved as [elgeish/wav2vec2-large-xlsr-53-levantine-arabic](https://huggingface.co/elgeish/wav2vec2-large-xlsr-53-levantine-arabic).
+Training was then resumed using the `train` split of the [Common Voice](https://huggingface.co/datasets/common_voice) dataset;
+similarly, the `validation` split was used for model selection;
+training was stopped to meet the deadline of [Fine-Tune-XLSR Week](https://github.com/huggingface/transformers/blob/700229f8a4003c4f71f29275e0874b5ba58cd39d/examples/research_projects/wav2vec2/FINE_TUNE_XLSR_WAV2VEC2.md):
+this model is the checkpoint at 100k steps, with a validation WER of **23.39%**.
+<img src="validation_wer.png" alt="Validation WER" width="50%" />
+Validation WER was still trending down at that point, which suggests that further training (resuming the learning-rate decay from 7e-6) could improve the model.
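For readers unfamiliar with the metric used for model selection above, the following is a minimal sketch of word error rate (WER): the word-level edit distance between reference and predicted transcripts, divided by the number of reference words. The function names `edit_distance` and `word_error_rate` are hypothetical, and aggregating over the whole corpus (rather than averaging per-utterance scores) is an assumption; this is not the evaluation code used for this model.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    # previous[j] holds the distance between the reference prefix seen so far
    # and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1      # reference word missing from hypothesis
            insertion = current[j - 1] + 1  # extra word in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def word_error_rate(references, hypotheses):
    """Corpus-level WER: total word edits divided by total reference words."""
    total_edits = 0
    total_words = 0
    for reference, hypothesis in zip(references, hypotheses):
        ref_words = reference.split()
        hyp_words = hypothesis.split()
        total_edits += edit_distance(ref_words, hyp_words)
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references contain no words")
    return total_edits / total_words


# One substitution (b -> x) and one deletion (d) over four reference words.
wer = word_error_rate(["a b c d"], ["a x c"])
print(f"WER: {wer:.2%}")
```

Because the model emits Buckwalter text, references and predictions would need to be in the same script before being compared this way.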
+## Future Work
+One area to explore is using `attention_mask` in model input, which is recommended [here](https://huggingface.co/blog/fine-tune-xlsr-wav2vec2).
+Another is data augmentation using the datasets that were used to train the models listed [here](https://paperswithcode.com/sota/speech-recognition-on-common-voice-arabic).
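To illustrate the Buckwalter transliteration described in the Training section, here is a minimal hand-written sketch of the character mapping. It covers only the base letters (diacritics and a few other symbols are omitted), and the names `BUCKWALTER_TO_ARABIC`, `buckwalter_to_arabic`, and `arabic_to_buckwalter` are hypothetical. It does not use or reproduce the API of the lang-trans package, which is what the model card relies on.

```python
# Partial Buckwalter -> Arabic table: base letters only, diacritics omitted.
BUCKWALTER_TO_ARABIC = {
    "'": "ء", "|": "آ", ">": "أ", "&": "ؤ", "<": "إ", "}": "ئ",
    "A": "ا", "b": "ب", "p": "ة", "t": "ت", "v": "ث", "j": "ج",
    "H": "ح", "x": "خ", "d": "د", "*": "ذ", "r": "ر", "z": "ز",
    "s": "س", "$": "ش", "S": "ص", "D": "ض", "T": "ط", "Z": "ظ",
    "E": "ع", "g": "غ", "f": "ف", "q": "ق", "k": "ك", "l": "ل",
    "m": "م", "n": "ن", "h": "ه", "w": "و", "Y": "ى", "y": "ي",
}

# The mapping is one-to-one, so it can simply be inverted.
ARABIC_TO_BUCKWALTER = {arabic: latin for latin, arabic in BUCKWALTER_TO_ARABIC.items()}


def buckwalter_to_arabic(text):
    """Map each Buckwalter character to Arabic; unknown characters pass through."""
    return "".join(BUCKWALTER_TO_ARABIC.get(ch, ch) for ch in text)


def arabic_to_buckwalter(text):
    """Map each Arabic letter to Buckwalter; unknown characters pass through."""
    return "".join(ARABIC_TO_BUCKWALTER.get(ch, ch) for ch in text)


print(buckwalter_to_arabic("ktAb"))  # the word for "book"
print(arabic_to_buckwalter(buckwalter_to_arabic(">nA")))  # round trip
```

Characters outside the table, such as spaces, are left unchanged, so whole sentences can be passed through either function.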

validation_wer.png ADDED