---
language:
- el
tags:
- pytorch
- ASR
---

# Greek (el) version of the XLSR-Wav2Vec2 automatic speech recognition (ASR) model

* language: el
* licence: apache-2.0
* dataset: CommonVoice (EL), 364MB: https://commonvoice.mozilla.org/el/datasets
* model: XLSR-Wav2Vec2
* metrics: WER

### Model description

Wav2Vec2 is a pretrained model for automatic speech recognition (ASR), released in September 2020 by Alexei Baevski, Michael Auli, and Alex Conneau. Soon after Wav2Vec2's superior performance was demonstrated on LibriSpeech, the English ASR benchmark, Facebook AI presented XLSR-Wav2Vec2. XLSR stands for cross-lingual speech representations and refers to XLSR-Wav2Vec2's ability to learn speech representations that are useful across multiple languages. Like Wav2Vec2, XLSR-Wav2Vec2 learns powerful speech representations from hundreds of thousands of hours of unlabeled speech in more than 50 languages. Similar to BERT's masked language modeling, the model learns contextualized speech representations by randomly masking feature vectors before passing them to a transformer network.

### How to use

Instructions to replicate the fine-tuning process are included in the Jupyter notebook.

### Metrics

| Metric          | Value   |
| --------------- | ------- |
| Training Loss   | 0.0536  |
| Validation Loss | 0.61605 |
| WER             | 0.45049 |

### BibTeX entry and citation info

Based on the tutorial by Patrick von Platen: https://huggingface.co/blog/fine-tune-xlsr-wav2vec2

Original Colab notebook: https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/Fine_Tune_XLSR_Wav2Vec2_on_Turkish_ASR_with_%F0%9F%A4%97_Transformers.ipynb#scrollTo=V7YOT2mnUiea
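As a minimal usage sketch, a fine-tuned XLSR-Wav2Vec2 checkpoint can be loaded with the `transformers` library and transcriptions obtained by greedy CTC decoding. The model identifier below is a placeholder, since this card does not state the repository name; substitute the actual checkpoint path.

```python
# Hedged sketch: inference with a fine-tuned XLSR-Wav2Vec2 checkpoint.
# The model_id argument is a placeholder -- replace it with the actual
# repository name or local path of the Greek checkpoint.
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor


def transcribe(audio_path: str, model_id: str) -> str:
    """Transcribe a single audio file with greedy CTC decoding."""
    import librosa  # audio loading; any 16 kHz mono loader works

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)

    # XLSR-Wav2Vec2 was pretrained on 16 kHz audio, so resample to 16 kHz.
    speech, _ = librosa.load(audio_path, sr=16_000)
    inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")

    with torch.no_grad():
        logits = model(inputs.input_values).logits

    # Greedy decoding: take the most likely token at each time step.
    pred_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(pred_ids)[0]


# Example call (placeholder id):
# print(transcribe("sample.wav", "<your-model-id>"))
```

Greedy argmax decoding is the simplest option; a language-model-assisted beam search would typically lower the WER further.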
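The span masking described above can be illustrated with a conceptual sketch. This is a simplification for intuition only: real wav2vec2 replaces masked positions with a learned mask embedding and samples spans with overlap handling, whereas this toy version (with made-up defaults `mask_prob` and `mask_length`) simply zeroes out random time spans.

```python
# Conceptual sketch of span masking over feature vectors, BERT-style.
# Simplified: zeroes masked spans instead of using a learned mask embedding.
import torch


def mask_feature_vectors(features: torch.Tensor,
                         mask_prob: float = 0.065,
                         mask_length: int = 10) -> torch.Tensor:
    """Randomly mask contiguous spans of time steps in (batch, time, dim) features."""
    batch, time, _ = features.shape
    masked = features.clone()
    # Sample a number of span start positions proportional to mask_prob.
    num_starts = max(1, int(mask_prob * time))
    for b in range(batch):
        starts = torch.randperm(time - mask_length)[:num_starts]
        for s in starts:
            masked[b, s:s + mask_length] = 0.0  # zero out one span
    return masked


x = torch.ones(2, 100, 8)
y = mask_feature_vectors(x)
```

During pretraining, the transformer must predict the correct quantized representation for these masked positions from the surrounding context, which is what yields contextualized speech representations.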