metadata
language: multilingual
datasets:
- common_voice
- multilingual_librispeech
- covost2
tags:
- speech
- xls_r
- automatic-speech-recognition
pipeline_tag: automatic-speech-recognition
license: apache-2.0
Wav2Vec2-XLS-R-300M-21-EN
Facebook's Wav2Vec2 XLS-R fine-tuned for Speech Translation.
This is a SpeechEncoderDecoderModel.
The encoder was warm-started from the facebook/wav2vec2-xls-r-300m checkpoint and the decoder from the facebook/mbart-large-50 checkpoint.
The resulting encoder-decoder model was then fine-tuned on 21 {lang} -> en translation pairs of the CoVoST 2 dataset.
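The warm-start described above can be sketched with the transformers library. This is a minimal illustration, not the authors' training script: the helper name build_warm_started_model is hypothetical, and the tokenizer setup, generation config, and fine-tuning loop are all omitted. The import sits inside the function so the snippet can be read and loaded without downloading any weights.

```python
# Checkpoint names as given in this model card.
ENCODER_CHECKPOINT = "facebook/wav2vec2-xls-r-300m"
DECODER_CHECKPOINT = "facebook/mbart-large-50"


def build_warm_started_model(encoder_id=ENCODER_CHECKPOINT,
                             decoder_id=DECODER_CHECKPOINT):
    """Hypothetical helper: combine a pretrained speech encoder and a
    pretrained text decoder into one SpeechEncoderDecoderModel.

    Calling this downloads both checkpoints, so it needs network access
    and several GB of disk space and memory.
    """
    # Imported lazily so that defining the function has no heavy dependencies.
    from transformers import SpeechEncoderDecoderModel

    # Loads the encoder weights from encoder_id and the decoder weights
    # from decoder_id; the combined model still has to be fine-tuned.
    return SpeechEncoderDecoderModel.from_encoder_decoder_pretrained(
        encoder_id, decoder_id
    )
```

The model returned by such a helper is only a starting point; it would still need fine-tuning on the translation pairs before producing useful output.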
The model can translate from the following spoken languages ({lang}) to English:
{fr, de, es, ca, it, ru, zh-CN, pt, fa, et, mn, nl, tr, ar, sv-SE, lv, sl, ta, ja, id, cy} -> en
For more information, please refer to Section 5.1.2 of the official XLS-R paper.
Usage
TODO...
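A minimal inference sketch, under stated assumptions: the Hub identifier facebook/wav2vec2-xls-r-300m-21-to-en is assumed from the model name and is not confirmed by this card, and the helper names is_supported and translate_to_english are hypothetical. It uses the transformers automatic-speech-recognition pipeline, which matches the pipeline_tag in the metadata. Decoding from a file path typically requires ffmpeg to be installed.

```python
# Source languages listed in this model card (21 in total).
SUPPORTED_LANGS = (
    "fr", "de", "es", "ca", "it", "ru", "zh-CN", "pt", "fa", "et", "mn",
    "nl", "tr", "ar", "sv-SE", "lv", "sl", "ta", "ja", "id", "cy",
)

# Assumption: Hub identifier inferred from the model name; adjust if needed.
ASSUMED_CHECKPOINT = "facebook/wav2vec2-xls-r-300m-21-to-en"


def is_supported(lang):
    """Return True if lang is one of the source languages listed above."""
    return lang in SUPPORTED_LANGS


def translate_to_english(audio_path, source_lang, checkpoint=ASSUMED_CHECKPOINT):
    """Hypothetical helper: translate one audio file to English text.

    source_lang is only used to reject languages the card does not list;
    it is not passed to the model.
    """
    if not is_supported(source_lang):
        raise ValueError(f"Unsupported source language: {source_lang!r}")

    # Imported lazily so the language check works without transformers installed.
    from transformers import pipeline

    # Building the pipeline downloads the checkpoint on first use.
    asr = pipeline("automatic-speech-recognition", model=checkpoint)
    return asr(audio_path)["text"]
```

For repeated calls, it would be more efficient to build the pipeline once and reuse it rather than recreating it per file as this sketch does.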
Results
TODO...