---
language: multilingual
datasets:
- common_voice
- multilingual_librispeech
- covost2
tags:
- speech
- xls_r
- automatic-speech-recognition
pipeline_tag: automatic-speech-recognition
license: apache-2.0
---

# Wav2Vec2-XLS-R-300M-21-EN

Facebook's Wav2Vec2 XLS-R fine-tuned for **Speech Translation.**

![model image](https://raw.githubusercontent.com/patrickvonplaten/scientific_images/master/xls_r.png)

This is a [SpeechEncoderDecoderModel](https://huggingface.co/transformers/model_doc/speechencoderdecoder.html). The encoder was warm-started from the [**`facebook/wav2vec2-xls-r-300m`**](https://huggingface.co/facebook/wav2vec2-xls-r-300m) checkpoint and the decoder from the [**`facebook/mbart-large-50`**](https://huggingface.co/facebook/mbart-large-50) checkpoint.
The combined encoder-decoder model was then fine-tuned on 21 `{lang}` -> `en` translation pairs of the [CoVoST 2 dataset](https://huggingface.co/datasets/covost2).

The model can translate from the following spoken languages (`{lang}`) to English:

{`fr`,`de`,`es`,`ca`,`it`,`ru`,`zh-CN`,`pt`,`fa`,`et`,`mn`,`nl`,`tr`,`ar`,`sv-SE`,`lv`,`sl`,`ta`,`ja`,`id`,`cy`} -> `en`

For more information, please refer to Section *5.1.2* of the [official XLS-R paper](https://arxiv.org/abs/2111.09296).

## Usage

TODO...

## Results

TODO...
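
## Inference sketch

Because the card tags the model with the `automatic-speech-recognition` pipeline, one plausible way to run it is through the generic `pipeline` helper in `transformers`. The following is a minimal sketch, not a confirmed recipe from the authors. The Hub ID in `MODEL_ID` is an assumption inferred from the card title, and `translate_to_english` is a hypothetical helper name introduced here for illustration.

```python
# Source languages listed in this model card; the target is always English.
SUPPORTED_SOURCE_LANGS = [
    "fr", "de", "es", "ca", "it", "ru", "zh-CN", "pt", "fa", "et", "mn",
    "nl", "tr", "ar", "sv-SE", "lv", "sl", "ta", "ja", "id", "cy",
]

# Assumption: the Hub repository ID is not stated in this card.
# Replace it with the actual ID if it differs.
MODEL_ID = "facebook/wav2vec2-xls-r-300m-21-to-en"


def translate_to_english(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Translate speech in a local audio file into English text.

    Hypothetical helper: it wraps the generic ASR pipeline, which runs the
    encoder on the audio and lets the decoder generate English text.
    """
    # Imported lazily so the constants above can be used without
    # transformers (and the model weights) being available.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        feature_extractor=model_id,
    )
    # The pipeline returns a dict whose "text" entry holds the decoded output.
    return asr(audio_path)["text"]
```

Calling `translate_to_english("sample.wav")`, where `sample.wav` is a placeholder for a local recording in one of the supported languages, should return the English translation as a string. Decoding audio from a file path typically requires `ffmpeg` to be installed, and XLS-R models expect 16 kHz input, which the pipeline resamples to when it loads a file.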