metadata
language: multilingual
datasets:
  - common_voice
  - multilingual_librispeech
  - covost2
tags:
  - speech
  - xls_r
  - automatic-speech-recognition
pipeline_tag: automatic-speech-recognition
license: apache-2.0

Wav2Vec2-XLS-R-300M-21-EN

Facebook's Wav2Vec2 XLS-R fine-tuned for Speech Translation.

model image

This is a SpeechEncoderDecoderModel. The encoder was warm-started from the facebook/wav2vec2-xls-r-300m checkpoint and the decoder from the facebook/mbart-large-50 checkpoint. The resulting encoder-decoder model was then fine-tuned on 21 {lang} -> en translation pairs of the CoVoST 2 dataset.
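As a rough illustration of the warm-start step described above, the two checkpoints can be combined with the `from_encoder_decoder_pretrained` class method of `SpeechEncoderDecoderModel` in the `transformers` library. This is a minimal sketch, not the training script used for this model: the constant and function names are hypothetical, and the returned model is only initialized from the two checkpoints, so it would still need fine-tuning before it can translate.

```python
# Checkpoint names are taken from the model card; everything else is illustrative.
ENCODER_CHECKPOINT = "facebook/wav2vec2-xls-r-300m"
DECODER_CHECKPOINT = "facebook/mbart-large-50"


def build_warm_started_model():
    """Return a SpeechEncoderDecoderModel initialized from both checkpoints.

    Calling this downloads the two pretrained checkpoints. The result is
    NOT the fine-tuned translation model, only its starting point.
    """
    # Imported lazily so the module can be loaded without transformers installed.
    from transformers import SpeechEncoderDecoderModel

    return SpeechEncoderDecoderModel.from_encoder_decoder_pretrained(
        ENCODER_CHECKPOINT, DECODER_CHECKPOINT
    )
```

Calling `build_warm_started_model()` requires `transformers` and `torch` to be installed and enough disk space and memory for both checkpoints.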

The model can translate from the following spoken languages ({lang}) to English:

{fr,de,es,ca,it,ru,zh-CN,pt,fa,et,mn,nl,tr,ar,sv-SE,lv,sl,ta,ja,id,cy} -> en
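The language list above can be turned into a small guard before running inference. The sketch below is an assumption-laden example: `SOURCE_LANGUAGES`, `is_supported_source`, and `translate_to_english` are hypothetical names, `model_id` is a placeholder for wherever this checkpoint is hosted, and the `lang` argument is used only for the check in this sketch. It relies on the `transformers` `pipeline` API with the `automatic-speech-recognition` task, matching the `pipeline_tag` in the metadata.

```python
# Source languages copied from the list in the model card.
SOURCE_LANGUAGES = frozenset(
    "fr de es ca it ru zh-CN pt fa et mn nl tr ar sv-SE lv sl ta ja id cy".split()
)


def is_supported_source(lang: str) -> bool:
    """Return True if `lang` is one of the 21 source languages listed above."""
    return lang in SOURCE_LANGUAGES


def translate_to_english(audio_path: str, model_id: str, lang: str) -> str:
    """Translate a speech recording to English text.

    `model_id` is a placeholder for the Hub id or local path of this model.
    `lang` is only used to reject unsupported source languages up front.
    """
    if not is_supported_source(lang):
        raise ValueError(f"Unsupported source language: {lang!r}")

    # Imported lazily so the language check works without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # The pipeline returns a dict whose "text" entry holds the generated string.
    return asr(audio_path)["text"]
```

Passing a file path to the pipeline assumes that `ffmpeg` is available for decoding the audio.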

For more information, please refer to Section 5.1.2 of the official XLS-R paper.

Usage

TODO...

Results

TODO...