Facebook's Wav2Vec2 XLS-R fine-tuned for Speech Translation.
This is a SpeechEncoderDecoderModel model.
The encoder was warm-started from the
facebook/wav2vec2-xls-r-300m checkpoint and
the decoder from the
Consequently, the encoder-decoder model was fine-tuned on 21
en translation pairs of the Covost2 dataset.
The model can translate from the following spoken languages
For more information, please refer to Section 5.1.2 of the official XLS-R paper.
The model can be tested directly on the speech recognition widget on this model card! Simple record some audio in one of the possible spoken languages or pick an example audio file to see how well the checkpoint can translate the input.
As this a standard sequence to sequence transformer model, you can use the
generate method to generate the
transcripts by passing the speech features to the model.
You can use the model directly via the ASR pipeline
from datasets import load_dataset from transformers import pipeline # replace following lines to load an audio file of your choice librispeech_en = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation") audio_file = librispeech_en["file"] asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-xls-r-300m-21-to-en", feature_extractor="facebook/wav2vec2-xls-r-300m-21-to-en") translation = asr(audio_file)
or step-by-step as follows:
import torch from transformers import Speech2Text2Processor, SpeechEncoderDecoderModel from datasets import load_dataset model = SpeechEncoderDecoderModel.from_pretrained("facebook/wav2vec2-xls-r-300m-21-to-en") processor = Speech2Text2Processor.from_pretrained("facebook/wav2vec2-xls-r-300m-21-to-en") ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation") inputs = processor(ds["audio"]["array"], sampling_rate=ds["audio"]["array"]["sampling_rate"], return_tensors="pt") generated_ids = model.generate(input_ids=inputs["input_features"], attention_mask=inputs["attention_mask"]) transcription = processor.batch_decode(generated_ids)
See the row of XLS-R (0.3B) for the performance on Covost2 for this model.
- Downloads last month