facebook/wav2vec2-xls-r-300m-21-to-en

Automatic Speech Recognition

speech-encoder-decoder

xls_r_translation


patrickvonplaten committed on Nov 18, 2021

Commit f54d294 (1 parent: 47c41fa)

Update README.md

Files changed (1)

README.md +36 -1

@@ -31,9 +31,44 @@ For more information, please refer to Section *5.1.2* of the [official XLS-R pap
 ## Usage
-TODO...
+As this is a standard sequence-to-sequence transformer model, you can use the `generate` method to generate the
+English translations by passing the speech features to the model.
+You can use the model directly via the ASR pipeline:
+```python
+from datasets import load_dataset
+from transformers import pipeline
+# replace the following lines to load an audio file of your choice
+librispeech_en = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
+audio_file = librispeech_en[0]["file"]
+asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-xls-r-300m-21-to-en", feature_extractor="facebook/wav2vec2-xls-r-300m-21-to-en")
+translation = asr(audio_file)
+```
+Alternatively, run the model step by step:
+```python
+import torch
+from transformers import Speech2Text2Processor, SpeechEncoderDecoderModel
+from datasets import load_dataset
+model = SpeechEncoderDecoderModel.from_pretrained("facebook/wav2vec2-xls-r-300m-21-to-en")
+processor = Speech2Text2Processor.from_pretrained("facebook/wav2vec2-xls-r-300m-21-to-en")
+ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
+# the model expects 16 kHz audio; this dummy LibriSpeech split is already sampled at 16 kHz
+inputs = processor(ds[0]["audio"]["array"], sampling_rate=16_000, return_tensors="pt")
+# the wav2vec2 feature extractor returns the normalized waveform under "input_values"
+generated_ids = model.generate(inputs["input_values"], attention_mask=inputs["attention_mask"])
+transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)
+```
 ## Results `{lang}` -> `en`
+See the **XLS-R (0.3B)** row for this model's performance on [CoVoST 2](https://huggingface.co/datasets/covost2).
 ![results image](https://raw.githubusercontent.com/patrickvonplaten/scientific_images/master/X-%3EEnglish.png)
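The usage text in this diff notes that the checkpoint is a standard sequence-to-sequence model decoded with `generate`. The loop behind that call can be sketched in pure Python for its simplest strategy, greedy search: at each step the decoder scores every vocabulary entry given the encoder output and the tokens produced so far, the highest-scoring token is appended, and decoding stops at the end-of-sequence token or at a maximum length. This is a minimal sketch under stated assumptions, not the transformers implementation: `greedy_generate` and `toy_decoder_step` are hypothetical names, the toy decoder merely stands in for the real speech encoder and text decoder, and the strategy `generate` actually applies for this checkpoint (greedy or beam search) depends on the model's configuration.

```python
from typing import Callable, List, Sequence

# Minimal greedy-decoding sketch; all names here are hypothetical, not the transformers API.
# A decoder step maps (encoder output, tokens so far) to one score per vocabulary entry.
DecoderStep = Callable[[Sequence[float], List[int]], List[float]]


def greedy_generate(
    encoder_output: Sequence[float],
    decoder_step: DecoderStep,
    bos_id: int,
    eos_id: int,
    max_length: int = 20,
) -> List[int]:
    """Autoregressively pick the highest-scoring next token until EOS or max_length."""
    tokens = [bos_id]
    while len(tokens) < max_length:
        scores = decoder_step(encoder_output, tokens)
        # argmax over the vocabulary (first index wins on ties)
        next_id = max(range(len(scores)), key=scores.__getitem__)
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens


def toy_decoder_step(encoder_output: Sequence[float], tokens: List[int]) -> List[float]:
    """Hypothetical stand-in decoder: 0=BOS, 1=EOS, 2..9 are "words".

    It emits one word per encoder position, then EOS once every position is consumed.
    """
    vocab_size = 10
    step = len(tokens) - 1  # number of tokens generated after BOS
    scores = [0.0] * vocab_size
    if step < len(encoder_output):
        scores[2 + int(encoder_output[step]) % 8] = 1.0
    else:
        scores[1] = 1.0  # end-of-sequence
    return scores


if __name__ == "__main__":
    print(greedy_generate([3.0, 5.0, 0.0], toy_decoder_step, bos_id=0, eos_id=1))
    # [0, 5, 7, 2, 1]
```

Keeping the k best partial hypotheses at each step instead of only the argmax turns this loop into beam search, which `generate` enables through its `num_beams` argument.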