patrickvonplaten commited on
Commit
f54d294
1 Parent(s): 47c41fa

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +36 -1
README.md CHANGED
@@ -31,9 +31,44 @@ For more information, please refer to Section *5.1.2* of the [official XLS-R pap
31
 
32
  ## Usage
33
 
34
- TODO...
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
35
 
36
  ## Results `{lang}` -> `en`
37
 
 
 
38
  ![results image](https://raw.githubusercontent.com/patrickvonplaten/scientific_images/master/X-%3EEnglish.png)
39
 
 
31
 
32
  ## Usage
33
 
34
+ As this a standard sequence to sequence transformer model, you can use the `generate` method to generate the
35
+ transcripts by passing the speech features to the model.
36
+
37
+ You can use the model directly via the ASR pipeline
38
+
39
+ ```python
40
+ from datasets import load_dataset
41
+ from transformers import pipeline
42
+
43
+ # replace following lines to load an audio file of your choice
44
+ librispeech_en = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
45
+ audio_file = librispeech_en[0]["file"]
46
+
47
+ asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-xls-r-300m-21-to-en", feature_extractor="facebook/wav2vec2-xls-r-300m-21-to-en")
48
+
49
+ translation = asr(audio_file)
50
+ ```
51
+
52
+ or step-by-step as follows:
53
+
54
+ ```python
55
+ import torch
56
+ from transformers import Speech2Text2Processor, SpeechEncoderDecoder
57
+ from datasets import load_dataset
58
+
59
+ model = SpeechEncoderDecoder.from_pretrained("facebook/wav2vec2-xls-r-300m-21-to-en")
60
+ processor = Speech2Text2Processor.from_pretrained("facebook/wav2vec2-xls-r-300m-21-to-en")
61
+
62
+ ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
63
+
64
+ inputs = processor(ds[0]["audio"]["array"], sampling_rate=16_000, return_tensors="pt")
65
+ generated_ids = model.generate(input_ids=inputs["input_features"], attention_mask=inputs["attention_mask"])
66
+ transcription = processor.batch_decode(generated_ids)
67
+ ```
68
 
69
  ## Results `{lang}` -> `en`
70
 
71
+ See the row of **XLS-R (0.3B)** for the performance on [Covost2](https://huggingface.co/datasets/covost2) for this model.
72
+
73
  ![results image](https://raw.githubusercontent.com/patrickvonplaten/scientific_images/master/X-%3EEnglish.png)
74