patrickvonplaten committed on
Commit
5b59153
1 Parent(s): 9f16d8d

Update README.md

  1. README.md +73 -1
README.md CHANGED
@@ -12,4 +12,76 @@ pipeline_tag: automatic-speech-recognition
12
  license: apache-2.0
13
  ---
14
 
15
- # Wav2Vec2-XLS-R-1B-21-EN
15
+ # Wav2Vec2-XLS-R-1B-21-EN
16
+
17
+ Facebook's Wav2Vec2 XLS-R fine-tuned for **Speech Translation.**
18
+
19
+ ![model image](https://raw.githubusercontent.com/patrickvonplaten/scientific_images/master/xls_r.png)
20
+
21
+ This is a [SpeechEncoderDecoderModel](https://huggingface.co/transformers/model_doc/speechencoderdecoder.html) model.
22
+ The encoder was warm-started from the [**`facebook/wav2vec2-xls-r-1b`**](https://huggingface.co/facebook/wav2vec2-xls-r-1b) checkpoint and
23
+ the decoder from the [**`facebook/mbart-large-50`**](https://huggingface.co/facebook/mbart-large-50) checkpoint.
24
+ Subsequently, the encoder-decoder model was fine-tuned on 21 `{lang}` -> `en` translation pairs of the [Covost2 dataset](https://huggingface.co/datasets/covost2).
25
+
26
+ The model can translate from the following spoken languages `{lang}` -> `en` (English):
27
+
28
+ {`fr`, `de`, `es`, `ca`, `it`, `ru`, `zh-CN`, `pt`, `fa`, `et`, `mn`, `nl`, `tr`, `ar`, `sv-SE`, `lv`, `sl`, `ta`, `ja`, `id`, `cy`} -> `en`
29
+
30
+ For more information, please refer to Section *5.1.2* of the [official XLS-R paper](https://arxiv.org/abs/2111.09296).
31
+
32
+ ## Usage
33
+
34
+ ### Demo
35
+
36
+ The model can be tested directly on the speech recognition widget on this model card!
37
+ Simply record some audio in one of the supported spoken languages, or pick an example audio file, to see how well the checkpoint translates the input.
38
+
39
+ ### Example
40
+
41
+ As this is a standard sequence-to-sequence transformer model, you can use the `generate` method to generate the
42
+ translations by passing the speech features to the model.
43
+
44
+ You can use the model directly via the ASR pipeline:
45
+
46
+ ```python
47
+ from datasets import load_dataset
48
+ from transformers import pipeline
49
+
50
+ # replace following lines to load an audio file of your choice
51
+ librispeech_en = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
52
+ audio_file = librispeech_en[0]["file"]
53
+
54
+ asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-xls-r-1b-21-to-en", feature_extractor="facebook/wav2vec2-xls-r-1b-21-to-en")
55
+
56
+ translation = asr(audio_file)
57
+ ```
58
+
59
+ or step-by-step as follows:
60
+
61
+ ```python
62
+ import torch
63
+ from transformers import Speech2Text2Processor, SpeechEncoderDecoderModel
64
+ from datasets import load_dataset
65
+
66
+ model = SpeechEncoderDecoderModel.from_pretrained("facebook/wav2vec2-xls-r-1b-21-to-en")
67
+ processor = Speech2Text2Processor.from_pretrained("facebook/wav2vec2-xls-r-1b-21-to-en")
68
+
69
+ ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
70
+
71
+ inputs = processor(ds[0]["audio"]["array"], sampling_rate=ds[0]["audio"]["sampling_rate"], return_tensors="pt")
72
+ generated_ids = model.generate(inputs["input_values"], attention_mask=inputs["attention_mask"])
73
+ translation = processor.batch_decode(generated_ids, skip_special_tokens=True)
74
+ ```
75
+
76
+ ## Results `{lang}` -> `en`
77
+
78
+ See the **XLS-R (1B)** row for this model's performance on [Covost2](https://huggingface.co/datasets/covost2).
79
+
80
+ ![results image](https://raw.githubusercontent.com/patrickvonplaten/scientific_images/master/X-%3EEnglish.png)
81
+
82
+ ## More XLS-R models for `{lang}` -> `en` Speech Translation
83
+
84
+ - [Wav2Vec2-XLS-R-300M-21-EN](https://huggingface.co/facebook/wav2vec2-xls-r-300m-21-to-en)
85
+ - [Wav2Vec2-XLS-R-1B-21-EN](https://huggingface.co/facebook/wav2vec2-xls-r-1b-21-to-en)
86
+ - [Wav2Vec2-XLS-R-2B-21-EN](https://huggingface.co/facebook/wav2vec2-xls-r-2b-21-to-en)
87
+ - [Wav2Vec2-XLS-R-2B-22-16](https://huggingface.co/facebook/wav2vec2-xls-r-2b-22-to-16)
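
The README lists 21 source-language codes that this checkpoint can translate into English. As a minimal sketch, a caller might check a language code against that list before sending audio to the model. The names `SUPPORTED_SOURCE_LANGS` and `is_supported_source` are hypothetical helpers introduced here for illustration; they are not part of `transformers` or of the model repository.

```python
# Source-language codes copied from the list in the README above.
# NOTE: this set and the helper below are illustrative assumptions,
# not an API provided by transformers or by the checkpoint.
SUPPORTED_SOURCE_LANGS = frozenset({
    "fr", "de", "es", "ca", "it", "ru", "zh-CN", "pt", "fa", "et", "mn",
    "nl", "tr", "ar", "sv-SE", "lv", "sl", "ta", "ja", "id", "cy",
})


def is_supported_source(lang: str) -> bool:
    """Return True if `lang` exactly matches one of the listed source codes."""
    return lang in SUPPORTED_SOURCE_LANGS
```

The comparison is exact and case-sensitive, so `"zh-CN"` matches but `"zh-cn"` does not; normalize codes first if your inputs vary in casing.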