anuragshas committed
Commit 2964fff
1 Parent(s): 9507dd2

Update README.md

Files changed (1):
  1. README.md +54 -3
README.md CHANGED
@@ -6,11 +6,28 @@ tags:
 - automatic-speech-recognition
 - mozilla-foundation/common_voice_8_0
 - generated_from_trainer
+- robust-speech-event
 datasets:
-- common_voice
+- mozilla-foundation/common_voice_8_0
+metrics:
+- wer
 model-index:
-- name: ''
-  results: []
+- name: XLS-R-300M - Maltese
+  results:
+  - task:
+      type: automatic-speech-recognition
+      name: Speech Recognition
+    dataset:
+      type: mozilla-foundation/common_voice_8_0
+      name: Common Voice 8
+      args: mt
+    metrics:
+    - type: wer
+      value: 15.967
+      name: Test WER
+    - type: cer
+      value: 3.657
+      name: Test CER
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -78,3 +95,37 @@ The following hyperparameters were used during training:
 - Pytorch 1.10.2+cu102
 - Datasets 1.18.2.dev0
 - Tokenizers 0.11.0
+
+#### Evaluation Commands
+
+To evaluate on `mozilla-foundation/common_voice_8_0` with split `test`:
+
+```bash
+python eval.py --model_id anuragshas/wav2vec2-xls-r-300m-mt-cv8-with-lm --dataset mozilla-foundation/common_voice_8_0 --config mt --split test
+```
+
+### Inference With LM
+
+```python
+import torch
+from datasets import load_dataset
+from transformers import AutoModelForCTC, AutoProcessor
+import torchaudio.functional as F
+
+model_id = "anuragshas/wav2vec2-xls-r-300m-mt-cv8-with-lm"
+sample_iter = iter(load_dataset("mozilla-foundation/common_voice_8_0", "mt", split="test", streaming=True, use_auth_token=True))
+sample = next(sample_iter)
+# Common Voice audio is 48 kHz; the model expects 16 kHz input
+resampled_audio = F.resample(torch.tensor(sample["audio"]["array"]), 48_000, 16_000).numpy()
+model = AutoModelForCTC.from_pretrained(model_id)
+processor = AutoProcessor.from_pretrained(model_id)
+input_values = processor(resampled_audio, sampling_rate=16_000, return_tensors="pt").input_values
+with torch.no_grad():
+    logits = model(input_values).logits
+transcription = processor.batch_decode(logits.numpy()).text
+# => "għadu jilagħbu ċirku tant bilfondi"
+```
+
+### Eval results on Common Voice 8 "test" (WER)
+
+| Without LM | With LM (run `./eval.py`) |
+|---|---|
+| 19.853 | 15.967 |