anuragshas committed on
Commit b347cdc
1 Parent(s): 524d26c

Update README.md

Files changed (1):
  1. README.md +55 -3

README.md CHANGED
@@ -6,11 +6,28 @@ tags:
 - automatic-speech-recognition
 - mozilla-foundation/common_voice_7_0
 - generated_from_trainer
 datasets:
-- common_voice
 model-index:
-- name: ''
-  results: []
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -100,3 +117,38 @@ The following hyperparameters were used during training:
 - Pytorch 1.10.1+cu102
 - Datasets 1.17.1.dev0
 - Tokenizers 0.11.0
 - automatic-speech-recognition
 - mozilla-foundation/common_voice_7_0
 - generated_from_trainer
+- robust-speech-event
 datasets:
+- mozilla-foundation/common_voice_7_0
+metrics:
+- wer
 model-index:
+- name: wav2vec2-xls-r-1b-hi
+  results:
+  - task:
+      type: automatic-speech-recognition
+      name: Speech Recognition
+    dataset:
+      type: mozilla-foundation/common_voice_7_0
+      name: Common Voice 7
+      args: hi
+    metrics:
+    - type: wer
+      value: 49.972
+      name: Test WER
+    - name: Test CER
+      type: cer
+      value: 26.390
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 
 - Pytorch 1.10.1+cu102
 - Datasets 1.17.1.dev0
 - Tokenizers 0.11.0
+
+
+#### Evaluation Commands
+1. To evaluate on `mozilla-foundation/common_voice_7_0` with split `test`
+
+```bash
+python eval.py --model_id anuragshas/wav2vec2-xls-r-1b-hi --dataset mozilla-foundation/common_voice_7_0 --config hi --split test
+```
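The `eval.py` script itself is not reproduced in this card. As a rough sketch of the metric it reports (not the script's actual implementation), word error rate is the word-level Levenshtein edit distance divided by the number of reference words:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word out of six reference words
print(wer("तुम्हारे पास तीन महीने बचे हैं", "तुम्हारे पास तीन महीने बचे है"))  # => 0.1666...
```

The reported percentages are this ratio times 100, aggregated over the whole test split.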
+
+
+### Inference With LM
+
+```python
+import torch
+from datasets import load_dataset
+from transformers import AutoModelForCTC, AutoProcessor
+import torchaudio.functional as F
+
+model_id = "anuragshas/wav2vec2-xls-r-1b-hi"
+
+# Stream a single sample from the Common Voice 7 Hindi test split (requires HF auth)
+sample_iter = iter(load_dataset("mozilla-foundation/common_voice_7_0", "hi", split="test", streaming=True, use_auth_token=True))
+sample = next(sample_iter)
+
+# Common Voice audio is 48 kHz; the model expects 16 kHz input
+resampled_audio = F.resample(torch.tensor(sample["audio"]["array"]), 48_000, 16_000).numpy()
+
+model = AutoModelForCTC.from_pretrained(model_id)
+processor = AutoProcessor.from_pretrained(model_id)
+
+input_values = processor(resampled_audio, sampling_rate=16_000, return_tensors="pt").input_values
+with torch.no_grad():
+    logits = model(input_values).logits
+
+# Decode with the language model (the processor is a Wav2Vec2ProcessorWithLM)
+transcription = processor.batch_decode(logits.numpy()).text
+# => "तुम्हारे पास तीन महीने बचे हैं"
+```
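For comparison, LM-free decoding takes the argmax token at each frame, collapses repeats, and drops CTC blanks; in `transformers` this corresponds to `processor.tokenizer.batch_decode(torch.argmax(logits, dim=-1))`. A minimal sketch of the collapse rule itself, using a toy vocabulary (the indices and characters here are illustrative, not the model's real vocabulary):

```python
def ctc_greedy_decode(frame_ids, vocab, blank_id=0):
    """Collapse repeated frame predictions, then drop CTC blank tokens."""
    out = []
    prev = None
    for tok in frame_ids:
        if tok != prev and tok != blank_id:
            out.append(vocab[tok])
        prev = tok
    return "".join(out)

# Toy vocabulary: index 0 is the CTC blank
vocab = {2: "न", 3: "म", 4: "स", 5: "्", 6: "त", 7: "े"}
frames = [2, 2, 0, 3, 4, 4, 5, 0, 6, 7, 7]
print(ctc_greedy_decode(frames, vocab))  # => "नमस्ते"
```

The LM decoder replaces this per-frame argmax with a beam search that rescores candidate word sequences with the n-gram model, which is where the WER gap in the table below comes from.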
+
+### Eval results on Common Voice 7 "test" (WER):
+
+| Without LM | With LM (run `./eval.py`) |
+|---|---|
+| 60.61 | 49.97 |