anuragshas committed on
Commit
2defdf4
1 Parent(s): d5b6a47

Update README.md

Files changed (1)
  1. README.md +51 -2
README.md CHANGED
@@ -1,12 +1,30 @@
 ---
+language:
+- ur
 license: apache-2.0
 tags:
 - generated_from_trainer
 datasets:
-- common_voice
+- mozilla-foundation/common_voice_8_0
+metrics:
+- wer
 model-index:
 - name: wav2vec2-large-xls-r-300m-ur-cv8
-  results: []
+  results:
+  - task:
+      type: automatic-speech-recognition
+      name: Speech Recognition
+    dataset:
+      type: mozilla-foundation/common_voice_8_0
+      name: Common Voice 8
+      args: ur
+    metrics:
+    - type: wer # Required. Example: wer
+      value: 42.376 # Required. Example: 20.90
+      name: Test WER # Optional. Example: Test WER
+    - name: Test CER
+      type: cer
+      value: 18.180
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -71,3 +89,34 @@ The following hyperparameters were used during training:
 - Pytorch 1.10.0+cu111
 - Datasets 1.18.1
 - Tokenizers 0.11.0
+
+```bash
+python eval.py --model_id anuragshas/wav2vec2-large-xls-r-300m-ur-cv8 --dataset mozilla-foundation/common_voice_8_0 --config ur --split test
+```
+
+
+### Inference With LM
+
+```python
+import torch
+from datasets import load_dataset
+from transformers import AutoModelForCTC, AutoProcessor
+import torchaudio.functional as F
+model_id = "anuragshas/wav2vec2-large-xls-r-300m-ur-cv8"
+sample_iter = iter(load_dataset("mozilla-foundation/common_voice_8_0", "ur", split="test", streaming=True, use_auth_token=True))
+sample = next(sample_iter)
+resampled_audio = F.resample(torch.tensor(sample["audio"]["array"]), 48_000, 16_000).numpy()
+model = AutoModelForCTC.from_pretrained(model_id)
+processor = AutoProcessor.from_pretrained(model_id)
+input_values = processor(resampled_audio, return_tensors="pt").input_values
+with torch.no_grad():
+    logits = model(input_values).logits
+transcription = processor.batch_decode(logits.numpy()).text
+# => "اب نے ٹ پیس ان لیتے ہیں"
+```
+
+### Eval results on Common Voice 8 "test" (WER):
+
+| Without LM | With LM (run `./eval.py`) |
+|---|---|
+| 52.146 | 42.376 |