anuragshas committed on
Commit
5ea0cd6
1 Parent(s): 85cc2a2

Update README.md

Files changed (1)
  1. README.md +58 -4
README.md CHANGED
@@ -1,18 +1,37 @@
1
  ---
 
 
2
  license: apache-2.0
3
  tags:
4
  - generated_from_trainer
 
5
  datasets:
6
- - common_voice
 
 
7
  model-index:
8
- - name: wav2vec2-large-xls-r-300m-pa-in
9
-   results: []
10
  ---
11
 
12
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
13
  should probably proofread and complete it, then remove this comment. -->
14
 
15
- # wav2vec2-large-xls-r-300m-pa-in
16
 
17
  This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the common_voice dataset.
18
  It achieves the following results on the evaluation set:
@@ -67,3 +86,38 @@ The following hyperparameters were used during training:
67
  - Pytorch 1.10.0+cu111
68
  - Datasets 1.18.0
69
  - Tokenizers 0.10.3
 
1
  ---
2
+ language:
3
+ - pa
4
  license: apache-2.0
5
  tags:
6
  - generated_from_trainer
7
+ - robust-speech-event
8
  datasets:
9
+ - mozilla-foundation/common_voice_7_0
10
+ metrics:
11
+ - wer
12
  model-index:
13
+ - name: XLS-R-300M - Punjabi
14
+   results:
15
+   - task:
16
+       type: automatic-speech-recognition
17
+       name: Speech Recognition
18
+     dataset:
19
+       type: mozilla-foundation/common_voice_7_0
20
+       name: Common Voice 7
21
+       args: pa-IN
22
+     metrics:
23
+     - type: wer # Required. Example: wer
24
+       value: 45.611 # Required. Example: 20.90
25
+       name: Test WER # Optional. Example: Test WER
26
+     - name: Test CER
27
+       type: cer
28
+       value: 15.584
29
  ---
30
 
31
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
32
  should probably proofread and complete it, then remove this comment. -->
33
 
34
+ # XLS-R-300M - Punjabi
35
 
36
  This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the common_voice dataset.
37
  It achieves the following results on the evaluation set:
 
86
  - Pytorch 1.10.0+cu111
87
  - Datasets 1.18.0
88
  - Tokenizers 0.10.3
89
+
90
+
91
+ #### Evaluation Commands
92
+ 1. To evaluate on `mozilla-foundation/common_voice_7_0` with split `test`
93
+
94
+ ```bash
95
+ python eval.py --model_id anuragshas/wav2vec2-large-xls-r-300m-pa-in --dataset mozilla-foundation/common_voice_7_0 --config pa-IN --split test
96
+ ```
97
+
98
+
99
+ ### Inference With LM
100
+
101
+ ```python
102
+ import torch
103
+ from datasets import load_dataset
104
+ from transformers import AutoModelForCTC, AutoProcessor
105
+ import torchaudio.functional as F
106
+ model_id = "anuragshas/wav2vec2-large-xls-r-300m-pa-in"
107
+ sample_iter = iter(load_dataset("mozilla-foundation/common_voice_7_0", "pa-IN", split="test", streaming=True, use_auth_token=True))
108
+ sample = next(sample_iter)
109
+ resampled_audio = F.resample(torch.tensor(sample["audio"]["array"]), 48_000, 16_000).numpy()
110
+ model = AutoModelForCTC.from_pretrained(model_id)
111
+ processor = AutoProcessor.from_pretrained(model_id)
112
+ input_values = processor(resampled_audio, return_tensors="pt").input_values
113
+ with torch.no_grad():
114
+     logits = model(input_values).logits
115
+ transcription = processor.batch_decode(logits.numpy()).text
116
+ # => "ਉਨ੍ਹਾਂ ਨੇ ਸਾਰੇ ਤੇਅਰਵੇ ਵੱਖਰੀ ਕਿਸਮ ਦੇ ਕੀਤੇ ਹਨ"
117
+ ```
118
+
119
+ ### Eval results on Common Voice 7 "test" (WER):
120
+
121
+ | Without LM | With LM (run `./eval.py`) |
122
+ |---|---|
123
+ | 51.968 | 45.611 |
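
Reading note (not part of the commit): the table above reports word error rate (WER) as a percentage. The sketch below is a minimal illustration, assuming the usual definition of WER as word-level edit distance divided by the number of reference words. It is not the logic of the repository's `eval.py`, which is not shown in this diff. The helper names `word_error_rate` and `relative_reduction` are hypothetical and introduced only for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent, assuming whitespace tokenization (an assumption)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only one previous row in memory.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return 100.0 * prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction in percent between two error rates."""
    return 100.0 * (baseline - improved) / baseline


# One substitution plus one deletion against four reference words.
toy_wer = word_error_rate("a b c d", "a x c")

# Figures taken from the table above: 51.968 without LM, 45.611 with LM.
lm_gain = relative_reduction(51.968, 45.611)
print(f"toy WER: {toy_wer:.1f}%, relative WER reduction from the LM: {lm_gain:.2f}%")
```

With the table's figures, the language model removes roughly 12% of the word errors in relative terms (an absolute drop of 6.357 points).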