devasheeshG committed on
Commit fbc6b69
1 Parent(s): 94be451

added benchmarks

Files changed (2):
  1. README.md +100 -4
  2. __init__.py +0 -16
README.md CHANGED
@@ -4,7 +4,76 @@ pipeline_tag: automatic-speech-recognition
 tags:
 - pytorch
 - audio
+- speech
 - automatic-speech-recognition
+- whisper
+- wav2vec2
+
+model-index:
+- name: whisper_medium_fp16_transformers
+  results:
+  - task:
+      type: automatic-speech-recognition
+      name: Automatic Speech Recognition
+    dataset:
+      type: common_voice
+      name: Common Voice (14.0) (Hindi) (test.tsv -> 2557 samples used)
+    metrics:
+    - type: wer
+      value: 1.7
+      name: Test WER
+      description: Word Error Rate
+    - type: mer
+      value: 1.1
+      name: Test MER
+      description: Match Error Rate
+    - type: wil
+      value: 3,584
+      name: Test WIL
+      description: Word Information Lost
+    - type: wip
+      value: 112
+      name: Test WIP
+      description: Word Information Preserved
+    - type: cer
+      value: 1.7
+      name: Test CER
+      description: Character Error Rate
+
+  - task:
+      type: automatic-speech-recognition
+      name: Automatic Speech Recognition
+    dataset:
+      type: common_voice
+      name: Common Voice (14.0) (English) (test.tsv -> 2557 samples used)
+    metrics:
+    - type: wer
+      value: -
+      name: Test WER
+      description: Word Error Rate
+    - type: mer
+      value: -
+      name: Test MER
+      description: Match Error Rate
+    - type: wil
+      value: -
+      name: Test WIL
+      description: Word Information Lost
+    - type: wip
+      value: -
+      name: Test WIP
+      description: Word Information Preserved
+    - type: cer
+      value: -
+      name: Test CER
+      description: Character Error Rate
+
+widget:
+- example_title: Librispeech sample 1
+  src: https://cdn-media.huggingface.co/speech_samples/sample1.flac
+- example_title: Librispeech sample 2
+  src: https://cdn-media.huggingface.co/speech_samples/sample2.flac
+
 language:
 - en
 - zh
@@ -117,7 +186,7 @@ language:
 * transformers Version: 4.30.2
 * accelerate Version: 0.20.3
 
-## BENCHMARK:
+## Model Benchmarks:
 
 - RAM: 2.8 GB (Original_Model: 5.5 GB)
 - VRAM: 1812 MB (Original_Model: 6 GB)
@@ -130,17 +199,44 @@ language:
 | 1660 Super | OOM | 3.3 | 1,408 | - |
 | Colab (Tesla T4) | 2.8 | 2.2 | 2,560 | 320 |
 | Colab (CPU) | 35 | - | - | - |
+| M1 (CPU) | - | - | - | - |
+| M1 (GPU -> 'mps') | - | - | - | - |
+
 
 - **NOTE: TensorCores are efficient in mixed-precision calculations**
-- CPU -> torch.float16 not supported on CPU (AMD Ryzen 5 3600 or Colab GPU)
+- **CPU -> torch.float16 not supported on CPU (AMD Ryzen 5 3600 or Colab GPU)**
 - Punctuation: True
 
-## Usage
+## Model Error Benchmarks:
+
+- **WER: Word Error Rate**
+- **MER: Match Error Rate**
+- **WIL: Word Information Lost**
+- **WIP: Word Information Preserved**
+- **CER: Character Error Rate**
 
+### Hindi (test.tsv -> 2557 samples used) [Common Voice 14.0](https://commonvoice.mozilla.org/en/datasets)
+| | WER | MER | WIL | WIP | CER |
+| ----------------- | --- | --- | --- | --- | --- |
+| Original_Model | - | - | - | - | - |
+| This_Model | - | - | - | - | - |
+
+### English
+| | WER | MER | WIL | WIP | CER |
+| ----------------- | --- | --- | --- | --- | --- |
+| Original_Model | - | - | - | - | - |
+| This_Model | - | - | - | - | - |
+
+- **The 'jiwer' library is used for the calculations**
+
+## Code:
+- ### [$\textbf{Will soon be uploaded to GitHub}$](https://github.com/devasheeshG)
+
+## Usage
 A file ``__init__.py`` is included in this repo and contains all the code needed to use this model.
 
 Firstly, clone this repo and place all the files inside a folder.
-# Make sure you have git-lfs installed (https://git-lfs.com)
+### Make sure you have git-lfs installed (https://git-lfs.com)
 ```bash
 git lfs install
 git clone https://huggingface.co/devasheeshG/whisper_medium_fp16_transformers
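
The new "Model Error Benchmarks" section defines WER, MER, WIL, WIP, and CER and notes that the `jiwer` library is used for the calculations. A minimal sketch of how those five metrics can be computed with `jiwer` (the reference/hypothesis strings below are made-up placeholders, not Common Voice samples):

```python
# Sketch: computing the five error metrics listed in the README with jiwer.
# The reference/hypothesis pairs are illustrative placeholders only.
import jiwer

references = [
    "the quick brown fox jumps over the lazy dog",
    "speech recognition converts audio into text",
]
hypotheses = [
    "the quick brown fox jumped over the lazy dog",
    "speech recognition converts audio in to text",
]

print("WER:", jiwer.wer(references, hypotheses))  # Word Error Rate
print("MER:", jiwer.mer(references, hypotheses))  # Match Error Rate
print("WIL:", jiwer.wil(references, hypotheses))  # Word Information Lost
print("WIP:", jiwer.wip(references, hypotheses))  # Word Information Preserved
print("CER:", jiwer.cer(references, hypotheses))  # Character Error Rate
```

Each function accepts either a single string or a list of strings, so the same calls work for a full test split such as the 2557-sample Common Voice subset mentioned above.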
 
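
The benchmark bullets compare this checkpoint with the original model (2.8 GB vs 5.5 GB RAM, 1812 MB vs 6 GB VRAM), which is what casting the weights to float16 would give. The conversion script is not part of this commit, so the following is only a sketch of how such a checkpoint could be produced with `transformers`, assuming it starts from `openai/whisper-medium`; the output folder name is illustrative:

```python
# Sketch: producing a float16 copy of Whisper medium with transformers.
# Assumes the checkpoint starts from openai/whisper-medium; the actual
# conversion script for this repo is not shown in the commit.
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-medium")
processor = WhisperProcessor.from_pretrained("openai/whisper-medium")

model = model.half()  # cast every floating-point weight to torch.float16

# Hypothetical output folder; the saved fp16 weights are roughly half the original size.
model.save_pretrained("whisper_medium_fp16_transformers")
processor.save_pretrained("whisper_medium_fp16_transformers")
```

Halving the width of each weight matches the roughly 2x drop in RAM and VRAM reported in the table, while the speed benefit of fp16 inference depends on the GPU, as the note about TensorCores points out.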
__init__.py CHANGED
@@ -1,19 +1,3 @@
-"""
-CUDA: 12.1
-cuDNN Version: 8.9.2.26_1.0-1_amd64
-Tensorflow Version: 2.12.0
-Torch Version: 2.1.0.dev20230606+cu121
-Transformers Version: 4.30.2
-BENCHMARK:
-- RAM: 2.8 GB
-- VRAM: 1812 MB
-- test.wav: 23 s
-- GPU (3060) -> 1.1s (TensorCore is used for fp16 inference)
-- GPU (1660S) -> 3.3s
-- CPU -> torch.float16 not supported on CPU (Ryzen 5 3600)
-- Punctuation: True
-"""
-
 from transformers import (
     WhisperForConditionalGeneration, WhisperProcessor, WhisperConfig
 )
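
After this change ``__init__.py`` starts directly with the `WhisperForConditionalGeneration`, `WhisperProcessor`, and `WhisperConfig` imports; the rest of the file is not shown in the diff. As a rough sketch of how the cloned checkpoint might be loaded and used for transcription with plain `transformers` (the local folder name, `test.wav`, and the float32 CPU fallback are assumptions):

```python
# Sketch: loading the cloned fp16 checkpoint and transcribing one file.
# "whisper_medium_fp16_transformers" is the folder produced by `git clone`;
# the audio path and the float32 fallback on CPU are assumptions, since the
# full __init__.py is not shown in this diff.
import torch
import librosa
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model_dir = "whisper_medium_fp16_transformers"
device = "cuda" if torch.cuda.is_available() else "cpu"
# torch.float16 inference is not supported on CPU (see the README note), so fall back to float32 there.
dtype = torch.float16 if device == "cuda" else torch.float32

processor = WhisperProcessor.from_pretrained(model_dir)
model = WhisperForConditionalGeneration.from_pretrained(model_dir, torch_dtype=dtype).to(device)

# Whisper expects 16 kHz mono audio.
audio, _ = librosa.load("test.wav", sr=16000, mono=True)
input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
input_features = input_features.to(device, dtype=dtype)

with torch.no_grad():
    predicted_ids = model.generate(input_features)

print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```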