cociweb committed on
Commit
971d352
•
1 Parent(s): 85f3b62

Quantized models added

fp16/README.md ADDED
@@ -0,0 +1,45 @@
+ ---
+ language:
+ - hu
+ tags:
+ - audio
+ - automatic-speech-recognition
+ datasets:
+ - mozilla-foundation/common_voice_16_0
+ base_model: openai/whisper-small
+ license: mit
+ library_name: ctranslate2
+ ---
+
+ # Whisper small model for CTranslate2
+
+ This repository contains a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) converted to the [CTranslate2](https://github.com/OpenNMT/CTranslate2) model format. The fine-tuning was done by [@sarpba](https://huggingface.co/sarpba) on the Mozilla Foundation Common Voice 16 dataset.
+
+ This model can be used in CTranslate2 or in projects based on CTranslate2, such as [faster-whisper](https://github.com/systran/faster-whisper).
+
+ ## Example
+
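+ ```python
+ from faster_whisper import WhisperModel
+
+ # "small" would load the stock Whisper small model, not this fine-tune.
+ # Assumes the converted model directory is available locally under its --output_dir name.
+ model = WhisperModel("faster-whisper-small-cv16-fp16.hu")
+
+ segments, info = model.transcribe("audio.mp3")
+ for segment in segments:
+     print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
+ ```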
+
+ ## Conversion details
+
+ The original model was converted with the following command:
+
+ ```
+ ct2-transformers-converter --model Hungarians/whisper-small-cv16-hu --output_dir faster-whisper-small-cv16-fp16.hu \
+     --quantization float16 --low_cpu_mem_usage --copy_files tokenizer_config.json preprocessor_config.json
+ ```
+
+ Note that the model weights are saved in FP16. This type can be changed when the model is loaded using the [`compute_type` option in CTranslate2](https://opennmt.net/CTranslate2/quantization.html).
+
+ ## More information
+
+ **For more information about the original model, see its [model card](https://huggingface.co/Hungarians/whisper-small-cv16-hu).**
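As a hedged sketch of how the converted weights in this folder might be loaded: the snippet assumes the `fp16/` directory from this commit has been downloaded to `./fp16`, and the helper names `pick_compute_type` and `transcribe_hu` are illustrative rather than part of the repository. `WhisperModel` accepts a local directory, and `compute_type` selects the precision at load time. FP16 is mainly a GPU type, so the sketch falls back to `int8` on CPU, which is a policy choice and not something the README prescribes.

```python
def pick_compute_type(device: str) -> str:
    """Return a compute type suited to the device (illustrative policy)."""
    # FP16 weights run natively on CUDA; on CPU, int8 is a common choice.
    return "float16" if device == "cuda" else "int8"


def transcribe_hu(audio_path: str, model_dir: str = "./fp16", device: str = "cpu"):
    """Yield (start, end, text) tuples for a Hungarian recording."""
    # Imported lazily so pick_compute_type works without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(
        model_dir,
        device=device,
        compute_type=pick_compute_type(device),
    )
    # Pass the language explicitly rather than relying on auto-detection.
    segments, _info = model.transcribe(audio_path, language="hu")
    for segment in segments:
        yield segment.start, segment.end, segment.text
```

Iterating over `transcribe_hu("audio.mp3")` would then print or collect the segments; the file name is the placeholder already used in the README example.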
fp16/config.json ADDED
@@ -0,0 +1,485 @@
1
+ {
2
+ "alignment_heads": [
3
+ [
4
+ 6,
5
+ 0
6
+ ],
7
+ [
8
+ 6,
9
+ 1
10
+ ],
11
+ [
12
+ 6,
13
+ 2
14
+ ],
15
+ [
16
+ 6,
17
+ 3
18
+ ],
19
+ [
20
+ 6,
21
+ 4
22
+ ],
23
+ [
24
+ 6,
25
+ 5
26
+ ],
27
+ [
28
+ 6,
29
+ 6
30
+ ],
31
+ [
32
+ 6,
33
+ 7
34
+ ],
35
+ [
36
+ 6,
37
+ 8
38
+ ],
39
+ [
40
+ 6,
41
+ 9
42
+ ],
43
+ [
44
+ 6,
45
+ 10
46
+ ],
47
+ [
48
+ 6,
49
+ 11
50
+ ],
51
+ [
52
+ 7,
53
+ 0
54
+ ],
55
+ [
56
+ 7,
57
+ 1
58
+ ],
59
+ [
60
+ 7,
61
+ 2
62
+ ],
63
+ [
64
+ 7,
65
+ 3
66
+ ],
67
+ [
68
+ 7,
69
+ 4
70
+ ],
71
+ [
72
+ 7,
73
+ 5
74
+ ],
75
+ [
76
+ 7,
77
+ 6
78
+ ],
79
+ [
80
+ 7,
81
+ 7
82
+ ],
83
+ [
84
+ 7,
85
+ 8
86
+ ],
87
+ [
88
+ 7,
89
+ 9
90
+ ],
91
+ [
92
+ 7,
93
+ 10
94
+ ],
95
+ [
96
+ 7,
97
+ 11
98
+ ],
99
+ [
100
+ 8,
101
+ 0
102
+ ],
103
+ [
104
+ 8,
105
+ 1
106
+ ],
107
+ [
108
+ 8,
109
+ 2
110
+ ],
111
+ [
112
+ 8,
113
+ 3
114
+ ],
115
+ [
116
+ 8,
117
+ 4
118
+ ],
119
+ [
120
+ 8,
121
+ 5
122
+ ],
123
+ [
124
+ 8,
125
+ 6
126
+ ],
127
+ [
128
+ 8,
129
+ 7
130
+ ],
131
+ [
132
+ 8,
133
+ 8
134
+ ],
135
+ [
136
+ 8,
137
+ 9
138
+ ],
139
+ [
140
+ 8,
141
+ 10
142
+ ],
143
+ [
144
+ 8,
145
+ 11
146
+ ],
147
+ [
148
+ 9,
149
+ 0
150
+ ],
151
+ [
152
+ 9,
153
+ 1
154
+ ],
155
+ [
156
+ 9,
157
+ 2
158
+ ],
159
+ [
160
+ 9,
161
+ 3
162
+ ],
163
+ [
164
+ 9,
165
+ 4
166
+ ],
167
+ [
168
+ 9,
169
+ 5
170
+ ],
171
+ [
172
+ 9,
173
+ 6
174
+ ],
175
+ [
176
+ 9,
177
+ 7
178
+ ],
179
+ [
180
+ 9,
181
+ 8
182
+ ],
183
+ [
184
+ 9,
185
+ 9
186
+ ],
187
+ [
188
+ 9,
189
+ 10
190
+ ],
191
+ [
192
+ 9,
193
+ 11
194
+ ],
195
+ [
196
+ 10,
197
+ 0
198
+ ],
199
+ [
200
+ 10,
201
+ 1
202
+ ],
203
+ [
204
+ 10,
205
+ 2
206
+ ],
207
+ [
208
+ 10,
209
+ 3
210
+ ],
211
+ [
212
+ 10,
213
+ 4
214
+ ],
215
+ [
216
+ 10,
217
+ 5
218
+ ],
219
+ [
220
+ 10,
221
+ 6
222
+ ],
223
+ [
224
+ 10,
225
+ 7
226
+ ],
227
+ [
228
+ 10,
229
+ 8
230
+ ],
231
+ [
232
+ 10,
233
+ 9
234
+ ],
235
+ [
236
+ 10,
237
+ 10
238
+ ],
239
+ [
240
+ 10,
241
+ 11
242
+ ],
243
+ [
244
+ 11,
245
+ 0
246
+ ],
247
+ [
248
+ 11,
249
+ 1
250
+ ],
251
+ [
252
+ 11,
253
+ 2
254
+ ],
255
+ [
256
+ 11,
257
+ 3
258
+ ],
259
+ [
260
+ 11,
261
+ 4
262
+ ],
263
+ [
264
+ 11,
265
+ 5
266
+ ],
267
+ [
268
+ 11,
269
+ 6
270
+ ],
271
+ [
272
+ 11,
273
+ 7
274
+ ],
275
+ [
276
+ 11,
277
+ 8
278
+ ],
279
+ [
280
+ 11,
281
+ 9
282
+ ],
283
+ [
284
+ 11,
285
+ 10
286
+ ],
287
+ [
288
+ 11,
289
+ 11
290
+ ]
291
+ ],
292
+ "lang_ids": [
293
+ 50259,
294
+ 50260,
295
+ 50261,
296
+ 50262,
297
+ 50263,
298
+ 50264,
299
+ 50265,
300
+ 50266,
301
+ 50267,
302
+ 50268,
303
+ 50269,
304
+ 50270,
305
+ 50271,
306
+ 50272,
307
+ 50273,
308
+ 50274,
309
+ 50275,
310
+ 50276,
311
+ 50277,
312
+ 50278,
313
+ 50279,
314
+ 50280,
315
+ 50281,
316
+ 50282,
317
+ 50283,
318
+ 50284,
319
+ 50285,
320
+ 50286,
321
+ 50287,
322
+ 50288,
323
+ 50289,
324
+ 50290,
325
+ 50291,
326
+ 50292,
327
+ 50293,
328
+ 50294,
329
+ 50295,
330
+ 50296,
331
+ 50297,
332
+ 50298,
333
+ 50299,
334
+ 50300,
335
+ 50301,
336
+ 50302,
337
+ 50303,
338
+ 50304,
339
+ 50305,
340
+ 50306,
341
+ 50307,
342
+ 50308,
343
+ 50309,
344
+ 50310,
345
+ 50311,
346
+ 50312,
347
+ 50313,
348
+ 50314,
349
+ 50315,
350
+ 50316,
351
+ 50317,
352
+ 50318,
353
+ 50319,
354
+ 50320,
355
+ 50321,
356
+ 50322,
357
+ 50323,
358
+ 50324,
359
+ 50325,
360
+ 50326,
361
+ 50327,
362
+ 50328,
363
+ 50329,
364
+ 50330,
365
+ 50331,
366
+ 50332,
367
+ 50333,
368
+ 50334,
369
+ 50335,
370
+ 50336,
371
+ 50337,
372
+ 50338,
373
+ 50339,
374
+ 50340,
375
+ 50341,
376
+ 50342,
377
+ 50343,
378
+ 50344,
379
+ 50345,
380
+ 50346,
381
+ 50347,
382
+ 50348,
383
+ 50349,
384
+ 50350,
385
+ 50351,
386
+ 50352,
387
+ 50353,
388
+ 50354,
389
+ 50355,
390
+ 50356,
391
+ 50357
392
+ ],
393
+ "suppress_ids": [
394
+ 1,
395
+ 2,
396
+ 7,
397
+ 8,
398
+ 9,
399
+ 10,
400
+ 14,
401
+ 25,
402
+ 26,
403
+ 27,
404
+ 28,
405
+ 29,
406
+ 31,
407
+ 58,
408
+ 59,
409
+ 60,
410
+ 61,
411
+ 62,
412
+ 63,
413
+ 90,
414
+ 91,
415
+ 92,
416
+ 93,
417
+ 359,
418
+ 503,
419
+ 522,
420
+ 542,
421
+ 873,
422
+ 893,
423
+ 902,
424
+ 918,
425
+ 922,
426
+ 931,
427
+ 1350,
428
+ 1853,
429
+ 1982,
430
+ 2460,
431
+ 2627,
432
+ 3246,
433
+ 3253,
434
+ 3268,
435
+ 3536,
436
+ 3846,
437
+ 3961,
438
+ 4183,
439
+ 4667,
440
+ 6585,
441
+ 6647,
442
+ 7273,
443
+ 9061,
444
+ 9383,
445
+ 10428,
446
+ 10929,
447
+ 11938,
448
+ 12033,
449
+ 12331,
450
+ 12562,
451
+ 13793,
452
+ 14157,
453
+ 14635,
454
+ 15265,
455
+ 15618,
456
+ 16553,
457
+ 16604,
458
+ 18362,
459
+ 18956,
460
+ 20075,
461
+ 21675,
462
+ 22520,
463
+ 26130,
464
+ 26161,
465
+ 26435,
466
+ 28279,
467
+ 29464,
468
+ 31650,
469
+ 32302,
470
+ 32470,
471
+ 36865,
472
+ 42863,
473
+ 47425,
474
+ 49870,
475
+ 50254,
476
+ 50258,
477
+ 50360,
478
+ 50361,
479
+ 50362
480
+ ],
481
+ "suppress_ids_begin": [
482
+ 220,
483
+ 50257
484
+ ]
485
+ }
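The `alignment_heads` list above enumerates every (layer, head) pair for layers 6 through 11 and heads 0 through 11, and `lang_ids` is the contiguous range 50259 to 50357. A short sketch that regenerates both lists can serve as a sanity check on a converted `config.json`. The variable names are illustrative, and reading the list as "all heads of the upper half of the decoder" is an interpretation of the values shown, not something this commit states.

```python
import itertools

NUM_LAYERS = 12  # decoder_layers in fp32/config.json
NUM_HEADS = 12   # decoder_attention_heads in fp32/config.json

# Every head of the upper half of the decoder layers, in the order listed above.
alignment_heads = [
    [layer, head]
    for layer, head in itertools.product(
        range(NUM_LAYERS // 2, NUM_LAYERS), range(NUM_HEADS)
    )
]

# Language token IDs form one contiguous block.
lang_ids = list(range(50259, 50358))

print(len(alignment_heads), alignment_heads[0], alignment_heads[-1])
print(len(lang_ids), lang_ids[0], lang_ids[-1])
```

The counts come out as 72 head pairs and 99 language IDs, matching the number of entries in the JSON.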
fp16/hash.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "README.md": "7fa4402e90a36ea3c3c5546d6e94a652",
+   "config.json": "72c97d7fb45f7a607145d121b718ee65",
+   "model.bin": "3b0927dcce933b1d86300ab3cb5bca78",
+   "preprocessor_config.json": "15d1d7ee1cc6801b71f8ab68966aed86",
+   "tokenizer_config.json": "ea5ff3bfa7553fabbfbcb02846302bbc",
+   "vocabulary.json": "aebe7623626c8f3f61cc5208ff29c348",
+   "vocabulary.txt": "980d7011195d0c733bd374e31708717f",
+   "hash.json": "d41d8cd98f00b204e9800998ecf8427e"
+ }
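The 32-character hexadecimal values in `hash.json` have the length of MD5 digests, and the entry for `hash.json` itself, `d41d8cd98f00b204e9800998ecf8427e`, is the MD5 of empty input, which suggests the manifest was hashed before anything had been written to it. Assuming the other entries are MD5 digests of the sibling files, a downloaded folder could be checked with a sketch like the following. The function names are illustrative and not part of the repository.

```python
import hashlib
import json
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through MD5 so large files such as model.bin are not read at once."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_folder(folder: str) -> dict:
    """Return {filename: True/False} for every entry listed in hash.json."""
    root = Path(folder)
    expected = json.loads((root / "hash.json").read_text())
    results = {}
    for name, checksum in expected.items():
        if name == "hash.json":
            continue  # the manifest cannot contain its own final digest
        target = root / name
        results[name] = target.is_file() and md5_of(target) == checksum
    return results


# The digest recorded for hash.json is the MD5 of zero bytes.
EMPTY_MD5 = hashlib.md5(b"").hexdigest()
```

Calling `verify_folder("fp16")` on a local copy would flag any file whose digest differs from the manifest. MD5 is adequate for spotting corrupted downloads but not for security purposes.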
fp16/model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e03995062195aade7640ead072fa81cea85948077e0e7d7d8da3866ff79c532a
+ size 483546977
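`model.bin` is stored through Git LFS, so the diff shows only the pointer file: a spec version, a SHA-256 object ID, and the size in bytes. Below is a minimal sketch of parsing such a pointer and checking a downloaded file against it. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical.

```python
import hashlib
from pathlib import Path

POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:e03995062195aade7640ead072fa81cea85948077e0e7d7d8da3866ff79c532a
size 483546977
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict) -> bool:
    """Check the file size first (cheap), then the SHA-256 digest."""
    if path.stat().st_size != pointer["size"]:
        return False
    sha = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            sha.update(chunk)
    return sha.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algorithm"], pointer["size"])
```

With the real weights on disk, `matches_pointer(Path("fp16/model.bin"), pointer)` would confirm that the download is complete and unmodified.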
fp16/preprocessor_config.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "chunk_length": 30,
+   "feature_extractor_type": "WhisperFeatureExtractor",
+   "feature_size": 80,
+   "hop_length": 160,
+   "n_fft": 400,
+   "n_samples": 480000,
+   "nb_max_frames": 3000,
+   "padding_side": "right",
+   "padding_value": 0.0,
+   "processor_class": "WhisperProcessor",
+   "return_attention_mask": false,
+   "sampling_rate": 16000
+ }
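The values in `preprocessor_config.json` are related by simple arithmetic: a 30-second chunk at 16 kHz is 480000 samples, and stepping through it with a hop of 160 samples gives 3000 frames. A small check, with variable names mirroring the JSON keys (`frame_ms` is an added, illustrative name):

```python
chunk_length = 30        # seconds of audio per chunk
sampling_rate = 16000    # samples per second
hop_length = 160         # samples between successive frames

n_samples = chunk_length * sampling_rate
nb_max_frames = n_samples // hop_length
frame_ms = 1000 * hop_length / sampling_rate  # duration covered by one hop

print(n_samples, nb_max_frames, frame_ms)
```

Each frame therefore advances by 10 ms. The `max_source_positions` value of 1500 in `fp32/config.json` is half of 3000, which is consistent with the Whisper encoder downsampling its input by a factor of two.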
fp16/tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff
 
vocabulary.json → fp16/vocabulary.json RENAMED
File without changes
vocabulary.txt → fp16/vocabulary.txt RENAMED
File without changes
fp32/README.md ADDED
@@ -0,0 +1,99 @@
+ ---
+ license: apache-2.0
+ base_model: openai/whisper-small
+ tags:
+ - hf-asr-leaderboard
+ - generated_from_trainer
+ datasets:
+ - mozilla-foundation/common_voice_16_0
+ language:
+ - hu
+ widget:
+ - example_title: Sample 1
+   src: https://huggingface.co/datasets/Hungarians/samples/resolve/main/Sample1.flac
+ - example_title: Sample 2
+   src: https://huggingface.co/datasets/Hungarians/samples/resolve/main/Sample2.flac
+ metrics:
+ - wer
+ pipeline_tag: automatic-speech-recognition
+ model-index:
+ - name: Whisper Small Hungarian
+   results:
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Common Voice 16.0 - Hungarian
+       type: mozilla-foundation/common_voice_16_0
+       config: hu
+       split: test
+       args: hu
+     metrics:
+     - name: Wer
+       type: wer
+       value: 18.8314
+       verified: true
+
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # Whisper Small Hungarian (training in progress)
+
+ This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the Mozilla Foundation Common Voice 16 dataset.
+ It achieves the following results on the evaluation set:
+
+ Temporary result at step 3500:
+
+ - Wer: 18.8314
+
+ Unfortunately, the Colab session disconnected, so training ended here. It may be continued later.
+
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 1.25e-05
+ - train_batch_size: 8
+ - eval_batch_size: 4
+ - seed: 42
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: constant_with_warmup
+ - lr_scheduler_warmup_steps: 400
+ - planned training_steps: 6000
+ - executed steps: 3500 only (Colab disconnected)
+ - mixed_precision_training: Native AMP
+
+ ### Training results
+
+ | Steps | Training Loss | Validation Loss | Wer Ortho | Wer |
+ |:-----:|:-------------:|:---------------:|:---------:|:---------:|
+ | 500 | 0.354600 | 0.349688 | 34.385555 | 31.246555 |
+ | 1000 | 0.283800 | 0.290485 | 29.696507 | 26.625776 |
+ | 1500 | 0.248800 | 0.255122 | 26.360826 | 23.300925 |
+ | 2000 | 0.198300 | 0.234539 | 24.557530 | 21.714145 |
+ | 2500 | 0.196300 | 0.224310 | 23.557423 | 20.698512 |
+ | 3000 | 0.153000 | 0.210894 | 22.088291 | 19.231356 |
+ | 3500 | 0.109100 | 0.210817 | 21.465313 | 18.831435 |
+
+ ### Framework versions
+
+ - Transformers 4.36.2
+ - Pytorch 2.1.0+cu121
+ - Datasets 2.16.0
+ - Tokenizers 0.15.0
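The `Wer` and `Wer Ortho` columns in the training results are word error rates in percent. The model card does not say how they were computed. As a rough illustration only, WER is the word-level edit distance between reference and hypothesis divided by the number of reference words. The sketch below is a plain-Python version. The assumption that `Wer Ortho` is scored on raw text while `Wer` is scored after normalization (lowercasing and punctuation removal here) is a guess based on the column names, not a description of the author's evaluation code, and the example sentence is made up.

```python
import string


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length, in percent."""
    ref, hyp = reference.split(), hypothesis.split()
    # previous[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return 100.0 * previous[-1] / len(ref)


def normalize(text: str) -> str:
    """Hypothetical normalizer: lowercase and drop ASCII punctuation."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))


reference = "Jó napot kívánok, uram."
hypothesis = "jó napot kívánok uram"

ortho = word_error_rate(reference, hypothesis)
normalized = word_error_rate(normalize(reference), normalize(hypothesis))
print(ortho, normalized)
```

In this toy case three of the four words differ only by case or punctuation, so the raw score is high while the normalized score is zero, which mirrors the direction of the gap between the two columns in the table. The function does not handle an empty reference.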
fp32/config.json ADDED
@@ -0,0 +1,152 @@
1
+ {
2
+ "_name_or_path": "openai/whisper-small",
3
+ "activation_dropout": 0.0,
4
+ "activation_function": "gelu",
5
+ "apply_spec_augment": false,
6
+ "architectures": [
7
+ "WhisperForConditionalGeneration"
8
+ ],
9
+ "attention_dropout": 0.0,
10
+ "begin_suppress_tokens": [
11
+ 220,
12
+ 50257
13
+ ],
14
+ "bos_token_id": 50257,
15
+ "classifier_proj_size": 256,
16
+ "d_model": 768,
17
+ "decoder_attention_heads": 12,
18
+ "decoder_ffn_dim": 3072,
19
+ "decoder_layerdrop": 0.0,
20
+ "decoder_layers": 12,
21
+ "decoder_start_token_id": 50258,
22
+ "dropout": 0.0,
23
+ "encoder_attention_heads": 12,
24
+ "encoder_ffn_dim": 3072,
25
+ "encoder_layerdrop": 0.0,
26
+ "encoder_layers": 12,
27
+ "eos_token_id": 50257,
28
+ "forced_decoder_ids": [
29
+ [
30
+ 1,
31
+ 50259
32
+ ],
33
+ [
34
+ 2,
35
+ 50359
36
+ ],
37
+ [
38
+ 3,
39
+ 50363
40
+ ]
41
+ ],
42
+ "init_std": 0.02,
43
+ "is_encoder_decoder": true,
44
+ "mask_feature_length": 10,
45
+ "mask_feature_min_masks": 0,
46
+ "mask_feature_prob": 0.0,
47
+ "mask_time_length": 10,
48
+ "mask_time_min_masks": 2,
49
+ "mask_time_prob": 0.05,
50
+ "max_length": 448,
51
+ "max_source_positions": 1500,
52
+ "max_target_positions": 448,
53
+ "median_filter_width": 7,
54
+ "model_type": "whisper",
55
+ "num_hidden_layers": 12,
56
+ "num_mel_bins": 80,
57
+ "pad_token_id": 50257,
58
+ "scale_embedding": false,
59
+ "suppress_tokens": [
60
+ 1,
61
+ 2,
62
+ 7,
63
+ 8,
64
+ 9,
65
+ 10,
66
+ 14,
67
+ 25,
68
+ 26,
69
+ 27,
70
+ 28,
71
+ 29,
72
+ 31,
73
+ 58,
74
+ 59,
75
+ 60,
76
+ 61,
77
+ 62,
78
+ 63,
79
+ 90,
80
+ 91,
81
+ 92,
82
+ 93,
83
+ 359,
84
+ 503,
85
+ 522,
86
+ 542,
87
+ 873,
88
+ 893,
89
+ 902,
90
+ 918,
91
+ 922,
92
+ 931,
93
+ 1350,
94
+ 1853,
95
+ 1982,
96
+ 2460,
97
+ 2627,
98
+ 3246,
99
+ 3253,
100
+ 3268,
101
+ 3536,
102
+ 3846,
103
+ 3961,
104
+ 4183,
105
+ 4667,
106
+ 6585,
107
+ 6647,
108
+ 7273,
109
+ 9061,
110
+ 9383,
111
+ 10428,
112
+ 10929,
113
+ 11938,
114
+ 12033,
115
+ 12331,
116
+ 12562,
117
+ 13793,
118
+ 14157,
119
+ 14635,
120
+ 15265,
121
+ 15618,
122
+ 16553,
123
+ 16604,
124
+ 18362,
125
+ 18956,
126
+ 20075,
127
+ 21675,
128
+ 22520,
129
+ 26130,
130
+ 26161,
131
+ 26435,
132
+ 28279,
133
+ 29464,
134
+ 31650,
135
+ 32302,
136
+ 32470,
137
+ 36865,
138
+ 42863,
139
+ 47425,
140
+ 49870,
141
+ 50254,
142
+ 50258,
143
+ 50360,
144
+ 50361,
145
+ 50362
146
+ ],
147
+ "torch_dtype": "float32",
148
+ "transformers_version": "4.36.2",
149
+ "use_cache": false,
150
+ "use_weighted_layer_sum": false,
151
+ "vocab_size": 51865
152
+ }
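`forced_decoder_ids` pins the tokens at decoder positions 1 to 3. In the multilingual Whisper vocabulary, 50259 is `<|en|>`, 50359 is `<|transcribe|>` and 50363 is `<|notimestamps|>`, so this config still forces the English language token unless the caller overrides the language at inference time. A small illustrative lookup follows; the table covers only the IDs that appear in this config, and the name `SPECIAL_TOKENS` is made up for the example.

```python
SPECIAL_TOKENS = {
    50257: "<|endoftext|>",
    50258: "<|startoftranscript|>",
    50259: "<|en|>",
    50359: "<|transcribe|>",
    50363: "<|notimestamps|>",
}

forced_decoder_ids = [[1, 50259], [2, 50359], [3, 50363]]
decoder_start_token_id = 50258

# Position 0 is the start token; positions 1..3 come from forced_decoder_ids.
prompt = [SPECIAL_TOKENS[decoder_start_token_id]]
prompt += [SPECIAL_TOKENS[token_id] for _position, token_id in forced_decoder_ids]
print("".join(prompt))
```

For Hungarian audio it is therefore safer to pass the language explicitly when transcribing than to rely on the defaults stored in this file.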
hash.json → fp32/hash.json RENAMED
File without changes
model.bin → fp32/model.bin RENAMED
File without changes
fp32/preprocessor_config.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "chunk_length": 30,
+   "feature_extractor_type": "WhisperFeatureExtractor",
+   "feature_size": 80,
+   "hop_length": 160,
+   "n_fft": 400,
+   "n_samples": 480000,
+   "nb_max_frames": 3000,
+   "padding_side": "right",
+   "padding_value": 0.0,
+   "processor_class": "WhisperProcessor",
+   "return_attention_mask": false,
+   "sampling_rate": 16000
+ }
fp32/tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff
 
fp32/vocabulary.json ADDED
The diff for this file is too large to render. See raw diff
 
fp32/vocabulary.txt ADDED
The diff for this file is too large to render. See raw diff
 
int8/README.md ADDED
@@ -0,0 +1,45 @@
+ ---
+ language:
+ - hu
+ tags:
+ - audio
+ - automatic-speech-recognition
+ datasets:
+ - mozilla-foundation/common_voice_16_0
+ base_model: openai/whisper-small
+ license: mit
+ library_name: ctranslate2
+ ---
+
+ # Whisper small model for CTranslate2
+
+ This repository contains a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) converted to the [CTranslate2](https://github.com/OpenNMT/CTranslate2) model format. The fine-tuning was done by [@sarpba](https://huggingface.co/sarpba) on the Mozilla Foundation Common Voice 16 dataset.
+
+ This model can be used in CTranslate2 or in projects based on CTranslate2, such as [faster-whisper](https://github.com/systran/faster-whisper).
+
+ ## Example
+
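+ ```python
+ from faster_whisper import WhisperModel
+
+ # "small" would load the stock Whisper small model, not this fine-tune.
+ # Assumes the converted model directory is available locally under its --output_dir name.
+ model = WhisperModel("faster-whisper-small-cv16-int8.hu")
+
+ segments, info = model.transcribe("audio.mp3")
+ for segment in segments:
+     print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
+ ```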
+
+ ## Conversion details
+
+ The original model was converted with the following command:
+
+ ```
+ ct2-transformers-converter --model Hungarians/whisper-small-cv16-hu --output_dir faster-whisper-small-cv16-int8.hu \
+     --quantization int8 --low_cpu_mem_usage --copy_files tokenizer_config.json preprocessor_config.json
+ ```
+
+ Note that the model weights are saved in INT8. This type can be changed when the model is loaded using the [`compute_type` option in CTranslate2](https://opennmt.net/CTranslate2/quantization.html).
+
+ ## More information
+
+ **For more information about the original model, see its [model card](https://huggingface.co/Hungarians/whisper-small-cv16-hu).**
int8/config.json ADDED
@@ -0,0 +1,485 @@
1
+ {
2
+ "alignment_heads": [
3
+ [
4
+ 6,
5
+ 0
6
+ ],
7
+ [
8
+ 6,
9
+ 1
10
+ ],
11
+ [
12
+ 6,
13
+ 2
14
+ ],
15
+ [
16
+ 6,
17
+ 3
18
+ ],
19
+ [
20
+ 6,
21
+ 4
22
+ ],
23
+ [
24
+ 6,
25
+ 5
26
+ ],
27
+ [
28
+ 6,
29
+ 6
30
+ ],
31
+ [
32
+ 6,
33
+ 7
34
+ ],
35
+ [
36
+ 6,
37
+ 8
38
+ ],
39
+ [
40
+ 6,
41
+ 9
42
+ ],
43
+ [
44
+ 6,
45
+ 10
46
+ ],
47
+ [
48
+ 6,
49
+ 11
50
+ ],
51
+ [
52
+ 7,
53
+ 0
54
+ ],
55
+ [
56
+ 7,
57
+ 1
58
+ ],
59
+ [
60
+ 7,
61
+ 2
62
+ ],
63
+ [
64
+ 7,
65
+ 3
66
+ ],
67
+ [
68
+ 7,
69
+ 4
70
+ ],
71
+ [
72
+ 7,
73
+ 5
74
+ ],
75
+ [
76
+ 7,
77
+ 6
78
+ ],
79
+ [
80
+ 7,
81
+ 7
82
+ ],
83
+ [
84
+ 7,
85
+ 8
86
+ ],
87
+ [
88
+ 7,
89
+ 9
90
+ ],
91
+ [
92
+ 7,
93
+ 10
94
+ ],
95
+ [
96
+ 7,
97
+ 11
98
+ ],
99
+ [
100
+ 8,
101
+ 0
102
+ ],
103
+ [
104
+ 8,
105
+ 1
106
+ ],
107
+ [
108
+ 8,
109
+ 2
110
+ ],
111
+ [
112
+ 8,
113
+ 3
114
+ ],
115
+ [
116
+ 8,
117
+ 4
118
+ ],
119
+ [
120
+ 8,
121
+ 5
122
+ ],
123
+ [
124
+ 8,
125
+ 6
126
+ ],
127
+ [
128
+ 8,
129
+ 7
130
+ ],
131
+ [
132
+ 8,
133
+ 8
134
+ ],
135
+ [
136
+ 8,
137
+ 9
138
+ ],
139
+ [
140
+ 8,
141
+ 10
142
+ ],
143
+ [
144
+ 8,
145
+ 11
146
+ ],
147
+ [
148
+ 9,
149
+ 0
150
+ ],
151
+ [
152
+ 9,
153
+ 1
154
+ ],
155
+ [
156
+ 9,
157
+ 2
158
+ ],
159
+ [
160
+ 9,
161
+ 3
162
+ ],
163
+ [
164
+ 9,
165
+ 4
166
+ ],
167
+ [
168
+ 9,
169
+ 5
170
+ ],
171
+ [
172
+ 9,
173
+ 6
174
+ ],
175
+ [
176
+ 9,
177
+ 7
178
+ ],
179
+ [
180
+ 9,
181
+ 8
182
+ ],
183
+ [
184
+ 9,
185
+ 9
186
+ ],
187
+ [
188
+ 9,
189
+ 10
190
+ ],
191
+ [
192
+ 9,
193
+ 11
194
+ ],
195
+ [
196
+ 10,
197
+ 0
198
+ ],
199
+ [
200
+ 10,
201
+ 1
202
+ ],
203
+ [
204
+ 10,
205
+ 2
206
+ ],
207
+ [
208
+ 10,
209
+ 3
210
+ ],
211
+ [
212
+ 10,
213
+ 4
214
+ ],
215
+ [
216
+ 10,
217
+ 5
218
+ ],
219
+ [
220
+ 10,
221
+ 6
222
+ ],
223
+ [
224
+ 10,
225
+ 7
226
+ ],
227
+ [
228
+ 10,
229
+ 8
230
+ ],
231
+ [
232
+ 10,
233
+ 9
234
+ ],
235
+ [
236
+ 10,
237
+ 10
238
+ ],
239
+ [
240
+ 10,
241
+ 11
242
+ ],
243
+ [
244
+ 11,
245
+ 0
246
+ ],
247
+ [
248
+ 11,
249
+ 1
250
+ ],
251
+ [
252
+ 11,
253
+ 2
254
+ ],
255
+ [
256
+ 11,
257
+ 3
258
+ ],
259
+ [
260
+ 11,
261
+ 4
262
+ ],
263
+ [
264
+ 11,
265
+ 5
266
+ ],
267
+ [
268
+ 11,
269
+ 6
270
+ ],
271
+ [
272
+ 11,
273
+ 7
274
+ ],
275
+ [
276
+ 11,
277
+ 8
278
+ ],
279
+ [
280
+ 11,
281
+ 9
282
+ ],
283
+ [
284
+ 11,
285
+ 10
286
+ ],
287
+ [
288
+ 11,
289
+ 11
290
+ ]
291
+ ],
292
+ "lang_ids": [
293
+ 50259,
294
+ 50260,
295
+ 50261,
296
+ 50262,
297
+ 50263,
298
+ 50264,
299
+ 50265,
300
+ 50266,
301
+ 50267,
302
+ 50268,
303
+ 50269,
304
+ 50270,
305
+ 50271,
306
+ 50272,
307
+ 50273,
308
+ 50274,
309
+ 50275,
310
+ 50276,
311
+ 50277,
312
+ 50278,
313
+ 50279,
314
+ 50280,
315
+ 50281,
316
+ 50282,
317
+ 50283,
318
+ 50284,
319
+ 50285,
320
+ 50286,
321
+ 50287,
322
+ 50288,
323
+ 50289,
324
+ 50290,
325
+ 50291,
326
+ 50292,
327
+ 50293,
328
+ 50294,
329
+ 50295,
330
+ 50296,
331
+ 50297,
332
+ 50298,
333
+ 50299,
334
+ 50300,
335
+ 50301,
336
+ 50302,
337
+ 50303,
338
+ 50304,
339
+ 50305,
340
+ 50306,
341
+ 50307,
342
+ 50308,
343
+ 50309,
344
+ 50310,
345
+ 50311,
346
+ 50312,
347
+ 50313,
348
+ 50314,
349
+ 50315,
350
+ 50316,
351
+ 50317,
352
+ 50318,
353
+ 50319,
354
+ 50320,
355
+ 50321,
356
+ 50322,
357
+ 50323,
358
+ 50324,
359
+ 50325,
360
+ 50326,
361
+ 50327,
362
+ 50328,
363
+ 50329,
364
+ 50330,
365
+ 50331,
366
+ 50332,
367
+ 50333,
368
+ 50334,
369
+ 50335,
370
+ 50336,
371
+ 50337,
372
+ 50338,
373
+ 50339,
374
+ 50340,
375
+ 50341,
376
+ 50342,
377
+ 50343,
378
+ 50344,
379
+ 50345,
380
+ 50346,
381
+ 50347,
382
+ 50348,
383
+ 50349,
384
+ 50350,
385
+ 50351,
386
+ 50352,
387
+ 50353,
388
+ 50354,
389
+ 50355,
390
+ 50356,
391
+ 50357
392
+ ],
393
+ "suppress_ids": [
394
+ 1,
395
+ 2,
396
+ 7,
397
+ 8,
398
+ 9,
399
+ 10,
400
+ 14,
401
+ 25,
402
+ 26,
403
+ 27,
404
+ 28,
405
+ 29,
406
+ 31,
407
+ 58,
408
+ 59,
409
+ 60,
410
+ 61,
411
+ 62,
412
+ 63,
413
+ 90,
414
+ 91,
415
+ 92,
416
+ 93,
417
+ 359,
418
+ 503,
419
+ 522,
420
+ 542,
421
+ 873,
422
+ 893,
423
+ 902,
424
+ 918,
425
+ 922,
426
+ 931,
427
+ 1350,
428
+ 1853,
429
+ 1982,
430
+ 2460,
431
+ 2627,
432
+ 3246,
433
+ 3253,
434
+ 3268,
435
+ 3536,
436
+ 3846,
437
+ 3961,
438
+ 4183,
439
+ 4667,
440
+ 6585,
441
+ 6647,
442
+ 7273,
443
+ 9061,
444
+ 9383,
445
+ 10428,
446
+ 10929,
447
+ 11938,
448
+ 12033,
449
+ 12331,
450
+ 12562,
451
+ 13793,
452
+ 14157,
453
+ 14635,
454
+ 15265,
455
+ 15618,
456
+ 16553,
457
+ 16604,
458
+ 18362,
459
+ 18956,
460
+ 20075,
461
+ 21675,
462
+ 22520,
463
+ 26130,
464
+ 26161,
465
+ 26435,
466
+ 28279,
467
+ 29464,
468
+ 31650,
469
+ 32302,
470
+ 32470,
471
+ 36865,
472
+ 42863,
473
+ 47425,
474
+ 49870,
475
+ 50254,
476
+ 50258,
477
+ 50360,
478
+ 50361,
479
+ 50362
480
+ ],
481
+ "suppress_ids_begin": [
482
+ 220,
483
+ 50257
484
+ ]
485
+ }
int8/hash.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "README.md": "4ff5ec2d9bc1f17efa15deee2206b5b9",
+   "config.json": "72c97d7fb45f7a607145d121b718ee65",
+   "model.bin": "2f4b4f3e055a34987ea2883995976e76",
+   "preprocessor_config.json": "15d1d7ee1cc6801b71f8ab68966aed86",
+   "tokenizer_config.json": "ea5ff3bfa7553fabbfbcb02846302bbc",
+   "vocabulary.json": "aebe7623626c8f3f61cc5208ff29c348",
+   "vocabulary.txt": "980d7011195d0c733bd374e31708717f",
+   "hash.json": "d41d8cd98f00b204e9800998ecf8427e"
+ }
int8/model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:84dd343cc0e9e5c322bbaadddfdb50a9e2324e393eb6dcca38e72a01b75ca6fa
+ size 254058951
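Comparing the two LFS pointers gives a quick feel for what quantization saves on disk: the INT8 `model.bin` is 254058951 bytes against 483546977 bytes for the FP16 one. The arithmetic, as a small sketch with illustrative constant names:

```python
FP16_BYTES = 483_546_977  # size field of the fp16/model.bin pointer
INT8_BYTES = 254_058_951  # size field of the int8/model.bin pointer

ratio = INT8_BYTES / FP16_BYTES
saved_mib = (FP16_BYTES - INT8_BYTES) / 2**20

print(f"int8 is {ratio:.1%} of fp16, saving {saved_mib:.0f} MiB")
```

The INT8 file is slightly more than half the FP16 size. One plausible reason is that some tensors and the quantization scales are kept in higher precision, but that is a general property of weight quantization and is not confirmed by anything in this commit.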
int8/preprocessor_config.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "chunk_length": 30,
+   "feature_extractor_type": "WhisperFeatureExtractor",
+   "feature_size": 80,
+   "hop_length": 160,
+   "n_fft": 400,
+   "n_samples": 480000,
+   "nb_max_frames": 3000,
+   "padding_side": "right",
+   "padding_value": 0.0,
+   "processor_class": "WhisperProcessor",
+   "return_attention_mask": false,
+   "sampling_rate": 16000
+ }
int8/tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff
 
int8/vocabulary.json ADDED
The diff for this file is too large to render. See raw diff
 
int8/vocabulary.txt ADDED
The diff for this file is too large to render. See raw diff