cociweb commited on
Commit
44aa66a
•
1 Parent(s): adb2e46

Quantized models added

fp16/README.md ADDED
@@ -0,0 +1,53 @@
1
+ ---
2
+ language:
3
+ - hu
4
+ tags:
5
+ - audio
6
+ - automatic-speech-recognition
7
+ datasets:
8
+ - mozilla-foundation/common_voice_16_0
9
+ base_model: openai/whisper-tiny
10
+ license: mit
11
+ library_name: ctranslate2
12
+ ---
13
+
14
+ # Whisper tiny model for CTranslate2
15
+
16
+ This repository contains a fine-tuned version of [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) converted to the [CTranslate2](https://github.com/OpenNMT/CTranslate2) model format. The fine-tuning was done by [@sarpba](https://huggingface.co/sarpba) on the Mozilla Foundation's Common Voice 16 dataset.
17
+
18
+ This model can be used in CTranslate2 or projects based on CTranslate2 such as [faster-whisper](https://github.com/systran/faster-whisper).
19
+
20
+ ## Example
21
+
22
+ ```python
23
+ from faster_whisper import WhisperModel
24
+
25
+ model = WhisperModel("tiny")
26
+
27
+ segments, info = model.transcribe("audio.mp3")
28
+ for segment in segments:
29
+ print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
30
+ ```
31
+
32
+ ## Conversion details
33
+
34
+ The original model was converted with the following command:
35
+
36
+ ```
37
+ ct2-transformers-converter --model Hungarians/whisper-tiny-cv16-hu-v2 --output_dir faster-whisper-tiny-cv16-v2-fp16.hu \
38
+ --quantization fp16 --low_cpu_mem_usage --copy_files tokenizer_config.json preprocessor_config.json
39
+ ```
40
+
41
+ Note that the model weights are saved in FP16. This type can be changed when the model is loaded using the [`compute_type` option in CTranslate2](https://opennmt.net/CTranslate2/quantization.html).
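As a back-of-the-envelope illustration of what each `compute_type` costs in storage (the per-weight byte counts below are illustrative; quantized CTranslate2 models also store scale tensors, so real files differ slightly):

```python
# Rough per-weight storage cost for common CTranslate2 compute types
# (illustrative figures, not an official table).
BYTES_PER_WEIGHT = {"float32": 4, "float16": 2, "int8": 1}

def approx_weight_mb(n_params: int, compute_type: str) -> float:
    """Approximate weight storage in MB for a given compute type."""
    return n_params * BYTES_PER_WEIGHT[compute_type] / 1e6

# Whisper tiny has roughly 39M parameters, so an fp16 conversion should
# land near 78 MB -- close to the ~75.5 MB model.bin in this repository.
print(approx_weight_mb(39_000_000, "float16"))
```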
42
+
43
+ ## Hash calculation
44
+
45
+ Hashes are calculated with `md5sum` in the model directory:
46
+
47
+ ```
48
+ find ./ -maxdepth 1 -type f -not -path '*/\.*' -exec md5sum {} \; | tr -d ' ' | jq -R 'split("./") | {(.[1]): (.[0])}' | jq -s 'add' > hash.json
49
+ ```
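The same check can be replayed from Python with only the standard library; a minimal sketch, assuming the directory layout matches the file list recorded in `hash.json`:

```python
import hashlib
import json
from pathlib import Path

def md5_of(path: Path) -> str:
    """MD5 hex digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(model_dir: str) -> dict:
    """Compare each file against the digests recorded in hash.json."""
    root = Path(model_dir)
    expected = json.loads((root / "hash.json").read_text())
    # hash.json's own entry records the digest of the then-empty file
    # (d41d8cd9... is the MD5 of empty input), so skip it here.
    return {name: md5_of(root / name) == digest
            for name, digest in expected.items()
            if name != "hash.json"}
```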
50
+
51
+ ## More information
52
+
53
+ **For more information about the original model, see its [model card](https://huggingface.co/Hungarians/whisper-tiny-cv16-hu-v2).**
fp16/hash.json ADDED
@@ -0,0 +1,10 @@
1
+ {
2
+ "config.json": "f30b4fcb198aa49c4b8b905f6d6a1ca3",
3
+ "model.bin": "c4cac28d9225c95dfa0d66ad893feb93",
4
+ "preprocessor_config.json": "15d1d7ee1cc6801b71f8ab68966aed86",
5
+ "tokenizer_config.json": "ea5ff3bfa7553fabbfbcb02846302bbc",
6
+ "vocabulary.json": "aebe7623626c8f3f61cc5208ff29c348",
7
+ "README.md": "1540e632b3c1a6456cce39ddc2312048",
8
+ "vocabulary.txt": "980d7011195d0c733bd374e31708717f",
9
+ "hash.json": "d41d8cd98f00b204e9800998ecf8427e"
10
+ }
fp16/model.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:123ba49c0439c2dfcf1919d83a19be05a04d08e98e0c3f705606b1abf649b7f8
3
+ size 75538345
fp16/preprocessor_config.json ADDED
@@ -0,0 +1,14 @@
1
+ {
2
+ "chunk_length": 30,
3
+ "feature_extractor_type": "WhisperFeatureExtractor",
4
+ "feature_size": 80,
5
+ "hop_length": 160,
6
+ "n_fft": 400,
7
+ "n_samples": 480000,
8
+ "nb_max_frames": 3000,
9
+ "padding_side": "right",
10
+ "padding_value": 0.0,
11
+ "processor_class": "WhisperProcessor",
12
+ "return_attention_mask": false,
13
+ "sampling_rate": 16000
14
+ }
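The values above are not independent: the window and frame counts follow from the chunk length, sample rate, and hop length, as a quick derivation shows.

```python
# Derive the dependent preprocessor values from the base ones.
chunk_length = 30        # seconds of audio per window
sampling_rate = 16000    # Hz
hop_length = 160         # samples between successive mel frames

n_samples = chunk_length * sampling_rate   # samples per 30 s window
nb_max_frames = n_samples // hop_length    # mel frames per window

print(n_samples, nb_max_frames)
```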
fp16/tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff
 
vocabulary.json → fp16/vocabulary.json RENAMED
File without changes
vocabulary.txt → fp16/vocabulary.txt RENAMED
File without changes
fp32/README.md ADDED
@@ -0,0 +1,106 @@
1
+ ---
2
+ license: apache-2.0
3
+ base_model: openai/whisper-tiny
4
+ tags:
5
+ - hf-asr-leaderboard
6
+ - generated_from_trainer
7
+ datasets:
8
+ - mozilla-foundation/common_voice_16_0
9
+ language:
10
+ - hu
11
+ widget:
12
+ - example_title: Sample 1
13
+ src: https://huggingface.co/datasets/Hungarians/samples/resolve/main/Sample1.flac
14
+ - example_title: Sample 2
15
+ src: https://huggingface.co/datasets/Hungarians/samples/resolve/main/Sample2.flac
16
+ metrics:
17
+ - wer
18
+ pipeline_tag: automatic-speech-recognition
19
+ model-index:
20
+ - name: Whisper Tiny Hu v2
21
+ results:
22
+ - task:
23
+ name: Automatic Speech Recognition
24
+ type: automatic-speech-recognition
25
+ dataset:
26
+ name: Common Voice 16.0 - Hungarian
27
+ type: mozilla-foundation/common_voice_16_0
28
+ config: hu
29
+ split: test
30
+ args: hu
31
+ metrics:
32
+ - name: Wer
33
+ type: wer
34
+ value: 15.7367
35
+ verified: true
36
+
37
+ ---
38
+
39
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
40
+ should probably proofread and complete it, then remove this comment. -->
41
+
42
+
43
+ # Whisper Tiny Hu v2
44
+
45
+ This model is a fine-tuned version of [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) on the Common Voice 16.0 dataset.
46
+ It achieves the following results on the evaluation set:
47
+ - Loss: 0.1930
48
+ - Wer Ortho: 17.3040
49
+ - Wer: 15.7367
50
+
51
+ ## Model description
52
+
53
+ More information needed
54
+
55
+ ## Intended uses & limitations
56
+
57
+ More information needed
58
+
59
+ ## Training and evaluation data
60
+
61
+ More information needed
62
+
63
+ ## Training procedure
64
+
65
+ ### Training hyperparameters
66
+
67
+ The following hyperparameters were used during training:
68
+ - learning_rate: 4e-05
69
+ - train_batch_size: 8
70
+ - eval_batch_size: 8
71
+ - seed: 42
72
+ - gradient_accumulation_steps: 2
73
+ - total_train_batch_size: 16
74
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
75
+ - lr_scheduler_type: constant_with_warmup
76
+ - lr_scheduler_warmup_steps: 500
77
+ - training_steps: 15000
78
+ - mixed_precision_training: Native AMP
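Two of the listed values are derived: the effective batch size is the per-device batch size times the accumulation steps, and `constant_with_warmup` ramps the learning rate linearly over the warmup steps before holding it flat. A small sketch of both:

```python
train_batch_size = 8
gradient_accumulation_steps = 2
# Effective batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps

def lr_at(step: int, base_lr: float = 4e-5, warmup_steps: int = 500) -> float:
    """constant_with_warmup: linear ramp, then constant at base_lr."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr
```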
79
+
80
+ ### Training results
81
+
82
+ | Training Loss | Epoch | Step | Validation Loss | Wer Ortho | Wer |
83
+ |:-------------:|:-----:|:-----:|:---------------:|:---------:|:-------:|
84
+ | 0.5487 | 0.33 | 1000 | 0.5970 | 55.5492 | 52.2206 |
85
+ | 0.3922 | 0.67 | 2000 | 0.4419 | 43.1109 | 39.9911 |
86
+ | 0.3242 | 1.0 | 3000 | 0.3662 | 37.2727 | 34.2040 |
87
+ | 0.2517 | 1.34 | 4000 | 0.3329 | 33.7890 | 30.8746 |
88
+ | 0.2455 | 1.67 | 5000 | 0.2925 | 30.6185 | 28.0196 |
89
+ | 0.1398 | 2.01 | 6000 | 0.2600 | 27.1709 | 24.5983 |
90
+ | 0.1421 | 2.34 | 7000 | 0.2491 | 26.1291 | 23.6347 |
91
+ | 0.1578 | 2.68 | 8000 | 0.2342 | 24.4761 | 22.0783 |
92
+ | 0.0732 | 3.01 | 9000 | 0.2163 | 22.1245 | 19.8547 |
93
+ | 0.0941 | 3.35 | 10000 | 0.2143 | 22.2058 | 19.8399 |
94
+ | 0.0936 | 3.68 | 11000 | 0.2094 | 20.5980 | 18.7756 |
95
+ | 0.0489 | 4.02 | 12000 | 0.2027 | 18.9630 | 17.2665 |
96
+ | 0.0548 | 4.35 | 13000 | 0.1981 | 18.4933 | 16.5491 |
97
+ | 0.0585 | 4.69 | 14000 | 0.1953 | 17.7195 | 15.7693 |
98
+ | 0.0356 | 5.02 | 15000 | 0.1930 | 17.3040 | 15.7367 |
99
+
100
+
101
+ ### Framework versions
102
+
103
+ - Transformers 4.36.2
104
+ - Pytorch 2.1.0+cu121
105
+ - Datasets 2.16.1
106
+ - Tokenizers 0.15.0
fp32/config.json ADDED
@@ -0,0 +1,154 @@
1
+ {
2
+ "_name_or_path": "openai/whisper-tiny",
3
+ "activation_dropout": 0.0,
4
+ "activation_function": "gelu",
5
+ "apply_spec_augment": false,
6
+ "architectures": [
7
+ "WhisperForConditionalGeneration"
8
+ ],
9
+ "attention_dropout": 0.0,
10
+ "begin_suppress_tokens": [
11
+ 220,
12
+ 50257
13
+ ],
14
+ "bos_token_id": 50257,
15
+ "classifier_proj_size": 256,
16
+ "d_model": 384,
17
+ "decoder_attention_heads": 6,
18
+ "decoder_ffn_dim": 1536,
19
+ "decoder_layerdrop": 0.0,
20
+ "decoder_layers": 4,
21
+ "decoder_start_token_id": 50258,
22
+ "dropout": 0.0,
23
+ "encoder_attention_heads": 6,
24
+ "encoder_ffn_dim": 1536,
25
+ "encoder_layerdrop": 0.0,
26
+ "encoder_layers": 4,
27
+ "eos_token_id": 50257,
28
+ "forced_decoder_ids": [
29
+ [
30
+ 1,
31
+ 50259
32
+ ],
33
+ [
34
+ 2,
35
+ 50359
36
+ ],
37
+ [
38
+ 3,
39
+ 50363
40
+ ]
41
+ ],
42
+ "init_std": 0.02,
43
+ "is_encoder_decoder": true,
44
+ "mask_feature_length": 10,
45
+ "mask_feature_min_masks": 0,
46
+ "mask_feature_prob": 0.0,
47
+ "mask_time_length": 10,
48
+ "mask_time_min_masks": 2,
49
+ "mask_time_prob": 0.05,
50
+ "max_length": 448,
51
+ "max_source_positions": 1500,
52
+ "max_target_positions": 448,
53
+ "median_filter_width": 7,
54
+ "model_type": "whisper",
55
+ "num_hidden_layers": 4,
56
+ "num_mel_bins": 80,
57
+ "pad_token_id": 50257,
58
+ "scale_embedding": false,
59
+ "suppress_tokens": [
60
+ 1,
61
+ 2,
62
+ 7,
63
+ 8,
64
+ 9,
65
+ 10,
66
+ 14,
67
+ 25,
68
+ 26,
69
+ 27,
70
+ 28,
71
+ 29,
72
+ 31,
73
+ 58,
74
+ 59,
75
+ 60,
76
+ 61,
77
+ 62,
78
+ 63,
79
+ 90,
80
+ 91,
81
+ 92,
82
+ 93,
83
+ 359,
84
+ 503,
85
+ 522,
86
+ 542,
87
+ 873,
88
+ 893,
89
+ 902,
90
+ 918,
91
+ 922,
92
+ 931,
93
+ 1350,
94
+ 1853,
95
+ 1982,
96
+ 2460,
97
+ 2627,
98
+ 3246,
99
+ 3253,
100
+ 3268,
101
+ 3536,
102
+ 3846,
103
+ 3961,
104
+ 4183,
105
+ 4667,
106
+ 6585,
107
+ 6647,
108
+ 7273,
109
+ 9061,
110
+ 9383,
111
+ 10428,
112
+ 10929,
113
+ 11938,
114
+ 12033,
115
+ 12331,
116
+ 12562,
117
+ 13793,
118
+ 14157,
119
+ 14635,
120
+ 15265,
121
+ 15618,
122
+ 16553,
123
+ 16604,
124
+ 18362,
125
+ 18956,
126
+ 20075,
127
+ 21675,
128
+ 22520,
129
+ 26130,
130
+ 26161,
131
+ 26435,
132
+ 28279,
133
+ 29464,
134
+ 31650,
135
+ 32302,
136
+ 32470,
137
+ 36865,
138
+ 42863,
139
+ 47425,
140
+ 49870,
141
+ 50254,
142
+ 50258,
143
+ 50358,
144
+ 50359,
145
+ 50360,
146
+ 50361,
147
+ 50362
148
+ ],
149
+ "torch_dtype": "float32",
150
+ "transformers_version": "4.36.2",
151
+ "use_cache": false,
152
+ "use_weighted_layer_sum": false,
153
+ "vocab_size": 51865
154
+ }
hash.json → fp32/hash.json RENAMED
File without changes
model.bin → fp32/model.bin RENAMED
File without changes
fp32/preprocessor_config.json ADDED
@@ -0,0 +1,14 @@
1
+ {
2
+ "chunk_length": 30,
3
+ "feature_extractor_type": "WhisperFeatureExtractor",
4
+ "feature_size": 80,
5
+ "hop_length": 160,
6
+ "n_fft": 400,
7
+ "n_samples": 480000,
8
+ "nb_max_frames": 3000,
9
+ "padding_side": "right",
10
+ "padding_value": 0.0,
11
+ "processor_class": "WhisperProcessor",
12
+ "return_attention_mask": false,
13
+ "sampling_rate": 16000
14
+ }
fp32/tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff
 
fp32/vocabulary.json ADDED
The diff for this file is too large to render. See raw diff
 
fp32/vocabulary.txt ADDED
The diff for this file is too large to render. See raw diff
 
int8/README.md ADDED
@@ -0,0 +1,53 @@
1
+ ---
2
+ language:
3
+ - hu
4
+ tags:
5
+ - audio
6
+ - automatic-speech-recognition
7
+ datasets:
8
+ - mozilla-foundation/common_voice_16_0
9
+ base_model: openai/whisper-tiny
10
+ license: mit
11
+ library_name: ctranslate2
12
+ ---
13
+
14
+ # Whisper tiny model for CTranslate2
15
+
16
+ This repository contains a fine-tuned version of [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) converted to the [CTranslate2](https://github.com/OpenNMT/CTranslate2) model format. The fine-tuning was done by [@sarpba](https://huggingface.co/sarpba) on the Mozilla Foundation's Common Voice 16 dataset.
17
+
18
+ This model can be used in CTranslate2 or projects based on CTranslate2 such as [faster-whisper](https://github.com/systran/faster-whisper).
19
+
20
+ ## Example
21
+
22
+ ```python
23
+ from faster_whisper import WhisperModel
24
+
25
+ model = WhisperModel("tiny")
26
+
27
+ segments, info = model.transcribe("audio.mp3")
28
+ for segment in segments:
29
+ print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
30
+ ```
31
+
32
+ ## Conversion details
33
+
34
+ The original model was converted with the following command:
35
+
36
+ ```
37
+ ct2-transformers-converter --model Hungarians/whisper-tiny-cv16-hu-v2 --output_dir faster-whisper-tiny-cv16-v2-int8.hu \
38
+ --quantization int8 --low_cpu_mem_usage --copy_files tokenizer_config.json preprocessor_config.json
39
+ ```
40
+
41
+ Note that the model weights are saved in INT8. This type can be changed when the model is loaded using the [`compute_type` option in CTranslate2](https://opennmt.net/CTranslate2/quantization.html).
42
+
43
+ ## HASH calculation
44
+
45
+ Hash calculation is executed with md5hash in the directory of the model with:
46
+
47
+ ```
48
+ find ./ -maxdepth 1 -type f -not -path '*/\.*' -exec md5sum {} \; | tr -d ' ' | jq -R 'split("./") | {(.[1]): (.[0])}' | jq -s 'add' > hash.json
49
+ ```
50
+
51
+ ## More information
52
+
53
+ **For more information about the original model, see its [model card](https://huggingface.co/Hungarians/whisper-tiny-cv16-hu-v2).**
int8/hash.json ADDED
@@ -0,0 +1,10 @@
1
+ {
2
+ "README.md": "ddaeacaba1425164423be2c16bcee6e8",
3
+ "vocabulary.txt": "980d7011195d0c733bd374e31708717f",
4
+ "config.json": "f30b4fcb198aa49c4b8b905f6d6a1ca3",
5
+ "model.bin": "ad6f669131ccc36ef33d8d1adf1610de",
6
+ "preprocessor_config.json": "15d1d7ee1cc6801b71f8ab68966aed86",
7
+ "tokenizer_config.json": "ea5ff3bfa7553fabbfbcb02846302bbc",
8
+ "vocabulary.json": "aebe7623626c8f3f61cc5208ff29c348",
9
+ "hash.json": "d41d8cd98f00b204e9800998ecf8427e"
10
+ }
int8/model.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0ace8dacb69d489544478bf09b835f9702b649da12dbdf1577a48b79f492a6af
3
+ size 42120441
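Comparing the two LFS pointers is a quick sanity check on the quantization: the int8 `model.bin` is a bit over half the size of the fp16 one, consistent with one byte per weight plus the extra scale tensors that int8 storage needs (sizes taken from this commit):

```python
fp16_bytes = 75_538_345  # fp16/model.bin, from its LFS pointer
int8_bytes = 42_120_441  # int8/model.bin, from its LFS pointer

ratio = int8_bytes / fp16_bytes
print(f"int8 is {ratio:.0%} of the fp16 size")
```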
int8/preprocessor_config.json ADDED
@@ -0,0 +1,14 @@
1
+ {
2
+ "chunk_length": 30,
3
+ "feature_extractor_type": "WhisperFeatureExtractor",
4
+ "feature_size": 80,
5
+ "hop_length": 160,
6
+ "n_fft": 400,
7
+ "n_samples": 480000,
8
+ "nb_max_frames": 3000,
9
+ "padding_side": "right",
10
+ "padding_value": 0.0,
11
+ "processor_class": "WhisperProcessor",
12
+ "return_attention_mask": false,
13
+ "sampling_rate": 16000
14
+ }
int8/tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff
 
int8/vocabulary.json ADDED
The diff for this file is too large to render. See raw diff
 
int8/vocabulary.txt ADDED
The diff for this file is too large to render. See raw diff