guillaumekln committed on
Commit
5ff8305
1 Parent(s): d987667

Upload with huggingface_hub

Files changed (5)
  1. README.md +140 -0
  2. config.json +229 -0
  3. model.bin +3 -0
  4. tokenizer.json +0 -0
  5. vocabulary.txt +0 -0
README.md ADDED
@@ -0,0 +1,140 @@
+ ---
+ language:
+ - en
+ - zh
+ - de
+ - es
+ - ru
+ - ko
+ - fr
+ - ja
+ - pt
+ - tr
+ - pl
+ - ca
+ - nl
+ - ar
+ - sv
+ - it
+ - id
+ - hi
+ - fi
+ - vi
+ - he
+ - uk
+ - el
+ - ms
+ - cs
+ - ro
+ - da
+ - hu
+ - ta
+ - 'no'
+ - th
+ - ur
+ - hr
+ - bg
+ - lt
+ - la
+ - mi
+ - ml
+ - cy
+ - sk
+ - te
+ - fa
+ - lv
+ - bn
+ - sr
+ - az
+ - sl
+ - kn
+ - et
+ - mk
+ - br
+ - eu
+ - is
+ - hy
+ - ne
+ - mn
+ - bs
+ - kk
+ - sq
+ - sw
+ - gl
+ - mr
+ - pa
+ - si
+ - km
+ - sn
+ - yo
+ - so
+ - af
+ - oc
+ - ka
+ - be
+ - tg
+ - sd
+ - gu
+ - am
+ - yi
+ - lo
+ - uz
+ - fo
+ - ht
+ - ps
+ - tk
+ - nn
+ - mt
+ - sa
+ - lb
+ - my
+ - bo
+ - tl
+ - mg
+ - as
+ - tt
+ - haw
+ - ln
+ - ha
+ - ba
+ - jw
+ - su
+ tags:
+ - audio
+ - automatic-speech-recognition
+ license: mit
+ library_name: ctranslate2
+ ---
+
+ # Whisper base model for CTranslate2
+
+ This repository contains the conversion of [openai/whisper-base](https://huggingface.co/openai/whisper-base) to the [CTranslate2](https://github.com/OpenNMT/CTranslate2) model format.
+
+ This model can be used in CTranslate2 or projects based on CTranslate2 such as [faster-whisper](https://github.com/guillaumekln/faster-whisper).
+
+ ## Example
+
+ ```python
+ from faster_whisper import WhisperModel
+
+ model = WhisperModel("base")
+
+ segments, info = model.transcribe("audio.mp3")
+ for segment in segments:
+     print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
+ ```
+
+ ## Conversion details
+
+ The original model was converted with the following command:
+
+ ```
+ ct2-transformers-converter --model openai/whisper-base --output_dir faster-whisper-base \
+     --copy_files tokenizer.json --quantization float16
+ ```
+
+ Note that the model weights are saved in FP16. This type can be changed when the model is loaded using the [`compute_type` option in CTranslate2](https://opennmt.net/CTranslate2/quantization.html).
+
+ ## More information
+
+ **For more information about the original model, see its [model card](https://huggingface.co/openai/whisper-base).**
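The README's note on `compute_type` can be illustrated with a short sketch. It assumes faster-whisper is installed. `pick_compute_type` and `load_model` are hypothetical helpers introduced here, and the device-to-type mapping is only one plausible choice, not something the model card prescribes.

```python
def pick_compute_type(device: str) -> str:
    """Hypothetical helper: choose a CTranslate2 compute type for a device."""
    # Assumption: keep FP16 on GPU (matching the saved weights) and
    # fall back to INT8 on CPU, where FP16 is often not accelerated.
    return "float16" if device == "cuda" else "int8"


def load_model(device: str = "cpu"):
    """Load the converted model with an explicit compute type."""
    # Imported lazily so pick_compute_type can be used without faster-whisper.
    from faster_whisper import WhisperModel

    return WhisperModel(
        "base",
        device=device,
        compute_type=pick_compute_type(device),
    )
```

Calling `load_model("cpu")` would return a model whose `transcribe` method can be used as in the README example; the weights are converted from FP16 to the requested type at load time.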
config.json ADDED
@@ -0,0 +1,229 @@
+ {
+   "alignment_heads": [
+     [
+       3,
+       1
+     ],
+     [
+       4,
+       2
+     ],
+     [
+       4,
+       3
+     ],
+     [
+       4,
+       7
+     ],
+     [
+       5,
+       1
+     ],
+     [
+       5,
+       2
+     ],
+     [
+       5,
+       4
+     ],
+     [
+       5,
+       6
+     ]
+   ],
+   "lang_ids": [
+     50259,
+     50260,
+     50261,
+     50262,
+     50263,
+     50264,
+     50265,
+     50266,
+     50267,
+     50268,
+     50269,
+     50270,
+     50271,
+     50272,
+     50273,
+     50274,
+     50275,
+     50276,
+     50277,
+     50278,
+     50279,
+     50280,
+     50281,
+     50282,
+     50283,
+     50284,
+     50285,
+     50286,
+     50287,
+     50288,
+     50289,
+     50290,
+     50291,
+     50292,
+     50293,
+     50294,
+     50295,
+     50296,
+     50297,
+     50298,
+     50299,
+     50300,
+     50301,
+     50302,
+     50303,
+     50304,
+     50305,
+     50306,
+     50307,
+     50308,
+     50309,
+     50310,
+     50311,
+     50312,
+     50313,
+     50314,
+     50315,
+     50316,
+     50317,
+     50318,
+     50319,
+     50320,
+     50321,
+     50322,
+     50323,
+     50324,
+     50325,
+     50326,
+     50327,
+     50328,
+     50329,
+     50330,
+     50331,
+     50332,
+     50333,
+     50334,
+     50335,
+     50336,
+     50337,
+     50338,
+     50339,
+     50340,
+     50341,
+     50342,
+     50343,
+     50344,
+     50345,
+     50346,
+     50347,
+     50348,
+     50349,
+     50350,
+     50351,
+     50352,
+     50353,
+     50354,
+     50355,
+     50356,
+     50357
+   ],
+   "suppress_ids": [
+     1,
+     2,
+     7,
+     8,
+     9,
+     10,
+     14,
+     25,
+     26,
+     27,
+     28,
+     29,
+     31,
+     58,
+     59,
+     60,
+     61,
+     62,
+     63,
+     90,
+     91,
+     92,
+     93,
+     359,
+     503,
+     522,
+     542,
+     873,
+     893,
+     902,
+     918,
+     922,
+     931,
+     1350,
+     1853,
+     1982,
+     2460,
+     2627,
+     3246,
+     3253,
+     3268,
+     3536,
+     3846,
+     3961,
+     4183,
+     4667,
+     6585,
+     6647,
+     7273,
+     9061,
+     9383,
+     10428,
+     10929,
+     11938,
+     12033,
+     12331,
+     12562,
+     13793,
+     14157,
+     14635,
+     15265,
+     15618,
+     16553,
+     16604,
+     18362,
+     18956,
+     20075,
+     21675,
+     22520,
+     26130,
+     26161,
+     26435,
+     28279,
+     29464,
+     31650,
+     32302,
+     32470,
+     36865,
+     42863,
+     47425,
+     49870,
+     50254,
+     50258,
+     50360,
+     50361,
+     50362
+   ],
+   "suppress_ids_begin": [
+     220,
+     50257
+   ]
+ }
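As a quick sanity check on the config above, the following sketch summarizes its fields using only the standard library. `load_config` and `summarize_config` are hypothetical helper names, and reading each `alignment_heads` pair as a (layer, head) index is an assumption, not something stated in this commit. The example values are copied from the diff, with `suppress_ids` omitted for brevity.

```python
import json


def load_config(path: str) -> dict:
    """Read a config.json such as the one added in this commit."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize_config(config: dict) -> dict:
    """Hypothetical helper: report basic facts about the language token IDs."""
    lang_ids = config["lang_ids"]
    expected = list(range(lang_ids[0], lang_ids[-1] + 1))
    return {
        "num_languages": len(lang_ids),
        "lang_ids_contiguous": lang_ids == expected,
        # Assumption: each pair is a (layer, head) index.
        "alignment_heads": [tuple(pair) for pair in config["alignment_heads"]],
    }


# Values taken from the diff above (suppress_ids left out).
example_config = {
    "alignment_heads": [[3, 1], [4, 2], [4, 3], [4, 7], [5, 1], [5, 2], [5, 4], [5, 6]],
    "lang_ids": list(range(50259, 50358)),
    "suppress_ids_begin": [220, 50257],
}

summary = summarize_config(example_config)
print(summary["num_languages"], summary["lang_ids_contiguous"])
```

The 99 contiguous language IDs line up with the 99 language codes listed in the README front matter.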
model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d01c3014881c9c6f3133c182f3d2887eb6ca1c789a7538c5c007196857a0a6a9
+ size 145217532
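The three lines above are a Git LFS pointer, not the weights themselves. A minimal sketch of how a downloaded `model.bin` could be checked against such a pointer follows; `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers written for illustration, using only the standard library.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict) -> bool:
    """Return True if the file at `path` has the pointer's size and hash."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so large model files are not read into memory at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer contents copied from the diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:d01c3014881c9c6f3133c182f3d2887eb6ca1c789a7538c5c007196857a0a6a9
size 145217532
"""

pointer = parse_lfs_pointer(POINTER_TEXT)
```

With a local copy of the file, `matches_pointer("model.bin", pointer)` would confirm that the download is complete and unmodified.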
tokenizer.json ADDED
The diff for this file is too large to render.
 
vocabulary.txt ADDED
The diff for this file is too large to render.