kurianbenoy committed on
Commit
ecc934a
•
1 Parent(s): 5857953
Files changed (1)
  1. app.ipynb +630 -0
app.ipynb ADDED
@@ -0,0 +1,630 @@
+ {
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "807e94de-b600-46ca-9808-372619e38e69",
+ "metadata": {},
+ "source": [
+ "# Making kurianbenoy/faster-speech-to-text-for-malayalam with Jupyter notebooks"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f0e04921-4634-4d16-940f-bf8dd20bb63b",
+ "metadata": {},
+ "source": [
+ "## Install packages"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "id": "7a6257dd-ea39-44e1-b103-3f9588d6cf4d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!pip install -Uqq nbdev gradio==3.31.0 faster-whisper==0.5.1"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d7ba223d-8043-4aab-8df3-f6cf3a4ac6b2",
+ "metadata": {},
+ "source": [
+ "## Basic inference code"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "id": "22e6e9c5-7a3f-4546-8039-ecf98004235b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#|export\n",
+ "import gradio as gr\n",
+ "from faster_whisper import WhisperModel"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "id": "81691362-0c73-4af0-9f99-96ffb7dc318b",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "'3.31.0'"
+ ]
+ },
+ "execution_count": 3,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "gr.__version__"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "id": "5f4d3586-a6b9-4d3e-b02a-9f25f5068dbe",
+ "metadata": {},
+ "outputs": [
+ {
+ "ename": "AttributeError",
+ "evalue": "module 'faster_whisper' has no attribute '__version__'",
+ "output_type": "error",
+ "traceback": [
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)",
+ "Cell \u001b[0;32mIn[10], line 2\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mfaster_whisper\u001b[39;00m\n\u001b[0;32m----> 2\u001b[0m \u001b[43mfaster_whisper\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m__version__\u001b[49m\n",
+ "\u001b[0;31mAttributeError\u001b[0m: module 'faster_whisper' has no attribute '__version__'"
+ ]
+ }
+ ],
+ "source": [
+ "# import faster_whisper\n",
+ "# faster_whisper.__version__"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 33,
+ "id": "de8e21b9-449a-4ae3-bd64-bba334075fdd",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def t_asr(folder=\"vegam-whisper-medium-ml-fp16\", audio_file=\"00b38e80-80b8-4f70-babf-566e848879fc.webm\", compute_type=\"float16\", device=\"cpu\"):\n",
+ "    model = WhisperModel(folder, device=device, compute_type=compute_type)\n",
+ "\n",
+ "    segments, info = model.transcribe(audio_file, beam_size=5)\n",
+ "\n",
+ "    for segment in segments:\n",
+ "        print(\"[%.2fs -> %.2fs] %s\" % (segment.start, segment.end, segment.text))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 31,
+ "id": "87c58dd2-7d3d-4fb3-821c-cdac673fee0d",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[0.00s -> 4.58s] പാലം കടക്കുവോളം നാരായണ പാലം കടന്നാലോ കൂരായണ\n",
+ "CPU times: user 42.2 s, sys: 9.58 s, total: 51.8 s\n",
+ "Wall time: 13.5 s\n"
+ ]
+ }
+ ],
+ "source": [
+ "%%time\n",
+ "t_asr(compute_type=\"int8\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 28,
+ "id": "a5624cf5-b3b8-4ae3-aa82-ee19505bb42d",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Detected language 'ta' with probability 0.372757\n",
+ "[0.00s -> 4.74s] പാലം കടക്കുവോളം നാരായണ പാലം കടന്നാലൊ കൂരായണ\n",
+ "CPU times: user 36.5 s, sys: 9.52 s, total: 46.1 s\n",
+ "Wall time: 12.3 s\n"
+ ]
+ }
+ ],
+ "source": [
+ "%%time\n",
+ "t_asr(folder=\"vegam-whisper-medium-ml\", compute_type=\"int8\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 34,
+ "id": "25e1413f-8f80-4704-a94e-26b8d9581a6a",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[0.00s -> 4.58s] പാലം കടക്കുവോളം നാരായണ പാലം കടന്നാലോ കൂരായണ\n",
+ "CPU times: user 9.39 s, sys: 792 ms, total: 10.2 s\n",
+ "Wall time: 4.51 s\n"
+ ]
+ }
+ ],
+ "source": [
+ "%%time\n",
+ "t_asr(compute_type=\"int8\", device=\"cuda\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "id": "48cd4ec3-512f-49d0-87ac-3ef989e25b80",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#|export\n",
+ "def transcribe_malayalam_speech(audio_file, compute_type=\"int8\", device=\"cpu\", folder=\"vegam-whisper-medium-ml-fp16\"):\n",
+ "    model = WhisperModel(folder, device=device, compute_type=compute_type)\n",
+ "    segments, info = model.transcribe(audio_file, beam_size=5)\n",
+ "\n",
+ "    lst = []\n",
+ "    for segment in segments:\n",
+ "        # print(\"[%.2fs -> %.2fs] %s\" % (segment.start, segment.end, segment.text))\n",
+ "        lst.append(segment.text)\n",
+ "\n",
+ "    return \" \".join(lst)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "id": "14fda29a-aee1-44b2-9269-048cc8b98ea8",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "CPU times: user 43.1 s, sys: 12.3 s, total: 55.4 s\n",
+ "Wall time: 14.8 s\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "'പാലം കടക്കുവോളം നാരായണ പാലം കടന്നാലോ കൂരായണ'"
+ ]
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "%%time\n",
+ "transcribe_malayalam_speech(audio_file=\"00b38e80-80b8-4f70-babf-566e848879fc.webm\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "id": "bbdadecf-68d1-4183-8e43-7965c1aecf6a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "## Haha, you are burning GPUs and wasting CO2"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "bf706a0a-c3a2-489c-a1fe-df4fbf700d9c",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "id": "45fade75-e0b1-4c5d-90a3-ebd7345a4d16",
+ "metadata": {},
+ "source": [
+ "## Figure out the Whisper Demo by Hugging Face"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 36,
+ "id": "fa06f8a6-87b7-45af-b36b-fb5ebe362455",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "e437727ccbcd40838a43a0c1bbb00143",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "Downloading (…)lve/main/config.json: 0%| | 0.00/1.97k [00:00<?, ?B/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "2f654c303e24413cb73990bdd9d99907",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "Downloading pytorch_model.bin: 0%| | 0.00/967M [00:00<?, ?B/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "16386d3b586d475fa021ea8d6f925161",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "Downloading (…)neration_config.json: 0%| | 0.00/3.51k [00:00<?, ?B/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "bf5964b1ba024ce685a04127f21f78d0",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "Downloading (…)okenizer_config.json: 0%| | 0.00/842 [00:00<?, ?B/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "038082a393084da998eed2085960e634",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "Downloading (…)olve/main/vocab.json: 0%| | 0.00/1.04M [00:00<?, ?B/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "105ef799439d4c1ea0e3d2cbbfbcaf5d",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "Downloading (…)/main/tokenizer.json: 0%| | 0.00/2.20M [00:00<?, ?B/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "e3369330ed9a4a9f8208ba6f160210bf",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "Downloading (…)olve/main/merges.txt: 0%| | 0.00/494k [00:00<?, ?B/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "4c3a9c73c84245b0b88e42980d65abdf",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "Downloading (…)main/normalizer.json: 0%| | 0.00/52.7k [00:00<?, ?B/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "8d61551d78914036a2b6475a6d840663",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "Downloading (…)in/added_tokens.json: 0%| | 0.00/2.08k [00:00<?, ?B/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "262213d3b6364b4e8648180c903c3008",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "Downloading (…)cial_tokens_map.json: 0%| | 0.00/2.08k [00:00<?, ?B/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "258a3b7a9eb94dcdb8355c09c1b683b3",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "Downloading (…)rocessor_config.json: 0%| | 0.00/185k [00:00<?, ?B/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "import torch\n",
+ "from transformers import pipeline\n",
+ "from huggingface_hub import model_info\n",
+ "\n",
+ "MODEL_NAME = \"openai/whisper-small\" #this always needs to stay in line 8 :D sorry for the hackiness\n",
+ "lang = \"en\"\n",
+ "\n",
+ "device = 0 if torch.cuda.is_available() else \"cpu\"\n",
+ "pipe = pipeline(\n",
+ "    task=\"automatic-speech-recognition\",\n",
+ "    model=MODEL_NAME,\n",
+ "    chunk_length_s=30,\n",
+ "    device=device,\n",
+ ")\n",
+ "\n",
+ "pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(language=lang, task=\"transcribe\")\n",
+ "\n",
+ "def transcribe(microphone, file_upload):\n",
+ "    warn_output = \"\"\n",
+ "    if (microphone is not None) and (file_upload is not None):\n",
+ "        warn_output = (\n",
+ "            \"WARNING: You've uploaded an audio file and used the microphone. \"\n",
+ "            \"The recorded file from the microphone will be used and the uploaded audio will be discarded.\\n\"\n",
+ "        )\n",
+ "\n",
+ "    elif (microphone is None) and (file_upload is None):\n",
+ "        return \"ERROR: You have to either use the microphone or upload an audio file\"\n",
+ "\n",
+ "    file = microphone if microphone is not None else file_upload\n",
+ "\n",
+ "    text = pipe(file)[\"text\"]\n",
+ "\n",
+ "    return warn_output + text"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "023ffa7c-b82f-49ea-b6ca-00f84e2c8698",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "id": "fe37c9e1-bc56-422d-9547-be94ab4e4844",
+ "metadata": {},
+ "source": [
+ "## Make an app with Gradio"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 38,
+ "id": "9badfdcd-dd99-49ea-a318-eda88cddefb6",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Running on local URL: http://0.0.0.0:6007\n",
+ "Running on public URL: https://537af5b5b55ed185f5.gradio.live\n",
+ "\n",
+ "This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ "<div><iframe src=\"https://537af5b5b55ed185f5.gradio.live\" width=\"100%\" height=\"500\" allow=\"autoplay; camera; microphone; clipboard-read; clipboard-write;\" frameborder=\"0\" allowfullscreen></iframe></div>"
+ ],
+ "text/plain": [
+ "<IPython.core.display.HTML object>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/plain": []
+ },
+ "execution_count": 38,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "import gradio as gr\n",
+ "\n",
+ "def greet(name):\n",
+ "    return \"Hello \" + name + \"!!\"\n",
+ "\n",
+ "iface = gr.Interface(fn=greet, inputs=\"text\", outputs=\"text\")\n",
+ "iface.launch(share=True)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 41,
+ "id": "81f3b241-8a6d-4ff0-bb70-d389d4d4e93a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "mf_transcribe = gr.Interface(\n",
+ "    fn=transcribe,\n",
+ "    inputs=[\n",
+ "        gr.inputs.Audio(source=\"microphone\", type=\"filepath\", optional=True),\n",
+ "        gr.inputs.Audio(source=\"upload\", type=\"filepath\", optional=True),\n",
+ "    ],\n",
+ "    outputs=\"text\",\n",
+ "    title=\"Whisper Demo: Transcribe Audio\",\n",
+ "    description=(\n",
+ "        \"Transcribe long-form microphone or audio inputs with the click of a button! Demo uses the fine-tuned\"\n",
+ "        f\" checkpoint [{MODEL_NAME}](https://huggingface.co/{MODEL_NAME}) and 🤗 Transformers to transcribe audio files\"\n",
+ "        \" of arbitrary length.\"\n",
+ "    ),\n",
+ "    allow_flagging=\"never\",\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "b1e34fa5-8340-4329-a348-b641ca4db341",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7ec1f78d-d9c0-46c7-9466-0408bc6c6cdc",
+ "metadata": {},
+ "source": [
+ "## Create a requirements.txt file"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "id": "7c3e753f-5051-4c3b-a5ab-fa65c7e7cae9",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Overwriting requirements.txt\n"
+ ]
+ }
+ ],
+ "source": [
+ "%%writefile requirements.txt\n",
+ "gradio==3.31.0\n",
+ "faster-whisper==0.5.1"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "43505375-9b3d-4661-93d1-11965cd8d6b5",
+ "metadata": {},
+ "source": [
+ "## Convert this notebook into a Gradio app"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 59,
+ "id": "fba83810-1f0f-4777-b831-aabb4cfead39",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from nbdev.export import nb_export\n",
+ "nb_export('app.ipynb', lib_path='.', name='app')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2c7c52be-c7c4-4026-9886-ae9f71dec603",
+ "metadata": {},
+ "source": [
+ "## Reference\n",
+ "\n",
+ "1. [Create A 🤗 Space From A Notebook](https://nbdev.fast.ai/blog/posts/2022-11-07-spaces/index.html)\n",
+ "2. [Nbdev Demo](https://gist.github.com/hamelsmu/35be07d242f3f19063c3a3839127dc67)\n",
+ "3. [Whisper-demo space by 🤗](https://huggingface.co/spaces/whisper-event/whisper-demo)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "5384528f-9a83-4a0d-b4fd-8ed8458b0eda",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.11"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+ }