---
license: apache-2.0
---
## Versions:

- CUDA: 12.1
- cuDNN: 8.9.2.26_1.0-1_amd64
- tensorflow: 2.12.0
- torch: 2.1.0.dev20230606+cu121
- transformers: 4.30.2
- accelerate: 0.20.3

## Benchmark:

- RAM: 2.8 GB (original model: 5.5 GB)
- VRAM: 1,812 MB (original model: 6 GB)
- test.wav: 23 s (multilingual speech, i.e. English + Hindi)

  | Device Name      | float32 (original)   | float16 | CUDA Cores | Tensor Cores |
  | ---------------- | -------------------- | ------- | ---------- | ------------ |
  | 3060             | 1.7                  | 1.1     | 3,584      | 112          |
  | 1660 Super       | can't use this model | 3.3     | 1,408      | -            |
  | Colab (Tesla T4) | 2.8                  | 2.2     | 2,560      | 320          |
  | CPU              | -                    | -       | -          | -            |


  - CPU: torch.float16 is not supported on CPU (AMD Ryzen 5 3600 or the Colab CPU)
- Punctuation: True
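
Assuming the per-device numbers above are transcription times in seconds for the 23 s test clip (the source does not state the unit explicitly), the implied real-time factors can be computed directly:

```python
# Real-time factor = audio length / transcription time.
# Assumes the table values are seconds for the 23 s test.wav clip.
AUDIO_SECONDS = 23

timings = {
    "3060 float32": 1.7,
    "3060 float16": 1.1,
    "T4 float32": 2.8,
    "T4 float16": 2.2,
}

for name, seconds in timings.items():
    print(f"{name}: {AUDIO_SECONDS / seconds:.1f}x real-time")
```

On this reading, float16 on a 3060 runs roughly 20x faster than real time, versus about 13x for float32.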

## Usage

The repo contains an ``__init__.py`` file with all the code needed to use this model.

First, clone this repo and keep all the files together inside one folder.

**Please try it in a Jupyter notebook.**

```python
# Import the Model
from whisper_medium_fp16_transformers import Model
```

```python
# Initialise the model
model = Model(
            model_name_or_path='whisper_medium_fp16_transformers',
            cuda_visible_device="0", 
            device='cuda',
      )
```
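
The `cuda_visible_device` argument presumably maps to the `CUDA_VISIBLE_DEVICES` environment variable. If you manage GPU visibility yourself, the same restriction can be applied manually (a sketch, not part of this repo's API); note it must be set before CUDA is initialised:

```python
import os

# Expose only GPU 0 to this process; set this before importing torch
# (or at least before any CUDA context is created).
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
```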

```python
# Load Audio
audio = model.load_audio('test.wav')
```

```python
# Transcribe (the first call is slower while the model warms up)
model.transcribe(audio)
```
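
To reproduce the benchmark numbers on your own hardware, time a second call rather than the first, since the first run includes one-off warm-up costs. A minimal sketch, assuming the `model` and `audio` objects from the steps above:

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn once to warm up, then return (result, seconds) for a second run."""
    fn(*args, **kwargs)               # warm-up: first call pays one-off costs
    start = time.perf_counter()
    result = fn(*args, **kwargs)      # the run we actually measure
    return result, time.perf_counter() - start

# Usage with the model initialised above (assumed API):
# text, seconds = timed(model.transcribe, audio)
```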