---
license: apache-2.0
---
## Versions
- CUDA: 12.1
- cuDNN: 8.9.2.26_1.0-1_amd64
- tensorflow: 2.12.0
- torch: 2.1.0.dev20230606+cu121
- transformers: 4.30.2
- accelerate: 0.20.3
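To check that a local environment matches the versions above, the standard library can query installed package metadata (the package names below are just the ones listed in this card):

```python
from importlib.metadata import version, PackageNotFoundError

def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions

# Values depend on your environment
print(installed_versions(["torch", "transformers", "accelerate"]))
```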
## Benchmark
- RAM: 2.8 GB (original model: 5.5 GB)
- VRAM: 1812 MB (original model: 6 GB)
- test.wav: 23 s, multilingual speech (English + Hindi); table values are transcription times in seconds

| Device | float32 (original), s | float16, s | CUDA cores | Tensor cores |
| ----------------- | -------------------- | ---------- | ---------- | ------------ |
| RTX 3060 | 1.7 | 1.1 | 3,584 | 112 |
| GTX 1660 Super | can't use this model | 3.3 | 1,408 | - |
| Colab (Tesla T4) | 2.8 | 2.2 | 2,560 | 320 |
| CPU | - | - | - | - |

- CPU: torch.float16 is not supported on CPU (AMD Ryzen 5 3600 or the Colab CPU runtime)
- Punctuation: True
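The roughly 2x RAM/VRAM savings above follow directly from half-precision storage: each float16 value occupies 2 bytes instead of the 4 bytes a float32 needs. A minimal NumPy sketch of that arithmetic (the array is just a stand-in for a weight tensor):

```python
import numpy as np

# A stand-in "weight tensor" with one million parameters
weights_fp32 = np.zeros(1_000_000, dtype=np.float32)
weights_fp16 = weights_fp32.astype(np.float16)

print(weights_fp32.nbytes)  # 4 bytes per value
print(weights_fp16.nbytes)  # 2 bytes per value -> half the memory
```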
## Usage
The file `__init__.py` in this repo contains all the code needed to use this model.
First, clone this repo and place all of its files inside a folder.
**Please try it in a Jupyter notebook.**
```python
# Import the Model
from whisper_medium_fp16_transformers import Model
```
```python
# Initialise the model
model = Model(
model_name_or_path='whisper_medium_fp16_transformers',
cuda_visible_device="0",
device='cuda',
)
```
```python
# Load Audio
audio = model.load_audio('test.wav')
```
```python
# Transcribe (the first transcription takes extra time to warm up)
model.transcribe(audio)
```