---
license: apache-2.0
---

## Versions

- CUDA: 12.1
- cuDNN: 8.9.2.26_1.0-1_amd64
- tensorflow: 2.12.0
- torch: 2.1.0.dev20230606+cu121
- transformers: 4.30.2
- accelerate: 0.20.3

## Benchmark

- RAM: 2.8 GB (original model: 5.5 GB)
- VRAM: 1812 MB (original model: 6 GB)
- test.wav: 23 s (multilingual speech, i.e. English + Hindi)

| Device Name      | float32 (Original)   | float16 | CUDA Cores | Tensor Cores |
| ---------------- | -------------------- | ------- | ---------- | ------------ |
| 3060             | 1.7                  | 1.1     | 3,584      | 112          |
| 1660 Super       | can't use this model | 3.3     | 1,408      | -            |
| Colab (Tesla T4) | 2.8                  | 2.2     | 2,560      | 320          |
| CPU              | -                    | -       | -          | -            |

- CPU: `torch.float16` is not supported on CPU (AMD Ryzen 5 3600 or Colab).
- Punctuation: True
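
The benchmark figures above can be turned into relative gains with a few lines of arithmetic. The sketch below is only illustrative: the table does not state a unit, so it assumes the float32 and float16 columns are elapsed times for `test.wav` (lower is better), and it takes 1 GB as 1024 MB. The names `BENCHMARK` and `fp16_speedup` are made up for this example.

```python
# Figures copied from the benchmark table; the table does not state a unit.
# Assumption: they are elapsed times for test.wav, so lower is better.
BENCHMARK = {
    "3060": {"float32": 1.7, "float16": 1.1},
    "Colab (Tesla T4)": {"float32": 2.8, "float16": 2.2},
}


def fp16_speedup(row: dict[str, float]) -> float:
    """Return the float32 figure divided by the float16 figure for one device."""
    return row["float32"] / row["float16"]


for device, row in BENCHMARK.items():
    print(f"{device}: float32 / float16 = {fp16_speedup(row):.2f}")

# Memory figures from the list above (assumption: 1 GB = 1024 MB).
ram_saving = 1 - 2.8 / 5.5
vram_saving = 1 - 1812 / (6 * 1024)
print(f"RAM saving: {ram_saving:.0%}, VRAM saving: {vram_saving:.0%}")
```

The 1660 Super and CPU rows are left out because they have no float32 figure to compare against.
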

## Usage

This repo contains an `__init__.py` file with all the code needed to use this model.

First, clone this repo and place all the files inside a folder.

**Please try it in a Jupyter notebook.**

```python
# Import the Model
from whisper_medium_fp16_transformers import Model
```

```python
# Initialise the model
model = Model(
    model_name_or_path='whisper_medium_fp16_transformers',
    cuda_visible_device="0",
    device='cuda',
)
```
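
Because the model is initialised with `device='cuda'` and float16 is not supported on CPU, it can help to check for a CUDA device before constructing it. The following is a minimal sketch, not part of this repo: `cuda_is_usable` is a hypothetical helper, and it relies only on `torch.cuda.is_available()`.

```python
def cuda_is_usable() -> bool:
    """Return True only if torch is installed and reports a usable CUDA device."""
    try:
        import torch
    except ImportError:
        # torch is not installed, so no CUDA device can be used from Python.
        return False
    return torch.cuda.is_available()


if cuda_is_usable():
    print("CUDA device found: the model can be initialised with device='cuda'.")
else:
    print("No CUDA device found: this float16 model is not expected to run here.")
```
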

```python
# Load Audio
audio = model.load_audio('test.wav')
```

```python
# Transcribe (the first transcription takes longer than later ones)
model.transcribe(audio)
```
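
Since the first transcription is slower than later ones, a fair timing should separate the first call from the rest. Below is a rough sketch of one way to do that with the standard library. `time_calls` is a hypothetical helper, and the workload is a stand-in so the snippet runs without the model.

```python
import time
from typing import Any, Callable


def time_calls(fn: Callable[[], Any], repeats: int = 3) -> list[float]:
    """Call fn `repeats` times and return the elapsed seconds of each call."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


# Stand-in workload so the sketch runs anywhere. With the model loaded,
# one would pass `lambda: model.transcribe(audio)` instead.
timings = time_calls(lambda: sum(range(100_000)))
print("first call:", timings[0], "later calls:", timings[1:])
```

Reporting the first call separately keeps one-off start-up cost out of the steady-state figure.
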