devasheeshG/whisper_medium_fp16_transformers

	---
	license: apache-2.0
	---
	## Versions:

	- CUDA: 12.1
	- cuDNN Version: 8.9.2.26_1.0-1_amd64

	* tensorflow Version: 2.12.0
	* torch Version: 2.1.0.dev20230606+cu121
	* transformers Version: 4.30.2
	* accelerate Version: 0.20.3

	## BENCHMARK:

	- RAM: 2.8 GB (Original_Model: 5.5GB)
	- VRAM: 1812 MB (Original_Model: 6GB)
	- test.wav: 23 s (multilingual speech, i.e. English + Hindi)

	| Device Name | float32 (Original) | float16 | CUDA Cores | Tensor Cores |
	| ----------------- | -------------------- | ------- | ---------- | ------------ |
	| 3060 | 1.7 | 1.1 | 3,584 | 112 |
	| 1660 Super | can't use this model | 3.3 | 1,408 | - |
	| Colab (Tesla T4) | 2.8 | 2.2 | 2,560 | 320 |
	| CPU | - | - | - | - |


	- CPU: torch.float16 is not supported on CPU (AMD Ryzen 5 3600 or Colab)
	- Punctuation: True
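
To put the table in relative terms, the float16 speedup can be computed from the two timing columns. This is a minimal sketch that uses only the figures reported above. The names `TIMINGS`, `speedup` and `reduction` are illustrative and not part of this repo. The ratios are unitless, so they do not depend on the unit of the timing columns.

```python
# Timings copied from the benchmark table: (float32, float16).
# Rows that lack one of the two values (1660 Super, CPU) are left out.
TIMINGS = {
    "3060": (1.7, 1.1),
    "Colab (Tesla T4)": (2.8, 2.2),
}

# RAM figures from the benchmark list, in GB: (original model, this model).
RAM_GB = (5.5, 2.8)


def speedup(fp32: float, fp16: float) -> float:
    """Return how many times faster the float16 run is than the float32 run."""
    return fp32 / fp16


def reduction(original: float, reduced: float) -> float:
    """Return the fractional saving of `reduced` relative to `original`."""
    return 1.0 - reduced / original


for device, (fp32, fp16) in TIMINGS.items():
    print(f"{device}: {speedup(fp32, fp16):.2f}x faster with float16")

print(f"RAM saving: {reduction(*RAM_GB):.0%}")
```

With the reported numbers this gives roughly 1.55x on the 3060, 1.27x on the Tesla T4, and a RAM saving of about 49%.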

	## Usage

	This repo contains an ``__init__.py`` file with all the code needed to use this model.

	First, clone this repo and place all of its files inside a single folder.
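
The clone step can also be scripted. This is a minimal sketch, assuming `git` is installed (Git LFS is typically needed to fetch the weight files) and that the repo is served at the standard Hugging Face URL for `devasheeshG/whisper_medium_fp16_transformers`. The helper names `clone_command` and `clone` are illustrative and not part of this repo.

```python
import subprocess
from typing import List

# Assumed repo id, taken from the name of this model card.
REPO_ID = "devasheeshG/whisper_medium_fp16_transformers"


def clone_command(repo_id: str, dest: str) -> List[str]:
    """Build the git command that clones a Hugging Face model repo into `dest`."""
    return ["git", "clone", f"https://huggingface.co/{repo_id}", dest]


def clone(repo_id: str, dest: str) -> None:
    """Run the clone and raise CalledProcessError if git fails."""
    subprocess.run(clone_command(repo_id, dest), check=True)


# Example (needs network access). The folder name matches the import used below:
# clone(REPO_ID, "whisper_medium_fp16_transformers")
```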

	Try it in a Jupyter notebook:

	```python
	# Import the Model
	from whisper_medium_fp16_transformers import Model
	```

	```python
	# Initialize the model
	model = Model(
	    model_name_or_path='whisper_medium_fp16_transformers',
	    cuda_visible_device="0",
	    device='cuda',
	)
	```

	```python
	# Load Audio
	audio = model.load_audio('test.wav')
	```

	```python
	# Transcribe (First transcription takes time.)
	model.transcribe(audio)
	```
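
Because the first transcription takes longer, timing comparisons are fairer after a warm-up call. This is a minimal sketch of such a timing helper. `timed_after_warmup` is an illustrative name, not part of this repo, and it works with any callable, such as `model.transcribe` from the cells above.

```python
import time
from typing import Any, Callable, Tuple


def timed_after_warmup(
    fn: Callable[..., Any], *args: Any, warmup: int = 1
) -> Tuple[Any, float]:
    """Call fn(*args) `warmup` times untimed, then once more and time that call.

    Returns the result of the timed call and its duration in seconds.
    """
    for _ in range(warmup):
        fn(*args)  # warm-up runs, results discarded
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Example with the objects created in the cells above:
# result, seconds = timed_after_warmup(model.transcribe, audio)
# print(f"{seconds:.2f} s")
```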