---
license: apache-2.0
datasets:
- mozilla-foundation/common_voice_11_0
language:
- sv
metrics:
- wer
library_name: transformers
---
# Whisper Fine-tuned on Swedish Speech
Whisper is a state-of-the-art automatic speech recognition (ASR) model created by OpenAI that can transcribe and translate speech in many languages. This project uses the "small" Whisper model with 244M parameters, fine-tuned on the Swedish subset of the Mozilla Foundation Common Voice 11 dataset.
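
A minimal sketch of loading that data with 🤗 Datasets. The `sv-SE` locale code is the Swedish config of Common Voice 11; the exact split combination used for training is an assumption, not recorded in this repo:

```python
from datasets import DatasetDict, load_dataset

# Common Voice 11 is a gated dataset: accept its terms on the Hub first,
# and pass use_auth_token=True if your datasets version requires it.
common_voice = DatasetDict()
common_voice["train"] = load_dataset(
    "mozilla-foundation/common_voice_11_0", "sv-SE", split="train+validation"
)
common_voice["test"] = load_dataset(
    "mozilla-foundation/common_voice_11_0", "sv-SE", split="test"
)
```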
Each audio in the dataset will be truncated or padded to 30 second snippets and then converted to the log-Mel spectrogram. Once they are in the form of log-Mel spectrograms they will be sent into the Whisper model architecture. Training was done on Google Colab and during the training checkpoints were saved to google drive in case of disconnections. Additionally the models were also pushed to a huggingface model repo along with the tensorboard data to visualize the metrics.
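
Continuing the sketch above, the pad/truncate-to-30-seconds and log-Mel conversion are handled by the Whisper feature extractor; the column names (`audio`, `sentence`) are Common Voice's, and the rest follows the standard 🤗 Transformers preprocessing pattern rather than code taken from this repo:

```python
from datasets import Audio
from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained(
    "openai/whisper-small", language="Swedish", task="transcribe"
)

# Whisper expects 16 kHz audio; Common Voice ships 48 kHz, so resample on the fly.
common_voice = common_voice.cast_column("audio", Audio(sampling_rate=16_000))

def prepare_example(batch):
    audio = batch["audio"]
    # The feature extractor pads/truncates to 30 s and computes the log-Mel spectrogram.
    batch["input_features"] = processor.feature_extractor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    # The tokenized transcription becomes the decoder target.
    batch["labels"] = processor.tokenizer(batch["sentence"]).input_ids
    return batch

common_voice = common_voice.map(
    prepare_example, remove_columns=common_voice.column_names["train"]
)
```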
## Training Hyperparameters
| **Hyperparameter** | **Value** |
|--------------------|-----------|
| `num_train_epochs` | 1 |
| `per_device_train_batch_size` | 16 |
| `gradient_accumulation_steps` | 1 |
| `learning_rate` | 1e-4 |
| `warmup_steps` | 50 |
| `max_steps` | 1000 |
| `gradient_checkpointing` | True |
| `fp16` | True |
| `per_device_eval_batch_size` | 8 |
| `generation_max_length` | 225 |
| `save_steps` | 250 |
| `eval_steps` | 250 |
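
As a sketch, the table maps onto `Seq2SeqTrainingArguments` as below. The `output_dir` and the last three flags are assumptions (they are needed for step-based evaluation, WER on generated text, and the Hub/TensorBoard logging described above) and are not part of the table:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-sv",  # hypothetical output path
    num_train_epochs=1,               # overridden by max_steps when both are set
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,
    learning_rate=1e-4,
    warmup_steps=50,
    max_steps=1000,
    gradient_checkpointing=True,
    fp16=True,
    per_device_eval_batch_size=8,
    generation_max_length=225,
    save_steps=250,
    eval_steps=250,
    evaluation_strategy="steps",      # assumption: enables eval every eval_steps
    predict_with_generate=True,       # assumption: generate text so WER can be computed
    report_to=["tensorboard"],        # assumption: matches the TensorBoard logs mentioned above
    push_to_hub=True,                 # assumption: matches the Hub checkpointing described above
)
```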