
Whisper finetuned on Swedish Speech

Whisper is a state-of-the-art automatic speech recognition (ASR) model created by OpenAI that can both transcribe and translate speech in many languages. In this project the "small" Whisper variant, with 244M parameters, was used. The model was fine-tuned on the Swedish subset of the Mozilla Foundation Common Voice 11 dataset.

Each audio clip in the dataset is truncated or padded to a 30-second snippet and then converted to a log-Mel spectrogram, which is the input format the Whisper architecture expects. Training was done on Google Colab, with checkpoints saved to Google Drive during training in case of disconnections. The models were also pushed to a Hugging Face model repository along with TensorBoard data for visualizing the metrics.
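To make the pad/truncate and log-Mel steps concrete, here is a minimal NumPy sketch. It is an illustration, not Whisper's actual feature extractor (the project would use `WhisperFeatureExtractor` from transformers); the constants match Whisper's published settings (16 kHz audio, 25 ms window, 10 ms hop, 80 mel bins):

```python
import numpy as np

SAMPLE_RATE = 16_000   # Whisper expects 16 kHz audio
CHUNK_SECONDS = 30     # every clip is padded/truncated to 30 s
N_FFT, HOP = 400, 160  # 25 ms window, 10 ms hop
N_MELS = 80            # whisper-small uses 80 mel bins


def pad_or_truncate(audio: np.ndarray) -> np.ndarray:
    """Force the waveform to exactly 30 s, as described above."""
    target = SAMPLE_RATE * CHUNK_SECONDS
    if len(audio) >= target:
        return audio[:target]
    return np.pad(audio, (0, target - len(audio)))


def mel_filterbank(n_mels: int, n_fft: int, sr: int) -> np.ndarray:
    """Simplified triangular mel filters (illustrative, not Whisper's exact bank)."""
    hz_to_mel = lambda hz: 2595.0 * np.log10(1.0 + hz / 700.0)
    mel_to_hz = lambda mel: 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * hz_points / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb


def log_mel_spectrogram(audio: np.ndarray) -> np.ndarray:
    """Pad/truncate to 30 s, then windowed FFT -> mel projection -> log."""
    audio = pad_or_truncate(audio)
    window = np.hanning(N_FFT)
    n_frames = 1 + (len(audio) - N_FFT) // HOP
    frames = np.stack(
        [audio[i * HOP : i * HOP + N_FFT] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2        # (frames, n_fft//2 + 1)
    mel = mel_filterbank(N_MELS, N_FFT, SAMPLE_RATE) @ power.T
    return np.log10(np.maximum(mel, 1e-10))                 # (n_mels, frames)


# A 5 s clip still yields a fixed-size spectrogram thanks to the 30 s padding.
spec = log_mel_spectrogram(np.random.randn(5 * SAMPLE_RATE))
```

Because every clip is forced to the same 30-second length first, all spectrograms share one shape, which is what lets Whisper batch variable-length audio.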

Training Hyperparameters

| Hyperparameter | Value |
| --- | --- |
| `num_train_epochs` | 1 |
| `per_device_train_batch_size` | 16 |
| `gradient_accumulation_steps` | 1 |
| `learning_rate` | 1e-4 |
| `warmup_steps` | 50 |
| `max_steps` | 1000 |
| `gradient_checkpointing` | True |
| `fp16` | True |
| `per_device_eval_batch_size` | 8 |
| `generation_max_length` | 225 |
| `save_steps` | 250 |
| `eval_steps` | 250 |
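These names map directly onto transformers' `Seq2SeqTrainingArguments`. A hedged sketch is below; the `output_dir`, `evaluation_strategy`, `predict_with_generate`, and `push_to_hub` entries are assumptions added to make the fragment coherent, and note that `max_steps=1000` takes precedence over `num_train_epochs` when both are set:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-small-sv",   # assumed name, not stated in the card
    num_train_epochs=1,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,
    learning_rate=1e-4,
    warmup_steps=50,
    max_steps=1000,                  # overrides num_train_epochs when set
    gradient_checkpointing=True,
    fp16=True,
    per_device_eval_batch_size=8,
    predict_with_generate=True,      # needed for generation_max_length to apply
    generation_max_length=225,
    evaluation_strategy="steps",     # needed for eval_steps to apply
    save_steps=250,
    eval_steps=250,
    push_to_hub=True,                # mirrors the card's Hub checkpointing
)
```

This arguments object would then be passed to a `Seq2SeqTrainer` together with the model, processed dataset, and a WER metric.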
