---
language:
- el
license: apache-2.0
tags:
- whisper-event
- generated_from_trainer
datasets:
- mozilla-foundation/common_voice_11_0
metrics:
- wer
model-index:
- name: Whisper Small - Greek (el)
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: mozilla-foundation/common_voice_11_0 el
      type: mozilla-foundation/common_voice_11_0
      config: el
      split: test
      args: el
    metrics:
    - name: Wer
      type: wer
      value: 25.696508172362552
---

# Whisper Small - Greek (el)

This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the mozilla-foundation/common_voice_11_0 el dataset for translation from Greek to English.
It achieves the following results on the evaluation set:
- Loss: 0.4642
- Wer: 25.6965

## Model description

This model was fine-tuned with the encoder frozen. Only the decoder weights were changed by this training run.

## Intended uses & limitations

This model was trained to understand how freezing part of the model affects learning, in an effort to assess the feasibility of enabling adapters.

## Training and evaluation data

Training was performed by streaming the interleaved train+validation splits of the Greek (el) subset of mozilla-foundation/common_voice_11_0. The test split was streamed in the same way and used for validation.

## Training procedure

Fine-tuning was performed on a Lambda Labs laptop equipped with an NVIDIA GeForce RTX 3080 Laptop GPU (16 GB).
The training script, `run_speech_recognition_seq2seq_streaming.py`, is included in the files of this space. It was run with the following arguments:

```
--model_name_or_path "openai/whisper-small"
--model_revision "main"
--do_train True
--do_eval True
--use_auth_token False
--freeze_encoder True
--model_index_name "Whisper Small - Greek (el)"
--dataset_name "mozilla-foundation/common_voice_11_0"
--dataset_config_name "el"
--audio_column_name "audio"
--text_column_name "sentence"
--max_duration_in_seconds 30
--train_split_name "train+validation"
--eval_split_name "test"
--do_lower_case False
--do_remove_punctuation False
--do_normalize_eval True
--language "greek"
--task "translate"
--shuffle_buffer_size 500
--output_dir "./data/finetuningRuns/whisper-sm-el-frzEnc-xlate"
--per_device_train_batch_size 16
--gradient_accumulation_steps 4
--learning_rate 1e-5
--warmup_steps 500
--max_steps 5000
--gradient_checkpointing True
--fp16 True
--evaluation_strategy "steps"
--per_device_eval_batch_size 8
--predict_with_generate True
--generation_max_length 225
--save_steps 1000
--eval_steps 1000
--logging_steps 25
--report_to "tensorboard"
--load_best_model_at_end True
--metric_for_best_model "wer"
--greater_is_better False
--push_to_hub False
--overwrite_output_dir True
```

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 5000
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch | Step | Validation Loss | Wer     |
|:-------------:|:-----:|:----:|:---------------:|:-------:|
| 0.0032        | 18.01 | 1000 | 0.4642          | 25.6965 |
| 0.0006        | 37.01 | 2000 | 0.5369          | 26.4395 |
| 0.0003        | 56.01 | 3000 | 0.5703          | 26.3187 |
| 0.0002        | 75.0  | 4000 | 0.5913          | 26.4302 |
| 0.0001        | 94.0  | 5000 | 0.5996          | 26.4952 |

Upon completion of training, the best model was reloaded and tested, with the following results extracted from the stdout log:

```
***** eval metrics *****
  epoch                   =       94.0
  eval_loss               =     0.4642
  eval_runtime            = 0:19:54.59
  eval_samples_per_second =       1.42
  eval_steps_per_second   =      0.177
  eval_wer                =    25.6965
```

### Framework versions

- Transformers 4.26.0.dev0
- Pytorch 1.13.0
- Datasets 2.7.1.dev0
- Tokenizers 0.12.1
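
As a rough check on the training hyperparameters listed above, the following sketch reproduces the total train batch size of 64 and shows how a linear schedule with 500 warmup steps would move the learning rate over 5000 steps. It assumes a single GPU and the usual linear warmup followed by linear decay to zero; the helper names `effective_batch_size` and `linear_lr` are hypothetical and are not taken from the training script.

```python
# Values copied from the hyperparameter list in this card.
LEARNING_RATE = 1e-5
WARMUP_STEPS = 500
TRAINING_STEPS = 5000
PER_DEVICE_BATCH = 16
GRAD_ACCUM_STEPS = 4
NUM_DEVICES = 1  # assumption: one GPU, as on the laptop described above


def effective_batch_size(per_device, accum_steps, num_devices=NUM_DEVICES):
    """Samples contributing to one optimizer update."""
    return per_device * accum_steps * num_devices


def linear_lr(step, base_lr=LEARNING_RATE, warmup=WARMUP_STEPS, total=TRAINING_STEPS):
    """Learning rate at `step`: linear warmup, then linear decay to zero.

    Assumption: this mirrors the common linear-with-warmup schedule;
    it is a sketch, not code taken from the training script.
    """
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


if __name__ == "__main__":
    print("total_train_batch_size:", effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS))
    # Print the learning rate at each evaluation step from the results table.
    for step in (1000, 2000, 3000, 4000, 5000):
        print(step, linear_lr(step))
```

Under these assumptions the learning rate peaks at step 500 and is already decaying at the first evaluation (step 1000), which is also where the lowest validation loss and WER in the results table occur.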