metadata
language:
  - el
license: apache-2.0
tags:
  - whisper-event
  - generated_from_trainer
  - whisper-large
  - mozilla-foundation/common_voice_11_0
  - greek
datasets:
  - mozilla-foundation/common_voice_11_0
  - google/fleurs
metrics:
  - wer
model-index:
  - name: whisper-lg-el-intlv-xs-2
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: mozilla-foundation/common_voice_11_0 el
          type: mozilla-foundation/common_voice_11_0
          config: el
          split: test
        metrics:
          - name: Wer
            type: wer
            value: 9.50037147102526

whisper-lg-el-intlv-xs-2

This model is a fine-tuned version of farsipal/whisper-lg-el-intlv-xs, trained on two interleaved datasets: mozilla-foundation/common_voice_11_0 (config el) and google/fleurs (config el_gr). It achieves the following results on the evaluation set:

  • Loss: 0.2872
  • Wer: 9.5004
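
For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between the reference and the predicted transcript, divided by the number of reference words. The value above is presumably expressed as a percentage. The function below is a minimal illustrative sketch for a single sentence pair, not the evaluation code used for this model; `word_error_rate` is a hypothetical helper name, and it skips the text normalization that the training run enabled with `--do_normalize_eval`.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One of three reference words is dropped, so WER is 1/3 (about 33.3%).
example = word_error_rate("καλημέρα σας κύριε", "καλημέρα κύριε")
print(f"{100 * example:.1f}%")
```

A corpus-level WER, as reported in model cards, normally sums the edit operations over all utterances before dividing by the total reference word count, rather than averaging per-sentence values.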

Model description

This is a Whisper large checkpoint fine-tuned for Greek speech transcription on two interleaved datasets, Common Voice 11.0 and FLEURS.

Intended uses & limitations

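The model is intended for transcribing Greek speech to Greek text. It was fine-tuned for the transcribe task only and evaluated only on the Common Voice 11.0 (el) test split.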
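
A minimal usage sketch follows, using the `pipeline` API from the Transformers library. The Hub id `farsipal/whisper-lg-el-intlv-xs-2` is an assumption pieced together from the uploader name and the model name on this card, and `transcribe` and `build_transcriber` are hypothetical helper names. How the language and task are forced at generation time depends on the installed Transformers version, so this sketch relies on the settings stored with the checkpoint.

```python
# Assumption: the checkpoint is published on the Hugging Face Hub under this id.
MODEL_ID = "farsipal/whisper-lg-el-intlv-xs-2"


def build_transcriber(model_id: str = MODEL_ID):
    """Create a speech-recognition pipeline (downloads the weights on first use)."""
    # Imported lazily so that this module can be loaded without transformers.
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # Whisper processes audio in windows of up to 30 s
    )


def transcribe(audio_path: str) -> str:
    """Return the transcript of a single audio file."""
    asr = build_transcriber()
    return asr(audio_path)["text"]


if __name__ == "__main__":
    # "sample_el.wav" is a placeholder path, not a file shipped with the model.
    print(transcribe("sample_el.wav"))
```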

Training and evaluation data

Training was performed on two interleaved datasets: the train and validation splits of Common Voice 11.0 (el) and of FLEURS (el_gr). Evaluation was performed on the Common Voice 11.0 (el) test split only.
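
This card does not say how examples from the two datasets were mixed. One simple possibility is round-robin alternation, which the sketch below illustrates with plain Python lists standing in for the real datasets. The `interleave` helper and the toy records are hypothetical; the Datasets library offers `interleave_datasets` for the same purpose, but whether the training script used it is an assumption.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def interleave(*sources: Iterable[T]) -> Iterator[T]:
    """Alternate between sources, stopping once the shortest one is exhausted."""
    for group in zip(*sources):
        yield from group


# Toy stand-ins for the two training sets (not real data).
common_voice = [{"source": "common_voice", "text": f"cv-{i}"} for i in range(3)]
fleurs = [{"source": "fleurs", "text": f"fl-{i}"} for i in range(2)]

mixed = list(interleave(common_voice, fleurs))
print([example["text"] for example in mixed])
```

Stopping at the shortest source keeps the mix balanced at the cost of dropping the tail of the larger dataset; oversampling the smaller one is the usual alternative.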

Training procedure

                --model_name_or_path   'farsipal/whisper-lg-el-intlv-xs' \
                --model_revision   main \
                --do_train   True \
                --do_eval   True \
                --use_auth_token   False \
                --freeze_feature_encoder   False \
                --freeze_encoder   False \
                --model_index_name   'whisper-lg-el-intlv-xs-2' \
                --dataset_name 'mozilla-foundation/common_voice_11_0,google/fleurs' \
                --dataset_config_name 'el,el_gr' \
                --train_split_name  'train+validation,train+validation' \
                --eval_split_name   'test,-' \
                --text_column_name  'sentence,transcription' \
                --audio_column_name 'audio,audio' \
                --streaming   False \
                --max_duration_in_seconds   30 \
                --do_lower_case   False \
                --do_remove_punctuation   False \
                --do_normalize_eval   True \
                --language   greek \
                --task transcribe \
                --shuffle_buffer_size   500 \
                --output_dir   './data/finetuningRuns/whisper-lg-el-intlv-xs-2' \
                --overwrite_output_dir   True \
                --per_device_train_batch_size   8 \
                --gradient_accumulation_steps  4 \
                --learning_rate   3.5e-6 \
                --dropout         0.15 \
                --attention_dropout 0.05 \
                --warmup_steps   500 \
                --max_steps   5000 \
                --eval_steps   1000 \
                --gradient_checkpointing   True \
                --cache_dir   '~/.cache' \
                --fp16   True \
                --evaluation_strategy   steps \
                --per_device_eval_batch_size   8 \
                --predict_with_generate   True \
                --generation_max_length   225 \
                --save_steps   1000 \
                --logging_steps   25 \
                --report_to   tensorboard \
                --load_best_model_at_end   True \
                --metric_for_best_model   wer \
                --greater_is_better   False \
                --push_to_hub   False  \
                --dataloader_num_workers 6
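
Several of the arguments above carry one comma-separated value per dataset (for example `el,el_gr` or `sentence,transcription`). The sketch below shows one way such values could be paired up into per-dataset settings. `DatasetSpec` and `parse_specs` are hypothetical names, and reading `-` as "no evaluation split for this dataset" is an assumption, although it agrees with evaluation being run on Common Voice only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetSpec:
    name: str
    config: str
    train_split: str
    eval_split: Optional[str]  # None when the dataset is not used for evaluation
    text_column: str
    audio_column: str


def parse_specs(names: str, configs: str, train_splits: str,
                eval_splits: str, text_columns: str,
                audio_columns: str) -> List[DatasetSpec]:
    """Split comma-separated arguments and zip them into one spec per dataset."""
    columns = [value.split(",") for value in
               (names, configs, train_splits, eval_splits, text_columns, audio_columns)]
    if len({len(column) for column in columns}) != 1:
        raise ValueError("every argument must list the same number of datasets")
    return [
        DatasetSpec(name, config, train, None if evaluation == "-" else evaluation,
                    text, audio)
        for name, config, train, evaluation, text, audio in zip(*columns)
    ]


specs = parse_specs(
    "mozilla-foundation/common_voice_11_0,google/fleurs",
    "el,el_gr",
    "train+validation,train+validation",
    "test,-",
    "sentence,transcription",
    "audio,audio",
)
for spec in specs:
    print(spec)
```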

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3.5e-06
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 5000
  • mixed_precision_training: Native AMP
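
The list above implies two derived quantities: the effective batch size, and the learning rate at any given step under a linear schedule with warmup. The sketch below computes both. It assumes the usual shape of that schedule (linear ramp from zero to the peak over the warmup steps, then linear decay to zero at the last step) and a single training device, which is inferred from 8 × 4 = 32 rather than stated on this card. The function names are hypothetical.

```python
PEAK_LR = 3.5e-6
WARMUP_STEPS = 500
TOTAL_STEPS = 5000


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step for linear warmup and decay."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, remaining)


def effective_batch_size(per_device: int, accumulation: int, num_devices: int = 1) -> int:
    """Number of examples contributing to each optimizer step."""
    return per_device * accumulation * num_devices


print(effective_batch_size(8, 4))  # matches total_train_batch_size: 32
for step in (0, 250, 500, 2750, 5000):
    print(step, linear_lr(step))
```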

Training results

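| Training Loss | Epoch | Step | Validation Loss | Wer     |
|---------------|-------|------|-----------------|---------|
| 0.0813        | 2.49  | 1000 | 0.2147          | 10.8284 |
| 0.0379        | 4.98  | 2000 | 0.2439          | 10.0111 |
| 0.0195        | 7.46  | 3000 | 0.2767          | 9.8811  |
| 0.0126        | 9.95  | 4000 | 0.2872          | 9.5004  |
| 0.0103        | 12.44 | 5000 | 0.3021          | 9.6954  |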
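
The reported loss and WER match the step-4000 row, which fits the `--load_best_model_at_end`, `--metric_for_best_model wer` and `--greater_is_better False` settings in the training arguments. The sketch below reproduces that selection from the table values; `Checkpoint` and `RESULTS` are hypothetical names used only for this illustration.

```python
from typing import NamedTuple


class Checkpoint(NamedTuple):
    step: int
    epoch: float
    training_loss: float
    validation_loss: float
    wer: float


# Values copied from the training results table.
RESULTS = [
    Checkpoint(1000, 2.49, 0.0813, 0.2147, 10.8284),
    Checkpoint(2000, 4.98, 0.0379, 0.2439, 10.0111),
    Checkpoint(3000, 7.46, 0.0195, 0.2767, 9.8811),
    Checkpoint(4000, 9.95, 0.0126, 0.2872, 9.5004),
    Checkpoint(5000, 12.44, 0.0103, 0.3021, 9.6954),
]

# Lower WER is better, so the best checkpoint is the minimum by WER.
best_by_wer = min(RESULTS, key=lambda checkpoint: checkpoint.wer)
best_by_loss = min(RESULTS, key=lambda checkpoint: checkpoint.validation_loss)

print("best by WER:", best_by_wer.step, best_by_wer.wer)
print("best by validation loss:", best_by_loss.step, best_by_loss.validation_loss)
```

The two criteria disagree: validation loss is lowest at step 1000 and rises afterwards, while WER keeps improving until step 4000, so selecting on loss would have kept a checkpoint with a higher WER.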

Framework versions

  • Transformers 4.26.0.dev0
  • Pytorch 1.13.0+cu117
  • Datasets 2.8.1.dev0
  • Tokenizers 0.13.2