Whisper Large GA-EN Speech Translation

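This model is a fine-tuned version of openai/whisper-large for Irish-to-English (GA-EN) speech translation, trained on the IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia datasets. It achieves the following results on the evaluation set (these values match the step 2000 row of the training results):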

  • Loss: 1.1318
  • Bleu: 31.26
  • Chrf: 50.41
  • Wer: 62.3143
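
The card does not state how these metrics were computed. As a rough illustration of what the Wer figure measures, here is a minimal plain-Python sketch of the standard word error rate definition: the word-level edit distance between hypothesis and reference, divided by the number of reference words, expressed as a percentage. The function name `word_error_rate` is hypothetical, and real evaluation scripts usually normalize casing and punctuation first. Because inserted words count as errors, the value can exceed 100, which is consistent with the early checkpoints in the training results reporting Wer above 100.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate in percent: edit distance over words / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# A hypothesis with three extra words against a two-word reference.
print(word_error_rate("the cat", "the big black cat ran"))  # 150.0
```
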

Model description

More information needed

Intended uses & limitations

More information needed
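
Although the card gives no usage instructions, fine-tuned Whisper checkpoints are commonly loaded through the Transformers `pipeline` API. The following is a minimal sketch under that assumption, not a documented recipe from the author. The helper name `translate_irish_audio` and the file name in the usage note are hypothetical, and the sketch assumes the checkpoint's saved generation settings already produce English output for Irish audio.

```python
MODEL_ID = "ymoslem/whisper-large-ga2en-v2.1"  # repository name shown on this model page


def translate_irish_audio(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Return the English text the model produces for one Irish audio file."""
    # Imported lazily so that defining this helper does not require
    # transformers (or a model download) until it is actually called.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # The ASR pipeline returns a dict whose "text" field holds the output string.
    return asr(audio_path)["text"]
```

Calling `translate_irish_audio("sample_ga.wav")` (a hypothetical local file) downloads the 1.54B-parameter checkpoint on first use, and decoding audio from a file path requires ffmpeg to be installed.
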

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 0.03
  • training_steps: 3000
  • mixed_precision_training: Native AMP
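
The relationship between some of these values can be sketched in a few lines of Python. The total train batch size of 16 follows from 4 examples per step times 4 gradient accumulation steps, assuming a single device (the card does not list a device count). The learning-rate function below follows the usual linear-with-warmup shape. Note that the listed warmup value of 0.03 is not an integer step count, so the sketch assumes it is a ratio of the 3000 training steps (90 steps); that reading is an assumption, not something the card confirms.

```python
TRAIN_BATCH_SIZE = 4      # from the card
GRAD_ACCUM_STEPS = 4      # from the card
LEARNING_RATE = 1e-4      # from the card
TRAINING_STEPS = 3000     # from the card
NUM_DEVICES = 1           # assumption: not stated in the card
WARMUP_STEPS = 90         # assumption: 0.03 read as a ratio of 3000 steps


def effective_batch_size(per_device: int, accum_steps: int, devices: int = 1) -> int:
    """Examples contributing to each optimizer update."""
    return per_device * accum_steps * devices


def linear_lr(step: int,
              base_lr: float = LEARNING_RATE,
              total_steps: int = TRAINING_STEPS,
              warmup_steps: int = WARMUP_STEPS) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))  # 16
for step in (0, 90, 1545, 3000):
    print(step, linear_lr(step))
```
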

Training results

| Training Loss | Epoch | Step | Bleu | Chrf | Validation Loss | Wer |
|---|---|---|---|---|---|---|
| 3.1547 | 0.03 | 100 | 3.75 | 18.71 | 2.4173 | 124.0882 |
| 2.6996 | 0.07 | 200 | 8.16 | 25.45 | 2.1329 | 114.1378 |
| 2.4841 | 0.1 | 300 | 6.4 | 23.6 | 2.0262 | 158.1720 |
| 2.4706 | 0.13 | 400 | 9.16 | 27.67 | 1.9688 | 120.0810 |
| 2.3575 | 0.16 | 500 | 13.66 | 31.5 | 1.8284 | 100.8555 |
| 2.1916 | 0.2 | 600 | 12.97 | 31.8 | 1.7486 | 110.1756 |
| 2.1353 | 0.23 | 700 | 16.7 | 33.52 | 1.7568 | 86.8528 |
| 1.9885 | 0.26 | 800 | 19.34 | 35.35 | 1.6395 | 78.7033 |
| 1.9126 | 0.3 | 900 | 20.21 | 36.28 | 1.5658 | 78.2080 |
| 1.6418 | 0.33 | 1000 | 18.61 | 38.49 | 1.4998 | 86.8528 |
| 1.5782 | 0.36 | 1100 | 22.91 | 40.04 | 1.4716 | 71.0941 |
| 1.4899 | 0.39 | 1200 | 21.55 | 40.92 | 1.4444 | 78.7933 |
| 1.3155 | 0.43 | 1300 | 24.95 | 42.05 | 1.3934 | 70.9140 |
| 1.4144 | 0.46 | 1400 | 28.38 | 46.18 | 1.2791 | 65.8262 |
| 1.1949 | 0.49 | 1500 | 26.95 | 45.84 | 1.2879 | 70.6889 |
| 1.0179 | 0.53 | 1600 | 26.12 | 46.4 | 1.2624 | 69.6983 |
| 1.0935 | 0.56 | 1700 | 28.51 | 48.24 | 1.2076 | 67.4021 |
| 1.061 | 0.59 | 1800 | 27.42 | 48.83 | 1.1812 | 71.4543 |
| 1.0955 | 0.62 | 1900 | 31.32 | 49.91 | 1.1503 | 62.9896 |
| 1.0607 | 0.66 | 2000 | 31.26 | 50.41 | 1.1318 | 62.3143 |
| 1.1135 | 0.6897 | 2100 | 26.57 | 46.18 | 1.2135 | 69.7884 |
| 0.9819 | 0.7225 | 2200 | 26.95 | 49.47 | 1.2252 | 65.0158 |
| 0.9909 | 0.7553 | 2300 | 30.35 | 46.49 | 1.2072 | 63.3048 |
| 0.9521 | 0.7882 | 2400 | 24.76 | 46.44 | 1.2130 | 70.6889 |
| 0.8245 | 0.8210 | 2500 | 24.84 | 47.05 | 1.1724 | 78.1630 |
| 0.8303 | 0.8539 | 2600 | 27.56 | 47.48 | 1.1812 | 70.1036 |
| 0.6934 | 0.8867 | 2700 | 31.61 | 50.4 | 1.1716 | 63.8001 |
| 0.7117 | 0.9195 | 2800 | 30.82 | 49.95 | 1.1650 | 65.0158 |
| 0.6944 | 0.9524 | 2900 | 31.21 | 49.8 | 1.1516 | 63.5750 |
| 0.7132 | 0.9852 | 3000 | 30.16 | 49.77 | 1.1390 | 65.6011 |
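
Different metrics in the results above favor different checkpoints, so which one counts as "best" depends on the selection criterion. The sketch below, using a handful of rows copied from the training results, shows one way to compare them. The `Checkpoint` type and field names are hypothetical, and the card does not say which criterion the author actually used.

```python
from typing import NamedTuple


class Checkpoint(NamedTuple):
    step: int
    bleu: float
    chrf: float
    val_loss: float
    wer: float


# A subset of rows from the training results (the strongest late checkpoints).
CHECKPOINTS = [
    Checkpoint(1900, 31.32, 49.91, 1.1503, 62.9896),
    Checkpoint(2000, 31.26, 50.41, 1.1318, 62.3143),
    Checkpoint(2700, 31.61, 50.40, 1.1716, 63.8001),
    Checkpoint(2900, 31.21, 49.80, 1.1516, 63.5750),
    Checkpoint(3000, 30.16, 49.77, 1.1390, 65.6011),
]

# Lower is better for validation loss and Wer; higher is better for Bleu and Chrf.
best_by_loss = min(CHECKPOINTS, key=lambda c: c.val_loss)
best_by_wer = min(CHECKPOINTS, key=lambda c: c.wer)
best_by_bleu = max(CHECKPOINTS, key=lambda c: c.bleu)

print(best_by_loss.step)  # 2000
print(best_by_wer.step)   # 2000
print(best_by_bleu.step)  # 2700
```

Step 2000 has the lowest validation loss and Wer, and its values are the ones reported at the top of the card, while step 2700 has a slightly higher Bleu.
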

Framework versions

  • Transformers 4.40.0
  • Pytorch 2.0.1+cu118
  • Datasets 2.18.0
  • Tokenizers 0.19.1

Model details

  • Model ID: ymoslem/whisper-large-ga2en-v2.1
  • Model size: 1.54B params
  • Tensor type: F32
  • Format: Safetensors

Evaluation results

  • Bleu on IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia (self-reported): 30.160
  • Wer on IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia (self-reported): 65.601