Whisper Medium GA-EN Speech Translation Raw

This model is a fine-tuned version of openai/whisper-medium on the IWSLT-2023, FLEURS, BiteSize, and SpokenWords datasets. It achieves the following results on the evaluation set:

  • Loss: 1.6246
  • Bleu: 27.65
  • Chrf: 47.08
  • Wer: 71.0941
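
The WER figure above is a percentage, and several intermediate checkpoints in the training results report values above 100. That is possible because word error rate divides the word-level edit distance by the length of the reference, so a hypothesis with many extra words can accumulate more errors than the reference has words. The following is a minimal sketch of the standard definition; it assumes plain whitespace tokenization, and the card does not say which text normalization or library was used for the reported scores.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words to an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion (reference word missing)
                    curr[j - 1] + 1,     # insertion (extra hypothesis word)
                    prev[j - 1] + cost,  # substitution or exact match
                )
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# A repetitive hypothesis yields five insertions against a 3-word reference,
# so the score exceeds 100.
looping_score = word_error_rate("the cat sat", "the the cat cat sat sat on on")
```

Output that repeats or runs on past the end of the sentence is one way a score like this can arise, although the card itself does not state the cause of the high early values.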

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • training_steps: 4000
  • mixed_precision_training: Native AMP
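
To make the scheduler settings above concrete, here is a small sketch of a linear schedule with warmup, using the listed values (peak learning rate 0.0001, 100 warmup steps, 4000 training steps). It assumes the usual shape of such a schedule: a linear ramp from zero to the peak during warmup, then a linear decay to zero at the last step. The function name is hypothetical and is not taken from the training script.

```python
def linear_schedule_lr(
    step: int,
    base_lr: float = 1e-4,     # learning_rate from the card
    warmup_steps: int = 100,   # lr_scheduler_warmup_steps from the card
    total_steps: int = 4000,   # training_steps from the card
) -> float:
    """Learning rate at a given optimizer step (illustrative sketch)."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay: fall linearly from base_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


# Learning rate at the steps where the first and last evaluations were logged.
lr_at_first_eval = linear_schedule_lr(100)
lr_at_last_eval = linear_schedule_lr(4000)
```

Under these assumptions the peak rate is reached exactly at step 100, the first logged evaluation, and the rate is zero by step 4000.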

Training results

| Training Loss | Epoch | Step | Validation Loss | Bleu | Chrf | Wer |
|---|---|---|---|---|---|---|
| 2.3743 | 0.0539 | 100 | 2.1064 | 5.67 | 20.91 | 126.9248 |
| 2.3196 | 0.1079 | 200 | 2.1133 | 11.35 | 26.01 | 89.5092 |
| 2.2729 | 0.1618 | 300 | 2.0561 | 6.85 | 25.04 | 156.5061 |
| 2.0887 | 0.2157 | 400 | 1.9701 | 10.46 | 29.21 | 118.6853 |
| 1.9663 | 0.2697 | 500 | 1.9824 | 16.53 | 31.2 | 77.5326 |
| 1.9504 | 0.3236 | 600 | 1.8619 | 7.02 | 27.46 | 193.7416 |
| 1.7843 | 0.3776 | 700 | 1.8683 | 16.6 | 33.6 | 87.7082 |
| 1.8915 | 0.4315 | 800 | 1.7730 | 16.89 | 36.54 | 91.8505 |
| 1.6921 | 0.4854 | 900 | 1.8049 | 13.14 | 34.45 | 114.0477 |
| 1.4761 | 0.5394 | 1000 | 1.8310 | 22.12 | 37.3 | 77.1724 |
| 1.3067 | 0.5933 | 1100 | 1.7911 | 17.21 | 34.34 | 90.5448 |
| 1.3564 | 0.6472 | 1200 | 1.7045 | 20.09 | 39.67 | 85.1869 |
| 1.489 | 0.7012 | 1300 | 1.7601 | 15.3 | 36.53 | 107.8793 |
| 1.3023 | 0.7551 | 1400 | 1.7428 | 18.99 | 39.54 | 89.7794 |
| 1.1744 | 0.8091 | 1500 | 1.7446 | 21.68 | 41.78 | 79.4687 |
| 1.0122 | 0.8630 | 1600 | 1.7180 | 18.28 | 39.27 | 96.7582 |
| 1.0787 | 0.9169 | 1700 | 1.6144 | 16.94 | 39.74 | 98.8744 |
| 0.9561 | 0.9709 | 1800 | 1.6290 | 25.29 | 42.13 | 74.9662 |
| 0.4452 | 1.0248 | 1900 | 1.7223 | 18.95 | 39.14 | 97.0734 |
| 0.4397 | 1.0787 | 2000 | 1.6855 | 23.4 | 40.9 | 77.9379 |
| 0.4382 | 1.1327 | 2100 | 1.6911 | 24.95 | 41.19 | 72.8951 |
| 0.3937 | 1.1866 | 2200 | 1.7127 | 23.33 | 41.09 | 78.4331 |
| 0.4119 | 1.2406 | 2300 | 1.6796 | 23.25 | 42.32 | 83.6560 |
| 0.4139 | 1.2945 | 2400 | 1.6730 | 23.13 | 43.25 | 83.3408 |
| 0.3506 | 1.3484 | 2500 | 1.7361 | 23.37 | 42.31 | 79.9190 |
| 0.4109 | 1.4024 | 2600 | 1.6233 | 23.78 | 44.32 | 82.8005 |
| 0.3563 | 1.4563 | 2700 | 1.6383 | 20.41 | 43.66 | 98.1540 |
| 0.3355 | 1.5102 | 2800 | 1.6675 | 25.27 | 44.91 | 75.6866 |
| 0.2751 | 1.5642 | 2900 | 1.7011 | 24.64 | 43.19 | 74.2008 |
| 0.28 | 1.6181 | 3000 | 1.6308 | 24.76 | 45.49 | 79.4687 |
| 0.3108 | 1.6721 | 3100 | 1.5976 | 28.9 | 47.03 | 68.7978 |
| 0.3231 | 1.7260 | 3200 | 1.6070 | 27.82 | 46.1 | 69.8334 |
| 0.2665 | 1.7799 | 3300 | 1.5853 | 26.0 | 44.51 | 74.9212 |
| 0.2788 | 1.8339 | 3400 | 1.5689 | 26.37 | 46.94 | 75.0113 |
| 0.243 | 1.8878 | 3500 | 1.5885 | 29.12 | 46.94 | 67.4021 |
| 0.2605 | 1.9417 | 3600 | 1.5680 | 28.64 | 46.38 | 67.8523 |
| 0.1664 | 1.9957 | 3700 | 1.5910 | 28.45 | 46.64 | 68.0774 |
| 0.049 | 2.0496 | 3800 | 1.6385 | 27.78 | 46.51 | 69.9235 |
| 0.0635 | 2.1036 | 3900 | 1.6272 | 27.57 | 47.25 | 71.1391 |
| 0.0467 | 2.1575 | 4000 | 1.6246 | 27.65 | 47.08 | 71.0941 |
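
The Epoch and Step columns together give a rough idea of the training set size, which the card does not state. The arithmetic below is a back-of-the-envelope sketch. It assumes one optimizer step consumes exactly one batch of 16 examples, meaning no gradient accumulation and a single device. The card lists neither setting, so the example count is only an estimate.

```python
# Values taken from the last row of the training results and the hyperparameters.
final_step = 4000
final_epoch = 2.1575
train_batch_size = 16

# Optimizer steps needed to pass over the training data once.
steps_per_epoch = final_step / final_epoch

# Assumption: one step == one batch of `train_batch_size` examples.
approx_examples_per_epoch = steps_per_epoch * train_batch_size

print(f"about {steps_per_epoch:.0f} steps per epoch")
print(f"about {approx_examples_per_epoch:.0f} training examples (estimate)")
```

If gradient accumulation or multiple devices were in use, the real example count would be a multiple of this estimate.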

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.2.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
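
With a compatible Transformers and PyTorch installation, the checkpoint can presumably be loaded through the generic speech recognition pipeline. This is a hedged sketch rather than the author's documented usage. It uses the repository id `ymoslem/whisper-medium-ga2en-a-v1-r`. The helper name `translate_irish_audio` is hypothetical, and the sketch assumes the generation settings saved with the checkpoint already produce English output, which the card does not confirm. Decoding an audio file from a path also requires ffmpeg to be installed.

```python
MODEL_ID = "ymoslem/whisper-medium-ga2en-a-v1-r"


def translate_irish_audio(audio_path: str) -> str:
    """Translate one Irish (GA) audio file into English text (sketch)."""
    # Imported lazily so this module can be imported without transformers.
    from transformers import pipeline

    # chunk_length_s lets the pipeline split recordings longer than
    # Whisper's 30-second input window into chunks.
    asr = pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        chunk_length_s=30,
    )
    result = asr(audio_path)
    return result["text"]
```

Calling `translate_irish_audio("sample.wav")` downloads the weights on first use. In a long-running program, build the pipeline once and reuse it rather than rebuilding it on every call as this sketch does.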

Model size: 764M params (Safetensors, tensor type F32)
Model repository: ymoslem/whisper-medium-ga2en-a-v1-r
