Automatic Speech Recognition
Transformers
TensorBoard
Safetensors
Irish
English
whisper
Generated from Trainer
Eval Results
Inference Endpoints

Whisper Medium GA-EN Speech Translation

This model is a fine-tuned version of openai/whisper-medium for Irish-to-English (GA-EN) speech translation, trained on the IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia datasets. It achieves the following results on the evaluation set:

  • Loss: 1.3818
  • Bleu: 33.79
  • Chrf: 51.67
  • Wer: 61.6839
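
For inference, the checkpoint can be loaded with the Transformers `pipeline` API. A minimal usage sketch, assuming the checkpoint id from this card; the audio path `sample_ga.wav` is a placeholder for a 16 kHz Irish-language recording:

```python
from transformers import pipeline

# Load the fine-tuned checkpoint named on this card.
pipe = pipeline(
    "automatic-speech-recognition",
    model="ymoslem/whisper-medium-ga2en-v5.3.1-10k-r",
)

# Whisper produces a translation rather than a transcript when
# task="translate"; the audio path below is a placeholder.
result = pipe("sample_ga.wav", generate_kwargs={"task": "translate"})
print(result["text"])  # English translation of the Irish speech
```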

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.03
  • training_steps: 10000
  • mixed_precision_training: Native AMP
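
As a rough sketch, these settings correspond to the following Transformers `Seq2SeqTrainingArguments`; `output_dir` is a placeholder, and the original training script may have set additional options not reported above:

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical reconstruction: only values reported on this card are
# grounded; everything else is left at Trainer defaults (e.g. Adam
# betas=(0.9, 0.999) and epsilon=1e-08 are the optimizer defaults).
training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-medium-ga2en",  # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    lr_scheduler_type="linear",
    warmup_ratio=0.03,
    max_steps=10000,
    fp16=True,  # "Native AMP" mixed-precision training
)
```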

Training results

Training Loss Epoch Step Bleu Chrf Validation Loss Wer
2.4382 0.0109 100 3.07 16.85 2.1114 171.0491
2.6151 0.0219 200 6.25 23.02 2.0207 126.9698
2.5699 0.0328 300 5.71 24.03 1.8660 155.5606
2.3084 0.0438 400 9.87 28.45 1.8084 129.0860
2.3327 0.0547 500 12.01 31.92 1.7823 102.7915
2.1495 0.0657 600 13.97 32.4 1.7238 98.6042
2.2164 0.0766 700 11.21 33.19 1.6538 146.0153
2.0071 0.0876 800 14.34 35.72 1.7038 96.9383
1.8334 0.0985 900 16.51 37.23 1.6329 96.8032
1.8359 0.1095 1000 17.87 35.94 1.6637 84.4665
1.7703 0.1204 1100 19.54 39.02 1.5626 79.7839
1.5805 0.1314 1200 20.19 40.4 1.5618 77.8028
1.4545 0.1423 1300 13.88 35.53 1.5599 112.5619
1.5177 0.1533 1400 18.79 40.11 1.4880 84.6916
1.6335 0.1642 1500 16.41 38.64 1.4996 96.9833
1.3809 0.1752 1600 18.3 40.17 1.4739 101.8910
1.2694 0.1861 1700 22.53 43.15 1.4498 76.9923
1.2321 0.1970 1800 19.92 42.59 1.4163 84.6015
1.1969 0.2080 1900 21.63 44.92 1.4137 85.3670
1.2023 0.2189 2000 20.42 41.57 1.3530 82.8906
1.1676 0.2299 2100 22.82 44.23 1.3723 78.1180
1.0332 0.2408 2200 26.73 44.75 1.3641 70.2386
0.8589 0.2518 2300 26.94 46.89 1.3344 72.7600
0.9829 0.2627 2400 28.15 47.21 1.3181 69.1130
0.8228 0.2737 2500 26.98 47.41 1.3049 74.0207
0.7667 0.2846 2600 30.0 49.42 1.2698 65.1058
0.8749 0.2956 2700 27.91 47.67 1.2878 66.9518
0.7504 0.3065 2800 32.03 50.35 1.2670 63.6650
0.7069 0.3175 2900 30.7 49.53 1.2771 64.4304
0.7199 0.3284 3000 30.21 48.93 1.2658 65.5561
0.6207 0.3394 3100 30.82 49.11 1.2687 66.0063
0.5995 0.3503 3200 31.99 50.94 1.2207 62.9446
0.6294 0.3612 3300 31.05 50.85 1.2422 64.7006
0.4612 0.3722 3400 33.1 51.82 1.2203 61.9090
0.5138 0.3831 3500 32.08 51.86 1.2007 63.0797
0.5059 0.3941 3600 31.8 51.19 1.2130 63.9352
0.417 0.4050 3700 32.45 51.41 1.1975 62.2692
0.2958 0.4160 3800 29.29 51.39 1.2046 62.7645
0.393 0.4269 3900 28.95 51.45 1.1968 63.1697
0.3858 0.4379 4000 29.54 51.58 1.1929 62.4043
0.5416 0.4488 4100 27.29 43.94 1.3522 67.9424
0.6644 0.4598 4200 23.16 44.45 1.4191 77.3976
0.5246 0.4707 4300 22.26 44.91 1.4221 77.2625
0.614 0.4817 4400 26.9 46.15 1.3956 70.4638
0.5973 0.4926 4500 25.55 45.51 1.4152 76.7222
0.544 0.5036 4600 23.54 47.87 1.4091 79.1085
0.5975 0.5145 4700 21.85 42.69 1.4644 78.5682
0.4675 0.5255 4800 22.93 43.69 1.4598 76.9023
0.7959 0.5364 4900 24.91 44.98 1.3884 74.5610
0.5936 0.5473 5000 26.91 44.88 1.4235 69.0680
0.4631 0.5583 5100 25.77 45.81 1.4002 74.0207
0.5188 0.5692 5200 28.37 45.48 1.4405 66.2765
0.4675 0.5802 5300 21.1 43.11 1.4045 92.1207
0.4214 0.5911 5400 25.62 44.82 1.4250 72.2197
0.4592 0.6021 5500 27.24 46.44 1.4107 70.0585
0.4809 0.6130 5600 27.93 47.42 1.3896 69.5182
0.4364 0.6240 5700 25.84 47.47 1.3808 77.6227
0.3333 0.6349 5800 26.46 47.08 1.4203 72.4899
0.3345 0.6459 5900 23.1 44.6 1.4763 81.2247
0.3368 0.6568 6000 24.55 45.76 1.4182 80.5493
0.3061 0.6678 6100 23.1 45.97 1.4218 81.3597
0.324 0.6787 6200 28.26 47.06 1.4453 67.5822
0.2667 0.6897 6300 27.87 46.14 1.4494 69.0230
0.2845 0.7006 6400 26.39 46.72 1.4448 71.4543
0.3125 0.7115 6500 27.81 46.45 1.4643 70.0135
0.264 0.7225 6600 26.27 47.75 1.4244 72.7600
0.2426 0.7334 6700 25.84 46.68 1.4081 76.4070
0.2174 0.7444 6800 30.67 47.92 1.4036 65.8262
0.2265 0.7553 6900 28.11 49.12 1.4174 71.2292
0.2016 0.7663 7000 30.43 49.47 1.4341 65.9163
0.1865 0.7772 7100 32.05 49.5 1.3690 63.1697
0.2148 0.7882 7200 32.29 49.91 1.3603 63.8901
0.2126 0.7991 7300 32.07 49.31 1.4046 63.6650
0.1594 0.8101 7400 29.94 47.48 1.4122 65.5110
0.1295 0.8210 7500 30.14 49.79 1.4243 65.7812
0.1378 0.8320 7600 31.23 49.42 1.4334 65.9613
0.1701 0.8429 7700 31.04 49.95 1.4149 65.6461
0.1102 0.8539 7800 31.37 50.2 1.4082 63.7100
0.1267 0.8648 7900 32.86 50.83 1.3642 60.8285
0.1384 0.8758 8000 33.47 49.61 1.3860 59.8829
0.1128 0.8867 8100 32.78 50.04 1.3840 61.8190
0.1197 0.8976 8200 33.69 50.94 1.3641 61.8190
0.1181 0.9086 8300 32.0 49.65 1.3913 63.5299
0.0866 0.9195 8400 30.39 48.48 1.4171 68.0324
0.0784 0.9305 8500 32.27 49.32 1.3850 63.3949
0.092 0.9414 8600 33.78 51.13 1.3880 61.2787
0.0685 0.9524 8700 34.33 51.23 1.3876 61.1887
0.0783 0.9633 8800 33.4 48.9 1.4010 62.5844
0.0735 0.9743 8900 33.72 49.01 1.4035 61.5038
0.0875 0.9852 9000 30.44 49.06 1.4064 67.5371
0.0822 0.9962 9100 34.64 51.51 1.3803 60.5133
0.041 1.0071 9200 34.66 52.06 1.3678 59.4327
0.0351 1.0181 9300 33.88 51.16 1.3739 61.3688
0.0368 1.0290 9400 35.2 51.73 1.3846 60.4232
0.035 1.0400 9500 34.23 51.32 1.3753 60.8735
0.0277 1.0509 9600 35.0 52.59 1.3788 60.0180
0.0247 1.0619 9700 34.69 51.7 1.3914 60.2882
0.0321 1.0728 9800 34.63 51.91 1.3804 60.6033
0.0286 1.0837 9900 33.92 51.64 1.3795 61.8640
0.0239 1.0947 10000 33.79 51.67 1.3818 61.6839
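
The Bleu, Chrf, and Wer columns report BLEU, chrF, and word error rate. Assuming the standard sacreBLEU, chrF, and WER implementations (an assumption, not stated on the card), they can be computed with the Hugging Face `evaluate` library; the hypothesis and reference strings below are placeholders:

```python
import evaluate

# Load the three metrics reported in the table above.
bleu = evaluate.load("sacrebleu")
chrf = evaluate.load("chrf")
wer = evaluate.load("wer")

predictions = ["a placeholder English hypothesis"]
references = [["a placeholder English reference"]]  # sacreBLEU/chrF take lists of references

print(bleu.compute(predictions=predictions, references=references)["score"])
print(chrf.compute(predictions=predictions, references=references)["score"])
# WER takes one plain reference string per prediction.
print(wer.compute(predictions=predictions, references=[r[0] for r in references]))
```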

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.2.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model size

  • 764M params
  • Tensor type: F32 (Safetensors)

Finetuned from

  • openai/whisper-medium

Datasets used to train ymoslem/whisper-medium-ga2en-v5.3.1-10k-r

  • IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia

Evaluation results

  • Bleu on IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia
    self-reported
    33.790
  • Wer on IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia
    self-reported
    61.684