Whisper Medium GA-EN Speech Translation

This model is a fine-tuned version of openai/whisper-medium on the IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia datasets. It achieves the following results on the evaluation set:

  • Loss: 1.3291
  • Bleu: 33.46
  • Chrf: 52.93
  • Wer: 61.7740
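
The Wer figures on this card appear to be percentages, and several rows of the training log exceed 100. That is possible because inserted words count as errors while the denominator is only the number of reference words. The card does not say which tool computed the metric, so the following is a minimal pure-Python sketch of the standard word-level definition, not the evaluation script used here; the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate in percent: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# A hypothesis much longer than the reference pushes the score past 100.
print(word_error_rate("good morning", "good morning to you all"))
```

Scores from a dedicated metrics library may differ slightly from this sketch, for example if text is normalized (case, punctuation) before scoring.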

Model description

More information needed

Intended uses & limitations

More information needed
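
As a starting point for trying the model, here is a minimal sketch that loads the checkpoint through the Transformers `pipeline` API. It assumes the `transformers` library, a PyTorch backend, and `ffmpeg` (for decoding audio files) are installed. The helper name `translate_irish_audio` is hypothetical, and the card does not state whether extra generation arguments are needed, so none are passed here.

```python
# Repository name as it appears on this page.
MODEL_ID = "ymoslem/whisper-medium-ga2en-v7.3.1-9k-r"


def translate_irish_audio(audio_path: str) -> str:
    """Return the text the model produces for one Irish audio file."""
    # Imported lazily so that defining this helper triggers no download.
    from transformers import pipeline

    # Whisper checkpoints are served through the speech-recognition pipeline;
    # this fine-tune is expected to emit English text for Irish speech.
    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    result = asr(audio_path)
    return result["text"]
```

Calling `translate_irish_audio("sample_ga.wav")` (a hypothetical file name) downloads the weights on first use and returns the generated text.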

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.03
  • training_steps: 9000
  • mixed_precision_training: Native AMP
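
To make the schedule concrete, the sketch below computes the learning rate implied by the listed values: a linear warmup over roughly the first 3% of the 9000 steps (about 270 steps), then a linear decay to zero. It is a plain-Python approximation of a linear schedule with warmup, not the training code itself; the function name `linear_lr` and the rounding of the warmup length with `math.ceil` are assumptions.

```python
import math

LEARNING_RATE = 1e-4   # learning_rate from the list above
TRAINING_STEPS = 9000  # training_steps from the list above
WARMUP_RATIO = 0.03    # lr_scheduler_warmup_ratio from the list above


def linear_lr(step: int,
              base_lr: float = LEARNING_RATE,
              total_steps: int = TRAINING_STEPS,
              warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay."""
    # Assumption: the warmup length is the ratio times total steps, rounded up.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr down to 0 at the final step.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 100, 1000, 4500, 9000):
    print(step, linear_lr(step))
```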

Training results

Training Loss Epoch Step Bleu Chrf Validation Loss Wer
2.4998 0.0236 100 4.24 19.77 2.0245 123.5029
2.5999 0.0472 200 5.55 23.63 2.0729 130.1666
2.4062 0.0708 300 5.92 24.15 1.9928 157.4966
2.1866 0.0944 400 12.74 30.47 1.8337 93.4714
2.2485 0.1180 500 10.32 30.65 1.8209 116.4791
2.1521 0.1416 600 9.84 30.97 1.7512 130.1666
1.9324 0.1653 700 17.24 34.37 1.7362 85.4570
1.9703 0.1889 800 13.05 32.27 1.6784 105.7632
1.7299 0.2125 900 9.81 31.71 1.6530 131.6974
1.7822 0.2361 1000 11.72 32.5 1.5541 125.7091
1.5493 0.2597 1100 15.04 36.72 1.5773 92.4358
1.4813 0.2833 1200 22.08 40.11 1.5341 75.8667
1.5285 0.3069 1300 18.88 40.93 1.4834 95.4975
1.3255 0.3305 1400 20.11 40.82 1.4956 85.2319
1.3931 0.3541 1500 22.81 41.51 1.4718 72.2197
1.3962 0.3777 1600 25.43 43.53 1.3794 71.1842
1.1412 0.4013 1700 22.13 43.19 1.4172 86.9428
1.1132 0.4249 1800 21.27 42.45 1.3989 81.0896
0.9261 0.4485 1900 26.39 45.4 1.4147 70.6889
0.994 0.4721 2000 24.38 42.87 1.4365 77.5326
0.8598 0.4958 2100 19.36 41.49 1.3559 96.6231
0.7784 0.5194 2200 26.54 45.57 1.3550 69.5633
0.7858 0.5430 2300 27.52 47.58 1.3156 68.8879
0.7715 0.5666 2400 26.12 46.47 1.2985 72.5349
0.7079 0.5902 2500 25.62 47.61 1.3134 68.6177
0.6704 0.6138 2600 28.2 47.37 1.3047 69.1130
0.6579 0.6374 2700 29.52 49.39 1.2486 68.2125
0.502 0.6610 2800 28.08 48.99 1.2511 68.6177
0.4442 0.6846 2900 32.57 50.66 1.2800 63.3498
0.5175 0.7082 3000 29.69 48.77 1.2650 66.2314
0.4416 0.7318 3100 32.36 50.29 1.2554 61.9090
0.4529 0.7554 3200 32.6 50.94 1.2050 61.5489
0.4435 0.7790 3300 33.2 52.17 1.2103 61.3688
0.3724 0.8026 3400 33.89 52.88 1.1756 59.8379
0.3883 0.8263 3500 32.21 51.86 1.1979 62.0891
0.3534 0.8499 3600 32.75 51.85 1.1943 61.2337
0.326 0.8735 3700 32.43 51.5 1.1891 62.1342
0.305 0.8971 3800 33.43 51.45 1.1858 59.4327
0.2258 0.9207 3900 32.53 51.42 1.1827 61.1887
0.3104 0.9443 4000 32.1 51.33 1.1857 61.2337
0.3847 0.9679 4100 29.91 48.63 1.3506 66.5466
0.426 0.9915 4200 25.68 45.27 1.3458 70.1036
0.2622 1.0151 4300 27.52 48.0 1.3544 66.4115
0.2429 1.0387 4400 22.57 45.45 1.4330 79.9190
0.269 1.0623 4500 24.7 45.73 1.4399 74.7411
0.3171 1.0859 4600 29.55 47.78 1.3711 68.4827
0.2321 1.1095 4700 24.73 45.52 1.4350 77.1724
0.2595 1.1331 4800 30.54 47.85 1.3851 65.1508
0.2426 1.1568 4900 28.87 47.5 1.4109 68.3926
0.2496 1.1804 5000 29.97 48.74 1.3717 68.6628
0.2551 1.2040 5100 29.92 47.59 1.4157 66.3215
0.231 1.2276 5200 28.97 47.9 1.3908 66.0063
0.245 1.2512 5300 30.22 47.71 1.4082 63.7100
0.284 1.2748 5400 27.47 48.31 1.3696 70.7789
0.2284 1.2984 5500 27.63 47.37 1.4044 68.2575
0.2457 1.3220 5600 31.38 48.8 1.3722 64.7906
0.2346 1.3456 5700 33.61 50.14 1.3397 60.3332
0.2088 1.3692 5800 30.84 48.51 1.3920 65.4660
0.1832 1.3928 5900 31.47 49.56 1.3892 64.5205
0.2171 1.4164 6000 32.51 49.8 1.3606 63.1697
0.1799 1.4400 6100 30.8 50.05 1.4130 63.3949
0.1756 1.4636 6200 30.25 50.16 1.3458 66.1864
0.1617 1.4873 6300 32.27 50.74 1.3971 63.4849
0.1909 1.5109 6400 27.41 47.04 1.4275 72.0396
0.1516 1.5345 6500 30.1 49.05 1.3591 66.0513
0.1892 1.5581 6600 31.72 48.17 1.3646 62.6294
0.2086 1.5817 6700 28.85 49.68 1.3314 67.3120
0.1253 1.6053 6800 29.84 49.13 1.3461 66.5466
0.1307 1.6289 6900 29.39 48.77 1.3671 67.7172
0.1376 1.6525 7000 31.27 47.97 1.3769 66.5916
0.1593 1.6761 7100 30.53 49.33 1.3699 65.4660
0.1604 1.6997 7200 31.99 48.93 1.3540 63.8001
0.118 1.7233 7300 29.52 49.26 1.3523 67.5822
0.1148 1.7469 7400 31.49 49.49 1.3130 62.8996
0.0946 1.7705 7500 32.6 49.76 1.3468 63.1697
0.0891 1.7941 7600 31.84 50.41 1.3268 63.5750
0.103 1.8178 7700 32.81 50.61 1.3243 60.3782
0.1016 1.8414 7800 33.07 53.14 1.2945 61.0086
0.1014 1.8650 7900 32.35 51.28 1.3163 63.3498
0.1257 1.8886 8000 31.65 51.86 1.3246 61.7740
0.0859 1.9122 8100 30.69 51.47 1.3247 64.4304
0.0943 1.9358 8200 33.06 52.31 1.3030 61.6389
0.11 1.9594 8300 33.32 52.83 1.2866 60.1081
0.0723 1.9830 8400 32.96 51.64 1.3071 61.7740
0.0312 2.0066 8500 33.2 52.78 1.3202 62.0891
0.0303 2.0302 8600 33.24 52.75 1.3348 62.4043
0.02 2.0538 8700 33.32 52.6 1.3447 62.0891
0.0329 2.0774 8800 34.04 52.93 1.3328 60.7384
0.0216 2.1010 8900 33.47 52.75 1.3266 61.3237
0.0224 2.1246 9000 33.46 52.93 1.3291 61.7740
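
The log shows that the metrics do not all peak at the same step, so which checkpoint counts as "best" depends on the metric chosen. The sketch below illustrates this on five rows copied from the table; the `EvalRow` type and variable names are hypothetical, and the card does not say how the published checkpoint was selected.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    step: int
    bleu: float
    chrf: float
    val_loss: float
    wer: float


# Five evaluation rows taken from the training log (not the full table).
ROWS = [
    EvalRow(3400, 33.89, 52.88, 1.1756, 59.8379),
    EvalRow(3800, 33.43, 51.45, 1.1858, 59.4327),
    EvalRow(7800, 33.07, 53.14, 1.2945, 61.0086),
    EvalRow(8800, 34.04, 52.93, 1.3328, 60.7384),
    EvalRow(9000, 33.46, 52.93, 1.3291, 61.7740),
]

# Higher is better for BLEU and chrF; lower is better for loss and WER.
best_bleu = max(ROWS, key=lambda r: r.bleu)
best_chrf = max(ROWS, key=lambda r: r.chrf)
lowest_loss = min(ROWS, key=lambda r: r.val_loss)
lowest_wer = min(ROWS, key=lambda r: r.wer)

for name, row in [("BLEU", best_bleu), ("chrF", best_chrf),
                  ("validation loss", lowest_loss), ("WER", lowest_wer)]:
    print(f"best by {name}: step {row.step}")
```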

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.2.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Safetensors

  • Model size: 764M params
  • Tensor type: F32

Finetuned from: openai/whisper-medium

Datasets used to train ymoslem/whisper-medium-ga2en-v7.3.1-9k-r

Evaluation results

  • Bleu on IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia (self-reported): 33.460
  • Wer on IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia (self-reported): 61.774