Automatic Speech Recognition
Transformers
TensorBoard
Safetensors
Irish
English
whisper
Generated from Trainer
Eval Results
Inference Endpoints

Whisper Medium GA-EN Speech Translation

This model is a fine-tuned version of openai/whisper-medium on the IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia datasets. It achieves the following results on the evaluation set at the final training step (4000):

  • Loss: 1.1929
  • Bleu: 29.54
  • Chrf: 51.58
  • Wer: 62.4043
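
The card does not show how to run the model, so here is a minimal sketch of Irish-to-English speech translation with the Transformers `pipeline` API. The model ID is the repository name shown on this page. The function name and the audio file path are hypothetical, and the sketch assumes the checkpoint's default generation settings already produce English output, which the card does not confirm.

```python
# Repository name as it appears on this model page.
MODEL_ID = "ymoslem/whisper-medium-ga2en-v5.3.1-4k-r"


def translate_irish_audio(audio_path: str) -> str:
    """Return the English text the model generates for an Irish recording.

    Hypothetical helper: it rebuilds the pipeline on every call for
    simplicity; cache the pipeline object if you process many files.
    """
    # Imported lazily so this module loads even without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    result = asr(audio_path)  # returns a dict with a "text" field
    return result["text"]


# Example call (hypothetical local file; decoding a file path requires ffmpeg):
# text = translate_irish_audio("sample_irish.wav")
```

The first call downloads the model weights, so it needs network access and enough memory for a 764M-parameter model.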

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.03
  • training_steps: 4000
  • mixed_precision_training: Native AMP
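
To make the schedule concrete, the sketch below computes the learning rate that a linear scheduler with warmup would produce from the values listed above. It assumes the warmup length is the warmup ratio times the total step count, rounded up, and that the rate then decays linearly to zero at the last step. The function name is illustrative and this is not the training script itself.

```python
import math

BASE_LR = 1e-4        # learning_rate
TOTAL_STEPS = 4000    # training_steps
WARMUP_RATIO = 0.03   # lr_scheduler_warmup_ratio

# Assumption: warmup steps = ceil(total steps * warmup ratio).
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < WARMUP_STEPS:
        # Ramp up linearly from 0 to BASE_LR over the warmup phase.
        return BASE_LR * step / WARMUP_STEPS
    # Decay linearly from BASE_LR to 0 over the remaining steps.
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


for s in (0, 60, WARMUP_STEPS, 2000, 4000):
    print(s, linear_lr(s))
```

Under these assumptions the warmup covers the first 120 steps, so the peak rate of 0.0001 is reached shortly after the first evaluation at step 100.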

Training results

| Training Loss | Epoch | Step | Validation Loss | Bleu | Chrf | Wer |
|---|---|---|---|---|---|---|
| 2.4382 | 0.0109 | 100 | 2.1114 | 3.07 | 16.85 | 171.0491 |
| 2.6151 | 0.0219 | 200 | 2.0207 | 6.25 | 23.02 | 126.9698 |
| 2.5699 | 0.0328 | 300 | 1.8660 | 5.71 | 24.03 | 155.5606 |
| 2.3084 | 0.0438 | 400 | 1.8084 | 9.87 | 28.45 | 129.0860 |
| 2.3327 | 0.0547 | 500 | 1.7823 | 12.01 | 31.92 | 102.7915 |
| 2.1495 | 0.0657 | 600 | 1.7238 | 13.97 | 32.4 | 98.6042 |
| 2.2164 | 0.0766 | 700 | 1.6538 | 11.21 | 33.19 | 146.0153 |
| 2.0071 | 0.0876 | 800 | 1.7038 | 14.34 | 35.72 | 96.9383 |
| 1.8334 | 0.0985 | 900 | 1.6329 | 16.51 | 37.23 | 96.8032 |
| 1.8359 | 0.1095 | 1000 | 1.6637 | 17.87 | 35.94 | 84.4665 |
| 1.7703 | 0.1204 | 1100 | 1.5626 | 19.54 | 39.02 | 79.7839 |
| 1.5805 | 0.1314 | 1200 | 1.5618 | 20.19 | 40.4 | 77.8028 |
| 1.4545 | 0.1423 | 1300 | 1.5599 | 13.88 | 35.53 | 112.5619 |
| 1.5177 | 0.1533 | 1400 | 1.4880 | 18.79 | 40.11 | 84.6916 |
| 1.6335 | 0.1642 | 1500 | 1.4996 | 16.41 | 38.64 | 96.9833 |
| 1.3809 | 0.1752 | 1600 | 1.4739 | 18.3 | 40.17 | 101.8910 |
| 1.2694 | 0.1861 | 1700 | 1.4498 | 22.53 | 43.15 | 76.9923 |
| 1.2321 | 0.1970 | 1800 | 1.4163 | 19.92 | 42.59 | 84.6015 |
| 1.1969 | 0.2080 | 1900 | 1.4137 | 21.63 | 44.92 | 85.3670 |
| 1.2023 | 0.2189 | 2000 | 1.3530 | 20.42 | 41.57 | 82.8906 |
| 1.1676 | 0.2299 | 2100 | 1.3723 | 22.82 | 44.23 | 78.1180 |
| 1.0332 | 0.2408 | 2200 | 1.3641 | 26.73 | 44.75 | 70.2386 |
| 0.8589 | 0.2518 | 2300 | 1.3344 | 26.94 | 46.89 | 72.7600 |
| 0.9829 | 0.2627 | 2400 | 1.3181 | 28.15 | 47.21 | 69.1130 |
| 0.8228 | 0.2737 | 2500 | 1.3049 | 26.98 | 47.41 | 74.0207 |
| 0.7667 | 0.2846 | 2600 | 1.2698 | 30.0 | 49.42 | 65.1058 |
| 0.8749 | 0.2956 | 2700 | 1.2878 | 27.91 | 47.67 | 66.9518 |
| 0.7504 | 0.3065 | 2800 | 1.2670 | 32.03 | 50.35 | 63.6650 |
| 0.7069 | 0.3175 | 2900 | 1.2771 | 30.7 | 49.53 | 64.4304 |
| 0.7199 | 0.3284 | 3000 | 1.2658 | 30.21 | 48.93 | 65.5561 |
| 0.6207 | 0.3394 | 3100 | 1.2687 | 30.82 | 49.11 | 66.0063 |
| 0.5995 | 0.3503 | 3200 | 1.2207 | 31.99 | 50.94 | 62.9446 |
| 0.6294 | 0.3612 | 3300 | 1.2422 | 31.05 | 50.85 | 64.7006 |
| 0.4612 | 0.3722 | 3400 | 1.2203 | 33.1 | 51.82 | 61.9090 |
| 0.5138 | 0.3831 | 3500 | 1.2007 | 32.08 | 51.86 | 63.0797 |
| 0.5059 | 0.3941 | 3600 | 1.2130 | 31.8 | 51.19 | 63.9352 |
| 0.417 | 0.4050 | 3700 | 1.1975 | 32.45 | 51.41 | 62.2692 |
| 0.2958 | 0.4160 | 3800 | 1.2046 | 29.29 | 51.39 | 62.7645 |
| 0.393 | 0.4269 | 3900 | 1.1968 | 28.95 | 51.45 | 63.1697 |
| 0.3858 | 0.4379 | 4000 | 1.1929 | 29.54 | 51.58 | 62.4043 |
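
Several Wer values in the results above exceed 100, which can look like an error. The sketch below shows why that is possible: word error rate divides the word-level edit distance by the reference length, so a hypothesis with many extra words can score above 100%. This is a plain-Python illustration with a hypothetical function name. The card does not say which library or text normalization produced its scores, so values computed this way may differ from the reported ones.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate in percent, using whitespace tokenization."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# Three inserted words against a two-word reference give a WER above 100.
print(word_error_rate("good morning", "good morning to you all"))
```

This is consistent with the early checkpoints, where a partially trained model tends to produce long or repetitive output and WER is correspondingly high.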

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.2.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model size: 764M params (Safetensors, tensor type F32)

Finetuned from openai/whisper-medium

Datasets used to train ymoslem/whisper-medium-ga2en-v5.3.1-4k-r: IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia

Evaluation results

  • Bleu on IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia (self-reported): 29.540
  • Wer on IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia (self-reported): 62.404