Whisper Medium GA-EN Speech Translation

This model is a fine-tuned version of openai/whisper-medium on the IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia datasets. The datasets are augmented in two ways: noise augmentation and truncation of low-amplitude samples. The best model checkpoint (this version), selected by ChrF, is at step 2900 (epoch 0.6349) and achieves the following results on the evaluation set:

  • Loss: 1.1883
  • Bleu: 32.88
  • Chrf: 51.52
  • Wer: 62.0441
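
A minimal usage sketch, since the card does not yet document inference. It assumes the `transformers` library with an audio backend (such as ffmpeg) is installed, and that the checkpoint's saved generation settings produce English output without extra arguments; neither is confirmed by this card. The function name `translate_irish_speech` is hypothetical.

```python
MODEL_ID = "ymoslem/whisper-medium-ga2en-v4"  # model ID as listed on this page


def translate_irish_speech(audio_path: str) -> str:
    """Translate Irish speech in an audio file into English text.

    ASSUMPTION: the checkpoint's default generation config yields English
    output; this card does not state the required generation arguments.
    """
    # Imported lazily so this module loads even without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    result = asr(audio_path)  # the pipeline returns a dict with a "text" key
    return result["text"]
```

Calling `translate_irish_speech("sample_ga.wav")` (a hypothetical local file) would download the model on first use. Recordings longer than Whisper's 30-second window may need chunking, which this sketch does not handle.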

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 0.02
  • training_steps: 3000
  • mixed_precision_training: Native AMP
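
The relationship between these values can be sketched in plain Python. This is an illustration, not the training script: it assumes a single device (consistent with 8 × 4 = 32), and it assumes the listed warmup value of 0.02 is a fraction of `training_steps` rather than a step count, which the card does not confirm. The helper names are hypothetical, and the schedule follows the usual linear warmup and linear decay shape.

```python
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 4
TRAINING_STEPS = 3000
# ASSUMPTION: 0.02 is read as a warmup ratio, giving 60 warmup steps.
WARMUP_STEPS = round(0.02 * TRAINING_STEPS)


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Samples contributing to each optimizer update."""
    return per_device * accum_steps * num_devices


def linear_lr(step: int,
              base_lr: float = LEARNING_RATE,
              total_steps: int = TRAINING_STEPS,
              warmup_steps: int = WARMUP_STEPS) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    for s in (0, 30, 60, 1530, 2900, 3000):
        print(f"step {s:>4}: lr = {linear_lr(s):.2e}")
```

Under these assumptions the learning rate peaks at 1e-4 at step 60 and has decayed almost to zero by the best checkpoint at step 2900.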

Training results

| Training Loss | Epoch | Step | Validation Loss | Bleu | Chrf | Wer |
|---|---|---|---|---|---|---|
| 2.4487 | 0.0219 | 100 | 1.9518 | 8.34 | 24.49 | 117.2445 |
| 2.11 | 0.0438 | 200 | 1.6630 | 15.32 | 32.12 | 84.0612 |
| 1.9757 | 0.0657 | 300 | 1.5366 | 10.86 | 33.42 | 131.7875 |
| 1.7964 | 0.0876 | 400 | 1.4825 | 19.81 | 36.71 | 81.9451 |
| 1.6422 | 0.1095 | 500 | 1.4432 | 18.83 | 40.4 | 84.0162 |
| 1.3839 | 0.1314 | 600 | 1.4169 | 24.91 | 40.87 | 69.0230 |
| 1.352 | 0.1533 | 700 | 1.4340 | 25.01 | 41.57 | 71.5894 |
| 1.2434 | 0.1752 | 800 | 1.3813 | 24.05 | 41.29 | 73.7506 |
| 1.2223 | 0.1970 | 900 | 1.3578 | 25.89 | 41.61 | 70.5988 |
| 1.0414 | 0.2189 | 1000 | 1.3075 | 27.45 | 44.17 | 68.2575 |
| 0.9199 | 0.2408 | 1100 | 1.3022 | 23.14 | 44.3 | 84.1513 |
| 0.8648 | 0.2627 | 1200 | 1.3050 | 23.36 | 43.37 | 72.4448 |
| 0.8469 | 0.2846 | 1300 | 1.2853 | 28.37 | 45.97 | 67.1319 |
| 0.7649 | 0.3065 | 1400 | 1.2755 | 28.56 | 46.76 | 66.0964 |
| 0.7321 | 0.3284 | 1500 | 1.2750 | 27.23 | 46.1 | 69.3381 |
| 0.6541 | 0.3503 | 1600 | 1.2557 | 30.02 | 48.06 | 65.6011 |
| 0.6107 | 0.3722 | 1700 | 1.2520 | 30.41 | 49.23 | 64.2954 |
| 0.5738 | 0.3941 | 1800 | 1.2435 | 32.45 | 50.27 | 63.4399 |
| 0.4983 | 0.4160 | 1900 | 1.2007 | 31.17 | 48.58 | 64.0702 |
| 0.4439 | 0.4379 | 2000 | 1.2140 | 32.29 | 50.37 | 60.6033 |
| 0.367 | 0.4598 | 2100 | 1.2230 | 29.54 | 49.14 | 67.7172 |
| 0.2807 | 0.4817 | 2200 | 1.2277 | 33.1 | 51.21 | 62.9446 |
| 0.2621 | 0.5036 | 2300 | 1.2441 | 30.59 | 49.49 | 64.8807 |
| 0.2965 | 0.5255 | 2400 | 1.1969 | 31.82 | 49.67 | 63.5299 |
| 0.236 | 0.5473 | 2500 | 1.2275 | 31.17 | 50.29 | 65.1959 |
| 0.229 | 0.5692 | 2600 | 1.2008 | 30.02 | 50.27 | 70.6439 |
| 0.164 | 0.5911 | 2700 | 1.2192 | 31.37 | 50.57 | 63.6200 |
| 0.1786 | 0.6130 | 2800 | 1.1965 | 31.81 | 50.13 | 62.8546 |
| 0.1987 | 0.6349 | 2900 | 1.1883 | 32.88 | 51.52 | 62.0441 |
| 0.1633 | 0.6568 | 3000 | 1.1903 | 32.01 | 50.38 | 62.7645 |
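
The choice of step 2900 as the best checkpoint can be reproduced from the results above. The sketch below copies only the evaluation rows from step 2000 onward (the earlier rows do not change any of the outcomes) and picks the best step per metric. The `EvalRow` type and `best_step` helper are hypothetical names for illustration, not part of the training code.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    step: int
    val_loss: float
    bleu: float
    chrf: float
    wer: float


# Rows for steps 2000-3000, copied from the training results.
ROWS = [
    EvalRow(2000, 1.2140, 32.29, 50.37, 60.6033),
    EvalRow(2100, 1.2230, 29.54, 49.14, 67.7172),
    EvalRow(2200, 1.2277, 33.10, 51.21, 62.9446),
    EvalRow(2300, 1.2441, 30.59, 49.49, 64.8807),
    EvalRow(2400, 1.1969, 31.82, 49.67, 63.5299),
    EvalRow(2500, 1.2275, 31.17, 50.29, 65.1959),
    EvalRow(2600, 1.2008, 30.02, 50.27, 70.6439),
    EvalRow(2700, 1.2192, 31.37, 50.57, 63.6200),
    EvalRow(2800, 1.1965, 31.81, 50.13, 62.8546),
    EvalRow(2900, 1.1883, 32.88, 51.52, 62.0441),
    EvalRow(3000, 1.1903, 32.01, 50.38, 62.7645),
]


def best_step(rows: list[EvalRow], metric: str, higher_is_better: bool) -> int:
    """Return the step whose value for `metric` is best."""
    pick = max if higher_is_better else min
    return pick(rows, key=lambda row: getattr(row, metric)).step


if __name__ == "__main__":
    print("ChrF:", best_step(ROWS, "chrf", higher_is_better=True))
    print("BLEU:", best_step(ROWS, "bleu", higher_is_better=True))
    print("WER: ", best_step(ROWS, "wer", higher_is_better=False))
    print("Loss:", best_step(ROWS, "val_loss", higher_is_better=False))
```

ChrF and validation loss both point to step 2900, while BLEU peaks at step 2200 and WER is lowest at step 2000, so the selected checkpoint depends on which metric is used.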

Framework versions

  • Transformers 4.40.0
  • Pytorch 2.0.1+cu118
  • Datasets 2.19.0
  • Tokenizers 0.19.1

Model details

  • Model ID: ymoslem/whisper-medium-ga2en-v4
  • Finetuned from: openai/whisper-medium
  • Model size: 764M params
  • Tensor type: F32 (Safetensors)

Evaluation results

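  • Bleu on IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia, augmented with noise (self-reported): 32.010
  • Wer on IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia, augmented with noise (self-reported): 62.765

These self-reported values correspond to the final evaluation at step 3000 (Bleu 32.01, Wer 62.7645), not to the best checkpoint at step 2900 (Bleu 32.88, Wer 62.0441).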
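
Some WER values in this card exceed 100 (for example 117.2445 at step 100 and 131.7875 at step 300). That is possible under the standard definition, WER = (substitutions + deletions + insertions) / reference words, because a hypothesis with many extra words adds insertions without bound. The sketch below implements that standard word-level definition in plain Python; the card does not state which tool or text normalization produced its scores, so this is an illustration rather than the evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate in percent, using word-level edit distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat", "the cat sat"))
    print(word_error_rate("the cat sat", "the dog sat"))
    # Three inserted words against a one-word reference push WER above 100.
    print(word_error_rate("hello", "hello there my friend"))
```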