
Whisper Large GA-EN Speech Translation

This model is a fine-tuned version of openai/whisper-large on the IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia datasets. The datasets are augmented in two ways: noise augmentation and truncation of low-amplitude samples. The best checkpoint (this version), selected by ChrF, is from step 3000 (epoch 0.99) and achieves the following results on the evaluation set:

  • Loss: 1.1742
  • Bleu: 30.16
  • Chrf: 50.72
  • Wer: 69.9685
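
A hedged usage sketch (not from the original card): the checkpoint can be loaded through the transformers automatic-speech-recognition pipeline and applied to 16 kHz Irish-language audio. The file path below is a placeholder, and the output is assumed to be the English translation because the model was fine-tuned for GA→EN speech translation.

```python
# Minimal usage sketch (assumption, not the authors' documented usage).
# "sample_ga.wav" is a placeholder path to an Irish-language recording.
from transformers import pipeline

translator = pipeline(
    "automatic-speech-recognition",
    model="ymoslem/whisper-large-ga2en-v1.1.1",
)

# The fine-tuned checkpoint is expected to emit English text for Irish speech.
result = translator("sample_ga.wav")
print(result["text"])
```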

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
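
The model description above mentions two augmentations of the training data: added noise and truncation of low-amplitude samples. The snippet below is only an illustrative sketch of what such augmentations could look like on a raw mono waveform; the SNR, threshold, and function names are assumptions, not the pipeline actually used for this model.

```python
# Illustrative sketch only (assumption): additive white noise at a target SNR
# and removal of low-amplitude samples from a mono waveform.
import numpy as np

def add_noise(audio: np.ndarray, snr_db: float = 20.0) -> np.ndarray:
    """Add white Gaussian noise at roughly the given signal-to-noise ratio (dB)."""
    signal_power = np.mean(audio ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=audio.shape)
    return audio + noise

def truncate_low_amplitude(audio: np.ndarray, threshold: float = 0.01) -> np.ndarray:
    """Drop samples whose absolute amplitude falls below the threshold."""
    return audio[np.abs(audio) >= threshold]
```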

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.03
  • training_steps: 3000
  • mixed_precision_training: Native AMP
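
As a hedged reconstruction (an assumption, not a configuration taken from the actual training code), these values map onto transformers' Seq2SeqTrainingArguments roughly as follows; the output directory is a placeholder.

```python
# Sketch (assumption): the listed hyperparameters expressed as
# Seq2SeqTrainingArguments. Not the authors' actual training script.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-large-ga2en",    # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=4,       # effective train batch size of 16
    lr_scheduler_type="linear",
    warmup_ratio=0.03,
    max_steps=3000,
    fp16=True,                           # native AMP mixed precision
)
```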

Training results

| Training Loss | Epoch | Step | Validation Loss | Bleu  | Chrf  | Wer      |
|:-------------:|:-----:|:----:|:---------------:|:-----:|:-----:|:--------:|
| 3.1833        | 0.03  | 100  | 2.5169          | 2.03  | 16.8  | 215.5786 |
| 2.7632        | 0.07  | 200  | 2.1827          | 7.81  | 24.07 | 113.1022 |
| 2.5687        | 0.1   | 300  | 2.0746          | 6.16  | 24.2  | 158.8474 |
| 2.5615        | 0.13  | 400  | 1.9379          | 8.68  | 26.18 | 120.8465 |
| 2.4554        | 0.16  | 500  | 1.8932          | 12.14 | 28.94 | 103.1067 |
| 2.3546        | 0.2   | 600  | 1.8734          | 14.34 | 29.83 | 91.5353  |
| 2.2804        | 0.23  | 700  | 1.8075          | 13.18 | 33.07 | 105.5380 |
| 2.1408        | 0.26  | 800  | 1.7034          | 13.01 | 33.0  | 89.4642  |
| 2.0411        | 0.3   | 900  | 1.6556          | 16.73 | 34.97 | 91.4453  |
| 1.7766        | 0.33  | 1000 | 1.6505          | 17.21 | 35.54 | 83.5209  |
| 1.7704        | 0.36  | 1100 | 1.5800          | 17.54 | 38.11 | 77.1724  |
| 1.6537        | 0.39  | 1200 | 1.5684          | 14.2  | 35.39 | 95.6326  |
| 1.4841        | 0.43  | 1300 | 1.4970          | 22.96 | 39.35 | 71.3643  |
| 1.641         | 0.46  | 1400 | 1.4693          | 16.3  | 37.69 | 103.7821 |
| 1.393         | 0.49  | 1500 | 1.3923          | 27.21 | 43.87 | 69.3381  |
| 1.249         | 0.53  | 1600 | 1.3876          | 23.33 | 42.26 | 76.5421  |
| 1.3385        | 0.56  | 1700 | 1.3404          | 23.86 | 42.82 | 75.0563  |
| 1.2537        | 0.59  | 1800 | 1.3226          | 17.03 | 41.72 | 100.1801 |
| 1.2891        | 0.62  | 1900 | 1.2995          | 27.26 | 43.62 | 69.1580  |
| 1.226         | 0.66  | 2000 | 1.2605          | 30.89 | 47.34 | 63.5750  |
| 1.1268        | 0.69  | 2100 | 1.2783          | 27.43 | 45.97 | 67.4921  |
| 1.0007        | 0.72  | 2200 | 1.2521          | 27.21 | 47.25 | 71.0041  |
| 0.9565        | 0.76  | 2300 | 1.2219          | 31.65 | 48.07 | 64.2053  |
| 0.9309        | 0.79  | 2400 | 1.2193          | 31.4  | 48.18 | 64.1603  |
| 0.7923        | 0.82  | 2500 | 1.2099          | 28.88 | 48.89 | 69.7884  |
| 0.8199        | 0.85  | 2600 | 1.1972          | 29.37 | 48.07 | 67.3120  |
| 0.6974        | 0.89  | 2700 | 1.1857          | 29.7  | 48.95 | 70.5988  |
| 0.6736        | 0.92  | 2800 | 1.1884          | 29.33 | 48.97 | 72.7150  |
| 0.6826        | 0.95  | 2900 | 1.1834          | 30.76 | 50.11 | 68.1225  |
| 0.7001        | 0.99  | 3000 | 1.1742          | 30.16 | 50.72 | 69.9685  |
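
The BLEU, ChrF, and WER figures reported above can be computed with the `evaluate` library. The sketch below is an assumption about the scoring setup, not the authors' evaluation script; the hypotheses and references are placeholders for decoded outputs and reference translations.

```python
# Sketch (assumption): scoring decoded translations with sacreBLEU, chrF, and WER.
import evaluate

hypotheses = ["the cat is on the mat"]    # placeholder model outputs
references = ["the cat sat on the mat"]   # placeholder reference translations

bleu = evaluate.load("sacrebleu")
chrf = evaluate.load("chrf")
wer = evaluate.load("wer")

print("BLEU:", bleu.compute(predictions=hypotheses, references=[[r] for r in references])["score"])
print("ChrF:", chrf.compute(predictions=hypotheses, references=[[r] for r in references])["score"])
print("WER: ", 100 * wer.compute(predictions=hypotheses, references=references))  # scaled to match the table
```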

Framework versions

  • Transformers 4.39.3
  • Pytorch 2.0.1+cu118
  • Datasets 2.18.0
  • Tokenizers 0.15.2
