
Whisper Small GA-EN Speech Translation

This model (ymoslem/whisper-small-ga2en-v1.2) is a fine-tuned version of openai/whisper-small for Irish-to-English (GA-EN) speech translation, trained on the IWSLT-2023, FLEURS, BiteSize, and SpokenWords datasets. The best checkpoint by ChrF (this version) is at step 3300, which the training results list as epoch 3.56, and it achieves the following results on the evaluation set:

  • Loss: 1.5823
  • Bleu: 29.81
  • Chrf: 46.50
  • Wer: 66.7267

The best checkpoint based on BLEU (step 3400, epoch 3.67) achieves the following results:

  • Loss: 1.5752
  • Bleu: 30.77
  • Chrf: 46.43
  • Wer: 64.6556
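
The two "best" checkpoints above differ only in which metric is used to rank the evaluation steps. A minimal sketch of that selection, using four rows copied from the training results on this card (the helper `best_step` is a hypothetical name for illustration, not part of any library):

```python
# Rows copied from the training results on this card: (step, BLEU, ChrF, WER).
ROWS = [
    (3200, 30.20, 45.80, 65.8262),
    (3300, 29.81, 46.50, 66.7267),
    (3400, 30.77, 46.43, 64.6556),
    (3500, 30.30, 46.02, 64.6105),
]


def best_step(rows, column, higher_is_better=True):
    """Return the step whose value in `column` is best (hypothetical helper)."""
    pick = max if higher_is_better else min
    return pick(rows, key=lambda row: row[column])[0]


print(best_step(ROWS, column=1))                          # best by BLEU
print(best_step(ROWS, column=2))                          # best by ChrF
print(best_step(ROWS, column=3, higher_is_better=False))  # best by WER (lower is better)
```

On these rows the three metrics pick three different steps, so the choice of ranking metric does change which checkpoint is kept.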

Model description

More information needed

Intended uses & limitations

More information needed
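
As a rough illustration of the apparent intended use (Irish speech in, English text out), the sketch below loads the checkpoint through the Transformers `pipeline` API. The repository name is the one shown on this card's page; the helper name `translate_irish_audio` and the command-line usage are assumptions for illustration, and reading an audio file by path typically requires ffmpeg to be installed. It has not been validated against this checkpoint.

```python
# Repository name as shown on this card's page.
MODEL_ID = "ymoslem/whisper-small-ga2en-v1.2"


def translate_irish_audio(audio_path, model_id=MODEL_ID):
    """Return the text the model generates for one audio file (hypothetical helper)."""
    # Imported lazily so that defining this helper does not require the library.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    return asr(audio_path)["text"]


if __name__ == "__main__":
    import sys

    # Usage (assumed): python translate.py path/to/irish_audio.wav
    if len(sys.argv) > 1:
        print(translate_irish_audio(sys.argv[1]))
```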

Training and evaluation data

More information needed

Training procedure

Experiment

  • language=English
  • +more steps

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 0.03
  • training_steps: 4000
  • mixed_precision_training: Native AMP
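
To make the `linear` scheduler setting concrete, here is a minimal pure-Python sketch of a linear warmup and decay schedule with the listed peak learning rate and step count. It is an illustration, not the library's code. The card lists `lr_scheduler_warmup_steps` as 0.03, which is not a whole number of steps, so the sketch takes warmup as a parameter and defaults to none; that default is an assumption. If the value was meant as a ratio, it would correspond to 0.03 × 4000 = 120 warmup steps.

```python
def linear_lr(step, peak_lr=1e-4, total_steps=4000, warmup_steps=0):
    """Learning rate at `step`: linear warmup to peak_lr, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Assumption: no warmup (the default above).
for step in (0, 1000, 2000, 3300, 4000):
    print(step, linear_lr(step))

# Alternative assumption: 0.03 read as a ratio, i.e. 120 warmup steps.
print(linear_lr(60, warmup_steps=120))
```

Under either reading, the learning rate at step 3300 is already a small fraction of the peak, so late checkpoints differ only slightly from one another.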

Training results

| Training Loss | Epoch | Step | Bleu | Chrf | Validation Loss | Wer |
|---|---|---|---|---|---|---|
| 2.4954 | 0.11 | 100 | 3.7 | 18.03 | 2.1286 | 179.7839 |
| 2.045 | 0.22 | 200 | 12.65 | 25.53 | 1.8146 | 100.9005 |
| 1.7928 | 0.32 | 300 | 13.78 | 30.2 | 1.7253 | 101.9811 |
| 1.6615 | 0.43 | 400 | 15.8 | 31.88 | 1.6834 | 92.5259 |
| 1.4491 | 0.54 | 500 | 15.61 | 36.27 | 1.5971 | 107.3841 |
| 1.2074 | 0.65 | 600 | 19.92 | 36.31 | 1.5939 | 84.3314 |
| 1.2308 | 0.76 | 700 | 20.37 | 38.72 | 1.5234 | 84.8267 |
| 1.107 | 0.86 | 800 | 21.35 | 37.87 | 1.5460 | 82.8906 |
| 0.9491 | 0.97 | 900 | 21.06 | 40.74 | 1.5161 | 82.5754 |
| 0.384 | 1.08 | 1000 | 23.24 | 41.98 | 1.4927 | 82.2152 |
| 0.362 | 1.19 | 1100 | 23.19 | 42.24 | 1.5567 | 80.2792 |
| 0.3756 | 1.29 | 1200 | 27.83 | 43.8 | 1.5265 | 69.2481 |
| 0.3401 | 1.4 | 1300 | 21.79 | 41.66 | 1.5522 | 92.3908 |
| 0.3346 | 1.51 | 1400 | 24.61 | 42.15 | 1.5085 | 75.4615 |
| 0.3101 | 1.62 | 1500 | 26.67 | 43.41 | 1.4933 | 70.7789 |
| 0.3231 | 1.73 | 1600 | 27.95 | 42.82 | 1.4979 | 68.3026 |
| 0.2665 | 1.83 | 1700 | 28.5 | 43.76 | 1.4977 | 68.1225 |
| 0.2704 | 1.94 | 1800 | 28.15 | 43.87 | 1.5063 | 68.8429 |
| 0.0769 | 2.05 | 1900 | 25.76 | 43.22 | 1.5162 | 77.6227 |
| 0.0597 | 2.16 | 2000 | 25.04 | 43.15 | 1.5216 | 79.0635 |
| 0.0743 | 2.27 | 2100 | 27.85 | 44.43 | 1.5313 | 68.3926 |
| 0.0878 | 2.37 | 2200 | 27.54 | 43.96 | 1.5495 | 68.3476 |
| 0.0712 | 2.48 | 2300 | 28.28 | 44.39 | 1.5355 | 65.8712 |
| 0.0789 | 2.59 | 2400 | 28.64 | 44.75 | 1.5277 | 65.7812 |
| 0.073 | 2.7 | 2500 | 29.09 | 44.65 | 1.5327 | 65.7812 |
| 0.073 | 2.8 | 2600 | 25.26 | 43.44 | 1.5304 | 78.2981 |
| 0.0697 | 2.91 | 2700 | 25.71 | 43.02 | 1.5460 | 78.4782 |
| 0.0398 | 3.02 | 2800 | 28.26 | 44.71 | 1.5580 | 72.8501 |
| 0.0302 | 3.13 | 2900 | 30.25 | 45.46 | 1.5688 | 66.1414 |
| 0.0424 | 3.24 | 3000 | 29.88 | 45.21 | 1.5693 | 66.0964 |
| 0.0397 | 3.34 | 3100 | 30.01 | 45.85 | 1.5934 | 65.6911 |
| 0.0346 | 3.45 | 3200 | 30.2 | 45.8 | 1.5818 | 65.8262 |
| 0.032 | 3.56 | 3300 | 29.81 | 46.5 | 1.5823 | 66.7267 |
| 0.0348 | 3.67 | 3400 | 30.77 | 46.43 | 1.5752 | 64.6556 |
| 0.0277 | 3.78 | 3500 | 30.3 | 46.02 | 1.5791 | 64.6105 |
| 0.0364 | 3.88 | 3600 | 29.92 | 45.38 | 1.5895 | 65.0608 |
| 0.0398 | 3.99 | 3700 | 27.79 | 44.59 | 1.6167 | 68.2575 |
| 0.0152 | 4.1 | 3800 | 28.42 | 44.83 | 1.6241 | 67.5822 |
| 0.0201 | 4.21 | 3900 | 29.02 | 45.11 | 1.6243 | 67.4921 |
| 0.0168 | 4.31 | 4000 | 26.85 | 44.41 | 1.6195 | 73.5254 |
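
Several early rows report a WER above 100, which can look like an error. WER is the number of word-level edits (substitutions, deletions, insertions) divided by the number of reference words, so a hypothesis much longer than its reference can exceed 100%. The sketch below is a plain-Python illustration of that definition; it is not the evaluation code used for this card, and the example sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate in percent: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# Made-up example: three inserted words against a two-word reference.
print(word_error_rate("good morning", "good morning to you all"))  # 150.0
```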

Framework versions

  • Transformers 4.39.3
  • Pytorch 2.2.1+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2

Model size: 242M params (Safetensors, tensor type F32)

Evaluation results

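  • Bleu on IWSLT-2023, FLEURS, BiteSize, SpokenWords (self-reported): 26.850
  • Wer on IWSLT-2023, FLEURS, BiteSize, SpokenWords (self-reported): 73.525

These self-reported values match the final training step (4000) in the training results, not the best checkpoints by ChrF or BLEU reported at the top of this card.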