iwslt_aligned_smallT5_cont0

This model is a fine-tuned version of google/mt5-small on the paulh27/alignment_iwslt2017_de_en dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5612
  • BLEU: 65.6358
  • Gen Len: 28.7691
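
As a quick sanity check, the checkpoint can be loaded with the standard transformers Auto classes. This is a minimal sketch only: the exact source-side input format used during fine-tuning (e.g. a task prefix or language tag) is not documented in this card, so the plain German sentence below is an illustrative assumption.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "paulh27/iwslt_aligned_smallT5_cont0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Illustrative German input; the source formatting expected by this
# fine-tune is not documented here and may differ.
text = "Vielen Dank für Ihre Aufmerksamkeit."
inputs = tokenizer(text, return_tensors="pt")
output_ids = model.generate(**inputs, num_beams=4, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```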

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: Adafactor
  • lr_scheduler_type: constant
  • training_steps: 500000
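
For reference, these settings map roughly onto transformers' Seq2SeqTrainingArguments as in the sketch below. This is a reconstruction from the list above, not the authors' actual training script; the output directory is a placeholder.

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical reconstruction of the reported hyperparameters; the
# actual training script is not part of this card.
training_args = Seq2SeqTrainingArguments(
    output_dir="iwslt_aligned_smallT5_cont0",  # placeholder
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,  # 8 x 2 = 16 total train batch size
    optim="adafactor",
    lr_scheduler_type="constant",
    max_steps=500_000,
    predict_with_generate=True,  # required to report BLEU / Gen Len at eval
)
```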

Training results

| Training Loss | Epoch | Step   | Validation Loss | BLEU    | Gen Len |
|:-------------:|:-----:|:------:|:---------------:|:-------:|:-------:|
| 1.2426        | 0.78  | 10000  | 0.8300          | 46.2793 | 28.6532 |
| 0.9931        | 1.55  | 20000  | 0.6756          | 52.2709 | 28.6441 |
| 0.8573        | 2.33  | 30000  | 0.6143          | 55.8294 | 28.5405 |
| 0.762         | 3.11  | 40000  | 0.5811          | 57.5135 | 28.366  |
| 0.734         | 3.88  | 50000  | 0.5499          | 58.6125 | 28.5101 |
| 0.6722        | 4.66  | 60000  | 0.5228          | 59.6427 | 28.8356 |
| 0.6215        | 5.43  | 70000  | 0.5161          | 60.4701 | 28.7534 |
| 0.5756        | 6.21  | 80000  | 0.5068          | 62.0864 | 28.6498 |
| 0.5738        | 6.99  | 90000  | 0.5005          | 61.9714 | 28.5788 |
| 0.5384        | 7.76  | 100000 | 0.4909          | 62.407  | 28.5282 |
| 0.5109        | 8.54  | 110000 | 0.4902          | 62.1452 | 28.4617 |
| 0.4816        | 9.32  | 120000 | 0.4875          | 62.6499 | 28.5518 |
| 0.4493        | 10.09 | 130000 | 0.4867          | 62.6694 | 28.6993 |
| 0.4648        | 10.87 | 140000 | 0.4775          | 63.3179 | 28.5495 |
| 0.4414        | 11.64 | 150000 | 0.4787          | 63.6928 | 28.4673 |
| 0.4158        | 12.42 | 160000 | 0.4792          | 63.8752 | 28.5011 |
| 0.3895        | 13.2  | 170000 | 0.4794          | 63.8429 | 28.6498 |
| 0.4031        | 13.97 | 180000 | 0.4757          | 63.9496 | 28.7264 |
| 0.3844        | 14.75 | 190000 | 0.4855          | 63.7498 | 28.8288 |
| 0.3637        | 15.53 | 200000 | 0.4800          | 64.2277 | 28.661  |
| 0.3473        | 16.3  | 210000 | 0.4854          | 64.4683 | 28.786  |
| 0.3243        | 17.08 | 220000 | 0.4903          | 64.7805 | 28.6791 |
| 0.3426        | 17.85 | 230000 | 0.4819          | 64.679  | 28.4809 |
| 0.3295        | 18.63 | 240000 | 0.4852          | 65.3735 | 28.6014 |
| 0.3124        | 19.41 | 250000 | 0.4947          | 64.5641 | 28.6745 |
| 0.2933        | 20.18 | 260000 | 0.4972          | 65.1364 | 28.6419 |
| 0.3101        | 20.96 | 270000 | 0.4902          | 64.6747 | 28.6802 |
| 0.2991        | 21.74 | 280000 | 0.4907          | 64.9732 | 28.5653 |
| 0.2828        | 22.51 | 290000 | 0.5038          | 64.7552 | 28.6261 |
| 0.2688        | 23.29 | 300000 | 0.5042          | 65.0702 | 28.7534 |
| 0.2555        | 24.06 | 310000 | 0.5101          | 65.0378 | 29.089  |
| 0.2692        | 24.84 | 320000 | 0.5022          | 64.9991 | 28.6937 |
| 0.2593        | 25.62 | 330000 | 0.5085          | 65.2478 | 28.6137 |
| 0.2439        | 26.39 | 340000 | 0.5152          | 64.863  | 28.6464 |
| 0.2327        | 27.17 | 350000 | 0.5165          | 65.0748 | 28.7286 |
| 0.249         | 27.95 | 360000 | 0.5116          | 64.7249 | 28.6137 |
| 0.238         | 28.72 | 370000 | 0.5202          | 64.7651 | 28.5968 |
| 0.2297        | 29.5  | 380000 | 0.5243          | 65.3334 | 28.7005 |
| 0.2152        | 30.27 | 390000 | 0.5336          | 64.9364 | 28.6081 |
| 0.2106        | 31.05 | 400000 | 0.5408          | 65.117  | 28.6745 |
| 0.2234        | 31.83 | 410000 | 0.5249          | 64.8926 | 28.6318 |
| 0.2085        | 32.6  | 420000 | 0.5306          | 65.5715 | 28.7984 |
| 0.2018        | 33.38 | 430000 | 0.5429          | 64.9154 | 28.6351 |
| 0.1885        | 34.16 | 440000 | 0.5453          | 65.0538 | 28.8525 |
| 0.2049        | 34.93 | 450000 | 0.5434          | 65.2857 | 28.7207 |
| 0.1957        | 35.71 | 460000 | 0.5491          | 65.3436 | 28.714  |
| 0.1867        | 36.49 | 470000 | 0.5536          | 65.4934 | 28.7939 |
| 0.1765        | 37.26 | 480000 | 0.5583          | 65.5595 | 28.8255 |
| 0.1786        | 38.04 | 490000 | 0.5612          | 65.6358 | 28.7691 |
| 0.1809        | 38.81 | 500000 | 0.5573          | 65.0266 | 28.7455 |
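
The card does not state how BLEU was computed. A common setup for cards like this one uses sacreBLEU via the evaluate library, as in the sketch below; whether this matches the exact metric configuration reported above is an assumption.

```python
import evaluate

# sacreBLEU via the evaluate library; that this matches the BLEU
# configuration used for the table above is an assumption.
bleu = evaluate.load("sacrebleu")
predictions = ["Thank you very much for your attention."]
references = [["Thank you very much for your attention."]]
result = bleu.compute(predictions=predictions, references=references)
print(round(result["score"], 4))
```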

Framework versions

  • Transformers 4.39.3
  • Pytorch 2.2.2+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2