
# mt5-base-fce-e8-b16

This model is a fine-tuned version of [google/mt5-base](https://huggingface.co/google/mt5-base) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.3758
- Rouge1: 84.5938
- Rouge2: 76.5987
- RougeL: 84.0063
- RougeLsum: 84.0286
- Gen Len: 15.4865

## Model description

More information needed

## Intended uses & limitations

More information needed
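Although the intended task is not documented here, the checkpoint loads like any other seq2seq model. A minimal inference sketch follows; the Hub id is a hypothetical placeholder, so substitute the actual repository path:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "your-username/mt5-base-fce-e8-b16"  # hypothetical Hub id, not the real path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Encode an input sentence and generate the model's output sequence.
inputs = tokenizer("Your input text here.", return_tensors="pt")
output_ids = model.generate(**inputs, max_length=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```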

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adafactor
- lr_scheduler_type: linear
- num_epochs: 8
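As a sketch, these settings map onto `Seq2SeqTrainingArguments` as below. The `output_dir` and the 400-step evaluation cadence are inferred from this card; the dataset, preprocessing, and metric functions are not documented and would need to be supplied:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-base-fce-e8-b16",  # assumed from the model name
    learning_rate=1e-3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adafactor",            # Adafactor optimizer, as listed above
    lr_scheduler_type="linear",
    num_train_epochs=8,
    evaluation_strategy="steps",
    eval_steps=400,               # matches the 400-step cadence in the results table
    predict_with_generate=True,   # required for ROUGE and generation-length metrics
)
```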

### Training results

| Training Loss | Epoch | Step  | Validation Loss | Rouge1  | Rouge2  | RougeL  | RougeLsum | Gen Len |
|:-------------:|:-----:|:-----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|
| 1.5646        | 0.23  | 400   | 0.5403          | 83.2786 | 74.549  | 82.6906 | 82.6978   | 15.5126 |
| 0.6122        | 0.45  | 800   | 0.4896          | 84.3453 | 75.5159 | 83.7564 | 83.7691   | 15.4500 |
| 0.5041        | 0.68  | 1200  | 0.4294          | 84.2563 | 75.8731 | 83.6118 | 83.6071   | 15.4760 |
| 0.4594        | 0.9   | 1600  | 0.4136          | 84.7369 | 76.6048 | 84.1541 | 84.1573   | 15.4651 |
| 0.3861        | 1.13  | 2000  | 0.4121          | 84.6947 | 76.574  | 84.0885 | 84.095    | 15.4642 |
| 0.3382        | 1.35  | 2400  | 0.3899          | 84.5537 | 76.4381 | 83.9421 | 83.951    | 15.4651 |
| 0.3442        | 1.58  | 2800  | 0.3866          | 84.6272 | 76.6256 | 84.0616 | 84.0804   | 15.4674 |
| 0.3388        | 1.81  | 3200  | 0.3758          | 84.5938 | 76.5987 | 84.0063 | 84.0286   | 15.4865 |
| 0.3109        | 2.03  | 3600  | 0.3822          | 84.5223 | 76.5703 | 83.9217 | 83.9438   | 15.4710 |
| 0.2254        | 2.26  | 4000  | 0.3923          | 84.3225 | 76.4146 | 83.7686 | 83.7789   | 15.4596 |
| 0.236         | 2.48  | 4400  | 0.3932          | 84.4412 | 76.4434 | 83.8515 | 83.8815   | 15.4692 |
| 0.2395        | 2.71  | 4800  | 0.3849          | 84.2211 | 76.3678 | 83.6444 | 83.6462   | 15.4614 |
| 0.2458        | 2.93  | 5200  | 0.3850          | 84.3534 | 76.598  | 83.8321 | 83.8366   | 15.4587 |
| 0.1832        | 3.16  | 5600  | 0.3973          | 84.4197 | 76.7844 | 83.8758 | 83.8781   | 15.4678 |
| 0.1576        | 3.39  | 6000  | 0.4082          | 84.1841 | 76.4425 | 83.6272 | 83.618    | 15.4783 |
| 0.1635        | 3.61  | 6400  | 0.3996          | 84.2051 | 76.3261 | 83.6613 | 83.6599   | 15.4788 |
| 0.1667        | 3.84  | 6800  | 0.3940          | 84.4538 | 76.8139 | 83.8887 | 83.8886   | 15.4610 |
| 0.145         | 4.06  | 7200  | 0.4260          | 84.4028 | 76.8101 | 83.8844 | 83.8824   | 15.4628 |
| 0.107         | 4.29  | 7600  | 0.4403          | 84.3559 | 76.8066 | 83.8048 | 83.807    | 15.4587 |
| 0.1078        | 4.51  | 8000  | 0.4337          | 84.3045 | 76.8011 | 83.7587 | 83.7699   | 15.4742 |
| 0.1114        | 4.74  | 8400  | 0.4334          | 84.2865 | 76.5415 | 83.7221 | 83.718    | 15.4820 |
| 0.1104        | 4.97  | 8800  | 0.4273          | 84.3211 | 76.8211 | 83.7795 | 83.7726   | 15.4838 |
| 0.0732        | 5.19  | 9200  | 0.4787          | 84.3459 | 76.752  | 83.777  | 83.7552   | 15.4829 |
| 0.069         | 5.42  | 9600  | 0.4839          | 84.4351 | 76.8848 | 83.8682 | 83.8584   | 15.4811 |
| 0.0713        | 5.64  | 10000 | 0.4896          | 84.2962 | 76.7428 | 83.7387 | 83.7253   | 15.4829 |
| 0.0716        | 5.87  | 10400 | 0.4788          | 84.3068 | 76.7969 | 83.74   | 83.7402   | 15.4747 |
| 0.0613        | 6.09  | 10800 | 0.5252          | 84.4256 | 77.008  | 83.8688 | 83.8828   | 15.4815 |
| 0.0439        | 6.32  | 11200 | 0.5398          | 84.3753 | 76.8235 | 83.793  | 83.7986   | 15.4815 |
| 0.0452        | 6.55  | 11600 | 0.5377          | 84.4467 | 76.8923 | 83.8893 | 83.8818   | 15.4815 |
| 0.0434        | 6.77  | 12000 | 0.5347          | 84.3734 | 76.811  | 83.8108 | 83.8063   | 15.4843 |
| 0.0424        | 7.0   | 12400 | 0.5380          | 84.4558 | 76.9239 | 83.9033 | 83.9022   | 15.4751 |
| 0.0296        | 7.22  | 12800 | 0.5808          | 84.332  | 76.8729 | 83.7923 | 83.7826   | 15.4774 |
| 0.0287        | 7.45  | 13200 | 0.5956          | 84.4744 | 77.0945 | 83.9222 | 83.9228   | 15.4843 |
| 0.0283        | 7.67  | 13600 | 0.5966          | 84.4271 | 77.0661 | 83.877  | 83.8712   | 15.4829 |
| 0.0285        | 7.9   | 14000 | 0.5983          | 84.4562 | 77.0334 | 83.8987 | 83.8985   | 15.4824 |
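For reference, the same metric family can be computed with the `evaluate` library; this is a minimal sketch with made-up example strings, since the exact evaluation setup for this card is not documented. Note that `evaluate` returns scores in [0, 1], whereas the table above reports them scaled to percentages:

```python
import evaluate

rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["he goes to school every day"],  # model outputs (example data)
    references=["he goes to school every day"],   # gold targets (example data)
    use_stemmer=True,
)
print(scores)  # keys: rouge1, rouge2, rougeL, rougeLsum
```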

### Framework versions

- Transformers 4.28.1
- Pytorch 1.11.0a0+b6df043
- Datasets 2.12.0
- Tokenizers 0.13.3