mt5-small_final_final_new

This model is a fine-tuned version of google/mt5-small; the training dataset is not documented in this card. It achieves the following results on the evaluation set (a minimal inference sketch follows the metric list):

  • Loss: 1.2941
  • ROUGE-1: 41.3841
  • ROUGE-2: 32.6198
  • ROUGE-L: 38.6245
  • ROUGE-Lsum: 38.6833
  • BLEU: 28.8775
  • Average generation length: 17.0839
  • METEOR: 0.3704
  • No-answer accuracy: 0.0
  • Average cosine similarity: 0.7627
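
The sketch below shows one way to load and run the checkpoint with the Transformers AutoClasses. The Hub repository id is an assumption based on the card title, and the input text is a placeholder, since the training task and data are not documented here.

```python
# Minimal inference sketch for this checkpoint. The repo id below is assumed
# from the card title; replace it with the actual Hub path or a local directory.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "mt5-small_final_final_new"  # hypothetical repo id / local path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("Your input text here", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```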

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1.5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 9
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 20
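
As a reading aid, here is a hedged sketch of how these values map onto Seq2SeqTrainingArguments in Transformers 4.31. The output directory is a placeholder, and the evaluation and generation flags are assumptions inferred from the per-epoch results table below; Adam's betas=(0.9, 0.999) and epsilon=1e-08 are the optimizer defaults, so they need no explicit arguments.

```python
# Hedged mapping of the listed hyperparameters onto Seq2SeqTrainingArguments
# (Transformers 4.31). Effective batch size: 16 * 8 accumulation steps = 128.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small_final_final_new",  # placeholder
    learning_rate=1.5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=8,
    seed=9,
    lr_scheduler_type="linear",
    num_train_epochs=20,
    evaluation_strategy="epoch",   # assumption: one eval row per epoch in the table
    predict_with_generate=True,    # assumption: needed to score ROUGE/BLEU at eval
)
```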

Training results

| Training Loss | Epoch | Step | Validation Loss | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | BLEU | Gen Len | METEOR | No-ans Acc | Avg Cos Sim |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| 14.5708 | 1.0 | 175 | 4.8623 | 10.2732 | 3.6837 | 9.295 | 9.3426 | 2.4037 | 8.7507 | 0.0865 | 0.0 | 0.4429 |
| 6.5938 | 1.99 | 350 | 3.0321 | 10.3823 | 5.1376 | 9.566 | 9.6003 | 3.8998 | 7.844 | 0.0969 | 0.0 | 0.4234 |
| 4.3372 | 2.99 | 525 | 2.3227 | 26.9602 | 18.9826 | 25.2396 | 25.2665 | 9.7754 | 12.2901 | 0.2376 | 0.0 | 0.6442 |
| 3.4266 | 3.98 | 700 | 2.0083 | 31.5678 | 23.6447 | 29.6748 | 29.7026 | 12.8064 | 13.222 | 0.2877 | 0.0 | 0.6947 |
| 3.0011 | 4.98 | 875 | 1.8600 | 32.2283 | 24.3874 | 30.2293 | 30.2518 | 14.2873 | 13.6664 | 0.2984 | 0.0 | 0.704 |
| 2.7444 | 5.97 | 1050 | 1.7535 | 32.4685 | 24.6833 | 30.4294 | 30.4397 | 14.9587 | 13.8386 | 0.3029 | 0.0 | 0.7074 |
| 2.5506 | 6.97 | 1225 | 1.6692 | 32.5693 | 24.8903 | 30.5541 | 30.5742 | 15.3203 | 13.9335 | 0.305 | 0.0 | 0.7097 |
| 2.4241 | 7.96 | 1400 | 1.5991 | 32.763 | 25.0389 | 30.7387 | 30.7372 | 15.8514 | 13.9643 | 0.3078 | 0.0 | 0.7127 |
| 2.2984 | 8.96 | 1575 | 1.5373 | 32.7553 | 25.113 | 30.7279 | 30.7385 | 16.1118 | 14.0551 | 0.3085 | 0.0 | 0.7126 |
| 2.2212 | 9.95 | 1750 | 1.4843 | 32.1917 | 24.619 | 30.2246 | 30.2458 | 16.1846 | 14.0741 | 0.3037 | 0.0 | 0.7068 |
| 2.1401 | 10.95 | 1925 | 1.4425 | 32.2614 | 24.7428 | 30.3223 | 30.3377 | 16.3919 | 13.9891 | 0.3044 | 0.0 | 0.7087 |
| 2.0755 | 11.94 | 2100 | 1.4034 | 32.222 | 24.6764 | 30.2975 | 30.3261 | 16.504 | 13.9859 | 0.3043 | 0.0 | 0.71 |
| 2.0328 | 12.94 | 2275 | 1.3723 | 32.1828 | 24.6096 | 30.2115 | 30.2389 | 16.5263 | 13.9632 | 0.3038 | 0.0 | 0.7099 |
| 1.9793 | 13.93 | 2450 | 1.3478 | 32.3184 | 24.6774 | 30.333 | 30.3495 | 16.8168 | 14.2392 | 0.3046 | 0.0 | 0.7097 |
| 1.9541 | 14.93 | 2625 | 1.3288 | 39.7212 | 31.117 | 37.1213 | 37.1596 | 26.1835 | 16.4908 | 0.3582 | 0.0 | 0.7527 |
| 1.9287 | 15.92 | 2800 | 1.3136 | 41.2942 | 32.5064 | 38.5652 | 38.6121 | 28.7564 | 17.0243 | 0.3693 | 0.0 | 0.7619 |
| 1.8985 | 16.92 | 2975 | 1.3059 | 41.3069 | 32.5558 | 38.5643 | 38.607 | 28.7815 | 17.0815 | 0.3697 | 0.0 | 0.7619 |
| 1.8938 | 17.91 | 3150 | 1.2985 | 41.4096 | 32.6579 | 38.6483 | 38.7074 | 28.8733 | 17.0759 | 0.3707 | 0.0 | 0.7628 |
| 1.8795 | 18.91 | 3325 | 1.2941 | 41.3841 | 32.6198 | 38.6245 | 38.6833 | 28.8775 | 17.0839 | 0.3704 | 0.0 | 0.7627 |
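
Metrics of the kind reported above can be computed with the `evaluate` library, as sketched below. The exact metric scripts and post-processing used for this card are not documented, and ROUGE and BLEU appear to be reported on a 0-100 scale (raw scores times 100), so this is an illustration rather than an exact reproduction.

```python
# Hedged sketch: computing metrics of the kind reported in the table with the
# `evaluate` library. The exact scripts used for this card are not documented.
import evaluate

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")
meteor = evaluate.load("meteor")

predictions = ["a generated answer"]   # placeholder model outputs
references = ["the reference answer"]  # placeholder gold answers

print(rouge.compute(predictions=predictions, references=references))
print(bleu.compute(predictions=predictions,
                   references=[[r] for r in references]))
print(meteor.compute(predictions=predictions, references=references))
```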

Framework versions

  • Transformers 4.31.0
  • PyTorch 2.0.1+cu118
  • Datasets 2.13.1
  • Tokenizers 0.13.3