mt5-small_mid_lr_mid_decay

This model is a fine-tuned version of google/mt5-small. The training dataset is not specified. It achieves the following results on the evaluation set, which match the final logged evaluation (step 1575) under Training results:

  • Loss: 0.7428
  • Rouge1: 43.12
  • Rouge2: 37.6639
  • Rougel: 41.8367
  • Rougelsum: 41.904
  • Bleu: 31.957
  • Gen Len: 12.1285
  • Meteor: 0.3936
  • No ans accuracy: 22.29
  • Av cosine sim: 0.7406

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 9
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 20
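
As a rough illustration of how these values relate, the sketch below recomputes the effective batch size and evaluates a linear learning-rate decay. It rests on assumptions the card does not state: a single training device (implied by 16 × 8 = 128), no warmup steps, and roughly 3500 total optimizer steps (20 epochs at about 175 steps per epoch, as the first row of the training results suggests). The function name `linear_lr` is hypothetical.

```python
# Values copied from the hyperparameter list above.
train_batch_size = 16
gradient_accumulation_steps = 8
learning_rate = 0.001

# Assumption: one device, since 16 * 8 already equals the reported 128.
num_devices = 1
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices


def linear_lr(step, base_lr=learning_rate, total_steps=3500, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero.

    total_steps=3500 and warmup_steps=0 are assumptions, not values
    stated in the card.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print("effective batch size:", total_train_batch_size)
    for step in (0, 175, 1575, 3500):
        print(step, linear_lr(step))
```

Under these assumptions the learning rate would still be a little over half its initial value at step 1575, the last step logged in the training results.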

Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Bleu | Gen Len | Meteor | No ans accuracy | Av cosine sim |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3.1455 | 1.0 | 175 | 0.9832 | 18.7107 | 15.4897 | 18.1977 | 18.2212 | 7.0634 | 7.6229 | 0.1626 | 22.4000 | 0.3949 |
| 1.1623 | 1.99 | 350 | 0.8542 | 38.7675 | 32.704 | 37.3557 | 37.3949 | 27.4323 | 12.5135 | 0.3487 | 17.9900 | 0.6992 |
| 0.9431 | 2.99 | 525 | 0.8017 | 41.6216 | 35.6002 | 40.2386 | 40.2881 | 30.7994 | 12.8117 | 0.3755 | 18.37 | 0.7304 |
| 0.8119 | 3.98 | 700 | 0.7787 | 43.5805 | 37.4117 | 42.1059 | 42.155 | 32.9646 | 13.2176 | 0.3947 | 17.7400 | 0.7582 |
| 0.7235 | 4.98 | 875 | 0.7477 | 43.4124 | 37.2017 | 41.8468 | 41.9097 | 32.9345 | 13.116 | 0.3946 | 18.92 | 0.7561 |
| 0.6493 | 5.97 | 1050 | 0.7266 | 40.4764 | 34.9927 | 39.0999 | 39.1711 | 29.0601 | 11.748 | 0.3687 | 22.6500 | 0.7071 |
| 0.5871 | 6.97 | 1225 | 0.7284 | 43.3812 | 37.5544 | 42.0405 | 42.0865 | 32.8345 | 12.6063 | 0.3949 | 21.05 | 0.7485 |
| 0.5453 | 7.96 | 1400 | 0.7389 | 43.4549 | 37.76 | 42.1025 | 42.215 | 32.6726 | 12.4537 | 0.3965 | 21.44 | 0.7496 |
| 0.5038 | 8.96 | 1575 | 0.7428 | 43.12 | 37.6639 | 41.8367 | 41.904 | 31.957 | 12.1285 | 0.3936 | 22.29 | 0.7406 |
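
The card does not say which checkpoint was kept or which metric, if any, drove checkpoint selection. As a small sketch of how the logged results could be compared, the snippet below copies three columns from the training results and finds the step with the lowest validation loss and the step with the highest Rouge1. The variable names are illustrative only.

```python
# (step, validation_loss, rouge1) copied from the training results above.
rows = [
    (175, 0.9832, 18.7107),
    (350, 0.8542, 38.7675),
    (525, 0.8017, 41.6216),
    (700, 0.7787, 43.5805),
    (875, 0.7477, 43.4124),
    (1050, 0.7266, 40.4764),
    (1225, 0.7284, 43.3812),
    (1400, 0.7389, 43.4549),
    (1575, 0.7428, 43.12),
]

# Lowest validation loss and highest Rouge1 need not fall on the same step.
best_by_loss = min(rows, key=lambda r: r[1])
best_by_rouge1 = max(rows, key=lambda r: r[2])

if __name__ == "__main__":
    print("lowest validation loss:", best_by_loss)
    print("highest Rouge1:", best_by_rouge1)
```

In these logs the two criteria disagree: validation loss is lowest at step 1050, Rouge1 is highest at step 700, and the headline results at the top of the card correspond to the final step, 1575.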

Framework versions

  • Transformers 4.31.0
  • Pytorch 2.0.1+cu118
  • Datasets 2.13.1
  • Tokenizers 0.13.3