
mt5-small_large_lr

This model is a fine-tuned version of google/mt5-small on an unspecified dataset. It achieves the following results on the evaluation set (a hedged loading sketch follows the metrics list):

  • Loss: 0.9688
  • Rouge1: 38.8633
  • Rouge2: 33.0802
  • RougeL: 37.6956
  • RougeLsum: 37.7116
  • Bleu: 26.6301
  • Gen Len: 11.5566
  • Meteor: 0.3519
  • No-answer accuracy: 22.99%
  • Avg. cosine similarity: 0.6861
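
The repository ID of the fine-tuned weights is not stated in this card, so the minimal sketch below only loads the base checkpoint, google/mt5-small; substituting the fine-tuned model's ID would load the weights evaluated above.

```python
# Minimal sketch: load the base checkpoint this model was fine-tuned from.
# Replace "google/mt5-small" with this repository's model ID to load the
# fine-tuned weights (that ID is not stated in the card).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")

inputs = tokenizer("Example input text", return_tensors="pt")  # placeholder input
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```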

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged sketch mapping them onto Seq2SeqTrainingArguments follows the list):

  • learning_rate: 0.005
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 9
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 20
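
As a rough reproduction aid, the hyperparameters above map onto transformers.Seq2SeqTrainingArguments as sketched below; the output directory and the use of predict_with_generate are assumptions, not stated in this card.

```python
# A minimal sketch, assuming the run used transformers.Seq2SeqTrainingArguments
# (Transformers 4.31.0). output_dir and predict_with_generate are assumptions.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small_large_lr",   # assumed; not stated in the card
    learning_rate=5e-3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=9,
    gradient_accumulation_steps=8,     # 16 x 8 = 128 effective train batch size
    lr_scheduler_type="linear",
    num_train_epochs=20,
    adam_beta1=0.9,                    # "Adam with betas=(0.9, 0.999)"
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    predict_with_generate=True,        # assumed, since ROUGE/BLEU require generation
)
```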

Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1  | Rouge2  | RougeL  | RougeLsum | Bleu    | Gen Len | Meteor | No-answer accuracy (%) | Avg. cosine sim |
|--------------:|------:|-----:|----------------:|--------:|--------:|--------:|----------:|--------:|--------:|-------:|-----------------------:|----------------:|
| 5.4434        | 1.0   | 175  | 2.1918          | 1.8449  | 1.2024  | 1.7039  | 1.7116    | 0.0     | 2.7672  | 0.0145 | 28.9700                | 0.1363          |
| 1.8436        | 1.99  | 350  | 1.1852          | 33.6062 | 26.8725 | 32.2258 | 32.241    | 20.3395 | 12.2528 | 0.2957 | 17.3800                | 0.636           |
| 1.2276        | 2.99  | 525  | 1.0630          | 33.186  | 27.4949 | 32.0715 | 32.0522   | 20.3232 | 11.0301 | 0.2957 | 21.18                  | 0.6109          |
| 0.9589        | 3.98  | 700  | 1.0083          | 40.265  | 33.6652 | 38.9503 | 38.9661   | 28.0884 | 12.8545 | 0.3623 | 17.54                  | 0.7157          |
| 0.7931        | 4.98  | 875  | 0.9682          | 37.9437 | 31.7611 | 36.7618 | 36.7671   | 25.7738 | 12.0286 | 0.3424 | 20.66                  | 0.6825          |
| 0.6686        | 5.97  | 1050 | 0.9601          | 37.5742 | 31.9098 | 36.4225 | 36.4381   | 24.9584 | 11.4169 | 0.3398 | 22.56                  | 0.6713          |
| 0.5686        | 6.97  | 1225 | 0.9620          | 43.1436 | 36.6363 | 41.7279 | 41.7571   | 32.4301 | 13.6142 | 0.3893 | 16.9400                | 0.757           |
| 0.4939        | 7.96  | 1400 | 0.9688          | 38.8633 | 33.0802 | 37.6956 | 37.7116   | 26.6301 | 11.5566 | 0.3519 | 22.99                  | 0.6861          |

Framework versions

  • Transformers 4.31.0
  • Pytorch 2.0.1+cu118
  • Datasets 2.13.1
  • Tokenizers 0.13.3
