
mt5-large-gramatika161k-b16-e10-lr5

This model is a fine-tuned version of google/mt5-large on an unspecified dataset. It achieves the following results on the evaluation set (a brief usage sketch follows the list):

  • Loss: 0.0909
  • ROUGE-1: 72.6295
  • ROUGE-2: 67.8521
  • ROUGE-L: 72.5471
  • ROUGE-Lsum: 72.5591
  • Gen Len: 18.3276
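
The checkpoint follows the standard mT5 text-to-text interface, so it can be loaded with transformers as sketched below. This is a minimal sketch, not documented usage: the repo id and example input are assumptions, and the card does not state the task (the model name suggests grammatical error correction).

```python
# Minimal inference sketch. The repo id and input sentence are assumptions;
# only the use of the standard mT5 seq2seq interface is taken as given.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "mt5-large-gramatika161k-b16-e10-lr5"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "She go to school yesterday."  # hypothetical input
inputs = tokenizer(text, return_tensors="pt")
# Eval Gen Len is ~18 tokens, so a small generation budget suffices.
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```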

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adafactor
  • lr_scheduler_type: linear
  • num_epochs: 10
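
For reference, these values map onto transformers' Seq2SeqTrainingArguments roughly as sketched below. The output directory, model loading, and datasets are assumptions not covered by this card, and whether Adafactor was selected via the optim flag or constructed manually is not recorded.

```python
# Sketch of a training setup matching the hyperparameters above; only the
# listed values are taken from the card, everything else is an assumption.
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

args = Seq2SeqTrainingArguments(
    output_dir="mt5-large-gramatika161k-b16-e10-lr5",  # hypothetical
    learning_rate=5e-05,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adafactor",           # Adafactor optimizer
    lr_scheduler_type="linear",
    num_train_epochs=10,
    predict_with_generate=True,  # required to compute ROUGE at eval time
)

tokenizer = AutoTokenizer.from_pretrained("google/mt5-large")
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-large")

# trainer = Seq2SeqTrainer(model=model, args=args, tokenizer=tokenizer,
#                          train_dataset=..., eval_dataset=...,
#                          compute_metrics=...)  # datasets not documented here
# trainer.train()
```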

Training results

| Training Loss | Epoch | Step  | Validation Loss | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | Gen Len |
|---------------|-------|-------|-----------------|---------|---------|---------|------------|---------|
| 0.9659        | 0.63  | 5000  | 0.1455          | 70.1028 | 63.4969 | 69.9738 | 69.9761    | 18.3378 |
| 0.1735        | 1.27  | 10000 | 0.1195          | 71.1156 | 65.2149 | 70.9932 | 71.0038    | 18.3324 |
| 0.1391        | 1.90  | 15000 | 0.1076          | 71.5692 | 66.0226 | 71.4676 | 71.4720    | 18.3281 |
| 0.1149        | 2.54  | 20000 | 0.1035          | 71.8135 | 66.4584 | 71.7212 | 71.7292    | 18.3308 |
| 0.1029        | 3.17  | 25000 | 0.0961          | 72.1040 | 66.9459 | 72.0139 | 72.0239    | 18.3282 |
| 0.0898        | 3.81  | 30000 | 0.0944          | 72.2310 | 67.1623 | 72.1412 | 72.1542    | 18.3314 |
| 0.0803        | 4.44  | 35000 | 0.0926          | 72.3851 | 67.4624 | 72.3051 | 72.3183    | 18.3286 |
| 0.0750        | 5.08  | 40000 | 0.0929          | 72.4219 | 67.5102 | 72.3376 | 72.3479    | 18.3298 |
| 0.0665        | 5.71  | 45000 | 0.0917          | 72.5132 | 67.6501 | 72.4271 | 72.4383    | 18.3264 |
| 0.0624        | 6.35  | 50000 | 0.0911          | 72.5711 | 67.7710 | 72.4938 | 72.5041    | 18.3283 |
| 0.0588        | 6.98  | 55000 | 0.0909          | 72.6295 | 67.8521 | 72.5471 | 72.5591    | 18.3276 |
| 0.0534        | 7.62  | 60000 | 0.0920          | 72.6475 | 67.9046 | 72.5743 | 72.5853    | 18.3278 |
| 0.0514        | 8.25  | 65000 | 0.0930          | 72.6373 | 67.8940 | 72.5612 | 72.5724    | 18.3277 |
| 0.0492        | 8.88  | 70000 | 0.0930          | 72.6593 | 67.9359 | 72.5900 | 72.5971    | 18.3273 |
| 0.0470        | 9.52  | 75000 | 0.0932          | 72.6906 | 68.0100 | 72.6172 | 72.6269    | 18.3264 |
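
The ROUGE columns above can be recomputed with the evaluate library, which reports the same rouge1/rouge2/rougeL/rougeLsum keys (scaled to [0, 1]). The strings below are placeholders, since the evaluation data is not documented in this card.

```python
# Metric-computation sketch; the prediction/reference pairs are placeholders.
# Requires: pip install evaluate rouge_score
import evaluate

rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["She went to school yesterday."],  # hypothetical model output
    references=["She went to school yesterday."],   # hypothetical reference
)
# Keys match the table columns: rouge1, rouge2, rougeL, rougeLsum.
print({k: round(v * 100, 4) for k, v in scores.items()})
```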

Framework versions

  • Transformers 4.30.1
  • PyTorch 1.11.0a0+b6df043
  • Datasets 2.12.0
  • Tokenizers 0.13.3