
mt5-large-gramatika161k-b16-5000

This model is a fine-tuned version of google/mt5-large; the fine-tuning dataset is not documented in this card. It achieves the following results on the evaluation set (a minimal usage sketch follows the results):

  • Loss: 0.0949
  • Rouge1: 72.227
  • Rouge2: 67.1468
  • Rougel: 72.1408
  • Rougelsum: 72.1494
  • Gen Len: 18.3283
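
As a fine-tuned mT5 sequence-to-sequence checkpoint, the model can be loaded with the Transformers library. The snippet below is only a minimal loading sketch: the model id, the example input, and the generation settings are assumptions, not part of this card.

```python
# Minimal loading sketch; the model id below is assumed to match this
# repository, and the input sentence is a placeholder.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "mt5-large-gramatika161k-b16-5000"  # hypothetical Hub id or local path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("your input sentence here", return_tensors="pt")
outputs = model.generate(**inputs, max_length=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```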

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch in code follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adafactor
  • lr_scheduler_type: linear
  • num_epochs: 5
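
These values map directly onto the Transformers training arguments. The sketch below shows one way they could be expressed with Seq2SeqTrainingArguments; it is not the exact training script used for this model, and the output directory is a placeholder.

```python
# Configuration sketch reproducing the listed hyperparameters; not the
# original training script.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-large-gramatika161k-b16-5000",  # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adafactor",            # Adafactor optimizer
    lr_scheduler_type="linear",
    num_train_epochs=5,
    predict_with_generate=True,   # needed for ROUGE / generation-length metrics
)
```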

Training results

| Training Loss | Epoch | Step  | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len |
|---------------|-------|-------|-----------------|---------|---------|---------|-----------|---------|
| 0.684         | 0.63  | 5000  | 0.1422          | 70.2446 | 63.7161 | 70.115  | 70.1185   | 18.3370 |
| 0.1704        | 1.27  | 10000 | 0.1185          | 71.1601 | 65.3066 | 71.0354 | 71.041    | 18.3348 |
| 0.1383        | 1.9   | 15000 | 0.1079          | 71.5399 | 65.9422 | 71.4296 | 71.4371   | 18.3289 |
| 0.1166        | 2.54  | 20000 | 0.1032          | 71.8281 | 66.4753 | 71.7248 | 71.7321   | 18.3303 |
| 0.106         | 3.17  | 25000 | 0.0983          | 72.0264 | 66.8201 | 71.9367 | 71.9427   | 18.3291 |
| 0.0952        | 3.81  | 30000 | 0.0962          | 72.1134 | 66.9793 | 72.0288 | 72.0362   | 18.3297 |
| 0.0891        | 4.44  | 35000 | 0.0949          | 72.227  | 67.1468 | 72.1408 | 72.1494   | 18.3283 |
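
The ROUGE columns are the standard rouge1/rouge2/rougeL/rougeLsum scores, reported here on a 0-100 scale. The sketch below shows one way to compute them with the evaluate library; that library is not listed among the framework versions, so treating it as the metric implementation used here is an assumption.

```python
# Sketch of computing the reported metrics from decoded predictions and
# references; `predictions` and `references` are placeholder lists of strings.
import evaluate

rouge = evaluate.load("rouge")
predictions = ["the corrected sentence"]
references = ["the corrected sentence"]

scores = rouge.compute(predictions=predictions, references=references)
# Scale to 0-100 to match the table above.
print({k: round(v * 100, 4) for k, v in scores.items()})
```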

Framework versions

  • Transformers 4.30.1
  • Pytorch 1.11.0a0+b6df043
  • Datasets 2.12.0
  • Tokenizers 0.13.3