
mt5-large-gramatika161k-b16-e10-lr0.001

This model is a fine-tuned version of google/mt5-large on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1537
  • Rouge1: 70.8264
  • Rouge2: 64.518
  • Rougel: 70.6934
  • Rougelsum: 70.6881
  • Gen Len: 18.3298
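
As a quick reference, the sketch below shows how the model could be loaded for inference with Transformers. The repository path and the input sentence are placeholders, since the card does not document the owning namespace or the expected input format.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Placeholder repo path: the card does not state the owning namespace.
model_id = "your-username/mt5-large-gramatika161k-b16-e10-lr0.001"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Hypothetical input; the expected prompt format is not documented.
inputs = tokenizer("your input sentence here", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```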

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adafactor
  • lr_scheduler_type: linear
  • num_epochs: 10
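
Assuming the standard Trainer API with its built-in Adafactor option, these settings translate roughly into the Seq2SeqTrainingArguments sketch below; output_dir and predict_with_generate are assumptions, not taken from the card.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-large-gramatika161k-b16-e10-lr0.001",  # assumed name
    learning_rate=1e-3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adafactor",           # matches the Adafactor optimizer above
    lr_scheduler_type="linear",
    num_train_epochs=10,
    predict_with_generate=True,  # assumed; needed to compute ROUGE at eval time
)
```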

Training results

| Training Loss | Epoch | Step  | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len |
|:-------------:|:-----:|:-----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|
| 0.3641        | 0.63  | 5000  | 0.1944          | 69.4204 | 61.9635 | 69.2556 | 69.2477   | 18.3389 |
| 0.1843        | 1.27  | 10000 | 0.1655          | 70.3343 | 63.6924 | 70.1851 | 70.1815   | 18.3377 |
| 0.1359        | 1.9   | 15000 | 0.1537          | 70.8264 | 64.518  | 70.6934 | 70.6881   | 18.3298 |
| 0.0912        | 2.54  | 20000 | 0.1643          | 71.037  | 64.8861 | 70.9075 | 70.9027   | 18.3295 |
| 0.0759        | 3.17  | 25000 | 0.1694          | 71.288  | 65.3505 | 71.1746 | 71.1675   | 18.3314 |
| 0.054         | 3.81  | 30000 | 0.1672          | 71.4356 | 65.5825 | 71.3263 | 71.3199   | 18.3294 |
| 0.0398        | 4.44  | 35000 | 0.1779          | 71.4473 | 65.6798 | 71.343  | 71.3354   | 18.3341 |
| 0.0331        | 5.08  | 40000 | 0.1908          | 71.615  | 65.9285 | 71.5126 | 71.4982   | 18.3344 |
| 0.021         | 5.71  | 45000 | 0.2025          | 71.6252 | 65.9628 | 71.5172 | 71.513    | 18.3317 |
| 0.0167        | 6.35  | 50000 | 0.2107          | 71.6508 | 66.0666 | 71.5547 | 71.542    | 18.3366 |
| 0.0126        | 6.98  | 55000 | 0.2084          | 71.8403 | 66.3396 | 71.7392 | 71.735    | 18.3337 |
| 0.0072        | 7.62  | 60000 | 0.2256          | 71.8659 | 66.388  | 71.7699 | 71.7644   | 18.3330 |
| 0.0057        | 8.25  | 65000 | 0.2578          | 71.9226 | 66.4948 | 71.8279 | 71.8162   | 18.3313 |
| 0.0036        | 8.88  | 70000 | 0.2784          | 71.9279 | 66.5248 | 71.8258 | 71.8149   | 18.3324 |
| 0.0021        | 9.52  | 75000 | 0.3040          | 71.9913 | 66.634  | 71.893  | 71.8844   | 18.3317 |
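
The headline numbers at the top of the card match the step-15000 row (epoch 1.9), where the validation loss reaches its minimum of 0.1537 before climbing in later epochs. For reproducing the ROUGE columns, a minimal sketch with the evaluate library is shown below; the prediction and reference lists are placeholders, and evaluate may differ from the exact metric implementation used during training.

```python
import evaluate

rouge = evaluate.load("rouge")  # requires the rouge_score package

predictions = ["a decoded model output"]   # placeholder
references = ["the matching gold target"]  # placeholder

scores = rouge.compute(predictions=predictions, references=references)
# evaluate returns fractions in [0, 1]; scale by 100 to match the table.
print({k: round(v * 100, 4) for k, v in scores.items()})
```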

Framework versions

  • Transformers 4.30.1
  • Pytorch 1.11.0a0+b6df043
  • Datasets 2.12.0
  • Tokenizers 0.13.3