mt5-large-gramatika1500k

This model is a fine-tuned version of google/mt5-large on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0471
  • Rouge1: 75.56
  • Rouge2: 72.1272
  • RougeL: 75.5131
  • RougeLsum: 75.5134
  • Gen Len: 18.4427
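
The checkpoint can be loaded with the standard Transformers text-to-text API. The sketch below is an assumption based on the card title: the repo id, the placeholder input text, and the generation settings are not specified by the card and should be adjusted to the actual model location and task format.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumed repo id taken from the card title; replace with the actual model id
# or a local checkpoint path.
model_id = "mt5-large-gramatika1500k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Placeholder input; the card does not document the expected prompt format.
inputs = tokenizer("your input text here", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```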

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adafactor
  • lr_scheduler_type: linear
  • num_epochs: 10
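
As a rough illustration only, the hyperparameters above would map onto Seq2SeqTrainingArguments (Transformers 4.31.0) as sketched below; the output directory, predict_with_generate flag, and any evaluation or saving strategy are assumptions not stated in the card.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-large-gramatika1500k",  # assumed output path
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adafactor",
    lr_scheduler_type="linear",
    num_train_epochs=10,
    predict_with_generate=True,  # assumption: needed to report ROUGE and Gen Len
)
```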

Training results

Training Loss | Epoch | Step   | Validation Loss | Rouge1  | Rouge2  | RougeL  | RougeLsum | Gen Len
0.1109        | 1.33  | 100000 | 0.0567          | 75.0228 | 71.1923 | 74.9653 | 74.9652   | 18.4441
0.0572        | 2.67  | 200000 | 0.0494          | 75.3858 | 71.8285 | 75.3356 | 75.3340   | 18.4427
0.0431        | 4.0   | 300000 | 0.0471          | 75.5600 | 72.1272 | 75.5131 | 75.5134   | 18.4427
0.0332        | 5.33  | 400000 | 0.0486          | 75.6167 | 72.2424 | 75.5734 | 75.5726   | 18.4424
0.0277        | 6.67  | 500000 | 0.0490          | 75.6749 | 72.3462 | 75.6327 | 75.6317   | 18.4428
0.0236        | 8.0   | 600000 | 0.0501          | 75.6924 | 72.3891 | 75.6502 | 75.6508   | 18.4430
0.0202        | 9.34  | 700000 | 0.0525          | 75.7134 | 72.4174 | 75.6724 | 75.6714   | 18.4427
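
ROUGE scores of this kind are typically computed with the evaluate library (which uses the rouge_score backend); the minimal sketch below uses placeholder predictions and references, since the card does not include the evaluation data or script.

```python
import evaluate

rouge = evaluate.load("rouge")

# Placeholders; replace with generated model outputs and gold target texts.
predictions = ["a generated output sentence"]
references = ["a reference target sentence"]

scores = rouge.compute(predictions=predictions, references=references)
# evaluate returns fractions; multiply by 100 to match the scale used above.
print({name: round(value * 100, 4) for name, value in scores.items()})
```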

Framework versions

  • Transformers 4.31.0
  • Pytorch 1.11.0a0+b6df043
  • Datasets 2.12.0
  • Tokenizers 0.13.3