
t5-v1_1-large-gramatika161k-b16-5000

This model is a fine-tuned version of google/t5-v1_1-large on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1174
  • Rouge1: 48.5144
  • Rouge2: 40.6865
  • RougeL: 48.3881
  • RougeLsum: 48.3856
  • Gen Len: 18.8717

Model description

More information needed

Intended uses & limitations

More information needed
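
The intended task is not documented here (the model name hints at grammar correction, but that is unconfirmed). The checkpoint is a standard text2text model, so it can be loaded with the usual transformers API. A minimal usage sketch, assuming the model is published on the Hub; the repo id below is a placeholder:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Placeholder repo id -- substitute the actual Hub path of this checkpoint.
model_id = "<user>/t5-v1_1-large-gramatika161k-b16-5000"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Text-to-text generation; the eval Gen Len above (~18.9) suggests short outputs.
inputs = tokenizer("Your input sentence here.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```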

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adafactor
  • lr_scheduler_type: linear
  • num_epochs: 5
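
The training script itself is not included in this card. A sketch of how the settings above map onto Seq2SeqTrainingArguments; every argument not listed above (output_dir, logging/eval cadence, dataset handling) is an assumption:

```python
from transformers import Seq2SeqTrainingArguments

# Sketch reproducing the hyperparameters listed above; all other
# settings are assumed, not taken from the original training run.
args = Seq2SeqTrainingArguments(
    output_dir="t5-v1_1-large-gramatika161k-b16-5000",
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adafactor",           # Adafactor optimizer
    lr_scheduler_type="linear",
    num_train_epochs=5,
    predict_with_generate=True,  # needed to report ROUGE / Gen Len at eval time
)
```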

Training results

| Training Loss | Epoch | Step  | Validation Loss | Rouge1  | Rouge2  | RougeL  | RougeLsum | Gen Len |
|---------------|-------|-------|-----------------|---------|---------|---------|-----------|---------|
| 0.6024        | 0.63  | 5000  | 0.2087          | 45.4271 | 35.6446 | 45.1917 | 45.1772   | 18.8785 |
| 0.2262        | 1.27  | 10000 | 0.1614          | 46.8959 | 38.0793 | 46.7167 | 46.7103   | 18.8760 |
| 0.1842        | 1.90  | 15000 | 0.1436          | 47.6488 | 39.2469 | 47.4856 | 47.4753   | 18.8751 |
| 0.1588        | 2.54  | 20000 | 0.1311          | 48.0239 | 39.8829 | 47.8755 | 47.8715   | 18.8738 |
| 0.1461        | 3.17  | 25000 | 0.1247          | 48.2413 | 40.2490 | 48.1040 | 48.0976   | 18.8741 |
| 0.1342        | 3.81  | 30000 | 0.1195          | 48.4295 | 40.5473 | 48.2997 | 48.2965   | 18.8721 |
| 0.1279        | 4.44  | 35000 | 0.1174          | 48.5144 | 40.6865 | 48.3881 | 48.3856   | 18.8717 |
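
The ROUGE columns are on a 0-100 scale. The actual evaluation code is not included in this card; a sketch of how such scores are typically computed with the evaluate library:

```python
import evaluate

# Typical ROUGE computation for a seq2seq eval loop; the predictions and
# references below are placeholders, not data from this model's eval set.
rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["decoded model output"],
    references=["reference target text"],
    use_stemmer=True,
)
# evaluate returns fractions in [0, 1]; scale by 100 to match the table above.
print({k: round(v * 100, 4) for k, v in scores.items()})
```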

Framework versions

  • Transformers 4.30.1
  • PyTorch 1.11.0a0+b6df043
  • Datasets 2.12.0
  • Tokenizers 0.13.3