
mt5_baseline

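This model is a fine-tuned version of google/mt5-small on an unknown dataset. It achieves the following results on the evaluation set. These match the epoch-9 row of the training results below, which is the epoch with the lowest validation loss: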

  • Loss: 3.0995
  • Rouge1: 17.5672
  • Rouge2: 5.8375
  • RougeL: 17.4382
  • RougeLsum: 17.4005
  • Gen Len: 20.9710 (average length of the generated sequences, in tokens)
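The Rouge scores above measure n-gram overlap between generated and reference texts. Rouge1 uses unigrams and Rouge2 uses bigrams, while RougeL and RougeLsum are based on the longest common subsequence. The card does not say exactly how the scores were computed. The sketch below assumes the convention of the Transformers summarization example script, which reports the F-measure scaled by 100. `rouge_n_f1` is a hypothetical helper, not a library function. It uses lowercased whitespace tokenization and no stemming, so it will not reproduce the reported numbers exactly.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(prediction, reference, n=1):
    """Simplified ROUGE-N F1 between two strings, scaled to 0-100.

    Assumption: lowercased whitespace tokenization, no stemming; real
    scorers (e.g. the rouge_score package) tokenize differently.
    """
    pred = ngrams(prediction.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((pred & ref).values())  # matches clipped to the smaller count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 100 * 2 * precision * recall / (precision + recall)


reference = "the cat sat on the mat"
prediction = "the cat lay on the mat"
print(round(rouge_n_f1(prediction, reference, n=1), 2))  # 83.33
print(round(rouge_n_f1(prediction, reference, n=2), 2))  # 60.0
```

Changing a single word costs one of six unigram matches but two of five bigram matches, which is why Rouge2 is much lower than Rouge1 in the results above.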

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 16
  • total_eval_batch_size: 2
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 10.0
  • mixed_precision_training: Native AMP
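The batch-size and schedule settings above fit together as follows. The total train batch size is the per-device batch size times the number of devices times the gradient accumulation steps (1 × 2 × 8 = 16). The Step column of the training results shows 1221 optimizer steps per epoch, so 10 epochs give 12210 steps. Assuming no examples were dropped, that is roughly 1221 × 16 ≈ 19.5k training examples per epoch.

The sketch below assumes the scheduler spanned all 12210 steps. It applies a linear warmup followed by a half-cosine decay, intended to mirror Transformers' `get_cosine_schedule_with_warmup` with its default half-cycle setting. `cosine_lr` is a hypothetical helper, not the library function.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
PER_DEVICE_TRAIN_BATCH = 1
NUM_DEVICES = 2
GRAD_ACCUM_STEPS = 8
WARMUP_STEPS = 100
NUM_EPOCHS = 10
# Taken from the Step column of the training results (epoch 1.0 -> step 1221).
STEPS_PER_EPOCH = 1221

# Examples contributing to each optimizer step.
total_train_batch_size = PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS
# Assumption: the scheduler's horizon is the full number of optimizer steps.
total_steps = STEPS_PER_EPOCH * NUM_EPOCHS


def cosine_lr(step, base_lr=LEARNING_RATE, warmup=WARMUP_STEPS, total=total_steps):
    """Linear warmup from 0 to base_lr, then half-cosine decay to 0 at `total`."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(total_train_batch_size)  # 16
print(total_steps)             # 12210
for s in (0, 50, 100, 6155, 12210):
    print(s, f"{cosine_lr(s):.2e}")
```

With these values the learning rate rises linearly to 1e-4 over the first 100 steps, falls to half its peak at step 6155 (the midpoint of the decay phase), and reaches zero at the final step.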

Training results

Training Loss  Epoch  Step   Validation Loss  Rouge1   Rouge2  RougeL   RougeLsum  Gen Len
4.3576         1.0    1221   3.4988           12.4331  4.4297  12.2911  12.2479    18.4427
3.8587         2.0    2442   3.3121           14.8826  5.0815  14.7234  14.6877    19.6117
3.5957         3.0    3663   3.2360           15.0309  5.4261  14.9197  14.9144    20.4196
3.4707         4.0    4884   3.1737           16.2314  5.5413  16.008   16.0567    20.0129
3.363          5.0    6105   3.1553           16.3779  5.6995  16.2084  16.2036    20.9977
3.2796         6.0    7326   3.1189           17.0323  5.9413  16.9583  16.9213    20.2460
3.227          7.0    8547   3.1143           17.4254  6.0767  17.3036  17.3029    20.8706
3.1937         8.0    9768   3.1013           17.523   5.8829  17.398   17.3977    20.8485
3.1687         9.0    10989  3.0995           17.5672  5.8375  17.4382  17.4005    20.9710
3.177          10.0   12210  3.1020           17.6888  5.8978  17.5756  17.5283    20.9111
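The evaluation results at the top of this card match the epoch-9 row, which has the lowest validation loss. However, epoch 10 scores slightly higher on Rouge1, RougeL and RougeLsum, and epoch 7 has the highest Rouge2. The card does not say how the checkpoint was chosen. Selecting by validation loss (for example, the Trainer's `load_best_model_at_end` with its default loss-based metric) would be consistent with these numbers, but that is an assumption.

The sketch below uses rows copied from the table to show how the choice of selection metric changes which epoch counts as best. `EpochResult` and `RESULTS` are illustrative names.

```python
from typing import NamedTuple


class EpochResult(NamedTuple):
    epoch: int
    val_loss: float
    rouge1: float
    rouge2: float
    rougeL: float


# Rows copied from the training results table above.
RESULTS = [
    EpochResult(1, 3.4988, 12.4331, 4.4297, 12.2911),
    EpochResult(2, 3.3121, 14.8826, 5.0815, 14.7234),
    EpochResult(3, 3.2360, 15.0309, 5.4261, 14.9197),
    EpochResult(4, 3.1737, 16.2314, 5.5413, 16.008),
    EpochResult(5, 3.1553, 16.3779, 5.6995, 16.2084),
    EpochResult(6, 3.1189, 17.0323, 5.9413, 16.9583),
    EpochResult(7, 3.1143, 17.4254, 6.0767, 17.3036),
    EpochResult(8, 3.1013, 17.523, 5.8829, 17.398),
    EpochResult(9, 3.0995, 17.5672, 5.8375, 17.4382),
    EpochResult(10, 3.1020, 17.6888, 5.8978, 17.5756),
]

# Lower is better for loss; higher is better for Rouge.
best_loss = min(RESULTS, key=lambda r: r.val_loss)
best_rouge1 = max(RESULTS, key=lambda r: r.rouge1)
best_rouge2 = max(RESULTS, key=lambda r: r.rouge2)
best_rougeL = max(RESULTS, key=lambda r: r.rougeL)

print("lowest validation loss: epoch", best_loss.epoch)  # 9
print("highest Rouge1: epoch", best_rouge1.epoch)        # 10
print("highest Rouge2: epoch", best_rouge2.epoch)        # 7
print("highest RougeL: epoch", best_rougeL.epoch)        # 10
```

The differences among epochs 8 to 10 are small (under 0.2 Rouge1 points), so the choice of selection metric matters less than the trend, which flattens after about epoch 7.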

Framework versions

  • Transformers 4.18.0.dev0
  • PyTorch 2.0.0
  • Datasets 2.14.5
  • Tokenizers 0.12.1