---
license: apache-2.0
base_model: google/mt5-large
tags:
  - generated_from_trainer
metrics:
  - rouge
  - bleu
model-index:
  - name: mT5_TSATweets_cond_gen_5_instruction
    results: []
---

mT5_TSATweets_cond_gen_5_instruction

This model is a fine-tuned version of [google/mt5-large](https://huggingface.co/google/mt5-large) on an unknown dataset. It achieves the following results on the evaluation set (a sketch of how these metrics can be recomputed follows the list):

  • Loss: 0.0710
  • Rouge1: 0.709
  • Rouge2: 0.094
  • Rougel: 0.71
  • Rougelsum: 0.709
  • Bleu: 0.0
  • Precisions: [0.709, 0.0, 0.0, 0.0]
  • Brevity Penalty: 1.0
  • Length Ratio: 1.0
  • Translation Length: 1000
  • Reference Length: 1000
  • Meteor: 0.3545
  • Score: 29.1000
  • Num Edits: 291
  • Ref Length: 1000.0
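
The Rouge, Bleu, and Meteor fields above correspond to the ROUGE, BLEU, and METEOR metrics, and the Score, Num Edits, and Ref Length fields appear to be the output of the TER metric. A minimal recomputation sketch, assuming the Hugging Face `evaluate` library (the card does not name the metric implementation; TER additionally requires `sacrebleu`) and hypothetical prediction/reference lists:

```python
import evaluate

# Hypothetical decoded outputs and references; the actual eval set is not in this card.
predictions = ["positive", "negative"]
references = ["positive", "neutral"]

rouge = evaluate.load("rouge")    # -> rouge1, rouge2, rougeL, rougeLsum
meteor = evaluate.load("meteor")  # -> meteor
bleu = evaluate.load("bleu")      # -> bleu, precisions, brevity_penalty, ...
ter = evaluate.load("ter")        # -> score, num_edits, ref_length

results = {
    **rouge.compute(predictions=predictions, references=references),
    **meteor.compute(predictions=predictions, references=references),
    **bleu.compute(predictions=predictions, references=[[r] for r in references]),
    **ter.compute(predictions=predictions, references=[[r] for r in references]),
}
print(results)
```

Note that a BLEU of 0.0 alongside a nonzero unigram precision ([0.709, 0.0, 0.0, 0.0]) and a translation length equal to the reference length is consistent with single-word outputs: all higher-order n-gram precisions are zero, which drives the geometric mean to zero.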

Model description

More information needed

Intended uses & limitations

More information needed
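
While the card leaves usage unspecified, the base model and name suggest instruction-style sentiment analysis of tweets via conditional generation. A minimal inference sketch, assuming the repo id `Holmeister/mT5_TSATweets_cond_gen_5_instruction` and a hypothetical prompt (the instruction template used in training is not documented here):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "Holmeister/mT5_TSATweets_cond_gen_5_instruction"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Hypothetical instruction prompt; the actual training template is not documented.
prompt = "Determine the sentiment of the following tweet: I love this phone!"
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=8)  # short label-style outputs
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```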

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a `Seq2SeqTrainingArguments` sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 10
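
These values map directly onto Hugging Face `Seq2SeqTrainingArguments`; a minimal sketch under that assumption (the actual training script is not included in this card, and the Adam betas/epsilon listed above are the `Trainer` defaults):

```python
from transformers import Seq2SeqTrainingArguments

# Sketch only: mirrors the listed hyperparameters; output_dir is hypothetical.
args = Seq2SeqTrainingArguments(
    output_dir="mT5_TSATweets_cond_gen_5_instruction",
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=10,
    predict_with_generate=True,  # assumed, since the eval metrics are generation-based
)
```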

Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Bleu | Precisions | Brevity Penalty | Length Ratio | Translation Length | Reference Length | Meteor | Score | Num Edits | Ref Length |
|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:---------:|:----:|:----------:|:---------------:|:------------:|:------------------:|:----------------:|:------:|:-----:|:---------:|:----------:|
| No log | 0.5 | 82 | 0.1335 | 0.2707 | 0.0 | 0.2707 | 0.2707 | 0.0 | [0.2706896551724138, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.1353 | 72.9310 | 423 | 580.0 |
| 2.9168 | 1.0 | 164 | 0.0903 | 0.6172 | 0.0017 | 0.6190 | 0.6172 | 0.0 | [0.6172413793103448, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3086 | 38.2759 | 222 | 580.0 |
| 2.9168 | 1.5 | 246 | 0.0968 | 0.6310 | 0.0 | 0.6319 | 0.6310 | 0.0 | [0.6310344827586207, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3155 | 36.8966 | 214 | 580.0 |
| 0.1116 | 2.0 | 328 | 0.0769 | 0.6603 | 0.0328 | 0.6603 | 0.6586 | 0.0 | [0.6603448275862069, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3302 | 33.9655 | 197 | 580.0 |
| 0.1116 | 2.5 | 410 | 0.0762 | 0.6931 | 0.0707 | 0.6931 | 0.6914 | 0.0 | [0.6931034482758621, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3466 | 30.6897 | 178 | 580.0 |
| 0.0921 | 3.0 | 492 | 0.0709 | 0.6931 | 0.0276 | 0.6914 | 0.6931 | 0.0 | [0.6931034482758621, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3466 | 30.6897 | 178 | 580.0 |
| 0.0921 | 3.5 | 574 | 0.0897 | 0.6897 | 0.0379 | 0.6897 | 0.6897 | 0.0 | [0.6896551724137931, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3448 | 31.0345 | 180 | 580.0 |
| 0.079 | 4.0 | 656 | 0.0679 | 0.6948 | 0.0707 | 0.6948 | 0.6948 | 0.0 | [0.6948275862068966, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3474 | 30.5172 | 177 | 580.0 |
| 0.079 | 4.5 | 738 | 0.0771 | 0.7103 | 0.0345 | 0.7103 | 0.7086 | 0.0 | [0.7103448275862069, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3552 | 28.9655 | 168 | 580.0 |
| 0.0712 | 5.0 | 820 | 0.0675 | 0.7069 | 0.0517 | 0.7069 | 0.7052 | 0.0 | [0.7051724137931035, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3526 | 29.4828 | 171 | 580.0 |
| 0.0712 | 5.5 | 902 | 0.0657 | 0.7138 | 0.0603 | 0.7138 | 0.7138 | 0.0 | [0.7137931034482758, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3569 | 28.6207 | 166 | 580.0 |
| 0.065 | 6.0 | 984 | 0.0670 | 0.7069 | 0.0621 | 0.7069 | 0.7069 | 0.0 | [0.7068965517241379, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3534 | 29.3103 | 170 | 580.0 |
| 0.065 | 6.5 | 1066 | 0.0658 | 0.7103 | 0.0672 | 0.7103 | 0.7103 | 0.0 | [0.7103448275862069, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3552 | 28.9655 | 168 | 580.0 |
| 0.0596 | 7.0 | 1148 | 0.0741 | 0.7155 | 0.0586 | 0.7155 | 0.7155 | 0.0 | [0.7155172413793104, 0.0, 0.0, 0.0] | 1.0 | 1.0 | 580 | 580 | 0.3578 | 28.4483 | 165 | 580.0 |

Framework versions

  • Transformers 4.41.0
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1