license: apache-2.0
base_model: google/mt5-large
tags:
  - generated_from_trainer
metrics:
  - bleu
model-index:
  - name: cs_mT5-large2_2e-5_50_v0.3
    results: []

cs_mT5-large2_2e-5_50_v0.3

This model is a fine-tuned version of google/mt5-large; the training dataset was not recorded. It achieves the following results on the evaluation set at the end of training (epoch 50, step 300):

  • Loss: 10.7179
  • Bleu: 8.2299
  • Gen Len: 19.0

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 50
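
To illustrate what the linear scheduler implies for these settings, the sketch below computes the learning rate at a given optimizer step. It assumes no warmup (none is listed above) and takes the 6 steps per epoch from the training results table; `linear_lr` is a hypothetical helper written for this illustration, not a Transformers function.

```python
# Hyperparameters as listed in this card.
HPARAMS = {
    "learning_rate": 2e-5,
    "train_batch_size": 16,
    "eval_batch_size": 16,
    "seed": 42,
    "lr_scheduler_type": "linear",
    "num_epochs": 50,
}

# Assumption: 6 optimizer steps per epoch, as reported in the training results.
STEPS_PER_EPOCH = 6
TOTAL_STEPS = STEPS_PER_EPOCH * HPARAMS["num_epochs"]  # 300


def linear_lr(step, base_lr=HPARAMS["learning_rate"],
              total_steps=TOTAL_STEPS, warmup_steps=0):
    """Linear warmup (if any), then linear decay to zero. Hypothetical helper."""
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # Print the learning rate at the start, midpoint, and end of training.
    for epoch in (0, 25, 50):
        step = epoch * STEPS_PER_EPOCH
        print(f"epoch {epoch:2d}  step {step:3d}  lr {linear_lr(step):.2e}")
```

Under these assumptions the rate starts at 2e-05, is halved by epoch 25, and reaches zero at step 300.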

Training results

| Training Loss | Epoch | Step | Validation Loss | Bleu | Gen Len |
|---|---|---|---|---|---|
| 22.4209 | 1.0 | 6 | 14.7377 | 5.6697 | 19.0 |
| 20.671 | 2.0 | 12 | 14.9637 | 6.7619 | 19.0 |
| 17.7208 | 3.0 | 18 | 14.6564 | 5.3777 | 19.0 |
| 22.9549 | 4.0 | 24 | 15.1568 | 6.7736 | 19.0 |
| 16.6185 | 5.0 | 30 | 14.1533 | 7.0263 | 19.0 |
| 22.1158 | 6.0 | 36 | 15.0667 | 7.1851 | 19.0 |
| 24.587 | 7.0 | 42 | 15.5166 | 7.6752 | 19.0 |
| 16.4955 | 8.0 | 48 | 14.5515 | 7.5521 | 19.0 |
| 21.0521 | 9.0 | 54 | 13.0890 | 7.7939 | 19.0 |
| 16.1149 | 10.0 | 60 | 11.8305 | 7.7866 | 19.0 |
| 12.8454 | 11.0 | 66 | 11.8727 | 7.7197 | 19.0 |
| 18.482 | 12.0 | 72 | 11.6011 | 7.5761 | 19.0 |
| 18.6175 | 13.0 | 78 | 11.8911 | 7.7925 | 19.0 |
| 12.6805 | 14.0 | 84 | 11.8462 | 7.3764 | 19.0 |
| 14.3151 | 15.0 | 90 | 11.4554 | 7.6604 | 19.0 |
| 17.2287 | 16.0 | 96 | 11.1727 | 8.0204 | 19.0 |
| 16.3546 | 17.0 | 102 | 10.7514 | 8.0859 | 19.0 |
| 16.3339 | 18.0 | 108 | 11.1960 | 8.1381 | 19.0 |
| 16.6065 | 19.0 | 114 | 11.3321 | 8.126 | 19.0 |
| 14.3851 | 20.0 | 120 | 10.9074 | 6.3032 | 19.0 |
| 15.8189 | 21.0 | 126 | 10.5179 | 6.3626 | 19.0 |
| 8.4543 | 22.0 | 132 | 10.6037 | 6.3223 | 19.0 |
| 18.0304 | 23.0 | 138 | 10.3665 | 6.236 | 19.0 |
| 13.1475 | 24.0 | 144 | 10.3107 | 7.4434 | 19.0 |
| 21.3407 | 25.0 | 150 | 10.2976 | 7.4596 | 19.0 |
| 15.8901 | 26.0 | 156 | 10.4723 | 7.2047 | 19.0 |
| 13.3029 | 27.0 | 162 | 10.7863 | 7.2047 | 19.0 |
| 9.6205 | 28.0 | 168 | 11.2429 | 7.2047 | 19.0 |
| 15.4244 | 29.0 | 174 | 11.5663 | 7.1797 | 19.0 |
| 10.8496 | 30.0 | 180 | 11.9665 | 7.1839 | 19.0 |
| 16.4213 | 31.0 | 186 | 12.3102 | 7.1002 | 19.0 |
| 19.9358 | 32.0 | 192 | 12.3951 | 7.1693 | 19.0 |
| 13.9974 | 33.0 | 198 | 12.6037 | 7.1693 | 19.0 |
| 18.1208 | 34.0 | 204 | 12.4725 | 7.0996 | 19.0 |
| 10.2059 | 35.0 | 210 | 12.1561 | 7.286 | 19.0 |
| 15.9016 | 36.0 | 216 | 11.9896 | 7.286 | 19.0 |
| 16.7008 | 37.0 | 222 | 11.4571 | 8.4159 | 19.0 |
| 14.4533 | 38.0 | 228 | 11.1535 | 8.4159 | 19.0 |
| 15.1107 | 39.0 | 234 | 11.1553 | 8.4159 | 19.0 |
| 13.2587 | 40.0 | 240 | 11.0539 | 7.2709 | 19.0 |
| 14.9836 | 41.0 | 246 | 11.3945 | 7.1574 | 19.0 |
| 13.083 | 42.0 | 252 | 11.3690 | 7.1948 | 19.0 |
| 24.9864 | 43.0 | 258 | 11.2586 | 8.2299 | 19.0 |
| 22.1657 | 44.0 | 264 | 11.1126 | 8.2299 | 19.0 |
| 15.6887 | 45.0 | 270 | 11.0112 | 8.2299 | 19.0 |
| 8.581 | 46.0 | 276 | 10.8892 | 8.2299 | 19.0 |
| 14.0141 | 47.0 | 282 | 10.8514 | 8.2299 | 19.0 |
| 11.8402 | 48.0 | 288 | 10.8129 | 8.2299 | 19.0 |
| 14.7845 | 49.0 | 294 | 10.7252 | 8.2299 | 19.0 |
| 18.8443 | 50.0 | 300 | 10.7179 | 8.2299 | 19.0 |
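
The final-epoch row is not the best row by either metric: validation loss is lowest at epoch 25 (10.2976), and BLEU peaks at epochs 37 to 39 (8.4159). The sketch below shows one way to pick a checkpoint from a log like this. The rows are a short excerpt of the table above, and the helper names are hypothetical.

```python
# Excerpt of the training log: (epoch, step, validation_loss, bleu).
ROWS = [
    (1, 6, 14.7377, 5.6697),
    (17, 102, 10.7514, 8.0859),
    (25, 150, 10.2976, 7.4596),
    (37, 222, 11.4571, 8.4159),
    (50, 300, 10.7179, 8.2299),
]


def best_by_loss(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda r: r[2])


def best_by_bleu(rows):
    """Return the first row with the highest BLEU score."""
    return max(rows, key=lambda r: r[3])


if __name__ == "__main__":
    print("lowest validation loss:", best_by_loss(ROWS))
    print("highest BLEU:", best_by_bleu(ROWS))
```

Which criterion to prefer depends on the downstream use; the two disagree here, so it may be worth evaluating both candidate checkpoints.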

Framework versions

  • Transformers 4.38.2
  • Pytorch 2.1.0+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2
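
When reproducing the run, it can help to compare the local environment against the versions listed above. The following is a minimal sketch using only the standard library. It assumes that "Pytorch" corresponds to the `torch` distribution on PyPI; `compare_versions` is a hypothetical helper.

```python
from importlib import metadata

# Versions listed in this card, keyed by distribution name.
# Assumption: "Pytorch" is installed as the "torch" distribution.
EXPECTED = {
    "transformers": "4.38.2",
    "torch": "2.1.0+cu121",
    "datasets": "2.18.0",
    "tokenizers": "0.15.2",
}


def compare_versions(expected=EXPECTED):
    """Map each package to (expected, installed or None, exact match?)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed locally
        report[name] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in compare_versions().items():
        status = "ok" if ok else "differs"
        print(f"{name}: expected {wanted}, found {installed} ({status})")
```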