
cs_mT5-large2_2e-5_100_v0.4

This model is a fine-tuned version of google/mt5-large on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.1297
  • Bleu: 12.9589
  • Gen Len: 15.7619
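
Since the card does not document the task or dataset, the sketch below only shows how a checkpoint like this would typically be loaded and run for generation; the repository id, input text, and generation length are placeholder assumptions, not taken from this card:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Placeholder repo id; substitute the actual path of this checkpoint.
model_id = "cs_mT5-large2_2e-5_100_v0.4"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Placeholder input; the training task is not documented in this card.
inputs = tokenizer("example input text", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```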

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 100
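
A minimal sketch of how these settings map onto `Seq2SeqTrainingArguments` in Transformers 4.38 (dataset loading, preprocessing, and the metric function are omitted because the training data is not documented; `output_dir` is a placeholder):

```python
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="cs_mT5-large2_2e-5_100_v0.4",  # placeholder
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=100,
    evaluation_strategy="epoch",  # the results table below logs metrics once per epoch
    predict_with_generate=True,   # needed to report Bleu and Gen Len at eval time
)
```

Passing these arguments to a `Seq2SeqTrainer` together with the model, tokenizer, and a BLEU-computing `compute_metrics` function would reproduce the setup described above, assuming the same (undocumented) dataset.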

Training results

| Training Loss | Epoch | Step | Validation Loss | Bleu | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:----:|:-------:|
| 18.9475 | 1.0 | 6 | 8.4357 | 8.3431 | 19.0 |
| 11.3295 | 2.0 | 12 | 7.0692 | 8.4786 | 19.0 |
| 11.2185 | 3.0 | 18 | 6.4881 | 7.8425 | 19.0 |
| 9.7688 | 4.0 | 24 | 6.2043 | 7.4958 | 19.0 |
| 7.633 | 5.0 | 30 | 6.1694 | 7.4994 | 19.0 |
| 10.8618 | 6.0 | 36 | 6.0789 | 7.2123 | 19.0 |
| 9.4099 | 7.0 | 42 | 6.0121 | 7.4767 | 19.0 |
| 7.1028 | 8.0 | 48 | 5.9718 | 7.4839 | 19.0 |
| 10.2013 | 9.0 | 54 | 5.9141 | 8.246 | 19.0 |
| 11.8248 | 10.0 | 60 | 5.8562 | 8.4493 | 19.0 |
| 6.4776 | 11.0 | 66 | 5.7982 | 8.3904 | 19.0 |
| 6.813 | 12.0 | 72 | 5.7229 | 8.6768 | 19.0 |
| 10.7703 | 13.0 | 78 | 5.6543 | 8.6985 | 19.0 |
| 7.1642 | 14.0 | 84 | 5.6137 | 8.8415 | 19.0 |
| 8.1195 | 15.0 | 90 | 5.5589 | 8.9574 | 19.0 |
| 10.4234 | 16.0 | 96 | 5.4890 | 8.8876 | 19.0 |
| 6.9893 | 17.0 | 102 | 5.4146 | 10.5467 | 19.0 |
| 7.3889 | 18.0 | 108 | 5.3484 | 10.5658 | 19.0 |
| 6.042 | 19.0 | 114 | 5.3008 | 10.4592 | 19.0 |
| 8.1065 | 20.0 | 120 | 5.2775 | 9.958 | 19.0 |
| 5.2708 | 21.0 | 126 | 5.2247 | 9.4528 | 19.0 |
| 4.9285 | 22.0 | 132 | 5.1740 | 7.2801 | 19.0 |
| 4.8751 | 23.0 | 138 | 5.1216 | 7.2902 | 19.0 |
| 5.6123 | 24.0 | 144 | 5.0377 | 9.9147 | 19.0 |
| 4.6797 | 25.0 | 150 | 4.9345 | 10.7926 | 19.0 |
| 4.3882 | 26.0 | 156 | 4.8420 | 10.7388 | 19.0 |
| 5.2828 | 27.0 | 162 | 4.7564 | 10.6526 | 19.0 |
| 4.2994 | 28.0 | 168 | 4.6960 | 9.7076 | 19.0 |
| 3.964 | 29.0 | 174 | 4.5760 | 9.6231 | 19.0 |
| 9.4351 | 30.0 | 180 | 4.5173 | 8.9601 | 19.0 |
| 5.1911 | 31.0 | 186 | 4.4738 | 9.8122 | 19.0 |
| 3.4724 | 32.0 | 192 | 4.4102 | 10.0874 | 19.0 |
| 4.39 | 33.0 | 198 | 4.3444 | 10.9115 | 19.0 |
| 6.3546 | 34.0 | 204 | 4.2755 | 11.0007 | 18.7619 |
| 5.4393 | 35.0 | 210 | 4.2020 | 11.127 | 18.7619 |
| 2.8915 | 36.0 | 216 | 4.1248 | 11.0087 | 18.7619 |
| 6.6742 | 37.0 | 222 | 4.0757 | 11.4869 | 18.9048 |
| 3.5017 | 38.0 | 228 | 4.0376 | 11.4063 | 18.9048 |
| 4.1386 | 39.0 | 234 | 3.9890 | 10.7683 | 18.8095 |
| 4.9058 | 40.0 | 240 | 3.9372 | 10.7683 | 18.8095 |
| 4.2836 | 41.0 | 246 | 3.8872 | 10.8023 | 18.8095 |
| 3.7174 | 42.0 | 252 | 3.8378 | 10.9883 | 18.4286 |
| 3.0365 | 43.0 | 258 | 3.7907 | 11.0504 | 18.381 |
| 3.3476 | 44.0 | 264 | 3.7541 | 11.0634 | 17.9524 |
| 3.9578 | 45.0 | 270 | 3.7206 | 11.4798 | 17.4762 |
| 4.3193 | 46.0 | 276 | 3.6877 | 11.5753 | 17.3333 |
| 3.6244 | 47.0 | 282 | 3.6538 | 11.8793 | 16.5714 |
| 2.9136 | 48.0 | 288 | 3.6136 | 12.0169 | 15.9524 |
| 2.2932 | 49.0 | 294 | 3.5735 | 11.126 | 16.3333 |
| 3.4335 | 50.0 | 300 | 3.5422 | 11.3689 | 16.3333 |
| 2.9941 | 51.0 | 306 | 3.5072 | 11.0037 | 16.3333 |
| 4.7679 | 52.0 | 312 | 3.4671 | 10.8819 | 16.3333 |
| 2.7498 | 53.0 | 318 | 3.4404 | 10.8882 | 16.619 |
| 3.5759 | 54.0 | 324 | 3.4195 | 10.8882 | 16.619 |
| 2.9012 | 55.0 | 330 | 3.4040 | 10.5862 | 17.3333 |
| 3.2823 | 56.0 | 336 | 3.3924 | 10.57 | 17.0476 |
| 4.5793 | 57.0 | 342 | 3.3806 | 11.605 | 17.0 |
| 4.271 | 58.0 | 348 | 3.3672 | 11.7166 | 16.8095 |
| 3.5227 | 59.0 | 354 | 3.3569 | 11.8219 | 16.5238 |
| 2.9193 | 60.0 | 360 | 3.3481 | 11.8931 | 16.381 |
| 3.5956 | 61.0 | 366 | 3.3385 | 11.1876 | 16.5238 |
| 3.5521 | 62.0 | 372 | 3.3293 | 11.0871 | 16.7143 |
| 2.6291 | 63.0 | 378 | 3.3136 | 11.2216 | 16.5714 |
| 2.0321 | 64.0 | 384 | 3.2935 | 11.29 | 16.381 |
| 2.5651 | 65.0 | 390 | 3.2846 | 11.3853 | 16.5714 |
| 2.9702 | 66.0 | 396 | 3.2866 | 11.3853 | 16.5714 |
| 2.2628 | 67.0 | 402 | 3.2755 | 11.294 | 16.3333 |
| 2.5516 | 68.0 | 408 | 3.2619 | 11.699 | 16.1429 |
| 3.3097 | 69.0 | 414 | 3.2485 | 11.682 | 16.1429 |
| 1.8752 | 70.0 | 420 | 3.2383 | 11.8141 | 15.9048 |
| 2.3432 | 71.0 | 426 | 3.2299 | 11.8141 | 15.9048 |
| 2.2128 | 72.0 | 432 | 3.2202 | 12.0422 | 15.381 |
| 2.7711 | 73.0 | 438 | 3.2107 | 11.9983 | 15.5238 |
| 3.7951 | 74.0 | 444 | 3.2039 | 12.2396 | 15.6667 |
| 2.7207 | 75.0 | 450 | 3.1969 | 12.9329 | 15.619 |
| 2.071 | 76.0 | 456 | 3.1905 | 12.4005 | 15.619 |
| 1.9696 | 77.0 | 462 | 3.1861 | 12.5352 | 15.2857 |
| 1.2979 | 78.0 | 468 | 3.1816 | 12.5352 | 15.2857 |
| 2.6149 | 79.0 | 474 | 3.1777 | 12.5352 | 15.2857 |
| 1.7925 | 80.0 | 480 | 3.1720 | 12.5352 | 15.2857 |
| 2.3365 | 81.0 | 486 | 3.1683 | 12.4005 | 15.619 |
| 3.0536 | 82.0 | 492 | 3.1653 | 12.4005 | 15.619 |
| 2.6278 | 83.0 | 498 | 3.1617 | 12.4005 | 15.619 |
| 3.2318 | 84.0 | 504 | 3.1583 | 12.4005 | 15.619 |
| 2.9789 | 85.0 | 510 | 3.1569 | 12.4005 | 15.619 |
| 2.3504 | 86.0 | 516 | 3.1537 | 12.4005 | 15.619 |
| 1.603 | 87.0 | 522 | 3.1508 | 12.4005 | 15.619 |
| 3.2194 | 88.0 | 528 | 3.1486 | 12.2448 | 16.0952 |
| 2.6168 | 89.0 | 534 | 3.1459 | 12.2448 | 16.0952 |
| 2.3382 | 90.0 | 540 | 3.1429 | 12.2448 | 16.0952 |
| 3.6469 | 91.0 | 546 | 3.1397 | 12.8612 | 15.9048 |
| 2.2697 | 92.0 | 552 | 3.1371 | 12.8612 | 15.9048 |
| 1.8352 | 93.0 | 558 | 3.1356 | 12.8612 | 15.9048 |
| 1.3854 | 94.0 | 564 | 3.1344 | 12.8612 | 15.9048 |
| 2.6405 | 95.0 | 570 | 3.1336 | 12.8612 | 15.9048 |
| 2.0361 | 96.0 | 576 | 3.1321 | 12.8612 | 15.9048 |
| 3.4828 | 97.0 | 582 | 3.1311 | 12.9589 | 15.7619 |
| 2.6929 | 98.0 | 588 | 3.1304 | 12.9589 | 15.7619 |
| 2.2882 | 99.0 | 594 | 3.1299 | 12.9589 | 15.7619 |
| 2.4893 | 100.0 | 600 | 3.1297 | 12.9589 | 15.7619 |

Framework versions

  • Transformers 4.38.2
  • Pytorch 2.1.0+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2
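
To compare a local environment against these versions, a quick check (convenience only; newer versions will usually also load the model):

```python
import datasets
import tokenizers
import torch
import transformers

# Print installed versions to compare against the list above.
for mod in (transformers, torch, datasets, tokenizers):
    print(f"{mod.__name__}: {mod.__version__}")
```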