cs_mT5_0.01_100_v0.2

This model is a fine-tuned version of google/mt5-base on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 7.1529
  • Bleu: 1.1802
  • Gen Len: 19.0

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.01
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 100
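
With `lr_scheduler_type: linear`, the learning rate decays in a straight line from its initial value to zero over the run. The following is a minimal plain-Python sketch of that decay, not the trainer's actual code. It assumes zero warmup steps (the card lists none) and 600 total optimizer steps (100 epochs at 6 steps per epoch, as in the Step column of the training results). `linear_lr` is a hypothetical helper, not a library function.

```python
# Hyperparameters as listed in the card.
HPARAMS = {
    "learning_rate": 0.01,
    "train_batch_size": 16,
    "eval_batch_size": 16,
    "seed": 42,
    "lr_scheduler_type": "linear",
    "num_epochs": 100,
}

STEPS_PER_EPOCH = 6  # read off the Step column of the training results
TOTAL_STEPS = HPARAMS["num_epochs"] * STEPS_PER_EPOCH  # 600


def linear_lr(step: int,
              base_lr: float = HPARAMS["learning_rate"],
              total_steps: int = TOTAL_STEPS,
              warmup_steps: int = 0) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to zero.

    Assumption: warmup_steps defaults to 0 because the card lists no warmup.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # Print the rate at the start, midpoint, and end of training.
    for step in (0, 300, 600):
        print(step, linear_lr(step))
```

Under these assumptions the rate is half its initial value at step 300 and reaches zero at step 600.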

Training results

| Training Loss | Epoch | Step | Validation Loss | Bleu | Gen Len |
|---|---|---|---|---|---|
| 7.8225 | 1.0 | 6 | 7.8823 | 0.1442 | 19.0 |
| 6.4593 | 2.0 | 12 | 6.2595 | 0.0 | 19.0 |
| 5.7362 | 3.0 | 18 | 5.8728 | 0.5829 | 19.0 |
| 5.1022 | 4.0 | 24 | 6.0663 | 0.5829 | 19.0 |
| 5.1499 | 5.0 | 30 | 6.1787 | 0.5829 | 19.0 |
| 4.4478 | 6.0 | 36 | 6.1807 | 0.2295 | 19.0 |
| 4.6633 | 7.0 | 42 | 5.8996 | 0.5829 | 19.0 |
| 4.6893 | 8.0 | 48 | 6.0757 | 0.0 | 19.0 |
| 5.3314 | 9.0 | 54 | 5.8488 | 0.5829 | 19.0 |
| 4.9772 | 10.0 | 60 | 5.8862 | 0.5829 | 19.0 |
| 4.6109 | 11.0 | 66 | 6.0246 | 0.6041 | 19.0 |
| 4.133 | 12.0 | 72 | 5.9013 | 0.5829 | 19.0 |
| 4.9029 | 13.0 | 78 | 6.0611 | 0.6006 | 19.0 |
| 3.9306 | 14.0 | 84 | 5.8331 | 0.5938 | 19.0 |
| 4.2939 | 15.0 | 90 | 6.0608 | 0.202 | 19.0 |
| 3.7392 | 16.0 | 96 | 5.9132 | 0.4958 | 19.0 |
| 4.0965 | 17.0 | 102 | 6.0289 | 0.6011 | 19.0 |
| 4.8056 | 18.0 | 108 | 5.8952 | 0.6233 | 19.0 |
| 4.9226 | 19.0 | 114 | 6.1260 | 0.2865 | 19.0 |
| 3.463 | 20.0 | 120 | 6.0577 | 0.5829 | 19.0 |
| 3.6935 | 21.0 | 126 | 5.9865 | 0.6482 | 19.0 |
| 4.8423 | 22.0 | 132 | 6.1672 | 0.5938 | 19.0 |
| 4.1419 | 23.0 | 138 | 5.9532 | 0.0 | 19.0 |
| 4.114 | 24.0 | 144 | 5.9337 | 0.1363 | 19.0 |
| 3.687 | 25.0 | 150 | 5.9786 | 0.0 | 19.0 |
| 4.4531 | 26.0 | 156 | 6.2074 | 0.1645 | 8.0 |
| 3.7463 | 27.0 | 162 | 6.1692 | 0.0 | 19.0 |
| 4.1026 | 28.0 | 168 | 6.0478 | 0.0 | 19.0 |
| 3.8369 | 29.0 | 174 | 6.0615 | 0.0 | 19.0 |
| 3.7155 | 30.0 | 180 | 6.1976 | 0.6323 | 19.0 |
| 3.8799 | 31.0 | 186 | 6.2384 | 0.0 | 19.0 |
| 4.2195 | 32.0 | 192 | 6.1328 | 0.5829 | 19.0 |
| 5.1049 | 33.0 | 198 | 5.9780 | 0.5829 | 19.0 |
| 4.1496 | 34.0 | 204 | 6.0294 | 0.6233 | 19.0 |
| 3.8001 | 35.0 | 210 | 6.1042 | 0.1346 | 19.0 |
| 3.4327 | 36.0 | 216 | 5.8325 | 0.1023 | 19.0 |
| 4.1074 | 37.0 | 222 | 6.1190 | 0.611 | 19.0 |
| 3.84 | 38.0 | 228 | 6.4321 | 0.2321 | 19.0 |
| 3.7483 | 39.0 | 234 | 6.2523 | 0.2795 | 19.0 |
| 3.9157 | 40.0 | 240 | 6.2355 | 0.4213 | 19.0 |
| 3.3449 | 41.0 | 246 | 6.1757 | 0.611 | 19.0 |
| 3.5886 | 42.0 | 252 | 6.0657 | 0.5938 | 19.0 |
| 3.5048 | 43.0 | 258 | 6.0277 | 0.5829 | 19.0 |
| 3.7519 | 44.0 | 264 | 6.5569 | 0.681 | 19.0 |
| 3.7334 | 45.0 | 270 | 6.0739 | 0.5938 | 19.0 |
| 3.8206 | 46.0 | 276 | 6.0092 | 0.6401 | 19.0 |
| 3.5061 | 47.0 | 282 | 6.0719 | 0.6488 | 19.0 |
| 3.4392 | 48.0 | 288 | 6.0652 | 0.59 | 19.0 |
| 3.6158 | 49.0 | 294 | 6.0207 | 0.5829 | 19.0 |
| 3.1909 | 50.0 | 300 | 6.2023 | 0.1442 | 19.0 |
| 3.2138 | 51.0 | 306 | 6.1003 | 0.6233 | 19.0 |
| 4.0992 | 52.0 | 312 | 6.2286 | 0.5896 | 19.0 |
| 3.4983 | 53.0 | 318 | 6.2911 | 0.6006 | 19.0 |
| 3.0111 | 54.0 | 324 | 6.4124 | 0.3004 | 11.0 |
| 3.251 | 55.0 | 330 | 5.9168 | 0.7833 | 14.0 |
| 3.2281 | 56.0 | 336 | 6.0207 | 0.3379 | 19.0 |
| 3.6692 | 57.0 | 342 | 6.1399 | 0.6395 | 19.0 |
| 2.8706 | 58.0 | 348 | 6.4675 | 0.2317 | 19.0 |
| 3.7137 | 59.0 | 354 | 6.1596 | 0.0 | 19.0 |
| 3.6537 | 60.0 | 360 | 6.2131 | 0.5938 | 19.0 |
| 3.2023 | 61.0 | 366 | 6.2877 | 0.6323 | 19.0 |
| 2.3914 | 62.0 | 372 | 6.5001 | 0.6323 | 19.0 |
| 2.8612 | 63.0 | 378 | 6.5683 | 0.7084 | 19.0 |
| 3.1646 | 64.0 | 384 | 6.7003 | 0.2039 | 9.0 |
| 3.1234 | 65.0 | 390 | 6.1225 | 0.4851 | 11.0 |
| 3.0967 | 66.0 | 396 | 6.2517 | 0.5896 | 19.0 |
| 2.5832 | 67.0 | 402 | 6.3071 | 0.5896 | 19.0 |
| 3.2709 | 68.0 | 408 | 6.5033 | 0.6482 | 19.0 |
| 3.2511 | 69.0 | 414 | 6.4329 | 0.6395 | 19.0 |
| 2.7053 | 70.0 | 420 | 6.5449 | 0.6323 | 19.0 |
| 3.4684 | 71.0 | 426 | 6.9512 | 0.2914 | 19.0 |
| 2.7875 | 72.0 | 432 | 6.6579 | 0.6006 | 19.0 |
| 2.5674 | 73.0 | 438 | 6.4629 | 0.6395 | 19.0 |
| 2.3457 | 74.0 | 444 | 6.6680 | 0.7084 | 19.0 |
| 2.339 | 75.0 | 450 | 6.7313 | 0.7333 | 19.0 |
| 3.4058 | 76.0 | 456 | 6.7786 | 0.3105 | 16.0 |
| 2.5678 | 77.0 | 462 | 6.6553 | 0.5896 | 19.0 |
| 2.9506 | 78.0 | 468 | 6.9532 | 0.3379 | 19.0 |
| 2.2285 | 79.0 | 474 | 7.1575 | 0.191 | 6.0 |
| 2.5635 | 80.0 | 480 | 7.1580 | 0.2837 | 15.0 |
| 1.8763 | 81.0 | 486 | 7.0203 | 0.5896 | 19.0 |
| 3.3663 | 82.0 | 492 | 6.7150 | 0.5896 | 19.0 |
| 2.1434 | 83.0 | 498 | 6.5911 | 0.5896 | 19.0 |
| 2.6678 | 84.0 | 504 | 6.7084 | 0.5829 | 19.0 |
| 3.7082 | 85.0 | 510 | 6.7475 | 0.4447 | 13.0 |
| 3.3436 | 86.0 | 516 | 6.6436 | 0.2223 | 19.0 |
| 2.3866 | 87.0 | 522 | 6.6915 | 0.5896 | 19.0 |
| 2.0647 | 88.0 | 528 | 7.0000 | 0.5896 | 19.0 |
| 2.7861 | 89.0 | 534 | 7.1116 | 0.1346 | 19.0 |
| 2.5331 | 90.0 | 540 | 7.0207 | 0.0 | 19.0 |
| 2.3609 | 91.0 | 546 | 7.0159 | 0.2837 | 12.0 |
| 2.5884 | 92.0 | 552 | 6.9928 | 0.1628 | 19.0 |
| 2.2198 | 93.0 | 558 | 7.0179 | 0.6885 | 19.0 |
| 2.4258 | 94.0 | 564 | 7.0429 | 0.5938 | 19.0 |
| 1.9681 | 95.0 | 570 | 7.0348 | 1.3808 | 19.0 |
| 2.2643 | 96.0 | 576 | 7.0513 | 1.1802 | 19.0 |
| 2.1551 | 97.0 | 582 | 7.0741 | 1.1558 | 19.0 |
| 2.1624 | 98.0 | 588 | 7.1079 | 1.1558 | 19.0 |
| 2.6342 | 99.0 | 594 | 7.1447 | 1.1558 | 19.0 |
| 1.1566 | 100.0 | 600 | 7.1529 | 1.1802 | 19.0 |
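
In the log above, training loss keeps falling while validation loss rises in the later epochs, so the final checkpoint is not necessarily the best one. The sketch below shows one way to compare checkpoints. It uses only a handful of rows copied from the table, and the `EvalRow` type is a hypothetical container introduced for illustration; it is not part of the training code.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    epoch: int
    train_loss: float
    val_loss: float
    bleu: float


# A few rows copied from the results table (not the full log).
ROWS = [
    EvalRow(1, 7.8225, 7.8823, 0.1442),
    EvalRow(14, 3.9306, 5.8331, 0.5938),
    EvalRow(36, 3.4327, 5.8325, 0.1023),
    EvalRow(55, 3.251, 5.9168, 0.7833),
    EvalRow(95, 1.9681, 7.0348, 1.3808),
    EvalRow(100, 1.1566, 7.1529, 1.1802),
]

# Checkpoint with the lowest validation loss among the sampled rows.
best_by_loss = min(ROWS, key=lambda r: r.val_loss)

# Checkpoint with the highest BLEU among the sampled rows.
best_by_bleu = max(ROWS, key=lambda r: r.bleu)

# Gap between validation and training loss at the final epoch;
# a large positive gap is a common sign of overfitting.
final = ROWS[-1]
final_gap = final.val_loss - final.train_loss

if __name__ == "__main__":
    print("lowest validation loss:", best_by_loss)
    print("highest BLEU:", best_by_bleu)
    print("final val/train gap:", round(final_gap, 4))
```

The two criteria disagree here: the lowest validation loss occurs at epoch 36, while the highest BLEU occurs at epoch 95, so which checkpoint counts as "best" depends on the metric chosen.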

Framework versions

  • Transformers 4.35.2
  • PyTorch 1.13.1+cu117
  • Datasets 2.17.0
  • Tokenizers 0.15.2
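
To reproduce the environment, it can help to compare the installed packages against the versions listed above. The following is a minimal sketch using only the standard library. The distribution names in `CARD_VERSIONS` are assumptions (for example, PyTorch is distributed as `torch`), and `compare_versions` is a hypothetical helper.

```python
from importlib import metadata

# Versions listed in the card, keyed by assumed distribution name.
CARD_VERSIONS = {
    "transformers": "4.35.2",
    "torch": "1.13.1+cu117",
    "datasets": "2.17.0",
    "tokenizers": "0.15.2",
}


def compare_versions(expected: dict) -> dict:
    """Map each package to (wanted, installed, matches).

    `installed` is None when the package is not installed.
    """
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in compare_versions(CARD_VERSIONS).items():
        status = "ok" if ok else "differs"
        print(f"{name}: card={wanted} installed={installed} ({status})")
```

The comparison is an exact string match, so a build suffix such as `+cu117` counts as a difference when the installed wheel reports a different one.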

Model size: 582M params (Safetensors, tensor type F32)