
mt5-small-finetuned-mt5

This model is a fine-tuned version of google/mt5-small on an unknown dataset. It achieves the following results on the evaluation set (a minimal usage sketch follows the metrics):

  • Loss: 0.6691
  • Rouge1: 0.5388
  • Rouge2: 0.3838
  • RougeL: 0.5283
  • RougeLsum: 0.5270
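
The card does not state the downstream task, but the ROUGE metrics suggest a text-to-text task such as summarization. Below is a minimal inference sketch; the Hub repository id and the input text are placeholders, not documented by this card:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Placeholder Hub id -- substitute the actual path of this checkpoint.
model_id = "your-username/mt5-small-finetuned-mt5"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Placeholder input; the training data and task are not documented.
text = "Text to transform goes here."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Beam search with a modest output budget; tune both for your task.
output_ids = model.generate(**inputs, num_beams=4, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```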

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a matching configuration sketch follows the list):

  • learning_rate: 5.6e-05
  • train_batch_size: 20
  • eval_batch_size: 20
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 100
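
For orientation, the listed settings map onto `Seq2SeqTrainingArguments` roughly as follows. This is a reconstruction from the list above, not the original training script; `output_dir`, `evaluation_strategy`, and `predict_with_generate` are assumptions (the per-epoch metrics table below suggests epoch-level evaluation with generation-based ROUGE):

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small-finetuned-mt5",  # assumption: placeholder directory
    learning_rate=5.6e-5,
    per_device_train_batch_size=20,
    per_device_eval_batch_size=20,
    seed=42,
    # The card lists Adam with betas=(0.9, 0.999) and epsilon=1e-08; these
    # match the Trainer's default AdamW hyperparameters.
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=100,
    evaluation_strategy="epoch",  # assumption: table reports metrics per epoch
    predict_with_generate=True,   # assumption: required for ROUGE during eval
)
```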

Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLsum |
|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:---------:|
| 12.893 | 1.0 | 8 | 7.2101 | 0.0967 | 0.0309 | 0.0928 | 0.0928 |
| 12.4326 | 2.0 | 16 | 6.0616 | 0.1183 | 0.0458 | 0.1140 | 0.1141 |
| 12.0044 | 3.0 | 24 | 5.5399 | 0.1239 | 0.0469 | 0.1212 | 0.1200 |
| 11.4794 | 4.0 | 32 | 5.2619 | 0.1504 | 0.0541 | 0.1450 | 0.1470 |
| 10.85 | 5.0 | 40 | 4.8356 | 0.1675 | 0.0574 | 0.1605 | 0.1626 |
| 10.2044 | 6.0 | 48 | 4.2656 | 0.1933 | 0.0746 | 0.1862 | 0.1905 |
| 9.2904 | 7.0 | 56 | 3.7518 | 0.1983 | 0.0787 | 0.1891 | 0.1921 |
| 8.7029 | 8.0 | 64 | 3.4376 | 0.1873 | 0.0698 | 0.1797 | 0.1818 |
| 8.3889 | 9.0 | 72 | 3.2085 | 0.1811 | 0.0672 | 0.1738 | 0.1771 |
| 7.5091 | 10.0 | 80 | 3.0059 | 0.1581 | 0.0581 | 0.1557 | 0.1564 |
| 7.2132 | 11.0 | 88 | 2.8329 | 0.1654 | 0.0466 | 0.1623 | 0.1616 |
| 6.796 | 12.0 | 96 | 2.6879 | 0.1735 | 0.0486 | 0.1620 | 0.1617 |
| 6.4213 | 13.0 | 104 | 2.5694 | 0.1799 | 0.0482 | 0.1722 | 0.1726 |
| 5.7867 | 14.0 | 112 | 2.4405 | 0.1776 | 0.0497 | 0.1720 | 0.1715 |
| 5.2668 | 15.0 | 120 | 2.3098 | 0.1860 | 0.0521 | 0.1759 | 0.1766 |
| 5.0803 | 16.0 | 128 | 2.1944 | 0.2010 | 0.0677 | 0.1931 | 0.1939 |
| 4.6867 | 17.0 | 136 | 2.1139 | 0.2179 | 0.0811 | 0.2114 | 0.2117 |
| 4.5557 | 18.0 | 144 | 2.0466 | 0.2186 | 0.0805 | 0.2099 | 0.2103 |
| 4.4414 | 19.0 | 152 | 1.9919 | 0.2260 | 0.0916 | 0.2177 | 0.2172 |
| 4.0867 | 20.0 | 160 | 1.9404 | 0.2317 | 0.0976 | 0.2228 | 0.2221 |
| 3.6814 | 21.0 | 168 | 1.9014 | 0.2287 | 0.0921 | 0.2170 | 0.2157 |
| 3.5426 | 22.0 | 176 | 1.8656 | 0.2208 | 0.0862 | 0.2139 | 0.2131 |
| 3.266 | 23.0 | 184 | 1.8224 | 0.2348 | 0.0935 | 0.2232 | 0.2224 |
| 3.32 | 24.0 | 192 | 1.7907 | 0.2443 | 0.1072 | 0.2355 | 0.2348 |
| 3.1872 | 25.0 | 200 | 1.7459 | 0.2563 | 0.1121 | 0.2421 | 0.2414 |
| 2.9643 | 26.0 | 208 | 1.7043 | 0.2703 | 0.1213 | 0.2598 | 0.2591 |
| 2.8918 | 27.0 | 216 | 1.6654 | 0.2755 | 0.1190 | 0.2633 | 0.2634 |
| 2.7626 | 28.0 | 224 | 1.6199 | 0.3008 | 0.1385 | 0.2870 | 0.2861 |
| 2.8192 | 29.0 | 232 | 1.5712 | 0.3061 | 0.1410 | 0.2948 | 0.2942 |
| 2.5082 | 30.0 | 240 | 1.5405 | 0.3161 | 0.1533 | 0.3073 | 0.3069 |
| 2.564 | 31.0 | 248 | 1.5111 | 0.3296 | 0.1662 | 0.3198 | 0.3196 |
| 2.5577 | 32.0 | 256 | 1.4738 | 0.3344 | 0.1745 | 0.3250 | 0.3247 |
| 2.5199 | 33.0 | 264 | 1.4378 | 0.3468 | 0.1829 | 0.3336 | 0.3328 |
| 2.4798 | 34.0 | 272 | 1.4033 | 0.3593 | 0.1969 | 0.3448 | 0.3450 |
| 2.3208 | 35.0 | 280 | 1.3733 | 0.3728 | 0.2146 | 0.3613 | 0.3609 |
| 2.3704 | 36.0 | 288 | 1.3403 | 0.3721 | 0.2175 | 0.3644 | 0.3649 |
| 2.3199 | 37.0 | 296 | 1.3092 | 0.3718 | 0.2147 | 0.3638 | 0.3631 |
| 2.3046 | 38.0 | 304 | 1.2838 | 0.3674 | 0.2141 | 0.3608 | 0.3610 |
| 2.3183 | 39.0 | 312 | 1.2599 | 0.3728 | 0.2202 | 0.3664 | 0.3669 |
| 2.178 | 40.0 | 320 | 1.2272 | 0.3826 | 0.2274 | 0.3758 | 0.3749 |
| 2.1264 | 41.0 | 328 | 1.1940 | 0.3923 | 0.2348 | 0.3841 | 0.3835 |
| 2.0563 | 42.0 | 336 | 1.1629 | 0.3972 | 0.2391 | 0.3864 | 0.3865 |
| 2.0213 | 43.0 | 344 | 1.1324 | 0.4082 | 0.2509 | 0.3981 | 0.3980 |
| 1.9956 | 44.0 | 352 | 1.1085 | 0.4158 | 0.2569 | 0.4051 | 0.4054 |
| 2.0723 | 45.0 | 360 | 1.0895 | 0.4186 | 0.2594 | 0.4060 | 0.4061 |
| 1.9021 | 46.0 | 368 | 1.0713 | 0.4316 | 0.2775 | 0.4193 | 0.4194 |
| 1.9776 | 47.0 | 376 | 1.0510 | 0.4362 | 0.2785 | 0.4232 | 0.4237 |
| 1.8752 | 48.0 | 384 | 1.0289 | 0.4371 | 0.2778 | 0.4225 | 0.4230 |
| 1.8729 | 49.0 | 392 | 1.0070 | 0.4386 | 0.2766 | 0.4243 | 0.4245 |
| 1.9136 | 50.0 | 400 | 0.9900 | 0.4368 | 0.2773 | 0.4240 | 0.4232 |
| 1.86 | 51.0 | 408 | 0.9765 | 0.4413 | 0.2818 | 0.4291 | 0.4283 |
| 1.8629 | 52.0 | 416 | 0.9670 | 0.4494 | 0.2909 | 0.4386 | 0.4376 |
| 1.8345 | 53.0 | 424 | 0.9554 | 0.4515 | 0.2942 | 0.4402 | 0.4393 |
| 1.7786 | 54.0 | 432 | 0.9430 | 0.4559 | 0.2980 | 0.4439 | 0.4430 |
| 1.7535 | 55.0 | 440 | 0.9284 | 0.4585 | 0.3016 | 0.4480 | 0.4461 |
| 1.788 | 56.0 | 448 | 0.9126 | 0.4680 | 0.3096 | 0.4578 | 0.4568 |
| 1.6512 | 57.0 | 456 | 0.9015 | 0.4803 | 0.3201 | 0.4699 | 0.4691 |
| 1.7463 | 58.0 | 464 | 0.8937 | 0.4813 | 0.3194 | 0.4697 | 0.4693 |
| 1.7705 | 59.0 | 472 | 0.8835 | 0.4805 | 0.3192 | 0.4680 | 0.4673 |
| 1.6796 | 60.0 | 480 | 0.8709 | 0.4797 | 0.3168 | 0.4673 | 0.4667 |
| 1.652 | 61.0 | 488 | 0.8588 | 0.4811 | 0.3182 | 0.4686 | 0.4684 |
| 1.6272 | 62.0 | 496 | 0.8470 | 0.4812 | 0.3196 | 0.4696 | 0.4690 |
| 1.6013 | 63.0 | 504 | 0.8357 | 0.4910 | 0.3298 | 0.4779 | 0.4781 |
| 1.5951 | 64.0 | 512 | 0.8268 | 0.4948 | 0.3344 | 0.4818 | 0.4822 |
| 1.5817 | 65.0 | 520 | 0.8164 | 0.4896 | 0.3313 | 0.4787 | 0.4777 |
| 1.6403 | 66.0 | 528 | 0.8064 | 0.4983 | 0.3419 | 0.4867 | 0.4862 |
| 1.6281 | 67.0 | 536 | 0.7955 | 0.4992 | 0.3426 | 0.4866 | 0.4866 |
| 1.6482 | 68.0 | 544 | 0.7881 | 0.4990 | 0.3404 | 0.4860 | 0.4860 |
| 1.6103 | 69.0 | 552 | 0.7822 | 0.4997 | 0.3401 | 0.4882 | 0.4872 |
| 1.5396 | 70.0 | 560 | 0.7769 | 0.5023 | 0.3411 | 0.4896 | 0.4890 |
| 1.5271 | 71.0 | 568 | 0.7696 | 0.5040 | 0.3396 | 0.4908 | 0.4899 |
| 1.4252 | 72.0 | 576 | 0.7614 | 0.5128 | 0.3521 | 0.4999 | 0.4994 |
| 1.553 | 73.0 | 584 | 0.7541 | 0.5145 | 0.3525 | 0.5017 | 0.5012 |
| 1.5503 | 74.0 | 592 | 0.7475 | 0.5193 | 0.3561 | 0.5052 | 0.5047 |
| 1.4653 | 75.0 | 600 | 0.7415 | 0.5151 | 0.3540 | 0.5020 | 0.5018 |
| 1.5387 | 76.0 | 608 | 0.7355 | 0.5267 | 0.3632 | 0.5126 | 0.5121 |
| 1.5706 | 77.0 | 616 | 0.7292 | 0.5232 | 0.3628 | 0.5101 | 0.5096 |
| 1.4442 | 78.0 | 624 | 0.7229 | 0.5208 | 0.3626 | 0.5086 | 0.5082 |
| 1.4816 | 79.0 | 632 | 0.7173 | 0.5193 | 0.3606 | 0.5070 | 0.5060 |
| 1.5228 | 80.0 | 640 | 0.7119 | 0.5180 | 0.3596 | 0.5057 | 0.5053 |
| 1.4623 | 81.0 | 648 | 0.7077 | 0.5228 | 0.3645 | 0.5104 | 0.5092 |
| 1.4077 | 82.0 | 656 | 0.7025 | 0.5266 | 0.3699 | 0.5164 | 0.5156 |
| 1.4069 | 83.0 | 664 | 0.6977 | 0.5318 | 0.3749 | 0.5212 | 0.5203 |
| 1.4191 | 84.0 | 672 | 0.6934 | 0.5307 | 0.3732 | 0.5200 | 0.5192 |
| 1.4564 | 85.0 | 680 | 0.6898 | 0.5317 | 0.3764 | 0.5213 | 0.5202 |
| 1.4195 | 86.0 | 688 | 0.6872 | 0.5311 | 0.3751 | 0.5203 | 0.5186 |
| 1.422 | 87.0 | 696 | 0.6843 | 0.5319 | 0.3762 | 0.5212 | 0.5196 |
| 1.4821 | 88.0 | 704 | 0.6822 | 0.5355 | 0.3812 | 0.5254 | 0.5242 |
| 1.539 | 89.0 | 712 | 0.6809 | 0.5349 | 0.3792 | 0.5246 | 0.5234 |
| 1.4914 | 90.0 | 720 | 0.6793 | 0.5341 | 0.3785 | 0.5233 | 0.5221 |
| 1.4247 | 91.0 | 728 | 0.6774 | 0.5349 | 0.3795 | 0.5242 | 0.5229 |
| 1.4937 | 92.0 | 736 | 0.6757 | 0.5350 | 0.3788 | 0.5238 | 0.5226 |
| 1.3732 | 93.0 | 744 | 0.6741 | 0.5362 | 0.3809 | 0.5256 | 0.5243 |
| 1.3991 | 94.0 | 752 | 0.6729 | 0.5362 | 0.3816 | 0.5261 | 0.5249 |
| 1.481 | 95.0 | 760 | 0.6716 | 0.5384 | 0.3836 | 0.5280 | 0.5266 |
| 1.3902 | 96.0 | 768 | 0.6707 | 0.5384 | 0.3836 | 0.5280 | 0.5266 |
| 1.5239 | 97.0 | 776 | 0.6700 | 0.5388 | 0.3838 | 0.5283 | 0.5270 |
| 1.4486 | 98.0 | 784 | 0.6695 | 0.5388 | 0.3844 | 0.5290 | 0.5277 |
| 1.3551 | 99.0 | 792 | 0.6692 | 0.5388 | 0.3838 | 0.5283 | 0.5270 |
| 1.4213 | 100.0 | 800 | 0.6691 | 0.5388 | 0.3838 | 0.5283 | 0.5270 |
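
The ROUGE columns above are on a 0 to 1 scale. The card does not document how they were computed, but a common setup is the `evaluate` library's rouge metric applied to decoded predictions; a minimal sketch under that assumption:

```python
import evaluate

rouge = evaluate.load("rouge")

# Toy prediction/reference pair; during training these would be the decoded
# model outputs and the reference targets for the evaluation set.
scores = rouge.compute(
    predictions=["the cat sat on the mat"],
    references=["the cat lay on the mat"],
)
# The result contains rouge1, rouge2, rougeL, and rougeLsum as floats in
# [0, 1], the same scale as the columns in the table above.
print(scores)
```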

Framework versions

  • Transformers 4.37.2
  • PyTorch 2.1.0+cu121
  • Datasets 2.17.1
  • Tokenizers 0.15.2