---
license: apache-2.0
base_model: google/mt5-small
tags:
- generated_from_trainer
metrics:
- rouge
model-index:
- name: mt5-small-finetuned-mt5
  results: []
---

# mt5-small-finetuned-mt5

This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6691
- Rouge1: 0.5388
- Rouge2: 0.3838
- Rougel: 0.5283
- Rougelsum: 0.5270

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5.6e-05
- train_batch_size: 20
- eval_batch_size: 20
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 100
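As a minimal sketch (not the original training script), the hyperparameters above map onto `Seq2SeqTrainingArguments` roughly as follows. The `output_dir` name, the per-epoch evaluation strategy, and `predict_with_generate` are assumptions; the listed Adam betas and epsilon match the `Trainer` default optimizer, so they are not set explicitly.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch reconstructed from the hyperparameters listed above; not the original script.
training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small-finetuned-mt5",  # assumed output directory name
    learning_rate=5.6e-5,
    per_device_train_batch_size=20,
    per_device_eval_batch_size=20,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=100,
    evaluation_strategy="epoch",  # assumption: the results table shows one evaluation per epoch
    predict_with_generate=True,   # assumption: needed so ROUGE can be computed on generated text
)
# These arguments would be passed to a Seq2SeqTrainer together with the model,
# tokenizer, data collator, and the (undocumented) train/eval datasets.
```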
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum |
|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:---------:|
| 12.893 | 1.0 | 8 | 7.2101 | 0.0967 | 0.0309 | 0.0928 | 0.0928 |
| 12.4326 | 2.0 | 16 | 6.0616 | 0.1183 | 0.0458 | 0.1140 | 0.1141 |
| 12.0044 | 3.0 | 24 | 5.5399 | 0.1239 | 0.0469 | 0.1212 | 0.1200 |
| 11.4794 | 4.0 | 32 | 5.2619 | 0.1504 | 0.0541 | 0.1450 | 0.1470 |
| 10.85 | 5.0 | 40 | 4.8356 | 0.1675 | 0.0574 | 0.1605 | 0.1626 |
| 10.2044 | 6.0 | 48 | 4.2656 | 0.1933 | 0.0746 | 0.1862 | 0.1905 |
| 9.2904 | 7.0 | 56 | 3.7518 | 0.1983 | 0.0787 | 0.1891 | 0.1921 |
| 8.7029 | 8.0 | 64 | 3.4376 | 0.1873 | 0.0698 | 0.1797 | 0.1818 |
| 8.3889 | 9.0 | 72 | 3.2085 | 0.1811 | 0.0672 | 0.1738 | 0.1771 |
| 7.5091 | 10.0 | 80 | 3.0059 | 0.1581 | 0.0581 | 0.1557 | 0.1564 |
| 7.2132 | 11.0 | 88 | 2.8329 | 0.1654 | 0.0466 | 0.1623 | 0.1616 |
| 6.796 | 12.0 | 96 | 2.6879 | 0.1735 | 0.0486 | 0.1620 | 0.1617 |
| 6.4213 | 13.0 | 104 | 2.5694 | 0.1799 | 0.0482 | 0.1722 | 0.1726 |
| 5.7867 | 14.0 | 112 | 2.4405 | 0.1776 | 0.0497 | 0.1720 | 0.1715 |
| 5.2668 | 15.0 | 120 | 2.3098 | 0.1860 | 0.0521 | 0.1759 | 0.1766 |
| 5.0803 | 16.0 | 128 | 2.1944 | 0.2010 | 0.0677 | 0.1931 | 0.1939 |
| 4.6867 | 17.0 | 136 | 2.1139 | 0.2179 | 0.0811 | 0.2114 | 0.2117 |
| 4.5557 | 18.0 | 144 | 2.0466 | 0.2186 | 0.0805 | 0.2099 | 0.2103 |
| 4.4414 | 19.0 | 152 | 1.9919 | 0.2260 | 0.0916 | 0.2177 | 0.2172 |
| 4.0867 | 20.0 | 160 | 1.9404 | 0.2317 | 0.0976 | 0.2228 | 0.2221 |
| 3.6814 | 21.0 | 168 | 1.9014 | 0.2287 | 0.0921 | 0.2170 | 0.2157 |
| 3.5426 | 22.0 | 176 | 1.8656 | 0.2208 | 0.0862 | 0.2139 | 0.2131 |
| 3.266 | 23.0 | 184 | 1.8224 | 0.2348 | 0.0935 | 0.2232 | 0.2224 |
| 3.32 | 24.0 | 192 | 1.7907 | 0.2443 | 0.1072 | 0.2355 | 0.2348 |
| 3.1872 | 25.0 | 200 | 1.7459 | 0.2563 | 0.1121 | 0.2421 | 0.2414 |
| 2.9643 | 26.0 | 208 | 1.7043 | 0.2703 | 0.1213 | 0.2598 | 0.2591 |
| 2.8918 | 27.0 | 216 | 1.6654 | 0.2755 | 0.1190 | 0.2633 | 0.2634 |
| 2.7626 | 28.0 | 224 | 1.6199 | 0.3008 | 0.1385 | 0.2870 | 0.2861 |
| 2.8192 | 29.0 | 232 | 1.5712 | 0.3061 | 0.1410 | 0.2948 | 0.2942 |
| 2.5082 | 30.0 | 240 | 1.5405 | 0.3161 | 0.1533 | 0.3073 | 0.3069 |
| 2.564 | 31.0 | 248 | 1.5111 | 0.3296 | 0.1662 | 0.3198 | 0.3196 |
| 2.5577 | 32.0 | 256 | 1.4738 | 0.3344 | 0.1745 | 0.3250 | 0.3247 |
| 2.5199 | 33.0 | 264 | 1.4378 | 0.3468 | 0.1829 | 0.3336 | 0.3328 |
| 2.4798 | 34.0 | 272 | 1.4033 | 0.3593 | 0.1969 | 0.3448 | 0.3450 |
| 2.3208 | 35.0 | 280 | 1.3733 | 0.3728 | 0.2146 | 0.3613 | 0.3609 |
| 2.3704 | 36.0 | 288 | 1.3403 | 0.3721 | 0.2175 | 0.3644 | 0.3649 |
| 2.3199 | 37.0 | 296 | 1.3092 | 0.3718 | 0.2147 | 0.3638 | 0.3631 |
| 2.3046 | 38.0 | 304 | 1.2838 | 0.3674 | 0.2141 | 0.3608 | 0.3610 |
| 2.3183 | 39.0 | 312 | 1.2599 | 0.3728 | 0.2202 | 0.3664 | 0.3669 |
| 2.178 | 40.0 | 320 | 1.2272 | 0.3826 | 0.2274 | 0.3758 | 0.3749 |
| 2.1264 | 41.0 | 328 | 1.1940 | 0.3923 | 0.2348 | 0.3841 | 0.3835 |
| 2.0563 | 42.0 | 336 | 1.1629 | 0.3972 | 0.2391 | 0.3864 | 0.3865 |
| 2.0213 | 43.0 | 344 | 1.1324 | 0.4082 | 0.2509 | 0.3981 | 0.3980 |
| 1.9956 | 44.0 | 352 | 1.1085 | 0.4158 | 0.2569 | 0.4051 | 0.4054 |
| 2.0723 | 45.0 | 360 | 1.0895 | 0.4186 | 0.2594 | 0.4060 | 0.4061 |
| 1.9021 | 46.0 | 368 | 1.0713 | 0.4316 | 0.2775 | 0.4193 | 0.4194 |
| 1.9776 | 47.0 | 376 | 1.0510 | 0.4362 | 0.2785 | 0.4232 | 0.4237 |
| 1.8752 | 48.0 | 384 | 1.0289 | 0.4371 | 0.2778 | 0.4225 | 0.4230 |
| 1.8729 | 49.0 | 392 | 1.0070 | 0.4386 | 0.2766 | 0.4243 | 0.4245 |
| 1.9136 | 50.0 | 400 | 0.9900 | 0.4368 | 0.2773 | 0.4240 | 0.4232 |
| 1.86 | 51.0 | 408 | 0.9765 | 0.4413 | 0.2818 | 0.4291 | 0.4283 |
| 1.8629 | 52.0 | 416 | 0.9670 | 0.4494 | 0.2909 | 0.4386 | 0.4376 |
| 1.8345 | 53.0 | 424 | 0.9554 | 0.4515 | 0.2942 | 0.4402 | 0.4393 |
| 1.7786 | 54.0 | 432 | 0.9430 | 0.4559 | 0.2980 | 0.4439 | 0.4430 |
| 1.7535 | 55.0 | 440 | 0.9284 | 0.4585 | 0.3016 | 0.4480 | 0.4461 |
| 1.788 | 56.0 | 448 | 0.9126 | 0.4680 | 0.3096 | 0.4578 | 0.4568 |
| 1.6512 | 57.0 | 456 | 0.9015 | 0.4803 | 0.3201 | 0.4699 | 0.4691 |
| 1.7463 | 58.0 | 464 | 0.8937 | 0.4813 | 0.3194 | 0.4697 | 0.4693 |
| 1.7705 | 59.0 | 472 | 0.8835 | 0.4805 | 0.3192 | 0.4680 | 0.4673 |
| 1.6796 | 60.0 | 480 | 0.8709 | 0.4797 | 0.3168 | 0.4673 | 0.4667 |
| 1.652 | 61.0 | 488 | 0.8588 | 0.4811 | 0.3182 | 0.4686 | 0.4684 |
| 1.6272 | 62.0 | 496 | 0.8470 | 0.4812 | 0.3196 | 0.4696 | 0.4690 |
| 1.6013 | 63.0 | 504 | 0.8357 | 0.4910 | 0.3298 | 0.4779 | 0.4781 |
| 1.5951 | 64.0 | 512 | 0.8268 | 0.4948 | 0.3344 | 0.4818 | 0.4822 |
| 1.5817 | 65.0 | 520 | 0.8164 | 0.4896 | 0.3313 | 0.4787 | 0.4777 |
| 1.6403 | 66.0 | 528 | 0.8064 | 0.4983 | 0.3419 | 0.4867 | 0.4862 |
| 1.6281 | 67.0 | 536 | 0.7955 | 0.4992 | 0.3426 | 0.4866 | 0.4866 |
| 1.6482 | 68.0 | 544 | 0.7881 | 0.4990 | 0.3404 | 0.4860 | 0.4860 |
| 1.6103 | 69.0 | 552 | 0.7822 | 0.4997 | 0.3401 | 0.4882 | 0.4872 |
| 1.5396 | 70.0 | 560 | 0.7769 | 0.5023 | 0.3411 | 0.4896 | 0.4890 |
| 1.5271 | 71.0 | 568 | 0.7696 | 0.5040 | 0.3396 | 0.4908 | 0.4899 |
| 1.4252 | 72.0 | 576 | 0.7614 | 0.5128 | 0.3521 | 0.4999 | 0.4994 |
| 1.553 | 73.0 | 584 | 0.7541 | 0.5145 | 0.3525 | 0.5017 | 0.5012 |
| 1.5503 | 74.0 | 592 | 0.7475 | 0.5193 | 0.3561 | 0.5052 | 0.5047 |
| 1.4653 | 75.0 | 600 | 0.7415 | 0.5151 | 0.3540 | 0.5020 | 0.5018 |
| 1.5387 | 76.0 | 608 | 0.7355 | 0.5267 | 0.3632 | 0.5126 | 0.5121 |
| 1.5706 | 77.0 | 616 | 0.7292 | 0.5232 | 0.3628 | 0.5101 | 0.5096 |
| 1.4442 | 78.0 | 624 | 0.7229 | 0.5208 | 0.3626 | 0.5086 | 0.5082 |
| 1.4816 | 79.0 | 632 | 0.7173 | 0.5193 | 0.3606 | 0.5070 | 0.5060 |
| 1.5228 | 80.0 | 640 | 0.7119 | 0.5180 | 0.3596 | 0.5057 | 0.5053 |
| 1.4623 | 81.0 | 648 | 0.7077 | 0.5228 | 0.3645 | 0.5104 | 0.5092 |
| 1.4077 | 82.0 | 656 | 0.7025 | 0.5266 | 0.3699 | 0.5164 | 0.5156 |
| 1.4069 | 83.0 | 664 | 0.6977 | 0.5318 | 0.3749 | 0.5212 | 0.5203 |
| 1.4191 | 84.0 | 672 | 0.6934 | 0.5307 | 0.3732 | 0.5200 | 0.5192 |
| 1.4564 | 85.0 | 680 | 0.6898 | 0.5317 | 0.3764 | 0.5213 | 0.5202 |
| 1.4195 | 86.0 | 688 | 0.6872 | 0.5311 | 0.3751 | 0.5203 | 0.5186 |
| 1.422 | 87.0 | 696 | 0.6843 | 0.5319 | 0.3762 | 0.5212 | 0.5196 |
| 1.4821 | 88.0 | 704 | 0.6822 | 0.5355 | 0.3812 | 0.5254 | 0.5242 |
| 1.539 | 89.0 | 712 | 0.6809 | 0.5349 | 0.3792 | 0.5246 | 0.5234 |
| 1.4914 | 90.0 | 720 | 0.6793 | 0.5341 | 0.3785 | 0.5233 | 0.5221 |
| 1.4247 | 91.0 | 728 | 0.6774 | 0.5349 | 0.3795 | 0.5242 | 0.5229 |
| 1.4937 | 92.0 | 736 | 0.6757 | 0.5350 | 0.3788 | 0.5238 | 0.5226 |
| 1.3732 | 93.0 | 744 | 0.6741 | 0.5362 | 0.3809 | 0.5256 | 0.5243 |
| 1.3991 | 94.0 | 752 | 0.6729 | 0.5362 | 0.3816 | 0.5261 | 0.5249 |
| 1.481 | 95.0 | 760 | 0.6716 | 0.5384 | 0.3836 | 0.5280 | 0.5266 |
| 1.3902 | 96.0 | 768 | 0.6707 | 0.5384 | 0.3836 | 0.5280 | 0.5266 |
| 1.5239 | 97.0 | 776 | 0.6700 | 0.5388 | 0.3838 | 0.5283 | 0.5270 |
| 1.4486 | 98.0 | 784 | 0.6695 | 0.5388 | 0.3844 | 0.5290 | 0.5277 |
| 1.3551 | 99.0 | 792 | 0.6692 | 0.5388 | 0.3838 | 0.5283 | 0.5270 |
| 1.4213 | 100.0 | 800 | 0.6691 | 0.5388 | 0.3838 | 0.5283 | 0.5270 |

### Framework versions

- Transformers 4.37.2
- Pytorch 2.1.0+cu121
- Datasets 2.17.1
- Tokenizers 0.15.2
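### Computing ROUGE (sketch)

The ROUGE scores reported in this card are on a 0–1 scale. A minimal sketch of how such scores can be computed with the `evaluate` library follows; the exact metric configuration used for this card is not documented, and the predictions/references below are toy examples.

```python
import evaluate

# Load the standard ROUGE metric from the evaluate library.
rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["the cat sat on the mat"],        # toy prediction
    references=["the cat was sitting on the mat"],  # toy reference
    use_stemmer=True,  # assumption: common setting in summarization examples
)
print(scores)  # dict with rouge1, rouge2, rougeL, rougeLsum values in [0, 1]
```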
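## How to use (sketch)

The intended task and input format are not documented above. As a generic starting point, the checkpoint can be loaded like any other seq2seq model; the repository id below is an assumption (replace it with the actual Hub id or local path of this checkpoint), and the generation settings are illustrative only.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "mt5-small-finetuned-mt5"  # assumption: Hub id or local path of this checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "Input text here."  # replace with a real input in the (undocumented) training format
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
output_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```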