mt5-small-thaisum
This model is a fine-tuned version of google/mt5-small on an unspecified dataset (the training run did not record a dataset name). It achieves the following results on the evaluation set:
- Loss: 3.0602
- Rouge1: 0.0804
- Rouge2: 0.0092
- Rougel: 0.079
- Rougelsum: 0.0804
- Gen Len: 19.0
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 100
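As a rough sanity check on the schedule these settings imply, the sketch below derives the step counts and the learning rate at a few points in training. It assumes a single device, no gradient accumulation, and linear decay to zero with no warmup; none of these are stated in this card. The figure of 200 steps per epoch is read from the training results table, and `linear_lr` is a hypothetical helper name.

```python
# Hyperparameters copied from the list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 8
NUM_EPOCHS = 100

# Read from the training results table: step 200 is logged at epoch 1.0.
STEPS_PER_EPOCH = 200

TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS

# Assumption: one device and no gradient accumulation, so one step is one batch.
# The last batch of an epoch may be partial, so this is an upper bound.
MAX_TRAIN_EXAMPLES = STEPS_PER_EPOCH * TRAIN_BATCH_SIZE


def linear_lr(step: int, base_lr: float = LEARNING_RATE,
              total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate under linear decay to zero, assuming zero warmup steps."""
    remaining = max(0, total_steps - step)
    return base_lr * remaining / total_steps


print("total optimizer steps:", TOTAL_STEPS)
print("training examples (at most):", MAX_TRAIN_EXAMPLES)
for step in (0, 10_000, 20_000):
    print(f"lr at step {step}: {linear_lr(step):.2e}")
```

Under these assumptions the run covers 20000 optimizer steps, which matches the last row of the training results table, and the training set holds at most 1600 examples.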
Training results
Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
---|---|---|---|---|---|---|---|---|
No log | 1.0 | 200 | 2.9811 | 0.0876 | 0.0172 | 0.0883 | 0.0879 | 18.3775 |
No log | 2.0 | 400 | 2.8529 | 0.0814 | 0.0205 | 0.0824 | 0.0818 | 18.6075 |
4.6025 | 3.0 | 600 | 2.7693 | 0.08 | 0.0182 | 0.0801 | 0.0796 | 18.7575 |
4.6025 | 4.0 | 800 | 2.6774 | 0.0874 | 0.0195 | 0.0859 | 0.0861 | 18.8075 |
3.1265 | 5.0 | 1000 | 2.6566 | 0.0924 | 0.0243 | 0.0903 | 0.0906 | 18.92 |
3.1265 | 6.0 | 1200 | 2.6393 | 0.0993 | 0.0232 | 0.0994 | 0.0994 | 18.915 |
3.1265 | 7.0 | 1400 | 2.6257 | 0.092 | 0.0191 | 0.0914 | 0.0915 | 18.9725 |
2.8328 | 8.0 | 1600 | 2.6159 | 0.095 | 0.0187 | 0.0939 | 0.0944 | 18.9575 |
2.8328 | 9.0 | 1800 | 2.5708 | 0.0978 | 0.0223 | 0.0975 | 0.0976 | 18.9575 |
2.6385 | 10.0 | 2000 | 2.5637 | 0.09 | 0.0188 | 0.0902 | 0.0907 | 18.965 |
2.6385 | 11.0 | 2200 | 2.5621 | 0.0989 | 0.0179 | 0.0974 | 0.0979 | 18.975 |
2.6385 | 12.0 | 2400 | 2.5490 | 0.1002 | 0.0206 | 0.0996 | 0.0999 | 18.98 |
2.4812 | 13.0 | 2600 | 2.5260 | 0.1001 | 0.0175 | 0.0987 | 0.0994 | 18.9875 |
2.4812 | 14.0 | 2800 | 2.5626 | 0.0987 | 0.0234 | 0.098 | 0.0986 | 18.9875 |
2.3701 | 15.0 | 3000 | 2.5042 | 0.0965 | 0.021 | 0.0955 | 0.0965 | 18.99 |
2.3701 | 16.0 | 3200 | 2.5017 | 0.1038 | 0.021 | 0.1024 | 0.1034 | 18.99 |
2.3701 | 17.0 | 3400 | 2.5204 | 0.0914 | 0.0192 | 0.0905 | 0.0909 | 18.99 |
2.2397 | 18.0 | 3600 | 2.5311 | 0.0909 | 0.01 | 0.0896 | 0.0901 | 18.99 |
2.2397 | 19.0 | 3800 | 2.5149 | 0.0938 | 0.0131 | 0.0932 | 0.0939 | 18.99 |
2.1529 | 20.0 | 4000 | 2.5644 | 0.0944 | 0.016 | 0.0934 | 0.0949 | 18.99 |
2.1529 | 21.0 | 4200 | 2.5408 | 0.0918 | 0.0138 | 0.0906 | 0.0905 | 18.99 |
2.1529 | 22.0 | 4400 | 2.5415 | 0.083 | 0.0096 | 0.0823 | 0.0829 | 18.99 |
2.0546 | 23.0 | 4600 | 2.5564 | 0.0927 | 0.0154 | 0.092 | 0.0921 | 18.99 |
2.0546 | 24.0 | 4800 | 2.5778 | 0.0877 | 0.015 | 0.0868 | 0.0874 | 18.9875 |
1.9784 | 25.0 | 5000 | 2.5437 | 0.09 | 0.0144 | 0.0893 | 0.0902 | 18.99 |
1.9784 | 26.0 | 5200 | 2.5665 | 0.0838 | 0.0096 | 0.0826 | 0.0831 | 18.985 |
1.9784 | 27.0 | 5400 | 2.5867 | 0.0922 | 0.0129 | 0.0909 | 0.0921 | 18.99 |
1.8855 | 28.0 | 5600 | 2.5676 | 0.0887 | 0.0167 | 0.0882 | 0.0885 | 18.995 |
1.8855 | 29.0 | 5800 | 2.5806 | 0.0889 | 0.0162 | 0.0876 | 0.0881 | 18.9975 |
1.8416 | 30.0 | 6000 | 2.5722 | 0.0951 | 0.0183 | 0.0936 | 0.095 | 19.0 |
1.8416 | 31.0 | 6200 | 2.5865 | 0.0866 | 0.0119 | 0.0857 | 0.0862 | 18.99 |
1.8416 | 32.0 | 6400 | 2.5779 | 0.0956 | 0.0204 | 0.0933 | 0.0939 | 19.0 |
1.767 | 33.0 | 6600 | 2.6038 | 0.0954 | 0.02 | 0.0942 | 0.0962 | 18.99 |
1.767 | 34.0 | 6800 | 2.6091 | 0.0951 | 0.0192 | 0.0941 | 0.0953 | 19.0 |
1.6979 | 35.0 | 7000 | 2.6599 | 0.0813 | 0.0138 | 0.0802 | 0.0805 | 18.9875 |
1.6979 | 36.0 | 7200 | 2.6338 | 0.0905 | 0.015 | 0.0898 | 0.0901 | 18.99 |
1.6979 | 37.0 | 7400 | 2.6515 | 0.0898 | 0.015 | 0.0892 | 0.0894 | 18.985 |
1.6445 | 38.0 | 7600 | 2.6514 | 0.0797 | 0.0142 | 0.0788 | 0.079 | 18.985 |
1.6445 | 39.0 | 7800 | 2.6418 | 0.0905 | 0.0196 | 0.0884 | 0.089 | 19.0 |
1.5932 | 40.0 | 8000 | 2.6520 | 0.0764 | 0.0146 | 0.0758 | 0.0766 | 19.0 |
1.5932 | 41.0 | 8200 | 2.6709 | 0.0793 | 0.01 | 0.0773 | 0.0788 | 18.99 |
1.5932 | 42.0 | 8400 | 2.6838 | 0.0838 | 0.0162 | 0.0824 | 0.0836 | 18.99 |
1.5379 | 43.0 | 8600 | 2.6731 | 0.0876 | 0.0154 | 0.086 | 0.087 | 18.9875 |
1.5379 | 44.0 | 8800 | 2.6804 | 0.0902 | 0.0161 | 0.0889 | 0.0898 | 18.99 |
1.4868 | 45.0 | 9000 | 2.6993 | 0.0852 | 0.0158 | 0.0839 | 0.0849 | 18.99 |
1.4868 | 46.0 | 9200 | 2.6954 | 0.0785 | 0.0117 | 0.0778 | 0.0778 | 19.0 |
1.4868 | 47.0 | 9400 | 2.7108 | 0.0837 | 0.015 | 0.0823 | 0.0828 | 18.9875 |
1.4401 | 48.0 | 9600 | 2.7319 | 0.09 | 0.0154 | 0.0886 | 0.0888 | 18.9975 |
1.4401 | 49.0 | 9800 | 2.7670 | 0.0924 | 0.0175 | 0.0912 | 0.0919 | 18.99 |
1.4115 | 50.0 | 10000 | 2.7521 | 0.0853 | 0.017 | 0.0838 | 0.0846 | 19.0 |
1.4115 | 51.0 | 10200 | 2.7725 | 0.0885 | 0.0198 | 0.0869 | 0.0879 | 18.99 |
1.4115 | 52.0 | 10400 | 2.7646 | 0.0825 | 0.0158 | 0.0815 | 0.0824 | 19.0 |
1.3594 | 53.0 | 10600 | 2.7719 | 0.0842 | 0.0112 | 0.083 | 0.0835 | 18.99 |
1.3594 | 54.0 | 10800 | 2.8063 | 0.0884 | 0.0158 | 0.0868 | 0.0874 | 18.99 |
1.3279 | 55.0 | 11000 | 2.8011 | 0.0926 | 0.0175 | 0.0906 | 0.091 | 19.0 |
1.3279 | 56.0 | 11200 | 2.8125 | 0.0897 | 0.0125 | 0.0879 | 0.0892 | 19.0 |
1.3279 | 57.0 | 11400 | 2.8529 | 0.088 | 0.0158 | 0.0861 | 0.0863 | 19.0 |
1.2902 | 58.0 | 11600 | 2.8338 | 0.0952 | 0.0133 | 0.093 | 0.0939 | 19.0 |
1.2902 | 59.0 | 11800 | 2.8260 | 0.091 | 0.0167 | 0.0885 | 0.0894 | 19.0 |
1.2662 | 60.0 | 12000 | 2.8383 | 0.0948 | 0.014 | 0.0916 | 0.0925 | 19.0 |
1.2662 | 61.0 | 12200 | 2.8589 | 0.0861 | 0.0158 | 0.0835 | 0.0848 | 19.0 |
1.2662 | 62.0 | 12400 | 2.8772 | 0.0855 | 0.0158 | 0.0831 | 0.084 | 18.99 |
1.2309 | 63.0 | 12600 | 2.8566 | 0.0847 | 0.0158 | 0.0836 | 0.0833 | 19.0 |
1.2309 | 64.0 | 12800 | 2.8800 | 0.0848 | 0.0158 | 0.0833 | 0.0841 | 19.0 |
1.1995 | 65.0 | 13000 | 2.8895 | 0.0855 | 0.0112 | 0.0841 | 0.0852 | 19.0 |
1.1995 | 66.0 | 13200 | 2.8857 | 0.0865 | 0.0133 | 0.0845 | 0.0856 | 19.0 |
1.1995 | 67.0 | 13400 | 2.8999 | 0.0891 | 0.015 | 0.0875 | 0.0885 | 19.0 |
1.1799 | 68.0 | 13600 | 2.9295 | 0.0866 | 0.0133 | 0.0838 | 0.0856 | 18.9875 |
1.1799 | 69.0 | 13800 | 2.9197 | 0.0782 | 0.0083 | 0.0771 | 0.0781 | 18.9875 |
1.1527 | 70.0 | 14000 | 2.9388 | 0.0789 | 0.0125 | 0.0778 | 0.0784 | 18.9975 |
1.1527 | 71.0 | 14200 | 2.9264 | 0.0736 | 0.0075 | 0.0725 | 0.0736 | 19.0 |
1.1527 | 72.0 | 14400 | 2.9597 | 0.0862 | 0.01 | 0.0848 | 0.0859 | 19.0 |
1.1322 | 73.0 | 14600 | 2.9623 | 0.084 | 0.0092 | 0.083 | 0.0839 | 19.0 |
1.1322 | 74.0 | 14800 | 2.9681 | 0.08 | 0.011 | 0.0792 | 0.0798 | 19.0 |
1.1075 | 75.0 | 15000 | 2.9713 | 0.0844 | 0.0142 | 0.0825 | 0.0827 | 19.0 |
1.1075 | 76.0 | 15200 | 2.9669 | 0.0815 | 0.0133 | 0.0803 | 0.0808 | 19.0 |
1.1075 | 77.0 | 15400 | 2.9680 | 0.0718 | 0.0092 | 0.0712 | 0.0723 | 18.99 |
1.0893 | 78.0 | 15600 | 2.9932 | 0.0811 | 0.0108 | 0.0803 | 0.0812 | 18.99 |
1.0893 | 79.0 | 15800 | 2.9979 | 0.0832 | 0.0133 | 0.082 | 0.083 | 18.99 |
1.0733 | 80.0 | 16000 | 2.9997 | 0.0863 | 0.0142 | 0.085 | 0.0858 | 18.99 |
1.0733 | 81.0 | 16200 | 2.9965 | 0.086 | 0.0162 | 0.0845 | 0.0858 | 18.99 |
1.0733 | 82.0 | 16400 | 3.0170 | 0.0813 | 0.0125 | 0.0788 | 0.0799 | 18.99 |
1.0617 | 83.0 | 16600 | 2.9955 | 0.0865 | 0.0125 | 0.0839 | 0.0851 | 18.99 |
1.0617 | 84.0 | 16800 | 3.0299 | 0.0912 | 0.015 | 0.0903 | 0.0912 | 19.0 |
1.0417 | 85.0 | 17000 | 3.0333 | 0.089 | 0.0142 | 0.0877 | 0.0878 | 19.0 |
1.0417 | 86.0 | 17200 | 3.0282 | 0.0948 | 0.0158 | 0.0941 | 0.0938 | 19.0 |
1.0417 | 87.0 | 17400 | 3.0416 | 0.0906 | 0.015 | 0.0897 | 0.0908 | 19.0 |
1.0316 | 88.0 | 17600 | 3.0376 | 0.0874 | 0.0142 | 0.0871 | 0.0876 | 19.0 |
1.0316 | 89.0 | 17800 | 3.0287 | 0.0873 | 0.0142 | 0.0864 | 0.0871 | 19.0 |
1.0285 | 90.0 | 18000 | 3.0384 | 0.0798 | 0.0142 | 0.079 | 0.0797 | 19.0 |
1.0285 | 91.0 | 18200 | 3.0465 | 0.0759 | 0.01 | 0.0751 | 0.076 | 19.0 |
1.0285 | 92.0 | 18400 | 3.0444 | 0.0797 | 0.0117 | 0.0788 | 0.0794 | 19.0 |
1.0218 | 93.0 | 18600 | 3.0524 | 0.0751 | 0.0075 | 0.0744 | 0.0754 | 19.0 |
1.0218 | 94.0 | 18800 | 3.0524 | 0.0744 | 0.0075 | 0.0736 | 0.0746 | 19.0 |
1.0086 | 95.0 | 19000 | 3.0466 | 0.0815 | 0.0092 | 0.079 | 0.0803 | 19.0 |
1.0086 | 96.0 | 19200 | 3.0561 | 0.0804 | 0.0092 | 0.079 | 0.0804 | 19.0 |
1.0086 | 97.0 | 19400 | 3.0596 | 0.0774 | 0.0092 | 0.0761 | 0.0774 | 19.0 |
1.0027 | 98.0 | 19600 | 3.0548 | 0.0792 | 0.0092 | 0.0778 | 0.0789 | 19.0 |
1.0027 | 99.0 | 19800 | 3.0601 | 0.0804 | 0.0092 | 0.079 | 0.0804 | 19.0 |
1.0023 | 100.0 | 20000 | 3.0602 | 0.0804 | 0.0092 | 0.079 | 0.0804 | 19.0 |
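
In the table, validation loss bottoms out early and then rises while training loss keeps falling, so the final checkpoint is not the best one by that measure. The sketch below picks the epoch with the lowest validation loss from a handful of rows copied from the table. The `Row` type and the `best_by_val_loss` helper are hypothetical names used for illustration, and only a subset of rows is included to keep the example short.

```python
from typing import NamedTuple


class Row(NamedTuple):
    epoch: int
    val_loss: float
    rouge1: float


# A subset of rows copied from the training results table above.
ROWS = [
    Row(1, 2.9811, 0.0876),
    Row(13, 2.5260, 0.1001),
    Row(16, 2.5017, 0.1038),
    Row(50, 2.7521, 0.0853),
    Row(100, 3.0602, 0.0804),
]


def best_by_val_loss(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda r: r.val_loss)


best = best_by_val_loss(ROWS)
final = ROWS[-1]
print(f"best epoch: {best.epoch} (val loss {best.val_loss}, Rouge1 {best.rouge1})")
print(f"final epoch: {final.epoch} (val loss {final.val_loss}, Rouge1 {final.rouge1})")
```

Across all 100 rows, epoch 16 has both the lowest validation loss (2.5017) and the highest Rouge1 (0.1038), compared with 3.0602 and 0.0804 at the final epoch. When retraining with the Transformers `Trainer`, options such as `load_best_model_at_end` and `metric_for_best_model` are one way to keep such a checkpoint, though this card does not say whether they were used.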
Framework versions
- Transformers 4.29.1
- Pytorch 2.0.0+cu118
- Datasets 2.12.0
- Tokenizers 0.13.3