
t5-base-finetuned-en-to-no

This model is a fine-tuned version of t5-base on the opus_books dataset for English-to-Norwegian (en-to-no) translation. It achieves the following results on the evaluation set:

  • Loss: 2.9566
  • Bleu: 4.8513
  • Gen Len: 17.84
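The Bleu figure above is on the 0–100 scale, as reported by the BLEU/sacreBLEU metric commonly wired into Seq2SeqTrainer evaluation (an assumption; the card does not say which implementation was used). As a reminder of what the number measures, here is a minimal, self-contained sketch of sentence-level BLEU with up-to-4-gram precision and a brevity penalty — an illustration only, not the exact metric code used for this card:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Counts of all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU (0-100): geometric mean of clipped 1..max_n-gram
    precisions, multiplied by a brevity penalty for short candidates."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams, ref_ngrams = ngrams(cand, n), ngrams(ref, n)
        overlap = sum((cand_ngrams & ref_ngrams).values())  # clipped matches
        total = max(sum(cand_ngrams.values()), 1)
        if overlap == 0:
            return 0.0  # any zero precision sends the geometric mean to zero
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return 100.0 * bp * math.exp(sum(log_precisions) / max_n)
```

A BLEU near 4.85, as here, means only a small fraction of the generated n-grams match the references — typical for a small parallel corpus like opus_books rather than a sign of a broken run.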

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 280
  • mixed_precision_training: Native AMP
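With lr_scheduler_type: linear and no warmup steps listed, the learning rate presumably ramps (here trivially, with zero warmup) and then decays linearly from 2e-05 to 0 over the full run — 220,640 optimizer steps, i.e. 788 steps/epoch × 280 epochs, matching the step counts in the results table. A sketch of that schedule, assuming zero warmup:

```python
def linear_lr(step, base_lr=2e-05, warmup_steps=0, total_steps=220_640):
    """Learning rate at a given optimizer step under a linear schedule:
    ramp up over `warmup_steps`, then decay linearly to zero at `total_steps`."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)
```

For example, halfway through training (step 110,320) the learning rate is 1e-05, and it reaches 0 at the final step.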

Training results

Training Loss | Epoch | Step | Validation Loss | Bleu | Gen Len
3.3949 1.0 788 2.7553 0.9274 18.1314
2.8659 2.0 1576 2.5367 1.2755 18.1543
2.7244 3.0 2364 2.3900 1.6351 18.0343
2.5228 4.0 3152 2.2902 1.7125 18.0543
2.4201 5.0 3940 2.2039 1.7217 18.0914
2.3168 6.0 4728 2.1429 2.0474 18.08
2.1856 7.0 5516 2.0772 2.228 18.0686
2.12 8.0 6304 2.0333 2.1694 17.98
2.0519 9.0 7092 1.9931 2.257 17.9914
1.9856 10.0 7880 1.9540 2.489 18.04
1.9164 11.0 8668 1.9266 2.5762 17.9629
1.8864 12.0 9456 1.9036 2.8294 17.9857
1.8276 13.0 10244 1.8695 2.9018 17.98
1.7715 14.0 11032 1.8584 3.04 17.9886
1.7302 15.0 11820 1.8487 2.9588 18.0057
1.6768 16.0 12608 1.8155 3.1968 17.9943
1.6564 17.0 13396 1.8137 3.3315 17.9657
1.6039 18.0 14184 1.7863 3.4057 18.0629
1.5735 19.0 14972 1.7945 3.6905 17.9571
1.5319 20.0 15760 1.7830 3.5128 17.9714
1.4993 21.0 16548 1.7745 3.4125 18.0057
1.4622 22.0 17336 1.7655 3.3974 17.9543
1.448 23.0 18124 1.7599 3.75 17.9057
1.3995 24.0 18912 1.7557 3.6852 17.8286
1.373 25.0 19700 1.7478 3.5797 17.9343
1.3513 26.0 20488 1.7558 3.8526 17.8457
1.3291 27.0 21276 1.7485 3.7037 17.9143
1.3002 28.0 22064 1.7480 3.7433 17.96
1.2655 29.0 22852 1.7578 4.0584 17.8914
1.2354 30.0 23640 1.7514 4.2106 17.8686
1.2224 31.0 24428 1.7576 3.9906 17.9
1.1999 32.0 25216 1.7627 4.1242 17.92
1.1672 33.0 26004 1.7548 4.1584 17.8286
1.1547 34.0 26792 1.7446 4.1721 17.8143
1.1313 35.0 27580 1.7613 4.3958 17.8457
1.08 36.0 28368 1.7628 4.342 17.8829
1.0927 37.0 29156 1.7685 4.4468 17.8971
1.0751 38.0 29944 1.7731 4.4297 17.8886
1.0492 39.0 30732 1.7641 4.5174 17.8714
1.036 40.0 31520 1.7643 4.4578 17.84
1.0172 41.0 32308 1.7820 4.5795 17.8429
0.9966 42.0 33096 1.7830 4.3455 17.8743
0.9812 43.0 33884 1.7890 4.3988 17.8486
0.9624 44.0 34672 1.7953 4.5418 17.8143
0.9485 45.0 35460 1.8046 4.5402 17.8143
0.9383 46.0 36248 1.8010 4.5572 17.76
0.9175 47.0 37036 1.8153 4.5916 17.7943
0.8877 48.0 37824 1.8133 4.5799 17.7857
0.8877 49.0 38612 1.8254 4.6511 17.7657
0.8595 50.0 39400 1.8229 4.7338 17.7657
0.8533 51.0 40188 1.8402 4.7568 17.7571
0.8414 52.0 40976 1.8406 4.7573 17.8429
0.8191 53.0 41764 1.8499 4.6985 17.76
0.8228 54.0 42552 1.8629 4.7603 17.7114
0.7987 55.0 43340 1.8638 4.5511 17.8
0.7877 56.0 44128 1.8673 4.5068 17.7771
0.7829 57.0 44916 1.8862 4.6033 17.7943
0.7571 58.0 45704 1.8874 4.6694 17.7486
0.7542 59.0 46492 1.8996 4.7531 17.7571
0.7301 60.0 47280 1.8950 4.6951 17.7514
0.73 61.0 48068 1.9035 4.7867 17.7343
0.7065 62.0 48856 1.9127 4.5863 17.7257
0.7015 63.0 49644 1.9418 4.9026 17.8086
0.6921 64.0 50432 1.9322 4.8127 17.7943
0.6714 65.0 51220 1.9382 4.5343 17.7286
0.6599 66.0 52008 1.9508 4.5273 17.7343
0.6529 67.0 52796 1.9577 4.6274 17.7743
0.647 68.0 53584 1.9789 4.5575 17.7571
0.627 69.0 54372 1.9795 4.319 17.7371
0.6279 70.0 55160 1.9788 4.6788 17.7486
0.5867 71.0 55948 2.0100 4.557 17.7714
0.5985 72.0 56736 2.0256 4.6005 17.8229
0.5939 73.0 57524 2.0336 4.7289 17.8
0.5727 74.0 58312 2.0328 4.5894 17.7229
0.5702 75.0 59100 2.0436 4.7621 17.78
0.5744 76.0 59888 2.0662 4.6161 17.8057
0.5554 77.0 60676 2.0586 4.6424 17.8057
0.5436 78.0 61464 2.0532 4.5742 17.7886
0.5359 79.0 62252 2.0680 4.8312 17.7886
0.5291 80.0 63040 2.0858 4.6342 17.8457
0.5034 81.0 63828 2.0861 4.7405 17.8257
0.5155 82.0 64616 2.1003 4.3956 17.7571
0.4989 83.0 65404 2.1072 4.339 17.7914
0.4903 84.0 66192 2.1113 4.3804 17.8143
0.4836 85.0 66980 2.1202 4.5776 17.8371
0.4794 86.0 67768 2.1277 4.6548 17.7686
0.4689 87.0 68556 2.1360 4.6453 17.7571
0.4623 88.0 69344 2.1460 4.7885 17.7771
0.4551 89.0 70132 2.1610 4.5342 17.7686
0.4405 90.0 70920 2.1649 4.5593 17.8057
0.4478 91.0 71708 2.1518 4.4945 17.8314
0.4265 92.0 72496 2.1873 4.453 17.8086
0.4191 93.0 73284 2.1808 4.6432 17.8057
0.4169 94.0 74072 2.1871 4.5543 17.82
0.4087 95.0 74860 2.2109 4.8367 17.7971
0.4054 96.0 75648 2.2092 4.7079 17.8171
0.3872 97.0 76436 2.2103 4.6996 17.7943
0.3884 98.0 77224 2.2111 4.9398 17.8314
0.3837 99.0 78012 2.2316 4.7849 17.8143
0.3777 100.0 78800 2.2298 4.7595 17.8343
0.3719 101.0 79588 2.2404 4.6768 17.8457
0.364 102.0 80376 2.2658 4.5789 17.8229
0.3549 103.0 81164 2.2790 4.6549 17.8029
0.3598 104.0 81952 2.2953 4.7411 17.8486
0.346 105.0 82740 2.2812 4.7529 17.7657
0.3376 106.0 83528 2.2997 4.5128 17.7886
0.3363 107.0 84316 2.2938 4.6983 17.7914
0.3368 108.0 85104 2.2909 4.4977 17.8257
0.3243 109.0 85892 2.3100 4.5156 17.8286
0.3197 110.0 86680 2.3310 4.7516 17.7943
0.3165 111.0 87468 2.3354 4.608 17.8114
0.3128 112.0 88256 2.3334 4.7388 17.8314
0.3038 113.0 89044 2.3343 4.6356 17.7914
0.3055 114.0 89832 2.3553 4.6694 17.7971
0.2977 115.0 90620 2.3530 4.6176 17.8086
0.2925 116.0 91408 2.3687 4.6855 17.8886
0.2794 117.0 92196 2.3856 4.5948 17.84
0.2913 118.0 92984 2.3844 4.7569 17.7943
0.2812 119.0 93772 2.3973 4.6009 17.7629
0.2731 120.0 94560 2.4074 4.7287 17.8086
0.2781 121.0 95348 2.4083 4.7944 17.8571
0.2708 122.0 96136 2.4414 4.7454 17.8829
0.2607 123.0 96924 2.4202 4.5074 17.8486
0.2617 124.0 97712 2.4371 4.6055 17.8629
0.2527 125.0 98500 2.4314 4.5891 17.8
0.2528 126.0 99288 2.4548 4.8362 17.8571
0.2522 127.0 100076 2.4461 4.6966 17.8514
0.2434 128.0 100864 2.4492 4.5774 17.8514
0.2381 129.0 101652 2.4720 4.4607 17.86
0.2411 130.0 102440 2.4820 4.484 17.8371
0.2352 131.0 103228 2.4954 4.8091 17.8457
0.2275 132.0 104016 2.4863 4.7008 17.8743
0.2244 133.0 104804 2.5089 4.8076 17.8571
0.2251 134.0 105592 2.5085 4.7374 17.8029
0.2242 135.0 106380 2.4979 4.851 17.8171
0.2217 136.0 107168 2.5122 4.6295 17.8314
0.2111 137.0 107956 2.5131 4.6315 17.8229
0.2078 138.0 108744 2.5216 4.6177 17.8229
0.2113 139.0 109532 2.5292 4.5603 17.8257
0.21 140.0 110320 2.5494 4.6128 17.7971
0.1994 141.0 111108 2.5435 4.9231 17.8714
0.2018 142.0 111896 2.5605 4.827 17.8314
0.1971 143.0 112684 2.5624 4.8075 17.78
0.1959 144.0 113472 2.5666 4.6358 17.84
0.1916 145.0 114260 2.5740 4.6628 17.8257
0.1939 146.0 115048 2.5730 4.8445 17.8286
0.1832 147.0 115836 2.5918 4.8198 17.8571
0.1884 148.0 116624 2.6013 4.7955 17.8257
0.1777 149.0 117412 2.5996 4.7503 17.8114
0.1711 150.0 118200 2.5971 4.5452 17.8514
0.1843 151.0 118988 2.6075 4.817 17.8143
0.1747 152.0 119776 2.6161 4.5231 17.8257
0.1698 153.0 120564 2.6225 4.7232 17.82
0.1685 154.0 121352 2.6285 4.7105 17.8229
0.1685 155.0 122140 2.6443 4.4228 17.8686
0.1695 156.0 122928 2.6356 4.5458 17.8657
0.1649 157.0 123716 2.6418 4.5955 17.8286
0.1643 158.0 124504 2.6565 4.5943 17.8457
0.1573 159.0 125292 2.6434 4.762 17.8429
0.1573 160.0 126080 2.6615 4.5916 17.8229
0.1558 161.0 126868 2.6529 4.527 17.8371
0.1545 162.0 127656 2.6697 4.705 17.7886
0.1563 163.0 128444 2.6747 4.6848 17.8086
0.1529 164.0 129232 2.6711 4.5149 17.8171
0.151 165.0 130020 2.6807 4.6484 17.8543
0.1471 166.0 130808 2.6909 4.7488 17.8657
0.1465 167.0 131596 2.6889 4.6446 17.8086
0.1345 168.0 132384 2.6935 4.6107 17.7971
0.1447 169.0 133172 2.6971 4.4718 17.86
0.1426 170.0 133960 2.7083 4.6878 17.84
0.1402 171.0 134748 2.7053 4.7539 17.8286
0.1382 172.0 135536 2.7140 4.7697 17.8343
0.1367 173.0 136324 2.7221 4.6764 17.8429
0.1365 174.0 137112 2.7364 4.7535 17.8343
0.1277 175.0 137900 2.7232 4.7312 17.8343
0.1331 176.0 138688 2.7292 4.8578 17.8171
0.1332 177.0 139476 2.7565 4.7861 17.8
0.1291 178.0 140264 2.7577 4.8903 17.7686
0.1298 179.0 141052 2.7474 4.7653 17.8171
0.1268 180.0 141840 2.7466 4.7403 17.8143
0.123 181.0 142628 2.7517 4.7989 17.8171
0.1267 182.0 143416 2.7634 4.7267 17.84
0.1246 183.0 144204 2.7620 4.8103 17.8343
0.1221 184.0 144992 2.7686 4.968 17.8429
0.1202 185.0 145780 2.7624 4.806 17.7914
0.1222 186.0 146568 2.7735 4.8647 17.82
0.1187 187.0 147356 2.7775 4.5615 17.8229
0.1175 188.0 148144 2.7703 4.824 17.82
0.121 189.0 148932 2.7824 4.8669 17.78
0.114 190.0 149720 2.7807 4.8833 17.8257
0.1146 191.0 150508 2.7869 4.9505 17.7857
0.1133 192.0 151296 2.7900 4.9474 17.7257
0.1137 193.0 152084 2.8008 4.8476 17.7371
0.1098 194.0 152872 2.7971 4.736 17.7543
0.1072 195.0 153660 2.7956 4.7635 17.8057
0.1106 196.0 154448 2.8019 4.6805 17.7657
0.1077 197.0 155236 2.8134 4.6501 17.8029
0.1076 198.0 156024 2.8222 4.5361 17.82
0.1054 199.0 156812 2.8173 4.8964 17.78
0.1045 200.0 157600 2.8248 4.9418 17.7771
0.1083 201.0 158388 2.8214 4.8408 17.7829
0.1035 202.0 159176 2.8277 4.66 17.8
0.1033 203.0 159964 2.8342 4.616 17.8114
0.1013 204.0 160752 2.8392 4.7213 17.8371
0.1012 205.0 161540 2.8313 4.7918 17.8
0.1021 206.0 162328 2.8372 4.8182 17.8371
0.0979 207.0 163116 2.8500 4.759 17.8657
0.0985 208.0 163904 2.8458 4.6711 17.8171
0.1006 209.0 164692 2.8468 4.7997 17.8286
0.0994 210.0 165480 2.8426 4.7327 17.8571
0.0981 211.0 166268 2.8565 4.7288 17.8457
0.0985 212.0 167056 2.8608 4.8843 17.8457
0.0933 213.0 167844 2.8656 4.7052 17.8143
0.0963 214.0 168632 2.8650 4.8149 17.7771
0.092 215.0 169420 2.8569 4.6251 17.8
0.0958 216.0 170208 2.8688 4.7479 17.7714
0.094 217.0 170996 2.8657 4.7716 17.8229
0.0926 218.0 171784 2.8741 4.6749 17.8143
0.0924 219.0 172572 2.8727 4.8438 17.82
0.0932 220.0 173360 2.8749 4.6733 17.84
0.0899 221.0 174148 2.8774 4.6198 17.8286
0.0925 222.0 174936 2.8796 4.6945 17.8286
0.0904 223.0 175724 2.8872 4.6184 17.82
0.0886 224.0 176512 2.8974 4.74 17.7743
0.0898 225.0 177300 2.8879 4.5856 17.8229
0.0874 226.0 178088 2.8880 4.582 17.8171
0.0877 227.0 178876 2.8941 4.64 17.8057
0.0892 228.0 179664 2.8975 4.7271 17.8114
0.0857 229.0 180452 2.8957 4.6847 17.7943
0.088 230.0 181240 2.8950 4.7799 17.8086
0.0885 231.0 182028 2.9061 4.699 17.7829
0.0863 232.0 182816 2.9085 4.7863 17.7771
0.0853 233.0 183604 2.9083 4.7545 17.7857
0.0838 234.0 184392 2.9067 4.6354 17.7829
0.0835 235.0 185180 2.9139 4.5979 17.8371
0.0865 236.0 185968 2.9094 4.7646 17.8314
0.0853 237.0 186756 2.9127 4.6967 17.7971
0.082 238.0 187544 2.9205 4.7171 17.8029
0.0811 239.0 188332 2.9204 4.6172 17.7971
0.0837 240.0 189120 2.9202 4.6729 17.8057
0.0803 241.0 189908 2.9190 4.9057 17.8143
0.0813 242.0 190696 2.9236 4.7919 17.8429
0.0814 243.0 191484 2.9307 4.7492 17.8286
0.0822 244.0 192272 2.9238 4.7454 17.8429
0.0823 245.0 193060 2.9269 4.8462 17.8257
0.0803 246.0 193848 2.9293 4.738 17.8286
0.0806 247.0 194636 2.9280 4.8432 17.78
0.0757 248.0 195424 2.9371 4.8563 17.8171
0.0774 249.0 196212 2.9330 4.7717 17.8057
0.079 250.0 197000 2.9373 4.7938 17.8371
0.0784 251.0 197788 2.9397 4.8316 17.82
0.0801 252.0 198576 2.9378 4.9071 17.8314
0.0795 253.0 199364 2.9366 4.8581 17.8343
0.077 254.0 200152 2.9372 4.8495 17.7971
0.0787 255.0 200940 2.9447 4.8479 17.8086
0.077 256.0 201728 2.9380 4.8716 17.84
0.0765 257.0 202516 2.9410 4.8944 17.7571
0.0762 258.0 203304 2.9423 4.7536 17.7971
0.0772 259.0 204092 2.9485 4.8251 17.8343
0.0761 260.0 204880 2.9401 4.7726 17.82
0.0766 261.0 205668 2.9427 4.8626 17.8286
0.0766 262.0 206456 2.9428 5.0326 17.8143
0.074 263.0 207244 2.9463 5.0095 17.8286
0.0758 264.0 208032 2.9497 4.987 17.8029
0.0778 265.0 208820 2.9534 4.9829 17.8086
0.0748 266.0 209608 2.9521 4.9309 17.8286
0.0759 267.0 210396 2.9519 4.9294 17.84
0.0738 268.0 211184 2.9521 4.9953 17.8486
0.077 269.0 211972 2.9521 4.8414 17.8486
0.0759 270.0 212760 2.9533 4.8158 17.8286
0.0725 271.0 213548 2.9534 4.8427 17.8457
0.0749 272.0 214336 2.9512 4.8769 17.8314
0.0745 273.0 215124 2.9520 4.8782 17.8257
0.0723 274.0 215912 2.9546 4.8465 17.8229
0.0748 275.0 216700 2.9567 4.8704 17.8343
0.072 276.0 217488 2.9569 4.8633 17.8371
0.0747 277.0 218276 2.9578 4.8667 17.8457
0.0722 278.0 219064 2.9566 4.8686 17.8371
0.0733 279.0 219852 2.9563 4.846 17.84
0.0713 280.0 220640 2.9566 4.8513 17.84

Framework versions

  • Transformers 4.15.0
  • Pytorch 1.12.1+cu113
  • Datasets 2.3.2
  • Tokenizers 0.10.3
