Saving weights and logs of epoch 3 - step 6921
40adfb7
Epoch... (1/30 | Step: 10 | Loss: 3.4405534267425537, Learning Rate: 2.999609750986565e-05)
Epoch... (1/30 | Step: 20 | Loss: 3.218325614929199, Learning Rate: 2.9991762858117e-05)
Epoch... (1/30 | Step: 30 | Loss: 3.1018149852752686, Learning Rate: 2.998742820636835e-05)
Epoch... (1/30 | Step: 40 | Loss: 3.022020101547241, Learning Rate: 2.99830935546197e-05)
Epoch... (1/30 | Step: 50 | Loss: 2.981201648712158, Learning Rate: 2.997875890287105e-05)
Epoch... (1/30 | Step: 60 | Loss: 2.8424253463745117, Learning Rate: 2.9974426070111804e-05)
Epoch... (1/30 | Step: 70 | Loss: 2.907778263092041, Learning Rate: 2.9970091418363154e-05)
Epoch... (1/30 | Step: 80 | Loss: 2.7866015434265137, Learning Rate: 2.9965756766614504e-05)
Epoch... (1/30 | Step: 90 | Loss: 2.8242785930633545, Learning Rate: 2.996142029587645e-05)
Epoch... (1/30 | Step: 100 | Loss: 2.706552028656006, Learning Rate: 2.99570856441278e-05)
Epoch... (1/30 | Step: 110 | Loss: 2.611888885498047, Learning Rate: 2.995275099237915e-05)
Epoch... (1/30 | Step: 120 | Loss: 2.595040798187256, Learning Rate: 2.99484163406305e-05)
Epoch... (1/30 | Step: 130 | Loss: 2.6346092224121094, Learning Rate: 2.9944081688881852e-05)
Epoch... (1/30 | Step: 140 | Loss: 2.608229160308838, Learning Rate: 2.9939747037133202e-05)
Epoch... (1/30 | Step: 150 | Loss: 2.628932476043701, Learning Rate: 2.9935414204373956e-05)
Epoch... (1/30 | Step: 160 | Loss: 2.4869046211242676, Learning Rate: 2.9931079552625306e-05)
Epoch... (1/30 | Step: 170 | Loss: 2.5518672466278076, Learning Rate: 2.9926744900876656e-05)
Epoch... (1/30 | Step: 180 | Loss: 2.475501298904419, Learning Rate: 2.9922410249128006e-05)
Epoch... (1/30 | Step: 190 | Loss: 2.7231974601745605, Learning Rate: 2.9918073778389953e-05)
Epoch... (1/30 | Step: 200 | Loss: 2.500187873840332, Learning Rate: 2.9913739126641303e-05)
Epoch... (1/30 | Step: 210 | Loss: 2.5470097064971924, Learning Rate: 2.9909404474892654e-05)
Epoch... (1/30 | Step: 220 | Loss: 2.3819117546081543, Learning Rate: 2.9905069823144004e-05)
Epoch... (1/30 | Step: 230 | Loss: 2.5723557472229004, Learning Rate: 2.9900735171395354e-05)
Epoch... (1/30 | Step: 240 | Loss: 2.440537929534912, Learning Rate: 2.9896402338636108e-05)
Epoch... (1/30 | Step: 250 | Loss: 2.559695243835449, Learning Rate: 2.9892067686887458e-05)
Epoch... (1/30 | Step: 260 | Loss: 2.5182833671569824, Learning Rate: 2.9887733035138808e-05)
Epoch... (1/30 | Step: 270 | Loss: 2.405858039855957, Learning Rate: 2.988339838339016e-05)
Epoch... (1/30 | Step: 280 | Loss: 2.4234917163848877, Learning Rate: 2.987906373164151e-05)
Epoch... (1/30 | Step: 290 | Loss: 2.36462140083313, Learning Rate: 2.9874727260903455e-05)
Epoch... (1/30 | Step: 300 | Loss: 2.4654769897460938, Learning Rate: 2.9870392609154806e-05)
Epoch... (1/30 | Step: 300 | Eval Loss: 2.3369832038879395 | Eval rouge1: 36.6481 | Eval rouge2: 12.0172 | Eval rougeL: 33.4031 | Eval rougeLsum: 33.4031 | Eval gen_len: 10.6758 |)
Epoch... (1/30 | Step: 310 | Loss: 2.2441658973693848, Learning Rate: 2.9866057957406156e-05)
Epoch... (1/30 | Step: 320 | Loss: 2.381657361984253, Learning Rate: 2.986172512464691e-05)
Epoch... (1/30 | Step: 330 | Loss: 2.39951753616333, Learning Rate: 2.985739047289826e-05)
Epoch... (1/30 | Step: 340 | Loss: 2.4004015922546387, Learning Rate: 2.985305582114961e-05)
Epoch... (1/30 | Step: 350 | Loss: 2.3319690227508545, Learning Rate: 2.984872116940096e-05)
Epoch... (1/30 | Step: 360 | Loss: 2.3237192630767822, Learning Rate: 2.984438651765231e-05)
Epoch... (1/30 | Step: 370 | Loss: 2.381218671798706, Learning Rate: 2.984005186590366e-05)
Epoch... (1/30 | Step: 380 | Loss: 2.309722900390625, Learning Rate: 2.9835715395165607e-05)
Epoch... (1/30 | Step: 390 | Loss: 2.3941807746887207, Learning Rate: 2.9831380743416958e-05)
Epoch... (1/30 | Step: 400 | Loss: 2.3451006412506104, Learning Rate: 2.9827046091668308e-05)
Epoch... (1/30 | Step: 410 | Loss: 2.278620719909668, Learning Rate: 2.982271325890906e-05)
Epoch... (1/30 | Step: 420 | Loss: 2.258894920349121, Learning Rate: 2.9818378607160412e-05)
Epoch... (1/30 | Step: 430 | Loss: 2.334801197052002, Learning Rate: 2.9814043955411762e-05)
Epoch... (1/30 | Step: 440 | Loss: 2.358175754547119, Learning Rate: 2.9809709303663112e-05)
Epoch... (1/30 | Step: 450 | Loss: 2.342679977416992, Learning Rate: 2.9805374651914462e-05)
Epoch... (1/30 | Step: 460 | Loss: 2.3427581787109375, Learning Rate: 2.9801040000165813e-05)
Epoch... (1/30 | Step: 470 | Loss: 2.2662670612335205, Learning Rate: 2.9796705348417163e-05)
Epoch... (1/30 | Step: 480 | Loss: 2.3363449573516846, Learning Rate: 2.979236887767911e-05)
Epoch... (1/30 | Step: 490 | Loss: 2.3524205684661865, Learning Rate: 2.978803422593046e-05)
Epoch... (1/30 | Step: 500 | Loss: 2.33699369430542, Learning Rate: 2.9783701393171214e-05)
Epoch... (1/30 | Step: 510 | Loss: 2.254800319671631, Learning Rate: 2.9779366741422564e-05)
Epoch... (1/30 | Step: 520 | Loss: 2.2564821243286133, Learning Rate: 2.9775032089673914e-05)
Epoch... (1/30 | Step: 530 | Loss: 2.312403678894043, Learning Rate: 2.9770697437925264e-05)
Epoch... (1/30 | Step: 540 | Loss: 2.361353874206543, Learning Rate: 2.9766362786176614e-05)
Epoch... (1/30 | Step: 550 | Loss: 2.231563091278076, Learning Rate: 2.9762028134427965e-05)
Epoch... (1/30 | Step: 560 | Loss: 2.23984956741333, Learning Rate: 2.9757693482679315e-05)
Epoch... (1/30 | Step: 570 | Loss: 2.294980049133301, Learning Rate: 2.9753358830930665e-05)
Epoch... (1/30 | Step: 580 | Loss: 2.234550952911377, Learning Rate: 2.9749022360192612e-05)
Epoch... (1/30 | Step: 590 | Loss: 2.2543816566467285, Learning Rate: 2.9744689527433366e-05)
Epoch... (1/30 | Step: 600 | Loss: 2.249704360961914, Learning Rate: 2.9740354875684716e-05)
Epoch... (1/30 | Step: 600 | Eval Loss: 2.2133584022521973 | Eval rouge1: 38.2794 | Eval rouge2: 13.1501 | Eval rougeL: 34.8961 | Eval rougeLsum: 34.8948 | Eval gen_len: 11.0128 |)
Epoch... (1/30 | Step: 610 | Loss: 2.2616004943847656, Learning Rate: 2.9736020223936066e-05)
Epoch... (1/30 | Step: 620 | Loss: 2.280752658843994, Learning Rate: 2.9731685572187416e-05)
Epoch... (1/30 | Step: 630 | Loss: 2.1695902347564697, Learning Rate: 2.9727350920438766e-05)
Epoch... (1/30 | Step: 640 | Loss: 2.3159074783325195, Learning Rate: 2.9723016268690117e-05)
Epoch... (1/30 | Step: 650 | Loss: 2.2354726791381836, Learning Rate: 2.9718681616941467e-05)
Epoch... (1/30 | Step: 660 | Loss: 2.2967095375061035, Learning Rate: 2.9714346965192817e-05)
Epoch... (1/30 | Step: 670 | Loss: 2.3010551929473877, Learning Rate: 2.9710012313444167e-05)
Epoch... (1/30 | Step: 680 | Loss: 2.292668342590332, Learning Rate: 2.9705677661695518e-05)
Epoch... (1/30 | Step: 690 | Loss: 2.195081949234009, Learning Rate: 2.9701343009946868e-05)
Epoch... (1/30 | Step: 700 | Loss: 2.296633720397949, Learning Rate: 2.9697008358198218e-05)
Epoch... (1/30 | Step: 710 | Loss: 2.149764060974121, Learning Rate: 2.9692673706449568e-05)
Epoch... (1/30 | Step: 720 | Loss: 2.2461729049682617, Learning Rate: 2.968833905470092e-05)
Epoch... (1/30 | Step: 730 | Loss: 2.2976291179656982, Learning Rate: 2.968400440295227e-05)
Epoch... (1/30 | Step: 740 | Loss: 2.2700982093811035, Learning Rate: 2.967966975120362e-05)
Epoch... (1/30 | Step: 750 | Loss: 2.2898383140563965, Learning Rate: 2.967533509945497e-05)
Epoch... (1/30 | Step: 760 | Loss: 2.2785892486572266, Learning Rate: 2.967100044770632e-05)
Epoch... (1/30 | Step: 770 | Loss: 2.1977713108062744, Learning Rate: 2.966666579595767e-05)
Epoch... (1/30 | Step: 780 | Loss: 2.214864730834961, Learning Rate: 2.966233114420902e-05)
Epoch... (1/30 | Step: 790 | Loss: 2.2334184646606445, Learning Rate: 2.965799649246037e-05)
Epoch... (1/30 | Step: 800 | Loss: 2.2037973403930664, Learning Rate: 2.965366184071172e-05)
Epoch... (1/30 | Step: 810 | Loss: 2.174184560775757, Learning Rate: 2.964932718896307e-05)
Epoch... (1/30 | Step: 820 | Loss: 2.2716355323791504, Learning Rate: 2.964499253721442e-05)
Epoch... (1/30 | Step: 830 | Loss: 2.193842887878418, Learning Rate: 2.964065788546577e-05)
Epoch... (1/30 | Step: 840 | Loss: 2.249634265899658, Learning Rate: 2.963632323371712e-05)
Epoch... (1/30 | Step: 850 | Loss: 2.237217426300049, Learning Rate: 2.963198858196847e-05)
Epoch... (1/30 | Step: 860 | Loss: 2.172455310821533, Learning Rate: 2.9627655749209225e-05)
Epoch... (1/30 | Step: 870 | Loss: 2.05983829498291, Learning Rate: 2.9623319278471172e-05)
Epoch... (1/30 | Step: 880 | Loss: 2.3632073402404785, Learning Rate: 2.9618984626722522e-05)
Epoch... (1/30 | Step: 890 | Loss: 2.254265785217285, Learning Rate: 2.9614649974973872e-05)
Epoch... (1/30 | Step: 900 | Loss: 2.2401223182678223, Learning Rate: 2.9610315323225223e-05)
Epoch... (1/30 | Step: 900 | Eval Loss: 2.152062177658081 | Eval rouge1: 39.5335 | Eval rouge2: 14.3557 | Eval rougeL: 35.8974 | Eval rougeLsum: 35.9057 | Eval gen_len: 10.8698 |)
Epoch... (1/30 | Step: 910 | Loss: 2.145946979522705, Learning Rate: 2.9605980671476573e-05)
Epoch... (1/30 | Step: 920 | Loss: 2.0916032791137695, Learning Rate: 2.9601646019727923e-05)
Epoch... (1/30 | Step: 930 | Loss: 2.1092920303344727, Learning Rate: 2.9597311367979273e-05)
Epoch... (1/30 | Step: 940 | Loss: 2.2093448638916016, Learning Rate: 2.9592976716230623e-05)
Epoch... (1/30 | Step: 950 | Loss: 2.1340670585632324, Learning Rate: 2.9588643883471377e-05)
Epoch... (1/30 | Step: 960 | Loss: 2.104341506958008, Learning Rate: 2.9584309231722727e-05)
Epoch... (1/30 | Step: 970 | Loss: 2.1689233779907227, Learning Rate: 2.9579972760984674e-05)
Epoch... (1/30 | Step: 980 | Loss: 2.1623427867889404, Learning Rate: 2.9575638109236024e-05)
Epoch... (1/30 | Step: 990 | Loss: 2.050921678543091, Learning Rate: 2.9571303457487375e-05)
Epoch... (1/30 | Step: 1000 | Loss: 2.2413523197174072, Learning Rate: 2.9566968805738725e-05)
Epoch... (1/30 | Step: 1010 | Loss: 2.143608570098877, Learning Rate: 2.9562634153990075e-05)
Epoch... (1/30 | Step: 1020 | Loss: 2.1761255264282227, Learning Rate: 2.9558299502241425e-05)
Epoch... (1/30 | Step: 1030 | Loss: 2.2119503021240234, Learning Rate: 2.955396666948218e-05)
Epoch... (1/30 | Step: 1040 | Loss: 2.071683645248413, Learning Rate: 2.954963201773353e-05)
Epoch... (1/30 | Step: 1050 | Loss: 2.2042810916900635, Learning Rate: 2.954529736598488e-05)
Epoch... (1/30 | Step: 1060 | Loss: 2.1775331497192383, Learning Rate: 2.954096271423623e-05)
Epoch... (1/30 | Step: 1070 | Loss: 2.0984702110290527, Learning Rate: 2.9536626243498176e-05)
Epoch... (1/30 | Step: 1080 | Loss: 2.1763856410980225, Learning Rate: 2.9532291591749527e-05)
Epoch... (1/30 | Step: 1090 | Loss: 2.2860050201416016, Learning Rate: 2.9527956940000877e-05)
Epoch... (1/30 | Step: 1100 | Loss: 2.125678062438965, Learning Rate: 2.9523622288252227e-05)
Epoch... (1/30 | Step: 1110 | Loss: 2.127748727798462, Learning Rate: 2.9519287636503577e-05)
Epoch... (1/30 | Step: 1120 | Loss: 2.092984199523926, Learning Rate: 2.951495480374433e-05)
Epoch... (1/30 | Step: 1130 | Loss: 2.1310806274414062, Learning Rate: 2.951062015199568e-05)
Epoch... (1/30 | Step: 1140 | Loss: 2.1979918479919434, Learning Rate: 2.950628550024703e-05)
Epoch... (1/30 | Step: 1150 | Loss: 2.229048013687134, Learning Rate: 2.950195084849838e-05)
Epoch... (1/30 | Step: 1160 | Loss: 2.143617630004883, Learning Rate: 2.949761437776033e-05)
Epoch... (1/30 | Step: 1170 | Loss: 2.162456750869751, Learning Rate: 2.949327972601168e-05)
Epoch... (1/30 | Step: 1180 | Loss: 2.1484286785125732, Learning Rate: 2.948894507426303e-05)
Epoch... (1/30 | Step: 1190 | Loss: 2.19675350189209, Learning Rate: 2.948461042251438e-05)
Epoch... (1/30 | Step: 1200 | Loss: 2.069185972213745, Learning Rate: 2.948027577076573e-05)
Epoch... (1/30 | Step: 1200 | Eval Loss: 2.118244171142578 | Eval rouge1: 39.626 | Eval rouge2: 14.2226 | Eval rougeL: 36.0901 | Eval rougeLsum: 36.0902 | Eval gen_len: 10.9209 |)
Epoch... (1/30 | Step: 1210 | Loss: 2.143256664276123, Learning Rate: 2.9475942938006483e-05)
Epoch... (1/30 | Step: 1220 | Loss: 2.1436400413513184, Learning Rate: 2.9471608286257833e-05)
Epoch... (1/30 | Step: 1230 | Loss: 2.2154154777526855, Learning Rate: 2.9467273634509183e-05)
Epoch... (1/30 | Step: 1240 | Loss: 2.1441659927368164, Learning Rate: 2.9462938982760534e-05)
Epoch... (1/30 | Step: 1250 | Loss: 2.174199104309082, Learning Rate: 2.9458604331011884e-05)
Epoch... (1/30 | Step: 1260 | Loss: 2.1268279552459717, Learning Rate: 2.945426786027383e-05)
Epoch... (1/30 | Step: 1270 | Loss: 2.126941204071045, Learning Rate: 2.944993320852518e-05)
Epoch... (1/30 | Step: 1280 | Loss: 2.119166612625122, Learning Rate: 2.944559855677653e-05)
Epoch... (1/30 | Step: 1290 | Loss: 2.2846920490264893, Learning Rate: 2.944126390502788e-05)
Epoch... (1/30 | Step: 1300 | Loss: 2.1685166358947754, Learning Rate: 2.9436931072268635e-05)
Epoch... (1/30 | Step: 1310 | Loss: 2.151987314224243, Learning Rate: 2.9432596420519985e-05)
Epoch... (1/30 | Step: 1320 | Loss: 2.103717565536499, Learning Rate: 2.9428261768771335e-05)
Epoch... (1/30 | Step: 1330 | Loss: 2.155966281890869, Learning Rate: 2.9423927117022686e-05)
Epoch... (1/30 | Step: 1340 | Loss: 2.1677801609039307, Learning Rate: 2.9419592465274036e-05)
Epoch... (1/30 | Step: 1350 | Loss: 2.143979549407959, Learning Rate: 2.9415255994535983e-05)
Epoch... (1/30 | Step: 1360 | Loss: 2.229569911956787, Learning Rate: 2.9410921342787333e-05)
Epoch... (1/30 | Step: 1370 | Loss: 2.0859322547912598, Learning Rate: 2.9406586691038683e-05)
Epoch... (1/30 | Step: 1380 | Loss: 2.2380738258361816, Learning Rate: 2.9402252039290033e-05)
Epoch... (1/30 | Step: 1390 | Loss: 2.10669207572937, Learning Rate: 2.9397919206530787e-05)
Epoch... (1/30 | Step: 1400 | Loss: 2.1286675930023193, Learning Rate: 2.9393584554782137e-05)
Epoch... (1/30 | Step: 1410 | Loss: 2.140237331390381, Learning Rate: 2.9389249903033487e-05)
Epoch... (1/30 | Step: 1420 | Loss: 2.081178665161133, Learning Rate: 2.9384915251284838e-05)
Epoch... (1/30 | Step: 1430 | Loss: 2.0578155517578125, Learning Rate: 2.9380580599536188e-05)
Epoch... (1/30 | Step: 1440 | Loss: 2.082831859588623, Learning Rate: 2.9376245947787538e-05)
Epoch... (1/30 | Step: 1450 | Loss: 2.1357812881469727, Learning Rate: 2.937191129603889e-05)
Epoch... (1/30 | Step: 1460 | Loss: 2.164750576019287, Learning Rate: 2.9367574825300835e-05)
Epoch... (1/30 | Step: 1470 | Loss: 2.0534393787384033, Learning Rate: 2.9363240173552185e-05)
Epoch... (1/30 | Step: 1480 | Loss: 2.1811447143554688, Learning Rate: 2.935890734079294e-05)
Epoch... (1/30 | Step: 1490 | Loss: 2.1194841861724854, Learning Rate: 2.935457268904429e-05)
Epoch... (1/30 | Step: 1500 | Loss: 2.0982208251953125, Learning Rate: 2.935023803729564e-05)
Epoch... (1/30 | Step: 1500 | Eval Loss: 2.087283134460449 | Eval rouge1: 39.7247 | Eval rouge2: 14.3773 | Eval rougeL: 36.2126 | Eval rougeLsum: 36.2124 | Eval gen_len: 10.922 |)
Epoch... (1/30 | Step: 1510 | Loss: 2.0628573894500732, Learning Rate: 2.934590338554699e-05)
Epoch... (1/30 | Step: 1520 | Loss: 2.0424842834472656, Learning Rate: 2.934156873379834e-05)
Epoch... (1/30 | Step: 1530 | Loss: 2.157275676727295, Learning Rate: 2.933723408204969e-05)
Epoch... (1/30 | Step: 1540 | Loss: 2.1352427005767822, Learning Rate: 2.933289943030104e-05)
Epoch... (1/30 | Step: 1550 | Loss: 2.1653575897216797, Learning Rate: 2.9328562959562987e-05)
Epoch... (1/30 | Step: 1560 | Loss: 2.0672290325164795, Learning Rate: 2.9324228307814337e-05)
Epoch... (1/30 | Step: 1570 | Loss: 2.097109794616699, Learning Rate: 2.931989547505509e-05)
Epoch... (1/30 | Step: 1580 | Loss: 2.087357997894287, Learning Rate: 2.931556082330644e-05)
Epoch... (1/30 | Step: 1590 | Loss: 2.1381149291992188, Learning Rate: 2.931122617155779e-05)
Epoch... (1/30 | Step: 1600 | Loss: 2.1855034828186035, Learning Rate: 2.9306891519809142e-05)
Epoch... (1/30 | Step: 1610 | Loss: 2.183502674102783, Learning Rate: 2.9302556868060492e-05)
Epoch... (1/30 | Step: 1620 | Loss: 2.1278882026672363, Learning Rate: 2.9298222216311842e-05)
Epoch... (1/30 | Step: 1630 | Loss: 2.086331605911255, Learning Rate: 2.9293887564563192e-05)
Epoch... (1/30 | Step: 1640 | Loss: 2.0485429763793945, Learning Rate: 2.9289552912814543e-05)
Epoch... (1/30 | Step: 1650 | Loss: 2.0960192680358887, Learning Rate: 2.9285218261065893e-05)
Epoch... (1/30 | Step: 1660 | Loss: 2.078531265258789, Learning Rate: 2.9280883609317243e-05)
Epoch... (1/30 | Step: 1670 | Loss: 2.1528992652893066, Learning Rate: 2.9276548957568593e-05)
Epoch... (1/30 | Step: 1680 | Loss: 2.044875144958496, Learning Rate: 2.9272214305819944e-05)
Epoch... (1/30 | Step: 1690 | Loss: 2.1002728939056396, Learning Rate: 2.9267879654071294e-05)
Epoch... (1/30 | Step: 1700 | Loss: 2.0666818618774414, Learning Rate: 2.9263545002322644e-05)
Epoch... (1/30 | Step: 1710 | Loss: 2.076720714569092, Learning Rate: 2.9259210350573994e-05)
Epoch... (1/30 | Step: 1720 | Loss: 2.0573716163635254, Learning Rate: 2.9254875698825344e-05)
Epoch... (1/30 | Step: 1730 | Loss: 2.0897603034973145, Learning Rate: 2.9250541047076695e-05)
Epoch... (1/30 | Step: 1740 | Loss: 2.141058921813965, Learning Rate: 2.9246206395328045e-05)
Epoch... (1/30 | Step: 1750 | Loss: 2.1023364067077637, Learning Rate: 2.9241871743579395e-05)
Epoch... (1/30 | Step: 1760 | Loss: 2.1418118476867676, Learning Rate: 2.9237537091830745e-05)
Epoch... (1/30 | Step: 1770 | Loss: 2.246041774749756, Learning Rate: 2.9233202440082096e-05)
Epoch... (1/30 | Step: 1780 | Loss: 2.1305742263793945, Learning Rate: 2.9228867788333446e-05)
Epoch... (1/30 | Step: 1790 | Loss: 2.1510705947875977, Learning Rate: 2.9224533136584796e-05)
Epoch... (1/30 | Step: 1800 | Loss: 2.2788705825805664, Learning Rate: 2.9220198484836146e-05)
Epoch... (1/30 | Step: 1800 | Eval Loss: 2.0597662925720215 | Eval rouge1: 40.5653 | Eval rouge2: 15.0792 | Eval rougeL: 36.9261 | Eval rougeLsum: 36.9178 | Eval gen_len: 10.8504 |)
Epoch... (1/30 | Step: 1810 | Loss: 2.1530094146728516, Learning Rate: 2.9215863833087496e-05)
Epoch... (1/30 | Step: 1820 | Loss: 2.216090679168701, Learning Rate: 2.9211529181338847e-05)
Epoch... (1/30 | Step: 1830 | Loss: 2.138352155685425, Learning Rate: 2.92071963485796e-05)
Epoch... (1/30 | Step: 1840 | Loss: 2.0977377891540527, Learning Rate: 2.9202859877841547e-05)
Epoch... (1/30 | Step: 1850 | Loss: 2.1130166053771973, Learning Rate: 2.9198525226092897e-05)
Epoch... (1/30 | Step: 1860 | Loss: 1.9936583042144775, Learning Rate: 2.9194190574344248e-05)
Epoch... (1/30 | Step: 1870 | Loss: 2.0906026363372803, Learning Rate: 2.9189855922595598e-05)
Epoch... (1/30 | Step: 1880 | Loss: 2.1219160556793213, Learning Rate: 2.9185521270846948e-05)
Epoch... (1/30 | Step: 1890 | Loss: 2.136955738067627, Learning Rate: 2.9181186619098298e-05)
Epoch... (1/30 | Step: 1900 | Loss: 2.098308563232422, Learning Rate: 2.917685196734965e-05)
Epoch... (1/30 | Step: 1910 | Loss: 2.095719337463379, Learning Rate: 2.9172517315601e-05)
Epoch... (1/30 | Step: 1920 | Loss: 2.089087724685669, Learning Rate: 2.9168184482841752e-05)
Epoch... (1/30 | Step: 1930 | Loss: 2.167004346847534, Learning Rate: 2.9163849831093103e-05)
Epoch... (1/30 | Step: 1940 | Loss: 1.9983344078063965, Learning Rate: 2.915951336035505e-05)
Epoch... (1/30 | Step: 1950 | Loss: 2.1173758506774902, Learning Rate: 2.91551787086064e-05)
Epoch... (1/30 | Step: 1960 | Loss: 2.0237390995025635, Learning Rate: 2.915084405685775e-05)
Epoch... (1/30 | Step: 1970 | Loss: 2.0724878311157227, Learning Rate: 2.91465094051091e-05)
Epoch... (1/30 | Step: 1980 | Loss: 2.0563502311706543, Learning Rate: 2.914217475336045e-05)
Epoch... (1/30 | Step: 1990 | Loss: 2.088345527648926, Learning Rate: 2.91378401016118e-05)
Epoch... (1/30 | Step: 2000 | Loss: 2.14585542678833, Learning Rate: 2.913350544986315e-05)
Epoch... (1/30 | Step: 2010 | Loss: 2.0599942207336426, Learning Rate: 2.9129172617103904e-05)
Epoch... (1/30 | Step: 2020 | Loss: 2.049421787261963, Learning Rate: 2.9124837965355255e-05)
Epoch... (1/30 | Step: 2030 | Loss: 2.032505989074707, Learning Rate: 2.9120503313606605e-05)
Epoch... (1/30 | Step: 2040 | Loss: 2.1009111404418945, Learning Rate: 2.911616684286855e-05)
Epoch... (1/30 | Step: 2050 | Loss: 2.0961179733276367, Learning Rate: 2.9111832191119902e-05)
Epoch... (1/30 | Step: 2060 | Loss: 2.0474748611450195, Learning Rate: 2.9107497539371252e-05)
Epoch... (1/30 | Step: 2070 | Loss: 2.1285176277160645, Learning Rate: 2.9103162887622602e-05)
Epoch... (1/30 | Step: 2080 | Loss: 2.0173821449279785, Learning Rate: 2.9098828235873953e-05)
Epoch... (1/30 | Step: 2090 | Loss: 2.1344692707061768, Learning Rate: 2.9094493584125303e-05)
Epoch... (1/30 | Step: 2100 | Loss: 2.0788259506225586, Learning Rate: 2.9090160751366057e-05)
Epoch... (1/30 | Step: 2100 | Eval Loss: 2.037459373474121 | Eval rouge1: 40.5198 | Eval rouge2: 15.162 | Eval rougeL: 36.9107 | Eval rougeLsum: 36.9123 | Eval gen_len: 10.8398 |)
Epoch... (1/30 | Step: 2110 | Loss: 2.0843522548675537, Learning Rate: 2.9085826099617407e-05)
Epoch... (1/30 | Step: 2120 | Loss: 2.0523273944854736, Learning Rate: 2.9081491447868757e-05)
Epoch... (1/30 | Step: 2130 | Loss: 2.0692148208618164, Learning Rate: 2.9077154977130704e-05)
Epoch... (1/30 | Step: 2140 | Loss: 2.0720198154449463, Learning Rate: 2.9072820325382054e-05)
Epoch... (1/30 | Step: 2150 | Loss: 2.018003463745117, Learning Rate: 2.9068485673633404e-05)
Epoch... (1/30 | Step: 2160 | Loss: 2.0908870697021484, Learning Rate: 2.9064151021884754e-05)
Epoch... (1/30 | Step: 2170 | Loss: 1.9835797548294067, Learning Rate: 2.9059816370136105e-05)
Epoch... (1/30 | Step: 2180 | Loss: 2.033381462097168, Learning Rate: 2.9055481718387455e-05)
Epoch... (1/30 | Step: 2190 | Loss: 2.0725207328796387, Learning Rate: 2.905114888562821e-05)
Epoch... (1/30 | Step: 2200 | Loss: 2.0173754692077637, Learning Rate: 2.904681423387956e-05)
Epoch... (1/30 | Step: 2210 | Loss: 2.1233925819396973, Learning Rate: 2.904247958213091e-05)
Epoch... (1/30 | Step: 2220 | Loss: 2.0552401542663574, Learning Rate: 2.903814493038226e-05)
Epoch... (1/30 | Step: 2230 | Loss: 2.046525001525879, Learning Rate: 2.9033808459644206e-05)
Epoch... (1/30 | Step: 2240 | Loss: 2.064979076385498, Learning Rate: 2.9029473807895556e-05)
Epoch... (1/30 | Step: 2250 | Loss: 2.0798213481903076, Learning Rate: 2.9025139156146906e-05)
Epoch... (1/30 | Step: 2260 | Loss: 2.0388433933258057, Learning Rate: 2.9020804504398257e-05)
Epoch... (1/30 | Step: 2270 | Loss: 2.0377979278564453, Learning Rate: 2.901647167163901e-05)
Epoch... (1/30 | Step: 2280 | Loss: 2.0889925956726074, Learning Rate: 2.901213701989036e-05)
Epoch... (1/30 | Step: 2290 | Loss: 2.05222749710083, Learning Rate: 2.900780236814171e-05)
Epoch... (1/30 | Step: 2300 | Loss: 2.033143997192383, Learning Rate: 2.900346771639306e-05)
Epoch... (1/30 | Step: 2307 | Loss: 2.164933443069458, Learning Rate: 2.9000433642067946e-05)
Epoch... (1/30 | Step: 2307 | Eval Loss: 2.0284838676452637 | Eval rouge1: 40.5299 | Eval rouge2: 15.057 | Eval rougeL: 36.9003 | Eval rougeLsum: 36.8978 | Eval gen_len: 10.9925 |)
Epoch... (2/30 | Step: 2310 | Loss: 1.9816746711730957, Learning Rate: 2.899913306464441e-05)
Epoch... (2/30 | Step: 2320 | Loss: 1.9637086391448975, Learning Rate: 2.8994796593906358e-05)
Epoch... (2/30 | Step: 2330 | Loss: 1.9946067333221436, Learning Rate: 2.8990461942157708e-05)
Epoch... (2/30 | Step: 2340 | Loss: 1.9579181671142578, Learning Rate: 2.898612729040906e-05)
Epoch... (2/30 | Step: 2350 | Loss: 2.0120954513549805, Learning Rate: 2.898179263866041e-05)
Epoch... (2/30 | Step: 2360 | Loss: 1.933640480041504, Learning Rate: 2.8977459805901162e-05)
Epoch... (2/30 | Step: 2370 | Loss: 2.001372814178467, Learning Rate: 2.8973125154152513e-05)
Epoch... (2/30 | Step: 2380 | Loss: 1.9272139072418213, Learning Rate: 2.8968790502403863e-05)
Epoch... (2/30 | Step: 2390 | Loss: 2.0162031650543213, Learning Rate: 2.8964455850655213e-05)
Epoch... (2/30 | Step: 2400 | Loss: 1.9704089164733887, Learning Rate: 2.8960121198906563e-05)
Epoch... (2/30 | Step: 2400 | Eval Loss: 2.024594306945801 | Eval rouge1: 40.4764 | Eval rouge2: 14.9051 | Eval rougeL: 36.8785 | Eval rougeLsum: 36.8769 | Eval gen_len: 10.8998 |)
Epoch... (2/30 | Step: 2410 | Loss: 1.992959976196289, Learning Rate: 2.8955786547157913e-05)
Epoch... (2/30 | Step: 2420 | Loss: 1.9835933446884155, Learning Rate: 2.8951451895409264e-05)
Epoch... (2/30 | Step: 2430 | Loss: 2.101801872253418, Learning Rate: 2.894711542467121e-05)
Epoch... (2/30 | Step: 2440 | Loss: 2.038966178894043, Learning Rate: 2.894278077292256e-05)
Epoch... (2/30 | Step: 2450 | Loss: 1.9256877899169922, Learning Rate: 2.8938447940163314e-05)
Epoch... (2/30 | Step: 2460 | Loss: 1.9768502712249756, Learning Rate: 2.8934113288414665e-05)
Epoch... (2/30 | Step: 2470 | Loss: 1.9429033994674683, Learning Rate: 2.8929778636666015e-05)
Epoch... (2/30 | Step: 2480 | Loss: 2.0289816856384277, Learning Rate: 2.8925443984917365e-05)
Epoch... (2/30 | Step: 2490 | Loss: 1.983972191810608, Learning Rate: 2.8921109333168715e-05)
Epoch... (2/30 | Step: 2500 | Loss: 2.0486741065979004, Learning Rate: 2.8916774681420065e-05)
Epoch... (2/30 | Step: 2510 | Loss: 1.9563899040222168, Learning Rate: 2.8912440029671416e-05)
Epoch... (2/30 | Step: 2520 | Loss: 2.0897140502929688, Learning Rate: 2.8908103558933362e-05)
Epoch... (2/30 | Step: 2530 | Loss: 2.080152988433838, Learning Rate: 2.8903768907184713e-05)
Epoch... (2/30 | Step: 2540 | Loss: 1.9517621994018555, Learning Rate: 2.8899436074425466e-05)
Epoch... (2/30 | Step: 2550 | Loss: 2.0202088356018066, Learning Rate: 2.8895101422676817e-05)
Epoch... (2/30 | Step: 2560 | Loss: 1.9686460494995117, Learning Rate: 2.8890766770928167e-05)
Epoch... (2/30 | Step: 2570 | Loss: 1.897174596786499, Learning Rate: 2.8886432119179517e-05)
Epoch... (2/30 | Step: 2580 | Loss: 1.9737465381622314, Learning Rate: 2.8882097467430867e-05)
Epoch... (2/30 | Step: 2590 | Loss: 2.0247108936309814, Learning Rate: 2.8877762815682217e-05)
Epoch... (2/30 | Step: 2600 | Loss: 2.019479990005493, Learning Rate: 2.8873428163933568e-05)
Epoch... (2/30 | Step: 2610 | Loss: 1.921512246131897, Learning Rate: 2.8869093512184918e-05)
Epoch... (2/30 | Step: 2620 | Loss: 2.002574920654297, Learning Rate: 2.8864757041446865e-05)
Epoch... (2/30 | Step: 2630 | Loss: 1.9710190296173096, Learning Rate: 2.886042420868762e-05)
Epoch... (2/30 | Step: 2640 | Loss: 2.0847268104553223, Learning Rate: 2.885608955693897e-05)
Epoch... (2/30 | Step: 2650 | Loss: 1.9489305019378662, Learning Rate: 2.885175490519032e-05)
Epoch... (2/30 | Step: 2660 | Loss: 1.9855422973632812, Learning Rate: 2.884742025344167e-05)
Epoch... (2/30 | Step: 2670 | Loss: 2.001290798187256, Learning Rate: 2.884308560169302e-05)
Epoch... (2/30 | Step: 2680 | Loss: 1.9203612804412842, Learning Rate: 2.883875094994437e-05)
Epoch... (2/30 | Step: 2690 | Loss: 1.928293228149414, Learning Rate: 2.883441629819572e-05)
Epoch... (2/30 | Step: 2700 | Loss: 2.0274925231933594, Learning Rate: 2.883008164644707e-05)
Epoch... (2/30 | Step: 2700 | Eval Loss: 2.009991407394409 | Eval rouge1: 40.5888 | Eval rouge2: 15.0982 | Eval rougeL: 36.9127 | Eval rougeLsum: 36.9165 | Eval gen_len: 10.6976 |)
Epoch... (2/30 | Step: 2710 | Loss: 1.939640760421753, Learning Rate: 2.8825745175709017e-05)
Epoch... (2/30 | Step: 2720 | Loss: 1.9807898998260498, Learning Rate: 2.882141234294977e-05)
Epoch... (2/30 | Step: 2730 | Loss: 1.9780454635620117, Learning Rate: 2.881707769120112e-05)
Epoch... (2/30 | Step: 2740 | Loss: 1.9607644081115723, Learning Rate: 2.881274303945247e-05)
Epoch... (2/30 | Step: 2750 | Loss: 1.9642720222473145, Learning Rate: 2.880840838770382e-05)
Epoch... (2/30 | Step: 2760 | Loss: 2.0490057468414307, Learning Rate: 2.880407373595517e-05)
Epoch... (2/30 | Step: 2770 | Loss: 1.982258915901184, Learning Rate: 2.879973908420652e-05)
Epoch... (2/30 | Step: 2780 | Loss: 1.9002817869186401, Learning Rate: 2.8795404432457872e-05)
Epoch... (2/30 | Step: 2790 | Loss: 1.9543405771255493, Learning Rate: 2.8791069780709222e-05)
Epoch... (2/30 | Step: 2800 | Loss: 1.924607753753662, Learning Rate: 2.8786735128960572e-05)
Epoch... (2/30 | Step: 2810 | Loss: 1.9794992208480835, Learning Rate: 2.8782402296201326e-05)
Epoch... (2/30 | Step: 2820 | Loss: 1.9884564876556396, Learning Rate: 2.8778065825463273e-05)
Epoch... (2/30 | Step: 2830 | Loss: 2.0604281425476074, Learning Rate: 2.8773731173714623e-05)
Epoch... (2/30 | Step: 2840 | Loss: 1.9901032447814941, Learning Rate: 2.8769396521965973e-05)
Epoch... (2/30 | Step: 2850 | Loss: 1.9847352504730225, Learning Rate: 2.8765061870217323e-05)
Epoch... (2/30 | Step: 2860 | Loss: 1.9160382747650146, Learning Rate: 2.8760727218468674e-05)
Epoch... (2/30 | Step: 2870 | Loss: 2.013071060180664, Learning Rate: 2.8756392566720024e-05)
Epoch... (2/30 | Step: 2880 | Loss: 2.004521369934082, Learning Rate: 2.8752057914971374e-05)
Epoch... (2/30 | Step: 2890 | Loss: 1.962958812713623, Learning Rate: 2.8747723263222724e-05)
Epoch... (2/30 | Step: 2900 | Loss: 2.002131223678589, Learning Rate: 2.8743390430463478e-05)
Epoch... (2/30 | Step: 2910 | Loss: 1.8968100547790527, Learning Rate: 2.8739053959725425e-05)
Epoch... (2/30 | Step: 2920 | Loss: 1.9270609617233276, Learning Rate: 2.8734719307976775e-05)
Epoch... (2/30 | Step: 2930 | Loss: 1.918048620223999, Learning Rate: 2.8730384656228125e-05)
Epoch... (2/30 | Step: 2940 | Loss: 2.055185317993164, Learning Rate: 2.8726050004479475e-05)
Epoch... (2/30 | Step: 2950 | Loss: 1.998548984527588, Learning Rate: 2.8721715352730826e-05)
Epoch... (2/30 | Step: 2960 | Loss: 1.9728754758834839, Learning Rate: 2.8717380700982176e-05)
Epoch... (2/30 | Step: 2970 | Loss: 2.0119190216064453, Learning Rate: 2.8713046049233526e-05)
Epoch... (2/30 | Step: 2980 | Loss: 1.9915521144866943, Learning Rate: 2.870871321647428e-05)
Epoch... (2/30 | Step: 2990 | Loss: 2.039159059524536, Learning Rate: 2.870437856472563e-05)
Epoch... (2/30 | Step: 3000 | Loss: 1.9609485864639282, Learning Rate: 2.870004391297698e-05)
Epoch... (2/30 | Step: 3000 | Eval Loss: 1.9941604137420654 | Eval rouge1: 40.6226 | Eval rouge2: 15.1062 | Eval rougeL: 36.9129 | Eval rougeLsum: 36.9145 | Eval gen_len: 11.0414 |)
Epoch... (2/30 | Step: 3010 | Loss: 1.9202957153320312, Learning Rate: 2.8695707442238927e-05)
Epoch... (2/30 | Step: 3020 | Loss: 1.959960699081421, Learning Rate: 2.8691372790490277e-05)
Epoch... (2/30 | Step: 3030 | Loss: 2.0083134174346924, Learning Rate: 2.8687038138741627e-05)
Epoch... (2/30 | Step: 3040 | Loss: 1.9896552562713623, Learning Rate: 2.8682703486992978e-05)
Epoch... (2/30 | Step: 3050 | Loss: 2.0054593086242676, Learning Rate: 2.8678368835244328e-05)
Epoch... (2/30 | Step: 3060 | Loss: 1.9596562385559082, Learning Rate: 2.8674034183495678e-05)
Epoch... (2/30 | Step: 3070 | Loss: 1.9955945014953613, Learning Rate: 2.8669701350736432e-05)
Epoch... (2/30 | Step: 3080 | Loss: 1.9095125198364258, Learning Rate: 2.8665366698987782e-05)
Epoch... (2/30 | Step: 3090 | Loss: 1.9675850868225098, Learning Rate: 2.8661032047239132e-05)
Epoch... (2/30 | Step: 3100 | Loss: 1.9429959058761597, Learning Rate: 2.865669557650108e-05)
Epoch... (2/30 | Step: 3110 | Loss: 1.9753589630126953, Learning Rate: 2.865236092475243e-05)
Epoch... (2/30 | Step: 3120 | Loss: 1.9604036808013916, Learning Rate: 2.864802627300378e-05)
Epoch... (2/30 | Step: 3130 | Loss: 1.9368562698364258, Learning Rate: 2.864369162125513e-05)
Epoch... (2/30 | Step: 3140 | Loss: 1.9978103637695312, Learning Rate: 2.863935696950648e-05)
Epoch... (2/30 | Step: 3150 | Loss: 2.0798728466033936, Learning Rate: 2.863502231775783e-05)
Epoch... (2/30 | Step: 3160 | Loss: 1.9111906290054321, Learning Rate: 2.8630689484998584e-05)
Epoch... (2/30 | Step: 3170 | Loss: 2.041853427886963, Learning Rate: 2.8626354833249934e-05)
Epoch... (2/30 | Step: 3180 | Loss: 2.0397024154663086, Learning Rate: 2.8622020181501284e-05)
Epoch... (2/30 | Step: 3190 | Loss: 1.8796460628509521, Learning Rate: 2.8617685529752634e-05)
Epoch... (2/30 | Step: 3200 | Loss: 2.0037131309509277, Learning Rate: 2.8613350878003985e-05)
Epoch... (2/30 | Step: 3210 | Loss: 1.9178547859191895, Learning Rate: 2.860901440726593e-05)
Epoch... (2/30 | Step: 3220 | Loss: 1.958061933517456, Learning Rate: 2.860467975551728e-05)
Epoch... (2/30 | Step: 3230 | Loss: 1.9921672344207764, Learning Rate: 2.8600345103768632e-05)
Epoch... (2/30 | Step: 3240 | Loss: 1.9145488739013672, Learning Rate: 2.8596010452019982e-05)
Epoch... (2/30 | Step: 3250 | Loss: 1.974982738494873, Learning Rate: 2.8591677619260736e-05)
Epoch... (2/30 | Step: 3260 | Loss: 1.9515241384506226, Learning Rate: 2.8587342967512086e-05)
Epoch... (2/30 | Step: 3270 | Loss: 1.94496488571167, Learning Rate: 2.8583008315763436e-05)
Epoch... (2/30 | Step: 3280 | Loss: 1.9740091562271118, Learning Rate: 2.8578673664014786e-05)
Epoch... (2/30 | Step: 3290 | Loss: 2.032979726791382, Learning Rate: 2.8574339012266137e-05)
Epoch... (2/30 | Step: 3300 | Loss: 1.9422719478607178, Learning Rate: 2.8570002541528083e-05)
Epoch... (2/30 | Step: 3300 | Eval Loss: 1.9856388568878174 | Eval rouge1: 40.961 | Eval rouge2: 15.427 | Eval rougeL: 37.1983 | Eval rougeLsum: 37.1916 | Eval gen_len: 10.9743 |)
Epoch... (2/30 | Step: 3310 | Loss: 2.0264008045196533, Learning Rate: 2.8565667889779434e-05)
Epoch... (2/30 | Step: 3320 | Loss: 1.9982149600982666, Learning Rate: 2.8561333238030784e-05)
Epoch... (2/30 | Step: 3330 | Loss: 2.0131986141204834, Learning Rate: 2.8556998586282134e-05)
Epoch... (2/30 | Step: 3340 | Loss: 1.9476488828659058, Learning Rate: 2.8552665753522888e-05)
Epoch... (2/30 | Step: 3350 | Loss: 1.9951847791671753, Learning Rate: 2.8548331101774238e-05)
Epoch... (2/30 | Step: 3360 | Loss: 1.931056022644043, Learning Rate: 2.8543996450025588e-05)
Epoch... (2/30 | Step: 3370 | Loss: 1.9832549095153809, Learning Rate: 2.853966179827694e-05)
Epoch... (2/30 | Step: 3380 | Loss: 1.9744317531585693, Learning Rate: 2.853532714652829e-05)
Epoch... (2/30 | Step: 3390 | Loss: 1.9657011032104492, Learning Rate: 2.853099249477964e-05)
Epoch... (2/30 | Step: 3400 | Loss: 1.9546856880187988, Learning Rate: 2.8526656024041586e-05)
Epoch... (2/30 | Step: 3410 | Loss: 1.9754679203033447, Learning Rate: 2.8522321372292936e-05)
Epoch... (2/30 | Step: 3420 | Loss: 1.9381368160247803, Learning Rate: 2.8517986720544286e-05)
Epoch... (2/30 | Step: 3430 | Loss: 1.948155164718628, Learning Rate: 2.851365388778504e-05)
Epoch... (2/30 | Step: 3440 | Loss: 1.971888542175293, Learning Rate: 2.850931923603639e-05)
Epoch... (2/30 | Step: 3450 | Loss: 1.9459969997406006, Learning Rate: 2.850498458428774e-05)
Epoch... (2/30 | Step: 3460 | Loss: 1.9678940773010254, Learning Rate: 2.850064993253909e-05)
Epoch... (2/30 | Step: 3470 | Loss: 1.9595050811767578, Learning Rate: 2.849631528079044e-05)
Epoch... (2/30 | Step: 3480 | Loss: 1.9060301780700684, Learning Rate: 2.849198062904179e-05)
Epoch... (2/30 | Step: 3490 | Loss: 1.9561471939086914, Learning Rate: 2.8487644158303738e-05)
Epoch... (2/30 | Step: 3500 | Loss: 1.9952422380447388, Learning Rate: 2.8483309506555088e-05)
Epoch... (2/30 | Step: 3510 | Loss: 1.9526166915893555, Learning Rate: 2.847897667379584e-05)
Epoch... (2/30 | Step: 3520 | Loss: 1.977981448173523, Learning Rate: 2.8474642022047192e-05)
Epoch... (2/30 | Step: 3530 | Loss: 1.9111697673797607, Learning Rate: 2.8470307370298542e-05)
Epoch... (2/30 | Step: 3540 | Loss: 1.948063850402832, Learning Rate: 2.8465972718549892e-05)
Epoch... (2/30 | Step: 3550 | Loss: 1.9008160829544067, Learning Rate: 2.8461638066801243e-05)
Epoch... (2/30 | Step: 3560 | Loss: 1.9312282800674438, Learning Rate: 2.8457303415052593e-05)
Epoch... (2/30 | Step: 3570 | Loss: 1.957169771194458, Learning Rate: 2.8452968763303943e-05)
Epoch... (2/30 | Step: 3580 | Loss: 1.8317185640335083, Learning Rate: 2.8448634111555293e-05)
Epoch... (2/30 | Step: 3590 | Loss: 2.0506675243377686, Learning Rate: 2.844429764081724e-05)
Epoch... (2/30 | Step: 3600 | Loss: 1.9117717742919922, Learning Rate: 2.8439964808057994e-05)
Epoch... (2/30 | Step: 3600 | Eval Loss: 1.9778660535812378 | Eval rouge1: 40.6615 | Eval rouge2: 15.2172 | Eval rougeL: 36.9778 | Eval rougeLsum: 36.9785 | Eval gen_len: 10.9873 |)
Epoch... (2/30 | Step: 3610 | Loss: 1.9407380819320679, Learning Rate: 2.8435630156309344e-05)
Epoch... (2/30 | Step: 3620 | Loss: 1.8952865600585938, Learning Rate: 2.8431295504560694e-05)
Epoch... (2/30 | Step: 3630 | Loss: 1.8948352336883545, Learning Rate: 2.8426960852812044e-05)
Epoch... (2/30 | Step: 3640 | Loss: 2.03857421875, Learning Rate: 2.8422626201063395e-05)
Epoch... (2/30 | Step: 3650 | Loss: 1.9013903141021729, Learning Rate: 2.8418291549314745e-05)
Epoch... (2/30 | Step: 3660 | Loss: 1.845206618309021, Learning Rate: 2.8413956897566095e-05)
Epoch... (2/30 | Step: 3670 | Loss: 1.7374927997589111, Learning Rate: 2.8409622245817445e-05)
Epoch... (2/30 | Step: 3680 | Loss: 1.9721872806549072, Learning Rate: 2.8405285775079392e-05)
Epoch... (2/30 | Step: 3690 | Loss: 1.9401315450668335, Learning Rate: 2.8400952942320146e-05)
Epoch... (2/30 | Step: 3700 | Loss: 1.960228443145752, Learning Rate: 2.8396618290571496e-05)
Epoch... (2/30 | Step: 3710 | Loss: 2.104597806930542, Learning Rate: 2.8392283638822846e-05)
Epoch... (2/30 | Step: 3720 | Loss: 1.9455502033233643, Learning Rate: 2.8387948987074196e-05)
Epoch... (2/30 | Step: 3730 | Loss: 2.000375747680664, Learning Rate: 2.8383614335325547e-05)
Epoch... (2/30 | Step: 3740 | Loss: 2.044059991836548, Learning Rate: 2.8379279683576897e-05)
Epoch... (2/30 | Step: 3750 | Loss: 1.90779447555542, Learning Rate: 2.8374945031828247e-05)
Epoch... (2/30 | Step: 3760 | Loss: 1.958655595779419, Learning Rate: 2.8370610380079597e-05)
Epoch... (2/30 | Step: 3770 | Loss: 1.943934440612793, Learning Rate: 2.8366275728330947e-05)
Epoch... (2/30 | Step: 3780 | Loss: 1.9462363719940186, Learning Rate: 2.83619428955717e-05)
Epoch... (2/30 | Step: 3790 | Loss: 2.00119686126709, Learning Rate: 2.8357606424833648e-05)
Epoch... (2/30 | Step: 3800 | Loss: 1.8953402042388916, Learning Rate: 2.8353271773084998e-05)
Epoch... (2/30 | Step: 3810 | Loss: 1.9074665307998657, Learning Rate: 2.834893712133635e-05)
Epoch... (2/30 | Step: 3820 | Loss: 1.9436546564102173, Learning Rate: 2.83446024695877e-05)
Epoch... (2/30 | Step: 3830 | Loss: 1.9025335311889648, Learning Rate: 2.834026781783905e-05)
Epoch... (2/30 | Step: 3840 | Loss: 1.9439079761505127, Learning Rate: 2.83359331660904e-05)
Epoch... (2/30 | Step: 3850 | Loss: 1.9517011642456055, Learning Rate: 2.833159851434175e-05)
Epoch... (2/30 | Step: 3860 | Loss: 1.9390623569488525, Learning Rate: 2.83272638625931e-05)
Epoch... (2/30 | Step: 3870 | Loss: 1.8799810409545898, Learning Rate: 2.8322931029833853e-05)
Epoch... (2/30 | Step: 3880 | Loss: 1.9533662796020508, Learning Rate: 2.83185945590958e-05)
Epoch... (2/30 | Step: 3890 | Loss: 2.0305581092834473, Learning Rate: 2.831425990734715e-05)
Epoch... (2/30 | Step: 3900 | Loss: 1.985913872718811, Learning Rate: 2.83099252555985e-05)
Epoch... (2/30 | Step: 3900 | Eval Loss: 1.9652690887451172 | Eval rouge1: 41.1961 | Eval rouge2: 15.6232 | Eval rougeL: 37.4628 | Eval rougeLsum: 37.4635 | Eval gen_len: 10.8986 |)
Epoch... (2/30 | Step: 3910 | Loss: 1.9935262203216553, Learning Rate: 2.830559060384985e-05)
Epoch... (2/30 | Step: 3920 | Loss: 1.9875662326812744, Learning Rate: 2.83012559521012e-05)
Epoch... (2/30 | Step: 3930 | Loss: 1.9753940105438232, Learning Rate: 2.829692130035255e-05)
Epoch... (2/30 | Step: 3940 | Loss: 1.9695476293563843, Learning Rate: 2.82925866486039e-05)
Epoch... (2/30 | Step: 3950 | Loss: 1.9616758823394775, Learning Rate: 2.828825199685525e-05)
Epoch... (2/30 | Step: 3960 | Loss: 2.000000476837158, Learning Rate: 2.8283919164096005e-05)
Epoch... (2/30 | Step: 3970 | Loss: 1.941230297088623, Learning Rate: 2.8279584512347355e-05)
Epoch... (2/30 | Step: 3980 | Loss: 1.913987159729004, Learning Rate: 2.8275248041609302e-05)
Epoch... (2/30 | Step: 3990 | Loss: 2.0030932426452637, Learning Rate: 2.8270913389860652e-05)
Epoch... (2/30 | Step: 4000 | Loss: 1.9450314044952393, Learning Rate: 2.8266578738112003e-05)
Epoch... (2/30 | Step: 4010 | Loss: 2.0085952281951904, Learning Rate: 2.8262244086363353e-05)
Epoch... (2/30 | Step: 4020 | Loss: 1.9368841648101807, Learning Rate: 2.8257909434614703e-05)
Epoch... (2/30 | Step: 4030 | Loss: 1.9371402263641357, Learning Rate: 2.8253574782866053e-05)
Epoch... (2/30 | Step: 4040 | Loss: 1.8853888511657715, Learning Rate: 2.8249240131117404e-05)
Epoch... (2/30 | Step: 4050 | Loss: 1.949146032333374, Learning Rate: 2.8244907298358157e-05)
Epoch... (2/30 | Step: 4060 | Loss: 1.930593490600586, Learning Rate: 2.8240572646609508e-05)
Epoch... (2/30 | Step: 4070 | Loss: 1.9453805685043335, Learning Rate: 2.8236236175871454e-05)
Epoch... (2/30 | Step: 4080 | Loss: 2.0240161418914795, Learning Rate: 2.8231901524122804e-05)
Epoch... (2/30 | Step: 4090 | Loss: 1.8439011573791504, Learning Rate: 2.8227566872374155e-05)
Epoch... (2/30 | Step: 4100 | Loss: 1.9838228225708008, Learning Rate: 2.8223232220625505e-05)
Epoch... (2/30 | Step: 4110 | Loss: 1.9583349227905273, Learning Rate: 2.8218897568876855e-05)
Epoch... (2/30 | Step: 4120 | Loss: 1.9604921340942383, Learning Rate: 2.8214562917128205e-05)
Epoch... (2/30 | Step: 4130 | Loss: 1.9949344396591187, Learning Rate: 2.8210228265379556e-05)
Epoch... (2/30 | Step: 4140 | Loss: 1.9361615180969238, Learning Rate: 2.820589543262031e-05)
Epoch... (2/30 | Step: 4150 | Loss: 1.8858580589294434, Learning Rate: 2.820156078087166e-05)
Epoch... (2/30 | Step: 4160 | Loss: 1.939874529838562, Learning Rate: 2.819722612912301e-05)
Epoch... (2/30 | Step: 4170 | Loss: 1.8935625553131104, Learning Rate: 2.819289147737436e-05)
Epoch... (2/30 | Step: 4180 | Loss: 1.8785367012023926, Learning Rate: 2.8188555006636307e-05)
Epoch... (2/30 | Step: 4190 | Loss: 1.893964171409607, Learning Rate: 2.8184220354887657e-05)
Epoch... (2/30 | Step: 4200 | Loss: 1.8666478395462036, Learning Rate: 2.8179885703139007e-05)
Epoch... (2/30 | Step: 4200 | Eval Loss: 1.9555346965789795 | Eval rouge1: 40.9762 | Eval rouge2: 15.5911 | Eval rougeL: 37.3749 | Eval rougeLsum: 37.3706 | Eval gen_len: 10.9797 |)
Epoch... (2/30 | Step: 4210 | Loss: 1.9187740087509155, Learning Rate: 2.8175551051390357e-05)
Epoch... (2/30 | Step: 4220 | Loss: 1.9928175210952759, Learning Rate: 2.817121821863111e-05)
Epoch... (2/30 | Step: 4230 | Loss: 1.9062347412109375, Learning Rate: 2.816688356688246e-05)
Epoch... (2/30 | Step: 4240 | Loss: 1.9475587606430054, Learning Rate: 2.816254891513381e-05)
Epoch... (2/30 | Step: 4250 | Loss: 1.9387705326080322, Learning Rate: 2.8158214263385162e-05)
Epoch... (2/30 | Step: 4260 | Loss: 1.930619239807129, Learning Rate: 2.8153879611636512e-05)
Epoch... (2/30 | Step: 4270 | Loss: 1.9640536308288574, Learning Rate: 2.814954314089846e-05)
Epoch... (2/30 | Step: 4280 | Loss: 1.9580998420715332, Learning Rate: 2.814520848914981e-05)
Epoch... (2/30 | Step: 4290 | Loss: 1.9430346488952637, Learning Rate: 2.814087383740116e-05)
Epoch... (2/30 | Step: 4300 | Loss: 1.9957571029663086, Learning Rate: 2.813653918565251e-05)
Epoch... (2/30 | Step: 4310 | Loss: 1.9378894567489624, Learning Rate: 2.8132206352893263e-05)
Epoch... (2/30 | Step: 4320 | Loss: 1.9086257219314575, Learning Rate: 2.8127871701144613e-05)
Epoch... (2/30 | Step: 4330 | Loss: 1.9071044921875, Learning Rate: 2.8123537049395964e-05)
Epoch... (2/30 | Step: 4340 | Loss: 1.9538824558258057, Learning Rate: 2.8119202397647314e-05)
Epoch... (2/30 | Step: 4350 | Loss: 1.8882595300674438, Learning Rate: 2.8114867745898664e-05)
Epoch... (2/30 | Step: 4360 | Loss: 1.939955472946167, Learning Rate: 2.8110533094150014e-05)
Epoch... (2/30 | Step: 4370 | Loss: 1.9267652034759521, Learning Rate: 2.810619662341196e-05)
Epoch... (2/30 | Step: 4380 | Loss: 1.8608458042144775, Learning Rate: 2.810186197166331e-05)
Epoch... (2/30 | Step: 4390 | Loss: 1.9454424381256104, Learning Rate: 2.809752731991466e-05)
Epoch... (2/30 | Step: 4400 | Loss: 1.9094369411468506, Learning Rate: 2.8093194487155415e-05)
Epoch... (2/30 | Step: 4410 | Loss: 1.8965256214141846, Learning Rate: 2.8088859835406765e-05)
Epoch... (2/30 | Step: 4420 | Loss: 1.980330467224121, Learning Rate: 2.8084525183658116e-05)
Epoch... (2/30 | Step: 4430 | Loss: 1.9047348499298096, Learning Rate: 2.8080190531909466e-05)
Epoch... (2/30 | Step: 4440 | Loss: 1.856551170349121, Learning Rate: 2.8075855880160816e-05)
Epoch... (2/30 | Step: 4450 | Loss: 1.9115313291549683, Learning Rate: 2.8071521228412166e-05)
Epoch... (2/30 | Step: 4460 | Loss: 1.8422142267227173, Learning Rate: 2.8067184757674113e-05)
Epoch... (2/30 | Step: 4470 | Loss: 1.9501553773880005, Learning Rate: 2.8062850105925463e-05)
Epoch... (2/30 | Step: 4480 | Loss: 1.99320650100708, Learning Rate: 2.8058515454176813e-05)
Epoch... (2/30 | Step: 4490 | Loss: 1.8857262134552002, Learning Rate: 2.8054182621417567e-05)
Epoch... (2/30 | Step: 4500 | Loss: 1.8706979751586914, Learning Rate: 2.8049847969668917e-05)
Epoch... (2/30 | Step: 4500 | Eval Loss: 1.9466681480407715 | Eval rouge1: 40.7165 | Eval rouge2: 15.3635 | Eval rougeL: 37.0299 | Eval rougeLsum: 37.031 | Eval gen_len: 11.2565 |)
Epoch... (2/30 | Step: 4510 | Loss: 1.9182460308074951, Learning Rate: 2.8045513317920268e-05)
Epoch... (2/30 | Step: 4520 | Loss: 1.937203288078308, Learning Rate: 2.8041178666171618e-05)
Epoch... (2/30 | Step: 4530 | Loss: 1.91261625289917, Learning Rate: 2.8036844014422968e-05)
Epoch... (2/30 | Step: 4540 | Loss: 1.9567575454711914, Learning Rate: 2.8032509362674318e-05)
Epoch... (2/30 | Step: 4550 | Loss: 1.9147999286651611, Learning Rate: 2.802817471092567e-05)
Epoch... (2/30 | Step: 4560 | Loss: 1.9001394510269165, Learning Rate: 2.802384005917702e-05)
Epoch... (2/30 | Step: 4570 | Loss: 1.9326331615447998, Learning Rate: 2.8019503588438965e-05)
Epoch... (2/30 | Step: 4580 | Loss: 1.9304759502410889, Learning Rate: 2.801517075567972e-05)
Epoch... (2/30 | Step: 4590 | Loss: 1.8214722871780396, Learning Rate: 2.801083610393107e-05)
Epoch... (2/30 | Step: 4600 | Loss: 1.8623323440551758, Learning Rate: 2.800650145218242e-05)
Epoch... (2/30 | Step: 4610 | Loss: 1.8202744722366333, Learning Rate: 2.800216680043377e-05)
Epoch... (2/30 | Step: 4614 | Loss: 1.9217591285705566, Learning Rate: 2.8000431484542787e-05)
Epoch... (2/30 | Step: 4614 | Eval Loss: 1.9446606636047363 | Eval rouge1: 41.3801 | Eval rouge2: 15.8404 | Eval rougeL: 37.7066 | Eval rougeLsum: 37.6976 | Eval gen_len: 11.1192 |)
Epoch... (3/30 | Step: 4620 | Loss: 1.9006057977676392, Learning Rate: 2.799783214868512e-05)
Epoch... (3/30 | Step: 4630 | Loss: 1.9258897304534912, Learning Rate: 2.799349749693647e-05)
Epoch... (3/30 | Step: 4640 | Loss: 1.8611785173416138, Learning Rate: 2.798916284518782e-05)
Epoch... (3/30 | Step: 4650 | Loss: 1.7929704189300537, Learning Rate: 2.7984826374449767e-05)
Epoch... (3/30 | Step: 4660 | Loss: 1.888678789138794, Learning Rate: 2.7980491722701117e-05)
Epoch... (3/30 | Step: 4670 | Loss: 1.8310635089874268, Learning Rate: 2.797615888994187e-05)
Epoch... (3/30 | Step: 4680 | Loss: 1.7876863479614258, Learning Rate: 2.797182423819322e-05)
Epoch... (3/30 | Step: 4690 | Loss: 1.8022078275680542, Learning Rate: 2.796748958644457e-05)
Epoch... (3/30 | Step: 4700 | Loss: 1.8882899284362793, Learning Rate: 2.7963154934695922e-05)
Epoch... (3/30 | Step: 4710 | Loss: 1.8650214672088623, Learning Rate: 2.7958820282947272e-05)
Epoch... (3/30 | Step: 4720 | Loss: 1.8108880519866943, Learning Rate: 2.7954485631198622e-05)
Epoch... (3/30 | Step: 4730 | Loss: 1.8337535858154297, Learning Rate: 2.7950150979449973e-05)
Epoch... (3/30 | Step: 4740 | Loss: 1.8705109357833862, Learning Rate: 2.7945816327701323e-05)
Epoch... (3/30 | Step: 4750 | Loss: 1.8093873262405396, Learning Rate: 2.7941481675952673e-05)
Epoch... (3/30 | Step: 4760 | Loss: 1.8352653980255127, Learning Rate: 2.7937147024204023e-05)
Epoch... (3/30 | Step: 4770 | Loss: 1.8993232250213623, Learning Rate: 2.7932812372455373e-05)
Epoch... (3/30 | Step: 4780 | Loss: 1.866384506225586, Learning Rate: 2.7928477720706724e-05)
Epoch... (3/30 | Step: 4790 | Loss: 1.8843146562576294, Learning Rate: 2.7924143068958074e-05)
Epoch... (3/30 | Step: 4800 | Loss: 1.8721106052398682, Learning Rate: 2.7919808417209424e-05)
Epoch... (3/30 | Step: 4800 | Eval Loss: 1.945229411125183 | Eval rouge1: 41.4206 | Eval rouge2: 15.7017 | Eval rougeL: 37.6112 | Eval rougeLsum: 37.6064 | Eval gen_len: 11.2409 |)
Epoch... (3/30 | Step: 4810 | Loss: 1.8651959896087646, Learning Rate: 2.7915473765460774e-05)
Epoch... (3/30 | Step: 4820 | Loss: 1.7777526378631592, Learning Rate: 2.7911139113712125e-05)
Epoch... (3/30 | Step: 4830 | Loss: 1.7878971099853516, Learning Rate: 2.7906804461963475e-05)
Epoch... (3/30 | Step: 4840 | Loss: 1.9086028337478638, Learning Rate: 2.7902469810214825e-05)
Epoch... (3/30 | Step: 4850 | Loss: 1.85693359375, Learning Rate: 2.7898135158466175e-05)
Epoch... (3/30 | Step: 4860 | Loss: 1.829287052154541, Learning Rate: 2.7893800506717525e-05)
Epoch... (3/30 | Step: 4870 | Loss: 1.8359487056732178, Learning Rate: 2.7889465854968876e-05)
Epoch... (3/30 | Step: 4880 | Loss: 1.8283047676086426, Learning Rate: 2.7885131203220226e-05)
Epoch... (3/30 | Step: 4890 | Loss: 1.8907103538513184, Learning Rate: 2.7880796551471576e-05)
Epoch... (3/30 | Step: 4900 | Loss: 1.900670051574707, Learning Rate: 2.7876461899722926e-05)
Epoch... (3/30 | Step: 4910 | Loss: 1.8678079843521118, Learning Rate: 2.7872127247974277e-05)
Epoch... (3/30 | Step: 4920 | Loss: 1.8865406513214111, Learning Rate: 2.7867792596225627e-05)
Epoch... (3/30 | Step: 4930 | Loss: 1.8659296035766602, Learning Rate: 2.786345976346638e-05)
Epoch... (3/30 | Step: 4940 | Loss: 1.8259683847427368, Learning Rate: 2.785912511171773e-05)
Epoch... (3/30 | Step: 4950 | Loss: 1.9121265411376953, Learning Rate: 2.785479045996908e-05)
Epoch... (3/30 | Step: 4960 | Loss: 1.7606853246688843, Learning Rate: 2.7850453989231028e-05)
Epoch... (3/30 | Step: 4970 | Loss: 1.9108521938323975, Learning Rate: 2.7846119337482378e-05)
Epoch... (3/30 | Step: 4980 | Loss: 1.7965797185897827, Learning Rate: 2.7841784685733728e-05)
Epoch... (3/30 | Step: 4990 | Loss: 1.8254327774047852, Learning Rate: 2.783745003398508e-05)
Epoch... (3/30 | Step: 5000 | Loss: 1.8682160377502441, Learning Rate: 2.783311538223643e-05)
Epoch... (3/30 | Step: 5010 | Loss: 1.887736439704895, Learning Rate: 2.782878073048778e-05)
Epoch... (3/30 | Step: 5020 | Loss: 1.9222376346588135, Learning Rate: 2.7824447897728533e-05)
Epoch... (3/30 | Step: 5030 | Loss: 1.8068890571594238, Learning Rate: 2.7820113245979883e-05)
Epoch... (3/30 | Step: 5040 | Loss: 1.877018928527832, Learning Rate: 2.781577677524183e-05)
Epoch... (3/30 | Step: 5050 | Loss: 1.818298578262329, Learning Rate: 2.781144212349318e-05)
Epoch... (3/30 | Step: 5060 | Loss: 1.7829349040985107, Learning Rate: 2.780710747174453e-05)
Epoch... (3/30 | Step: 5070 | Loss: 1.859926462173462, Learning Rate: 2.780277281999588e-05)
Epoch... (3/30 | Step: 5080 | Loss: 1.8778877258300781, Learning Rate: 2.779843816824723e-05)
Epoch... (3/30 | Step: 5090 | Loss: 1.8129793405532837, Learning Rate: 2.779410351649858e-05)
Epoch... (3/30 | Step: 5100 | Loss: 1.9016380310058594, Learning Rate: 2.778976886474993e-05)
Epoch... (3/30 | Step: 5100 | Eval Loss: 1.9427725076675415 | Eval rouge1: 41.2198 | Eval rouge2: 15.6318 | Eval rougeL: 37.4398 | Eval rougeLsum: 37.4436 | Eval gen_len: 11.1195 |)
Epoch... (3/30 | Step: 5110 | Loss: 1.8517467975616455, Learning Rate: 2.7785436031990685e-05)
Epoch... (3/30 | Step: 5120 | Loss: 1.8552989959716797, Learning Rate: 2.7781101380242035e-05)
Epoch... (3/30 | Step: 5130 | Loss: 1.8470796346664429, Learning Rate: 2.7776766728493385e-05)
Epoch... (3/30 | Step: 5140 | Loss: 1.9408626556396484, Learning Rate: 2.7772432076744735e-05)
Epoch... (3/30 | Step: 5150 | Loss: 1.9104987382888794, Learning Rate: 2.7768095606006682e-05)
Epoch... (3/30 | Step: 5160 | Loss: 1.8002305030822754, Learning Rate: 2.7763760954258032e-05)
Epoch... (3/30 | Step: 5170 | Loss: 1.7606427669525146, Learning Rate: 2.7759426302509382e-05)
Epoch... (3/30 | Step: 5180 | Loss: 1.8362019062042236, Learning Rate: 2.7755091650760733e-05)
Epoch... (3/30 | Step: 5190 | Loss: 1.8140841722488403, Learning Rate: 2.7750756999012083e-05)
Epoch... (3/30 | Step: 5200 | Loss: 1.8190215826034546, Learning Rate: 2.7746424166252837e-05)
Epoch... (3/30 | Step: 5210 | Loss: 1.8725632429122925, Learning Rate: 2.7742089514504187e-05)
Epoch... (3/30 | Step: 5220 | Loss: 1.871530532836914, Learning Rate: 2.7737754862755537e-05)
Epoch... (3/30 | Step: 5230 | Loss: 1.8738446235656738, Learning Rate: 2.7733420211006887e-05)
Epoch... (3/30 | Step: 5240 | Loss: 1.726067304611206, Learning Rate: 2.7729083740268834e-05)
Epoch... (3/30 | Step: 5250 | Loss: 1.7526288032531738, Learning Rate: 2.7724749088520184e-05)
Epoch... (3/30 | Step: 5260 | Loss: 1.7957985401153564, Learning Rate: 2.7720414436771534e-05)
Epoch... (3/30 | Step: 5270 | Loss: 1.8102589845657349, Learning Rate: 2.7716079785022885e-05)
Epoch... (3/30 | Step: 5280 | Loss: 1.9033677577972412, Learning Rate: 2.7711745133274235e-05)
Epoch... (3/30 | Step: 5290 | Loss: 1.87894868850708, Learning Rate: 2.770741230051499e-05)
Epoch... (3/30 | Step: 5300 | Loss: 1.8765783309936523, Learning Rate: 2.770307764876634e-05)
Epoch... (3/30 | Step: 5310 | Loss: 1.9336323738098145, Learning Rate: 2.769874299701769e-05)
Epoch... (3/30 | Step: 5320 | Loss: 1.883346676826477, Learning Rate: 2.769440834526904e-05)
Epoch... (3/30 | Step: 5330 | Loss: 1.8765766620635986, Learning Rate: 2.769007369352039e-05)
Epoch... (3/30 | Step: 5340 | Loss: 1.8111586570739746, Learning Rate: 2.768573904177174e-05)
Epoch... (3/30 | Step: 5350 | Loss: 1.8051023483276367, Learning Rate: 2.7681402571033686e-05)
Epoch... (3/30 | Step: 5360 | Loss: 1.7918339967727661, Learning Rate: 2.7677067919285037e-05)
Epoch... (3/30 | Step: 5370 | Loss: 1.8722422122955322, Learning Rate: 2.7672733267536387e-05)
Epoch... (3/30 | Step: 5380 | Loss: 1.848788857460022, Learning Rate: 2.766840043477714e-05)
Epoch... (3/30 | Step: 5390 | Loss: 1.830802321434021, Learning Rate: 2.766406578302849e-05)
Epoch... (3/30 | Step: 5400 | Loss: 1.8050287961959839, Learning Rate: 2.765973113127984e-05)
Epoch... (3/30 | Step: 5400 | Eval Loss: 1.9341686964035034 | Eval rouge1: 40.9793 | Eval rouge2: 15.5931 | Eval rougeL: 37.2943 | Eval rougeLsum: 37.2958 | Eval gen_len: 11.0549 |)
Epoch... (3/30 | Step: 5410 | Loss: 1.8422832489013672, Learning Rate: 2.765539647953119e-05)
Epoch... (3/30 | Step: 5420 | Loss: 1.8012787103652954, Learning Rate: 2.765106182778254e-05)
Epoch... (3/30 | Step: 5430 | Loss: 1.8209984302520752, Learning Rate: 2.7646725357044488e-05)
Epoch... (3/30 | Step: 5440 | Loss: 1.8918828964233398, Learning Rate: 2.764239070529584e-05)
Epoch... (3/30 | Step: 5450 | Loss: 1.79241943359375, Learning Rate: 2.763805605354719e-05)
Epoch... (3/30 | Step: 5460 | Loss: 1.8523470163345337, Learning Rate: 2.7633723220787942e-05)
Epoch... (3/30 | Step: 5470 | Loss: 1.8394627571105957, Learning Rate: 2.7629388569039293e-05)
Epoch... (3/30 | Step: 5480 | Loss: 1.8407964706420898, Learning Rate: 2.7625053917290643e-05)
Epoch... (3/30 | Step: 5490 | Loss: 1.7460325956344604, Learning Rate: 2.7620719265541993e-05)
Epoch... (3/30 | Step: 5500 | Loss: 1.8261981010437012, Learning Rate: 2.7616384613793343e-05)
Epoch... (3/30 | Step: 5510 | Loss: 1.8237570524215698, Learning Rate: 2.7612049962044694e-05)
Epoch... (3/30 | Step: 5520 | Loss: 1.8155288696289062, Learning Rate: 2.7607715310296044e-05)
Epoch... (3/30 | Step: 5530 | Loss: 1.8218441009521484, Learning Rate: 2.7603380658547394e-05)
Epoch... (3/30 | Step: 5540 | Loss: 1.8485835790634155, Learning Rate: 2.759904418780934e-05)
Epoch... (3/30 | Step: 5550 | Loss: 1.8281173706054688, Learning Rate: 2.7594711355050094e-05)
Epoch... (3/30 | Step: 5560 | Loss: 1.9323103427886963, Learning Rate: 2.7590376703301445e-05)
Epoch... (3/30 | Step: 5570 | Loss: 1.8508933782577515, Learning Rate: 2.7586042051552795e-05)
Epoch... (3/30 | Step: 5580 | Loss: 1.8664108514785767, Learning Rate: 2.7581707399804145e-05)
Epoch... (3/30 | Step: 5590 | Loss: 1.8382173776626587, Learning Rate: 2.7577372748055495e-05)
Epoch... (3/30 | Step: 5600 | Loss: 1.8153660297393799, Learning Rate: 2.7573038096306846e-05)
Epoch... (3/30 | Step: 5610 | Loss: 1.879223346710205, Learning Rate: 2.7568703444558196e-05)
Epoch... (3/30 | Step: 5620 | Loss: 1.9132184982299805, Learning Rate: 2.7564366973820142e-05)
Epoch... (3/30 | Step: 5630 | Loss: 1.8372087478637695, Learning Rate: 2.7560032322071493e-05)
Epoch... (3/30 | Step: 5640 | Loss: 1.9148569107055664, Learning Rate: 2.7555699489312246e-05)
Epoch... (3/30 | Step: 5650 | Loss: 1.8752307891845703, Learning Rate: 2.7551364837563597e-05)
Epoch... (3/30 | Step: 5660 | Loss: 1.8080058097839355, Learning Rate: 2.7547030185814947e-05)
Epoch... (3/30 | Step: 5670 | Loss: 1.8897976875305176, Learning Rate: 2.7542695534066297e-05)
Epoch... (3/30 | Step: 5680 | Loss: 1.879408359527588, Learning Rate: 2.7538360882317647e-05)
Epoch... (3/30 | Step: 5690 | Loss: 1.84775710105896, Learning Rate: 2.7534026230568998e-05)
Epoch... (3/30 | Step: 5700 | Loss: 1.8937106132507324, Learning Rate: 2.7529691578820348e-05)
Epoch... (3/30 | Step: 5700 | Eval Loss: 1.9317536354064941 | Eval rouge1: 41.2629 | Eval rouge2: 15.689 | Eval rougeL: 37.4603 | Eval rougeLsum: 37.46 | Eval gen_len: 11.1765 |)
Epoch... (3/30 | Step: 5710 | Loss: 1.8048105239868164, Learning Rate: 2.7525356927071698e-05)
Epoch... (3/30 | Step: 5720 | Loss: 1.8522942066192627, Learning Rate: 2.7521022275323048e-05)
Epoch... (3/30 | Step: 5730 | Loss: 1.8378591537475586, Learning Rate: 2.75166876235744e-05)
Epoch... (3/30 | Step: 5740 | Loss: 1.9069128036499023, Learning Rate: 2.751235297182575e-05)
Epoch... (3/30 | Step: 5750 | Loss: 1.7947864532470703, Learning Rate: 2.75080183200771e-05)
Epoch... (3/30 | Step: 5760 | Loss: 1.8484649658203125, Learning Rate: 2.750368366832845e-05)
Epoch... (3/30 | Step: 5770 | Loss: 1.8341162204742432, Learning Rate: 2.74993490165798e-05)
Epoch... (3/30 | Step: 5780 | Loss: 1.814949631690979, Learning Rate: 2.749501436483115e-05)
Epoch... (3/30 | Step: 5790 | Loss: 1.869789958000183, Learning Rate: 2.74906797130825e-05)
Epoch... (3/30 | Step: 5800 | Loss: 1.925891637802124, Learning Rate: 2.748634506133385e-05)
Epoch... (3/30 | Step: 5810 | Loss: 1.845622181892395, Learning Rate: 2.74820104095852e-05)
Epoch... (3/30 | Step: 5820 | Loss: 1.9024274349212646, Learning Rate: 2.747767575783655e-05)
Epoch... (3/30 | Step: 5830 | Loss: 1.734498381614685, Learning Rate: 2.74733411060879e-05)
Epoch... (3/30 | Step: 5840 | Loss: 1.8688403367996216, Learning Rate: 2.746900645433925e-05)
Epoch... (3/30 | Step: 5850 | Loss: 1.858962059020996, Learning Rate: 2.74646718025906e-05)
Epoch... (3/30 | Step: 5860 | Loss: 1.7663443088531494, Learning Rate: 2.746033715084195e-05)
Epoch... (3/30 | Step: 5870 | Loss: 1.741565465927124, Learning Rate: 2.74560024990933e-05)
Epoch... (3/30 | Step: 5880 | Loss: 1.8864623308181763, Learning Rate: 2.7451667847344652e-05)
Epoch... (3/30 | Step: 5890 | Loss: 1.7897045612335205, Learning Rate: 2.7447333195596002e-05)
Epoch... (3/30 | Step: 5900 | Loss: 1.794191837310791, Learning Rate: 2.7442998543847352e-05)
Epoch... (3/30 | Step: 5910 | Loss: 1.9371439218521118, Learning Rate: 2.7438665711088106e-05)
Epoch... (3/30 | Step: 5920 | Loss: 1.8133981227874756, Learning Rate: 2.7434331059339456e-05)
Epoch... (3/30 | Step: 5930 | Loss: 1.9133797883987427, Learning Rate: 2.7429994588601403e-05)
Epoch... (3/30 | Step: 5940 | Loss: 1.874544382095337, Learning Rate: 2.7425659936852753e-05)
Epoch... (3/30 | Step: 5950 | Loss: 1.7396442890167236, Learning Rate: 2.7421325285104103e-05)
Epoch... (3/30 | Step: 5960 | Loss: 1.8002548217773438, Learning Rate: 2.7416990633355454e-05)
Epoch... (3/30 | Step: 5970 | Loss: 1.8371654748916626, Learning Rate: 2.7412655981606804e-05)
Epoch... (3/30 | Step: 5980 | Loss: 1.7982285022735596, Learning Rate: 2.7408321329858154e-05)
Epoch... (3/30 | Step: 5990 | Loss: 1.862635612487793, Learning Rate: 2.7403986678109504e-05)
Epoch... (3/30 | Step: 6000 | Loss: 1.9368724822998047, Learning Rate: 2.7399653845350258e-05)
Epoch... (3/30 | Step: 6000 | Eval Loss: 1.9258944988250732 | Eval rouge1: 41.086 | Eval rouge2: 15.536 | Eval rougeL: 37.351 | Eval rougeLsum: 37.3528 | Eval gen_len: 11.2105 |)
Epoch... (3/30 | Step: 6010 | Loss: 1.9061357975006104, Learning Rate: 2.7395317374612205e-05)
Epoch... (3/30 | Step: 6020 | Loss: 1.8703413009643555, Learning Rate: 2.7390982722863555e-05)
Epoch... (3/30 | Step: 6030 | Loss: 1.9126508235931396, Learning Rate: 2.7386648071114905e-05)
Epoch... (3/30 | Step: 6040 | Loss: 1.8640691041946411, Learning Rate: 2.7382313419366255e-05)
Epoch... (3/30 | Step: 6050 | Loss: 1.8807706832885742, Learning Rate: 2.7377978767617606e-05)
Epoch... (3/30 | Step: 6060 | Loss: 1.9057433605194092, Learning Rate: 2.7373644115868956e-05)
Epoch... (3/30 | Step: 6070 | Loss: 1.8330992460250854, Learning Rate: 2.7369309464120306e-05)
Epoch... (3/30 | Step: 6080 | Loss: 1.9661799669265747, Learning Rate: 2.7364974812371656e-05)
Epoch... (3/30 | Step: 6090 | Loss: 1.7911491394042969, Learning Rate: 2.736064197961241e-05)
Epoch... (3/30 | Step: 6100 | Loss: 1.7153382301330566, Learning Rate: 2.735630732786376e-05)
Epoch... (3/30 | Step: 6110 | Loss: 1.8614788055419922, Learning Rate: 2.735197267611511e-05)
Epoch... (3/30 | Step: 6120 | Loss: 1.7746303081512451, Learning Rate: 2.7347636205377057e-05)
Epoch... (3/30 | Step: 6130 | Loss: 1.8193302154541016, Learning Rate: 2.7343301553628407e-05)
Epoch... (3/30 | Step: 6140 | Loss: 1.8340637683868408, Learning Rate: 2.7338966901879758e-05)
Epoch... (3/30 | Step: 6150 | Loss: 1.8399102687835693, Learning Rate: 2.7334632250131108e-05)
Epoch... (3/30 | Step: 6160 | Loss: 1.8006336688995361, Learning Rate: 2.7330297598382458e-05)
Epoch... (3/30 | Step: 6170 | Loss: 1.7936418056488037, Learning Rate: 2.7325964765623212e-05)
Epoch... (3/30 | Step: 6180 | Loss: 1.7291109561920166, Learning Rate: 2.7321630113874562e-05)
Epoch... (3/30 | Step: 6190 | Loss: 1.9827232360839844, Learning Rate: 2.7317295462125912e-05)
Epoch... (3/30 | Step: 6200 | Loss: 1.862128496170044, Learning Rate: 2.7312960810377263e-05)
Epoch... (3/30 | Step: 6210 | Loss: 1.775388240814209, Learning Rate: 2.730862433963921e-05)
Epoch... (3/30 | Step: 6220 | Loss: 1.7971916198730469, Learning Rate: 2.730428968789056e-05)
Epoch... (3/30 | Step: 6230 | Loss: 1.8290495872497559, Learning Rate: 2.729995503614191e-05)
Epoch... (3/30 | Step: 6240 | Loss: 1.844942331314087, Learning Rate: 2.729562038439326e-05)
Epoch... (3/30 | Step: 6250 | Loss: 1.816731333732605, Learning Rate: 2.729128573264461e-05)
Epoch... (3/30 | Step: 6260 | Loss: 1.8273515701293945, Learning Rate: 2.7286952899885364e-05)
Epoch... (3/30 | Step: 6270 | Loss: 1.8184173107147217, Learning Rate: 2.7282618248136714e-05)
Epoch... (3/30 | Step: 6280 | Loss: 1.796694040298462, Learning Rate: 2.7278283596388064e-05)
Epoch... (3/30 | Step: 6290 | Loss: 1.821446180343628, Learning Rate: 2.7273948944639415e-05)
Epoch... (3/30 | Step: 6300 | Loss: 1.8445713520050049, Learning Rate: 2.7269614292890765e-05)
Epoch... (3/30 | Step: 6300 | Eval Loss: 1.9208292961120605 | Eval rouge1: 41.3445 | Eval rouge2: 15.639 | Eval rougeL: 37.5483 | Eval rougeLsum: 37.5456 | Eval gen_len: 11.2487 |)
Epoch... (3/30 | Step: 6310 | Loss: 1.8458572626113892, Learning Rate: 2.7265279641142115e-05)
Epoch... (3/30 | Step: 6320 | Loss: 1.7985508441925049, Learning Rate: 2.726094317040406e-05)
Epoch... (3/30 | Step: 6330 | Loss: 1.751726508140564, Learning Rate: 2.7256608518655412e-05)
Epoch... (3/30 | Step: 6340 | Loss: 1.8094416856765747, Learning Rate: 2.7252273866906762e-05)
Epoch... (3/30 | Step: 6350 | Loss: 1.853166937828064, Learning Rate: 2.7247941034147516e-05)
Epoch... (3/30 | Step: 6360 | Loss: 1.8393105268478394, Learning Rate: 2.7243606382398866e-05)
Epoch... (3/30 | Step: 6370 | Loss: 1.8263094425201416, Learning Rate: 2.7239271730650216e-05)
Epoch... (3/30 | Step: 6380 | Loss: 1.804229974746704, Learning Rate: 2.7234937078901567e-05)
Epoch... (3/30 | Step: 6390 | Loss: 1.820934772491455, Learning Rate: 2.7230602427152917e-05)
Epoch... (3/30 | Step: 6400 | Loss: 1.9347383975982666, Learning Rate: 2.7226265956414863e-05)
Epoch... (3/30 | Step: 6410 | Loss: 1.873093605041504, Learning Rate: 2.7221931304666214e-05)
Epoch... (3/30 | Step: 6420 | Loss: 1.8458762168884277, Learning Rate: 2.7217596652917564e-05)
Epoch... (3/30 | Step: 6430 | Loss: 1.8492844104766846, Learning Rate: 2.7213262001168914e-05)
Epoch... (3/30 | Step: 6440 | Loss: 1.794797420501709, Learning Rate: 2.7208929168409668e-05)
Epoch... (3/30 | Step: 6450 | Loss: 1.7921502590179443, Learning Rate: 2.7204594516661018e-05)
Epoch... (3/30 | Step: 6460 | Loss: 1.7883315086364746, Learning Rate: 2.720025986491237e-05)
Epoch... (3/30 | Step: 6470 | Loss: 1.8689000606536865, Learning Rate: 2.719592521316372e-05)
Epoch... (3/30 | Step: 6480 | Loss: 1.8820496797561646, Learning Rate: 2.719159056141507e-05)
Epoch... (3/30 | Step: 6490 | Loss: 1.8595609664916992, Learning Rate: 2.718725590966642e-05)
Epoch... (3/30 | Step: 6500 | Loss: 1.7791099548339844, Learning Rate: 2.718292125791777e-05)
Epoch... (3/30 | Step: 6510 | Loss: 1.7369716167449951, Learning Rate: 2.7178584787179716e-05)
Epoch... (3/30 | Step: 6520 | Loss: 1.907957673072815, Learning Rate: 2.7174250135431066e-05)
Epoch... (3/30 | Step: 6530 | Loss: 1.8726749420166016, Learning Rate: 2.716991730267182e-05)
Epoch... (3/30 | Step: 6540 | Loss: 1.8146827220916748, Learning Rate: 2.716558265092317e-05)
Epoch... (3/30 | Step: 6550 | Loss: 1.8158860206604004, Learning Rate: 2.716124799917452e-05)
Epoch... (3/30 | Step: 6560 | Loss: 1.8074976205825806, Learning Rate: 2.715691334742587e-05)
Epoch... (3/30 | Step: 6570 | Loss: 1.8178911209106445, Learning Rate: 2.715257869567722e-05)
Epoch... (3/30 | Step: 6580 | Loss: 1.8084090948104858, Learning Rate: 2.714824404392857e-05)
Epoch... (3/30 | Step: 6590 | Loss: 1.836090326309204, Learning Rate: 2.714390939217992e-05)
Epoch... (3/30 | Step: 6600 | Loss: 1.8392951488494873, Learning Rate: 2.7139572921441868e-05)
Epoch... (3/30 | Step: 6600 | Eval Loss: 1.9147193431854248 | Eval rouge1: 41.1766 | Eval rouge2: 15.6082 | Eval rougeL: 37.5075 | Eval rougeLsum: 37.5043 | Eval gen_len: 11.1454 |)
Epoch... (3/30 | Step: 6610 | Loss: 1.8817616701126099, Learning Rate: 2.7135238269693218e-05)
Epoch... (3/30 | Step: 6620 | Loss: 1.8995423316955566, Learning Rate: 2.7130905436933972e-05)
Epoch... (3/30 | Step: 6630 | Loss: 1.7520475387573242, Learning Rate: 2.7126570785185322e-05)
Epoch... (3/30 | Step: 6640 | Loss: 1.841223955154419, Learning Rate: 2.7122236133436672e-05)
Epoch... (3/30 | Step: 6650 | Loss: 1.7794084548950195, Learning Rate: 2.7117901481688023e-05)
Epoch... (3/30 | Step: 6660 | Loss: 1.8177577257156372, Learning Rate: 2.7113566829939373e-05)
Epoch... (3/30 | Step: 6670 | Loss: 1.844765543937683, Learning Rate: 2.7109232178190723e-05)
Epoch... (3/30 | Step: 6680 | Loss: 1.87111496925354, Learning Rate: 2.7104897526442073e-05)
Epoch... (3/30 | Step: 6690 | Loss: 1.8403642177581787, Learning Rate: 2.7100562874693424e-05)
Epoch... (3/30 | Step: 6700 | Loss: 1.819663643836975, Learning Rate: 2.7096228222944774e-05)
Epoch... (3/30 | Step: 6710 | Loss: 1.8723413944244385, Learning Rate: 2.7091893571196124e-05)
Epoch... (3/30 | Step: 6720 | Loss: 1.789346694946289, Learning Rate: 2.7087558919447474e-05)
Epoch... (3/30 | Step: 6730 | Loss: 1.864307165145874, Learning Rate: 2.7083224267698824e-05)
Epoch... (3/30 | Step: 6740 | Loss: 1.8256621360778809, Learning Rate: 2.7078889615950175e-05)
Epoch... (3/30 | Step: 6750 | Loss: 1.8974370956420898, Learning Rate: 2.7074554964201525e-05)
Epoch... (3/30 | Step: 6760 | Loss: 1.7825756072998047, Learning Rate: 2.7070220312452875e-05)
Epoch... (3/30 | Step: 6770 | Loss: 1.768592357635498, Learning Rate: 2.7065885660704225e-05)
Epoch... (3/30 | Step: 6780 | Loss: 1.8202202320098877, Learning Rate: 2.7061551008955576e-05)
Epoch... (3/30 | Step: 6790 | Loss: 1.8324875831604004, Learning Rate: 2.7057216357206926e-05)
Epoch... (3/30 | Step: 6800 | Loss: 1.862152099609375, Learning Rate: 2.7052881705458276e-05)
Epoch... (3/30 | Step: 6810 | Loss: 1.855210781097412, Learning Rate: 2.7048547053709626e-05)
Epoch... (3/30 | Step: 6820 | Loss: 1.8279563188552856, Learning Rate: 2.7044212401960976e-05)
Epoch... (3/30 | Step: 6830 | Loss: 1.7613282203674316, Learning Rate: 2.7039877750212327e-05)
Epoch... (3/30 | Step: 6840 | Loss: 1.8263907432556152, Learning Rate: 2.7035543098463677e-05)
Epoch... (3/30 | Step: 6850 | Loss: 1.8450121879577637, Learning Rate: 2.7031208446715027e-05)
Epoch... (3/30 | Step: 6860 | Loss: 1.8444921970367432, Learning Rate: 2.7026873794966377e-05)
Epoch... (3/30 | Step: 6870 | Loss: 1.7890636920928955, Learning Rate: 2.7022539143217728e-05)
Epoch... (3/30 | Step: 6880 | Loss: 1.8365561962127686, Learning Rate: 2.701820631045848e-05)
Epoch... (3/30 | Step: 6890 | Loss: 1.871873378753662, Learning Rate: 2.701387165870983e-05)
Epoch... (3/30 | Step: 6900 | Loss: 1.7828240394592285, Learning Rate: 2.7009535187971778e-05)
Epoch... (3/30 | Step: 6900 | Eval Loss: 1.9094223976135254 | Eval rouge1: 41.6785 | Eval rouge2: 15.9564 | Eval rougeL: 37.9578 | Eval rougeLsum: 37.96 | Eval gen_len: 11.0899 |)
Epoch... (3/30 | Step: 6910 | Loss: 1.823469877243042, Learning Rate: 2.700520053622313e-05)
Epoch... (3/30 | Step: 6920 | Loss: 1.7423064708709717, Learning Rate: 2.700086588447448e-05)
Epoch... (3/30 | Step: 6921 | Loss: 1.853652000427246, Learning Rate: 2.7000432964996435e-05)
Epoch... (3/30 | Step: 6921 | Eval Loss: 1.9094812870025635 | Eval rouge1: 41.5037 | Eval rouge2: 15.8267 | Eval rougeL: 37.7187 | Eval rougeLsum: 37.7205 | Eval gen_len: 11.1947 |)