collapse_gemma-2-2b_hs2_accumulate_iter12_sftsd1
This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.0947
- Num Input Tokens Seen: 61333232
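Assuming the reported loss is the mean per-token cross-entropy in nats, it can be read as a perplexity via exp(loss). That is the usual convention for causal language-model fine-tuning with Transformers, but the card does not state it. A minimal sketch comparing the final evaluation loss with the step-0 validation loss from the results table:

```python
import math

def loss_to_perplexity(mean_ce_loss_nats: float) -> float:
    """Convert a mean per-token cross-entropy (natural log) to perplexity.

    Assumption: the loss is averaged over tokens and uses the natural log.
    """
    return math.exp(mean_ce_loss_nats)

# Final evaluation loss reported above, and the validation loss at step 0 (before fine-tuning).
final_eval_loss = 1.0947
initial_eval_loss = 1.3909

print(f"initial perplexity ~ {loss_to_perplexity(initial_eval_loss):.2f}")
print(f"final perplexity   ~ {loss_to_perplexity(final_eval_loss):.2f}")
```

Under that assumption, the final evaluation loss corresponds to a perplexity of roughly 2.99, down from about 4.02 before fine-tuning.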
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
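The batch-size arithmetic and the constant_with_warmup schedule can be sketched in plain Python. This is an illustration, not the Transformers implementation. The helper names are hypothetical, and one device is assumed because 8 * 16 already equals 128. The total of 1146 optimizer steps is an estimate inferred from the results table (step 1145 is logged at epoch 0.9985). Warmup length is taken as ceil(warmup_ratio * total steps), which is believed to match how Transformers derives it from lr_scheduler_warmup_ratio, but treat that as an assumption.

```python
import math

def total_train_batch_size(per_device_batch: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Examples consumed per optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices

def warmup_steps_from_ratio(num_training_steps: int, warmup_ratio: float) -> int:
    # Assumption: warmup length = ratio * total optimizer steps, rounded up.
    return math.ceil(num_training_steps * warmup_ratio)

def constant_with_warmup_lr(step: int, base_lr: float, num_warmup_steps: int) -> float:
    """Linear warmup from 0 to base_lr over num_warmup_steps, then constant."""
    if step < num_warmup_steps:
        return base_lr * step / max(1, num_warmup_steps)
    return base_lr

# Values from the hyperparameter list; a single device is assumed.
assert total_train_batch_size(8, 16) == 128

# Hypothetical total step count, estimated from the results table.
num_steps = 1146
warmup = warmup_steps_from_ratio(num_steps, 0.05)  # ceil(57.3) = 58
for step in (0, warmup // 2, warmup, num_steps - 1):
    print(step, constant_with_warmup_lr(step, 8e-6, warmup))
```

With a constant schedule after warmup, the learning rate stays at 8e-06 for roughly the last 95% of the single epoch.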
Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3909 | 0 |
1.6324 | 0.0044 | 5 | 1.3893 | 269472 |
1.6194 | 0.0087 | 10 | 1.3717 | 542360 |
1.5309 | 0.0131 | 15 | 1.3345 | 811408 |
1.4876 | 0.0174 | 20 | 1.2781 | 1068224 |
1.4483 | 0.0218 | 25 | 1.2382 | 1338536 |
1.2719 | 0.0262 | 30 | 1.2001 | 1599816 |
1.2172 | 0.0305 | 35 | 1.1828 | 1870328 |
1.0862 | 0.0349 | 40 | 1.2074 | 2131752 |
0.919 | 0.0392 | 45 | 1.2146 | 2402272 |
0.8415 | 0.0436 | 50 | 1.2478 | 2673128 |
0.6424 | 0.0480 | 55 | 1.2676 | 2945544 |
0.498 | 0.0523 | 60 | 1.2913 | 3206480 |
0.5197 | 0.0567 | 65 | 1.2855 | 3478184 |
0.3891 | 0.0610 | 70 | 1.2622 | 3743928 |
0.4065 | 0.0654 | 75 | 1.2210 | 4003328 |
0.3004 | 0.0698 | 80 | 1.2263 | 4274576 |
0.2788 | 0.0741 | 85 | 1.2290 | 4550864 |
0.2618 | 0.0785 | 90 | 1.2033 | 4822776 |
0.276 | 0.0828 | 95 | 1.2057 | 5093024 |
0.2414 | 0.0872 | 100 | 1.1954 | 5363384 |
0.2651 | 0.0916 | 105 | 1.1859 | 5626720 |
0.2902 | 0.0959 | 110 | 1.1965 | 5891328 |
0.165 | 0.1003 | 115 | 1.1824 | 6157144 |
0.2686 | 0.1046 | 120 | 1.1802 | 6427352 |
0.1767 | 0.1090 | 125 | 1.1885 | 6698400 |
0.2486 | 0.1134 | 130 | 1.1714 | 6959416 |
0.263 | 0.1177 | 135 | 1.1766 | 7223456 |
0.2126 | 0.1221 | 140 | 1.1709 | 7490928 |
0.23 | 0.1264 | 145 | 1.1771 | 7760848 |
0.227 | 0.1308 | 150 | 1.1679 | 8030168 |
0.2171 | 0.1352 | 155 | 1.1695 | 8303896 |
0.1874 | 0.1395 | 160 | 1.1698 | 8568168 |
0.1996 | 0.1439 | 165 | 1.1701 | 8839944 |
0.17 | 0.1482 | 170 | 1.1630 | 9114072 |
0.2253 | 0.1526 | 175 | 1.1642 | 9373880 |
0.1302 | 0.1570 | 180 | 1.1583 | 9645160 |
0.2076 | 0.1613 | 185 | 1.1621 | 9910208 |
0.2237 | 0.1657 | 190 | 1.1566 | 10176824 |
0.16 | 0.1700 | 195 | 1.1572 | 10442960 |
0.1274 | 0.1744 | 200 | 1.1568 | 10703040 |
0.1379 | 0.1788 | 205 | 1.1576 | 10967888 |
0.1656 | 0.1831 | 210 | 1.1561 | 11245224 |
0.2064 | 0.1875 | 215 | 1.1512 | 11513448 |
0.2662 | 0.1918 | 220 | 1.1496 | 11784464 |
0.127 | 0.1962 | 225 | 1.1491 | 12056888 |
0.1255 | 0.2006 | 230 | 1.1503 | 12325408 |
0.1565 | 0.2049 | 235 | 1.1510 | 12598584 |
0.1509 | 0.2093 | 240 | 1.1499 | 12868736 |
0.1953 | 0.2136 | 245 | 1.1463 | 13136656 |
0.1586 | 0.2180 | 250 | 1.1455 | 13399392 |
0.1646 | 0.2224 | 255 | 1.1461 | 13670328 |
0.1585 | 0.2267 | 260 | 1.1449 | 13941568 |
0.1749 | 0.2311 | 265 | 1.1425 | 14213656 |
0.229 | 0.2354 | 270 | 1.1419 | 14480296 |
0.1777 | 0.2398 | 275 | 1.1408 | 14742280 |
0.1847 | 0.2442 | 280 | 1.1393 | 15012952 |
0.1368 | 0.2485 | 285 | 1.1381 | 15283536 |
0.1488 | 0.2529 | 290 | 1.1395 | 15555064 |
0.1431 | 0.2572 | 295 | 1.1414 | 15818408 |
0.2209 | 0.2616 | 300 | 1.1387 | 16075000 |
0.1712 | 0.2660 | 305 | 1.1365 | 16343336 |
0.1321 | 0.2703 | 310 | 1.1413 | 16616912 |
0.1584 | 0.2747 | 315 | 1.1368 | 16875032 |
0.1653 | 0.2790 | 320 | 1.1343 | 17147168 |
0.1435 | 0.2834 | 325 | 1.1345 | 17420704 |
0.2174 | 0.2878 | 330 | 1.1381 | 17694952 |
0.1777 | 0.2921 | 335 | 1.1330 | 17969696 |
0.1376 | 0.2965 | 340 | 1.1380 | 18234344 |
0.1985 | 0.3009 | 345 | 1.1374 | 18504352 |
0.2443 | 0.3052 | 350 | 1.1313 | 18767576 |
0.2337 | 0.3096 | 355 | 1.1354 | 19037032 |
0.2491 | 0.3139 | 360 | 1.1353 | 19312240 |
0.1833 | 0.3183 | 365 | 1.1339 | 19576856 |
0.1297 | 0.3227 | 370 | 1.1322 | 19840624 |
0.2093 | 0.3270 | 375 | 1.1285 | 20107200 |
0.202 | 0.3314 | 380 | 1.1315 | 20382752 |
0.1222 | 0.3357 | 385 | 1.1303 | 20649192 |
0.2074 | 0.3401 | 390 | 1.1281 | 20916344 |
0.1704 | 0.3445 | 395 | 1.1294 | 21184176 |
0.1523 | 0.3488 | 400 | 1.1284 | 21448632 |
0.1934 | 0.3532 | 405 | 1.1291 | 21716296 |
0.1952 | 0.3575 | 410 | 1.1281 | 21979120 |
0.1823 | 0.3619 | 415 | 1.1271 | 22247920 |
0.202 | 0.3663 | 420 | 1.1289 | 22522616 |
0.183 | 0.3706 | 425 | 1.1289 | 22785016 |
0.1281 | 0.3750 | 430 | 1.1279 | 23051360 |
0.1718 | 0.3793 | 435 | 1.1240 | 23319136 |
0.1823 | 0.3837 | 440 | 1.1256 | 23581776 |
0.1819 | 0.3881 | 445 | 1.1254 | 23853344 |
0.2849 | 0.3924 | 450 | 1.1230 | 24120560 |
0.1892 | 0.3968 | 455 | 1.1239 | 24383760 |
0.2209 | 0.4011 | 460 | 1.1242 | 24650912 |
0.1956 | 0.4055 | 465 | 1.1222 | 24919072 |
0.2009 | 0.4099 | 470 | 1.1225 | 25182248 |
0.2131 | 0.4142 | 475 | 1.1244 | 25448152 |
0.2767 | 0.4186 | 480 | 1.1206 | 25724568 |
0.2035 | 0.4229 | 485 | 1.1230 | 25990072 |
0.1726 | 0.4273 | 490 | 1.1229 | 26266152 |
0.1747 | 0.4317 | 495 | 1.1211 | 26538856 |
0.1567 | 0.4360 | 500 | 1.1219 | 26808816 |
0.2012 | 0.4404 | 505 | 1.1238 | 27068048 |
0.1732 | 0.4447 | 510 | 1.1201 | 27342616 |
0.1561 | 0.4491 | 515 | 1.1198 | 27609920 |
0.095 | 0.4535 | 520 | 1.1188 | 27870624 |
0.1379 | 0.4578 | 525 | 1.1204 | 28134792 |
0.1674 | 0.4622 | 530 | 1.1222 | 28396568 |
0.1615 | 0.4665 | 535 | 1.1191 | 28663128 |
0.1583 | 0.4709 | 540 | 1.1178 | 28934016 |
0.1815 | 0.4753 | 545 | 1.1205 | 29204344 |
0.1575 | 0.4796 | 550 | 1.1200 | 29475824 |
0.119 | 0.4840 | 555 | 1.1180 | 29744824 |
0.1957 | 0.4883 | 560 | 1.1213 | 30008296 |
0.1721 | 0.4927 | 565 | 1.1178 | 30276424 |
0.1584 | 0.4971 | 570 | 1.1153 | 30547272 |
0.1939 | 0.5014 | 575 | 1.1177 | 30814240 |
0.1243 | 0.5058 | 580 | 1.1166 | 31083872 |
0.1661 | 0.5101 | 585 | 1.1138 | 31347928 |
0.1119 | 0.5145 | 590 | 1.1172 | 31618224 |
0.1535 | 0.5189 | 595 | 1.1183 | 31886376 |
0.213 | 0.5232 | 600 | 1.1155 | 32158544 |
0.1635 | 0.5276 | 605 | 1.1133 | 32424672 |
0.1192 | 0.5319 | 610 | 1.1127 | 32687600 |
0.1564 | 0.5363 | 615 | 1.1131 | 32952280 |
0.1562 | 0.5407 | 620 | 1.1148 | 33226656 |
0.1224 | 0.5450 | 625 | 1.1140 | 33500976 |
0.2394 | 0.5494 | 630 | 1.1127 | 33768984 |
0.1893 | 0.5537 | 635 | 1.1137 | 34044008 |
0.1695 | 0.5581 | 640 | 1.1114 | 34319264 |
0.1215 | 0.5625 | 645 | 1.1126 | 34579904 |
0.2103 | 0.5668 | 650 | 1.1133 | 34844112 |
0.2049 | 0.5712 | 655 | 1.1113 | 35113520 |
0.1377 | 0.5755 | 660 | 1.1107 | 35383040 |
0.1352 | 0.5799 | 665 | 1.1131 | 35652616 |
0.1551 | 0.5843 | 670 | 1.1133 | 35924136 |
0.2084 | 0.5886 | 675 | 1.1095 | 36186464 |
0.1519 | 0.5930 | 680 | 1.1094 | 36460336 |
0.1865 | 0.5973 | 685 | 1.1114 | 36724928 |
0.1369 | 0.6017 | 690 | 1.1113 | 36996368 |
0.1558 | 0.6061 | 695 | 1.1099 | 37257952 |
0.1818 | 0.6104 | 700 | 1.1101 | 37525992 |
0.0713 | 0.6148 | 705 | 1.1122 | 37785728 |
0.1486 | 0.6191 | 710 | 1.1124 | 38049040 |
0.1592 | 0.6235 | 715 | 1.1115 | 38323032 |
0.2139 | 0.6279 | 720 | 1.1096 | 38584096 |
0.0919 | 0.6322 | 725 | 1.1090 | 38851464 |
0.1401 | 0.6366 | 730 | 1.1083 | 39117616 |
0.1593 | 0.6409 | 735 | 1.1090 | 39376784 |
0.1478 | 0.6453 | 740 | 1.1104 | 39642896 |
0.2766 | 0.6497 | 745 | 1.1090 | 39909216 |
0.1153 | 0.6540 | 750 | 1.1077 | 40174736 |
0.1217 | 0.6584 | 755 | 1.1080 | 40441056 |
0.1804 | 0.6627 | 760 | 1.1097 | 40709984 |
0.1183 | 0.6671 | 765 | 1.1060 | 40973904 |
0.1686 | 0.6715 | 770 | 1.1067 | 41245344 |
0.1561 | 0.6758 | 775 | 1.1079 | 41513696 |
0.1969 | 0.6802 | 780 | 1.1069 | 41780856 |
0.2113 | 0.6845 | 785 | 1.1056 | 42043344 |
0.2114 | 0.6889 | 790 | 1.1049 | 42315768 |
0.1392 | 0.6933 | 795 | 1.1035 | 42589280 |
0.0981 | 0.6976 | 800 | 1.1044 | 42859712 |
0.1494 | 0.7020 | 805 | 1.1051 | 43124824 |
0.144 | 0.7063 | 810 | 1.1039 | 43386168 |
0.1248 | 0.7107 | 815 | 1.1039 | 43647616 |
0.145 | 0.7151 | 820 | 1.1035 | 43919624 |
0.1978 | 0.7194 | 825 | 1.1019 | 44181672 |
0.1561 | 0.7238 | 830 | 1.1038 | 44455232 |
0.1331 | 0.7281 | 835 | 1.1043 | 44717912 |
0.1112 | 0.7325 | 840 | 1.1033 | 44986768 |
0.1463 | 0.7369 | 845 | 1.1059 | 45254168 |
0.0943 | 0.7412 | 850 | 1.1056 | 45522464 |
0.1404 | 0.7456 | 855 | 1.1033 | 45790432 |
0.132 | 0.7499 | 860 | 1.1040 | 46057520 |
0.1499 | 0.7543 | 865 | 1.1032 | 46325824 |
0.1549 | 0.7587 | 870 | 1.1020 | 46588648 |
0.1277 | 0.7630 | 875 | 1.1045 | 46856704 |
0.2225 | 0.7674 | 880 | 1.1043 | 47124424 |
0.1375 | 0.7717 | 885 | 1.1017 | 47393720 |
0.199 | 0.7761 | 890 | 1.1015 | 47663000 |
0.1596 | 0.7805 | 895 | 1.1025 | 47924600 |
0.1723 | 0.7848 | 900 | 1.1026 | 48185544 |
0.1752 | 0.7892 | 905 | 1.1014 | 48459288 |
0.194 | 0.7935 | 910 | 1.1020 | 48732352 |
0.1177 | 0.7979 | 915 | 1.1014 | 48991616 |
0.1534 | 0.8023 | 920 | 1.1008 | 49258896 |
0.1284 | 0.8066 | 925 | 1.1017 | 49536624 |
0.1756 | 0.8110 | 930 | 1.1016 | 49812240 |
0.1669 | 0.8153 | 935 | 1.1010 | 50083288 |
0.1083 | 0.8197 | 940 | 1.1012 | 50352488 |
0.1682 | 0.8241 | 945 | 1.1016 | 50618112 |
0.1024 | 0.8284 | 950 | 1.1033 | 50887896 |
0.2281 | 0.8328 | 955 | 1.1046 | 51155288 |
0.1789 | 0.8371 | 960 | 1.1001 | 51418384 |
0.2187 | 0.8415 | 965 | 1.0984 | 51676344 |
0.1217 | 0.8459 | 970 | 1.1002 | 51945312 |
0.1369 | 0.8502 | 975 | 1.1023 | 52210120 |
0.1458 | 0.8546 | 980 | 1.1012 | 52474216 |
0.1776 | 0.8589 | 985 | 1.1004 | 52739136 |
0.2088 | 0.8633 | 990 | 1.1013 | 53019976 |
0.1626 | 0.8677 | 995 | 1.0996 | 53286120 |
0.163 | 0.8720 | 1000 | 1.0981 | 53549736 |
0.1933 | 0.8764 | 1005 | 1.0994 | 53822128 |
0.1906 | 0.8807 | 1010 | 1.0982 | 54094072 |
0.1602 | 0.8851 | 1015 | 1.0969 | 54364400 |
0.118 | 0.8895 | 1020 | 1.0995 | 54631776 |
0.1309 | 0.8938 | 1025 | 1.0993 | 54897744 |
0.1647 | 0.8982 | 1030 | 1.0983 | 55158376 |
0.1352 | 0.9026 | 1035 | 1.0982 | 55421744 |
0.1634 | 0.9069 | 1040 | 1.0982 | 55690776 |
0.1067 | 0.9113 | 1045 | 1.0978 | 55956256 |
0.2028 | 0.9156 | 1050 | 1.0976 | 56219136 |
0.1621 | 0.9200 | 1055 | 1.0973 | 56489928 |
0.1878 | 0.9244 | 1060 | 1.0966 | 56757960 |
0.1936 | 0.9287 | 1065 | 1.0952 | 57025544 |
0.1295 | 0.9331 | 1070 | 1.0949 | 57289760 |
0.1939 | 0.9374 | 1075 | 1.0984 | 57553256 |
0.2116 | 0.9418 | 1080 | 1.0993 | 57817224 |
0.1712 | 0.9462 | 1085 | 1.0960 | 58080104 |
0.1462 | 0.9505 | 1090 | 1.0959 | 58347832 |
0.1312 | 0.9549 | 1095 | 1.0962 | 58615408 |
0.1191 | 0.9592 | 1100 | 1.0955 | 58889640 |
0.1095 | 0.9636 | 1105 | 1.0959 | 59156928 |
0.1519 | 0.9680 | 1110 | 1.0955 | 59419384 |
0.2033 | 0.9723 | 1115 | 1.0961 | 59689160 |
0.1681 | 0.9767 | 1120 | 1.0966 | 59962888 |
0.1521 | 0.9810 | 1125 | 1.0950 | 60225448 |
0.1482 | 0.9854 | 1130 | 1.0956 | 60489408 |
0.2018 | 0.9898 | 1135 | 1.0959 | 60755280 |
0.1731 | 0.9941 | 1140 | 1.0951 | 61012952 |
0.1909 | 0.9985 | 1145 | 1.0949 | 61279968 |
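The results table can also be analyzed programmatically. The sketch below parses a handful of rows copied from the table (paste in the full table to cover every evaluation). It reports the step with the lowest validation loss and the average number of input tokens per optimizer step. The parser is a hypothetical helper that assumes exactly the five pipe-delimited columns shown above, with "No log" in the Training Loss column before the first logging step.

```python
# A few rows copied verbatim from the results table.
LOG = """\
No log | 0 | 0 | 1.3909 | 0 |
1.2172 | 0.0305 | 35 | 1.1828 | 1870328 |
0.498 | 0.0523 | 60 | 1.2913 | 3206480 |
0.1295 | 0.9331 | 1070 | 1.0949 | 57289760 |
0.1909 | 0.9985 | 1145 | 1.0949 | 61279968 |
"""

def parse_rows(text):
    """Parse 'train | epoch | step | val | tokens |' rows into dicts."""
    rows = []
    for line in text.strip().splitlines():
        # Drop the trailing pipe, then split into the five cells.
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        rows.append({
            "train_loss": None if cells[0] == "No log" else float(cells[0]),
            "epoch": float(cells[1]),
            "step": int(cells[2]),
            "val_loss": float(cells[3]),
            "tokens": int(cells[4]),
        })
    return rows

rows = parse_rows(LOG)
best = min(rows, key=lambda r: r["val_loss"])  # first row with the minimum
tokens_per_step = rows[-1]["tokens"] / rows[-1]["step"]
print("lowest validation loss:", best["val_loss"], "at step", best["step"])
print(f"tokens per optimizer step ~ {tokens_per_step:.0f}, per example ~ {tokens_per_step / 128:.0f}")
```

On these rows the lowest validation loss, 1.0949, first appears at step 1070 (it recurs at step 1145). The run averages about 53,520 input tokens per optimizer step, or roughly 418 per example at a total batch size of 128; depending on how the counter was computed, this may include padding tokens. The sample also shows the early pattern visible in the full table: validation loss rises from 1.1828 at step 35 to 1.2913 at step 60 while training loss keeps falling.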
Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
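To check a local environment against these versions, the standard-library importlib.metadata module can report installed distribution versions. A minimal sketch (the helper name is hypothetical). A PyTorch wheel installed from PyPI may report 2.4.0 rather than 2.4.0+cu121, so a mismatch on the local-version suffix alone is not necessarily significant.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card, keyed by PyPI distribution name.
EXPECTED = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}

def compare_versions(expected):
    """Return {package: (expected, installed_or_None, exact_match)}."""
    report = {}
    for pkg, want in expected.items():
        try:
            have = version(pkg)
        except PackageNotFoundError:
            have = None  # package not installed in this environment
        report[pkg] = (want, have, have == want)
    return report

for pkg, (want, have, ok) in compare_versions(EXPECTED).items():
    print(f"{pkg:12s} expected {want:12s} installed {have or 'missing':12s} {'OK' if ok else 'DIFF'}")
```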
Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter12_sftsd1
- Base model: google/gemma-2-2b