collapse_gemma-2-2b_hs2_accumulate_iter12_sftsd2
This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1055
- Num Input Tokens Seen: 62098928
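The evaluation loss can be easier to read as a perplexity. The following is a minimal sketch, assuming the reported loss is a mean per-token cross-entropy in nats; the card does not state this explicitly, so treat the result as illustrative.

```python
import math

# Final evaluation loss reported above.
eval_loss = 1.1055

# Assumption: the loss is mean per-token cross-entropy measured in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity: {perplexity:.2f}")
```

Under that assumption the loss corresponds to a perplexity of roughly 3.02 on the evaluation set.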
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
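The relationship between these values can be checked with a little arithmetic. The sketch below is not the training code: `num_devices` and `total_steps` are assumptions (the card states neither; one device is consistent with 8 x 16 = 128, and about 1147 steps is estimated from the last row of the results table), and `lr_at` is a hypothetical helper that mirrors the usual shape of a constant-with-warmup schedule.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 8e-06
train_batch_size = 8
gradient_accumulation_steps = 16
warmup_ratio = 0.05

# Assumptions, not stated in the card: a single device and roughly
# 1147 optimizer steps in the one training epoch.
num_devices = 1
total_steps = 1147

# Sequences contributing to each optimizer update.
effective_batch = train_batch_size * gradient_accumulation_steps * num_devices

# Warmup length derived from the ratio, rounded up.
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Hypothetical helper: linear ramp from 0 to learning_rate, then constant."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    return learning_rate


print(effective_batch)
print(warmup_steps)
print(lr_at(0), lr_at(warmup_steps), lr_at(total_steps))
```

With these assumptions the effective batch size matches the reported total_train_batch_size of 128, and the learning rate reaches 8e-06 after the warmup and stays there for the rest of the epoch.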
Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3909 | 0 |
1.5714 | 0.0044 | 5 | 1.3889 | 269136 |
1.6481 | 0.0087 | 10 | 1.3710 | 534416 |
1.4711 | 0.0131 | 15 | 1.3333 | 804880 |
1.4716 | 0.0174 | 20 | 1.2785 | 1077704 |
1.4401 | 0.0218 | 25 | 1.2433 | 1356384 |
1.2567 | 0.0262 | 30 | 1.2055 | 1629992 |
1.2435 | 0.0305 | 35 | 1.1900 | 1899528 |
1.0242 | 0.0349 | 40 | 1.2107 | 2171648 |
0.9154 | 0.0392 | 45 | 1.2221 | 2440512 |
0.8046 | 0.0436 | 50 | 1.2518 | 2711752 |
0.7353 | 0.0480 | 55 | 1.2596 | 2981656 |
0.5851 | 0.0523 | 60 | 1.2734 | 3245672 |
0.6236 | 0.0567 | 65 | 1.2475 | 3519424 |
0.481 | 0.0610 | 70 | 1.2441 | 3797456 |
0.483 | 0.0654 | 75 | 1.2201 | 4077904 |
0.3814 | 0.0698 | 80 | 1.2202 | 4346848 |
0.2834 | 0.0741 | 85 | 1.2232 | 4615392 |
0.3165 | 0.0785 | 90 | 1.2128 | 4888544 |
0.3916 | 0.0828 | 95 | 1.2116 | 5164160 |
0.3785 | 0.0872 | 100 | 1.1969 | 5429672 |
0.2609 | 0.0916 | 105 | 1.1992 | 5704864 |
0.2778 | 0.0959 | 110 | 1.2050 | 5977032 |
0.3176 | 0.1003 | 115 | 1.1959 | 6242688 |
0.2089 | 0.1046 | 120 | 1.2038 | 6513352 |
0.3142 | 0.1090 | 125 | 1.1955 | 6787856 |
0.2903 | 0.1134 | 130 | 1.1957 | 7056344 |
0.2665 | 0.1177 | 135 | 1.1920 | 7321736 |
0.2197 | 0.1221 | 140 | 1.1912 | 7592528 |
0.2884 | 0.1264 | 145 | 1.1848 | 7858864 |
0.2456 | 0.1308 | 150 | 1.1845 | 8125712 |
0.2546 | 0.1352 | 155 | 1.1916 | 8392864 |
0.1769 | 0.1395 | 160 | 1.1835 | 8664872 |
0.1613 | 0.1439 | 165 | 1.1868 | 8931488 |
0.2825 | 0.1482 | 170 | 1.1860 | 9204712 |
0.3585 | 0.1526 | 175 | 1.1792 | 9476968 |
0.1339 | 0.1570 | 180 | 1.1824 | 9747448 |
0.2799 | 0.1613 | 185 | 1.1733 | 10021032 |
0.1959 | 0.1657 | 190 | 1.1794 | 10288128 |
0.1911 | 0.1700 | 195 | 1.1851 | 10559544 |
0.2075 | 0.1744 | 200 | 1.1761 | 10836424 |
0.225 | 0.1788 | 205 | 1.1819 | 11109488 |
0.1565 | 0.1831 | 210 | 1.1742 | 11378832 |
0.1876 | 0.1875 | 215 | 1.1743 | 11656024 |
0.2206 | 0.1918 | 220 | 1.1710 | 11923888 |
0.1611 | 0.1962 | 225 | 1.1677 | 12200672 |
0.2343 | 0.2006 | 230 | 1.1676 | 12475688 |
0.1848 | 0.2049 | 235 | 1.1657 | 12748072 |
0.1716 | 0.2093 | 240 | 1.1723 | 13020904 |
0.1657 | 0.2136 | 245 | 1.1686 | 13288168 |
0.2398 | 0.2180 | 250 | 1.1621 | 13565136 |
0.2173 | 0.2224 | 255 | 1.1652 | 13835440 |
0.2074 | 0.2267 | 260 | 1.1628 | 14101336 |
0.2141 | 0.2311 | 265 | 1.1626 | 14373432 |
0.2124 | 0.2354 | 270 | 1.1600 | 14640240 |
0.1845 | 0.2398 | 275 | 1.1598 | 14912480 |
0.1993 | 0.2442 | 280 | 1.1626 | 15186512 |
0.2065 | 0.2485 | 285 | 1.1607 | 15460344 |
0.1766 | 0.2529 | 290 | 1.1583 | 15732912 |
0.1256 | 0.2572 | 295 | 1.1603 | 16000808 |
0.2397 | 0.2616 | 300 | 1.1526 | 16276400 |
0.209 | 0.2660 | 305 | 1.1525 | 16547224 |
0.1746 | 0.2703 | 310 | 1.1597 | 16815856 |
0.2298 | 0.2747 | 315 | 1.1556 | 17092312 |
0.144 | 0.2790 | 320 | 1.1515 | 17361664 |
0.2386 | 0.2834 | 325 | 1.1545 | 17640264 |
0.2497 | 0.2878 | 330 | 1.1556 | 17912984 |
0.2145 | 0.2921 | 335 | 1.1509 | 18179880 |
0.1915 | 0.2965 | 340 | 1.1517 | 18449384 |
0.2042 | 0.3008 | 345 | 1.1519 | 18722856 |
0.1674 | 0.3052 | 350 | 1.1479 | 18988304 |
0.2264 | 0.3096 | 355 | 1.1493 | 19251032 |
0.2014 | 0.3139 | 360 | 1.1478 | 19521648 |
0.1561 | 0.3183 | 365 | 1.1441 | 19794944 |
0.2617 | 0.3226 | 370 | 1.1474 | 20063320 |
0.1373 | 0.3270 | 375 | 1.1469 | 20334032 |
0.2242 | 0.3314 | 380 | 1.1464 | 20605936 |
0.2796 | 0.3357 | 385 | 1.1429 | 20880952 |
0.1325 | 0.3401 | 390 | 1.1447 | 21149584 |
0.1466 | 0.3444 | 395 | 1.1438 | 21421112 |
0.1876 | 0.3488 | 400 | 1.1406 | 21692168 |
0.1881 | 0.3532 | 405 | 1.1403 | 21958680 |
0.2034 | 0.3575 | 410 | 1.1418 | 22221928 |
0.2216 | 0.3619 | 415 | 1.1398 | 22497560 |
0.2018 | 0.3662 | 420 | 1.1435 | 22768288 |
0.1815 | 0.3706 | 425 | 1.1395 | 23041960 |
0.2286 | 0.3750 | 430 | 1.1365 | 23319424 |
0.2343 | 0.3793 | 435 | 1.1377 | 23587464 |
0.1481 | 0.3837 | 440 | 1.1412 | 23859176 |
0.2012 | 0.3880 | 445 | 1.1373 | 24133216 |
0.2469 | 0.3924 | 450 | 1.1388 | 24406512 |
0.178 | 0.3968 | 455 | 1.1389 | 24676872 |
0.1017 | 0.4011 | 460 | 1.1405 | 24949736 |
0.1884 | 0.4055 | 465 | 1.1362 | 25224328 |
0.2656 | 0.4098 | 470 | 1.1368 | 25499592 |
0.2071 | 0.4142 | 475 | 1.1359 | 25767288 |
0.1874 | 0.4186 | 480 | 1.1345 | 26044560 |
0.1535 | 0.4229 | 485 | 1.1354 | 26315040 |
0.1845 | 0.4273 | 490 | 1.1333 | 26586608 |
0.1864 | 0.4316 | 495 | 1.1327 | 26857224 |
0.1779 | 0.4360 | 500 | 1.1314 | 27122960 |
0.1885 | 0.4404 | 505 | 1.1344 | 27391728 |
0.1718 | 0.4447 | 510 | 1.1354 | 27657824 |
0.0942 | 0.4491 | 515 | 1.1335 | 27931472 |
0.1743 | 0.4534 | 520 | 1.1311 | 28194328 |
0.1209 | 0.4578 | 525 | 1.1338 | 28461568 |
0.2037 | 0.4622 | 530 | 1.1343 | 28728888 |
0.2294 | 0.4665 | 535 | 1.1313 | 28996896 |
0.1599 | 0.4709 | 540 | 1.1318 | 29266120 |
0.1154 | 0.4752 | 545 | 1.1323 | 29534432 |
0.1737 | 0.4796 | 550 | 1.1306 | 29800272 |
0.1687 | 0.4840 | 555 | 1.1277 | 30072208 |
0.1448 | 0.4883 | 560 | 1.1293 | 30339216 |
0.1832 | 0.4927 | 565 | 1.1277 | 30609424 |
0.1773 | 0.4970 | 570 | 1.1309 | 30873736 |
0.1646 | 0.5014 | 575 | 1.1303 | 31147760 |
0.1001 | 0.5057 | 580 | 1.1284 | 31409568 |
0.1803 | 0.5101 | 585 | 1.1294 | 31682912 |
0.1802 | 0.5145 | 590 | 1.1268 | 31957560 |
0.1769 | 0.5188 | 595 | 1.1247 | 32234352 |
0.172 | 0.5232 | 600 | 1.1297 | 32505368 |
0.1083 | 0.5275 | 605 | 1.1284 | 32777160 |
0.1767 | 0.5319 | 610 | 1.1261 | 33054424 |
0.1644 | 0.5363 | 615 | 1.1269 | 33317584 |
0.1901 | 0.5406 | 620 | 1.1257 | 33585000 |
0.2072 | 0.5450 | 625 | 1.1240 | 33858256 |
0.1388 | 0.5493 | 630 | 1.1271 | 34124376 |
0.2749 | 0.5537 | 635 | 1.1260 | 34397016 |
0.174 | 0.5581 | 640 | 1.1255 | 34666288 |
0.2024 | 0.5624 | 645 | 1.1254 | 34941128 |
0.1607 | 0.5668 | 650 | 1.1247 | 35203680 |
0.1947 | 0.5711 | 655 | 1.1272 | 35470872 |
0.206 | 0.5755 | 660 | 1.1243 | 35742856 |
0.1572 | 0.5799 | 665 | 1.1213 | 36009352 |
0.1607 | 0.5842 | 670 | 1.1253 | 36283296 |
0.1448 | 0.5886 | 675 | 1.1256 | 36546552 |
0.1908 | 0.5929 | 680 | 1.1220 | 36812224 |
0.159 | 0.5973 | 685 | 1.1214 | 37074480 |
0.2084 | 0.6017 | 690 | 1.1225 | 37349024 |
0.1685 | 0.6060 | 695 | 1.1201 | 37622672 |
0.1358 | 0.6104 | 700 | 1.1204 | 37895248 |
0.2432 | 0.6147 | 705 | 1.1233 | 38166608 |
0.1306 | 0.6191 | 710 | 1.1223 | 38437032 |
0.1781 | 0.6235 | 715 | 1.1192 | 38705960 |
0.184 | 0.6278 | 720 | 1.1191 | 38982568 |
0.1713 | 0.6322 | 725 | 1.1194 | 39261576 |
0.1609 | 0.6365 | 730 | 1.1213 | 39534896 |
0.235 | 0.6409 | 735 | 1.1203 | 39813040 |
0.1514 | 0.6453 | 740 | 1.1175 | 40091768 |
0.1631 | 0.6496 | 745 | 1.1165 | 40361576 |
0.1387 | 0.6540 | 750 | 1.1179 | 40633920 |
0.1071 | 0.6583 | 755 | 1.1188 | 40898648 |
0.171 | 0.6627 | 760 | 1.1221 | 41167744 |
0.1331 | 0.6671 | 765 | 1.1211 | 41436648 |
0.0944 | 0.6714 | 770 | 1.1195 | 41704784 |
0.1722 | 0.6758 | 775 | 1.1201 | 41979968 |
0.0978 | 0.6801 | 780 | 1.1197 | 42240136 |
0.2128 | 0.6845 | 785 | 1.1198 | 42514696 |
0.1698 | 0.6889 | 790 | 1.1181 | 42792112 |
0.1275 | 0.6932 | 795 | 1.1198 | 43064992 |
0.1837 | 0.6976 | 800 | 1.1183 | 43338704 |
0.1633 | 0.7019 | 805 | 1.1158 | 43609352 |
0.1483 | 0.7063 | 810 | 1.1169 | 43879592 |
0.1385 | 0.7107 | 815 | 1.1175 | 44159664 |
0.1491 | 0.7150 | 820 | 1.1179 | 44435824 |
0.2143 | 0.7194 | 825 | 1.1177 | 44711184 |
0.2218 | 0.7237 | 830 | 1.1162 | 44982280 |
0.1951 | 0.7281 | 835 | 1.1153 | 45250144 |
0.1571 | 0.7325 | 840 | 1.1179 | 45520936 |
0.1185 | 0.7368 | 845 | 1.1184 | 45796048 |
0.14 | 0.7412 | 850 | 1.1146 | 46070536 |
0.166 | 0.7455 | 855 | 1.1143 | 46343432 |
0.1848 | 0.7499 | 860 | 1.1160 | 46607008 |
0.1428 | 0.7543 | 865 | 1.1161 | 46880424 |
0.1463 | 0.7586 | 870 | 1.1167 | 47148984 |
0.215 | 0.7630 | 875 | 1.1149 | 47420136 |
0.1367 | 0.7673 | 880 | 1.1131 | 47690264 |
0.188 | 0.7717 | 885 | 1.1137 | 47961424 |
0.1886 | 0.7761 | 890 | 1.1125 | 48230816 |
0.1364 | 0.7804 | 895 | 1.1116 | 48505096 |
0.1311 | 0.7848 | 900 | 1.1122 | 48772000 |
0.1768 | 0.7891 | 905 | 1.1113 | 49045208 |
0.1662 | 0.7935 | 910 | 1.1145 | 49321520 |
0.1735 | 0.7979 | 915 | 1.1134 | 49591872 |
0.2092 | 0.8022 | 920 | 1.1139 | 49865088 |
0.1234 | 0.8066 | 925 | 1.1139 | 50134048 |
0.1547 | 0.8109 | 930 | 1.1135 | 50404304 |
0.1574 | 0.8153 | 935 | 1.1117 | 50674504 |
0.2222 | 0.8197 | 940 | 1.1120 | 50949408 |
0.0998 | 0.8240 | 945 | 1.1109 | 51218248 |
0.1338 | 0.8284 | 950 | 1.1111 | 51486696 |
0.1828 | 0.8327 | 955 | 1.1109 | 51761960 |
0.1714 | 0.8371 | 960 | 1.1090 | 52029544 |
0.0905 | 0.8415 | 965 | 1.1097 | 52293448 |
0.1579 | 0.8458 | 970 | 1.1132 | 52573088 |
0.1517 | 0.8502 | 975 | 1.1115 | 52847200 |
0.149 | 0.8545 | 980 | 1.1106 | 53115480 |
0.2022 | 0.8589 | 985 | 1.1096 | 53382824 |
0.1988 | 0.8633 | 990 | 1.1089 | 53652560 |
0.1743 | 0.8676 | 995 | 1.1099 | 53920424 |
0.1812 | 0.8720 | 1000 | 1.1082 | 54180336 |
0.1982 | 0.8763 | 1005 | 1.1088 | 54446768 |
0.1534 | 0.8807 | 1010 | 1.1109 | 54719024 |
0.1584 | 0.8851 | 1015 | 1.1098 | 54988144 |
0.1273 | 0.8894 | 1020 | 1.1076 | 55264128 |
0.139 | 0.8938 | 1025 | 1.1080 | 55539016 |
0.1294 | 0.8981 | 1030 | 1.1065 | 55811000 |
0.2221 | 0.9025 | 1035 | 1.1065 | 56091784 |
0.1273 | 0.9069 | 1040 | 1.1054 | 56366464 |
0.1897 | 0.9112 | 1045 | 1.1076 | 56637768 |
0.1722 | 0.9156 | 1050 | 1.1085 | 56912552 |
0.2818 | 0.9199 | 1055 | 1.1078 | 57180760 |
0.1612 | 0.9243 | 1060 | 1.1075 | 57450848 |
0.1764 | 0.9287 | 1065 | 1.1053 | 57719264 |
0.1715 | 0.9330 | 1070 | 1.1085 | 57987400 |
0.1137 | 0.9374 | 1075 | 1.1102 | 58255104 |
0.1939 | 0.9417 | 1080 | 1.1083 | 58524296 |
0.1489 | 0.9461 | 1085 | 1.1067 | 58791224 |
0.1914 | 0.9505 | 1090 | 1.1072 | 59060792 |
0.1485 | 0.9548 | 1095 | 1.1070 | 59334440 |
0.1623 | 0.9592 | 1100 | 1.1061 | 59612408 |
0.1823 | 0.9635 | 1105 | 1.1058 | 59874768 |
0.1025 | 0.9679 | 1110 | 1.1074 | 60145704 |
0.1447 | 0.9723 | 1115 | 1.1073 | 60416576 |
0.1845 | 0.9766 | 1120 | 1.1062 | 60688080 |
0.1749 | 0.9810 | 1125 | 1.1060 | 60957992 |
0.1365 | 0.9853 | 1130 | 1.1057 | 61229160 |
0.146 | 0.9897 | 1135 | 1.1069 | 61495888 |
0.1761 | 0.9941 | 1140 | 1.1058 | 61764696 |
0.1368 | 0.9984 | 1145 | 1.1058 | 62040384 |
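Rows in this format are easy to parse for quick analysis, for example to find the step with the lowest validation loss or the average number of tokens per optimizer step. The sketch below uses a five-row excerpt copied from the table; `Row`, `parse_row`, and `EXCERPT` are illustrative names, not part of any library.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    train_loss: Optional[float]  # None when the table says "No log"
    epoch: float
    step: int
    val_loss: float
    tokens_seen: int


# A few rows copied verbatim from the table above.
EXCERPT = """\
No log | 0 | 0 | 1.3909 | 0 |
1.2435 | 0.0305 | 35 | 1.1900 | 1899528 |
0.5851 | 0.0523 | 60 | 1.2734 | 3245672 |
0.1764 | 0.9287 | 1065 | 1.1053 | 57719264 |
0.1368 | 0.9984 | 1145 | 1.1058 | 62040384 |
"""


def parse_row(line: str) -> Row:
    """Split one pipe-delimited table row into typed fields."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    train, epoch, step, val, tokens = cells
    train_loss = None if train == "No log" else float(train)
    return Row(train_loss, float(epoch), int(step), float(val), int(tokens))


rows = [parse_row(line) for line in EXCERPT.splitlines() if line.strip()]

# Row with the lowest validation loss in the excerpt.
best = min(rows, key=lambda r: r.val_loss)

# Average input tokens consumed per optimizer step over the whole run.
last = rows[-1]
tokens_per_step = last.tokens_seen / last.step

print(f"best step: {best.step} (validation loss {best.val_loss})")
print(f"tokens per step: {tokens_per_step:.0f}")
```

The excerpt also shows the overall shape of the run: validation loss falls to 1.1900 at step 35, climbs back to 1.2734 by step 60 while training loss keeps dropping, and then declines slowly to about 1.106 by the end of the epoch. The run averages about 54,000 input tokens per optimizer step.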
Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter12_sftsd2
Base model
google/gemma-2-2b