---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd1
  results: []
---

# collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd1

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1115
- Num Input Tokens Seen: 63358856

A minimal usage sketch appears at the end of this card.

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch that mirrors these settings follows the results table):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3956 | 0 |
| 1.6281 | 0.0043 | 5 | 1.3934 | 274928 |
| 1.6766 | 0.0085 | 10 | 1.3767 | 544864 |
| 1.667 | 0.0128 | 15 | 1.3401 | 809432 |
| 1.509 | 0.0170 | 20 | 1.2828 | 1087464 |
| 1.4846 | 0.0213 | 25 | 1.2412 | 1359432 |
| 1.4854 | 0.0256 | 30 | 1.2019 | 1633040 |
| 1.3541 | 0.0298 | 35 | 1.1764 | 1897344 |
| 1.3155 | 0.0341 | 40 | 1.1756 | 2165096 |
| 1.0909 | 0.0384 | 45 | 1.1668 | 2444600 |
| 1.1077 | 0.0426 | 50 | 1.1836 | 2713008 |
| 0.9467 | 0.0469 | 55 | 1.1976 | 2984504 |
| 0.8303 | 0.0511 | 60 | 1.2374 | 3258672 |
| 0.7233 | 0.0554 | 65 | 1.2594 | 3526576 |
| 0.6752 | 0.0597 | 70 | 1.2789 | 3802264 |
| 0.6404 | 0.0639 | 75 | 1.2678 | 4073752 |
| 0.503 | 0.0682 | 80 | 1.2459 | 4336176 |
| 0.6266 | 0.0724 | 85 | 1.2244 | 4604864 |
| 0.4903 | 0.0767 | 90 | 1.2451 | 4879872 |
| 0.3967 | 0.0810 | 95 | 1.2479 | 5147992 |
| 0.478 | 0.0852 | 100 | 1.2385 | 5423432 |
| 0.3776 | 0.0895 | 105 | 1.2452 | 5690880 |
| 0.5358 | 0.0938 | 110 | 1.2197 | 5960008 |
| 0.3677 | 0.0980 | 115 | 1.2249 | 6227992 |
| 0.3285 | 0.1023 | 120 | 1.2273 | 6491128 |
| 0.3436 | 0.1065 | 125 | 1.2202 | 6750096 |
| 0.3619 | 0.1108 | 130 | 1.2275 | 7019536 |
| 0.2643 | 0.1151 | 135 | 1.2136 | 7290928 |
| 0.3383 | 0.1193 | 140 | 1.2097 | 7559208 |
| 0.2027 | 0.1236 | 145 | 1.2155 | 7833224 |
| 0.268 | 0.1278 | 150 | 1.2159 | 8099480 |
| 0.1734 | 0.1321 | 155 | 1.2109 | 8367592 |
| 0.2535 | 0.1364 | 160 | 1.2149 | 8642344 |
| 0.3114 | 0.1406 | 165 | 1.2048 | 8910480 |
| 0.2827 | 0.1449 | 170 | 1.2074 | 9185168 |
| 0.241 | 0.1492 | 175 | 1.2048 | 9457328 |
| 0.2103 | 0.1534 | 180 | 1.2092 | 9726280 |
| 0.2186 | 0.1577 | 185 | 1.2012 | 10003424 |
| 0.2764 | 0.1619 | 190 | 1.2015 | 10272208 |
| 0.3248 | 0.1662 | 195 | 1.2095 | 10544896 |
| 0.1936 | 0.1705 | 200 | 1.1958 | 10812488 |
| 0.2836 | 0.1747 | 205 | 1.2073 | 11089456 |
| 0.2181 | 0.1790 | 210 | 1.1978 | 11359256 |
| 0.2808 | 0.1832 | 215 | 1.2115 | 11622536 |
| 0.2303 | 0.1875 | 220 | 1.2023 | 11894088 |
| 0.2649 | 0.1918 | 225 | 1.1951 | 12169736 |
| 0.163 | 0.1960 | 230 | 1.2074 | 12438536 |
| 0.2623 | 0.2003 | 235 | 1.1926 | 12712496 |
| 0.2829 | 0.2045 | 240 | 1.1889 | 12973344 |
| 0.1654 | 0.2088 | 245 | 1.2087 | 13238288 |
| 0.2101 | 0.2131 | 250 | 1.1974 | 13497064 |
| 0.2679 | 0.2173 | 255 | 1.1970 | 13762864 |
| 0.21 | 0.2216 | 260 | 1.2069 | 14031040 |
| 0.2882 | 0.2259 | 265 | 1.1903 | 14305152 |
| 0.161 | 0.2301 | 270 | 1.1928 | 14565352 |
| 0.212 | 0.2344 | 275 | 1.1942 | 14832368 |
| 0.2318 | 0.2386 | 280 | 1.1857 | 15100944 |
| 0.2449 | 0.2429 | 285 | 1.1835 | 15373648 |
| 0.2106 | 0.2472 | 290 | 1.1859 | 15643152 |
| 0.1681 | 0.2514 | 295 | 1.1896 | 15916112 |
| 0.1699 | 0.2557 | 300 | 1.1857 | 16185920 |
| 0.1487 | 0.2599 | 305 | 1.1913 | 16451496 |
| 0.1664 | 0.2642 | 310 | 1.1857 | 16711480 |
| 0.187 | 0.2685 | 315 | 1.1839 | 16978216 |
| 0.1805 | 0.2727 | 320 | 1.1826 | 17252816 |
| 0.113 | 0.2770 | 325 | 1.1844 | 17524376 |
| 0.1424 | 0.2813 | 330 | 1.1780 | 17797640 |
| 0.1878 | 0.2855 | 335 | 1.1773 | 18070776 |
| 0.1559 | 0.2898 | 340 | 1.1812 | 18340296 |
| 0.1767 | 0.2940 | 345 | 1.1787 | 18607856 |
| 0.1796 | 0.2983 | 350 | 1.1758 | 18877784 |
| 0.1905 | 0.3026 | 355 | 1.1769 | 19145032 |
| 0.2067 | 0.3068 | 360 | 1.1701 | 19411080 |
| 0.1735 | 0.3111 | 365 | 1.1703 | 19683120 |
| 0.1467 | 0.3153 | 370 | 1.1710 | 19954368 |
| 0.1899 | 0.3196 | 375 | 1.1679 | 20220928 |
| 0.1742 | 0.3239 | 380 | 1.1711 | 20484120 |
| 0.1914 | 0.3281 | 385 | 1.1710 | 20756952 |
| 0.2192 | 0.3324 | 390 | 1.1657 | 21020880 |
| 0.1977 | 0.3367 | 395 | 1.1681 | 21289296 |
| 0.2191 | 0.3409 | 400 | 1.1701 | 21550608 |
| 0.2034 | 0.3452 | 405 | 1.1586 | 21819224 |
| 0.0956 | 0.3494 | 410 | 1.1638 | 22091752 |
| 0.2299 | 0.3537 | 415 | 1.1632 | 22362592 |
| 0.2555 | 0.3580 | 420 | 1.1605 | 22636728 |
| 0.1571 | 0.3622 | 425 | 1.1625 | 22904584 |
| 0.1971 | 0.3665 | 430 | 1.1617 | 23172496 |
| 0.184 | 0.3707 | 435 | 1.1643 | 23445400 |
| 0.1952 | 0.3750 | 440 | 1.1570 | 23714944 |
| 0.1909 | 0.3793 | 445 | 1.1621 | 23982496 |
| 0.1895 | 0.3835 | 450 | 1.1596 | 24260496 |
| 0.2287 | 0.3878 | 455 | 1.1568 | 24522088 |
| 0.2005 | 0.3921 | 460 | 1.1558 | 24794496 |
| 0.2329 | 0.3963 | 465 | 1.1526 | 25068160 |
| 0.1996 | 0.4006 | 470 | 1.1563 | 25338592 |
| 0.1696 | 0.4048 | 475 | 1.1597 | 25605840 |
| 0.1521 | 0.4091 | 480 | 1.1532 | 25868568 |
| 0.22 | 0.4134 | 485 | 1.1542 | 26134856 |
| 0.1432 | 0.4176 | 490 | 1.1551 | 26399024 |
| 0.2086 | 0.4219 | 495 | 1.1515 | 26672664 |
| 0.2092 | 0.4261 | 500 | 1.1499 | 26942488 |
| 0.2645 | 0.4304 | 505 | 1.1498 | 27215440 |
| 0.1131 | 0.4347 | 510 | 1.1493 | 27490712 |
| 0.1224 | 0.4389 | 515 | 1.1499 | 27763952 |
| 0.1864 | 0.4432 | 520 | 1.1474 | 28039296 |
| 0.1936 | 0.4475 | 525 | 1.1494 | 28312376 |
| 0.1106 | 0.4517 | 530 | 1.1513 | 28585312 |
| 0.1908 | 0.4560 | 535 | 1.1464 | 28865768 |
| 0.1765 | 0.4602 | 540 | 1.1448 | 29134744 |
| 0.1541 | 0.4645 | 545 | 1.1474 | 29409200 |
| 0.2046 | 0.4688 | 550 | 1.1473 | 29681272 |
| 0.2496 | 0.4730 | 555 | 1.1475 | 29955096 |
| 0.1368 | 0.4773 | 560 | 1.1476 | 30226144 |
| 0.1218 | 0.4815 | 565 | 1.1459 | 30504928 |
| 0.2231 | 0.4858 | 570 | 1.1427 | 30776800 |
| 0.163 | 0.4901 | 575 | 1.1443 | 31046360 |
| 0.2059 | 0.4943 | 580 | 1.1467 | 31322592 |
| 0.1813 | 0.4986 | 585 | 1.1417 | 31595528 |
| 0.1555 | 0.5028 | 590 | 1.1418 | 31860016 |
| 0.2493 | 0.5071 | 595 | 1.1478 | 32134176 |
| 0.2077 | 0.5114 | 600 | 1.1402 | 32404440 |
| 0.1179 | 0.5156 | 605 | 1.1431 | 32677328 |
| 0.1535 | 0.5199 | 610 | 1.1459 | 32946528 |
| 0.1342 | 0.5242 | 615 | 1.1429 | 33211040 |
| 0.1166 | 0.5284 | 620 | 1.1404 | 33480416 |
| 0.1949 | 0.5327 | 625 | 1.1422 | 33754328 |
| 0.1487 | 0.5369 | 630 | 1.1394 | 34023400 |
| 0.2949 | 0.5412 | 635 | 1.1387 | 34301016 |
| 0.1005 | 0.5455 | 640 | 1.1392 | 34568856 |
| 0.1058 | 0.5497 | 645 | 1.1401 | 34835552 |
| 0.2073 | 0.5540 | 650 | 1.1370 | 35107216 |
| 0.1519 | 0.5582 | 655 | 1.1394 | 35380776 |
| 0.1598 | 0.5625 | 660 | 1.1409 | 35650240 |
| 0.1849 | 0.5668 | 665 | 1.1389 | 35923440 |
| 0.2379 | 0.5710 | 670 | 1.1376 | 36202688 |
| 0.1738 | 0.5753 | 675 | 1.1405 | 36468728 |
| 0.1388 | 0.5796 | 680 | 1.1376 | 36743760 |
| 0.1362 | 0.5838 | 685 | 1.1347 | 37008264 |
| 0.1655 | 0.5881 | 690 | 1.1386 | 37274384 |
| 0.1673 | 0.5923 | 695 | 1.1373 | 37542432 |
| 0.1822 | 0.5966 | 700 | 1.1318 | 37814184 |
| 0.2159 | 0.6009 | 705 | 1.1359 | 38079800 |
| 0.0818 | 0.6051 | 710 | 1.1377 | 38353440 |
| 0.1606 | 0.6094 | 715 | 1.1314 | 38622312 |
| 0.2045 | 0.6136 | 720 | 1.1298 | 38888976 |
| 0.1382 | 0.6179 | 725 | 1.1335 | 39157440 |
| 0.1373 | 0.6222 | 730 | 1.1324 | 39424448 |
| 0.1212 | 0.6264 | 735 | 1.1321 | 39685152 |
| 0.1736 | 0.6307 | 740 | 1.1328 | 39955296 |
| 0.2216 | 0.6350 | 745 | 1.1322 | 40224264 |
| 0.159 | 0.6392 | 750 | 1.1309 | 40489880 |
| 0.2088 | 0.6435 | 755 | 1.1308 | 40760920 |
| 0.1727 | 0.6477 | 760 | 1.1288 | 41036304 |
| 0.1264 | 0.6520 | 765 | 1.1313 | 41308776 |
| 0.143 | 0.6563 | 770 | 1.1341 | 41587328 |
| 0.1625 | 0.6605 | 775 | 1.1328 | 41858424 |
| 0.2043 | 0.6648 | 780 | 1.1314 | 42130400 |
| 0.1761 | 0.6690 | 785 | 1.1297 | 42408016 |
| 0.1527 | 0.6733 | 790 | 1.1288 | 42679608 |
| 0.1701 | 0.6776 | 795 | 1.1300 | 42942480 |
| 0.2072 | 0.6818 | 800 | 1.1308 | 43218032 |
| 0.1241 | 0.6861 | 805 | 1.1280 | 43496640 |
| 0.1741 | 0.6904 | 810 | 1.1297 | 43772392 |
| 0.174 | 0.6946 | 815 | 1.1314 | 44035960 |
| 0.1829 | 0.6989 | 820 | 1.1298 | 44311064 |
| 0.1704 | 0.7031 | 825 | 1.1278 | 44579240 |
| 0.174 | 0.7074 | 830 | 1.1272 | 44853824 |
| 0.1689 | 0.7117 | 835 | 1.1281 | 45122712 |
| 0.1658 | 0.7159 | 840 | 1.1275 | 45396376 |
| 0.1402 | 0.7202 | 845 | 1.1272 | 45659992 |
| 0.1506 | 0.7244 | 850 | 1.1272 | 45938136 |
| 0.1741 | 0.7287 | 855 | 1.1271 | 46205480 |
| 0.1701 | 0.7330 | 860 | 1.1284 | 46465984 |
| 0.1296 | 0.7372 | 865 | 1.1305 | 46725880 |
| 0.222 | 0.7415 | 870 | 1.1291 | 46996672 |
| 0.1797 | 0.7458 | 875 | 1.1266 | 47262816 |
| 0.1854 | 0.7500 | 880 | 1.1240 | 47542048 |
| 0.1431 | 0.7543 | 885 | 1.1235 | 47820544 |
| 0.1636 | 0.7585 | 890 | 1.1276 | 48087000 |
| 0.1267 | 0.7628 | 895 | 1.1238 | 48357488 |
| 0.1658 | 0.7671 | 900 | 1.1219 | 48629528 |
| 0.1863 | 0.7713 | 905 | 1.1293 | 48899952 |
| 0.1718 | 0.7756 | 910 | 1.1283 | 49171336 |
| 0.2038 | 0.7798 | 915 | 1.1232 | 49434544 |
| 0.1561 | 0.7841 | 920 | 1.1253 | 49697136 |
| 0.1312 | 0.7884 | 925 | 1.1250 | 49959432 |
| 0.1334 | 0.7926 | 930 | 1.1256 | 50228984 |
| 0.1727 | 0.7969 | 935 | 1.1272 | 50492880 |
| 0.1703 | 0.8012 | 940 | 1.1220 | 50761584 |
| 0.1577 | 0.8054 | 945 | 1.1225 | 51026360 |
| 0.1965 | 0.8097 | 950 | 1.1240 | 51298120 |
| 0.128 | 0.8139 | 955 | 1.1219 | 51565616 |
| 0.1908 | 0.8182 | 960 | 1.1229 | 51835072 |
| 0.1182 | 0.8225 | 965 | 1.1249 | 52098072 |
| 0.121 | 0.8267 | 970 | 1.1219 | 52363024 |
| 0.1408 | 0.8310 | 975 | 1.1221 | 52633408 |
| 0.1766 | 0.8352 | 980 | 1.1262 | 52901664 |
| 0.1924 | 0.8395 | 985 | 1.1221 | 53172064 |
| 0.0933 | 0.8438 | 990 | 1.1195 | 53439088 |
| 0.1967 | 0.8480 | 995 | 1.1248 | 53708936 |
| 0.2113 | 0.8523 | 1000 | 1.1240 | 53972568 |
| 0.1773 | 0.8565 | 1005 | 1.1232 | 54246864 |
| 0.1417 | 0.8608 | 1010 | 1.1192 | 54508856 |
| 0.1155 | 0.8651 | 1015 | 1.1206 | 54774728 |
| 0.2128 | 0.8693 | 1020 | 1.1234 | 55045544 |
| 0.1417 | 0.8736 | 1025 | 1.1195 | 55322176 |
| 0.1838 | 0.8779 | 1030 | 1.1187 | 55594384 |
| 0.1663 | 0.8821 | 1035 | 1.1188 | 55867672 |
| 0.1767 | 0.8864 | 1040 | 1.1180 | 56144856 |
| 0.0898 | 0.8906 | 1045 | 1.1197 | 56423512 |
| 0.2001 | 0.8949 | 1050 | 1.1176 | 56700800 |
| 0.2134 | 0.8992 | 1055 | 1.1176 | 56967968 |
| 0.1479 | 0.9034 | 1060 | 1.1187 | 57237424 |
| 0.135 | 0.9077 | 1065 | 1.1166 | 57512288 |
| 0.1949 | 0.9119 | 1070 | 1.1154 | 57785904 |
| 0.1553 | 0.9162 | 1075 | 1.1168 | 58061496 |
| 0.1645 | 0.9205 | 1080 | 1.1176 | 58337408 |
| 0.1523 | 0.9247 | 1085 | 1.1163 | 58612048 |
| 0.2493 | 0.9290 | 1090 | 1.1176 | 58885584 |
| 0.1114 | 0.9333 | 1095 | 1.1175 | 59162552 |
| 0.1506 | 0.9375 | 1100 | 1.1158 | 59425968 |
| 0.1647 | 0.9418 | 1105 | 1.1154 | 59701184 |
| 0.1982 | 0.9460 | 1110 | 1.1149 | 59966112 |
| 0.1758 | 0.9503 | 1115 | 1.1135 | 60230008 |
| 0.1625 | 0.9546 | 1120 | 1.1132 | 60497792 |
| 0.1389 | 0.9588 | 1125 | 1.1136 | 60773880 |
| 0.1207 | 0.9631 | 1130 | 1.1148 | 61040768 |
| 0.1471 | 0.9673 | 1135 | 1.1152 | 61313120 |
| 0.162 | 0.9716 | 1140 | 1.1124 | 61587408 |
| 0.1433 | 0.9759 | 1145 | 1.1122 | 61853400 |
| 0.1479 | 0.9801 | 1150 | 1.1130 | 62126856 |
| 0.1043 | 0.9844 | 1155 | 1.1137 | 62392328 |
| 0.125 | 0.9887 | 1160 | 1.1138 | 62659824 |
| 0.2094 | 0.9929 | 1165 | 1.1131 | 62932648 |
| 0.1797 | 0.9972 | 1170 | 1.1126 | 63200224 |

### Framework versions

- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
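
### Training configuration sketch

The exact training script and dataset are not documented on this card. The snippet below is a minimal sketch (not the authors' script) of how the hyperparameters listed above could be expressed with TRL's `SFTConfig`/`SFTTrainer` at versions matching the framework list. The dataset name `your_dataset`, the `text` column, and the single-GPU setup are assumptions for illustration only.

```python
# Minimal sketch mirroring the listed hyperparameters; NOT the authors' exact script.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

base_id = "google/gemma-2-2b"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# The card only says "an unknown dataset"; these names are placeholders.
train_dataset = load_dataset("your_dataset", split="train")
eval_dataset = load_dataset("your_dataset", split="validation")

args = SFTConfig(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=8,    # train_batch_size: 8
    per_device_eval_batch_size=16,    # eval_batch_size: 16
    gradient_accumulation_steps=16,   # 8 x 16 = total_train_batch_size 128 on one device
    num_train_epochs=1,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    seed=1,
    adam_beta1=0.9,                   # the Adam settings listed above are the defaults
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    eval_strategy="steps",            # the results table logs an evaluation every 5 steps
    eval_steps=5,
    logging_steps=5,
    dataset_text_field="text",        # assumed column name in the undocumented dataset
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```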
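
## Usage sketch

A minimal inference example with `transformers`. The repository id below is a placeholder: replace `<namespace>` with the Hub account that hosts this checkpoint.

```python
# Minimal inference sketch; the repository namespace is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "<namespace>/collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd1"  # placeholder namespace

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # Gemma-2 checkpoints are commonly run in bfloat16
    device_map="auto",
)

prompt = "The quick brown fox"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```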