collapse_gemma-2-2b_hs2_accumulate_iter13_sftsd2
This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1055
- Num Input Tokens Seen: 67259368
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
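The batch-size and schedule settings above are related by simple arithmetic, which the following minimal sketch works through. It assumes a single training device (consistent with 8 × 16 = 128) and roughly 1,244 optimizer steps in the epoch; that step count is an estimate taken from the results table (step 1240 at epoch 0.9971), not a value stated in this card. The function names are hypothetical, and the warmup rule shown is the usual linear-ramp-then-constant shape, not a confirmed detail of this run.

```python
import math

# Values copied from the hyperparameter list.
LEARNING_RATE = 8e-06
TRAIN_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 16
WARMUP_RATIO = 0.05

# Assumptions (not stated in the card).
NUM_DEVICES = 1               # assumed: 8 * 16 * 1 matches the reported 128
ESTIMATED_TOTAL_STEPS = 1244  # assumed: estimated as 1240 / 0.9971, rounded


def effective_batch_size(per_device, accumulation_steps, num_devices=1):
    """Number of sequences contributing to one optimizer update."""
    return per_device * accumulation_steps * num_devices


def warmup_steps(total_steps, warmup_ratio):
    """Warmup length in optimizer steps, rounded up."""
    return math.ceil(total_steps * warmup_ratio)


def lr_at_step(step, base_lr, num_warmup_steps):
    """Linear ramp from 0 to base_lr during warmup, then constant."""
    if step < num_warmup_steps:
        return base_lr * step / max(1, num_warmup_steps)
    return base_lr


total_batch = effective_batch_size(
    TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_DEVICES
)
n_warmup = warmup_steps(ESTIMATED_TOTAL_STEPS, WARMUP_RATIO)

print("total_train_batch_size:", total_batch)
print("estimated warmup steps:", n_warmup)
for step in (0, n_warmup // 2, n_warmup, 1240):
    print(step, lr_at_step(step, LEARNING_RATE, n_warmup))
```

Under these assumptions the learning rate reaches its full value of 8e-06 after a few dozen optimizer steps and stays there for the rest of the epoch.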
Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3909 | 0 |
1.5671 | 0.0040 | 5 | 1.3893 | 272824 |
1.6849 | 0.0080 | 10 | 1.3735 | 539200 |
1.5548 | 0.0121 | 15 | 1.3410 | 813336 |
1.5425 | 0.0161 | 20 | 1.2875 | 1083664 |
1.4656 | 0.0201 | 25 | 1.2496 | 1352864 |
1.3104 | 0.0241 | 30 | 1.2188 | 1627144 |
1.2102 | 0.0281 | 35 | 1.1918 | 1903616 |
1.0365 | 0.0322 | 40 | 1.2027 | 2179440 |
0.9723 | 0.0362 | 45 | 1.2295 | 2452416 |
0.8749 | 0.0402 | 50 | 1.2301 | 2723624 |
0.7583 | 0.0442 | 55 | 1.2641 | 2995552 |
0.6034 | 0.0482 | 60 | 1.2442 | 3263112 |
0.6268 | 0.0523 | 65 | 1.2775 | 3532048 |
0.5245 | 0.0563 | 70 | 1.2555 | 3803712 |
0.5529 | 0.0603 | 75 | 1.2545 | 4067952 |
0.451 | 0.0643 | 80 | 1.2311 | 4341488 |
0.3354 | 0.0683 | 85 | 1.2287 | 4606360 |
0.3631 | 0.0724 | 90 | 1.2392 | 4870680 |
0.3391 | 0.0764 | 95 | 1.2107 | 5141880 |
0.3349 | 0.0804 | 100 | 1.2144 | 5419064 |
0.3393 | 0.0844 | 105 | 1.2027 | 5684304 |
0.3478 | 0.0885 | 110 | 1.2009 | 5960032 |
0.3301 | 0.0925 | 115 | 1.2019 | 6233704 |
0.3595 | 0.0965 | 120 | 1.1983 | 6513328 |
0.2708 | 0.1005 | 125 | 1.1992 | 6787368 |
0.3066 | 0.1045 | 130 | 1.1890 | 7058856 |
0.3377 | 0.1086 | 135 | 1.1993 | 7326624 |
0.2734 | 0.1126 | 140 | 1.1889 | 7592432 |
0.2268 | 0.1166 | 145 | 1.1897 | 7861688 |
0.1791 | 0.1206 | 150 | 1.1898 | 8135448 |
0.2283 | 0.1246 | 155 | 1.1914 | 8402560 |
0.2149 | 0.1287 | 160 | 1.1828 | 8676568 |
0.2341 | 0.1327 | 165 | 1.1826 | 8955080 |
0.1841 | 0.1367 | 170 | 1.1806 | 9223800 |
0.2142 | 0.1407 | 175 | 1.1779 | 9498448 |
0.2336 | 0.1447 | 180 | 1.1816 | 9774656 |
0.2968 | 0.1488 | 185 | 1.1765 | 10048784 |
0.1364 | 0.1528 | 190 | 1.1827 | 10319264 |
0.2182 | 0.1568 | 195 | 1.1731 | 10587344 |
0.234 | 0.1608 | 200 | 1.1809 | 10855224 |
0.2346 | 0.1648 | 205 | 1.1748 | 11119840 |
0.2163 | 0.1689 | 210 | 1.1755 | 11385120 |
0.1277 | 0.1729 | 215 | 1.1708 | 11649928 |
0.2155 | 0.1769 | 220 | 1.1753 | 11919808 |
0.2224 | 0.1809 | 225 | 1.1692 | 12187272 |
0.2839 | 0.1849 | 230 | 1.1682 | 12449784 |
0.2706 | 0.1890 | 235 | 1.1657 | 12713640 |
0.208 | 0.1930 | 240 | 1.1663 | 12981568 |
0.3348 | 0.1970 | 245 | 1.1668 | 13246088 |
0.2328 | 0.2010 | 250 | 1.1643 | 13527032 |
0.21 | 0.2050 | 255 | 1.1641 | 13794352 |
0.2265 | 0.2091 | 260 | 1.1654 | 14060808 |
0.2371 | 0.2131 | 265 | 1.1611 | 14326568 |
0.1337 | 0.2171 | 270 | 1.1660 | 14596256 |
0.1614 | 0.2211 | 275 | 1.1640 | 14863360 |
0.156 | 0.2251 | 280 | 1.1612 | 15131520 |
0.2243 | 0.2292 | 285 | 1.1625 | 15396504 |
0.1061 | 0.2332 | 290 | 1.1599 | 15662048 |
0.2318 | 0.2372 | 295 | 1.1587 | 15924504 |
0.2002 | 0.2412 | 300 | 1.1617 | 16188520 |
0.1501 | 0.2453 | 305 | 1.1567 | 16462088 |
0.164 | 0.2493 | 310 | 1.1586 | 16728552 |
0.1685 | 0.2533 | 315 | 1.1593 | 16997816 |
0.2137 | 0.2573 | 320 | 1.1597 | 17271304 |
0.2127 | 0.2613 | 325 | 1.1538 | 17545976 |
0.2043 | 0.2654 | 330 | 1.1637 | 17818488 |
0.2089 | 0.2694 | 335 | 1.1541 | 18087784 |
0.2464 | 0.2734 | 340 | 1.1537 | 18359592 |
0.2208 | 0.2774 | 345 | 1.1545 | 18632376 |
0.1978 | 0.2814 | 350 | 1.1534 | 18901080 |
0.1607 | 0.2855 | 355 | 1.1560 | 19166664 |
0.1316 | 0.2895 | 360 | 1.1539 | 19433808 |
0.1762 | 0.2935 | 365 | 1.1498 | 19707816 |
0.2698 | 0.2975 | 370 | 1.1493 | 19974784 |
0.1259 | 0.3015 | 375 | 1.1471 | 20245472 |
0.1371 | 0.3056 | 380 | 1.1475 | 20525528 |
0.2212 | 0.3096 | 385 | 1.1480 | 20802288 |
0.2278 | 0.3136 | 390 | 1.1473 | 21064528 |
0.1991 | 0.3176 | 395 | 1.1484 | 21333008 |
0.1766 | 0.3216 | 400 | 1.1453 | 21608208 |
0.129 | 0.3257 | 405 | 1.1489 | 21881824 |
0.1451 | 0.3297 | 410 | 1.1449 | 22153952 |
0.1526 | 0.3337 | 415 | 1.1432 | 22427152 |
0.2111 | 0.3377 | 420 | 1.1434 | 22694304 |
0.1552 | 0.3417 | 425 | 1.1449 | 22967072 |
0.2009 | 0.3458 | 430 | 1.1419 | 23234792 |
0.1275 | 0.3498 | 435 | 1.1435 | 23509544 |
0.1635 | 0.3538 | 440 | 1.1424 | 23780264 |
0.1961 | 0.3578 | 445 | 1.1379 | 24049528 |
0.1363 | 0.3618 | 450 | 1.1440 | 24327024 |
0.1557 | 0.3659 | 455 | 1.1421 | 24597640 |
0.1438 | 0.3699 | 460 | 1.1379 | 24860472 |
0.2417 | 0.3739 | 465 | 1.1393 | 25133472 |
0.1708 | 0.3779 | 470 | 1.1363 | 25405592 |
0.1151 | 0.3819 | 475 | 1.1423 | 25672128 |
0.1869 | 0.3860 | 480 | 1.1394 | 25937304 |
0.1781 | 0.3900 | 485 | 1.1371 | 26209136 |
0.1838 | 0.3940 | 490 | 1.1383 | 26481296 |
0.189 | 0.3980 | 495 | 1.1367 | 26752808 |
0.1679 | 0.4021 | 500 | 1.1336 | 27019792 |
0.0757 | 0.4061 | 505 | 1.1386 | 27288800 |
0.1733 | 0.4101 | 510 | 1.1366 | 27554256 |
0.1756 | 0.4141 | 515 | 1.1338 | 27831136 |
0.1946 | 0.4181 | 520 | 1.1366 | 28100856 |
0.188 | 0.4222 | 525 | 1.1330 | 28369880 |
0.1342 | 0.4262 | 530 | 1.1344 | 28644496 |
0.1069 | 0.4302 | 535 | 1.1356 | 28909128 |
0.1664 | 0.4342 | 540 | 1.1350 | 29183120 |
0.1259 | 0.4382 | 545 | 1.1349 | 29449168 |
0.1821 | 0.4423 | 550 | 1.1306 | 29719296 |
0.1504 | 0.4463 | 555 | 1.1333 | 29998376 |
0.1849 | 0.4503 | 560 | 1.1339 | 30265000 |
0.1199 | 0.4543 | 565 | 1.1305 | 30539552 |
0.1379 | 0.4583 | 570 | 1.1315 | 30808552 |
0.1908 | 0.4624 | 575 | 1.1320 | 31085144 |
0.1671 | 0.4664 | 580 | 1.1316 | 31350720 |
0.1946 | 0.4704 | 585 | 1.1303 | 31619488 |
0.1132 | 0.4744 | 590 | 1.1300 | 31890024 |
0.1649 | 0.4784 | 595 | 1.1296 | 32158616 |
0.1743 | 0.4825 | 600 | 1.1289 | 32424064 |
0.1583 | 0.4865 | 605 | 1.1268 | 32691920 |
0.2174 | 0.4905 | 610 | 1.1305 | 32960456 |
0.1992 | 0.4945 | 615 | 1.1311 | 33228408 |
0.1422 | 0.4985 | 620 | 1.1280 | 33498080 |
0.2044 | 0.5026 | 625 | 1.1322 | 33770464 |
0.1475 | 0.5066 | 630 | 1.1341 | 34036664 |
0.2034 | 0.5106 | 635 | 1.1277 | 34305904 |
0.191 | 0.5146 | 640 | 1.1277 | 34575208 |
0.1587 | 0.5186 | 645 | 1.1287 | 34850104 |
0.2476 | 0.5227 | 650 | 1.1249 | 35116504 |
0.2104 | 0.5267 | 655 | 1.1235 | 35393896 |
0.1535 | 0.5307 | 660 | 1.1271 | 35668016 |
0.1741 | 0.5347 | 665 | 1.1279 | 35943656 |
0.2061 | 0.5387 | 670 | 1.1240 | 36214360 |
0.1185 | 0.5428 | 675 | 1.1265 | 36481112 |
0.1764 | 0.5468 | 680 | 1.1249 | 36749112 |
0.1545 | 0.5508 | 685 | 1.1244 | 37024504 |
0.0851 | 0.5548 | 690 | 1.1291 | 37299576 |
0.1687 | 0.5589 | 695 | 1.1281 | 37569920 |
0.1989 | 0.5629 | 700 | 1.1243 | 37842896 |
0.1796 | 0.5669 | 705 | 1.1256 | 38116584 |
0.1833 | 0.5709 | 710 | 1.1240 | 38390296 |
0.1043 | 0.5749 | 715 | 1.1225 | 38659752 |
0.1557 | 0.5790 | 720 | 1.1230 | 38935632 |
0.15 | 0.5830 | 725 | 1.1246 | 39203824 |
0.1298 | 0.5870 | 730 | 1.1232 | 39473296 |
0.1411 | 0.5910 | 735 | 1.1229 | 39741784 |
0.147 | 0.5950 | 740 | 1.1204 | 40009360 |
0.2156 | 0.5991 | 745 | 1.1213 | 40282160 |
0.1898 | 0.6031 | 750 | 1.1213 | 40548488 |
0.1643 | 0.6071 | 755 | 1.1206 | 40817320 |
0.1633 | 0.6111 | 760 | 1.1194 | 41089992 |
0.2122 | 0.6151 | 765 | 1.1179 | 41361824 |
0.144 | 0.6192 | 770 | 1.1214 | 41629032 |
0.157 | 0.6232 | 775 | 1.1204 | 41903408 |
0.1663 | 0.6272 | 780 | 1.1181 | 42177056 |
0.1367 | 0.6312 | 785 | 1.1184 | 42442704 |
0.1402 | 0.6352 | 790 | 1.1200 | 42709072 |
0.1044 | 0.6393 | 795 | 1.1179 | 42983656 |
0.144 | 0.6433 | 800 | 1.1194 | 43254328 |
0.1364 | 0.6473 | 805 | 1.1194 | 43520408 |
0.1167 | 0.6513 | 810 | 1.1207 | 43798248 |
0.1429 | 0.6553 | 815 | 1.1164 | 44062968 |
0.2173 | 0.6594 | 820 | 1.1190 | 44331392 |
0.1875 | 0.6634 | 825 | 1.1198 | 44599096 |
0.2148 | 0.6674 | 830 | 1.1168 | 44875096 |
0.1699 | 0.6714 | 835 | 1.1172 | 45141816 |
0.1539 | 0.6754 | 840 | 1.1186 | 45410320 |
0.1526 | 0.6795 | 845 | 1.1158 | 45682664 |
0.1549 | 0.6835 | 850 | 1.1160 | 45949128 |
0.1646 | 0.6875 | 855 | 1.1175 | 46216296 |
0.1656 | 0.6915 | 860 | 1.1159 | 46485272 |
0.1019 | 0.6955 | 865 | 1.1151 | 46756512 |
0.1653 | 0.6996 | 870 | 1.1183 | 47019768 |
0.1772 | 0.7036 | 875 | 1.1163 | 47293512 |
0.1266 | 0.7076 | 880 | 1.1179 | 47567472 |
0.1419 | 0.7116 | 885 | 1.1160 | 47843136 |
0.1546 | 0.7156 | 890 | 1.1134 | 48110400 |
0.1465 | 0.7197 | 895 | 1.1148 | 48378768 |
0.1887 | 0.7237 | 900 | 1.1161 | 48647776 |
0.1071 | 0.7277 | 905 | 1.1143 | 48920112 |
0.1604 | 0.7317 | 910 | 1.1151 | 49190544 |
0.136 | 0.7358 | 915 | 1.1167 | 49459456 |
0.2092 | 0.7398 | 920 | 1.1136 | 49732632 |
0.1856 | 0.7438 | 925 | 1.1119 | 50006352 |
0.1166 | 0.7478 | 930 | 1.1140 | 50275992 |
0.2299 | 0.7518 | 935 | 1.1159 | 50547224 |
0.0837 | 0.7559 | 940 | 1.1146 | 50811760 |
0.1858 | 0.7599 | 945 | 1.1140 | 51084608 |
0.1008 | 0.7639 | 950 | 1.1140 | 51358360 |
0.1142 | 0.7679 | 955 | 1.1132 | 51634840 |
0.1369 | 0.7719 | 960 | 1.1133 | 51907800 |
0.1994 | 0.7760 | 965 | 1.1155 | 52177152 |
0.1486 | 0.7800 | 970 | 1.1120 | 52447296 |
0.1639 | 0.7840 | 975 | 1.1098 | 52707720 |
0.17 | 0.7880 | 980 | 1.1100 | 52975352 |
0.1352 | 0.7920 | 985 | 1.1121 | 53247312 |
0.2062 | 0.7961 | 990 | 1.1133 | 53511120 |
0.1653 | 0.8001 | 995 | 1.1122 | 53776256 |
0.1477 | 0.8041 | 1000 | 1.1100 | 54039120 |
0.1882 | 0.8081 | 1005 | 1.1101 | 54313664 |
0.204 | 0.8121 | 1010 | 1.1123 | 54585280 |
0.2283 | 0.8162 | 1015 | 1.1110 | 54858160 |
0.1394 | 0.8202 | 1020 | 1.1093 | 55133280 |
0.2045 | 0.8242 | 1025 | 1.1098 | 55405552 |
0.1561 | 0.8282 | 1030 | 1.1102 | 55676936 |
0.127 | 0.8322 | 1035 | 1.1096 | 55955232 |
0.1593 | 0.8363 | 1040 | 1.1093 | 56227728 |
0.1457 | 0.8403 | 1045 | 1.1085 | 56498840 |
0.1505 | 0.8443 | 1050 | 1.1090 | 56774088 |
0.0862 | 0.8483 | 1055 | 1.1083 | 57043608 |
0.1709 | 0.8523 | 1060 | 1.1089 | 57316712 |
0.1509 | 0.8564 | 1065 | 1.1101 | 57589400 |
0.0836 | 0.8604 | 1070 | 1.1123 | 57861224 |
0.0966 | 0.8644 | 1075 | 1.1111 | 58131600 |
0.1184 | 0.8684 | 1080 | 1.1087 | 58406088 |
0.1669 | 0.8724 | 1085 | 1.1105 | 58677056 |
0.1793 | 0.8765 | 1090 | 1.1105 | 58947920 |
0.1333 | 0.8805 | 1095 | 1.1075 | 59225112 |
0.1882 | 0.8845 | 1100 | 1.1071 | 59497416 |
0.1828 | 0.8885 | 1105 | 1.1118 | 59770208 |
0.1227 | 0.8926 | 1110 | 1.1080 | 60043840 |
0.1234 | 0.8966 | 1115 | 1.1054 | 60310768 |
0.1036 | 0.9006 | 1120 | 1.1096 | 60581280 |
0.1349 | 0.9046 | 1125 | 1.1095 | 60852688 |
0.1352 | 0.9086 | 1130 | 1.1063 | 61121240 |
0.1958 | 0.9127 | 1135 | 1.1101 | 61391848 |
0.1466 | 0.9167 | 1140 | 1.1118 | 61664592 |
0.1887 | 0.9207 | 1145 | 1.1101 | 61934816 |
0.1769 | 0.9247 | 1150 | 1.1103 | 62201872 |
0.2028 | 0.9287 | 1155 | 1.1101 | 62478232 |
0.1435 | 0.9328 | 1160 | 1.1093 | 62750744 |
0.1907 | 0.9368 | 1165 | 1.1085 | 63023736 |
0.1991 | 0.9408 | 1170 | 1.1089 | 63296368 |
0.0962 | 0.9448 | 1175 | 1.1070 | 63568840 |
0.095 | 0.9488 | 1180 | 1.1092 | 63836616 |
0.0938 | 0.9529 | 1185 | 1.1117 | 64100784 |
0.161 | 0.9569 | 1190 | 1.1090 | 64373400 |
0.1724 | 0.9609 | 1195 | 1.1078 | 64640352 |
0.1555 | 0.9649 | 1200 | 1.1077 | 64919536 |
0.1529 | 0.9689 | 1205 | 1.1102 | 65190544 |
0.1552 | 0.9730 | 1210 | 1.1076 | 65465016 |
0.1303 | 0.9770 | 1215 | 1.1060 | 65741824 |
0.1953 | 0.9810 | 1220 | 1.1070 | 66013568 |
0.1245 | 0.9850 | 1225 | 1.1057 | 66286920 |
0.1362 | 0.9890 | 1230 | 1.1054 | 66561008 |
0.217 | 0.9931 | 1235 | 1.1071 | 66831688 |
0.2241 | 0.9971 | 1240 | 1.1069 | 67100816 |
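
The rows above can be read programmatically. The following is a minimal sketch, assuming the rows are available as plain text in the same pipe-separated layout; `parse_row` and the field names are hypothetical helpers, and `SAMPLE_ROWS` holds only four rows copied verbatim from the table, not the full log.

```python
# Four rows copied verbatim from the results table (a subset, for illustration).
SAMPLE_ROWS = """\
No log | 0 | 0 | 1.3909 | 0 |
1.2102 | 0.0281 | 35 | 1.1918 | 1903616 |
0.1234 | 0.8966 | 1115 | 1.1054 | 60310768 |
0.2241 | 0.9971 | 1240 | 1.1069 | 67100816 |
"""


def parse_row(line):
    """Split one pipe-separated row into typed fields."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    train_loss, epoch, step, val_loss, tokens_seen = cells
    return {
        # The step-0 row has no training loss yet ("No log").
        "train_loss": None if train_loss == "No log" else float(train_loss),
        "epoch": float(epoch),
        "step": int(step),
        "val_loss": float(val_loss),
        "tokens_seen": int(tokens_seen),
    }


rows = [parse_row(line) for line in SAMPLE_ROWS.splitlines() if line.strip()]

# Lowest validation loss among the sampled rows.
best = min(rows, key=lambda row: row["val_loss"])
print("best sampled step:", best["step"], "val loss:", best["val_loss"])

# Average number of input tokens consumed per optimizer step, from the last row.
last = rows[-1]
tokens_per_step = last["tokens_seen"] / last["step"]
print("tokens per step (approx.):", round(tokens_per_step))
```

Note that `min` returns the first row with the lowest value; in the full table, step 1230 reports the same validation loss (1.1054) as step 1115.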
Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter13_sftsd2
Base model
google/gemma-2-2b