---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd2
  results: []
---

# collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd2

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1045
- Num Input Tokens Seen: 36166336

## Model description

More information needed

## Intended uses & limitations

More information needed. An illustrative loading sketch is included at the end of this card.

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (an illustrative `SFTConfig` reconstruction of these settings appears at the end of this card):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.6049        | 0.0075 | 5    | 1.3862          | 273640            |
| 1.6224        | 0.0151 | 10   | 1.3404          | 554216            |
| 1.4024        | 0.0226 | 15   | 1.2742          | 825824            |
| 1.3776        | 0.0302 | 20   | 1.2246          | 1100896           |
| 1.2832        | 0.0377 | 25   | 1.1803          | 1379192           |
| 1.22          | 0.0452 | 30   | 1.1783          | 1656944           |
| 0.9584        | 0.0528 | 35   | 1.1731          | 1925784           |
| 0.8881        | 0.0603 | 40   | 1.2068          | 2192080           |
| 0.8391        | 0.0678 | 45   | 1.2100          | 2459864           |
| 0.7926        | 0.0754 | 50   | 1.2160          | 2736544           |
| 0.647         | 0.0829 | 55   | 1.2217          | 3005032           |
| 0.6438        | 0.0905 | 60   | 1.2151          | 3277256           |
| 0.5487        | 0.0980 | 65   | 1.2157          | 3547224           |
| 0.536         | 0.1055 | 70   | 1.2048          | 3817448           |
| 0.4943        | 0.1131 | 75   | 1.1964          | 4094432           |
| 0.5394        | 0.1206 | 80   | 1.1933          | 4367400           |
| 0.3851        | 0.1282 | 85   | 1.1909          | 4635248           |
| 0.4303        | 0.1357 | 90   | 1.1893          | 4903792           |
| 0.4199        | 0.1432 | 95   | 1.1818          | 5173464           |
| 0.3878        | 0.1508 | 100  | 1.1820          | 5446408           |
| 0.4044        | 0.1583 | 105  | 1.1846          | 5722824           |
| 0.3266        | 0.1658 | 110  | 1.1800          | 5998616           |
| 0.3367        | 0.1734 | 115  | 1.1756          | 6269328           |
| 0.2639        | 0.1809 | 120  | 1.1786          | 6542264           |
| 0.2647        | 0.1885 | 125  | 1.1753          | 6813600           |
| 0.3762        | 0.1960 | 130  | 1.1739          | 7087552           |
| 0.3209        | 0.2035 | 135  | 1.1699          | 7360376           |
| 0.3376        | 0.2111 | 140  | 1.1709          | 7632536           |
| 0.2674        | 0.2186 | 145  | 1.1719          | 7901296           |
| 0.2631        | 0.2262 | 150  | 1.1681          | 8167576           |
| 0.3092        | 0.2337 | 155  | 1.1664          | 8438360           |
| 0.3305        | 0.2412 | 160  | 1.1669          | 8709792           |
| 0.3066        | 0.2488 | 165  | 1.1607          | 8988856           |
| 0.2807        | 0.2563 | 170  | 1.1590          | 9265304           |
| 0.3085        | 0.2639 | 175  | 1.1574          | 9543928           |
| 0.2921        | 0.2714 | 180  | 1.1527          | 9817056           |
| 0.3605        | 0.2789 | 185  | 1.1557          | 10088872          |
| 0.2578        | 0.2865 | 190  | 1.1481          | 10360768          |
| 0.3511        | 0.2940 | 195  | 1.1570          | 10632016          |
| 0.3591        | 0.3015 | 200  | 1.1461          | 10907720          |
| 0.2076        | 0.3091 | 205  | 1.1540          | 11181728          |
| 0.3326        | 0.3166 | 210  | 1.1482          | 11460608          |
| 0.3914        | 0.3242 | 215  | 1.1478          | 11730288          |
| 0.304         | 0.3317 | 220  | 1.1487          | 12001208          |
| 0.3811        | 0.3392 | 225  | 1.1459          | 12272960          |
| 0.2744        | 0.3468 | 230  | 1.1408          | 12542408          |
| 0.326         | 0.3543 | 235  | 1.1443          | 12813656          |
| 0.3474        | 0.3619 | 240  | 1.1414          | 13084432          |
| 0.3346        | 0.3694 | 245  | 1.1430          | 13360240          |
| 0.2965        | 0.3769 | 250  | 1.1417          | 13639536          |
| 0.2382        | 0.3845 | 255  | 1.1373          | 13914080          |
| 0.2243        | 0.3920 | 260  | 1.1406          | 14189128          |
| 0.1954        | 0.3995 | 265  | 1.1370          | 14460672          |
| 0.2857        | 0.4071 | 270  | 1.1398          | 14727040          |
| 0.2819        | 0.4146 | 275  | 1.1351          | 15002688          |
| 0.2801        | 0.4222 | 280  | 1.1367          | 15275512          |
| 0.2907        | 0.4297 | 285  | 1.1351          | 15554848          |
| 0.2928        | 0.4372 | 290  | 1.1314          | 15828296          |
| 0.2588        | 0.4448 | 295  | 1.1358          | 16106416          |
| 0.2453        | 0.4523 | 300  | 1.1329          | 16381944          |
| 0.3333        | 0.4599 | 305  | 1.1309          | 16661632          |
| 0.1884        | 0.4674 | 310  | 1.1300          | 16934712          |
| 0.3095        | 0.4749 | 315  | 1.1309          | 17209816          |
| 0.2858        | 0.4825 | 320  | 1.1301          | 17484664          |
| 0.3195        | 0.4900 | 325  | 1.1264          | 17759488          |
| 0.3203        | 0.4975 | 330  | 1.1277          | 18034664          |
| 0.3492        | 0.5051 | 335  | 1.1266          | 18311424          |
| 0.3129        | 0.5126 | 340  | 1.1249          | 18584528          |
| 0.2546        | 0.5202 | 345  | 1.1277          | 18861208          |
| 0.2907        | 0.5277 | 350  | 1.1233          | 19135856          |
| 0.2693        | 0.5352 | 355  | 1.1235          | 19415704          |
| 0.2942        | 0.5428 | 360  | 1.1219          | 19685048          |
| 0.2393        | 0.5503 | 365  | 1.1222          | 19954816          |
| 0.2333        | 0.5579 | 370  | 1.1219          | 20226432          |
| 0.2208        | 0.5654 | 375  | 1.1232          | 20499384          |
| 0.2508        | 0.5729 | 380  | 1.1209          | 20779280          |
| 0.2002        | 0.5805 | 385  | 1.1235          | 21053584          |
| 0.3333        | 0.5880 | 390  | 1.1216          | 21325712          |
| 0.2492        | 0.5956 | 395  | 1.1233          | 21599000          |
| 0.2484        | 0.6031 | 400  | 1.1225          | 21871640          |
| 0.3439        | 0.6106 | 405  | 1.1191          | 22140448          |
| 0.3389        | 0.6182 | 410  | 1.1218          | 22409872          |
| 0.2778        | 0.6257 | 415  | 1.1197          | 22691600          |
| 0.2713        | 0.6332 | 420  | 1.1177          | 22961160          |
| 0.2169        | 0.6408 | 425  | 1.1194          | 23229808          |
| 0.2825        | 0.6483 | 430  | 1.1193          | 23493888          |
| 0.2436        | 0.6559 | 435  | 1.1170          | 23766688          |
| 0.3057        | 0.6634 | 440  | 1.1191          | 24038552          |
| 0.2639        | 0.6709 | 445  | 1.1159          | 24312808          |
| 0.322         | 0.6785 | 450  | 1.1162          | 24589072          |
| 0.1909        | 0.6860 | 455  | 1.1180          | 24855872          |
| 0.2823        | 0.6936 | 460  | 1.1171          | 25129120          |
| 0.2644        | 0.7011 | 465  | 1.1143          | 25401832          |
| 0.2379        | 0.7086 | 470  | 1.1151          | 25676584          |
| 0.2572        | 0.7162 | 475  | 1.1151          | 25946424          |
| 0.1768        | 0.7237 | 480  | 1.1121          | 26216712          |
| 0.3079        | 0.7312 | 485  | 1.1137          | 26483648          |
| 0.1986        | 0.7388 | 490  | 1.1112          | 26756200          |
| 0.2847        | 0.7463 | 495  | 1.1128          | 27024176          |
| 0.1732        | 0.7539 | 500  | 1.1135          | 27293512          |
| 0.2724        | 0.7614 | 505  | 1.1120          | 27569208          |
| 0.285         | 0.7689 | 510  | 1.1124          | 27836456          |
| 0.2303        | 0.7765 | 515  | 1.1100          | 28107632          |
| 0.2479        | 0.7840 | 520  | 1.1107          | 28377688          |
| 0.2432        | 0.7916 | 525  | 1.1109          | 28646944          |
| 0.3432        | 0.7991 | 530  | 1.1102          | 28922352          |
| 0.217         | 0.8066 | 535  | 1.1094          | 29197160          |
| 0.2464        | 0.8142 | 540  | 1.1099          | 29473128          |
| 0.3135        | 0.8217 | 545  | 1.1086          | 29746736          |
| 0.2532        | 0.8292 | 550  | 1.1095          | 30013224          |
| 0.3145        | 0.8368 | 555  | 1.1090          | 30281256          |
| 0.207         | 0.8443 | 560  | 1.1067          | 30549144          |
| 0.1811        | 0.8519 | 565  | 1.1080          | 30828416          |
| 0.3074        | 0.8594 | 570  | 1.1079          | 31104032          |
| 0.2753        | 0.8669 | 575  | 1.1048          | 31374216          |
| 0.155         | 0.8745 | 580  | 1.1082          | 31649384          |
| 0.2296        | 0.8820 | 585  | 1.1087          | 31920192          |
| 0.2206        | 0.8896 | 590  | 1.1057          | 32187320          |
| 0.2657        | 0.8971 | 595  | 1.1065          | 32463088          |
| 0.2821        | 0.9046 | 600  | 1.1069          | 32731832          |
| 0.2835        | 0.9122 | 605  | 1.1051          | 33003520          |
| 0.2168        | 0.9197 | 610  | 1.1063          | 33270088          |
| 0.2783        | 0.9273 | 615  | 1.1067          | 33542704          |
| 0.2993        | 0.9348 | 620  | 1.1048          | 33816144          |
| 0.2227        | 0.9423 | 625  | 1.1027          | 34089248          |
| 0.243         | 0.9499 | 630  | 1.1044          | 34359824          |
| 0.2575        | 0.9574 | 635  | 1.1044          | 34638264          |
| 0.1769        | 0.9649 | 640  | 1.1049          | 34910856          |
| 0.2472        | 0.9725 | 645  | 1.1055          | 35184536          |
| 0.2593        | 0.9800 | 650  | 1.1024          | 35455744          |
| 0.2254        | 0.9876 | 655  | 1.1048          | 35726536          |
| 0.1744        | 0.9951 | 660  | 1.1068          | 35999296          |

### Framework versions

- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
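
### Illustrative training configuration

The tags on this card indicate the run was a TRL SFT job, and the hyperparameters above map directly onto TRL's `SFTConfig`. The sketch below is a reconstruction from the logged values only, not the actual training script; it assumes a TRL version that provides `SFTConfig` (0.8 or later), and the dataset, tokenizer handling, and any packing/formatting options remain unknown.

```python
from trl import SFTConfig

# Reconstruction of the logged hyperparameters only; the training data and
# any further options used for this run are not documented on this card.
config = SFTConfig(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd2",
    learning_rate=8e-06,
    per_device_train_batch_size=8,   # train_batch_size: 8
    per_device_eval_batch_size=16,   # eval_batch_size: 16
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,               # lr_scheduler_warmup_ratio
    num_train_epochs=1,
    seed=2,
)

# `config` would then be passed to `trl.SFTTrainer` together with the base
# model `google/gemma-2-2b` and the (unspecified) fine-tuning dataset.
```

The optimizer line above ("Adam with betas=(0.9,0.999) and epsilon=1e-08") matches the Trainer's default AdamW settings, so no optimizer arguments are overridden in this sketch.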
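
## How to use

Until the usage sections above are filled in, the checkpoint should load like any other `transformers` causal LM. This is a minimal sketch: the repo id is a placeholder (the hosting namespace is not stated on this card) and the prompt is arbitrary.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- substitute the namespace that actually hosts this checkpoint.
model_id = "your-username/collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Gemma 2 weights are distributed in bfloat16
    device_map="auto",
)

inputs = tokenizer("The key idea of supervised fine-tuning is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```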