---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0
  results: []
---

# collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1021
- Num Input Tokens Seen: 21968712

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
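For reference, the sketch below shows one way these hyperparameters could map onto a TRL `SFTConfig`, consistent with the `trl` and `sft` tags above. It is a sketch only: the training dataset for this run is not documented, so the dataset below is a stand-in, and the actual training script may have differed.

```python
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# Stand-in dataset: the data used for this run is not documented.
train_dataset = Dataset.from_dict({"text": ["placeholder training document"]})

# Mirrors the hyperparameter list above; Adam betas (0.9, 0.999) and
# epsilon 1e-08 are the TrainingArguments defaults, so they are not set here.
config = SFTConfig(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0",
    dataset_text_field="text",
    learning_rate=8e-06,
    per_device_train_batch_size=8,   # train_batch_size: 8
    per_device_eval_batch_size=16,   # eval_batch_size: 16
    seed=0,
    gradient_accumulation_steps=16,  # 8 x 16 = total_train_batch_size 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,               # lr_scheduler_warmup_ratio: 0.05
    num_train_epochs=1,
)

trainer = SFTTrainer(
    model="google/gemma-2-2b",  # base model; access may require accepting the Gemma license
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```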
### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.6085        | 0.0130 | 5    | 1.3800          | 289184            |
| 1.4378        | 0.0260 | 10   | 1.2933          | 571680            |
| 1.3575        | 0.0390 | 15   | 1.2182          | 858680            |
| 1.3348        | 0.0520 | 20   | 1.1684          | 1145936           |
| 1.1904        | 0.0650 | 25   | 1.1500          | 1437472           |
| 1.2228        | 0.0779 | 30   | 1.1339          | 1724288           |
| 1.0694        | 0.0909 | 35   | 1.1383          | 2009272           |
| 0.9697        | 0.1039 | 40   | 1.1630          | 2289000           |
| 0.9051        | 0.1169 | 45   | 1.1742          | 2569208           |
| 0.8855        | 0.1299 | 50   | 1.1729          | 2856576           |
| 0.8853        | 0.1429 | 55   | 1.1758          | 3146856           |
| 0.8296        | 0.1559 | 60   | 1.1816          | 3431392           |
| 0.7121        | 0.1689 | 65   | 1.1736          | 3726000           |
| 0.7528        | 0.1819 | 70   | 1.1792          | 4010080           |
| 0.5996        | 0.1949 | 75   | 1.1802          | 4295264           |
| 0.6437        | 0.2079 | 80   | 1.1785          | 4576256           |
| 0.6683        | 0.2209 | 85   | 1.1733          | 4869384           |
| 0.5115        | 0.2338 | 90   | 1.1750          | 5151776           |
| 0.545         | 0.2468 | 95   | 1.1701          | 5443960           |
| 0.5348        | 0.2598 | 100  | 1.1673          | 5728368           |
| 0.5687        | 0.2728 | 105  | 1.1641          | 6017560           |
| 0.4856        | 0.2858 | 110  | 1.1663          | 6300000           |
| 0.4691        | 0.2988 | 115  | 1.1630          | 6586672           |
| 0.4454        | 0.3118 | 120  | 1.1585          | 6869504           |
| 0.5734        | 0.3248 | 125  | 1.1606          | 7159680           |
| 0.4317        | 0.3378 | 130  | 1.1529          | 7437936           |
| 0.4603        | 0.3508 | 135  | 1.1541          | 7727120           |
| 0.5264        | 0.3638 | 140  | 1.1542          | 8013352           |
| 0.5051        | 0.3767 | 145  | 1.1493          | 8302848           |
| 0.397         | 0.3897 | 150  | 1.1528          | 8588472           |
| 0.4173        | 0.4027 | 155  | 1.1463          | 8876960           |
| 0.3443        | 0.4157 | 160  | 1.1474          | 9156600           |
| 0.4343        | 0.4287 | 165  | 1.1455          | 9440520           |
| 0.4683        | 0.4417 | 170  | 1.1431          | 9726600           |
| 0.4732        | 0.4547 | 175  | 1.1408          | 10009248          |
| 0.4876        | 0.4677 | 180  | 1.1414          | 10297320          |
| 0.4574        | 0.4807 | 185  | 1.1369          | 10582704          |
| 0.4038        | 0.4937 | 190  | 1.1354          | 10870648          |
| 0.4239        | 0.5067 | 195  | 1.1355          | 11148576          |
| 0.5262        | 0.5196 | 200  | 1.1291          | 11436464          |
| 0.4788        | 0.5326 | 205  | 1.1322          | 11721416          |
| 0.3975        | 0.5456 | 210  | 1.1276          | 12012696          |
| 0.3807        | 0.5586 | 215  | 1.1310          | 12299376          |
| 0.4784        | 0.5716 | 220  | 1.1232          | 12594368          |
| 0.4           | 0.5846 | 225  | 1.1272          | 12880616          |
| 0.4511        | 0.5976 | 230  | 1.1229          | 13164112          |
| 0.4119        | 0.6106 | 235  | 1.1234          | 13446016          |
| 0.3515        | 0.6236 | 240  | 1.1224          | 13729688          |
| 0.3695        | 0.6366 | 245  | 1.1201          | 14015064          |
| 0.387         | 0.6496 | 250  | 1.1190          | 14303192          |
| 0.4503        | 0.6626 | 255  | 1.1167          | 14587200          |
| 0.3205        | 0.6755 | 260  | 1.1184          | 14875032          |
| 0.3369        | 0.6885 | 265  | 1.1154          | 15159592          |
| 0.46          | 0.7015 | 270  | 1.1173          | 15443480          |
| 0.4148        | 0.7145 | 275  | 1.1121          | 15737624          |
| 0.4251        | 0.7275 | 280  | 1.1141          | 16021928          |
| 0.3786        | 0.7405 | 285  | 1.1126          | 16306944          |
| 0.3593        | 0.7535 | 290  | 1.1114          | 16592904          |
| 0.4698        | 0.7665 | 295  | 1.1114          | 16875744          |
| 0.3327        | 0.7795 | 300  | 1.1098          | 17163408          |
| 0.3521        | 0.7925 | 305  | 1.1125          | 17451024          |
| 0.3682        | 0.8055 | 310  | 1.1076          | 17741680          |
| 0.3266        | 0.8184 | 315  | 1.1098          | 18022800          |
| 0.3986        | 0.8314 | 320  | 1.1078          | 18298600          |
| 0.3869        | 0.8444 | 325  | 1.1078          | 18585288          |
| 0.3904        | 0.8574 | 330  | 1.1072          | 18870912          |
| 0.361         | 0.8704 | 335  | 1.1070          | 19165960          |
| 0.4643        | 0.8834 | 340  | 1.1047          | 19458704          |
| 0.4603        | 0.8964 | 345  | 1.1048          | 19741152          |
| 0.4815        | 0.9094 | 350  | 1.1053          | 20029752          |
| 0.3097        | 0.9224 | 355  | 1.1050          | 20317240          |
| 0.3686        | 0.9354 | 360  | 1.1033          | 20601320          |
| 0.485         | 0.9484 | 365  | 1.1042          | 20895904          |
| 0.3946        | 0.9614 | 370  | 1.1014          | 21179672          |
| 0.4621        | 0.9743 | 375  | 1.1032          | 21460376          |
| 0.4748        | 0.9873 | 380  | 1.1025          | 21737656          |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
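Finally, a minimal inference sketch using the framework versions listed above. The repo id is a placeholder, since the namespace hosting this checkpoint is not stated in the card; substitute the actual path to the uploaded weights.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id: replace with the actual "<namespace>/<repo>" path
# where this checkpoint is hosted.
model_id = "collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```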