metadata
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd2
  results: []
collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd2
This model is a fine-tuned version of google/gemma-2-2b; the training dataset is not specified. It achieves the following results on the evaluation set:
- Loss: 1.1017
- Num Input Tokens Seen: 30391200
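The reported loss can be converted to perplexity for easier comparison with other language models. The sketch below assumes the loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language models but is not stated in this card.

```python
import math

# Final evaluation loss reported in this card.
eval_loss = 1.1017

# Assumption: eval_loss is mean per-token cross-entropy in nats,
# so perplexity is simply its exponential.
perplexity = math.exp(eval_loss)

print(f"perplexity = {perplexity:.3f}")
```

Under that assumption the evaluation perplexity comes out at roughly 3.01.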
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
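The batch-size and warmup settings above are related by simple arithmetic. The sketch below checks that relationship and estimates the warmup length. The device count, the total step count (estimated from the last logged row of the results table, step 545 at epoch 0.9959), and the round-up rule for warmup steps are assumptions, not values stated in this card.

```python
import math

# Values copied from the hyperparameter list.
train_batch_size = 8
gradient_accumulation_steps = 16
total_train_batch_size = 128
warmup_ratio = 0.05

# The listed values are consistent with a single device:
# 8 examples per step * 16 accumulation steps = 128.
implied_devices = total_train_batch_size // (
    train_batch_size * gradient_accumulation_steps
)
effective_batch = train_batch_size * gradient_accumulation_steps * implied_devices

# Assumption: one epoch is about step / epoch_fraction optimizer steps,
# using the last logged row (step 545 at epoch 0.9959).
estimated_total_steps = round(545 / 0.9959)

# Assumption: warmup steps are warmup_ratio * total steps, rounded up.
estimated_warmup_steps = math.ceil(warmup_ratio * estimated_total_steps)

print(implied_devices, effective_batch)
print(estimated_total_steps, estimated_warmup_steps)
```

With `constant_with_warmup`, the learning rate would ramp up to 8e-06 over the estimated warmup steps and then stay fixed for the rest of the epoch.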
Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3956 | 0 |
1.6925 | 0.0091 | 5 | 1.3858 | 274360 |
1.4659 | 0.0183 | 10 | 1.3191 | 554728 |
1.4901 | 0.0274 | 15 | 1.2524 | 829128 |
1.2538 | 0.0365 | 20 | 1.1937 | 1108016 |
1.2819 | 0.0457 | 25 | 1.1684 | 1390936 |
1.0879 | 0.0548 | 30 | 1.1576 | 1671768 |
1.0761 | 0.0640 | 35 | 1.1720 | 1947856 |
0.9364 | 0.0731 | 40 | 1.1667 | 2228328 |
0.8285 | 0.0822 | 45 | 1.2083 | 2515248 |
0.7714 | 0.0914 | 50 | 1.2007 | 2796696 |
0.7316 | 0.1005 | 55 | 1.2211 | 3077936 |
0.5592 | 0.1096 | 60 | 1.2119 | 3353176 |
0.5585 | 0.1188 | 65 | 1.2018 | 3626024 |
0.4803 | 0.1279 | 70 | 1.2017 | 3898832 |
0.5021 | 0.1370 | 75 | 1.1963 | 4175912 |
0.4514 | 0.1462 | 80 | 1.2121 | 4455072 |
0.3612 | 0.1553 | 85 | 1.1931 | 4727720 |
0.4515 | 0.1645 | 90 | 1.1881 | 5009488 |
0.4461 | 0.1736 | 95 | 1.1880 | 5282608 |
0.5034 | 0.1827 | 100 | 1.1860 | 5553496 |
0.5685 | 0.1919 | 105 | 1.1842 | 5836064 |
0.4516 | 0.2010 | 110 | 1.1854 | 6114952 |
0.2958 | 0.2101 | 115 | 1.1750 | 6392272 |
0.3735 | 0.2193 | 120 | 1.1766 | 6663208 |
0.3907 | 0.2284 | 125 | 1.1676 | 6944456 |
0.4901 | 0.2376 | 130 | 1.1709 | 7221960 |
0.3111 | 0.2467 | 135 | 1.1608 | 7500464 |
0.3151 | 0.2558 | 140 | 1.1681 | 7786536 |
0.3311 | 0.2650 | 145 | 1.1629 | 8061032 |
0.3119 | 0.2741 | 150 | 1.1624 | 8339776 |
0.4250 | 0.2832 | 155 | 1.1626 | 8614064 |
0.3599 | 0.2924 | 160 | 1.1609 | 8885704 |
0.3478 | 0.3015 | 165 | 1.1554 | 9166584 |
0.4074 | 0.3106 | 170 | 1.1529 | 9453272 |
0.2400 | 0.3198 | 175 | 1.1585 | 9734480 |
0.3161 | 0.3289 | 180 | 1.1508 | 10011232 |
0.3567 | 0.3381 | 185 | 1.1568 | 10284712 |
0.3651 | 0.3472 | 190 | 1.1469 | 10565320 |
0.2963 | 0.3563 | 195 | 1.1513 | 10834768 |
0.3133 | 0.3655 | 200 | 1.1498 | 11114320 |
0.4982 | 0.3746 | 205 | 1.1447 | 11395816 |
0.3136 | 0.3837 | 210 | 1.1435 | 11676048 |
0.2945 | 0.3929 | 215 | 1.1452 | 11957056 |
0.2632 | 0.4020 | 220 | 1.1417 | 12225504 |
0.2754 | 0.4111 | 225 | 1.1421 | 12506816 |
0.2892 | 0.4203 | 230 | 1.1411 | 12778688 |
0.3303 | 0.4294 | 235 | 1.1351 | 13052448 |
0.3272 | 0.4386 | 240 | 1.1422 | 13325752 |
0.2219 | 0.4477 | 245 | 1.1361 | 13612800 |
0.3318 | 0.4568 | 250 | 1.1347 | 13888688 |
0.3058 | 0.4660 | 255 | 1.1358 | 14167640 |
0.3574 | 0.4751 | 260 | 1.1317 | 14443576 |
0.3944 | 0.4842 | 265 | 1.1296 | 14722000 |
0.3048 | 0.4934 | 270 | 1.1306 | 14994688 |
0.2954 | 0.5025 | 275 | 1.1313 | 15271576 |
0.3244 | 0.5116 | 280 | 1.1269 | 15548760 |
0.3710 | 0.5208 | 285 | 1.1297 | 15821744 |
0.3526 | 0.5299 | 290 | 1.1274 | 16091768 |
0.2937 | 0.5391 | 295 | 1.1271 | 16364464 |
0.3097 | 0.5482 | 300 | 1.1230 | 16641960 |
0.3057 | 0.5573 | 305 | 1.1273 | 16918448 |
0.3099 | 0.5665 | 310 | 1.1251 | 17193440 |
0.2830 | 0.5756 | 315 | 1.1235 | 17470240 |
0.3392 | 0.5847 | 320 | 1.1248 | 17749104 |
0.3276 | 0.5939 | 325 | 1.1205 | 18032184 |
0.2521 | 0.6030 | 330 | 1.1216 | 18317360 |
0.2278 | 0.6122 | 335 | 1.1183 | 18588736 |
0.2214 | 0.6213 | 340 | 1.1208 | 18864160 |
0.3554 | 0.6304 | 345 | 1.1189 | 19143568 |
0.2126 | 0.6396 | 350 | 1.1188 | 19430928 |
0.3241 | 0.6487 | 355 | 1.1182 | 19712432 |
0.2468 | 0.6578 | 360 | 1.1167 | 19992936 |
0.3020 | 0.6670 | 365 | 1.1179 | 20275360 |
0.2250 | 0.6761 | 370 | 1.1145 | 20554416 |
0.2699 | 0.6852 | 375 | 1.1150 | 20833584 |
0.2959 | 0.6944 | 380 | 1.1127 | 21116288 |
0.3684 | 0.7035 | 385 | 1.1135 | 21393272 |
0.2894 | 0.7127 | 390 | 1.1132 | 21664504 |
0.3468 | 0.7218 | 395 | 1.1104 | 21945840 |
0.3365 | 0.7309 | 400 | 1.1112 | 22224640 |
0.2756 | 0.7401 | 405 | 1.1138 | 22492512 |
0.2134 | 0.7492 | 410 | 1.1097 | 22774128 |
0.2730 | 0.7583 | 415 | 1.1099 | 23054632 |
0.2480 | 0.7675 | 420 | 1.1095 | 23332744 |
0.4175 | 0.7766 | 425 | 1.1101 | 23610928 |
0.2982 | 0.7857 | 430 | 1.1105 | 23886096 |
0.2497 | 0.7949 | 435 | 1.1085 | 24164752 |
0.2912 | 0.8040 | 440 | 1.1079 | 24441944 |
0.3517 | 0.8132 | 445 | 1.1078 | 24716256 |
0.3852 | 0.8223 | 450 | 1.1070 | 24992216 |
0.3735 | 0.8314 | 455 | 1.1088 | 25271800 |
0.3185 | 0.8406 | 460 | 1.1092 | 25558096 |
0.2549 | 0.8497 | 465 | 1.1083 | 25837144 |
0.1872 | 0.8588 | 470 | 1.1066 | 26120576 |
0.2247 | 0.8680 | 475 | 1.1073 | 26393552 |
0.2985 | 0.8771 | 480 | 1.1055 | 26672072 |
0.2700 | 0.8862 | 485 | 1.1037 | 26957208 |
0.2618 | 0.8954 | 490 | 1.1059 | 27236264 |
0.2642 | 0.9045 | 495 | 1.1053 | 27515256 |
0.2234 | 0.9137 | 500 | 1.1039 | 27791360 |
0.3124 | 0.9228 | 505 | 1.1068 | 28070688 |
0.3348 | 0.9319 | 510 | 1.1028 | 28340240 |
0.3423 | 0.9411 | 515 | 1.1021 | 28613928 |
0.2400 | 0.9502 | 520 | 1.1043 | 28889472 |
0.2406 | 0.9593 | 525 | 1.1058 | 29170016 |
0.2347 | 0.9685 | 530 | 1.1031 | 29451680 |
0.2342 | 0.9776 | 535 | 1.1043 | 29728536 |
0.3459 | 0.9868 | 540 | 1.1039 | 30007456 |
0.2486 | 0.9959 | 545 | 1.1014 | 30279832 |
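The table above can be summarized programmatically. The following is a minimal sketch, not part of the original training code: `parse_results_table` is a hypothetical helper, and the excerpt holds only four rows copied from the table, so the full table would need to be pasted in to analyze the whole run.

```python
def parse_results_table(text):
    """Parse pipe-separated log rows into dicts; 'No log' becomes None."""
    rows = []
    for line in text.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 5:
            continue
        try:
            train_loss = None if cells[0] == "No log" else float(cells[0])
            row = {
                "train_loss": train_loss,
                "epoch": float(cells[1]),
                "step": int(cells[2]),
                "val_loss": float(cells[3]),
                "tokens": int(cells[4]),
            }
        except ValueError:
            continue  # header or separator row
        rows.append(row)
    return rows


# Excerpt: four rows copied verbatim from the table.
excerpt = """
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3956 | 0 |
1.0879 | 0.0548 | 30 | 1.1576 | 1671768 |
0.8285 | 0.0822 | 45 | 1.2083 | 2515248 |
0.2486 | 0.9959 | 545 | 1.1014 | 30279832 |
"""

rows = parse_results_table(excerpt)

# Row with the lowest validation loss among the parsed rows.
best = min(rows, key=lambda r: r["val_loss"])

# Average input tokens per optimizer step, from the last parsed row.
last = rows[-1]
tokens_per_step = last["tokens"] / last["step"]

print(best["step"], best["val_loss"])
print(round(tokens_per_step))
```

Dividing the tokens per step by the total train batch size of 128 gives a rough average number of input tokens per example, assuming the token counter covers every sequence in each optimizer step.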
Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1