---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: gemma-2-2b_hs2_iter1_sftsd2
results: []
---
# gemma-2-2b_hs2_iter1_sftsd2
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.2677
- Num Input Tokens Seen: 35944792
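
For intuition, a cross-entropy loss of 1.2677 corresponds to a per-token perplexity of roughly exp(1.2677) ≈ 3.55. A minimal sketch of the conversion:

```python
import math

eval_loss = 1.2677  # final validation loss reported above
perplexity = math.exp(eval_loss)  # per-token perplexity implied by cross-entropy loss
print(f"perplexity ~ {perplexity:.2f}")  # ~ 3.55
```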
## Model description
Per the card's `trl` and `sft` tags, this checkpoint is a supervised fine-tune (SFT) of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) produced with TRL's `SFTTrainer`. The fine-tuning dataset and task are not documented here.
## Intended uses & limitations
More information needed
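
The checkpoint can be loaded like any `transformers` causal LM. A minimal inference sketch; the repo id `jkazdan/gemma-2-2b_hs2_iter1_sftsd2` is assumed from the card name, adjust as needed:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id, inferred from the model name on this card.
model_id = "jkazdan/gemma-2-2b_hs2_iter1_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```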
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
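
A sketch of how these settings map onto TRL's `SFTConfig`/`SFTTrainer`. The dataset path and text column are placeholders (the actual training script and data are not part of this card), and the exact TRL API varies slightly by version:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

# Placeholder: the dataset used for this run is not documented on the card.
train_dataset = load_dataset("json", data_files="train.jsonl", split="train")

model_id = "google/gemma-2-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Hyperparameters copied from the list above. Adam betas=(0.9, 0.999) and
# epsilon=1e-08 are the transformers defaults, so no explicit override is needed.
config = SFTConfig(
    output_dir="gemma-2-2b_hs2_iter1_sftsd2",
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,  # 8 x 16 = total train batch size of 128
    seed=2,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    dataset_text_field="text",  # assumes a "text" column in the dataset
)

trainer = SFTTrainer(model=model, args=config, train_dataset=train_dataset)
trainer.train()
```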
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3956 | 0 |
| 1.7284 | 0.0078 | 5 | 1.3893 | 280208 |
| 1.6517 | 0.0155 | 10 | 1.3377 | 557184 |
| 1.6454 | 0.0233 | 15 | 1.2693 | 839944 |
| 1.5751 | 0.0310 | 20 | 1.2066 | 1116448 |
| 1.5203 | 0.0388 | 25 | 1.1696 | 1397280 |
| 1.3882 | 0.0465 | 30 | 1.1491 | 1680616 |
| 1.3696 | 0.0543 | 35 | 1.1215 | 1955768 |
| 1.3119 | 0.0621 | 40 | 1.1258 | 2232888 |
| 1.2479 | 0.0698 | 45 | 1.1246 | 2508928 |
| 1.1877 | 0.0776 | 50 | 1.1361 | 2794176 |
| 1.191 | 0.0853 | 55 | 1.1428 | 3073752 |
| 1.0747 | 0.0931 | 60 | 1.1572 | 3348696 |
| 1.0531 | 0.1008 | 65 | 1.1677 | 3627608 |
| 1.0625 | 0.1086 | 70 | 1.2026 | 3898840 |
| 0.9715 | 0.1164 | 75 | 1.1986 | 4179624 |
| 0.9747 | 0.1241 | 80 | 1.2427 | 4461328 |
| 0.9716 | 0.1319 | 85 | 1.2294 | 4736920 |
| 0.9452 | 0.1396 | 90 | 1.2619 | 5013312 |
| 1.0247 | 0.1474 | 95 | 1.2677 | 5293472 |
| 0.8262 | 0.1551 | 100 | 1.2883 | 5578928 |
| 0.6799 | 0.1629 | 105 | 1.3027 | 5856496 |
| 0.8171 | 0.1707 | 110 | 1.3283 | 6126776 |
| 0.7027 | 0.1784 | 115 | 1.3519 | 6406440 |
| 0.7042 | 0.1862 | 120 | 1.3772 | 6680416 |
| 0.7543 | 0.1939 | 125 | 1.3535 | 6953656 |
| 0.6807 | 0.2017 | 130 | 1.4231 | 7232432 |
| 0.5855 | 0.2094 | 135 | 1.3784 | 7507120 |
| 0.7191 | 0.2172 | 140 | 1.3874 | 7790608 |
| 0.5875 | 0.2250 | 145 | 1.3863 | 8065000 |
| 0.5784 | 0.2327 | 150 | 1.3860 | 8346920 |
| 0.5354 | 0.2405 | 155 | 1.4086 | 8629712 |
| 0.6037 | 0.2482 | 160 | 1.3805 | 8906336 |
| 0.4554 | 0.2560 | 165 | 1.4042 | 9182640 |
| 0.5024 | 0.2637 | 170 | 1.3671 | 9454216 |
| 0.4448 | 0.2715 | 175 | 1.4069 | 9741640 |
| 0.3781 | 0.2793 | 180 | 1.3702 | 10016872 |
| 0.4783 | 0.2870 | 185 | 1.3783 | 10296808 |
| 0.4253 | 0.2948 | 190 | 1.3800 | 10576400 |
| 0.4147 | 0.3025 | 195 | 1.4138 | 10849744 |
| 0.3662 | 0.3103 | 200 | 1.3900 | 11131368 |
| 0.4319 | 0.3180 | 205 | 1.3745 | 11410688 |
| 0.4286 | 0.3258 | 210 | 1.3917 | 11686496 |
| 0.3489 | 0.3336 | 215 | 1.3719 | 11963000 |
| 0.3023 | 0.3413 | 220 | 1.3771 | 12246240 |
| 0.3079 | 0.3491 | 225 | 1.3956 | 12524112 |
| 0.3112 | 0.3568 | 230 | 1.3856 | 12799064 |
| 0.3492 | 0.3646 | 235 | 1.3985 | 13083872 |
| 0.3029 | 0.3723 | 240 | 1.3872 | 13360208 |
| 0.2984 | 0.3801 | 245 | 1.3776 | 13633336 |
| 0.2626 | 0.3879 | 250 | 1.3793 | 13912936 |
| 0.3146 | 0.3956 | 255 | 1.3871 | 14192456 |
| 0.2251 | 0.4034 | 260 | 1.3809 | 14472368 |
| 0.3034 | 0.4111 | 265 | 1.3884 | 14747808 |
| 0.2146 | 0.4189 | 270 | 1.3702 | 15025808 |
| 0.2951 | 0.4266 | 275 | 1.3882 | 15318024 |
| 0.2956 | 0.4344 | 280 | 1.3525 | 15601792 |
| 0.157 | 0.4422 | 285 | 1.3716 | 15874904 |
| 0.2694 | 0.4499 | 290 | 1.3435 | 16160224 |
| 0.2041 | 0.4577 | 295 | 1.3542 | 16438136 |
| 0.1862 | 0.4654 | 300 | 1.3395 | 16719344 |
| 0.2708 | 0.4732 | 305 | 1.3424 | 17005168 |
| 0.2677 | 0.4809 | 310 | 1.3313 | 17288536 |
| 0.2224 | 0.4887 | 315 | 1.3514 | 17562128 |
| 0.2176 | 0.4965 | 320 | 1.3322 | 17839520 |
| 0.2215 | 0.5042 | 325 | 1.3404 | 18114128 |
| 0.1953 | 0.5120 | 330 | 1.3326 | 18386816 |
| 0.2335 | 0.5197 | 335 | 1.3491 | 18665728 |
| 0.1657 | 0.5275 | 340 | 1.3284 | 18949264 |
| 0.174 | 0.5352 | 345 | 1.3171 | 19231624 |
| 0.1749 | 0.5430 | 350 | 1.3372 | 19510600 |
| 0.1945 | 0.5508 | 355 | 1.3175 | 19787144 |
| 0.1734 | 0.5585 | 360 | 1.3292 | 20060832 |
| 0.1688 | 0.5663 | 365 | 1.3111 | 20343000 |
| 0.2114 | 0.5740 | 370 | 1.2973 | 20621792 |
| 0.247 | 0.5818 | 375 | 1.3031 | 20899152 |
| 0.2554 | 0.5895 | 380 | 1.2981 | 21178736 |
| 0.198 | 0.5973 | 385 | 1.3084 | 21460192 |
| 0.1658 | 0.6051 | 390 | 1.3008 | 21743360 |
| 0.1594 | 0.6128 | 395 | 1.2969 | 22023504 |
| 0.1822 | 0.6206 | 400 | 1.2989 | 22303296 |
| 0.2182 | 0.6283 | 405 | 1.2980 | 22580208 |
| 0.1501 | 0.6361 | 410 | 1.2959 | 22857520 |
| 0.1971 | 0.6438 | 415 | 1.2961 | 23139408 |
| 0.1209 | 0.6516 | 420 | 1.2936 | 23416216 |
| 0.2045 | 0.6594 | 425 | 1.2940 | 23689032 |
| 0.1208 | 0.6671 | 430 | 1.2885 | 23975088 |
| 0.1807 | 0.6749 | 435 | 1.2912 | 24256472 |
| 0.2515 | 0.6826 | 440 | 1.2910 | 24532032 |
| 0.1493 | 0.6904 | 445 | 1.2938 | 24809760 |
| 0.1724 | 0.6981 | 450 | 1.3047 | 25077656 |
| 0.112 | 0.7059 | 455 | 1.2930 | 25360240 |
| 0.2521 | 0.7137 | 460 | 1.2930 | 25639896 |
| 0.2002 | 0.7214 | 465 | 1.2897 | 25922216 |
| 0.1125 | 0.7292 | 470 | 1.2828 | 26202416 |
| 0.1834 | 0.7369 | 475 | 1.2841 | 26481664 |
| 0.1494 | 0.7447 | 480 | 1.2855 | 26760896 |
| 0.1791 | 0.7524 | 485 | 1.2840 | 27040744 |
| 0.2091 | 0.7602 | 490 | 1.2826 | 27323320 |
| 0.1752 | 0.7680 | 495 | 1.2811 | 27594200 |
| 0.1807 | 0.7757 | 500 | 1.2799 | 27873000 |
| 0.1733 | 0.7835 | 505 | 1.2862 | 28148616 |
| 0.172 | 0.7912 | 510 | 1.2815 | 28433512 |
| 0.1504 | 0.7990 | 515 | 1.2831 | 28712512 |
| 0.1439 | 0.8067 | 520 | 1.2806 | 28992760 |
| 0.2205 | 0.8145 | 525 | 1.2719 | 29271976 |
| 0.1169 | 0.8223 | 530 | 1.2834 | 29558328 |
| 0.1666 | 0.8300 | 535 | 1.2790 | 29840880 |
| 0.097 | 0.8378 | 540 | 1.2884 | 30128280 |
| 0.17 | 0.8455 | 545 | 1.2879 | 30413024 |
| 0.2252 | 0.8533 | 550 | 1.2760 | 30695448 |
| 0.2019 | 0.8610 | 555 | 1.2799 | 30977600 |
| 0.167 | 0.8688 | 560 | 1.2860 | 31262328 |
| 0.1098 | 0.8766 | 565 | 1.2680 | 31543232 |
| 0.1777 | 0.8843 | 570 | 1.2755 | 31825656 |
| 0.154 | 0.8921 | 575 | 1.2777 | 32092336 |
| 0.1597 | 0.8998 | 580 | 1.2637 | 32373968 |
| 0.2177 | 0.9076 | 585 | 1.2753 | 32649792 |
| 0.1061 | 0.9153 | 590 | 1.2658 | 32937384 |
| 0.2398 | 0.9231 | 595 | 1.2653 | 33209792 |
| 0.1724 | 0.9309 | 600 | 1.2751 | 33486552 |
| 0.1393 | 0.9386 | 605 | 1.2729 | 33765048 |
| 0.1553 | 0.9464 | 610 | 1.2851 | 34043056 |
| 0.1746 | 0.9541 | 615 | 1.2804 | 34325472 |
| 0.1952 | 0.9619 | 620 | 1.2698 | 34601600 |
| 0.1611 | 0.9696 | 625 | 1.2857 | 34884752 |
| 0.1861 | 0.9774 | 630 | 1.2717 | 35162760 |
| 0.1539 | 0.9852 | 635 | 1.2646 | 35439896 |
| 0.1576 | 0.9929 | 640 | 1.2741 | 35722472 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1