---
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: gemma-2-2b_hs2_iter1_sftsd2
    results: []
---

gemma-2-2b_hs2_iter1_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2172
  • Num Input Tokens Seen: 18470688
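
Below is a minimal loading-and-generation sketch using the Transformers library. The Hub repository id is inferred from the model name and is an assumption; adjust it if the weights are hosted under a different path.

```python
# Minimal inference sketch. The repository id is assumed (inferred from the
# model name above); replace it if the checkpoint lives elsewhere.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/gemma-2-2b_hs2_iter1_sftsd2"  # assumed Hub repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "Explain supervised fine-tuning in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```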

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reconstruction sketch using these settings follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
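
The sketch below shows how these settings map onto TRL's SFTConfig and SFTTrainer. It is a reconstruction under assumptions, not the original training script: the dataset files and the dataset text field are placeholders (the training data for this checkpoint is not documented), and the 5-step evaluation/logging cadence is read off the results table that follows.

```python
# Reconstruction sketch of the hyperparameter configuration above using TRL.
# Dataset paths and the text field are placeholders; the actual training data
# for this checkpoint is not documented.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

base_model = "google/gemma-2-2b"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# Placeholder data files; substitute the real training/evaluation sets.
train_dataset = load_dataset("json", data_files="train.jsonl", split="train")
eval_dataset = load_dataset("json", data_files="eval.jsonl", split="train")

args = SFTConfig(
    output_dir="gemma-2-2b_hs2_iter1_sftsd2",
    dataset_text_field="text",        # assumed field name
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,   # 8 * 16 = 128 total train batch size
    seed=2,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    eval_strategy="steps",
    eval_steps=5,                     # matches the 5-step cadence in the results table
    logging_steps=5,
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```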

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.8375        | 0.0151 | 5    | 1.3784          | 277040            |
| 1.5937        | 0.0301 | 10   | 1.2687          | 554320            |
| 1.5082        | 0.0452 | 15   | 1.1925          | 832568            |
| 1.3528        | 0.0602 | 20   | 1.1570          | 1104816           |
| 1.29          | 0.0753 | 25   | 1.1362          | 1377136           |
| 1.2141        | 0.0903 | 30   | 1.1415          | 1648280           |
| 1.0916        | 0.1054 | 35   | 1.1517          | 1928952           |
| 1.0637        | 0.1205 | 40   | 1.1848          | 2205568           |
| 0.997         | 0.1355 | 45   | 1.2021          | 2486744           |
| 0.8411        | 0.1506 | 50   | 1.2454          | 2759672           |
| 0.819         | 0.1656 | 55   | 1.2625          | 3034344           |
| 0.8372        | 0.1807 | 60   | 1.2813          | 3310160           |
| 0.7501        | 0.1957 | 65   | 1.3245          | 3591528           |
| 0.701         | 0.2108 | 70   | 1.3285          | 3867064           |
| 0.6381        | 0.2259 | 75   | 1.3442          | 4136080           |
| 0.5853        | 0.2409 | 80   | 1.3674          | 4413080           |
| 0.5914        | 0.2560 | 85   | 1.3762          | 4697248           |
| 0.539         | 0.2710 | 90   | 1.3602          | 4976440           |
| 0.5163        | 0.2861 | 95   | 1.3418          | 5258848           |
| 0.3974        | 0.3011 | 100  | 1.3244          | 5530232           |
| 0.415         | 0.3162 | 105  | 1.3646          | 5806632           |
| 0.3812        | 0.3313 | 110  | 1.3175          | 6085304           |
| 0.3926        | 0.3463 | 115  | 1.3466          | 6366392           |
| 0.3356        | 0.3614 | 120  | 1.3194          | 6645272           |
| 0.3933        | 0.3764 | 125  | 1.3229          | 6933352           |
| 0.3463        | 0.3915 | 130  | 1.3271          | 7209752           |
| 0.3245        | 0.4065 | 135  | 1.3134          | 7487224           |
| 0.3898        | 0.4216 | 140  | 1.3007          | 7763992           |
| 0.238         | 0.4367 | 145  | 1.3160          | 8052304           |
| 0.3031        | 0.4517 | 150  | 1.3038          | 8323880           |
| 0.363         | 0.4668 | 155  | 1.3004          | 8594840           |
| 0.3207        | 0.4818 | 160  | 1.2812          | 8877704           |
| 0.2837        | 0.4969 | 165  | 1.2827          | 9158496           |
| 0.1469        | 0.5120 | 170  | 1.2875          | 9437080           |
| 0.2441        | 0.5270 | 175  | 1.2807          | 9715752           |
| 0.2553        | 0.5421 | 180  | 1.2806          | 9997688           |
| 0.2823        | 0.5571 | 185  | 1.2647          | 10279272          |
| 0.2381        | 0.5722 | 190  | 1.2680          | 10555816          |
| 0.2152        | 0.5872 | 195  | 1.2607          | 10829488          |
| 0.2018        | 0.6023 | 200  | 1.2581          | 11107824          |
| 0.2278        | 0.6174 | 205  | 1.2819          | 11388528          |
| 0.2623        | 0.6324 | 210  | 1.2529          | 11675728          |
| 0.2305        | 0.6475 | 215  | 1.2584          | 11954704          |
| 0.1346        | 0.6625 | 220  | 1.2531          | 12227408          |
| 0.2306        | 0.6776 | 225  | 1.2524          | 12509728          |
| 0.2329        | 0.6926 | 230  | 1.2434          | 12789144          |
| 0.1821        | 0.7077 | 235  | 1.2447          | 13064784          |
| 0.238         | 0.7228 | 240  | 1.2315          | 13335048          |
| 0.2227        | 0.7378 | 245  | 1.2391          | 13612832          |
| 0.2414        | 0.7529 | 250  | 1.2377          | 13892512          |
| 0.1753        | 0.7679 | 255  | 1.2327          | 14174312          |
| 0.2232        | 0.7830 | 260  | 1.2354          | 14454112          |
| 0.209         | 0.7980 | 265  | 1.2343          | 14724840          |
| 0.1725        | 0.8131 | 270  | 1.2314          | 15000280          |
| 0.1442        | 0.8282 | 275  | 1.2273          | 15282784          |
| 0.2197        | 0.8432 | 280  | 1.2237          | 15556416          |
| 0.2327        | 0.8583 | 285  | 1.2239          | 15842432          |
| 0.233         | 0.8733 | 290  | 1.2274          | 16119456          |
| 0.2136        | 0.8884 | 295  | 1.2228          | 16398960          |
| 0.1161        | 0.9034 | 300  | 1.2295          | 16675056          |
| 0.1408        | 0.9185 | 305  | 1.2214          | 16956240          |
| 0.2016        | 0.9336 | 310  | 1.2247          | 17235632          |
| 0.2294        | 0.9486 | 315  | 1.2298          | 17515584          |
| 0.1335        | 0.9637 | 320  | 1.2145          | 17798760          |
| 0.1811        | 0.9787 | 325  | 1.2251          | 18075960          |
| 0.2033        | 0.9938 | 330  | 1.2213          | 18358176          |

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1