
collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd1

This model is a fine-tuned version of google/gemma-2-9b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9540
  • Num Input Tokens Seen: 23746304
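
A minimal sketch for loading the checkpoint with the transformers library (the repo id is taken from this card's Hub page; the base model google/gemma-2-9b is gated, so the usual Hub authentication may be required):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id as listed on the Hub page for this card; adjust if hosted elsewhere.
model_id = "RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the checkpoint is stored in BF16
    device_map="auto",           # requires accelerate; places weights automatically
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```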

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto TrainingArguments follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
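
Note that total_train_batch_size is train_batch_size × gradient_accumulation_steps (4 × 32 = 128), consistent with a single training device. A minimal sketch of how these settings map onto transformers.TrainingArguments is below; the actual training script is not published, so the output_dir is a placeholder and the dataset/Trainer wiring is omitted:

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the run configuration from the list above;
# output_dir is a placeholder and the dataset/Trainer wiring is omitted.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=4,    # train_batch_size
    per_device_eval_batch_size=16,    # eval_batch_size
    seed=1,
    gradient_accumulation_steps=32,   # 4 * 32 = 128 effective batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,                # lr_scheduler_warmup_ratio
    num_train_epochs=1,
    adam_beta1=0.9,                   # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                        # assumption: matches the BF16 checkpoint
)
```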

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.2335          | 0                 |
| 1.3192        | 0.0106 | 5    | 1.1930          | 251724            |
| 1.2066        | 0.0211 | 10   | 1.0874          | 499628            |
| 1.068         | 0.0317 | 15   | 1.0377          | 746640            |
| 0.7682        | 0.0423 | 20   | 1.0245          | 998064            |
| 0.6356        | 0.0529 | 25   | 1.0272          | 1247608           |
| 0.593         | 0.0634 | 30   | 1.0298          | 1506576           |
| 0.4293        | 0.0740 | 35   | 1.0250          | 1764228           |
| 0.3318        | 0.0846 | 40   | 1.0225          | 2018200           |
| 0.336         | 0.0952 | 45   | 1.0146          | 2271692           |
| 0.3199        | 0.1057 | 50   | 1.0056          | 2516500           |
| 0.268         | 0.1163 | 55   | 1.0010          | 2771000           |
| 0.3191        | 0.1269 | 60   | 0.9958          | 3020960           |
| 0.3025        | 0.1374 | 65   | 0.9967          | 3270868           |
| 0.2459        | 0.1480 | 70   | 0.9926          | 3522436           |
| 0.3354        | 0.1586 | 75   | 0.9893          | 3769816           |
| 0.3724        | 0.1692 | 80   | 0.9870          | 4023492           |
| 0.3067        | 0.1797 | 85   | 0.9857          | 4280320           |
| 0.3403        | 0.1903 | 90   | 0.9824          | 4533272           |
| 0.2916        | 0.2009 | 95   | 0.9814          | 4782572           |
| 0.2843        | 0.2115 | 100  | 0.9797          | 5032932           |
| 0.2819        | 0.2220 | 105  | 0.9773          | 5288380           |
| 0.192         | 0.2326 | 110  | 0.9761          | 5544116           |
| 0.223         | 0.2432 | 115  | 0.9750          | 5792512           |
| 0.2713        | 0.2538 | 120  | 0.9727          | 6045400           |
| 0.2705        | 0.2643 | 125  | 0.9716          | 6296860           |
| 0.3249        | 0.2749 | 130  | 0.9712          | 6546012           |
| 0.2854        | 0.2855 | 135  | 0.9688          | 6806812           |
| 0.242         | 0.2960 | 140  | 0.9702          | 7059244           |
| 0.2796        | 0.3066 | 145  | 0.9696          | 7303032           |
| 0.3008        | 0.3172 | 150  | 0.9680          | 7552056           |
| 0.2189        | 0.3278 | 155  | 0.9691          | 7807484           |
| 0.2304        | 0.3383 | 160  | 0.9697          | 8044776           |
| 0.2107        | 0.3489 | 165  | 0.9691          | 8297808           |
| 0.2989        | 0.3595 | 170  | 0.9663          | 8540908           |
| 0.2247        | 0.3701 | 175  | 0.9648          | 8800560           |
| 0.322         | 0.3806 | 180  | 0.9657          | 9055232           |
| 0.2985        | 0.3912 | 185  | 0.9644          | 9309612           |
| 0.2521        | 0.4018 | 190  | 0.9645          | 9563664           |
| 0.3678        | 0.4123 | 195  | 0.9656          | 9821076           |
| 0.2472        | 0.4229 | 200  | 0.9652          | 10073776          |
| 0.2586        | 0.4335 | 205  | 0.9634          | 10328452          |
| 0.2413        | 0.4441 | 210  | 0.9642          | 10588044          |
| 0.2502        | 0.4546 | 215  | 0.9636          | 10835888          |
| 0.2606        | 0.4652 | 220  | 0.9634          | 11091212          |
| 0.2124        | 0.4758 | 225  | 0.9637          | 11346152          |
| 0.3122        | 0.4864 | 230  | 0.9618          | 11605376          |
| 0.3033        | 0.4969 | 235  | 0.9603          | 11859096          |
| 0.4002        | 0.5075 | 240  | 0.9617          | 12112920          |
| 0.2307        | 0.5181 | 245  | 0.9596          | 12364040          |
| 0.3126        | 0.5286 | 250  | 0.9601          | 12612780          |
| 0.2353        | 0.5392 | 255  | 0.9606          | 12862168          |
| 0.2599        | 0.5498 | 260  | 0.9596          | 13118060          |
| 0.2025        | 0.5604 | 265  | 0.9582          | 13367704          |
| 0.2109        | 0.5709 | 270  | 0.9580          | 13618324          |
| 0.223         | 0.5815 | 275  | 0.9605          | 13868120          |
| 0.2711        | 0.5921 | 280  | 0.9605          | 14117864          |
| 0.2627        | 0.6027 | 285  | 0.9580          | 14361576          |
| 0.299         | 0.6132 | 290  | 0.9576          | 14613300          |
| 0.2192        | 0.6238 | 295  | 0.9569          | 14863772          |
| 0.2936        | 0.6344 | 300  | 0.9580          | 15117120          |
| 0.213         | 0.6449 | 305  | 0.9583          | 15367888          |
| 0.212         | 0.6555 | 310  | 0.9583          | 15620172          |
| 0.207         | 0.6661 | 315  | 0.9586          | 15867484          |
| 0.2712        | 0.6767 | 320  | 0.9580          | 16119180          |
| 0.2482        | 0.6872 | 325  | 0.9565          | 16372908          |
| 0.2093        | 0.6978 | 330  | 0.9549          | 16621448          |
| 0.2663        | 0.7084 | 335  | 0.9566          | 16873836          |
| 0.2744        | 0.7190 | 340  | 0.9569          | 17124156          |
| 0.2421        | 0.7295 | 345  | 0.9559          | 17371376          |
| 0.2775        | 0.7401 | 350  | 0.9555          | 17618956          |
| 0.1681        | 0.7507 | 355  | 0.9553          | 17871212          |
| 0.261         | 0.7613 | 360  | 0.9547          | 18122512          |
| 0.2847        | 0.7718 | 365  | 0.9541          | 18373000          |
| 0.2619        | 0.7824 | 370  | 0.9535          | 18624220          |
| 0.2633        | 0.7930 | 375  | 0.9543          | 18869156          |
| 0.2664        | 0.8035 | 380  | 0.9544          | 19118420          |
| 0.2411        | 0.8141 | 385  | 0.9527          | 19372484          |
| 0.2671        | 0.8247 | 390  | 0.9521          | 19619032          |
| 0.2495        | 0.8353 | 395  | 0.9528          | 19870712          |
| 0.1758        | 0.8458 | 400  | 0.9530          | 20120812          |
| 0.2762        | 0.8564 | 405  | 0.9534          | 20376396          |
| 0.3021        | 0.8670 | 410  | 0.9533          | 20629920          |
| 0.2111        | 0.8776 | 415  | 0.9535          | 20874064          |
| 0.2174        | 0.8881 | 420  | 0.9525          | 21129004          |
| 0.2087        | 0.8987 | 425  | 0.9531          | 21379724          |
| 0.3074        | 0.9093 | 430  | 0.9521          | 21630104          |
| 0.2314        | 0.9198 | 435  | 0.9519          | 21878192          |
| 0.224         | 0.9304 | 440  | 0.9527          | 22128940          |
| 0.2769        | 0.9410 | 445  | 0.9525          | 22382812          |
| 0.2538        | 0.9516 | 450  | 0.9538          | 22638260          |
| 0.2352        | 0.9621 | 455  | 0.9539          | 22886880          |
| 0.3061        | 0.9727 | 460  | 0.9513          | 23143232          |
| 0.2891        | 0.9833 | 465  | 0.9506          | 23396844          |
| 0.252         | 0.9939 | 470  | 0.9551          | 23650700          |

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
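
A quick Python sanity check that a local environment matches these versions (the installed torch build string should include the CUDA suffix, e.g. 2.4.0+cu121):

```python
import datasets
import tokenizers
import torch
import transformers

# Compare installed versions against the ones this model was trained with.
expected = {
    transformers: "4.44.0",
    torch: "2.4.0+cu121",
    datasets: "2.20.0",
    tokenizers: "0.19.1",
}
for module, version in expected.items():
    print(f"{module.__name__}: installed {module.__version__}, trained with {version}")
```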