collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd0

This model is a fine-tuned version of google/gemma-2-27b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9420
  • Num Input Tokens Seen: 21270888
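
For quick experimentation, the checkpoint can be loaded like any other causal LM on the Hub. This is a minimal sketch, not an official usage example: the repo id is taken from this model page, while the bfloat16 dtype (matching the BF16 checkpoint) and `device_map="auto"` are assumptions, and loading a 27B model this way requires `accelerate` and substantial GPU memory.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# bfloat16 matches the checkpoint's tensor type; device_map="auto" needs `accelerate`.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```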

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
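
The list above maps directly onto `transformers.TrainingArguments`. Below is a minimal sketch of an equivalent configuration, assuming a single-device run (4 per-device batch × 32 accumulation steps = 128, matching total_train_batch_size); the output_dir placeholder and the bf16 flag are assumptions, not taken from the original training script.

```python
from transformers import TrainingArguments

# Sketch of a TrainingArguments config matching the listed hyperparameters.
# output_dir and bf16 are assumptions, not from the card.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd0",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=4,   # train_batch_size
    per_device_eval_batch_size=16,   # eval_batch_size
    seed=0,
    gradient_accumulation_steps=32,  # 4 x 32 = 128 effective batch on one device
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,               # lr_scheduler_warmup_ratio
    num_train_epochs=1,
    adam_beta1=0.9,                  # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                       # assumption, matching the BF16 checkpoint
)
```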

Training results

Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen
No log 0 0 1.1282 0
3.3744 0.0119 5 1.0898 257008
2.7344 0.0239 10 1.0110 511176
3.0196 0.0358 15 0.9968 766352
2.6511 0.0477 20 0.9826 1021480
2.4604 0.0597 25 0.9840 1268604
2.3963 0.0716 30 0.9900 1514644
2.5123 0.0835 35 0.9893 1773960
2.267 0.0955 40 0.9847 2024128
2.2102 0.1074 45 0.9850 2275396
2.1111 0.1193 50 0.9831 2532996
2.2625 0.1312 55 0.9879 2786708
1.9372 0.1432 60 0.9866 3040624
1.6937 0.1551 65 0.9815 3290344
1.643 0.1670 70 0.9785 3545668
1.6501 0.1790 75 0.9776 3796572
1.533 0.1909 80 0.9773 4049160
1.429 0.2028 85 0.9747 4305488
1.5398 0.2148 90 0.9717 4555888
1.4633 0.2267 95 0.9719 4803416
1.216 0.2386 100 0.9739 5063736
1.2603 0.2506 105 0.9643 5316860
1.5926 0.2625 110 0.9673 5573004
1.3854 0.2744 115 0.9663 5831764
1.4685 0.2864 120 0.9643 6088656
1.2814 0.2983 125 0.9634 6343448
1.4396 0.3102 130 0.9613 6596508
1.2699 0.3221 135 0.9591 6855352
1.2285 0.3341 140 0.9617 7111244
1.2788 0.3460 145 0.9611 7360936
1.3858 0.3579 150 0.9586 7617700
1.2758 0.3699 155 0.9615 7876384
1.2891 0.3818 160 0.9556 8132024
1.3362 0.3937 165 0.9589 8384580
1.306 0.4057 170 0.9557 8634968
1.3192 0.4176 175 0.9574 8888388
1.3276 0.4295 180 0.9537 9137756
1.3805 0.4415 185 0.9530 9392512
1.2827 0.4534 190 0.9540 9652288
1.2674 0.4653 195 0.9523 9908688
1.4104 0.4773 200 0.9512 10160904
1.3507 0.4892 205 0.9547 10411964
1.5425 0.5011 210 0.9498 10663204
1.2436 0.5130 215 0.9523 10910880
1.3822 0.5250 220 0.9495 11163128
1.2537 0.5369 225 0.9531 11415784
1.1275 0.5488 230 0.9494 11669324
1.2746 0.5608 235 0.9499 11928348
1.1185 0.5727 240 0.9482 12186304
1.151 0.5846 245 0.9504 12439048
1.3418 0.5966 250 0.9459 12695300
1.2136 0.6085 255 0.9465 12956336
1.3555 0.6204 260 0.9489 13207856
1.1649 0.6324 265 0.9455 13462604
1.1214 0.6443 270 0.9458 13710788
1.1163 0.6562 275 0.9450 13968176
1.081 0.6682 280 0.9453 14220592
1.1374 0.6801 285 0.9431 14473072
1.4752 0.6920 290 0.9449 14719728
1.2133 0.7040 295 0.9462 14971712
1.185 0.7159 300 0.9434 15223916
1.4205 0.7278 305 0.9459 15484568
1.1185 0.7397 310 0.9448 15737896
1.1153 0.7517 315 0.9441 15994588
1.3097 0.7636 320 0.9413 16249952
1.2363 0.7755 325 0.9454 16503828
1.2772 0.7875 330 0.9407 16758796
1.1471 0.7994 335 0.9428 17009524
1.196 0.8113 340 0.9406 17259112
1.1234 0.8233 345 0.9424 17507920
1.2518 0.8352 350 0.9377 17761472
1.3816 0.8471 355 0.9454 18018912
1.2513 0.8591 360 0.9391 18274396
1.2215 0.8710 365 0.9404 18534984
1.3596 0.8829 370 0.9403 18789204
1.1752 0.8949 375 0.9404 19044632
1.1623 0.9068 380 0.9422 19301532
1.3607 0.9187 385 0.9392 19559488
1.1718 0.9306 390 0.9397 19804384
1.2385 0.9426 395 0.9396 20056360
1.2311 0.9545 400 0.9443 20312576
1.3821 0.9664 405 0.9408 20561528
1.3602 0.9784 410 0.9416 20814848
1.2911 0.9903 415 0.9420 21072348
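
To eyeball the trend, the validation-loss column can be plotted directly. The sketch below hard-codes a handful of (step, validation loss) pairs copied from the table above rather than parsing trainer logs.

```python
import matplotlib.pyplot as plt

# (step, validation loss) pairs taken from the table above.
steps    = [0,      50,     100,    150,    200,    250,    300,    350,    400,    415]
val_loss = [1.1282, 0.9831, 0.9739, 0.9586, 0.9512, 0.9459, 0.9434, 0.9377, 0.9443, 0.9420]

plt.plot(steps, val_loss, marker="o")
plt.xlabel("Step")
plt.ylabel("Validation loss")
plt.title("collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd0")
plt.show()
```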

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
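
A quick way to check that a local environment matches these versions (a sketch; it only prints the installed versions, it does not pin them):

```python
import datasets
import tokenizers
import torch
import transformers

# Versions used for this run, per the list above:
# transformers 4.44.0, torch 2.4.0+cu121, datasets 2.20.0, tokenizers 0.19.1
for name, module in [
    ("transformers", transformers),
    ("torch", torch),
    ("datasets", datasets),
    ("tokenizers", tokenizers),
]:
    print(f"{name}=={module.__version__}")
```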