# collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd0
This model is a fine-tuned version of [google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.9420
- Num Input Tokens Seen: 21270888
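This card ships no usage example. Below is a minimal loading sketch, assuming the checkpoint lives at `RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd0` (the repository this card describes) and loads through the standard `transformers` causal-LM API; the dtype and device placement are illustrative choices, not documented settings.

```python
# Minimal loading sketch; repo id taken from this card, everything else assumed.
# A 27B model in bfloat16 needs roughly 55-60 GB of accelerator memory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Gemma 2 weights are published in bfloat16
    device_map="auto",           # shard across available GPUs (requires accelerate)
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```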
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hedged `TrainingArguments` reconstruction follows the list):
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
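As a non-authoritative sketch, the listed values map onto a `transformers.TrainingArguments` configuration roughly as below. Note the effective batch size: 4 per device × 32 accumulation steps = 128, consistent with a single training device. The logging/eval cadence and token-counting flag are inferred from the results table; the output directory and the surrounding `Trainer`/dataset wiring are assumptions, not the authors' actual script.

```python
# Hedged reconstruction of the listed hyperparameters (transformers 4.44.0).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-27b_hs2_accumulate_iter5_sftsd0",  # assumed
    learning_rate=8e-6,
    per_device_train_batch_size=4,   # x 32 accumulation steps = 128 effective
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=32,
    seed=0,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                  # Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    logging_steps=5,                 # matches the 5-step cadence in the results table
    eval_strategy="steps",           # evaluation every 5 steps, per the table
    eval_steps=5,
    include_num_input_tokens_seen=True,  # produces the "Input Tokens Seen" column
)
```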
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---------------|-------|------|-----------------|-------------------|
No log | 0 | 0 | 1.1282 | 0 |
3.3744 | 0.0119 | 5 | 1.0898 | 257008 |
2.7344 | 0.0239 | 10 | 1.0110 | 511176 |
3.0196 | 0.0358 | 15 | 0.9968 | 766352 |
2.6511 | 0.0477 | 20 | 0.9826 | 1021480 |
2.4604 | 0.0597 | 25 | 0.9840 | 1268604 |
2.3963 | 0.0716 | 30 | 0.9900 | 1514644 |
2.5123 | 0.0835 | 35 | 0.9893 | 1773960 |
2.267 | 0.0955 | 40 | 0.9847 | 2024128 |
2.2102 | 0.1074 | 45 | 0.9850 | 2275396 |
2.1111 | 0.1193 | 50 | 0.9831 | 2532996 |
2.2625 | 0.1312 | 55 | 0.9879 | 2786708 |
1.9372 | 0.1432 | 60 | 0.9866 | 3040624 |
1.6937 | 0.1551 | 65 | 0.9815 | 3290344 |
1.643 | 0.1670 | 70 | 0.9785 | 3545668 |
1.6501 | 0.1790 | 75 | 0.9776 | 3796572 |
1.533 | 0.1909 | 80 | 0.9773 | 4049160 |
1.429 | 0.2028 | 85 | 0.9747 | 4305488 |
1.5398 | 0.2148 | 90 | 0.9717 | 4555888 |
1.4633 | 0.2267 | 95 | 0.9719 | 4803416 |
1.216 | 0.2386 | 100 | 0.9739 | 5063736 |
1.2603 | 0.2506 | 105 | 0.9643 | 5316860 |
1.5926 | 0.2625 | 110 | 0.9673 | 5573004 |
1.3854 | 0.2744 | 115 | 0.9663 | 5831764 |
1.4685 | 0.2864 | 120 | 0.9643 | 6088656 |
1.2814 | 0.2983 | 125 | 0.9634 | 6343448 |
1.4396 | 0.3102 | 130 | 0.9613 | 6596508 |
1.2699 | 0.3221 | 135 | 0.9591 | 6855352 |
1.2285 | 0.3341 | 140 | 0.9617 | 7111244 |
1.2788 | 0.3460 | 145 | 0.9611 | 7360936 |
1.3858 | 0.3579 | 150 | 0.9586 | 7617700 |
1.2758 | 0.3699 | 155 | 0.9615 | 7876384 |
1.2891 | 0.3818 | 160 | 0.9556 | 8132024 |
1.3362 | 0.3937 | 165 | 0.9589 | 8384580 |
1.306 | 0.4057 | 170 | 0.9557 | 8634968 |
1.3192 | 0.4176 | 175 | 0.9574 | 8888388 |
1.3276 | 0.4295 | 180 | 0.9537 | 9137756 |
1.3805 | 0.4415 | 185 | 0.9530 | 9392512 |
1.2827 | 0.4534 | 190 | 0.9540 | 9652288 |
1.2674 | 0.4653 | 195 | 0.9523 | 9908688 |
1.4104 | 0.4773 | 200 | 0.9512 | 10160904 |
1.3507 | 0.4892 | 205 | 0.9547 | 10411964 |
1.5425 | 0.5011 | 210 | 0.9498 | 10663204 |
1.2436 | 0.5130 | 215 | 0.9523 | 10910880 |
1.3822 | 0.5250 | 220 | 0.9495 | 11163128 |
1.2537 | 0.5369 | 225 | 0.9531 | 11415784 |
1.1275 | 0.5488 | 230 | 0.9494 | 11669324 |
1.2746 | 0.5608 | 235 | 0.9499 | 11928348 |
1.1185 | 0.5727 | 240 | 0.9482 | 12186304 |
1.151 | 0.5846 | 245 | 0.9504 | 12439048 |
1.3418 | 0.5966 | 250 | 0.9459 | 12695300 |
1.2136 | 0.6085 | 255 | 0.9465 | 12956336 |
1.3555 | 0.6204 | 260 | 0.9489 | 13207856 |
1.1649 | 0.6324 | 265 | 0.9455 | 13462604 |
1.1214 | 0.6443 | 270 | 0.9458 | 13710788 |
1.1163 | 0.6562 | 275 | 0.9450 | 13968176 |
1.081 | 0.6682 | 280 | 0.9453 | 14220592 |
1.1374 | 0.6801 | 285 | 0.9431 | 14473072 |
1.4752 | 0.6920 | 290 | 0.9449 | 14719728 |
1.2133 | 0.7040 | 295 | 0.9462 | 14971712 |
1.185 | 0.7159 | 300 | 0.9434 | 15223916 |
1.4205 | 0.7278 | 305 | 0.9459 | 15484568 |
1.1185 | 0.7397 | 310 | 0.9448 | 15737896 |
1.1153 | 0.7517 | 315 | 0.9441 | 15994588 |
1.3097 | 0.7636 | 320 | 0.9413 | 16249952 |
1.2363 | 0.7755 | 325 | 0.9454 | 16503828 |
1.2772 | 0.7875 | 330 | 0.9407 | 16758796 |
1.1471 | 0.7994 | 335 | 0.9428 | 17009524 |
1.196 | 0.8113 | 340 | 0.9406 | 17259112 |
1.1234 | 0.8233 | 345 | 0.9424 | 17507920 |
1.2518 | 0.8352 | 350 | 0.9377 | 17761472 |
1.3816 | 0.8471 | 355 | 0.9454 | 18018912 |
1.2513 | 0.8591 | 360 | 0.9391 | 18274396 |
1.2215 | 0.8710 | 365 | 0.9404 | 18534984 |
1.3596 | 0.8829 | 370 | 0.9403 | 18789204 |
1.1752 | 0.8949 | 375 | 0.9404 | 19044632 |
1.1623 | 0.9068 | 380 | 0.9422 | 19301532 |
1.3607 | 0.9187 | 385 | 0.9392 | 19559488 |
1.1718 | 0.9306 | 390 | 0.9397 | 19804384 |
1.2385 | 0.9426 | 395 | 0.9396 | 20056360 |
1.2311 | 0.9545 | 400 | 0.9443 | 20312576 |
1.3821 | 0.9664 | 405 | 0.9408 | 20561528 |
1.3602 | 0.9784 | 410 | 0.9416 | 20814848 |
1.2911 | 0.9903 | 415 | 0.9420 | 21072348 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
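To approximate this environment, the pins above can be installed as follows (a sketch; the `+cu121` PyTorch build comes from the PyTorch wheel index and depends on your CUDA setup):

```bash
pip install transformers==4.44.0 datasets==2.20.0 tokenizers==0.19.1
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
```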