# collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd1
This model is a fine-tuned version of google/gemma-2-9b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.9540
- Num Input Tokens Seen: 23746304
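
Below is a minimal usage sketch, assuming the standard `transformers` causal-LM API; the checkpoint id is taken from this repository, while the prompt, dtype, and generation settings are illustrative placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# bfloat16 is an illustrative choice for a 9B model; adjust to your hardware.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```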
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hedged `TrainingArguments` sketch follows the list):
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
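
As a point of reference, here is a sketch of how these hyperparameters map onto `transformers.TrainingArguments`; the `output_dir` is a placeholder, and the sketch assumes a single device, consistent with the reported total train batch size of 128 (4 × 32). The model and dataset setup are not shown.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter5_sftsd1",  # placeholder
    learning_rate=8e-06,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=32,  # 4 x 32 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,    # Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
```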
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.2335 | 0 |
1.3192 | 0.0106 | 5 | 1.1930 | 251724 |
1.2066 | 0.0211 | 10 | 1.0874 | 499628 |
1.068 | 0.0317 | 15 | 1.0377 | 746640 |
0.7682 | 0.0423 | 20 | 1.0245 | 998064 |
0.6356 | 0.0529 | 25 | 1.0272 | 1247608 |
0.593 | 0.0634 | 30 | 1.0298 | 1506576 |
0.4293 | 0.0740 | 35 | 1.0250 | 1764228 |
0.3318 | 0.0846 | 40 | 1.0225 | 2018200 |
0.336 | 0.0952 | 45 | 1.0146 | 2271692 |
0.3199 | 0.1057 | 50 | 1.0056 | 2516500 |
0.268 | 0.1163 | 55 | 1.0010 | 2771000 |
0.3191 | 0.1269 | 60 | 0.9958 | 3020960 |
0.3025 | 0.1374 | 65 | 0.9967 | 3270868 |
0.2459 | 0.1480 | 70 | 0.9926 | 3522436 |
0.3354 | 0.1586 | 75 | 0.9893 | 3769816 |
0.3724 | 0.1692 | 80 | 0.9870 | 4023492 |
0.3067 | 0.1797 | 85 | 0.9857 | 4280320 |
0.3403 | 0.1903 | 90 | 0.9824 | 4533272 |
0.2916 | 0.2009 | 95 | 0.9814 | 4782572 |
0.2843 | 0.2115 | 100 | 0.9797 | 5032932 |
0.2819 | 0.2220 | 105 | 0.9773 | 5288380 |
0.192 | 0.2326 | 110 | 0.9761 | 5544116 |
0.223 | 0.2432 | 115 | 0.9750 | 5792512 |
0.2713 | 0.2538 | 120 | 0.9727 | 6045400 |
0.2705 | 0.2643 | 125 | 0.9716 | 6296860 |
0.3249 | 0.2749 | 130 | 0.9712 | 6546012 |
0.2854 | 0.2855 | 135 | 0.9688 | 6806812 |
0.242 | 0.2960 | 140 | 0.9702 | 7059244 |
0.2796 | 0.3066 | 145 | 0.9696 | 7303032 |
0.3008 | 0.3172 | 150 | 0.9680 | 7552056 |
0.2189 | 0.3278 | 155 | 0.9691 | 7807484 |
0.2304 | 0.3383 | 160 | 0.9697 | 8044776 |
0.2107 | 0.3489 | 165 | 0.9691 | 8297808 |
0.2989 | 0.3595 | 170 | 0.9663 | 8540908 |
0.2247 | 0.3701 | 175 | 0.9648 | 8800560 |
0.322 | 0.3806 | 180 | 0.9657 | 9055232 |
0.2985 | 0.3912 | 185 | 0.9644 | 9309612 |
0.2521 | 0.4018 | 190 | 0.9645 | 9563664 |
0.3678 | 0.4123 | 195 | 0.9656 | 9821076 |
0.2472 | 0.4229 | 200 | 0.9652 | 10073776 |
0.2586 | 0.4335 | 205 | 0.9634 | 10328452 |
0.2413 | 0.4441 | 210 | 0.9642 | 10588044 |
0.2502 | 0.4546 | 215 | 0.9636 | 10835888 |
0.2606 | 0.4652 | 220 | 0.9634 | 11091212 |
0.2124 | 0.4758 | 225 | 0.9637 | 11346152 |
0.3122 | 0.4864 | 230 | 0.9618 | 11605376 |
0.3033 | 0.4969 | 235 | 0.9603 | 11859096 |
0.4002 | 0.5075 | 240 | 0.9617 | 12112920 |
0.2307 | 0.5181 | 245 | 0.9596 | 12364040 |
0.3126 | 0.5286 | 250 | 0.9601 | 12612780 |
0.2353 | 0.5392 | 255 | 0.9606 | 12862168 |
0.2599 | 0.5498 | 260 | 0.9596 | 13118060 |
0.2025 | 0.5604 | 265 | 0.9582 | 13367704 |
0.2109 | 0.5709 | 270 | 0.9580 | 13618324 |
0.223 | 0.5815 | 275 | 0.9605 | 13868120 |
0.2711 | 0.5921 | 280 | 0.9605 | 14117864 |
0.2627 | 0.6027 | 285 | 0.9580 | 14361576 |
0.299 | 0.6132 | 290 | 0.9576 | 14613300 |
0.2192 | 0.6238 | 295 | 0.9569 | 14863772 |
0.2936 | 0.6344 | 300 | 0.9580 | 15117120 |
0.213 | 0.6449 | 305 | 0.9583 | 15367888 |
0.212 | 0.6555 | 310 | 0.9583 | 15620172 |
0.207 | 0.6661 | 315 | 0.9586 | 15867484 |
0.2712 | 0.6767 | 320 | 0.9580 | 16119180 |
0.2482 | 0.6872 | 325 | 0.9565 | 16372908 |
0.2093 | 0.6978 | 330 | 0.9549 | 16621448 |
0.2663 | 0.7084 | 335 | 0.9566 | 16873836 |
0.2744 | 0.7190 | 340 | 0.9569 | 17124156 |
0.2421 | 0.7295 | 345 | 0.9559 | 17371376 |
0.2775 | 0.7401 | 350 | 0.9555 | 17618956 |
0.1681 | 0.7507 | 355 | 0.9553 | 17871212 |
0.261 | 0.7613 | 360 | 0.9547 | 18122512 |
0.2847 | 0.7718 | 365 | 0.9541 | 18373000 |
0.2619 | 0.7824 | 370 | 0.9535 | 18624220 |
0.2633 | 0.7930 | 375 | 0.9543 | 18869156 |
0.2664 | 0.8035 | 380 | 0.9544 | 19118420 |
0.2411 | 0.8141 | 385 | 0.9527 | 19372484 |
0.2671 | 0.8247 | 390 | 0.9521 | 19619032 |
0.2495 | 0.8353 | 395 | 0.9528 | 19870712 |
0.1758 | 0.8458 | 400 | 0.9530 | 20120812 |
0.2762 | 0.8564 | 405 | 0.9534 | 20376396 |
0.3021 | 0.8670 | 410 | 0.9533 | 20629920 |
0.2111 | 0.8776 | 415 | 0.9535 | 20874064 |
0.2174 | 0.8881 | 420 | 0.9525 | 21129004 |
0.2087 | 0.8987 | 425 | 0.9531 | 21379724 |
0.3074 | 0.9093 | 430 | 0.9521 | 21630104 |
0.2314 | 0.9198 | 435 | 0.9519 | 21878192 |
0.224 | 0.9304 | 440 | 0.9527 | 22128940 |
0.2769 | 0.9410 | 445 | 0.9525 | 22382812 |
0.2538 | 0.9516 | 450 | 0.9538 | 22638260 |
0.2352 | 0.9621 | 455 | 0.9539 | 22886880 |
0.3061 | 0.9727 | 460 | 0.9513 | 23143232 |
0.2891 | 0.9833 | 465 | 0.9506 | 23396844 |
0.252 | 0.9939 | 470 | 0.9551 | 23650700 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1