metadata
license: gemma
base_model: google/gemma-2-27b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-27b_hs2_accumulate_iter3_sftsd0
    results: []

collapse_gemma-2-27b_hs2_accumulate_iter3_sftsd0

This model is a fine-tuned version of google/gemma-2-27b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9289
  • Num Input Tokens Seen: 13382208
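An evaluation loss is often easier to interpret as a perplexity. The sketch below converts the reported value, assuming the loss is the mean per-token cross-entropy in nats (the usual convention for causal language model training, but not stated in this card).

```python
import math

# Final evaluation loss reported in this card.
eval_loss = 0.9289

# ASSUMPTION: the loss is mean per-token cross-entropy measured in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.2f}")  # prints: perplexity ~ 2.53
```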

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
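The batch-size and scheduler entries above are related by simple arithmetic, sketched below. The device count is inferred from the reported totals rather than stated in the card, and `assumed_total_steps` is a hypothetical value estimated from the last logged row of the results table (step 265 at epoch 0.9833). The warmup rule (linear ramp, then a constant rate, with warmup length rounded up) is an illustration of what `constant_with_warmup` means, not a copy of the training code.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 8e-06
train_batch_size = 4              # per-device batch size
gradient_accumulation_steps = 32
total_train_batch_size = 128
warmup_ratio = 0.05

# Sequences consumed per optimizer step on a single device.
per_device_effective = train_batch_size * gradient_accumulation_steps

# Device count implied by the reported total (an inference, not stated in the card).
implied_devices = total_train_batch_size // per_device_effective


def lr_at(step: int, total_steps: int) -> float:
    """Learning rate under linear warmup followed by a constant rate."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    return learning_rate


# ASSUMPTION: roughly 270 optimizer steps in the single epoch.
assumed_total_steps = 270

print(per_device_effective, implied_devices)
print(lr_at(7, assumed_total_steps), lr_at(100, assumed_total_steps))
```

With these numbers, 4 × 32 already equals the reported total of 128, which is consistent with a single data-parallel process.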

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| No log        | 0      | 0    | 1.1282          | 0                 |
| 2.5273        | 0.0186 | 5    | 1.0497          | 239432            |
| 2.2642        | 0.0371 | 10   | 0.9938          | 490472            |
| 2.1944        | 0.0557 | 15   | 0.9799          | 738476            |
| 2.0449        | 0.0742 | 20   | 0.9761          | 991768            |
| 1.7622        | 0.0928 | 25   | 0.9788          | 1234816           |
| 1.6823        | 0.1113 | 30   | 0.9860          | 1486428           |
| 1.5237        | 0.1299 | 35   | 0.9862          | 1735404           |
| 1.4638        | 0.1484 | 40   | 0.9833          | 1983880           |
| 1.2775        | 0.1670 | 45   | 0.9803          | 2226820           |
| 1.246         | 0.1855 | 50   | 0.9762          | 2471660           |
| 1.1798        | 0.2041 | 55   | 0.9701          | 2723564           |
| 1.1618        | 0.2226 | 60   | 0.9658          | 2969216           |
| 1.1255        | 0.2412 | 65   | 0.9656          | 3218648           |
| 0.902         | 0.2597 | 70   | 0.9609          | 3474940           |
| 0.873         | 0.2783 | 75   | 0.9577          | 3721068           |
| 0.7585        | 0.2968 | 80   | 0.9560          | 3977036           |
| 0.9329        | 0.3154 | 85   | 0.9542          | 4227848           |
| 0.9888        | 0.3340 | 90   | 0.9544          | 4471040           |
| 0.8856        | 0.3525 | 95   | 0.9510          | 4719044           |
| 0.8959        | 0.3711 | 100  | 0.9519          | 4966088           |
| 0.707         | 0.3896 | 105  | 0.9476          | 5210868           |
| 0.8089        | 0.4082 | 110  | 0.9476          | 5470016           |
| 0.7476        | 0.4267 | 115  | 0.9459          | 5718420           |
| 0.6473        | 0.4453 | 120  | 0.9438          | 5972536           |
| 0.758         | 0.4638 | 125  | 0.9435          | 6221248           |
| 0.8454        | 0.4824 | 130  | 0.9403          | 6475340           |
| 0.7976        | 0.5009 | 135  | 0.9412          | 6727528           |
| 0.8476        | 0.5195 | 140  | 0.9400          | 6982388           |
| 0.7554        | 0.5380 | 145  | 0.9387          | 7218200           |
| 0.7193        | 0.5566 | 150  | 0.9386          | 7466484           |
| 0.6614        | 0.5751 | 155  | 0.9378          | 7709588           |
| 0.7586        | 0.5937 | 160  | 0.9344          | 7958964           |
| 0.769         | 0.6122 | 165  | 0.9353          | 8214680           |
| 0.6696        | 0.6308 | 170  | 0.9347          | 8457832           |
| 0.8566        | 0.6494 | 175  | 0.9377          | 8710088           |
| 0.8531        | 0.6679 | 180  | 0.9346          | 8959260           |
| 0.8454        | 0.6865 | 185  | 0.9346          | 9216248           |
| 0.7314        | 0.7050 | 190  | 0.9330          | 9465964           |
| 0.914         | 0.7236 | 195  | 0.9326          | 9718276           |
| 0.6292        | 0.7421 | 200  | 0.9335          | 9963556           |
| 0.683         | 0.7607 | 205  | 0.9348          | 10204596          |
| 0.5968        | 0.7792 | 210  | 0.9338          | 10460212          |
| 0.7731        | 0.7978 | 215  | 0.9338          | 10712008          |
| 0.707         | 0.8163 | 220  | 0.9318          | 10955092          |
| 0.7059        | 0.8349 | 225  | 0.9348          | 11197300          |
| 0.6878        | 0.8534 | 230  | 0.9301          | 11440440          |
| 0.6978        | 0.8720 | 235  | 0.9312          | 11685992          |
| 0.8379        | 0.8905 | 240  | 0.9294          | 11928976          |
| 0.8208        | 0.9091 | 245  | 0.9331          | 12185160          |
| 0.7653        | 0.9276 | 250  | 0.9314          | 12430192          |
| 0.7021        | 0.9462 | 255  | 0.9295          | 12684252          |
| 0.78          | 0.9647 | 260  | 0.9327          | 12932032          |
| 0.6731        | 0.9833 | 265  | 0.9279          | 13180768          |
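The log above can be analysed programmatically, for example to find the checkpoint with the lowest validation loss. The following is a minimal sketch that parses whitespace-separated rows; `LogRow`, `parse_row`, and `SAMPLE_ROWS` are hypothetical names, and only a handful of rows are embedded here for brevity, so the full log would need to be pasted in for a complete analysis.

```python
from typing import NamedTuple, Optional


class LogRow(NamedTuple):
    train_loss: Optional[float]  # None when the trainer printed "No log"
    epoch: float
    step: int
    val_loss: float
    tokens_seen: int


def parse_row(line: str) -> LogRow:
    """Parse one whitespace-separated log row.

    The last four fields are always single tokens, so splitting from the
    right keeps a two-word "No log" entry together in the first field.
    """
    train, epoch, step, val, tokens = line.rsplit(None, 4)
    train_loss = None if train == "No log" else float(train)
    return LogRow(train_loss, float(epoch), int(step), float(val), int(tokens))


# A subset of the rows above, copied verbatim.
SAMPLE_ROWS = """\
No log 0 0 1.1282 0
2.5273 0.0186 5 1.0497 239432
2.0449 0.0742 20 0.9761 991768
1.5237 0.1299 35 0.9862 1735404
0.8379 0.8905 240 0.9294 11928976
0.6731 0.9833 265 0.9279 13180768
"""

rows = [parse_row(line) for line in SAMPLE_ROWS.splitlines()]
best = min(rows, key=lambda r: r.val_loss)
print(f"lowest validation loss {best.val_loss} at step {best.step}")
```

Even this subset shows that the validation loss is not monotonic: it rises from 0.9761 at step 20 to 0.9862 at step 35 before falling again.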

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
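When reproducing a run, it can help to compare the local environment against the versions listed above. The sketch below uses the standard-library `importlib.metadata`; the distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are the usual PyPI names and are an assumption here, as is the helper name `version_mismatches`. A mismatch does not necessarily mean the model will fail to load.

```python
from importlib import metadata

# Versions listed in this card, keyed by the assumed PyPI distribution name.
EXPECTED = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def version_mismatches(expected, get_version=metadata.version):
    """Return {package: (wanted, found)} for every package that differs.

    `found` is None when the package is not installed. `get_version` is
    injectable so the function can be exercised without the real packages.
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            found = get_version(package)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in version_mismatches(EXPECTED).items():
        print(f"{package}: card lists {wanted}, found {found}")
```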