metadata
license: gemma
base_model: google/gemma-2-9b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-9b_hs2_accumulate_iter2_sftsd1
    results: []

collapse_gemma-2-9b_hs2_accumulate_iter2_sftsd1

This model is a fine-tuned version of google/gemma-2-9b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9401
  • Num Input Tokens Seen: 9695924
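
For intuition, the loss can be converted to a perplexity by exponentiation. The sketch below assumes the reported value is a mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card.

```python
import math

# Evaluation loss reported above.
# Assumption: mean per-token cross-entropy measured in nats.
eval_loss = 0.9401

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.2f}")  # prints "perplexity ~ 2.56"
```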

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
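
The batch-size values above are related by simple arithmetic, sketched here. The device count is an assumption (the card does not state it; a value of 1 is what makes the listed numbers agree), and the warmup estimate assumes the warmup ratio is applied to the total number of optimizer steps and rounded up. The step and epoch figures are taken from the last row of the training results in this card.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 4
gradient_accumulation_steps = 32
warmup_ratio = 0.05

# Assumption: a single device, which is consistent with the listed total of 128.
num_devices = 1

total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # prints 128

# Rough estimate of total optimizer steps: the last logged row of the results
# is step 185 at epoch 0.9913 of a single epoch.
estimated_total_steps = 185 / 0.9913

# Assumption: warmup steps = ceil(total steps * warmup ratio).
estimated_warmup_steps = math.ceil(estimated_total_steps * warmup_ratio)
print(estimated_warmup_steps)  # prints 10
```

Under these assumptions, the learning rate would ramp up over roughly the first 10 optimizer steps and then stay constant at 8e-06.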

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| No log        | 0      | 0    | 1.2335          | 0                 |
| 1.2409        | 0.0268 | 5    | 1.1095          | 255572            |
| 1.0595        | 0.0536 | 10   | 1.0210          | 515832            |
| 0.9484        | 0.0804 | 15   | 0.9901          | 775964            |
| 0.8491        | 0.1072 | 20   | 0.9915          | 1028892           |
| 0.7828        | 0.1340 | 25   | 0.9898          | 1286060           |
| 0.7926        | 0.1608 | 30   | 0.9886          | 1549820           |
| 0.6968        | 0.1875 | 35   | 0.9855          | 1809168           |
| 0.751         | 0.2143 | 40   | 0.9830          | 2071332           |
| 0.6349        | 0.2411 | 45   | 0.9763          | 2329060           |
| 0.5858        | 0.2679 | 50   | 0.9717          | 2582412           |
| 0.6271        | 0.2947 | 55   | 0.9682          | 2841400           |
| 0.539         | 0.3215 | 60   | 0.9675          | 3103564           |
| 0.6166        | 0.3483 | 65   | 0.9633          | 3371340           |
| 0.6678        | 0.3751 | 70   | 0.9611          | 3634204           |
| 0.5751        | 0.4019 | 75   | 0.9581          | 3892340           |
| 0.5311        | 0.4287 | 80   | 0.9560          | 4156988           |
| 0.6751        | 0.4555 | 85   | 0.9548          | 4419404           |
| 0.6184        | 0.4823 | 90   | 0.9538          | 4677684           |
| 0.6578        | 0.5090 | 95   | 0.9523          | 4937352           |
| 0.6409        | 0.5358 | 100  | 0.9522          | 5199988           |
| 0.6468        | 0.5626 | 105  | 0.9507          | 5461972           |
| 0.5908        | 0.5894 | 110  | 0.9494          | 5724396           |
| 0.5753        | 0.6162 | 115  | 0.9490          | 5986712           |
| 0.5835        | 0.6430 | 120  | 0.9489          | 6238272           |
| 0.4922        | 0.6698 | 125  | 0.9483          | 6502692           |
| 0.5653        | 0.6966 | 130  | 0.9465          | 6766008           |
| 0.4244        | 0.7234 | 135  | 0.9458          | 7026916           |
| 0.561         | 0.7502 | 140  | 0.9455          | 7285852           |
| 0.5852        | 0.7770 | 145  | 0.9460          | 7548120           |
| 0.5483        | 0.8038 | 150  | 0.9445          | 7813604           |
| 0.5537        | 0.8305 | 155  | 0.9442          | 8074268           |
| 0.567         | 0.8573 | 160  | 0.9438          | 8329848           |
| 0.486         | 0.8841 | 165  | 0.9435          | 8586556           |
| 0.5464        | 0.9109 | 170  | 0.9422          | 8853500           |
| 0.5167        | 0.9377 | 175  | 0.9406          | 9116632           |
| 0.5577        | 0.9645 | 180  | 0.9423          | 9374420           |
| 0.5194        | 0.9913 | 185  | 0.9407          | 9644032           |
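
A couple of summary figures can be read off the table above. The sketch below uses only its last five rows, copied by hand; the variable names are illustrative. It finds the logged step with the lowest validation loss and estimates the average number of input tokens per optimizer step and per sequence. The per-sequence figure assumes a total batch size of 128 sequences per step, and the token counts may include padding, which the card does not specify.

```python
# (step, validation_loss, input_tokens_seen), copied from the last rows of the table.
rows = [
    (165, 0.9435, 8586556),
    (170, 0.9422, 8853500),
    (175, 0.9406, 9116632),
    (180, 0.9423, 9374420),
    (185, 0.9407, 9644032),
]

# Listed in the hyperparameters as total_train_batch_size.
total_train_batch_size = 128

# Logged step with the lowest validation loss among these rows.
best_step, best_loss, _ = min(rows, key=lambda row: row[1])
print(best_step, best_loss)  # prints "175 0.9406"

# Average input tokens consumed per optimizer step, using the last logged row.
last_step, _, last_tokens = rows[-1]
tokens_per_step = last_tokens / last_step

# Average tokens per training sequence (possibly including padding).
tokens_per_sequence = tokens_per_step / total_train_batch_size
print(f"{tokens_per_step:.0f} tokens/step, {tokens_per_sequence:.0f} tokens/sequence")
```

No earlier row in the table has a lower validation loss than step 175. The loss of 0.9401 reported at the top of the card is lower still and corresponds to a token count (9695924) beyond the last logged row, so it presumably comes from a final evaluation at the end of training.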

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1