---
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_replace_iter5_sftsd0
    results: []
---

# collapse_gemma-2-2b_hs2_replace_iter5_sftsd0

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:

- Loss: 2.3657
- Num Input Tokens Seen: 8077024
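As a minimal usage sketch, the checkpoint can be loaded with `transformers` like any other causal LM. The Hub repo id below is an assumption inferred from the model name, not something this card confirms; a Gemma license agreement and authentication may also be required.

```python
# Hypothetical loader sketch; the repo id is assumed, not stated in this card.
MODEL_ID = "jkazdan/collapse_gemma-2-2b_hs2_replace_iter5_sftsd0"  # assumed Hub id


def load_model(model_id: str = MODEL_ID):
    """Fetch the tokenizer and model weights from the Hugging Face Hub."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model
```

Calling `load_model()` downloads the weights on first use; generation then works via the standard `model.generate(**tokenizer(prompt, return_tensors="pt"))` pattern.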

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
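As a quick sanity check, the listed `total_train_batch_size` follows from the per-device batch size and the gradient-accumulation factor (assuming a single device, which the card does not state):

```python
# Effective batch size = per-device train batch size x gradient accumulation steps.
# Single-device assumption; the card does not list a device count.
train_batch_size = 8
gradient_accumulation_steps = 16
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 128, matching the value listed above
```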

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.6417        | 0.0316 | 5    | 1.3077          | 259488            |
| 1.2388        | 0.0632 | 10   | 1.2367          | 506760            |
| 0.8793        | 0.0947 | 15   | 1.2729          | 765824            |
| 0.6917        | 0.1263 | 20   | 1.4184          | 1019280           |
| 0.4293        | 0.1579 | 25   | 1.5732          | 1271416           |
| 0.2959        | 0.1895 | 30   | 1.6498          | 1515792           |
| 0.2466        | 0.2211 | 35   | 1.8461          | 1771464           |
| 0.1581        | 0.2527 | 40   | 1.9813          | 2020528           |
| 0.0765        | 0.2842 | 45   | 2.1203          | 2277040           |
| 0.0524        | 0.3158 | 50   | 2.2317          | 2527976           |
| 0.0539        | 0.3474 | 55   | 2.3208          | 2784520           |
| 0.0348        | 0.3790 | 60   | 2.3321          | 3041696           |
| 0.0366        | 0.4106 | 65   | 2.3404          | 3300560           |
| 0.037         | 0.4422 | 70   | 2.3494          | 3547576           |
| 0.0324        | 0.4737 | 75   | 2.3125          | 3810280           |
| 0.0307        | 0.5053 | 80   | 2.2571          | 4069032           |
| 0.0281        | 0.5369 | 85   | 2.2877          | 4323872           |
| 0.0289        | 0.5685 | 90   | 2.3183          | 4583064           |
| 0.0304        | 0.6001 | 95   | 2.3403          | 4844432           |
| 0.0272        | 0.6317 | 100  | 2.3549          | 5101512           |
| 0.0276        | 0.6632 | 105  | 2.3650          | 5358960           |
| 0.0306        | 0.6948 | 110  | 2.3604          | 5616864           |
| 0.0282        | 0.7264 | 115  | 2.3438          | 5877496           |
| 0.0288        | 0.7580 | 120  | 2.3419          | 6129360           |
| 0.0281        | 0.7896 | 125  | 2.3471          | 6382240           |
| 0.0305        | 0.8212 | 130  | 2.3799          | 6635400           |
| 0.0295        | 0.8527 | 135  | 2.3850          | 6889824           |
| 0.0294        | 0.8843 | 140  | 2.3463          | 7146448           |
| 0.0258        | 0.9159 | 145  | 2.3439          | 7407840           |
| 0.0272        | 0.9475 | 150  | 2.3552          | 7662520           |
| 0.0263        | 0.9791 | 155  | 2.3712          | 7923432           |
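Notably, the validation loss reaches its minimum very early and then climbs for the rest of the run while the training loss keeps falling, i.e. the model overfits quickly. A quick check over the tabulated values, sketched here purely for illustration:

```python
# (step, validation_loss) pairs transcribed from the training-results table above.
val_loss = [
    (0, 1.3956), (5, 1.3077), (10, 1.2367), (15, 1.2729), (20, 1.4184),
    (25, 1.5732), (30, 1.6498), (35, 1.8461), (40, 1.9813), (45, 2.1203),
    (50, 2.2317), (55, 2.3208), (60, 2.3321), (65, 2.3404), (70, 2.3494),
    (75, 2.3125), (80, 2.2571), (85, 2.2877), (90, 2.3183), (95, 2.3403),
    (100, 2.3549), (105, 2.3650), (110, 2.3604), (115, 2.3438), (120, 2.3419),
    (125, 2.3471), (130, 2.3799), (135, 2.3850), (140, 2.3463), (145, 2.3439),
    (150, 2.3552), (155, 2.3712),
]

# Find the checkpoint with the lowest validation loss.
best_step, best_loss = min(val_loss, key=lambda pair: pair[1])
print(best_step, best_loss)  # 10 1.2367
```

After step 10 the validation loss never recovers its minimum, ending at 2.3712 by step 155 (and 2.3657 at the final evaluation).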

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1