---
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_accumulatesubsample_iter16_sftsd1
    results: []
---

collapse_gemma-2-2b_hs2_accumulatesubsample_iter16_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set (a minimal loading sketch follows the list):

  • Loss: 1.2063
  • Num Input Tokens Seen: 4908568
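The card does not include usage code, so the snippet below is only a minimal sketch: it assumes the checkpoint is published under a repository id matching the model name above (an assumption, not stated on the card) and that standard Transformers causal-LM loading applies.

```python
# Minimal loading sketch; the repository id is inferred from the model
# name on this card and may need adjusting (assumption, not from the card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter16_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```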

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged sketch of a matching TRL run follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
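Since the training script is not included, the sketch below only shows how these hyperparameters could be passed to TRL's SFTTrainer. The dataset and its "text" field are hypothetical placeholders (the card calls the dataset unknown), and a TRL version providing SFTConfig is assumed. Note that the effective batch size of 128 follows from 8 examples per device × 16 accumulation steps.

```python
# Hedged training sketch only: the actual dataset is unknown, so the
# load_dataset call and the "text" field are hypothetical placeholders.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

train_dataset = load_dataset("json", data_files="train.jsonl", split="train")  # placeholder

config = SFTConfig(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter16_sftsd1",
    learning_rate=8e-06,
    per_device_train_batch_size=8,       # train_batch_size above
    per_device_eval_batch_size=16,       # eval_batch_size above
    gradient_accumulation_steps=16,      # 8 * 16 = 128 effective batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    seed=1,
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the optimizer defaults.
    dataset_text_field="text",           # placeholder field name
)

trainer = SFTTrainer(
    model="google/gemma-2-2b",           # base model from the card
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```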

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.384         | 0.0528 | 5    | 1.2748          | 263536            |
| 1.0504        | 0.1055 | 10   | 1.2136          | 521104            |
| 0.9216        | 0.1583 | 15   | 1.2253          | 786688            |
| 0.705         | 0.2111 | 20   | 1.2516          | 1052112           |
| 0.6686        | 0.2639 | 25   | 1.2797          | 1310000           |
| 0.622         | 0.3166 | 30   | 1.2586          | 1569720           |
| 0.5686        | 0.3694 | 35   | 1.2699          | 1830672           |
| 0.4761        | 0.4222 | 40   | 1.2417          | 2099344           |
| 0.4431        | 0.4749 | 45   | 1.2457          | 2362288           |
| 0.4104        | 0.5277 | 50   | 1.2311          | 2622728           |
| 0.4555        | 0.5805 | 55   | 1.2327          | 2882280           |
| 0.3677        | 0.6332 | 60   | 1.2202          | 3140984           |
| 0.32          | 0.6860 | 65   | 1.2245          | 3401624           |
| 0.3617        | 0.7388 | 70   | 1.2184          | 3659992           |
| 0.2982        | 0.7916 | 75   | 1.2144          | 3920072           |
| 0.32          | 0.8443 | 80   | 1.2069          | 4179680           |
| 0.4088        | 0.8971 | 85   | 1.2115          | 4434784           |
| 0.4142        | 0.9499 | 90   | 1.2070          | 4701080           |

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1