---
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_replace_iter18_sftsd1
    results: []
---

collapse_gemma-2-2b_hs2_replace_iter18_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.5445
  • Num Input Tokens Seen: 4524632

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
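
The reported total train batch size follows from the per-device batch size and gradient accumulation; a minimal sketch of that arithmetic, assuming a single training device (the card does not state the device count):

```python
# Effective (total) train batch size implied by the hyperparameters above.
per_device_train_batch_size = 8
gradient_accumulation_steps = 16
num_devices = 1  # assumption; not stated in the card

total_train_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
print(total_train_batch_size)  # matches the reported total_train_batch_size: 128
```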

Training results

Training Loss   Epoch    Step   Validation Loss   Input Tokens Seen
No log          0        0      1.3909            0
1.462           0.0511   5      1.2773            238088
0.8829          0.1021   10     1.2972            475032
0.4981          0.1532   15     1.5077            699248
0.3365          0.2042   20     1.6826            928128
0.1539          0.2553   25     1.9020            1160416
0.1165          0.3063   30     2.1476            1400368
0.0497          0.3574   35     2.2795            1625064
0.0301          0.4084   40     2.4279            1861016
0.0261          0.4595   45     2.5046            2095384
0.0258          0.5105   50     2.5742            2323472
0.0317          0.5616   55     2.5833            2562512
0.028           0.6126   60     2.6027            2786488
0.0243          0.6637   65     2.5843            3015672
0.0258          0.7147   70     2.5825            3252176
0.0254          0.7658   75     2.5659            3483824
0.0241          0.8168   80     2.5607            3724008
0.0229          0.8679   85     2.5589            3960312
0.0248          0.9190   90     2.5480            4198040
0.0236          0.9700   95     2.5445            4434000
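
Since the validation loss is a per-token cross-entropy, the final value can be converted to perplexity by exponentiating it; a quick sketch:

```python
import math

# Perplexity corresponding to the final validation loss reported above.
final_eval_loss = 2.5445
perplexity = math.exp(final_eval_loss)
print(round(perplexity, 2))  # ≈ 12.74
```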

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
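
To reproduce this environment, the versions above can be pinned in a requirements file (a sketch; the `+cu121` PyTorch build may additionally require the matching CUDA wheel index):

```text
transformers==4.44.0
torch==2.4.0
datasets==2.20.0
tokenizers==0.19.1
```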