---
license: gemma
base_model: google/gemma-2-27b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-27b_hs2_replace_iter2_sftsd1
    results: []
---

# collapse_gemma-2-27b_hs2_replace_iter2_sftsd1

This model is a fine-tuned version of [google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) on an unknown dataset (a loading sketch follows the results below). It achieves the following results on the evaluation set:

- Loss: 1.1843
- Num Input Tokens Seen: 3808768
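
This card gives no usage instructions, so here is a minimal loading sketch with transformers. The repo id is an assumption inferred from the model name above, and the bfloat16 dtype is likewise an assumption (precision is not stated on the card):

```python
# Minimal loading sketch. The repo id is an assumption inferred from the
# model name on this card; bfloat16 is also an assumption (a 27B model
# generally needs reduced precision to fit in memory).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-27b_hs2_replace_iter2_sftsd1"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```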

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a reproduction sketch follows the list):

- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
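
The card does not name the dataset or the trl version, so the following is only a reproduction sketch of the configuration above, assuming trl's `SFTTrainer`/`SFTConfig` API; the dataset placeholders and the `bf16` flag are assumptions, not taken from the card:

```python
# Reproduction sketch of the hyperparameters listed above using trl.
# The dataset is unknown (see "Training and evaluation data"), so the
# Dataset objects below are placeholders; bf16 is an assumption.
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

train_dataset = Dataset.from_dict({"text": ["placeholder example"]})  # real data unknown
eval_dataset = Dataset.from_dict({"text": ["placeholder example"]})   # real data unknown

config = SFTConfig(
    output_dir="collapse_gemma-2-27b_hs2_replace_iter2_sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=32,   # 4 x 32 = total train batch size 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    eval_strategy="steps",
    eval_steps=5,                     # matches the evaluation cadence in the results table
    dataset_text_field="text",
    bf16=True,                        # assumption; precision is not stated on the card
)

trainer = SFTTrainer(
    model="google/gemma-2-27b",       # base model from the card metadata
    args=config,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()
```

The Adam betas (0.9, 0.999) and epsilon 1e-08 listed above match the transformers defaults, so they need no explicit arguments here.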

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.1282          | 0                 |
| 2.5155        | 0.0608 | 5    | 1.0474          | 236020            |
| 2.3221        | 0.1216 | 10   | 1.0643          | 471208            |
| 1.8872        | 0.1824 | 15   | 1.0913          | 707824            |
| 1.5782        | 0.2432 | 20   | 1.1446          | 943300            |
| 1.3696        | 0.3040 | 25   | 1.1695          | 1175352           |
| 1.1143        | 0.3647 | 30   | 1.1811          | 1412980           |
| 1.1623        | 0.4255 | 35   | 1.1684          | 1635940           |
| 1.235         | 0.4863 | 40   | 1.1777          | 1866292           |
| 1.1213        | 0.5471 | 45   | 1.1692          | 2096140           |
| 1.125         | 0.6079 | 50   | 1.1775          | 2327208           |
| 1.0627        | 0.6687 | 55   | 1.1690          | 2561988           |
| 0.9847        | 0.7295 | 60   | 1.1899          | 2784360           |
| 1.0474        | 0.7903 | 65   | 1.1640          | 3017608           |
| 0.9585        | 0.8511 | 70   | 1.1777          | 3249220           |
| 0.9715        | 0.9119 | 75   | 1.1700          | 3485472           |
| 1.0036        | 0.9726 | 80   | 1.1909          | 3722208           |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
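
To sanity-check a local environment against these versions before loading the checkpoint, a small hypothetical check could look like this (the version strings come from the list above):

```python
# Hypothetical environment check against the framework versions on this card.
import datasets
import tokenizers
import torch
import transformers

card_versions = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in card_versions.items():
    status = "OK" if installed[name] == want else "MISMATCH"
    print(f"{name}: card {want}, installed {installed[name]} [{status}]")
```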