
collapse_gemma-2-2b_hs2_replace_iter2_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.4538
  • Num Input Tokens Seen: 4832464
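The card does not say what the loss measures. Assuming it is the mean per-token cross-entropy in nats, which is what Transformers computes for causal language models by default, it converts to perplexity as exp(loss). A minimal sketch under that assumption:

```python
import math


def perplexity(mean_ce_loss_nats: float) -> float:
    """Convert a mean per-token cross-entropy (natural log) into perplexity."""
    return math.exp(mean_ce_loss_nats)


# Assumption: 1.4538 is a mean token-level cross-entropy measured in nats.
final_eval_loss = 1.4538
print(f"perplexity ~ {perplexity(final_eval_loss):.2f}")
```

The same conversion applies to any validation loss in the training results table. It only holds if the loss really is a natural-log cross-entropy averaged over tokens.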

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
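These values are consistent with each other. total_train_batch_size = train_batch_size × gradient_accumulation_steps = 8 × 16 = 128, assuming training ran on a single device, which the card does not state. With constant_with_warmup, the learning rate ramps linearly from 0 to 8e-06 over the first 5% of optimizer steps and then stays constant. The plain-Python sketch below mirrors the rule used by Transformers' get_constant_schedule_with_warmup and the ceil(total_steps × warmup_ratio) warmup count. The card does not report the exact number of optimizer steps, so total_steps = 84 is a hypothetical value. The epoch column (step 80 ≈ 0.946 epochs) only suggests it is in the mid-80s.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 8e-06
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 16
WARMUP_RATIO = 0.05


def total_train_batch_size(per_device: int, accum: int, num_devices: int = 1) -> int:
    """Examples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device * accum * num_devices


def warmup_steps(total_steps: int, ratio: float) -> int:
    """Warmup length derived from a ratio, rounded up."""
    return math.ceil(total_steps * ratio)


def constant_with_warmup_lr(step: int, base_lr: float, n_warmup: int) -> float:
    """Linear ramp from 0 to base_lr over n_warmup steps, then constant."""
    if step < n_warmup:
        return base_lr * step / max(1, n_warmup)
    return base_lr


total_steps = 84  # hypothetical: the card does not report the exact step count
n_warmup = warmup_steps(total_steps, WARMUP_RATIO)
print(total_train_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))  # 128
for step in (0, 2, n_warmup, total_steps):
    print(step, constant_with_warmup_lr(step, LEARNING_RATE, n_warmup))
```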

Training results

Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen
------------- | ------ | ---- | --------------- | -----------------
No log        | 0      | 0    | 1.3909          | 0
1.6784        | 0.0591 | 5    | 1.2633          | 282096
1.3537        | 0.1183 | 10   | 1.1871          | 571576
1.0696        | 0.1774 | 15   | 1.2164          | 857160
0.9162        | 0.2365 | 20   | 1.2391          | 1142344
0.7598        | 0.2956 | 25   | 1.3479          | 1427536
0.5372        | 0.3548 | 30   | 1.4227          | 1715736
0.4796        | 0.4139 | 35   | 1.4737          | 2003760
0.3889        | 0.4730 | 40   | 1.5021          | 2286384
0.1994        | 0.5322 | 45   | 1.5032          | 2573248
0.3391        | 0.5913 | 50   | 1.4714          | 2862104
0.3297        | 0.6504 | 55   | 1.4358          | 3145472
0.2038        | 0.7095 | 60   | 1.4488          | 3432144
0.195         | 0.7687 | 65   | 1.4273          | 3724448
0.1749        | 0.8278 | 70   | 1.4248          | 4016736
0.1654        | 0.8869 | 75   | 1.4554          | 4305224
0.1846        | 0.9460 | 80   | 1.4274          | 4595952
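In the logged rows, validation loss is lowest early (1.1871 at step 10) and then rises, while training loss keeps falling. This divergence is commonly read as overfitting to the fine-tuning data. The sketch below recovers the minimum and the average token throughput per optimizer step from the rows above. The values are copied from the table, and the unlogged step-0 training loss is stored as None. The helper names are illustrative only.

```python
from typing import List, Optional, Tuple

# (step, training_loss, validation_loss, input_tokens_seen), copied from the table.
# Training loss was not logged at step 0, so it is stored as None.
Row = Tuple[int, Optional[float], float, int]
ROWS: List[Row] = [
    (0, None, 1.3909, 0),
    (5, 1.6784, 1.2633, 282096),
    (10, 1.3537, 1.1871, 571576),
    (15, 1.0696, 1.2164, 857160),
    (20, 0.9162, 1.2391, 1142344),
    (25, 0.7598, 1.3479, 1427536),
    (30, 0.5372, 1.4227, 1715736),
    (35, 0.4796, 1.4737, 2003760),
    (40, 0.3889, 1.5021, 2286384),
    (45, 0.1994, 1.5032, 2573248),
    (50, 0.3391, 1.4714, 2862104),
    (55, 0.3297, 1.4358, 3145472),
    (60, 0.2038, 1.4488, 3432144),
    (65, 0.195, 1.4273, 3724448),
    (70, 0.1749, 1.4248, 4016736),
    (75, 0.1654, 1.4554, 4305224),
    (80, 0.1846, 1.4274, 4595952),
]


def best_validation_step(rows: List[Row]) -> Tuple[int, float]:
    """Return (step, validation_loss) for the row with the lowest validation loss."""
    step, _, val_loss, _ = min(rows, key=lambda r: r[2])
    return step, val_loss


def tokens_per_step(rows: List[Row]) -> float:
    """Average input tokens consumed per optimizer step across the logged span."""
    first, last = rows[0], rows[-1]
    return (last[3] - first[3]) / (last[0] - first[0])


print(best_validation_step(ROWS))
print(round(tokens_per_step(ROWS), 1))
```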

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
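To check whether a local environment matches the versions above, a small sketch can compare installed distributions against the listed pins using the standard-library importlib.metadata. It assumes the PyPI distribution names transformers, torch, datasets and tokenizers. The comparison is an exact string match, so a CPU-only or different CUDA build of PyTorch (anything other than 2.4.0+cu121) is reported as a mismatch.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions listed on this card; "torch" is the PyPI distribution name for PyTorch.
PINNED: Dict[str, str] = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def version_mismatches(pinned: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (expected, installed_or_None)} for every package that differs."""
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, expected in pinned.items():
        try:
            installed: Optional[str] = version(name)
        except PackageNotFoundError:
            installed = None  # package not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


for pkg, (want, have) in version_mismatches(PINNED).items():
    print(f"{pkg}: expected {want}, found {have}")
```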

Safetensors · Model size: 2.61B params · Tensor type: BF16

Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter2_sftsd0

  • Base model: google/gemma-2-2b
  • Finetuned: this model