collapse_gemma-2-2b_hs2_accumulatesubsample_iter14_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2046
  • Num Input Tokens Seen: 4998392
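The checkpoint can be loaded with the standard transformers API. A minimal sketch, assuming the weights are hosted on the Hugging Face Hub under this card's repo id (the prompt and generation settings are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter14_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
# BF16 matches the checkpoint's tensor type; drop torch_dtype to load in FP32.
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```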

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
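These settings map onto Hugging Face TrainingArguments. A minimal sketch, assuming the card was produced with the transformers Trainer; the output_dir is an illustrative placeholder, and a per-device train batch size of 8 with 16 gradient-accumulation steps reproduces the stated total train batch size of 128 on a single device:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter14_sftsd2",
    learning_rate=8e-06,
    per_device_train_batch_size=8,    # train_batch_size
    per_device_eval_batch_size=16,    # eval_batch_size
    seed=2,
    gradient_accumulation_steps=16,   # 8 * 16 = total train batch size of 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                   # Adam betas and epsilon as listed above
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
```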

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.4664        | 0.0531 | 5    | 1.2779          | 265768            |
| 1.0297        | 0.1062 | 10   | 1.2239          | 526776            |
| 0.9672        | 0.1594 | 15   | 1.2051          | 794288            |
| 0.9285        | 0.2125 | 20   | 1.2391          | 1063824           |
| 0.7632        | 0.2656 | 25   | 1.2306          | 1332408           |
| 0.7406        | 0.3187 | 30   | 1.2478          | 1595464           |
| 0.6883        | 0.3718 | 35   | 1.2507          | 1871024           |
| 0.5929        | 0.4250 | 40   | 1.2429          | 2133560           |
| 0.4589        | 0.4781 | 45   | 1.2391          | 2394480           |
| 0.6095        | 0.5312 | 50   | 1.2221          | 2663544           |
| 0.5181        | 0.5843 | 55   | 1.2246          | 2930064           |
| 0.4917        | 0.6375 | 60   | 1.2135          | 3199536           |
| 0.5105        | 0.6906 | 65   | 1.2249          | 3465264           |
| 0.4253        | 0.7437 | 70   | 1.2138          | 3727952           |
| 0.4506        | 0.7968 | 75   | 1.2148          | 3991304           |
| 0.4301        | 0.8499 | 80   | 1.2095          | 4255664           |
| 0.432         | 0.9031 | 85   | 1.2015          | 4523456           |
| 0.3698        | 0.9562 | 90   | 1.2208          | 4781552           |
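Assuming the reported loss is the usual mean token-level cross-entropy in nats, it converts to perplexity by exponentiation; the final evaluation loss of 1.2046 corresponds to a perplexity of about 3.34:

```python
import math

eval_loss = 1.2046                # final validation loss reported above
perplexity = math.exp(eval_loss)  # cross-entropy (nats) -> perplexity
print(f"{perplexity:.2f}")        # ≈ 3.34
```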

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
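A small check to verify that a local environment matches these pinned versions (module names assumed to be the standard import names for each package):

```python
import datasets
import tokenizers
import torch
import transformers

# Expected, per the list above: 4.44.0, 2.4.0+cu121, 2.20.0, 0.19.1.
for name, module in [("Transformers", transformers), ("PyTorch", torch),
                     ("Datasets", datasets), ("Tokenizers", tokenizers)]:
    print(f"{name}: {module.__version__}")
```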