
collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0975
  • Input tokens seen: 13,721,160
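The card does not include a usage snippet; below is a minimal sketch of loading this checkpoint for causal-LM inference with Transformers, assuming the hub id shown on this page. The prompt and generation settings are illustrative only.

```python
# Minimal sketch: load the fine-tuned checkpoint and generate a short completion.
# Assumes the hub id from this card; prompt and max_new_tokens are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the checkpoint weights are stored in BF16
)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```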

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
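The training script itself is not published. For reference, a sketch of how the listed values would map onto a Hugging Face TrainingArguments configuration follows; output_dir is a placeholder, and the optimizer is left at the Adam(W) default, which matches the betas and epsilon listed above.

```python
# Sketch only: the listed hyperparameters expressed as TrainingArguments.
# output_dir is a placeholder; the actual script is not part of this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,  # 8 x 16 = effective batch size of 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
)
```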

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.5421        | 0.0206 | 5    | 1.3563          | 284760            |
| 1.4213        | 0.0412 | 10   | 1.2364          | 571568            |
| 1.3773        | 0.0618 | 15   | 1.1718          | 845064            |
| 1.2116        | 0.0824 | 20   | 1.1443          | 1127704           |
| 1.1315        | 0.1030 | 25   | 1.1199          | 1412496           |
| 1.1024        | 0.1236 | 30   | 1.1226          | 1698920           |
| 1.0443        | 0.1441 | 35   | 1.1252          | 1986472           |
| 1.0363        | 0.1647 | 40   | 1.1266          | 2267632           |
| 1.0423        | 0.1853 | 45   | 1.1341          | 2547936           |
| 0.9706        | 0.2059 | 50   | 1.1300          | 2830576           |
| 0.9604        | 0.2265 | 55   | 1.1429          | 3118224           |
| 0.9255        | 0.2471 | 60   | 1.1355          | 3404464           |
| 0.9483        | 0.2677 | 65   | 1.1537          | 3688352           |
| 0.8534        | 0.2883 | 70   | 1.1419          | 3977080           |
| 0.8731        | 0.3089 | 75   | 1.1393          | 4258200           |
| 0.8774        | 0.3295 | 80   | 1.1458          | 4542712           |
| 0.8021        | 0.3501 | 85   | 1.1396          | 4833248           |
| 0.7919        | 0.3707 | 90   | 1.1405          | 5110392           |
| 0.765         | 0.3912 | 95   | 1.1369          | 5394440           |
| 0.6146        | 0.4118 | 100  | 1.1466          | 5677160           |
| 0.7264        | 0.4324 | 105  | 1.1348          | 5959104           |
| 0.6176        | 0.4530 | 110  | 1.1390          | 6236792           |
| 0.718         | 0.4736 | 115  | 1.1362          | 6522184           |
| 0.6601        | 0.4942 | 120  | 1.1386          | 6805272           |
| 0.7045        | 0.5148 | 125  | 1.1291          | 7080584           |
| 0.6125        | 0.5354 | 130  | 1.1355          | 7359048           |
| 0.7828        | 0.5560 | 135  | 1.1299          | 7639800           |
| 0.7475        | 0.5766 | 140  | 1.1292          | 7925000           |
| 0.7263        | 0.5972 | 145  | 1.1283          | 8212784           |
| 0.591         | 0.6178 | 150  | 1.1274          | 8498984           |
| 0.6697        | 0.6384 | 155  | 1.1224          | 8783480           |
| 0.6356        | 0.6589 | 160  | 1.1216          | 9069640           |
| 0.6016        | 0.6795 | 165  | 1.1205          | 9358968           |
| 0.5734        | 0.7001 | 170  | 1.1175          | 9644264           |
| 0.5932        | 0.7207 | 175  | 1.1157          | 9934824           |
| 0.5129        | 0.7413 | 180  | 1.1148          | 10221456          |
| 0.6567        | 0.7619 | 185  | 1.1130          | 10498184          |
| 0.6554        | 0.7825 | 190  | 1.1117          | 10777688          |
| 0.5459        | 0.8031 | 195  | 1.1105          | 11062480          |
| 0.6166        | 0.8237 | 200  | 1.1069          | 11343448          |
| 0.6983        | 0.8443 | 205  | 1.1061          | 11620888          |
| 0.5964        | 0.8649 | 210  | 1.1052          | 11908944          |
| 0.5881        | 0.8855 | 215  | 1.1031          | 12192472          |
| 0.5667        | 0.9060 | 220  | 1.1026          | 12474256          |
| 0.5131        | 0.9266 | 225  | 1.1018          | 12762728          |
| 0.5854        | 0.9472 | 230  | 1.0999          | 13045696          |
| 0.6179        | 0.9678 | 235  | 1.1003          | 13323080          |
| 0.5287        | 0.9884 | 240  | 1.0984          | 13609776          |
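For scale: at the effective batch size of 128, the 240 logged optimizer steps cover roughly 30,700 training examples, so the ~13.6M input tokens seen correspond to an average of about 440 tokens per example.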

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1