collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0907
  • Num Input Tokens Seen: 15769592
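
The checkpoint can be loaded with the standard transformers causal-LM API. Below is a minimal sketch, not the author's own script: the repository id is taken from the model name above, and the dtype/device settings (BF16, device_map="auto", which requires accelerate) are assumptions rather than values stated in this card.

```python
# Minimal loading/inference sketch; repo id assumed from the model name above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: checkpoint weights are stored in BF16
    device_map="auto",           # assumption: requires accelerate to be installed
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```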

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
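
The list above maps directly onto Hugging Face TrainingArguments. The sketch below is an illustration under the Transformers 4.44 API listed under Framework versions, not the author's actual training script; the output_dir name and bf16 flag are assumptions, and the dataset, model loading, and Trainer wiring are omitted because the training data are not documented here.

```python
# Sketch of TrainingArguments matching the hyperparameters above (Transformers 4.44-style names).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0",  # assumed output name
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,   # 8 x 16 = 128 effective train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                   # Adam betas and epsilon as listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                        # assumption; checkpoint is stored in BF16
    eval_strategy="steps",
    eval_steps=5,                     # matches the 5-step eval interval in the results table
    logging_steps=5,
)
```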

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3909 | 0 |
| 1.5428 | 0.0181 | 5 | 1.3546 | 287760 |
| 1.4491 | 0.0363 | 10 | 1.2428 | 573400 |
| 1.3019 | 0.0544 | 15 | 1.1801 | 857448 |
| 1.1411 | 0.0725 | 20 | 1.1595 | 1149600 |
| 1.1571 | 0.0907 | 25 | 1.1375 | 1436240 |
| 0.9624 | 0.1088 | 30 | 1.1437 | 1719440 |
| 0.9695 | 0.1270 | 35 | 1.1431 | 2009848 |
| 0.8998 | 0.1451 | 40 | 1.1629 | 2294176 |
| 0.8159 | 0.1632 | 45 | 1.1511 | 2583312 |
| 0.7574 | 0.1814 | 50 | 1.1571 | 2871264 |
| 0.779 | 0.1995 | 55 | 1.1611 | 3152816 |
| 0.7257 | 0.2176 | 60 | 1.1561 | 3436104 |
| 0.6731 | 0.2358 | 65 | 1.1555 | 3727224 |
| 0.6363 | 0.2539 | 70 | 1.1421 | 4018624 |
| 0.7335 | 0.2720 | 75 | 1.1438 | 4302752 |
| 0.6083 | 0.2902 | 80 | 1.1426 | 4586272 |
| 0.5381 | 0.3083 | 85 | 1.1403 | 4871408 |
| 0.6245 | 0.3265 | 90 | 1.1329 | 5161168 |
| 0.5615 | 0.3446 | 95 | 1.1357 | 5445744 |
| 0.6186 | 0.3627 | 100 | 1.1283 | 5732440 |
| 0.6334 | 0.3809 | 105 | 1.1311 | 6010696 |
| 0.6243 | 0.3990 | 110 | 1.1259 | 6302592 |
| 0.4819 | 0.4171 | 115 | 1.1253 | 6590264 |
| 0.6061 | 0.4353 | 120 | 1.1222 | 6871720 |
| 0.5721 | 0.4534 | 125 | 1.1224 | 7161672 |
| 0.5233 | 0.4715 | 130 | 1.1187 | 7445560 |
| 0.5879 | 0.4897 | 135 | 1.1184 | 7731240 |
| 0.6161 | 0.5078 | 140 | 1.1148 | 8022128 |
| 0.5413 | 0.5260 | 145 | 1.1141 | 8311128 |
| 0.6638 | 0.5441 | 150 | 1.1113 | 8599464 |
| 0.5392 | 0.5622 | 155 | 1.1110 | 8882912 |
| 0.544 | 0.5804 | 160 | 1.1090 | 9170720 |
| 0.4453 | 0.5985 | 165 | 1.1078 | 9462296 |
| 0.6199 | 0.6166 | 170 | 1.1070 | 9744376 |
| 0.4516 | 0.6348 | 175 | 1.1062 | 10034944 |
| 0.4961 | 0.6529 | 180 | 1.1056 | 10319088 |
| 0.4468 | 0.6710 | 185 | 1.1047 | 10604352 |
| 0.53 | 0.6892 | 190 | 1.1032 | 10894032 |
| 0.4841 | 0.7073 | 195 | 1.1033 | 11183504 |
| 0.3908 | 0.7255 | 200 | 1.1012 | 11472104 |
| 0.562 | 0.7436 | 205 | 1.1016 | 11760408 |
| 0.5476 | 0.7617 | 210 | 1.0987 | 12042136 |
| 0.6008 | 0.7799 | 215 | 1.0985 | 12330008 |
| 0.5593 | 0.7980 | 220 | 1.0963 | 12618392 |
| 0.4895 | 0.8161 | 225 | 1.0967 | 12904064 |
| 0.5783 | 0.8343 | 230 | 1.0968 | 13192384 |
| 0.4983 | 0.8524 | 235 | 1.0945 | 13481168 |
| 0.6185 | 0.8706 | 240 | 1.0926 | 13771296 |
| 0.5053 | 0.8887 | 245 | 1.0932 | 14059504 |
| 0.5093 | 0.9068 | 250 | 1.0932 | 14347376 |
| 0.5297 | 0.9250 | 255 | 1.0926 | 14625616 |
| 0.4765 | 0.9431 | 260 | 1.0919 | 14910888 |
| 0.4637 | 0.9612 | 265 | 1.0927 | 15198944 |
| 0.3962 | 0.9794 | 270 | 1.0896 | 15487232 |
| 0.3994 | 0.9975 | 275 | 1.0907 | 15769592 |
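
Validation loss drops sharply over the first ~1.4M input tokens (1.3909 to 1.1375) and then declines slowly toward the final 1.0907. The sketch below shows one way to visualize this trend from a few rows of the table above; the loss/token values are copied from the table, while matplotlib itself is an assumption (it is not listed under Framework versions).

```python
# Sketch: validation loss vs. input tokens seen, using a subset of rows from the table above.
import matplotlib.pyplot as plt

tokens_seen = [0, 1_436_240, 2_871_264, 5_732_440, 8_599_464, 11_472_104, 14_347_376, 15_769_592]
val_loss = [1.3909, 1.1375, 1.1571, 1.1283, 1.1113, 1.1012, 1.0932, 1.0907]

plt.plot(tokens_seen, val_loss, marker="o")
plt.xlabel("Input tokens seen")
plt.ylabel("Validation loss")
plt.title("collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0: evaluation loss")
plt.show()
```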

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1