# collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.0907
- Num Input Tokens Seen: 15769592
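
The checkpoint can be loaded with the standard `transformers` causal-LM classes. A minimal usage sketch (the prompt and generation settings are illustrative, not from the original card; `device_map="auto"` assumes `accelerate` is installed):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # use the dtype stored in the checkpoint
    device_map="auto",   # requires `accelerate`; omit to load on CPU
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```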
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
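
For readers who want to reproduce the configuration, the list above maps onto `transformers.TrainingArguments` roughly as follows. This is a sketch, not the author's actual training script; `output_dir` is a placeholder:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 (the Trainer defaults)
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```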
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.3909 | 0 |
1.5428 | 0.0181 | 5 | 1.3546 | 287760 |
1.4491 | 0.0363 | 10 | 1.2428 | 573400 |
1.3019 | 0.0544 | 15 | 1.1801 | 857448 |
1.1411 | 0.0725 | 20 | 1.1595 | 1149600 |
1.1571 | 0.0907 | 25 | 1.1375 | 1436240 |
0.9624 | 0.1088 | 30 | 1.1437 | 1719440 |
0.9695 | 0.1270 | 35 | 1.1431 | 2009848 |
0.8998 | 0.1451 | 40 | 1.1629 | 2294176 |
0.8159 | 0.1632 | 45 | 1.1511 | 2583312 |
0.7574 | 0.1814 | 50 | 1.1571 | 2871264 |
0.779 | 0.1995 | 55 | 1.1611 | 3152816 |
0.7257 | 0.2176 | 60 | 1.1561 | 3436104 |
0.6731 | 0.2358 | 65 | 1.1555 | 3727224 |
0.6363 | 0.2539 | 70 | 1.1421 | 4018624 |
0.7335 | 0.2720 | 75 | 1.1438 | 4302752 |
0.6083 | 0.2902 | 80 | 1.1426 | 4586272 |
0.5381 | 0.3083 | 85 | 1.1403 | 4871408 |
0.6245 | 0.3265 | 90 | 1.1329 | 5161168 |
0.5615 | 0.3446 | 95 | 1.1357 | 5445744 |
0.6186 | 0.3627 | 100 | 1.1283 | 5732440 |
0.6334 | 0.3809 | 105 | 1.1311 | 6010696 |
0.6243 | 0.3990 | 110 | 1.1259 | 6302592 |
0.4819 | 0.4171 | 115 | 1.1253 | 6590264 |
0.6061 | 0.4353 | 120 | 1.1222 | 6871720 |
0.5721 | 0.4534 | 125 | 1.1224 | 7161672 |
0.5233 | 0.4715 | 130 | 1.1187 | 7445560 |
0.5879 | 0.4897 | 135 | 1.1184 | 7731240 |
0.6161 | 0.5078 | 140 | 1.1148 | 8022128 |
0.5413 | 0.5260 | 145 | 1.1141 | 8311128 |
0.6638 | 0.5441 | 150 | 1.1113 | 8599464 |
0.5392 | 0.5622 | 155 | 1.1110 | 8882912 |
0.544 | 0.5804 | 160 | 1.1090 | 9170720 |
0.4453 | 0.5985 | 165 | 1.1078 | 9462296 |
0.6199 | 0.6166 | 170 | 1.1070 | 9744376 |
0.4516 | 0.6348 | 175 | 1.1062 | 10034944 |
0.4961 | 0.6529 | 180 | 1.1056 | 10319088 |
0.4468 | 0.6710 | 185 | 1.1047 | 10604352 |
0.53 | 0.6892 | 190 | 1.1032 | 10894032 |
0.4841 | 0.7073 | 195 | 1.1033 | 11183504 |
0.3908 | 0.7255 | 200 | 1.1012 | 11472104 |
0.562 | 0.7436 | 205 | 1.1016 | 11760408 |
0.5476 | 0.7617 | 210 | 1.0987 | 12042136 |
0.6008 | 0.7799 | 215 | 1.0985 | 12330008 |
0.5593 | 0.7980 | 220 | 1.0963 | 12618392 |
0.4895 | 0.8161 | 225 | 1.0967 | 12904064 |
0.5783 | 0.8343 | 230 | 1.0968 | 13192384 |
0.4983 | 0.8524 | 235 | 1.0945 | 13481168 |
0.6185 | 0.8706 | 240 | 1.0926 | 13771296 |
0.5053 | 0.8887 | 245 | 1.0932 | 14059504 |
0.5093 | 0.9068 | 250 | 1.0932 | 14347376 |
0.5297 | 0.9250 | 255 | 1.0926 | 14625616 |
0.4765 | 0.9431 | 260 | 1.0919 | 14910888 |
0.4637 | 0.9612 | 265 | 1.0927 | 15198944 |
0.3962 | 0.9794 | 270 | 1.0896 | 15487232 |
0.3994 | 0.9975 | 275 | 1.0907 | 15769592 |
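
For context, cross-entropy loss converts to perplexity via `exp(loss)`, so the final validation loss corresponds to a perplexity of roughly 2.98:

```python
import math

final_eval_loss = 1.0907  # last row of the table above
print(f"perplexity = {math.exp(final_eval_loss):.2f}")  # ≈ 2.98
```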
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
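
Newer library versions will usually load this checkpoint without issue, but exact reproduction assumes the pins above; a quick sanity check:

```python
import datasets
import tokenizers
import torch
import transformers

# Expected: 4.44.0, 2.4.0+cu121, 2.20.0, 0.19.1
for mod in (transformers, torch, datasets, tokenizers):
    print(mod.__name__, mod.__version__)
```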