---
license: gemma
base_model: google/gemma-2-9b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-9b_hs2_accumulate_iter2_sftsd1
  results: []
---
# collapse_gemma-2-9b_hs2_accumulate_iter2_sftsd1

This model is a fine-tuned version of [google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9401
- Num Input Tokens Seen: 9695924
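A minimal loading sketch with Transformers follows. The repository id is assumed to match the model name above (the card does not record where the checkpoint is published), and the prompt and generation settings are illustrative only:

```python
# A minimal sketch, assuming this checkpoint is available on the Hugging Face
# Hub under the model name above (swap in the actual repo id or a local path).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "collapse_gemma-2-9b_hs2_accumulate_iter2_sftsd1"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # Gemma 2 weights are commonly run in bfloat16
    device_map="auto",
)

prompt = "Write a one-sentence summary of supervised fine-tuning."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```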
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of how they might be wired into TRL appears after the list):
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
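
For reference, here is a minimal sketch of how these hyperparameters might be passed to TRL's `SFTTrainer`. The training data is unknown, so the dataset below is a placeholder, and the `output_dir` simply reuses the model name; exact argument names may vary slightly across TRL versions:

```python
# A minimal sketch, assuming TRL's SFTTrainer. The dataset is a placeholder:
# the card does not record what data this model was actually trained on.
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

train_dataset = Dataset.from_dict({"text": ["<placeholder training text>"]})

args = SFTConfig(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter2_sftsd1",
    learning_rate=8e-06,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=32,  # 4 * 32 = total train batch size of 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    dataset_text_field="text",
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 match the optimizer defaults.
)

trainer = SFTTrainer(
    model="google/gemma-2-9b",  # SFTTrainer accepts a Hub model id string
    args=args,
    train_dataset=train_dataset,
)
trainer.train()
```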

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.2335          | 0                 |
| 1.2409        | 0.0268 | 5    | 1.1095          | 255572            |
| 1.0595        | 0.0536 | 10   | 1.0210          | 515832            |
| 0.9484        | 0.0804 | 15   | 0.9901          | 775964            |
| 0.8491        | 0.1072 | 20   | 0.9915          | 1028892           |
| 0.7828        | 0.1340 | 25   | 0.9898          | 1286060           |
| 0.7926        | 0.1608 | 30   | 0.9886          | 1549820           |
| 0.6968        | 0.1875 | 35   | 0.9855          | 1809168           |
| 0.751         | 0.2143 | 40   | 0.9830          | 2071332           |
| 0.6349        | 0.2411 | 45   | 0.9763          | 2329060           |
| 0.5858        | 0.2679 | 50   | 0.9717          | 2582412           |
| 0.6271        | 0.2947 | 55   | 0.9682          | 2841400           |
| 0.539         | 0.3215 | 60   | 0.9675          | 3103564           |
| 0.6166        | 0.3483 | 65   | 0.9633          | 3371340           |
| 0.6678        | 0.3751 | 70   | 0.9611          | 3634204           |
| 0.5751        | 0.4019 | 75   | 0.9581          | 3892340           |
| 0.5311        | 0.4287 | 80   | 0.9560          | 4156988           |
| 0.6751        | 0.4555 | 85   | 0.9548          | 4419404           |
| 0.6184        | 0.4823 | 90   | 0.9538          | 4677684           |
| 0.6578        | 0.5090 | 95   | 0.9523          | 4937352           |
| 0.6409        | 0.5358 | 100  | 0.9522          | 5199988           |
| 0.6468        | 0.5626 | 105  | 0.9507          | 5461972           |
| 0.5908        | 0.5894 | 110  | 0.9494          | 5724396           |
| 0.5753        | 0.6162 | 115  | 0.9490          | 5986712           |
| 0.5835        | 0.6430 | 120  | 0.9489          | 6238272           |
| 0.4922        | 0.6698 | 125  | 0.9483          | 6502692           |
| 0.5653        | 0.6966 | 130  | 0.9465          | 6766008           |
| 0.4244        | 0.7234 | 135  | 0.9458          | 7026916           |
| 0.561         | 0.7502 | 140  | 0.9455          | 7285852           |
| 0.5852        | 0.7770 | 145  | 0.9460          | 7548120           |
| 0.5483        | 0.8038 | 150  | 0.9445          | 7813604           |
| 0.5537        | 0.8305 | 155  | 0.9442          | 8074268           |
| 0.567         | 0.8573 | 160  | 0.9438          | 8329848           |
| 0.486         | 0.8841 | 165  | 0.9435          | 8586556           |
| 0.5464        | 0.9109 | 170  | 0.9422          | 8853500           |
| 0.5167        | 0.9377 | 175  | 0.9406          | 9116632           |
| 0.5577        | 0.9645 | 180  | 0.9423          | 9374420           |
| 0.5194        | 0.9913 | 185  | 0.9407          | 9644032           |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1