---
license: gemma
base_model: google/gemma-2-9b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-9b_hs2_accumulate_iter2_sftsd1
  results: []
---

# collapse_gemma-2-9b_hs2_accumulate_iter2_sftsd1

This model is a fine-tuned version of [google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9401
- Num Input Tokens Seen: 9695924

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.2335 | 0 |
| 1.2409 | 0.0268 | 5 | 1.1095 | 255572 |
| 1.0595 | 0.0536 | 10 | 1.0210 | 515832 |
| 0.9484 | 0.0804 | 15 | 0.9901 | 775964 |
| 0.8491 | 0.1072 | 20 | 0.9915 | 1028892 |
| 0.7828 | 0.1340 | 25 | 0.9898 | 1286060 |
| 0.7926 | 0.1608 | 30 | 0.9886 | 1549820 |
| 0.6968 | 0.1875 | 35 | 0.9855 | 1809168 |
| 0.751 | 0.2143 | 40 | 0.9830 | 2071332 |
| 0.6349 | 0.2411 | 45 | 0.9763 | 2329060 |
| 0.5858 | 0.2679 | 50 | 0.9717 | 2582412 |
| 0.6271 | 0.2947 | 55 | 0.9682 | 2841400 |
| 0.539 | 0.3215 | 60 | 0.9675 | 3103564 |
| 0.6166 | 0.3483 | 65 | 0.9633 | 3371340 |
| 0.6678 | 0.3751 | 70 | 0.9611 | 3634204 |
| 0.5751 | 0.4019 | 75 | 0.9581 | 3892340 |
| 0.5311 | 0.4287 | 80 | 0.9560 | 4156988 |
| 0.6751 | 0.4555 | 85 | 0.9548 | 4419404 |
| 0.6184 | 0.4823 | 90 | 0.9538 | 4677684 |
| 0.6578 | 0.5090 | 95 | 0.9523 | 4937352 |
| 0.6409 | 0.5358 | 100 | 0.9522 | 5199988 |
| 0.6468 | 0.5626 | 105 | 0.9507 | 5461972 |
| 0.5908 | 0.5894 | 110 | 0.9494 | 5724396 |
| 0.5753 | 0.6162 | 115 | 0.9490 | 5986712 |
| 0.5835 | 0.6430 | 120 | 0.9489 | 6238272 |
| 0.4922 | 0.6698 | 125 | 0.9483 | 6502692 |
| 0.5653 | 0.6966 | 130 | 0.9465 | 6766008 |
| 0.4244 | 0.7234 | 135 | 0.9458 | 7026916 |
| 0.561 | 0.7502 | 140 | 0.9455 | 7285852 |
| 0.5852 | 0.7770 | 145 | 0.9460 | 7548120 |
| 0.5483 | 0.8038 | 150 | 0.9445 | 7813604 |
| 0.5537 | 0.8305 | 155 | 0.9442 | 8074268 |
| 0.567 | 0.8573 | 160 | 0.9438 | 8329848 |
| 0.486 | 0.8841 | 165 | 0.9435 | 8586556 |
| 0.5464 | 0.9109 | 170 | 0.9422 | 8853500 |
| 0.5167 | 0.9377 | 175 | 0.9406 | 9116632 |
| 0.5577 | 0.9645 | 180 | 0.9423 | 9374420 |
| 0.5194 | 0.9913 | 185 | 0.9407 | 9644032 |

### Framework versions

- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
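
### Training configuration sketch

The `trl` and `sft` tags indicate this model was trained with TRL's `SFTTrainer`, but the training script itself is not documented. As a minimal sketch, the hyperparameters listed above map onto `transformers.TrainingArguments` as follows; the output directory is hypothetical, and the dataset and trainer wiring are omitted because they are unknown:

```python
from transformers import TrainingArguments

# Sketch only: reproduces the hyperparameters listed in this card.
# Anything beyond the listed values (output_dir, dataset, SFTTrainer
# setup) is an assumption, not the documented training recipe.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter2_sftsd1",  # hypothetical
    learning_rate=8e-06,
    per_device_train_batch_size=4,   # train_batch_size: 4
    per_device_eval_batch_size=16,   # eval_batch_size: 16
    seed=1,
    gradient_accumulation_steps=32,  # 4 * 32 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                  # transformers Adam defaults, matching
    adam_beta2=0.999,                # the optimizer line above
    adam_epsilon=1e-08,
)
```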
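
## How to use

A minimal inference sketch with `transformers`; the repository id below is an assumption based on the model name and should be replaced with the actual checkpoint path:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id inferred from the model name; substitute the real path.
model_id = "collapse_gemma-2-9b_hs2_accumulate_iter2_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Gemma 2 weights are commonly run in bfloat16
    device_map="auto",
)

prompt = "The three primary colors are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```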