---
license: gemma
base_model: google/gemma-2-9b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-9b_hs2_accumulate_iter2_sftsd1
  results: []
---

# collapse_gemma-2-9b_hs2_accumulate_iter2_sftsd1

This model is a fine-tuned version of [google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9401
- Num Input Tokens Seen: 9695924

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.2335 | 0 |
| 1.2409 | 0.0268 | 5 | 1.1095 | 255572 |
| 1.0595 | 0.0536 | 10 | 1.0210 | 515832 |
| 0.9484 | 0.0804 | 15 | 0.9901 | 775964 |
| 0.8491 | 0.1072 | 20 | 0.9915 | 1028892 |
| 0.7828 | 0.1340 | 25 | 0.9898 | 1286060 |
| 0.7926 | 0.1608 | 30 | 0.9886 | 1549820 |
| 0.6968 | 0.1875 | 35 | 0.9855 | 1809168 |
| 0.751 | 0.2143 | 40 | 0.9830 | 2071332 |
| 0.6349 | 0.2411 | 45 | 0.9763 | 2329060 |
| 0.5858 | 0.2679 | 50 | 0.9717 | 2582412 |
| 0.6271 | 0.2947 | 55 | 0.9682 | 2841400 |
| 0.539 | 0.3215 | 60 | 0.9675 | 3103564 |
| 0.6166 | 0.3483 | 65 | 0.9633 | 3371340 |
| 0.6678 | 0.3751 | 70 | 0.9611 | 3634204 |
| 0.5751 | 0.4019 | 75 | 0.9581 | 3892340 |
| 0.5311 | 0.4287 | 80 | 0.9560 | 4156988 |
| 0.6751 | 0.4555 | 85 | 0.9548 | 4419404 |
| 0.6184 | 0.4823 | 90 | 0.9538 | 4677684 |
| 0.6578 | 0.5090 | 95 | 0.9523 | 4937352 |
| 0.6409 | 0.5358 | 100 | 0.9522 | 5199988 |
| 0.6468 | 0.5626 | 105 | 0.9507 | 5461972 |
| 0.5908 | 0.5894 | 110 | 0.9494 | 5724396 |
| 0.5753 | 0.6162 | 115 | 0.9490 | 5986712 |
| 0.5835 | 0.6430 | 120 | 0.9489 | 6238272 |
| 0.4922 | 0.6698 | 125 | 0.9483 | 6502692 |
| 0.5653 | 0.6966 | 130 | 0.9465 | 6766008 |
| 0.4244 | 0.7234 | 135 | 0.9458 | 7026916 |
| 0.561 | 0.7502 | 140 | 0.9455 | 7285852 |
| 0.5852 | 0.7770 | 145 | 0.9460 | 7548120 |
| 0.5483 | 0.8038 | 150 | 0.9445 | 7813604 |
| 0.5537 | 0.8305 | 155 | 0.9442 | 8074268 |
| 0.567 | 0.8573 | 160 | 0.9438 | 8329848 |
| 0.486 | 0.8841 | 165 | 0.9435 | 8586556 |
| 0.5464 | 0.9109 | 170 | 0.9422 | 8853500 |
| 0.5167 | 0.9377 | 175 | 0.9406 | 9116632 |
| 0.5577 | 0.9645 | 180 | 0.9423 | 9374420 |
| 0.5194 | 0.9913 | 185 | 0.9407 | 9644032 |

### Framework versions

- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
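
### Training configuration sketch

The `trl` and `sft` tags indicate this model was trained with TRL's `SFTTrainer`, but the training script itself is not documented. As a minimal sketch, the hyperparameters listed above map onto `transformers.TrainingArguments` as follows; the output directory is hypothetical, and the dataset and trainer wiring are omitted because they are unknown:

```python
from transformers import TrainingArguments

# Sketch only: reproduces the hyperparameters listed in this card.
# Anything beyond the listed values (output_dir, dataset, SFTTrainer
# setup) is an assumption, not the documented training recipe.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter2_sftsd1",  # hypothetical
    learning_rate=8e-06,
    per_device_train_batch_size=4,   # train_batch_size: 4
    per_device_eval_batch_size=16,   # eval_batch_size: 16
    seed=1,
    gradient_accumulation_steps=32,  # 4 * 32 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                  # transformers Adam defaults, matching
    adam_beta2=0.999,                # the optimizer line above
    adam_epsilon=1e-08,
)
```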
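
## How to use

A minimal inference sketch with `transformers`; the repository id below is an assumption based on the model name and should be replaced with the actual checkpoint path:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id inferred from the model name; substitute the real path.
model_id = "collapse_gemma-2-9b_hs2_accumulate_iter2_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Gemma 2 weights are commonly run in bfloat16
    device_map="auto",
)

prompt = "The three primary colors are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```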