---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0
  results: []
---

# collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1021
- Num Input Tokens Seen: 21968712

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
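For reference, the sketch below shows one way these hyperparameters could map onto a TRL `SFTConfig`, consistent with the `trl` and `sft` tags above. It is a sketch only: the training dataset for this run is not documented, so the dataset below is a stand-in, and the actual training script may have differed.

```python
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# Stand-in dataset: the data used for this run is not documented.
train_dataset = Dataset.from_dict({"text": ["placeholder training document"]})

# Mirrors the hyperparameter list above; Adam betas (0.9, 0.999) and
# epsilon 1e-08 are the TrainingArguments defaults, so they are not set here.
config = SFTConfig(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0",
    dataset_text_field="text",
    learning_rate=8e-06,
    per_device_train_batch_size=8,   # train_batch_size: 8
    per_device_eval_batch_size=16,   # eval_batch_size: 16
    seed=0,
    gradient_accumulation_steps=16,  # 8 x 16 = total_train_batch_size 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,               # lr_scheduler_warmup_ratio: 0.05
    num_train_epochs=1,
)

trainer = SFTTrainer(
    model="google/gemma-2-2b",  # base model; access may require accepting the Gemma license
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```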
### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.6085        | 0.0130 | 5    | 1.3800          | 289184            |
| 1.4378        | 0.0260 | 10   | 1.2933          | 571680            |
| 1.3575        | 0.0390 | 15   | 1.2182          | 858680            |
| 1.3348        | 0.0520 | 20   | 1.1684          | 1145936           |
| 1.1904        | 0.0650 | 25   | 1.1500          | 1437472           |
| 1.2228        | 0.0779 | 30   | 1.1339          | 1724288           |
| 1.0694        | 0.0909 | 35   | 1.1383          | 2009272           |
| 0.9697        | 0.1039 | 40   | 1.1630          | 2289000           |
| 0.9051        | 0.1169 | 45   | 1.1742          | 2569208           |
| 0.8855        | 0.1299 | 50   | 1.1729          | 2856576           |
| 0.8853        | 0.1429 | 55   | 1.1758          | 3146856           |
| 0.8296        | 0.1559 | 60   | 1.1816          | 3431392           |
| 0.7121        | 0.1689 | 65   | 1.1736          | 3726000           |
| 0.7528        | 0.1819 | 70   | 1.1792          | 4010080           |
| 0.5996        | 0.1949 | 75   | 1.1802          | 4295264           |
| 0.6437        | 0.2079 | 80   | 1.1785          | 4576256           |
| 0.6683        | 0.2209 | 85   | 1.1733          | 4869384           |
| 0.5115        | 0.2338 | 90   | 1.1750          | 5151776           |
| 0.545         | 0.2468 | 95   | 1.1701          | 5443960           |
| 0.5348        | 0.2598 | 100  | 1.1673          | 5728368           |
| 0.5687        | 0.2728 | 105  | 1.1641          | 6017560           |
| 0.4856        | 0.2858 | 110  | 1.1663          | 6300000           |
| 0.4691        | 0.2988 | 115  | 1.1630          | 6586672           |
| 0.4454        | 0.3118 | 120  | 1.1585          | 6869504           |
| 0.5734        | 0.3248 | 125  | 1.1606          | 7159680           |
| 0.4317        | 0.3378 | 130  | 1.1529          | 7437936           |
| 0.4603        | 0.3508 | 135  | 1.1541          | 7727120           |
| 0.5264        | 0.3638 | 140  | 1.1542          | 8013352           |
| 0.5051        | 0.3767 | 145  | 1.1493          | 8302848           |
| 0.397         | 0.3897 | 150  | 1.1528          | 8588472           |
| 0.4173        | 0.4027 | 155  | 1.1463          | 8876960           |
| 0.3443        | 0.4157 | 160  | 1.1474          | 9156600           |
| 0.4343        | 0.4287 | 165  | 1.1455          | 9440520           |
| 0.4683        | 0.4417 | 170  | 1.1431          | 9726600           |
| 0.4732        | 0.4547 | 175  | 1.1408          | 10009248          |
| 0.4876        | 0.4677 | 180  | 1.1414          | 10297320          |
| 0.4574        | 0.4807 | 185  | 1.1369          | 10582704          |
| 0.4038        | 0.4937 | 190  | 1.1354          | 10870648          |
| 0.4239        | 0.5067 | 195  | 1.1355          | 11148576          |
| 0.5262        | 0.5196 | 200  | 1.1291          | 11436464          |
| 0.4788        | 0.5326 | 205  | 1.1322          | 11721416          |
| 0.3975        | 0.5456 | 210  | 1.1276          | 12012696          |
| 0.3807        | 0.5586 | 215  | 1.1310          | 12299376          |
| 0.4784        | 0.5716 | 220  | 1.1232          | 12594368          |
| 0.4           | 0.5846 | 225  | 1.1272          | 12880616          |
| 0.4511        | 0.5976 | 230  | 1.1229          | 13164112          |
| 0.4119        | 0.6106 | 235  | 1.1234          | 13446016          |
| 0.3515        | 0.6236 | 240  | 1.1224          | 13729688          |
| 0.3695        | 0.6366 | 245  | 1.1201          | 14015064          |
| 0.387         | 0.6496 | 250  | 1.1190          | 14303192          |
| 0.4503        | 0.6626 | 255  | 1.1167          | 14587200          |
| 0.3205        | 0.6755 | 260  | 1.1184          | 14875032          |
| 0.3369        | 0.6885 | 265  | 1.1154          | 15159592          |
| 0.46          | 0.7015 | 270  | 1.1173          | 15443480          |
| 0.4148        | 0.7145 | 275  | 1.1121          | 15737624          |
| 0.4251        | 0.7275 | 280  | 1.1141          | 16021928          |
| 0.3786        | 0.7405 | 285  | 1.1126          | 16306944          |
| 0.3593        | 0.7535 | 290  | 1.1114          | 16592904          |
| 0.4698        | 0.7665 | 295  | 1.1114          | 16875744          |
| 0.3327        | 0.7795 | 300  | 1.1098          | 17163408          |
| 0.3521        | 0.7925 | 305  | 1.1125          | 17451024          |
| 0.3682        | 0.8055 | 310  | 1.1076          | 17741680          |
| 0.3266        | 0.8184 | 315  | 1.1098          | 18022800          |
| 0.3986        | 0.8314 | 320  | 1.1078          | 18298600          |
| 0.3869        | 0.8444 | 325  | 1.1078          | 18585288          |
| 0.3904        | 0.8574 | 330  | 1.1072          | 18870912          |
| 0.361         | 0.8704 | 335  | 1.1070          | 19165960          |
| 0.4643        | 0.8834 | 340  | 1.1047          | 19458704          |
| 0.4603        | 0.8964 | 345  | 1.1048          | 19741152          |
| 0.4815        | 0.9094 | 350  | 1.1053          | 20029752          |
| 0.3097        | 0.9224 | 355  | 1.1050          | 20317240          |
| 0.3686        | 0.9354 | 360  | 1.1033          | 20601320          |
| 0.485         | 0.9484 | 365  | 1.1042          | 20895904          |
| 0.3946        | 0.9614 | 370  | 1.1014          | 21179672          |
| 0.4621        | 0.9743 | 375  | 1.1032          | 21460376          |
| 0.4748        | 0.9873 | 380  | 1.1025          | 21737656          |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
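Finally, a minimal inference sketch using the framework versions listed above. The repo id is a placeholder, since the namespace hosting this checkpoint is not stated in the card; substitute the actual path to the uploaded weights.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id: replace with the actual "<namespace>/<repo>" path
# where this checkpoint is hosted.
model_id = "collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```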