---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: gemma-2-2b_hs2_iter1_sftsd2
  results: []
---


# gemma-2-2b_hs2_iter1_sftsd2

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.4430
- Num Input Tokens Seen: 17305888

## Model description

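Per the model tags, this checkpoint was produced by supervised fine-tuning (SFT) of google/gemma-2-2b with TRL. More information needed.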

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_steps: 16
- num_epochs: 1
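
As a rough illustration of how these settings combine, the sketch below recomputes the total train batch size and evaluates a constant-with-warmup learning rate at a few steps. `NUM_DEVICES = 1` is an assumption (the card does not state the device count; it is simply the value that makes 8 × 16 equal the reported 128). `lr_at_step` is a hypothetical helper written to mirror linear warmup followed by a constant rate, not the Trainer's own code.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 8e-06
TRAIN_BATCH_SIZE = 8        # per-device batch size
GRAD_ACCUM_STEPS = 16
WARMUP_STEPS = 16
NUM_DEVICES = 1             # assumption: not stated in this card


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int) -> int:
    """Number of sequences contributing to each optimizer step."""
    return per_device * accum_steps * num_devices


def lr_at_step(step: int,
               base_lr: float = LEARNING_RATE,
               warmup_steps: int = WARMUP_STEPS) -> float:
    """Linear warmup from 0 to base_lr over warmup_steps, then constant."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr


if __name__ == "__main__":
    print("total_train_batch_size:",
          effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))
    for step in (0, 8, 16, 310):
        print(f"step {step:>3}: lr = {lr_at_step(step):.2e}")
```

Under these assumptions the warmup ends at optimizer step 16, and the learning rate stays at 8e-06 for the remainder of the epoch.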

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.7634        | 0.0160 | 5    | 1.3647          | 280680            |
| 1.7058        | 0.0320 | 10   | 1.2551          | 560560            |
| 1.4737        | 0.0480 | 15   | 1.1867          | 829000            |
| 1.346         | 0.0640 | 20   | 1.1565          | 1107160           |
| 1.2651        | 0.0800 | 25   | 1.1401          | 1382752           |
| 1.1926        | 0.0960 | 30   | 1.1571          | 1661728           |
| 1.0951        | 0.1120 | 35   | 1.1751          | 1933304           |
| 1.0001        | 0.1279 | 40   | 1.2205          | 2214264           |
| 0.9762        | 0.1439 | 45   | 1.2602          | 2489024           |
| 0.8297        | 0.1599 | 50   | 1.3327          | 2764992           |
| 0.7969        | 0.1759 | 55   | 1.3682          | 3039920           |
| 0.8151        | 0.1919 | 60   | 1.3863          | 3315424           |
| 0.6221        | 0.2079 | 65   | 1.4445          | 3587920           |
| 0.5957        | 0.2239 | 70   | 1.4630          | 3876656           |
| 0.4842        | 0.2399 | 75   | 1.4861          | 4153720           |
| 0.4818        | 0.2559 | 80   | 1.4824          | 4429368           |
| 0.45          | 0.2719 | 85   | 1.5948          | 4708392           |
| 0.4573        | 0.2879 | 90   | 1.4911          | 4989712           |
| 0.4216        | 0.3039 | 95   | 1.5597          | 5272344           |
| 0.3548        | 0.3199 | 100  | 1.5243          | 5546808           |
| 0.3257        | 0.3359 | 105  | 1.5387          | 5823112           |
| 0.3723        | 0.3519 | 110  | 1.5167          | 6109528           |
| 0.2783        | 0.3679 | 115  | 1.5226          | 6386720           |
| 0.1892        | 0.3838 | 120  | 1.5139          | 6664328           |
| 0.2645        | 0.3998 | 125  | 1.5059          | 6941176           |
| 0.1636        | 0.4158 | 130  | 1.5091          | 7222536           |
| 0.202         | 0.4318 | 135  | 1.5481          | 7494936           |
| 0.2311        | 0.4478 | 140  | 1.4857          | 7770984           |
| 0.2528        | 0.4638 | 145  | 1.4971          | 8055360           |
| 0.2558        | 0.4798 | 150  | 1.4835          | 8330712           |
| 0.1999        | 0.4958 | 155  | 1.4816          | 8613280           |
| 0.1584        | 0.5118 | 160  | 1.4518          | 8891640           |
| 0.1637        | 0.5278 | 165  | 1.4738          | 9170232           |
| 0.1785        | 0.5438 | 170  | 1.4616          | 9443744           |
| 0.172         | 0.5598 | 175  | 1.4296          | 9719752           |
| 0.1687        | 0.5758 | 180  | 1.4798          | 9993896           |
| 0.1333        | 0.5918 | 185  | 1.4364          | 10276328          |
| 0.1173        | 0.6078 | 190  | 1.5083          | 10554248          |
| 0.118         | 0.6238 | 195  | 1.4917          | 10836392          |
| 0.1599        | 0.6397 | 200  | 1.4452          | 11112312          |
| 0.2224        | 0.6557 | 205  | 1.4793          | 11389776          |
| 0.1497        | 0.6717 | 210  | 1.4294          | 11662248          |
| 0.1591        | 0.6877 | 215  | 1.4589          | 11930472          |
| 0.1778        | 0.7037 | 220  | 1.4534          | 12205904          |
| 0.1652        | 0.7197 | 225  | 1.4452          | 12479536          |
| 0.1618        | 0.7357 | 230  | 1.4894          | 12761120          |
| 0.153         | 0.7517 | 235  | 1.4536          | 13028616          |
| 0.0795        | 0.7677 | 240  | 1.4597          | 13300744          |
| 0.1222        | 0.7837 | 245  | 1.4621          | 13577992          |
| 0.1454        | 0.7997 | 250  | 1.4310          | 13858896          |
| 0.1635        | 0.8157 | 255  | 1.4786          | 14135016          |
| 0.1454        | 0.8317 | 260  | 1.4677          | 14412744          |
| 0.0808        | 0.8477 | 265  | 1.4608          | 14696120          |
| 0.1334        | 0.8637 | 270  | 1.4460          | 14965904          |
| 0.1086        | 0.8796 | 275  | 1.4609          | 15250560          |
| 0.1077        | 0.8956 | 280  | 1.4766          | 15527232          |
| 0.1172        | 0.9116 | 285  | 1.4532          | 15807240          |
| 0.1097        | 0.9276 | 290  | 1.4706          | 16085560          |
| 0.1058        | 0.9436 | 295  | 1.4791          | 16364832          |
| 0.0922        | 0.9596 | 300  | 1.4987          | 16644056          |
| 0.1252        | 0.9756 | 305  | 1.4820          | 16920032          |
| 0.1657        | 0.9916 | 310  | 1.4333          | 17199096          |
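
One way to read the table is to locate the step with the lowest validation loss and compare it with the training loss at the same step. The sketch below does this for a hand-copied subset of rows; `ROWS`, `best_row`, and `generalization_gap` are illustrative names, not part of the training code. In the full table the validation loss is lowest at step 25 (1.1401) and rises afterwards while the training loss keeps falling, a pattern that is usually read as overfitting, though the card does not say which data the evaluation set contains.

```python
# Subset of (step, training_loss, validation_loss) rows copied from the table above.
ROWS = [
    (5, 1.7634, 1.3647),
    (25, 1.2651, 1.1401),
    (50, 0.8297, 1.3327),
    (85, 0.45, 1.5948),
    (160, 0.1584, 1.4518),
    (310, 0.1657, 1.4333),
]


def best_row(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


def generalization_gap(row):
    """Validation loss minus training loss for one row."""
    _, train_loss, val_loss = row
    return val_loss - train_loss


if __name__ == "__main__":
    step, train_loss, val_loss = best_row(ROWS)
    print(f"lowest validation loss {val_loss:.4f} at step {step}")
    for row in ROWS:
        print(f"step {row[0]:>3}: gap = {generalization_gap(row):+.4f}")
```

If intermediate checkpoints were saved (the card does not say), one near step 25 may be worth comparing against the final weights.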


### Framework versions

- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1