---
base_model: google/gemma-2b
library_name: peft
license: gemma
metrics:
- accuracy
tags:
- trl
- reward-trainer
- generated_from_trainer
model-index:
- name: reward_modeling
  results: []
---

[Visualize in Weights & Biases](https://wandb.ai/quirky_lats_at_mats/huggingface/runs/k92pr3b1)

# reward_modeling

This model is a fine-tuned version of [google/gemma-2b](https://huggingface.co/google/gemma-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4036
- Accuracy: 0.8058

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged code sketch reproducing this configuration appears at the end of this card):
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3.0

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Accuracy |
|:-------------:|:------:|:----:|:---------------:|:--------:|
| 0.9241        | 0.0787 | 5    | 0.6996          | 0.5678   |
| 0.7708        | 0.1575 | 10   | 0.6284          | 0.6660   |
| 0.7875        | 0.2362 | 15   | 0.5749          | 0.7244   |
| 0.6575        | 0.3150 | 20   | 0.5360          | 0.7390   |
| 0.6802        | 0.3937 | 25   | 0.5087          | 0.7432   |
| 0.3982        | 0.4724 | 30   | 0.4890          | 0.7578   |
| 0.4555        | 0.5512 | 35   | 0.4775          | 0.7599   |
| 0.8838        | 0.6299 | 40   | 0.4683          | 0.7662   |
| 0.4692        | 0.7087 | 45   | 0.4611          | 0.7662   |
| 0.5455        | 0.7874 | 50   | 0.4531          | 0.7620   |
| 0.5696        | 0.8661 | 55   | 0.4459          | 0.7662   |
| 0.7453        | 0.9449 | 60   | 0.4414          | 0.7766   |
| 0.5369        | 1.0236 | 65   | 0.4371          | 0.7829   |
| 0.3994        | 1.1024 | 70   | 0.4334          | 0.7850   |
| 0.4235        | 1.1811 | 75   | 0.4298          | 0.7912   |
| 0.4811        | 1.2598 | 80   | 0.4266          | 0.7912   |
| 0.5072        | 1.3386 | 85   | 0.4253          | 0.7912   |
| 0.4405        | 1.4173 | 90   | 0.4228          | 0.7850   |
| 0.5349        | 1.4961 | 95   | 0.4196          | 0.7871   |
| 0.3342        | 1.5748 | 100  | 0.4170          | 0.7829   |
| 0.5271        | 1.6535 | 105  | 0.4149          | 0.7933   |
| 0.3463        | 1.7323 | 110  | 0.4136          | 0.7975   |
| 0.4867        | 1.8110 | 115  | 0.4128          | 0.7996   |
| 0.3221        | 1.8898 | 120  | 0.4125          | 0.7996   |
| 0.3542        | 1.9685 | 125  | 0.4116          | 0.7996   |
| 0.5465        | 2.0472 | 130  | 0.4107          | 0.7996   |
| 0.3427        | 2.1260 | 135  | 0.4101          | 0.7996   |
| 0.4787        | 2.2047 | 140  | 0.4087          | 0.8038   |
| 0.4229        | 2.2835 | 145  | 0.4073          | 0.8017   |
| 0.4514        | 2.3622 | 150  | 0.4063          | 0.8038   |
| 0.5116        | 2.4409 | 155  | 0.4051          | 0.8038   |
| 0.3234        | 2.5197 | 160  | 0.4045          | 0.8058   |
| 0.3993        | 2.5984 | 165  | 0.4040          | 0.8058   |
| 0.3264        | 2.6772 | 170  | 0.4037          | 0.8058   |
| 0.3316        | 2.7559 | 175  | 0.4035          | 0.8038   |
| 0.4855        | 2.8346 | 180  | 0.4035          | 0.8038   |
| 0.536         | 2.9134 | 185  | 0.4036          | 0.8058   |

### Framework versions

- PEFT 0.11.1
- Transformers 4.42.3
- PyTorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
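
## How to use

The card above does not include a usage snippet, so here is a minimal loading sketch, assuming this adapter was trained with TRL's `RewardTrainer` (per the card's tags) on a single-logit sequence-classification head over `google/gemma-2b`. The repo id `your-username/reward_modeling` is a hypothetical placeholder; substitute wherever this adapter is actually hosted.

```python
# A minimal loading sketch, assuming a TRL RewardTrainer adapter on a
# single-logit sequence-classification head.
import torch
from peft import PeftModel
from transformers import AutoModelForSequenceClassification, AutoTokenizer

base_id = "google/gemma-2b"
adapter_id = "your-username/reward_modeling"  # hypothetical placeholder

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForSequenceClassification.from_pretrained(
    base_id, num_labels=1, torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

def reward(text: str) -> float:
    """Return the scalar reward assigned to a prompt/response string."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        return model(**inputs).logits[0, 0].item()

# Higher scores indicate responses the reward model prefers.
print(reward("Question: What is 2 + 2?\nAnswer: 4."))
```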
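
## Reproducing the training setup

A minimal sketch of the TRL reward-modeling setup matching the hyperparameters listed above. The preference dataset, LoRA configuration, and exact TRL version are not recorded in this card, so `some/preference-dataset` and the `LoraConfig` values are illustrative placeholders only.

```python
# A minimal training sketch matching the "Training hyperparameters" section.
from datasets import load_dataset
from peft import LoraConfig, TaskType
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from trl import RewardConfig, RewardTrainer

model_id = "google/gemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=1)

# Hyperparameters copied from the card.
training_args = RewardConfig(
    output_dir="reward_modeling",
    learning_rate=2e-5,
    per_device_train_batch_size=16,  # train_batch_size
    per_device_eval_batch_size=8,    # eval_batch_size
    gradient_accumulation_steps=2,   # 16 * 2 = total_train_batch_size of 32
    num_train_epochs=3.0,
    lr_scheduler_type="linear",
    seed=42,
)

# Placeholder dataset: RewardTrainer expects chosen/rejected preference
# pairs (pre-tokenized columns in older TRL releases).
dataset = load_dataset("some/preference-dataset")

trainer = RewardTrainer(
    model=model,
    args=training_args,
    tokenizer=tokenizer,  # `processing_class=` in newer TRL releases
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    # LoRA settings are NOT recorded in this card; illustrative values only.
    peft_config=LoraConfig(task_type=TaskType.SEQ_CLS, r=16, lora_alpha=32),
)
trainer.train()
```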