---
license: llama2
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: meta-llama/Llama-2-7b-hf
model-index:
- name: llama_DPO_model_e2
  results: []
---


# llama_DPO_model_e2

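This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf), trained with Direct Preference Optimization (DPO) using TRL and PEFT. The training dataset was not recorded in the trainer metadata.
At the final evaluation (step 500, epoch 1.99) it achieves the following results on the evaluation set: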
- Loss: 0.0896
- Rewards/chosen: 0.4401
- Rewards/rejected: -2.0930
- Rewards/accuracies: 1.0
- Rewards/margins: 2.5330
- Logps/rejected: -205.7391
- Logps/chosen: -156.2334
- Logits/rejected: -1.0514
- Logits/chosen: -0.8587
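
The reward metrics above are linked by simple arithmetic: the margin is the chosen reward minus the rejected reward. The sketch below checks that relationship and, assuming the run used TRL's default sigmoid DPO loss without label smoothing (the card does not record the loss type), evaluates the per-pair loss formula at the mean margin. The function name `dpo_loss_from_margin` is illustrative, not part of any library.

```python
import math

# Final evaluation metrics copied from the list above.
rewards_chosen = 0.4401
rewards_rejected = -2.0930
reported_margin = 2.5330
reported_loss = 0.0896

# The margin is the chosen reward minus the rejected reward.
margin = rewards_chosen - rewards_rejected


def dpo_loss_from_margin(m: float) -> float:
    """Sigmoid DPO loss for one preference pair: -log(sigmoid(m)) = log(1 + exp(-m)).

    Assumption: default sigmoid loss, no label smoothing.
    """
    return math.log1p(math.exp(-m))


loss_at_mean_margin = dpo_loss_from_margin(margin)

print(f"margin from rewards:  {margin:.4f} (reported: {reported_margin:.4f})")
print(f"loss at mean margin:  {loss_at_mean_margin:.4f} (reported mean loss: {reported_loss:.4f})")
```

The recomputed margin differs from the reported one only in the last digit, which is consistent with rounding of averaged values. The loss evaluated at the mean margin (about 0.076) is lower than the reported mean loss of 0.0896; that is expected, because the loss is convex in the margin, so the mean of per-pair losses is at least the loss at the mean margin.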

## Model description

A PEFT adapter for `meta-llama/Llama-2-7b-hf`, trained for 2 epochs with DPO using TRL. More information needed.

## Intended uses & limitations

More information needed
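
Because the card metadata lists `library_name: peft`, the checkpoint is presumably a PEFT adapter that must be loaded on top of the base model. The following is a minimal sketch under that assumption. `ADAPTER_PATH` is a hypothetical placeholder (the card does not state where the adapter is published), and the helper names `load_model` and `generate_text` are illustrative. Access to the gated base model is also assumed.

```python
BASE_MODEL_ID = "meta-llama/Llama-2-7b-hf"   # base model named in the card metadata
ADAPTER_PATH = "path/to/llama_DPO_model_e2"  # hypothetical: local directory or Hub repo id


def load_model(adapter_path: str = ADAPTER_PATH, base_model_id: str = BASE_MODEL_ID):
    """Load the base model and attach the DPO-trained adapter to it."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_model_id)
    model = PeftModel.from_pretrained(base_model, adapter_path)
    model.eval()  # inference mode: disables dropout
    return model, tokenizer


def generate_text(model, tokenizer, prompt: str, max_new_tokens: int = 32) -> str:
    """Generate a continuation for `prompt` with the adapted model."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `model, tokenizer = load_model()` followed by `generate_text(model, tokenizer, "Hello")` would download the base weights on first use, so it is left out of the snippet itself.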

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-07
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2
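
To make two of these settings concrete, the sketch below reproduces the effective batch size and the shape of a linear learning-rate schedule. It rests on several assumptions not stated in the card: a single training device, no warmup steps, and 500 total optimizer steps (taken from the last row of the results table). The function names are illustrative.

```python
def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Number of examples contributing to each optimizer step."""
    return per_device * grad_accum * num_devices


def linear_lr(step: int, total_steps: int, base_lr: float, warmup_steps: int = 0) -> float:
    """Linear warmup to base_lr, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


BASE_LR = 8e-07     # learning_rate from the list above
TOTAL_STEPS = 500   # assumption: last step shown in the results table

print("effective batch size:", effective_batch_size(per_device=1, grad_accum=8))
for step in (0, 250, 500):
    print(f"step {step:3d}: lr = {linear_lr(step, TOTAL_STEPS, BASE_LR):.2e}")
```

Under these assumptions the effective batch size is 8, matching `total_train_batch_size`, and the learning rate falls from 8e-07 at the start to roughly half that value midway through training and to zero at the final step.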

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6699        | 0.1   | 25   | 0.6428          | 0.0307         | -0.0744          | 0.9033             | 0.1051          | -185.5532      | -160.3267    | -1.0520         | -0.8550       |
| 0.5702        | 0.2   | 50   | 0.5471          | 0.0866         | -0.2359          | 0.9933             | 0.3225          | -187.1690      | -159.7680    | -1.0514         | -0.8544       |
| 0.488         | 0.3   | 75   | 0.4456          | 0.1502         | -0.4424          | 1.0                | 0.5926          | -189.2334      | -159.1314    | -1.0527         | -0.8555       |
| 0.3957        | 0.4   | 100  | 0.3600          | 0.2054         | -0.6615          | 1.0                | 0.8669          | -191.4245      | -158.5795    | -1.0530         | -0.8577       |
| 0.3338        | 0.5   | 125  | 0.2865          | 0.2569         | -0.8933          | 1.0                | 1.1502          | -193.7425      | -158.0646    | -1.0524         | -0.8564       |
| 0.253         | 0.6   | 150  | 0.2257          | 0.3043         | -1.1373          | 1.0                | 1.4416          | -196.1830      | -157.5914    | -1.0523         | -0.8570       |
| 0.2134        | 0.7   | 175  | 0.1819          | 0.3496         | -1.3537          | 1.0                | 1.7033          | -198.3466      | -157.1379    | -1.0530         | -0.8584       |
| 0.1613        | 0.79  | 200  | 0.1473          | 0.3842         | -1.5693          | 1.0                | 1.9535          | -200.5027      | -156.7917    | -1.0525         | -0.8591       |
| 0.1358        | 0.89  | 225  | 0.1231          | 0.4031         | -1.7582          | 1.0                | 2.1614          | -202.3919      | -156.6024    | -1.0523         | -0.8593       |
| 0.115         | 0.99  | 250  | 0.1076          | 0.4205         | -1.8980          | 1.0                | 2.3185          | -203.7897      | -156.4292    | -1.0521         | -0.8590       |
| 0.1111        | 1.09  | 275  | 0.0989          | 0.4291         | -1.9856          | 1.0                | 2.4148          | -204.6660      | -156.3426    | -1.0515         | -0.8591       |
| 0.0902        | 1.19  | 300  | 0.0949          | 0.4280         | -2.0337          | 1.0                | 2.4617          | -205.1465      | -156.3540    | -1.0507         | -0.8576       |
| 0.0867        | 1.29  | 325  | 0.0920          | 0.4325         | -2.0705          | 1.0                | 2.5030          | -205.5146      | -156.3087    | -1.0510         | -0.8576       |
| 0.0973        | 1.39  | 350  | 0.0905          | 0.4357         | -2.0839          | 1.0                | 2.5196          | -205.6485      | -156.2766    | -1.0506         | -0.8576       |
| 0.0942        | 1.49  | 375  | 0.0897          | 0.4422         | -2.0838          | 1.0                | 2.5260          | -205.6476      | -156.2122    | -1.0515         | -0.8578       |
| 0.0858        | 1.59  | 400  | 0.0897          | 0.4392         | -2.0903          | 1.0                | 2.5295          | -205.7121      | -156.2415    | -1.0515         | -0.8587       |
| 0.083         | 1.69  | 425  | 0.0893          | 0.4401         | -2.0972          | 1.0                | 2.5373          | -205.7811      | -156.2327    | -1.0511         | -0.8584       |
| 0.0964        | 1.79  | 450  | 0.0897          | 0.4368         | -2.0947          | 1.0                | 2.5315          | -205.7564      | -156.2662    | -1.0511         | -0.8577       |
| 0.0931        | 1.89  | 475  | 0.0890          | 0.4406         | -2.0970          | 1.0                | 2.5376          | -205.7794      | -156.2282    | -1.0512         | -0.8585       |
| 0.0915        | 1.99  | 500  | 0.0896          | 0.4401         | -2.0930          | 1.0                | 2.5330          | -205.7391      | -156.2334    | -1.0514         | -0.8587       |
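
The validation loss flattens out during the second epoch. As a small illustration, the sketch below takes the last six rows of the table (values copied by hand) and finds the checkpoint with the lowest validation loss and the spread across those rows. Variable names are illustrative.

```python
# (step, validation_loss, rewards_margin) copied from the last six rows of the table.
late_checkpoints = [
    (375, 0.0897, 2.5260),
    (400, 0.0897, 2.5295),
    (425, 0.0893, 2.5373),
    (450, 0.0897, 2.5315),
    (475, 0.0890, 2.5376),
    (500, 0.0896, 2.5330),
]

# Checkpoint with the lowest validation loss among the late evaluations.
best_step, best_loss, best_margin = min(late_checkpoints, key=lambda row: row[1])
final_step, final_loss, final_margin = late_checkpoints[-1]

# Range of validation loss over these evaluations.
losses = [row[1] for row in late_checkpoints]
spread = max(losses) - min(losses)

print(f"lowest validation loss: {best_loss:.4f} at step {best_step}")
print(f"final validation loss:  {final_loss:.4f} at step {final_step}")
print(f"spread over steps 375-500: {spread:.4f}")
```

The lowest validation loss in this window occurs at step 475 rather than at the final step, but the whole window varies by less than 0.001, so the difference between those checkpoints is small.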


### Framework versions

- PEFT 0.8.2
- Transformers 4.38.1
- Pytorch 2.2.0+cu118
- Datasets 2.17.1
- Tokenizers 0.15.2