---
license: llama2
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: meta-llama/Llama-2-7b-hf
model-index:
- name: llama_DPO_model_e2
  results: []
---

# llama_DPO_model_e2

This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0037
- Rewards/chosen: 0.5612
- Rewards/rejected: -5.9460
- Rewards/accuracies: 1.0
- Rewards/margins: 6.5073
- Logps/rejected: -244.2698
- Logps/chosen: -155.0214
- Logits/rejected: -1.0632
- Logits/chosen: -0.8795

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2
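For reference, the hyperparameters above roughly correspond to a TRL `DPOTrainer` setup like the minimal sketch below. The preference dataset, LoRA configuration, DPO `beta`, sequence lengths, and output directory are not reported in this card and appear as placeholders; exact argument names may also differ across TRL versions.

```python
# Hedged reproduction sketch only -- dataset path, LoRA settings, and beta are assumptions,
# not values reported in this model card.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)

# Placeholder preference dataset with "prompt", "chosen", and "rejected" columns.
train_dataset = load_dataset("json", data_files="preferences.json", split="train")

# LoRA settings are illustrative; the card does not state which were used.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

# These arguments mirror the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="llama_DPO_model_e2",
    learning_rate=5e-6,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,  # total train batch size 8
    num_train_epochs=2,
    lr_scheduler_type="linear",
    seed=42,
)

trainer = DPOTrainer(
    model,
    ref_model=None,          # with a PEFT adapter, the frozen base model serves as the reference
    args=training_args,
    beta=0.1,                # assumption: the DPO beta is not reported in the card
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```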
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.3693 | 0.1 | 25 | 0.1586 | 0.3906 | -1.4782 | 1.0 | 1.8688 | -199.5915 | -156.7276 | -1.0532 | -0.8639 |
| 0.0442 | 0.2 | 50 | 0.0275 | 0.5577 | -3.3969 | 1.0 | 3.9546 | -218.7789 | -155.0573 | -1.0591 | -0.8709 |
| 0.0153 | 0.3 | 75 | 0.0123 | 0.5805 | -4.3685 | 1.0 | 4.9490 | -228.4945 | -154.8291 | -1.0641 | -0.8765 |
| 0.0098 | 0.4 | 100 | 0.0083 | 0.5880 | -4.8560 | 1.0 | 5.4440 | -233.3696 | -154.7535 | -1.0654 | -0.8801 |
| 0.0072 | 0.5 | 125 | 0.0065 | 0.5779 | -5.1733 | 1.0 | 5.7513 | -236.5429 | -154.8546 | -1.0667 | -0.8808 |
| 0.0056 | 0.6 | 150 | 0.0058 | 0.5669 | -5.3483 | 1.0 | 5.9152 | -238.2926 | -154.9651 | -1.0674 | -0.8815 |
| 0.0059 | 0.7 | 175 | 0.0051 | 0.5733 | -5.4970 | 1.0 | 6.0704 | -239.7797 | -154.9004 | -1.0659 | -0.8820 |
| 0.0065 | 0.79 | 200 | 0.0047 | 0.5713 | -5.6304 | 1.0 | 6.2017 | -241.1136 | -154.9210 | -1.0653 | -0.8803 |
| 0.0044 | 0.89 | 225 | 0.0043 | 0.5689 | -5.7514 | 1.0 | 6.3203 | -242.3240 | -154.9452 | -1.0650 | -0.8816 |
| 0.004 | 0.99 | 250 | 0.0041 | 0.5671 | -5.8118 | 1.0 | 6.3790 | -242.9280 | -154.9625 | -1.0644 | -0.8796 |
| 0.0029 | 1.09 | 275 | 0.0040 | 0.5648 | -5.8589 | 1.0 | 6.4237 | -243.3990 | -154.9863 | -1.0633 | -0.8800 |
| 0.0035 | 1.19 | 300 | 0.0038 | 0.5658 | -5.8892 | 1.0 | 6.4549 | -243.7013 | -154.9761 | -1.0630 | -0.8785 |
| 0.0024 | 1.29 | 325 | 0.0039 | 0.5618 | -5.9044 | 1.0 | 6.4662 | -243.8535 | -155.0163 | -1.0628 | -0.8787 |
| 0.0034 | 1.39 | 350 | 0.0038 | 0.5595 | -5.9136 | 1.0 | 6.4731 | -243.9456 | -155.0389 | -1.0632 | -0.8788 |
| 0.0029 | 1.49 | 375 | 0.0038 | 0.5601 | -5.9328 | 1.0 | 6.4929 | -244.1375 | -155.0332 | -1.0634 | -0.8792 |
| 0.003 | 1.59 | 400 | 0.0038 | 0.5605 | -5.9352 | 1.0 | 6.4957 | -244.1614 | -155.0284 | -1.0632 | -0.8793 |
| 0.0021 | 1.69 | 425 | 0.0038 | 0.5593 | -5.9410 | 1.0 | 6.5003 | -244.2199 | -155.0412 | -1.0630 | -0.8792 |
| 0.0036 | 1.79 | 450 | 0.0038 | 0.5605 | -5.9408 | 1.0 | 6.5013 | -244.2178 | -155.0292 | -1.0631 | -0.8794 |
| 0.0031 | 1.89 | 475 | 0.0038 | 0.5567 | -5.9439 | 1.0 | 6.5006 | -244.2483 | -155.0666 | -1.0634 | -0.8782 |
| 0.0032 | 1.99 | 500 | 0.0037 | 0.5612 | -5.9460 | 1.0 | 6.5073 | -244.2698 | -155.0214 | -1.0632 | -0.8795 |

### Framework versions

- PEFT 0.8.2
- Transformers 4.38.1
- Pytorch 2.2.0+cu118
- Datasets 2.17.1
- Tokenizers 0.15.2
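Since the card does not include a usage snippet, here is a minimal, hedged example of loading the adapter on top of the base model with `peft` and `transformers`. The adapter repo id below is a placeholder for wherever the `llama_DPO_model_e2` adapter is hosted, and the prompt and generation settings are illustrative only.

```python
# Hedged inference sketch -- the adapter repo id is a placeholder, not a confirmed Hub path.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-2-7b-hf"
adapter_id = "your-username/llama_DPO_model_e2"  # placeholder: substitute the actual repo or local path

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Attach the DPO-trained LoRA adapter on top of the frozen base weights.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

prompt = "Explain what direct preference optimization does."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```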