---
base_model: meta-llama/Llama-2-7b-hf
library_name: peft
license: llama2
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: Llama-2-7b-hf-DPO-LookAhead-5_TTree1.4_TT0.9_TP0.7_TE0.2_V2
  results: []
---

# Llama-2-7b-hf-DPO-LookAhead-5_TTree1.4_TT0.9_TP0.7_TE0.2_V2

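This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on an unspecified dataset (the Trainer did not record a dataset name).
At the final logged evaluation (step 693, epoch 2.7176), it achieves the following results on the evaluation set: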
- Loss: 1.2147
- Rewards/chosen: -2.3589
- Rewards/rejected: -2.1848
- Rewards/accuracies: 0.3333
- Rewards/margins: -0.1740
- Logps/rejected: -176.9075
- Logps/chosen: -185.7344
- Logits/rejected: -0.3397
- Logits/chosen: -0.3554
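
As a quick consistency check, the reward margin logged by TRL's DPO trainer is the mean of the chosen reward minus the rejected reward, so it can be recomputed from the two reward values above. The sketch below uses only the numbers listed in this section; the variable names are illustrative.

```python
# Final evaluation values copied from the list above.
rewards_chosen = -2.3589
rewards_rejected = -2.1848
reported_margin = -0.1740

# The margin is the chosen reward minus the rejected reward.
margin = rewards_chosen - rewards_rejected

print(f"recomputed margin: {margin:.4f}")
print(f"reported margin:   {reported_margin:.4f}")

# The two agree up to rounding of the reported four-decimal values.
assert abs(margin - reported_margin) < 1e-3
```

A negative margin together with a reward accuracy of 0.3333 means that, on this evaluation set, the final checkpoint scores the rejected responses above the chosen ones more often than not.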

## Model description

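According to the card metadata, this is a PEFT adapter for meta-llama/Llama-2-7b-hf trained with Direct Preference Optimization (DPO) using TRL. More information needed.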

## Intended uses & limitations

More information needed
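
Because the metadata lists `peft` as the library, the weights are presumably an adapter that has to be attached to the base model. The following is a minimal sketch of how that might look, not a verified recipe: `ADAPTER_ID` is a hypothetical placeholder for wherever this adapter is actually stored, and `load_model` is an illustrative helper name.

```python
BASE_MODEL_ID = "meta-llama/Llama-2-7b-hf"
# Hypothetical placeholder: replace with the real Hub repository or local path of this adapter.
ADAPTER_ID = "path/to/Llama-2-7b-hf-DPO-LookAhead-5_TTree1.4_TT0.9_TP0.7_TE0.2_V2"


def load_model(adapter_id: str = ADAPTER_ID, base_model_id: str = BASE_MODEL_ID):
    """Load the base model and attach the PEFT adapter weights to it."""
    # Imported lazily so the heavy dependencies are only needed when the function is called.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_model_id)
    model = PeftModel.from_pretrained(base_model, adapter_id)
    return model, tokenizer


# Example usage (downloads the base model, which requires accepting the Llama 2 license):
# model, tokenizer = load_model()
# inputs = tokenizer("Hello", return_tensors="pt")
# outputs = model.generate(**inputs, max_new_tokens=20)
# print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```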

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 3
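
To make the batch-size and scheduler settings concrete, the sketch below recomputes the total train batch size and evaluates a cosine schedule with linear warmup in plain Python. Two values are assumptions rather than logged hyperparameters: a single training device, and roughly 765 total optimizer steps (inferred from the results table, where step 77 corresponds to epoch 0.3020, i.e. about 255 steps per epoch). The function `cosine_lr` is illustrative and mirrors the usual warmup-then-half-cosine shape.

```python
import math

# Values from the hyperparameter list above.
learning_rate = 5e-05
train_batch_size = 2
gradient_accumulation_steps = 2
warmup_steps = 10

# Assumptions, not logged in the card.
num_devices = 1    # a single device reproduces total_train_batch_size = 4
total_steps = 765  # about 255 steps per epoch * 3 epochs

# Number of examples contributing to each optimizer update.
effective_batch_size = train_batch_size * gradient_accumulation_steps * num_devices


def cosine_lr(step: int) -> float:
    """Learning rate at a given optimizer step: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", effective_batch_size)
for step in (0, 5, 10, 385, 765):
    print(step, f"{cosine_lr(step):.2e}")
```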

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.7064        | 0.3020 | 77   | 0.7263          | -0.0650        | -0.0237          | 0.5                | -0.0414         | -155.2957      | -162.7962    | 0.2969          | 0.2895        |
| 0.6816        | 0.6039 | 154  | 0.7127          | -0.1015        | -0.1222          | 0.5                | 0.0207          | -156.2813      | -163.1606    | 0.2989          | 0.2915        |
| 0.6192        | 0.9059 | 231  | 0.7010          | -0.0808        | -0.1624          | 0.5833             | 0.0816          | -156.6835      | -162.9536    | 0.2774          | 0.2692        |
| 0.2805        | 1.2078 | 308  | 0.8302          | -0.5931        | -0.6582          | 0.6667             | 0.0651          | -161.6412      | -168.0767    | 0.1922          | 0.1839        |
| 0.3604        | 1.5098 | 385  | 0.8663          | -0.8552        | -0.8899          | 0.5833             | 0.0347          | -163.9578      | -170.6977    | 0.0866          | 0.0775        |
| 0.3524        | 1.8118 | 462  | 0.9587          | -1.3495        | -1.3440          | 0.5                | -0.0055         | -168.4993      | -175.6406    | -0.0538         | -0.0645       |
| 0.2168        | 2.1137 | 539  | 1.0785          | -1.8309        | -1.7601          | 0.5833             | -0.0708         | -172.6597      | -180.4545    | -0.2246         | -0.2382       |
| 0.0395        | 2.4157 | 616  | 1.2284          | -2.4130        | -2.2406          | 0.3333             | -0.1724         | -177.4654      | -186.2757    | -0.3472         | -0.3633       |
| 0.2081        | 2.7176 | 693  | 1.2147          | -2.3589        | -2.1848          | 0.3333             | -0.1740         | -176.9075      | -185.7344    | -0.3397         | -0.3554       |
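
The table shows validation loss bottoming out early and then rising while training loss keeps falling, a pattern consistent with overfitting. The sketch below, which only reuses numbers from the table, picks the logged checkpoint with the lowest validation loss; the tuple layout and variable names are illustrative.

```python
# (step, training_loss, validation_loss, reward_accuracy) copied from the table above.
rows = [
    (77, 0.7064, 0.7263, 0.5),
    (154, 0.6816, 0.7127, 0.5),
    (231, 0.6192, 0.7010, 0.5833),
    (308, 0.2805, 0.8302, 0.6667),
    (385, 0.3604, 0.8663, 0.5833),
    (462, 0.3524, 0.9587, 0.5),
    (539, 0.2168, 1.0785, 0.5833),
    (616, 0.0395, 1.2284, 0.3333),
    (693, 0.2081, 1.2147, 0.3333),
]

# Checkpoint with the lowest validation loss, and the last logged checkpoint.
best = min(rows, key=lambda row: row[2])
final = rows[-1]

print(f"lowest validation loss: step {best[0]}, loss {best[2]:.4f}, accuracy {best[3]:.4f}")
print(f"final checkpoint:       step {final[0]}, loss {final[2]:.4f}, accuracy {final[3]:.4f}")
```

By this criterion the checkpoint at step 231 would be preferable to the final one, although the small evaluation set makes the reward accuracies noisy.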


### Framework versions

- PEFT 0.12.0
- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 3.0.2
- Tokenizers 0.19.1