---
license: llama2
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: meta-llama/Llama-2-7b-chat-hf
model-index:
- name: model_hh_usp4_400
  results: []
---


# model_hh_usp4_400

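This model is a fine-tuned version of [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf). According to the card metadata, it is a PEFT adapter trained with DPO using TRL. The training dataset was not recorded.
At the final checkpoint (step 1000), it achieves the following results on the evaluation set: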
- Loss: 4.4266
- Rewards/chosen: -7.2918
- Rewards/rejected: -9.3870
- Rewards/accuracies: 0.5500
- Rewards/margins: 2.0952
- Logps/rejected: -122.5611
- Logps/chosen: -121.2051
- Logits/rejected: -0.2787
- Logits/chosen: -0.2572
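
As a reading aid for the metrics above, the reported margin is the chosen reward minus the rejected reward. The sketch below checks that relation on the final-checkpoint numbers. It assumes the usual TRL DPO logging convention, in which `Rewards/margins` is the mean of per-example differences and `Rewards/accuracies` is the fraction of pairs where the chosen reward exceeds the rejected one. The variable names are illustrative only.

```python
# Final-checkpoint values copied from the evaluation list above.
rewards_chosen = -7.2918
rewards_rejected = -9.3870
reported_margin = 2.0952

# Mean margin equals the difference of the mean rewards (by linearity of the mean).
margin = rewards_chosen - rewards_rejected

print(f"computed margin: {margin:.4f}")
print(f"matches reported value: {abs(margin - reported_margin) < 1e-4}")
```

Note that a positive mean margin can coexist with an accuracy near 0.55, because the accuracy counts pairs while the margin averages magnitudes.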

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
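
To illustrate how the learning rate evolves under these settings, here is a minimal pure-Python sketch of a cosine schedule with linear warmup. It assumes the default half-cosine shape used by Transformers' `get_cosine_schedule_with_warmup`; the function name `lr_at_step` is hypothetical and not part of any library.

```python
import math


def lr_at_step(step, base_lr=5e-4, warmup_steps=100, total_steps=1000):
    """Approximate learning rate at a given optimizer step.

    Assumes linear warmup from 0 to base_lr, then a half-cosine decay to 0.
    """
    if step < warmup_steps:
        # Linear warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, 50, 100, 550, 1000):
    print(step, lr_at_step(step))
```

Under these assumptions the rate peaks at 0.0005 at step 100, passes through half of that value midway through the decay (step 550), and reaches zero at step 1000.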

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.098         | 4.0   | 100  | 2.3307          | -3.2289        | -4.9359          | 0.5700             | 1.7070          | -117.6155      | -116.6908    | -0.5136         | -0.5011       |
| 0.2615        | 8.0   | 200  | 3.5637          | -3.5399        | -4.5546          | 0.5700             | 1.0147          | -117.1918      | -117.0363    | -0.4837         | -0.4844       |
| 0.0137        | 12.0  | 300  | 4.2146          | -3.4955        | -5.8321          | 0.5600             | 2.3366          | -118.6113      | -116.9870    | -0.3503         | -0.3327       |
| 0.0           | 16.0  | 400  | 4.4247          | -7.2840        | -9.3968          | 0.5500             | 2.1128          | -122.5721      | -121.1964    | -0.2788         | -0.2574       |
| 0.0           | 20.0  | 500  | 4.4045          | -7.2800        | -9.4193          | 0.5600             | 2.1393          | -122.5971      | -121.1920    | -0.2793         | -0.2578       |
| 0.0           | 24.0  | 600  | 4.4242          | -7.2774        | -9.3711          | 0.5600             | 2.0936          | -122.5435      | -121.1891    | -0.2789         | -0.2573       |
| 0.0           | 28.0  | 700  | 4.4048          | -7.2951        | -9.4062          | 0.5600             | 2.1110          | -122.5825      | -121.2088    | -0.2785         | -0.2570       |
| 0.0           | 32.0  | 800  | 4.4098          | -7.2804        | -9.3847          | 0.5500             | 2.1043          | -122.5586      | -121.1924    | -0.2783         | -0.2569       |
| 0.0           | 36.0  | 900  | 4.4251          | -7.2849        | -9.3768          | 0.5500             | 2.0918          | -122.5498      | -121.1974    | -0.2792         | -0.2575       |
| 0.0           | 40.0  | 1000 | 4.4266          | -7.2918        | -9.3870          | 0.5500             | 2.0952          | -122.5611      | -121.2051    | -0.2787         | -0.2572       |
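
The Epoch and Step columns, combined with the batch settings listed under the hyperparameters, allow a rough estimate of the training set size. The arithmetic below is a sketch only: the example count is approximate because a partial final batch would round differently, and the dataset itself is not documented in this card.

```python
train_batch_size = 4
gradient_accumulation_steps = 4
total_train_batch_size = 16  # as reported in the hyperparameters

# The reported total implies a single device: 16 / (4 * 4) = 1.
num_devices = total_train_batch_size // (train_batch_size * gradient_accumulation_steps)

# The first table row reports epoch 4.0 at step 100.
steps_per_epoch = 100 / 4.0
approx_examples_per_epoch = steps_per_epoch * total_train_batch_size
total_epochs = 1000 / steps_per_epoch

print(f"devices implied: {num_devices}")
print(f"optimizer steps per epoch: {steps_per_epoch:.0f}")
print(f"approximate training examples: {approx_examples_per_epoch:.0f}")
print(f"epochs over 1000 steps: {total_epochs:.0f}")
```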


### Framework versions

- PEFT 0.10.0
- Transformers 4.39.3
- PyTorch 2.2.2+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2
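
With the libraries listed above installed, a PEFT adapter of this kind can typically be loaded on top of the base model as sketched below. This is an illustrative sketch, not a documented usage path for this model: `ADAPTER_ID` is a hypothetical placeholder for wherever the adapter weights are stored (a Hub repository id or a local directory), and `load_adapter` is a helper name introduced here. The base model is gated, so access to `meta-llama/Llama-2-7b-chat-hf` must be granted first.

```python
BASE_MODEL_ID = "meta-llama/Llama-2-7b-chat-hf"
# Hypothetical placeholder: replace with the real Hub repo id or local path.
ADAPTER_ID = "your-namespace/model_hh_usp4_400"


def load_adapter(adapter_id=ADAPTER_ID, base_model_id=BASE_MODEL_ID):
    """Load the base model and tokenizer, then attach the PEFT adapter."""
    # Imported lazily so that defining this helper does not require the libraries.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_model_id)
    # Wrap the base model with the adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return model, tokenizer
```

Calling `model, tokenizer = load_adapter()` downloads the base weights on first use, so it needs network access and enough memory for a 7B-parameter model.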