---
license: apache-2.0
library_name: peft
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
base_model: mistralai/Mistral-7B-v0.1
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: zephyr-7b-dpo-qlora
  results: []
---


# zephyr-7b-dpo-qlora

This model is a fine-tuned version of [ale-bay/zephyr-7b-sft-qlora](https://huggingface.co/ale-bay/zephyr-7b-sft-qlora), trained with DPO on the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4975
- Rewards/chosen: -2.4549
- Rewards/rejected: -3.4757
- Rewards/accuracies: 0.7490
- Rewards/margins: 1.0207
- Logps/rejected: -595.2866
- Logps/chosen: -517.1966
- Logits/rejected: -1.3432
- Logits/chosen: -1.4358
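
The reward metrics above are related to one another: `Rewards/margins` is the chosen reward minus the rejected reward, and `Rewards/accuracies` is the fraction of evaluation pairs in which the chosen response receives the higher reward. The sketch below recomputes the margin from the two reported means. The variable names are illustrative, and the small discrepancy is assumed to come from rounding of the logged four-decimal values.

```python
# Final evaluation metrics copied from the list above.
rewards_chosen = -2.4549
rewards_rejected = -3.4757
reported_margin = 1.0207

# The margin is the chosen reward minus the rejected reward.
margin = rewards_chosen - rewards_rejected
print(f"recomputed margin: {margin:.4f} (reported: {reported_margin:.4f})")

# The two values should agree up to rounding of the logged numbers.
assert abs(margin - reported_margin) < 2e-4
```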

## Model description

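This repository contains a PEFT (QLoRA) adapter trained with Direct Preference Optimization (DPO) using TRL and the alignment-handbook. The starting checkpoint is ale-bay/zephyr-7b-sft-qlora, and the underlying base model is mistralai/Mistral-7B-v0.1.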
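
Because the card lists `peft` as its library, the weights are presumably stored as an adapter rather than a full model. A minimal loading sketch follows, under several assumptions: `ADAPTER_ID` is a hypothetical placeholder (the actual repository id is not stated in this card), the adapter repository also ships tokenizer files, and the hardware supports bfloat16. The helper names `load_adapter` and `generate` are illustrative only.

```python
ADAPTER_ID = "your-namespace/zephyr-7b-dpo-qlora"  # hypothetical repo id; replace with the real one


def load_adapter(adapter_id: str = ADAPTER_ID):
    """Load the base model with the LoRA adapter attached, plus a tokenizer."""
    # Imports are local so this file can be imported without torch/peft installed.
    import torch
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer

    model = AutoPeftModelForCausalLM.from_pretrained(
        adapter_id,
        torch_dtype=torch.bfloat16,  # assumption: bf16-capable hardware
        device_map="auto",           # requires the `accelerate` package
    )
    # Assumption: the adapter repo also contains tokenizer files.
    tokenizer = AutoTokenizer.from_pretrained(adapter_id)
    return model, tokenizer


def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Generate a completion for a plain-text prompt."""
    model, tokenizer = load_adapter()
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("...")` downloads the base model referenced by the adapter config, so it needs network access and enough memory for a 7B model.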

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
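
The total train batch size follows from the per-device batch size, the number of devices, and the gradient accumulation steps. The sketch below checks that arithmetic and also illustrates the shape of a cosine schedule with 10% linear warmup. `TOTAL_STEPS` is an assumption taken from the last logged step in the results table (the true total is slightly higher, since that row is at epoch 0.99), and `lr_at` is an illustrative helper that mirrors the usual warmup-then-cosine formula rather than the exact scheduler code used in training.

```python
import math

# Values copied from the hyperparameter list above.
per_device_train_batch_size = 4
num_devices = 2
gradient_accumulation_steps = 4
peak_lr = 5e-6
warmup_ratio = 0.1

# Effective (total) train batch size per optimizer step.
effective_batch = per_device_train_batch_size * num_devices * gradient_accumulation_steps
print("total_train_batch_size:", effective_batch)  # 32

TOTAL_STEPS = 1900  # assumption: approximated by the last logged step


def lr_at(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at a given optimizer step: linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 100, 190, 1000, 1900):
    print(step, f"{lr_at(step):.3e}")
```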

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6641        | 0.05  | 100  | 0.6636          | 0.0054         | -0.0681          | 0.6900             | 0.0735          | -254.5337      | -271.1659    | -2.0436         | -2.1368       |
| 0.6105        | 0.10  | 200  | 0.6075          | -0.3236        | -0.5938          | 0.6890             | 0.2702          | -307.0967      | -304.0613    | -2.0030         | -2.0919       |
| 0.5883        | 0.16  | 300  | 0.5817          | -0.7122        | -1.1286          | 0.7020             | 0.4164          | -360.5768      | -342.9188    | -1.9914         | -2.0761       |
| 0.5651        | 0.21  | 400  | 0.5665          | -0.7901        | -1.2897          | 0.7250             | 0.4996          | -376.6874      | -350.7093    | -1.9001         | -1.9820       |
| 0.5136        | 0.26  | 500  | 0.5520          | -1.0330        | -1.6646          | 0.7190             | 0.6316          | -414.1808      | -374.9992    | -1.8081         | -1.8880       |
| 0.5587        | 0.31  | 600  | 0.5327          | -1.3215        | -2.0089          | 0.7320             | 0.6874          | -448.6079      | -403.8534    | -1.4665         | -1.5609       |
| 0.5167        | 0.37  | 700  | 0.5299          | -1.2797        | -2.1992          | 0.7230             | 0.9196          | -467.6413      | -399.6684    | -1.3918         | -1.4903       |
| 0.5465        | 0.42  | 800  | 0.5189          | -1.6646        | -2.4686          | 0.7200             | 0.8041          | -494.5844      | -438.1617    | -1.3685         | -1.4642       |
| 0.5002        | 0.47  | 900  | 0.5142          | -1.7844        | -2.7217          | 0.7290             | 0.9373          | -519.8885      | -450.1383    | -1.4179         | -1.5054       |
| 0.5017        | 0.52  | 1000 | 0.5058          | -2.6175        | -3.6120          | 0.7360             | 0.9946          | -608.9218      | -533.4493    | -1.2973         | -1.3948       |
| 0.4966        | 0.58  | 1100 | 0.5043          | -2.0581        | -2.9819          | 0.7370             | 0.9239          | -545.9103      | -477.5080    | -1.3783         | -1.4740       |
| 0.5087        | 0.63  | 1200 | 0.5040          | -2.3715        | -3.3475          | 0.7450             | 0.9760          | -582.4712      | -508.8495    | -1.3331         | -1.4262       |
| 0.4799        | 0.68  | 1300 | 0.5011          | -2.3067        | -3.3444          | 0.7450             | 1.0377          | -582.1562      | -502.3687    | -1.3340         | -1.4277       |
| 0.4606        | 0.73  | 1400 | 0.4991          | -2.5016        | -3.5583          | 0.7430             | 1.0567          | -603.5469      | -521.8631    | -1.3291         | -1.4219       |
| 0.4763        | 0.79  | 1500 | 0.4985          | -2.4979        | -3.5204          | 0.7470             | 1.0225          | -599.7631      | -521.4944    | -1.3394         | -1.4325       |
| 0.5008        | 0.84  | 1600 | 0.4977          | -2.4555        | -3.4719          | 0.7480             | 1.0164          | -594.9102      | -517.2504    | -1.3492         | -1.4415       |
| 0.4654        | 0.89  | 1700 | 0.4976          | -2.4498        | -3.4672          | 0.7510             | 1.0174          | -594.4417      | -516.6852    | -1.3478         | -1.4402       |
| 0.4854        | 0.94  | 1800 | 0.4975          | -2.4526        | -3.4731          | 0.7480             | 1.0205          | -595.0339      | -516.9640    | -1.3441         | -1.4366       |
| 0.4879        | 0.99  | 1900 | 0.4974          | -2.4531        | -3.4740          | 0.7500             | 1.0209          | -595.1221      | -517.0148    | -1.3432         | -1.4359       |
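
The DPO `beta` is not listed among the hyperparameters, but the table allows a rough estimate. Assuming the logged reward follows the TRL convention, reward = beta × (policy log-prob − reference log-prob), and assuming the reference log-probs on the fixed evaluation set do not change between evaluations, the ratio of the change in reward to the change in log-prob between two rows approximates `beta`. The sketch below applies this to the first and last rows. It is an inference from the logged numbers, not a recorded setting, and `implied_beta` is an illustrative helper.

```python
# First (step 100) and last (step 1900) rows of the results table.
first = {
    "rewards_chosen": 0.0054, "logps_chosen": -271.1659,
    "rewards_rejected": -0.0681, "logps_rejected": -254.5337,
}
last = {
    "rewards_chosen": -2.4531, "logps_chosen": -517.0148,
    "rewards_rejected": -3.4740, "logps_rejected": -595.1221,
}


def implied_beta(kind: str) -> float:
    """Estimate beta as (change in reward) / (change in policy log-prob)."""
    d_reward = last[f"rewards_{kind}"] - first[f"rewards_{kind}"]
    d_logps = last[f"logps_{kind}"] - first[f"logps_{kind}"]
    return d_reward / d_logps


for kind in ("chosen", "rejected"):
    print(kind, f"{implied_beta(kind):.4f}")
```

Both estimates come out close to 0.01, which suggests, but does not confirm, that training used `beta = 0.01`.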


### Framework versions

- PEFT 0.7.1
- Transformers 4.39.3
- PyTorch 2.3.0+cu121
- Datasets 2.19.1
- Tokenizers 0.15.2