---
license: apache-2.0
library_name: peft
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
base_model: openbmb/Eurus-7b-sft
datasets:
- generation/UF
model-index:
- name: eurus-dpo-qlora-uf-ours-5e-6
  results: []
---


# eurus-dpo-qlora-uf-ours-5e-6

This model is a DPO fine-tune of [openbmb/Eurus-7b-sft](https://huggingface.co/openbmb/Eurus-7b-sft) on the generation/UF dataset, trained as a PEFT adapter (QLoRA, going by the model name).
At the end of training it achieves the following results on the evaluation set:
- Loss: 6.1425
- Rewards/chosen: -23.7027
- Rewards/rejected: -32.8691
- Rewards/accuracies: 0.6260
- Rewards/margins: 9.1664
- Rewards/margins Max: 58.9042
- Rewards/margins Min: -33.2590
- Rewards/margins Std: 29.8583
- Logps/rejected: -3544.4312
- Logps/chosen: -2645.1541
- Logits/rejected: -0.9100
- Logits/chosen: -1.0759
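
The reward metrics above are related to each other: the reported margin matches the chosen reward minus the rejected reward, and the accuracy is the fraction of evaluation pairs whose margin is positive. The following is a minimal sketch of that bookkeeping, not the trainer's own code. The helper name `summarize_rewards` and the toy per-pair values are hypothetical; only the three aggregate numbers are taken from this card.

```python
from statistics import mean


def summarize_rewards(chosen, rejected):
    """Aggregate per-pair rewards into a mean margin and a pairwise accuracy."""
    margins = [c - r for c, r in zip(chosen, rejected)]
    return {
        "margin": mean(margins),
        # A pair counts as correct when the chosen reward exceeds the rejected one.
        "accuracy": mean(1.0 if m > 0 else 0.0 for m in margins),
    }


# Toy per-pair rewards (hypothetical values, for illustration only).
toy = summarize_rewards([-1.0, -2.0, -3.0, -0.5], [-2.0, -1.5, -5.0, -4.0])

# Aggregate values copied from the evaluation results listed above.
rewards_chosen = -23.7027
rewards_rejected = -32.8691
reported_margin = 9.1664

# The reported margin should equal chosen minus rejected, up to rounding.
computed_margin = rewards_chosen - rewards_rejected
assert abs(computed_margin - reported_margin) < 1e-3
```

Both mean rewards are negative, so the positive margin comes from the rejected reward falling further than the chosen one rather than from the chosen reward rising.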

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
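
The total batch sizes follow from the per-device sizes, the device count, and gradient accumulation. The sketch below reproduces that arithmetic and also illustrates the shape of a cosine schedule with 10% linear warmup. `cosine_lr` is a hypothetical stand-alone function written for illustration, and `ASSUMED_TOTAL_STEPS` is an estimate inferred from the results table (1000 steps at roughly 2.82 epochs), not a value stated in this card.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 4            # per device
eval_batch_size = 8             # per device
num_devices = 2
gradient_accumulation_steps = 2
peak_learning_rate = 5e-6
warmup_ratio = 0.1

# Effective batch sizes: accumulation applies to training only.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

# Assumption: about 1064 optimizer steps over 3 epochs, estimated from the table.
ASSUMED_TOTAL_STEPS = 1064


def cosine_lr(step, total_steps=ASSUMED_TOTAL_STEPS,
              peak_lr=peak_learning_rate, ratio=warmup_ratio):
    """Linear warmup to peak_lr, then cosine decay to zero at total_steps."""
    warmup_steps = math.ceil(total_steps * ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


warmup_end = math.ceil(ASSUMED_TOTAL_STEPS * warmup_ratio)
```

Under these assumptions the learning rate reaches 5e-06 after about 107 steps and decays to zero by the final step.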

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:-------------------:|:-------------------:|:-------------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.4256        | 0.28  | 100  | 0.8163          | -1.8022        | -1.9583          | 0.5610             | 0.1561          | 2.2049              | -1.8191             | 1.3259              | -453.3455      | -455.0959    | -1.9771         | -2.0751       |
| 0.1591        | 0.56  | 200  | 1.2122          | -5.0976        | -6.6216          | 0.6050             | 1.5239          | 9.9971              | -4.8753             | 4.8268              | -919.6762      | -784.6454    | -1.3460         | -1.4469       |
| 0.1126        | 0.85  | 300  | 1.7230          | -6.1628        | -8.5878          | 0.6090             | 2.4250          | 18.9102             | -8.2202             | 8.7236              | -1116.3019     | -891.1599    | -1.2133         | -1.3142       |
| 0.074         | 1.13  | 400  | 2.0005          | -8.7127        | -11.9396         | 0.6220             | 3.2269          | 20.1537             | -9.9867             | 9.6878              | -1451.4778     | -1146.1495   | -1.3244         | -1.4370       |
| 0.0551        | 1.41  | 500  | 2.6568          | -10.4325       | -15.1571         | 0.6260             | 4.7246          | 28.6045             | -13.6975            | 13.8040             | -1773.2283     | -1318.1323   | -1.2958         | -1.4257       |
| 0.169         | 1.69  | 600  | 3.7089          | -14.9797       | -20.5965         | 0.6160             | 5.6168          | 36.0405             | -19.8931            | 18.0728             | -2317.1677     | -1772.8466   | -1.0370         | -1.1529       |
| 0.0661        | 1.97  | 700  | 4.1957          | -15.9319       | -22.6457         | 0.6220             | 6.7138          | 41.9072             | -22.6906            | 20.9609             | -2522.0879     | -1868.0721   | -1.1163         | -1.2633       |
| 0.0044        | 2.25  | 800  | 5.9108          | -22.7617       | -31.4584         | 0.6230             | 8.6967          | 56.6380             | -31.9336            | 28.6036             | -3403.3569     | -2551.0461   | -0.9371         | -1.0936       |
| 0.011         | 2.54  | 900  | 5.9213          | -23.0839       | -32.0567         | 0.6230             | 8.9728          | 56.9548             | -32.0980            | 28.8598             | -3463.1873     | -2583.2671   | -0.9208         | -1.0846       |
| 0.0138        | 2.82  | 1000 | 6.0584          | -23.3438       | -32.4235         | 0.6280             | 9.0798          | 58.3224             | -32.8664            | 29.5381             | -3499.8743     | -2609.2573   | -0.9160         | -1.0810       |
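
In the table above, validation loss rises at every logged step while training loss falls, and reward accuracy stays near 0.62 from step 400 onward. One possible reading is that later checkpoints overfit the training preferences, so the choice of checkpoint depends on which metric matters. The sketch below shows that comparison on four columns copied from the table; the variable names are illustrative.

```python
# (step, training_loss, validation_loss, rewards_accuracy), copied from the table above.
rows = [
    (100, 0.4256, 0.8163, 0.5610),
    (200, 0.1591, 1.2122, 0.6050),
    (300, 0.1126, 1.7230, 0.6090),
    (400, 0.0740, 2.0005, 0.6220),
    (500, 0.0551, 2.6568, 0.6260),
    (600, 0.1690, 3.7089, 0.6160),
    (700, 0.0661, 4.1957, 0.6220),
    (800, 0.0044, 5.9108, 0.6230),
    (900, 0.0110, 5.9213, 0.6230),
    (1000, 0.0138, 6.0584, 0.6280),
]

# Checkpoint with the lowest validation loss.
best_by_loss = min(rows, key=lambda r: r[2])

# Checkpoint with the highest reward accuracy.
best_by_accuracy = max(rows, key=lambda r: r[3])

# True when validation loss increases between every pair of consecutive evaluations.
val_losses = [r[2] for r in rows]
val_loss_always_rising = all(a < b for a, b in zip(val_losses, val_losses[1:]))

print("lowest validation loss at step", best_by_loss[0])
print("highest reward accuracy at step", best_by_accuracy[0])
print("validation loss rises at every evaluation:", val_loss_always_rising)
```

The two criteria disagree here: validation loss is lowest at the first evaluation, while accuracy is highest at the last one.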


### Framework versions

- PEFT 0.7.1
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2
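
### Loading sketch

Since this card describes a PEFT adapter on top of `openbmb/Eurus-7b-sft`, loading it would typically mean loading the base model first and attaching the adapter. The following is a minimal sketch under stated assumptions, not a tested recipe for this checkpoint: `ADAPTER_ID` is a hypothetical placeholder (the card does not state where the adapter is published), `load_policy` is an illustrative helper name, and the bfloat16 dtype is a choice made here rather than a setting taken from the training run.

```python
BASE_MODEL_ID = "openbmb/Eurus-7b-sft"  # base model named in this card
ADAPTER_ID = "path/to/eurus-dpo-qlora-uf-ours-5e-6"  # hypothetical placeholder path


def load_policy(adapter_id=ADAPTER_ID, base_id=BASE_MODEL_ID):
    """Load the base model and attach the adapter weights on top of it."""
    # Imported lazily so the module can be inspected without the heavy dependencies.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(
        base_id,
        torch_dtype=torch.bfloat16,  # assumption: bf16 inference, not from the card
    )
    # Wrap the base model with the adapter stored at adapter_id.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `load_policy()` downloads the 7B base model, so it needs enough memory for the full weights plus the adapter.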