---
license: apache-2.0
library_name: peft
tags:
- alignment-handbook
- generated_from_trainer
- trl
- dpo
datasets:
- HuggingFaceH4/ultrafeedback_binarized
base_model: mistralai/Mistral-7B-v0.1
model-index:
- name: zephyr-7b-dpo-qlora
  results: []
---

# zephyr-7b-dpo-qlora

This model is a DPO fine-tuned version of [alignment-handbook/zephyr-7b-sft-qlora](https://huggingface.co/alignment-handbook/zephyr-7b-sft-qlora), a QLoRA SFT adapter on [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1), trained on the HuggingFaceH4/ultrafeedback_binarized dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4880
- Rewards/chosen: -2.8615
- Rewards/rejected: -3.9313
- Rewards/accuracies: 0.7262
- Rewards/margins: 1.0698
- Logps/rejected: -626.2534
- Logps/chosen: -549.3907
- Logits/rejected: 1.3412
- Logits/chosen: 0.7713

## Model description

This is a PEFT (QLoRA) adapter for [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1), trained with Direct Preference Optimization (DPO) on top of the zephyr-7b-sft-qlora SFT adapter, following the [alignment-handbook](https://github.com/huggingface/alignment-handbook) Zephyr recipe. In the metrics above, `Rewards/chosen` and `Rewards/rejected` are the β-scaled log-probability ratios of the policy against the reference model, `Rewards/margins` is their mean difference, and `Rewards/accuracies` is the fraction of preference pairs for which the chosen response receives the higher implicit reward.
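
For reference, DPO optimizes the following objective (Rafailov et al., 2023), which is also where the implicit rewards reported above come from:

$$
\mathcal{L}_{\text{DPO}}(\pi_\theta;\pi_{\text{ref}}) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log \sigma\!\Big(\beta \log \tfrac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} \;-\; \beta \log \tfrac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\Big)\right]
$$

where $y_w$ and $y_l$ are the chosen and rejected responses, $\pi_{\text{ref}}$ is the SFT model, and $\beta$ controls the strength of the regularization toward the reference.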

## Intended uses & limitations

The adapter is intended for chat-style text generation in the Zephyr format. It inherits the limitations of its base model and of the UltraFeedback preference data; no additional safety alignment beyond the DPO stage is documented here.
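
A minimal usage sketch (not from the original card; it assumes the adapter is published under the repo id `alignment-handbook/zephyr-7b-dpo-qlora` and that a chat template is bundled with the tokenizer):

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "alignment-handbook/zephyr-7b-dpo-qlora"  # assumed repo id

# AutoPeftModelForCausalLM reads the adapter config, loads the base model
# (mistralai/Mistral-7B-v0.1), and attaches the LoRA weights on top of it.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```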

## Training and evaluation data

Training and evaluation both use [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized), a binarized version of the UltraFeedback dataset in which each prompt is paired with a GPT-4-preferred (chosen) and a dispreferred (rejected) response.
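
A quick way to inspect the preference pairs (the `train_prefs`/`test_prefs` split names below are how this dataset organizes its DPO pairs):

```python
from datasets import load_dataset

ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized")
train = ds["train_prefs"]  # columns include prompt, chosen, rejected
test = ds["test_prefs"]

# chosen/rejected are stored as chat message lists, not flat strings
print(train[0]["chosen"][-1]["content"][:200])
```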

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of how they might map onto a `trl` `DPOTrainer` setup follows the list):
- learning_rate: 5e-06
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 4
- total_train_batch_size: 12
- total_eval_batch_size: 24
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
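
Purely as a hedged sketch of the setup these values imply, using `trl`'s `DPOTrainer` (the DPO `beta` and the precision flag below are assumptions; the card does not record them):

```python
import torch
from datasets import load_dataset
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer, TrainingArguments
from trl import DPOTrainer

sft_id = "alignment-handbook/zephyr-7b-sft-qlora"
model = AutoPeftModelForCausalLM.from_pretrained(
    sft_id, torch_dtype=torch.bfloat16, is_trainable=True
)
tokenizer = AutoTokenizer.from_pretrained(sft_id)

# NB: the raw chosen/rejected columns are chat message lists; the real recipe
# flattens them with the chat template before training (omitted here).
ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized")

args = TrainingArguments(
    output_dir="zephyr-7b-dpo-qlora",
    learning_rate=5e-6,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,  # 3 GPUs x 1 x 4 = total train batch size 12
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,  # assumption; precision is not stated in the card
)

trainer = DPOTrainer(
    model,
    ref_model=None,  # with a PEFT model, the base weights act as the reference
    args=args,
    beta=0.1,        # assumed DPO beta; not recorded in this card
    train_dataset=ds["train_prefs"],
    eval_dataset=ds["test_prefs"],
    tokenizer=tokenizer,
)
trainer.train()
```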

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6884        | 0.02  | 100  | 0.6868          | 0.0390         | 0.0284           | 0.6146             | 0.0106          | -230.2779      | -259.3362    | -2.3476         | -2.3366       |
| 0.6654        | 0.04  | 200  | 0.6657          | 0.0334         | -0.0194          | 0.6399             | 0.0528          | -235.0622      | -259.9052    | -2.2635         | -2.2585       |
| 0.6346        | 0.06  | 300  | 0.6431          | -0.2564        | -0.3692          | 0.6533             | 0.1128          | -270.0399      | -288.8787    | -2.2107         | -2.2217       |
| 0.5888        | 0.08  | 400  | 0.6162          | -0.4195        | -0.6312          | 0.6518             | 0.2118          | -296.2420      | -305.1884    | -1.9579         | -1.9905       |
| 0.5806        | 0.1   | 500  | 0.5916          | -1.3171        | -1.6507          | 0.6637             | 0.3337          | -398.1920      | -394.9468    | -0.4990         | -0.5253       |
| 0.6219        | 0.12  | 600  | 0.5753          | -1.1344        | -1.5063          | 0.6503             | 0.3719          | -383.7478      | -376.6808    | 0.0384          | -0.0361       |
| 0.5586        | 0.14  | 700  | 0.5733          | -0.7892        | -1.1878          | 0.6667             | 0.3986          | -351.8957      | -342.1609    | 0.3073          | 0.2473        |
| 0.6123        | 0.16  | 800  | 0.5578          | -1.2731        | -1.7042          | 0.6652             | 0.4311          | -403.5397      | -390.5542    | 1.0809          | 1.0327        |
| 0.555         | 0.18  | 900  | 0.5461          | -1.1941        | -1.8087          | 0.6771             | 0.6146          | -413.9875      | -382.6491    | 1.4158          | 1.3993        |
| 0.4905        | 0.2   | 1000 | 0.5463          | -1.2469        | -1.9528          | 0.6890             | 0.7058          | -428.3945      | -387.9334    | 0.8211          | 0.7732        |
| 0.5214        | 0.22  | 1100 | 0.5356          | -1.2786        | -1.8992          | 0.6979             | 0.6206          | -423.0347      | -391.1008    | 1.3945          | 1.4163        |
| 0.4988        | 0.24  | 1200 | 0.5307          | -1.2179        | -1.9293          | 0.6979             | 0.7115          | -426.0503      | -385.0261    | 1.0273          | 0.9228        |
| 0.5324        | 0.26  | 1300 | 0.5320          | -1.4512        | -2.1779          | 0.7024             | 0.7267          | -450.9060      | -408.3595    | 0.9344          | 0.5917        |
| 0.5286        | 0.27  | 1400 | 0.5193          | -1.3777        | -2.1412          | 0.7039             | 0.7634          | -447.2371      | -401.0145    | 1.1979          | 0.8244        |
| 0.6095        | 0.29  | 1500 | 0.5206          | -1.1730        | -1.8883          | 0.7009             | 0.7153          | -421.9497      | -380.5422    | 0.3598          | -0.0238       |
| 0.5627        | 0.31  | 1600 | 0.5225          | -1.8811        | -2.7733          | 0.6935             | 0.8922          | -510.4463      | -451.3462    | 0.7395          | 0.4147        |
| 0.5222        | 0.33  | 1700 | 0.5210          | -1.1883        | -1.8477          | 0.7143             | 0.6593          | -417.8853      | -382.0739    | -0.0643         | -0.3844       |
| 0.5163        | 0.35  | 1800 | 0.5219          | -1.1780        | -1.9783          | 0.7247             | 0.8003          | -430.9522      | -381.0428    | 1.3000          | 0.9605        |
| 0.511         | 0.37  | 1900 | 0.5214          | -1.8532        | -2.7395          | 0.7188             | 0.8863          | -507.0662      | -448.5622    | 1.3052          | 0.9550        |
| 0.484         | 0.39  | 2000 | 0.5161          | -1.7800        | -2.6182          | 0.7188             | 0.8382          | -494.9370      | -441.2427    | 1.6339          | 1.3132        |
| 0.4863        | 0.41  | 2100 | 0.5183          | -2.7826        | -3.8427          | 0.7158             | 1.0600          | -617.3857      | -541.5035    | 2.3428          | 2.0461        |
| 0.5233        | 0.43  | 2200 | 0.5115          | -1.7702        | -2.6185          | 0.7173             | 0.8483          | -494.9643      | -440.2580    | 0.9791          | 0.5628        |
| 0.5343        | 0.45  | 2300 | 0.5079          | -1.4313        | -2.2210          | 0.7202             | 0.7897          | -455.2213      | -406.3701    | 1.0255          | 0.5469        |
| 0.5251        | 0.47  | 2400 | 0.5088          | -2.7117        | -3.7995          | 0.7173             | 1.0878          | -613.0708      | -534.4126    | 2.1153          | 1.5133        |
| 0.5104        | 0.49  | 2500 | 0.5006          | -2.9970        | -4.0022          | 0.7202             | 1.0052          | -633.3362      | -562.9377    | 2.2889          | 1.7461        |
| 0.429         | 0.51  | 2600 | 0.5238          | -3.6282        | -4.8032          | 0.7143             | 1.1750          | -713.4386      | -626.0600    | 3.6631          | 3.2827        |
| 0.4255        | 0.53  | 2700 | 0.4993          | -2.4946        | -3.5067          | 0.7188             | 1.0121          | -583.7889      | -512.7010    | 2.1920          | 1.6873        |
| 0.4733        | 0.55  | 2800 | 0.4990          | -3.2116        | -4.2800          | 0.7202             | 1.0684          | -661.1174      | -584.3987    | 2.6796          | 2.2111        |
| 0.5394        | 0.57  | 2900 | 0.5040          | -2.9132        | -3.9276          | 0.7158             | 1.0143          | -625.8766      | -554.5653    | 1.7758          | 1.2351        |
| 0.5128        | 0.59  | 3000 | 0.5061          | -2.5974        | -3.5725          | 0.7173             | 0.9750          | -590.3638      | -522.9818    | 2.1284          | 1.6663        |
| 0.5215        | 0.61  | 3100 | 0.4960          | -2.2632        | -3.1876          | 0.7188             | 0.9245          | -551.8787      | -489.5560    | 1.4432          | 0.8594        |
| 0.5023        | 0.63  | 3200 | 0.4999          | -2.8630        | -3.9641          | 0.7128             | 1.1011          | -629.5237      | -549.5392    | 1.9057          | 1.2951        |
| 0.5042        | 0.65  | 3300 | 0.4904          | -2.8448        | -3.8793          | 0.7307             | 1.0345          | -621.0500      | -547.7245    | 1.9776          | 1.4334        |
| 0.498         | 0.67  | 3400 | 0.4879          | -2.8423        | -3.8097          | 0.7321             | 0.9673          | -614.0843      | -547.4754    | 1.4781          | 0.9608        |
| 0.4987        | 0.69  | 3500 | 0.4902          | -2.6926        | -3.7172          | 0.7307             | 1.0246          | -604.8372      | -532.4977    | 1.3819          | 0.8557        |
| 0.5824        | 0.71  | 3600 | 0.4908          | -2.5673        | -3.5933          | 0.7292             | 1.0260          | -592.4445      | -519.9661    | 1.1037          | 0.5336        |
| 0.425         | 0.73  | 3700 | 0.4906          | -2.7666        | -3.8246          | 0.7307             | 1.0580          | -615.5826      | -539.9020    | 1.2903          | 0.7257        |
| 0.4756        | 0.75  | 3800 | 0.4916          | -2.8732        | -3.9598          | 0.7292             | 1.0866          | -629.0961      | -550.5607    | 1.5015          | 0.9387        |
| 0.4597        | 0.77  | 3900 | 0.4896          | -2.8617        | -3.9425          | 0.7277             | 1.0808          | -627.3712      | -549.4086    | 1.3350          | 0.7636        |
| 0.4649        | 0.79  | 4000 | 0.4885          | -2.8682        | -3.9370          | 0.7232             | 1.0688          | -626.8230      | -550.0615    | 1.2903          | 0.7213        |
| 0.4689        | 0.8   | 4100 | 0.4880          | -2.8425        | -3.9060          | 0.7232             | 1.0634          | -623.7166      | -547.4950    | 1.2495          | 0.6763        |
| 0.4275        | 0.82  | 4200 | 0.4877          | -2.8671        | -3.9353          | 0.7232             | 1.0682          | -626.6478      | -549.9532    | 1.3067          | 0.7331        |
| 0.5325        | 0.84  | 4300 | 0.4881          | -2.8855        | -3.9630          | 0.7262             | 1.0775          | -629.4202      | -551.7905    | 1.3795          | 0.8070        |
| 0.532         | 0.86  | 4400 | 0.4881          | -2.8672        | -3.9406          | 0.7277             | 1.0734          | -627.1785      | -549.9610    | 1.3435          | 0.7732        |
| 0.4558        | 0.88  | 4500 | 0.4879          | -2.8560        | -3.9259          | 0.7262             | 1.0699          | -625.7067      | -548.8392    | 1.3411          | 0.7711        |
| 0.5541        | 0.9   | 4600 | 0.4882          | -2.8601        | -3.9295          | 0.7262             | 1.0694          | -626.0704      | -549.2481    | 1.3428          | 0.7729        |
| 0.5743        | 0.92  | 4700 | 0.4879          | -2.8641        | -3.9344          | 0.7262             | 1.0702          | -626.5551      | -549.6526    | 1.3445          | 0.7755        |
| 0.4657        | 0.94  | 4800 | 0.4880          | -2.8626        | -3.9322          | 0.7292             | 1.0696          | -626.3386      | -549.4993    | 1.3437          | 0.7749        |
| 0.5126        | 0.96  | 4900 | 0.4880          | -2.8636        | -3.9339          | 0.7277             | 1.0703          | -626.5126      | -549.6042    | 1.3440          | 0.7748        |
| 0.3967        | 0.98  | 5000 | 0.4880          | -2.8643        | -3.9344          | 0.7262             | 1.0702          | -626.5614      | -549.6658    | 1.3424          | 0.7736        |


### Framework versions

- PEFT 0.7.1
- Transformers 4.36.2
- Pytorch 2.2.1+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2