---
license: apache-2.0
library_name: peft
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrafeedback_binarized
base_model: mistralai/Mistral-7B-v0.1
model-index:
- name: zephyr-7b-dpo-qlora
  results: []
---


# zephyr-7b-dpo-qlora

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-qlora](https://huggingface.co/alignment-handbook/zephyr-7b-sft-qlora) on the HuggingFaceH4/ultrafeedback_binarized dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4888
- Rewards/chosen: -3.3026
- Rewards/rejected: -4.6171
- Rewards/accuracies: 0.7510
- Rewards/margins: 1.3145
- Logps/rejected: -706.2916
- Logps/chosen: -594.8843
- Logits/rejected: 1.7556
- Logits/chosen: 1.0124
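
As a quick sanity check, the reported reward margin is simply the difference between the chosen and rejected rewards:

```python
# Rewards/margins = Rewards/chosen - Rewards/rejected
# (values taken from the evaluation results above)
chosen, rejected = -3.3026, -4.6171
margin = round(chosen - rejected, 4)
print(margin)  # 1.3145, matching the reported Rewards/margins
```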

## Model description

zephyr-7b-dpo-qlora is a QLoRA (PEFT) adapter trained with Direct Preference Optimization (DPO) on top of [alignment-handbook/zephyr-7b-sft-qlora](https://huggingface.co/alignment-handbook/zephyr-7b-sft-qlora), which is itself a supervised fine-tune of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1). Training followed the Zephyr DPO recipe from the Alignment Handbook, using TRL's DPO training on binarized preference pairs.

## Intended uses & limitations

The model is intended for chat-style assistant use, as a preference-aligned counterpart to the SFT checkpoint it was trained from. Its alignment reflects only the UltraFeedback preference data used here; it has not received additional safety-specific tuning, so outputs should be evaluated carefully before any production use.

## Training and evaluation data

Training and evaluation both use the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset, in which each prompt is paired with a preferred ("chosen") and a dispreferred ("rejected") response.
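
For illustration, a preference record in ultrafeedback_binarized has roughly the following shape (the field names follow the dataset card; the example content here is invented):

```python
# Hypothetical record illustrating the chosen/rejected preference-pair
# structure that DPO training consumes. The content is made up.
record = {
    "prompt": "Explain what QLoRA is in one sentence.",
    "chosen": [
        {"role": "user", "content": "Explain what QLoRA is in one sentence."},
        {"role": "assistant",
         "content": "QLoRA fine-tunes low-rank adapters on top of a "
                    "4-bit quantized base model."},
    ],
    "rejected": [
        {"role": "user", "content": "Explain what QLoRA is in one sentence."},
        {"role": "assistant", "content": "I don't know."},
    ],
}
```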

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
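
The effective batch size follows from the per-device batch size, gradient accumulation, and the number of processes. A minimal sketch (the process count of 1 is an inference from the reported total of 8, not stated in the log):

```python
# total_train_batch_size = per-device batch * grad accumulation * processes
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
num_processes = 1  # assumption: 2 * 4 * 1 = 8 matches the reported total

total_train_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_processes
)
print(total_train_batch_size)  # 8
```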

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6885        | 0.01  | 100  | 0.6887          | 0.0401         | 0.0310           | 0.6155             | 0.0091          | -241.4763      | -260.6096    | -2.3013         | -2.3864       |
| 0.6826        | 0.03  | 200  | 0.6777          | 0.0538         | 0.0208           | 0.6555             | 0.0329          | -242.4942      | -259.2415    | -2.2939         | -2.3792       |
| 0.6623        | 0.04  | 300  | 0.6578          | -0.0931        | -0.1758          | 0.6735             | 0.0827          | -262.1588      | -273.9337    | -2.2310         | -2.3202       |
| 0.6619        | 0.05  | 400  | 0.6455          | -0.2994        | -0.4240          | 0.6610             | 0.1245          | -286.9754      | -294.5644    | -2.0309         | -2.1441       |
| 0.6257        | 0.07  | 500  | 0.6194          | -0.3522        | -0.5612          | 0.6850             | 0.2089          | -300.6967      | -299.8442    | -2.0400         | -2.1485       |
| 0.6114        | 0.08  | 600  | 0.6004          | -0.6308        | -0.9602          | 0.6755             | 0.3295          | -340.6012      | -327.6964    | -1.5503         | -1.7200       |
| 0.5394        | 0.09  | 700  | 0.6103          | -1.5690        | -1.9843          | 0.6635             | 0.4153          | -443.0096      | -421.5208    | -0.6532         | -0.9309       |
| 0.6171        | 0.1   | 800  | 0.6372          | -1.7546        | -2.0641          | 0.6405             | 0.3095          | -450.9858      | -440.0762    | 0.0235          | -0.3349       |
| 0.5553        | 0.12  | 900  | 0.5687          | -1.3500        | -1.8540          | 0.6930             | 0.5041          | -429.9809      | -399.6168    | 2.6187          | 1.9978        |
| 0.6299        | 0.13  | 1000 | 0.5620          | -1.1629        | -1.7464          | 0.6975             | 0.5835          | -419.2182      | -380.9113    | 3.4192          | 2.7155        |
| 0.5898        | 0.14  | 1100 | 0.5619          | -2.4368        | -3.0963          | 0.7090             | 0.6594          | -554.2042      | -508.3033    | 5.3078          | 4.4134        |
| 0.4782        | 0.16  | 1200 | 0.5594          | -1.5060        | -2.2383          | 0.7090             | 0.7323          | -468.4132      | -415.2229    | 4.0187          | 3.1485        |
| 0.5709        | 0.17  | 1300 | 0.5481          | -1.7316        | -2.3668          | 0.7245             | 0.6352          | -481.2582      | -437.7783    | 4.1315          | 3.2570        |
| 0.5181        | 0.18  | 1400 | 0.5454          | -2.4857        | -3.3898          | 0.7140             | 0.9042          | -583.5640      | -513.1900    | 4.6977          | 3.6944        |
| 0.5495        | 0.2   | 1500 | 0.5428          | -2.5602        | -3.3574          | 0.7205             | 0.7972          | -580.3215      | -520.6432    | 4.1847          | 3.2888        |
| 0.574         | 0.21  | 1600 | 0.5638          | -2.7101        | -3.5446          | 0.7190             | 0.8346          | -599.0428      | -535.6277    | 4.9219          | 3.9304        |
| 0.4901        | 0.22  | 1700 | 0.5284          | -2.4900        | -3.3577          | 0.7335             | 0.8677          | -580.3493      | -513.6201    | 3.8220          | 2.9305        |
| 0.5149        | 0.24  | 1800 | 0.5408          | -1.7507        | -2.4663          | 0.7215             | 0.7156          | -491.2047      | -439.6899    | 2.0262          | 1.2751        |
| 0.6382        | 0.25  | 1900 | 0.5325          | -2.1268        | -2.9548          | 0.7255             | 0.8279          | -540.0542      | -477.3052    | 2.4039          | 1.4990        |
| 0.5178        | 0.26  | 2000 | 0.5276          | -1.4221        | -2.1526          | 0.7305             | 0.7305          | -459.8390      | -406.8324    | 1.5288          | 0.8157        |
| 0.524         | 0.27  | 2100 | 0.5663          | -2.7101        | -3.7077          | 0.7110             | 0.9976          | -615.3445      | -535.6266    | 2.5955          | 1.6625        |
| 0.523         | 0.29  | 2200 | 0.5422          | -2.2871        | -3.3438          | 0.7230             | 1.0567          | -578.9616      | -493.3343    | 3.5955          | 2.5436        |
| 0.5431        | 0.3   | 2300 | 0.5253          | -2.1932        | -3.2183          | 0.7340             | 1.0252          | -566.4124      | -483.9387    | 4.2433          | 3.2004        |
| 0.5147        | 0.31  | 2400 | 0.5132          | -2.8441        | -3.8795          | 0.7315             | 1.0354          | -632.5286      | -549.0342    | 4.6772          | 3.6861        |
| 0.4198        | 0.33  | 2500 | 0.5214          | -2.1756        | -3.1443          | 0.7290             | 0.9687          | -559.0054      | -482.1783    | 2.7950          | 1.8511        |
| 0.5994        | 0.34  | 2600 | 0.5188          | -3.1314        | -4.1849          | 0.7290             | 1.0535          | -663.0683      | -577.7604    | 3.4511          | 2.4450        |
| 0.4812        | 0.35  | 2700 | 0.5139          | -3.0136        | -4.1060          | 0.7455             | 1.0924          | -655.1821      | -565.9851    | 3.7760          | 2.7916        |
| 0.4696        | 0.37  | 2800 | 0.5137          | -2.2305        | -3.2368          | 0.7355             | 1.0063          | -568.2574      | -487.6709    | 2.6757          | 1.8289        |
| 0.5418        | 0.38  | 2900 | 0.5177          | -2.0641        | -3.1462          | 0.7345             | 1.0822          | -559.2020      | -471.0270    | 2.0189          | 1.1899        |
| 0.5068        | 0.39  | 3000 | 0.5096          | -2.4564        | -3.5648          | 0.7400             | 1.1084          | -601.0543      | -510.2569    | 2.8679          | 2.0023        |
| 0.4429        | 0.41  | 3100 | 0.5324          | -2.7544        | -3.8869          | 0.7180             | 1.1325          | -633.2682      | -540.0566    | 1.3309          | 0.6491        |
| 0.5977        | 0.42  | 3200 | 0.4963          | -2.8842        | -3.9825          | 0.7425             | 1.0983          | -642.8285      | -553.0416    | 2.0170          | 1.2328        |
| 0.5281        | 0.43  | 3300 | 0.5074          | -2.4254        | -3.5511          | 0.7325             | 1.1257          | -599.6907      | -507.1647    | 1.1826          | 0.4294        |
| 0.5114        | 0.44  | 3400 | 0.5197          | -2.8424        | -4.0833          | 0.7255             | 1.2409          | -652.9095      | -548.8630    | 2.1493          | 1.2128        |
| 0.4984        | 0.46  | 3500 | 0.5002          | -3.1997        | -4.4222          | 0.7450             | 1.2225          | -686.7951      | -584.5864    | 3.3502          | 2.4203        |
| 0.5723        | 0.47  | 3600 | 0.5010          | -3.0065        | -4.2439          | 0.7410             | 1.2374          | -668.9721      | -565.2749    | 3.1534          | 2.2598        |
| 0.5496        | 0.48  | 3700 | 0.5015          | -3.0581        | -4.3336          | 0.7395             | 1.2755          | -677.9391      | -570.4304    | 3.3120          | 2.4472        |
| 0.5106        | 0.5   | 3800 | 0.5013          | -3.5077        | -4.8209          | 0.7395             | 1.3132          | -726.6729      | -615.3915    | 2.7134          | 1.8547        |
| 0.376         | 0.51  | 3900 | 0.4995          | -3.2636        | -4.5260          | 0.7415             | 1.2624          | -697.1753      | -590.9803    | 2.7739          | 1.9628        |
| 0.4935        | 0.52  | 4000 | 0.4916          | -2.8251        | -3.9628          | 0.7465             | 1.1377          | -640.8605      | -547.1311    | 2.2899          | 1.5516        |
| 0.445         | 0.54  | 4100 | 0.4959          | -3.1300        | -4.4063          | 0.7480             | 1.2763          | -685.2046      | -577.6177    | 2.5949          | 1.8263        |
| 0.443         | 0.55  | 4200 | 0.5039          | -2.6104        | -3.9167          | 0.7345             | 1.3063          | -636.2510      | -525.6652    | 2.5643          | 1.7637        |
| 0.517         | 0.56  | 4300 | 0.5042          | -3.0608        | -4.4485          | 0.7375             | 1.3877          | -689.4330      | -570.7054    | 2.6212          | 1.8545        |
| 0.3693        | 0.58  | 4400 | 0.4969          | -3.2698        | -4.5598          | 0.7470             | 1.2900          | -700.5564      | -591.6002    | 2.5178          | 1.8051        |
| 0.481         | 0.59  | 4500 | 0.4893          | -2.8076        | -3.9614          | 0.7445             | 1.1537          | -640.7148      | -545.3853    | 2.0329          | 1.3648        |
| 0.4696        | 0.6   | 4600 | 0.4945          | -3.3369        | -4.5983          | 0.7465             | 1.2614          | -704.4065      | -598.3125    | 2.6733          | 1.9401        |
| 0.4437        | 0.62  | 4700 | 0.4940          | -2.8130        | -4.0860          | 0.7445             | 1.2730          | -653.1788      | -545.9229    | 2.0547          | 1.2696        |
| 0.4492        | 0.63  | 4800 | 0.4963          | -2.7727        | -4.0657          | 0.7465             | 1.2930          | -651.1524      | -541.8960    | 2.3393          | 1.5355        |
| 0.5163        | 0.64  | 4900 | 0.5017          | -3.3498        | -4.7649          | 0.7465             | 1.4150          | -721.0643      | -599.6019    | 2.0201          | 1.2216        |
| 0.488         | 0.65  | 5000 | 0.4917          | -3.2508        | -4.5623          | 0.7480             | 1.3115          | -700.8107      | -589.7007    | 1.9166          | 1.1418        |
| 0.3606        | 0.67  | 5100 | 0.4905          | -2.9757        | -4.2308          | 0.7460             | 1.2551          | -667.6595      | -562.1877    | 1.5031          | 0.7813        |
| 0.58          | 0.68  | 5200 | 0.4897          | -2.8783        | -4.1021          | 0.7500             | 1.2239          | -654.7924      | -552.4492    | 1.2839          | 0.5850        |
| 0.5788        | 0.69  | 5300 | 0.4900          | -3.0607        | -4.2816          | 0.7490             | 1.2209          | -672.7391      | -570.6943    | 1.4059          | 0.7114        |
| 0.4138        | 0.71  | 5400 | 0.4910          | -3.3493        | -4.6193          | 0.7515             | 1.2701          | -706.5120      | -599.5464    | 1.6121          | 0.8970        |
| 0.5737        | 0.72  | 5500 | 0.4898          | -3.1843        | -4.4515          | 0.7480             | 1.2672          | -689.7249      | -583.0511    | 1.4061          | 0.6955        |
| 0.4249        | 0.73  | 5600 | 0.4918          | -3.3448        | -4.6778          | 0.7490             | 1.3330          | -712.3564      | -599.0980    | 1.7110          | 0.9558        |
| 0.5457        | 0.75  | 5700 | 0.4897          | -3.2784        | -4.5741          | 0.7500             | 1.2957          | -701.9877      | -592.4562    | 1.7372          | 0.9922        |
| 0.5287        | 0.76  | 5800 | 0.4920          | -3.3167        | -4.6600          | 0.7495             | 1.3433          | -710.5778      | -596.2890    | 1.9802          | 1.2037        |
| 0.5286        | 0.77  | 5900 | 0.4919          | -3.2305        | -4.5655          | 0.7465             | 1.3350          | -701.1276      | -587.6722    | 1.9038          | 1.1361        |
| 0.5147        | 0.79  | 6000 | 0.4910          | -3.3145        | -4.6435          | 0.7505             | 1.3290          | -708.9319      | -596.0760    | 1.9303          | 1.1726        |
| 0.4478        | 0.8   | 6100 | 0.4886          | -3.2069        | -4.5013          | 0.7480             | 1.2944          | -694.7131      | -585.3105    | 1.7621          | 1.0186        |
| 0.5236        | 0.81  | 6200 | 0.4901          | -3.3207        | -4.6497          | 0.7495             | 1.3290          | -709.5499      | -596.6957    | 1.8309          | 1.0794        |
| 0.5079        | 0.82  | 6300 | 0.4890          | -3.3084        | -4.6220          | 0.7495             | 1.3137          | -706.7820      | -595.4583    | 1.7747          | 1.0322        |
| 0.4942        | 0.84  | 6400 | 0.4891          | -3.2621        | -4.5672          | 0.7495             | 1.3051          | -701.3010      | -590.8314    | 1.7716          | 1.0268        |
| 0.4688        | 0.85  | 6500 | 0.4891          | -3.2863        | -4.5956          | 0.7505             | 1.3093          | -704.1410      | -593.2547    | 1.7863          | 1.0402        |
| 0.5062        | 0.86  | 6600 | 0.4889          | -3.2923        | -4.6029          | 0.7485             | 1.3106          | -704.8691      | -593.8478    | 1.7695          | 1.0261        |
| 0.574         | 0.88  | 6700 | 0.4887          | -3.2779        | -4.5886          | 0.7495             | 1.3108          | -703.4429      | -592.4089    | 1.7573          | 1.0140        |
| 0.5737        | 0.89  | 6800 | 0.4887          | -3.2917        | -4.6042          | 0.7510             | 1.3124          | -704.9940      | -593.7938    | 1.7560          | 1.0126        |
| 0.4298        | 0.9   | 6900 | 0.4889          | -3.2985        | -4.6115          | 0.7505             | 1.3131          | -705.7332      | -594.4664    | 1.7563          | 1.0130        |
| 0.55          | 0.92  | 7000 | 0.4889          | -3.2997        | -4.6137          | 0.7505             | 1.3140          | -705.9527      | -594.5901    | 1.7567          | 1.0132        |
| 0.4123        | 0.93  | 7100 | 0.4889          | -3.3026        | -4.6168          | 0.7515             | 1.3142          | -706.2578      | -594.8819    | 1.7586          | 1.0151        |
| 0.5207        | 0.94  | 7200 | 0.4887          | -3.3049        | -4.6192          | 0.7500             | 1.3143          | -706.5007      | -595.1128    | 1.7557          | 1.0126        |
| 0.4618        | 0.96  | 7300 | 0.4888          | -3.3019        | -4.6165          | 0.7515             | 1.3145          | -706.2247      | -594.8143    | 1.7552          | 1.0116        |
| 0.4826        | 0.97  | 7400 | 0.4889          | -3.3035        | -4.6177          | 0.7510             | 1.3142          | -706.3512      | -594.9731    | 1.7538          | 1.0108        |
| 0.3856        | 0.98  | 7500 | 0.4887          | -3.3043        | -4.6187          | 0.7515             | 1.3144          | -706.4486      | -595.0473    | 1.7544          | 1.0114        |
| 0.5369        | 0.99  | 7600 | 0.4886          | -3.3028        | -4.6175          | 0.7520             | 1.3147          | -706.3290      | -594.9012    | 1.7559          | 1.0126        |
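
For context on the loss values above: the DPO objective for one preference pair is -log(sigmoid(margin)), where the margin is the (beta-scaled) difference between the chosen and rejected rewards. A minimal sketch:

```python
import math

def dpo_loss(margin):
    """DPO loss for a single preference pair: -log(sigmoid(margin)),
    with margin = reward(chosen) - reward(rejected), beta already folded in."""
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A zero margin (no preference learned yet) gives -log(0.5) = log(2) ~ 0.693,
# which is roughly where the loss starts in the table above (0.6887 at step 100).
print(round(dpo_loss(0.0), 4))  # 0.6931
```

Note that the reported eval loss is the mean of per-pair losses, not the loss of the mean margin, so plugging the final mean margin (1.3145) into this formula yields a smaller number than the reported 0.4888.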


### Framework versions

- PEFT 0.8.2
- Transformers 4.38.1
- Pytorch 2.2.0
- Datasets 2.17.1
- Tokenizers 0.15.2
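
These pins can be reproduced with a requirements file (note that Pytorch is published on PyPI as `torch`):

```text
peft==0.8.2
transformers==4.38.1
torch==2.2.0
datasets==2.17.1
tokenizers==0.15.2
```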