---
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: zephyr-7b-dpo-full
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# zephyr-7b-dpo-full

This model is a version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) fine-tuned with DPO on the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset.
At the end of training, it achieves the following results on the evaluation set:
- Loss: 1.1487
- Rewards/chosen: -10.8742
- Rewards/rejected: -16.0045
- Rewards/accuracies: 0.7285
- Rewards/margins: 5.1303
- Logps/rejected: -424.4627
- Logps/chosen: -383.6538
- Logits/rejected: -0.5906
- Logits/chosen: -1.0023
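
The reported reward margin can be checked against the two reward values above. The sketch below assumes the usual DPO logging convention, in which the margin is the chosen reward minus the rejected reward; the variable names are illustrative only.

```python
# Final evaluation metrics copied from the list above.
rewards_chosen = -10.8742
rewards_rejected = -16.0045
reported_margin = 5.1303

# Assumed convention: margin = chosen reward - rejected reward.
margin = rewards_chosen - rewards_rejected

print(f"computed margin: {margin:.4f}, reported margin: {reported_margin:.4f}")
```

Both rewards are negative, so the positive margin comes from the rejected responses being penalized more heavily than the chosen ones, not from the chosen responses gaining reward.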

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
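
As a rough illustration of how these values fit together, the sketch below recomputes the total train batch size and traces a linear schedule with warmup. It rests on two assumptions that the card does not state: gradient accumulation is 1, and the run has about 5733 optimizer steps (estimated from the results table, where step 5700 falls at epoch 2.9827). The function `lr_at` is a hypothetical helper, not part of any library.

```python
import math

# Values from the hyperparameter list.
learning_rate = 5e-07
train_batch_size = 8  # per device
num_devices = 4
warmup_ratio = 0.1

# Assumption: no gradient accumulation (the card does not list it).
gradient_accumulation_steps = 1
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# Assumption: total optimizer steps, estimated from the results table
# (roughly 1911 steps per epoch over 3 epochs).
total_steps = 5733
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Linear warmup from 0 to the peak rate, then linear decay back to 0."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    remaining = max(0, total_steps - step)
    return learning_rate * remaining / (total_steps - warmup_steps)


print("total_train_batch_size:", total_train_batch_size)
print("warmup_steps:", warmup_steps)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, lr_at(step))
```

Under these assumptions the learning rate peaks at 5e-07 near the end of the first tenth of training and reaches zero at the final step.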

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6673        | 0.0523 | 100  | 0.6670          | 0.0699         | 0.0097           | 0.6797             | 0.0602          | -264.3204      | -274.2128    | -2.5742         | -2.6289       |
| 0.5806        | 0.1047 | 200  | 0.5926          | 0.3801         | -0.0108          | 0.7051             | 0.3909          | -264.5256      | -271.1104    | -2.5225         | -2.5806       |
| 0.554         | 0.1570 | 300  | 0.5669          | 0.3096         | -0.4486          | 0.7246             | 0.7581          | -268.9032      | -271.8162    | -2.4975         | -2.5603       |
| 0.5674        | 0.2093 | 400  | 0.5521          | 0.7133         | -0.0663          | 0.7246             | 0.7797          | -265.0810      | -267.7786    | -2.4794         | -2.5387       |
| 0.512         | 0.2616 | 500  | 0.5478          | 0.1922         | -0.9270          | 0.7266             | 1.1192          | -273.6879      | -272.9901    | -2.4185         | -2.4842       |
| 0.5511        | 0.3140 | 600  | 0.5389          | -0.0115        | -1.1320          | 0.7539             | 1.1205          | -275.7375      | -275.0270    | -2.3648         | -2.4308       |
| 0.5851        | 0.3663 | 700  | 0.5448          | 0.0450         | -1.1453          | 0.7402             | 1.1903          | -275.8708      | -274.4615    | -2.4055         | -2.4622       |
| 0.5302        | 0.4186 | 800  | 0.5569          | -0.2258        | -1.2912          | 0.7324             | 1.0653          | -277.3294      | -277.1702    | -2.5104         | -2.5742       |
| 0.518         | 0.4710 | 900  | 0.5607          | -0.2557        | -1.4332          | 0.7500             | 1.1775          | -278.7496      | -277.4685    | -2.4298         | -2.4910       |
| 0.5525        | 0.5233 | 1000 | 0.5601          | -0.7719        | -1.9891          | 0.7480             | 1.2172          | -284.3084      | -282.6305    | -2.4482         | -2.5089       |
| 0.5189        | 0.5756 | 1100 | 0.5515          | -0.4040        | -1.5951          | 0.7422             | 1.1911          | -280.3683      | -278.9518    | -2.4816         | -2.5430       |
| 0.5331        | 0.6279 | 1200 | 0.5453          | -0.5342        | -1.7671          | 0.7383             | 1.2329          | -282.0886      | -280.2540    | -2.4521         | -2.5080       |
| 0.5104        | 0.6803 | 1300 | 0.5511          | -0.4634        | -1.8916          | 0.7363             | 1.4282          | -283.3339      | -279.5460    | -2.4281         | -2.4909       |
| 0.4976        | 0.7326 | 1400 | 0.5413          | -0.3748        | -1.7652          | 0.7363             | 1.3904          | -282.0694      | -278.6596    | -2.4395         | -2.4947       |
| 0.4814        | 0.7849 | 1500 | 0.5447          | -0.8885        | -2.1522          | 0.7305             | 1.2637          | -285.9394      | -283.7968    | -2.4376         | -2.4908       |
| 0.5075        | 0.8373 | 1600 | 0.5423          | -0.3051        | -1.5253          | 0.7344             | 1.2202          | -279.6703      | -277.9630    | -2.4316         | -2.4816       |
| 0.4906        | 0.8896 | 1700 | 0.5806          | -1.4841        | -3.0212          | 0.7266             | 1.5371          | -294.6296      | -289.7531    | -2.4876         | -2.5438       |
| 0.536         | 0.9419 | 1800 | 0.5603          | -0.5951        | -2.1710          | 0.7383             | 1.5759          | -286.1272      | -280.8625    | -2.5694         | -2.6123       |
| 0.5164        | 0.9942 | 1900 | 0.5567          | -0.5404        | -2.0173          | 0.7422             | 1.4769          | -284.5909      | -280.3160    | -2.5490         | -2.5898       |
| 0.0947        | 1.0466 | 2000 | 0.5942          | -1.0618        | -2.9986          | 0.7344             | 1.9369          | -294.4039      | -285.5296    | -2.5622         | -2.6140       |
| 0.068         | 1.0989 | 2100 | 0.6230          | -1.6457        | -3.9093          | 0.7520             | 2.2636          | -303.5109      | -291.3689    | -2.4361         | -2.5042       |
| 0.0747        | 1.1512 | 2200 | 0.6291          | -1.3268        | -3.4945          | 0.7461             | 2.1677          | -299.3621      | -288.1795    | -2.3844         | -2.4542       |
| 0.0553        | 1.2036 | 2300 | 0.6765          | -2.2209        | -4.6502          | 0.7344             | 2.4293          | -310.9199      | -297.1208    | -2.4889         | -2.5616       |
| 0.1207        | 1.2559 | 2400 | 0.6530          | -1.7158        | -3.9584          | 0.7246             | 2.2427          | -304.0018      | -292.0695    | -2.4457         | -2.5092       |
| 0.152         | 1.3082 | 2500 | 0.6882          | -1.8791        | -4.3806          | 0.7207             | 2.5015          | -308.2237      | -293.7032    | -2.4232         | -2.4917       |
| 0.1114        | 1.3605 | 2600 | 0.6422          | -2.2334        | -4.3890          | 0.7227             | 2.1556          | -308.3074      | -297.2458    | -2.5713         | -2.6189       |
| 0.1173        | 1.4129 | 2700 | 0.6619          | -1.5700        | -4.0282          | 0.7266             | 2.4581          | -304.6991      | -290.6119    | -2.5152         | -2.5719       |
| 0.0925        | 1.4652 | 2800 | 0.6523          | -2.3231        | -4.6279          | 0.7207             | 2.3048          | -310.6963      | -298.1424    | -2.5141         | -2.5711       |
| 0.1221        | 1.5175 | 2900 | 0.6496          | -2.8770        | -5.1437          | 0.7266             | 2.2667          | -315.8546      | -303.6823    | -2.4733         | -2.5414       |
| 0.0807        | 1.5699 | 3000 | 0.6925          | -2.7762        | -5.3350          | 0.7383             | 2.5588          | -317.7678      | -302.6737    | -2.3267         | -2.4141       |
| 0.105         | 1.6222 | 3100 | 0.6540          | -2.6858        | -5.0067          | 0.7246             | 2.3209          | -314.4846      | -301.7698    | -2.3683         | -2.4395       |
| 0.1162        | 1.6745 | 3200 | 0.6481          | -1.8133        | -4.0448          | 0.7148             | 2.2315          | -304.8652      | -293.0446    | -2.3670         | -2.4379       |
| 0.0667        | 1.7268 | 3300 | 0.6541          | -2.0364        | -4.3933          | 0.7363             | 2.3569          | -308.3506      | -295.2763    | -2.2794         | -2.3589       |
| 0.0935        | 1.7792 | 3400 | 0.6690          | -2.7292        | -5.2592          | 0.7441             | 2.5300          | -317.0096      | -302.2036    | -2.2855         | -2.3694       |
| 0.095         | 1.8315 | 3500 | 0.6361          | -2.9308        | -5.1591          | 0.7266             | 2.2284          | -316.0090      | -304.2198    | -2.3827         | -2.4530       |
| 0.0719        | 1.8838 | 3600 | 0.6778          | -2.3616        | -4.8272          | 0.7246             | 2.4656          | -312.6893      | -298.5278    | -2.4285         | -2.5018       |
| 0.0729        | 1.9362 | 3700 | 0.6754          | -2.9280        | -5.4360          | 0.7285             | 2.5080          | -318.7774      | -304.1916    | -2.4287         | -2.5049       |
| 0.0867        | 1.9885 | 3800 | 0.6744          | -3.0956        | -5.5458          | 0.7324             | 2.4502          | -319.8756      | -305.8675    | -2.3542         | -2.4301       |
| 0.0057        | 2.0408 | 3900 | 0.8833          | -5.0083        | -8.7774          | 0.7324             | 3.7690          | -352.1913      | -324.9953    | -1.5131         | -1.7155       |
| 0.0042        | 2.0931 | 4000 | 0.9722          | -6.1264        | -10.3554         | 0.7441             | 4.2290          | -367.9712      | -336.1759    | -1.6158         | -1.8694       |
| 0.0144        | 2.1455 | 4100 | 1.0865          | -7.7872        | -12.6090         | 0.7227             | 4.8218          | -390.5074      | -352.7837    | -1.3817         | -1.7022       |
| 0.0222        | 2.1978 | 4200 | 1.1130          | -7.9969        | -12.8510         | 0.7090             | 4.8541          | -392.9280      | -354.8811    | -1.3909         | -1.6967       |
| 0.0062        | 2.2501 | 4300 | 1.0722          | -8.7884        | -13.4773         | 0.7188             | 4.6889          | -399.1902      | -362.7955    | -1.5072         | -1.7459       |
| 0.0164        | 2.3025 | 4400 | 1.0993          | -8.7821        | -13.5683         | 0.7246             | 4.7862          | -400.1005      | -362.7325    | -1.2294         | -1.5182       |
| 0.0043        | 2.3548 | 4500 | 1.1250          | -9.9027        | -14.7785         | 0.7324             | 4.8758          | -412.2026      | -373.9385    | -0.7476         | -1.0957       |
| 0.0055        | 2.4071 | 4600 | 1.1975          | -10.4385       | -15.5644         | 0.7285             | 5.1258          | -420.0612      | -379.2971    | -0.5940         | -1.0020       |
| 0.0096        | 2.4594 | 4700 | 1.1443          | -10.2507       | -15.1793         | 0.7344             | 4.9286          | -416.2106      | -377.4187    | -0.9036         | -1.2413       |
| 0.0121        | 2.5118 | 4800 | 1.1422          | -10.3821       | -15.4221         | 0.7188             | 5.0400          | -418.6388      | -378.7332    | -0.8425         | -1.2175       |
| 0.0129        | 2.5641 | 4900 | 1.1155          | -9.3510        | -14.2451         | 0.7227             | 4.8941          | -406.8687      | -368.4216    | -0.9190         | -1.2930       |
| 0.0027        | 2.6164 | 5000 | 1.1905          | -10.7239       | -16.0360         | 0.7246             | 5.3121          | -424.7772      | -382.1504    | -0.6076         | -1.0264       |
| 0.0069        | 2.6688 | 5100 | 1.1635          | -10.2624       | -15.5178         | 0.7266             | 5.2555          | -419.5960      | -377.5356    | -0.7336         | -1.1315       |
| 0.009         | 2.7211 | 5200 | 1.1697          | -10.4591       | -15.6846         | 0.7266             | 5.2255          | -421.2634      | -379.5029    | -0.5587         | -0.9680       |
| 0.0088        | 2.7734 | 5300 | 1.1614          | -9.6958        | -14.8576         | 0.7246             | 5.1618          | -412.9938      | -371.8698    | -0.7312         | -1.1117       |
| 0.0078        | 2.8257 | 5400 | 1.1537          | -10.1101       | -15.2615         | 0.7168             | 5.1514          | -417.0325      | -376.0129    | -0.6843         | -1.0802       |
| 0.0209        | 2.8781 | 5500 | 1.1425          | -10.8046       | -15.9002         | 0.7266             | 5.0956          | -423.4199      | -382.9582    | -0.5316         | -0.9493       |
| 0.0145        | 2.9304 | 5600 | 1.1673          | -10.6083       | -15.8081         | 0.7266             | 5.1997          | -422.4983      | -380.9951    | -0.5878         | -1.0058       |
| 0.0189        | 2.9827 | 5700 | 1.1475          | -10.8669       | -16.0106         | 0.7285             | 5.1437          | -424.5231      | -383.5809    | -0.5915         | -1.0022       |
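
One way to read the table is to look for the evaluation step with the lowest validation loss. The sketch below does this for a handful of rows copied from the table; the selection of rows and the variable names are illustrative only.

```python
# (step, validation_loss, rewards_accuracy) for a few rows copied from the table.
rows = [
    (100, 0.6670, 0.6797),
    (600, 0.5389, 0.7539),
    (1900, 0.5567, 0.7422),
    (3800, 0.6744, 0.7324),
    (5700, 1.1475, 0.7285),
]

# Pick the row with the smallest validation loss.
best_step, best_loss, best_acc = min(rows, key=lambda row: row[1])

print(f"lowest validation loss {best_loss} at step {best_step} (accuracy {best_acc})")
```

Across the full table, step 600 also has the lowest validation loss (0.5389) and the highest reward accuracy (0.7539). After the first epoch the training loss drops toward zero while the validation loss rises, which may indicate overfitting in epochs 2 and 3.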


### Framework versions

- Transformers 4.40.2
- Pytorch 2.1.2
- Datasets 2.19.1
- Tokenizers 0.19.1
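
To compare a local environment against these versions, something like the following sketch could be used. It assumes the PyPI distribution names `transformers`, `torch`, `datasets`, and `tokenizers`; `check_versions` is a hypothetical helper written for this example.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed above; the distribution names are assumptions.
expected = {
    "transformers": "4.40.2",
    "torch": "2.1.2",
    "datasets": "2.19.1",
    "tokenizers": "0.19.1",
}


def check_versions(wanted_versions):
    """Return {package: (installed, wanted, matches)} for each package."""
    report = {}
    for package, wanted in wanted_versions.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        # Ignore local build suffixes such as "+cu121" when comparing.
        matches = installed is not None and installed.split("+")[0] == wanted
        report[package] = (installed, wanted, matches)
    return report


for package, (installed, wanted, matches) in check_versions(expected).items():
    status = "ok" if matches else "differs"
    print(f"{package}: installed={installed}, card={wanted} ({status})")
```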