---
base_model: alignment-handbook/zephyr-7b-sft-full
datasets:
- generation/UF
- generation/UFfull2
library_name: peft
license: apache-2.0
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
model-index:
- name: zephyr-dpop-qlora-uf-ours-uffull-5e-6
  results: []
---

# zephyr-dpop-qlora-uf-ours-uffull-5e-6

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the generation/UF and the generation/UFfull2 datasets.
It achieves the following results on the evaluation set (a note after the list explains the reward columns):
- Loss: 0.6950
- Positive Losses: 0.5820
- Dpo Losses: 0.6380
- Rewards/chosen: 0.2290
- Rewards/rejected: 0.0996
- Rewards/accuracies: 0.7060
- Rewards/margins: 0.1294
- Rewards/margins Max: 0.5134
- Rewards/margins Min: -0.1814
- Rewards/margins Std: 0.2328
- Logps/rejected: -255.8980
- Logps/chosen: -261.5583
- Logits/rejected: -2.6096
- Logits/chosen: -2.6435
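
For context, the reward columns follow TRL's DPO reporting convention: a completion's implicit reward is the β-scaled log-probability ratio between the policy and the frozen reference model, and the DPO loss is the negative log-sigmoid of the chosen-minus-rejected margin. A sketch of the standard objective (the exact form of the Positive Losses term is not documented in this card; the "dpop" in the model name suggests a DPO-Positive-style penalty on the chosen log-probabilities):

```latex
% Implicit reward (TRL convention) and the standard DPO loss.
r_\theta(x, y) = \beta \, \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}

\mathcal{L}_{\mathrm{DPO}}(x, y_w, y_l) = -\log \sigma\bigl( r_\theta(x, y_w) - r_\theta(x, y_l) \bigr)
```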

## Model description

This repository contains a QLoRA (PEFT) adapter for [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full), trained with a DPO-style preference-optimization objective via TRL. The "dpop" in the model name and the separate Positive Losses metric suggest a DPO-Positive-style variant that penalizes drops in the chosen completions' log-probabilities.

## Intended uses & limitations

More information needed
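
For reference, the adapter can be attached to the base checkpoint for inference. A minimal loading sketch, assuming the adapter is published on the Hub (the repo id below is a placeholder):

```python
# Minimal sketch: load the QLoRA adapter on top of the 4-bit base model.
# ADAPTER_ID is a placeholder; substitute the actual adapter repository path.
import torch
from transformers import AutoTokenizer, BitsAndBytesConfig
from peft import AutoPeftModelForCausalLM

ADAPTER_ID = "<org>/zephyr-dpop-qlora-uf-ours-uffull-5e-6"  # placeholder

model = AutoPeftModelForCausalLM.from_pretrained(
    ADAPTER_ID,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("alignment-handbook/zephyr-7b-sft-full")

# The base model ships a chat template; format the prompt through it.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Explain LoRA in one sentence."}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```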

## Training and evaluation data

The adapter was trained and evaluated on the generation/UF and generation/UFfull2 preference datasets named in the frontmatter; their provenance and splits are not documented in this card.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
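
A sketch of these settings as `transformers.TrainingArguments` (an assumption about the launch setup; the original training script is not included in this card):

```python
# The listed hyperparameters mirrored as TrainingArguments. Two GPUs are
# assumed to be provided by the launcher (e.g. accelerate), so the effective
# batch is 4 per device x 2 devices x 2 accumulation steps = 16.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-dpop-qlora-uf-ours-uffull-5e-6",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",  # Adam with betas=(0.9, 0.999) and eps=1e-8 (defaults)
    seed=42,
)
```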

### Training results

| Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:---------------:|:----------:|:--------------:|:----------------:|:------------------:|:---------------:|:-------------------:|:-------------------:|:-------------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6915        | 0.02  | 100  | 0.6917          | 0.0059          | 0.6910     | 0.0266         | 0.0222           | 0.6170             | 0.0043          | 0.0246              | -0.0134             | 0.0126              | -263.6297      | -281.7968    | -2.7663         | -2.8014       |
| 0.6797        | 0.05  | 200  | 0.6897          | 0.0702          | 0.6800     | 0.0886         | 0.0608           | 0.6570             | 0.0278          | 0.1378              | -0.0648             | 0.0675              | -259.7737      | -275.5939    | -2.7413         | -2.7759       |
| 0.6804        | 0.07  | 300  | 0.6845          | 0.0848          | 0.6724     | 0.1325         | 0.0877           | 0.6675             | 0.0448          | 0.2086              | -0.0924             | 0.1004              | -257.0813      | -271.2012    | -2.7504         | -2.7853       |
| 0.6951        | 0.1   | 400  | 0.6829          | 0.1179          | 0.6671     | 0.1575         | 0.1005           | 0.6715             | 0.0570          | 0.2589              | -0.1125             | 0.1237              | -255.7986      | -268.7028    | -2.6989         | -2.7337       |
| 0.6599        | 0.12  | 500  | 0.6868          | 0.1747          | 0.6620     | 0.1717         | 0.1030           | 0.6805             | 0.0688          | 0.2913              | -0.1240             | 0.1393              | -255.5571      | -267.2820    | -2.6656         | -2.7019       |
| 0.6899        | 0.14  | 600  | 0.6773          | 0.1322          | 0.6631     | 0.1930         | 0.1265           | 0.6805             | 0.0665          | 0.2912              | -0.1245             | 0.1385              | -253.2036      | -265.1512    | -2.6976         | -2.7346       |
| 0.6596        | 0.17  | 700  | 0.6841          | 0.2476          | 0.6579     | 0.1952         | 0.1160           | 0.6790             | 0.0792          | 0.3399              | -0.1420             | 0.1603              | -254.2511      | -264.9378    | -2.6481         | -2.6842       |
| 0.6618        | 0.19  | 800  | 0.7055          | 0.6819          | 0.6582     | 0.1938         | 0.1128           | 0.6725             | 0.0810          | 0.3642              | -0.1653             | 0.1763              | -254.5748      | -265.0780    | -2.6749         | -2.7097       |
| 0.6742        | 0.22  | 900  | 0.7031          | 0.6125          | 0.6568     | 0.1979         | 0.1141           | 0.6810             | 0.0839          | 0.3706              | -0.1651             | 0.1783              | -254.4471      | -264.6613    | -2.6218         | -2.6566       |
| 0.6751        | 0.24  | 1000 | 0.7010          | 0.6677          | 0.6601     | 0.2068         | 0.1295           | 0.6755             | 0.0773          | 0.3517              | -0.1632             | 0.1718              | -252.9047      | -263.7737    | -2.6192         | -2.6553       |
| 0.7098        | 0.26  | 1100 | 0.7131          | 0.8234          | 0.6548     | 0.1971         | 0.1068           | 0.6775             | 0.0903          | 0.3961              | -0.1800             | 0.1920              | -255.1729      | -264.7435    | -2.6144         | -2.6518       |
| 0.6678        | 0.29  | 1200 | 0.7126          | 0.8054          | 0.6533     | 0.2007         | 0.1068           | 0.6810             | 0.0938          | 0.4066              | -0.1769             | 0.1949              | -255.1695      | -264.3879    | -2.5888         | -2.6260       |
| 0.6611        | 0.31  | 1300 | 0.7072          | 0.7968          | 0.6584     | 0.2114         | 0.1291           | 0.6725             | 0.0823          | 0.3729              | -0.1733             | 0.1825              | -252.9392      | -263.3107    | -2.5893         | -2.6265       |
| 0.6852        | 0.34  | 1400 | 0.7117          | 0.8828          | 0.6578     | 0.2125         | 0.1283           | 0.6865             | 0.0842          | 0.3801              | -0.1702             | 0.1839              | -253.0243      | -263.2099    | -2.5908         | -2.6269       |
| 0.7148        | 0.36  | 1500 | 0.7147          | 0.8994          | 0.6537     | 0.2082         | 0.1146           | 0.6775             | 0.0936          | 0.4107              | -0.1826             | 0.1980              | -254.3940      | -263.6350    | -2.5606         | -2.5971       |
| 0.734         | 0.38  | 1600 | 0.7263          | 0.9562          | 0.6467     | 0.1975         | 0.0887           | 0.7005             | 0.1088          | 0.4496              | -0.1881             | 0.2128              | -256.9880      | -264.7073    | -2.5414         | -2.5748       |
| 0.68          | 0.41  | 1700 | 0.6886          | 0.4934          | 0.6531     | 0.2201         | 0.1281           | 0.6895             | 0.0920          | 0.3890              | -0.1655             | 0.1858              | -253.0398      | -262.4442    | -2.6144         | -2.6469       |
| 0.9221        | 0.43  | 1800 | 0.6972          | 0.5938          | 0.6479     | 0.2127         | 0.1083           | 0.6855             | 0.1044          | 0.4219              | -0.1737             | 0.2001              | -255.0207      | -263.1860    | -2.6572         | -2.6883       |
| 0.6965        | 0.45  | 1900 | 0.7029          | 0.5493          | 0.6415     | 0.2047         | 0.0857           | 0.6980             | 0.1190          | 0.4554              | -0.1734             | 0.2113              | -257.2836      | -263.9902    | -2.6385         | -2.6680       |
| 0.6754        | 0.48  | 2000 | 0.6736          | 0.2085          | 0.6476     | 0.2262         | 0.1217           | 0.6960             | 0.1045          | 0.4193              | -0.1652             | 0.1960              | -253.6813      | -261.8383    | -2.6573         | -2.6879       |
| 0.6527        | 0.5   | 2100 | 0.6734          | 0.1901          | 0.6479     | 0.2309         | 0.1262           | 0.6940             | 0.1046          | 0.4298              | -0.1721             | 0.2013              | -253.2316      | -261.3691    | -2.6274         | -2.6587       |
| 0.6693        | 0.53  | 2200 | 0.6811          | 0.3594          | 0.6470     | 0.2250         | 0.1186           | 0.6885             | 0.1064          | 0.4311              | -0.1714             | 0.2022              | -253.9932      | -261.9567    | -2.6328         | -2.6644       |
| 0.6652        | 0.55  | 2300 | 0.6946          | 0.5078          | 0.6431     | 0.2178         | 0.1017           | 0.6895             | 0.1161          | 0.4629              | -0.1818             | 0.2158              | -255.6816      | -262.6781    | -2.6122         | -2.6429       |
| 0.6511        | 0.57  | 2400 | 0.6755          | 0.2132          | 0.6463     | 0.2309         | 0.1228           | 0.6960             | 0.1081          | 0.4351              | -0.1715             | 0.2030              | -253.5698      | -261.3663    | -2.6075         | -2.6392       |
| 0.6512        | 0.6   | 2500 | 0.7102          | 0.5940          | 0.6370     | 0.2139         | 0.0822           | 0.6990             | 0.1318          | 0.5141              | -0.1918             | 0.2364              | -257.6378      | -263.0636    | -2.6184         | -2.6519       |
| 0.7342        | 0.62  | 2600 | 0.6884          | 0.3826          | 0.6413     | 0.2233         | 0.1023           | 0.7040             | 0.1210          | 0.4842              | -0.1791             | 0.2219              | -255.6233      | -262.1221    | -2.6165         | -2.6506       |
| 0.6754        | 0.65  | 2700 | 0.6847          | 0.3415          | 0.6419     | 0.2283         | 0.1092           | 0.7055             | 0.1192          | 0.4752              | -0.1765             | 0.2181              | -254.9368      | -261.6212    | -2.6158         | -2.6511       |
| 0.7445        | 0.67  | 2800 | 0.6769          | 0.2621          | 0.6445     | 0.2313         | 0.1188           | 0.7020             | 0.1125          | 0.4532              | -0.1690             | 0.2084              | -253.9747      | -261.3299    | -2.6176         | -2.6513       |
| 0.6656        | 0.69  | 2900 | 0.6867          | 0.4407          | 0.6412     | 0.2299         | 0.1090           | 0.7045             | 0.1208          | 0.4813              | -0.1757             | 0.2199              | -254.9489      | -261.4680    | -2.6212         | -2.6566       |
| 0.6641        | 0.72  | 3000 | 0.6918          | 0.5290          | 0.6395     | 0.2278         | 0.1026           | 0.7025             | 0.1252          | 0.4930              | -0.1780             | 0.2250              | -255.5911      | -261.6767    | -2.6344         | -2.6687       |
| 0.6752        | 0.74  | 3100 | 0.6963          | 0.6115          | 0.6398     | 0.2272         | 0.1021           | 0.7030             | 0.1252          | 0.5000              | -0.1806             | 0.2279              | -255.6473      | -261.7339    | -2.6282         | -2.6628       |
| 0.6417        | 0.77  | 3200 | 0.7057          | 0.7185          | 0.6364     | 0.2246         | 0.0908           | 0.7040             | 0.1338          | 0.5276              | -0.1863             | 0.2394              | -256.7738      | -261.9981    | -2.6277         | -2.6619       |
| 0.6436        | 0.79  | 3300 | 0.7146          | 0.8124          | 0.6342     | 0.2203         | 0.0808           | 0.7040             | 0.1395          | 0.5452              | -0.1905             | 0.2463              | -257.7732      | -262.4228    | -2.6190         | -2.6530       |
| 0.7092        | 0.81  | 3400 | 0.6972          | 0.6209          | 0.6389     | 0.2266         | 0.0993           | 0.7015             | 0.1273          | 0.5073              | -0.1826             | 0.2310              | -255.9223      | -261.7928    | -2.6091         | -2.6431       |
| 0.6491        | 0.84  | 3500 | 0.6972          | 0.6241          | 0.6390     | 0.2273         | 0.1003           | 0.7020             | 0.1270          | 0.5062              | -0.1824             | 0.2306              | -255.8255      | -261.7234    | -2.6038         | -2.6383       |
| 0.6879        | 0.86  | 3600 | 0.7091          | 0.7585          | 0.6353     | 0.2220         | 0.0856           | 0.7060             | 0.1364          | 0.5352              | -0.1870             | 0.2418              | -257.2982      | -262.2594    | -2.6103         | -2.6440       |
| 0.6129        | 0.89  | 3700 | 0.7033          | 0.6942          | 0.6366     | 0.2255         | 0.0924           | 0.7065             | 0.1331          | 0.5254              | -0.1849             | 0.2379              | -256.6156      | -261.9067    | -2.6075         | -2.6417       |
| 0.6578        | 0.91  | 3800 | 0.6956          | 0.5982          | 0.6385     | 0.2286         | 0.1002           | 0.7040             | 0.1284          | 0.5109              | -0.1818             | 0.2321              | -255.8333      | -261.5916    | -2.6073         | -2.6413       |
| 0.6535        | 0.93  | 3900 | 0.6949          | 0.5854          | 0.6383     | 0.2289         | 0.1000           | 0.7045             | 0.1288          | 0.5118              | -0.1813             | 0.2323              | -255.8504      | -261.5681    | -2.6069         | -2.6411       |
| 0.6876        | 0.96  | 4000 | 0.6951          | 0.5831          | 0.6380     | 0.2289         | 0.0994           | 0.7035             | 0.1295          | 0.5141              | -0.1813             | 0.2330              | -255.9116      | -261.5652    | -2.6055         | -2.6398       |
| 0.6531        | 0.98  | 4100 | 0.6952          | 0.5853          | 0.6381     | 0.2289         | 0.0995           | 0.7040             | 0.1294          | 0.5136              | -0.1815             | 0.2329              | -255.9032      | -261.5644    | -2.6099         | -2.6438       |


### Framework versions

- PEFT 0.7.1
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2