YYYYYYibo committed on
Commit 3e29067
1 Parent(s): 99653e7

Model save
README.md ADDED
@@ -0,0 +1,77 @@
+ ---
+ license: apache-2.0
+ library_name: peft
+ tags:
+ - trl
+ - dpo
+ - generated_from_trainer
+ base_model: mistralai/Mistral-7B-v0.1
+ model-index:
+ - name: zephyr-7b-dpo-qlora
+ results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # zephyr-7b-dpo-qlora
+
+ This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on an unspecified dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.5735
+ - Rewards/chosen: -0.6770
+ - Rewards/rejected: -1.1070
+ - Rewards/accuracies: 0.6940
+ - Rewards/margins: 0.4300
+ - Logps/rejected: -351.8942
+ - Logps/chosen: -331.1508
+ - Logits/rejected: -1.4599
+ - Logits/chosen: -1.7015
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-06
+ - train_batch_size: 2
+ - eval_batch_size: 2
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 4
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 32
+ - total_eval_batch_size: 8
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 1
+
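The effective batch size above follows from the per-device batch size, the device count, and the gradient accumulation steps. A minimal arithmetic check (variable names are illustrative; the sample count of 10,000 is taken from all_results.json in this commit):

```python
# Sanity check of the batch-size arithmetic in the hyperparameter list.
train_batch_size = 2             # per-device micro-batch
num_devices = 4
gradient_accumulation_steps = 4

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)    # 32, matching total_train_batch_size above

# With 10,000 training samples and one epoch, the number of optimizer steps
# is floor(10000 / 32) = 312, matching global_step in trainer_state.json.
train_samples = 10000
steps = train_samples // total_train_batch_size
print(steps)                     # 312
```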
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+ |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+ | 0.6269 | 0.32 | 100 | 0.6269 | -0.2377 | -0.4431 | 0.6820 | 0.2054 | -285.4985 | -287.2169 | -2.2566 | -2.3666 |
+ | 0.6332 | 0.64 | 200 | 0.5821 | -0.5909 | -0.9588 | 0.7060 | 0.3679 | -337.0687 | -322.5442 | -1.6871 | -1.8938 |
+ | 0.5648 | 0.96 | 300 | 0.5735 | -0.6770 | -1.1070 | 0.6940 | 0.4300 | -351.8942 | -331.1508 | -1.4599 | -1.7015 |
+
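For a single preference pair, the DPO objective is the negative log-sigmoid of the implicit reward margin. A sketch (assumed function name) using the final eval margin from the table; note the reported eval loss of 0.5735 is an average over all pairs, so it does not equal the loss evaluated at the mean margin:

```python
import math

def dpo_pointwise_loss(reward_chosen: float, reward_rejected: float) -> float:
    """DPO loss for one pair: -log sigmoid(margin) on the implicit rewards."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Final eval rewards from the table above (margin = 0.4300).
loss_at_mean_margin = dpo_pointwise_loss(-0.6770, -1.1070)
print(round(loss_at_mean_margin, 4))  # ~0.5011, below the averaged 0.5735
```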
+
+ ### Framework versions
+
+ - PEFT 0.7.1
+ - Transformers 4.36.2
+ - PyTorch 2.2.2+cu121
+ - Datasets 2.14.6
+ - Tokenizers 0.15.2
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:5f218add99e924b421593d76eae2b8befa6912c7ea10bd6c21fc9114735b1983
+ oid sha256:28defe27fdc102149779b9769d25766751de171b48982e943278effb731ebb99
  size 671150064
all_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 1.0,
+ "train_loss": 0.6116275028922619,
+ "train_runtime": 6907.8509,
+ "train_samples": 10000,
+ "train_samples_per_second": 1.448,
+ "train_steps_per_second": 0.045
+ }
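The throughput fields in all_results.json are derivable from the runtime and sample count. A quick consistency check (values copied from the JSON above; `global_step` of 312 comes from trainer_state.json):

```python
train_samples = 10000
train_runtime = 6907.8509   # seconds
global_step = 312           # optimizer steps, from trainer_state.json

samples_per_second = train_samples / train_runtime
steps_per_second = global_step / train_runtime
print(round(samples_per_second, 3))  # 1.448, matching train_samples_per_second
print(round(steps_per_second, 3))    # 0.045, matching train_steps_per_second
```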
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 1.0,
+ "train_loss": 0.6116275028922619,
+ "train_runtime": 6907.8509,
+ "train_samples": 10000,
+ "train_samples_per_second": 1.448,
+ "train_steps_per_second": 0.045
+ }
trainer_state.json ADDED
@@ -0,0 +1,526 @@
+ {
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 0.9984,
+ "eval_steps": 100,
+ "global_step": 312,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.0,
+ "learning_rate": 1.5625e-07,
+ "logits/chosen": -2.7731900215148926,
+ "logits/rejected": -2.6362287998199463,
+ "logps/chosen": -356.1260070800781,
+ "logps/rejected": -311.3892822265625,
+ "loss": 0.6931,
+ "rewards/accuracies": 0.0,
+ "rewards/chosen": 0.0,
+ "rewards/margins": 0.0,
+ "rewards/rejected": 0.0,
+ "step": 1
+ },
+ {
+ "epoch": 0.03,
+ "learning_rate": 1.5625e-06,
+ "logits/chosen": -2.3915464878082275,
+ "logits/rejected": -2.3424172401428223,
+ "logps/chosen": -243.08827209472656,
+ "logps/rejected": -240.88124084472656,
+ "loss": 0.6927,
+ "rewards/accuracies": 0.5,
+ "rewards/chosen": 0.004824994597584009,
+ "rewards/margins": 0.001562346238642931,
+ "rewards/rejected": 0.003262649057433009,
+ "step": 10
+ },
+ {
+ "epoch": 0.06,
+ "learning_rate": 3.125e-06,
+ "logits/chosen": -2.6855998039245605,
+ "logits/rejected": -2.503112316131592,
+ "logps/chosen": -276.1568908691406,
+ "logps/rejected": -245.57150268554688,
+ "loss": 0.687,
+ "rewards/accuracies": 0.6499999761581421,
+ "rewards/chosen": 0.04376252368092537,
+ "rewards/margins": 0.011996113695204258,
+ "rewards/rejected": 0.03176640719175339,
+ "step": 20
+ },
+ {
+ "epoch": 0.1,
+ "learning_rate": 4.6875000000000004e-06,
+ "logits/chosen": -2.5015687942504883,
+ "logits/rejected": -2.448686122894287,
+ "logps/chosen": -244.99642944335938,
+ "logps/rejected": -249.10916137695312,
+ "loss": 0.6832,
+ "rewards/accuracies": 0.625,
+ "rewards/chosen": 0.05081823468208313,
+ "rewards/margins": 0.018069546669721603,
+ "rewards/rejected": 0.032748688012361526,
+ "step": 30
+ },
+ {
+ "epoch": 0.13,
+ "learning_rate": 4.989935734988098e-06,
+ "logits/chosen": -2.5158021450042725,
+ "logits/rejected": -2.337573289871216,
+ "logps/chosen": -272.7821350097656,
+ "logps/rejected": -227.36007690429688,
+ "loss": 0.6668,
+ "rewards/accuracies": 0.675000011920929,
+ "rewards/chosen": 0.0395994558930397,
+ "rewards/margins": 0.04740050435066223,
+ "rewards/rejected": -0.00780104985460639,
+ "step": 40
+ },
+ {
+ "epoch": 0.16,
+ "learning_rate": 4.949188496058089e-06,
+ "logits/chosen": -2.430145502090454,
+ "logits/rejected": -2.4263150691986084,
+ "logps/chosen": -249.273681640625,
+ "logps/rejected": -266.5956726074219,
+ "loss": 0.6528,
+ "rewards/accuracies": 0.6625000238418579,
+ "rewards/chosen": 0.02035255916416645,
+ "rewards/margins": 0.059511054307222366,
+ "rewards/rejected": -0.03915848955512047,
+ "step": 50
+ },
+ {
+ "epoch": 0.19,
+ "learning_rate": 4.8776412907378845e-06,
+ "logits/chosen": -2.4893181324005127,
+ "logits/rejected": -2.418604612350464,
+ "logps/chosen": -301.8047790527344,
+ "logps/rejected": -252.42892456054688,
+ "loss": 0.6591,
+ "rewards/accuracies": 0.612500011920929,
+ "rewards/chosen": -0.06781601160764694,
+ "rewards/margins": 0.07723621279001236,
+ "rewards/rejected": -0.1450522094964981,
+ "step": 60
+ },
+ {
+ "epoch": 0.22,
+ "learning_rate": 4.7761938666470405e-06,
+ "logits/chosen": -2.4578957557678223,
+ "logits/rejected": -2.4078097343444824,
+ "logps/chosen": -259.1146545410156,
+ "logps/rejected": -255.2762908935547,
+ "loss": 0.6412,
+ "rewards/accuracies": 0.7250000238418579,
+ "rewards/chosen": -0.09445185959339142,
+ "rewards/margins": 0.15250881016254425,
+ "rewards/rejected": -0.24696068465709686,
+ "step": 70
+ },
+ {
+ "epoch": 0.26,
+ "learning_rate": 4.646121984004666e-06,
+ "logits/chosen": -2.5219717025756836,
+ "logits/rejected": -2.3697924613952637,
+ "logps/chosen": -289.8721618652344,
+ "logps/rejected": -306.9769287109375,
+ "loss": 0.6357,
+ "rewards/accuracies": 0.6499999761581421,
+ "rewards/chosen": -0.21299012005329132,
+ "rewards/margins": 0.16702046990394592,
+ "rewards/rejected": -0.38001060485839844,
+ "step": 80
+ },
+ {
+ "epoch": 0.29,
+ "learning_rate": 4.4890613722044526e-06,
+ "logits/chosen": -2.427962064743042,
+ "logits/rejected": -2.326305866241455,
+ "logps/chosen": -265.3756408691406,
+ "logps/rejected": -262.7252197265625,
+ "loss": 0.6272,
+ "rewards/accuracies": 0.737500011920929,
+ "rewards/chosen": -0.16604574024677277,
+ "rewards/margins": 0.19639183580875397,
+ "rewards/rejected": -0.36243754625320435,
+ "step": 90
+ },
+ {
+ "epoch": 0.32,
+ "learning_rate": 4.3069871595684795e-06,
+ "logits/chosen": -2.213723659515381,
+ "logits/rejected": -2.217102527618408,
+ "logps/chosen": -245.6179962158203,
+ "logps/rejected": -293.12518310546875,
+ "loss": 0.6269,
+ "rewards/accuracies": 0.612500011920929,
+ "rewards/chosen": -0.38440248370170593,
+ "rewards/margins": 0.126637801527977,
+ "rewards/rejected": -0.5110402703285217,
+ "step": 100
+ },
+ {
+ "epoch": 0.32,
+ "eval_logits/chosen": -2.3665878772735596,
+ "eval_logits/rejected": -2.256598949432373,
+ "eval_logps/chosen": -287.2168884277344,
+ "eval_logps/rejected": -285.49847412109375,
+ "eval_loss": 0.6268974542617798,
+ "eval_rewards/accuracies": 0.6819999814033508,
+ "eval_rewards/chosen": -0.23765824735164642,
+ "eval_rewards/margins": 0.2054254114627838,
+ "eval_rewards/rejected": -0.44308364391326904,
+ "eval_runtime": 543.2957,
+ "eval_samples_per_second": 3.681,
+ "eval_steps_per_second": 0.46,
+ "step": 100
+ },
+ {
+ "epoch": 0.35,
+ "learning_rate": 4.102189034962561e-06,
+ "logits/chosen": -2.338050603866577,
+ "logits/rejected": -2.2199347019195557,
+ "logps/chosen": -304.7019958496094,
+ "logps/rejected": -283.5575256347656,
+ "loss": 0.6232,
+ "rewards/accuracies": 0.6875,
+ "rewards/chosen": -0.18706437945365906,
+ "rewards/margins": 0.24619019031524658,
+ "rewards/rejected": -0.43325456976890564,
+ "step": 110
+ },
+ {
+ "epoch": 0.38,
+ "learning_rate": 3.8772424536302565e-06,
+ "logits/chosen": -2.199939250946045,
+ "logits/rejected": -2.1462173461914062,
+ "logps/chosen": -280.5738525390625,
+ "logps/rejected": -272.75537109375,
+ "loss": 0.6256,
+ "rewards/accuracies": 0.7124999761581421,
+ "rewards/chosen": -0.23016035556793213,
+ "rewards/margins": 0.2529276907444,
+ "rewards/rejected": -0.48308807611465454,
+ "step": 120
+ },
+ {
+ "epoch": 0.42,
+ "learning_rate": 3.634976249348867e-06,
+ "logits/chosen": -2.4285922050476074,
+ "logits/rejected": -2.252119541168213,
+ "logps/chosen": -337.8984375,
+ "logps/rejected": -329.4248962402344,
+ "loss": 0.6299,
+ "rewards/accuracies": 0.737500011920929,
+ "rewards/chosen": -0.27586501836776733,
+ "rewards/margins": 0.285078763961792,
+ "rewards/rejected": -0.5609437823295593,
+ "step": 130
+ },
+ {
+ "epoch": 0.45,
+ "learning_rate": 3.3784370602033572e-06,
+ "logits/chosen": -2.072373628616333,
+ "logits/rejected": -1.9053455591201782,
+ "logps/chosen": -251.76571655273438,
+ "logps/rejected": -285.0694885253906,
+ "loss": 0.6067,
+ "rewards/accuracies": 0.6875,
+ "rewards/chosen": -0.5010747909545898,
+ "rewards/margins": 0.26867786049842834,
+ "rewards/rejected": -0.7697526216506958,
+ "step": 140
+ },
+ {
+ "epoch": 0.48,
+ "learning_rate": 3.1108510153447352e-06,
+ "logits/chosen": -2.21221661567688,
+ "logits/rejected": -2.136280059814453,
+ "logps/chosen": -338.2016296386719,
+ "logps/rejected": -331.0526428222656,
+ "loss": 0.608,
+ "rewards/accuracies": 0.625,
+ "rewards/chosen": -0.639680027961731,
+ "rewards/margins": 0.2550516426563263,
+ "rewards/rejected": -0.8947317004203796,
+ "step": 150
+ },
+ {
+ "epoch": 0.51,
+ "learning_rate": 2.835583164544139e-06,
+ "logits/chosen": -2.2209646701812744,
+ "logits/rejected": -2.022948980331421,
+ "logps/chosen": -377.3534851074219,
+ "logps/rejected": -344.77252197265625,
+ "loss": 0.5937,
+ "rewards/accuracies": 0.737500011920929,
+ "rewards/chosen": -0.7194479703903198,
+ "rewards/margins": 0.39620086550712585,
+ "rewards/rejected": -1.115648865699768,
+ "step": 160
+ },
+ {
+ "epoch": 0.54,
+ "learning_rate": 2.556095160739513e-06,
+ "logits/chosen": -2.1350314617156982,
+ "logits/rejected": -1.85476553440094,
+ "logps/chosen": -351.29638671875,
+ "logps/rejected": -354.8650817871094,
+ "loss": 0.6069,
+ "rewards/accuracies": 0.612500011920929,
+ "rewards/chosen": -0.7903974652290344,
+ "rewards/margins": 0.24958536028862,
+ "rewards/rejected": -1.039982795715332,
+ "step": 170
+ },
+ {
+ "epoch": 0.58,
+ "learning_rate": 2.2759017277414165e-06,
+ "logits/chosen": -2.0943400859832764,
+ "logits/rejected": -1.8893616199493408,
+ "logps/chosen": -322.147216796875,
+ "logps/rejected": -327.81304931640625,
+ "loss": 0.6252,
+ "rewards/accuracies": 0.625,
+ "rewards/chosen": -0.6768954992294312,
+ "rewards/margins": 0.20395174622535706,
+ "rewards/rejected": -0.8808472752571106,
+ "step": 180
+ },
+ {
+ "epoch": 0.61,
+ "learning_rate": 1.9985264605418185e-06,
+ "logits/chosen": -1.9419981241226196,
+ "logits/rejected": -1.7324016094207764,
+ "logps/chosen": -328.23760986328125,
+ "logps/rejected": -314.13922119140625,
+ "loss": 0.584,
+ "rewards/accuracies": 0.7250000238418579,
+ "rewards/chosen": -0.4452829360961914,
+ "rewards/margins": 0.4189114570617676,
+ "rewards/rejected": -0.864194393157959,
+ "step": 190
+ },
+ {
+ "epoch": 0.64,
+ "learning_rate": 1.7274575140626318e-06,
+ "logits/chosen": -2.144902229309082,
+ "logits/rejected": -1.7156444787979126,
+ "logps/chosen": -362.327880859375,
+ "logps/rejected": -322.9747619628906,
+ "loss": 0.6332,
+ "rewards/accuracies": 0.7124999761581421,
+ "rewards/chosen": -0.4486660957336426,
+ "rewards/margins": 0.3436250388622284,
+ "rewards/rejected": -0.7922911047935486,
+ "step": 200
+ },
+ {
+ "epoch": 0.64,
+ "eval_logits/chosen": -1.893760323524475,
+ "eval_logits/rejected": -1.6871448755264282,
+ "eval_logps/chosen": -322.544189453125,
+ "eval_logps/rejected": -337.0687255859375,
+ "eval_loss": 0.5820500254631042,
+ "eval_rewards/accuracies": 0.7059999704360962,
+ "eval_rewards/chosen": -0.5909315943717957,
+ "eval_rewards/margins": 0.3678547739982605,
+ "eval_rewards/rejected": -0.9587863683700562,
+ "eval_runtime": 543.1459,
+ "eval_samples_per_second": 3.682,
+ "eval_steps_per_second": 0.46,
+ "step": 200
+ },
+ {
+ "epoch": 0.67,
+ "learning_rate": 1.466103737583699e-06,
+ "logits/chosen": -1.8559290170669556,
+ "logits/rejected": -1.7014697790145874,
+ "logps/chosen": -324.19256591796875,
+ "logps/rejected": -352.70697021484375,
+ "loss": 0.552,
+ "rewards/accuracies": 0.762499988079071,
+ "rewards/chosen": -0.6294658780097961,
+ "rewards/margins": 0.45733365416526794,
+ "rewards/rejected": -1.0867995023727417,
+ "step": 210
+ },
+ {
+ "epoch": 0.7,
+ "learning_rate": 1.217751806485235e-06,
+ "logits/chosen": -1.8568174839019775,
+ "logits/rejected": -1.6362855434417725,
+ "logps/chosen": -356.0939636230469,
+ "logps/rejected": -389.1434326171875,
+ "loss": 0.5765,
+ "rewards/accuracies": 0.75,
+ "rewards/chosen": -0.6206706762313843,
+ "rewards/margins": 0.5877247452735901,
+ "rewards/rejected": -1.2083956003189087,
+ "step": 220
+ },
+ {
+ "epoch": 0.74,
+ "learning_rate": 9.855248903979505e-07,
+ "logits/chosen": -1.9677798748016357,
+ "logits/rejected": -1.8001766204833984,
+ "logps/chosen": -333.63409423828125,
+ "logps/rejected": -372.82232666015625,
+ "loss": 0.5961,
+ "rewards/accuracies": 0.75,
+ "rewards/chosen": -0.6273213028907776,
+ "rewards/margins": 0.4957484304904938,
+ "rewards/rejected": -1.1230696439743042,
+ "step": 230
+ },
+ {
+ "epoch": 0.77,
+ "learning_rate": 7.723433775328385e-07,
+ "logits/chosen": -1.6269299983978271,
+ "logits/rejected": -1.5314247608184814,
+ "logps/chosen": -343.7135314941406,
+ "logps/rejected": -360.920166015625,
+ "loss": 0.5733,
+ "rewards/accuracies": 0.7124999761581421,
+ "rewards/chosen": -0.6133186221122742,
+ "rewards/margins": 0.3855026662349701,
+ "rewards/rejected": -0.9988213777542114,
+ "step": 240
+ },
+ {
+ "epoch": 0.8,
+ "learning_rate": 5.808881491049723e-07,
+ "logits/chosen": -1.8092960119247437,
+ "logits/rejected": -1.4363586902618408,
+ "logps/chosen": -302.46234130859375,
+ "logps/rejected": -305.09393310546875,
+ "loss": 0.5822,
+ "rewards/accuracies": 0.6875,
+ "rewards/chosen": -0.7107473611831665,
+ "rewards/margins": 0.3086285889148712,
+ "rewards/rejected": -1.0193760395050049,
+ "step": 250
+ },
+ {
+ "epoch": 0.83,
+ "learning_rate": 4.1356686569674344e-07,
+ "logits/chosen": -2.0522053241729736,
+ "logits/rejected": -1.6467043161392212,
+ "logps/chosen": -370.14019775390625,
+ "logps/rejected": -350.79364013671875,
+ "loss": 0.5594,
+ "rewards/accuracies": 0.699999988079071,
+ "rewards/chosen": -0.6428462266921997,
+ "rewards/margins": 0.41562938690185547,
+ "rewards/rejected": -1.0584756135940552,
+ "step": 260
+ },
+ {
+ "epoch": 0.86,
+ "learning_rate": 2.7248368952908055e-07,
+ "logits/chosen": -1.736702561378479,
+ "logits/rejected": -1.5174537897109985,
+ "logps/chosen": -293.9969177246094,
+ "logps/rejected": -317.34844970703125,
+ "loss": 0.5803,
+ "rewards/accuracies": 0.7250000238418579,
+ "rewards/chosen": -0.5701481103897095,
+ "rewards/margins": 0.500605583190918,
+ "rewards/rejected": -1.0707536935806274,
+ "step": 270
+ },
+ {
+ "epoch": 0.9,
+ "learning_rate": 1.59412823400657e-07,
+ "logits/chosen": -1.6159837245941162,
+ "logits/rejected": -1.2289941310882568,
+ "logps/chosen": -342.9421081542969,
+ "logps/rejected": -372.6164855957031,
+ "loss": 0.5578,
+ "rewards/accuracies": 0.737500011920929,
+ "rewards/chosen": -0.7729519605636597,
+ "rewards/margins": 0.49912238121032715,
+ "rewards/rejected": -1.2720743417739868,
+ "step": 280
+ },
+ {
+ "epoch": 0.93,
+ "learning_rate": 7.577619905828281e-08,
+ "logits/chosen": -1.502423644065857,
+ "logits/rejected": -1.456081509590149,
+ "logps/chosen": -329.8805236816406,
+ "logps/rejected": -356.34417724609375,
+ "loss": 0.5827,
+ "rewards/accuracies": 0.7124999761581421,
+ "rewards/chosen": -0.678324818611145,
+ "rewards/margins": 0.42007485032081604,
+ "rewards/rejected": -1.0983997583389282,
+ "step": 290
+ },
+ {
+ "epoch": 0.96,
+ "learning_rate": 2.262559558016325e-08,
+ "logits/chosen": -1.6769917011260986,
+ "logits/rejected": -1.406165361404419,
+ "logps/chosen": -328.3318786621094,
+ "logps/rejected": -353.73968505859375,
+ "loss": 0.5648,
+ "rewards/accuracies": 0.6625000238418579,
+ "rewards/chosen": -0.6493935585021973,
+ "rewards/margins": 0.43068727850914,
+ "rewards/rejected": -1.0800807476043701,
+ "step": 300
+ },
+ {
+ "epoch": 0.96,
+ "eval_logits/chosen": -1.7015434503555298,
+ "eval_logits/rejected": -1.4598934650421143,
+ "eval_logps/chosen": -331.1508483886719,
+ "eval_logps/rejected": -351.8941955566406,
+ "eval_loss": 0.5735270977020264,
+ "eval_rewards/accuracies": 0.6940000057220459,
+ "eval_rewards/chosen": -0.6769981980323792,
+ "eval_rewards/margins": 0.4300425946712494,
+ "eval_rewards/rejected": -1.1070406436920166,
+ "eval_runtime": 542.9185,
+ "eval_samples_per_second": 3.684,
+ "eval_steps_per_second": 0.46,
+ "step": 300
+ },
+ {
+ "epoch": 0.99,
+ "learning_rate": 6.294126437336734e-10,
+ "logits/chosen": -1.7649977207183838,
+ "logits/rejected": -1.521240234375,
+ "logps/chosen": -326.1722717285156,
+ "logps/rejected": -356.2889709472656,
+ "loss": 0.5603,
+ "rewards/accuracies": 0.737500011920929,
+ "rewards/chosen": -0.6321894526481628,
+ "rewards/margins": 0.4693359434604645,
+ "rewards/rejected": -1.1015253067016602,
+ "step": 310
+ },
+ {
+ "epoch": 1.0,
+ "step": 312,
+ "total_flos": 0.0,
+ "train_loss": 0.6116275028922619,
+ "train_runtime": 6907.8509,
+ "train_samples_per_second": 1.448,
+ "train_steps_per_second": 0.045
+ }
+ ],
+ "logging_steps": 10,
+ "max_steps": 312,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 1,
+ "save_steps": 100,
+ "total_flos": 0.0,
+ "train_batch_size": 2,
+ "trial_name": null,
+ "trial_params": null
+ }
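Because `best_model_checkpoint` is null in trainer_state.json, the best evaluation step has to be read off the log. The `log_history` array mixes training and evaluation records; eval records can be picked out by the presence of an `eval_loss` key. A minimal sketch over a hand-copied excerpt of the eval entries above:

```python
# Excerpt of the three eval records from log_history above.
log_history = [
    {"step": 100, "eval_loss": 0.6268974542617798},
    {"step": 200, "eval_loss": 0.5820500254631042},
    {"step": 300, "eval_loss": 0.5735270977020264},
]

# Keep only records that carry an eval_loss, then take the minimum.
evals = [e for e in log_history if "eval_loss" in e]
best = min(evals, key=lambda e: e["eval_loss"])
print(best["step"], round(best["eval_loss"], 4))  # 300 0.5735
```

The same filter works on the full `trainer_state.json` loaded with `json.load`, since train-step records never contain `eval_loss`.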