lole25 committed
Commit
84e4e18
1 Parent(s): 198ab7f

Model save

README.md ADDED
@@ -0,0 +1,62 @@
+ ---
+ license: apache-2.0
+ library_name: peft
+ tags:
+ - trl
+ - dpo
+ - generated_from_trainer
+ base_model: DUAL-GPO/phi-2-irepo-chatml-merged-i1
+ model-index:
+ - name: phi-2-irepo-chatml-v1-i2
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # phi-2-irepo-chatml-v1-i2
+
+ This model is a fine-tuned version of [DUAL-GPO/phi-2-irepo-chatml-merged-i1](https://huggingface.co/DUAL-GPO/phi-2-irepo-chatml-merged-i1) on an unspecified dataset.
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-06
+ - train_batch_size: 4
+ - eval_batch_size: 4
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 2
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 32
+ - total_eval_batch_size: 8
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 1
+
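These hyperparameters map almost one-to-one onto a `transformers.TrainingArguments` object (in TRL they would typically be handed to a `DPOTrainer`). The sketch below is illustrative only and is not taken from this repository's training script; the output directory and optimizer name are assumptions, the rest follows the list above.

```python
# Hedged sketch of TrainingArguments matching the card's hyperparameters.
# Not from the original training script; output_dir and optim are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="phi-2-irepo-chatml-v1-i2",  # assumed
    learning_rate=5e-6,
    per_device_train_batch_size=4,   # 4 per device x 2 GPUs x 4 accumulation = 32 effective
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    optim="adamw_torch",             # Adam with betas=(0.9, 0.999), eps=1e-8 (defaults)
)
```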
+ ### Training results
+
+
+
+ ### Framework versions
+
+ - PEFT 0.7.1
+ - Transformers 4.36.2
+ - Pytorch 2.1.2+cu121
+ - Datasets 2.14.6
+ - Tokenizers 0.15.2
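A minimal loading sketch for the committed PEFT adapter follows. It is not part of the committed card: the base model id comes from the metadata above, but the adapter repository id is an assumption (only the name `phi-2-irepo-chatml-v1-i2` appears in the card).

```python
# Hedged usage sketch: attach the PEFT adapter to the stated base model.
# "DUAL-GPO/phi-2-irepo-chatml-v1-i2" is an assumed repo id for this adapter.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "DUAL-GPO/phi-2-irepo-chatml-merged-i1"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, "DUAL-GPO/phi-2-irepo-chatml-v1-i2")
```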
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:d61168f8bba3c7f804416f20e580da71edd3dfa13037021d0a9647dc6d404ef0
+ oid sha256:15a70400a9ae7ba66ab0b9dfa5d1a9beac6a887414ac73d95d2ea90fae9a6c16
 size 335579632
all_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+     "epoch": 1.0,
+     "train_loss": 0.19462941225971966,
+     "train_runtime": 7972.3934,
+     "train_samples": 30000,
+     "train_samples_per_second": 3.763,
+     "train_steps_per_second": 0.118
+ }
runs/May22_02-23-10_gpu4-119-5/events.out.tfevents.1716308984.gpu4-119-5.2153119.0 CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:9cc74d9cbb0eb3cd75e5dec5c81d7df1b36e32c381c9fe63b849c86e44c6fd8c
- size 62460
+ oid sha256:a81baf823b62be7d0c8e5f6535789824f1515c1d4fa2049d5ee61044995acd08
+ size 64716
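The updated TensorBoard event file can be inspected with the standard event reader; a minimal sketch, assuming the run directory named in this commit has been downloaded locally:

```python
# Hedged sketch: list the scalar tags logged in the updated event file.
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

acc = EventAccumulator("runs/May22_02-23-10_gpu4-119-5")
acc.Reload()                    # parses the events.out.tfevents.* file in that directory
print(acc.Tags()["scalars"])    # e.g. the loss and reward metrics logged every 10 steps
```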
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+     "epoch": 1.0,
+     "train_loss": 0.19462941225971966,
+     "train_runtime": 7972.3934,
+     "train_samples": 30000,
+     "train_samples_per_second": 3.763,
+     "train_steps_per_second": 0.118
+ }
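The reported figures are internally consistent with the hyperparameters in the card (30,000 samples, effective batch size 32, ~7,972 s runtime). A quick check, derived here rather than taken from the repository:

```python
# Derived consistency check (not part of the commit).
train_samples = 30000
total_train_batch_size = 32     # 4 per device x 2 GPUs x 4 gradient accumulation steps
train_runtime = 7972.3934       # seconds

steps = train_samples // total_train_batch_size
print(steps)                                      # 937, matching global_step below
print(round(train_samples / train_runtime, 3))    # 3.763 samples per second
print(round(steps / train_runtime, 3))            # 0.118 steps per second
```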
trainer_state.json ADDED
@@ -0,0 +1,1346 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 0.9994666666666666,
5
+ "eval_steps": 500,
6
+ "global_step": 937,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.0,
13
+ "learning_rate": 5.319148936170213e-08,
14
+ "logits/chosen": -0.31276124715805054,
15
+ "logits/rejected": -0.11341337859630585,
16
+ "logps/chosen": -559.525146484375,
17
+ "logps/rejected": -486.2456970214844,
18
+ "loss": 0.21,
19
+ "rewards/accuracies": 0.0,
20
+ "rewards/chosen": 0.0,
21
+ "rewards/margins": 0.0,
22
+ "rewards/rejected": 0.0,
23
+ "step": 1
24
+ },
25
+ {
26
+ "epoch": 0.01,
27
+ "learning_rate": 5.319148936170213e-07,
28
+ "logits/chosen": -0.20243170857429504,
29
+ "logits/rejected": -0.07215167582035065,
30
+ "logps/chosen": -473.5186767578125,
31
+ "logps/rejected": -507.1302185058594,
32
+ "loss": 0.2065,
33
+ "rewards/accuracies": 0.3541666567325592,
34
+ "rewards/chosen": -7.249015470733866e-05,
35
+ "rewards/margins": 0.00014273211127147079,
36
+ "rewards/rejected": -0.0002152222878066823,
37
+ "step": 10
38
+ },
39
+ {
40
+ "epoch": 0.02,
41
+ "learning_rate": 1.0638297872340427e-06,
42
+ "logits/chosen": -0.18446393311023712,
43
+ "logits/rejected": -0.09755989164113998,
44
+ "logps/chosen": -501.7010803222656,
45
+ "logps/rejected": -487.3160705566406,
46
+ "loss": 0.2124,
47
+ "rewards/accuracies": 0.4124999940395355,
48
+ "rewards/chosen": -5.829105430166237e-05,
49
+ "rewards/margins": 7.958527567097917e-05,
50
+ "rewards/rejected": -0.0001378763117827475,
51
+ "step": 20
52
+ },
53
+ {
54
+ "epoch": 0.03,
55
+ "learning_rate": 1.595744680851064e-06,
56
+ "logits/chosen": -0.15609130263328552,
57
+ "logits/rejected": -0.04423709958791733,
58
+ "logps/chosen": -560.1486206054688,
59
+ "logps/rejected": -544.0206298828125,
60
+ "loss": 0.2048,
61
+ "rewards/accuracies": 0.48750001192092896,
62
+ "rewards/chosen": -0.0003287494764663279,
63
+ "rewards/margins": 0.00016076143947429955,
64
+ "rewards/rejected": -0.0004895109450444579,
65
+ "step": 30
66
+ },
67
+ {
68
+ "epoch": 0.04,
69
+ "learning_rate": 2.1276595744680853e-06,
70
+ "logits/chosen": -0.2074490785598755,
71
+ "logits/rejected": -0.14103737473487854,
72
+ "logps/chosen": -507.80450439453125,
73
+ "logps/rejected": -515.2080078125,
74
+ "loss": 0.214,
75
+ "rewards/accuracies": 0.44999998807907104,
76
+ "rewards/chosen": -0.0010981714585795999,
77
+ "rewards/margins": 0.00048262160271406174,
78
+ "rewards/rejected": -0.0015807930612936616,
79
+ "step": 40
80
+ },
81
+ {
82
+ "epoch": 0.05,
83
+ "learning_rate": 2.6595744680851065e-06,
84
+ "logits/chosen": -0.12519846856594086,
85
+ "logits/rejected": -0.1412961781024933,
86
+ "logps/chosen": -461.9590759277344,
87
+ "logps/rejected": -499.2351989746094,
88
+ "loss": 0.2124,
89
+ "rewards/accuracies": 0.46875,
90
+ "rewards/chosen": -0.0024143296759575605,
91
+ "rewards/margins": 0.0007537025958299637,
92
+ "rewards/rejected": -0.0031680327374488115,
93
+ "step": 50
94
+ },
95
+ {
96
+ "epoch": 0.06,
97
+ "learning_rate": 3.191489361702128e-06,
98
+ "logits/chosen": -0.173623189330101,
99
+ "logits/rejected": -0.03094838559627533,
100
+ "logps/chosen": -551.9820556640625,
101
+ "logps/rejected": -527.4284057617188,
102
+ "loss": 0.2003,
103
+ "rewards/accuracies": 0.4437499940395355,
104
+ "rewards/chosen": -0.00582545343786478,
105
+ "rewards/margins": 0.0019644282292574644,
106
+ "rewards/rejected": -0.007789881434291601,
107
+ "step": 60
108
+ },
109
+ {
110
+ "epoch": 0.07,
111
+ "learning_rate": 3.723404255319149e-06,
112
+ "logits/chosen": -0.161810502409935,
113
+ "logits/rejected": -0.10678007453680038,
114
+ "logps/chosen": -567.8081665039062,
115
+ "logps/rejected": -562.3734130859375,
116
+ "loss": 0.2098,
117
+ "rewards/accuracies": 0.4625000059604645,
118
+ "rewards/chosen": -0.012994857504963875,
119
+ "rewards/margins": 0.003251770045608282,
120
+ "rewards/rejected": -0.016246628016233444,
121
+ "step": 70
122
+ },
123
+ {
124
+ "epoch": 0.09,
125
+ "learning_rate": 4.255319148936171e-06,
126
+ "logits/chosen": -0.15964026749134064,
127
+ "logits/rejected": -0.27652230858802795,
128
+ "logps/chosen": -562.570556640625,
129
+ "logps/rejected": -621.7036743164062,
130
+ "loss": 0.2037,
131
+ "rewards/accuracies": 0.5,
132
+ "rewards/chosen": -0.026814639568328857,
133
+ "rewards/margins": 0.0097076166421175,
134
+ "rewards/rejected": -0.03652225807309151,
135
+ "step": 80
136
+ },
137
+ {
138
+ "epoch": 0.1,
139
+ "learning_rate": 4.787234042553192e-06,
140
+ "logits/chosen": -0.2600744664669037,
141
+ "logits/rejected": -0.20050808787345886,
142
+ "logps/chosen": -609.1525268554688,
143
+ "logps/rejected": -612.4235229492188,
144
+ "loss": 0.2067,
145
+ "rewards/accuracies": 0.4000000059604645,
146
+ "rewards/chosen": -0.059279996901750565,
147
+ "rewards/margins": 0.004630334675312042,
148
+ "rewards/rejected": -0.0639103353023529,
149
+ "step": 90
150
+ },
151
+ {
152
+ "epoch": 0.11,
153
+ "learning_rate": 4.999375059004058e-06,
154
+ "logits/chosen": -0.2565140724182129,
155
+ "logits/rejected": -0.22637882828712463,
156
+ "logps/chosen": -574.8885498046875,
157
+ "logps/rejected": -590.8546142578125,
158
+ "loss": 0.1998,
159
+ "rewards/accuracies": 0.40625,
160
+ "rewards/chosen": -0.07415835559368134,
161
+ "rewards/margins": 0.01800454594194889,
162
+ "rewards/rejected": -0.09216289967298508,
163
+ "step": 100
164
+ },
165
+ {
166
+ "epoch": 0.12,
167
+ "learning_rate": 4.9955571065548795e-06,
168
+ "logits/chosen": -0.1685013473033905,
169
+ "logits/rejected": -0.2401442974805832,
170
+ "logps/chosen": -557.1212158203125,
171
+ "logps/rejected": -602.7764892578125,
172
+ "loss": 0.196,
173
+ "rewards/accuracies": 0.4749999940395355,
174
+ "rewards/chosen": -0.09011422097682953,
175
+ "rewards/margins": 0.019372332841157913,
176
+ "rewards/rejected": -0.10948655754327774,
177
+ "step": 110
178
+ },
179
+ {
180
+ "epoch": 0.13,
181
+ "learning_rate": 4.9882736864879e-06,
182
+ "logits/chosen": -0.2641439139842987,
183
+ "logits/rejected": -0.2980344891548157,
184
+ "logps/chosen": -588.050537109375,
185
+ "logps/rejected": -627.3956298828125,
186
+ "loss": 0.2053,
187
+ "rewards/accuracies": 0.4375,
188
+ "rewards/chosen": -0.10959631204605103,
189
+ "rewards/margins": 0.014565527439117432,
190
+ "rewards/rejected": -0.12416181713342667,
191
+ "step": 120
192
+ },
193
+ {
194
+ "epoch": 0.14,
195
+ "learning_rate": 4.977534912960124e-06,
196
+ "logits/chosen": -0.2924054265022278,
197
+ "logits/rejected": -0.08088915795087814,
198
+ "logps/chosen": -576.1680297851562,
199
+ "logps/rejected": -614.0890502929688,
200
+ "loss": 0.1901,
201
+ "rewards/accuracies": 0.4312500059604645,
202
+ "rewards/chosen": -0.09112486243247986,
203
+ "rewards/margins": 0.025440961122512817,
204
+ "rewards/rejected": -0.11656580865383148,
205
+ "step": 130
206
+ },
207
+ {
208
+ "epoch": 0.15,
209
+ "learning_rate": 4.963355698422092e-06,
210
+ "logits/chosen": -0.10601979494094849,
211
+ "logits/rejected": -0.1950257569551468,
212
+ "logps/chosen": -595.1011352539062,
213
+ "logps/rejected": -659.9929809570312,
214
+ "loss": 0.2058,
215
+ "rewards/accuracies": 0.4749999940395355,
216
+ "rewards/chosen": -0.1052999347448349,
217
+ "rewards/margins": 0.02551344595849514,
218
+ "rewards/rejected": -0.1308133900165558,
219
+ "step": 140
220
+ },
221
+ {
222
+ "epoch": 0.16,
223
+ "learning_rate": 4.945755732909625e-06,
224
+ "logits/chosen": -0.2408047914505005,
225
+ "logits/rejected": -0.2040824145078659,
226
+ "logps/chosen": -551.7179565429688,
227
+ "logps/rejected": -606.5433959960938,
228
+ "loss": 0.1955,
229
+ "rewards/accuracies": 0.4937500059604645,
230
+ "rewards/chosen": -0.07721008360385895,
231
+ "rewards/margins": 0.026318836957216263,
232
+ "rewards/rejected": -0.10352891683578491,
233
+ "step": 150
234
+ },
235
+ {
236
+ "epoch": 0.17,
237
+ "learning_rate": 4.924759456701167e-06,
238
+ "logits/chosen": -0.21895582973957062,
239
+ "logits/rejected": -0.2554505467414856,
240
+ "logps/chosen": -608.0427856445312,
241
+ "logps/rejected": -679.7128295898438,
242
+ "loss": 0.2025,
243
+ "rewards/accuracies": 0.40625,
244
+ "rewards/chosen": -0.10357453674077988,
245
+ "rewards/margins": 0.022874176502227783,
246
+ "rewards/rejected": -0.12644873559474945,
247
+ "step": 160
248
+ },
249
+ {
250
+ "epoch": 0.18,
251
+ "learning_rate": 4.900396026378671e-06,
252
+ "logits/chosen": -0.25241002440452576,
253
+ "logits/rejected": -0.2686356008052826,
254
+ "logps/chosen": -576.2278442382812,
255
+ "logps/rejected": -611.9133911132812,
256
+ "loss": 0.2044,
257
+ "rewards/accuracies": 0.4437499940395355,
258
+ "rewards/chosen": -0.1014503687620163,
259
+ "rewards/margins": 0.020282840356230736,
260
+ "rewards/rejected": -0.12173320353031158,
261
+ "step": 170
262
+ },
263
+ {
264
+ "epoch": 0.19,
265
+ "learning_rate": 4.872699274339169e-06,
266
+ "logits/chosen": -0.24474278092384338,
267
+ "logits/rejected": -0.19586482644081116,
268
+ "logps/chosen": -570.9044189453125,
269
+ "logps/rejected": -617.5431518554688,
270
+ "loss": 0.1944,
271
+ "rewards/accuracies": 0.44999998807907104,
272
+ "rewards/chosen": -0.09906121343374252,
273
+ "rewards/margins": 0.01674678549170494,
274
+ "rewards/rejected": -0.11580799520015717,
275
+ "step": 180
276
+ },
277
+ {
278
+ "epoch": 0.2,
279
+ "learning_rate": 4.8417076618132434e-06,
280
+ "logits/chosen": -0.2917916774749756,
281
+ "logits/rejected": -0.20423956215381622,
282
+ "logps/chosen": -567.7699584960938,
283
+ "logps/rejected": -593.5147705078125,
284
+ "loss": 0.2046,
285
+ "rewards/accuracies": 0.40625,
286
+ "rewards/chosen": -0.08719009160995483,
287
+ "rewards/margins": 0.013276703655719757,
288
+ "rewards/rejected": -0.10046680271625519,
289
+ "step": 190
290
+ },
291
+ {
292
+ "epoch": 0.21,
293
+ "learning_rate": 4.807464225455655e-06,
294
+ "logits/chosen": -0.14698217809200287,
295
+ "logits/rejected": -0.23266562819480896,
296
+ "logps/chosen": -531.8690185546875,
297
+ "logps/rejected": -583.5828857421875,
298
+ "loss": 0.1964,
299
+ "rewards/accuracies": 0.40625,
300
+ "rewards/chosen": -0.07782838493585587,
301
+ "rewards/margins": 0.0252009816467762,
302
+ "rewards/rejected": -0.10302937030792236,
303
+ "step": 200
304
+ },
305
+ {
306
+ "epoch": 0.22,
307
+ "learning_rate": 4.770016517582283e-06,
308
+ "logits/chosen": -0.21580150723457336,
309
+ "logits/rejected": -0.18905040621757507,
310
+ "logps/chosen": -626.87744140625,
311
+ "logps/rejected": -649.6925659179688,
312
+ "loss": 0.1977,
313
+ "rewards/accuracies": 0.48750001192092896,
314
+ "rewards/chosen": -0.104043148458004,
315
+ "rewards/margins": 0.021797046065330505,
316
+ "rewards/rejected": -0.1258401870727539,
317
+ "step": 210
318
+ },
319
+ {
320
+ "epoch": 0.23,
321
+ "learning_rate": 4.7294165401363616e-06,
322
+ "logits/chosen": -0.12353191524744034,
323
+ "logits/rejected": -0.2215413749217987,
324
+ "logps/chosen": -633.0154418945312,
325
+ "logps/rejected": -633.0941162109375,
326
+ "loss": 0.2058,
327
+ "rewards/accuracies": 0.4124999940395355,
328
+ "rewards/chosen": -0.10003998130559921,
329
+ "rewards/margins": 0.009050301276147366,
330
+ "rewards/rejected": -0.10909029096364975,
331
+ "step": 220
332
+ },
333
+ {
334
+ "epoch": 0.25,
335
+ "learning_rate": 4.68572067247573e-06,
336
+ "logits/chosen": -0.16852374374866486,
337
+ "logits/rejected": -0.21371085941791534,
338
+ "logps/chosen": -614.1183471679688,
339
+ "logps/rejected": -670.2012939453125,
340
+ "loss": 0.2077,
341
+ "rewards/accuracies": 0.4375,
342
+ "rewards/chosen": -0.08841963112354279,
343
+ "rewards/margins": 0.02279593050479889,
344
+ "rewards/rejected": -0.11121556907892227,
345
+ "step": 230
346
+ },
347
+ {
348
+ "epoch": 0.26,
349
+ "learning_rate": 4.638989593081364e-06,
350
+ "logits/chosen": -0.1663983315229416,
351
+ "logits/rejected": -0.21970775723457336,
352
+ "logps/chosen": -602.5869750976562,
353
+ "logps/rejected": -618.7034912109375,
354
+ "loss": 0.2061,
355
+ "rewards/accuracies": 0.4937500059604645,
356
+ "rewards/chosen": -0.07862231880426407,
357
+ "rewards/margins": 0.021257968619465828,
358
+ "rewards/rejected": -0.09988027811050415,
359
+ "step": 240
360
+ },
361
+ {
362
+ "epoch": 0.27,
363
+ "learning_rate": 4.5892881952959015e-06,
364
+ "logits/chosen": -0.21088270843029022,
365
+ "logits/rejected": -0.14775848388671875,
366
+ "logps/chosen": -577.7684326171875,
367
+ "logps/rejected": -632.3033447265625,
368
+ "loss": 0.2054,
369
+ "rewards/accuracies": 0.4937500059604645,
370
+ "rewards/chosen": -0.0773148387670517,
371
+ "rewards/margins": 0.026050010696053505,
372
+ "rewards/rejected": -0.10336484014987946,
373
+ "step": 250
374
+ },
375
+ {
376
+ "epoch": 0.28,
377
+ "learning_rate": 4.536685497209182e-06,
378
+ "logits/chosen": -0.1055503636598587,
379
+ "logits/rejected": -0.06379745155572891,
380
+ "logps/chosen": -522.751708984375,
381
+ "logps/rejected": -602.4344482421875,
382
+ "loss": 0.2001,
383
+ "rewards/accuracies": 0.4375,
384
+ "rewards/chosen": -0.06098253279924393,
385
+ "rewards/margins": 0.030480870977044106,
386
+ "rewards/rejected": -0.09146340191364288,
387
+ "step": 260
388
+ },
389
+ {
390
+ "epoch": 0.29,
391
+ "learning_rate": 4.481254545815943e-06,
392
+ "logits/chosen": -0.15926873683929443,
393
+ "logits/rejected": -0.04976898431777954,
394
+ "logps/chosen": -529.4932250976562,
395
+ "logps/rejected": -549.9386596679688,
396
+ "loss": 0.1973,
397
+ "rewards/accuracies": 0.4437499940395355,
398
+ "rewards/chosen": -0.06077051907777786,
399
+ "rewards/margins": 0.01582062616944313,
400
+ "rewards/rejected": -0.0765911340713501,
401
+ "step": 270
402
+ },
403
+ {
404
+ "epoch": 0.3,
405
+ "learning_rate": 4.42307231557875e-06,
406
+ "logits/chosen": -0.07944826781749725,
407
+ "logits/rejected": -0.05855567380785942,
408
+ "logps/chosen": -512.50439453125,
409
+ "logps/rejected": -543.458984375,
410
+ "loss": 0.1986,
411
+ "rewards/accuracies": 0.4312500059604645,
412
+ "rewards/chosen": -0.06550983339548111,
413
+ "rewards/margins": 0.023027174174785614,
414
+ "rewards/rejected": -0.08853700011968613,
415
+ "step": 280
416
+ },
417
+ {
418
+ "epoch": 0.31,
419
+ "learning_rate": 4.3622196015370305e-06,
420
+ "logits/chosen": -0.12430046498775482,
421
+ "logits/rejected": -0.06956211477518082,
422
+ "logps/chosen": -550.2479248046875,
423
+ "logps/rejected": -614.044189453125,
424
+ "loss": 0.1944,
425
+ "rewards/accuracies": 0.5062500238418579,
426
+ "rewards/chosen": -0.056610800325870514,
427
+ "rewards/margins": 0.029858995229005814,
428
+ "rewards/rejected": -0.08646979182958603,
429
+ "step": 290
430
+ },
431
+ {
432
+ "epoch": 0.32,
433
+ "learning_rate": 4.298780907110648e-06,
434
+ "logits/chosen": -0.09455857425928116,
435
+ "logits/rejected": -0.07383386790752411,
436
+ "logps/chosen": -598.065185546875,
437
+ "logps/rejected": -647.9603271484375,
438
+ "loss": 0.1876,
439
+ "rewards/accuracies": 0.4437499940395355,
440
+ "rewards/chosen": -0.06337399780750275,
441
+ "rewards/margins": 0.026696253567934036,
442
+ "rewards/rejected": -0.09007024019956589,
443
+ "step": 300
444
+ },
445
+ {
446
+ "epoch": 0.33,
447
+ "learning_rate": 4.23284432675381e-06,
448
+ "logits/chosen": -0.19348487257957458,
449
+ "logits/rejected": -0.1443384736776352,
450
+ "logps/chosen": -539.6243896484375,
451
+ "logps/rejected": -612.7183837890625,
452
+ "loss": 0.1963,
453
+ "rewards/accuracies": 0.5062500238418579,
454
+ "rewards/chosen": -0.05517622083425522,
455
+ "rewards/margins": 0.02591213583946228,
456
+ "rewards/rejected": -0.0810883566737175,
457
+ "step": 310
458
+ },
459
+ {
460
+ "epoch": 0.34,
461
+ "learning_rate": 4.164501423622277e-06,
462
+ "logits/chosen": -0.19629542529582977,
463
+ "logits/rejected": -0.13960464298725128,
464
+ "logps/chosen": -516.0609130859375,
465
+ "logps/rejected": -658.4205932617188,
466
+ "loss": 0.1915,
467
+ "rewards/accuracies": 0.5562499761581421,
468
+ "rewards/chosen": -0.05958019569516182,
469
+ "rewards/margins": 0.06007415056228638,
470
+ "rewards/rejected": -0.1196543425321579,
471
+ "step": 320
472
+ },
473
+ {
474
+ "epoch": 0.35,
475
+ "learning_rate": 4.0938471024237355e-06,
476
+ "logits/chosen": -0.1600683629512787,
477
+ "logits/rejected": -0.10378336906433105,
478
+ "logps/chosen": -590.7578125,
479
+ "logps/rejected": -621.64697265625,
480
+ "loss": 0.2007,
481
+ "rewards/accuracies": 0.4437499940395355,
482
+ "rewards/chosen": -0.08227936178445816,
483
+ "rewards/margins": 0.01520558726042509,
484
+ "rewards/rejected": -0.09748493880033493,
485
+ "step": 330
486
+ },
487
+ {
488
+ "epoch": 0.36,
489
+ "learning_rate": 4.020979477627907e-06,
490
+ "logits/chosen": -0.19418606162071228,
491
+ "logits/rejected": -0.1177397221326828,
492
+ "logps/chosen": -586.6962890625,
493
+ "logps/rejected": -654.0504150390625,
494
+ "loss": 0.1894,
495
+ "rewards/accuracies": 0.518750011920929,
496
+ "rewards/chosen": -0.07023846358060837,
497
+ "rewards/margins": 0.03478557616472244,
498
+ "rewards/rejected": -0.10502403974533081,
499
+ "step": 340
500
+ },
501
+ {
502
+ "epoch": 0.37,
503
+ "learning_rate": 3.9459997372194105e-06,
504
+ "logits/chosen": -0.1304813176393509,
505
+ "logits/rejected": -0.04862945154309273,
506
+ "logps/chosen": -594.4133911132812,
507
+ "logps/rejected": -617.715087890625,
508
+ "loss": 0.192,
509
+ "rewards/accuracies": 0.5,
510
+ "rewards/chosen": -0.08139745891094208,
511
+ "rewards/margins": 0.026553615927696228,
512
+ "rewards/rejected": -0.10795106738805771,
513
+ "step": 350
514
+ },
515
+ {
516
+ "epoch": 0.38,
517
+ "learning_rate": 3.869012002182573e-06,
518
+ "logits/chosen": -0.21274884045124054,
519
+ "logits/rejected": -0.03855857998132706,
520
+ "logps/chosen": -557.4656982421875,
521
+ "logps/rejected": -637.321044921875,
522
+ "loss": 0.1848,
523
+ "rewards/accuracies": 0.4749999940395355,
524
+ "rewards/chosen": -0.07546891272068024,
525
+ "rewards/margins": 0.03727220743894577,
526
+ "rewards/rejected": -0.1127411276102066,
527
+ "step": 360
528
+ },
529
+ {
530
+ "epoch": 0.39,
531
+ "learning_rate": 3.7901231819133104e-06,
532
+ "logits/chosen": -0.10762195289134979,
533
+ "logits/rejected": -0.10060106217861176,
534
+ "logps/chosen": -599.8753051757812,
535
+ "logps/rejected": -646.8792724609375,
536
+ "loss": 0.1955,
537
+ "rewards/accuracies": 0.41874998807907104,
538
+ "rewards/chosen": -0.0741112157702446,
539
+ "rewards/margins": 0.03268015384674072,
540
+ "rewards/rejected": -0.10679137706756592,
541
+ "step": 370
542
+ },
543
+ {
544
+ "epoch": 0.41,
545
+ "learning_rate": 3.709442825758875e-06,
546
+ "logits/chosen": -0.12406639009714127,
547
+ "logits/rejected": -0.053130537271499634,
548
+ "logps/chosen": -587.0034790039062,
549
+ "logps/rejected": -618.0760498046875,
550
+ "loss": 0.19,
551
+ "rewards/accuracies": 0.4937500059604645,
552
+ "rewards/chosen": -0.07897321879863739,
553
+ "rewards/margins": 0.025586843490600586,
554
+ "rewards/rejected": -0.10456006228923798,
555
+ "step": 380
556
+ },
557
+ {
558
+ "epoch": 0.42,
559
+ "learning_rate": 3.6270829708916113e-06,
560
+ "logits/chosen": -0.11101411283016205,
561
+ "logits/rejected": -0.08626400679349899,
562
+ "logps/chosen": -569.6163330078125,
563
+ "logps/rejected": -620.4082641601562,
564
+ "loss": 0.1913,
565
+ "rewards/accuracies": 0.48124998807907104,
566
+ "rewards/chosen": -0.06503543257713318,
567
+ "rewards/margins": 0.037478551268577576,
568
+ "rewards/rejected": -0.10251398384571075,
569
+ "step": 390
570
+ },
571
+ {
572
+ "epoch": 0.43,
573
+ "learning_rate": 3.543157986727991e-06,
574
+ "logits/chosen": -0.11596628278493881,
575
+ "logits/rejected": -0.09326865524053574,
576
+ "logps/chosen": -569.7626342773438,
577
+ "logps/rejected": -647.47119140625,
578
+ "loss": 0.1913,
579
+ "rewards/accuracies": 0.5562499761581421,
580
+ "rewards/chosen": -0.0574682354927063,
581
+ "rewards/margins": 0.03390919789671898,
582
+ "rewards/rejected": -0.09137743711471558,
583
+ "step": 400
584
+ },
585
+ {
586
+ "epoch": 0.44,
587
+ "learning_rate": 3.4577844161089614e-06,
588
+ "logits/chosen": -0.1688176691532135,
589
+ "logits/rejected": -0.1762055903673172,
590
+ "logps/chosen": -548.4512939453125,
591
+ "logps/rejected": -596.2463989257812,
592
+ "loss": 0.1879,
593
+ "rewards/accuracies": 0.5062500238418579,
594
+ "rewards/chosen": -0.054659001529216766,
595
+ "rewards/margins": 0.025764942169189453,
596
+ "rewards/rejected": -0.08042393624782562,
597
+ "step": 410
598
+ },
599
+ {
600
+ "epoch": 0.45,
601
+ "learning_rate": 3.3710808134621577e-06,
602
+ "logits/chosen": -0.12280504405498505,
603
+ "logits/rejected": -0.018482182174921036,
604
+ "logps/chosen": -567.9172973632812,
605
+ "logps/rejected": -593.0560302734375,
606
+ "loss": 0.189,
607
+ "rewards/accuracies": 0.5062500238418579,
608
+ "rewards/chosen": -0.0538947694003582,
609
+ "rewards/margins": 0.02232169173657894,
610
+ "rewards/rejected": -0.07621645927429199,
611
+ "step": 420
612
+ },
613
+ {
614
+ "epoch": 0.46,
615
+ "learning_rate": 3.2831675801707126e-06,
616
+ "logits/chosen": -0.04735702648758888,
617
+ "logits/rejected": -0.10849102586507797,
618
+ "logps/chosen": -590.4489135742188,
619
+ "logps/rejected": -649.82568359375,
620
+ "loss": 0.1887,
621
+ "rewards/accuracies": 0.581250011920929,
622
+ "rewards/chosen": -0.04551684111356735,
623
+ "rewards/margins": 0.026576777920126915,
624
+ "rewards/rejected": -0.07209362089633942,
625
+ "step": 430
626
+ },
627
+ {
628
+ "epoch": 0.47,
629
+ "learning_rate": 3.194166797377289e-06,
630
+ "logits/chosen": -0.08134131878614426,
631
+ "logits/rejected": -0.1677294671535492,
632
+ "logps/chosen": -574.8263549804688,
633
+ "logps/rejected": -607.7601318359375,
634
+ "loss": 0.1893,
635
+ "rewards/accuracies": 0.46875,
636
+ "rewards/chosen": -0.04221652075648308,
637
+ "rewards/margins": 0.030459443107247353,
638
+ "rewards/rejected": -0.07267596572637558,
639
+ "step": 440
640
+ },
641
+ {
642
+ "epoch": 0.48,
643
+ "learning_rate": 3.104202056455501e-06,
644
+ "logits/chosen": -0.0588027760386467,
645
+ "logits/rejected": -0.1330319195985794,
646
+ "logps/chosen": -547.6630249023438,
647
+ "logps/rejected": -580.7600708007812,
648
+ "loss": 0.1985,
649
+ "rewards/accuracies": 0.4937500059604645,
650
+ "rewards/chosen": -0.04689568281173706,
651
+ "rewards/margins": 0.024683769792318344,
652
+ "rewards/rejected": -0.07157944142818451,
653
+ "step": 450
654
+ },
655
+ {
656
+ "epoch": 0.49,
657
+ "learning_rate": 3.013398287384144e-06,
658
+ "logits/chosen": -0.0910586565732956,
659
+ "logits/rejected": -0.13333860039710999,
660
+ "logps/chosen": -520.99267578125,
661
+ "logps/rejected": -608.8109130859375,
662
+ "loss": 0.1948,
663
+ "rewards/accuracies": 0.5375000238418579,
664
+ "rewards/chosen": -0.04666762426495552,
665
+ "rewards/margins": 0.04471370577812195,
666
+ "rewards/rejected": -0.09138132631778717,
667
+ "step": 460
668
+ },
669
+ {
670
+ "epoch": 0.5,
671
+ "learning_rate": 2.9218815852625717e-06,
672
+ "logits/chosen": -0.09454444795846939,
673
+ "logits/rejected": -0.04375922679901123,
674
+ "logps/chosen": -620.7197265625,
675
+ "logps/rejected": -636.3668212890625,
676
+ "loss": 0.201,
677
+ "rewards/accuracies": 0.5062500238418579,
678
+ "rewards/chosen": -0.06732948869466782,
679
+ "rewards/margins": 0.026028599590063095,
680
+ "rewards/rejected": -0.09335808455944061,
681
+ "step": 470
682
+ },
683
+ {
684
+ "epoch": 0.51,
685
+ "learning_rate": 2.829779035208113e-06,
686
+ "logits/chosen": -0.09432949125766754,
687
+ "logits/rejected": -0.08926217257976532,
688
+ "logps/chosen": -597.0772705078125,
689
+ "logps/rejected": -639.5493774414062,
690
+ "loss": 0.1909,
691
+ "rewards/accuracies": 0.606249988079071,
692
+ "rewards/chosen": -0.040321771055459976,
693
+ "rewards/margins": 0.03370783478021622,
694
+ "rewards/rejected": -0.07402960956096649,
695
+ "step": 480
696
+ },
697
+ {
698
+ "epoch": 0.52,
699
+ "learning_rate": 2.737218535878705e-06,
700
+ "logits/chosen": -0.1773318350315094,
701
+ "logits/rejected": -0.07903443276882172,
702
+ "logps/chosen": -552.8883666992188,
703
+ "logps/rejected": -618.2833251953125,
704
+ "loss": 0.2029,
705
+ "rewards/accuracies": 0.5249999761581421,
706
+ "rewards/chosen": -0.04510737583041191,
707
+ "rewards/margins": 0.028245270252227783,
708
+ "rewards/rejected": -0.07335264980792999,
709
+ "step": 490
710
+ },
711
+ {
712
+ "epoch": 0.53,
713
+ "learning_rate": 2.64432862186579e-06,
714
+ "logits/chosen": -0.07201124727725983,
715
+ "logits/rejected": -0.04144411161541939,
716
+ "logps/chosen": -526.00634765625,
717
+ "logps/rejected": -577.3812255859375,
718
+ "loss": 0.1891,
719
+ "rewards/accuracies": 0.5249999761581421,
720
+ "rewards/chosen": -0.03259889408946037,
721
+ "rewards/margins": 0.028664156794548035,
722
+ "rewards/rejected": -0.06126304715871811,
723
+ "step": 500
724
+ },
725
+ {
726
+ "epoch": 0.54,
727
+ "learning_rate": 2.551238285204126e-06,
728
+ "logits/chosen": -0.13225743174552917,
729
+ "logits/rejected": -0.03518156707286835,
730
+ "logps/chosen": -558.69970703125,
731
+ "logps/rejected": -633.7002563476562,
732
+ "loss": 0.1987,
733
+ "rewards/accuracies": 0.5687500238418579,
734
+ "rewards/chosen": -0.034947603940963745,
735
+ "rewards/margins": 0.041034139692783356,
736
+ "rewards/rejected": -0.0759817361831665,
737
+ "step": 510
738
+ },
739
+ {
740
+ "epoch": 0.55,
741
+ "learning_rate": 2.4580767962463688e-06,
742
+ "logits/chosen": -0.03775392845273018,
743
+ "logits/rejected": -0.06259463727474213,
744
+ "logps/chosen": -564.3277587890625,
745
+ "logps/rejected": -616.877685546875,
746
+ "loss": 0.1935,
747
+ "rewards/accuracies": 0.5249999761581421,
748
+ "rewards/chosen": -0.041550230234861374,
749
+ "rewards/margins": 0.04528028517961502,
750
+ "rewards/rejected": -0.0868305116891861,
751
+ "step": 520
752
+ },
753
+ {
754
+ "epoch": 0.57,
755
+ "learning_rate": 2.3649735241511546e-06,
756
+ "logits/chosen": -0.11865083128213882,
757
+ "logits/rejected": -0.14535991847515106,
758
+ "logps/chosen": -539.8975219726562,
759
+ "logps/rejected": -628.8270263671875,
760
+ "loss": 0.1988,
761
+ "rewards/accuracies": 0.550000011920929,
762
+ "rewards/chosen": -0.06274162977933884,
763
+ "rewards/margins": 0.050676118582487106,
764
+ "rewards/rejected": -0.11341774463653564,
765
+ "step": 530
766
+ },
767
+ {
768
+ "epoch": 0.58,
769
+ "learning_rate": 2.2720577572339914e-06,
770
+ "logits/chosen": -0.1661374866962433,
771
+ "logits/rejected": -0.10748039186000824,
772
+ "logps/chosen": -546.2053833007812,
773
+ "logps/rejected": -584.2305908203125,
774
+ "loss": 0.1901,
775
+ "rewards/accuracies": 0.48124998807907104,
776
+ "rewards/chosen": -0.05626441910862923,
777
+ "rewards/margins": 0.02776341699063778,
778
+ "rewards/rejected": -0.08402784168720245,
779
+ "step": 540
780
+ },
781
+ {
782
+ "epoch": 0.59,
783
+ "learning_rate": 2.1794585234303995e-06,
784
+ "logits/chosen": -0.10749207437038422,
785
+ "logits/rejected": -0.13697417080402374,
786
+ "logps/chosen": -517.0869140625,
787
+ "logps/rejected": -581.8153686523438,
788
+ "loss": 0.1866,
789
+ "rewards/accuracies": 0.4749999940395355,
790
+ "rewards/chosen": -0.052382372319698334,
791
+ "rewards/margins": 0.035972487181425095,
792
+ "rewards/rejected": -0.08835486322641373,
793
+ "step": 550
794
+ },
795
+ {
796
+ "epoch": 0.6,
797
+ "learning_rate": 2.0873044111206407e-06,
798
+ "logits/chosen": -0.1282195746898651,
799
+ "logits/rejected": -0.1339006870985031,
800
+ "logps/chosen": -576.3350830078125,
801
+ "logps/rejected": -666.8603515625,
802
+ "loss": 0.1907,
803
+ "rewards/accuracies": 0.5625,
804
+ "rewards/chosen": -0.04062817618250847,
805
+ "rewards/margins": 0.03738432377576828,
806
+ "rewards/rejected": -0.07801250368356705,
807
+ "step": 560
808
+ },
809
+ {
810
+ "epoch": 0.61,
811
+ "learning_rate": 1.9957233905648293e-06,
812
+ "logits/chosen": -0.10549817234277725,
813
+ "logits/rejected": -0.11278073489665985,
814
+ "logps/chosen": -566.6007080078125,
815
+ "logps/rejected": -636.8270263671875,
816
+ "loss": 0.1877,
817
+ "rewards/accuracies": 0.574999988079071,
818
+ "rewards/chosen": -0.048470962792634964,
819
+ "rewards/margins": 0.04373977333307266,
820
+ "rewards/rejected": -0.09221073240041733,
821
+ "step": 570
822
+ },
823
+ {
824
+ "epoch": 0.62,
825
+ "learning_rate": 1.904842636196402e-06,
826
+ "logits/chosen": -0.0554957278072834,
827
+ "logits/rejected": -0.13037823140621185,
828
+ "logps/chosen": -597.04150390625,
829
+ "logps/rejected": -615.6434326171875,
830
+ "loss": 0.1909,
831
+ "rewards/accuracies": 0.5,
832
+ "rewards/chosen": -0.0562109649181366,
833
+ "rewards/margins": 0.028234709054231644,
834
+ "rewards/rejected": -0.08444567024707794,
835
+ "step": 580
836
+ },
837
+ {
838
+ "epoch": 0.63,
839
+ "learning_rate": 1.814788350020726e-06,
840
+ "logits/chosen": -0.0553332157433033,
841
+ "logits/rejected": -0.14984294772148132,
842
+ "logps/chosen": -511.7176818847656,
843
+ "logps/rejected": -577.5421752929688,
844
+ "loss": 0.1891,
845
+ "rewards/accuracies": 0.46875,
846
+ "rewards/chosen": -0.05183824896812439,
847
+ "rewards/margins": 0.0338759571313858,
848
+ "rewards/rejected": -0.08571420609951019,
849
+ "step": 590
850
+ },
851
+ {
852
+ "epoch": 0.64,
853
+ "learning_rate": 1.725685586364051e-06,
854
+ "logits/chosen": -0.1068972796201706,
855
+ "logits/rejected": -0.13699831068515778,
856
+ "logps/chosen": -547.6019897460938,
857
+ "logps/rejected": -624.2053833007812,
858
+ "loss": 0.1908,
859
+ "rewards/accuracies": 0.5625,
860
+ "rewards/chosen": -0.04226940870285034,
861
+ "rewards/margins": 0.04575734585523605,
862
+ "rewards/rejected": -0.08802676200866699,
863
+ "step": 600
864
+ },
865
+ {
866
+ "epoch": 0.65,
867
+ "learning_rate": 1.6376580782162172e-06,
868
+ "logits/chosen": -0.12253417819738388,
869
+ "logits/rejected": -0.09159277379512787,
870
+ "logps/chosen": -534.8265380859375,
871
+ "logps/rejected": -639.2476806640625,
872
+ "loss": 0.1866,
873
+ "rewards/accuracies": 0.550000011920929,
874
+ "rewards/chosen": -0.038840554654598236,
875
+ "rewards/margins": 0.04929639771580696,
876
+ "rewards/rejected": -0.0881369560956955,
877
+ "step": 610
878
+ },
879
+ {
880
+ "epoch": 0.66,
881
+ "learning_rate": 1.550828065408227e-06,
882
+ "logits/chosen": -0.11153294146060944,
883
+ "logits/rejected": -0.0631122812628746,
884
+ "logps/chosen": -581.9796142578125,
885
+ "logps/rejected": -639.3689575195312,
886
+ "loss": 0.1738,
887
+ "rewards/accuracies": 0.53125,
888
+ "rewards/chosen": -0.04295315593481064,
889
+ "rewards/margins": 0.037230443209409714,
890
+ "rewards/rejected": -0.08018360286951065,
891
+ "step": 620
892
+ },
893
+ {
894
+ "epoch": 0.67,
895
+ "learning_rate": 1.4653161248633053e-06,
896
+ "logits/chosen": -0.10305066406726837,
897
+ "logits/rejected": -0.13783864676952362,
898
+ "logps/chosen": -582.2150268554688,
899
+ "logps/rejected": -607.2169799804688,
900
+ "loss": 0.1865,
901
+ "rewards/accuracies": 0.48750001192092896,
902
+ "rewards/chosen": -0.05051354691386223,
903
+ "rewards/margins": 0.02962956391274929,
904
+ "rewards/rejected": -0.08014310896396637,
905
+ "step": 630
906
+ },
907
+ {
908
+ "epoch": 0.68,
909
+ "learning_rate": 1.381241003157162e-06,
910
+ "logits/chosen": -0.09553556144237518,
911
+ "logits/rejected": -0.1049310564994812,
912
+ "logps/chosen": -561.0845947265625,
913
+ "logps/rejected": -615.9722900390625,
914
+ "loss": 0.19,
915
+ "rewards/accuracies": 0.5249999761581421,
916
+ "rewards/chosen": -0.046824414283037186,
917
+ "rewards/margins": 0.03598689287900925,
918
+ "rewards/rejected": -0.08281131088733673,
919
+ "step": 640
920
+ },
921
+ {
922
+ "epoch": 0.69,
923
+ "learning_rate": 1.298719451619979e-06,
924
+ "logits/chosen": -0.1247280016541481,
925
+ "logits/rejected": -0.0659816786646843,
926
+ "logps/chosen": -560.4979858398438,
927
+ "logps/rejected": -620.7578735351562,
928
+ "loss": 0.2002,
929
+ "rewards/accuracies": 0.5062500238418579,
930
+ "rewards/chosen": -0.04652046412229538,
931
+ "rewards/margins": 0.041216202080249786,
932
+ "rewards/rejected": -0.08773668110370636,
933
+ "step": 650
934
+ },
935
+ {
936
+ "epoch": 0.7,
937
+ "learning_rate": 1.2178660642091036e-06,
938
+ "logits/chosen": -0.03698350116610527,
939
+ "logits/rejected": -0.2196667492389679,
940
+ "logps/chosen": -521.7525634765625,
941
+ "logps/rejected": -626.46435546875,
942
+ "loss": 0.1979,
943
+ "rewards/accuracies": 0.4749999940395355,
944
+ "rewards/chosen": -0.05702243372797966,
945
+ "rewards/margins": 0.041262269020080566,
946
+ "rewards/rejected": -0.09828470647335052,
947
+ "step": 660
948
+ },
949
+ {
950
+ "epoch": 0.71,
951
+ "learning_rate": 1.1387931183775821e-06,
952
+ "logits/chosen": -0.1309659779071808,
953
+ "logits/rejected": -0.126008078455925,
954
+ "logps/chosen": -526.6151123046875,
955
+ "logps/rejected": -586.6326293945312,
956
+ "loss": 0.1836,
957
+ "rewards/accuracies": 0.5,
958
+ "rewards/chosen": -0.0365142747759819,
959
+ "rewards/margins": 0.039250634610652924,
960
+ "rewards/rejected": -0.07576490938663483,
961
+ "step": 670
962
+ },
963
+ {
964
+ "epoch": 0.73,
965
+ "learning_rate": 1.061610419159532e-06,
966
+ "logits/chosen": -0.06580721586942673,
967
+ "logits/rejected": -0.11697240173816681,
968
+ "logps/chosen": -545.3971557617188,
969
+ "logps/rejected": -590.3699340820312,
970
+ "loss": 0.186,
971
+ "rewards/accuracies": 0.5249999761581421,
972
+ "rewards/chosen": -0.040514297783374786,
973
+ "rewards/margins": 0.041993193328380585,
974
+ "rewards/rejected": -0.08250749111175537,
975
+ "step": 680
976
+ },
977
+ {
978
+ "epoch": 0.74,
979
+ "learning_rate": 9.864251466888364e-07,
980
+ "logits/chosen": 0.015632059425115585,
981
+ "logits/rejected": -0.14370284974575043,
982
+ "logps/chosen": -527.1017456054688,
983
+ "logps/rejected": -602.5015869140625,
984
+ "loss": 0.1872,
985
+ "rewards/accuracies": 0.5375000238418579,
986
+ "rewards/chosen": -0.03584219887852669,
987
+ "rewards/margins": 0.0341840498149395,
988
+ "rewards/rejected": -0.07002625614404678,
989
+ "step": 690
990
+ },
991
+ {
992
+ "epoch": 0.75,
993
+ "learning_rate": 9.133417073629288e-07,
994
+ "logits/chosen": -0.1096029132604599,
995
+ "logits/rejected": -0.09382790327072144,
996
+ "logps/chosen": -552.9088745117188,
997
+ "logps/rejected": -619.2091674804688,
998
+ "loss": 0.1929,
999
+ "rewards/accuracies": 0.5249999761581421,
1000
+ "rewards/chosen": -0.04123011603951454,
1001
+ "rewards/margins": 0.03130009397864342,
1002
+ "rewards/rejected": -0.07253019511699677,
1003
+ "step": 700
1004
+ },
1005
+ {
1006
+ "epoch": 0.76,
1007
+ "learning_rate": 8.424615888583332e-07,
1008
+ "logits/chosen": -0.1330350786447525,
1009
+ "logits/rejected": -0.07537052035331726,
1010
+ "logps/chosen": -521.3177490234375,
1011
+ "logps/rejected": -601.4888305664062,
1012
+ "loss": 0.1829,
1013
+ "rewards/accuracies": 0.5687500238418579,
1014
+ "rewards/chosen": -0.037754353135824203,
1015
+ "rewards/margins": 0.041079822927713394,
1016
+ "rewards/rejected": -0.0788341760635376,
1017
+ "step": 710
1018
+ },
1019
+ {
1020
+ "epoch": 0.77,
1021
+ "learning_rate": 7.738832191993092e-07,
1022
+ "logits/chosen": -0.13393089175224304,
1023
+ "logits/rejected": -0.07735292613506317,
1024
+ "logps/chosen": -589.1104736328125,
1025
+ "logps/rejected": -623.0423583984375,
1026
+ "loss": 0.1937,
1027
+ "rewards/accuracies": 0.59375,
1028
+ "rewards/chosen": -0.04533671587705612,
1029
+ "rewards/margins": 0.03662148863077164,
1030
+ "rewards/rejected": -0.08195820450782776,
1031
+ "step": 720
1032
+ },
1033
+ {
1034
+ "epoch": 0.78,
1035
+ "learning_rate": 7.077018300752917e-07,
1036
+ "logits/chosen": -0.09014391899108887,
1037
+ "logits/rejected": -0.02712271548807621,
1038
+ "logps/chosen": -550.0320434570312,
1039
+ "logps/rejected": -605.1174926757812,
1040
+ "loss": 0.1961,
1041
+ "rewards/accuracies": 0.518750011920929,
1042
+ "rewards/chosen": -0.05133052542805672,
1043
+ "rewards/margins": 0.041539210826158524,
1044
+ "rewards/rejected": -0.09286972880363464,
1045
+ "step": 730
1046
+ },
1047
+ {
1048
+ "epoch": 0.79,
1049
+ "learning_rate": 6.440093245969342e-07,
1050
+ "logits/chosen": -0.08313737064599991,
1051
+ "logits/rejected": -0.1943168193101883,
1052
+ "logps/chosen": -516.8920288085938,
1053
+ "logps/rejected": -601.4186401367188,
1054
+ "loss": 0.1848,
1055
+ "rewards/accuracies": 0.606249988079071,
1056
+ "rewards/chosen": -0.04221433773636818,
1057
+ "rewards/margins": 0.0475175604224205,
1058
+ "rewards/rejected": -0.08973188698291779,
1059
+ "step": 740
1060
+ },
1061
+ {
1062
+ "epoch": 0.8,
1063
+ "learning_rate": 5.828941496744075e-07,
1064
+ "logits/chosen": -0.11161942780017853,
1065
+ "logits/rejected": -0.0919300764799118,
1066
+ "logps/chosen": -563.8603515625,
1067
+ "logps/rejected": -619.1151733398438,
1068
+ "loss": 0.1903,
1069
+ "rewards/accuracies": 0.5,
1070
+ "rewards/chosen": -0.04418020322918892,
1071
+ "rewards/margins": 0.03953651711344719,
1072
+ "rewards/rejected": -0.08371671289205551,
1073
+ "step": 750
1074
+ },
1075
+ {
1076
+ "epoch": 0.81,
1077
+ "learning_rate": 5.244411731951671e-07,
1078
+ "logits/chosen": -0.13506890833377838,
1079
+ "logits/rejected": -0.033810555934906006,
1080
+ "logps/chosen": -605.5892944335938,
1081
+ "logps/rejected": -609.83544921875,
1082
+ "loss": 0.1878,
1083
+ "rewards/accuracies": 0.48124998807907104,
1084
+ "rewards/chosen": -0.03747162967920303,
1085
+ "rewards/margins": 0.02192925289273262,
1086
+ "rewards/rejected": -0.059400878846645355,
1087
+ "step": 760
1088
+ },
1089
+ {
1090
+ "epoch": 0.82,
1091
+ "learning_rate": 4.6873156617173594e-07,
1092
+ "logits/chosen": -0.07261113822460175,
1093
+ "logits/rejected": -0.16117814183235168,
1094
+ "logps/chosen": -553.5911254882812,
1095
+ "logps/rejected": -624.5232543945312,
1096
+ "loss": 0.1921,
1097
+ "rewards/accuracies": 0.5687500238418579,
1098
+ "rewards/chosen": -0.04296105355024338,
1099
+ "rewards/margins": 0.0388905294239521,
1100
+ "rewards/rejected": -0.08185158669948578,
1101
+ "step": 770
1102
+ },
1103
+ {
1104
+ "epoch": 0.83,
1105
+ "learning_rate": 4.1584269002318653e-07,
1106
+ "logits/chosen": -0.07403261959552765,
1107
+ "logits/rejected": -0.054157156497240067,
1108
+ "logps/chosen": -535.3461303710938,
1109
+ "logps/rejected": -585.4727783203125,
1110
+ "loss": 0.1828,
1111
+ "rewards/accuracies": 0.4749999940395355,
1112
+ "rewards/chosen": -0.0406302735209465,
1113
+ "rewards/margins": 0.03608276695013046,
1114
+ "rewards/rejected": -0.07671303302049637,
1115
+ "step": 780
1116
+ },
1117
+ {
1118
+ "epoch": 0.84,
1119
+ "learning_rate": 3.658479891468258e-07,
1120
+ "logits/chosen": -0.1717700958251953,
1121
+ "logits/rejected": -0.08853835612535477,
1122
+ "logps/chosen": -527.3263549804688,
1123
+ "logps/rejected": -540.2444458007812,
1124
+ "loss": 0.1778,
1125
+ "rewards/accuracies": 0.4375,
1126
+ "rewards/chosen": -0.04036609083414078,
1127
+ "rewards/margins": 0.03141506761312485,
1128
+ "rewards/rejected": -0.07178115844726562,
1129
+ "step": 790
1130
+ },
1131
+ {
1132
+ "epoch": 0.85,
1133
+ "learning_rate": 3.18816888929272e-07,
1134
+ "logits/chosen": -0.09848084300756454,
1135
+ "logits/rejected": -0.06764743477106094,
1136
+ "logps/chosen": -563.3206787109375,
1137
+ "logps/rejected": -668.9093017578125,
1138
+ "loss": 0.2002,
1139
+ "rewards/accuracies": 0.5562499761581421,
1140
+ "rewards/chosen": -0.046812716871500015,
1141
+ "rewards/margins": 0.054834604263305664,
1142
+ "rewards/rejected": -0.10164730250835419,
1143
+ "step": 800
1144
+ },
1145
+ {
1146
+ "epoch": 0.86,
1147
+ "learning_rate": 2.748146993385484e-07,
1148
+ "logits/chosen": -0.09693370759487152,
1149
+ "logits/rejected": -0.07278673350811005,
1150
+ "logps/chosen": -522.9954833984375,
1151
+ "logps/rejected": -612.6608276367188,
1152
+ "loss": 0.1854,
1153
+ "rewards/accuracies": 0.5249999761581421,
1154
+ "rewards/chosen": -0.04407941550016403,
1155
+ "rewards/margins": 0.05026249960064888,
1156
+ "rewards/rejected": -0.09434191882610321,
1157
+ "step": 810
1158
+ },
1159
+ {
1160
+ "epoch": 0.87,
1161
+ "learning_rate": 2.3390252423108077e-07,
1162
+ "logits/chosen": -0.07084161043167114,
1163
+ "logits/rejected": -0.18225322663784027,
1164
+ "logps/chosen": -488.76483154296875,
1165
+ "logps/rejected": -558.3425903320312,
1166
+ "loss": 0.1939,
1167
+ "rewards/accuracies": 0.5,
1168
+ "rewards/chosen": -0.035873524844646454,
1169
+ "rewards/margins": 0.037640780210494995,
1170
+ "rewards/rejected": -0.07351429760456085,
1171
+ "step": 820
1172
+ },
1173
+ {
1174
+ "epoch": 0.89,
1175
+ "learning_rate": 1.961371764995243e-07,
1176
+ "logits/chosen": -0.11218070983886719,
1177
+ "logits/rejected": -0.143798828125,
1178
+ "logps/chosen": -548.5975341796875,
1179
+ "logps/rejected": -618.435302734375,
1180
+ "loss": 0.2009,
1181
+ "rewards/accuracies": 0.5375000238418579,
1182
+ "rewards/chosen": -0.03908687084913254,
1183
+ "rewards/margins": 0.042751066386699677,
1184
+ "rewards/rejected": -0.08183793723583221,
1185
+ "step": 830
1186
+ },
1187
+ {
1188
+ "epoch": 0.9,
1189
+ "learning_rate": 1.61571099179261e-07,
1190
+ "logits/chosen": -0.0712205171585083,
1191
+ "logits/rejected": -0.06110917776823044,
1192
+ "logps/chosen": -584.1240234375,
1193
+ "logps/rejected": -650.0173950195312,
1194
+ "loss": 0.1955,
1195
+ "rewards/accuracies": 0.4937500059604645,
1196
+ "rewards/chosen": -0.04009150713682175,
1197
+ "rewards/margins": 0.030330544337630272,
1198
+ "rewards/rejected": -0.07042204588651657,
1199
+ "step": 840
1200
+ },
1201
+ {
1202
+ "epoch": 0.91,
1203
+ "learning_rate": 1.3025229262312367e-07,
1204
+ "logits/chosen": -0.0935712531208992,
1205
+ "logits/rejected": -0.05454383045434952,
1206
+ "logps/chosen": -496.932861328125,
1207
+ "logps/rejected": -605.6661987304688,
1208
+ "loss": 0.1884,
1209
+ "rewards/accuracies": 0.5375000238418579,
1210
+ "rewards/chosen": -0.042653247714042664,
1211
+ "rewards/margins": 0.048957787454128265,
1212
+ "rewards/rejected": -0.09161103516817093,
1213
+ "step": 850
1214
+ },
1215
+ {
1216
+ "epoch": 0.92,
1217
+ "learning_rate": 1.0222424784546853e-07,
1218
+ "logits/chosen": -0.08921684324741364,
1219
+ "logits/rejected": -0.15163610875606537,
1220
+ "logps/chosen": -579.2117919921875,
1221
+ "logps/rejected": -619.4464111328125,
1222
+ "loss": 0.1904,
1223
+ "rewards/accuracies": 0.550000011920929,
1224
+ "rewards/chosen": -0.04733709245920181,
1225
+ "rewards/margins": 0.03301934152841568,
1226
+ "rewards/rejected": -0.08035643398761749,
1227
+ "step": 860
1228
+ },
1229
+ {
1230
+ "epoch": 0.93,
1231
+ "learning_rate": 7.752588612816553e-08,
1232
+ "logits/chosen": -0.04686546325683594,
1233
+ "logits/rejected": -0.15816907584667206,
1234
+ "logps/chosen": -509.0023498535156,
1235
+ "logps/rejected": -572.1159057617188,
1236
+ "loss": 0.1754,
1237
+ "rewards/accuracies": 0.512499988079071,
1238
+ "rewards/chosen": -0.042182981967926025,
1239
+ "rewards/margins": 0.04292844608426094,
1240
+ "rewards/rejected": -0.08511142432689667,
1241
+ "step": 870
1242
+ },
1243
+ {
1244
+ "epoch": 0.94,
1245
+ "learning_rate": 5.619150497236991e-08,
1246
+ "logits/chosen": -0.07643123716115952,
1247
+ "logits/rejected": -0.16245657205581665,
1248
+ "logps/chosen": -535.0369873046875,
1249
+ "logps/rejected": -608.0992431640625,
1250
+ "loss": 0.192,
1251
+ "rewards/accuracies": 0.5562499761581421,
1252
+ "rewards/chosen": -0.04792182892560959,
1253
+ "rewards/margins": 0.03496783226728439,
1254
+ "rewards/rejected": -0.08288966119289398,
1255
+ "step": 880
1256
+ },
1257
+ {
1258
+ "epoch": 0.95,
1259
+ "learning_rate": 3.825073047112743e-08,
1260
+ "logits/chosen": -0.13168227672576904,
1261
+ "logits/rejected": -0.046010442078113556,
1262
+ "logps/chosen": -579.3240356445312,
1263
+ "logps/rejected": -674.3414306640625,
1264
+ "loss": 0.1964,
1265
+ "rewards/accuracies": 0.612500011920929,
1266
+ "rewards/chosen": -0.04349333792924881,
1267
+ "rewards/margins": 0.047455307096242905,
1268
+ "rewards/rejected": -0.09094865620136261,
1269
+ "step": 890
1270
+ },
1271
+ {
1272
+ "epoch": 0.96,
1273
+ "learning_rate": 2.372847616895685e-08,
1274
+ "logits/chosen": -0.04904794320464134,
1275
+ "logits/rejected": -0.019006099551916122,
1276
+ "logps/chosen": -542.4931640625,
1277
+ "logps/rejected": -638.1673583984375,
1278
+ "loss": 0.1889,
1279
+ "rewards/accuracies": 0.5687500238418579,
1280
+ "rewards/chosen": -0.04928978905081749,
1281
+ "rewards/margins": 0.03806794807314873,
1282
+ "rewards/rejected": -0.08735774457454681,
1283
+ "step": 900
1284
+ },
1285
+ {
1286
+ "epoch": 0.97,
1287
+ "learning_rate": 1.264490846553279e-08,
1288
+ "logits/chosen": -0.12707039713859558,
1289
+ "logits/rejected": -0.10833065211772919,
1290
+ "logps/chosen": -579.73681640625,
1291
+ "logps/rejected": -622.3654174804688,
1292
+ "loss": 0.1897,
1293
+ "rewards/accuracies": 0.574999988079071,
1294
+ "rewards/chosen": -0.046609390527009964,
1295
+ "rewards/margins": 0.03541853651404381,
1296
+ "rewards/rejected": -0.08202792704105377,
1297
+ "step": 910
1298
+ },
1299
+ {
1300
+ "epoch": 0.98,
1301
+ "learning_rate": 5.015418611516165e-09,
1302
+ "logits/chosen": -0.0854305848479271,
1303
+ "logits/rejected": -0.11656080186367035,
1304
+ "logps/chosen": -616.4360961914062,
1305
+ "logps/rejected": -670.5054931640625,
1306
+ "loss": 0.1907,
1307
+ "rewards/accuracies": 0.5249999761581421,
1308
+ "rewards/chosen": -0.04680439084768295,
1309
+ "rewards/margins": 0.05593379586935043,
1310
+ "rewards/rejected": -0.10273818671703339,
1311
+ "step": 920
1312
+ },
1313
+ {
1314
+ "epoch": 0.99,
1315
+ "learning_rate": 8.506013354186993e-10,
1316
+ "logits/chosen": -0.11298644542694092,
1317
+ "logits/rejected": -0.03937912359833717,
1318
+ "logps/chosen": -532.8866577148438,
1319
+ "logps/rejected": -597.7803344726562,
1320
+ "loss": 0.2033,
1321
+ "rewards/accuracies": 0.5062500238418579,
1322
+ "rewards/chosen": -0.043054092675447464,
1323
+ "rewards/margins": 0.037277717143297195,
1324
+ "rewards/rejected": -0.08033180981874466,
1325
+ "step": 930
1326
+ },
1327
+ {
1328
+ "epoch": 1.0,
1329
+ "step": 937,
1330
+ "total_flos": 0.0,
1331
+ "train_loss": 0.19462941225971966,
1332
+ "train_runtime": 7972.3934,
1333
+ "train_samples_per_second": 3.763,
1334
+ "train_steps_per_second": 0.118
1335
+ }
1336
+ ],
1337
+ "logging_steps": 10,
1338
+ "max_steps": 937,
1339
+ "num_input_tokens_seen": 0,
1340
+ "num_train_epochs": 1,
1341
+ "save_steps": 100,
1342
+ "total_flos": 0.0,
1343
+ "train_batch_size": 4,
1344
+ "trial_name": null,
1345
+ "trial_params": null
1346
+ }
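Since trainer_state.json is plain JSON, the per-step DPO metrics in log_history can be extracted directly; a minimal sketch using only the standard library and the field names shown above:

```python
# Hedged sketch: pull the logged training-loss curve out of trainer_state.json.
import json

with open("trainer_state.json") as f:
    state = json.load(f)

entries = [e for e in state["log_history"] if "loss" in e]   # skip the final summary entry
steps = [e["step"] for e in entries]
losses = [e["loss"] for e in entries]
margins = [e["rewards/margins"] for e in entries]
print(steps[-1], losses[-1], margins[-1])   # last logged step: 930, loss 0.2033
```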