Commit c57d7a0 by silviasapora (verified) · 1 parent: c45cbf8

Model save

Files changed (4)
  1. README.md +67 -0
  2. all_results.json +9 -0
  3. train_results.json +9 -0
  4. trainer_state.json +1176 -0
README.md ADDED
@@ -0,0 +1,67 @@
+ ---
+ base_model: google/gemma-7b
+ library_name: transformers
+ model_name: gemma-7b-softplus-basic-5e-5
+ tags:
+ - generated_from_trainer
+ - trl
+ - orpo
+ licence: license
+ ---
+
+ # Model Card for gemma-7b-softplus-basic-5e-5
+
+ This model is a fine-tuned version of [google/gemma-7b](https://huggingface.co/google/gemma-7b).
+ It has been trained using [TRL](https://github.com/huggingface/trl).
+
+ ## Quick start
+
+ ```python
+ from transformers import pipeline
+
+ question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+ generator = pipeline("text-generation", model="silviasapora/gemma-7b-softplus-basic-5e-5", device="cuda")
+ output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+ print(output["generated_text"])
+ ```
+
+ ## Training procedure
+
+ [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/silvias/huggingface/runs/xu25ngo6)
+
+
+ This model was trained with ORPO, a method introduced in [ORPO: Monolithic Preference Optimization without Reference Model](https://huggingface.co/papers/2403.07691).
+
+ ### Framework versions
+
+ - TRL: 0.13.0
+ - Transformers: 4.48.1
+ - Pytorch: 2.5.1
+ - Datasets: 3.2.0
+ - Tokenizers: 0.21.0
+
+ ## Citations
+
+ Cite ORPO as:
+
+ ```bibtex
+ @article{hong2024orpo,
+ title = {{ORPO: Monolithic Preference Optimization without Reference Model}},
+ author = {Jiwoo Hong and Noah Lee and James Thorne},
+ year = 2024,
+ eprint = {arXiv:2403.07691}
+ }
+ ```
+
+ Cite TRL as:
+
+ ```bibtex
+ @misc{vonwerra2022trl,
+ title = {{TRL: Transformer Reinforcement Learning}},
+ author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
+ year = 2020,
+ journal = {GitHub repository},
+ publisher = {GitHub},
+ howpublished = {\url{https://github.com/huggingface/trl}}
+ }
+ ```
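
The README added in this commit says the model was trained with ORPO. As a rough illustration only, the odds-ratio term described in the ORPO paper can be sketched in plain Python as below. The function names and the scalar inputs are hypothetical, and this is the paper's standard formulation, not necessarily the loss used for this run: the model name mentions `softplus`, and the commit does not show the training code.

```python
import math


def log_odds(logp: float) -> float:
    """Log-odds of a response given its length-averaged log-probability (logp < 0)."""
    # odds(y) = p / (1 - p), so log odds(y) = log p - log(1 - p)
    return logp - math.log1p(-math.exp(logp))


def odds_ratio_loss(logp_chosen: float, logp_rejected: float) -> float:
    """Odds-ratio term: -log(sigmoid(log_odds(chosen) - log_odds(rejected)))."""
    delta = log_odds(logp_chosen) - log_odds(logp_rejected)
    # -log(sigmoid(delta)) equals log(1 + exp(-delta)); this form avoids overflow
    return max(-delta, 0.0) + math.log1p(math.exp(-abs(delta)))


# Hypothetical example: chosen response has probability 0.5, rejected 0.25
print(odds_ratio_loss(math.log(0.5), math.log(0.25)))
```

In the paper, this term is scaled by a weight and added to the usual negative log-likelihood loss on the chosen response; the term shrinks as the chosen response becomes more likely than the rejected one.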
all_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+ "epoch": 2.9765925925925925,
+ "total_flos": 0.0,
+ "train_loss": 71.97097947862413,
+ "train_runtime": 8116.2448,
+ "train_samples": 6750,
+ "train_samples_per_second": 2.495,
+ "train_steps_per_second": 0.039
+ }
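
The throughput figures in `all_results.json` can be cross-checked with a little arithmetic. The sketch below uses values copied from this commit (`global_step` comes from `trainer_state.json`) and assumes the run was configured for 3 epochs, which the commit does not state directly; the logged `epoch` is about 2.98.

```python
# Values copied from all_results.json and trainer_state.json in this commit
train_samples = 6750
train_runtime = 8116.2448  # seconds
global_step = 315

# Assumption: the run was configured for 3 epochs (not stated in the commit)
assumed_num_epochs = 3

samples_per_second = train_samples * assumed_num_epochs / train_runtime
steps_per_second = global_step / train_runtime

print(round(samples_per_second, 3))  # 2.495
print(round(steps_per_second, 3))    # 0.039
```

Under that assumption both values match the reported `train_samples_per_second` and `train_steps_per_second`.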
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+ "epoch": 2.9765925925925925,
+ "total_flos": 0.0,
+ "train_loss": 71.97097947862413,
+ "train_runtime": 8116.2448,
+ "train_samples": 6750,
+ "train_samples_per_second": 2.495,
+ "train_steps_per_second": 0.039
+ }
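
Two quantities that the commit does not state can be estimated from the `log_history` entries in `trainer_state.json`: the number of samples consumed per optimizer step, and the factor relating the logged `rewards/*` values to `logps/*`. The sketch below is only an inference from the logged numbers for steps 5 and 10; the variable names are hypothetical, and reading `rewards` as scaled log-probabilities is an assumption.

```python
train_samples = 6750  # from train_results.json

# "epoch" values of the first two log entries (steps 5 and 10)
epoch_step5, epoch_step10 = 0.047407407407407405, 0.09481481481481481
samples_per_step = (epoch_step10 - epoch_step5) * train_samples / (10 - 5)
print(round(samples_per_step))  # 64

# Values from the step-5 log entry
logps_chosen = -21.246450424194336
rewards_chosen, rewards_rejected = -10.623225212097168, -8.997008323669434
logged_margin = -1.6262180805206299

# Inferred scale between rewards and log-probabilities (assumption, see above)
scale = rewards_chosen / logps_chosen
print(round(scale, 6))  # 0.5

# The logged margin agrees with chosen minus rejected up to float32 rounding
margin = rewards_chosen - rewards_rejected
print(abs(margin - logged_margin) < 1e-5)  # True
```

If this reading is right, each optimizer step covered about 64 samples, which is consistent with roughly 105 steps per epoch and 315 steps in total.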
trainer_state.json ADDED
@@ -0,0 +1,1176 @@
+ {
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 2.9765925925925925,
+ "eval_steps": 500,
+ "global_step": 315,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.047407407407407405,
+ "grad_norm": 899.4839477539062,
+ "learning_rate": 7.8125e-06,
+ "log_odds_chosen": -3.253441572189331,
+ "log_odds_ratio": -11.290441513061523,
+ "logits/chosen": 151.96023559570312,
+ "logits/rejected": 125.16461181640625,
+ "logps/chosen": -21.246450424194336,
+ "logps/rejected": -17.994016647338867,
+ "loss": 456.0568,
+ "nll_loss": 9.107990264892578,
+ "rewards/accuracies": 0.512499988079071,
+ "rewards/chosen": -10.623225212097168,
+ "rewards/margins": -1.6262180805206299,
+ "rewards/rejected": -8.997008323669434,
+ "step": 5
+ },
+ {
+ "epoch": 0.09481481481481481,
+ "grad_norm": 665.6010131835938,
+ "learning_rate": 1.5625e-05,
+ "log_odds_chosen": -0.5072727203369141,
+ "log_odds_ratio": -9.659940719604492,
+ "logits/chosen": 151.84805297851562,
+ "logits/rejected": 114.96217346191406,
+ "logps/chosen": -19.77961540222168,
+ "logps/rejected": -19.273290634155273,
+ "loss": 394.0516,
+ "nll_loss": 8.324145317077637,
+ "rewards/accuracies": 0.40625,
+ "rewards/chosen": -9.88980770111084,
+ "rewards/margins": -0.25316303968429565,
+ "rewards/rejected": -9.636645317077637,
+ "step": 10
+ },
+ {
+ "epoch": 0.14222222222222222,
+ "grad_norm": 584.8187255859375,
+ "learning_rate": 2.34375e-05,
+ "log_odds_chosen": -3.0495519638061523,
+ "log_odds_ratio": -12.564888000488281,
+ "logits/chosen": 165.22348022460938,
+ "logits/rejected": 135.7709197998047,
+ "logps/chosen": -23.73749351501465,
+ "logps/rejected": -20.685997009277344,
+ "loss": 398.6285,
+ "nll_loss": 8.013145446777344,
+ "rewards/accuracies": 0.45625001192092896,
+ "rewards/chosen": -11.868746757507324,
+ "rewards/margins": -1.5257480144500732,
+ "rewards/rejected": -10.342998504638672,
+ "step": 15
+ },
+ {
+ "epoch": 0.18962962962962962,
+ "grad_norm": 2222.182861328125,
+ "learning_rate": 3.125e-05,
+ "log_odds_chosen": -1.6242220401763916,
+ "log_odds_ratio": -6.402337551116943,
+ "logits/chosen": 166.9097137451172,
+ "logits/rejected": 148.36062622070312,
+ "logps/chosen": -13.310510635375977,
+ "logps/rejected": -11.688575744628906,
+ "loss": 263.4254,
+ "nll_loss": 5.480126857757568,
+ "rewards/accuracies": 0.4312500059604645,
+ "rewards/chosen": -6.655255317687988,
+ "rewards/margins": -0.8109671473503113,
+ "rewards/rejected": -5.844287872314453,
+ "step": 20
+ },
+ {
+ "epoch": 0.23703703703703705,
+ "grad_norm": 239.38790893554688,
+ "learning_rate": 3.90625e-05,
+ "log_odds_chosen": -0.3463691771030426,
+ "log_odds_ratio": -1.4706324338912964,
+ "logits/chosen": 178.37814331054688,
+ "logits/rejected": 188.9853515625,
+ "logps/chosen": -2.9337403774261475,
+ "logps/rejected": -2.58659291267395,
+ "loss": 101.069,
+ "nll_loss": 2.465604305267334,
+ "rewards/accuracies": 0.512499988079071,
+ "rewards/chosen": -1.4668701887130737,
+ "rewards/margins": -0.17357376217842102,
+ "rewards/rejected": -1.293296456336975,
+ "step": 25
+ },
+ {
+ "epoch": 0.28444444444444444,
+ "grad_norm": 168.3093719482422,
+ "learning_rate": 4.6875e-05,
+ "log_odds_chosen": -0.08818509429693222,
+ "log_odds_ratio": -0.9202126264572144,
+ "logits/chosen": 208.46914672851562,
+ "logits/rejected": 211.0690155029297,
+ "logps/chosen": -1.8011804819107056,
+ "logps/rejected": -1.7284847497940063,
+ "loss": 76.8201,
+ "nll_loss": 1.9414145946502686,
+ "rewards/accuracies": 0.48750001192092896,
+ "rewards/chosen": -0.9005902409553528,
+ "rewards/margins": -0.036347877234220505,
+ "rewards/rejected": -0.8642423748970032,
+ "step": 30
+ },
+ {
+ "epoch": 0.33185185185185184,
+ "grad_norm": 344.3992919921875,
+ "learning_rate": 4.998613757348784e-05,
+ "log_odds_chosen": 0.2355690896511078,
+ "log_odds_ratio": -0.7713075876235962,
+ "logits/chosen": 221.82272338867188,
+ "logits/rejected": 217.92764282226562,
+ "logps/chosen": -1.5275888442993164,
+ "logps/rejected": -1.7404054403305054,
+ "loss": 73.3205,
+ "nll_loss": 1.7808992862701416,
+ "rewards/accuracies": 0.5,
+ "rewards/chosen": -0.7637944221496582,
+ "rewards/margins": 0.1064082607626915,
+ "rewards/rejected": -0.8702027201652527,
+ "step": 35
+ },
+ {
+ "epoch": 0.37925925925925924,
+ "grad_norm": 205.59912109375,
+ "learning_rate": 4.990147841143462e-05,
+ "log_odds_chosen": 0.3138998746871948,
+ "log_odds_ratio": -0.7420972585678101,
+ "logits/chosen": 225.82821655273438,
+ "logits/rejected": 226.96774291992188,
+ "logps/chosen": -1.5016671419143677,
+ "logps/rejected": -1.7676680088043213,
+ "loss": 71.1531,
+ "nll_loss": 1.7955719232559204,
+ "rewards/accuracies": 0.574999988079071,
+ "rewards/chosen": -0.7508335709571838,
+ "rewards/margins": 0.1330004334449768,
+ "rewards/rejected": -0.8838340044021606,
+ "step": 40
+ },
+ {
+ "epoch": 0.4266666666666667,
+ "grad_norm": 139.50970458984375,
+ "learning_rate": 4.97401218720448e-05,
+ "log_odds_chosen": 0.1954536736011505,
+ "log_odds_ratio": -0.72107994556427,
+ "logits/chosen": 208.2395782470703,
+ "logits/rejected": 205.417236328125,
+ "logps/chosen": -1.3215817213058472,
+ "logps/rejected": -1.5023378133773804,
+ "loss": 66.4276,
+ "nll_loss": 1.647528886795044,
+ "rewards/accuracies": 0.550000011920929,
+ "rewards/chosen": -0.6607908606529236,
+ "rewards/margins": 0.0903780609369278,
+ "rewards/rejected": -0.7511689066886902,
+ "step": 45
+ },
+ {
+ "epoch": 0.4740740740740741,
+ "grad_norm": 77.09749603271484,
+ "learning_rate": 4.9502564938797946e-05,
+ "log_odds_chosen": 0.19976872205734253,
+ "log_odds_ratio": -0.7074006199836731,
+ "logits/chosen": 202.8539276123047,
+ "logits/rejected": 195.70155334472656,
+ "logps/chosen": -1.2350599765777588,
+ "logps/rejected": -1.4153530597686768,
+ "loss": 63.0281,
+ "nll_loss": 1.5506360530853271,
+ "rewards/accuracies": 0.574999988079071,
+ "rewards/chosen": -0.6175299882888794,
+ "rewards/margins": 0.0901465192437172,
+ "rewards/rejected": -0.7076765298843384,
+ "step": 50
+ },
+ {
+ "epoch": 0.5214814814814814,
+ "grad_norm": 59.40414810180664,
+ "learning_rate": 4.918953929490768e-05,
+ "log_odds_chosen": 0.14897426962852478,
+ "log_odds_ratio": -0.7380977869033813,
+ "logits/chosen": 190.48110961914062,
+ "logits/rejected": 195.70455932617188,
+ "logps/chosen": -1.221083641052246,
+ "logps/rejected": -1.3429195880889893,
+ "loss": 62.9068,
+ "nll_loss": 1.543068528175354,
+ "rewards/accuracies": 0.518750011920929,
+ "rewards/chosen": -0.610541820526123,
+ "rewards/margins": 0.0609179362654686,
+ "rewards/rejected": -0.6714597940444946,
+ "step": 55
+ },
+ {
+ "epoch": 0.5688888888888889,
+ "grad_norm": 126.07240295410156,
+ "learning_rate": 4.88020090697132e-05,
+ "log_odds_chosen": 0.15872475504875183,
+ "log_odds_ratio": -0.7120335102081299,
+ "logits/chosen": 186.23619079589844,
+ "logits/rejected": 179.56393432617188,
+ "logps/chosen": -1.2440166473388672,
+ "logps/rejected": -1.3791967630386353,
+ "loss": 63.2646,
+ "nll_loss": 1.5736169815063477,
+ "rewards/accuracies": 0.543749988079071,
+ "rewards/chosen": -0.6220083236694336,
+ "rewards/margins": 0.06759000569581985,
+ "rewards/rejected": -0.6895983815193176,
+ "step": 60
+ },
+ {
+ "epoch": 0.6162962962962963,
+ "grad_norm": 139.16448974609375,
+ "learning_rate": 4.834116786912897e-05,
+ "log_odds_chosen": 0.18847045302391052,
+ "log_odds_ratio": -0.7004148364067078,
+ "logits/chosen": 180.8818359375,
+ "logits/rejected": 185.00570678710938,
+ "logps/chosen": -1.2013404369354248,
+ "logps/rejected": -1.3533903360366821,
+ "loss": 60.8734,
+ "nll_loss": 1.485313057899475,
+ "rewards/accuracies": 0.5625,
+ "rewards/chosen": -0.6006702184677124,
+ "rewards/margins": 0.07602496445178986,
+ "rewards/rejected": -0.6766951680183411,
+ "step": 65
+ },
+ {
+ "epoch": 0.6637037037037037,
+ "grad_norm": 251.40016174316406,
+ "learning_rate": 4.7808435099299045e-05,
+ "log_odds_chosen": 0.18260684609413147,
+ "log_odds_ratio": -0.7129999995231628,
+ "logits/chosen": 182.959716796875,
+ "logits/rejected": 183.2157745361328,
+ "logps/chosen": -1.1487746238708496,
+ "logps/rejected": -1.2979029417037964,
+ "loss": 61.4203,
+ "nll_loss": 1.4665228128433228,
+ "rewards/accuracies": 0.543749988079071,
+ "rewards/chosen": -0.5743873119354248,
+ "rewards/margins": 0.07456417381763458,
+ "rewards/rejected": -0.6489514708518982,
+ "step": 70
+ },
+ {
+ "epoch": 0.7111111111111111,
+ "grad_norm": 124.29930877685547,
+ "learning_rate": 4.720545159477922e-05,
+ "log_odds_chosen": 0.17588524520397186,
+ "log_odds_ratio": -0.7309717535972595,
+ "logits/chosen": 175.79225158691406,
+ "logits/rejected": 185.4251708984375,
+ "logps/chosen": -1.171048641204834,
+ "logps/rejected": -1.3039839267730713,
+ "loss": 60.8133,
+ "nll_loss": 1.5102540254592896,
+ "rewards/accuracies": 0.574999988079071,
+ "rewards/chosen": -0.585524320602417,
+ "rewards/margins": 0.06646756827831268,
+ "rewards/rejected": -0.6519919633865356,
+ "step": 75
+ },
+ {
+ "epoch": 0.7585185185185185,
+ "grad_norm": 358.4803771972656,
+ "learning_rate": 4.653407456471222e-05,
+ "log_odds_chosen": 0.1115400418639183,
+ "log_odds_ratio": -0.7338749170303345,
+ "logits/chosen": 191.45970153808594,
+ "logits/rejected": 181.09698486328125,
+ "logps/chosen": -1.1183122396469116,
+ "logps/rejected": -1.2054791450500488,
+ "loss": 57.9734,
+ "nll_loss": 1.4206385612487793,
+ "rewards/accuracies": 0.4937500059604645,
+ "rewards/chosen": -0.5591561198234558,
+ "rewards/margins": 0.04358343034982681,
+ "rewards/rejected": -0.6027395725250244,
+ "step": 80
+ },
+ {
+ "epoch": 0.8059259259259259,
+ "grad_norm": 136.4271697998047,
+ "learning_rate": 4.579637187256222e-05,
+ "log_odds_chosen": 0.04175865277647972,
+ "log_odds_ratio": -0.7415879964828491,
+ "logits/chosen": 192.60121154785156,
+ "logits/rejected": 192.03916931152344,
+ "logps/chosen": -1.0868916511535645,
+ "logps/rejected": -1.1372497081756592,
+ "loss": 59.0801,
+ "nll_loss": 1.4550647735595703,
+ "rewards/accuracies": 0.574999988079071,
+ "rewards/chosen": -0.5434458255767822,
+ "rewards/margins": 0.025179097428917885,
+ "rewards/rejected": -0.5686248540878296,
+ "step": 85
+ },
+ {
+ "epoch": 0.8533333333333334,
+ "grad_norm": 56.09324645996094,
+ "learning_rate": 4.499461566702685e-05,
+ "log_odds_chosen": 0.12758071720600128,
+ "log_odds_ratio": -0.7168221473693848,
+ "logits/chosen": 182.20901489257812,
+ "logits/rejected": 187.005126953125,
+ "logps/chosen": -1.0412800312042236,
+ "logps/rejected": -1.140512466430664,
+ "loss": 56.929,
+ "nll_loss": 1.3427871465682983,
+ "rewards/accuracies": 0.5625,
+ "rewards/chosen": -0.5206400156021118,
+ "rewards/margins": 0.04961626976728439,
+ "rewards/rejected": -0.570256233215332,
+ "step": 90
+ },
+ {
+ "epoch": 0.9007407407407407,
+ "grad_norm": 101.89226531982422,
+ "learning_rate": 4.413127538374411e-05,
+ "log_odds_chosen": 0.03433248773217201,
+ "log_odds_ratio": -0.7621655464172363,
+ "logits/chosen": 183.79061889648438,
+ "logits/rejected": 181.02096557617188,
+ "logps/chosen": -1.1271755695343018,
+ "logps/rejected": -1.155602216720581,
+ "loss": 57.6873,
+ "nll_loss": 1.4238755702972412,
+ "rewards/accuracies": 0.4937500059604645,
+ "rewards/chosen": -0.5635877847671509,
+ "rewards/margins": 0.014213320799171925,
+ "rewards/rejected": -0.5778011083602905,
+ "step": 95
+ },
+ {
+ "epoch": 0.9481481481481482,
+ "grad_norm": 56.19839859008789,
+ "learning_rate": 4.320901013934887e-05,
+ "log_odds_chosen": 0.19043493270874023,
+ "log_odds_ratio": -0.6898019313812256,
+ "logits/chosen": 187.52651977539062,
+ "logits/rejected": 188.4263458251953,
+ "logps/chosen": -1.064992904663086,
+ "logps/rejected": -1.2005788087844849,
+ "loss": 56.784,
+ "nll_loss": 1.3723514080047607,
+ "rewards/accuracies": 0.5562499761581421,
+ "rewards/chosen": -0.532496452331543,
+ "rewards/margins": 0.06779297441244125,
+ "rewards/rejected": -0.6002894043922424,
+ "step": 100
+ },
+ {
+ "epoch": 0.9955555555555555,
+ "grad_norm": 134.25718688964844,
+ "learning_rate": 4.223066054130568e-05,
+ "log_odds_chosen": 0.19098126888275146,
+ "log_odds_ratio": -0.708988606929779,
+ "logits/chosen": 186.099365234375,
+ "logits/rejected": 191.89564514160156,
+ "logps/chosen": -1.0854551792144775,
+ "logps/rejected": -1.2490203380584717,
+ "loss": 59.6264,
+ "nll_loss": 1.4756276607513428,
+ "rewards/accuracies": 0.5562499761581421,
+ "rewards/chosen": -0.5427275896072388,
+ "rewards/margins": 0.0817825049161911,
+ "rewards/rejected": -0.6245101690292358,
+ "step": 105
+ },
+ {
+ "epoch": 1.037925925925926,
+ "grad_norm": 48.896156311035156,
+ "learning_rate": 4.1199239938743797e-05,
+ "log_odds_chosen": 0.09143175929784775,
+ "log_odds_ratio": -0.7199470400810242,
+ "logits/chosen": 192.02879333496094,
+ "logits/rejected": 189.01097106933594,
+ "logps/chosen": -1.0210695266723633,
+ "logps/rejected": -1.0847587585449219,
+ "loss": 46.1645,
+ "nll_loss": 1.2595568895339966,
+ "rewards/accuracies": 0.5384615659713745,
+ "rewards/chosen": -0.5105347633361816,
+ "rewards/margins": 0.031844645738601685,
+ "rewards/rejected": -0.5423793792724609,
+ "step": 110
+ },
+ {
+ "epoch": 1.0853333333333333,
+ "grad_norm": 44.063316345214844,
+ "learning_rate": 4.0117925141242174e-05,
+ "log_odds_chosen": 0.09873426705598831,
+ "log_odds_ratio": -0.741472601890564,
+ "logits/chosen": 185.77395629882812,
+ "logits/rejected": 194.8834686279297,
+ "logps/chosen": -0.9893190264701843,
+ "logps/rejected": -1.0386199951171875,
+ "loss": 50.7825,
+ "nll_loss": 1.2114965915679932,
+ "rewards/accuracies": 0.550000011920929,
+ "rewards/chosen": -0.49465951323509216,
+ "rewards/margins": 0.024650435894727707,
+ "rewards/rejected": -0.5193099975585938,
+ "step": 115
+ },
+ {
+ "epoch": 1.1327407407407408,
+ "grad_norm": 42.30266571044922,
+ "learning_rate": 3.899004663415084e-05,
+ "log_odds_chosen": 0.03399781137704849,
+ "log_odds_ratio": -0.7742182612419128,
+ "logits/chosen": 182.58143615722656,
+ "logits/rejected": 186.83212280273438,
+ "logps/chosen": -0.9977661967277527,
+ "logps/rejected": -1.0048539638519287,
+ "loss": 51.3458,
+ "nll_loss": 1.2454302310943604,
+ "rewards/accuracies": 0.512499988079071,
+ "rewards/chosen": -0.49888309836387634,
+ "rewards/margins": 0.0035438979975879192,
+ "rewards/rejected": -0.5024269819259644,
+ "step": 120
+ },
+ {
+ "epoch": 1.1801481481481482,
+ "grad_norm": 125.85904693603516,
+ "learning_rate": 3.781907832058587e-05,
+ "log_odds_chosen": 0.09150873124599457,
+ "log_odds_ratio": -0.7233962416648865,
+ "logits/chosen": 181.49447631835938,
+ "logits/rejected": 180.0559844970703,
+ "logps/chosen": -0.9917734861373901,
+ "logps/rejected": -1.051242470741272,
+ "loss": 49.8693,
+ "nll_loss": 1.1629549264907837,
+ "rewards/accuracies": 0.59375,
+ "rewards/chosen": -0.49588674306869507,
+ "rewards/margins": 0.029734421521425247,
+ "rewards/rejected": -0.525621235370636,
+ "step": 125
+ },
+ {
+ "epoch": 1.2275555555555555,
+ "grad_norm": 39.95571517944336,
+ "learning_rate": 3.660862682169282e-05,
+ "log_odds_chosen": 0.09172184765338898,
+ "log_odds_ratio": -0.7451839447021484,
+ "logits/chosen": 179.13302612304688,
+ "logits/rejected": 183.4129638671875,
+ "logps/chosen": -1.0290215015411377,
+ "logps/rejected": -1.0855556726455688,
+ "loss": 51.0956,
+ "nll_loss": 1.1969176530838013,
+ "rewards/accuracies": 0.518750011920929,
+ "rewards/chosen": -0.5145107507705688,
+ "rewards/margins": 0.028267022222280502,
+ "rewards/rejected": -0.5427778363227844,
+ "step": 130
+ },
+ {
+ "epoch": 1.274962962962963,
+ "grad_norm": 115.57888793945312,
+ "learning_rate": 3.5362420368134356e-05,
+ "log_odds_chosen": 0.07470552623271942,
+ "log_odds_ratio": -0.7265952825546265,
+ "logits/chosen": 183.11488342285156,
+ "logits/rejected": 177.27346801757812,
+ "logps/chosen": -0.9192112684249878,
+ "logps/rejected": -0.9482501745223999,
+ "loss": 49.3732,
+ "nll_loss": 1.1410008668899536,
+ "rewards/accuracies": 0.574999988079071,
+ "rewards/chosen": -0.4596056342124939,
+ "rewards/margins": 0.0145194623619318,
+ "rewards/rejected": -0.47412508726119995,
+ "step": 135
+ },
+ {
+ "epoch": 1.3223703703703704,
+ "grad_norm": 40.40840148925781,
+ "learning_rate": 3.408429731701635e-05,
+ "log_odds_chosen": 0.15658657252788544,
+ "log_odds_ratio": -0.7292534112930298,
+ "logits/chosen": 183.2898712158203,
+ "logits/rejected": 183.63088989257812,
+ "logps/chosen": -0.9870889782905579,
+ "logps/rejected": -1.0798033475875854,
+ "loss": 51.391,
+ "nll_loss": 1.1972719430923462,
+ "rewards/accuracies": 0.5625,
+ "rewards/chosen": -0.49354448914527893,
+ "rewards/margins": 0.046357229351997375,
+ "rewards/rejected": -0.5399016737937927,
+ "step": 140
+ },
+ {
+ "epoch": 1.3697777777777778,
+ "grad_norm": 43.5965461730957,
+ "learning_rate": 3.2778194329621104e-05,
+ "log_odds_chosen": 0.0713091716170311,
+ "log_odds_ratio": -0.7587572932243347,
+ "logits/chosen": 175.42311096191406,
+ "logits/rejected": 179.28359985351562,
+ "logps/chosen": -0.9829910397529602,
+ "logps/rejected": -1.0261934995651245,
+ "loss": 51.113,
+ "nll_loss": 1.2084944248199463,
+ "rewards/accuracies": 0.48750001192092896,
+ "rewards/chosen": -0.4914955198764801,
+ "rewards/margins": 0.021601270884275436,
+ "rewards/rejected": -0.5130967497825623,
+ "step": 145
+ },
+ {
+ "epoch": 1.417185185185185,
+ "grad_norm": 68.55988311767578,
+ "learning_rate": 3.144813424636031e-05,
+ "log_odds_chosen": 0.03821509703993797,
+ "log_odds_ratio": -0.7714879512786865,
+ "logits/chosen": 173.73973083496094,
+ "logits/rejected": 177.93600463867188,
+ "logps/chosen": -1.0191147327423096,
+ "logps/rejected": -1.0414360761642456,
+ "loss": 49.2398,
+ "nll_loss": 1.162936806678772,
+ "rewards/accuracies": 0.46875,
+ "rewards/chosen": -0.5095573663711548,
+ "rewards/margins": 0.011160662397742271,
+ "rewards/rejected": -0.5207180380821228,
+ "step": 150
+ },
+ {
+ "epoch": 1.4645925925925927,
+ "grad_norm": 87.03449249267578,
+ "learning_rate": 3.0098213696293542e-05,
+ "log_odds_chosen": 0.10836371034383774,
+ "log_odds_ratio": -0.7314402461051941,
+ "logits/chosen": 175.88619995117188,
+ "logits/rejected": 182.52371215820312,
+ "logps/chosen": -0.9865490794181824,
+ "logps/rejected": -1.0499975681304932,
+ "loss": 50.2605,
+ "nll_loss": 1.1926219463348389,
+ "rewards/accuracies": 0.5562499761581421,
+ "rewards/chosen": -0.4932745397090912,
+ "rewards/margins": 0.03172413632273674,
+ "rewards/rejected": -0.5249987840652466,
+ "step": 155
+ },
+ {
+ "epoch": 1.512,
+ "grad_norm": 58.0168342590332,
+ "learning_rate": 2.8732590479375165e-05,
+ "log_odds_chosen": 0.12078354507684708,
+ "log_odds_ratio": -0.7301235795021057,
+ "logits/chosen": 173.8167266845703,
+ "logits/rejected": 181.9156494140625,
+ "logps/chosen": -0.9404324293136597,
+ "logps/rejected": -1.0260807275772095,
+ "loss": 48.9856,
+ "nll_loss": 1.1453100442886353,
+ "rewards/accuracies": 0.512499988079071,
+ "rewards/chosen": -0.47021621465682983,
+ "rewards/margins": 0.0428241565823555,
+ "rewards/rejected": -0.5130403637886047,
+ "step": 160
+ },
+ {
+ "epoch": 1.5594074074074074,
+ "grad_norm": 47.982913970947266,
+ "learning_rate": 2.7355470760292956e-05,
+ "log_odds_chosen": 0.0946766808629036,
+ "log_odds_ratio": -0.7334403395652771,
+ "logits/chosen": 166.52523803710938,
+ "logits/rejected": 171.69485473632812,
+ "logps/chosen": -0.9203466176986694,
+ "logps/rejected": -0.9718505144119263,
+ "loss": 50.1007,
+ "nll_loss": 1.1756643056869507,
+ "rewards/accuracies": 0.543749988079071,
+ "rewards/chosen": -0.4601733088493347,
+ "rewards/margins": 0.025751952081918716,
+ "rewards/rejected": -0.48592525720596313,
+ "step": 165
+ },
+ {
+ "epoch": 1.6068148148148147,
+ "grad_norm": 50.041561126708984,
+ "learning_rate": 2.597109611334169e-05,
+ "log_odds_chosen": 0.17343485355377197,
+ "log_odds_ratio": -0.6794081926345825,
+ "logits/chosen": 171.59066772460938,
+ "logits/rejected": 179.27670288085938,
+ "logps/chosen": -0.936198353767395,
+ "logps/rejected": -1.0650147199630737,
+ "loss": 49.7789,
+ "nll_loss": 1.186452031135559,
+ "rewards/accuracies": 0.574999988079071,
+ "rewards/chosen": -0.4680991768836975,
+ "rewards/margins": 0.06440822780132294,
+ "rewards/rejected": -0.5325073599815369,
+ "step": 170
+ },
+ {
+ "epoch": 1.6542222222222223,
+ "grad_norm": 47.45258331298828,
+ "learning_rate": 2.458373045823404e-05,
+ "log_odds_chosen": 0.07308103144168854,
+ "log_odds_ratio": -0.7454826235771179,
+ "logits/chosen": 172.97149658203125,
+ "logits/rejected": 172.43182373046875,
+ "logps/chosen": -0.9455814361572266,
+ "logps/rejected": -1.0050225257873535,
+ "loss": 49.1083,
+ "nll_loss": 1.1290260553359985,
+ "rewards/accuracies": 0.5375000238418579,
+ "rewards/chosen": -0.4727907180786133,
+ "rewards/margins": 0.029720569029450417,
+ "rewards/rejected": -0.5025112628936768,
+ "step": 175
+ },
+ {
+ "epoch": 1.7016296296296296,
+ "grad_norm": 58.00626754760742,
+ "learning_rate": 2.3197646927086697e-05,
+ "log_odds_chosen": 0.1577189862728119,
+ "log_odds_ratio": -0.7023075819015503,
+ "logits/chosen": 168.512451171875,
+ "logits/rejected": 175.3289794921875,
+ "logps/chosen": -0.9105680584907532,
+ "logps/rejected": -1.0289887189865112,
+ "loss": 49.2958,
+ "nll_loss": 1.1708818674087524,
+ "rewards/accuracies": 0.53125,
+ "rewards/chosen": -0.4552840292453766,
+ "rewards/margins": 0.059210360050201416,
+ "rewards/rejected": -0.5144943594932556,
+ "step": 180
+ },
+ {
+ "epoch": 1.749037037037037,
+ "grad_norm": 34.98508071899414,
+ "learning_rate": 2.1817114703032176e-05,
+ "log_odds_chosen": 0.12351224571466446,
+ "log_odds_ratio": -0.7270097732543945,
+ "logits/chosen": 173.38623046875,
+ "logits/rejected": 165.6890869140625,
+ "logps/chosen": -0.9919120669364929,
+ "logps/rejected": -1.0763611793518066,
+ "loss": 48.9183,
+ "nll_loss": 1.1456924676895142,
+ "rewards/accuracies": 0.5562499761581421,
+ "rewards/chosen": -0.49595603346824646,
+ "rewards/margins": 0.04222451522946358,
+ "rewards/rejected": -0.5381805896759033,
+ "step": 185
+ },
+ {
+ "epoch": 1.7964444444444445,
+ "grad_norm": 35.57364273071289,
+ "learning_rate": 2.0446385870993467e-05,
+ "log_odds_chosen": 0.046650804579257965,
+ "log_odds_ratio": -0.7537095546722412,
+ "logits/chosen": 161.9040069580078,
+ "logits/rejected": 164.87686157226562,
+ "logps/chosen": -0.9197586178779602,
+ "logps/rejected": -0.9383406639099121,
+ "loss": 49.163,
+ "nll_loss": 1.1618720293045044,
+ "rewards/accuracies": 0.48750001192092896,
+ "rewards/chosen": -0.4598793089389801,
+ "rewards/margins": 0.009291025809943676,
+ "rewards/rejected": -0.46917033195495605,
+ "step": 190
+ },
+ {
+ "epoch": 1.8438518518518519,
+ "grad_norm": 57.462890625,
+ "learning_rate": 1.9089682321121834e-05,
+ "log_odds_chosen": 0.10921354591846466,
+ "log_odds_ratio": -0.7022596001625061,
+ "logits/chosen": 164.3706817626953,
+ "logits/rejected": 167.53907775878906,
+ "logps/chosen": -0.9306834936141968,
+ "logps/rejected": -1.0057896375656128,
+ "loss": 48.8493,
+ "nll_loss": 1.1373628377914429,
+ "rewards/accuracies": 0.550000011920929,
+ "rewards/chosen": -0.4653417468070984,
+ "rewards/margins": 0.03755306452512741,
+ "rewards/rejected": -0.5028948187828064,
+ "step": 195
+ },
+ {
+ "epoch": 1.8912592592592592,
+ "grad_norm": 51.58010482788086,
+ "learning_rate": 1.775118274523545e-05,
+ "log_odds_chosen": 0.0803312212228775,
+ "log_odds_ratio": -0.7448180913925171,
+ "logits/chosen": 177.42755126953125,
+ "logits/rejected": 166.17420959472656,
+ "logps/chosen": -0.971086323261261,
+ "logps/rejected": -1.025727391242981,
+ "loss": 50.0697,
+ "nll_loss": 1.1380794048309326,
+ "rewards/accuracies": 0.5562499761581421,
+ "rewards/chosen": -0.4855431616306305,
+ "rewards/margins": 0.027320479974150658,
+ "rewards/rejected": -0.5128636956214905,
+ "step": 200
+ },
731
+ {
732
+ "epoch": 1.9386666666666668,
733
+ "grad_norm": 40.688987731933594,
734
+ "learning_rate": 1.643500976631037e-05,
735
+ "log_odds_chosen": 0.10912259668111801,
736
+ "log_odds_ratio": -0.7380350232124329,
737
+ "logits/chosen": 173.36325073242188,
738
+ "logits/rejected": 171.0248565673828,
739
+ "logps/chosen": -0.9703338742256165,
740
+ "logps/rejected": -1.0370721817016602,
741
+ "loss": 48.6517,
742
+ "nll_loss": 1.1243274211883545,
743
+ "rewards/accuracies": 0.5562499761581421,
744
+ "rewards/chosen": -0.4851669371128082,
745
+ "rewards/margins": 0.03336922079324722,
746
+ "rewards/rejected": -0.5185360908508301,
747
+ "step": 205
748
+ },
749
+ {
750
+ "epoch": 1.986074074074074,
751
+ "grad_norm": 47.03554153442383,
752
+ "learning_rate": 1.514521724066537e-05,
753
+ "log_odds_chosen": 0.07311935722827911,
754
+ "log_odds_ratio": -0.7410269975662231,
755
+ "logits/chosen": 161.46792602539062,
756
+ "logits/rejected": 176.0153045654297,
757
+ "logps/chosen": -0.9304733276367188,
758
+ "logps/rejected": -0.9752525091171265,
759
+ "loss": 49.3022,
760
+ "nll_loss": 1.163869857788086,
761
+ "rewards/accuracies": 0.5375000238418579,
762
+ "rewards/chosen": -0.4652366638183594,
763
+ "rewards/margins": 0.02238963544368744,
764
+ "rewards/rejected": -0.48762625455856323,
765
+ "step": 210
766
+ },
767
+ {
768
+ "epoch": 2.0284444444444443,
769
+ "grad_norm": 46.73811721801758,
770
+ "learning_rate": 1.3885777771950348e-05,
771
+ "log_odds_chosen": 0.04928523674607277,
772
+ "log_odds_ratio": -0.7592554688453674,
773
+ "logits/chosen": 154.12142944335938,
774
+ "logits/rejected": 166.54530334472656,
775
+ "logps/chosen": -0.8669578433036804,
776
+ "logps/rejected": -0.8768206238746643,
777
+ "loss": 38.0486,
778
+ "nll_loss": 0.969521701335907,
779
+ "rewards/accuracies": 0.5314685106277466,
780
+ "rewards/chosen": -0.4334789216518402,
781
+ "rewards/margins": 0.004931411240249872,
782
+ "rewards/rejected": -0.43841031193733215,
783
+ "step": 215
784
+ },
785
+ {
786
+ "epoch": 2.075851851851852,
787
+ "grad_norm": 66.5871353149414,
788
+ "learning_rate": 1.2660570475395683e-05,
789
+ "log_odds_chosen": 0.07944320142269135,
790
+ "log_odds_ratio": -0.7612829804420471,
791
+ "logits/chosen": 158.1134033203125,
792
+ "logits/rejected": 145.2221221923828,
793
+ "logps/chosen": -0.8715459704399109,
794
+ "logps/rejected": -0.8965581059455872,
795
+ "loss": 42.355,
796
+ "nll_loss": 0.9041290283203125,
797
+ "rewards/accuracies": 0.606249988079071,
798
+ "rewards/chosen": -0.43577298521995544,
799
+ "rewards/margins": 0.01250611525028944,
800
+ "rewards/rejected": -0.4482790529727936,
801
+ "step": 220
802
+ },
803
+ {
804
+ "epoch": 2.1232592592592594,
805
+ "grad_norm": 43.993412017822266,
806
+ "learning_rate": 1.1473369030008974e-05,
807
+ "log_odds_chosen": 0.07922948896884918,
808
+ "log_odds_ratio": -0.748726487159729,
809
+ "logits/chosen": 150.79869079589844,
810
+ "logits/rejected": 159.51519775390625,
811
+ "logps/chosen": -0.9018028974533081,
812
+ "logps/rejected": -0.9198867082595825,
813
+ "loss": 41.5811,
814
+ "nll_loss": 0.9309173822402954,
815
+ "rewards/accuracies": 0.550000011920929,
816
+ "rewards/chosen": -0.45090144872665405,
817
+ "rewards/margins": 0.009041833691298962,
818
+ "rewards/rejected": -0.45994335412979126,
819
+ "step": 225
820
+ },
821
+ {
822
+ "epoch": 2.1706666666666665,
823
+ "grad_norm": 42.56783676147461,
824
+ "learning_rate": 1.0327830055518842e-05,
825
+ "log_odds_chosen": 0.1404309719800949,
826
+ "log_odds_ratio": -0.7541959285736084,
827
+ "logits/chosen": 149.96189880371094,
828
+ "logits/rejected": 155.0712127685547,
829
+ "logps/chosen": -0.8312407732009888,
830
+ "logps/rejected": -0.9046605825424194,
831
+ "loss": 41.5919,
832
+ "nll_loss": 0.9145228266716003,
833
+ "rewards/accuracies": 0.59375,
834
+ "rewards/chosen": -0.4156203866004944,
835
+ "rewards/margins": 0.03670995682477951,
836
+ "rewards/rejected": -0.4523302912712097,
837
+ "step": 230
838
+ },
839
+ {
840
+ "epoch": 2.218074074074074,
841
+ "grad_norm": 46.45758819580078,
842
+ "learning_rate": 9.227481849865235e-06,
843
+ "log_odds_chosen": -0.03481599688529968,
844
+ "log_odds_ratio": -0.8210490942001343,
845
+ "logits/chosen": 156.96890258789062,
846
+ "logits/rejected": 156.92953491210938,
847
+ "logps/chosen": -0.899451732635498,
848
+ "logps/rejected": -0.8785243034362793,
849
+ "loss": 42.667,
850
+ "nll_loss": 0.988889217376709,
851
+ "rewards/accuracies": 0.5062500238418579,
852
+ "rewards/chosen": -0.449725866317749,
853
+ "rewards/margins": -0.010463694110512733,
854
+ "rewards/rejected": -0.43926215171813965,
855
+ "step": 235
856
+ },
857
+ {
858
+ "epoch": 2.2654814814814817,
859
+ "grad_norm": 41.77062225341797,
860
+ "learning_rate": 8.175713521924978e-06,
861
+ "log_odds_chosen": 0.19800688326358795,
862
+ "log_odds_ratio": -0.7289772033691406,
863
+ "logits/chosen": 161.128173828125,
864
+ "logits/rejected": 159.4349365234375,
865
+ "logps/chosen": -0.8593847155570984,
866
+ "logps/rejected": -0.9744852185249329,
867
+ "loss": 41.5045,
868
+ "nll_loss": 0.9035336375236511,
869
+ "rewards/accuracies": 0.612500011920929,
870
+ "rewards/chosen": -0.4296923577785492,
871
+ "rewards/margins": 0.05755022168159485,
872
+ "rewards/rejected": -0.48724260926246643,
873
+ "step": 240
874
+ },
875
+ {
876
+ "epoch": 2.3128888888888888,
877
+ "grad_norm": 41.01064682006836,
878
+ "learning_rate": 7.1757645529443665e-06,
879
+ "log_odds_chosen": -0.0014010012382641435,
880
+ "log_odds_ratio": -0.7947742342948914,
881
+ "logits/chosen": 155.06546020507812,
882
+ "logits/rejected": 155.10775756835938,
883
+ "logps/chosen": -0.878911018371582,
884
+ "logps/rejected": -0.8687770962715149,
885
+ "loss": 41.6816,
886
+ "nll_loss": 0.9432564973831177,
887
+ "rewards/accuracies": 0.5249999761581421,
888
+ "rewards/chosen": -0.439455509185791,
889
+ "rewards/margins": -0.0050669461488723755,
890
+ "rewards/rejected": -0.43438854813575745,
891
+ "step": 245
892
+ },
893
+ {
894
+ "epoch": 2.3602962962962963,
895
+ "grad_norm": 41.34367752075195,
896
+ "learning_rate": 6.230714818829733e-06,
897
+ "log_odds_chosen": 0.0013508498668670654,
898
+ "log_odds_ratio": -0.7936294674873352,
899
+ "logits/chosen": 153.34498596191406,
900
+ "logits/rejected": 158.2041015625,
901
+ "logps/chosen": -0.8574392199516296,
902
+ "logps/rejected": -0.8317993879318237,
903
+ "loss": 42.0429,
904
+ "nll_loss": 0.932290256023407,
905
+ "rewards/accuracies": 0.5375000238418579,
906
+ "rewards/chosen": -0.4287196099758148,
907
+ "rewards/margins": -0.0128199253231287,
908
+ "rewards/rejected": -0.41589969396591187,
909
+ "step": 250
910
+ },
911
+ {
912
+ "epoch": 2.407703703703704,
913
+ "grad_norm": 37.83774185180664,
914
+ "learning_rate": 5.343475104027743e-06,
915
+ "log_odds_chosen": 0.17812205851078033,
916
+ "log_odds_ratio": -0.7299228310585022,
917
+ "logits/chosen": 155.40672302246094,
918
+ "logits/rejected": 160.87484741210938,
919
+ "logps/chosen": -0.8623374104499817,
920
+ "logps/rejected": -0.9512872695922852,
921
+ "loss": 42.1891,
922
+ "nll_loss": 0.9383618235588074,
923
+ "rewards/accuracies": 0.5625,
924
+ "rewards/chosen": -0.43116870522499084,
925
+ "rewards/margins": 0.044474903494119644,
926
+ "rewards/rejected": -0.4756436347961426,
927
+ "step": 255
928
+ },
929
+ {
930
+ "epoch": 2.455111111111111,
931
+ "grad_norm": 34.86994934082031,
932
+ "learning_rate": 4.516778136213037e-06,
933
+ "log_odds_chosen": 0.06448222696781158,
934
+ "log_odds_ratio": -0.7623555660247803,
935
+ "logits/chosen": 159.42433166503906,
936
+ "logits/rejected": 155.6562957763672,
937
+ "logps/chosen": -0.853624701499939,
938
+ "logps/rejected": -0.8752864003181458,
939
+ "loss": 41.3101,
940
+ "nll_loss": 0.8972175717353821,
941
+ "rewards/accuracies": 0.5625,
942
+ "rewards/chosen": -0.4268123507499695,
943
+ "rewards/margins": 0.010830795392394066,
944
+ "rewards/rejected": -0.4376432001590729,
945
+ "step": 260
946
+ },
947
+ {
948
+ "epoch": 2.5025185185185186,
949
+ "grad_norm": 40.81254577636719,
950
+ "learning_rate": 3.7531701693965554e-06,
951
+ "log_odds_chosen": 0.0020365030504763126,
952
+ "log_odds_ratio": -0.7836844325065613,
953
+ "logits/chosen": 152.5310516357422,
954
+ "logits/rejected": 162.28890991210938,
955
+ "logps/chosen": -0.904189944267273,
956
+ "logps/rejected": -0.8998059034347534,
957
+ "loss": 41.5474,
958
+ "nll_loss": 0.9223392605781555,
959
+ "rewards/accuracies": 0.5249999761581421,
960
+ "rewards/chosen": -0.4520949721336365,
961
+ "rewards/margins": -0.002192039042711258,
962
+ "rewards/rejected": -0.4499029517173767,
963
+ "step": 265
964
+ },
965
+ {
966
+ "epoch": 2.549925925925926,
967
+ "grad_norm": 35.19020080566406,
968
+ "learning_rate": 3.055003141378948e-06,
969
+ "log_odds_chosen": 0.08025786280632019,
970
+ "log_odds_ratio": -0.7740025520324707,
971
+ "logits/chosen": 152.10177612304688,
972
+ "logits/rejected": 160.26998901367188,
973
+ "logps/chosen": -0.8461346626281738,
974
+ "logps/rejected": -0.8950198888778687,
975
+ "loss": 41.2177,
976
+ "nll_loss": 0.8989561200141907,
977
+ "rewards/accuracies": 0.543749988079071,
978
+ "rewards/chosen": -0.4230673313140869,
979
+ "rewards/margins": 0.02444261871278286,
980
+ "rewards/rejected": -0.4475099444389343,
981
+ "step": 270
982
+ },
983
+ {
984
+ "epoch": 2.5973333333333333,
985
+ "grad_norm": 39.02735900878906,
986
+ "learning_rate": 2.424427429704365e-06,
987
+ "log_odds_chosen": 0.061826206743717194,
988
+ "log_odds_ratio": -0.7568526268005371,
989
+ "logits/chosen": 161.21261596679688,
990
+ "logits/rejected": 161.19491577148438,
991
+ "logps/chosen": -0.8993644714355469,
992
+ "logps/rejected": -0.903190016746521,
993
+ "loss": 41.1732,
994
+ "nll_loss": 0.8910198211669922,
995
+ "rewards/accuracies": 0.5874999761581421,
996
+ "rewards/chosen": -0.44968223571777344,
997
+ "rewards/margins": 0.0019127646228298545,
998
+ "rewards/rejected": -0.4515950083732605,
999
+ "step": 275
1000
+ },
1001
+ {
1002
+ "epoch": 2.644740740740741,
1003
+ "grad_norm": 39.4185676574707,
1004
+ "learning_rate": 1.8633852284264508e-06,
1005
+ "log_odds_chosen": 0.02886720933020115,
1006
+ "log_odds_ratio": -0.8128382563591003,
1007
+ "logits/chosen": 153.80528259277344,
1008
+ "logits/rejected": 150.25201416015625,
1009
+ "logps/chosen": -0.8430732488632202,
1010
+ "logps/rejected": -0.8419806361198425,
1011
+ "loss": 41.047,
1012
+ "nll_loss": 0.9150816202163696,
1013
+ "rewards/accuracies": 0.5062500238418579,
1014
+ "rewards/chosen": -0.4215366244316101,
1015
+ "rewards/margins": -0.0005463186535052955,
1016
+ "rewards/rejected": -0.42099031805992126,
1017
+ "step": 280
1018
+ },
1019
+ {
1020
+ "epoch": 2.6921481481481484,
1021
+ "grad_norm": 50.44974899291992,
1022
+ "learning_rate": 1.3736045660864034e-06,
1023
+ "log_odds_chosen": -0.17111532390117645,
1024
+ "log_odds_ratio": -0.9221655130386353,
1025
+ "logits/chosen": 152.35177612304688,
1026
+ "logits/rejected": 156.8704376220703,
1027
+ "logps/chosen": -1.0376865863800049,
1028
+ "logps/rejected": -0.9074798822402954,
1029
+ "loss": 42.497,
1030
+ "nll_loss": 0.9657168388366699,
1031
+ "rewards/accuracies": 0.4625000059604645,
1032
+ "rewards/chosen": -0.5188432931900024,
1033
+ "rewards/margins": -0.06510341912508011,
1034
+ "rewards/rejected": -0.4537399411201477,
1035
+ "step": 285
1036
+ },
1037
+ {
1038
+ "epoch": 2.7395555555555555,
1039
+ "grad_norm": 38.56003952026367,
1040
+ "learning_rate": 9.565939833279192e-07,
1041
+ "log_odds_chosen": -0.0652594044804573,
1042
+ "log_odds_ratio": -0.8579480051994324,
1043
+ "logits/chosen": 155.33377075195312,
1044
+ "logits/rejected": 156.094482421875,
1045
+ "logps/chosen": -0.9537650942802429,
1046
+ "logps/rejected": -0.8857321739196777,
1047
+ "loss": 41.9732,
1048
+ "nll_loss": 0.9463297724723816,
1049
+ "rewards/accuracies": 0.512499988079071,
1050
+ "rewards/chosen": -0.47688254714012146,
1051
+ "rewards/margins": -0.034016452729701996,
1052
+ "rewards/rejected": -0.44286608695983887,
1053
+ "step": 290
1054
+ },
1055
+ {
1056
+ "epoch": 2.786962962962963,
1057
+ "grad_norm": 35.984092712402344,
1058
+ "learning_rate": 6.136378865420872e-07,
1059
+ "log_odds_chosen": -0.05488128587603569,
1060
+ "log_odds_ratio": -0.8374530076980591,
1061
+ "logits/chosen": 159.79251098632812,
1062
+ "logits/rejected": 153.42626953125,
1063
+ "logps/chosen": -0.8930980563163757,
1064
+ "logps/rejected": -0.848624587059021,
1065
+ "loss": 42.7359,
1066
+ "nll_loss": 0.9567394256591797,
1067
+ "rewards/accuracies": 0.4937500059604645,
1068
+ "rewards/chosen": -0.44654902815818787,
1069
+ "rewards/margins": -0.02223675511777401,
1070
+ "rewards/rejected": -0.4243122935295105,
1071
+ "step": 295
1072
+ },
1073
+ {
1074
+ "epoch": 2.83437037037037,
1075
+ "grad_norm": 40.65980911254883,
1076
+ "learning_rate": 3.45792591853214e-07,
1077
+ "log_odds_chosen": 0.03835665062069893,
1078
+ "log_odds_ratio": -0.7734814286231995,
1079
+ "logits/chosen": 157.85092163085938,
1080
+ "logits/rejected": 154.9134979248047,
1081
+ "logps/chosen": -0.8622520565986633,
1082
+ "logps/rejected": -0.8551470637321472,
1083
+ "loss": 41.1462,
1084
+ "nll_loss": 0.9184630513191223,
1085
+ "rewards/accuracies": 0.543749988079071,
1086
+ "rewards/chosen": -0.43112602829933167,
1087
+ "rewards/margins": -0.0035525336861610413,
1088
+ "rewards/rejected": -0.4275735318660736,
1089
+ "step": 300
1090
+ },
1091
+ {
1092
+ "epoch": 2.8817777777777778,
1093
+ "grad_norm": 45.03302001953125,
1094
+ "learning_rate": 1.538830716302092e-07,
1095
+ "log_odds_chosen": 0.0824732631444931,
1096
+ "log_odds_ratio": -0.7868438959121704,
1097
+ "logits/chosen": 152.17709350585938,
1098
+ "logits/rejected": 160.5044403076172,
1099
+ "logps/chosen": -0.8794566988945007,
1100
+ "logps/rejected": -0.887550950050354,
1101
+ "loss": 40.8093,
1102
+ "nll_loss": 0.9352946281433105,
1103
+ "rewards/accuracies": 0.543749988079071,
1104
+ "rewards/chosen": -0.43972834944725037,
1105
+ "rewards/margins": 0.004047115799039602,
1106
+ "rewards/rejected": -0.443775475025177,
1107
+ "step": 305
1108
+ },
1109
+ {
1110
+ "epoch": 2.9291851851851853,
1111
+ "grad_norm": 41.13324737548828,
1112
+ "learning_rate": 3.8500413544415025e-08,
1113
+ "log_odds_chosen": 0.036232512444257736,
1114
+ "log_odds_ratio": -0.7697147727012634,
1115
+ "logits/chosen": 153.0935821533203,
1116
+ "logits/rejected": 152.86441040039062,
1117
+ "logps/chosen": -0.8729937672615051,
1118
+ "logps/rejected": -0.8775345087051392,
1119
+ "loss": 41.2786,
1120
+ "nll_loss": 0.9117900133132935,
1121
+ "rewards/accuracies": 0.59375,
1122
+ "rewards/chosen": -0.43649688363075256,
1123
+ "rewards/margins": 0.002270397264510393,
1124
+ "rewards/rejected": -0.4387672543525696,
1125
+ "step": 310
1126
+ },
1127
+ {
1128
+ "epoch": 2.9765925925925925,
1129
+ "grad_norm": 116.33460235595703,
1130
+ "learning_rate": 0.0,
1131
+ "log_odds_chosen": 0.13965412974357605,
1132
+ "log_odds_ratio": -0.712945282459259,
1133
+ "logits/chosen": 145.65011596679688,
1134
+ "logits/rejected": 152.03152465820312,
1135
+ "logps/chosen": -0.739302933216095,
1136
+ "logps/rejected": -0.7891243696212769,
1137
+ "loss": 39.5762,
1138
+ "nll_loss": 0.8406442403793335,
1139
+ "rewards/accuracies": 0.6187499761581421,
1140
+ "rewards/chosen": -0.3696514666080475,
1141
+ "rewards/margins": 0.024910712614655495,
1142
+ "rewards/rejected": -0.3945621848106384,
1143
+ "step": 315
1144
+ },
1145
+ {
1146
+ "epoch": 2.9765925925925925,
1147
+ "step": 315,
1148
+ "total_flos": 0.0,
1149
+ "train_loss": 71.97097947862413,
1150
+ "train_runtime": 8116.2448,
1151
+ "train_samples_per_second": 2.495,
1152
+ "train_steps_per_second": 0.039
1153
+ }
1154
+ ],
1155
+ "logging_steps": 5,
1156
+ "max_steps": 315,
1157
+ "num_input_tokens_seen": 0,
1158
+ "num_train_epochs": 3,
1159
+ "save_steps": 100000,
1160
+ "stateful_callbacks": {
1161
+ "TrainerControl": {
1162
+ "args": {
1163
+ "should_epoch_stop": false,
1164
+ "should_evaluate": false,
1165
+ "should_log": false,
1166
+ "should_save": true,
1167
+ "should_training_stop": true
1168
+ },
1169
+ "attributes": {}
1170
+ }
1171
+ },
1172
+ "total_flos": 0.0,
1173
+ "train_batch_size": 1,
1174
+ "trial_name": null,
1175
+ "trial_params": null
1176
+ }
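
A reader going through the `log_history` entries above may want to check how the logged reward fields relate to each other. The sketch below is one hedged way to do that, using three entries copied from this file (steps 190, 235 and 315, trimmed to the fields needed). It checks that `rewards/margins` equals `rewards/chosen` minus `rewards/rejected`, and that each reward is a fixed multiple of the matching `logps` value. The factor 0.5 (`ASSUMED_BETA`) is inferred from the numbers and is not stated anywhere in this file; `check_entry` and `entries` are illustrative names, not part of TRL or Transformers.

```python
import math

# Three entries copied from log_history in this file, trimmed to the fields used here.
entries = [
    {"step": 190,
     "logps/chosen": -0.9197586178779602, "logps/rejected": -0.9383406639099121,
     "rewards/chosen": -0.4598793089389801, "rewards/rejected": -0.46917033195495605,
     "rewards/margins": 0.009291025809943676},
    {"step": 235,
     "logps/chosen": -0.899451732635498, "logps/rejected": -0.8785243034362793,
     "rewards/chosen": -0.449725866317749, "rewards/rejected": -0.43926215171813965,
     "rewards/margins": -0.010463694110512733},
    {"step": 315,
     "logps/chosen": -0.739302933216095, "logps/rejected": -0.7891243696212769,
     "rewards/chosen": -0.3696514666080475, "rewards/rejected": -0.3945621848106384,
     "rewards/margins": 0.024910712614655495},
]

# Assumption: scaling factor inferred from the logged values, not read from any config.
ASSUMED_BETA = 0.5


def check_entry(entry, beta=ASSUMED_BETA, tol=1e-6):
    """Return True if the reward fields of one log entry are mutually consistent."""
    # The margin should be the difference of the two rewards (up to float32 rounding).
    margin_ok = math.isclose(
        entry["rewards/chosen"] - entry["rewards/rejected"],
        entry["rewards/margins"],
        abs_tol=tol,
    )
    # Each reward should be beta times the corresponding mean log-probability.
    chosen_ok = math.isclose(
        beta * entry["logps/chosen"], entry["rewards/chosen"], abs_tol=tol
    )
    rejected_ok = math.isclose(
        beta * entry["logps/rejected"], entry["rewards/rejected"], abs_tol=tol
    )
    return margin_ok and chosen_ok and rejected_ok


# Map each step to whether its reward fields agree with each other.
results = {entry["step"]: check_entry(entry) for entry in entries}
print(results)
```

The same check could be run over the full file by loading it with `json.load` and iterating over the `log_history` list, skipping the final summary entry, which has no reward fields.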