ben81828 committed on
Commit
cffcbbd
1 Parent(s): 01f03fd

End of training

README.md CHANGED
@@ -4,6 +4,7 @@ license: apache-2.0
 base_model: AdaptLLM/biomed-Qwen2-VL-2B-Instruct
 tags:
 - llama-factory
+- lora
 - generated_from_trainer
 model-index:
 - name: qwenvl-2B-cadica-stenosis-classify-lora
@@ -15,10 +16,10 @@ should probably proofread and complete it, then remove this comment. -->
 
 # qwenvl-2B-cadica-stenosis-classify-lora
 
-This model is a fine-tuned version of [AdaptLLM/biomed-Qwen2-VL-2B-Instruct](https://huggingface.co/AdaptLLM/biomed-Qwen2-VL-2B-Instruct) on an unknown dataset.
+This model is a fine-tuned version of [AdaptLLM/biomed-Qwen2-VL-2B-Instruct](https://huggingface.co/AdaptLLM/biomed-Qwen2-VL-2B-Instruct) on the CADICA狹窄分析選擇題(TRAIN) dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.7947
-- Num Input Tokens Seen: 10902632
+- Num Input Tokens Seen: 11152104
 
 ## Model description
all_results.json ADDED
@@ -0,0 +1,13 @@
+{
+    "epoch": 1.9968586387434555,
+    "eval_loss": 0.794740617275238,
+    "eval_runtime": 46.549,
+    "eval_samples_per_second": 3.136,
+    "eval_steps_per_second": 0.795,
+    "num_input_tokens_seen": 11152104,
+    "total_flos": 754095660204032.0,
+    "train_loss": 0.9049516569136241,
+    "train_runtime": 17154.0309,
+    "train_samples_per_second": 1.336,
+    "train_steps_per_second": 0.042
+}
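The throughput implied by these numbers can be sanity-checked with a short Python sketch. The values are copied from the `all_results.json` above; the tokens-per-second figure is a derived, illustrative quantity, not something the trainer reports:

```python
import json

# Metrics as recorded in all_results.json above.
all_results = json.loads("""{
    "epoch": 1.9968586387434555,
    "eval_loss": 0.794740617275238,
    "eval_runtime": 46.549,
    "num_input_tokens_seen": 11152104,
    "train_loss": 0.9049516569136241,
    "train_runtime": 17154.0309,
    "train_samples_per_second": 1.336,
    "train_steps_per_second": 0.042
}""")

# Rough training throughput in input tokens per second
# (num_input_tokens_seen divided by wall-clock train_runtime).
tokens_per_second = all_results["num_input_tokens_seen"] / all_results["train_runtime"]
print(f"{tokens_per_second:.0f} tokens/s")  # prints "650 tokens/s"
```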
eval_results.json ADDED
@@ -0,0 +1,8 @@
+{
+    "epoch": 1.9968586387434555,
+    "eval_loss": 0.794740617275238,
+    "eval_runtime": 46.549,
+    "eval_samples_per_second": 3.136,
+    "eval_steps_per_second": 0.795,
+    "num_input_tokens_seen": 11152104
+}
train_results.json ADDED
@@ -0,0 +1,9 @@
+{
+    "epoch": 1.9968586387434555,
+    "num_input_tokens_seen": 11152104,
+    "total_flos": 754095660204032.0,
+    "train_loss": 0.9049516569136241,
+    "train_runtime": 17154.0309,
+    "train_samples_per_second": 1.336,
+    "train_steps_per_second": 0.042
+}
trainer_state.json ADDED
@@ -0,0 +1,1313 @@
+{
+  "best_metric": 0.794740617275238,
+  "best_model_checkpoint": "saves/CADICA_qwenvl_stenosis_classily/lora/sft/checkpoint-700",
+  "epoch": 1.9968586387434555,
+  "eval_steps": 50,
+  "global_step": 716,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {"epoch": 0.013961605584642234, "grad_norm": 21.25276507868793, "learning_rate": 6.944444444444445e-06, "loss": 2.9908, "num_input_tokens_seen": 77944, "step": 5},
+    {"epoch": 0.027923211169284468, "grad_norm": 21.89043285054519, "learning_rate": 1.388888888888889e-05, "loss": 3.0071, "num_input_tokens_seen": 155896, "step": 10},
+    {"epoch": 0.041884816753926704, "grad_norm": 16.65776874449816, "learning_rate": 2.0833333333333336e-05, "loss": 2.354, "num_input_tokens_seen": 233896, "step": 15},
+    {"epoch": 0.055846422338568937, "grad_norm": 3.772799389266845, "learning_rate": 2.777777777777778e-05, "loss": 1.2959, "num_input_tokens_seen": 311840, "step": 20},
+    {"epoch": 0.06980802792321117, "grad_norm": 2.5936011954385334, "learning_rate": 3.472222222222222e-05, "loss": 1.0206, "num_input_tokens_seen": 389816, "step": 25},
+    {"epoch": 0.08376963350785341, "grad_norm": 1.380523901017673, "learning_rate": 4.166666666666667e-05, "loss": 0.9285, "num_input_tokens_seen": 467808, "step": 30},
+    {"epoch": 0.09773123909249563, "grad_norm": 0.9535971270874376, "learning_rate": 4.8611111111111115e-05, "loss": 0.9052, "num_input_tokens_seen": 545776, "step": 35},
+    {"epoch": 0.11169284467713787, "grad_norm": 0.7487685762175865, "learning_rate": 5.555555555555556e-05, "loss": 0.929, "num_input_tokens_seen": 623744, "step": 40},
+    {"epoch": 0.1256544502617801, "grad_norm": 0.9517829869317949, "learning_rate": 6.25e-05, "loss": 0.9076, "num_input_tokens_seen": 701720, "step": 45},
+    {"epoch": 0.13961605584642234, "grad_norm": 0.5105376471286923, "learning_rate": 6.944444444444444e-05, "loss": 0.9039, "num_input_tokens_seen": 779728, "step": 50},
+    {"epoch": 0.13961605584642234, "eval_loss": 0.9039102792739868, "eval_runtime": 74.9579, "eval_samples_per_second": 1.948, "eval_steps_per_second": 0.494, "num_input_tokens_seen": 779728, "step": 50},
+    {"epoch": 0.15357766143106458, "grad_norm": 0.6125311992064874, "learning_rate": 7.638888888888889e-05, "loss": 0.8983, "num_input_tokens_seen": 857728, "step": 55},
+    {"epoch": 0.16753926701570682, "grad_norm": 0.8799068808838695, "learning_rate": 8.333333333333334e-05, "loss": 0.9115, "num_input_tokens_seen": 935680, "step": 60},
+    {"epoch": 0.18150087260034903, "grad_norm": 0.7270711909487898, "learning_rate": 9.027777777777779e-05, "loss": 0.9022, "num_input_tokens_seen": 1013664, "step": 65},
+    {"epoch": 0.19546247818499127, "grad_norm": 0.6023654770278246, "learning_rate": 9.722222222222223e-05, "loss": 0.8981, "num_input_tokens_seen": 1091656, "step": 70},
+    {"epoch": 0.2094240837696335, "grad_norm": 0.5698794386547648, "learning_rate": 9.999464569905628e-05, "loss": 0.9067, "num_input_tokens_seen": 1169664, "step": 75},
+    {"epoch": 0.22338568935427575, "grad_norm": 0.32260644881875, "learning_rate": 9.99619291237835e-05, "loss": 0.9075, "num_input_tokens_seen": 1247672, "step": 80},
+    {"epoch": 0.23734729493891799, "grad_norm": 0.41708039368778405, "learning_rate": 9.989949002448076e-05, "loss": 0.8964, "num_input_tokens_seen": 1325640, "step": 85},
+    {"epoch": 0.2513089005235602, "grad_norm": 0.6145758907120942, "learning_rate": 9.980736554638366e-05, "loss": 0.9128, "num_input_tokens_seen": 1403688, "step": 90},
+    {"epoch": 0.26527050610820246, "grad_norm": 0.30302247663937915, "learning_rate": 9.968561049466214e-05, "loss": 0.8991, "num_input_tokens_seen": 1481664, "step": 95},
+    {"epoch": 0.2792321116928447, "grad_norm": 0.32920212256475023, "learning_rate": 9.953429730181653e-05, "loss": 0.9033, "num_input_tokens_seen": 1559632, "step": 100},
+    {"epoch": 0.2792321116928447, "eval_loss": 0.9009457230567932, "eval_runtime": 47.3577, "eval_samples_per_second": 3.083, "eval_steps_per_second": 0.781, "num_input_tokens_seen": 1559632, "step": 100},
+    {"epoch": 0.2931937172774869, "grad_norm": 0.3748068082841054, "learning_rate": 9.935351598458742e-05, "loss": 0.902, "num_input_tokens_seen": 1637592, "step": 105},
+    {"epoch": 0.30715532286212915, "grad_norm": 0.367692204424778, "learning_rate": 9.914337409040418e-05, "loss": 0.903, "num_input_tokens_seen": 1715592, "step": 110},
+    {"epoch": 0.32111692844677137, "grad_norm": 0.523389228578757, "learning_rate": 9.890399663340478e-05, "loss": 0.9014, "num_input_tokens_seen": 1793544, "step": 115},
+    {"epoch": 0.33507853403141363, "grad_norm": 0.7666885810234405, "learning_rate": 9.863552602006435e-05, "loss": 0.8966, "num_input_tokens_seen": 1871520, "step": 120},
+    {"epoch": 0.34904013961605584, "grad_norm": 0.45411297588089927, "learning_rate": 9.83381219644771e-05, "loss": 0.9032, "num_input_tokens_seen": 1949488, "step": 125},
+    {"epoch": 0.36300174520069806, "grad_norm": 0.34304009173464395, "learning_rate": 9.801196139334195e-05, "loss": 0.8919, "num_input_tokens_seen": 2027488, "step": 130},
+    {"epoch": 0.3769633507853403, "grad_norm": 0.46756876437741973, "learning_rate": 9.765723834070804e-05, "loss": 0.9025, "num_input_tokens_seen": 2105424, "step": 135},
+    {"epoch": 0.39092495636998253, "grad_norm": 0.5067842714503226, "learning_rate": 9.72741638325434e-05, "loss": 0.9001, "num_input_tokens_seen": 2183432, "step": 140},
+    {"epoch": 0.4048865619546248, "grad_norm": 0.45330848125911494, "learning_rate": 9.686296576119471e-05, "loss": 0.9007, "num_input_tokens_seen": 2261408, "step": 145},
+    {"epoch": 0.418848167539267, "grad_norm": 0.3379495380306588, "learning_rate": 9.642388874981347e-05, "loss": 0.9001, "num_input_tokens_seen": 2339368, "step": 150},
+    {"epoch": 0.418848167539267, "eval_loss": 0.8987511396408081, "eval_runtime": 46.4204, "eval_samples_per_second": 3.145, "eval_steps_per_second": 0.797, "num_input_tokens_seen": 2339368, "step": 150},
+    {"epoch": 0.4328097731239092, "grad_norm": 0.4705971769713144, "learning_rate": 9.595719400682881e-05, "loss": 0.8974, "num_input_tokens_seen": 2417328, "step": 155},
+    {"epoch": 0.4467713787085515, "grad_norm": 0.2877245912486978, "learning_rate": 9.546315917055361e-05, "loss": 0.895, "num_input_tokens_seen": 2495328, "step": 160},
+    {"epoch": 0.4607329842931937, "grad_norm": 0.3089085786477158, "learning_rate": 9.494207814401672e-05, "loss": 0.8993, "num_input_tokens_seen": 2573264, "step": 165},
+    {"epoch": 0.47469458987783597, "grad_norm": 0.23434151328428765, "learning_rate": 9.439426092011875e-05, "loss": 0.9011, "num_input_tokens_seen": 2651200, "step": 170},
+    {"epoch": 0.4886561954624782, "grad_norm": 0.3895079869752368, "learning_rate": 9.382003339721652e-05, "loss": 0.8943, "num_input_tokens_seen": 2729208, "step": 175},
+    {"epoch": 0.5026178010471204, "grad_norm": 0.21859941380068879, "learning_rate": 9.321973718524472e-05, "loss": 0.9074, "num_input_tokens_seen": 2807176, "step": 180},
+    {"epoch": 0.5165794066317626, "grad_norm": 0.3026569089827376, "learning_rate": 9.25937294024912e-05, "loss": 0.8979, "num_input_tokens_seen": 2885136, "step": 185},
+    {"epoch": 0.5305410122164049, "grad_norm": 0.24552828812005026, "learning_rate": 9.194238246314599e-05, "loss": 0.8908, "num_input_tokens_seen": 2963120, "step": 190},
+    {"epoch": 0.5445026178010471, "grad_norm": 0.31370225105827704, "learning_rate": 9.126608385575076e-05, "loss": 0.8922, "num_input_tokens_seen": 3041096, "step": 195},
+    {"epoch": 0.5584642233856894, "grad_norm": 0.19991243424614802, "learning_rate": 9.056523591268064e-05, "loss": 0.902, "num_input_tokens_seen": 3119064, "step": 200},
+    {"epoch": 0.5584642233856894, "eval_loss": 0.9003660678863525, "eval_runtime": 46.4656, "eval_samples_per_second": 3.142, "eval_steps_per_second": 0.796, "num_input_tokens_seen": 3119064, "step": 200},
+    {"epoch": 0.5724258289703316, "grad_norm": 0.29215232660029566, "learning_rate": 8.984025557079523e-05, "loss": 0.9016, "num_input_tokens_seen": 3197048, "step": 205},
+    {"epoch": 0.5863874345549738, "grad_norm": 0.4850357234182881, "learning_rate": 8.90915741234015e-05, "loss": 0.907, "num_input_tokens_seen": 3275024, "step": 210},
+    {"epoch": 0.6003490401396161, "grad_norm": 0.3567000129034729, "learning_rate": 8.831963696367581e-05, "loss": 0.8966, "num_input_tokens_seen": 3353024, "step": 215},
+    {"epoch": 0.6143106457242583, "grad_norm": 0.26853087654006846, "learning_rate": 8.752490331969807e-05, "loss": 0.9031, "num_input_tokens_seen": 3430936, "step": 220},
+    {"epoch": 0.6282722513089005, "grad_norm": 0.20742364801678845, "learning_rate": 8.670784598125533e-05, "loss": 0.9012, "num_input_tokens_seen": 3508920, "step": 225},
+    {"epoch": 0.6422338568935427, "grad_norm": 0.2838565453202246, "learning_rate": 8.586895101857747e-05, "loss": 0.8936, "num_input_tokens_seen": 3586920, "step": 230},
+    {"epoch": 0.6561954624781849, "grad_norm": 0.4892803498442385, "learning_rate": 8.500871749317243e-05, "loss": 0.9042, "num_input_tokens_seen": 3664896, "step": 235},
+    {"epoch": 0.6701570680628273, "grad_norm": 0.37976656488421395, "learning_rate": 8.412765716093272e-05, "loss": 0.9034, "num_input_tokens_seen": 3742832, "step": 240},
+    {"epoch": 0.6841186736474695, "grad_norm": 0.291645150101734, "learning_rate": 8.322629416769006e-05, "loss": 0.8969, "num_input_tokens_seen": 3820792, "step": 245},
+    {"epoch": 0.6980802792321117, "grad_norm": 0.30837187813275513, "learning_rate": 8.230516473739935e-05, "loss": 0.8933, "num_input_tokens_seen": 3898784, "step": 250},
+    {"epoch": 0.6980802792321117, "eval_loss": 0.9052047729492188, "eval_runtime": 46.4894, "eval_samples_per_second": 3.141, "eval_steps_per_second": 0.796, "num_input_tokens_seen": 3898784, "step": 250},
+    {"epoch": 0.7120418848167539, "grad_norm": 0.30727171157251326, "learning_rate": 8.1364816853137e-05, "loss": 0.9079, "num_input_tokens_seen": 3976824, "step": 255},
+    {"epoch": 0.7260034904013961, "grad_norm": 0.24728667308223853, "learning_rate": 8.040580993110404e-05, "loss": 0.9044, "num_input_tokens_seen": 4054752, "step": 260},
+    {"epoch": 0.7399650959860384, "grad_norm": 0.154782478333375, "learning_rate": 7.942871448782748e-05, "loss": 0.895, "num_input_tokens_seen": 4132664, "step": 265},
+    {"epoch": 0.7539267015706806, "grad_norm": 0.21937062747939326, "learning_rate": 7.843411180075794e-05, "loss": 0.8984, "num_input_tokens_seen": 4210656, "step": 270},
+    {"epoch": 0.7678883071553229, "grad_norm": 0.28854882430140594, "learning_rate": 7.742259356246593e-05, "loss": 0.904, "num_input_tokens_seen": 4288664, "step": 275},
+    {"epoch": 0.7818499127399651, "grad_norm": 0.23726372374470991, "learning_rate": 7.639476152864162e-05, "loss": 0.8973, "num_input_tokens_seen": 4366608, "step": 280},
+    {"epoch": 0.7958115183246073, "grad_norm": 0.34106252388262337, "learning_rate": 7.535122716010849e-05, "loss": 0.9018, "num_input_tokens_seen": 4444568, "step": 285},
+    {"epoch": 0.8097731239092496, "grad_norm": 0.37561723629929783, "learning_rate": 7.42926112590631e-05, "loss": 0.8886, "num_input_tokens_seen": 4522512, "step": 290},
+    {"epoch": 0.8237347294938918, "grad_norm": 0.26053439132207656, "learning_rate": 7.321954359975776e-05, "loss": 0.9002, "num_input_tokens_seen": 4600504, "step": 295},
+    {"epoch": 0.837696335078534, "grad_norm": 0.248855595995717, "learning_rate": 7.21326625538456e-05, "loss": 0.897, "num_input_tokens_seen": 4678472, "step": 300},
+    {"epoch": 0.837696335078534, "eval_loss": 0.9003945589065552, "eval_runtime": 46.3308, "eval_samples_per_second": 3.151, "eval_steps_per_second": 0.799, "num_input_tokens_seen": 4678472, "step": 300},
+    {"epoch": 0.8516579406631762, "grad_norm": 0.18200381843269203, "learning_rate": 7.103261471061116e-05, "loss": 0.9088, "num_input_tokens_seen": 4756440, "step": 305},
+    {"epoch": 0.8656195462478184, "grad_norm": 0.19255528640902111, "learning_rate": 6.992005449231208e-05, "loss": 0.899, "num_input_tokens_seen": 4834424, "step": 310},
+    {"epoch": 0.8795811518324608, "grad_norm": 0.3960850388870267, "learning_rate": 6.879564376486114e-05, "loss": 0.905, "num_input_tokens_seen": 4912376, "step": 315},
+    {"epoch": 0.893542757417103, "grad_norm": 0.3472454916197344, "learning_rate": 6.76600514440799e-05, "loss": 0.8968, "num_input_tokens_seen": 4990328, "step": 320},
+    {"epoch": 0.9075043630017452, "grad_norm": 0.42131468150264795, "learning_rate": 6.651395309775837e-05, "loss": 0.8916, "num_input_tokens_seen": 5068304, "step": 325},
+    {"epoch": 0.9214659685863874, "grad_norm": 0.7373865772840422, "learning_rate": 6.535803054375738e-05, "loss": 0.8937, "num_input_tokens_seen": 5146272, "step": 330},
+    {"epoch": 0.9354275741710296, "grad_norm": 0.6075085371236206, "learning_rate": 6.419297144439283e-05, "loss": 0.8965, "num_input_tokens_seen": 5224232, "step": 335},
+    {"epoch": 0.9493891797556719, "grad_norm": 0.5336896324950464, "learning_rate": 6.301946889734302e-05, "loss": 0.8957, "num_input_tokens_seen": 5302200, "step": 340},
+    {"epoch": 0.9633507853403142, "grad_norm": 0.7535717304812762, "learning_rate": 6.183822102332234e-05, "loss": 0.9025, "num_input_tokens_seen": 5380168, "step": 345},
+    {"epoch": 0.9773123909249564, "grad_norm": 1.9146693047973427, "learning_rate": 6.064993055076698e-05, "loss": 0.8997, "num_input_tokens_seen": 5458104, "step": 350},
+    {"epoch": 0.9773123909249564, "eval_loss": 0.9016226530075073, "eval_runtime": 46.29, "eval_samples_per_second": 3.154, "eval_steps_per_second": 0.799, "num_input_tokens_seen": 5458104, "step": 350},
+    {"epoch": 0.9912739965095986, "grad_norm": 0.4275320872134287, "learning_rate": 5.945530439777923e-05, "loss": 0.902, "num_input_tokens_seen": 5536072, "step": 355},
+    {"epoch": 1.0027923211169285, "grad_norm": 1.5609644797837416, "learning_rate": 5.8255053251579616e-05, "loss": 0.7347, "num_input_tokens_seen": 5600392, "step": 360},
+    {"epoch": 1.0167539267015706, "grad_norm": 0.4880192687307677, "learning_rate": 5.704989114571648e-05, "loss": 0.8899, "num_input_tokens_seen": 5678424, "step": 365},
+    {"epoch": 1.030715532286213, "grad_norm": 0.37744049750587216, "learning_rate": 5.5840535035285025e-05, "loss": 0.8929, "num_input_tokens_seen": 5756400, "step": 370},
+    {"epoch": 1.0446771378708553, "grad_norm": 0.9395658719072697, "learning_rate": 5.4627704370408236e-05, "loss": 0.8904, "num_input_tokens_seen": 5834352, "step": 375},
+    {"epoch": 1.0586387434554974, "grad_norm": 0.5490353927737941, "learning_rate": 5.341212066823355e-05, "loss": 0.8964, "num_input_tokens_seen": 5912320, "step": 380},
+    {"epoch": 1.0726003490401397, "grad_norm": 1.1114978460946199, "learning_rate": 5.219450708369977e-05, "loss": 0.8843, "num_input_tokens_seen": 5990312, "step": 385},
+    {"epoch": 1.0865619546247818, "grad_norm": 1.1913979980069938, "learning_rate": 5.0975587979329734e-05, "loss": 0.8879, "num_input_tokens_seen": 6068280, "step": 390},
+    {"epoch": 1.100523560209424, "grad_norm": 2.222951083897475, "learning_rate": 4.9756088494304504e-05, "loss": 0.8816, "num_input_tokens_seen": 6146288, "step": 395},
+    {"epoch": 1.1144851657940662, "grad_norm": 1.5657191660795382, "learning_rate": 4.853673411307564e-05, "loss": 0.9109, "num_input_tokens_seen": 6224248, "step": 400},
+    {"epoch": 1.1144851657940662, "eval_loss": 0.8960007429122925, "eval_runtime": 46.5432, "eval_samples_per_second": 3.137, "eval_steps_per_second": 0.795, "num_input_tokens_seen": 6224248, "step": 400},
+    {"epoch": 1.1284467713787085, "grad_norm": 2.66254478509122, "learning_rate": 4.731825023377192e-05, "loss": 0.8631, "num_input_tokens_seen": 6302208, "step": 405},
+    {"epoch": 1.1424083769633508, "grad_norm": 1.7871963773512198, "learning_rate": 4.610136173665751e-05, "loss": 0.8722, "num_input_tokens_seen": 6380096, "step": 410},
+    {"epoch": 1.156369982547993, "grad_norm": 2.511479150461674, "learning_rate": 4.4886792552898286e-05, "loss": 0.864, "num_input_tokens_seen": 6458096, "step": 415},
+    {"epoch": 1.1703315881326353, "grad_norm": 2.9185608745372837, "learning_rate": 4.367526523389253e-05, "loss": 0.8446, "num_input_tokens_seen": 6536064, "step": 420},
+    {"epoch": 1.1842931937172776, "grad_norm": 3.7053458609530403, "learning_rate": 4.24675005214227e-05, "loss": 0.8576, "num_input_tokens_seen": 6614048, "step": 425},
+    {"epoch": 1.1982547993019197, "grad_norm": 2.3073667949552124, "learning_rate": 4.1264216918883656e-05, "loss": 0.8715, "num_input_tokens_seen": 6691984, "step": 430},
+    {"epoch": 1.212216404886562, "grad_norm": 2.597389084730036, "learning_rate": 4.006613026384249e-05, "loss": 0.8708, "num_input_tokens_seen": 6769984, "step": 435},
+    {"epoch": 1.2261780104712041, "grad_norm": 1.889225145403208, "learning_rate": 3.887395330218429e-05, "loss": 0.8546, "num_input_tokens_seen": 6847976, "step": 440},
+    {"epoch": 1.2401396160558464, "grad_norm": 2.61428130233799, "learning_rate": 3.768839526409718e-05, "loss": 0.8592, "num_input_tokens_seen": 6925944, "step": 445},
+    {"epoch": 1.2541012216404885, "grad_norm": 5.825082846471074, "learning_rate": 3.651016144214878e-05, "loss": 0.8127, "num_input_tokens_seen": 7003904, "step": 450},
+    {"epoch": 1.2541012216404885, "eval_loss": 0.8821887373924255, "eval_runtime": 46.527, "eval_samples_per_second": 3.138, "eval_steps_per_second": 0.795, "num_input_tokens_seen": 7003904, "step": 450},
+    {"epoch": 1.2680628272251309, "grad_norm": 3.745385127572465, "learning_rate": 3.533995277170532e-05, "loss": 0.837, "num_input_tokens_seen": 7081856, "step": 455},
+    {"epoch": 1.2820244328097732, "grad_norm": 3.1595331481084896, "learning_rate": 3.4178465413942625e-05, "loss": 0.8631, "num_input_tokens_seen": 7159776, "step": 460},
+    {"epoch": 1.2959860383944153, "grad_norm": 3.241335528071935, "learning_rate": 3.3026390341697576e-05, "loss": 0.8511, "num_input_tokens_seen": 7237720, "step": 465},
+    {"epoch": 1.3099476439790576, "grad_norm": 2.065109380918334, "learning_rate": 3.188441292840587e-05, "loss": 0.8439, "num_input_tokens_seen": 7315704, "step": 470},
+    {"epoch": 1.3239092495637, "grad_norm": 3.25037907997737, "learning_rate": 3.075321254037112e-05, "loss": 0.872, "num_input_tokens_seen": 7393672, "step": 475},
+    {"epoch": 1.337870855148342, "grad_norm": 4.772644937407586, "learning_rate": 2.963346213260737e-05, "loss": 0.8397, "num_input_tokens_seen": 7471632, "step": 480},
+    {"epoch": 1.3518324607329844, "grad_norm": 4.371699254241009, "learning_rate": 2.8525827848495913e-05, "loss": 0.8254, "num_input_tokens_seen": 7549624, "step": 485},
+    {"epoch": 1.3657940663176265, "grad_norm": 3.1620853722419797, "learning_rate": 2.743096862349427e-05, "loss": 0.8236, "num_input_tokens_seen": 7627568, "step": 490},
+    {"epoch": 1.3797556719022688, "grad_norm": 5.330846549870451, "learning_rate": 2.6349535793133196e-05, "loss": 0.8561, "num_input_tokens_seen": 7705512, "step": 495},
+    {"epoch": 1.3937172774869109, "grad_norm": 6.723230790293138, "learning_rate": 2.5282172705535013e-05, "loss": 0.8198, "num_input_tokens_seen": 7783528, "step": 500},
+    {"epoch": 1.3937172774869109, "eval_loss": 0.846021294593811, "eval_runtime": 46.2996, "eval_samples_per_second": 3.153, "eval_steps_per_second": 0.799, "num_input_tokens_seen": 7783528, "step": 500},
+    {"epoch": 1.4076788830715532, "grad_norm": 3.08479773422221, "learning_rate": 2.4229514338683458e-05, "loss": 0.8498, "num_input_tokens_seen": 7861512, "step": 505},
+    {"epoch": 1.4216404886561955, "grad_norm": 6.6639723675246385, "learning_rate": 2.3192186922673186e-05, "loss": 0.8195, "num_input_tokens_seen": 7939480, "step": 510},
+    {"epoch": 1.4356020942408376, "grad_norm": 5.041383098958532, "learning_rate": 2.2170807567163294e-05, "loss": 0.8428, "num_input_tokens_seen": 8017496, "step": 515},
+    {"epoch": 1.44956369982548, "grad_norm": 6.547003709523484, "learning_rate": 2.1165983894256647e-05, "loss": 0.8534, "num_input_tokens_seen": 8095504, "step": 520},
+    {"epoch": 1.4635253054101223, "grad_norm": 6.396371201229991, "learning_rate": 2.0178313677023425e-05, "loss": 0.8113, "num_input_tokens_seen": 8173440, "step": 525},
+    {"epoch": 1.4774869109947644, "grad_norm": 3.194803428296668, "learning_rate": 1.9208384483883817e-05, "loss": 0.8325, "num_input_tokens_seen": 8251400, "step": 530},
+    {"epoch": 1.4914485165794067, "grad_norm": 3.8201513107026552, "learning_rate": 1.8256773329061567e-05, "loss": 0.8158, "num_input_tokens_seen": 8329384, "step": 535},
+    {"epoch": 1.505410122164049, "grad_norm": 7.081372136918514, "learning_rate": 1.732404632931625e-05, "loss": 0.8183, "num_input_tokens_seen": 8407384, "step": 540},
+    {"epoch": 1.5193717277486911, "grad_norm": 5.594321718168384, "learning_rate": 1.6410758367158385e-05, "loss": 0.8364, "num_input_tokens_seen": 8485328, "step": 545},
+    {"epoch": 1.5333333333333332, "grad_norm": 6.74399246622262, "learning_rate": 1.5517452760747975e-05, "loss": 0.832, "num_input_tokens_seen": 8563264, "step": 550},
+    {"epoch": 1.5333333333333332, "eval_loss": 0.8187811374664307, "eval_runtime": 46.4274, "eval_samples_per_second": 3.145, "eval_steps_per_second": 0.797, "num_input_tokens_seen": 8563264, "step": 550},
+    {"epoch": 1.5472949389179755, "grad_norm": 3.6364413823808968, "learning_rate": 1.4644660940672627e-05, "loss": 0.8445, "num_input_tokens_seen": 8641240, "step": 555},
+    {"epoch": 1.5612565445026179, "grad_norm": 4.8432505694035495, "learning_rate": 1.3792902133797692e-05, "loss": 0.8372, "num_input_tokens_seen": 8719256, "step": 560},
+    {"epoch": 1.57521815008726, "grad_norm": 3.5469277881436545, "learning_rate": 1.2962683054376373e-05, "loss": 0.8107, "num_input_tokens_seen": 8797240, "step": 565},
+    {"epoch": 1.5891797556719023, "grad_norm": 3.916756694748693, "learning_rate": 1.2154497602603703e-05, "loss": 0.8472, "num_input_tokens_seen": 8875208, "step": 570},
+    {"epoch": 1.6031413612565446, "grad_norm": 5.159527110013197, "learning_rate": 1.13688265707936e-05, "loss": 0.8221, "num_input_tokens_seen": 8953176, "step": 575},
+    {"epoch": 1.6171029668411867, "grad_norm": 3.9357730966522637, "learning_rate": 1.060613735735384e-05, "loss": 0.8076, "num_input_tokens_seen": 9031192, "step": 580},
+    {"epoch": 1.6310645724258288, "grad_norm": 4.296138435306084, "learning_rate": 9.86688368872919e-06, "loss": 0.7987, "num_input_tokens_seen": 9109184, "step": 585},
+    {"epoch": 1.6450261780104714, "grad_norm": 5.917164796099687, "learning_rate": 9.151505349477902e-06, "loss": 0.7814, "num_input_tokens_seen": 9187136, "step": 590},
+    {"epoch": 1.6589877835951135, "grad_norm": 6.642916615289702, "learning_rate": 8.460427920642423e-06, "loss": 0.7907, "num_input_tokens_seen": 9265112
1060
+ "step": 595
1061
+ },
1062
+ {
1063
+ "epoch": 1.6729493891797556,
1064
+ "grad_norm": 6.455529863705343,
1065
+ "learning_rate": 7.794062526569734e-06,
1066
+ "loss": 0.786,
1067
+ "num_input_tokens_seen": 9343120,
1068
+ "step": 600
1069
+ },
1070
+ {
1071
+ "epoch": 1.6729493891797556,
1072
+ "eval_loss": 0.8021153211593628,
1073
+ "eval_runtime": 46.5141,
1074
+ "eval_samples_per_second": 3.139,
1075
+ "eval_steps_per_second": 0.795,
1076
+ "num_input_tokens_seen": 9343120,
1077
+ "step": 600
1078
+ },
1079
+ {
1080
+ "epoch": 1.6869109947643979,
1081
+ "grad_norm": 18.00061211274046,
1082
+ "learning_rate": 7.152805590332079e-06,
1083
+ "loss": 0.7702,
1084
+ "num_input_tokens_seen": 9421080,
1085
+ "step": 605
1086
+ },
1087
+ {
1088
+ "epoch": 1.7008726003490402,
1089
+ "grad_norm": 6.515893934614007,
1090
+ "learning_rate": 6.53703859789348e-06,
1091
+ "loss": 0.7447,
1092
+ "num_input_tokens_seen": 9499048,
1093
+ "step": 610
1094
+ },
1095
+ {
1096
+ "epoch": 1.7148342059336823,
1097
+ "grad_norm": 8.393536107737633,
1098
+ "learning_rate": 5.947127871162456e-06,
1099
+ "loss": 0.7943,
1100
+ "num_input_tokens_seen": 9577048,
1101
+ "step": 615
1102
+ },
1103
+ {
1104
+ "epoch": 1.7287958115183246,
1105
+ "grad_norm": 10.522936719634327,
1106
+ "learning_rate": 5.383424350065824e-06,
1107
+ "loss": 0.7784,
1108
+ "num_input_tokens_seen": 9655032,
1109
+ "step": 620
1110
+ },
1111
+ {
1112
+ "epoch": 1.742757417102967,
1113
+ "grad_norm": 4.303878340242376,
1114
+ "learning_rate": 4.846263383773364e-06,
1115
+ "loss": 0.8188,
1116
+ "num_input_tokens_seen": 9733000,
1117
+ "step": 625
1118
+ },
1119
+ {
1120
+ "epoch": 1.756719022687609,
1121
+ "grad_norm": 7.6073494826846035,
1122
+ "learning_rate": 4.335964531197401e-06,
1123
+ "loss": 0.7514,
1124
+ "num_input_tokens_seen": 9810984,
1125
+ "step": 630
1126
+ },
1127
+ {
1128
+ "epoch": 1.7706806282722511,
1129
+ "grad_norm": 4.962378312676078,
1130
+ "learning_rate": 3.8528313708861174e-06,
1131
+ "loss": 0.8165,
1132
+ "num_input_tokens_seen": 9888984,
1133
+ "step": 635
1134
+ },
1135
+ {
1136
+ "epoch": 1.7846422338568937,
1137
+ "grad_norm": 5.475236961963435,
1138
+ "learning_rate": 3.397151320423647e-06,
1139
+ "loss": 0.7778,
1140
+ "num_input_tokens_seen": 9966984,
1141
+ "step": 640
1142
+ },
1143
+ {
1144
+ "epoch": 1.7986038394415358,
1145
+ "grad_norm": 7.658362208235343,
1146
+ "learning_rate": 2.9691954654443355e-06,
1147
+ "loss": 0.7745,
1148
+ "num_input_tokens_seen": 10044928,
1149
+ "step": 645
1150
+ },
1151
+ {
1152
+ "epoch": 1.812565445026178,
1153
+ "grad_norm": 6.541910443575109,
1154
+ "learning_rate": 2.5692183983629713e-06,
1155
+ "loss": 0.8312,
1156
+ "num_input_tokens_seen": 10122936,
1157
+ "step": 650
1158
+ },
1159
+ {
1160
+ "epoch": 1.812565445026178,
1161
+ "eval_loss": 0.7986289858818054,
1162
+ "eval_runtime": 46.4273,
1163
+ "eval_samples_per_second": 3.145,
1164
+ "eval_steps_per_second": 0.797,
1165
+ "num_input_tokens_seen": 10122936,
1166
+ "step": 650
1167
+ },
1168
+ {
1169
+ "epoch": 1.8265270506108202,
1170
+ "grad_norm": 9.39953604203485,
1171
+ "learning_rate": 2.197458066916891e-06,
1172
+ "loss": 0.794,
1173
+ "num_input_tokens_seen": 10200848,
1174
+ "step": 655
1175
+ },
1176
+ {
1177
+ "epoch": 1.8404886561954625,
1178
+ "grad_norm": 5.45740552150688,
1179
+ "learning_rate": 1.8541356326100433e-06,
1180
+ "loss": 0.8129,
1181
+ "num_input_tokens_seen": 10278848,
1182
+ "step": 660
1183
+ },
1184
+ {
1185
+ "epoch": 1.8544502617801046,
1186
+ "grad_norm": 5.2960185422888575,
1187
+ "learning_rate": 1.5394553391432143e-06,
1188
+ "loss": 0.8057,
1189
+ "num_input_tokens_seen": 10356800,
1190
+ "step": 665
1191
+ },
1192
+ {
1193
+ "epoch": 1.868411867364747,
1194
+ "grad_norm": 4.792059943516234,
1195
+ "learning_rate": 1.2536043909088191e-06,
1196
+ "loss": 0.7678,
1197
+ "num_input_tokens_seen": 10434768,
1198
+ "step": 670
1199
+ },
1200
+ {
1201
+ "epoch": 1.8823734729493893,
1202
+ "grad_norm": 4.924682676533088,
1203
+ "learning_rate": 9.967528416222838e-07,
1204
+ "loss": 0.7851,
1205
+ "num_input_tokens_seen": 10512752,
1206
+ "step": 675
1207
+ },
1208
+ {
1209
+ "epoch": 1.8963350785340314,
1210
+ "grad_norm": 7.6405932797853096,
1211
+ "learning_rate": 7.690534931565518e-07,
1212
+ "loss": 0.7381,
1213
+ "num_input_tokens_seen": 10590760,
1214
+ "step": 680
1215
+ },
1216
+ {
1217
+ "epoch": 1.9102966841186735,
1218
+ "grad_norm": 4.326706788996488,
1219
+ "learning_rate": 5.706418046396989e-07,
1220
+ "loss": 0.7495,
1221
+ "num_input_tokens_seen": 10668752,
1222
+ "step": 685
1223
+ },
1224
+ {
1225
+ "epoch": 1.924258289703316,
1226
+ "grad_norm": 5.0004961580783815,
1227
+ "learning_rate": 4.0163581186984935e-07,
1228
+ "loss": 0.7406,
1229
+ "num_input_tokens_seen": 10746696,
1230
+ "step": 690
1231
+ },
1232
+ {
1233
+ "epoch": 1.9382198952879581,
1234
+ "grad_norm": 7.870737498179825,
1235
+ "learning_rate": 2.62136057095258e-07,
1236
+ "loss": 0.7823,
1237
+ "num_input_tokens_seen": 10824664,
1238
+ "step": 695
1239
+ },
1240
+ {
1241
+ "epoch": 1.9521815008726002,
1242
+ "grad_norm": 6.748561230572036,
1243
+ "learning_rate": 1.5222552920138856e-07,
1244
+ "loss": 0.7797,
1245
+ "num_input_tokens_seen": 10902632,
1246
+ "step": 700
1247
+ },
1248
+ {
1249
+ "epoch": 1.9521815008726002,
1250
+ "eval_loss": 0.794740617275238,
1251
+ "eval_runtime": 46.5422,
1252
+ "eval_samples_per_second": 3.137,
1253
+ "eval_steps_per_second": 0.795,
1254
+ "num_input_tokens_seen": 10902632,
1255
+ "step": 700
1256
+ },
1257
+ {
1258
+ "epoch": 1.9661431064572426,
1259
+ "grad_norm": 23.84213538429929,
1260
+ "learning_rate": 7.196961434052796e-08,
1261
+ "loss": 0.8029,
1262
+ "num_input_tokens_seen": 10980576,
1263
+ "step": 705
1264
+ },
1265
+ {
1266
+ "epoch": 1.9801047120418849,
1267
+ "grad_norm": 7.4190814811781145,
1268
+ "learning_rate": 2.1416057033352144e-08,
1269
+ "loss": 0.7942,
1270
+ "num_input_tokens_seen": 11058552,
1271
+ "step": 710
1272
+ },
1273
+ {
1274
+ "epoch": 1.994066317626527,
1275
+ "grad_norm": 6.100028794501354,
1276
+ "learning_rate": 5.949317655462583e-10,
1277
+ "loss": 0.7964,
1278
+ "num_input_tokens_seen": 11136520,
1279
+ "step": 715
1280
+ },
1281
+ {
1282
+ "epoch": 1.9968586387434555,
1283
+ "num_input_tokens_seen": 11152104,
1284
+ "step": 716,
1285
+ "total_flos": 754095660204032.0,
1286
+ "train_loss": 0.9049516569136241,
1287
+ "train_runtime": 17154.0309,
1288
+ "train_samples_per_second": 1.336,
1289
+ "train_steps_per_second": 0.042
1290
+ }
1291
+ ],
1292
+ "logging_steps": 5,
1293
+ "max_steps": 716,
1294
+ "num_input_tokens_seen": 11152104,
1295
+ "num_train_epochs": 2,
1296
+ "save_steps": 50,
1297
+ "stateful_callbacks": {
1298
+ "TrainerControl": {
1299
+ "args": {
1300
+ "should_epoch_stop": false,
1301
+ "should_evaluate": false,
1302
+ "should_log": false,
1303
+ "should_save": true,
1304
+ "should_training_stop": true
1305
+ },
1306
+ "attributes": {}
1307
+ }
1308
+ },
1309
+ "total_flos": 754095660204032.0,
1310
+ "train_batch_size": 1,
1311
+ "trial_name": null,
1312
+ "trial_params": null
1313
+ }
training_eval_loss.png ADDED
training_loss.png ADDED