tyzhu committed
Commit 5d15588
1 Parent(s): 502853f

End of training

Files changed (6)
  1. README.md +14 -2
  2. all_results.json +16 -0
  3. eval_results.json +10 -0
  4. tokenizer.json +1 -6
  5. train_results.json +9 -0
  6. trainer_state.json +1379 -0
README.md CHANGED
@@ -3,11 +3,23 @@ license: other
 base_model: Qwen/Qwen1.5-4B
 tags:
 - generated_from_trainer
+datasets:
+- tyzhu/lmind_hotpot_train8000_eval7405_v1_docidx
 metrics:
 - accuracy
 model-index:
 - name: lmind_hotpot_train8000_eval7405_v1_docidx_Qwen_Qwen1.5-4B_3e-5_lora2
-  results: []
+  results:
+  - task:
+      name: Causal Language Modeling
+      type: text-generation
+    dataset:
+      name: tyzhu/lmind_hotpot_train8000_eval7405_v1_docidx
+      type: tyzhu/lmind_hotpot_train8000_eval7405_v1_docidx
+    metrics:
+    - name: Accuracy
+      type: accuracy
+      value: 0.7512829373650108
 library_name: peft
 ---
 
@@ -16,7 +28,7 @@ should probably proofread and complete it, then remove this comment. -->
 
 # lmind_hotpot_train8000_eval7405_v1_docidx_Qwen_Qwen1.5-4B_3e-5_lora2
 
-This model is a fine-tuned version of [Qwen/Qwen1.5-4B](https://huggingface.co/Qwen/Qwen1.5-4B) on an unknown dataset.
+This model is a fine-tuned version of [Qwen/Qwen1.5-4B](https://huggingface.co/Qwen/Qwen1.5-4B) on the tyzhu/lmind_hotpot_train8000_eval7405_v1_docidx dataset.
 It achieves the following results on the evaluation set:
 - Loss: 1.3258
 - Accuracy: 0.7513
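Since the card declares `library_name: peft` and the base model Qwen/Qwen1.5-4B, the checkpoint is a LoRA adapter rather than full weights. A minimal usage sketch, assuming the adapter lives in this repo under the id inferred from the model name (adjust if the repo path differs):

```python
# Sketch: load the Qwen1.5-4B base model and attach this LoRA adapter with PEFT.
# Assumption: adapter repo id is inferred from the model name above and may differ.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen1.5-4B"
adapter_id = "tyzhu/lmind_hotpot_train8000_eval7405_v1_docidx_Qwen_Qwen1.5-4B_3e-5_lora2"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base_model, adapter_id)  # attaches the LoRA weights

inputs = tokenizer("Which documents mention the 1997 film?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```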
all_results.json ADDED
@@ -0,0 +1,16 @@
+{
+  "epoch": 19.998510574918082,
+  "eval_accuracy": 0.7512829373650108,
+  "eval_loss": 1.3258286714553833,
+  "eval_runtime": 7.7118,
+  "eval_samples": 500,
+  "eval_samples_per_second": 64.836,
+  "eval_steps_per_second": 8.169,
+  "perplexity": 3.765304262709877,
+  "total_flos": 1.3733500524072796e+18,
+  "train_loss": 0.22626538049336412,
+  "train_runtime": 9900.7803,
+  "train_samples": 26854,
+  "train_samples_per_second": 54.246,
+  "train_steps_per_second": 1.695
+}
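The `perplexity` field is simply the exponential of the evaluation loss; a quick check with the values above:

```python
import math

eval_loss = 1.3258286714553833
perplexity = math.exp(eval_loss)  # exp(eval_loss) matches the reported "perplexity"
print(round(perplexity, 6))       # 3.765304
```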
eval_results.json ADDED
@@ -0,0 +1,10 @@
+{
+  "epoch": 19.998510574918082,
+  "eval_accuracy": 0.7512829373650108,
+  "eval_loss": 1.3258286714553833,
+  "eval_runtime": 7.7118,
+  "eval_samples": 500,
+  "eval_samples_per_second": 64.836,
+  "eval_steps_per_second": 8.169,
+  "perplexity": 3.765304262709877
+}
tokenizer.json CHANGED
@@ -1,11 +1,6 @@
 {
   "version": "1.0",
-  "truncation": {
-    "direction": "Right",
-    "max_length": 1024,
-    "strategy": "LongestFirst",
-    "stride": 0
-  },
+  "truncation": null,
   "padding": null,
   "added_tokens": [
     {
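This change drops the truncation block that had been saved into the tokenizer (right-side truncation at 1024 tokens) and leaves `"truncation": null`. If the same behaviour is still wanted at inference time, it can be reapplied when loading; a sketch, assuming the fast (tokenizers-backed) tokenizer and reusing the max length from the removed block:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-4B")

# Option 1: request truncation per call, mirroring the removed settings.
enc = tokenizer("some long document ...", truncation=True, max_length=1024)

# Option 2: re-enable it on the underlying fast tokenizer object.
tokenizer.backend_tokenizer.enable_truncation(max_length=1024, stride=0, strategy="longest_first")
```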
train_results.json ADDED
@@ -0,0 +1,9 @@
+{
+  "epoch": 19.998510574918082,
+  "total_flos": 1.3733500524072796e+18,
+  "train_loss": 0.22626538049336412,
+  "train_runtime": 9900.7803,
+  "train_samples": 26854,
+  "train_samples_per_second": 54.246,
+  "train_steps_per_second": 1.695
+}
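The throughput numbers are mutually consistent: 26854 samples over 20 epochs in a 9900.78 s run gives about 54.2 samples/s and, at 16780 optimizer steps, about 1.69 steps/s, i.e. roughly 32 samples per optimizer step. With the per-device batch size of 1 recorded in trainer_state.json, the factor of ~32 would come from gradient accumulation and/or data parallelism (the exact split is not recorded here). A quick check:

```python
train_samples, num_epochs = 26_854, 20
train_runtime_s, max_steps = 9900.7803, 16_780

print(train_samples * num_epochs / train_runtime_s)  # ~54.25 samples/s ("train_samples_per_second")
print(max_steps / train_runtime_s)                    # ~1.69 steps/s    ("train_steps_per_second")
print(train_samples * num_epochs / max_steps)         # ~32.0 samples per optimizer step
```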
trainer_state.json ADDED
@@ -0,0 +1,1379 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 19.998510574918082,
5
+ "eval_steps": 500,
6
+ "global_step": 16780,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.11915400655347036,
13
+ "grad_norm": 0.24907414615154266,
14
+ "learning_rate": 3e-05,
15
+ "loss": 1.7899,
16
+ "step": 100
17
+ },
18
+ {
19
+ "epoch": 0.23830801310694072,
20
+ "grad_norm": 0.2722090780735016,
21
+ "learning_rate": 3e-05,
22
+ "loss": 1.6296,
23
+ "step": 200
24
+ },
25
+ {
26
+ "epoch": 0.3574620196604111,
27
+ "grad_norm": 0.30233123898506165,
28
+ "learning_rate": 3e-05,
29
+ "loss": 1.6514,
30
+ "step": 300
31
+ },
32
+ {
33
+ "epoch": 0.47661602621388144,
34
+ "grad_norm": 0.281762957572937,
35
+ "learning_rate": 3e-05,
36
+ "loss": 1.6225,
37
+ "step": 400
38
+ },
39
+ {
40
+ "epoch": 0.5957700327673519,
41
+ "grad_norm": 0.2878308892250061,
42
+ "learning_rate": 3e-05,
43
+ "loss": 1.6158,
44
+ "step": 500
45
+ },
46
+ {
47
+ "epoch": 0.7149240393208222,
48
+ "grad_norm": 0.26083847880363464,
49
+ "learning_rate": 3e-05,
50
+ "loss": 1.6232,
51
+ "step": 600
52
+ },
53
+ {
54
+ "epoch": 0.8340780458742926,
55
+ "grad_norm": 0.3153495490550995,
56
+ "learning_rate": 3e-05,
57
+ "loss": 1.616,
58
+ "step": 700
59
+ },
60
+ {
61
+ "epoch": 0.9532320524277629,
62
+ "grad_norm": 0.3171740770339966,
63
+ "learning_rate": 3e-05,
64
+ "loss": 1.6137,
65
+ "step": 800
66
+ },
67
+ {
68
+ "epoch": 0.9997021149836163,
69
+ "eval_accuracy": 0.7176760259179266,
70
+ "eval_loss": 1.8263936042785645,
71
+ "eval_runtime": 7.3244,
72
+ "eval_samples_per_second": 68.265,
73
+ "eval_steps_per_second": 8.601,
74
+ "step": 839
75
+ },
76
+ {
77
+ "epoch": 1.0723860589812333,
78
+ "grad_norm": 0.30662721395492554,
79
+ "learning_rate": 3e-05,
80
+ "loss": 1.6034,
81
+ "step": 900
82
+ },
83
+ {
84
+ "epoch": 1.1915400655347037,
85
+ "grad_norm": 0.3360077738761902,
86
+ "learning_rate": 3e-05,
87
+ "loss": 1.5982,
88
+ "step": 1000
89
+ },
90
+ {
91
+ "epoch": 1.310694072088174,
92
+ "grad_norm": 0.42994430661201477,
93
+ "learning_rate": 3e-05,
94
+ "loss": 1.5979,
95
+ "step": 1100
96
+ },
97
+ {
98
+ "epoch": 1.4298480786416443,
99
+ "grad_norm": 0.37275680899620056,
100
+ "learning_rate": 3e-05,
101
+ "loss": 1.5859,
102
+ "step": 1200
103
+ },
104
+ {
105
+ "epoch": 1.5490020851951147,
106
+ "grad_norm": 0.3591364622116089,
107
+ "learning_rate": 3e-05,
108
+ "loss": 1.6032,
109
+ "step": 1300
110
+ },
111
+ {
112
+ "epoch": 1.668156091748585,
113
+ "grad_norm": 0.43212202191352844,
114
+ "learning_rate": 3e-05,
115
+ "loss": 1.5805,
116
+ "step": 1400
117
+ },
118
+ {
119
+ "epoch": 1.7873100983020556,
120
+ "grad_norm": 0.36169254779815674,
121
+ "learning_rate": 3e-05,
122
+ "loss": 1.5891,
123
+ "step": 1500
124
+ },
125
+ {
126
+ "epoch": 1.9064641048555258,
127
+ "grad_norm": 0.37111684679985046,
128
+ "learning_rate": 3e-05,
129
+ "loss": 1.5978,
130
+ "step": 1600
131
+ },
132
+ {
133
+ "epoch": 1.9994042299672325,
134
+ "eval_accuracy": 0.7190323974082073,
135
+ "eval_loss": 1.8264212608337402,
136
+ "eval_runtime": 7.4268,
137
+ "eval_samples_per_second": 67.324,
138
+ "eval_steps_per_second": 8.483,
139
+ "step": 1678
140
+ },
141
+ {
142
+ "epoch": 2.025618111408996,
143
+ "grad_norm": 0.38414663076400757,
144
+ "learning_rate": 3e-05,
145
+ "loss": 1.5958,
146
+ "step": 1700
147
+ },
148
+ {
149
+ "epoch": 2.1447721179624666,
150
+ "grad_norm": 0.5014183521270752,
151
+ "learning_rate": 3e-05,
152
+ "loss": 1.5473,
153
+ "step": 1800
154
+ },
155
+ {
156
+ "epoch": 2.2639261245159368,
157
+ "grad_norm": 0.5397933721542358,
158
+ "learning_rate": 3e-05,
159
+ "loss": 1.5569,
160
+ "step": 1900
161
+ },
162
+ {
163
+ "epoch": 2.3830801310694074,
164
+ "grad_norm": 0.5639057159423828,
165
+ "learning_rate": 3e-05,
166
+ "loss": 1.567,
167
+ "step": 2000
168
+ },
169
+ {
170
+ "epoch": 2.5022341376228776,
171
+ "grad_norm": 0.5303636789321899,
172
+ "learning_rate": 3e-05,
173
+ "loss": 1.56,
174
+ "step": 2100
175
+ },
176
+ {
177
+ "epoch": 2.621388144176348,
178
+ "grad_norm": 0.5156840085983276,
179
+ "learning_rate": 3e-05,
180
+ "loss": 1.5562,
181
+ "step": 2200
182
+ },
183
+ {
184
+ "epoch": 2.7405421507298184,
185
+ "grad_norm": 0.5576587319374084,
186
+ "learning_rate": 3e-05,
187
+ "loss": 1.5442,
188
+ "step": 2300
189
+ },
190
+ {
191
+ "epoch": 2.8596961572832886,
192
+ "grad_norm": 0.5914539694786072,
193
+ "learning_rate": 3e-05,
194
+ "loss": 1.548,
195
+ "step": 2400
196
+ },
197
+ {
198
+ "epoch": 2.978850163836759,
199
+ "grad_norm": 0.5907685160636902,
200
+ "learning_rate": 3e-05,
201
+ "loss": 1.5493,
202
+ "step": 2500
203
+ },
204
+ {
205
+ "epoch": 2.999106344950849,
206
+ "eval_accuracy": 0.7211187904967603,
207
+ "eval_loss": 1.7809884548187256,
208
+ "eval_runtime": 6.9826,
209
+ "eval_samples_per_second": 71.606,
210
+ "eval_steps_per_second": 9.022,
211
+ "step": 2517
212
+ },
213
+ {
214
+ "epoch": 3.0980041703902295,
215
+ "grad_norm": 0.597890317440033,
216
+ "learning_rate": 3e-05,
217
+ "loss": 1.5151,
218
+ "step": 2600
219
+ },
220
+ {
221
+ "epoch": 3.2171581769436997,
222
+ "grad_norm": 0.6472206115722656,
223
+ "learning_rate": 3e-05,
224
+ "loss": 1.5069,
225
+ "step": 2700
226
+ },
227
+ {
228
+ "epoch": 3.33631218349717,
229
+ "grad_norm": 0.603473424911499,
230
+ "learning_rate": 3e-05,
231
+ "loss": 1.5205,
232
+ "step": 2800
233
+ },
234
+ {
235
+ "epoch": 3.4554661900506405,
236
+ "grad_norm": 0.6695737242698669,
237
+ "learning_rate": 3e-05,
238
+ "loss": 1.505,
239
+ "step": 2900
240
+ },
241
+ {
242
+ "epoch": 3.5746201966041107,
243
+ "grad_norm": 0.6201011538505554,
244
+ "learning_rate": 3e-05,
245
+ "loss": 1.5005,
246
+ "step": 3000
247
+ },
248
+ {
249
+ "epoch": 3.6937742031575813,
250
+ "grad_norm": 0.6618828773498535,
251
+ "learning_rate": 3e-05,
252
+ "loss": 1.5139,
253
+ "step": 3100
254
+ },
255
+ {
256
+ "epoch": 3.8129282097110515,
257
+ "grad_norm": 0.7267889380455017,
258
+ "learning_rate": 3e-05,
259
+ "loss": 1.5124,
260
+ "step": 3200
261
+ },
262
+ {
263
+ "epoch": 3.932082216264522,
264
+ "grad_norm": 0.6930146217346191,
265
+ "learning_rate": 3e-05,
266
+ "loss": 1.5101,
267
+ "step": 3300
268
+ },
269
+ {
270
+ "epoch": 4.0,
271
+ "eval_accuracy": 0.7231447084233261,
272
+ "eval_loss": 1.7412924766540527,
273
+ "eval_runtime": 6.9917,
274
+ "eval_samples_per_second": 71.513,
275
+ "eval_steps_per_second": 9.011,
276
+ "step": 3357
277
+ },
278
+ {
279
+ "epoch": 4.051236222817992,
280
+ "grad_norm": 0.6705595850944519,
281
+ "learning_rate": 3e-05,
282
+ "loss": 1.488,
283
+ "step": 3400
284
+ },
285
+ {
286
+ "epoch": 4.1703902293714625,
287
+ "grad_norm": 0.707845151424408,
288
+ "learning_rate": 3e-05,
289
+ "loss": 1.4576,
290
+ "step": 3500
291
+ },
292
+ {
293
+ "epoch": 4.289544235924933,
294
+ "grad_norm": 0.698554515838623,
295
+ "learning_rate": 3e-05,
296
+ "loss": 1.4619,
297
+ "step": 3600
298
+ },
299
+ {
300
+ "epoch": 4.408698242478403,
301
+ "grad_norm": 0.7763566970825195,
302
+ "learning_rate": 3e-05,
303
+ "loss": 1.454,
304
+ "step": 3700
305
+ },
306
+ {
307
+ "epoch": 4.5278522490318736,
308
+ "grad_norm": 0.7236441969871521,
309
+ "learning_rate": 3e-05,
310
+ "loss": 1.462,
311
+ "step": 3800
312
+ },
313
+ {
314
+ "epoch": 4.647006255585344,
315
+ "grad_norm": 0.7589882612228394,
316
+ "learning_rate": 3e-05,
317
+ "loss": 1.4766,
318
+ "step": 3900
319
+ },
320
+ {
321
+ "epoch": 4.766160262138815,
322
+ "grad_norm": 0.818758487701416,
323
+ "learning_rate": 3e-05,
324
+ "loss": 1.4625,
325
+ "step": 4000
326
+ },
327
+ {
328
+ "epoch": 4.885314268692285,
329
+ "grad_norm": 0.7571113109588623,
330
+ "learning_rate": 3e-05,
331
+ "loss": 1.4678,
332
+ "step": 4100
333
+ },
334
+ {
335
+ "epoch": 4.9997021149836165,
336
+ "eval_accuracy": 0.7247473002159828,
337
+ "eval_loss": 1.7400386333465576,
338
+ "eval_runtime": 7.2905,
339
+ "eval_samples_per_second": 68.582,
340
+ "eval_steps_per_second": 8.641,
341
+ "step": 4196
342
+ },
343
+ {
344
+ "epoch": 5.004468275245755,
345
+ "grad_norm": 0.711908221244812,
346
+ "learning_rate": 3e-05,
347
+ "loss": 1.4785,
348
+ "step": 4200
349
+ },
350
+ {
351
+ "epoch": 5.123622281799226,
352
+ "grad_norm": 0.7867583632469177,
353
+ "learning_rate": 3e-05,
354
+ "loss": 1.4144,
355
+ "step": 4300
356
+ },
357
+ {
358
+ "epoch": 5.242776288352696,
359
+ "grad_norm": 0.8279469013214111,
360
+ "learning_rate": 3e-05,
361
+ "loss": 1.4163,
362
+ "step": 4400
363
+ },
364
+ {
365
+ "epoch": 5.361930294906166,
366
+ "grad_norm": 0.7871400713920593,
367
+ "learning_rate": 3e-05,
368
+ "loss": 1.4124,
369
+ "step": 4500
370
+ },
371
+ {
372
+ "epoch": 5.481084301459637,
373
+ "grad_norm": 0.8378657102584839,
374
+ "learning_rate": 3e-05,
375
+ "loss": 1.4212,
376
+ "step": 4600
377
+ },
378
+ {
379
+ "epoch": 5.600238308013107,
380
+ "grad_norm": 0.8661020994186401,
381
+ "learning_rate": 3e-05,
382
+ "loss": 1.4213,
383
+ "step": 4700
384
+ },
385
+ {
386
+ "epoch": 5.719392314566577,
387
+ "grad_norm": 0.8194323778152466,
388
+ "learning_rate": 3e-05,
389
+ "loss": 1.4275,
390
+ "step": 4800
391
+ },
392
+ {
393
+ "epoch": 5.838546321120048,
394
+ "grad_norm": 0.861905038356781,
395
+ "learning_rate": 3e-05,
396
+ "loss": 1.4141,
397
+ "step": 4900
398
+ },
399
+ {
400
+ "epoch": 5.957700327673518,
401
+ "grad_norm": 0.8899133205413818,
402
+ "learning_rate": 3e-05,
403
+ "loss": 1.4236,
404
+ "step": 5000
405
+ },
406
+ {
407
+ "epoch": 5.999404229967233,
408
+ "eval_accuracy": 0.726682505399568,
409
+ "eval_loss": 1.7018622159957886,
410
+ "eval_runtime": 7.202,
411
+ "eval_samples_per_second": 69.425,
412
+ "eval_steps_per_second": 8.748,
413
+ "step": 5035
414
+ },
415
+ {
416
+ "epoch": 6.076854334226988,
417
+ "grad_norm": 0.9570803642272949,
418
+ "learning_rate": 3e-05,
419
+ "loss": 1.3728,
420
+ "step": 5100
421
+ },
422
+ {
423
+ "epoch": 6.196008340780459,
424
+ "grad_norm": 0.9672366976737976,
425
+ "learning_rate": 3e-05,
426
+ "loss": 1.3649,
427
+ "step": 5200
428
+ },
429
+ {
430
+ "epoch": 6.31516234733393,
431
+ "grad_norm": 0.912813663482666,
432
+ "learning_rate": 3e-05,
433
+ "loss": 1.3586,
434
+ "step": 5300
435
+ },
436
+ {
437
+ "epoch": 6.434316353887399,
438
+ "grad_norm": 0.9537220597267151,
439
+ "learning_rate": 3e-05,
440
+ "loss": 1.3888,
441
+ "step": 5400
442
+ },
443
+ {
444
+ "epoch": 6.55347036044087,
445
+ "grad_norm": 1.0322316884994507,
446
+ "learning_rate": 3e-05,
447
+ "loss": 1.3805,
448
+ "step": 5500
449
+ },
450
+ {
451
+ "epoch": 6.67262436699434,
452
+ "grad_norm": 0.9503114223480225,
453
+ "learning_rate": 3e-05,
454
+ "loss": 1.3797,
455
+ "step": 5600
456
+ },
457
+ {
458
+ "epoch": 6.79177837354781,
459
+ "grad_norm": 0.980711817741394,
460
+ "learning_rate": 3e-05,
461
+ "loss": 1.372,
462
+ "step": 5700
463
+ },
464
+ {
465
+ "epoch": 6.910932380101281,
466
+ "grad_norm": 0.9597466588020325,
467
+ "learning_rate": 3e-05,
468
+ "loss": 1.3843,
469
+ "step": 5800
470
+ },
471
+ {
472
+ "epoch": 6.999106344950849,
473
+ "eval_accuracy": 0.7285701943844493,
474
+ "eval_loss": 1.67252779006958,
475
+ "eval_runtime": 7.0918,
476
+ "eval_samples_per_second": 70.504,
477
+ "eval_steps_per_second": 8.883,
478
+ "step": 5874
479
+ },
480
+ {
481
+ "epoch": 7.030086386654752,
482
+ "grad_norm": 1.0293294191360474,
483
+ "learning_rate": 3e-05,
484
+ "loss": 1.3604,
485
+ "step": 5900
486
+ },
487
+ {
488
+ "epoch": 7.149240393208221,
489
+ "grad_norm": 1.1493397951126099,
490
+ "learning_rate": 3e-05,
491
+ "loss": 1.321,
492
+ "step": 6000
493
+ },
494
+ {
495
+ "epoch": 7.268394399761692,
496
+ "grad_norm": 1.0697908401489258,
497
+ "learning_rate": 3e-05,
498
+ "loss": 1.3199,
499
+ "step": 6100
500
+ },
501
+ {
502
+ "epoch": 7.387548406315163,
503
+ "grad_norm": 0.9961449503898621,
504
+ "learning_rate": 3e-05,
505
+ "loss": 1.3055,
506
+ "step": 6200
507
+ },
508
+ {
509
+ "epoch": 7.506702412868632,
510
+ "grad_norm": 0.9932839274406433,
511
+ "learning_rate": 3e-05,
512
+ "loss": 1.3357,
513
+ "step": 6300
514
+ },
515
+ {
516
+ "epoch": 7.625856419422103,
517
+ "grad_norm": 1.1646201610565186,
518
+ "learning_rate": 3e-05,
519
+ "loss": 1.3237,
520
+ "step": 6400
521
+ },
522
+ {
523
+ "epoch": 7.745010425975574,
524
+ "grad_norm": 1.049346685409546,
525
+ "learning_rate": 3e-05,
526
+ "loss": 1.3364,
527
+ "step": 6500
528
+ },
529
+ {
530
+ "epoch": 7.864164432529043,
531
+ "grad_norm": 1.0449471473693848,
532
+ "learning_rate": 3e-05,
533
+ "loss": 1.3321,
534
+ "step": 6600
535
+ },
536
+ {
537
+ "epoch": 7.983318439082514,
538
+ "grad_norm": 1.0868715047836304,
539
+ "learning_rate": 3e-05,
540
+ "loss": 1.3481,
541
+ "step": 6700
542
+ },
543
+ {
544
+ "epoch": 8.0,
545
+ "eval_accuracy": 0.7304362850971923,
546
+ "eval_loss": 1.6380517482757568,
547
+ "eval_runtime": 7.0963,
548
+ "eval_samples_per_second": 70.459,
549
+ "eval_steps_per_second": 8.878,
550
+ "step": 6714
551
+ },
552
+ {
553
+ "epoch": 8.102472445635984,
554
+ "grad_norm": 1.170162558555603,
555
+ "learning_rate": 3e-05,
556
+ "loss": 1.2724,
557
+ "step": 6800
558
+ },
559
+ {
560
+ "epoch": 8.221626452189454,
561
+ "grad_norm": 1.0903751850128174,
562
+ "learning_rate": 3e-05,
563
+ "loss": 1.2611,
564
+ "step": 6900
565
+ },
566
+ {
567
+ "epoch": 8.340780458742925,
568
+ "grad_norm": 1.1591229438781738,
569
+ "learning_rate": 3e-05,
570
+ "loss": 1.2769,
571
+ "step": 7000
572
+ },
573
+ {
574
+ "epoch": 8.459934465296396,
575
+ "grad_norm": 1.0120233297348022,
576
+ "learning_rate": 3e-05,
577
+ "loss": 1.2806,
578
+ "step": 7100
579
+ },
580
+ {
581
+ "epoch": 8.579088471849866,
582
+ "grad_norm": 1.1763852834701538,
583
+ "learning_rate": 3e-05,
584
+ "loss": 1.3051,
585
+ "step": 7200
586
+ },
587
+ {
588
+ "epoch": 8.698242478403337,
589
+ "grad_norm": 1.1366691589355469,
590
+ "learning_rate": 3e-05,
591
+ "loss": 1.283,
592
+ "step": 7300
593
+ },
594
+ {
595
+ "epoch": 8.817396484956806,
596
+ "grad_norm": 1.1808778047561646,
597
+ "learning_rate": 3e-05,
598
+ "loss": 1.2863,
599
+ "step": 7400
600
+ },
601
+ {
602
+ "epoch": 8.936550491510276,
603
+ "grad_norm": 1.2300970554351807,
604
+ "learning_rate": 3e-05,
605
+ "loss": 1.2954,
606
+ "step": 7500
607
+ },
608
+ {
609
+ "epoch": 8.999702114983616,
610
+ "eval_accuracy": 0.7324362850971923,
611
+ "eval_loss": 1.6103289127349854,
612
+ "eval_runtime": 7.1148,
613
+ "eval_samples_per_second": 70.276,
614
+ "eval_steps_per_second": 8.855,
615
+ "step": 7553
616
+ },
617
+ {
618
+ "epoch": 9.055704498063747,
619
+ "grad_norm": 1.319324016571045,
620
+ "learning_rate": 3e-05,
621
+ "loss": 1.2613,
622
+ "step": 7600
623
+ },
624
+ {
625
+ "epoch": 9.174858504617218,
626
+ "grad_norm": 1.1479202508926392,
627
+ "learning_rate": 3e-05,
628
+ "loss": 1.2172,
629
+ "step": 7700
630
+ },
631
+ {
632
+ "epoch": 9.294012511170688,
633
+ "grad_norm": 1.147423505783081,
634
+ "learning_rate": 3e-05,
635
+ "loss": 1.2342,
636
+ "step": 7800
637
+ },
638
+ {
639
+ "epoch": 9.413166517724159,
640
+ "grad_norm": 1.2309343814849854,
641
+ "learning_rate": 3e-05,
642
+ "loss": 1.2386,
643
+ "step": 7900
644
+ },
645
+ {
646
+ "epoch": 9.53232052427763,
647
+ "grad_norm": 1.121598243713379,
648
+ "learning_rate": 3e-05,
649
+ "loss": 1.2312,
650
+ "step": 8000
651
+ },
652
+ {
653
+ "epoch": 9.651474530831099,
654
+ "grad_norm": 1.3166605234146118,
655
+ "learning_rate": 3e-05,
656
+ "loss": 1.2376,
657
+ "step": 8100
658
+ },
659
+ {
660
+ "epoch": 9.77062853738457,
661
+ "grad_norm": 1.2599198818206787,
662
+ "learning_rate": 3e-05,
663
+ "loss": 1.2457,
664
+ "step": 8200
665
+ },
666
+ {
667
+ "epoch": 9.88978254393804,
668
+ "grad_norm": 1.297348141670227,
669
+ "learning_rate": 3e-05,
670
+ "loss": 1.2426,
671
+ "step": 8300
672
+ },
673
+ {
674
+ "epoch": 9.999404229967233,
675
+ "eval_accuracy": 0.7338099352051836,
676
+ "eval_loss": 1.5784611701965332,
677
+ "eval_runtime": 7.1512,
678
+ "eval_samples_per_second": 69.918,
679
+ "eval_steps_per_second": 8.81,
680
+ "step": 8392
681
+ },
682
+ {
683
+ "epoch": 10.00893655049151,
684
+ "grad_norm": 1.33698308467865,
685
+ "learning_rate": 3e-05,
686
+ "loss": 1.2404,
687
+ "step": 8400
688
+ },
689
+ {
690
+ "epoch": 10.128090557044981,
691
+ "grad_norm": 1.469753384590149,
692
+ "learning_rate": 3e-05,
693
+ "loss": 1.1701,
694
+ "step": 8500
695
+ },
696
+ {
697
+ "epoch": 10.247244563598452,
698
+ "grad_norm": 1.8849149942398071,
699
+ "learning_rate": 3e-05,
700
+ "loss": 1.1834,
701
+ "step": 8600
702
+ },
703
+ {
704
+ "epoch": 10.36639857015192,
705
+ "grad_norm": 1.3924856185913086,
706
+ "learning_rate": 3e-05,
707
+ "loss": 1.1877,
708
+ "step": 8700
709
+ },
710
+ {
711
+ "epoch": 10.485552576705391,
712
+ "grad_norm": 1.3874456882476807,
713
+ "learning_rate": 3e-05,
714
+ "loss": 1.1965,
715
+ "step": 8800
716
+ },
717
+ {
718
+ "epoch": 10.604706583258862,
719
+ "grad_norm": 1.5605155229568481,
720
+ "learning_rate": 3e-05,
721
+ "loss": 1.1833,
722
+ "step": 8900
723
+ },
724
+ {
725
+ "epoch": 10.723860589812332,
726
+ "grad_norm": 1.2716716527938843,
727
+ "learning_rate": 3e-05,
728
+ "loss": 1.1923,
729
+ "step": 9000
730
+ },
731
+ {
732
+ "epoch": 10.843014596365803,
733
+ "grad_norm": 1.434921383857727,
734
+ "learning_rate": 3e-05,
735
+ "loss": 1.2105,
736
+ "step": 9100
737
+ },
738
+ {
739
+ "epoch": 10.962168602919274,
740
+ "grad_norm": 1.3238595724105835,
741
+ "learning_rate": 3e-05,
742
+ "loss": 1.2169,
743
+ "step": 9200
744
+ },
745
+ {
746
+ "epoch": 10.99910634495085,
747
+ "eval_accuracy": 0.7355291576673866,
748
+ "eval_loss": 1.5435452461242676,
749
+ "eval_runtime": 7.5254,
750
+ "eval_samples_per_second": 66.442,
751
+ "eval_steps_per_second": 8.372,
752
+ "step": 9231
753
+ },
754
+ {
755
+ "epoch": 11.081322609472744,
756
+ "grad_norm": 1.414612054824829,
757
+ "learning_rate": 3e-05,
758
+ "loss": 1.1529,
759
+ "step": 9300
760
+ },
761
+ {
762
+ "epoch": 11.200476616026213,
763
+ "grad_norm": 1.3387949466705322,
764
+ "learning_rate": 3e-05,
765
+ "loss": 1.1356,
766
+ "step": 9400
767
+ },
768
+ {
769
+ "epoch": 11.319630622579684,
770
+ "grad_norm": 1.312972068786621,
771
+ "learning_rate": 3e-05,
772
+ "loss": 1.1461,
773
+ "step": 9500
774
+ },
775
+ {
776
+ "epoch": 11.438784629133155,
777
+ "grad_norm": 1.5637264251708984,
778
+ "learning_rate": 3e-05,
779
+ "loss": 1.1396,
780
+ "step": 9600
781
+ },
782
+ {
783
+ "epoch": 11.557938635686625,
784
+ "grad_norm": 1.419885277748108,
785
+ "learning_rate": 3e-05,
786
+ "loss": 1.16,
787
+ "step": 9700
788
+ },
789
+ {
790
+ "epoch": 11.677092642240096,
791
+ "grad_norm": 1.555853009223938,
792
+ "learning_rate": 3e-05,
793
+ "loss": 1.1438,
794
+ "step": 9800
795
+ },
796
+ {
797
+ "epoch": 11.796246648793566,
798
+ "grad_norm": 1.543087124824524,
799
+ "learning_rate": 3e-05,
800
+ "loss": 1.1596,
801
+ "step": 9900
802
+ },
803
+ {
804
+ "epoch": 11.915400655347035,
805
+ "grad_norm": 1.6761687994003296,
806
+ "learning_rate": 3e-05,
807
+ "loss": 1.167,
808
+ "step": 10000
809
+ },
810
+ {
811
+ "epoch": 12.0,
812
+ "eval_accuracy": 0.7374902807775378,
813
+ "eval_loss": 1.5215668678283691,
814
+ "eval_runtime": 7.0414,
815
+ "eval_samples_per_second": 71.008,
816
+ "eval_steps_per_second": 8.947,
817
+ "step": 10071
818
+ },
819
+ {
820
+ "epoch": 12.034554661900506,
821
+ "grad_norm": 1.6120909452438354,
822
+ "learning_rate": 3e-05,
823
+ "loss": 1.1362,
824
+ "step": 10100
825
+ },
826
+ {
827
+ "epoch": 12.153708668453977,
828
+ "grad_norm": 1.432334542274475,
829
+ "learning_rate": 3e-05,
830
+ "loss": 1.0834,
831
+ "step": 10200
832
+ },
833
+ {
834
+ "epoch": 12.272862675007447,
835
+ "grad_norm": 1.400549292564392,
836
+ "learning_rate": 3e-05,
837
+ "loss": 1.096,
838
+ "step": 10300
839
+ },
840
+ {
841
+ "epoch": 12.392016681560918,
842
+ "grad_norm": 1.365933895111084,
843
+ "learning_rate": 3e-05,
844
+ "loss": 1.0918,
845
+ "step": 10400
846
+ },
847
+ {
848
+ "epoch": 12.511170688114388,
849
+ "grad_norm": 1.9454312324523926,
850
+ "learning_rate": 3e-05,
851
+ "loss": 1.0951,
852
+ "step": 10500
853
+ },
854
+ {
855
+ "epoch": 12.63032469466786,
856
+ "grad_norm": 1.6147247552871704,
857
+ "learning_rate": 3e-05,
858
+ "loss": 1.115,
859
+ "step": 10600
860
+ },
861
+ {
862
+ "epoch": 12.749478701221328,
863
+ "grad_norm": 1.5626009702682495,
864
+ "learning_rate": 3e-05,
865
+ "loss": 1.118,
866
+ "step": 10700
867
+ },
868
+ {
869
+ "epoch": 12.868632707774799,
870
+ "grad_norm": 1.6438124179840088,
871
+ "learning_rate": 3e-05,
872
+ "loss": 1.1273,
873
+ "step": 10800
874
+ },
875
+ {
876
+ "epoch": 12.98778671432827,
877
+ "grad_norm": 1.5316485166549683,
878
+ "learning_rate": 3e-05,
879
+ "loss": 1.1276,
880
+ "step": 10900
881
+ },
882
+ {
883
+ "epoch": 12.999702114983616,
884
+ "eval_accuracy": 0.7392181425485961,
885
+ "eval_loss": 1.4949097633361816,
886
+ "eval_runtime": 7.1295,
887
+ "eval_samples_per_second": 70.131,
888
+ "eval_steps_per_second": 8.837,
889
+ "step": 10910
890
+ },
891
+ {
892
+ "epoch": 13.10694072088174,
893
+ "grad_norm": 1.6530476808547974,
894
+ "learning_rate": 3e-05,
895
+ "loss": 1.0685,
896
+ "step": 11000
897
+ },
898
+ {
899
+ "epoch": 13.22609472743521,
900
+ "grad_norm": 1.571428894996643,
901
+ "learning_rate": 3e-05,
902
+ "loss": 1.0333,
903
+ "step": 11100
904
+ },
905
+ {
906
+ "epoch": 13.345248733988681,
907
+ "grad_norm": 1.5977364778518677,
908
+ "learning_rate": 3e-05,
909
+ "loss": 1.0519,
910
+ "step": 11200
911
+ },
912
+ {
913
+ "epoch": 13.46440274054215,
914
+ "grad_norm": 1.8079047203063965,
915
+ "learning_rate": 3e-05,
916
+ "loss": 1.0746,
917
+ "step": 11300
918
+ },
919
+ {
920
+ "epoch": 13.58355674709562,
921
+ "grad_norm": 1.59292733669281,
922
+ "learning_rate": 3e-05,
923
+ "loss": 1.0751,
924
+ "step": 11400
925
+ },
926
+ {
927
+ "epoch": 13.702710753649091,
928
+ "grad_norm": 1.6971626281738281,
929
+ "learning_rate": 3e-05,
930
+ "loss": 1.0613,
931
+ "step": 11500
932
+ },
933
+ {
934
+ "epoch": 13.821864760202562,
935
+ "grad_norm": 1.7633167505264282,
936
+ "learning_rate": 3e-05,
937
+ "loss": 1.0639,
938
+ "step": 11600
939
+ },
940
+ {
941
+ "epoch": 13.941018766756033,
942
+ "grad_norm": 1.5861283540725708,
943
+ "learning_rate": 3e-05,
944
+ "loss": 1.0819,
945
+ "step": 11700
946
+ },
947
+ {
948
+ "epoch": 13.999404229967233,
949
+ "eval_accuracy": 0.7406004319654428,
950
+ "eval_loss": 1.4818942546844482,
951
+ "eval_runtime": 7.4071,
952
+ "eval_samples_per_second": 67.503,
953
+ "eval_steps_per_second": 8.505,
954
+ "step": 11749
955
+ },
956
+ {
957
+ "epoch": 14.060172773309503,
958
+ "grad_norm": 1.7363096475601196,
959
+ "learning_rate": 3e-05,
960
+ "loss": 1.059,
961
+ "step": 11800
962
+ },
963
+ {
964
+ "epoch": 14.179326779862972,
965
+ "grad_norm": 1.8236732482910156,
966
+ "learning_rate": 3e-05,
967
+ "loss": 1.0138,
968
+ "step": 11900
969
+ },
970
+ {
971
+ "epoch": 14.298480786416443,
972
+ "grad_norm": 1.7334719896316528,
973
+ "learning_rate": 3e-05,
974
+ "loss": 1.0069,
975
+ "step": 12000
976
+ },
977
+ {
978
+ "epoch": 14.417634792969913,
979
+ "grad_norm": 1.6551170349121094,
980
+ "learning_rate": 3e-05,
981
+ "loss": 1.0113,
982
+ "step": 12100
983
+ },
984
+ {
985
+ "epoch": 14.536788799523384,
986
+ "grad_norm": 1.630021095275879,
987
+ "learning_rate": 3e-05,
988
+ "loss": 1.0346,
989
+ "step": 12200
990
+ },
991
+ {
992
+ "epoch": 14.655942806076855,
993
+ "grad_norm": 1.7144087553024292,
994
+ "learning_rate": 3e-05,
995
+ "loss": 1.0095,
996
+ "step": 12300
997
+ },
998
+ {
999
+ "epoch": 14.775096812630325,
1000
+ "grad_norm": 1.7622939348220825,
1001
+ "learning_rate": 3e-05,
1002
+ "loss": 1.0338,
1003
+ "step": 12400
1004
+ },
1005
+ {
1006
+ "epoch": 14.894250819183796,
1007
+ "grad_norm": 1.7320427894592285,
1008
+ "learning_rate": 3e-05,
1009
+ "loss": 1.032,
1010
+ "step": 12500
1011
+ },
1012
+ {
1013
+ "epoch": 14.99910634495085,
1014
+ "eval_accuracy": 0.7426090712742981,
1015
+ "eval_loss": 1.4468342065811157,
1016
+ "eval_runtime": 7.1972,
1017
+ "eval_samples_per_second": 69.471,
1018
+ "eval_steps_per_second": 8.753,
1019
+ "step": 12588
1020
+ },
1021
+ {
1022
+ "epoch": 15.01787310098302,
1023
+ "grad_norm": 1.8335142135620117,
1024
+ "learning_rate": 3e-05,
1025
+ "loss": 0.9336,
1026
+ "step": 12600
1027
+ },
1028
+ {
1029
+ "epoch": 15.137027107536491,
1030
+ "grad_norm": 1.7047183513641357,
1031
+ "learning_rate": 3e-05,
1032
+ "loss": 0.9394,
1033
+ "step": 12700
1034
+ },
1035
+ {
1036
+ "epoch": 15.256181114089962,
1037
+ "grad_norm": 1.5435420274734497,
1038
+ "learning_rate": 3e-05,
1039
+ "loss": 0.9804,
1040
+ "step": 12800
1041
+ },
1042
+ {
1043
+ "epoch": 15.375335120643431,
1044
+ "grad_norm": 1.994692325592041,
1045
+ "learning_rate": 3e-05,
1046
+ "loss": 0.9765,
1047
+ "step": 12900
1048
+ },
1049
+ {
1050
+ "epoch": 15.494489127196902,
1051
+ "grad_norm": 2.072622537612915,
1052
+ "learning_rate": 3e-05,
1053
+ "loss": 0.9955,
1054
+ "step": 13000
1055
+ },
1056
+ {
1057
+ "epoch": 15.613643133750372,
1058
+ "grad_norm": 2.14208722114563,
1059
+ "learning_rate": 3e-05,
1060
+ "loss": 0.9814,
1061
+ "step": 13100
1062
+ },
1063
+ {
1064
+ "epoch": 15.732797140303843,
1065
+ "grad_norm": 1.855945110321045,
1066
+ "learning_rate": 3e-05,
1067
+ "loss": 0.9939,
1068
+ "step": 13200
1069
+ },
1070
+ {
1071
+ "epoch": 15.851951146857314,
1072
+ "grad_norm": 1.8408160209655762,
1073
+ "learning_rate": 3e-05,
1074
+ "loss": 0.9853,
1075
+ "step": 13300
1076
+ },
1077
+ {
1078
+ "epoch": 15.971105153410784,
1079
+ "grad_norm": 1.8263169527053833,
1080
+ "learning_rate": 3e-05,
1081
+ "loss": 0.9981,
1082
+ "step": 13400
1083
+ },
1084
+ {
1085
+ "epoch": 15.999702114983616,
1086
+ "eval_accuracy": 0.7442721382289417,
1087
+ "eval_loss": 1.4091745615005493,
1088
+ "eval_runtime": 8.0125,
1089
+ "eval_samples_per_second": 62.403,
1090
+ "eval_steps_per_second": 7.863,
1091
+ "step": 13424
1092
+ },
1093
+ {
1094
+ "epoch": 16.090259159964255,
1095
+ "grad_norm": 2.0035765171051025,
1096
+ "learning_rate": 3e-05,
1097
+ "loss": 0.9524,
1098
+ "step": 13500
1099
+ },
1100
+ {
1101
+ "epoch": 16.209413166517724,
1102
+ "grad_norm": 2.207390785217285,
1103
+ "learning_rate": 3e-05,
1104
+ "loss": 0.9136,
1105
+ "step": 13600
1106
+ },
1107
+ {
1108
+ "epoch": 16.328567173071196,
1109
+ "grad_norm": 1.8691837787628174,
1110
+ "learning_rate": 3e-05,
1111
+ "loss": 0.9374,
1112
+ "step": 13700
1113
+ },
1114
+ {
1115
+ "epoch": 16.447721179624665,
1116
+ "grad_norm": 1.9585497379302979,
1117
+ "learning_rate": 3e-05,
1118
+ "loss": 0.9427,
1119
+ "step": 13800
1120
+ },
1121
+ {
1122
+ "epoch": 16.566875186178134,
1123
+ "grad_norm": 2.1230435371398926,
1124
+ "learning_rate": 3e-05,
1125
+ "loss": 0.9485,
1126
+ "step": 13900
1127
+ },
1128
+ {
1129
+ "epoch": 16.686029192731606,
1130
+ "grad_norm": 1.8812588453292847,
1131
+ "learning_rate": 3e-05,
1132
+ "loss": 0.9474,
1133
+ "step": 14000
1134
+ },
1135
+ {
1136
+ "epoch": 16.805183199285075,
1137
+ "grad_norm": 2.00522518157959,
1138
+ "learning_rate": 3e-05,
1139
+ "loss": 0.9554,
1140
+ "step": 14100
1141
+ },
1142
+ {
1143
+ "epoch": 16.924337205838548,
1144
+ "grad_norm": 2.1199073791503906,
1145
+ "learning_rate": 3e-05,
1146
+ "loss": 0.9523,
1147
+ "step": 14200
1148
+ },
1149
+ {
1150
+ "epoch": 16.999404229967233,
1151
+ "eval_accuracy": 0.7462505399568035,
1152
+ "eval_loss": 1.394910454750061,
1153
+ "eval_runtime": 7.7306,
1154
+ "eval_samples_per_second": 64.678,
1155
+ "eval_steps_per_second": 8.149,
1156
+ "step": 14263
1157
+ },
1158
+ {
1159
+ "epoch": 17.043491212392016,
1160
+ "grad_norm": 1.9974786043167114,
1161
+ "learning_rate": 3e-05,
1162
+ "loss": 0.9456,
1163
+ "step": 14300
1164
+ },
1165
+ {
1166
+ "epoch": 17.16264521894549,
1167
+ "grad_norm": 1.8176357746124268,
1168
+ "learning_rate": 3e-05,
1169
+ "loss": 0.8874,
1170
+ "step": 14400
1171
+ },
1172
+ {
1173
+ "epoch": 17.281799225498958,
1174
+ "grad_norm": 2.0692601203918457,
1175
+ "learning_rate": 3e-05,
1176
+ "loss": 0.9041,
1177
+ "step": 14500
1178
+ },
1179
+ {
1180
+ "epoch": 17.400953232052427,
1181
+ "grad_norm": 2.0602495670318604,
1182
+ "learning_rate": 3e-05,
1183
+ "loss": 0.9052,
1184
+ "step": 14600
1185
+ },
1186
+ {
1187
+ "epoch": 17.5201072386059,
1188
+ "grad_norm": 2.2935659885406494,
1189
+ "learning_rate": 3e-05,
1190
+ "loss": 0.9026,
1191
+ "step": 14700
1192
+ },
1193
+ {
1194
+ "epoch": 17.639261245159368,
1195
+ "grad_norm": 2.3279073238372803,
1196
+ "learning_rate": 3e-05,
1197
+ "loss": 0.8857,
1198
+ "step": 14800
1199
+ },
1200
+ {
1201
+ "epoch": 17.75841525171284,
1202
+ "grad_norm": 2.5026357173919678,
1203
+ "learning_rate": 3e-05,
1204
+ "loss": 0.9163,
1205
+ "step": 14900
1206
+ },
1207
+ {
1208
+ "epoch": 17.87756925826631,
1209
+ "grad_norm": 1.9783616065979004,
1210
+ "learning_rate": 3e-05,
1211
+ "loss": 0.907,
1212
+ "step": 15000
1213
+ },
1214
+ {
1215
+ "epoch": 17.996723264819778,
1216
+ "grad_norm": 1.7425028085708618,
1217
+ "learning_rate": 3e-05,
1218
+ "loss": 0.9281,
1219
+ "step": 15100
1220
+ },
1221
+ {
1222
+ "epoch": 17.99910634495085,
1223
+ "eval_accuracy": 0.7477105831533477,
1224
+ "eval_loss": 1.3853000402450562,
1225
+ "eval_runtime": 7.8211,
1226
+ "eval_samples_per_second": 63.93,
1227
+ "eval_steps_per_second": 8.055,
1228
+ "step": 15102
1229
+ },
1230
+ {
1231
+ "epoch": 18.11587727137325,
1232
+ "grad_norm": 2.080223798751831,
1233
+ "learning_rate": 3e-05,
1234
+ "loss": 0.8379,
1235
+ "step": 15200
1236
+ },
1237
+ {
1238
+ "epoch": 18.23503127792672,
1239
+ "grad_norm": 2.135795831680298,
1240
+ "learning_rate": 3e-05,
1241
+ "loss": 0.857,
1242
+ "step": 15300
1243
+ },
1244
+ {
1245
+ "epoch": 18.35418528448019,
1246
+ "grad_norm": 1.939634919166565,
1247
+ "learning_rate": 3e-05,
1248
+ "loss": 0.8646,
1249
+ "step": 15400
1250
+ },
1251
+ {
1252
+ "epoch": 18.47333929103366,
1253
+ "grad_norm": 2.035285234451294,
1254
+ "learning_rate": 3e-05,
1255
+ "loss": 0.8725,
1256
+ "step": 15500
1257
+ },
1258
+ {
1259
+ "epoch": 18.592493297587133,
1260
+ "grad_norm": 1.9423282146453857,
1261
+ "learning_rate": 3e-05,
1262
+ "loss": 0.8773,
1263
+ "step": 15600
1264
+ },
1265
+ {
1266
+ "epoch": 18.7116473041406,
1267
+ "grad_norm": 1.8535542488098145,
1268
+ "learning_rate": 3e-05,
1269
+ "loss": 0.8783,
1270
+ "step": 15700
1271
+ },
1272
+ {
1273
+ "epoch": 18.83080131069407,
1274
+ "grad_norm": 1.8560757637023926,
1275
+ "learning_rate": 3e-05,
1276
+ "loss": 0.8783,
1277
+ "step": 15800
1278
+ },
1279
+ {
1280
+ "epoch": 18.949955317247543,
1281
+ "grad_norm": 2.0741496086120605,
1282
+ "learning_rate": 3e-05,
1283
+ "loss": 0.8664,
1284
+ "step": 15900
1285
+ },
1286
+ {
1287
+ "epoch": 19.0,
1288
+ "eval_accuracy": 0.7496069114470842,
1289
+ "eval_loss": 1.366912603378296,
1290
+ "eval_runtime": 7.6548,
1291
+ "eval_samples_per_second": 65.318,
1292
+ "eval_steps_per_second": 8.23,
1293
+ "step": 15942
1294
+ },
1295
+ {
1296
+ "epoch": 19.069109323801012,
1297
+ "grad_norm": 2.1111717224121094,
1298
+ "learning_rate": 3e-05,
1299
+ "loss": 0.862,
1300
+ "step": 16000
1301
+ },
1302
+ {
1303
+ "epoch": 19.188263330354484,
1304
+ "grad_norm": 2.0599365234375,
1305
+ "learning_rate": 3e-05,
1306
+ "loss": 0.8021,
1307
+ "step": 16100
1308
+ },
1309
+ {
1310
+ "epoch": 19.307417336907953,
1311
+ "grad_norm": 1.752618432044983,
1312
+ "learning_rate": 3e-05,
1313
+ "loss": 0.8294,
1314
+ "step": 16200
1315
+ },
1316
+ {
1317
+ "epoch": 19.426571343461426,
1318
+ "grad_norm": 2.0611188411712646,
1319
+ "learning_rate": 3e-05,
1320
+ "loss": 0.8242,
1321
+ "step": 16300
1322
+ },
1323
+ {
1324
+ "epoch": 19.545725350014894,
1325
+ "grad_norm": 2.7928466796875,
1326
+ "learning_rate": 3e-05,
1327
+ "loss": 0.8313,
1328
+ "step": 16400
1329
+ },
1330
+ {
1331
+ "epoch": 19.664879356568363,
1332
+ "grad_norm": 2.2697372436523438,
1333
+ "learning_rate": 3e-05,
1334
+ "loss": 0.8389,
1335
+ "step": 16500
1336
+ },
1337
+ {
1338
+ "epoch": 19.784033363121836,
1339
+ "grad_norm": 1.9462206363677979,
1340
+ "learning_rate": 3e-05,
1341
+ "loss": 0.8413,
1342
+ "step": 16600
1343
+ },
1344
+ {
1345
+ "epoch": 19.903187369675305,
1346
+ "grad_norm": 1.8066469430923462,
1347
+ "learning_rate": 3e-05,
1348
+ "loss": 0.8537,
1349
+ "step": 16700
1350
+ },
1351
+ {
1352
+ "epoch": 19.998510574918082,
1353
+ "eval_accuracy": 0.7512829373650108,
1354
+ "eval_loss": 1.3258286714553833,
1355
+ "eval_runtime": 7.9417,
1356
+ "eval_samples_per_second": 62.959,
1357
+ "eval_steps_per_second": 7.933,
1358
+ "step": 16780
1359
+ },
1360
+ {
1361
+ "epoch": 19.998510574918082,
1362
+ "step": 16780,
1363
+ "total_flos": 1.3733500524072796e+18,
1364
+ "train_loss": 0.22626538049336412,
1365
+ "train_runtime": 9900.7803,
1366
+ "train_samples_per_second": 54.246,
1367
+ "train_steps_per_second": 1.695
1368
+ }
1369
+ ],
1370
+ "logging_steps": 100,
1371
+ "max_steps": 16780,
1372
+ "num_input_tokens_seen": 0,
1373
+ "num_train_epochs": 20,
1374
+ "save_steps": 500,
1375
+ "total_flos": 1.3733500524072796e+18,
1376
+ "train_batch_size": 1,
1377
+ "trial_name": null,
1378
+ "trial_params": null
1379
+ }
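trainer_state.json stores one `log_history` entry per logging step (every 100 steps) plus one eval entry per epoch and a final training summary. A small sketch for pulling the per-epoch eval curve out of a local copy of the file (the path is assumed):

```python
import json

# Assumption: trainer_state.json has been downloaded next to this script.
with open("trainer_state.json") as f:
    state = json.load(f)

# Eval entries carry "eval_loss"; training-step entries carry "loss" instead.
for entry in state["log_history"]:
    if "eval_loss" in entry:
        print(f'epoch={entry["epoch"]:.2f}  step={entry["step"]}  '
              f'eval_loss={entry["eval_loss"]:.4f}  eval_accuracy={entry["eval_accuracy"]:.4f}')
```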