gpt2
Venkatesh Srinivas committed
Commit 123e9fd
1 Parent(s): c8b51fb

Add details for 12.5B checkpoint

Files changed (4)
  1. README.md +2 -0
  2. ckpt.pt.31000.val_2.6607 +0 -3
  3. train_val.png +0 -0
  4. train_val.txt +483 -0
README.md CHANGED
@@ -38,6 +38,8 @@ in 'float16' rather than 'bfloat16'. Learning rate ramped up 6e-5 to 4e-4
  over the first 3000 iterations (786M tokens) and stayed there for the
  rest of the training process.
 
+ ![train_val](train_val.png)
+
  ---
 
  Evaluations
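
The README context in this hunk describes a learning rate that ramps from 6e-5 to 4e-4 over the first 3000 iterations (786M tokens) and then stays constant. The sketch below is one way to write that schedule down. It assumes the ramp is linear, which the README does not state, and the names `lr_at` and `tokens_seen` are hypothetical helpers, not part of the training code.

```python
# Illustrative only: the README gives the endpoints and the warmup length,
# but not the ramp shape. A linear ramp is ASSUMED here.
WARMUP_ITERS = 3000   # from the README
LR_START = 6e-5       # from the README
LR_PEAK = 4e-4        # from the README

def lr_at(it: int) -> float:
    """Learning rate at iteration `it`: linear warmup, then constant."""
    if it >= WARMUP_ITERS:
        return LR_PEAK
    frac = it / WARMUP_ITERS
    return LR_START + frac * (LR_PEAK - LR_START)

# "3000 iterations (786M tokens)" implies about 262k tokens per iteration.
TOKENS_PER_ITER = 786e6 / WARMUP_ITERS

def tokens_seen(it: int) -> float:
    """Approximate number of training tokens consumed after `it` iterations."""
    return it * TOKENS_PER_ITER
```

Under these assumptions, `tokens_seen(48000)` comes to roughly 12.6 billion tokens, which appears to line up with the "12.5B checkpoint" named in the commit message and the final step logged in train_val.txt.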
ckpt.pt.31000.val_2.6607 DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:b9f96f574207d59feca0595b7694d2d2d5ba61a6306652a6d7bb0f0d5c875a70
- size 4595163769
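
The three removed lines are a Git LFS pointer, not the checkpoint itself: a spec version, a SHA-256 object ID, and the object size in bytes. As a minimal sketch of reading such a pointer, the snippet below splits each line into a key and a value. The helper name `parse_lfs_pointer` is hypothetical and this is not how the git-lfs tool itself is implemented.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each pointer line is a key, one space, then the value.
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])  # size is a byte count
    return fields

# Pointer contents copied from the deleted file in this commit.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:b9f96f574207d59feca0595b7694d2d2d5ba61a6306652a6d7bb0f0d5c875a70
size 4595163769
"""

info = parse_lfs_pointer(pointer)
size_gib = info["size"] / 2**30  # bytes to GiB
```

Here `size_gib` works out to about 4.28 GiB, so the commit drops a checkpoint of roughly 4.6 GB from the repository.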
train_val.png ADDED
train_val.txt ADDED
@@ -0,0 +1,483 @@
+ Overriding: eval_iters = 50Overriding: eval_iters = 50
+ Overriding: eval_interval = 100Overriding: eval_interval = 100
+ step 0: train loss 11.0252, val loss 11.0342
+ step 100: train loss 8.3994, val loss 8.2066
+ step 200: train loss 7.3136, val loss 7.1235
+ step 300: train loss 6.5888, val loss 6.7433
+ step 400: train loss 6.5067, val loss 6.4013
+ step 500: train loss 6.1970, val loss 6.1153
+ step 600: train loss 5.9715, val loss 6.0343
+ step 700: train loss 5.7357, val loss 5.7946
+ step 800: train loss 5.6244, val loss 5.7100
+ step 900: train loss 5.4724, val loss 5.5178
+ step 1000: train loss 5.4297, val loss 5.3089
+ step 1100: train loss 5.0414, val loss 5.2748
+ step 1200: train loss 4.9450, val loss 4.9600
+ step 1300: train loss 4.6848, val loss 4.8181
+ step 1400: train loss 4.5482, val loss 4.4525
+ step 1500: train loss 4.4756, val loss 4.3209
+ step 1600: train loss 4.2531, val loss 4.2776
+ step 1700: train loss 4.2488, val loss 4.2306
+ step 1800: train loss 4.0376, val loss 4.1076
+ step 1900: train loss 4.0463, val loss 4.0019
+ step 2000: train loss 3.9624, val loss 3.8664
+ step 2100: train loss 3.9590, val loss 3.7839
+ step 2200: train loss 3.9238, val loss 3.8385
+ step 2300: train loss 3.6838, val loss 3.7538
+ step 2400: train loss 3.7332, val loss 3.6593
+ step 2500: train loss 3.7454, val loss 3.5440
+ step 2600: train loss 3.5528, val loss 3.6207
+ step 2700: train loss 3.5916, val loss 3.6545
+ step 2800: train loss 3.7254, val loss 3.6136
+ step 2900: train loss 3.5898, val loss 3.3846
+ step 3000: train loss 3.5164, val loss 3.4608
+ step 3100: train loss 3.6373, val loss 3.5505
+ step 3200: train loss 3.5100, val loss 3.6281
+ step 3300: train loss 3.5623, val loss 3.5894
+ step 3400: train loss 3.4841, val loss 3.4290
+ step 3500: train loss 3.5908, val loss 3.4267
+ step 3600: train loss 3.4661, val loss 3.5482
+ step 3700: train loss 3.4633, val loss 3.4274
+ step 3800: train loss 3.4503, val loss 3.5384
+ step 3900: train loss 3.3948, val loss 3.3274
+ step 4000: train loss 3.4388, val loss 3.3746
+ step 4100: train loss 3.3921, val loss 3.2486
+ step 4200: train loss 3.4422, val loss 3.3624
+ step 4300: train loss 3.3533, val loss 3.2563
+ step 4400: train loss 3.3215, val loss 3.3935
+ step 4500: train loss 3.4373, val loss 3.2724
+ step 4600: train loss 3.2562, val loss 3.2819
+ step 4700: train loss 3.3209, val loss 3.2646
+ step 4800: train loss 3.1498, val loss 3.3252
+ step 4900: train loss 3.3318, val loss 3.3322
+ step 5000: train loss 3.1285, val loss 3.2495
+ step 5100: train loss 3.3448, val loss 3.1907
+ step 5200: train loss 3.3123, val loss 3.1915
+ step 5300: train loss 3.2482, val loss 3.3080
+ step 5400: train loss 3.0714, val loss 3.1940
+ step 5500: train loss 3.1294, val loss 3.2508
+ step 5600: train loss 3.2360, val loss 3.0566
+ step 5700: train loss 3.2703, val loss 3.1624
+ step 5800: train loss 3.3135, val loss 3.1183
+ step 5900: train loss 3.2142, val loss 3.1934
+ step 6000: train loss 3.2289, val loss 3.1825
+ step 6100: train loss 3.0920, val loss 3.1858
+ step 6200: train loss 3.2835, val loss 3.1578
+ step 6300: train loss 3.1277, val loss 3.1348
+ step 6400: train loss 3.0799, val loss 3.2929
+ step 6500: train loss 3.0791, val loss 3.2397
+ step 6600: train loss 3.2201, val loss 3.2587
+ step 6700: train loss 3.0092, val loss 3.2005
+ step 6800: train loss 3.0824, val loss 3.0970
+ step 6900: train loss 3.2339, val loss 3.1762
+ step 7000: train loss 3.1754, val loss 3.1966
+ step 7100: train loss 3.1720, val loss 3.1533
+ step 7200: train loss 3.1673, val loss 3.1003
+ step 7300: train loss 3.1047, val loss 3.1397
+ step 7400: train loss 3.1211, val loss 3.1447
+ step 7500: train loss 3.1564, val loss 3.0936
+ step 7600: train loss 3.0931, val loss 3.1315
+ step 7700: train loss 2.9800, val loss 3.2394
+ step 7800: train loss 3.1775, val loss 3.2620
+ step 7900: train loss 3.0847, val loss 3.0954
+ step 8000: train loss 3.0581, val loss 3.0713
+ step 8100: train loss 3.1880, val loss 3.0542
+ step 8200: train loss 3.1568, val loss 3.0514
+ step 8300: train loss 3.0128, val loss 3.1295
+ step 8400: train loss 3.2077, val loss 3.0505
+ step 8500: train loss 3.0058, val loss 3.1052
+ step 8600: train loss 3.0915, val loss 2.8884
+ step 8700: train loss 3.1190, val loss 3.0491
+ step 8800: train loss 2.9319, val loss 2.9831
+ step 8900: train loss 2.9605, val loss 3.0030
+ step 9000: train loss 3.0953, val loss 2.9161
+ step 9100: train loss 3.1344, val loss 3.0248
+ step 9200: train loss 2.9525, val loss 3.0419
+ step 9300: train loss 2.9842, val loss 2.9508
+ step 9400: train loss 3.1642, val loss 3.0025
+ step 9500: train loss 2.9276, val loss 3.0674
+ step 9600: train loss 3.0968, val loss 3.0211
+ step 9700: train loss 3.1166, val loss 3.0580
+ step 9800: train loss 2.9912, val loss 2.9596
+ step 9900: train loss 2.9809, val loss 2.9561
+ step 10000: train loss 3.0402, val loss 2.9424
+ step 10100: train loss 2.9393, val loss 2.9690
+ step 10200: train loss 3.0273, val loss 3.0578
+ step 10300: train loss 2.9466, val loss 3.1119
+ step 10400: train loss 2.9821, val loss 2.9871
+ step 10500: train loss 3.0022, val loss 3.0068
+ step 10600: train loss 2.9527, val loss 3.0174
+ step 10700: train loss 3.0224, val loss 3.0772
+ step 10800: train loss 2.9642, val loss 3.0270
+ step 10900: train loss 2.9446, val loss 2.9751
+ step 11000: train loss 2.9466, val loss 2.9945
+ step 11100: train loss 2.9304, val loss 2.9444
+ step 11200: train loss 2.9619, val loss 3.0315
+ step 11300: train loss 2.9358, val loss 2.9847
+ step 11400: train loss 3.0165, val loss 2.7416
+ step 11500: train loss 2.8405, val loss 3.0835
+ step 11600: train loss 3.0746, val loss 3.0534
+ step 11700: train loss 2.9898, val loss 2.9221
+ step 11800: train loss 2.8608, val loss 3.0250
+ step 11900: train loss 2.9855, val loss 2.9443
+ step 12000: train loss 2.9834, val loss 2.9962
+ step 12100: train loss 2.8355, val loss 3.0118
+ step 12200: train loss 2.9886, val loss 2.9714
+ step 12300: train loss 2.9457, val loss 2.9599
+ step 12400: train loss 2.8276, val loss 3.0673
+ step 12500: train loss 2.9246, val loss 2.9800
+ step 12600: train loss 3.0029, val loss 2.8929
+ step 12700: train loss 2.9373, val loss 2.9386
+ step 12800: train loss 2.9504, val loss 3.0079
+ step 12900: train loss 2.9921, val loss 2.9243
+ step 13000: train loss 2.9724, val loss 3.0502
+ step 13100: train loss 2.9558, val loss 2.8818
+ step 13200: train loss 2.9938, val loss 2.9503
+ step 13300: train loss 2.9165, val loss 3.0683
+ step 13400: train loss 2.9777, val loss 2.9374
+ step 13500: train loss 3.0141, val loss 2.9254
+ step 13600: train loss 2.8655, val loss 2.9531
+ step 13700: train loss 2.8848, val loss 3.0087
+ step 13800: train loss 2.9226, val loss 2.8738
+ step 13900: train loss 2.8910, val loss 2.9250
+ step 14000: train loss 2.9752, val loss 2.9531
+ step 14100: train loss 2.9497, val loss 2.9894
+ step 14200: train loss 2.9779, val loss 2.9911
+ step 14300: train loss 2.9423, val loss 2.7897
+ step 14400: train loss 3.0338, val loss 3.0204
+ step 14500: train loss 2.8680, val loss 2.8760
+ step 14600: train loss 3.0093, val loss 2.8034
+ step 14700: train loss 2.9222, val loss 2.8466
+ step 14800: train loss 2.7877, val loss 2.8845
+ step 14900: train loss 2.9715, val loss 3.0005
+ step 15000: train loss 3.0018, val loss 3.0310
+ step 15100: train loss 2.9654, val loss 2.9176
+ step 15200: train loss 2.9580, val loss 2.9125
+ step 15300: train loss 3.0046, val loss 2.8712
+ step 15400: train loss 3.0046, val loss 2.9361
+ step 15500: train loss 2.7949, val loss 2.8170
+ step 15600: train loss 2.9127, val loss 2.9011
+ step 15700: train loss 2.9440, val loss 2.9167
+ step 15800: train loss 2.8596, val loss 2.8605
+ step 15900: train loss 2.8704, val loss 2.8725
+ step 16000: train loss 2.8634, val loss 3.0975
+ step 16100: train loss 2.9963, val loss 2.7633
+ step 16200: train loss 2.9618, val loss 2.9352
+ step 16300: train loss 2.7306, val loss 2.9384
+ step 16400: train loss 2.9731, val loss 2.9716
+ step 16500: train loss 2.8599, val loss 3.0492
+ step 16600: train loss 2.8712, val loss 2.9475
+ step 16700: train loss 2.9567, val loss 2.8846
+ step 16800: train loss 2.8565, val loss 3.0182
+ step 16900: train loss 2.8318, val loss 3.0222
+ step 17000: train loss 3.0119, val loss 2.8964
+ step 17100: train loss 2.8578, val loss 2.7679
+ step 17200: train loss 2.8943, val loss 2.9294
+ step 17300: train loss 2.8835, val loss 2.8658
+ step 17400: train loss 2.9415, val loss 2.9057
+ step 17500: train loss 2.8730, val loss 2.7631
+ step 17600: train loss 2.7918, val loss 2.7859
+ step 17700: train loss 2.9455, val loss 2.9624
+ step 17800: train loss 2.7874, val loss 2.8241
+ step 17900: train loss 2.9045, val loss 2.8924
+ step 18000: train loss 2.6872, val loss 2.9278
+ step 18100: train loss 2.9407, val loss 2.9969
+ step 18200: train loss 3.0288, val loss 2.9354
+ step 18300: train loss 2.8862, val loss 2.8489
+ step 18400: train loss 2.8283, val loss 2.8086
+ step 18500: train loss 2.8491, val loss 2.8545
+ step 18600: train loss 2.8140, val loss 2.9770
+ step 18700: train loss 2.9287, val loss 2.8787
+ step 18800: train loss 3.0498, val loss 2.7461
+ step 18900: train loss 2.9223, val loss 2.8665
+ step 19000: train loss 2.9418, val loss 2.9149
+ step 19100: train loss 2.6789, val loss 2.9049
+ step 19200: train loss 2.8974, val loss 2.8892
+ step 19300: train loss 2.8448, val loss 2.9557
+ step 19400: train loss 2.8466, val loss 2.9635
+ step 19500: train loss 2.8872, val loss 2.8272
+ step 19600: train loss 2.7967, val loss 3.0509
+ step 19700: train loss 2.8516, val loss 2.7520
+ step 19800: train loss 3.0064, val loss 2.8897
+ step 19900: train loss 2.8801, val loss 2.9297
+ step 20000: train loss 2.8270, val loss 2.9379
+ step 20100: train loss 2.8988, val loss 2.8314
+ step 20200: train loss 2.6983, val loss 2.9195
+ step 20300: train loss 2.8345, val loss 2.8455
+ step 20400: train loss 2.7777, val loss 2.9164
+ step 20500: train loss 2.9010, val loss 2.8442
+ step 20600: train loss 2.8983, val loss 2.8687
+ step 20700: train loss 2.7852, val loss 2.8359
+ step 20800: train loss 2.6776, val loss 2.8802
+ step 20900: train loss 2.7957, val loss 2.9362
+ step 21000: train loss 2.8322, val loss 2.8738
+ step 21100: train loss 2.8448, val loss 2.8849
+ step 21200: train loss 2.9563, val loss 3.0302
+ step 21300: train loss 2.9416, val loss 2.7907
+ step 21400: train loss 2.7988, val loss 2.8956
+ step 21500: train loss 2.8556, val loss 2.8462
+ step 21600: train loss 2.8326, val loss 2.8084
+ step 21700: train loss 2.8916, val loss 2.9479
+ step 21800: train loss 2.6759, val loss 2.8316
+ step 21900: train loss 2.7605, val loss 2.8726
+ step 22000: train loss 2.8973, val loss 2.7646
+ step 22100: train loss 2.7950, val loss 2.8894
+ step 22200: train loss 2.8879, val loss 2.8456
+ step 22300: train loss 2.8610, val loss 2.7752
+ step 22400: train loss 2.8503, val loss 2.7268
+ step 22500: train loss 2.7624, val loss 2.8039
+ step 22600: train loss 2.7896, val loss 2.9268
+ step 22700: train loss 2.9371, val loss 2.8718
+ step 22800: train loss 2.9747, val loss 2.7481
+ step 22900: train loss 2.8736, val loss 2.8353
+ step 23000: train loss 2.8346, val loss 2.7387
+ step 23100: train loss 2.8266, val loss 2.9682
+ step 23200: train loss 2.8811, val loss 2.8276
+ step 23300: train loss 2.8492, val loss 2.7715
+ step 23400: train loss 2.9512, val loss 2.8733
+ step 23500: train loss 2.8948, val loss 2.8610
+ step 23600: train loss 2.9883, val loss 2.8248
+ step 23700: train loss 2.7142, val loss 2.9138
+ step 23800: train loss 2.7128, val loss 2.8417
+ step 23900: train loss 3.0065, val loss 2.8004
+ step 24000: train loss 2.8458, val loss 2.7381
+ step 24100: train loss 2.7890, val loss 2.8468
+ step 24200: train loss 2.9545, val loss 2.7933
+ step 24300: train loss 2.8738, val loss 2.9072
+ step 24400: train loss 2.8440, val loss 2.7552
+ step 24500: train loss 2.8107, val loss 2.7479
+ step 24600: train loss 2.8175, val loss 2.8063
+ step 24700: train loss 2.9319, val loss 2.8145
+ step 24800: train loss 2.8535, val loss 2.8273
+ step 24900: train loss 2.7535, val loss 2.9339
+ step 25000: train loss 2.7998, val loss 2.8346
+ step 25100: train loss 2.8028, val loss 2.7334
+ step 25200: train loss 3.0190, val loss 2.7507
+ step 25300: train loss 2.9597, val loss 2.7477
+ step 25400: train loss 3.0206, val loss 2.8678
+ step 25500: train loss 2.8184, val loss 2.8603
+ step 25600: train loss 2.8984, val loss 2.7563
+ step 25700: train loss 2.7563, val loss 2.8466
+ step 25800: train loss 2.8035, val loss 2.8461
+ step 25900: train loss 2.8879, val loss 3.0032
+ step 26000: train loss 2.8628, val loss 2.8316
+ step 26100: train loss 2.8199, val loss 2.8175
+ step 26200: train loss 2.8381, val loss 2.7543
+ step 26300: train loss 2.7932, val loss 2.7437
+ step 26400: train loss 2.7451, val loss 2.8037
+ step 26500: train loss 2.8398, val loss 2.7688
+ step 26600: train loss 2.8197, val loss 2.6988
+ step 26700: train loss 2.8181, val loss 2.8315
+ step 26800: train loss 2.7584, val loss 2.6994
+ step 26900: train loss 2.7917, val loss 2.7537
+ step 27000: train loss 2.6462, val loss 2.7579
+ step 27100: train loss 2.8499, val loss 2.7959
+ step 27200: train loss 2.8724, val loss 2.8232
+ step 27300: train loss 2.7593, val loss 2.8665
+ step 27400: train loss 2.8588, val loss 2.9407
+ step 27500: train loss 2.7949, val loss 2.6853
+ step 27600: train loss 2.7752, val loss 2.8110
+ step 27700: train loss 2.9131, val loss 2.9227
+ step 27800: train loss 2.7813, val loss 2.7983
+ step 27900: train loss 2.7238, val loss 2.9116
+ step 28000: train loss 2.6029, val loss 2.6874
+ step 28100: train loss 2.7992, val loss 2.8840
+ step 28200: train loss 2.8726, val loss 2.7155
+ step 28300: train loss 2.8896, val loss 2.7741
+ step 28400: train loss 2.8420, val loss 2.7712
+ step 28500: train loss 2.7476, val loss 2.8297
+ step 28600: train loss 2.8152, val loss 2.8123
+ step 28700: train loss 2.8929, val loss 2.8723
+ step 28800: train loss 2.8116, val loss 2.8850
+ step 28900: train loss 2.8026, val loss 2.8580
+ step 29000: train loss 2.6830, val loss 2.7671
+ step 29100: train loss 2.7769, val loss 2.8252
+ step 29200: train loss 2.8928, val loss 2.7823
+ step 29300: train loss 2.7859, val loss 2.8006
+ step 29400: train loss 2.8484, val loss 2.8032
+ step 29500: train loss 2.8194, val loss 2.7389
+ step 29600: train loss 2.8775, val loss 2.8360
+ step 29700: train loss 2.7912, val loss 2.7585
+ step 29800: train loss 2.8499, val loss 2.8210
+ step 29900: train loss 2.9061, val loss 2.6846
+ step 30000: train loss 2.7540, val loss 2.8391
+ step 30100: train loss 2.8292, val loss 2.8358
+ step 30200: train loss 2.5902, val loss 2.8730
+ step 30300: train loss 2.8947, val loss 2.8475
+ step 30400: train loss 2.8898, val loss 2.7538
+ step 30500: train loss 2.8530, val loss 2.8979
+ step 30600: train loss 2.8079, val loss 2.8202
+ step 30700: train loss 2.6925, val loss 2.7329
+ step 30800: train loss 2.7408, val loss 2.7117
+ step 30900: train loss 2.7052, val loss 2.8759
+ step 31000: train loss 2.7108, val loss 2.6607
+ step 31100: train loss 2.8145, val loss 2.7848
+ step 31200: train loss 2.8752, val loss 2.8979
+ step 31300: train loss 2.6798, val loss 2.8022
+ step 31400: train loss 2.9750, val loss 2.6888
+ step 31500: train loss 2.6494, val loss 2.8619
+ step 31600: train loss 2.8156, val loss 2.8232
+ step 31700: train loss 2.7252, val loss 2.7410
+ step 31800: train loss 2.6924, val loss 2.7541
+ step 31900: train loss 2.8176, val loss 2.9296
+ step 32000: train loss 2.8469, val loss 2.8549
+ step 32100: train loss 2.8750, val loss 2.9075
+ step 32200: train loss 2.8387, val loss 2.7277
+ step 32300: train loss 2.7656, val loss 2.7939
+ step 32400: train loss 2.6632, val loss 2.7976
+ step 32500: train loss 2.7674, val loss 2.7517
+ step 32600: train loss 2.8411, val loss 2.7297
+ step 32700: train loss 2.8641, val loss 2.7247
+ step 32800: train loss 2.6665, val loss 2.7943
+ step 32900: train loss 2.8883, val loss 2.7321
+ step 33000: train loss 2.8978, val loss 2.7700
+ step 33100: train loss 2.7607, val loss 2.6791
+ step 33200: train loss 2.7516, val loss 2.8169
+ step 33300: train loss 2.8498, val loss 2.6707
+ step 33400: train loss 2.8504, val loss 2.9119
+ step 33500: train loss 2.7596, val loss 2.9151
+ step 33600: train loss 2.9359, val loss 2.9191
+ step 33700: train loss 2.7263, val loss 2.8193
+ step 33800: train loss 2.8230, val loss 2.8280
+ step 33900: train loss 2.8378, val loss 2.7144
+ step 34000: train loss 2.7823, val loss 2.8035
+ step 34100: train loss 2.7779, val loss 2.8396
+ step 34200: train loss 2.8372, val loss 2.8954
+ step 34300: train loss 2.8226, val loss 2.6627
+ step 34400: train loss 2.8642, val loss 2.8739
+ step 34500: train loss 2.7282, val loss 2.6650
+ step 34600: train loss 2.7650, val loss 2.7226
+ step 34700: train loss 2.7236, val loss 2.6892
+ step 34800: train loss 2.7721, val loss 2.9387
+ step 34900: train loss 2.7465, val loss 2.7535
+ step 35000: train loss 2.7129, val loss 2.7230
+ step 35100: train loss 2.7448, val loss 2.7261
+ step 35200: train loss 2.9534, val loss 2.7127
+ step 35300: train loss 2.6951, val loss 2.8034
+ step 35400: train loss 2.8718, val loss 2.7998
+ step 35500: train loss 2.7152, val loss 2.7406
+ step 35600: train loss 2.8066, val loss 2.7981
+ step 35700: train loss 2.8076, val loss 2.7320
+ step 35800: train loss 2.9054, val loss 2.7541
+ step 35900: train loss 2.8348, val loss 2.6628
+ step 36000: train loss 2.7294, val loss 2.7758
+ step 36100: train loss 2.8457, val loss 2.8148
+ step 36200: train loss 2.8626, val loss 2.8337
+ step 36300: train loss 2.7538, val loss 2.8294
+ step 36400: train loss 2.5631, val loss 2.7590
+ step 36500: train loss 2.8542, val loss 2.7585
+ step 36600: train loss 2.7567, val loss 2.8492
+ step 36700: train loss 2.8481, val loss 2.7103
+ step 36800: train loss 2.8135, val loss 2.7256
+ step 36900: train loss 2.6976, val loss 2.6366
+ step 37000: train loss 2.8643, val loss 2.7390
+ step 37100: train loss 2.7979, val loss 2.6219
+ step 37200: train loss 2.7855, val loss 2.8387
+ step 37300: train loss 2.8332, val loss 2.8489
+ step 37400: train loss 2.6962, val loss 2.9051
+ step 37500: train loss 2.7735, val loss 2.8329
+ step 37600: train loss 2.8305, val loss 2.7830
+ step 37700: train loss 2.7930, val loss 2.7070
+ step 37800: train loss 2.7834, val loss 2.7718
+ step 37900: train loss 2.9645, val loss 2.7499
+ step 38000: train loss 2.6900, val loss 2.8002
+ step 38100: train loss 2.7324, val loss 2.8638
+ step 38200: train loss 2.6724, val loss 2.7601
+ step 38300: train loss 2.8456, val loss 2.7571
+ step 38400: train loss 2.7720, val loss 2.8515
+ step 38500: train loss 2.7960, val loss 2.8611
+ step 38600: train loss 2.7673, val loss 2.8128
+ step 38700: train loss 2.8076, val loss 2.8023
+ step 38800: train loss 2.8252, val loss 2.7761
+ step 38900: train loss 2.6206, val loss 2.8931
+ step 39000: train loss 2.7810, val loss 2.6949
+ step 39100: train loss 2.8880, val loss 2.6300
+ step 39200: train loss 2.7765, val loss 2.8009
+ step 39300: train loss 2.8100, val loss 2.9730
+ step 39400: train loss 2.6373, val loss 2.7640
+ step 39500: train loss 2.7533, val loss 2.7617
+ step 39600: train loss 2.8452, val loss 2.8122
+ step 39700: train loss 2.7849, val loss 2.8067
+ step 39800: train loss 2.7890, val loss 2.7672
+ step 39900: train loss 2.7164, val loss 2.6389
+ step 40000: train loss 2.8189, val loss 2.7924
+ step 40100: train loss 2.9345, val loss 2.9801
+ step 40200: train loss 2.9074, val loss 2.7438
+ step 40300: train loss 2.8472, val loss 2.7186
+ step 40400: train loss 2.5992, val loss 2.7979
+ step 40500: train loss 2.8513, val loss 2.7371
+ step 40600: train loss 2.6937, val loss 2.7330
+ step 40700: train loss 2.7758, val loss 2.7263
+ step 40800: train loss 2.7242, val loss 2.8467
+ step 40900: train loss 2.7578, val loss 2.9498
+ step 41000: train loss 2.7946, val loss 2.7555
+ step 41100: train loss 2.8186, val loss 2.7127
+ step 41200: train loss 2.7768, val loss 2.7014
+ step 41300: train loss 2.8141, val loss 2.7691
+ step 41400: train loss 2.7520, val loss 2.6608
+ step 41500: train loss 2.7952, val loss 2.8809
+ step 41600: train loss 2.7405, val loss 2.8320
+ step 41700: train loss 2.7319, val loss 2.6906
+ step 41800: train loss 2.7042, val loss 2.8355
+ step 41900: train loss 2.6836, val loss 2.7683
+ step 42000: train loss 2.8002, val loss 2.7833
+ step 42100: train loss 2.9250, val loss 2.7595
+ step 42200: train loss 2.6998, val loss 2.8130
+ step 42300: train loss 2.6696, val loss 2.7072
+ step 42400: train loss 2.6971, val loss 2.7896
+ step 42500: train loss 2.7793, val loss 2.8207
+ step 42600: train loss 2.7416, val loss 2.6938
+ step 42700: train loss 2.5605, val loss 2.8192
+ step 42800: train loss 2.8029, val loss 2.6802
+ step 42900: train loss 2.8314, val loss 2.7868
+ step 43000: train loss 2.7065, val loss 2.5963
+ step 43100: train loss 2.8072, val loss 2.7424
+ step 43200: train loss 2.6797, val loss 2.7166
+ step 43300: train loss 2.6579, val loss 2.7534
+ step 43400: train loss 2.8590, val loss 2.8177
+ step 43500: train loss 2.7240, val loss 2.8758
+ step 43600: train loss 2.8024, val loss 2.7224
+ step 43700: train loss 2.8347, val loss 2.7132
+ step 43800: train loss 2.8055, val loss 2.6904
+ step 43900: train loss 2.7516, val loss 2.7553
+ step 44000: train loss 2.7896, val loss 2.7832
+ step 44100: train loss 2.8472, val loss 2.7570
+ step 44200: train loss 2.6282, val loss 2.6458
+ step 44300: train loss 2.7891, val loss 2.6897
+ step 44400: train loss 2.8262, val loss 2.7445
+ step 44500: train loss 2.7764, val loss 2.7653
+ step 44600: train loss 2.8129, val loss 2.7805
+ step 44700: train loss 2.8649, val loss 2.8448
+ step 44800: train loss 2.6760, val loss 2.7656
+ step 44900: train loss 2.7011, val loss 2.7474
+ step 45000: train loss 2.7879, val loss 2.6947
+ step 45100: train loss 2.9080, val loss 2.7905
+ step 45200: train loss 2.7495, val loss 2.7055
+ step 45300: train loss 2.6580, val loss 2.8663
+ step 45400: train loss 2.8094, val loss 2.8226
+ step 45500: train loss 2.7298, val loss 2.8190
+ step 45600: train loss 2.7434, val loss 2.6559
+ step 45700: train loss 2.8474, val loss 2.7221
+ step 45800: train loss 2.8787, val loss 2.8628
+ step 45900: train loss 2.7202, val loss 2.6398
+ step 46000: train loss 2.8298, val loss 2.8447
+ step 46100: train loss 2.6955, val loss 2.8386
+ step 46200: train loss 2.7849, val loss 2.6825
+ step 46300: train loss 2.8191, val loss 2.7793
+ step 46400: train loss 2.7815, val loss 2.7403
+ step 46500: train loss 2.8007, val loss 2.7719
+ step 46600: train loss 2.6661, val loss 2.8360
+ step 46700: train loss 2.8279, val loss 2.7529
+ step 46800: train loss 2.8326, val loss 2.7180
+ step 46900: train loss 2.7323, val loss 2.8723
+ step 47000: train loss 2.7846, val loss 2.7797
+ step 47100: train loss 2.7533, val loss 2.7694
+ step 47200: train loss 2.8556, val loss 2.7418
+ step 47300: train loss 2.7036, val loss 2.7377
+ step 47400: train loss 2.7860, val loss 2.8879
+ step 47500: train loss 2.7223, val loss 2.8096
+ step 47600: train loss 2.7241, val loss 2.9097
+ step 47700: train loss 2.6891, val loss 2.8653
+ step 47800: train loss 2.7393, val loss 2.7434
+ step 47900: train loss 2.8288, val loss 2.6818
+ step 48000: train loss 2.7092, val loss 2.7769
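
Every evaluation line in train_val.txt follows the same `step N: train loss X, val loss Y` pattern, so the curve behind train_val.png can be recovered from the text alone. The sketch below is one hedged way to parse it. The regular expression and the helper names `parse_log` and `best_val` are hypothetical, and the sample is a handful of lines copied from the log, not the whole file.

```python
import re

# Matches lines such as "step 100: train loss 8.3994, val loss 8.2066".
LINE_RE = re.compile(r"step (\d+): train loss ([\d.]+), val loss ([\d.]+)")

def parse_log(text):
    """Return a list of (step, train_loss, val_loss) tuples; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            rows.append((int(m.group(1)), float(m.group(2)), float(m.group(3))))
    return rows

def best_val(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda r: r[2])

# A few lines copied from the log; in practice, read the full train_val.txt.
sample = """Overriding: eval_interval = 100
step 0: train loss 11.0252, val loss 11.0342
step 31000: train loss 2.7108, val loss 2.6607
step 43000: train loss 2.7065, val loss 2.5963
step 48000: train loss 2.7092, val loss 2.7769
"""

rows = parse_log(sample)
print(best_val(rows))  # lowest val loss within this sample only
```

The step 31000 line reports a val loss of 2.6607, the same step and value that appear in the name of the checkpoint file this commit deletes, `ckpt.pt.31000.val_2.6607`.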