Venkatesh Srinivas committed
Commit 123e9fd
1 parent: c8b51fb
Add details for 12.5B checkpoint
- README.md +2 -0
- ckpt.pt.31000.val_2.6607 +0 -3
- train_val.png +0 -0
- train_val.txt +483 -0
README.md
CHANGED
@@ -38,6 +38,8 @@ in 'float16' rather than 'bfloat16'. Learning rate ramped up 6e-5 to 4e-4
 over the first 3000 iterations (786M tokens) and stayed there for the
 rest of the training process.
 
+![train_val](train_val.png)
+
 ---
 
 Evaluations
ckpt.pt.31000.val_2.6607
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:b9f96f574207d59feca0595b7694d2d2d5ba61a6306652a6d7bb0f0d5c875a70
-size 4595163769
train_val.png
ADDED
train_val.txt
ADDED
@@ -0,0 +1,483 @@
+Overriding: eval_iters = 50Overriding: eval_iters = 50
+Overriding: eval_interval = 100Overriding: eval_interval = 100
+step 0: train loss 11.0252, val loss 11.0342
+step 100: train loss 8.3994, val loss 8.2066
+step 200: train loss 7.3136, val loss 7.1235
+step 300: train loss 6.5888, val loss 6.7433
+step 400: train loss 6.5067, val loss 6.4013
+step 500: train loss 6.1970, val loss 6.1153
+step 600: train loss 5.9715, val loss 6.0343
+step 700: train loss 5.7357, val loss 5.7946
+step 800: train loss 5.6244, val loss 5.7100
+step 900: train loss 5.4724, val loss 5.5178
+step 1000: train loss 5.4297, val loss 5.3089
+step 1100: train loss 5.0414, val loss 5.2748
+step 1200: train loss 4.9450, val loss 4.9600
+step 1300: train loss 4.6848, val loss 4.8181
+step 1400: train loss 4.5482, val loss 4.4525
+step 1500: train loss 4.4756, val loss 4.3209
+step 1600: train loss 4.2531, val loss 4.2776
+step 1700: train loss 4.2488, val loss 4.2306
+step 1800: train loss 4.0376, val loss 4.1076
+step 1900: train loss 4.0463, val loss 4.0019
+step 2000: train loss 3.9624, val loss 3.8664
+step 2100: train loss 3.9590, val loss 3.7839
+step 2200: train loss 3.9238, val loss 3.8385
+step 2300: train loss 3.6838, val loss 3.7538
+step 2400: train loss 3.7332, val loss 3.6593
+step 2500: train loss 3.7454, val loss 3.5440
+step 2600: train loss 3.5528, val loss 3.6207
+step 2700: train loss 3.5916, val loss 3.6545
+step 2800: train loss 3.7254, val loss 3.6136
+step 2900: train loss 3.5898, val loss 3.3846
+step 3000: train loss 3.5164, val loss 3.4608
+step 3100: train loss 3.6373, val loss 3.5505
+step 3200: train loss 3.5100, val loss 3.6281
+step 3300: train loss 3.5623, val loss 3.5894
+step 3400: train loss 3.4841, val loss 3.4290
+step 3500: train loss 3.5908, val loss 3.4267
+step 3600: train loss 3.4661, val loss 3.5482
+step 3700: train loss 3.4633, val loss 3.4274
+step 3800: train loss 3.4503, val loss 3.5384
+step 3900: train loss 3.3948, val loss 3.3274
+step 4000: train loss 3.4388, val loss 3.3746
+step 4100: train loss 3.3921, val loss 3.2486
+step 4200: train loss 3.4422, val loss 3.3624
+step 4300: train loss 3.3533, val loss 3.2563
+step 4400: train loss 3.3215, val loss 3.3935
+step 4500: train loss 3.4373, val loss 3.2724
+step 4600: train loss 3.2562, val loss 3.2819
+step 4700: train loss 3.3209, val loss 3.2646
+step 4800: train loss 3.1498, val loss 3.3252
+step 4900: train loss 3.3318, val loss 3.3322
+step 5000: train loss 3.1285, val loss 3.2495
+step 5100: train loss 3.3448, val loss 3.1907
+step 5200: train loss 3.3123, val loss 3.1915
+step 5300: train loss 3.2482, val loss 3.3080
+step 5400: train loss 3.0714, val loss 3.1940
+step 5500: train loss 3.1294, val loss 3.2508
+step 5600: train loss 3.2360, val loss 3.0566
+step 5700: train loss 3.2703, val loss 3.1624
+step 5800: train loss 3.3135, val loss 3.1183
+step 5900: train loss 3.2142, val loss 3.1934
+step 6000: train loss 3.2289, val loss 3.1825
+step 6100: train loss 3.0920, val loss 3.1858
+step 6200: train loss 3.2835, val loss 3.1578
+step 6300: train loss 3.1277, val loss 3.1348
+step 6400: train loss 3.0799, val loss 3.2929
+step 6500: train loss 3.0791, val loss 3.2397
+step 6600: train loss 3.2201, val loss 3.2587
+step 6700: train loss 3.0092, val loss 3.2005
+step 6800: train loss 3.0824, val loss 3.0970
+step 6900: train loss 3.2339, val loss 3.1762
+step 7000: train loss 3.1754, val loss 3.1966
+step 7100: train loss 3.1720, val loss 3.1533
+step 7200: train loss 3.1673, val loss 3.1003
+step 7300: train loss 3.1047, val loss 3.1397
+step 7400: train loss 3.1211, val loss 3.1447
+step 7500: train loss 3.1564, val loss 3.0936
+step 7600: train loss 3.0931, val loss 3.1315
+step 7700: train loss 2.9800, val loss 3.2394
+step 7800: train loss 3.1775, val loss 3.2620
+step 7900: train loss 3.0847, val loss 3.0954
+step 8000: train loss 3.0581, val loss 3.0713
+step 8100: train loss 3.1880, val loss 3.0542
+step 8200: train loss 3.1568, val loss 3.0514
+step 8300: train loss 3.0128, val loss 3.1295
+step 8400: train loss 3.2077, val loss 3.0505
+step 8500: train loss 3.0058, val loss 3.1052
+step 8600: train loss 3.0915, val loss 2.8884
+step 8700: train loss 3.1190, val loss 3.0491
+step 8800: train loss 2.9319, val loss 2.9831
+step 8900: train loss 2.9605, val loss 3.0030
+step 9000: train loss 3.0953, val loss 2.9161
+step 9100: train loss 3.1344, val loss 3.0248
+step 9200: train loss 2.9525, val loss 3.0419
+step 9300: train loss 2.9842, val loss 2.9508
+step 9400: train loss 3.1642, val loss 3.0025
+step 9500: train loss 2.9276, val loss 3.0674
+step 9600: train loss 3.0968, val loss 3.0211
+step 9700: train loss 3.1166, val loss 3.0580
+step 9800: train loss 2.9912, val loss 2.9596
+step 9900: train loss 2.9809, val loss 2.9561
+step 10000: train loss 3.0402, val loss 2.9424
+step 10100: train loss 2.9393, val loss 2.9690
+step 10200: train loss 3.0273, val loss 3.0578
+step 10300: train loss 2.9466, val loss 3.1119
+step 10400: train loss 2.9821, val loss 2.9871
+step 10500: train loss 3.0022, val loss 3.0068
+step 10600: train loss 2.9527, val loss 3.0174
+step 10700: train loss 3.0224, val loss 3.0772
+step 10800: train loss 2.9642, val loss 3.0270
+step 10900: train loss 2.9446, val loss 2.9751
+step 11000: train loss 2.9466, val loss 2.9945
+step 11100: train loss 2.9304, val loss 2.9444
+step 11200: train loss 2.9619, val loss 3.0315
+step 11300: train loss 2.9358, val loss 2.9847
+step 11400: train loss 3.0165, val loss 2.7416
+step 11500: train loss 2.8405, val loss 3.0835
+step 11600: train loss 3.0746, val loss 3.0534
+step 11700: train loss 2.9898, val loss 2.9221
+step 11800: train loss 2.8608, val loss 3.0250
+step 11900: train loss 2.9855, val loss 2.9443
+step 12000: train loss 2.9834, val loss 2.9962
+step 12100: train loss 2.8355, val loss 3.0118
+step 12200: train loss 2.9886, val loss 2.9714
+step 12300: train loss 2.9457, val loss 2.9599
+step 12400: train loss 2.8276, val loss 3.0673
+step 12500: train loss 2.9246, val loss 2.9800
+step 12600: train loss 3.0029, val loss 2.8929
+step 12700: train loss 2.9373, val loss 2.9386
+step 12800: train loss 2.9504, val loss 3.0079
+step 12900: train loss 2.9921, val loss 2.9243
+step 13000: train loss 2.9724, val loss 3.0502
+step 13100: train loss 2.9558, val loss 2.8818
+step 13200: train loss 2.9938, val loss 2.9503
+step 13300: train loss 2.9165, val loss 3.0683
+step 13400: train loss 2.9777, val loss 2.9374
+step 13500: train loss 3.0141, val loss 2.9254
+step 13600: train loss 2.8655, val loss 2.9531
+step 13700: train loss 2.8848, val loss 3.0087
+step 13800: train loss 2.9226, val loss 2.8738
+step 13900: train loss 2.8910, val loss 2.9250
+step 14000: train loss 2.9752, val loss 2.9531
+step 14100: train loss 2.9497, val loss 2.9894
+step 14200: train loss 2.9779, val loss 2.9911
+step 14300: train loss 2.9423, val loss 2.7897
+step 14400: train loss 3.0338, val loss 3.0204
+step 14500: train loss 2.8680, val loss 2.8760
+step 14600: train loss 3.0093, val loss 2.8034
+step 14700: train loss 2.9222, val loss 2.8466
+step 14800: train loss 2.7877, val loss 2.8845
+step 14900: train loss 2.9715, val loss 3.0005
+step 15000: train loss 3.0018, val loss 3.0310
+step 15100: train loss 2.9654, val loss 2.9176
+step 15200: train loss 2.9580, val loss 2.9125
+step 15300: train loss 3.0046, val loss 2.8712
+step 15400: train loss 3.0046, val loss 2.9361
+step 15500: train loss 2.7949, val loss 2.8170
+step 15600: train loss 2.9127, val loss 2.9011
+step 15700: train loss 2.9440, val loss 2.9167
+step 15800: train loss 2.8596, val loss 2.8605
+step 15900: train loss 2.8704, val loss 2.8725
+step 16000: train loss 2.8634, val loss 3.0975
+step 16100: train loss 2.9963, val loss 2.7633
+step 16200: train loss 2.9618, val loss 2.9352
+step 16300: train loss 2.7306, val loss 2.9384
+step 16400: train loss 2.9731, val loss 2.9716
+step 16500: train loss 2.8599, val loss 3.0492
+step 16600: train loss 2.8712, val loss 2.9475
+step 16700: train loss 2.9567, val loss 2.8846
+step 16800: train loss 2.8565, val loss 3.0182
+step 16900: train loss 2.8318, val loss 3.0222
+step 17000: train loss 3.0119, val loss 2.8964
+step 17100: train loss 2.8578, val loss 2.7679
+step 17200: train loss 2.8943, val loss 2.9294
+step 17300: train loss 2.8835, val loss 2.8658
+step 17400: train loss 2.9415, val loss 2.9057
+step 17500: train loss 2.8730, val loss 2.7631
+step 17600: train loss 2.7918, val loss 2.7859
+step 17700: train loss 2.9455, val loss 2.9624
+step 17800: train loss 2.7874, val loss 2.8241
+step 17900: train loss 2.9045, val loss 2.8924
+step 18000: train loss 2.6872, val loss 2.9278
+step 18100: train loss 2.9407, val loss 2.9969
+step 18200: train loss 3.0288, val loss 2.9354
+step 18300: train loss 2.8862, val loss 2.8489
+step 18400: train loss 2.8283, val loss 2.8086
+step 18500: train loss 2.8491, val loss 2.8545
+step 18600: train loss 2.8140, val loss 2.9770
+step 18700: train loss 2.9287, val loss 2.8787
+step 18800: train loss 3.0498, val loss 2.7461
+step 18900: train loss 2.9223, val loss 2.8665
+step 19000: train loss 2.9418, val loss 2.9149
+step 19100: train loss 2.6789, val loss 2.9049
+step 19200: train loss 2.8974, val loss 2.8892
+step 19300: train loss 2.8448, val loss 2.9557
+step 19400: train loss 2.8466, val loss 2.9635
+step 19500: train loss 2.8872, val loss 2.8272
+step 19600: train loss 2.7967, val loss 3.0509
+step 19700: train loss 2.8516, val loss 2.7520
+step 19800: train loss 3.0064, val loss 2.8897
+step 19900: train loss 2.8801, val loss 2.9297
+step 20000: train loss 2.8270, val loss 2.9379
+step 20100: train loss 2.8988, val loss 2.8314
+step 20200: train loss 2.6983, val loss 2.9195
+step 20300: train loss 2.8345, val loss 2.8455
+step 20400: train loss 2.7777, val loss 2.9164
+step 20500: train loss 2.9010, val loss 2.8442
+step 20600: train loss 2.8983, val loss 2.8687
+step 20700: train loss 2.7852, val loss 2.8359
+step 20800: train loss 2.6776, val loss 2.8802
+step 20900: train loss 2.7957, val loss 2.9362
+step 21000: train loss 2.8322, val loss 2.8738
+step 21100: train loss 2.8448, val loss 2.8849
+step 21200: train loss 2.9563, val loss 3.0302
+step 21300: train loss 2.9416, val loss 2.7907
+step 21400: train loss 2.7988, val loss 2.8956
+step 21500: train loss 2.8556, val loss 2.8462
+step 21600: train loss 2.8326, val loss 2.8084
+step 21700: train loss 2.8916, val loss 2.9479
+step 21800: train loss 2.6759, val loss 2.8316
+step 21900: train loss 2.7605, val loss 2.8726
+step 22000: train loss 2.8973, val loss 2.7646
+step 22100: train loss 2.7950, val loss 2.8894
+step 22200: train loss 2.8879, val loss 2.8456
+step 22300: train loss 2.8610, val loss 2.7752
+step 22400: train loss 2.8503, val loss 2.7268
+step 22500: train loss 2.7624, val loss 2.8039
+step 22600: train loss 2.7896, val loss 2.9268
+step 22700: train loss 2.9371, val loss 2.8718
+step 22800: train loss 2.9747, val loss 2.7481
+step 22900: train loss 2.8736, val loss 2.8353
+step 23000: train loss 2.8346, val loss 2.7387
+step 23100: train loss 2.8266, val loss 2.9682
+step 23200: train loss 2.8811, val loss 2.8276
+step 23300: train loss 2.8492, val loss 2.7715
+step 23400: train loss 2.9512, val loss 2.8733
+step 23500: train loss 2.8948, val loss 2.8610
+step 23600: train loss 2.9883, val loss 2.8248
+step 23700: train loss 2.7142, val loss 2.9138
+step 23800: train loss 2.7128, val loss 2.8417
+step 23900: train loss 3.0065, val loss 2.8004
+step 24000: train loss 2.8458, val loss 2.7381
+step 24100: train loss 2.7890, val loss 2.8468
+step 24200: train loss 2.9545, val loss 2.7933
+step 24300: train loss 2.8738, val loss 2.9072
+step 24400: train loss 2.8440, val loss 2.7552
+step 24500: train loss 2.8107, val loss 2.7479
+step 24600: train loss 2.8175, val loss 2.8063
+step 24700: train loss 2.9319, val loss 2.8145
+step 24800: train loss 2.8535, val loss 2.8273
+step 24900: train loss 2.7535, val loss 2.9339
+step 25000: train loss 2.7998, val loss 2.8346
+step 25100: train loss 2.8028, val loss 2.7334
+step 25200: train loss 3.0190, val loss 2.7507
+step 25300: train loss 2.9597, val loss 2.7477
+step 25400: train loss 3.0206, val loss 2.8678
+step 25500: train loss 2.8184, val loss 2.8603
+step 25600: train loss 2.8984, val loss 2.7563
+step 25700: train loss 2.7563, val loss 2.8466
+step 25800: train loss 2.8035, val loss 2.8461
+step 25900: train loss 2.8879, val loss 3.0032
+step 26000: train loss 2.8628, val loss 2.8316
+step 26100: train loss 2.8199, val loss 2.8175
+step 26200: train loss 2.8381, val loss 2.7543
+step 26300: train loss 2.7932, val loss 2.7437
+step 26400: train loss 2.7451, val loss 2.8037
+step 26500: train loss 2.8398, val loss 2.7688
+step 26600: train loss 2.8197, val loss 2.6988
+step 26700: train loss 2.8181, val loss 2.8315
+step 26800: train loss 2.7584, val loss 2.6994
+step 26900: train loss 2.7917, val loss 2.7537
+step 27000: train loss 2.6462, val loss 2.7579
+step 27100: train loss 2.8499, val loss 2.7959
+step 27200: train loss 2.8724, val loss 2.8232
+step 27300: train loss 2.7593, val loss 2.8665
+step 27400: train loss 2.8588, val loss 2.9407
+step 27500: train loss 2.7949, val loss 2.6853
+step 27600: train loss 2.7752, val loss 2.8110
+step 27700: train loss 2.9131, val loss 2.9227
+step 27800: train loss 2.7813, val loss 2.7983
+step 27900: train loss 2.7238, val loss 2.9116
+step 28000: train loss 2.6029, val loss 2.6874
+step 28100: train loss 2.7992, val loss 2.8840
+step 28200: train loss 2.8726, val loss 2.7155
+step 28300: train loss 2.8896, val loss 2.7741
+step 28400: train loss 2.8420, val loss 2.7712
+step 28500: train loss 2.7476, val loss 2.8297
+step 28600: train loss 2.8152, val loss 2.8123
+step 28700: train loss 2.8929, val loss 2.8723
+step 28800: train loss 2.8116, val loss 2.8850
+step 28900: train loss 2.8026, val loss 2.8580
+step 29000: train loss 2.6830, val loss 2.7671
+step 29100: train loss 2.7769, val loss 2.8252
+step 29200: train loss 2.8928, val loss 2.7823
+step 29300: train loss 2.7859, val loss 2.8006
+step 29400: train loss 2.8484, val loss 2.8032
+step 29500: train loss 2.8194, val loss 2.7389
+step 29600: train loss 2.8775, val loss 2.8360
+step 29700: train loss 2.7912, val loss 2.7585
+step 29800: train loss 2.8499, val loss 2.8210
+step 29900: train loss 2.9061, val loss 2.6846
+step 30000: train loss 2.7540, val loss 2.8391
+step 30100: train loss 2.8292, val loss 2.8358
+step 30200: train loss 2.5902, val loss 2.8730
+step 30300: train loss 2.8947, val loss 2.8475
+step 30400: train loss 2.8898, val loss 2.7538
+step 30500: train loss 2.8530, val loss 2.8979
+step 30600: train loss 2.8079, val loss 2.8202
+step 30700: train loss 2.6925, val loss 2.7329
+step 30800: train loss 2.7408, val loss 2.7117
+step 30900: train loss 2.7052, val loss 2.8759
+step 31000: train loss 2.7108, val loss 2.6607
+step 31100: train loss 2.8145, val loss 2.7848
+step 31200: train loss 2.8752, val loss 2.8979
+step 31300: train loss 2.6798, val loss 2.8022
+step 31400: train loss 2.9750, val loss 2.6888
+step 31500: train loss 2.6494, val loss 2.8619
+step 31600: train loss 2.8156, val loss 2.8232
+step 31700: train loss 2.7252, val loss 2.7410
+step 31800: train loss 2.6924, val loss 2.7541
+step 31900: train loss 2.8176, val loss 2.9296
+step 32000: train loss 2.8469, val loss 2.8549
+step 32100: train loss 2.8750, val loss 2.9075
+step 32200: train loss 2.8387, val loss 2.7277
+step 32300: train loss 2.7656, val loss 2.7939
+step 32400: train loss 2.6632, val loss 2.7976
+step 32500: train loss 2.7674, val loss 2.7517
+step 32600: train loss 2.8411, val loss 2.7297
+step 32700: train loss 2.8641, val loss 2.7247
+step 32800: train loss 2.6665, val loss 2.7943
+step 32900: train loss 2.8883, val loss 2.7321
+step 33000: train loss 2.8978, val loss 2.7700
+step 33100: train loss 2.7607, val loss 2.6791
+step 33200: train loss 2.7516, val loss 2.8169
+step 33300: train loss 2.8498, val loss 2.6707
+step 33400: train loss 2.8504, val loss 2.9119
+step 33500: train loss 2.7596, val loss 2.9151
+step 33600: train loss 2.9359, val loss 2.9191
+step 33700: train loss 2.7263, val loss 2.8193
+step 33800: train loss 2.8230, val loss 2.8280
+step 33900: train loss 2.8378, val loss 2.7144
+step 34000: train loss 2.7823, val loss 2.8035
+step 34100: train loss 2.7779, val loss 2.8396
+step 34200: train loss 2.8372, val loss 2.8954
+step 34300: train loss 2.8226, val loss 2.6627
+step 34400: train loss 2.8642, val loss 2.8739
+step 34500: train loss 2.7282, val loss 2.6650
+step 34600: train loss 2.7650, val loss 2.7226
+step 34700: train loss 2.7236, val loss 2.6892
+step 34800: train loss 2.7721, val loss 2.9387
+step 34900: train loss 2.7465, val loss 2.7535
+step 35000: train loss 2.7129, val loss 2.7230
+step 35100: train loss 2.7448, val loss 2.7261
+step 35200: train loss 2.9534, val loss 2.7127
+step 35300: train loss 2.6951, val loss 2.8034
+step 35400: train loss 2.8718, val loss 2.7998
+step 35500: train loss 2.7152, val loss 2.7406
+step 35600: train loss 2.8066, val loss 2.7981
+step 35700: train loss 2.8076, val loss 2.7320
+step 35800: train loss 2.9054, val loss 2.7541
+step 35900: train loss 2.8348, val loss 2.6628
+step 36000: train loss 2.7294, val loss 2.7758
+step 36100: train loss 2.8457, val loss 2.8148
+step 36200: train loss 2.8626, val loss 2.8337
+step 36300: train loss 2.7538, val loss 2.8294
+step 36400: train loss 2.5631, val loss 2.7590
+step 36500: train loss 2.8542, val loss 2.7585
+step 36600: train loss 2.7567, val loss 2.8492
+step 36700: train loss 2.8481, val loss 2.7103
+step 36800: train loss 2.8135, val loss 2.7256
+step 36900: train loss 2.6976, val loss 2.6366
+step 37000: train loss 2.8643, val loss 2.7390
+step 37100: train loss 2.7979, val loss 2.6219
+step 37200: train loss 2.7855, val loss 2.8387
+step 37300: train loss 2.8332, val loss 2.8489
+step 37400: train loss 2.6962, val loss 2.9051
+step 37500: train loss 2.7735, val loss 2.8329
+step 37600: train loss 2.8305, val loss 2.7830
+step 37700: train loss 2.7930, val loss 2.7070
+step 37800: train loss 2.7834, val loss 2.7718
+step 37900: train loss 2.9645, val loss 2.7499
+step 38000: train loss 2.6900, val loss 2.8002
+step 38100: train loss 2.7324, val loss 2.8638
+step 38200: train loss 2.6724, val loss 2.7601
+step 38300: train loss 2.8456, val loss 2.7571
+step 38400: train loss 2.7720, val loss 2.8515
+step 38500: train loss 2.7960, val loss 2.8611
+step 38600: train loss 2.7673, val loss 2.8128
+step 38700: train loss 2.8076, val loss 2.8023
+step 38800: train loss 2.8252, val loss 2.7761
+step 38900: train loss 2.6206, val loss 2.8931
+step 39000: train loss 2.7810, val loss 2.6949
+step 39100: train loss 2.8880, val loss 2.6300
+step 39200: train loss 2.7765, val loss 2.8009
+step 39300: train loss 2.8100, val loss 2.9730
+step 39400: train loss 2.6373, val loss 2.7640
+step 39500: train loss 2.7533, val loss 2.7617
+step 39600: train loss 2.8452, val loss 2.8122
+step 39700: train loss 2.7849, val loss 2.8067
+step 39800: train loss 2.7890, val loss 2.7672
+step 39900: train loss 2.7164, val loss 2.6389
+step 40000: train loss 2.8189, val loss 2.7924
+step 40100: train loss 2.9345, val loss 2.9801
+step 40200: train loss 2.9074, val loss 2.7438
+step 40300: train loss 2.8472, val loss 2.7186
+step 40400: train loss 2.5992, val loss 2.7979
+step 40500: train loss 2.8513, val loss 2.7371
+step 40600: train loss 2.6937, val loss 2.7330
+step 40700: train loss 2.7758, val loss 2.7263
+step 40800: train loss 2.7242, val loss 2.8467
+step 40900: train loss 2.7578, val loss 2.9498
+step 41000: train loss 2.7946, val loss 2.7555
+step 41100: train loss 2.8186, val loss 2.7127
+step 41200: train loss 2.7768, val loss 2.7014
+step 41300: train loss 2.8141, val loss 2.7691
+step 41400: train loss 2.7520, val loss 2.6608
+step 41500: train loss 2.7952, val loss 2.8809
+step 41600: train loss 2.7405, val loss 2.8320
+step 41700: train loss 2.7319, val loss 2.6906
+step 41800: train loss 2.7042, val loss 2.8355
+step 41900: train loss 2.6836, val loss 2.7683
+step 42000: train loss 2.8002, val loss 2.7833
+step 42100: train loss 2.9250, val loss 2.7595
+step 42200: train loss 2.6998, val loss 2.8130
+step 42300: train loss 2.6696, val loss 2.7072
+step 42400: train loss 2.6971, val loss 2.7896
+step 42500: train loss 2.7793, val loss 2.8207
+step 42600: train loss 2.7416, val loss 2.6938
+step 42700: train loss 2.5605, val loss 2.8192
+step 42800: train loss 2.8029, val loss 2.6802
+step 42900: train loss 2.8314, val loss 2.7868
+step 43000: train loss 2.7065, val loss 2.5963
+step 43100: train loss 2.8072, val loss 2.7424
+step 43200: train loss 2.6797, val loss 2.7166
+step 43300: train loss 2.6579, val loss 2.7534
+step 43400: train loss 2.8590, val loss 2.8177
+step 43500: train loss 2.7240, val loss 2.8758
+step 43600: train loss 2.8024, val loss 2.7224
+step 43700: train loss 2.8347, val loss 2.7132
+step 43800: train loss 2.8055, val loss 2.6904
+step 43900: train loss 2.7516, val loss 2.7553
+step 44000: train loss 2.7896, val loss 2.7832
+step 44100: train loss 2.8472, val loss 2.7570
+step 44200: train loss 2.6282, val loss 2.6458
+step 44300: train loss 2.7891, val loss 2.6897
+step 44400: train loss 2.8262, val loss 2.7445
+step 44500: train loss 2.7764, val loss 2.7653
+step 44600: train loss 2.8129, val loss 2.7805
+step 44700: train loss 2.8649, val loss 2.8448
+step 44800: train loss 2.6760, val loss 2.7656
+step 44900: train loss 2.7011, val loss 2.7474
+step 45000: train loss 2.7879, val loss 2.6947
+step 45100: train loss 2.9080, val loss 2.7905
+step 45200: train loss 2.7495, val loss 2.7055
+step 45300: train loss 2.6580, val loss 2.8663
+step 45400: train loss 2.8094, val loss 2.8226
+step 45500: train loss 2.7298, val loss 2.8190
+step 45600: train loss 2.7434, val loss 2.6559
+step 45700: train loss 2.8474, val loss 2.7221
+step 45800: train loss 2.8787, val loss 2.8628
+step 45900: train loss 2.7202, val loss 2.6398
+step 46000: train loss 2.8298, val loss 2.8447
+step 46100: train loss 2.6955, val loss 2.8386
+step 46200: train loss 2.7849, val loss 2.6825
+step 46300: train loss 2.8191, val loss 2.7793
+step 46400: train loss 2.7815, val loss 2.7403
+step 46500: train loss 2.8007, val loss 2.7719
+step 46600: train loss 2.6661, val loss 2.8360
+step 46700: train loss 2.8279, val loss 2.7529
+step 46800: train loss 2.8326, val loss 2.7180
+step 46900: train loss 2.7323, val loss 2.8723
+step 47000: train loss 2.7846, val loss 2.7797
+step 47100: train loss 2.7533, val loss 2.7694
+step 47200: train loss 2.8556, val loss 2.7418
+step 47300: train loss 2.7036, val loss 2.7377
+step 47400: train loss 2.7860, val loss 2.8879
+step 47500: train loss 2.7223, val loss 2.8096
+step 47600: train loss 2.7241, val loss 2.9097
+step 47700: train loss 2.6891, val loss 2.8653
+step 47800: train loss 2.7393, val loss 2.7434
+step 47900: train loss 2.8288, val loss 2.6818
+step 48000: train loss 2.7092, val loss 2.7769
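
The added train_val.txt uses one `step N: train loss X, val loss Y` line per evaluation. A minimal sketch for reading that format follows; the helper names `parse_log` and `best_val` are hypothetical, and the sample string holds a few lines copied from the log above rather than the whole file.

```python
import re

# Matches the evaluation lines in train_val.txt; any other line
# (such as the "Overriding: ..." header lines) is skipped.
LINE_RE = re.compile(r"step (\d+): train loss ([\d.]+), val loss ([\d.]+)")


def parse_log(text):
    """Return a list of (step, train_loss, val_loss) tuples."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            rows.append((int(m.group(1)), float(m.group(2)), float(m.group(3))))
    return rows


def best_val(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda r: r[2])


# A few lines copied from the log, for illustration only.
sample = """\
Overriding: eval_iters = 50
step 0: train loss 11.0252, val loss 11.0342
step 100: train loss 8.3994, val loss 8.2066
step 43000: train loss 2.7065, val loss 2.5963
step 48000: train loss 2.7092, val loss 2.7769
"""
rows = parse_log(sample)
print(best_val(rows))
```

Because the regular expression uses `search`, it also matches lines that still carry a leading `+` from the diff. The per-step values fluctuate noticeably between evaluations, so a smoothed curve may be more informative than a single minimum when comparing checkpoints.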