Upload ./training.log with huggingface_hub
training.log +510 -0
training.log
ADDED
@@ -0,0 +1,510 @@
1 |
+
2023-10-24 10:30:21,441 ----------------------------------------------------------------------------------------------------
|
2 |
+
2023-10-24 10:30:21,442 Model: "SequenceTagger(
|
3 |
+
(embeddings): TransformerWordEmbeddings(
|
4 |
+
(model): BertModel(
|
5 |
+
(embeddings): BertEmbeddings(
|
6 |
+
(word_embeddings): Embedding(64001, 768)
|
7 |
+
(position_embeddings): Embedding(512, 768)
|
8 |
+
(token_type_embeddings): Embedding(2, 768)
|
9 |
+
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
|
10 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
11 |
+
)
|
12 |
+
(encoder): BertEncoder(
|
13 |
+
(layer): ModuleList(
|
14 |
+
(0): BertLayer(
|
15 |
+
(attention): BertAttention(
|
16 |
+
(self): BertSelfAttention(
|
17 |
+
(query): Linear(in_features=768, out_features=768, bias=True)
|
18 |
+
(key): Linear(in_features=768, out_features=768, bias=True)
|
19 |
+
(value): Linear(in_features=768, out_features=768, bias=True)
|
20 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
21 |
+
)
|
22 |
+
(output): BertSelfOutput(
|
23 |
+
(dense): Linear(in_features=768, out_features=768, bias=True)
|
24 |
+
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
|
25 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
26 |
+
)
|
27 |
+
)
|
28 |
+
(intermediate): BertIntermediate(
|
29 |
+
(dense): Linear(in_features=768, out_features=3072, bias=True)
|
30 |
+
(intermediate_act_fn): GELUActivation()
|
31 |
+
)
|
32 |
+
(output): BertOutput(
|
33 |
+
(dense): Linear(in_features=3072, out_features=768, bias=True)
|
34 |
+
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
|
35 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
36 |
+
)
|
37 |
+
)
|
38 |
+
(1): BertLayer(
|
39 |
+
(attention): BertAttention(
|
40 |
+
(self): BertSelfAttention(
|
41 |
+
(query): Linear(in_features=768, out_features=768, bias=True)
|
42 |
+
(key): Linear(in_features=768, out_features=768, bias=True)
|
43 |
+
(value): Linear(in_features=768, out_features=768, bias=True)
|
44 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
45 |
+
)
|
46 |
+
(output): BertSelfOutput(
|
47 |
+
(dense): Linear(in_features=768, out_features=768, bias=True)
|
48 |
+
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
|
49 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
50 |
+
)
|
51 |
+
)
|
52 |
+
(intermediate): BertIntermediate(
|
53 |
+
(dense): Linear(in_features=768, out_features=3072, bias=True)
|
54 |
+
(intermediate_act_fn): GELUActivation()
|
55 |
+
)
|
56 |
+
(output): BertOutput(
|
57 |
+
(dense): Linear(in_features=3072, out_features=768, bias=True)
|
58 |
+
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
|
59 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
60 |
+
)
|
61 |
+
)
|
62 |
+
(2): BertLayer(
|
63 |
+
(attention): BertAttention(
|
64 |
+
(self): BertSelfAttention(
|
65 |
+
(query): Linear(in_features=768, out_features=768, bias=True)
|
66 |
+
(key): Linear(in_features=768, out_features=768, bias=True)
|
67 |
+
(value): Linear(in_features=768, out_features=768, bias=True)
|
68 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
69 |
+
)
|
70 |
+
(output): BertSelfOutput(
|
71 |
+
(dense): Linear(in_features=768, out_features=768, bias=True)
|
72 |
+
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
|
73 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
74 |
+
)
|
75 |
+
)
|
76 |
+
(intermediate): BertIntermediate(
|
77 |
+
(dense): Linear(in_features=768, out_features=3072, bias=True)
|
78 |
+
(intermediate_act_fn): GELUActivation()
|
79 |
+
)
|
80 |
+
(output): BertOutput(
|
81 |
+
(dense): Linear(in_features=3072, out_features=768, bias=True)
|
82 |
+
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
|
83 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
84 |
+
)
|
85 |
+
)
|
86 |
+
(3): BertLayer(
|
87 |
+
(attention): BertAttention(
|
88 |
+
(self): BertSelfAttention(
|
89 |
+
(query): Linear(in_features=768, out_features=768, bias=True)
|
90 |
+
(key): Linear(in_features=768, out_features=768, bias=True)
|
91 |
+
(value): Linear(in_features=768, out_features=768, bias=True)
|
92 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
93 |
+
)
|
94 |
+
(output): BertSelfOutput(
|
95 |
+
(dense): Linear(in_features=768, out_features=768, bias=True)
|
96 |
+
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
|
97 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
98 |
+
)
|
99 |
+
)
|
100 |
+
(intermediate): BertIntermediate(
|
101 |
+
(dense): Linear(in_features=768, out_features=3072, bias=True)
|
102 |
+
(intermediate_act_fn): GELUActivation()
|
103 |
+
)
|
104 |
+
(output): BertOutput(
|
105 |
+
(dense): Linear(in_features=3072, out_features=768, bias=True)
|
106 |
+
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
|
107 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
108 |
+
)
|
109 |
+
)
|
110 |
+
(4): BertLayer(
|
111 |
+
(attention): BertAttention(
|
112 |
+
(self): BertSelfAttention(
|
113 |
+
(query): Linear(in_features=768, out_features=768, bias=True)
|
114 |
+
(key): Linear(in_features=768, out_features=768, bias=True)
|
115 |
+
(value): Linear(in_features=768, out_features=768, bias=True)
|
116 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
117 |
+
)
|
118 |
+
(output): BertSelfOutput(
|
119 |
+
(dense): Linear(in_features=768, out_features=768, bias=True)
|
120 |
+
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
|
121 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
122 |
+
)
|
123 |
+
)
|
124 |
+
(intermediate): BertIntermediate(
|
125 |
+
(dense): Linear(in_features=768, out_features=3072, bias=True)
|
126 |
+
(intermediate_act_fn): GELUActivation()
|
127 |
+
)
|
128 |
+
(output): BertOutput(
|
129 |
+
(dense): Linear(in_features=3072, out_features=768, bias=True)
|
130 |
+
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
|
131 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
132 |
+
)
|
133 |
+
)
|
134 |
+
(5): BertLayer(
|
135 |
+
(attention): BertAttention(
|
136 |
+
(self): BertSelfAttention(
|
137 |
+
(query): Linear(in_features=768, out_features=768, bias=True)
|
138 |
+
(key): Linear(in_features=768, out_features=768, bias=True)
|
139 |
+
(value): Linear(in_features=768, out_features=768, bias=True)
|
140 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
141 |
+
)
|
142 |
+
(output): BertSelfOutput(
|
143 |
+
(dense): Linear(in_features=768, out_features=768, bias=True)
|
144 |
+
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
|
145 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
146 |
+
)
|
147 |
+
)
|
148 |
+
(intermediate): BertIntermediate(
|
149 |
+
(dense): Linear(in_features=768, out_features=3072, bias=True)
|
150 |
+
(intermediate_act_fn): GELUActivation()
|
151 |
+
)
|
152 |
+
(output): BertOutput(
|
153 |
+
(dense): Linear(in_features=3072, out_features=768, bias=True)
|
154 |
+
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
|
155 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
156 |
+
)
|
157 |
+
)
|
158 |
+
(6): BertLayer(
|
159 |
+
(attention): BertAttention(
|
160 |
+
(self): BertSelfAttention(
|
161 |
+
(query): Linear(in_features=768, out_features=768, bias=True)
|
162 |
+
(key): Linear(in_features=768, out_features=768, bias=True)
|
163 |
+
(value): Linear(in_features=768, out_features=768, bias=True)
|
164 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
165 |
+
)
|
166 |
+
(output): BertSelfOutput(
|
167 |
+
(dense): Linear(in_features=768, out_features=768, bias=True)
|
168 |
+
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
|
169 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
170 |
+
)
|
171 |
+
)
|
172 |
+
(intermediate): BertIntermediate(
|
173 |
+
(dense): Linear(in_features=768, out_features=3072, bias=True)
|
174 |
+
(intermediate_act_fn): GELUActivation()
|
175 |
+
)
|
176 |
+
(output): BertOutput(
|
177 |
+
(dense): Linear(in_features=3072, out_features=768, bias=True)
|
178 |
+
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
|
179 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
180 |
+
)
|
181 |
+
)
|
182 |
+
(7): BertLayer(
|
183 |
+
(attention): BertAttention(
|
184 |
+
(self): BertSelfAttention(
|
185 |
+
(query): Linear(in_features=768, out_features=768, bias=True)
|
186 |
+
(key): Linear(in_features=768, out_features=768, bias=True)
|
187 |
+
(value): Linear(in_features=768, out_features=768, bias=True)
|
188 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
189 |
+
)
|
190 |
+
(output): BertSelfOutput(
|
191 |
+
(dense): Linear(in_features=768, out_features=768, bias=True)
|
192 |
+
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
|
193 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
194 |
+
)
|
195 |
+
)
|
196 |
+
(intermediate): BertIntermediate(
|
197 |
+
(dense): Linear(in_features=768, out_features=3072, bias=True)
|
198 |
+
(intermediate_act_fn): GELUActivation()
|
199 |
+
)
|
200 |
+
(output): BertOutput(
|
201 |
+
(dense): Linear(in_features=3072, out_features=768, bias=True)
|
202 |
+
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
|
203 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
204 |
+
)
|
205 |
+
)
|
206 |
+
(8): BertLayer(
|
207 |
+
(attention): BertAttention(
|
208 |
+
(self): BertSelfAttention(
|
209 |
+
(query): Linear(in_features=768, out_features=768, bias=True)
|
210 |
+
(key): Linear(in_features=768, out_features=768, bias=True)
|
211 |
+
(value): Linear(in_features=768, out_features=768, bias=True)
|
212 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
213 |
+
)
|
214 |
+
(output): BertSelfOutput(
|
215 |
+
(dense): Linear(in_features=768, out_features=768, bias=True)
|
216 |
+
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
|
217 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
218 |
+
)
|
219 |
+
)
|
220 |
+
(intermediate): BertIntermediate(
|
221 |
+
(dense): Linear(in_features=768, out_features=3072, bias=True)
|
222 |
+
(intermediate_act_fn): GELUActivation()
|
223 |
+
)
|
224 |
+
(output): BertOutput(
|
225 |
+
(dense): Linear(in_features=3072, out_features=768, bias=True)
|
226 |
+
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
|
227 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
228 |
+
)
|
229 |
+
)
|
230 |
+
(9): BertLayer(
|
231 |
+
(attention): BertAttention(
|
232 |
+
(self): BertSelfAttention(
|
233 |
+
(query): Linear(in_features=768, out_features=768, bias=True)
|
234 |
+
(key): Linear(in_features=768, out_features=768, bias=True)
|
235 |
+
(value): Linear(in_features=768, out_features=768, bias=True)
|
236 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
237 |
+
)
|
238 |
+
(output): BertSelfOutput(
|
239 |
+
(dense): Linear(in_features=768, out_features=768, bias=True)
|
240 |
+
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
|
241 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
242 |
+
)
|
243 |
+
)
|
244 |
+
(intermediate): BertIntermediate(
|
245 |
+
(dense): Linear(in_features=768, out_features=3072, bias=True)
|
246 |
+
(intermediate_act_fn): GELUActivation()
|
247 |
+
)
|
248 |
+
(output): BertOutput(
|
249 |
+
(dense): Linear(in_features=3072, out_features=768, bias=True)
|
250 |
+
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
|
251 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
252 |
+
)
|
253 |
+
)
|
254 |
+
(10): BertLayer(
|
255 |
+
(attention): BertAttention(
|
256 |
+
(self): BertSelfAttention(
|
257 |
+
(query): Linear(in_features=768, out_features=768, bias=True)
|
258 |
+
(key): Linear(in_features=768, out_features=768, bias=True)
|
259 |
+
(value): Linear(in_features=768, out_features=768, bias=True)
|
260 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
261 |
+
)
|
262 |
+
(output): BertSelfOutput(
|
263 |
+
(dense): Linear(in_features=768, out_features=768, bias=True)
|
264 |
+
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
|
265 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
266 |
+
)
|
267 |
+
)
|
268 |
+
(intermediate): BertIntermediate(
|
269 |
+
(dense): Linear(in_features=768, out_features=3072, bias=True)
|
270 |
+
(intermediate_act_fn): GELUActivation()
|
271 |
+
)
|
272 |
+
(output): BertOutput(
|
273 |
+
(dense): Linear(in_features=3072, out_features=768, bias=True)
|
274 |
+
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
|
275 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
276 |
+
)
|
277 |
+
)
|
278 |
+
(11): BertLayer(
|
279 |
+
(attention): BertAttention(
|
280 |
+
(self): BertSelfAttention(
|
281 |
+
(query): Linear(in_features=768, out_features=768, bias=True)
|
282 |
+
(key): Linear(in_features=768, out_features=768, bias=True)
|
283 |
+
(value): Linear(in_features=768, out_features=768, bias=True)
|
284 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
285 |
+
)
|
286 |
+
(output): BertSelfOutput(
|
287 |
+
(dense): Linear(in_features=768, out_features=768, bias=True)
|
288 |
+
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
|
289 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
290 |
+
)
|
291 |
+
)
|
292 |
+
(intermediate): BertIntermediate(
|
293 |
+
(dense): Linear(in_features=768, out_features=3072, bias=True)
|
294 |
+
(intermediate_act_fn): GELUActivation()
|
295 |
+
)
|
296 |
+
(output): BertOutput(
|
297 |
+
(dense): Linear(in_features=3072, out_features=768, bias=True)
|
298 |
+
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
|
299 |
+
(dropout): Dropout(p=0.1, inplace=False)
|
300 |
+
)
|
301 |
+
)
|
302 |
+
)
|
303 |
+
)
|
304 |
+
(pooler): BertPooler(
|
305 |
+
(dense): Linear(in_features=768, out_features=768, bias=True)
|
306 |
+
(activation): Tanh()
|
307 |
+
)
|
308 |
+
)
|
309 |
+
)
|
310 |
+
(locked_dropout): LockedDropout(p=0.5)
|
311 |
+
(linear): Linear(in_features=768, out_features=21, bias=True)
|
312 |
+
(loss_function): CrossEntropyLoss()
|
313 |
+
)"
|
314 |
+
2023-10-24 10:30:21,442 ----------------------------------------------------------------------------------------------------
|
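The module shapes printed above are enough to estimate the number of trainable parameters. The following is a minimal sketch, assuming every `Linear` and `LayerNorm` carries both a weight and a bias (as `bias=True` and `elementwise_affine=True` suggest) and that the dropout, activation and loss modules hold no parameters. The helper names are hypothetical and are not part of Flair or Transformers.

```python
# Estimate parameter counts from the shapes shown in the model printout.
# Helper names are illustrative only.

def linear(in_features: int, out_features: int) -> int:
    """Weight matrix plus bias vector."""
    return in_features * out_features + out_features

def layer_norm(dim: int) -> int:
    """Scale and shift vectors (elementwise_affine=True)."""
    return 2 * dim

def embedding(num_embeddings: int, dim: int) -> int:
    """Lookup table only, no bias."""
    return num_embeddings * dim

HIDDEN, FFN = 768, 3072          # Linear(768, 768) and Linear(768, 3072)
VOCAB, POSITIONS, TYPES = 64001, 512, 2
LAYERS, TAGS = 12, 21            # BertLayer (0)..(11), final Linear(768, 21)

embeddings = (
    embedding(VOCAB, HIDDEN)
    + embedding(POSITIONS, HIDDEN)
    + embedding(TYPES, HIDDEN)
    + layer_norm(HIDDEN)
)

bert_layer = (
    4 * linear(HIDDEN, HIDDEN)   # query, key, value, attention output dense
    + layer_norm(HIDDEN)         # attention output LayerNorm
    + linear(HIDDEN, FFN)        # intermediate dense
    + linear(FFN, HIDDEN)        # output dense
    + layer_norm(HIDDEN)         # output LayerNorm
)

pooler = linear(HIDDEN, HIDDEN)
tagger_head = linear(HIDDEN, TAGS)

total = embeddings + LAYERS * bert_layer + pooler + tagger_head

print(f"embeddings:  {embeddings:,}")
print(f"per layer:   {bert_layer:,}")
print(f"pooler:      {pooler:,}")
print(f"tagger head: {tagger_head:,}")
print(f"total:       {total:,}")
```

Under these assumptions the word embedding table (64001 x 768) accounts for more than a third of all parameters, which is a direct consequence of the 64k vocabulary.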
315 |
+
2023-10-24 10:30:21,442 MultiCorpus: 5901 train + 1287 dev + 1505 test sentences
|
316 |
+
- NER_HIPE_2022 Corpus: 5901 train + 1287 dev + 1505 test sentences - /home/ubuntu/.flair/datasets/ner_hipe_2022/v2.1/hipe2020/fr/with_doc_seperator
|
317 |
+
2023-10-24 10:30:21,442 ----------------------------------------------------------------------------------------------------
|
318 |
+
2023-10-24 10:30:21,442 Train: 5901 sentences
|
319 |
+
2023-10-24 10:30:21,442 (train_with_dev=False, train_with_test=False)
|
320 |
+
2023-10-24 10:30:21,442 ----------------------------------------------------------------------------------------------------
|
321 |
+
2023-10-24 10:30:21,442 Training Params:
|
322 |
+
2023-10-24 10:30:21,442 - learning_rate: "3e-05"
|
323 |
+
2023-10-24 10:30:21,442 - mini_batch_size: "4"
|
324 |
+
2023-10-24 10:30:21,442 - max_epochs: "10"
|
325 |
+
2023-10-24 10:30:21,443 - shuffle: "True"
|
326 |
+
2023-10-24 10:30:21,443 ----------------------------------------------------------------------------------------------------
|
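The `iter …/1476` counter in the epoch lines that follow is consistent with the training set size and mini-batch size listed above. A minimal sketch of that arithmetic, assuming the last, partially filled batch is kept rather than dropped (the function name is hypothetical):

```python
import math

def iterations_per_epoch(num_sentences: int, mini_batch_size: int) -> int:
    """Number of mini-batches needed to see every sentence once.

    Assumes the final partial batch is kept, hence the ceiling.
    """
    return math.ceil(num_sentences / mini_batch_size)

# Values taken from the log: 5901 train sentences, mini_batch_size 4, max_epochs 10.
steps_per_epoch = iterations_per_epoch(5901, 4)
total_steps = steps_per_epoch * 10

print(steps_per_epoch, total_steps)
```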
327 |
+
2023-10-24 10:30:21,443 Plugins:
|
328 |
+
2023-10-24 10:30:21,443 - TensorboardLogger
|
329 |
+
2023-10-24 10:30:21,443 - LinearScheduler | warmup_fraction: '0.1'
|
330 |
+
2023-10-24 10:30:21,443 ----------------------------------------------------------------------------------------------------
|
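The `lr` column in the iteration lines below can be reproduced with a plain linear warmup followed by linear decay. This is a sketch of that generic schedule, not Flair's `LinearScheduler` code. It assumes 1476 iterations per epoch over 10 epochs (14760 steps in total), a peak rate of 3e-05, and a warmup that covers the first 10% of steps. The function name and its parameters are hypothetical.

```python
def linear_schedule_lr(
    step: int,
    peak_lr: float = 3e-05,
    total_steps: int = 14760,
    warmup_fraction: float = 0.1,
) -> float:
    """Learning rate after `step` optimizer steps.

    Rises linearly from 0 to `peak_lr` during warmup, then falls
    linearly back to 0 at `total_steps`.
    """
    warmup_steps = round(total_steps * warmup_fraction)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

# Global step = (epoch - 1) * 1476 + iteration within the epoch.
for epoch, iteration in [(1, 147), (1, 1470), (2, 294), (8, 1176)]:
    step = (epoch - 1) * 1476 + iteration
    print(f"epoch {epoch} iter {iteration}: lr {linear_schedule_lr(step):.6f}")
```

With these assumptions the warmup ends exactly at the end of epoch 1, which matches the logged rate climbing to 0.000030 there and decreasing afterwards.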
331 |
+
2023-10-24 10:30:21,443 Final evaluation on model from best epoch (best-model.pt)
|
332 |
+
2023-10-24 10:30:21,443 - metric: "('micro avg', 'f1-score')"
|
333 |
+
2023-10-24 10:30:21,443 ----------------------------------------------------------------------------------------------------
|
334 |
+
2023-10-24 10:30:21,443 Computation:
|
335 |
+
2023-10-24 10:30:21,443 - compute on device: cuda:0
|
336 |
+
2023-10-24 10:30:21,443 - embedding storage: none
|
337 |
+
2023-10-24 10:30:21,443 ----------------------------------------------------------------------------------------------------
|
338 |
+
2023-10-24 10:30:21,443 Model training base path: "hmbench-hipe2020/fr-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs4-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-3"
|
339 |
+
2023-10-24 10:30:21,443 ----------------------------------------------------------------------------------------------------
|
340 |
+
2023-10-24 10:30:21,443 ----------------------------------------------------------------------------------------------------
|
341 |
+
2023-10-24 10:30:21,443 Logging anything other than scalars to TensorBoard is currently not supported.
|
342 |
+
2023-10-24 10:30:30,685 epoch 1 - iter 147/1476 - loss 1.95486685 - time (sec): 9.24 - samples/sec: 1731.50 - lr: 0.000003 - momentum: 0.000000
|
343 |
+
2023-10-24 10:30:39,970 epoch 1 - iter 294/1476 - loss 1.26204596 - time (sec): 18.53 - samples/sec: 1712.28 - lr: 0.000006 - momentum: 0.000000
|
344 |
+
2023-10-24 10:30:49,068 epoch 1 - iter 441/1476 - loss 1.01783980 - time (sec): 27.62 - samples/sec: 1664.50 - lr: 0.000009 - momentum: 0.000000
|
345 |
+
2023-10-24 10:30:58,969 epoch 1 - iter 588/1476 - loss 0.82960997 - time (sec): 37.53 - samples/sec: 1720.25 - lr: 0.000012 - momentum: 0.000000
|
346 |
+
2023-10-24 10:31:09,389 epoch 1 - iter 735/1476 - loss 0.69238553 - time (sec): 47.95 - samples/sec: 1760.88 - lr: 0.000015 - momentum: 0.000000
|
347 |
+
2023-10-24 10:31:18,839 epoch 1 - iter 882/1476 - loss 0.61723371 - time (sec): 57.40 - samples/sec: 1758.66 - lr: 0.000018 - momentum: 0.000000
|
348 |
+
2023-10-24 10:31:28,128 epoch 1 - iter 1029/1476 - loss 0.55914623 - time (sec): 66.68 - samples/sec: 1750.72 - lr: 0.000021 - momentum: 0.000000
|
349 |
+
2023-10-24 10:31:37,986 epoch 1 - iter 1176/1476 - loss 0.51214581 - time (sec): 76.54 - samples/sec: 1748.28 - lr: 0.000024 - momentum: 0.000000
|
350 |
+
2023-10-24 10:31:47,268 epoch 1 - iter 1323/1476 - loss 0.47688361 - time (sec): 85.82 - samples/sec: 1743.62 - lr: 0.000027 - momentum: 0.000000
|
351 |
+
2023-10-24 10:31:56,801 epoch 1 - iter 1470/1476 - loss 0.44429417 - time (sec): 95.36 - samples/sec: 1740.14 - lr: 0.000030 - momentum: 0.000000
|
352 |
+
2023-10-24 10:31:57,149 ----------------------------------------------------------------------------------------------------
|
353 |
+
2023-10-24 10:31:57,150 EPOCH 1 done: loss 0.4435 - lr: 0.000030
|
354 |
+
2023-10-24 10:32:03,446 DEV : loss 0.1300630122423172 - f1-score (micro avg) 0.7315
|
355 |
+
2023-10-24 10:32:03,467 saving best model
|
356 |
+
2023-10-24 10:32:04,027 ----------------------------------------------------------------------------------------------------
|
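Per-epoch summaries such as the `EPOCH 1 done` and `DEV` lines above can be pulled out of the log with two regular expressions. The sketch below assumes the line layout shown in this file and that each `DEV` line follows the `EPOCH … done` line it belongs to. The function and pattern names are hypothetical.

```python
import re

# Patterns written against the line layout seen in this log.
EPOCH_RE = re.compile(r"EPOCH (\d+) done: loss ([\d.]+) - lr: ([\d.]+)")
DEV_RE = re.compile(r"DEV : loss ([\d.]+) - f1-score \(micro avg\)\s+([\d.]+)")

def parse_epochs(lines):
    """Return one dict per epoch with train loss, dev loss and dev micro F1."""
    epochs = []
    for line in lines:
        epoch_match = EPOCH_RE.search(line)
        if epoch_match:
            epochs.append({
                "epoch": int(epoch_match.group(1)),
                "train_loss": float(epoch_match.group(2)),
            })
            continue
        dev_match = DEV_RE.search(line)
        if dev_match and epochs:
            # Attach the dev metrics to the most recent epoch summary.
            epochs[-1]["dev_loss"] = float(dev_match.group(1))
            epochs[-1]["dev_f1"] = float(dev_match.group(2))
    return epochs

sample = [
    "2023-10-24 10:31:57,150 EPOCH 1 done: loss 0.4435 - lr: 0.000030",
    "2023-10-24 10:32:03,446 DEV : loss 0.1300630122423172 - f1-score (micro avg) 0.7315",
    "2023-10-24 10:32:03,467 saving best model",
]
parsed = parse_epochs(sample)
print(parsed)
```

To run it on the real file, pass an open file handle instead of `sample`, since iterating a text file yields its lines.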
357 |
+
2023-10-24 10:32:13,594 epoch 2 - iter 147/1476 - loss 0.09774384 - time (sec): 9.57 - samples/sec: 1764.56 - lr: 0.000030 - momentum: 0.000000
|
358 |
+
2023-10-24 10:32:22,802 epoch 2 - iter 294/1476 - loss 0.12373892 - time (sec): 18.77 - samples/sec: 1717.04 - lr: 0.000029 - momentum: 0.000000
|
359 |
+
2023-10-24 10:32:31,998 epoch 2 - iter 441/1476 - loss 0.13775868 - time (sec): 27.97 - samples/sec: 1678.40 - lr: 0.000029 - momentum: 0.000000
|
360 |
+
2023-10-24 10:32:41,763 epoch 2 - iter 588/1476 - loss 0.12915200 - time (sec): 37.74 - samples/sec: 1702.64 - lr: 0.000029 - momentum: 0.000000
|
361 |
+
2023-10-24 10:32:51,061 epoch 2 - iter 735/1476 - loss 0.12918783 - time (sec): 47.03 - samples/sec: 1692.83 - lr: 0.000028 - momentum: 0.000000
|
362 |
+
2023-10-24 10:33:00,686 epoch 2 - iter 882/1476 - loss 0.12958345 - time (sec): 56.66 - samples/sec: 1704.04 - lr: 0.000028 - momentum: 0.000000
|
363 |
+
2023-10-24 10:33:09,743 epoch 2 - iter 1029/1476 - loss 0.12887412 - time (sec): 65.71 - samples/sec: 1691.48 - lr: 0.000028 - momentum: 0.000000
|
364 |
+
2023-10-24 10:33:19,759 epoch 2 - iter 1176/1476 - loss 0.12754934 - time (sec): 75.73 - samples/sec: 1725.62 - lr: 0.000027 - momentum: 0.000000
|
365 |
+
2023-10-24 10:33:29,648 epoch 2 - iter 1323/1476 - loss 0.12821778 - time (sec): 85.62 - samples/sec: 1726.78 - lr: 0.000027 - momentum: 0.000000
|
366 |
+
2023-10-24 10:33:39,650 epoch 2 - iter 1470/1476 - loss 0.12716691 - time (sec): 95.62 - samples/sec: 1735.61 - lr: 0.000027 - momentum: 0.000000
|
367 |
+
2023-10-24 10:33:39,996 ----------------------------------------------------------------------------------------------------
|
368 |
+
2023-10-24 10:33:39,997 EPOCH 2 done: loss 0.1270 - lr: 0.000027
|
369 |
+
2023-10-24 10:33:48,525 DEV : loss 0.13529355823993683 - f1-score (micro avg) 0.7895
|
370 |
+
2023-10-24 10:33:48,546 saving best model
|
371 |
+
2023-10-24 10:33:49,250 ----------------------------------------------------------------------------------------------------
|
372 |
+
2023-10-24 10:33:58,610 epoch 3 - iter 147/1476 - loss 0.06413979 - time (sec): 9.36 - samples/sec: 1629.69 - lr: 0.000026 - momentum: 0.000000
|
373 |
+
2023-10-24 10:34:08,613 epoch 3 - iter 294/1476 - loss 0.08008883 - time (sec): 19.36 - samples/sec: 1719.49 - lr: 0.000026 - momentum: 0.000000
|
374 |
+
2023-10-24 10:34:18,091 epoch 3 - iter 441/1476 - loss 0.07801843 - time (sec): 28.84 - samples/sec: 1704.77 - lr: 0.000026 - momentum: 0.000000
|
375 |
+
2023-10-24 10:34:27,913 epoch 3 - iter 588/1476 - loss 0.07393475 - time (sec): 38.66 - samples/sec: 1742.86 - lr: 0.000025 - momentum: 0.000000
|
376 |
+
2023-10-24 10:34:37,183 epoch 3 - iter 735/1476 - loss 0.07460277 - time (sec): 47.93 - samples/sec: 1724.93 - lr: 0.000025 - momentum: 0.000000
|
377 |
+
2023-10-24 10:34:46,875 epoch 3 - iter 882/1476 - loss 0.07823560 - time (sec): 57.62 - samples/sec: 1735.79 - lr: 0.000025 - momentum: 0.000000
|
378 |
+
2023-10-24 10:34:56,414 epoch 3 - iter 1029/1476 - loss 0.07661923 - time (sec): 67.16 - samples/sec: 1731.96 - lr: 0.000024 - momentum: 0.000000
|
379 |
+
2023-10-24 10:35:05,800 epoch 3 - iter 1176/1476 - loss 0.07755315 - time (sec): 76.55 - samples/sec: 1727.99 - lr: 0.000024 - momentum: 0.000000
|
380 |
+
2023-10-24 10:35:15,765 epoch 3 - iter 1323/1476 - loss 0.07765157 - time (sec): 86.51 - samples/sec: 1744.74 - lr: 0.000024 - momentum: 0.000000
|
381 |
+
2023-10-24 10:35:24,956 epoch 3 - iter 1470/1476 - loss 0.07767577 - time (sec): 95.70 - samples/sec: 1735.25 - lr: 0.000023 - momentum: 0.000000
|
382 |
+
2023-10-24 10:35:25,291 ----------------------------------------------------------------------------------------------------
|
383 |
+
2023-10-24 10:35:25,291 EPOCH 3 done: loss 0.0777 - lr: 0.000023
|
384 |
+
2023-10-24 10:35:33,788 DEV : loss 0.1414514034986496 - f1-score (micro avg) 0.8064
|
385 |
+
2023-10-24 10:35:33,809 saving best model
|
386 |
+
2023-10-24 10:35:34,565 ----------------------------------------------------------------------------------------------------
|
387 |
+
2023-10-24 10:35:44,207 epoch 4 - iter 147/1476 - loss 0.05130159 - time (sec): 9.64 - samples/sec: 1746.38 - lr: 0.000023 - momentum: 0.000000
|
388 |
+
2023-10-24 10:35:53,931 epoch 4 - iter 294/1476 - loss 0.04892145 - time (sec): 19.36 - samples/sec: 1811.41 - lr: 0.000023 - momentum: 0.000000
|
389 |
+
2023-10-24 10:36:03,580 epoch 4 - iter 441/1476 - loss 0.04776918 - time (sec): 29.01 - samples/sec: 1782.70 - lr: 0.000022 - momentum: 0.000000
|
390 |
+
2023-10-24 10:36:13,245 epoch 4 - iter 588/1476 - loss 0.04668210 - time (sec): 38.68 - samples/sec: 1746.47 - lr: 0.000022 - momentum: 0.000000
|
391 |
+
2023-10-24 10:36:22,993 epoch 4 - iter 735/1476 - loss 0.04560696 - time (sec): 48.43 - samples/sec: 1754.83 - lr: 0.000022 - momentum: 0.000000
|
392 |
+
2023-10-24 10:36:32,409 epoch 4 - iter 882/1476 - loss 0.04633596 - time (sec): 57.84 - samples/sec: 1747.73 - lr: 0.000021 - momentum: 0.000000
|
393 |
+
2023-10-24 10:36:42,394 epoch 4 - iter 1029/1476 - loss 0.04952081 - time (sec): 67.83 - samples/sec: 1755.06 - lr: 0.000021 - momentum: 0.000000
|
394 |
+
2023-10-24 10:36:51,859 epoch 4 - iter 1176/1476 - loss 0.05327110 - time (sec): 77.29 - samples/sec: 1744.99 - lr: 0.000021 - momentum: 0.000000
|
395 |
+
2023-10-24 10:37:01,316 epoch 4 - iter 1323/1476 - loss 0.05312525 - time (sec): 86.75 - samples/sec: 1738.37 - lr: 0.000020 - momentum: 0.000000
|
396 |
+
2023-10-24 10:37:10,558 epoch 4 - iter 1470/1476 - loss 0.05433269 - time (sec): 95.99 - samples/sec: 1726.40 - lr: 0.000020 - momentum: 0.000000
|
397 |
+
2023-10-24 10:37:10,926 ----------------------------------------------------------------------------------------------------
|
398 |
+
2023-10-24 10:37:10,926 EPOCH 4 done: loss 0.0545 - lr: 0.000020
|
399 |
+
2023-10-24 10:37:19,428 DEV : loss 0.17535756528377533 - f1-score (micro avg) 0.8202
|
400 |
+
2023-10-24 10:37:19,449 saving best model
|
401 |
+
2023-10-24 10:37:20,150 ----------------------------------------------------------------------------------------------------
|
402 |
+
2023-10-24 10:37:29,883 epoch 5 - iter 147/1476 - loss 0.03243990 - time (sec): 9.73 - samples/sec: 1742.69 - lr: 0.000020 - momentum: 0.000000
|
403 |
+
2023-10-24 10:37:39,519 epoch 5 - iter 294/1476 - loss 0.04635094 - time (sec): 19.37 - samples/sec: 1773.97 - lr: 0.000019 - momentum: 0.000000
|
404 |
+
2023-10-24 10:37:49,362 epoch 5 - iter 441/1476 - loss 0.04025634 - time (sec): 29.21 - samples/sec: 1779.48 - lr: 0.000019 - momentum: 0.000000
|
405 |
+
2023-10-24 10:37:58,525 epoch 5 - iter 588/1476 - loss 0.03809557 - time (sec): 38.37 - samples/sec: 1749.42 - lr: 0.000019 - momentum: 0.000000
|
406 |
+
2023-10-24 10:38:08,520 epoch 5 - iter 735/1476 - loss 0.03722634 - time (sec): 48.37 - samples/sec: 1747.98 - lr: 0.000018 - momentum: 0.000000
|
407 |
+
2023-10-24 10:38:17,615 epoch 5 - iter 882/1476 - loss 0.03652541 - time (sec): 57.46 - samples/sec: 1725.74 - lr: 0.000018 - momentum: 0.000000
|
408 |
+
2023-10-24 10:38:26,667 epoch 5 - iter 1029/1476 - loss 0.03632487 - time (sec): 66.52 - samples/sec: 1721.17 - lr: 0.000018 - momentum: 0.000000
|
409 |
+
2023-10-24 10:38:35,997 epoch 5 - iter 1176/1476 - loss 0.03493445 - time (sec): 75.85 - samples/sec: 1706.50 - lr: 0.000017 - momentum: 0.000000
|
410 |
+
2023-10-24 10:38:45,476 epoch 5 - iter 1323/1476 - loss 0.03543690 - time (sec): 85.33 - samples/sec: 1712.30 - lr: 0.000017 - momentum: 0.000000
|
411 |
+
2023-10-24 10:38:55,831 epoch 5 - iter 1470/1476 - loss 0.03603493 - time (sec): 95.68 - samples/sec: 1734.96 - lr: 0.000017 - momentum: 0.000000
|
412 |
+
2023-10-24 10:38:56,171 ----------------------------------------------------------------------------------------------------
|
413 |
+
2023-10-24 10:38:56,172 EPOCH 5 done: loss 0.0362 - lr: 0.000017
|
414 |
+
2023-10-24 10:39:04,666 DEV : loss 0.18856941163539886 - f1-score (micro avg) 0.8052
|
415 |
+
2023-10-24 10:39:04,687 ----------------------------------------------------------------------------------------------------
|
416 |
+
2023-10-24 10:39:14,450 epoch 6 - iter 147/1476 - loss 0.03138711 - time (sec): 9.76 - samples/sec: 1826.14 - lr: 0.000016 - momentum: 0.000000
|
417 |
+
2023-10-24 10:39:24,003 epoch 6 - iter 294/1476 - loss 0.02845150 - time (sec): 19.31 - samples/sec: 1750.17 - lr: 0.000016 - momentum: 0.000000
|
418 |
+
2023-10-24 10:39:33,538 epoch 6 - iter 441/1476 - loss 0.02731184 - time (sec): 28.85 - samples/sec: 1735.88 - lr: 0.000016 - momentum: 0.000000
|
419 |
+
2023-10-24 10:39:43,120 epoch 6 - iter 588/1476 - loss 0.02757989 - time (sec): 38.43 - samples/sec: 1737.16 - lr: 0.000015 - momentum: 0.000000
|
420 |
+
2023-10-24 10:39:52,417 epoch 6 - iter 735/1476 - loss 0.02426912 - time (sec): 47.73 - samples/sec: 1730.41 - lr: 0.000015 - momentum: 0.000000
|
421 |
+
2023-10-24 10:40:02,130 epoch 6 - iter 882/1476 - loss 0.02398610 - time (sec): 57.44 - samples/sec: 1740.41 - lr: 0.000015 - momentum: 0.000000
|
422 |
+
2023-10-24 10:40:11,398 epoch 6 - iter 1029/1476 - loss 0.02369056 - time (sec): 66.71 - samples/sec: 1723.81 - lr: 0.000014 - momentum: 0.000000
|
423 |
+
2023-10-24 10:40:20,772 epoch 6 - iter 1176/1476 - loss 0.02397182 - time (sec): 76.08 - samples/sec: 1726.11 - lr: 0.000014 - momentum: 0.000000
|
424 |
+
2023-10-24 10:40:30,870 epoch 6 - iter 1323/1476 - loss 0.02409287 - time (sec): 86.18 - samples/sec: 1738.85 - lr: 0.000014 - momentum: 0.000000
|
425 |
+
2023-10-24 10:40:40,391 epoch 6 - iter 1470/1476 - loss 0.02429223 - time (sec): 95.70 - samples/sec: 1733.92 - lr: 0.000013 - momentum: 0.000000
|
426 |
+
2023-10-24 10:40:40,734 ----------------------------------------------------------------------------------------------------
|
427 |
+
2023-10-24 10:40:40,735 EPOCH 6 done: loss 0.0242 - lr: 0.000013
|
428 |
+
2023-10-24 10:40:49,251 DEV : loss 0.1976252794265747 - f1-score (micro avg) 0.8181
|
429 |
+
2023-10-24 10:40:49,272 ----------------------------------------------------------------------------------------------------
|
430 |
+
2023-10-24 10:40:58,821 epoch 7 - iter 147/1476 - loss 0.02416041 - time (sec): 9.55 - samples/sec: 1720.61 - lr: 0.000013 - momentum: 0.000000
|
431 |
+
2023-10-24 10:41:08,314 epoch 7 - iter 294/1476 - loss 0.02557096 - time (sec): 19.04 - samples/sec: 1705.97 - lr: 0.000013 - momentum: 0.000000
|
432 |
+
2023-10-24 10:41:18,019 epoch 7 - iter 441/1476 - loss 0.02401158 - time (sec): 28.75 - samples/sec: 1731.71 - lr: 0.000012 - momentum: 0.000000
|
433 |
+
2023-10-24 10:41:27,278 epoch 7 - iter 588/1476 - loss 0.02052173 - time (sec): 38.01 - samples/sec: 1714.38 - lr: 0.000012 - momentum: 0.000000
|
434 |
+
2023-10-24 10:41:36,455 epoch 7 - iter 735/1476 - loss 0.01942122 - time (sec): 47.18 - samples/sec: 1703.48 - lr: 0.000012 - momentum: 0.000000
|
435 |
+
2023-10-24 10:41:46,543 epoch 7 - iter 882/1476 - loss 0.01902328 - time (sec): 57.27 - samples/sec: 1730.36 - lr: 0.000011 - momentum: 0.000000
|
436 |
+
2023-10-24 10:41:56,034 epoch 7 - iter 1029/1476 - loss 0.01808307 - time (sec): 66.76 - samples/sec: 1730.74 - lr: 0.000011 - momentum: 0.000000
|
437 |
+
2023-10-24 10:42:05,699 epoch 7 - iter 1176/1476 - loss 0.01833467 - time (sec): 76.43 - samples/sec: 1730.68 - lr: 0.000011 - momentum: 0.000000
2023-10-24 10:42:15,348 epoch 7 - iter 1323/1476 - loss 0.01730497 - time (sec): 86.08 - samples/sec: 1734.66 - lr: 0.000010 - momentum: 0.000000
2023-10-24 10:42:24,846 epoch 7 - iter 1470/1476 - loss 0.01857979 - time (sec): 95.57 - samples/sec: 1734.74 - lr: 0.000010 - momentum: 0.000000
2023-10-24 10:42:25,221 ----------------------------------------------------------------------------------------------------
2023-10-24 10:42:25,221 EPOCH 7 done: loss 0.0186 - lr: 0.000010
2023-10-24 10:42:33,760 DEV : loss 0.2103830873966217 - f1-score (micro avg) 0.8275
2023-10-24 10:42:33,781 saving best model
2023-10-24 10:42:34,480 ----------------------------------------------------------------------------------------------------
2023-10-24 10:42:43,884 epoch 8 - iter 147/1476 - loss 0.01365644 - time (sec): 9.40 - samples/sec: 1696.78 - lr: 0.000010 - momentum: 0.000000
2023-10-24 10:42:53,017 epoch 8 - iter 294/1476 - loss 0.01467945 - time (sec): 18.54 - samples/sec: 1658.43 - lr: 0.000009 - momentum: 0.000000
2023-10-24 10:43:03,240 epoch 8 - iter 441/1476 - loss 0.01419339 - time (sec): 28.76 - samples/sec: 1760.03 - lr: 0.000009 - momentum: 0.000000
2023-10-24 10:43:12,794 epoch 8 - iter 588/1476 - loss 0.01139119 - time (sec): 38.31 - samples/sec: 1760.19 - lr: 0.000009 - momentum: 0.000000
2023-10-24 10:43:22,562 epoch 8 - iter 735/1476 - loss 0.01070596 - time (sec): 48.08 - samples/sec: 1754.76 - lr: 0.000008 - momentum: 0.000000
2023-10-24 10:43:32,626 epoch 8 - iter 882/1476 - loss 0.01177176 - time (sec): 58.15 - samples/sec: 1762.34 - lr: 0.000008 - momentum: 0.000000
2023-10-24 10:43:41,858 epoch 8 - iter 1029/1476 - loss 0.01286942 - time (sec): 67.38 - samples/sec: 1742.60 - lr: 0.000008 - momentum: 0.000000
2023-10-24 10:43:51,105 epoch 8 - iter 1176/1476 - loss 0.01221774 - time (sec): 76.62 - samples/sec: 1735.10 - lr: 0.000007 - momentum: 0.000000
2023-10-24 10:44:00,445 epoch 8 - iter 1323/1476 - loss 0.01176144 - time (sec): 85.96 - samples/sec: 1731.26 - lr: 0.000007 - momentum: 0.000000
2023-10-24 10:44:10,081 epoch 8 - iter 1470/1476 - loss 0.01227108 - time (sec): 95.60 - samples/sec: 1733.60 - lr: 0.000007 - momentum: 0.000000
2023-10-24 10:44:10,447 ----------------------------------------------------------------------------------------------------
2023-10-24 10:44:10,447 EPOCH 8 done: loss 0.0123 - lr: 0.000007
2023-10-24 10:44:19,008 DEV : loss 0.2190389633178711 - f1-score (micro avg) 0.827
2023-10-24 10:44:19,029 ----------------------------------------------------------------------------------------------------
2023-10-24 10:44:28,424 epoch 9 - iter 147/1476 - loss 0.00656723 - time (sec): 9.39 - samples/sec: 1692.09 - lr: 0.000006 - momentum: 0.000000
2023-10-24 10:44:38,241 epoch 9 - iter 294/1476 - loss 0.00522978 - time (sec): 19.21 - samples/sec: 1767.71 - lr: 0.000006 - momentum: 0.000000
2023-10-24 10:44:47,503 epoch 9 - iter 441/1476 - loss 0.00478950 - time (sec): 28.47 - samples/sec: 1720.95 - lr: 0.000006 - momentum: 0.000000
2023-10-24 10:44:56,695 epoch 9 - iter 588/1476 - loss 0.00468800 - time (sec): 37.66 - samples/sec: 1698.76 - lr: 0.000005 - momentum: 0.000000
2023-10-24 10:45:05,920 epoch 9 - iter 735/1476 - loss 0.00606865 - time (sec): 46.89 - samples/sec: 1700.29 - lr: 0.000005 - momentum: 0.000000
2023-10-24 10:45:15,342 epoch 9 - iter 882/1476 - loss 0.00622321 - time (sec): 56.31 - samples/sec: 1699.13 - lr: 0.000005 - momentum: 0.000000
2023-10-24 10:45:24,857 epoch 9 - iter 1029/1476 - loss 0.00584308 - time (sec): 65.83 - samples/sec: 1709.37 - lr: 0.000004 - momentum: 0.000000
2023-10-24 10:45:34,881 epoch 9 - iter 1176/1476 - loss 0.00648079 - time (sec): 75.85 - samples/sec: 1730.03 - lr: 0.000004 - momentum: 0.000000
2023-10-24 10:45:45,039 epoch 9 - iter 1323/1476 - loss 0.00662559 - time (sec): 86.01 - samples/sec: 1738.06 - lr: 0.000004 - momentum: 0.000000
2023-10-24 10:45:54,542 epoch 9 - iter 1470/1476 - loss 0.00685200 - time (sec): 95.51 - samples/sec: 1737.40 - lr: 0.000003 - momentum: 0.000000
2023-10-24 10:45:54,884 ----------------------------------------------------------------------------------------------------
2023-10-24 10:45:54,885 EPOCH 9 done: loss 0.0068 - lr: 0.000003
2023-10-24 10:46:03,440 DEV : loss 0.22385086119174957 - f1-score (micro avg) 0.8342
2023-10-24 10:46:03,462 saving best model
2023-10-24 10:46:04,162 ----------------------------------------------------------------------------------------------------
2023-10-24 10:46:13,588 epoch 10 - iter 147/1476 - loss 0.00352151 - time (sec): 9.42 - samples/sec: 1718.97 - lr: 0.000003 - momentum: 0.000000
2023-10-24 10:46:22,961 epoch 10 - iter 294/1476 - loss 0.00446608 - time (sec): 18.80 - samples/sec: 1702.67 - lr: 0.000003 - momentum: 0.000000
2023-10-24 10:46:32,841 epoch 10 - iter 441/1476 - loss 0.00470680 - time (sec): 28.68 - samples/sec: 1742.15 - lr: 0.000002 - momentum: 0.000000
2023-10-24 10:46:42,528 epoch 10 - iter 588/1476 - loss 0.00484924 - time (sec): 38.36 - samples/sec: 1761.90 - lr: 0.000002 - momentum: 0.000000
2023-10-24 10:46:52,776 epoch 10 - iter 735/1476 - loss 0.00598633 - time (sec): 48.61 - samples/sec: 1776.90 - lr: 0.000002 - momentum: 0.000000
2023-10-24 10:47:02,222 epoch 10 - iter 882/1476 - loss 0.00629805 - time (sec): 58.06 - samples/sec: 1762.31 - lr: 0.000001 - momentum: 0.000000
2023-10-24 10:47:12,020 epoch 10 - iter 1029/1476 - loss 0.00622798 - time (sec): 67.86 - samples/sec: 1757.82 - lr: 0.000001 - momentum: 0.000000
2023-10-24 10:47:21,170 epoch 10 - iter 1176/1476 - loss 0.00617265 - time (sec): 77.01 - samples/sec: 1743.69 - lr: 0.000001 - momentum: 0.000000
2023-10-24 10:47:30,350 epoch 10 - iter 1323/1476 - loss 0.00592236 - time (sec): 86.19 - samples/sec: 1734.52 - lr: 0.000000 - momentum: 0.000000
2023-10-24 10:47:39,701 epoch 10 - iter 1470/1476 - loss 0.00547473 - time (sec): 95.54 - samples/sec: 1736.26 - lr: 0.000000 - momentum: 0.000000
2023-10-24 10:47:40,046 ----------------------------------------------------------------------------------------------------
2023-10-24 10:47:40,046 EPOCH 10 done: loss 0.0055 - lr: 0.000000
2023-10-24 10:47:48,614 DEV : loss 0.22415557503700256 - f1-score (micro avg) 0.8404
2023-10-24 10:47:48,636 saving best model
2023-10-24 10:47:49,932 ----------------------------------------------------------------------------------------------------
2023-10-24 10:47:49,932 Loading model from best epoch ...
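The DEV lines above are consistent with a checkpoint being written whenever the dev micro-F1 improves ("saving best model" follows the DEV line in epochs 7, 9, and 10 of this excerpt, but not in epochs 6 and 8). The Python sketch below only illustrates that reading of the log; it is not part of the training run and not Flair's API. The names `DEV_RE` and `best_dev_epoch` are hypothetical, and it assumes one DEV line is logged per epoch, in order.

```python
import re

# Hypothetical pattern for the DEV summary lines seen in this log.
DEV_RE = re.compile(
    r"DEV : loss (?P<loss>[\d.]+) - f1-score \(micro avg\) (?P<f1>[\d.]+)"
)


def best_dev_epoch(lines, first_epoch=1):
    """Return (epoch, f1) for the highest dev micro-F1 found in `lines`.

    Assumes one DEV line per epoch, in order, starting at `first_epoch`.
    Ties keep the earlier epoch because only strict improvements count.
    """
    best_epoch, best_f1 = None, float("-inf")
    epoch = first_epoch
    for line in lines:
        match = DEV_RE.search(line)
        if match is None:
            continue  # not a DEV summary line
        f1 = float(match.group("f1"))
        if f1 > best_f1:
            best_epoch, best_f1 = epoch, f1
        epoch += 1
    return best_epoch, best_f1


# DEV lines copied from epochs 6-10 of this log.
log_excerpt = [
    "2023-10-24 10:40:49,251 DEV : loss 0.1976252794265747 - f1-score (micro avg) 0.8181",
    "2023-10-24 10:42:33,760 DEV : loss 0.2103830873966217 - f1-score (micro avg) 0.8275",
    "2023-10-24 10:44:19,008 DEV : loss 0.2190389633178711 - f1-score (micro avg) 0.827",
    "2023-10-24 10:46:03,440 DEV : loss 0.22385086119174957 - f1-score (micro avg) 0.8342",
    "2023-10-24 10:47:48,614 DEV : loss 0.22415557503700256 - f1-score (micro avg) 0.8404",
]

print(best_dev_epoch(log_excerpt, first_epoch=6))  # (10, 0.8404)
```

Note that the dev loss rises from epoch 6 to epoch 10 while the dev F1 also rises, so selecting on loss instead of F1 would pick a different epoch.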
2023-10-24 10:47:51,801 SequenceTagger predicts: Dictionary with 21 tags: O, S-loc, B-loc, E-loc, I-loc, S-pers, B-pers, E-pers, I-pers, S-org, B-org, E-org, I-org, S-time, B-time, E-time, I-time, S-prod, B-prod, E-prod, I-prod
2023-10-24 10:47:58,484
Results:
- F-score (micro) 0.7974
- F-score (macro) 0.7083
- Accuracy 0.6897

By class:
              precision    recall  f1-score   support

         loc     0.8366    0.8893    0.8621       858
        pers     0.7656    0.7784    0.7719       537
         org     0.5882    0.6061    0.5970       132
        prod     0.7018    0.6557    0.6780        61
        time     0.5873    0.6852    0.6325        54

   micro avg     0.7806    0.8149    0.7974      1642
   macro avg     0.6959    0.7229    0.7083      1642
weighted avg     0.7802    0.8149    0.7969      1642

2023-10-24 10:47:58,485 ----------------------------------------------------------------------------------------------------
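The summary rows of the report above can be re-derived from the per-class rows. The Python sketch below is an illustrative arithmetic check, not the evaluation code that produced the log. The variable and function names are hypothetical, and the inputs are the four-decimal values printed above, so agreement is only expected to about four decimal places.

```python
# Per-class rows copied from the report: (precision, recall, f1, support).
per_class = {
    "loc": (0.8366, 0.8893, 0.8621, 858),
    "pers": (0.7656, 0.7784, 0.7719, 537),
    "org": (0.5882, 0.6061, 0.5970, 132),
    "prod": (0.7018, 0.6557, 0.6780, 61),
    "time": (0.5873, 0.6852, 0.6325, 54),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


total_support = sum(s for _, _, _, s in per_class.values())

# Micro F1: harmonic mean of the micro-averaged precision and recall.
micro_f1 = f1(0.7806, 0.8149)

# Macro F1: unweighted mean of the per-class F1 scores.
macro_f1 = sum(f for _, _, f, _ in per_class.values()) / len(per_class)

# Weighted F1: per-class F1 averaged with support as the weight.
weighted_f1 = sum(f * s for _, _, f, s in per_class.values()) / total_support

print(total_support)          # 1642
print(round(micro_f1, 4))     # 0.7974
print(round(macro_f1, 4))     # 0.7083
print(round(weighted_f1, 4))  # 0.7969
```

The gap between micro F1 (0.7974) and macro F1 (0.7083) reflects the class imbalance: `loc` and `pers` account for 1395 of the 1642 gold entities and score well above `org`, `prod`, and `time`.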