root committed on
Commit
8f0563d
1 Parent(s): a848c07

added the model files

Files changed (6)
  1. dev.tsv +0 -0
  2. final-model.pt +3 -0
  3. loss.tsv +11 -0
  4. test.tsv +0 -0
  5. training.log +803 -0
  6. weights.txt +0 -0
dev.tsv ADDED
The diff for this file is too large to render. See raw diff
 
final-model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3e91fc16458f5843cd3a21aa03e1947a19dcbea5d3e82f32e209791e60fb6f93
+ size 2256883501
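
final-model.pt is stored through Git LFS, so the committed blob is only the three-line pointer above: the pointer spec version, the SHA-256 object id, and the object size (about 2.26 GB). A minimal sketch of checking a fetched copy against the pointer; the local pointer-file name is hypothetical:

```python
import hashlib
from pathlib import Path

def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "oid": digest, "size": int(fields["size"])}

# Hypothetical path where the pointer text was saved.
pointer = parse_lfs_pointer(Path("final-model.pt.pointer").read_text())

# Verify the downloaded object against the recorded size and SHA-256.
blob = Path("final-model.pt")
assert blob.stat().st_size == pointer["size"], "size mismatch"

h = hashlib.sha256()
with blob.open("rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
        h.update(chunk)
assert h.hexdigest() == pointer["oid"], "hash mismatch"
```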
loss.tsv ADDED
@@ -0,0 +1,11 @@
+ EPOCH TIMESTAMP BAD_EPOCHS LEARNING_RATE TRAIN_LOSS DEV_LOSS DEV_PRECISION DEV_RECALL DEV_F1 DEV_ACCURACY
+ 1 01:43:03 4 0.0000 0.7577931622146611 0.3607260286808014 0.0 0.0 0.0 0.0
+ 2 01:46:23 4 0.0000 0.3273878849018949 0.44372475147247314 0.0 0.0 0.0 0.0
+ 3 01:49:46 4 0.0000 0.3004312500567572 0.4250624477863312 0.0 0.0 0.0 0.0
+ 4 01:53:06 4 0.0000 0.28442059537854003 0.4436105787754059 0.0 0.0 0.0 0.0
+ 5 01:56:28 4 0.0000 0.27345010702887845 0.46451953053474426 0.0 0.0 0.0 0.0
+ 6 01:59:50 4 0.0000 0.258577936120499 0.5034258961677551 0.0 0.0 0.0 0.0
+ 7 02:03:11 4 0.0000 0.249647237000558 0.5326654314994812 0.0 0.0 0.0 0.0
+ 8 02:06:33 4 0.0000 0.2402628662797549 0.5238903760910034 0.0 0.0 0.0 0.0
+ 9 02:09:53 4 0.0000 0.23584941995850597 0.5382402539253235 0.0 0.0 0.0 0.0
+ 10 02:13:16 4 0.0000 0.2320775723195998 0.5321827530860901 0.0 0.0 0.0 0.0
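
loss.tsv is the per-epoch metrics table Flair writes during training (the same numbers as the DEV lines in training.log below). A quick sketch for inspecting it, assuming pandas and matplotlib are installed; note that TRAIN_LOSS falls monotonically while DEV_LOSS rises after epoch 1:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load the tab-separated metrics file committed above.
df = pd.read_csv("loss.tsv", sep="\t")

# Train loss vs. dev loss per epoch; the two curves diverge from epoch 2 on.
ax = df.plot(x="EPOCH", y=["TRAIN_LOSS", "DEV_LOSS"], marker="o")
ax.set_xlabel("epoch")
ax.set_ylabel("loss")
plt.show()
```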
test.tsv ADDED
The diff for this file is too large to render. See raw diff
 
training.log ADDED
@@ -0,0 +1,803 @@
+ 2022-04-25 01:39:43,366 ----------------------------------------------------------------------------------------------------
+ 2022-04-25 01:39:43,370 Model: "SequenceTagger(
+   (embeddings): TransformerWordEmbeddings(
+     (model): XLMRobertaModel(
+       (embeddings): RobertaEmbeddings(
+         (word_embeddings): Embedding(250002, 1024, padding_idx=1)
+         (position_embeddings): Embedding(514, 1024, padding_idx=1)
+         (token_type_embeddings): Embedding(1, 1024)
+         (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
+         (dropout): Dropout(p=0.1, inplace=False)
+       )
+       (encoder): RobertaEncoder(
+         (layer): ModuleList(
+           (0-23): 24 x RobertaLayer(
+             (attention): RobertaAttention(
+               (self): RobertaSelfAttention(
+                 (query): Linear(in_features=1024, out_features=1024, bias=True)
+                 (key): Linear(in_features=1024, out_features=1024, bias=True)
+                 (value): Linear(in_features=1024, out_features=1024, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): RobertaSelfOutput(
+                 (dense): Linear(in_features=1024, out_features=1024, bias=True)
+                 (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): RobertaIntermediate(
+               (dense): Linear(in_features=1024, out_features=4096, bias=True)
+               (intermediate_act_fn): GELUActivation()
+             )
+             (output): RobertaOutput(
+               (dense): Linear(in_features=4096, out_features=1024, bias=True)
+               (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+         )
+       )
+       (pooler): RobertaPooler(
+         (dense): Linear(in_features=1024, out_features=1024, bias=True)
+         (activation): Tanh()
+       )
+     )
+   )
+   (word_dropout): WordDropout(p=0.05)
+   (locked_dropout): LockedDropout(p=0.5)
+   (linear): Linear(in_features=1024, out_features=20, bias=True)
+   (loss_function): CrossEntropyLoss()
+ )"
+ 2022-04-25 01:39:43,372 ----------------------------------------------------------------------------------------------------
+ 2022-04-25 01:39:43,372 Corpus: "Corpus: 1820 train + 50 dev + 67 test sentences"
+ 2022-04-25 01:39:43,373 ----------------------------------------------------------------------------------------------------
+ 2022-04-25 01:39:43,374 Parameters:
+ 2022-04-25 01:39:43,374  - learning_rate: "0.000005"
+ 2022-04-25 01:39:43,375  - mini_batch_size: "4"
+ 2022-04-25 01:39:43,375  - patience: "3"
+ 2022-04-25 01:39:43,376  - anneal_factor: "0.5"
+ 2022-04-25 01:39:43,377  - max_epochs: "10"
+ 2022-04-25 01:39:43,378  - shuffle: "True"
+ 2022-04-25 01:39:43,378  - train_with_dev: "False"
+ 2022-04-25 01:39:43,379  - batch_growth_annealing: "False"
+ 2022-04-25 01:39:43,379 ----------------------------------------------------------------------------------------------------
+ 2022-04-25 01:39:43,380 Model training base path: "resources/taggers/ner_xlm_finedtuned_ck1_ft"
+ 2022-04-25 01:39:43,381 ----------------------------------------------------------------------------------------------------
+ 2022-04-25 01:39:43,381 Device: cuda:0
+ 2022-04-25 01:39:43,382 ----------------------------------------------------------------------------------------------------
+ 2022-04-25 01:39:43,382 Embeddings storage mode: none
+ 2022-04-25 01:39:43,383 ----------------------------------------------------------------------------------------------------
+ 2022-04-25 01:40:01,316 epoch 1 - iter 45/455 - loss 2.02383973 - samples/sec: 10.04 - lr: 0.000000
+ 2022-04-25 01:40:19,778 epoch 1 - iter 90/455 - loss 1.77018784 - samples/sec: 9.75 - lr: 0.000001
+ 2022-04-25 01:40:38,303 epoch 1 - iter 135/455 - loss 1.55487540 - samples/sec: 9.72 - lr: 0.000001
+ 2022-04-25 01:40:57,281 epoch 1 - iter 180/455 - loss 1.34519623 - samples/sec: 9.49 - lr: 0.000002
+ 2022-04-25 01:41:18,145 epoch 1 - iter 225/455 - loss 1.15539089 - samples/sec: 8.63 - lr: 0.000002
+ 2022-04-25 01:41:36,602 epoch 1 - iter 270/455 - loss 1.02895662 - samples/sec: 9.76 - lr: 0.000003
+ 2022-04-25 01:41:55,400 epoch 1 - iter 315/455 - loss 0.93416075 - samples/sec: 9.58 - lr: 0.000003
+ 2022-04-25 01:42:14,308 epoch 1 - iter 360/455 - loss 0.86211554 - samples/sec: 9.52 - lr: 0.000004
+ 2022-04-25 01:42:33,218 epoch 1 - iter 405/455 - loss 0.80736508 - samples/sec: 9.52 - lr: 0.000004
+ 2022-04-25 01:42:52,404 epoch 1 - iter 450/455 - loss 0.76251684 - samples/sec: 9.38 - lr: 0.000005
+ 2022-04-25 01:42:54,450 ----------------------------------------------------------------------------------------------------
+ 2022-04-25 01:42:54,452 EPOCH 1 done: loss 0.7578 - lr 0.000005
+ 2022-04-25 01:43:03,256 Evaluating as a multi-label problem: False
+ 2022-04-25 01:43:03,269 DEV : loss 0.3607260286808014 - f1-score (micro avg) 0.0
+ 2022-04-25 01:43:03,277 BAD EPOCHS (no improvement): 4
+ 2022-04-25 01:43:03,278 ----------------------------------------------------------------------------------------------------
+ 2022-04-25 01:43:22,465 epoch 2 - iter 45/455 - loss 0.35669344 - samples/sec: 9.38 - lr: 0.000005
+ 2022-04-25 01:43:41,226 epoch 2 - iter 90/455 - loss 0.33744187 - samples/sec: 9.60 - lr: 0.000005
+ 2022-04-25 01:44:00,335 epoch 2 - iter 135/455 - loss 0.33264492 - samples/sec: 9.42 - lr: 0.000005
+ 2022-04-25 01:44:19,259 epoch 2 - iter 180/455 - loss 0.33442139 - samples/sec: 9.51 - lr: 0.000005
+ 2022-04-25 01:44:37,971 epoch 2 - iter 225/455 - loss 0.33062050 - samples/sec: 9.62 - lr: 0.000005
+ 2022-04-25 01:44:56,896 epoch 2 - iter 270/455 - loss 0.32856691 - samples/sec: 9.51 - lr: 0.000005
+ 2022-04-25 01:45:17,782 epoch 2 - iter 315/455 - loss 0.32794608 - samples/sec: 8.62 - lr: 0.000005
+ 2022-04-25 01:45:36,760 epoch 2 - iter 360/455 - loss 0.32718419 - samples/sec: 9.49 - lr: 0.000005
+ 2022-04-25 01:45:55,772 epoch 2 - iter 405/455 - loss 0.32696006 - samples/sec: 9.47 - lr: 0.000005
+ 2022-04-25 01:46:15,075 epoch 2 - iter 450/455 - loss 0.32726336 - samples/sec: 9.33 - lr: 0.000004
+ 2022-04-25 01:46:17,246 ----------------------------------------------------------------------------------------------------
+ 2022-04-25 01:46:17,247 EPOCH 2 done: loss 0.3274 - lr 0.000004
+ 2022-04-25 01:46:23,646 Evaluating as a multi-label problem: False
+ 2022-04-25 01:46:23,664 DEV : loss 0.44372475147247314 - f1-score (micro avg) 0.0
+ 2022-04-25 01:46:23,675 BAD EPOCHS (no improvement): 4
+ 2022-04-25 01:46:23,676 ----------------------------------------------------------------------------------------------------
+ 2022-04-25 01:46:42,384 epoch 3 - iter 45/455 - loss 0.31045361 - samples/sec: 9.63 - lr: 0.000004
+ 2022-04-25 01:47:03,681 epoch 3 - iter 90/455 - loss 0.30688918 - samples/sec: 8.45 - lr: 0.000004
+ 2022-04-25 01:47:22,548 epoch 3 - iter 135/455 - loss 0.30176367 - samples/sec: 9.54 - lr: 0.000004
+ 2022-04-25 01:47:41,337 epoch 3 - iter 180/455 - loss 0.29894450 - samples/sec: 9.58 - lr: 0.000004
+ 2022-04-25 01:48:00,045 epoch 3 - iter 225/455 - loss 0.29867330 - samples/sec: 9.62 - lr: 0.000004
+ 2022-04-25 01:48:18,928 epoch 3 - iter 270/455 - loss 0.29997778 - samples/sec: 9.54 - lr: 0.000004
+ 2022-04-25 01:48:37,737 epoch 3 - iter 315/455 - loss 0.30151499 - samples/sec: 9.57 - lr: 0.000004
+ 2022-04-25 01:48:56,808 epoch 3 - iter 360/455 - loss 0.30030851 - samples/sec: 9.44 - lr: 0.000004
+ 2022-04-25 01:49:15,866 epoch 3 - iter 405/455 - loss 0.29995926 - samples/sec: 9.45 - lr: 0.000004
+ 2022-04-25 01:49:37,329 epoch 3 - iter 450/455 - loss 0.30000599 - samples/sec: 8.39 - lr: 0.000004
+ 2022-04-25 01:49:39,502 ----------------------------------------------------------------------------------------------------
+ 2022-04-25 01:49:39,503 EPOCH 3 done: loss 0.3004 - lr 0.000004
+ 2022-04-25 01:49:46,186 Evaluating as a multi-label problem: False
+ 2022-04-25 01:49:46,198 DEV : loss 0.4250624477863312 - f1-score (micro avg) 0.0
+ 2022-04-25 01:49:46,207 BAD EPOCHS (no improvement): 4
+ 2022-04-25 01:49:46,208 ----------------------------------------------------------------------------------------------------
+ 2022-04-25 01:50:04,886 epoch 4 - iter 45/455 - loss 0.27018579 - samples/sec: 9.64 - lr: 0.000004
+ 2022-04-25 01:50:23,747 epoch 4 - iter 90/455 - loss 0.28505798 - samples/sec: 9.55 - lr: 0.000004
+ 2022-04-25 01:50:42,591 epoch 4 - iter 135/455 - loss 0.28106699 - samples/sec: 9.55 - lr: 0.000004
+ 2022-04-25 01:51:01,834 epoch 4 - iter 180/455 - loss 0.28213592 - samples/sec: 9.36 - lr: 0.000004
+ 2022-04-25 01:51:22,523 epoch 4 - iter 225/455 - loss 0.28339344 - samples/sec: 8.70 - lr: 0.000004
+ 2022-04-25 01:51:41,984 epoch 4 - iter 270/455 - loss 0.28600075 - samples/sec: 9.25 - lr: 0.000004
+ 2022-04-25 01:52:01,001 epoch 4 - iter 315/455 - loss 0.28507349 - samples/sec: 9.47 - lr: 0.000004
+ 2022-04-25 01:52:19,572 epoch 4 - iter 360/455 - loss 0.28385244 - samples/sec: 9.70 - lr: 0.000003
+ 2022-04-25 01:52:38,471 epoch 4 - iter 405/455 - loss 0.28397099 - samples/sec: 9.53 - lr: 0.000003
+ 2022-04-25 01:52:57,371 epoch 4 - iter 450/455 - loss 0.28432390 - samples/sec: 9.53 - lr: 0.000003
+ 2022-04-25 01:52:59,489 ----------------------------------------------------------------------------------------------------
+ 2022-04-25 01:52:59,490 EPOCH 4 done: loss 0.2844 - lr 0.000003
+ 2022-04-25 01:53:06,144 Evaluating as a multi-label problem: False
+ 2022-04-25 01:53:06,157 DEV : loss 0.4436105787754059 - f1-score (micro avg) 0.0
+ 2022-04-25 01:53:06,166 BAD EPOCHS (no improvement): 4
+ 2022-04-25 01:53:06,168 ----------------------------------------------------------------------------------------------------
+ 2022-04-25 01:53:27,165 epoch 5 - iter 45/455 - loss 0.26753679 - samples/sec: 8.58 - lr: 0.000003
+ 2022-04-25 01:53:46,071 epoch 5 - iter 90/455 - loss 0.27230605 - samples/sec: 9.52 - lr: 0.000003
+ 2022-04-25 01:54:04,859 epoch 5 - iter 135/455 - loss 0.27246786 - samples/sec: 9.58 - lr: 0.000003
+ 2022-04-25 01:54:23,704 epoch 5 - iter 180/455 - loss 0.27259198 - samples/sec: 9.55 - lr: 0.000003
+ 2022-04-25 01:54:42,577 epoch 5 - iter 225/455 - loss 0.27431760 - samples/sec: 9.54 - lr: 0.000003
+ 2022-04-25 01:55:01,271 epoch 5 - iter 270/455 - loss 0.27392484 - samples/sec: 9.63 - lr: 0.000003
+ 2022-04-25 01:55:20,066 epoch 5 - iter 315/455 - loss 0.27357625 - samples/sec: 9.58 - lr: 0.000003
+ 2022-04-25 01:55:39,125 epoch 5 - iter 360/455 - loss 0.27202662 - samples/sec: 9.45 - lr: 0.000003
+ 2022-04-25 01:55:57,915 epoch 5 - iter 405/455 - loss 0.27381644 - samples/sec: 9.58 - lr: 0.000003
+ 2022-04-25 01:56:19,310 epoch 5 - iter 450/455 - loss 0.27384803 - samples/sec: 8.42 - lr: 0.000003
+ 2022-04-25 01:56:21,405 ----------------------------------------------------------------------------------------------------
+ 2022-04-25 01:56:21,405 EPOCH 5 done: loss 0.2735 - lr 0.000003
+ 2022-04-25 01:56:27,996 Evaluating as a multi-label problem: False
+ 2022-04-25 01:56:28,008 DEV : loss 0.46451953053474426 - f1-score (micro avg) 0.0
+ 2022-04-25 01:56:28,017 BAD EPOCHS (no improvement): 4
+ 2022-04-25 01:56:28,018 ----------------------------------------------------------------------------------------------------
+ 2022-04-25 01:56:46,994 epoch 6 - iter 45/455 - loss 0.26238774 - samples/sec: 9.49 - lr: 0.000003
+ 2022-04-25 01:57:06,067 epoch 6 - iter 90/455 - loss 0.26228525 - samples/sec: 9.44 - lr: 0.000003
+ 2022-04-25 01:57:25,103 epoch 6 - iter 135/455 - loss 0.26298919 - samples/sec: 9.46 - lr: 0.000003
+ 2022-04-25 01:57:45,904 epoch 6 - iter 180/455 - loss 0.26033810 - samples/sec: 8.66 - lr: 0.000003
+ 2022-04-25 01:58:04,752 epoch 6 - iter 225/455 - loss 0.25980613 - samples/sec: 9.55 - lr: 0.000003
+ 2022-04-25 01:58:23,635 epoch 6 - iter 270/455 - loss 0.25741937 - samples/sec: 9.53 - lr: 0.000002
+ 2022-04-25 01:58:42,770 epoch 6 - iter 315/455 - loss 0.25761401 - samples/sec: 9.41 - lr: 0.000002
+ 2022-04-25 01:59:01,669 epoch 6 - iter 360/455 - loss 0.25802951 - samples/sec: 9.53 - lr: 0.000002
+ 2022-04-25 01:59:20,507 epoch 6 - iter 405/455 - loss 0.25786031 - samples/sec: 9.56 - lr: 0.000002
+ 2022-04-25 01:59:39,104 epoch 6 - iter 450/455 - loss 0.25875289 - samples/sec: 9.68 - lr: 0.000002
+ 2022-04-25 01:59:41,245 ----------------------------------------------------------------------------------------------------
+ 2022-04-25 01:59:41,247 EPOCH 6 done: loss 0.2586 - lr 0.000002
+ 2022-04-25 01:59:50,159 Evaluating as a multi-label problem: False
+ 2022-04-25 01:59:50,176 DEV : loss 0.5034258961677551 - f1-score (micro avg) 0.0
+ 2022-04-25 01:59:50,186 BAD EPOCHS (no improvement): 4
+ 2022-04-25 01:59:50,188 ----------------------------------------------------------------------------------------------------
+ 2022-04-25 02:00:09,428 epoch 7 - iter 45/455 - loss 0.25272579 - samples/sec: 9.36 - lr: 0.000002
+ 2022-04-25 02:00:28,674 epoch 7 - iter 90/455 - loss 0.24877335 - samples/sec: 9.35 - lr: 0.000002
+ 2022-04-25 02:00:47,419 epoch 7 - iter 135/455 - loss 0.25029754 - samples/sec: 9.61 - lr: 0.000002
+ 2022-04-25 02:01:06,330 epoch 7 - iter 180/455 - loss 0.24783496 - samples/sec: 9.52 - lr: 0.000002
+ 2022-04-25 02:01:25,050 epoch 7 - iter 225/455 - loss 0.24702442 - samples/sec: 9.62 - lr: 0.000002
+ 2022-04-25 02:01:43,981 epoch 7 - iter 270/455 - loss 0.24574698 - samples/sec: 9.51 - lr: 0.000002
+ 2022-04-25 02:02:02,729 epoch 7 - iter 315/455 - loss 0.24814380 - samples/sec: 9.60 - lr: 0.000002
+ 2022-04-25 02:02:24,035 epoch 7 - iter 360/455 - loss 0.24891601 - samples/sec: 8.45 - lr: 0.000002
+ 2022-04-25 02:02:43,529 epoch 7 - iter 405/455 - loss 0.24938588 - samples/sec: 9.24 - lr: 0.000002
+ 2022-04-25 02:03:02,611 epoch 7 - iter 450/455 - loss 0.24975402 - samples/sec: 9.44 - lr: 0.000002
+ 2022-04-25 02:03:04,674 ----------------------------------------------------------------------------------------------------
+ 2022-04-25 02:03:04,675 EPOCH 7 done: loss 0.2496 - lr 0.000002
+ 2022-04-25 02:03:11,014 Evaluating as a multi-label problem: False
+ 2022-04-25 02:03:11,028 DEV : loss 0.5326654314994812 - f1-score (micro avg) 0.0
+ 2022-04-25 02:03:11,037 BAD EPOCHS (no improvement): 4
+ 2022-04-25 02:03:11,039 ----------------------------------------------------------------------------------------------------
+ 2022-04-25 02:03:29,928 epoch 8 - iter 45/455 - loss 0.23902515 - samples/sec: 9.53 - lr: 0.000002
+ 2022-04-25 02:03:48,547 epoch 8 - iter 90/455 - loss 0.24182299 - samples/sec: 9.67 - lr: 0.000002
+ 2022-04-25 02:04:09,761 epoch 8 - iter 135/455 - loss 0.23794694 - samples/sec: 8.49 - lr: 0.000002
+ 2022-04-25 02:04:28,820 epoch 8 - iter 180/455 - loss 0.23901632 - samples/sec: 9.45 - lr: 0.000001
+ 2022-04-25 02:04:47,476 epoch 8 - iter 225/455 - loss 0.24089284 - samples/sec: 9.65 - lr: 0.000001
+ 2022-04-25 02:05:06,576 epoch 8 - iter 270/455 - loss 0.24050137 - samples/sec: 9.43 - lr: 0.000001
+ 2022-04-25 02:05:25,230 epoch 8 - iter 315/455 - loss 0.24061046 - samples/sec: 9.65 - lr: 0.000001
+ 2022-04-25 02:05:43,780 epoch 8 - iter 360/455 - loss 0.24122314 - samples/sec: 9.71 - lr: 0.000001
+ 2022-04-25 02:06:03,140 epoch 8 - iter 405/455 - loss 0.24068138 - samples/sec: 9.30 - lr: 0.000001
+ 2022-04-25 02:06:22,289 epoch 8 - iter 450/455 - loss 0.24028428 - samples/sec: 9.40 - lr: 0.000001
+ 2022-04-25 02:06:24,348 ----------------------------------------------------------------------------------------------------
+ 2022-04-25 02:06:24,350 EPOCH 8 done: loss 0.2403 - lr 0.000001
+ 2022-04-25 02:06:33,470 Evaluating as a multi-label problem: False
+ 2022-04-25 02:06:33,485 DEV : loss 0.5238903760910034 - f1-score (micro avg) 0.0
+ 2022-04-25 02:06:33,495 BAD EPOCHS (no improvement): 4
+ 2022-04-25 02:06:33,497 ----------------------------------------------------------------------------------------------------
+ 2022-04-25 02:06:52,645 epoch 9 - iter 45/455 - loss 0.22659045 - samples/sec: 9.40 - lr: 0.000001
+ 2022-04-25 02:07:11,647 epoch 9 - iter 90/455 - loss 0.23007686 - samples/sec: 9.48 - lr: 0.000001
+ 2022-04-25 02:07:30,432 epoch 9 - iter 135/455 - loss 0.23182102 - samples/sec: 9.59 - lr: 0.000001
+ 2022-04-25 02:07:49,161 epoch 9 - iter 180/455 - loss 0.23484638 - samples/sec: 9.61 - lr: 0.000001
+ 2022-04-25 02:08:08,185 epoch 9 - iter 225/455 - loss 0.23575341 - samples/sec: 9.46 - lr: 0.000001
+ 2022-04-25 02:08:29,084 epoch 9 - iter 270/455 - loss 0.23430629 - samples/sec: 8.62 - lr: 0.000001
+ 2022-04-25 02:08:48,058 epoch 9 - iter 315/455 - loss 0.23511980 - samples/sec: 9.49 - lr: 0.000001
+ 2022-04-25 02:09:07,055 epoch 9 - iter 360/455 - loss 0.23591144 - samples/sec: 9.48 - lr: 0.000001
+ 2022-04-25 02:09:25,960 epoch 9 - iter 405/455 - loss 0.23587694 - samples/sec: 9.52 - lr: 0.000001
+ 2022-04-25 02:09:45,046 epoch 9 - iter 450/455 - loss 0.23596768 - samples/sec: 9.43 - lr: 0.000001
+ 2022-04-25 02:09:47,133 ----------------------------------------------------------------------------------------------------
+ 2022-04-25 02:09:47,134 EPOCH 9 done: loss 0.2358 - lr 0.000001
+ 2022-04-25 02:09:53,727 Evaluating as a multi-label problem: False
+ 2022-04-25 02:09:53,740 DEV : loss 0.5382402539253235 - f1-score (micro avg) 0.0
+ 2022-04-25 02:09:53,749 BAD EPOCHS (no improvement): 4
+ 2022-04-25 02:09:53,750 ----------------------------------------------------------------------------------------------------
+ 2022-04-25 02:10:14,720 epoch 10 - iter 45/455 - loss 0.22667111 - samples/sec: 8.59 - lr: 0.000001
+ 2022-04-25 02:10:34,134 epoch 10 - iter 90/455 - loss 0.22673460 - samples/sec: 9.27 - lr: 0.000000
+ 2022-04-25 02:10:53,154 epoch 10 - iter 135/455 - loss 0.22714280 - samples/sec: 9.47 - lr: 0.000000
+ 2022-04-25 02:11:12,101 epoch 10 - iter 180/455 - loss 0.22947185 - samples/sec: 9.50 - lr: 0.000000
+ 2022-04-25 02:11:30,855 epoch 10 - iter 225/455 - loss 0.23026782 - samples/sec: 9.60 - lr: 0.000000
+ 2022-04-25 02:11:49,560 epoch 10 - iter 270/455 - loss 0.23211704 - samples/sec: 9.63 - lr: 0.000000
+ 2022-04-25 02:12:08,468 epoch 10 - iter 315/455 - loss 0.23132383 - samples/sec: 9.52 - lr: 0.000000
+ 2022-04-25 02:12:27,224 epoch 10 - iter 360/455 - loss 0.23094819 - samples/sec: 9.60 - lr: 0.000000
+ 2022-04-25 02:12:46,168 epoch 10 - iter 405/455 - loss 0.23152902 - samples/sec: 9.50 - lr: 0.000000
+ 2022-04-25 02:13:07,714 epoch 10 - iter 450/455 - loss 0.23243307 - samples/sec: 8.36 - lr: 0.000000
+ 2022-04-25 02:13:09,804 ----------------------------------------------------------------------------------------------------
+ 2022-04-25 02:13:09,806 EPOCH 10 done: loss 0.2321 - lr 0.000000
+ 2022-04-25 02:13:16,510 Evaluating as a multi-label problem: False
+ 2022-04-25 02:13:16,522 DEV : loss 0.5321827530860901 - f1-score (micro avg) 0.0
+ 2022-04-25 02:13:16,531 BAD EPOCHS (no improvement): 4
+ 2022-04-25 02:13:19,604 ----------------------------------------------------------------------------------------------------
+ 2022-04-25 02:13:19,607 Testing using last state of model ...
+ 2022-04-25 02:13:30,230 Evaluating as a multi-label problem: False
+ 2022-04-25 02:13:30,247 0.0 0.0 0.0 0.0
+ 2022-04-25 02:13:30,248
+ Results:
+ - F-score (micro) 0.0
+ - F-score (macro) 0.0
+ - Accuracy 0.0
+
+ By class:
+               precision    recall  f1-score   support
+
+        <unk>     0.0000    0.0000    0.0000       0.0
+          ORG     0.0000    0.0000    0.0000     687.0
+          LOC     0.0000    0.0000    0.0000     304.0
+         PENT     0.0000    0.0000    0.0000       6.0
+
+    micro avg     0.0000    0.0000    0.0000     997.0
+    macro avg     0.0000    0.0000    0.0000     997.0
+ weighted avg     0.0000    0.0000    0.0000     997.0
+
+ 2022-04-25 02:13:30,248 ----------------------------------------------------------------------------------------------------
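
Note that dev and test micro-F1 stayed at 0.0 for all ten epochs even though training loss fell steadily, so the committed checkpoint most likely never produced valid entity spans (possibly a label-scheme or tag-column mismatch). Before reusing final-model.pt, it is worth loading it and inspecting raw predictions; a minimal sketch with Flair's standard load/predict API, where the "ner" label type and the sample sentence are assumptions:

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# Load the checkpoint committed above (fetch the LFS object first).
tagger = SequenceTagger.load("final-model.pt")

sentence = Sentence("George Washington went to Washington.")
tagger.predict(sentence)

# An empty result here would be consistent with the 0.0 F1 reported in the log.
for span in sentence.get_spans("ner"):
    print(span.text, span.tag, span.score)
```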
weights.txt ADDED
File without changes