Benedikt Fuchs committed
Commit 90f0ce2
1 Parent(s): 6a1a187
Files changed (5)
  1. README.md +74 -1
  2. loss.tsv +11 -0
  3. model_args.bin +3 -0
  4. pytorch_model.bin +3 -0
  5. training.log +515 -0
README.md CHANGED
@@ -1,3 +1,76 @@
  ---
- license: mit
+ tags:
+ - flair
+ - token-classification
+ - sequence-tagger-model
+ language: en
+ datasets:
+ - conll2003
+ widget:
+ - text: "George Washington went to Washington"
  ---
+
+ This is a very small model I use for testing my [ner eval dashboard](https://github.com/helpmefindaname/ner-eval-dashboard).
+
+ F1-Score: **48.73** (corrected CoNLL-03)
+
+ Predicts 4 tags:
+
+ | **tag** | **meaning**       |
+ |---------|-------------------|
+ | PER     | person name       |
+ | LOC     | location name     |
+ | ORG     | organization name |
+ | MISC    | other name        |
+
+ Based on document-level XLM-R embeddings and [FLERT](https://arxiv.org/pdf/2011.06993v1.pdf).
+
+ ---
+
+ ### Demo: How to use in Flair
+
+ Requires: **[Flair](https://github.com/flairNLP/flair/)** (`pip install flair`)
+
+ ```python
+ from flair.data import Sentence
+ from flair.models import SequenceTagger
+ # load tagger
+ tagger = SequenceTagger.load("flair/ner-english-large")
+ # make example sentence
+ sentence = Sentence("George Washington went to Washington")
+ # predict NER tags
+ tagger.predict(sentence)
+ # print sentence
+ print(sentence)
+ # print predicted NER spans
+ print('The following NER tags are found:')
+ # iterate over entities and print
+ for entity in sentence.get_spans('ner'):
+     print(entity)
+ ```
+
+ This yields the following output:
+ ```
+ Span [1,2]: "George Washington" [− Labels: PER (1.0)]
+ Span [5]: "Washington" [− Labels: LOC (1.0)]
+ ```
+
+ So, the entities "*George Washington*" (labeled as a **person**) and "*Washington*" (labeled as a **location**) are found in the sentence "*George Washington went to Washington*".
+
+ ---
+
+ ### Training: Script to train this model
+
+ The following command was used to train this model, where `examples\ner\run_ner.py` refers to [this script](https://github.com/flairNLP/flair/blob/master/examples/ner/run_ner.py):
+
+ ```
+ python examples\ner\run_ner.py --model_name_or_path hf-internal-testing/tiny-random-bert --dataset_name CONLL_03 --learning_rate 0.002 --mini_batch_chunk_size 1024 --batch_size 64 --num_epochs 100
+ ```
+
+ ---
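Note: for orientation, the flags above map roughly onto Flair's fine-tuning API. The sketch below illustrates that mapping; it is not the committed script. `run_ner.py` defines the actual argument handling, so details such as `label_type="ner"` and the `SequenceTagger` settings (no CRF, no RNN) are assumptions based on Flair's usual transformer fine-tuning setup.

```python
# A minimal sketch of what the run_ner.py flags roughly correspond to in Flair.
# Assumes a flair >= 0.11 API; the actual script may wire things differently.
from flair.datasets import CONLL_03
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

corpus = CONLL_03()  # --dataset_name CONLL_03
label_dictionary = corpus.make_label_dictionary(label_type="ner")

# --model_name_or_path hf-internal-testing/tiny-random-bert
embeddings = TransformerWordEmbeddings(
    model="hf-internal-testing/tiny-random-bert",
    fine_tune=True,
)

tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=label_dictionary,
    tag_type="ner",
    use_crf=False,  # assumption: plain softmax head, as in FLERT-style fine-tuning
    use_rnn=False,
    reproject_embeddings=False,
)

trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "resources/taggers/ner",
    learning_rate=0.002,         # --learning_rate
    mini_batch_size=64,          # --batch_size
    mini_batch_chunk_size=1024,  # --mini_batch_chunk_size
    max_epochs=100,              # --num_epochs
)
```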
loss.tsv ADDED
@@ -0,0 +1,11 @@
+ EPOCH TIMESTAMP BAD_EPOCHS LEARNING_RATE TRAIN_LOSS DEV_LOSS DEV_PRECISION DEV_RECALL DEV_F1 DEV_ACCURACY
+ 1 23:44:02 4 0.0001 0.5328251830571181 0.08702843636274338 0.8993 0.9091 0.9042 0.8614
+ 2 23:47:24 4 0.0000 0.1937733137503905 0.06405811011791229 0.9344 0.9377 0.9361 0.9075
+ 3 23:51:01 4 0.0000 0.16199104815067825 0.06513667851686478 0.9462 0.9463 0.9462 0.9207
+ 4 23:54:37 4 0.0000 0.15143144882149487 0.0851067453622818 0.9446 0.9443 0.9445 0.9183
+ 5 23:58:11 4 0.0000 0.14078483339379705 0.07939312607049942 0.9478 0.9527 0.9502 0.9253
+ 6 00:01:40 4 0.0000 0.1396117310001689 0.08579559624195099 0.9455 0.9539 0.9497 0.9269
+ 7 00:05:08 4 0.0000 0.1330616688582298 0.09259101003408432 0.9488 0.9542 0.9515 0.9292
+ 8 00:08:36 4 0.0000 0.13271965400214392 0.09469996392726898 0.9485 0.9524 0.9505 0.928
+ 9 00:11:57 4 0.0000 0.1294400242274288 0.09501232951879501 0.9475 0.9534 0.9504 0.9272
+ 10 00:15:27 4 0.0000 0.1298187901297082 0.09416753053665161 0.9488 0.9539 0.9513 0.9283
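Note: loss.tsv is the per-epoch metrics file Flair's trainer writes during training. A minimal sketch for inspecting it, assuming the file is tab-separated (as the extension suggests) and that pandas and matplotlib are available:

```python
# A minimal sketch: plot train/dev loss and dev F1 per epoch from loss.tsv.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("loss.tsv", sep="\t")

fig, loss_ax = plt.subplots()
loss_ax.plot(df["EPOCH"], df["TRAIN_LOSS"], label="train loss")
loss_ax.plot(df["EPOCH"], df["DEV_LOSS"], label="dev loss")
loss_ax.set_xlabel("epoch")
loss_ax.set_ylabel("loss")
loss_ax.legend(loc="center right")

f1_ax = loss_ax.twinx()  # second y-axis so the F1 scale stays readable
f1_ax.plot(df["EPOCH"], df["DEV_F1"], color="green", label="dev F1")
f1_ax.set_ylabel("dev F1")

plt.show()
```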
model_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9565db4d50e0dbef463f52b139b5a5974a8da3ef61c0c825f71958641df9c393
+ size 241
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:744952f8978643963cf3c1fb70cfb5ebfce5c5ad3ae24b56377fb6f20636b5d9
+ size 434073325
training.log ADDED
@@ -0,0 +1,515 @@
+ 2022-05-09 23:40:59,402 ----------------------------------------------------------------------------------------------------
+ 2022-05-09 23:40:59,404 Model: "SequenceTagger(
+   (embeddings): TransformerWordEmbeddings(
+     (model): BertModel(
+       (embeddings): BertEmbeddings(
+         (word_embeddings): Embedding(28996, 768, padding_idx=0)
+         (position_embeddings): Embedding(512, 768)
+         (token_type_embeddings): Embedding(2, 768)
+         (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+         (dropout): Dropout(p=0.1, inplace=False)
+       )
+       (encoder): BertEncoder(
+         (layer): ModuleList(
+           (0): BertLayer(
+             (attention): BertAttention(
+               (self): BertSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): BertSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): BertIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+               (intermediate_act_fn): GELUActivation()
+             )
+             (output): BertOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+           (1): BertLayer(
+             (attention): BertAttention(
+               (self): BertSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): BertSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): BertIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+               (intermediate_act_fn): GELUActivation()
+             )
+             (output): BertOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+           (2): BertLayer(
+             (attention): BertAttention(
+               (self): BertSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): BertSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): BertIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+               (intermediate_act_fn): GELUActivation()
+             )
+             (output): BertOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+           (3): BertLayer(
+             (attention): BertAttention(
+               (self): BertSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): BertSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): BertIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+               (intermediate_act_fn): GELUActivation()
+             )
+             (output): BertOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+           (4): BertLayer(
+             (attention): BertAttention(
+               (self): BertSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): BertSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): BertIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+               (intermediate_act_fn): GELUActivation()
+             )
+             (output): BertOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+           (5): BertLayer(
+             (attention): BertAttention(
+               (self): BertSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): BertSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): BertIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+               (intermediate_act_fn): GELUActivation()
+             )
+             (output): BertOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+           (6): BertLayer(
+             (attention): BertAttention(
+               (self): BertSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): BertSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): BertIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+               (intermediate_act_fn): GELUActivation()
+             )
+             (output): BertOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+           (7): BertLayer(
+             (attention): BertAttention(
+               (self): BertSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): BertSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): BertIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+               (intermediate_act_fn): GELUActivation()
+             )
+             (output): BertOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+           (8): BertLayer(
+             (attention): BertAttention(
+               (self): BertSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): BertSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): BertIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+               (intermediate_act_fn): GELUActivation()
+             )
+             (output): BertOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+           (9): BertLayer(
+             (attention): BertAttention(
+               (self): BertSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): BertSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): BertIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+               (intermediate_act_fn): GELUActivation()
+             )
+             (output): BertOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+           (10): BertLayer(
+             (attention): BertAttention(
+               (self): BertSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): BertSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): BertIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+               (intermediate_act_fn): GELUActivation()
+             )
+             (output): BertOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+           (11): BertLayer(
+             (attention): BertAttention(
+               (self): BertSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): BertSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): BertIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+               (intermediate_act_fn): GELUActivation()
+             )
+             (output): BertOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+         )
+       )
+       (pooler): BertPooler(
+         (dense): Linear(in_features=768, out_features=768, bias=True)
+         (activation): Tanh()
+       )
+     )
+   )
+   (word_dropout): WordDropout(p=0.05)
+   (locked_dropout): LockedDropout(p=0.5)
+   (linear): Linear(in_features=768, out_features=17, bias=True)
+   (loss_function): CrossEntropyLoss()
+ )"
+ 2022-05-09 23:40:59,408 ----------------------------------------------------------------------------------------------------
+ 2022-05-09 23:40:59,408 Corpus: "Corpus: 14987 train + 3466 dev + 3684 test sentences"
+ 2022-05-09 23:40:59,408 ----------------------------------------------------------------------------------------------------
+ 2022-05-09 23:40:59,408 Parameters:
+ 2022-05-09 23:40:59,408  - learning_rate: "0.000050"
+ 2022-05-09 23:40:59,408  - mini_batch_size: "16"
+ 2022-05-09 23:40:59,408  - patience: "3"
+ 2022-05-09 23:40:59,409  - anneal_factor: "0.5"
+ 2022-05-09 23:40:59,409  - max_epochs: "10"
+ 2022-05-09 23:40:59,409  - shuffle: "True"
+ 2022-05-09 23:40:59,409  - train_with_dev: "False"
+ 2022-05-09 23:40:59,409  - batch_growth_annealing: "False"
+ 2022-05-09 23:40:59,409 ----------------------------------------------------------------------------------------------------
+ 2022-05-09 23:40:59,409 Model training base path: "resources\taggers\ner"
+ 2022-05-09 23:40:59,409 ----------------------------------------------------------------------------------------------------
+ 2022-05-09 23:40:59,409 Device: cuda:0
+ 2022-05-09 23:40:59,410 ----------------------------------------------------------------------------------------------------
+ 2022-05-09 23:40:59,410 Embeddings storage mode: none
+ 2022-05-09 23:40:59,410 ----------------------------------------------------------------------------------------------------
+ 2022-05-09 23:41:15,820 epoch 1 - iter 93/937 - loss 2.04152065 - samples/sec: 90.73 - lr: 0.000005
+ 2022-05-09 23:41:31,406 epoch 1 - iter 186/937 - loss 1.48569545 - samples/sec: 95.52 - lr: 0.000010
+ 2022-05-09 23:41:46,603 epoch 1 - iter 279/937 - loss 1.18645416 - samples/sec: 97.92 - lr: 0.000015
+ 2022-05-09 23:42:01,525 epoch 1 - iter 372/937 - loss 1.01481547 - samples/sec: 99.74 - lr: 0.000020
+ 2022-05-09 23:42:16,869 epoch 1 - iter 465/937 - loss 0.86894115 - samples/sec: 97.01 - lr: 0.000025
+ 2022-05-09 23:42:32,505 epoch 1 - iter 558/937 - loss 0.75848951 - samples/sec: 95.21 - lr: 0.000030
+ 2022-05-09 23:42:48,889 epoch 1 - iter 651/937 - loss 0.68004440 - samples/sec: 90.87 - lr: 0.000035
+ 2022-05-09 23:43:05,305 epoch 1 - iter 744/937 - loss 0.62468227 - samples/sec: 90.67 - lr: 0.000040
+ 2022-05-09 23:43:22,552 epoch 1 - iter 837/937 - loss 0.57575609 - samples/sec: 86.33 - lr: 0.000045
+ 2022-05-09 23:43:40,505 epoch 1 - iter 930/937 - loss 0.53467358 - samples/sec: 82.91 - lr: 0.000050
+ 2022-05-09 23:43:41,669 ----------------------------------------------------------------------------------------------------
+ 2022-05-09 23:43:41,670 EPOCH 1 done: loss 0.5328 - lr 0.000050
+ 2022-05-09 23:44:01,944 Evaluating as a multi-label problem: False
+ 2022-05-09 23:44:01,998 DEV : loss 0.08702843636274338 - f1-score (micro avg) 0.9042
+ 2022-05-09 23:44:02,088 BAD EPOCHS (no improvement): 4
+ 2022-05-09 23:44:02,089 ----------------------------------------------------------------------------------------------------
+ 2022-05-09 23:44:19,412 epoch 2 - iter 93/937 - loss 0.21171218 - samples/sec: 85.94 - lr: 0.000049
+ 2022-05-09 23:44:39,339 epoch 2 - iter 186/937 - loss 0.20667256 - samples/sec: 74.71 - lr: 0.000049
+ 2022-05-09 23:44:57,325 epoch 2 - iter 279/937 - loss 0.20359662 - samples/sec: 82.76 - lr: 0.000048
+ 2022-05-09 23:45:15,903 epoch 2 - iter 372/937 - loss 0.20181902 - samples/sec: 80.11 - lr: 0.000048
+ 2022-05-09 23:45:33,625 epoch 2 - iter 465/937 - loss 0.20239195 - samples/sec: 84.00 - lr: 0.000047
+ 2022-05-09 23:45:51,983 epoch 2 - iter 558/937 - loss 0.20029145 - samples/sec: 81.07 - lr: 0.000047
+ 2022-05-09 23:46:10,178 epoch 2 - iter 651/937 - loss 0.19802516 - samples/sec: 81.82 - lr: 0.000046
+ 2022-05-09 23:46:27,567 epoch 2 - iter 744/937 - loss 0.19751023 - samples/sec: 85.60 - lr: 0.000046
+ 2022-05-09 23:46:46,030 epoch 2 - iter 837/937 - loss 0.19578745 - samples/sec: 80.62 - lr: 0.000045
+ 2022-05-09 23:47:03,838 epoch 2 - iter 930/937 - loss 0.19400286 - samples/sec: 83.60 - lr: 0.000044
+ 2022-05-09 23:47:05,067 ----------------------------------------------------------------------------------------------------
+ 2022-05-09 23:47:05,067 EPOCH 2 done: loss 0.1938 - lr 0.000044
+ 2022-05-09 23:47:24,009 Evaluating as a multi-label problem: False
+ 2022-05-09 23:47:24,058 DEV : loss 0.06405811011791229 - f1-score (micro avg) 0.9361
+ 2022-05-09 23:47:24,143 BAD EPOCHS (no improvement): 4
+ 2022-05-09 23:47:24,144 ----------------------------------------------------------------------------------------------------
+ 2022-05-09 23:47:43,087 epoch 3 - iter 93/937 - loss 0.17145472 - samples/sec: 78.59 - lr: 0.000044
+ 2022-05-09 23:48:02,729 epoch 3 - iter 186/937 - loss 0.16975910 - samples/sec: 75.78 - lr: 0.000043
+ 2022-05-09 23:48:22,058 epoch 3 - iter 279/937 - loss 0.16698979 - samples/sec: 77.00 - lr: 0.000043
+ 2022-05-09 23:48:42,011 epoch 3 - iter 372/937 - loss 0.16408423 - samples/sec: 74.60 - lr: 0.000042
+ 2022-05-09 23:49:02,832 epoch 3 - iter 465/937 - loss 0.16405058 - samples/sec: 71.49 - lr: 0.000042
+ 2022-05-09 23:49:24,164 epoch 3 - iter 558/937 - loss 0.16308247 - samples/sec: 69.79 - lr: 0.000041
+ 2022-05-09 23:49:44,385 epoch 3 - iter 651/937 - loss 0.16211092 - samples/sec: 73.61 - lr: 0.000041
+ 2022-05-09 23:50:05,176 epoch 3 - iter 744/937 - loss 0.16230919 - samples/sec: 71.59 - lr: 0.000040
+ 2022-05-09 23:50:24,259 epoch 3 - iter 837/937 - loss 0.16223568 - samples/sec: 78.01 - lr: 0.000039
+ 2022-05-09 23:50:42,702 epoch 3 - iter 930/937 - loss 0.16166223 - samples/sec: 80.71 - lr: 0.000039
+ 2022-05-09 23:50:43,928 ----------------------------------------------------------------------------------------------------
+ 2022-05-09 23:50:43,928 EPOCH 3 done: loss 0.1620 - lr 0.000039
+ 2022-05-09 23:51:01,357 Evaluating as a multi-label problem: False
+ 2022-05-09 23:51:01,410 DEV : loss 0.06513667851686478 - f1-score (micro avg) 0.9462
+ 2022-05-09 23:51:01,494 BAD EPOCHS (no improvement): 4
+ 2022-05-09 23:51:01,495 ----------------------------------------------------------------------------------------------------
+ 2022-05-09 23:51:19,373 epoch 4 - iter 93/937 - loss 0.14617156 - samples/sec: 83.28 - lr: 0.000038
+ 2022-05-09 23:51:39,862 epoch 4 - iter 186/937 - loss 0.15318927 - samples/sec: 72.64 - lr: 0.000038
+ 2022-05-09 23:51:58,633 epoch 4 - iter 279/937 - loss 0.15311397 - samples/sec: 79.31 - lr: 0.000037
+ 2022-05-09 23:52:17,782 epoch 4 - iter 372/937 - loss 0.15237270 - samples/sec: 77.73 - lr: 0.000037
+ 2022-05-09 23:52:37,756 epoch 4 - iter 465/937 - loss 0.15252893 - samples/sec: 74.51 - lr: 0.000036
+ 2022-05-09 23:52:57,040 epoch 4 - iter 558/937 - loss 0.15296964 - samples/sec: 77.19 - lr: 0.000036
+ 2022-05-09 23:53:17,120 epoch 4 - iter 651/937 - loss 0.15177070 - samples/sec: 74.12 - lr: 0.000035
+ 2022-05-09 23:53:36,789 epoch 4 - iter 744/937 - loss 0.15212670 - samples/sec: 75.67 - lr: 0.000034
+ 2022-05-09 23:53:55,789 epoch 4 - iter 837/937 - loss 0.15188826 - samples/sec: 78.35 - lr: 0.000034
+ 2022-05-09 23:54:15,078 epoch 4 - iter 930/937 - loss 0.15158585 - samples/sec: 77.16 - lr: 0.000033
+ 2022-05-09 23:54:16,427 ----------------------------------------------------------------------------------------------------
+ 2022-05-09 23:54:16,428 EPOCH 4 done: loss 0.1514 - lr 0.000033
+ 2022-05-09 23:54:37,613 Evaluating as a multi-label problem: False
+ 2022-05-09 23:54:37,666 DEV : loss 0.0851067453622818 - f1-score (micro avg) 0.9445
+ 2022-05-09 23:54:37,758 BAD EPOCHS (no improvement): 4
+ 2022-05-09 23:54:37,759 ----------------------------------------------------------------------------------------------------
+ 2022-05-09 23:54:57,548 epoch 5 - iter 93/937 - loss 0.13786995 - samples/sec: 75.23 - lr: 0.000033
+ 2022-05-09 23:55:17,232 epoch 5 - iter 186/937 - loss 0.14230070 - samples/sec: 75.62 - lr: 0.000032
+ 2022-05-09 23:55:36,628 epoch 5 - iter 279/937 - loss 0.14258916 - samples/sec: 76.74 - lr: 0.000032
+ 2022-05-09 23:55:56,340 epoch 5 - iter 372/937 - loss 0.14284130 - samples/sec: 75.52 - lr: 0.000031
+ 2022-05-09 23:56:15,854 epoch 5 - iter 465/937 - loss 0.14169986 - samples/sec: 76.27 - lr: 0.000031
+ 2022-05-09 23:56:34,410 epoch 5 - iter 558/937 - loss 0.14100332 - samples/sec: 80.21 - lr: 0.000030
+ 2022-05-09 23:56:53,730 epoch 5 - iter 651/937 - loss 0.14139534 - samples/sec: 77.04 - lr: 0.000029
+ 2022-05-09 23:57:12,846 epoch 5 - iter 744/937 - loss 0.14072810 - samples/sec: 77.88 - lr: 0.000029
+ 2022-05-09 23:57:32,509 epoch 5 - iter 837/937 - loss 0.13972343 - samples/sec: 75.72 - lr: 0.000028
+ 2022-05-09 23:57:51,218 epoch 5 - iter 930/937 - loss 0.14088149 - samples/sec: 79.56 - lr: 0.000028
+ 2022-05-09 23:57:52,684 ----------------------------------------------------------------------------------------------------
+ 2022-05-09 23:57:52,685 EPOCH 5 done: loss 0.1408 - lr 0.000028
+ 2022-05-09 23:58:11,005 Evaluating as a multi-label problem: False
+ 2022-05-09 23:58:11,060 DEV : loss 0.07939312607049942 - f1-score (micro avg) 0.9502
+ 2022-05-09 23:58:11,147 BAD EPOCHS (no improvement): 4
+ 2022-05-09 23:58:11,148 ----------------------------------------------------------------------------------------------------
+ 2022-05-09 23:58:29,830 epoch 6 - iter 93/937 - loss 0.13587072 - samples/sec: 79.69 - lr: 0.000027
+ 2022-05-09 23:58:48,422 epoch 6 - iter 186/937 - loss 0.13733201 - samples/sec: 80.06 - lr: 0.000027
+ 2022-05-09 23:59:06,303 epoch 6 - iter 279/937 - loss 0.14061270 - samples/sec: 83.23 - lr: 0.000026
+ 2022-05-09 23:59:24,586 epoch 6 - iter 372/937 - loss 0.13957657 - samples/sec: 81.44 - lr: 0.000026
+ 2022-05-09 23:59:43,413 epoch 6 - iter 465/937 - loss 0.13980319 - samples/sec: 79.05 - lr: 0.000025
+ 2022-05-10 00:00:01,871 epoch 6 - iter 558/937 - loss 0.13997926 - samples/sec: 80.63 - lr: 0.000024
+ 2022-05-10 00:00:19,776 epoch 6 - iter 651/937 - loss 0.13934109 - samples/sec: 83.13 - lr: 0.000024
+ 2022-05-10 00:00:38,921 epoch 6 - iter 744/937 - loss 0.13935470 - samples/sec: 77.75 - lr: 0.000023
+ 2022-05-10 00:00:57,515 epoch 6 - iter 837/937 - loss 0.13944998 - samples/sec: 80.07 - lr: 0.000023
+ 2022-05-10 00:01:15,467 epoch 6 - iter 930/937 - loss 0.13962343 - samples/sec: 82.92 - lr: 0.000022
+ 2022-05-10 00:01:16,715 ----------------------------------------------------------------------------------------------------
+ 2022-05-10 00:01:16,715 EPOCH 6 done: loss 0.1396 - lr 0.000022
+ 2022-05-10 00:01:40,529 Evaluating as a multi-label problem: False
+ 2022-05-10 00:01:40,579 DEV : loss 0.08579559624195099 - f1-score (micro avg) 0.9497
+ 2022-05-10 00:01:40,666 BAD EPOCHS (no improvement): 4
+ 2022-05-10 00:01:40,667 ----------------------------------------------------------------------------------------------------
+ 2022-05-10 00:01:59,831 epoch 7 - iter 93/937 - loss 0.13534539 - samples/sec: 77.69 - lr: 0.000022
+ 2022-05-10 00:02:18,246 epoch 7 - iter 186/937 - loss 0.13551684 - samples/sec: 80.83 - lr: 0.000021
+ 2022-05-10 00:02:36,156 epoch 7 - iter 279/937 - loss 0.13584534 - samples/sec: 83.13 - lr: 0.000021
+ 2022-05-10 00:02:55,093 epoch 7 - iter 372/937 - loss 0.13345388 - samples/sec: 78.60 - lr: 0.000020
+ 2022-05-10 00:03:13,968 epoch 7 - iter 465/937 - loss 0.13357006 - samples/sec: 78.85 - lr: 0.000019
+ 2022-05-10 00:03:33,833 epoch 7 - iter 558/937 - loss 0.13346607 - samples/sec: 74.94 - lr: 0.000019
+ 2022-05-10 00:03:52,609 epoch 7 - iter 651/937 - loss 0.13318798 - samples/sec: 79.29 - lr: 0.000018
+ 2022-05-10 00:04:11,143 epoch 7 - iter 744/937 - loss 0.13297235 - samples/sec: 80.32 - lr: 0.000018
+ 2022-05-10 00:04:29,324 epoch 7 - iter 837/937 - loss 0.13294986 - samples/sec: 81.87 - lr: 0.000017
+ 2022-05-10 00:04:48,227 epoch 7 - iter 930/937 - loss 0.13304211 - samples/sec: 78.74 - lr: 0.000017
+ 2022-05-10 00:04:49,540 ----------------------------------------------------------------------------------------------------
+ 2022-05-10 00:04:49,540 EPOCH 7 done: loss 0.1331 - lr 0.000017
+ 2022-05-10 00:05:07,897 Evaluating as a multi-label problem: False
+ 2022-05-10 00:05:07,956 DEV : loss 0.09259101003408432 - f1-score (micro avg) 0.9515
+ 2022-05-10 00:05:08,048 BAD EPOCHS (no improvement): 4
+ 2022-05-10 00:05:08,049 ----------------------------------------------------------------------------------------------------
+ 2022-05-10 00:05:26,187 epoch 8 - iter 93/937 - loss 0.13287977 - samples/sec: 82.08 - lr: 0.000016
+ 2022-05-10 00:05:46,292 epoch 8 - iter 186/937 - loss 0.13409706 - samples/sec: 74.04 - lr: 0.000016
+ 2022-05-10 00:06:04,623 epoch 8 - iter 279/937 - loss 0.13270913 - samples/sec: 81.19 - lr: 0.000015
+ 2022-05-10 00:06:23,601 epoch 8 - iter 372/937 - loss 0.13243728 - samples/sec: 78.43 - lr: 0.000014
+ 2022-05-10 00:06:42,643 epoch 8 - iter 465/937 - loss 0.13287784 - samples/sec: 78.17 - lr: 0.000014
+ 2022-05-10 00:07:02,185 epoch 8 - iter 558/937 - loss 0.13373988 - samples/sec: 76.17 - lr: 0.000013
+ 2022-05-10 00:07:20,122 epoch 8 - iter 651/937 - loss 0.13402409 - samples/sec: 82.98 - lr: 0.000013
+ 2022-05-10 00:07:39,327 epoch 8 - iter 744/937 - loss 0.13327101 - samples/sec: 77.50 - lr: 0.000012
+ 2022-05-10 00:07:57,782 epoch 8 - iter 837/937 - loss 0.13355020 - samples/sec: 80.65 - lr: 0.000012
+ 2022-05-10 00:08:16,804 epoch 8 - iter 930/937 - loss 0.13294805 - samples/sec: 78.25 - lr: 0.000011
+ 2022-05-10 00:08:18,099 ----------------------------------------------------------------------------------------------------
+ 2022-05-10 00:08:18,099 EPOCH 8 done: loss 0.1327 - lr 0.000011
+ 2022-05-10 00:08:36,160 Evaluating as a multi-label problem: False
+ 2022-05-10 00:08:36,214 DEV : loss 0.09469996392726898 - f1-score (micro avg) 0.9505
+ 2022-05-10 00:08:36,300 BAD EPOCHS (no improvement): 4
+ 2022-05-10 00:08:36,301 ----------------------------------------------------------------------------------------------------
+ 2022-05-10 00:08:54,628 epoch 9 - iter 93/937 - loss 0.13256573 - samples/sec: 81.23 - lr: 0.000011
+ 2022-05-10 00:09:13,253 epoch 9 - iter 186/937 - loss 0.13218317 - samples/sec: 79.94 - lr: 0.000010
+ 2022-05-10 00:09:31,322 epoch 9 - iter 279/937 - loss 0.13240640 - samples/sec: 82.40 - lr: 0.000009
+ 2022-05-10 00:09:49,199 epoch 9 - iter 372/937 - loss 0.13118429 - samples/sec: 83.28 - lr: 0.000009
+ 2022-05-10 00:10:06,958 epoch 9 - iter 465/937 - loss 0.13128632 - samples/sec: 83.83 - lr: 0.000008
+ 2022-05-10 00:10:25,134 epoch 9 - iter 558/937 - loss 0.12936261 - samples/sec: 81.90 - lr: 0.000008
+ 2022-05-10 00:10:43,680 epoch 9 - iter 651/937 - loss 0.12973987 - samples/sec: 80.27 - lr: 0.000007
+ 2022-05-10 00:11:01,678 epoch 9 - iter 744/937 - loss 0.12968500 - samples/sec: 82.71 - lr: 0.000007
+ 2022-05-10 00:11:19,484 epoch 9 - iter 837/937 - loss 0.12985020 - samples/sec: 83.59 - lr: 0.000006
+ 2022-05-10 00:11:37,340 epoch 9 - iter 930/937 - loss 0.12947938 - samples/sec: 83.36 - lr: 0.000006
+ 2022-05-10 00:11:38,689 ----------------------------------------------------------------------------------------------------
+ 2022-05-10 00:11:38,689 EPOCH 9 done: loss 0.1294 - lr 0.000006
+ 2022-05-10 00:11:56,867 Evaluating as a multi-label problem: False
+ 2022-05-10 00:11:56,918 DEV : loss 0.09501232951879501 - f1-score (micro avg) 0.9504
+ 2022-05-10 00:11:57,003 BAD EPOCHS (no improvement): 4
+ 2022-05-10 00:11:57,004 ----------------------------------------------------------------------------------------------------
+ 2022-05-10 00:12:15,701 epoch 10 - iter 93/937 - loss 0.12882436 - samples/sec: 79.62 - lr: 0.000005
+ 2022-05-10 00:12:34,784 epoch 10 - iter 186/937 - loss 0.12932802 - samples/sec: 78.02 - lr: 0.000004
+ 2022-05-10 00:12:53,563 epoch 10 - iter 279/937 - loss 0.12935565 - samples/sec: 79.27 - lr: 0.000004
+ 2022-05-10 00:13:12,428 epoch 10 - iter 372/937 - loss 0.13016513 - samples/sec: 78.91 - lr: 0.000003
+ 2022-05-10 00:13:31,484 epoch 10 - iter 465/937 - loss 0.13001423 - samples/sec: 78.12 - lr: 0.000003
+ 2022-05-10 00:13:50,860 epoch 10 - iter 558/937 - loss 0.12967414 - samples/sec: 76.82 - lr: 0.000002
+ 2022-05-10 00:14:10,036 epoch 10 - iter 651/937 - loss 0.13044245 - samples/sec: 77.61 - lr: 0.000002
+ 2022-05-10 00:14:29,046 epoch 10 - iter 744/937 - loss 0.13049319 - samples/sec: 78.30 - lr: 0.000001
+ 2022-05-10 00:14:47,934 epoch 10 - iter 837/937 - loss 0.12970693 - samples/sec: 78.83 - lr: 0.000001
+ 2022-05-10 00:15:06,881 epoch 10 - iter 930/937 - loss 0.12987301 - samples/sec: 78.57 - lr: 0.000000
+ 2022-05-10 00:15:08,384 ----------------------------------------------------------------------------------------------------
+ 2022-05-10 00:15:08,384 EPOCH 10 done: loss 0.1298 - lr 0.000000
+ 2022-05-10 00:15:27,169 Evaluating as a multi-label problem: False
+ 2022-05-10 00:15:27,221 DEV : loss 0.09416753053665161 - f1-score (micro avg) 0.9513
+ 2022-05-10 00:15:27,303 BAD EPOCHS (no improvement): 4
+ 2022-05-10 00:15:28,112 ----------------------------------------------------------------------------------------------------
+ 2022-05-10 00:15:28,113 Testing using last state of model ...
+ 2022-05-10 00:15:47,035 Evaluating as a multi-label problem: False
+ 2022-05-10 00:15:47,087 0.9117	0.9212	0.9164	0.879
+ 2022-05-10 00:15:47,087
+ Results:
+ - F-score (micro) 0.9164
+ - F-score (macro) 0.9024
+ - Accuracy 0.879
+
+ By class:
+               precision    recall  f1-score   support
+
+          ORG     0.8893    0.9097    0.8994      1661
+          LOC     0.9301    0.9335    0.9318      1668
+          PER     0.9699    0.9579    0.9639      1617
+         MISC     0.7951    0.8348    0.8145       702
+
+    micro avg     0.9117    0.9212    0.9164      5648
+    macro avg     0.8961    0.9090    0.9024      5648
+ weighted avg     0.9127    0.9212    0.9169      5648
+
+ 2022-05-10 00:15:47,088 ----------------------------------------------------------------------------------------------------