alanakbik committed on
Commit 9c1cbc3
1 Parent(s): 0e58c4d

initial commit

Files changed (4)
  1. README.md +155 -0
  2. loss.tsv +21 -0
  3. pytorch_model.bin +3 -0
  4. training.log +915 -0
README.md ADDED
@@ -0,0 +1,155 @@
---
tags:
- flair
- token-classification
- sequence-tagger-model
language: en
datasets:
- conll2003
inference: false
---

## English NER in Flair (large model)

This is the large 4-class NER model for English that ships with [Flair](https://github.com/flairNLP/flair/).

F1-Score: **94.36** (corrected CoNLL-03)

Predicts 4 tags:

| **tag** | **meaning**       |
|---------|-------------------|
| PER     | person name       |
| LOC     | location name     |
| ORG     | organization name |
| MISC    | other name        |

Based on [document-level XLM-R embeddings](https://www.aclweb.org/anthology/C18-1139/).

---

### Demo: How to use in Flair

Requires: **[Flair](https://github.com/flairNLP/flair/)** (`pip install flair`)

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# load tagger
tagger = SequenceTagger.load("flair/ner-english-large")

# make example sentence
sentence = Sentence("George Washington went to Washington")

# predict NER tags
tagger.predict(sentence)

# print sentence
print(sentence)

# print predicted NER spans
print('The following NER tags are found:')
# iterate over entities and print
for entity in sentence.get_spans('ner'):
    print(entity)
```

This yields the following output:

```
Span [1,2]: "George Washington" [− Labels: PER (0.9968)]
Span [5]: "Washington" [− Labels: LOC (0.9994)]
```

So, the entities "*George Washington*" (labeled as a **person**) and "*Washington*" (labeled as a **location**) are found in the sentence "*George Washington went to Washington*".
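
Each predicted span carries its surface text, its tag and a confidence score. A minimal sketch for pulling these fields out of the result, assuming a recent Flair release in which `Span` exposes `text`, `tag` and `score` attributes (on older releases, go through `entity.labels[0]` instead):

```python
# continues the demo above: `sentence` has already been tagged
for entity in sentence.get_spans('ner'):
    # text / tag / score are assumed Span attributes, see lead-in
    print(f"{entity.text}\t{entity.tag}\t{entity.score:.4f}")
```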

---

### Training: Script to train this model

The following Flair script was used to train this model:

```python
import torch

# 1. get the corpus
from flair.datasets import CONLL_03

corpus = CONLL_03()

# 2. what tag do we want to predict?
tag_type = 'ner'

# 3. make the tag dictionary from the corpus
tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)

# 4. initialize fine-tunable transformer embeddings WITH document context
from flair.embeddings import TransformerWordEmbeddings

embeddings = TransformerWordEmbeddings(
    model='xlm-roberta-large',
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
    use_context=True,
)

# 5. initialize bare-bones sequence tagger (no CRF, no RNN, no reprojection)
from flair.models import SequenceTagger

tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=tag_dictionary,
    tag_type='ner',
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)

# 6. initialize trainer with AdamW optimizer
from flair.trainers import ModelTrainer

trainer = ModelTrainer(tagger, corpus, optimizer=torch.optim.AdamW)

# 7. run training with XLM parameters (20 epochs, small LR)
from torch.optim.lr_scheduler import OneCycleLR

trainer.train(
    'resources/taggers/ner-english-large',
    learning_rate=5.0e-6,
    mini_batch_size=4,
    mini_batch_chunk_size=1,
    max_epochs=20,
    scheduler=OneCycleLR,
    embeddings_storage_mode='none',
    weight_decay=0.,
)
```
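
Once training finishes, the tagger can be loaded back from the output folder. A minimal sketch, assuming Flair's default behavior of writing the trained model as `final-model.pt` into the base path passed to `trainer.train`:

```python
from flair.models import SequenceTagger

# load the fine-tuned tagger from the training output folder
# ('final-model.pt' is assumed to be Flair's default output filename)
tagger = SequenceTagger.load('resources/taggers/ner-english-large/final-model.pt')
```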

---

### Cite

Please cite the following paper when using this model.

```
@inproceedings{akbik2018coling,
  title     = {Contextual String Embeddings for Sequence Labeling},
  author    = {Akbik, Alan and Blythe, Duncan and Vollgraf, Roland},
  booktitle = {{COLING} 2018, 27th International Conference on Computational Linguistics},
  pages     = {1638--1649},
  year      = {2018}
}
```

---

### Issues?

The Flair issue tracker is available [here](https://github.com/flairNLP/flair/issues/).
loss.tsv ADDED
@@ -0,0 +1,21 @@
EPOCH	TIMESTAMP	BAD_EPOCHS	LEARNING_RATE	TRAIN_LOSS	TEST_LOSS	TEST_PRECISION	TEST_RECALL	TEST_F1
1	12:33:21	4	0.0000	0.3592169054972083	0.17636922001838684	0.9072	0.9043	0.9057
2	13:10:20	4	0.0000	0.2211920055868173	0.11124755442142487	0.9254	0.9305	0.9279
3	13:47:17	4	0.0000	0.19381283097644833	0.11028687655925751	0.9294	0.9402	0.9348
4	14:24:09	4	0.0000	0.1925625054863614	0.10681818425655365	0.9321	0.9445	0.9383
5	15:00:56	4	0.0000	0.18075492875338123	0.11206260323524475	0.9368	0.9406	0.9387
6	15:37:42	4	0.0000	0.17524857581343897	0.11003755778074265	0.9335	0.9410	0.9372
7	16:14:31	4	0.0000	0.15931038153566326	0.1253410428762436	0.9304	0.9358	0.9331
8	16:51:24	4	0.0000	0.16053336656099176	0.12391051650047302	0.9396	0.9426	0.9411
9	17:28:05	4	0.0000	0.1590428160623491	0.1257738471031189	0.9339	0.9473	0.9406
10	18:04:53	4	0.0000	0.14964490167298533	0.1382586508989334	0.9302	0.9408	0.9355
11	18:41:33	4	0.0000	0.14682254513923948	0.13701947033405304	0.9351	0.9424	0.9387
12	19:18:12	4	0.0000	0.15043419943368025	0.15095502138137817	0.9359	0.9418	0.9388
13	19:54:57	4	0.0000	0.14460110974684232	0.14258751273155212	0.9374	0.9424	0.9399
14	20:31:42	4	0.0000	0.14146839202254796	0.16016331315040588	0.9372	0.9420	0.9396
15	21:08:34	4	0.0000	0.14678317207995362	0.15258659422397614	0.9380	0.9420	0.9400
16	21:45:16	4	0.0000	0.152214311589488	0.14317740499973297	0.9405	0.9465	0.9434
17	22:21:56	4	0.0000	0.1459472061536186	0.14864514768123627	0.9411	0.9459	0.9435
18	22:58:37	4	0.0000	0.1397127115109613	0.1518455296754837	0.9409	0.9465	0.9437
19	23:35:20	4	0.0000	0.14249197369562547	0.15170469880104065	0.9406	0.9461	0.9433
20	00:12:06	4	0.0000	0.13975302390449296	0.15191785991191864	0.9408	0.9465	0.9436
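
The file above holds one row of tab-separated metrics per epoch. A minimal sketch for reading it with the Python standard library (column names taken from the header row):

```python
import csv

# read the per-epoch metrics written during training
with open('loss.tsv', newline='') as f:
    rows = list(csv.DictReader(f, delimiter='\t'))

# e.g. find the epoch with the best test F1
best = max(rows, key=lambda r: float(r['TEST_F1']))
print(best['EPOCH'], best['TEST_F1'])  # -> 18 0.9437
```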
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1f59c05bbd3db05518b632f212b1aac7de1ff0b3914d6c0d587b6a68e214a287
size 2239866761
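
What is committed here is only a Git LFS pointer; the roughly 2.2 GB of weights live in LFS storage. A minimal sketch for checking that a downloaded `pytorch_model.bin` matches the pointer's SHA-256 (filename and chunk size are illustrative):

```python
import hashlib

EXPECTED = '1f59c05bbd3db05518b632f212b1aac7de1ff0b3914d6c0d587b6a68e214a287'

# hash the downloaded weights file in chunks to keep memory use flat
h = hashlib.sha256()
with open('pytorch_model.bin', 'rb') as f:
    for chunk in iter(lambda: f.read(1 << 20), b''):
        h.update(chunk)

assert h.hexdigest() == EXPECTED, 'pytorch_model.bin does not match the LFS pointer'
```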
training.log ADDED
@@ -0,0 +1,915 @@
2021-02-20 11:56:18,090 ----------------------------------------------------------------------------------------------------
2021-02-20 11:56:18,093 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): XLMRobertaModel(
      (embeddings): RobertaEmbeddings(
        (word_embeddings): Embedding(250002, 1024, padding_idx=1)
        (position_embeddings): Embedding(514, 1024, padding_idx=1)
        (token_type_embeddings): Embedding(1, 1024)
        (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): RobertaEncoder(
        (layer): ModuleList(
          (0-23): 24 x RobertaLayer(
            (attention): RobertaAttention(
              (self): RobertaSelfAttention(
                (query): Linear(in_features=1024, out_features=1024, bias=True)
                (key): Linear(in_features=1024, out_features=1024, bias=True)
                (value): Linear(in_features=1024, out_features=1024, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): RobertaSelfOutput(
                (dense): Linear(in_features=1024, out_features=1024, bias=True)
                (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): RobertaIntermediate(
              (dense): Linear(in_features=1024, out_features=4096, bias=True)
            )
            (output): RobertaOutput(
              (dense): Linear(in_features=4096, out_features=1024, bias=True)
              (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): RobertaPooler(
        (dense): Linear(in_features=1024, out_features=1024, bias=True)
        (activation): Tanh()
      )
    )
  )
  (word_dropout): WordDropout(p=0.05)
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1024, out_features=20, bias=True)
  (beta): 1.0
  (weights): None
  (weight_tensor) None
)"
2021-02-20 11:56:18,094 ----------------------------------------------------------------------------------------------------
2021-02-20 11:56:18,095 Corpus: "MultiCorpus: 16744 train + 3449 dev + 3658 test sentences
 - CONLL_03 Corpus: 14903 train + 3449 dev + 3658 test sentences
 - WIKIGOLD_NER Corpus: 1841 train + 0 dev + 0 test sentences"
2021-02-20 11:56:18,095 ----------------------------------------------------------------------------------------------------
2021-02-20 11:56:18,095 Parameters:
2021-02-20 11:56:18,095 - learning_rate: "5e-06"
2021-02-20 11:56:18,095 - mini_batch_size: "4"
2021-02-20 11:56:18,095 - patience: "3"
2021-02-20 11:56:18,095 - anneal_factor: "0.5"
2021-02-20 11:56:18,095 - max_epochs: "20"
2021-02-20 11:56:18,095 - shuffle: "True"
2021-02-20 11:56:18,095 - train_with_dev: "True"
2021-02-20 11:56:18,095 - batch_growth_annealing: "False"
2021-02-20 11:56:18,095 ----------------------------------------------------------------------------------------------------
2021-02-20 11:56:18,095 Model training base path: "resources/contextdrop/d-flert-en_release-ft+dev-xlm-roberta-large-context+drop-64-True-42"
2021-02-20 11:56:18,095 ----------------------------------------------------------------------------------------------------
2021-02-20 11:56:18,095 Device: cuda:1
2021-02-20 11:56:18,095 ----------------------------------------------------------------------------------------------------
2021-02-20 11:56:18,095 Embeddings storage mode: none
2021-02-20 11:56:18,104 ----------------------------------------------------------------------------------------------------
602
+ 2021-02-20 11:59:49,493 epoch 1 - iter 504/5049 - loss 0.84988712 - samples/sec: 9.54 - lr: 0.000005
603
+ 2021-02-20 12:03:17,203 epoch 1 - iter 1008/5049 - loss 0.64131590 - samples/sec: 9.71 - lr: 0.000005
604
+ 2021-02-20 12:06:42,427 epoch 1 - iter 1512/5049 - loss 0.54315957 - samples/sec: 9.82 - lr: 0.000005
605
+ 2021-02-20 12:10:12,872 epoch 1 - iter 2016/5049 - loss 0.48025516 - samples/sec: 9.58 - lr: 0.000005
606
+ 2021-02-20 12:13:43,522 epoch 1 - iter 2520/5049 - loss 0.46057764 - samples/sec: 9.57 - lr: 0.000005
607
+ 2021-02-20 12:17:12,894 epoch 1 - iter 3024/5049 - loss 0.42570537 - samples/sec: 9.63 - lr: 0.000005
608
+ 2021-02-20 12:20:41,525 epoch 1 - iter 3528/5049 - loss 0.39857695 - samples/sec: 9.66 - lr: 0.000005
609
+ 2021-02-20 12:24:14,564 epoch 1 - iter 4032/5049 - loss 0.38416717 - samples/sec: 9.46 - lr: 0.000005
610
+ 2021-02-20 12:27:45,615 epoch 1 - iter 4536/5049 - loss 0.37032747 - samples/sec: 9.55 - lr: 0.000005
611
+ 2021-02-20 12:31:13,574 epoch 1 - iter 5040/5049 - loss 0.35966340 - samples/sec: 9.70 - lr: 0.000005
612
+ 2021-02-20 12:31:17,124 ----------------------------------------------------------------------------------------------------
613
+ 2021-02-20 12:31:17,124 EPOCH 1 done: loss 0.3592 - lr 0.0000050
614
+ 2021-02-20 12:33:21,019 TEST : loss 0.17636922001838684 - score 0.9057
615
+ 2021-02-20 12:33:21,046 BAD EPOCHS (no improvement): 4
616
+ 2021-02-20 12:33:21,047 ----------------------------------------------------------------------------------------------------
617
+ 2021-02-20 12:36:49,271 epoch 2 - iter 504/5049 - loss 0.25564826 - samples/sec: 9.68 - lr: 0.000005
618
+ 2021-02-20 12:40:20,219 epoch 2 - iter 1008/5049 - loss 0.25560543 - samples/sec: 9.56 - lr: 0.000005
619
+ 2021-02-20 12:43:48,750 epoch 2 - iter 1512/5049 - loss 0.24306949 - samples/sec: 9.67 - lr: 0.000005
620
+ 2021-02-20 12:47:18,010 epoch 2 - iter 2016/5049 - loss 0.23918902 - samples/sec: 9.63 - lr: 0.000005
621
+ 2021-02-20 12:50:49,034 epoch 2 - iter 2520/5049 - loss 0.23745494 - samples/sec: 9.55 - lr: 0.000005
622
+ 2021-02-20 12:54:18,224 epoch 2 - iter 3024/5049 - loss 0.23599522 - samples/sec: 9.64 - lr: 0.000005
623
+ 2021-02-20 12:57:46,500 epoch 2 - iter 3528/5049 - loss 0.22758435 - samples/sec: 9.68 - lr: 0.000005
624
+ 2021-02-20 13:01:14,137 epoch 2 - iter 4032/5049 - loss 0.22602197 - samples/sec: 9.71 - lr: 0.000005
625
+ 2021-02-20 13:04:43,356 epoch 2 - iter 4536/5049 - loss 0.22365802 - samples/sec: 9.64 - lr: 0.000005
626
+ 2021-02-20 13:08:14,129 epoch 2 - iter 5040/5049 - loss 0.22152549 - samples/sec: 9.57 - lr: 0.000005
627
+ 2021-02-20 13:08:17,630 ----------------------------------------------------------------------------------------------------
628
+ 2021-02-20 13:08:17,630 EPOCH 2 done: loss 0.2212 - lr 0.0000049
629
+ 2021-02-20 13:10:20,643 TEST : loss 0.11124755442142487 - score 0.9279
630
+ 2021-02-20 13:10:20,675 BAD EPOCHS (no improvement): 4
631
+ 2021-02-20 13:10:20,680 ----------------------------------------------------------------------------------------------------
632
+ 2021-02-20 13:13:50,443 epoch 3 - iter 504/5049 - loss 0.17266852 - samples/sec: 9.61 - lr: 0.000005
633
+ 2021-02-20 13:17:19,023 epoch 3 - iter 1008/5049 - loss 0.18002962 - samples/sec: 9.67 - lr: 0.000005
634
+ 2021-02-20 13:20:49,199 epoch 3 - iter 1512/5049 - loss 0.18510266 - samples/sec: 9.59 - lr: 0.000005
635
+ 2021-02-20 13:24:19,385 epoch 3 - iter 2016/5049 - loss 0.19983503 - samples/sec: 9.59 - lr: 0.000005
636
+ 2021-02-20 13:27:48,348 epoch 3 - iter 2520/5049 - loss 0.20190812 - samples/sec: 9.65 - lr: 0.000005
637
+ 2021-02-20 13:31:15,582 epoch 3 - iter 3024/5049 - loss 0.19944912 - samples/sec: 9.73 - lr: 0.000005
638
+ 2021-02-20 13:34:43,944 epoch 3 - iter 3528/5049 - loss 0.19932389 - samples/sec: 9.68 - lr: 0.000005
639
+ 2021-02-20 13:38:13,075 epoch 3 - iter 4032/5049 - loss 0.19547160 - samples/sec: 9.64 - lr: 0.000005
640
+ 2021-02-20 13:41:42,971 epoch 3 - iter 4536/5049 - loss 0.19618987 - samples/sec: 9.61 - lr: 0.000005
641
+ 2021-02-20 13:45:10,066 epoch 3 - iter 5040/5049 - loss 0.19343864 - samples/sec: 9.74 - lr: 0.000005
642
+ 2021-02-20 13:45:13,621 ----------------------------------------------------------------------------------------------------
643
+ 2021-02-20 13:45:13,622 EPOCH 3 done: loss 0.1938 - lr 0.0000047
644
+ 2021-02-20 13:47:17,651 TEST : loss 0.11028687655925751 - score 0.9348
645
+ 2021-02-20 13:47:17,678 BAD EPOCHS (no improvement): 4
646
+ 2021-02-20 13:47:17,680 ----------------------------------------------------------------------------------------------------
647
+ 2021-02-20 13:50:48,046 epoch 4 - iter 504/5049 - loss 0.19022199 - samples/sec: 9.58 - lr: 0.000005
648
+ 2021-02-20 13:54:14,852 epoch 4 - iter 1008/5049 - loss 0.17976050 - samples/sec: 9.75 - lr: 0.000005
649
+ 2021-02-20 13:57:44,871 epoch 4 - iter 1512/5049 - loss 0.17729127 - samples/sec: 9.60 - lr: 0.000005
650
+ 2021-02-20 14:01:14,307 epoch 4 - iter 2016/5049 - loss 0.17812706 - samples/sec: 9.63 - lr: 0.000005
651
+ 2021-02-20 14:04:41,981 epoch 4 - iter 2520/5049 - loss 0.18816455 - samples/sec: 9.71 - lr: 0.000005
652
+ 2021-02-20 14:08:10,238 epoch 4 - iter 3024/5049 - loss 0.18990221 - samples/sec: 9.68 - lr: 0.000005
653
+ 2021-02-20 14:11:38,151 epoch 4 - iter 3528/5049 - loss 0.19181303 - samples/sec: 9.70 - lr: 0.000005
654
+ 2021-02-20 14:15:03,479 epoch 4 - iter 4032/5049 - loss 0.19180866 - samples/sec: 9.82 - lr: 0.000005
655
+ 2021-02-20 14:18:32,995 epoch 4 - iter 4536/5049 - loss 0.19160628 - samples/sec: 9.62 - lr: 0.000005
656
+ 2021-02-20 14:22:00,977 epoch 4 - iter 5040/5049 - loss 0.19256281 - samples/sec: 9.69 - lr: 0.000005
657
+ 2021-02-20 14:22:04,481 ----------------------------------------------------------------------------------------------------
658
+ 2021-02-20 14:22:04,482 EPOCH 4 done: loss 0.1926 - lr 0.0000045
659
+ 2021-02-20 14:24:09,809 TEST : loss 0.10681818425655365 - score 0.9383
660
+ 2021-02-20 14:24:09,842 BAD EPOCHS (no improvement): 4
661
+ 2021-02-20 14:24:09,844 ----------------------------------------------------------------------------------------------------
662
+ 2021-02-20 14:27:37,280 epoch 5 - iter 504/5049 - loss 0.16645148 - samples/sec: 9.72 - lr: 0.000004
663
+ 2021-02-20 14:31:05,862 epoch 5 - iter 1008/5049 - loss 0.17264234 - samples/sec: 9.67 - lr: 0.000004
664
+ 2021-02-20 14:34:31,375 epoch 5 - iter 1512/5049 - loss 0.18603685 - samples/sec: 9.81 - lr: 0.000004
665
+ 2021-02-20 14:37:57,695 epoch 5 - iter 2016/5049 - loss 0.18245931 - samples/sec: 9.77 - lr: 0.000004
666
+ 2021-02-20 14:41:25,198 epoch 5 - iter 2520/5049 - loss 0.19293042 - samples/sec: 9.72 - lr: 0.000004
667
+ 2021-02-20 14:44:53,631 epoch 5 - iter 3024/5049 - loss 0.19454820 - samples/sec: 9.67 - lr: 0.000004
668
+ 2021-02-20 14:48:21,579 epoch 5 - iter 3528/5049 - loss 0.18990338 - samples/sec: 9.70 - lr: 0.000004
669
+ 2021-02-20 14:51:51,276 epoch 5 - iter 4032/5049 - loss 0.18768864 - samples/sec: 9.61 - lr: 0.000004
670
+ 2021-02-20 14:55:18,914 epoch 5 - iter 4536/5049 - loss 0.18508693 - samples/sec: 9.71 - lr: 0.000004
671
+ 2021-02-20 14:58:47,195 epoch 5 - iter 5040/5049 - loss 0.18082235 - samples/sec: 9.68 - lr: 0.000004
672
+ 2021-02-20 14:58:50,697 ----------------------------------------------------------------------------------------------------
673
+ 2021-02-20 14:58:50,697 EPOCH 5 done: loss 0.1808 - lr 0.0000043
674
+ 2021-02-20 15:00:56,633 TEST : loss 0.11206260323524475 - score 0.9387
675
+ 2021-02-20 15:00:56,668 BAD EPOCHS (no improvement): 4
676
+ 2021-02-20 15:00:56,672 ----------------------------------------------------------------------------------------------------
677
+ 2021-02-20 15:04:25,586 epoch 6 - iter 504/5049 - loss 0.15912418 - samples/sec: 9.65 - lr: 0.000004
678
+ 2021-02-20 15:07:53,476 epoch 6 - iter 1008/5049 - loss 0.14931369 - samples/sec: 9.70 - lr: 0.000004
679
+ 2021-02-20 15:11:20,667 epoch 6 - iter 1512/5049 - loss 0.15761230 - samples/sec: 9.73 - lr: 0.000004
680
+ 2021-02-20 15:14:47,624 epoch 6 - iter 2016/5049 - loss 0.16075756 - samples/sec: 9.74 - lr: 0.000004
681
+ 2021-02-20 15:18:15,842 epoch 6 - iter 2520/5049 - loss 0.16126459 - samples/sec: 9.68 - lr: 0.000004
682
+ 2021-02-20 15:21:44,174 epoch 6 - iter 3024/5049 - loss 0.16137015 - samples/sec: 9.68 - lr: 0.000004
683
+ 2021-02-20 15:25:11,675 epoch 6 - iter 3528/5049 - loss 0.16742578 - samples/sec: 9.72 - lr: 0.000004
684
+ 2021-02-20 15:28:38,600 epoch 6 - iter 4032/5049 - loss 0.17104120 - samples/sec: 9.74 - lr: 0.000004
685
+ 2021-02-20 15:32:04,821 epoch 6 - iter 4536/5049 - loss 0.17299492 - samples/sec: 9.78 - lr: 0.000004
686
+ 2021-02-20 15:35:33,611 epoch 6 - iter 5040/5049 - loss 0.17502829 - samples/sec: 9.66 - lr: 0.000004
687
+ 2021-02-20 15:35:37,145 ----------------------------------------------------------------------------------------------------
688
+ 2021-02-20 15:35:37,146 EPOCH 6 done: loss 0.1752 - lr 0.0000040
689
+ 2021-02-20 15:37:42,922 TEST : loss 0.11003755778074265 - score 0.9372
690
+ 2021-02-20 15:37:42,957 BAD EPOCHS (no improvement): 4
691
+ 2021-02-20 15:37:42,959 ----------------------------------------------------------------------------------------------------
692
+ 2021-02-20 15:41:11,469 epoch 7 - iter 504/5049 - loss 0.15970022 - samples/sec: 9.67 - lr: 0.000004
693
+ 2021-02-20 15:44:38,687 epoch 7 - iter 1008/5049 - loss 0.16257612 - samples/sec: 9.73 - lr: 0.000004
694
+ 2021-02-20 15:48:07,772 epoch 7 - iter 1512/5049 - loss 0.15637818 - samples/sec: 9.64 - lr: 0.000004
695
+ 2021-02-20 15:51:34,834 epoch 7 - iter 2016/5049 - loss 0.15584222 - samples/sec: 9.74 - lr: 0.000004
696
+ 2021-02-20 15:55:02,825 epoch 7 - iter 2520/5049 - loss 0.15669211 - samples/sec: 9.69 - lr: 0.000004
697
+ 2021-02-20 15:58:30,698 epoch 7 - iter 3024/5049 - loss 0.15856211 - samples/sec: 9.70 - lr: 0.000004
698
+ 2021-02-20 16:01:58,633 epoch 7 - iter 3528/5049 - loss 0.15671081 - samples/sec: 9.70 - lr: 0.000004
699
+ 2021-02-20 16:05:28,295 epoch 7 - iter 4032/5049 - loss 0.15648069 - samples/sec: 9.62 - lr: 0.000004
700
+ 2021-02-20 16:08:56,407 epoch 7 - iter 4536/5049 - loss 0.16071403 - samples/sec: 9.69 - lr: 0.000004
701
+ 2021-02-20 16:12:23,980 epoch 7 - iter 5040/5049 - loss 0.15912073 - samples/sec: 9.71 - lr: 0.000004
702
+ 2021-02-20 16:12:27,258 ----------------------------------------------------------------------------------------------------
703
+ 2021-02-20 16:12:27,258 EPOCH 7 done: loss 0.1593 - lr 0.0000036
704
+ 2021-02-20 16:14:31,752 TEST : loss 0.1253410428762436 - score 0.9331
705
+ 2021-02-20 16:14:31,787 BAD EPOCHS (no improvement): 4
706
+ 2021-02-20 16:14:31,791 ----------------------------------------------------------------------------------------------------
707
+ 2021-02-20 16:18:01,243 epoch 8 - iter 504/5049 - loss 0.14515327 - samples/sec: 9.63 - lr: 0.000004
708
+ 2021-02-20 16:21:29,154 epoch 8 - iter 1008/5049 - loss 0.15844524 - samples/sec: 9.70 - lr: 0.000004
709
+ 2021-02-20 16:24:57,953 epoch 8 - iter 1512/5049 - loss 0.15855560 - samples/sec: 9.66 - lr: 0.000004
710
+ 2021-02-20 16:28:25,738 epoch 8 - iter 2016/5049 - loss 0.15470104 - samples/sec: 9.70 - lr: 0.000003
711
+ 2021-02-20 16:31:54,212 epoch 8 - iter 2520/5049 - loss 0.15710933 - samples/sec: 9.67 - lr: 0.000003
712
+ 2021-02-20 16:35:23,560 epoch 8 - iter 3024/5049 - loss 0.15654992 - samples/sec: 9.63 - lr: 0.000003
713
+ 2021-02-20 16:38:51,123 epoch 8 - iter 3528/5049 - loss 0.15659144 - samples/sec: 9.71 - lr: 0.000003
714
+ 2021-02-20 16:42:19,109 epoch 8 - iter 4032/5049 - loss 0.15848049 - samples/sec: 9.69 - lr: 0.000003
715
+ 2021-02-20 16:45:47,760 epoch 8 - iter 4536/5049 - loss 0.15995362 - samples/sec: 9.66 - lr: 0.000003
716
+ 2021-02-20 16:49:16,138 epoch 8 - iter 5040/5049 - loss 0.16040715 - samples/sec: 9.68 - lr: 0.000003
717
+ 2021-02-20 16:49:19,652 ----------------------------------------------------------------------------------------------------
718
+ 2021-02-20 16:49:19,652 EPOCH 8 done: loss 0.1605 - lr 0.0000033
719
+ 2021-02-20 16:51:24,065 TEST : loss 0.12391051650047302 - score 0.9411
720
+ 2021-02-20 16:51:24,100 BAD EPOCHS (no improvement): 4
721
+ 2021-02-20 16:51:24,104 ----------------------------------------------------------------------------------------------------
722
+ 2021-02-20 16:54:50,947 epoch 9 - iter 504/5049 - loss 0.14319218 - samples/sec: 9.75 - lr: 0.000003
723
+ 2021-02-20 16:58:17,610 epoch 9 - iter 1008/5049 - loss 0.14626190 - samples/sec: 9.76 - lr: 0.000003
724
+ 2021-02-20 17:01:45,887 epoch 9 - iter 1512/5049 - loss 0.14569758 - samples/sec: 9.68 - lr: 0.000003
725
+ 2021-02-20 17:05:13,774 epoch 9 - iter 2016/5049 - loss 0.15481491 - samples/sec: 9.70 - lr: 0.000003
726
+ 2021-02-20 17:08:40,875 epoch 9 - iter 2520/5049 - loss 0.15113900 - samples/sec: 9.74 - lr: 0.000003
727
+ 2021-02-20 17:12:07,457 epoch 9 - iter 3024/5049 - loss 0.15237128 - samples/sec: 9.76 - lr: 0.000003
728
+ 2021-02-20 17:15:34,821 epoch 9 - iter 3528/5049 - loss 0.15264122 - samples/sec: 9.72 - lr: 0.000003
729
+ 2021-02-20 17:19:02,407 epoch 9 - iter 4032/5049 - loss 0.15553964 - samples/sec: 9.71 - lr: 0.000003
730
+ 2021-02-20 17:22:30,994 epoch 9 - iter 4536/5049 - loss 0.15608309 - samples/sec: 9.67 - lr: 0.000003
731
+ 2021-02-20 17:25:57,168 epoch 9 - iter 5040/5049 - loss 0.15908414 - samples/sec: 9.78 - lr: 0.000003
732
+ 2021-02-20 17:26:00,585 ----------------------------------------------------------------------------------------------------
733
+ 2021-02-20 17:26:00,585 EPOCH 9 done: loss 0.1590 - lr 0.0000029
734
+ 2021-02-20 17:28:05,552 TEST : loss 0.1257738471031189 - score 0.9406
735
+ 2021-02-20 17:28:05,583 BAD EPOCHS (no improvement): 4
736
+ 2021-02-20 17:28:05,587 ----------------------------------------------------------------------------------------------------
737
+ 2021-02-20 17:31:34,037 epoch 10 - iter 504/5049 - loss 0.16538340 - samples/sec: 9.67 - lr: 0.000003
738
+ 2021-02-20 17:35:01,686 epoch 10 - iter 1008/5049 - loss 0.16480578 - samples/sec: 9.71 - lr: 0.000003
739
+ 2021-02-20 17:38:30,133 epoch 10 - iter 1512/5049 - loss 0.15934007 - samples/sec: 9.67 - lr: 0.000003
740
+ 2021-02-20 17:41:57,567 epoch 10 - iter 2016/5049 - loss 0.15438570 - samples/sec: 9.72 - lr: 0.000003
741
+ 2021-02-20 17:45:26,625 epoch 10 - iter 2520/5049 - loss 0.14967620 - samples/sec: 9.64 - lr: 0.000003
742
+ 2021-02-20 17:48:54,021 epoch 10 - iter 3024/5049 - loss 0.14847286 - samples/sec: 9.72 - lr: 0.000003
743
+ 2021-02-20 17:52:21,779 epoch 10 - iter 3528/5049 - loss 0.15086106 - samples/sec: 9.70 - lr: 0.000003
744
+ 2021-02-20 17:55:47,985 epoch 10 - iter 4032/5049 - loss 0.14921308 - samples/sec: 9.78 - lr: 0.000003
745
+ 2021-02-20 17:59:16,097 epoch 10 - iter 4536/5049 - loss 0.15006289 - samples/sec: 9.69 - lr: 0.000003
746
+ 2021-02-20 18:02:43,316 epoch 10 - iter 5040/5049 - loss 0.14961823 - samples/sec: 9.73 - lr: 0.000003
747
+ 2021-02-20 18:02:46,866 ----------------------------------------------------------------------------------------------------
748
+ 2021-02-20 18:02:46,866 EPOCH 10 done: loss 0.1496 - lr 0.0000025
749
+ 2021-02-20 18:04:53,002 TEST : loss 0.1382586508989334 - score 0.9355
750
+ 2021-02-20 18:04:53,034 BAD EPOCHS (no improvement): 4
751
+ 2021-02-20 18:04:53,040 ----------------------------------------------------------------------------------------------------
752
+ 2021-02-20 18:08:21,528 epoch 11 - iter 504/5049 - loss 0.15655231 - samples/sec: 9.67 - lr: 0.000002
753
+ 2021-02-20 18:11:49,866 epoch 11 - iter 1008/5049 - loss 0.15351701 - samples/sec: 9.68 - lr: 0.000002
754
+ 2021-02-20 18:15:15,360 epoch 11 - iter 1512/5049 - loss 0.16074115 - samples/sec: 9.81 - lr: 0.000002
755
+ 2021-02-20 18:18:41,580 epoch 11 - iter 2016/5049 - loss 0.15942462 - samples/sec: 9.78 - lr: 0.000002
756
+ 2021-02-20 18:22:09,414 epoch 11 - iter 2520/5049 - loss 0.15244022 - samples/sec: 9.70 - lr: 0.000002
757
+ 2021-02-20 18:25:37,073 epoch 11 - iter 3024/5049 - loss 0.15098374 - samples/sec: 9.71 - lr: 0.000002
758
+ 2021-02-20 18:29:04,540 epoch 11 - iter 3528/5049 - loss 0.14850464 - samples/sec: 9.72 - lr: 0.000002
759
+ 2021-02-20 18:32:31,548 epoch 11 - iter 4032/5049 - loss 0.14682730 - samples/sec: 9.74 - lr: 0.000002
760
+ 2021-02-20 18:35:57,985 epoch 11 - iter 4536/5049 - loss 0.14759185 - samples/sec: 9.77 - lr: 0.000002
761
+ 2021-02-20 18:39:25,816 epoch 11 - iter 5040/5049 - loss 0.14698340 - samples/sec: 9.70 - lr: 0.000002
762
+ 2021-02-20 18:39:29,260 ----------------------------------------------------------------------------------------------------
763
+ 2021-02-20 18:39:29,260 EPOCH 11 done: loss 0.1468 - lr 0.0000021
764
+ 2021-02-20 18:41:33,245 TEST : loss 0.13701947033405304 - score 0.9387
765
+ 2021-02-20 18:41:33,275 BAD EPOCHS (no improvement): 4
766
+ 2021-02-20 18:41:33,280 ----------------------------------------------------------------------------------------------------
767
+ 2021-02-20 18:45:02,899 epoch 12 - iter 504/5049 - loss 0.14915151 - samples/sec: 9.62 - lr: 0.000002
768
+ 2021-02-20 18:48:30,072 epoch 12 - iter 1008/5049 - loss 0.13316084 - samples/sec: 9.73 - lr: 0.000002
769
+ 2021-02-20 18:51:53,567 epoch 12 - iter 1512/5049 - loss 0.13759726 - samples/sec: 9.91 - lr: 0.000002
770
+ 2021-02-20 18:55:21,958 epoch 12 - iter 2016/5049 - loss 0.14573488 - samples/sec: 9.68 - lr: 0.000002
771
+ 2021-02-20 18:58:50,123 epoch 12 - iter 2520/5049 - loss 0.14529516 - samples/sec: 9.69 - lr: 0.000002
772
+ 2021-02-20 19:02:16,173 epoch 12 - iter 3024/5049 - loss 0.14807294 - samples/sec: 9.78 - lr: 0.000002
773
+ 2021-02-20 19:05:43,697 epoch 12 - iter 3528/5049 - loss 0.15232340 - samples/sec: 9.72 - lr: 0.000002
774
+ 2021-02-20 19:09:08,910 epoch 12 - iter 4032/5049 - loss 0.15379466 - samples/sec: 9.82 - lr: 0.000002
775
+ 2021-02-20 19:12:36,683 epoch 12 - iter 4536/5049 - loss 0.15073956 - samples/sec: 9.70 - lr: 0.000002
776
+ 2021-02-20 19:16:04,449 epoch 12 - iter 5040/5049 - loss 0.15045583 - samples/sec: 9.70 - lr: 0.000002
777
+ 2021-02-20 19:16:08,082 ----------------------------------------------------------------------------------------------------
778
+ 2021-02-20 19:16:08,082 EPOCH 12 done: loss 0.1504 - lr 0.0000017
779
+ 2021-02-20 19:18:12,918 TEST : loss 0.15095502138137817 - score 0.9388
780
+ 2021-02-20 19:18:12,953 BAD EPOCHS (no improvement): 4
781
+ 2021-02-20 19:18:12,959 ----------------------------------------------------------------------------------------------------
782
+ 2021-02-20 19:21:40,048 epoch 13 - iter 504/5049 - loss 0.12902688 - samples/sec: 9.74 - lr: 0.000002
783
+ 2021-02-20 19:25:08,962 epoch 13 - iter 1008/5049 - loss 0.13949844 - samples/sec: 9.65 - lr: 0.000002
784
+ 2021-02-20 19:28:34,327 epoch 13 - iter 1512/5049 - loss 0.14321999 - samples/sec: 9.82 - lr: 0.000002
785
+ 2021-02-20 19:32:01,449 epoch 13 - iter 2016/5049 - loss 0.14469366 - samples/sec: 9.73 - lr: 0.000002
786
+ 2021-02-20 19:35:30,176 epoch 13 - iter 2520/5049 - loss 0.14233070 - samples/sec: 9.66 - lr: 0.000002
787
+ 2021-02-20 19:38:58,641 epoch 13 - iter 3024/5049 - loss 0.14131748 - samples/sec: 9.67 - lr: 0.000002
788
+ 2021-02-20 19:42:27,447 epoch 13 - iter 3528/5049 - loss 0.14047840 - samples/sec: 9.66 - lr: 0.000001
789
+ 2021-02-20 19:45:52,955 epoch 13 - iter 4032/5049 - loss 0.14627085 - samples/sec: 9.81 - lr: 0.000001
790
+ 2021-02-20 19:49:18,859 epoch 13 - iter 4536/5049 - loss 0.14438495 - samples/sec: 9.79 - lr: 0.000001
791
+ 2021-02-20 19:52:48,483 epoch 13 - iter 5040/5049 - loss 0.14466525 - samples/sec: 9.62 - lr: 0.000001
792
+ 2021-02-20 19:52:51,977 ----------------------------------------------------------------------------------------------------
793
+ 2021-02-20 19:52:51,977 EPOCH 13 done: loss 0.1446 - lr 0.0000014
794
+ 2021-02-20 19:54:57,358 TEST : loss 0.14258751273155212 - score 0.9399
795
+ 2021-02-20 19:54:57,388 BAD EPOCHS (no improvement): 4
796
+ 2021-02-20 19:54:57,392 ----------------------------------------------------------------------------------------------------
797
+ 2021-02-20 19:58:27,192 epoch 14 - iter 504/5049 - loss 0.15244849 - samples/sec: 9.61 - lr: 0.000001
798
+ 2021-02-20 20:01:54,054 epoch 14 - iter 1008/5049 - loss 0.15439315 - samples/sec: 9.75 - lr: 0.000001
799
+ 2021-02-20 20:05:20,574 epoch 14 - iter 1512/5049 - loss 0.15336394 - samples/sec: 9.76 - lr: 0.000001
800
+ 2021-02-20 20:08:47,946 epoch 14 - iter 2016/5049 - loss 0.15177470 - samples/sec: 9.72 - lr: 0.000001
801
+ 2021-02-20 20:12:16,402 epoch 14 - iter 2520/5049 - loss 0.14492786 - samples/sec: 9.67 - lr: 0.000001
802
+ 2021-02-20 20:15:44,769 epoch 14 - iter 3024/5049 - loss 0.14722528 - samples/sec: 9.68 - lr: 0.000001
803
+ 2021-02-20 20:19:11,969 epoch 14 - iter 3528/5049 - loss 0.14537507 - samples/sec: 9.73 - lr: 0.000001
804
+ 2021-02-20 20:22:40,528 epoch 14 - iter 4032/5049 - loss 0.14247368 - samples/sec: 9.67 - lr: 0.000001
805
+ 2021-02-20 20:26:06,304 epoch 14 - iter 4536/5049 - loss 0.14233014 - samples/sec: 9.80 - lr: 0.000001
806
+ 2021-02-20 20:29:35,214 epoch 14 - iter 5040/5049 - loss 0.14141983 - samples/sec: 9.65 - lr: 0.000001
807
+ 2021-02-20 20:29:38,745 ----------------------------------------------------------------------------------------------------
808
+ 2021-02-20 20:29:38,746 EPOCH 14 done: loss 0.1415 - lr 0.0000010
809
+ 2021-02-20 20:31:42,742 TEST : loss 0.16016331315040588 - score 0.9396
810
+ 2021-02-20 20:31:42,776 BAD EPOCHS (no improvement): 4
811
+ 2021-02-20 20:31:42,874 ----------------------------------------------------------------------------------------------------
812
+ 2021-02-20 20:35:10,584 epoch 15 - iter 504/5049 - loss 0.16948716 - samples/sec: 9.71 - lr: 0.000001
813
+ 2021-02-20 20:38:38,789 epoch 15 - iter 1008/5049 - loss 0.16114678 - samples/sec: 9.68 - lr: 0.000001
814
+ 2021-02-20 20:42:08,608 epoch 15 - iter 1512/5049 - loss 0.15736098 - samples/sec: 9.61 - lr: 0.000001
815
+ 2021-02-20 20:45:37,135 epoch 15 - iter 2016/5049 - loss 0.15347995 - samples/sec: 9.67 - lr: 0.000001
816
+ 2021-02-20 20:49:06,383 epoch 15 - iter 2520/5049 - loss 0.15053243 - samples/sec: 9.64 - lr: 0.000001
817
+ 2021-02-20 20:52:34,741 epoch 15 - iter 3024/5049 - loss 0.15367094 - samples/sec: 9.68 - lr: 0.000001
+ 2021-02-20 20:56:02,251 epoch 15 - iter 3528/5049 - loss 0.15097795 - samples/sec: 9.72 - lr: 0.000001
+ 2021-02-20 20:59:27,407 epoch 15 - iter 4032/5049 - loss 0.14762646 - samples/sec: 9.83 - lr: 0.000001
+ 2021-02-20 21:02:55,468 epoch 15 - iter 4536/5049 - loss 0.14764760 - samples/sec: 9.69 - lr: 0.000001
+ 2021-02-20 21:06:24,604 epoch 15 - iter 5040/5049 - loss 0.14664106 - samples/sec: 9.64 - lr: 0.000001
+ 2021-02-20 21:06:28,160 ----------------------------------------------------------------------------------------------------
+ 2021-02-20 21:06:28,160 EPOCH 15 done: loss 0.1468 - lr 0.0000007
+ 2021-02-20 21:08:34,321 TEST : loss 0.15258659422397614 - score 0.94
+ 2021-02-20 21:08:34,353 BAD EPOCHS (no improvement): 4
+ 2021-02-20 21:08:34,355 ----------------------------------------------------------------------------------------------------
+ 2021-02-20 21:12:02,633 epoch 16 - iter 504/5049 - loss 0.14775549 - samples/sec: 9.68 - lr: 0.000001
+ 2021-02-20 21:15:29,663 epoch 16 - iter 1008/5049 - loss 0.15171173 - samples/sec: 9.74 - lr: 0.000001
+ 2021-02-20 21:18:57,081 epoch 16 - iter 1512/5049 - loss 0.15467193 - samples/sec: 9.72 - lr: 0.000001
+ 2021-02-20 21:22:22,530 epoch 16 - iter 2016/5049 - loss 0.15499647 - samples/sec: 9.81 - lr: 0.000001
+ 2021-02-20 21:25:49,850 epoch 16 - iter 2520/5049 - loss 0.15723807 - samples/sec: 9.73 - lr: 0.000001
+ 2021-02-20 21:29:15,774 epoch 16 - iter 3024/5049 - loss 0.15353327 - samples/sec: 9.79 - lr: 0.000001
+ 2021-02-20 21:32:44,337 epoch 16 - iter 3528/5049 - loss 0.15530051 - samples/sec: 9.67 - lr: 0.000001
+ 2021-02-20 21:36:13,762 epoch 16 - iter 4032/5049 - loss 0.15354102 - samples/sec: 9.63 - lr: 0.000001
+ 2021-02-20 21:39:40,865 epoch 16 - iter 4536/5049 - loss 0.15328424 - samples/sec: 9.74 - lr: 0.000001
+ 2021-02-20 21:43:07,866 epoch 16 - iter 5040/5049 - loss 0.15234921 - samples/sec: 9.74 - lr: 0.000000
+ 2021-02-20 21:43:11,383 ----------------------------------------------------------------------------------------------------
+ 2021-02-20 21:43:11,383 EPOCH 16 done: loss 0.1522 - lr 0.0000005
+ 2021-02-20 21:45:16,386 TEST : loss 0.14317740499973297 - score 0.9434
+ 2021-02-20 21:45:16,421 BAD EPOCHS (no improvement): 4
+ 2021-02-20 21:45:16,435 ----------------------------------------------------------------------------------------------------
+ 2021-02-20 21:48:44,324 epoch 17 - iter 504/5049 - loss 0.17996491 - samples/sec: 9.70 - lr: 0.000000
+ 2021-02-20 21:52:11,485 epoch 17 - iter 1008/5049 - loss 0.15543252 - samples/sec: 9.73 - lr: 0.000000
+ 2021-02-20 21:55:39,073 epoch 17 - iter 1512/5049 - loss 0.15122585 - samples/sec: 9.71 - lr: 0.000000
+ 2021-02-20 21:59:05,347 epoch 17 - iter 2016/5049 - loss 0.14783825 - samples/sec: 9.77 - lr: 0.000000
+ 2021-02-20 22:02:33,153 epoch 17 - iter 2520/5049 - loss 0.14858434 - samples/sec: 9.70 - lr: 0.000000
+ 2021-02-20 22:06:00,594 epoch 17 - iter 3024/5049 - loss 0.14719342 - samples/sec: 9.72 - lr: 0.000000
+ 2021-02-20 22:09:28,634 epoch 17 - iter 3528/5049 - loss 0.14664091 - samples/sec: 9.69 - lr: 0.000000
+ 2021-02-20 22:12:55,588 epoch 17 - iter 4032/5049 - loss 0.14789258 - samples/sec: 9.74 - lr: 0.000000
+ 2021-02-20 22:16:23,015 epoch 17 - iter 4536/5049 - loss 0.14772011 - samples/sec: 9.72 - lr: 0.000000
+ 2021-02-20 22:19:48,689 epoch 17 - iter 5040/5049 - loss 0.14601221 - samples/sec: 9.80 - lr: 0.000000
+ 2021-02-20 22:19:52,053 ----------------------------------------------------------------------------------------------------
+ 2021-02-20 22:19:52,053 EPOCH 17 done: loss 0.1459 - lr 0.0000003
+ 2021-02-20 22:21:56,595 TEST : loss 0.14864514768123627 - score 0.9435
+ 2021-02-20 22:21:56,631 BAD EPOCHS (no improvement): 4
+ 2021-02-20 22:21:56,633 ----------------------------------------------------------------------------------------------------
+ 2021-02-20 22:25:22,139 epoch 18 - iter 504/5049 - loss 0.13554364 - samples/sec: 9.81 - lr: 0.000000
+ 2021-02-20 22:28:49,994 epoch 18 - iter 1008/5049 - loss 0.14305913 - samples/sec: 9.70 - lr: 0.000000
+ 2021-02-20 22:32:15,601 epoch 18 - iter 1512/5049 - loss 0.13788820 - samples/sec: 9.81 - lr: 0.000000
+ 2021-02-20 22:35:43,508 epoch 18 - iter 2016/5049 - loss 0.13837578 - samples/sec: 9.70 - lr: 0.000000
+ 2021-02-20 22:39:11,318 epoch 18 - iter 2520/5049 - loss 0.14012105 - samples/sec: 9.70 - lr: 0.000000
+ 2021-02-20 22:42:39,481 epoch 18 - iter 3024/5049 - loss 0.13876418 - samples/sec: 9.69 - lr: 0.000000
+ 2021-02-20 22:46:07,677 epoch 18 - iter 3528/5049 - loss 0.13934073 - samples/sec: 9.68 - lr: 0.000000
+ 2021-02-20 22:49:36,353 epoch 18 - iter 4032/5049 - loss 0.14036170 - samples/sec: 9.66 - lr: 0.000000
+ 2021-02-20 22:53:02,472 epoch 18 - iter 4536/5049 - loss 0.13826052 - samples/sec: 9.78 - lr: 0.000000
+ 2021-02-20 22:56:29,133 epoch 18 - iter 5040/5049 - loss 0.13982791 - samples/sec: 9.76 - lr: 0.000000
+ 2021-02-20 22:56:32,612 ----------------------------------------------------------------------------------------------------
+ 2021-02-20 22:56:32,613 EPOCH 18 done: loss 0.1397 - lr 0.0000001
+ 2021-02-20 22:58:37,314 TEST : loss 0.1518455296754837 - score 0.9437
+ 2021-02-20 22:58:37,347 BAD EPOCHS (no improvement): 4
+ 2021-02-20 22:58:37,349 ----------------------------------------------------------------------------------------------------
+ 2021-02-20 23:02:03,828 epoch 19 - iter 504/5049 - loss 0.13900759 - samples/sec: 9.76 - lr: 0.000000
+ 2021-02-20 23:05:30,296 epoch 19 - iter 1008/5049 - loss 0.14452024 - samples/sec: 9.77 - lr: 0.000000
+ 2021-02-20 23:08:57,447 epoch 19 - iter 1512/5049 - loss 0.14064833 - samples/sec: 9.73 - lr: 0.000000
+ 2021-02-20 23:12:23,953 epoch 19 - iter 2016/5049 - loss 0.13464772 - samples/sec: 9.76 - lr: 0.000000
+ 2021-02-20 23:15:51,459 epoch 19 - iter 2520/5049 - loss 0.13777886 - samples/sec: 9.72 - lr: 0.000000
+ 2021-02-20 23:19:17,489 epoch 19 - iter 3024/5049 - loss 0.13952515 - samples/sec: 9.79 - lr: 0.000000
+ 2021-02-20 23:22:45,967 epoch 19 - iter 3528/5049 - loss 0.14131733 - samples/sec: 9.67 - lr: 0.000000
+ 2021-02-20 23:26:13,407 epoch 19 - iter 4032/5049 - loss 0.13939496 - samples/sec: 9.72 - lr: 0.000000
+ 2021-02-20 23:29:44,085 epoch 19 - iter 4536/5049 - loss 0.13930015 - samples/sec: 9.57 - lr: 0.000000
+ 2021-02-20 23:33:12,190 epoch 19 - iter 5040/5049 - loss 0.14268221 - samples/sec: 9.69 - lr: 0.000000
+ 2021-02-20 23:33:15,754 ----------------------------------------------------------------------------------------------------
+ 2021-02-20 23:33:15,754 EPOCH 19 done: loss 0.1425 - lr 0.0000000
+ 2021-02-20 23:35:20,374 TEST : loss 0.15170469880104065 - score 0.9433
+ 2021-02-20 23:35:20,405 BAD EPOCHS (no improvement): 4
+ 2021-02-20 23:35:20,408 ----------------------------------------------------------------------------------------------------
+ 2021-02-20 23:38:48,797 epoch 20 - iter 504/5049 - loss 0.11983740 - samples/sec: 9.68 - lr: 0.000000
+ 2021-02-20 23:42:16,401 epoch 20 - iter 1008/5049 - loss 0.12881478 - samples/sec: 9.71 - lr: 0.000000
+ 2021-02-20 23:45:42,588 epoch 20 - iter 1512/5049 - loss 0.13435941 - samples/sec: 9.78 - lr: 0.000000
+ 2021-02-20 23:49:09,566 epoch 20 - iter 2016/5049 - loss 0.13495553 - samples/sec: 9.74 - lr: 0.000000
+ 2021-02-20 23:52:36,896 epoch 20 - iter 2520/5049 - loss 0.13517442 - samples/sec: 9.72 - lr: 0.000000
+ 2021-02-20 23:56:06,234 epoch 20 - iter 3024/5049 - loss 0.13889997 - samples/sec: 9.63 - lr: 0.000000
+ 2021-02-20 23:59:35,831 epoch 20 - iter 3528/5049 - loss 0.13720651 - samples/sec: 9.62 - lr: 0.000000
+ 2021-02-21 00:03:03,594 epoch 20 - iter 4032/5049 - loss 0.13855230 - samples/sec: 9.70 - lr: 0.000000
+ 2021-02-21 00:06:30,095 epoch 20 - iter 4536/5049 - loss 0.14032340 - samples/sec: 9.76 - lr: 0.000000
+ 2021-02-21 00:09:58,484 epoch 20 - iter 5040/5049 - loss 0.13983281 - samples/sec: 9.68 - lr: 0.000000
+ 2021-02-21 00:10:02,013 ----------------------------------------------------------------------------------------------------
+ 2021-02-21 00:10:02,013 EPOCH 20 done: loss 0.1398 - lr 0.0000000
+ 2021-02-21 00:12:06,767 TEST : loss 0.15191785991191864 - score 0.9436
+ 2021-02-21 00:12:06,801 BAD EPOCHS (no improvement): 4
+ 2021-02-21 00:12:53,129 ----------------------------------------------------------------------------------------------------
+ 2021-02-21 00:12:53,129 Testing using best model ...
+ 2021-02-21 00:15:03,989 0.9408 0.9465 0.9436
+ 2021-02-21 00:15:03,989
+ Results:
+ - F1-score (micro) 0.9436
+ - F1-score (macro) 0.9374
+
+ By class:
+ LOC tp: 1445 - fp: 134 - fn: 69 - precision: 0.9151 - recall: 0.9544 - f1-score: 0.9344
+ MISC tp: 627 - fp: 96 - fn: 51 - precision: 0.8672 - recall: 0.9248 - f1-score: 0.8951
+ ORG tp: 1679 - fp: 98 - fn: 174 - precision: 0.9449 - recall: 0.9061 - f1-score: 0.9251
+ PER tp: 1587 - fp: 8 - fn: 8 - precision: 0.9950 - recall: 0.9950 - f1-score: 0.9950
+ 2021-02-21 00:15:03,989 ----------------------------------------------------------------------------------------------------
+ 2021-02-21 00:15:03,989 ----------------------------------------------------------------------------------------------------
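
For reference (this note is an editorial addition, not part of the log): the unlabeled triple `0.9408 0.9465 0.9436` printed after "Testing using best model" is micro-averaged precision, recall, and F1, and all scores in the final block follow directly from the per-class tp/fp/fn counts. A minimal Python sketch recomputing them from the numbers logged above:

```python
# Recompute the reported scores from the per-class counts in the log.
# counts[name] = (true positives, false positives, false negatives)
counts = {
    "LOC":  (1445, 134, 69),
    "MISC": (627, 96, 51),
    "ORG":  (1679, 98, 174),
    "PER":  (1587, 8, 8),
}

def prf(tp, fp, fn):
    """Standard precision / recall / F1 from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# per-class scores (match the "By class" block above)
for name, (tp, fp, fn) in counts.items():
    p, r, f = prf(tp, fp, fn)
    print(f"{name}: precision {p:.4f} - recall {r:.4f} - f1-score {f:.4f}")

# micro average: pool counts over all classes before computing P/R/F1
tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
print("micro:", [round(x, 4) for x in prf(tp, fp, fn)])  # [0.9408, 0.9465, 0.9436]

# macro average: unweighted mean of the per-class F1 scores
macro_f1 = sum(prf(*c)[2] for c in counts.values()) / len(counts)
print("macro F1:", round(macro_f1, 4))  # 0.9374
```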