2023-10-25 10:37:21,901 ----------------------------------------------------------------------------------------------------
2023-10-25 10:37:21,902 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(64001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-25 10:37:21,902 ----------------------------------------------------------------------------------------------------
2023-10-25 10:37:21,903 MultiCorpus: 6183 train + 680 dev + 2113 test sentences
 - NER_HIPE_2022 Corpus: 6183 train + 680 dev + 2113 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/topres19th/en/with_doc_seperator
2023-10-25 10:37:21,903 ----------------------------------------------------------------------------------------------------
2023-10-25 10:37:21,903 Train:  6183 sentences
2023-10-25 10:37:21,903         (train_with_dev=False, train_with_test=False)
2023-10-25 10:37:21,903 ----------------------------------------------------------------------------------------------------
2023-10-25 10:37:21,903 Training Params:
2023-10-25 10:37:21,903  - learning_rate: "3e-05" 
2023-10-25 10:37:21,903  - mini_batch_size: "8"
2023-10-25 10:37:21,903  - max_epochs: "10"
2023-10-25 10:37:21,903  - shuffle: "True"
2023-10-25 10:37:21,903 ----------------------------------------------------------------------------------------------------
2023-10-25 10:37:21,903 Plugins:
2023-10-25 10:37:21,903  - TensorboardLogger
2023-10-25 10:37:21,903  - LinearScheduler | warmup_fraction: '0.1'
2023-10-25 10:37:21,903 ----------------------------------------------------------------------------------------------------
2023-10-25 10:37:21,903 Final evaluation on model from best epoch (best-model.pt)
2023-10-25 10:37:21,903  - metric: "('micro avg', 'f1-score')"
2023-10-25 10:37:21,903 ----------------------------------------------------------------------------------------------------
2023-10-25 10:37:21,903 Computation:
2023-10-25 10:37:21,903  - compute on device: cuda:0
2023-10-25 10:37:21,903  - embedding storage: none
2023-10-25 10:37:21,903 ----------------------------------------------------------------------------------------------------
2023-10-25 10:37:21,903 Model training base path: "hmbench-topres19th/en-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs8-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-2"
2023-10-25 10:37:21,903 ----------------------------------------------------------------------------------------------------
2023-10-25 10:37:21,903 ----------------------------------------------------------------------------------------------------
2023-10-25 10:37:21,903 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-25 10:37:26,605 epoch 1 - iter 77/773 - loss 2.00342293 - time (sec): 4.70 - samples/sec: 2712.01 - lr: 0.000003 - momentum: 0.000000
2023-10-25 10:37:31,262 epoch 1 - iter 154/773 - loss 1.14055389 - time (sec): 9.36 - samples/sec: 2662.74 - lr: 0.000006 - momentum: 0.000000
2023-10-25 10:37:35,889 epoch 1 - iter 231/773 - loss 0.82625202 - time (sec): 13.98 - samples/sec: 2642.41 - lr: 0.000009 - momentum: 0.000000
2023-10-25 10:37:40,575 epoch 1 - iter 308/773 - loss 0.64493288 - time (sec): 18.67 - samples/sec: 2668.34 - lr: 0.000012 - momentum: 0.000000
2023-10-25 10:37:45,208 epoch 1 - iter 385/773 - loss 0.53761507 - time (sec): 23.30 - samples/sec: 2669.10 - lr: 0.000015 - momentum: 0.000000
2023-10-25 10:37:49,742 epoch 1 - iter 462/773 - loss 0.47184545 - time (sec): 27.84 - samples/sec: 2656.78 - lr: 0.000018 - momentum: 0.000000
2023-10-25 10:37:54,392 epoch 1 - iter 539/773 - loss 0.41778036 - time (sec): 32.49 - samples/sec: 2672.72 - lr: 0.000021 - momentum: 0.000000
2023-10-25 10:37:59,140 epoch 1 - iter 616/773 - loss 0.37526149 - time (sec): 37.24 - samples/sec: 2675.01 - lr: 0.000024 - momentum: 0.000000
2023-10-25 10:38:03,677 epoch 1 - iter 693/773 - loss 0.34438623 - time (sec): 41.77 - samples/sec: 2678.19 - lr: 0.000027 - momentum: 0.000000
2023-10-25 10:38:08,219 epoch 1 - iter 770/773 - loss 0.32016433 - time (sec): 46.31 - samples/sec: 2673.41 - lr: 0.000030 - momentum: 0.000000
2023-10-25 10:38:08,404 ----------------------------------------------------------------------------------------------------
2023-10-25 10:38:08,404 EPOCH 1 done: loss 0.3193 - lr: 0.000030
2023-10-25 10:38:11,901 DEV : loss 0.05575157329440117 - f1-score (micro avg)  0.7258
2023-10-25 10:38:11,920 saving best model
2023-10-25 10:38:12,442 ----------------------------------------------------------------------------------------------------
2023-10-25 10:38:17,190 epoch 2 - iter 77/773 - loss 0.06917782 - time (sec): 4.75 - samples/sec: 2589.73 - lr: 0.000030 - momentum: 0.000000
2023-10-25 10:38:21,845 epoch 2 - iter 154/773 - loss 0.07295557 - time (sec): 9.40 - samples/sec: 2635.98 - lr: 0.000029 - momentum: 0.000000
2023-10-25 10:38:26,432 epoch 2 - iter 231/773 - loss 0.07445178 - time (sec): 13.99 - samples/sec: 2577.67 - lr: 0.000029 - momentum: 0.000000
2023-10-25 10:38:31,123 epoch 2 - iter 308/773 - loss 0.07546608 - time (sec): 18.68 - samples/sec: 2620.82 - lr: 0.000029 - momentum: 0.000000
2023-10-25 10:38:35,870 epoch 2 - iter 385/773 - loss 0.07298737 - time (sec): 23.43 - samples/sec: 2630.85 - lr: 0.000028 - momentum: 0.000000
2023-10-25 10:38:40,555 epoch 2 - iter 462/773 - loss 0.07131937 - time (sec): 28.11 - samples/sec: 2642.70 - lr: 0.000028 - momentum: 0.000000
2023-10-25 10:38:45,260 epoch 2 - iter 539/773 - loss 0.07085615 - time (sec): 32.82 - samples/sec: 2652.90 - lr: 0.000028 - momentum: 0.000000
2023-10-25 10:38:50,000 epoch 2 - iter 616/773 - loss 0.07015877 - time (sec): 37.56 - samples/sec: 2628.79 - lr: 0.000027 - momentum: 0.000000
2023-10-25 10:38:54,537 epoch 2 - iter 693/773 - loss 0.06882789 - time (sec): 42.09 - samples/sec: 2624.41 - lr: 0.000027 - momentum: 0.000000
2023-10-25 10:38:59,395 epoch 2 - iter 770/773 - loss 0.06928794 - time (sec): 46.95 - samples/sec: 2635.97 - lr: 0.000027 - momentum: 0.000000
2023-10-25 10:38:59,585 ----------------------------------------------------------------------------------------------------
2023-10-25 10:38:59,586 EPOCH 2 done: loss 0.0691 - lr: 0.000027
2023-10-25 10:39:02,404 DEV : loss 0.049297548830509186 - f1-score (micro avg)  0.8142
2023-10-25 10:39:02,422 saving best model
2023-10-25 10:39:03,089 ----------------------------------------------------------------------------------------------------
2023-10-25 10:39:07,755 epoch 3 - iter 77/773 - loss 0.04741193 - time (sec): 4.66 - samples/sec: 2618.48 - lr: 0.000026 - momentum: 0.000000
2023-10-25 10:39:12,662 epoch 3 - iter 154/773 - loss 0.04548219 - time (sec): 9.57 - samples/sec: 2493.23 - lr: 0.000026 - momentum: 0.000000
2023-10-25 10:39:17,185 epoch 3 - iter 231/773 - loss 0.04459085 - time (sec): 14.09 - samples/sec: 2538.42 - lr: 0.000026 - momentum: 0.000000
2023-10-25 10:39:21,796 epoch 3 - iter 308/773 - loss 0.04386816 - time (sec): 18.70 - samples/sec: 2585.25 - lr: 0.000025 - momentum: 0.000000
2023-10-25 10:39:26,414 epoch 3 - iter 385/773 - loss 0.04428707 - time (sec): 23.32 - samples/sec: 2595.54 - lr: 0.000025 - momentum: 0.000000
2023-10-25 10:39:31,080 epoch 3 - iter 462/773 - loss 0.04386941 - time (sec): 27.99 - samples/sec: 2628.33 - lr: 0.000025 - momentum: 0.000000
2023-10-25 10:39:35,843 epoch 3 - iter 539/773 - loss 0.04425161 - time (sec): 32.75 - samples/sec: 2629.03 - lr: 0.000024 - momentum: 0.000000
2023-10-25 10:39:40,653 epoch 3 - iter 616/773 - loss 0.04565354 - time (sec): 37.56 - samples/sec: 2628.00 - lr: 0.000024 - momentum: 0.000000
2023-10-25 10:39:45,336 epoch 3 - iter 693/773 - loss 0.04638286 - time (sec): 42.24 - samples/sec: 2640.83 - lr: 0.000024 - momentum: 0.000000
2023-10-25 10:39:50,069 epoch 3 - iter 770/773 - loss 0.04580281 - time (sec): 46.98 - samples/sec: 2637.11 - lr: 0.000023 - momentum: 0.000000
2023-10-25 10:39:50,249 ----------------------------------------------------------------------------------------------------
2023-10-25 10:39:50,249 EPOCH 3 done: loss 0.0457 - lr: 0.000023
2023-10-25 10:39:53,011 DEV : loss 0.07478724420070648 - f1-score (micro avg)  0.7705
2023-10-25 10:39:53,029 ----------------------------------------------------------------------------------------------------
2023-10-25 10:39:57,751 epoch 4 - iter 77/773 - loss 0.02381115 - time (sec): 4.72 - samples/sec: 2644.65 - lr: 0.000023 - momentum: 0.000000
2023-10-25 10:40:02,403 epoch 4 - iter 154/773 - loss 0.02272232 - time (sec): 9.37 - samples/sec: 2696.05 - lr: 0.000023 - momentum: 0.000000
2023-10-25 10:40:07,048 epoch 4 - iter 231/773 - loss 0.02306162 - time (sec): 14.02 - samples/sec: 2694.29 - lr: 0.000022 - momentum: 0.000000
2023-10-25 10:40:11,691 epoch 4 - iter 308/773 - loss 0.02500263 - time (sec): 18.66 - samples/sec: 2695.98 - lr: 0.000022 - momentum: 0.000000
2023-10-25 10:40:16,351 epoch 4 - iter 385/773 - loss 0.02577652 - time (sec): 23.32 - samples/sec: 2669.27 - lr: 0.000022 - momentum: 0.000000
2023-10-25 10:40:21,109 epoch 4 - iter 462/773 - loss 0.02834569 - time (sec): 28.08 - samples/sec: 2640.19 - lr: 0.000021 - momentum: 0.000000
2023-10-25 10:40:26,055 epoch 4 - iter 539/773 - loss 0.02860700 - time (sec): 33.02 - samples/sec: 2629.75 - lr: 0.000021 - momentum: 0.000000
2023-10-25 10:40:30,766 epoch 4 - iter 616/773 - loss 0.02820990 - time (sec): 37.73 - samples/sec: 2639.20 - lr: 0.000021 - momentum: 0.000000
2023-10-25 10:40:35,442 epoch 4 - iter 693/773 - loss 0.02789789 - time (sec): 42.41 - samples/sec: 2648.82 - lr: 0.000020 - momentum: 0.000000
2023-10-25 10:40:39,951 epoch 4 - iter 770/773 - loss 0.02986146 - time (sec): 46.92 - samples/sec: 2638.85 - lr: 0.000020 - momentum: 0.000000
2023-10-25 10:40:40,133 ----------------------------------------------------------------------------------------------------
2023-10-25 10:40:40,133 EPOCH 4 done: loss 0.0299 - lr: 0.000020
2023-10-25 10:40:42,825 DEV : loss 0.08224356174468994 - f1-score (micro avg)  0.7658
2023-10-25 10:40:42,842 ----------------------------------------------------------------------------------------------------
2023-10-25 10:40:47,542 epoch 5 - iter 77/773 - loss 0.02345774 - time (sec): 4.70 - samples/sec: 2619.37 - lr: 0.000020 - momentum: 0.000000
2023-10-25 10:40:52,241 epoch 5 - iter 154/773 - loss 0.02122391 - time (sec): 9.40 - samples/sec: 2625.76 - lr: 0.000019 - momentum: 0.000000
2023-10-25 10:40:56,970 epoch 5 - iter 231/773 - loss 0.01984080 - time (sec): 14.13 - samples/sec: 2661.60 - lr: 0.000019 - momentum: 0.000000
2023-10-25 10:41:01,364 epoch 5 - iter 308/773 - loss 0.02263347 - time (sec): 18.52 - samples/sec: 2686.02 - lr: 0.000019 - momentum: 0.000000
2023-10-25 10:41:06,019 epoch 5 - iter 385/773 - loss 0.02194096 - time (sec): 23.18 - samples/sec: 2706.00 - lr: 0.000018 - momentum: 0.000000
2023-10-25 10:41:10,705 epoch 5 - iter 462/773 - loss 0.02179624 - time (sec): 27.86 - samples/sec: 2708.53 - lr: 0.000018 - momentum: 0.000000
2023-10-25 10:41:15,249 epoch 5 - iter 539/773 - loss 0.02057243 - time (sec): 32.41 - samples/sec: 2722.04 - lr: 0.000018 - momentum: 0.000000
2023-10-25 10:41:19,743 epoch 5 - iter 616/773 - loss 0.02059981 - time (sec): 36.90 - samples/sec: 2702.67 - lr: 0.000017 - momentum: 0.000000
2023-10-25 10:41:24,238 epoch 5 - iter 693/773 - loss 0.02048126 - time (sec): 41.39 - samples/sec: 2712.66 - lr: 0.000017 - momentum: 0.000000
2023-10-25 10:41:28,644 epoch 5 - iter 770/773 - loss 0.02099495 - time (sec): 45.80 - samples/sec: 2703.67 - lr: 0.000017 - momentum: 0.000000
2023-10-25 10:41:28,839 ----------------------------------------------------------------------------------------------------
2023-10-25 10:41:28,840 EPOCH 5 done: loss 0.0212 - lr: 0.000017
2023-10-25 10:41:31,552 DEV : loss 0.09945573657751083 - f1-score (micro avg)  0.781
2023-10-25 10:41:31,572 ----------------------------------------------------------------------------------------------------
2023-10-25 10:41:36,179 epoch 6 - iter 77/773 - loss 0.01534591 - time (sec): 4.60 - samples/sec: 2745.09 - lr: 0.000016 - momentum: 0.000000
2023-10-25 10:41:40,809 epoch 6 - iter 154/773 - loss 0.01519805 - time (sec): 9.24 - samples/sec: 2722.23 - lr: 0.000016 - momentum: 0.000000
2023-10-25 10:41:45,397 epoch 6 - iter 231/773 - loss 0.01464584 - time (sec): 13.82 - samples/sec: 2671.47 - lr: 0.000016 - momentum: 0.000000
2023-10-25 10:41:50,157 epoch 6 - iter 308/773 - loss 0.01414850 - time (sec): 18.58 - samples/sec: 2679.03 - lr: 0.000015 - momentum: 0.000000
2023-10-25 10:41:54,914 epoch 6 - iter 385/773 - loss 0.01433663 - time (sec): 23.34 - samples/sec: 2701.85 - lr: 0.000015 - momentum: 0.000000
2023-10-25 10:41:59,690 epoch 6 - iter 462/773 - loss 0.01277122 - time (sec): 28.12 - samples/sec: 2696.50 - lr: 0.000015 - momentum: 0.000000
2023-10-25 10:42:04,329 epoch 6 - iter 539/773 - loss 0.01360655 - time (sec): 32.76 - samples/sec: 2682.46 - lr: 0.000014 - momentum: 0.000000
2023-10-25 10:42:09,181 epoch 6 - iter 616/773 - loss 0.01363450 - time (sec): 37.61 - samples/sec: 2648.33 - lr: 0.000014 - momentum: 0.000000
2023-10-25 10:42:14,051 epoch 6 - iter 693/773 - loss 0.01353720 - time (sec): 42.48 - samples/sec: 2630.62 - lr: 0.000014 - momentum: 0.000000
2023-10-25 10:42:18,779 epoch 6 - iter 770/773 - loss 0.01363499 - time (sec): 47.21 - samples/sec: 2624.71 - lr: 0.000013 - momentum: 0.000000
2023-10-25 10:42:18,961 ----------------------------------------------------------------------------------------------------
2023-10-25 10:42:18,962 EPOCH 6 done: loss 0.0140 - lr: 0.000013
2023-10-25 10:42:22,522 DEV : loss 0.11278796941041946 - f1-score (micro avg)  0.7753
2023-10-25 10:42:22,540 ----------------------------------------------------------------------------------------------------
2023-10-25 10:42:27,293 epoch 7 - iter 77/773 - loss 0.00960456 - time (sec): 4.75 - samples/sec: 2680.96 - lr: 0.000013 - momentum: 0.000000
2023-10-25 10:42:32,004 epoch 7 - iter 154/773 - loss 0.00936374 - time (sec): 9.46 - samples/sec: 2626.52 - lr: 0.000013 - momentum: 0.000000
2023-10-25 10:42:36,826 epoch 7 - iter 231/773 - loss 0.00747500 - time (sec): 14.28 - samples/sec: 2713.22 - lr: 0.000012 - momentum: 0.000000
2023-10-25 10:42:41,284 epoch 7 - iter 308/773 - loss 0.00789143 - time (sec): 18.74 - samples/sec: 2647.02 - lr: 0.000012 - momentum: 0.000000
2023-10-25 10:42:45,927 epoch 7 - iter 385/773 - loss 0.00801181 - time (sec): 23.39 - samples/sec: 2644.58 - lr: 0.000012 - momentum: 0.000000
2023-10-25 10:42:50,560 epoch 7 - iter 462/773 - loss 0.00730589 - time (sec): 28.02 - samples/sec: 2658.91 - lr: 0.000011 - momentum: 0.000000
2023-10-25 10:42:55,182 epoch 7 - iter 539/773 - loss 0.00808199 - time (sec): 32.64 - samples/sec: 2631.74 - lr: 0.000011 - momentum: 0.000000
2023-10-25 10:42:59,804 epoch 7 - iter 616/773 - loss 0.00863132 - time (sec): 37.26 - samples/sec: 2626.61 - lr: 0.000011 - momentum: 0.000000
2023-10-25 10:43:04,706 epoch 7 - iter 693/773 - loss 0.00876967 - time (sec): 42.16 - samples/sec: 2631.43 - lr: 0.000010 - momentum: 0.000000
2023-10-25 10:43:09,406 epoch 7 - iter 770/773 - loss 0.00915212 - time (sec): 46.86 - samples/sec: 2639.82 - lr: 0.000010 - momentum: 0.000000
2023-10-25 10:43:09,584 ----------------------------------------------------------------------------------------------------
2023-10-25 10:43:09,584 EPOCH 7 done: loss 0.0091 - lr: 0.000010
2023-10-25 10:43:12,629 DEV : loss 0.11861388385295868 - f1-score (micro avg)  0.7724
2023-10-25 10:43:12,647 ----------------------------------------------------------------------------------------------------
2023-10-25 10:43:17,325 epoch 8 - iter 77/773 - loss 0.00461380 - time (sec): 4.68 - samples/sec: 2636.24 - lr: 0.000010 - momentum: 0.000000
2023-10-25 10:43:21,940 epoch 8 - iter 154/773 - loss 0.00702621 - time (sec): 9.29 - samples/sec: 2672.72 - lr: 0.000009 - momentum: 0.000000
2023-10-25 10:43:26,655 epoch 8 - iter 231/773 - loss 0.00812508 - time (sec): 14.01 - samples/sec: 2598.23 - lr: 0.000009 - momentum: 0.000000
2023-10-25 10:43:31,362 epoch 8 - iter 308/773 - loss 0.00661780 - time (sec): 18.71 - samples/sec: 2595.72 - lr: 0.000009 - momentum: 0.000000
2023-10-25 10:43:35,901 epoch 8 - iter 385/773 - loss 0.00640004 - time (sec): 23.25 - samples/sec: 2629.17 - lr: 0.000008 - momentum: 0.000000
2023-10-25 10:43:40,469 epoch 8 - iter 462/773 - loss 0.00633717 - time (sec): 27.82 - samples/sec: 2675.90 - lr: 0.000008 - momentum: 0.000000
2023-10-25 10:43:45,129 epoch 8 - iter 539/773 - loss 0.00653842 - time (sec): 32.48 - samples/sec: 2677.43 - lr: 0.000008 - momentum: 0.000000
2023-10-25 10:43:49,745 epoch 8 - iter 616/773 - loss 0.00732939 - time (sec): 37.10 - samples/sec: 2673.52 - lr: 0.000007 - momentum: 0.000000
2023-10-25 10:43:54,376 epoch 8 - iter 693/773 - loss 0.00709463 - time (sec): 41.73 - samples/sec: 2665.60 - lr: 0.000007 - momentum: 0.000000
2023-10-25 10:43:59,079 epoch 8 - iter 770/773 - loss 0.00668988 - time (sec): 46.43 - samples/sec: 2665.10 - lr: 0.000007 - momentum: 0.000000
2023-10-25 10:43:59,266 ----------------------------------------------------------------------------------------------------
2023-10-25 10:43:59,266 EPOCH 8 done: loss 0.0067 - lr: 0.000007
2023-10-25 10:44:02,424 DEV : loss 0.10935225337743759 - f1-score (micro avg)  0.7901
2023-10-25 10:44:02,442 ----------------------------------------------------------------------------------------------------
2023-10-25 10:44:07,211 epoch 9 - iter 77/773 - loss 0.00280222 - time (sec): 4.77 - samples/sec: 2641.93 - lr: 0.000006 - momentum: 0.000000
2023-10-25 10:44:11,756 epoch 9 - iter 154/773 - loss 0.00290731 - time (sec): 9.31 - samples/sec: 2687.34 - lr: 0.000006 - momentum: 0.000000
2023-10-25 10:44:16,394 epoch 9 - iter 231/773 - loss 0.00405151 - time (sec): 13.95 - samples/sec: 2681.13 - lr: 0.000006 - momentum: 0.000000
2023-10-25 10:44:21,122 epoch 9 - iter 308/773 - loss 0.00435854 - time (sec): 18.68 - samples/sec: 2697.32 - lr: 0.000005 - momentum: 0.000000
2023-10-25 10:44:25,805 epoch 9 - iter 385/773 - loss 0.00434929 - time (sec): 23.36 - samples/sec: 2682.52 - lr: 0.000005 - momentum: 0.000000
2023-10-25 10:44:30,607 epoch 9 - iter 462/773 - loss 0.00405141 - time (sec): 28.16 - samples/sec: 2661.17 - lr: 0.000005 - momentum: 0.000000
2023-10-25 10:44:35,355 epoch 9 - iter 539/773 - loss 0.00398165 - time (sec): 32.91 - samples/sec: 2641.67 - lr: 0.000004 - momentum: 0.000000
2023-10-25 10:44:40,206 epoch 9 - iter 616/773 - loss 0.00428787 - time (sec): 37.76 - samples/sec: 2622.47 - lr: 0.000004 - momentum: 0.000000
2023-10-25 10:44:44,909 epoch 9 - iter 693/773 - loss 0.00416455 - time (sec): 42.47 - samples/sec: 2642.13 - lr: 0.000004 - momentum: 0.000000
2023-10-25 10:44:49,527 epoch 9 - iter 770/773 - loss 0.00393684 - time (sec): 47.08 - samples/sec: 2633.23 - lr: 0.000003 - momentum: 0.000000
2023-10-25 10:44:49,707 ----------------------------------------------------------------------------------------------------
2023-10-25 10:44:49,707 EPOCH 9 done: loss 0.0039 - lr: 0.000003
2023-10-25 10:44:52,325 DEV : loss 0.11163745075464249 - f1-score (micro avg)  0.7942
2023-10-25 10:44:52,342 ----------------------------------------------------------------------------------------------------
2023-10-25 10:44:57,023 epoch 10 - iter 77/773 - loss 0.00160620 - time (sec): 4.68 - samples/sec: 2530.57 - lr: 0.000003 - momentum: 0.000000
2023-10-25 10:45:01,740 epoch 10 - iter 154/773 - loss 0.00218583 - time (sec): 9.40 - samples/sec: 2501.82 - lr: 0.000003 - momentum: 0.000000
2023-10-25 10:45:06,385 epoch 10 - iter 231/773 - loss 0.00253136 - time (sec): 14.04 - samples/sec: 2524.83 - lr: 0.000002 - momentum: 0.000000
2023-10-25 10:45:10,823 epoch 10 - iter 308/773 - loss 0.00321432 - time (sec): 18.48 - samples/sec: 2583.04 - lr: 0.000002 - momentum: 0.000000
2023-10-25 10:45:15,327 epoch 10 - iter 385/773 - loss 0.00292953 - time (sec): 22.98 - samples/sec: 2578.23 - lr: 0.000002 - momentum: 0.000000
2023-10-25 10:45:20,048 epoch 10 - iter 462/773 - loss 0.00283533 - time (sec): 27.70 - samples/sec: 2604.20 - lr: 0.000001 - momentum: 0.000000
2023-10-25 10:45:24,719 epoch 10 - iter 539/773 - loss 0.00275001 - time (sec): 32.37 - samples/sec: 2628.38 - lr: 0.000001 - momentum: 0.000000
2023-10-25 10:45:29,618 epoch 10 - iter 616/773 - loss 0.00267312 - time (sec): 37.27 - samples/sec: 2636.23 - lr: 0.000001 - momentum: 0.000000
2023-10-25 10:45:34,364 epoch 10 - iter 693/773 - loss 0.00242572 - time (sec): 42.02 - samples/sec: 2648.91 - lr: 0.000000 - momentum: 0.000000
2023-10-25 10:45:39,113 epoch 10 - iter 770/773 - loss 0.00279658 - time (sec): 46.77 - samples/sec: 2642.08 - lr: 0.000000 - momentum: 0.000000
2023-10-25 10:45:39,310 ----------------------------------------------------------------------------------------------------
2023-10-25 10:45:39,310 EPOCH 10 done: loss 0.0029 - lr: 0.000000
2023-10-25 10:45:42,349 DEV : loss 0.11523404717445374 - f1-score (micro avg)  0.7884
2023-10-25 10:45:43,297 ----------------------------------------------------------------------------------------------------
2023-10-25 10:45:43,299 Loading model from best epoch ...
2023-10-25 10:45:45,437 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-BUILDING, B-BUILDING, E-BUILDING, I-BUILDING, S-STREET, B-STREET, E-STREET, I-STREET
2023-10-25 10:45:55,712 
Results:
- F-score (micro) 0.7656
- F-score (macro) 0.6513
- Accuracy 0.641

By class:
              precision    recall  f1-score   support

         LOC     0.8262    0.8140    0.8200       946
    BUILDING     0.5258    0.5514    0.5383       185
      STREET     0.7368    0.5000    0.5957        56

   micro avg     0.7732    0.7582    0.7656      1187
   macro avg     0.6963    0.6218    0.6513      1187
weighted avg     0.7751    0.7582    0.7655      1187
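
As a sanity check, the micro and macro F-scores reported above can be recomputed from the per-class table. A minimal sketch (the numbers are copied from the table; the harmonic-mean formula is the standard F1 definition):

```python
# Per-class (precision, recall, f1) copied from the evaluation table above.
per_class = {
    "LOC":      (0.8262, 0.8140, 0.8200),
    "BUILDING": (0.5258, 0.5514, 0.5383),
    "STREET":   (0.7368, 0.5000, 0.5957),
}

def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Micro average: F1 of the pooled precision/recall (0.7732 / 0.7582 above).
micro_f1 = f1(0.7732, 0.7582)

# Macro average: unweighted mean of the per-class F1 scores.
macro_f1 = sum(c[2] for c in per_class.values()) / len(per_class)

print(round(micro_f1, 4))  # 0.7656, matching the reported micro F-score
print(round(macro_f1, 4))  # 0.6513, matching the reported macro F-score
```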

2023-10-25 10:45:55,712 ----------------------------------------------------------------------------------------------------
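
The best epoch (here epoch 2, dev micro F1 0.8142, the last point at which "saving best model" appears) can be recovered from a log like the above by scanning the `DEV :` lines. A minimal stdlib sketch, assuming the exact line format shown in this log (`DEV : loss <x> - f1-score (micro avg)  <y>`):

```python
import re

# Matches Flair's per-epoch dev lines as they appear in the log above.
DEV_LINE = re.compile(r"DEV : loss ([\d.]+) - f1-score \(micro avg\)\s+([\d.]+)")

def dev_scores(log_text):
    """Return a list of (dev_loss, dev_micro_f1) tuples, one per epoch."""
    return [(float(m.group(1)), float(m.group(2)))
            for m in DEV_LINE.finditer(log_text)]

# Two lines copied from the log above, for demonstration.
sample = """\
2023-10-25 10:38:11,901 DEV : loss 0.05575157329440117 - f1-score (micro avg)  0.7258
2023-10-25 10:39:02,404 DEV : loss 0.049297548830509186 - f1-score (micro avg)  0.8142
"""

scores = dev_scores(sample)
best_epoch = max(range(len(scores)), key=lambda i: scores[i][1]) + 1
print(best_epoch, scores[best_epoch - 1][1])  # 2 0.8142
```

Run over the full log, this picks out epoch 2, consistent with the checkpoint that the final evaluation was loaded from.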