2023-10-14 10:43:55,873 ----------------------------------------------------------------------------------------------------
2023-10-14 10:43:55,874 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(32001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-14 10:43:55,874 ----------------------------------------------------------------------------------------------------
2023-10-14 10:43:55,874 MultiCorpus: 5777 train + 722 dev + 723 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 5777 train + 722 dev + 723 test sentences - /root/.flair/datasets/ner_icdar_europeana/nl
2023-10-14 10:43:55,874 ----------------------------------------------------------------------------------------------------
2023-10-14 10:43:55,874 Train:  5777 sentences
2023-10-14 10:43:55,874         (train_with_dev=False, train_with_test=False)
2023-10-14 10:43:55,874 ----------------------------------------------------------------------------------------------------
2023-10-14 10:43:55,875 Training Params:
2023-10-14 10:43:55,875  - learning_rate: "5e-05" 
2023-10-14 10:43:55,875  - mini_batch_size: "4"
2023-10-14 10:43:55,875  - max_epochs: "10"
2023-10-14 10:43:55,875  - shuffle: "True"
2023-10-14 10:43:55,875 ----------------------------------------------------------------------------------------------------
2023-10-14 10:43:55,875 Plugins:
2023-10-14 10:43:55,875  - LinearScheduler | warmup_fraction: '0.1'
2023-10-14 10:43:55,875 ----------------------------------------------------------------------------------------------------
2023-10-14 10:43:55,875 Final evaluation on model from best epoch (best-model.pt)
2023-10-14 10:43:55,875  - metric: "('micro avg', 'f1-score')"
2023-10-14 10:43:55,875 ----------------------------------------------------------------------------------------------------
2023-10-14 10:43:55,875 Computation:
2023-10-14 10:43:55,875  - compute on device: cuda:0
2023-10-14 10:43:55,875  - embedding storage: none
2023-10-14 10:43:55,875 ----------------------------------------------------------------------------------------------------
2023-10-14 10:43:55,875 Model training base path: "hmbench-icdar/nl-dbmdz/bert-base-historic-multilingual-cased-bs4-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-4"
2023-10-14 10:43:55,875 ----------------------------------------------------------------------------------------------------
2023-10-14 10:43:55,875 ----------------------------------------------------------------------------------------------------
2023-10-14 10:44:03,371 epoch 1 - iter 144/1445 - loss 1.53614842 - time (sec): 7.49 - samples/sec: 2488.14 - lr: 0.000005 - momentum: 0.000000
2023-10-14 10:44:10,666 epoch 1 - iter 288/1445 - loss 0.92543100 - time (sec): 14.79 - samples/sec: 2442.39 - lr: 0.000010 - momentum: 0.000000
2023-10-14 10:44:18,047 epoch 1 - iter 432/1445 - loss 0.68772860 - time (sec): 22.17 - samples/sec: 2403.60 - lr: 0.000015 - momentum: 0.000000
2023-10-14 10:44:25,279 epoch 1 - iter 576/1445 - loss 0.56173958 - time (sec): 29.40 - samples/sec: 2387.31 - lr: 0.000020 - momentum: 0.000000
2023-10-14 10:44:32,595 epoch 1 - iter 720/1445 - loss 0.47974609 - time (sec): 36.72 - samples/sec: 2404.37 - lr: 0.000025 - momentum: 0.000000
2023-10-14 10:44:39,880 epoch 1 - iter 864/1445 - loss 0.42590462 - time (sec): 44.00 - samples/sec: 2411.14 - lr: 0.000030 - momentum: 0.000000
2023-10-14 10:44:47,180 epoch 1 - iter 1008/1445 - loss 0.38604397 - time (sec): 51.30 - samples/sec: 2409.43 - lr: 0.000035 - momentum: 0.000000
2023-10-14 10:44:54,827 epoch 1 - iter 1152/1445 - loss 0.35511393 - time (sec): 58.95 - samples/sec: 2419.50 - lr: 0.000040 - momentum: 0.000000
2023-10-14 10:45:01,879 epoch 1 - iter 1296/1445 - loss 0.33007466 - time (sec): 66.00 - samples/sec: 2423.12 - lr: 0.000045 - momentum: 0.000000
2023-10-14 10:45:08,740 epoch 1 - iter 1440/1445 - loss 0.31318374 - time (sec): 72.86 - samples/sec: 2410.77 - lr: 0.000050 - momentum: 0.000000
2023-10-14 10:45:08,985 ----------------------------------------------------------------------------------------------------
2023-10-14 10:45:08,986 EPOCH 1 done: loss 0.3125 - lr: 0.000050
2023-10-14 10:45:12,844 DEV : loss 0.1323496401309967 - f1-score (micro avg)  0.6425
2023-10-14 10:45:12,863 saving best model
2023-10-14 10:45:13,235 ----------------------------------------------------------------------------------------------------
2023-10-14 10:45:20,499 epoch 2 - iter 144/1445 - loss 0.12655175 - time (sec): 7.26 - samples/sec: 2233.65 - lr: 0.000049 - momentum: 0.000000
2023-10-14 10:45:28,359 epoch 2 - iter 288/1445 - loss 0.11664640 - time (sec): 15.12 - samples/sec: 2234.54 - lr: 0.000049 - momentum: 0.000000
2023-10-14 10:45:35,769 epoch 2 - iter 432/1445 - loss 0.11764083 - time (sec): 22.53 - samples/sec: 2297.07 - lr: 0.000048 - momentum: 0.000000
2023-10-14 10:45:43,380 epoch 2 - iter 576/1445 - loss 0.11288782 - time (sec): 30.14 - samples/sec: 2351.55 - lr: 0.000048 - momentum: 0.000000
2023-10-14 10:45:50,679 epoch 2 - iter 720/1445 - loss 0.11005752 - time (sec): 37.44 - samples/sec: 2372.40 - lr: 0.000047 - momentum: 0.000000
2023-10-14 10:45:57,858 epoch 2 - iter 864/1445 - loss 0.10861039 - time (sec): 44.62 - samples/sec: 2366.36 - lr: 0.000047 - momentum: 0.000000
2023-10-14 10:46:04,905 epoch 2 - iter 1008/1445 - loss 0.11009884 - time (sec): 51.67 - samples/sec: 2363.61 - lr: 0.000046 - momentum: 0.000000
2023-10-14 10:46:12,036 epoch 2 - iter 1152/1445 - loss 0.10756740 - time (sec): 58.80 - samples/sec: 2371.45 - lr: 0.000046 - momentum: 0.000000
2023-10-14 10:46:19,797 epoch 2 - iter 1296/1445 - loss 0.10553537 - time (sec): 66.56 - samples/sec: 2362.48 - lr: 0.000045 - momentum: 0.000000
2023-10-14 10:46:27,342 epoch 2 - iter 1440/1445 - loss 0.10547907 - time (sec): 74.10 - samples/sec: 2370.93 - lr: 0.000044 - momentum: 0.000000
2023-10-14 10:46:27,595 ----------------------------------------------------------------------------------------------------
2023-10-14 10:46:27,595 EPOCH 2 done: loss 0.1053 - lr: 0.000044
2023-10-14 10:46:31,969 DEV : loss 0.10467828810214996 - f1-score (micro avg)  0.7101
2023-10-14 10:46:31,987 saving best model
2023-10-14 10:46:32,513 ----------------------------------------------------------------------------------------------------
2023-10-14 10:46:40,845 epoch 3 - iter 144/1445 - loss 0.06046434 - time (sec): 8.33 - samples/sec: 2176.51 - lr: 0.000044 - momentum: 0.000000
2023-10-14 10:46:49,225 epoch 3 - iter 288/1445 - loss 0.06004263 - time (sec): 16.71 - samples/sec: 2131.19 - lr: 0.000043 - momentum: 0.000000
2023-10-14 10:46:56,492 epoch 3 - iter 432/1445 - loss 0.06632407 - time (sec): 23.98 - samples/sec: 2165.44 - lr: 0.000043 - momentum: 0.000000
2023-10-14 10:47:03,797 epoch 3 - iter 576/1445 - loss 0.06736150 - time (sec): 31.28 - samples/sec: 2213.57 - lr: 0.000042 - momentum: 0.000000
2023-10-14 10:47:11,200 epoch 3 - iter 720/1445 - loss 0.06752059 - time (sec): 38.68 - samples/sec: 2266.94 - lr: 0.000042 - momentum: 0.000000
2023-10-14 10:47:18,528 epoch 3 - iter 864/1445 - loss 0.07237618 - time (sec): 46.01 - samples/sec: 2285.16 - lr: 0.000041 - momentum: 0.000000
2023-10-14 10:47:26,048 epoch 3 - iter 1008/1445 - loss 0.07408386 - time (sec): 53.53 - samples/sec: 2314.03 - lr: 0.000041 - momentum: 0.000000
2023-10-14 10:47:33,156 epoch 3 - iter 1152/1445 - loss 0.07290332 - time (sec): 60.64 - samples/sec: 2314.08 - lr: 0.000040 - momentum: 0.000000
2023-10-14 10:47:40,398 epoch 3 - iter 1296/1445 - loss 0.07248165 - time (sec): 67.88 - samples/sec: 2319.56 - lr: 0.000039 - momentum: 0.000000
2023-10-14 10:47:47,740 epoch 3 - iter 1440/1445 - loss 0.07325238 - time (sec): 75.22 - samples/sec: 2331.61 - lr: 0.000039 - momentum: 0.000000
2023-10-14 10:47:48,047 ----------------------------------------------------------------------------------------------------
2023-10-14 10:47:48,048 EPOCH 3 done: loss 0.0730 - lr: 0.000039
2023-10-14 10:47:51,583 DEV : loss 0.09553560614585876 - f1-score (micro avg)  0.8021
2023-10-14 10:47:51,599 saving best model
2023-10-14 10:47:52,265 ----------------------------------------------------------------------------------------------------
2023-10-14 10:47:59,539 epoch 4 - iter 144/1445 - loss 0.04698177 - time (sec): 7.27 - samples/sec: 2415.48 - lr: 0.000038 - momentum: 0.000000
2023-10-14 10:48:07,133 epoch 4 - iter 288/1445 - loss 0.06031754 - time (sec): 14.87 - samples/sec: 2411.66 - lr: 0.000038 - momentum: 0.000000
2023-10-14 10:48:14,165 epoch 4 - iter 432/1445 - loss 0.06263096 - time (sec): 21.90 - samples/sec: 2410.81 - lr: 0.000037 - momentum: 0.000000
2023-10-14 10:48:21,491 epoch 4 - iter 576/1445 - loss 0.05725339 - time (sec): 29.22 - samples/sec: 2417.83 - lr: 0.000037 - momentum: 0.000000
2023-10-14 10:48:28,487 epoch 4 - iter 720/1445 - loss 0.05516418 - time (sec): 36.22 - samples/sec: 2406.40 - lr: 0.000036 - momentum: 0.000000
2023-10-14 10:48:36,217 epoch 4 - iter 864/1445 - loss 0.05343786 - time (sec): 43.95 - samples/sec: 2404.89 - lr: 0.000036 - momentum: 0.000000
2023-10-14 10:48:43,475 epoch 4 - iter 1008/1445 - loss 0.05296204 - time (sec): 51.21 - samples/sec: 2397.09 - lr: 0.000035 - momentum: 0.000000
2023-10-14 10:48:50,864 epoch 4 - iter 1152/1445 - loss 0.05396480 - time (sec): 58.60 - samples/sec: 2397.71 - lr: 0.000034 - momentum: 0.000000
2023-10-14 10:48:58,163 epoch 4 - iter 1296/1445 - loss 0.05465199 - time (sec): 65.90 - samples/sec: 2406.38 - lr: 0.000034 - momentum: 0.000000
2023-10-14 10:49:05,509 epoch 4 - iter 1440/1445 - loss 0.05379926 - time (sec): 73.24 - samples/sec: 2397.56 - lr: 0.000033 - momentum: 0.000000
2023-10-14 10:49:05,752 ----------------------------------------------------------------------------------------------------
2023-10-14 10:49:05,752 EPOCH 4 done: loss 0.0541 - lr: 0.000033
2023-10-14 10:49:09,309 DEV : loss 0.12118156254291534 - f1-score (micro avg)  0.7946
2023-10-14 10:49:09,326 ----------------------------------------------------------------------------------------------------
2023-10-14 10:49:16,954 epoch 5 - iter 144/1445 - loss 0.04673101 - time (sec): 7.63 - samples/sec: 2414.26 - lr: 0.000033 - momentum: 0.000000
2023-10-14 10:49:24,004 epoch 5 - iter 288/1445 - loss 0.04488133 - time (sec): 14.68 - samples/sec: 2414.51 - lr: 0.000032 - momentum: 0.000000
2023-10-14 10:49:31,520 epoch 5 - iter 432/1445 - loss 0.04016346 - time (sec): 22.19 - samples/sec: 2439.19 - lr: 0.000032 - momentum: 0.000000
2023-10-14 10:49:38,723 epoch 5 - iter 576/1445 - loss 0.04147102 - time (sec): 29.40 - samples/sec: 2409.52 - lr: 0.000031 - momentum: 0.000000
2023-10-14 10:49:46,016 epoch 5 - iter 720/1445 - loss 0.04117617 - time (sec): 36.69 - samples/sec: 2388.99 - lr: 0.000031 - momentum: 0.000000
2023-10-14 10:49:53,185 epoch 5 - iter 864/1445 - loss 0.04145618 - time (sec): 43.86 - samples/sec: 2366.07 - lr: 0.000030 - momentum: 0.000000
2023-10-14 10:50:00,848 epoch 5 - iter 1008/1445 - loss 0.03979828 - time (sec): 51.52 - samples/sec: 2367.21 - lr: 0.000029 - momentum: 0.000000
2023-10-14 10:50:08,189 epoch 5 - iter 1152/1445 - loss 0.03982370 - time (sec): 58.86 - samples/sec: 2374.01 - lr: 0.000029 - momentum: 0.000000
2023-10-14 10:50:15,584 epoch 5 - iter 1296/1445 - loss 0.04077806 - time (sec): 66.26 - samples/sec: 2388.64 - lr: 0.000028 - momentum: 0.000000
2023-10-14 10:50:23,148 epoch 5 - iter 1440/1445 - loss 0.03927038 - time (sec): 73.82 - samples/sec: 2380.09 - lr: 0.000028 - momentum: 0.000000
2023-10-14 10:50:23,369 ----------------------------------------------------------------------------------------------------
2023-10-14 10:50:23,369 EPOCH 5 done: loss 0.0393 - lr: 0.000028
2023-10-14 10:50:27,269 DEV : loss 0.15388108789920807 - f1-score (micro avg)  0.7917
2023-10-14 10:50:27,287 ----------------------------------------------------------------------------------------------------
2023-10-14 10:50:34,633 epoch 6 - iter 144/1445 - loss 0.03233428 - time (sec): 7.34 - samples/sec: 2362.50 - lr: 0.000027 - momentum: 0.000000
2023-10-14 10:50:41,959 epoch 6 - iter 288/1445 - loss 0.03501451 - time (sec): 14.67 - samples/sec: 2390.85 - lr: 0.000027 - momentum: 0.000000
2023-10-14 10:50:49,055 epoch 6 - iter 432/1445 - loss 0.03123834 - time (sec): 21.77 - samples/sec: 2411.58 - lr: 0.000026 - momentum: 0.000000
2023-10-14 10:50:56,443 epoch 6 - iter 576/1445 - loss 0.03274913 - time (sec): 29.16 - samples/sec: 2419.30 - lr: 0.000026 - momentum: 0.000000
2023-10-14 10:51:03,609 epoch 6 - iter 720/1445 - loss 0.03314690 - time (sec): 36.32 - samples/sec: 2406.97 - lr: 0.000025 - momentum: 0.000000
2023-10-14 10:51:10,863 epoch 6 - iter 864/1445 - loss 0.03210885 - time (sec): 43.58 - samples/sec: 2410.60 - lr: 0.000024 - momentum: 0.000000
2023-10-14 10:51:18,264 epoch 6 - iter 1008/1445 - loss 0.03115240 - time (sec): 50.98 - samples/sec: 2410.40 - lr: 0.000024 - momentum: 0.000000
2023-10-14 10:51:25,820 epoch 6 - iter 1152/1445 - loss 0.03008407 - time (sec): 58.53 - samples/sec: 2430.27 - lr: 0.000023 - momentum: 0.000000
2023-10-14 10:51:32,892 epoch 6 - iter 1296/1445 - loss 0.03095943 - time (sec): 65.60 - samples/sec: 2425.45 - lr: 0.000023 - momentum: 0.000000
2023-10-14 10:51:39,991 epoch 6 - iter 1440/1445 - loss 0.03162133 - time (sec): 72.70 - samples/sec: 2417.44 - lr: 0.000022 - momentum: 0.000000
2023-10-14 10:51:40,215 ----------------------------------------------------------------------------------------------------
2023-10-14 10:51:40,215 EPOCH 6 done: loss 0.0317 - lr: 0.000022
2023-10-14 10:51:43,805 DEV : loss 0.19756034016609192 - f1-score (micro avg)  0.8016
2023-10-14 10:51:43,821 ----------------------------------------------------------------------------------------------------
2023-10-14 10:51:51,088 epoch 7 - iter 144/1445 - loss 0.01816229 - time (sec): 7.27 - samples/sec: 2414.93 - lr: 0.000022 - momentum: 0.000000
2023-10-14 10:51:58,649 epoch 7 - iter 288/1445 - loss 0.01954064 - time (sec): 14.83 - samples/sec: 2471.92 - lr: 0.000021 - momentum: 0.000000
2023-10-14 10:52:05,759 epoch 7 - iter 432/1445 - loss 0.01867134 - time (sec): 21.94 - samples/sec: 2439.90 - lr: 0.000021 - momentum: 0.000000
2023-10-14 10:52:13,556 epoch 7 - iter 576/1445 - loss 0.01895690 - time (sec): 29.73 - samples/sec: 2418.92 - lr: 0.000020 - momentum: 0.000000
2023-10-14 10:52:20,679 epoch 7 - iter 720/1445 - loss 0.01836345 - time (sec): 36.86 - samples/sec: 2400.83 - lr: 0.000019 - momentum: 0.000000
2023-10-14 10:52:27,667 epoch 7 - iter 864/1445 - loss 0.01877869 - time (sec): 43.84 - samples/sec: 2396.78 - lr: 0.000019 - momentum: 0.000000
2023-10-14 10:52:35,009 epoch 7 - iter 1008/1445 - loss 0.02080686 - time (sec): 51.19 - samples/sec: 2419.50 - lr: 0.000018 - momentum: 0.000000
2023-10-14 10:52:42,373 epoch 7 - iter 1152/1445 - loss 0.02134693 - time (sec): 58.55 - samples/sec: 2423.15 - lr: 0.000018 - momentum: 0.000000
2023-10-14 10:52:49,576 epoch 7 - iter 1296/1445 - loss 0.02105516 - time (sec): 65.75 - samples/sec: 2420.11 - lr: 0.000017 - momentum: 0.000000
2023-10-14 10:52:56,887 epoch 7 - iter 1440/1445 - loss 0.02073299 - time (sec): 73.06 - samples/sec: 2406.17 - lr: 0.000017 - momentum: 0.000000
2023-10-14 10:52:57,114 ----------------------------------------------------------------------------------------------------
2023-10-14 10:52:57,115 EPOCH 7 done: loss 0.0209 - lr: 0.000017
2023-10-14 10:53:00,712 DEV : loss 0.18008936941623688 - f1-score (micro avg)  0.8011
2023-10-14 10:53:00,729 ----------------------------------------------------------------------------------------------------
2023-10-14 10:53:07,996 epoch 8 - iter 144/1445 - loss 0.01866497 - time (sec): 7.27 - samples/sec: 2300.59 - lr: 0.000016 - momentum: 0.000000
2023-10-14 10:53:15,313 epoch 8 - iter 288/1445 - loss 0.01466923 - time (sec): 14.58 - samples/sec: 2372.09 - lr: 0.000016 - momentum: 0.000000
2023-10-14 10:53:22,675 epoch 8 - iter 432/1445 - loss 0.01583407 - time (sec): 21.95 - samples/sec: 2384.78 - lr: 0.000015 - momentum: 0.000000
2023-10-14 10:53:29,992 epoch 8 - iter 576/1445 - loss 0.01493921 - time (sec): 29.26 - samples/sec: 2381.30 - lr: 0.000014 - momentum: 0.000000
2023-10-14 10:53:37,264 epoch 8 - iter 720/1445 - loss 0.01503871 - time (sec): 36.53 - samples/sec: 2407.26 - lr: 0.000014 - momentum: 0.000000
2023-10-14 10:53:44,308 epoch 8 - iter 864/1445 - loss 0.01515293 - time (sec): 43.58 - samples/sec: 2399.36 - lr: 0.000013 - momentum: 0.000000
2023-10-14 10:53:51,766 epoch 8 - iter 1008/1445 - loss 0.01576861 - time (sec): 51.04 - samples/sec: 2404.97 - lr: 0.000013 - momentum: 0.000000
2023-10-14 10:53:59,034 epoch 8 - iter 1152/1445 - loss 0.01561594 - time (sec): 58.30 - samples/sec: 2408.13 - lr: 0.000012 - momentum: 0.000000
2023-10-14 10:54:06,080 epoch 8 - iter 1296/1445 - loss 0.01534350 - time (sec): 65.35 - samples/sec: 2411.17 - lr: 0.000012 - momentum: 0.000000
2023-10-14 10:54:13,533 epoch 8 - iter 1440/1445 - loss 0.01509812 - time (sec): 72.80 - samples/sec: 2409.56 - lr: 0.000011 - momentum: 0.000000
2023-10-14 10:54:13,784 ----------------------------------------------------------------------------------------------------
2023-10-14 10:54:13,784 EPOCH 8 done: loss 0.0151 - lr: 0.000011
2023-10-14 10:54:17,824 DEV : loss 0.20430612564086914 - f1-score (micro avg)  0.7978
2023-10-14 10:54:17,841 ----------------------------------------------------------------------------------------------------
2023-10-14 10:54:25,386 epoch 9 - iter 144/1445 - loss 0.01055124 - time (sec): 7.54 - samples/sec: 2408.34 - lr: 0.000011 - momentum: 0.000000
2023-10-14 10:54:32,660 epoch 9 - iter 288/1445 - loss 0.00857411 - time (sec): 14.82 - samples/sec: 2384.81 - lr: 0.000010 - momentum: 0.000000
2023-10-14 10:54:39,915 epoch 9 - iter 432/1445 - loss 0.00889903 - time (sec): 22.07 - samples/sec: 2361.30 - lr: 0.000009 - momentum: 0.000000
2023-10-14 10:54:47,643 epoch 9 - iter 576/1445 - loss 0.01030940 - time (sec): 29.80 - samples/sec: 2394.86 - lr: 0.000009 - momentum: 0.000000
2023-10-14 10:54:54,820 epoch 9 - iter 720/1445 - loss 0.00993058 - time (sec): 36.98 - samples/sec: 2400.63 - lr: 0.000008 - momentum: 0.000000
2023-10-14 10:55:02,175 epoch 9 - iter 864/1445 - loss 0.00936448 - time (sec): 44.33 - samples/sec: 2406.24 - lr: 0.000008 - momentum: 0.000000
2023-10-14 10:55:09,201 epoch 9 - iter 1008/1445 - loss 0.00913328 - time (sec): 51.36 - samples/sec: 2404.99 - lr: 0.000007 - momentum: 0.000000
2023-10-14 10:55:16,575 epoch 9 - iter 1152/1445 - loss 0.00941487 - time (sec): 58.73 - samples/sec: 2408.84 - lr: 0.000007 - momentum: 0.000000
2023-10-14 10:55:23,825 epoch 9 - iter 1296/1445 - loss 0.01026341 - time (sec): 65.98 - samples/sec: 2401.83 - lr: 0.000006 - momentum: 0.000000
2023-10-14 10:55:31,036 epoch 9 - iter 1440/1445 - loss 0.01047293 - time (sec): 73.19 - samples/sec: 2400.81 - lr: 0.000006 - momentum: 0.000000
2023-10-14 10:55:31,280 ----------------------------------------------------------------------------------------------------
2023-10-14 10:55:31,280 EPOCH 9 done: loss 0.0105 - lr: 0.000006
2023-10-14 10:55:34,878 DEV : loss 0.1839500218629837 - f1-score (micro avg)  0.7974
2023-10-14 10:55:34,896 ----------------------------------------------------------------------------------------------------
2023-10-14 10:55:42,196 epoch 10 - iter 144/1445 - loss 0.00490526 - time (sec): 7.30 - samples/sec: 2287.22 - lr: 0.000005 - momentum: 0.000000
2023-10-14 10:55:49,779 epoch 10 - iter 288/1445 - loss 0.00524218 - time (sec): 14.88 - samples/sec: 2333.86 - lr: 0.000004 - momentum: 0.000000
2023-10-14 10:55:57,744 epoch 10 - iter 432/1445 - loss 0.00648842 - time (sec): 22.85 - samples/sec: 2293.26 - lr: 0.000004 - momentum: 0.000000
2023-10-14 10:56:05,487 epoch 10 - iter 576/1445 - loss 0.00656819 - time (sec): 30.59 - samples/sec: 2325.72 - lr: 0.000003 - momentum: 0.000000
2023-10-14 10:56:12,959 epoch 10 - iter 720/1445 - loss 0.00698945 - time (sec): 38.06 - samples/sec: 2354.45 - lr: 0.000003 - momentum: 0.000000
2023-10-14 10:56:20,142 epoch 10 - iter 864/1445 - loss 0.00778598 - time (sec): 45.24 - samples/sec: 2362.14 - lr: 0.000002 - momentum: 0.000000
2023-10-14 10:56:27,101 epoch 10 - iter 1008/1445 - loss 0.00737225 - time (sec): 52.20 - samples/sec: 2348.21 - lr: 0.000002 - momentum: 0.000000
2023-10-14 10:56:34,107 epoch 10 - iter 1152/1445 - loss 0.00697948 - time (sec): 59.21 - samples/sec: 2352.75 - lr: 0.000001 - momentum: 0.000000
2023-10-14 10:56:41,417 epoch 10 - iter 1296/1445 - loss 0.00700128 - time (sec): 66.52 - samples/sec: 2371.69 - lr: 0.000001 - momentum: 0.000000
2023-10-14 10:56:48,753 epoch 10 - iter 1440/1445 - loss 0.00696742 - time (sec): 73.86 - samples/sec: 2376.15 - lr: 0.000000 - momentum: 0.000000
2023-10-14 10:56:49,020 ----------------------------------------------------------------------------------------------------
2023-10-14 10:56:49,021 EPOCH 10 done: loss 0.0070 - lr: 0.000000
2023-10-14 10:56:52,539 DEV : loss 0.19384992122650146 - f1-score (micro avg)  0.8093
2023-10-14 10:56:52,555 saving best model
2023-10-14 10:56:53,410 ----------------------------------------------------------------------------------------------------
2023-10-14 10:56:53,411 Loading model from best epoch ...
2023-10-14 10:56:55,129 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-14 10:56:58,276 
Results:
- F-score (micro) 0.7959
- F-score (macro) 0.6975
- Accuracy 0.6749

By class:
              precision    recall  f1-score   support

         PER     0.8184    0.7759    0.7966       482
         LOC     0.8949    0.7991    0.8443       458
         ORG     0.5091    0.4058    0.4516        69

   micro avg     0.8339    0.7611    0.7959      1009
   macro avg     0.7408    0.6603    0.6975      1009
weighted avg     0.8319    0.7611    0.7947      1009

2023-10-14 10:56:58,276 ----------------------------------------------------------------------------------------------------