2023-10-12 12:50:33,877 ----------------------------------------------------------------------------------------------------
2023-10-12 12:50:33,879 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-12 12:50:33,879 ----------------------------------------------------------------------------------------------------
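The parameter count implied by the architecture printout above can be tallied directly from the printed layer shapes. A minimal sketch, assuming (as in standard T5) that `shared` and `embed_tokens` are one tied weight and therefore counted once:

```python
# Tally parameters from the shapes in the SequenceTagger printout above.
# Assumption: shared / embed_tokens are tied (standard T5), counted once.
d_model, d_ff, d_kv_total, vocab, n_blocks = 1472, 3584, 384, 384, 12

embedding = vocab * d_model                                  # Embedding(384, 1472)
attn = 3 * d_model * d_kv_total + d_kv_total * d_model       # q, k, v + o
ff = 2 * d_model * d_ff + d_ff * d_model                     # wi_0, wi_1 + wo
norms = 2 * d_model                                          # two FusedRMSNorm per block
block = attn + ff + norms
rel_bias = 32 * 6                                            # only in block 0
final_ln = d_model
head = d_model * 13 + 13                                     # Linear(1472, 13, bias=True)

total = embedding + n_blocks * block + rel_bias + final_ln + head
print(f"{total:,}")  # → 217,676,621
```

The task-specific head adds only ~19k parameters; virtually all of the model is the hmByT5 encoder.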
2023-10-12 12:50:33,880 MultiCorpus: 5777 train + 722 dev + 723 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 5777 train + 722 dev + 723 test sentences - /root/.flair/datasets/ner_icdar_europeana/nl
2023-10-12 12:50:33,880 ----------------------------------------------------------------------------------------------------
2023-10-12 12:50:33,880 Train:  5777 sentences
2023-10-12 12:50:33,880         (train_with_dev=False, train_with_test=False)
2023-10-12 12:50:33,880 ----------------------------------------------------------------------------------------------------
2023-10-12 12:50:33,880 Training Params:
2023-10-12 12:50:33,880  - learning_rate: "0.00015" 
2023-10-12 12:50:33,880  - mini_batch_size: "8"
2023-10-12 12:50:33,880  - max_epochs: "10"
2023-10-12 12:50:33,880  - shuffle: "True"
2023-10-12 12:50:33,880 ----------------------------------------------------------------------------------------------------
2023-10-12 12:50:33,880 Plugins:
2023-10-12 12:50:33,881  - TensorboardLogger
2023-10-12 12:50:33,881  - LinearScheduler | warmup_fraction: '0.1'
2023-10-12 12:50:33,881 ----------------------------------------------------------------------------------------------------
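The lr column in the iteration logs below traces the LinearScheduler with warmup_fraction 0.1: the rate ramps linearly from 0 to the peak of 0.00015 over the first 10% of the 7,230 total steps (10 epochs × 723 iterations), then decays linearly back to 0. A minimal sketch of that schedule (the function name is illustrative, not Flair's API):

```python
PEAK_LR = 0.00015
TOTAL_STEPS = 10 * 723          # max_epochs * iterations per epoch
WARMUP_STEPS = int(TOTAL_STEPS * 0.1)  # warmup_fraction 0.1 -> 723 steps

def linear_schedule(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to 0 (illustrative)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS)

# Matches the logged values: iter 72 of epoch 1 -> ~0.000015,
# iter 72 of epoch 2 (global step 795) -> ~0.000148.
print(round(linear_schedule(72), 6), round(linear_schedule(795), 6))
```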
2023-10-12 12:50:33,881 Final evaluation on model from best epoch (best-model.pt)
2023-10-12 12:50:33,881  - metric: "('micro avg', 'f1-score')"
2023-10-12 12:50:33,881 ----------------------------------------------------------------------------------------------------
2023-10-12 12:50:33,881 Computation:
2023-10-12 12:50:33,881  - compute on device: cuda:0
2023-10-12 12:50:33,881  - embedding storage: none
2023-10-12 12:50:33,881 ----------------------------------------------------------------------------------------------------
2023-10-12 12:50:33,881 Model training base path: "hmbench-icdar/nl-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-4"
2023-10-12 12:50:33,881 ----------------------------------------------------------------------------------------------------
2023-10-12 12:50:33,881 ----------------------------------------------------------------------------------------------------
2023-10-12 12:50:33,882 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-12 12:51:13,856 epoch 1 - iter 72/723 - loss 2.57210097 - time (sec): 39.97 - samples/sec: 446.88 - lr: 0.000015 - momentum: 0.000000
2023-10-12 12:51:55,334 epoch 1 - iter 144/723 - loss 2.50368577 - time (sec): 81.45 - samples/sec: 443.12 - lr: 0.000030 - momentum: 0.000000
2023-10-12 12:52:35,583 epoch 1 - iter 216/723 - loss 2.34479028 - time (sec): 121.70 - samples/sec: 436.36 - lr: 0.000045 - momentum: 0.000000
2023-10-12 12:53:16,570 epoch 1 - iter 288/723 - loss 2.14084440 - time (sec): 162.69 - samples/sec: 434.66 - lr: 0.000060 - momentum: 0.000000
2023-10-12 12:53:56,024 epoch 1 - iter 360/723 - loss 1.92004853 - time (sec): 202.14 - samples/sec: 437.13 - lr: 0.000074 - momentum: 0.000000
2023-10-12 12:54:36,420 epoch 1 - iter 432/723 - loss 1.71012873 - time (sec): 242.54 - samples/sec: 433.77 - lr: 0.000089 - momentum: 0.000000
2023-10-12 12:55:15,712 epoch 1 - iter 504/723 - loss 1.51773573 - time (sec): 281.83 - samples/sec: 434.20 - lr: 0.000104 - momentum: 0.000000
2023-10-12 12:55:55,929 epoch 1 - iter 576/723 - loss 1.35508499 - time (sec): 322.05 - samples/sec: 436.31 - lr: 0.000119 - momentum: 0.000000
2023-10-12 12:56:35,029 epoch 1 - iter 648/723 - loss 1.22422963 - time (sec): 361.15 - samples/sec: 439.92 - lr: 0.000134 - momentum: 0.000000
2023-10-12 12:57:12,712 epoch 1 - iter 720/723 - loss 1.12678212 - time (sec): 398.83 - samples/sec: 440.47 - lr: 0.000149 - momentum: 0.000000
2023-10-12 12:57:13,890 ----------------------------------------------------------------------------------------------------
2023-10-12 12:57:13,891 EPOCH 1 done: loss 1.1237 - lr: 0.000149
2023-10-12 12:57:33,815 DEV : loss 0.22408561408519745 - f1-score (micro avg)  0.0021
2023-10-12 12:57:33,845 saving best model
2023-10-12 12:57:34,699 ----------------------------------------------------------------------------------------------------
2023-10-12 12:58:12,792 epoch 2 - iter 72/723 - loss 0.16692332 - time (sec): 38.09 - samples/sec: 465.66 - lr: 0.000148 - momentum: 0.000000
2023-10-12 12:58:50,804 epoch 2 - iter 144/723 - loss 0.17013500 - time (sec): 76.10 - samples/sec: 458.04 - lr: 0.000147 - momentum: 0.000000
2023-10-12 12:59:29,373 epoch 2 - iter 216/723 - loss 0.16696133 - time (sec): 114.67 - samples/sec: 455.25 - lr: 0.000145 - momentum: 0.000000
2023-10-12 13:00:06,773 epoch 2 - iter 288/723 - loss 0.15925242 - time (sec): 152.07 - samples/sec: 456.96 - lr: 0.000143 - momentum: 0.000000
2023-10-12 13:00:44,163 epoch 2 - iter 360/723 - loss 0.15257157 - time (sec): 189.46 - samples/sec: 453.07 - lr: 0.000142 - momentum: 0.000000
2023-10-12 13:01:23,135 epoch 2 - iter 432/723 - loss 0.14873337 - time (sec): 228.43 - samples/sec: 453.50 - lr: 0.000140 - momentum: 0.000000
2023-10-12 13:02:02,355 epoch 2 - iter 504/723 - loss 0.14684350 - time (sec): 267.65 - samples/sec: 455.16 - lr: 0.000138 - momentum: 0.000000
2023-10-12 13:02:43,047 epoch 2 - iter 576/723 - loss 0.14308762 - time (sec): 308.35 - samples/sec: 453.78 - lr: 0.000137 - momentum: 0.000000
2023-10-12 13:03:23,828 epoch 2 - iter 648/723 - loss 0.13860972 - time (sec): 349.13 - samples/sec: 452.24 - lr: 0.000135 - momentum: 0.000000
2023-10-12 13:04:03,474 epoch 2 - iter 720/723 - loss 0.13666210 - time (sec): 388.77 - samples/sec: 451.37 - lr: 0.000133 - momentum: 0.000000
2023-10-12 13:04:04,944 ----------------------------------------------------------------------------------------------------
2023-10-12 13:04:04,945 EPOCH 2 done: loss 0.1363 - lr: 0.000133
2023-10-12 13:04:25,559 DEV : loss 0.125865638256073 - f1-score (micro avg)  0.6974
2023-10-12 13:04:25,588 saving best model
2023-10-12 13:04:28,483 ----------------------------------------------------------------------------------------------------
2023-10-12 13:05:06,593 epoch 3 - iter 72/723 - loss 0.10614972 - time (sec): 38.11 - samples/sec: 441.85 - lr: 0.000132 - momentum: 0.000000
2023-10-12 13:05:45,790 epoch 3 - iter 144/723 - loss 0.09543565 - time (sec): 77.30 - samples/sec: 448.25 - lr: 0.000130 - momentum: 0.000000
2023-10-12 13:06:24,554 epoch 3 - iter 216/723 - loss 0.09411193 - time (sec): 116.07 - samples/sec: 445.86 - lr: 0.000128 - momentum: 0.000000
2023-10-12 13:07:02,711 epoch 3 - iter 288/723 - loss 0.09004094 - time (sec): 154.22 - samples/sec: 447.22 - lr: 0.000127 - momentum: 0.000000
2023-10-12 13:07:42,238 epoch 3 - iter 360/723 - loss 0.08961960 - time (sec): 193.75 - samples/sec: 446.27 - lr: 0.000125 - momentum: 0.000000
2023-10-12 13:08:22,102 epoch 3 - iter 432/723 - loss 0.08897250 - time (sec): 233.61 - samples/sec: 451.71 - lr: 0.000123 - momentum: 0.000000
2023-10-12 13:09:02,522 epoch 3 - iter 504/723 - loss 0.08525340 - time (sec): 274.03 - samples/sec: 450.43 - lr: 0.000122 - momentum: 0.000000
2023-10-12 13:09:41,313 epoch 3 - iter 576/723 - loss 0.08340939 - time (sec): 312.83 - samples/sec: 449.52 - lr: 0.000120 - momentum: 0.000000
2023-10-12 13:10:21,839 epoch 3 - iter 648/723 - loss 0.08185803 - time (sec): 353.35 - samples/sec: 446.67 - lr: 0.000118 - momentum: 0.000000
2023-10-12 13:11:01,410 epoch 3 - iter 720/723 - loss 0.08020714 - time (sec): 392.92 - samples/sec: 447.11 - lr: 0.000117 - momentum: 0.000000
2023-10-12 13:11:02,572 ----------------------------------------------------------------------------------------------------
2023-10-12 13:11:02,573 EPOCH 3 done: loss 0.0803 - lr: 0.000117
2023-10-12 13:11:24,304 DEV : loss 0.09134244173765182 - f1-score (micro avg)  0.8085
2023-10-12 13:11:24,336 saving best model
2023-10-12 13:11:26,940 ----------------------------------------------------------------------------------------------------
2023-10-12 13:12:08,816 epoch 4 - iter 72/723 - loss 0.04375890 - time (sec): 41.87 - samples/sec: 451.41 - lr: 0.000115 - momentum: 0.000000
2023-10-12 13:12:45,861 epoch 4 - iter 144/723 - loss 0.04780814 - time (sec): 78.92 - samples/sec: 443.89 - lr: 0.000113 - momentum: 0.000000
2023-10-12 13:13:26,615 epoch 4 - iter 216/723 - loss 0.04927505 - time (sec): 119.67 - samples/sec: 431.85 - lr: 0.000112 - momentum: 0.000000
2023-10-12 13:14:05,422 epoch 4 - iter 288/723 - loss 0.05251675 - time (sec): 158.48 - samples/sec: 435.09 - lr: 0.000110 - momentum: 0.000000
2023-10-12 13:14:42,756 epoch 4 - iter 360/723 - loss 0.05197170 - time (sec): 195.81 - samples/sec: 439.06 - lr: 0.000108 - momentum: 0.000000
2023-10-12 13:15:22,534 epoch 4 - iter 432/723 - loss 0.05444043 - time (sec): 235.59 - samples/sec: 441.72 - lr: 0.000107 - momentum: 0.000000
2023-10-12 13:16:02,299 epoch 4 - iter 504/723 - loss 0.05358124 - time (sec): 275.36 - samples/sec: 447.12 - lr: 0.000105 - momentum: 0.000000
2023-10-12 13:16:40,164 epoch 4 - iter 576/723 - loss 0.05390487 - time (sec): 313.22 - samples/sec: 447.55 - lr: 0.000103 - momentum: 0.000000
2023-10-12 13:17:20,651 epoch 4 - iter 648/723 - loss 0.05448671 - time (sec): 353.71 - samples/sec: 445.79 - lr: 0.000102 - momentum: 0.000000
2023-10-12 13:17:59,869 epoch 4 - iter 720/723 - loss 0.05377454 - time (sec): 392.93 - samples/sec: 447.28 - lr: 0.000100 - momentum: 0.000000
2023-10-12 13:18:00,975 ----------------------------------------------------------------------------------------------------
2023-10-12 13:18:00,975 EPOCH 4 done: loss 0.0537 - lr: 0.000100
2023-10-12 13:18:21,723 DEV : loss 0.08551333099603653 - f1-score (micro avg)  0.8344
2023-10-12 13:18:21,755 saving best model
2023-10-12 13:18:22,733 ----------------------------------------------------------------------------------------------------
2023-10-12 13:19:01,088 epoch 5 - iter 72/723 - loss 0.02887265 - time (sec): 38.35 - samples/sec: 442.57 - lr: 0.000098 - momentum: 0.000000
2023-10-12 13:19:42,050 epoch 5 - iter 144/723 - loss 0.03460162 - time (sec): 79.32 - samples/sec: 433.60 - lr: 0.000097 - momentum: 0.000000
2023-10-12 13:20:22,461 epoch 5 - iter 216/723 - loss 0.03491227 - time (sec): 119.73 - samples/sec: 435.34 - lr: 0.000095 - momentum: 0.000000
2023-10-12 13:21:05,047 epoch 5 - iter 288/723 - loss 0.03630309 - time (sec): 162.31 - samples/sec: 434.65 - lr: 0.000093 - momentum: 0.000000
2023-10-12 13:21:48,859 epoch 5 - iter 360/723 - loss 0.03649219 - time (sec): 206.12 - samples/sec: 427.40 - lr: 0.000092 - momentum: 0.000000
2023-10-12 13:22:31,786 epoch 5 - iter 432/723 - loss 0.03618346 - time (sec): 249.05 - samples/sec: 424.25 - lr: 0.000090 - momentum: 0.000000
2023-10-12 13:23:12,902 epoch 5 - iter 504/723 - loss 0.03519474 - time (sec): 290.17 - samples/sec: 420.92 - lr: 0.000088 - momentum: 0.000000
2023-10-12 13:23:54,082 epoch 5 - iter 576/723 - loss 0.03451399 - time (sec): 331.35 - samples/sec: 422.25 - lr: 0.000087 - momentum: 0.000000
2023-10-12 13:24:35,954 epoch 5 - iter 648/723 - loss 0.03511239 - time (sec): 373.22 - samples/sec: 422.11 - lr: 0.000085 - momentum: 0.000000
2023-10-12 13:25:18,788 epoch 5 - iter 720/723 - loss 0.03557155 - time (sec): 416.05 - samples/sec: 422.28 - lr: 0.000083 - momentum: 0.000000
2023-10-12 13:25:20,108 ----------------------------------------------------------------------------------------------------
2023-10-12 13:25:20,108 EPOCH 5 done: loss 0.0355 - lr: 0.000083
2023-10-12 13:25:42,303 DEV : loss 0.09844549000263214 - f1-score (micro avg)  0.8281
2023-10-12 13:25:42,335 ----------------------------------------------------------------------------------------------------
2023-10-12 13:26:24,818 epoch 6 - iter 72/723 - loss 0.03450655 - time (sec): 42.48 - samples/sec: 423.14 - lr: 0.000082 - momentum: 0.000000
2023-10-12 13:27:04,856 epoch 6 - iter 144/723 - loss 0.03035545 - time (sec): 82.52 - samples/sec: 423.88 - lr: 0.000080 - momentum: 0.000000
2023-10-12 13:27:43,676 epoch 6 - iter 216/723 - loss 0.02931547 - time (sec): 121.34 - samples/sec: 436.00 - lr: 0.000078 - momentum: 0.000000
2023-10-12 13:28:23,185 epoch 6 - iter 288/723 - loss 0.02672535 - time (sec): 160.85 - samples/sec: 445.88 - lr: 0.000077 - momentum: 0.000000
2023-10-12 13:29:00,446 epoch 6 - iter 360/723 - loss 0.02600815 - time (sec): 198.11 - samples/sec: 441.33 - lr: 0.000075 - momentum: 0.000000
2023-10-12 13:29:40,823 epoch 6 - iter 432/723 - loss 0.02605262 - time (sec): 238.49 - samples/sec: 447.85 - lr: 0.000073 - momentum: 0.000000
2023-10-12 13:30:19,459 epoch 6 - iter 504/723 - loss 0.02591407 - time (sec): 277.12 - samples/sec: 446.06 - lr: 0.000072 - momentum: 0.000000
2023-10-12 13:30:58,816 epoch 6 - iter 576/723 - loss 0.02584013 - time (sec): 316.48 - samples/sec: 446.29 - lr: 0.000070 - momentum: 0.000000
2023-10-12 13:31:38,066 epoch 6 - iter 648/723 - loss 0.02608188 - time (sec): 355.73 - samples/sec: 446.29 - lr: 0.000068 - momentum: 0.000000
2023-10-12 13:32:16,080 epoch 6 - iter 720/723 - loss 0.02616533 - time (sec): 393.74 - samples/sec: 446.09 - lr: 0.000067 - momentum: 0.000000
2023-10-12 13:32:17,295 ----------------------------------------------------------------------------------------------------
2023-10-12 13:32:17,296 EPOCH 6 done: loss 0.0261 - lr: 0.000067
2023-10-12 13:32:38,779 DEV : loss 0.0909653976559639 - f1-score (micro avg)  0.8547
2023-10-12 13:32:38,811 saving best model
2023-10-12 13:32:41,417 ----------------------------------------------------------------------------------------------------
2023-10-12 13:33:18,878 epoch 7 - iter 72/723 - loss 0.02548483 - time (sec): 37.46 - samples/sec: 446.16 - lr: 0.000065 - momentum: 0.000000
2023-10-12 13:33:58,757 epoch 7 - iter 144/723 - loss 0.02514500 - time (sec): 77.34 - samples/sec: 462.95 - lr: 0.000063 - momentum: 0.000000
2023-10-12 13:34:37,377 epoch 7 - iter 216/723 - loss 0.02611835 - time (sec): 115.96 - samples/sec: 462.23 - lr: 0.000062 - momentum: 0.000000
2023-10-12 13:35:15,889 epoch 7 - iter 288/723 - loss 0.02415853 - time (sec): 154.47 - samples/sec: 459.97 - lr: 0.000060 - momentum: 0.000000
2023-10-12 13:35:54,789 epoch 7 - iter 360/723 - loss 0.02340115 - time (sec): 193.37 - samples/sec: 458.91 - lr: 0.000058 - momentum: 0.000000
2023-10-12 13:36:34,765 epoch 7 - iter 432/723 - loss 0.02218556 - time (sec): 233.34 - samples/sec: 458.27 - lr: 0.000057 - momentum: 0.000000
2023-10-12 13:37:13,961 epoch 7 - iter 504/723 - loss 0.02199272 - time (sec): 272.54 - samples/sec: 459.46 - lr: 0.000055 - momentum: 0.000000
2023-10-12 13:37:51,396 epoch 7 - iter 576/723 - loss 0.02149862 - time (sec): 309.98 - samples/sec: 459.97 - lr: 0.000053 - momentum: 0.000000
2023-10-12 13:38:28,567 epoch 7 - iter 648/723 - loss 0.02110408 - time (sec): 347.15 - samples/sec: 457.63 - lr: 0.000052 - momentum: 0.000000
2023-10-12 13:39:04,853 epoch 7 - iter 720/723 - loss 0.02089846 - time (sec): 383.43 - samples/sec: 458.45 - lr: 0.000050 - momentum: 0.000000
2023-10-12 13:39:05,894 ----------------------------------------------------------------------------------------------------
2023-10-12 13:39:05,895 EPOCH 7 done: loss 0.0209 - lr: 0.000050
2023-10-12 13:39:26,497 DEV : loss 0.12286769598722458 - f1-score (micro avg)  0.8403
2023-10-12 13:39:26,530 ----------------------------------------------------------------------------------------------------
2023-10-12 13:40:05,852 epoch 8 - iter 72/723 - loss 0.01313140 - time (sec): 39.32 - samples/sec: 454.78 - lr: 0.000048 - momentum: 0.000000
2023-10-12 13:40:45,572 epoch 8 - iter 144/723 - loss 0.01250697 - time (sec): 79.04 - samples/sec: 454.78 - lr: 0.000047 - momentum: 0.000000
2023-10-12 13:41:25,089 epoch 8 - iter 216/723 - loss 0.01246682 - time (sec): 118.56 - samples/sec: 451.77 - lr: 0.000045 - momentum: 0.000000
2023-10-12 13:42:05,028 epoch 8 - iter 288/723 - loss 0.01132435 - time (sec): 158.50 - samples/sec: 455.71 - lr: 0.000043 - momentum: 0.000000
2023-10-12 13:42:42,164 epoch 8 - iter 360/723 - loss 0.01353114 - time (sec): 195.63 - samples/sec: 450.26 - lr: 0.000042 - momentum: 0.000000
2023-10-12 13:43:21,974 epoch 8 - iter 432/723 - loss 0.01342207 - time (sec): 235.44 - samples/sec: 446.73 - lr: 0.000040 - momentum: 0.000000
2023-10-12 13:44:01,853 epoch 8 - iter 504/723 - loss 0.01495552 - time (sec): 275.32 - samples/sec: 444.93 - lr: 0.000038 - momentum: 0.000000
2023-10-12 13:44:44,572 epoch 8 - iter 576/723 - loss 0.01542540 - time (sec): 318.04 - samples/sec: 441.01 - lr: 0.000037 - momentum: 0.000000
2023-10-12 13:45:26,594 epoch 8 - iter 648/723 - loss 0.01508947 - time (sec): 360.06 - samples/sec: 438.48 - lr: 0.000035 - momentum: 0.000000
2023-10-12 13:46:08,421 epoch 8 - iter 720/723 - loss 0.01601048 - time (sec): 401.89 - samples/sec: 437.43 - lr: 0.000033 - momentum: 0.000000
2023-10-12 13:46:09,570 ----------------------------------------------------------------------------------------------------
2023-10-12 13:46:09,571 EPOCH 8 done: loss 0.0162 - lr: 0.000033
2023-10-12 13:46:31,884 DEV : loss 0.12785491347312927 - f1-score (micro avg)  0.8443
2023-10-12 13:46:31,946 ----------------------------------------------------------------------------------------------------
2023-10-12 13:47:15,224 epoch 9 - iter 72/723 - loss 0.01226192 - time (sec): 43.28 - samples/sec: 424.75 - lr: 0.000032 - momentum: 0.000000
2023-10-12 13:47:57,051 epoch 9 - iter 144/723 - loss 0.01230280 - time (sec): 85.10 - samples/sec: 413.44 - lr: 0.000030 - momentum: 0.000000
2023-10-12 13:48:39,683 epoch 9 - iter 216/723 - loss 0.01384980 - time (sec): 127.73 - samples/sec: 408.83 - lr: 0.000028 - momentum: 0.000000
2023-10-12 13:49:21,709 epoch 9 - iter 288/723 - loss 0.01248849 - time (sec): 169.76 - samples/sec: 410.18 - lr: 0.000027 - momentum: 0.000000
2023-10-12 13:50:02,274 epoch 9 - iter 360/723 - loss 0.01133267 - time (sec): 210.32 - samples/sec: 418.21 - lr: 0.000025 - momentum: 0.000000
2023-10-12 13:50:44,723 epoch 9 - iter 432/723 - loss 0.01131670 - time (sec): 252.77 - samples/sec: 417.56 - lr: 0.000023 - momentum: 0.000000
2023-10-12 13:51:25,911 epoch 9 - iter 504/723 - loss 0.01149792 - time (sec): 293.96 - samples/sec: 421.43 - lr: 0.000022 - momentum: 0.000000
2023-10-12 13:52:06,053 epoch 9 - iter 576/723 - loss 0.01200648 - time (sec): 334.10 - samples/sec: 422.14 - lr: 0.000020 - momentum: 0.000000
2023-10-12 13:52:44,347 epoch 9 - iter 648/723 - loss 0.01233164 - time (sec): 372.40 - samples/sec: 422.87 - lr: 0.000018 - momentum: 0.000000
2023-10-12 13:53:25,270 epoch 9 - iter 720/723 - loss 0.01241683 - time (sec): 413.32 - samples/sec: 423.69 - lr: 0.000017 - momentum: 0.000000
2023-10-12 13:53:27,201 ----------------------------------------------------------------------------------------------------
2023-10-12 13:53:27,202 EPOCH 9 done: loss 0.0136 - lr: 0.000017
2023-10-12 13:53:49,325 DEV : loss 0.13609179854393005 - f1-score (micro avg)  0.8429
2023-10-12 13:53:49,364 ----------------------------------------------------------------------------------------------------
2023-10-12 13:54:34,158 epoch 10 - iter 72/723 - loss 0.02070022 - time (sec): 44.79 - samples/sec: 420.68 - lr: 0.000015 - momentum: 0.000000
2023-10-12 13:55:14,534 epoch 10 - iter 144/723 - loss 0.01572521 - time (sec): 85.17 - samples/sec: 427.85 - lr: 0.000013 - momentum: 0.000000
2023-10-12 13:55:57,299 epoch 10 - iter 216/723 - loss 0.01436480 - time (sec): 127.93 - samples/sec: 418.71 - lr: 0.000012 - momentum: 0.000000
2023-10-12 13:56:38,887 epoch 10 - iter 288/723 - loss 0.01335702 - time (sec): 169.52 - samples/sec: 421.14 - lr: 0.000010 - momentum: 0.000000
2023-10-12 13:57:20,435 epoch 10 - iter 360/723 - loss 0.01325818 - time (sec): 211.07 - samples/sec: 422.83 - lr: 0.000008 - momentum: 0.000000
2023-10-12 13:58:02,750 epoch 10 - iter 432/723 - loss 0.01235929 - time (sec): 253.38 - samples/sec: 425.90 - lr: 0.000007 - momentum: 0.000000
2023-10-12 13:58:43,798 epoch 10 - iter 504/723 - loss 0.01217336 - time (sec): 294.43 - samples/sec: 420.51 - lr: 0.000005 - momentum: 0.000000
2023-10-12 13:59:25,942 epoch 10 - iter 576/723 - loss 0.01172104 - time (sec): 336.58 - samples/sec: 422.72 - lr: 0.000003 - momentum: 0.000000
2023-10-12 14:00:06,168 epoch 10 - iter 648/723 - loss 0.01125967 - time (sec): 376.80 - samples/sec: 421.48 - lr: 0.000002 - momentum: 0.000000
2023-10-12 14:00:47,810 epoch 10 - iter 720/723 - loss 0.01135370 - time (sec): 418.44 - samples/sec: 420.11 - lr: 0.000000 - momentum: 0.000000
2023-10-12 14:00:48,941 ----------------------------------------------------------------------------------------------------
2023-10-12 14:00:48,941 EPOCH 10 done: loss 0.0113 - lr: 0.000000
2023-10-12 14:01:12,472 DEV : loss 0.13888543844223022 - f1-score (micro avg)  0.8436
2023-10-12 14:01:13,512 ----------------------------------------------------------------------------------------------------
2023-10-12 14:01:13,514 Loading model from best epoch ...
2023-10-12 14:01:17,667 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-12 14:01:40,330 
Results:
- F-score (micro) 0.8564
- F-score (macro) 0.7697
- Accuracy 0.7601

By class:
              precision    recall  f1-score   support

         PER     0.8566    0.8672    0.8619       482
         LOC     0.8937    0.8996    0.8966       458
         ORG     0.5507    0.5507    0.5507        69

   micro avg     0.8527    0.8603    0.8564      1009
   macro avg     0.7670    0.7725    0.7697      1009
weighted avg     0.8525    0.8603    0.8564      1009

2023-10-12 14:01:40,330 ----------------------------------------------------------------------------------------------------
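The macro and weighted averages in the final table follow directly from the three per-class rows. A minimal sketch that recomputes them (the micro avg would additionally need the raw TP/FP/FN counts, which the log does not print):

```python
# (precision, recall, f1, support) per class, copied from the table above.
classes = {
    "PER": (0.8566, 0.8672, 0.8619, 482),
    "LOC": (0.8937, 0.8996, 0.8966, 458),
    "ORG": (0.5507, 0.5507, 0.5507, 69),
}

n = sum(s for *_, s in classes.values())  # total support: 1009
macro = [round(sum(row[i] for row in classes.values()) / len(classes), 4)
         for i in range(3)]
weighted = [round(sum(row[i] * row[3] for row in classes.values()) / n, 4)
            for i in range(3)]

print(macro)     # [0.767, 0.7725, 0.7697]
print(weighted)  # [0.8525, 0.8603, 0.8564]
```

Both recomputed triples agree with the logged "macro avg" and "weighted avg" rows to four decimals.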