2023-10-16 19:38:25,064 ----------------------------------------------------------------------------------------------------
2023-10-16 19:38:25,065 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(32001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-16 19:38:25,065 ----------------------------------------------------------------------------------------------------
2023-10-16 19:38:25,065 MultiCorpus: 1085 train + 148 dev + 364 test sentences
 - NER_HIPE_2022 Corpus: 1085 train + 148 dev + 364 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/sv/with_doc_seperator
2023-10-16 19:38:25,065 ----------------------------------------------------------------------------------------------------
2023-10-16 19:38:25,065 Train:  1085 sentences
2023-10-16 19:38:25,065         (train_with_dev=False, train_with_test=False)
2023-10-16 19:38:25,065 ----------------------------------------------------------------------------------------------------
2023-10-16 19:38:25,065 Training Params:
2023-10-16 19:38:25,066  - learning_rate: "5e-05" 
2023-10-16 19:38:25,066  - mini_batch_size: "8"
2023-10-16 19:38:25,066  - max_epochs: "10"
2023-10-16 19:38:25,066  - shuffle: "True"
2023-10-16 19:38:25,066 ----------------------------------------------------------------------------------------------------
2023-10-16 19:38:25,066 Plugins:
2023-10-16 19:38:25,066  - LinearScheduler | warmup_fraction: '0.1'
2023-10-16 19:38:25,066 ----------------------------------------------------------------------------------------------------
2023-10-16 19:38:25,066 Final evaluation on model from best epoch (best-model.pt)
2023-10-16 19:38:25,066  - metric: "('micro avg', 'f1-score')"
2023-10-16 19:38:25,066 ----------------------------------------------------------------------------------------------------
2023-10-16 19:38:25,066 Computation:
2023-10-16 19:38:25,066  - compute on device: cuda:0
2023-10-16 19:38:25,066  - embedding storage: none
2023-10-16 19:38:25,066 ----------------------------------------------------------------------------------------------------
2023-10-16 19:38:25,066 Model training base path: "hmbench-newseye/sv-dbmdz/bert-base-historic-multilingual-cased-bs8-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-1"
2023-10-16 19:38:25,066 ----------------------------------------------------------------------------------------------------
2023-10-16 19:38:25,066 ----------------------------------------------------------------------------------------------------
2023-10-16 19:38:26,453 epoch 1 - iter 13/136 - loss 3.02415645 - time (sec): 1.39 - samples/sec: 3380.07 - lr: 0.000004 - momentum: 0.000000
2023-10-16 19:38:27,894 epoch 1 - iter 26/136 - loss 2.73651899 - time (sec): 2.83 - samples/sec: 3429.90 - lr: 0.000009 - momentum: 0.000000
2023-10-16 19:38:29,134 epoch 1 - iter 39/136 - loss 2.19292351 - time (sec): 4.07 - samples/sec: 3528.44 - lr: 0.000014 - momentum: 0.000000
2023-10-16 19:38:30,344 epoch 1 - iter 52/136 - loss 1.83877465 - time (sec): 5.28 - samples/sec: 3573.73 - lr: 0.000019 - momentum: 0.000000
2023-10-16 19:38:31,580 epoch 1 - iter 65/136 - loss 1.56468537 - time (sec): 6.51 - samples/sec: 3685.59 - lr: 0.000024 - momentum: 0.000000
2023-10-16 19:38:32,997 epoch 1 - iter 78/136 - loss 1.37162883 - time (sec): 7.93 - samples/sec: 3708.76 - lr: 0.000028 - momentum: 0.000000
2023-10-16 19:38:34,370 epoch 1 - iter 91/136 - loss 1.23231369 - time (sec): 9.30 - samples/sec: 3698.96 - lr: 0.000033 - momentum: 0.000000
2023-10-16 19:38:35,609 epoch 1 - iter 104/136 - loss 1.12236078 - time (sec): 10.54 - samples/sec: 3723.21 - lr: 0.000038 - momentum: 0.000000
2023-10-16 19:38:37,007 epoch 1 - iter 117/136 - loss 1.02932350 - time (sec): 11.94 - samples/sec: 3727.18 - lr: 0.000043 - momentum: 0.000000
2023-10-16 19:38:38,585 epoch 1 - iter 130/136 - loss 0.94617700 - time (sec): 13.52 - samples/sec: 3675.25 - lr: 0.000047 - momentum: 0.000000
2023-10-16 19:38:39,187 ----------------------------------------------------------------------------------------------------
2023-10-16 19:38:39,187 EPOCH 1 done: loss 0.9161 - lr: 0.000047
2023-10-16 19:38:40,204 DEV : loss 0.20288820564746857 - f1-score (micro avg)  0.4722
2023-10-16 19:38:40,208 saving best model
2023-10-16 19:38:40,522 ----------------------------------------------------------------------------------------------------
2023-10-16 19:38:42,045 epoch 2 - iter 13/136 - loss 0.23732824 - time (sec): 1.52 - samples/sec: 3734.23 - lr: 0.000050 - momentum: 0.000000
2023-10-16 19:38:43,375 epoch 2 - iter 26/136 - loss 0.20696770 - time (sec): 2.85 - samples/sec: 3603.59 - lr: 0.000049 - momentum: 0.000000
2023-10-16 19:38:44,679 epoch 2 - iter 39/136 - loss 0.19951300 - time (sec): 4.16 - samples/sec: 3714.36 - lr: 0.000048 - momentum: 0.000000
2023-10-16 19:38:46,063 epoch 2 - iter 52/136 - loss 0.21957384 - time (sec): 5.54 - samples/sec: 3616.90 - lr: 0.000048 - momentum: 0.000000
2023-10-16 19:38:47,354 epoch 2 - iter 65/136 - loss 0.21017221 - time (sec): 6.83 - samples/sec: 3618.48 - lr: 0.000047 - momentum: 0.000000
2023-10-16 19:38:48,776 epoch 2 - iter 78/136 - loss 0.19876748 - time (sec): 8.25 - samples/sec: 3574.30 - lr: 0.000047 - momentum: 0.000000
2023-10-16 19:38:50,182 epoch 2 - iter 91/136 - loss 0.19154341 - time (sec): 9.66 - samples/sec: 3605.84 - lr: 0.000046 - momentum: 0.000000
2023-10-16 19:38:51,554 epoch 2 - iter 104/136 - loss 0.18885502 - time (sec): 11.03 - samples/sec: 3630.47 - lr: 0.000046 - momentum: 0.000000
2023-10-16 19:38:52,936 epoch 2 - iter 117/136 - loss 0.18113803 - time (sec): 12.41 - samples/sec: 3630.23 - lr: 0.000045 - momentum: 0.000000
2023-10-16 19:38:54,386 epoch 2 - iter 130/136 - loss 0.17639998 - time (sec): 13.86 - samples/sec: 3592.81 - lr: 0.000045 - momentum: 0.000000
2023-10-16 19:38:55,007 ----------------------------------------------------------------------------------------------------
2023-10-16 19:38:55,007 EPOCH 2 done: loss 0.1744 - lr: 0.000045
2023-10-16 19:38:56,455 DEV : loss 0.12787802517414093 - f1-score (micro avg)  0.709
2023-10-16 19:38:56,462 saving best model
2023-10-16 19:38:56,980 ----------------------------------------------------------------------------------------------------
2023-10-16 19:38:58,375 epoch 3 - iter 13/136 - loss 0.10130091 - time (sec): 1.39 - samples/sec: 3458.50 - lr: 0.000044 - momentum: 0.000000
2023-10-16 19:38:59,508 epoch 3 - iter 26/136 - loss 0.09604338 - time (sec): 2.52 - samples/sec: 3682.54 - lr: 0.000043 - momentum: 0.000000
2023-10-16 19:39:00,997 epoch 3 - iter 39/136 - loss 0.10268479 - time (sec): 4.01 - samples/sec: 3713.63 - lr: 0.000043 - momentum: 0.000000
2023-10-16 19:39:02,517 epoch 3 - iter 52/136 - loss 0.10309567 - time (sec): 5.53 - samples/sec: 3542.78 - lr: 0.000042 - momentum: 0.000000
2023-10-16 19:39:03,822 epoch 3 - iter 65/136 - loss 0.09646645 - time (sec): 6.84 - samples/sec: 3534.59 - lr: 0.000042 - momentum: 0.000000
2023-10-16 19:39:05,342 epoch 3 - iter 78/136 - loss 0.09885337 - time (sec): 8.36 - samples/sec: 3539.54 - lr: 0.000041 - momentum: 0.000000
2023-10-16 19:39:06,891 epoch 3 - iter 91/136 - loss 0.09499064 - time (sec): 9.91 - samples/sec: 3475.54 - lr: 0.000041 - momentum: 0.000000
2023-10-16 19:39:08,202 epoch 3 - iter 104/136 - loss 0.09164222 - time (sec): 11.22 - samples/sec: 3466.50 - lr: 0.000040 - momentum: 0.000000
2023-10-16 19:39:09,616 epoch 3 - iter 117/136 - loss 0.09539987 - time (sec): 12.63 - samples/sec: 3460.71 - lr: 0.000040 - momentum: 0.000000
2023-10-16 19:39:11,090 epoch 3 - iter 130/136 - loss 0.09368696 - time (sec): 14.11 - samples/sec: 3501.20 - lr: 0.000039 - momentum: 0.000000
2023-10-16 19:39:11,799 ----------------------------------------------------------------------------------------------------
2023-10-16 19:39:11,799 EPOCH 3 done: loss 0.0936 - lr: 0.000039
2023-10-16 19:39:13,630 DEV : loss 0.10517842322587967 - f1-score (micro avg)  0.7648
2023-10-16 19:39:13,634 saving best model
2023-10-16 19:39:14,303 ----------------------------------------------------------------------------------------------------
2023-10-16 19:39:15,796 epoch 4 - iter 13/136 - loss 0.06608433 - time (sec): 1.49 - samples/sec: 3677.44 - lr: 0.000038 - momentum: 0.000000
2023-10-16 19:39:17,085 epoch 4 - iter 26/136 - loss 0.05710389 - time (sec): 2.78 - samples/sec: 3836.83 - lr: 0.000038 - momentum: 0.000000
2023-10-16 19:39:18,540 epoch 4 - iter 39/136 - loss 0.05467172 - time (sec): 4.23 - samples/sec: 3719.82 - lr: 0.000037 - momentum: 0.000000
2023-10-16 19:39:20,049 epoch 4 - iter 52/136 - loss 0.05580848 - time (sec): 5.74 - samples/sec: 3608.35 - lr: 0.000037 - momentum: 0.000000
2023-10-16 19:39:21,509 epoch 4 - iter 65/136 - loss 0.05256870 - time (sec): 7.20 - samples/sec: 3558.38 - lr: 0.000036 - momentum: 0.000000
2023-10-16 19:39:22,925 epoch 4 - iter 78/136 - loss 0.05445472 - time (sec): 8.62 - samples/sec: 3545.68 - lr: 0.000036 - momentum: 0.000000
2023-10-16 19:39:24,676 epoch 4 - iter 91/136 - loss 0.05306817 - time (sec): 10.37 - samples/sec: 3511.57 - lr: 0.000035 - momentum: 0.000000
2023-10-16 19:39:25,972 epoch 4 - iter 104/136 - loss 0.05204892 - time (sec): 11.66 - samples/sec: 3509.49 - lr: 0.000035 - momentum: 0.000000
2023-10-16 19:39:27,239 epoch 4 - iter 117/136 - loss 0.04955030 - time (sec): 12.93 - samples/sec: 3519.63 - lr: 0.000034 - momentum: 0.000000
2023-10-16 19:39:28,688 epoch 4 - iter 130/136 - loss 0.05090082 - time (sec): 14.38 - samples/sec: 3482.74 - lr: 0.000034 - momentum: 0.000000
2023-10-16 19:39:29,277 ----------------------------------------------------------------------------------------------------
2023-10-16 19:39:29,277 EPOCH 4 done: loss 0.0503 - lr: 0.000034
2023-10-16 19:39:30,742 DEV : loss 0.11859514564275742 - f1-score (micro avg)  0.7751
2023-10-16 19:39:30,746 saving best model
2023-10-16 19:39:31,279 ----------------------------------------------------------------------------------------------------
2023-10-16 19:39:32,725 epoch 5 - iter 13/136 - loss 0.05288570 - time (sec): 1.44 - samples/sec: 3341.66 - lr: 0.000033 - momentum: 0.000000
2023-10-16 19:39:34,235 epoch 5 - iter 26/136 - loss 0.03888893 - time (sec): 2.95 - samples/sec: 3360.97 - lr: 0.000032 - momentum: 0.000000
2023-10-16 19:39:35,703 epoch 5 - iter 39/136 - loss 0.03461814 - time (sec): 4.42 - samples/sec: 3491.55 - lr: 0.000032 - momentum: 0.000000
2023-10-16 19:39:37,270 epoch 5 - iter 52/136 - loss 0.03465037 - time (sec): 5.99 - samples/sec: 3451.16 - lr: 0.000031 - momentum: 0.000000
2023-10-16 19:39:38,489 epoch 5 - iter 65/136 - loss 0.03740903 - time (sec): 7.21 - samples/sec: 3564.80 - lr: 0.000031 - momentum: 0.000000
2023-10-16 19:39:39,872 epoch 5 - iter 78/136 - loss 0.03524836 - time (sec): 8.59 - samples/sec: 3525.38 - lr: 0.000030 - momentum: 0.000000
2023-10-16 19:39:41,208 epoch 5 - iter 91/136 - loss 0.03585208 - time (sec): 9.92 - samples/sec: 3537.93 - lr: 0.000030 - momentum: 0.000000
2023-10-16 19:39:42,760 epoch 5 - iter 104/136 - loss 0.03429363 - time (sec): 11.48 - samples/sec: 3529.25 - lr: 0.000029 - momentum: 0.000000
2023-10-16 19:39:44,131 epoch 5 - iter 117/136 - loss 0.03278488 - time (sec): 12.85 - samples/sec: 3543.05 - lr: 0.000029 - momentum: 0.000000
2023-10-16 19:39:45,521 epoch 5 - iter 130/136 - loss 0.03204268 - time (sec): 14.24 - samples/sec: 3549.25 - lr: 0.000028 - momentum: 0.000000
2023-10-16 19:39:45,959 ----------------------------------------------------------------------------------------------------
2023-10-16 19:39:45,959 EPOCH 5 done: loss 0.0324 - lr: 0.000028
2023-10-16 19:39:47,752 DEV : loss 0.12475510686635971 - f1-score (micro avg)  0.8214
2023-10-16 19:39:47,756 saving best model
2023-10-16 19:39:48,262 ----------------------------------------------------------------------------------------------------
2023-10-16 19:39:49,651 epoch 6 - iter 13/136 - loss 0.01579763 - time (sec): 1.39 - samples/sec: 3361.70 - lr: 0.000027 - momentum: 0.000000
2023-10-16 19:39:50,851 epoch 6 - iter 26/136 - loss 0.02031476 - time (sec): 2.59 - samples/sec: 3594.29 - lr: 0.000027 - momentum: 0.000000
2023-10-16 19:39:52,223 epoch 6 - iter 39/136 - loss 0.02508221 - time (sec): 3.96 - samples/sec: 3588.18 - lr: 0.000026 - momentum: 0.000000
2023-10-16 19:39:53,795 epoch 6 - iter 52/136 - loss 0.02385107 - time (sec): 5.53 - samples/sec: 3554.23 - lr: 0.000026 - momentum: 0.000000
2023-10-16 19:39:55,038 epoch 6 - iter 65/136 - loss 0.02550333 - time (sec): 6.77 - samples/sec: 3691.72 - lr: 0.000025 - momentum: 0.000000
2023-10-16 19:39:56,455 epoch 6 - iter 78/136 - loss 0.02488529 - time (sec): 8.19 - samples/sec: 3573.47 - lr: 0.000025 - momentum: 0.000000
2023-10-16 19:39:57,884 epoch 6 - iter 91/136 - loss 0.02306815 - time (sec): 9.62 - samples/sec: 3522.48 - lr: 0.000024 - momentum: 0.000000
2023-10-16 19:39:59,503 epoch 6 - iter 104/136 - loss 0.02319266 - time (sec): 11.24 - samples/sec: 3504.24 - lr: 0.000024 - momentum: 0.000000
2023-10-16 19:40:00,962 epoch 6 - iter 117/136 - loss 0.02205655 - time (sec): 12.70 - samples/sec: 3499.80 - lr: 0.000023 - momentum: 0.000000
2023-10-16 19:40:02,319 epoch 6 - iter 130/136 - loss 0.02214229 - time (sec): 14.06 - samples/sec: 3534.47 - lr: 0.000023 - momentum: 0.000000
2023-10-16 19:40:02,991 ----------------------------------------------------------------------------------------------------
2023-10-16 19:40:02,992 EPOCH 6 done: loss 0.0226 - lr: 0.000023
2023-10-16 19:40:04,414 DEV : loss 0.1283072680234909 - f1-score (micro avg)  0.8
2023-10-16 19:40:04,418 ----------------------------------------------------------------------------------------------------
2023-10-16 19:40:05,810 epoch 7 - iter 13/136 - loss 0.02517646 - time (sec): 1.39 - samples/sec: 4173.17 - lr: 0.000022 - momentum: 0.000000
2023-10-16 19:40:07,200 epoch 7 - iter 26/136 - loss 0.01976096 - time (sec): 2.78 - samples/sec: 3751.25 - lr: 0.000021 - momentum: 0.000000
2023-10-16 19:40:08,558 epoch 7 - iter 39/136 - loss 0.01871697 - time (sec): 4.14 - samples/sec: 3810.94 - lr: 0.000021 - momentum: 0.000000
2023-10-16 19:40:09,860 epoch 7 - iter 52/136 - loss 0.01928305 - time (sec): 5.44 - samples/sec: 3807.28 - lr: 0.000020 - momentum: 0.000000
2023-10-16 19:40:11,172 epoch 7 - iter 65/136 - loss 0.01756808 - time (sec): 6.75 - samples/sec: 3750.80 - lr: 0.000020 - momentum: 0.000000
2023-10-16 19:40:12,582 epoch 7 - iter 78/136 - loss 0.01776526 - time (sec): 8.16 - samples/sec: 3725.04 - lr: 0.000019 - momentum: 0.000000
2023-10-16 19:40:14,129 epoch 7 - iter 91/136 - loss 0.01667503 - time (sec): 9.71 - samples/sec: 3720.31 - lr: 0.000019 - momentum: 0.000000
2023-10-16 19:40:15,364 epoch 7 - iter 104/136 - loss 0.01663346 - time (sec): 10.94 - samples/sec: 3705.00 - lr: 0.000018 - momentum: 0.000000
2023-10-16 19:40:16,770 epoch 7 - iter 117/136 - loss 0.01670474 - time (sec): 12.35 - samples/sec: 3681.72 - lr: 0.000018 - momentum: 0.000000
2023-10-16 19:40:18,168 epoch 7 - iter 130/136 - loss 0.01720601 - time (sec): 13.75 - samples/sec: 3642.58 - lr: 0.000017 - momentum: 0.000000
2023-10-16 19:40:18,768 ----------------------------------------------------------------------------------------------------
2023-10-16 19:40:18,768 EPOCH 7 done: loss 0.0174 - lr: 0.000017
2023-10-16 19:40:20,403 DEV : loss 0.14328120648860931 - f1-score (micro avg)  0.7899
2023-10-16 19:40:20,407 ----------------------------------------------------------------------------------------------------
2023-10-16 19:40:21,913 epoch 8 - iter 13/136 - loss 0.00578552 - time (sec): 1.50 - samples/sec: 3549.97 - lr: 0.000016 - momentum: 0.000000
2023-10-16 19:40:23,532 epoch 8 - iter 26/136 - loss 0.00802889 - time (sec): 3.12 - samples/sec: 3436.77 - lr: 0.000016 - momentum: 0.000000
2023-10-16 19:40:24,845 epoch 8 - iter 39/136 - loss 0.00991238 - time (sec): 4.44 - samples/sec: 3553.05 - lr: 0.000015 - momentum: 0.000000
2023-10-16 19:40:26,046 epoch 8 - iter 52/136 - loss 0.00922150 - time (sec): 5.64 - samples/sec: 3555.06 - lr: 0.000015 - momentum: 0.000000
2023-10-16 19:40:27,545 epoch 8 - iter 65/136 - loss 0.01032178 - time (sec): 7.14 - samples/sec: 3626.78 - lr: 0.000014 - momentum: 0.000000
2023-10-16 19:40:28,943 epoch 8 - iter 78/136 - loss 0.01028200 - time (sec): 8.54 - samples/sec: 3589.22 - lr: 0.000014 - momentum: 0.000000
2023-10-16 19:40:30,503 epoch 8 - iter 91/136 - loss 0.01074870 - time (sec): 10.09 - samples/sec: 3592.90 - lr: 0.000013 - momentum: 0.000000
2023-10-16 19:40:32,046 epoch 8 - iter 104/136 - loss 0.00989424 - time (sec): 11.64 - samples/sec: 3558.63 - lr: 0.000013 - momentum: 0.000000
2023-10-16 19:40:33,390 epoch 8 - iter 117/136 - loss 0.01118879 - time (sec): 12.98 - samples/sec: 3511.74 - lr: 0.000012 - momentum: 0.000000
2023-10-16 19:40:34,696 epoch 8 - iter 130/136 - loss 0.01176432 - time (sec): 14.29 - samples/sec: 3480.28 - lr: 0.000012 - momentum: 0.000000
2023-10-16 19:40:35,376 ----------------------------------------------------------------------------------------------------
2023-10-16 19:40:35,376 EPOCH 8 done: loss 0.0117 - lr: 0.000012
2023-10-16 19:40:36,803 DEV : loss 0.15624405443668365 - f1-score (micro avg)  0.8133
2023-10-16 19:40:36,807 ----------------------------------------------------------------------------------------------------
2023-10-16 19:40:38,141 epoch 9 - iter 13/136 - loss 0.00928382 - time (sec): 1.33 - samples/sec: 3710.24 - lr: 0.000011 - momentum: 0.000000
2023-10-16 19:40:39,597 epoch 9 - iter 26/136 - loss 0.01339997 - time (sec): 2.79 - samples/sec: 3706.34 - lr: 0.000010 - momentum: 0.000000
2023-10-16 19:40:41,087 epoch 9 - iter 39/136 - loss 0.01199908 - time (sec): 4.28 - samples/sec: 3708.38 - lr: 0.000010 - momentum: 0.000000
2023-10-16 19:40:42,388 epoch 9 - iter 52/136 - loss 0.01266300 - time (sec): 5.58 - samples/sec: 3762.51 - lr: 0.000009 - momentum: 0.000000
2023-10-16 19:40:43,771 epoch 9 - iter 65/136 - loss 0.01242176 - time (sec): 6.96 - samples/sec: 3686.38 - lr: 0.000009 - momentum: 0.000000
2023-10-16 19:40:45,109 epoch 9 - iter 78/136 - loss 0.01236743 - time (sec): 8.30 - samples/sec: 3581.29 - lr: 0.000008 - momentum: 0.000000
2023-10-16 19:40:46,647 epoch 9 - iter 91/136 - loss 0.01228619 - time (sec): 9.84 - samples/sec: 3554.07 - lr: 0.000008 - momentum: 0.000000
2023-10-16 19:40:47,911 epoch 9 - iter 104/136 - loss 0.01090571 - time (sec): 11.10 - samples/sec: 3580.06 - lr: 0.000007 - momentum: 0.000000
2023-10-16 19:40:49,656 epoch 9 - iter 117/136 - loss 0.01047892 - time (sec): 12.85 - samples/sec: 3517.41 - lr: 0.000007 - momentum: 0.000000
2023-10-16 19:40:51,014 epoch 9 - iter 130/136 - loss 0.00993445 - time (sec): 14.21 - samples/sec: 3532.53 - lr: 0.000006 - momentum: 0.000000
2023-10-16 19:40:51,564 ----------------------------------------------------------------------------------------------------
2023-10-16 19:40:51,564 EPOCH 9 done: loss 0.0099 - lr: 0.000006
2023-10-16 19:40:53,004 DEV : loss 0.15900768339633942 - f1-score (micro avg)  0.8088
2023-10-16 19:40:53,008 ----------------------------------------------------------------------------------------------------
2023-10-16 19:40:55,022 epoch 10 - iter 13/136 - loss 0.00709288 - time (sec): 2.01 - samples/sec: 2608.60 - lr: 0.000005 - momentum: 0.000000
2023-10-16 19:40:56,538 epoch 10 - iter 26/136 - loss 0.00518710 - time (sec): 3.53 - samples/sec: 2992.37 - lr: 0.000005 - momentum: 0.000000
2023-10-16 19:40:57,869 epoch 10 - iter 39/136 - loss 0.00781429 - time (sec): 4.86 - samples/sec: 3046.30 - lr: 0.000004 - momentum: 0.000000
2023-10-16 19:40:59,220 epoch 10 - iter 52/136 - loss 0.00899035 - time (sec): 6.21 - samples/sec: 3172.27 - lr: 0.000004 - momentum: 0.000000
2023-10-16 19:41:00,601 epoch 10 - iter 65/136 - loss 0.00817988 - time (sec): 7.59 - samples/sec: 3212.55 - lr: 0.000003 - momentum: 0.000000
2023-10-16 19:41:02,099 epoch 10 - iter 78/136 - loss 0.00777940 - time (sec): 9.09 - samples/sec: 3226.67 - lr: 0.000003 - momentum: 0.000000
2023-10-16 19:41:03,454 epoch 10 - iter 91/136 - loss 0.00719324 - time (sec): 10.44 - samples/sec: 3245.39 - lr: 0.000002 - momentum: 0.000000
2023-10-16 19:41:04,870 epoch 10 - iter 104/136 - loss 0.00693415 - time (sec): 11.86 - samples/sec: 3312.65 - lr: 0.000002 - momentum: 0.000000
2023-10-16 19:41:06,420 epoch 10 - iter 117/136 - loss 0.00767387 - time (sec): 13.41 - samples/sec: 3353.14 - lr: 0.000001 - momentum: 0.000000
2023-10-16 19:41:07,740 epoch 10 - iter 130/136 - loss 0.00777585 - time (sec): 14.73 - samples/sec: 3398.50 - lr: 0.000000 - momentum: 0.000000
2023-10-16 19:41:08,252 ----------------------------------------------------------------------------------------------------
2023-10-16 19:41:08,252 EPOCH 10 done: loss 0.0076 - lr: 0.000000
2023-10-16 19:41:09,680 DEV : loss 0.16889716684818268 - f1-score (micro avg)  0.8103
2023-10-16 19:41:10,080 ----------------------------------------------------------------------------------------------------
2023-10-16 19:41:10,081 Loading model from best epoch ...
2023-10-16 19:41:11,797 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-16 19:41:13,818 
Results:
- F-score (micro) 0.7764
- F-score (macro) 0.7367
- Accuracy 0.651

By class:
              precision    recall  f1-score   support

         LOC     0.7971    0.8942    0.8429       312
         PER     0.6439    0.8606    0.7366       208
         ORG     0.5366    0.4000    0.4583        55
   HumanProd     0.9091    0.9091    0.9091        22

   micro avg     0.7236    0.8375    0.7764       597
   macro avg     0.7217    0.7660    0.7367       597
weighted avg     0.7239    0.8375    0.7729       597

2023-10-16 19:41:13,818 ----------------------------------------------------------------------------------------------------
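Reproduction sketch: a minimal Flair script that sets up a run comparable to the one recorded above. Hyperparameters are taken from the "Training Params", "Plugins", and model-summary blocks of this log; the NER_HIPE_2022 loader arguments, the hidden_size, and the base path are assumptions inferred from the logged dataset path and directory name, not confirmed by the log itself.

# Sketch only: loader arguments and base path are inferred from this log.
from flair.datasets import NER_HIPE_2022
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Swedish NewsEye split of HIPE-2022 (logged path: .../ner_hipe_2022/v2.1/newseye/sv/...).
corpus = NER_HIPE_2022(dataset_name="newseye", language="sv")
label_dict = corpus.make_label_dictionary(label_type="ner")

# Historic multilingual BERT, last layer, first-subtoken pooling
# (as encoded in the logged base path: ...-poolingfirst-layers-1-crfFalse-...).
embeddings = TransformerWordEmbeddings(
    model="dbmdz/bert-base-historic-multilingual-cased",
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
)

# Plain linear classification head without CRF or RNN, matching the logged model summary.
tagger = SequenceTagger(
    hidden_size=256,  # assumption; unused when use_rnn=False
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)

# fine_tune() applies a linear LR schedule with warmup, consistent with the
# logged LinearScheduler plugin (warmup_fraction 0.1).
trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "hmbench-newseye/sv-dbmdz/bert-base-historic-multilingual-cased-bs8-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-1",
    learning_rate=5e-5,
    mini_batch_size=8,
    max_epochs=10,
)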