2023-10-19 23:58:32,450 ----------------------------------------------------------------------------------------------------
2023-10-19 23:58:32,451 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(32001, 128)
        (position_embeddings): Embedding(512, 128)
        (token_type_embeddings): Embedding(2, 128)
        (LayerNorm): LayerNorm((128,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-1): 2 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=128, out_features=128, bias=True)
                (key): Linear(in_features=128, out_features=128, bias=True)
                (value): Linear(in_features=128, out_features=128, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=128, out_features=128, bias=True)
                (LayerNorm): LayerNorm((128,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=128, out_features=512, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=512, out_features=128, bias=True)
              (LayerNorm): LayerNorm((128,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=128, out_features=128, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=128, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-19 23:58:32,451 ----------------------------------------------------------------------------------------------------
2023-10-19 23:58:32,451 MultiCorpus: 1166 train + 165 dev + 415 test sentences
 - NER_HIPE_2022 Corpus: 1166 train + 165 dev + 415 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fi/with_doc_seperator
2023-10-19 23:58:32,451 ----------------------------------------------------------------------------------------------------
2023-10-19 23:58:32,451 Train:  1166 sentences
2023-10-19 23:58:32,451         (train_with_dev=False, train_with_test=False)
2023-10-19 23:58:32,451 ----------------------------------------------------------------------------------------------------
2023-10-19 23:58:32,451 Training Params:
2023-10-19 23:58:32,451  - learning_rate: "5e-05" 
2023-10-19 23:58:32,451  - mini_batch_size: "4"
2023-10-19 23:58:32,451  - max_epochs: "10"
2023-10-19 23:58:32,451  - shuffle: "True"
2023-10-19 23:58:32,451 ----------------------------------------------------------------------------------------------------
2023-10-19 23:58:32,451 Plugins:
2023-10-19 23:58:32,451  - TensorboardLogger
2023-10-19 23:58:32,451  - LinearScheduler | warmup_fraction: '0.1'
2023-10-19 23:58:32,451 ----------------------------------------------------------------------------------------------------
2023-10-19 23:58:32,451 Final evaluation on model from best epoch (best-model.pt)
2023-10-19 23:58:32,451  - metric: "('micro avg', 'f1-score')"
2023-10-19 23:58:32,451 ----------------------------------------------------------------------------------------------------
2023-10-19 23:58:32,451 Computation:
2023-10-19 23:58:32,451  - compute on device: cuda:0
2023-10-19 23:58:32,451  - embedding storage: none
2023-10-19 23:58:32,452 ----------------------------------------------------------------------------------------------------
2023-10-19 23:58:32,452 Model training base path: "hmbench-newseye/fi-dbmdz/bert-tiny-historic-multilingual-cased-bs4-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-5"
2023-10-19 23:58:32,452 ----------------------------------------------------------------------------------------------------
2023-10-19 23:58:32,452 ----------------------------------------------------------------------------------------------------
2023-10-19 23:58:32,452 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-19 23:58:32,922 epoch 1 - iter 29/292 - loss 3.14779499 - time (sec): 0.47 - samples/sec: 8417.41 - lr: 0.000005 - momentum: 0.000000
2023-10-19 23:58:33,371 epoch 1 - iter 58/292 - loss 3.07959948 - time (sec): 0.92 - samples/sec: 8375.04 - lr: 0.000010 - momentum: 0.000000
2023-10-19 23:58:33,854 epoch 1 - iter 87/292 - loss 2.99636841 - time (sec): 1.40 - samples/sec: 8658.66 - lr: 0.000015 - momentum: 0.000000
2023-10-19 23:58:34,361 epoch 1 - iter 116/292 - loss 2.81136763 - time (sec): 1.91 - samples/sec: 8448.44 - lr: 0.000020 - momentum: 0.000000
2023-10-19 23:58:34,874 epoch 1 - iter 145/292 - loss 2.57856572 - time (sec): 2.42 - samples/sec: 8523.19 - lr: 0.000025 - momentum: 0.000000
2023-10-19 23:58:35,405 epoch 1 - iter 174/292 - loss 2.34135925 - time (sec): 2.95 - samples/sec: 8378.57 - lr: 0.000030 - momentum: 0.000000
2023-10-19 23:58:35,938 epoch 1 - iter 203/292 - loss 2.04834995 - time (sec): 3.49 - samples/sec: 8631.55 - lr: 0.000035 - momentum: 0.000000
2023-10-19 23:58:36,449 epoch 1 - iter 232/292 - loss 1.87569980 - time (sec): 4.00 - samples/sec: 8640.24 - lr: 0.000040 - momentum: 0.000000
2023-10-19 23:58:36,995 epoch 1 - iter 261/292 - loss 1.72976459 - time (sec): 4.54 - samples/sec: 8700.50 - lr: 0.000045 - momentum: 0.000000
2023-10-19 23:58:37,512 epoch 1 - iter 290/292 - loss 1.62662447 - time (sec): 5.06 - samples/sec: 8755.13 - lr: 0.000049 - momentum: 0.000000
2023-10-19 23:58:37,540 ----------------------------------------------------------------------------------------------------
2023-10-19 23:58:37,540 EPOCH 1 done: loss 1.6226 - lr: 0.000049
2023-10-19 23:58:37,807 DEV : loss 0.46307969093322754 - f1-score (micro avg)  0.0
2023-10-19 23:58:37,811 ----------------------------------------------------------------------------------------------------
2023-10-19 23:58:38,333 epoch 2 - iter 29/292 - loss 0.82866034 - time (sec): 0.52 - samples/sec: 9421.74 - lr: 0.000049 - momentum: 0.000000
2023-10-19 23:58:38,857 epoch 2 - iter 58/292 - loss 0.77057818 - time (sec): 1.05 - samples/sec: 9203.30 - lr: 0.000049 - momentum: 0.000000
2023-10-19 23:58:39,364 epoch 2 - iter 87/292 - loss 0.72915775 - time (sec): 1.55 - samples/sec: 8858.99 - lr: 0.000048 - momentum: 0.000000
2023-10-19 23:58:39,867 epoch 2 - iter 116/292 - loss 0.72394303 - time (sec): 2.06 - samples/sec: 8546.65 - lr: 0.000048 - momentum: 0.000000
2023-10-19 23:58:40,382 epoch 2 - iter 145/292 - loss 0.69723734 - time (sec): 2.57 - samples/sec: 8424.58 - lr: 0.000047 - momentum: 0.000000
2023-10-19 23:58:40,892 epoch 2 - iter 174/292 - loss 0.67216825 - time (sec): 3.08 - samples/sec: 8432.93 - lr: 0.000047 - momentum: 0.000000
2023-10-19 23:58:41,418 epoch 2 - iter 203/292 - loss 0.65219976 - time (sec): 3.61 - samples/sec: 8329.25 - lr: 0.000046 - momentum: 0.000000
2023-10-19 23:58:41,947 epoch 2 - iter 232/292 - loss 0.61740936 - time (sec): 4.14 - samples/sec: 8592.86 - lr: 0.000046 - momentum: 0.000000
2023-10-19 23:58:42,474 epoch 2 - iter 261/292 - loss 0.60180613 - time (sec): 4.66 - samples/sec: 8681.15 - lr: 0.000045 - momentum: 0.000000
2023-10-19 23:58:42,967 epoch 2 - iter 290/292 - loss 0.60028304 - time (sec): 5.16 - samples/sec: 8555.21 - lr: 0.000045 - momentum: 0.000000
2023-10-19 23:58:42,998 ----------------------------------------------------------------------------------------------------
2023-10-19 23:58:42,999 EPOCH 2 done: loss 0.5990 - lr: 0.000045
2023-10-19 23:58:43,626 DEV : loss 0.3582861125469208 - f1-score (micro avg)  0.0085
2023-10-19 23:58:43,630 saving best model
2023-10-19 23:58:43,661 ----------------------------------------------------------------------------------------------------
2023-10-19 23:58:44,166 epoch 3 - iter 29/292 - loss 0.42071781 - time (sec): 0.50 - samples/sec: 8686.20 - lr: 0.000044 - momentum: 0.000000
2023-10-19 23:58:44,684 epoch 3 - iter 58/292 - loss 0.43984250 - time (sec): 1.02 - samples/sec: 8569.64 - lr: 0.000043 - momentum: 0.000000
2023-10-19 23:58:45,207 epoch 3 - iter 87/292 - loss 0.46328548 - time (sec): 1.55 - samples/sec: 8822.27 - lr: 0.000043 - momentum: 0.000000
2023-10-19 23:58:45,733 epoch 3 - iter 116/292 - loss 0.49552322 - time (sec): 2.07 - samples/sec: 8657.57 - lr: 0.000042 - momentum: 0.000000
2023-10-19 23:58:46,403 epoch 3 - iter 145/292 - loss 0.49277663 - time (sec): 2.74 - samples/sec: 8130.26 - lr: 0.000042 - momentum: 0.000000
2023-10-19 23:58:46,912 epoch 3 - iter 174/292 - loss 0.48425229 - time (sec): 3.25 - samples/sec: 8346.84 - lr: 0.000041 - momentum: 0.000000
2023-10-19 23:58:47,345 epoch 3 - iter 203/292 - loss 0.47920464 - time (sec): 3.68 - samples/sec: 8445.19 - lr: 0.000041 - momentum: 0.000000
2023-10-19 23:58:47,790 epoch 3 - iter 232/292 - loss 0.47196920 - time (sec): 4.13 - samples/sec: 8656.03 - lr: 0.000040 - momentum: 0.000000
2023-10-19 23:58:48,225 epoch 3 - iter 261/292 - loss 0.46824817 - time (sec): 4.56 - samples/sec: 8662.99 - lr: 0.000040 - momentum: 0.000000
2023-10-19 23:58:48,718 epoch 3 - iter 290/292 - loss 0.46590176 - time (sec): 5.06 - samples/sec: 8723.29 - lr: 0.000039 - momentum: 0.000000
2023-10-19 23:58:48,759 ----------------------------------------------------------------------------------------------------
2023-10-19 23:58:48,759 EPOCH 3 done: loss 0.4648 - lr: 0.000039
2023-10-19 23:58:49,392 DEV : loss 0.3454797565937042 - f1-score (micro avg)  0.0561
2023-10-19 23:58:49,396 saving best model
2023-10-19 23:58:49,434 ----------------------------------------------------------------------------------------------------
2023-10-19 23:58:49,923 epoch 4 - iter 29/292 - loss 0.37633271 - time (sec): 0.49 - samples/sec: 8019.09 - lr: 0.000038 - momentum: 0.000000
2023-10-19 23:58:50,428 epoch 4 - iter 58/292 - loss 0.39081668 - time (sec): 0.99 - samples/sec: 8129.96 - lr: 0.000038 - momentum: 0.000000
2023-10-19 23:58:50,948 epoch 4 - iter 87/292 - loss 0.38739784 - time (sec): 1.51 - samples/sec: 8446.57 - lr: 0.000037 - momentum: 0.000000
2023-10-19 23:58:51,445 epoch 4 - iter 116/292 - loss 0.38345280 - time (sec): 2.01 - samples/sec: 8461.51 - lr: 0.000037 - momentum: 0.000000
2023-10-19 23:58:51,973 epoch 4 - iter 145/292 - loss 0.38672167 - time (sec): 2.54 - samples/sec: 8359.16 - lr: 0.000036 - momentum: 0.000000
2023-10-19 23:58:52,487 epoch 4 - iter 174/292 - loss 0.39123246 - time (sec): 3.05 - samples/sec: 8332.83 - lr: 0.000036 - momentum: 0.000000
2023-10-19 23:58:53,015 epoch 4 - iter 203/292 - loss 0.40684055 - time (sec): 3.58 - samples/sec: 8599.45 - lr: 0.000035 - momentum: 0.000000
2023-10-19 23:58:53,514 epoch 4 - iter 232/292 - loss 0.40715733 - time (sec): 4.08 - samples/sec: 8461.76 - lr: 0.000035 - momentum: 0.000000
2023-10-19 23:58:54,020 epoch 4 - iter 261/292 - loss 0.40395442 - time (sec): 4.58 - samples/sec: 8452.62 - lr: 0.000034 - momentum: 0.000000
2023-10-19 23:58:54,583 epoch 4 - iter 290/292 - loss 0.40765257 - time (sec): 5.15 - samples/sec: 8610.09 - lr: 0.000033 - momentum: 0.000000
2023-10-19 23:58:54,610 ----------------------------------------------------------------------------------------------------
2023-10-19 23:58:54,611 EPOCH 4 done: loss 0.4075 - lr: 0.000033
2023-10-19 23:58:55,258 DEV : loss 0.31024691462516785 - f1-score (micro avg)  0.2088
2023-10-19 23:58:55,262 saving best model
2023-10-19 23:58:55,296 ----------------------------------------------------------------------------------------------------
2023-10-19 23:58:55,802 epoch 5 - iter 29/292 - loss 0.38537138 - time (sec): 0.51 - samples/sec: 10355.27 - lr: 0.000033 - momentum: 0.000000
2023-10-19 23:58:56,323 epoch 5 - iter 58/292 - loss 0.43876506 - time (sec): 1.03 - samples/sec: 9554.92 - lr: 0.000032 - momentum: 0.000000
2023-10-19 23:58:56,823 epoch 5 - iter 87/292 - loss 0.40671441 - time (sec): 1.53 - samples/sec: 8982.40 - lr: 0.000032 - momentum: 0.000000
2023-10-19 23:58:57,330 epoch 5 - iter 116/292 - loss 0.40007945 - time (sec): 2.03 - samples/sec: 8784.42 - lr: 0.000031 - momentum: 0.000000
2023-10-19 23:58:57,858 epoch 5 - iter 145/292 - loss 0.39158917 - time (sec): 2.56 - samples/sec: 8880.38 - lr: 0.000031 - momentum: 0.000000
2023-10-19 23:58:58,371 epoch 5 - iter 174/292 - loss 0.37921891 - time (sec): 3.07 - samples/sec: 8775.96 - lr: 0.000030 - momentum: 0.000000
2023-10-19 23:58:58,855 epoch 5 - iter 203/292 - loss 0.38011427 - time (sec): 3.56 - samples/sec: 8685.58 - lr: 0.000030 - momentum: 0.000000
2023-10-19 23:58:59,389 epoch 5 - iter 232/292 - loss 0.39053655 - time (sec): 4.09 - samples/sec: 8676.83 - lr: 0.000029 - momentum: 0.000000
2023-10-19 23:58:59,890 epoch 5 - iter 261/292 - loss 0.38870844 - time (sec): 4.59 - samples/sec: 8575.24 - lr: 0.000028 - momentum: 0.000000
2023-10-19 23:59:00,411 epoch 5 - iter 290/292 - loss 0.37911182 - time (sec): 5.11 - samples/sec: 8672.12 - lr: 0.000028 - momentum: 0.000000
2023-10-19 23:59:00,439 ----------------------------------------------------------------------------------------------------
2023-10-19 23:59:00,439 EPOCH 5 done: loss 0.3785 - lr: 0.000028
2023-10-19 23:59:01,076 DEV : loss 0.3215314745903015 - f1-score (micro avg)  0.2246
2023-10-19 23:59:01,080 saving best model
2023-10-19 23:59:01,113 ----------------------------------------------------------------------------------------------------
2023-10-19 23:59:01,632 epoch 6 - iter 29/292 - loss 0.38224130 - time (sec): 0.52 - samples/sec: 9011.08 - lr: 0.000027 - momentum: 0.000000
2023-10-19 23:59:02,088 epoch 6 - iter 58/292 - loss 0.36144289 - time (sec): 0.97 - samples/sec: 8652.39 - lr: 0.000027 - momentum: 0.000000
2023-10-19 23:59:02,532 epoch 6 - iter 87/292 - loss 0.37097587 - time (sec): 1.42 - samples/sec: 8935.31 - lr: 0.000026 - momentum: 0.000000
2023-10-19 23:59:02,976 epoch 6 - iter 116/292 - loss 0.36235103 - time (sec): 1.86 - samples/sec: 9274.47 - lr: 0.000026 - momentum: 0.000000
2023-10-19 23:59:03,426 epoch 6 - iter 145/292 - loss 0.35004386 - time (sec): 2.31 - samples/sec: 9520.28 - lr: 0.000025 - momentum: 0.000000
2023-10-19 23:59:03,894 epoch 6 - iter 174/292 - loss 0.35006772 - time (sec): 2.78 - samples/sec: 9664.14 - lr: 0.000025 - momentum: 0.000000
2023-10-19 23:59:04,370 epoch 6 - iter 203/292 - loss 0.33807395 - time (sec): 3.26 - samples/sec: 9664.13 - lr: 0.000024 - momentum: 0.000000
2023-10-19 23:59:04,858 epoch 6 - iter 232/292 - loss 0.33896877 - time (sec): 3.74 - samples/sec: 9609.82 - lr: 0.000023 - momentum: 0.000000
2023-10-19 23:59:05,390 epoch 6 - iter 261/292 - loss 0.34191111 - time (sec): 4.28 - samples/sec: 9355.23 - lr: 0.000023 - momentum: 0.000000
2023-10-19 23:59:05,897 epoch 6 - iter 290/292 - loss 0.34950655 - time (sec): 4.78 - samples/sec: 9240.88 - lr: 0.000022 - momentum: 0.000000
2023-10-19 23:59:05,928 ----------------------------------------------------------------------------------------------------
2023-10-19 23:59:05,928 EPOCH 6 done: loss 0.3515 - lr: 0.000022
2023-10-19 23:59:06,570 DEV : loss 0.31194955110549927 - f1-score (micro avg)  0.2452
2023-10-19 23:59:06,574 saving best model
2023-10-19 23:59:06,608 ----------------------------------------------------------------------------------------------------
2023-10-19 23:59:07,151 epoch 7 - iter 29/292 - loss 0.26753793 - time (sec): 0.54 - samples/sec: 10262.20 - lr: 0.000022 - momentum: 0.000000
2023-10-19 23:59:07,637 epoch 7 - iter 58/292 - loss 0.33094214 - time (sec): 1.03 - samples/sec: 9137.23 - lr: 0.000021 - momentum: 0.000000
2023-10-19 23:59:08,121 epoch 7 - iter 87/292 - loss 0.34593265 - time (sec): 1.51 - samples/sec: 8884.86 - lr: 0.000021 - momentum: 0.000000
2023-10-19 23:59:08,643 epoch 7 - iter 116/292 - loss 0.32828561 - time (sec): 2.03 - samples/sec: 8790.71 - lr: 0.000020 - momentum: 0.000000
2023-10-19 23:59:09,166 epoch 7 - iter 145/292 - loss 0.33425252 - time (sec): 2.56 - samples/sec: 8462.47 - lr: 0.000020 - momentum: 0.000000
2023-10-19 23:59:09,718 epoch 7 - iter 174/292 - loss 0.34544164 - time (sec): 3.11 - samples/sec: 8606.96 - lr: 0.000019 - momentum: 0.000000
2023-10-19 23:59:10,245 epoch 7 - iter 203/292 - loss 0.33684443 - time (sec): 3.64 - samples/sec: 8692.09 - lr: 0.000018 - momentum: 0.000000
2023-10-19 23:59:10,742 epoch 7 - iter 232/292 - loss 0.34217880 - time (sec): 4.13 - samples/sec: 8692.34 - lr: 0.000018 - momentum: 0.000000
2023-10-19 23:59:11,257 epoch 7 - iter 261/292 - loss 0.33355551 - time (sec): 4.65 - samples/sec: 8615.33 - lr: 0.000017 - momentum: 0.000000
2023-10-19 23:59:11,785 epoch 7 - iter 290/292 - loss 0.32942474 - time (sec): 5.18 - samples/sec: 8527.00 - lr: 0.000017 - momentum: 0.000000
2023-10-19 23:59:11,817 ----------------------------------------------------------------------------------------------------
2023-10-19 23:59:11,818 EPOCH 7 done: loss 0.3300 - lr: 0.000017
2023-10-19 23:59:12,459 DEV : loss 0.2952404320240021 - f1-score (micro avg)  0.2941
2023-10-19 23:59:12,463 saving best model
2023-10-19 23:59:12,495 ----------------------------------------------------------------------------------------------------
2023-10-19 23:59:12,988 epoch 8 - iter 29/292 - loss 0.30565935 - time (sec): 0.49 - samples/sec: 8828.66 - lr: 0.000016 - momentum: 0.000000
2023-10-19 23:59:13,524 epoch 8 - iter 58/292 - loss 0.34043801 - time (sec): 1.03 - samples/sec: 8887.12 - lr: 0.000016 - momentum: 0.000000
2023-10-19 23:59:14,090 epoch 8 - iter 87/292 - loss 0.30212031 - time (sec): 1.59 - samples/sec: 9162.88 - lr: 0.000015 - momentum: 0.000000
2023-10-19 23:59:14,615 epoch 8 - iter 116/292 - loss 0.31110236 - time (sec): 2.12 - samples/sec: 8676.70 - lr: 0.000015 - momentum: 0.000000
2023-10-19 23:59:15,118 epoch 8 - iter 145/292 - loss 0.31665402 - time (sec): 2.62 - samples/sec: 8340.54 - lr: 0.000014 - momentum: 0.000000
2023-10-19 23:59:15,650 epoch 8 - iter 174/292 - loss 0.32087859 - time (sec): 3.15 - samples/sec: 8269.51 - lr: 0.000013 - momentum: 0.000000
2023-10-19 23:59:16,189 epoch 8 - iter 203/292 - loss 0.31580278 - time (sec): 3.69 - samples/sec: 8200.91 - lr: 0.000013 - momentum: 0.000000
2023-10-19 23:59:16,730 epoch 8 - iter 232/292 - loss 0.32171904 - time (sec): 4.23 - samples/sec: 8120.26 - lr: 0.000012 - momentum: 0.000000
2023-10-19 23:59:17,272 epoch 8 - iter 261/292 - loss 0.31735134 - time (sec): 4.78 - samples/sec: 8118.22 - lr: 0.000012 - momentum: 0.000000
2023-10-19 23:59:17,799 epoch 8 - iter 290/292 - loss 0.33042954 - time (sec): 5.30 - samples/sec: 8319.87 - lr: 0.000011 - momentum: 0.000000
2023-10-19 23:59:17,831 ----------------------------------------------------------------------------------------------------
2023-10-19 23:59:17,831 EPOCH 8 done: loss 0.3287 - lr: 0.000011
2023-10-19 23:59:18,470 DEV : loss 0.300225168466568 - f1-score (micro avg)  0.274
2023-10-19 23:59:18,474 ----------------------------------------------------------------------------------------------------
2023-10-19 23:59:18,946 epoch 9 - iter 29/292 - loss 0.33105087 - time (sec): 0.47 - samples/sec: 8533.89 - lr: 0.000011 - momentum: 0.000000
2023-10-19 23:59:19,435 epoch 9 - iter 58/292 - loss 0.30511678 - time (sec): 0.96 - samples/sec: 8494.20 - lr: 0.000010 - momentum: 0.000000
2023-10-19 23:59:19,961 epoch 9 - iter 87/292 - loss 0.30860155 - time (sec): 1.49 - samples/sec: 8654.05 - lr: 0.000010 - momentum: 0.000000
2023-10-19 23:59:20,608 epoch 9 - iter 116/292 - loss 0.29350432 - time (sec): 2.13 - samples/sec: 8002.46 - lr: 0.000009 - momentum: 0.000000
2023-10-19 23:59:21,087 epoch 9 - iter 145/292 - loss 0.30324049 - time (sec): 2.61 - samples/sec: 8017.85 - lr: 0.000008 - momentum: 0.000000
2023-10-19 23:59:21,607 epoch 9 - iter 174/292 - loss 0.30397607 - time (sec): 3.13 - samples/sec: 8139.04 - lr: 0.000008 - momentum: 0.000000
2023-10-19 23:59:22,126 epoch 9 - iter 203/292 - loss 0.30607357 - time (sec): 3.65 - samples/sec: 8231.07 - lr: 0.000007 - momentum: 0.000000
2023-10-19 23:59:22,674 epoch 9 - iter 232/292 - loss 0.31419985 - time (sec): 4.20 - samples/sec: 8426.15 - lr: 0.000007 - momentum: 0.000000
2023-10-19 23:59:23,183 epoch 9 - iter 261/292 - loss 0.31077284 - time (sec): 4.71 - samples/sec: 8428.15 - lr: 0.000006 - momentum: 0.000000
2023-10-19 23:59:23,693 epoch 9 - iter 290/292 - loss 0.31676368 - time (sec): 5.22 - samples/sec: 8487.29 - lr: 0.000006 - momentum: 0.000000
2023-10-19 23:59:23,722 ----------------------------------------------------------------------------------------------------
2023-10-19 23:59:23,722 EPOCH 9 done: loss 0.3177 - lr: 0.000006
2023-10-19 23:59:24,359 DEV : loss 0.30219903588294983 - f1-score (micro avg)  0.2727
2023-10-19 23:59:24,364 ----------------------------------------------------------------------------------------------------
2023-10-19 23:59:24,907 epoch 10 - iter 29/292 - loss 0.26410275 - time (sec): 0.54 - samples/sec: 9510.14 - lr: 0.000005 - momentum: 0.000000
2023-10-19 23:59:25,492 epoch 10 - iter 58/292 - loss 0.29335365 - time (sec): 1.13 - samples/sec: 9521.29 - lr: 0.000005 - momentum: 0.000000
2023-10-19 23:59:26,015 epoch 10 - iter 87/292 - loss 0.30417325 - time (sec): 1.65 - samples/sec: 8740.51 - lr: 0.000004 - momentum: 0.000000
2023-10-19 23:59:26,576 epoch 10 - iter 116/292 - loss 0.30083326 - time (sec): 2.21 - samples/sec: 8421.16 - lr: 0.000003 - momentum: 0.000000
2023-10-19 23:59:27,105 epoch 10 - iter 145/292 - loss 0.30274681 - time (sec): 2.74 - samples/sec: 8203.33 - lr: 0.000003 - momentum: 0.000000
2023-10-19 23:59:27,633 epoch 10 - iter 174/292 - loss 0.30387774 - time (sec): 3.27 - samples/sec: 8146.14 - lr: 0.000002 - momentum: 0.000000
2023-10-19 23:59:28,112 epoch 10 - iter 203/292 - loss 0.30407753 - time (sec): 3.75 - samples/sec: 8142.22 - lr: 0.000002 - momentum: 0.000000
2023-10-19 23:59:28,631 epoch 10 - iter 232/292 - loss 0.30962134 - time (sec): 4.27 - samples/sec: 8163.95 - lr: 0.000001 - momentum: 0.000000
2023-10-19 23:59:29,152 epoch 10 - iter 261/292 - loss 0.31082971 - time (sec): 4.79 - samples/sec: 8096.13 - lr: 0.000001 - momentum: 0.000000
2023-10-19 23:59:29,705 epoch 10 - iter 290/292 - loss 0.31152913 - time (sec): 5.34 - samples/sec: 8290.69 - lr: 0.000000 - momentum: 0.000000
2023-10-19 23:59:29,733 ----------------------------------------------------------------------------------------------------
2023-10-19 23:59:29,733 EPOCH 10 done: loss 0.3108 - lr: 0.000000
2023-10-19 23:59:30,397 DEV : loss 0.2987516224384308 - f1-score (micro avg)  0.2961
2023-10-19 23:59:30,401 saving best model
2023-10-19 23:59:30,468 ----------------------------------------------------------------------------------------------------
2023-10-19 23:59:30,468 Loading model from best epoch ...
2023-10-19 23:59:30,544 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-19 23:59:31,512 
Results:
- F-score (micro) 0.377
- F-score (macro) 0.1977
- Accuracy 0.243

By class:
              precision    recall  f1-score   support

         PER     0.4282    0.4368    0.4324       348
         LOC     0.3185    0.4100    0.3585       261
         ORG     0.0000    0.0000    0.0000        52
   HumanProd     0.0000    0.0000    0.0000        22

   micro avg     0.3748    0.3792    0.3770       683
   macro avg     0.1867    0.2117    0.1977       683
weighted avg     0.3399    0.3792    0.3573       683

2023-10-19 23:59:31,512 ----------------------------------------------------------------------------------------------------
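
The averaged rows in the "By class" table can be cross-checked against the per-class rows. The following is a minimal sketch, not part of the training run: it assumes the usual definitions (macro = unweighted mean over classes, weighted = support-weighted mean, micro F1 = harmonic mean of micro precision and micro recall), and the helper names `macro_avg`, `weighted_avg`, and `f1` are hypothetical. All inputs are the rounded values printed in the log, so agreement is only expected to about four decimals.

```python
# Per-class (precision, recall, f1, support), copied from the "By class" table.
by_class = {
    "PER": (0.4282, 0.4368, 0.4324, 348),
    "LOC": (0.3185, 0.4100, 0.3585, 261),
    "ORG": (0.0000, 0.0000, 0.0000, 52),
    "HumanProd": (0.0000, 0.0000, 0.0000, 22),
}

PRECISION, RECALL, F1, SUPPORT = range(4)


def macro_avg(rows, col):
    """Unweighted mean of one column over all classes."""
    return sum(r[col] for r in rows) / len(rows)


def weighted_avg(rows, col):
    """Mean of one column weighted by each class's support."""
    total = sum(r[SUPPORT] for r in rows)
    return sum(r[col] * r[SUPPORT] for r in rows) / total


def f1(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


rows = list(by_class.values())
macro_f1 = macro_avg(rows, F1)
weighted_f1 = weighted_avg(rows, F1)
# Micro precision and recall are taken from the "micro avg" row of the log.
micro_f1 = f1(0.3748, 0.3792)

print(f"macro F1    {macro_f1:.4f}")
print(f"weighted F1 {weighted_f1:.4f}")
print(f"micro F1    {micro_f1:.4f}")
```

Within rounding, these reproduce the logged 0.1977 (macro), 0.3573 (weighted), and 0.3770 (micro). The two classes with zero F1 (ORG and HumanProd) account for the gap between the macro and micro scores.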