2023-10-27 14:30:09,020 ----------------------------------------------------------------------------------------------------
2023-10-27 14:30:09,022 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): XLMRobertaModel(
      (embeddings): XLMRobertaEmbeddings(
        (word_embeddings): Embedding(250003, 1024)
        (position_embeddings): Embedding(514, 1024, padding_idx=1)
        (token_type_embeddings): Embedding(1, 1024)
        (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): XLMRobertaEncoder(
        (layer): ModuleList(
          (0-23): 24 x XLMRobertaLayer(
            (attention): XLMRobertaAttention(
              (self): XLMRobertaSelfAttention(
                (query): Linear(in_features=1024, out_features=1024, bias=True)
                (key): Linear(in_features=1024, out_features=1024, bias=True)
                (value): Linear(in_features=1024, out_features=1024, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): XLMRobertaSelfOutput(
                (dense): Linear(in_features=1024, out_features=1024, bias=True)
                (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): XLMRobertaIntermediate(
              (dense): Linear(in_features=1024, out_features=4096, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): XLMRobertaOutput(
              (dense): Linear(in_features=4096, out_features=1024, bias=True)
              (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): XLMRobertaPooler(
        (dense): Linear(in_features=1024, out_features=1024, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1024, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-27 14:30:09,022 ----------------------------------------------------------------------------------------------------
2023-10-27 14:30:09,022 Corpus: 14903 train + 3449 dev + 3658 test sentences
2023-10-27 14:30:09,022 ----------------------------------------------------------------------------------------------------
2023-10-27 14:30:09,022 Train:  14903 sentences
2023-10-27 14:30:09,022         (train_with_dev=False, train_with_test=False)
2023-10-27 14:30:09,022 ----------------------------------------------------------------------------------------------------
2023-10-27 14:30:09,022 Training Params:
2023-10-27 14:30:09,022  - learning_rate: "5e-06" 
2023-10-27 14:30:09,022  - mini_batch_size: "4"
2023-10-27 14:30:09,022  - max_epochs: "10"
2023-10-27 14:30:09,022  - shuffle: "True"
2023-10-27 14:30:09,022 ----------------------------------------------------------------------------------------------------
2023-10-27 14:30:09,022 Plugins:
2023-10-27 14:30:09,022  - TensorboardLogger
2023-10-27 14:30:09,022  - LinearScheduler | warmup_fraction: '0.1'
2023-10-27 14:30:09,022 ----------------------------------------------------------------------------------------------------
2023-10-27 14:30:09,022 Final evaluation on model from best epoch (best-model.pt)
2023-10-27 14:30:09,023  - metric: "('micro avg', 'f1-score')"
2023-10-27 14:30:09,023 ----------------------------------------------------------------------------------------------------
2023-10-27 14:30:09,023 Computation:
2023-10-27 14:30:09,023  - compute on device: cuda:0
2023-10-27 14:30:09,023  - embedding storage: none
2023-10-27 14:30:09,023 ----------------------------------------------------------------------------------------------------
2023-10-27 14:30:09,023 Model training base path: "flair-clean-conll-lr5e-06-bs4-1"
2023-10-27 14:30:09,023 ----------------------------------------------------------------------------------------------------
2023-10-27 14:30:09,023 ----------------------------------------------------------------------------------------------------
2023-10-27 14:30:09,023 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-27 14:31:00,447 epoch 1 - iter 372/3726 - loss 3.59658936 - time (sec): 51.42 - samples/sec: 413.36 - lr: 0.000000 - momentum: 0.000000
2023-10-27 14:31:51,041 epoch 1 - iter 744/3726 - loss 2.26388470 - time (sec): 102.02 - samples/sec: 413.01 - lr: 0.000001 - momentum: 0.000000
2023-10-27 14:32:42,079 epoch 1 - iter 1116/3726 - loss 1.71280298 - time (sec): 153.05 - samples/sec: 407.18 - lr: 0.000001 - momentum: 0.000000
2023-10-27 14:33:32,757 epoch 1 - iter 1488/3726 - loss 1.39694066 - time (sec): 203.73 - samples/sec: 405.85 - lr: 0.000002 - momentum: 0.000000
2023-10-27 14:34:23,303 epoch 1 - iter 1860/3726 - loss 1.17731325 - time (sec): 254.28 - samples/sec: 406.61 - lr: 0.000002 - momentum: 0.000000
2023-10-27 14:35:13,791 epoch 1 - iter 2232/3726 - loss 1.01805377 - time (sec): 304.77 - samples/sec: 405.77 - lr: 0.000003 - momentum: 0.000000
2023-10-27 14:36:04,555 epoch 1 - iter 2604/3726 - loss 0.89618095 - time (sec): 355.53 - samples/sec: 404.90 - lr: 0.000003 - momentum: 0.000000
2023-10-27 14:36:55,602 epoch 1 - iter 2976/3726 - loss 0.80113668 - time (sec): 406.58 - samples/sec: 403.81 - lr: 0.000004 - momentum: 0.000000
2023-10-27 14:37:46,433 epoch 1 - iter 3348/3726 - loss 0.72868926 - time (sec): 457.41 - samples/sec: 401.79 - lr: 0.000004 - momentum: 0.000000
2023-10-27 14:38:36,663 epoch 1 - iter 3720/3726 - loss 0.66657776 - time (sec): 507.64 - samples/sec: 402.50 - lr: 0.000005 - momentum: 0.000000
2023-10-27 14:38:37,488 ----------------------------------------------------------------------------------------------------
2023-10-27 14:38:37,488 EPOCH 1 done: loss 0.6659 - lr: 0.000005
2023-10-27 14:39:02,791 DEV : loss 0.0933869257569313 - f1-score (micro avg)  0.9262
2023-10-27 14:39:02,853 saving best model
2023-10-27 14:39:05,885 ----------------------------------------------------------------------------------------------------
2023-10-27 14:39:56,330 epoch 2 - iter 372/3726 - loss 0.09283601 - time (sec): 50.44 - samples/sec: 393.59 - lr: 0.000005 - momentum: 0.000000
2023-10-27 14:40:47,261 epoch 2 - iter 744/3726 - loss 0.08756607 - time (sec): 101.37 - samples/sec: 396.30 - lr: 0.000005 - momentum: 0.000000
2023-10-27 14:41:38,098 epoch 2 - iter 1116/3726 - loss 0.08583083 - time (sec): 152.21 - samples/sec: 397.99 - lr: 0.000005 - momentum: 0.000000
2023-10-27 14:42:28,471 epoch 2 - iter 1488/3726 - loss 0.08370769 - time (sec): 202.58 - samples/sec: 400.87 - lr: 0.000005 - momentum: 0.000000
2023-10-27 14:43:19,871 epoch 2 - iter 1860/3726 - loss 0.08404411 - time (sec): 253.98 - samples/sec: 399.49 - lr: 0.000005 - momentum: 0.000000
2023-10-27 14:44:10,168 epoch 2 - iter 2232/3726 - loss 0.08124313 - time (sec): 304.28 - samples/sec: 397.73 - lr: 0.000005 - momentum: 0.000000
2023-10-27 14:45:01,496 epoch 2 - iter 2604/3726 - loss 0.08228873 - time (sec): 355.61 - samples/sec: 397.75 - lr: 0.000005 - momentum: 0.000000
2023-10-27 14:45:52,335 epoch 2 - iter 2976/3726 - loss 0.08159191 - time (sec): 406.45 - samples/sec: 399.36 - lr: 0.000005 - momentum: 0.000000
2023-10-27 14:46:43,142 epoch 2 - iter 3348/3726 - loss 0.08066519 - time (sec): 457.26 - samples/sec: 401.12 - lr: 0.000005 - momentum: 0.000000
2023-10-27 14:47:34,049 epoch 2 - iter 3720/3726 - loss 0.08043171 - time (sec): 508.16 - samples/sec: 402.15 - lr: 0.000004 - momentum: 0.000000
2023-10-27 14:47:34,866 ----------------------------------------------------------------------------------------------------
2023-10-27 14:47:34,866 EPOCH 2 done: loss 0.0805 - lr: 0.000004
2023-10-27 14:48:01,712 DEV : loss 0.06152888387441635 - f1-score (micro avg)  0.9481
2023-10-27 14:48:01,789 saving best model
2023-10-27 14:48:05,294 ----------------------------------------------------------------------------------------------------
2023-10-27 14:48:55,650 epoch 3 - iter 372/3726 - loss 0.06085669 - time (sec): 50.35 - samples/sec: 406.39 - lr: 0.000004 - momentum: 0.000000
2023-10-27 14:49:46,492 epoch 3 - iter 744/3726 - loss 0.05743897 - time (sec): 101.20 - samples/sec: 400.38 - lr: 0.000004 - momentum: 0.000000
2023-10-27 14:50:36,994 epoch 3 - iter 1116/3726 - loss 0.05779768 - time (sec): 151.70 - samples/sec: 398.09 - lr: 0.000004 - momentum: 0.000000
2023-10-27 14:51:26,995 epoch 3 - iter 1488/3726 - loss 0.05644504 - time (sec): 201.70 - samples/sec: 401.46 - lr: 0.000004 - momentum: 0.000000
2023-10-27 14:52:17,584 epoch 3 - iter 1860/3726 - loss 0.05572416 - time (sec): 252.29 - samples/sec: 403.62 - lr: 0.000004 - momentum: 0.000000
2023-10-27 14:53:08,570 epoch 3 - iter 2232/3726 - loss 0.05352112 - time (sec): 303.27 - samples/sec: 403.61 - lr: 0.000004 - momentum: 0.000000
2023-10-27 14:53:59,036 epoch 3 - iter 2604/3726 - loss 0.05461713 - time (sec): 353.74 - samples/sec: 404.61 - lr: 0.000004 - momentum: 0.000000
2023-10-27 14:54:49,877 epoch 3 - iter 2976/3726 - loss 0.05390525 - time (sec): 404.58 - samples/sec: 405.62 - lr: 0.000004 - momentum: 0.000000
2023-10-27 14:55:41,535 epoch 3 - iter 3348/3726 - loss 0.05442763 - time (sec): 456.24 - samples/sec: 402.73 - lr: 0.000004 - momentum: 0.000000
2023-10-27 14:56:32,799 epoch 3 - iter 3720/3726 - loss 0.05458475 - time (sec): 507.50 - samples/sec: 402.71 - lr: 0.000004 - momentum: 0.000000
2023-10-27 14:56:33,616 ----------------------------------------------------------------------------------------------------
2023-10-27 14:56:33,617 EPOCH 3 done: loss 0.0545 - lr: 0.000004
2023-10-27 14:57:00,329 DEV : loss 0.06314758211374283 - f1-score (micro avg)  0.9612
2023-10-27 14:57:00,411 saving best model
2023-10-27 14:57:03,863 ----------------------------------------------------------------------------------------------------
2023-10-27 14:57:54,121 epoch 4 - iter 372/3726 - loss 0.03226694 - time (sec): 50.26 - samples/sec: 411.62 - lr: 0.000004 - momentum: 0.000000
2023-10-27 14:58:44,919 epoch 4 - iter 744/3726 - loss 0.03936813 - time (sec): 101.05 - samples/sec: 405.68 - lr: 0.000004 - momentum: 0.000000
2023-10-27 14:59:35,509 epoch 4 - iter 1116/3726 - loss 0.03970607 - time (sec): 151.64 - samples/sec: 406.37 - lr: 0.000004 - momentum: 0.000000
2023-10-27 15:00:23,759 epoch 4 - iter 1488/3726 - loss 0.04029682 - time (sec): 199.89 - samples/sec: 408.57 - lr: 0.000004 - momentum: 0.000000
2023-10-27 15:01:12,178 epoch 4 - iter 1860/3726 - loss 0.03993881 - time (sec): 248.31 - samples/sec: 412.03 - lr: 0.000004 - momentum: 0.000000
2023-10-27 15:01:59,669 epoch 4 - iter 2232/3726 - loss 0.03825023 - time (sec): 295.80 - samples/sec: 415.38 - lr: 0.000004 - momentum: 0.000000
2023-10-27 15:02:46,739 epoch 4 - iter 2604/3726 - loss 0.03860270 - time (sec): 342.87 - samples/sec: 417.88 - lr: 0.000004 - momentum: 0.000000
2023-10-27 15:03:33,155 epoch 4 - iter 2976/3726 - loss 0.03883180 - time (sec): 389.29 - samples/sec: 420.36 - lr: 0.000003 - momentum: 0.000000
2023-10-27 15:04:18,906 epoch 4 - iter 3348/3726 - loss 0.03843582 - time (sec): 435.04 - samples/sec: 424.15 - lr: 0.000003 - momentum: 0.000000
2023-10-27 15:05:06,069 epoch 4 - iter 3720/3726 - loss 0.03900868 - time (sec): 482.20 - samples/sec: 423.76 - lr: 0.000003 - momentum: 0.000000
2023-10-27 15:05:06,804 ----------------------------------------------------------------------------------------------------
2023-10-27 15:05:06,805 EPOCH 4 done: loss 0.0390 - lr: 0.000003
2023-10-27 15:05:31,408 DEV : loss 0.04959763213992119 - f1-score (micro avg)  0.9678
2023-10-27 15:05:31,469 saving best model
2023-10-27 15:05:34,206 ----------------------------------------------------------------------------------------------------
2023-10-27 15:06:22,854 epoch 5 - iter 372/3726 - loss 0.02632753 - time (sec): 48.65 - samples/sec: 413.87 - lr: 0.000003 - momentum: 0.000000
2023-10-27 15:07:09,362 epoch 5 - iter 744/3726 - loss 0.02847207 - time (sec): 95.15 - samples/sec: 431.29 - lr: 0.000003 - momentum: 0.000000
2023-10-27 15:07:56,415 epoch 5 - iter 1116/3726 - loss 0.02792443 - time (sec): 142.21 - samples/sec: 432.32 - lr: 0.000003 - momentum: 0.000000
2023-10-27 15:08:43,809 epoch 5 - iter 1488/3726 - loss 0.02808337 - time (sec): 189.60 - samples/sec: 432.27 - lr: 0.000003 - momentum: 0.000000
2023-10-27 15:09:32,174 epoch 5 - iter 1860/3726 - loss 0.02852601 - time (sec): 237.97 - samples/sec: 426.16 - lr: 0.000003 - momentum: 0.000000
2023-10-27 15:10:19,545 epoch 5 - iter 2232/3726 - loss 0.02859382 - time (sec): 285.34 - samples/sec: 427.84 - lr: 0.000003 - momentum: 0.000000
2023-10-27 15:11:08,221 epoch 5 - iter 2604/3726 - loss 0.02812313 - time (sec): 334.01 - samples/sec: 426.57 - lr: 0.000003 - momentum: 0.000000
2023-10-27 15:11:55,520 epoch 5 - iter 2976/3726 - loss 0.02819121 - time (sec): 381.31 - samples/sec: 427.48 - lr: 0.000003 - momentum: 0.000000
2023-10-27 15:12:43,398 epoch 5 - iter 3348/3726 - loss 0.02832524 - time (sec): 429.19 - samples/sec: 426.98 - lr: 0.000003 - momentum: 0.000000
2023-10-27 15:13:30,169 epoch 5 - iter 3720/3726 - loss 0.02864353 - time (sec): 475.96 - samples/sec: 429.13 - lr: 0.000003 - momentum: 0.000000
2023-10-27 15:13:30,908 ----------------------------------------------------------------------------------------------------
2023-10-27 15:13:30,909 EPOCH 5 done: loss 0.0286 - lr: 0.000003
2023-10-27 15:13:55,872 DEV : loss 0.051205169409513474 - f1-score (micro avg)  0.969
2023-10-27 15:13:55,926 saving best model
2023-10-27 15:13:58,971 ----------------------------------------------------------------------------------------------------
2023-10-27 15:14:46,203 epoch 6 - iter 372/3726 - loss 0.02831335 - time (sec): 47.23 - samples/sec: 428.83 - lr: 0.000003 - momentum: 0.000000
2023-10-27 15:15:35,236 epoch 6 - iter 744/3726 - loss 0.02273400 - time (sec): 96.26 - samples/sec: 426.78 - lr: 0.000003 - momentum: 0.000000
2023-10-27 15:16:23,193 epoch 6 - iter 1116/3726 - loss 0.02109002 - time (sec): 144.22 - samples/sec: 423.66 - lr: 0.000003 - momentum: 0.000000
2023-10-27 15:17:10,302 epoch 6 - iter 1488/3726 - loss 0.02166135 - time (sec): 191.33 - samples/sec: 429.37 - lr: 0.000003 - momentum: 0.000000
2023-10-27 15:17:58,889 epoch 6 - iter 1860/3726 - loss 0.02127214 - time (sec): 239.91 - samples/sec: 429.35 - lr: 0.000003 - momentum: 0.000000
2023-10-27 15:18:45,994 epoch 6 - iter 2232/3726 - loss 0.02116817 - time (sec): 287.02 - samples/sec: 428.54 - lr: 0.000002 - momentum: 0.000000
2023-10-27 15:19:34,532 epoch 6 - iter 2604/3726 - loss 0.02139499 - time (sec): 335.56 - samples/sec: 427.19 - lr: 0.000002 - momentum: 0.000000
2023-10-27 15:20:22,599 epoch 6 - iter 2976/3726 - loss 0.02130084 - time (sec): 383.62 - samples/sec: 425.85 - lr: 0.000002 - momentum: 0.000000
2023-10-27 15:21:10,024 epoch 6 - iter 3348/3726 - loss 0.02121915 - time (sec): 431.05 - samples/sec: 425.94 - lr: 0.000002 - momentum: 0.000000
2023-10-27 15:21:57,177 epoch 6 - iter 3720/3726 - loss 0.02100853 - time (sec): 478.20 - samples/sec: 427.07 - lr: 0.000002 - momentum: 0.000000
2023-10-27 15:21:57,964 ----------------------------------------------------------------------------------------------------
2023-10-27 15:21:57,964 EPOCH 6 done: loss 0.0210 - lr: 0.000002
2023-10-27 15:22:23,206 DEV : loss 0.05652967095375061 - f1-score (micro avg)  0.9686
2023-10-27 15:22:23,264 ----------------------------------------------------------------------------------------------------
2023-10-27 15:23:11,169 epoch 7 - iter 372/3726 - loss 0.01511217 - time (sec): 47.90 - samples/sec: 438.07 - lr: 0.000002 - momentum: 0.000000
2023-10-27 15:23:58,577 epoch 7 - iter 744/3726 - loss 0.01678792 - time (sec): 95.31 - samples/sec: 441.01 - lr: 0.000002 - momentum: 0.000000
2023-10-27 15:24:46,315 epoch 7 - iter 1116/3726 - loss 0.01767805 - time (sec): 143.05 - samples/sec: 436.03 - lr: 0.000002 - momentum: 0.000000
2023-10-27 15:25:34,119 epoch 7 - iter 1488/3726 - loss 0.01634669 - time (sec): 190.85 - samples/sec: 434.79 - lr: 0.000002 - momentum: 0.000000
2023-10-27 15:26:21,351 epoch 7 - iter 1860/3726 - loss 0.02015737 - time (sec): 238.08 - samples/sec: 437.30 - lr: 0.000002 - momentum: 0.000000
2023-10-27 15:27:08,458 epoch 7 - iter 2232/3726 - loss 0.01895408 - time (sec): 285.19 - samples/sec: 435.98 - lr: 0.000002 - momentum: 0.000000
2023-10-27 15:27:56,799 epoch 7 - iter 2604/3726 - loss 0.01784524 - time (sec): 333.53 - samples/sec: 433.67 - lr: 0.000002 - momentum: 0.000000
2023-10-27 15:28:45,164 epoch 7 - iter 2976/3726 - loss 0.01736500 - time (sec): 381.90 - samples/sec: 431.20 - lr: 0.000002 - momentum: 0.000000
2023-10-27 15:29:33,002 epoch 7 - iter 3348/3726 - loss 0.01709685 - time (sec): 429.74 - samples/sec: 429.19 - lr: 0.000002 - momentum: 0.000000
2023-10-27 15:30:19,962 epoch 7 - iter 3720/3726 - loss 0.01680664 - time (sec): 476.70 - samples/sec: 428.73 - lr: 0.000002 - momentum: 0.000000
2023-10-27 15:30:20,674 ----------------------------------------------------------------------------------------------------
2023-10-27 15:30:20,674 EPOCH 7 done: loss 0.0168 - lr: 0.000002
2023-10-27 15:30:43,202 DEV : loss 0.052628789097070694 - f1-score (micro avg)  0.9722
2023-10-27 15:30:43,260 saving best model
2023-10-27 15:30:46,516 ----------------------------------------------------------------------------------------------------
2023-10-27 15:31:32,754 epoch 8 - iter 372/3726 - loss 0.01025399 - time (sec): 46.24 - samples/sec: 450.08 - lr: 0.000002 - momentum: 0.000000
2023-10-27 15:32:18,558 epoch 8 - iter 744/3726 - loss 0.01126091 - time (sec): 92.04 - samples/sec: 447.34 - lr: 0.000002 - momentum: 0.000000
2023-10-27 15:33:03,647 epoch 8 - iter 1116/3726 - loss 0.01172293 - time (sec): 137.13 - samples/sec: 445.99 - lr: 0.000002 - momentum: 0.000000
2023-10-27 15:33:49,381 epoch 8 - iter 1488/3726 - loss 0.01219846 - time (sec): 182.86 - samples/sec: 440.90 - lr: 0.000001 - momentum: 0.000000
2023-10-27 15:34:35,063 epoch 8 - iter 1860/3726 - loss 0.01190833 - time (sec): 228.55 - samples/sec: 443.89 - lr: 0.000001 - momentum: 0.000000
2023-10-27 15:35:20,762 epoch 8 - iter 2232/3726 - loss 0.01209883 - time (sec): 274.24 - samples/sec: 442.06 - lr: 0.000001 - momentum: 0.000000
2023-10-27 15:36:06,906 epoch 8 - iter 2604/3726 - loss 0.01227890 - time (sec): 320.39 - samples/sec: 442.88 - lr: 0.000001 - momentum: 0.000000
2023-10-27 15:36:52,771 epoch 8 - iter 2976/3726 - loss 0.01220756 - time (sec): 366.25 - samples/sec: 444.61 - lr: 0.000001 - momentum: 0.000000
2023-10-27 15:37:38,958 epoch 8 - iter 3348/3726 - loss 0.01181082 - time (sec): 412.44 - samples/sec: 446.10 - lr: 0.000001 - momentum: 0.000000
2023-10-27 15:38:24,879 epoch 8 - iter 3720/3726 - loss 0.01141249 - time (sec): 458.36 - samples/sec: 445.64 - lr: 0.000001 - momentum: 0.000000
2023-10-27 15:38:25,618 ----------------------------------------------------------------------------------------------------
2023-10-27 15:38:25,619 EPOCH 8 done: loss 0.0114 - lr: 0.000001
2023-10-27 15:38:49,028 DEV : loss 0.053019002079963684 - f1-score (micro avg)  0.9716
2023-10-27 15:38:49,088 ----------------------------------------------------------------------------------------------------
2023-10-27 15:39:35,150 epoch 9 - iter 372/3726 - loss 0.00801185 - time (sec): 46.06 - samples/sec: 438.23 - lr: 0.000001 - momentum: 0.000000
2023-10-27 15:40:20,869 epoch 9 - iter 744/3726 - loss 0.00671566 - time (sec): 91.78 - samples/sec: 435.43 - lr: 0.000001 - momentum: 0.000000
2023-10-27 15:41:07,081 epoch 9 - iter 1116/3726 - loss 0.00667937 - time (sec): 137.99 - samples/sec: 438.20 - lr: 0.000001 - momentum: 0.000000
2023-10-27 15:41:53,955 epoch 9 - iter 1488/3726 - loss 0.00681524 - time (sec): 184.87 - samples/sec: 439.41 - lr: 0.000001 - momentum: 0.000000
2023-10-27 15:42:41,635 epoch 9 - iter 1860/3726 - loss 0.00721784 - time (sec): 232.55 - samples/sec: 437.37 - lr: 0.000001 - momentum: 0.000000
2023-10-27 15:43:28,277 epoch 9 - iter 2232/3726 - loss 0.00697985 - time (sec): 279.19 - samples/sec: 440.81 - lr: 0.000001 - momentum: 0.000000
2023-10-27 15:44:14,754 epoch 9 - iter 2604/3726 - loss 0.00793074 - time (sec): 325.66 - samples/sec: 440.67 - lr: 0.000001 - momentum: 0.000000
2023-10-27 15:45:01,296 epoch 9 - iter 2976/3726 - loss 0.00864270 - time (sec): 372.21 - samples/sec: 439.58 - lr: 0.000001 - momentum: 0.000000
2023-10-27 15:45:47,999 epoch 9 - iter 3348/3726 - loss 0.00830012 - time (sec): 418.91 - samples/sec: 438.33 - lr: 0.000001 - momentum: 0.000000
2023-10-27 15:46:34,892 epoch 9 - iter 3720/3726 - loss 0.00824128 - time (sec): 465.80 - samples/sec: 438.38 - lr: 0.000001 - momentum: 0.000000
2023-10-27 15:46:35,624 ----------------------------------------------------------------------------------------------------
2023-10-27 15:46:35,624 EPOCH 9 done: loss 0.0083 - lr: 0.000001
2023-10-27 15:46:59,614 DEV : loss 0.05303654819726944 - f1-score (micro avg)  0.9734
2023-10-27 15:46:59,675 saving best model
2023-10-27 15:47:02,455 ----------------------------------------------------------------------------------------------------
2023-10-27 15:47:49,146 epoch 10 - iter 372/3726 - loss 0.00384825 - time (sec): 46.69 - samples/sec: 434.52 - lr: 0.000001 - momentum: 0.000000
2023-10-27 15:48:35,604 epoch 10 - iter 744/3726 - loss 0.00345501 - time (sec): 93.15 - samples/sec: 434.11 - lr: 0.000000 - momentum: 0.000000
2023-10-27 15:49:22,080 epoch 10 - iter 1116/3726 - loss 0.00418854 - time (sec): 139.62 - samples/sec: 435.29 - lr: 0.000000 - momentum: 0.000000
2023-10-27 15:50:08,505 epoch 10 - iter 1488/3726 - loss 0.00522497 - time (sec): 186.05 - samples/sec: 430.51 - lr: 0.000000 - momentum: 0.000000
2023-10-27 15:50:54,906 epoch 10 - iter 1860/3726 - loss 0.00509842 - time (sec): 232.45 - samples/sec: 435.12 - lr: 0.000000 - momentum: 0.000000
2023-10-27 15:51:41,148 epoch 10 - iter 2232/3726 - loss 0.00554209 - time (sec): 278.69 - samples/sec: 437.63 - lr: 0.000000 - momentum: 0.000000
2023-10-27 15:52:26,664 epoch 10 - iter 2604/3726 - loss 0.00584885 - time (sec): 324.21 - samples/sec: 441.07 - lr: 0.000000 - momentum: 0.000000
2023-10-27 15:53:13,641 epoch 10 - iter 2976/3726 - loss 0.00599832 - time (sec): 371.18 - samples/sec: 440.13 - lr: 0.000000 - momentum: 0.000000
2023-10-27 15:53:59,425 epoch 10 - iter 3348/3726 - loss 0.00600259 - time (sec): 416.97 - samples/sec: 442.42 - lr: 0.000000 - momentum: 0.000000
2023-10-27 15:54:45,541 epoch 10 - iter 3720/3726 - loss 0.00576228 - time (sec): 463.08 - samples/sec: 441.09 - lr: 0.000000 - momentum: 0.000000
2023-10-27 15:54:46,272 ----------------------------------------------------------------------------------------------------
2023-10-27 15:54:46,273 EPOCH 10 done: loss 0.0058 - lr: 0.000000
2023-10-27 15:55:09,835 DEV : loss 0.05434899777173996 - f1-score (micro avg)  0.9727
2023-10-27 15:55:12,069 ----------------------------------------------------------------------------------------------------
2023-10-27 15:55:12,071 Loading model from best epoch ...
2023-10-27 15:55:20,258 SequenceTagger predicts: Dictionary with 17 tags: O, S-ORG, B-ORG, E-ORG, I-ORG, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-MISC, B-MISC, E-MISC, I-MISC
2023-10-27 15:55:43,551 
Results:
- F-score (micro) 0.97
- F-score (macro) 0.9642
- Accuracy 0.9559

By class:
              precision    recall  f1-score   support

         ORG     0.9700    0.9665    0.9682      1909
         PER     0.9944    0.9956    0.9950      1591
         LOC     0.9684    0.9745    0.9714      1413
        MISC     0.9245    0.9200    0.9222       812

   micro avg     0.9700    0.9700    0.9700      5725
   macro avg     0.9643    0.9641    0.9642      5725
weighted avg     0.9699    0.9700    0.9699      5725

2023-10-27 15:55:43,551 ----------------------------------------------------------------------------------------------------