2023-10-15 02:31:46,751 ----------------------------------------------------------------------------------------------------
2023-10-15 02:31:46,752 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=21, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-15 02:31:46,752 ----------------------------------------------------------------------------------------------------
2023-10-15 02:31:46,752 MultiCorpus: 3575 train + 1235 dev + 1266 test sentences
 - NER_HIPE_2022 Corpus: 3575 train + 1235 dev + 1266 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/hipe2020/de/with_doc_seperator
2023-10-15 02:31:46,752 ----------------------------------------------------------------------------------------------------
2023-10-15 02:31:46,752 Train:  3575 sentences
2023-10-15 02:31:46,752         (train_with_dev=False, train_with_test=False)
2023-10-15 02:31:46,752 ----------------------------------------------------------------------------------------------------
2023-10-15 02:31:46,752 Training Params:
2023-10-15 02:31:46,752  - learning_rate: "0.00016" 
2023-10-15 02:31:46,752  - mini_batch_size: "4"
2023-10-15 02:31:46,752  - max_epochs: "10"
2023-10-15 02:31:46,752  - shuffle: "True"
2023-10-15 02:31:46,753 ----------------------------------------------------------------------------------------------------
2023-10-15 02:31:46,753 Plugins:
2023-10-15 02:31:46,753  - TensorboardLogger
2023-10-15 02:31:46,753  - LinearScheduler | warmup_fraction: '0.1'
2023-10-15 02:31:46,753 ----------------------------------------------------------------------------------------------------
2023-10-15 02:31:46,753 Final evaluation on model from best epoch (best-model.pt)
2023-10-15 02:31:46,753  - metric: "('micro avg', 'f1-score')"
2023-10-15 02:31:46,753 ----------------------------------------------------------------------------------------------------
2023-10-15 02:31:46,753 Computation:
2023-10-15 02:31:46,753  - compute on device: cuda:0
2023-10-15 02:31:46,753  - embedding storage: none
2023-10-15 02:31:46,753 ----------------------------------------------------------------------------------------------------
2023-10-15 02:31:46,753 Model training base path: "hmbench-hipe2020/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-4"
2023-10-15 02:31:46,753 ----------------------------------------------------------------------------------------------------
2023-10-15 02:31:46,753 ----------------------------------------------------------------------------------------------------
2023-10-15 02:31:46,753 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-15 02:32:03,162 epoch 1 - iter 89/894 - loss 3.02344183 - time (sec): 16.41 - samples/sec: 503.29 - lr: 0.000016 - momentum: 0.000000
2023-10-15 02:32:22,172 epoch 1 - iter 178/894 - loss 2.95643439 - time (sec): 35.42 - samples/sec: 517.95 - lr: 0.000032 - momentum: 0.000000
2023-10-15 02:32:39,335 epoch 1 - iter 267/894 - loss 2.78708734 - time (sec): 52.58 - samples/sec: 525.09 - lr: 0.000048 - momentum: 0.000000
2023-10-15 02:32:55,879 epoch 1 - iter 356/894 - loss 2.58160891 - time (sec): 69.13 - samples/sec: 521.14 - lr: 0.000064 - momentum: 0.000000
2023-10-15 02:33:12,285 epoch 1 - iter 445/894 - loss 2.35433260 - time (sec): 85.53 - samples/sec: 520.19 - lr: 0.000079 - momentum: 0.000000
2023-10-15 02:33:29,031 epoch 1 - iter 534/894 - loss 2.10818662 - time (sec): 102.28 - samples/sec: 520.02 - lr: 0.000095 - momentum: 0.000000
2023-10-15 02:33:45,298 epoch 1 - iter 623/894 - loss 1.90435758 - time (sec): 118.54 - samples/sec: 518.72 - lr: 0.000111 - momentum: 0.000000
2023-10-15 02:34:01,309 epoch 1 - iter 712/894 - loss 1.75522635 - time (sec): 134.55 - samples/sec: 514.33 - lr: 0.000127 - momentum: 0.000000
2023-10-15 02:34:17,945 epoch 1 - iter 801/894 - loss 1.61964852 - time (sec): 151.19 - samples/sec: 514.93 - lr: 0.000143 - momentum: 0.000000
2023-10-15 02:34:34,731 epoch 1 - iter 890/894 - loss 1.50331592 - time (sec): 167.98 - samples/sec: 513.77 - lr: 0.000159 - momentum: 0.000000
2023-10-15 02:34:35,368 ----------------------------------------------------------------------------------------------------
2023-10-15 02:34:35,368 EPOCH 1 done: loss 1.5004 - lr: 0.000159
2023-10-15 02:34:58,797 DEV : loss 0.3780643343925476 - f1-score (micro avg)  0.0
2023-10-15 02:34:58,823 ----------------------------------------------------------------------------------------------------
2023-10-15 02:35:15,413 epoch 2 - iter 89/894 - loss 0.35647784 - time (sec): 16.59 - samples/sec: 506.48 - lr: 0.000158 - momentum: 0.000000
2023-10-15 02:35:32,068 epoch 2 - iter 178/894 - loss 0.34495992 - time (sec): 33.24 - samples/sec: 488.93 - lr: 0.000156 - momentum: 0.000000
2023-10-15 02:35:48,763 epoch 2 - iter 267/894 - loss 0.32633621 - time (sec): 49.94 - samples/sec: 498.77 - lr: 0.000155 - momentum: 0.000000
2023-10-15 02:36:05,361 epoch 2 - iter 356/894 - loss 0.30999671 - time (sec): 66.54 - samples/sec: 501.31 - lr: 0.000153 - momentum: 0.000000
2023-10-15 02:36:22,230 epoch 2 - iter 445/894 - loss 0.29402029 - time (sec): 83.41 - samples/sec: 504.29 - lr: 0.000151 - momentum: 0.000000
2023-10-15 02:36:38,633 epoch 2 - iter 534/894 - loss 0.29260383 - time (sec): 99.81 - samples/sec: 504.68 - lr: 0.000149 - momentum: 0.000000
2023-10-15 02:36:55,012 epoch 2 - iter 623/894 - loss 0.28062458 - time (sec): 116.19 - samples/sec: 505.86 - lr: 0.000148 - momentum: 0.000000
2023-10-15 02:37:11,215 epoch 2 - iter 712/894 - loss 0.27449208 - time (sec): 132.39 - samples/sec: 505.71 - lr: 0.000146 - momentum: 0.000000
2023-10-15 02:37:28,432 epoch 2 - iter 801/894 - loss 0.26359435 - time (sec): 149.61 - samples/sec: 509.57 - lr: 0.000144 - momentum: 0.000000
2023-10-15 02:37:47,059 epoch 2 - iter 890/894 - loss 0.25229683 - time (sec): 168.23 - samples/sec: 511.90 - lr: 0.000142 - momentum: 0.000000
2023-10-15 02:37:47,825 ----------------------------------------------------------------------------------------------------
2023-10-15 02:37:47,825 EPOCH 2 done: loss 0.2512 - lr: 0.000142
2023-10-15 02:38:12,900 DEV : loss 0.19516690075397491 - f1-score (micro avg)  0.6578
2023-10-15 02:38:12,926 saving best model
2023-10-15 02:38:13,600 ----------------------------------------------------------------------------------------------------
2023-10-15 02:38:30,620 epoch 3 - iter 89/894 - loss 0.19012779 - time (sec): 17.02 - samples/sec: 538.74 - lr: 0.000140 - momentum: 0.000000
2023-10-15 02:38:47,328 epoch 3 - iter 178/894 - loss 0.15803457 - time (sec): 33.73 - samples/sec: 540.12 - lr: 0.000139 - momentum: 0.000000
2023-10-15 02:39:04,061 epoch 3 - iter 267/894 - loss 0.14197851 - time (sec): 50.46 - samples/sec: 531.87 - lr: 0.000137 - momentum: 0.000000
2023-10-15 02:39:20,354 epoch 3 - iter 356/894 - loss 0.14302842 - time (sec): 66.75 - samples/sec: 525.00 - lr: 0.000135 - momentum: 0.000000
2023-10-15 02:39:36,895 epoch 3 - iter 445/894 - loss 0.13867980 - time (sec): 83.29 - samples/sec: 524.43 - lr: 0.000133 - momentum: 0.000000
2023-10-15 02:39:52,835 epoch 3 - iter 534/894 - loss 0.13292117 - time (sec): 99.23 - samples/sec: 516.98 - lr: 0.000132 - momentum: 0.000000
2023-10-15 02:40:09,133 epoch 3 - iter 623/894 - loss 0.13020535 - time (sec): 115.53 - samples/sec: 513.60 - lr: 0.000130 - momentum: 0.000000
2023-10-15 02:40:25,404 epoch 3 - iter 712/894 - loss 0.12587355 - time (sec): 131.80 - samples/sec: 512.31 - lr: 0.000128 - momentum: 0.000000
2023-10-15 02:40:43,932 epoch 3 - iter 801/894 - loss 0.12594823 - time (sec): 150.33 - samples/sec: 513.37 - lr: 0.000126 - momentum: 0.000000
2023-10-15 02:41:01,043 epoch 3 - iter 890/894 - loss 0.12151089 - time (sec): 167.44 - samples/sec: 514.04 - lr: 0.000125 - momentum: 0.000000
2023-10-15 02:41:01,800 ----------------------------------------------------------------------------------------------------
2023-10-15 02:41:01,800 EPOCH 3 done: loss 0.1218 - lr: 0.000125
2023-10-15 02:41:27,066 DEV : loss 0.15591633319854736 - f1-score (micro avg)  0.7298
2023-10-15 02:41:27,092 saving best model
2023-10-15 02:41:27,983 ----------------------------------------------------------------------------------------------------
2023-10-15 02:41:44,324 epoch 4 - iter 89/894 - loss 0.07103724 - time (sec): 16.34 - samples/sec: 506.27 - lr: 0.000123 - momentum: 0.000000
2023-10-15 02:42:01,963 epoch 4 - iter 178/894 - loss 0.06257623 - time (sec): 33.98 - samples/sec: 517.81 - lr: 0.000121 - momentum: 0.000000
2023-10-15 02:42:18,072 epoch 4 - iter 267/894 - loss 0.06910373 - time (sec): 50.09 - samples/sec: 511.23 - lr: 0.000119 - momentum: 0.000000
2023-10-15 02:42:34,820 epoch 4 - iter 356/894 - loss 0.07107525 - time (sec): 66.83 - samples/sec: 520.12 - lr: 0.000117 - momentum: 0.000000
2023-10-15 02:42:51,325 epoch 4 - iter 445/894 - loss 0.06802427 - time (sec): 83.34 - samples/sec: 522.37 - lr: 0.000116 - momentum: 0.000000
2023-10-15 02:43:07,291 epoch 4 - iter 534/894 - loss 0.06713645 - time (sec): 99.30 - samples/sec: 519.31 - lr: 0.000114 - momentum: 0.000000
2023-10-15 02:43:23,747 epoch 4 - iter 623/894 - loss 0.06876201 - time (sec): 115.76 - samples/sec: 521.62 - lr: 0.000112 - momentum: 0.000000
2023-10-15 02:43:41,754 epoch 4 - iter 712/894 - loss 0.07019890 - time (sec): 133.77 - samples/sec: 519.61 - lr: 0.000110 - momentum: 0.000000
2023-10-15 02:43:57,924 epoch 4 - iter 801/894 - loss 0.06984123 - time (sec): 149.94 - samples/sec: 518.42 - lr: 0.000109 - momentum: 0.000000
2023-10-15 02:44:14,341 epoch 4 - iter 890/894 - loss 0.06910896 - time (sec): 166.35 - samples/sec: 518.79 - lr: 0.000107 - momentum: 0.000000
2023-10-15 02:44:14,960 ----------------------------------------------------------------------------------------------------
2023-10-15 02:44:14,961 EPOCH 4 done: loss 0.0690 - lr: 0.000107
2023-10-15 02:44:39,949 DEV : loss 0.1609293520450592 - f1-score (micro avg)  0.7516
2023-10-15 02:44:39,975 saving best model
2023-10-15 02:44:40,880 ----------------------------------------------------------------------------------------------------
2023-10-15 02:44:57,848 epoch 5 - iter 89/894 - loss 0.03775608 - time (sec): 16.97 - samples/sec: 531.50 - lr: 0.000105 - momentum: 0.000000
2023-10-15 02:45:14,380 epoch 5 - iter 178/894 - loss 0.03532971 - time (sec): 33.50 - samples/sec: 524.28 - lr: 0.000103 - momentum: 0.000000
2023-10-15 02:45:31,264 epoch 5 - iter 267/894 - loss 0.03513730 - time (sec): 50.38 - samples/sec: 530.22 - lr: 0.000101 - momentum: 0.000000
2023-10-15 02:45:47,982 epoch 5 - iter 356/894 - loss 0.03993442 - time (sec): 67.10 - samples/sec: 528.81 - lr: 0.000100 - momentum: 0.000000
2023-10-15 02:46:06,528 epoch 5 - iter 445/894 - loss 0.04620953 - time (sec): 85.65 - samples/sec: 529.50 - lr: 0.000098 - momentum: 0.000000
2023-10-15 02:46:22,864 epoch 5 - iter 534/894 - loss 0.04713336 - time (sec): 101.98 - samples/sec: 527.63 - lr: 0.000096 - momentum: 0.000000
2023-10-15 02:46:38,857 epoch 5 - iter 623/894 - loss 0.04525939 - time (sec): 117.97 - samples/sec: 521.37 - lr: 0.000094 - momentum: 0.000000
2023-10-15 02:46:55,181 epoch 5 - iter 712/894 - loss 0.04267360 - time (sec): 134.30 - samples/sec: 519.69 - lr: 0.000093 - momentum: 0.000000
2023-10-15 02:47:11,200 epoch 5 - iter 801/894 - loss 0.04096140 - time (sec): 150.32 - samples/sec: 516.27 - lr: 0.000091 - momentum: 0.000000
2023-10-15 02:47:27,819 epoch 5 - iter 890/894 - loss 0.04206314 - time (sec): 166.94 - samples/sec: 516.61 - lr: 0.000089 - momentum: 0.000000
2023-10-15 02:47:28,490 ----------------------------------------------------------------------------------------------------
2023-10-15 02:47:28,490 EPOCH 5 done: loss 0.0419 - lr: 0.000089
2023-10-15 02:47:53,602 DEV : loss 0.1873023360967636 - f1-score (micro avg)  0.7704
2023-10-15 02:47:53,628 saving best model
2023-10-15 02:47:54,607 ----------------------------------------------------------------------------------------------------
2023-10-15 02:48:12,579 epoch 6 - iter 89/894 - loss 0.04090361 - time (sec): 17.97 - samples/sec: 505.20 - lr: 0.000087 - momentum: 0.000000
2023-10-15 02:48:30,268 epoch 6 - iter 178/894 - loss 0.03139198 - time (sec): 35.66 - samples/sec: 515.84 - lr: 0.000085 - momentum: 0.000000
2023-10-15 02:48:46,856 epoch 6 - iter 267/894 - loss 0.03064485 - time (sec): 52.25 - samples/sec: 522.23 - lr: 0.000084 - momentum: 0.000000
2023-10-15 02:49:03,120 epoch 6 - iter 356/894 - loss 0.02852187 - time (sec): 68.51 - samples/sec: 516.68 - lr: 0.000082 - momentum: 0.000000
2023-10-15 02:49:20,293 epoch 6 - iter 445/894 - loss 0.02788748 - time (sec): 85.68 - samples/sec: 508.91 - lr: 0.000080 - momentum: 0.000000
2023-10-15 02:49:37,772 epoch 6 - iter 534/894 - loss 0.02794351 - time (sec): 103.16 - samples/sec: 506.10 - lr: 0.000078 - momentum: 0.000000
2023-10-15 02:49:55,339 epoch 6 - iter 623/894 - loss 0.02802367 - time (sec): 120.73 - samples/sec: 502.99 - lr: 0.000077 - momentum: 0.000000
2023-10-15 02:50:12,915 epoch 6 - iter 712/894 - loss 0.02792162 - time (sec): 138.31 - samples/sec: 500.00 - lr: 0.000075 - momentum: 0.000000
2023-10-15 02:50:29,545 epoch 6 - iter 801/894 - loss 0.02919230 - time (sec): 154.94 - samples/sec: 499.81 - lr: 0.000073 - momentum: 0.000000
2023-10-15 02:50:46,019 epoch 6 - iter 890/894 - loss 0.02781489 - time (sec): 171.41 - samples/sec: 503.33 - lr: 0.000071 - momentum: 0.000000
2023-10-15 02:50:46,685 ----------------------------------------------------------------------------------------------------
2023-10-15 02:50:46,685 EPOCH 6 done: loss 0.0278 - lr: 0.000071
2023-10-15 02:51:13,116 DEV : loss 0.21434210240840912 - f1-score (micro avg)  0.7671
2023-10-15 02:51:13,144 ----------------------------------------------------------------------------------------------------
2023-10-15 02:51:32,358 epoch 7 - iter 89/894 - loss 0.03055746 - time (sec): 19.21 - samples/sec: 506.79 - lr: 0.000069 - momentum: 0.000000
2023-10-15 02:51:49,459 epoch 7 - iter 178/894 - loss 0.03178652 - time (sec): 36.31 - samples/sec: 508.13 - lr: 0.000068 - momentum: 0.000000
2023-10-15 02:52:05,517 epoch 7 - iter 267/894 - loss 0.02947788 - time (sec): 52.37 - samples/sec: 496.56 - lr: 0.000066 - momentum: 0.000000
2023-10-15 02:52:21,830 epoch 7 - iter 356/894 - loss 0.02621194 - time (sec): 68.68 - samples/sec: 497.65 - lr: 0.000064 - momentum: 0.000000
2023-10-15 02:52:38,189 epoch 7 - iter 445/894 - loss 0.02435064 - time (sec): 85.04 - samples/sec: 498.38 - lr: 0.000062 - momentum: 0.000000
2023-10-15 02:52:54,896 epoch 7 - iter 534/894 - loss 0.02215860 - time (sec): 101.75 - samples/sec: 503.67 - lr: 0.000061 - momentum: 0.000000
2023-10-15 02:53:11,073 epoch 7 - iter 623/894 - loss 0.02099170 - time (sec): 117.93 - samples/sec: 501.57 - lr: 0.000059 - momentum: 0.000000
2023-10-15 02:53:27,755 epoch 7 - iter 712/894 - loss 0.01995778 - time (sec): 134.61 - samples/sec: 504.04 - lr: 0.000057 - momentum: 0.000000
2023-10-15 02:53:44,630 epoch 7 - iter 801/894 - loss 0.01902734 - time (sec): 151.48 - samples/sec: 507.27 - lr: 0.000055 - momentum: 0.000000
2023-10-15 02:54:02,012 epoch 7 - iter 890/894 - loss 0.01833316 - time (sec): 168.87 - samples/sec: 510.97 - lr: 0.000053 - momentum: 0.000000
2023-10-15 02:54:02,650 ----------------------------------------------------------------------------------------------------
2023-10-15 02:54:02,651 EPOCH 7 done: loss 0.0185 - lr: 0.000053
2023-10-15 02:54:28,844 DEV : loss 0.21627573668956757 - f1-score (micro avg)  0.7695
2023-10-15 02:54:28,870 ----------------------------------------------------------------------------------------------------
2023-10-15 02:54:45,753 epoch 8 - iter 89/894 - loss 0.01084712 - time (sec): 16.88 - samples/sec: 505.28 - lr: 0.000052 - momentum: 0.000000
2023-10-15 02:55:03,142 epoch 8 - iter 178/894 - loss 0.01079641 - time (sec): 34.27 - samples/sec: 513.12 - lr: 0.000050 - momentum: 0.000000
2023-10-15 02:55:20,027 epoch 8 - iter 267/894 - loss 0.00894694 - time (sec): 51.16 - samples/sec: 521.76 - lr: 0.000048 - momentum: 0.000000
2023-10-15 02:55:36,682 epoch 8 - iter 356/894 - loss 0.00974459 - time (sec): 67.81 - samples/sec: 522.44 - lr: 0.000046 - momentum: 0.000000
2023-10-15 02:55:53,263 epoch 8 - iter 445/894 - loss 0.01110363 - time (sec): 84.39 - samples/sec: 522.29 - lr: 0.000045 - momentum: 0.000000
2023-10-15 02:56:09,389 epoch 8 - iter 534/894 - loss 0.01181829 - time (sec): 100.52 - samples/sec: 517.26 - lr: 0.000043 - momentum: 0.000000
2023-10-15 02:56:26,017 epoch 8 - iter 623/894 - loss 0.01176535 - time (sec): 117.15 - samples/sec: 516.61 - lr: 0.000041 - momentum: 0.000000
2023-10-15 02:56:42,522 epoch 8 - iter 712/894 - loss 0.01085512 - time (sec): 133.65 - samples/sec: 516.06 - lr: 0.000039 - momentum: 0.000000
2023-10-15 02:57:01,040 epoch 8 - iter 801/894 - loss 0.01135469 - time (sec): 152.17 - samples/sec: 515.39 - lr: 0.000038 - momentum: 0.000000
2023-10-15 02:57:17,154 epoch 8 - iter 890/894 - loss 0.01148836 - time (sec): 168.28 - samples/sec: 512.32 - lr: 0.000036 - momentum: 0.000000
2023-10-15 02:57:17,838 ----------------------------------------------------------------------------------------------------
2023-10-15 02:57:17,839 EPOCH 8 done: loss 0.0118 - lr: 0.000036
2023-10-15 02:57:44,212 DEV : loss 0.22242531180381775 - f1-score (micro avg)  0.7719
2023-10-15 02:57:44,240 saving best model
2023-10-15 02:57:46,752 ----------------------------------------------------------------------------------------------------
2023-10-15 02:58:02,946 epoch 9 - iter 89/894 - loss 0.00595299 - time (sec): 16.19 - samples/sec: 494.63 - lr: 0.000034 - momentum: 0.000000
2023-10-15 02:58:19,361 epoch 9 - iter 178/894 - loss 0.00644078 - time (sec): 32.61 - samples/sec: 498.46 - lr: 0.000032 - momentum: 0.000000
2023-10-15 02:58:35,843 epoch 9 - iter 267/894 - loss 0.01132599 - time (sec): 49.09 - samples/sec: 505.33 - lr: 0.000030 - momentum: 0.000000
2023-10-15 02:58:52,419 epoch 9 - iter 356/894 - loss 0.00904748 - time (sec): 65.67 - samples/sec: 510.07 - lr: 0.000029 - momentum: 0.000000
2023-10-15 02:59:10,359 epoch 9 - iter 445/894 - loss 0.01098782 - time (sec): 83.60 - samples/sec: 506.84 - lr: 0.000027 - momentum: 0.000000
2023-10-15 02:59:27,525 epoch 9 - iter 534/894 - loss 0.01062907 - time (sec): 100.77 - samples/sec: 512.80 - lr: 0.000025 - momentum: 0.000000
2023-10-15 02:59:44,394 epoch 9 - iter 623/894 - loss 0.01018680 - time (sec): 117.64 - samples/sec: 512.48 - lr: 0.000023 - momentum: 0.000000
2023-10-15 03:00:01,280 epoch 9 - iter 712/894 - loss 0.00968917 - time (sec): 134.53 - samples/sec: 513.69 - lr: 0.000022 - momentum: 0.000000
2023-10-15 03:00:17,808 epoch 9 - iter 801/894 - loss 0.00910352 - time (sec): 151.05 - samples/sec: 515.92 - lr: 0.000020 - momentum: 0.000000
2023-10-15 03:00:34,257 epoch 9 - iter 890/894 - loss 0.00910813 - time (sec): 167.50 - samples/sec: 514.73 - lr: 0.000018 - momentum: 0.000000
2023-10-15 03:00:34,964 ----------------------------------------------------------------------------------------------------
2023-10-15 03:00:34,964 EPOCH 9 done: loss 0.0091 - lr: 0.000018
2023-10-15 03:01:01,017 DEV : loss 0.23208962380886078 - f1-score (micro avg)  0.7894
2023-10-15 03:01:01,043 saving best model
2023-10-15 03:01:03,945 ----------------------------------------------------------------------------------------------------
2023-10-15 03:01:22,920 epoch 10 - iter 89/894 - loss 0.00741155 - time (sec): 18.97 - samples/sec: 530.99 - lr: 0.000016 - momentum: 0.000000
2023-10-15 03:01:39,312 epoch 10 - iter 178/894 - loss 0.00663561 - time (sec): 35.36 - samples/sec: 518.24 - lr: 0.000014 - momentum: 0.000000
2023-10-15 03:01:55,571 epoch 10 - iter 267/894 - loss 0.00512243 - time (sec): 51.62 - samples/sec: 515.38 - lr: 0.000013 - momentum: 0.000000
2023-10-15 03:02:11,926 epoch 10 - iter 356/894 - loss 0.00494199 - time (sec): 67.98 - samples/sec: 513.59 - lr: 0.000011 - momentum: 0.000000
2023-10-15 03:02:29,205 epoch 10 - iter 445/894 - loss 0.00519207 - time (sec): 85.26 - samples/sec: 519.56 - lr: 0.000009 - momentum: 0.000000
2023-10-15 03:02:45,598 epoch 10 - iter 534/894 - loss 0.00528129 - time (sec): 101.65 - samples/sec: 517.28 - lr: 0.000007 - momentum: 0.000000
2023-10-15 03:03:02,594 epoch 10 - iter 623/894 - loss 0.00652465 - time (sec): 118.65 - samples/sec: 515.94 - lr: 0.000006 - momentum: 0.000000
2023-10-15 03:03:18,989 epoch 10 - iter 712/894 - loss 0.00616289 - time (sec): 135.04 - samples/sec: 517.96 - lr: 0.000004 - momentum: 0.000000
2023-10-15 03:03:35,161 epoch 10 - iter 801/894 - loss 0.00585123 - time (sec): 151.21 - samples/sec: 516.29 - lr: 0.000002 - momentum: 0.000000
2023-10-15 03:03:51,457 epoch 10 - iter 890/894 - loss 0.00587903 - time (sec): 167.51 - samples/sec: 515.29 - lr: 0.000000 - momentum: 0.000000
2023-10-15 03:03:52,093 ----------------------------------------------------------------------------------------------------
2023-10-15 03:03:52,093 EPOCH 10 done: loss 0.0059 - lr: 0.000000
2023-10-15 03:04:17,600 DEV : loss 0.2378893792629242 - f1-score (micro avg)  0.7848
2023-10-15 03:04:18,259 ----------------------------------------------------------------------------------------------------
2023-10-15 03:04:18,261 Loading model from best epoch ...
2023-10-15 03:04:26,176 SequenceTagger predicts: Dictionary with 21 tags: O, S-loc, B-loc, E-loc, I-loc, S-pers, B-pers, E-pers, I-pers, S-org, B-org, E-org, I-org, S-prod, B-prod, E-prod, I-prod, S-time, B-time, E-time, I-time
2023-10-15 03:04:49,929 
Results:
- F-score (micro) 0.7741
- F-score (macro) 0.6991
- Accuracy 0.6448

By class:
              precision    recall  f1-score   support

         loc     0.8588    0.8674    0.8631       596
        pers     0.6898    0.7748    0.7298       333
         org     0.5923    0.5833    0.5878       132
        prod     0.6731    0.5303    0.5932        66
        time     0.7292    0.7143    0.7216        49

   micro avg     0.7645    0.7840    0.7741      1176
   macro avg     0.7086    0.6940    0.6991      1176
weighted avg     0.7652    0.7840    0.7734      1176

2023-10-15 03:04:49,929 ----------------------------------------------------------------------------------------------------
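As a sanity check on the report above, the aggregate scores can be recomputed from the printed per-class rows. This is a minimal sketch, not part of the training run. It assumes that micro F1 is the harmonic mean of the micro-averaged precision and recall, that macro F1 is the unweighted mean of the per-class F1 scores, and that weighted F1 is the support-weighted mean. The variable and function names are illustrative, and the inputs are the rounded four-decimal values from the table, so agreement is only expected up to rounding.

```python
# Per-class rows copied from the "By class" table: (precision, recall, f1, support).
by_class = {
    "loc":  (0.8588, 0.8674, 0.8631, 596),
    "pers": (0.6898, 0.7748, 0.7298, 333),
    "org":  (0.5923, 0.5833, 0.5878, 132),
    "prod": (0.6731, 0.5303, 0.5932, 66),
    "time": (0.7292, 0.7143, 0.7216, 49),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Micro F1 from the micro-averaged precision/recall row of the table.
micro_f1 = f1(0.7645, 0.7840)

# Macro F1: unweighted mean of the per-class F1 scores.
macro_f1 = sum(row[2] for row in by_class.values()) / len(by_class)

# Weighted F1: per-class F1 weighted by support.
total_support = sum(row[3] for row in by_class.values())
weighted_f1 = sum(row[2] * row[3] for row in by_class.values()) / total_support

print(f"micro={micro_f1:.4f} macro={macro_f1:.4f} "
      f"weighted={weighted_f1:.4f} support={total_support}")
```

The recomputed values can be compared against the `micro avg`, `macro avg` and `weighted avg` rows of the report.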