2023-10-13 05:43:31,169 ----------------------------------------------------------------------------------------------------
2023-10-13 05:43:31,171 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-13 05:43:31,172 ----------------------------------------------------------------------------------------------------
2023-10-13 05:43:31,172 MultiCorpus: 7936 train + 992 dev + 992 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 7936 train + 992 dev + 992 test sentences - /root/.flair/datasets/ner_icdar_europeana/fr
2023-10-13 05:43:31,172 ----------------------------------------------------------------------------------------------------
2023-10-13 05:43:31,172 Train:  7936 sentences
2023-10-13 05:43:31,172         (train_with_dev=False, train_with_test=False)
2023-10-13 05:43:31,172 ----------------------------------------------------------------------------------------------------
2023-10-13 05:43:31,172 Training Params:
2023-10-13 05:43:31,172  - learning_rate: "0.00015" 
2023-10-13 05:43:31,172  - mini_batch_size: "4"
2023-10-13 05:43:31,172  - max_epochs: "10"
2023-10-13 05:43:31,172  - shuffle: "True"
2023-10-13 05:43:31,173 ----------------------------------------------------------------------------------------------------
2023-10-13 05:43:31,173 Plugins:
2023-10-13 05:43:31,173  - TensorboardLogger
2023-10-13 05:43:31,173  - LinearScheduler | warmup_fraction: '0.1'
2023-10-13 05:43:31,173 ----------------------------------------------------------------------------------------------------
2023-10-13 05:43:31,173 Final evaluation on model from best epoch (best-model.pt)
2023-10-13 05:43:31,173  - metric: "('micro avg', 'f1-score')"
2023-10-13 05:43:31,173 ----------------------------------------------------------------------------------------------------
2023-10-13 05:43:31,173 Computation:
2023-10-13 05:43:31,173  - compute on device: cuda:0
2023-10-13 05:43:31,173  - embedding storage: none
2023-10-13 05:43:31,173 ----------------------------------------------------------------------------------------------------
2023-10-13 05:43:31,173 Model training base path: "hmbench-icdar/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-4"
2023-10-13 05:43:31,173 ----------------------------------------------------------------------------------------------------
2023-10-13 05:43:31,173 ----------------------------------------------------------------------------------------------------
2023-10-13 05:43:31,174 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-13 05:44:23,669 epoch 1 - iter 198/1984 - loss 2.56023642 - time (sec): 52.49 - samples/sec: 310.74 - lr: 0.000015 - momentum: 0.000000
2023-10-13 05:45:18,867 epoch 1 - iter 396/1984 - loss 2.35774357 - time (sec): 107.69 - samples/sec: 309.60 - lr: 0.000030 - momentum: 0.000000
2023-10-13 05:46:14,303 epoch 1 - iter 594/1984 - loss 2.05704108 - time (sec): 163.13 - samples/sec: 307.54 - lr: 0.000045 - momentum: 0.000000
2023-10-13 05:47:06,479 epoch 1 - iter 792/1984 - loss 1.74769940 - time (sec): 215.30 - samples/sec: 307.47 - lr: 0.000060 - momentum: 0.000000
2023-10-13 05:47:57,574 epoch 1 - iter 990/1984 - loss 1.49982570 - time (sec): 266.40 - samples/sec: 310.06 - lr: 0.000075 - momentum: 0.000000
2023-10-13 05:48:51,134 epoch 1 - iter 1188/1984 - loss 1.30358343 - time (sec): 319.96 - samples/sec: 308.94 - lr: 0.000090 - momentum: 0.000000
2023-10-13 05:49:47,616 epoch 1 - iter 1386/1984 - loss 1.15490690 - time (sec): 376.44 - samples/sec: 304.80 - lr: 0.000105 - momentum: 0.000000
2023-10-13 05:50:43,445 epoch 1 - iter 1584/1984 - loss 1.03965057 - time (sec): 432.27 - samples/sec: 303.80 - lr: 0.000120 - momentum: 0.000000
2023-10-13 05:51:37,606 epoch 1 - iter 1782/1984 - loss 0.94460620 - time (sec): 486.43 - samples/sec: 303.83 - lr: 0.000135 - momentum: 0.000000
2023-10-13 05:52:30,621 epoch 1 - iter 1980/1984 - loss 0.87250029 - time (sec): 539.45 - samples/sec: 303.15 - lr: 0.000150 - momentum: 0.000000
2023-10-13 05:52:31,783 ----------------------------------------------------------------------------------------------------
2023-10-13 05:52:31,783 EPOCH 1 done: loss 0.8711 - lr: 0.000150
2023-10-13 05:52:57,860 DEV : loss 0.16231723129749298 - f1-score (micro avg)  0.663
2023-10-13 05:52:57,906 saving best model
2023-10-13 05:52:58,787 ----------------------------------------------------------------------------------------------------
2023-10-13 05:53:54,708 epoch 2 - iter 198/1984 - loss 0.15963194 - time (sec): 55.92 - samples/sec: 295.50 - lr: 0.000148 - momentum: 0.000000
2023-10-13 05:54:49,891 epoch 2 - iter 396/1984 - loss 0.15446338 - time (sec): 111.10 - samples/sec: 295.65 - lr: 0.000147 - momentum: 0.000000
2023-10-13 05:55:42,962 epoch 2 - iter 594/1984 - loss 0.14934617 - time (sec): 164.17 - samples/sec: 298.92 - lr: 0.000145 - momentum: 0.000000
2023-10-13 05:56:38,570 epoch 2 - iter 792/1984 - loss 0.14004568 - time (sec): 219.78 - samples/sec: 293.01 - lr: 0.000143 - momentum: 0.000000
2023-10-13 05:57:34,636 epoch 2 - iter 990/1984 - loss 0.13664831 - time (sec): 275.85 - samples/sec: 296.06 - lr: 0.000142 - momentum: 0.000000
2023-10-13 05:58:28,427 epoch 2 - iter 1188/1984 - loss 0.13641309 - time (sec): 329.64 - samples/sec: 297.14 - lr: 0.000140 - momentum: 0.000000
2023-10-13 05:59:22,135 epoch 2 - iter 1386/1984 - loss 0.13311507 - time (sec): 383.35 - samples/sec: 298.85 - lr: 0.000138 - momentum: 0.000000
2023-10-13 06:00:19,474 epoch 2 - iter 1584/1984 - loss 0.13084502 - time (sec): 440.68 - samples/sec: 296.84 - lr: 0.000137 - momentum: 0.000000
2023-10-13 06:01:14,119 epoch 2 - iter 1782/1984 - loss 0.12778979 - time (sec): 495.33 - samples/sec: 297.40 - lr: 0.000135 - momentum: 0.000000
2023-10-13 06:02:07,974 epoch 2 - iter 1980/1984 - loss 0.12586842 - time (sec): 549.18 - samples/sec: 298.12 - lr: 0.000133 - momentum: 0.000000
2023-10-13 06:02:08,971 ----------------------------------------------------------------------------------------------------
2023-10-13 06:02:08,972 EPOCH 2 done: loss 0.1258 - lr: 0.000133
2023-10-13 06:02:35,243 DEV : loss 0.08949719369411469 - f1-score (micro avg)  0.7352
2023-10-13 06:02:35,285 saving best model
2023-10-13 06:02:37,860 ----------------------------------------------------------------------------------------------------
2023-10-13 06:03:32,717 epoch 3 - iter 198/1984 - loss 0.07354557 - time (sec): 54.85 - samples/sec: 312.62 - lr: 0.000132 - momentum: 0.000000
2023-10-13 06:04:25,496 epoch 3 - iter 396/1984 - loss 0.07621576 - time (sec): 107.63 - samples/sec: 308.46 - lr: 0.000130 - momentum: 0.000000
2023-10-13 06:05:19,074 epoch 3 - iter 594/1984 - loss 0.08125367 - time (sec): 161.21 - samples/sec: 306.72 - lr: 0.000128 - momentum: 0.000000
2023-10-13 06:06:15,512 epoch 3 - iter 792/1984 - loss 0.07585305 - time (sec): 217.65 - samples/sec: 303.05 - lr: 0.000127 - momentum: 0.000000
2023-10-13 06:07:10,722 epoch 3 - iter 990/1984 - loss 0.07972188 - time (sec): 272.86 - samples/sec: 298.46 - lr: 0.000125 - momentum: 0.000000
2023-10-13 06:08:07,849 epoch 3 - iter 1188/1984 - loss 0.07889533 - time (sec): 329.98 - samples/sec: 295.56 - lr: 0.000123 - momentum: 0.000000
2023-10-13 06:09:01,056 epoch 3 - iter 1386/1984 - loss 0.07614265 - time (sec): 383.19 - samples/sec: 297.04 - lr: 0.000122 - momentum: 0.000000
2023-10-13 06:09:55,354 epoch 3 - iter 1584/1984 - loss 0.07610978 - time (sec): 437.49 - samples/sec: 297.93 - lr: 0.000120 - momentum: 0.000000
2023-10-13 06:10:50,635 epoch 3 - iter 1782/1984 - loss 0.07645598 - time (sec): 492.77 - samples/sec: 299.47 - lr: 0.000118 - momentum: 0.000000
2023-10-13 06:11:45,893 epoch 3 - iter 1980/1984 - loss 0.07656055 - time (sec): 548.03 - samples/sec: 298.44 - lr: 0.000117 - momentum: 0.000000
2023-10-13 06:11:47,067 ----------------------------------------------------------------------------------------------------
2023-10-13 06:11:47,067 EPOCH 3 done: loss 0.0764 - lr: 0.000117
2023-10-13 06:12:13,772 DEV : loss 0.10229434072971344 - f1-score (micro avg)  0.7421
2023-10-13 06:12:13,819 saving best model
2023-10-13 06:12:16,515 ----------------------------------------------------------------------------------------------------
2023-10-13 06:13:11,781 epoch 4 - iter 198/1984 - loss 0.06257893 - time (sec): 55.26 - samples/sec: 301.67 - lr: 0.000115 - momentum: 0.000000
2023-10-13 06:14:06,955 epoch 4 - iter 396/1984 - loss 0.05453809 - time (sec): 110.44 - samples/sec: 294.81 - lr: 0.000113 - momentum: 0.000000
2023-10-13 06:15:02,500 epoch 4 - iter 594/1984 - loss 0.05487829 - time (sec): 165.98 - samples/sec: 304.37 - lr: 0.000112 - momentum: 0.000000
2023-10-13 06:15:57,389 epoch 4 - iter 792/1984 - loss 0.05252948 - time (sec): 220.87 - samples/sec: 301.83 - lr: 0.000110 - momentum: 0.000000
2023-10-13 06:16:50,636 epoch 4 - iter 990/1984 - loss 0.05421408 - time (sec): 274.12 - samples/sec: 303.57 - lr: 0.000108 - momentum: 0.000000
2023-10-13 06:17:42,895 epoch 4 - iter 1188/1984 - loss 0.05404910 - time (sec): 326.38 - samples/sec: 304.71 - lr: 0.000107 - momentum: 0.000000
2023-10-13 06:18:36,607 epoch 4 - iter 1386/1984 - loss 0.05254585 - time (sec): 380.09 - samples/sec: 304.61 - lr: 0.000105 - momentum: 0.000000
2023-10-13 06:19:34,329 epoch 4 - iter 1584/1984 - loss 0.05326865 - time (sec): 437.81 - samples/sec: 299.95 - lr: 0.000103 - momentum: 0.000000
2023-10-13 06:20:28,956 epoch 4 - iter 1782/1984 - loss 0.05356980 - time (sec): 492.44 - samples/sec: 300.72 - lr: 0.000102 - momentum: 0.000000
2023-10-13 06:21:26,190 epoch 4 - iter 1980/1984 - loss 0.05411207 - time (sec): 549.67 - samples/sec: 297.93 - lr: 0.000100 - momentum: 0.000000
2023-10-13 06:21:27,442 ----------------------------------------------------------------------------------------------------
2023-10-13 06:21:27,442 EPOCH 4 done: loss 0.0544 - lr: 0.000100
2023-10-13 06:21:56,062 DEV : loss 0.1296338140964508 - f1-score (micro avg)  0.7448
2023-10-13 06:21:56,106 saving best model
2023-10-13 06:22:00,166 ----------------------------------------------------------------------------------------------------
2023-10-13 06:22:57,064 epoch 5 - iter 198/1984 - loss 0.03401430 - time (sec): 56.89 - samples/sec: 285.27 - lr: 0.000098 - momentum: 0.000000
2023-10-13 06:23:49,847 epoch 5 - iter 396/1984 - loss 0.03367653 - time (sec): 109.68 - samples/sec: 287.03 - lr: 0.000097 - momentum: 0.000000
2023-10-13 06:24:44,133 epoch 5 - iter 594/1984 - loss 0.03772318 - time (sec): 163.96 - samples/sec: 294.26 - lr: 0.000095 - momentum: 0.000000
2023-10-13 06:25:37,666 epoch 5 - iter 792/1984 - loss 0.03702507 - time (sec): 217.49 - samples/sec: 296.65 - lr: 0.000093 - momentum: 0.000000
2023-10-13 06:26:37,327 epoch 5 - iter 990/1984 - loss 0.03647125 - time (sec): 277.16 - samples/sec: 298.28 - lr: 0.000092 - momentum: 0.000000
2023-10-13 06:27:29,315 epoch 5 - iter 1188/1984 - loss 0.03821053 - time (sec): 329.14 - samples/sec: 298.94 - lr: 0.000090 - momentum: 0.000000
2023-10-13 06:28:20,911 epoch 5 - iter 1386/1984 - loss 0.04020892 - time (sec): 380.74 - samples/sec: 300.13 - lr: 0.000088 - momentum: 0.000000
2023-10-13 06:29:15,349 epoch 5 - iter 1584/1984 - loss 0.04121393 - time (sec): 435.18 - samples/sec: 298.01 - lr: 0.000087 - momentum: 0.000000
2023-10-13 06:30:15,057 epoch 5 - iter 1782/1984 - loss 0.03983358 - time (sec): 494.89 - samples/sec: 295.16 - lr: 0.000085 - momentum: 0.000000
2023-10-13 06:31:16,160 epoch 5 - iter 1980/1984 - loss 0.04078366 - time (sec): 555.99 - samples/sec: 294.31 - lr: 0.000083 - momentum: 0.000000
2023-10-13 06:31:17,222 ----------------------------------------------------------------------------------------------------
2023-10-13 06:31:17,222 EPOCH 5 done: loss 0.0407 - lr: 0.000083
2023-10-13 06:31:42,184 DEV : loss 0.14384247362613678 - f1-score (micro avg)  0.7497
2023-10-13 06:31:42,224 saving best model
2023-10-13 06:31:44,772 ----------------------------------------------------------------------------------------------------
2023-10-13 06:32:37,857 epoch 6 - iter 198/1984 - loss 0.02600490 - time (sec): 53.08 - samples/sec: 290.45 - lr: 0.000082 - momentum: 0.000000
2023-10-13 06:33:33,217 epoch 6 - iter 396/1984 - loss 0.02863585 - time (sec): 108.44 - samples/sec: 290.63 - lr: 0.000080 - momentum: 0.000000
2023-10-13 06:34:28,450 epoch 6 - iter 594/1984 - loss 0.03111200 - time (sec): 163.67 - samples/sec: 291.52 - lr: 0.000078 - momentum: 0.000000
2023-10-13 06:35:21,690 epoch 6 - iter 792/1984 - loss 0.03154475 - time (sec): 216.91 - samples/sec: 297.11 - lr: 0.000077 - momentum: 0.000000
2023-10-13 06:36:14,761 epoch 6 - iter 990/1984 - loss 0.03052400 - time (sec): 269.98 - samples/sec: 302.74 - lr: 0.000075 - momentum: 0.000000
2023-10-13 06:37:09,857 epoch 6 - iter 1188/1984 - loss 0.02980984 - time (sec): 325.08 - samples/sec: 303.28 - lr: 0.000073 - momentum: 0.000000
2023-10-13 06:38:05,132 epoch 6 - iter 1386/1984 - loss 0.02858646 - time (sec): 380.36 - samples/sec: 301.79 - lr: 0.000072 - momentum: 0.000000
2023-10-13 06:38:57,131 epoch 6 - iter 1584/1984 - loss 0.02911257 - time (sec): 432.35 - samples/sec: 301.30 - lr: 0.000070 - momentum: 0.000000
2023-10-13 06:39:51,485 epoch 6 - iter 1782/1984 - loss 0.02938630 - time (sec): 486.71 - samples/sec: 302.81 - lr: 0.000068 - momentum: 0.000000
2023-10-13 06:40:47,211 epoch 6 - iter 1980/1984 - loss 0.02932146 - time (sec): 542.43 - samples/sec: 301.60 - lr: 0.000067 - momentum: 0.000000
2023-10-13 06:40:48,350 ----------------------------------------------------------------------------------------------------
2023-10-13 06:40:48,350 EPOCH 6 done: loss 0.0293 - lr: 0.000067
2023-10-13 06:41:17,254 DEV : loss 0.1786336749792099 - f1-score (micro avg)  0.7585
2023-10-13 06:41:17,296 saving best model
2023-10-13 06:41:18,383 ----------------------------------------------------------------------------------------------------
2023-10-13 06:42:15,602 epoch 7 - iter 198/1984 - loss 0.01565821 - time (sec): 57.22 - samples/sec: 273.44 - lr: 0.000065 - momentum: 0.000000
2023-10-13 06:43:13,372 epoch 7 - iter 396/1984 - loss 0.02160073 - time (sec): 114.99 - samples/sec: 275.78 - lr: 0.000063 - momentum: 0.000000
2023-10-13 06:44:11,042 epoch 7 - iter 594/1984 - loss 0.02297071 - time (sec): 172.66 - samples/sec: 279.79 - lr: 0.000062 - momentum: 0.000000
2023-10-13 06:45:06,994 epoch 7 - iter 792/1984 - loss 0.02194959 - time (sec): 228.61 - samples/sec: 281.23 - lr: 0.000060 - momentum: 0.000000
2023-10-13 06:46:02,561 epoch 7 - iter 990/1984 - loss 0.02145332 - time (sec): 284.18 - samples/sec: 283.36 - lr: 0.000058 - momentum: 0.000000
2023-10-13 06:46:55,114 epoch 7 - iter 1188/1984 - loss 0.02157394 - time (sec): 336.73 - samples/sec: 288.44 - lr: 0.000057 - momentum: 0.000000
2023-10-13 06:47:45,302 epoch 7 - iter 1386/1984 - loss 0.02232190 - time (sec): 386.92 - samples/sec: 294.58 - lr: 0.000055 - momentum: 0.000000
2023-10-13 06:48:36,186 epoch 7 - iter 1584/1984 - loss 0.02126233 - time (sec): 437.80 - samples/sec: 297.31 - lr: 0.000053 - momentum: 0.000000
2023-10-13 06:49:30,550 epoch 7 - iter 1782/1984 - loss 0.02124752 - time (sec): 492.16 - samples/sec: 296.78 - lr: 0.000052 - momentum: 0.000000
2023-10-13 06:50:24,014 epoch 7 - iter 1980/1984 - loss 0.02216943 - time (sec): 545.63 - samples/sec: 299.99 - lr: 0.000050 - momentum: 0.000000
2023-10-13 06:50:25,017 ----------------------------------------------------------------------------------------------------
2023-10-13 06:50:25,018 EPOCH 7 done: loss 0.0221 - lr: 0.000050
2023-10-13 06:50:50,990 DEV : loss 0.19668884575366974 - f1-score (micro avg)  0.7557
2023-10-13 06:50:51,030 ----------------------------------------------------------------------------------------------------
2023-10-13 06:51:42,700 epoch 8 - iter 198/1984 - loss 0.00771193 - time (sec): 51.67 - samples/sec: 307.77 - lr: 0.000048 - momentum: 0.000000
2023-10-13 06:52:33,732 epoch 8 - iter 396/1984 - loss 0.01096548 - time (sec): 102.70 - samples/sec: 311.86 - lr: 0.000047 - momentum: 0.000000
2023-10-13 06:53:25,866 epoch 8 - iter 594/1984 - loss 0.01124620 - time (sec): 154.83 - samples/sec: 306.71 - lr: 0.000045 - momentum: 0.000000
2023-10-13 06:54:21,829 epoch 8 - iter 792/1984 - loss 0.01189251 - time (sec): 210.80 - samples/sec: 303.37 - lr: 0.000043 - momentum: 0.000000
2023-10-13 06:55:16,537 epoch 8 - iter 990/1984 - loss 0.01234024 - time (sec): 265.50 - samples/sec: 303.81 - lr: 0.000042 - momentum: 0.000000
2023-10-13 06:56:09,844 epoch 8 - iter 1188/1984 - loss 0.01263419 - time (sec): 318.81 - samples/sec: 305.71 - lr: 0.000040 - momentum: 0.000000
2023-10-13 06:57:04,037 epoch 8 - iter 1386/1984 - loss 0.01265776 - time (sec): 373.00 - samples/sec: 303.96 - lr: 0.000038 - momentum: 0.000000
2023-10-13 06:57:55,226 epoch 8 - iter 1584/1984 - loss 0.01301713 - time (sec): 424.19 - samples/sec: 306.78 - lr: 0.000037 - momentum: 0.000000
2023-10-13 06:58:47,190 epoch 8 - iter 1782/1984 - loss 0.01413211 - time (sec): 476.16 - samples/sec: 309.92 - lr: 0.000035 - momentum: 0.000000
2023-10-13 06:59:38,513 epoch 8 - iter 1980/1984 - loss 0.01490286 - time (sec): 527.48 - samples/sec: 310.17 - lr: 0.000033 - momentum: 0.000000
2023-10-13 06:59:39,610 ----------------------------------------------------------------------------------------------------
2023-10-13 06:59:39,610 EPOCH 8 done: loss 0.0149 - lr: 0.000033
2023-10-13 07:00:04,753 DEV : loss 0.2151404768228531 - f1-score (micro avg)  0.7413
2023-10-13 07:00:04,794 ----------------------------------------------------------------------------------------------------
2023-10-13 07:00:55,949 epoch 9 - iter 198/1984 - loss 0.00807895 - time (sec): 51.15 - samples/sec: 325.86 - lr: 0.000032 - momentum: 0.000000
2023-10-13 07:01:51,407 epoch 9 - iter 396/1984 - loss 0.01077008 - time (sec): 106.61 - samples/sec: 318.81 - lr: 0.000030 - momentum: 0.000000
2023-10-13 07:02:43,202 epoch 9 - iter 594/1984 - loss 0.01143847 - time (sec): 158.41 - samples/sec: 316.85 - lr: 0.000028 - momentum: 0.000000
2023-10-13 07:03:35,088 epoch 9 - iter 792/1984 - loss 0.01163397 - time (sec): 210.29 - samples/sec: 315.64 - lr: 0.000027 - momentum: 0.000000
2023-10-13 07:04:27,415 epoch 9 - iter 990/1984 - loss 0.01100476 - time (sec): 262.62 - samples/sec: 314.62 - lr: 0.000025 - momentum: 0.000000
2023-10-13 07:05:20,750 epoch 9 - iter 1188/1984 - loss 0.01112938 - time (sec): 315.95 - samples/sec: 308.28 - lr: 0.000023 - momentum: 0.000000
2023-10-13 07:06:13,433 epoch 9 - iter 1386/1984 - loss 0.01138202 - time (sec): 368.64 - samples/sec: 307.01 - lr: 0.000022 - momentum: 0.000000
2023-10-13 07:07:07,062 epoch 9 - iter 1584/1984 - loss 0.01089230 - time (sec): 422.27 - samples/sec: 307.08 - lr: 0.000020 - momentum: 0.000000
2023-10-13 07:08:02,575 epoch 9 - iter 1782/1984 - loss 0.01217880 - time (sec): 477.78 - samples/sec: 306.61 - lr: 0.000018 - momentum: 0.000000
2023-10-13 07:08:58,446 epoch 9 - iter 1980/1984 - loss 0.01165758 - time (sec): 533.65 - samples/sec: 306.75 - lr: 0.000017 - momentum: 0.000000
2023-10-13 07:08:59,508 ----------------------------------------------------------------------------------------------------
2023-10-13 07:08:59,508 EPOCH 9 done: loss 0.0117 - lr: 0.000017
2023-10-13 07:09:24,667 DEV : loss 0.22990703582763672 - f1-score (micro avg)  0.7597
2023-10-13 07:09:24,711 saving best model
2023-10-13 07:09:27,881 ----------------------------------------------------------------------------------------------------
2023-10-13 07:10:20,160 epoch 10 - iter 198/1984 - loss 0.00796034 - time (sec): 52.27 - samples/sec: 315.72 - lr: 0.000015 - momentum: 0.000000
2023-10-13 07:11:12,693 epoch 10 - iter 396/1984 - loss 0.00819348 - time (sec): 104.81 - samples/sec: 314.94 - lr: 0.000013 - momentum: 0.000000
2023-10-13 07:12:06,325 epoch 10 - iter 594/1984 - loss 0.00998088 - time (sec): 158.44 - samples/sec: 311.36 - lr: 0.000012 - momentum: 0.000000
2023-10-13 07:12:58,511 epoch 10 - iter 792/1984 - loss 0.00872009 - time (sec): 210.63 - samples/sec: 315.36 - lr: 0.000010 - momentum: 0.000000
2023-10-13 07:13:49,092 epoch 10 - iter 990/1984 - loss 0.00830259 - time (sec): 261.21 - samples/sec: 315.50 - lr: 0.000008 - momentum: 0.000000
2023-10-13 07:14:40,128 epoch 10 - iter 1188/1984 - loss 0.00852866 - time (sec): 312.24 - samples/sec: 315.34 - lr: 0.000007 - momentum: 0.000000
2023-10-13 07:15:33,304 epoch 10 - iter 1386/1984 - loss 0.00801943 - time (sec): 365.42 - samples/sec: 316.72 - lr: 0.000005 - momentum: 0.000000
2023-10-13 07:16:28,514 epoch 10 - iter 1584/1984 - loss 0.00810705 - time (sec): 420.63 - samples/sec: 314.79 - lr: 0.000003 - momentum: 0.000000
2023-10-13 07:17:20,494 epoch 10 - iter 1782/1984 - loss 0.00802002 - time (sec): 472.61 - samples/sec: 312.72 - lr: 0.000002 - momentum: 0.000000
2023-10-13 07:18:10,876 epoch 10 - iter 1980/1984 - loss 0.00791535 - time (sec): 522.99 - samples/sec: 312.85 - lr: 0.000000 - momentum: 0.000000
2023-10-13 07:18:11,919 ----------------------------------------------------------------------------------------------------
2023-10-13 07:18:11,920 EPOCH 10 done: loss 0.0079 - lr: 0.000000
2023-10-13 07:18:36,515 DEV : loss 0.23257124423980713 - f1-score (micro avg)  0.7575
2023-10-13 07:18:37,476 ----------------------------------------------------------------------------------------------------
2023-10-13 07:18:37,478 Loading model from best epoch ...
2023-10-13 07:18:41,683 SequenceTagger predicts: Dictionary with 13 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-13 07:19:08,466 
Results:
- F-score (micro) 0.7605
- F-score (macro) 0.6686
- Accuracy 0.6421

By class:
              precision    recall  f1-score   support

         LOC     0.8172    0.8397    0.8283       655
         PER     0.6693    0.7713    0.7167       223
         ORG     0.5146    0.4173    0.4609       127

   micro avg     0.7502    0.7711    0.7605      1005
   macro avg     0.6670    0.6761    0.6686      1005
weighted avg     0.7462    0.7711    0.7571      1005

2023-10-13 07:19:08,466 ----------------------------------------------------------------------------------------------------