2023-10-14 08:40:07,077 ----------------------------------------------------------------------------------------------------
2023-10-14 08:40:07,078 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(32001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-14 08:40:07,078 ----------------------------------------------------------------------------------------------------
2023-10-14 08:40:07,078 MultiCorpus: 5777 train + 722 dev + 723 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 5777 train + 722 dev + 723 test sentences - /root/.flair/datasets/ner_icdar_europeana/nl
2023-10-14 08:40:07,078 ----------------------------------------------------------------------------------------------------
2023-10-14 08:40:07,078 Train:  5777 sentences
2023-10-14 08:40:07,078         (train_with_dev=False, train_with_test=False)
2023-10-14 08:40:07,078 ----------------------------------------------------------------------------------------------------
2023-10-14 08:40:07,078 Training Params:
2023-10-14 08:40:07,078  - learning_rate: "5e-05" 
2023-10-14 08:40:07,078  - mini_batch_size: "8"
2023-10-14 08:40:07,078  - max_epochs: "10"
2023-10-14 08:40:07,078  - shuffle: "True"
2023-10-14 08:40:07,078 ----------------------------------------------------------------------------------------------------
2023-10-14 08:40:07,078 Plugins:
2023-10-14 08:40:07,078  - LinearScheduler | warmup_fraction: '0.1'
2023-10-14 08:40:07,078 ----------------------------------------------------------------------------------------------------
2023-10-14 08:40:07,078 Final evaluation on model from best epoch (best-model.pt)
2023-10-14 08:40:07,078  - metric: "('micro avg', 'f1-score')"
2023-10-14 08:40:07,078 ----------------------------------------------------------------------------------------------------
2023-10-14 08:40:07,079 Computation:
2023-10-14 08:40:07,079  - compute on device: cuda:0
2023-10-14 08:40:07,079  - embedding storage: none
2023-10-14 08:40:07,079 ----------------------------------------------------------------------------------------------------
2023-10-14 08:40:07,079 Model training base path: "hmbench-icdar/nl-dbmdz/bert-base-historic-multilingual-cased-bs8-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-1"
2023-10-14 08:40:07,079 ----------------------------------------------------------------------------------------------------
2023-10-14 08:40:07,079 ----------------------------------------------------------------------------------------------------
2023-10-14 08:40:12,831 epoch 1 - iter 72/723 - loss 2.05918268 - time (sec): 5.75 - samples/sec: 2946.51 - lr: 0.000005 - momentum: 0.000000
2023-10-14 08:40:18,424 epoch 1 - iter 144/723 - loss 1.16818121 - time (sec): 11.34 - samples/sec: 2978.79 - lr: 0.000010 - momentum: 0.000000
2023-10-14 08:40:24,479 epoch 1 - iter 216/723 - loss 0.82822698 - time (sec): 17.40 - samples/sec: 3002.20 - lr: 0.000015 - momentum: 0.000000
2023-10-14 08:40:30,235 epoch 1 - iter 288/723 - loss 0.66985125 - time (sec): 23.16 - samples/sec: 3016.41 - lr: 0.000020 - momentum: 0.000000
2023-10-14 08:40:35,853 epoch 1 - iter 360/723 - loss 0.57437342 - time (sec): 28.77 - samples/sec: 3014.67 - lr: 0.000025 - momentum: 0.000000
2023-10-14 08:40:41,454 epoch 1 - iter 432/723 - loss 0.51652129 - time (sec): 34.37 - samples/sec: 2973.95 - lr: 0.000030 - momentum: 0.000000
2023-10-14 08:40:47,515 epoch 1 - iter 504/723 - loss 0.46201015 - time (sec): 40.43 - samples/sec: 2987.94 - lr: 0.000035 - momentum: 0.000000
2023-10-14 08:40:53,810 epoch 1 - iter 576/723 - loss 0.42250578 - time (sec): 46.73 - samples/sec: 2971.37 - lr: 0.000040 - momentum: 0.000000
2023-10-14 08:40:59,867 epoch 1 - iter 648/723 - loss 0.39178574 - time (sec): 52.79 - samples/sec: 2972.34 - lr: 0.000045 - momentum: 0.000000
2023-10-14 08:41:05,871 epoch 1 - iter 720/723 - loss 0.36449097 - time (sec): 58.79 - samples/sec: 2986.16 - lr: 0.000050 - momentum: 0.000000
2023-10-14 08:41:06,127 ----------------------------------------------------------------------------------------------------
2023-10-14 08:41:06,128 EPOCH 1 done: loss 0.3636 - lr: 0.000050
2023-10-14 08:41:10,093 DEV : loss 0.16045822203159332 - f1-score (micro avg)  0.5348
2023-10-14 08:41:10,116 saving best model
2023-10-14 08:41:10,556 ----------------------------------------------------------------------------------------------------
2023-10-14 08:41:16,325 epoch 2 - iter 72/723 - loss 0.14172033 - time (sec): 5.77 - samples/sec: 3010.48 - lr: 0.000049 - momentum: 0.000000
2023-10-14 08:41:22,109 epoch 2 - iter 144/723 - loss 0.12806058 - time (sec): 11.55 - samples/sec: 2980.57 - lr: 0.000049 - momentum: 0.000000
2023-10-14 08:41:28,315 epoch 2 - iter 216/723 - loss 0.12803539 - time (sec): 17.76 - samples/sec: 2933.19 - lr: 0.000048 - momentum: 0.000000
2023-10-14 08:41:33,862 epoch 2 - iter 288/723 - loss 0.12202263 - time (sec): 23.31 - samples/sec: 2973.59 - lr: 0.000048 - momentum: 0.000000
2023-10-14 08:41:40,189 epoch 2 - iter 360/723 - loss 0.12080964 - time (sec): 29.63 - samples/sec: 2958.52 - lr: 0.000047 - momentum: 0.000000
2023-10-14 08:41:45,874 epoch 2 - iter 432/723 - loss 0.11694382 - time (sec): 35.32 - samples/sec: 2965.78 - lr: 0.000047 - momentum: 0.000000
2023-10-14 08:41:51,983 epoch 2 - iter 504/723 - loss 0.11733998 - time (sec): 41.43 - samples/sec: 2962.82 - lr: 0.000046 - momentum: 0.000000
2023-10-14 08:41:57,323 epoch 2 - iter 576/723 - loss 0.11441675 - time (sec): 46.77 - samples/sec: 2964.30 - lr: 0.000046 - momentum: 0.000000
2023-10-14 08:42:03,643 epoch 2 - iter 648/723 - loss 0.11224219 - time (sec): 53.09 - samples/sec: 2972.82 - lr: 0.000045 - momentum: 0.000000
2023-10-14 08:42:09,681 epoch 2 - iter 720/723 - loss 0.11078356 - time (sec): 59.12 - samples/sec: 2970.99 - lr: 0.000044 - momentum: 0.000000
2023-10-14 08:42:09,901 ----------------------------------------------------------------------------------------------------
2023-10-14 08:42:09,902 EPOCH 2 done: loss 0.1107 - lr: 0.000044
2023-10-14 08:42:13,440 DEV : loss 0.10363695025444031 - f1-score (micro avg)  0.7041
2023-10-14 08:42:13,459 saving best model
2023-10-14 08:42:13,965 ----------------------------------------------------------------------------------------------------
2023-10-14 08:42:20,064 epoch 3 - iter 72/723 - loss 0.07046006 - time (sec): 6.09 - samples/sec: 2907.60 - lr: 0.000044 - momentum: 0.000000
2023-10-14 08:42:26,285 epoch 3 - iter 144/723 - loss 0.06632641 - time (sec): 12.31 - samples/sec: 2888.10 - lr: 0.000043 - momentum: 0.000000
2023-10-14 08:42:32,534 epoch 3 - iter 216/723 - loss 0.06851821 - time (sec): 18.56 - samples/sec: 2865.87 - lr: 0.000043 - momentum: 0.000000
2023-10-14 08:42:38,432 epoch 3 - iter 288/723 - loss 0.06615459 - time (sec): 24.46 - samples/sec: 2895.34 - lr: 0.000042 - momentum: 0.000000
2023-10-14 08:42:44,260 epoch 3 - iter 360/723 - loss 0.06572883 - time (sec): 30.29 - samples/sec: 2923.03 - lr: 0.000042 - momentum: 0.000000
2023-10-14 08:42:49,881 epoch 3 - iter 432/723 - loss 0.06538057 - time (sec): 35.91 - samples/sec: 2954.69 - lr: 0.000041 - momentum: 0.000000
2023-10-14 08:42:56,024 epoch 3 - iter 504/723 - loss 0.06700951 - time (sec): 42.05 - samples/sec: 2925.32 - lr: 0.000041 - momentum: 0.000000
2023-10-14 08:43:02,061 epoch 3 - iter 576/723 - loss 0.06591234 - time (sec): 48.09 - samples/sec: 2936.96 - lr: 0.000040 - momentum: 0.000000
2023-10-14 08:43:07,986 epoch 3 - iter 648/723 - loss 0.06585569 - time (sec): 54.01 - samples/sec: 2941.72 - lr: 0.000039 - momentum: 0.000000
2023-10-14 08:43:14,208 epoch 3 - iter 720/723 - loss 0.06567386 - time (sec): 60.24 - samples/sec: 2918.27 - lr: 0.000039 - momentum: 0.000000
2023-10-14 08:43:14,384 ----------------------------------------------------------------------------------------------------
2023-10-14 08:43:14,384 EPOCH 3 done: loss 0.0657 - lr: 0.000039
2023-10-14 08:43:18,341 DEV : loss 0.09261729568243027 - f1-score (micro avg)  0.7609
2023-10-14 08:43:18,361 saving best model
2023-10-14 08:43:18,852 ----------------------------------------------------------------------------------------------------
2023-10-14 08:43:25,236 epoch 4 - iter 72/723 - loss 0.03634882 - time (sec): 6.38 - samples/sec: 2825.65 - lr: 0.000038 - momentum: 0.000000
2023-10-14 08:43:31,297 epoch 4 - iter 144/723 - loss 0.03662606 - time (sec): 12.44 - samples/sec: 2942.03 - lr: 0.000038 - momentum: 0.000000
2023-10-14 08:43:37,068 epoch 4 - iter 216/723 - loss 0.04038379 - time (sec): 18.21 - samples/sec: 2920.40 - lr: 0.000037 - momentum: 0.000000
2023-10-14 08:43:43,362 epoch 4 - iter 288/723 - loss 0.04257444 - time (sec): 24.51 - samples/sec: 2908.61 - lr: 0.000037 - momentum: 0.000000
2023-10-14 08:43:49,090 epoch 4 - iter 360/723 - loss 0.04462673 - time (sec): 30.24 - samples/sec: 2930.97 - lr: 0.000036 - momentum: 0.000000
2023-10-14 08:43:54,761 epoch 4 - iter 432/723 - loss 0.04482846 - time (sec): 35.91 - samples/sec: 2930.90 - lr: 0.000036 - momentum: 0.000000
2023-10-14 08:44:00,366 epoch 4 - iter 504/723 - loss 0.04395167 - time (sec): 41.51 - samples/sec: 2937.71 - lr: 0.000035 - momentum: 0.000000
2023-10-14 08:44:06,685 epoch 4 - iter 576/723 - loss 0.04429902 - time (sec): 47.83 - samples/sec: 2937.07 - lr: 0.000034 - momentum: 0.000000
2023-10-14 08:44:12,568 epoch 4 - iter 648/723 - loss 0.04540314 - time (sec): 53.72 - samples/sec: 2927.97 - lr: 0.000034 - momentum: 0.000000
2023-10-14 08:44:18,568 epoch 4 - iter 720/723 - loss 0.04549307 - time (sec): 59.71 - samples/sec: 2942.83 - lr: 0.000033 - momentum: 0.000000
2023-10-14 08:44:18,769 ----------------------------------------------------------------------------------------------------
2023-10-14 08:44:18,769 EPOCH 4 done: loss 0.0459 - lr: 0.000033
2023-10-14 08:44:22,278 DEV : loss 0.0912981629371643 - f1-score (micro avg)  0.8023
2023-10-14 08:44:22,302 saving best model
2023-10-14 08:44:22,801 ----------------------------------------------------------------------------------------------------
2023-10-14 08:44:29,581 epoch 5 - iter 72/723 - loss 0.02994168 - time (sec): 6.78 - samples/sec: 2619.32 - lr: 0.000033 - momentum: 0.000000
2023-10-14 08:44:35,538 epoch 5 - iter 144/723 - loss 0.03265802 - time (sec): 12.73 - samples/sec: 2796.63 - lr: 0.000032 - momentum: 0.000000
2023-10-14 08:44:41,207 epoch 5 - iter 216/723 - loss 0.03167073 - time (sec): 18.40 - samples/sec: 2852.04 - lr: 0.000032 - momentum: 0.000000
2023-10-14 08:44:46,988 epoch 5 - iter 288/723 - loss 0.03339260 - time (sec): 24.18 - samples/sec: 2891.82 - lr: 0.000031 - momentum: 0.000000
2023-10-14 08:44:52,816 epoch 5 - iter 360/723 - loss 0.03182288 - time (sec): 30.01 - samples/sec: 2921.00 - lr: 0.000031 - momentum: 0.000000
2023-10-14 08:44:59,507 epoch 5 - iter 432/723 - loss 0.03101441 - time (sec): 36.70 - samples/sec: 2884.85 - lr: 0.000030 - momentum: 0.000000
2023-10-14 08:45:05,301 epoch 5 - iter 504/723 - loss 0.03178002 - time (sec): 42.50 - samples/sec: 2887.99 - lr: 0.000029 - momentum: 0.000000
2023-10-14 08:45:11,413 epoch 5 - iter 576/723 - loss 0.03195886 - time (sec): 48.61 - samples/sec: 2892.36 - lr: 0.000029 - momentum: 0.000000
2023-10-14 08:45:17,766 epoch 5 - iter 648/723 - loss 0.03371410 - time (sec): 54.96 - samples/sec: 2883.93 - lr: 0.000028 - momentum: 0.000000
2023-10-14 08:45:23,372 epoch 5 - iter 720/723 - loss 0.03353303 - time (sec): 60.57 - samples/sec: 2897.64 - lr: 0.000028 - momentum: 0.000000
2023-10-14 08:45:23,707 ----------------------------------------------------------------------------------------------------
2023-10-14 08:45:23,707 EPOCH 5 done: loss 0.0334 - lr: 0.000028
2023-10-14 08:45:27,341 DEV : loss 0.15092381834983826 - f1-score (micro avg)  0.7634
2023-10-14 08:45:27,363 ----------------------------------------------------------------------------------------------------
2023-10-14 08:45:33,495 epoch 6 - iter 72/723 - loss 0.02383597 - time (sec): 6.13 - samples/sec: 2932.56 - lr: 0.000027 - momentum: 0.000000
2023-10-14 08:45:39,895 epoch 6 - iter 144/723 - loss 0.02275131 - time (sec): 12.53 - samples/sec: 2885.29 - lr: 0.000027 - momentum: 0.000000
2023-10-14 08:45:45,901 epoch 6 - iter 216/723 - loss 0.02559205 - time (sec): 18.54 - samples/sec: 2912.43 - lr: 0.000026 - momentum: 0.000000
2023-10-14 08:45:52,387 epoch 6 - iter 288/723 - loss 0.02648482 - time (sec): 25.02 - samples/sec: 2872.80 - lr: 0.000026 - momentum: 0.000000
2023-10-14 08:45:57,924 epoch 6 - iter 360/723 - loss 0.02538316 - time (sec): 30.56 - samples/sec: 2900.53 - lr: 0.000025 - momentum: 0.000000
2023-10-14 08:46:03,765 epoch 6 - iter 432/723 - loss 0.02448723 - time (sec): 36.40 - samples/sec: 2901.17 - lr: 0.000024 - momentum: 0.000000
2023-10-14 08:46:09,811 epoch 6 - iter 504/723 - loss 0.02458171 - time (sec): 42.45 - samples/sec: 2906.03 - lr: 0.000024 - momentum: 0.000000
2023-10-14 08:46:16,026 epoch 6 - iter 576/723 - loss 0.02508343 - time (sec): 48.66 - samples/sec: 2911.09 - lr: 0.000023 - momentum: 0.000000
2023-10-14 08:46:21,790 epoch 6 - iter 648/723 - loss 0.02441954 - time (sec): 54.43 - samples/sec: 2909.44 - lr: 0.000023 - momentum: 0.000000
2023-10-14 08:46:27,487 epoch 6 - iter 720/723 - loss 0.02461414 - time (sec): 60.12 - samples/sec: 2921.93 - lr: 0.000022 - momentum: 0.000000
2023-10-14 08:46:27,706 ----------------------------------------------------------------------------------------------------
2023-10-14 08:46:27,706 EPOCH 6 done: loss 0.0246 - lr: 0.000022
2023-10-14 08:46:31,624 DEV : loss 0.13394637405872345 - f1-score (micro avg)  0.8007
2023-10-14 08:46:31,640 ----------------------------------------------------------------------------------------------------
2023-10-14 08:46:37,789 epoch 7 - iter 72/723 - loss 0.01052973 - time (sec): 6.15 - samples/sec: 2831.45 - lr: 0.000022 - momentum: 0.000000
2023-10-14 08:46:43,883 epoch 7 - iter 144/723 - loss 0.01534049 - time (sec): 12.24 - samples/sec: 2788.45 - lr: 0.000021 - momentum: 0.000000
2023-10-14 08:46:50,266 epoch 7 - iter 216/723 - loss 0.01640192 - time (sec): 18.62 - samples/sec: 2816.89 - lr: 0.000021 - momentum: 0.000000
2023-10-14 08:46:56,437 epoch 7 - iter 288/723 - loss 0.01582983 - time (sec): 24.79 - samples/sec: 2843.68 - lr: 0.000020 - momentum: 0.000000
2023-10-14 08:47:02,511 epoch 7 - iter 360/723 - loss 0.01684897 - time (sec): 30.87 - samples/sec: 2851.30 - lr: 0.000019 - momentum: 0.000000
2023-10-14 08:47:08,860 epoch 7 - iter 432/723 - loss 0.01771060 - time (sec): 37.22 - samples/sec: 2861.05 - lr: 0.000019 - momentum: 0.000000
2023-10-14 08:47:14,726 epoch 7 - iter 504/723 - loss 0.01806623 - time (sec): 43.08 - samples/sec: 2860.00 - lr: 0.000018 - momentum: 0.000000
2023-10-14 08:47:20,914 epoch 7 - iter 576/723 - loss 0.01763909 - time (sec): 49.27 - samples/sec: 2870.54 - lr: 0.000018 - momentum: 0.000000
2023-10-14 08:47:26,694 epoch 7 - iter 648/723 - loss 0.01807010 - time (sec): 55.05 - samples/sec: 2870.39 - lr: 0.000017 - momentum: 0.000000
2023-10-14 08:47:32,584 epoch 7 - iter 720/723 - loss 0.01793967 - time (sec): 60.94 - samples/sec: 2882.95 - lr: 0.000017 - momentum: 0.000000
2023-10-14 08:47:32,768 ----------------------------------------------------------------------------------------------------
2023-10-14 08:47:32,768 EPOCH 7 done: loss 0.0180 - lr: 0.000017
2023-10-14 08:47:36,309 DEV : loss 0.15739385783672333 - f1-score (micro avg)  0.8091
2023-10-14 08:47:36,330 saving best model
2023-10-14 08:47:36,972 ----------------------------------------------------------------------------------------------------
2023-10-14 08:47:42,658 epoch 8 - iter 72/723 - loss 0.00886684 - time (sec): 5.68 - samples/sec: 3068.63 - lr: 0.000016 - momentum: 0.000000
2023-10-14 08:47:49,383 epoch 8 - iter 144/723 - loss 0.00889193 - time (sec): 12.41 - samples/sec: 2843.55 - lr: 0.000016 - momentum: 0.000000
2023-10-14 08:47:55,741 epoch 8 - iter 216/723 - loss 0.01045971 - time (sec): 18.77 - samples/sec: 2819.84 - lr: 0.000015 - momentum: 0.000000
2023-10-14 08:48:01,528 epoch 8 - iter 288/723 - loss 0.01209747 - time (sec): 24.55 - samples/sec: 2858.09 - lr: 0.000014 - momentum: 0.000000
2023-10-14 08:48:07,734 epoch 8 - iter 360/723 - loss 0.01220357 - time (sec): 30.76 - samples/sec: 2896.09 - lr: 0.000014 - momentum: 0.000000
2023-10-14 08:48:13,536 epoch 8 - iter 432/723 - loss 0.01156847 - time (sec): 36.56 - samples/sec: 2896.66 - lr: 0.000013 - momentum: 0.000000
2023-10-14 08:48:19,226 epoch 8 - iter 504/723 - loss 0.01200927 - time (sec): 42.25 - samples/sec: 2924.08 - lr: 0.000013 - momentum: 0.000000
2023-10-14 08:48:24,682 epoch 8 - iter 576/723 - loss 0.01257051 - time (sec): 47.71 - samples/sec: 2936.85 - lr: 0.000012 - momentum: 0.000000
2023-10-14 08:48:31,135 epoch 8 - iter 648/723 - loss 0.01301488 - time (sec): 54.16 - samples/sec: 2925.42 - lr: 0.000012 - momentum: 0.000000
2023-10-14 08:48:37,310 epoch 8 - iter 720/723 - loss 0.01265192 - time (sec): 60.34 - samples/sec: 2914.65 - lr: 0.000011 - momentum: 0.000000
2023-10-14 08:48:37,477 ----------------------------------------------------------------------------------------------------
2023-10-14 08:48:37,477 EPOCH 8 done: loss 0.0126 - lr: 0.000011
2023-10-14 08:48:41,031 DEV : loss 0.16435782611370087 - f1-score (micro avg)  0.8191
2023-10-14 08:48:41,057 saving best model
2023-10-14 08:48:41,555 ----------------------------------------------------------------------------------------------------
2023-10-14 08:48:47,795 epoch 9 - iter 72/723 - loss 0.01070952 - time (sec): 6.24 - samples/sec: 2923.12 - lr: 0.000011 - momentum: 0.000000
2023-10-14 08:48:54,609 epoch 9 - iter 144/723 - loss 0.01060093 - time (sec): 13.05 - samples/sec: 2817.51 - lr: 0.000010 - momentum: 0.000000
2023-10-14 08:49:00,950 epoch 9 - iter 216/723 - loss 0.01086091 - time (sec): 19.39 - samples/sec: 2884.36 - lr: 0.000009 - momentum: 0.000000
2023-10-14 08:49:06,724 epoch 9 - iter 288/723 - loss 0.01001855 - time (sec): 25.17 - samples/sec: 2866.40 - lr: 0.000009 - momentum: 0.000000
2023-10-14 08:49:13,184 epoch 9 - iter 360/723 - loss 0.00908806 - time (sec): 31.63 - samples/sec: 2871.35 - lr: 0.000008 - momentum: 0.000000
2023-10-14 08:49:18,726 epoch 9 - iter 432/723 - loss 0.00884104 - time (sec): 37.17 - samples/sec: 2881.99 - lr: 0.000008 - momentum: 0.000000
2023-10-14 08:49:24,886 epoch 9 - iter 504/723 - loss 0.00923270 - time (sec): 43.33 - samples/sec: 2865.04 - lr: 0.000007 - momentum: 0.000000
2023-10-14 08:49:30,450 epoch 9 - iter 576/723 - loss 0.00934944 - time (sec): 48.89 - samples/sec: 2866.48 - lr: 0.000007 - momentum: 0.000000
2023-10-14 08:49:36,426 epoch 9 - iter 648/723 - loss 0.00957257 - time (sec): 54.87 - samples/sec: 2870.69 - lr: 0.000006 - momentum: 0.000000
2023-10-14 08:49:42,770 epoch 9 - iter 720/723 - loss 0.00929602 - time (sec): 61.21 - samples/sec: 2869.96 - lr: 0.000006 - momentum: 0.000000
2023-10-14 08:49:42,966 ----------------------------------------------------------------------------------------------------
2023-10-14 08:49:42,966 EPOCH 9 done: loss 0.0093 - lr: 0.000006
2023-10-14 08:49:47,108 DEV : loss 0.19722115993499756 - f1-score (micro avg)  0.7954
2023-10-14 08:49:47,126 ----------------------------------------------------------------------------------------------------
2023-10-14 08:49:53,385 epoch 10 - iter 72/723 - loss 0.00259989 - time (sec): 6.26 - samples/sec: 2936.21 - lr: 0.000005 - momentum: 0.000000
2023-10-14 08:49:59,031 epoch 10 - iter 144/723 - loss 0.00417789 - time (sec): 11.90 - samples/sec: 2913.46 - lr: 0.000004 - momentum: 0.000000
2023-10-14 08:50:05,446 epoch 10 - iter 216/723 - loss 0.00571543 - time (sec): 18.32 - samples/sec: 2873.95 - lr: 0.000004 - momentum: 0.000000
2023-10-14 08:50:11,774 epoch 10 - iter 288/723 - loss 0.00605918 - time (sec): 24.65 - samples/sec: 2877.11 - lr: 0.000003 - momentum: 0.000000
2023-10-14 08:50:17,626 epoch 10 - iter 360/723 - loss 0.00556952 - time (sec): 30.50 - samples/sec: 2880.45 - lr: 0.000003 - momentum: 0.000000
2023-10-14 08:50:24,462 epoch 10 - iter 432/723 - loss 0.00569447 - time (sec): 37.33 - samples/sec: 2868.44 - lr: 0.000002 - momentum: 0.000000
2023-10-14 08:50:30,331 epoch 10 - iter 504/723 - loss 0.00564912 - time (sec): 43.20 - samples/sec: 2872.78 - lr: 0.000002 - momentum: 0.000000
2023-10-14 08:50:36,276 epoch 10 - iter 576/723 - loss 0.00551211 - time (sec): 49.15 - samples/sec: 2871.73 - lr: 0.000001 - momentum: 0.000000
2023-10-14 08:50:42,081 epoch 10 - iter 648/723 - loss 0.00538978 - time (sec): 54.95 - samples/sec: 2871.64 - lr: 0.000001 - momentum: 0.000000
2023-10-14 08:50:48,575 epoch 10 - iter 720/723 - loss 0.00516187 - time (sec): 61.45 - samples/sec: 2862.03 - lr: 0.000000 - momentum: 0.000000
2023-10-14 08:50:48,765 ----------------------------------------------------------------------------------------------------
2023-10-14 08:50:48,765 EPOCH 10 done: loss 0.0052 - lr: 0.000000
2023-10-14 08:50:52,276 DEV : loss 0.1937766820192337 - f1-score (micro avg)  0.8068
2023-10-14 08:50:52,725 ----------------------------------------------------------------------------------------------------
2023-10-14 08:50:52,727 Loading model from best epoch ...
2023-10-14 08:50:55,066 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-14 08:50:57,817 
Results:
- F-score (micro) 0.8168
- F-score (macro) 0.7131
- Accuracy 0.7047

By class:
              precision    recall  f1-score   support

         PER     0.8333    0.8195    0.8264       482
         LOC     0.8821    0.8493    0.8654       458
         ORG     0.4324    0.4638    0.4476        69

   micro avg     0.8251    0.8087    0.8168      1009
   macro avg     0.7160    0.7109    0.7131      1009
weighted avg     0.8280    0.8087    0.8182      1009

2023-10-14 08:50:57,817 ----------------------------------------------------------------------------------------------------
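
For reference, a run with the configuration recorded above could be reproduced with a short Flair script along the following lines. This is a minimal sketch, not the script that produced this log: it assumes a recent Flair release, that the NER_ICDAR_EUROPEANA loader accepts language="nl", and that the hyperparameters are exactly those printed in the "Training Params" and "Plugins" blocks (lr 5e-05, batch size 8, 10 epochs, linear warmup fraction 0.1).

# Sketch of the training setup recorded in this log (assumptions noted above).
from flair.datasets import NER_ICDAR_EUROPEANA
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Dutch split of the ICDAR Europeana NER corpus (5777/722/723 sentences above).
corpus = NER_ICDAR_EUROPEANA(language="nl")
label_dict = corpus.make_label_dictionary(label_type="ner")

# hmBERT backbone; last layer only and first-subtoken pooling, per the
# "poolingfirst-layers-1" part of the model base path.
embeddings = TransformerWordEmbeddings(
    model="dbmdz/bert-base-historic-multilingual-cased",
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
)

# Plain linear head without CRF or RNN, matching the printed SequenceTagger
# (a single Linear(768 -> 13) over the embeddings, CrossEntropyLoss).
tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)

trainer = ModelTrainer(tagger, corpus)
# fine_tune() uses a linear scheduler with warmup by default, which matches
# the "LinearScheduler | warmup_fraction: 0.1" plugin logged above.
trainer.fine_tune(
    "hmbench-icdar/nl-dbmdz/bert-base-historic-multilingual-cased-bs8-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-1",
    learning_rate=5e-05,
    mini_batch_size=8,
    max_epochs=10,
)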