2023-10-14 08:28:49,609 ----------------------------------------------------------------------------------------------------
2023-10-14 08:28:49,610 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(32001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-14 08:28:49,610 ----------------------------------------------------------------------------------------------------
2023-10-14 08:28:49,610 MultiCorpus: 5777 train + 722 dev + 723 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 5777 train + 722 dev + 723 test sentences - /root/.flair/datasets/ner_icdar_europeana/nl
2023-10-14 08:28:49,610 ----------------------------------------------------------------------------------------------------
2023-10-14 08:28:49,610 Train:  5777 sentences
2023-10-14 08:28:49,610         (train_with_dev=False, train_with_test=False)
2023-10-14 08:28:49,610 ----------------------------------------------------------------------------------------------------
2023-10-14 08:28:49,610 Training Params:
2023-10-14 08:28:49,610  - learning_rate: "3e-05" 
2023-10-14 08:28:49,610  - mini_batch_size: "8"
2023-10-14 08:28:49,610  - max_epochs: "10"
2023-10-14 08:28:49,610  - shuffle: "True"
2023-10-14 08:28:49,611 ----------------------------------------------------------------------------------------------------
2023-10-14 08:28:49,611 Plugins:
2023-10-14 08:28:49,611  - LinearScheduler | warmup_fraction: '0.1'
2023-10-14 08:28:49,611 ----------------------------------------------------------------------------------------------------
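The learning-rate column in the per-iteration lines below can be reproduced from these settings. A minimal sketch, assuming the `LinearScheduler` plugin follows the standard linear warmup-then-decay schedule (warmup over the first 10% of the 10 × 723 total steps up to 3e-05, then linear decay to zero); the constants are taken from the logged training parameters, not from Flair's source:

```python
# Sketch of the LR schedule implied by the log (assumption: standard
# linear warmup/decay, as warmup_fraction=0.1 suggests).
MAX_LR = 3e-5                   # learning_rate from the log
STEPS_PER_EPOCH = 723           # iterations per epoch in the log
TOTAL = 10 * STEPS_PER_EPOCH    # max_epochs=10 -> 7230 steps
WARMUP = int(0.1 * TOTAL)       # warmup_fraction 0.1 -> 723 steps

def lr_at(step):
    """Learning rate after `step` optimizer steps."""
    if step < WARMUP:
        return MAX_LR * step / WARMUP               # linear warmup
    return MAX_LR * (TOTAL - step) / (TOTAL - WARMUP)  # linear decay

print(f"{lr_at(72):.6f}")      # ~0.000003, matching iter 72 of epoch 1
print(f"{lr_at(WARMUP):.6f}")  # 0.000030, peak reached at end of epoch 1
print(f"{lr_at(TOTAL):.6f}")   # 0.000000, end of epoch 10
```

This matches the log: lr ramps from 0.000003 at iter 72 to 0.000030 by the end of epoch 1, then falls linearly to 0.000000 at the end of epoch 10.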
2023-10-14 08:28:49,611 Final evaluation on model from best epoch (best-model.pt)
2023-10-14 08:28:49,611  - metric: "('micro avg', 'f1-score')"
2023-10-14 08:28:49,611 ----------------------------------------------------------------------------------------------------
2023-10-14 08:28:49,611 Computation:
2023-10-14 08:28:49,611  - compute on device: cuda:0
2023-10-14 08:28:49,611  - embedding storage: none
2023-10-14 08:28:49,611 ----------------------------------------------------------------------------------------------------
2023-10-14 08:28:49,611 Model training base path: "hmbench-icdar/nl-dbmdz/bert-base-historic-multilingual-cased-bs8-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-1"
2023-10-14 08:28:49,611 ----------------------------------------------------------------------------------------------------
2023-10-14 08:28:49,611 ----------------------------------------------------------------------------------------------------
2023-10-14 08:28:55,399 epoch 1 - iter 72/723 - loss 2.34500859 - time (sec): 5.79 - samples/sec: 2928.05 - lr: 0.000003 - momentum: 0.000000
2023-10-14 08:29:01,032 epoch 1 - iter 144/723 - loss 1.40146179 - time (sec): 11.42 - samples/sec: 2958.85 - lr: 0.000006 - momentum: 0.000000
2023-10-14 08:29:07,150 epoch 1 - iter 216/723 - loss 0.98472208 - time (sec): 17.54 - samples/sec: 2978.52 - lr: 0.000009 - momentum: 0.000000
2023-10-14 08:29:12,983 epoch 1 - iter 288/723 - loss 0.78948184 - time (sec): 23.37 - samples/sec: 2988.51 - lr: 0.000012 - momentum: 0.000000
2023-10-14 08:29:18,706 epoch 1 - iter 360/723 - loss 0.67269994 - time (sec): 29.09 - samples/sec: 2981.53 - lr: 0.000015 - momentum: 0.000000
2023-10-14 08:29:24,406 epoch 1 - iter 432/723 - loss 0.60136728 - time (sec): 34.79 - samples/sec: 2938.07 - lr: 0.000018 - momentum: 0.000000
2023-10-14 08:29:30,603 epoch 1 - iter 504/723 - loss 0.53425447 - time (sec): 40.99 - samples/sec: 2947.40 - lr: 0.000021 - momentum: 0.000000
2023-10-14 08:29:37,056 epoch 1 - iter 576/723 - loss 0.48656421 - time (sec): 47.44 - samples/sec: 2926.64 - lr: 0.000024 - momentum: 0.000000
2023-10-14 08:29:43,158 epoch 1 - iter 648/723 - loss 0.44962888 - time (sec): 53.55 - samples/sec: 2930.20 - lr: 0.000027 - momentum: 0.000000
2023-10-14 08:29:49,227 epoch 1 - iter 720/723 - loss 0.41686927 - time (sec): 59.62 - samples/sec: 2944.90 - lr: 0.000030 - momentum: 0.000000
2023-10-14 08:29:49,488 ----------------------------------------------------------------------------------------------------
2023-10-14 08:29:49,488 EPOCH 1 done: loss 0.4157 - lr: 0.000030
2023-10-14 08:29:52,663 DEV : loss 0.16736090183258057 - f1-score (micro avg)  0.5276
2023-10-14 08:29:52,698 saving best model
2023-10-14 08:29:53,052 ----------------------------------------------------------------------------------------------------
2023-10-14 08:29:58,787 epoch 2 - iter 72/723 - loss 0.15097610 - time (sec): 5.73 - samples/sec: 3028.16 - lr: 0.000030 - momentum: 0.000000
2023-10-14 08:30:04,682 epoch 2 - iter 144/723 - loss 0.13684182 - time (sec): 11.63 - samples/sec: 2960.60 - lr: 0.000029 - momentum: 0.000000
2023-10-14 08:30:10,736 epoch 2 - iter 216/723 - loss 0.13585466 - time (sec): 17.68 - samples/sec: 2945.67 - lr: 0.000029 - momentum: 0.000000
2023-10-14 08:30:16,269 epoch 2 - iter 288/723 - loss 0.12788984 - time (sec): 23.22 - samples/sec: 2985.08 - lr: 0.000029 - momentum: 0.000000
2023-10-14 08:30:22,637 epoch 2 - iter 360/723 - loss 0.12683567 - time (sec): 29.58 - samples/sec: 2963.23 - lr: 0.000028 - momentum: 0.000000
2023-10-14 08:30:28,367 epoch 2 - iter 432/723 - loss 0.12355013 - time (sec): 35.31 - samples/sec: 2965.98 - lr: 0.000028 - momentum: 0.000000
2023-10-14 08:30:34,488 epoch 2 - iter 504/723 - loss 0.12414425 - time (sec): 41.44 - samples/sec: 2962.13 - lr: 0.000028 - momentum: 0.000000
2023-10-14 08:30:39,830 epoch 2 - iter 576/723 - loss 0.12115131 - time (sec): 46.78 - samples/sec: 2963.61 - lr: 0.000027 - momentum: 0.000000
2023-10-14 08:30:46,149 epoch 2 - iter 648/723 - loss 0.11808836 - time (sec): 53.10 - samples/sec: 2972.27 - lr: 0.000027 - momentum: 0.000000
2023-10-14 08:30:52,188 epoch 2 - iter 720/723 - loss 0.11598087 - time (sec): 59.14 - samples/sec: 2970.40 - lr: 0.000027 - momentum: 0.000000
2023-10-14 08:30:52,397 ----------------------------------------------------------------------------------------------------
2023-10-14 08:30:52,397 EPOCH 2 done: loss 0.1160 - lr: 0.000027
2023-10-14 08:30:56,289 DEV : loss 0.09956270456314087 - f1-score (micro avg)  0.7529
2023-10-14 08:30:56,304 saving best model
2023-10-14 08:30:56,754 ----------------------------------------------------------------------------------------------------
2023-10-14 08:31:02,841 epoch 3 - iter 72/723 - loss 0.07480008 - time (sec): 6.09 - samples/sec: 2911.61 - lr: 0.000026 - momentum: 0.000000
2023-10-14 08:31:08,895 epoch 3 - iter 144/723 - loss 0.06907493 - time (sec): 12.14 - samples/sec: 2929.73 - lr: 0.000026 - momentum: 0.000000
2023-10-14 08:31:14,786 epoch 3 - iter 216/723 - loss 0.07126848 - time (sec): 18.03 - samples/sec: 2950.59 - lr: 0.000026 - momentum: 0.000000
2023-10-14 08:31:20,680 epoch 3 - iter 288/723 - loss 0.06958853 - time (sec): 23.92 - samples/sec: 2960.31 - lr: 0.000025 - momentum: 0.000000
2023-10-14 08:31:26,523 epoch 3 - iter 360/723 - loss 0.06902856 - time (sec): 29.77 - samples/sec: 2974.29 - lr: 0.000025 - momentum: 0.000000
2023-10-14 08:31:32,101 epoch 3 - iter 432/723 - loss 0.06866383 - time (sec): 35.35 - samples/sec: 3001.94 - lr: 0.000025 - momentum: 0.000000
2023-10-14 08:31:38,269 epoch 3 - iter 504/723 - loss 0.07120681 - time (sec): 41.51 - samples/sec: 2963.35 - lr: 0.000024 - momentum: 0.000000
2023-10-14 08:31:44,320 epoch 3 - iter 576/723 - loss 0.06999573 - time (sec): 47.56 - samples/sec: 2969.47 - lr: 0.000024 - momentum: 0.000000
2023-10-14 08:31:50,039 epoch 3 - iter 648/723 - loss 0.06994277 - time (sec): 53.28 - samples/sec: 2982.09 - lr: 0.000024 - momentum: 0.000000
2023-10-14 08:31:56,214 epoch 3 - iter 720/723 - loss 0.06941747 - time (sec): 59.46 - samples/sec: 2956.51 - lr: 0.000023 - momentum: 0.000000
2023-10-14 08:31:56,392 ----------------------------------------------------------------------------------------------------
2023-10-14 08:31:56,392 EPOCH 3 done: loss 0.0694 - lr: 0.000023
2023-10-14 08:31:59,885 DEV : loss 0.09209852665662766 - f1-score (micro avg)  0.7741
2023-10-14 08:31:59,909 saving best model
2023-10-14 08:32:00,444 ----------------------------------------------------------------------------------------------------
2023-10-14 08:32:07,148 epoch 4 - iter 72/723 - loss 0.04111019 - time (sec): 6.70 - samples/sec: 2691.45 - lr: 0.000023 - momentum: 0.000000
2023-10-14 08:32:13,780 epoch 4 - iter 144/723 - loss 0.03743022 - time (sec): 13.33 - samples/sec: 2745.82 - lr: 0.000023 - momentum: 0.000000
2023-10-14 08:32:19,590 epoch 4 - iter 216/723 - loss 0.04032334 - time (sec): 19.14 - samples/sec: 2778.83 - lr: 0.000022 - momentum: 0.000000
2023-10-14 08:32:25,877 epoch 4 - iter 288/723 - loss 0.04209697 - time (sec): 25.43 - samples/sec: 2803.23 - lr: 0.000022 - momentum: 0.000000
2023-10-14 08:32:31,717 epoch 4 - iter 360/723 - loss 0.04334243 - time (sec): 31.27 - samples/sec: 2834.19 - lr: 0.000022 - momentum: 0.000000
2023-10-14 08:32:37,415 epoch 4 - iter 432/723 - loss 0.04378885 - time (sec): 36.97 - samples/sec: 2846.93 - lr: 0.000021 - momentum: 0.000000
2023-10-14 08:32:43,029 epoch 4 - iter 504/723 - loss 0.04269892 - time (sec): 42.58 - samples/sec: 2864.00 - lr: 0.000021 - momentum: 0.000000
2023-10-14 08:32:49,370 epoch 4 - iter 576/723 - loss 0.04332948 - time (sec): 48.92 - samples/sec: 2871.55 - lr: 0.000021 - momentum: 0.000000
2023-10-14 08:32:55,288 epoch 4 - iter 648/723 - loss 0.04450354 - time (sec): 54.84 - samples/sec: 2867.88 - lr: 0.000020 - momentum: 0.000000
2023-10-14 08:33:01,292 epoch 4 - iter 720/723 - loss 0.04532438 - time (sec): 60.84 - samples/sec: 2888.17 - lr: 0.000020 - momentum: 0.000000
2023-10-14 08:33:01,510 ----------------------------------------------------------------------------------------------------
2023-10-14 08:33:01,510 EPOCH 4 done: loss 0.0459 - lr: 0.000020
2023-10-14 08:33:04,963 DEV : loss 0.09347887337207794 - f1-score (micro avg)  0.7897
2023-10-14 08:33:04,979 saving best model
2023-10-14 08:33:05,498 ----------------------------------------------------------------------------------------------------
2023-10-14 08:33:11,770 epoch 5 - iter 72/723 - loss 0.02961405 - time (sec): 6.27 - samples/sec: 2830.72 - lr: 0.000020 - momentum: 0.000000
2023-10-14 08:33:17,831 epoch 5 - iter 144/723 - loss 0.03413594 - time (sec): 12.33 - samples/sec: 2887.48 - lr: 0.000019 - momentum: 0.000000
2023-10-14 08:33:23,316 epoch 5 - iter 216/723 - loss 0.03172083 - time (sec): 17.82 - samples/sec: 2945.68 - lr: 0.000019 - momentum: 0.000000
2023-10-14 08:33:29,114 epoch 5 - iter 288/723 - loss 0.03207149 - time (sec): 23.62 - samples/sec: 2961.38 - lr: 0.000019 - momentum: 0.000000
2023-10-14 08:33:34,760 epoch 5 - iter 360/723 - loss 0.03130345 - time (sec): 29.26 - samples/sec: 2995.97 - lr: 0.000018 - momentum: 0.000000
2023-10-14 08:33:41,220 epoch 5 - iter 432/723 - loss 0.03111424 - time (sec): 35.72 - samples/sec: 2964.12 - lr: 0.000018 - momentum: 0.000000
2023-10-14 08:33:46,931 epoch 5 - iter 504/723 - loss 0.03173225 - time (sec): 41.43 - samples/sec: 2962.14 - lr: 0.000018 - momentum: 0.000000
2023-10-14 08:33:52,899 epoch 5 - iter 576/723 - loss 0.03178811 - time (sec): 47.40 - samples/sec: 2966.06 - lr: 0.000017 - momentum: 0.000000
2023-10-14 08:33:59,266 epoch 5 - iter 648/723 - loss 0.03309214 - time (sec): 53.77 - samples/sec: 2947.98 - lr: 0.000017 - momentum: 0.000000
2023-10-14 08:34:04,879 epoch 5 - iter 720/723 - loss 0.03278333 - time (sec): 59.38 - samples/sec: 2955.61 - lr: 0.000017 - momentum: 0.000000
2023-10-14 08:34:05,223 ----------------------------------------------------------------------------------------------------
2023-10-14 08:34:05,223 EPOCH 5 done: loss 0.0327 - lr: 0.000017
2023-10-14 08:34:09,605 DEV : loss 0.1093878448009491 - f1-score (micro avg)  0.8056
2023-10-14 08:34:09,627 saving best model
2023-10-14 08:34:10,174 ----------------------------------------------------------------------------------------------------
2023-10-14 08:34:16,234 epoch 6 - iter 72/723 - loss 0.02664406 - time (sec): 6.06 - samples/sec: 2967.43 - lr: 0.000016 - momentum: 0.000000
2023-10-14 08:34:22,424 epoch 6 - iter 144/723 - loss 0.02610538 - time (sec): 12.25 - samples/sec: 2951.86 - lr: 0.000016 - momentum: 0.000000
2023-10-14 08:34:28,422 epoch 6 - iter 216/723 - loss 0.02632130 - time (sec): 18.25 - samples/sec: 2958.67 - lr: 0.000016 - momentum: 0.000000
2023-10-14 08:34:34,905 epoch 6 - iter 288/723 - loss 0.02918746 - time (sec): 24.73 - samples/sec: 2906.90 - lr: 0.000015 - momentum: 0.000000
2023-10-14 08:34:40,499 epoch 6 - iter 360/723 - loss 0.02786739 - time (sec): 30.32 - samples/sec: 2923.14 - lr: 0.000015 - momentum: 0.000000
2023-10-14 08:34:46,405 epoch 6 - iter 432/723 - loss 0.02591985 - time (sec): 36.23 - samples/sec: 2914.88 - lr: 0.000015 - momentum: 0.000000
2023-10-14 08:34:52,512 epoch 6 - iter 504/723 - loss 0.02585531 - time (sec): 42.34 - samples/sec: 2913.63 - lr: 0.000014 - momentum: 0.000000
2023-10-14 08:34:58,755 epoch 6 - iter 576/723 - loss 0.02673939 - time (sec): 48.58 - samples/sec: 2916.01 - lr: 0.000014 - momentum: 0.000000
2023-10-14 08:35:04,619 epoch 6 - iter 648/723 - loss 0.02626021 - time (sec): 54.44 - samples/sec: 2908.49 - lr: 0.000014 - momentum: 0.000000
2023-10-14 08:35:10,343 epoch 6 - iter 720/723 - loss 0.02645765 - time (sec): 60.17 - samples/sec: 2919.78 - lr: 0.000013 - momentum: 0.000000
2023-10-14 08:35:10,568 ----------------------------------------------------------------------------------------------------
2023-10-14 08:35:10,568 EPOCH 6 done: loss 0.0264 - lr: 0.000013
2023-10-14 08:35:14,104 DEV : loss 0.1391119509935379 - f1-score (micro avg)  0.7855
2023-10-14 08:35:14,124 ----------------------------------------------------------------------------------------------------
2023-10-14 08:35:20,351 epoch 7 - iter 72/723 - loss 0.01350535 - time (sec): 6.23 - samples/sec: 2796.08 - lr: 0.000013 - momentum: 0.000000
2023-10-14 08:35:26,465 epoch 7 - iter 144/723 - loss 0.01712040 - time (sec): 12.34 - samples/sec: 2766.13 - lr: 0.000013 - momentum: 0.000000
2023-10-14 08:35:33,034 epoch 7 - iter 216/723 - loss 0.01751483 - time (sec): 18.91 - samples/sec: 2774.49 - lr: 0.000012 - momentum: 0.000000
2023-10-14 08:35:39,319 epoch 7 - iter 288/723 - loss 0.01623115 - time (sec): 25.19 - samples/sec: 2798.65 - lr: 0.000012 - momentum: 0.000000
2023-10-14 08:35:45,189 epoch 7 - iter 360/723 - loss 0.01745382 - time (sec): 31.06 - samples/sec: 2833.38 - lr: 0.000012 - momentum: 0.000000
2023-10-14 08:35:51,313 epoch 7 - iter 432/723 - loss 0.01897664 - time (sec): 37.19 - samples/sec: 2863.45 - lr: 0.000011 - momentum: 0.000000
2023-10-14 08:35:56,993 epoch 7 - iter 504/723 - loss 0.01893676 - time (sec): 42.87 - samples/sec: 2874.41 - lr: 0.000011 - momentum: 0.000000
2023-10-14 08:36:02,983 epoch 7 - iter 576/723 - loss 0.01873928 - time (sec): 48.86 - samples/sec: 2894.89 - lr: 0.000011 - momentum: 0.000000
2023-10-14 08:36:08,710 epoch 7 - iter 648/723 - loss 0.01874669 - time (sec): 54.59 - samples/sec: 2894.98 - lr: 0.000010 - momentum: 0.000000
2023-10-14 08:36:14,582 epoch 7 - iter 720/723 - loss 0.01849634 - time (sec): 60.46 - samples/sec: 2906.14 - lr: 0.000010 - momentum: 0.000000
2023-10-14 08:36:14,763 ----------------------------------------------------------------------------------------------------
2023-10-14 08:36:14,763 EPOCH 7 done: loss 0.0185 - lr: 0.000010
2023-10-14 08:36:18,275 DEV : loss 0.17075838148593903 - f1-score (micro avg)  0.8055
2023-10-14 08:36:18,295 ----------------------------------------------------------------------------------------------------
2023-10-14 08:36:24,144 epoch 8 - iter 72/723 - loss 0.01760211 - time (sec): 5.85 - samples/sec: 2982.67 - lr: 0.000010 - momentum: 0.000000
2023-10-14 08:36:30,981 epoch 8 - iter 144/723 - loss 0.01623072 - time (sec): 12.68 - samples/sec: 2781.67 - lr: 0.000009 - momentum: 0.000000
2023-10-14 08:36:36,973 epoch 8 - iter 216/723 - loss 0.01533673 - time (sec): 18.68 - samples/sec: 2833.48 - lr: 0.000009 - momentum: 0.000000
2023-10-14 08:36:42,799 epoch 8 - iter 288/723 - loss 0.01601558 - time (sec): 24.50 - samples/sec: 2864.13 - lr: 0.000009 - momentum: 0.000000
2023-10-14 08:36:49,045 epoch 8 - iter 360/723 - loss 0.01466925 - time (sec): 30.75 - samples/sec: 2897.17 - lr: 0.000008 - momentum: 0.000000
2023-10-14 08:36:54,836 epoch 8 - iter 432/723 - loss 0.01402939 - time (sec): 36.54 - samples/sec: 2898.47 - lr: 0.000008 - momentum: 0.000000
2023-10-14 08:37:00,513 epoch 8 - iter 504/723 - loss 0.01405464 - time (sec): 42.22 - samples/sec: 2926.51 - lr: 0.000008 - momentum: 0.000000
2023-10-14 08:37:06,081 epoch 8 - iter 576/723 - loss 0.01491234 - time (sec): 47.78 - samples/sec: 2932.12 - lr: 0.000007 - momentum: 0.000000
2023-10-14 08:37:12,626 epoch 8 - iter 648/723 - loss 0.01502329 - time (sec): 54.33 - samples/sec: 2916.30 - lr: 0.000007 - momentum: 0.000000
2023-10-14 08:37:18,676 epoch 8 - iter 720/723 - loss 0.01521053 - time (sec): 60.38 - samples/sec: 2912.50 - lr: 0.000007 - momentum: 0.000000
2023-10-14 08:37:18,861 ----------------------------------------------------------------------------------------------------
2023-10-14 08:37:18,861 EPOCH 8 done: loss 0.0152 - lr: 0.000007
2023-10-14 08:37:23,265 DEV : loss 0.17406047880649567 - f1-score (micro avg)  0.7968
2023-10-14 08:37:23,286 ----------------------------------------------------------------------------------------------------
2023-10-14 08:37:29,454 epoch 9 - iter 72/723 - loss 0.01059210 - time (sec): 6.17 - samples/sec: 2956.17 - lr: 0.000006 - momentum: 0.000000
2023-10-14 08:37:36,031 epoch 9 - iter 144/723 - loss 0.01193666 - time (sec): 12.74 - samples/sec: 2885.31 - lr: 0.000006 - momentum: 0.000000
2023-10-14 08:37:42,240 epoch 9 - iter 216/723 - loss 0.01045503 - time (sec): 18.95 - samples/sec: 2951.17 - lr: 0.000006 - momentum: 0.000000
2023-10-14 08:37:47,987 epoch 9 - iter 288/723 - loss 0.00976298 - time (sec): 24.70 - samples/sec: 2920.37 - lr: 0.000005 - momentum: 0.000000
2023-10-14 08:37:54,314 epoch 9 - iter 360/723 - loss 0.01051185 - time (sec): 31.03 - samples/sec: 2926.68 - lr: 0.000005 - momentum: 0.000000
2023-10-14 08:37:59,729 epoch 9 - iter 432/723 - loss 0.01007391 - time (sec): 36.44 - samples/sec: 2939.41 - lr: 0.000005 - momentum: 0.000000
2023-10-14 08:38:05,709 epoch 9 - iter 504/723 - loss 0.01053006 - time (sec): 42.42 - samples/sec: 2926.20 - lr: 0.000004 - momentum: 0.000000
2023-10-14 08:38:11,121 epoch 9 - iter 576/723 - loss 0.01019871 - time (sec): 47.83 - samples/sec: 2929.84 - lr: 0.000004 - momentum: 0.000000
2023-10-14 08:38:17,094 epoch 9 - iter 648/723 - loss 0.01035051 - time (sec): 53.81 - samples/sec: 2927.27 - lr: 0.000004 - momentum: 0.000000
2023-10-14 08:38:23,337 epoch 9 - iter 720/723 - loss 0.01068224 - time (sec): 60.05 - samples/sec: 2925.47 - lr: 0.000003 - momentum: 0.000000
2023-10-14 08:38:23,534 ----------------------------------------------------------------------------------------------------
2023-10-14 08:38:23,534 EPOCH 9 done: loss 0.0107 - lr: 0.000003
2023-10-14 08:38:27,028 DEV : loss 0.1967579573392868 - f1-score (micro avg)  0.7972
2023-10-14 08:38:27,047 ----------------------------------------------------------------------------------------------------
2023-10-14 08:38:33,194 epoch 10 - iter 72/723 - loss 0.00260707 - time (sec): 6.15 - samples/sec: 2990.09 - lr: 0.000003 - momentum: 0.000000
2023-10-14 08:38:38,655 epoch 10 - iter 144/723 - loss 0.00602904 - time (sec): 11.61 - samples/sec: 2988.23 - lr: 0.000003 - momentum: 0.000000
2023-10-14 08:38:44,750 epoch 10 - iter 216/723 - loss 0.01005193 - time (sec): 17.70 - samples/sec: 2974.13 - lr: 0.000002 - momentum: 0.000000
2023-10-14 08:38:51,428 epoch 10 - iter 288/723 - loss 0.00877443 - time (sec): 24.38 - samples/sec: 2908.63 - lr: 0.000002 - momentum: 0.000000
2023-10-14 08:38:56,998 epoch 10 - iter 360/723 - loss 0.00837203 - time (sec): 29.95 - samples/sec: 2933.31 - lr: 0.000002 - momentum: 0.000000
2023-10-14 08:39:03,575 epoch 10 - iter 432/723 - loss 0.00824030 - time (sec): 36.53 - samples/sec: 2931.89 - lr: 0.000001 - momentum: 0.000000
2023-10-14 08:39:09,222 epoch 10 - iter 504/723 - loss 0.00865644 - time (sec): 42.17 - samples/sec: 2942.99 - lr: 0.000001 - momentum: 0.000000
2023-10-14 08:39:14,978 epoch 10 - iter 576/723 - loss 0.00897306 - time (sec): 47.93 - samples/sec: 2944.84 - lr: 0.000001 - momentum: 0.000000
2023-10-14 08:39:20,723 epoch 10 - iter 648/723 - loss 0.00883926 - time (sec): 53.67 - samples/sec: 2940.14 - lr: 0.000000 - momentum: 0.000000
2023-10-14 08:39:26,903 epoch 10 - iter 720/723 - loss 0.00871898 - time (sec): 59.85 - samples/sec: 2938.23 - lr: 0.000000 - momentum: 0.000000
2023-10-14 08:39:27,070 ----------------------------------------------------------------------------------------------------
2023-10-14 08:39:27,070 EPOCH 10 done: loss 0.0088 - lr: 0.000000
2023-10-14 08:39:30,587 DEV : loss 0.2016027718782425 - f1-score (micro avg)  0.7978
2023-10-14 08:39:30,972 ----------------------------------------------------------------------------------------------------
2023-10-14 08:39:30,973 Loading model from best epoch ...
2023-10-14 08:39:32,712 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-14 08:39:35,860 
Results:
- F-score (micro) 0.7954
- F-score (macro) 0.6863
- Accuracy 0.6765

By class:
              precision    recall  f1-score   support

         PER     0.7629    0.8610    0.8090       482
         LOC     0.8997    0.7838    0.8378       458
         ORG     0.4355    0.3913    0.4122        69

   micro avg     0.7970    0.7939    0.7954      1009
   macro avg     0.6994    0.6787    0.6863      1009
weighted avg     0.8026    0.7939    0.7949      1009

2023-10-14 08:39:35,861 ----------------------------------------------------------------------------------------------------
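The aggregate F-scores above follow directly from the per-class table: the micro average is the F1 of the pooled precision/recall, and the macro average is the unweighted mean of the per-class F1 scores. A quick check using only numbers copied from the table:

```python
# Recompute the reported aggregates from the "By class" table.

def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Per-class F1 scores from the table.
class_f1 = {"PER": 0.8090, "LOC": 0.8378, "ORG": 0.4122}

# Micro avg: F1 of the pooled precision (0.7970) and recall (0.7939).
micro_f1 = f1(0.7970, 0.7939)

# Macro avg: unweighted mean of the per-class F1 scores.
macro_f1 = sum(class_f1.values()) / len(class_f1)

print(round(micro_f1, 4))  # 0.7954, matching "F-score (micro)"
print(round(macro_f1, 4))  # 0.6863, matching "F-score (macro)"
```

Both recomputed values agree with the final evaluation, so the reported test scores are internally consistent.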