2023-10-14 10:57:26,424 ----------------------------------------------------------------------------------------------------
2023-10-14 10:57:26,425 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(32001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-14 10:57:26,425 ----------------------------------------------------------------------------------------------------
2023-10-14 10:57:26,425 MultiCorpus: 5777 train + 722 dev + 723 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 5777 train + 722 dev + 723 test sentences - /root/.flair/datasets/ner_icdar_europeana/nl
2023-10-14 10:57:26,425 ----------------------------------------------------------------------------------------------------
2023-10-14 10:57:26,425 Train:  5777 sentences
2023-10-14 10:57:26,425         (train_with_dev=False, train_with_test=False)
2023-10-14 10:57:26,425 ----------------------------------------------------------------------------------------------------
2023-10-14 10:57:26,425 Training Params:
2023-10-14 10:57:26,425  - learning_rate: "3e-05" 
2023-10-14 10:57:26,425  - mini_batch_size: "8"
2023-10-14 10:57:26,425  - max_epochs: "10"
2023-10-14 10:57:26,425  - shuffle: "True"
2023-10-14 10:57:26,425 ----------------------------------------------------------------------------------------------------
2023-10-14 10:57:26,426 Plugins:
2023-10-14 10:57:26,426  - LinearScheduler | warmup_fraction: '0.1'
2023-10-14 10:57:26,426 ----------------------------------------------------------------------------------------------------
2023-10-14 10:57:26,426 Final evaluation on model from best epoch (best-model.pt)
2023-10-14 10:57:26,426  - metric: "('micro avg', 'f1-score')"
2023-10-14 10:57:26,426 ----------------------------------------------------------------------------------------------------
2023-10-14 10:57:26,426 Computation:
2023-10-14 10:57:26,426  - compute on device: cuda:0
2023-10-14 10:57:26,426  - embedding storage: none
2023-10-14 10:57:26,426 ----------------------------------------------------------------------------------------------------
2023-10-14 10:57:26,426 Model training base path: "hmbench-icdar/nl-dbmdz/bert-base-historic-multilingual-cased-bs8-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-4"
2023-10-14 10:57:26,426 ----------------------------------------------------------------------------------------------------
2023-10-14 10:57:26,426 ----------------------------------------------------------------------------------------------------
2023-10-14 10:57:32,692 epoch 1 - iter 72/723 - loss 2.31506993 - time (sec): 6.26 - samples/sec: 2976.58 - lr: 0.000003 - momentum: 0.000000
2023-10-14 10:57:38,328 epoch 1 - iter 144/723 - loss 1.40098098 - time (sec): 11.90 - samples/sec: 3035.24 - lr: 0.000006 - momentum: 0.000000
2023-10-14 10:57:44,275 epoch 1 - iter 216/723 - loss 1.02739094 - time (sec): 17.85 - samples/sec: 2985.78 - lr: 0.000009 - momentum: 0.000000
2023-10-14 10:57:49,976 epoch 1 - iter 288/723 - loss 0.82673104 - time (sec): 23.55 - samples/sec: 2980.75 - lr: 0.000012 - momentum: 0.000000
2023-10-14 10:57:55,969 epoch 1 - iter 360/723 - loss 0.69531849 - time (sec): 29.54 - samples/sec: 2988.43 - lr: 0.000015 - momentum: 0.000000
2023-10-14 10:58:01,677 epoch 1 - iter 432/723 - loss 0.60767546 - time (sec): 35.25 - samples/sec: 3009.88 - lr: 0.000018 - momentum: 0.000000
2023-10-14 10:58:07,438 epoch 1 - iter 504/723 - loss 0.54162968 - time (sec): 41.01 - samples/sec: 3014.13 - lr: 0.000021 - momentum: 0.000000
2023-10-14 10:58:14,071 epoch 1 - iter 576/723 - loss 0.48974416 - time (sec): 47.64 - samples/sec: 2993.71 - lr: 0.000024 - momentum: 0.000000
2023-10-14 10:58:20,287 epoch 1 - iter 648/723 - loss 0.45020073 - time (sec): 53.86 - samples/sec: 2969.41 - lr: 0.000027 - momentum: 0.000000
2023-10-14 10:58:25,550 epoch 1 - iter 720/723 - loss 0.42217115 - time (sec): 59.12 - samples/sec: 2971.06 - lr: 0.000030 - momentum: 0.000000
2023-10-14 10:58:25,760 ----------------------------------------------------------------------------------------------------
2023-10-14 10:58:25,760 EPOCH 1 done: loss 0.4213 - lr: 0.000030
2023-10-14 10:58:28,750 DEV : loss 0.12855297327041626 - f1-score (micro avg)  0.7041
2023-10-14 10:58:28,768 saving best model
2023-10-14 10:58:29,152 ----------------------------------------------------------------------------------------------------
2023-10-14 10:58:34,794 epoch 2 - iter 72/723 - loss 0.12156124 - time (sec): 5.64 - samples/sec: 2875.61 - lr: 0.000030 - momentum: 0.000000
2023-10-14 10:58:40,804 epoch 2 - iter 144/723 - loss 0.11146254 - time (sec): 11.65 - samples/sec: 2900.46 - lr: 0.000029 - momentum: 0.000000
2023-10-14 10:58:46,896 epoch 2 - iter 216/723 - loss 0.11686624 - time (sec): 17.74 - samples/sec: 2917.23 - lr: 0.000029 - momentum: 0.000000
2023-10-14 10:58:53,571 epoch 2 - iter 288/723 - loss 0.11075586 - time (sec): 24.42 - samples/sec: 2902.96 - lr: 0.000029 - momentum: 0.000000
2023-10-14 10:58:59,700 epoch 2 - iter 360/723 - loss 0.10651151 - time (sec): 30.55 - samples/sec: 2907.89 - lr: 0.000028 - momentum: 0.000000
2023-10-14 10:59:05,443 epoch 2 - iter 432/723 - loss 0.10541663 - time (sec): 36.29 - samples/sec: 2909.61 - lr: 0.000028 - momentum: 0.000000
2023-10-14 10:59:10,993 epoch 2 - iter 504/723 - loss 0.10566055 - time (sec): 41.84 - samples/sec: 2918.83 - lr: 0.000028 - momentum: 0.000000
2023-10-14 10:59:16,688 epoch 2 - iter 576/723 - loss 0.10315725 - time (sec): 47.53 - samples/sec: 2933.43 - lr: 0.000027 - momentum: 0.000000
2023-10-14 10:59:22,656 epoch 2 - iter 648/723 - loss 0.10168054 - time (sec): 53.50 - samples/sec: 2939.05 - lr: 0.000027 - momentum: 0.000000
2023-10-14 10:59:28,642 epoch 2 - iter 720/723 - loss 0.10158311 - time (sec): 59.49 - samples/sec: 2953.47 - lr: 0.000027 - momentum: 0.000000
2023-10-14 10:59:28,880 ----------------------------------------------------------------------------------------------------
2023-10-14 10:59:28,880 EPOCH 2 done: loss 0.1014 - lr: 0.000027
2023-10-14 10:59:32,771 DEV : loss 0.0938434973359108 - f1-score (micro avg)  0.7002
2023-10-14 10:59:32,788 ----------------------------------------------------------------------------------------------------
2023-10-14 10:59:38,964 epoch 3 - iter 72/723 - loss 0.06077009 - time (sec): 6.17 - samples/sec: 2935.87 - lr: 0.000026 - momentum: 0.000000
2023-10-14 10:59:45,122 epoch 3 - iter 144/723 - loss 0.06201343 - time (sec): 12.33 - samples/sec: 2887.51 - lr: 0.000026 - momentum: 0.000000
2023-10-14 10:59:50,971 epoch 3 - iter 216/723 - loss 0.06525859 - time (sec): 18.18 - samples/sec: 2855.52 - lr: 0.000026 - momentum: 0.000000
2023-10-14 10:59:56,686 epoch 3 - iter 288/723 - loss 0.06305458 - time (sec): 23.90 - samples/sec: 2897.57 - lr: 0.000025 - momentum: 0.000000
2023-10-14 11:00:02,887 epoch 3 - iter 360/723 - loss 0.06172832 - time (sec): 30.10 - samples/sec: 2913.59 - lr: 0.000025 - momentum: 0.000000
2023-10-14 11:00:08,780 epoch 3 - iter 432/723 - loss 0.06330693 - time (sec): 35.99 - samples/sec: 2921.37 - lr: 0.000025 - momentum: 0.000000
2023-10-14 11:00:15,222 epoch 3 - iter 504/723 - loss 0.06448096 - time (sec): 42.43 - samples/sec: 2919.30 - lr: 0.000024 - momentum: 0.000000
2023-10-14 11:00:20,867 epoch 3 - iter 576/723 - loss 0.06382838 - time (sec): 48.08 - samples/sec: 2918.70 - lr: 0.000024 - momentum: 0.000000
2023-10-14 11:00:26,929 epoch 3 - iter 648/723 - loss 0.06273635 - time (sec): 54.14 - samples/sec: 2908.32 - lr: 0.000024 - momentum: 0.000000
2023-10-14 11:00:33,161 epoch 3 - iter 720/723 - loss 0.06321271 - time (sec): 60.37 - samples/sec: 2905.22 - lr: 0.000023 - momentum: 0.000000
2023-10-14 11:00:33,487 ----------------------------------------------------------------------------------------------------
2023-10-14 11:00:33,488 EPOCH 3 done: loss 0.0631 - lr: 0.000023
2023-10-14 11:00:36,981 DEV : loss 0.08630853146314621 - f1-score (micro avg)  0.8069
2023-10-14 11:00:36,999 saving best model
2023-10-14 11:00:37,532 ----------------------------------------------------------------------------------------------------
2023-10-14 11:00:43,552 epoch 4 - iter 72/723 - loss 0.03595234 - time (sec): 6.02 - samples/sec: 2919.12 - lr: 0.000023 - momentum: 0.000000
2023-10-14 11:00:49,964 epoch 4 - iter 144/723 - loss 0.04721867 - time (sec): 12.43 - samples/sec: 2884.50 - lr: 0.000023 - momentum: 0.000000
2023-10-14 11:00:56,286 epoch 4 - iter 216/723 - loss 0.04688850 - time (sec): 18.75 - samples/sec: 2815.40 - lr: 0.000022 - momentum: 0.000000
2023-10-14 11:01:02,694 epoch 4 - iter 288/723 - loss 0.04288223 - time (sec): 25.16 - samples/sec: 2808.48 - lr: 0.000022 - momentum: 0.000000
2023-10-14 11:01:08,221 epoch 4 - iter 360/723 - loss 0.04159157 - time (sec): 30.69 - samples/sec: 2840.38 - lr: 0.000022 - momentum: 0.000000
2023-10-14 11:01:14,265 epoch 4 - iter 432/723 - loss 0.04113805 - time (sec): 36.73 - samples/sec: 2877.65 - lr: 0.000021 - momentum: 0.000000
2023-10-14 11:01:20,221 epoch 4 - iter 504/723 - loss 0.04112768 - time (sec): 42.69 - samples/sec: 2875.66 - lr: 0.000021 - momentum: 0.000000
2023-10-14 11:01:26,212 epoch 4 - iter 576/723 - loss 0.04139046 - time (sec): 48.68 - samples/sec: 2886.33 - lr: 0.000021 - momentum: 0.000000
2023-10-14 11:01:32,305 epoch 4 - iter 648/723 - loss 0.04073847 - time (sec): 54.77 - samples/sec: 2895.25 - lr: 0.000020 - momentum: 0.000000
2023-10-14 11:01:38,302 epoch 4 - iter 720/723 - loss 0.04096798 - time (sec): 60.77 - samples/sec: 2889.77 - lr: 0.000020 - momentum: 0.000000
2023-10-14 11:01:38,500 ----------------------------------------------------------------------------------------------------
2023-10-14 11:01:38,501 EPOCH 4 done: loss 0.0412 - lr: 0.000020
2023-10-14 11:01:42,023 DEV : loss 0.08693055063486099 - f1-score (micro avg)  0.8288
2023-10-14 11:01:42,042 saving best model
2023-10-14 11:01:42,516 ----------------------------------------------------------------------------------------------------
2023-10-14 11:01:48,799 epoch 5 - iter 72/723 - loss 0.03262287 - time (sec): 6.28 - samples/sec: 2932.38 - lr: 0.000020 - momentum: 0.000000
2023-10-14 11:01:54,248 epoch 5 - iter 144/723 - loss 0.03076967 - time (sec): 11.73 - samples/sec: 3021.49 - lr: 0.000019 - momentum: 0.000000
2023-10-14 11:02:00,487 epoch 5 - iter 216/723 - loss 0.02828212 - time (sec): 17.97 - samples/sec: 3012.82 - lr: 0.000019 - momentum: 0.000000
2023-10-14 11:02:06,303 epoch 5 - iter 288/723 - loss 0.03125818 - time (sec): 23.78 - samples/sec: 2978.07 - lr: 0.000019 - momentum: 0.000000
2023-10-14 11:02:11,988 epoch 5 - iter 360/723 - loss 0.02911501 - time (sec): 29.47 - samples/sec: 2974.33 - lr: 0.000018 - momentum: 0.000000
2023-10-14 11:02:17,343 epoch 5 - iter 432/723 - loss 0.02975581 - time (sec): 34.82 - samples/sec: 2979.90 - lr: 0.000018 - momentum: 0.000000
2023-10-14 11:02:23,371 epoch 5 - iter 504/723 - loss 0.02903271 - time (sec): 40.85 - samples/sec: 2985.46 - lr: 0.000018 - momentum: 0.000000
2023-10-14 11:02:29,523 epoch 5 - iter 576/723 - loss 0.03013901 - time (sec): 47.00 - samples/sec: 2972.90 - lr: 0.000017 - momentum: 0.000000
2023-10-14 11:02:35,695 epoch 5 - iter 648/723 - loss 0.03126828 - time (sec): 53.18 - samples/sec: 2976.24 - lr: 0.000017 - momentum: 0.000000
2023-10-14 11:02:41,551 epoch 5 - iter 720/723 - loss 0.03072279 - time (sec): 59.03 - samples/sec: 2976.35 - lr: 0.000017 - momentum: 0.000000
2023-10-14 11:02:41,720 ----------------------------------------------------------------------------------------------------
2023-10-14 11:02:41,721 EPOCH 5 done: loss 0.0308 - lr: 0.000017
2023-10-14 11:02:45,637 DEV : loss 0.13127633929252625 - f1-score (micro avg)  0.8044
2023-10-14 11:02:45,653 ----------------------------------------------------------------------------------------------------
2023-10-14 11:02:51,633 epoch 6 - iter 72/723 - loss 0.01842860 - time (sec): 5.98 - samples/sec: 2902.58 - lr: 0.000016 - momentum: 0.000000
2023-10-14 11:02:57,718 epoch 6 - iter 144/723 - loss 0.02104149 - time (sec): 12.06 - samples/sec: 2907.63 - lr: 0.000016 - momentum: 0.000000
2023-10-14 11:03:03,381 epoch 6 - iter 216/723 - loss 0.02082948 - time (sec): 17.73 - samples/sec: 2961.34 - lr: 0.000016 - momentum: 0.000000
2023-10-14 11:03:09,572 epoch 6 - iter 288/723 - loss 0.02184263 - time (sec): 23.92 - samples/sec: 2949.13 - lr: 0.000015 - momentum: 0.000000
2023-10-14 11:03:15,404 epoch 6 - iter 360/723 - loss 0.02072094 - time (sec): 29.75 - samples/sec: 2938.63 - lr: 0.000015 - momentum: 0.000000
2023-10-14 11:03:21,504 epoch 6 - iter 432/723 - loss 0.02069835 - time (sec): 35.85 - samples/sec: 2930.06 - lr: 0.000015 - momentum: 0.000000
2023-10-14 11:03:27,963 epoch 6 - iter 504/723 - loss 0.02129765 - time (sec): 42.31 - samples/sec: 2904.18 - lr: 0.000014 - momentum: 0.000000
2023-10-14 11:03:34,651 epoch 6 - iter 576/723 - loss 0.02050509 - time (sec): 49.00 - samples/sec: 2903.26 - lr: 0.000014 - momentum: 0.000000
2023-10-14 11:03:40,533 epoch 6 - iter 648/723 - loss 0.02175670 - time (sec): 54.88 - samples/sec: 2899.49 - lr: 0.000014 - momentum: 0.000000
2023-10-14 11:03:46,229 epoch 6 - iter 720/723 - loss 0.02231132 - time (sec): 60.57 - samples/sec: 2901.48 - lr: 0.000013 - momentum: 0.000000
2023-10-14 11:03:46,397 ----------------------------------------------------------------------------------------------------
2023-10-14 11:03:46,397 EPOCH 6 done: loss 0.0223 - lr: 0.000013
2023-10-14 11:03:49,947 DEV : loss 0.12690779566764832 - f1-score (micro avg)  0.8249
2023-10-14 11:03:49,964 ----------------------------------------------------------------------------------------------------
2023-10-14 11:03:56,165 epoch 7 - iter 72/723 - loss 0.00937309 - time (sec): 6.20 - samples/sec: 2830.04 - lr: 0.000013 - momentum: 0.000000
2023-10-14 11:04:02,678 epoch 7 - iter 144/723 - loss 0.01350239 - time (sec): 12.71 - samples/sec: 2882.76 - lr: 0.000013 - momentum: 0.000000
2023-10-14 11:04:08,323 epoch 7 - iter 216/723 - loss 0.01358360 - time (sec): 18.36 - samples/sec: 2915.57 - lr: 0.000012 - momentum: 0.000000
2023-10-14 11:04:15,072 epoch 7 - iter 288/723 - loss 0.01452212 - time (sec): 25.11 - samples/sec: 2864.73 - lr: 0.000012 - momentum: 0.000000
2023-10-14 11:04:20,800 epoch 7 - iter 360/723 - loss 0.01439804 - time (sec): 30.84 - samples/sec: 2869.66 - lr: 0.000012 - momentum: 0.000000
2023-10-14 11:04:26,288 epoch 7 - iter 432/723 - loss 0.01516061 - time (sec): 36.32 - samples/sec: 2893.06 - lr: 0.000011 - momentum: 0.000000
2023-10-14 11:04:32,788 epoch 7 - iter 504/723 - loss 0.01712164 - time (sec): 42.82 - samples/sec: 2892.01 - lr: 0.000011 - momentum: 0.000000
2023-10-14 11:04:38,780 epoch 7 - iter 576/723 - loss 0.01729432 - time (sec): 48.81 - samples/sec: 2906.42 - lr: 0.000011 - momentum: 0.000000
2023-10-14 11:04:44,374 epoch 7 - iter 648/723 - loss 0.01749135 - time (sec): 54.41 - samples/sec: 2924.73 - lr: 0.000010 - momentum: 0.000000
2023-10-14 11:04:50,330 epoch 7 - iter 720/723 - loss 0.01711051 - time (sec): 60.37 - samples/sec: 2912.37 - lr: 0.000010 - momentum: 0.000000
2023-10-14 11:04:50,524 ----------------------------------------------------------------------------------------------------
2023-10-14 11:04:50,525 EPOCH 7 done: loss 0.0174 - lr: 0.000010
2023-10-14 11:04:54,070 DEV : loss 0.15461641550064087 - f1-score (micro avg)  0.8145
2023-10-14 11:04:54,088 ----------------------------------------------------------------------------------------------------
2023-10-14 11:04:59,959 epoch 8 - iter 72/723 - loss 0.01665349 - time (sec): 5.87 - samples/sec: 2848.03 - lr: 0.000010 - momentum: 0.000000
2023-10-14 11:05:06,094 epoch 8 - iter 144/723 - loss 0.01446601 - time (sec): 12.00 - samples/sec: 2881.67 - lr: 0.000009 - momentum: 0.000000
2023-10-14 11:05:12,405 epoch 8 - iter 216/723 - loss 0.01410012 - time (sec): 18.32 - samples/sec: 2857.36 - lr: 0.000009 - momentum: 0.000000
2023-10-14 11:05:18,576 epoch 8 - iter 288/723 - loss 0.01366919 - time (sec): 24.49 - samples/sec: 2845.71 - lr: 0.000009 - momentum: 0.000000
2023-10-14 11:05:24,652 epoch 8 - iter 360/723 - loss 0.01239443 - time (sec): 30.56 - samples/sec: 2877.55 - lr: 0.000008 - momentum: 0.000000
2023-10-14 11:05:30,144 epoch 8 - iter 432/723 - loss 0.01265536 - time (sec): 36.05 - samples/sec: 2900.06 - lr: 0.000008 - momentum: 0.000000
2023-10-14 11:05:36,539 epoch 8 - iter 504/723 - loss 0.01358128 - time (sec): 42.45 - samples/sec: 2891.45 - lr: 0.000008 - momentum: 0.000000
2023-10-14 11:05:42,619 epoch 8 - iter 576/723 - loss 0.01348813 - time (sec): 48.53 - samples/sec: 2893.15 - lr: 0.000007 - momentum: 0.000000
2023-10-14 11:05:48,324 epoch 8 - iter 648/723 - loss 0.01308338 - time (sec): 54.23 - samples/sec: 2905.38 - lr: 0.000007 - momentum: 0.000000
2023-10-14 11:05:54,660 epoch 8 - iter 720/723 - loss 0.01321059 - time (sec): 60.57 - samples/sec: 2896.19 - lr: 0.000007 - momentum: 0.000000
2023-10-14 11:05:54,878 ----------------------------------------------------------------------------------------------------
2023-10-14 11:05:54,878 EPOCH 8 done: loss 0.0132 - lr: 0.000007
2023-10-14 11:05:58,864 DEV : loss 0.150771364569664 - f1-score (micro avg)  0.8431
2023-10-14 11:05:58,880 saving best model
2023-10-14 11:05:59,413 ----------------------------------------------------------------------------------------------------
2023-10-14 11:06:05,658 epoch 9 - iter 72/723 - loss 0.01017083 - time (sec): 6.24 - samples/sec: 2911.90 - lr: 0.000006 - momentum: 0.000000
2023-10-14 11:06:11,476 epoch 9 - iter 144/723 - loss 0.00911976 - time (sec): 12.06 - samples/sec: 2930.73 - lr: 0.000006 - momentum: 0.000000
2023-10-14 11:06:17,232 epoch 9 - iter 216/723 - loss 0.00838583 - time (sec): 17.81 - samples/sec: 2926.00 - lr: 0.000006 - momentum: 0.000000
2023-10-14 11:06:23,911 epoch 9 - iter 288/723 - loss 0.00904573 - time (sec): 24.49 - samples/sec: 2913.97 - lr: 0.000005 - momentum: 0.000000
2023-10-14 11:06:29,832 epoch 9 - iter 360/723 - loss 0.00913035 - time (sec): 30.41 - samples/sec: 2918.84 - lr: 0.000005 - momentum: 0.000000
2023-10-14 11:06:36,140 epoch 9 - iter 432/723 - loss 0.00938579 - time (sec): 36.72 - samples/sec: 2905.02 - lr: 0.000005 - momentum: 0.000000
2023-10-14 11:06:41,831 epoch 9 - iter 504/723 - loss 0.00897802 - time (sec): 42.41 - samples/sec: 2912.32 - lr: 0.000004 - momentum: 0.000000
2023-10-14 11:06:48,221 epoch 9 - iter 576/723 - loss 0.00894380 - time (sec): 48.80 - samples/sec: 2899.02 - lr: 0.000004 - momentum: 0.000000
2023-10-14 11:06:54,001 epoch 9 - iter 648/723 - loss 0.00958515 - time (sec): 54.58 - samples/sec: 2903.49 - lr: 0.000004 - momentum: 0.000000
2023-10-14 11:06:59,830 epoch 9 - iter 720/723 - loss 0.00943720 - time (sec): 60.41 - samples/sec: 2908.78 - lr: 0.000003 - momentum: 0.000000
2023-10-14 11:07:00,059 ----------------------------------------------------------------------------------------------------
2023-10-14 11:07:00,060 EPOCH 9 done: loss 0.0095 - lr: 0.000003
2023-10-14 11:07:03,577 DEV : loss 0.15790636837482452 - f1-score (micro avg)  0.833
2023-10-14 11:07:03,592 ----------------------------------------------------------------------------------------------------
2023-10-14 11:07:09,324 epoch 10 - iter 72/723 - loss 0.00876310 - time (sec): 5.73 - samples/sec: 2913.63 - lr: 0.000003 - momentum: 0.000000
2023-10-14 11:07:15,547 epoch 10 - iter 144/723 - loss 0.00697092 - time (sec): 11.95 - samples/sec: 2905.71 - lr: 0.000003 - momentum: 0.000000
2023-10-14 11:07:22,240 epoch 10 - iter 216/723 - loss 0.00740430 - time (sec): 18.65 - samples/sec: 2809.93 - lr: 0.000002 - momentum: 0.000000
2023-10-14 11:07:28,447 epoch 10 - iter 288/723 - loss 0.00869243 - time (sec): 24.85 - samples/sec: 2862.60 - lr: 0.000002 - momentum: 0.000000
2023-10-14 11:07:34,727 epoch 10 - iter 360/723 - loss 0.00813367 - time (sec): 31.13 - samples/sec: 2878.44 - lr: 0.000002 - momentum: 0.000000
2023-10-14 11:07:40,624 epoch 10 - iter 432/723 - loss 0.00870267 - time (sec): 37.03 - samples/sec: 2886.14 - lr: 0.000001 - momentum: 0.000000
2023-10-14 11:07:46,030 epoch 10 - iter 504/723 - loss 0.00810991 - time (sec): 42.44 - samples/sec: 2888.69 - lr: 0.000001 - momentum: 0.000000
2023-10-14 11:07:51,966 epoch 10 - iter 576/723 - loss 0.00778453 - time (sec): 48.37 - samples/sec: 2879.90 - lr: 0.000001 - momentum: 0.000000
2023-10-14 11:07:58,270 epoch 10 - iter 648/723 - loss 0.00790965 - time (sec): 54.68 - samples/sec: 2885.46 - lr: 0.000000 - momentum: 0.000000
2023-10-14 11:08:04,217 epoch 10 - iter 720/723 - loss 0.00773716 - time (sec): 60.62 - samples/sec: 2894.81 - lr: 0.000000 - momentum: 0.000000
2023-10-14 11:08:04,452 ----------------------------------------------------------------------------------------------------
2023-10-14 11:08:04,452 EPOCH 10 done: loss 0.0077 - lr: 0.000000
2023-10-14 11:08:07,999 DEV : loss 0.16016288101673126 - f1-score (micro avg)  0.8324
2023-10-14 11:08:08,469 ----------------------------------------------------------------------------------------------------
2023-10-14 11:08:08,470 Loading model from best epoch ...
2023-10-14 11:08:10,080 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-14 11:08:13,254 
Results:
- F-score (micro) 0.8224
- F-score (macro) 0.7445
- Accuracy 0.7093

By class:
              precision    recall  f1-score   support

         PER     0.8407    0.8320    0.8363       482
         LOC     0.8710    0.8253    0.8475       458
         ORG     0.5806    0.5217    0.5496        69

   micro avg     0.8376    0.8077    0.8224      1009
   macro avg     0.7641    0.7263    0.7445      1009
weighted avg     0.8366    0.8077    0.8218      1009

2023-10-14 11:08:13,254 ----------------------------------------------------------------------------------------------------
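The micro and macro F-scores in the final table follow from the per-class precision/recall values via the standard F1 definitions. A minimal sanity check (assuming Flair's classification report uses the usual micro/macro averaging; the numbers below are copied from the log above):

```python
# Sanity-check the aggregate F-scores reported in the final evaluation table.
# Per-class (precision, recall, support) values are taken from the log;
# the formulas are the standard F1 definitions.

per_class = {
    "PER": (0.8407, 0.8320, 482),
    "LOC": (0.8710, 0.8253, 458),
    "ORG": (0.5806, 0.5217, 69),
}

def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0

# Macro F1: unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1(p, r) for p, r, _ in per_class.values()) / len(per_class)

# Micro F1: F1 of the pooled precision/recall from the "micro avg" row.
micro_p, micro_r = 0.8376, 0.8077
micro_f1 = f1(micro_p, micro_r)

print(round(macro_f1, 4))  # 0.7445, matching the reported macro F-score
print(round(micro_f1, 4))  # 0.8224, matching the reported micro F-score
```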