2023-10-25 17:00:22,362 ----------------------------------------------------------------------------------------------------
2023-10-25 17:00:22,363 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(64001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-25 17:00:22,363 ----------------------------------------------------------------------------------------------------
2023-10-25 17:00:22,364 MultiCorpus: 7142 train + 698 dev + 2570 test sentences
 - NER_HIPE_2022 Corpus: 7142 train + 698 dev + 2570 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fr/with_doc_seperator
2023-10-25 17:00:22,364 ----------------------------------------------------------------------------------------------------
2023-10-25 17:00:22,364 Train:  7142 sentences
2023-10-25 17:00:22,364         (train_with_dev=False, train_with_test=False)
2023-10-25 17:00:22,364 ----------------------------------------------------------------------------------------------------
2023-10-25 17:00:22,364 Training Params:
2023-10-25 17:00:22,364  - learning_rate: "3e-05" 
2023-10-25 17:00:22,364  - mini_batch_size: "8"
2023-10-25 17:00:22,364  - max_epochs: "10"
2023-10-25 17:00:22,364  - shuffle: "True"
2023-10-25 17:00:22,364 ----------------------------------------------------------------------------------------------------
2023-10-25 17:00:22,364 Plugins:
2023-10-25 17:00:22,364  - TensorboardLogger
2023-10-25 17:00:22,364  - LinearScheduler | warmup_fraction: '0.1'
2023-10-25 17:00:22,364 ----------------------------------------------------------------------------------------------------
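The `LinearScheduler` entry with `warmup_fraction: '0.1'` matches the `lr` column logged during training: the rate climbs to 3e-05 over epoch 1 and then falls to 0 by the end of epoch 10. The following is a minimal sketch of such a schedule, assuming linear warmup from 0 followed by linear decay to 0. `linear_lr` is a hypothetical helper written for illustration, not Flair's implementation.

```python
import math


def linear_lr(step, total_steps, peak_lr=3e-5, warmup_fraction=0.1):
    """Learning rate after `step` optimizer steps (assumed schedule shape)."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        # Linear warmup from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Linear decay from the peak down to 0 at the final step.
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


# 7142 training sentences at mini_batch_size 8 give 893 iterations per epoch.
steps_per_epoch = math.ceil(7142 / 8)
total_steps = steps_per_epoch * 10  # max_epochs = 10

for step in (89, 893, 1783, total_steps):
    print(step, f"{linear_lr(step, total_steps):.6f}")
```

Under these assumptions the warmup covers exactly the first epoch (893 of 8930 steps), which agrees with the logged rate reaching 0.000030 at the end of epoch 1.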
2023-10-25 17:00:22,364 Final evaluation on model from best epoch (best-model.pt)
2023-10-25 17:00:22,364  - metric: "('micro avg', 'f1-score')"
2023-10-25 17:00:22,364 ----------------------------------------------------------------------------------------------------
2023-10-25 17:00:22,364 Computation:
2023-10-25 17:00:22,364  - compute on device: cuda:0
2023-10-25 17:00:22,364  - embedding storage: none
2023-10-25 17:00:22,364 ----------------------------------------------------------------------------------------------------
2023-10-25 17:00:22,364 Model training base path: "hmbench-newseye/fr-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs8-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-4"
2023-10-25 17:00:22,364 ----------------------------------------------------------------------------------------------------
2023-10-25 17:00:22,364 ----------------------------------------------------------------------------------------------------
2023-10-25 17:00:22,365 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-25 17:00:28,672 epoch 1 - iter 89/893 - loss 2.19742480 - time (sec): 6.31 - samples/sec: 4004.07 - lr: 0.000003 - momentum: 0.000000
2023-10-25 17:00:35,048 epoch 1 - iter 178/893 - loss 1.39693997 - time (sec): 12.68 - samples/sec: 4002.88 - lr: 0.000006 - momentum: 0.000000
2023-10-25 17:00:41,309 epoch 1 - iter 267/893 - loss 1.07633123 - time (sec): 18.94 - samples/sec: 3947.26 - lr: 0.000009 - momentum: 0.000000
2023-10-25 17:00:47,306 epoch 1 - iter 356/893 - loss 0.87429157 - time (sec): 24.94 - samples/sec: 3993.37 - lr: 0.000012 - momentum: 0.000000
2023-10-25 17:00:53,202 epoch 1 - iter 445/893 - loss 0.74799460 - time (sec): 30.84 - samples/sec: 4002.33 - lr: 0.000015 - momentum: 0.000000
2023-10-25 17:00:59,211 epoch 1 - iter 534/893 - loss 0.65718250 - time (sec): 36.85 - samples/sec: 4018.73 - lr: 0.000018 - momentum: 0.000000
2023-10-25 17:01:05,222 epoch 1 - iter 623/893 - loss 0.58410501 - time (sec): 42.86 - samples/sec: 4035.30 - lr: 0.000021 - momentum: 0.000000
2023-10-25 17:01:11,176 epoch 1 - iter 712/893 - loss 0.52802697 - time (sec): 48.81 - samples/sec: 4073.71 - lr: 0.000024 - momentum: 0.000000
2023-10-25 17:01:17,243 epoch 1 - iter 801/893 - loss 0.48658916 - time (sec): 54.88 - samples/sec: 4080.04 - lr: 0.000027 - momentum: 0.000000
2023-10-25 17:01:23,202 epoch 1 - iter 890/893 - loss 0.45333206 - time (sec): 60.84 - samples/sec: 4071.24 - lr: 0.000030 - momentum: 0.000000
2023-10-25 17:01:23,417 ----------------------------------------------------------------------------------------------------
2023-10-25 17:01:23,417 EPOCH 1 done: loss 0.4518 - lr: 0.000030
2023-10-25 17:01:27,249 DEV : loss 0.0998985692858696 - f1-score (micro avg)  0.7288
2023-10-25 17:01:27,270 saving best model
2023-10-25 17:01:27,743 ----------------------------------------------------------------------------------------------------
2023-10-25 17:01:33,963 epoch 2 - iter 89/893 - loss 0.11010133 - time (sec): 6.22 - samples/sec: 3972.66 - lr: 0.000030 - momentum: 0.000000
2023-10-25 17:01:40,098 epoch 2 - iter 178/893 - loss 0.10005667 - time (sec): 12.35 - samples/sec: 3975.15 - lr: 0.000029 - momentum: 0.000000
2023-10-25 17:01:46,248 epoch 2 - iter 267/893 - loss 0.10041275 - time (sec): 18.50 - samples/sec: 4054.32 - lr: 0.000029 - momentum: 0.000000
2023-10-25 17:01:52,378 epoch 2 - iter 356/893 - loss 0.10308505 - time (sec): 24.63 - samples/sec: 4108.35 - lr: 0.000029 - momentum: 0.000000
2023-10-25 17:01:58,565 epoch 2 - iter 445/893 - loss 0.10163209 - time (sec): 30.82 - samples/sec: 4120.74 - lr: 0.000028 - momentum: 0.000000
2023-10-25 17:02:04,485 epoch 2 - iter 534/893 - loss 0.10147740 - time (sec): 36.74 - samples/sec: 4101.34 - lr: 0.000028 - momentum: 0.000000
2023-10-25 17:02:10,425 epoch 2 - iter 623/893 - loss 0.10276131 - time (sec): 42.68 - samples/sec: 4107.64 - lr: 0.000028 - momentum: 0.000000
2023-10-25 17:02:16,395 epoch 2 - iter 712/893 - loss 0.10195994 - time (sec): 48.65 - samples/sec: 4123.54 - lr: 0.000027 - momentum: 0.000000
2023-10-25 17:02:22,164 epoch 2 - iter 801/893 - loss 0.10209269 - time (sec): 54.42 - samples/sec: 4096.53 - lr: 0.000027 - momentum: 0.000000
2023-10-25 17:02:28,238 epoch 2 - iter 890/893 - loss 0.10100025 - time (sec): 60.49 - samples/sec: 4102.24 - lr: 0.000027 - momentum: 0.000000
2023-10-25 17:02:28,441 ----------------------------------------------------------------------------------------------------
2023-10-25 17:02:28,441 EPOCH 2 done: loss 0.1009 - lr: 0.000027
2023-10-25 17:02:33,319 DEV : loss 0.09367502480745316 - f1-score (micro avg)  0.7629
2023-10-25 17:02:33,342 saving best model
2023-10-25 17:02:34,008 ----------------------------------------------------------------------------------------------------
2023-10-25 17:02:39,971 epoch 3 - iter 89/893 - loss 0.06332527 - time (sec): 5.96 - samples/sec: 3937.76 - lr: 0.000026 - momentum: 0.000000
2023-10-25 17:02:46,251 epoch 3 - iter 178/893 - loss 0.06194394 - time (sec): 12.24 - samples/sec: 4036.39 - lr: 0.000026 - momentum: 0.000000
2023-10-25 17:02:52,083 epoch 3 - iter 267/893 - loss 0.06017200 - time (sec): 18.07 - samples/sec: 4109.10 - lr: 0.000026 - momentum: 0.000000
2023-10-25 17:02:58,255 epoch 3 - iter 356/893 - loss 0.06164862 - time (sec): 24.24 - samples/sec: 4086.99 - lr: 0.000025 - momentum: 0.000000
2023-10-25 17:03:04,363 epoch 3 - iter 445/893 - loss 0.06151607 - time (sec): 30.35 - samples/sec: 4106.56 - lr: 0.000025 - momentum: 0.000000
2023-10-25 17:03:10,344 epoch 3 - iter 534/893 - loss 0.06067296 - time (sec): 36.33 - samples/sec: 4120.16 - lr: 0.000025 - momentum: 0.000000
2023-10-25 17:03:16,234 epoch 3 - iter 623/893 - loss 0.06086561 - time (sec): 42.22 - samples/sec: 4129.45 - lr: 0.000024 - momentum: 0.000000
2023-10-25 17:03:22,147 epoch 3 - iter 712/893 - loss 0.06099317 - time (sec): 48.13 - samples/sec: 4094.37 - lr: 0.000024 - momentum: 0.000000
2023-10-25 17:03:28,280 epoch 3 - iter 801/893 - loss 0.06051995 - time (sec): 54.27 - samples/sec: 4119.00 - lr: 0.000024 - momentum: 0.000000
2023-10-25 17:03:34,251 epoch 3 - iter 890/893 - loss 0.06220034 - time (sec): 60.24 - samples/sec: 4117.78 - lr: 0.000023 - momentum: 0.000000
2023-10-25 17:03:34,451 ----------------------------------------------------------------------------------------------------
2023-10-25 17:03:34,451 EPOCH 3 done: loss 0.0624 - lr: 0.000023
2023-10-25 17:03:39,555 DEV : loss 0.10349678248167038 - f1-score (micro avg)  0.7851
2023-10-25 17:03:39,573 saving best model
2023-10-25 17:03:40,237 ----------------------------------------------------------------------------------------------------
2023-10-25 17:03:46,372 epoch 4 - iter 89/893 - loss 0.03754159 - time (sec): 6.13 - samples/sec: 4230.43 - lr: 0.000023 - momentum: 0.000000
2023-10-25 17:03:52,281 epoch 4 - iter 178/893 - loss 0.04483007 - time (sec): 12.04 - samples/sec: 4282.44 - lr: 0.000023 - momentum: 0.000000
2023-10-25 17:03:57,977 epoch 4 - iter 267/893 - loss 0.04464268 - time (sec): 17.74 - samples/sec: 4228.06 - lr: 0.000022 - momentum: 0.000000
2023-10-25 17:04:04,177 epoch 4 - iter 356/893 - loss 0.04410251 - time (sec): 23.94 - samples/sec: 4134.91 - lr: 0.000022 - momentum: 0.000000
2023-10-25 17:04:10,450 epoch 4 - iter 445/893 - loss 0.04290747 - time (sec): 30.21 - samples/sec: 4118.64 - lr: 0.000022 - momentum: 0.000000
2023-10-25 17:04:16,490 epoch 4 - iter 534/893 - loss 0.04337588 - time (sec): 36.25 - samples/sec: 4135.74 - lr: 0.000021 - momentum: 0.000000
2023-10-25 17:04:22,526 epoch 4 - iter 623/893 - loss 0.04450695 - time (sec): 42.29 - samples/sec: 4100.04 - lr: 0.000021 - momentum: 0.000000
2023-10-25 17:04:28,608 epoch 4 - iter 712/893 - loss 0.04474457 - time (sec): 48.37 - samples/sec: 4104.66 - lr: 0.000021 - momentum: 0.000000
2023-10-25 17:04:34,686 epoch 4 - iter 801/893 - loss 0.04576971 - time (sec): 54.45 - samples/sec: 4112.57 - lr: 0.000020 - momentum: 0.000000
2023-10-25 17:04:40,591 epoch 4 - iter 890/893 - loss 0.04490676 - time (sec): 60.35 - samples/sec: 4097.87 - lr: 0.000020 - momentum: 0.000000
2023-10-25 17:04:40,899 ----------------------------------------------------------------------------------------------------
2023-10-25 17:04:40,904 EPOCH 4 done: loss 0.0447 - lr: 0.000020
2023-10-25 17:04:45,230 DEV : loss 0.14620383083820343 - f1-score (micro avg)  0.8037
2023-10-25 17:04:45,256 saving best model
2023-10-25 17:04:46,044 ----------------------------------------------------------------------------------------------------
2023-10-25 17:04:52,073 epoch 5 - iter 89/893 - loss 0.03530497 - time (sec): 6.03 - samples/sec: 3848.91 - lr: 0.000020 - momentum: 0.000000
2023-10-25 17:04:58,050 epoch 5 - iter 178/893 - loss 0.03402744 - time (sec): 12.00 - samples/sec: 4001.34 - lr: 0.000019 - momentum: 0.000000
2023-10-25 17:05:04,150 epoch 5 - iter 267/893 - loss 0.03369793 - time (sec): 18.10 - samples/sec: 4023.43 - lr: 0.000019 - momentum: 0.000000
2023-10-25 17:05:10,344 epoch 5 - iter 356/893 - loss 0.03388932 - time (sec): 24.30 - samples/sec: 4021.69 - lr: 0.000019 - momentum: 0.000000
2023-10-25 17:05:16,425 epoch 5 - iter 445/893 - loss 0.03377847 - time (sec): 30.38 - samples/sec: 4048.47 - lr: 0.000018 - momentum: 0.000000
2023-10-25 17:05:22,583 epoch 5 - iter 534/893 - loss 0.03360074 - time (sec): 36.53 - samples/sec: 4053.71 - lr: 0.000018 - momentum: 0.000000
2023-10-25 17:05:28,654 epoch 5 - iter 623/893 - loss 0.03242307 - time (sec): 42.61 - samples/sec: 4046.48 - lr: 0.000018 - momentum: 0.000000
2023-10-25 17:05:34,820 epoch 5 - iter 712/893 - loss 0.03229538 - time (sec): 48.77 - samples/sec: 4034.35 - lr: 0.000017 - momentum: 0.000000
2023-10-25 17:05:40,937 epoch 5 - iter 801/893 - loss 0.03238963 - time (sec): 54.89 - samples/sec: 4065.34 - lr: 0.000017 - momentum: 0.000000
2023-10-25 17:05:47,041 epoch 5 - iter 890/893 - loss 0.03261197 - time (sec): 60.99 - samples/sec: 4063.33 - lr: 0.000017 - momentum: 0.000000
2023-10-25 17:05:47,251 ----------------------------------------------------------------------------------------------------
2023-10-25 17:05:47,251 EPOCH 5 done: loss 0.0325 - lr: 0.000017
2023-10-25 17:05:52,885 DEV : loss 0.1633528769016266 - f1-score (micro avg)  0.797
2023-10-25 17:05:52,915 ----------------------------------------------------------------------------------------------------
2023-10-25 17:05:59,081 epoch 6 - iter 89/893 - loss 0.03029989 - time (sec): 6.16 - samples/sec: 3842.51 - lr: 0.000016 - momentum: 0.000000
2023-10-25 17:06:05,172 epoch 6 - iter 178/893 - loss 0.02564591 - time (sec): 12.26 - samples/sec: 3799.19 - lr: 0.000016 - momentum: 0.000000
2023-10-25 17:06:11,310 epoch 6 - iter 267/893 - loss 0.02415048 - time (sec): 18.39 - samples/sec: 3923.45 - lr: 0.000016 - momentum: 0.000000
2023-10-25 17:06:17,327 epoch 6 - iter 356/893 - loss 0.02531047 - time (sec): 24.41 - samples/sec: 3979.94 - lr: 0.000015 - momentum: 0.000000
2023-10-25 17:06:23,353 epoch 6 - iter 445/893 - loss 0.02540534 - time (sec): 30.44 - samples/sec: 4031.53 - lr: 0.000015 - momentum: 0.000000
2023-10-25 17:06:29,489 epoch 6 - iter 534/893 - loss 0.02638207 - time (sec): 36.57 - samples/sec: 4054.77 - lr: 0.000015 - momentum: 0.000000
2023-10-25 17:06:35,690 epoch 6 - iter 623/893 - loss 0.02582057 - time (sec): 42.77 - samples/sec: 4044.98 - lr: 0.000014 - momentum: 0.000000
2023-10-25 17:06:41,917 epoch 6 - iter 712/893 - loss 0.02512173 - time (sec): 49.00 - samples/sec: 4050.05 - lr: 0.000014 - momentum: 0.000000
2023-10-25 17:06:48,057 epoch 6 - iter 801/893 - loss 0.02591665 - time (sec): 55.14 - samples/sec: 4038.91 - lr: 0.000014 - momentum: 0.000000
2023-10-25 17:06:54,250 epoch 6 - iter 890/893 - loss 0.02583713 - time (sec): 61.33 - samples/sec: 4048.55 - lr: 0.000013 - momentum: 0.000000
2023-10-25 17:06:54,451 ----------------------------------------------------------------------------------------------------
2023-10-25 17:06:54,452 EPOCH 6 done: loss 0.0259 - lr: 0.000013
2023-10-25 17:06:59,824 DEV : loss 0.18684536218643188 - f1-score (micro avg)  0.7976
2023-10-25 17:06:59,848 ----------------------------------------------------------------------------------------------------
2023-10-25 17:07:06,024 epoch 7 - iter 89/893 - loss 0.01485614 - time (sec): 6.17 - samples/sec: 3881.67 - lr: 0.000013 - momentum: 0.000000
2023-10-25 17:07:12,130 epoch 7 - iter 178/893 - loss 0.01598830 - time (sec): 12.28 - samples/sec: 3958.05 - lr: 0.000013 - momentum: 0.000000
2023-10-25 17:07:18,166 epoch 7 - iter 267/893 - loss 0.01783078 - time (sec): 18.32 - samples/sec: 4076.48 - lr: 0.000012 - momentum: 0.000000
2023-10-25 17:07:24,070 epoch 7 - iter 356/893 - loss 0.01936177 - time (sec): 24.22 - samples/sec: 4105.59 - lr: 0.000012 - momentum: 0.000000
2023-10-25 17:07:30,059 epoch 7 - iter 445/893 - loss 0.01988732 - time (sec): 30.21 - samples/sec: 4146.37 - lr: 0.000012 - momentum: 0.000000
2023-10-25 17:07:36,108 epoch 7 - iter 534/893 - loss 0.01928793 - time (sec): 36.26 - samples/sec: 4162.19 - lr: 0.000011 - momentum: 0.000000
2023-10-25 17:07:42,411 epoch 7 - iter 623/893 - loss 0.02039273 - time (sec): 42.56 - samples/sec: 4121.89 - lr: 0.000011 - momentum: 0.000000
2023-10-25 17:07:48,261 epoch 7 - iter 712/893 - loss 0.02001692 - time (sec): 48.41 - samples/sec: 4092.95 - lr: 0.000011 - momentum: 0.000000
2023-10-25 17:07:54,353 epoch 7 - iter 801/893 - loss 0.02017325 - time (sec): 54.50 - samples/sec: 4087.57 - lr: 0.000010 - momentum: 0.000000
2023-10-25 17:08:00,473 epoch 7 - iter 890/893 - loss 0.01991182 - time (sec): 60.62 - samples/sec: 4094.74 - lr: 0.000010 - momentum: 0.000000
2023-10-25 17:08:00,656 ----------------------------------------------------------------------------------------------------
2023-10-25 17:08:00,657 EPOCH 7 done: loss 0.0199 - lr: 0.000010
2023-10-25 17:08:05,217 DEV : loss 0.2105928510427475 - f1-score (micro avg)  0.8011
2023-10-25 17:08:05,237 ----------------------------------------------------------------------------------------------------
2023-10-25 17:08:11,326 epoch 8 - iter 89/893 - loss 0.01743845 - time (sec): 6.09 - samples/sec: 4235.38 - lr: 0.000010 - momentum: 0.000000
2023-10-25 17:08:17,447 epoch 8 - iter 178/893 - loss 0.01794746 - time (sec): 12.21 - samples/sec: 4130.72 - lr: 0.000009 - momentum: 0.000000
2023-10-25 17:08:23,444 epoch 8 - iter 267/893 - loss 0.01517857 - time (sec): 18.21 - samples/sec: 4109.13 - lr: 0.000009 - momentum: 0.000000
2023-10-25 17:08:29,466 epoch 8 - iter 356/893 - loss 0.01542169 - time (sec): 24.23 - samples/sec: 4050.01 - lr: 0.000009 - momentum: 0.000000
2023-10-25 17:08:35,548 epoch 8 - iter 445/893 - loss 0.01465711 - time (sec): 30.31 - samples/sec: 4032.48 - lr: 0.000008 - momentum: 0.000000
2023-10-25 17:08:41,861 epoch 8 - iter 534/893 - loss 0.01456755 - time (sec): 36.62 - samples/sec: 4029.83 - lr: 0.000008 - momentum: 0.000000
2023-10-25 17:08:47,636 epoch 8 - iter 623/893 - loss 0.01409835 - time (sec): 42.40 - samples/sec: 4055.86 - lr: 0.000008 - momentum: 0.000000
2023-10-25 17:08:53,646 epoch 8 - iter 712/893 - loss 0.01384014 - time (sec): 48.41 - samples/sec: 4051.49 - lr: 0.000007 - momentum: 0.000000
2023-10-25 17:08:59,620 epoch 8 - iter 801/893 - loss 0.01422010 - time (sec): 54.38 - samples/sec: 4078.14 - lr: 0.000007 - momentum: 0.000000
2023-10-25 17:09:05,944 epoch 8 - iter 890/893 - loss 0.01456695 - time (sec): 60.71 - samples/sec: 4085.39 - lr: 0.000007 - momentum: 0.000000
2023-10-25 17:09:06,139 ----------------------------------------------------------------------------------------------------
2023-10-25 17:09:06,140 EPOCH 8 done: loss 0.0146 - lr: 0.000007
2023-10-25 17:09:11,159 DEV : loss 0.21266496181488037 - f1-score (micro avg)  0.7947
2023-10-25 17:09:11,180 ----------------------------------------------------------------------------------------------------
2023-10-25 17:09:17,250 epoch 9 - iter 89/893 - loss 0.00472214 - time (sec): 6.07 - samples/sec: 4171.11 - lr: 0.000006 - momentum: 0.000000
2023-10-25 17:09:23,216 epoch 9 - iter 178/893 - loss 0.00879912 - time (sec): 12.03 - samples/sec: 4177.79 - lr: 0.000006 - momentum: 0.000000
2023-10-25 17:09:29,325 epoch 9 - iter 267/893 - loss 0.01001564 - time (sec): 18.14 - samples/sec: 4086.48 - lr: 0.000006 - momentum: 0.000000
2023-10-25 17:09:35,358 epoch 9 - iter 356/893 - loss 0.01086924 - time (sec): 24.18 - samples/sec: 4140.04 - lr: 0.000005 - momentum: 0.000000
2023-10-25 17:09:41,382 epoch 9 - iter 445/893 - loss 0.01063271 - time (sec): 30.20 - samples/sec: 4151.32 - lr: 0.000005 - momentum: 0.000000
2023-10-25 17:09:47,352 epoch 9 - iter 534/893 - loss 0.01049232 - time (sec): 36.17 - samples/sec: 4114.54 - lr: 0.000005 - momentum: 0.000000
2023-10-25 17:09:53,532 epoch 9 - iter 623/893 - loss 0.01032612 - time (sec): 42.35 - samples/sec: 4133.27 - lr: 0.000004 - momentum: 0.000000
2023-10-25 17:09:59,456 epoch 9 - iter 712/893 - loss 0.01040970 - time (sec): 48.27 - samples/sec: 4105.70 - lr: 0.000004 - momentum: 0.000000
2023-10-25 17:10:05,526 epoch 9 - iter 801/893 - loss 0.01045359 - time (sec): 54.34 - samples/sec: 4091.22 - lr: 0.000004 - momentum: 0.000000
2023-10-25 17:10:11,623 epoch 9 - iter 890/893 - loss 0.01060696 - time (sec): 60.44 - samples/sec: 4100.12 - lr: 0.000003 - momentum: 0.000000
2023-10-25 17:10:11,820 ----------------------------------------------------------------------------------------------------
2023-10-25 17:10:11,820 EPOCH 9 done: loss 0.0106 - lr: 0.000003
2023-10-25 17:10:17,087 DEV : loss 0.2295289933681488 - f1-score (micro avg)  0.8011
2023-10-25 17:10:17,112 ----------------------------------------------------------------------------------------------------
2023-10-25 17:10:23,006 epoch 10 - iter 89/893 - loss 0.01002804 - time (sec): 5.89 - samples/sec: 4117.60 - lr: 0.000003 - momentum: 0.000000
2023-10-25 17:10:29,006 epoch 10 - iter 178/893 - loss 0.01019071 - time (sec): 11.89 - samples/sec: 3983.06 - lr: 0.000003 - momentum: 0.000000
2023-10-25 17:10:35,268 epoch 10 - iter 267/893 - loss 0.00952875 - time (sec): 18.15 - samples/sec: 4042.88 - lr: 0.000002 - momentum: 0.000000
2023-10-25 17:10:41,474 epoch 10 - iter 356/893 - loss 0.00888864 - time (sec): 24.36 - samples/sec: 4047.46 - lr: 0.000002 - momentum: 0.000000
2023-10-25 17:10:47,407 epoch 10 - iter 445/893 - loss 0.00924368 - time (sec): 30.29 - samples/sec: 4025.62 - lr: 0.000002 - momentum: 0.000000
2023-10-25 17:10:53,592 epoch 10 - iter 534/893 - loss 0.00922564 - time (sec): 36.48 - samples/sec: 4056.96 - lr: 0.000001 - momentum: 0.000000
2023-10-25 17:10:59,611 epoch 10 - iter 623/893 - loss 0.00891649 - time (sec): 42.50 - samples/sec: 4071.51 - lr: 0.000001 - momentum: 0.000000
2023-10-25 17:11:05,620 epoch 10 - iter 712/893 - loss 0.00840213 - time (sec): 48.51 - samples/sec: 4045.77 - lr: 0.000001 - momentum: 0.000000
2023-10-25 17:11:11,776 epoch 10 - iter 801/893 - loss 0.00803404 - time (sec): 54.66 - samples/sec: 4059.14 - lr: 0.000000 - momentum: 0.000000
2023-10-25 17:11:18,055 epoch 10 - iter 890/893 - loss 0.00778595 - time (sec): 60.94 - samples/sec: 4068.83 - lr: 0.000000 - momentum: 0.000000
2023-10-25 17:11:18,253 ----------------------------------------------------------------------------------------------------
2023-10-25 17:11:18,253 EPOCH 10 done: loss 0.0078 - lr: 0.000000
2023-10-25 17:11:22,853 DEV : loss 0.23914724588394165 - f1-score (micro avg)  0.7997
2023-10-25 17:11:23,516 ----------------------------------------------------------------------------------------------------
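The `DEV` lines above determine which checkpoint is kept as `best-model.pt`: "saving best model" appears only after epochs 1 to 4. The following is a rough sketch of how the best epoch could be recovered from the log text, assuming one `DEV` line per epoch in order. The regular expression and the `best_epoch` helper are illustrative and not part of Flair.

```python
import re

# Matches lines such as "DEV : loss 0.0998 - f1-score (micro avg)  0.7288".
DEV_LINE = re.compile(
    r"DEV : loss (?P<loss>[\d.]+) - f1-score \(micro avg\)\s+(?P<f1>[\d.]+)"
)


def best_epoch(log_lines):
    """Return (epoch, dev_f1) for the highest dev micro-F1 in the log."""
    scores = []
    for line in log_lines:
        match = DEV_LINE.search(line)
        if match:
            scores.append(float(match.group("f1")))
    # max() keeps the earliest epoch when two epochs tie.
    index = max(range(len(scores)), key=scores.__getitem__)
    return index + 1, scores[index]


# DEV lines copied from this log, one per epoch.
sample = [
    "2023-10-25 17:01:27,249 DEV : loss 0.0998985692858696 - f1-score (micro avg)  0.7288",
    "2023-10-25 17:02:33,319 DEV : loss 0.09367502480745316 - f1-score (micro avg)  0.7629",
    "2023-10-25 17:03:39,555 DEV : loss 0.10349678248167038 - f1-score (micro avg)  0.7851",
    "2023-10-25 17:04:45,230 DEV : loss 0.14620383083820343 - f1-score (micro avg)  0.8037",
    "2023-10-25 17:05:52,885 DEV : loss 0.1633528769016266 - f1-score (micro avg)  0.797",
    "2023-10-25 17:06:59,824 DEV : loss 0.18684536218643188 - f1-score (micro avg)  0.7976",
    "2023-10-25 17:08:05,217 DEV : loss 0.2105928510427475 - f1-score (micro avg)  0.8011",
    "2023-10-25 17:09:11,159 DEV : loss 0.21266496181488037 - f1-score (micro avg)  0.7947",
    "2023-10-25 17:10:17,087 DEV : loss 0.2295289933681488 - f1-score (micro avg)  0.8011",
    "2023-10-25 17:11:22,853 DEV : loss 0.23914724588394165 - f1-score (micro avg)  0.7997",
]

print(best_epoch(sample))
```

Note that the dev loss rises from epoch 3 onward while the dev F1 stays near 0.80, so selecting on F1 and selecting on loss would pick different epochs.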
2023-10-25 17:11:23,517 Loading model from best epoch ...
2023-10-25 17:11:25,628 SequenceTagger predicts: Dictionary with 17 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-25 17:11:37,463 
Results:
- F-score (micro) 0.6825
- F-score (macro) 0.5925
- Accuracy 0.5411

By class:
              precision    recall  f1-score   support

         LOC     0.7044    0.6813    0.6927      1095
         PER     0.7967    0.7628    0.7794      1012
         ORG     0.3908    0.5966    0.4723       357
   HumanProd     0.3279    0.6061    0.4255        33

   micro avg     0.6648    0.7012    0.6825      2497
   macro avg     0.5550    0.6617    0.5925      2497
weighted avg     0.6920    0.7012    0.6928      2497

2023-10-25 17:11:37,464 ----------------------------------------------------------------------------------------------------
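The averaged rows of the test report can be cross-checked against the per-class rows. The following is a small sketch, assuming the usual definitions: F1 is the harmonic mean of precision and recall, the macro average is the unweighted mean of per-class F1, and the weighted average uses support as weights. The numbers are copied from the table above; small differences are expected because the table is rounded to four decimals.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# (precision, recall, support) per class, from the report above.
per_class = {
    "LOC": (0.7044, 0.6813, 1095),
    "PER": (0.7967, 0.7628, 1012),
    "ORG": (0.3908, 0.5966, 357),
    "HumanProd": (0.3279, 0.6061, 33),
}

class_f1 = {name: f1(p, r) for name, (p, r, _) in per_class.items()}
support = {name: s for name, (_, _, s) in per_class.items()}
total_support = sum(support.values())

micro_f1 = f1(0.6648, 0.7012)  # from the micro avg precision and recall
macro_f1 = sum(class_f1.values()) / len(class_f1)
weighted_f1 = sum(class_f1[n] * support[n] for n in class_f1) / total_support

print(f"micro {micro_f1:.4f}  macro {macro_f1:.4f}  weighted {weighted_f1:.4f}")
```

The gap between micro and macro F1 comes from the two small classes, ORG and HumanProd, whose precision is well below that of LOC and PER.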