2023-10-14 01:17:01,167 ----------------------------------------------------------------------------------------------------
2023-10-14 01:17:01,168 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(32001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-14 01:17:01,168 ----------------------------------------------------------------------------------------------------
2023-10-14 01:17:01,168 MultiCorpus: 7936 train + 992 dev + 992 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 7936 train + 992 dev + 992 test sentences - /root/.flair/datasets/ner_icdar_europeana/fr
2023-10-14 01:17:01,168 ----------------------------------------------------------------------------------------------------
2023-10-14 01:17:01,168 Train:  7936 sentences
2023-10-14 01:17:01,168         (train_with_dev=False, train_with_test=False)
2023-10-14 01:17:01,168 ----------------------------------------------------------------------------------------------------
2023-10-14 01:17:01,168 Training Params:
2023-10-14 01:17:01,168  - learning_rate: "5e-05" 
2023-10-14 01:17:01,168  - mini_batch_size: "8"
2023-10-14 01:17:01,168  - max_epochs: "10"
2023-10-14 01:17:01,168  - shuffle: "True"
2023-10-14 01:17:01,168 ----------------------------------------------------------------------------------------------------
2023-10-14 01:17:01,168 Plugins:
2023-10-14 01:17:01,168  - LinearScheduler | warmup_fraction: '0.1'
2023-10-14 01:17:01,169 ----------------------------------------------------------------------------------------------------
2023-10-14 01:17:01,169 Final evaluation on model from best epoch (best-model.pt)
2023-10-14 01:17:01,169  - metric: "('micro avg', 'f1-score')"
2023-10-14 01:17:01,169 ----------------------------------------------------------------------------------------------------
2023-10-14 01:17:01,169 Computation:
2023-10-14 01:17:01,169  - compute on device: cuda:0
2023-10-14 01:17:01,169  - embedding storage: none
2023-10-14 01:17:01,169 ----------------------------------------------------------------------------------------------------
2023-10-14 01:17:01,169 Model training base path: "hmbench-icdar/fr-dbmdz/bert-base-historic-multilingual-cased-bs8-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-5"
2023-10-14 01:17:01,169 ----------------------------------------------------------------------------------------------------
2023-10-14 01:17:01,169 ----------------------------------------------------------------------------------------------------
2023-10-14 01:17:06,714 epoch 1 - iter 99/992 - loss 1.86588415 - time (sec): 5.54 - samples/sec: 2785.36 - lr: 0.000005 - momentum: 0.000000
2023-10-14 01:17:12,449 epoch 1 - iter 198/992 - loss 1.09873625 - time (sec): 11.28 - samples/sec: 2794.75 - lr: 0.000010 - momentum: 0.000000
2023-10-14 01:17:18,387 epoch 1 - iter 297/992 - loss 0.80249953 - time (sec): 17.22 - samples/sec: 2794.10 - lr: 0.000015 - momentum: 0.000000
2023-10-14 01:17:23,944 epoch 1 - iter 396/992 - loss 0.64530064 - time (sec): 22.77 - samples/sec: 2826.21 - lr: 0.000020 - momentum: 0.000000
2023-10-14 01:17:29,847 epoch 1 - iter 495/992 - loss 0.54951004 - time (sec): 28.68 - samples/sec: 2818.40 - lr: 0.000025 - momentum: 0.000000
2023-10-14 01:17:35,860 epoch 1 - iter 594/992 - loss 0.47822528 - time (sec): 34.69 - samples/sec: 2821.08 - lr: 0.000030 - momentum: 0.000000
2023-10-14 01:17:41,757 epoch 1 - iter 693/992 - loss 0.43156759 - time (sec): 40.59 - samples/sec: 2802.37 - lr: 0.000035 - momentum: 0.000000
2023-10-14 01:17:47,720 epoch 1 - iter 792/992 - loss 0.39387496 - time (sec): 46.55 - samples/sec: 2794.11 - lr: 0.000040 - momentum: 0.000000
2023-10-14 01:17:53,595 epoch 1 - iter 891/992 - loss 0.36458290 - time (sec): 52.42 - samples/sec: 2790.26 - lr: 0.000045 - momentum: 0.000000
2023-10-14 01:17:59,691 epoch 1 - iter 990/992 - loss 0.34069275 - time (sec): 58.52 - samples/sec: 2792.26 - lr: 0.000050 - momentum: 0.000000
2023-10-14 01:17:59,900 ----------------------------------------------------------------------------------------------------
2023-10-14 01:17:59,900 EPOCH 1 done: loss 0.3399 - lr: 0.000050
2023-10-14 01:18:03,409 DEV : loss 0.09731486439704895 - f1-score (micro avg)  0.6696
2023-10-14 01:18:03,433 saving best model
2023-10-14 01:18:03,828 ----------------------------------------------------------------------------------------------------
2023-10-14 01:18:09,534 epoch 2 - iter 99/992 - loss 0.12915040 - time (sec): 5.70 - samples/sec: 2665.74 - lr: 0.000049 - momentum: 0.000000
2023-10-14 01:18:15,365 epoch 2 - iter 198/992 - loss 0.11559230 - time (sec): 11.54 - samples/sec: 2704.02 - lr: 0.000049 - momentum: 0.000000
2023-10-14 01:18:20,960 epoch 2 - iter 297/992 - loss 0.11414920 - time (sec): 17.13 - samples/sec: 2751.32 - lr: 0.000048 - momentum: 0.000000
2023-10-14 01:18:26,882 epoch 2 - iter 396/992 - loss 0.10894772 - time (sec): 23.05 - samples/sec: 2761.13 - lr: 0.000048 - momentum: 0.000000
2023-10-14 01:18:32,637 epoch 2 - iter 495/992 - loss 0.10862939 - time (sec): 28.81 - samples/sec: 2806.00 - lr: 0.000047 - momentum: 0.000000
2023-10-14 01:18:38,576 epoch 2 - iter 594/992 - loss 0.10732068 - time (sec): 34.75 - samples/sec: 2812.77 - lr: 0.000047 - momentum: 0.000000
2023-10-14 01:18:44,413 epoch 2 - iter 693/992 - loss 0.10605689 - time (sec): 40.58 - samples/sec: 2814.76 - lr: 0.000046 - momentum: 0.000000
2023-10-14 01:18:50,172 epoch 2 - iter 792/992 - loss 0.10337874 - time (sec): 46.34 - samples/sec: 2810.16 - lr: 0.000046 - momentum: 0.000000
2023-10-14 01:18:56,347 epoch 2 - iter 891/992 - loss 0.10259994 - time (sec): 52.52 - samples/sec: 2799.94 - lr: 0.000045 - momentum: 0.000000
2023-10-14 01:19:02,162 epoch 2 - iter 990/992 - loss 0.10325477 - time (sec): 58.33 - samples/sec: 2802.95 - lr: 0.000044 - momentum: 0.000000
2023-10-14 01:19:02,321 ----------------------------------------------------------------------------------------------------
2023-10-14 01:19:02,321 EPOCH 2 done: loss 0.1032 - lr: 0.000044
2023-10-14 01:19:05,742 DEV : loss 0.09102991223335266 - f1-score (micro avg)  0.7377
2023-10-14 01:19:05,763 saving best model
2023-10-14 01:19:06,277 ----------------------------------------------------------------------------------------------------
2023-10-14 01:19:11,924 epoch 3 - iter 99/992 - loss 0.06196304 - time (sec): 5.64 - samples/sec: 2675.30 - lr: 0.000044 - momentum: 0.000000
2023-10-14 01:19:17,975 epoch 3 - iter 198/992 - loss 0.06676977 - time (sec): 11.70 - samples/sec: 2774.34 - lr: 0.000043 - momentum: 0.000000
2023-10-14 01:19:23,503 epoch 3 - iter 297/992 - loss 0.07044537 - time (sec): 17.22 - samples/sec: 2780.26 - lr: 0.000043 - momentum: 0.000000
2023-10-14 01:19:29,407 epoch 3 - iter 396/992 - loss 0.07030135 - time (sec): 23.13 - samples/sec: 2755.93 - lr: 0.000042 - momentum: 0.000000
2023-10-14 01:19:35,437 epoch 3 - iter 495/992 - loss 0.06871568 - time (sec): 29.16 - samples/sec: 2793.45 - lr: 0.000042 - momentum: 0.000000
2023-10-14 01:19:41,277 epoch 3 - iter 594/992 - loss 0.07133824 - time (sec): 35.00 - samples/sec: 2795.31 - lr: 0.000041 - momentum: 0.000000
2023-10-14 01:19:47,157 epoch 3 - iter 693/992 - loss 0.07146656 - time (sec): 40.88 - samples/sec: 2803.67 - lr: 0.000041 - momentum: 0.000000
2023-10-14 01:19:53,718 epoch 3 - iter 792/992 - loss 0.07164041 - time (sec): 47.44 - samples/sec: 2769.56 - lr: 0.000040 - momentum: 0.000000
2023-10-14 01:19:59,430 epoch 3 - iter 891/992 - loss 0.07108609 - time (sec): 53.15 - samples/sec: 2765.26 - lr: 0.000039 - momentum: 0.000000
2023-10-14 01:20:05,167 epoch 3 - iter 990/992 - loss 0.07155895 - time (sec): 58.89 - samples/sec: 2778.37 - lr: 0.000039 - momentum: 0.000000
2023-10-14 01:20:05,297 ----------------------------------------------------------------------------------------------------
2023-10-14 01:20:05,298 EPOCH 3 done: loss 0.0715 - lr: 0.000039
2023-10-14 01:20:08,732 DEV : loss 0.11880763620138168 - f1-score (micro avg)  0.7402
2023-10-14 01:20:08,754 saving best model
2023-10-14 01:20:09,277 ----------------------------------------------------------------------------------------------------
2023-10-14 01:20:15,188 epoch 4 - iter 99/992 - loss 0.04511134 - time (sec): 5.91 - samples/sec: 2964.60 - lr: 0.000038 - momentum: 0.000000
2023-10-14 01:20:20,986 epoch 4 - iter 198/992 - loss 0.04991613 - time (sec): 11.71 - samples/sec: 2885.57 - lr: 0.000038 - momentum: 0.000000
2023-10-14 01:20:26,705 epoch 4 - iter 297/992 - loss 0.05461873 - time (sec): 17.43 - samples/sec: 2874.60 - lr: 0.000037 - momentum: 0.000000
2023-10-14 01:20:32,606 epoch 4 - iter 396/992 - loss 0.05366870 - time (sec): 23.33 - samples/sec: 2832.04 - lr: 0.000037 - momentum: 0.000000
2023-10-14 01:20:38,619 epoch 4 - iter 495/992 - loss 0.05304821 - time (sec): 29.34 - samples/sec: 2819.86 - lr: 0.000036 - momentum: 0.000000
2023-10-14 01:20:44,603 epoch 4 - iter 594/992 - loss 0.05335391 - time (sec): 35.32 - samples/sec: 2795.76 - lr: 0.000036 - momentum: 0.000000
2023-10-14 01:20:50,230 epoch 4 - iter 693/992 - loss 0.05328344 - time (sec): 40.95 - samples/sec: 2791.28 - lr: 0.000035 - momentum: 0.000000
2023-10-14 01:20:55,778 epoch 4 - iter 792/992 - loss 0.05344227 - time (sec): 46.50 - samples/sec: 2806.23 - lr: 0.000034 - momentum: 0.000000
2023-10-14 01:21:01,258 epoch 4 - iter 891/992 - loss 0.05342411 - time (sec): 51.98 - samples/sec: 2812.25 - lr: 0.000034 - momentum: 0.000000
2023-10-14 01:21:07,289 epoch 4 - iter 990/992 - loss 0.05595410 - time (sec): 58.01 - samples/sec: 2821.18 - lr: 0.000033 - momentum: 0.000000
2023-10-14 01:21:07,456 ----------------------------------------------------------------------------------------------------
2023-10-14 01:21:07,456 EPOCH 4 done: loss 0.0559 - lr: 0.000033
2023-10-14 01:21:10,852 DEV : loss 0.1232018768787384 - f1-score (micro avg)  0.7481
2023-10-14 01:21:10,872 saving best model
2023-10-14 01:21:11,360 ----------------------------------------------------------------------------------------------------
2023-10-14 01:21:17,069 epoch 5 - iter 99/992 - loss 0.03903289 - time (sec): 5.71 - samples/sec: 2900.52 - lr: 0.000033 - momentum: 0.000000
2023-10-14 01:21:22,768 epoch 5 - iter 198/992 - loss 0.03876095 - time (sec): 11.41 - samples/sec: 2924.11 - lr: 0.000032 - momentum: 0.000000
2023-10-14 01:21:28,419 epoch 5 - iter 297/992 - loss 0.04243056 - time (sec): 17.06 - samples/sec: 2890.32 - lr: 0.000032 - momentum: 0.000000
2023-10-14 01:21:34,101 epoch 5 - iter 396/992 - loss 0.03983144 - time (sec): 22.74 - samples/sec: 2898.00 - lr: 0.000031 - momentum: 0.000000
2023-10-14 01:21:39,707 epoch 5 - iter 495/992 - loss 0.03928814 - time (sec): 28.34 - samples/sec: 2904.32 - lr: 0.000031 - momentum: 0.000000
2023-10-14 01:21:45,374 epoch 5 - iter 594/992 - loss 0.03949960 - time (sec): 34.01 - samples/sec: 2907.48 - lr: 0.000030 - momentum: 0.000000
2023-10-14 01:21:50,895 epoch 5 - iter 693/992 - loss 0.04107109 - time (sec): 39.53 - samples/sec: 2888.93 - lr: 0.000029 - momentum: 0.000000
2023-10-14 01:21:57,000 epoch 5 - iter 792/992 - loss 0.04131385 - time (sec): 45.64 - samples/sec: 2875.31 - lr: 0.000029 - momentum: 0.000000
2023-10-14 01:22:02,977 epoch 5 - iter 891/992 - loss 0.04155687 - time (sec): 51.61 - samples/sec: 2853.04 - lr: 0.000028 - momentum: 0.000000
2023-10-14 01:22:08,801 epoch 5 - iter 990/992 - loss 0.04121265 - time (sec): 57.44 - samples/sec: 2850.10 - lr: 0.000028 - momentum: 0.000000
2023-10-14 01:22:08,913 ----------------------------------------------------------------------------------------------------
2023-10-14 01:22:08,913 EPOCH 5 done: loss 0.0413 - lr: 0.000028
2023-10-14 01:22:12,796 DEV : loss 0.16722512245178223 - f1-score (micro avg)  0.7586
2023-10-14 01:22:12,817 saving best model
2023-10-14 01:22:13,338 ----------------------------------------------------------------------------------------------------
2023-10-14 01:22:19,723 epoch 6 - iter 99/992 - loss 0.03646699 - time (sec): 6.38 - samples/sec: 2714.64 - lr: 0.000027 - momentum: 0.000000
2023-10-14 01:22:25,276 epoch 6 - iter 198/992 - loss 0.03616235 - time (sec): 11.94 - samples/sec: 2798.79 - lr: 0.000027 - momentum: 0.000000
2023-10-14 01:22:30,923 epoch 6 - iter 297/992 - loss 0.03179580 - time (sec): 17.58 - samples/sec: 2799.10 - lr: 0.000026 - momentum: 0.000000
2023-10-14 01:22:36,766 epoch 6 - iter 396/992 - loss 0.03143445 - time (sec): 23.43 - samples/sec: 2811.42 - lr: 0.000026 - momentum: 0.000000
2023-10-14 01:22:42,630 epoch 6 - iter 495/992 - loss 0.03072024 - time (sec): 29.29 - samples/sec: 2815.34 - lr: 0.000025 - momentum: 0.000000
2023-10-14 01:22:48,259 epoch 6 - iter 594/992 - loss 0.03079986 - time (sec): 34.92 - samples/sec: 2818.20 - lr: 0.000024 - momentum: 0.000000
2023-10-14 01:22:54,316 epoch 6 - iter 693/992 - loss 0.03043770 - time (sec): 40.98 - samples/sec: 2799.34 - lr: 0.000024 - momentum: 0.000000
2023-10-14 01:23:00,319 epoch 6 - iter 792/992 - loss 0.02994022 - time (sec): 46.98 - samples/sec: 2790.61 - lr: 0.000023 - momentum: 0.000000
2023-10-14 01:23:06,471 epoch 6 - iter 891/992 - loss 0.03012561 - time (sec): 53.13 - samples/sec: 2787.05 - lr: 0.000023 - momentum: 0.000000
2023-10-14 01:23:12,180 epoch 6 - iter 990/992 - loss 0.03016026 - time (sec): 58.84 - samples/sec: 2782.11 - lr: 0.000022 - momentum: 0.000000
2023-10-14 01:23:12,291 ----------------------------------------------------------------------------------------------------
2023-10-14 01:23:12,291 EPOCH 6 done: loss 0.0301 - lr: 0.000022
2023-10-14 01:23:15,730 DEV : loss 0.17587369680404663 - f1-score (micro avg)  0.7538
2023-10-14 01:23:15,751 ----------------------------------------------------------------------------------------------------
2023-10-14 01:23:21,560 epoch 7 - iter 99/992 - loss 0.02680858 - time (sec): 5.81 - samples/sec: 2783.87 - lr: 0.000022 - momentum: 0.000000
2023-10-14 01:23:27,386 epoch 7 - iter 198/992 - loss 0.03036132 - time (sec): 11.63 - samples/sec: 2760.05 - lr: 0.000021 - momentum: 0.000000
2023-10-14 01:23:33,314 epoch 7 - iter 297/992 - loss 0.02504973 - time (sec): 17.56 - samples/sec: 2794.69 - lr: 0.000021 - momentum: 0.000000
2023-10-14 01:23:39,233 epoch 7 - iter 396/992 - loss 0.02602930 - time (sec): 23.48 - samples/sec: 2795.69 - lr: 0.000020 - momentum: 0.000000
2023-10-14 01:23:44,944 epoch 7 - iter 495/992 - loss 0.02459364 - time (sec): 29.19 - samples/sec: 2796.12 - lr: 0.000019 - momentum: 0.000000
2023-10-14 01:23:50,850 epoch 7 - iter 594/992 - loss 0.02499387 - time (sec): 35.10 - samples/sec: 2800.96 - lr: 0.000019 - momentum: 0.000000
2023-10-14 01:23:56,991 epoch 7 - iter 693/992 - loss 0.02457250 - time (sec): 41.24 - samples/sec: 2789.41 - lr: 0.000018 - momentum: 0.000000
2023-10-14 01:24:02,722 epoch 7 - iter 792/992 - loss 0.02502738 - time (sec): 46.97 - samples/sec: 2791.68 - lr: 0.000018 - momentum: 0.000000
2023-10-14 01:24:08,621 epoch 7 - iter 891/992 - loss 0.02451035 - time (sec): 52.87 - samples/sec: 2789.21 - lr: 0.000017 - momentum: 0.000000
2023-10-14 01:24:14,382 epoch 7 - iter 990/992 - loss 0.02379099 - time (sec): 58.63 - samples/sec: 2791.31 - lr: 0.000017 - momentum: 0.000000
2023-10-14 01:24:14,488 ----------------------------------------------------------------------------------------------------
2023-10-14 01:24:14,489 EPOCH 7 done: loss 0.0238 - lr: 0.000017
2023-10-14 01:24:18,294 DEV : loss 0.1903119683265686 - f1-score (micro avg)  0.7529
2023-10-14 01:24:18,315 ----------------------------------------------------------------------------------------------------
2023-10-14 01:24:24,188 epoch 8 - iter 99/992 - loss 0.01483288 - time (sec): 5.87 - samples/sec: 2925.75 - lr: 0.000016 - momentum: 0.000000
2023-10-14 01:24:29,964 epoch 8 - iter 198/992 - loss 0.01246495 - time (sec): 11.65 - samples/sec: 2854.69 - lr: 0.000016 - momentum: 0.000000
2023-10-14 01:24:35,622 epoch 8 - iter 297/992 - loss 0.01429814 - time (sec): 17.31 - samples/sec: 2821.25 - lr: 0.000015 - momentum: 0.000000
2023-10-14 01:24:41,814 epoch 8 - iter 396/992 - loss 0.01483730 - time (sec): 23.50 - samples/sec: 2813.76 - lr: 0.000014 - momentum: 0.000000
2023-10-14 01:24:47,786 epoch 8 - iter 495/992 - loss 0.01516838 - time (sec): 29.47 - samples/sec: 2816.87 - lr: 0.000014 - momentum: 0.000000
2023-10-14 01:24:53,786 epoch 8 - iter 594/992 - loss 0.01536659 - time (sec): 35.47 - samples/sec: 2816.83 - lr: 0.000013 - momentum: 0.000000
2023-10-14 01:24:59,378 epoch 8 - iter 693/992 - loss 0.01490078 - time (sec): 41.06 - samples/sec: 2828.30 - lr: 0.000013 - momentum: 0.000000
2023-10-14 01:25:05,215 epoch 8 - iter 792/992 - loss 0.01534412 - time (sec): 46.90 - samples/sec: 2812.41 - lr: 0.000012 - momentum: 0.000000
2023-10-14 01:25:11,007 epoch 8 - iter 891/992 - loss 0.01570233 - time (sec): 52.69 - samples/sec: 2806.90 - lr: 0.000012 - momentum: 0.000000
2023-10-14 01:25:16,688 epoch 8 - iter 990/992 - loss 0.01551159 - time (sec): 58.37 - samples/sec: 2805.59 - lr: 0.000011 - momentum: 0.000000
2023-10-14 01:25:16,785 ----------------------------------------------------------------------------------------------------
2023-10-14 01:25:16,785 EPOCH 8 done: loss 0.0155 - lr: 0.000011
2023-10-14 01:25:20,520 DEV : loss 0.20634520053863525 - f1-score (micro avg)  0.7621
2023-10-14 01:25:20,553 saving best model
2023-10-14 01:25:21,058 ----------------------------------------------------------------------------------------------------
2023-10-14 01:25:26,647 epoch 9 - iter 99/992 - loss 0.01027848 - time (sec): 5.59 - samples/sec: 2895.54 - lr: 0.000011 - momentum: 0.000000
2023-10-14 01:25:32,581 epoch 9 - iter 198/992 - loss 0.01067700 - time (sec): 11.52 - samples/sec: 2883.19 - lr: 0.000010 - momentum: 0.000000
2023-10-14 01:25:38,714 epoch 9 - iter 297/992 - loss 0.01091812 - time (sec): 17.65 - samples/sec: 2834.44 - lr: 0.000009 - momentum: 0.000000
2023-10-14 01:25:44,463 epoch 9 - iter 396/992 - loss 0.01073457 - time (sec): 23.40 - samples/sec: 2805.81 - lr: 0.000009 - momentum: 0.000000
2023-10-14 01:25:50,371 epoch 9 - iter 495/992 - loss 0.01029950 - time (sec): 29.31 - samples/sec: 2807.06 - lr: 0.000008 - momentum: 0.000000
2023-10-14 01:25:56,117 epoch 9 - iter 594/992 - loss 0.01110924 - time (sec): 35.06 - samples/sec: 2815.10 - lr: 0.000008 - momentum: 0.000000
2023-10-14 01:26:02,160 epoch 9 - iter 693/992 - loss 0.01176843 - time (sec): 41.10 - samples/sec: 2792.15 - lr: 0.000007 - momentum: 0.000000
2023-10-14 01:26:08,166 epoch 9 - iter 792/992 - loss 0.01153996 - time (sec): 47.11 - samples/sec: 2796.98 - lr: 0.000007 - momentum: 0.000000
2023-10-14 01:26:13,818 epoch 9 - iter 891/992 - loss 0.01138962 - time (sec): 52.76 - samples/sec: 2798.37 - lr: 0.000006 - momentum: 0.000000
2023-10-14 01:26:19,543 epoch 9 - iter 990/992 - loss 0.01156023 - time (sec): 58.48 - samples/sec: 2796.42 - lr: 0.000006 - momentum: 0.000000
2023-10-14 01:26:19,696 ----------------------------------------------------------------------------------------------------
2023-10-14 01:26:19,696 EPOCH 9 done: loss 0.0115 - lr: 0.000006
2023-10-14 01:26:23,692 DEV : loss 0.2140767127275467 - f1-score (micro avg)  0.7623
2023-10-14 01:26:23,714 saving best model
2023-10-14 01:26:24,211 ----------------------------------------------------------------------------------------------------
2023-10-14 01:26:30,251 epoch 10 - iter 99/992 - loss 0.00607883 - time (sec): 6.04 - samples/sec: 2911.03 - lr: 0.000005 - momentum: 0.000000
2023-10-14 01:26:36,244 epoch 10 - iter 198/992 - loss 0.00672977 - time (sec): 12.03 - samples/sec: 2833.64 - lr: 0.000004 - momentum: 0.000000
2023-10-14 01:26:41,831 epoch 10 - iter 297/992 - loss 0.00683740 - time (sec): 17.62 - samples/sec: 2791.38 - lr: 0.000004 - momentum: 0.000000
2023-10-14 01:26:47,746 epoch 10 - iter 396/992 - loss 0.00754100 - time (sec): 23.53 - samples/sec: 2795.11 - lr: 0.000003 - momentum: 0.000000
2023-10-14 01:26:53,622 epoch 10 - iter 495/992 - loss 0.00737902 - time (sec): 29.41 - samples/sec: 2800.18 - lr: 0.000003 - momentum: 0.000000
2023-10-14 01:26:59,458 epoch 10 - iter 594/992 - loss 0.00713022 - time (sec): 35.24 - samples/sec: 2791.65 - lr: 0.000002 - momentum: 0.000000
2023-10-14 01:27:05,247 epoch 10 - iter 693/992 - loss 0.00806298 - time (sec): 41.03 - samples/sec: 2793.59 - lr: 0.000002 - momentum: 0.000000
2023-10-14 01:27:11,236 epoch 10 - iter 792/992 - loss 0.00809314 - time (sec): 47.02 - samples/sec: 2792.23 - lr: 0.000001 - momentum: 0.000000
2023-10-14 01:27:16,908 epoch 10 - iter 891/992 - loss 0.00803018 - time (sec): 52.69 - samples/sec: 2806.21 - lr: 0.000001 - momentum: 0.000000
2023-10-14 01:27:22,660 epoch 10 - iter 990/992 - loss 0.00838705 - time (sec): 58.45 - samples/sec: 2800.71 - lr: 0.000000 - momentum: 0.000000
2023-10-14 01:27:22,767 ----------------------------------------------------------------------------------------------------
2023-10-14 01:27:22,767 EPOCH 10 done: loss 0.0084 - lr: 0.000000
2023-10-14 01:27:26,208 DEV : loss 0.22556838393211365 - f1-score (micro avg)  0.7641
2023-10-14 01:27:26,232 saving best model
2023-10-14 01:27:27,131 ----------------------------------------------------------------------------------------------------
2023-10-14 01:27:27,132 Loading model from best epoch ...
2023-10-14 01:27:28,425 SequenceTagger predicts: Dictionary with 13 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-14 01:27:31,682 
Results:
- F-score (micro) 0.7925
- F-score (macro) 0.712
- Accuracy 0.6784

By class:
              precision    recall  f1-score   support

         LOC     0.8363    0.8656    0.8507       655
         PER     0.7336    0.8027    0.7666       223
         ORG     0.5536    0.4882    0.5188       127

   micro avg     0.7814    0.8040    0.7925      1005
   macro avg     0.7078    0.7188    0.7120      1005
weighted avg     0.7778    0.8040    0.7901      1005

2023-10-14 01:27:31,682 ----------------------------------------------------------------------------------------------------