2023-10-14 01:06:03,859 ----------------------------------------------------------------------------------------------------
2023-10-14 01:06:03,860 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(32001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-14 01:06:03,860 ----------------------------------------------------------------------------------------------------
2023-10-14 01:06:03,860 MultiCorpus: 7936 train + 992 dev + 992 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 7936 train + 992 dev + 992 test sentences - /root/.flair/datasets/ner_icdar_europeana/fr
2023-10-14 01:06:03,860 ----------------------------------------------------------------------------------------------------
2023-10-14 01:06:03,860 Train:  7936 sentences
2023-10-14 01:06:03,860         (train_with_dev=False, train_with_test=False)
2023-10-14 01:06:03,860 ----------------------------------------------------------------------------------------------------
2023-10-14 01:06:03,860 Training Params:
2023-10-14 01:06:03,860  - learning_rate: "3e-05" 
2023-10-14 01:06:03,860  - mini_batch_size: "8"
2023-10-14 01:06:03,860  - max_epochs: "10"
2023-10-14 01:06:03,860  - shuffle: "True"
2023-10-14 01:06:03,860 ----------------------------------------------------------------------------------------------------
2023-10-14 01:06:03,860 Plugins:
2023-10-14 01:06:03,860  - LinearScheduler | warmup_fraction: '0.1'
2023-10-14 01:06:03,860 ----------------------------------------------------------------------------------------------------
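The `lr` column in the per-iteration lines below can be reproduced with a short sketch. It assumes the LinearScheduler plugin ramps the rate linearly from 0 to `learning_rate` over the first `warmup_fraction` of all steps and then decays it linearly to 0. That assumption is consistent with the logged values but is not taken from the library source. The function name `linear_warmup_lr` is hypothetical; the step count (992 iterations x 10 epochs) comes from the parameters logged above.

```python
def linear_warmup_lr(step, base_lr=3e-5, total_steps=992 * 10, warmup_fraction=0.1):
    """Learning rate at a given global step under linear warmup, then linear decay.

    Assumed schedule (hypothetical reconstruction, not the library's code):
    - steps [0, warmup_steps): ramp linearly from 0 up to base_lr
    - steps [warmup_steps, total_steps]: decay linearly from base_lr to 0
    """
    warmup_steps = round(total_steps * warmup_fraction)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


# Global step = (epoch - 1) * 992 + iteration within the epoch.
for epoch, it in [(1, 99), (1, 990), (2, 990), (10, 990)]:
    step = (epoch - 1) * 992 + it
    print(f"epoch {epoch} iter {it}: lr {linear_warmup_lr(step):.6f}")
```

Under this assumption, warmup covers exactly the first epoch (992 steps), which matches the rate peaking at 0.000030 at the end of epoch 1 and falling to 0.000000 by the end of epoch 10.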
2023-10-14 01:06:03,861 Final evaluation on model from best epoch (best-model.pt)
2023-10-14 01:06:03,861  - metric: "('micro avg', 'f1-score')"
2023-10-14 01:06:03,861 ----------------------------------------------------------------------------------------------------
2023-10-14 01:06:03,861 Computation:
2023-10-14 01:06:03,861  - compute on device: cuda:0
2023-10-14 01:06:03,861  - embedding storage: none
2023-10-14 01:06:03,861 ----------------------------------------------------------------------------------------------------
2023-10-14 01:06:03,861 Model training base path: "hmbench-icdar/fr-dbmdz/bert-base-historic-multilingual-cased-bs8-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-5"
2023-10-14 01:06:03,861 ----------------------------------------------------------------------------------------------------
2023-10-14 01:06:03,861 ----------------------------------------------------------------------------------------------------
2023-10-14 01:06:09,367 epoch 1 - iter 99/992 - loss 2.17471007 - time (sec): 5.51 - samples/sec: 2804.90 - lr: 0.000003 - momentum: 0.000000
2023-10-14 01:06:15,065 epoch 1 - iter 198/992 - loss 1.30549124 - time (sec): 11.20 - samples/sec: 2813.64 - lr: 0.000006 - momentum: 0.000000
2023-10-14 01:06:20,977 epoch 1 - iter 297/992 - loss 0.95601849 - time (sec): 17.11 - samples/sec: 2810.76 - lr: 0.000009 - momentum: 0.000000
2023-10-14 01:06:26,541 epoch 1 - iter 396/992 - loss 0.76749053 - time (sec): 22.68 - samples/sec: 2838.05 - lr: 0.000012 - momentum: 0.000000
2023-10-14 01:06:32,406 epoch 1 - iter 495/992 - loss 0.65224931 - time (sec): 28.54 - samples/sec: 2831.49 - lr: 0.000015 - momentum: 0.000000
2023-10-14 01:06:38,300 epoch 1 - iter 594/992 - loss 0.56475688 - time (sec): 34.44 - samples/sec: 2841.74 - lr: 0.000018 - momentum: 0.000000
2023-10-14 01:06:44,094 epoch 1 - iter 693/992 - loss 0.50715690 - time (sec): 40.23 - samples/sec: 2827.09 - lr: 0.000021 - momentum: 0.000000
2023-10-14 01:06:50,014 epoch 1 - iter 792/992 - loss 0.46053485 - time (sec): 46.15 - samples/sec: 2818.19 - lr: 0.000024 - momentum: 0.000000
2023-10-14 01:06:55,862 epoch 1 - iter 891/992 - loss 0.42443987 - time (sec): 52.00 - samples/sec: 2813.04 - lr: 0.000027 - momentum: 0.000000
2023-10-14 01:07:01,950 epoch 1 - iter 990/992 - loss 0.39381793 - time (sec): 58.09 - samples/sec: 2813.05 - lr: 0.000030 - momentum: 0.000000
2023-10-14 01:07:02,157 ----------------------------------------------------------------------------------------------------
2023-10-14 01:07:02,157 EPOCH 1 done: loss 0.3930 - lr: 0.000030
2023-10-14 01:07:05,244 DEV : loss 0.09621600061655045 - f1-score (micro avg)  0.6851
2023-10-14 01:07:05,264 saving best model
2023-10-14 01:07:05,661 ----------------------------------------------------------------------------------------------------
2023-10-14 01:07:11,275 epoch 2 - iter 99/992 - loss 0.13490577 - time (sec): 5.61 - samples/sec: 2709.42 - lr: 0.000030 - momentum: 0.000000
2023-10-14 01:07:17,108 epoch 2 - iter 198/992 - loss 0.11610667 - time (sec): 11.45 - samples/sec: 2725.35 - lr: 0.000029 - momentum: 0.000000
2023-10-14 01:07:22,696 epoch 2 - iter 297/992 - loss 0.11416877 - time (sec): 17.03 - samples/sec: 2766.88 - lr: 0.000029 - momentum: 0.000000
2023-10-14 01:07:28,612 epoch 2 - iter 396/992 - loss 0.10769612 - time (sec): 22.95 - samples/sec: 2773.46 - lr: 0.000029 - momentum: 0.000000
2023-10-14 01:07:34,383 epoch 2 - iter 495/992 - loss 0.10656848 - time (sec): 28.72 - samples/sec: 2814.53 - lr: 0.000028 - momentum: 0.000000
2023-10-14 01:07:40,280 epoch 2 - iter 594/992 - loss 0.10563479 - time (sec): 34.62 - samples/sec: 2823.23 - lr: 0.000028 - momentum: 0.000000
2023-10-14 01:07:46,122 epoch 2 - iter 693/992 - loss 0.10497520 - time (sec): 40.46 - samples/sec: 2823.34 - lr: 0.000028 - momentum: 0.000000
2023-10-14 01:07:51,931 epoch 2 - iter 792/992 - loss 0.10311384 - time (sec): 46.27 - samples/sec: 2814.64 - lr: 0.000027 - momentum: 0.000000
2023-10-14 01:07:58,133 epoch 2 - iter 891/992 - loss 0.10194024 - time (sec): 52.47 - samples/sec: 2802.48 - lr: 0.000027 - momentum: 0.000000
2023-10-14 01:08:03,960 epoch 2 - iter 990/992 - loss 0.10262038 - time (sec): 58.30 - samples/sec: 2804.64 - lr: 0.000027 - momentum: 0.000000
2023-10-14 01:08:04,121 ----------------------------------------------------------------------------------------------------
2023-10-14 01:08:04,121 EPOCH 2 done: loss 0.1025 - lr: 0.000027
2023-10-14 01:08:07,983 DEV : loss 0.08396855741739273 - f1-score (micro avg)  0.7416
2023-10-14 01:08:08,004 saving best model
2023-10-14 01:08:08,517 ----------------------------------------------------------------------------------------------------
2023-10-14 01:08:14,189 epoch 3 - iter 99/992 - loss 0.06551401 - time (sec): 5.67 - samples/sec: 2662.64 - lr: 0.000026 - momentum: 0.000000
2023-10-14 01:08:20,258 epoch 3 - iter 198/992 - loss 0.06924244 - time (sec): 11.74 - samples/sec: 2763.88 - lr: 0.000026 - momentum: 0.000000
2023-10-14 01:08:25,798 epoch 3 - iter 297/992 - loss 0.07056573 - time (sec): 17.28 - samples/sec: 2771.16 - lr: 0.000026 - momentum: 0.000000
2023-10-14 01:08:31,708 epoch 3 - iter 396/992 - loss 0.07055584 - time (sec): 23.19 - samples/sec: 2748.52 - lr: 0.000025 - momentum: 0.000000
2023-10-14 01:08:37,722 epoch 3 - iter 495/992 - loss 0.06873774 - time (sec): 29.20 - samples/sec: 2788.98 - lr: 0.000025 - momentum: 0.000000
2023-10-14 01:08:43,529 epoch 3 - iter 594/992 - loss 0.07045008 - time (sec): 35.01 - samples/sec: 2794.24 - lr: 0.000025 - momentum: 0.000000
2023-10-14 01:08:49,391 epoch 3 - iter 693/992 - loss 0.07037162 - time (sec): 40.87 - samples/sec: 2803.95 - lr: 0.000024 - momentum: 0.000000
2023-10-14 01:08:55,506 epoch 3 - iter 792/992 - loss 0.07021216 - time (sec): 46.99 - samples/sec: 2796.17 - lr: 0.000024 - momentum: 0.000000
2023-10-14 01:09:01,195 epoch 3 - iter 891/992 - loss 0.06994634 - time (sec): 52.68 - samples/sec: 2790.15 - lr: 0.000024 - momentum: 0.000000
2023-10-14 01:09:06,919 epoch 3 - iter 990/992 - loss 0.06967297 - time (sec): 58.40 - samples/sec: 2801.51 - lr: 0.000023 - momentum: 0.000000
2023-10-14 01:09:07,048 ----------------------------------------------------------------------------------------------------
2023-10-14 01:09:07,048 EPOCH 3 done: loss 0.0696 - lr: 0.000023
2023-10-14 01:09:10,503 DEV : loss 0.11555210500955582 - f1-score (micro avg)  0.7446
2023-10-14 01:09:10,523 saving best model
2023-10-14 01:09:11,025 ----------------------------------------------------------------------------------------------------
2023-10-14 01:09:16,953 epoch 4 - iter 99/992 - loss 0.03972797 - time (sec): 5.93 - samples/sec: 2955.76 - lr: 0.000023 - momentum: 0.000000
2023-10-14 01:09:22,805 epoch 4 - iter 198/992 - loss 0.04570840 - time (sec): 11.78 - samples/sec: 2867.88 - lr: 0.000023 - momentum: 0.000000
2023-10-14 01:09:28,524 epoch 4 - iter 297/992 - loss 0.04904627 - time (sec): 17.50 - samples/sec: 2862.88 - lr: 0.000022 - momentum: 0.000000
2023-10-14 01:09:34,474 epoch 4 - iter 396/992 - loss 0.04945405 - time (sec): 23.45 - samples/sec: 2817.49 - lr: 0.000022 - momentum: 0.000000
2023-10-14 01:09:40,476 epoch 4 - iter 495/992 - loss 0.04831346 - time (sec): 29.45 - samples/sec: 2809.35 - lr: 0.000022 - momentum: 0.000000
2023-10-14 01:09:46,365 epoch 4 - iter 594/992 - loss 0.04825902 - time (sec): 35.34 - samples/sec: 2794.59 - lr: 0.000021 - momentum: 0.000000
2023-10-14 01:09:52,103 epoch 4 - iter 693/992 - loss 0.04856953 - time (sec): 41.08 - samples/sec: 2782.77 - lr: 0.000021 - momentum: 0.000000
2023-10-14 01:09:57,796 epoch 4 - iter 792/992 - loss 0.04919667 - time (sec): 46.77 - samples/sec: 2790.01 - lr: 0.000021 - momentum: 0.000000
2023-10-14 01:10:03,524 epoch 4 - iter 891/992 - loss 0.04895947 - time (sec): 52.50 - samples/sec: 2784.48 - lr: 0.000020 - momentum: 0.000000
2023-10-14 01:10:09,641 epoch 4 - iter 990/992 - loss 0.05170296 - time (sec): 58.62 - samples/sec: 2792.06 - lr: 0.000020 - momentum: 0.000000
2023-10-14 01:10:09,813 ----------------------------------------------------------------------------------------------------
2023-10-14 01:10:09,813 EPOCH 4 done: loss 0.0517 - lr: 0.000020
2023-10-14 01:10:13,733 DEV : loss 0.11588922142982483 - f1-score (micro avg)  0.7508
2023-10-14 01:10:13,754 saving best model
2023-10-14 01:10:14,263 ----------------------------------------------------------------------------------------------------
2023-10-14 01:10:20,108 epoch 5 - iter 99/992 - loss 0.03385090 - time (sec): 5.84 - samples/sec: 2833.18 - lr: 0.000020 - momentum: 0.000000
2023-10-14 01:10:25,922 epoch 5 - iter 198/992 - loss 0.03715409 - time (sec): 11.66 - samples/sec: 2861.38 - lr: 0.000019 - momentum: 0.000000
2023-10-14 01:10:31,745 epoch 5 - iter 297/992 - loss 0.03954664 - time (sec): 17.48 - samples/sec: 2820.44 - lr: 0.000019 - momentum: 0.000000
2023-10-14 01:10:37,565 epoch 5 - iter 396/992 - loss 0.03785856 - time (sec): 23.30 - samples/sec: 2828.39 - lr: 0.000019 - momentum: 0.000000
2023-10-14 01:10:43,326 epoch 5 - iter 495/992 - loss 0.03770491 - time (sec): 29.06 - samples/sec: 2832.89 - lr: 0.000018 - momentum: 0.000000
2023-10-14 01:10:49,025 epoch 5 - iter 594/992 - loss 0.03881175 - time (sec): 34.76 - samples/sec: 2844.93 - lr: 0.000018 - momentum: 0.000000
2023-10-14 01:10:54,554 epoch 5 - iter 693/992 - loss 0.04006727 - time (sec): 40.29 - samples/sec: 2834.78 - lr: 0.000018 - momentum: 0.000000
2023-10-14 01:11:00,484 epoch 5 - iter 792/992 - loss 0.04022835 - time (sec): 46.22 - samples/sec: 2839.18 - lr: 0.000017 - momentum: 0.000000
2023-10-14 01:11:06,349 epoch 5 - iter 891/992 - loss 0.03964020 - time (sec): 52.08 - samples/sec: 2827.37 - lr: 0.000017 - momentum: 0.000000
2023-10-14 01:11:12,062 epoch 5 - iter 990/992 - loss 0.03963685 - time (sec): 57.80 - samples/sec: 2832.44 - lr: 0.000017 - momentum: 0.000000
2023-10-14 01:11:12,174 ----------------------------------------------------------------------------------------------------
2023-10-14 01:11:12,174 EPOCH 5 done: loss 0.0396 - lr: 0.000017
2023-10-14 01:11:15,549 DEV : loss 0.149429589509964 - f1-score (micro avg)  0.7512
2023-10-14 01:11:15,571 saving best model
2023-10-14 01:11:16,077 ----------------------------------------------------------------------------------------------------
2023-10-14 01:11:22,503 epoch 6 - iter 99/992 - loss 0.03207299 - time (sec): 6.42 - samples/sec: 2699.08 - lr: 0.000016 - momentum: 0.000000
2023-10-14 01:11:28,059 epoch 6 - iter 198/992 - loss 0.03319848 - time (sec): 11.98 - samples/sec: 2789.57 - lr: 0.000016 - momentum: 0.000000
2023-10-14 01:11:33,724 epoch 6 - iter 297/992 - loss 0.03023090 - time (sec): 17.64 - samples/sec: 2790.07 - lr: 0.000016 - momentum: 0.000000
2023-10-14 01:11:39,537 epoch 6 - iter 396/992 - loss 0.03113076 - time (sec): 23.45 - samples/sec: 2808.14 - lr: 0.000015 - momentum: 0.000000
2023-10-14 01:11:45,406 epoch 6 - iter 495/992 - loss 0.03125986 - time (sec): 29.32 - samples/sec: 2812.25 - lr: 0.000015 - momentum: 0.000000
2023-10-14 01:11:51,038 epoch 6 - iter 594/992 - loss 0.03070883 - time (sec): 34.95 - samples/sec: 2815.40 - lr: 0.000015 - momentum: 0.000000
2023-10-14 01:11:56,908 epoch 6 - iter 693/992 - loss 0.03147804 - time (sec): 40.82 - samples/sec: 2809.72 - lr: 0.000014 - momentum: 0.000000
2023-10-14 01:12:02,821 epoch 6 - iter 792/992 - loss 0.03114004 - time (sec): 46.74 - samples/sec: 2805.07 - lr: 0.000014 - momentum: 0.000000
2023-10-14 01:12:08,954 epoch 6 - iter 891/992 - loss 0.03098304 - time (sec): 52.87 - samples/sec: 2800.80 - lr: 0.000014 - momentum: 0.000000
2023-10-14 01:12:14,742 epoch 6 - iter 990/992 - loss 0.03115872 - time (sec): 58.66 - samples/sec: 2790.74 - lr: 0.000013 - momentum: 0.000000
2023-10-14 01:12:14,852 ----------------------------------------------------------------------------------------------------
2023-10-14 01:12:14,852 EPOCH 6 done: loss 0.0311 - lr: 0.000013
2023-10-14 01:12:18,275 DEV : loss 0.1660223752260208 - f1-score (micro avg)  0.7549
2023-10-14 01:12:18,296 saving best model
2023-10-14 01:12:18,723 ----------------------------------------------------------------------------------------------------
2023-10-14 01:12:24,554 epoch 7 - iter 99/992 - loss 0.02003484 - time (sec): 5.83 - samples/sec: 2773.72 - lr: 0.000013 - momentum: 0.000000
2023-10-14 01:12:30,403 epoch 7 - iter 198/992 - loss 0.02219555 - time (sec): 11.68 - samples/sec: 2749.50 - lr: 0.000013 - momentum: 0.000000
2023-10-14 01:12:36,359 epoch 7 - iter 297/992 - loss 0.02044054 - time (sec): 17.63 - samples/sec: 2783.24 - lr: 0.000012 - momentum: 0.000000
2023-10-14 01:12:42,275 epoch 7 - iter 396/992 - loss 0.02115285 - time (sec): 23.55 - samples/sec: 2787.47 - lr: 0.000012 - momentum: 0.000000
2023-10-14 01:12:47,979 epoch 7 - iter 495/992 - loss 0.02111835 - time (sec): 29.25 - samples/sec: 2790.21 - lr: 0.000012 - momentum: 0.000000
2023-10-14 01:12:53,906 epoch 7 - iter 594/992 - loss 0.02231115 - time (sec): 35.18 - samples/sec: 2794.36 - lr: 0.000011 - momentum: 0.000000
2023-10-14 01:12:59,944 epoch 7 - iter 693/992 - loss 0.02257417 - time (sec): 41.22 - samples/sec: 2790.76 - lr: 0.000011 - momentum: 0.000000
2023-10-14 01:13:05,688 epoch 7 - iter 792/992 - loss 0.02390230 - time (sec): 46.96 - samples/sec: 2792.08 - lr: 0.000011 - momentum: 0.000000
2023-10-14 01:13:11,978 epoch 7 - iter 891/992 - loss 0.02331735 - time (sec): 53.25 - samples/sec: 2769.06 - lr: 0.000010 - momentum: 0.000000
2023-10-14 01:13:17,717 epoch 7 - iter 990/992 - loss 0.02276912 - time (sec): 58.99 - samples/sec: 2774.18 - lr: 0.000010 - momentum: 0.000000
2023-10-14 01:13:17,823 ----------------------------------------------------------------------------------------------------
2023-10-14 01:13:17,823 EPOCH 7 done: loss 0.0228 - lr: 0.000010
2023-10-14 01:13:21,216 DEV : loss 0.19811701774597168 - f1-score (micro avg)  0.7519
2023-10-14 01:13:21,240 ----------------------------------------------------------------------------------------------------
2023-10-14 01:13:26,990 epoch 8 - iter 99/992 - loss 0.01904585 - time (sec): 5.75 - samples/sec: 2988.62 - lr: 0.000010 - momentum: 0.000000
2023-10-14 01:13:32,631 epoch 8 - iter 198/992 - loss 0.01503292 - time (sec): 11.39 - samples/sec: 2919.52 - lr: 0.000009 - momentum: 0.000000
2023-10-14 01:13:38,169 epoch 8 - iter 297/992 - loss 0.01613248 - time (sec): 16.93 - samples/sec: 2884.32 - lr: 0.000009 - momentum: 0.000000
2023-10-14 01:13:44,251 epoch 8 - iter 396/992 - loss 0.01565979 - time (sec): 23.01 - samples/sec: 2873.38 - lr: 0.000009 - momentum: 0.000000
2023-10-14 01:13:50,113 epoch 8 - iter 495/992 - loss 0.01515949 - time (sec): 28.87 - samples/sec: 2875.24 - lr: 0.000008 - momentum: 0.000000
2023-10-14 01:13:56,010 epoch 8 - iter 594/992 - loss 0.01545050 - time (sec): 34.77 - samples/sec: 2873.67 - lr: 0.000008 - momentum: 0.000000
2023-10-14 01:14:01,470 epoch 8 - iter 693/992 - loss 0.01547792 - time (sec): 40.23 - samples/sec: 2886.91 - lr: 0.000008 - momentum: 0.000000
2023-10-14 01:14:07,035 epoch 8 - iter 792/992 - loss 0.01620138 - time (sec): 45.79 - samples/sec: 2880.33 - lr: 0.000007 - momentum: 0.000000
2023-10-14 01:14:12,743 epoch 8 - iter 891/992 - loss 0.01643518 - time (sec): 51.50 - samples/sec: 2871.73 - lr: 0.000007 - momentum: 0.000000
2023-10-14 01:14:18,495 epoch 8 - iter 990/992 - loss 0.01691683 - time (sec): 57.25 - samples/sec: 2860.38 - lr: 0.000007 - momentum: 0.000000
2023-10-14 01:14:18,596 ----------------------------------------------------------------------------------------------------
2023-10-14 01:14:18,596 EPOCH 8 done: loss 0.0169 - lr: 0.000007
2023-10-14 01:14:22,033 DEV : loss 0.2040073573589325 - f1-score (micro avg)  0.7532
2023-10-14 01:14:22,053 ----------------------------------------------------------------------------------------------------
2023-10-14 01:14:27,795 epoch 9 - iter 99/992 - loss 0.01058698 - time (sec): 5.74 - samples/sec: 2818.02 - lr: 0.000006 - momentum: 0.000000
2023-10-14 01:14:33,771 epoch 9 - iter 198/992 - loss 0.01076997 - time (sec): 11.72 - samples/sec: 2835.15 - lr: 0.000006 - momentum: 0.000000
2023-10-14 01:14:39,862 epoch 9 - iter 297/992 - loss 0.01232413 - time (sec): 17.81 - samples/sec: 2809.95 - lr: 0.000006 - momentum: 0.000000
2023-10-14 01:14:45,579 epoch 9 - iter 396/992 - loss 0.01181418 - time (sec): 23.52 - samples/sec: 2791.33 - lr: 0.000005 - momentum: 0.000000
2023-10-14 01:14:51,506 epoch 9 - iter 495/992 - loss 0.01146147 - time (sec): 29.45 - samples/sec: 2793.73 - lr: 0.000005 - momentum: 0.000000
2023-10-14 01:14:57,263 epoch 9 - iter 594/992 - loss 0.01205104 - time (sec): 35.21 - samples/sec: 2802.96 - lr: 0.000005 - momentum: 0.000000
2023-10-14 01:15:03,302 epoch 9 - iter 693/992 - loss 0.01212528 - time (sec): 41.25 - samples/sec: 2782.13 - lr: 0.000004 - momentum: 0.000000
2023-10-14 01:15:09,297 epoch 9 - iter 792/992 - loss 0.01203877 - time (sec): 47.24 - samples/sec: 2788.87 - lr: 0.000004 - momentum: 0.000000
2023-10-14 01:15:14,909 epoch 9 - iter 891/992 - loss 0.01226298 - time (sec): 52.85 - samples/sec: 2793.28 - lr: 0.000004 - momentum: 0.000000
2023-10-14 01:15:20,595 epoch 9 - iter 990/992 - loss 0.01257149 - time (sec): 58.54 - samples/sec: 2793.66 - lr: 0.000003 - momentum: 0.000000
2023-10-14 01:15:20,749 ----------------------------------------------------------------------------------------------------
2023-10-14 01:15:20,750 EPOCH 9 done: loss 0.0125 - lr: 0.000003
2023-10-14 01:15:24,734 DEV : loss 0.20826229453086853 - f1-score (micro avg)  0.7574
2023-10-14 01:15:24,754 saving best model
2023-10-14 01:15:25,268 ----------------------------------------------------------------------------------------------------
2023-10-14 01:15:31,404 epoch 10 - iter 99/992 - loss 0.00769297 - time (sec): 6.13 - samples/sec: 2865.70 - lr: 0.000003 - momentum: 0.000000
2023-10-14 01:15:37,404 epoch 10 - iter 198/992 - loss 0.00697383 - time (sec): 12.13 - samples/sec: 2809.85 - lr: 0.000003 - momentum: 0.000000
2023-10-14 01:15:42,992 epoch 10 - iter 297/992 - loss 0.00764415 - time (sec): 17.72 - samples/sec: 2775.21 - lr: 0.000002 - momentum: 0.000000
2023-10-14 01:15:48,942 epoch 10 - iter 396/992 - loss 0.00827819 - time (sec): 23.67 - samples/sec: 2778.87 - lr: 0.000002 - momentum: 0.000000
2023-10-14 01:15:54,828 epoch 10 - iter 495/992 - loss 0.00802192 - time (sec): 29.56 - samples/sec: 2786.17 - lr: 0.000002 - momentum: 0.000000
2023-10-14 01:16:00,688 epoch 10 - iter 594/992 - loss 0.00768562 - time (sec): 35.42 - samples/sec: 2778.12 - lr: 0.000001 - momentum: 0.000000
2023-10-14 01:16:06,479 epoch 10 - iter 693/992 - loss 0.00852248 - time (sec): 41.21 - samples/sec: 2781.76 - lr: 0.000001 - momentum: 0.000000
2023-10-14 01:16:12,477 epoch 10 - iter 792/992 - loss 0.00836690 - time (sec): 47.20 - samples/sec: 2781.41 - lr: 0.000001 - momentum: 0.000000
2023-10-14 01:16:18,131 epoch 10 - iter 891/992 - loss 0.00856003 - time (sec): 52.86 - samples/sec: 2797.47 - lr: 0.000000 - momentum: 0.000000
2023-10-14 01:16:23,891 epoch 10 - iter 990/992 - loss 0.00877170 - time (sec): 58.62 - samples/sec: 2792.44 - lr: 0.000000 - momentum: 0.000000
2023-10-14 01:16:23,999 ----------------------------------------------------------------------------------------------------
2023-10-14 01:16:24,000 EPOCH 10 done: loss 0.0088 - lr: 0.000000
2023-10-14 01:16:27,448 DEV : loss 0.21987785398960114 - f1-score (micro avg)  0.753
2023-10-14 01:16:27,906 ----------------------------------------------------------------------------------------------------
2023-10-14 01:16:27,907 Loading model from best epoch ...
2023-10-14 01:16:29,242 SequenceTagger predicts: Dictionary with 13 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-14 01:16:32,497 
Results:
- F-score (micro) 0.7723
- F-score (macro) 0.6898
- Accuracy 0.6513

By class:
              precision    recall  f1-score   support

         LOC     0.8118    0.8427    0.8270       655
         PER     0.7379    0.8206    0.7771       223
         ORG     0.4831    0.4488    0.4653       127

   micro avg     0.7572    0.7881    0.7723      1005
   macro avg     0.6776    0.7041    0.6898      1005
weighted avg     0.7538    0.7881    0.7702      1005

2023-10-14 01:16:32,497 ----------------------------------------------------------------------------------------------------
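The summary rows of the table above can be checked from the per-class rows. The sketch below is only an arithmetic check on the rounded values printed in the log, not the evaluation code that produced them. It recomputes each F1 as the harmonic mean of precision and recall, the macro average as the unweighted mean of class F1 scores, and the weighted average as the support-weighted mean. The variable names are illustrative.

```python
# (precision, recall, support) per class, copied from the "By class" table.
per_class = {
    "LOC": (0.8118, 0.8427, 655),
    "PER": (0.7379, 0.8206, 223),
    "ORG": (0.4831, 0.4488, 127),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


class_f1 = {name: f1(p, r) for name, (p, r, _) in per_class.items()}

# Macro average: every class counts equally, regardless of support.
macro_f1 = sum(class_f1.values()) / len(class_f1)

# Weighted average: each class F1 is weighted by its support.
total_support = sum(s for _, _, s in per_class.values())
weighted_f1 = sum(class_f1[n] * s for n, (_, _, s) in per_class.items()) / total_support

# Micro average F1 from the pooled precision/recall in the "micro avg" row.
micro_f1 = f1(0.7572, 0.7881)

print({k: round(v, 4) for k, v in class_f1.items()})
print(round(macro_f1, 4), round(weighted_f1, 4), round(micro_f1, 4))
```

The gap between the micro (0.7723) and macro (0.6898) scores comes from ORG: it has the lowest F1 and the smallest support, so it pulls the macro average down much more than the micro average.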