2023-10-12 16:33:19,371 ----------------------------------------------------------------------------------------------------
2023-10-12 16:33:19,373 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-12 16:33:19,373 ----------------------------------------------------------------------------------------------------
2023-10-12 16:33:19,374 MultiCorpus: 5777 train + 722 dev + 723 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 5777 train + 722 dev + 723 test sentences - /root/.flair/datasets/ner_icdar_europeana/nl
2023-10-12 16:33:19,374 ----------------------------------------------------------------------------------------------------
2023-10-12 16:33:19,374 Train: 5777 sentences
2023-10-12 16:33:19,374 (train_with_dev=False, train_with_test=False)
2023-10-12 16:33:19,374 ----------------------------------------------------------------------------------------------------
2023-10-12 16:33:19,374 Training Params:
2023-10-12 16:33:19,374 - learning_rate: "0.00016"
2023-10-12 16:33:19,374 - mini_batch_size: "4"
2023-10-12 16:33:19,374 - max_epochs: "10"
2023-10-12 16:33:19,374 - shuffle: "True"
2023-10-12 16:33:19,374 ----------------------------------------------------------------------------------------------------
2023-10-12 16:33:19,374 Plugins:
2023-10-12 16:33:19,374 - TensorboardLogger
2023-10-12 16:33:19,375 - LinearScheduler | warmup_fraction: '0.1'
2023-10-12 16:33:19,375 ----------------------------------------------------------------------------------------------------
2023-10-12 16:33:19,375 Final evaluation on model from best epoch (best-model.pt)
2023-10-12 16:33:19,375 - metric: "('micro avg', 'f1-score')"
2023-10-12 16:33:19,375 ----------------------------------------------------------------------------------------------------
2023-10-12 16:33:19,375 Computation:
2023-10-12 16:33:19,375 - compute on device: cuda:0
2023-10-12 16:33:19,375 - embedding storage: none
2023-10-12 16:33:19,375 ----------------------------------------------------------------------------------------------------
2023-10-12 16:33:19,375 Model training base path: "hmbench-icdar/nl-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-4"
2023-10-12 16:33:19,375 ----------------------------------------------------------------------------------------------------
2023-10-12 16:33:19,375 ----------------------------------------------------------------------------------------------------
2023-10-12 16:33:19,375 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-12 16:34:03,482 epoch 1 - iter 144/1445 - loss 2.56300254 - time (sec): 44.10 - samples/sec: 405.02 - lr: 0.000016 - momentum: 0.000000
2023-10-12 16:34:47,886 epoch 1 - iter 288/1445 - loss 2.38526191 - time (sec): 88.51 - samples/sec: 407.78 - lr: 0.000032 - momentum: 0.000000
2023-10-12 16:35:30,464 epoch 1 - iter 432/1445 - loss 2.13289859 - time (sec): 131.09 - samples/sec: 405.11 - lr: 0.000048 - momentum: 0.000000
2023-10-12 16:36:13,934 epoch 1 - iter 576/1445 - loss 1.84024411 - time (sec): 174.56 - samples/sec: 405.11 - lr: 0.000064 - momentum: 0.000000
2023-10-12 16:36:56,656 epoch 1 - iter 720/1445 - loss 1.55933001 - time (sec): 217.28 - samples/sec: 406.68 - lr: 0.000080 - momentum: 0.000000
2023-10-12 16:37:37,250 epoch 1 - iter 864/1445 - loss 1.35312862 - time (sec): 257.87 - samples/sec: 407.98 - lr: 0.000096 - momentum: 0.000000
2023-10-12 16:38:19,681 epoch 1 - iter 1008/1445 - loss 1.19495096 - time (sec): 300.30 - samples/sec: 407.48 - lr: 0.000112 - momentum: 0.000000
2023-10-12 16:39:02,528 epoch 1 - iter 1152/1445 - loss 1.06319855 - time (sec): 343.15 - samples/sec: 409.47 - lr: 0.000127 - momentum: 0.000000
2023-10-12 16:39:44,972 epoch 1 - iter 1296/1445 - loss 0.96042465 - time (sec): 385.59 - samples/sec: 412.03 - lr: 0.000143 - momentum: 0.000000
2023-10-12 16:40:27,593 epoch 1 - iter 1440/1445 - loss 0.88488446 - time (sec): 428.22 - samples/sec: 410.24 - lr: 0.000159 - momentum: 0.000000
2023-10-12 16:40:28,827 ----------------------------------------------------------------------------------------------------
2023-10-12 16:40:28,828 EPOCH 1 done: loss 0.8825 - lr: 0.000159
2023-10-12 16:40:48,696 DEV : loss 0.18981926143169403 - f1-score (micro avg) 0.3887
2023-10-12 16:40:48,730 saving best model
2023-10-12 16:40:49,705 ----------------------------------------------------------------------------------------------------
2023-10-12 16:41:32,871 epoch 2 - iter 144/1445 - loss 0.13981313 - time (sec): 43.16 - samples/sec: 410.92 - lr: 0.000158 - momentum: 0.000000
2023-10-12 16:42:13,700 epoch 2 - iter 288/1445 - loss 0.13956803 - time (sec): 83.99 - samples/sec: 415.01 - lr: 0.000156 - momentum: 0.000000
2023-10-12 16:42:54,732 epoch 2 - iter 432/1445 - loss 0.13836300 - time (sec): 125.02 - samples/sec: 417.55 - lr: 0.000155 - momentum: 0.000000
2023-10-12 16:43:36,244 epoch 2 - iter 576/1445 - loss 0.13112203 - time (sec): 166.54 - samples/sec: 417.27 - lr: 0.000153 - momentum: 0.000000
2023-10-12 16:44:19,336 epoch 2 - iter 720/1445 - loss 0.12498734 - time (sec): 209.63 - samples/sec: 409.49 - lr: 0.000151 - momentum: 0.000000
2023-10-12 16:45:03,660 epoch 2 - iter 864/1445 - loss 0.12218588 - time (sec): 253.95 - samples/sec: 407.93 - lr: 0.000149 - momentum: 0.000000
2023-10-12 16:45:46,648 epoch 2 - iter 1008/1445 - loss 0.12188956 - time (sec): 296.94 - samples/sec: 410.26 - lr: 0.000148 - momentum: 0.000000
2023-10-12 16:46:27,790 epoch 2 - iter 1152/1445 - loss 0.11964934 - time (sec): 338.08 - samples/sec: 413.86 - lr: 0.000146 - momentum: 0.000000
2023-10-12 16:47:09,340 epoch 2 - iter 1296/1445 - loss 0.11521379 - time (sec): 379.63 - samples/sec: 415.90 - lr: 0.000144 - momentum: 0.000000
2023-10-12 16:47:50,210 epoch 2 - iter 1440/1445 - loss 0.11336959 - time (sec): 420.50 - samples/sec: 417.31 - lr: 0.000142 - momentum: 0.000000
2023-10-12 16:47:51,614 ----------------------------------------------------------------------------------------------------
2023-10-12 16:47:51,614 EPOCH 2 done: loss 0.1131 - lr: 0.000142
2023-10-12 16:48:12,874 DEV : loss 0.09119933843612671 - f1-score (micro avg) 0.8308
2023-10-12 16:48:12,904 saving best model
2023-10-12 16:48:15,491 ----------------------------------------------------------------------------------------------------
2023-10-12 16:48:57,020 epoch 3 - iter 144/1445 - loss 0.07985533 - time (sec): 41.52 - samples/sec: 405.49 - lr: 0.000140 - momentum: 0.000000
2023-10-12 16:49:38,239 epoch 3 - iter 288/1445 - loss 0.07523997 - time (sec): 82.74 - samples/sec: 418.79 - lr: 0.000139 - momentum: 0.000000
2023-10-12 16:50:18,976 epoch 3 - iter 432/1445 - loss 0.07488360 - time (sec): 123.48 - samples/sec: 419.10 - lr: 0.000137 - momentum: 0.000000
2023-10-12 16:50:59,095 epoch 3 - iter 576/1445 - loss 0.07112822 - time (sec): 163.60 - samples/sec: 421.60 - lr: 0.000135 - momentum: 0.000000
2023-10-12 16:51:39,727 epoch 3 - iter 720/1445 - loss 0.07107570 - time (sec): 204.23 - samples/sec: 423.37 - lr: 0.000133 - momentum: 0.000000
2023-10-12 16:52:23,561 epoch 3 - iter 864/1445 - loss 0.07062393 - time (sec): 248.06 - samples/sec: 425.40 - lr: 0.000132 - momentum: 0.000000
2023-10-12 16:53:06,138 epoch 3 - iter 1008/1445 - loss 0.06769948 - time (sec): 290.64 - samples/sec: 424.70 - lr: 0.000130 - momentum: 0.000000
2023-10-12 16:53:47,855 epoch 3 - iter 1152/1445 - loss 0.06676501 - time (sec): 332.36 - samples/sec: 423.10 - lr: 0.000128 - momentum: 0.000000
2023-10-12 16:54:29,300 epoch 3 - iter 1296/1445 - loss 0.06508146 - time (sec): 373.80 - samples/sec: 422.23 - lr: 0.000126 - momentum: 0.000000
2023-10-12 16:55:11,121 epoch 3 - iter 1440/1445 - loss 0.06435452 - time (sec): 415.62 - samples/sec: 422.69 - lr: 0.000125 - momentum: 0.000000
2023-10-12 16:55:12,348 ----------------------------------------------------------------------------------------------------
2023-10-12 16:55:12,349 EPOCH 3 done: loss 0.0644 - lr: 0.000125
2023-10-12 16:55:32,850 DEV : loss 0.06621405482292175 - f1-score (micro avg) 0.8638
2023-10-12 16:55:32,879 saving best model
2023-10-12 16:55:35,449 ----------------------------------------------------------------------------------------------------
2023-10-12 16:56:16,669 epoch 4 - iter 144/1445 - loss 0.03491763 - time (sec): 41.22 - samples/sec: 458.62 - lr: 0.000123 - momentum: 0.000000
2023-10-12 16:56:55,734 epoch 4 - iter 288/1445 - loss 0.03857362 - time (sec): 80.28 - samples/sec: 436.36 - lr: 0.000121 - momentum: 0.000000
2023-10-12 16:57:35,889 epoch 4 - iter 432/1445 - loss 0.03858829 - time (sec): 120.43 - samples/sec: 429.12 - lr: 0.000119 - momentum: 0.000000
2023-10-12 16:58:16,860 epoch 4 - iter 576/1445 - loss 0.04126412 - time (sec): 161.41 - samples/sec: 427.20 - lr: 0.000117 - momentum: 0.000000
2023-10-12 16:58:57,647 epoch 4 - iter 720/1445 - loss 0.04127926 - time (sec): 202.19 - samples/sec: 425.21 - lr: 0.000116 - momentum: 0.000000
2023-10-12 16:59:40,966 epoch 4 - iter 864/1445 - loss 0.04331915 - time (sec): 245.51 - samples/sec: 423.87 - lr: 0.000114 - momentum: 0.000000
2023-10-12 17:00:22,245 epoch 4 - iter 1008/1445 - loss 0.04325634 - time (sec): 286.79 - samples/sec: 429.29 - lr: 0.000112 - momentum: 0.000000
2023-10-12 17:01:01,932 epoch 4 - iter 1152/1445 - loss 0.04376823 - time (sec): 326.48 - samples/sec: 429.38 - lr: 0.000110 - momentum: 0.000000
2023-10-12 17:01:41,831 epoch 4 - iter 1296/1445 - loss 0.04471474 - time (sec): 366.38 - samples/sec: 430.38 - lr: 0.000109 - momentum: 0.000000
2023-10-12 17:02:22,332 epoch 4 - iter 1440/1445 - loss 0.04407434 - time (sec): 406.88 - samples/sec: 431.94 - lr: 0.000107 - momentum: 0.000000
2023-10-12 17:02:23,481 ----------------------------------------------------------------------------------------------------
2023-10-12 17:02:23,481 EPOCH 4 done: loss 0.0441 - lr: 0.000107
2023-10-12 17:02:44,195 DEV : loss 0.08768334984779358 - f1-score (micro avg) 0.8564
2023-10-12 17:02:44,226 ----------------------------------------------------------------------------------------------------
2023-10-12 17:03:24,221 epoch 5 - iter 144/1445 - loss 0.01928414 - time (sec): 39.99 - samples/sec: 424.43 - lr: 0.000105 - momentum: 0.000000
2023-10-12 17:04:04,655 epoch 5 - iter 288/1445 - loss 0.02782982 - time (sec): 80.43 - samples/sec: 427.60 - lr: 0.000103 - momentum: 0.000000
2023-10-12 17:04:44,485 epoch 5 - iter 432/1445 - loss 0.02750831 - time (sec): 120.26 - samples/sec: 433.41 - lr: 0.000101 - momentum: 0.000000
2023-10-12 17:05:25,173 epoch 5 - iter 576/1445 - loss 0.02988547 - time (sec): 160.95 - samples/sec: 438.34 - lr: 0.000100 - momentum: 0.000000
2023-10-12 17:06:05,256 epoch 5 - iter 720/1445 - loss 0.03160699 - time (sec): 201.03 - samples/sec: 438.23 - lr: 0.000098 - momentum: 0.000000
2023-10-12 17:06:45,729 epoch 5 - iter 864/1445 - loss 0.03162838 - time (sec): 241.50 - samples/sec: 437.51 - lr: 0.000096 - momentum: 0.000000
2023-10-12 17:07:24,736 epoch 5 - iter 1008/1445 - loss 0.03117068 - time (sec): 280.51 - samples/sec: 435.41 - lr: 0.000094 - momentum: 0.000000
2023-10-12 17:08:05,615 epoch 5 - iter 1152/1445 - loss 0.03085839 - time (sec): 321.39 - samples/sec: 435.33 - lr: 0.000093 - momentum: 0.000000
2023-10-12 17:08:46,790 epoch 5 - iter 1296/1445 - loss 0.03070485 - time (sec): 362.56 - samples/sec: 434.52 - lr: 0.000091 - momentum: 0.000000
2023-10-12 17:09:27,677 epoch 5 - iter 1440/1445 - loss 0.03200208 - time (sec): 403.45 - samples/sec: 435.47 - lr: 0.000089 - momentum: 0.000000
2023-10-12 17:09:28,879 ----------------------------------------------------------------------------------------------------
2023-10-12 17:09:28,879 EPOCH 5 done: loss 0.0319 - lr: 0.000089
2023-10-12 17:09:49,304 DEV : loss 0.10739118605852127 - f1-score (micro avg) 0.8495
2023-10-12 17:09:49,334 ----------------------------------------------------------------------------------------------------
2023-10-12 17:10:30,193 epoch 6 - iter 144/1445 - loss 0.03926076 - time (sec): 40.86 - samples/sec: 439.95 - lr: 0.000087 - momentum: 0.000000
2023-10-12 17:11:09,843 epoch 6 - iter 288/1445 - loss 0.03195608 - time (sec): 80.51 - samples/sec: 434.47 - lr: 0.000085 - momentum: 0.000000
2023-10-12 17:11:50,454 epoch 6 - iter 432/1445 - loss 0.03002443 - time (sec): 121.12 - samples/sec: 436.80 - lr: 0.000084 - momentum: 0.000000
2023-10-12 17:12:31,958 epoch 6 - iter 576/1445 - loss 0.02767328 - time (sec): 162.62 - samples/sec: 441.01 - lr: 0.000082 - momentum: 0.000000
2023-10-12 17:13:10,662 epoch 6 - iter 720/1445 - loss 0.02642560 - time (sec): 201.33 - samples/sec: 434.28 - lr: 0.000080 - momentum: 0.000000
2023-10-12 17:13:52,438 epoch 6 - iter 864/1445 - loss 0.02552455 - time (sec): 243.10 - samples/sec: 439.35 - lr: 0.000078 - momentum: 0.000000
2023-10-12 17:14:31,894 epoch 6 - iter 1008/1445 - loss 0.02476366 - time (sec): 282.56 - samples/sec: 437.47 - lr: 0.000076 - momentum: 0.000000
2023-10-12 17:15:12,546 epoch 6 - iter 1152/1445 - loss 0.02469782 - time (sec): 323.21 - samples/sec: 436.99 - lr: 0.000075 - momentum: 0.000000
2023-10-12 17:15:54,048 epoch 6 - iter 1296/1445 - loss 0.02478677 - time (sec): 364.71 - samples/sec: 435.29 - lr: 0.000073 - momentum: 0.000000
2023-10-12 17:16:33,885 epoch 6 - iter 1440/1445 - loss 0.02484737 - time (sec): 404.55 - samples/sec: 434.17 - lr: 0.000071 - momentum: 0.000000
2023-10-12 17:16:35,141 ----------------------------------------------------------------------------------------------------
2023-10-12 17:16:35,141 EPOCH 6 done: loss 0.0248 - lr: 0.000071
2023-10-12 17:16:56,074 DEV : loss 0.11206483840942383 - f1-score (micro avg) 0.8582
2023-10-12 17:16:56,105 ----------------------------------------------------------------------------------------------------
2023-10-12 17:17:37,350 epoch 7 - iter 144/1445 - loss 0.03196159 - time (sec): 41.24 - samples/sec: 405.21 - lr: 0.000069 - momentum: 0.000000
2023-10-12 17:18:21,847 epoch 7 - iter 288/1445 - loss 0.02494608 - time (sec): 85.74 - samples/sec: 417.58 - lr: 0.000068 - momentum: 0.000000
2023-10-12 17:19:04,928 epoch 7 - iter 432/1445 - loss 0.02528055 - time (sec): 128.82 - samples/sec: 416.07 - lr: 0.000066 - momentum: 0.000000
2023-10-12 17:19:47,446 epoch 7 - iter 576/1445 - loss 0.02201332 - time (sec): 171.34 - samples/sec: 414.68 - lr: 0.000064 - momentum: 0.000000
2023-10-12 17:20:30,277 epoch 7 - iter 720/1445 - loss 0.02177977 - time (sec): 214.17 - samples/sec: 414.34 - lr: 0.000062 - momentum: 0.000000
2023-10-12 17:21:12,767 epoch 7 - iter 864/1445 - loss 0.02046133 - time (sec): 256.66 - samples/sec: 416.64 - lr: 0.000060 - momentum: 0.000000
2023-10-12 17:21:54,758 epoch 7 - iter 1008/1445 - loss 0.02080315 - time (sec): 298.65 - samples/sec: 419.29 - lr: 0.000059 - momentum: 0.000000
2023-10-12 17:22:35,231 epoch 7 - iter 1152/1445 - loss 0.01995810 - time (sec): 339.12 - samples/sec: 420.44 - lr: 0.000057 - momentum: 0.000000
2023-10-12 17:23:15,074 epoch 7 - iter 1296/1445 - loss 0.01959775 - time (sec): 378.97 - samples/sec: 419.20 - lr: 0.000055 - momentum: 0.000000
2023-10-12 17:23:57,171 epoch 7 - iter 1440/1445 - loss 0.01919288 - time (sec): 421.06 - samples/sec: 417.48 - lr: 0.000053 - momentum: 0.000000
2023-10-12 17:23:58,329 ----------------------------------------------------------------------------------------------------
2023-10-12 17:23:58,330 EPOCH 7 done: loss 0.0191 - lr: 0.000053
2023-10-12 17:24:20,533 DEV : loss 0.12889063358306885 - f1-score (micro avg) 0.853
2023-10-12 17:24:20,562 ----------------------------------------------------------------------------------------------------
2023-10-12 17:25:01,698 epoch 8 - iter 144/1445 - loss 0.01021579 - time (sec): 41.13 - samples/sec: 434.73 - lr: 0.000052 - momentum: 0.000000
2023-10-12 17:25:42,787 epoch 8 - iter 288/1445 - loss 0.01004474 - time (sec): 82.22 - samples/sec: 437.18 - lr: 0.000050 - momentum: 0.000000
2023-10-12 17:26:23,418 epoch 8 - iter 432/1445 - loss 0.01117270 - time (sec): 122.85 - samples/sec: 435.97 - lr: 0.000048 - momentum: 0.000000
2023-10-12 17:27:05,836 epoch 8 - iter 576/1445 - loss 0.01033342 - time (sec): 165.27 - samples/sec: 437.03 - lr: 0.000046 - momentum: 0.000000
2023-10-12 17:27:45,951 epoch 8 - iter 720/1445 - loss 0.01317246 - time (sec): 205.39 - samples/sec: 428.88 - lr: 0.000044 - momentum: 0.000000
2023-10-12 17:28:26,943 epoch 8 - iter 864/1445 - loss 0.01310077 - time (sec): 246.38 - samples/sec: 426.90 - lr: 0.000043 - momentum: 0.000000
2023-10-12 17:29:07,910 epoch 8 - iter 1008/1445 - loss 0.01364110 - time (sec): 287.35 - samples/sec: 426.32 - lr: 0.000041 - momentum: 0.000000
2023-10-12 17:29:50,319 epoch 8 - iter 1152/1445 - loss 0.01391426 - time (sec): 329.75 - samples/sec: 425.34 - lr: 0.000039 - momentum: 0.000000
2023-10-12 17:30:32,149 epoch 8 - iter 1296/1445 - loss 0.01330664 - time (sec): 371.58 - samples/sec: 424.88 - lr: 0.000037 - momentum: 0.000000
2023-10-12 17:31:14,018 epoch 8 - iter 1440/1445 - loss 0.01380146 - time (sec): 413.45 - samples/sec: 425.19 - lr: 0.000036 - momentum: 0.000000
2023-10-12 17:31:15,188 ----------------------------------------------------------------------------------------------------
2023-10-12 17:31:15,188 EPOCH 8 done: loss 0.0138 - lr: 0.000036
2023-10-12 17:31:35,933 DEV : loss 0.12251030653715134 - f1-score (micro avg) 0.8642
2023-10-12 17:31:35,963 saving best model
2023-10-12 17:31:36,938 ----------------------------------------------------------------------------------------------------
2023-10-12 17:32:18,958 epoch 9 - iter 144/1445 - loss 0.00745630 - time (sec): 42.02 - samples/sec: 437.46 - lr: 0.000034 - momentum: 0.000000
2023-10-12 17:33:00,161 epoch 9 - iter 288/1445 - loss 0.00761895 - time (sec): 83.22 - samples/sec: 422.78 - lr: 0.000032 - momentum: 0.000000
2023-10-12 17:33:40,433 epoch 9 - iter 432/1445 - loss 0.00995441 - time (sec): 123.49 - samples/sec: 422.87 - lr: 0.000030 - momentum: 0.000000
2023-10-12 17:34:21,456 epoch 9 - iter 576/1445 - loss 0.00880978 - time (sec): 164.52 - samples/sec: 423.26 - lr: 0.000028 - momentum: 0.000000
2023-10-12 17:35:02,766 epoch 9 - iter 720/1445 - loss 0.00785433 - time (sec): 205.83 - samples/sec: 427.36 - lr: 0.000027 - momentum: 0.000000
2023-10-12 17:35:43,814 epoch 9 - iter 864/1445 - loss 0.00778198 - time (sec): 246.87 - samples/sec: 427.54 - lr: 0.000025 - momentum: 0.000000
2023-10-12 17:36:24,978 epoch 9 - iter 1008/1445 - loss 0.00817747 - time (sec): 288.04 - samples/sec: 430.09 - lr: 0.000023 - momentum: 0.000000
2023-10-12 17:37:05,337 epoch 9 - iter 1152/1445 - loss 0.00847105 - time (sec): 328.40 - samples/sec: 429.48 - lr: 0.000021 - momentum: 0.000000
2023-10-12 17:37:45,918 epoch 9 - iter 1296/1445 - loss 0.00923305 - time (sec): 368.98 - samples/sec: 426.79 - lr: 0.000020 - momentum: 0.000000
2023-10-12 17:38:28,699 epoch 9 - iter 1440/1445 - loss 0.00900821 - time (sec): 411.76 - samples/sec: 425.30 - lr: 0.000018 - momentum: 0.000000
2023-10-12 17:38:30,718 ----------------------------------------------------------------------------------------------------
2023-10-12 17:38:30,718 EPOCH 9 done: loss 0.0096 - lr: 0.000018
2023-10-12 17:38:52,760 DEV : loss 0.14077246189117432 - f1-score (micro avg) 0.8633
2023-10-12 17:38:52,801 ----------------------------------------------------------------------------------------------------
2023-10-12 17:39:37,018 epoch 10 - iter 144/1445 - loss 0.01174630 - time (sec): 44.21 - samples/sec: 426.17 - lr: 0.000016 - momentum: 0.000000
2023-10-12 17:40:20,486 epoch 10 - iter 288/1445 - loss 0.01002501 - time (sec): 87.68 - samples/sec: 415.57 - lr: 0.000014 - momentum: 0.000000
2023-10-12 17:41:04,188 epoch 10 - iter 432/1445 - loss 0.00910839 - time (sec): 131.39 - samples/sec: 407.70 - lr: 0.000012 - momentum: 0.000000
2023-10-12 17:41:48,780 epoch 10 - iter 576/1445 - loss 0.00824494 - time (sec): 175.98 - samples/sec: 405.69 - lr: 0.000011 - momentum: 0.000000
2023-10-12 17:42:33,468 epoch 10 - iter 720/1445 - loss 0.00769188 - time (sec): 220.67 - samples/sec: 404.45 - lr: 0.000009 - momentum: 0.000000
2023-10-12 17:43:20,283 epoch 10 - iter 864/1445 - loss 0.00752313 - time (sec): 267.48 - samples/sec: 403.45 - lr: 0.000007 - momentum: 0.000000
2023-10-12 17:44:01,105 epoch 10 - iter 1008/1445 - loss 0.00804398 - time (sec): 308.30 - samples/sec: 401.60 - lr: 0.000005 - momentum: 0.000000
2023-10-12 17:44:43,927 epoch 10 - iter 1152/1445 - loss 0.00739720 - time (sec): 351.12 - samples/sec: 405.21 - lr: 0.000004 - momentum: 0.000000
2023-10-12 17:45:25,058 epoch 10 - iter 1296/1445 - loss 0.00730540 - time (sec): 392.26 - samples/sec: 404.88 - lr: 0.000002 - momentum: 0.000000
2023-10-12 17:46:07,507 epoch 10 - iter 1440/1445 - loss 0.00754475 - time (sec): 434.70 - samples/sec: 404.40 - lr: 0.000000 - momentum: 0.000000
2023-10-12 17:46:08,714 ----------------------------------------------------------------------------------------------------
2023-10-12 17:46:08,715 EPOCH 10 done: loss 0.0075 - lr: 0.000000
2023-10-12 17:46:30,653 DEV : loss 0.14291653037071228 - f1-score (micro avg) 0.8604
2023-10-12 17:46:31,574 ----------------------------------------------------------------------------------------------------
2023-10-12 17:46:31,576 Loading model from best epoch ...
2023-10-12 17:46:35,453 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-12 17:46:59,023 Results:
- F-score (micro) 0.851
- F-score (macro) 0.744
- Accuracy 0.7524

By class:
              precision    recall  f1-score   support

         PER     0.8621    0.8693    0.8657       482
         LOC     0.9277    0.8690    0.8974       458
         ORG     0.4474    0.4928    0.4690        69

   micro avg     0.8587    0.8434    0.8510      1009
   macro avg     0.7457    0.7437    0.7440      1009
weighted avg     0.8636    0.8434    0.8530      1009

2023-10-12 17:46:59,023 ----------------------------------------------------------------------------------------------------
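
Note: the following is a minimal sketch of how a comparable Flair fine-tuning run could be set up; it is reconstructed from the logged settings and is not the script that produced this log. Assumptions are marked in the comments: the checkpoint id is inferred from the model training base path, the custom ByT5Embeddings class from the model dump is approximated with Flair's standard TransformerWordEmbeddings, and hidden_size is an illustrative placeholder.

# Hypothetical reconstruction; hyperparameters taken from the "Training Params" section above.
from flair.datasets import NER_ICDAR_EUROPEANA
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Dutch ICDAR Europeana NER corpus (5777 train / 722 dev / 723 test sentences, as logged above)
corpus = NER_ICDAR_EUROPEANA(language="nl")
label_dict = corpus.make_label_dictionary(label_type="ner")

# Byte-level ByT5 encoder embeddings: last layer only, first-subtoken pooling, fine-tuned end to end
# (stand-in for the custom ByT5Embeddings class shown in the model dump)
embeddings = TransformerWordEmbeddings(
    model="hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax",  # inferred from the base path
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
)

# Plain linear classification head on top of the embeddings, no RNN and no CRF
# (matches "crfFalse" in the base path and the Linear(1472, 13) layer in the model dump)
tagger = SequenceTagger(
    hidden_size=256,  # illustrative; only relevant when an RNN is used
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)

# fine_tune() uses AdamW with a linear schedule and warmup by default, which corresponds to the
# "LinearScheduler | warmup_fraction: '0.1'" plugin reported in the log header
trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "hmbench-icdar/nl-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-4",
    learning_rate=0.00016,
    mini_batch_size=4,
    max_epochs=10,
)

# The best checkpoint saved during training (best-model.pt, see "Loading model from best epoch" above)
# can afterwards be reloaded for prediction with SequenceTagger.load().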