2023-10-12 12:50:33,877 ----------------------------------------------------------------------------------------------------
2023-10-12 12:50:33,879 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-12 12:50:33,879 ----------------------------------------------------------------------------------------------------
2023-10-12 12:50:33,880 MultiCorpus: 5777 train + 722 dev + 723 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 5777 train + 722 dev + 723 test sentences - /root/.flair/datasets/ner_icdar_europeana/nl
2023-10-12 12:50:33,880 ----------------------------------------------------------------------------------------------------
2023-10-12 12:50:33,880 Train: 5777 sentences
2023-10-12 12:50:33,880 (train_with_dev=False, train_with_test=False)
2023-10-12 12:50:33,880 ----------------------------------------------------------------------------------------------------
2023-10-12 12:50:33,880 Training Params:
2023-10-12 12:50:33,880 - learning_rate: "0.00015"
2023-10-12 12:50:33,880 - mini_batch_size: "8"
2023-10-12 12:50:33,880 - max_epochs: "10"
2023-10-12 12:50:33,880 - shuffle: "True"
2023-10-12 12:50:33,880 ----------------------------------------------------------------------------------------------------
2023-10-12 12:50:33,880 Plugins:
2023-10-12 12:50:33,881 - TensorboardLogger
2023-10-12 12:50:33,881 - LinearScheduler | warmup_fraction: '0.1'
2023-10-12 12:50:33,881 ----------------------------------------------------------------------------------------------------
2023-10-12 12:50:33,881 Final evaluation on model from best epoch (best-model.pt)
2023-10-12 12:50:33,881 - metric: "('micro avg', 'f1-score')"
2023-10-12 12:50:33,881
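The module shapes in the repr above are enough for a back-of-the-envelope parameter count of the encoder. The sketch below is pure arithmetic read off the printed shapes (d_model=1472, d_ff=3584, attention projection width 384, 12 blocks, 384 byte embeddings, 13 tags); it assumes `shared` and `embed_tokens` are tied and so counted once, which the repr does not state explicitly.

```python
# Back-of-the-envelope parameter count for the ByT5 encoder printed above.
# All numbers are read off the module shapes in the log.
d_model, d_ff, d_attn, n_blocks, vocab, n_tags = 1472, 3584, 384, 12, 384, 13

attn = 3 * d_model * d_attn + d_attn * d_model  # q, k, v and o, all bias=False
ff = 2 * d_model * d_ff + d_ff * d_model        # wi_0, wi_1 and wo (gated GELU)
norms = 2 * d_model                             # two RMSNorm weight vectors per block
block = attn + ff + norms

encoder = (
    vocab * d_model          # shared byte embedding (assumed tied with embed_tokens)
    + n_blocks * block
    + 32 * 6                 # relative_attention_bias, block 0 only
    + d_model                # final_layer_norm
)
head = d_model * n_tags + n_tags  # linear tag head, bias=True

print(f"encoder ~ {encoder / 1e6:.1f}M parameters, tag head = {head}")
# -> encoder ~ 217.7M parameters, tag head = 19149
```

So essentially all trainable weight sits in the encoder; the 13-way tag head on top contributes only ~19k parameters.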
----------------------------------------------------------------------------------------------------
2023-10-12 12:50:33,881 Computation:
2023-10-12 12:50:33,881 - compute on device: cuda:0
2023-10-12 12:50:33,881 - embedding storage: none
2023-10-12 12:50:33,881 ----------------------------------------------------------------------------------------------------
2023-10-12 12:50:33,881 Model training base path: "hmbench-icdar/nl-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-4"
2023-10-12 12:50:33,881 ----------------------------------------------------------------------------------------------------
2023-10-12 12:50:33,881 ----------------------------------------------------------------------------------------------------
2023-10-12 12:50:33,882 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-12 12:51:13,856 epoch 1 - iter 72/723 - loss 2.57210097 - time (sec): 39.97 - samples/sec: 446.88 - lr: 0.000015 - momentum: 0.000000
2023-10-12 12:51:55,334 epoch 1 - iter 144/723 - loss 2.50368577 - time (sec): 81.45 - samples/sec: 443.12 - lr: 0.000030 - momentum: 0.000000
2023-10-12 12:52:35,583 epoch 1 - iter 216/723 - loss 2.34479028 - time (sec): 121.70 - samples/sec: 436.36 - lr: 0.000045 - momentum: 0.000000
2023-10-12 12:53:16,570 epoch 1 - iter 288/723 - loss 2.14084440 - time (sec): 162.69 - samples/sec: 434.66 - lr: 0.000060 - momentum: 0.000000
2023-10-12 12:53:56,024 epoch 1 - iter 360/723 - loss 1.92004853 - time (sec): 202.14 - samples/sec: 437.13 - lr: 0.000074 - momentum: 0.000000
2023-10-12 12:54:36,420 epoch 1 - iter 432/723 - loss 1.71012873 - time (sec): 242.54 - samples/sec: 433.77 - lr: 0.000089 - momentum: 0.000000
2023-10-12 12:55:15,712 epoch 1 - iter 504/723 - loss 1.51773573 - time (sec): 281.83 - samples/sec: 434.20 - lr: 0.000104 - momentum: 0.000000
2023-10-12 12:55:55,929 epoch 1 - iter 576/723 - loss 1.35508499 - time (sec): 322.05 - samples/sec: 436.31 - lr: 0.000119 - momentum: 0.000000
2023-10-12 12:56:35,029 epoch 1 - iter 648/723 - loss 1.22422963 - time (sec): 361.15 - samples/sec: 439.92 - lr: 0.000134 - momentum: 0.000000
2023-10-12 12:57:12,712 epoch 1 - iter 720/723 - loss 1.12678212 - time (sec): 398.83 - samples/sec: 440.47 - lr: 0.000149 - momentum: 0.000000
2023-10-12 12:57:13,890 ----------------------------------------------------------------------------------------------------
2023-10-12 12:57:13,891 EPOCH 1 done: loss 1.1237 - lr: 0.000149
2023-10-12 12:57:33,815 DEV : loss 0.22408561408519745 - f1-score (micro avg) 0.0021
2023-10-12 12:57:33,845 saving best model
2023-10-12 12:57:34,699 ----------------------------------------------------------------------------------------------------
2023-10-12 12:58:12,792 epoch 2 - iter 72/723 - loss 0.16692332 - time (sec): 38.09 - samples/sec: 465.66 - lr: 0.000148 - momentum: 0.000000
2023-10-12 12:58:50,804 epoch 2 - iter 144/723 - loss 0.17013500 - time (sec): 76.10 - samples/sec: 458.04 - lr: 0.000147 - momentum: 0.000000
2023-10-12 12:59:29,373 epoch 2 - iter 216/723 - loss 0.16696133 - time (sec): 114.67 - samples/sec: 455.25 - lr: 0.000145 - momentum: 0.000000
2023-10-12 13:00:06,773 epoch 2 - iter 288/723 - loss 0.15925242 - time (sec): 152.07 - samples/sec: 456.96 - lr: 0.000143 - momentum: 0.000000
2023-10-12 13:00:44,163 epoch 2 - iter 360/723 - loss 0.15257157 - time (sec): 189.46 - samples/sec: 453.07 - lr: 0.000142 - momentum: 0.000000
2023-10-12 13:01:23,135 epoch 2 - iter 432/723 - loss 0.14873337 - time (sec): 228.43 - samples/sec: 453.50 - lr: 0.000140 - momentum: 0.000000
2023-10-12 13:02:02,355 epoch 2 - iter 504/723 - loss 0.14684350 - time (sec): 267.65 - samples/sec: 455.16 - lr: 0.000138 - momentum: 0.000000
2023-10-12 13:02:43,047 epoch 2 - iter 576/723 - loss 0.14308762 - time (sec): 308.35 - samples/sec: 453.78 - lr: 0.000137 - momentum: 0.000000
2023-10-12 13:03:23,828 epoch 2 - iter 648/723 - loss 0.13860972 - time (sec): 349.13 - samples/sec: 452.24 - lr: 0.000135 - momentum: 0.000000
2023-10-12 13:04:03,474 epoch 2 - iter 720/723 - loss 0.13666210 - time (sec): 388.77 - samples/sec: 451.37 - lr: 0.000133 - momentum: 0.000000
2023-10-12 13:04:04,944 ----------------------------------------------------------------------------------------------------
2023-10-12 13:04:04,945 EPOCH 2 done: loss 0.1363 - lr: 0.000133
2023-10-12 13:04:25,559 DEV : loss 0.125865638256073 - f1-score (micro avg) 0.6974
2023-10-12 13:04:25,588 saving best model
2023-10-12 13:04:28,483 ----------------------------------------------------------------------------------------------------
2023-10-12 13:05:06,593 epoch 3 - iter 72/723 - loss 0.10614972 - time (sec): 38.11 - samples/sec: 441.85 - lr: 0.000132 - momentum: 0.000000
2023-10-12 13:05:45,790 epoch 3 - iter 144/723 - loss 0.09543565 - time (sec): 77.30 - samples/sec: 448.25 - lr: 0.000130 - momentum: 0.000000
2023-10-12 13:06:24,554 epoch 3 - iter 216/723 - loss 0.09411193 - time (sec): 116.07 - samples/sec: 445.86 - lr: 0.000128 - momentum: 0.000000
2023-10-12 13:07:02,711 epoch 3 - iter 288/723 - loss 0.09004094 - time (sec): 154.22 - samples/sec: 447.22 - lr: 0.000127 - momentum: 0.000000
2023-10-12 13:07:42,238 epoch 3 - iter 360/723 - loss 0.08961960 - time (sec): 193.75 - samples/sec: 446.27 - lr: 0.000125 - momentum: 0.000000
2023-10-12 13:08:22,102 epoch 3 - iter 432/723 - loss 0.08897250 - time (sec): 233.61 - samples/sec: 451.71 - lr: 0.000123 - momentum: 0.000000
2023-10-12 13:09:02,522 epoch 3 - iter 504/723 - loss 0.08525340 - time (sec): 274.03 - samples/sec: 450.43 - lr: 0.000122 - momentum: 0.000000
2023-10-12 13:09:41,313 epoch 3 - iter 576/723 - loss 0.08340939 - time (sec): 312.83 - samples/sec: 449.52 - lr: 0.000120 - momentum: 0.000000
2023-10-12 13:10:21,839 epoch 3 - iter 648/723 - loss 0.08185803 - time (sec): 353.35 - samples/sec: 446.67 - lr: 0.000118 - momentum: 0.000000
2023-10-12 13:11:01,410 epoch 3 - iter 720/723 - loss 0.08020714 - time (sec): 392.92 - samples/sec: 447.11 - lr: 0.000117 - momentum: 0.000000
2023-10-12 13:11:02,572 ----------------------------------------------------------------------------------------------------
2023-10-12 13:11:02,573 EPOCH 3 done: loss 0.0803 - lr: 0.000117
2023-10-12 13:11:24,304 DEV : loss 0.09134244173765182 - f1-score (micro avg) 0.8085
2023-10-12 13:11:24,336 saving best model
2023-10-12 13:11:26,940 ----------------------------------------------------------------------------------------------------
2023-10-12 13:12:08,816 epoch 4 - iter 72/723 - loss 0.04375890 - time (sec): 41.87 - samples/sec: 451.41 - lr: 0.000115 - momentum: 0.000000
2023-10-12 13:12:45,861 epoch 4 - iter 144/723 - loss 0.04780814 - time (sec): 78.92 - samples/sec: 443.89 - lr: 0.000113 - momentum: 0.000000
2023-10-12 13:13:26,615 epoch 4 - iter 216/723 - loss 0.04927505 - time (sec): 119.67 - samples/sec: 431.85 - lr: 0.000112 - momentum: 0.000000
2023-10-12 13:14:05,422 epoch 4 - iter 288/723 - loss 0.05251675 - time (sec): 158.48 - samples/sec: 435.09 - lr: 0.000110 - momentum: 0.000000
2023-10-12 13:14:42,756 epoch 4 - iter 360/723 - loss 0.05197170 - time (sec): 195.81 - samples/sec: 439.06 - lr: 0.000108 - momentum: 0.000000
2023-10-12 13:15:22,534 epoch 4 - iter 432/723 - loss 0.05444043 - time (sec): 235.59 - samples/sec: 441.72 - lr: 0.000107 - momentum: 0.000000
2023-10-12 13:16:02,299 epoch 4 - iter 504/723 - loss 0.05358124 - time (sec): 275.36 - samples/sec: 447.12 - lr: 0.000105 - momentum: 0.000000
2023-10-12 13:16:40,164 epoch 4 - iter 576/723 - loss 0.05390487 - time (sec): 313.22 - samples/sec: 447.55 - lr: 0.000103 - momentum: 0.000000
2023-10-12 13:17:20,651 epoch 4 - iter 648/723 - loss 0.05448671 - time (sec): 353.71 - samples/sec: 445.79 - lr: 0.000102 - momentum: 0.000000
2023-10-12 13:17:59,869 epoch 4 - iter 720/723 - loss 0.05377454 - time (sec): 392.93 - samples/sec: 447.28 - lr: 0.000100 - momentum: 0.000000
2023-10-12 13:18:00,975 ----------------------------------------------------------------------------------------------------
2023-10-12 13:18:00,975 EPOCH 4 done: loss 0.0537 - lr: 0.000100
2023-10-12 13:18:21,723 DEV : loss 0.08551333099603653 - f1-score (micro avg) 0.8344
2023-10-12 13:18:21,755 saving best model
2023-10-12 13:18:22,733 ----------------------------------------------------------------------------------------------------
2023-10-12 13:19:01,088 epoch 5 - iter 72/723 - loss 0.02887265 - time (sec): 38.35 - samples/sec: 442.57 - lr: 0.000098 - momentum: 0.000000
2023-10-12 13:19:42,050 epoch 5 - iter 144/723 - loss 0.03460162 - time (sec): 79.32 - samples/sec: 433.60 - lr: 0.000097 - momentum: 0.000000
2023-10-12 13:20:22,461 epoch 5 - iter 216/723 - loss 0.03491227 - time (sec): 119.73 - samples/sec: 435.34 - lr: 0.000095 - momentum: 0.000000
2023-10-12 13:21:05,047 epoch 5 - iter 288/723 - loss 0.03630309 - time (sec): 162.31 - samples/sec: 434.65 - lr: 0.000093 - momentum: 0.000000
2023-10-12 13:21:48,859 epoch 5 - iter 360/723 - loss 0.03649219 - time (sec): 206.12 - samples/sec: 427.40 - lr: 0.000092 - momentum: 0.000000
2023-10-12 13:22:31,786 epoch 5 - iter 432/723 - loss 0.03618346 - time (sec): 249.05 - samples/sec: 424.25 - lr: 0.000090 - momentum: 0.000000
2023-10-12 13:23:12,902 epoch 5 - iter 504/723 - loss 0.03519474 - time (sec): 290.17 - samples/sec: 420.92 - lr: 0.000088 - momentum: 0.000000
2023-10-12 13:23:54,082 epoch 5 - iter 576/723 - loss 0.03451399 - time (sec): 331.35 - samples/sec: 422.25 - lr: 0.000087 - momentum: 0.000000
2023-10-12 13:24:35,954 epoch 5 - iter 648/723 - loss 0.03511239 - time (sec): 373.22 - samples/sec: 422.11 - lr: 0.000085 - momentum: 0.000000
2023-10-12 13:25:18,788 epoch 5 - iter 720/723 - loss 0.03557155 - time (sec): 416.05 - samples/sec: 422.28 - lr: 0.000083 - momentum: 0.000000
2023-10-12 13:25:20,108 ----------------------------------------------------------------------------------------------------
2023-10-12 13:25:20,108 EPOCH 5 done: loss 0.0355 - lr: 0.000083
2023-10-12 13:25:42,303 DEV : loss 0.09844549000263214 - f1-score (micro avg) 0.8281
2023-10-12 13:25:42,335 ----------------------------------------------------------------------------------------------------
2023-10-12 13:26:24,818 epoch 6 - iter 72/723 - loss 0.03450655 - time (sec): 42.48 - samples/sec: 423.14 - lr: 0.000082 - momentum: 0.000000
2023-10-12 13:27:04,856 epoch 6 - iter 144/723 - loss 0.03035545 - time (sec): 82.52 - samples/sec: 423.88 - lr: 0.000080 - momentum: 0.000000
2023-10-12 13:27:43,676 epoch 6 - iter 216/723 - loss 0.02931547 - time (sec): 121.34 - samples/sec: 436.00 - lr: 0.000078 - momentum: 0.000000
2023-10-12 13:28:23,185 epoch 6 - iter 288/723 - loss 0.02672535 - time (sec): 160.85 - samples/sec: 445.88 - lr: 0.000077 - momentum: 0.000000
2023-10-12 13:29:00,446 epoch 6 - iter 360/723 - loss 0.02600815 - time (sec): 198.11 - samples/sec: 441.33 - lr: 0.000075 - momentum: 0.000000
2023-10-12 13:29:40,823 epoch 6 - iter 432/723 - loss 0.02605262 - time (sec): 238.49 - samples/sec: 447.85 - lr: 0.000073 - momentum: 0.000000
2023-10-12 13:30:19,459 epoch 6 - iter 504/723 - loss 0.02591407 - time (sec): 277.12 - samples/sec: 446.06 - lr: 0.000072 - momentum: 0.000000
2023-10-12 13:30:58,816 epoch 6 - iter 576/723 - loss 0.02584013 - time (sec): 316.48 - samples/sec: 446.29 - lr: 0.000070 - momentum: 0.000000
2023-10-12 13:31:38,066 epoch 6 - iter 648/723 - loss 0.02608188 - time (sec): 355.73 - samples/sec: 446.29 - lr: 0.000068 - momentum: 0.000000
2023-10-12 13:32:16,080 epoch 6 - iter 720/723 - loss 0.02616533 - time (sec): 393.74 - samples/sec: 446.09 - lr: 0.000067 - momentum: 0.000000
2023-10-12 13:32:17,295 ----------------------------------------------------------------------------------------------------
2023-10-12 13:32:17,296 EPOCH 6 done: loss 0.0261 - lr: 0.000067
2023-10-12 13:32:38,779 DEV : loss 0.0909653976559639 - f1-score (micro avg) 0.8547
2023-10-12 13:32:38,811 saving best model
2023-10-12 13:32:41,417 ----------------------------------------------------------------------------------------------------
2023-10-12 13:33:18,878 epoch 7 - iter 72/723 - loss 0.02548483 - time (sec): 37.46 - samples/sec: 446.16 - lr: 0.000065 - momentum: 0.000000
2023-10-12 13:33:58,757 epoch 7 - iter 144/723 - loss 0.02514500 - time (sec): 77.34 - samples/sec: 462.95 - lr: 0.000063 - momentum: 0.000000
2023-10-12 13:34:37,377 epoch 7 - iter 216/723 - loss 0.02611835 - time (sec): 115.96 - samples/sec: 462.23 - lr: 0.000062 - momentum: 0.000000
2023-10-12 13:35:15,889 epoch 7 - iter 288/723 - loss 0.02415853 - time (sec): 154.47 - samples/sec: 459.97 - lr: 0.000060 - momentum: 0.000000
2023-10-12 13:35:54,789 epoch 7 - iter 360/723 - loss 0.02340115 - time (sec): 193.37 - samples/sec: 458.91 - lr: 0.000058 - momentum: 0.000000
2023-10-12 13:36:34,765 epoch 7 - iter 432/723 - loss 0.02218556 - time (sec): 233.34 - samples/sec: 458.27 - lr: 0.000057 - momentum: 0.000000
2023-10-12 13:37:13,961 epoch 7 - iter 504/723 - loss 0.02199272 - time (sec): 272.54 - samples/sec: 459.46 - lr: 0.000055 - momentum: 0.000000
2023-10-12 13:37:51,396 epoch 7 - iter 576/723 - loss 0.02149862 - time (sec): 309.98 - samples/sec: 459.97 - lr: 0.000053 - momentum: 0.000000
2023-10-12 13:38:28,567 epoch 7 - iter 648/723 - loss 0.02110408 - time (sec): 347.15 - samples/sec: 457.63 - lr: 0.000052 - momentum: 0.000000
2023-10-12 13:39:04,853 epoch 7 - iter 720/723 - loss 0.02089846 - time (sec): 383.43 - samples/sec: 458.45 - lr: 0.000050 - momentum: 0.000000
2023-10-12 13:39:05,894 ----------------------------------------------------------------------------------------------------
2023-10-12 13:39:05,895 EPOCH 7 done: loss 0.0209 - lr: 0.000050
2023-10-12 13:39:26,497 DEV : loss 0.12286769598722458 - f1-score (micro avg) 0.8403
2023-10-12 13:39:26,530 ----------------------------------------------------------------------------------------------------
2023-10-12 13:40:05,852 epoch 8 - iter 72/723 - loss 0.01313140 - time (sec): 39.32 - samples/sec: 454.78 - lr: 0.000048 - momentum: 0.000000
2023-10-12 13:40:45,572 epoch 8 - iter 144/723 - loss 0.01250697 - time (sec): 79.04 - samples/sec: 454.78 - lr: 0.000047 - momentum: 0.000000
2023-10-12 13:41:25,089 epoch 8 - iter 216/723 - loss 0.01246682 - time (sec): 118.56 - samples/sec: 451.77 - lr: 0.000045 - momentum: 0.000000
2023-10-12 13:42:05,028 epoch 8 - iter 288/723 - loss 0.01132435 - time (sec): 158.50 - samples/sec: 455.71 - lr: 0.000043 - momentum: 0.000000
2023-10-12 13:42:42,164 epoch 8 - iter 360/723 - loss 0.01353114 - time (sec): 195.63 - samples/sec: 450.26 - lr: 0.000042 - momentum: 0.000000
2023-10-12 13:43:21,974 epoch 8 - iter 432/723 - loss 0.01342207 - time (sec): 235.44 - samples/sec: 446.73 - lr: 0.000040 - momentum: 0.000000
2023-10-12 13:44:01,853 epoch 8 - iter 504/723 - loss 0.01495552 - time (sec): 275.32 - samples/sec: 444.93 - lr: 0.000038 - momentum: 0.000000
2023-10-12 13:44:44,572 epoch 8 - iter 576/723 - loss 0.01542540 - time (sec): 318.04 - samples/sec: 441.01 - lr: 0.000037 - momentum: 0.000000
2023-10-12 13:45:26,594 epoch 8 - iter 648/723 - loss 0.01508947 - time (sec): 360.06 - samples/sec: 438.48 - lr: 0.000035 - momentum: 0.000000
2023-10-12 13:46:08,421 epoch 8 - iter 720/723 - loss 0.01601048 - time (sec): 401.89 - samples/sec: 437.43 - lr: 0.000033 - momentum: 0.000000
2023-10-12 13:46:09,570 ----------------------------------------------------------------------------------------------------
2023-10-12 13:46:09,571 EPOCH 8 done: loss 0.0162 - lr: 0.000033
2023-10-12 13:46:31,884 DEV : loss 0.12785491347312927 - f1-score (micro avg) 0.8443
2023-10-12 13:46:31,946 ----------------------------------------------------------------------------------------------------
2023-10-12 13:47:15,224 epoch 9 - iter 72/723 - loss 0.01226192 - time (sec): 43.28 - samples/sec: 424.75 - lr: 0.000032 - momentum: 0.000000
2023-10-12 13:47:57,051 epoch 9 - iter 144/723 - loss 0.01230280 - time (sec): 85.10 - samples/sec: 413.44 - lr: 0.000030 - momentum: 0.000000
2023-10-12 13:48:39,683 epoch 9 - iter 216/723 - loss 0.01384980 - time (sec): 127.73 - samples/sec: 408.83 - lr: 0.000028 - momentum: 0.000000
2023-10-12 13:49:21,709 epoch 9 - iter 288/723 - loss 0.01248849 - time (sec): 169.76 - samples/sec: 410.18 - lr: 0.000027 - momentum: 0.000000
2023-10-12 13:50:02,274 epoch 9 - iter 360/723 - loss 0.01133267 - time (sec): 210.32 - samples/sec: 418.21 - lr: 0.000025 - momentum: 0.000000
2023-10-12 13:50:44,723 epoch 9 - iter 432/723 - loss 0.01131670 - time (sec): 252.77 - samples/sec: 417.56 - lr: 0.000023 - momentum: 0.000000
2023-10-12 13:51:25,911 epoch 9 - iter 504/723 - loss 0.01149792 - time (sec): 293.96 - samples/sec: 421.43 - lr: 0.000022 - momentum: 0.000000
2023-10-12 13:52:06,053 epoch 9 - iter 576/723 - loss 0.01200648 - time (sec): 334.10 - samples/sec: 422.14 - lr: 0.000020 - momentum: 0.000000
2023-10-12 13:52:44,347 epoch 9 - iter 648/723 - loss 0.01233164 - time (sec): 372.40 - samples/sec: 422.87 - lr: 0.000018 - momentum: 0.000000
2023-10-12 13:53:25,270 epoch 9 - iter 720/723 - loss 0.01241683 - time (sec): 413.32 - samples/sec: 423.69 - lr: 0.000017 - momentum: 0.000000
2023-10-12 13:53:27,201 ----------------------------------------------------------------------------------------------------
2023-10-12 13:53:27,202 EPOCH 9 done: loss 0.0136 - lr: 0.000017
2023-10-12 13:53:49,325 DEV : loss 0.13609179854393005 - f1-score (micro avg) 0.8429
2023-10-12 13:53:49,364 ----------------------------------------------------------------------------------------------------
2023-10-12 13:54:34,158 epoch 10 - iter 72/723 - loss 0.02070022 - time (sec): 44.79 - samples/sec: 420.68 - lr: 0.000015 - momentum: 0.000000
2023-10-12 13:55:14,534 epoch 10 - iter 144/723 - loss 0.01572521 - time (sec): 85.17 - samples/sec: 427.85 - lr: 0.000013 - momentum: 0.000000
2023-10-12 13:55:57,299 epoch 10 - iter 216/723 - loss 0.01436480 - time (sec): 127.93 - samples/sec: 418.71 - lr: 0.000012 - momentum: 0.000000
2023-10-12 13:56:38,887 epoch 10 - iter 288/723 - loss 0.01335702 - time (sec): 169.52 - samples/sec: 421.14 - lr: 0.000010 - momentum: 0.000000
2023-10-12 13:57:20,435 epoch 10 - iter 360/723 - loss 0.01325818 - time (sec): 211.07 - samples/sec: 422.83 - lr: 0.000008 - momentum: 0.000000
2023-10-12 13:58:02,750 epoch 10 - iter 432/723 - loss 0.01235929 - time (sec): 253.38 - samples/sec: 425.90 - lr: 0.000007 - momentum: 0.000000
2023-10-12 13:58:43,798 epoch 10 - iter 504/723 - loss 0.01217336 - time (sec): 294.43 - samples/sec: 420.51 - lr: 0.000005 - momentum: 0.000000
2023-10-12 13:59:25,942 epoch 10 - iter 576/723 - loss 0.01172104 - time (sec): 336.58 - samples/sec: 422.72 - lr: 0.000003 - momentum: 0.000000
2023-10-12 14:00:06,168 epoch 10 - iter 648/723 - loss 0.01125967 - time (sec): 376.80 - samples/sec: 421.48 - lr: 0.000002 - momentum: 0.000000
2023-10-12 14:00:47,810 epoch 10 - iter 720/723 - loss 0.01135370 - time (sec): 418.44 - samples/sec: 420.11 - lr: 0.000000 - momentum: 0.000000
2023-10-12 14:00:48,941 ----------------------------------------------------------------------------------------------------
2023-10-12 14:00:48,941 EPOCH 10 done: loss 0.0113 - lr: 0.000000
2023-10-12 14:01:12,472 DEV : loss 0.13888543844223022 - f1-score (micro avg) 0.8436
2023-10-12 14:01:13,512 ----------------------------------------------------------------------------------------------------
2023-10-12 14:01:13,514 Loading model from best epoch ...
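The learning-rate column over the ten epochs can be reproduced from the "LinearScheduler | warmup_fraction: '0.1'" plugin settings: linear warmup to the peak LR over the first 10% of all batch steps (exactly epoch 1 here, 723 of 7230 steps), then linear decay to zero. The sketch below assumes this simple piecewise-linear form; flair's implementation may differ in rounding.

```python
# LR curve implied by peak_lr=0.00015, 723 iters/epoch, 10 epochs, 10% warmup.
peak_lr = 0.00015
steps_per_epoch, max_epochs = 723, 10
total_steps = steps_per_epoch * max_epochs
warmup_steps = int(0.1 * total_steps)  # 723 steps = all of epoch 1

def lr_at(step: int) -> float:
    """Piecewise-linear warmup/decay schedule (assumed form)."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

print(round(lr_at(72), 6))           # epoch 1, iter 72  -> ~0.000015, as logged
print(round(lr_at(720), 6))          # epoch 1, iter 720 -> ~0.000149, as logged
print(round(lr_at(723 + 72), 6))     # epoch 2, iter 72  -> ~0.000148, as logged
print(round(lr_at(total_steps), 6))  # end of training   -> 0.0
```

This also explains why the best dev score lands mid-run (epoch 6) while later epochs only overfit at ever-smaller LRs: by epoch 10 the LR has decayed to essentially zero.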
2023-10-12 14:01:17,667 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-12 14:01:40,330 Results:
- F-score (micro) 0.8564
- F-score (macro) 0.7697
- Accuracy 0.7601

By class:
              precision    recall  f1-score   support

         PER     0.8566    0.8672    0.8619       482
         LOC     0.8937    0.8996    0.8966       458
         ORG     0.5507    0.5507    0.5507        69

   micro avg     0.8527    0.8603    0.8564      1009
   macro avg     0.7670    0.7725    0.7697      1009
weighted avg     0.8525    0.8603    0.8564      1009

2023-10-12 14:01:40,330 ----------------------------------------------------------------------------------------------------