2023-10-12 12:50:33,877 ----------------------------------------------------------------------------------------------------
2023-10-12 12:50:33,879 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-12 12:50:33,879 ----------------------------------------------------------------------------------------------------
2023-10-12 12:50:33,880 MultiCorpus: 5777 train + 722 dev + 723 test sentences
- NER_ICDAR_EUROPEANA Corpus: 5777 train + 722 dev + 723 test sentences - /root/.flair/datasets/ner_icdar_europeana/nl
2023-10-12 12:50:33,880 ----------------------------------------------------------------------------------------------------
2023-10-12 12:50:33,880 Train: 5777 sentences
2023-10-12 12:50:33,880 (train_with_dev=False, train_with_test=False)
2023-10-12 12:50:33,880 ----------------------------------------------------------------------------------------------------
2023-10-12 12:50:33,880 Training Params:
2023-10-12 12:50:33,880 - learning_rate: "0.00015"
2023-10-12 12:50:33,880 - mini_batch_size: "8"
2023-10-12 12:50:33,880 - max_epochs: "10"
2023-10-12 12:50:33,880 - shuffle: "True"
2023-10-12 12:50:33,880 ----------------------------------------------------------------------------------------------------
2023-10-12 12:50:33,880 Plugins:
2023-10-12 12:50:33,881 - TensorboardLogger
2023-10-12 12:50:33,881 - LinearScheduler | warmup_fraction: '0.1'
2023-10-12 12:50:33,881 ----------------------------------------------------------------------------------------------------
2023-10-12 12:50:33,881 Final evaluation on model from best epoch (best-model.pt)
2023-10-12 12:50:33,881 - metric: "('micro avg', 'f1-score')"
2023-10-12 12:50:33,881 ----------------------------------------------------------------------------------------------------
2023-10-12 12:50:33,881 Computation:
2023-10-12 12:50:33,881 - compute on device: cuda:0
2023-10-12 12:50:33,881 - embedding storage: none
2023-10-12 12:50:33,881 ----------------------------------------------------------------------------------------------------
2023-10-12 12:50:33,881 Model training base path: "hmbench-icdar/nl-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-4"
2023-10-12 12:50:33,881 ----------------------------------------------------------------------------------------------------
2023-10-12 12:50:33,881 ----------------------------------------------------------------------------------------------------
2023-10-12 12:50:33,882 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-12 12:51:13,856 epoch 1 - iter 72/723 - loss 2.57210097 - time (sec): 39.97 - samples/sec: 446.88 - lr: 0.000015 - momentum: 0.000000
2023-10-12 12:51:55,334 epoch 1 - iter 144/723 - loss 2.50368577 - time (sec): 81.45 - samples/sec: 443.12 - lr: 0.000030 - momentum: 0.000000
2023-10-12 12:52:35,583 epoch 1 - iter 216/723 - loss 2.34479028 - time (sec): 121.70 - samples/sec: 436.36 - lr: 0.000045 - momentum: 0.000000
2023-10-12 12:53:16,570 epoch 1 - iter 288/723 - loss 2.14084440 - time (sec): 162.69 - samples/sec: 434.66 - lr: 0.000060 - momentum: 0.000000
2023-10-12 12:53:56,024 epoch 1 - iter 360/723 - loss 1.92004853 - time (sec): 202.14 - samples/sec: 437.13 - lr: 0.000074 - momentum: 0.000000
2023-10-12 12:54:36,420 epoch 1 - iter 432/723 - loss 1.71012873 - time (sec): 242.54 - samples/sec: 433.77 - lr: 0.000089 - momentum: 0.000000
2023-10-12 12:55:15,712 epoch 1 - iter 504/723 - loss 1.51773573 - time (sec): 281.83 - samples/sec: 434.20 - lr: 0.000104 - momentum: 0.000000
2023-10-12 12:55:55,929 epoch 1 - iter 576/723 - loss 1.35508499 - time (sec): 322.05 - samples/sec: 436.31 - lr: 0.000119 - momentum: 0.000000
2023-10-12 12:56:35,029 epoch 1 - iter 648/723 - loss 1.22422963 - time (sec): 361.15 - samples/sec: 439.92 - lr: 0.000134 - momentum: 0.000000
2023-10-12 12:57:12,712 epoch 1 - iter 720/723 - loss 1.12678212 - time (sec): 398.83 - samples/sec: 440.47 - lr: 0.000149 - momentum: 0.000000
2023-10-12 12:57:13,890 ----------------------------------------------------------------------------------------------------
2023-10-12 12:57:13,891 EPOCH 1 done: loss 1.1237 - lr: 0.000149
2023-10-12 12:57:33,815 DEV : loss 0.22408561408519745 - f1-score (micro avg) 0.0021
2023-10-12 12:57:33,845 saving best model
2023-10-12 12:57:34,699 ----------------------------------------------------------------------------------------------------
2023-10-12 12:58:12,792 epoch 2 - iter 72/723 - loss 0.16692332 - time (sec): 38.09 - samples/sec: 465.66 - lr: 0.000148 - momentum: 0.000000
2023-10-12 12:58:50,804 epoch 2 - iter 144/723 - loss 0.17013500 - time (sec): 76.10 - samples/sec: 458.04 - lr: 0.000147 - momentum: 0.000000
2023-10-12 12:59:29,373 epoch 2 - iter 216/723 - loss 0.16696133 - time (sec): 114.67 - samples/sec: 455.25 - lr: 0.000145 - momentum: 0.000000
2023-10-12 13:00:06,773 epoch 2 - iter 288/723 - loss 0.15925242 - time (sec): 152.07 - samples/sec: 456.96 - lr: 0.000143 - momentum: 0.000000
2023-10-12 13:00:44,163 epoch 2 - iter 360/723 - loss 0.15257157 - time (sec): 189.46 - samples/sec: 453.07 - lr: 0.000142 - momentum: 0.000000
2023-10-12 13:01:23,135 epoch 2 - iter 432/723 - loss 0.14873337 - time (sec): 228.43 - samples/sec: 453.50 - lr: 0.000140 - momentum: 0.000000
2023-10-12 13:02:02,355 epoch 2 - iter 504/723 - loss 0.14684350 - time (sec): 267.65 - samples/sec: 455.16 - lr: 0.000138 - momentum: 0.000000
2023-10-12 13:02:43,047 epoch 2 - iter 576/723 - loss 0.14308762 - time (sec): 308.35 - samples/sec: 453.78 - lr: 0.000137 - momentum: 0.000000
2023-10-12 13:03:23,828 epoch 2 - iter 648/723 - loss 0.13860972 - time (sec): 349.13 - samples/sec: 452.24 - lr: 0.000135 - momentum: 0.000000
2023-10-12 13:04:03,474 epoch 2 - iter 720/723 - loss 0.13666210 - time (sec): 388.77 - samples/sec: 451.37 - lr: 0.000133 - momentum: 0.000000
2023-10-12 13:04:04,944 ----------------------------------------------------------------------------------------------------
2023-10-12 13:04:04,945 EPOCH 2 done: loss 0.1363 - lr: 0.000133
2023-10-12 13:04:25,559 DEV : loss 0.125865638256073 - f1-score (micro avg) 0.6974
2023-10-12 13:04:25,588 saving best model
2023-10-12 13:04:28,483 ----------------------------------------------------------------------------------------------------
2023-10-12 13:05:06,593 epoch 3 - iter 72/723 - loss 0.10614972 - time (sec): 38.11 - samples/sec: 441.85 - lr: 0.000132 - momentum: 0.000000
2023-10-12 13:05:45,790 epoch 3 - iter 144/723 - loss 0.09543565 - time (sec): 77.30 - samples/sec: 448.25 - lr: 0.000130 - momentum: 0.000000
2023-10-12 13:06:24,554 epoch 3 - iter 216/723 - loss 0.09411193 - time (sec): 116.07 - samples/sec: 445.86 - lr: 0.000128 - momentum: 0.000000
2023-10-12 13:07:02,711 epoch 3 - iter 288/723 - loss 0.09004094 - time (sec): 154.22 - samples/sec: 447.22 - lr: 0.000127 - momentum: 0.000000
2023-10-12 13:07:42,238 epoch 3 - iter 360/723 - loss 0.08961960 - time (sec): 193.75 - samples/sec: 446.27 - lr: 0.000125 - momentum: 0.000000
2023-10-12 13:08:22,102 epoch 3 - iter 432/723 - loss 0.08897250 - time (sec): 233.61 - samples/sec: 451.71 - lr: 0.000123 - momentum: 0.000000
2023-10-12 13:09:02,522 epoch 3 - iter 504/723 - loss 0.08525340 - time (sec): 274.03 - samples/sec: 450.43 - lr: 0.000122 - momentum: 0.000000
2023-10-12 13:09:41,313 epoch 3 - iter 576/723 - loss 0.08340939 - time (sec): 312.83 - samples/sec: 449.52 - lr: 0.000120 - momentum: 0.000000
2023-10-12 13:10:21,839 epoch 3 - iter 648/723 - loss 0.08185803 - time (sec): 353.35 - samples/sec: 446.67 - lr: 0.000118 - momentum: 0.000000
2023-10-12 13:11:01,410 epoch 3 - iter 720/723 - loss 0.08020714 - time (sec): 392.92 - samples/sec: 447.11 - lr: 0.000117 - momentum: 0.000000
2023-10-12 13:11:02,572 ----------------------------------------------------------------------------------------------------
2023-10-12 13:11:02,573 EPOCH 3 done: loss 0.0803 - lr: 0.000117
2023-10-12 13:11:24,304 DEV : loss 0.09134244173765182 - f1-score (micro avg) 0.8085
2023-10-12 13:11:24,336 saving best model
2023-10-12 13:11:26,940 ----------------------------------------------------------------------------------------------------
2023-10-12 13:12:08,816 epoch 4 - iter 72/723 - loss 0.04375890 - time (sec): 41.87 - samples/sec: 451.41 - lr: 0.000115 - momentum: 0.000000
2023-10-12 13:12:45,861 epoch 4 - iter 144/723 - loss 0.04780814 - time (sec): 78.92 - samples/sec: 443.89 - lr: 0.000113 - momentum: 0.000000
2023-10-12 13:13:26,615 epoch 4 - iter 216/723 - loss 0.04927505 - time (sec): 119.67 - samples/sec: 431.85 - lr: 0.000112 - momentum: 0.000000
2023-10-12 13:14:05,422 epoch 4 - iter 288/723 - loss 0.05251675 - time (sec): 158.48 - samples/sec: 435.09 - lr: 0.000110 - momentum: 0.000000
2023-10-12 13:14:42,756 epoch 4 - iter 360/723 - loss 0.05197170 - time (sec): 195.81 - samples/sec: 439.06 - lr: 0.000108 - momentum: 0.000000
2023-10-12 13:15:22,534 epoch 4 - iter 432/723 - loss 0.05444043 - time (sec): 235.59 - samples/sec: 441.72 - lr: 0.000107 - momentum: 0.000000
2023-10-12 13:16:02,299 epoch 4 - iter 504/723 - loss 0.05358124 - time (sec): 275.36 - samples/sec: 447.12 - lr: 0.000105 - momentum: 0.000000
2023-10-12 13:16:40,164 epoch 4 - iter 576/723 - loss 0.05390487 - time (sec): 313.22 - samples/sec: 447.55 - lr: 0.000103 - momentum: 0.000000
2023-10-12 13:17:20,651 epoch 4 - iter 648/723 - loss 0.05448671 - time (sec): 353.71 - samples/sec: 445.79 - lr: 0.000102 - momentum: 0.000000
2023-10-12 13:17:59,869 epoch 4 - iter 720/723 - loss 0.05377454 - time (sec): 392.93 - samples/sec: 447.28 - lr: 0.000100 - momentum: 0.000000
2023-10-12 13:18:00,975 ----------------------------------------------------------------------------------------------------
2023-10-12 13:18:00,975 EPOCH 4 done: loss 0.0537 - lr: 0.000100
2023-10-12 13:18:21,723 DEV : loss 0.08551333099603653 - f1-score (micro avg) 0.8344
2023-10-12 13:18:21,755 saving best model
2023-10-12 13:18:22,733 ----------------------------------------------------------------------------------------------------
2023-10-12 13:19:01,088 epoch 5 - iter 72/723 - loss 0.02887265 - time (sec): 38.35 - samples/sec: 442.57 - lr: 0.000098 - momentum: 0.000000
2023-10-12 13:19:42,050 epoch 5 - iter 144/723 - loss 0.03460162 - time (sec): 79.32 - samples/sec: 433.60 - lr: 0.000097 - momentum: 0.000000
2023-10-12 13:20:22,461 epoch 5 - iter 216/723 - loss 0.03491227 - time (sec): 119.73 - samples/sec: 435.34 - lr: 0.000095 - momentum: 0.000000
2023-10-12 13:21:05,047 epoch 5 - iter 288/723 - loss 0.03630309 - time (sec): 162.31 - samples/sec: 434.65 - lr: 0.000093 - momentum: 0.000000
2023-10-12 13:21:48,859 epoch 5 - iter 360/723 - loss 0.03649219 - time (sec): 206.12 - samples/sec: 427.40 - lr: 0.000092 - momentum: 0.000000
2023-10-12 13:22:31,786 epoch 5 - iter 432/723 - loss 0.03618346 - time (sec): 249.05 - samples/sec: 424.25 - lr: 0.000090 - momentum: 0.000000
2023-10-12 13:23:12,902 epoch 5 - iter 504/723 - loss 0.03519474 - time (sec): 290.17 - samples/sec: 420.92 - lr: 0.000088 - momentum: 0.000000
2023-10-12 13:23:54,082 epoch 5 - iter 576/723 - loss 0.03451399 - time (sec): 331.35 - samples/sec: 422.25 - lr: 0.000087 - momentum: 0.000000
2023-10-12 13:24:35,954 epoch 5 - iter 648/723 - loss 0.03511239 - time (sec): 373.22 - samples/sec: 422.11 - lr: 0.000085 - momentum: 0.000000
2023-10-12 13:25:18,788 epoch 5 - iter 720/723 - loss 0.03557155 - time (sec): 416.05 - samples/sec: 422.28 - lr: 0.000083 - momentum: 0.000000
2023-10-12 13:25:20,108 ----------------------------------------------------------------------------------------------------
2023-10-12 13:25:20,108 EPOCH 5 done: loss 0.0355 - lr: 0.000083
2023-10-12 13:25:42,303 DEV : loss 0.09844549000263214 - f1-score (micro avg) 0.8281
2023-10-12 13:25:42,335 ----------------------------------------------------------------------------------------------------
2023-10-12 13:26:24,818 epoch 6 - iter 72/723 - loss 0.03450655 - time (sec): 42.48 - samples/sec: 423.14 - lr: 0.000082 - momentum: 0.000000
2023-10-12 13:27:04,856 epoch 6 - iter 144/723 - loss 0.03035545 - time (sec): 82.52 - samples/sec: 423.88 - lr: 0.000080 - momentum: 0.000000
2023-10-12 13:27:43,676 epoch 6 - iter 216/723 - loss 0.02931547 - time (sec): 121.34 - samples/sec: 436.00 - lr: 0.000078 - momentum: 0.000000
2023-10-12 13:28:23,185 epoch 6 - iter 288/723 - loss 0.02672535 - time (sec): 160.85 - samples/sec: 445.88 - lr: 0.000077 - momentum: 0.000000
2023-10-12 13:29:00,446 epoch 6 - iter 360/723 - loss 0.02600815 - time (sec): 198.11 - samples/sec: 441.33 - lr: 0.000075 - momentum: 0.000000
2023-10-12 13:29:40,823 epoch 6 - iter 432/723 - loss 0.02605262 - time (sec): 238.49 - samples/sec: 447.85 - lr: 0.000073 - momentum: 0.000000
2023-10-12 13:30:19,459 epoch 6 - iter 504/723 - loss 0.02591407 - time (sec): 277.12 - samples/sec: 446.06 - lr: 0.000072 - momentum: 0.000000
2023-10-12 13:30:58,816 epoch 6 - iter 576/723 - loss 0.02584013 - time (sec): 316.48 - samples/sec: 446.29 - lr: 0.000070 - momentum: 0.000000
2023-10-12 13:31:38,066 epoch 6 - iter 648/723 - loss 0.02608188 - time (sec): 355.73 - samples/sec: 446.29 - lr: 0.000068 - momentum: 0.000000
2023-10-12 13:32:16,080 epoch 6 - iter 720/723 - loss 0.02616533 - time (sec): 393.74 - samples/sec: 446.09 - lr: 0.000067 - momentum: 0.000000
2023-10-12 13:32:17,295 ----------------------------------------------------------------------------------------------------
2023-10-12 13:32:17,296 EPOCH 6 done: loss 0.0261 - lr: 0.000067
2023-10-12 13:32:38,779 DEV : loss 0.0909653976559639 - f1-score (micro avg) 0.8547
2023-10-12 13:32:38,811 saving best model
2023-10-12 13:32:41,417 ----------------------------------------------------------------------------------------------------
2023-10-12 13:33:18,878 epoch 7 - iter 72/723 - loss 0.02548483 - time (sec): 37.46 - samples/sec: 446.16 - lr: 0.000065 - momentum: 0.000000
2023-10-12 13:33:58,757 epoch 7 - iter 144/723 - loss 0.02514500 - time (sec): 77.34 - samples/sec: 462.95 - lr: 0.000063 - momentum: 0.000000
2023-10-12 13:34:37,377 epoch 7 - iter 216/723 - loss 0.02611835 - time (sec): 115.96 - samples/sec: 462.23 - lr: 0.000062 - momentum: 0.000000
2023-10-12 13:35:15,889 epoch 7 - iter 288/723 - loss 0.02415853 - time (sec): 154.47 - samples/sec: 459.97 - lr: 0.000060 - momentum: 0.000000
2023-10-12 13:35:54,789 epoch 7 - iter 360/723 - loss 0.02340115 - time (sec): 193.37 - samples/sec: 458.91 - lr: 0.000058 - momentum: 0.000000
2023-10-12 13:36:34,765 epoch 7 - iter 432/723 - loss 0.02218556 - time (sec): 233.34 - samples/sec: 458.27 - lr: 0.000057 - momentum: 0.000000
2023-10-12 13:37:13,961 epoch 7 - iter 504/723 - loss 0.02199272 - time (sec): 272.54 - samples/sec: 459.46 - lr: 0.000055 - momentum: 0.000000
2023-10-12 13:37:51,396 epoch 7 - iter 576/723 - loss 0.02149862 - time (sec): 309.98 - samples/sec: 459.97 - lr: 0.000053 - momentum: 0.000000
2023-10-12 13:38:28,567 epoch 7 - iter 648/723 - loss 0.02110408 - time (sec): 347.15 - samples/sec: 457.63 - lr: 0.000052 - momentum: 0.000000
2023-10-12 13:39:04,853 epoch 7 - iter 720/723 - loss 0.02089846 - time (sec): 383.43 - samples/sec: 458.45 - lr: 0.000050 - momentum: 0.000000
2023-10-12 13:39:05,894 ----------------------------------------------------------------------------------------------------
2023-10-12 13:39:05,895 EPOCH 7 done: loss 0.0209 - lr: 0.000050
2023-10-12 13:39:26,497 DEV : loss 0.12286769598722458 - f1-score (micro avg) 0.8403
2023-10-12 13:39:26,530 ----------------------------------------------------------------------------------------------------
2023-10-12 13:40:05,852 epoch 8 - iter 72/723 - loss 0.01313140 - time (sec): 39.32 - samples/sec: 454.78 - lr: 0.000048 - momentum: 0.000000
2023-10-12 13:40:45,572 epoch 8 - iter 144/723 - loss 0.01250697 - time (sec): 79.04 - samples/sec: 454.78 - lr: 0.000047 - momentum: 0.000000
2023-10-12 13:41:25,089 epoch 8 - iter 216/723 - loss 0.01246682 - time (sec): 118.56 - samples/sec: 451.77 - lr: 0.000045 - momentum: 0.000000
2023-10-12 13:42:05,028 epoch 8 - iter 288/723 - loss 0.01132435 - time (sec): 158.50 - samples/sec: 455.71 - lr: 0.000043 - momentum: 0.000000
2023-10-12 13:42:42,164 epoch 8 - iter 360/723 - loss 0.01353114 - time (sec): 195.63 - samples/sec: 450.26 - lr: 0.000042 - momentum: 0.000000
2023-10-12 13:43:21,974 epoch 8 - iter 432/723 - loss 0.01342207 - time (sec): 235.44 - samples/sec: 446.73 - lr: 0.000040 - momentum: 0.000000
2023-10-12 13:44:01,853 epoch 8 - iter 504/723 - loss 0.01495552 - time (sec): 275.32 - samples/sec: 444.93 - lr: 0.000038 - momentum: 0.000000
2023-10-12 13:44:44,572 epoch 8 - iter 576/723 - loss 0.01542540 - time (sec): 318.04 - samples/sec: 441.01 - lr: 0.000037 - momentum: 0.000000
2023-10-12 13:45:26,594 epoch 8 - iter 648/723 - loss 0.01508947 - time (sec): 360.06 - samples/sec: 438.48 - lr: 0.000035 - momentum: 0.000000
2023-10-12 13:46:08,421 epoch 8 - iter 720/723 - loss 0.01601048 - time (sec): 401.89 - samples/sec: 437.43 - lr: 0.000033 - momentum: 0.000000
2023-10-12 13:46:09,570 ----------------------------------------------------------------------------------------------------
2023-10-12 13:46:09,571 EPOCH 8 done: loss 0.0162 - lr: 0.000033
2023-10-12 13:46:31,884 DEV : loss 0.12785491347312927 - f1-score (micro avg) 0.8443
2023-10-12 13:46:31,946 ----------------------------------------------------------------------------------------------------
2023-10-12 13:47:15,224 epoch 9 - iter 72/723 - loss 0.01226192 - time (sec): 43.28 - samples/sec: 424.75 - lr: 0.000032 - momentum: 0.000000
2023-10-12 13:47:57,051 epoch 9 - iter 144/723 - loss 0.01230280 - time (sec): 85.10 - samples/sec: 413.44 - lr: 0.000030 - momentum: 0.000000
2023-10-12 13:48:39,683 epoch 9 - iter 216/723 - loss 0.01384980 - time (sec): 127.73 - samples/sec: 408.83 - lr: 0.000028 - momentum: 0.000000
2023-10-12 13:49:21,709 epoch 9 - iter 288/723 - loss 0.01248849 - time (sec): 169.76 - samples/sec: 410.18 - lr: 0.000027 - momentum: 0.000000
2023-10-12 13:50:02,274 epoch 9 - iter 360/723 - loss 0.01133267 - time (sec): 210.32 - samples/sec: 418.21 - lr: 0.000025 - momentum: 0.000000
2023-10-12 13:50:44,723 epoch 9 - iter 432/723 - loss 0.01131670 - time (sec): 252.77 - samples/sec: 417.56 - lr: 0.000023 - momentum: 0.000000
2023-10-12 13:51:25,911 epoch 9 - iter 504/723 - loss 0.01149792 - time (sec): 293.96 - samples/sec: 421.43 - lr: 0.000022 - momentum: 0.000000
2023-10-12 13:52:06,053 epoch 9 - iter 576/723 - loss 0.01200648 - time (sec): 334.10 - samples/sec: 422.14 - lr: 0.000020 - momentum: 0.000000
2023-10-12 13:52:44,347 epoch 9 - iter 648/723 - loss 0.01233164 - time (sec): 372.40 - samples/sec: 422.87 - lr: 0.000018 - momentum: 0.000000
2023-10-12 13:53:25,270 epoch 9 - iter 720/723 - loss 0.01241683 - time (sec): 413.32 - samples/sec: 423.69 - lr: 0.000017 - momentum: 0.000000
2023-10-12 13:53:27,201 ----------------------------------------------------------------------------------------------------
2023-10-12 13:53:27,202 EPOCH 9 done: loss 0.0136 - lr: 0.000017
2023-10-12 13:53:49,325 DEV : loss 0.13609179854393005 - f1-score (micro avg) 0.8429
2023-10-12 13:53:49,364 ----------------------------------------------------------------------------------------------------
2023-10-12 13:54:34,158 epoch 10 - iter 72/723 - loss 0.02070022 - time (sec): 44.79 - samples/sec: 420.68 - lr: 0.000015 - momentum: 0.000000
2023-10-12 13:55:14,534 epoch 10 - iter 144/723 - loss 0.01572521 - time (sec): 85.17 - samples/sec: 427.85 - lr: 0.000013 - momentum: 0.000000
2023-10-12 13:55:57,299 epoch 10 - iter 216/723 - loss 0.01436480 - time (sec): 127.93 - samples/sec: 418.71 - lr: 0.000012 - momentum: 0.000000
2023-10-12 13:56:38,887 epoch 10 - iter 288/723 - loss 0.01335702 - time (sec): 169.52 - samples/sec: 421.14 - lr: 0.000010 - momentum: 0.000000
2023-10-12 13:57:20,435 epoch 10 - iter 360/723 - loss 0.01325818 - time (sec): 211.07 - samples/sec: 422.83 - lr: 0.000008 - momentum: 0.000000
2023-10-12 13:58:02,750 epoch 10 - iter 432/723 - loss 0.01235929 - time (sec): 253.38 - samples/sec: 425.90 - lr: 0.000007 - momentum: 0.000000
2023-10-12 13:58:43,798 epoch 10 - iter 504/723 - loss 0.01217336 - time (sec): 294.43 - samples/sec: 420.51 - lr: 0.000005 - momentum: 0.000000
2023-10-12 13:59:25,942 epoch 10 - iter 576/723 - loss 0.01172104 - time (sec): 336.58 - samples/sec: 422.72 - lr: 0.000003 - momentum: 0.000000
2023-10-12 14:00:06,168 epoch 10 - iter 648/723 - loss 0.01125967 - time (sec): 376.80 - samples/sec: 421.48 - lr: 0.000002 - momentum: 0.000000
2023-10-12 14:00:47,810 epoch 10 - iter 720/723 - loss 0.01135370 - time (sec): 418.44 - samples/sec: 420.11 - lr: 0.000000 - momentum: 0.000000
2023-10-12 14:00:48,941 ----------------------------------------------------------------------------------------------------
2023-10-12 14:00:48,941 EPOCH 10 done: loss 0.0113 - lr: 0.000000
2023-10-12 14:01:12,472 DEV : loss 0.13888543844223022 - f1-score (micro avg) 0.8436
2023-10-12 14:01:13,512 ----------------------------------------------------------------------------------------------------
2023-10-12 14:01:13,514 Loading model from best epoch ...
2023-10-12 14:01:17,667 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-12 14:01:40,330
Results:
- F-score (micro) 0.8564
- F-score (macro) 0.7697
- Accuracy 0.7601
By class:
              precision    recall  f1-score   support

         PER     0.8566    0.8672    0.8619       482
         LOC     0.8937    0.8996    0.8966       458
         ORG     0.5507    0.5507    0.5507        69

   micro avg     0.8527    0.8603    0.8564      1009
   macro avg     0.7670    0.7725    0.7697      1009
weighted avg     0.8525    0.8603    0.8564      1009
2023-10-12 14:01:40,330 ----------------------------------------------------------------------------------------------------