2023-10-10 22:58:10,890 ----------------------------------------------------------------------------------------------------
2023-10-10 22:58:10,892 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-10 22:58:10,892 ----------------------------------------------------------------------------------------------------
2023-10-10 22:58:10,893 MultiCorpus: 1166 train + 165 dev + 415 test sentences
 - NER_HIPE_2022 Corpus: 1166 train + 165 dev + 415 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fi/with_doc_seperator
2023-10-10 22:58:10,893 ----------------------------------------------------------------------------------------------------
2023-10-10 22:58:10,893 Train:  1166 sentences
2023-10-10 22:58:10,893         (train_with_dev=False, train_with_test=False)
2023-10-10 22:58:10,893 ----------------------------------------------------------------------------------------------------
2023-10-10 22:58:10,893 Training Params:
2023-10-10 22:58:10,893  - learning_rate: "0.00015"
2023-10-10 22:58:10,893  - mini_batch_size: "8"
2023-10-10 22:58:10,893  - max_epochs: "10"
2023-10-10 22:58:10,893  - shuffle: "True"
2023-10-10 22:58:10,893 ----------------------------------------------------------------------------------------------------
2023-10-10 22:58:10,893 Plugins:
2023-10-10 22:58:10,893  - TensorboardLogger
2023-10-10 22:58:10,894  - LinearScheduler | warmup_fraction: '0.1'
2023-10-10 22:58:10,894 ----------------------------------------------------------------------------------------------------
2023-10-10 22:58:10,894 Final evaluation on model from best epoch (best-model.pt)
2023-10-10 22:58:10,894  - metric: "('micro avg', 'f1-score')"
2023-10-10 22:58:10,894 ----------------------------------------------------------------------------------------------------
2023-10-10 22:58:10,894 Computation:
2023-10-10 22:58:10,894  - compute on device: cuda:0
2023-10-10 22:58:10,894  - embedding storage: none
2023-10-10 22:58:10,894 ----------------------------------------------------------------------------------------------------
2023-10-10 22:58:10,894 Model training base path: "hmbench-newseye/fi-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-2"
2023-10-10 22:58:10,894 ----------------------------------------------------------------------------------------------------
2023-10-10 22:58:10,894 ----------------------------------------------------------------------------------------------------
2023-10-10 22:58:10,894 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-10 22:58:20,744 epoch 1 - iter 14/146 - loss 2.85080360 - time (sec): 9.85 - samples/sec: 508.04 - lr: 0.000013 - momentum: 0.000000
2023-10-10 22:58:29,270 epoch 1 - iter 28/146 - loss 2.84737208 - time (sec): 18.37 - samples/sec: 476.37 - lr: 0.000028 - momentum: 0.000000
2023-10-10 22:58:38,296 epoch 1 - iter 42/146 - loss 2.83617163 - time (sec): 27.40 - samples/sec: 485.34 - lr: 0.000042 - momentum: 0.000000
2023-10-10 22:58:47,385 epoch 1 - iter 56/146 - loss 2.82244913 - time (sec): 36.49 - samples/sec: 480.64 - lr: 0.000057 - momentum: 0.000000
2023-10-10 22:58:56,369 epoch 1 - iter 70/146 - loss 2.79297253 - time (sec): 45.47 - samples/sec: 469.05 - lr: 0.000071 - momentum: 0.000000
2023-10-10 22:59:05,074 epoch 1 - iter 84/146 - loss 2.74178364 - time (sec): 54.18 - samples/sec: 462.87 - lr: 0.000085 - momentum: 0.000000
2023-10-10 22:59:14,215 epoch 1 - iter 98/146 - loss 2.67383737 - time (sec): 63.32 - samples/sec: 460.35 - lr: 0.000100 - momentum: 0.000000
2023-10-10 22:59:23,588 epoch 1 - iter 112/146 - loss 2.59471648 - time (sec): 72.69 - samples/sec: 459.42 - lr: 0.000114 - momentum: 0.000000
2023-10-10 22:59:32,864 epoch 1 - iter 126/146 - loss 2.50270095 - time (sec): 81.97 - samples/sec: 463.67 - lr: 0.000128 - momentum: 0.000000
2023-10-10 22:59:42,548 epoch 1 - iter 140/146 - loss 2.40998070 - time (sec): 91.65 - samples/sec: 465.82 - lr: 0.000143 - momentum: 0.000000
2023-10-10 22:59:46,352 ----------------------------------------------------------------------------------------------------
2023-10-10 22:59:46,352 EPOCH 1 done: loss 2.3739 - lr: 0.000143
2023-10-10 22:59:51,751 DEV : loss 1.3572572469711304 - f1-score (micro avg)  0.0
2023-10-10 22:59:51,762 ----------------------------------------------------------------------------------------------------
2023-10-10 23:00:00,609 epoch 2 - iter 14/146 - loss 1.35621053 - time (sec): 8.85 - samples/sec: 476.09 - lr: 0.000149 - momentum: 0.000000
2023-10-10 23:00:10,228 epoch 2 - iter 28/146 - loss 1.27306197 - time (sec): 18.46 - samples/sec: 485.71 - lr: 0.000147 - momentum: 0.000000
2023-10-10 23:00:19,552 epoch 2 - iter 42/146 - loss 1.20505413 - time (sec): 27.79 - samples/sec: 495.43 - lr: 0.000145 - momentum: 0.000000
2023-10-10 23:00:28,035 epoch 2 - iter 56/146 - loss 1.14229062 - time (sec): 36.27 - samples/sec: 478.01 - lr: 0.000144 - momentum: 0.000000
2023-10-10 23:00:37,032 epoch 2 - iter 70/146 - loss 1.06371575 - time (sec): 45.27 - samples/sec: 477.67 - lr: 0.000142 - momentum: 0.000000
2023-10-10 23:00:44,657 epoch 2 - iter 84/146 - loss 1.03291390 - time (sec): 52.89 - samples/sec: 466.21 - lr: 0.000141 - momentum: 0.000000
2023-10-10 23:00:53,889 epoch 2 - iter 98/146 - loss 0.97249437 - time (sec): 62.12 - samples/sec: 469.83 - lr: 0.000139 - momentum: 0.000000
2023-10-10 23:01:03,403 epoch 2 - iter 112/146 - loss 0.91123270 - time (sec): 71.64 - samples/sec: 473.21 - lr: 0.000137 - momentum: 0.000000
2023-10-10 23:01:12,183 epoch 2 - iter 126/146 - loss 0.86074670 - time (sec): 80.42 - samples/sec: 474.96 - lr: 0.000136 - momentum: 0.000000
2023-10-10 23:01:21,088 epoch 2 - iter 140/146 - loss 0.82736123 - time (sec): 89.32 - samples/sec: 471.80 - lr: 0.000134 - momentum: 0.000000
2023-10-10 23:01:25,200 ----------------------------------------------------------------------------------------------------
2023-10-10 23:01:25,200 EPOCH 2 done: loss 0.8534 - lr: 0.000134
2023-10-10 23:01:31,340 DEV : loss 0.4602771997451782 - f1-score (micro avg)  0.0
2023-10-10 23:01:31,350 ----------------------------------------------------------------------------------------------------
2023-10-10 23:01:40,294 epoch 3 - iter 14/146 - loss 0.58434813 - time (sec): 8.94 - samples/sec: 407.55 - lr: 0.000132 - momentum: 0.000000
2023-10-10 23:01:48,865 epoch 3 - iter 28/146 - loss 0.51229680 - time (sec): 17.51 - samples/sec: 443.85 - lr: 0.000130 - momentum: 0.000000
2023-10-10 23:01:58,385 epoch 3 - iter 42/146 - loss 0.60569011 - time (sec): 27.03 - samples/sec: 463.77 - lr: 0.000129 - momentum: 0.000000
2023-10-10 23:02:06,775 epoch 3 - iter 56/146 - loss 0.57540816 - time (sec): 35.42 - samples/sec: 462.75 - lr: 0.000127 - momentum: 0.000000
2023-10-10 23:02:15,652 epoch 3 - iter 70/146 - loss 0.54419923 - time (sec): 44.30 - samples/sec: 466.46 - lr: 0.000126 - momentum: 0.000000
2023-10-10 23:02:25,245 epoch 3 - iter 84/146 - loss 0.52507541 - time (sec): 53.89 - samples/sec: 459.35 - lr: 0.000124 - momentum: 0.000000
2023-10-10 23:02:35,262 epoch 3 - iter 98/146 - loss 0.50009096 - time (sec): 63.91 - samples/sec: 456.79 - lr: 0.000122 - momentum: 0.000000
2023-10-10 23:02:45,700 epoch 3 - iter 112/146 - loss 0.47890839 - time (sec): 74.35 - samples/sec: 455.88 - lr: 0.000121 - momentum: 0.000000
2023-10-10 23:02:55,929 epoch 3 - iter 126/146 - loss 0.46371610 - time (sec): 84.58 - samples/sec: 452.61 - lr: 0.000119 - momentum: 0.000000
2023-10-10 23:03:06,048 epoch 3 - iter 140/146 - loss 0.45098021 - time (sec): 94.70 - samples/sec: 452.39 - lr: 0.000118 - momentum: 0.000000
2023-10-10 23:03:09,921 ----------------------------------------------------------------------------------------------------
2023-10-10 23:03:09,921 EPOCH 3 done: loss 0.4549 - lr: 0.000118
2023-10-10 23:03:15,900 DEV : loss 0.3545370399951935 - f1-score (micro avg)  0.1683
2023-10-10 23:03:15,909 saving best model
2023-10-10 23:03:16,804 ----------------------------------------------------------------------------------------------------
2023-10-10 23:03:25,497 epoch 4 - iter 14/146 - loss 0.35955470 - time (sec): 8.69 - samples/sec: 469.20 - lr: 0.000115 - momentum: 0.000000
2023-10-10 23:03:34,922 epoch 4 - iter 28/146 - loss 0.43915891 - time (sec): 18.12 - samples/sec: 466.27 - lr: 0.000114 - momentum: 0.000000
2023-10-10 23:03:43,948 epoch 4 - iter 42/146 - loss 0.36828509 - time (sec): 27.14 - samples/sec: 469.10 - lr: 0.000112 - momentum: 0.000000
2023-10-10 23:03:52,763 epoch 4 - iter 56/146 - loss 0.36449078 - time (sec): 35.96 - samples/sec: 462.14 - lr: 0.000111 - momentum: 0.000000
2023-10-10 23:04:01,461 epoch 4 - iter 70/146 - loss 0.36634265 - time (sec): 44.65 - samples/sec: 461.38 - lr: 0.000109 - momentum: 0.000000
2023-10-10 23:04:10,702 epoch 4 - iter 84/146 - loss 0.36530805 - time (sec): 53.90 - samples/sec: 458.20 - lr: 0.000107 - momentum: 0.000000
2023-10-10 23:04:20,010 epoch 4 - iter 98/146 - loss 0.34785221 - time (sec): 63.20 - samples/sec: 460.69 - lr: 0.000106 - momentum: 0.000000
2023-10-10 23:04:28,603 epoch 4 - iter 112/146 - loss 0.34537336 - time (sec): 71.80 - samples/sec: 460.94 - lr: 0.000104 - momentum: 0.000000
2023-10-10 23:04:38,203 epoch 4 - iter 126/146 - loss 0.34816589 - time (sec): 81.40 - samples/sec: 461.70 - lr: 0.000103 - momentum: 0.000000
2023-10-10 23:04:47,882 epoch 4 - iter 140/146 - loss 0.35021896 - time (sec): 91.08 - samples/sec: 465.67 - lr: 0.000101 - momentum: 0.000000
2023-10-10 23:04:51,671 ----------------------------------------------------------------------------------------------------
2023-10-10 23:04:51,671 EPOCH 4 done: loss 0.3454 - lr: 0.000101
2023-10-10 23:04:57,765 DEV : loss 0.2620405852794647 - f1-score (micro avg)  0.2198
2023-10-10 23:04:57,775 saving best model
2023-10-10 23:05:06,005 ----------------------------------------------------------------------------------------------------
2023-10-10 23:05:14,754 epoch 5 - iter 14/146 - loss 0.33065727 - time (sec): 8.75 - samples/sec: 463.69 - lr: 0.000099 - momentum: 0.000000
2023-10-10 23:05:24,485 epoch 5 - iter 28/146 - loss 0.27465261 - time (sec): 18.48 - samples/sec: 480.13 - lr: 0.000097 - momentum: 0.000000
2023-10-10 23:05:33,372 epoch 5 - iter 42/146 - loss 0.27046333 - time (sec): 27.36 - samples/sec: 469.02 - lr: 0.000096 - momentum: 0.000000
2023-10-10 23:05:42,165 epoch 5 - iter 56/146 - loss 0.26438058 - time (sec): 36.16 - samples/sec: 469.49 - lr: 0.000094 - momentum: 0.000000
2023-10-10 23:05:50,851 epoch 5 - iter 70/146 - loss 0.27496324 - time (sec): 44.84 - samples/sec: 465.23 - lr: 0.000092 - momentum: 0.000000
2023-10-10 23:06:01,117 epoch 5 - iter 84/146 - loss 0.30009690 - time (sec): 55.11 - samples/sec: 473.85 - lr: 0.000091 - momentum: 0.000000
2023-10-10 23:06:11,160 epoch 5 - iter 98/146 - loss 0.30356108 - time (sec): 65.15 - samples/sec: 473.78 - lr: 0.000089 - momentum: 0.000000
2023-10-10 23:06:20,380 epoch 5 - iter 112/146 - loss 0.29759337 - time (sec): 74.37 - samples/sec: 475.74 - lr: 0.000088 - momentum: 0.000000
2023-10-10 23:06:29,071 epoch 5 - iter 126/146 - loss 0.29582810 - time (sec): 83.06 - samples/sec: 470.80 - lr: 0.000086 - momentum: 0.000000
2023-10-10 23:06:37,398 epoch 5 - iter 140/146 - loss 0.29412746 - time (sec): 91.39 - samples/sec: 466.97 - lr: 0.000084 - momentum: 0.000000
2023-10-10 23:06:41,340 ----------------------------------------------------------------------------------------------------
2023-10-10 23:06:41,340 EPOCH 5 done: loss 0.2924 - lr: 0.000084
2023-10-10 23:06:47,335 DEV : loss 0.233732670545578 - f1-score (micro avg)  0.294
2023-10-10 23:06:47,344 saving best model
2023-10-10 23:06:55,188 ----------------------------------------------------------------------------------------------------
2023-10-10 23:07:04,904 epoch 6 - iter 14/146 - loss 0.21279716 - time (sec): 9.71 - samples/sec: 479.03 - lr: 0.000082 - momentum: 0.000000
2023-10-10 23:07:13,663 epoch 6 - iter 28/146 - loss 0.23945889 - time (sec): 18.47 - samples/sec: 467.99 - lr: 0.000081 - momentum: 0.000000
2023-10-10 23:07:22,417 epoch 6 - iter 42/146 - loss 0.22572971 - time (sec): 27.23 - samples/sec: 471.79 - lr: 0.000079 - momentum: 0.000000
2023-10-10 23:07:31,554 epoch 6 - iter 56/146 - loss 0.23744901 - time (sec): 36.36 - samples/sec: 466.59 - lr: 0.000077 - momentum: 0.000000
2023-10-10 23:07:40,351 epoch 6 - iter 70/146 - loss 0.24226304 - time (sec): 45.16 - samples/sec: 469.66 - lr: 0.000076 - momentum: 0.000000
2023-10-10 23:07:48,887 epoch 6 - iter 84/146 - loss 0.24621321 - time (sec): 53.70 - samples/sec: 468.04 - lr: 0.000074 - momentum: 0.000000
2023-10-10 23:07:58,610 epoch 6 - iter 98/146 - loss 0.26217291 - time (sec): 63.42 - samples/sec: 476.79 - lr: 0.000073 - momentum: 0.000000
2023-10-10 23:08:07,909 epoch 6 - iter 112/146 - loss 0.26102990 - time (sec): 72.72 - samples/sec: 470.50 - lr: 0.000071 - momentum: 0.000000
2023-10-10 23:08:16,844 epoch 6 - iter 126/146 - loss 0.25644519 - time (sec): 81.65 - samples/sec: 468.56 - lr: 0.000069 - momentum: 0.000000
2023-10-10 23:08:25,543 epoch 6 - iter 140/146 - loss 0.25250192 - time (sec): 90.35 - samples/sec: 470.08 - lr: 0.000068 - momentum: 0.000000
2023-10-10 23:08:29,489 ----------------------------------------------------------------------------------------------------
2023-10-10 23:08:29,490 EPOCH 6 done: loss 0.2498 - lr: 0.000068
2023-10-10 23:08:35,694 DEV : loss 0.21416617929935455 - f1-score (micro avg)  0.428
2023-10-10 23:08:35,705 saving best model
2023-10-10 23:08:40,902 ----------------------------------------------------------------------------------------------------
2023-10-10 23:08:50,396 epoch 7 - iter 14/146 - loss 0.20869646 - time (sec): 9.49 - samples/sec: 439.54 - lr: 0.000066 - momentum: 0.000000
2023-10-10 23:09:00,761 epoch 7 - iter 28/146 - loss 0.19793080 - time (sec): 19.85 - samples/sec: 464.18 - lr: 0.000064 - momentum: 0.000000
2023-10-10 23:09:09,488 epoch 7 - iter 42/146 - loss 0.19705387 - time (sec): 28.58 - samples/sec: 443.46 - lr: 0.000062 - momentum: 0.000000
2023-10-10 23:09:18,807 epoch 7 - iter 56/146 - loss 0.21316354 - time (sec): 37.90 - samples/sec: 443.27 - lr: 0.000061 - momentum: 0.000000
2023-10-10 23:09:26,780 epoch 7 - iter 70/146 - loss 0.20525473 - time (sec): 45.87 - samples/sec: 437.55 - lr: 0.000059 - momentum: 0.000000
2023-10-10 23:09:35,299 epoch 7 - iter 84/146 - loss 0.20806961 - time (sec): 54.39 - samples/sec: 445.74 - lr: 0.000058 - momentum: 0.000000
2023-10-10 23:09:44,917 epoch 7 - iter 98/146 - loss 0.20881775 - time (sec): 64.01 - samples/sec: 459.12 - lr: 0.000056 - momentum: 0.000000
2023-10-10 23:09:54,303 epoch 7 - iter 112/146 - loss 0.20670940 - time (sec): 73.40 - samples/sec: 457.48 - lr: 0.000054 - momentum: 0.000000
2023-10-10 23:10:03,027 epoch 7 - iter 126/146 - loss 0.21554904 - time (sec): 82.12 - samples/sec: 460.36 - lr: 0.000053 - momentum: 0.000000
2023-10-10 23:10:12,759 epoch 7 - iter 140/146 - loss 0.20959678 - time (sec): 91.85 - samples/sec: 465.33 - lr: 0.000051 - momentum: 0.000000
2023-10-10 23:10:16,496 ----------------------------------------------------------------------------------------------------
2023-10-10 23:10:16,496 EPOCH 7 done: loss 0.2096 - lr: 0.000051
2023-10-10 23:10:22,427 DEV : loss 0.19345837831497192 - f1-score (micro avg)  0.485
2023-10-10 23:10:22,437 saving best model
2023-10-10 23:10:32,162 ----------------------------------------------------------------------------------------------------
2023-10-10 23:10:41,415 epoch 8 - iter 14/146 - loss 0.18298293 - time (sec): 9.25 - samples/sec: 457.08 - lr: 0.000049 - momentum: 0.000000
2023-10-10 23:10:50,318 epoch 8 - iter 28/146 - loss 0.20041757 - time (sec): 18.15 - samples/sec: 467.68 - lr: 0.000047 - momentum: 0.000000
2023-10-10 23:10:59,244 epoch 8 - iter 42/146 - loss 0.18477550 - time (sec): 27.08 - samples/sec: 470.62 - lr: 0.000046 - momentum: 0.000000
2023-10-10 23:11:07,458 epoch 8 - iter 56/146 - loss 0.18939422 - time (sec): 35.29 - samples/sec: 460.11 - lr: 0.000044 - momentum: 0.000000
2023-10-10 23:11:16,672 epoch 8 - iter 70/146 - loss 0.20069635 - time (sec): 44.51 - samples/sec: 471.43 - lr: 0.000043 - momentum: 0.000000
2023-10-10 23:11:25,545 epoch 8 - iter 84/146 - loss 0.19917900 - time (sec): 53.38 - samples/sec: 463.97 - lr: 0.000041 - momentum: 0.000000
2023-10-10 23:11:35,352 epoch 8 - iter 98/146 - loss 0.18755072 - time (sec): 63.19 - samples/sec: 468.40 - lr: 0.000039 - momentum: 0.000000
2023-10-10 23:11:43,884 epoch 8 - iter 112/146 - loss 0.18603688 - time (sec): 71.72 - samples/sec: 466.09 - lr: 0.000038 - momentum: 0.000000
2023-10-10 23:11:53,379 epoch 8 - iter 126/146 - loss 0.18215187 - time (sec): 81.21 - samples/sec: 470.73 - lr: 0.000036 - momentum: 0.000000
2023-10-10 23:12:03,326 epoch 8 - iter 140/146 - loss 0.18026671 - time (sec): 91.16 - samples/sec: 473.98 - lr: 0.000035 - momentum: 0.000000
2023-10-10 23:12:06,714 ----------------------------------------------------------------------------------------------------
2023-10-10 23:12:06,714 EPOCH 8 done: loss 0.1781 - lr: 0.000035
2023-10-10 23:12:12,668 DEV : loss 0.18470922112464905 - f1-score (micro avg)  0.5094
2023-10-10 23:12:12,678 saving best model
2023-10-10 23:12:26,423 ----------------------------------------------------------------------------------------------------
2023-10-10 23:12:35,685 epoch 9 - iter 14/146 - loss 0.16655684 - time (sec): 9.26 - samples/sec: 465.12 - lr: 0.000032 - momentum: 0.000000
2023-10-10 23:12:44,294 epoch 9 - iter 28/146 - loss 0.16140822 - time (sec): 17.87 - samples/sec: 461.81 - lr: 0.000031 - momentum: 0.000000
2023-10-10 23:12:52,406 epoch 9 - iter 42/146 - loss 0.17856536 - time (sec): 25.98 - samples/sec: 448.97 - lr: 0.000029 - momentum: 0.000000
2023-10-10 23:13:01,225 epoch 9 - iter 56/146 - loss 0.17303316 - time (sec): 34.80 - samples/sec: 456.18 - lr: 0.000028 - momentum: 0.000000
2023-10-10 23:13:11,847 epoch 9 - iter 70/146 - loss 0.17994148 - time (sec): 45.42 - samples/sec: 477.28 - lr: 0.000026 - momentum: 0.000000
2023-10-10 23:13:19,828 epoch 9 - iter 84/146 - loss 0.17025472 - time (sec): 53.40 - samples/sec: 466.73 - lr: 0.000024 - momentum: 0.000000
2023-10-10 23:13:29,108 epoch 9 - iter 98/146 - loss 0.16961006 - time (sec): 62.68 - samples/sec: 473.92 - lr: 0.000023 - momentum: 0.000000
2023-10-10 23:13:38,011 epoch 9 - iter 112/146 - loss 0.16697591 - time (sec): 71.58 - samples/sec: 474.21 - lr: 0.000021 - momentum: 0.000000
2023-10-10 23:13:46,635 epoch 9 - iter 126/146 - loss 0.16595253 - time (sec): 80.21 - samples/sec: 475.12 - lr: 0.000020 - momentum: 0.000000
2023-10-10 23:13:55,901 epoch 9 - iter 140/146 - loss 0.16336951 - time (sec): 89.47 - samples/sec: 480.09 - lr: 0.000018 - momentum: 0.000000
2023-10-10 23:13:59,214 ----------------------------------------------------------------------------------------------------
2023-10-10 23:13:59,215 EPOCH 9 done: loss 0.1615 - lr: 0.000018
2023-10-10 23:14:04,878 DEV : loss 0.18229743838310242 - f1-score (micro avg)  0.5605
2023-10-10 23:14:04,887 saving best model
2023-10-10 23:14:15,726 ----------------------------------------------------------------------------------------------------
2023-10-10 23:14:25,016 epoch 10 - iter 14/146 - loss 0.16914234 - time (sec): 9.29 - samples/sec: 437.01 - lr: 0.000016 - momentum: 0.000000
2023-10-10 23:14:35,848 epoch 10 - iter 28/146 - loss 0.16172601 - time (sec): 20.12 - samples/sec: 436.43 - lr: 0.000014 - momentum: 0.000000
2023-10-10 23:14:45,866 epoch 10 - iter 42/146 - loss 0.15318731 - time (sec): 30.14 - samples/sec: 413.82 - lr: 0.000013 - momentum: 0.000000
2023-10-10 23:14:55,889 epoch 10 - iter 56/146 - loss 0.14649958 - time (sec): 40.16 - samples/sec: 416.17 - lr: 0.000011 - momentum: 0.000000
2023-10-10 23:15:06,068 epoch 10 - iter 70/146 - loss 0.14015652 - time (sec): 50.34 - samples/sec: 418.19 - lr: 0.000009 - momentum: 0.000000
2023-10-10 23:15:14,831 epoch 10 - iter 84/146 - loss 0.14204998 - time (sec): 59.10 - samples/sec: 419.73 - lr: 0.000008 - momentum: 0.000000
2023-10-10 23:15:24,600 epoch 10 - iter 98/146 - loss 0.14430949 - time (sec): 68.87 - samples/sec: 431.09 - lr: 0.000006 - momentum: 0.000000
2023-10-10 23:15:33,767 epoch 10 - iter 112/146 - loss 0.15183816 - time (sec): 78.04 - samples/sec: 439.05 - lr: 0.000005 - momentum: 0.000000
2023-10-10 23:15:42,602 epoch 10 - iter 126/146 - loss 0.14950005 - time (sec): 86.87 - samples/sec: 441.81 - lr: 0.000003 - momentum: 0.000000
2023-10-10 23:15:51,543 epoch 10 - iter 140/146 - loss 0.15184293 - time (sec): 95.81 - samples/sec: 448.71 - lr: 0.000001 - momentum: 0.000000
2023-10-10 23:15:54,868 ----------------------------------------------------------------------------------------------------
2023-10-10 23:15:54,868 EPOCH 10 done: loss 0.1505 - lr: 0.000001
2023-10-10 23:16:00,814 DEV : loss 0.1794561892747879 - f1-score (micro avg)  0.5867
2023-10-10 23:16:00,824 saving best model
2023-10-10 23:16:11,463 ----------------------------------------------------------------------------------------------------
2023-10-10 23:16:11,465 Loading model from best epoch ...
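[Editor's note] The lr column above matches the LinearScheduler plugin logged in the header (warmup_fraction: '0.1'): over the 1460 total steps (10 epochs x 146 iterations), lr rises roughly linearly toward the configured peak of 0.00015 during the first ~146 steps, then decays linearly toward zero. A minimal sketch of that schedule shape, assuming this warmup-then-decay form (the function name is illustrative, not the Flair API):

```python
def linear_warmup_then_decay(step: int, total_steps: int = 1460,
                             peak_lr: float = 0.00015,
                             warmup_fraction: float = 0.1) -> float:
    """Linear warmup to peak_lr, then linear decay to 0 (illustrative sketch)."""
    warmup_steps = int(total_steps * warmup_fraction)  # 146 steps here
    if step < warmup_steps:
        return peak_lr * step / warmup_steps  # warmup phase
    # decay phase: straight line from peak_lr at warmup_steps down to 0
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

# Near the end of epoch 1 (step 140) this gives ~0.000144, close to the
# logged 0.000143; the small gap may come from a step-counting offset.
print(round(linear_warmup_then_decay(140), 6))
```

This reproduces the overall trajectory in the log (peak around the end of epoch 1, lr ~0.000001 at the final logged iteration), but the exact per-step values are Flair's, not this sketch's.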
2023-10-10 23:16:15,063 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-10 23:16:27,864 
Results:
- F-score (micro) 0.6406
- F-score (macro) 0.3912
- Accuracy 0.5198

By class:
              precision    recall  f1-score   support

         PER     0.7484    0.6839    0.7147       348
         LOC     0.5597    0.8084    0.6614       261
         ORG     0.1852    0.1923    0.1887        52
   HumanProd     0.0000    0.0000    0.0000        22

   micro avg     0.6120    0.6720    0.6406       683
   macro avg     0.3733    0.4212    0.3912       683
weighted avg     0.6093    0.6720    0.6313       683

2023-10-10 23:16:27,864 ----------------------------------------------------------------------------------------------------
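[Editor's note] The averages in the table are internally consistent: each F1 is the harmonic mean of its precision and recall, and the macro average is the unweighted mean of the four per-class F1 scores. A quick sanity check using the numbers copied from the table above:

```python
def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

# Per-class (precision, recall) copied from the evaluation table above.
per_class = {
    "PER": (0.7484, 0.6839),
    "LOC": (0.5597, 0.8084),
    "ORG": (0.1852, 0.1923),
    "HumanProd": (0.0000, 0.0000),
}

# Micro average: Flair pools TP/FP/FN over all classes; its reported
# micro precision/recall are 0.6120 / 0.6720.
micro = f1(0.6120, 0.6720)
# Macro average: unweighted mean of the per-class F1 scores.
macro = sum(f1(p, r) for p, r in per_class.values()) / len(per_class)

print(round(micro, 4), round(macro, 4))  # → 0.6406 0.3912
```

Both values match the logged F-score (micro) 0.6406 and F-score (macro) 0.3912; note how the untagged HumanProd class (22 supports, F1 0.0) drags the macro average far below the micro average.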