2023-10-10 22:58:10,890 ----------------------------------------------------------------------------------------------------
2023-10-10 22:58:10,892 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
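
Reading the shapes off the dump above: each T5 block projects the 1472-dimensional hidden state down to a 384-dimensional attention space (q, k, v) and back up (o), and the gated feed-forward expands to 3584 and back, all with bias-free linear layers. A back-of-the-envelope check of the per-block parameter counts these shapes imply (my own helper, not part of the training code):

```python
def linear_params(in_features: int, out_features: int, bias: bool = False) -> int:
    """Parameter count of a torch.nn.Linear with the given shape."""
    return in_features * out_features + (out_features if bias else 0)

D_MODEL, D_ATTN, D_FF = 1472, 384, 3584  # shapes from the dump above

# Self-attention: q, k, v each map d_model -> d_attn; o maps back.
attn = 3 * linear_params(D_MODEL, D_ATTN) + linear_params(D_ATTN, D_MODEL)

# Gated feed-forward: two input projections (wi_0, wi_1) plus output (wo).
ff = 2 * linear_params(D_MODEL, D_FF) + linear_params(D_FF, D_MODEL)

print(attn, ff)  # 2260992 15826944
```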
2023-10-10 22:58:10,892 ----------------------------------------------------------------------------------------------------
2023-10-10 22:58:10,893 MultiCorpus: 1166 train + 165 dev + 415 test sentences
- NER_HIPE_2022 Corpus: 1166 train + 165 dev + 415 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fi/with_doc_seperator
2023-10-10 22:58:10,893 ----------------------------------------------------------------------------------------------------
2023-10-10 22:58:10,893 Train: 1166 sentences
2023-10-10 22:58:10,893 (train_with_dev=False, train_with_test=False)
2023-10-10 22:58:10,893 ----------------------------------------------------------------------------------------------------
2023-10-10 22:58:10,893 Training Params:
2023-10-10 22:58:10,893 - learning_rate: "0.00015"
2023-10-10 22:58:10,893 - mini_batch_size: "8"
2023-10-10 22:58:10,893 - max_epochs: "10"
2023-10-10 22:58:10,893 - shuffle: "True"
2023-10-10 22:58:10,893 ----------------------------------------------------------------------------------------------------
2023-10-10 22:58:10,893 Plugins:
2023-10-10 22:58:10,893 - TensorboardLogger
2023-10-10 22:58:10,894 - LinearScheduler | warmup_fraction: '0.1'
2023-10-10 22:58:10,894 ----------------------------------------------------------------------------------------------------
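
The LinearScheduler plugin above warms the learning rate up over the first 10% of steps and then decays it linearly toward zero, which is exactly the lr trajectory visible in the per-iteration log lines below (peak 0.00015 reached after 146 of the 1460 total steps). A minimal sketch of that schedule (function name and signature are my own, not Flair's API):

```python
def linear_schedule(step: int, total_steps: int,
                    peak_lr: float = 0.00015,
                    warmup_fraction: float = 0.1) -> float:
    """Linear warmup to peak_lr, then linear decay to zero."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * (total_steps - step) / max(1, total_steps - warmup_steps)

# 146 iterations/epoch x 10 epochs = 1460 steps; warmup ends at step 146.
print(linear_schedule(140, 1460))   # ~1.4e-4, matching the logged warmup values
print(linear_schedule(1460, 1460))  # 0.0 at the final step
```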
2023-10-10 22:58:10,894 Final evaluation on model from best epoch (best-model.pt)
2023-10-10 22:58:10,894 - metric: "('micro avg', 'f1-score')"
2023-10-10 22:58:10,894 ----------------------------------------------------------------------------------------------------
2023-10-10 22:58:10,894 Computation:
2023-10-10 22:58:10,894 - compute on device: cuda:0
2023-10-10 22:58:10,894 - embedding storage: none
2023-10-10 22:58:10,894 ----------------------------------------------------------------------------------------------------
2023-10-10 22:58:10,894 Model training base path: "hmbench-newseye/fi-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-2"
2023-10-10 22:58:10,894 ----------------------------------------------------------------------------------------------------
2023-10-10 22:58:10,894 ----------------------------------------------------------------------------------------------------
2023-10-10 22:58:10,894 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-10 22:58:20,744 epoch 1 - iter 14/146 - loss 2.85080360 - time (sec): 9.85 - samples/sec: 508.04 - lr: 0.000013 - momentum: 0.000000
2023-10-10 22:58:29,270 epoch 1 - iter 28/146 - loss 2.84737208 - time (sec): 18.37 - samples/sec: 476.37 - lr: 0.000028 - momentum: 0.000000
2023-10-10 22:58:38,296 epoch 1 - iter 42/146 - loss 2.83617163 - time (sec): 27.40 - samples/sec: 485.34 - lr: 0.000042 - momentum: 0.000000
2023-10-10 22:58:47,385 epoch 1 - iter 56/146 - loss 2.82244913 - time (sec): 36.49 - samples/sec: 480.64 - lr: 0.000057 - momentum: 0.000000
2023-10-10 22:58:56,369 epoch 1 - iter 70/146 - loss 2.79297253 - time (sec): 45.47 - samples/sec: 469.05 - lr: 0.000071 - momentum: 0.000000
2023-10-10 22:59:05,074 epoch 1 - iter 84/146 - loss 2.74178364 - time (sec): 54.18 - samples/sec: 462.87 - lr: 0.000085 - momentum: 0.000000
2023-10-10 22:59:14,215 epoch 1 - iter 98/146 - loss 2.67383737 - time (sec): 63.32 - samples/sec: 460.35 - lr: 0.000100 - momentum: 0.000000
2023-10-10 22:59:23,588 epoch 1 - iter 112/146 - loss 2.59471648 - time (sec): 72.69 - samples/sec: 459.42 - lr: 0.000114 - momentum: 0.000000
2023-10-10 22:59:32,864 epoch 1 - iter 126/146 - loss 2.50270095 - time (sec): 81.97 - samples/sec: 463.67 - lr: 0.000128 - momentum: 0.000000
2023-10-10 22:59:42,548 epoch 1 - iter 140/146 - loss 2.40998070 - time (sec): 91.65 - samples/sec: 465.82 - lr: 0.000143 - momentum: 0.000000
2023-10-10 22:59:46,352 ----------------------------------------------------------------------------------------------------
2023-10-10 22:59:46,352 EPOCH 1 done: loss 2.3739 - lr: 0.000143
2023-10-10 22:59:51,751 DEV : loss 1.3572572469711304 - f1-score (micro avg) 0.0
2023-10-10 22:59:51,762 ----------------------------------------------------------------------------------------------------
2023-10-10 23:00:00,609 epoch 2 - iter 14/146 - loss 1.35621053 - time (sec): 8.85 - samples/sec: 476.09 - lr: 0.000149 - momentum: 0.000000
2023-10-10 23:00:10,228 epoch 2 - iter 28/146 - loss 1.27306197 - time (sec): 18.46 - samples/sec: 485.71 - lr: 0.000147 - momentum: 0.000000
2023-10-10 23:00:19,552 epoch 2 - iter 42/146 - loss 1.20505413 - time (sec): 27.79 - samples/sec: 495.43 - lr: 0.000145 - momentum: 0.000000
2023-10-10 23:00:28,035 epoch 2 - iter 56/146 - loss 1.14229062 - time (sec): 36.27 - samples/sec: 478.01 - lr: 0.000144 - momentum: 0.000000
2023-10-10 23:00:37,032 epoch 2 - iter 70/146 - loss 1.06371575 - time (sec): 45.27 - samples/sec: 477.67 - lr: 0.000142 - momentum: 0.000000
2023-10-10 23:00:44,657 epoch 2 - iter 84/146 - loss 1.03291390 - time (sec): 52.89 - samples/sec: 466.21 - lr: 0.000141 - momentum: 0.000000
2023-10-10 23:00:53,889 epoch 2 - iter 98/146 - loss 0.97249437 - time (sec): 62.12 - samples/sec: 469.83 - lr: 0.000139 - momentum: 0.000000
2023-10-10 23:01:03,403 epoch 2 - iter 112/146 - loss 0.91123270 - time (sec): 71.64 - samples/sec: 473.21 - lr: 0.000137 - momentum: 0.000000
2023-10-10 23:01:12,183 epoch 2 - iter 126/146 - loss 0.86074670 - time (sec): 80.42 - samples/sec: 474.96 - lr: 0.000136 - momentum: 0.000000
2023-10-10 23:01:21,088 epoch 2 - iter 140/146 - loss 0.82736123 - time (sec): 89.32 - samples/sec: 471.80 - lr: 0.000134 - momentum: 0.000000
2023-10-10 23:01:25,200 ----------------------------------------------------------------------------------------------------
2023-10-10 23:01:25,200 EPOCH 2 done: loss 0.8534 - lr: 0.000134
2023-10-10 23:01:31,340 DEV : loss 0.4602771997451782 - f1-score (micro avg) 0.0
2023-10-10 23:01:31,350 ----------------------------------------------------------------------------------------------------
2023-10-10 23:01:40,294 epoch 3 - iter 14/146 - loss 0.58434813 - time (sec): 8.94 - samples/sec: 407.55 - lr: 0.000132 - momentum: 0.000000
2023-10-10 23:01:48,865 epoch 3 - iter 28/146 - loss 0.51229680 - time (sec): 17.51 - samples/sec: 443.85 - lr: 0.000130 - momentum: 0.000000
2023-10-10 23:01:58,385 epoch 3 - iter 42/146 - loss 0.60569011 - time (sec): 27.03 - samples/sec: 463.77 - lr: 0.000129 - momentum: 0.000000
2023-10-10 23:02:06,775 epoch 3 - iter 56/146 - loss 0.57540816 - time (sec): 35.42 - samples/sec: 462.75 - lr: 0.000127 - momentum: 0.000000
2023-10-10 23:02:15,652 epoch 3 - iter 70/146 - loss 0.54419923 - time (sec): 44.30 - samples/sec: 466.46 - lr: 0.000126 - momentum: 0.000000
2023-10-10 23:02:25,245 epoch 3 - iter 84/146 - loss 0.52507541 - time (sec): 53.89 - samples/sec: 459.35 - lr: 0.000124 - momentum: 0.000000
2023-10-10 23:02:35,262 epoch 3 - iter 98/146 - loss 0.50009096 - time (sec): 63.91 - samples/sec: 456.79 - lr: 0.000122 - momentum: 0.000000
2023-10-10 23:02:45,700 epoch 3 - iter 112/146 - loss 0.47890839 - time (sec): 74.35 - samples/sec: 455.88 - lr: 0.000121 - momentum: 0.000000
2023-10-10 23:02:55,929 epoch 3 - iter 126/146 - loss 0.46371610 - time (sec): 84.58 - samples/sec: 452.61 - lr: 0.000119 - momentum: 0.000000
2023-10-10 23:03:06,048 epoch 3 - iter 140/146 - loss 0.45098021 - time (sec): 94.70 - samples/sec: 452.39 - lr: 0.000118 - momentum: 0.000000
2023-10-10 23:03:09,921 ----------------------------------------------------------------------------------------------------
2023-10-10 23:03:09,921 EPOCH 3 done: loss 0.4549 - lr: 0.000118
2023-10-10 23:03:15,900 DEV : loss 0.3545370399951935 - f1-score (micro avg) 0.1683
2023-10-10 23:03:15,909 saving best model
2023-10-10 23:03:16,804 ----------------------------------------------------------------------------------------------------
2023-10-10 23:03:25,497 epoch 4 - iter 14/146 - loss 0.35955470 - time (sec): 8.69 - samples/sec: 469.20 - lr: 0.000115 - momentum: 0.000000
2023-10-10 23:03:34,922 epoch 4 - iter 28/146 - loss 0.43915891 - time (sec): 18.12 - samples/sec: 466.27 - lr: 0.000114 - momentum: 0.000000
2023-10-10 23:03:43,948 epoch 4 - iter 42/146 - loss 0.36828509 - time (sec): 27.14 - samples/sec: 469.10 - lr: 0.000112 - momentum: 0.000000
2023-10-10 23:03:52,763 epoch 4 - iter 56/146 - loss 0.36449078 - time (sec): 35.96 - samples/sec: 462.14 - lr: 0.000111 - momentum: 0.000000
2023-10-10 23:04:01,461 epoch 4 - iter 70/146 - loss 0.36634265 - time (sec): 44.65 - samples/sec: 461.38 - lr: 0.000109 - momentum: 0.000000
2023-10-10 23:04:10,702 epoch 4 - iter 84/146 - loss 0.36530805 - time (sec): 53.90 - samples/sec: 458.20 - lr: 0.000107 - momentum: 0.000000
2023-10-10 23:04:20,010 epoch 4 - iter 98/146 - loss 0.34785221 - time (sec): 63.20 - samples/sec: 460.69 - lr: 0.000106 - momentum: 0.000000
2023-10-10 23:04:28,603 epoch 4 - iter 112/146 - loss 0.34537336 - time (sec): 71.80 - samples/sec: 460.94 - lr: 0.000104 - momentum: 0.000000
2023-10-10 23:04:38,203 epoch 4 - iter 126/146 - loss 0.34816589 - time (sec): 81.40 - samples/sec: 461.70 - lr: 0.000103 - momentum: 0.000000
2023-10-10 23:04:47,882 epoch 4 - iter 140/146 - loss 0.35021896 - time (sec): 91.08 - samples/sec: 465.67 - lr: 0.000101 - momentum: 0.000000
2023-10-10 23:04:51,671 ----------------------------------------------------------------------------------------------------
2023-10-10 23:04:51,671 EPOCH 4 done: loss 0.3454 - lr: 0.000101
2023-10-10 23:04:57,765 DEV : loss 0.2620405852794647 - f1-score (micro avg) 0.2198
2023-10-10 23:04:57,775 saving best model
2023-10-10 23:05:06,005 ----------------------------------------------------------------------------------------------------
2023-10-10 23:05:14,754 epoch 5 - iter 14/146 - loss 0.33065727 - time (sec): 8.75 - samples/sec: 463.69 - lr: 0.000099 - momentum: 0.000000
2023-10-10 23:05:24,485 epoch 5 - iter 28/146 - loss 0.27465261 - time (sec): 18.48 - samples/sec: 480.13 - lr: 0.000097 - momentum: 0.000000
2023-10-10 23:05:33,372 epoch 5 - iter 42/146 - loss 0.27046333 - time (sec): 27.36 - samples/sec: 469.02 - lr: 0.000096 - momentum: 0.000000
2023-10-10 23:05:42,165 epoch 5 - iter 56/146 - loss 0.26438058 - time (sec): 36.16 - samples/sec: 469.49 - lr: 0.000094 - momentum: 0.000000
2023-10-10 23:05:50,851 epoch 5 - iter 70/146 - loss 0.27496324 - time (sec): 44.84 - samples/sec: 465.23 - lr: 0.000092 - momentum: 0.000000
2023-10-10 23:06:01,117 epoch 5 - iter 84/146 - loss 0.30009690 - time (sec): 55.11 - samples/sec: 473.85 - lr: 0.000091 - momentum: 0.000000
2023-10-10 23:06:11,160 epoch 5 - iter 98/146 - loss 0.30356108 - time (sec): 65.15 - samples/sec: 473.78 - lr: 0.000089 - momentum: 0.000000
2023-10-10 23:06:20,380 epoch 5 - iter 112/146 - loss 0.29759337 - time (sec): 74.37 - samples/sec: 475.74 - lr: 0.000088 - momentum: 0.000000
2023-10-10 23:06:29,071 epoch 5 - iter 126/146 - loss 0.29582810 - time (sec): 83.06 - samples/sec: 470.80 - lr: 0.000086 - momentum: 0.000000
2023-10-10 23:06:37,398 epoch 5 - iter 140/146 - loss 0.29412746 - time (sec): 91.39 - samples/sec: 466.97 - lr: 0.000084 - momentum: 0.000000
2023-10-10 23:06:41,340 ----------------------------------------------------------------------------------------------------
2023-10-10 23:06:41,340 EPOCH 5 done: loss 0.2924 - lr: 0.000084
2023-10-10 23:06:47,335 DEV : loss 0.233732670545578 - f1-score (micro avg) 0.294
2023-10-10 23:06:47,344 saving best model
2023-10-10 23:06:55,188 ----------------------------------------------------------------------------------------------------
2023-10-10 23:07:04,904 epoch 6 - iter 14/146 - loss 0.21279716 - time (sec): 9.71 - samples/sec: 479.03 - lr: 0.000082 - momentum: 0.000000
2023-10-10 23:07:13,663 epoch 6 - iter 28/146 - loss 0.23945889 - time (sec): 18.47 - samples/sec: 467.99 - lr: 0.000081 - momentum: 0.000000
2023-10-10 23:07:22,417 epoch 6 - iter 42/146 - loss 0.22572971 - time (sec): 27.23 - samples/sec: 471.79 - lr: 0.000079 - momentum: 0.000000
2023-10-10 23:07:31,554 epoch 6 - iter 56/146 - loss 0.23744901 - time (sec): 36.36 - samples/sec: 466.59 - lr: 0.000077 - momentum: 0.000000
2023-10-10 23:07:40,351 epoch 6 - iter 70/146 - loss 0.24226304 - time (sec): 45.16 - samples/sec: 469.66 - lr: 0.000076 - momentum: 0.000000
2023-10-10 23:07:48,887 epoch 6 - iter 84/146 - loss 0.24621321 - time (sec): 53.70 - samples/sec: 468.04 - lr: 0.000074 - momentum: 0.000000
2023-10-10 23:07:58,610 epoch 6 - iter 98/146 - loss 0.26217291 - time (sec): 63.42 - samples/sec: 476.79 - lr: 0.000073 - momentum: 0.000000
2023-10-10 23:08:07,909 epoch 6 - iter 112/146 - loss 0.26102990 - time (sec): 72.72 - samples/sec: 470.50 - lr: 0.000071 - momentum: 0.000000
2023-10-10 23:08:16,844 epoch 6 - iter 126/146 - loss 0.25644519 - time (sec): 81.65 - samples/sec: 468.56 - lr: 0.000069 - momentum: 0.000000
2023-10-10 23:08:25,543 epoch 6 - iter 140/146 - loss 0.25250192 - time (sec): 90.35 - samples/sec: 470.08 - lr: 0.000068 - momentum: 0.000000
2023-10-10 23:08:29,489 ----------------------------------------------------------------------------------------------------
2023-10-10 23:08:29,490 EPOCH 6 done: loss 0.2498 - lr: 0.000068
2023-10-10 23:08:35,694 DEV : loss 0.21416617929935455 - f1-score (micro avg) 0.428
2023-10-10 23:08:35,705 saving best model
2023-10-10 23:08:40,902 ----------------------------------------------------------------------------------------------------
2023-10-10 23:08:50,396 epoch 7 - iter 14/146 - loss 0.20869646 - time (sec): 9.49 - samples/sec: 439.54 - lr: 0.000066 - momentum: 0.000000
2023-10-10 23:09:00,761 epoch 7 - iter 28/146 - loss 0.19793080 - time (sec): 19.85 - samples/sec: 464.18 - lr: 0.000064 - momentum: 0.000000
2023-10-10 23:09:09,488 epoch 7 - iter 42/146 - loss 0.19705387 - time (sec): 28.58 - samples/sec: 443.46 - lr: 0.000062 - momentum: 0.000000
2023-10-10 23:09:18,807 epoch 7 - iter 56/146 - loss 0.21316354 - time (sec): 37.90 - samples/sec: 443.27 - lr: 0.000061 - momentum: 0.000000
2023-10-10 23:09:26,780 epoch 7 - iter 70/146 - loss 0.20525473 - time (sec): 45.87 - samples/sec: 437.55 - lr: 0.000059 - momentum: 0.000000
2023-10-10 23:09:35,299 epoch 7 - iter 84/146 - loss 0.20806961 - time (sec): 54.39 - samples/sec: 445.74 - lr: 0.000058 - momentum: 0.000000
2023-10-10 23:09:44,917 epoch 7 - iter 98/146 - loss 0.20881775 - time (sec): 64.01 - samples/sec: 459.12 - lr: 0.000056 - momentum: 0.000000
2023-10-10 23:09:54,303 epoch 7 - iter 112/146 - loss 0.20670940 - time (sec): 73.40 - samples/sec: 457.48 - lr: 0.000054 - momentum: 0.000000
2023-10-10 23:10:03,027 epoch 7 - iter 126/146 - loss 0.21554904 - time (sec): 82.12 - samples/sec: 460.36 - lr: 0.000053 - momentum: 0.000000
2023-10-10 23:10:12,759 epoch 7 - iter 140/146 - loss 0.20959678 - time (sec): 91.85 - samples/sec: 465.33 - lr: 0.000051 - momentum: 0.000000
2023-10-10 23:10:16,496 ----------------------------------------------------------------------------------------------------
2023-10-10 23:10:16,496 EPOCH 7 done: loss 0.2096 - lr: 0.000051
2023-10-10 23:10:22,427 DEV : loss 0.19345837831497192 - f1-score (micro avg) 0.485
2023-10-10 23:10:22,437 saving best model
2023-10-10 23:10:32,162 ----------------------------------------------------------------------------------------------------
2023-10-10 23:10:41,415 epoch 8 - iter 14/146 - loss 0.18298293 - time (sec): 9.25 - samples/sec: 457.08 - lr: 0.000049 - momentum: 0.000000
2023-10-10 23:10:50,318 epoch 8 - iter 28/146 - loss 0.20041757 - time (sec): 18.15 - samples/sec: 467.68 - lr: 0.000047 - momentum: 0.000000
2023-10-10 23:10:59,244 epoch 8 - iter 42/146 - loss 0.18477550 - time (sec): 27.08 - samples/sec: 470.62 - lr: 0.000046 - momentum: 0.000000
2023-10-10 23:11:07,458 epoch 8 - iter 56/146 - loss 0.18939422 - time (sec): 35.29 - samples/sec: 460.11 - lr: 0.000044 - momentum: 0.000000
2023-10-10 23:11:16,672 epoch 8 - iter 70/146 - loss 0.20069635 - time (sec): 44.51 - samples/sec: 471.43 - lr: 0.000043 - momentum: 0.000000
2023-10-10 23:11:25,545 epoch 8 - iter 84/146 - loss 0.19917900 - time (sec): 53.38 - samples/sec: 463.97 - lr: 0.000041 - momentum: 0.000000
2023-10-10 23:11:35,352 epoch 8 - iter 98/146 - loss 0.18755072 - time (sec): 63.19 - samples/sec: 468.40 - lr: 0.000039 - momentum: 0.000000
2023-10-10 23:11:43,884 epoch 8 - iter 112/146 - loss 0.18603688 - time (sec): 71.72 - samples/sec: 466.09 - lr: 0.000038 - momentum: 0.000000
2023-10-10 23:11:53,379 epoch 8 - iter 126/146 - loss 0.18215187 - time (sec): 81.21 - samples/sec: 470.73 - lr: 0.000036 - momentum: 0.000000
2023-10-10 23:12:03,326 epoch 8 - iter 140/146 - loss 0.18026671 - time (sec): 91.16 - samples/sec: 473.98 - lr: 0.000035 - momentum: 0.000000
2023-10-10 23:12:06,714 ----------------------------------------------------------------------------------------------------
2023-10-10 23:12:06,714 EPOCH 8 done: loss 0.1781 - lr: 0.000035
2023-10-10 23:12:12,668 DEV : loss 0.18470922112464905 - f1-score (micro avg) 0.5094
2023-10-10 23:12:12,678 saving best model
2023-10-10 23:12:26,423 ----------------------------------------------------------------------------------------------------
2023-10-10 23:12:35,685 epoch 9 - iter 14/146 - loss 0.16655684 - time (sec): 9.26 - samples/sec: 465.12 - lr: 0.000032 - momentum: 0.000000
2023-10-10 23:12:44,294 epoch 9 - iter 28/146 - loss 0.16140822 - time (sec): 17.87 - samples/sec: 461.81 - lr: 0.000031 - momentum: 0.000000
2023-10-10 23:12:52,406 epoch 9 - iter 42/146 - loss 0.17856536 - time (sec): 25.98 - samples/sec: 448.97 - lr: 0.000029 - momentum: 0.000000
2023-10-10 23:13:01,225 epoch 9 - iter 56/146 - loss 0.17303316 - time (sec): 34.80 - samples/sec: 456.18 - lr: 0.000028 - momentum: 0.000000
2023-10-10 23:13:11,847 epoch 9 - iter 70/146 - loss 0.17994148 - time (sec): 45.42 - samples/sec: 477.28 - lr: 0.000026 - momentum: 0.000000
2023-10-10 23:13:19,828 epoch 9 - iter 84/146 - loss 0.17025472 - time (sec): 53.40 - samples/sec: 466.73 - lr: 0.000024 - momentum: 0.000000
2023-10-10 23:13:29,108 epoch 9 - iter 98/146 - loss 0.16961006 - time (sec): 62.68 - samples/sec: 473.92 - lr: 0.000023 - momentum: 0.000000
2023-10-10 23:13:38,011 epoch 9 - iter 112/146 - loss 0.16697591 - time (sec): 71.58 - samples/sec: 474.21 - lr: 0.000021 - momentum: 0.000000
2023-10-10 23:13:46,635 epoch 9 - iter 126/146 - loss 0.16595253 - time (sec): 80.21 - samples/sec: 475.12 - lr: 0.000020 - momentum: 0.000000
2023-10-10 23:13:55,901 epoch 9 - iter 140/146 - loss 0.16336951 - time (sec): 89.47 - samples/sec: 480.09 - lr: 0.000018 - momentum: 0.000000
2023-10-10 23:13:59,214 ----------------------------------------------------------------------------------------------------
2023-10-10 23:13:59,215 EPOCH 9 done: loss 0.1615 - lr: 0.000018
2023-10-10 23:14:04,878 DEV : loss 0.18229743838310242 - f1-score (micro avg) 0.5605
2023-10-10 23:14:04,887 saving best model
2023-10-10 23:14:15,726 ----------------------------------------------------------------------------------------------------
2023-10-10 23:14:25,016 epoch 10 - iter 14/146 - loss 0.16914234 - time (sec): 9.29 - samples/sec: 437.01 - lr: 0.000016 - momentum: 0.000000
2023-10-10 23:14:35,848 epoch 10 - iter 28/146 - loss 0.16172601 - time (sec): 20.12 - samples/sec: 436.43 - lr: 0.000014 - momentum: 0.000000
2023-10-10 23:14:45,866 epoch 10 - iter 42/146 - loss 0.15318731 - time (sec): 30.14 - samples/sec: 413.82 - lr: 0.000013 - momentum: 0.000000
2023-10-10 23:14:55,889 epoch 10 - iter 56/146 - loss 0.14649958 - time (sec): 40.16 - samples/sec: 416.17 - lr: 0.000011 - momentum: 0.000000
2023-10-10 23:15:06,068 epoch 10 - iter 70/146 - loss 0.14015652 - time (sec): 50.34 - samples/sec: 418.19 - lr: 0.000009 - momentum: 0.000000
2023-10-10 23:15:14,831 epoch 10 - iter 84/146 - loss 0.14204998 - time (sec): 59.10 - samples/sec: 419.73 - lr: 0.000008 - momentum: 0.000000
2023-10-10 23:15:24,600 epoch 10 - iter 98/146 - loss 0.14430949 - time (sec): 68.87 - samples/sec: 431.09 - lr: 0.000006 - momentum: 0.000000
2023-10-10 23:15:33,767 epoch 10 - iter 112/146 - loss 0.15183816 - time (sec): 78.04 - samples/sec: 439.05 - lr: 0.000005 - momentum: 0.000000
2023-10-10 23:15:42,602 epoch 10 - iter 126/146 - loss 0.14950005 - time (sec): 86.87 - samples/sec: 441.81 - lr: 0.000003 - momentum: 0.000000
2023-10-10 23:15:51,543 epoch 10 - iter 140/146 - loss 0.15184293 - time (sec): 95.81 - samples/sec: 448.71 - lr: 0.000001 - momentum: 0.000000
2023-10-10 23:15:54,868 ----------------------------------------------------------------------------------------------------
2023-10-10 23:15:54,868 EPOCH 10 done: loss 0.1505 - lr: 0.000001
2023-10-10 23:16:00,814 DEV : loss 0.1794561892747879 - f1-score (micro avg) 0.5867
2023-10-10 23:16:00,824 saving best model
2023-10-10 23:16:11,463 ----------------------------------------------------------------------------------------------------
2023-10-10 23:16:11,465 Loading model from best epoch ...
2023-10-10 23:16:15,063 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
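
The 17-tag dictionary above follows the BIOES scheme: S- marks a single-token entity, B-/I-/E- mark the begin, inside, and end of a multi-token one, and O is outside any entity, for each of the four types (LOC, PER, ORG, HumanProd). A minimal sketch of how such a tag sequence decodes into entity spans (my own helper, not Flair's internal decoder):

```python
def bioes_to_spans(tags):
    """Decode a BIOES tag sequence into (start, end, label) spans (end exclusive)."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags):
        if tag == "O":
            start, label = None, None
            continue
        prefix, lab = tag.split("-", 1)
        if prefix == "S":                 # single-token entity
            spans.append((i, i + 1, lab))
            start, label = None, None
        elif prefix == "B":               # open a multi-token entity
            start, label = i, lab
        elif prefix == "E" and start is not None and lab == label:
            spans.append((start, i + 1, lab))  # close the open entity
            start, label = None, None
        # I- tags simply continue an open entity
    return spans

print(bioes_to_spans(["S-LOC", "O", "B-PER", "I-PER", "E-PER"]))
# [(0, 1, 'LOC'), (2, 5, 'PER')]
```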
2023-10-10 23:16:27,864
Results:
- F-score (micro) 0.6406
- F-score (macro) 0.3912
- Accuracy 0.5198
By class:
              precision    recall  f1-score   support

         PER     0.7484    0.6839    0.7147       348
         LOC     0.5597    0.8084    0.6614       261
         ORG     0.1852    0.1923    0.1887        52
   HumanProd     0.0000    0.0000    0.0000        22

   micro avg     0.6120    0.6720    0.6406       683
   macro avg     0.3733    0.4212    0.3912       683
weighted avg     0.6093    0.6720    0.6313       683
2023-10-10 23:16:27,864 ----------------------------------------------------------------------------------------------------
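
For reference, the micro average pools true/false positives across all classes before computing precision and recall, while the macro average is the unweighted mean of the per-class scores; both reported F-scores follow directly from the table above:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Micro F1 from the pooled precision/recall in the "micro avg" row:
micro_f1 = f1(0.6120, 0.6720)

# Macro F1 as the unweighted mean of the four per-class f1 values:
macro_f1 = (0.7147 + 0.6614 + 0.1887 + 0.0000) / 4

print(round(micro_f1, 4), round(macro_f1, 4))  # 0.6406 0.3912
```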