2023-10-11 10:32:58,086 ----------------------------------------------------------------------------------------------------
2023-10-11 10:32:58,088 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-11 10:32:58,089 ----------------------------------------------------------------------------------------------------
2023-10-11 10:32:58,089 MultiCorpus: 1085 train + 148 dev + 364 test sentences
 - NER_HIPE_2022 Corpus: 1085 train + 148 dev + 364 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/sv/with_doc_seperator
2023-10-11 10:32:58,089 ----------------------------------------------------------------------------------------------------
2023-10-11 10:32:58,089 Train: 1085 sentences
2023-10-11 10:32:58,089 (train_with_dev=False, train_with_test=False)
2023-10-11 10:32:58,089 ----------------------------------------------------------------------------------------------------
2023-10-11 10:32:58,089 Training Params:
2023-10-11 10:32:58,089 - learning_rate: "0.00016"
2023-10-11 10:32:58,089 - mini_batch_size: "8"
2023-10-11 10:32:58,089 - max_epochs: "10"
2023-10-11 10:32:58,089 - shuffle: "True"
2023-10-11 10:32:58,089 ----------------------------------------------------------------------------------------------------
2023-10-11 10:32:58,090 Plugins:
2023-10-11 10:32:58,090 - TensorboardLogger
2023-10-11 10:32:58,090 - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 10:32:58,090 ----------------------------------------------------------------------------------------------------
2023-10-11 10:32:58,090 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 10:32:58,090 - metric: "('micro avg', 'f1-score')"
2023-10-11 10:32:58,090 ----------------------------------------------------------------------------------------------------
2023-10-11 10:32:58,090 Computation:
2023-10-11 10:32:58,090 - compute on device: cuda:0
2023-10-11 10:32:58,090 - embedding storage: none
2023-10-11 10:32:58,090 ----------------------------------------------------------------------------------------------------
2023-10-11 10:32:58,090 Model training base path: "hmbench-newseye/sv-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-3"
2023-10-11 10:32:58,090 ----------------------------------------------------------------------------------------------------
2023-10-11 10:32:58,090 ----------------------------------------------------------------------------------------------------
2023-10-11 10:32:58,091 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 10:33:07,135 epoch 1 - iter 13/136 - loss 2.82905636 - time (sec): 9.04 - samples/sec: 614.54 - lr: 0.000014 - momentum: 0.000000
2023-10-11 10:33:16,421 epoch 1 - iter 26/136 - loss 2.82283109 - time (sec): 18.33 - samples/sec: 587.61 - lr: 0.000029 - momentum: 0.000000
2023-10-11 10:33:26,109 epoch 1 - iter 39/136 - loss 2.81195330 - time (sec): 28.02 - samples/sec: 558.24 - lr: 0.000045 - momentum: 0.000000
2023-10-11 10:33:34,871 epoch 1 - iter 52/136 - loss 2.79359051 - time (sec): 36.78 - samples/sec: 558.01 - lr: 0.000060 - momentum: 0.000000
2023-10-11 10:33:43,323 epoch 1 - iter 65/136 - loss 2.75967078 - time (sec): 45.23 - samples/sec: 559.40 - lr: 0.000075 - momentum: 0.000000
2023-10-11 10:33:52,211 epoch 1 - iter 78/136 - loss 2.70744027 - time (sec): 54.12 - samples/sec: 559.05 - lr: 0.000091 - momentum: 0.000000
2023-10-11 10:34:00,586 epoch 1 - iter 91/136 - loss 2.63923954 - time (sec): 62.49 - samples/sec: 560.68 - lr: 0.000106 - momentum: 0.000000
2023-10-11 10:34:09,109 epoch 1 - iter 104/136 - loss 2.56285426 - time (sec): 71.02 - samples/sec: 561.07 - lr: 0.000121 - momentum: 0.000000
2023-10-11 10:34:17,698 epoch 1 - iter 117/136 - loss 2.48134613 - time (sec): 79.61 - samples/sec: 562.29 - lr: 0.000136 - momentum: 0.000000
2023-10-11 10:34:26,610 epoch 1 - iter 130/136 - loss 2.39548058 - time (sec): 88.52 - samples/sec: 564.09 - lr: 0.000152 - momentum: 0.000000
2023-10-11 10:34:30,259 ----------------------------------------------------------------------------------------------------
2023-10-11 10:34:30,259 EPOCH 1 done: loss 2.3601 - lr: 0.000152
2023-10-11 10:34:34,993 DEV : loss 1.3611388206481934 - f1-score (micro avg) 0.0
2023-10-11 10:34:35,001 ----------------------------------------------------------------------------------------------------
2023-10-11 10:34:44,448 epoch 2 - iter 13/136 - loss 1.33420703 - time (sec): 9.45 - samples/sec: 609.84 - lr: 0.000158 - momentum: 0.000000
2023-10-11 10:34:53,003 epoch 2 - iter 26/136 - loss 1.27788032 - time (sec): 18.00 - samples/sec: 597.09 - lr: 0.000157 - momentum: 0.000000
2023-10-11 10:35:01,649 epoch 2 - iter 39/136 - loss 1.19457463 - time (sec): 26.65 - samples/sec: 600.90 - lr: 0.000155 - momentum: 0.000000
2023-10-11 10:35:09,691 epoch 2 - iter 52/136 - loss 1.12462593 - time (sec): 34.69 - samples/sec: 589.01 - lr: 0.000153 - momentum: 0.000000
2023-10-11 10:35:17,806 epoch 2 - iter 65/136 - loss 1.05838575 - time (sec): 42.80 - samples/sec: 587.74 - lr: 0.000152 - momentum: 0.000000
2023-10-11 10:35:25,642 epoch 2 - iter 78/136 - loss 1.00499391 - time (sec): 50.64 - samples/sec: 577.46 - lr: 0.000150 - momentum: 0.000000
2023-10-11 10:35:34,370 epoch 2 - iter 91/136 - loss 0.94353939 - time (sec): 59.37 - samples/sec: 575.06 - lr: 0.000148 - momentum: 0.000000
2023-10-11 10:35:42,539 epoch 2 - iter 104/136 - loss 0.90638593 - time (sec): 67.54 - samples/sec: 566.98 - lr: 0.000147 - momentum: 0.000000
2023-10-11 10:35:51,886 epoch 2 - iter 117/136 - loss 0.85831314 - time (sec): 76.88 - samples/sec: 571.48 - lr: 0.000145 - momentum: 0.000000
2023-10-11 10:36:01,521 epoch 2 - iter 130/136 - loss 0.82520135 - time (sec): 86.52 - samples/sec: 575.24 - lr: 0.000143 - momentum: 0.000000
2023-10-11 10:36:05,445 ----------------------------------------------------------------------------------------------------
2023-10-11 10:36:05,446 EPOCH 2 done: loss 0.8126 - lr: 0.000143
2023-10-11 10:36:11,132 DEV : loss 0.47113388776779175 - f1-score (micro avg) 0.0
2023-10-11 10:36:11,141 ----------------------------------------------------------------------------------------------------
2023-10-11 10:36:19,932 epoch 3 - iter 13/136 - loss 0.52553173 - time (sec): 8.79 - samples/sec: 566.39 - lr: 0.000141 - momentum: 0.000000
2023-10-11 10:36:28,958 epoch 3 - iter 26/136 - loss 0.47220984 - time (sec): 17.81 - samples/sec: 579.07 - lr: 0.000139 - momentum: 0.000000
2023-10-11 10:36:37,970 epoch 3 - iter 39/136 - loss 0.46796738 - time (sec): 26.83 - samples/sec: 590.48 - lr: 0.000137 - momentum: 0.000000
2023-10-11 10:36:46,164 epoch 3 - iter 52/136 - loss 0.46389304 - time (sec): 35.02 - samples/sec: 583.74 - lr: 0.000136 - momentum: 0.000000
2023-10-11 10:36:54,424 epoch 3 - iter 65/136 - loss 0.45462956 - time (sec): 43.28 - samples/sec: 576.74 - lr: 0.000134 - momentum: 0.000000
2023-10-11 10:37:03,468 epoch 3 - iter 78/136 - loss 0.44492860 - time (sec): 52.32 - samples/sec: 583.43 - lr: 0.000132 - momentum: 0.000000
2023-10-11 10:37:12,244 epoch 3 - iter 91/136 - loss 0.42881004 - time (sec): 61.10 - samples/sec: 584.03 - lr: 0.000131 - momentum: 0.000000
2023-10-11 10:37:21,211 epoch 3 - iter 104/136 - loss 0.40971397 - time (sec): 70.07 - samples/sec: 582.00 - lr: 0.000129 - momentum: 0.000000
2023-10-11 10:37:29,425 epoch 3 - iter 117/136 - loss 0.39627687 - time (sec): 78.28 - samples/sec: 576.71 - lr: 0.000127 - momentum: 0.000000
2023-10-11 10:37:37,379 epoch 3 - iter 130/136 - loss 0.39607367 - time (sec): 86.24 - samples/sec: 571.76 - lr: 0.000126 - momentum: 0.000000
2023-10-11 10:37:41,524 ----------------------------------------------------------------------------------------------------
2023-10-11 10:37:41,524 EPOCH 3 done: loss 0.3879 - lr: 0.000126
2023-10-11 10:37:47,309 DEV : loss 0.28040581941604614 - f1-score (micro avg) 0.2634
2023-10-11 10:37:47,317 saving best model
2023-10-11 10:37:48,185 ----------------------------------------------------------------------------------------------------
2023-10-11 10:37:56,781 epoch 4 - iter 13/136 - loss 0.26964123 - time (sec): 8.59 - samples/sec: 546.78 - lr: 0.000123 - momentum: 0.000000
2023-10-11 10:38:04,326 epoch 4 - iter 26/136 - loss 0.30851485 - time (sec): 16.14 - samples/sec: 521.02 - lr: 0.000121 - momentum: 0.000000
2023-10-11 10:38:12,962 epoch 4 - iter 39/136 - loss 0.32765945 - time (sec): 24.77 - samples/sec: 544.95 - lr: 0.000120 - momentum: 0.000000
2023-10-11 10:38:21,363 epoch 4 - iter 52/136 - loss 0.31247566 - time (sec): 33.18 - samples/sec: 551.30 - lr: 0.000118 - momentum: 0.000000
2023-10-11 10:38:29,247 epoch 4 - iter 65/136 - loss 0.31240592 - time (sec): 41.06 - samples/sec: 544.74 - lr: 0.000116 - momentum: 0.000000
2023-10-11 10:38:37,346 epoch 4 - iter 78/136 - loss 0.30735790 - time (sec): 49.16 - samples/sec: 550.88 - lr: 0.000115 - momentum: 0.000000
2023-10-11 10:38:47,151 epoch 4 - iter 91/136 - loss 0.30114313 - time (sec): 58.96 - samples/sec: 570.23 - lr: 0.000113 - momentum: 0.000000
2023-10-11 10:38:55,408 epoch 4 - iter 104/136 - loss 0.29747977 - time (sec): 67.22 - samples/sec: 574.15 - lr: 0.000111 - momentum: 0.000000
2023-10-11 10:39:04,230 epoch 4 - iter 117/136 - loss 0.29176984 - time (sec): 76.04 - samples/sec: 579.91 - lr: 0.000109 - momentum: 0.000000
2023-10-11 10:39:13,172 epoch 4 - iter 130/136 - loss 0.28456610 - time (sec): 84.99 - samples/sec: 584.89 - lr: 0.000108 - momentum: 0.000000
2023-10-11 10:39:16,968 ----------------------------------------------------------------------------------------------------
2023-10-11 10:39:16,968 EPOCH 4 done: loss 0.2853 - lr: 0.000108
2023-10-11 10:39:22,640 DEV : loss 0.23706214129924774 - f1-score (micro avg) 0.4307
2023-10-11 10:39:22,649 saving best model
2023-10-11 10:39:25,192 ----------------------------------------------------------------------------------------------------
2023-10-11 10:39:33,989 epoch 5 - iter 13/136 - loss 0.22322927 - time (sec): 8.79 - samples/sec: 619.75 - lr: 0.000105 - momentum: 0.000000
2023-10-11 10:39:42,567 epoch 5 - iter 26/136 - loss 0.24861548 - time (sec): 17.37 - samples/sec: 607.51 - lr: 0.000104 - momentum: 0.000000
2023-10-11 10:39:50,939 epoch 5 - iter 39/136 - loss 0.24237781 - time (sec): 25.74 - samples/sec: 606.54 - lr: 0.000102 - momentum: 0.000000
2023-10-11 10:39:59,037 epoch 5 - iter 52/136 - loss 0.24947129 - time (sec): 33.84 - samples/sec: 598.18 - lr: 0.000100 - momentum: 0.000000
2023-10-11 10:40:07,314 epoch 5 - iter 65/136 - loss 0.24401319 - time (sec): 42.12 - samples/sec: 584.48 - lr: 0.000099 - momentum: 0.000000
2023-10-11 10:40:15,450 epoch 5 - iter 78/136 - loss 0.24091737 - time (sec): 50.25 - samples/sec: 581.10 - lr: 0.000097 - momentum: 0.000000
2023-10-11 10:40:23,629 epoch 5 - iter 91/136 - loss 0.22839402 - time (sec): 58.43 - samples/sec: 580.20 - lr: 0.000095 - momentum: 0.000000
2023-10-11 10:40:32,299 epoch 5 - iter 104/136 - loss 0.23260473 - time (sec): 67.10 - samples/sec: 583.36 - lr: 0.000093 - momentum: 0.000000
2023-10-11 10:40:41,678 epoch 5 - iter 117/136 - loss 0.22826299 - time (sec): 76.48 - samples/sec: 589.32 - lr: 0.000092 - momentum: 0.000000
2023-10-11 10:40:50,175 epoch 5 - iter 130/136 - loss 0.22918291 - time (sec): 84.98 - samples/sec: 591.32 - lr: 0.000090 - momentum: 0.000000
2023-10-11 10:40:53,438 ----------------------------------------------------------------------------------------------------
2023-10-11 10:40:53,439 EPOCH 5 done: loss 0.2278 - lr: 0.000090
2023-10-11 10:40:59,295 DEV : loss 0.2019164115190506 - f1-score (micro avg) 0.5331
2023-10-11 10:40:59,304 saving best model
2023-10-11 10:41:01,882 ----------------------------------------------------------------------------------------------------
2023-10-11 10:41:10,040 epoch 6 - iter 13/136 - loss 0.18317750 - time (sec): 8.15 - samples/sec: 557.22 - lr: 0.000088 - momentum: 0.000000
2023-10-11 10:41:18,066 epoch 6 - iter 26/136 - loss 0.20214126 - time (sec): 16.18 - samples/sec: 546.57 - lr: 0.000086 - momentum: 0.000000
2023-10-11 10:41:26,453 epoch 6 - iter 39/136 - loss 0.20236218 - time (sec): 24.57 - samples/sec: 550.38 - lr: 0.000084 - momentum: 0.000000
2023-10-11 10:41:35,349 epoch 6 - iter 52/136 - loss 0.19854214 - time (sec): 33.46 - samples/sec: 562.28 - lr: 0.000083 - momentum: 0.000000
2023-10-11 10:41:44,067 epoch 6 - iter 65/136 - loss 0.18937218 - time (sec): 42.18 - samples/sec: 576.29 - lr: 0.000081 - momentum: 0.000000
2023-10-11 10:41:52,334 epoch 6 - iter 78/136 - loss 0.18219775 - time (sec): 50.45 - samples/sec: 573.04 - lr: 0.000079 - momentum: 0.000000
2023-10-11 10:42:01,293 epoch 6 - iter 91/136 - loss 0.17828107 - time (sec): 59.41 - samples/sec: 575.22 - lr: 0.000077 - momentum: 0.000000
2023-10-11 10:42:09,837 epoch 6 - iter 104/136 - loss 0.18218020 - time (sec): 67.95 - samples/sec: 571.92 - lr: 0.000076 - momentum: 0.000000
2023-10-11 10:42:18,635 epoch 6 - iter 117/136 - loss 0.17797010 - time (sec): 76.75 - samples/sec: 573.60 - lr: 0.000074 - momentum: 0.000000
2023-10-11 10:42:28,088 epoch 6 - iter 130/136 - loss 0.17347803 - time (sec): 86.20 - samples/sec: 579.78 - lr: 0.000072 - momentum: 0.000000
2023-10-11 10:42:31,636 ----------------------------------------------------------------------------------------------------
2023-10-11 10:42:31,636 EPOCH 6 done: loss 0.1741 - lr: 0.000072
2023-10-11 10:42:37,610 DEV : loss 0.18107403814792633 - f1-score (micro avg) 0.6025
2023-10-11 10:42:37,618 saving best model
2023-10-11 10:42:40,162 ----------------------------------------------------------------------------------------------------
2023-10-11 10:42:49,084 epoch 7 - iter 13/136 - loss 0.14984582 - time (sec): 8.92 - samples/sec: 596.35 - lr: 0.000070 - momentum: 0.000000
2023-10-11 10:42:57,253 epoch 7 - iter 26/136 - loss 0.15617649 - time (sec): 17.09 - samples/sec: 579.63 - lr: 0.000068 - momentum: 0.000000
2023-10-11 10:43:06,279 epoch 7 - iter 39/136 - loss 0.15137208 - time (sec): 26.11 - samples/sec: 583.76 - lr: 0.000067 - momentum: 0.000000
2023-10-11 10:43:15,206 epoch 7 - iter 52/136 - loss 0.15440621 - time (sec): 35.04 - samples/sec: 589.50 - lr: 0.000065 - momentum: 0.000000
2023-10-11 10:43:23,800 epoch 7 - iter 65/136 - loss 0.15352902 - time (sec): 43.63 - samples/sec: 587.14 - lr: 0.000063 - momentum: 0.000000
2023-10-11 10:43:32,306 epoch 7 - iter 78/136 - loss 0.15036344 - time (sec): 52.14 - samples/sec: 588.95 - lr: 0.000061 - momentum: 0.000000
2023-10-11 10:43:40,770 epoch 7 - iter 91/136 - loss 0.15069822 - time (sec): 60.60 - samples/sec: 587.81 - lr: 0.000060 - momentum: 0.000000
2023-10-11 10:43:48,928 epoch 7 - iter 104/136 - loss 0.14815979 - time (sec): 68.76 - samples/sec: 580.00 - lr: 0.000058 - momentum: 0.000000
2023-10-11 10:43:57,562 epoch 7 - iter 117/136 - loss 0.14471906 - time (sec): 77.40 - samples/sec: 581.59 - lr: 0.000056 - momentum: 0.000000
2023-10-11 10:44:05,803 epoch 7 - iter 130/136 - loss 0.13990487 - time (sec): 85.64 - samples/sec: 580.40 - lr: 0.000055 - momentum: 0.000000
2023-10-11 10:44:09,781 ----------------------------------------------------------------------------------------------------
2023-10-11 10:44:09,781 EPOCH 7 done: loss 0.1379 - lr: 0.000055
2023-10-11 10:44:15,642 DEV : loss 0.16930824518203735 - f1-score (micro avg) 0.6124
2023-10-11 10:44:15,651 saving best model
2023-10-11 10:44:18,226 ----------------------------------------------------------------------------------------------------
2023-10-11 10:44:26,237 epoch 8 - iter 13/136 - loss 0.12269906 - time (sec): 8.01 - samples/sec: 588.24 - lr: 0.000052 - momentum: 0.000000
2023-10-11 10:44:34,428 epoch 8 - iter 26/136 - loss 0.12227413 - time (sec): 16.20 - samples/sec: 585.74 - lr: 0.000051 - momentum: 0.000000
2023-10-11 10:44:43,146 epoch 8 - iter 39/136 - loss 0.12413325 - time (sec): 24.92 - samples/sec: 598.72 - lr: 0.000049 - momentum: 0.000000
2023-10-11 10:44:51,180 epoch 8 - iter 52/136 - loss 0.12476583 - time (sec): 32.95 - samples/sec: 592.13 - lr: 0.000047 - momentum: 0.000000
2023-10-11 10:44:59,803 epoch 8 - iter 65/136 - loss 0.12178669 - time (sec): 41.57 - samples/sec: 595.72 - lr: 0.000045 - momentum: 0.000000
2023-10-11 10:45:08,449 epoch 8 - iter 78/136 - loss 0.12151521 - time (sec): 50.22 - samples/sec: 598.47 - lr: 0.000044 - momentum: 0.000000
2023-10-11 10:45:16,884 epoch 8 - iter 91/136 - loss 0.11963538 - time (sec): 58.65 - samples/sec: 596.79 - lr: 0.000042 - momentum: 0.000000
2023-10-11 10:45:25,373 epoch 8 - iter 104/136 - loss 0.12043380 - time (sec): 67.14 - samples/sec: 599.30 - lr: 0.000040 - momentum: 0.000000
2023-10-11 10:45:34,551 epoch 8 - iter 117/136 - loss 0.11448969 - time (sec): 76.32 - samples/sec: 602.30 - lr: 0.000039 - momentum: 0.000000
2023-10-11 10:45:41,804 epoch 8 - iter 130/136 - loss 0.11567474 - time (sec): 83.57 - samples/sec: 593.87 - lr: 0.000037 - momentum: 0.000000
2023-10-11 10:45:45,514 ----------------------------------------------------------------------------------------------------
2023-10-11 10:45:45,515 EPOCH 8 done: loss 0.1145 - lr: 0.000037
2023-10-11 10:45:51,045 DEV : loss 0.15804526209831238 - f1-score (micro avg) 0.6391
2023-10-11 10:45:51,053 saving best model
2023-10-11 10:45:53,603 ----------------------------------------------------------------------------------------------------
2023-10-11 10:46:02,581 epoch 9 - iter 13/136 - loss 0.09546268 - time (sec): 8.97 - samples/sec: 633.36 - lr: 0.000034 - momentum: 0.000000
2023-10-11 10:46:11,048 epoch 9 - iter 26/136 - loss 0.10013911 - time (sec): 17.44 - samples/sec: 612.07 - lr: 0.000033 - momentum: 0.000000
2023-10-11 10:46:19,177 epoch 9 - iter 39/136 - loss 0.10244150 - time (sec): 25.57 - samples/sec: 594.61 - lr: 0.000031 - momentum: 0.000000
2023-10-11 10:46:27,594 epoch 9 - iter 52/136 - loss 0.10511116 - time (sec): 33.99 - samples/sec: 588.79 - lr: 0.000029 - momentum: 0.000000
2023-10-11 10:46:36,305 epoch 9 - iter 65/136 - loss 0.10253249 - time (sec): 42.70 - samples/sec: 594.55 - lr: 0.000028 - momentum: 0.000000
2023-10-11 10:46:44,319 epoch 9 - iter 78/136 - loss 0.10317238 - time (sec): 50.71 - samples/sec: 585.01 - lr: 0.000026 - momentum: 0.000000
2023-10-11 10:46:53,134 epoch 9 - iter 91/136 - loss 0.10200602 - time (sec): 59.53 - samples/sec: 585.97 - lr: 0.000024 - momentum: 0.000000
2023-10-11 10:47:01,366 epoch 9 - iter 104/136 - loss 0.10049025 - time (sec): 67.76 - samples/sec: 581.49 - lr: 0.000023 - momentum: 0.000000
2023-10-11 10:47:09,944 epoch 9 - iter 117/136 - loss 0.10130709 - time (sec): 76.34 - samples/sec: 582.89 - lr: 0.000021 - momentum: 0.000000
2023-10-11 10:47:18,970 epoch 9 - iter 130/136 - loss 0.10043735 - time (sec): 85.36 - samples/sec: 581.90 - lr: 0.000019 - momentum: 0.000000
2023-10-11 10:47:22,792 ----------------------------------------------------------------------------------------------------
2023-10-11 10:47:22,793 EPOCH 9 done: loss 0.0987 - lr: 0.000019
2023-10-11 10:47:28,754 DEV : loss 0.15687064826488495 - f1-score (micro avg) 0.6439
2023-10-11 10:47:28,764 saving best model
2023-10-11 10:47:31,306 ----------------------------------------------------------------------------------------------------
2023-10-11 10:47:39,905 epoch 10 - iter 13/136 - loss 0.10762881 - time (sec): 8.60 - samples/sec: 602.29 - lr: 0.000017 - momentum: 0.000000
2023-10-11 10:47:47,726 epoch 10 - iter 26/136 - loss 0.11565589 - time (sec): 16.42 - samples/sec: 560.96 - lr: 0.000015 - momentum: 0.000000
2023-10-11 10:47:57,569 epoch 10 - iter 39/136 - loss 0.10752330 - time (sec): 26.26 - samples/sec: 604.45 - lr: 0.000013 - momentum: 0.000000
2023-10-11 10:48:06,902 epoch 10 - iter 52/136 - loss 0.10726886 - time (sec): 35.59 - samples/sec: 608.39 - lr: 0.000012 - momentum: 0.000000
2023-10-11 10:48:15,585 epoch 10 - iter 65/136 - loss 0.10786433 - time (sec): 44.27 - samples/sec: 607.30 - lr: 0.000010 - momentum: 0.000000
2023-10-11 10:48:23,615 epoch 10 - iter 78/136 - loss 0.10287152 - time (sec): 52.31 - samples/sec: 597.26 - lr: 0.000008 - momentum: 0.000000
2023-10-11 10:48:32,122 epoch 10 - iter 91/136 - loss 0.09965866 - time (sec): 60.81 - samples/sec: 594.01 - lr: 0.000007 - momentum: 0.000000
2023-10-11 10:48:39,901 epoch 10 - iter 104/136 - loss 0.09762661 - time (sec): 68.59 - samples/sec: 586.79 - lr: 0.000005 - momentum: 0.000000
2023-10-11 10:48:49,022 epoch 10 - iter 117/136 - loss 0.09489476 - time (sec): 77.71 - samples/sec: 587.84 - lr: 0.000003 - momentum: 0.000000
2023-10-11 10:48:57,329 epoch 10 - iter 130/136 - loss 0.09269028 - time (sec): 86.02 - samples/sec: 585.36 - lr: 0.000002 - momentum: 0.000000
2023-10-11 10:49:00,616 ----------------------------------------------------------------------------------------------------
2023-10-11 10:49:00,616 EPOCH 10 done: loss 0.0925 - lr: 0.000002
2023-10-11 10:49:06,429 DEV : loss 0.15401233732700348 - f1-score (micro avg) 0.6643
2023-10-11 10:49:06,438 saving best model
2023-10-11 10:49:10,013 ----------------------------------------------------------------------------------------------------
2023-10-11 10:49:10,015 Loading model from best epoch ...
2023-10-11 10:49:13,557 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-11 10:49:25,408
Results:
- F-score (micro) 0.626
- F-score (macro) 0.4382
- Accuracy 0.5055
By class:
              precision    recall  f1-score   support

         LOC     0.6203    0.8429    0.7147       312
         PER     0.6066    0.6154    0.6110       208
   HumanProd     0.2361    0.7727    0.3617        22
         ORG     0.3333    0.0364    0.0656        55

   micro avg     0.5750    0.6868    0.6260       597
   macro avg     0.4491    0.5669    0.4382       597
weighted avg     0.5749    0.6868    0.6057       597
2023-10-11 10:49:25,408 ----------------------------------------------------------------------------------------------------