2023-10-11 08:46:31,837 ----------------------------------------------------------------------------------------------------
2023-10-11 08:46:31,840 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-11 08:46:31,840 ----------------------------------------------------------------------------------------------------
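Note: the module dump above is a Flair SequenceTagger wrapping a ByT5 T5EncoderModel. As a minimal, hedged sketch (the run itself used the benchmark's own ByT5Embeddings class), a comparable embedding setup with stock Flair could look like the following; the checkpoint id is inferred from the model base path logged further below, and the layer/pooling choices mirror the "layers-1" / "poolingfirst" parts of the run name.

from flair.embeddings import TransformerWordEmbeddings

# Sketch only: stock Flair analogue of the ByT5Embeddings module shown above.
embeddings = TransformerWordEmbeddings(
    model="hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax",  # inferred from the base path below
    layers="-1",               # last encoder layer only ("layers-1" in the run name)
    subtoken_pooling="first",  # "poolingfirst" in the run name
    fine_tune=True,
)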
2023-10-11 08:46:31,840 MultiCorpus: 1085 train + 148 dev + 364 test sentences
- NER_HIPE_2022 Corpus: 1085 train + 148 dev + 364 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/sv/with_doc_seperator
2023-10-11 08:46:31,840 ----------------------------------------------------------------------------------------------------
2023-10-11 08:46:31,840 Train: 1085 sentences
2023-10-11 08:46:31,840 (train_with_dev=False, train_with_test=False)
2023-10-11 08:46:31,840 ----------------------------------------------------------------------------------------------------
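Note: a hedged sketch of loading the corpus listed above and assembling a CRF-free tagger with stock Flair follows; the NER_HIPE_2022 argument names are assumptions inferred from the dataset path in the log, and `embeddings` is the object from the sketch above.

from flair.datasets import NER_HIPE_2022
from flair.models import SequenceTagger

# Assumed argument names; the path above points at the newseye/sv split, v2.1.
corpus = NER_HIPE_2022(dataset_name="newseye", language="sv")
print(corpus)  # expected: 1085 train + 148 dev + 364 test sentences

# Should yield the 17-tag dictionary shown at the end of this log.
label_dict = corpus.make_label_dictionary(label_type="ner")

tagger = SequenceTagger(
    embeddings=embeddings,      # from the embeddings sketch above
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,              # "crfFalse" in the run name
    use_rnn=False,
    reproject_embeddings=False,
)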
2023-10-11 08:46:31,840 Training Params:
2023-10-11 08:46:31,840 - learning_rate: "0.00016"
2023-10-11 08:46:31,841 - mini_batch_size: "4"
2023-10-11 08:46:31,841 - max_epochs: "10"
2023-10-11 08:46:31,841 - shuffle: "True"
2023-10-11 08:46:31,841 ----------------------------------------------------------------------------------------------------
2023-10-11 08:46:31,841 Plugins:
2023-10-11 08:46:31,841 - TensorboardLogger
2023-10-11 08:46:31,841 - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 08:46:31,841 ----------------------------------------------------------------------------------------------------
2023-10-11 08:46:31,841 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 08:46:31,841 - metric: "('micro avg', 'f1-score')"
2023-10-11 08:46:31,841 ----------------------------------------------------------------------------------------------------
2023-10-11 08:46:31,841 Computation:
2023-10-11 08:46:31,841 - compute on device: cuda:0
2023-10-11 08:46:31,841 - embedding storage: none
2023-10-11 08:46:31,841 ----------------------------------------------------------------------------------------------------
2023-10-11 08:46:31,842 Model training base path: "hmbench-newseye/sv-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-1"
2023-10-11 08:46:31,842 ----------------------------------------------------------------------------------------------------
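Note: a hedged fine-tuning sketch matching the hyperparameters and base path logged above, with `tagger` and `corpus` as in the earlier sketches; the TensorboardLogger/LinearScheduler plugin wiring of the original run is not reproduced here.

from flair.trainers import ModelTrainer

trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "hmbench-newseye/sv-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-1",
    learning_rate=0.00016,
    mini_batch_size=4,
    max_epochs=10,
    # fine_tune() applies a linear schedule with warmup by default, which matches
    # the "LinearScheduler | warmup_fraction: 0.1" plugin listed above.
)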
2023-10-11 08:46:31,842 ----------------------------------------------------------------------------------------------------
2023-10-11 08:46:31,842 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 08:46:41,628 epoch 1 - iter 27/272 - loss 2.82503855 - time (sec): 9.78 - samples/sec: 533.20 - lr: 0.000015 - momentum: 0.000000
2023-10-11 08:46:51,677 epoch 1 - iter 54/272 - loss 2.81479736 - time (sec): 19.83 - samples/sec: 549.38 - lr: 0.000031 - momentum: 0.000000
2023-10-11 08:47:01,623 epoch 1 - iter 81/272 - loss 2.79445131 - time (sec): 29.78 - samples/sec: 542.15 - lr: 0.000047 - momentum: 0.000000
2023-10-11 08:47:11,987 epoch 1 - iter 108/272 - loss 2.74143790 - time (sec): 40.14 - samples/sec: 545.49 - lr: 0.000063 - momentum: 0.000000
2023-10-11 08:47:21,335 epoch 1 - iter 135/272 - loss 2.67294473 - time (sec): 49.49 - samples/sec: 526.82 - lr: 0.000079 - momentum: 0.000000
2023-10-11 08:47:32,016 epoch 1 - iter 162/272 - loss 2.56753769 - time (sec): 60.17 - samples/sec: 523.44 - lr: 0.000095 - momentum: 0.000000
2023-10-11 08:47:42,898 epoch 1 - iter 189/272 - loss 2.45166255 - time (sec): 71.05 - samples/sec: 519.49 - lr: 0.000111 - momentum: 0.000000
2023-10-11 08:47:53,558 epoch 1 - iter 216/272 - loss 2.33252461 - time (sec): 81.71 - samples/sec: 517.21 - lr: 0.000126 - momentum: 0.000000
2023-10-11 08:48:03,842 epoch 1 - iter 243/272 - loss 2.21679175 - time (sec): 92.00 - samples/sec: 514.16 - lr: 0.000142 - momentum: 0.000000
2023-10-11 08:48:13,393 epoch 1 - iter 270/272 - loss 2.10793293 - time (sec): 101.55 - samples/sec: 509.85 - lr: 0.000158 - momentum: 0.000000
2023-10-11 08:48:13,897 ----------------------------------------------------------------------------------------------------
2023-10-11 08:48:13,897 EPOCH 1 done: loss 2.1029 - lr: 0.000158
2023-10-11 08:48:19,678 DEV : loss 0.7420206069946289 - f1-score (micro avg) 0.0
2023-10-11 08:48:19,686 ----------------------------------------------------------------------------------------------------
2023-10-11 08:48:29,235 epoch 2 - iter 27/272 - loss 0.73201408 - time (sec): 9.55 - samples/sec: 508.44 - lr: 0.000158 - momentum: 0.000000
2023-10-11 08:48:39,028 epoch 2 - iter 54/272 - loss 0.64365181 - time (sec): 19.34 - samples/sec: 518.40 - lr: 0.000157 - momentum: 0.000000
2023-10-11 08:48:48,747 epoch 2 - iter 81/272 - loss 0.62581360 - time (sec): 29.06 - samples/sec: 532.06 - lr: 0.000155 - momentum: 0.000000
2023-10-11 08:48:58,318 epoch 2 - iter 108/272 - loss 0.57402681 - time (sec): 38.63 - samples/sec: 534.90 - lr: 0.000153 - momentum: 0.000000
2023-10-11 08:49:08,050 epoch 2 - iter 135/272 - loss 0.55461793 - time (sec): 48.36 - samples/sec: 519.02 - lr: 0.000151 - momentum: 0.000000
2023-10-11 08:49:18,508 epoch 2 - iter 162/272 - loss 0.54159163 - time (sec): 58.82 - samples/sec: 527.05 - lr: 0.000149 - momentum: 0.000000
2023-10-11 08:49:28,016 epoch 2 - iter 189/272 - loss 0.52812425 - time (sec): 68.33 - samples/sec: 527.76 - lr: 0.000148 - momentum: 0.000000
2023-10-11 08:49:38,503 epoch 2 - iter 216/272 - loss 0.48929605 - time (sec): 78.82 - samples/sec: 535.29 - lr: 0.000146 - momentum: 0.000000
2023-10-11 08:49:48,000 epoch 2 - iter 243/272 - loss 0.46843836 - time (sec): 88.31 - samples/sec: 534.95 - lr: 0.000144 - momentum: 0.000000
2023-10-11 08:49:56,928 epoch 2 - iter 270/272 - loss 0.45538103 - time (sec): 97.24 - samples/sec: 531.77 - lr: 0.000142 - momentum: 0.000000
2023-10-11 08:49:57,467 ----------------------------------------------------------------------------------------------------
2023-10-11 08:49:57,467 EPOCH 2 done: loss 0.4541 - lr: 0.000142
2023-10-11 08:50:03,637 DEV : loss 0.2795553505420685 - f1-score (micro avg) 0.2799
2023-10-11 08:50:03,645 saving best model
2023-10-11 08:50:04,536 ----------------------------------------------------------------------------------------------------
2023-10-11 08:50:13,001 epoch 3 - iter 27/272 - loss 0.30926344 - time (sec): 8.46 - samples/sec: 471.49 - lr: 0.000141 - momentum: 0.000000
2023-10-11 08:50:22,688 epoch 3 - iter 54/272 - loss 0.28010891 - time (sec): 18.15 - samples/sec: 508.98 - lr: 0.000139 - momentum: 0.000000
2023-10-11 08:50:32,156 epoch 3 - iter 81/272 - loss 0.26941339 - time (sec): 27.62 - samples/sec: 520.25 - lr: 0.000137 - momentum: 0.000000
2023-10-11 08:50:42,087 epoch 3 - iter 108/272 - loss 0.26957336 - time (sec): 37.55 - samples/sec: 519.09 - lr: 0.000135 - momentum: 0.000000
2023-10-11 08:50:52,114 epoch 3 - iter 135/272 - loss 0.27237190 - time (sec): 47.58 - samples/sec: 528.62 - lr: 0.000133 - momentum: 0.000000
2023-10-11 08:51:02,110 epoch 3 - iter 162/272 - loss 0.26668477 - time (sec): 57.57 - samples/sec: 536.22 - lr: 0.000132 - momentum: 0.000000
2023-10-11 08:51:11,906 epoch 3 - iter 189/272 - loss 0.26973989 - time (sec): 67.37 - samples/sec: 538.76 - lr: 0.000130 - momentum: 0.000000
2023-10-11 08:51:21,268 epoch 3 - iter 216/272 - loss 0.26714766 - time (sec): 76.73 - samples/sec: 533.69 - lr: 0.000128 - momentum: 0.000000
2023-10-11 08:51:31,787 epoch 3 - iter 243/272 - loss 0.25771325 - time (sec): 87.25 - samples/sec: 539.86 - lr: 0.000126 - momentum: 0.000000
2023-10-11 08:51:41,141 epoch 3 - iter 270/272 - loss 0.25337985 - time (sec): 96.60 - samples/sec: 535.67 - lr: 0.000125 - momentum: 0.000000
2023-10-11 08:51:41,589 ----------------------------------------------------------------------------------------------------
2023-10-11 08:51:41,589 EPOCH 3 done: loss 0.2536 - lr: 0.000125
2023-10-11 08:51:48,529 DEV : loss 0.18771013617515564 - f1-score (micro avg) 0.5978
2023-10-11 08:51:48,538 saving best model
2023-10-11 08:51:51,138 ----------------------------------------------------------------------------------------------------
2023-10-11 08:52:00,134 epoch 4 - iter 27/272 - loss 0.20637994 - time (sec): 8.99 - samples/sec: 526.18 - lr: 0.000123 - momentum: 0.000000
2023-10-11 08:52:10,291 epoch 4 - iter 54/272 - loss 0.18402262 - time (sec): 19.15 - samples/sec: 551.44 - lr: 0.000121 - momentum: 0.000000
2023-10-11 08:52:19,753 epoch 4 - iter 81/272 - loss 0.17988604 - time (sec): 28.61 - samples/sec: 543.96 - lr: 0.000119 - momentum: 0.000000
2023-10-11 08:52:29,681 epoch 4 - iter 108/272 - loss 0.17262483 - time (sec): 38.54 - samples/sec: 551.29 - lr: 0.000117 - momentum: 0.000000
2023-10-11 08:52:39,149 epoch 4 - iter 135/272 - loss 0.16177249 - time (sec): 48.01 - samples/sec: 552.24 - lr: 0.000116 - momentum: 0.000000
2023-10-11 08:52:48,606 epoch 4 - iter 162/272 - loss 0.15861681 - time (sec): 57.46 - samples/sec: 546.29 - lr: 0.000114 - momentum: 0.000000
2023-10-11 08:52:59,370 epoch 4 - iter 189/272 - loss 0.15138177 - time (sec): 68.23 - samples/sec: 550.75 - lr: 0.000112 - momentum: 0.000000
2023-10-11 08:53:09,261 epoch 4 - iter 216/272 - loss 0.15344235 - time (sec): 78.12 - samples/sec: 547.59 - lr: 0.000110 - momentum: 0.000000
2023-10-11 08:53:18,585 epoch 4 - iter 243/272 - loss 0.15540389 - time (sec): 87.44 - samples/sec: 543.27 - lr: 0.000109 - momentum: 0.000000
2023-10-11 08:53:27,569 epoch 4 - iter 270/272 - loss 0.15515638 - time (sec): 96.43 - samples/sec: 537.32 - lr: 0.000107 - momentum: 0.000000
2023-10-11 08:53:27,987 ----------------------------------------------------------------------------------------------------
2023-10-11 08:53:27,988 EPOCH 4 done: loss 0.1550 - lr: 0.000107
2023-10-11 08:53:33,816 DEV : loss 0.14752107858657837 - f1-score (micro avg) 0.658
2023-10-11 08:53:33,826 saving best model
2023-10-11 08:53:36,632 ----------------------------------------------------------------------------------------------------
2023-10-11 08:53:46,312 epoch 5 - iter 27/272 - loss 0.11436262 - time (sec): 9.68 - samples/sec: 572.18 - lr: 0.000105 - momentum: 0.000000
2023-10-11 08:53:55,676 epoch 5 - iter 54/272 - loss 0.11963874 - time (sec): 19.04 - samples/sec: 549.11 - lr: 0.000103 - momentum: 0.000000
2023-10-11 08:54:04,374 epoch 5 - iter 81/272 - loss 0.12040345 - time (sec): 27.74 - samples/sec: 535.16 - lr: 0.000101 - momentum: 0.000000
2023-10-11 08:54:13,474 epoch 5 - iter 108/272 - loss 0.11406206 - time (sec): 36.84 - samples/sec: 540.48 - lr: 0.000100 - momentum: 0.000000
2023-10-11 08:54:23,142 epoch 5 - iter 135/272 - loss 0.11208839 - time (sec): 46.51 - samples/sec: 547.59 - lr: 0.000098 - momentum: 0.000000
2023-10-11 08:54:32,283 epoch 5 - iter 162/272 - loss 0.10997584 - time (sec): 55.65 - samples/sec: 546.65 - lr: 0.000096 - momentum: 0.000000
2023-10-11 08:54:41,397 epoch 5 - iter 189/272 - loss 0.10698639 - time (sec): 64.76 - samples/sec: 547.20 - lr: 0.000094 - momentum: 0.000000
2023-10-11 08:54:51,151 epoch 5 - iter 216/272 - loss 0.10437260 - time (sec): 74.51 - samples/sec: 553.36 - lr: 0.000093 - momentum: 0.000000
2023-10-11 08:55:00,218 epoch 5 - iter 243/272 - loss 0.10669928 - time (sec): 83.58 - samples/sec: 550.74 - lr: 0.000091 - momentum: 0.000000
2023-10-11 08:55:09,997 epoch 5 - iter 270/272 - loss 0.10536589 - time (sec): 93.36 - samples/sec: 554.39 - lr: 0.000089 - momentum: 0.000000
2023-10-11 08:55:10,430 ----------------------------------------------------------------------------------------------------
2023-10-11 08:55:10,431 EPOCH 5 done: loss 0.1055 - lr: 0.000089
2023-10-11 08:55:16,000 DEV : loss 0.13927499949932098 - f1-score (micro avg) 0.7678
2023-10-11 08:55:16,010 saving best model
2023-10-11 08:55:18,606 ----------------------------------------------------------------------------------------------------
2023-10-11 08:55:28,182 epoch 6 - iter 27/272 - loss 0.07850917 - time (sec): 9.57 - samples/sec: 560.10 - lr: 0.000087 - momentum: 0.000000
2023-10-11 08:55:37,244 epoch 6 - iter 54/272 - loss 0.09402615 - time (sec): 18.63 - samples/sec: 543.70 - lr: 0.000085 - momentum: 0.000000
2023-10-11 08:55:46,928 epoch 6 - iter 81/272 - loss 0.08988016 - time (sec): 28.32 - samples/sec: 557.28 - lr: 0.000084 - momentum: 0.000000
2023-10-11 08:55:55,750 epoch 6 - iter 108/272 - loss 0.08253274 - time (sec): 37.14 - samples/sec: 554.07 - lr: 0.000082 - momentum: 0.000000
2023-10-11 08:56:04,527 epoch 6 - iter 135/272 - loss 0.08701424 - time (sec): 45.92 - samples/sec: 544.31 - lr: 0.000080 - momentum: 0.000000
2023-10-11 08:56:13,801 epoch 6 - iter 162/272 - loss 0.08101780 - time (sec): 55.19 - samples/sec: 544.89 - lr: 0.000078 - momentum: 0.000000
2023-10-11 08:56:22,641 epoch 6 - iter 189/272 - loss 0.08041658 - time (sec): 64.03 - samples/sec: 543.08 - lr: 0.000077 - momentum: 0.000000
2023-10-11 08:56:32,377 epoch 6 - iter 216/272 - loss 0.07945405 - time (sec): 73.77 - samples/sec: 548.07 - lr: 0.000075 - momentum: 0.000000
2023-10-11 08:56:42,277 epoch 6 - iter 243/272 - loss 0.07521173 - time (sec): 83.67 - samples/sec: 551.06 - lr: 0.000073 - momentum: 0.000000
2023-10-11 08:56:52,338 epoch 6 - iter 270/272 - loss 0.07280428 - time (sec): 93.73 - samples/sec: 550.66 - lr: 0.000071 - momentum: 0.000000
2023-10-11 08:56:52,948 ----------------------------------------------------------------------------------------------------
2023-10-11 08:56:52,948 EPOCH 6 done: loss 0.0731 - lr: 0.000071
2023-10-11 08:56:58,860 DEV : loss 0.13484491407871246 - f1-score (micro avg) 0.764
2023-10-11 08:56:58,868 ----------------------------------------------------------------------------------------------------
2023-10-11 08:57:07,260 epoch 7 - iter 27/272 - loss 0.06214019 - time (sec): 8.39 - samples/sec: 464.28 - lr: 0.000069 - momentum: 0.000000
2023-10-11 08:57:18,473 epoch 7 - iter 54/272 - loss 0.06815280 - time (sec): 19.60 - samples/sec: 534.52 - lr: 0.000068 - momentum: 0.000000
2023-10-11 08:57:28,778 epoch 7 - iter 81/272 - loss 0.06037909 - time (sec): 29.91 - samples/sec: 549.25 - lr: 0.000066 - momentum: 0.000000
2023-10-11 08:57:38,310 epoch 7 - iter 108/272 - loss 0.05637524 - time (sec): 39.44 - samples/sec: 549.32 - lr: 0.000064 - momentum: 0.000000
2023-10-11 08:57:47,655 epoch 7 - iter 135/272 - loss 0.05750252 - time (sec): 48.78 - samples/sec: 551.32 - lr: 0.000062 - momentum: 0.000000
2023-10-11 08:57:56,698 epoch 7 - iter 162/272 - loss 0.05648056 - time (sec): 57.83 - samples/sec: 545.48 - lr: 0.000061 - momentum: 0.000000
2023-10-11 08:58:05,934 epoch 7 - iter 189/272 - loss 0.05468204 - time (sec): 67.06 - samples/sec: 547.25 - lr: 0.000059 - momentum: 0.000000
2023-10-11 08:58:14,303 epoch 7 - iter 216/272 - loss 0.05367683 - time (sec): 75.43 - samples/sec: 543.46 - lr: 0.000057 - momentum: 0.000000
2023-10-11 08:58:23,658 epoch 7 - iter 243/272 - loss 0.05396445 - time (sec): 84.79 - samples/sec: 546.11 - lr: 0.000055 - momentum: 0.000000
2023-10-11 08:58:33,141 epoch 7 - iter 270/272 - loss 0.05372052 - time (sec): 94.27 - samples/sec: 548.66 - lr: 0.000054 - momentum: 0.000000
2023-10-11 08:58:33,608 ----------------------------------------------------------------------------------------------------
2023-10-11 08:58:33,609 EPOCH 7 done: loss 0.0539 - lr: 0.000054
2023-10-11 08:58:39,587 DEV : loss 0.12847252190113068 - f1-score (micro avg) 0.7792
2023-10-11 08:58:39,595 saving best model
2023-10-11 08:58:42,149 ----------------------------------------------------------------------------------------------------
2023-10-11 08:58:51,228 epoch 8 - iter 27/272 - loss 0.03724604 - time (sec): 9.08 - samples/sec: 555.03 - lr: 0.000052 - momentum: 0.000000
2023-10-11 08:59:00,208 epoch 8 - iter 54/272 - loss 0.03582304 - time (sec): 18.05 - samples/sec: 552.54 - lr: 0.000050 - momentum: 0.000000
2023-10-11 08:59:10,094 epoch 8 - iter 81/272 - loss 0.04189696 - time (sec): 27.94 - samples/sec: 572.81 - lr: 0.000048 - momentum: 0.000000
2023-10-11 08:59:19,038 epoch 8 - iter 108/272 - loss 0.04015146 - time (sec): 36.89 - samples/sec: 566.49 - lr: 0.000046 - momentum: 0.000000
2023-10-11 08:59:28,385 epoch 8 - iter 135/272 - loss 0.03937496 - time (sec): 46.23 - samples/sec: 563.96 - lr: 0.000045 - momentum: 0.000000
2023-10-11 08:59:37,864 epoch 8 - iter 162/272 - loss 0.03916141 - time (sec): 55.71 - samples/sec: 562.64 - lr: 0.000043 - momentum: 0.000000
2023-10-11 08:59:47,098 epoch 8 - iter 189/272 - loss 0.03951737 - time (sec): 64.95 - samples/sec: 555.42 - lr: 0.000041 - momentum: 0.000000
2023-10-11 08:59:56,915 epoch 8 - iter 216/272 - loss 0.03924256 - time (sec): 74.76 - samples/sec: 556.91 - lr: 0.000039 - momentum: 0.000000
2023-10-11 09:00:05,974 epoch 8 - iter 243/272 - loss 0.04242295 - time (sec): 83.82 - samples/sec: 553.15 - lr: 0.000038 - momentum: 0.000000
2023-10-11 09:00:15,555 epoch 8 - iter 270/272 - loss 0.04091031 - time (sec): 93.40 - samples/sec: 552.96 - lr: 0.000036 - momentum: 0.000000
2023-10-11 09:00:16,105 ----------------------------------------------------------------------------------------------------
2023-10-11 09:00:16,105 EPOCH 8 done: loss 0.0411 - lr: 0.000036
2023-10-11 09:00:21,622 DEV : loss 0.1297326683998108 - f1-score (micro avg) 0.8015
2023-10-11 09:00:21,630 saving best model
2023-10-11 09:00:24,195 ----------------------------------------------------------------------------------------------------
2023-10-11 09:00:32,880 epoch 9 - iter 27/272 - loss 0.02851045 - time (sec): 8.68 - samples/sec: 521.81 - lr: 0.000034 - momentum: 0.000000
2023-10-11 09:00:42,791 epoch 9 - iter 54/272 - loss 0.02545259 - time (sec): 18.59 - samples/sec: 555.82 - lr: 0.000032 - momentum: 0.000000
2023-10-11 09:00:52,267 epoch 9 - iter 81/272 - loss 0.02552757 - time (sec): 28.07 - samples/sec: 552.90 - lr: 0.000030 - momentum: 0.000000
2023-10-11 09:01:01,675 epoch 9 - iter 108/272 - loss 0.03169139 - time (sec): 37.48 - samples/sec: 550.18 - lr: 0.000029 - momentum: 0.000000
2023-10-11 09:01:11,062 epoch 9 - iter 135/272 - loss 0.03179437 - time (sec): 46.86 - samples/sec: 551.07 - lr: 0.000027 - momentum: 0.000000
2023-10-11 09:01:20,938 epoch 9 - iter 162/272 - loss 0.03091858 - time (sec): 56.74 - samples/sec: 555.52 - lr: 0.000025 - momentum: 0.000000
2023-10-11 09:01:29,941 epoch 9 - iter 189/272 - loss 0.03017751 - time (sec): 65.74 - samples/sec: 549.36 - lr: 0.000023 - momentum: 0.000000
2023-10-11 09:01:39,359 epoch 9 - iter 216/272 - loss 0.03227109 - time (sec): 75.16 - samples/sec: 551.30 - lr: 0.000022 - momentum: 0.000000
2023-10-11 09:01:48,473 epoch 9 - iter 243/272 - loss 0.03433461 - time (sec): 84.27 - samples/sec: 546.30 - lr: 0.000020 - momentum: 0.000000
2023-10-11 09:01:58,254 epoch 9 - iter 270/272 - loss 0.03343733 - time (sec): 94.06 - samples/sec: 549.00 - lr: 0.000018 - momentum: 0.000000
2023-10-11 09:01:58,817 ----------------------------------------------------------------------------------------------------
2023-10-11 09:01:58,817 EPOCH 9 done: loss 0.0333 - lr: 0.000018
2023-10-11 09:02:04,280 DEV : loss 0.13128715753555298 - f1-score (micro avg) 0.7935
2023-10-11 09:02:04,288 ----------------------------------------------------------------------------------------------------
2023-10-11 09:02:13,398 epoch 10 - iter 27/272 - loss 0.02099314 - time (sec): 9.11 - samples/sec: 517.83 - lr: 0.000016 - momentum: 0.000000
2023-10-11 09:02:22,257 epoch 10 - iter 54/272 - loss 0.02110070 - time (sec): 17.97 - samples/sec: 499.26 - lr: 0.000014 - momentum: 0.000000
2023-10-11 09:02:31,661 epoch 10 - iter 81/272 - loss 0.02719329 - time (sec): 27.37 - samples/sec: 499.66 - lr: 0.000013 - momentum: 0.000000
2023-10-11 09:02:41,814 epoch 10 - iter 108/272 - loss 0.02618643 - time (sec): 37.52 - samples/sec: 509.20 - lr: 0.000011 - momentum: 0.000000
2023-10-11 09:02:52,674 epoch 10 - iter 135/272 - loss 0.02521797 - time (sec): 48.38 - samples/sec: 527.51 - lr: 0.000009 - momentum: 0.000000
2023-10-11 09:03:03,562 epoch 10 - iter 162/272 - loss 0.02773592 - time (sec): 59.27 - samples/sec: 541.00 - lr: 0.000007 - momentum: 0.000000
2023-10-11 09:03:13,363 epoch 10 - iter 189/272 - loss 0.02935487 - time (sec): 69.07 - samples/sec: 543.40 - lr: 0.000005 - momentum: 0.000000
2023-10-11 09:03:22,530 epoch 10 - iter 216/272 - loss 0.02897761 - time (sec): 78.24 - samples/sec: 535.14 - lr: 0.000004 - momentum: 0.000000
2023-10-11 09:03:31,795 epoch 10 - iter 243/272 - loss 0.02896544 - time (sec): 87.50 - samples/sec: 531.38 - lr: 0.000002 - momentum: 0.000000
2023-10-11 09:03:41,537 epoch 10 - iter 270/272 - loss 0.02830754 - time (sec): 97.25 - samples/sec: 532.62 - lr: 0.000000 - momentum: 0.000000
2023-10-11 09:03:41,982 ----------------------------------------------------------------------------------------------------
2023-10-11 09:03:41,982 EPOCH 10 done: loss 0.0284 - lr: 0.000000
2023-10-11 09:03:47,926 DEV : loss 0.1347367763519287 - f1-score (micro avg) 0.7913
2023-10-11 09:03:48,832 ----------------------------------------------------------------------------------------------------
2023-10-11 09:03:48,834 Loading model from best epoch ...
2023-10-11 09:03:52,957 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-11 09:04:05,405
Results:
- F-score (micro) 0.7686
- F-score (macro) 0.7157
- Accuracy 0.6437
By class:
              precision    recall  f1-score   support

         LOC     0.7624    0.8846    0.8190       312
         PER     0.6980    0.8558    0.7689       208
         ORG     0.4231    0.4000    0.4112        55
   HumanProd     0.8636    0.8636    0.8636        22

   micro avg     0.7164    0.8291    0.7686       597
   macro avg     0.6868    0.7510    0.7157       597
weighted avg     0.7125    0.8291    0.7656       597
2023-10-11 09:04:05,405 ----------------------------------------------------------------------------------------------------
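Note: to use the saved checkpoint, a minimal sketch with the standard Flair API follows; the Swedish example sentence is illustrative only. The 17 tags listed above follow the BIOES scheme (S-/B-/I-/E- prefixes) over the LOC, PER, ORG and HumanProd entity types.

from flair.data import Sentence
from flair.models import SequenceTagger

# Load the best model saved during training (path taken from the base path above).
tagger = SequenceTagger.load(
    "hmbench-newseye/sv-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-1/best-model.pt"
)

sentence = Sentence("Stockholm är Sveriges huvudstad .")
tagger.predict(sentence)
for span in sentence.get_spans("ner"):
    print(span)  # prints each predicted entity span with its label and confidence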