2023-10-11 00:14:02,384 ----------------------------------------------------------------------------------------------------
2023-10-11 00:14:02,386 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-11 00:14:02,386 ----------------------------------------------------------------------------------------------------
2023-10-11 00:14:02,386 MultiCorpus: 1166 train + 165 dev + 415 test sentences
- NER_HIPE_2022 Corpus: 1166 train + 165 dev + 415 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fi/with_doc_seperator
2023-10-11 00:14:02,387 ----------------------------------------------------------------------------------------------------
2023-10-11 00:14:02,387 Train: 1166 sentences
2023-10-11 00:14:02,387 (train_with_dev=False, train_with_test=False)
2023-10-11 00:14:02,387 ----------------------------------------------------------------------------------------------------
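The MultiCorpus above is the Finnish NewsEye split of HIPE-2022, which Flair can load directly. A minimal sketch, assuming a Flair release that ships the NER_HIPE_2022 loader with dataset_name/language arguments (the exact constructor signature may vary between versions):

```python
# Sketch: reload the corpus referenced in the log above.
# Assumption: Flair's NER_HIPE_2022 loader accepts these argument names.
from flair.datasets import NER_HIPE_2022

corpus = NER_HIPE_2022(dataset_name="newseye", language="fi")
print(corpus)  # should report the 1166/165/415 train/dev/test sentences
```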
2023-10-11 00:14:02,387 Training Params:
2023-10-11 00:14:02,387 - learning_rate: "0.00015"
2023-10-11 00:14:02,387 - mini_batch_size: "8"
2023-10-11 00:14:02,387 - max_epochs: "10"
2023-10-11 00:14:02,387 - shuffle: "True"
2023-10-11 00:14:02,387 ----------------------------------------------------------------------------------------------------
2023-10-11 00:14:02,387 Plugins:
2023-10-11 00:14:02,387 - TensorboardLogger
2023-10-11 00:14:02,387 - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 00:14:02,388 ----------------------------------------------------------------------------------------------------
2023-10-11 00:14:02,388 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 00:14:02,388 - metric: "('micro avg', 'f1-score')"
2023-10-11 00:14:02,388 ----------------------------------------------------------------------------------------------------
2023-10-11 00:14:02,388 Computation:
2023-10-11 00:14:02,388 - compute on device: cuda:0
2023-10-11 00:14:02,388 - embedding storage: none
2023-10-11 00:14:02,388 ----------------------------------------------------------------------------------------------------
2023-10-11 00:14:02,388 Model training base path: "hmbench-newseye/fi-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-3"
2023-10-11 00:14:02,388 ----------------------------------------------------------------------------------------------------
2023-10-11 00:14:02,388 ----------------------------------------------------------------------------------------------------
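The architecture dump and the Training Params / Plugins blocks above together describe a standard Flair fine-tuning run over the ByT5 backbone. A hedged sketch of how such a run could be set up; the backbone model id is inferred from the base path, and the constructor keywords are assumptions that may differ across Flair versions:

```python
# Sketch only: reconstructs the setup implied by the log, not the exact original script.
from flair.datasets import NER_HIPE_2022
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

corpus = NER_HIPE_2022(dataset_name="newseye", language="fi")

# ByT5 encoder wrapped as word embeddings; "poolingfirst-layers-1" in the base
# path suggests first-subtoken pooling over the last layer only.
embeddings = TransformerWordEmbeddings(
    "hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax",  # assumed backbone id
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
)

tag_dictionary = corpus.make_label_dictionary(label_type="ner")

# "crfFalse" in the base path and the plain Linear + CrossEntropyLoss head in the
# architecture dump suggest no CRF and no RNN on top of the embeddings.
tagger = SequenceTagger(
    hidden_size=256,  # ignored when use_rnn=False, but part of the constructor
    embeddings=embeddings,
    tag_dictionary=tag_dictionary,
    tag_type="ner",
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)

trainer = ModelTrainer(tagger, corpus)
# fine_tune() is assumed to apply the linear schedule with 10% warmup logged
# under "Plugins" above.
trainer.fine_tune(
    "hmbench-newseye/fi-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-3",
    learning_rate=0.00015,
    mini_batch_size=8,
    max_epochs=10,
)
```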
2023-10-11 00:14:02,388 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 00:14:11,164 epoch 1 - iter 14/146 - loss 2.82817866 - time (sec): 8.77 - samples/sec: 427.52 - lr: 0.000013 - momentum: 0.000000
2023-10-11 00:14:20,453 epoch 1 - iter 28/146 - loss 2.81986952 - time (sec): 18.06 - samples/sec: 450.86 - lr: 0.000028 - momentum: 0.000000
2023-10-11 00:14:29,670 epoch 1 - iter 42/146 - loss 2.81010839 - time (sec): 27.28 - samples/sec: 448.27 - lr: 0.000042 - momentum: 0.000000
2023-10-11 00:14:38,407 epoch 1 - iter 56/146 - loss 2.79282156 - time (sec): 36.02 - samples/sec: 439.82 - lr: 0.000057 - momentum: 0.000000
2023-10-11 00:14:48,463 epoch 1 - iter 70/146 - loss 2.75564153 - time (sec): 46.07 - samples/sec: 449.23 - lr: 0.000071 - momentum: 0.000000
2023-10-11 00:14:58,486 epoch 1 - iter 84/146 - loss 2.70141670 - time (sec): 56.10 - samples/sec: 458.13 - lr: 0.000085 - momentum: 0.000000
2023-10-11 00:15:07,903 epoch 1 - iter 98/146 - loss 2.63744532 - time (sec): 65.51 - samples/sec: 457.86 - lr: 0.000100 - momentum: 0.000000
2023-10-11 00:15:16,446 epoch 1 - iter 112/146 - loss 2.57069765 - time (sec): 74.06 - samples/sec: 459.84 - lr: 0.000114 - momentum: 0.000000
2023-10-11 00:15:25,138 epoch 1 - iter 126/146 - loss 2.48819700 - time (sec): 82.75 - samples/sec: 464.09 - lr: 0.000128 - momentum: 0.000000
2023-10-11 00:15:33,976 epoch 1 - iter 140/146 - loss 2.40859330 - time (sec): 91.59 - samples/sec: 464.91 - lr: 0.000143 - momentum: 0.000000
2023-10-11 00:15:37,658 ----------------------------------------------------------------------------------------------------
2023-10-11 00:15:37,659 EPOCH 1 done: loss 2.3727 - lr: 0.000143
2023-10-11 00:15:42,937 DEV : loss 1.3521078824996948 - f1-score (micro avg) 0.0
2023-10-11 00:15:42,946 ----------------------------------------------------------------------------------------------------
2023-10-11 00:15:50,957 epoch 2 - iter 14/146 - loss 1.37293642 - time (sec): 8.01 - samples/sec: 471.08 - lr: 0.000149 - momentum: 0.000000
2023-10-11 00:15:59,218 epoch 2 - iter 28/146 - loss 1.27588533 - time (sec): 16.27 - samples/sec: 479.57 - lr: 0.000147 - momentum: 0.000000
2023-10-11 00:16:07,875 epoch 2 - iter 42/146 - loss 1.19578947 - time (sec): 24.93 - samples/sec: 484.01 - lr: 0.000145 - momentum: 0.000000
2023-10-11 00:16:16,004 epoch 2 - iter 56/146 - loss 1.12937665 - time (sec): 33.06 - samples/sec: 480.81 - lr: 0.000144 - momentum: 0.000000
2023-10-11 00:16:24,974 epoch 2 - iter 70/146 - loss 1.04224254 - time (sec): 42.03 - samples/sec: 486.91 - lr: 0.000142 - momentum: 0.000000
2023-10-11 00:16:34,067 epoch 2 - iter 84/146 - loss 1.00774700 - time (sec): 51.12 - samples/sec: 489.84 - lr: 0.000141 - momentum: 0.000000
2023-10-11 00:16:42,519 epoch 2 - iter 98/146 - loss 0.96099641 - time (sec): 59.57 - samples/sec: 487.23 - lr: 0.000139 - momentum: 0.000000
2023-10-11 00:16:51,251 epoch 2 - iter 112/146 - loss 0.90901669 - time (sec): 68.30 - samples/sec: 489.49 - lr: 0.000137 - momentum: 0.000000
2023-10-11 00:17:00,072 epoch 2 - iter 126/146 - loss 0.86569250 - time (sec): 77.12 - samples/sec: 491.52 - lr: 0.000136 - momentum: 0.000000
2023-10-11 00:17:09,029 epoch 2 - iter 140/146 - loss 0.83180630 - time (sec): 86.08 - samples/sec: 492.45 - lr: 0.000134 - momentum: 0.000000
2023-10-11 00:17:12,929 ----------------------------------------------------------------------------------------------------
2023-10-11 00:17:12,930 EPOCH 2 done: loss 0.8265 - lr: 0.000134
2023-10-11 00:17:18,512 DEV : loss 0.45962727069854736 - f1-score (micro avg) 0.0
2023-10-11 00:17:18,522 ----------------------------------------------------------------------------------------------------
2023-10-11 00:17:27,515 epoch 3 - iter 14/146 - loss 0.56664628 - time (sec): 8.99 - samples/sec: 550.74 - lr: 0.000132 - momentum: 0.000000
2023-10-11 00:17:36,702 epoch 3 - iter 28/146 - loss 0.51000469 - time (sec): 18.18 - samples/sec: 553.40 - lr: 0.000130 - momentum: 0.000000
2023-10-11 00:17:45,427 epoch 3 - iter 42/146 - loss 0.55411592 - time (sec): 26.90 - samples/sec: 537.41 - lr: 0.000129 - momentum: 0.000000
2023-10-11 00:17:53,675 epoch 3 - iter 56/146 - loss 0.52261842 - time (sec): 35.15 - samples/sec: 527.12 - lr: 0.000127 - momentum: 0.000000
2023-10-11 00:18:02,091 epoch 3 - iter 70/146 - loss 0.51648209 - time (sec): 43.57 - samples/sec: 523.93 - lr: 0.000126 - momentum: 0.000000
2023-10-11 00:18:10,925 epoch 3 - iter 84/146 - loss 0.49728401 - time (sec): 52.40 - samples/sec: 518.27 - lr: 0.000124 - momentum: 0.000000
2023-10-11 00:18:19,329 epoch 3 - iter 98/146 - loss 0.47812146 - time (sec): 60.81 - samples/sec: 512.91 - lr: 0.000122 - momentum: 0.000000
2023-10-11 00:18:27,211 epoch 3 - iter 112/146 - loss 0.47088239 - time (sec): 68.69 - samples/sec: 505.81 - lr: 0.000121 - momentum: 0.000000
2023-10-11 00:18:34,844 epoch 3 - iter 126/146 - loss 0.46014170 - time (sec): 76.32 - samples/sec: 498.40 - lr: 0.000119 - momentum: 0.000000
2023-10-11 00:18:43,390 epoch 3 - iter 140/146 - loss 0.45282216 - time (sec): 84.87 - samples/sec: 496.35 - lr: 0.000118 - momentum: 0.000000
2023-10-11 00:18:47,362 ----------------------------------------------------------------------------------------------------
2023-10-11 00:18:47,362 EPOCH 3 done: loss 0.4440 - lr: 0.000118
2023-10-11 00:18:53,034 DEV : loss 0.28692546486854553 - f1-score (micro avg) 0.1634
2023-10-11 00:18:53,043 saving best model
2023-10-11 00:18:53,929 ----------------------------------------------------------------------------------------------------
2023-10-11 00:19:02,136 epoch 4 - iter 14/146 - loss 0.33858093 - time (sec): 8.21 - samples/sec: 468.00 - lr: 0.000115 - momentum: 0.000000
2023-10-11 00:19:11,252 epoch 4 - iter 28/146 - loss 0.33897577 - time (sec): 17.32 - samples/sec: 482.48 - lr: 0.000114 - momentum: 0.000000
2023-10-11 00:19:19,557 epoch 4 - iter 42/146 - loss 0.32510505 - time (sec): 25.63 - samples/sec: 480.07 - lr: 0.000112 - momentum: 0.000000
2023-10-11 00:19:27,999 epoch 4 - iter 56/146 - loss 0.33741625 - time (sec): 34.07 - samples/sec: 484.30 - lr: 0.000111 - momentum: 0.000000
2023-10-11 00:19:36,717 epoch 4 - iter 70/146 - loss 0.32330073 - time (sec): 42.79 - samples/sec: 494.60 - lr: 0.000109 - momentum: 0.000000
2023-10-11 00:19:45,262 epoch 4 - iter 84/146 - loss 0.35066962 - time (sec): 51.33 - samples/sec: 491.95 - lr: 0.000107 - momentum: 0.000000
2023-10-11 00:19:53,579 epoch 4 - iter 98/146 - loss 0.34254804 - time (sec): 59.65 - samples/sec: 491.65 - lr: 0.000106 - momentum: 0.000000
2023-10-11 00:20:02,300 epoch 4 - iter 112/146 - loss 0.33425876 - time (sec): 68.37 - samples/sec: 495.96 - lr: 0.000104 - momentum: 0.000000
2023-10-11 00:20:10,709 epoch 4 - iter 126/146 - loss 0.33494481 - time (sec): 76.78 - samples/sec: 495.05 - lr: 0.000103 - momentum: 0.000000
2023-10-11 00:20:19,699 epoch 4 - iter 140/146 - loss 0.32750282 - time (sec): 85.77 - samples/sec: 494.85 - lr: 0.000101 - momentum: 0.000000
2023-10-11 00:20:23,418 ----------------------------------------------------------------------------------------------------
2023-10-11 00:20:23,418 EPOCH 4 done: loss 0.3225 - lr: 0.000101
2023-10-11 00:20:29,017 DEV : loss 0.23322905600070953 - f1-score (micro avg) 0.332
2023-10-11 00:20:29,025 saving best model
2023-10-11 00:20:35,031 ----------------------------------------------------------------------------------------------------
2023-10-11 00:20:43,857 epoch 5 - iter 14/146 - loss 0.27735107 - time (sec): 8.82 - samples/sec: 510.33 - lr: 0.000099 - momentum: 0.000000
2023-10-11 00:20:52,350 epoch 5 - iter 28/146 - loss 0.25431500 - time (sec): 17.31 - samples/sec: 499.40 - lr: 0.000097 - momentum: 0.000000
2023-10-11 00:21:00,691 epoch 5 - iter 42/146 - loss 0.29245784 - time (sec): 25.66 - samples/sec: 492.51 - lr: 0.000096 - momentum: 0.000000
2023-10-11 00:21:08,893 epoch 5 - iter 56/146 - loss 0.30867369 - time (sec): 33.86 - samples/sec: 484.17 - lr: 0.000094 - momentum: 0.000000
2023-10-11 00:21:17,431 epoch 5 - iter 70/146 - loss 0.28826282 - time (sec): 42.40 - samples/sec: 484.22 - lr: 0.000092 - momentum: 0.000000
2023-10-11 00:21:26,668 epoch 5 - iter 84/146 - loss 0.27456335 - time (sec): 51.63 - samples/sec: 487.77 - lr: 0.000091 - momentum: 0.000000
2023-10-11 00:21:36,019 epoch 5 - iter 98/146 - loss 0.26911782 - time (sec): 60.98 - samples/sec: 497.02 - lr: 0.000089 - momentum: 0.000000
2023-10-11 00:21:44,734 epoch 5 - iter 112/146 - loss 0.25803376 - time (sec): 69.70 - samples/sec: 498.10 - lr: 0.000088 - momentum: 0.000000
2023-10-11 00:21:53,421 epoch 5 - iter 126/146 - loss 0.25520050 - time (sec): 78.39 - samples/sec: 498.15 - lr: 0.000086 - momentum: 0.000000
2023-10-11 00:22:01,791 epoch 5 - iter 140/146 - loss 0.25148287 - time (sec): 86.76 - samples/sec: 496.66 - lr: 0.000084 - momentum: 0.000000
2023-10-11 00:22:05,100 ----------------------------------------------------------------------------------------------------
2023-10-11 00:22:05,100 EPOCH 5 done: loss 0.2521 - lr: 0.000084
2023-10-11 00:22:10,781 DEV : loss 0.19501639902591705 - f1-score (micro avg) 0.473
2023-10-11 00:22:10,790 saving best model
2023-10-11 00:22:16,955 ----------------------------------------------------------------------------------------------------
2023-10-11 00:22:26,475 epoch 6 - iter 14/146 - loss 0.16514820 - time (sec): 9.52 - samples/sec: 514.75 - lr: 0.000082 - momentum: 0.000000
2023-10-11 00:22:34,815 epoch 6 - iter 28/146 - loss 0.17522830 - time (sec): 17.86 - samples/sec: 477.67 - lr: 0.000081 - momentum: 0.000000
2023-10-11 00:22:43,531 epoch 6 - iter 42/146 - loss 0.17690455 - time (sec): 26.57 - samples/sec: 477.73 - lr: 0.000079 - momentum: 0.000000
2023-10-11 00:22:52,461 epoch 6 - iter 56/146 - loss 0.16628079 - time (sec): 35.50 - samples/sec: 484.40 - lr: 0.000077 - momentum: 0.000000
2023-10-11 00:23:00,737 epoch 6 - iter 70/146 - loss 0.18071160 - time (sec): 43.78 - samples/sec: 483.63 - lr: 0.000076 - momentum: 0.000000
2023-10-11 00:23:10,600 epoch 6 - iter 84/146 - loss 0.20187792 - time (sec): 53.64 - samples/sec: 497.31 - lr: 0.000074 - momentum: 0.000000
2023-10-11 00:23:19,008 epoch 6 - iter 98/146 - loss 0.20080362 - time (sec): 62.05 - samples/sec: 494.99 - lr: 0.000073 - momentum: 0.000000
2023-10-11 00:23:27,506 epoch 6 - iter 112/146 - loss 0.19888829 - time (sec): 70.55 - samples/sec: 493.73 - lr: 0.000071 - momentum: 0.000000
2023-10-11 00:23:35,994 epoch 6 - iter 126/146 - loss 0.19539473 - time (sec): 79.03 - samples/sec: 493.95 - lr: 0.000069 - momentum: 0.000000
2023-10-11 00:23:43,994 epoch 6 - iter 140/146 - loss 0.19529054 - time (sec): 87.03 - samples/sec: 491.39 - lr: 0.000068 - momentum: 0.000000
2023-10-11 00:23:47,387 ----------------------------------------------------------------------------------------------------
2023-10-11 00:23:47,387 EPOCH 6 done: loss 0.1923 - lr: 0.000068
2023-10-11 00:23:52,889 DEV : loss 0.1738743782043457 - f1-score (micro avg) 0.5498
2023-10-11 00:23:52,897 saving best model
2023-10-11 00:23:59,052 ----------------------------------------------------------------------------------------------------
2023-10-11 00:24:08,020 epoch 7 - iter 14/146 - loss 0.15115652 - time (sec): 8.96 - samples/sec: 516.15 - lr: 0.000066 - momentum: 0.000000
2023-10-11 00:24:16,989 epoch 7 - iter 28/146 - loss 0.15169848 - time (sec): 17.93 - samples/sec: 529.01 - lr: 0.000064 - momentum: 0.000000
2023-10-11 00:24:25,559 epoch 7 - iter 42/146 - loss 0.15112913 - time (sec): 26.50 - samples/sec: 514.72 - lr: 0.000062 - momentum: 0.000000
2023-10-11 00:24:33,559 epoch 7 - iter 56/146 - loss 0.14375947 - time (sec): 34.50 - samples/sec: 505.96 - lr: 0.000061 - momentum: 0.000000
2023-10-11 00:24:41,851 epoch 7 - iter 70/146 - loss 0.14191662 - time (sec): 42.80 - samples/sec: 502.65 - lr: 0.000059 - momentum: 0.000000
2023-10-11 00:24:49,764 epoch 7 - iter 84/146 - loss 0.14733674 - time (sec): 50.71 - samples/sec: 499.78 - lr: 0.000058 - momentum: 0.000000
2023-10-11 00:24:58,428 epoch 7 - iter 98/146 - loss 0.15209724 - time (sec): 59.37 - samples/sec: 503.09 - lr: 0.000056 - momentum: 0.000000
2023-10-11 00:25:06,268 epoch 7 - iter 112/146 - loss 0.15110304 - time (sec): 67.21 - samples/sec: 493.92 - lr: 0.000054 - momentum: 0.000000
2023-10-11 00:25:15,356 epoch 7 - iter 126/146 - loss 0.15315807 - time (sec): 76.30 - samples/sec: 498.24 - lr: 0.000053 - momentum: 0.000000
2023-10-11 00:25:24,258 epoch 7 - iter 140/146 - loss 0.15339801 - time (sec): 85.20 - samples/sec: 504.10 - lr: 0.000051 - momentum: 0.000000
2023-10-11 00:25:27,449 ----------------------------------------------------------------------------------------------------
2023-10-11 00:25:27,450 EPOCH 7 done: loss 0.1525 - lr: 0.000051
2023-10-11 00:25:33,160 DEV : loss 0.1568579375743866 - f1-score (micro avg) 0.6026
2023-10-11 00:25:33,170 saving best model
2023-10-11 00:25:39,402 ----------------------------------------------------------------------------------------------------
2023-10-11 00:25:48,695 epoch 8 - iter 14/146 - loss 0.14470409 - time (sec): 9.29 - samples/sec: 565.87 - lr: 0.000049 - momentum: 0.000000
2023-10-11 00:25:56,856 epoch 8 - iter 28/146 - loss 0.15466879 - time (sec): 17.45 - samples/sec: 513.07 - lr: 0.000047 - momentum: 0.000000
2023-10-11 00:26:05,090 epoch 8 - iter 42/146 - loss 0.14556756 - time (sec): 25.68 - samples/sec: 500.74 - lr: 0.000046 - momentum: 0.000000
2023-10-11 00:26:13,757 epoch 8 - iter 56/146 - loss 0.14562604 - time (sec): 34.35 - samples/sec: 497.81 - lr: 0.000044 - momentum: 0.000000
2023-10-11 00:26:22,674 epoch 8 - iter 70/146 - loss 0.14744312 - time (sec): 43.27 - samples/sec: 498.29 - lr: 0.000043 - momentum: 0.000000
2023-10-11 00:26:31,283 epoch 8 - iter 84/146 - loss 0.14623450 - time (sec): 51.88 - samples/sec: 486.66 - lr: 0.000041 - momentum: 0.000000
2023-10-11 00:26:40,567 epoch 8 - iter 98/146 - loss 0.13910467 - time (sec): 61.16 - samples/sec: 479.51 - lr: 0.000039 - momentum: 0.000000
2023-10-11 00:26:50,125 epoch 8 - iter 112/146 - loss 0.13334599 - time (sec): 70.72 - samples/sec: 476.83 - lr: 0.000038 - momentum: 0.000000
2023-10-11 00:26:59,926 epoch 8 - iter 126/146 - loss 0.12965286 - time (sec): 80.52 - samples/sec: 473.91 - lr: 0.000036 - momentum: 0.000000
2023-10-11 00:27:09,581 epoch 8 - iter 140/146 - loss 0.12939577 - time (sec): 90.18 - samples/sec: 471.19 - lr: 0.000035 - momentum: 0.000000
2023-10-11 00:27:13,663 ----------------------------------------------------------------------------------------------------
2023-10-11 00:27:13,664 EPOCH 8 done: loss 0.1293 - lr: 0.000035
2023-10-11 00:27:20,336 DEV : loss 0.14915454387664795 - f1-score (micro avg) 0.6711
2023-10-11 00:27:20,346 saving best model
2023-10-11 00:27:26,639 ----------------------------------------------------------------------------------------------------
2023-10-11 00:27:35,810 epoch 9 - iter 14/146 - loss 0.14480468 - time (sec): 9.17 - samples/sec: 512.79 - lr: 0.000032 - momentum: 0.000000
2023-10-11 00:27:44,972 epoch 9 - iter 28/146 - loss 0.12111733 - time (sec): 18.33 - samples/sec: 508.59 - lr: 0.000031 - momentum: 0.000000
2023-10-11 00:27:53,282 epoch 9 - iter 42/146 - loss 0.11551154 - time (sec): 26.64 - samples/sec: 494.12 - lr: 0.000029 - momentum: 0.000000
2023-10-11 00:28:02,178 epoch 9 - iter 56/146 - loss 0.11450421 - time (sec): 35.54 - samples/sec: 496.91 - lr: 0.000028 - momentum: 0.000000
2023-10-11 00:28:11,315 epoch 9 - iter 70/146 - loss 0.11627392 - time (sec): 44.67 - samples/sec: 491.31 - lr: 0.000026 - momentum: 0.000000
2023-10-11 00:28:20,239 epoch 9 - iter 84/146 - loss 0.11633930 - time (sec): 53.60 - samples/sec: 491.19 - lr: 0.000024 - momentum: 0.000000
2023-10-11 00:28:28,971 epoch 9 - iter 98/146 - loss 0.11323542 - time (sec): 62.33 - samples/sec: 488.62 - lr: 0.000023 - momentum: 0.000000
2023-10-11 00:28:37,804 epoch 9 - iter 112/146 - loss 0.10890718 - time (sec): 71.16 - samples/sec: 488.50 - lr: 0.000021 - momentum: 0.000000
2023-10-11 00:28:46,797 epoch 9 - iter 126/146 - loss 0.11265525 - time (sec): 80.15 - samples/sec: 487.63 - lr: 0.000020 - momentum: 0.000000
2023-10-11 00:28:55,411 epoch 9 - iter 140/146 - loss 0.11540303 - time (sec): 88.77 - samples/sec: 485.35 - lr: 0.000018 - momentum: 0.000000
2023-10-11 00:28:58,607 ----------------------------------------------------------------------------------------------------
2023-10-11 00:28:58,607 EPOCH 9 done: loss 0.1148 - lr: 0.000018
2023-10-11 00:29:04,628 DEV : loss 0.15014490485191345 - f1-score (micro avg) 0.7097
2023-10-11 00:29:04,638 saving best model
2023-10-11 00:29:10,636 ----------------------------------------------------------------------------------------------------
2023-10-11 00:29:19,540 epoch 10 - iter 14/146 - loss 0.11532110 - time (sec): 8.90 - samples/sec: 515.52 - lr: 0.000016 - momentum: 0.000000
2023-10-11 00:29:28,674 epoch 10 - iter 28/146 - loss 0.11738693 - time (sec): 18.03 - samples/sec: 506.38 - lr: 0.000014 - momentum: 0.000000
2023-10-11 00:29:37,812 epoch 10 - iter 42/146 - loss 0.11927845 - time (sec): 27.17 - samples/sec: 512.69 - lr: 0.000013 - momentum: 0.000000
2023-10-11 00:29:47,596 epoch 10 - iter 56/146 - loss 0.11248663 - time (sec): 36.96 - samples/sec: 504.22 - lr: 0.000011 - momentum: 0.000000
2023-10-11 00:29:56,837 epoch 10 - iter 70/146 - loss 0.11378977 - time (sec): 46.20 - samples/sec: 489.68 - lr: 0.000009 - momentum: 0.000000
2023-10-11 00:30:06,496 epoch 10 - iter 84/146 - loss 0.10860430 - time (sec): 55.86 - samples/sec: 483.19 - lr: 0.000008 - momentum: 0.000000
2023-10-11 00:30:15,171 epoch 10 - iter 98/146 - loss 0.10596910 - time (sec): 64.53 - samples/sec: 467.25 - lr: 0.000006 - momentum: 0.000000
2023-10-11 00:30:25,050 epoch 10 - iter 112/146 - loss 0.10933142 - time (sec): 74.41 - samples/sec: 465.68 - lr: 0.000005 - momentum: 0.000000
2023-10-11 00:30:34,564 epoch 10 - iter 126/146 - loss 0.10650302 - time (sec): 83.92 - samples/sec: 460.29 - lr: 0.000003 - momentum: 0.000000
2023-10-11 00:30:44,129 epoch 10 - iter 140/146 - loss 0.10893081 - time (sec): 93.49 - samples/sec: 456.67 - lr: 0.000001 - momentum: 0.000000
2023-10-11 00:30:48,031 ----------------------------------------------------------------------------------------------------
2023-10-11 00:30:48,032 EPOCH 10 done: loss 0.1087 - lr: 0.000001
2023-10-11 00:30:53,773 DEV : loss 0.15238186717033386 - f1-score (micro avg) 0.7229
2023-10-11 00:30:53,782 saving best model
2023-10-11 00:31:00,790 ----------------------------------------------------------------------------------------------------
2023-10-11 00:31:00,792 Loading model from best epoch ...
2023-10-11 00:31:04,651 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
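The 17 tags listed here are the BIOES expansion of the four HIPE entity types (LOC, PER, ORG, HumanProd) plus the outside tag: 4 types x 4 positional prefixes + O = 17, which matches out_features=17 of the linear head in the architecture dump. An illustrative plain-Python check (not part of the original run):

```python
# 4 entity types x 4 BIOES prefixes + "O" = 17 classes
entity_types = ["LOC", "PER", "ORG", "HumanProd"]
tags = ["O"] + [f"{prefix}-{t}" for t in entity_types for prefix in ("S", "B", "E", "I")]
assert len(tags) == 17
```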
2023-10-11 00:31:16,872
Results:
- F-score (micro) 0.7015
- F-score (macro) 0.6099
- Accuracy 0.5632
By class:
              precision    recall  f1-score   support

         PER     0.7821    0.8046    0.7932       348
         LOC     0.5766    0.7931    0.6677       261
         ORG     0.2982    0.3269    0.3119        52
   HumanProd     0.7647    0.5909    0.6667        22

   micro avg     0.6536    0.7570    0.7015       683
   macro avg     0.6054    0.6289    0.6099       683
weighted avg     0.6662    0.7570    0.7045       683
2023-10-11 00:31:16,872 ----------------------------------------------------------------------------------------------------
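The saved best-model.pt can be loaded for inference with the standard Flair API. A hedged sketch; the checkpoint path is the training base path from the log plus "best-model.pt", and the example sentence is a placeholder:

```python
# Sketch: load the trained tagger and tag a sentence.
from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load(
    "hmbench-newseye/fi-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-3/best-model.pt"
)

sentence = Sentence("Helsingin Sanomat kirjoitti Mannerheimista .")  # placeholder text
tagger.predict(sentence)

for span in sentence.get_spans("ner"):
    print(span)  # prints the span text and its predicted label
```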