2023-10-12 06:24:15,769 ----------------------------------------------------------------------------------------------------
2023-10-12 06:24:15,771 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-12 06:24:15,772 ----------------------------------------------------------------------------------------------------
2023-10-12 06:24:15,772 MultiCorpus: 20847 train + 1123 dev + 3350 test sentences
- NER_HIPE_2022 Corpus: 20847 train + 1123 dev + 3350 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/de/with_doc_seperator
2023-10-12 06:24:15,772 ----------------------------------------------------------------------------------------------------
2023-10-12 06:24:15,772 Train: 20847 sentences
2023-10-12 06:24:15,772 (train_with_dev=False, train_with_test=False)
2023-10-12 06:24:15,772 ----------------------------------------------------------------------------------------------------
2023-10-12 06:24:15,772 Training Params:
2023-10-12 06:24:15,772 - learning_rate: "0.00016"
2023-10-12 06:24:15,772 - mini_batch_size: "4"
2023-10-12 06:24:15,772 - max_epochs: "10"
2023-10-12 06:24:15,773 - shuffle: "True"
2023-10-12 06:24:15,773 ----------------------------------------------------------------------------------------------------
2023-10-12 06:24:15,773 Plugins:
2023-10-12 06:24:15,773 - TensorboardLogger
2023-10-12 06:24:15,773 - LinearScheduler | warmup_fraction: '0.1'
2023-10-12 06:24:15,773 ----------------------------------------------------------------------------------------------------
2023-10-12 06:24:15,773 Final evaluation on model from best epoch (best-model.pt)
2023-10-12 06:24:15,773 - metric: "('micro avg', 'f1-score')"
2023-10-12 06:24:15,773 ----------------------------------------------------------------------------------------------------
2023-10-12 06:24:15,773 Computation:
2023-10-12 06:24:15,773 - compute on device: cuda:0
2023-10-12 06:24:15,773 - embedding storage: none
2023-10-12 06:24:15,773 ----------------------------------------------------------------------------------------------------
2023-10-12 06:24:15,773 Model training base path: "hmbench-newseye/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-4"
2023-10-12 06:24:15,774 ----------------------------------------------------------------------------------------------------
2023-10-12 06:24:15,774 ----------------------------------------------------------------------------------------------------
2023-10-12 06:24:15,774 Logging anything other than scalars to TensorBoard is currently not supported.
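The learning rates printed in the iteration lines below follow the LinearScheduler configured above: linear warmup over the first 10% of all optimizer steps (warmup_fraction 0.1) up to the peak learning rate 0.00016, then linear decay to zero. A minimal sketch of that schedule (an assumption reconstructed from the logged values, not Flair's exact implementation):

```python
# Sketch of the LinearScheduler with warmup_fraction=0.1 (assumed shape,
# reconstructed from the logged lr values): linear warmup to the peak lr
# over the first 10% of steps, then linear decay to zero.

def linear_schedule_lr(step, total_steps, peak_lr, warmup_fraction=0.1):
    """Learning rate at a given optimizer step (1-indexed)."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step <= warmup_steps:
        return peak_lr * step / warmup_steps
    # linear decay from peak_lr down to 0 over the remaining steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

# 5212 batches/epoch x 10 epochs, peak lr 0.00016 (from Training Params above)
total = 5212 * 10
peak = 0.00016

print(round(linear_schedule_lr(521, total, peak), 6))    # matches "lr: 0.000016" at epoch 1, iter 521
print(round(linear_schedule_lr(10424, total, peak), 6))  # matches "lr: 0.000142" at the end of epoch 2
```

With warmup_fraction 0.1 and 10 epochs, warmup spans exactly the first epoch, which is why the lr peaks at 0.000160 when epoch 1 finishes and reaches 0 at the final iteration of epoch 10.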
2023-10-12 06:26:32,674 epoch 1 - iter 521/5212 - loss 2.79144553 - time (sec): 136.90 - samples/sec: 242.00 - lr: 0.000016 - momentum: 0.000000
2023-10-12 06:28:51,123 epoch 1 - iter 1042/5212 - loss 2.32825107 - time (sec): 275.35 - samples/sec: 246.67 - lr: 0.000032 - momentum: 0.000000
2023-10-12 06:31:13,745 epoch 1 - iter 1563/5212 - loss 1.76882170 - time (sec): 417.97 - samples/sec: 251.47 - lr: 0.000048 - momentum: 0.000000
2023-10-12 06:33:39,975 epoch 1 - iter 2084/5212 - loss 1.42611448 - time (sec): 564.20 - samples/sec: 251.68 - lr: 0.000064 - momentum: 0.000000
2023-10-12 06:36:10,203 epoch 1 - iter 2605/5212 - loss 1.21917708 - time (sec): 714.43 - samples/sec: 252.03 - lr: 0.000080 - momentum: 0.000000
2023-10-12 06:38:38,555 epoch 1 - iter 3126/5212 - loss 1.07781127 - time (sec): 862.78 - samples/sec: 249.33 - lr: 0.000096 - momentum: 0.000000
2023-10-12 06:41:11,145 epoch 1 - iter 3647/5212 - loss 0.96200152 - time (sec): 1015.37 - samples/sec: 249.54 - lr: 0.000112 - momentum: 0.000000
2023-10-12 06:43:28,559 epoch 1 - iter 4168/5212 - loss 0.87230031 - time (sec): 1152.78 - samples/sec: 251.22 - lr: 0.000128 - momentum: 0.000000
2023-10-12 06:45:46,933 epoch 1 - iter 4689/5212 - loss 0.79145949 - time (sec): 1291.16 - samples/sec: 255.19 - lr: 0.000144 - momentum: 0.000000
2023-10-12 06:48:04,832 epoch 1 - iter 5210/5212 - loss 0.72825829 - time (sec): 1429.06 - samples/sec: 256.96 - lr: 0.000160 - momentum: 0.000000
2023-10-12 06:48:05,387 ----------------------------------------------------------------------------------------------------
2023-10-12 06:48:05,387 EPOCH 1 done: loss 0.7279 - lr: 0.000160
2023-10-12 06:48:40,151 DEV : loss 0.12137877196073532 - f1-score (micro avg) 0.2683
2023-10-12 06:48:40,200 saving best model
2023-10-12 06:48:41,028 ----------------------------------------------------------------------------------------------------
2023-10-12 06:50:57,738 epoch 2 - iter 521/5212 - loss 0.16649058 - time (sec): 136.71 - samples/sec: 265.51 - lr: 0.000158 - momentum: 0.000000
2023-10-12 06:53:18,013 epoch 2 - iter 1042/5212 - loss 0.15471879 - time (sec): 276.98 - samples/sec: 267.97 - lr: 0.000156 - momentum: 0.000000
2023-10-12 06:55:29,293 epoch 2 - iter 1563/5212 - loss 0.15616420 - time (sec): 408.26 - samples/sec: 267.36 - lr: 0.000155 - momentum: 0.000000
2023-10-12 06:57:48,778 epoch 2 - iter 2084/5212 - loss 0.15588028 - time (sec): 547.75 - samples/sec: 270.67 - lr: 0.000153 - momentum: 0.000000
2023-10-12 07:00:07,196 epoch 2 - iter 2605/5212 - loss 0.15245153 - time (sec): 686.17 - samples/sec: 269.12 - lr: 0.000151 - momentum: 0.000000
2023-10-12 07:02:22,055 epoch 2 - iter 3126/5212 - loss 0.15216461 - time (sec): 821.03 - samples/sec: 265.80 - lr: 0.000149 - momentum: 0.000000
2023-10-12 07:04:35,943 epoch 2 - iter 3647/5212 - loss 0.15284241 - time (sec): 954.91 - samples/sec: 262.54 - lr: 0.000148 - momentum: 0.000000
2023-10-12 07:06:55,423 epoch 2 - iter 4168/5212 - loss 0.14914771 - time (sec): 1094.39 - samples/sec: 264.14 - lr: 0.000146 - momentum: 0.000000
2023-10-12 07:09:18,351 epoch 2 - iter 4689/5212 - loss 0.14565854 - time (sec): 1237.32 - samples/sec: 266.98 - lr: 0.000144 - momentum: 0.000000
2023-10-12 07:11:33,790 epoch 2 - iter 5210/5212 - loss 0.14406346 - time (sec): 1372.76 - samples/sec: 267.60 - lr: 0.000142 - momentum: 0.000000
2023-10-12 07:11:34,209 ----------------------------------------------------------------------------------------------------
2023-10-12 07:11:34,209 EPOCH 2 done: loss 0.1441 - lr: 0.000142
2023-10-12 07:12:11,927 DEV : loss 0.1253765970468521 - f1-score (micro avg) 0.3506
2023-10-12 07:12:11,979 saving best model
2023-10-12 07:12:14,555 ----------------------------------------------------------------------------------------------------
2023-10-12 07:14:35,479 epoch 3 - iter 521/5212 - loss 0.09734501 - time (sec): 140.92 - samples/sec: 245.08 - lr: 0.000140 - momentum: 0.000000
2023-10-12 07:16:52,067 epoch 3 - iter 1042/5212 - loss 0.09555292 - time (sec): 277.51 - samples/sec: 243.06 - lr: 0.000139 - momentum: 0.000000
2023-10-12 07:19:15,804 epoch 3 - iter 1563/5212 - loss 0.09971065 - time (sec): 421.24 - samples/sec: 257.63 - lr: 0.000137 - momentum: 0.000000
2023-10-12 07:21:33,165 epoch 3 - iter 2084/5212 - loss 0.09886889 - time (sec): 558.61 - samples/sec: 257.86 - lr: 0.000135 - momentum: 0.000000
2023-10-12 07:23:51,596 epoch 3 - iter 2605/5212 - loss 0.09840080 - time (sec): 697.04 - samples/sec: 256.67 - lr: 0.000133 - momentum: 0.000000
2023-10-12 07:26:18,620 epoch 3 - iter 3126/5212 - loss 0.09591881 - time (sec): 844.06 - samples/sec: 259.68 - lr: 0.000132 - momentum: 0.000000
2023-10-12 07:28:46,160 epoch 3 - iter 3647/5212 - loss 0.09702574 - time (sec): 991.60 - samples/sec: 261.92 - lr: 0.000130 - momentum: 0.000000
2023-10-12 07:31:05,439 epoch 3 - iter 4168/5212 - loss 0.09812998 - time (sec): 1130.88 - samples/sec: 258.75 - lr: 0.000128 - momentum: 0.000000
2023-10-12 07:33:30,593 epoch 3 - iter 4689/5212 - loss 0.10064749 - time (sec): 1276.03 - samples/sec: 258.23 - lr: 0.000126 - momentum: 0.000000
2023-10-12 07:35:54,377 epoch 3 - iter 5210/5212 - loss 0.09998955 - time (sec): 1419.82 - samples/sec: 258.66 - lr: 0.000124 - momentum: 0.000000
2023-10-12 07:35:54,900 ----------------------------------------------------------------------------------------------------
2023-10-12 07:35:54,901 EPOCH 3 done: loss 0.1000 - lr: 0.000124
2023-10-12 07:36:33,027 DEV : loss 0.2415073961019516 - f1-score (micro avg) 0.3658
2023-10-12 07:36:33,078 saving best model
2023-10-12 07:36:35,643 ----------------------------------------------------------------------------------------------------
2023-10-12 07:38:56,125 epoch 4 - iter 521/5212 - loss 0.06307318 - time (sec): 140.48 - samples/sec: 257.95 - lr: 0.000123 - momentum: 0.000000
2023-10-12 07:41:17,505 epoch 4 - iter 1042/5212 - loss 0.06517042 - time (sec): 281.86 - samples/sec: 263.32 - lr: 0.000121 - momentum: 0.000000
2023-10-12 07:43:40,499 epoch 4 - iter 1563/5212 - loss 0.06318112 - time (sec): 424.85 - samples/sec: 265.53 - lr: 0.000119 - momentum: 0.000000
2023-10-12 07:46:02,286 epoch 4 - iter 2084/5212 - loss 0.06508855 - time (sec): 566.64 - samples/sec: 265.94 - lr: 0.000117 - momentum: 0.000000
2023-10-12 07:48:22,695 epoch 4 - iter 2605/5212 - loss 0.06477669 - time (sec): 707.05 - samples/sec: 263.75 - lr: 0.000116 - momentum: 0.000000
2023-10-12 07:50:43,698 epoch 4 - iter 3126/5212 - loss 0.06348635 - time (sec): 848.05 - samples/sec: 264.37 - lr: 0.000114 - momentum: 0.000000
2023-10-12 07:53:11,823 epoch 4 - iter 3647/5212 - loss 0.06400079 - time (sec): 996.18 - samples/sec: 261.69 - lr: 0.000112 - momentum: 0.000000
2023-10-12 07:55:39,211 epoch 4 - iter 4168/5212 - loss 0.06527752 - time (sec): 1143.56 - samples/sec: 258.12 - lr: 0.000110 - momentum: 0.000000
2023-10-12 07:58:13,334 epoch 4 - iter 4689/5212 - loss 0.06466237 - time (sec): 1297.69 - samples/sec: 256.36 - lr: 0.000108 - momentum: 0.000000
2023-10-12 08:00:43,967 epoch 4 - iter 5210/5212 - loss 0.06493695 - time (sec): 1448.32 - samples/sec: 253.64 - lr: 0.000107 - momentum: 0.000000
2023-10-12 08:00:44,441 ----------------------------------------------------------------------------------------------------
2023-10-12 08:00:44,441 EPOCH 4 done: loss 0.0649 - lr: 0.000107
2023-10-12 08:01:24,860 DEV : loss 0.3256777822971344 - f1-score (micro avg) 0.3561
2023-10-12 08:01:24,915 ----------------------------------------------------------------------------------------------------
2023-10-12 08:03:55,582 epoch 5 - iter 521/5212 - loss 0.03739825 - time (sec): 150.66 - samples/sec: 240.43 - lr: 0.000105 - momentum: 0.000000
2023-10-12 08:06:27,648 epoch 5 - iter 1042/5212 - loss 0.04090493 - time (sec): 302.73 - samples/sec: 239.94 - lr: 0.000103 - momentum: 0.000000
2023-10-12 08:08:54,401 epoch 5 - iter 1563/5212 - loss 0.04076288 - time (sec): 449.48 - samples/sec: 237.59 - lr: 0.000101 - momentum: 0.000000
2023-10-12 08:11:24,674 epoch 5 - iter 2084/5212 - loss 0.04217115 - time (sec): 599.76 - samples/sec: 242.15 - lr: 0.000100 - momentum: 0.000000
2023-10-12 08:13:53,967 epoch 5 - iter 2605/5212 - loss 0.04382995 - time (sec): 749.05 - samples/sec: 240.00 - lr: 0.000098 - momentum: 0.000000
2023-10-12 08:16:24,187 epoch 5 - iter 3126/5212 - loss 0.04433151 - time (sec): 899.27 - samples/sec: 241.23 - lr: 0.000096 - momentum: 0.000000
2023-10-12 08:18:53,633 epoch 5 - iter 3647/5212 - loss 0.04289213 - time (sec): 1048.72 - samples/sec: 243.62 - lr: 0.000094 - momentum: 0.000000
2023-10-12 08:21:24,409 epoch 5 - iter 4168/5212 - loss 0.04379848 - time (sec): 1199.49 - samples/sec: 244.00 - lr: 0.000092 - momentum: 0.000000
2023-10-12 08:23:52,868 epoch 5 - iter 4689/5212 - loss 0.04476832 - time (sec): 1347.95 - samples/sec: 243.72 - lr: 0.000091 - momentum: 0.000000
2023-10-12 08:26:28,783 epoch 5 - iter 5210/5212 - loss 0.04555013 - time (sec): 1503.87 - samples/sec: 244.28 - lr: 0.000089 - momentum: 0.000000
2023-10-12 08:26:29,210 ----------------------------------------------------------------------------------------------------
2023-10-12 08:26:29,211 EPOCH 5 done: loss 0.0456 - lr: 0.000089
2023-10-12 08:27:10,531 DEV : loss 0.263163298368454 - f1-score (micro avg) 0.4003
2023-10-12 08:27:10,587 saving best model
2023-10-12 08:27:13,284 ----------------------------------------------------------------------------------------------------
2023-10-12 08:29:48,146 epoch 6 - iter 521/5212 - loss 0.03123258 - time (sec): 154.86 - samples/sec: 246.96 - lr: 0.000087 - momentum: 0.000000
2023-10-12 08:32:19,404 epoch 6 - iter 1042/5212 - loss 0.02842938 - time (sec): 306.12 - samples/sec: 248.85 - lr: 0.000085 - momentum: 0.000000
2023-10-12 08:34:49,046 epoch 6 - iter 1563/5212 - loss 0.02734753 - time (sec): 455.76 - samples/sec: 245.42 - lr: 0.000084 - momentum: 0.000000
2023-10-12 08:37:18,531 epoch 6 - iter 2084/5212 - loss 0.02823876 - time (sec): 605.24 - samples/sec: 243.11 - lr: 0.000082 - momentum: 0.000000
2023-10-12 08:39:53,567 epoch 6 - iter 2605/5212 - loss 0.02786838 - time (sec): 760.28 - samples/sec: 245.30 - lr: 0.000080 - momentum: 0.000000
2023-10-12 08:42:25,382 epoch 6 - iter 3126/5212 - loss 0.02839744 - time (sec): 912.09 - samples/sec: 245.74 - lr: 0.000078 - momentum: 0.000000
2023-10-12 08:44:57,149 epoch 6 - iter 3647/5212 - loss 0.02951283 - time (sec): 1063.86 - samples/sec: 244.21 - lr: 0.000076 - momentum: 0.000000
2023-10-12 08:47:28,472 epoch 6 - iter 4168/5212 - loss 0.02979875 - time (sec): 1215.18 - samples/sec: 243.16 - lr: 0.000075 - momentum: 0.000000
2023-10-12 08:50:03,415 epoch 6 - iter 4689/5212 - loss 0.03174850 - time (sec): 1370.13 - samples/sec: 242.29 - lr: 0.000073 - momentum: 0.000000
2023-10-12 08:52:35,553 epoch 6 - iter 5210/5212 - loss 0.03173940 - time (sec): 1522.26 - samples/sec: 241.32 - lr: 0.000071 - momentum: 0.000000
2023-10-12 08:52:36,019 ----------------------------------------------------------------------------------------------------
2023-10-12 08:52:36,019 EPOCH 6 done: loss 0.0317 - lr: 0.000071
2023-10-12 08:53:17,512 DEV : loss 0.38352152705192566 - f1-score (micro avg) 0.3891
2023-10-12 08:53:17,573 ----------------------------------------------------------------------------------------------------
2023-10-12 08:55:50,278 epoch 7 - iter 521/5212 - loss 0.02027192 - time (sec): 152.70 - samples/sec: 240.26 - lr: 0.000069 - momentum: 0.000000
2023-10-12 08:58:23,318 epoch 7 - iter 1042/5212 - loss 0.02172861 - time (sec): 305.74 - samples/sec: 250.12 - lr: 0.000068 - momentum: 0.000000
2023-10-12 09:00:54,230 epoch 7 - iter 1563/5212 - loss 0.02271727 - time (sec): 456.65 - samples/sec: 244.80 - lr: 0.000066 - momentum: 0.000000
2023-10-12 09:03:26,584 epoch 7 - iter 2084/5212 - loss 0.02225368 - time (sec): 609.01 - samples/sec: 245.02 - lr: 0.000064 - momentum: 0.000000
2023-10-12 09:05:56,723 epoch 7 - iter 2605/5212 - loss 0.02220231 - time (sec): 759.15 - samples/sec: 244.06 - lr: 0.000062 - momentum: 0.000000
2023-10-12 09:08:30,667 epoch 7 - iter 3126/5212 - loss 0.02190183 - time (sec): 913.09 - samples/sec: 244.41 - lr: 0.000060 - momentum: 0.000000
2023-10-12 09:11:01,507 epoch 7 - iter 3647/5212 - loss 0.02278464 - time (sec): 1063.93 - samples/sec: 242.59 - lr: 0.000059 - momentum: 0.000000
2023-10-12 09:13:32,093 epoch 7 - iter 4168/5212 - loss 0.02211897 - time (sec): 1214.52 - samples/sec: 243.57 - lr: 0.000057 - momentum: 0.000000
2023-10-12 09:16:02,927 epoch 7 - iter 4689/5212 - loss 0.02201655 - time (sec): 1365.35 - samples/sec: 242.68 - lr: 0.000055 - momentum: 0.000000
2023-10-12 09:18:33,963 epoch 7 - iter 5210/5212 - loss 0.02167740 - time (sec): 1516.39 - samples/sec: 242.23 - lr: 0.000053 - momentum: 0.000000
2023-10-12 09:18:34,449 ----------------------------------------------------------------------------------------------------
2023-10-12 09:18:34,449 EPOCH 7 done: loss 0.0217 - lr: 0.000053
2023-10-12 09:19:15,575 DEV : loss 0.4565373957157135 - f1-score (micro avg) 0.3855
2023-10-12 09:19:15,632 ----------------------------------------------------------------------------------------------------
2023-10-12 09:21:49,016 epoch 8 - iter 521/5212 - loss 0.01518849 - time (sec): 153.38 - samples/sec: 242.58 - lr: 0.000052 - momentum: 0.000000
2023-10-12 09:24:21,066 epoch 8 - iter 1042/5212 - loss 0.01802269 - time (sec): 305.43 - samples/sec: 249.30 - lr: 0.000050 - momentum: 0.000000
2023-10-12 09:26:55,977 epoch 8 - iter 1563/5212 - loss 0.01788164 - time (sec): 460.34 - samples/sec: 255.33 - lr: 0.000048 - momentum: 0.000000
2023-10-12 09:29:26,001 epoch 8 - iter 2084/5212 - loss 0.01803102 - time (sec): 610.37 - samples/sec: 251.49 - lr: 0.000046 - momentum: 0.000000
2023-10-12 09:31:54,221 epoch 8 - iter 2605/5212 - loss 0.01787651 - time (sec): 758.59 - samples/sec: 248.40 - lr: 0.000044 - momentum: 0.000000
2023-10-12 09:34:22,255 epoch 8 - iter 3126/5212 - loss 0.01738394 - time (sec): 906.62 - samples/sec: 245.47 - lr: 0.000043 - momentum: 0.000000
2023-10-12 09:36:52,062 epoch 8 - iter 3647/5212 - loss 0.01601736 - time (sec): 1056.43 - samples/sec: 243.28 - lr: 0.000041 - momentum: 0.000000
2023-10-12 09:39:21,737 epoch 8 - iter 4168/5212 - loss 0.01630703 - time (sec): 1206.10 - samples/sec: 243.03 - lr: 0.000039 - momentum: 0.000000
2023-10-12 09:41:51,186 epoch 8 - iter 4689/5212 - loss 0.01592637 - time (sec): 1355.55 - samples/sec: 242.64 - lr: 0.000037 - momentum: 0.000000
2023-10-12 09:44:24,639 epoch 8 - iter 5210/5212 - loss 0.01658403 - time (sec): 1509.00 - samples/sec: 243.45 - lr: 0.000036 - momentum: 0.000000
2023-10-12 09:44:25,102 ----------------------------------------------------------------------------------------------------
2023-10-12 09:44:25,102 EPOCH 8 done: loss 0.0166 - lr: 0.000036
2023-10-12 09:45:06,697 DEV : loss 0.47024932503700256 - f1-score (micro avg) 0.3874
2023-10-12 09:45:06,754 ----------------------------------------------------------------------------------------------------
2023-10-12 09:47:40,905 epoch 9 - iter 521/5212 - loss 0.00960047 - time (sec): 154.15 - samples/sec: 251.93 - lr: 0.000034 - momentum: 0.000000
2023-10-12 09:50:19,961 epoch 9 - iter 1042/5212 - loss 0.01249724 - time (sec): 313.20 - samples/sec: 245.66 - lr: 0.000032 - momentum: 0.000000
2023-10-12 09:52:56,352 epoch 9 - iter 1563/5212 - loss 0.01171469 - time (sec): 469.60 - samples/sec: 236.58 - lr: 0.000030 - momentum: 0.000000
2023-10-12 09:55:26,979 epoch 9 - iter 2084/5212 - loss 0.01188106 - time (sec): 620.22 - samples/sec: 235.40 - lr: 0.000028 - momentum: 0.000000
2023-10-12 09:57:57,662 epoch 9 - iter 2605/5212 - loss 0.01152806 - time (sec): 770.90 - samples/sec: 237.92 - lr: 0.000027 - momentum: 0.000000
2023-10-12 10:00:21,367 epoch 9 - iter 3126/5212 - loss 0.01157546 - time (sec): 914.61 - samples/sec: 240.46 - lr: 0.000025 - momentum: 0.000000
2023-10-12 10:02:55,245 epoch 9 - iter 3647/5212 - loss 0.01097260 - time (sec): 1068.49 - samples/sec: 242.57 - lr: 0.000023 - momentum: 0.000000
2023-10-12 10:05:24,412 epoch 9 - iter 4168/5212 - loss 0.01044367 - time (sec): 1217.65 - samples/sec: 241.16 - lr: 0.000021 - momentum: 0.000000
2023-10-12 10:07:55,416 epoch 9 - iter 4689/5212 - loss 0.01056193 - time (sec): 1368.66 - samples/sec: 241.20 - lr: 0.000020 - momentum: 0.000000
2023-10-12 10:10:32,837 epoch 9 - iter 5210/5212 - loss 0.01115296 - time (sec): 1526.08 - samples/sec: 240.73 - lr: 0.000018 - momentum: 0.000000
2023-10-12 10:10:33,340 ----------------------------------------------------------------------------------------------------
2023-10-12 10:10:33,340 EPOCH 9 done: loss 0.0112 - lr: 0.000018
2023-10-12 10:11:14,544 DEV : loss 0.46671026945114136 - f1-score (micro avg) 0.3997
2023-10-12 10:11:14,602 ----------------------------------------------------------------------------------------------------
2023-10-12 10:13:42,802 epoch 10 - iter 521/5212 - loss 0.00611141 - time (sec): 148.20 - samples/sec: 241.15 - lr: 0.000016 - momentum: 0.000000
2023-10-12 10:16:14,619 epoch 10 - iter 1042/5212 - loss 0.00568024 - time (sec): 300.02 - samples/sec: 240.82 - lr: 0.000014 - momentum: 0.000000
2023-10-12 10:18:46,930 epoch 10 - iter 1563/5212 - loss 0.00566509 - time (sec): 452.33 - samples/sec: 245.10 - lr: 0.000012 - momentum: 0.000000
2023-10-12 10:21:16,717 epoch 10 - iter 2084/5212 - loss 0.00562846 - time (sec): 602.11 - samples/sec: 244.14 - lr: 0.000011 - momentum: 0.000000
2023-10-12 10:23:46,749 epoch 10 - iter 2605/5212 - loss 0.00614487 - time (sec): 752.15 - samples/sec: 243.23 - lr: 0.000009 - momentum: 0.000000
2023-10-12 10:26:18,183 epoch 10 - iter 3126/5212 - loss 0.00606442 - time (sec): 903.58 - samples/sec: 242.10 - lr: 0.000007 - momentum: 0.000000
2023-10-12 10:28:50,844 epoch 10 - iter 3647/5212 - loss 0.00629741 - time (sec): 1056.24 - samples/sec: 242.68 - lr: 0.000005 - momentum: 0.000000
2023-10-12 10:31:25,229 epoch 10 - iter 4168/5212 - loss 0.00581098 - time (sec): 1210.62 - samples/sec: 243.91 - lr: 0.000004 - momentum: 0.000000
2023-10-12 10:33:56,605 epoch 10 - iter 4689/5212 - loss 0.00597572 - time (sec): 1362.00 - samples/sec: 243.59 - lr: 0.000002 - momentum: 0.000000
2023-10-12 10:36:29,666 epoch 10 - iter 5210/5212 - loss 0.00617301 - time (sec): 1515.06 - samples/sec: 242.46 - lr: 0.000000 - momentum: 0.000000
2023-10-12 10:36:30,159 ----------------------------------------------------------------------------------------------------
2023-10-12 10:36:30,160 EPOCH 10 done: loss 0.0062 - lr: 0.000000
2023-10-12 10:37:12,484 DEV : loss 0.4969789683818817 - f1-score (micro avg) 0.4007
2023-10-12 10:37:12,548 saving best model
2023-10-12 10:37:16,211 ----------------------------------------------------------------------------------------------------
2023-10-12 10:37:16,213 Loading model from best epoch ...
2023-10-12 10:37:20,237 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
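The 17-tag dictionary above is the BIOES scheme over the four NewsEye entity types: one O tag plus S-/B-/E-/I- variants of LOC, PER, ORG, and HumanProd. A small illustrative helper (not part of Flair) that reconstructs it:

```python
# Reconstruct the 17-tag BIOES dictionary printed above (illustrative helper,
# not Flair code): one O tag plus S-/B-/E-/I- variants per entity type.

def bioes_tags(entity_types):
    tags = ["O"]
    for etype in entity_types:
        tags.extend(f"{prefix}-{etype}" for prefix in ("S", "B", "E", "I"))
    return tags

tags = bioes_tags(["LOC", "PER", "ORG", "HumanProd"])
print(len(tags))  # 17, matching the tagger's output layer (out_features=17)
```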
2023-10-12 10:39:01,582
Results:
- F-score (micro) 0.472
- F-score (macro) 0.3229
- Accuracy 0.3142
By class:
              precision    recall  f1-score   support

         LOC     0.4978    0.5601    0.5271      1214
         PER     0.4233    0.5297    0.4706       808
         ORG     0.2930    0.2946    0.2938       353
   HumanProd     0.0000    0.0000    0.0000        15

   micro avg     0.4414    0.5071    0.4720      2390
   macro avg     0.3035    0.3461    0.3229      2390
weighted avg     0.4393    0.5071    0.4702      2390
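The summary rows above are internally consistent: the micro-average F1 is the harmonic mean of micro precision and recall, and the macro-average F1 is the unweighted mean of the per-class F1 scores. A quick sanity check:

```python
# Sanity-check the evaluation table above: micro F1 is the harmonic mean of
# micro precision/recall; macro F1 is the unweighted mean of per-class F1s.

def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

micro_f1 = f1(0.4414, 0.5071)
print(round(micro_f1, 4))  # 0.472, the reported F-score (micro)

per_class_f1 = [0.5271, 0.4706, 0.2938, 0.0000]  # LOC, PER, ORG, HumanProd
macro_f1 = sum(per_class_f1) / len(per_class_f1)
print(round(macro_f1, 4))  # 0.3229, the reported F-score (macro)
```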
2023-10-12 10:39:01,583 ----------------------------------------------------------------------------------------------------