2023-10-11 21:43:48,273 ----------------------------------------------------------------------------------------------------
2023-10-11 21:43:48,275 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-11 21:43:48,275 ----------------------------------------------------------------------------------------------------
2023-10-11 21:43:48,275 MultiCorpus: 5777 train + 722 dev + 723 test sentences
- NER_ICDAR_EUROPEANA Corpus: 5777 train + 722 dev + 723 test sentences - /root/.flair/datasets/ner_icdar_europeana/nl
2023-10-11 21:43:48,275 ----------------------------------------------------------------------------------------------------
2023-10-11 21:43:48,275 Train: 5777 sentences
2023-10-11 21:43:48,275 (train_with_dev=False, train_with_test=False)
2023-10-11 21:43:48,275 ----------------------------------------------------------------------------------------------------
2023-10-11 21:43:48,276 Training Params:
2023-10-11 21:43:48,276 - learning_rate: "0.00015"
2023-10-11 21:43:48,276 - mini_batch_size: "4"
2023-10-11 21:43:48,276 - max_epochs: "10"
2023-10-11 21:43:48,276 - shuffle: "True"
2023-10-11 21:43:48,276 ----------------------------------------------------------------------------------------------------
2023-10-11 21:43:48,276 Plugins:
2023-10-11 21:43:48,276 - TensorboardLogger
2023-10-11 21:43:48,276 - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 21:43:48,276 ----------------------------------------------------------------------------------------------------
2023-10-11 21:43:48,276 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 21:43:48,276 - metric: "('micro avg', 'f1-score')"
2023-10-11 21:43:48,276 ----------------------------------------------------------------------------------------------------
2023-10-11 21:43:48,276 Computation:
2023-10-11 21:43:48,276 - compute on device: cuda:0
2023-10-11 21:43:48,276 - embedding storage: none
2023-10-11 21:43:48,276 ----------------------------------------------------------------------------------------------------
2023-10-11 21:43:48,277 Model training base path: "hmbench-icdar/nl-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-2"
2023-10-11 21:43:48,277 ----------------------------------------------------------------------------------------------------
2023-10-11 21:43:48,277 ----------------------------------------------------------------------------------------------------
2023-10-11 21:43:48,277 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 21:44:31,372 epoch 1 - iter 144/1445 - loss 2.56613189 - time (sec): 43.09 - samples/sec: 431.43 - lr: 0.000015 - momentum: 0.000000
2023-10-11 21:45:13,718 epoch 1 - iter 288/1445 - loss 2.45511613 - time (sec): 85.44 - samples/sec: 417.54 - lr: 0.000030 - momentum: 0.000000
2023-10-11 21:46:03,679 epoch 1 - iter 432/1445 - loss 2.18917637 - time (sec): 135.40 - samples/sec: 398.18 - lr: 0.000045 - momentum: 0.000000
2023-10-11 21:46:51,899 epoch 1 - iter 576/1445 - loss 1.90175372 - time (sec): 183.62 - samples/sec: 388.46 - lr: 0.000060 - momentum: 0.000000
2023-10-11 21:47:35,925 epoch 1 - iter 720/1445 - loss 1.62081885 - time (sec): 227.65 - samples/sec: 391.46 - lr: 0.000075 - momentum: 0.000000
2023-10-11 21:48:19,686 epoch 1 - iter 864/1445 - loss 1.39933345 - time (sec): 271.41 - samples/sec: 393.50 - lr: 0.000090 - momentum: 0.000000
2023-10-11 21:49:04,987 epoch 1 - iter 1008/1445 - loss 1.23677883 - time (sec): 316.71 - samples/sec: 393.63 - lr: 0.000105 - momentum: 0.000000
2023-10-11 21:49:52,040 epoch 1 - iter 1152/1445 - loss 1.11004262 - time (sec): 363.76 - samples/sec: 389.52 - lr: 0.000119 - momentum: 0.000000
2023-10-11 21:50:38,645 epoch 1 - iter 1296/1445 - loss 1.00603691 - time (sec): 410.37 - samples/sec: 388.26 - lr: 0.000134 - momentum: 0.000000
2023-10-11 21:51:22,208 epoch 1 - iter 1440/1445 - loss 0.92641471 - time (sec): 453.93 - samples/sec: 387.25 - lr: 0.000149 - momentum: 0.000000
2023-10-11 21:51:23,376 ----------------------------------------------------------------------------------------------------
2023-10-11 21:51:23,376 EPOCH 1 done: loss 0.9247 - lr: 0.000149
2023-10-11 21:51:43,816 DEV : loss 0.18985705077648163 - f1-score (micro avg) 0.3741
2023-10-11 21:51:43,847 saving best model
2023-10-11 21:51:44,770 ----------------------------------------------------------------------------------------------------
2023-10-11 21:52:27,449 epoch 2 - iter 144/1445 - loss 0.14481160 - time (sec): 42.68 - samples/sec: 399.05 - lr: 0.000148 - momentum: 0.000000
2023-10-11 21:53:11,274 epoch 2 - iter 288/1445 - loss 0.14361979 - time (sec): 86.50 - samples/sec: 401.10 - lr: 0.000147 - momentum: 0.000000
2023-10-11 21:53:54,180 epoch 2 - iter 432/1445 - loss 0.13638655 - time (sec): 129.41 - samples/sec: 409.63 - lr: 0.000145 - momentum: 0.000000
2023-10-11 21:54:37,286 epoch 2 - iter 576/1445 - loss 0.13335780 - time (sec): 172.51 - samples/sec: 412.26 - lr: 0.000143 - momentum: 0.000000
2023-10-11 21:55:21,432 epoch 2 - iter 720/1445 - loss 0.12713278 - time (sec): 216.66 - samples/sec: 409.48 - lr: 0.000142 - momentum: 0.000000
2023-10-11 21:56:03,104 epoch 2 - iter 864/1445 - loss 0.12400258 - time (sec): 258.33 - samples/sec: 409.74 - lr: 0.000140 - momentum: 0.000000
2023-10-11 21:56:47,352 epoch 2 - iter 1008/1445 - loss 0.12005883 - time (sec): 302.58 - samples/sec: 407.29 - lr: 0.000138 - momentum: 0.000000
2023-10-11 21:57:31,108 epoch 2 - iter 1152/1445 - loss 0.11751552 - time (sec): 346.34 - samples/sec: 406.59 - lr: 0.000137 - momentum: 0.000000
2023-10-11 21:58:13,976 epoch 2 - iter 1296/1445 - loss 0.11540253 - time (sec): 389.20 - samples/sec: 404.69 - lr: 0.000135 - momentum: 0.000000
2023-10-11 21:59:01,670 epoch 2 - iter 1440/1445 - loss 0.11322182 - time (sec): 436.90 - samples/sec: 402.21 - lr: 0.000133 - momentum: 0.000000
2023-10-11 21:59:03,067 ----------------------------------------------------------------------------------------------------
2023-10-11 21:59:03,067 EPOCH 2 done: loss 0.1133 - lr: 0.000133
2023-10-11 21:59:25,392 DEV : loss 0.09209223836660385 - f1-score (micro avg) 0.7901
2023-10-11 21:59:25,426 saving best model
2023-10-11 21:59:34,741 ----------------------------------------------------------------------------------------------------
2023-10-11 22:00:17,606 epoch 3 - iter 144/1445 - loss 0.08766300 - time (sec): 42.86 - samples/sec: 424.42 - lr: 0.000132 - momentum: 0.000000
2023-10-11 22:01:00,870 epoch 3 - iter 288/1445 - loss 0.07859983 - time (sec): 86.12 - samples/sec: 409.94 - lr: 0.000130 - momentum: 0.000000
2023-10-11 22:01:45,034 epoch 3 - iter 432/1445 - loss 0.07368106 - time (sec): 130.29 - samples/sec: 402.84 - lr: 0.000128 - momentum: 0.000000
2023-10-11 22:02:28,697 epoch 3 - iter 576/1445 - loss 0.07207079 - time (sec): 173.95 - samples/sec: 398.48 - lr: 0.000127 - momentum: 0.000000
2023-10-11 22:03:10,644 epoch 3 - iter 720/1445 - loss 0.06966165 - time (sec): 215.90 - samples/sec: 401.78 - lr: 0.000125 - momentum: 0.000000
2023-10-11 22:03:55,232 epoch 3 - iter 864/1445 - loss 0.07048286 - time (sec): 260.49 - samples/sec: 406.23 - lr: 0.000123 - momentum: 0.000000
2023-10-11 22:04:37,732 epoch 3 - iter 1008/1445 - loss 0.07055815 - time (sec): 302.99 - samples/sec: 405.74 - lr: 0.000122 - momentum: 0.000000
2023-10-11 22:05:19,641 epoch 3 - iter 1152/1445 - loss 0.06860905 - time (sec): 344.90 - samples/sec: 406.49 - lr: 0.000120 - momentum: 0.000000
2023-10-11 22:06:02,514 epoch 3 - iter 1296/1445 - loss 0.06746734 - time (sec): 387.77 - samples/sec: 405.33 - lr: 0.000118 - momentum: 0.000000
2023-10-11 22:06:45,691 epoch 3 - iter 1440/1445 - loss 0.06670716 - time (sec): 430.95 - samples/sec: 407.69 - lr: 0.000117 - momentum: 0.000000
2023-10-11 22:06:46,939 ----------------------------------------------------------------------------------------------------
2023-10-11 22:06:46,939 EPOCH 3 done: loss 0.0666 - lr: 0.000117
2023-10-11 22:07:08,004 DEV : loss 0.08132287114858627 - f1-score (micro avg) 0.8402
2023-10-11 22:07:08,033 saving best model
2023-10-11 22:07:10,846 ----------------------------------------------------------------------------------------------------
2023-10-11 22:07:56,901 epoch 4 - iter 144/1445 - loss 0.05158645 - time (sec): 46.05 - samples/sec: 380.50 - lr: 0.000115 - momentum: 0.000000
2023-10-11 22:08:39,454 epoch 4 - iter 288/1445 - loss 0.04322719 - time (sec): 88.60 - samples/sec: 401.07 - lr: 0.000113 - momentum: 0.000000
2023-10-11 22:09:22,353 epoch 4 - iter 432/1445 - loss 0.04502248 - time (sec): 131.50 - samples/sec: 401.44 - lr: 0.000112 - momentum: 0.000000
2023-10-11 22:10:05,063 epoch 4 - iter 576/1445 - loss 0.04533770 - time (sec): 174.21 - samples/sec: 403.27 - lr: 0.000110 - momentum: 0.000000
2023-10-11 22:10:48,013 epoch 4 - iter 720/1445 - loss 0.04823559 - time (sec): 217.16 - samples/sec: 398.48 - lr: 0.000108 - momentum: 0.000000
2023-10-11 22:11:31,776 epoch 4 - iter 864/1445 - loss 0.04734272 - time (sec): 260.93 - samples/sec: 401.28 - lr: 0.000107 - momentum: 0.000000
2023-10-11 22:12:14,790 epoch 4 - iter 1008/1445 - loss 0.04796593 - time (sec): 303.94 - samples/sec: 404.98 - lr: 0.000105 - momentum: 0.000000
2023-10-11 22:12:57,535 epoch 4 - iter 1152/1445 - loss 0.04550591 - time (sec): 346.68 - samples/sec: 405.64 - lr: 0.000103 - momentum: 0.000000
2023-10-11 22:13:40,031 epoch 4 - iter 1296/1445 - loss 0.04326301 - time (sec): 389.18 - samples/sec: 410.29 - lr: 0.000102 - momentum: 0.000000
2023-10-11 22:14:19,474 epoch 4 - iter 1440/1445 - loss 0.04418604 - time (sec): 428.62 - samples/sec: 410.30 - lr: 0.000100 - momentum: 0.000000
2023-10-11 22:14:20,585 ----------------------------------------------------------------------------------------------------
2023-10-11 22:14:20,586 EPOCH 4 done: loss 0.0441 - lr: 0.000100
2023-10-11 22:14:40,862 DEV : loss 0.08113545924425125 - f1-score (micro avg) 0.8576
2023-10-11 22:14:40,893 saving best model
2023-10-11 22:14:43,508 ----------------------------------------------------------------------------------------------------
2023-10-11 22:15:28,377 epoch 5 - iter 144/1445 - loss 0.04035968 - time (sec): 44.86 - samples/sec: 393.65 - lr: 0.000098 - momentum: 0.000000
2023-10-11 22:16:14,096 epoch 5 - iter 288/1445 - loss 0.03196018 - time (sec): 90.58 - samples/sec: 389.38 - lr: 0.000097 - momentum: 0.000000
2023-10-11 22:16:55,049 epoch 5 - iter 432/1445 - loss 0.03344470 - time (sec): 131.54 - samples/sec: 406.68 - lr: 0.000095 - momentum: 0.000000
2023-10-11 22:17:35,393 epoch 5 - iter 576/1445 - loss 0.03320723 - time (sec): 171.88 - samples/sec: 411.81 - lr: 0.000093 - momentum: 0.000000
2023-10-11 22:18:17,041 epoch 5 - iter 720/1445 - loss 0.03057888 - time (sec): 213.53 - samples/sec: 419.12 - lr: 0.000092 - momentum: 0.000000
2023-10-11 22:18:57,338 epoch 5 - iter 864/1445 - loss 0.03031577 - time (sec): 253.83 - samples/sec: 416.95 - lr: 0.000090 - momentum: 0.000000
2023-10-11 22:19:39,454 epoch 5 - iter 1008/1445 - loss 0.03257003 - time (sec): 295.94 - samples/sec: 417.92 - lr: 0.000088 - momentum: 0.000000
2023-10-11 22:20:22,551 epoch 5 - iter 1152/1445 - loss 0.03228254 - time (sec): 339.04 - samples/sec: 416.03 - lr: 0.000087 - momentum: 0.000000
2023-10-11 22:21:06,771 epoch 5 - iter 1296/1445 - loss 0.03578187 - time (sec): 383.26 - samples/sec: 413.65 - lr: 0.000085 - momentum: 0.000000
2023-10-11 22:21:51,227 epoch 5 - iter 1440/1445 - loss 0.03531510 - time (sec): 427.72 - samples/sec: 410.54 - lr: 0.000083 - momentum: 0.000000
2023-10-11 22:21:52,682 ----------------------------------------------------------------------------------------------------
2023-10-11 22:21:52,683 EPOCH 5 done: loss 0.0352 - lr: 0.000083
2023-10-11 22:22:15,675 DEV : loss 0.10317305475473404 - f1-score (micro avg) 0.8457
2023-10-11 22:22:15,704 ----------------------------------------------------------------------------------------------------
2023-10-11 22:22:59,890 epoch 6 - iter 144/1445 - loss 0.02205187 - time (sec): 44.18 - samples/sec: 421.08 - lr: 0.000082 - momentum: 0.000000
2023-10-11 22:23:43,468 epoch 6 - iter 288/1445 - loss 0.01777970 - time (sec): 87.76 - samples/sec: 419.85 - lr: 0.000080 - momentum: 0.000000
2023-10-11 22:24:23,986 epoch 6 - iter 432/1445 - loss 0.02038655 - time (sec): 128.28 - samples/sec: 413.43 - lr: 0.000078 - momentum: 0.000000
2023-10-11 22:25:07,319 epoch 6 - iter 576/1445 - loss 0.02196497 - time (sec): 171.61 - samples/sec: 411.13 - lr: 0.000077 - momentum: 0.000000
2023-10-11 22:25:50,124 epoch 6 - iter 720/1445 - loss 0.02171133 - time (sec): 214.42 - samples/sec: 404.91 - lr: 0.000075 - momentum: 0.000000
2023-10-11 22:26:33,517 epoch 6 - iter 864/1445 - loss 0.02164208 - time (sec): 257.81 - samples/sec: 406.70 - lr: 0.000073 - momentum: 0.000000
2023-10-11 22:27:18,879 epoch 6 - iter 1008/1445 - loss 0.02727128 - time (sec): 303.17 - samples/sec: 407.42 - lr: 0.000072 - momentum: 0.000000
2023-10-11 22:28:02,229 epoch 6 - iter 1152/1445 - loss 0.02637994 - time (sec): 346.52 - samples/sec: 407.96 - lr: 0.000070 - momentum: 0.000000
2023-10-11 22:28:43,733 epoch 6 - iter 1296/1445 - loss 0.02664456 - time (sec): 388.03 - samples/sec: 406.53 - lr: 0.000068 - momentum: 0.000000
2023-10-11 22:29:30,599 epoch 6 - iter 1440/1445 - loss 0.02710892 - time (sec): 434.89 - samples/sec: 403.31 - lr: 0.000067 - momentum: 0.000000
2023-10-11 22:29:32,275 ----------------------------------------------------------------------------------------------------
2023-10-11 22:29:32,276 EPOCH 6 done: loss 0.0271 - lr: 0.000067
2023-10-11 22:29:54,267 DEV : loss 0.1072750836610794 - f1-score (micro avg) 0.8649
2023-10-11 22:29:54,297 saving best model
2023-10-11 22:29:56,969 ----------------------------------------------------------------------------------------------------
2023-10-11 22:30:46,657 epoch 7 - iter 144/1445 - loss 0.02635537 - time (sec): 49.68 - samples/sec: 357.49 - lr: 0.000065 - momentum: 0.000000
2023-10-11 22:31:33,184 epoch 7 - iter 288/1445 - loss 0.02023054 - time (sec): 96.21 - samples/sec: 355.67 - lr: 0.000063 - momentum: 0.000000
2023-10-11 22:32:20,494 epoch 7 - iter 432/1445 - loss 0.01764895 - time (sec): 143.52 - samples/sec: 351.82 - lr: 0.000062 - momentum: 0.000000
2023-10-11 22:33:09,609 epoch 7 - iter 576/1445 - loss 0.01906232 - time (sec): 192.64 - samples/sec: 362.35 - lr: 0.000060 - momentum: 0.000000
2023-10-11 22:33:55,335 epoch 7 - iter 720/1445 - loss 0.02408914 - time (sec): 238.36 - samples/sec: 368.13 - lr: 0.000058 - momentum: 0.000000
2023-10-11 22:34:39,801 epoch 7 - iter 864/1445 - loss 0.02394349 - time (sec): 282.83 - samples/sec: 374.43 - lr: 0.000057 - momentum: 0.000000
2023-10-11 22:35:25,999 epoch 7 - iter 1008/1445 - loss 0.02201986 - time (sec): 329.03 - samples/sec: 372.91 - lr: 0.000055 - momentum: 0.000000
2023-10-11 22:36:14,321 epoch 7 - iter 1152/1445 - loss 0.02087086 - time (sec): 377.35 - samples/sec: 372.68 - lr: 0.000053 - momentum: 0.000000
2023-10-11 22:37:01,701 epoch 7 - iter 1296/1445 - loss 0.02084656 - time (sec): 424.73 - samples/sec: 372.61 - lr: 0.000052 - momentum: 0.000000
2023-10-11 22:37:44,560 epoch 7 - iter 1440/1445 - loss 0.02014863 - time (sec): 467.59 - samples/sec: 375.29 - lr: 0.000050 - momentum: 0.000000
2023-10-11 22:37:45,902 ----------------------------------------------------------------------------------------------------
2023-10-11 22:37:45,902 EPOCH 7 done: loss 0.0201 - lr: 0.000050
2023-10-11 22:38:07,040 DEV : loss 0.12641826272010803 - f1-score (micro avg) 0.859
2023-10-11 22:38:07,074 ----------------------------------------------------------------------------------------------------
2023-10-11 22:38:53,705 epoch 8 - iter 144/1445 - loss 0.00670521 - time (sec): 46.63 - samples/sec: 385.04 - lr: 0.000048 - momentum: 0.000000
2023-10-11 22:39:39,103 epoch 8 - iter 288/1445 - loss 0.01176368 - time (sec): 92.03 - samples/sec: 371.49 - lr: 0.000047 - momentum: 0.000000
2023-10-11 22:40:21,149 epoch 8 - iter 432/1445 - loss 0.01224252 - time (sec): 134.07 - samples/sec: 378.66 - lr: 0.000045 - momentum: 0.000000
2023-10-11 22:41:03,327 epoch 8 - iter 576/1445 - loss 0.01088605 - time (sec): 176.25 - samples/sec: 384.80 - lr: 0.000043 - momentum: 0.000000
2023-10-11 22:41:45,603 epoch 8 - iter 720/1445 - loss 0.01170222 - time (sec): 218.53 - samples/sec: 390.97 - lr: 0.000042 - momentum: 0.000000
2023-10-11 22:42:29,081 epoch 8 - iter 864/1445 - loss 0.01255260 - time (sec): 262.00 - samples/sec: 396.33 - lr: 0.000040 - momentum: 0.000000
2023-10-11 22:43:13,554 epoch 8 - iter 1008/1445 - loss 0.01365201 - time (sec): 306.48 - samples/sec: 400.25 - lr: 0.000038 - momentum: 0.000000
2023-10-11 22:43:56,080 epoch 8 - iter 1152/1445 - loss 0.01389711 - time (sec): 349.00 - samples/sec: 400.74 - lr: 0.000037 - momentum: 0.000000
2023-10-11 22:44:42,051 epoch 8 - iter 1296/1445 - loss 0.01404508 - time (sec): 394.97 - samples/sec: 400.50 - lr: 0.000035 - momentum: 0.000000
2023-10-11 22:45:23,596 epoch 8 - iter 1440/1445 - loss 0.01455041 - time (sec): 436.52 - samples/sec: 402.48 - lr: 0.000033 - momentum: 0.000000
2023-10-11 22:45:24,969 ----------------------------------------------------------------------------------------------------
2023-10-11 22:45:24,969 EPOCH 8 done: loss 0.0145 - lr: 0.000033
2023-10-11 22:45:48,608 DEV : loss 0.14573527872562408 - f1-score (micro avg) 0.8497
2023-10-11 22:45:48,654 ----------------------------------------------------------------------------------------------------
2023-10-11 22:46:30,779 epoch 9 - iter 144/1445 - loss 0.00824711 - time (sec): 42.12 - samples/sec: 407.35 - lr: 0.000032 - momentum: 0.000000
2023-10-11 22:47:11,606 epoch 9 - iter 288/1445 - loss 0.01013820 - time (sec): 82.95 - samples/sec: 410.28 - lr: 0.000030 - momentum: 0.000000
2023-10-11 22:47:53,881 epoch 9 - iter 432/1445 - loss 0.01054407 - time (sec): 125.23 - samples/sec: 408.55 - lr: 0.000028 - momentum: 0.000000
2023-10-11 22:48:37,536 epoch 9 - iter 576/1445 - loss 0.01159667 - time (sec): 168.88 - samples/sec: 413.26 - lr: 0.000027 - momentum: 0.000000
2023-10-11 22:49:20,546 epoch 9 - iter 720/1445 - loss 0.01246631 - time (sec): 211.89 - samples/sec: 418.08 - lr: 0.000025 - momentum: 0.000000
2023-10-11 22:50:01,940 epoch 9 - iter 864/1445 - loss 0.01199709 - time (sec): 253.28 - samples/sec: 419.72 - lr: 0.000023 - momentum: 0.000000
2023-10-11 22:50:44,345 epoch 9 - iter 1008/1445 - loss 0.01176200 - time (sec): 295.69 - samples/sec: 419.73 - lr: 0.000022 - momentum: 0.000000
2023-10-11 22:51:26,657 epoch 9 - iter 1152/1445 - loss 0.01140876 - time (sec): 338.00 - samples/sec: 418.90 - lr: 0.000020 - momentum: 0.000000
2023-10-11 22:52:08,330 epoch 9 - iter 1296/1445 - loss 0.01128398 - time (sec): 379.67 - samples/sec: 417.47 - lr: 0.000018 - momentum: 0.000000
2023-10-11 22:52:49,772 epoch 9 - iter 1440/1445 - loss 0.01080269 - time (sec): 421.12 - samples/sec: 417.01 - lr: 0.000017 - momentum: 0.000000
2023-10-11 22:52:51,055 ----------------------------------------------------------------------------------------------------
2023-10-11 22:52:51,056 EPOCH 9 done: loss 0.0108 - lr: 0.000017
2023-10-11 22:53:12,104 DEV : loss 0.14379249513149261 - f1-score (micro avg) 0.8522
2023-10-11 22:53:12,135 ----------------------------------------------------------------------------------------------------
2023-10-11 22:53:54,142 epoch 10 - iter 144/1445 - loss 0.00870221 - time (sec): 42.01 - samples/sec: 428.73 - lr: 0.000015 - momentum: 0.000000
2023-10-11 22:54:35,576 epoch 10 - iter 288/1445 - loss 0.00818486 - time (sec): 83.44 - samples/sec: 434.53 - lr: 0.000013 - momentum: 0.000000
2023-10-11 22:55:19,109 epoch 10 - iter 432/1445 - loss 0.01027080 - time (sec): 126.97 - samples/sec: 436.28 - lr: 0.000012 - momentum: 0.000000
2023-10-11 22:56:03,228 epoch 10 - iter 576/1445 - loss 0.00825222 - time (sec): 171.09 - samples/sec: 426.55 - lr: 0.000010 - momentum: 0.000000
2023-10-11 22:56:51,934 epoch 10 - iter 720/1445 - loss 0.00778683 - time (sec): 219.80 - samples/sec: 414.50 - lr: 0.000008 - momentum: 0.000000
2023-10-11 22:57:36,523 epoch 10 - iter 864/1445 - loss 0.00795167 - time (sec): 264.39 - samples/sec: 406.20 - lr: 0.000007 - momentum: 0.000000
2023-10-11 22:58:17,410 epoch 10 - iter 1008/1445 - loss 0.00851367 - time (sec): 305.27 - samples/sec: 407.63 - lr: 0.000005 - momentum: 0.000000
2023-10-11 22:58:58,536 epoch 10 - iter 1152/1445 - loss 0.00819079 - time (sec): 346.40 - samples/sec: 409.12 - lr: 0.000003 - momentum: 0.000000
2023-10-11 22:59:39,221 epoch 10 - iter 1296/1445 - loss 0.00816272 - time (sec): 387.08 - samples/sec: 410.50 - lr: 0.000002 - momentum: 0.000000
2023-10-11 23:00:19,487 epoch 10 - iter 1440/1445 - loss 0.00838020 - time (sec): 427.35 - samples/sec: 411.19 - lr: 0.000000 - momentum: 0.000000
2023-10-11 23:00:20,673 ----------------------------------------------------------------------------------------------------
2023-10-11 23:00:20,674 EPOCH 10 done: loss 0.0084 - lr: 0.000000
2023-10-11 23:00:41,686 DEV : loss 0.1489063799381256 - f1-score (micro avg) 0.8545
2023-10-11 23:00:42,634 ----------------------------------------------------------------------------------------------------
2023-10-11 23:00:42,636 Loading model from best epoch ...
2023-10-11 23:00:46,258 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-11 23:01:06,639
Results:
- F-score (micro) 0.8388
- F-score (macro) 0.7554
- Accuracy 0.7402
By class:
              precision    recall  f1-score   support
         PER     0.8427    0.8444    0.8435       482
         LOC     0.9051    0.8537    0.8787       458
         ORG     0.5522    0.5362    0.5441        69
   micro avg     0.8503    0.8276    0.8388      1009
   macro avg     0.7667    0.7448    0.7554      1009
weighted avg     0.8511    0.8276    0.8390      1009
2023-10-11 23:01:06,640 ----------------------------------------------------------------------------------------------------
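
The per-epoch DEV lines above can be summarized programmatically. The following is a minimal sketch, not part of the original run: the names `LOG_LINES`, `DEV_PATTERN`, `parse_dev_scores`, and `best_epoch` are hypothetical helpers introduced for illustration, and the sample lines are copied from this log. It assumes one DEV line per epoch, in epoch order.

```python
import re

# DEV lines copied verbatim from the log above (one per epoch, in order).
# In practice these would be read from the training log file instead.
LOG_LINES = [
    "2023-10-11 21:51:43,816 DEV : loss 0.18985705077648163 - f1-score (micro avg) 0.3741",
    "2023-10-11 21:59:25,392 DEV : loss 0.09209223836660385 - f1-score (micro avg) 0.7901",
    "2023-10-11 22:07:08,004 DEV : loss 0.08132287114858627 - f1-score (micro avg) 0.8402",
    "2023-10-11 22:14:40,862 DEV : loss 0.08113545924425125 - f1-score (micro avg) 0.8576",
    "2023-10-11 22:22:15,675 DEV : loss 0.10317305475473404 - f1-score (micro avg) 0.8457",
    "2023-10-11 22:29:54,267 DEV : loss 0.1072750836610794 - f1-score (micro avg) 0.8649",
    "2023-10-11 22:38:07,040 DEV : loss 0.12641826272010803 - f1-score (micro avg) 0.859",
    "2023-10-11 22:45:48,608 DEV : loss 0.14573527872562408 - f1-score (micro avg) 0.8497",
    "2023-10-11 22:53:12,104 DEV : loss 0.14379249513149261 - f1-score (micro avg) 0.8522",
    "2023-10-11 23:00:41,686 DEV : loss 0.1489063799381256 - f1-score (micro avg) 0.8545",
]

# Hypothetical pattern matching the DEV line layout seen in this log.
DEV_PATTERN = re.compile(
    r"DEV : loss (?P<loss>[\d.]+) - f1-score \(micro avg\) (?P<f1>[\d.]+)"
)


def parse_dev_scores(lines):
    """Return a list of (dev_loss, dev_f1) tuples, one per matching line."""
    scores = []
    for line in lines:
        match = DEV_PATTERN.search(line)
        if match:
            scores.append((float(match.group("loss")), float(match.group("f1"))))
    return scores


def best_epoch(scores):
    """Return (epoch, f1) for the highest dev F1; epochs are 1-indexed."""
    best_index = max(range(len(scores)), key=lambda i: scores[i][1])
    return best_index + 1, scores[best_index][1]


if __name__ == "__main__":
    epoch, f1 = best_epoch(parse_dev_scores(LOG_LINES))
    print(f"best dev epoch: {epoch} (micro F1 {f1})")
```

On the lines from this log, the highest dev micro F1 falls on epoch 6, which matches the last "saving best model" entry before the final evaluation.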