2023-10-11 17:53:01,993 ----------------------------------------------------------------------------------------------------
2023-10-11 17:53:01,995 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-11 17:53:01,995 ----------------------------------------------------------------------------------------------------
2023-10-11 17:53:01,995 MultiCorpus: 5777 train + 722 dev + 723 test sentences
- NER_ICDAR_EUROPEANA Corpus: 5777 train + 722 dev + 723 test sentences - /root/.flair/datasets/ner_icdar_europeana/nl
2023-10-11 17:53:01,995 ----------------------------------------------------------------------------------------------------
2023-10-11 17:53:01,995 Train: 5777 sentences
2023-10-11 17:53:01,995 (train_with_dev=False, train_with_test=False)
2023-10-11 17:53:01,995 ----------------------------------------------------------------------------------------------------
2023-10-11 17:53:01,995 Training Params:
2023-10-11 17:53:01,995 - learning_rate: "0.00016"
2023-10-11 17:53:01,996 - mini_batch_size: "4"
2023-10-11 17:53:01,996 - max_epochs: "10"
2023-10-11 17:53:01,996 - shuffle: "True"
2023-10-11 17:53:01,996 ----------------------------------------------------------------------------------------------------
2023-10-11 17:53:01,996 Plugins:
2023-10-11 17:53:01,996 - TensorboardLogger
2023-10-11 17:53:01,996 - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 17:53:01,996 ----------------------------------------------------------------------------------------------------
2023-10-11 17:53:01,996 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 17:53:01,996 - metric: "('micro avg', 'f1-score')"
2023-10-11 17:53:01,996 ----------------------------------------------------------------------------------------------------
2023-10-11 17:53:01,996 Computation:
2023-10-11 17:53:01,996 - compute on device: cuda:0
2023-10-11 17:53:01,996 - embedding storage: none
2023-10-11 17:53:01,996 ----------------------------------------------------------------------------------------------------
2023-10-11 17:53:01,996 Model training base path: "hmbench-icdar/nl-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-1"
2023-10-11 17:53:01,996 ----------------------------------------------------------------------------------------------------
2023-10-11 17:53:01,997 ----------------------------------------------------------------------------------------------------
2023-10-11 17:53:01,997 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 17:53:48,700 epoch 1 - iter 144/1445 - loss 2.57848411 - time (sec): 46.70 - samples/sec: 377.20 - lr: 0.000016 - momentum: 0.000000
2023-10-11 17:54:35,241 epoch 1 - iter 288/1445 - loss 2.46579897 - time (sec): 93.24 - samples/sec: 398.60 - lr: 0.000032 - momentum: 0.000000
2023-10-11 17:55:18,233 epoch 1 - iter 432/1445 - loss 2.21235139 - time (sec): 136.23 - samples/sec: 393.03 - lr: 0.000048 - momentum: 0.000000
2023-10-11 17:56:02,557 epoch 1 - iter 576/1445 - loss 1.90343871 - time (sec): 180.56 - samples/sec: 396.59 - lr: 0.000064 - momentum: 0.000000
2023-10-11 17:56:49,522 epoch 1 - iter 720/1445 - loss 1.59094900 - time (sec): 227.52 - samples/sec: 400.68 - lr: 0.000080 - momentum: 0.000000
2023-10-11 17:57:34,114 epoch 1 - iter 864/1445 - loss 1.38249162 - time (sec): 272.12 - samples/sec: 397.42 - lr: 0.000096 - momentum: 0.000000
2023-10-11 17:58:17,459 epoch 1 - iter 1008/1445 - loss 1.22459509 - time (sec): 315.46 - samples/sec: 398.14 - lr: 0.000112 - momentum: 0.000000
2023-10-11 17:59:01,341 epoch 1 - iter 1152/1445 - loss 1.10127657 - time (sec): 359.34 - samples/sec: 396.35 - lr: 0.000127 - momentum: 0.000000
2023-10-11 17:59:46,633 epoch 1 - iter 1296/1445 - loss 1.00799125 - time (sec): 404.63 - samples/sec: 390.82 - lr: 0.000143 - momentum: 0.000000
2023-10-11 18:00:30,412 epoch 1 - iter 1440/1445 - loss 0.92239231 - time (sec): 448.41 - samples/sec: 392.09 - lr: 0.000159 - momentum: 0.000000
2023-10-11 18:00:31,780 ----------------------------------------------------------------------------------------------------
2023-10-11 18:00:31,780 EPOCH 1 done: loss 0.9210 - lr: 0.000159
2023-10-11 18:00:56,886 DEV : loss 0.18090352416038513 - f1-score (micro avg) 0.3961
2023-10-11 18:00:56,921 saving best model
2023-10-11 18:00:57,825 ----------------------------------------------------------------------------------------------------
2023-10-11 18:01:42,534 epoch 2 - iter 144/1445 - loss 0.15903434 - time (sec): 44.71 - samples/sec: 403.36 - lr: 0.000158 - momentum: 0.000000
2023-10-11 18:02:26,877 epoch 2 - iter 288/1445 - loss 0.13983359 - time (sec): 89.05 - samples/sec: 398.86 - lr: 0.000156 - momentum: 0.000000
2023-10-11 18:03:07,959 epoch 2 - iter 432/1445 - loss 0.13648926 - time (sec): 130.13 - samples/sec: 397.19 - lr: 0.000155 - momentum: 0.000000
2023-10-11 18:03:51,251 epoch 2 - iter 576/1445 - loss 0.13003560 - time (sec): 173.42 - samples/sec: 399.93 - lr: 0.000153 - momentum: 0.000000
2023-10-11 18:04:34,290 epoch 2 - iter 720/1445 - loss 0.12853763 - time (sec): 216.46 - samples/sec: 403.76 - lr: 0.000151 - momentum: 0.000000
2023-10-11 18:05:15,657 epoch 2 - iter 864/1445 - loss 0.12641822 - time (sec): 257.83 - samples/sec: 406.68 - lr: 0.000149 - momentum: 0.000000
2023-10-11 18:05:57,359 epoch 2 - iter 1008/1445 - loss 0.12127856 - time (sec): 299.53 - samples/sec: 411.68 - lr: 0.000148 - momentum: 0.000000
2023-10-11 18:06:40,653 epoch 2 - iter 1152/1445 - loss 0.11582879 - time (sec): 342.83 - samples/sec: 417.08 - lr: 0.000146 - momentum: 0.000000
2023-10-11 18:07:23,210 epoch 2 - iter 1296/1445 - loss 0.11382167 - time (sec): 385.38 - samples/sec: 412.81 - lr: 0.000144 - momentum: 0.000000
2023-10-11 18:08:09,871 epoch 2 - iter 1440/1445 - loss 0.11247559 - time (sec): 432.04 - samples/sec: 406.15 - lr: 0.000142 - momentum: 0.000000
2023-10-11 18:08:11,565 ----------------------------------------------------------------------------------------------------
2023-10-11 18:08:11,565 EPOCH 2 done: loss 0.1123 - lr: 0.000142
2023-10-11 18:08:36,039 DEV : loss 0.08681953698396683 - f1-score (micro avg) 0.7837
2023-10-11 18:08:36,078 saving best model
2023-10-11 18:08:38,806 ----------------------------------------------------------------------------------------------------
2023-10-11 18:09:31,280 epoch 3 - iter 144/1445 - loss 0.06528553 - time (sec): 52.47 - samples/sec: 329.17 - lr: 0.000140 - momentum: 0.000000
2023-10-11 18:10:17,024 epoch 3 - iter 288/1445 - loss 0.07522556 - time (sec): 98.21 - samples/sec: 350.51 - lr: 0.000139 - momentum: 0.000000
2023-10-11 18:11:03,850 epoch 3 - iter 432/1445 - loss 0.06963333 - time (sec): 145.04 - samples/sec: 358.69 - lr: 0.000137 - momentum: 0.000000
2023-10-11 18:11:51,125 epoch 3 - iter 576/1445 - loss 0.07482525 - time (sec): 192.31 - samples/sec: 356.97 - lr: 0.000135 - momentum: 0.000000
2023-10-11 18:12:41,126 epoch 3 - iter 720/1445 - loss 0.07425414 - time (sec): 242.32 - samples/sec: 360.00 - lr: 0.000133 - momentum: 0.000000
2023-10-11 18:13:27,470 epoch 3 - iter 864/1445 - loss 0.07379735 - time (sec): 288.66 - samples/sec: 360.35 - lr: 0.000132 - momentum: 0.000000
2023-10-11 18:14:11,509 epoch 3 - iter 1008/1445 - loss 0.07309368 - time (sec): 332.70 - samples/sec: 365.23 - lr: 0.000130 - momentum: 0.000000
2023-10-11 18:14:54,347 epoch 3 - iter 1152/1445 - loss 0.07050961 - time (sec): 375.54 - samples/sec: 372.12 - lr: 0.000128 - momentum: 0.000000
2023-10-11 18:15:37,674 epoch 3 - iter 1296/1445 - loss 0.06815127 - time (sec): 418.86 - samples/sec: 377.35 - lr: 0.000126 - momentum: 0.000000
2023-10-11 18:16:22,160 epoch 3 - iter 1440/1445 - loss 0.06798324 - time (sec): 463.35 - samples/sec: 379.08 - lr: 0.000125 - momentum: 0.000000
2023-10-11 18:16:23,435 ----------------------------------------------------------------------------------------------------
2023-10-11 18:16:23,436 EPOCH 3 done: loss 0.0679 - lr: 0.000125
2023-10-11 18:16:44,977 DEV : loss 0.08381146192550659 - f1-score (micro avg) 0.836
2023-10-11 18:16:45,010 saving best model
2023-10-11 18:16:47,752 ----------------------------------------------------------------------------------------------------
2023-10-11 18:17:31,679 epoch 4 - iter 144/1445 - loss 0.05699248 - time (sec): 43.92 - samples/sec: 411.69 - lr: 0.000123 - momentum: 0.000000
2023-10-11 18:18:15,766 epoch 4 - iter 288/1445 - loss 0.04727960 - time (sec): 88.01 - samples/sec: 398.50 - lr: 0.000121 - momentum: 0.000000
2023-10-11 18:18:59,968 epoch 4 - iter 432/1445 - loss 0.04792294 - time (sec): 132.21 - samples/sec: 399.98 - lr: 0.000119 - momentum: 0.000000
2023-10-11 18:19:45,773 epoch 4 - iter 576/1445 - loss 0.04725140 - time (sec): 178.02 - samples/sec: 394.65 - lr: 0.000117 - momentum: 0.000000
2023-10-11 18:20:31,218 epoch 4 - iter 720/1445 - loss 0.04662759 - time (sec): 223.46 - samples/sec: 390.40 - lr: 0.000116 - momentum: 0.000000
2023-10-11 18:21:12,959 epoch 4 - iter 864/1445 - loss 0.04657745 - time (sec): 265.20 - samples/sec: 391.32 - lr: 0.000114 - momentum: 0.000000
2023-10-11 18:21:55,577 epoch 4 - iter 1008/1445 - loss 0.04690755 - time (sec): 307.82 - samples/sec: 393.01 - lr: 0.000112 - momentum: 0.000000
2023-10-11 18:22:41,034 epoch 4 - iter 1152/1445 - loss 0.04926445 - time (sec): 353.28 - samples/sec: 394.39 - lr: 0.000110 - momentum: 0.000000
2023-10-11 18:23:27,336 epoch 4 - iter 1296/1445 - loss 0.04789025 - time (sec): 399.58 - samples/sec: 393.40 - lr: 0.000109 - momentum: 0.000000
2023-10-11 18:24:12,177 epoch 4 - iter 1440/1445 - loss 0.04577433 - time (sec): 444.42 - samples/sec: 395.52 - lr: 0.000107 - momentum: 0.000000
2023-10-11 18:24:13,407 ----------------------------------------------------------------------------------------------------
2023-10-11 18:24:13,407 EPOCH 4 done: loss 0.0457 - lr: 0.000107
2023-10-11 18:24:34,260 DEV : loss 0.08963057398796082 - f1-score (micro avg) 0.8319
2023-10-11 18:24:34,290 ----------------------------------------------------------------------------------------------------
2023-10-11 18:25:16,456 epoch 5 - iter 144/1445 - loss 0.01933052 - time (sec): 42.16 - samples/sec: 423.58 - lr: 0.000105 - momentum: 0.000000
2023-10-11 18:25:58,242 epoch 5 - iter 288/1445 - loss 0.02130787 - time (sec): 83.95 - samples/sec: 411.79 - lr: 0.000103 - momentum: 0.000000
2023-10-11 18:26:40,391 epoch 5 - iter 432/1445 - loss 0.02806167 - time (sec): 126.10 - samples/sec: 407.37 - lr: 0.000101 - momentum: 0.000000
2023-10-11 18:27:26,424 epoch 5 - iter 576/1445 - loss 0.02869156 - time (sec): 172.13 - samples/sec: 402.10 - lr: 0.000100 - momentum: 0.000000
2023-10-11 18:28:12,829 epoch 5 - iter 720/1445 - loss 0.03182868 - time (sec): 218.54 - samples/sec: 401.94 - lr: 0.000098 - momentum: 0.000000
2023-10-11 18:28:57,212 epoch 5 - iter 864/1445 - loss 0.03012182 - time (sec): 262.92 - samples/sec: 396.52 - lr: 0.000096 - momentum: 0.000000
2023-10-11 18:29:41,667 epoch 5 - iter 1008/1445 - loss 0.03206491 - time (sec): 307.38 - samples/sec: 396.37 - lr: 0.000094 - momentum: 0.000000
2023-10-11 18:30:27,402 epoch 5 - iter 1152/1445 - loss 0.03245719 - time (sec): 353.11 - samples/sec: 398.52 - lr: 0.000093 - momentum: 0.000000
2023-10-11 18:31:11,210 epoch 5 - iter 1296/1445 - loss 0.03258037 - time (sec): 396.92 - samples/sec: 396.84 - lr: 0.000091 - momentum: 0.000000
2023-10-11 18:31:55,181 epoch 5 - iter 1440/1445 - loss 0.03298262 - time (sec): 440.89 - samples/sec: 398.53 - lr: 0.000089 - momentum: 0.000000
2023-10-11 18:31:56,379 ----------------------------------------------------------------------------------------------------
2023-10-11 18:31:56,380 EPOCH 5 done: loss 0.0330 - lr: 0.000089
2023-10-11 18:32:18,188 DEV : loss 0.10669823735952377 - f1-score (micro avg) 0.8496
2023-10-11 18:32:18,220 saving best model
2023-10-11 18:32:26,252 ----------------------------------------------------------------------------------------------------
2023-10-11 18:33:11,957 epoch 6 - iter 144/1445 - loss 0.02858235 - time (sec): 45.70 - samples/sec: 368.04 - lr: 0.000087 - momentum: 0.000000
2023-10-11 18:33:57,813 epoch 6 - iter 288/1445 - loss 0.02496902 - time (sec): 91.56 - samples/sec: 371.61 - lr: 0.000085 - momentum: 0.000000
2023-10-11 18:34:44,871 epoch 6 - iter 432/1445 - loss 0.02903031 - time (sec): 138.62 - samples/sec: 374.33 - lr: 0.000084 - momentum: 0.000000
2023-10-11 18:35:30,962 epoch 6 - iter 576/1445 - loss 0.02525110 - time (sec): 184.71 - samples/sec: 379.07 - lr: 0.000082 - momentum: 0.000000
2023-10-11 18:36:15,891 epoch 6 - iter 720/1445 - loss 0.02445476 - time (sec): 229.63 - samples/sec: 384.94 - lr: 0.000080 - momentum: 0.000000
2023-10-11 18:37:02,122 epoch 6 - iter 864/1445 - loss 0.02502730 - time (sec): 275.87 - samples/sec: 382.09 - lr: 0.000078 - momentum: 0.000000
2023-10-11 18:37:48,549 epoch 6 - iter 1008/1445 - loss 0.02407503 - time (sec): 322.29 - samples/sec: 379.12 - lr: 0.000076 - momentum: 0.000000
2023-10-11 18:38:33,282 epoch 6 - iter 1152/1445 - loss 0.02357627 - time (sec): 367.03 - samples/sec: 380.88 - lr: 0.000075 - momentum: 0.000000
2023-10-11 18:39:20,713 epoch 6 - iter 1296/1445 - loss 0.02464782 - time (sec): 414.46 - samples/sec: 384.58 - lr: 0.000073 - momentum: 0.000000
2023-10-11 18:40:05,384 epoch 6 - iter 1440/1445 - loss 0.02419495 - time (sec): 459.13 - samples/sec: 382.83 - lr: 0.000071 - momentum: 0.000000
2023-10-11 18:40:06,706 ----------------------------------------------------------------------------------------------------
2023-10-11 18:40:06,707 EPOCH 6 done: loss 0.0244 - lr: 0.000071
2023-10-11 18:40:29,735 DEV : loss 0.12241014838218689 - f1-score (micro avg) 0.8516
2023-10-11 18:40:29,808 saving best model
2023-10-11 18:40:40,863 ----------------------------------------------------------------------------------------------------
2023-10-11 18:41:26,037 epoch 7 - iter 144/1445 - loss 0.01832273 - time (sec): 45.17 - samples/sec: 396.36 - lr: 0.000069 - momentum: 0.000000
2023-10-11 18:42:08,635 epoch 7 - iter 288/1445 - loss 0.01737041 - time (sec): 87.77 - samples/sec: 397.61 - lr: 0.000068 - momentum: 0.000000
2023-10-11 18:42:51,383 epoch 7 - iter 432/1445 - loss 0.01452621 - time (sec): 130.51 - samples/sec: 397.04 - lr: 0.000066 - momentum: 0.000000
2023-10-11 18:43:36,181 epoch 7 - iter 576/1445 - loss 0.01641296 - time (sec): 175.31 - samples/sec: 392.78 - lr: 0.000064 - momentum: 0.000000
2023-10-11 18:44:20,666 epoch 7 - iter 720/1445 - loss 0.01644292 - time (sec): 219.80 - samples/sec: 393.55 - lr: 0.000062 - momentum: 0.000000
2023-10-11 18:45:05,619 epoch 7 - iter 864/1445 - loss 0.01754543 - time (sec): 264.75 - samples/sec: 394.37 - lr: 0.000060 - momentum: 0.000000
2023-10-11 18:45:52,489 epoch 7 - iter 1008/1445 - loss 0.01846302 - time (sec): 311.62 - samples/sec: 393.10 - lr: 0.000059 - momentum: 0.000000
2023-10-11 18:46:36,969 epoch 7 - iter 1152/1445 - loss 0.01791464 - time (sec): 356.10 - samples/sec: 392.82 - lr: 0.000057 - momentum: 0.000000
2023-10-11 18:47:23,692 epoch 7 - iter 1296/1445 - loss 0.01888407 - time (sec): 402.82 - samples/sec: 390.17 - lr: 0.000055 - momentum: 0.000000
2023-10-11 18:48:09,718 epoch 7 - iter 1440/1445 - loss 0.01856727 - time (sec): 448.85 - samples/sec: 391.01 - lr: 0.000053 - momentum: 0.000000
2023-10-11 18:48:11,296 ----------------------------------------------------------------------------------------------------
2023-10-11 18:48:11,296 EPOCH 7 done: loss 0.0185 - lr: 0.000053
2023-10-11 18:48:33,997 DEV : loss 0.1283944845199585 - f1-score (micro avg) 0.8483
2023-10-11 18:48:34,041 ----------------------------------------------------------------------------------------------------
2023-10-11 18:49:21,193 epoch 8 - iter 144/1445 - loss 0.01377773 - time (sec): 47.15 - samples/sec: 398.08 - lr: 0.000052 - momentum: 0.000000
2023-10-11 18:50:07,645 epoch 8 - iter 288/1445 - loss 0.01338014 - time (sec): 93.60 - samples/sec: 384.26 - lr: 0.000050 - momentum: 0.000000
2023-10-11 18:50:50,610 epoch 8 - iter 432/1445 - loss 0.01240568 - time (sec): 136.57 - samples/sec: 388.41 - lr: 0.000048 - momentum: 0.000000
2023-10-11 18:51:35,104 epoch 8 - iter 576/1445 - loss 0.01139900 - time (sec): 181.06 - samples/sec: 385.89 - lr: 0.000046 - momentum: 0.000000
2023-10-11 18:52:21,530 epoch 8 - iter 720/1445 - loss 0.01311798 - time (sec): 227.49 - samples/sec: 386.11 - lr: 0.000044 - momentum: 0.000000
2023-10-11 18:53:07,917 epoch 8 - iter 864/1445 - loss 0.01251169 - time (sec): 273.87 - samples/sec: 387.76 - lr: 0.000043 - momentum: 0.000000
2023-10-11 18:53:53,699 epoch 8 - iter 1008/1445 - loss 0.01322670 - time (sec): 319.65 - samples/sec: 388.46 - lr: 0.000041 - momentum: 0.000000
2023-10-11 18:54:39,842 epoch 8 - iter 1152/1445 - loss 0.01450279 - time (sec): 365.80 - samples/sec: 389.42 - lr: 0.000039 - momentum: 0.000000
2023-10-11 18:55:25,240 epoch 8 - iter 1296/1445 - loss 0.01418221 - time (sec): 411.20 - samples/sec: 385.19 - lr: 0.000037 - momentum: 0.000000
2023-10-11 18:56:09,983 epoch 8 - iter 1440/1445 - loss 0.01430382 - time (sec): 455.94 - samples/sec: 385.30 - lr: 0.000036 - momentum: 0.000000
2023-10-11 18:56:11,369 ----------------------------------------------------------------------------------------------------
2023-10-11 18:56:11,370 EPOCH 8 done: loss 0.0143 - lr: 0.000036
2023-10-11 18:56:35,103 DEV : loss 0.1493687927722931 - f1-score (micro avg) 0.8463
2023-10-11 18:56:35,138 ----------------------------------------------------------------------------------------------------
2023-10-11 18:57:26,756 epoch 9 - iter 144/1445 - loss 0.01365623 - time (sec): 51.61 - samples/sec: 363.64 - lr: 0.000034 - momentum: 0.000000
2023-10-11 18:58:09,147 epoch 9 - iter 288/1445 - loss 0.01086009 - time (sec): 94.01 - samples/sec: 375.25 - lr: 0.000032 - momentum: 0.000000
2023-10-11 18:58:54,871 epoch 9 - iter 432/1445 - loss 0.01091733 - time (sec): 139.73 - samples/sec: 385.74 - lr: 0.000030 - momentum: 0.000000
2023-10-11 18:59:41,145 epoch 9 - iter 576/1445 - loss 0.00976057 - time (sec): 186.00 - samples/sec: 383.90 - lr: 0.000028 - momentum: 0.000000
2023-10-11 19:00:24,579 epoch 9 - iter 720/1445 - loss 0.00941457 - time (sec): 229.44 - samples/sec: 386.65 - lr: 0.000027 - momentum: 0.000000
2023-10-11 19:01:08,775 epoch 9 - iter 864/1445 - loss 0.00937481 - time (sec): 273.63 - samples/sec: 388.71 - lr: 0.000025 - momentum: 0.000000
2023-10-11 19:01:54,146 epoch 9 - iter 1008/1445 - loss 0.00942580 - time (sec): 319.01 - samples/sec: 391.24 - lr: 0.000023 - momentum: 0.000000
2023-10-11 19:02:36,804 epoch 9 - iter 1152/1445 - loss 0.00946647 - time (sec): 361.66 - samples/sec: 391.23 - lr: 0.000021 - momentum: 0.000000
2023-10-11 19:03:20,018 epoch 9 - iter 1296/1445 - loss 0.00909532 - time (sec): 404.88 - samples/sec: 391.44 - lr: 0.000020 - momentum: 0.000000
2023-10-11 19:04:06,074 epoch 9 - iter 1440/1445 - loss 0.00916284 - time (sec): 450.93 - samples/sec: 389.91 - lr: 0.000018 - momentum: 0.000000
2023-10-11 19:04:07,367 ----------------------------------------------------------------------------------------------------
2023-10-11 19:04:07,367 EPOCH 9 done: loss 0.0091 - lr: 0.000018
2023-10-11 19:04:31,956 DEV : loss 0.1440650075674057 - f1-score (micro avg) 0.8482
2023-10-11 19:04:31,994 ----------------------------------------------------------------------------------------------------
2023-10-11 19:05:16,374 epoch 10 - iter 144/1445 - loss 0.00340475 - time (sec): 44.38 - samples/sec: 378.02 - lr: 0.000016 - momentum: 0.000000
2023-10-11 19:06:03,828 epoch 10 - iter 288/1445 - loss 0.00685161 - time (sec): 91.83 - samples/sec: 383.93 - lr: 0.000014 - momentum: 0.000000
2023-10-11 19:06:50,612 epoch 10 - iter 432/1445 - loss 0.00634319 - time (sec): 138.62 - samples/sec: 378.20 - lr: 0.000012 - momentum: 0.000000
2023-10-11 19:07:34,275 epoch 10 - iter 576/1445 - loss 0.00601531 - time (sec): 182.28 - samples/sec: 379.74 - lr: 0.000011 - momentum: 0.000000
2023-10-11 19:08:18,247 epoch 10 - iter 720/1445 - loss 0.00683740 - time (sec): 226.25 - samples/sec: 387.69 - lr: 0.000009 - momentum: 0.000000
2023-10-11 19:09:03,786 epoch 10 - iter 864/1445 - loss 0.00666797 - time (sec): 271.79 - samples/sec: 390.11 - lr: 0.000007 - momentum: 0.000000
2023-10-11 19:09:49,595 epoch 10 - iter 1008/1445 - loss 0.00806627 - time (sec): 317.60 - samples/sec: 390.75 - lr: 0.000005 - momentum: 0.000000
2023-10-11 19:10:33,481 epoch 10 - iter 1152/1445 - loss 0.00767263 - time (sec): 361.49 - samples/sec: 389.28 - lr: 0.000004 - momentum: 0.000000
2023-10-11 19:11:20,051 epoch 10 - iter 1296/1445 - loss 0.00835646 - time (sec): 408.06 - samples/sec: 388.16 - lr: 0.000002 - momentum: 0.000000
2023-10-11 19:12:07,000 epoch 10 - iter 1440/1445 - loss 0.00799785 - time (sec): 455.00 - samples/sec: 385.90 - lr: 0.000000 - momentum: 0.000000
2023-10-11 19:12:08,397 ----------------------------------------------------------------------------------------------------
2023-10-11 19:12:08,398 EPOCH 10 done: loss 0.0080 - lr: 0.000000
2023-10-11 19:12:29,737 DEV : loss 0.15512152016162872 - f1-score (micro avg) 0.8485
2023-10-11 19:12:30,792 ----------------------------------------------------------------------------------------------------
2023-10-11 19:12:30,794 Loading model from best epoch ...
2023-10-11 19:12:37,160 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-11 19:12:57,570
Results:
- F-score (micro) 0.8349
- F-score (macro) 0.7208
- Accuracy 0.7272
By class:
              precision    recall  f1-score   support

         PER     0.8266    0.8506    0.8384       482
         LOC     0.8969    0.8734    0.8850       458
         ORG     0.5000    0.3913    0.4390        69

   micro avg     0.8404    0.8295    0.8349      1009
   macro avg     0.7412    0.7051    0.7208      1009
weighted avg     0.8362    0.8295    0.8322      1009
2023-10-11 19:12:57,570 ----------------------------------------------------------------------------------------------------