2023-10-12 14:02:20,581 ----------------------------------------------------------------------------------------------------
2023-10-12 14:02:20,583 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-12 14:02:20,583 ----------------------------------------------------------------------------------------------------
2023-10-12 14:02:20,584 MultiCorpus: 5777 train + 722 dev + 723 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 5777 train + 722 dev + 723 test sentences - /root/.flair/datasets/ner_icdar_europeana/nl
2023-10-12 14:02:20,584 ----------------------------------------------------------------------------------------------------
2023-10-12 14:02:20,584 Train: 5777 sentences
2023-10-12 14:02:20,584 (train_with_dev=False, train_with_test=False)
2023-10-12 14:02:20,584 ----------------------------------------------------------------------------------------------------
2023-10-12 14:02:20,584 Training Params:
2023-10-12 14:02:20,584 - learning_rate: "0.00016"
2023-10-12 14:02:20,584 - mini_batch_size: "8"
2023-10-12 14:02:20,584 - max_epochs: "10"
2023-10-12 14:02:20,584 - shuffle: "True"
2023-10-12 14:02:20,584 ----------------------------------------------------------------------------------------------------
2023-10-12 14:02:20,584 Plugins:
2023-10-12 14:02:20,585 - TensorboardLogger
2023-10-12 14:02:20,585 - LinearScheduler | warmup_fraction: '0.1'
2023-10-12 14:02:20,585 ----------------------------------------------------------------------------------------------------
2023-10-12 14:02:20,585 Final evaluation on model from best epoch (best-model.pt)
2023-10-12 14:02:20,585 - metric: "('micro avg', 'f1-score')"
2023-10-12 14:02:20,585 ----------------------------------------------------------------------------------------------------
2023-10-12 14:02:20,585 Computation:
2023-10-12 14:02:20,585 - compute on device: cuda:0
2023-10-12 14:02:20,585 - embedding storage: none
2023-10-12 14:02:20,585 ----------------------------------------------------------------------------------------------------
2023-10-12 14:02:20,585 Model training base path: "hmbench-icdar/nl-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-4"
2023-10-12 14:02:20,585 ----------------------------------------------------------------------------------------------------
2023-10-12 14:02:20,585 ----------------------------------------------------------------------------------------------------
2023-10-12 14:02:20,586 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-12 14:03:00,450 epoch 1 - iter 72/723 - loss 2.57146864 - time (sec): 39.86 - samples/sec: 448.11 - lr: 0.000016 - momentum: 0.000000
2023-10-12 14:03:40,814 epoch 1 - iter 144/723 - loss 2.49589202 - time (sec): 80.23 - samples/sec: 449.88 - lr: 0.000032 - momentum: 0.000000
2023-10-12 14:04:19,302 epoch 1 - iter 216/723 - loss 2.32807345 - time (sec): 118.71 - samples/sec: 447.33 - lr: 0.000048 - momentum: 0.000000
2023-10-12 14:04:57,778 epoch 1 - iter 288/723 - loss 2.11742467 - time (sec): 157.19 - samples/sec: 449.86 - lr: 0.000064 - momentum: 0.000000
2023-10-12 14:05:37,192 epoch 1 - iter 360/723 - loss 1.88902537 - time (sec): 196.60 - samples/sec: 449.44 - lr: 0.000079 - momentum: 0.000000
2023-10-12 14:06:17,712 epoch 1 - iter 432/723 - loss 1.67414156 - time (sec): 237.12 - samples/sec: 443.67 - lr: 0.000095 - momentum: 0.000000
2023-10-12 14:06:57,989 epoch 1 - iter 504/723 - loss 1.48274010 - time (sec): 277.40 - samples/sec: 441.13 - lr: 0.000111 - momentum: 0.000000
2023-10-12 14:07:40,425 epoch 1 - iter 576/723 - loss 1.32332329 - time (sec): 319.84 - samples/sec: 439.32 - lr: 0.000127 - momentum: 0.000000
2023-10-12 14:08:22,595 epoch 1 - iter 648/723 - loss 1.19542792 - time (sec): 362.01 - samples/sec: 438.88 - lr: 0.000143 - momentum: 0.000000
2023-10-12 14:09:04,093 epoch 1 - iter 720/723 - loss 1.10050274 - time (sec): 403.51 - samples/sec: 435.36 - lr: 0.000159 - momentum: 0.000000
2023-10-12 14:09:05,529 ----------------------------------------------------------------------------------------------------
2023-10-12 14:09:05,529 EPOCH 1 done: loss 1.0975 - lr: 0.000159
2023-10-12 14:09:26,230 DEV : loss 0.2361268699169159 - f1-score (micro avg)  0.0219
2023-10-12 14:09:26,265 saving best model
2023-10-12 14:09:27,169 ----------------------------------------------------------------------------------------------------
2023-10-12 14:10:08,416 epoch 2 - iter 72/723 - loss 0.16985132 - time (sec): 41.24 - samples/sec: 430.05 - lr: 0.000158 - momentum: 0.000000
2023-10-12 14:10:49,483 epoch 2 - iter 144/723 - loss 0.17040915 - time (sec): 82.31 - samples/sec: 423.49 - lr: 0.000156 - momentum: 0.000000
2023-10-12 14:11:31,985 epoch 2 - iter 216/723 - loss 0.16710193 - time (sec): 124.81 - samples/sec: 418.26 - lr: 0.000155 - momentum: 0.000000
2023-10-12 14:12:13,326 epoch 2 - iter 288/723 - loss 0.15978625 - time (sec): 166.15 - samples/sec: 418.23 - lr: 0.000153 - momentum: 0.000000
2023-10-12 14:12:52,562 epoch 2 - iter 360/723 - loss 0.15267936 - time (sec): 205.39 - samples/sec: 417.94 - lr: 0.000151 - momentum: 0.000000
2023-10-12 14:13:33,393 epoch 2 - iter 432/723 - loss 0.14849362 - time (sec): 246.22 - samples/sec: 420.74 - lr: 0.000149 - momentum: 0.000000
2023-10-12 14:14:13,896 epoch 2 - iter 504/723 - loss 0.14625781 - time (sec): 286.72 - samples/sec: 424.88 - lr: 0.000148 - momentum: 0.000000
2023-10-12 14:14:53,859 epoch 2 - iter 576/723 - loss 0.14267707 - time (sec): 326.69 - samples/sec: 428.30 - lr: 0.000146 - momentum: 0.000000
2023-10-12 14:15:35,999 epoch 2 - iter 648/723 - loss 0.13812316 - time (sec): 368.83 - samples/sec: 428.09 - lr: 0.000144 - momentum: 0.000000
2023-10-12 14:16:17,416 epoch 2 - iter 720/723 - loss 0.13583196 - time (sec): 410.24 - samples/sec: 427.74 - lr: 0.000142 - momentum: 0.000000
2023-10-12 14:16:19,108 ----------------------------------------------------------------------------------------------------
2023-10-12 14:16:19,109 EPOCH 2 done: loss 0.1355 - lr: 0.000142
2023-10-12 14:16:42,231 DEV : loss 0.12248684465885162 - f1-score (micro avg)  0.6968
2023-10-12 14:16:42,279 saving best model
2023-10-12 14:16:45,002 ----------------------------------------------------------------------------------------------------
2023-10-12 14:17:24,300 epoch 3 - iter 72/723 - loss 0.10358979 - time (sec): 39.29 - samples/sec: 428.50 - lr: 0.000140 - momentum: 0.000000
2023-10-12 14:18:05,010 epoch 3 - iter 144/723 - loss 0.09371822 - time (sec): 80.00 - samples/sec: 433.12 - lr: 0.000139 - momentum: 0.000000
2023-10-12 14:18:46,397 epoch 3 - iter 216/723 - loss 0.09346647 - time (sec): 121.39 - samples/sec: 426.31 - lr: 0.000137 - momentum: 0.000000
2023-10-12 14:19:28,061 epoch 3 - iter 288/723 - loss 0.08825349 - time (sec): 163.05 - samples/sec: 423.00 - lr: 0.000135 - momentum: 0.000000
2023-10-12 14:20:09,909 epoch 3 - iter 360/723 - loss 0.08701050 - time (sec): 204.90 - samples/sec: 421.98 - lr: 0.000133 - momentum: 0.000000
2023-10-12 14:20:54,725 epoch 3 - iter 432/723 - loss 0.08620917 - time (sec): 249.72 - samples/sec: 422.59 - lr: 0.000132 - momentum: 0.000000
2023-10-12 14:21:36,824 epoch 3 - iter 504/723 - loss 0.08267895 - time (sec): 291.82 - samples/sec: 422.98 - lr: 0.000130 - momentum: 0.000000
2023-10-12 14:22:18,077 epoch 3 - iter 576/723 - loss 0.08059396 - time (sec): 333.07 - samples/sec: 422.20 - lr: 0.000128 - momentum: 0.000000
2023-10-12 14:23:00,039 epoch 3 - iter 648/723 - loss 0.07876330 - time (sec): 375.03 - samples/sec: 420.84 - lr: 0.000126 - momentum: 0.000000
2023-10-12 14:23:39,699 epoch 3 - iter 720/723 - loss 0.07731069 - time (sec): 414.69 - samples/sec: 423.64 - lr: 0.000125 - momentum: 0.000000
2023-10-12 14:23:41,000 ----------------------------------------------------------------------------------------------------
2023-10-12 14:23:41,001 EPOCH 3 done: loss 0.0773 - lr: 0.000125
2023-10-12 14:24:03,948 DEV : loss 0.07725054025650024 - f1-score (micro avg)  0.8395
2023-10-12 14:24:03,984 saving best model
2023-10-12 14:24:06,590 ----------------------------------------------------------------------------------------------------
2023-10-12 14:24:47,396 epoch 4 - iter 72/723 - loss 0.04244017 - time (sec): 40.80 - samples/sec: 463.29 - lr: 0.000123 - momentum: 0.000000
2023-10-12 14:25:24,545 epoch 4 - iter 144/723 - loss 0.04566430 - time (sec): 77.95 - samples/sec: 449.41 - lr: 0.000121 - momentum: 0.000000
2023-10-12 14:26:02,955 epoch 4 - iter 216/723 - loss 0.04742743 - time (sec): 116.36 - samples/sec: 444.15 - lr: 0.000119 - momentum: 0.000000
2023-10-12 14:26:42,655 epoch 4 - iter 288/723 - loss 0.04960497 - time (sec): 156.06 - samples/sec: 441.84 - lr: 0.000117 - momentum: 0.000000
2023-10-12 14:27:21,464 epoch 4 - iter 360/723 - loss 0.04991076 - time (sec): 194.87 - samples/sec: 441.19 - lr: 0.000116 - momentum: 0.000000
2023-10-12 14:28:01,068 epoch 4 - iter 432/723 - loss 0.05291596 - time (sec): 234.47 - samples/sec: 443.83 - lr: 0.000114 - momentum: 0.000000
2023-10-12 14:28:42,792 epoch 4 - iter 504/723 - loss 0.05159753 - time (sec): 276.20 - samples/sec: 445.76 - lr: 0.000112 - momentum: 0.000000
2023-10-12 14:29:23,833 epoch 4 - iter 576/723 - loss 0.05151396 - time (sec): 317.24 - samples/sec: 441.89 - lr: 0.000110 - momentum: 0.000000
2023-10-12 14:30:06,126 epoch 4 - iter 648/723 - loss 0.05194550 - time (sec): 359.53 - samples/sec: 438.57 - lr: 0.000109 - momentum: 0.000000
2023-10-12 14:30:48,071 epoch 4 - iter 720/723 - loss 0.05103521 - time (sec): 401.48 - samples/sec: 437.75 - lr: 0.000107 - momentum: 0.000000
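The lr column in the iteration lines above traces the LinearScheduler plugin with warmup_fraction '0.1': linear warmup over the first 10% of all steps (723 mini-batches/epoch × 10 epochs = 7,230) up to the peak learning_rate 0.00016, then linear decay to zero. A minimal plain-Python sketch of that schedule (the function name `linear_schedule_lr` is illustrative, not Flair's internal API):

```python
def linear_schedule_lr(step: int, total_steps: int,
                       peak_lr: float, warmup_fraction: float) -> float:
    """Linear warmup to peak_lr, then linear decay to zero."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

TOTAL = 723 * 10  # 723 mini-batches per epoch, 10 epochs

# epoch 1, iter 72 (during warmup)
print(f"{linear_schedule_lr(72, TOTAL, 0.00016, 0.1):.6f}")    # 0.000016
# epoch 4, iter 720 = step 2889 (during decay)
print(f"{linear_schedule_lr(2889, TOTAL, 0.00016, 0.1):.6f}")  # 0.000107
```

Both printed values match the logged lr at those steps, consistent with the scheduler stepping once per mini-batch.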
2023-10-12 14:30:49,234 ----------------------------------------------------------------------------------------------------
2023-10-12 14:30:49,234 EPOCH 4 done: loss 0.0510 - lr: 0.000107
2023-10-12 14:31:11,165 DEV : loss 0.08439276367425919 - f1-score (micro avg)  0.8373
2023-10-12 14:31:11,197 ----------------------------------------------------------------------------------------------------
2023-10-12 14:31:51,823 epoch 5 - iter 72/723 - loss 0.02827603 - time (sec): 40.62 - samples/sec: 417.84 - lr: 0.000105 - momentum: 0.000000
2023-10-12 14:32:34,630 epoch 5 - iter 144/723 - loss 0.03309349 - time (sec): 83.43 - samples/sec: 412.21 - lr: 0.000103 - momentum: 0.000000
2023-10-12 14:33:17,993 epoch 5 - iter 216/723 - loss 0.03235199 - time (sec): 126.79 - samples/sec: 411.07 - lr: 0.000101 - momentum: 0.000000
2023-10-12 14:34:00,697 epoch 5 - iter 288/723 - loss 0.03466735 - time (sec): 169.50 - samples/sec: 416.22 - lr: 0.000100 - momentum: 0.000000
2023-10-12 14:34:41,998 epoch 5 - iter 360/723 - loss 0.03514176 - time (sec): 210.80 - samples/sec: 417.92 - lr: 0.000098 - momentum: 0.000000
2023-10-12 14:35:24,599 epoch 5 - iter 432/723 - loss 0.03437310 - time (sec): 253.40 - samples/sec: 416.97 - lr: 0.000096 - momentum: 0.000000
2023-10-12 14:36:08,609 epoch 5 - iter 504/723 - loss 0.03332593 - time (sec): 297.41 - samples/sec: 410.67 - lr: 0.000094 - momentum: 0.000000
2023-10-12 14:36:52,064 epoch 5 - iter 576/723 - loss 0.03276601 - time (sec): 340.87 - samples/sec: 410.46 - lr: 0.000093 - momentum: 0.000000
2023-10-12 14:37:31,645 epoch 5 - iter 648/723 - loss 0.03304384 - time (sec): 380.45 - samples/sec: 414.10 - lr: 0.000091 - momentum: 0.000000
2023-10-12 14:38:10,349 epoch 5 - iter 720/723 - loss 0.03360075 - time (sec): 419.15 - samples/sec: 419.16 - lr: 0.000089 - momentum: 0.000000
2023-10-12 14:38:11,539 ----------------------------------------------------------------------------------------------------
2023-10-12 14:38:11,539 EPOCH 5 done: loss 0.0336 - lr: 0.000089
2023-10-12 14:38:32,325 DEV : loss 0.09600105881690979 - f1-score (micro avg)  0.8287
2023-10-12 14:38:32,357 ----------------------------------------------------------------------------------------------------
2023-10-12 14:39:11,419 epoch 6 - iter 72/723 - loss 0.03220877 - time (sec): 39.06 - samples/sec: 460.19 - lr: 0.000087 - momentum: 0.000000
2023-10-12 14:39:49,366 epoch 6 - iter 144/723 - loss 0.03071465 - time (sec): 77.01 - samples/sec: 454.22 - lr: 0.000085 - momentum: 0.000000
2023-10-12 14:40:28,696 epoch 6 - iter 216/723 - loss 0.02895270 - time (sec): 116.34 - samples/sec: 454.75 - lr: 0.000084 - momentum: 0.000000
2023-10-12 14:41:08,773 epoch 6 - iter 288/723 - loss 0.02631946 - time (sec): 156.41 - samples/sec: 458.51 - lr: 0.000082 - momentum: 0.000000
2023-10-12 14:41:48,052 epoch 6 - iter 360/723 - loss 0.02581441 - time (sec): 195.69 - samples/sec: 446.78 - lr: 0.000080 - momentum: 0.000000
2023-10-12 14:42:31,220 epoch 6 - iter 432/723 - loss 0.02580522 - time (sec): 238.86 - samples/sec: 447.15 - lr: 0.000078 - momentum: 0.000000
2023-10-12 14:43:09,959 epoch 6 - iter 504/723 - loss 0.02470462 - time (sec): 277.60 - samples/sec: 445.29 - lr: 0.000077 - momentum: 0.000000
2023-10-12 14:43:48,878 epoch 6 - iter 576/723 - loss 0.02475012 - time (sec): 316.52 - samples/sec: 446.23 - lr: 0.000075 - momentum: 0.000000
2023-10-12 14:44:29,409 epoch 6 - iter 648/723 - loss 0.02481761 - time (sec): 357.05 - samples/sec: 444.63 - lr: 0.000073 - momentum: 0.000000
2023-10-12 14:45:08,583 epoch 6 - iter 720/723 - loss 0.02526761 - time (sec): 396.22 - samples/sec: 443.30 - lr: 0.000071 - momentum: 0.000000
2023-10-12 14:45:09,781 ----------------------------------------------------------------------------------------------------
2023-10-12 14:45:09,782 EPOCH 6 done: loss 0.0252 - lr: 0.000071
2023-10-12 14:45:31,300 DEV : loss 0.10096623748540878 - f1-score (micro avg)  0.8417
2023-10-12 14:45:31,331 saving best model
2023-10-12 14:45:33,926 ----------------------------------------------------------------------------------------------------
2023-10-12 14:46:12,365 epoch 7 - iter 72/723 - loss 0.02687994 - time (sec): 38.44 - samples/sec: 434.81 - lr: 0.000069 - momentum: 0.000000
2023-10-12 14:46:53,700 epoch 7 - iter 144/723 - loss 0.02462178 - time (sec): 79.77 - samples/sec: 448.83 - lr: 0.000068 - momentum: 0.000000
2023-10-12 14:47:35,239 epoch 7 - iter 216/723 - loss 0.02481712 - time (sec): 121.31 - samples/sec: 441.84 - lr: 0.000066 - momentum: 0.000000
2023-10-12 14:48:14,912 epoch 7 - iter 288/723 - loss 0.02288093 - time (sec): 160.98 - samples/sec: 441.36 - lr: 0.000064 - momentum: 0.000000
2023-10-12 14:48:54,552 epoch 7 - iter 360/723 - loss 0.02190559 - time (sec): 200.62 - samples/sec: 442.32 - lr: 0.000062 - momentum: 0.000000
2023-10-12 14:49:37,326 epoch 7 - iter 432/723 - loss 0.02096859 - time (sec): 243.40 - samples/sec: 439.35 - lr: 0.000061 - momentum: 0.000000
2023-10-12 14:50:18,758 epoch 7 - iter 504/723 - loss 0.02057194 - time (sec): 284.83 - samples/sec: 439.64 - lr: 0.000059 - momentum: 0.000000
2023-10-12 14:50:57,549 epoch 7 - iter 576/723 - loss 0.01995539 - time (sec): 323.62 - samples/sec: 440.58 - lr: 0.000057 - momentum: 0.000000
2023-10-12 14:51:36,488 epoch 7 - iter 648/723 - loss 0.01987628 - time (sec): 362.56 - samples/sec: 438.18 - lr: 0.000055 - momentum: 0.000000
2023-10-12 14:52:19,649 epoch 7 - iter 720/723 - loss 0.01959379 - time (sec): 405.72 - samples/sec: 433.27 - lr: 0.000053 - momentum: 0.000000
2023-10-12 14:52:20,987 ----------------------------------------------------------------------------------------------------
2023-10-12 14:52:20,988 EPOCH 7 done: loss 0.0196 - lr: 0.000053
2023-10-12 14:52:45,021 DEV : loss 0.13744854927062988 - f1-score (micro avg)  0.8058
2023-10-12 14:52:45,064 ----------------------------------------------------------------------------------------------------
2023-10-12 14:53:26,090 epoch 8 - iter 72/723 - loss 0.01120851 - time (sec): 41.02 - samples/sec: 435.90 - lr: 0.000052 - momentum: 0.000000
2023-10-12 14:54:05,919 epoch 8 - iter 144/723 - loss 0.01234430 - time (sec): 80.85 - samples/sec: 444.59 - lr: 0.000050 - momentum: 0.000000
2023-10-12 14:54:47,450 epoch 8 - iter 216/723 - loss 0.01255276 - time (sec): 122.38 - samples/sec: 437.65 - lr: 0.000048 - momentum: 0.000000
2023-10-12 14:55:29,854 epoch 8 - iter 288/723 - loss 0.01141486 - time (sec): 164.79 - samples/sec: 438.32 - lr: 0.000046 - momentum: 0.000000
2023-10-12 14:56:10,467 epoch 8 - iter 360/723 - loss 0.01320089 - time (sec): 205.40 - samples/sec: 428.85 - lr: 0.000045 - momentum: 0.000000
2023-10-12 14:56:51,709 epoch 8 - iter 432/723 - loss 0.01317686 - time (sec): 246.64 - samples/sec: 426.45 - lr: 0.000043 - momentum: 0.000000
2023-10-12 14:57:32,364 epoch 8 - iter 504/723 - loss 0.01453607 - time (sec): 287.30 - samples/sec: 426.39 - lr: 0.000041 - momentum: 0.000000
2023-10-12 14:58:13,699 epoch 8 - iter 576/723 - loss 0.01465906 - time (sec): 328.63 - samples/sec: 426.80 - lr: 0.000039 - momentum: 0.000000
2023-10-12 14:58:53,964 epoch 8 - iter 648/723 - loss 0.01412604 - time (sec): 368.90 - samples/sec: 427.98 - lr: 0.000037 - momentum: 0.000000
2023-10-12 14:59:34,520 epoch 8 - iter 720/723 - loss 0.01516939 - time (sec): 409.45 - samples/sec: 429.35 - lr: 0.000036 - momentum: 0.000000
2023-10-12 14:59:35,627 ----------------------------------------------------------------------------------------------------
2023-10-12 14:59:35,627 EPOCH 8 done: loss 0.0152 - lr: 0.000036
2023-10-12 14:59:56,735 DEV : loss 0.12667076289653778 - f1-score (micro avg)  0.8518
2023-10-12 14:59:56,766 saving best model
2023-10-12 14:59:59,434 ----------------------------------------------------------------------------------------------------
2023-10-12 15:00:40,080 epoch 9 - iter 72/723 - loss 0.01201924 - time (sec): 40.64 - samples/sec: 452.26 - lr: 0.000034 - momentum: 0.000000
2023-10-12 15:01:20,276 epoch 9 - iter 144/723 - loss 0.01119713 - time (sec): 80.84 - samples/sec: 435.24 - lr: 0.000032 - momentum: 0.000000
2023-10-12 15:02:01,633 epoch 9 - iter 216/723 - loss 0.01319119 - time (sec): 122.19 - samples/sec: 427.37 - lr: 0.000030 - momentum: 0.000000
2023-10-12 15:02:42,950 epoch 9 - iter 288/723 - loss 0.01132013 - time (sec): 163.51 - samples/sec: 425.85 - lr: 0.000028 - momentum: 0.000000
2023-10-12 15:03:25,210 epoch 9 - iter 360/723 - loss 0.01028013 - time (sec): 205.77 - samples/sec: 427.47 - lr: 0.000027 - momentum: 0.000000
2023-10-12 15:04:06,118 epoch 9 - iter 432/723 - loss 0.01003947 - time (sec): 246.68 - samples/sec: 427.88 - lr: 0.000025 - momentum: 0.000000
2023-10-12 15:04:46,490 epoch 9 - iter 504/723 - loss 0.01002813 - time (sec): 287.05 - samples/sec: 431.57 - lr: 0.000023 - momentum: 0.000000
2023-10-12 15:05:25,758 epoch 9 - iter 576/723 - loss 0.01025793 - time (sec): 326.32 - samples/sec: 432.21 - lr: 0.000021 - momentum: 0.000000
2023-10-12 15:06:04,302 epoch 9 - iter 648/723 - loss 0.01067156 - time (sec): 364.86 - samples/sec: 431.60 - lr: 0.000020 - momentum: 0.000000
2023-10-12 15:06:44,309 epoch 9 - iter 720/723 - loss 0.01056221 - time (sec): 404.87 - samples/sec: 432.53 - lr: 0.000018 - momentum: 0.000000
2023-10-12 15:06:46,196 ----------------------------------------------------------------------------------------------------
2023-10-12 15:06:46,196 EPOCH 9 done: loss 0.0115 - lr: 0.000018
2023-10-12 15:07:06,773 DEV : loss 0.13315436244010925 - f1-score (micro avg)  0.8539
2023-10-12 15:07:06,804 saving best model
2023-10-12 15:07:09,856 ----------------------------------------------------------------------------------------------------
2023-10-12 15:07:50,190 epoch 10 - iter 72/723 - loss 0.01793669 - time (sec): 40.33 - samples/sec: 467.25 - lr: 0.000016 - momentum: 0.000000
2023-10-12 15:08:29,257 epoch 10 - iter 144/723 - loss 0.01259728 - time (sec): 79.39 - samples/sec: 458.96 - lr: 0.000014 - momentum: 0.000000
2023-10-12 15:09:08,646 epoch 10 - iter 216/723 - loss 0.01175785 - time (sec): 118.78 - samples/sec: 450.96 - lr: 0.000012 - momentum: 0.000000
2023-10-12 15:09:48,830 epoch 10 - iter 288/723 - loss 0.01130089 - time (sec): 158.97 - samples/sec: 449.10 - lr: 0.000011 - momentum: 0.000000
2023-10-12 15:10:28,554 epoch 10 - iter 360/723 - loss 0.01084751 - time (sec): 198.69 - samples/sec: 449.18 - lr: 0.000009 - momentum: 0.000000
2023-10-12 15:11:09,274 epoch 10 - iter 432/723 - loss 0.00990835 - time (sec): 239.41 - samples/sec: 450.76 - lr: 0.000007 - momentum: 0.000000
2023-10-12 15:11:48,871 epoch 10 - iter 504/723 - loss 0.00967903 - time (sec): 279.01 - samples/sec: 443.76 - lr: 0.000005 - momentum: 0.000000
2023-10-12 15:12:29,990 epoch 10 - iter 576/723 - loss 0.00932348 - time (sec): 320.13 - samples/sec: 444.44 - lr: 0.000004 - momentum: 0.000000
2023-10-12 15:13:08,637 epoch 10 - iter 648/723 - loss 0.00892390 - time (sec): 358.77 - samples/sec: 442.66 - lr: 0.000002 - momentum: 0.000000
2023-10-12 15:13:47,594 epoch 10 - iter 720/723 - loss 0.00930951 - time (sec): 397.73 - samples/sec: 441.99 - lr: 0.000000 - momentum: 0.000000
2023-10-12 15:13:48,661 ----------------------------------------------------------------------------------------------------
2023-10-12 15:13:48,661 EPOCH 10 done: loss 0.0093 - lr: 0.000000
2023-10-12 15:14:11,235 DEV : loss 0.138749361038208 - f1-score (micro avg)  0.8498
2023-10-12 15:14:12,139 ----------------------------------------------------------------------------------------------------
2023-10-12 15:14:12,141 Loading model from best epoch ...
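The label dictionary reported after loading (13 tags) is the BIOES encoding of the three entity types LOC, PER and ORG plus the outside tag O. A small sketch of where the count 13 comes from (the helper `bioes_tags` is hypothetical, not part of Flair):

```python
def bioes_tags(entity_types):
    """Expand entity types into a BIOES tag set: O plus S-/B-/E-/I- per type."""
    tags = ["O"]
    for t in entity_types:
        tags.extend([f"S-{t}", f"B-{t}", f"E-{t}", f"I-{t}"])
    return tags

tags = bioes_tags(["LOC", "PER", "ORG"])
print(len(tags))  # 13 = 1 + 4 tags x 3 entity types
```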
2023-10-12 15:14:16,462 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-12 15:14:38,321 Results:
- F-score (micro) 0.8482
- F-score (macro) 0.7513
- Accuracy 0.7489

By class:
              precision    recall  f1-score   support

         PER     0.8630    0.8361    0.8493       482
         LOC     0.9302    0.8734    0.9009       458
         ORG     0.5000    0.5072    0.5036        69

   micro avg     0.8666    0.8305    0.8482      1009
   macro avg     0.7644    0.7389    0.7513      1009
weighted avg     0.8687    0.8305    0.8491      1009

2023-10-12 15:14:38,322 ----------------------------------------------------------------------------------------------------
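As a sanity check on the final table: each F1 is the harmonic mean of the precision and recall in the same row, the macro average is the unweighted mean of the per-class F1 values, and the micro average pools all 1009 test spans. A short plain-Python sketch (`f1` is an illustrative helper, not the evaluation code that produced this log):

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, support) per class, from the evaluation table
per_class = {
    "PER": (0.8630, 0.8361, 482),
    "LOC": (0.9302, 0.8734, 458),
    "ORG": (0.5000, 0.5072, 69),
}

# micro avg F1 from the pooled micro precision/recall
print(round(f1(0.8666, 0.8305), 4))  # 0.8482

# macro avg F1: unweighted mean of per-class F1
macro = sum(f1(p, r) for p, r, _ in per_class.values()) / len(per_class)
print(round(macro, 4))               # 0.7513
```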