2023-10-12 21:23:45,902 ----------------------------------------------------------------------------------------------------
2023-10-12 21:23:45,904 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-12 21:23:45,904 ----------------------------------------------------------------------------------------------------
2023-10-12 21:23:45,904 MultiCorpus: 5777 train + 722 dev + 723 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 5777 train + 722 dev + 723 test sentences - /root/.flair/datasets/ner_icdar_europeana/nl
2023-10-12 21:23:45,904 ----------------------------------------------------------------------------------------------------
2023-10-12 21:23:45,904 Train:  5777 sentences
2023-10-12 21:23:45,905         (train_with_dev=False, train_with_test=False)
2023-10-12 21:23:45,905 ----------------------------------------------------------------------------------------------------
2023-10-12 21:23:45,905 Training Params:
2023-10-12 21:23:45,905  - learning_rate: "0.00016"
2023-10-12 21:23:45,905  - mini_batch_size: "4"
2023-10-12 21:23:45,905  - max_epochs: "10"
2023-10-12 21:23:45,905  - shuffle: "True"
2023-10-12 21:23:45,905 ----------------------------------------------------------------------------------------------------
2023-10-12 21:23:45,905 Plugins:
2023-10-12 21:23:45,905  - TensorboardLogger
2023-10-12 21:23:45,905  - LinearScheduler | warmup_fraction: '0.1'
2023-10-12 21:23:45,905 ----------------------------------------------------------------------------------------------------
2023-10-12 21:23:45,905 Final evaluation on model from best epoch (best-model.pt)
2023-10-12 21:23:45,905  - metric: "('micro avg', 'f1-score')"
2023-10-12 21:23:45,905 ----------------------------------------------------------------------------------------------------
2023-10-12 21:23:45,906 Computation:
2023-10-12 21:23:45,906  - compute on device: cuda:0
2023-10-12 21:23:45,906  - embedding storage: none
2023-10-12 21:23:45,906 ----------------------------------------------------------------------------------------------------
2023-10-12 21:23:45,906 Model training base path: "hmbench-icdar/nl-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-5"
2023-10-12 21:23:45,906 ----------------------------------------------------------------------------------------------------
2023-10-12 21:23:45,906 ----------------------------------------------------------------------------------------------------
2023-10-12 21:23:45,906 Logging anything other than scalars to TensorBoard is currently not supported.
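For reference, the configuration logged above corresponds roughly to the following Flair fine-tuning script. This is a minimal sketch, not the exact hmBench training code: it assumes a recent Flair release, substitutes the stock TransformerWordEmbeddings class for the custom ByT5Embeddings wrapper shown in the model summary, assumes the "ner" label type, and takes the backbone model ID, subtoken pooling ("first"), layer selection ("-1") and CRF setting from the base path above; the NER_ICDAR_EUROPEANA corpus class and its language argument are likewise inferred from the dataset cache path.

    # Sketch of the logged training setup (see the assumptions noted above).
    from flair.datasets import NER_ICDAR_EUROPEANA
    from flair.embeddings import TransformerWordEmbeddings
    from flair.models import SequenceTagger
    from flair.trainers import ModelTrainer

    # Dutch split of the ICDAR Europeana NER corpus: 5777 train / 722 dev / 723 test sentences.
    corpus = NER_ICDAR_EUROPEANA(language="nl")
    label_dictionary = corpus.make_label_dictionary(label_type="ner")

    # hmByT5 backbone, last layer only, "first" subtoken pooling, fine-tuned end-to-end.
    embeddings = TransformerWordEmbeddings(
        model="hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax",
        layers="-1",
        subtoken_pooling="first",
        fine_tune=True,
    )

    # Plain linear tag head on top of the embeddings: no RNN, no CRF, no reprojection,
    # matching the LockedDropout + Linear(1472 -> 13) head in the model summary.
    tagger = SequenceTagger(
        hidden_size=256,  # unused when use_rnn=False
        embeddings=embeddings,
        tag_dictionary=label_dictionary,
        tag_type="ner",
        use_rnn=False,
        use_crf=False,
        reproject_embeddings=False,
    )

    # fine_tune() uses AdamW with a linear schedule and 10% warmup by default,
    # i.e. the "LinearScheduler | warmup_fraction: '0.1'" plugin listed above.
    trainer = ModelTrainer(tagger, corpus)
    trainer.fine_tune(
        "hmbench-icdar/nl-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-5",
        learning_rate=0.00016,
        mini_batch_size=4,
        max_epochs=10,
    )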
2023-10-12 21:24:27,641 epoch 1 - iter 144/1445 - loss 2.53329630 - time (sec): 41.73 - samples/sec: 432.73 - lr: 0.000016 - momentum: 0.000000
2023-10-12 21:25:09,203 epoch 1 - iter 288/1445 - loss 2.36537339 - time (sec): 83.29 - samples/sec: 433.39 - lr: 0.000032 - momentum: 0.000000
2023-10-12 21:25:49,786 epoch 1 - iter 432/1445 - loss 2.11838959 - time (sec): 123.88 - samples/sec: 422.80 - lr: 0.000048 - momentum: 0.000000
2023-10-12 21:26:31,250 epoch 1 - iter 576/1445 - loss 1.81958359 - time (sec): 165.34 - samples/sec: 423.75 - lr: 0.000064 - momentum: 0.000000
2023-10-12 21:27:12,928 epoch 1 - iter 720/1445 - loss 1.54330067 - time (sec): 207.02 - samples/sec: 424.24 - lr: 0.000080 - momentum: 0.000000
2023-10-12 21:27:54,484 epoch 1 - iter 864/1445 - loss 1.33675258 - time (sec): 248.58 - samples/sec: 421.19 - lr: 0.000096 - momentum: 0.000000
2023-10-12 21:28:36,755 epoch 1 - iter 1008/1445 - loss 1.17087202 - time (sec): 290.85 - samples/sec: 420.93 - lr: 0.000112 - momentum: 0.000000
2023-10-12 21:29:17,487 epoch 1 - iter 1152/1445 - loss 1.05166695 - time (sec): 331.58 - samples/sec: 419.84 - lr: 0.000127 - momentum: 0.000000
2023-10-12 21:29:59,649 epoch 1 - iter 1296/1445 - loss 0.95132545 - time (sec): 373.74 - samples/sec: 419.88 - lr: 0.000143 - momentum: 0.000000
2023-10-12 21:30:42,244 epoch 1 - iter 1440/1445 - loss 0.86538888 - time (sec): 416.34 - samples/sec: 421.44 - lr: 0.000159 - momentum: 0.000000
2023-10-12 21:30:43,683 ----------------------------------------------------------------------------------------------------
2023-10-12 21:30:43,684 EPOCH 1 done: loss 0.8621 - lr: 0.000159
2023-10-12 21:31:04,241 DEV : loss 0.1847972571849823 - f1-score (micro avg) 0.3705
2023-10-12 21:31:04,273 saving best model
2023-10-12 21:31:05,195 ----------------------------------------------------------------------------------------------------
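The lr column follows from the LinearScheduler plugin with warmup_fraction '0.1': with 1445 mini-batches per epoch and 10 epochs there are 14,450 scheduler steps in total, so the learning rate climbs linearly from 0 to the peak of 0.00016 over the first 1,445 steps (exactly epoch 1, which is why it reaches roughly 0.000159 just above) and then decays linearly back to 0 over the remaining nine epochs. A small sketch of that schedule, assuming one scheduler step per mini-batch (the lr_at helper is hypothetical, for illustration only):

    # Linear warmup/decay schedule implied by the log (one scheduler step per batch assumed).
    PEAK_LR = 0.00016
    STEPS_PER_EPOCH = 1445
    TOTAL_STEPS = 10 * STEPS_PER_EPOCH        # 14450
    WARMUP_STEPS = int(0.1 * TOTAL_STEPS)     # 1445, i.e. exactly epoch 1

    def lr_at(step: int) -> float:
        """Learning rate after `step` scheduler steps."""
        if step < WARMUP_STEPS:
            return PEAK_LR * step / WARMUP_STEPS
        return PEAK_LR * (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS)

    print(round(lr_at(1440), 6))                  # 0.000159 -> end of epoch 1 above
    print(round(lr_at(2 * STEPS_PER_EPOCH), 6))   # 0.000142 -> end of epoch 2 below
    print(round(lr_at(10 * STEPS_PER_EPOCH), 6))  # 0.0      -> end of epoch 10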
2023-10-12 21:31:48,840 epoch 2 - iter 144/1445 - loss 0.13150717 - time (sec): 43.64 - samples/sec: 399.70 - lr: 0.000158 - momentum: 0.000000
2023-10-12 21:32:32,839 epoch 2 - iter 288/1445 - loss 0.12547482 - time (sec): 87.64 - samples/sec: 404.66 - lr: 0.000156 - momentum: 0.000000
2023-10-12 21:33:15,269 epoch 2 - iter 432/1445 - loss 0.12341173 - time (sec): 130.07 - samples/sec: 399.66 - lr: 0.000155 - momentum: 0.000000
2023-10-12 21:33:58,085 epoch 2 - iter 576/1445 - loss 0.12210825 - time (sec): 172.89 - samples/sec: 403.62 - lr: 0.000153 - momentum: 0.000000
2023-10-12 21:34:41,150 epoch 2 - iter 720/1445 - loss 0.11883011 - time (sec): 215.95 - samples/sec: 403.98 - lr: 0.000151 - momentum: 0.000000
2023-10-12 21:35:27,590 epoch 2 - iter 864/1445 - loss 0.11676973 - time (sec): 262.39 - samples/sec: 399.41 - lr: 0.000149 - momentum: 0.000000
2023-10-12 21:36:12,186 epoch 2 - iter 1008/1445 - loss 0.11778461 - time (sec): 306.99 - samples/sec: 397.72 - lr: 0.000148 - momentum: 0.000000
2023-10-12 21:36:53,431 epoch 2 - iter 1152/1445 - loss 0.11553749 - time (sec): 348.23 - samples/sec: 401.78 - lr: 0.000146 - momentum: 0.000000
2023-10-12 21:37:33,796 epoch 2 - iter 1296/1445 - loss 0.11346210 - time (sec): 388.60 - samples/sec: 407.08 - lr: 0.000144 - momentum: 0.000000
2023-10-12 21:38:13,915 epoch 2 - iter 1440/1445 - loss 0.11050610 - time (sec): 428.72 - samples/sec: 409.60 - lr: 0.000142 - momentum: 0.000000
2023-10-12 21:38:15,240 ----------------------------------------------------------------------------------------------------
2023-10-12 21:38:15,241 EPOCH 2 done: loss 0.1103 - lr: 0.000142
2023-10-12 21:38:35,365 DEV : loss 0.08923686295747757 - f1-score (micro avg) 0.8234
2023-10-12 21:38:35,394 saving best model
2023-10-12 21:38:37,920 ----------------------------------------------------------------------------------------------------
2023-10-12 21:39:18,782 epoch 3 - iter 144/1445 - loss 0.06834371 - time (sec): 40.86 - samples/sec: 438.20 - lr: 0.000140 - momentum: 0.000000
2023-10-12 21:40:00,281 epoch 3 - iter 288/1445 - loss 0.06815199 - time (sec): 82.36 - samples/sec: 436.70 - lr: 0.000139 - momentum: 0.000000
2023-10-12 21:40:40,704 epoch 3 - iter 432/1445 - loss 0.06735091 - time (sec): 122.78 - samples/sec: 435.93 - lr: 0.000137 - momentum: 0.000000
2023-10-12 21:41:21,257 epoch 3 - iter 576/1445 - loss 0.06974133 - time (sec): 163.33 - samples/sec: 435.20 - lr: 0.000135 - momentum: 0.000000
2023-10-12 21:42:01,798 epoch 3 - iter 720/1445 - loss 0.06899677 - time (sec): 203.87 - samples/sec: 436.74 - lr: 0.000133 - momentum: 0.000000
2023-10-12 21:42:42,864 epoch 3 - iter 864/1445 - loss 0.06959133 - time (sec): 244.94 - samples/sec: 439.98 - lr: 0.000132 - momentum: 0.000000
2023-10-12 21:43:24,863 epoch 3 - iter 1008/1445 - loss 0.06960467 - time (sec): 286.94 - samples/sec: 436.21 - lr: 0.000130 - momentum: 0.000000
2023-10-12 21:44:06,858 epoch 3 - iter 1152/1445 - loss 0.07023728 - time (sec): 328.93 - samples/sec: 432.23 - lr: 0.000128 - momentum: 0.000000
2023-10-12 21:44:49,005 epoch 3 - iter 1296/1445 - loss 0.06915927 - time (sec): 371.08 - samples/sec: 427.73 - lr: 0.000126 - momentum: 0.000000
2023-10-12 21:45:31,391 epoch 3 - iter 1440/1445 - loss 0.06784203 - time (sec): 413.47 - samples/sec: 424.50 - lr: 0.000125 - momentum: 0.000000
2023-10-12 21:45:32,761 ----------------------------------------------------------------------------------------------------
2023-10-12 21:45:32,762 EPOCH 3 done: loss 0.0679 - lr: 0.000125
2023-10-12 21:45:54,049 DEV : loss 0.07864446192979813 - f1-score (micro avg) 0.8472
2023-10-12 21:45:54,079 saving best model
2023-10-12 21:45:56,622 ----------------------------------------------------------------------------------------------------
2023-10-12 21:46:38,942 epoch 4 - iter 144/1445 - loss 0.05236873 - time (sec): 42.32 - samples/sec: 423.83 - lr: 0.000123 - momentum: 0.000000
2023-10-12 21:47:20,089 epoch 4 - iter 288/1445 - loss 0.05180904 - time (sec): 83.46 - samples/sec: 416.34 - lr: 0.000121 - momentum: 0.000000
2023-10-12 21:48:02,270 epoch 4 - iter 432/1445 - loss 0.04908008 - time (sec): 125.64 - samples/sec: 417.28 - lr: 0.000119 - momentum: 0.000000
2023-10-12 21:48:45,487 epoch 4 - iter 576/1445 - loss 0.04796946 - time (sec): 168.86 - samples/sec: 423.21 - lr: 0.000117 - momentum: 0.000000
2023-10-12 21:49:28,393 epoch 4 - iter 720/1445 - loss 0.04610333 - time (sec): 211.77 - samples/sec: 421.32 - lr: 0.000116 - momentum: 0.000000
2023-10-12 21:50:09,873 epoch 4 - iter 864/1445 - loss 0.04493061 - time (sec): 253.25 - samples/sec: 419.33 - lr: 0.000114 - momentum: 0.000000
2023-10-12 21:50:50,880 epoch 4 - iter 1008/1445 - loss 0.04548811 - time (sec): 294.25 - samples/sec: 418.63 - lr: 0.000112 - momentum: 0.000000
2023-10-12 21:51:32,768 epoch 4 - iter 1152/1445 - loss 0.04476085 - time (sec): 336.14 - samples/sec: 420.71 - lr: 0.000110 - momentum: 0.000000
2023-10-12 21:52:13,889 epoch 4 - iter 1296/1445 - loss 0.04656794 - time (sec): 377.26 - samples/sec: 421.08 - lr: 0.000109 - momentum: 0.000000
2023-10-12 21:52:54,707 epoch 4 - iter 1440/1445 - loss 0.04588922 - time (sec): 418.08 - samples/sec: 420.56 - lr: 0.000107 - momentum: 0.000000
2023-10-12 21:52:55,823 ----------------------------------------------------------------------------------------------------
2023-10-12 21:52:55,823 EPOCH 4 done: loss 0.0460 - lr: 0.000107
2023-10-12 21:53:16,107 DEV : loss 0.10065485537052155 - f1-score (micro avg) 0.8398
2023-10-12 21:53:16,137 ----------------------------------------------------------------------------------------------------
2023-10-12 21:53:57,924 epoch 5 - iter 144/1445 - loss 0.03754111 - time (sec): 41.79 - samples/sec: 451.73 - lr: 0.000105 - momentum: 0.000000
2023-10-12 21:54:37,644 epoch 5 - iter 288/1445 - loss 0.03295251 - time (sec): 81.51 - samples/sec: 445.25 - lr: 0.000103 - momentum: 0.000000
2023-10-12 21:55:16,790 epoch 5 - iter 432/1445 - loss 0.03034951 - time (sec): 120.65 - samples/sec: 429.78 - lr: 0.000101 - momentum: 0.000000
2023-10-12 21:55:55,747 epoch 5 - iter 576/1445 - loss 0.02962774 - time (sec): 159.61 - samples/sec: 426.38 - lr: 0.000100 - momentum: 0.000000
2023-10-12 21:56:36,572 epoch 5 - iter 720/1445 - loss 0.03171128 - time (sec): 200.43 - samples/sec: 432.19 - lr: 0.000098 - momentum: 0.000000
2023-10-12 21:57:16,464 epoch 5 - iter 864/1445 - loss 0.03189659 - time (sec): 240.33 - samples/sec: 432.78 - lr: 0.000096 - momentum: 0.000000
2023-10-12 21:57:57,974 epoch 5 - iter 1008/1445 - loss 0.03208778 - time (sec): 281.83 - samples/sec: 434.75 - lr: 0.000094 - momentum: 0.000000
2023-10-12 21:58:38,623 epoch 5 - iter 1152/1445 - loss 0.03196953 - time (sec): 322.48 - samples/sec: 435.07 - lr: 0.000093 - momentum: 0.000000
2023-10-12 21:59:18,873 epoch 5 - iter 1296/1445 - loss 0.03247366 - time (sec): 362.73 - samples/sec: 435.30 - lr: 0.000091 - momentum: 0.000000
2023-10-12 21:59:58,904 epoch 5 - iter 1440/1445 - loss 0.03343762 - time (sec): 402.77 - samples/sec: 435.41 - lr: 0.000089 - momentum: 0.000000
2023-10-12 22:00:00,335 ----------------------------------------------------------------------------------------------------
2023-10-12 22:00:00,336 EPOCH 5 done: loss 0.0340 - lr: 0.000089
2023-10-12 22:00:21,562 DEV : loss 0.11156909167766571 - f1-score (micro avg) 0.8332
2023-10-12 22:00:21,591 ----------------------------------------------------------------------------------------------------
2023-10-12 22:01:02,121 epoch 6 - iter 144/1445 - loss 0.01887123 - time (sec): 40.53 - samples/sec: 424.57 - lr: 0.000087 - momentum: 0.000000
2023-10-12 22:01:42,563 epoch 6 - iter 288/1445 - loss 0.02116022 - time (sec): 80.97 - samples/sec: 426.42 - lr: 0.000085 - momentum: 0.000000
2023-10-12 22:02:23,726 epoch 6 - iter 432/1445 - loss 0.02492081 - time (sec): 122.13 - samples/sec: 429.07 - lr: 0.000084 - momentum: 0.000000
2023-10-12 22:03:05,157 epoch 6 - iter 576/1445 - loss 0.02312836 - time (sec): 163.56 - samples/sec: 430.48 - lr: 0.000082 - momentum: 0.000000
2023-10-12 22:03:46,615 epoch 6 - iter 720/1445 - loss 0.02397003 - time (sec): 205.02 - samples/sec: 429.86 - lr: 0.000080 - momentum: 0.000000
2023-10-12 22:04:29,524 epoch 6 - iter 864/1445 - loss 0.02169080 - time (sec): 247.93 - samples/sec: 429.81 - lr: 0.000078 - momentum: 0.000000
2023-10-12 22:05:11,526 epoch 6 - iter 1008/1445 - loss 0.02492991 - time (sec): 289.93 - samples/sec: 429.05 - lr: 0.000076 - momentum: 0.000000
2023-10-12 22:05:52,796 epoch 6 - iter 1152/1445 - loss 0.02363443 - time (sec): 331.20 - samples/sec: 425.58 - lr: 0.000075 - momentum: 0.000000
2023-10-12 22:06:32,978 epoch 6 - iter 1296/1445 - loss 0.02330251 - time (sec): 371.38 - samples/sec: 424.51 - lr: 0.000073 - momentum: 0.000000
2023-10-12 22:07:14,891 epoch 6 - iter 1440/1445 - loss 0.02376795 - time (sec): 413.30 - samples/sec: 425.05 - lr: 0.000071 - momentum: 0.000000
2023-10-12 22:07:16,126 ----------------------------------------------------------------------------------------------------
2023-10-12 22:07:16,126 EPOCH 6 done: loss 0.0237 - lr: 0.000071
2023-10-12 22:07:36,356 DEV : loss 0.13551419973373413 - f1-score (micro avg) 0.841
2023-10-12 22:07:36,386 ----------------------------------------------------------------------------------------------------
2023-10-12 22:08:18,583 epoch 7 - iter 144/1445 - loss 0.01993712 - time (sec): 42.19 - samples/sec: 418.04 - lr: 0.000069 - momentum: 0.000000
2023-10-12 22:08:59,833 epoch 7 - iter 288/1445 - loss 0.01779429 - time (sec): 83.45 - samples/sec: 426.34 - lr: 0.000068 - momentum: 0.000000
2023-10-12 22:09:40,252 epoch 7 - iter 432/1445 - loss 0.01748285 - time (sec): 123.86 - samples/sec: 420.58 - lr: 0.000066 - momentum: 0.000000
2023-10-12 22:10:20,324 epoch 7 - iter 576/1445 - loss 0.01656374 - time (sec): 163.94 - samples/sec: 418.75 - lr: 0.000064 - momentum: 0.000000
2023-10-12 22:11:01,416 epoch 7 - iter 720/1445 - loss 0.01878790 - time (sec): 205.03 - samples/sec: 423.33 - lr: 0.000062 - momentum: 0.000000
2023-10-12 22:11:42,695 epoch 7 - iter 864/1445 - loss 0.01829691 - time (sec): 246.31 - samples/sec: 423.11 - lr: 0.000060 - momentum: 0.000000
2023-10-12 22:12:23,106 epoch 7 - iter 1008/1445 - loss 0.01767695 - time (sec): 286.72 - samples/sec: 425.26 - lr: 0.000059 - momentum: 0.000000
2023-10-12 22:13:03,909 epoch 7 - iter 1152/1445 - loss 0.01746928 - time (sec): 327.52 - samples/sec: 424.22 - lr: 0.000057 - momentum: 0.000000
2023-10-12 22:13:45,103 epoch 7 - iter 1296/1445 - loss 0.01724052 - time (sec): 368.71 - samples/sec: 424.49 - lr: 0.000055 - momentum: 0.000000
2023-10-12 22:14:27,824 epoch 7 - iter 1440/1445 - loss 0.01810091 - time (sec): 411.44 - samples/sec: 426.53 - lr: 0.000053 - momentum: 0.000000
2023-10-12 22:14:29,309 ----------------------------------------------------------------------------------------------------
2023-10-12 22:14:29,310 EPOCH 7 done: loss 0.0180 - lr: 0.000053
2023-10-12 22:14:51,394 DEV : loss 0.13199612498283386 - f1-score (micro avg) 0.8541
2023-10-12 22:14:51,426 saving best model
2023-10-12 22:14:54,014 ----------------------------------------------------------------------------------------------------
2023-10-12 22:15:35,081 epoch 8 - iter 144/1445 - loss 0.01400894 - time (sec): 41.06 - samples/sec: 452.00 - lr: 0.000052 - momentum: 0.000000
2023-10-12 22:16:16,384 epoch 8 - iter 288/1445 - loss 0.01190023 - time (sec): 82.36 - samples/sec: 435.97 - lr: 0.000050 - momentum: 0.000000
2023-10-12 22:16:57,421 epoch 8 - iter 432/1445 - loss 0.01305794 - time (sec): 123.40 - samples/sec: 428.05 - lr: 0.000048 - momentum: 0.000000
2023-10-12 22:17:40,099 epoch 8 - iter 576/1445 - loss 0.01195701 - time (sec): 166.08 - samples/sec: 433.10 - lr: 0.000046 - momentum: 0.000000
2023-10-12 22:18:22,708 epoch 8 - iter 720/1445 - loss 0.01228432 - time (sec): 208.69 - samples/sec: 427.92 - lr: 0.000044 - momentum: 0.000000
2023-10-12 22:19:04,704 epoch 8 - iter 864/1445 - loss 0.01257074 - time (sec): 250.69 - samples/sec: 421.89 - lr: 0.000043 - momentum: 0.000000
2023-10-12 22:19:47,463 epoch 8 - iter 1008/1445 - loss 0.01349553 - time (sec): 293.44 - samples/sec: 419.32 - lr: 0.000041 - momentum: 0.000000
2023-10-12 22:20:29,739 epoch 8 - iter 1152/1445 - loss 0.01302670 - time (sec): 335.72 - samples/sec: 415.59 - lr: 0.000039 - momentum: 0.000000
2023-10-12 22:21:13,081 epoch 8 - iter 1296/1445 - loss 0.01428236 - time (sec): 379.06 - samples/sec: 416.56 - lr: 0.000037 - momentum: 0.000000
2023-10-12 22:21:55,650 epoch 8 - iter 1440/1445 - loss 0.01424477 - time (sec): 421.63 - samples/sec: 416.77 - lr: 0.000036 - momentum: 0.000000
2023-10-12 22:21:56,886 ----------------------------------------------------------------------------------------------------
2023-10-12 22:21:56,887 EPOCH 8 done: loss 0.0143 - lr: 0.000036
2023-10-12 22:22:17,436 DEV : loss 0.15470005571842194 - f1-score (micro avg) 0.8457
2023-10-12 22:22:17,465 ----------------------------------------------------------------------------------------------------
2023-10-12 22:22:59,069 epoch 9 - iter 144/1445 - loss 0.00569141 - time (sec): 41.60 - samples/sec: 442.36 - lr: 0.000034 - momentum: 0.000000
2023-10-12 22:23:40,717 epoch 9 - iter 288/1445 - loss 0.01301404 - time (sec): 83.25 - samples/sec: 446.83 - lr: 0.000032 - momentum: 0.000000
2023-10-12 22:24:21,009 epoch 9 - iter 432/1445 - loss 0.01230076 - time (sec): 123.54 - samples/sec: 445.61 - lr: 0.000030 - momentum: 0.000000
2023-10-12 22:25:00,563 epoch 9 - iter 576/1445 - loss 0.01178494 - time (sec): 163.10 - samples/sec: 436.19 - lr: 0.000028 - momentum: 0.000000
2023-10-12 22:25:39,478 epoch 9 - iter 720/1445 - loss 0.01121005 - time (sec): 202.01 - samples/sec: 430.67 - lr: 0.000027 - momentum: 0.000000
2023-10-12 22:26:19,982 epoch 9 - iter 864/1445 - loss 0.01117518 - time (sec): 242.51 - samples/sec: 432.62 - lr: 0.000025 - momentum: 0.000000
2023-10-12 22:27:00,522 epoch 9 - iter 1008/1445 - loss 0.01142421 - time (sec): 283.05 - samples/sec: 432.90 - lr: 0.000023 - momentum: 0.000000
2023-10-12 22:27:42,905 epoch 9 - iter 1152/1445 - loss 0.01188551 - time (sec): 325.44 - samples/sec: 434.68 - lr: 0.000021 - momentum: 0.000000
2023-10-12 22:28:23,774 epoch 9 - iter 1296/1445 - loss 0.01117904 - time (sec): 366.31 - samples/sec: 432.74 - lr: 0.000020 - momentum: 0.000000
2023-10-12 22:29:04,715 epoch 9 - iter 1440/1445 - loss 0.01054392 - time (sec): 407.25 - samples/sec: 431.36 - lr: 0.000018 - momentum: 0.000000
2023-10-12 22:29:05,942 ----------------------------------------------------------------------------------------------------
2023-10-12 22:29:05,943 EPOCH 9 done: loss 0.0105 - lr: 0.000018
2023-10-12 22:29:27,685 DEV : loss 0.15782958269119263 - f1-score (micro avg) 0.851
2023-10-12 22:29:27,716 ----------------------------------------------------------------------------------------------------
2023-10-12 22:30:09,377 epoch 10 - iter 144/1445 - loss 0.00649677 - time (sec): 41.66 - samples/sec: 432.41 - lr: 0.000016 - momentum: 0.000000
2023-10-12 22:30:48,589 epoch 10 - iter 288/1445 - loss 0.00717470 - time (sec): 80.87 - samples/sec: 417.09 - lr: 0.000014 - momentum: 0.000000
2023-10-12 22:31:28,912 epoch 10 - iter 432/1445 - loss 0.00796770 - time (sec): 121.19 - samples/sec: 418.26 - lr: 0.000012 - momentum: 0.000000
2023-10-12 22:32:11,533 epoch 10 - iter 576/1445 - loss 0.00925299 - time (sec): 163.82 - samples/sec: 421.90 - lr: 0.000011 - momentum: 0.000000
2023-10-12 22:32:53,189 epoch 10 - iter 720/1445 - loss 0.00823255 - time (sec): 205.47 - samples/sec: 420.03 - lr: 0.000009 - momentum: 0.000000
2023-10-12 22:33:35,725 epoch 10 - iter 864/1445 - loss 0.00732982 - time (sec): 248.01 - samples/sec: 422.20 - lr: 0.000007 - momentum: 0.000000
2023-10-12 22:34:18,703 epoch 10 - iter 1008/1445 - loss 0.00779755 - time (sec): 290.98 - samples/sec: 423.65 - lr: 0.000005 - momentum: 0.000000
2023-10-12 22:35:00,516 epoch 10 - iter 1152/1445 - loss 0.00738921 - time (sec): 332.80 - samples/sec: 420.58 - lr: 0.000004 - momentum: 0.000000
2023-10-12 22:35:42,966 epoch 10 - iter 1296/1445 - loss 0.00811633 - time (sec): 375.25 - samples/sec: 420.22 - lr: 0.000002 - momentum: 0.000000
2023-10-12 22:36:25,327 epoch 10 - iter 1440/1445 - loss 0.00775884 - time (sec): 417.61 - samples/sec: 420.82 - lr: 0.000000 - momentum: 0.000000
2023-10-12 22:36:26,519 ----------------------------------------------------------------------------------------------------
2023-10-12 22:36:26,519 EPOCH 10 done: loss 0.0077 - lr: 0.000000
2023-10-12 22:36:47,942 DEV : loss 0.16641554236412048 - f1-score (micro avg) 0.8455
2023-10-12 22:36:48,832 ----------------------------------------------------------------------------------------------------
2023-10-12 22:36:48,834 Loading model from best epoch ...
2023-10-12 22:36:52,922 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-12 22:37:13,988 Results:
- F-score (micro) 0.8543
- F-score (macro) 0.7635
- Accuracy 0.7602

By class:
              precision    recall  f1-score   support

         PER     0.8577    0.8631    0.8604       482
         LOC     0.9238    0.8734    0.8979       458
         ORG     0.5286    0.5362    0.5324        69

   micro avg     0.8634    0.8454    0.8543      1009
   macro avg     0.7700    0.7576    0.7635      1009
weighted avg     0.8652    0.8454    0.8550      1009

2023-10-12 22:37:13,988 ----------------------------------------------------------------------------------------------------
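For completeness, a minimal usage sketch for the saved checkpoint, assuming the standard Flair inference API: the load path is simply best-model.pt under the training base path above, while the example sentence and the "ner" label type are illustrative assumptions, not taken from this run.

    # Load the fine-tuned tagger and tag a single sentence (illustrative example).
    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load(
        "hmbench-icdar/nl-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax"
        "-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-5/best-model.pt"
    )

    # Made-up historic Dutch sentence, just to show the call pattern.
    sentence = Sentence("De heer Van der Berg vertrok gisteren uit Amsterdam .")
    tagger.predict(sentence)

    # Print the predicted PER/LOC/ORG spans with their confidence scores.
    for span in sentence.get_spans("ner"):
        print(span.text, span.get_label("ner").value, round(span.get_label("ner").score, 4))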