2023-10-11 11:07:50,584 ----------------------------------------------------------------------------------------------------
2023-10-11 11:07:50,586 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-11 11:07:50,587 ----------------------------------------------------------------------------------------------------
2023-10-11 11:07:50,587 MultiCorpus: 1085 train + 148 dev + 364 test sentences
 - NER_HIPE_2022 Corpus: 1085 train + 148 dev + 364 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/sv/with_doc_seperator
2023-10-11 11:07:50,587 ----------------------------------------------------------------------------------------------------
2023-10-11 11:07:50,587 Train: 1085 sentences
2023-10-11 11:07:50,587 (train_with_dev=False, train_with_test=False)
2023-10-11 11:07:50,587 ----------------------------------------------------------------------------------------------------
2023-10-11 11:07:50,587 Training Params:
2023-10-11 11:07:50,587 - learning_rate: "0.00016"
2023-10-11 11:07:50,587 - mini_batch_size: "4"
2023-10-11 11:07:50,587 - max_epochs: "10"
2023-10-11 11:07:50,587 - shuffle: "True"
2023-10-11 11:07:50,588 ----------------------------------------------------------------------------------------------------
2023-10-11 11:07:50,588 Plugins:
2023-10-11 11:07:50,588 - TensorboardLogger
2023-10-11 11:07:50,588 - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 11:07:50,588 ----------------------------------------------------------------------------------------------------
2023-10-11 11:07:50,588 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 11:07:50,588 - metric: "('micro avg', 'f1-score')"
2023-10-11 11:07:50,588 ----------------------------------------------------------------------------------------------------
2023-10-11 11:07:50,588 Computation:
2023-10-11 11:07:50,588 - compute on device: cuda:0
2023-10-11 11:07:50,588 - embedding storage: none
2023-10-11 11:07:50,588 ----------------------------------------------------------------------------------------------------
2023-10-11 11:07:50,588 Model training base path: "hmbench-newseye/sv-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-3"
2023-10-11 11:07:50,588 ----------------------------------------------------------------------------------------------------
2023-10-11 11:07:50,588 ----------------------------------------------------------------------------------------------------
2023-10-11 11:07:50,589 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 11:08:00,786 epoch 1 - iter 27/272 - loss 2.82642878 - time (sec): 10.20 - samples/sec: 564.33 - lr: 0.000015 - momentum: 0.000000
2023-10-11 11:08:10,540 epoch 1 - iter 54/272 - loss 2.81647757 - time (sec): 19.95 - samples/sec: 561.36 - lr: 0.000031 - momentum: 0.000000
2023-10-11 11:08:20,157 epoch 1 - iter 81/272 - loss 2.79293552 - time (sec): 29.57 - samples/sec: 554.92 - lr: 0.000047 - momentum: 0.000000
2023-10-11 11:08:29,853 epoch 1 - iter 108/272 - loss 2.74216787 - time (sec): 39.26 - samples/sec: 547.45 - lr: 0.000063 - momentum: 0.000000
2023-10-11 11:08:39,436 epoch 1 - iter 135/272 - loss 2.65162236 - time (sec): 48.85 - samples/sec: 548.73 - lr: 0.000079 - momentum: 0.000000
2023-10-11 11:08:48,405 epoch 1 - iter 162/272 - loss 2.56580157 - time (sec): 57.82 - samples/sec: 543.49 - lr: 0.000095 - momentum: 0.000000
2023-10-11 11:08:57,372 epoch 1 - iter 189/272 - loss 2.46639280 - time (sec): 66.78 - samples/sec: 537.20 - lr: 0.000111 - momentum: 0.000000
2023-10-11 11:09:07,456 epoch 1 - iter 216/272 - loss 2.33252277 - time (sec): 76.87 - samples/sec: 541.51 - lr: 0.000126 - momentum: 0.000000
2023-10-11 11:09:16,984 epoch 1 - iter 243/272 - loss 2.21102811 - time (sec): 86.39 - samples/sec: 538.76 - lr: 0.000142 - momentum: 0.000000
2023-10-11 11:09:26,631 epoch 1 - iter 270/272 - loss 2.08982352 - time (sec): 96.04 - samples/sec: 537.93 - lr: 0.000158 - momentum: 0.000000
2023-10-11 11:09:27,187 ----------------------------------------------------------------------------------------------------
2023-10-11 11:09:27,187 EPOCH 1 done: loss 2.0814 - lr: 0.000158
2023-10-11 11:09:32,503 DEV : loss 0.751204788684845 - f1-score (micro avg)  0.0
2023-10-11 11:09:32,512 ----------------------------------------------------------------------------------------------------
2023-10-11 11:09:42,689 epoch 2 - iter 27/272 - loss 0.70958275 - time (sec): 10.18 - samples/sec: 573.25 - lr: 0.000158 - momentum: 0.000000
2023-10-11 11:09:52,318 epoch 2 - iter 54/272 - loss 0.68509310 - time (sec): 19.80 - samples/sec: 568.82 - lr: 0.000157 - momentum: 0.000000
2023-10-11 11:10:01,642 epoch 2 - iter 81/272 - loss 0.66861733 - time (sec): 29.13 - samples/sec: 562.93 - lr: 0.000155 - momentum: 0.000000
2023-10-11 11:10:10,552 epoch 2 - iter 108/272 - loss 0.63700588 - time (sec): 38.04 - samples/sec: 553.57 - lr: 0.000153 - momentum: 0.000000
2023-10-11 11:10:19,839 epoch 2 - iter 135/272 - loss 0.59504244 - time (sec): 47.33 - samples/sec: 552.75 - lr: 0.000151 - momentum: 0.000000
2023-10-11 11:10:28,596 epoch 2 - iter 162/272 - loss 0.57110271 - time (sec): 56.08 - samples/sec: 544.08 - lr: 0.000149 - momentum: 0.000000
2023-10-11 11:10:37,826 epoch 2 - iter 189/272 - loss 0.53925735 - time (sec): 65.31 - samples/sec: 540.45 - lr: 0.000148 - momentum: 0.000000
2023-10-11 11:10:47,077 epoch 2 - iter 216/272 - loss 0.52341224 - time (sec): 74.56 - samples/sec: 540.39 - lr: 0.000146 - momentum: 0.000000
2023-10-11 11:10:56,423 epoch 2 - iter 243/272 - loss 0.50597516 - time (sec): 83.91 - samples/sec: 543.46 - lr: 0.000144 - momentum: 0.000000
2023-10-11 11:11:06,983 epoch 2 - iter 270/272 - loss 0.48741426 - time (sec): 94.47 - samples/sec: 547.54 - lr: 0.000142 - momentum: 0.000000
2023-10-11 11:11:07,491 ----------------------------------------------------------------------------------------------------
2023-10-11 11:11:07,492 EPOCH 2 done: loss 0.4853 - lr: 0.000142
2023-10-11 11:11:13,160 DEV : loss 0.2761920094490051 - f1-score (micro avg)  0.3163
2023-10-11 11:11:13,169 saving best model
2023-10-11 11:11:14,262 ----------------------------------------------------------------------------------------------------
2023-10-11 11:11:23,631 epoch 3 - iter 27/272 - loss 0.33511650 - time (sec): 9.37 - samples/sec: 545.23 - lr: 0.000141 - momentum: 0.000000
2023-10-11 11:11:33,244 epoch 3 - iter 54/272 - loss 0.30965757 - time (sec): 18.98 - samples/sec: 557.62 - lr: 0.000139 - momentum: 0.000000
2023-10-11 11:11:43,271 epoch 3 - iter 81/272 - loss 0.30790871 - time (sec): 29.01 - samples/sec: 574.62 - lr: 0.000137 - momentum: 0.000000
2023-10-11 11:11:52,030 epoch 3 - iter 108/272 - loss 0.31732225 - time (sec): 37.77 - samples/sec: 556.73 - lr: 0.000135 - momentum: 0.000000
2023-10-11 11:12:01,014 epoch 3 - iter 135/272 - loss 0.31924782 - time (sec): 46.75 - samples/sec: 551.63 - lr: 0.000133 - momentum: 0.000000
2023-10-11 11:12:11,223 epoch 3 - iter 162/272 - loss 0.30738277 - time (sec): 56.96 - samples/sec: 557.71 - lr: 0.000132 - momentum: 0.000000
2023-10-11 11:12:20,613 epoch 3 - iter 189/272 - loss 0.29568838 - time (sec): 66.35 - samples/sec: 556.24 - lr: 0.000130 - momentum: 0.000000
2023-10-11 11:12:29,885 epoch 3 - iter 216/272 - loss 0.28129595 - time (sec): 75.62 - samples/sec: 554.44 - lr: 0.000128 - momentum: 0.000000
2023-10-11 11:12:38,783 epoch 3 - iter 243/272 - loss 0.27144926 - time (sec): 84.52 - samples/sec: 552.54 - lr: 0.000126 - momentum: 0.000000
2023-10-11 11:12:47,953 epoch 3 - iter 270/272 - loss 0.27109563 - time (sec): 93.69 - samples/sec: 552.59 - lr: 0.000125 - momentum: 0.000000
2023-10-11 11:12:48,398 ----------------------------------------------------------------------------------------------------
2023-10-11 11:12:48,399 EPOCH 3 done: loss 0.2705 - lr: 0.000125
2023-10-11 11:12:53,851 DEV : loss 0.20764292776584625 - f1-score (micro avg)  0.5838
2023-10-11 11:12:53,859 saving best model
2023-10-11 11:12:56,387 ----------------------------------------------------------------------------------------------------
2023-10-11 11:13:05,348 epoch 4 - iter 27/272 - loss 0.17940497 - time (sec): 8.96 - samples/sec: 538.37 - lr: 0.000123 - momentum: 0.000000
2023-10-11 11:13:13,448 epoch 4 - iter 54/272 - loss 0.20547376 - time (sec): 17.06 - samples/sec: 507.08 - lr: 0.000121 - momentum: 0.000000
2023-10-11 11:13:23,088 epoch 4 - iter 81/272 - loss 0.21204444 - time (sec): 26.70 - samples/sec: 527.23 - lr: 0.000119 - momentum: 0.000000
2023-10-11 11:13:32,138 epoch 4 - iter 108/272 - loss 0.19861011 - time (sec): 35.75 - samples/sec: 531.39 - lr: 0.000117 - momentum: 0.000000
2023-10-11 11:13:41,269 epoch 4 - iter 135/272 - loss 0.19649019 - time (sec): 44.88 - samples/sec: 533.18 - lr: 0.000116 - momentum: 0.000000
2023-10-11 11:13:50,014 epoch 4 - iter 162/272 - loss 0.18956010 - time (sec): 53.62 - samples/sec: 527.21 - lr: 0.000114 - momentum: 0.000000
2023-10-11 11:14:00,798 epoch 4 - iter 189/272 - loss 0.18263628 - time (sec): 64.41 - samples/sec: 545.12 - lr: 0.000112 - momentum: 0.000000
2023-10-11 11:14:10,171 epoch 4 - iter 216/272 - loss 0.17817811 - time (sec): 73.78 - samples/sec: 546.98 - lr: 0.000110 - momentum: 0.000000
2023-10-11 11:14:19,885 epoch 4 - iter 243/272 - loss 0.17345252 - time (sec): 83.49 - samples/sec: 549.18 - lr: 0.000109 - momentum: 0.000000
2023-10-11 11:14:29,653 epoch 4 - iter 270/272 - loss 0.16873144 - time (sec): 93.26 - samples/sec: 553.83 - lr: 0.000107 - momentum: 0.000000
2023-10-11 11:14:30,202 ----------------------------------------------------------------------------------------------------
2023-10-11 11:14:30,202 EPOCH 4 done: loss 0.1683 - lr: 0.000107
2023-10-11 11:14:35,817 DEV : loss 0.16644711792469025 - f1-score (micro avg)  0.6248
2023-10-11 11:14:35,825 saving best model
2023-10-11 11:14:38,358 ----------------------------------------------------------------------------------------------------
2023-10-11 11:14:48,002 epoch 5 - iter 27/272 - loss 0.10934003 - time (sec): 9.64 - samples/sec: 597.01 - lr: 0.000105 - momentum: 0.000000
2023-10-11 11:14:57,370 epoch 5 - iter 54/272 - loss 0.11810250 - time (sec): 19.01 - samples/sec: 584.87 - lr: 0.000103 - momentum: 0.000000
2023-10-11 11:15:06,752 epoch 5 - iter 81/272 - loss 0.11280886 - time (sec): 28.39 - samples/sec: 579.61 - lr: 0.000101 - momentum: 0.000000
2023-10-11 11:15:15,268 epoch 5 - iter 108/272 - loss 0.11577577 - time (sec): 36.91 - samples/sec: 564.71 - lr: 0.000100 - momentum: 0.000000
2023-10-11 11:15:23,880 epoch 5 - iter 135/272 - loss 0.11510951 - time (sec): 45.52 - samples/sec: 554.18 - lr: 0.000098 - momentum: 0.000000
2023-10-11 11:15:33,117 epoch 5 - iter 162/272 - loss 0.11718809 - time (sec): 54.76 - samples/sec: 553.04 - lr: 0.000096 - momentum: 0.000000
2023-10-11 11:15:42,474 epoch 5 - iter 189/272 - loss 0.11051106 - time (sec): 64.11 - samples/sec: 550.23 - lr: 0.000094 - momentum: 0.000000
2023-10-11 11:15:51,957 epoch 5 - iter 216/272 - loss 0.11192449 - time (sec): 73.59 - samples/sec: 550.60 - lr: 0.000093 - momentum: 0.000000
2023-10-11 11:16:02,208 epoch 5 - iter 243/272 - loss 0.11278962 - time (sec): 83.85 - samples/sec: 558.27 - lr: 0.000091 - momentum: 0.000000
2023-10-11 11:16:11,167 epoch 5 - iter 270/272 - loss 0.11164211 - time (sec): 92.80 - samples/sec: 558.74 - lr: 0.000089 - momentum: 0.000000
2023-10-11 11:16:11,545 ----------------------------------------------------------------------------------------------------
2023-10-11 11:16:11,545 EPOCH 5 done: loss 0.1115 - lr: 0.000089
2023-10-11 11:16:17,034 DEV : loss 0.14780402183532715 - f1-score (micro avg)  0.7273
2023-10-11 11:16:17,042 saving best model
2023-10-11 11:16:19,527 ----------------------------------------------------------------------------------------------------
2023-10-11 11:16:28,224 epoch 6 - iter 27/272 - loss 0.06996347 - time (sec): 8.69 - samples/sec: 537.73 - lr: 0.000087 - momentum: 0.000000
2023-10-11 11:16:36,811 epoch 6 - iter 54/272 - loss 0.08153458 - time (sec): 17.28 - samples/sec: 534.03 - lr: 0.000085 - momentum: 0.000000
2023-10-11 11:16:45,704 epoch 6 - iter 81/272 - loss 0.09017221 - time (sec): 26.17 - samples/sec: 539.47 - lr: 0.000084 - momentum: 0.000000
2023-10-11 11:16:55,332 epoch 6 - iter 108/272 - loss 0.08370356 - time (sec): 35.80 - samples/sec: 551.38 - lr: 0.000082 - momentum: 0.000000
2023-10-11 11:17:04,982 epoch 6 - iter 135/272 - loss 0.07843134 - time (sec): 45.45 - samples/sec: 566.46 - lr: 0.000080 - momentum: 0.000000
2023-10-11 11:17:13,606 epoch 6 - iter 162/272 - loss 0.07866597 - time (sec): 54.07 - samples/sec: 558.77 - lr: 0.000078 - momentum: 0.000000
2023-10-11 11:17:23,040 epoch 6 - iter 189/272 - loss 0.07503492 - time (sec): 63.51 - samples/sec: 561.63 - lr: 0.000077 - momentum: 0.000000
2023-10-11 11:17:32,129 epoch 6 - iter 216/272 - loss 0.08015716 - time (sec): 72.60 - samples/sec: 558.57 - lr: 0.000075 - momentum: 0.000000
2023-10-11 11:17:41,521 epoch 6 - iter 243/272 - loss 0.07960023 - time (sec): 81.99 - samples/sec: 561.54 - lr: 0.000073 - momentum: 0.000000
2023-10-11 11:17:51,226 epoch 6 - iter 270/272 - loss 0.07884413 - time (sec): 91.69 - samples/sec: 561.73 - lr: 0.000071 - momentum: 0.000000
2023-10-11 11:17:51,887 ----------------------------------------------------------------------------------------------------
2023-10-11 11:17:51,887 EPOCH 6 done: loss 0.0788 - lr: 0.000071
2023-10-11 11:17:57,376 DEV : loss 0.14096224308013916 - f1-score (micro avg)  0.7518
2023-10-11 11:17:57,385 saving best model
2023-10-11 11:17:59,894 ----------------------------------------------------------------------------------------------------
2023-10-11 11:18:09,294 epoch 7 - iter 27/272 - loss 0.06341984 - time (sec): 9.40 - samples/sec: 573.61 - lr: 0.000069 - momentum: 0.000000
2023-10-11 11:18:18,490 epoch 7 - iter 54/272 - loss 0.07180111 - time (sec): 18.59 - samples/sec: 549.22 - lr: 0.000068 - momentum: 0.000000
2023-10-11 11:18:28,484 epoch 7 - iter 81/272 - loss 0.06364756 - time (sec): 28.59 - samples/sec: 562.06 - lr: 0.000066 - momentum: 0.000000
2023-10-11 11:18:37,887 epoch 7 - iter 108/272 - loss 0.06134236 - time (sec): 37.99 - samples/sec: 562.23 - lr: 0.000064 - momentum: 0.000000
2023-10-11 11:18:47,377 epoch 7 - iter 135/272 - loss 0.06432419 - time (sec): 47.48 - samples/sec: 562.86 - lr: 0.000062 - momentum: 0.000000
2023-10-11 11:18:56,504 epoch 7 - iter 162/272 - loss 0.06152738 - time (sec): 56.61 - samples/sec: 558.29 - lr: 0.000061 - momentum: 0.000000
2023-10-11 11:19:06,158 epoch 7 - iter 189/272 - loss 0.06438334 - time (sec): 66.26 - samples/sec: 557.18 - lr: 0.000059 - momentum: 0.000000
2023-10-11 11:19:14,940 epoch 7 - iter 216/272 - loss 0.06377457 - time (sec): 75.04 - samples/sec: 552.50 - lr: 0.000057 - momentum: 0.000000
2023-10-11 11:19:24,451 epoch 7 - iter 243/272 - loss 0.06205376 - time (sec): 84.55 - samples/sec: 554.17 - lr: 0.000055 - momentum: 0.000000
2023-10-11 11:19:33,714 epoch 7 - iter 270/272 - loss 0.05922229 - time (sec): 93.82 - samples/sec: 552.13 - lr: 0.000054 - momentum: 0.000000
2023-10-11 11:19:34,122 ----------------------------------------------------------------------------------------------------
2023-10-11 11:19:34,122 EPOCH 7 done: loss 0.0591 - lr: 0.000054
2023-10-11 11:19:39,872 DEV : loss 0.15206335484981537 - f1-score (micro avg)  0.7609
2023-10-11 11:19:39,881 saving best model
2023-10-11 11:19:42,622 ----------------------------------------------------------------------------------------------------
2023-10-11 11:19:51,678 epoch 8 - iter 27/272 - loss 0.05148392 - time (sec): 9.05 - samples/sec: 571.50 - lr: 0.000052 - momentum: 0.000000
2023-10-11 11:20:00,288 epoch 8 - iter 54/272 - loss 0.04823998 - time (sec): 17.66 - samples/sec: 554.33 - lr: 0.000050 - momentum: 0.000000
2023-10-11 11:20:10,210 epoch 8 - iter 81/272 - loss 0.05011770 - time (sec): 27.58 - samples/sec: 558.63 - lr: 0.000048 - momentum: 0.000000
2023-10-11 11:20:19,535 epoch 8 - iter 108/272 - loss 0.05553018 - time (sec): 36.91 - samples/sec: 550.68 - lr: 0.000046 - momentum: 0.000000
2023-10-11 11:20:29,307 epoch 8 - iter 135/272 - loss 0.05185215 - time (sec): 46.68 - samples/sec: 547.66 - lr: 0.000045 - momentum: 0.000000
2023-10-11 11:20:39,021 epoch 8 - iter 162/272 - loss 0.05106686 - time (sec): 56.39 - samples/sec: 545.69 - lr: 0.000043 - momentum: 0.000000
2023-10-11 11:20:48,754 epoch 8 - iter 189/272 - loss 0.04852618 - time (sec): 66.13 - samples/sec: 546.70 - lr: 0.000041 - momentum: 0.000000
2023-10-11 11:20:58,466 epoch 8 - iter 216/272 - loss 0.04781162 - time (sec): 75.84 - samples/sec: 548.61 - lr: 0.000039 - momentum: 0.000000
2023-10-11 11:21:08,324 epoch 8 - iter 243/272 - loss 0.04540073 - time (sec): 85.70 - samples/sec: 549.88 - lr: 0.000038 - momentum: 0.000000
2023-10-11 11:21:17,256 epoch 8 - iter 270/272 - loss 0.04536122 - time (sec): 94.63 - samples/sec: 545.93 - lr: 0.000036 - momentum: 0.000000
2023-10-11 11:21:17,809 ----------------------------------------------------------------------------------------------------
2023-10-11 11:21:17,809 EPOCH 8 done: loss 0.0452 - lr: 0.000036
2023-10-11 11:21:23,595 DEV : loss 0.1474316567182541 - f1-score (micro avg)  0.7776
2023-10-11 11:21:23,604 saving best model
2023-10-11 11:21:26,122 ----------------------------------------------------------------------------------------------------
2023-10-11 11:21:36,109 epoch 9 - iter 27/272 - loss 0.03768928 - time (sec): 9.98 - samples/sec: 584.31 - lr: 0.000034 - momentum: 0.000000
2023-10-11 11:21:45,675 epoch 9 - iter 54/272 - loss 0.03902434 - time (sec): 19.55 - samples/sec: 569.27 - lr: 0.000032 - momentum: 0.000000
2023-10-11 11:21:54,404 epoch 9 - iter 81/272 - loss 0.03949006 - time (sec): 28.28 - samples/sec: 549.31 - lr: 0.000030 - momentum: 0.000000
2023-10-11 11:22:03,982 epoch 9 - iter 108/272 - loss 0.04115931 - time (sec): 37.86 - samples/sec: 544.01 - lr: 0.000029 - momentum: 0.000000
2023-10-11 11:22:13,725 epoch 9 - iter 135/272 - loss 0.04113215 - time (sec): 47.60 - samples/sec: 546.99 - lr: 0.000027 - momentum: 0.000000
2023-10-11 11:22:23,425 epoch 9 - iter 162/272 - loss 0.04040816 - time (sec): 57.30 - samples/sec: 547.83 - lr: 0.000025 - momentum: 0.000000
2023-10-11 11:22:32,701 epoch 9 - iter 189/272 - loss 0.03962695 - time (sec): 66.57 - samples/sec: 545.19 - lr: 0.000023 - momentum: 0.000000
2023-10-11 11:22:41,925 epoch 9 - iter 216/272 - loss 0.03930060 - time (sec): 75.80 - samples/sec: 542.07 - lr: 0.000022 - momentum: 0.000000
2023-10-11 11:22:52,271 epoch 9 - iter 243/272 - loss 0.03855005 - time (sec): 86.14 - samples/sec: 545.98 - lr: 0.000020 - momentum: 0.000000
2023-10-11 11:23:01,596 epoch 9 - iter 270/272 - loss 0.03738276 - time (sec): 95.47 - samples/sec: 543.30 - lr: 0.000018 - momentum: 0.000000
2023-10-11 11:23:01,958 ----------------------------------------------------------------------------------------------------
2023-10-11 11:23:01,958 EPOCH 9 done: loss 0.0374 - lr: 0.000018
2023-10-11 11:23:07,784 DEV : loss 0.15458551049232483 - f1-score (micro avg)  0.7717
2023-10-11 11:23:07,793 ----------------------------------------------------------------------------------------------------
2023-10-11 11:23:17,507 epoch 10 - iter 27/272 - loss 0.03642691 - time (sec): 9.71 - samples/sec: 560.74 - lr: 0.000016 - momentum: 0.000000
2023-10-11 11:23:26,192 epoch 10 - iter 54/272 - loss 0.03760888 - time (sec): 18.40 - samples/sec: 530.22 - lr: 0.000014 - momentum: 0.000000
2023-10-11 11:23:36,876 epoch 10 - iter 81/272 - loss 0.03781097 - time (sec): 29.08 - samples/sec: 562.52 - lr: 0.000013 - momentum: 0.000000
2023-10-11 11:23:47,197 epoch 10 - iter 108/272 - loss 0.03981923 - time (sec): 39.40 - samples/sec: 569.48 - lr: 0.000011 - momentum: 0.000000
2023-10-11 11:23:56,858 epoch 10 - iter 135/272 - loss 0.03893238 - time (sec): 49.06 - samples/sec: 566.00 - lr: 0.000009 - momentum: 0.000000
2023-10-11 11:24:05,392 epoch 10 - iter 162/272 - loss 0.03669560 - time (sec): 57.60 - samples/sec: 553.96 - lr: 0.000007 - momentum: 0.000000
2023-10-11 11:24:15,658 epoch 10 - iter 189/272 - loss 0.03548350 - time (sec): 67.86 - samples/sec: 551.09 - lr: 0.000005 - momentum: 0.000000
2023-10-11 11:24:24,996 epoch 10 - iter 216/272 - loss 0.03383848 - time (sec): 77.20 - samples/sec: 543.09 - lr: 0.000004 - momentum: 0.000000
2023-10-11 11:24:34,515 epoch 10 - iter 243/272 - loss 0.03361531 - time (sec): 86.72 - samples/sec: 542.13 - lr: 0.000002 - momentum: 0.000000
2023-10-11 11:24:43,874 epoch 10 - iter 270/272 - loss 0.03227112 - time (sec): 96.08 - samples/sec: 538.59 - lr: 0.000000 - momentum: 0.000000
2023-10-11 11:24:44,326 ----------------------------------------------------------------------------------------------------
2023-10-11 11:24:44,326 EPOCH 10 done: loss 0.0323 - lr: 0.000000
2023-10-11 11:24:50,079 DEV : loss 0.15587206184864044 - f1-score (micro avg)  0.7731
2023-10-11 11:24:50,931 ----------------------------------------------------------------------------------------------------
2023-10-11 11:24:50,933 Loading model from best epoch ...
2023-10-11 11:24:54,558 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-11 11:25:06,281 Results:
- F-score (micro) 0.7481
- F-score (macro) 0.6929
- Accuracy 0.6177

By class:
              precision    recall  f1-score   support

         LOC     0.7357    0.8654    0.7953       312
         PER     0.6434    0.8846    0.7449       208
         ORG     0.4615    0.4364    0.4486        55
   HumanProd     0.7500    0.8182    0.7826        22

   micro avg     0.6804    0.8308    0.7481       597
   macro avg     0.6476    0.7511    0.6929       597
weighted avg     0.6788    0.8308    0.7453       597

2023-10-11 11:25:06,282 ----------------------------------------------------------------------------------------------------