2023-10-11 00:31:54,080 ----------------------------------------------------------------------------------------------------
2023-10-11 00:31:54,083 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-11 00:31:54,083 ----------------------------------------------------------------------------------------------------
2023-10-11 00:31:54,083 MultiCorpus: 1166 train + 165 dev + 415 test sentences
 - NER_HIPE_2022 Corpus: 1166 train + 165 dev + 415 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fi/with_doc_seperator
2023-10-11 00:31:54,083 ----------------------------------------------------------------------------------------------------
2023-10-11 00:31:54,083 Train: 1166 sentences
2023-10-11 00:31:54,083 (train_with_dev=False, train_with_test=False)
2023-10-11 00:31:54,083 ----------------------------------------------------------------------------------------------------
2023-10-11 00:31:54,083 Training Params:
2023-10-11 00:31:54,083 - learning_rate: "0.00016"
2023-10-11 00:31:54,083 - mini_batch_size: "8"
2023-10-11 00:31:54,084 - max_epochs: "10"
2023-10-11 00:31:54,084 - shuffle: "True"
2023-10-11 00:31:54,084 ----------------------------------------------------------------------------------------------------
2023-10-11 00:31:54,084 Plugins:
2023-10-11 00:31:54,084 - TensorboardLogger
2023-10-11 00:31:54,084 - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 00:31:54,084 ----------------------------------------------------------------------------------------------------
2023-10-11 00:31:54,084 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 00:31:54,084 - metric: "('micro avg', 'f1-score')"
2023-10-11 00:31:54,084 ----------------------------------------------------------------------------------------------------
2023-10-11 00:31:54,084 Computation:
2023-10-11 00:31:54,084 - compute on device: cuda:0
2023-10-11 00:31:54,084 - embedding storage: none
2023-10-11 00:31:54,084 ----------------------------------------------------------------------------------------------------
2023-10-11 00:31:54,084 Model training base path: "hmbench-newseye/fi-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-3"
2023-10-11 00:31:54,084 ----------------------------------------------------------------------------------------------------
2023-10-11 00:31:54,085 ----------------------------------------------------------------------------------------------------
2023-10-11 00:31:54,085 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 00:32:02,779 epoch 1 - iter 14/146 - loss 2.82806942 - time (sec): 8.69 - samples/sec: 431.49 - lr: 0.000014 - momentum: 0.000000
2023-10-11 00:32:12,268 epoch 1 - iter 28/146 - loss 2.81932722 - time (sec): 18.18 - samples/sec: 447.92 - lr: 0.000030 - momentum: 0.000000
2023-10-11 00:32:21,654 epoch 1 - iter 42/146 - loss 2.80887588 - time (sec): 27.57 - samples/sec: 443.61 - lr: 0.000045 - momentum: 0.000000
2023-10-11 00:32:30,455 epoch 1 - iter 56/146 - loss 2.79001925 - time (sec): 36.37 - samples/sec: 435.57 - lr: 0.000060 - momentum: 0.000000
2023-10-11 00:32:40,316 epoch 1 - iter 70/146 - loss 2.74954234 - time (sec): 46.23 - samples/sec: 447.70 - lr: 0.000076 - momentum: 0.000000
2023-10-11 00:32:50,359 epoch 1 - iter 84/146 - loss 2.69183004 - time (sec): 56.27 - samples/sec: 456.68 - lr: 0.000091 - momentum: 0.000000
2023-10-11 00:32:59,788 epoch 1 - iter 98/146 - loss 2.62431560 - time (sec): 65.70 - samples/sec: 456.55 - lr: 0.000106 - momentum: 0.000000
2023-10-11 00:33:09,461 epoch 1 - iter 112/146 - loss 2.55499292 - time (sec): 75.37 - samples/sec: 451.80 - lr: 0.000122 - momentum: 0.000000
2023-10-11 00:33:18,969 epoch 1 - iter 126/146 - loss 2.46794210 - time (sec): 84.88 - samples/sec: 452.41 - lr: 0.000137 - momentum: 0.000000
2023-10-11 00:33:28,484 epoch 1 - iter 140/146 - loss 2.38373239 - time (sec): 94.40 - samples/sec: 451.06 - lr: 0.000152 - momentum: 0.000000
2023-10-11 00:33:32,416 ----------------------------------------------------------------------------------------------------
2023-10-11 00:33:32,416 EPOCH 1 done: loss 2.3459 - lr: 0.000152
2023-10-11 00:33:37,294 DEV : loss 1.2697190046310425 - f1-score (micro avg) 0.0
2023-10-11 00:33:37,303 ----------------------------------------------------------------------------------------------------
2023-10-11 00:33:46,076 epoch 2 - iter 14/146 - loss 1.28532706 - time (sec): 8.77 - samples/sec: 430.14 - lr: 0.000158 - momentum: 0.000000
2023-10-11 00:33:55,330 epoch 2 - iter 28/146 - loss 1.18643116 - time (sec): 18.03 - samples/sec: 432.90 - lr: 0.000157 - momentum: 0.000000
2023-10-11 00:34:04,613 epoch 2 - iter 42/146 - loss 1.10584441 - time (sec): 27.31 - samples/sec: 441.81 - lr: 0.000155 - momentum: 0.000000
2023-10-11 00:34:13,560 epoch 2 - iter 56/146 - loss 1.04255237 - time (sec): 36.26 - samples/sec: 438.39 - lr: 0.000153 - momentum: 0.000000
2023-10-11 00:34:23,157 epoch 2 - iter 70/146 - loss 0.95927936 - time (sec): 45.85 - samples/sec: 446.28 - lr: 0.000152 - momentum: 0.000000
2023-10-11 00:34:32,776 epoch 2 - iter 84/146 - loss 0.93373424 - time (sec): 55.47 - samples/sec: 451.40 - lr: 0.000150 - momentum: 0.000000
2023-10-11 00:34:41,982 epoch 2 - iter 98/146 - loss 0.89334982 - time (sec): 64.68 - samples/sec: 448.77 - lr: 0.000148 - momentum: 0.000000
2023-10-11 00:34:51,404 epoch 2 - iter 112/146 - loss 0.84771656 - time (sec): 74.10 - samples/sec: 451.21 - lr: 0.000147 - momentum: 0.000000
2023-10-11 00:35:01,045 epoch 2 - iter 126/146 - loss 0.81012905 - time (sec): 83.74 - samples/sec: 452.68 - lr: 0.000145 - momentum: 0.000000
2023-10-11 00:35:10,772 epoch 2 - iter 140/146 - loss 0.78129014 - time (sec): 93.47 - samples/sec: 453.54 - lr: 0.000143 - momentum: 0.000000
2023-10-11 00:35:14,956 ----------------------------------------------------------------------------------------------------
2023-10-11 00:35:14,956 EPOCH 2 done: loss 0.7769 - lr: 0.000143
2023-10-11 00:35:20,385 DEV : loss 0.4217626750469208 - f1-score (micro avg) 0.0
2023-10-11 00:35:20,394 ----------------------------------------------------------------------------------------------------
2023-10-11 00:35:30,560 epoch 3 - iter 14/146 - loss 0.52507267 - time (sec): 10.16 - samples/sec: 487.22 - lr: 0.000141 - momentum: 0.000000
2023-10-11 00:35:40,572 epoch 3 - iter 28/146 - loss 0.47734218 - time (sec): 20.18 - samples/sec: 498.62 - lr: 0.000139 - momentum: 0.000000
2023-10-11 00:35:49,814 epoch 3 - iter 42/146 - loss 0.52236022 - time (sec): 29.42 - samples/sec: 491.47 - lr: 0.000137 - momentum: 0.000000
2023-10-11 00:35:58,140 epoch 3 - iter 56/146 - loss 0.49316258 - time (sec): 37.74 - samples/sec: 490.92 - lr: 0.000136 - momentum: 0.000000
2023-10-11 00:36:06,676 epoch 3 - iter 70/146 - loss 0.48608795 - time (sec): 46.28 - samples/sec: 493.22 - lr: 0.000134 - momentum: 0.000000
2023-10-11 00:36:15,235 epoch 3 - iter 84/146 - loss 0.46795110 - time (sec): 54.84 - samples/sec: 495.23 - lr: 0.000132 - momentum: 0.000000
2023-10-11 00:36:23,555 epoch 3 - iter 98/146 - loss 0.44959769 - time (sec): 63.16 - samples/sec: 493.81 - lr: 0.000131 - momentum: 0.000000
2023-10-11 00:36:31,553 epoch 3 - iter 112/146 - loss 0.44307766 - time (sec): 71.16 - samples/sec: 488.26 - lr: 0.000129 - momentum: 0.000000
2023-10-11 00:36:39,208 epoch 3 - iter 126/146 - loss 0.43279780 - time (sec): 78.81 - samples/sec: 482.64 - lr: 0.000127 - momentum: 0.000000
2023-10-11 00:36:47,675 epoch 3 - iter 140/146 - loss 0.42683695 - time (sec): 87.28 - samples/sec: 482.64 - lr: 0.000125 - momentum: 0.000000
2023-10-11 00:36:51,608 ----------------------------------------------------------------------------------------------------
2023-10-11 00:36:51,608 EPOCH 3 done: loss 0.4184 - lr: 0.000125
2023-10-11 00:36:57,059 DEV : loss 0.2698569595813751 - f1-score (micro avg) 0.2605
2023-10-11 00:36:57,068 saving best model
2023-10-11 00:36:57,949 ----------------------------------------------------------------------------------------------------
2023-10-11 00:37:06,223 epoch 4 - iter 14/146 - loss 0.31637424 - time (sec): 8.27 - samples/sec: 464.20 - lr: 0.000123 - momentum: 0.000000
2023-10-11 00:37:15,069 epoch 4 - iter 28/146 - loss 0.31876580 - time (sec): 17.12 - samples/sec: 488.18 - lr: 0.000121 - momentum: 0.000000
2023-10-11 00:37:23,150 epoch 4 - iter 42/146 - loss 0.30486535 - time (sec): 25.20 - samples/sec: 488.19 - lr: 0.000120 - momentum: 0.000000
2023-10-11 00:37:31,536 epoch 4 - iter 56/146 - loss 0.31674327 - time (sec): 33.59 - samples/sec: 491.25 - lr: 0.000118 - momentum: 0.000000
2023-10-11 00:37:40,236 epoch 4 - iter 70/146 - loss 0.30195569 - time (sec): 42.29 - samples/sec: 500.46 - lr: 0.000116 - momentum: 0.000000
2023-10-11 00:37:48,519 epoch 4 - iter 84/146 - loss 0.32557627 - time (sec): 50.57 - samples/sec: 499.37 - lr: 0.000115 - momentum: 0.000000
2023-10-11 00:37:56,785 epoch 4 - iter 98/146 - loss 0.31759790 - time (sec): 58.83 - samples/sec: 498.45 - lr: 0.000113 - momentum: 0.000000
2023-10-11 00:38:05,588 epoch 4 - iter 112/146 - loss 0.31022905 - time (sec): 67.64 - samples/sec: 501.32 - lr: 0.000111 - momentum: 0.000000
2023-10-11 00:38:13,906 epoch 4 - iter 126/146 - loss 0.31045555 - time (sec): 75.96 - samples/sec: 500.41 - lr: 0.000109 - momentum: 0.000000
2023-10-11 00:38:22,839 epoch 4 - iter 140/146 - loss 0.30366458 - time (sec): 84.89 - samples/sec: 499.98 - lr: 0.000108 - momentum: 0.000000
2023-10-11 00:38:26,552 ----------------------------------------------------------------------------------------------------
2023-10-11 00:38:26,552 EPOCH 4 done: loss 0.2989 - lr: 0.000108
2023-10-11 00:38:32,194 DEV : loss 0.209104984998703 - f1-score (micro avg) 0.4208
2023-10-11 00:38:32,206 saving best model
2023-10-11 00:38:39,160 ----------------------------------------------------------------------------------------------------
2023-10-11 00:38:49,119 epoch 5 - iter 14/146 - loss 0.25760456 - time (sec): 9.96 - samples/sec: 452.21 - lr: 0.000105 - momentum: 0.000000
2023-10-11 00:38:58,893 epoch 5 - iter 28/146 - loss 0.23471806 - time (sec): 19.73 - samples/sec: 438.29 - lr: 0.000104 - momentum: 0.000000
2023-10-11 00:39:08,357 epoch 5 - iter 42/146 - loss 0.26661266 - time (sec): 29.19 - samples/sec: 432.83 - lr: 0.000102 - momentum: 0.000000
2023-10-11 00:39:17,448 epoch 5 - iter 56/146 - loss 0.28286595 - time (sec): 38.28 - samples/sec: 428.20 - lr: 0.000100 - momentum: 0.000000
2023-10-11 00:39:27,150 epoch 5 - iter 70/146 - loss 0.26401052 - time (sec): 47.99 - samples/sec: 427.81 - lr: 0.000099 - momentum: 0.000000
2023-10-11 00:39:37,205 epoch 5 - iter 84/146 - loss 0.25154027 - time (sec): 58.04 - samples/sec: 433.92 - lr: 0.000097 - momentum: 0.000000
2023-10-11 00:39:47,492 epoch 5 - iter 98/146 - loss 0.24582348 - time (sec): 68.33 - samples/sec: 443.59 - lr: 0.000095 - momentum: 0.000000
2023-10-11 00:39:57,217 epoch 5 - iter 112/146 - loss 0.23536220 - time (sec): 78.05 - samples/sec: 444.78 - lr: 0.000093 - momentum: 0.000000
2023-10-11 00:40:06,736 epoch 5 - iter 126/146 - loss 0.23213010 - time (sec): 87.57 - samples/sec: 445.89 - lr: 0.000092 - momentum: 0.000000
2023-10-11 00:40:16,105 epoch 5 - iter 140/146 - loss 0.22807507 - time (sec): 96.94 - samples/sec: 444.47 - lr: 0.000090 - momentum: 0.000000
2023-10-11 00:40:19,721 ----------------------------------------------------------------------------------------------------
2023-10-11 00:40:19,722 EPOCH 5 done: loss 0.2286 - lr: 0.000090
2023-10-11 00:40:26,552 DEV : loss 0.17275798320770264 - f1-score (micro avg) 0.533
2023-10-11 00:40:26,563 saving best model
2023-10-11 00:40:34,103 ----------------------------------------------------------------------------------------------------
2023-10-11 00:40:43,731 epoch 6 - iter 14/146 - loss 0.14647893 - time (sec): 9.62 - samples/sec: 508.91 - lr: 0.000088 - momentum: 0.000000
2023-10-11 00:40:52,055 epoch 6 - iter 28/146 - loss 0.15333865 - time (sec): 17.95 - samples/sec: 475.21 - lr: 0.000086 - momentum: 0.000000
2023-10-11 00:41:00,745 epoch 6 - iter 42/146 - loss 0.15603887 - time (sec): 26.64 - samples/sec: 476.53 - lr: 0.000084 - momentum: 0.000000
2023-10-11 00:41:09,885 epoch 6 - iter 56/146 - loss 0.14799884 - time (sec): 35.78 - samples/sec: 480.65 - lr: 0.000083 - momentum: 0.000000
2023-10-11 00:41:18,384 epoch 6 - iter 70/146 - loss 0.16215169 - time (sec): 44.28 - samples/sec: 478.17 - lr: 0.000081 - momentum: 0.000000
2023-10-11 00:41:28,260 epoch 6 - iter 84/146 - loss 0.17969819 - time (sec): 54.15 - samples/sec: 492.60 - lr: 0.000079 - momentum: 0.000000
2023-10-11 00:41:36,809 epoch 6 - iter 98/146 - loss 0.17881606 - time (sec): 62.70 - samples/sec: 489.82 - lr: 0.000077 - momentum: 0.000000
2023-10-11 00:41:45,394 epoch 6 - iter 112/146 - loss 0.17728906 - time (sec): 71.29 - samples/sec: 488.60 - lr: 0.000076 - momentum: 0.000000
2023-10-11 00:41:54,314 epoch 6 - iter 126/146 - loss 0.17321844 - time (sec): 80.21 - samples/sec: 486.73 - lr: 0.000074 - momentum: 0.000000
2023-10-11 00:42:02,929 epoch 6 - iter 140/146 - loss 0.17286199 - time (sec): 88.82 - samples/sec: 481.50 - lr: 0.000072 - momentum: 0.000000
2023-10-11 00:42:06,597 ----------------------------------------------------------------------------------------------------
2023-10-11 00:42:06,597 EPOCH 6 done: loss 0.1704 - lr: 0.000072
2023-10-11 00:42:12,450 DEV : loss 0.1590806394815445 - f1-score (micro avg) 0.6079
2023-10-11 00:42:12,460 saving best model
2023-10-11 00:42:19,778 ----------------------------------------------------------------------------------------------------
2023-10-11 00:42:29,140 epoch 7 - iter 14/146 - loss 0.13574474 - time (sec): 9.36 - samples/sec: 494.46 - lr: 0.000070 - momentum: 0.000000
2023-10-11 00:42:38,694 epoch 7 - iter 28/146 - loss 0.13424153 - time (sec): 18.91 - samples/sec: 501.67 - lr: 0.000068 - momentum: 0.000000
2023-10-11 00:42:47,777 epoch 7 - iter 42/146 - loss 0.13292302 - time (sec): 27.99 - samples/sec: 487.31 - lr: 0.000067 - momentum: 0.000000
2023-10-11 00:42:56,399 epoch 7 - iter 56/146 - loss 0.12583950 - time (sec): 36.62 - samples/sec: 476.75 - lr: 0.000065 - momentum: 0.000000
2023-10-11 00:43:05,449 epoch 7 - iter 70/146 - loss 0.12456020 - time (sec): 45.67 - samples/sec: 471.05 - lr: 0.000063 - momentum: 0.000000
2023-10-11 00:43:13,769 epoch 7 - iter 84/146 - loss 0.12805783 - time (sec): 53.99 - samples/sec: 469.43 - lr: 0.000061 - momentum: 0.000000
2023-10-11 00:43:22,866 epoch 7 - iter 98/146 - loss 0.13275272 - time (sec): 63.08 - samples/sec: 473.50 - lr: 0.000060 - momentum: 0.000000
2023-10-11 00:43:31,148 epoch 7 - iter 112/146 - loss 0.13262712 - time (sec): 71.37 - samples/sec: 465.17 - lr: 0.000058 - momentum: 0.000000
2023-10-11 00:43:40,630 epoch 7 - iter 126/146 - loss 0.13489647 - time (sec): 80.85 - samples/sec: 470.22 - lr: 0.000056 - momentum: 0.000000
2023-10-11 00:43:50,153 epoch 7 - iter 140/146 - loss 0.13471555 - time (sec): 90.37 - samples/sec: 475.26 - lr: 0.000055 - momentum: 0.000000
2023-10-11 00:43:53,631 ----------------------------------------------------------------------------------------------------
2023-10-11 00:43:53,631 EPOCH 7 done: loss 0.1341 - lr: 0.000055
2023-10-11 00:43:59,885 DEV : loss 0.1412263810634613 - f1-score (micro avg) 0.7484
2023-10-11 00:43:59,896 saving best model
2023-10-11 00:44:04,148 ----------------------------------------------------------------------------------------------------
2023-10-11 00:44:14,108 epoch 8 - iter 14/146 - loss 0.11916361 - time (sec): 9.96 - samples/sec: 527.93 - lr: 0.000052 - momentum: 0.000000
2023-10-11 00:44:22,742 epoch 8 - iter 28/146 - loss 0.13021523 - time (sec): 18.59 - samples/sec: 481.62 - lr: 0.000051 - momentum: 0.000000
2023-10-11 00:44:31,277 epoch 8 - iter 42/146 - loss 0.12227977 - time (sec): 27.12 - samples/sec: 474.15 - lr: 0.000049 - momentum: 0.000000
2023-10-11 00:44:39,841 epoch 8 - iter 56/146 - loss 0.12444551 - time (sec): 35.69 - samples/sec: 479.15 - lr: 0.000047 - momentum: 0.000000
2023-10-11 00:44:48,733 epoch 8 - iter 70/146 - loss 0.12722021 - time (sec): 44.58 - samples/sec: 483.62 - lr: 0.000045 - momentum: 0.000000
2023-10-11 00:44:57,112 epoch 8 - iter 84/146 - loss 0.12712166 - time (sec): 52.96 - samples/sec: 476.71 - lr: 0.000044 - momentum: 0.000000
2023-10-11 00:45:05,995 epoch 8 - iter 98/146 - loss 0.12123456 - time (sec): 61.84 - samples/sec: 474.22 - lr: 0.000042 - momentum: 0.000000
2023-10-11 00:45:15,792 epoch 8 - iter 112/146 - loss 0.11590990 - time (sec): 71.64 - samples/sec: 470.71 - lr: 0.000040 - momentum: 0.000000
2023-10-11 00:45:25,852 epoch 8 - iter 126/146 - loss 0.11272017 - time (sec): 81.70 - samples/sec: 467.06 - lr: 0.000039 - momentum: 0.000000
2023-10-11 00:45:35,986 epoch 8 - iter 140/146 - loss 0.11288497 - time (sec): 91.83 - samples/sec: 462.69 - lr: 0.000037 - momentum: 0.000000
2023-10-11 00:45:40,161 ----------------------------------------------------------------------------------------------------
2023-10-11 00:45:40,161 EPOCH 8 done: loss 0.1126 - lr: 0.000037
2023-10-11 00:45:46,862 DEV : loss 0.13121522963047028 - f1-score (micro avg) 0.7425
2023-10-11 00:45:46,873 ----------------------------------------------------------------------------------------------------
2023-10-11 00:45:56,825 epoch 9 - iter 14/146 - loss 0.12532790 - time (sec): 9.95 - samples/sec: 472.51 - lr: 0.000035 - momentum: 0.000000
2023-10-11 00:46:07,362 epoch 9 - iter 28/146 - loss 0.10275371 - time (sec): 20.49 - samples/sec: 455.04 - lr: 0.000033 - momentum: 0.000000
2023-10-11 00:46:16,396 epoch 9 - iter 42/146 - loss 0.09731750 - time (sec): 29.52 - samples/sec: 445.90 - lr: 0.000031 - momentum: 0.000000
2023-10-11 00:46:26,755 epoch 9 - iter 56/146 - loss 0.09793219 - time (sec): 39.88 - samples/sec: 442.79 - lr: 0.000029 - momentum: 0.000000
2023-10-11 00:46:36,624 epoch 9 - iter 70/146 - loss 0.09884983 - time (sec): 49.75 - samples/sec: 441.18 - lr: 0.000028 - momentum: 0.000000
2023-10-11 00:46:46,452 epoch 9 - iter 84/146 - loss 0.09904834 - time (sec): 59.58 - samples/sec: 441.88 - lr: 0.000026 - momentum: 0.000000
2023-10-11 00:46:56,310 epoch 9 - iter 98/146 - loss 0.09634791 - time (sec): 69.43 - samples/sec: 438.61 - lr: 0.000024 - momentum: 0.000000
2023-10-11 00:47:06,018 epoch 9 - iter 112/146 - loss 0.09264175 - time (sec): 79.14 - samples/sec: 439.24 - lr: 0.000023 - momentum: 0.000000
2023-10-11 00:47:15,964 epoch 9 - iter 126/146 - loss 0.09617828 - time (sec): 89.09 - samples/sec: 438.73 - lr: 0.000021 - momentum: 0.000000
2023-10-11 00:47:25,790 epoch 9 - iter 140/146 - loss 0.09910015 - time (sec): 98.91 - samples/sec: 435.56 - lr: 0.000019 - momentum: 0.000000
2023-10-11 00:47:29,342 ----------------------------------------------------------------------------------------------------
2023-10-11 00:47:29,343 EPOCH 9 done: loss 0.0986 - lr: 0.000019
2023-10-11 00:47:36,270 DEV : loss 0.1271078884601593 - f1-score (micro avg) 0.78
2023-10-11 00:47:36,281 saving best model
2023-10-11 00:47:42,280 ----------------------------------------------------------------------------------------------------
2023-10-11 00:47:51,489 epoch 10 - iter 14/146 - loss 0.10220196 - time (sec): 9.21 - samples/sec: 498.35 - lr: 0.000017 - momentum: 0.000000
2023-10-11 00:48:01,979 epoch 10 - iter 28/146 - loss 0.10097503 - time (sec): 19.70 - samples/sec: 463.65 - lr: 0.000015 - momentum: 0.000000
2023-10-11 00:48:11,849 epoch 10 - iter 42/146 - loss 0.10253434 - time (sec): 29.57 - samples/sec: 471.18 - lr: 0.000013 - momentum: 0.000000
2023-10-11 00:48:21,454 epoch 10 - iter 56/146 - loss 0.09627103 - time (sec): 39.17 - samples/sec: 475.71 - lr: 0.000012 - momentum: 0.000000
2023-10-11 00:48:29,986 epoch 10 - iter 70/146 - loss 0.09699707 - time (sec): 47.70 - samples/sec: 474.22 - lr: 0.000010 - momentum: 0.000000
2023-10-11 00:48:39,413 epoch 10 - iter 84/146 - loss 0.09278009 - time (sec): 57.13 - samples/sec: 472.41 - lr: 0.000008 - momentum: 0.000000
2023-10-11 00:48:47,745 epoch 10 - iter 98/146 - loss 0.09014855 - time (sec): 65.46 - samples/sec: 460.60 - lr: 0.000007 - momentum: 0.000000
2023-10-11 00:48:56,979 epoch 10 - iter 112/146 - loss 0.09219331 - time (sec): 74.70 - samples/sec: 463.89 - lr: 0.000005 - momentum: 0.000000
2023-10-11 00:49:05,736 epoch 10 - iter 126/146 - loss 0.09003232 - time (sec): 83.45 - samples/sec: 462.88 - lr: 0.000003 - momentum: 0.000000
2023-10-11 00:49:14,622 epoch 10 - iter 140/146 - loss 0.09300204 - time (sec): 92.34 - samples/sec: 462.35 - lr: 0.000002 - momentum: 0.000000
2023-10-11 00:49:18,215 ----------------------------------------------------------------------------------------------------
2023-10-11 00:49:18,215 EPOCH 10 done: loss 0.0927 - lr: 0.000002
2023-10-11 00:49:24,001 DEV : loss 0.1254645138978958 - f1-score (micro avg) 0.779
2023-10-11 00:49:24,961 ----------------------------------------------------------------------------------------------------
2023-10-11 00:49:24,963 Loading model from best epoch ...
2023-10-11 00:49:29,153 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-11 00:49:42,185 
Results:
- F-score (micro) 0.7087
- F-score (macro) 0.628
- Accuracy 0.5675

By class:
              precision    recall  f1-score   support

         PER     0.7895    0.8190    0.8039       348
         LOC     0.5805    0.7739    0.6634       261
         ORG     0.2979    0.2692    0.2828        52
   HumanProd     0.8000    0.7273    0.7619        22

   micro avg     0.6662    0.7570    0.7087       683
   macro avg     0.6170    0.6474    0.6280       683
weighted avg     0.6725    0.7570    0.7092       683

2023-10-11 00:49:42,185 ----------------------------------------------------------------------------------------------------