2023-10-11 08:27:56,009 ----------------------------------------------------------------------------------------------------
2023-10-11 08:27:56,011 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-11 08:27:56,011 ----------------------------------------------------------------------------------------------------
2023-10-11 08:27:56,012 MultiCorpus: 1085 train + 148 dev + 364 test sentences
 - NER_HIPE_2022 Corpus: 1085 train + 148 dev + 364 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/sv/with_doc_seperator
2023-10-11 08:27:56,012 ----------------------------------------------------------------------------------------------------
2023-10-11 08:27:56,012 Train: 1085 sentences
2023-10-11 08:27:56,012 (train_with_dev=False, train_with_test=False)
2023-10-11 08:27:56,012 ----------------------------------------------------------------------------------------------------
2023-10-11 08:27:56,012 Training Params:
2023-10-11 08:27:56,012 - learning_rate: "0.00015"
2023-10-11 08:27:56,012 - mini_batch_size: "4"
2023-10-11 08:27:56,012 - max_epochs: "10"
2023-10-11 08:27:56,012 - shuffle: "True"
2023-10-11 08:27:56,013 ----------------------------------------------------------------------------------------------------
2023-10-11 08:27:56,013 Plugins:
2023-10-11 08:27:56,013 - TensorboardLogger
2023-10-11 08:27:56,013 - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 08:27:56,013 ----------------------------------------------------------------------------------------------------
2023-10-11 08:27:56,013 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 08:27:56,013 - metric: "('micro avg', 'f1-score')"
2023-10-11 08:27:56,013
----------------------------------------------------------------------------------------------------
2023-10-11 08:27:56,013 Computation:
2023-10-11 08:27:56,013 - compute on device: cuda:0
2023-10-11 08:27:56,013 - embedding storage: none
2023-10-11 08:27:56,013 ----------------------------------------------------------------------------------------------------
2023-10-11 08:27:56,013 Model training base path: "hmbench-newseye/sv-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-1"
2023-10-11 08:27:56,014 ----------------------------------------------------------------------------------------------------
2023-10-11 08:27:56,014 ----------------------------------------------------------------------------------------------------
2023-10-11 08:27:56,014 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 08:28:05,746 epoch 1 - iter 27/272 - loss 2.82527220 - time (sec): 9.73 - samples/sec: 536.16 - lr: 0.000014 - momentum: 0.000000
2023-10-11 08:28:15,634 epoch 1 - iter 54/272 - loss 2.81578887 - time (sec): 19.62 - samples/sec: 555.40 - lr: 0.000029 - momentum: 0.000000
2023-10-11 08:28:25,349 epoch 1 - iter 81/272 - loss 2.79743887 - time (sec): 29.33 - samples/sec: 550.39 - lr: 0.000044 - momentum: 0.000000
2023-10-11 08:28:35,638 epoch 1 - iter 108/272 - loss 2.75063780 - time (sec): 39.62 - samples/sec: 552.67 - lr: 0.000059 - momentum: 0.000000
2023-10-11 08:28:44,316 epoch 1 - iter 135/272 - loss 2.68811497 - time (sec): 48.30 - samples/sec: 539.80 - lr: 0.000074 - momentum: 0.000000
2023-10-11 08:28:53,980 epoch 1 - iter 162/272 - loss 2.58887886 - time (sec): 57.96 - samples/sec: 543.38 - lr: 0.000089 - momentum: 0.000000
2023-10-11 08:29:03,748 epoch 1 - iter 189/272 - loss 2.47848820 - time (sec): 67.73 - samples/sec: 544.97 - lr: 0.000104 - momentum: 0.000000
2023-10-11 08:29:13,570 epoch 1 - iter 216/272 - loss 2.36431382 - time (sec): 77.55 - samples/sec: 544.94 - lr: 0.000119 - momentum: 0.000000
2023-10-11 08:29:23,147 epoch 1 - iter 243/272 - loss 2.25308445 - time (sec): 87.13 - samples/sec: 542.88 - lr: 0.000133 - momentum: 0.000000
2023-10-11 08:29:32,099 epoch 1 - iter 270/272 - loss 2.14779837 - time (sec): 96.08 - samples/sec: 538.86 - lr: 0.000148 - momentum: 0.000000
2023-10-11 08:29:32,557 ----------------------------------------------------------------------------------------------------
2023-10-11 08:29:32,557 EPOCH 1 done: loss 2.1429 - lr: 0.000148
2023-10-11 08:29:37,551 DEV : loss 0.8090639114379883 - f1-score (micro avg) 0.0
2023-10-11 08:29:37,560 ----------------------------------------------------------------------------------------------------
2023-10-11 08:29:46,988 epoch 2 - iter 27/272 - loss 0.79380094 - time (sec): 9.43 - samples/sec: 514.96 - lr: 0.000148 - momentum: 0.000000
2023-10-11 08:29:56,691 epoch 2 - iter 54/272 - loss 0.69639882 - time (sec): 19.13 - samples/sec: 524.13 - lr: 0.000147 - momentum: 0.000000
2023-10-11 08:30:06,795 epoch 2 - iter 81/272 - loss 0.66949044 - time (sec): 29.23 - samples/sec: 528.89 - lr: 0.000145 - momentum: 0.000000
2023-10-11 08:30:16,467 epoch 2 - iter 108/272 - loss 0.61712378 - time (sec): 38.91 - samples/sec: 531.11 - lr: 0.000143 - momentum: 0.000000
2023-10-11 08:30:25,467 epoch 2 - iter 135/272 - loss 0.59387096 - time (sec): 47.90 - samples/sec: 523.98 - lr: 0.000142 - momentum: 0.000000
2023-10-11 08:30:35,702 epoch 2 - iter 162/272 - loss 0.57455996 - time (sec): 58.14 - samples/sec: 533.22 - lr: 0.000140 - momentum: 0.000000
2023-10-11 08:30:45,302 epoch 2 - iter 189/272 - loss 0.55870790 - time (sec): 67.74 - samples/sec: 532.34 - lr: 0.000138 - momentum: 0.000000
2023-10-11 08:30:55,971 epoch 2 - iter 216/272 - loss 0.51696139 - time (sec): 78.41 - samples/sec: 538.06 - lr: 0.000137 - momentum: 0.000000
2023-10-11 08:31:05,589 epoch 2 - iter 243/272 - loss 0.49305630 - time (sec): 88.03 - samples/sec: 536.68 - lr: 0.000135 - momentum: 0.000000
2023-10-11 08:31:14,784 epoch 2 - iter 270/272 - loss 0.47744906 - time (sec): 97.22 - samples/sec: 531.86 - lr: 0.000134 - momentum: 0.000000
2023-10-11 08:31:15,351 ----------------------------------------------------------------------------------------------------
2023-10-11 08:31:15,351 EPOCH 2 done: loss 0.4760 - lr: 0.000134
2023-10-11 08:31:21,289 DEV : loss 0.2697048783302307 - f1-score (micro avg) 0.2098
2023-10-11 08:31:21,297 saving best model
2023-10-11 08:31:22,188 ----------------------------------------------------------------------------------------------------
2023-10-11 08:31:30,509 epoch 3 - iter 27/272 - loss 0.31259444 - time (sec): 8.32 - samples/sec: 479.62 - lr: 0.000132 - momentum: 0.000000
2023-10-11 08:31:40,176 epoch 3 - iter 54/272 - loss 0.28524280 - time (sec): 17.99 - samples/sec: 513.63 - lr: 0.000130 - momentum: 0.000000
2023-10-11 08:31:49,465 epoch 3 - iter 81/272 - loss 0.26782153 - time (sec): 27.27 - samples/sec: 526.80 - lr: 0.000128 - momentum: 0.000000
2023-10-11 08:31:58,900 epoch 3 - iter 108/272 - loss 0.26941576 - time (sec): 36.71 - samples/sec: 530.95 - lr: 0.000127 - momentum: 0.000000
2023-10-11 08:32:08,682 epoch 3 - iter 135/272 - loss 0.27249015 - time (sec): 46.49 - samples/sec: 540.93 - lr: 0.000125 - momentum: 0.000000
2023-10-11 08:32:18,933 epoch 3 - iter 162/272 - loss 0.26692047 - time (sec): 56.74 - samples/sec: 544.05 - lr: 0.000123 - momentum: 0.000000
2023-10-11 08:32:28,570 epoch 3 - iter 189/272 - loss 0.27090264 - time (sec): 66.38 - samples/sec: 546.78 - lr: 0.000122 - momentum: 0.000000
2023-10-11 08:32:37,583 epoch 3 - iter 216/272 - loss 0.26935437 - time (sec): 75.39 - samples/sec: 543.16 - lr: 0.000120 - momentum: 0.000000
2023-10-11 08:32:48,200 epoch 3 - iter 243/272 - loss 0.26035213 - time (sec): 86.01 - samples/sec: 547.64 - lr: 0.000119 - momentum: 0.000000
2023-10-11 08:32:57,570 epoch 3 - iter 270/272 - loss 0.25719703 - time (sec): 95.38 - samples/sec: 542.54 - lr: 0.000117 - momentum: 0.000000
2023-10-11 08:32:58,034 ----------------------------------------------------------------------------------------------------
2023-10-11 08:32:58,034 EPOCH 3 done: loss 0.2574 - lr: 0.000117
2023-10-11 08:33:03,831 DEV : loss 0.2006431519985199 - f1-score (micro avg) 0.5292
2023-10-11 08:33:03,840 saving best model
2023-10-11 08:33:10,036 ----------------------------------------------------------------------------------------------------
2023-10-11 08:33:19,376 epoch 4 - iter 27/272 - loss 0.21238760 - time (sec): 9.33 - samples/sec: 506.80 - lr: 0.000115 - momentum: 0.000000
2023-10-11 08:33:29,507 epoch 4 - iter 54/272 - loss 0.19368792 - time (sec): 19.47 - samples/sec: 542.44 - lr: 0.000113 - momentum: 0.000000
2023-10-11 08:33:38,980 epoch 4 - iter 81/272 - loss 0.18826367 - time (sec): 28.94 - samples/sec: 537.79 - lr: 0.000112 - momentum: 0.000000
2023-10-11 08:33:49,054 epoch 4 - iter 108/272 - loss 0.18386043 - time (sec): 39.01 - samples/sec: 544.59 - lr: 0.000110 - momentum: 0.000000
2023-10-11 08:33:58,456 epoch 4 - iter 135/272 - loss 0.17239772 - time (sec): 48.41 - samples/sec: 547.58 - lr: 0.000108 - momentum: 0.000000
2023-10-11 08:34:07,665 epoch 4 - iter 162/272 - loss 0.16928535 - time (sec): 57.62 - samples/sec: 544.77 - lr: 0.000107 - momentum: 0.000000
2023-10-11 08:34:18,217 epoch 4 - iter 189/272 - loss 0.16232035 - time (sec): 68.18 - samples/sec: 551.16 - lr: 0.000105 - momentum: 0.000000
2023-10-11 08:34:28,323 epoch 4 - iter 216/272 - loss 0.16453145 - time (sec): 78.28 - samples/sec: 546.45 - lr: 0.000103 - momentum: 0.000000
2023-10-11 08:34:37,908 epoch 4 - iter 243/272 - loss 0.16620141 - time (sec): 87.87 - samples/sec: 540.65 - lr: 0.000102 - momentum: 0.000000
2023-10-11 08:34:46,783 epoch 4 - iter 270/272 - loss 0.16558176 - time (sec): 96.74 - samples/sec: 535.57 - lr: 0.000100 - momentum: 0.000000
2023-10-11 08:34:47,194
----------------------------------------------------------------------------------------------------
2023-10-11 08:34:47,194 EPOCH 4 done: loss 0.1654 - lr: 0.000100
2023-10-11 08:34:52,786 DEV : loss 0.15305934846401215 - f1-score (micro avg) 0.6691
2023-10-11 08:34:52,794 saving best model
2023-10-11 08:34:57,813 ----------------------------------------------------------------------------------------------------
2023-10-11 08:35:08,083 epoch 5 - iter 27/272 - loss 0.11542380 - time (sec): 10.27 - samples/sec: 539.29 - lr: 0.000098 - momentum: 0.000000
2023-10-11 08:35:17,686 epoch 5 - iter 54/272 - loss 0.12296413 - time (sec): 19.87 - samples/sec: 526.22 - lr: 0.000097 - momentum: 0.000000
2023-10-11 08:35:26,789 epoch 5 - iter 81/272 - loss 0.12324075 - time (sec): 28.97 - samples/sec: 512.37 - lr: 0.000095 - momentum: 0.000000
2023-10-11 08:35:36,462 epoch 5 - iter 108/272 - loss 0.11739274 - time (sec): 38.64 - samples/sec: 515.21 - lr: 0.000093 - momentum: 0.000000
2023-10-11 08:35:46,648 epoch 5 - iter 135/272 - loss 0.11863495 - time (sec): 48.83 - samples/sec: 521.52 - lr: 0.000092 - momentum: 0.000000
2023-10-11 08:35:56,318 epoch 5 - iter 162/272 - loss 0.11563924 - time (sec): 58.50 - samples/sec: 519.98 - lr: 0.000090 - momentum: 0.000000
2023-10-11 08:36:06,215 epoch 5 - iter 189/272 - loss 0.11198930 - time (sec): 68.40 - samples/sec: 518.10 - lr: 0.000088 - momentum: 0.000000
2023-10-11 08:36:16,740 epoch 5 - iter 216/272 - loss 0.10927879 - time (sec): 78.92 - samples/sec: 522.45 - lr: 0.000087 - momentum: 0.000000
2023-10-11 08:36:26,361 epoch 5 - iter 243/272 - loss 0.11127709 - time (sec): 88.54 - samples/sec: 519.88 - lr: 0.000085 - momentum: 0.000000
2023-10-11 08:36:36,707 epoch 5 - iter 270/272 - loss 0.10945143 - time (sec): 98.89 - samples/sec: 523.39 - lr: 0.000084 - momentum: 0.000000
2023-10-11 08:36:37,167 ----------------------------------------------------------------------------------------------------
2023-10-11 08:36:37,167 EPOCH 5 done: loss 0.1097 - lr: 0.000084
2023-10-11 08:36:43,351 DEV : loss 0.14368949830532074 - f1-score (micro avg) 0.7306
2023-10-11 08:36:43,359 saving best model
2023-10-11 08:36:45,916 ----------------------------------------------------------------------------------------------------
2023-10-11 08:36:56,115 epoch 6 - iter 27/272 - loss 0.07539543 - time (sec): 10.19 - samples/sec: 525.87 - lr: 0.000082 - momentum: 0.000000
2023-10-11 08:37:05,931 epoch 6 - iter 54/272 - loss 0.09543769 - time (sec): 20.01 - samples/sec: 506.27 - lr: 0.000080 - momentum: 0.000000
2023-10-11 08:37:16,403 epoch 6 - iter 81/272 - loss 0.09161672 - time (sec): 30.48 - samples/sec: 517.70 - lr: 0.000078 - momentum: 0.000000
2023-10-11 08:37:25,910 epoch 6 - iter 108/272 - loss 0.08558701 - time (sec): 39.99 - samples/sec: 514.59 - lr: 0.000077 - momentum: 0.000000
2023-10-11 08:37:35,143 epoch 6 - iter 135/272 - loss 0.08995611 - time (sec): 49.22 - samples/sec: 507.75 - lr: 0.000075 - momentum: 0.000000
2023-10-11 08:37:44,963 epoch 6 - iter 162/272 - loss 0.08413813 - time (sec): 59.04 - samples/sec: 509.34 - lr: 0.000073 - momentum: 0.000000
2023-10-11 08:37:54,464 epoch 6 - iter 189/272 - loss 0.08426293 - time (sec): 68.54 - samples/sec: 507.32 - lr: 0.000072 - momentum: 0.000000
2023-10-11 08:38:05,043 epoch 6 - iter 216/272 - loss 0.08373579 - time (sec): 79.12 - samples/sec: 510.97 - lr: 0.000070 - momentum: 0.000000
2023-10-11 08:38:15,537 epoch 6 - iter 243/272 - loss 0.07977679 - time (sec): 89.62 - samples/sec: 514.48 - lr: 0.000069 - momentum: 0.000000
2023-10-11 08:38:25,676 epoch 6 - iter 270/272 - loss 0.07725078 - time (sec): 99.76 - samples/sec: 517.38 - lr: 0.000067 - momentum: 0.000000
2023-10-11 08:38:26,295 ----------------------------------------------------------------------------------------------------
2023-10-11 08:38:26,296 EPOCH 6 done: loss 0.0776 - lr: 0.000067
2023-10-11 08:38:32,186 DEV : loss 0.14091677963733673 - f1-score (micro avg) 0.7487
2023-10-11 08:38:32,194 saving best model
2023-10-11 08:38:33,123 ----------------------------------------------------------------------------------------------------
2023-10-11 08:38:42,014 epoch 7 - iter 27/272 - loss 0.06607874 - time (sec): 8.89 - samples/sec: 438.23 - lr: 0.000065 - momentum: 0.000000
2023-10-11 08:38:53,029 epoch 7 - iter 54/272 - loss 0.07072193 - time (sec): 19.90 - samples/sec: 526.45 - lr: 0.000063 - momentum: 0.000000
2023-10-11 08:39:03,606 epoch 7 - iter 81/272 - loss 0.06434231 - time (sec): 30.48 - samples/sec: 538.94 - lr: 0.000062 - momentum: 0.000000
2023-10-11 08:39:13,896 epoch 7 - iter 108/272 - loss 0.06116880 - time (sec): 40.77 - samples/sec: 531.38 - lr: 0.000060 - momentum: 0.000000
2023-10-11 08:39:23,694 epoch 7 - iter 135/272 - loss 0.06217545 - time (sec): 50.57 - samples/sec: 531.88 - lr: 0.000058 - momentum: 0.000000
2023-10-11 08:39:33,282 epoch 7 - iter 162/272 - loss 0.06273333 - time (sec): 60.16 - samples/sec: 524.37 - lr: 0.000057 - momentum: 0.000000
2023-10-11 08:39:43,044 epoch 7 - iter 189/272 - loss 0.06023536 - time (sec): 69.92 - samples/sec: 524.91 - lr: 0.000055 - momentum: 0.000000
2023-10-11 08:39:52,205 epoch 7 - iter 216/272 - loss 0.05977588 - time (sec): 79.08 - samples/sec: 518.40 - lr: 0.000053 - momentum: 0.000000
2023-10-11 08:40:02,331 epoch 7 - iter 243/272 - loss 0.05885531 - time (sec): 89.21 - samples/sec: 519.06 - lr: 0.000052 - momentum: 0.000000
2023-10-11 08:40:12,657 epoch 7 - iter 270/272 - loss 0.05924263 - time (sec): 99.53 - samples/sec: 519.66 - lr: 0.000050 - momentum: 0.000000
2023-10-11 08:40:13,173 ----------------------------------------------------------------------------------------------------
2023-10-11 08:40:13,173 EPOCH 7 done: loss 0.0595 - lr: 0.000050
2023-10-11 08:40:19,155 DEV : loss 0.14236551523208618 - f1-score (micro avg) 0.7731
2023-10-11 08:40:19,164 saving best model
2023-10-11 08:40:21,752
----------------------------------------------------------------------------------------------------
2023-10-11 08:40:31,183 epoch 8 - iter 27/272 - loss 0.04561038 - time (sec): 9.43 - samples/sec: 534.37 - lr: 0.000048 - momentum: 0.000000
2023-10-11 08:40:40,708 epoch 8 - iter 54/272 - loss 0.04281581 - time (sec): 18.95 - samples/sec: 526.40 - lr: 0.000047 - momentum: 0.000000
2023-10-11 08:40:51,411 epoch 8 - iter 81/272 - loss 0.04882565 - time (sec): 29.65 - samples/sec: 539.72 - lr: 0.000045 - momentum: 0.000000
2023-10-11 08:41:01,243 epoch 8 - iter 108/272 - loss 0.04678312 - time (sec): 39.49 - samples/sec: 529.16 - lr: 0.000043 - momentum: 0.000000
2023-10-11 08:41:11,088 epoch 8 - iter 135/272 - loss 0.04583440 - time (sec): 49.33 - samples/sec: 528.53 - lr: 0.000042 - momentum: 0.000000
2023-10-11 08:41:20,671 epoch 8 - iter 162/272 - loss 0.04565349 - time (sec): 58.91 - samples/sec: 532.04 - lr: 0.000040 - momentum: 0.000000
2023-10-11 08:41:29,823 epoch 8 - iter 189/272 - loss 0.04604960 - time (sec): 68.07 - samples/sec: 529.96 - lr: 0.000038 - momentum: 0.000000
2023-10-11 08:41:39,649 epoch 8 - iter 216/272 - loss 0.04541623 - time (sec): 77.89 - samples/sec: 534.53 - lr: 0.000037 - momentum: 0.000000
2023-10-11 08:41:48,976 epoch 8 - iter 243/272 - loss 0.04830839 - time (sec): 87.22 - samples/sec: 531.59 - lr: 0.000035 - momentum: 0.000000
2023-10-11 08:41:58,683 epoch 8 - iter 270/272 - loss 0.04705058 - time (sec): 96.93 - samples/sec: 532.86 - lr: 0.000034 - momentum: 0.000000
2023-10-11 08:41:59,240 ----------------------------------------------------------------------------------------------------
2023-10-11 08:41:59,241 EPOCH 8 done: loss 0.0472 - lr: 0.000034
2023-10-11 08:42:04,937 DEV : loss 0.1399533450603485 - f1-score (micro avg) 0.7877
2023-10-11 08:42:04,945 saving best model
2023-10-11 08:42:05,883 ----------------------------------------------------------------------------------------------------
2023-10-11 08:42:14,810 epoch 9 - iter 27/272 - loss 0.03068623 - time (sec): 8.93 - samples/sec: 507.55 - lr: 0.000032 - momentum: 0.000000
2023-10-11 08:42:24,940 epoch 9 - iter 54/272 - loss 0.02656723 - time (sec): 19.06 - samples/sec: 542.31 - lr: 0.000030 - momentum: 0.000000
2023-10-11 08:42:34,472 epoch 9 - iter 81/272 - loss 0.02819868 - time (sec): 28.59 - samples/sec: 542.87 - lr: 0.000028 - momentum: 0.000000
2023-10-11 08:42:43,997 epoch 9 - iter 108/272 - loss 0.03508973 - time (sec): 38.11 - samples/sec: 541.00 - lr: 0.000027 - momentum: 0.000000
2023-10-11 08:42:54,004 epoch 9 - iter 135/272 - loss 0.03637900 - time (sec): 48.12 - samples/sec: 536.69 - lr: 0.000025 - momentum: 0.000000
2023-10-11 08:43:05,395 epoch 9 - iter 162/272 - loss 0.03568471 - time (sec): 59.51 - samples/sec: 529.66 - lr: 0.000023 - momentum: 0.000000
2023-10-11 08:43:15,322 epoch 9 - iter 189/272 - loss 0.03486822 - time (sec): 69.44 - samples/sec: 520.12 - lr: 0.000022 - momentum: 0.000000
2023-10-11 08:43:25,598 epoch 9 - iter 216/272 - loss 0.03613166 - time (sec): 79.71 - samples/sec: 519.82 - lr: 0.000020 - momentum: 0.000000
2023-10-11 08:43:35,152 epoch 9 - iter 243/272 - loss 0.03922140 - time (sec): 89.27 - samples/sec: 515.74 - lr: 0.000019 - momentum: 0.000000
2023-10-11 08:43:45,659 epoch 9 - iter 270/272 - loss 0.03803645 - time (sec): 99.77 - samples/sec: 517.53 - lr: 0.000017 - momentum: 0.000000
2023-10-11 08:43:46,289 ----------------------------------------------------------------------------------------------------
2023-10-11 08:43:46,289 EPOCH 9 done: loss 0.0379 - lr: 0.000017
2023-10-11 08:43:52,708 DEV : loss 0.1408960521221161 - f1-score (micro avg) 0.7784
2023-10-11 08:43:52,719 ----------------------------------------------------------------------------------------------------
2023-10-11 08:44:02,018 epoch 10 - iter 27/272 - loss 0.02462086 - time (sec): 9.30 - samples/sec: 507.24 - lr: 0.000015 - momentum: 0.000000
2023-10-11 08:44:11,063 epoch 10 - iter 54/272 - loss 0.02805834 - time (sec): 18.34 - samples/sec: 489.03 - lr: 0.000013 - momentum: 0.000000
2023-10-11 08:44:20,606 epoch 10 - iter 81/272 - loss 0.03332562 - time (sec): 27.88 - samples/sec: 490.45 - lr: 0.000012 - momentum: 0.000000
2023-10-11 08:44:30,527 epoch 10 - iter 108/272 - loss 0.03136298 - time (sec): 37.81 - samples/sec: 505.40 - lr: 0.000010 - momentum: 0.000000
2023-10-11 08:44:41,197 epoch 10 - iter 135/272 - loss 0.03099600 - time (sec): 48.48 - samples/sec: 526.51 - lr: 0.000008 - momentum: 0.000000
2023-10-11 08:44:52,141 epoch 10 - iter 162/272 - loss 0.03378086 - time (sec): 59.42 - samples/sec: 539.65 - lr: 0.000007 - momentum: 0.000000
2023-10-11 08:45:02,208 epoch 10 - iter 189/272 - loss 0.03459241 - time (sec): 69.49 - samples/sec: 540.16 - lr: 0.000005 - momentum: 0.000000
2023-10-11 08:45:11,492 epoch 10 - iter 216/272 - loss 0.03427708 - time (sec): 78.77 - samples/sec: 531.53 - lr: 0.000003 - momentum: 0.000000
2023-10-11 08:45:20,905 epoch 10 - iter 243/272 - loss 0.03584933 - time (sec): 88.18 - samples/sec: 527.28 - lr: 0.000002 - momentum: 0.000000
2023-10-11 08:45:31,284 epoch 10 - iter 270/272 - loss 0.03499016 - time (sec): 98.56 - samples/sec: 525.50 - lr: 0.000000 - momentum: 0.000000
2023-10-11 08:45:31,735 ----------------------------------------------------------------------------------------------------
2023-10-11 08:45:31,735 EPOCH 10 done: loss 0.0350 - lr: 0.000000
2023-10-11 08:45:37,835 DEV : loss 0.14018605649471283 - f1-score (micro avg) 0.782
2023-10-11 08:45:38,743 ----------------------------------------------------------------------------------------------------
2023-10-11 08:45:38,745 Loading model from best epoch ...
2023-10-11 08:45:42,618 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-11 08:45:54,534 Results:
- F-score (micro) 0.7593
- F-score (macro) 0.6641
- Accuracy 0.631

By class:
              precision    recall  f1-score   support

         LOC     0.7466    0.8782    0.8071       312
         PER     0.7115    0.8654    0.7809       208
         ORG     0.4000    0.3273    0.3600        55
   HumanProd     0.6538    0.7727    0.7083        22

   micro avg     0.7077    0.8191    0.7593       597
   macro avg     0.6280    0.7109    0.6641       597
weighted avg     0.6990    0.8191    0.7531       597

2023-10-11 08:45:54,534 ----------------------------------------------------------------------------------------------------