2023-10-11 02:00:27,642 ----------------------------------------------------------------------------------------------------
2023-10-11 02:00:27,645 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-11 02:00:27,645 ----------------------------------------------------------------------------------------------------
2023-10-11 02:00:27,645 MultiCorpus: 7142 train + 698 dev + 2570 test sentences
 - NER_HIPE_2022 Corpus: 7142 train + 698 dev + 2570 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fr/with_doc_seperator
2023-10-11 02:00:27,645 ----------------------------------------------------------------------------------------------------
2023-10-11 02:00:27,645 Train: 7142 sentences
2023-10-11 02:00:27,645 (train_with_dev=False, train_with_test=False)
2023-10-11 02:00:27,646 ----------------------------------------------------------------------------------------------------
2023-10-11 02:00:27,646 Training Params:
2023-10-11 02:00:27,646 - learning_rate: "0.00015"
2023-10-11 02:00:27,646 - mini_batch_size: "8"
2023-10-11 02:00:27,646 - max_epochs: "10"
2023-10-11 02:00:27,646 - shuffle: "True"
2023-10-11 02:00:27,646 ----------------------------------------------------------------------------------------------------
2023-10-11 02:00:27,646 Plugins:
2023-10-11 02:00:27,646 - TensorboardLogger
2023-10-11 02:00:27,646 - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 02:00:27,646 ----------------------------------------------------------------------------------------------------
2023-10-11 02:00:27,646 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 02:00:27,646 - metric: "('micro avg', 'f1-score')"
2023-10-11 02:00:27,646 ----------------------------------------------------------------------------------------------------
2023-10-11 02:00:27,646 Computation:
2023-10-11 02:00:27,646 - compute on device: cuda:0
2023-10-11 02:00:27,647 - embedding storage: none
2023-10-11 02:00:27,647 ----------------------------------------------------------------------------------------------------
2023-10-11 02:00:27,647 Model training base path: "hmbench-newseye/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-2"
2023-10-11 02:00:27,647 ----------------------------------------------------------------------------------------------------
2023-10-11 02:00:27,647 ----------------------------------------------------------------------------------------------------
2023-10-11 02:00:27,647 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 02:01:19,984 epoch 1 - iter 89/893 - loss 2.83790708 - time (sec): 52.33 - samples/sec: 507.40 - lr: 0.000015 - momentum: 0.000000
2023-10-11 02:02:11,747 epoch 1 - iter 178/893 - loss 2.77852176 - time (sec): 104.10 - samples/sec: 497.39 - lr: 0.000030 - momentum: 0.000000
2023-10-11 02:03:05,897 epoch 1 - iter 267/893 - loss 2.58685480 - time (sec): 158.25 - samples/sec: 490.84 - lr: 0.000045 - momentum: 0.000000
2023-10-11 02:03:55,271 epoch 1 - iter 356/893 - loss 2.36244979 - time (sec): 207.62 - samples/sec: 495.84 - lr: 0.000060 - momentum: 0.000000
2023-10-11 02:04:46,699 epoch 1 - iter 445/893 - loss 2.13959943 - time (sec): 259.05 - samples/sec: 491.26 - lr: 0.000075 - momentum: 0.000000
2023-10-11 02:05:38,352 epoch 1 - iter 534/893 - loss 1.91736663 - time (sec): 310.70 - samples/sec: 492.56 - lr: 0.000090 - momentum: 0.000000
2023-10-11 02:06:29,081 epoch 1 - iter 623/893 - loss 1.74080020 - time (sec): 361.43 - samples/sec: 492.25 - lr: 0.000104 - momentum: 0.000000
2023-10-11 02:07:23,712 epoch 1 - iter 712/893 - loss 1.60422960 - time (sec): 416.06 - samples/sec: 483.34 - lr: 0.000119 - momentum: 0.000000
2023-10-11 02:08:12,004 epoch 1 - iter 801/893 - loss 1.48109419 - time (sec): 464.36 - samples/sec: 483.49 - lr: 0.000134 - momentum: 0.000000
2023-10-11 02:08:59,733 epoch 1 - iter 890/893 - loss 1.37704593 - time (sec): 512.08 - samples/sec: 484.80 - lr: 0.000149 - momentum: 0.000000
2023-10-11 02:09:01,007 ----------------------------------------------------------------------------------------------------
2023-10-11 02:09:01,008 EPOCH 1 done: loss 1.3752 - lr: 0.000149
2023-10-11 02:09:20,678 DEV : loss 0.27791526913642883 - f1-score (micro avg) 0.236
2023-10-11 02:09:20,711 saving best model
2023-10-11 02:09:21,542 ----------------------------------------------------------------------------------------------------
2023-10-11 02:10:10,289 epoch 2 - iter 89/893 - loss 0.31498471 - time (sec): 48.74 - samples/sec: 515.58 - lr: 0.000148 - momentum: 0.000000
2023-10-11 02:11:00,221 epoch 2 - iter 178/893 - loss 0.29516314 - time (sec): 98.68 - samples/sec: 522.13 - lr: 0.000147 - momentum: 0.000000
2023-10-11 02:11:48,501 epoch 2 - iter 267/893 - loss 0.27648439 - time (sec): 146.96 - samples/sec: 519.52 - lr: 0.000145 - momentum: 0.000000
2023-10-11 02:12:36,435 epoch 2 - iter 356/893 - loss 0.25974186 - time (sec): 194.89 - samples/sec: 515.74 - lr: 0.000143 - momentum: 0.000000
2023-10-11 02:13:23,642 epoch 2 - iter 445/893 - loss 0.24526095 - time (sec): 242.10 - samples/sec: 514.55 - lr: 0.000142 - momentum: 0.000000
2023-10-11 02:14:11,342 epoch 2 - iter 534/893 - loss 0.23418021 - time (sec): 289.80 - samples/sec: 512.27 - lr: 0.000140 - momentum: 0.000000
2023-10-11 02:14:58,173 epoch 2 - iter 623/893 - loss 0.22272322 - time (sec): 336.63 - samples/sec: 511.28 - lr: 0.000138 - momentum: 0.000000
2023-10-11 02:15:47,390 epoch 2 - iter 712/893 - loss 0.21186423 - time (sec): 385.84 - samples/sec: 511.40 - lr: 0.000137 - momentum: 0.000000
2023-10-11 02:16:37,322 epoch 2 - iter 801/893 - loss 0.20247638 - time (sec): 435.78 - samples/sec: 512.65 - lr: 0.000135 - momentum: 0.000000
2023-10-11 02:17:28,585 epoch 2 - iter 890/893 - loss 0.19324619 - time (sec): 487.04 - samples/sec: 509.21 - lr: 0.000133 - momentum: 0.000000
2023-10-11 02:17:30,107 ----------------------------------------------------------------------------------------------------
2023-10-11 02:17:30,107 EPOCH 2 done: loss 0.1931 - lr: 0.000133
2023-10-11 02:17:51,205 DEV : loss 0.10340522974729538 - f1-score (micro avg) 0.7358
2023-10-11 02:17:51,239 saving best model
2023-10-11 02:17:53,750 ----------------------------------------------------------------------------------------------------
2023-10-11 02:18:44,287 epoch 3 - iter 89/893 - loss 0.09071773 - time (sec): 50.53 - samples/sec: 511.46 - lr: 0.000132 - momentum: 0.000000
2023-10-11 02:19:34,150 epoch 3 - iter 178/893 - loss 0.08766364 - time (sec): 100.39 - samples/sec: 484.80 - lr: 0.000130 - momentum: 0.000000
2023-10-11 02:20:23,940 epoch 3 - iter 267/893 - loss 0.08620654 - time (sec): 150.18 - samples/sec: 492.81 - lr: 0.000128 - momentum: 0.000000
2023-10-11 02:21:12,569 epoch 3 - iter 356/893 - loss 0.08710514 - time (sec): 198.81 - samples/sec: 494.07 - lr: 0.000127 - momentum: 0.000000
2023-10-11 02:22:00,699 epoch 3 - iter 445/893 - loss 0.08228171 - time (sec): 246.94 - samples/sec: 497.77 - lr: 0.000125 - momentum: 0.000000
2023-10-11 02:22:49,313 epoch 3 - iter 534/893 - loss 0.08245604 - time (sec): 295.56 - samples/sec: 501.25 - lr: 0.000123 - momentum: 0.000000
2023-10-11 02:23:41,764 epoch 3 - iter 623/893 - loss 0.08060727 - time (sec): 348.01 - samples/sec: 502.27 - lr: 0.000122 - momentum: 0.000000
2023-10-11 02:24:33,176 epoch 3 - iter 712/893 - loss 0.08031657 - time (sec): 399.42 - samples/sec: 499.87 - lr: 0.000120 - momentum: 0.000000
2023-10-11 02:25:23,560 epoch 3 - iter 801/893 - loss 0.07999967 - time (sec): 449.80 - samples/sec: 497.60 - lr: 0.000118 - momentum: 0.000000
2023-10-11 02:26:13,606 epoch 3 - iter 890/893 - loss 0.07986772 - time (sec): 499.85 - samples/sec: 496.50 - lr: 0.000117 - momentum: 0.000000
2023-10-11 02:26:15,048 ----------------------------------------------------------------------------------------------------
2023-10-11 02:26:15,048 EPOCH 3 done: loss 0.0799 - lr: 0.000117
2023-10-11 02:26:37,235 DEV : loss 0.1001172587275505 - f1-score (micro avg) 0.7796
2023-10-11 02:26:37,265 saving best model
2023-10-11 02:26:39,818 ----------------------------------------------------------------------------------------------------
2023-10-11 02:27:30,471 epoch 4 - iter 89/893 - loss 0.05380223 - time (sec): 50.65 - samples/sec: 488.72 - lr: 0.000115 - momentum: 0.000000
2023-10-11 02:28:20,599 epoch 4 - iter 178/893 - loss 0.04978763 - time (sec): 100.78 - samples/sec: 474.69 - lr: 0.000113 - momentum: 0.000000
2023-10-11 02:29:12,869 epoch 4 - iter 267/893 - loss 0.05029604 - time (sec): 153.05 - samples/sec: 482.96 - lr: 0.000112 - momentum: 0.000000
2023-10-11 02:30:04,806 epoch 4 - iter 356/893 - loss 0.05128141 - time (sec): 204.98 - samples/sec: 490.90 - lr: 0.000110 - momentum: 0.000000
2023-10-11 02:30:53,688 epoch 4 - iter 445/893 - loss 0.05039331 - time (sec): 253.87 - samples/sec: 487.56 - lr: 0.000108 - momentum: 0.000000
2023-10-11 02:31:42,720 epoch 4 - iter 534/893 - loss 0.05100630 - time (sec): 302.90 - samples/sec: 490.66 - lr: 0.000107 - momentum: 0.000000
2023-10-11 02:32:32,575 epoch 4 - iter 623/893 - loss 0.05103879 - time (sec): 352.75 - samples/sec: 494.73 - lr: 0.000105 - momentum: 0.000000
2023-10-11 02:33:22,895 epoch 4 - iter 712/893 - loss 0.05096798 - time (sec): 403.07 - samples/sec: 493.14 - lr: 0.000103 - momentum: 0.000000
2023-10-11 02:34:12,828 epoch 4 - iter 801/893 - loss 0.05049788 - time (sec): 453.01 - samples/sec: 492.51 - lr: 0.000102 - momentum: 0.000000
2023-10-11 02:35:02,250 epoch 4 - iter 890/893 - loss 0.05080113 - time (sec): 502.43 - samples/sec: 494.14 - lr: 0.000100 - momentum: 0.000000
2023-10-11 02:35:03,636 ----------------------------------------------------------------------------------------------------
2023-10-11 02:35:03,636 EPOCH 4 done: loss 0.0507 - lr: 0.000100
2023-10-11 02:35:24,945 DEV : loss 0.117433100938797 - f1-score (micro avg) 0.7895
2023-10-11 02:35:24,976 saving best model
2023-10-11 02:35:27,592 ----------------------------------------------------------------------------------------------------
2023-10-11 02:36:18,704 epoch 5 - iter 89/893 - loss 0.04326464 - time (sec): 51.11 - samples/sec: 486.91 - lr: 0.000098 - momentum: 0.000000
2023-10-11 02:37:07,649 epoch 5 - iter 178/893 - loss 0.04043555 - time (sec): 100.05 - samples/sec: 475.08 - lr: 0.000097 - momentum: 0.000000
2023-10-11 02:37:58,633 epoch 5 - iter 267/893 - loss 0.04063361 - time (sec): 151.04 - samples/sec: 479.21 - lr: 0.000095 - momentum: 0.000000
2023-10-11 02:38:47,397 epoch 5 - iter 356/893 - loss 0.04065561 - time (sec): 199.80 - samples/sec: 487.14 - lr: 0.000093 - momentum: 0.000000
2023-10-11 02:39:35,821 epoch 5 - iter 445/893 - loss 0.04107866 - time (sec): 248.22 - samples/sec: 488.55 - lr: 0.000092 - momentum: 0.000000
2023-10-11 02:40:25,648 epoch 5 - iter 534/893 - loss 0.03902400 - time (sec): 298.05 - samples/sec: 490.74 - lr: 0.000090 - momentum: 0.000000
2023-10-11 02:41:16,041 epoch 5 - iter 623/893 - loss 0.03913140 - time (sec): 348.45 - samples/sec: 495.76 - lr: 0.000088 - momentum: 0.000000
2023-10-11 02:42:05,861 epoch 5 - iter 712/893 - loss 0.03940436 - time (sec): 398.26 - samples/sec: 496.17 - lr: 0.000087 - momentum: 0.000000
2023-10-11 02:42:54,853 epoch 5 - iter 801/893 - loss 0.03813737 - time (sec): 447.26 - samples/sec: 496.93 - lr: 0.000085 - momentum: 0.000000
2023-10-11 02:43:45,529 epoch 5 - iter 890/893 - loss 0.03835348 - time (sec): 497.93 - samples/sec: 498.20 - lr: 0.000083 - momentum: 0.000000
2023-10-11 02:43:47,040 ----------------------------------------------------------------------------------------------------
2023-10-11 02:43:47,041 EPOCH 5 done: loss 0.0383 - lr: 0.000083
2023-10-11 02:44:09,474 DEV : loss 0.12915697693824768 - f1-score (micro avg) 0.7864
2023-10-11 02:44:09,504 ----------------------------------------------------------------------------------------------------
2023-10-11 02:45:03,744 epoch 6 - iter 89/893 - loss 0.02671786 - time (sec): 54.24 - samples/sec: 456.47 - lr: 0.000082 - momentum: 0.000000
2023-10-11 02:45:56,595 epoch 6 - iter 178/893 - loss 0.02772095 - time (sec): 107.09 - samples/sec: 465.30 - lr: 0.000080 - momentum: 0.000000
2023-10-11 02:46:48,856 epoch 6 - iter 267/893 - loss 0.02637075 - time (sec): 159.35 - samples/sec: 467.27 - lr: 0.000078 - momentum: 0.000000
2023-10-11 02:47:41,548 epoch 6 - iter 356/893 - loss 0.02780804 - time (sec): 212.04 - samples/sec: 467.98 - lr: 0.000077 - momentum: 0.000000
2023-10-11 02:48:35,284 epoch 6 - iter 445/893 - loss 0.02807407 - time (sec): 265.78 - samples/sec: 467.16 - lr: 0.000075 - momentum: 0.000000
2023-10-11 02:49:31,168 epoch 6 - iter 534/893 - loss 0.02808160 - time (sec): 321.66 - samples/sec: 466.96 - lr: 0.000073 - momentum: 0.000000
2023-10-11 02:50:23,488 epoch 6 - iter 623/893 - loss 0.02768363 - time (sec): 373.98 - samples/sec: 466.32 - lr: 0.000072 - momentum: 0.000000
2023-10-11 02:51:17,788 epoch 6 - iter 712/893 - loss 0.02797056 - time (sec): 428.28 - samples/sec: 465.85 - lr: 0.000070 - momentum: 0.000000
2023-10-11 02:52:12,329 epoch 6 - iter 801/893 - loss 0.02832246 - time (sec): 482.82 - samples/sec: 466.61 - lr: 0.000068 - momentum: 0.000000
2023-10-11 02:53:07,807 epoch 6 - iter 890/893 - loss 0.02898230 - time (sec): 538.30 - samples/sec: 461.25 - lr: 0.000067 - momentum: 0.000000
2023-10-11 02:53:09,228 ----------------------------------------------------------------------------------------------------
2023-10-11 02:53:09,229 EPOCH 6 done: loss 0.0289 - lr: 0.000067
2023-10-11 02:53:31,359 DEV : loss 0.13720029592514038 - f1-score (micro avg) 0.7888
2023-10-11 02:53:31,390 ----------------------------------------------------------------------------------------------------
2023-10-11 02:54:24,860 epoch 7 - iter 89/893 - loss 0.02389918 - time (sec): 53.47 - samples/sec: 511.70 - lr: 0.000065 - momentum: 0.000000
2023-10-11 02:55:17,199 epoch 7 - iter 178/893 - loss 0.02442693 - time (sec): 105.81 - samples/sec: 491.45 - lr: 0.000063 - momentum: 0.000000
2023-10-11 02:56:07,039 epoch 7 - iter 267/893 - loss 0.02393469 - time (sec): 155.65 - samples/sec: 486.40 - lr: 0.000062 - momentum: 0.000000
2023-10-11 02:56:57,236 epoch 7 - iter 356/893 - loss 0.02364784 - time (sec): 205.84 - samples/sec: 485.70 - lr: 0.000060 - momentum: 0.000000
2023-10-11 02:57:47,574 epoch 7 - iter 445/893 - loss 0.02319517 - time (sec): 256.18 - samples/sec: 485.39 - lr: 0.000058 - momentum: 0.000000
2023-10-11 02:58:39,926 epoch 7 - iter 534/893 - loss 0.02254515 - time (sec): 308.53 - samples/sec: 485.71 - lr: 0.000057 - momentum: 0.000000
2023-10-11 02:59:31,138 epoch 7 - iter 623/893 - loss 0.02241681 - time (sec): 359.75 - samples/sec: 482.79 - lr: 0.000055 - momentum: 0.000000
2023-10-11 03:00:26,161 epoch 7 - iter 712/893 - loss 0.02214750 - time (sec): 414.77 - samples/sec: 482.36 - lr: 0.000053 - momentum: 0.000000
2023-10-11 03:01:16,103 epoch 7 - iter 801/893 - loss 0.02204018 - time (sec): 464.71 - samples/sec: 483.40 - lr: 0.000052 - momentum: 0.000000
2023-10-11 03:02:04,996 epoch 7 - iter 890/893 - loss 0.02241433 - time (sec): 513.60 - samples/sec: 482.95 - lr: 0.000050 - momentum: 0.000000
2023-10-11 03:02:06,525 ----------------------------------------------------------------------------------------------------
2023-10-11 03:02:06,526 EPOCH 7 done: loss 0.0224 - lr: 0.000050
2023-10-11 03:02:27,764 DEV : loss 0.15427739918231964 - f1-score (micro avg) 0.7838
2023-10-11 03:02:27,794 ----------------------------------------------------------------------------------------------------
2023-10-11 03:03:17,261 epoch 8 - iter 89/893 - loss 0.01644759 - time (sec): 49.46 - samples/sec: 486.76 - lr: 0.000048 - momentum: 0.000000
2023-10-11 03:04:05,512 epoch 8 - iter 178/893 - loss 0.01439801 - time (sec): 97.72 - samples/sec: 497.24 - lr: 0.000047 - momentum: 0.000000
2023-10-11 03:04:58,009 epoch 8 - iter 267/893 - loss 0.01392832 - time (sec): 150.21 - samples/sec: 489.72 - lr: 0.000045 - momentum: 0.000000
2023-10-11 03:05:52,634 epoch 8 - iter 356/893 - loss 0.01599008 - time (sec): 204.84 - samples/sec: 484.90 - lr: 0.000043 - momentum: 0.000000
2023-10-11 03:06:48,250 epoch 8 - iter 445/893 - loss 0.01659639 - time (sec): 260.45 - samples/sec: 474.24 - lr: 0.000042 - momentum: 0.000000
2023-10-11 03:07:44,160 epoch 8 - iter 534/893 - loss 0.01829444 - time (sec): 316.36 - samples/sec: 468.84 - lr: 0.000040 - momentum: 0.000000
2023-10-11 03:08:36,417 epoch 8 - iter 623/893 - loss 0.01815970 - time (sec): 368.62 - samples/sec: 468.62 - lr: 0.000038 - momentum: 0.000000
2023-10-11 03:09:32,746 epoch 8 - iter 712/893 - loss 0.01780340 - time (sec): 424.95 - samples/sec: 466.93 - lr: 0.000037 - momentum: 0.000000
2023-10-11 03:10:25,528 epoch 8 - iter 801/893 - loss 0.01774196 - time (sec): 477.73 - samples/sec: 466.21 - lr: 0.000035 - momentum: 0.000000
2023-10-11 03:11:20,029 epoch 8 - iter 890/893 - loss 0.01749401 - time (sec): 532.23 - samples/sec: 465.57 - lr: 0.000033 - momentum: 0.000000
2023-10-11 03:11:21,784 ----------------------------------------------------------------------------------------------------
2023-10-11 03:11:21,784 EPOCH 8 done: loss 0.0176 - lr: 0.000033
2023-10-11 03:11:44,400 DEV : loss 0.17111782729625702 - f1-score (micro avg) 0.8003
2023-10-11 03:11:44,436 saving best model
2023-10-11 03:11:46,991 ----------------------------------------------------------------------------------------------------
2023-10-11 03:12:39,531 epoch 9 - iter 89/893 - loss 0.01256651 - time (sec): 52.54 - samples/sec: 491.41 - lr: 0.000032 - momentum: 0.000000
2023-10-11 03:13:33,242 epoch 9 - iter 178/893 - loss 0.01534048 - time (sec): 106.25 - samples/sec: 472.82 - lr: 0.000030 - momentum: 0.000000
2023-10-11 03:14:27,777 epoch 9 - iter 267/893 - loss 0.01378958 - time (sec): 160.78 - samples/sec: 465.55 - lr: 0.000028 - momentum: 0.000000
2023-10-11 03:15:20,646 epoch 9 - iter 356/893 - loss 0.01377709 - time (sec): 213.65 - samples/sec: 464.37 - lr: 0.000027 - momentum: 0.000000
2023-10-11 03:16:13,432 epoch 9 - iter 445/893 - loss 0.01258058 - time (sec): 266.44 - samples/sec: 465.09 - lr: 0.000025 - momentum: 0.000000
2023-10-11 03:17:04,781 epoch 9 - iter 534/893 - loss 0.01314215 - time (sec): 317.79 - samples/sec: 467.46 - lr: 0.000023 - momentum: 0.000000
2023-10-11 03:17:54,474 epoch 9 - iter 623/893 - loss 0.01350092 - time (sec): 367.48 - samples/sec: 466.69 - lr: 0.000022 - momentum: 0.000000
2023-10-11 03:18:48,473 epoch 9 - iter 712/893 - loss 0.01447242 - time (sec): 421.48 - samples/sec: 467.37 - lr: 0.000020 - momentum: 0.000000
2023-10-11 03:19:43,268 epoch 9 - iter 801/893 - loss 0.01452537 - time (sec): 476.27 - samples/sec: 468.86 - lr: 0.000019 - momentum: 0.000000
2023-10-11 03:20:34,952 epoch 9 - iter 890/893 - loss 0.01466660 - time (sec): 527.96 - samples/sec: 469.52 - lr: 0.000017 - momentum: 0.000000
2023-10-11 03:20:36,621 ----------------------------------------------------------------------------------------------------
2023-10-11 03:20:36,621 EPOCH 9 done: loss 0.0147 - lr: 0.000017
2023-10-11 03:20:59,124 DEV : loss 0.17805925011634827 - f1-score (micro avg) 0.7981
2023-10-11 03:20:59,158 ----------------------------------------------------------------------------------------------------
2023-10-11 03:21:47,938 epoch 10 - iter 89/893 - loss 0.01343074 - time (sec): 48.78 - samples/sec: 481.71 - lr: 0.000015 - momentum: 0.000000
2023-10-11 03:22:37,293 epoch 10 - iter 178/893 - loss 0.01307677 - time (sec): 98.13 - samples/sec: 492.76 - lr: 0.000013 - momentum: 0.000000
2023-10-11 03:23:27,892 epoch 10 - iter 267/893 - loss 0.01186327 - time (sec): 148.73 - samples/sec: 497.09 - lr: 0.000012 - momentum: 0.000000
2023-10-11 03:24:17,105 epoch 10 - iter 356/893 - loss 0.01107913 - time (sec): 197.95 - samples/sec: 494.17 - lr: 0.000010 - momentum: 0.000000
2023-10-11 03:25:07,498 epoch 10 - iter 445/893 - loss 0.01167501 - time (sec): 248.34 - samples/sec: 499.17 - lr: 0.000008 - momentum: 0.000000
2023-10-11 03:25:59,498 epoch 10 - iter 534/893 - loss 0.01203564 - time (sec): 300.34 - samples/sec: 501.72 - lr: 0.000007 - momentum: 0.000000
2023-10-11 03:26:49,245 epoch 10 - iter 623/893 - loss 0.01234827 - time (sec): 350.09 - samples/sec: 496.60 - lr: 0.000005 - momentum: 0.000000
2023-10-11 03:27:41,346 epoch 10 - iter 712/893 - loss 0.01235296 - time (sec): 402.19 - samples/sec: 495.65 - lr: 0.000004 - momentum: 0.000000
2023-10-11 03:28:36,110 epoch 10 - iter 801/893 - loss 0.01233653 - time (sec): 456.95 - samples/sec: 489.75 - lr: 0.000002 - momentum: 0.000000
2023-10-11 03:29:27,337 epoch 10 - iter 890/893 - loss 0.01217264 - time (sec): 508.18 - samples/sec: 488.21 - lr: 0.000000 - momentum: 0.000000
2023-10-11 03:29:28,848 ----------------------------------------------------------------------------------------------------
2023-10-11 03:29:28,848 EPOCH 10 done: loss 0.0121 - lr: 0.000000
2023-10-11 03:29:51,390 DEV : loss 0.1803114116191864 - f1-score (micro avg) 0.7928
2023-10-11 03:29:52,270 ----------------------------------------------------------------------------------------------------
2023-10-11 03:29:52,272 Loading model from best epoch ...
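The "saving best model" lines appear only after epochs 1, 2, 3, 4 and 8. The header says the final evaluation uses the best epoch by dev micro-F1 (`best-model.pt`). The Python sketch below models that selection rule. It assumes a checkpoint is written only when the dev score strictly improves. `checkpoint_epochs` is a hypothetical helper, not part of Flair's API.

```python
# Dev micro-F1 per epoch, copied from the "DEV :" lines of this log.
DEV_F1 = [0.236, 0.7358, 0.7796, 0.7895, 0.7864,
          0.7888, 0.7838, 0.8003, 0.7981, 0.7928]


def checkpoint_epochs(dev_scores):
    """Return the 1-based epochs whose dev score strictly beats every
    earlier epoch (assumed rule: save only on strict improvement)."""
    best = float("-inf")
    saved = []
    for epoch, score in enumerate(dev_scores, start=1):
        if score > best:
            best = score
            saved.append(epoch)
    return saved


saved = checkpoint_epochs(DEV_F1)
print(saved)      # [1, 2, 3, 4, 8]
print(saved[-1])  # 8 -> the epoch reloaded as best-model.pt
```

Under this rule, epoch 8 (dev micro-F1 0.8003) is the checkpoint reloaded for the test evaluation. Epochs 9 and 10 reach lower training loss but are not selected.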
2023-10-11 03:29:56,254 SequenceTagger predicts: Dictionary with 17 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-11 03:31:03,930 Results:
- F-score (micro) 0.7132
- F-score (macro) 0.6199
- Accuracy 0.5691

By class:
              precision    recall  f1-score   support

         LOC     0.7458    0.7342    0.7400      1095
         PER     0.7884    0.7806    0.7845      1012
         ORG     0.4342    0.5910    0.5006       357
   HumanProd     0.3636    0.6061    0.4545        33

   micro avg     0.6963    0.7309    0.7132      2497
   macro avg     0.5830    0.6780    0.6199      2497
weighted avg     0.7135    0.7309    0.7200      2497

2023-10-11 03:31:03,930 ----------------------------------------------------------------------------------------------------
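The F1 summary rows of the test table can be recomputed from the per-class rows. The Python sketch below assumes the table follows scikit-learn's `classification_report` conventions:

- F1 is the harmonic mean of precision and recall.
- `macro avg` is the unweighted mean over the four classes.
- `weighted avg` weights each class by its support.

The inputs are the rounded four-decimal values, so agreement is only expected to the printed precision.

```python
# (precision, recall, support) per entity type, from the "By class" table.
BY_CLASS = {
    "LOC": (0.7458, 0.7342, 1095),
    "PER": (0.7884, 0.7806, 1012),
    "ORG": (0.4342, 0.5910, 357),
    "HumanProd": (0.3636, 0.6061, 33),
}
MICRO_P, MICRO_R = 0.6963, 0.7309  # from the "micro avg" row


def f1(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


per_class = {name: f1(p, r) for name, (p, r, _) in BY_CLASS.items()}
total_support = sum(s for _, _, s in BY_CLASS.values())

# Macro: every class counts equally. Weighted: classes count by support.
macro_f1 = sum(per_class.values()) / len(per_class)
weighted_f1 = sum(per_class[n] * BY_CLASS[n][2] for n in BY_CLASS) / total_support
# Micro: harmonic mean of the pooled precision and recall.
micro_f1 = f1(MICRO_P, MICRO_R)

for name, score in per_class.items():
    print(f"{name:>12} {score:.4f}")
print(f"micro {micro_f1:.4f}  macro {macro_f1:.4f}  weighted {weighted_f1:.4f}")
# last line printed: micro 0.7132  macro 0.6199  weighted 0.7200
```

The same arithmetic explains why the macro score (0.6199) trails the micro score (0.7132). ORG and HumanProd have F1 near 0.5 and count equally in the macro average. In the micro average they carry only 390 of the 2497 gold spans.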