2023-10-11 06:39:23,992 ----------------------------------------------------------------------------------------------------
2023-10-11 06:39:23,994 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-11 06:39:23,994 ----------------------------------------------------------------------------------------------------
2023-10-11 06:39:23,995 MultiCorpus: 7142 train + 698 dev + 2570 test sentences
 - NER_HIPE_2022 Corpus: 7142 train + 698 dev + 2570 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fr/with_doc_seperator
2023-10-11 06:39:23,995 ----------------------------------------------------------------------------------------------------
2023-10-11 06:39:23,995 Train: 7142 sentences
2023-10-11 06:39:23,995 (train_with_dev=False, train_with_test=False)
2023-10-11 06:39:23,995 ----------------------------------------------------------------------------------------------------
2023-10-11 06:39:23,995 Training Params:
2023-10-11 06:39:23,995 - learning_rate: "0.00016"
2023-10-11 06:39:23,995 - mini_batch_size: "4"
2023-10-11 06:39:23,995 - max_epochs: "10"
2023-10-11 06:39:23,995 - shuffle: "True"
2023-10-11 06:39:23,995 ----------------------------------------------------------------------------------------------------
2023-10-11 06:39:23,995 Plugins:
2023-10-11 06:39:23,995 - TensorboardLogger
2023-10-11 06:39:23,995 - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 06:39:23,995 ----------------------------------------------------------------------------------------------------
2023-10-11 06:39:23,996 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 06:39:23,996 - metric: "('micro avg', 'f1-score')"
2023-10-11 06:39:23,996
---------------------------------------------------------------------------------------------------- 2023-10-11 06:39:23,996 Computation: 2023-10-11 06:39:23,996 - compute on device: cuda:0 2023-10-11 06:39:23,996 - embedding storage: none 2023-10-11 06:39:23,996 ---------------------------------------------------------------------------------------------------- 2023-10-11 06:39:23,996 Model training base path: "hmbench-newseye/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-2" 2023-10-11 06:39:23,996 ---------------------------------------------------------------------------------------------------- 2023-10-11 06:39:23,996 ---------------------------------------------------------------------------------------------------- 2023-10-11 06:39:23,996 Logging anything other than scalars to TensorBoard is currently not supported. 2023-10-11 06:40:19,912 epoch 1 - iter 178/1786 - loss 2.82900261 - time (sec): 55.91 - samples/sec: 474.93 - lr: 0.000016 - momentum: 0.000000 2023-10-11 06:41:14,029 epoch 1 - iter 356/1786 - loss 2.67404787 - time (sec): 110.03 - samples/sec: 470.57 - lr: 0.000032 - momentum: 0.000000 2023-10-11 06:42:09,632 epoch 1 - iter 534/1786 - loss 2.38436083 - time (sec): 165.63 - samples/sec: 468.96 - lr: 0.000048 - momentum: 0.000000 2023-10-11 06:43:05,404 epoch 1 - iter 712/1786 - loss 2.06699231 - time (sec): 221.41 - samples/sec: 464.97 - lr: 0.000064 - momentum: 0.000000 2023-10-11 06:44:01,010 epoch 1 - iter 890/1786 - loss 1.80287645 - time (sec): 277.01 - samples/sec: 459.40 - lr: 0.000080 - momentum: 0.000000 2023-10-11 06:44:56,410 epoch 1 - iter 1068/1786 - loss 1.59842649 - time (sec): 332.41 - samples/sec: 460.39 - lr: 0.000096 - momentum: 0.000000 2023-10-11 06:45:50,950 epoch 1 - iter 1246/1786 - loss 1.43900727 - time (sec): 386.95 - samples/sec: 459.78 - lr: 0.000112 - momentum: 0.000000 2023-10-11 06:46:43,346 epoch 1 - iter 1424/1786 - loss 1.31683155 - 
time (sec): 439.35 - samples/sec: 457.72 - lr: 0.000127 - momentum: 0.000000 2023-10-11 06:47:34,731 epoch 1 - iter 1602/1786 - loss 1.21390367 - time (sec): 490.73 - samples/sec: 457.50 - lr: 0.000143 - momentum: 0.000000 2023-10-11 06:48:27,379 epoch 1 - iter 1780/1786 - loss 1.12581674 - time (sec): 543.38 - samples/sec: 456.87 - lr: 0.000159 - momentum: 0.000000 2023-10-11 06:48:28,795 ---------------------------------------------------------------------------------------------------- 2023-10-11 06:48:28,795 EPOCH 1 done: loss 1.1242 - lr: 0.000159 2023-10-11 06:48:48,043 DEV : loss 0.21339952945709229 - f1-score (micro avg) 0.4704 2023-10-11 06:48:48,073 saving best model 2023-10-11 06:48:48,971 ---------------------------------------------------------------------------------------------------- 2023-10-11 06:49:40,643 epoch 2 - iter 178/1786 - loss 0.20730434 - time (sec): 51.67 - samples/sec: 486.40 - lr: 0.000158 - momentum: 0.000000 2023-10-11 06:50:33,708 epoch 2 - iter 356/1786 - loss 0.19312469 - time (sec): 104.73 - samples/sec: 491.93 - lr: 0.000156 - momentum: 0.000000 2023-10-11 06:51:25,317 epoch 2 - iter 534/1786 - loss 0.18172318 - time (sec): 156.34 - samples/sec: 488.33 - lr: 0.000155 - momentum: 0.000000 2023-10-11 06:52:16,339 epoch 2 - iter 712/1786 - loss 0.17387846 - time (sec): 207.37 - samples/sec: 484.71 - lr: 0.000153 - momentum: 0.000000 2023-10-11 06:53:07,904 epoch 2 - iter 890/1786 - loss 0.16698512 - time (sec): 258.93 - samples/sec: 481.09 - lr: 0.000151 - momentum: 0.000000 2023-10-11 06:53:58,781 epoch 2 - iter 1068/1786 - loss 0.16051997 - time (sec): 309.81 - samples/sec: 479.18 - lr: 0.000149 - momentum: 0.000000 2023-10-11 06:54:49,112 epoch 2 - iter 1246/1786 - loss 0.15405773 - time (sec): 360.14 - samples/sec: 477.91 - lr: 0.000148 - momentum: 0.000000 2023-10-11 06:55:41,208 epoch 2 - iter 1424/1786 - loss 0.14923648 - time (sec): 412.23 - samples/sec: 478.67 - lr: 0.000146 - momentum: 0.000000 2023-10-11 06:56:35,914 
epoch 2 - iter 1602/1786 - loss 0.14482467 - time (sec): 466.94 - samples/sec: 478.44 - lr: 0.000144 - momentum: 0.000000 2023-10-11 06:57:29,822 epoch 2 - iter 1780/1786 - loss 0.13951975 - time (sec): 520.85 - samples/sec: 476.15 - lr: 0.000142 - momentum: 0.000000 2023-10-11 06:57:31,487 ---------------------------------------------------------------------------------------------------- 2023-10-11 06:57:31,487 EPOCH 2 done: loss 0.1394 - lr: 0.000142 2023-10-11 06:57:53,454 DEV : loss 0.12035670131444931 - f1-score (micro avg) 0.7478 2023-10-11 06:57:53,485 saving best model 2023-10-11 06:57:56,090 ---------------------------------------------------------------------------------------------------- 2023-10-11 06:58:49,895 epoch 3 - iter 178/1786 - loss 0.07648250 - time (sec): 53.80 - samples/sec: 480.38 - lr: 0.000140 - momentum: 0.000000 2023-10-11 06:59:42,427 epoch 3 - iter 356/1786 - loss 0.07475891 - time (sec): 106.33 - samples/sec: 457.73 - lr: 0.000139 - momentum: 0.000000 2023-10-11 07:00:36,639 epoch 3 - iter 534/1786 - loss 0.07760944 - time (sec): 160.54 - samples/sec: 461.01 - lr: 0.000137 - momentum: 0.000000 2023-10-11 07:01:30,450 epoch 3 - iter 712/1786 - loss 0.07885752 - time (sec): 214.36 - samples/sec: 458.24 - lr: 0.000135 - momentum: 0.000000 2023-10-11 07:02:24,329 epoch 3 - iter 890/1786 - loss 0.07553472 - time (sec): 268.23 - samples/sec: 458.26 - lr: 0.000133 - momentum: 0.000000 2023-10-11 07:03:19,934 epoch 3 - iter 1068/1786 - loss 0.07803250 - time (sec): 323.84 - samples/sec: 457.48 - lr: 0.000132 - momentum: 0.000000 2023-10-11 07:04:18,636 epoch 3 - iter 1246/1786 - loss 0.07541694 - time (sec): 382.54 - samples/sec: 456.93 - lr: 0.000130 - momentum: 0.000000 2023-10-11 07:05:16,926 epoch 3 - iter 1424/1786 - loss 0.07566700 - time (sec): 440.83 - samples/sec: 452.91 - lr: 0.000128 - momentum: 0.000000 2023-10-11 07:06:14,488 epoch 3 - iter 1602/1786 - loss 0.07521801 - time (sec): 498.39 - samples/sec: 449.09 - lr: 0.000126 - 
momentum: 0.000000 2023-10-11 07:07:11,671 epoch 3 - iter 1780/1786 - loss 0.07599665 - time (sec): 555.58 - samples/sec: 446.70 - lr: 0.000125 - momentum: 0.000000 2023-10-11 07:07:13,371 ---------------------------------------------------------------------------------------------------- 2023-10-11 07:07:13,371 EPOCH 3 done: loss 0.0763 - lr: 0.000125 2023-10-11 07:07:36,874 DEV : loss 0.12293554842472076 - f1-score (micro avg) 0.7694 2023-10-11 07:07:36,908 saving best model 2023-10-11 07:07:39,619 ---------------------------------------------------------------------------------------------------- 2023-10-11 07:08:32,498 epoch 4 - iter 178/1786 - loss 0.06175889 - time (sec): 52.87 - samples/sec: 468.15 - lr: 0.000123 - momentum: 0.000000 2023-10-11 07:09:25,152 epoch 4 - iter 356/1786 - loss 0.05291025 - time (sec): 105.53 - samples/sec: 453.32 - lr: 0.000121 - momentum: 0.000000 2023-10-11 07:10:25,644 epoch 4 - iter 534/1786 - loss 0.05155940 - time (sec): 166.02 - samples/sec: 445.22 - lr: 0.000119 - momentum: 0.000000 2023-10-11 07:11:24,877 epoch 4 - iter 712/1786 - loss 0.05289251 - time (sec): 225.25 - samples/sec: 446.72 - lr: 0.000117 - momentum: 0.000000 2023-10-11 07:12:16,721 epoch 4 - iter 890/1786 - loss 0.05247741 - time (sec): 277.10 - samples/sec: 446.69 - lr: 0.000116 - momentum: 0.000000 2023-10-11 07:13:08,568 epoch 4 - iter 1068/1786 - loss 0.05140588 - time (sec): 328.94 - samples/sec: 451.81 - lr: 0.000114 - momentum: 0.000000 2023-10-11 07:14:02,792 epoch 4 - iter 1246/1786 - loss 0.05313146 - time (sec): 383.17 - samples/sec: 455.46 - lr: 0.000112 - momentum: 0.000000 2023-10-11 07:14:58,620 epoch 4 - iter 1424/1786 - loss 0.05344596 - time (sec): 439.00 - samples/sec: 452.79 - lr: 0.000110 - momentum: 0.000000 2023-10-11 07:15:55,417 epoch 4 - iter 1602/1786 - loss 0.05307124 - time (sec): 495.79 - samples/sec: 450.00 - lr: 0.000109 - momentum: 0.000000 2023-10-11 07:16:48,635 epoch 4 - iter 1780/1786 - loss 0.05340878 - time (sec): 
549.01 - samples/sec: 452.21 - lr: 0.000107 - momentum: 0.000000 2023-10-11 07:16:50,081 ---------------------------------------------------------------------------------------------------- 2023-10-11 07:16:50,081 EPOCH 4 done: loss 0.0533 - lr: 0.000107 2023-10-11 07:17:11,531 DEV : loss 0.15573516488075256 - f1-score (micro avg) 0.7904 2023-10-11 07:17:11,562 saving best model 2023-10-11 07:17:14,160 ---------------------------------------------------------------------------------------------------- 2023-10-11 07:18:09,045 epoch 5 - iter 178/1786 - loss 0.04583648 - time (sec): 54.87 - samples/sec: 453.51 - lr: 0.000105 - momentum: 0.000000 2023-10-11 07:19:01,699 epoch 5 - iter 356/1786 - loss 0.04299453 - time (sec): 107.53 - samples/sec: 442.06 - lr: 0.000103 - momentum: 0.000000 2023-10-11 07:19:53,208 epoch 5 - iter 534/1786 - loss 0.04267254 - time (sec): 159.04 - samples/sec: 455.11 - lr: 0.000101 - momentum: 0.000000 2023-10-11 07:20:50,338 epoch 5 - iter 712/1786 - loss 0.04135181 - time (sec): 216.17 - samples/sec: 450.26 - lr: 0.000100 - momentum: 0.000000 2023-10-11 07:21:51,130 epoch 5 - iter 890/1786 - loss 0.04091209 - time (sec): 276.96 - samples/sec: 437.86 - lr: 0.000098 - momentum: 0.000000 2023-10-11 07:22:51,933 epoch 5 - iter 1068/1786 - loss 0.03936224 - time (sec): 337.76 - samples/sec: 433.05 - lr: 0.000096 - momentum: 0.000000 2023-10-11 07:23:51,852 epoch 5 - iter 1246/1786 - loss 0.03988643 - time (sec): 397.68 - samples/sec: 434.38 - lr: 0.000094 - momentum: 0.000000 2023-10-11 07:24:47,391 epoch 5 - iter 1424/1786 - loss 0.04034676 - time (sec): 453.22 - samples/sec: 436.01 - lr: 0.000093 - momentum: 0.000000 2023-10-11 07:25:43,025 epoch 5 - iter 1602/1786 - loss 0.03940605 - time (sec): 508.85 - samples/sec: 436.78 - lr: 0.000091 - momentum: 0.000000 2023-10-11 07:26:37,195 epoch 5 - iter 1780/1786 - loss 0.03943045 - time (sec): 563.02 - samples/sec: 440.61 - lr: 0.000089 - momentum: 0.000000 2023-10-11 07:26:38,788 
---------------------------------------------------------------------------------------------------- 2023-10-11 07:26:38,788 EPOCH 5 done: loss 0.0393 - lr: 0.000089 2023-10-11 07:26:59,794 DEV : loss 0.16520686447620392 - f1-score (micro avg) 0.7995 2023-10-11 07:26:59,824 saving best model 2023-10-11 07:27:02,424 ---------------------------------------------------------------------------------------------------- 2023-10-11 07:27:57,899 epoch 6 - iter 178/1786 - loss 0.03094883 - time (sec): 55.47 - samples/sec: 446.32 - lr: 0.000087 - momentum: 0.000000 2023-10-11 07:28:53,808 epoch 6 - iter 356/1786 - loss 0.03037107 - time (sec): 111.38 - samples/sec: 447.38 - lr: 0.000085 - momentum: 0.000000 2023-10-11 07:29:48,736 epoch 6 - iter 534/1786 - loss 0.02899602 - time (sec): 166.31 - samples/sec: 447.72 - lr: 0.000084 - momentum: 0.000000 2023-10-11 07:30:44,570 epoch 6 - iter 712/1786 - loss 0.02900632 - time (sec): 222.14 - samples/sec: 446.71 - lr: 0.000082 - momentum: 0.000000 2023-10-11 07:31:40,780 epoch 6 - iter 890/1786 - loss 0.02906074 - time (sec): 278.35 - samples/sec: 446.06 - lr: 0.000080 - momentum: 0.000000 2023-10-11 07:32:38,978 epoch 6 - iter 1068/1786 - loss 0.02883507 - time (sec): 336.55 - samples/sec: 446.30 - lr: 0.000078 - momentum: 0.000000 2023-10-11 07:33:33,658 epoch 6 - iter 1246/1786 - loss 0.02715296 - time (sec): 391.23 - samples/sec: 445.77 - lr: 0.000077 - momentum: 0.000000 2023-10-11 07:34:29,044 epoch 6 - iter 1424/1786 - loss 0.02742899 - time (sec): 446.62 - samples/sec: 446.73 - lr: 0.000075 - momentum: 0.000000 2023-10-11 07:35:22,481 epoch 6 - iter 1602/1786 - loss 0.02809173 - time (sec): 500.05 - samples/sec: 450.53 - lr: 0.000073 - momentum: 0.000000 2023-10-11 07:36:15,937 epoch 6 - iter 1780/1786 - loss 0.02903797 - time (sec): 553.51 - samples/sec: 448.58 - lr: 0.000071 - momentum: 0.000000 2023-10-11 07:36:17,437 ---------------------------------------------------------------------------------------------------- 
2023-10-11 07:36:17,438 EPOCH 6 done: loss 0.0290 - lr: 0.000071 2023-10-11 07:36:42,730 DEV : loss 0.16866964101791382 - f1-score (micro avg) 0.7947 2023-10-11 07:36:42,764 ---------------------------------------------------------------------------------------------------- 2023-10-11 07:37:41,091 epoch 7 - iter 178/1786 - loss 0.02176058 - time (sec): 58.33 - samples/sec: 469.08 - lr: 0.000069 - momentum: 0.000000 2023-10-11 07:38:39,578 epoch 7 - iter 356/1786 - loss 0.02292926 - time (sec): 116.81 - samples/sec: 445.15 - lr: 0.000068 - momentum: 0.000000 2023-10-11 07:39:34,898 epoch 7 - iter 534/1786 - loss 0.02247289 - time (sec): 172.13 - samples/sec: 439.81 - lr: 0.000066 - momentum: 0.000000 2023-10-11 07:40:31,803 epoch 7 - iter 712/1786 - loss 0.02241562 - time (sec): 229.04 - samples/sec: 436.51 - lr: 0.000064 - momentum: 0.000000 2023-10-11 07:41:27,757 epoch 7 - iter 890/1786 - loss 0.02220387 - time (sec): 284.99 - samples/sec: 436.33 - lr: 0.000062 - momentum: 0.000000 2023-10-11 07:42:23,342 epoch 7 - iter 1068/1786 - loss 0.02137649 - time (sec): 340.58 - samples/sec: 440.01 - lr: 0.000061 - momentum: 0.000000 2023-10-11 07:43:18,915 epoch 7 - iter 1246/1786 - loss 0.02099827 - time (sec): 396.15 - samples/sec: 438.43 - lr: 0.000059 - momentum: 0.000000 2023-10-11 07:44:18,289 epoch 7 - iter 1424/1786 - loss 0.02038066 - time (sec): 455.52 - samples/sec: 439.21 - lr: 0.000057 - momentum: 0.000000 2023-10-11 07:45:16,495 epoch 7 - iter 1602/1786 - loss 0.01985291 - time (sec): 513.73 - samples/sec: 437.28 - lr: 0.000055 - momentum: 0.000000 2023-10-11 07:46:09,692 epoch 7 - iter 1780/1786 - loss 0.02003687 - time (sec): 566.93 - samples/sec: 437.53 - lr: 0.000053 - momentum: 0.000000 2023-10-11 07:46:11,314 ---------------------------------------------------------------------------------------------------- 2023-10-11 07:46:11,314 EPOCH 7 done: loss 0.0201 - lr: 0.000053 2023-10-11 07:46:32,834 DEV : loss 0.19440826773643494 - f1-score (micro avg) 
0.7954 2023-10-11 07:46:32,868 ---------------------------------------------------------------------------------------------------- 2023-10-11 07:47:26,349 epoch 8 - iter 178/1786 - loss 0.02118880 - time (sec): 53.48 - samples/sec: 450.21 - lr: 0.000052 - momentum: 0.000000 2023-10-11 07:48:21,828 epoch 8 - iter 356/1786 - loss 0.01874784 - time (sec): 108.96 - samples/sec: 445.93 - lr: 0.000050 - momentum: 0.000000 2023-10-11 07:49:16,307 epoch 8 - iter 534/1786 - loss 0.01722056 - time (sec): 163.44 - samples/sec: 450.09 - lr: 0.000048 - momentum: 0.000000 2023-10-11 07:50:13,820 epoch 8 - iter 712/1786 - loss 0.01921550 - time (sec): 220.95 - samples/sec: 449.54 - lr: 0.000046 - momentum: 0.000000 2023-10-11 07:51:10,663 epoch 8 - iter 890/1786 - loss 0.01831347 - time (sec): 277.79 - samples/sec: 444.64 - lr: 0.000044 - momentum: 0.000000 2023-10-11 07:52:09,449 epoch 8 - iter 1068/1786 - loss 0.01900935 - time (sec): 336.58 - samples/sec: 440.68 - lr: 0.000043 - momentum: 0.000000 2023-10-11 07:53:04,351 epoch 8 - iter 1246/1786 - loss 0.01828550 - time (sec): 391.48 - samples/sec: 441.25 - lr: 0.000041 - momentum: 0.000000 2023-10-11 07:54:00,162 epoch 8 - iter 1424/1786 - loss 0.01722443 - time (sec): 447.29 - samples/sec: 443.60 - lr: 0.000039 - momentum: 0.000000 2023-10-11 07:54:56,327 epoch 8 - iter 1602/1786 - loss 0.01696995 - time (sec): 503.46 - samples/sec: 442.39 - lr: 0.000037 - momentum: 0.000000 2023-10-11 07:55:53,224 epoch 8 - iter 1780/1786 - loss 0.01668282 - time (sec): 560.35 - samples/sec: 442.21 - lr: 0.000036 - momentum: 0.000000 2023-10-11 07:55:55,029 ---------------------------------------------------------------------------------------------------- 2023-10-11 07:55:55,030 EPOCH 8 done: loss 0.0167 - lr: 0.000036 2023-10-11 07:56:17,077 DEV : loss 0.1955002099275589 - f1-score (micro avg) 0.803 2023-10-11 07:56:17,108 saving best model 2023-10-11 07:56:19,773 
---------------------------------------------------------------------------------------------------- 2023-10-11 07:57:19,182 epoch 9 - iter 178/1786 - loss 0.00753763 - time (sec): 59.40 - samples/sec: 434.60 - lr: 0.000034 - momentum: 0.000000 2023-10-11 07:58:14,298 epoch 9 - iter 356/1786 - loss 0.01214607 - time (sec): 114.52 - samples/sec: 438.67 - lr: 0.000032 - momentum: 0.000000 2023-10-11 07:59:07,909 epoch 9 - iter 534/1786 - loss 0.00983533 - time (sec): 168.13 - samples/sec: 445.20 - lr: 0.000030 - momentum: 0.000000 2023-10-11 08:00:01,122 epoch 9 - iter 712/1786 - loss 0.01002213 - time (sec): 221.34 - samples/sec: 448.23 - lr: 0.000028 - momentum: 0.000000 2023-10-11 08:00:56,745 epoch 9 - iter 890/1786 - loss 0.00888437 - time (sec): 276.97 - samples/sec: 447.41 - lr: 0.000027 - momentum: 0.000000 2023-10-11 08:01:52,410 epoch 9 - iter 1068/1786 - loss 0.00895876 - time (sec): 332.63 - samples/sec: 446.60 - lr: 0.000025 - momentum: 0.000000 2023-10-11 08:02:51,663 epoch 9 - iter 1246/1786 - loss 0.00900417 - time (sec): 391.89 - samples/sec: 437.63 - lr: 0.000023 - momentum: 0.000000 2023-10-11 08:03:51,512 epoch 9 - iter 1424/1786 - loss 0.01017587 - time (sec): 451.73 - samples/sec: 436.07 - lr: 0.000021 - momentum: 0.000000 2023-10-11 08:04:46,799 epoch 9 - iter 1602/1786 - loss 0.01074771 - time (sec): 507.02 - samples/sec: 440.42 - lr: 0.000020 - momentum: 0.000000 2023-10-11 08:05:40,266 epoch 9 - iter 1780/1786 - loss 0.01046865 - time (sec): 560.49 - samples/sec: 442.27 - lr: 0.000018 - momentum: 0.000000 2023-10-11 08:05:42,023 ---------------------------------------------------------------------------------------------------- 2023-10-11 08:05:42,023 EPOCH 9 done: loss 0.0104 - lr: 0.000018 2023-10-11 08:06:04,094 DEV : loss 0.21326804161071777 - f1-score (micro avg) 0.7971 2023-10-11 08:06:04,125 ---------------------------------------------------------------------------------------------------- 2023-10-11 08:06:58,019 epoch 10 - iter 
178/1786 - loss 0.00672035 - time (sec): 53.89 - samples/sec: 436.00 - lr: 0.000016 - momentum: 0.000000 2023-10-11 08:07:53,539 epoch 10 - iter 356/1786 - loss 0.00687932 - time (sec): 109.41 - samples/sec: 441.96 - lr: 0.000014 - momentum: 0.000000 2023-10-11 08:08:52,210 epoch 10 - iter 534/1786 - loss 0.00685882 - time (sec): 168.08 - samples/sec: 439.86 - lr: 0.000012 - momentum: 0.000000 2023-10-11 08:09:49,775 epoch 10 - iter 712/1786 - loss 0.00638074 - time (sec): 225.65 - samples/sec: 433.50 - lr: 0.000011 - momentum: 0.000000 2023-10-11 08:10:46,408 epoch 10 - iter 890/1786 - loss 0.00750964 - time (sec): 282.28 - samples/sec: 439.14 - lr: 0.000009 - momentum: 0.000000 2023-10-11 08:11:42,983 epoch 10 - iter 1068/1786 - loss 0.00846706 - time (sec): 338.86 - samples/sec: 444.69 - lr: 0.000007 - momentum: 0.000000 2023-10-11 08:12:37,467 epoch 10 - iter 1246/1786 - loss 0.00856117 - time (sec): 393.34 - samples/sec: 441.99 - lr: 0.000005 - momentum: 0.000000 2023-10-11 08:13:35,598 epoch 10 - iter 1424/1786 - loss 0.00826761 - time (sec): 451.47 - samples/sec: 441.54 - lr: 0.000004 - momentum: 0.000000 2023-10-11 08:14:36,646 epoch 10 - iter 1602/1786 - loss 0.00793073 - time (sec): 512.52 - samples/sec: 436.65 - lr: 0.000002 - momentum: 0.000000 2023-10-11 08:15:31,095 epoch 10 - iter 1780/1786 - loss 0.00780604 - time (sec): 566.97 - samples/sec: 437.59 - lr: 0.000000 - momentum: 0.000000 2023-10-11 08:15:32,665 ---------------------------------------------------------------------------------------------------- 2023-10-11 08:15:32,666 EPOCH 10 done: loss 0.0078 - lr: 0.000000 2023-10-11 08:15:53,506 DEV : loss 0.21688954532146454 - f1-score (micro avg) 0.7973 2023-10-11 08:15:54,599 ---------------------------------------------------------------------------------------------------- 2023-10-11 08:15:54,601 Loading model from best epoch ... 
2023-10-11 08:15:58,969 SequenceTagger predicts: Dictionary with 17 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-11 08:17:06,782 Results:
- F-score (micro) 0.7048
- F-score (macro) 0.6244
- Accuracy 0.5644

By class:
              precision    recall  f1-score   support

         LOC     0.7416    0.7078    0.7243      1095
         PER     0.7849    0.7826    0.7838      1012
         ORG     0.4219    0.5826    0.4894       357
   HumanProd     0.4000    0.6667    0.5000        33

   micro avg     0.6906    0.7197    0.7048      2497
   macro avg     0.5871    0.6849    0.6244      2497
weighted avg     0.7090    0.7197    0.7119      2497

2023-10-11 08:17:06,782 ----------------------------------------------------------------------------------------------------