2023-10-11 19:13:51,177 ----------------------------------------------------------------------------------------------------
2023-10-11 19:13:51,179 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-11 19:13:51,179 ----------------------------------------------------------------------------------------------------
2023-10-11 19:13:51,180 MultiCorpus: 7142 train + 698 dev + 2570 test sentences
 - NER_HIPE_2022 Corpus: 7142 train + 698 dev + 2570 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fr/with_doc_seperator
2023-10-11 19:13:51,180 ----------------------------------------------------------------------------------------------------
2023-10-11 19:13:51,180 Train:  7142 sentences
2023-10-11 19:13:51,180         (train_with_dev=False, train_with_test=False)
2023-10-11 19:13:51,180 ----------------------------------------------------------------------------------------------------
2023-10-11 19:13:51,180 Training Params:
2023-10-11 19:13:51,180  - learning_rate: "0.00016"
2023-10-11 19:13:51,180  - mini_batch_size: "4"
2023-10-11 19:13:51,180  - max_epochs: "10"
2023-10-11 19:13:51,180  - shuffle: "True"
2023-10-11 19:13:51,180 ----------------------------------------------------------------------------------------------------
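The parameters above describe a standard Flair fine-tuning run. A minimal reproduction sketch, under two assumptions: Flair's generic TransformerWordEmbeddings stands in for the project-specific ByT5Embeddings wrapper shown in the model dump, and the Hugging Face model id hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax is inferred from the training base path logged below.

    from flair.datasets import NER_HIPE_2022
    from flair.embeddings import TransformerWordEmbeddings
    from flair.models import SequenceTagger
    from flair.trainers import ModelTrainer

    # Corpus matching the MultiCorpus line above (NewsEye French, with document separators).
    corpus = NER_HIPE_2022(dataset_name="newseye", language="fr", add_document_separator=True)
    label_dict = corpus.make_label_dictionary(label_type="ner")

    # Last layer, first-subtoken pooling ("poolingfirst-layers-1" in the base path).
    embeddings = TransformerWordEmbeddings(
        model="hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax",
        layers="-1",
        subtoken_pooling="first",
        fine_tune=True,
    )

    # Plain linear classifier on top of the embeddings: no CRF, no RNN ("crfFalse" in the base path).
    tagger = SequenceTagger(
        hidden_size=256,  # unused when use_rnn=False
        embeddings=embeddings,
        tag_dictionary=label_dict,
        tag_type="ner",
        use_crf=False,
        use_rnn=False,
        reproject_embeddings=False,
    )

    # fine_tune() applies a linear LR schedule with warmup, matching the
    # "LinearScheduler | warmup_fraction: '0.1'" plugin listed below.
    trainer = ModelTrainer(tagger, corpus)
    trainer.fine_tune(
        "hmbench-newseye/fr-hmbyt5-preliminary/...",  # full base path given below
        learning_rate=0.00016,
        mini_batch_size=4,
        max_epochs=10,
    )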
2023-10-11 19:13:51,180 Plugins:
2023-10-11 19:13:51,180  - TensorboardLogger
2023-10-11 19:13:51,180  - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 19:13:51,181 ----------------------------------------------------------------------------------------------------
2023-10-11 19:13:51,181 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 19:13:51,181  - metric: "('micro avg', 'f1-score')"
2023-10-11 19:13:51,181 ----------------------------------------------------------------------------------------------------
2023-10-11 19:13:51,181 Computation:
2023-10-11 19:13:51,181  - compute on device: cuda:0
2023-10-11 19:13:51,181  - embedding storage: none
2023-10-11 19:13:51,181 ----------------------------------------------------------------------------------------------------
2023-10-11 19:13:51,181 Model training base path: "hmbench-newseye/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-4"
2023-10-11 19:13:51,181 ----------------------------------------------------------------------------------------------------
2023-10-11 19:13:51,181 ----------------------------------------------------------------------------------------------------
2023-10-11 19:13:51,181 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 19:14:45,792 epoch 1 - iter 178/1786 - loss 2.81726663 - time (sec): 54.61 - samples/sec: 427.42 - lr: 0.000016 - momentum: 0.000000
2023-10-11 19:15:39,272 epoch 1 - iter 356/1786 - loss 2.63354128 - time (sec): 108.09 - samples/sec: 443.28 - lr: 0.000032 - momentum: 0.000000
2023-10-11 19:16:33,216 epoch 1 - iter 534/1786 - loss 2.35130578 - time (sec): 162.03 - samples/sec: 444.80 - lr: 0.000048 - momentum: 0.000000
2023-10-11 19:17:26,272 epoch 1 - iter 712/1786 - loss 2.04487747 - time (sec): 215.09 - samples/sec: 450.33 - lr: 0.000064 - momentum: 0.000000
2023-10-11 19:18:17,461 epoch 1 - iter 890/1786 - loss 1.77647319 - time (sec): 266.28 - samples/sec: 451.84 - lr: 0.000080 - momentum: 0.000000
2023-10-11 19:19:13,643 epoch 1 - iter 1068/1786 - loss 1.55935737 - time (sec): 322.46 - samples/sec: 453.84 - lr: 0.000096 - momentum: 0.000000
2023-10-11 19:20:09,974 epoch 1 - iter 1246/1786 - loss 1.37945532 - time (sec): 378.79 - samples/sec: 456.82 - lr: 0.000112 - momentum: 0.000000
2023-10-11 19:21:04,833 epoch 1 - iter 1424/1786 - loss 1.25092114 - time (sec): 433.65 - samples/sec: 456.93 - lr: 0.000127 - momentum: 0.000000
2023-10-11 19:21:59,519 epoch 1 - iter 1602/1786 - loss 1.14992315 - time (sec): 488.34 - samples/sec: 455.35 - lr: 0.000143 - momentum: 0.000000
2023-10-11 19:22:54,887 epoch 1 - iter 1780/1786 - loss 1.05807649 - time (sec): 543.70 - samples/sec: 456.19 - lr: 0.000159 - momentum: 0.000000
2023-10-11 19:22:56,573 ----------------------------------------------------------------------------------------------------
2023-10-11 19:22:56,574 EPOCH 1 done: loss 1.0556 - lr: 0.000159
2023-10-11 19:23:16,475 DEV : loss 0.1910761147737503 - f1-score (micro avg) 0.548
2023-10-11 19:23:16,504 saving best model
2023-10-11 19:23:17,476 ----------------------------------------------------------------------------------------------------
2023-10-11 19:24:13,401 epoch 2 - iter 178/1786 - loss 0.19843727 - time (sec): 55.92 - samples/sec: 460.08 - lr: 0.000158 - momentum: 0.000000
2023-10-11 19:25:09,123 epoch 2 - iter 356/1786 - loss 0.19088934 - time (sec): 111.64 - samples/sec: 455.94 - lr: 0.000156 - momentum: 0.000000
2023-10-11 19:26:07,078 epoch 2 - iter 534/1786 - loss 0.17857709 - time (sec): 169.60 - samples/sec: 457.11 - lr: 0.000155 - momentum: 0.000000
2023-10-11 19:27:01,067 epoch 2 - iter 712/1786 - loss 0.17027634 - time (sec): 223.59 - samples/sec: 450.87 - lr: 0.000153 - momentum: 0.000000
2023-10-11 19:27:55,997 epoch 2 - iter 890/1786 - loss 0.16174542 - time (sec): 278.52 - samples/sec: 450.27 - lr: 0.000151 - momentum: 0.000000
2023-10-11 19:28:49,544 epoch 2 - iter 1068/1786 - loss 0.15308871 - time (sec): 332.07 - samples/sec: 451.00 - lr: 0.000149 - momentum: 0.000000
2023-10-11 19:29:43,254 epoch 2 - iter 1246/1786 - loss 0.14946536 - time (sec): 385.78 - samples/sec: 450.66 - lr: 0.000148 - momentum: 0.000000
2023-10-11 19:30:37,656 epoch 2 - iter 1424/1786 - loss 0.14427293 - time (sec): 440.18 - samples/sec: 450.49 - lr: 0.000146 - momentum: 0.000000
2023-10-11 19:31:31,781 epoch 2 - iter 1602/1786 - loss 0.13998247 - time (sec): 494.30 - samples/sec: 448.65 - lr: 0.000144 - momentum: 0.000000
2023-10-11 19:32:26,177 epoch 2 - iter 1780/1786 - loss 0.13658975 - time (sec): 548.70 - samples/sec: 451.31 - lr: 0.000142 - momentum: 0.000000
2023-10-11 19:32:28,111 ----------------------------------------------------------------------------------------------------
2023-10-11 19:32:28,112 EPOCH 2 done: loss 0.1363 - lr: 0.000142
2023-10-11 19:32:49,373 DEV : loss 0.11301875859498978 - f1-score (micro avg) 0.7643
2023-10-11 19:32:49,404 saving best model
2023-10-11 19:32:54,507 ----------------------------------------------------------------------------------------------------
2023-10-11 19:33:49,062 epoch 3 - iter 178/1786 - loss 0.07443452 - time (sec): 54.55 - samples/sec: 454.66 - lr: 0.000140 - momentum: 0.000000
2023-10-11 19:34:45,096 epoch 3 - iter 356/1786 - loss 0.07792617 - time (sec): 110.58 - samples/sec: 451.54 - lr: 0.000139 - momentum: 0.000000
2023-10-11 19:35:39,332 epoch 3 - iter 534/1786 - loss 0.07368656 - time (sec): 164.82 - samples/sec: 449.84 - lr: 0.000137 - momentum: 0.000000
2023-10-11 19:36:35,943 epoch 3 - iter 712/1786 - loss 0.07163510 - time (sec): 221.43 - samples/sec: 443.25 - lr: 0.000135 - momentum: 0.000000
2023-10-11 19:37:33,213 epoch 3 - iter 890/1786 - loss 0.07448758 - time (sec): 278.70 - samples/sec: 440.47 - lr: 0.000133 - momentum: 0.000000
2023-10-11 19:38:28,862 epoch 3 - iter 1068/1786 - loss 0.07522810 - time (sec): 334.35 - samples/sec: 441.07 - lr: 0.000132 - momentum: 0.000000
2023-10-11 19:39:24,154 epoch 3 - iter 1246/1786 - loss 0.07393748 - time (sec): 389.64 - samples/sec: 441.29 - lr: 0.000130 - momentum: 0.000000
2023-10-11 19:40:21,470 epoch 3 - iter 1424/1786 - loss 0.07497568 - time (sec): 446.96 - samples/sec: 441.14 - lr: 0.000128 - momentum: 0.000000
2023-10-11 19:41:17,975 epoch 3 - iter 1602/1786 - loss 0.07401065 - time (sec): 503.46 - samples/sec: 442.08 - lr: 0.000126 - momentum: 0.000000
2023-10-11 19:42:14,212 epoch 3 - iter 1780/1786 - loss 0.07443437 - time (sec): 559.70 - samples/sec: 442.67 - lr: 0.000125 - momentum: 0.000000
2023-10-11 19:42:16,107 ----------------------------------------------------------------------------------------------------
2023-10-11 19:42:16,108 EPOCH 3 done: loss 0.0744 - lr: 0.000125
2023-10-11 19:42:37,828 DEV : loss 0.13155920803546906 - f1-score (micro avg) 0.7789
2023-10-11 19:42:37,858 saving best model
2023-10-11 19:42:48,486 ----------------------------------------------------------------------------------------------------
2023-10-11 19:43:42,662 epoch 4 - iter 178/1786 - loss 0.05682850 - time (sec): 54.17 - samples/sec: 455.12 - lr: 0.000123 - momentum: 0.000000
2023-10-11 19:44:36,229 epoch 4 - iter 356/1786 - loss 0.05085291 - time (sec): 107.74 - samples/sec: 461.16 - lr: 0.000121 - momentum: 0.000000
2023-10-11 19:45:31,139 epoch 4 - iter 534/1786 - loss 0.05380353 - time (sec): 162.65 - samples/sec: 467.00 - lr: 0.000119 - momentum: 0.000000
2023-10-11 19:46:27,225 epoch 4 - iter 712/1786 - loss 0.05409746 - time (sec): 218.73 - samples/sec: 460.15 - lr: 0.000117 - momentum: 0.000000
2023-10-11 19:47:21,874 epoch 4 - iter 890/1786 - loss 0.05254482 - time (sec): 273.38 - samples/sec: 456.64 - lr: 0.000116 - momentum: 0.000000
2023-10-11 19:48:17,463 epoch 4 - iter 1068/1786 - loss 0.05090637 - time (sec): 328.97 - samples/sec: 455.59 - lr: 0.000114 - momentum: 0.000000
2023-10-11 19:49:12,569 epoch 4 - iter 1246/1786 - loss 0.05176189 - time (sec): 384.08 - samples/sec: 459.11 - lr: 0.000112 - momentum: 0.000000
2023-10-11 19:50:05,052 epoch 4 - iter 1424/1786 - loss 0.05266426 - time (sec): 436.56 - samples/sec: 457.32 - lr: 0.000110 - momentum: 0.000000
2023-10-11 19:50:59,830 epoch 4 - iter 1602/1786 - loss 0.05268196 - time (sec): 491.34 - samples/sec: 455.32 - lr: 0.000109 - momentum: 0.000000
2023-10-11 19:51:54,157 epoch 4 - iter 1780/1786 - loss 0.05210017 - time (sec): 545.67 - samples/sec: 454.53 - lr: 0.000107 - momentum: 0.000000
2023-10-11 19:51:55,885 ----------------------------------------------------------------------------------------------------
2023-10-11 19:51:55,885 EPOCH 4 done: loss 0.0520 - lr: 0.000107
2023-10-11 19:52:18,869 DEV : loss 0.14153534173965454 - f1-score (micro avg) 0.7719
2023-10-11 19:52:18,903 ----------------------------------------------------------------------------------------------------
2023-10-11 19:53:12,247 epoch 5 - iter 178/1786 - loss 0.03543846 - time (sec): 53.34 - samples/sec: 445.73 - lr: 0.000105 - momentum: 0.000000
2023-10-11 19:54:04,362 epoch 5 - iter 356/1786 - loss 0.03568755 - time (sec): 105.46 - samples/sec: 444.09 - lr: 0.000103 - momentum: 0.000000
2023-10-11 19:54:59,829 epoch 5 - iter 534/1786 - loss 0.03544231 - time (sec): 160.92 - samples/sec: 456.17 - lr: 0.000101 - momentum: 0.000000
2023-10-11 19:55:58,462 epoch 5 - iter 712/1786 - loss 0.03608462 - time (sec): 219.56 - samples/sec: 450.03 - lr: 0.000100 - momentum: 0.000000
2023-10-11 19:56:56,006 epoch 5 - iter 890/1786 - loss 0.03631887 - time (sec): 277.10 - samples/sec: 443.92 - lr: 0.000098 - momentum: 0.000000
2023-10-11 19:57:51,595 epoch 5 - iter 1068/1786 - loss 0.03584506 - time (sec): 332.69 - samples/sec: 440.11 - lr: 0.000096 - momentum: 0.000000
2023-10-11 19:58:52,566 epoch 5 - iter 1246/1786 - loss 0.03555371 - time (sec): 393.66 - samples/sec: 438.69 - lr: 0.000094 - momentum: 0.000000
2023-10-11 19:59:47,070 epoch 5 - iter 1424/1786 - loss 0.03537355 - time (sec): 448.16 - samples/sec: 438.91 - lr: 0.000093 - momentum: 0.000000
2023-10-11 20:00:42,523 epoch 5 - iter 1602/1786 - loss 0.03662827 - time (sec): 503.62 - samples/sec: 441.12 - lr: 0.000091 - momentum: 0.000000
2023-10-11 20:01:38,743 epoch 5 - iter 1780/1786 - loss 0.03694673 - time (sec): 559.84 - samples/sec: 442.55 - lr: 0.000089 - momentum: 0.000000
2023-10-11 20:01:40,649 ----------------------------------------------------------------------------------------------------
2023-10-11 20:01:40,650 EPOCH 5 done: loss 0.0370 - lr: 0.000089
2023-10-11 20:02:02,932 DEV : loss 0.1543937772512436 - f1-score (micro avg) 0.8069
2023-10-11 20:02:02,963 saving best model
2023-10-11 20:02:34,808 ----------------------------------------------------------------------------------------------------
2023-10-11 20:03:31,794 epoch 6 - iter 178/1786 - loss 0.03407214 - time (sec): 56.98 - samples/sec: 440.90 - lr: 0.000087 - momentum: 0.000000
2023-10-11 20:04:27,201 epoch 6 - iter 356/1786 - loss 0.02838239 - time (sec): 112.39 - samples/sec: 441.41 - lr: 0.000085 - momentum: 0.000000
2023-10-11 20:05:22,131 epoch 6 - iter 534/1786 - loss 0.02705824 - time (sec): 167.32 - samples/sec: 440.23 - lr: 0.000084 - momentum: 0.000000
2023-10-11 20:06:17,967 epoch 6 - iter 712/1786 - loss 0.02835522 - time (sec): 223.15 - samples/sec: 442.11 - lr: 0.000082 - momentum: 0.000000
2023-10-11 20:07:13,247 epoch 6 - iter 890/1786 - loss 0.02711076 - time (sec): 278.43 - samples/sec: 441.74 - lr: 0.000080 - momentum: 0.000000
2023-10-11 20:08:09,742 epoch 6 - iter 1068/1786 - loss 0.02894506 - time (sec): 334.93 - samples/sec: 443.02 - lr: 0.000078 - momentum: 0.000000
2023-10-11 20:09:04,988 epoch 6 - iter 1246/1786 - loss 0.02842164 - time (sec): 390.18 - samples/sec: 444.19 - lr: 0.000077 - momentum: 0.000000
2023-10-11 20:10:01,812 epoch 6 - iter 1424/1786 - loss 0.02862201 - time (sec): 447.00 - samples/sec: 443.26 - lr: 0.000075 - momentum: 0.000000
2023-10-11 20:10:58,162 epoch 6 - iter 1602/1786 - loss 0.02770039 - time (sec): 503.35 - samples/sec: 442.70 - lr: 0.000073 - momentum: 0.000000
2023-10-11 20:11:52,537 epoch 6 - iter 1780/1786 - loss 0.02749648 - time (sec): 557.72 - samples/sec: 444.60 - lr: 0.000071 - momentum: 0.000000
2023-10-11 20:11:54,275 ----------------------------------------------------------------------------------------------------
2023-10-11 20:11:54,275 EPOCH 6 done: loss 0.0276 - lr: 0.000071
2023-10-11 20:12:15,480 DEV : loss 0.17768503725528717 - f1-score (micro avg) 0.8032
2023-10-11 20:12:15,511 ----------------------------------------------------------------------------------------------------
2023-10-11 20:13:10,371 epoch 7 - iter 178/1786 - loss 0.02194253 - time (sec): 54.86 - samples/sec: 440.84 - lr: 0.000069 - momentum: 0.000000
2023-10-11 20:14:05,705 epoch 7 - iter 356/1786 - loss 0.02239136 - time (sec): 110.19 - samples/sec: 442.59 - lr: 0.000068 - momentum: 0.000000
2023-10-11 20:15:03,230 epoch 7 - iter 534/1786 - loss 0.02455740 - time (sec): 167.72 - samples/sec: 434.51 - lr: 0.000066 - momentum: 0.000000
2023-10-11 20:15:58,719 epoch 7 - iter 712/1786 - loss 0.02161452 - time (sec): 223.21 - samples/sec: 441.61 - lr: 0.000064 - momentum: 0.000000
2023-10-11 20:16:55,364 epoch 7 - iter 890/1786 - loss 0.02185857 - time (sec): 279.85 - samples/sec: 442.72 - lr: 0.000062 - momentum: 0.000000
2023-10-11 20:17:53,035 epoch 7 - iter 1068/1786 - loss 0.02033916 - time (sec): 337.52 - samples/sec: 440.31 - lr: 0.000061 - momentum: 0.000000
2023-10-11 20:18:46,607 epoch 7 - iter 1246/1786 - loss 0.02021475 - time (sec): 391.09 - samples/sec: 442.63 - lr: 0.000059 - momentum: 0.000000
2023-10-11 20:19:41,538 epoch 7 - iter 1424/1786 - loss 0.02040982 - time (sec): 446.02 - samples/sec: 445.41 - lr: 0.000057 - momentum: 0.000000
2023-10-11 20:20:36,433 epoch 7 - iter 1602/1786 - loss 0.02004331 - time (sec): 500.92 - samples/sec: 446.50 - lr: 0.000055 - momentum: 0.000000
2023-10-11 20:21:30,020 epoch 7 - iter 1780/1786 - loss 0.02009444 - time (sec): 554.51 - samples/sec: 446.57 - lr: 0.000053 - momentum: 0.000000
2023-10-11 20:21:31,879 ----------------------------------------------------------------------------------------------------
2023-10-11 20:21:31,879 EPOCH 7 done: loss 0.0200 - lr: 0.000053
2023-10-11 20:21:52,842 DEV : loss 0.19214080274105072 - f1-score (micro avg) 0.8
2023-10-11 20:21:52,871 ----------------------------------------------------------------------------------------------------
2023-10-11 20:22:46,267 epoch 8 - iter 178/1786 - loss 0.01515433 - time (sec): 53.39 - samples/sec: 462.82 - lr: 0.000052 - momentum: 0.000000
2023-10-11 20:23:40,414 epoch 8 - iter 356/1786 - loss 0.01607011 - time (sec): 107.54 - samples/sec: 469.13 - lr: 0.000050 - momentum: 0.000000
2023-10-11 20:24:34,115 epoch 8 - iter 534/1786 - loss 0.01419008 - time (sec): 161.24 - samples/sec: 467.30 - lr: 0.000048 - momentum: 0.000000
2023-10-11 20:25:28,581 epoch 8 - iter 712/1786 - loss 0.01574404 - time (sec): 215.71 - samples/sec: 465.31 - lr: 0.000046 - momentum: 0.000000
2023-10-11 20:26:22,156 epoch 8 - iter 890/1786 - loss 0.01486595 - time (sec): 269.28 - samples/sec: 461.55 - lr: 0.000044 - momentum: 0.000000
2023-10-11 20:27:16,509 epoch 8 - iter 1068/1786 - loss 0.01436697 - time (sec): 323.64 - samples/sec: 459.27 - lr: 0.000043 - momentum: 0.000000
2023-10-11 20:28:11,431 epoch 8 - iter 1246/1786 - loss 0.01439559 - time (sec): 378.56 - samples/sec: 459.92 - lr: 0.000041 - momentum: 0.000000
2023-10-11 20:29:04,809 epoch 8 - iter 1424/1786 - loss 0.01444281 - time (sec): 431.94 - samples/sec: 455.36 - lr: 0.000039 - momentum: 0.000000
2023-10-11 20:30:00,610 epoch 8 - iter 1602/1786 - loss 0.01433488 - time (sec): 487.74 - samples/sec: 456.40 - lr: 0.000037 - momentum: 0.000000
2023-10-11 20:30:56,858 epoch 8 - iter 1780/1786 - loss 0.01428435 - time (sec): 543.99 - samples/sec: 456.02 - lr: 0.000036 - momentum: 0.000000
2023-10-11 20:30:58,578 ----------------------------------------------------------------------------------------------------
2023-10-11 20:30:58,579 EPOCH 8 done: loss 0.0143 - lr: 0.000036
2023-10-11 20:31:20,849 DEV : loss 0.20372258126735687 - f1-score (micro avg) 0.8021
2023-10-11 20:31:20,881 ----------------------------------------------------------------------------------------------------
2023-10-11 20:32:16,935 epoch 9 - iter 178/1786 - loss 0.00793109 - time (sec): 56.05 - samples/sec: 422.02 - lr: 0.000034 - momentum: 0.000000
2023-10-11 20:33:14,012 epoch 9 - iter 356/1786 - loss 0.01077629 - time (sec): 113.13 - samples/sec: 436.99 - lr: 0.000032 - momentum: 0.000000
2023-10-11 20:34:11,690 epoch 9 - iter 534/1786 - loss 0.01120412 - time (sec): 170.81 - samples/sec: 443.56 - lr: 0.000030 - momentum: 0.000000
2023-10-11 20:35:08,258 epoch 9 - iter 712/1786 - loss 0.00993767 - time (sec): 227.38 - samples/sec: 442.48 - lr: 0.000028 - momentum: 0.000000
2023-10-11 20:36:02,018 epoch 9 - iter 890/1786 - loss 0.00981936 - time (sec): 281.14 - samples/sec: 443.78 - lr: 0.000027 - momentum: 0.000000
2023-10-11 20:36:57,308 epoch 9 - iter 1068/1786 - loss 0.00964522 - time (sec): 336.42 - samples/sec: 445.06 - lr: 0.000025 - momentum: 0.000000
2023-10-11 20:37:51,844 epoch 9 - iter 1246/1786 - loss 0.00967314 - time (sec): 390.96 - samples/sec: 444.94 - lr: 0.000023 - momentum: 0.000000
2023-10-11 20:38:46,124 epoch 9 - iter 1424/1786 - loss 0.00944093 - time (sec): 445.24 - samples/sec: 445.21 - lr: 0.000021 - momentum: 0.000000
2023-10-11 20:39:40,185 epoch 9 - iter 1602/1786 - loss 0.00949793 - time (sec): 499.30 - samples/sec: 445.57 - lr: 0.000020 - momentum: 0.000000
2023-10-11 20:40:35,488 epoch 9 - iter 1780/1786 - loss 0.00978050 - time (sec): 554.61 - samples/sec: 446.96 - lr: 0.000018 - momentum: 0.000000
2023-10-11 20:40:37,245 ----------------------------------------------------------------------------------------------------
2023-10-11 20:40:37,245 EPOCH 9 done: loss 0.0098 - lr: 0.000018
2023-10-11 20:40:59,367 DEV : loss 0.21249088644981384 - f1-score (micro avg) 0.8021
2023-10-11 20:40:59,397 ----------------------------------------------------------------------------------------------------
2023-10-11 20:41:53,260 epoch 10 - iter 178/1786 - loss 0.00766748 - time (sec): 53.86 - samples/sec: 467.31 - lr: 0.000016 - momentum: 0.000000
2023-10-11 20:42:47,621 epoch 10 - iter 356/1786 - loss 0.00666879 - time (sec): 108.22 - samples/sec: 473.72 - lr: 0.000014 - momentum: 0.000000
2023-10-11 20:43:40,341 epoch 10 - iter 534/1786 - loss 0.00669677 - time (sec): 160.94 - samples/sec: 463.30 - lr: 0.000012 - momentum: 0.000000
2023-10-11 20:44:33,751 epoch 10 - iter 712/1786 - loss 0.00714241 - time (sec): 214.35 - samples/sec: 463.12 - lr: 0.000011 - momentum: 0.000000
2023-10-11 20:45:27,178 epoch 10 - iter 890/1786 - loss 0.00681352 - time (sec): 267.78 - samples/sec: 457.07 - lr: 0.000009 - momentum: 0.000000
2023-10-11 20:46:22,447 epoch 10 - iter 1068/1786 - loss 0.00770174 - time (sec): 323.05 - samples/sec: 461.61 - lr: 0.000007 - momentum: 0.000000
2023-10-11 20:47:15,081 epoch 10 - iter 1246/1786 - loss 0.00775296 - time (sec): 375.68 - samples/sec: 460.13 - lr: 0.000005 - momentum: 0.000000
2023-10-11 20:48:10,290 epoch 10 - iter 1424/1786 - loss 0.00785199 - time (sec): 430.89 - samples/sec: 459.46 - lr: 0.000004 - momentum: 0.000000
2023-10-11 20:49:06,235 epoch 10 - iter 1602/1786 - loss 0.00742856 - time (sec): 486.84 - samples/sec: 458.91 - lr: 0.000002 - momentum: 0.000000
2023-10-11 20:50:01,353 epoch 10 - iter 1780/1786 - loss 0.00753833 - time (sec): 541.95 - samples/sec: 457.97 - lr: 0.000000 - momentum: 0.000000
2023-10-11 20:50:02,958 ----------------------------------------------------------------------------------------------------
2023-10-11 20:50:02,959 EPOCH 10 done: loss 0.0075 - lr: 0.000000
2023-10-11 20:50:25,995 DEV : loss 0.21911536157131195 - f1-score (micro avg) 0.8046
2023-10-11 20:50:27,138 ----------------------------------------------------------------------------------------------------
2023-10-11 20:50:27,141 Loading model from best epoch ...
2023-10-11 20:50:31,856 SequenceTagger predicts: Dictionary with 17 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-11 20:51:43,012 Results:
- F-score (micro) 0.6934
- F-score (macro) 0.6106
- Accuracy 0.5471

By class:
              precision    recall  f1-score   support

         LOC     0.7176    0.7242    0.7209      1095
         PER     0.7850    0.7431    0.7635      1012
         ORG     0.4150    0.5742    0.4818       357
   HumanProd     0.3922    0.6061    0.4762        33

   micro avg     0.6787    0.7089    0.6934      2497
   macro avg     0.5774    0.6619    0.6106      2497
weighted avg     0.6974    0.7089    0.7007      2497

2023-10-11 20:51:43,012 ----------------------------------------------------------------------------------------------------
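A minimal sketch for loading the checkpoint evaluated above and tagging new text with the standard Flair API, assuming best-model.pt was kept under the training base path logged earlier; the French example sentence is made up for illustration.

    from flair.data import Sentence
    from flair.models import SequenceTagger

    # Checkpoint path = the "Model training base path" above + "/best-model.pt".
    tagger = SequenceTagger.load(
        "hmbench-newseye/fr-hmbyt5-preliminary/"
        "byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-4/"
        "best-model.pt"
    )

    # Hypothetical input; the tagger predicts PER/LOC/ORG/HumanProd spans in BIOES encoding.
    sentence = Sentence("Le maire de Marseille a rencontré les représentants de la Banque de France .")
    tagger.predict(sentence)

    for span in sentence.get_spans("ner"):
        label = span.get_label("ner")
        print(span.text, label.value, round(label.score, 3))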