2023-10-11 09:51:51,454 ----------------------------------------------------------------------------------------------------
2023-10-11 09:51:51,456 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-11 09:51:51,456 ----------------------------------------------------------------------------------------------------
2023-10-11 09:51:51,457 MultiCorpus: 7142 train + 698 dev + 2570 test sentences
 - NER_HIPE_2022 Corpus: 7142 train + 698 dev + 2570 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fr/with_doc_seperator
2023-10-11 09:51:51,457 ----------------------------------------------------------------------------------------------------
2023-10-11 09:51:51,457 Train: 7142 sentences
2023-10-11 09:51:51,457 (train_with_dev=False, train_with_test=False)
2023-10-11 09:51:51,457 ----------------------------------------------------------------------------------------------------
2023-10-11 09:51:51,457 Training Params:
2023-10-11 09:51:51,457 - learning_rate: "0.00016"
2023-10-11 09:51:51,457 - mini_batch_size: "8"
2023-10-11 09:51:51,457 - max_epochs: "10"
2023-10-11 09:51:51,457 - shuffle: "True"
2023-10-11 09:51:51,457 ----------------------------------------------------------------------------------------------------
2023-10-11 09:51:51,457 Plugins:
2023-10-11 09:51:51,458 - TensorboardLogger
2023-10-11 09:51:51,458 - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 09:51:51,458 ----------------------------------------------------------------------------------------------------
2023-10-11 09:51:51,458 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 09:51:51,458 - metric: "('micro avg', 'f1-score')"
2023-10-11 09:51:51,458
----------------------------------------------------------------------------------------------------
2023-10-11 09:51:51,458 Computation:
2023-10-11 09:51:51,458 - compute on device: cuda:0
2023-10-11 09:51:51,458 - embedding storage: none
2023-10-11 09:51:51,458 ----------------------------------------------------------------------------------------------------
2023-10-11 09:51:51,458 Model training base path: "hmbench-newseye/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-3"
2023-10-11 09:51:51,458 ----------------------------------------------------------------------------------------------------
2023-10-11 09:51:51,458 ----------------------------------------------------------------------------------------------------
2023-10-11 09:51:51,459 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 09:52:43,650 epoch 1 - iter 89/893 - loss 2.81958198 - time (sec): 52.19 - samples/sec: 515.94 - lr: 0.000016 - momentum: 0.000000
2023-10-11 09:53:33,684 epoch 1 - iter 178/893 - loss 2.73966772 - time (sec): 102.22 - samples/sec: 495.89 - lr: 0.000032 - momentum: 0.000000
2023-10-11 09:54:24,264 epoch 1 - iter 267/893 - loss 2.54471735 - time (sec): 152.80 - samples/sec: 487.65 - lr: 0.000048 - momentum: 0.000000
2023-10-11 09:55:14,301 epoch 1 - iter 356/893 - loss 2.32649706 - time (sec): 202.84 - samples/sec: 486.76 - lr: 0.000064 - momentum: 0.000000
2023-10-11 09:56:05,159 epoch 1 - iter 445/893 - loss 2.08476844 - time (sec): 253.70 - samples/sec: 492.21 - lr: 0.000080 - momentum: 0.000000
2023-10-11 09:56:55,867 epoch 1 - iter 534/893 - loss 1.86965509 - time (sec): 304.41 - samples/sec: 488.69 - lr: 0.000095 - momentum: 0.000000
2023-10-11 09:57:49,598 epoch 1 - iter 623/893 - loss 1.68007860 - time (sec): 358.14 - samples/sec: 486.88 - lr: 0.000111 - momentum: 0.000000
2023-10-11 09:58:44,484 epoch 1 - iter 712/893 - loss 1.52983259 - time (sec):
413.02 - samples/sec: 481.46 - lr: 0.000127 - momentum: 0.000000
2023-10-11 09:59:39,435 epoch 1 - iter 801/893 - loss 1.39878794 - time (sec): 467.97 - samples/sec: 478.76 - lr: 0.000143 - momentum: 0.000000
2023-10-11 10:00:28,196 epoch 1 - iter 890/893 - loss 1.29860971 - time (sec): 516.74 - samples/sec: 479.81 - lr: 0.000159 - momentum: 0.000000
2023-10-11 10:00:29,735 ----------------------------------------------------------------------------------------------------
2023-10-11 10:00:29,736 EPOCH 1 done: loss 1.2953 - lr: 0.000159
2023-10-11 10:00:49,291 DEV : loss 0.23720598220825195 - f1-score (micro avg) 0.521
2023-10-11 10:00:49,321 saving best model
2023-10-11 10:00:50,224 ----------------------------------------------------------------------------------------------------
2023-10-11 10:01:41,521 epoch 2 - iter 89/893 - loss 0.24156291 - time (sec): 51.29 - samples/sec: 511.50 - lr: 0.000158 - momentum: 0.000000
2023-10-11 10:02:32,514 epoch 2 - iter 178/893 - loss 0.23973758 - time (sec): 102.29 - samples/sec: 503.29 - lr: 0.000156 - momentum: 0.000000
2023-10-11 10:03:23,339 epoch 2 - iter 267/893 - loss 0.22533212 - time (sec): 153.11 - samples/sec: 492.16 - lr: 0.000155 - momentum: 0.000000
2023-10-11 10:04:15,696 epoch 2 - iter 356/893 - loss 0.20433271 - time (sec): 205.47 - samples/sec: 490.36 - lr: 0.000153 - momentum: 0.000000
2023-10-11 10:05:06,103 epoch 2 - iter 445/893 - loss 0.19447693 - time (sec): 255.88 - samples/sec: 486.27 - lr: 0.000151 - momentum: 0.000000
2023-10-11 10:06:00,826 epoch 2 - iter 534/893 - loss 0.18500783 - time (sec): 310.60 - samples/sec: 483.67 - lr: 0.000149 - momentum: 0.000000
2023-10-11 10:06:55,207 epoch 2 - iter 623/893 - loss 0.17866349 - time (sec): 364.98 - samples/sec: 478.57 - lr: 0.000148 - momentum: 0.000000
2023-10-11 10:07:46,050 epoch 2 - iter 712/893 - loss 0.17165749 - time (sec): 415.82 - samples/sec: 476.07 - lr: 0.000146 - momentum: 0.000000
2023-10-11 10:08:40,495 epoch 2 - iter 801/893 - loss
0.16565209 - time (sec): 470.27 - samples/sec: 473.21 - lr: 0.000144 - momentum: 0.000000
2023-10-11 10:09:35,309 epoch 2 - iter 890/893 - loss 0.15984826 - time (sec): 525.08 - samples/sec: 472.17 - lr: 0.000142 - momentum: 0.000000
2023-10-11 10:09:36,886 ----------------------------------------------------------------------------------------------------
2023-10-11 10:09:36,887 EPOCH 2 done: loss 0.1596 - lr: 0.000142
2023-10-11 10:09:58,231 DEV : loss 0.09981917589902878 - f1-score (micro avg) 0.7588
2023-10-11 10:09:58,264 saving best model
2023-10-11 10:10:00,886 ----------------------------------------------------------------------------------------------------
2023-10-11 10:10:52,118 epoch 3 - iter 89/893 - loss 0.06769040 - time (sec): 51.23 - samples/sec: 465.50 - lr: 0.000140 - momentum: 0.000000
2023-10-11 10:11:42,503 epoch 3 - iter 178/893 - loss 0.06730058 - time (sec): 101.61 - samples/sec: 481.27 - lr: 0.000139 - momentum: 0.000000
2023-10-11 10:12:32,365 epoch 3 - iter 267/893 - loss 0.06608996 - time (sec): 151.47 - samples/sec: 483.16 - lr: 0.000137 - momentum: 0.000000
2023-10-11 10:13:22,493 epoch 3 - iter 356/893 - loss 0.06822392 - time (sec): 201.60 - samples/sec: 487.02 - lr: 0.000135 - momentum: 0.000000
2023-10-11 10:14:13,138 epoch 3 - iter 445/893 - loss 0.07122185 - time (sec): 252.25 - samples/sec: 487.15 - lr: 0.000133 - momentum: 0.000000
2023-10-11 10:15:03,603 epoch 3 - iter 534/893 - loss 0.07293485 - time (sec): 302.71 - samples/sec: 487.29 - lr: 0.000132 - momentum: 0.000000
2023-10-11 10:15:59,820 epoch 3 - iter 623/893 - loss 0.07412771 - time (sec): 358.93 - samples/sec: 485.22 - lr: 0.000130 - momentum: 0.000000
2023-10-11 10:16:54,820 epoch 3 - iter 712/893 - loss 0.07275147 - time (sec): 413.93 - samples/sec: 478.54 - lr: 0.000128 - momentum: 0.000000
2023-10-11 10:17:50,647 epoch 3 - iter 801/893 - loss 0.07101037 - time (sec): 469.76 - samples/sec: 475.21 - lr: 0.000126 - momentum: 0.000000
2023-10-11 10:18:45,676 epoch
3 - iter 890/893 - loss 0.07076952 - time (sec): 524.78 - samples/sec: 472.85 - lr: 0.000125 - momentum: 0.000000
2023-10-11 10:18:47,139 ----------------------------------------------------------------------------------------------------
2023-10-11 10:18:47,140 EPOCH 3 done: loss 0.0709 - lr: 0.000125
2023-10-11 10:19:08,915 DEV : loss 0.10631939768791199 - f1-score (micro avg) 0.7909
2023-10-11 10:19:08,960 saving best model
2023-10-11 10:19:11,681 ----------------------------------------------------------------------------------------------------
2023-10-11 10:20:04,148 epoch 4 - iter 89/893 - loss 0.04488972 - time (sec): 52.46 - samples/sec: 457.96 - lr: 0.000123 - momentum: 0.000000
2023-10-11 10:20:59,199 epoch 4 - iter 178/893 - loss 0.04859362 - time (sec): 107.51 - samples/sec: 455.92 - lr: 0.000121 - momentum: 0.000000
2023-10-11 10:21:55,697 epoch 4 - iter 267/893 - loss 0.04714332 - time (sec): 164.01 - samples/sec: 461.00 - lr: 0.000119 - momentum: 0.000000
2023-10-11 10:22:50,160 epoch 4 - iter 356/893 - loss 0.04851860 - time (sec): 218.47 - samples/sec: 458.60 - lr: 0.000117 - momentum: 0.000000
2023-10-11 10:23:41,893 epoch 4 - iter 445/893 - loss 0.04963523 - time (sec): 270.21 - samples/sec: 465.48 - lr: 0.000116 - momentum: 0.000000
2023-10-11 10:24:30,632 epoch 4 - iter 534/893 - loss 0.05027391 - time (sec): 318.95 - samples/sec: 466.00 - lr: 0.000114 - momentum: 0.000000
2023-10-11 10:25:19,741 epoch 4 - iter 623/893 - loss 0.05064819 - time (sec): 368.06 - samples/sec: 472.03 - lr: 0.000112 - momentum: 0.000000
2023-10-11 10:26:11,200 epoch 4 - iter 712/893 - loss 0.05119551 - time (sec): 419.51 - samples/sec: 471.86 - lr: 0.000110 - momentum: 0.000000
2023-10-11 10:27:01,611 epoch 4 - iter 801/893 - loss 0.05054556 - time (sec): 469.93 - samples/sec: 474.73 - lr: 0.000109 - momentum: 0.000000
2023-10-11 10:27:52,358 epoch 4 - iter 890/893 - loss 0.04922809 - time (sec): 520.67 - samples/sec: 476.32 - lr: 0.000107 - momentum: 0.000000
2023-10-11 10:27:53,878 ----------------------------------------------------------------------------------------------------
2023-10-11 10:27:53,879 EPOCH 4 done: loss 0.0491 - lr: 0.000107
2023-10-11 10:28:16,061 DEV : loss 0.13096819818019867 - f1-score (micro avg) 0.7973
2023-10-11 10:28:16,097 saving best model
2023-10-11 10:28:18,714 ----------------------------------------------------------------------------------------------------
2023-10-11 10:29:14,514 epoch 5 - iter 89/893 - loss 0.03067518 - time (sec): 55.79 - samples/sec: 450.15 - lr: 0.000105 - momentum: 0.000000
2023-10-11 10:30:10,324 epoch 5 - iter 178/893 - loss 0.03624382 - time (sec): 111.61 - samples/sec: 453.36 - lr: 0.000103 - momentum: 0.000000
2023-10-11 10:31:01,711 epoch 5 - iter 267/893 - loss 0.03352199 - time (sec): 162.99 - samples/sec: 464.01 - lr: 0.000101 - momentum: 0.000000
2023-10-11 10:31:50,487 epoch 5 - iter 356/893 - loss 0.03375946 - time (sec): 211.77 - samples/sec: 470.97 - lr: 0.000100 - momentum: 0.000000
2023-10-11 10:32:40,342 epoch 5 - iter 445/893 - loss 0.03395690 - time (sec): 261.62 - samples/sec: 472.45 - lr: 0.000098 - momentum: 0.000000
2023-10-11 10:33:31,750 epoch 5 - iter 534/893 - loss 0.03542389 - time (sec): 313.03 - samples/sec: 471.58 - lr: 0.000096 - momentum: 0.000000
2023-10-11 10:34:28,397 epoch 5 - iter 623/893 - loss 0.03472368 - time (sec): 369.68 - samples/sec: 467.82 - lr: 0.000094 - momentum: 0.000000
2023-10-11 10:35:18,765 epoch 5 - iter 712/893 - loss 0.03520394 - time (sec): 420.05 - samples/sec: 469.87 - lr: 0.000093 - momentum: 0.000000
2023-10-11 10:36:13,262 epoch 5 - iter 801/893 - loss 0.03464309 - time (sec): 474.54 - samples/sec: 468.46 - lr: 0.000091 - momentum: 0.000000
2023-10-11 10:37:02,901 epoch 5 - iter 890/893 - loss 0.03596866 - time (sec): 524.18 - samples/sec: 473.24 - lr: 0.000089 - momentum: 0.000000
2023-10-11 10:37:04,415
----------------------------------------------------------------------------------------------------
2023-10-11 10:37:04,415 EPOCH 5 done: loss 0.0360 - lr: 0.000089
2023-10-11 10:37:25,634 DEV : loss 0.14307744801044464 - f1-score (micro avg) 0.8043
2023-10-11 10:37:25,663 saving best model
2023-10-11 10:37:28,263 ----------------------------------------------------------------------------------------------------
2023-10-11 10:38:18,809 epoch 6 - iter 89/893 - loss 0.03174465 - time (sec): 50.54 - samples/sec: 514.67 - lr: 0.000087 - momentum: 0.000000
2023-10-11 10:39:07,991 epoch 6 - iter 178/893 - loss 0.02928006 - time (sec): 99.72 - samples/sec: 497.76 - lr: 0.000085 - momentum: 0.000000
2023-10-11 10:39:57,896 epoch 6 - iter 267/893 - loss 0.02968319 - time (sec): 149.63 - samples/sec: 492.29 - lr: 0.000084 - momentum: 0.000000
2023-10-11 10:40:49,294 epoch 6 - iter 356/893 - loss 0.02835877 - time (sec): 201.03 - samples/sec: 493.58 - lr: 0.000082 - momentum: 0.000000
2023-10-11 10:41:40,128 epoch 6 - iter 445/893 - loss 0.02693656 - time (sec): 251.86 - samples/sec: 491.28 - lr: 0.000080 - momentum: 0.000000
2023-10-11 10:42:30,558 epoch 6 - iter 534/893 - loss 0.02653119 - time (sec): 302.29 - samples/sec: 486.48 - lr: 0.000078 - momentum: 0.000000
2023-10-11 10:43:21,338 epoch 6 - iter 623/893 - loss 0.02667286 - time (sec): 353.07 - samples/sec: 486.64 - lr: 0.000077 - momentum: 0.000000
2023-10-11 10:44:12,346 epoch 6 - iter 712/893 - loss 0.02748104 - time (sec): 404.08 - samples/sec: 490.38 - lr: 0.000075 - momentum: 0.000000
2023-10-11 10:45:00,449 epoch 6 - iter 801/893 - loss 0.02718055 - time (sec): 452.18 - samples/sec: 493.59 - lr: 0.000073 - momentum: 0.000000
2023-10-11 10:45:49,594 epoch 6 - iter 890/893 - loss 0.02706511 - time (sec): 501.33 - samples/sec: 494.82 - lr: 0.000071 - momentum: 0.000000
2023-10-11 10:45:51,093 ----------------------------------------------------------------------------------------------------
2023-10-11
10:45:51,093 EPOCH 6 done: loss 0.0270 - lr: 0.000071
2023-10-11 10:46:11,841 DEV : loss 0.1740272492170334 - f1-score (micro avg) 0.7989
2023-10-11 10:46:11,871 ----------------------------------------------------------------------------------------------------
2023-10-11 10:46:59,408 epoch 7 - iter 89/893 - loss 0.02529529 - time (sec): 47.53 - samples/sec: 515.71 - lr: 0.000069 - momentum: 0.000000
2023-10-11 10:47:48,889 epoch 7 - iter 178/893 - loss 0.02618348 - time (sec): 97.02 - samples/sec: 492.54 - lr: 0.000068 - momentum: 0.000000
2023-10-11 10:48:38,830 epoch 7 - iter 267/893 - loss 0.02283278 - time (sec): 146.96 - samples/sec: 497.17 - lr: 0.000066 - momentum: 0.000000
2023-10-11 10:49:27,859 epoch 7 - iter 356/893 - loss 0.02247875 - time (sec): 195.99 - samples/sec: 495.73 - lr: 0.000064 - momentum: 0.000000
2023-10-11 10:50:20,268 epoch 7 - iter 445/893 - loss 0.02349580 - time (sec): 248.39 - samples/sec: 492.55 - lr: 0.000062 - momentum: 0.000000
2023-10-11 10:51:11,607 epoch 7 - iter 534/893 - loss 0.02280138 - time (sec): 299.73 - samples/sec: 493.64 - lr: 0.000061 - momentum: 0.000000
2023-10-11 10:52:02,733 epoch 7 - iter 623/893 - loss 0.02188778 - time (sec): 350.86 - samples/sec: 493.14 - lr: 0.000059 - momentum: 0.000000
2023-10-11 10:52:55,211 epoch 7 - iter 712/893 - loss 0.02139551 - time (sec): 403.34 - samples/sec: 491.20 - lr: 0.000057 - momentum: 0.000000
2023-10-11 10:53:47,078 epoch 7 - iter 801/893 - loss 0.02130839 - time (sec): 455.20 - samples/sec: 490.46 - lr: 0.000055 - momentum: 0.000000
2023-10-11 10:54:40,102 epoch 7 - iter 890/893 - loss 0.02103175 - time (sec): 508.23 - samples/sec: 488.30 - lr: 0.000053 - momentum: 0.000000
2023-10-11 10:54:41,689 ----------------------------------------------------------------------------------------------------
2023-10-11 10:54:41,689 EPOCH 7 done: loss 0.0211 - lr: 0.000053
2023-10-11 10:55:04,149 DEV : loss 0.17123691737651825 - f1-score (micro avg) 0.7955
2023-10-11 10:55:04,179
----------------------------------------------------------------------------------------------------
2023-10-11 10:55:54,180 epoch 8 - iter 89/893 - loss 0.01527667 - time (sec): 50.00 - samples/sec: 493.94 - lr: 0.000052 - momentum: 0.000000
2023-10-11 10:56:44,974 epoch 8 - iter 178/893 - loss 0.01586546 - time (sec): 100.79 - samples/sec: 490.48 - lr: 0.000050 - momentum: 0.000000
2023-10-11 10:57:38,893 epoch 8 - iter 267/893 - loss 0.01414335 - time (sec): 154.71 - samples/sec: 470.32 - lr: 0.000048 - momentum: 0.000000
2023-10-11 10:58:32,085 epoch 8 - iter 356/893 - loss 0.01357556 - time (sec): 207.90 - samples/sec: 462.77 - lr: 0.000046 - momentum: 0.000000
2023-10-11 10:59:27,665 epoch 8 - iter 445/893 - loss 0.01506294 - time (sec): 263.48 - samples/sec: 455.96 - lr: 0.000045 - momentum: 0.000000
2023-10-11 11:00:20,764 epoch 8 - iter 534/893 - loss 0.01614469 - time (sec): 316.58 - samples/sec: 462.89 - lr: 0.000043 - momentum: 0.000000
2023-10-11 11:01:12,150 epoch 8 - iter 623/893 - loss 0.01576585 - time (sec): 367.97 - samples/sec: 468.57 - lr: 0.000041 - momentum: 0.000000
2023-10-11 11:02:04,707 epoch 8 - iter 712/893 - loss 0.01631603 - time (sec): 420.53 - samples/sec: 473.12 - lr: 0.000039 - momentum: 0.000000
2023-10-11 11:02:56,711 epoch 8 - iter 801/893 - loss 0.01698976 - time (sec): 472.53 - samples/sec: 474.98 - lr: 0.000037 - momentum: 0.000000
2023-10-11 11:03:46,712 epoch 8 - iter 890/893 - loss 0.01660442 - time (sec): 522.53 - samples/sec: 474.80 - lr: 0.000036 - momentum: 0.000000
2023-10-11 11:03:48,175 ----------------------------------------------------------------------------------------------------
2023-10-11 11:03:48,176 EPOCH 8 done: loss 0.0166 - lr: 0.000036
2023-10-11 11:04:09,326 DEV : loss 0.1897462159395218 - f1-score (micro avg) 0.8003
2023-10-11 11:04:09,356 ----------------------------------------------------------------------------------------------------
2023-10-11 11:04:59,464 epoch 9 - iter 89/893 - loss
0.01141607 - time (sec): 50.11 - samples/sec: 476.01 - lr: 0.000034 - momentum: 0.000000
2023-10-11 11:05:49,221 epoch 9 - iter 178/893 - loss 0.01061287 - time (sec): 99.86 - samples/sec: 467.98 - lr: 0.000032 - momentum: 0.000000
2023-10-11 11:06:37,768 epoch 9 - iter 267/893 - loss 0.01141794 - time (sec): 148.41 - samples/sec: 463.59 - lr: 0.000030 - momentum: 0.000000
2023-10-11 11:07:31,411 epoch 9 - iter 356/893 - loss 0.01110773 - time (sec): 202.05 - samples/sec: 469.68 - lr: 0.000029 - momentum: 0.000000
2023-10-11 11:08:27,558 epoch 9 - iter 445/893 - loss 0.01210552 - time (sec): 258.20 - samples/sec: 465.79 - lr: 0.000027 - momentum: 0.000000
2023-10-11 11:09:21,450 epoch 9 - iter 534/893 - loss 0.01286214 - time (sec): 312.09 - samples/sec: 468.30 - lr: 0.000025 - momentum: 0.000000
2023-10-11 11:10:14,841 epoch 9 - iter 623/893 - loss 0.01302278 - time (sec): 365.48 - samples/sec: 472.28 - lr: 0.000023 - momentum: 0.000000
2023-10-11 11:11:06,931 epoch 9 - iter 712/893 - loss 0.01261530 - time (sec): 417.57 - samples/sec: 475.71 - lr: 0.000022 - momentum: 0.000000
2023-10-11 11:11:59,563 epoch 9 - iter 801/893 - loss 0.01305653 - time (sec): 470.21 - samples/sec: 475.44 - lr: 0.000020 - momentum: 0.000000
2023-10-11 11:12:49,808 epoch 9 - iter 890/893 - loss 0.01318959 - time (sec): 520.45 - samples/sec: 475.89 - lr: 0.000018 - momentum: 0.000000
2023-10-11 11:12:51,530 ----------------------------------------------------------------------------------------------------
2023-10-11 11:12:51,530 EPOCH 9 done: loss 0.0131 - lr: 0.000018
2023-10-11 11:13:13,077 DEV : loss 0.19762861728668213 - f1-score (micro avg) 0.7971
2023-10-11 11:13:13,107 ----------------------------------------------------------------------------------------------------
2023-10-11 11:14:01,040 epoch 10 - iter 89/893 - loss 0.00903721 - time (sec): 47.93 - samples/sec: 520.50 - lr: 0.000016 - momentum: 0.000000
2023-10-11 11:14:48,169 epoch 10 - iter 178/893 - loss 0.01155862 - time
(sec): 95.06 - samples/sec: 506.09 - lr: 0.000014 - momentum: 0.000000
2023-10-11 11:15:36,368 epoch 10 - iter 267/893 - loss 0.01117476 - time (sec): 143.26 - samples/sec: 507.88 - lr: 0.000013 - momentum: 0.000000
2023-10-11 11:16:25,236 epoch 10 - iter 356/893 - loss 0.01092791 - time (sec): 192.13 - samples/sec: 511.73 - lr: 0.000011 - momentum: 0.000000
2023-10-11 11:17:15,043 epoch 10 - iter 445/893 - loss 0.01104186 - time (sec): 241.93 - samples/sec: 513.40 - lr: 0.000009 - momentum: 0.000000
2023-10-11 11:18:03,341 epoch 10 - iter 534/893 - loss 0.01048931 - time (sec): 290.23 - samples/sec: 512.33 - lr: 0.000007 - momentum: 0.000000
2023-10-11 11:18:53,582 epoch 10 - iter 623/893 - loss 0.01050700 - time (sec): 340.47 - samples/sec: 507.21 - lr: 0.000006 - momentum: 0.000000
2023-10-11 11:19:43,252 epoch 10 - iter 712/893 - loss 0.01002609 - time (sec): 390.14 - samples/sec: 506.48 - lr: 0.000004 - momentum: 0.000000
2023-10-11 11:20:33,114 epoch 10 - iter 801/893 - loss 0.00990462 - time (sec): 440.01 - samples/sec: 505.61 - lr: 0.000002 - momentum: 0.000000
2023-10-11 11:21:24,177 epoch 10 - iter 890/893 - loss 0.01009786 - time (sec): 491.07 - samples/sec: 505.52 - lr: 0.000000 - momentum: 0.000000
2023-10-11 11:21:25,540 ----------------------------------------------------------------------------------------------------
2023-10-11 11:21:25,540 EPOCH 10 done: loss 0.0101 - lr: 0.000000
2023-10-11 11:21:46,902 DEV : loss 0.2045479267835617 - f1-score (micro avg) 0.7949
2023-10-11 11:21:47,835 ----------------------------------------------------------------------------------------------------
2023-10-11 11:21:47,837 Loading model from best epoch ...
2023-10-11 11:21:51,551 SequenceTagger predicts: Dictionary with 17 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-11 11:23:00,255 Results:
- F-score (micro) 0.6987
- F-score (macro) 0.6131
- Accuracy 0.5529

By class:
              precision    recall  f1-score   support

         LOC     0.6829    0.7315    0.7063      1095
         PER     0.7810    0.7717    0.7763      1012
         ORG     0.4562    0.5686    0.5062       357
   HumanProd     0.3878    0.5758    0.4634        33

   micro avg     0.6764    0.7225    0.6987      2497
   macro avg     0.5769    0.6619    0.6131      2497
weighted avg     0.6863    0.7225    0.7029      2497

2023-10-11 11:23:00,255 ----------------------------------------------------------------------------------------------------