2023-10-13 08:55:31,689 ----------------------------------------------------------------------------------------------------
2023-10-13 08:55:31,690 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(32001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=25, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-13 08:55:31,690 ----------------------------------------------------------------------------------------------------
2023-10-13 08:55:31,691 MultiCorpus: 1100 train + 206 dev + 240 test sentences
 - NER_HIPE_2022 Corpus: 1100 train + 206 dev + 240 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/ajmc/de/with_doc_seperator
2023-10-13 08:55:31,691 ----------------------------------------------------------------------------------------------------
2023-10-13 08:55:31,691 Train: 1100 sentences
2023-10-13 08:55:31,691         (train_with_dev=False, train_with_test=False)
2023-10-13 08:55:31,691 ----------------------------------------------------------------------------------------------------
2023-10-13 08:55:31,691 Training Params:
2023-10-13 08:55:31,691  - learning_rate: "3e-05"
2023-10-13 08:55:31,691  - mini_batch_size: "8"
2023-10-13 08:55:31,691  - max_epochs: "10"
2023-10-13 08:55:31,691  - shuffle: "True"
2023-10-13 08:55:31,691 ----------------------------------------------------------------------------------------------------
2023-10-13 08:55:31,691 Plugins:
2023-10-13 08:55:31,691  - LinearScheduler | warmup_fraction: '0.1'
2023-10-13 08:55:31,691 ----------------------------------------------------------------------------------------------------
2023-10-13 08:55:31,691 Final evaluation on model from best epoch (best-model.pt)
2023-10-13 08:55:31,691  - metric: "('micro avg', 'f1-score')"
2023-10-13 08:55:31,691 ----------------------------------------------------------------------------------------------------
2023-10-13 08:55:31,691 Computation:
2023-10-13 08:55:31,691  - compute on device: cuda:0
2023-10-13 08:55:31,691  - embedding storage: none
2023-10-13 08:55:31,691 ----------------------------------------------------------------------------------------------------
2023-10-13 08:55:31,691 Model training base path: "hmbench-ajmc/de-dbmdz/bert-base-historic-multilingual-cased-bs8-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-5"
2023-10-13 08:55:31,691 ----------------------------------------------------------------------------------------------------
2023-10-13 08:55:31,691 ----------------------------------------------------------------------------------------------------
2023-10-13 08:55:32,450 epoch 1 - iter 13/138 - loss 3.23164083 - time (sec): 0.76 - samples/sec: 2657.16 - lr: 0.000003
- momentum: 0.000000 2023-10-13 08:55:33,231 epoch 1 - iter 26/138 - loss 3.07083348 - time (sec): 1.54 - samples/sec: 2592.11 - lr: 0.000005 - momentum: 0.000000 2023-10-13 08:55:34,010 epoch 1 - iter 39/138 - loss 2.71985264 - time (sec): 2.32 - samples/sec: 2587.53 - lr: 0.000008 - momentum: 0.000000 2023-10-13 08:55:34,819 epoch 1 - iter 52/138 - loss 2.24509201 - time (sec): 3.13 - samples/sec: 2663.88 - lr: 0.000011 - momentum: 0.000000 2023-10-13 08:55:35,628 epoch 1 - iter 65/138 - loss 1.93631872 - time (sec): 3.94 - samples/sec: 2694.87 - lr: 0.000014 - momentum: 0.000000 2023-10-13 08:55:36,459 epoch 1 - iter 78/138 - loss 1.73605929 - time (sec): 4.77 - samples/sec: 2683.95 - lr: 0.000017 - momentum: 0.000000 2023-10-13 08:55:37,270 epoch 1 - iter 91/138 - loss 1.57473090 - time (sec): 5.58 - samples/sec: 2672.60 - lr: 0.000020 - momentum: 0.000000 2023-10-13 08:55:38,072 epoch 1 - iter 104/138 - loss 1.43433242 - time (sec): 6.38 - samples/sec: 2698.19 - lr: 0.000022 - momentum: 0.000000 2023-10-13 08:55:38,741 epoch 1 - iter 117/138 - loss 1.33051183 - time (sec): 7.05 - samples/sec: 2752.06 - lr: 0.000025 - momentum: 0.000000 2023-10-13 08:55:39,432 epoch 1 - iter 130/138 - loss 1.24843363 - time (sec): 7.74 - samples/sec: 2756.81 - lr: 0.000028 - momentum: 0.000000 2023-10-13 08:55:39,882 ---------------------------------------------------------------------------------------------------- 2023-10-13 08:55:39,883 EPOCH 1 done: loss 1.2007 - lr: 0.000028 2023-10-13 08:55:40,600 DEV : loss 0.28690722584724426 - f1-score (micro avg) 0.6651 2023-10-13 08:55:40,605 saving best model 2023-10-13 08:55:40,989 ---------------------------------------------------------------------------------------------------- 2023-10-13 08:55:41,681 epoch 2 - iter 13/138 - loss 0.32963642 - time (sec): 0.69 - samples/sec: 3035.52 - lr: 0.000030 - momentum: 0.000000 2023-10-13 08:55:42,381 epoch 2 - iter 26/138 - loss 0.27672936 - time (sec): 1.39 - samples/sec: 3080.92 - lr: 
0.000029 - momentum: 0.000000 2023-10-13 08:55:43,105 epoch 2 - iter 39/138 - loss 0.24838311 - time (sec): 2.11 - samples/sec: 3088.92 - lr: 0.000029 - momentum: 0.000000 2023-10-13 08:55:43,865 epoch 2 - iter 52/138 - loss 0.24825315 - time (sec): 2.87 - samples/sec: 3116.82 - lr: 0.000029 - momentum: 0.000000 2023-10-13 08:55:44,587 epoch 2 - iter 65/138 - loss 0.24919275 - time (sec): 3.60 - samples/sec: 3175.76 - lr: 0.000028 - momentum: 0.000000 2023-10-13 08:55:45,265 epoch 2 - iter 78/138 - loss 0.24381568 - time (sec): 4.27 - samples/sec: 3108.71 - lr: 0.000028 - momentum: 0.000000 2023-10-13 08:55:45,969 epoch 2 - iter 91/138 - loss 0.23905698 - time (sec): 4.98 - samples/sec: 3065.00 - lr: 0.000028 - momentum: 0.000000 2023-10-13 08:55:46,728 epoch 2 - iter 104/138 - loss 0.23512468 - time (sec): 5.74 - samples/sec: 3041.71 - lr: 0.000028 - momentum: 0.000000 2023-10-13 08:55:47,505 epoch 2 - iter 117/138 - loss 0.22772616 - time (sec): 6.51 - samples/sec: 3004.87 - lr: 0.000027 - momentum: 0.000000 2023-10-13 08:55:48,178 epoch 2 - iter 130/138 - loss 0.22406658 - time (sec): 7.19 - samples/sec: 3008.30 - lr: 0.000027 - momentum: 0.000000 2023-10-13 08:55:48,628 ---------------------------------------------------------------------------------------------------- 2023-10-13 08:55:48,629 EPOCH 2 done: loss 0.2199 - lr: 0.000027 2023-10-13 08:55:49,287 DEV : loss 0.14626161754131317 - f1-score (micro avg) 0.7981 2023-10-13 08:55:49,292 saving best model 2023-10-13 08:55:49,827 ---------------------------------------------------------------------------------------------------- 2023-10-13 08:55:50,567 epoch 3 - iter 13/138 - loss 0.11244007 - time (sec): 0.74 - samples/sec: 2858.37 - lr: 0.000026 - momentum: 0.000000 2023-10-13 08:55:51,350 epoch 3 - iter 26/138 - loss 0.09545981 - time (sec): 1.52 - samples/sec: 2944.50 - lr: 0.000026 - momentum: 0.000000 2023-10-13 08:55:52,041 epoch 3 - iter 39/138 - loss 0.11179342 - time (sec): 2.21 - samples/sec: 
2936.77 - lr: 0.000026 - momentum: 0.000000 2023-10-13 08:55:52,722 epoch 3 - iter 52/138 - loss 0.10944003 - time (sec): 2.89 - samples/sec: 2886.44 - lr: 0.000025 - momentum: 0.000000 2023-10-13 08:55:53,434 epoch 3 - iter 65/138 - loss 0.10962130 - time (sec): 3.60 - samples/sec: 2950.32 - lr: 0.000025 - momentum: 0.000000 2023-10-13 08:55:54,143 epoch 3 - iter 78/138 - loss 0.10776067 - time (sec): 4.31 - samples/sec: 2949.96 - lr: 0.000025 - momentum: 0.000000 2023-10-13 08:55:54,863 epoch 3 - iter 91/138 - loss 0.11389656 - time (sec): 5.03 - samples/sec: 3003.46 - lr: 0.000025 - momentum: 0.000000 2023-10-13 08:55:55,586 epoch 3 - iter 104/138 - loss 0.11174026 - time (sec): 5.75 - samples/sec: 3002.22 - lr: 0.000024 - momentum: 0.000000 2023-10-13 08:55:56,327 epoch 3 - iter 117/138 - loss 0.11375118 - time (sec): 6.50 - samples/sec: 3004.08 - lr: 0.000024 - momentum: 0.000000 2023-10-13 08:55:57,034 epoch 3 - iter 130/138 - loss 0.11290363 - time (sec): 7.20 - samples/sec: 2992.35 - lr: 0.000024 - momentum: 0.000000 2023-10-13 08:55:57,479 ---------------------------------------------------------------------------------------------------- 2023-10-13 08:55:57,480 EPOCH 3 done: loss 0.1138 - lr: 0.000024 2023-10-13 08:55:58,116 DEV : loss 0.13030706346035004 - f1-score (micro avg) 0.8351 2023-10-13 08:55:58,122 saving best model 2023-10-13 08:55:58,619 ---------------------------------------------------------------------------------------------------- 2023-10-13 08:55:59,326 epoch 4 - iter 13/138 - loss 0.05847433 - time (sec): 0.70 - samples/sec: 3245.80 - lr: 0.000023 - momentum: 0.000000 2023-10-13 08:56:00,013 epoch 4 - iter 26/138 - loss 0.07319254 - time (sec): 1.39 - samples/sec: 3278.88 - lr: 0.000023 - momentum: 0.000000 2023-10-13 08:56:00,755 epoch 4 - iter 39/138 - loss 0.06410931 - time (sec): 2.13 - samples/sec: 3160.08 - lr: 0.000022 - momentum: 0.000000 2023-10-13 08:56:01,417 epoch 4 - iter 52/138 - loss 0.06694482 - time (sec): 2.79 - 
samples/sec: 3055.12 - lr: 0.000022 - momentum: 0.000000 2023-10-13 08:56:02,160 epoch 4 - iter 65/138 - loss 0.07449259 - time (sec): 3.54 - samples/sec: 2987.09 - lr: 0.000022 - momentum: 0.000000 2023-10-13 08:56:02,857 epoch 4 - iter 78/138 - loss 0.07065528 - time (sec): 4.23 - samples/sec: 2995.98 - lr: 0.000021 - momentum: 0.000000 2023-10-13 08:56:03,582 epoch 4 - iter 91/138 - loss 0.07272318 - time (sec): 4.96 - samples/sec: 2970.50 - lr: 0.000021 - momentum: 0.000000 2023-10-13 08:56:04,384 epoch 4 - iter 104/138 - loss 0.07925456 - time (sec): 5.76 - samples/sec: 2982.04 - lr: 0.000021 - momentum: 0.000000 2023-10-13 08:56:05,116 epoch 4 - iter 117/138 - loss 0.07849385 - time (sec): 6.49 - samples/sec: 2969.64 - lr: 0.000021 - momentum: 0.000000 2023-10-13 08:56:05,816 epoch 4 - iter 130/138 - loss 0.08000911 - time (sec): 7.19 - samples/sec: 2974.95 - lr: 0.000020 - momentum: 0.000000 2023-10-13 08:56:06,282 ---------------------------------------------------------------------------------------------------- 2023-10-13 08:56:06,283 EPOCH 4 done: loss 0.0806 - lr: 0.000020 2023-10-13 08:56:06,929 DEV : loss 0.11944959312677383 - f1-score (micro avg) 0.8647 2023-10-13 08:56:06,934 saving best model 2023-10-13 08:56:07,410 ---------------------------------------------------------------------------------------------------- 2023-10-13 08:56:08,184 epoch 5 - iter 13/138 - loss 0.04455185 - time (sec): 0.77 - samples/sec: 2978.01 - lr: 0.000020 - momentum: 0.000000 2023-10-13 08:56:08,883 epoch 5 - iter 26/138 - loss 0.05992813 - time (sec): 1.47 - samples/sec: 2969.05 - lr: 0.000019 - momentum: 0.000000 2023-10-13 08:56:09,579 epoch 5 - iter 39/138 - loss 0.06847538 - time (sec): 2.17 - samples/sec: 2968.84 - lr: 0.000019 - momentum: 0.000000 2023-10-13 08:56:10,312 epoch 5 - iter 52/138 - loss 0.06921582 - time (sec): 2.90 - samples/sec: 3006.87 - lr: 0.000019 - momentum: 0.000000 2023-10-13 08:56:11,043 epoch 5 - iter 65/138 - loss 0.06454351 - time (sec): 
3.63 - samples/sec: 3037.21 - lr: 0.000018 - momentum: 0.000000 2023-10-13 08:56:11,741 epoch 5 - iter 78/138 - loss 0.06104228 - time (sec): 4.33 - samples/sec: 2991.88 - lr: 0.000018 - momentum: 0.000000 2023-10-13 08:56:12,423 epoch 5 - iter 91/138 - loss 0.06053616 - time (sec): 5.01 - samples/sec: 3012.00 - lr: 0.000018 - momentum: 0.000000 2023-10-13 08:56:13,136 epoch 5 - iter 104/138 - loss 0.05702704 - time (sec): 5.72 - samples/sec: 3007.00 - lr: 0.000018 - momentum: 0.000000 2023-10-13 08:56:13,889 epoch 5 - iter 117/138 - loss 0.06057969 - time (sec): 6.48 - samples/sec: 3008.34 - lr: 0.000017 - momentum: 0.000000 2023-10-13 08:56:14,585 epoch 5 - iter 130/138 - loss 0.05796395 - time (sec): 7.17 - samples/sec: 2983.23 - lr: 0.000017 - momentum: 0.000000 2023-10-13 08:56:15,051 ---------------------------------------------------------------------------------------------------- 2023-10-13 08:56:15,051 EPOCH 5 done: loss 0.0560 - lr: 0.000017 2023-10-13 08:56:15,712 DEV : loss 0.14307376742362976 - f1-score (micro avg) 0.8645 2023-10-13 08:56:15,718 ---------------------------------------------------------------------------------------------------- 2023-10-13 08:56:16,409 epoch 6 - iter 13/138 - loss 0.05583016 - time (sec): 0.69 - samples/sec: 3077.69 - lr: 0.000016 - momentum: 0.000000 2023-10-13 08:56:17,103 epoch 6 - iter 26/138 - loss 0.05558456 - time (sec): 1.38 - samples/sec: 2980.23 - lr: 0.000016 - momentum: 0.000000 2023-10-13 08:56:17,794 epoch 6 - iter 39/138 - loss 0.06217709 - time (sec): 2.08 - samples/sec: 2932.93 - lr: 0.000016 - momentum: 0.000000 2023-10-13 08:56:18,592 epoch 6 - iter 52/138 - loss 0.05287790 - time (sec): 2.87 - samples/sec: 2886.05 - lr: 0.000015 - momentum: 0.000000 2023-10-13 08:56:19,321 epoch 6 - iter 65/138 - loss 0.05224642 - time (sec): 3.60 - samples/sec: 2871.09 - lr: 0.000015 - momentum: 0.000000 2023-10-13 08:56:20,086 epoch 6 - iter 78/138 - loss 0.05314834 - time (sec): 4.37 - samples/sec: 2851.29 - lr: 
0.000015 - momentum: 0.000000 2023-10-13 08:56:20,850 epoch 6 - iter 91/138 - loss 0.05054307 - time (sec): 5.13 - samples/sec: 2839.96 - lr: 0.000015 - momentum: 0.000000 2023-10-13 08:56:21,583 epoch 6 - iter 104/138 - loss 0.04958115 - time (sec): 5.86 - samples/sec: 2864.84 - lr: 0.000014 - momentum: 0.000000 2023-10-13 08:56:22,358 epoch 6 - iter 117/138 - loss 0.04491562 - time (sec): 6.64 - samples/sec: 2897.75 - lr: 0.000014 - momentum: 0.000000 2023-10-13 08:56:23,124 epoch 6 - iter 130/138 - loss 0.04152667 - time (sec): 7.41 - samples/sec: 2908.51 - lr: 0.000014 - momentum: 0.000000 2023-10-13 08:56:23,611 ---------------------------------------------------------------------------------------------------- 2023-10-13 08:56:23,612 EPOCH 6 done: loss 0.0423 - lr: 0.000014 2023-10-13 08:56:24,252 DEV : loss 0.15504451096057892 - f1-score (micro avg) 0.8682 2023-10-13 08:56:24,258 saving best model 2023-10-13 08:56:24,735 ---------------------------------------------------------------------------------------------------- 2023-10-13 08:56:25,476 epoch 7 - iter 13/138 - loss 0.03028371 - time (sec): 0.73 - samples/sec: 2893.04 - lr: 0.000013 - momentum: 0.000000 2023-10-13 08:56:26,194 epoch 7 - iter 26/138 - loss 0.03618885 - time (sec): 1.45 - samples/sec: 3014.09 - lr: 0.000013 - momentum: 0.000000 2023-10-13 08:56:26,878 epoch 7 - iter 39/138 - loss 0.03348011 - time (sec): 2.14 - samples/sec: 2976.79 - lr: 0.000012 - momentum: 0.000000 2023-10-13 08:56:27,628 epoch 7 - iter 52/138 - loss 0.03659548 - time (sec): 2.89 - samples/sec: 3003.87 - lr: 0.000012 - momentum: 0.000000 2023-10-13 08:56:28,349 epoch 7 - iter 65/138 - loss 0.03402997 - time (sec): 3.61 - samples/sec: 2985.22 - lr: 0.000012 - momentum: 0.000000 2023-10-13 08:56:29,048 epoch 7 - iter 78/138 - loss 0.03277293 - time (sec): 4.31 - samples/sec: 2950.15 - lr: 0.000012 - momentum: 0.000000 2023-10-13 08:56:29,733 epoch 7 - iter 91/138 - loss 0.02973205 - time (sec): 4.99 - samples/sec: 
2960.37 - lr: 0.000011 - momentum: 0.000000 2023-10-13 08:56:30,424 epoch 7 - iter 104/138 - loss 0.03177800 - time (sec): 5.68 - samples/sec: 2961.55 - lr: 0.000011 - momentum: 0.000000 2023-10-13 08:56:31,156 epoch 7 - iter 117/138 - loss 0.03571944 - time (sec): 6.41 - samples/sec: 2945.82 - lr: 0.000011 - momentum: 0.000000 2023-10-13 08:56:31,861 epoch 7 - iter 130/138 - loss 0.03317145 - time (sec): 7.12 - samples/sec: 2995.83 - lr: 0.000010 - momentum: 0.000000 2023-10-13 08:56:32,346 ---------------------------------------------------------------------------------------------------- 2023-10-13 08:56:32,347 EPOCH 7 done: loss 0.0344 - lr: 0.000010 2023-10-13 08:56:33,038 DEV : loss 0.16303503513336182 - f1-score (micro avg) 0.869 2023-10-13 08:56:33,043 saving best model 2023-10-13 08:56:33,506 ---------------------------------------------------------------------------------------------------- 2023-10-13 08:56:34,208 epoch 8 - iter 13/138 - loss 0.04174621 - time (sec): 0.70 - samples/sec: 3112.07 - lr: 0.000010 - momentum: 0.000000 2023-10-13 08:56:34,943 epoch 8 - iter 26/138 - loss 0.03564655 - time (sec): 1.44 - samples/sec: 3135.33 - lr: 0.000009 - momentum: 0.000000 2023-10-13 08:56:35,690 epoch 8 - iter 39/138 - loss 0.03078857 - time (sec): 2.18 - samples/sec: 3087.36 - lr: 0.000009 - momentum: 0.000000 2023-10-13 08:56:36,438 epoch 8 - iter 52/138 - loss 0.02824037 - time (sec): 2.93 - samples/sec: 2999.92 - lr: 0.000009 - momentum: 0.000000 2023-10-13 08:56:37,114 epoch 8 - iter 65/138 - loss 0.03416201 - time (sec): 3.61 - samples/sec: 2976.52 - lr: 0.000009 - momentum: 0.000000 2023-10-13 08:56:37,809 epoch 8 - iter 78/138 - loss 0.03021071 - time (sec): 4.30 - samples/sec: 3001.05 - lr: 0.000008 - momentum: 0.000000 2023-10-13 08:56:38,523 epoch 8 - iter 91/138 - loss 0.02979412 - time (sec): 5.02 - samples/sec: 2951.07 - lr: 0.000008 - momentum: 0.000000 2023-10-13 08:56:39,292 epoch 8 - iter 104/138 - loss 0.03054127 - time (sec): 5.78 - 
samples/sec: 2948.92 - lr: 0.000008 - momentum: 0.000000 2023-10-13 08:56:39,952 epoch 8 - iter 117/138 - loss 0.02900375 - time (sec): 6.44 - samples/sec: 2961.97 - lr: 0.000007 - momentum: 0.000000 2023-10-13 08:56:40,794 epoch 8 - iter 130/138 - loss 0.02735633 - time (sec): 7.29 - samples/sec: 2942.14 - lr: 0.000007 - momentum: 0.000000 2023-10-13 08:56:41,285 ---------------------------------------------------------------------------------------------------- 2023-10-13 08:56:41,285 EPOCH 8 done: loss 0.0269 - lr: 0.000007 2023-10-13 08:56:41,985 DEV : loss 0.16464966535568237 - f1-score (micro avg) 0.8653 2023-10-13 08:56:41,992 ---------------------------------------------------------------------------------------------------- 2023-10-13 08:56:42,836 epoch 9 - iter 13/138 - loss 0.01084989 - time (sec): 0.84 - samples/sec: 2749.55 - lr: 0.000006 - momentum: 0.000000 2023-10-13 08:56:43,716 epoch 9 - iter 26/138 - loss 0.01287242 - time (sec): 1.72 - samples/sec: 2578.66 - lr: 0.000006 - momentum: 0.000000 2023-10-13 08:56:44,506 epoch 9 - iter 39/138 - loss 0.00954964 - time (sec): 2.51 - samples/sec: 2576.09 - lr: 0.000006 - momentum: 0.000000 2023-10-13 08:56:45,286 epoch 9 - iter 52/138 - loss 0.01496470 - time (sec): 3.29 - samples/sec: 2656.45 - lr: 0.000005 - momentum: 0.000000 2023-10-13 08:56:46,055 epoch 9 - iter 65/138 - loss 0.02251741 - time (sec): 4.06 - samples/sec: 2712.53 - lr: 0.000005 - momentum: 0.000000 2023-10-13 08:56:46,816 epoch 9 - iter 78/138 - loss 0.02222717 - time (sec): 4.82 - samples/sec: 2748.33 - lr: 0.000005 - momentum: 0.000000 2023-10-13 08:56:47,576 epoch 9 - iter 91/138 - loss 0.02364398 - time (sec): 5.58 - samples/sec: 2750.28 - lr: 0.000005 - momentum: 0.000000 2023-10-13 08:56:48,361 epoch 9 - iter 104/138 - loss 0.02214873 - time (sec): 6.37 - samples/sec: 2747.78 - lr: 0.000004 - momentum: 0.000000 2023-10-13 08:56:49,171 epoch 9 - iter 117/138 - loss 0.02128482 - time (sec): 7.18 - samples/sec: 2744.24 - lr: 
0.000004 - momentum: 0.000000 2023-10-13 08:56:49,896 epoch 9 - iter 130/138 - loss 0.01985708 - time (sec): 7.90 - samples/sec: 2736.81 - lr: 0.000004 - momentum: 0.000000 2023-10-13 08:56:50,363 ---------------------------------------------------------------------------------------------------- 2023-10-13 08:56:50,364 EPOCH 9 done: loss 0.0213 - lr: 0.000004 2023-10-13 08:56:51,039 DEV : loss 0.157485231757164 - f1-score (micro avg) 0.878 2023-10-13 08:56:51,044 saving best model 2023-10-13 08:56:51,521 ---------------------------------------------------------------------------------------------------- 2023-10-13 08:56:52,311 epoch 10 - iter 13/138 - loss 0.03842880 - time (sec): 0.79 - samples/sec: 2703.78 - lr: 0.000003 - momentum: 0.000000 2023-10-13 08:56:53,031 epoch 10 - iter 26/138 - loss 0.03642163 - time (sec): 1.50 - samples/sec: 2895.86 - lr: 0.000003 - momentum: 0.000000 2023-10-13 08:56:53,800 epoch 10 - iter 39/138 - loss 0.03099660 - time (sec): 2.27 - samples/sec: 2874.39 - lr: 0.000002 - momentum: 0.000000 2023-10-13 08:56:54,572 epoch 10 - iter 52/138 - loss 0.02679781 - time (sec): 3.05 - samples/sec: 2869.85 - lr: 0.000002 - momentum: 0.000000 2023-10-13 08:56:55,332 epoch 10 - iter 65/138 - loss 0.02419200 - time (sec): 3.81 - samples/sec: 2883.76 - lr: 0.000002 - momentum: 0.000000 2023-10-13 08:56:56,119 epoch 10 - iter 78/138 - loss 0.02172226 - time (sec): 4.59 - samples/sec: 2870.91 - lr: 0.000002 - momentum: 0.000000 2023-10-13 08:56:56,817 epoch 10 - iter 91/138 - loss 0.02109726 - time (sec): 5.29 - samples/sec: 2881.57 - lr: 0.000001 - momentum: 0.000000 2023-10-13 08:56:57,564 epoch 10 - iter 104/138 - loss 0.02084847 - time (sec): 6.04 - samples/sec: 2899.55 - lr: 0.000001 - momentum: 0.000000 2023-10-13 08:56:58,282 epoch 10 - iter 117/138 - loss 0.02014893 - time (sec): 6.76 - samples/sec: 2904.60 - lr: 0.000001 - momentum: 0.000000 2023-10-13 08:56:58,985 epoch 10 - iter 130/138 - loss 0.01966590 - time (sec): 7.46 - 
samples/sec: 2907.61 - lr: 0.000000 - momentum: 0.000000
2023-10-13 08:56:59,403 ----------------------------------------------------------------------------------------------------
2023-10-13 08:56:59,404 EPOCH 10 done: loss 0.0194 - lr: 0.000000
2023-10-13 08:57:00,100 DEV : loss 0.15935906767845154 - f1-score (micro avg) 0.8809
2023-10-13 08:57:00,105 saving best model
2023-10-13 08:57:00,891 ----------------------------------------------------------------------------------------------------
2023-10-13 08:57:00,893 Loading model from best epoch ...
2023-10-13 08:57:02,434 SequenceTagger predicts: Dictionary with 25 tags: O, S-scope, B-scope, E-scope, I-scope, S-pers, B-pers, E-pers, I-pers, S-work, B-work, E-work, I-work, S-loc, B-loc, E-loc, I-loc, S-object, B-object, E-object, I-object, S-date, B-date, E-date, I-date
2023-10-13 08:57:03,219
Results:
- F-score (micro) 0.9174
- F-score (macro) 0.7519
- Accuracy 0.8537

By class:
              precision    recall  f1-score   support

       scope     0.8876    0.8977    0.8927       176
        pers     0.9688    0.9688    0.9688       128
        work     0.9041    0.8919    0.8980        74
         loc     1.0000    1.0000    1.0000         2
      object     0.0000    0.0000    0.0000         2

   micro avg     0.9186    0.9162    0.9174       382
   macro avg     0.7521    0.7517    0.7519       382
weighted avg     0.9139    0.9162    0.9151       382

2023-10-13 08:57:03,219 ----------------------------------------------------------------------------------------------------