2023-10-16 18:23:52,773 ----------------------------------------------------------------------------------------------------
2023-10-16 18:23:52,774 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(32001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-16 18:23:52,774 ----------------------------------------------------------------------------------------------------
2023-10-16 18:23:52,774 MultiCorpus: 1166 train + 165 dev + 415 test sentences
 - NER_HIPE_2022 Corpus: 1166 train + 165 dev + 415 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fi/with_doc_seperator
2023-10-16 18:23:52,774 ----------------------------------------------------------------------------------------------------
2023-10-16 18:23:52,774 Train: 1166 sentences
2023-10-16 18:23:52,774 (train_with_dev=False, train_with_test=False)
2023-10-16 18:23:52,774 ----------------------------------------------------------------------------------------------------
2023-10-16 18:23:52,774 Training Params:
2023-10-16 18:23:52,774  - learning_rate: "5e-05"
2023-10-16 18:23:52,774  - mini_batch_size: "4"
2023-10-16 18:23:52,774  - max_epochs: "10"
2023-10-16 18:23:52,774  - shuffle: "True"
2023-10-16 18:23:52,774 ----------------------------------------------------------------------------------------------------
2023-10-16 18:23:52,774 Plugins:
2023-10-16 18:23:52,775  - LinearScheduler | warmup_fraction: '0.1'
2023-10-16 18:23:52,775 ----------------------------------------------------------------------------------------------------
2023-10-16 18:23:52,775 Final evaluation on model from best epoch (best-model.pt)
2023-10-16 18:23:52,775  - metric: "('micro avg', 'f1-score')"
2023-10-16 18:23:52,775 ----------------------------------------------------------------------------------------------------
2023-10-16 18:23:52,775 Computation:
2023-10-16 18:23:52,775  - compute on device: cuda:0
2023-10-16 18:23:52,775  - embedding storage: none
2023-10-16 18:23:52,775 ----------------------------------------------------------------------------------------------------
2023-10-16 18:23:52,775 Model training base path: "hmbench-newseye/fi-dbmdz/bert-base-historic-multilingual-cased-bs4-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-3"
2023-10-16 18:23:52,775 ----------------------------------------------------------------------------------------------------
2023-10-16 18:23:52,775 ----------------------------------------------------------------------------------------------------
2023-10-16 18:23:54,355 epoch 1 - iter 29/292 - loss 2.82861770 - time (sec): 1.58 - samples/sec: 2569.51 - lr: 0.000005 - momentum: 0.000000
2023-10-16 18:23:55,895 epoch 1 - iter 58/292 - loss 2.27049059 - time (sec): 3.12 - samples/sec: 2418.01 - lr: 0.000010 - momentum: 0.000000
2023-10-16 18:23:57,801 epoch 1 - iter 87/292 - loss 1.50862869 - time (sec): 5.02 - samples/sec: 2581.74 - lr: 0.000015 - momentum: 0.000000
2023-10-16 18:23:59,391 epoch 1 - iter 116/292 - loss 1.29469874 - time (sec): 6.62 - samples/sec: 2575.35 - lr: 0.000020 - momentum: 0.000000
2023-10-16 18:24:00,959 epoch 1 - iter 145/292 - loss 1.16113362 - time (sec): 8.18 - samples/sec: 2577.50 - lr: 0.000025 - momentum: 0.000000
2023-10-16 18:24:02,460 epoch 1 - iter 174/292 - loss 1.04202470 - time (sec): 9.68 - samples/sec: 2554.20 - lr: 0.000030 - momentum: 0.000000
2023-10-16 18:24:04,304 epoch 1 - iter 203/292 - loss 0.92592921 - time (sec): 11.53 - samples/sec: 2608.57 - lr: 0.000035 - momentum: 0.000000
2023-10-16 18:24:06,039 epoch 1 - iter 232/292 - loss 0.82865037 - time (sec): 13.26 - samples/sec: 2633.60 - lr: 0.000040 - momentum: 0.000000
2023-10-16 18:24:07,694 epoch 1 - iter 261/292 - loss 0.76127572 - time (sec): 14.92 - samples/sec: 2634.58 - lr: 0.000045 - momentum: 0.000000
2023-10-16 18:24:09,386 epoch 1 - iter 290/292 - loss 0.70436125 - time (sec): 16.61 - samples/sec: 2661.85 - lr: 0.000049 - momentum: 0.000000
2023-10-16 18:24:09,480 ----------------------------------------------------------------------------------------------------
2023-10-16 18:24:09,480 EPOCH 1 done: loss 0.7015 - lr: 0.000049
2023-10-16 18:24:10,491 DEV : loss 0.198698028922081 - f1-score (micro avg) 0.4183
2023-10-16 18:24:10,496 saving best model
2023-10-16 18:24:10,894 ----------------------------------------------------------------------------------------------------
2023-10-16 18:24:12,579 epoch 2 - iter 29/292 - loss 0.21870353 - time (sec): 1.68 - samples/sec: 2890.96 - lr: 0.000049 - momentum: 0.000000
2023-10-16 18:24:14,360 epoch 2 - iter 58/292 - loss 0.19710288 - time (sec): 3.46 - samples/sec: 2792.34 - lr: 0.000049 - momentum: 0.000000
2023-10-16 18:24:15,863 epoch 2 - iter 87/292 - loss 0.20636583 - time (sec): 4.97 - samples/sec: 2748.51 - lr: 0.000048 - momentum: 0.000000
2023-10-16 18:24:17,463 epoch 2 - iter 116/292 - loss 0.19745140 - time (sec): 6.57 - samples/sec: 2680.86 - lr: 0.000048 - momentum: 0.000000
2023-10-16 18:24:19,192 epoch 2 - iter 145/292 - loss 0.19457600 - time (sec): 8.30 - samples/sec: 2670.91 - lr: 0.000047 - momentum: 0.000000
2023-10-16 18:24:20,873 epoch 2 - iter 174/292 - loss 0.19655143 - time (sec): 9.98 - samples/sec: 2701.15 - lr: 0.000047 - momentum: 0.000000
2023-10-16 18:24:22,548 epoch 2 - iter 203/292 - loss 0.19279805 - time (sec): 11.65 - samples/sec: 2695.02 - lr: 0.000046 - momentum: 0.000000
2023-10-16 18:24:24,046 epoch 2 - iter 232/292 - loss 0.18956283 - time (sec): 13.15 - samples/sec: 2679.43 - lr: 0.000046 - momentum: 0.000000
2023-10-16 18:24:25,691 epoch 2 - iter 261/292 - loss 0.18794363 - time (sec): 14.80 - samples/sec: 2702.80 - lr: 0.000045 - momentum: 0.000000
2023-10-16 18:24:27,392 epoch 2 - iter 290/292 - loss 0.18131152 - time (sec): 16.50 - samples/sec: 2688.04 - lr: 0.000045 - momentum: 0.000000
2023-10-16 18:24:27,495 ----------------------------------------------------------------------------------------------------
2023-10-16 18:24:27,495 EPOCH 2 done: loss 0.1809 - lr: 0.000045
2023-10-16 18:24:28,829 DEV : loss 0.12991848587989807 - f1-score (micro avg) 0.6681
2023-10-16 18:24:28,836 saving best model
2023-10-16 18:24:29,353 ----------------------------------------------------------------------------------------------------
2023-10-16 18:24:31,427 epoch 3 - iter 29/292 - loss 0.14634692 - time (sec): 2.07 - samples/sec: 2612.34 - lr: 0.000044 - momentum: 0.000000
2023-10-16 18:24:33,032 epoch 3 - iter 58/292 - loss 0.15575082 - time (sec): 3.68 - samples/sec: 2612.36 - lr: 0.000043 - momentum: 0.000000
2023-10-16 18:24:34,896 epoch 3 - iter 87/292 - loss 0.14045427 - time (sec): 5.54 - samples/sec: 2653.09 - lr: 0.000043 - momentum: 0.000000
2023-10-16 18:24:36,530 epoch 3 - iter 116/292 - loss 0.12675597 - time (sec): 7.17 - samples/sec: 2688.94 - lr: 0.000042 - momentum: 0.000000
2023-10-16 18:24:38,274 epoch 3 - iter 145/292 - loss 0.12307056 - time (sec): 8.92 - samples/sec: 2729.57 - lr: 0.000042 - momentum: 0.000000
2023-10-16 18:24:39,809 epoch 3 - iter 174/292 - loss 0.12034832 - time (sec): 10.45 - samples/sec: 2692.55 - lr: 0.000041 - momentum: 0.000000
2023-10-16 18:24:41,378 epoch 3 - iter 203/292 - loss 0.11494198 - time (sec): 12.02 - samples/sec: 2652.33 - lr: 0.000041 - momentum: 0.000000
2023-10-16 18:24:42,914 epoch 3 - iter 232/292 - loss 0.11259923 - time (sec): 13.56 - samples/sec: 2679.55 - lr: 0.000040 - momentum: 0.000000
2023-10-16 18:24:44,531 epoch 3 - iter 261/292 - loss 0.11038783 - time (sec): 15.18 - samples/sec: 2660.06 - lr: 0.000040 - momentum: 0.000000
2023-10-16 18:24:46,091 epoch 3 - iter 290/292 - loss 0.10544311 - time (sec): 16.74 - samples/sec: 2644.00 - lr: 0.000039 - momentum: 0.000000
2023-10-16 18:24:46,179 ----------------------------------------------------------------------------------------------------
2023-10-16 18:24:46,180 EPOCH 3 done: loss 0.1058 - lr: 0.000039
2023-10-16 18:24:47,660 DEV : loss 0.14769691228866577 - f1-score (micro avg) 0.6436
2023-10-16 18:24:47,665 ----------------------------------------------------------------------------------------------------
2023-10-16 18:24:49,228 epoch 4 - iter 29/292 - loss 0.07509580 - time (sec): 1.56 - samples/sec: 2524.12 - lr: 0.000038 - momentum: 0.000000
2023-10-16 18:24:50,823 epoch 4 - iter 58/292 - loss 0.07886125 - time (sec): 3.16 - samples/sec: 2606.97 - lr: 0.000038 - momentum: 0.000000
2023-10-16 18:24:52,434 epoch 4 - iter 87/292 - loss 0.08386891 - time (sec): 4.77 - samples/sec: 2589.31 - lr: 0.000037 - momentum: 0.000000
2023-10-16 18:24:54,155 epoch 4 - iter 116/292 - loss 0.07402682 - time (sec): 6.49 - samples/sec: 2590.61 - lr: 0.000037 - momentum: 0.000000
2023-10-16 18:24:55,743 epoch 4 - iter 145/292 - loss 0.07089413 - time (sec): 8.08 - samples/sec: 2597.15 - lr: 0.000036 - momentum: 0.000000
2023-10-16 18:24:57,409 epoch 4 - iter 174/292 - loss 0.07525889 - time (sec): 9.74 - samples/sec: 2623.03 - lr: 0.000036 - momentum: 0.000000
2023-10-16 18:24:59,095 epoch 4 - iter 203/292 - loss 0.07936455 - time (sec): 11.43 - samples/sec: 2579.90 - lr: 0.000035 - momentum: 0.000000
2023-10-16 18:25:00,743 epoch 4 - iter 232/292 - loss 0.08057674 - time (sec): 13.08 - samples/sec: 2607.69 - lr: 0.000035 - momentum: 0.000000
2023-10-16 18:25:02,362 epoch 4 - iter 261/292 - loss 0.07932140 - time (sec): 14.70 - samples/sec: 2619.95 - lr: 0.000034 - momentum: 0.000000
2023-10-16 18:25:04,245 epoch 4 - iter 290/292 - loss 0.07381505 - time (sec): 16.58 - samples/sec: 2668.67 - lr: 0.000033 - momentum: 0.000000
2023-10-16 18:25:04,334 ----------------------------------------------------------------------------------------------------
2023-10-16 18:25:04,334 EPOCH 4 done: loss 0.0737 - lr: 0.000033
2023-10-16 18:25:05,677 DEV : loss 0.12834103405475616 - f1-score (micro avg) 0.7359
2023-10-16 18:25:05,683 saving best model
2023-10-16 18:25:06,300 ----------------------------------------------------------------------------------------------------
2023-10-16 18:25:08,081 epoch 5 - iter 29/292 - loss 0.07177539 - time (sec): 1.78 - samples/sec: 2425.89 - lr: 0.000033 - momentum: 0.000000
2023-10-16 18:25:09,758 epoch 5 - iter 58/292 - loss 0.06584511 - time (sec): 3.46 - samples/sec: 2491.49 - lr: 0.000032 - momentum: 0.000000
2023-10-16 18:25:11,555 epoch 5 - iter 87/292 - loss 0.05710451 - time (sec): 5.25 - samples/sec: 2522.65 - lr: 0.000032 - momentum: 0.000000
2023-10-16 18:25:13,217 epoch 5 - iter 116/292 - loss 0.05716308 - time (sec): 6.92 - samples/sec: 2475.43 - lr: 0.000031 - momentum: 0.000000
2023-10-16 18:25:14,897 epoch 5 - iter 145/292 - loss 0.05603785 - time (sec): 8.59 - samples/sec: 2511.36 - lr: 0.000031 - momentum: 0.000000
2023-10-16 18:25:16,552 epoch 5 - iter 174/292 - loss 0.05766367 - time (sec): 10.25 - samples/sec: 2515.05 - lr: 0.000030 - momentum: 0.000000
2023-10-16 18:25:18,212 epoch 5 - iter 203/292 - loss 0.05955690 - time (sec): 11.91 - samples/sec: 2543.13 - lr: 0.000030 - momentum: 0.000000
2023-10-16 18:25:19,868 epoch 5 - iter 232/292 - loss 0.05935010 - time (sec): 13.57 - samples/sec: 2596.39 - lr: 0.000029 - momentum: 0.000000
2023-10-16 18:25:21,531 epoch 5 - iter 261/292 - loss 0.05678007 - time (sec): 15.23 - samples/sec: 2597.33 - lr: 0.000028 - momentum: 0.000000
2023-10-16 18:25:23,191 epoch 5 - iter 290/292 - loss 0.05537101 - time (sec): 16.89 - samples/sec: 2620.66 - lr: 0.000028 - momentum: 0.000000
2023-10-16 18:25:23,284 ----------------------------------------------------------------------------------------------------
2023-10-16 18:25:23,284 EPOCH 5 done: loss 0.0552 - lr: 0.000028
2023-10-16 18:25:24,543 DEV : loss 0.14645273983478546 - f1-score (micro avg) 0.7352
2023-10-16 18:25:24,548 ----------------------------------------------------------------------------------------------------
2023-10-16 18:25:26,308 epoch 6 - iter 29/292 - loss 0.04452680 - time (sec): 1.76 - samples/sec: 2936.92 - lr: 0.000027 - momentum: 0.000000
2023-10-16 18:25:27,791 epoch 6 - iter 58/292 - loss 0.03953523 - time (sec): 3.24 - samples/sec: 2783.92 - lr: 0.000027 - momentum: 0.000000
2023-10-16 18:25:29,353 epoch 6 - iter 87/292 - loss 0.03594242 - time (sec): 4.80 - samples/sec: 2699.21 - lr: 0.000026 - momentum: 0.000000
2023-10-16 18:25:31,087 epoch 6 - iter 116/292 - loss 0.03355498 - time (sec): 6.54 - samples/sec: 2682.21 - lr: 0.000026 - momentum: 0.000000
2023-10-16 18:25:32,641 epoch 6 - iter 145/292 - loss 0.03011679 - time (sec): 8.09 - samples/sec: 2752.84 - lr: 0.000025 - momentum: 0.000000
2023-10-16 18:25:34,188 epoch 6 - iter 174/292 - loss 0.03277362 - time (sec): 9.64 - samples/sec: 2715.24 - lr: 0.000025 - momentum: 0.000000
2023-10-16 18:25:35,821 epoch 6 - iter 203/292 - loss 0.03272890 - time (sec): 11.27 - samples/sec: 2703.01 - lr: 0.000024 - momentum: 0.000000
2023-10-16 18:25:37,583 epoch 6 - iter 232/292 - loss 0.03614478 - time (sec): 13.03 - samples/sec: 2692.34 - lr: 0.000023 - momentum: 0.000000
2023-10-16 18:25:39,388 epoch 6 - iter 261/292 - loss 0.04456812 - time (sec): 14.84 - samples/sec: 2713.55 - lr: 0.000023 - momentum: 0.000000
2023-10-16 18:25:41,070 epoch 6 - iter 290/292 - loss 0.04463049 - time (sec): 16.52 - samples/sec: 2677.04 - lr: 0.000022 - momentum: 0.000000
2023-10-16 18:25:41,157 ----------------------------------------------------------------------------------------------------
2023-10-16 18:25:41,157 EPOCH 6 done: loss 0.0444 - lr: 0.000022
2023-10-16 18:25:42,447 DEV : loss 0.14799444377422333 - f1-score (micro avg) 0.7484
2023-10-16 18:25:42,452 saving best model
2023-10-16 18:25:42,958 ----------------------------------------------------------------------------------------------------
2023-10-16 18:25:44,786 epoch 7 - iter 29/292 - loss 0.03561790 - time (sec): 1.83 - samples/sec: 3036.38 - lr: 0.000022 - momentum: 0.000000
2023-10-16 18:25:46,453 epoch 7 - iter 58/292 - loss 0.02551739 - time (sec): 3.49 - samples/sec: 2816.66 - lr: 0.000021 - momentum: 0.000000
2023-10-16 18:25:48,083 epoch 7 - iter 87/292 - loss 0.02590902 - time (sec): 5.12 - samples/sec: 2751.37 - lr: 0.000021 - momentum: 0.000000
2023-10-16 18:25:49,738 epoch 7 - iter 116/292 - loss 0.02747668 - time (sec): 6.78 - samples/sec: 2682.94 - lr: 0.000020 - momentum: 0.000000
2023-10-16 18:25:51,399 epoch 7 - iter 145/292 - loss 0.03466284 - time (sec): 8.44 - samples/sec: 2669.94 - lr: 0.000020 - momentum: 0.000000
2023-10-16 18:25:53,181 epoch 7 - iter 174/292 - loss 0.03184043 - time (sec): 10.22 - samples/sec: 2664.73 - lr: 0.000019 - momentum: 0.000000
2023-10-16 18:25:54,741 epoch 7 - iter 203/292 - loss 0.02941846 - time (sec): 11.78 - samples/sec: 2660.51 - lr: 0.000018 - momentum: 0.000000
2023-10-16 18:25:56,439 epoch 7 - iter 232/292 - loss 0.02738168 - time (sec): 13.48 - samples/sec: 2675.12 - lr: 0.000018 - momentum: 0.000000
2023-10-16 18:25:58,056 epoch 7 - iter 261/292 - loss 0.03070679 - time (sec): 15.10 - samples/sec: 2674.75 - lr: 0.000017 - momentum: 0.000000
2023-10-16 18:25:59,711 epoch 7 - iter 290/292 - loss 0.03072038 - time (sec): 16.75 - samples/sec: 2647.79 - lr: 0.000017 - momentum: 0.000000
2023-10-16 18:25:59,802 ----------------------------------------------------------------------------------------------------
2023-10-16 18:25:59,802 EPOCH 7 done: loss 0.0310 - lr: 0.000017
2023-10-16 18:26:01,110 DEV : loss 0.19859679043293 - f1-score (micro avg) 0.7
2023-10-16 18:26:01,122 ----------------------------------------------------------------------------------------------------
2023-10-16 18:26:02,791 epoch 8 - iter 29/292 - loss 0.02307857 - time (sec): 1.67 - samples/sec: 2596.11 - lr: 0.000016 - momentum: 0.000000
2023-10-16 18:26:04,483 epoch 8 - iter 58/292 - loss 0.01807390 - time (sec): 3.36 - samples/sec: 2719.87 - lr: 0.000016 - momentum: 0.000000
2023-10-16 18:26:06,451 epoch 8 - iter 87/292 - loss 0.02050120 - time (sec): 5.33 - samples/sec: 2515.58 - lr: 0.000015 - momentum: 0.000000
2023-10-16 18:26:08,050 epoch 8 - iter 116/292 - loss 0.01843879 - time (sec): 6.93 - samples/sec: 2494.41 - lr: 0.000015 - momentum: 0.000000
2023-10-16 18:26:09,701 epoch 8 - iter 145/292 - loss 0.02220405 - time (sec): 8.58 - samples/sec: 2566.57 - lr: 0.000014 - momentum: 0.000000
2023-10-16 18:26:11,313 epoch 8 - iter 174/292 - loss 0.01973760 - time (sec): 10.19 - samples/sec: 2591.01 - lr: 0.000013 - momentum: 0.000000
2023-10-16 18:26:13,004 epoch 8 - iter 203/292 - loss 0.02021888 - time (sec): 11.88 - samples/sec: 2585.02 - lr: 0.000013 - momentum: 0.000000
2023-10-16 18:26:14,724 epoch 8 - iter 232/292 - loss 0.02196907 - time (sec): 13.60 - samples/sec: 2614.75 - lr: 0.000012 - momentum: 0.000000
2023-10-16 18:26:16,202 epoch 8 - iter 261/292 - loss 0.02142726 - time (sec): 15.08 - samples/sec: 2600.76 - lr: 0.000012 - momentum: 0.000000
2023-10-16 18:26:17,917 epoch 8 - iter 290/292 - loss 0.02133524 - time (sec): 16.79 - samples/sec: 2629.43 - lr: 0.000011 - momentum: 0.000000
2023-10-16 18:26:18,016 ----------------------------------------------------------------------------------------------------
2023-10-16 18:26:18,017 EPOCH 8 done: loss 0.0217 - lr: 0.000011
2023-10-16 18:26:19,265 DEV : loss 0.18336039781570435 - f1-score (micro avg) 0.7265
2023-10-16 18:26:19,269 ----------------------------------------------------------------------------------------------------
2023-10-16 18:26:20,781 epoch 9 - iter 29/292 - loss 0.00973806 - time (sec): 1.51 - samples/sec: 2912.72 - lr: 0.000011 - momentum: 0.000000
2023-10-16 18:26:22,671 epoch 9 - iter 58/292 - loss 0.01556620 - time (sec): 3.40 - samples/sec: 2708.61 - lr: 0.000010 - momentum: 0.000000
2023-10-16 18:26:24,279 epoch 9 - iter 87/292 - loss 0.01591292 - time (sec): 5.01 - samples/sec: 2680.62 - lr: 0.000010 - momentum: 0.000000
2023-10-16 18:26:25,971 epoch 9 - iter 116/292 - loss 0.01447470 - time (sec): 6.70 - samples/sec: 2738.05 - lr: 0.000009 - momentum: 0.000000
2023-10-16 18:26:27,665 epoch 9 - iter 145/292 - loss 0.01441480 - time (sec): 8.39 - samples/sec: 2754.42 - lr: 0.000008 - momentum: 0.000000
2023-10-16 18:26:29,466 epoch 9 - iter 174/292 - loss 0.01538477 - time (sec): 10.20 - samples/sec: 2754.27 - lr: 0.000008 - momentum: 0.000000
2023-10-16 18:26:31,091 epoch 9 - iter 203/292 - loss 0.01607027 - time (sec): 11.82 - samples/sec: 2727.62 - lr: 0.000007 - momentum: 0.000000
2023-10-16 18:26:32,659 epoch 9 - iter 232/292 - loss 0.01513598 - time (sec): 13.39 - samples/sec: 2699.68 - lr: 0.000007 - momentum: 0.000000
2023-10-16 18:26:34,245 epoch 9 - iter 261/292 - loss 0.01547309 - time (sec): 14.97 - samples/sec: 2677.88 - lr: 0.000006 - momentum: 0.000000
2023-10-16 18:26:35,816 epoch 9 - iter 290/292 - loss 0.01443794 - time (sec): 16.55 - samples/sec: 2666.50 - lr: 0.000006 - momentum: 0.000000
2023-10-16 18:26:35,924 ----------------------------------------------------------------------------------------------------
2023-10-16 18:26:35,924 EPOCH 9 done: loss 0.0148 - lr: 0.000006
2023-10-16 18:26:37,193 DEV : loss 0.18265648186206818 - f1-score (micro avg) 0.7039
2023-10-16 18:26:37,198 ----------------------------------------------------------------------------------------------------
2023-10-16 18:26:38,851 epoch 10 - iter 29/292 - loss 0.00403559 - time (sec): 1.65 - samples/sec: 2836.55 - lr: 0.000005 - momentum: 0.000000
2023-10-16 18:26:40,560 epoch 10 - iter 58/292 - loss 0.00536696 - time (sec): 3.36 - samples/sec: 2893.01 - lr: 0.000005 - momentum: 0.000000
2023-10-16 18:26:42,186 epoch 10 - iter 87/292 - loss 0.01069592 - time (sec): 4.99 - samples/sec: 2799.15 - lr: 0.000004 - momentum: 0.000000
2023-10-16 18:26:43,781 epoch 10 - iter 116/292 - loss 0.00956979 - time (sec): 6.58 - samples/sec: 2751.92 - lr: 0.000003 - momentum: 0.000000
2023-10-16 18:26:45,406 epoch 10 - iter 145/292 - loss 0.01094773 - time (sec): 8.21 - samples/sec: 2733.42 - lr: 0.000003 - momentum: 0.000000
2023-10-16 18:26:47,151 epoch 10 - iter 174/292 - loss 0.01223287 - time (sec): 9.95 - samples/sec: 2766.65 - lr: 0.000002 - momentum: 0.000000
2023-10-16 18:26:48,801 epoch 10 - iter 203/292 - loss 0.01118601 - time (sec): 11.60 - samples/sec: 2747.13 - lr: 0.000002 - momentum: 0.000000
2023-10-16 18:26:50,388 epoch 10 - iter 232/292 - loss 0.01079535 - time (sec): 13.19 - samples/sec: 2709.71 - lr: 0.000001 - momentum: 0.000000
2023-10-16 18:26:52,076 epoch 10 - iter 261/292 - loss 0.01043899 - time (sec): 14.88 - samples/sec: 2713.94 - lr: 0.000001 - momentum: 0.000000
2023-10-16 18:26:53,572 epoch 10 - iter 290/292 - loss 0.01121750 - time (sec): 16.37 - samples/sec: 2700.50 - lr: 0.000000 - momentum: 0.000000
2023-10-16 18:26:53,665 ----------------------------------------------------------------------------------------------------
2023-10-16 18:26:53,665 EPOCH 10 done: loss 0.0112 - lr: 0.000000
2023-10-16 18:26:54,953 DEV : loss 0.1769292950630188 - f1-score (micro avg) 0.7277
2023-10-16 18:26:55,295 ----------------------------------------------------------------------------------------------------
2023-10-16 18:26:55,296 Loading model from best epoch ...
2023-10-16 18:26:56,968 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-16 18:26:59,639 Results:
- F-score (micro) 0.7629
- F-score (macro) 0.6925
- Accuracy 0.639

By class:
              precision    recall  f1-score   support

         PER     0.7989    0.8448    0.8212       348
         LOC     0.6804    0.8238    0.7452       261
         ORG     0.4651    0.3846    0.4211        52
   HumanProd     0.7500    0.8182    0.7826        22

   micro avg     0.7284    0.8009    0.7629       683
   macro avg     0.6736    0.7178    0.6925       683
weighted avg     0.7266    0.8009    0.7605       683

2023-10-16 18:26:59,639 ----------------------------------------------------------------------------------------------------
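Note (not part of the original log): the run above saves its best checkpoint as best-model.pt under the model training base path. Below is a minimal inference sketch using Flair's standard SequenceTagger API; the exact checkpoint path, the example sentence, and the "ner" label type are assumptions and should be adjusted to match the actual training output.

# Minimal inference sketch (assumed checkpoint path and label type).
from flair.data import Sentence
from flair.models import SequenceTagger

# Load the checkpoint written at "saving best model" above (path is an assumption).
tagger = SequenceTagger.load(
    "hmbench-newseye/fi-dbmdz/bert-base-historic-multilingual-cased-bs4-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-3/best-model.pt"
)

# Tag a Finnish example sentence and print the predicted entity spans
# (PER / LOC / ORG / HumanProd, per the tag dictionary logged above).
sentence = Sentence("Mannerheim saapui Helsinkiin maanantaina.")
tagger.predict(sentence)
for span in sentence.get_spans("ner"):
    print(span)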