2023-10-25 16:45:15,823 ----------------------------------------------------------------------------------------------------
2023-10-25 16:45:15,823 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(64001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-25 16:45:15,824 ----------------------------------------------------------------------------------------------------
2023-10-25 16:45:15,824 MultiCorpus: 20847 train + 1123 dev + 3350 test sentences
 - NER_HIPE_2022 Corpus: 20847 train + 1123 dev + 3350 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/de/with_doc_seperator
2023-10-25 16:45:15,824 ----------------------------------------------------------------------------------------------------
2023-10-25 16:45:15,824 Train: 20847 sentences
2023-10-25 16:45:15,824 (train_with_dev=False, train_with_test=False)
2023-10-25 16:45:15,824 ----------------------------------------------------------------------------------------------------
2023-10-25 16:45:15,824 Training Params:
2023-10-25 16:45:15,824 - learning_rate: "5e-05"
2023-10-25 16:45:15,824 - mini_batch_size: "8"
2023-10-25 16:45:15,824 - max_epochs: "10"
2023-10-25 16:45:15,824 - shuffle: "True"
2023-10-25 16:45:15,824 ----------------------------------------------------------------------------------------------------
2023-10-25 16:45:15,824 Plugins:
2023-10-25 16:45:15,824 - TensorboardLogger
2023-10-25 16:45:15,824 - LinearScheduler | warmup_fraction: '0.1'
2023-10-25 16:45:15,824 ----------------------------------------------------------------------------------------------------
2023-10-25 16:45:15,824 Final evaluation on model from best epoch (best-model.pt)
2023-10-25 16:45:15,824 - metric: "('micro avg', 'f1-score')"
2023-10-25 16:45:15,824 ----------------------------------------------------------------------------------------------------
2023-10-25 16:45:15,824 Computation:
2023-10-25 16:45:15,824 - compute on device: cuda:0
2023-10-25 16:45:15,824 - embedding storage: none
2023-10-25 16:45:15,824 ----------------------------------------------------------------------------------------------------
2023-10-25 16:45:15,824 Model training base path: "hmbench-newseye/de-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs8-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-4"
2023-10-25 16:45:15,824 ----------------------------------------------------------------------------------------------------
2023-10-25 16:45:15,824 ----------------------------------------------------------------------------------------------------
2023-10-25 16:45:15,824 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-25 16:45:29,924 epoch 1 - iter 260/2606 - loss 1.44327200 - time (sec): 14.10 - samples/sec: 2479.77 - lr: 0.000005 - momentum: 0.000000
2023-10-25 16:45:44,833 epoch 1 - iter 520/2606 - loss 0.87020391 - time (sec): 29.01 - samples/sec: 2516.71 - lr: 0.000010 - momentum: 0.000000
2023-10-25 16:45:58,506 epoch 1 - iter 780/2606 - loss 0.67200000 - time (sec): 42.68 - samples/sec: 2488.08 - lr: 0.000015 - momentum: 0.000000
2023-10-25 16:46:12,807 epoch 1 - iter 1040/2606 - loss 0.56002942 - time (sec): 56.98 - samples/sec: 2552.72 - lr: 0.000020 - momentum: 0.000000
2023-10-25 16:46:27,006 epoch 1 - iter 1300/2606 - loss 0.48938374 - time (sec): 71.18 - samples/sec: 2564.16 - lr: 0.000025 - momentum: 0.000000
2023-10-25 16:46:40,890 epoch 1 - iter 1560/2606 - loss 0.43900299 - time (sec): 85.06 - samples/sec: 2585.58 - lr: 0.000030 - momentum: 0.000000
2023-10-25 16:46:55,169 epoch 1 - iter 1820/2606 - loss 0.39943791 - time (sec): 99.34 - samples/sec: 2600.59 - lr: 0.000035 - momentum: 0.000000
2023-10-25 16:47:09,222 epoch 1 - iter 2080/2606 - loss 0.37314820 - time (sec): 113.40 - samples/sec: 2595.66 - lr: 0.000040 - momentum: 0.000000
2023-10-25 16:47:23,358 epoch 1 - iter 2340/2606 - loss 0.35456131 - time (sec): 127.53 - samples/sec: 2592.17 - lr: 0.000045 - momentum: 0.000000
2023-10-25 16:47:37,439 epoch 1 - iter 2600/2606 - loss 0.33854746 - time (sec): 141.61 - samples/sec: 2592.16 - lr: 0.000050 - momentum: 0.000000
2023-10-25 16:47:37,706 ----------------------------------------------------------------------------------------------------
2023-10-25 16:47:37,707 EPOCH 1 done: loss 0.3387 - lr: 0.000050
2023-10-25 16:47:41,324 DEV : loss 0.118812695145607 - f1-score (micro avg) 0.1625
2023-10-25 16:47:41,349 saving best model
2023-10-25 16:47:41,819 ----------------------------------------------------------------------------------------------------
2023-10-25 16:47:55,809 epoch 2 - iter 260/2606 - loss 0.17564576 - time (sec): 13.99 - samples/sec: 2630.19 - lr: 0.000049 - momentum: 0.000000
2023-10-25 16:48:10,903 epoch 2 - iter 520/2606 - loss 0.16339489 - time (sec): 29.08 - samples/sec: 2633.99 - lr: 0.000049 - momentum: 0.000000
2023-10-25 16:48:25,129 epoch 2 - iter 780/2606 - loss 0.16367825 - time (sec): 43.31 - samples/sec: 2648.13 - lr: 0.000048 - momentum: 0.000000
2023-10-25 16:48:39,023 epoch 2 - iter 1040/2606 - loss 0.16465426 - time (sec): 57.20 - samples/sec: 2644.39 - lr: 0.000048 - momentum: 0.000000
2023-10-25 16:48:52,698 epoch 2 - iter 1300/2606 - loss 0.16691656 - time (sec): 70.88 - samples/sec: 2618.17 - lr: 0.000047 - momentum: 0.000000
2023-10-25 16:49:07,060 epoch 2 - iter 1560/2606 - loss 0.16504912 - time (sec): 85.24 - samples/sec: 2615.67 - lr: 0.000047 - momentum: 0.000000
2023-10-25 16:49:21,199 epoch 2 - iter 1820/2606 - loss 0.16040319 - time (sec): 99.38 - samples/sec: 2625.20 - lr: 0.000046 - momentum: 0.000000
2023-10-25 16:49:35,311 epoch 2 - iter 2080/2606 - loss 0.15902007 - time (sec): 113.49 - samples/sec: 2637.30 - lr: 0.000046 - momentum: 0.000000
2023-10-25 16:49:48,475 epoch 2 - iter 2340/2606 - loss 0.15709971 - time (sec): 126.65 - samples/sec: 2628.74 - lr: 0.000045 - momentum: 0.000000
2023-10-25 16:50:02,559 epoch 2 - iter 2600/2606 - loss 0.15647373 - time (sec): 140.74 - samples/sec: 2604.39 - lr: 0.000044 - momentum: 0.000000
2023-10-25 16:50:02,869 ----------------------------------------------------------------------------------------------------
2023-10-25 16:50:02,869 EPOCH 2 done: loss 0.1566 - lr: 0.000044
2023-10-25 16:50:09,872 DEV : loss 0.1611776500940323 - f1-score (micro avg) 0.3391
2023-10-25 16:50:09,896 saving best model
2023-10-25 16:50:10,500 ----------------------------------------------------------------------------------------------------
2023-10-25 16:50:24,874 epoch 3 - iter 260/2606 - loss 0.10670473 - time (sec): 14.37 - samples/sec: 2724.91 - lr: 0.000044 - momentum: 0.000000
2023-10-25 16:50:38,583 epoch 3 - iter 520/2606 - loss 0.10898329 - time (sec): 28.08 - samples/sec: 2706.78 - lr: 0.000043 - momentum: 0.000000
2023-10-25 16:50:52,379 epoch 3 - iter 780/2606 - loss 0.11166505 - time (sec): 41.88 - samples/sec: 2643.92 - lr: 0.000043 - momentum: 0.000000
2023-10-25 16:51:06,834 epoch 3 - iter 1040/2606 - loss 0.10939807 - time (sec): 56.33 - samples/sec: 2663.11 - lr: 0.000042 - momentum: 0.000000
2023-10-25 16:51:20,398 epoch 3 - iter 1300/2606 - loss 0.10997620 - time (sec): 69.90 - samples/sec: 2648.46 - lr: 0.000042 - momentum: 0.000000
2023-10-25 16:51:34,567 epoch 3 - iter 1560/2606 - loss 0.11114702 - time (sec): 84.07 - samples/sec: 2627.29 - lr: 0.000041 - momentum: 0.000000
2023-10-25 16:51:48,566 epoch 3 - iter 1820/2606 - loss 0.10915658 - time (sec): 98.06 - samples/sec: 2619.10 - lr: 0.000041 - momentum: 0.000000
2023-10-25 16:52:02,818 epoch 3 - iter 2080/2606 - loss 0.10966065 - time (sec): 112.32 - samples/sec: 2620.09 - lr: 0.000040 - momentum: 0.000000
2023-10-25 16:52:16,817 epoch 3 - iter 2340/2606 - loss 0.11109113 - time (sec): 126.31 - samples/sec: 2612.60 - lr: 0.000039 - momentum: 0.000000
2023-10-25 16:52:30,695 epoch 3 - iter 2600/2606 - loss 0.10926086 - time (sec): 140.19 - samples/sec: 2616.49 - lr: 0.000039 - momentum: 0.000000
2023-10-25 16:52:30,986 ----------------------------------------------------------------------------------------------------
2023-10-25 16:52:30,986 EPOCH 3 done: loss 0.1093 - lr: 0.000039
2023-10-25 16:52:38,212 DEV : loss 0.23014920949935913 - f1-score (micro avg) 0.3626
2023-10-25 16:52:38,236 saving best model
2023-10-25 16:52:38,689 ----------------------------------------------------------------------------------------------------
2023-10-25 16:52:52,399 epoch 4 - iter 260/2606 - loss 0.08026601 - time (sec): 13.71 - samples/sec: 2578.59 - lr: 0.000038 - momentum: 0.000000
2023-10-25 16:53:06,053 epoch 4 - iter 520/2606 - loss 0.07866155 - time (sec): 27.36 - samples/sec: 2540.99 - lr: 0.000038 - momentum: 0.000000
2023-10-25 16:53:20,595 epoch 4 - iter 780/2606 - loss 0.07694070 - time (sec): 41.90 - samples/sec: 2599.38 - lr: 0.000037 - momentum: 0.000000
2023-10-25 16:53:34,797 epoch 4 - iter 1040/2606 - loss 0.07572775 - time (sec): 56.11 - samples/sec: 2612.43 - lr: 0.000037 - momentum: 0.000000
2023-10-25 16:53:48,644 epoch 4 - iter 1300/2606 - loss 0.07947650 - time (sec): 69.95 - samples/sec: 2613.93 - lr: 0.000036 - momentum: 0.000000
2023-10-25 16:54:02,886 epoch 4 - iter 1560/2606 - loss 0.07880117 - time (sec): 84.20 - samples/sec: 2609.19 - lr: 0.000036 - momentum: 0.000000
2023-10-25 16:54:16,960 epoch 4 - iter 1820/2606 - loss 0.07887358 - time (sec): 98.27 - samples/sec: 2611.18 - lr: 0.000035 - momentum: 0.000000
2023-10-25 16:54:31,164 epoch 4 - iter 2080/2606 - loss 0.07742549 - time (sec): 112.47 - samples/sec: 2620.31 - lr: 0.000034 - momentum: 0.000000
2023-10-25 16:54:44,982 epoch 4 - iter 2340/2606 - loss 0.07826687 - time (sec): 126.29 - samples/sec: 2623.98 - lr: 0.000034 - momentum: 0.000000
2023-10-25 16:54:58,682 epoch 4 - iter 2600/2606 - loss 0.07860251 - time (sec): 139.99 - samples/sec: 2620.12 - lr: 0.000033 - momentum: 0.000000
2023-10-25 16:54:58,958 ----------------------------------------------------------------------------------------------------
2023-10-25 16:54:58,958 EPOCH 4 done: loss 0.0788 - lr: 0.000033
2023-10-25 16:55:05,179 DEV : loss 0.24600794911384583 - f1-score (micro avg) 0.358
2023-10-25 16:55:05,203 ----------------------------------------------------------------------------------------------------
2023-10-25 16:55:19,990 epoch 5 - iter 260/2606 - loss 0.05672130 - time (sec): 14.79 - samples/sec: 2594.82 - lr: 0.000033 - momentum: 0.000000
2023-10-25 16:55:34,219 epoch 5 - iter 520/2606 - loss 0.06166018 - time (sec): 29.01 - samples/sec: 2621.33 - lr: 0.000032 - momentum: 0.000000
2023-10-25 16:55:48,173 epoch 5 - iter 780/2606 - loss 0.06054051 - time (sec): 42.97 - samples/sec: 2619.67 - lr: 0.000032 - momentum: 0.000000
2023-10-25 16:56:02,180 epoch 5 - iter 1040/2606 - loss 0.05994699 - time (sec): 56.98 - samples/sec: 2592.31 - lr: 0.000031 - momentum: 0.000000
2023-10-25 16:56:16,341 epoch 5 - iter 1300/2606 - loss 0.06032593 - time (sec): 71.14 - samples/sec: 2605.83 - lr: 0.000031 - momentum: 0.000000
2023-10-25 16:56:30,625 epoch 5 - iter 1560/2606 - loss 0.05835629 - time (sec): 85.42 - samples/sec: 2598.27 - lr: 0.000030 - momentum: 0.000000
2023-10-25 16:56:44,827 epoch 5 - iter 1820/2606 - loss 0.06037339 - time (sec): 99.62 - samples/sec: 2592.66 - lr: 0.000029 - momentum: 0.000000
2023-10-25 16:56:58,661 epoch 5 - iter 2080/2606 - loss 0.05960227 - time (sec): 113.46 - samples/sec: 2600.76 - lr: 0.000029 - momentum: 0.000000
2023-10-25 16:57:11,848 epoch 5 - iter 2340/2606 - loss 0.05877380 - time (sec): 126.64 - samples/sec: 2617.27 - lr: 0.000028 - momentum: 0.000000
2023-10-25 16:57:26,094 epoch 5 - iter 2600/2606 - loss 0.05813652 - time (sec): 140.89 - samples/sec: 2605.23 - lr: 0.000028 - momentum: 0.000000
2023-10-25 16:57:26,431 ----------------------------------------------------------------------------------------------------
2023-10-25 16:57:26,431 EPOCH 5 done: loss 0.0581 - lr: 0.000028
2023-10-25 16:57:32,774 DEV : loss 0.29553988575935364 - f1-score (micro avg) 0.4099
2023-10-25 16:57:32,799 saving best model
2023-10-25 16:57:33,294 ----------------------------------------------------------------------------------------------------
2023-10-25 16:57:48,554 epoch 6 - iter 260/2606 - loss 0.04797258 - time (sec): 15.26 - samples/sec: 2573.89 - lr: 0.000027 - momentum: 0.000000
2023-10-25 16:58:02,779 epoch 6 - iter 520/2606 - loss 0.04837650 - time (sec): 29.48 - samples/sec: 2595.28 - lr: 0.000027 - momentum: 0.000000
2023-10-25 16:58:17,011 epoch 6 - iter 780/2606 - loss 0.04578472 - time (sec): 43.71 - samples/sec: 2609.95 - lr: 0.000026 - momentum: 0.000000
2023-10-25 16:58:31,560 epoch 6 - iter 1040/2606 - loss 0.04760570 - time (sec): 58.26 - samples/sec: 2572.96 - lr: 0.000026 - momentum: 0.000000
2023-10-25 16:58:45,137 epoch 6 - iter 1300/2606 - loss 0.04892803 - time (sec): 71.84 - samples/sec: 2569.00 - lr: 0.000025 - momentum: 0.000000
2023-10-25 16:58:58,688 epoch 6 - iter 1560/2606 - loss 0.05121577 - time (sec): 85.39 - samples/sec: 2567.67 - lr: 0.000024 - momentum: 0.000000
2023-10-25 16:59:13,684 epoch 6 - iter 1820/2606 - loss 0.05294998 - time (sec): 100.39 - samples/sec: 2576.77 - lr: 0.000024 - momentum: 0.000000
2023-10-25 16:59:27,754 epoch 6 - iter 2080/2606 - loss 0.05640148 - time (sec): 114.46 - samples/sec: 2573.59 - lr: 0.000023 - momentum: 0.000000
2023-10-25 16:59:41,576 epoch 6 - iter 2340/2606 - loss 0.05593627 - time (sec): 128.28 - samples/sec: 2576.66 - lr: 0.000023 - momentum: 0.000000
2023-10-25 16:59:55,997 epoch 6 - iter 2600/2606 - loss 0.05483941 - time (sec): 142.70 - samples/sec: 2565.84 - lr: 0.000022 - momentum: 0.000000
2023-10-25 16:59:56,369 ----------------------------------------------------------------------------------------------------
2023-10-25 16:59:56,369 EPOCH 6 done: loss 0.0547 - lr: 0.000022
2023-10-25 17:00:02,624 DEV : loss 0.33623284101486206 - f1-score (micro avg) 0.3687
2023-10-25 17:00:02,649 ----------------------------------------------------------------------------------------------------
2023-10-25 17:00:16,615 epoch 7 - iter 260/2606 - loss 0.03858880 - time (sec): 13.97 - samples/sec: 2649.66 - lr: 0.000022 - momentum: 0.000000
2023-10-25 17:00:30,568 epoch 7 - iter 520/2606 - loss 0.04139851 - time (sec): 27.92 - samples/sec: 2635.80 - lr: 0.000021 - momentum: 0.000000
2023-10-25 17:00:45,290 epoch 7 - iter 780/2606 - loss 0.04241529 - time (sec): 42.64 - samples/sec: 2604.64 - lr: 0.000021 - momentum: 0.000000
2023-10-25 17:00:59,285 epoch 7 - iter 1040/2606 - loss 0.04833894 - time (sec): 56.64 - samples/sec: 2601.68 - lr: 0.000020 - momentum: 0.000000
2023-10-25 17:01:13,271 epoch 7 - iter 1300/2606 - loss 0.04856922 - time (sec): 70.62 - samples/sec: 2592.87 - lr: 0.000019 - momentum: 0.000000
2023-10-25 17:01:28,375 epoch 7 - iter 1560/2606 - loss 0.05096085 - time (sec): 85.73 - samples/sec: 2579.70 - lr: 0.000019 - momentum: 0.000000
2023-10-25 17:01:43,143 epoch 7 - iter 1820/2606 - loss 0.05958653 - time (sec): 100.49 - samples/sec: 2603.24 - lr: 0.000018 - momentum: 0.000000
2023-10-25 17:01:56,751 epoch 7 - iter 2080/2606 - loss 0.06793201 - time (sec): 114.10 - samples/sec: 2605.36 - lr: 0.000018 - momentum: 0.000000
2023-10-25 17:02:09,950 epoch 7 - iter 2340/2606 - loss 0.06878116 - time (sec): 127.30 - samples/sec: 2607.52 - lr: 0.000017 - momentum: 0.000000
2023-10-25 17:02:23,915 epoch 7 - iter 2600/2606 - loss 0.06941224 - time (sec): 141.27 - samples/sec: 2596.78 - lr: 0.000017 - momentum: 0.000000
2023-10-25 17:02:24,220 ----------------------------------------------------------------------------------------------------
2023-10-25 17:02:24,220 EPOCH 7 done: loss 0.0694 - lr: 0.000017
2023-10-25 17:02:30,444 DEV : loss 0.35152772068977356 - f1-score (micro avg) 0.3214
2023-10-25 17:02:30,469 ----------------------------------------------------------------------------------------------------
2023-10-25 17:02:44,163 epoch 8 - iter 260/2606 - loss 0.07588476 - time (sec): 13.69 - samples/sec: 2635.53 - lr: 0.000016 - momentum: 0.000000
2023-10-25 17:02:57,982 epoch 8 - iter 520/2606 - loss 0.10043230 - time (sec): 27.51 - samples/sec: 2649.59 - lr: 0.000016 - momentum: 0.000000
2023-10-25 17:03:11,756 epoch 8 - iter 780/2606 - loss 0.13203650 - time (sec): 41.29 - samples/sec: 2628.01 - lr: 0.000015 - momentum: 0.000000
2023-10-25 17:03:25,598 epoch 8 - iter 1040/2606 - loss 0.12385530 - time (sec): 55.13 - samples/sec: 2623.26 - lr: 0.000014 - momentum: 0.000000
2023-10-25 17:03:39,705 epoch 8 - iter 1300/2606 - loss 0.12813762 - time (sec): 69.23 - samples/sec: 2618.67 - lr: 0.000014 - momentum: 0.000000
2023-10-25 17:03:53,694 epoch 8 - iter 1560/2606 - loss 0.13153397 - time (sec): 83.22 - samples/sec: 2621.37 - lr: 0.000013 - momentum: 0.000000
2023-10-25 17:04:08,336 epoch 8 - iter 1820/2606 - loss 0.12905434 - time (sec): 97.87 - samples/sec: 2630.44 - lr: 0.000013 - momentum: 0.000000
2023-10-25 17:04:22,400 epoch 8 - iter 2080/2606 - loss 0.13389258 - time (sec): 111.93 - samples/sec: 2623.41 - lr: 0.000012 - momentum: 0.000000
2023-10-25 17:04:36,755 epoch 8 - iter 2340/2606 - loss 0.13480518 - time (sec): 126.28 - samples/sec: 2606.93 - lr: 0.000012 - momentum: 0.000000
2023-10-25 17:04:50,583 epoch 8 - iter 2600/2606 - loss 0.13531880 - time (sec): 140.11 - samples/sec: 2616.35 - lr: 0.000011 - momentum: 0.000000
2023-10-25 17:04:50,912 ----------------------------------------------------------------------------------------------------
2023-10-25 17:04:50,912 EPOCH 8 done: loss 0.1352 - lr: 0.000011
2023-10-25 17:04:57,141 DEV : loss 0.2633623480796814 - f1-score (micro avg) 0.2342
2023-10-25 17:04:57,166 ----------------------------------------------------------------------------------------------------
2023-10-25 17:05:11,033 epoch 9 - iter 260/2606 - loss 0.08967090 - time (sec): 13.87 - samples/sec: 2713.58 - lr: 0.000011 - momentum: 0.000000
2023-10-25 17:05:25,132 epoch 9 - iter 520/2606 - loss 0.09331506 - time (sec): 27.97 - samples/sec: 2651.52 - lr: 0.000010 - momentum: 0.000000
2023-10-25 17:05:38,813 epoch 9 - iter 780/2606 - loss 0.09092922 - time (sec): 41.65 - samples/sec: 2643.14 - lr: 0.000009 - momentum: 0.000000
2023-10-25 17:05:52,488 epoch 9 - iter 1040/2606 - loss 0.09774521 - time (sec): 55.32 - samples/sec: 2686.25 - lr: 0.000009 - momentum: 0.000000
2023-10-25 17:06:06,110 epoch 9 - iter 1300/2606 - loss 0.10814172 - time (sec): 68.94 - samples/sec: 2685.63 - lr: 0.000008 - momentum: 0.000000
2023-10-25 17:06:19,697 epoch 9 - iter 1560/2606 - loss 0.11196481 - time (sec): 82.53 - samples/sec: 2668.35 - lr: 0.000008 - momentum: 0.000000
2023-10-25 17:06:33,655 epoch 9 - iter 1820/2606 - loss 0.11081450 - time (sec): 96.49 - samples/sec: 2673.87 - lr: 0.000007 - momentum: 0.000000
2023-10-25 17:06:47,477 epoch 9 - iter 2080/2606 - loss 0.11167705 - time (sec): 110.31 - samples/sec: 2663.33 - lr: 0.000007 - momentum: 0.000000
2023-10-25 17:07:01,705 epoch 9 - iter 2340/2606 - loss 0.11066523 - time (sec): 124.54 - samples/sec: 2664.34 - lr: 0.000006 - momentum: 0.000000
2023-10-25 17:07:15,529 epoch 9 - iter 2600/2606 - loss 0.11158714 - time (sec): 138.36 - samples/sec: 2647.26 - lr: 0.000006 - momentum: 0.000000
2023-10-25 17:07:15,945 ----------------------------------------------------------------------------------------------------
2023-10-25 17:07:15,946 EPOCH 9 done: loss 0.1115 - lr: 0.000006
2023-10-25 17:07:22,880 DEV : loss 0.2682478427886963 - f1-score (micro avg) 0.2293
2023-10-25 17:07:22,913 ----------------------------------------------------------------------------------------------------
2023-10-25 17:07:37,193 epoch 10 - iter 260/2606 - loss 0.09408029 - time (sec): 14.28 - samples/sec: 2595.12 - lr: 0.000005 - momentum: 0.000000
2023-10-25 17:07:51,010 epoch 10 - iter 520/2606 - loss 0.09512229 - time (sec): 28.09 - samples/sec: 2593.26 - lr: 0.000004 - momentum: 0.000000
2023-10-25 17:08:05,607 epoch 10 - iter 780/2606 - loss 0.08964442 - time (sec): 42.69 - samples/sec: 2609.86 - lr: 0.000004 - momentum: 0.000000
2023-10-25 17:08:20,011 epoch 10 - iter 1040/2606 - loss 0.08987618 - time (sec): 57.10 - samples/sec: 2633.35 - lr: 0.000003 - momentum: 0.000000
2023-10-25 17:08:33,885 epoch 10 - iter 1300/2606 - loss 0.08788357 - time (sec): 70.97 - samples/sec: 2663.52 - lr: 0.000003 - momentum: 0.000000
2023-10-25 17:08:47,392 epoch 10 - iter 1560/2606 - loss 0.08724918 - time (sec): 84.48 - samples/sec: 2641.55 - lr: 0.000002 - momentum: 0.000000
2023-10-25 17:09:01,218 epoch 10 - iter 1820/2606 - loss 0.08700501 - time (sec): 98.30 - samples/sec: 2620.75 - lr: 0.000002 - momentum: 0.000000
2023-10-25 17:09:14,991 epoch 10 - iter 2080/2606 - loss 0.08812984 - time (sec): 112.08 - samples/sec: 2606.24 - lr: 0.000001 - momentum: 0.000000
2023-10-25 17:09:29,100 epoch 10 - iter 2340/2606 - loss 0.09074088 - time (sec): 126.18 - samples/sec: 2614.01 - lr: 0.000001 - momentum: 0.000000
2023-10-25 17:09:43,532 epoch 10 - iter 2600/2606 - loss 0.09068240 - time (sec): 140.62 - samples/sec: 2606.43 - lr: 0.000000 - momentum: 0.000000
2023-10-25 17:09:43,851 ----------------------------------------------------------------------------------------------------
2023-10-25 17:09:43,851 EPOCH 10 done: loss 0.0905 - lr: 0.000000
2023-10-25 17:09:50,726 DEV : loss 0.27786970138549805 - f1-score (micro avg) 0.2168
2023-10-25 17:09:51,221 ----------------------------------------------------------------------------------------------------
2023-10-25 17:09:51,222 Loading model from best epoch ...
2023-10-25 17:09:52,830 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-25 17:10:02,533
Results:
- F-score (micro) 0.4446
- F-score (macro) 0.2829
- Accuracy 0.2912

By class:
              precision    recall  f1-score   support

         LOC     0.5264    0.5840    0.5537      1214
         PER     0.4000    0.3490    0.3728       808
         ORG     0.2194    0.1926    0.2051       353
   HumanProd     0.0000    0.0000    0.0000        15

   micro avg     0.4461    0.4431    0.4446      2390
   macro avg     0.2864    0.2814    0.2829      2390
weighted avg     0.4350    0.4431    0.4376      2390

2023-10-25 17:10:02,533 ----------------------------------------------------------------------------------------------------