2023-10-25 21:39:22,701 ----------------------------------------------------------------------------------------------------
2023-10-25 21:39:22,702 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(64001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-25 21:39:22,702 ----------------------------------------------------------------------------------------------------
2023-10-25 21:39:22,703 MultiCorpus: 1085 train + 148 dev + 364 test sentences
 - NER_HIPE_2022 Corpus: 1085 train + 148 dev + 364 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/sv/with_doc_seperator
2023-10-25 21:39:22,703 ----------------------------------------------------------------------------------------------------
2023-10-25 21:39:22,703 Train:  1085 sentences
2023-10-25 21:39:22,703         (train_with_dev=False, train_with_test=False)
2023-10-25 21:39:22,703 ----------------------------------------------------------------------------------------------------
2023-10-25 21:39:22,703 Training Params:
2023-10-25 21:39:22,703  - learning_rate: "5e-05"
2023-10-25 21:39:22,703  - mini_batch_size: "4"
2023-10-25 21:39:22,703  - max_epochs: "10"
2023-10-25 21:39:22,703  - shuffle: "True"
2023-10-25 21:39:22,703 ----------------------------------------------------------------------------------------------------
2023-10-25 21:39:22,703 Plugins:
2023-10-25 21:39:22,703  - TensorboardLogger
2023-10-25 21:39:22,703  - LinearScheduler | warmup_fraction: '0.1'
2023-10-25 21:39:22,703 ----------------------------------------------------------------------------------------------------
2023-10-25 21:39:22,703 Final evaluation on model from best epoch (best-model.pt)
2023-10-25 21:39:22,704  - metric: "('micro avg', 'f1-score')"
2023-10-25 21:39:22,704 ----------------------------------------------------------------------------------------------------
2023-10-25 21:39:22,704 Computation:
2023-10-25 21:39:22,704  - compute on device: cuda:0
2023-10-25 21:39:22,704  - embedding storage: none
2023-10-25 21:39:22,704 ----------------------------------------------------------------------------------------------------
2023-10-25 21:39:22,704 Model training base path: "hmbench-newseye/sv-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs4-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-5"
2023-10-25 21:39:22,704 ----------------------------------------------------------------------------------------------------
2023-10-25 21:39:22,704 ----------------------------------------------------------------------------------------------------
2023-10-25 21:39:22,704 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-25 21:39:24,155 epoch 1 - iter 27/272 - loss 2.49805446 - time (sec): 1.45 - samples/sec: 3540.00 - lr: 0.000005 - momentum: 0.000000
2023-10-25 21:39:25,675 epoch 1 - iter 54/272 - loss 1.67544873 - time (sec): 2.97 - samples/sec: 3496.77 - lr: 0.000010 - momentum: 0.000000
2023-10-25 21:39:27,155 epoch 1 - iter 81/272 - loss 1.26708392 - time (sec): 4.45 - samples/sec: 3630.39 - lr: 0.000015 - momentum: 0.000000
2023-10-25 21:39:28,663 epoch 1 - iter 108/272 - loss 1.04932330 - time (sec): 5.96 - samples/sec: 3687.73 - lr: 0.000020 - momentum: 0.000000
2023-10-25 21:39:30,181 epoch 1 - iter 135/272 - loss 0.90812403 - time (sec): 7.48 - samples/sec: 3600.75 - lr: 0.000025 - momentum: 0.000000
2023-10-25 21:39:31,693 epoch 1 - iter 162/272 - loss 0.80144087 - time (sec): 8.99 - samples/sec: 3564.49 - lr: 0.000030 - momentum: 0.000000
2023-10-25 21:39:33,167 epoch 1 - iter 189/272 - loss 0.71754388 - time (sec): 10.46 - samples/sec: 3591.83 - lr: 0.000035 - momentum: 0.000000
2023-10-25 21:39:34,661 epoch 1 - iter 216/272 - loss 0.66141360 - time (sec): 11.96 - samples/sec: 3525.30 - lr: 0.000040 - momentum: 0.000000
2023-10-25 21:39:36,195 epoch 1 - iter 243/272 - loss 0.60466834 - time (sec): 13.49 - samples/sec: 3541.07 - lr: 0.000044 - momentum: 0.000000
2023-10-25 21:39:37,648 epoch 1 - iter 270/272 - loss 0.57299630 - time (sec): 14.94 - samples/sec: 3462.38 - lr: 0.000049 - momentum: 0.000000
2023-10-25 21:39:37,745 ----------------------------------------------------------------------------------------------------
2023-10-25 21:39:37,745 EPOCH 1 done: loss 0.5710 - lr: 0.000049
2023-10-25 21:39:38,421 DEV : loss 0.1312190741300583 - f1-score (micro avg)  0.6734
2023-10-25 21:39:38,428 saving best model
2023-10-25 21:39:38,939 ----------------------------------------------------------------------------------------------------
2023-10-25 21:39:40,412 epoch 2 - iter 27/272 - loss 0.13994969 - time (sec): 1.47 - samples/sec: 3542.86 - lr: 0.000049 - momentum: 0.000000
2023-10-25 21:39:41,895 epoch 2 - iter 54/272 - loss 0.14777439 - time (sec): 2.95 - samples/sec: 3635.29 - lr: 0.000049 - momentum: 0.000000
2023-10-25 21:39:43,400 epoch 2 - iter 81/272 - loss 0.14433581 - time (sec): 4.46 - samples/sec: 3437.04 - lr: 0.000048 - momentum: 0.000000
2023-10-25 21:39:44,891 epoch 2 - iter 108/272 - loss 0.13742883 - time (sec): 5.95 - samples/sec: 3398.38 - lr: 0.000048 - momentum: 0.000000
2023-10-25 21:39:46,405 epoch 2 - iter 135/272 - loss 0.12798837 - time (sec): 7.46 - samples/sec: 3425.00 - lr: 0.000047 - momentum: 0.000000
2023-10-25 21:39:47,911 epoch 2 - iter 162/272 - loss 0.13103800 - time (sec): 8.97 - samples/sec: 3424.63 - lr: 0.000047 - momentum: 0.000000
2023-10-25 21:39:49,449 epoch 2 - iter 189/272 - loss 0.13367852 - time (sec): 10.51 - samples/sec: 3472.65 - lr: 0.000046 - momentum: 0.000000
2023-10-25 21:39:50,888 epoch 2 - iter 216/272 - loss 0.13388695 - time (sec): 11.95 - samples/sec: 3419.50 - lr: 0.000046 - momentum: 0.000000
2023-10-25 21:39:52,414 epoch 2 - iter 243/272 - loss 0.13310624 - time (sec): 13.47 - samples/sec: 3458.75 - lr: 0.000045 - momentum: 0.000000
2023-10-25 21:39:53,891 epoch 2 - iter 270/272 - loss 0.12902925 - time (sec): 14.95 - samples/sec: 3448.09 - lr: 0.000045 - momentum: 0.000000
2023-10-25 21:39:54,014 ----------------------------------------------------------------------------------------------------
2023-10-25 21:39:54,015 EPOCH 2 done: loss 0.1288 - lr: 0.000045
2023-10-25 21:39:55,628 DEV : loss 0.11965033411979675 - f1-score (micro avg)  0.7882
2023-10-25 21:39:55,635 saving best model
2023-10-25 21:39:56,316 ----------------------------------------------------------------------------------------------------
2023-10-25 21:39:57,805 epoch 3 - iter 27/272 - loss 0.07229947 - time (sec): 1.49 - samples/sec: 2928.53 - lr: 0.000044 - momentum: 0.000000
2023-10-25 21:39:59,248 epoch 3 - iter 54/272 - loss 0.07657749 - time (sec): 2.93 - samples/sec: 3221.53 - lr: 0.000043 - momentum: 0.000000
2023-10-25 21:40:00,792 epoch 3 - iter 81/272 - loss 0.06566703 - time (sec): 4.47 - samples/sec: 3350.84 - lr: 0.000043 - momentum: 0.000000
2023-10-25 21:40:02,256 epoch 3 - iter 108/272 - loss 0.07494847 - time (sec): 5.94 - samples/sec: 3275.73 - lr: 0.000042 - momentum: 0.000000
2023-10-25 21:40:03,887 epoch 3 - iter 135/272 - loss 0.06987444 - time (sec): 7.57 - samples/sec: 3372.36 - lr: 0.000042 - momentum: 0.000000
2023-10-25 21:40:05,534 epoch 3 - iter 162/272 - loss 0.06586629 - time (sec): 9.22 - samples/sec: 3397.13 - lr: 0.000041 - momentum: 0.000000
2023-10-25 21:40:07,052 epoch 3 - iter 189/272 - loss 0.06688391 - time (sec): 10.73 - samples/sec: 3437.33 - lr: 0.000041 - momentum: 0.000000
2023-10-25 21:40:08,532 epoch 3 - iter 216/272 - loss 0.06969988 - time (sec): 12.21 - samples/sec: 3392.47 - lr: 0.000040 - momentum: 0.000000
2023-10-25 21:40:10,001 epoch 3 - iter 243/272 - loss 0.06930302 - time (sec): 13.68 - samples/sec: 3371.69 - lr: 0.000040 - momentum: 0.000000
2023-10-25 21:40:11,506 epoch 3 - iter 270/272 - loss 0.06894899 - time (sec): 15.19 - samples/sec: 3395.60 - lr: 0.000039 - momentum: 0.000000
2023-10-25 21:40:11,607 ----------------------------------------------------------------------------------------------------
2023-10-25 21:40:11,607 EPOCH 3 done: loss 0.0699 - lr: 0.000039
2023-10-25 21:40:12,807 DEV : loss 0.11476627737283707 - f1-score (micro avg)  0.7796
2023-10-25 21:40:12,813 ----------------------------------------------------------------------------------------------------
2023-10-25 21:40:14,352 epoch 4 - iter 27/272 - loss 0.03266996 - time (sec): 1.54 - samples/sec: 3713.00 - lr: 0.000038 - momentum: 0.000000
2023-10-25 21:40:15,870 epoch 4 - iter 54/272 - loss 0.03739765 - time (sec): 3.06 - samples/sec: 3748.93 - lr: 0.000038 - momentum: 0.000000
2023-10-25 21:40:17,437 epoch 4 - iter 81/272 - loss 0.03454158 - time (sec): 4.62 - samples/sec: 3703.81 - lr: 0.000037 - momentum: 0.000000
2023-10-25 21:40:18,898 epoch 4 - iter 108/272 - loss 0.03836219 - time (sec): 6.08 - samples/sec: 3585.70 - lr: 0.000037 - momentum: 0.000000
2023-10-25 21:40:20,332 epoch 4 - iter 135/272 - loss 0.03954303 - time (sec): 7.52 - samples/sec: 3442.41 - lr: 0.000036 - momentum: 0.000000
2023-10-25 21:40:21,810 epoch 4 - iter 162/272 - loss 0.04116692 - time (sec): 9.00 - samples/sec: 3434.51 - lr: 0.000036 - momentum: 0.000000
2023-10-25 21:40:23,406 epoch 4 - iter 189/272 - loss 0.04364166 - time (sec): 10.59 - samples/sec: 3520.44 - lr: 0.000035 - momentum: 0.000000
2023-10-25 21:40:24,868 epoch 4 - iter 216/272 - loss 0.04463442 - time (sec): 12.05 - samples/sec: 3491.56 - lr: 0.000034 - momentum: 0.000000
2023-10-25 21:40:26,286 epoch 4 - iter 243/272 - loss 0.04378423 - time (sec): 13.47 - samples/sec: 3458.50 - lr: 0.000034 - momentum: 0.000000
2023-10-25 21:40:27,719 epoch 4 - iter 270/272 - loss 0.04515536 - time (sec): 14.90 - samples/sec: 3457.30 - lr: 0.000033 - momentum: 0.000000
2023-10-25 21:40:27,816 ----------------------------------------------------------------------------------------------------
2023-10-25 21:40:27,817 EPOCH 4 done: loss 0.0448 - lr: 0.000033
2023-10-25 21:40:28,998 DEV : loss 0.16075612604618073 - f1-score (micro avg)  0.7817
2023-10-25 21:40:29,003 ----------------------------------------------------------------------------------------------------
2023-10-25 21:40:30,401 epoch 5 - iter 27/272 - loss 0.02593003 - time (sec): 1.40 - samples/sec: 3711.81 - lr: 0.000033 - momentum: 0.000000
2023-10-25 21:40:31,853 epoch 5 - iter 54/272 - loss 0.02124892 - time (sec): 2.85 - samples/sec: 3440.04 - lr: 0.000032 - momentum: 0.000000
2023-10-25 21:40:33,271 epoch 5 - iter 81/272 - loss 0.02821459 - time (sec): 4.27 - samples/sec: 3361.61 - lr: 0.000032 - momentum: 0.000000
2023-10-25 21:40:34,717 epoch 5 - iter 108/272 - loss 0.02765530 - time (sec): 5.71 - samples/sec: 3429.78 - lr: 0.000031 - momentum: 0.000000
2023-10-25 21:40:36,143 epoch 5 - iter 135/272 - loss 0.03611608 - time (sec): 7.14 - samples/sec: 3424.71 - lr: 0.000031 - momentum: 0.000000
2023-10-25 21:40:37,651 epoch 5 - iter 162/272 - loss 0.03369221 - time (sec): 8.65 - samples/sec: 3535.61 - lr: 0.000030 - momentum: 0.000000
2023-10-25 21:40:39,147 epoch 5 - iter 189/272 - loss 0.03139247 - time (sec): 10.14 - samples/sec: 3589.30 - lr: 0.000029 - momentum: 0.000000
2023-10-25 21:40:40,625 epoch 5 - iter 216/272 - loss 0.03358985 - time (sec): 11.62 - samples/sec: 3586.62 - lr: 0.000029 - momentum: 0.000000
2023-10-25 21:40:42,089 epoch 5 - iter 243/272 - loss 0.03662728 - time (sec): 13.08 - samples/sec: 3532.03 - lr: 0.000028 - momentum: 0.000000
2023-10-25 21:40:43,560 epoch 5 - iter 270/272 - loss 0.03537001 - time (sec): 14.56 - samples/sec: 3560.81 - lr: 0.000028 - momentum: 0.000000
2023-10-25 21:40:43,665 ----------------------------------------------------------------------------------------------------
2023-10-25 21:40:43,666 EPOCH 5 done: loss 0.0353 - lr: 0.000028
2023-10-25 21:40:44,800 DEV : loss 0.16380402445793152 - f1-score (micro avg)  0.7751
2023-10-25 21:40:44,805 ----------------------------------------------------------------------------------------------------
2023-10-25 21:40:46,342 epoch 6 - iter 27/272 - loss 0.01481133 - time (sec): 1.54 - samples/sec: 3732.48 - lr: 0.000027 - momentum: 0.000000
2023-10-25 21:40:47,863 epoch 6 - iter 54/272 - loss 0.01850367 - time (sec): 3.06 - samples/sec: 3695.05 - lr: 0.000027 - momentum: 0.000000
2023-10-25 21:40:49,311 epoch 6 - iter 81/272 - loss 0.01883298 - time (sec): 4.50 - samples/sec: 3565.15 - lr: 0.000026 - momentum: 0.000000
2023-10-25 21:40:50,830 epoch 6 - iter 108/272 - loss 0.01882319 - time (sec): 6.02 - samples/sec: 3476.29 - lr: 0.000026 - momentum: 0.000000
2023-10-25 21:40:52,286 epoch 6 - iter 135/272 - loss 0.02367874 - time (sec): 7.48 - samples/sec: 3414.32 - lr: 0.000025 - momentum: 0.000000
2023-10-25 21:40:53,800 epoch 6 - iter 162/272 - loss 0.02132550 - time (sec): 8.99 - samples/sec: 3441.34 - lr: 0.000024 - momentum: 0.000000
2023-10-25 21:40:55,301 epoch 6 - iter 189/272 - loss 0.02360473 - time (sec): 10.49 - samples/sec: 3515.45 - lr: 0.000024 - momentum: 0.000000
2023-10-25 21:40:57,222 epoch 6 - iter 216/272 - loss 0.02161557 - time (sec): 12.42 - samples/sec: 3396.78 - lr: 0.000023 - momentum: 0.000000
2023-10-25 21:40:58,712 epoch 6 - iter 243/272 - loss 0.02278553 - time (sec): 13.90 - samples/sec: 3387.42 - lr: 0.000023 - momentum: 0.000000
2023-10-25 21:41:00,134 epoch 6 - iter 270/272 - loss 0.02113688 - time (sec): 15.33 - samples/sec: 3374.35 - lr: 0.000022 - momentum: 0.000000
2023-10-25 21:41:00,245 ----------------------------------------------------------------------------------------------------
2023-10-25 21:41:00,245 EPOCH 6 done: loss 0.0210 - lr: 0.000022
2023-10-25 21:41:01,550 DEV : loss 0.1769583821296692 - f1-score (micro avg)  0.7993
2023-10-25 21:41:01,557 saving best model
2023-10-25 21:41:02,266 ----------------------------------------------------------------------------------------------------
2023-10-25 21:41:03,785 epoch 7 - iter 27/272 - loss 0.02688400 - time (sec): 1.52 - samples/sec: 2915.87 - lr: 0.000022 - momentum: 0.000000
2023-10-25 21:41:05,247 epoch 7 - iter 54/272 - loss 0.02234700 - time (sec): 2.98 - samples/sec: 3064.03 - lr: 0.000021 - momentum: 0.000000
2023-10-25 21:41:06,737 epoch 7 - iter 81/272 - loss 0.01669336 - time (sec): 4.47 - samples/sec: 3160.69 - lr: 0.000021 - momentum: 0.000000
2023-10-25 21:41:08,276 epoch 7 - iter 108/272 - loss 0.01645342 - time (sec): 6.01 - samples/sec: 3273.87 - lr: 0.000020 - momentum: 0.000000
2023-10-25 21:41:09,728 epoch 7 - iter 135/272 - loss 0.01547215 - time (sec): 7.46 - samples/sec: 3279.73 - lr: 0.000019 - momentum: 0.000000
2023-10-25 21:41:11,200 epoch 7 - iter 162/272 - loss 0.01717666 - time (sec): 8.93 - samples/sec: 3344.47 - lr: 0.000019 - momentum: 0.000000
2023-10-25 21:41:12,708 epoch 7 - iter 189/272 - loss 0.01813691 - time (sec): 10.44 - samples/sec: 3405.98 - lr: 0.000018 - momentum: 0.000000
2023-10-25 21:41:14,193 epoch 7 - iter 216/272 - loss 0.01828301 - time (sec): 11.92 - samples/sec: 3383.23 - lr: 0.000018 - momentum: 0.000000
2023-10-25 21:41:15,699 epoch 7 - iter 243/272 - loss 0.01781809 - time (sec): 13.43 - samples/sec: 3424.57 - lr: 0.000017 - momentum: 0.000000
2023-10-25 21:41:17,291 epoch 7 - iter 270/272 - loss 0.01696126 - time (sec): 15.02 - samples/sec: 3431.39 - lr: 0.000017 - momentum: 0.000000
2023-10-25 21:41:17,416 ----------------------------------------------------------------------------------------------------
2023-10-25 21:41:17,417 EPOCH 7 done: loss 0.0169 - lr: 0.000017
2023-10-25 21:41:18,575 DEV : loss 0.19760426878929138 - f1-score (micro avg)  0.7934
2023-10-25 21:41:18,582 ----------------------------------------------------------------------------------------------------
2023-10-25 21:41:20,040 epoch 8 - iter 27/272 - loss 0.00228138 - time (sec): 1.46 - samples/sec: 3302.28 - lr: 0.000016 - momentum: 0.000000
2023-10-25 21:41:21,592 epoch 8 - iter 54/272 - loss 0.00949457 - time (sec): 3.01 - samples/sec: 3441.69 - lr: 0.000016 - momentum: 0.000000
2023-10-25 21:41:23,128 epoch 8 - iter 81/272 - loss 0.01336783 - time (sec): 4.54 - samples/sec: 3542.85 - lr: 0.000015 - momentum: 0.000000
2023-10-25 21:41:24,674 epoch 8 - iter 108/272 - loss 0.01437311 - time (sec): 6.09 - samples/sec: 3499.09 - lr: 0.000014 - momentum: 0.000000
2023-10-25 21:41:26,109 epoch 8 - iter 135/272 - loss 0.01210584 - time (sec): 7.53 - samples/sec: 3418.67 - lr: 0.000014 - momentum: 0.000000
2023-10-25 21:41:27,576 epoch 8 - iter 162/272 - loss 0.01394525 - time (sec): 8.99 - samples/sec: 3509.76 - lr: 0.000013 - momentum: 0.000000
2023-10-25 21:41:28,942 epoch 8 - iter 189/272 - loss 0.01282887 - time (sec): 10.36 - samples/sec: 3529.47 - lr: 0.000013 - momentum: 0.000000
2023-10-25 21:41:30,391 epoch 8 - iter 216/272 - loss 0.01369915 - time (sec): 11.81 - samples/sec: 3558.74 - lr: 0.000012 - momentum: 0.000000
2023-10-25 21:41:31,813 epoch 8 - iter 243/272 - loss 0.01272860 - time (sec): 13.23 - samples/sec: 3522.68 - lr: 0.000012 - momentum: 0.000000
2023-10-25 21:41:33,284 epoch 8 - iter 270/272 - loss 0.01228205 - time (sec): 14.70 - samples/sec: 3523.27 - lr: 0.000011 - momentum: 0.000000
2023-10-25 21:41:33,379 ----------------------------------------------------------------------------------------------------
2023-10-25 21:41:33,379 EPOCH 8 done: loss 0.0123 - lr: 0.000011
2023-10-25 21:41:34,601 DEV : loss 0.18747395277023315 - f1-score (micro avg)  0.8051
2023-10-25 21:41:34,608 saving best model
2023-10-25 21:41:35,363 ----------------------------------------------------------------------------------------------------
2023-10-25 21:41:36,857 epoch 9 - iter 27/272 - loss 0.00231539 - time (sec): 1.49 - samples/sec: 3274.11 - lr: 0.000011 - momentum: 0.000000
2023-10-25 21:41:38,266 epoch 9 - iter 54/272 - loss 0.00500163 - time (sec): 2.90 - samples/sec: 3069.42 - lr: 0.000010 - momentum: 0.000000
2023-10-25 21:41:39,718 epoch 9 - iter 81/272 - loss 0.00555753 - time (sec): 4.35 - samples/sec: 3291.06 - lr: 0.000009 - momentum: 0.000000
2023-10-25 21:41:41,176 epoch 9 - iter 108/272 - loss 0.00495039 - time (sec): 5.81 - samples/sec: 3300.73 - lr: 0.000009 - momentum: 0.000000
2023-10-25 21:41:42,740 epoch 9 - iter 135/272 - loss 0.00560261 - time (sec): 7.38 - samples/sec: 3353.92 - lr: 0.000008 - momentum: 0.000000
2023-10-25 21:41:44,268 epoch 9 - iter 162/272 - loss 0.00590316 - time (sec): 8.90 - samples/sec: 3494.87 - lr: 0.000008 - momentum: 0.000000
2023-10-25 21:41:45,803 epoch 9 - iter 189/272 - loss 0.00586282 - time (sec): 10.44 - samples/sec: 3518.11 - lr: 0.000007 - momentum: 0.000000
2023-10-25 21:41:47,368 epoch 9 - iter 216/272 - loss 0.00624663 - time (sec): 12.00 - samples/sec: 3538.85 - lr: 0.000007 - momentum: 0.000000
2023-10-25 21:41:48,868 epoch 9 - iter 243/272 - loss 0.00782273 - time (sec): 13.50 - samples/sec: 3556.05 - lr: 0.000006 - momentum: 0.000000
2023-10-25 21:41:50,201 epoch 9 - iter 270/272 - loss 0.00780728 - time (sec): 14.84 - samples/sec: 3480.92 - lr: 0.000006 - momentum: 0.000000
2023-10-25 21:41:50,302 ----------------------------------------------------------------------------------------------------
2023-10-25 21:41:50,302 EPOCH 9 done: loss 0.0078 - lr: 0.000006
2023-10-25 21:41:51,457 DEV : loss 0.1823568344116211 - f1-score (micro avg)  0.8073
2023-10-25 21:41:51,463 saving best model
2023-10-25 21:41:52,196 ----------------------------------------------------------------------------------------------------
2023-10-25 21:41:53,706 epoch 10 - iter 27/272 - loss 0.00845377 - time (sec): 1.51 - samples/sec: 3165.39 - lr: 0.000005 - momentum: 0.000000
2023-10-25 21:41:55,521 epoch 10 - iter 54/272 - loss 0.00663455 - time (sec): 3.32 - samples/sec: 3013.55 - lr: 0.000004 - momentum: 0.000000
2023-10-25 21:41:56,962 epoch 10 - iter 81/272 - loss 0.00509540 - time (sec): 4.76 - samples/sec: 3295.57 - lr: 0.000004 - momentum: 0.000000
2023-10-25 21:41:58,360 epoch 10 - iter 108/272 - loss 0.00549684 - time (sec): 6.16 - samples/sec: 3232.74 - lr: 0.000003 - momentum: 0.000000
2023-10-25 21:41:59,792 epoch 10 - iter 135/272 - loss 0.00473920 - time (sec): 7.59 - samples/sec: 3308.63 - lr: 0.000003 - momentum: 0.000000
2023-10-25 21:42:01,228 epoch 10 - iter 162/272 - loss 0.00471395 - time (sec): 9.03 - samples/sec: 3339.99 - lr: 0.000002 - momentum: 0.000000
2023-10-25 21:42:02,675 epoch 10 - iter 189/272 - loss 0.00409399 - time (sec): 10.48 - samples/sec: 3387.44 - lr: 0.000002 - momentum: 0.000000
2023-10-25 21:42:04,112 epoch 10 - iter 216/272 - loss 0.00421195 - time (sec): 11.91 - samples/sec: 3445.37 - lr: 0.000001 - momentum: 0.000000
2023-10-25 21:42:05,525 epoch 10 - iter 243/272 - loss 0.00424710 - time (sec): 13.33 - samples/sec: 3470.75 - lr: 0.000001 - momentum: 0.000000
2023-10-25 21:42:07,001 epoch 10 - iter 270/272 - loss 0.00508973 - time (sec): 14.80 - samples/sec: 3507.30 - lr: 0.000000 - momentum: 0.000000
2023-10-25 21:42:07,089 ----------------------------------------------------------------------------------------------------
2023-10-25 21:42:07,089 EPOCH 10 done: loss 0.0051 - lr: 0.000000
2023-10-25 21:42:08,275 DEV : loss 0.18975241482257843 - f1-score (micro avg)  0.8022
2023-10-25 21:42:08,776 ----------------------------------------------------------------------------------------------------
2023-10-25 21:42:08,777 Loading model from best epoch ...
2023-10-25 21:42:10,690 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-25 21:42:12,589 Results:
- F-score (micro) 0.7927
- F-score (macro) 0.7586
- Accuracy 0.6734

By class:
              precision    recall  f1-score   support

         LOC     0.8152    0.8910    0.8515       312
         PER     0.6892    0.8317    0.7538       208
         ORG     0.5957    0.5091    0.5490        55
   HumanProd     0.7857    1.0000    0.8800        22

   micro avg     0.7511    0.8392    0.7927       597
   macro avg     0.7215    0.8080    0.7586       597
weighted avg     0.7500    0.8392    0.7906       597

2023-10-25 21:42:12,589 ----------------------------------------------------------------------------------------------------
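The `lr:` column in the log is consistent with the LinearScheduler plugin's linear warmup/decay: over 10 epochs x 272 iterations = 2720 steps with `warmup_fraction: '0.1'`, the learning rate climbs to the 5e-05 peak during epoch 1 and then decays linearly to zero by the last step. A minimal sketch of that schedule (a hypothetical re-implementation for illustration, not Flair's actual scheduler code):

```python
def linear_schedule_lr(step: int, total_steps: int, peak_lr: float,
                       warmup_fraction: float = 0.1) -> float:
    """Linear warmup to peak_lr, then linear decay to zero.

    Hypothetical re-implementation of the schedule implied by the
    'lr:' column in the log above.
    """
    warmup_steps = int(total_steps * warmup_fraction)  # 2720 * 0.1 = 272
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

total_steps = 10 * 272   # max_epochs x iterations per epoch
peak_lr = 5e-05          # learning_rate from the training params
# epoch 1, iter 27 -> ~0.000005 (matches the first logged lr value);
# the final step of epoch 10 -> 0.000000, as logged.
print(linear_schedule_lr(27, total_steps, peak_lr))
print(linear_schedule_lr(total_steps, total_steps, peak_lr))
```

This also explains why the logged lr still reads 0.000049 at the start of epoch 2: warmup ends exactly at the epoch boundary, so decay begins from the peak there.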
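The 17-entry tag dictionary printed after loading the best model follows the BIOES scheme: four entity types (LOC, PER, HumanProd, ORG), each with S-/B-/E-/I- variants, plus the O tag, which is also why the model's final linear layer has `out_features=17`. A quick sketch reconstructing the count (illustrative only; tag order here need not match the dictionary's):

```python
# Four entity types x four positional prefixes + "O" = 17 tags,
# matching out_features=17 of the SequenceTagger's linear layer.
entity_types = ["LOC", "PER", "HumanProd", "ORG"]
tags = ["O"] + [f"{prefix}-{etype}"
                for etype in entity_types
                for prefix in ("S", "B", "E", "I")]
print(len(tags))  # -> 17
```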
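The headline scores can be cross-checked against the per-class table: micro F1 is the harmonic mean of the micro-averaged precision and recall, and macro F1 is the unweighted mean of the per-class F1 scores. A small sanity check (values copied from the report above; tiny deviations are expected because the table's figures are already rounded to four decimals):

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, support) per class, copied from the final report
per_class = {
    "LOC":       (0.8152, 0.8910, 312),
    "PER":       (0.6892, 0.8317, 208),
    "ORG":       (0.5957, 0.5091, 55),
    "HumanProd": (0.7857, 1.0000, 22),
}

micro_f1 = f1(0.7511, 0.8392)  # micro-avg precision/recall from the log
macro_f1 = sum(f1(p, r) for p, r, _ in per_class.values()) / len(per_class)
print(micro_f1, macro_f1)  # ~0.7927 and ~0.7586, matching the report
```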