2023-10-25 20:51:03,901 ----------------------------------------------------------------------------------------------------
2023-10-25 20:51:03,902 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(64001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-25 20:51:03,902 ----------------------------------------------------------------------------------------------------
2023-10-25 20:51:03,903 MultiCorpus: 1085 train + 148 dev + 364 test sentences
 - NER_HIPE_2022 Corpus: 1085 train + 148 dev + 364 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/sv/with_doc_seperator
2023-10-25 20:51:03,903 ----------------------------------------------------------------------------------------------------
2023-10-25 20:51:03,903 Train: 1085 sentences
2023-10-25 20:51:03,903 (train_with_dev=False, train_with_test=False)
2023-10-25 20:51:03,903 ----------------------------------------------------------------------------------------------------
2023-10-25 20:51:03,903 Training Params:
2023-10-25 20:51:03,903 - learning_rate: "3e-05"
2023-10-25 20:51:03,903 - mini_batch_size: "4"
2023-10-25 20:51:03,903 - max_epochs: "10"
2023-10-25 20:51:03,903 - shuffle: "True"
2023-10-25 20:51:03,903 ----------------------------------------------------------------------------------------------------
2023-10-25 20:51:03,903 Plugins:
2023-10-25 20:51:03,903 - TensorboardLogger
2023-10-25 20:51:03,903 - LinearScheduler | warmup_fraction: '0.1'
2023-10-25 20:51:03,903 ----------------------------------------------------------------------------------------------------
2023-10-25 20:51:03,903 Final evaluation on model from best epoch (best-model.pt)
2023-10-25 20:51:03,903 - metric: "('micro avg', 'f1-score')"
2023-10-25 20:51:03,903 ----------------------------------------------------------------------------------------------------
2023-10-25 20:51:03,903 Computation:
2023-10-25 20:51:03,903 - compute on device: cuda:0
2023-10-25 20:51:03,903 - embedding storage: none
2023-10-25 20:51:03,903 ----------------------------------------------------------------------------------------------------
2023-10-25 20:51:03,904 Model training base path: "hmbench-newseye/sv-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs4-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-1"
2023-10-25 20:51:03,904 ----------------------------------------------------------------------------------------------------
2023-10-25 20:51:03,904 ----------------------------------------------------------------------------------------------------
2023-10-25 20:51:03,904 Logging anything other than scalars to
TensorBoard is currently not supported.
2023-10-25 20:51:05,373 epoch 1 - iter 27/272 - loss 3.35810402 - time (sec): 1.47 - samples/sec: 3492.66 - lr: 0.000003 - momentum: 0.000000
2023-10-25 20:51:06,887 epoch 1 - iter 54/272 - loss 2.62664709 - time (sec): 2.98 - samples/sec: 3401.67 - lr: 0.000006 - momentum: 0.000000
2023-10-25 20:51:08,380 epoch 1 - iter 81/272 - loss 1.99702193 - time (sec): 4.48 - samples/sec: 3374.69 - lr: 0.000009 - momentum: 0.000000
2023-10-25 20:51:09,943 epoch 1 - iter 108/272 - loss 1.58713504 - time (sec): 6.04 - samples/sec: 3404.72 - lr: 0.000012 - momentum: 0.000000
2023-10-25 20:51:11,444 epoch 1 - iter 135/272 - loss 1.36358638 - time (sec): 7.54 - samples/sec: 3368.62 - lr: 0.000015 - momentum: 0.000000
2023-10-25 20:51:12,949 epoch 1 - iter 162/272 - loss 1.18146019 - time (sec): 9.04 - samples/sec: 3376.99 - lr: 0.000018 - momentum: 0.000000
2023-10-25 20:51:14,393 epoch 1 - iter 189/272 - loss 1.06467820 - time (sec): 10.49 - samples/sec: 3358.79 - lr: 0.000021 - momentum: 0.000000
2023-10-25 20:51:15,843 epoch 1 - iter 216/272 - loss 0.96906368 - time (sec): 11.94 - samples/sec: 3321.01 - lr: 0.000024 - momentum: 0.000000
2023-10-25 20:51:17,363 epoch 1 - iter 243/272 - loss 0.86325291 - time (sec): 13.46 - samples/sec: 3386.37 - lr: 0.000027 - momentum: 0.000000
2023-10-25 20:51:18,899 epoch 1 - iter 270/272 - loss 0.78237789 - time (sec): 14.99 - samples/sec: 3458.33 - lr: 0.000030 - momentum: 0.000000
2023-10-25 20:51:19,005 ----------------------------------------------------------------------------------------------------
2023-10-25 20:51:19,005 EPOCH 1 done: loss 0.7808 - lr: 0.000030
2023-10-25 20:51:20,097 DEV : loss 0.15305721759796143 - f1-score (micro avg) 0.6386
2023-10-25 20:51:20,104 saving best model
2023-10-25 20:51:20,577 ----------------------------------------------------------------------------------------------------
2023-10-25 20:51:22,089 epoch 2 - iter 27/272 - loss 0.13098364 - time (sec): 1.51 - samples/sec: 4032.22 - lr: 0.000030 - momentum: 0.000000
2023-10-25 20:51:23,680 epoch 2 - iter 54/272 - loss 0.13797292 - time (sec): 3.10 - samples/sec: 3676.76 - lr: 0.000029 - momentum: 0.000000
2023-10-25 20:51:25,272 epoch 2 - iter 81/272 - loss 0.13466499 - time (sec): 4.69 - samples/sec: 3567.20 - lr: 0.000029 - momentum: 0.000000
2023-10-25 20:51:26,739 epoch 2 - iter 108/272 - loss 0.14329642 - time (sec): 6.16 - samples/sec: 3503.40 - lr: 0.000029 - momentum: 0.000000
2023-10-25 20:51:28,334 epoch 2 - iter 135/272 - loss 0.14715212 - time (sec): 7.76 - samples/sec: 3406.31 - lr: 0.000028 - momentum: 0.000000
2023-10-25 20:51:29,837 epoch 2 - iter 162/272 - loss 0.14527168 - time (sec): 9.26 - samples/sec: 3439.37 - lr: 0.000028 - momentum: 0.000000
2023-10-25 20:51:31,271 epoch 2 - iter 189/272 - loss 0.14428601 - time (sec): 10.69 - samples/sec: 3391.13 - lr: 0.000028 - momentum: 0.000000
2023-10-25 20:51:32,731 epoch 2 - iter 216/272 - loss 0.14094961 - time (sec): 12.15 - samples/sec: 3376.93 - lr: 0.000027 - momentum: 0.000000
2023-10-25 20:51:34,209 epoch 2 - iter 243/272 - loss 0.13825777 - time (sec): 13.63 - samples/sec: 3412.68 - lr: 0.000027 - momentum: 0.000000
2023-10-25 20:51:35,720 epoch 2 - iter 270/272 - loss 0.13406289 - time (sec): 15.14 - samples/sec: 3420.90 - lr: 0.000027 - momentum: 0.000000
2023-10-25 20:51:35,822 ----------------------------------------------------------------------------------------------------
2023-10-25 20:51:35,823 EPOCH 2 done: loss 0.1336 - lr: 0.000027
2023-10-25 20:51:37,026 DEV : loss 0.10869525372982025 - f1-score (micro avg) 0.7612
2023-10-25 20:51:37,032 saving best model
2023-10-25 20:51:37,679 ----------------------------------------------------------------------------------------------------
2023-10-25 20:51:39,125 epoch 3 - iter 27/272 - loss 0.06125906 - time (sec): 1.44 - samples/sec: 3180.36 - lr: 0.000026 - momentum: 0.000000
2023-10-25 20:51:40,638 epoch 3 - iter 54/272 - loss 0.05737341 - time (sec): 2.96 - samples/sec: 3829.78 - lr: 0.000026 - momentum: 0.000000
2023-10-25 20:51:42,149 epoch 3 - iter 81/272 - loss 0.06197287 - time (sec): 4.47 - samples/sec: 3671.03 - lr: 0.000026 - momentum: 0.000000
2023-10-25 20:51:43,609 epoch 3 - iter 108/272 - loss 0.06482587 - time (sec): 5.93 - samples/sec: 3526.11 - lr: 0.000025 - momentum: 0.000000
2023-10-25 20:51:45,141 epoch 3 - iter 135/272 - loss 0.06852388 - time (sec): 7.46 - samples/sec: 3507.27 - lr: 0.000025 - momentum: 0.000000
2023-10-25 20:51:46,621 epoch 3 - iter 162/272 - loss 0.06845236 - time (sec): 8.94 - samples/sec: 3533.20 - lr: 0.000025 - momentum: 0.000000
2023-10-25 20:51:48,120 epoch 3 - iter 189/272 - loss 0.06756303 - time (sec): 10.44 - samples/sec: 3460.60 - lr: 0.000024 - momentum: 0.000000
2023-10-25 20:51:49,609 epoch 3 - iter 216/272 - loss 0.06827772 - time (sec): 11.93 - samples/sec: 3469.25 - lr: 0.000024 - momentum: 0.000000
2023-10-25 20:51:51,080 epoch 3 - iter 243/272 - loss 0.07019403 - time (sec): 13.40 - samples/sec: 3441.78 - lr: 0.000024 - momentum: 0.000000
2023-10-25 20:51:52,526 epoch 3 - iter 270/272 - loss 0.06889892 - time (sec): 14.85 - samples/sec: 3470.59 - lr: 0.000023 - momentum: 0.000000
2023-10-25 20:51:52,642 ----------------------------------------------------------------------------------------------------
2023-10-25 20:51:52,642 EPOCH 3 done: loss 0.0684 - lr: 0.000023
2023-10-25 20:51:53,815 DEV : loss 0.11325477808713913 - f1-score (micro avg) 0.8008
2023-10-25 20:51:53,821 saving best model
2023-10-25 20:51:54,495 ----------------------------------------------------------------------------------------------------
2023-10-25 20:51:55,964 epoch 4 - iter 27/272 - loss 0.03659584 - time (sec): 1.46 - samples/sec: 3878.96 - lr: 0.000023 - momentum: 0.000000
2023-10-25 20:51:57,465 epoch 4 - iter 54/272 - loss 0.03315229 - time (sec): 2.96 - samples/sec: 3655.01 - lr: 0.000023 - momentum: 0.000000
2023-10-25 20:51:58,859 epoch 4 - iter 81/272 - loss 0.03745032 - time (sec): 4.36 - samples/sec: 3462.26 - lr: 0.000022 - momentum: 0.000000
2023-10-25 20:52:00,314 epoch 4 - iter 108/272 - loss 0.03711163 - time (sec): 5.81 - samples/sec: 3479.09 - lr: 0.000022 - momentum: 0.000000
2023-10-25 20:52:01,742 epoch 4 - iter 135/272 - loss 0.03775119 - time (sec): 7.24 - samples/sec: 3466.86 - lr: 0.000022 - momentum: 0.000000
2023-10-25 20:52:03,327 epoch 4 - iter 162/272 - loss 0.04244654 - time (sec): 8.83 - samples/sec: 3400.80 - lr: 0.000021 - momentum: 0.000000
2023-10-25 20:52:04,794 epoch 4 - iter 189/272 - loss 0.04115573 - time (sec): 10.29 - samples/sec: 3389.24 - lr: 0.000021 - momentum: 0.000000
2023-10-25 20:52:06,428 epoch 4 - iter 216/272 - loss 0.04090003 - time (sec): 11.93 - samples/sec: 3466.71 - lr: 0.000021 - momentum: 0.000000
2023-10-25 20:52:07,981 epoch 4 - iter 243/272 - loss 0.04016434 - time (sec): 13.48 - samples/sec: 3404.78 - lr: 0.000020 - momentum: 0.000000
2023-10-25 20:52:09,573 epoch 4 - iter 270/272 - loss 0.03989968 - time (sec): 15.07 - samples/sec: 3430.88 - lr: 0.000020 - momentum: 0.000000
2023-10-25 20:52:09,677 ----------------------------------------------------------------------------------------------------
2023-10-25 20:52:09,677 EPOCH 4 done: loss 0.0397 - lr: 0.000020
2023-10-25 20:52:10,943 DEV : loss 0.1432153433561325 - f1-score (micro avg) 0.7927
2023-10-25 20:52:10,950 ----------------------------------------------------------------------------------------------------
2023-10-25 20:52:12,397 epoch 5 - iter 27/272 - loss 0.03498180 - time (sec): 1.45 - samples/sec: 3105.70 - lr: 0.000020 - momentum: 0.000000
2023-10-25 20:52:13,883 epoch 5 - iter 54/272 - loss 0.03139512 - time (sec): 2.93 - samples/sec: 3136.88 - lr: 0.000019 - momentum: 0.000000
2023-10-25 20:52:15,479 epoch 5 - iter 81/272 - loss 0.02686437 - time (sec): 4.53 - samples/sec: 3278.40 - lr: 0.000019 - momentum: 0.000000
2023-10-25 20:52:17,076 epoch 5 - iter 108/272 - loss 0.02614851 - time (sec): 6.13 - samples/sec: 3313.61 - lr: 0.000019 - momentum: 0.000000
2023-10-25 20:52:18,602 epoch 5 - iter 135/272 - loss 0.02762334 - time (sec): 7.65 - samples/sec: 3288.24 - lr: 0.000018 - momentum: 0.000000
2023-10-25 20:52:20,123 epoch 5 - iter 162/272 - loss 0.02604272 - time (sec): 9.17 - samples/sec: 3385.64 - lr: 0.000018 - momentum: 0.000000
2023-10-25 20:52:21,652 epoch 5 - iter 189/272 - loss 0.02439411 - time (sec): 10.70 - samples/sec: 3327.17 - lr: 0.000018 - momentum: 0.000000
2023-10-25 20:52:23,621 epoch 5 - iter 216/272 - loss 0.02376852 - time (sec): 12.67 - samples/sec: 3258.47 - lr: 0.000017 - momentum: 0.000000
2023-10-25 20:52:25,077 epoch 5 - iter 243/272 - loss 0.02352355 - time (sec): 14.13 - samples/sec: 3289.41 - lr: 0.000017 - momentum: 0.000000
2023-10-25 20:52:26,579 epoch 5 - iter 270/272 - loss 0.02408055 - time (sec): 15.63 - samples/sec: 3309.97 - lr: 0.000017 - momentum: 0.000000
2023-10-25 20:52:26,689 ----------------------------------------------------------------------------------------------------
2023-10-25 20:52:26,689 EPOCH 5 done: loss 0.0244 - lr: 0.000017
2023-10-25 20:52:27,848 DEV : loss 0.16343659162521362 - f1-score (micro avg) 0.8102
2023-10-25 20:52:27,855 saving best model
2023-10-25 20:52:28,577 ----------------------------------------------------------------------------------------------------
2023-10-25 20:52:30,124 epoch 6 - iter 27/272 - loss 0.02652120 - time (sec): 1.52 - samples/sec: 3067.11 - lr: 0.000016 - momentum: 0.000000
2023-10-25 20:52:31,621 epoch 6 - iter 54/272 - loss 0.02631961 - time (sec): 3.02 - samples/sec: 3218.86 - lr: 0.000016 - momentum: 0.000000
2023-10-25 20:52:33,096 epoch 6 - iter 81/272 - loss 0.02198692 - time (sec): 4.49 - samples/sec: 3280.48 - lr: 0.000016 - momentum: 0.000000
2023-10-25 20:52:34,602 epoch 6 - iter 108/272 - loss 0.02473048 - time (sec): 6.00 - samples/sec: 3337.97 - lr: 0.000015 - momentum: 0.000000
2023-10-25 20:52:36,070 epoch 6 - iter 135/272 - loss 0.02285927 - time (sec): 7.47 - samples/sec: 3371.24 - lr: 0.000015 - momentum: 0.000000
2023-10-25 20:52:37,555 epoch 6 - iter 162/272 - loss 0.02122821 - time (sec): 8.95 - samples/sec: 3473.57 - lr: 0.000015 - momentum: 0.000000
2023-10-25 20:52:39,056 epoch 6 - iter 189/272 - loss 0.01959515 - time (sec): 10.45 - samples/sec: 3470.46 - lr: 0.000014 - momentum: 0.000000
2023-10-25 20:52:40,504 epoch 6 - iter 216/272 - loss 0.01869554 - time (sec): 11.90 - samples/sec: 3465.89 - lr: 0.000014 - momentum: 0.000000
2023-10-25 20:52:41,945 epoch 6 - iter 243/272 - loss 0.01895320 - time (sec): 13.34 - samples/sec: 3492.28 - lr: 0.000014 - momentum: 0.000000
2023-10-25 20:52:43,365 epoch 6 - iter 270/272 - loss 0.01873867 - time (sec): 14.76 - samples/sec: 3507.21 - lr: 0.000013 - momentum: 0.000000
2023-10-25 20:52:43,462 ----------------------------------------------------------------------------------------------------
2023-10-25 20:52:43,463 EPOCH 6 done: loss 0.0187 - lr: 0.000013
2023-10-25 20:52:44,733 DEV : loss 0.16893555223941803 - f1-score (micro avg) 0.8324
2023-10-25 20:52:44,741 saving best model
2023-10-25 20:52:45,405 ----------------------------------------------------------------------------------------------------
2023-10-25 20:52:46,859 epoch 7 - iter 27/272 - loss 0.01570413 - time (sec): 1.45 - samples/sec: 3659.52 - lr: 0.000013 - momentum: 0.000000
2023-10-25 20:52:48,351 epoch 7 - iter 54/272 - loss 0.01635881 - time (sec): 2.94 - samples/sec: 3551.57 - lr: 0.000013 - momentum: 0.000000
2023-10-25 20:52:49,917 epoch 7 - iter 81/272 - loss 0.01833799 - time (sec): 4.51 - samples/sec: 3370.52 - lr: 0.000012 - momentum: 0.000000
2023-10-25 20:52:51,376 epoch 7 - iter 108/272 - loss 0.01488772 - time (sec): 5.97 - samples/sec: 3387.92 - lr: 0.000012 - momentum: 0.000000
2023-10-25 20:52:52,826 epoch 7 - iter 135/272 - loss 0.01593043 - time (sec): 7.42 - samples/sec: 3335.88 - lr: 0.000012 - momentum: 0.000000
2023-10-25 20:52:54,341 epoch 7 - iter 162/272 - loss 0.01519486 - time (sec): 8.93 - samples/sec: 3367.70 - lr: 0.000011 - momentum: 0.000000
2023-10-25 20:52:55,849 epoch 7 - iter 189/272 - loss 0.01473941 - time (sec): 10.44 - samples/sec: 3372.45 - lr: 0.000011 - momentum: 0.000000
2023-10-25 20:52:57,397 epoch 7 - iter 216/272 - loss 0.01429210 - time (sec): 11.99 - samples/sec: 3440.67 - lr: 0.000011 - momentum: 0.000000
2023-10-25 20:52:58,888 epoch 7 - iter 243/272 - loss 0.01324011 - time (sec): 13.48 - samples/sec: 3476.27 - lr: 0.000010 - momentum: 0.000000
2023-10-25 20:53:00,322 epoch 7 - iter 270/272 - loss 0.01365909 - time (sec): 14.91 - samples/sec: 3480.96 - lr: 0.000010 - momentum: 0.000000
2023-10-25 20:53:00,416 ----------------------------------------------------------------------------------------------------
2023-10-25 20:53:00,416 EPOCH 7 done: loss 0.0136 - lr: 0.000010
2023-10-25 20:53:01,585 DEV : loss 0.1673850566148758 - f1-score (micro avg) 0.844
2023-10-25 20:53:01,592 saving best model
2023-10-25 20:53:02,278 ----------------------------------------------------------------------------------------------------
2023-10-25 20:53:03,801 epoch 8 - iter 27/272 - loss 0.02161960 - time (sec): 1.52 - samples/sec: 3618.74 - lr: 0.000010 - momentum: 0.000000
2023-10-25 20:53:05,303 epoch 8 - iter 54/272 - loss 0.01680960 - time (sec): 3.02 - samples/sec: 3491.96 - lr: 0.000009 - momentum: 0.000000
2023-10-25 20:53:06,784 epoch 8 - iter 81/272 - loss 0.01355752 - time (sec): 4.50 - samples/sec: 3448.46 - lr: 0.000009 - momentum: 0.000000
2023-10-25 20:53:08,258 epoch 8 - iter 108/272 - loss 0.01579221 - time (sec): 5.98 - samples/sec: 3470.08 - lr: 0.000009 - momentum: 0.000000
2023-10-25 20:53:09,700 epoch 8 - iter 135/272 - loss 0.01480401 - time (sec): 7.42 - samples/sec: 3427.25 - lr: 0.000008 - momentum: 0.000000
2023-10-25 20:53:11,218 epoch 8 - iter 162/272 - loss 0.01266974 - time (sec): 8.94 - samples/sec: 3464.35 - lr: 0.000008 - momentum: 0.000000
2023-10-25 20:53:12,725 epoch 8 - iter 189/272 - loss 0.01141429 - time (sec): 10.44 - samples/sec: 3449.79 - lr: 0.000008 - momentum: 0.000000
2023-10-25 20:53:14,150 epoch 8 - iter 216/272 - loss 0.01178960 - time (sec): 11.87 - samples/sec: 3379.00 - lr: 0.000007 - momentum: 0.000000
2023-10-25 20:53:15,670 epoch 8 - iter 243/272 - loss 0.01123156 - time (sec): 13.39 - samples/sec: 3428.57 - lr: 0.000007 - momentum: 0.000000
2023-10-25 20:53:17,154 epoch 8 - iter 270/272 - loss 0.01148940 - time (sec): 14.87 - samples/sec: 3482.11 - lr: 0.000007 - momentum: 0.000000
2023-10-25 20:53:17,258 ----------------------------------------------------------------------------------------------------
2023-10-25 20:53:17,259 EPOCH 8 done: loss 0.0115 - lr: 0.000007
2023-10-25 20:53:18,828 DEV : loss 0.18429012596607208 - f1-score (micro avg) 0.8429
2023-10-25 20:53:18,834 ----------------------------------------------------------------------------------------------------
2023-10-25 20:53:20,345 epoch 9 - iter 27/272 - loss 0.00096616 - time (sec): 1.51 - samples/sec: 3279.68 - lr: 0.000006 - momentum: 0.000000
2023-10-25 20:53:21,821 epoch 9 - iter 54/272 - loss 0.00315849 - time (sec): 2.99 - samples/sec: 3323.41 - lr: 0.000006 - momentum: 0.000000
2023-10-25 20:53:23,292 epoch 9 - iter 81/272 - loss 0.00292424 - time (sec): 4.46 - samples/sec: 3273.48 - lr: 0.000006 - momentum: 0.000000
2023-10-25 20:53:24,820 epoch 9 - iter 108/272 - loss 0.00436173 - time (sec): 5.98 - samples/sec: 3396.22 - lr: 0.000005 - momentum: 0.000000
2023-10-25 20:53:26,273 epoch 9 - iter 135/272 - loss 0.00392677 - time (sec): 7.44 - samples/sec: 3513.87 - lr: 0.000005 - momentum: 0.000000
2023-10-25 20:53:27,649 epoch 9 - iter 162/272 - loss 0.00622840 - time (sec): 8.81 - samples/sec: 3541.40 - lr: 0.000005 - momentum: 0.000000
2023-10-25 20:53:29,173 epoch 9 - iter 189/272 - loss 0.00621295 - time (sec): 10.34 - samples/sec: 3498.94 - lr: 0.000004 - momentum: 0.000000
2023-10-25 20:53:30,573 epoch 9 - iter 216/272 - loss 0.00596192 - time (sec): 11.74 - samples/sec: 3536.20 - lr: 0.000004 - momentum: 0.000000
2023-10-25 20:53:32,055 epoch 9 - iter 243/272 - loss 0.00724204 - time (sec): 13.22 - samples/sec: 3575.10 - lr: 0.000004 - momentum: 0.000000
2023-10-25 20:53:33,437 epoch 9 - iter 270/272 - loss 0.00728933 - time (sec): 14.60 - samples/sec: 3541.52 - lr: 0.000003 - momentum: 0.000000
2023-10-25 20:53:33,536 ----------------------------------------------------------------------------------------------------
2023-10-25 20:53:33,536 EPOCH 9 done: loss 0.0073 - lr: 0.000003
2023-10-25 20:53:34,754 DEV : loss 0.18683403730392456 - f1-score (micro avg) 0.846
2023-10-25 20:53:34,761 saving best model
2023-10-25 20:53:35,249 ----------------------------------------------------------------------------------------------------
2023-10-25 20:53:36,695 epoch 10 - iter 27/272 - loss 0.00555682 - time (sec): 1.44 - samples/sec: 3069.63 - lr: 0.000003 - momentum: 0.000000
2023-10-25 20:53:38,153 epoch 10 - iter 54/272 - loss 0.00691251 - time (sec): 2.90 - samples/sec: 3140.72 - lr: 0.000003 - momentum: 0.000000
2023-10-25 20:53:39,581 epoch 10 - iter 81/272 - loss 0.00559490 - time (sec): 4.33 - samples/sec: 3363.79 - lr: 0.000002 - momentum: 0.000000
2023-10-25 20:53:41,065 epoch 10 - iter 108/272 - loss 0.00408773 - time (sec): 5.81 - samples/sec: 3510.61 - lr: 0.000002 - momentum: 0.000000
2023-10-25 20:53:42,390 epoch 10 - iter 135/272 - loss 0.00363612 - time (sec): 7.14 - samples/sec: 3426.84 - lr: 0.000002 - momentum: 0.000000
2023-10-25 20:53:43,826 epoch 10 - iter 162/272 - loss 0.00590384 - time (sec): 8.57 - samples/sec: 3439.31 - lr: 0.000001 - momentum: 0.000000
2023-10-25 20:53:45,205 epoch 10 - iter 189/272 - loss 0.00552531 - time (sec): 9.95 - samples/sec: 3489.58 - lr: 0.000001 - momentum: 0.000000
2023-10-25 20:53:46,692 epoch 10 - iter 216/272 - loss 0.00564306 - time (sec): 11.44 - samples/sec: 3572.62 - lr: 0.000001 - momentum: 0.000000
2023-10-25 20:53:48,035 epoch 10 - iter 243/272 - loss 0.00641529 - time (sec): 12.78 - samples/sec: 3538.99 - lr: 0.000000 - momentum: 0.000000
2023-10-25 20:53:49,469 epoch 10 - iter 270/272 - loss 0.00576979 - time (sec): 14.22 - samples/sec: 3623.00 - lr: 0.000000 - momentum: 0.000000
2023-10-25 20:53:49,580 ----------------------------------------------------------------------------------------------------
2023-10-25 20:53:49,580 EPOCH 10 done: loss 0.0057 - lr: 0.000000
2023-10-25 20:53:50,777 DEV : loss 0.18572133779525757 - f1-score (micro avg) 0.845
2023-10-25 20:53:51,215 ----------------------------------------------------------------------------------------------------
2023-10-25 20:53:51,216 Loading model from best epoch ...
2023-10-25 20:53:53,027 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-25 20:53:55,007
Results:
- F-score (micro) 0.7975
- F-score (macro) 0.7465
- Accuracy 0.682

By class:
              precision    recall  f1-score   support

         LOC     0.8160    0.8814    0.8475       312
         PER     0.7137    0.8750    0.7862       208
         ORG     0.5510    0.4909    0.5192        55
   HumanProd     0.7692    0.9091    0.8333        22

   micro avg     0.7556    0.8442    0.7975       597
   macro avg     0.7125    0.7891    0.7465       597
weighted avg     0.7542    0.8442    0.7953       597

2023-10-25 20:53:55,007 ----------------------------------------------------------------------------------------------------