2023-11-16 03:28:06,601 ----------------------------------------------------------------------------------------------------
2023-11-16 03:28:06,603 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): XLMRobertaModel(
      (embeddings): XLMRobertaEmbeddings(
        (word_embeddings): Embedding(250003, 1024)
        (position_embeddings): Embedding(514, 1024, padding_idx=1)
        (token_type_embeddings): Embedding(1, 1024)
        (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): XLMRobertaEncoder(
        (layer): ModuleList(
          (0-23): 24 x XLMRobertaLayer(
            (attention): XLMRobertaAttention(
              (self): XLMRobertaSelfAttention(
                (query): Linear(in_features=1024, out_features=1024, bias=True)
                (key): Linear(in_features=1024, out_features=1024, bias=True)
                (value): Linear(in_features=1024, out_features=1024, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): XLMRobertaSelfOutput(
                (dense): Linear(in_features=1024, out_features=1024, bias=True)
                (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): XLMRobertaIntermediate(
              (dense): Linear(in_features=1024, out_features=4096, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): XLMRobertaOutput(
              (dense): Linear(in_features=4096, out_features=1024, bias=True)
              (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): XLMRobertaPooler(
        (dense): Linear(in_features=1024, out_features=1024, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1024, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-11-16 03:28:06,603 ----------------------------------------------------------------------------------------------------
2023-11-16 03:28:06,603 MultiCorpus: 30000 train + 10000 dev + 10000 test sentences
 - ColumnCorpus Corpus: 20000 train + 0 dev + 0 test sentences - /root/.flair/datasets/ner_multi_xtreme/en
 - ColumnCorpus Corpus: 10000 train + 10000 dev + 10000 test sentences - /root/.flair/datasets/ner_multi_xtreme/ka
2023-11-16 03:28:06,603 ----------------------------------------------------------------------------------------------------
2023-11-16 03:28:06,603 Train:  30000 sentences
2023-11-16 03:28:06,603         (train_with_dev=False, train_with_test=False)
2023-11-16 03:28:06,603 ----------------------------------------------------------------------------------------------------
2023-11-16 03:28:06,604 Training Params:
2023-11-16 03:28:06,604  - learning_rate: "5e-06"
2023-11-16 03:28:06,604  - mini_batch_size: "4"
2023-11-16 03:28:06,604  - max_epochs: "10"
2023-11-16 03:28:06,604  - shuffle: "True"
2023-11-16 03:28:06,604 ----------------------------------------------------------------------------------------------------
2023-11-16 03:28:06,604 Plugins:
2023-11-16 03:28:06,604  - TensorboardLogger
2023-11-16 03:28:06,604  - LinearScheduler | warmup_fraction: '0.1'
2023-11-16 03:28:06,604 ----------------------------------------------------------------------------------------------------
2023-11-16 03:28:06,604 Final evaluation on model from best epoch (best-model.pt)
2023-11-16 03:28:06,604  - metric: "('micro avg', 'f1-score')"
2023-11-16 03:28:06,604 ----------------------------------------------------------------------------------------------------
2023-11-16 03:28:06,604 Computation:
2023-11-16 03:28:06,604  - compute on device: cuda:0
2023-11-16 03:28:06,604  - embedding storage: none
2023-11-16 03:28:06,604 ----------------------------------------------------------------------------------------------------
2023-11-16 03:28:06,604 Model training base path: "autotrain-flair-georgian-ner-xlm_r_large-bs4-e10-lr5e-06-3"
2023-11-16 03:28:06,604 ----------------------------------------------------------------------------------------------------
2023-11-16 03:28:06,604 ----------------------------------------------------------------------------------------------------
2023-11-16 03:28:06,604 Logging anything other than scalars to TensorBoard is currently not supported.
2023-11-16 03:29:38,193 epoch 1 - iter 750/7500 - loss 2.70469216 - time (sec): 91.59 - samples/sec: 264.85 - lr: 0.000000 - momentum: 0.000000
2023-11-16 03:31:09,213 epoch 1 - iter 1500/7500 - loss 2.24893654 - time (sec): 182.61 - samples/sec: 261.81 - lr: 0.000001 - momentum: 0.000000
2023-11-16 03:32:42,308 epoch 1 - iter 2250/7500 - loss 1.97006153 - time (sec): 275.70 - samples/sec: 260.33 - lr: 0.000001 - momentum: 0.000000
2023-11-16 03:34:16,815 epoch 1 - iter 3000/7500 - loss 1.72031860 - time (sec): 370.21 - samples/sec: 260.02 - lr: 0.000002 - momentum: 0.000000
2023-11-16 03:35:50,112 epoch 1 - iter 3750/7500 - loss 1.52308109 - time (sec): 463.51 - samples/sec: 259.42 - lr: 0.000002 - momentum: 0.000000
2023-11-16 03:37:23,760 epoch 1 - iter 4500/7500 - loss 1.36457847 - time (sec): 557.15 - samples/sec: 259.48 - lr: 0.000003 - momentum: 0.000000
2023-11-16 03:38:57,168 epoch 1 - iter 5250/7500 - loss 1.24407079 - time (sec): 650.56 - samples/sec: 259.07 - lr: 0.000003 - momentum: 0.000000
2023-11-16 03:40:28,972 epoch 1 - iter 6000/7500 - loss 1.15260515 - time (sec): 742.37 - samples/sec: 259.75 - lr: 0.000004 - momentum: 0.000000
2023-11-16 03:42:03,894 epoch 1 - iter 6750/7500 - loss 1.07519645 - time (sec): 837.29 - samples/sec: 258.95 - lr: 0.000004 - momentum: 0.000000
2023-11-16 03:43:39,060 epoch 1 - iter 7500/7500 - loss 1.01557427 - time (sec): 932.45 - samples/sec: 258.24 - lr: 0.000005 - momentum: 0.000000
2023-11-16 03:43:39,062 ----------------------------------------------------------------------------------------------------
2023-11-16 03:43:39,063 EPOCH 1 done: loss 1.0156 - lr: 0.000005
2023-11-16 03:44:06,229 DEV : loss 0.27559971809387207 - f1-score (micro avg)  0.8152
2023-11-16 03:44:08,725 saving best model
2023-11-16 03:44:10,470 ----------------------------------------------------------------------------------------------------
2023-11-16 03:45:42,474 epoch 2 - iter 750/7500 - loss 0.39106376 - time (sec): 92.00 - samples/sec: 261.03 - lr: 0.000005 - momentum: 0.000000
2023-11-16 03:47:15,847 epoch 2 - iter 1500/7500 - loss 0.40555598 - time (sec): 185.37 - samples/sec: 261.97 - lr: 0.000005 - momentum: 0.000000
2023-11-16 03:48:49,533 epoch 2 - iter 2250/7500 - loss 0.40652252 - time (sec): 279.06 - samples/sec: 260.36 - lr: 0.000005 - momentum: 0.000000
2023-11-16 03:50:24,376 epoch 2 - iter 3000/7500 - loss 0.40712357 - time (sec): 373.90 - samples/sec: 258.58 - lr: 0.000005 - momentum: 0.000000
2023-11-16 03:52:01,501 epoch 2 - iter 3750/7500 - loss 0.40345429 - time (sec): 471.03 - samples/sec: 256.65 - lr: 0.000005 - momentum: 0.000000
2023-11-16 03:53:38,242 epoch 2 - iter 4500/7500 - loss 0.40372313 - time (sec): 567.77 - samples/sec: 255.87 - lr: 0.000005 - momentum: 0.000000
2023-11-16 03:55:11,702 epoch 2 - iter 5250/7500 - loss 0.40504927 - time (sec): 661.23 - samples/sec: 255.50 - lr: 0.000005 - momentum: 0.000000
2023-11-16 03:56:44,579 epoch 2 - iter 6000/7500 - loss 0.40569421 - time (sec): 754.11 - samples/sec: 256.15 - lr: 0.000005 - momentum: 0.000000
2023-11-16 03:58:17,886 epoch 2 - iter 6750/7500 - loss 0.40571892 - time (sec): 847.41 - samples/sec: 256.18 - lr: 0.000005 - momentum: 0.000000
2023-11-16 03:59:50,847 epoch 2 - iter 7500/7500 - loss 0.40365851 - time (sec): 940.37 - samples/sec: 256.06 - lr: 0.000004 - momentum: 0.000000
2023-11-16 03:59:50,849 ----------------------------------------------------------------------------------------------------
2023-11-16 03:59:50,849 EPOCH 2 done: loss 0.4037 - lr: 0.000004
2023-11-16 04:00:17,681 DEV : loss 0.271997332572937 - f1-score (micro avg)  0.8697
2023-11-16 04:00:20,070 saving best model
2023-11-16 04:00:23,060 ----------------------------------------------------------------------------------------------------
2023-11-16 04:01:57,142 epoch 3 - iter 750/7500 - loss 0.34646794 - time (sec): 94.08 - samples/sec: 250.74 - lr: 0.000004 - momentum: 0.000000
2023-11-16 04:03:32,257 epoch 3 - iter 1500/7500 - loss 0.33277165 - time (sec): 189.19 - samples/sec: 253.91 - lr: 0.000004 - momentum: 0.000000
2023-11-16 04:05:06,742 epoch 3 - iter 2250/7500 - loss 0.34013081 - time (sec): 283.68 - samples/sec: 253.23 - lr: 0.000004 - momentum: 0.000000
2023-11-16 04:06:41,133 epoch 3 - iter 3000/7500 - loss 0.33864371 - time (sec): 378.07 - samples/sec: 253.41 - lr: 0.000004 - momentum: 0.000000
2023-11-16 04:08:14,833 epoch 3 - iter 3750/7500 - loss 0.34190452 - time (sec): 471.77 - samples/sec: 254.37 - lr: 0.000004 - momentum: 0.000000
2023-11-16 04:09:45,391 epoch 3 - iter 4500/7500 - loss 0.34219639 - time (sec): 562.33 - samples/sec: 256.12 - lr: 0.000004 - momentum: 0.000000
2023-11-16 04:11:18,334 epoch 3 - iter 5250/7500 - loss 0.34365478 - time (sec): 655.27 - samples/sec: 256.94 - lr: 0.000004 - momentum: 0.000000
2023-11-16 04:12:52,829 epoch 3 - iter 6000/7500 - loss 0.34431528 - time (sec): 749.76 - samples/sec: 256.24 - lr: 0.000004 - momentum: 0.000000
2023-11-16 04:14:25,065 epoch 3 - iter 6750/7500 - loss 0.34309773 - time (sec): 842.00 - samples/sec: 257.59 - lr: 0.000004 - momentum: 0.000000
2023-11-16 04:15:57,201 epoch 3 - iter 7500/7500 - loss 0.34251715 - time (sec): 934.14 - samples/sec: 257.77 - lr: 0.000004 - momentum: 0.000000
2023-11-16 04:15:57,204 ----------------------------------------------------------------------------------------------------
2023-11-16 04:15:57,204 EPOCH 3 done: loss 0.3425 - lr: 0.000004
2023-11-16 04:16:24,728 DEV : loss 0.2714731991291046 - f1-score (micro avg)  0.8842
2023-11-16 04:16:27,191 saving best model
2023-11-16 04:16:29,639 ----------------------------------------------------------------------------------------------------
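(Added commentary, not trainer output.) The `DEV :` line printed after each epoch carries the micro-averaged dev F1 that drives the "saving best model" decision above. A minimal standalone sketch of pulling those scores out of a log like this one — `best_dev_epoch` is a hypothetical helper, not part of Flair:

```python
import re

# Matches the per-epoch summary line, e.g.:
#   "DEV : loss 0.2714731991291046 - f1-score (micro avg)  0.8842"
DEV_LINE = re.compile(
    r"DEV : loss (?P<loss>[\d.]+) - f1-score \(micro avg\)\s+(?P<f1>[\d.]+)"
)

def best_dev_epoch(log_text: str) -> tuple[int, float]:
    """Return (epoch, micro-F1) of the epoch with the highest dev F1.

    Epoch numbers are inferred from the order of the DEV lines, since
    the lines themselves carry no epoch index.
    """
    scores = [float(m.group("f1")) for m in DEV_LINE.finditer(log_text)]
    if not scores:
        raise ValueError("no DEV lines found in log")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best + 1, scores[best]

# Excerpt of the first three epochs of the log above:
log = """\
2023-11-16 03:44:06,229 DEV : loss 0.27559971809387207 - f1-score (micro avg)  0.8152
2023-11-16 04:00:17,681 DEV : loss 0.271997332572937 - f1-score (micro avg)  0.8697
2023-11-16 04:16:24,728 DEV : loss 0.2714731991291046 - f1-score (micro avg)  0.8842
"""
print(best_dev_epoch(log))  # → (3, 0.8842) for this three-epoch excerpt
```

Run over the full ten-epoch log, this reports epoch 9 (dev F1 0.9069), which is indeed the last epoch after which the trainer saved `best-model.pt`.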
2023-11-16 04:18:06,042 epoch 4 - iter 750/7500 - loss 0.29074268 - time (sec): 96.40 - samples/sec: 252.40 - lr: 0.000004 - momentum: 0.000000
2023-11-16 04:19:39,895 epoch 4 - iter 1500/7500 - loss 0.29294947 - time (sec): 190.25 - samples/sec: 256.92 - lr: 0.000004 - momentum: 0.000000
2023-11-16 04:21:12,116 epoch 4 - iter 2250/7500 - loss 0.29693683 - time (sec): 282.47 - samples/sec: 257.67 - lr: 0.000004 - momentum: 0.000000
2023-11-16 04:22:43,053 epoch 4 - iter 3000/7500 - loss 0.29670062 - time (sec): 373.41 - samples/sec: 259.41 - lr: 0.000004 - momentum: 0.000000
2023-11-16 04:24:17,342 epoch 4 - iter 3750/7500 - loss 0.29561519 - time (sec): 467.70 - samples/sec: 257.80 - lr: 0.000004 - momentum: 0.000000
2023-11-16 04:25:50,373 epoch 4 - iter 4500/7500 - loss 0.29194840 - time (sec): 560.73 - samples/sec: 258.18 - lr: 0.000004 - momentum: 0.000000
2023-11-16 04:27:24,822 epoch 4 - iter 5250/7500 - loss 0.29857267 - time (sec): 655.18 - samples/sec: 257.96 - lr: 0.000004 - momentum: 0.000000
2023-11-16 04:28:58,980 epoch 4 - iter 6000/7500 - loss 0.30018714 - time (sec): 749.34 - samples/sec: 257.24 - lr: 0.000003 - momentum: 0.000000
2023-11-16 04:30:33,294 epoch 4 - iter 6750/7500 - loss 0.30336094 - time (sec): 843.65 - samples/sec: 257.08 - lr: 0.000003 - momentum: 0.000000
2023-11-16 04:32:07,976 epoch 4 - iter 7500/7500 - loss 0.30240959 - time (sec): 938.33 - samples/sec: 256.62 - lr: 0.000003 - momentum: 0.000000
2023-11-16 04:32:07,980 ----------------------------------------------------------------------------------------------------
2023-11-16 04:32:07,980 EPOCH 4 done: loss 0.3024 - lr: 0.000003
2023-11-16 04:32:35,569 DEV : loss 0.2897871732711792 - f1-score (micro avg)  0.8922
2023-11-16 04:32:38,075 saving best model
2023-11-16 04:32:40,983 ----------------------------------------------------------------------------------------------------
2023-11-16 04:34:14,736 epoch 5 - iter 750/7500 - loss 0.22168761 - time (sec): 93.75 - samples/sec: 260.88 - lr: 0.000003 - momentum: 0.000000
2023-11-16 04:35:48,275 epoch 5 - iter 1500/7500 - loss 0.23358638 - time (sec): 187.29 - samples/sec: 258.77 - lr: 0.000003 - momentum: 0.000000
2023-11-16 04:37:23,490 epoch 5 - iter 2250/7500 - loss 0.24130242 - time (sec): 282.50 - samples/sec: 256.72 - lr: 0.000003 - momentum: 0.000000
2023-11-16 04:38:57,959 epoch 5 - iter 3000/7500 - loss 0.24848714 - time (sec): 376.97 - samples/sec: 257.38 - lr: 0.000003 - momentum: 0.000000
2023-11-16 04:40:32,648 epoch 5 - iter 3750/7500 - loss 0.25384312 - time (sec): 471.66 - samples/sec: 255.68 - lr: 0.000003 - momentum: 0.000000
2023-11-16 04:42:08,140 epoch 5 - iter 4500/7500 - loss 0.25352346 - time (sec): 567.15 - samples/sec: 254.83 - lr: 0.000003 - momentum: 0.000000
2023-11-16 04:43:42,002 epoch 5 - iter 5250/7500 - loss 0.25599881 - time (sec): 661.01 - samples/sec: 255.09 - lr: 0.000003 - momentum: 0.000000
2023-11-16 04:45:13,434 epoch 5 - iter 6000/7500 - loss 0.25515887 - time (sec): 752.45 - samples/sec: 255.70 - lr: 0.000003 - momentum: 0.000000
2023-11-16 04:46:47,959 epoch 5 - iter 6750/7500 - loss 0.25539887 - time (sec): 846.97 - samples/sec: 255.87 - lr: 0.000003 - momentum: 0.000000
2023-11-16 04:48:21,835 epoch 5 - iter 7500/7500 - loss 0.25660205 - time (sec): 940.85 - samples/sec: 255.94 - lr: 0.000003 - momentum: 0.000000
2023-11-16 04:48:21,838 ----------------------------------------------------------------------------------------------------
2023-11-16 04:48:21,838 EPOCH 5 done: loss 0.2566 - lr: 0.000003
2023-11-16 04:48:49,130 DEV : loss 0.28101304173469543 - f1-score (micro avg)  0.8973
2023-11-16 04:48:51,696 saving best model
2023-11-16 04:48:53,741 ----------------------------------------------------------------------------------------------------
2023-11-16 04:50:26,277 epoch 6 - iter 750/7500 - loss 0.22465859 - time (sec): 92.53 - samples/sec: 255.27 - lr: 0.000003 - momentum: 0.000000
2023-11-16 04:52:00,738 epoch 6 - iter 1500/7500 - loss 0.21970656 - time (sec): 186.99 - samples/sec: 254.83 - lr: 0.000003 - momentum: 0.000000
2023-11-16 04:53:34,911 epoch 6 - iter 2250/7500 - loss 0.21946764 - time (sec): 281.17 - samples/sec: 255.61 - lr: 0.000003 - momentum: 0.000000
2023-11-16 04:55:09,828 epoch 6 - iter 3000/7500 - loss 0.21638489 - time (sec): 376.08 - samples/sec: 255.02 - lr: 0.000003 - momentum: 0.000000
2023-11-16 04:56:43,313 epoch 6 - iter 3750/7500 - loss 0.21414458 - time (sec): 469.57 - samples/sec: 255.78 - lr: 0.000003 - momentum: 0.000000
2023-11-16 04:58:15,828 epoch 6 - iter 4500/7500 - loss 0.21434532 - time (sec): 562.08 - samples/sec: 256.74 - lr: 0.000002 - momentum: 0.000000
2023-11-16 04:59:47,824 epoch 6 - iter 5250/7500 - loss 0.21772911 - time (sec): 654.08 - samples/sec: 257.12 - lr: 0.000002 - momentum: 0.000000
2023-11-16 05:01:19,648 epoch 6 - iter 6000/7500 - loss 0.21657089 - time (sec): 745.90 - samples/sec: 257.68 - lr: 0.000002 - momentum: 0.000000
2023-11-16 05:02:53,268 epoch 6 - iter 6750/7500 - loss 0.21549326 - time (sec): 839.52 - samples/sec: 257.74 - lr: 0.000002 - momentum: 0.000000
2023-11-16 05:04:26,550 epoch 6 - iter 7500/7500 - loss 0.21351207 - time (sec): 932.80 - samples/sec: 258.14 - lr: 0.000002 - momentum: 0.000000
2023-11-16 05:04:26,555 ----------------------------------------------------------------------------------------------------
2023-11-16 05:04:26,555 EPOCH 6 done: loss 0.2135 - lr: 0.000002
2023-11-16 05:04:53,798 DEV : loss 0.3079068958759308 - f1-score (micro avg)  0.9002
2023-11-16 05:04:56,055 saving best model
2023-11-16 05:04:58,666 ----------------------------------------------------------------------------------------------------
2023-11-16 05:06:33,192 epoch 7 - iter 750/7500 - loss 0.18268097 - time (sec): 94.52 - samples/sec: 251.77 - lr: 0.000002 - momentum: 0.000000
2023-11-16 05:08:05,085 epoch 7 - iter 1500/7500 - loss 0.18175139 - time (sec): 186.42 - samples/sec: 257.03 - lr: 0.000002 - momentum: 0.000000
2023-11-16 05:09:42,400 epoch 7 - iter 2250/7500 - loss 0.19001507 - time (sec): 283.73 - samples/sec: 252.39 - lr: 0.000002 - momentum: 0.000000
2023-11-16 05:11:16,393 epoch 7 - iter 3000/7500 - loss 0.18641112 - time (sec): 377.72 - samples/sec: 253.16 - lr: 0.000002 - momentum: 0.000000
2023-11-16 05:12:51,038 epoch 7 - iter 3750/7500 - loss 0.18515279 - time (sec): 472.37 - samples/sec: 253.66 - lr: 0.000002 - momentum: 0.000000
2023-11-16 05:14:26,806 epoch 7 - iter 4500/7500 - loss 0.18525402 - time (sec): 568.14 - samples/sec: 253.77 - lr: 0.000002 - momentum: 0.000000
2023-11-16 05:16:01,622 epoch 7 - iter 5250/7500 - loss 0.18863436 - time (sec): 662.95 - samples/sec: 253.50 - lr: 0.000002 - momentum: 0.000000
2023-11-16 05:17:35,258 epoch 7 - iter 6000/7500 - loss 0.18494686 - time (sec): 756.59 - samples/sec: 253.73 - lr: 0.000002 - momentum: 0.000000
2023-11-16 05:19:09,299 epoch 7 - iter 6750/7500 - loss 0.18556342 - time (sec): 850.63 - samples/sec: 254.64 - lr: 0.000002 - momentum: 0.000000
2023-11-16 05:20:43,065 epoch 7 - iter 7500/7500 - loss 0.18644460 - time (sec): 944.40 - samples/sec: 254.97 - lr: 0.000002 - momentum: 0.000000
2023-11-16 05:20:43,069 ----------------------------------------------------------------------------------------------------
2023-11-16 05:20:43,069 EPOCH 7 done: loss 0.1864 - lr: 0.000002
2023-11-16 05:21:10,981 DEV : loss 0.2802160382270813 - f1-score (micro avg)  0.9048
2023-11-16 05:21:13,241 saving best model
2023-11-16 05:21:15,612 ----------------------------------------------------------------------------------------------------
2023-11-16 05:22:49,896 epoch 8 - iter 750/7500 - loss 0.14122739 - time (sec): 94.28 - samples/sec: 259.14 - lr: 0.000002 - momentum: 0.000000
2023-11-16 05:24:23,137 epoch 8 - iter 1500/7500 - loss 0.14874139 - time (sec): 187.52 - samples/sec: 258.77 - lr: 0.000002 - momentum: 0.000000
2023-11-16 05:25:56,602 epoch 8 - iter 2250/7500 - loss 0.15341856 - time (sec): 280.99 - samples/sec: 257.70 - lr: 0.000002 - momentum: 0.000000
2023-11-16 05:27:28,992 epoch 8 - iter 3000/7500 - loss 0.15416389 - time (sec): 373.38 - samples/sec: 258.36 - lr: 0.000001 - momentum: 0.000000
2023-11-16 05:29:03,053 epoch 8 - iter 3750/7500 - loss 0.15634692 - time (sec): 467.44 - samples/sec: 257.36 - lr: 0.000001 - momentum: 0.000000
2023-11-16 05:30:37,470 epoch 8 - iter 4500/7500 - loss 0.15700278 - time (sec): 561.85 - samples/sec: 256.31 - lr: 0.000001 - momentum: 0.000000
2023-11-16 05:32:10,851 epoch 8 - iter 5250/7500 - loss 0.15692674 - time (sec): 655.24 - samples/sec: 256.36 - lr: 0.000001 - momentum: 0.000000
2023-11-16 05:33:44,315 epoch 8 - iter 6000/7500 - loss 0.15879525 - time (sec): 748.70 - samples/sec: 256.96 - lr: 0.000001 - momentum: 0.000000
2023-11-16 05:35:16,930 epoch 8 - iter 6750/7500 - loss 0.15726830 - time (sec): 841.31 - samples/sec: 257.53 - lr: 0.000001 - momentum: 0.000000
2023-11-16 05:36:52,298 epoch 8 - iter 7500/7500 - loss 0.15647824 - time (sec): 936.68 - samples/sec: 257.07 - lr: 0.000001 - momentum: 0.000000
2023-11-16 05:36:52,300 ----------------------------------------------------------------------------------------------------
2023-11-16 05:36:52,301 EPOCH 8 done: loss 0.1565 - lr: 0.000001
2023-11-16 05:37:19,973 DEV : loss 0.3105733096599579 - f1-score (micro avg)  0.9056
2023-11-16 05:37:21,975 saving best model
2023-11-16 05:37:24,268 ----------------------------------------------------------------------------------------------------
2023-11-16 05:38:57,543 epoch 9 - iter 750/7500 - loss 0.13578701 - time (sec): 93.27 - samples/sec: 260.82 - lr: 0.000001 - momentum: 0.000000
2023-11-16 05:40:30,477 epoch 9 - iter 1500/7500 - loss 0.13977943 - time (sec): 186.21 - samples/sec: 258.36 - lr: 0.000001 - momentum: 0.000000
2023-11-16 05:42:04,338 epoch 9 - iter 2250/7500 - loss 0.13579281 - time (sec): 280.07 - samples/sec: 257.18 - lr: 0.000001 - momentum: 0.000000
2023-11-16 05:43:38,134 epoch 9 - iter 3000/7500 - loss 0.13083188 - time (sec): 373.86 - samples/sec: 257.67 - lr: 0.000001 - momentum: 0.000000
2023-11-16 05:45:11,777 epoch 9 - iter 3750/7500 - loss 0.13761002 - time (sec): 467.51 - samples/sec: 257.61 - lr: 0.000001 - momentum: 0.000000
2023-11-16 05:46:46,321 epoch 9 - iter 4500/7500 - loss 0.13992387 - time (sec): 562.05 - samples/sec: 256.71 - lr: 0.000001 - momentum: 0.000000
2023-11-16 05:48:20,030 epoch 9 - iter 5250/7500 - loss 0.13868841 - time (sec): 655.76 - samples/sec: 256.12 - lr: 0.000001 - momentum: 0.000000
2023-11-16 05:49:52,363 epoch 9 - iter 6000/7500 - loss 0.13924211 - time (sec): 748.09 - samples/sec: 256.89 - lr: 0.000001 - momentum: 0.000000
2023-11-16 05:51:24,421 epoch 9 - iter 6750/7500 - loss 0.13714285 - time (sec): 840.15 - samples/sec: 257.44 - lr: 0.000001 - momentum: 0.000000
2023-11-16 05:52:57,141 epoch 9 - iter 7500/7500 - loss 0.13574777 - time (sec): 932.87 - samples/sec: 258.12 - lr: 0.000001 - momentum: 0.000000
2023-11-16 05:52:57,144 ----------------------------------------------------------------------------------------------------
2023-11-16 05:52:57,144 EPOCH 9 done: loss 0.1357 - lr: 0.000001
2023-11-16 05:53:24,122 DEV : loss 0.30354949831962585 - f1-score (micro avg)  0.9069
2023-11-16 05:53:26,256 saving best model
2023-11-16 05:53:28,623 ----------------------------------------------------------------------------------------------------
2023-11-16 05:55:01,065 epoch 10 - iter 750/7500 - loss 0.11662424 - time (sec): 92.44 - samples/sec: 258.78 - lr: 0.000001 - momentum: 0.000000
2023-11-16 05:56:34,198 epoch 10 - iter 1500/7500 - loss 0.10739844 - time (sec): 185.57 - samples/sec: 260.22 - lr: 0.000000 - momentum: 0.000000
2023-11-16 05:58:07,104 epoch 10 - iter 2250/7500 - loss 0.11728002 - time (sec): 278.48 - samples/sec: 261.23 - lr: 0.000000 - momentum: 0.000000
2023-11-16 05:59:38,545 epoch 10 - iter 3000/7500 - loss 0.11111246 - time (sec): 369.92 - samples/sec: 263.47 - lr: 0.000000 - momentum: 0.000000
2023-11-16 06:01:13,020 epoch 10 - iter 3750/7500 - loss 0.11185424 - time (sec): 464.39 - samples/sec: 261.72 - lr: 0.000000 - momentum: 0.000000
2023-11-16 06:02:47,617 epoch 10 - iter 4500/7500 - loss 0.11443883 - time (sec): 558.99 - samples/sec: 260.01 - lr: 0.000000 - momentum: 0.000000
2023-11-16 06:04:22,079 epoch 10 - iter 5250/7500 - loss 0.11684866 - time (sec): 653.45 - samples/sec: 259.18 - lr: 0.000000 - momentum: 0.000000
2023-11-16 06:05:55,899 epoch 10 - iter 6000/7500 - loss 0.11690532 - time (sec): 747.27 - samples/sec: 258.80 - lr: 0.000000 - momentum: 0.000000
2023-11-16 06:07:29,604 epoch 10 - iter 6750/7500 - loss 0.11669926 - time (sec): 840.98 - samples/sec: 258.11 - lr: 0.000000 - momentum: 0.000000
2023-11-16 06:09:02,429 epoch 10 - iter 7500/7500 - loss 0.11723510 - time (sec): 933.80 - samples/sec: 257.87 - lr: 0.000000 - momentum: 0.000000
2023-11-16 06:09:02,432 ----------------------------------------------------------------------------------------------------
2023-11-16 06:09:02,432 EPOCH 10 done: loss 0.1172 - lr: 0.000000
2023-11-16 06:09:29,736 DEV : loss 0.3160940110683441 - f1-score (micro avg)  0.9064
2023-11-16 06:09:34,595 ----------------------------------------------------------------------------------------------------
2023-11-16 06:09:34,598 Loading model from best epoch ...
2023-11-16 06:09:44,517 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG, S-PER, B-PER, E-PER, I-PER
2023-11-16 06:10:13,551 
Results:
- F-score (micro) 0.9076
- F-score (macro) 0.9067
- Accuracy 0.8601

By class:
              precision    recall  f1-score   support

         LOC     0.9066    0.9143    0.9105      5288
         PER     0.9231    0.9485    0.9356      3962
         ORG     0.8737    0.8742    0.8739      3807

   micro avg     0.9022    0.9130    0.9076     13057
   macro avg     0.9012    0.9123    0.9067     13057
weighted avg     0.9020    0.9130    0.9075     13057

2023-11-16 06:10:13,551 ----------------------------------------------------------------------------------------------------
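(Added commentary, not trainer output.) In the final report, "macro avg" is the unweighted mean of the per-class scores, while "weighted avg" weights each class by its support; "micro avg" is computed from the pooled entity counts. A standalone sketch recomputing the aggregate rows from the rounded per-class values — it reproduces the table to within the rounding of its 4-decimal inputs:

```python
# Per-class scores copied from the classification report above
# (already rounded to 4 decimals, so recomputed aggregates can
# differ from the printed rows in the last decimal place).
report = {
    "LOC": {"precision": 0.9066, "recall": 0.9143, "f1": 0.9105, "support": 5288},
    "PER": {"precision": 0.9231, "recall": 0.9485, "f1": 0.9356, "support": 3962},
    "ORG": {"precision": 0.8737, "recall": 0.8742, "f1": 0.8739, "support": 3807},
}

def macro(metric: str) -> float:
    """Unweighted mean over classes (the 'macro avg' row)."""
    return sum(c[metric] for c in report.values()) / len(report)

def weighted(metric: str) -> float:
    """Support-weighted mean over classes (the 'weighted avg' row)."""
    total = sum(c["support"] for c in report.values())
    return sum(c[metric] * c["support"] for c in report.values()) / total

print(round(macro("f1"), 4))     # → 0.9067, matching the "macro avg" row
print(round(weighted("f1"), 4))  # → 0.9074 (report shows 0.9075, from unrounded counts)
```

The micro-averaged row cannot be recomputed exactly from this table alone, since it needs the raw TP/FP/FN counts rather than the rounded per-class ratios.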