2023-11-15 22:00:25,159 ----------------------------------------------------------------------------------------------------
2023-11-15 22:00:25,161 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): XLMRobertaModel(
      (embeddings): XLMRobertaEmbeddings(
        (word_embeddings): Embedding(250003, 1024)
        (position_embeddings): Embedding(514, 1024, padding_idx=1)
        (token_type_embeddings): Embedding(1, 1024)
        (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): XLMRobertaEncoder(
        (layer): ModuleList(
          (0-23): 24 x XLMRobertaLayer(
            (attention): XLMRobertaAttention(
              (self): XLMRobertaSelfAttention(
                (query): Linear(in_features=1024, out_features=1024, bias=True)
                (key): Linear(in_features=1024, out_features=1024, bias=True)
                (value): Linear(in_features=1024, out_features=1024, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): XLMRobertaSelfOutput(
                (dense): Linear(in_features=1024, out_features=1024, bias=True)
                (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): XLMRobertaIntermediate(
              (dense): Linear(in_features=1024, out_features=4096, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): XLMRobertaOutput(
              (dense): Linear(in_features=4096, out_features=1024, bias=True)
              (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): XLMRobertaPooler(
        (dense): Linear(in_features=1024, out_features=1024, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1024, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-11-15 22:00:25,161 ----------------------------------------------------------------------------------------------------
2023-11-15 22:00:25,161 MultiCorpus: 30000 train + 10000 dev + 10000 test sentences
 - ColumnCorpus Corpus: 20000 train + 0 dev + 0 test sentences - /root/.flair/datasets/ner_multi_xtreme/en
 - ColumnCorpus Corpus: 10000 train + 10000 dev + 10000 test sentences - /root/.flair/datasets/ner_multi_xtreme/ka
2023-11-15 22:00:25,162 ----------------------------------------------------------------------------------------------------
2023-11-15 22:00:25,162 Train: 30000 sentences
2023-11-15 22:00:25,162 (train_with_dev=False, train_with_test=False)
2023-11-15 22:00:25,162 ----------------------------------------------------------------------------------------------------
2023-11-15 22:00:25,162 Training Params:
2023-11-15 22:00:25,162  - learning_rate: "5e-06"
2023-11-15 22:00:25,162  - mini_batch_size: "4"
2023-11-15 22:00:25,162  - max_epochs: "10"
2023-11-15 22:00:25,162  - shuffle: "True"
2023-11-15 22:00:25,162 ----------------------------------------------------------------------------------------------------
2023-11-15 22:00:25,162 Plugins:
2023-11-15 22:00:25,162  - TensorboardLogger
2023-11-15 22:00:25,162  - LinearScheduler | warmup_fraction: '0.1'
2023-11-15 22:00:25,162 ----------------------------------------------------------------------------------------------------
2023-11-15 22:00:25,162 Final evaluation on model from best epoch (best-model.pt)
2023-11-15 22:00:25,162  - metric: "('micro avg', 'f1-score')"
2023-11-15 22:00:25,162 ----------------------------------------------------------------------------------------------------
2023-11-15 22:00:25,162 Computation:
2023-11-15 22:00:25,162  - compute on device: cuda:0
2023-11-15 22:00:25,162  - embedding storage: none
2023-11-15 22:00:25,162 ----------------------------------------------------------------------------------------------------
2023-11-15 22:00:25,162 Model training base path: "autotrain-flair-georgian-ner-xlm_r_large-bs4-e10-lr5e-06-1"
2023-11-15 22:00:25,162 ----------------------------------------------------------------------------------------------------
2023-11-15 22:00:25,162 ----------------------------------------------------------------------------------------------------
2023-11-15 22:00:25,162 Logging anything other than scalars to TensorBoard is currently not supported.
2023-11-15 22:02:02,381 epoch 1 - iter 750/7500 - loss 2.97720856 - time (sec): 97.22 - samples/sec: 247.35 - lr: 0.000000 - momentum: 0.000000
2023-11-15 22:03:39,323 epoch 1 - iter 1500/7500 - loss 2.41069745 - time (sec): 194.16 - samples/sec: 251.96 - lr: 0.000001 - momentum: 0.000000
2023-11-15 22:05:13,923 epoch 1 - iter 2250/7500 - loss 2.11659763 - time (sec): 288.76 - samples/sec: 250.97 - lr: 0.000001 - momentum: 0.000000
2023-11-15 22:06:49,395 epoch 1 - iter 3000/7500 - loss 1.85845398 - time (sec): 384.23 - samples/sec: 250.76 - lr: 0.000002 - momentum: 0.000000
2023-11-15 22:08:25,760 epoch 1 - iter 3750/7500 - loss 1.64125510 - time (sec): 480.60 - samples/sec: 250.04 - lr: 0.000002 - momentum: 0.000000
2023-11-15 22:09:57,628 epoch 1 - iter 4500/7500 - loss 1.46644056 - time (sec): 572.46 - samples/sec: 251.97 - lr: 0.000003 - momentum: 0.000000
2023-11-15 22:11:32,160 epoch 1 - iter 5250/7500 - loss 1.33869112 - time (sec): 667.00 - samples/sec: 252.39 - lr: 0.000003 - momentum: 0.000000
2023-11-15 22:13:07,470 epoch 1 - iter 6000/7500 - loss 1.23507864 - time (sec): 762.31 - samples/sec: 252.62 - lr: 0.000004 - momentum: 0.000000
2023-11-15 22:14:40,413 epoch 1 - iter 6750/7500 - loss 1.14834855 - time (sec): 855.25 - samples/sec: 253.64 - lr: 0.000004 - momentum: 0.000000
2023-11-15 22:16:12,674 epoch 1 - iter 7500/7500 - loss 1.08078087 - time (sec): 947.51 - samples/sec: 254.14 - lr: 0.000005 - momentum: 0.000000
2023-11-15 22:16:12,676 ----------------------------------------------------------------------------------------------------
2023-11-15 22:16:12,676 EPOCH 1 done: loss 1.0808 - lr: 0.000005
2023-11-15 22:16:40,276 DEV : loss 0.28522568941116333 - f1-score (micro avg) 0.7689
2023-11-15 22:16:42,355 saving best model
2023-11-15 22:16:45,087 ----------------------------------------------------------------------------------------------------
2023-11-15 22:18:19,302 epoch 2 - iter 750/7500 - loss 0.40628767 - time (sec): 94.21 - samples/sec: 252.98 - lr: 0.000005 - momentum: 0.000000
2023-11-15 22:19:54,957 epoch 2 - iter 1500/7500 - loss 0.39913376 - time (sec): 189.87 - samples/sec: 252.13 - lr: 0.000005 - momentum: 0.000000
2023-11-15 22:21:30,957 epoch 2 - iter 2250/7500 - loss 0.39502501 - time (sec): 285.87 - samples/sec: 252.08 - lr: 0.000005 - momentum: 0.000000
2023-11-15 22:23:08,084 epoch 2 - iter 3000/7500 - loss 0.39217047 - time (sec): 382.99 - samples/sec: 252.19 - lr: 0.000005 - momentum: 0.000000
2023-11-15 22:24:44,510 epoch 2 - iter 3750/7500 - loss 0.39363076 - time (sec): 479.42 - samples/sec: 251.43 - lr: 0.000005 - momentum: 0.000000
2023-11-15 22:26:22,214 epoch 2 - iter 4500/7500 - loss 0.39486610 - time (sec): 577.12 - samples/sec: 250.52 - lr: 0.000005 - momentum: 0.000000
2023-11-15 22:28:03,826 epoch 2 - iter 5250/7500 - loss 0.39412988 - time (sec): 678.74 - samples/sec: 248.23 - lr: 0.000005 - momentum: 0.000000
2023-11-15 22:29:45,422 epoch 2 - iter 6000/7500 - loss 0.39244545 - time (sec): 780.33 - samples/sec: 246.43 - lr: 0.000005 - momentum: 0.000000
2023-11-15 22:31:27,285 epoch 2 - iter 6750/7500 - loss 0.39014491 - time (sec): 882.20 - samples/sec: 245.94 - lr: 0.000005 - momentum: 0.000000
2023-11-15 22:33:08,358 epoch 2 - iter 7500/7500 - loss 0.38962978 - time (sec): 983.27 - samples/sec: 244.89 - lr: 0.000004 - momentum: 0.000000
2023-11-15 22:33:08,361 ----------------------------------------------------------------------------------------------------
2023-11-15 22:33:08,361 EPOCH 2 done: loss 0.3896 - lr: 0.000004
2023-11-15 22:33:34,136 DEV : loss 0.22754639387130737 - f1-score (micro avg) 0.8657
2023-11-15 22:33:36,396 saving best model
2023-11-15 22:33:39,256 ----------------------------------------------------------------------------------------------------
2023-11-15 22:35:17,903 epoch 3 - iter 750/7500 - loss 0.34214105 - time (sec): 98.64 - samples/sec: 244.93 - lr: 0.000004 - momentum: 0.000000
2023-11-15 22:36:57,436 epoch 3 - iter 1500/7500 - loss 0.33960084 - time (sec): 198.18 - samples/sec: 243.91 - lr: 0.000004 - momentum: 0.000000
2023-11-15 22:38:36,787 epoch 3 - iter 2250/7500 - loss 0.34100409 - time (sec): 297.53 - samples/sec: 245.57 - lr: 0.000004 - momentum: 0.000000
2023-11-15 22:40:16,845 epoch 3 - iter 3000/7500 - loss 0.34997877 - time (sec): 397.59 - samples/sec: 242.72 - lr: 0.000004 - momentum: 0.000000
2023-11-15 22:41:55,617 epoch 3 - iter 3750/7500 - loss 0.34973124 - time (sec): 496.36 - samples/sec: 242.81 - lr: 0.000004 - momentum: 0.000000
2023-11-15 22:43:33,457 epoch 3 - iter 4500/7500 - loss 0.35049763 - time (sec): 594.20 - samples/sec: 242.95 - lr: 0.000004 - momentum: 0.000000
2023-11-15 22:45:13,590 epoch 3 - iter 5250/7500 - loss 0.34990880 - time (sec): 694.33 - samples/sec: 242.50 - lr: 0.000004 - momentum: 0.000000
2023-11-15 22:46:52,454 epoch 3 - iter 6000/7500 - loss 0.35059132 - time (sec): 793.19 - samples/sec: 242.62 - lr: 0.000004 - momentum: 0.000000
2023-11-15 22:48:30,458 epoch 3 - iter 6750/7500 - loss 0.34738568 - time (sec): 891.20 - samples/sec: 242.64 - lr: 0.000004 - momentum: 0.000000
2023-11-15 22:50:08,763 epoch 3 - iter 7500/7500 - loss 0.34558871 - time (sec): 989.50 - samples/sec: 243.35 - lr: 0.000004 - momentum: 0.000000
2023-11-15 22:50:08,766 ----------------------------------------------------------------------------------------------------
2023-11-15 22:50:08,766 EPOCH 3 done: loss 0.3456 - lr: 0.000004
2023-11-15 22:50:36,792 DEV : loss 0.2620299756526947 - f1-score (micro avg) 0.8807
2023-11-15 22:50:39,427 saving best model
2023-11-15 22:50:42,689 ----------------------------------------------------------------------------------------------------
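The lr column above follows the LinearScheduler plugin (warmup_fraction '0.1'): the learning rate climbs linearly to the peak of 5e-06 over the first 10% of steps (reached at the end of epoch 1, iteration 7500 of 75,000 total mini-batches) and then decays linearly toward zero by epoch 10. A minimal sketch of that shape, assuming a plain linear warmup/decay; `linear_schedule_lr` is an illustrative helper, not Flair's actual implementation:

```python
def linear_schedule_lr(step, total_steps, peak_lr=5e-6, warmup_fraction=0.1):
    """Linear warmup to peak_lr over warmup_fraction of training,
    then linear decay to zero (sketch of the schedule implied by
    the lr column in the log; exact off-by-one details may differ)."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * (total_steps - step) / max(1, total_steps - warmup_steps)

total = 10 * 7500  # 10 epochs x 7500 mini-batches of size 4

lr_start = linear_schedule_lr(0, total)       # 0.0 at the first step
lr_peak = linear_schedule_lr(7500, total)     # 5e-06 at end of epoch 1
lr_end = linear_schedule_lr(total, total)     # back to 0.0 at the last step
```

This matches the logged values rounded to six decimals: lr 0.000000 early in epoch 1, 0.000005 at its end, and 0.000000 again in epoch 10.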
2023-11-15 22:52:20,669 epoch 4 - iter 750/7500 - loss 0.28881601 - time (sec): 97.97 - samples/sec: 249.59 - lr: 0.000004 - momentum: 0.000000
2023-11-15 22:53:59,111 epoch 4 - iter 1500/7500 - loss 0.29772971 - time (sec): 196.42 - samples/sec: 247.27 - lr: 0.000004 - momentum: 0.000000
2023-11-15 22:55:37,524 epoch 4 - iter 2250/7500 - loss 0.29353995 - time (sec): 294.83 - samples/sec: 246.88 - lr: 0.000004 - momentum: 0.000000
2023-11-15 22:57:10,987 epoch 4 - iter 3000/7500 - loss 0.29232593 - time (sec): 388.29 - samples/sec: 249.82 - lr: 0.000004 - momentum: 0.000000
2023-11-15 22:58:45,878 epoch 4 - iter 3750/7500 - loss 0.29565608 - time (sec): 483.18 - samples/sec: 250.22 - lr: 0.000004 - momentum: 0.000000
2023-11-15 23:00:15,561 epoch 4 - iter 4500/7500 - loss 0.29546503 - time (sec): 572.87 - samples/sec: 252.61 - lr: 0.000004 - momentum: 0.000000
2023-11-15 23:01:46,907 epoch 4 - iter 5250/7500 - loss 0.29295260 - time (sec): 664.21 - samples/sec: 254.49 - lr: 0.000004 - momentum: 0.000000
2023-11-15 23:03:20,758 epoch 4 - iter 6000/7500 - loss 0.29538906 - time (sec): 758.06 - samples/sec: 254.97 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:04:54,701 epoch 4 - iter 6750/7500 - loss 0.29413686 - time (sec): 852.01 - samples/sec: 254.92 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:06:27,232 epoch 4 - iter 7500/7500 - loss 0.29473517 - time (sec): 944.54 - samples/sec: 254.94 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:06:27,234 ----------------------------------------------------------------------------------------------------
2023-11-15 23:06:27,234 EPOCH 4 done: loss 0.2947 - lr: 0.000003
2023-11-15 23:06:55,453 DEV : loss 0.2627362310886383 - f1-score (micro avg) 0.8931
2023-11-15 23:06:57,258 saving best model
2023-11-15 23:07:00,052 ----------------------------------------------------------------------------------------------------
2023-11-15 23:08:33,866 epoch 5 - iter 750/7500 - loss 0.23475035 - time (sec): 93.81 - samples/sec: 257.86 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:10:06,704 epoch 5 - iter 1500/7500 - loss 0.25020039 - time (sec): 186.65 - samples/sec: 257.30 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:11:40,952 epoch 5 - iter 2250/7500 - loss 0.24718727 - time (sec): 280.90 - samples/sec: 257.62 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:13:19,252 epoch 5 - iter 3000/7500 - loss 0.24230280 - time (sec): 379.20 - samples/sec: 255.38 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:14:53,432 epoch 5 - iter 3750/7500 - loss 0.24621586 - time (sec): 473.38 - samples/sec: 255.15 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:16:28,060 epoch 5 - iter 4500/7500 - loss 0.25446598 - time (sec): 568.01 - samples/sec: 254.99 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:18:01,296 epoch 5 - iter 5250/7500 - loss 0.25785483 - time (sec): 661.24 - samples/sec: 255.64 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:19:34,321 epoch 5 - iter 6000/7500 - loss 0.25542150 - time (sec): 754.27 - samples/sec: 255.29 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:21:06,401 epoch 5 - iter 6750/7500 - loss 0.25788299 - time (sec): 846.35 - samples/sec: 256.25 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:22:42,749 epoch 5 - iter 7500/7500 - loss 0.25897971 - time (sec): 942.69 - samples/sec: 255.43 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:22:42,752 ----------------------------------------------------------------------------------------------------
2023-11-15 23:22:42,752 EPOCH 5 done: loss 0.2590 - lr: 0.000003
2023-11-15 23:23:10,474 DEV : loss 0.28726592659950256 - f1-score (micro avg) 0.8965
2023-11-15 23:23:12,645 saving best model
2023-11-15 23:23:15,948 ----------------------------------------------------------------------------------------------------
2023-11-15 23:24:53,113 epoch 6 - iter 750/7500 - loss 0.21973774 - time (sec): 97.16 - samples/sec: 249.95 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:26:26,815 epoch 6 - iter 1500/7500 - loss 0.21332096 - time (sec): 190.86 - samples/sec: 253.62 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:27:59,646 epoch 6 - iter 2250/7500 - loss 0.21491622 - time (sec): 283.69 - samples/sec: 254.10 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:29:32,191 epoch 6 - iter 3000/7500 - loss 0.21457413 - time (sec): 376.24 - samples/sec: 255.67 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:31:04,817 epoch 6 - iter 3750/7500 - loss 0.21967125 - time (sec): 468.87 - samples/sec: 257.41 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:32:36,784 epoch 6 - iter 4500/7500 - loss 0.22261148 - time (sec): 560.83 - samples/sec: 257.57 - lr: 0.000002 - momentum: 0.000000
2023-11-15 23:34:09,162 epoch 6 - iter 5250/7500 - loss 0.22064338 - time (sec): 653.21 - samples/sec: 257.83 - lr: 0.000002 - momentum: 0.000000
2023-11-15 23:35:42,101 epoch 6 - iter 6000/7500 - loss 0.21731885 - time (sec): 746.15 - samples/sec: 258.55 - lr: 0.000002 - momentum: 0.000000
2023-11-15 23:37:15,361 epoch 6 - iter 6750/7500 - loss 0.21808265 - time (sec): 839.41 - samples/sec: 257.99 - lr: 0.000002 - momentum: 0.000000
2023-11-15 23:38:49,020 epoch 6 - iter 7500/7500 - loss 0.21803191 - time (sec): 933.07 - samples/sec: 258.07 - lr: 0.000002 - momentum: 0.000000
2023-11-15 23:38:49,030 ----------------------------------------------------------------------------------------------------
2023-11-15 23:38:49,030 EPOCH 6 done: loss 0.2180 - lr: 0.000002
2023-11-15 23:39:16,657 DEV : loss 0.2947460412979126 - f1-score (micro avg) 0.8946
2023-11-15 23:39:18,346 ----------------------------------------------------------------------------------------------------
2023-11-15 23:40:52,475 epoch 7 - iter 750/7500 - loss 0.18291047 - time (sec): 94.13 - samples/sec: 255.74 - lr: 0.000002 - momentum: 0.000000
2023-11-15 23:42:24,451 epoch 7 - iter 1500/7500 - loss 0.18433171 - time (sec): 186.10 - samples/sec: 255.14 - lr: 0.000002 - momentum: 0.000000
2023-11-15 23:43:58,604 epoch 7 - iter 2250/7500 - loss 0.18998389 - time (sec): 280.25 - samples/sec: 253.23 - lr: 0.000002 - momentum: 0.000000
2023-11-15 23:45:30,028 epoch 7 - iter 3000/7500 - loss 0.18175644 - time (sec): 371.68 - samples/sec: 256.35 - lr: 0.000002 - momentum: 0.000000
2023-11-15 23:47:03,392 epoch 7 - iter 3750/7500 - loss 0.18696273 - time (sec): 465.04 - samples/sec: 257.12 - lr: 0.000002 - momentum: 0.000000
2023-11-15 23:48:37,150 epoch 7 - iter 4500/7500 - loss 0.18321438 - time (sec): 558.80 - samples/sec: 257.32 - lr: 0.000002 - momentum: 0.000000
2023-11-15 23:50:10,852 epoch 7 - iter 5250/7500 - loss 0.18492056 - time (sec): 652.50 - samples/sec: 257.36 - lr: 0.000002 - momentum: 0.000000
2023-11-15 23:51:45,033 epoch 7 - iter 6000/7500 - loss 0.18451583 - time (sec): 746.68 - samples/sec: 256.74 - lr: 0.000002 - momentum: 0.000000
2023-11-15 23:53:17,929 epoch 7 - iter 6750/7500 - loss 0.18613635 - time (sec): 839.58 - samples/sec: 257.56 - lr: 0.000002 - momentum: 0.000000
2023-11-15 23:54:48,507 epoch 7 - iter 7500/7500 - loss 0.18639933 - time (sec): 930.16 - samples/sec: 258.88 - lr: 0.000002 - momentum: 0.000000
2023-11-15 23:54:48,510 ----------------------------------------------------------------------------------------------------
2023-11-15 23:54:48,510 EPOCH 7 done: loss 0.1864 - lr: 0.000002
2023-11-15 23:55:15,442 DEV : loss 0.3085970878601074 - f1-score (micro avg) 0.8966
2023-11-15 23:55:18,420 saving best model
2023-11-15 23:55:21,094 ----------------------------------------------------------------------------------------------------
2023-11-15 23:56:54,732 epoch 8 - iter 750/7500 - loss 0.17466309 - time (sec): 93.63 - samples/sec: 253.48 - lr: 0.000002 - momentum: 0.000000
2023-11-15 23:58:27,121 epoch 8 - iter 1500/7500 - loss 0.16813816 - time (sec): 186.02 - samples/sec: 257.36 - lr: 0.000002 - momentum: 0.000000
2023-11-15 23:59:58,131 epoch 8 - iter 2250/7500 - loss 0.16489442 - time (sec): 277.03 - samples/sec: 259.32 - lr: 0.000002 - momentum: 0.000000
2023-11-16 00:01:31,801 epoch 8 - iter 3000/7500 - loss 0.16611691 - time (sec): 370.70 - samples/sec: 260.01 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:03:04,777 epoch 8 - iter 3750/7500 - loss 0.15963682 - time (sec): 463.68 - samples/sec: 260.64 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:04:37,159 epoch 8 - iter 4500/7500 - loss 0.15855342 - time (sec): 556.06 - samples/sec: 260.73 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:06:08,858 epoch 8 - iter 5250/7500 - loss 0.15795009 - time (sec): 647.76 - samples/sec: 260.56 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:07:40,645 epoch 8 - iter 6000/7500 - loss 0.15834278 - time (sec): 739.55 - samples/sec: 260.14 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:09:13,294 epoch 8 - iter 6750/7500 - loss 0.15728929 - time (sec): 832.19 - samples/sec: 259.82 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:10:45,101 epoch 8 - iter 7500/7500 - loss 0.15715876 - time (sec): 924.00 - samples/sec: 260.60 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:10:45,104 ----------------------------------------------------------------------------------------------------
2023-11-16 00:10:45,104 EPOCH 8 done: loss 0.1572 - lr: 0.000001
2023-11-16 00:11:12,824 DEV : loss 0.3132772743701935 - f1-score (micro avg) 0.8987
2023-11-16 00:11:14,864 saving best model
2023-11-16 00:11:17,496 ----------------------------------------------------------------------------------------------------
2023-11-16 00:12:50,277 epoch 9 - iter 750/7500 - loss 0.13402991 - time (sec): 92.78 - samples/sec: 262.49 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:14:25,042 epoch 9 - iter 1500/7500 - loss 0.13544134 - time (sec): 187.54 - samples/sec: 260.51 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:16:00,925 epoch 9 - iter 2250/7500 - loss 0.13605938 - time (sec): 283.43 - samples/sec: 256.17 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:17:34,205 epoch 9 - iter 3000/7500 - loss 0.13264017 - time (sec): 376.71 - samples/sec: 257.28 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:19:09,897 epoch 9 - iter 3750/7500 - loss 0.13248311 - time (sec): 472.40 - samples/sec: 257.65 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:20:46,186 epoch 9 - iter 4500/7500 - loss 0.13242849 - time (sec): 568.69 - samples/sec: 255.77 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:22:19,408 epoch 9 - iter 5250/7500 - loss 0.13193630 - time (sec): 661.91 - samples/sec: 256.28 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:23:53,091 epoch 9 - iter 6000/7500 - loss 0.13145249 - time (sec): 755.59 - samples/sec: 256.01 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:25:24,254 epoch 9 - iter 6750/7500 - loss 0.13171967 - time (sec): 846.75 - samples/sec: 256.47 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:26:55,146 epoch 9 - iter 7500/7500 - loss 0.13410642 - time (sec): 937.65 - samples/sec: 256.81 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:26:55,148 ----------------------------------------------------------------------------------------------------
2023-11-16 00:26:55,148 EPOCH 9 done: loss 0.1341 - lr: 0.000001
2023-11-16 00:27:22,622 DEV : loss 0.33312830328941345 - f1-score (micro avg) 0.8995
2023-11-16 00:27:24,615 saving best model
2023-11-16 00:27:27,266 ----------------------------------------------------------------------------------------------------
2023-11-16 00:29:01,701 epoch 10 - iter 750/7500 - loss 0.12076654 - time (sec): 94.43 - samples/sec: 257.39 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:30:35,121 epoch 10 - iter 1500/7500 - loss 0.11879889 - time (sec): 187.85 - samples/sec: 259.58 - lr: 0.000000 - momentum: 0.000000
2023-11-16 00:32:08,364 epoch 10 - iter 2250/7500 - loss 0.11450840 - time (sec): 281.09 - samples/sec: 258.55 - lr: 0.000000 - momentum: 0.000000
2023-11-16 00:33:41,838 epoch 10 - iter 3000/7500 - loss 0.10939028 - time (sec): 374.57 - samples/sec: 258.62 - lr: 0.000000 - momentum: 0.000000
2023-11-16 00:35:14,502 epoch 10 - iter 3750/7500 - loss 0.10864189 - time (sec): 467.23 - samples/sec: 259.77 - lr: 0.000000 - momentum: 0.000000
2023-11-16 00:36:47,096 epoch 10 - iter 4500/7500 - loss 0.11020150 - time (sec): 559.83 - samples/sec: 259.27 - lr: 0.000000 - momentum: 0.000000
2023-11-16 00:38:19,472 epoch 10 - iter 5250/7500 - loss 0.11284750 - time (sec): 652.20 - samples/sec: 259.40 - lr: 0.000000 - momentum: 0.000000
2023-11-16 00:39:51,965 epoch 10 - iter 6000/7500 - loss 0.11442017 - time (sec): 744.70 - samples/sec: 259.20 - lr: 0.000000 - momentum: 0.000000
2023-11-16 00:41:24,113 epoch 10 - iter 6750/7500 - loss 0.11440977 - time (sec): 836.84 - samples/sec: 259.01 - lr: 0.000000 - momentum: 0.000000
2023-11-16 00:42:56,471 epoch 10 - iter 7500/7500 - loss 0.11765761 - time (sec): 929.20 - samples/sec: 259.14 - lr: 0.000000 - momentum: 0.000000
2023-11-16 00:42:56,474 ----------------------------------------------------------------------------------------------------
2023-11-16 00:42:56,474 EPOCH 10 done: loss 0.1177 - lr: 0.000000
2023-11-16 00:43:23,420 DEV : loss 0.3264077305793762 - f1-score (micro avg) 0.9005
2023-11-16 00:43:25,353 saving best model
2023-11-16 00:43:30,017 ----------------------------------------------------------------------------------------------------
2023-11-16 00:43:30,019 Loading model from best epoch ...
2023-11-16 00:43:39,075 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG, S-PER, B-PER, E-PER, I-PER
2023-11-16 00:44:07,251 Results:
- F-score (micro) 0.9036
- F-score (macro) 0.9025
- Accuracy 0.8526

By class:
              precision    recall  f1-score   support

         LOC     0.9015    0.9153    0.9083      5288
         PER     0.9170    0.9430    0.9298      3962
         ORG     0.8680    0.8708    0.8694      3807

   micro avg     0.8966    0.9107    0.9036     13057
   macro avg     0.8955    0.9097    0.9025     13057
weighted avg     0.8964    0.9107    0.9035     13057

2023-11-16 00:44:07,251 ----------------------------------------------------------------------------------------------------
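The F-scores in the final table are the harmonic mean of the precision and recall columns; for the micro average this can be cross-checked directly from the logged values. A small sketch (the `f1` helper is illustrative, not part of Flair):

```python
def f1(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Micro-average precision/recall from the final evaluation table.
micro_f1 = f1(0.8966, 0.9107)  # ~0.9036, matching the logged micro F-score

# Per-class check, e.g. LOC:
loc_f1 = f1(0.9015, 0.9153)    # ~0.9083, matching the LOC row
```

Note that micro-averaged F1 aggregates true positives, false positives, and false negatives over all 13,057 entity spans before computing the score, which is why it matches the harmonic mean of the micro precision and recall exactly, while the macro average is an unweighted mean over the three entity classes.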