Upload ./training.log with huggingface_hub (stefan-it, commit 04a6203)
2023-11-15 22:00:25,159 ----------------------------------------------------------------------------------------------------
2023-11-15 22:00:25,161 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): XLMRobertaModel(
      (embeddings): XLMRobertaEmbeddings(
        (word_embeddings): Embedding(250003, 1024)
        (position_embeddings): Embedding(514, 1024, padding_idx=1)
        (token_type_embeddings): Embedding(1, 1024)
        (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): XLMRobertaEncoder(
        (layer): ModuleList(
          (0-23): 24 x XLMRobertaLayer(
            (attention): XLMRobertaAttention(
              (self): XLMRobertaSelfAttention(
                (query): Linear(in_features=1024, out_features=1024, bias=True)
                (key): Linear(in_features=1024, out_features=1024, bias=True)
                (value): Linear(in_features=1024, out_features=1024, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): XLMRobertaSelfOutput(
                (dense): Linear(in_features=1024, out_features=1024, bias=True)
                (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): XLMRobertaIntermediate(
              (dense): Linear(in_features=1024, out_features=4096, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): XLMRobertaOutput(
              (dense): Linear(in_features=4096, out_features=1024, bias=True)
              (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): XLMRobertaPooler(
        (dense): Linear(in_features=1024, out_features=1024, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1024, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
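You can derive the parameter count by hand from the shapes printed in the module listing. The sketch below is plain Python arithmetic over those shapes. It assumes every Linear carries a bias and every LayerNorm has affine weights, as the listing shows. Flair did not produce these numbers.

```python
def linear(n_in: int, n_out: int) -> int:
    """Weight matrix plus bias vector of Linear(n_in, n_out, bias=True)."""
    return n_in * n_out + n_out


def layer_norm(dim: int) -> int:
    """Scale and shift vectors of LayerNorm with elementwise_affine=True."""
    return 2 * dim


# Dimensions read off the repr above.
HIDDEN, FFN, LAYERS = 1024, 4096, 24
VOCAB, MAX_POS, TOKEN_TYPES, NUM_TAGS = 250003, 514, 1, 13

embeddings = (VOCAB + MAX_POS + TOKEN_TYPES) * HIDDEN + layer_norm(HIDDEN)
per_layer = (
    3 * linear(HIDDEN, HIDDEN)   # query, key, value
    + linear(HIDDEN, HIDDEN)     # attention output dense
    + layer_norm(HIDDEN)
    + linear(HIDDEN, FFN)        # intermediate dense
    + linear(FFN, HIDDEN)        # output dense
    + layer_norm(HIDDEN)
)
pooler = linear(HIDDEN, HIDDEN)
encoder_total = embeddings + LAYERS * per_layer + pooler

# LockedDropout and CrossEntropyLoss hold no parameters, so the head is just the Linear layer.
tagger_head = linear(HIDDEN, NUM_TAGS)
total = encoder_total + tagger_head

print(f"per layer:     {per_layer:,}")
print(f"XLM-R encoder: {encoder_total:,}")
print(f"tagger head:   {tagger_head:,}")
print(f"total:         {total:,}")
```

You can check this directly on a loaded model with `sum(p.numel() for p in tagger.parameters())` in PyTorch. Here `tagger` is a hypothetical variable holding the loaded SequenceTagger.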
2023-11-15 22:00:25,161 ----------------------------------------------------------------------------------------------------
2023-11-15 22:00:25,161 MultiCorpus: 30000 train + 10000 dev + 10000 test sentences
- ColumnCorpus Corpus: 20000 train + 0 dev + 0 test sentences - /root/.flair/datasets/ner_multi_xtreme/en
- ColumnCorpus Corpus: 10000 train + 10000 dev + 10000 test sentences - /root/.flair/datasets/ner_multi_xtreme/ka
2023-11-15 22:00:25,162 ----------------------------------------------------------------------------------------------------
2023-11-15 22:00:25,162 Train: 30000 sentences
2023-11-15 22:00:25,162 (train_with_dev=False, train_with_test=False)
2023-11-15 22:00:25,162 ----------------------------------------------------------------------------------------------------
2023-11-15 22:00:25,162 Training Params:
2023-11-15 22:00:25,162 - learning_rate: "5e-06"
2023-11-15 22:00:25,162 - mini_batch_size: "4"
2023-11-15 22:00:25,162 - max_epochs: "10"
2023-11-15 22:00:25,162 - shuffle: "True"
2023-11-15 22:00:25,162 ----------------------------------------------------------------------------------------------------
2023-11-15 22:00:25,162 Plugins:
2023-11-15 22:00:25,162 - TensorboardLogger
2023-11-15 22:00:25,162 - LinearScheduler | warmup_fraction: '0.1'
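The learning-rate schedule implied by these settings can be worked through explicitly. The sketch assumes the LinearScheduler plugin follows the usual linear-warmup/linear-decay rule:

- Ramp from 0 to the peak learning rate over the first `warmup_fraction` of all steps.
- Then decay linearly to 0.

The function name `linear_warmup_lr` is hypothetical. With 30000 training sentences and a mini-batch size of 4, each epoch has 7500 steps, so warmup covers exactly the first epoch.

```python
import math

# Values from the log header; step counts are derived from them.
TRAIN_SENTENCES = 30_000
MINI_BATCH_SIZE = 4
MAX_EPOCHS = 10
PEAK_LR = 5e-6
WARMUP_FRACTION = 0.1

steps_per_epoch = math.ceil(TRAIN_SENTENCES / MINI_BATCH_SIZE)  # "iter 7500/7500"
total_steps = steps_per_epoch * MAX_EPOCHS
warmup_steps = int(WARMUP_FRACTION * total_steps)


def linear_warmup_lr(step: int) -> float:
    """Assumed schedule: linear ramp 0 -> PEAK_LR, then linear decay to 0."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / max(1, total_steps - warmup_steps)


for epoch in range(1, MAX_EPOCHS + 1):
    step = epoch * steps_per_epoch
    print(f"end of epoch {epoch:2d} (step {step}): lr {linear_warmup_lr(step):.6f}")
```

Printed to six decimals, these end-of-epoch values match the `lr` in the `EPOCH n done` lines of this log. That is consistent with this reading of the plugin but does not prove it. Flair's implementation may differ in off-by-one details.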
2023-11-15 22:00:25,162 ----------------------------------------------------------------------------------------------------
2023-11-15 22:00:25,162 Final evaluation on model from best epoch (best-model.pt)
2023-11-15 22:00:25,162 - metric: "('micro avg', 'f1-score')"
2023-11-15 22:00:25,162 ----------------------------------------------------------------------------------------------------
2023-11-15 22:00:25,162 Computation:
2023-11-15 22:00:25,162 - compute on device: cuda:0
2023-11-15 22:00:25,162 - embedding storage: none
2023-11-15 22:00:25,162 ----------------------------------------------------------------------------------------------------
2023-11-15 22:00:25,162 Model training base path: "autotrain-flair-georgian-ner-xlm_r_large-bs4-e10-lr5e-06-1"
2023-11-15 22:00:25,162 ----------------------------------------------------------------------------------------------------
2023-11-15 22:00:25,162 ----------------------------------------------------------------------------------------------------
2023-11-15 22:00:25,162 Logging anything other than scalars to TensorBoard is currently not supported.
2023-11-15 22:02:02,381 epoch 1 - iter 750/7500 - loss 2.97720856 - time (sec): 97.22 - samples/sec: 247.35 - lr: 0.000000 - momentum: 0.000000
2023-11-15 22:03:39,323 epoch 1 - iter 1500/7500 - loss 2.41069745 - time (sec): 194.16 - samples/sec: 251.96 - lr: 0.000001 - momentum: 0.000000
2023-11-15 22:05:13,923 epoch 1 - iter 2250/7500 - loss 2.11659763 - time (sec): 288.76 - samples/sec: 250.97 - lr: 0.000001 - momentum: 0.000000
2023-11-15 22:06:49,395 epoch 1 - iter 3000/7500 - loss 1.85845398 - time (sec): 384.23 - samples/sec: 250.76 - lr: 0.000002 - momentum: 0.000000
2023-11-15 22:08:25,760 epoch 1 - iter 3750/7500 - loss 1.64125510 - time (sec): 480.60 - samples/sec: 250.04 - lr: 0.000002 - momentum: 0.000000
2023-11-15 22:09:57,628 epoch 1 - iter 4500/7500 - loss 1.46644056 - time (sec): 572.46 - samples/sec: 251.97 - lr: 0.000003 - momentum: 0.000000
2023-11-15 22:11:32,160 epoch 1 - iter 5250/7500 - loss 1.33869112 - time (sec): 667.00 - samples/sec: 252.39 - lr: 0.000003 - momentum: 0.000000
2023-11-15 22:13:07,470 epoch 1 - iter 6000/7500 - loss 1.23507864 - time (sec): 762.31 - samples/sec: 252.62 - lr: 0.000004 - momentum: 0.000000
2023-11-15 22:14:40,413 epoch 1 - iter 6750/7500 - loss 1.14834855 - time (sec): 855.25 - samples/sec: 253.64 - lr: 0.000004 - momentum: 0.000000
2023-11-15 22:16:12,674 epoch 1 - iter 7500/7500 - loss 1.08078087 - time (sec): 947.51 - samples/sec: 254.14 - lr: 0.000005 - momentum: 0.000000
2023-11-15 22:16:12,676 ----------------------------------------------------------------------------------------------------
2023-11-15 22:16:12,676 EPOCH 1 done: loss 1.0808 - lr: 0.000005
2023-11-15 22:16:40,276 DEV : loss 0.28522568941116333 - f1-score (micro avg) 0.7689
2023-11-15 22:16:42,355 saving best model
2023-11-15 22:16:45,087 ----------------------------------------------------------------------------------------------------
2023-11-15 22:18:19,302 epoch 2 - iter 750/7500 - loss 0.40628767 - time (sec): 94.21 - samples/sec: 252.98 - lr: 0.000005 - momentum: 0.000000
2023-11-15 22:19:54,957 epoch 2 - iter 1500/7500 - loss 0.39913376 - time (sec): 189.87 - samples/sec: 252.13 - lr: 0.000005 - momentum: 0.000000
2023-11-15 22:21:30,957 epoch 2 - iter 2250/7500 - loss 0.39502501 - time (sec): 285.87 - samples/sec: 252.08 - lr: 0.000005 - momentum: 0.000000
2023-11-15 22:23:08,084 epoch 2 - iter 3000/7500 - loss 0.39217047 - time (sec): 382.99 - samples/sec: 252.19 - lr: 0.000005 - momentum: 0.000000
2023-11-15 22:24:44,510 epoch 2 - iter 3750/7500 - loss 0.39363076 - time (sec): 479.42 - samples/sec: 251.43 - lr: 0.000005 - momentum: 0.000000
2023-11-15 22:26:22,214 epoch 2 - iter 4500/7500 - loss 0.39486610 - time (sec): 577.12 - samples/sec: 250.52 - lr: 0.000005 - momentum: 0.000000
2023-11-15 22:28:03,826 epoch 2 - iter 5250/7500 - loss 0.39412988 - time (sec): 678.74 - samples/sec: 248.23 - lr: 0.000005 - momentum: 0.000000
2023-11-15 22:29:45,422 epoch 2 - iter 6000/7500 - loss 0.39244545 - time (sec): 780.33 - samples/sec: 246.43 - lr: 0.000005 - momentum: 0.000000
2023-11-15 22:31:27,285 epoch 2 - iter 6750/7500 - loss 0.39014491 - time (sec): 882.20 - samples/sec: 245.94 - lr: 0.000005 - momentum: 0.000000
2023-11-15 22:33:08,358 epoch 2 - iter 7500/7500 - loss 0.38962978 - time (sec): 983.27 - samples/sec: 244.89 - lr: 0.000004 - momentum: 0.000000
2023-11-15 22:33:08,361 ----------------------------------------------------------------------------------------------------
2023-11-15 22:33:08,361 EPOCH 2 done: loss 0.3896 - lr: 0.000004
2023-11-15 22:33:34,136 DEV : loss 0.22754639387130737 - f1-score (micro avg) 0.8657
2023-11-15 22:33:36,396 saving best model
2023-11-15 22:33:39,256 ----------------------------------------------------------------------------------------------------
2023-11-15 22:35:17,903 epoch 3 - iter 750/7500 - loss 0.34214105 - time (sec): 98.64 - samples/sec: 244.93 - lr: 0.000004 - momentum: 0.000000
2023-11-15 22:36:57,436 epoch 3 - iter 1500/7500 - loss 0.33960084 - time (sec): 198.18 - samples/sec: 243.91 - lr: 0.000004 - momentum: 0.000000
2023-11-15 22:38:36,787 epoch 3 - iter 2250/7500 - loss 0.34100409 - time (sec): 297.53 - samples/sec: 245.57 - lr: 0.000004 - momentum: 0.000000
2023-11-15 22:40:16,845 epoch 3 - iter 3000/7500 - loss 0.34997877 - time (sec): 397.59 - samples/sec: 242.72 - lr: 0.000004 - momentum: 0.000000
2023-11-15 22:41:55,617 epoch 3 - iter 3750/7500 - loss 0.34973124 - time (sec): 496.36 - samples/sec: 242.81 - lr: 0.000004 - momentum: 0.000000
2023-11-15 22:43:33,457 epoch 3 - iter 4500/7500 - loss 0.35049763 - time (sec): 594.20 - samples/sec: 242.95 - lr: 0.000004 - momentum: 0.000000
2023-11-15 22:45:13,590 epoch 3 - iter 5250/7500 - loss 0.34990880 - time (sec): 694.33 - samples/sec: 242.50 - lr: 0.000004 - momentum: 0.000000
2023-11-15 22:46:52,454 epoch 3 - iter 6000/7500 - loss 0.35059132 - time (sec): 793.19 - samples/sec: 242.62 - lr: 0.000004 - momentum: 0.000000
2023-11-15 22:48:30,458 epoch 3 - iter 6750/7500 - loss 0.34738568 - time (sec): 891.20 - samples/sec: 242.64 - lr: 0.000004 - momentum: 0.000000
2023-11-15 22:50:08,763 epoch 3 - iter 7500/7500 - loss 0.34558871 - time (sec): 989.50 - samples/sec: 243.35 - lr: 0.000004 - momentum: 0.000000
2023-11-15 22:50:08,766 ----------------------------------------------------------------------------------------------------
2023-11-15 22:50:08,766 EPOCH 3 done: loss 0.3456 - lr: 0.000004
2023-11-15 22:50:36,792 DEV : loss 0.2620299756526947 - f1-score (micro avg) 0.8807
2023-11-15 22:50:39,427 saving best model
2023-11-15 22:50:42,689 ----------------------------------------------------------------------------------------------------
2023-11-15 22:52:20,669 epoch 4 - iter 750/7500 - loss 0.28881601 - time (sec): 97.97 - samples/sec: 249.59 - lr: 0.000004 - momentum: 0.000000
2023-11-15 22:53:59,111 epoch 4 - iter 1500/7500 - loss 0.29772971 - time (sec): 196.42 - samples/sec: 247.27 - lr: 0.000004 - momentum: 0.000000
2023-11-15 22:55:37,524 epoch 4 - iter 2250/7500 - loss 0.29353995 - time (sec): 294.83 - samples/sec: 246.88 - lr: 0.000004 - momentum: 0.000000
2023-11-15 22:57:10,987 epoch 4 - iter 3000/7500 - loss 0.29232593 - time (sec): 388.29 - samples/sec: 249.82 - lr: 0.000004 - momentum: 0.000000
2023-11-15 22:58:45,878 epoch 4 - iter 3750/7500 - loss 0.29565608 - time (sec): 483.18 - samples/sec: 250.22 - lr: 0.000004 - momentum: 0.000000
2023-11-15 23:00:15,561 epoch 4 - iter 4500/7500 - loss 0.29546503 - time (sec): 572.87 - samples/sec: 252.61 - lr: 0.000004 - momentum: 0.000000
2023-11-15 23:01:46,907 epoch 4 - iter 5250/7500 - loss 0.29295260 - time (sec): 664.21 - samples/sec: 254.49 - lr: 0.000004 - momentum: 0.000000
2023-11-15 23:03:20,758 epoch 4 - iter 6000/7500 - loss 0.29538906 - time (sec): 758.06 - samples/sec: 254.97 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:04:54,701 epoch 4 - iter 6750/7500 - loss 0.29413686 - time (sec): 852.01 - samples/sec: 254.92 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:06:27,232 epoch 4 - iter 7500/7500 - loss 0.29473517 - time (sec): 944.54 - samples/sec: 254.94 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:06:27,234 ----------------------------------------------------------------------------------------------------
2023-11-15 23:06:27,234 EPOCH 4 done: loss 0.2947 - lr: 0.000003
2023-11-15 23:06:55,453 DEV : loss 0.2627362310886383 - f1-score (micro avg) 0.8931
2023-11-15 23:06:57,258 saving best model
2023-11-15 23:07:00,052 ----------------------------------------------------------------------------------------------------
2023-11-15 23:08:33,866 epoch 5 - iter 750/7500 - loss 0.23475035 - time (sec): 93.81 - samples/sec: 257.86 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:10:06,704 epoch 5 - iter 1500/7500 - loss 0.25020039 - time (sec): 186.65 - samples/sec: 257.30 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:11:40,952 epoch 5 - iter 2250/7500 - loss 0.24718727 - time (sec): 280.90 - samples/sec: 257.62 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:13:19,252 epoch 5 - iter 3000/7500 - loss 0.24230280 - time (sec): 379.20 - samples/sec: 255.38 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:14:53,432 epoch 5 - iter 3750/7500 - loss 0.24621586 - time (sec): 473.38 - samples/sec: 255.15 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:16:28,060 epoch 5 - iter 4500/7500 - loss 0.25446598 - time (sec): 568.01 - samples/sec: 254.99 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:18:01,296 epoch 5 - iter 5250/7500 - loss 0.25785483 - time (sec): 661.24 - samples/sec: 255.64 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:19:34,321 epoch 5 - iter 6000/7500 - loss 0.25542150 - time (sec): 754.27 - samples/sec: 255.29 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:21:06,401 epoch 5 - iter 6750/7500 - loss 0.25788299 - time (sec): 846.35 - samples/sec: 256.25 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:22:42,749 epoch 5 - iter 7500/7500 - loss 0.25897971 - time (sec): 942.69 - samples/sec: 255.43 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:22:42,752 ----------------------------------------------------------------------------------------------------
2023-11-15 23:22:42,752 EPOCH 5 done: loss 0.2590 - lr: 0.000003
2023-11-15 23:23:10,474 DEV : loss 0.28726592659950256 - f1-score (micro avg) 0.8965
2023-11-15 23:23:12,645 saving best model
2023-11-15 23:23:15,948 ----------------------------------------------------------------------------------------------------
2023-11-15 23:24:53,113 epoch 6 - iter 750/7500 - loss 0.21973774 - time (sec): 97.16 - samples/sec: 249.95 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:26:26,815 epoch 6 - iter 1500/7500 - loss 0.21332096 - time (sec): 190.86 - samples/sec: 253.62 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:27:59,646 epoch 6 - iter 2250/7500 - loss 0.21491622 - time (sec): 283.69 - samples/sec: 254.10 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:29:32,191 epoch 6 - iter 3000/7500 - loss 0.21457413 - time (sec): 376.24 - samples/sec: 255.67 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:31:04,817 epoch 6 - iter 3750/7500 - loss 0.21967125 - time (sec): 468.87 - samples/sec: 257.41 - lr: 0.000003 - momentum: 0.000000
2023-11-15 23:32:36,784 epoch 6 - iter 4500/7500 - loss 0.22261148 - time (sec): 560.83 - samples/sec: 257.57 - lr: 0.000002 - momentum: 0.000000
2023-11-15 23:34:09,162 epoch 6 - iter 5250/7500 - loss 0.22064338 - time (sec): 653.21 - samples/sec: 257.83 - lr: 0.000002 - momentum: 0.000000
2023-11-15 23:35:42,101 epoch 6 - iter 6000/7500 - loss 0.21731885 - time (sec): 746.15 - samples/sec: 258.55 - lr: 0.000002 - momentum: 0.000000
2023-11-15 23:37:15,361 epoch 6 - iter 6750/7500 - loss 0.21808265 - time (sec): 839.41 - samples/sec: 257.99 - lr: 0.000002 - momentum: 0.000000
2023-11-15 23:38:49,020 epoch 6 - iter 7500/7500 - loss 0.21803191 - time (sec): 933.07 - samples/sec: 258.07 - lr: 0.000002 - momentum: 0.000000
2023-11-15 23:38:49,030 ----------------------------------------------------------------------------------------------------
2023-11-15 23:38:49,030 EPOCH 6 done: loss 0.2180 - lr: 0.000002
2023-11-15 23:39:16,657 DEV : loss 0.2947460412979126 - f1-score (micro avg) 0.8946
2023-11-15 23:39:18,346 ----------------------------------------------------------------------------------------------------
2023-11-15 23:40:52,475 epoch 7 - iter 750/7500 - loss 0.18291047 - time (sec): 94.13 - samples/sec: 255.74 - lr: 0.000002 - momentum: 0.000000
2023-11-15 23:42:24,451 epoch 7 - iter 1500/7500 - loss 0.18433171 - time (sec): 186.10 - samples/sec: 255.14 - lr: 0.000002 - momentum: 0.000000
2023-11-15 23:43:58,604 epoch 7 - iter 2250/7500 - loss 0.18998389 - time (sec): 280.25 - samples/sec: 253.23 - lr: 0.000002 - momentum: 0.000000
2023-11-15 23:45:30,028 epoch 7 - iter 3000/7500 - loss 0.18175644 - time (sec): 371.68 - samples/sec: 256.35 - lr: 0.000002 - momentum: 0.000000
2023-11-15 23:47:03,392 epoch 7 - iter 3750/7500 - loss 0.18696273 - time (sec): 465.04 - samples/sec: 257.12 - lr: 0.000002 - momentum: 0.000000
2023-11-15 23:48:37,150 epoch 7 - iter 4500/7500 - loss 0.18321438 - time (sec): 558.80 - samples/sec: 257.32 - lr: 0.000002 - momentum: 0.000000
2023-11-15 23:50:10,852 epoch 7 - iter 5250/7500 - loss 0.18492056 - time (sec): 652.50 - samples/sec: 257.36 - lr: 0.000002 - momentum: 0.000000
2023-11-15 23:51:45,033 epoch 7 - iter 6000/7500 - loss 0.18451583 - time (sec): 746.68 - samples/sec: 256.74 - lr: 0.000002 - momentum: 0.000000
2023-11-15 23:53:17,929 epoch 7 - iter 6750/7500 - loss 0.18613635 - time (sec): 839.58 - samples/sec: 257.56 - lr: 0.000002 - momentum: 0.000000
2023-11-15 23:54:48,507 epoch 7 - iter 7500/7500 - loss 0.18639933 - time (sec): 930.16 - samples/sec: 258.88 - lr: 0.000002 - momentum: 0.000000
2023-11-15 23:54:48,510 ----------------------------------------------------------------------------------------------------
2023-11-15 23:54:48,510 EPOCH 7 done: loss 0.1864 - lr: 0.000002
2023-11-15 23:55:15,442 DEV : loss 0.3085970878601074 - f1-score (micro avg) 0.8966
2023-11-15 23:55:18,420 saving best model
2023-11-15 23:55:21,094 ----------------------------------------------------------------------------------------------------
2023-11-15 23:56:54,732 epoch 8 - iter 750/7500 - loss 0.17466309 - time (sec): 93.63 - samples/sec: 253.48 - lr: 0.000002 - momentum: 0.000000
2023-11-15 23:58:27,121 epoch 8 - iter 1500/7500 - loss 0.16813816 - time (sec): 186.02 - samples/sec: 257.36 - lr: 0.000002 - momentum: 0.000000
2023-11-15 23:59:58,131 epoch 8 - iter 2250/7500 - loss 0.16489442 - time (sec): 277.03 - samples/sec: 259.32 - lr: 0.000002 - momentum: 0.000000
2023-11-16 00:01:31,801 epoch 8 - iter 3000/7500 - loss 0.16611691 - time (sec): 370.70 - samples/sec: 260.01 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:03:04,777 epoch 8 - iter 3750/7500 - loss 0.15963682 - time (sec): 463.68 - samples/sec: 260.64 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:04:37,159 epoch 8 - iter 4500/7500 - loss 0.15855342 - time (sec): 556.06 - samples/sec: 260.73 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:06:08,858 epoch 8 - iter 5250/7500 - loss 0.15795009 - time (sec): 647.76 - samples/sec: 260.56 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:07:40,645 epoch 8 - iter 6000/7500 - loss 0.15834278 - time (sec): 739.55 - samples/sec: 260.14 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:09:13,294 epoch 8 - iter 6750/7500 - loss 0.15728929 - time (sec): 832.19 - samples/sec: 259.82 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:10:45,101 epoch 8 - iter 7500/7500 - loss 0.15715876 - time (sec): 924.00 - samples/sec: 260.60 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:10:45,104 ----------------------------------------------------------------------------------------------------
2023-11-16 00:10:45,104 EPOCH 8 done: loss 0.1572 - lr: 0.000001
2023-11-16 00:11:12,824 DEV : loss 0.3132772743701935 - f1-score (micro avg) 0.8987
2023-11-16 00:11:14,864 saving best model
2023-11-16 00:11:17,496 ----------------------------------------------------------------------------------------------------
2023-11-16 00:12:50,277 epoch 9 - iter 750/7500 - loss 0.13402991 - time (sec): 92.78 - samples/sec: 262.49 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:14:25,042 epoch 9 - iter 1500/7500 - loss 0.13544134 - time (sec): 187.54 - samples/sec: 260.51 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:16:00,925 epoch 9 - iter 2250/7500 - loss 0.13605938 - time (sec): 283.43 - samples/sec: 256.17 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:17:34,205 epoch 9 - iter 3000/7500 - loss 0.13264017 - time (sec): 376.71 - samples/sec: 257.28 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:19:09,897 epoch 9 - iter 3750/7500 - loss 0.13248311 - time (sec): 472.40 - samples/sec: 257.65 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:20:46,186 epoch 9 - iter 4500/7500 - loss 0.13242849 - time (sec): 568.69 - samples/sec: 255.77 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:22:19,408 epoch 9 - iter 5250/7500 - loss 0.13193630 - time (sec): 661.91 - samples/sec: 256.28 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:23:53,091 epoch 9 - iter 6000/7500 - loss 0.13145249 - time (sec): 755.59 - samples/sec: 256.01 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:25:24,254 epoch 9 - iter 6750/7500 - loss 0.13171967 - time (sec): 846.75 - samples/sec: 256.47 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:26:55,146 epoch 9 - iter 7500/7500 - loss 0.13410642 - time (sec): 937.65 - samples/sec: 256.81 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:26:55,148 ----------------------------------------------------------------------------------------------------
2023-11-16 00:26:55,148 EPOCH 9 done: loss 0.1341 - lr: 0.000001
2023-11-16 00:27:22,622 DEV : loss 0.33312830328941345 - f1-score (micro avg) 0.8995
2023-11-16 00:27:24,615 saving best model
2023-11-16 00:27:27,266 ----------------------------------------------------------------------------------------------------
2023-11-16 00:29:01,701 epoch 10 - iter 750/7500 - loss 0.12076654 - time (sec): 94.43 - samples/sec: 257.39 - lr: 0.000001 - momentum: 0.000000
2023-11-16 00:30:35,121 epoch 10 - iter 1500/7500 - loss 0.11879889 - time (sec): 187.85 - samples/sec: 259.58 - lr: 0.000000 - momentum: 0.000000
2023-11-16 00:32:08,364 epoch 10 - iter 2250/7500 - loss 0.11450840 - time (sec): 281.09 - samples/sec: 258.55 - lr: 0.000000 - momentum: 0.000000
2023-11-16 00:33:41,838 epoch 10 - iter 3000/7500 - loss 0.10939028 - time (sec): 374.57 - samples/sec: 258.62 - lr: 0.000000 - momentum: 0.000000
2023-11-16 00:35:14,502 epoch 10 - iter 3750/7500 - loss 0.10864189 - time (sec): 467.23 - samples/sec: 259.77 - lr: 0.000000 - momentum: 0.000000
2023-11-16 00:36:47,096 epoch 10 - iter 4500/7500 - loss 0.11020150 - time (sec): 559.83 - samples/sec: 259.27 - lr: 0.000000 - momentum: 0.000000
2023-11-16 00:38:19,472 epoch 10 - iter 5250/7500 - loss 0.11284750 - time (sec): 652.20 - samples/sec: 259.40 - lr: 0.000000 - momentum: 0.000000
2023-11-16 00:39:51,965 epoch 10 - iter 6000/7500 - loss 0.11442017 - time (sec): 744.70 - samples/sec: 259.20 - lr: 0.000000 - momentum: 0.000000
2023-11-16 00:41:24,113 epoch 10 - iter 6750/7500 - loss 0.11440977 - time (sec): 836.84 - samples/sec: 259.01 - lr: 0.000000 - momentum: 0.000000
2023-11-16 00:42:56,471 epoch 10 - iter 7500/7500 - loss 0.11765761 - time (sec): 929.20 - samples/sec: 259.14 - lr: 0.000000 - momentum: 0.000000
2023-11-16 00:42:56,474 ----------------------------------------------------------------------------------------------------
2023-11-16 00:42:56,474 EPOCH 10 done: loss 0.1177 - lr: 0.000000
2023-11-16 00:43:23,420 DEV : loss 0.3264077305793762 - f1-score (micro avg) 0.9005
2023-11-16 00:43:25,353 saving best model
2023-11-16 00:43:30,017 ----------------------------------------------------------------------------------------------------
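Every epoch except epoch 6 ends with `saving best model`, and the final evaluation reloads `best-model.pt`. The sketch below replays that selection over the dev micro-F1 values copied from this log. It assumes a checkpoint is written only when the dev score strictly improves. That assumption matches the log's behaviour but is not taken from Flair's source. The helper `epochs_that_save` is hypothetical.

```python
# Dev micro-F1 per epoch, copied from the "DEV : ..." lines above.
dev_f1 = [0.7689, 0.8657, 0.8807, 0.8931, 0.8965,
          0.8946, 0.8966, 0.8987, 0.8995, 0.9005]


def epochs_that_save(scores):
    """Return the 1-based epochs at which the score sets a new maximum."""
    saved, best = [], float("-inf")
    for epoch, score in enumerate(scores, start=1):
        if score > best:  # assumed: only a strict improvement triggers a save
            best = score
            saved.append(epoch)
    return saved


saved = epochs_that_save(dev_f1)
print("saved after epochs:", saved)
print("best-model.pt comes from epoch", saved[-1])
```

Epoch 7 still counts as an improvement (0.8966 vs. 0.8965 at epoch 5), even though epoch 6 dipped.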
2023-11-16 00:43:30,019 Loading model from best epoch ...
2023-11-16 00:43:39,075 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG, S-PER, B-PER, E-PER, I-PER
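The 13 tags follow the BIOES scheme:

- S: a single-token entity.
- B, I, E: the begin, inside and end tokens of a multi-token entity.
- O: a token outside any entity.

Over three entity types that gives 1 + 4 * 3 = 13 tags, which matches `out_features=13` in the tagger's final Linear layer.

The sketch below rebuilds that tag list and shows one common, strict way of decoding a BIOES sequence into entity spans. `bioes_to_spans` is a hypothetical helper. Flair's own span decoding may treat malformed sequences differently, for example an `I-` tag without a preceding `B-`.

```python
from typing import List, Optional, Tuple

ENTITY_TYPES = ("LOC", "ORG", "PER")

# O first, then S/B/E/I per type: the same order as the dictionary printed above.
TAGS = ["O"] + [f"{prefix}-{typ}" for typ in ENTITY_TYPES for prefix in "SBEI"]


def bioes_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Decode BIOES tags into (type, start, end) spans, end inclusive.

    Strict variant (an assumption): an I- or E- tag that does not continue
    an open B- span of the same type is discarded.
    """
    spans: List[Tuple[str, int, int]] = []
    open_start: Optional[int] = None
    open_type: Optional[str] = None
    for i, tag in enumerate(tags):
        prefix, _, typ = tag.partition("-")
        if prefix == "S":
            spans.append((typ, i, i))
            open_start = None
        elif prefix == "B":
            open_start, open_type = i, typ
        elif prefix == "I":
            if open_start is None or typ != open_type:
                open_start = None  # broken continuation: drop the open span
        elif prefix == "E":
            if open_start is not None and typ == open_type:
                spans.append((typ, open_start, i))
            open_start = None
        else:  # "O" closes any open span without emitting it
            open_start = None
    return spans


print(len(TAGS), TAGS)
print(bioes_to_spans(["B-PER", "E-PER", "O", "S-LOC", "B-ORG", "I-ORG", "E-ORG"]))
```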
2023-11-16 00:44:07,251
Results:
- F-score (micro) 0.9036
- F-score (macro) 0.9025
- Accuracy 0.8526
By class:
              precision    recall  f1-score   support

         LOC     0.9015    0.9153    0.9083      5288
         PER     0.9170    0.9430    0.9298      3962
         ORG     0.8680    0.8708    0.8694      3807

   micro avg     0.8966    0.9107    0.9036     13057
   macro avg     0.8955    0.9097    0.9025     13057
weighted avg     0.8964    0.9107    0.9035     13057
2023-11-16 00:44:07,251 ----------------------------------------------------------------------------------------------------
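The summary scores follow from the per-class table:

- Micro F1 is the harmonic mean of micro precision and micro recall.
- Macro F1 is the unweighted mean of the per-class F1 scores.
- Weighted F1 weights each class's F1 by its support.

The sketch below recomputes them from the rounded figures in the table, so agreement is only expected to about four decimals.

```python
# Per-class (precision, recall, f1, support), copied from the "By class" table.
by_class = {
    "LOC": (0.9015, 0.9153, 0.9083, 5288),
    "PER": (0.9170, 0.9430, 0.9298, 3962),
    "ORG": (0.8680, 0.8708, 0.8694, 3807),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


support = sum(row[3] for row in by_class.values())
micro_f1 = f1(0.8966, 0.9107)  # micro avg precision and recall from the table
macro_f1 = sum(row[2] for row in by_class.values()) / len(by_class)
weighted_f1 = sum(row[2] * row[3] for row in by_class.values()) / support

print(f"support {support}")
print(f"micro F1 {micro_f1:.4f}, macro F1 {macro_f1:.4f}, weighted F1 {weighted_f1:.4f}")
```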