2023-10-12 18:58:20,800 ----------------------------------------------------------------------------------------------------
2023-10-12 18:58:20,802 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-12 18:58:20,803 ----------------------------------------------------------------------------------------------------
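Note: the backbone printed above is a character-level hmByT5 encoder (hidden size 1472, 12 blocks) used as token embeddings, with only locked dropout and a linear layer on top. A minimal, hedged sketch of building comparable embeddings with Flair's generic TransformerWordEmbeddings wrapper follows; the ByT5Embeddings class shown in the log comes from the hmBench training code, and the model id is inferred from the training base path further below, so treat both as assumptions.

# Sketch only: approximates the ByT5Embeddings wrapper from the log with
# Flair's generic TransformerWordEmbeddings; the model id below is inferred
# from the base path logged later in this file (assumption, not verified here).
from flair.embeddings import TransformerWordEmbeddings

embeddings = TransformerWordEmbeddings(
    model="hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax",
    layers="-1",               # "layers-1" in the base path: last encoder layer only
    subtoken_pooling="first",  # "poolingfirst" in the base path
    fine_tune=True,            # encoder weights are updated during training
)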
2023-10-12 18:58:20,803 MultiCorpus: 5777 train + 722 dev + 723 test sentences
- NER_ICDAR_EUROPEANA Corpus: 5777 train + 722 dev + 723 test sentences - /root/.flair/datasets/ner_icdar_europeana/nl
2023-10-12 18:58:20,803 ----------------------------------------------------------------------------------------------------
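The MultiCorpus above wraps only the Dutch split of the ICDAR-Europeana NER dataset (cached under /root/.flair/datasets/ner_icdar_europeana/nl). Assuming Flair's bundled NER_ICDAR_EUROPEANA loader and reusing the embeddings from the previous sketch, the corpus and the 13-tag sequence tagger could be set up roughly like this (a sketch, not the exact hmBench code):

from flair.datasets import NER_ICDAR_EUROPEANA
from flair.models import SequenceTagger

# Dutch split; downloads and caches under ~/.flair/datasets/ner_icdar_europeana/nl
corpus = NER_ICDAR_EUROPEANA(language="nl")

# 13 BIOES labels (O plus S/B/E/I for LOC, PER, ORG), cf. the tag dictionary
# printed at the end of this log
tag_dictionary = corpus.make_label_dictionary(label_type="ner")

# `embeddings` comes from the previous sketch; no CRF and no RNN, matching the
# plain Linear(1472 -> 13) head in the model printout ("crfFalse" in the base path)
tagger = SequenceTagger(
    hidden_size=256,  # unused when use_rnn=False
    embeddings=embeddings,
    tag_dictionary=tag_dictionary,
    tag_type="ner",
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)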
2023-10-12 18:58:20,803 Train: 5777 sentences
2023-10-12 18:58:20,803 (train_with_dev=False, train_with_test=False)
2023-10-12 18:58:20,803 ----------------------------------------------------------------------------------------------------
2023-10-12 18:58:20,803 Training Params:
2023-10-12 18:58:20,803 - learning_rate: "0.00016"
2023-10-12 18:58:20,803 - mini_batch_size: "8"
2023-10-12 18:58:20,803 - max_epochs: "10"
2023-10-12 18:58:20,804 - shuffle: "True"
2023-10-12 18:58:20,804 ----------------------------------------------------------------------------------------------------
2023-10-12 18:58:20,804 Plugins:
2023-10-12 18:58:20,804 - TensorboardLogger
2023-10-12 18:58:20,804 - LinearScheduler | warmup_fraction: '0.1'
2023-10-12 18:58:20,804 ----------------------------------------------------------------------------------------------------
2023-10-12 18:58:20,804 Final evaluation on model from best epoch (best-model.pt)
2023-10-12 18:58:20,804 - metric: "('micro avg', 'f1-score')"
2023-10-12 18:58:20,804 ----------------------------------------------------------------------------------------------------
2023-10-12 18:58:20,804 Computation:
2023-10-12 18:58:20,804 - compute on device: cuda:0
2023-10-12 18:58:20,804 - embedding storage: none
2023-10-12 18:58:20,804 ----------------------------------------------------------------------------------------------------
2023-10-12 18:58:20,804 Model training base path: "hmbench-icdar/nl-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-5"
2023-10-12 18:58:20,804 ----------------------------------------------------------------------------------------------------
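With the corpus and tagger from the sketches above, the logged run (learning rate 0.00016, mini-batch size 8, 10 epochs, linear schedule with 10% warmup) roughly corresponds to Flair's fine-tuning entry point below; the TensorBoard plugin wiring is omitted, and the default warmup fraction of 0.1 is assumed to match the LinearScheduler line above.

from flair.trainers import ModelTrainer

# `tagger` and `corpus` come from the sketches above
trainer = ModelTrainer(tagger, corpus)

# fine_tune uses AdamW with a linear LR schedule and warmup by default;
# hyperparameters taken from the "Training Params" block above
trainer.fine_tune(
    "hmbench-icdar/nl-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-5",
    learning_rate=0.00016,
    mini_batch_size=8,
    max_epochs=10,
)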
2023-10-12 18:58:20,804 ----------------------------------------------------------------------------------------------------
2023-10-12 18:58:20,805 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-12 18:59:00,054 epoch 1 - iter 72/723 - loss 2.54355580 - time (sec): 39.25 - samples/sec: 460.13 - lr: 0.000016 - momentum: 0.000000
2023-10-12 18:59:38,638 epoch 1 - iter 144/723 - loss 2.47351916 - time (sec): 77.83 - samples/sec: 463.81 - lr: 0.000032 - momentum: 0.000000
2023-10-12 19:00:17,358 epoch 1 - iter 216/723 - loss 2.30912831 - time (sec): 116.55 - samples/sec: 449.38 - lr: 0.000048 - momentum: 0.000000
2023-10-12 19:00:59,860 epoch 1 - iter 288/723 - loss 2.09160479 - time (sec): 159.05 - samples/sec: 440.50 - lr: 0.000064 - momentum: 0.000000
2023-10-12 19:01:41,319 epoch 1 - iter 360/723 - loss 1.86564000 - time (sec): 200.51 - samples/sec: 438.01 - lr: 0.000079 - momentum: 0.000000
2023-10-12 19:02:21,012 epoch 1 - iter 432/723 - loss 1.64989151 - time (sec): 240.21 - samples/sec: 435.87 - lr: 0.000095 - momentum: 0.000000
2023-10-12 19:03:00,546 epoch 1 - iter 504/723 - loss 1.45108438 - time (sec): 279.74 - samples/sec: 437.65 - lr: 0.000111 - momentum: 0.000000
2023-10-12 19:03:39,300 epoch 1 - iter 576/723 - loss 1.30837807 - time (sec): 318.49 - samples/sec: 437.09 - lr: 0.000127 - momentum: 0.000000
2023-10-12 19:04:19,048 epoch 1 - iter 648/723 - loss 1.18554857 - time (sec): 358.24 - samples/sec: 438.05 - lr: 0.000143 - momentum: 0.000000
2023-10-12 19:05:00,227 epoch 1 - iter 720/723 - loss 1.07844162 - time (sec): 399.42 - samples/sec: 439.29 - lr: 0.000159 - momentum: 0.000000
2023-10-12 19:05:01,624 ----------------------------------------------------------------------------------------------------
2023-10-12 19:05:01,624 EPOCH 1 done: loss 1.0744 - lr: 0.000159
2023-10-12 19:05:21,904 DEV : loss 0.2226296067237854 - f1-score (micro avg) 0.0
2023-10-12 19:05:21,937 ----------------------------------------------------------------------------------------------------
2023-10-12 19:06:00,883 epoch 2 - iter 72/723 - loss 0.16071835 - time (sec): 38.94 - samples/sec: 447.92 - lr: 0.000158 - momentum: 0.000000
2023-10-12 19:06:40,410 epoch 2 - iter 144/723 - loss 0.14928826 - time (sec): 78.47 - samples/sec: 451.95 - lr: 0.000156 - momentum: 0.000000
2023-10-12 19:07:19,001 epoch 2 - iter 216/723 - loss 0.14640929 - time (sec): 117.06 - samples/sec: 444.07 - lr: 0.000155 - momentum: 0.000000
2023-10-12 19:07:58,952 epoch 2 - iter 288/723 - loss 0.14351447 - time (sec): 157.01 - samples/sec: 444.43 - lr: 0.000153 - momentum: 0.000000
2023-10-12 19:08:37,927 epoch 2 - iter 360/723 - loss 0.13927936 - time (sec): 195.99 - samples/sec: 445.13 - lr: 0.000151 - momentum: 0.000000
2023-10-12 19:09:17,408 epoch 2 - iter 432/723 - loss 0.13562770 - time (sec): 235.47 - samples/sec: 445.08 - lr: 0.000149 - momentum: 0.000000
2023-10-12 19:09:56,251 epoch 2 - iter 504/723 - loss 0.13508475 - time (sec): 274.31 - samples/sec: 445.09 - lr: 0.000148 - momentum: 0.000000
2023-10-12 19:10:35,074 epoch 2 - iter 576/723 - loss 0.13232611 - time (sec): 313.13 - samples/sec: 446.82 - lr: 0.000146 - momentum: 0.000000
2023-10-12 19:11:14,451 epoch 2 - iter 648/723 - loss 0.12941887 - time (sec): 352.51 - samples/sec: 448.75 - lr: 0.000144 - momentum: 0.000000
2023-10-12 19:11:53,004 epoch 2 - iter 720/723 - loss 0.12559223 - time (sec): 391.07 - samples/sec: 449.04 - lr: 0.000142 - momentum: 0.000000
2023-10-12 19:11:54,252 ----------------------------------------------------------------------------------------------------
2023-10-12 19:11:54,252 EPOCH 2 done: loss 0.1254 - lr: 0.000142
2023-10-12 19:12:15,858 DEV : loss 0.10742620378732681 - f1-score (micro avg) 0.7805
2023-10-12 19:12:15,891 saving best model
2023-10-12 19:12:16,807 ----------------------------------------------------------------------------------------------------
2023-10-12 19:12:56,695 epoch 3 - iter 72/723 - loss 0.08288091 - time (sec): 39.89 - samples/sec: 448.88 - lr: 0.000140 - momentum: 0.000000
2023-10-12 19:13:36,974 epoch 3 - iter 144/723 - loss 0.07985183 - time (sec): 80.16 - samples/sec: 448.64 - lr: 0.000139 - momentum: 0.000000
2023-10-12 19:14:16,040 epoch 3 - iter 216/723 - loss 0.07952059 - time (sec): 119.23 - samples/sec: 448.91 - lr: 0.000137 - momentum: 0.000000
2023-10-12 19:14:54,814 epoch 3 - iter 288/723 - loss 0.07742485 - time (sec): 158.01 - samples/sec: 449.88 - lr: 0.000135 - momentum: 0.000000
2023-10-12 19:15:33,909 epoch 3 - iter 360/723 - loss 0.07655356 - time (sec): 197.10 - samples/sec: 451.75 - lr: 0.000133 - momentum: 0.000000
2023-10-12 19:16:13,793 epoch 3 - iter 432/723 - loss 0.07627322 - time (sec): 236.98 - samples/sec: 454.76 - lr: 0.000132 - momentum: 0.000000
2023-10-12 19:16:52,959 epoch 3 - iter 504/723 - loss 0.07598384 - time (sec): 276.15 - samples/sec: 453.26 - lr: 0.000130 - momentum: 0.000000
2023-10-12 19:17:32,309 epoch 3 - iter 576/723 - loss 0.07582371 - time (sec): 315.50 - samples/sec: 450.64 - lr: 0.000128 - momentum: 0.000000
2023-10-12 19:18:11,512 epoch 3 - iter 648/723 - loss 0.07469627 - time (sec): 354.70 - samples/sec: 447.48 - lr: 0.000126 - momentum: 0.000000
2023-10-12 19:18:50,760 epoch 3 - iter 720/723 - loss 0.07352052 - time (sec): 393.95 - samples/sec: 445.53 - lr: 0.000125 - momentum: 0.000000
2023-10-12 19:18:52,075 ----------------------------------------------------------------------------------------------------
2023-10-12 19:18:52,075 EPOCH 3 done: loss 0.0736 - lr: 0.000125
2023-10-12 19:19:13,745 DEV : loss 0.07580851018428802 - f1-score (micro avg) 0.8611
2023-10-12 19:19:13,776 saving best model
2023-10-12 19:19:24,857 ----------------------------------------------------------------------------------------------------
2023-10-12 19:20:05,578 epoch 4 - iter 72/723 - loss 0.05191155 - time (sec): 40.72 - samples/sec: 440.48 - lr: 0.000123 - momentum: 0.000000
2023-10-12 19:20:44,504 epoch 4 - iter 144/723 - loss 0.05203988 - time (sec): 79.64 - samples/sec: 436.31 - lr: 0.000121 - momentum: 0.000000
2023-10-12 19:21:22,936 epoch 4 - iter 216/723 - loss 0.04893082 - time (sec): 118.07 - samples/sec: 444.03 - lr: 0.000119 - momentum: 0.000000
2023-10-12 19:22:01,345 epoch 4 - iter 288/723 - loss 0.04936358 - time (sec): 156.48 - samples/sec: 456.68 - lr: 0.000117 - momentum: 0.000000
2023-10-12 19:22:39,287 epoch 4 - iter 360/723 - loss 0.04704445 - time (sec): 194.43 - samples/sec: 458.90 - lr: 0.000116 - momentum: 0.000000
2023-10-12 19:23:17,810 epoch 4 - iter 432/723 - loss 0.04632235 - time (sec): 232.95 - samples/sec: 455.86 - lr: 0.000114 - momentum: 0.000000
2023-10-12 19:23:57,203 epoch 4 - iter 504/723 - loss 0.04588963 - time (sec): 272.34 - samples/sec: 452.32 - lr: 0.000112 - momentum: 0.000000
2023-10-12 19:24:36,716 epoch 4 - iter 576/723 - loss 0.04525655 - time (sec): 311.85 - samples/sec: 453.47 - lr: 0.000110 - momentum: 0.000000
2023-10-12 19:25:18,379 epoch 4 - iter 648/723 - loss 0.04859378 - time (sec): 353.52 - samples/sec: 449.36 - lr: 0.000109 - momentum: 0.000000
2023-10-12 19:25:56,777 epoch 4 - iter 720/723 - loss 0.04735869 - time (sec): 391.92 - samples/sec: 448.64 - lr: 0.000107 - momentum: 0.000000
2023-10-12 19:25:57,841 ----------------------------------------------------------------------------------------------------
2023-10-12 19:25:57,841 EPOCH 4 done: loss 0.0475 - lr: 0.000107
2023-10-12 19:26:19,114 DEV : loss 0.09613429009914398 - f1-score (micro avg) 0.8346
2023-10-12 19:26:19,147 ----------------------------------------------------------------------------------------------------
2023-10-12 19:26:59,987 epoch 5 - iter 72/723 - loss 0.03592579 - time (sec): 40.84 - samples/sec: 462.23 - lr: 0.000105 - momentum: 0.000000
2023-10-12 19:27:38,528 epoch 5 - iter 144/723 - loss 0.03119838 - time (sec): 79.38 - samples/sec: 457.18 - lr: 0.000103 - momentum: 0.000000
2023-10-12 19:28:16,016 epoch 5 - iter 216/723 - loss 0.03042988 - time (sec): 116.87 - samples/sec: 443.70 - lr: 0.000101 - momentum: 0.000000
2023-10-12 19:28:54,004 epoch 5 - iter 288/723 - loss 0.03005227 - time (sec): 154.85 - samples/sec: 439.46 - lr: 0.000100 - momentum: 0.000000
2023-10-12 19:29:34,928 epoch 5 - iter 360/723 - loss 0.03217606 - time (sec): 195.78 - samples/sec: 442.46 - lr: 0.000098 - momentum: 0.000000
2023-10-12 19:30:14,684 epoch 5 - iter 432/723 - loss 0.03112412 - time (sec): 235.53 - samples/sec: 441.59 - lr: 0.000096 - momentum: 0.000000
2023-10-12 19:30:55,240 epoch 5 - iter 504/723 - loss 0.03176554 - time (sec): 276.09 - samples/sec: 443.79 - lr: 0.000094 - momentum: 0.000000
2023-10-12 19:31:34,003 epoch 5 - iter 576/723 - loss 0.03180365 - time (sec): 314.85 - samples/sec: 445.61 - lr: 0.000093 - momentum: 0.000000
2023-10-12 19:32:13,719 epoch 5 - iter 648/723 - loss 0.03213951 - time (sec): 354.57 - samples/sec: 445.33 - lr: 0.000091 - momentum: 0.000000
2023-10-12 19:32:54,951 epoch 5 - iter 720/723 - loss 0.03253080 - time (sec): 395.80 - samples/sec: 443.07 - lr: 0.000089 - momentum: 0.000000
2023-10-12 19:32:56,682 ----------------------------------------------------------------------------------------------------
2023-10-12 19:32:56,683 EPOCH 5 done: loss 0.0326 - lr: 0.000089
2023-10-12 19:33:18,519 DEV : loss 0.08075438439846039 - f1-score (micro avg) 0.8604
2023-10-12 19:33:18,549 ----------------------------------------------------------------------------------------------------
2023-10-12 19:33:57,261 epoch 6 - iter 72/723 - loss 0.02372279 - time (sec): 38.71 - samples/sec: 444.51 - lr: 0.000087 - momentum: 0.000000
2023-10-12 19:34:36,141 epoch 6 - iter 144/723 - loss 0.02332990 - time (sec): 77.59 - samples/sec: 445.00 - lr: 0.000085 - momentum: 0.000000
2023-10-12 19:35:15,551 epoch 6 - iter 216/723 - loss 0.02529176 - time (sec): 117.00 - samples/sec: 447.90 - lr: 0.000084 - momentum: 0.000000
2023-10-12 19:35:56,603 epoch 6 - iter 288/723 - loss 0.02502072 - time (sec): 158.05 - samples/sec: 445.49 - lr: 0.000082 - momentum: 0.000000
2023-10-12 19:36:37,040 epoch 6 - iter 360/723 - loss 0.02478190 - time (sec): 198.49 - samples/sec: 444.01 - lr: 0.000080 - momentum: 0.000000
2023-10-12 19:37:16,368 epoch 6 - iter 432/723 - loss 0.02272948 - time (sec): 237.82 - samples/sec: 448.09 - lr: 0.000078 - momentum: 0.000000
2023-10-12 19:37:55,143 epoch 6 - iter 504/723 - loss 0.02482061 - time (sec): 276.59 - samples/sec: 449.75 - lr: 0.000077 - momentum: 0.000000
2023-10-12 19:38:32,812 epoch 6 - iter 576/723 - loss 0.02360557 - time (sec): 314.26 - samples/sec: 448.53 - lr: 0.000075 - momentum: 0.000000
2023-10-12 19:39:11,274 epoch 6 - iter 648/723 - loss 0.02300141 - time (sec): 352.72 - samples/sec: 446.97 - lr: 0.000073 - momentum: 0.000000
2023-10-12 19:39:53,495 epoch 6 - iter 720/723 - loss 0.02395673 - time (sec): 394.94 - samples/sec: 444.81 - lr: 0.000071 - momentum: 0.000000
2023-10-12 19:39:54,699 ----------------------------------------------------------------------------------------------------
2023-10-12 19:39:54,700 EPOCH 6 done: loss 0.0240 - lr: 0.000071
2023-10-12 19:40:16,957 DEV : loss 0.09915146231651306 - f1-score (micro avg) 0.8614
2023-10-12 19:40:16,992 saving best model
2023-10-12 19:40:19,580 ----------------------------------------------------------------------------------------------------
2023-10-12 19:41:00,926 epoch 7 - iter 72/723 - loss 0.02262591 - time (sec): 41.34 - samples/sec: 426.67 - lr: 0.000069 - momentum: 0.000000
2023-10-12 19:41:42,902 epoch 7 - iter 144/723 - loss 0.02190572 - time (sec): 83.32 - samples/sec: 426.99 - lr: 0.000068 - momentum: 0.000000
2023-10-12 19:42:24,325 epoch 7 - iter 216/723 - loss 0.02121091 - time (sec): 124.74 - samples/sec: 417.62 - lr: 0.000066 - momentum: 0.000000
2023-10-12 19:43:04,878 epoch 7 - iter 288/723 - loss 0.02062404 - time (sec): 165.29 - samples/sec: 415.31 - lr: 0.000064 - momentum: 0.000000
2023-10-12 19:43:46,681 epoch 7 - iter 360/723 - loss 0.02239849 - time (sec): 207.10 - samples/sec: 419.10 - lr: 0.000062 - momentum: 0.000000
2023-10-12 19:44:28,351 epoch 7 - iter 432/723 - loss 0.02093754 - time (sec): 248.77 - samples/sec: 418.93 - lr: 0.000061 - momentum: 0.000000
2023-10-12 19:45:08,141 epoch 7 - iter 504/723 - loss 0.02078108 - time (sec): 288.56 - samples/sec: 422.55 - lr: 0.000059 - momentum: 0.000000
2023-10-12 19:45:47,271 epoch 7 - iter 576/723 - loss 0.02052113 - time (sec): 327.69 - samples/sec: 424.00 - lr: 0.000057 - momentum: 0.000000
2023-10-12 19:46:26,176 epoch 7 - iter 648/723 - loss 0.01984905 - time (sec): 366.59 - samples/sec: 426.95 - lr: 0.000055 - momentum: 0.000000
2023-10-12 19:47:06,241 epoch 7 - iter 720/723 - loss 0.01954505 - time (sec): 406.66 - samples/sec: 431.54 - lr: 0.000053 - momentum: 0.000000
2023-10-12 19:47:07,629 ----------------------------------------------------------------------------------------------------
2023-10-12 19:47:07,630 EPOCH 7 done: loss 0.0195 - lr: 0.000053
2023-10-12 19:47:29,089 DEV : loss 0.10769647359848022 - f1-score (micro avg) 0.8602
2023-10-12 19:47:29,124 ----------------------------------------------------------------------------------------------------
2023-10-12 19:48:08,516 epoch 8 - iter 72/723 - loss 0.01456090 - time (sec): 39.39 - samples/sec: 471.19 - lr: 0.000052 - momentum: 0.000000
2023-10-12 19:48:47,249 epoch 8 - iter 144/723 - loss 0.01638180 - time (sec): 78.12 - samples/sec: 459.65 - lr: 0.000050 - momentum: 0.000000
2023-10-12 19:49:25,637 epoch 8 - iter 216/723 - loss 0.01560092 - time (sec): 116.51 - samples/sec: 453.37 - lr: 0.000048 - momentum: 0.000000
2023-10-12 19:50:05,829 epoch 8 - iter 288/723 - loss 0.01509465 - time (sec): 156.70 - samples/sec: 459.02 - lr: 0.000046 - momentum: 0.000000
2023-10-12 19:50:44,804 epoch 8 - iter 360/723 - loss 0.01478213 - time (sec): 195.68 - samples/sec: 456.37 - lr: 0.000045 - momentum: 0.000000
2023-10-12 19:51:23,381 epoch 8 - iter 432/723 - loss 0.01528980 - time (sec): 234.25 - samples/sec: 451.48 - lr: 0.000043 - momentum: 0.000000
2023-10-12 19:52:02,332 epoch 8 - iter 504/723 - loss 0.01593641 - time (sec): 273.21 - samples/sec: 450.38 - lr: 0.000041 - momentum: 0.000000
2023-10-12 19:52:41,277 epoch 8 - iter 576/723 - loss 0.01546593 - time (sec): 312.15 - samples/sec: 446.97 - lr: 0.000039 - momentum: 0.000000
2023-10-12 19:53:22,174 epoch 8 - iter 648/723 - loss 0.01707868 - time (sec): 353.05 - samples/sec: 447.25 - lr: 0.000037 - momentum: 0.000000
2023-10-12 19:54:02,545 epoch 8 - iter 720/723 - loss 0.01621935 - time (sec): 393.42 - samples/sec: 446.66 - lr: 0.000036 - momentum: 0.000000
2023-10-12 19:54:03,704 ----------------------------------------------------------------------------------------------------
2023-10-12 19:54:03,705 EPOCH 8 done: loss 0.0162 - lr: 0.000036
2023-10-12 19:54:24,898 DEV : loss 0.11788733303546906 - f1-score (micro avg) 0.8613
2023-10-12 19:54:24,929 ----------------------------------------------------------------------------------------------------
2023-10-12 19:55:04,405 epoch 9 - iter 72/723 - loss 0.00435665 - time (sec): 39.47 - samples/sec: 466.21 - lr: 0.000034 - momentum: 0.000000
2023-10-12 19:55:43,405 epoch 9 - iter 144/723 - loss 0.01561935 - time (sec): 78.47 - samples/sec: 474.03 - lr: 0.000032 - momentum: 0.000000
2023-10-12 19:56:21,332 epoch 9 - iter 216/723 - loss 0.01515669 - time (sec): 116.40 - samples/sec: 472.95 - lr: 0.000030 - momentum: 0.000000
2023-10-12 19:56:58,414 epoch 9 - iter 288/723 - loss 0.01423773 - time (sec): 153.48 - samples/sec: 463.50 - lr: 0.000028 - momentum: 0.000000
2023-10-12 19:57:36,161 epoch 9 - iter 360/723 - loss 0.01346231 - time (sec): 191.23 - samples/sec: 454.95 - lr: 0.000027 - momentum: 0.000000
2023-10-12 19:58:16,429 epoch 9 - iter 432/723 - loss 0.01303858 - time (sec): 231.50 - samples/sec: 453.21 - lr: 0.000025 - momentum: 0.000000
2023-10-12 19:58:56,064 epoch 9 - iter 504/723 - loss 0.01320394 - time (sec): 271.13 - samples/sec: 451.94 - lr: 0.000023 - momentum: 0.000000
2023-10-12 19:59:36,961 epoch 9 - iter 576/723 - loss 0.01367903 - time (sec): 312.03 - samples/sec: 453.35 - lr: 0.000021 - momentum: 0.000000
2023-10-12 20:00:16,437 epoch 9 - iter 648/723 - loss 0.01284167 - time (sec): 351.51 - samples/sec: 450.96 - lr: 0.000020 - momentum: 0.000000
2023-10-12 20:00:56,177 epoch 9 - iter 720/723 - loss 0.01273017 - time (sec): 391.25 - samples/sec: 449.00 - lr: 0.000018 - momentum: 0.000000
2023-10-12 20:00:57,336 ----------------------------------------------------------------------------------------------------
2023-10-12 20:00:57,336 EPOCH 9 done: loss 0.0127 - lr: 0.000018
2023-10-12 20:01:18,773 DEV : loss 0.11393096297979355 - f1-score (micro avg) 0.8665
2023-10-12 20:01:18,808 saving best model
2023-10-12 20:01:23,910 ----------------------------------------------------------------------------------------------------
2023-10-12 20:02:03,031 epoch 10 - iter 72/723 - loss 0.00615811 - time (sec): 39.12 - samples/sec: 460.52 - lr: 0.000016 - momentum: 0.000000
2023-10-12 20:02:41,238 epoch 10 - iter 144/723 - loss 0.00676133 - time (sec): 77.32 - samples/sec: 436.23 - lr: 0.000014 - momentum: 0.000000
2023-10-12 20:03:20,138 epoch 10 - iter 216/723 - loss 0.00881176 - time (sec): 116.22 - samples/sec: 436.14 - lr: 0.000012 - momentum: 0.000000
2023-10-12 20:04:00,711 epoch 10 - iter 288/723 - loss 0.01105617 - time (sec): 156.80 - samples/sec: 440.79 - lr: 0.000011 - momentum: 0.000000
2023-10-12 20:04:40,165 epoch 10 - iter 360/723 - loss 0.01009560 - time (sec): 196.25 - samples/sec: 439.76 - lr: 0.000009 - momentum: 0.000000
2023-10-12 20:05:21,218 epoch 10 - iter 432/723 - loss 0.00922833 - time (sec): 237.30 - samples/sec: 441.24 - lr: 0.000007 - momentum: 0.000000
2023-10-12 20:06:01,972 epoch 10 - iter 504/723 - loss 0.00991438 - time (sec): 278.06 - samples/sec: 443.34 - lr: 0.000005 - momentum: 0.000000
2023-10-12 20:06:40,462 epoch 10 - iter 576/723 - loss 0.00967514 - time (sec): 316.55 - samples/sec: 442.17 - lr: 0.000004 - momentum: 0.000000
2023-10-12 20:07:19,416 epoch 10 - iter 648/723 - loss 0.00999608 - time (sec): 355.50 - samples/sec: 443.56 - lr: 0.000002 - momentum: 0.000000
2023-10-12 20:07:58,652 epoch 10 - iter 720/723 - loss 0.00993301 - time (sec): 394.74 - samples/sec: 445.20 - lr: 0.000000 - momentum: 0.000000
2023-10-12 20:07:59,756 ----------------------------------------------------------------------------------------------------
2023-10-12 20:07:59,756 EPOCH 10 done: loss 0.0099 - lr: 0.000000
2023-10-12 20:08:21,111 DEV : loss 0.11986048519611359 - f1-score (micro avg) 0.8657
2023-10-12 20:08:21,984 ----------------------------------------------------------------------------------------------------
2023-10-12 20:08:21,986 Loading model from best epoch ...
2023-10-12 20:08:26,181 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-12 20:08:47,461
Results:
- F-score (micro) 0.8641
- F-score (macro) 0.7722
- Accuracy 0.7736
By class:
              precision    recall  f1-score   support

         PER     0.8645    0.8734    0.8689       482
         LOC     0.9154    0.8974    0.9063       458
         ORG     0.5625    0.5217    0.5414        69

   micro avg     0.8680    0.8603    0.8641      1009
   macro avg     0.7808    0.7642    0.7722      1009
weighted avg     0.8669    0.8603    0.8635      1009
2023-10-12 20:08:47,461 ----------------------------------------------------------------------------------------------------
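For completeness, a minimal Flair inference sketch against the saved checkpoint (best-model.pt, i.e. the "best epoch" model loaded above); the Dutch example sentence is made up:

from flair.data import Sentence
from flair.models import SequenceTagger

# load the checkpoint selected above ("Loading model from best epoch ...")
tagger = SequenceTagger.load("best-model.pt")

sentence = Sentence("Rembrandt van Rijn werd geboren in Leiden .")  # made-up example
tagger.predict(sentence)

for entity in sentence.get_spans("ner"):
    print(entity)  # prints each predicted LOC/PER/ORG span with its score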