2023-10-11 09:22:17,384 ----------------------------------------------------------------------------------------------------
2023-10-11 09:22:17,386 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
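The module dump fixes every dimension of the encoder: 12 T5 blocks, model width 1472, attention projection 384, feed-forward width 3584, and a 384-entry byte vocabulary. A quick back-of-the-envelope parameter count from those shapes (a sketch covering only the T5EncoderModel modules printed above, and assuming `shared` and `embed_tokens` are tied as is standard for T5):

```python
# Encoder parameter count implied by the shapes in the module dump.
# Assumes the 384 x 1472 byte embedding is tied (counted once).
d_model, d_attn, d_ff = 1472, 384, 3584
vocab, n_blocks = 384, 12

attention = 3 * d_model * d_attn + d_attn * d_model  # q, k, v and o projections
ffn = 2 * d_model * d_ff + d_ff * d_model            # wi_0, wi_1 and wo
norms = 2 * d_model                                  # two RMSNorms per block
per_block = attention + ffn + norms                  # 18,090,880

total = (
    vocab * d_model       # tied byte embedding
    + n_blocks * per_block
    + 32 * 6              # relative_attention_bias (block 0 only)
    + d_model             # final_layer_norm
)
print(f"{total:,}")       # roughly 218M encoder parameters
```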
2023-10-11 09:22:17,387 ----------------------------------------------------------------------------------------------------
2023-10-11 09:22:17,387 MultiCorpus: 1085 train + 148 dev + 364 test sentences
- NER_HIPE_2022 Corpus: 1085 train + 148 dev + 364 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/sv/with_doc_seperator
2023-10-11 09:22:17,387 ----------------------------------------------------------------------------------------------------
2023-10-11 09:22:17,387 Train: 1085 sentences
2023-10-11 09:22:17,387 (train_with_dev=False, train_with_test=False)
2023-10-11 09:22:17,387 ----------------------------------------------------------------------------------------------------
2023-10-11 09:22:17,387 Training Params:
2023-10-11 09:22:17,387 - learning_rate: "0.00016"
2023-10-11 09:22:17,387 - mini_batch_size: "8"
2023-10-11 09:22:17,387 - max_epochs: "10"
2023-10-11 09:22:17,388 - shuffle: "True"
2023-10-11 09:22:17,388 ----------------------------------------------------------------------------------------------------
2023-10-11 09:22:17,388 Plugins:
2023-10-11 09:22:17,388 - TensorboardLogger
2023-10-11 09:22:17,388 - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 09:22:17,388 ----------------------------------------------------------------------------------------------------
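The `lr` column in the iteration lines below traces the LinearScheduler: a linear warm-up to the peak rate of 0.00016 over the first 10% of the 1360 total steps (10 epochs x 136 batches), then linear decay to zero. A minimal reimplementation of that schedule (a sketch; Flair's own step bookkeeping may differ in rounding details):

```python
# Linear warm-up / linear decay schedule matching the training params:
# peak lr 0.00016, warmup_fraction 0.1, 10 epochs x 136 batches = 1360 steps.
PEAK_LR = 0.00016
TOTAL_STEPS = 10 * 136
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)  # 136

def lr_at(step: int) -> float:
    """Learning rate at a given 0-indexed optimizer step."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS)

# e.g. epoch 1, iter 13 -> step 12 -> lr ~ 0.000014, as in the log
```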
2023-10-11 09:22:17,388 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 09:22:17,388 - metric: "('micro avg', 'f1-score')"
2023-10-11 09:22:17,388 ----------------------------------------------------------------------------------------------------
2023-10-11 09:22:17,388 Computation:
2023-10-11 09:22:17,388 - compute on device: cuda:0
2023-10-11 09:22:17,388 - embedding storage: none
2023-10-11 09:22:17,388 ----------------------------------------------------------------------------------------------------
2023-10-11 09:22:17,388 Model training base path: "hmbench-newseye/sv-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-2"
2023-10-11 09:22:17,388 ----------------------------------------------------------------------------------------------------
2023-10-11 09:22:17,389 ----------------------------------------------------------------------------------------------------
2023-10-11 09:22:17,389 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 09:22:26,095 epoch 1 - iter 13/136 - loss 2.85446923 - time (sec): 8.70 - samples/sec: 588.34 - lr: 0.000014 - momentum: 0.000000
2023-10-11 09:22:34,155 epoch 1 - iter 26/136 - loss 2.84819784 - time (sec): 16.76 - samples/sec: 557.37 - lr: 0.000029 - momentum: 0.000000
2023-10-11 09:22:43,000 epoch 1 - iter 39/136 - loss 2.83695644 - time (sec): 25.61 - samples/sec: 571.65 - lr: 0.000045 - momentum: 0.000000
2023-10-11 09:22:51,863 epoch 1 - iter 52/136 - loss 2.81681872 - time (sec): 34.47 - samples/sec: 574.45 - lr: 0.000060 - momentum: 0.000000
2023-10-11 09:23:00,750 epoch 1 - iter 65/136 - loss 2.78286085 - time (sec): 43.36 - samples/sec: 577.54 - lr: 0.000075 - momentum: 0.000000
2023-10-11 09:23:09,466 epoch 1 - iter 78/136 - loss 2.72872399 - time (sec): 52.08 - samples/sec: 573.28 - lr: 0.000091 - momentum: 0.000000
2023-10-11 09:23:18,296 epoch 1 - iter 91/136 - loss 2.65661692 - time (sec): 60.91 - samples/sec: 572.03 - lr: 0.000106 - momentum: 0.000000
2023-10-11 09:23:26,980 epoch 1 - iter 104/136 - loss 2.58114034 - time (sec): 69.59 - samples/sec: 569.30 - lr: 0.000121 - momentum: 0.000000
2023-10-11 09:23:36,080 epoch 1 - iter 117/136 - loss 2.49034961 - time (sec): 78.69 - samples/sec: 571.68 - lr: 0.000136 - momentum: 0.000000
2023-10-11 09:23:44,902 epoch 1 - iter 130/136 - loss 2.40576009 - time (sec): 87.51 - samples/sec: 573.18 - lr: 0.000152 - momentum: 0.000000
2023-10-11 09:23:48,395 ----------------------------------------------------------------------------------------------------
2023-10-11 09:23:48,396 EPOCH 1 done: loss 2.3753 - lr: 0.000152
2023-10-11 09:23:53,682 DEV : loss 1.356597661972046 - f1-score (micro avg) 0.0
2023-10-11 09:23:53,690 ----------------------------------------------------------------------------------------------------
2023-10-11 09:24:02,835 epoch 2 - iter 13/136 - loss 1.33385722 - time (sec): 9.14 - samples/sec: 608.94 - lr: 0.000158 - momentum: 0.000000
2023-10-11 09:24:11,349 epoch 2 - iter 26/136 - loss 1.23868983 - time (sec): 17.66 - samples/sec: 587.74 - lr: 0.000157 - momentum: 0.000000
2023-10-11 09:24:20,755 epoch 2 - iter 39/136 - loss 1.17813516 - time (sec): 27.06 - samples/sec: 597.44 - lr: 0.000155 - momentum: 0.000000
2023-10-11 09:24:29,258 epoch 2 - iter 52/136 - loss 1.09171959 - time (sec): 35.57 - samples/sec: 589.32 - lr: 0.000153 - momentum: 0.000000
2023-10-11 09:24:37,955 epoch 2 - iter 65/136 - loss 1.03129374 - time (sec): 44.26 - samples/sec: 583.57 - lr: 0.000152 - momentum: 0.000000
2023-10-11 09:24:46,413 epoch 2 - iter 78/136 - loss 0.97605171 - time (sec): 52.72 - samples/sec: 573.76 - lr: 0.000150 - momentum: 0.000000
2023-10-11 09:24:55,176 epoch 2 - iter 91/136 - loss 0.93113778 - time (sec): 61.48 - samples/sec: 567.38 - lr: 0.000148 - momentum: 0.000000
2023-10-11 09:25:04,202 epoch 2 - iter 104/136 - loss 0.87668593 - time (sec): 70.51 - samples/sec: 565.45 - lr: 0.000147 - momentum: 0.000000
2023-10-11 09:25:12,821 epoch 2 - iter 117/136 - loss 0.84924304 - time (sec): 79.13 - samples/sec: 563.58 - lr: 0.000145 - momentum: 0.000000
2023-10-11 09:25:21,129 epoch 2 - iter 130/136 - loss 0.83219789 - time (sec): 87.44 - samples/sec: 563.70 - lr: 0.000143 - momentum: 0.000000
2023-10-11 09:25:25,283 ----------------------------------------------------------------------------------------------------
2023-10-11 09:25:25,283 EPOCH 2 done: loss 0.8170 - lr: 0.000143
2023-10-11 09:25:31,157 DEV : loss 0.4814496636390686 - f1-score (micro avg) 0.0
2023-10-11 09:25:31,166 ----------------------------------------------------------------------------------------------------
2023-10-11 09:25:39,752 epoch 3 - iter 13/136 - loss 0.52849933 - time (sec): 8.58 - samples/sec: 605.95 - lr: 0.000141 - momentum: 0.000000
2023-10-11 09:25:48,099 epoch 3 - iter 26/136 - loss 0.51390315 - time (sec): 16.93 - samples/sec: 590.19 - lr: 0.000139 - momentum: 0.000000
2023-10-11 09:25:56,461 epoch 3 - iter 39/136 - loss 0.47008844 - time (sec): 25.29 - samples/sec: 584.37 - lr: 0.000137 - momentum: 0.000000
2023-10-11 09:26:04,473 epoch 3 - iter 52/136 - loss 0.45885997 - time (sec): 33.31 - samples/sec: 577.32 - lr: 0.000136 - momentum: 0.000000
2023-10-11 09:26:13,660 epoch 3 - iter 65/136 - loss 0.44940332 - time (sec): 42.49 - samples/sec: 588.92 - lr: 0.000134 - momentum: 0.000000
2023-10-11 09:26:22,010 epoch 3 - iter 78/136 - loss 0.42990217 - time (sec): 50.84 - samples/sec: 587.54 - lr: 0.000132 - momentum: 0.000000
2023-10-11 09:26:31,612 epoch 3 - iter 91/136 - loss 0.41954018 - time (sec): 60.44 - samples/sec: 594.21 - lr: 0.000131 - momentum: 0.000000
2023-10-11 09:26:40,220 epoch 3 - iter 104/136 - loss 0.41242956 - time (sec): 69.05 - samples/sec: 593.42 - lr: 0.000129 - momentum: 0.000000
2023-10-11 09:26:49,193 epoch 3 - iter 117/136 - loss 0.39861445 - time (sec): 78.03 - samples/sec: 588.86 - lr: 0.000127 - momentum: 0.000000
2023-10-11 09:26:57,219 epoch 3 - iter 130/136 - loss 0.39461741 - time (sec): 86.05 - samples/sec: 581.03 - lr: 0.000126 - momentum: 0.000000
2023-10-11 09:27:00,933 ----------------------------------------------------------------------------------------------------
2023-10-11 09:27:00,933 EPOCH 3 done: loss 0.3964 - lr: 0.000126
2023-10-11 09:27:07,019 DEV : loss 0.294859379529953 - f1-score (micro avg) 0.2986
2023-10-11 09:27:07,030 saving best model
2023-10-11 09:27:07,991 ----------------------------------------------------------------------------------------------------
2023-10-11 09:27:16,576 epoch 4 - iter 13/136 - loss 0.31357415 - time (sec): 8.58 - samples/sec: 556.39 - lr: 0.000123 - momentum: 0.000000
2023-10-11 09:27:24,618 epoch 4 - iter 26/136 - loss 0.31884578 - time (sec): 16.62 - samples/sec: 550.03 - lr: 0.000121 - momentum: 0.000000
2023-10-11 09:27:33,627 epoch 4 - iter 39/136 - loss 0.30684042 - time (sec): 25.63 - samples/sec: 581.74 - lr: 0.000120 - momentum: 0.000000
2023-10-11 09:27:42,467 epoch 4 - iter 52/136 - loss 0.30096490 - time (sec): 34.47 - samples/sec: 595.44 - lr: 0.000118 - momentum: 0.000000
2023-10-11 09:27:50,400 epoch 4 - iter 65/136 - loss 0.29893478 - time (sec): 42.41 - samples/sec: 591.96 - lr: 0.000116 - momentum: 0.000000
2023-10-11 09:27:59,033 epoch 4 - iter 78/136 - loss 0.29398395 - time (sec): 51.04 - samples/sec: 595.38 - lr: 0.000115 - momentum: 0.000000
2023-10-11 09:28:07,206 epoch 4 - iter 91/136 - loss 0.30030830 - time (sec): 59.21 - samples/sec: 591.36 - lr: 0.000113 - momentum: 0.000000
2023-10-11 09:28:15,497 epoch 4 - iter 104/136 - loss 0.29277071 - time (sec): 67.50 - samples/sec: 592.49 - lr: 0.000111 - momentum: 0.000000
2023-10-11 09:28:23,710 epoch 4 - iter 117/136 - loss 0.30022584 - time (sec): 75.72 - samples/sec: 591.45 - lr: 0.000109 - momentum: 0.000000
2023-10-11 09:28:32,379 epoch 4 - iter 130/136 - loss 0.30199746 - time (sec): 84.39 - samples/sec: 588.99 - lr: 0.000108 - momentum: 0.000000
2023-10-11 09:28:36,190 ----------------------------------------------------------------------------------------------------
2023-10-11 09:28:36,190 EPOCH 4 done: loss 0.2996 - lr: 0.000108
2023-10-11 09:28:41,678 DEV : loss 0.25123921036720276 - f1-score (micro avg) 0.387
2023-10-11 09:28:41,687 saving best model
2023-10-11 09:28:44,236 ----------------------------------------------------------------------------------------------------
2023-10-11 09:28:52,096 epoch 5 - iter 13/136 - loss 0.27743068 - time (sec): 7.86 - samples/sec: 577.12 - lr: 0.000105 - momentum: 0.000000
2023-10-11 09:29:00,409 epoch 5 - iter 26/136 - loss 0.26814576 - time (sec): 16.17 - samples/sec: 599.92 - lr: 0.000104 - momentum: 0.000000
2023-10-11 09:29:09,814 epoch 5 - iter 39/136 - loss 0.25408909 - time (sec): 25.57 - samples/sec: 587.51 - lr: 0.000102 - momentum: 0.000000
2023-10-11 09:29:17,569 epoch 5 - iter 52/136 - loss 0.25896282 - time (sec): 33.33 - samples/sec: 578.63 - lr: 0.000100 - momentum: 0.000000
2023-10-11 09:29:26,113 epoch 5 - iter 65/136 - loss 0.24479651 - time (sec): 41.87 - samples/sec: 576.79 - lr: 0.000099 - momentum: 0.000000
2023-10-11 09:29:35,466 epoch 5 - iter 78/136 - loss 0.23665481 - time (sec): 51.23 - samples/sec: 575.51 - lr: 0.000097 - momentum: 0.000000
2023-10-11 09:29:44,334 epoch 5 - iter 91/136 - loss 0.23896649 - time (sec): 60.09 - samples/sec: 574.14 - lr: 0.000095 - momentum: 0.000000
2023-10-11 09:29:53,316 epoch 5 - iter 104/136 - loss 0.24722321 - time (sec): 69.08 - samples/sec: 574.93 - lr: 0.000093 - momentum: 0.000000
2023-10-11 09:30:02,483 epoch 5 - iter 117/136 - loss 0.24854195 - time (sec): 78.24 - samples/sec: 575.02 - lr: 0.000092 - momentum: 0.000000
2023-10-11 09:30:11,085 epoch 5 - iter 130/136 - loss 0.24845778 - time (sec): 86.84 - samples/sec: 571.43 - lr: 0.000090 - momentum: 0.000000
2023-10-11 09:30:15,093 ----------------------------------------------------------------------------------------------------
2023-10-11 09:30:15,093 EPOCH 5 done: loss 0.2484 - lr: 0.000090
2023-10-11 09:30:21,001 DEV : loss 0.2224646955728531 - f1-score (micro avg) 0.4767
2023-10-11 09:30:21,011 saving best model
2023-10-11 09:30:23,572 ----------------------------------------------------------------------------------------------------
2023-10-11 09:30:32,235 epoch 6 - iter 13/136 - loss 0.24336816 - time (sec): 8.66 - samples/sec: 599.38 - lr: 0.000088 - momentum: 0.000000
2023-10-11 09:30:40,564 epoch 6 - iter 26/136 - loss 0.23662061 - time (sec): 16.99 - samples/sec: 583.20 - lr: 0.000086 - momentum: 0.000000
2023-10-11 09:30:49,284 epoch 6 - iter 39/136 - loss 0.22757752 - time (sec): 25.71 - samples/sec: 581.75 - lr: 0.000084 - momentum: 0.000000
2023-10-11 09:30:58,212 epoch 6 - iter 52/136 - loss 0.23103914 - time (sec): 34.64 - samples/sec: 592.25 - lr: 0.000083 - momentum: 0.000000
2023-10-11 09:31:06,295 epoch 6 - iter 65/136 - loss 0.22567006 - time (sec): 42.72 - samples/sec: 580.96 - lr: 0.000081 - momentum: 0.000000
2023-10-11 09:31:15,137 epoch 6 - iter 78/136 - loss 0.21375693 - time (sec): 51.56 - samples/sec: 583.82 - lr: 0.000079 - momentum: 0.000000
2023-10-11 09:31:24,022 epoch 6 - iter 91/136 - loss 0.20775684 - time (sec): 60.45 - samples/sec: 586.75 - lr: 0.000077 - momentum: 0.000000
2023-10-11 09:31:32,341 epoch 6 - iter 104/136 - loss 0.21174013 - time (sec): 68.76 - samples/sec: 583.41 - lr: 0.000076 - momentum: 0.000000
2023-10-11 09:31:40,953 epoch 6 - iter 117/136 - loss 0.20969673 - time (sec): 77.38 - samples/sec: 580.99 - lr: 0.000074 - momentum: 0.000000
2023-10-11 09:31:49,704 epoch 6 - iter 130/136 - loss 0.20516424 - time (sec): 86.13 - samples/sec: 579.94 - lr: 0.000072 - momentum: 0.000000
2023-10-11 09:31:53,414 ----------------------------------------------------------------------------------------------------
2023-10-11 09:31:53,414 EPOCH 6 done: loss 0.2041 - lr: 0.000072
2023-10-11 09:31:59,207 DEV : loss 0.20685341954231262 - f1-score (micro avg) 0.5169
2023-10-11 09:31:59,224 saving best model
2023-10-11 09:32:01,786 ----------------------------------------------------------------------------------------------------
2023-10-11 09:32:10,439 epoch 7 - iter 13/136 - loss 0.17704560 - time (sec): 8.65 - samples/sec: 532.79 - lr: 0.000070 - momentum: 0.000000
2023-10-11 09:32:19,595 epoch 7 - iter 26/136 - loss 0.16608999 - time (sec): 17.80 - samples/sec: 561.43 - lr: 0.000068 - momentum: 0.000000
2023-10-11 09:32:28,971 epoch 7 - iter 39/136 - loss 0.16113561 - time (sec): 27.18 - samples/sec: 576.92 - lr: 0.000067 - momentum: 0.000000
2023-10-11 09:32:37,322 epoch 7 - iter 52/136 - loss 0.16321262 - time (sec): 35.53 - samples/sec: 568.91 - lr: 0.000065 - momentum: 0.000000
2023-10-11 09:32:46,529 epoch 7 - iter 65/136 - loss 0.16337568 - time (sec): 44.74 - samples/sec: 572.50 - lr: 0.000063 - momentum: 0.000000
2023-10-11 09:32:55,317 epoch 7 - iter 78/136 - loss 0.16471333 - time (sec): 53.52 - samples/sec: 568.85 - lr: 0.000061 - momentum: 0.000000
2023-10-11 09:33:03,307 epoch 7 - iter 91/136 - loss 0.16880727 - time (sec): 61.51 - samples/sec: 557.86 - lr: 0.000060 - momentum: 0.000000
2023-10-11 09:33:12,128 epoch 7 - iter 104/136 - loss 0.16668869 - time (sec): 70.34 - samples/sec: 560.26 - lr: 0.000058 - momentum: 0.000000
2023-10-11 09:33:21,089 epoch 7 - iter 117/136 - loss 0.16868453 - time (sec): 79.30 - samples/sec: 563.62 - lr: 0.000056 - momentum: 0.000000
2023-10-11 09:33:29,353 epoch 7 - iter 130/136 - loss 0.16881225 - time (sec): 87.56 - samples/sec: 562.45 - lr: 0.000055 - momentum: 0.000000
2023-10-11 09:33:33,534 ----------------------------------------------------------------------------------------------------
2023-10-11 09:33:33,534 EPOCH 7 done: loss 0.1688 - lr: 0.000055
2023-10-11 09:33:39,299 DEV : loss 0.18640807271003723 - f1-score (micro avg) 0.5329
2023-10-11 09:33:39,307 saving best model
2023-10-11 09:33:41,873 ----------------------------------------------------------------------------------------------------
2023-10-11 09:33:50,064 epoch 8 - iter 13/136 - loss 0.16538513 - time (sec): 8.19 - samples/sec: 551.98 - lr: 0.000052 - momentum: 0.000000
2023-10-11 09:33:58,121 epoch 8 - iter 26/136 - loss 0.13848192 - time (sec): 16.24 - samples/sec: 552.95 - lr: 0.000051 - momentum: 0.000000
2023-10-11 09:34:07,212 epoch 8 - iter 39/136 - loss 0.14824327 - time (sec): 25.33 - samples/sec: 581.65 - lr: 0.000049 - momentum: 0.000000
2023-10-11 09:34:15,755 epoch 8 - iter 52/136 - loss 0.15156730 - time (sec): 33.88 - samples/sec: 586.35 - lr: 0.000047 - momentum: 0.000000
2023-10-11 09:34:24,020 epoch 8 - iter 65/136 - loss 0.15316354 - time (sec): 42.14 - samples/sec: 583.73 - lr: 0.000045 - momentum: 0.000000
2023-10-11 09:34:32,389 epoch 8 - iter 78/136 - loss 0.15664533 - time (sec): 50.51 - samples/sec: 584.78 - lr: 0.000044 - momentum: 0.000000
2023-10-11 09:34:41,102 epoch 8 - iter 91/136 - loss 0.15321152 - time (sec): 59.22 - samples/sec: 587.00 - lr: 0.000042 - momentum: 0.000000
2023-10-11 09:34:50,395 epoch 8 - iter 104/136 - loss 0.14996935 - time (sec): 68.52 - samples/sec: 589.11 - lr: 0.000040 - momentum: 0.000000
2023-10-11 09:34:59,295 epoch 8 - iter 117/136 - loss 0.14674671 - time (sec): 77.42 - samples/sec: 586.03 - lr: 0.000039 - momentum: 0.000000
2023-10-11 09:35:07,742 epoch 8 - iter 130/136 - loss 0.14590270 - time (sec): 85.86 - samples/sec: 580.16 - lr: 0.000037 - momentum: 0.000000
2023-10-11 09:35:11,635 ----------------------------------------------------------------------------------------------------
2023-10-11 09:35:11,635 EPOCH 8 done: loss 0.1442 - lr: 0.000037
2023-10-11 09:35:17,701 DEV : loss 0.1840282827615738 - f1-score (micro avg) 0.5674
2023-10-11 09:35:17,709 saving best model
2023-10-11 09:35:20,283 ----------------------------------------------------------------------------------------------------
2023-10-11 09:35:28,030 epoch 9 - iter 13/136 - loss 0.16181475 - time (sec): 7.74 - samples/sec: 533.14 - lr: 0.000034 - momentum: 0.000000
2023-10-11 09:35:36,255 epoch 9 - iter 26/136 - loss 0.14681724 - time (sec): 15.97 - samples/sec: 554.11 - lr: 0.000033 - momentum: 0.000000
2023-10-11 09:35:44,795 epoch 9 - iter 39/136 - loss 0.13820279 - time (sec): 24.51 - samples/sec: 571.17 - lr: 0.000031 - momentum: 0.000000
2023-10-11 09:35:53,078 epoch 9 - iter 52/136 - loss 0.14489701 - time (sec): 32.79 - samples/sec: 569.88 - lr: 0.000029 - momentum: 0.000000
2023-10-11 09:36:01,201 epoch 9 - iter 65/136 - loss 0.15027000 - time (sec): 40.91 - samples/sec: 567.62 - lr: 0.000028 - momentum: 0.000000
2023-10-11 09:36:10,032 epoch 9 - iter 78/136 - loss 0.14053266 - time (sec): 49.74 - samples/sec: 575.66 - lr: 0.000026 - momentum: 0.000000
2023-10-11 09:36:18,379 epoch 9 - iter 91/136 - loss 0.14000543 - time (sec): 58.09 - samples/sec: 576.16 - lr: 0.000024 - momentum: 0.000000
2023-10-11 09:36:27,550 epoch 9 - iter 104/136 - loss 0.13747365 - time (sec): 67.26 - samples/sec: 583.59 - lr: 0.000023 - momentum: 0.000000
2023-10-11 09:36:36,206 epoch 9 - iter 117/136 - loss 0.13158513 - time (sec): 75.92 - samples/sec: 586.65 - lr: 0.000021 - momentum: 0.000000
2023-10-11 09:36:44,906 epoch 9 - iter 130/136 - loss 0.12865231 - time (sec): 84.62 - samples/sec: 586.80 - lr: 0.000019 - momentum: 0.000000
2023-10-11 09:36:48,811 ----------------------------------------------------------------------------------------------------
2023-10-11 09:36:48,811 EPOCH 9 done: loss 0.1281 - lr: 0.000019
2023-10-11 09:36:54,438 DEV : loss 0.18203255534172058 - f1-score (micro avg) 0.6004
2023-10-11 09:36:54,447 saving best model
2023-10-11 09:36:57,285 ----------------------------------------------------------------------------------------------------
2023-10-11 09:37:06,059 epoch 10 - iter 13/136 - loss 0.14217771 - time (sec): 8.77 - samples/sec: 584.97 - lr: 0.000017 - momentum: 0.000000
2023-10-11 09:37:14,553 epoch 10 - iter 26/136 - loss 0.13969245 - time (sec): 17.26 - samples/sec: 567.04 - lr: 0.000015 - momentum: 0.000000
2023-10-11 09:37:22,983 epoch 10 - iter 39/136 - loss 0.13303950 - time (sec): 25.69 - samples/sec: 563.53 - lr: 0.000013 - momentum: 0.000000
2023-10-11 09:37:31,331 epoch 10 - iter 52/136 - loss 0.13098438 - time (sec): 34.04 - samples/sec: 556.79 - lr: 0.000012 - momentum: 0.000000
2023-10-11 09:37:40,514 epoch 10 - iter 65/136 - loss 0.12985671 - time (sec): 43.22 - samples/sec: 571.90 - lr: 0.000010 - momentum: 0.000000
2023-10-11 09:37:49,115 epoch 10 - iter 78/136 - loss 0.12227963 - time (sec): 51.83 - samples/sec: 576.27 - lr: 0.000008 - momentum: 0.000000
2023-10-11 09:37:57,502 epoch 10 - iter 91/136 - loss 0.12446202 - time (sec): 60.21 - samples/sec: 576.03 - lr: 0.000007 - momentum: 0.000000
2023-10-11 09:38:06,175 epoch 10 - iter 104/136 - loss 0.12499547 - time (sec): 68.89 - samples/sec: 575.22 - lr: 0.000005 - momentum: 0.000000
2023-10-11 09:38:14,736 epoch 10 - iter 117/136 - loss 0.12096331 - time (sec): 77.45 - samples/sec: 578.51 - lr: 0.000003 - momentum: 0.000000
2023-10-11 09:38:23,127 epoch 10 - iter 130/136 - loss 0.12143672 - time (sec): 85.84 - samples/sec: 578.24 - lr: 0.000002 - momentum: 0.000000
2023-10-11 09:38:26,880 ----------------------------------------------------------------------------------------------------
2023-10-11 09:38:26,880 EPOCH 10 done: loss 0.1213 - lr: 0.000002
2023-10-11 09:38:32,414 DEV : loss 0.17927290499210358 - f1-score (micro avg) 0.5942
2023-10-11 09:38:33,260 ----------------------------------------------------------------------------------------------------
2023-10-11 09:38:33,262 Loading model from best epoch ...
2023-10-11 09:38:36,955 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd, S-ORG, B-ORG, E-ORG, I-ORG
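The 17 tags follow the BIOES scheme (Single, Begin, Inside, End, plus O) over the four entity types LOC, PER, HumanProd and ORG. A minimal decoder from such a tag sequence to entity spans (an illustrative sketch, not Flair's internal decoding):

```python
def bioes_to_spans(tags):
    """Decode a BIOES tag sequence into (start, end_inclusive, label) spans."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags):
        prefix, _, name = tag.partition("-")
        if prefix == "S":                 # single-token entity
            spans.append((i, i, name))
            start, label = None, None
        elif prefix == "B":               # open a multi-token entity
            start, label = i, name
        elif prefix == "E" and start is not None and name == label:
            spans.append((start, i, label))
            start, label = None, None
        elif prefix == "I" and name == label:
            continue                      # extend the open entity
        else:                             # "O" or inconsistent tag: drop open span
            start, label = None, None
    return spans

# e.g. ["S-LOC", "O", "B-PER", "I-PER", "E-PER"]
#   -> [(0, 0, "LOC"), (2, 4, "PER")]
```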
2023-10-11 09:38:48,596
Results:
- F-score (micro) 0.5031
- F-score (macro) 0.3361
- Accuracy 0.3827
By class:
              precision    recall  f1-score   support

         LOC     0.5573    0.6859    0.6149       312
         PER     0.4034    0.4615    0.4305       208
   HumanProd     0.2000    0.5909    0.2989        22
         ORG     0.0000    0.0000    0.0000        55

   micro avg     0.4702    0.5410    0.5031       597
   macro avg     0.2902    0.4346    0.3361       597
weighted avg     0.4392    0.5410    0.4824       597
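The averages in the report follow directly from the per-class rows: macro is the unweighted mean over classes, weighted is support-weighted, and micro F1 is the harmonic mean of micro precision and recall. A quick consistency check using the figures as printed:

```python
# Per-class (precision, recall, f1, support) from the table above.
by_class = {
    "LOC":       (0.5573, 0.6859, 0.6149, 312),
    "PER":       (0.4034, 0.4615, 0.4305, 208),
    "HumanProd": (0.2000, 0.5909, 0.2989,  22),
    "ORG":       (0.0000, 0.0000, 0.0000,  55),
}

f1s = [f1 for _, _, f1, _ in by_class.values()]
supports = [n for _, _, _, n in by_class.values()]

macro_f1 = sum(f1s) / len(f1s)                       # unweighted mean over classes
weighted_f1 = sum(f * n for f, n in zip(f1s, supports)) / sum(supports)

p, r = 0.4702, 0.5410                                # micro precision / recall
micro_f1 = 2 * p * r / (p + r)                       # harmonic mean

print(round(macro_f1, 4), round(weighted_f1, 4), round(micro_f1, 4))
# 0.3361 0.4824 0.5031 -- matching the reported averages
```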
2023-10-11 09:38:48,596 ----------------------------------------------------------------------------------------------------