2023-10-11 02:22:17,955 ----------------------------------------------------------------------------------------------------
2023-10-11 02:22:17,957 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-11 02:22:17,957 ----------------------------------------------------------------------------------------------------
2023-10-11 02:22:17,957 MultiCorpus: 1166 train + 165 dev + 415 test sentences
- NER_HIPE_2022 Corpus: 1166 train + 165 dev + 415 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fi/with_doc_seperator
2023-10-11 02:22:17,957 ----------------------------------------------------------------------------------------------------
2023-10-11 02:22:17,957 Train: 1166 sentences
2023-10-11 02:22:17,957 (train_with_dev=False, train_with_test=False)
2023-10-11 02:22:17,958 ----------------------------------------------------------------------------------------------------
2023-10-11 02:22:17,958 Training Params:
2023-10-11 02:22:17,958 - learning_rate: "0.00016"
2023-10-11 02:22:17,958 - mini_batch_size: "4"
2023-10-11 02:22:17,958 - max_epochs: "10"
2023-10-11 02:22:17,958 - shuffle: "True"
2023-10-11 02:22:17,958 ----------------------------------------------------------------------------------------------------
2023-10-11 02:22:17,958 Plugins:
2023-10-11 02:22:17,958 - TensorboardLogger
2023-10-11 02:22:17,958 - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 02:22:17,958 ----------------------------------------------------------------------------------------------------
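The LinearScheduler warms the learning rate up over the first 10% of steps (warmup_fraction 0.1) and then decays it linearly to zero. With 292 mini-batches per epoch over 10 epochs (2,920 steps total), the lr values logged below are consistent with a schedule of the following shape (a minimal sketch; Flair's scheduler may differ in its exact step accounting, so logged values can be off by one step's worth of rounding):

```python
MAX_LR = 0.00016
STEPS_PER_EPOCH = 292
TOTAL_STEPS = 10 * STEPS_PER_EPOCH       # 2920
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)    # warmup_fraction 0.1 -> 292 steps

def linear_lr(step: int) -> float:
    """Linear warmup from 0 to MAX_LR, then linear decay back to 0."""
    if step < WARMUP_STEPS:
        return MAX_LR * step / WARMUP_STEPS
    return MAX_LR * (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS)

# End of epoch 2 (step 584) matches the logged lr of 0.000142:
print(round(linear_lr(584), 6))  # → 0.000142
```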
2023-10-11 02:22:17,958 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 02:22:17,958 - metric: "('micro avg', 'f1-score')"
2023-10-11 02:22:17,958 ----------------------------------------------------------------------------------------------------
2023-10-11 02:22:17,958 Computation:
2023-10-11 02:22:17,959 - compute on device: cuda:0
2023-10-11 02:22:17,959 - embedding storage: none
2023-10-11 02:22:17,959 ----------------------------------------------------------------------------------------------------
2023-10-11 02:22:17,959 Model training base path: "hmbench-newseye/fi-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-4"
2023-10-11 02:22:17,959 ----------------------------------------------------------------------------------------------------
2023-10-11 02:22:17,959 ----------------------------------------------------------------------------------------------------
2023-10-11 02:22:17,959 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 02:22:27,986 epoch 1 - iter 29/292 - loss 2.84405011 - time (sec): 10.02 - samples/sec: 469.93 - lr: 0.000015 - momentum: 0.000000
2023-10-11 02:22:37,370 epoch 1 - iter 58/292 - loss 2.83170628 - time (sec): 19.41 - samples/sec: 457.20 - lr: 0.000031 - momentum: 0.000000
2023-10-11 02:22:46,528 epoch 1 - iter 87/292 - loss 2.80816659 - time (sec): 28.57 - samples/sec: 445.96 - lr: 0.000047 - momentum: 0.000000
2023-10-11 02:22:56,398 epoch 1 - iter 116/292 - loss 2.74484364 - time (sec): 38.44 - samples/sec: 452.01 - lr: 0.000063 - momentum: 0.000000
2023-10-11 02:23:06,048 epoch 1 - iter 145/292 - loss 2.65830898 - time (sec): 48.09 - samples/sec: 441.16 - lr: 0.000079 - momentum: 0.000000
2023-10-11 02:23:15,959 epoch 1 - iter 174/292 - loss 2.54286823 - time (sec): 58.00 - samples/sec: 442.84 - lr: 0.000095 - momentum: 0.000000
2023-10-11 02:23:26,700 epoch 1 - iter 203/292 - loss 2.41236971 - time (sec): 68.74 - samples/sec: 455.77 - lr: 0.000111 - momentum: 0.000000
2023-10-11 02:23:36,342 epoch 1 - iter 232/292 - loss 2.29850246 - time (sec): 78.38 - samples/sec: 456.38 - lr: 0.000127 - momentum: 0.000000
2023-10-11 02:23:45,555 epoch 1 - iter 261/292 - loss 2.19378808 - time (sec): 87.59 - samples/sec: 451.85 - lr: 0.000142 - momentum: 0.000000
2023-10-11 02:23:55,974 epoch 1 - iter 290/292 - loss 2.05654901 - time (sec): 98.01 - samples/sec: 451.23 - lr: 0.000158 - momentum: 0.000000
2023-10-11 02:23:56,470 ----------------------------------------------------------------------------------------------------
2023-10-11 02:23:56,470 EPOCH 1 done: loss 2.0507 - lr: 0.000158
2023-10-11 02:24:01,919 DEV : loss 0.6568392515182495 - f1-score (micro avg) 0.0
2023-10-11 02:24:01,927 ----------------------------------------------------------------------------------------------------
2023-10-11 02:24:11,061 epoch 2 - iter 29/292 - loss 0.62598909 - time (sec): 9.13 - samples/sec: 461.56 - lr: 0.000158 - momentum: 0.000000
2023-10-11 02:24:20,167 epoch 2 - iter 58/292 - loss 0.57978019 - time (sec): 18.24 - samples/sec: 467.49 - lr: 0.000157 - momentum: 0.000000
2023-10-11 02:24:29,160 epoch 2 - iter 87/292 - loss 0.57477199 - time (sec): 27.23 - samples/sec: 471.57 - lr: 0.000155 - momentum: 0.000000
2023-10-11 02:24:38,531 epoch 2 - iter 116/292 - loss 0.53886914 - time (sec): 36.60 - samples/sec: 479.18 - lr: 0.000153 - momentum: 0.000000
2023-10-11 02:24:47,735 epoch 2 - iter 145/292 - loss 0.51975581 - time (sec): 45.81 - samples/sec: 480.73 - lr: 0.000151 - momentum: 0.000000
2023-10-11 02:24:57,373 epoch 2 - iter 174/292 - loss 0.53807317 - time (sec): 55.44 - samples/sec: 486.32 - lr: 0.000149 - momentum: 0.000000
2023-10-11 02:25:07,457 epoch 2 - iter 203/292 - loss 0.51656765 - time (sec): 65.53 - samples/sec: 485.08 - lr: 0.000148 - momentum: 0.000000
2023-10-11 02:25:17,793 epoch 2 - iter 232/292 - loss 0.50681885 - time (sec): 75.86 - samples/sec: 474.56 - lr: 0.000146 - momentum: 0.000000
2023-10-11 02:25:29,042 epoch 2 - iter 261/292 - loss 0.49819345 - time (sec): 87.11 - samples/sec: 464.32 - lr: 0.000144 - momentum: 0.000000
2023-10-11 02:25:39,415 epoch 2 - iter 290/292 - loss 0.48361749 - time (sec): 97.49 - samples/sec: 454.44 - lr: 0.000142 - momentum: 0.000000
2023-10-11 02:25:39,903 ----------------------------------------------------------------------------------------------------
2023-10-11 02:25:39,903 EPOCH 2 done: loss 0.4844 - lr: 0.000142
2023-10-11 02:25:45,697 DEV : loss 0.27027052640914917 - f1-score (micro avg) 0.2532
2023-10-11 02:25:45,706 saving best model
2023-10-11 02:25:46,553 ----------------------------------------------------------------------------------------------------
2023-10-11 02:25:56,518 epoch 3 - iter 29/292 - loss 0.34171181 - time (sec): 9.96 - samples/sec: 385.13 - lr: 0.000141 - momentum: 0.000000
2023-10-11 02:26:06,230 epoch 3 - iter 58/292 - loss 0.33011318 - time (sec): 19.67 - samples/sec: 435.89 - lr: 0.000139 - momentum: 0.000000
2023-10-11 02:26:15,438 epoch 3 - iter 87/292 - loss 0.29524142 - time (sec): 28.88 - samples/sec: 428.18 - lr: 0.000137 - momentum: 0.000000
2023-10-11 02:26:25,582 epoch 3 - iter 116/292 - loss 0.32286885 - time (sec): 39.03 - samples/sec: 442.75 - lr: 0.000135 - momentum: 0.000000
2023-10-11 02:26:35,208 epoch 3 - iter 145/292 - loss 0.30129883 - time (sec): 48.65 - samples/sec: 452.27 - lr: 0.000133 - momentum: 0.000000
2023-10-11 02:26:45,353 epoch 3 - iter 174/292 - loss 0.30000804 - time (sec): 58.80 - samples/sec: 459.49 - lr: 0.000132 - momentum: 0.000000
2023-10-11 02:26:54,123 epoch 3 - iter 203/292 - loss 0.29053407 - time (sec): 67.57 - samples/sec: 454.97 - lr: 0.000130 - momentum: 0.000000
2023-10-11 02:27:02,974 epoch 3 - iter 232/292 - loss 0.28629204 - time (sec): 76.42 - samples/sec: 452.51 - lr: 0.000128 - momentum: 0.000000
2023-10-11 02:27:12,219 epoch 3 - iter 261/292 - loss 0.28138015 - time (sec): 85.66 - samples/sec: 455.33 - lr: 0.000126 - momentum: 0.000000
2023-10-11 02:27:22,634 epoch 3 - iter 290/292 - loss 0.27715632 - time (sec): 96.08 - samples/sec: 461.29 - lr: 0.000125 - momentum: 0.000000
2023-10-11 02:27:23,054 ----------------------------------------------------------------------------------------------------
2023-10-11 02:27:23,054 EPOCH 3 done: loss 0.2770 - lr: 0.000125
2023-10-11 02:27:28,787 DEV : loss 0.19208243489265442 - f1-score (micro avg) 0.498
2023-10-11 02:27:28,795 saving best model
2023-10-11 02:27:31,430 ----------------------------------------------------------------------------------------------------
2023-10-11 02:27:40,794 epoch 4 - iter 29/292 - loss 0.15296171 - time (sec): 9.36 - samples/sec: 452.03 - lr: 0.000123 - momentum: 0.000000
2023-10-11 02:27:50,090 epoch 4 - iter 58/292 - loss 0.19406134 - time (sec): 18.66 - samples/sec: 458.51 - lr: 0.000121 - momentum: 0.000000
2023-10-11 02:27:59,801 epoch 4 - iter 87/292 - loss 0.17747525 - time (sec): 28.37 - samples/sec: 464.34 - lr: 0.000119 - momentum: 0.000000
2023-10-11 02:28:09,135 epoch 4 - iter 116/292 - loss 0.16770571 - time (sec): 37.70 - samples/sec: 463.17 - lr: 0.000117 - momentum: 0.000000
2023-10-11 02:28:18,141 epoch 4 - iter 145/292 - loss 0.17038541 - time (sec): 46.71 - samples/sec: 459.89 - lr: 0.000116 - momentum: 0.000000
2023-10-11 02:28:28,400 epoch 4 - iter 174/292 - loss 0.17589765 - time (sec): 56.97 - samples/sec: 458.13 - lr: 0.000114 - momentum: 0.000000
2023-10-11 02:28:38,133 epoch 4 - iter 203/292 - loss 0.17832868 - time (sec): 66.70 - samples/sec: 457.92 - lr: 0.000112 - momentum: 0.000000
2023-10-11 02:28:47,802 epoch 4 - iter 232/292 - loss 0.17493554 - time (sec): 76.37 - samples/sec: 458.90 - lr: 0.000110 - momentum: 0.000000
2023-10-11 02:28:57,190 epoch 4 - iter 261/292 - loss 0.17196372 - time (sec): 85.76 - samples/sec: 455.30 - lr: 0.000109 - momentum: 0.000000
2023-10-11 02:29:07,230 epoch 4 - iter 290/292 - loss 0.16898623 - time (sec): 95.80 - samples/sec: 459.64 - lr: 0.000107 - momentum: 0.000000
2023-10-11 02:29:07,912 ----------------------------------------------------------------------------------------------------
2023-10-11 02:29:07,913 EPOCH 4 done: loss 0.1681 - lr: 0.000107
2023-10-11 02:29:13,730 DEV : loss 0.147971972823143 - f1-score (micro avg) 0.7257
2023-10-11 02:29:13,739 saving best model
2023-10-11 02:29:14,672 ----------------------------------------------------------------------------------------------------
2023-10-11 02:29:25,259 epoch 5 - iter 29/292 - loss 0.15169791 - time (sec): 10.58 - samples/sec: 526.70 - lr: 0.000105 - momentum: 0.000000
2023-10-11 02:29:35,330 epoch 5 - iter 58/292 - loss 0.13029739 - time (sec): 20.66 - samples/sec: 500.82 - lr: 0.000103 - momentum: 0.000000
2023-10-11 02:29:45,248 epoch 5 - iter 87/292 - loss 0.13245643 - time (sec): 30.57 - samples/sec: 462.98 - lr: 0.000101 - momentum: 0.000000
2023-10-11 02:29:55,075 epoch 5 - iter 116/292 - loss 0.12631426 - time (sec): 40.40 - samples/sec: 464.12 - lr: 0.000100 - momentum: 0.000000
2023-10-11 02:30:04,502 epoch 5 - iter 145/292 - loss 0.13024278 - time (sec): 49.83 - samples/sec: 462.23 - lr: 0.000098 - momentum: 0.000000
2023-10-11 02:30:13,782 epoch 5 - iter 174/292 - loss 0.12849868 - time (sec): 59.11 - samples/sec: 457.18 - lr: 0.000096 - momentum: 0.000000
2023-10-11 02:30:23,235 epoch 5 - iter 203/292 - loss 0.12118872 - time (sec): 68.56 - samples/sec: 455.06 - lr: 0.000094 - momentum: 0.000000
2023-10-11 02:30:32,460 epoch 5 - iter 232/292 - loss 0.11674466 - time (sec): 77.79 - samples/sec: 455.60 - lr: 0.000093 - momentum: 0.000000
2023-10-11 02:30:41,348 epoch 5 - iter 261/292 - loss 0.11497375 - time (sec): 86.67 - samples/sec: 452.60 - lr: 0.000091 - momentum: 0.000000
2023-10-11 02:30:51,317 epoch 5 - iter 290/292 - loss 0.11482274 - time (sec): 96.64 - samples/sec: 456.46 - lr: 0.000089 - momentum: 0.000000
2023-10-11 02:30:51,888 ----------------------------------------------------------------------------------------------------
2023-10-11 02:30:51,889 EPOCH 5 done: loss 0.1147 - lr: 0.000089
2023-10-11 02:30:57,394 DEV : loss 0.13591936230659485 - f1-score (micro avg) 0.7312
2023-10-11 02:30:57,403 saving best model
2023-10-11 02:30:59,986 ----------------------------------------------------------------------------------------------------
2023-10-11 02:31:10,259 epoch 6 - iter 29/292 - loss 0.08136425 - time (sec): 10.27 - samples/sec: 534.03 - lr: 0.000087 - momentum: 0.000000
2023-10-11 02:31:19,438 epoch 6 - iter 58/292 - loss 0.06936994 - time (sec): 19.45 - samples/sec: 515.27 - lr: 0.000085 - momentum: 0.000000
2023-10-11 02:31:27,937 epoch 6 - iter 87/292 - loss 0.07071139 - time (sec): 27.95 - samples/sec: 490.85 - lr: 0.000084 - momentum: 0.000000
2023-10-11 02:31:37,281 epoch 6 - iter 116/292 - loss 0.06752745 - time (sec): 37.29 - samples/sec: 492.85 - lr: 0.000082 - momentum: 0.000000
2023-10-11 02:31:46,112 epoch 6 - iter 145/292 - loss 0.07212155 - time (sec): 46.12 - samples/sec: 480.88 - lr: 0.000080 - momentum: 0.000000
2023-10-11 02:31:55,850 epoch 6 - iter 174/292 - loss 0.08003889 - time (sec): 55.86 - samples/sec: 486.31 - lr: 0.000078 - momentum: 0.000000
2023-10-11 02:32:05,557 epoch 6 - iter 203/292 - loss 0.07724681 - time (sec): 65.57 - samples/sec: 488.70 - lr: 0.000077 - momentum: 0.000000
2023-10-11 02:32:15,268 epoch 6 - iter 232/292 - loss 0.07725976 - time (sec): 75.28 - samples/sec: 478.53 - lr: 0.000075 - momentum: 0.000000
2023-10-11 02:32:24,897 epoch 6 - iter 261/292 - loss 0.08058987 - time (sec): 84.91 - samples/sec: 468.19 - lr: 0.000073 - momentum: 0.000000
2023-10-11 02:32:34,909 epoch 6 - iter 290/292 - loss 0.07823012 - time (sec): 94.92 - samples/sec: 464.64 - lr: 0.000071 - momentum: 0.000000
2023-10-11 02:32:35,538 ----------------------------------------------------------------------------------------------------
2023-10-11 02:32:35,538 EPOCH 6 done: loss 0.0785 - lr: 0.000071
2023-10-11 02:32:42,085 DEV : loss 0.12154516577720642 - f1-score (micro avg) 0.7733
2023-10-11 02:32:42,094 saving best model
2023-10-11 02:32:44,717 ----------------------------------------------------------------------------------------------------
2023-10-11 02:32:53,994 epoch 7 - iter 29/292 - loss 0.05511245 - time (sec): 9.27 - samples/sec: 411.85 - lr: 0.000069 - momentum: 0.000000
2023-10-11 02:33:03,636 epoch 7 - iter 58/292 - loss 0.05492446 - time (sec): 18.91 - samples/sec: 435.00 - lr: 0.000068 - momentum: 0.000000
2023-10-11 02:33:12,894 epoch 7 - iter 87/292 - loss 0.06175282 - time (sec): 28.17 - samples/sec: 428.04 - lr: 0.000066 - momentum: 0.000000
2023-10-11 02:33:23,190 epoch 7 - iter 116/292 - loss 0.05569448 - time (sec): 38.47 - samples/sec: 441.53 - lr: 0.000064 - momentum: 0.000000
2023-10-11 02:33:33,372 epoch 7 - iter 145/292 - loss 0.05718601 - time (sec): 48.65 - samples/sec: 458.43 - lr: 0.000062 - momentum: 0.000000
2023-10-11 02:33:42,561 epoch 7 - iter 174/292 - loss 0.06103347 - time (sec): 57.84 - samples/sec: 458.39 - lr: 0.000061 - momentum: 0.000000
2023-10-11 02:33:52,399 epoch 7 - iter 203/292 - loss 0.05935794 - time (sec): 67.68 - samples/sec: 463.39 - lr: 0.000059 - momentum: 0.000000
2023-10-11 02:34:01,519 epoch 7 - iter 232/292 - loss 0.05778039 - time (sec): 76.80 - samples/sec: 462.19 - lr: 0.000057 - momentum: 0.000000
2023-10-11 02:34:10,853 epoch 7 - iter 261/292 - loss 0.05858678 - time (sec): 86.13 - samples/sec: 461.09 - lr: 0.000055 - momentum: 0.000000
2023-10-11 02:34:20,513 epoch 7 - iter 290/292 - loss 0.05966390 - time (sec): 95.79 - samples/sec: 461.88 - lr: 0.000054 - momentum: 0.000000
2023-10-11 02:34:20,982 ----------------------------------------------------------------------------------------------------
2023-10-11 02:34:20,982 EPOCH 7 done: loss 0.0596 - lr: 0.000054
2023-10-11 02:34:26,485 DEV : loss 0.12782339751720428 - f1-score (micro avg) 0.7749
2023-10-11 02:34:26,494 saving best model
2023-10-11 02:34:29,017 ----------------------------------------------------------------------------------------------------
2023-10-11 02:34:38,399 epoch 8 - iter 29/292 - loss 0.05488144 - time (sec): 9.38 - samples/sec: 460.14 - lr: 0.000052 - momentum: 0.000000
2023-10-11 02:34:47,950 epoch 8 - iter 58/292 - loss 0.04194912 - time (sec): 18.93 - samples/sec: 481.44 - lr: 0.000050 - momentum: 0.000000
2023-10-11 02:34:56,990 epoch 8 - iter 87/292 - loss 0.04292430 - time (sec): 27.97 - samples/sec: 470.25 - lr: 0.000048 - momentum: 0.000000
2023-10-11 02:35:06,302 epoch 8 - iter 116/292 - loss 0.05008338 - time (sec): 37.28 - samples/sec: 462.18 - lr: 0.000046 - momentum: 0.000000
2023-10-11 02:35:16,594 epoch 8 - iter 145/292 - loss 0.04445838 - time (sec): 47.57 - samples/sec: 473.68 - lr: 0.000045 - momentum: 0.000000
2023-10-11 02:35:26,793 epoch 8 - iter 174/292 - loss 0.04716916 - time (sec): 57.77 - samples/sec: 475.06 - lr: 0.000043 - momentum: 0.000000
2023-10-11 02:35:36,141 epoch 8 - iter 203/292 - loss 0.04687333 - time (sec): 67.12 - samples/sec: 471.02 - lr: 0.000041 - momentum: 0.000000
2023-10-11 02:35:45,092 epoch 8 - iter 232/292 - loss 0.04640217 - time (sec): 76.07 - samples/sec: 468.05 - lr: 0.000039 - momentum: 0.000000
2023-10-11 02:35:54,331 epoch 8 - iter 261/292 - loss 0.04752936 - time (sec): 85.31 - samples/sec: 466.07 - lr: 0.000038 - momentum: 0.000000
2023-10-11 02:36:03,829 epoch 8 - iter 290/292 - loss 0.04677794 - time (sec): 94.81 - samples/sec: 466.35 - lr: 0.000036 - momentum: 0.000000
2023-10-11 02:36:04,315 ----------------------------------------------------------------------------------------------------
2023-10-11 02:36:04,316 EPOCH 8 done: loss 0.0471 - lr: 0.000036
2023-10-11 02:36:09,809 DEV : loss 0.13004955649375916 - f1-score (micro avg) 0.7759
2023-10-11 02:36:09,818 saving best model
2023-10-11 02:36:12,333 ----------------------------------------------------------------------------------------------------
2023-10-11 02:36:21,618 epoch 9 - iter 29/292 - loss 0.02873439 - time (sec): 9.28 - samples/sec: 485.32 - lr: 0.000034 - momentum: 0.000000
2023-10-11 02:36:30,730 epoch 9 - iter 58/292 - loss 0.03268934 - time (sec): 18.39 - samples/sec: 472.14 - lr: 0.000032 - momentum: 0.000000
2023-10-11 02:36:40,098 epoch 9 - iter 87/292 - loss 0.03158168 - time (sec): 27.76 - samples/sec: 479.63 - lr: 0.000030 - momentum: 0.000000
2023-10-11 02:36:49,302 epoch 9 - iter 116/292 - loss 0.02918808 - time (sec): 36.96 - samples/sec: 473.43 - lr: 0.000029 - momentum: 0.000000
2023-10-11 02:36:59,109 epoch 9 - iter 145/292 - loss 0.02774731 - time (sec): 46.77 - samples/sec: 475.68 - lr: 0.000027 - momentum: 0.000000
2023-10-11 02:37:08,619 epoch 9 - iter 174/292 - loss 0.02995775 - time (sec): 56.28 - samples/sec: 466.07 - lr: 0.000025 - momentum: 0.000000
2023-10-11 02:37:18,451 epoch 9 - iter 203/292 - loss 0.03214478 - time (sec): 66.11 - samples/sec: 468.14 - lr: 0.000023 - momentum: 0.000000
2023-10-11 02:37:28,951 epoch 9 - iter 232/292 - loss 0.03234939 - time (sec): 76.61 - samples/sec: 469.83 - lr: 0.000022 - momentum: 0.000000
2023-10-11 02:37:38,213 epoch 9 - iter 261/292 - loss 0.03627506 - time (sec): 85.88 - samples/sec: 463.78 - lr: 0.000020 - momentum: 0.000000
2023-10-11 02:37:48,218 epoch 9 - iter 290/292 - loss 0.03898896 - time (sec): 95.88 - samples/sec: 461.31 - lr: 0.000018 - momentum: 0.000000
2023-10-11 02:37:48,705 ----------------------------------------------------------------------------------------------------
2023-10-11 02:37:48,705 EPOCH 9 done: loss 0.0390 - lr: 0.000018
2023-10-11 02:37:54,501 DEV : loss 0.1367434859275818 - f1-score (micro avg) 0.7613
2023-10-11 02:37:54,510 ----------------------------------------------------------------------------------------------------
2023-10-11 02:38:04,763 epoch 10 - iter 29/292 - loss 0.02971200 - time (sec): 10.25 - samples/sec: 498.51 - lr: 0.000016 - momentum: 0.000000
2023-10-11 02:38:14,588 epoch 10 - iter 58/292 - loss 0.03029689 - time (sec): 20.08 - samples/sec: 481.86 - lr: 0.000014 - momentum: 0.000000
2023-10-11 02:38:24,532 epoch 10 - iter 87/292 - loss 0.02849897 - time (sec): 30.02 - samples/sec: 477.58 - lr: 0.000013 - momentum: 0.000000
2023-10-11 02:38:33,766 epoch 10 - iter 116/292 - loss 0.03324429 - time (sec): 39.25 - samples/sec: 462.91 - lr: 0.000011 - momentum: 0.000000
2023-10-11 02:38:43,535 epoch 10 - iter 145/292 - loss 0.03540899 - time (sec): 49.02 - samples/sec: 463.44 - lr: 0.000009 - momentum: 0.000000
2023-10-11 02:38:53,923 epoch 10 - iter 174/292 - loss 0.03318948 - time (sec): 59.41 - samples/sec: 466.48 - lr: 0.000007 - momentum: 0.000000
2023-10-11 02:39:02,992 epoch 10 - iter 203/292 - loss 0.03240122 - time (sec): 68.48 - samples/sec: 455.63 - lr: 0.000006 - momentum: 0.000000
2023-10-11 02:39:12,960 epoch 10 - iter 232/292 - loss 0.03389265 - time (sec): 78.45 - samples/sec: 451.94 - lr: 0.000004 - momentum: 0.000000
2023-10-11 02:39:22,905 epoch 10 - iter 261/292 - loss 0.03543843 - time (sec): 88.39 - samples/sec: 450.18 - lr: 0.000002 - momentum: 0.000000
2023-10-11 02:39:32,940 epoch 10 - iter 290/292 - loss 0.03586228 - time (sec): 98.43 - samples/sec: 449.45 - lr: 0.000000 - momentum: 0.000000
2023-10-11 02:39:33,421 ----------------------------------------------------------------------------------------------------
2023-10-11 02:39:33,422 EPOCH 10 done: loss 0.0358 - lr: 0.000000
2023-10-11 02:39:38,962 DEV : loss 0.1346944272518158 - f1-score (micro avg) 0.7646
2023-10-11 02:39:39,825 ----------------------------------------------------------------------------------------------------
2023-10-11 02:39:39,827 Loading model from best epoch ...
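Checkpointing in this run tracks the dev micro-F1: best-model.pt is rewritten whenever the dev score improves, so the final evaluation uses the epoch-8 model (dev F1 0.7759). A minimal sketch of that selection logic over the dev scores logged above:

```python
# Dev micro-F1 per epoch, copied from the DEV lines in the log above.
dev_f1 = [0.0, 0.2532, 0.498, 0.7257, 0.7312,
          0.7733, 0.7749, 0.7759, 0.7613, 0.7646]

best_epoch = max(range(len(dev_f1)), key=dev_f1.__getitem__) + 1  # epochs are 1-based
print(best_epoch, dev_f1[best_epoch - 1])  # → 8 0.7759
```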
2023-10-11 02:39:43,540 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
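The 17 tags follow the BIOES scheme over four entity types (LOC, PER, ORG, HumanProd): S- marks a single-token entity, while B-/I-/E- mark the beginning, inside, and end of a multi-token one, plus O for non-entities. A minimal decoder sketch (not Flair's own implementation) that turns such a tag sequence into labeled spans:

```python
def bioes_to_spans(tags):
    """Decode a BIOES tag sequence into (start, end, label) spans, end exclusive."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "O":
            start = None
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "S":                      # single-token entity
            spans.append((i, i + 1, label))
            start = None
        elif prefix == "B":                    # entity begins
            start = i
        elif prefix == "E" and start is not None:  # entity ends
            spans.append((start, i + 1, label))
            start = None
        # "I" continues a span; nothing to record yet
    return spans

print(bioes_to_spans(["S-LOC", "O", "B-PER", "I-PER", "E-PER"]))
# → [(0, 1, 'LOC'), (2, 5, 'PER')]
```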
2023-10-11 02:39:55,836
Results:
- F-score (micro) 0.7191
- F-score (macro) 0.6682
- Accuracy 0.5799
By class:
              precision    recall  f1-score   support

         PER     0.7908    0.8362    0.8128       348
         LOC     0.5718    0.7778    0.6591       261
         ORG     0.3830    0.3462    0.3636        52
   HumanProd     0.8571    0.8182    0.8372        22

   micro avg     0.6700    0.7760    0.7191       683
   macro avg     0.6507    0.6946    0.6682       683
weighted avg     0.6782    0.7760    0.7207       683
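The summary rows can be re-derived from the per-class rows: micro-F1 is the harmonic mean of the pooled precision and recall, and macro-F1 is the unweighted mean of the class F1 scores. A quick check against the reported numbers:

```python
per_class_f1 = [0.8128, 0.6591, 0.3636, 0.8372]  # PER, LOC, ORG, HumanProd
micro_p, micro_r = 0.6700, 0.7760                # pooled over all 683 entities

micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)  # harmonic mean
macro_f1 = sum(per_class_f1) / len(per_class_f1)        # unweighted mean

print(round(micro_f1, 4), round(macro_f1, 4))  # → 0.7191 0.6682
```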
2023-10-11 02:39:55,837 ----------------------------------------------------------------------------------------------------