2023-10-11 08:17:53,170 ----------------------------------------------------------------------------------------------------
2023-10-11 08:17:53,172 Model: "SequenceTagger(
(embeddings): ByT5Embeddings(
(model): T5EncoderModel(
(shared): Embedding(384, 1472)
(encoder): T5Stack(
(embed_tokens): Embedding(384, 1472)
(block): ModuleList(
(0): T5Block(
(layer): ModuleList(
(0): T5LayerSelfAttention(
(SelfAttention): T5Attention(
(q): Linear(in_features=1472, out_features=384, bias=False)
(k): Linear(in_features=1472, out_features=384, bias=False)
(v): Linear(in_features=1472, out_features=384, bias=False)
(o): Linear(in_features=384, out_features=1472, bias=False)
(relative_attention_bias): Embedding(32, 6)
)
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(1): T5LayerFF(
(DenseReluDense): T5DenseGatedActDense(
(wi_0): Linear(in_features=1472, out_features=3584, bias=False)
(wi_1): Linear(in_features=1472, out_features=3584, bias=False)
(wo): Linear(in_features=3584, out_features=1472, bias=False)
(dropout): Dropout(p=0.1, inplace=False)
(act): NewGELUActivation()
)
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
(1-11): 11 x T5Block(
(layer): ModuleList(
(0): T5LayerSelfAttention(
(SelfAttention): T5Attention(
(q): Linear(in_features=1472, out_features=384, bias=False)
(k): Linear(in_features=1472, out_features=384, bias=False)
(v): Linear(in_features=1472, out_features=384, bias=False)
(o): Linear(in_features=384, out_features=1472, bias=False)
)
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(1): T5LayerFF(
(DenseReluDense): T5DenseGatedActDense(
(wi_0): Linear(in_features=1472, out_features=3584, bias=False)
(wi_1): Linear(in_features=1472, out_features=3584, bias=False)
(wo): Linear(in_features=3584, out_features=1472, bias=False)
(dropout): Dropout(p=0.1, inplace=False)
(act): NewGELUActivation()
)
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
)
(final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
(locked_dropout): LockedDropout(p=0.5)
(linear): Linear(in_features=1472, out_features=17, bias=True)
(loss_function): CrossEntropyLoss()
)"
2023-10-11 08:17:53,172 ----------------------------------------------------------------------------------------------------
2023-10-11 08:17:53,173 MultiCorpus: 7142 train + 698 dev + 2570 test sentences
- NER_HIPE_2022 Corpus: 7142 train + 698 dev + 2570 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fr/with_doc_seperator
2023-10-11 08:17:53,173 ----------------------------------------------------------------------------------------------------
2023-10-11 08:17:53,173 Train: 7142 sentences
2023-10-11 08:17:53,173 (train_with_dev=False, train_with_test=False)
2023-10-11 08:17:53,173 ----------------------------------------------------------------------------------------------------
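The corpus above is the French NewsEye subset of HIPE-2022, loaded through Flair's dataset API. A minimal sketch, assuming the NER_HIPE_2022 loader accepts dataset_name / language / add_document_separator arguments (inferred from the cache path ending in with_doc_seperator, not verified against this exact Flair version):

from flair.datasets import NER_HIPE_2022

# Argument names are assumptions inferred from the cache path logged above.
corpus = NER_HIPE_2022(
    dataset_name="newseye",
    language="fr",
    add_document_separator=True,
)
print(corpus)  # expected: 7142 train + 698 dev + 2570 test sentences

# The 17-tag BIOES dictionary reported at the end of the log would be built from it:
label_dict = corpus.make_label_dictionary(label_type="ner")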
2023-10-11 08:17:53,173 Training Params:
2023-10-11 08:17:53,173 - learning_rate: "0.00015"
2023-10-11 08:17:53,173 - mini_batch_size: "8"
2023-10-11 08:17:53,173 - max_epochs: "10"
2023-10-11 08:17:53,173 - shuffle: "True"
2023-10-11 08:17:53,173 ----------------------------------------------------------------------------------------------------
2023-10-11 08:17:53,173 Plugins:
2023-10-11 08:17:53,173 - TensorboardLogger
2023-10-11 08:17:53,173 - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 08:17:53,173 ----------------------------------------------------------------------------------------------------
2023-10-11 08:17:53,174 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 08:17:53,174 - metric: "('micro avg', 'f1-score')"
2023-10-11 08:17:53,174 ----------------------------------------------------------------------------------------------------
2023-10-11 08:17:53,174 Computation:
2023-10-11 08:17:53,174 - compute on device: cuda:0
2023-10-11 08:17:53,174 - embedding storage: none
2023-10-11 08:17:53,174 ----------------------------------------------------------------------------------------------------
2023-10-11 08:17:53,174 Model training base path: "hmbench-newseye/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-3"
2023-10-11 08:17:53,174 ----------------------------------------------------------------------------------------------------
2023-10-11 08:17:53,174 ----------------------------------------------------------------------------------------------------
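Putting the logged hyperparameters together, the run corresponds to a standard Flair fine-tuning call. A hedged sketch under these assumptions: the model identifier is taken from the base path above, stock TransformerWordEmbeddings stand in for the custom ByT5Embeddings, corpus and label_dict come from the corpus sketch further up, and the warmup_fraction / main_evaluation_metric keyword names may differ in this exact Flair version.

from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Stand-in for the custom ByT5Embeddings: last-layer, first-subtoken pooling
# ("layers-1", "poolingfirst" in the run name).
embeddings = TransformerWordEmbeddings(
    model="hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax",
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
)

# Linear classifier only, no CRF and no RNN ("crfFalse" in the run name).
tagger = SequenceTagger(
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)

trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "hmbench-newseye/fr-hmbyt5-preliminary/"
    "byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-"
    "poolingfirst-layers-1-crfFalse-3",
    learning_rate=0.00015,
    mini_batch_size=8,
    max_epochs=10,
    shuffle=True,
    warmup_fraction=0.1,                               # LinearScheduler warmup fraction from the log
    main_evaluation_metric=("micro avg", "f1-score"),  # model-selection metric from the log
)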
2023-10-11 08:17:53,174 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 08:18:49,180 epoch 1 - iter 89/893 - loss 2.82046970 - time (sec): 56.00 - samples/sec: 480.80 - lr: 0.000015 - momentum: 0.000000
2023-10-11 08:19:47,163 epoch 1 - iter 178/893 - loss 2.74859222 - time (sec): 113.99 - samples/sec: 444.71 - lr: 0.000030 - momentum: 0.000000
2023-10-11 08:20:38,003 epoch 1 - iter 267/893 - loss 2.56466690 - time (sec): 164.83 - samples/sec: 452.08 - lr: 0.000045 - momentum: 0.000000
2023-10-11 08:21:26,826 epoch 1 - iter 356/893 - loss 2.35440028 - time (sec): 213.65 - samples/sec: 462.13 - lr: 0.000060 - momentum: 0.000000
2023-10-11 08:22:19,664 epoch 1 - iter 445/893 - loss 2.12091695 - time (sec): 266.49 - samples/sec: 468.58 - lr: 0.000075 - momentum: 0.000000
2023-10-11 08:23:15,478 epoch 1 - iter 534/893 - loss 1.90749191 - time (sec): 322.30 - samples/sec: 461.56 - lr: 0.000090 - momentum: 0.000000
2023-10-11 08:24:10,556 epoch 1 - iter 623/893 - loss 1.71575701 - time (sec): 377.38 - samples/sec: 462.06 - lr: 0.000104 - momentum: 0.000000
2023-10-11 08:25:03,879 epoch 1 - iter 712/893 - loss 1.56436100 - time (sec): 430.70 - samples/sec: 461.70 - lr: 0.000119 - momentum: 0.000000
2023-10-11 08:25:55,832 epoch 1 - iter 801/893 - loss 1.43013467 - time (sec): 482.66 - samples/sec: 464.19 - lr: 0.000134 - momentum: 0.000000
2023-10-11 08:26:50,261 epoch 1 - iter 890/893 - loss 1.32696860 - time (sec): 537.08 - samples/sec: 461.63 - lr: 0.000149 - momentum: 0.000000
2023-10-11 08:26:52,072 ----------------------------------------------------------------------------------------------------
2023-10-11 08:26:52,072 EPOCH 1 done: loss 1.3236 - lr: 0.000149
2023-10-11 08:27:16,454 DEV : loss 0.23555637896060944 - f1-score (micro avg) 0.524
2023-10-11 08:27:16,494 saving best model
2023-10-11 08:27:17,659 ----------------------------------------------------------------------------------------------------
2023-10-11 08:28:20,694 epoch 2 - iter 89/893 - loss 0.24308848 - time (sec): 63.03 - samples/sec: 416.25 - lr: 0.000148 - momentum: 0.000000
2023-10-11 08:29:23,466 epoch 2 - iter 178/893 - loss 0.24144103 - time (sec): 125.80 - samples/sec: 409.21 - lr: 0.000147 - momentum: 0.000000
2023-10-11 08:30:18,170 epoch 2 - iter 267/893 - loss 0.22698655 - time (sec): 180.51 - samples/sec: 417.46 - lr: 0.000145 - momentum: 0.000000
2023-10-11 08:31:13,243 epoch 2 - iter 356/893 - loss 0.20654434 - time (sec): 235.58 - samples/sec: 427.68 - lr: 0.000143 - momentum: 0.000000
2023-10-11 08:32:08,723 epoch 2 - iter 445/893 - loss 0.19778611 - time (sec): 291.06 - samples/sec: 427.49 - lr: 0.000142 - momentum: 0.000000
2023-10-11 08:33:05,539 epoch 2 - iter 534/893 - loss 0.18786454 - time (sec): 347.88 - samples/sec: 431.84 - lr: 0.000140 - momentum: 0.000000
2023-10-11 08:33:57,472 epoch 2 - iter 623/893 - loss 0.18152301 - time (sec): 399.81 - samples/sec: 436.88 - lr: 0.000138 - momentum: 0.000000
2023-10-11 08:34:48,759 epoch 2 - iter 712/893 - loss 0.17442350 - time (sec): 451.10 - samples/sec: 438.85 - lr: 0.000137 - momentum: 0.000000
2023-10-11 08:35:41,408 epoch 2 - iter 801/893 - loss 0.16821741 - time (sec): 503.75 - samples/sec: 441.76 - lr: 0.000135 - momentum: 0.000000
2023-10-11 08:36:33,619 epoch 2 - iter 890/893 - loss 0.16223406 - time (sec): 555.96 - samples/sec: 445.94 - lr: 0.000133 - momentum: 0.000000
2023-10-11 08:36:35,297 ----------------------------------------------------------------------------------------------------
2023-10-11 08:36:35,297 EPOCH 2 done: loss 0.1619 - lr: 0.000133
2023-10-11 08:36:58,542 DEV : loss 0.09554639458656311 - f1-score (micro avg) 0.7648
2023-10-11 08:36:58,576 saving best model
2023-10-11 08:37:01,229 ----------------------------------------------------------------------------------------------------
2023-10-11 08:37:54,732 epoch 3 - iter 89/893 - loss 0.06890555 - time (sec): 53.50 - samples/sec: 445.74 - lr: 0.000132 - momentum: 0.000000
2023-10-11 08:38:46,142 epoch 3 - iter 178/893 - loss 0.06908598 - time (sec): 104.91 - samples/sec: 466.15 - lr: 0.000130 - momentum: 0.000000
2023-10-11 08:39:38,869 epoch 3 - iter 267/893 - loss 0.06842913 - time (sec): 157.63 - samples/sec: 464.28 - lr: 0.000128 - momentum: 0.000000
2023-10-11 08:40:30,914 epoch 3 - iter 356/893 - loss 0.07036545 - time (sec): 209.68 - samples/sec: 468.26 - lr: 0.000127 - momentum: 0.000000
2023-10-11 08:41:24,718 epoch 3 - iter 445/893 - loss 0.07189188 - time (sec): 263.48 - samples/sec: 466.37 - lr: 0.000125 - momentum: 0.000000
2023-10-11 08:42:18,632 epoch 3 - iter 534/893 - loss 0.07394553 - time (sec): 317.40 - samples/sec: 464.75 - lr: 0.000123 - momentum: 0.000000
2023-10-11 08:43:12,718 epoch 3 - iter 623/893 - loss 0.07564987 - time (sec): 371.48 - samples/sec: 468.82 - lr: 0.000122 - momentum: 0.000000
2023-10-11 08:44:05,682 epoch 3 - iter 712/893 - loss 0.07431156 - time (sec): 424.45 - samples/sec: 466.68 - lr: 0.000120 - momentum: 0.000000
2023-10-11 08:44:58,592 epoch 3 - iter 801/893 - loss 0.07235690 - time (sec): 477.36 - samples/sec: 467.64 - lr: 0.000118 - momentum: 0.000000
2023-10-11 08:45:49,069 epoch 3 - iter 890/893 - loss 0.07242906 - time (sec): 527.83 - samples/sec: 470.12 - lr: 0.000117 - momentum: 0.000000
2023-10-11 08:45:50,521 ----------------------------------------------------------------------------------------------------
2023-10-11 08:45:50,521 EPOCH 3 done: loss 0.0726 - lr: 0.000117
2023-10-11 08:46:12,903 DEV : loss 0.09995921701192856 - f1-score (micro avg) 0.7951
2023-10-11 08:46:12,940 saving best model
2023-10-11 08:46:15,623 ----------------------------------------------------------------------------------------------------
2023-10-11 08:47:06,741 epoch 4 - iter 89/893 - loss 0.04054614 - time (sec): 51.11 - samples/sec: 470.05 - lr: 0.000115 - momentum: 0.000000
2023-10-11 08:48:03,732 epoch 4 - iter 178/893 - loss 0.04820537 - time (sec): 108.10 - samples/sec: 453.43 - lr: 0.000113 - momentum: 0.000000
2023-10-11 08:49:02,459 epoch 4 - iter 267/893 - loss 0.04726065 - time (sec): 166.83 - samples/sec: 453.20 - lr: 0.000112 - momentum: 0.000000
2023-10-11 08:49:53,635 epoch 4 - iter 356/893 - loss 0.04746388 - time (sec): 218.01 - samples/sec: 459.58 - lr: 0.000110 - momentum: 0.000000
2023-10-11 08:50:46,643 epoch 4 - iter 445/893 - loss 0.04847189 - time (sec): 271.02 - samples/sec: 464.09 - lr: 0.000108 - momentum: 0.000000
2023-10-11 08:51:37,168 epoch 4 - iter 534/893 - loss 0.04943906 - time (sec): 321.54 - samples/sec: 462.24 - lr: 0.000107 - momentum: 0.000000
2023-10-11 08:52:32,719 epoch 4 - iter 623/893 - loss 0.04991779 - time (sec): 377.09 - samples/sec: 460.72 - lr: 0.000105 - momentum: 0.000000
2023-10-11 08:53:25,238 epoch 4 - iter 712/893 - loss 0.04993927 - time (sec): 429.61 - samples/sec: 460.77 - lr: 0.000103 - momentum: 0.000000
2023-10-11 08:54:18,323 epoch 4 - iter 801/893 - loss 0.04923222 - time (sec): 482.70 - samples/sec: 462.17 - lr: 0.000102 - momentum: 0.000000
2023-10-11 08:55:10,626 epoch 4 - iter 890/893 - loss 0.04836037 - time (sec): 535.00 - samples/sec: 463.56 - lr: 0.000100 - momentum: 0.000000
2023-10-11 08:55:12,296 ----------------------------------------------------------------------------------------------------
2023-10-11 08:55:12,297 EPOCH 4 done: loss 0.0483 - lr: 0.000100
2023-10-11 08:55:38,738 DEV : loss 0.1315266638994217 - f1-score (micro avg) 0.7959
2023-10-11 08:55:38,776 saving best model
2023-10-11 08:55:41,441 ----------------------------------------------------------------------------------------------------
2023-10-11 08:56:32,049 epoch 5 - iter 89/893 - loss 0.03387070 - time (sec): 50.60 - samples/sec: 496.32 - lr: 0.000098 - momentum: 0.000000
2023-10-11 08:57:25,784 epoch 5 - iter 178/893 - loss 0.03724058 - time (sec): 104.34 - samples/sec: 484.93 - lr: 0.000097 - momentum: 0.000000
2023-10-11 08:58:17,751 epoch 5 - iter 267/893 - loss 0.03458615 - time (sec): 156.31 - samples/sec: 483.86 - lr: 0.000095 - momentum: 0.000000
2023-10-11 08:59:12,353 epoch 5 - iter 356/893 - loss 0.03459710 - time (sec): 210.91 - samples/sec: 472.89 - lr: 0.000093 - momentum: 0.000000
2023-10-11 09:00:09,877 epoch 5 - iter 445/893 - loss 0.03385511 - time (sec): 268.43 - samples/sec: 460.47 - lr: 0.000092 - momentum: 0.000000
2023-10-11 09:01:06,978 epoch 5 - iter 534/893 - loss 0.03402281 - time (sec): 325.53 - samples/sec: 453.46 - lr: 0.000090 - momentum: 0.000000
2023-10-11 09:02:01,657 epoch 5 - iter 623/893 - loss 0.03373238 - time (sec): 380.21 - samples/sec: 454.86 - lr: 0.000088 - momentum: 0.000000
2023-10-11 09:02:55,698 epoch 5 - iter 712/893 - loss 0.03440469 - time (sec): 434.25 - samples/sec: 454.50 - lr: 0.000087 - momentum: 0.000000
2023-10-11 09:03:47,467 epoch 5 - iter 801/893 - loss 0.03456330 - time (sec): 486.02 - samples/sec: 457.40 - lr: 0.000085 - momentum: 0.000000
2023-10-11 09:04:43,386 epoch 5 - iter 890/893 - loss 0.03572654 - time (sec): 541.94 - samples/sec: 457.73 - lr: 0.000083 - momentum: 0.000000
2023-10-11 09:04:44,857 ----------------------------------------------------------------------------------------------------
2023-10-11 09:04:44,858 EPOCH 5 done: loss 0.0358 - lr: 0.000083
2023-10-11 09:05:05,510 DEV : loss 0.14709879457950592 - f1-score (micro avg) 0.804
2023-10-11 09:05:05,544 saving best model
2023-10-11 09:05:08,168 ----------------------------------------------------------------------------------------------------
2023-10-11 09:05:58,891 epoch 6 - iter 89/893 - loss 0.03395492 - time (sec): 50.72 - samples/sec: 512.87 - lr: 0.000082 - momentum: 0.000000
2023-10-11 09:06:47,795 epoch 6 - iter 178/893 - loss 0.03071508 - time (sec): 99.62 - samples/sec: 498.26 - lr: 0.000080 - momentum: 0.000000
2023-10-11 09:07:36,756 epoch 6 - iter 267/893 - loss 0.03123902 - time (sec): 148.58 - samples/sec: 495.75 - lr: 0.000078 - momentum: 0.000000
2023-10-11 09:08:26,624 epoch 6 - iter 356/893 - loss 0.02913751 - time (sec): 198.45 - samples/sec: 499.99 - lr: 0.000077 - momentum: 0.000000
2023-10-11 09:09:16,587 epoch 6 - iter 445/893 - loss 0.02833281 - time (sec): 248.41 - samples/sec: 498.09 - lr: 0.000075 - momentum: 0.000000
2023-10-11 09:10:05,096 epoch 6 - iter 534/893 - loss 0.02808492 - time (sec): 296.92 - samples/sec: 495.27 - lr: 0.000073 - momentum: 0.000000
2023-10-11 09:10:55,678 epoch 6 - iter 623/893 - loss 0.02776145 - time (sec): 347.51 - samples/sec: 494.44 - lr: 0.000072 - momentum: 0.000000
2023-10-11 09:11:49,851 epoch 6 - iter 712/893 - loss 0.02816525 - time (sec): 401.68 - samples/sec: 493.31 - lr: 0.000070 - momentum: 0.000000
2023-10-11 09:12:40,879 epoch 6 - iter 801/893 - loss 0.02763361 - time (sec): 452.71 - samples/sec: 493.02 - lr: 0.000068 - momentum: 0.000000
2023-10-11 09:13:35,189 epoch 6 - iter 890/893 - loss 0.02744010 - time (sec): 507.02 - samples/sec: 489.27 - lr: 0.000067 - momentum: 0.000000
2023-10-11 09:13:36,841 ----------------------------------------------------------------------------------------------------
2023-10-11 09:13:36,842 EPOCH 6 done: loss 0.0274 - lr: 0.000067
2023-10-11 09:13:57,966 DEV : loss 0.17321458458900452 - f1-score (micro avg) 0.7967
2023-10-11 09:13:57,997 ----------------------------------------------------------------------------------------------------
2023-10-11 09:14:50,694 epoch 7 - iter 89/893 - loss 0.02413897 - time (sec): 52.70 - samples/sec: 465.20 - lr: 0.000065 - momentum: 0.000000
2023-10-11 09:15:39,245 epoch 7 - iter 178/893 - loss 0.02544808 - time (sec): 101.25 - samples/sec: 471.96 - lr: 0.000063 - momentum: 0.000000
2023-10-11 09:16:30,729 epoch 7 - iter 267/893 - loss 0.02381130 - time (sec): 152.73 - samples/sec: 478.37 - lr: 0.000062 - momentum: 0.000000
2023-10-11 09:17:19,133 epoch 7 - iter 356/893 - loss 0.02288685 - time (sec): 201.13 - samples/sec: 483.04 - lr: 0.000060 - momentum: 0.000000
2023-10-11 09:18:07,867 epoch 7 - iter 445/893 - loss 0.02431887 - time (sec): 249.87 - samples/sec: 489.65 - lr: 0.000058 - momentum: 0.000000
2023-10-11 09:18:57,693 epoch 7 - iter 534/893 - loss 0.02364863 - time (sec): 299.69 - samples/sec: 493.70 - lr: 0.000057 - momentum: 0.000000
2023-10-11 09:19:47,195 epoch 7 - iter 623/893 - loss 0.02293135 - time (sec): 349.20 - samples/sec: 495.50 - lr: 0.000055 - momentum: 0.000000
2023-10-11 09:20:37,649 epoch 7 - iter 712/893 - loss 0.02271678 - time (sec): 399.65 - samples/sec: 495.74 - lr: 0.000053 - momentum: 0.000000
2023-10-11 09:21:28,122 epoch 7 - iter 801/893 - loss 0.02242817 - time (sec): 450.12 - samples/sec: 496.00 - lr: 0.000052 - momentum: 0.000000
2023-10-11 09:22:19,499 epoch 7 - iter 890/893 - loss 0.02186383 - time (sec): 501.50 - samples/sec: 494.85 - lr: 0.000050 - momentum: 0.000000
2023-10-11 09:22:20,940 ----------------------------------------------------------------------------------------------------
2023-10-11 09:22:20,940 EPOCH 7 done: loss 0.0219 - lr: 0.000050
2023-10-11 09:22:43,053 DEV : loss 0.18447040021419525 - f1-score (micro avg) 0.8054
2023-10-11 09:22:43,084 saving best model
2023-10-11 09:22:45,710 ----------------------------------------------------------------------------------------------------
2023-10-11 09:23:35,757 epoch 8 - iter 89/893 - loss 0.02030536 - time (sec): 50.04 - samples/sec: 493.50 - lr: 0.000048 - momentum: 0.000000
2023-10-11 09:24:24,663 epoch 8 - iter 178/893 - loss 0.01795230 - time (sec): 98.95 - samples/sec: 499.63 - lr: 0.000047 - momentum: 0.000000
2023-10-11 09:25:16,205 epoch 8 - iter 267/893 - loss 0.01573223 - time (sec): 150.49 - samples/sec: 483.51 - lr: 0.000045 - momentum: 0.000000
2023-10-11 09:26:04,490 epoch 8 - iter 356/893 - loss 0.01501584 - time (sec): 198.78 - samples/sec: 484.02 - lr: 0.000043 - momentum: 0.000000
2023-10-11 09:26:56,116 epoch 8 - iter 445/893 - loss 0.01684699 - time (sec): 250.40 - samples/sec: 479.79 - lr: 0.000042 - momentum: 0.000000
2023-10-11 09:27:50,434 epoch 8 - iter 534/893 - loss 0.01897729 - time (sec): 304.72 - samples/sec: 480.91 - lr: 0.000040 - momentum: 0.000000
2023-10-11 09:28:41,741 epoch 8 - iter 623/893 - loss 0.01864153 - time (sec): 356.03 - samples/sec: 484.28 - lr: 0.000038 - momentum: 0.000000
2023-10-11 09:29:34,049 epoch 8 - iter 712/893 - loss 0.01835066 - time (sec): 408.33 - samples/sec: 487.24 - lr: 0.000037 - momentum: 0.000000
2023-10-11 09:30:25,154 epoch 8 - iter 801/893 - loss 0.01881720 - time (sec): 459.44 - samples/sec: 488.51 - lr: 0.000035 - momentum: 0.000000
2023-10-11 09:31:13,819 epoch 8 - iter 890/893 - loss 0.01822911 - time (sec): 508.10 - samples/sec: 488.28 - lr: 0.000033 - momentum: 0.000000
2023-10-11 09:31:15,385 ----------------------------------------------------------------------------------------------------
2023-10-11 09:31:15,385 EPOCH 8 done: loss 0.0182 - lr: 0.000033
2023-10-11 09:31:37,483 DEV : loss 0.1934366077184677 - f1-score (micro avg) 0.8032
2023-10-11 09:31:37,514 ----------------------------------------------------------------------------------------------------
2023-10-11 09:32:26,805 epoch 9 - iter 89/893 - loss 0.01629395 - time (sec): 49.29 - samples/sec: 483.90 - lr: 0.000032 - momentum: 0.000000
2023-10-11 09:33:20,446 epoch 9 - iter 178/893 - loss 0.01330756 - time (sec): 102.93 - samples/sec: 454.04 - lr: 0.000030 - momentum: 0.000000
2023-10-11 09:34:09,567 epoch 9 - iter 267/893 - loss 0.01423309 - time (sec): 152.05 - samples/sec: 452.49 - lr: 0.000028 - momentum: 0.000000
2023-10-11 09:35:01,878 epoch 9 - iter 356/893 - loss 0.01350426 - time (sec): 204.36 - samples/sec: 464.37 - lr: 0.000027 - momentum: 0.000000
2023-10-11 09:35:53,100 epoch 9 - iter 445/893 - loss 0.01413719 - time (sec): 255.58 - samples/sec: 470.56 - lr: 0.000025 - momentum: 0.000000
2023-10-11 09:36:44,141 epoch 9 - iter 534/893 - loss 0.01441919 - time (sec): 306.62 - samples/sec: 476.65 - lr: 0.000023 - momentum: 0.000000
2023-10-11 09:37:40,428 epoch 9 - iter 623/893 - loss 0.01462155 - time (sec): 362.91 - samples/sec: 475.63 - lr: 0.000022 - momentum: 0.000000
2023-10-11 09:38:36,494 epoch 9 - iter 712/893 - loss 0.01443591 - time (sec): 418.98 - samples/sec: 474.11 - lr: 0.000020 - momentum: 0.000000
2023-10-11 09:39:32,640 epoch 9 - iter 801/893 - loss 0.01436811 - time (sec): 475.12 - samples/sec: 470.51 - lr: 0.000019 - momentum: 0.000000
2023-10-11 09:40:23,804 epoch 9 - iter 890/893 - loss 0.01444067 - time (sec): 526.29 - samples/sec: 470.61 - lr: 0.000017 - momentum: 0.000000
2023-10-11 09:40:25,601 ----------------------------------------------------------------------------------------------------
2023-10-11 09:40:25,602 EPOCH 9 done: loss 0.0144 - lr: 0.000017
2023-10-11 09:40:48,079 DEV : loss 0.19690608978271484 - f1-score (micro avg) 0.8067
2023-10-11 09:40:48,111 saving best model
2023-10-11 09:40:50,764 ----------------------------------------------------------------------------------------------------
2023-10-11 09:41:44,710 epoch 10 - iter 89/893 - loss 0.01266720 - time (sec): 53.94 - samples/sec: 462.49 - lr: 0.000015 - momentum: 0.000000
2023-10-11 09:42:42,606 epoch 10 - iter 178/893 - loss 0.01298725 - time (sec): 111.84 - samples/sec: 430.16 - lr: 0.000013 - momentum: 0.000000
2023-10-11 09:43:33,690 epoch 10 - iter 267/893 - loss 0.01235157 - time (sec): 162.92 - samples/sec: 446.58 - lr: 0.000012 - momentum: 0.000000
2023-10-11 09:44:23,850 epoch 10 - iter 356/893 - loss 0.01184256 - time (sec): 213.08 - samples/sec: 461.41 - lr: 0.000010 - momentum: 0.000000
2023-10-11 09:45:15,104 epoch 10 - iter 445/893 - loss 0.01173342 - time (sec): 264.34 - samples/sec: 469.89 - lr: 0.000008 - momentum: 0.000000
2023-10-11 09:46:06,317 epoch 10 - iter 534/893 - loss 0.01162859 - time (sec): 315.55 - samples/sec: 471.23 - lr: 0.000007 - momentum: 0.000000
2023-10-11 09:46:58,795 epoch 10 - iter 623/893 - loss 0.01135377 - time (sec): 368.03 - samples/sec: 469.24 - lr: 0.000005 - momentum: 0.000000
2023-10-11 09:47:49,365 epoch 10 - iter 712/893 - loss 0.01098390 - time (sec): 418.60 - samples/sec: 472.05 - lr: 0.000004 - momentum: 0.000000
2023-10-11 09:48:39,705 epoch 10 - iter 801/893 - loss 0.01075236 - time (sec): 468.94 - samples/sec: 474.42 - lr: 0.000002 - momentum: 0.000000
2023-10-11 09:49:30,954 epoch 10 - iter 890/893 - loss 0.01084378 - time (sec): 520.19 - samples/sec: 477.22 - lr: 0.000000 - momentum: 0.000000
2023-10-11 09:49:32,283 ----------------------------------------------------------------------------------------------------
2023-10-11 09:49:32,284 EPOCH 10 done: loss 0.0108 - lr: 0.000000
2023-10-11 09:49:53,633 DEV : loss 0.20006020367145538 - f1-score (micro avg) 0.8024
2023-10-11 09:49:54,579 ----------------------------------------------------------------------------------------------------
2023-10-11 09:49:54,581 Loading model from best epoch ...
2023-10-11 09:49:59,266 SequenceTagger predicts: Dictionary with 17 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
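The 17 tags are the four NewsEye entity types (PER, LOC, ORG, HumanProd) in BIOES encoding plus the shared O tag. A purely illustrative decoding example (hypothetical tokens, not drawn from the corpus):

# 4 entity types x 4 positional prefixes (B-, I-, E-, S-) + "O" = 17 tags.
tokens = ["Jean", "Moulin", "visita", "Paris", "."]
tags   = ["B-PER", "E-PER", "O",      "S-LOC", "O"]
# B- opens a multi-token span and E- closes it -> PER "Jean Moulin";
# S- marks a single-token span                 -> LOC "Paris".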
2023-10-11 09:51:09,198
Results:
- F-score (micro) 0.7079
- F-score (macro) 0.6296
- Accuracy 0.564
By class:
              precision    recall  f1-score   support

         LOC     0.7382    0.7288    0.7335      1095
         PER     0.7646    0.7737    0.7692      1012
         ORG     0.4451    0.6134    0.5159       357
   HumanProd     0.4118    0.6364    0.5000        33

   micro avg     0.6877    0.7293    0.7079      2497
   macro avg     0.5899    0.6881    0.6296      2497
weighted avg     0.7027    0.7293    0.7137      2497
2023-10-11 09:51:09,198 ----------------------------------------------------------------------------------------------------
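For completeness, a hedged usage sketch for the saved checkpoint: it loads the best-model.pt referenced above from the logged base path and tags one hypothetical sentence.

from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load(
    "hmbench-newseye/fr-hmbyt5-preliminary/"
    "byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-"
    "poolingfirst-layers-1-crfFalse-3/best-model.pt"
)

sentence = Sentence("M. Jules Ferry est arrivé à Paris hier soir .")
tagger.predict(sentence)

for span in sentence.get_spans("ner"):
    print(span)  # predicted PER / LOC / ORG / HumanProd spans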