2023-10-11 08:17:53,170 ----------------------------------------------------------------------------------------------------
2023-10-11 08:17:53,172 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-11 08:17:53,172 ----------------------------------------------------------------------------------------------------
2023-10-11 08:17:53,173 MultiCorpus: 7142 train + 698 dev + 2570 test sentences
 - NER_HIPE_2022 Corpus: 7142 train + 698 dev + 2570 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fr/with_doc_seperator
2023-10-11 08:17:53,173 ----------------------------------------------------------------------------------------------------
2023-10-11 08:17:53,173 Train: 7142 sentences
2023-10-11 08:17:53,173 (train_with_dev=False, train_with_test=False)
2023-10-11 08:17:53,173 ----------------------------------------------------------------------------------------------------
2023-10-11 08:17:53,173 Training Params:
2023-10-11 08:17:53,173 - learning_rate: "0.00015"
2023-10-11 08:17:53,173 - mini_batch_size: "8"
2023-10-11 08:17:53,173 - max_epochs: "10"
2023-10-11 08:17:53,173 - shuffle: "True"
2023-10-11 08:17:53,173 ----------------------------------------------------------------------------------------------------
2023-10-11 08:17:53,173 Plugins:
2023-10-11 08:17:53,173 - TensorboardLogger
2023-10-11 08:17:53,173 - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 08:17:53,173 ----------------------------------------------------------------------------------------------------
2023-10-11 08:17:53,174 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 08:17:53,174 - metric: "('micro avg', 'f1-score')"
2023-10-11 08:17:53,174 ----------------------------------------------------------------------------------------------------
2023-10-11 08:17:53,174 Computation:
2023-10-11 08:17:53,174 - compute on device: cuda:0
2023-10-11 08:17:53,174 - embedding storage: none
2023-10-11 08:17:53,174 ----------------------------------------------------------------------------------------------------
2023-10-11 08:17:53,174 Model training base path: "hmbench-newseye/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-3"
2023-10-11 08:17:53,174 ----------------------------------------------------------------------------------------------------
2023-10-11 08:17:53,174 ----------------------------------------------------------------------------------------------------
2023-10-11 08:17:53,174 Logging anything other than scalars to TensorBoard is currently not supported.
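The parameters above (learning rate 0.00015, mini-batch size 8, 10 epochs, linear schedule with 10% warmup, best model selected on dev micro-F1) correspond to a standard Flair fine-tuning run. A minimal sketch, assuming the `tagger` and `corpus` from the earlier snippet and keeping only arguments that are clearly reflected in this log (plugin wiring and exact keyword names vary by Flair version), might look like:

```python
from flair.trainers import ModelTrainer

trainer = ModelTrainer(tagger, corpus)

# fine_tune() uses AdamW with a linear warmup/decay schedule; a 10% warmup
# corresponds to the "LinearScheduler | warmup_fraction: '0.1'" plugin line above.
trainer.fine_tune(
    "hmbench-newseye/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-3",
    learning_rate=0.00015,
    mini_batch_size=8,
    max_epochs=10,
)
```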
2023-10-11 08:18:49,180 epoch 1 - iter 89/893 - loss 2.82046970 - time (sec): 56.00 - samples/sec: 480.80 - lr: 0.000015 - momentum: 0.000000
2023-10-11 08:19:47,163 epoch 1 - iter 178/893 - loss 2.74859222 - time (sec): 113.99 - samples/sec: 444.71 - lr: 0.000030 - momentum: 0.000000
2023-10-11 08:20:38,003 epoch 1 - iter 267/893 - loss 2.56466690 - time (sec): 164.83 - samples/sec: 452.08 - lr: 0.000045 - momentum: 0.000000
2023-10-11 08:21:26,826 epoch 1 - iter 356/893 - loss 2.35440028 - time (sec): 213.65 - samples/sec: 462.13 - lr: 0.000060 - momentum: 0.000000
2023-10-11 08:22:19,664 epoch 1 - iter 445/893 - loss 2.12091695 - time (sec): 266.49 - samples/sec: 468.58 - lr: 0.000075 - momentum: 0.000000
2023-10-11 08:23:15,478 epoch 1 - iter 534/893 - loss 1.90749191 - time (sec): 322.30 - samples/sec: 461.56 - lr: 0.000090 - momentum: 0.000000
2023-10-11 08:24:10,556 epoch 1 - iter 623/893 - loss 1.71575701 - time (sec): 377.38 - samples/sec: 462.06 - lr: 0.000104 - momentum: 0.000000
2023-10-11 08:25:03,879 epoch 1 - iter 712/893 - loss 1.56436100 - time (sec): 430.70 - samples/sec: 461.70 - lr: 0.000119 - momentum: 0.000000
2023-10-11 08:25:55,832 epoch 1 - iter 801/893 - loss 1.43013467 - time (sec): 482.66 - samples/sec: 464.19 - lr: 0.000134 - momentum: 0.000000
2023-10-11 08:26:50,261 epoch 1 - iter 890/893 - loss 1.32696860 - time (sec): 537.08 - samples/sec: 461.63 - lr: 0.000149 - momentum: 0.000000
2023-10-11 08:26:52,072 ----------------------------------------------------------------------------------------------------
2023-10-11 08:26:52,072 EPOCH 1 done: loss 1.3236 - lr: 0.000149
2023-10-11 08:27:16,454 DEV : loss 0.23555637896060944 - f1-score (micro avg) 0.524
2023-10-11 08:27:16,494 saving best model
2023-10-11 08:27:17,659 ----------------------------------------------------------------------------------------------------
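The lr column traces the linear schedule with warmup_fraction 0.1: with 893 batches per epoch over 10 epochs there are 8,930 steps, so warmup covers the first 893 steps (exactly epoch 1, peaking near 0.000149), after which the rate decays linearly to 0 by the end of epoch 10. A small sketch of that schedule (the same arithmetic, not Flair's actual scheduler code) reproduces the logged values:

```python
def linear_warmup_decay(step: int, total_steps: int = 8930,
                        warmup_fraction: float = 0.1, peak_lr: float = 0.00015) -> float:
    """Linear warmup to peak_lr, then linear decay to 0 (same shape as the logged lr column)."""
    warmup_steps = int(total_steps * warmup_fraction)  # 893 steps = epoch 1
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

print(round(linear_warmup_decay(89), 6))       # 0.000015 -> matches "iter 89/893" above
print(round(linear_warmup_decay(890), 6))      # 0.000149 -> matches "iter 890/893" above
print(round(linear_warmup_decay(2 * 893), 6))  # 0.000133 -> matches the end of epoch 2 below
```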
2023-10-11 08:28:20,694 epoch 2 - iter 89/893 - loss 0.24308848 - time (sec): 63.03 - samples/sec: 416.25 - lr: 0.000148 - momentum: 0.000000
2023-10-11 08:29:23,466 epoch 2 - iter 178/893 - loss 0.24144103 - time (sec): 125.80 - samples/sec: 409.21 - lr: 0.000147 - momentum: 0.000000
2023-10-11 08:30:18,170 epoch 2 - iter 267/893 - loss 0.22698655 - time (sec): 180.51 - samples/sec: 417.46 - lr: 0.000145 - momentum: 0.000000
2023-10-11 08:31:13,243 epoch 2 - iter 356/893 - loss 0.20654434 - time (sec): 235.58 - samples/sec: 427.68 - lr: 0.000143 - momentum: 0.000000
2023-10-11 08:32:08,723 epoch 2 - iter 445/893 - loss 0.19778611 - time (sec): 291.06 - samples/sec: 427.49 - lr: 0.000142 - momentum: 0.000000
2023-10-11 08:33:05,539 epoch 2 - iter 534/893 - loss 0.18786454 - time (sec): 347.88 - samples/sec: 431.84 - lr: 0.000140 - momentum: 0.000000
2023-10-11 08:33:57,472 epoch 2 - iter 623/893 - loss 0.18152301 - time (sec): 399.81 - samples/sec: 436.88 - lr: 0.000138 - momentum: 0.000000
2023-10-11 08:34:48,759 epoch 2 - iter 712/893 - loss 0.17442350 - time (sec): 451.10 - samples/sec: 438.85 - lr: 0.000137 - momentum: 0.000000
2023-10-11 08:35:41,408 epoch 2 - iter 801/893 - loss 0.16821741 - time (sec): 503.75 - samples/sec: 441.76 - lr: 0.000135 - momentum: 0.000000
2023-10-11 08:36:33,619 epoch 2 - iter 890/893 - loss 0.16223406 - time (sec): 555.96 - samples/sec: 445.94 - lr: 0.000133 - momentum: 0.000000
2023-10-11 08:36:35,297 ----------------------------------------------------------------------------------------------------
2023-10-11 08:36:35,297 EPOCH 2 done: loss 0.1619 - lr: 0.000133
2023-10-11 08:36:58,542 DEV : loss 0.09554639458656311 - f1-score (micro avg) 0.7648
2023-10-11 08:36:58,576 saving best model
2023-10-11 08:37:01,229 ----------------------------------------------------------------------------------------------------
2023-10-11 08:37:54,732 epoch 3 - iter 89/893 - loss 0.06890555 - time (sec): 53.50 - samples/sec: 445.74 - lr: 0.000132 - momentum: 0.000000
2023-10-11 08:38:46,142 epoch 3 - iter 178/893 - loss 0.06908598 - time (sec): 104.91 - samples/sec: 466.15 - lr: 0.000130 - momentum: 0.000000
2023-10-11 08:39:38,869 epoch 3 - iter 267/893 - loss 0.06842913 - time (sec): 157.63 - samples/sec: 464.28 - lr: 0.000128 - momentum: 0.000000
2023-10-11 08:40:30,914 epoch 3 - iter 356/893 - loss 0.07036545 - time (sec): 209.68 - samples/sec: 468.26 - lr: 0.000127 - momentum: 0.000000
2023-10-11 08:41:24,718 epoch 3 - iter 445/893 - loss 0.07189188 - time (sec): 263.48 - samples/sec: 466.37 - lr: 0.000125 - momentum: 0.000000
2023-10-11 08:42:18,632 epoch 3 - iter 534/893 - loss 0.07394553 - time (sec): 317.40 - samples/sec: 464.75 - lr: 0.000123 - momentum: 0.000000
2023-10-11 08:43:12,718 epoch 3 - iter 623/893 - loss 0.07564987 - time (sec): 371.48 - samples/sec: 468.82 - lr: 0.000122 - momentum: 0.000000
2023-10-11 08:44:05,682 epoch 3 - iter 712/893 - loss 0.07431156 - time (sec): 424.45 - samples/sec: 466.68 - lr: 0.000120 - momentum: 0.000000
2023-10-11 08:44:58,592 epoch 3 - iter 801/893 - loss 0.07235690 - time (sec): 477.36 - samples/sec: 467.64 - lr: 0.000118 - momentum: 0.000000
2023-10-11 08:45:49,069 epoch 3 - iter 890/893 - loss 0.07242906 - time (sec): 527.83 - samples/sec: 470.12 - lr: 0.000117 - momentum: 0.000000
2023-10-11 08:45:50,521 ----------------------------------------------------------------------------------------------------
2023-10-11 08:45:50,521 EPOCH 3 done: loss 0.0726 - lr: 0.000117
2023-10-11 08:46:12,903 DEV : loss 0.09995921701192856 - f1-score (micro avg) 0.7951
2023-10-11 08:46:12,940 saving best model
2023-10-11 08:46:15,623 ----------------------------------------------------------------------------------------------------
2023-10-11 08:47:06,741 epoch 4 - iter 89/893 - loss 0.04054614 - time (sec): 51.11 - samples/sec: 470.05 - lr: 0.000115 - momentum: 0.000000
2023-10-11 08:48:03,732 epoch 4 - iter 178/893 - loss 0.04820537 - time (sec): 108.10 - samples/sec: 453.43 - lr: 0.000113 - momentum: 0.000000
2023-10-11 08:49:02,459 epoch 4 - iter 267/893 - loss 0.04726065 - time (sec): 166.83 - samples/sec: 453.20 - lr: 0.000112 - momentum: 0.000000
2023-10-11 08:49:53,635 epoch 4 - iter 356/893 - loss 0.04746388 - time (sec): 218.01 - samples/sec: 459.58 - lr: 0.000110 - momentum: 0.000000
2023-10-11 08:50:46,643 epoch 4 - iter 445/893 - loss 0.04847189 - time (sec): 271.02 - samples/sec: 464.09 - lr: 0.000108 - momentum: 0.000000
2023-10-11 08:51:37,168 epoch 4 - iter 534/893 - loss 0.04943906 - time (sec): 321.54 - samples/sec: 462.24 - lr: 0.000107 - momentum: 0.000000
2023-10-11 08:52:32,719 epoch 4 - iter 623/893 - loss 0.04991779 - time (sec): 377.09 - samples/sec: 460.72 - lr: 0.000105 - momentum: 0.000000
2023-10-11 08:53:25,238 epoch 4 - iter 712/893 - loss 0.04993927 - time (sec): 429.61 - samples/sec: 460.77 - lr: 0.000103 - momentum: 0.000000
2023-10-11 08:54:18,323 epoch 4 - iter 801/893 - loss 0.04923222 - time (sec): 482.70 - samples/sec: 462.17 - lr: 0.000102 - momentum: 0.000000
2023-10-11 08:55:10,626 epoch 4 - iter 890/893 - loss 0.04836037 - time (sec): 535.00 - samples/sec: 463.56 - lr: 0.000100 - momentum: 0.000000
2023-10-11 08:55:12,296 ----------------------------------------------------------------------------------------------------
2023-10-11 08:55:12,297 EPOCH 4 done: loss 0.0483 - lr: 0.000100
2023-10-11 08:55:38,738 DEV : loss 0.1315266638994217 - f1-score (micro avg) 0.7959
2023-10-11 08:55:38,776 saving best model
2023-10-11 08:55:41,441 ----------------------------------------------------------------------------------------------------
2023-10-11 08:56:32,049 epoch 5 - iter 89/893 - loss 0.03387070 - time (sec): 50.60 - samples/sec: 496.32 - lr: 0.000098 - momentum: 0.000000
2023-10-11 08:57:25,784 epoch 5 - iter 178/893 - loss 0.03724058 - time (sec): 104.34 - samples/sec: 484.93 - lr: 0.000097 - momentum: 0.000000
2023-10-11 08:58:17,751 epoch 5 - iter 267/893 - loss 0.03458615 - time (sec): 156.31 - samples/sec: 483.86 - lr: 0.000095 - momentum: 0.000000
2023-10-11 08:59:12,353 epoch 5 - iter 356/893 - loss 0.03459710 - time (sec): 210.91 - samples/sec: 472.89 - lr: 0.000093 - momentum: 0.000000
2023-10-11 09:00:09,877 epoch 5 - iter 445/893 - loss 0.03385511 - time (sec): 268.43 - samples/sec: 460.47 - lr: 0.000092 - momentum: 0.000000
2023-10-11 09:01:06,978 epoch 5 - iter 534/893 - loss 0.03402281 - time (sec): 325.53 - samples/sec: 453.46 - lr: 0.000090 - momentum: 0.000000
2023-10-11 09:02:01,657 epoch 5 - iter 623/893 - loss 0.03373238 - time (sec): 380.21 - samples/sec: 454.86 - lr: 0.000088 - momentum: 0.000000
2023-10-11 09:02:55,698 epoch 5 - iter 712/893 - loss 0.03440469 - time (sec): 434.25 - samples/sec: 454.50 - lr: 0.000087 - momentum: 0.000000
2023-10-11 09:03:47,467 epoch 5 - iter 801/893 - loss 0.03456330 - time (sec): 486.02 - samples/sec: 457.40 - lr: 0.000085 - momentum: 0.000000
2023-10-11 09:04:43,386 epoch 5 - iter 890/893 - loss 0.03572654 - time (sec): 541.94 - samples/sec: 457.73 - lr: 0.000083 - momentum: 0.000000
2023-10-11 09:04:44,857 ----------------------------------------------------------------------------------------------------
2023-10-11 09:04:44,858 EPOCH 5 done: loss 0.0358 - lr: 0.000083
2023-10-11 09:05:05,510 DEV : loss 0.14709879457950592 - f1-score (micro avg) 0.804
2023-10-11 09:05:05,544 saving best model
2023-10-11 09:05:08,168 ----------------------------------------------------------------------------------------------------
2023-10-11 09:05:58,891 epoch 6 - iter 89/893 - loss 0.03395492 - time (sec): 50.72 - samples/sec: 512.87 - lr: 0.000082 - momentum: 0.000000
2023-10-11 09:06:47,795 epoch 6 - iter 178/893 - loss 0.03071508 - time (sec): 99.62 - samples/sec: 498.26 - lr: 0.000080 - momentum: 0.000000
2023-10-11 09:07:36,756 epoch 6 - iter 267/893 - loss 0.03123902 - time (sec): 148.58 - samples/sec: 495.75 - lr: 0.000078 - momentum: 0.000000
2023-10-11 09:08:26,624 epoch 6 - iter 356/893 - loss 0.02913751 - time (sec): 198.45 - samples/sec: 499.99 - lr: 0.000077 - momentum: 0.000000
2023-10-11 09:09:16,587 epoch 6 - iter 445/893 - loss 0.02833281 - time (sec): 248.41 - samples/sec: 498.09 - lr: 0.000075 - momentum: 0.000000
2023-10-11 09:10:05,096 epoch 6 - iter 534/893 - loss 0.02808492 - time (sec): 296.92 - samples/sec: 495.27 - lr: 0.000073 - momentum: 0.000000
2023-10-11 09:10:55,678 epoch 6 - iter 623/893 - loss 0.02776145 - time (sec): 347.51 - samples/sec: 494.44 - lr: 0.000072 - momentum: 0.000000
2023-10-11 09:11:49,851 epoch 6 - iter 712/893 - loss 0.02816525 - time (sec): 401.68 - samples/sec: 493.31 - lr: 0.000070 - momentum: 0.000000
2023-10-11 09:12:40,879 epoch 6 - iter 801/893 - loss 0.02763361 - time (sec): 452.71 - samples/sec: 493.02 - lr: 0.000068 - momentum: 0.000000
2023-10-11 09:13:35,189 epoch 6 - iter 890/893 - loss 0.02744010 - time (sec): 507.02 - samples/sec: 489.27 - lr: 0.000067 - momentum: 0.000000
2023-10-11 09:13:36,841 ----------------------------------------------------------------------------------------------------
2023-10-11 09:13:36,842 EPOCH 6 done: loss 0.0274 - lr: 0.000067
2023-10-11 09:13:57,966 DEV : loss 0.17321458458900452 - f1-score (micro avg) 0.7967
2023-10-11 09:13:57,997 ----------------------------------------------------------------------------------------------------
2023-10-11 09:14:50,694 epoch 7 - iter 89/893 - loss 0.02413897 - time (sec): 52.70 - samples/sec: 465.20 - lr: 0.000065 - momentum: 0.000000
2023-10-11 09:15:39,245 epoch 7 - iter 178/893 - loss 0.02544808 - time (sec): 101.25 - samples/sec: 471.96 - lr: 0.000063 - momentum: 0.000000
2023-10-11 09:16:30,729 epoch 7 - iter 267/893 - loss 0.02381130 - time (sec): 152.73 - samples/sec: 478.37 - lr: 0.000062 - momentum: 0.000000
2023-10-11 09:17:19,133 epoch 7 - iter 356/893 - loss 0.02288685 - time (sec): 201.13 - samples/sec: 483.04 - lr: 0.000060 - momentum: 0.000000
2023-10-11 09:18:07,867 epoch 7 - iter 445/893 - loss 0.02431887 - time (sec): 249.87 - samples/sec: 489.65 - lr: 0.000058 - momentum: 0.000000
2023-10-11 09:18:57,693 epoch 7 - iter 534/893 - loss 0.02364863 - time (sec): 299.69 - samples/sec: 493.70 - lr: 0.000057 - momentum: 0.000000
2023-10-11 09:19:47,195 epoch 7 - iter 623/893 - loss 0.02293135 - time (sec): 349.20 - samples/sec: 495.50 - lr: 0.000055 - momentum: 0.000000
2023-10-11 09:20:37,649 epoch 7 - iter 712/893 - loss 0.02271678 - time (sec): 399.65 - samples/sec: 495.74 - lr: 0.000053 - momentum: 0.000000
2023-10-11 09:21:28,122 epoch 7 - iter 801/893 - loss 0.02242817 - time (sec): 450.12 - samples/sec: 496.00 - lr: 0.000052 - momentum: 0.000000
2023-10-11 09:22:19,499 epoch 7 - iter 890/893 - loss 0.02186383 - time (sec): 501.50 - samples/sec: 494.85 - lr: 0.000050 - momentum: 0.000000
2023-10-11 09:22:20,940 ----------------------------------------------------------------------------------------------------
2023-10-11 09:22:20,940 EPOCH 7 done: loss 0.0219 - lr: 0.000050
2023-10-11 09:22:43,053 DEV : loss 0.18447040021419525 - f1-score (micro avg) 0.8054
2023-10-11 09:22:43,084 saving best model
2023-10-11 09:22:45,710 ----------------------------------------------------------------------------------------------------
2023-10-11 09:23:35,757 epoch 8 - iter 89/893 - loss 0.02030536 - time (sec): 50.04 - samples/sec: 493.50 - lr: 0.000048 - momentum: 0.000000
2023-10-11 09:24:24,663 epoch 8 - iter 178/893 - loss 0.01795230 - time (sec): 98.95 - samples/sec: 499.63 - lr: 0.000047 - momentum: 0.000000
2023-10-11 09:25:16,205 epoch 8 - iter 267/893 - loss 0.01573223 - time (sec): 150.49 - samples/sec: 483.51 - lr: 0.000045 - momentum: 0.000000
2023-10-11 09:26:04,490 epoch 8 - iter 356/893 - loss 0.01501584 - time (sec): 198.78 - samples/sec: 484.02 - lr: 0.000043 - momentum: 0.000000
2023-10-11 09:26:56,116 epoch 8 - iter 445/893 - loss 0.01684699 - time (sec): 250.40 - samples/sec: 479.79 - lr: 0.000042 - momentum: 0.000000
2023-10-11 09:27:50,434 epoch 8 - iter 534/893 - loss 0.01897729 - time (sec): 304.72 - samples/sec: 480.91 - lr: 0.000040 - momentum: 0.000000
2023-10-11 09:28:41,741 epoch 8 - iter 623/893 - loss 0.01864153 - time (sec): 356.03 - samples/sec: 484.28 - lr: 0.000038 - momentum: 0.000000
2023-10-11 09:29:34,049 epoch 8 - iter 712/893 - loss 0.01835066 - time (sec): 408.33 - samples/sec: 487.24 - lr: 0.000037 - momentum: 0.000000
2023-10-11 09:30:25,154 epoch 8 - iter 801/893 - loss 0.01881720 - time (sec): 459.44 - samples/sec: 488.51 - lr: 0.000035 - momentum: 0.000000
2023-10-11 09:31:13,819 epoch 8 - iter 890/893 - loss 0.01822911 - time (sec): 508.10 - samples/sec: 488.28 - lr: 0.000033 - momentum: 0.000000
2023-10-11 09:31:15,385 ----------------------------------------------------------------------------------------------------
2023-10-11 09:31:15,385 EPOCH 8 done: loss 0.0182 - lr: 0.000033
2023-10-11 09:31:37,483 DEV : loss 0.1934366077184677 - f1-score (micro avg) 0.8032
2023-10-11 09:31:37,514 ----------------------------------------------------------------------------------------------------
2023-10-11 09:32:26,805 epoch 9 - iter 89/893 - loss 0.01629395 - time (sec): 49.29 - samples/sec: 483.90 - lr: 0.000032 - momentum: 0.000000
2023-10-11 09:33:20,446 epoch 9 - iter 178/893 - loss 0.01330756 - time (sec): 102.93 - samples/sec: 454.04 - lr: 0.000030 - momentum: 0.000000
2023-10-11 09:34:09,567 epoch 9 - iter 267/893 - loss 0.01423309 - time (sec): 152.05 - samples/sec: 452.49 - lr: 0.000028 - momentum: 0.000000
2023-10-11 09:35:01,878 epoch 9 - iter 356/893 - loss 0.01350426 - time (sec): 204.36 - samples/sec: 464.37 - lr: 0.000027 - momentum: 0.000000
2023-10-11 09:35:53,100 epoch 9 - iter 445/893 - loss 0.01413719 - time (sec): 255.58 - samples/sec: 470.56 - lr: 0.000025 - momentum: 0.000000
2023-10-11 09:36:44,141 epoch 9 - iter 534/893 - loss 0.01441919 - time (sec): 306.62 - samples/sec: 476.65 - lr: 0.000023 - momentum: 0.000000
2023-10-11 09:37:40,428 epoch 9 - iter 623/893 - loss 0.01462155 - time (sec): 362.91 - samples/sec: 475.63 - lr: 0.000022 - momentum: 0.000000
2023-10-11 09:38:36,494 epoch 9 - iter 712/893 - loss 0.01443591 - time (sec): 418.98 - samples/sec: 474.11 - lr: 0.000020 - momentum: 0.000000
2023-10-11 09:39:32,640 epoch 9 - iter 801/893 - loss 0.01436811 - time (sec): 475.12 - samples/sec: 470.51 - lr: 0.000019 - momentum: 0.000000
2023-10-11 09:40:23,804 epoch 9 - iter 890/893 - loss 0.01444067 - time (sec): 526.29 - samples/sec: 470.61 - lr: 0.000017 - momentum: 0.000000
2023-10-11 09:40:25,601 ----------------------------------------------------------------------------------------------------
2023-10-11 09:40:25,602 EPOCH 9 done: loss 0.0144 - lr: 0.000017
2023-10-11 09:40:48,079 DEV : loss 0.19690608978271484 - f1-score (micro avg) 0.8067
2023-10-11 09:40:48,111 saving best model
2023-10-11 09:40:50,764 ----------------------------------------------------------------------------------------------------
2023-10-11 09:41:44,710 epoch 10 - iter 89/893 - loss 0.01266720 - time (sec): 53.94 - samples/sec: 462.49 - lr: 0.000015 - momentum: 0.000000
2023-10-11 09:42:42,606 epoch 10 - iter 178/893 - loss 0.01298725 - time (sec): 111.84 - samples/sec: 430.16 - lr: 0.000013 - momentum: 0.000000
2023-10-11 09:43:33,690 epoch 10 - iter 267/893 - loss 0.01235157 - time (sec): 162.92 - samples/sec: 446.58 - lr: 0.000012 - momentum: 0.000000
2023-10-11 09:44:23,850 epoch 10 - iter 356/893 - loss 0.01184256 - time (sec): 213.08 - samples/sec: 461.41 - lr: 0.000010 - momentum: 0.000000
2023-10-11 09:45:15,104 epoch 10 - iter 445/893 - loss 0.01173342 - time (sec): 264.34 - samples/sec: 469.89 - lr: 0.000008 - momentum: 0.000000
2023-10-11 09:46:06,317 epoch 10 - iter 534/893 - loss 0.01162859 - time (sec): 315.55 - samples/sec: 471.23 - lr: 0.000007 - momentum: 0.000000
2023-10-11 09:46:58,795 epoch 10 - iter 623/893 - loss 0.01135377 - time (sec): 368.03 - samples/sec: 469.24 - lr: 0.000005 - momentum: 0.000000
2023-10-11 09:47:49,365 epoch 10 - iter 712/893 - loss 0.01098390 - time (sec): 418.60 - samples/sec: 472.05 - lr: 0.000004 - momentum: 0.000000
2023-10-11 09:48:39,705 epoch 10 - iter 801/893 - loss 0.01075236 - time (sec): 468.94 - samples/sec: 474.42 - lr: 0.000002 - momentum: 0.000000
2023-10-11 09:49:30,954 epoch 10 - iter 890/893 - loss 0.01084378 - time (sec): 520.19 - samples/sec: 477.22 - lr: 0.000000 - momentum: 0.000000
2023-10-11 09:49:32,283 ----------------------------------------------------------------------------------------------------
2023-10-11 09:49:32,284 EPOCH 10 done: loss 0.0108 - lr: 0.000000
2023-10-11 09:49:53,633 DEV : loss 0.20006020367145538 - f1-score (micro avg) 0.8024
2023-10-11 09:49:54,579 ----------------------------------------------------------------------------------------------------
2023-10-11 09:49:54,581 Loading model from best epoch ...
2023-10-11 09:49:59,266 SequenceTagger predicts: Dictionary with 17 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-11 09:51:09,198
Results:
- F-score (micro) 0.7079
- F-score (macro) 0.6296
- Accuracy 0.564

By class:
              precision    recall  f1-score   support

         LOC     0.7382    0.7288    0.7335      1095
         PER     0.7646    0.7737    0.7692      1012
         ORG     0.4451    0.6134    0.5159       357
   HumanProd     0.4118    0.6364    0.5000        33

   micro avg     0.6877    0.7293    0.7079      2497
   macro avg     0.5899    0.6881    0.6296      2497
weighted avg     0.7027    0.7293    0.7137      2497

2023-10-11 09:51:09,198 ----------------------------------------------------------------------------------------------------
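For completeness: the checkpoint written at each "saving best model" step above (best dev micro-F1 0.8067, epoch 9; test micro-F1 0.7079 in the final evaluation) can be reloaded with the standard Flair API. A minimal sketch, assuming a local copy of best-model.pt under the base path from the log and a hypothetical example sentence:

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# Load the best checkpoint saved during training.
tagger = SequenceTagger.load(
    "hmbench-newseye/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-3/best-model.pt"
)

# Tag a French sentence with the PER/LOC/ORG/HumanProd labels listed above.
sentence = Sentence("Gustave Flaubert est né à Rouen.")
tagger.predict(sentence)
for span in sentence.get_spans("ner"):
    print(span.text, span.get_label("ner").value)
```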
|