2023-10-11 10:32:58,086 ----------------------------------------------------------------------------------------------------
2023-10-11 10:32:58,088 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
|
2023-10-11 10:32:58,089 ----------------------------------------------------------------------------------------------------
2023-10-11 10:32:58,089 MultiCorpus: 1085 train + 148 dev + 364 test sentences
 - NER_HIPE_2022 Corpus: 1085 train + 148 dev + 364 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/sv/with_doc_seperator
2023-10-11 10:32:58,089 ----------------------------------------------------------------------------------------------------
2023-10-11 10:32:58,089 Train:  1085 sentences
2023-10-11 10:32:58,089         (train_with_dev=False, train_with_test=False)
2023-10-11 10:32:58,089 ----------------------------------------------------------------------------------------------------
2023-10-11 10:32:58,089 Training Params:
2023-10-11 10:32:58,089  - learning_rate: "0.00016"
2023-10-11 10:32:58,089  - mini_batch_size: "8"
2023-10-11 10:32:58,089  - max_epochs: "10"
2023-10-11 10:32:58,089  - shuffle: "True"
2023-10-11 10:32:58,089 ----------------------------------------------------------------------------------------------------
2023-10-11 10:32:58,090 Plugins:
2023-10-11 10:32:58,090  - TensorboardLogger
2023-10-11 10:32:58,090  - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 10:32:58,090 ----------------------------------------------------------------------------------------------------
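The per-iteration lr values later in this log rise over roughly the first 10% of the 1360 total steps (10 epochs × 136 iterations per epoch) and then decay linearly toward zero, consistent with the LinearScheduler plugin's warmup_fraction of 0.1. A minimal sketch of such a warmup-plus-linear-decay schedule (an illustrative helper, not Flair's actual scheduler code):

```python
def linear_schedule_lr(step: int, total_steps: int, peak_lr: float,
                       warmup_fraction: float = 0.1) -> float:
    """Linear warmup to peak_lr over the first warmup_fraction of steps,
    then linear decay back to zero by the final step."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps          # warmup phase
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)  # decay phase

# With this run's settings: 10 epochs x 136 iterations = 1360 steps, peak 0.00016.
lr_early = linear_schedule_lr(13, 1360, 0.00016)   # still warming up
lr_peak = linear_schedule_lr(136, 1360, 0.00016)   # warmup complete
```

The logged values (lr 0.000014 at iteration 13, 0.000152 at the end of epoch 1, decaying to 0.000002 by epoch 10) match this shape up to off-by-one differences in step-counting conventions.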
|
2023-10-11 10:32:58,090 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 10:32:58,090  - metric: "('micro avg', 'f1-score')"
2023-10-11 10:32:58,090 ----------------------------------------------------------------------------------------------------
2023-10-11 10:32:58,090 Computation:
2023-10-11 10:32:58,090  - compute on device: cuda:0
2023-10-11 10:32:58,090  - embedding storage: none
2023-10-11 10:32:58,090 ----------------------------------------------------------------------------------------------------
2023-10-11 10:32:58,090 Model training base path: "hmbench-newseye/sv-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-3"
2023-10-11 10:32:58,090 ----------------------------------------------------------------------------------------------------
2023-10-11 10:32:58,090 ----------------------------------------------------------------------------------------------------
2023-10-11 10:32:58,090 Logging anything other than scalars to TensorBoard is currently not supported.
|
2023-10-11 10:33:07,135 epoch 1 - iter 13/136 - loss 2.82905636 - time (sec): 9.04 - samples/sec: 614.54 - lr: 0.000014 - momentum: 0.000000
2023-10-11 10:33:16,421 epoch 1 - iter 26/136 - loss 2.82283109 - time (sec): 18.33 - samples/sec: 587.61 - lr: 0.000029 - momentum: 0.000000
2023-10-11 10:33:26,109 epoch 1 - iter 39/136 - loss 2.81195330 - time (sec): 28.02 - samples/sec: 558.24 - lr: 0.000045 - momentum: 0.000000
2023-10-11 10:33:34,871 epoch 1 - iter 52/136 - loss 2.79359051 - time (sec): 36.78 - samples/sec: 558.01 - lr: 0.000060 - momentum: 0.000000
2023-10-11 10:33:43,323 epoch 1 - iter 65/136 - loss 2.75967078 - time (sec): 45.23 - samples/sec: 559.40 - lr: 0.000075 - momentum: 0.000000
2023-10-11 10:33:52,211 epoch 1 - iter 78/136 - loss 2.70744027 - time (sec): 54.12 - samples/sec: 559.05 - lr: 0.000091 - momentum: 0.000000
2023-10-11 10:34:00,586 epoch 1 - iter 91/136 - loss 2.63923954 - time (sec): 62.49 - samples/sec: 560.68 - lr: 0.000106 - momentum: 0.000000
2023-10-11 10:34:09,109 epoch 1 - iter 104/136 - loss 2.56285426 - time (sec): 71.02 - samples/sec: 561.07 - lr: 0.000121 - momentum: 0.000000
2023-10-11 10:34:17,698 epoch 1 - iter 117/136 - loss 2.48134613 - time (sec): 79.61 - samples/sec: 562.29 - lr: 0.000136 - momentum: 0.000000
2023-10-11 10:34:26,610 epoch 1 - iter 130/136 - loss 2.39548058 - time (sec): 88.52 - samples/sec: 564.09 - lr: 0.000152 - momentum: 0.000000
2023-10-11 10:34:30,259 ----------------------------------------------------------------------------------------------------
2023-10-11 10:34:30,259 EPOCH 1 done: loss 2.3601 - lr: 0.000152
2023-10-11 10:34:34,993 DEV : loss 1.3611388206481934 - f1-score (micro avg) 0.0
2023-10-11 10:34:35,001 ----------------------------------------------------------------------------------------------------
2023-10-11 10:34:44,448 epoch 2 - iter 13/136 - loss 1.33420703 - time (sec): 9.45 - samples/sec: 609.84 - lr: 0.000158 - momentum: 0.000000
2023-10-11 10:34:53,003 epoch 2 - iter 26/136 - loss 1.27788032 - time (sec): 18.00 - samples/sec: 597.09 - lr: 0.000157 - momentum: 0.000000
2023-10-11 10:35:01,649 epoch 2 - iter 39/136 - loss 1.19457463 - time (sec): 26.65 - samples/sec: 600.90 - lr: 0.000155 - momentum: 0.000000
2023-10-11 10:35:09,691 epoch 2 - iter 52/136 - loss 1.12462593 - time (sec): 34.69 - samples/sec: 589.01 - lr: 0.000153 - momentum: 0.000000
2023-10-11 10:35:17,806 epoch 2 - iter 65/136 - loss 1.05838575 - time (sec): 42.80 - samples/sec: 587.74 - lr: 0.000152 - momentum: 0.000000
2023-10-11 10:35:25,642 epoch 2 - iter 78/136 - loss 1.00499391 - time (sec): 50.64 - samples/sec: 577.46 - lr: 0.000150 - momentum: 0.000000
2023-10-11 10:35:34,370 epoch 2 - iter 91/136 - loss 0.94353939 - time (sec): 59.37 - samples/sec: 575.06 - lr: 0.000148 - momentum: 0.000000
2023-10-11 10:35:42,539 epoch 2 - iter 104/136 - loss 0.90638593 - time (sec): 67.54 - samples/sec: 566.98 - lr: 0.000147 - momentum: 0.000000
2023-10-11 10:35:51,886 epoch 2 - iter 117/136 - loss 0.85831314 - time (sec): 76.88 - samples/sec: 571.48 - lr: 0.000145 - momentum: 0.000000
2023-10-11 10:36:01,521 epoch 2 - iter 130/136 - loss 0.82520135 - time (sec): 86.52 - samples/sec: 575.24 - lr: 0.000143 - momentum: 0.000000
2023-10-11 10:36:05,445 ----------------------------------------------------------------------------------------------------
2023-10-11 10:36:05,446 EPOCH 2 done: loss 0.8126 - lr: 0.000143
2023-10-11 10:36:11,132 DEV : loss 0.47113388776779175 - f1-score (micro avg) 0.0
2023-10-11 10:36:11,141 ----------------------------------------------------------------------------------------------------
2023-10-11 10:36:19,932 epoch 3 - iter 13/136 - loss 0.52553173 - time (sec): 8.79 - samples/sec: 566.39 - lr: 0.000141 - momentum: 0.000000
2023-10-11 10:36:28,958 epoch 3 - iter 26/136 - loss 0.47220984 - time (sec): 17.81 - samples/sec: 579.07 - lr: 0.000139 - momentum: 0.000000
2023-10-11 10:36:37,970 epoch 3 - iter 39/136 - loss 0.46796738 - time (sec): 26.83 - samples/sec: 590.48 - lr: 0.000137 - momentum: 0.000000
2023-10-11 10:36:46,164 epoch 3 - iter 52/136 - loss 0.46389304 - time (sec): 35.02 - samples/sec: 583.74 - lr: 0.000136 - momentum: 0.000000
2023-10-11 10:36:54,424 epoch 3 - iter 65/136 - loss 0.45462956 - time (sec): 43.28 - samples/sec: 576.74 - lr: 0.000134 - momentum: 0.000000
2023-10-11 10:37:03,468 epoch 3 - iter 78/136 - loss 0.44492860 - time (sec): 52.32 - samples/sec: 583.43 - lr: 0.000132 - momentum: 0.000000
2023-10-11 10:37:12,244 epoch 3 - iter 91/136 - loss 0.42881004 - time (sec): 61.10 - samples/sec: 584.03 - lr: 0.000131 - momentum: 0.000000
2023-10-11 10:37:21,211 epoch 3 - iter 104/136 - loss 0.40971397 - time (sec): 70.07 - samples/sec: 582.00 - lr: 0.000129 - momentum: 0.000000
2023-10-11 10:37:29,425 epoch 3 - iter 117/136 - loss 0.39627687 - time (sec): 78.28 - samples/sec: 576.71 - lr: 0.000127 - momentum: 0.000000
2023-10-11 10:37:37,379 epoch 3 - iter 130/136 - loss 0.39607367 - time (sec): 86.24 - samples/sec: 571.76 - lr: 0.000126 - momentum: 0.000000
2023-10-11 10:37:41,524 ----------------------------------------------------------------------------------------------------
2023-10-11 10:37:41,524 EPOCH 3 done: loss 0.3879 - lr: 0.000126
2023-10-11 10:37:47,309 DEV : loss 0.28040581941604614 - f1-score (micro avg) 0.2634
2023-10-11 10:37:47,317 saving best model
2023-10-11 10:37:48,185 ----------------------------------------------------------------------------------------------------
2023-10-11 10:37:56,781 epoch 4 - iter 13/136 - loss 0.26964123 - time (sec): 8.59 - samples/sec: 546.78 - lr: 0.000123 - momentum: 0.000000
2023-10-11 10:38:04,326 epoch 4 - iter 26/136 - loss 0.30851485 - time (sec): 16.14 - samples/sec: 521.02 - lr: 0.000121 - momentum: 0.000000
2023-10-11 10:38:12,962 epoch 4 - iter 39/136 - loss 0.32765945 - time (sec): 24.77 - samples/sec: 544.95 - lr: 0.000120 - momentum: 0.000000
2023-10-11 10:38:21,363 epoch 4 - iter 52/136 - loss 0.31247566 - time (sec): 33.18 - samples/sec: 551.30 - lr: 0.000118 - momentum: 0.000000
2023-10-11 10:38:29,247 epoch 4 - iter 65/136 - loss 0.31240592 - time (sec): 41.06 - samples/sec: 544.74 - lr: 0.000116 - momentum: 0.000000
2023-10-11 10:38:37,346 epoch 4 - iter 78/136 - loss 0.30735790 - time (sec): 49.16 - samples/sec: 550.88 - lr: 0.000115 - momentum: 0.000000
2023-10-11 10:38:47,151 epoch 4 - iter 91/136 - loss 0.30114313 - time (sec): 58.96 - samples/sec: 570.23 - lr: 0.000113 - momentum: 0.000000
2023-10-11 10:38:55,408 epoch 4 - iter 104/136 - loss 0.29747977 - time (sec): 67.22 - samples/sec: 574.15 - lr: 0.000111 - momentum: 0.000000
2023-10-11 10:39:04,230 epoch 4 - iter 117/136 - loss 0.29176984 - time (sec): 76.04 - samples/sec: 579.91 - lr: 0.000109 - momentum: 0.000000
2023-10-11 10:39:13,172 epoch 4 - iter 130/136 - loss 0.28456610 - time (sec): 84.99 - samples/sec: 584.89 - lr: 0.000108 - momentum: 0.000000
2023-10-11 10:39:16,968 ----------------------------------------------------------------------------------------------------
2023-10-11 10:39:16,968 EPOCH 4 done: loss 0.2853 - lr: 0.000108
2023-10-11 10:39:22,640 DEV : loss 0.23706214129924774 - f1-score (micro avg) 0.4307
2023-10-11 10:39:22,649 saving best model
2023-10-11 10:39:25,192 ----------------------------------------------------------------------------------------------------
2023-10-11 10:39:33,989 epoch 5 - iter 13/136 - loss 0.22322927 - time (sec): 8.79 - samples/sec: 619.75 - lr: 0.000105 - momentum: 0.000000
2023-10-11 10:39:42,567 epoch 5 - iter 26/136 - loss 0.24861548 - time (sec): 17.37 - samples/sec: 607.51 - lr: 0.000104 - momentum: 0.000000
2023-10-11 10:39:50,939 epoch 5 - iter 39/136 - loss 0.24237781 - time (sec): 25.74 - samples/sec: 606.54 - lr: 0.000102 - momentum: 0.000000
2023-10-11 10:39:59,037 epoch 5 - iter 52/136 - loss 0.24947129 - time (sec): 33.84 - samples/sec: 598.18 - lr: 0.000100 - momentum: 0.000000
2023-10-11 10:40:07,314 epoch 5 - iter 65/136 - loss 0.24401319 - time (sec): 42.12 - samples/sec: 584.48 - lr: 0.000099 - momentum: 0.000000
2023-10-11 10:40:15,450 epoch 5 - iter 78/136 - loss 0.24091737 - time (sec): 50.25 - samples/sec: 581.10 - lr: 0.000097 - momentum: 0.000000
2023-10-11 10:40:23,629 epoch 5 - iter 91/136 - loss 0.22839402 - time (sec): 58.43 - samples/sec: 580.20 - lr: 0.000095 - momentum: 0.000000
2023-10-11 10:40:32,299 epoch 5 - iter 104/136 - loss 0.23260473 - time (sec): 67.10 - samples/sec: 583.36 - lr: 0.000093 - momentum: 0.000000
2023-10-11 10:40:41,678 epoch 5 - iter 117/136 - loss 0.22826299 - time (sec): 76.48 - samples/sec: 589.32 - lr: 0.000092 - momentum: 0.000000
2023-10-11 10:40:50,175 epoch 5 - iter 130/136 - loss 0.22918291 - time (sec): 84.98 - samples/sec: 591.32 - lr: 0.000090 - momentum: 0.000000
2023-10-11 10:40:53,438 ----------------------------------------------------------------------------------------------------
2023-10-11 10:40:53,439 EPOCH 5 done: loss 0.2278 - lr: 0.000090
2023-10-11 10:40:59,295 DEV : loss 0.2019164115190506 - f1-score (micro avg) 0.5331
2023-10-11 10:40:59,304 saving best model
2023-10-11 10:41:01,882 ----------------------------------------------------------------------------------------------------
2023-10-11 10:41:10,040 epoch 6 - iter 13/136 - loss 0.18317750 - time (sec): 8.15 - samples/sec: 557.22 - lr: 0.000088 - momentum: 0.000000
2023-10-11 10:41:18,066 epoch 6 - iter 26/136 - loss 0.20214126 - time (sec): 16.18 - samples/sec: 546.57 - lr: 0.000086 - momentum: 0.000000
2023-10-11 10:41:26,453 epoch 6 - iter 39/136 - loss 0.20236218 - time (sec): 24.57 - samples/sec: 550.38 - lr: 0.000084 - momentum: 0.000000
2023-10-11 10:41:35,349 epoch 6 - iter 52/136 - loss 0.19854214 - time (sec): 33.46 - samples/sec: 562.28 - lr: 0.000083 - momentum: 0.000000
2023-10-11 10:41:44,067 epoch 6 - iter 65/136 - loss 0.18937218 - time (sec): 42.18 - samples/sec: 576.29 - lr: 0.000081 - momentum: 0.000000
2023-10-11 10:41:52,334 epoch 6 - iter 78/136 - loss 0.18219775 - time (sec): 50.45 - samples/sec: 573.04 - lr: 0.000079 - momentum: 0.000000
2023-10-11 10:42:01,293 epoch 6 - iter 91/136 - loss 0.17828107 - time (sec): 59.41 - samples/sec: 575.22 - lr: 0.000077 - momentum: 0.000000
2023-10-11 10:42:09,837 epoch 6 - iter 104/136 - loss 0.18218020 - time (sec): 67.95 - samples/sec: 571.92 - lr: 0.000076 - momentum: 0.000000
2023-10-11 10:42:18,635 epoch 6 - iter 117/136 - loss 0.17797010 - time (sec): 76.75 - samples/sec: 573.60 - lr: 0.000074 - momentum: 0.000000
2023-10-11 10:42:28,088 epoch 6 - iter 130/136 - loss 0.17347803 - time (sec): 86.20 - samples/sec: 579.78 - lr: 0.000072 - momentum: 0.000000
2023-10-11 10:42:31,636 ----------------------------------------------------------------------------------------------------
2023-10-11 10:42:31,636 EPOCH 6 done: loss 0.1741 - lr: 0.000072
2023-10-11 10:42:37,610 DEV : loss 0.18107403814792633 - f1-score (micro avg) 0.6025
2023-10-11 10:42:37,618 saving best model
2023-10-11 10:42:40,162 ----------------------------------------------------------------------------------------------------
2023-10-11 10:42:49,084 epoch 7 - iter 13/136 - loss 0.14984582 - time (sec): 8.92 - samples/sec: 596.35 - lr: 0.000070 - momentum: 0.000000
2023-10-11 10:42:57,253 epoch 7 - iter 26/136 - loss 0.15617649 - time (sec): 17.09 - samples/sec: 579.63 - lr: 0.000068 - momentum: 0.000000
2023-10-11 10:43:06,279 epoch 7 - iter 39/136 - loss 0.15137208 - time (sec): 26.11 - samples/sec: 583.76 - lr: 0.000067 - momentum: 0.000000
2023-10-11 10:43:15,206 epoch 7 - iter 52/136 - loss 0.15440621 - time (sec): 35.04 - samples/sec: 589.50 - lr: 0.000065 - momentum: 0.000000
2023-10-11 10:43:23,800 epoch 7 - iter 65/136 - loss 0.15352902 - time (sec): 43.63 - samples/sec: 587.14 - lr: 0.000063 - momentum: 0.000000
2023-10-11 10:43:32,306 epoch 7 - iter 78/136 - loss 0.15036344 - time (sec): 52.14 - samples/sec: 588.95 - lr: 0.000061 - momentum: 0.000000
2023-10-11 10:43:40,770 epoch 7 - iter 91/136 - loss 0.15069822 - time (sec): 60.60 - samples/sec: 587.81 - lr: 0.000060 - momentum: 0.000000
2023-10-11 10:43:48,928 epoch 7 - iter 104/136 - loss 0.14815979 - time (sec): 68.76 - samples/sec: 580.00 - lr: 0.000058 - momentum: 0.000000
2023-10-11 10:43:57,562 epoch 7 - iter 117/136 - loss 0.14471906 - time (sec): 77.40 - samples/sec: 581.59 - lr: 0.000056 - momentum: 0.000000
2023-10-11 10:44:05,803 epoch 7 - iter 130/136 - loss 0.13990487 - time (sec): 85.64 - samples/sec: 580.40 - lr: 0.000055 - momentum: 0.000000
2023-10-11 10:44:09,781 ----------------------------------------------------------------------------------------------------
2023-10-11 10:44:09,781 EPOCH 7 done: loss 0.1379 - lr: 0.000055
2023-10-11 10:44:15,642 DEV : loss 0.16930824518203735 - f1-score (micro avg) 0.6124
2023-10-11 10:44:15,651 saving best model
2023-10-11 10:44:18,226 ----------------------------------------------------------------------------------------------------
2023-10-11 10:44:26,237 epoch 8 - iter 13/136 - loss 0.12269906 - time (sec): 8.01 - samples/sec: 588.24 - lr: 0.000052 - momentum: 0.000000
2023-10-11 10:44:34,428 epoch 8 - iter 26/136 - loss 0.12227413 - time (sec): 16.20 - samples/sec: 585.74 - lr: 0.000051 - momentum: 0.000000
2023-10-11 10:44:43,146 epoch 8 - iter 39/136 - loss 0.12413325 - time (sec): 24.92 - samples/sec: 598.72 - lr: 0.000049 - momentum: 0.000000
2023-10-11 10:44:51,180 epoch 8 - iter 52/136 - loss 0.12476583 - time (sec): 32.95 - samples/sec: 592.13 - lr: 0.000047 - momentum: 0.000000
2023-10-11 10:44:59,803 epoch 8 - iter 65/136 - loss 0.12178669 - time (sec): 41.57 - samples/sec: 595.72 - lr: 0.000045 - momentum: 0.000000
2023-10-11 10:45:08,449 epoch 8 - iter 78/136 - loss 0.12151521 - time (sec): 50.22 - samples/sec: 598.47 - lr: 0.000044 - momentum: 0.000000
2023-10-11 10:45:16,884 epoch 8 - iter 91/136 - loss 0.11963538 - time (sec): 58.65 - samples/sec: 596.79 - lr: 0.000042 - momentum: 0.000000
2023-10-11 10:45:25,373 epoch 8 - iter 104/136 - loss 0.12043380 - time (sec): 67.14 - samples/sec: 599.30 - lr: 0.000040 - momentum: 0.000000
2023-10-11 10:45:34,551 epoch 8 - iter 117/136 - loss 0.11448969 - time (sec): 76.32 - samples/sec: 602.30 - lr: 0.000039 - momentum: 0.000000
2023-10-11 10:45:41,804 epoch 8 - iter 130/136 - loss 0.11567474 - time (sec): 83.57 - samples/sec: 593.87 - lr: 0.000037 - momentum: 0.000000
2023-10-11 10:45:45,514 ----------------------------------------------------------------------------------------------------
2023-10-11 10:45:45,515 EPOCH 8 done: loss 0.1145 - lr: 0.000037
2023-10-11 10:45:51,045 DEV : loss 0.15804526209831238 - f1-score (micro avg) 0.6391
2023-10-11 10:45:51,053 saving best model
2023-10-11 10:45:53,603 ----------------------------------------------------------------------------------------------------
2023-10-11 10:46:02,581 epoch 9 - iter 13/136 - loss 0.09546268 - time (sec): 8.97 - samples/sec: 633.36 - lr: 0.000034 - momentum: 0.000000
2023-10-11 10:46:11,048 epoch 9 - iter 26/136 - loss 0.10013911 - time (sec): 17.44 - samples/sec: 612.07 - lr: 0.000033 - momentum: 0.000000
2023-10-11 10:46:19,177 epoch 9 - iter 39/136 - loss 0.10244150 - time (sec): 25.57 - samples/sec: 594.61 - lr: 0.000031 - momentum: 0.000000
2023-10-11 10:46:27,594 epoch 9 - iter 52/136 - loss 0.10511116 - time (sec): 33.99 - samples/sec: 588.79 - lr: 0.000029 - momentum: 0.000000
2023-10-11 10:46:36,305 epoch 9 - iter 65/136 - loss 0.10253249 - time (sec): 42.70 - samples/sec: 594.55 - lr: 0.000028 - momentum: 0.000000
2023-10-11 10:46:44,319 epoch 9 - iter 78/136 - loss 0.10317238 - time (sec): 50.71 - samples/sec: 585.01 - lr: 0.000026 - momentum: 0.000000
2023-10-11 10:46:53,134 epoch 9 - iter 91/136 - loss 0.10200602 - time (sec): 59.53 - samples/sec: 585.97 - lr: 0.000024 - momentum: 0.000000
2023-10-11 10:47:01,366 epoch 9 - iter 104/136 - loss 0.10049025 - time (sec): 67.76 - samples/sec: 581.49 - lr: 0.000023 - momentum: 0.000000
2023-10-11 10:47:09,944 epoch 9 - iter 117/136 - loss 0.10130709 - time (sec): 76.34 - samples/sec: 582.89 - lr: 0.000021 - momentum: 0.000000
2023-10-11 10:47:18,970 epoch 9 - iter 130/136 - loss 0.10043735 - time (sec): 85.36 - samples/sec: 581.90 - lr: 0.000019 - momentum: 0.000000
2023-10-11 10:47:22,792 ----------------------------------------------------------------------------------------------------
2023-10-11 10:47:22,793 EPOCH 9 done: loss 0.0987 - lr: 0.000019
2023-10-11 10:47:28,754 DEV : loss 0.15687064826488495 - f1-score (micro avg) 0.6439
2023-10-11 10:47:28,764 saving best model
2023-10-11 10:47:31,306 ----------------------------------------------------------------------------------------------------
2023-10-11 10:47:39,905 epoch 10 - iter 13/136 - loss 0.10762881 - time (sec): 8.60 - samples/sec: 602.29 - lr: 0.000017 - momentum: 0.000000
2023-10-11 10:47:47,726 epoch 10 - iter 26/136 - loss 0.11565589 - time (sec): 16.42 - samples/sec: 560.96 - lr: 0.000015 - momentum: 0.000000
2023-10-11 10:47:57,569 epoch 10 - iter 39/136 - loss 0.10752330 - time (sec): 26.26 - samples/sec: 604.45 - lr: 0.000013 - momentum: 0.000000
2023-10-11 10:48:06,902 epoch 10 - iter 52/136 - loss 0.10726886 - time (sec): 35.59 - samples/sec: 608.39 - lr: 0.000012 - momentum: 0.000000
2023-10-11 10:48:15,585 epoch 10 - iter 65/136 - loss 0.10786433 - time (sec): 44.27 - samples/sec: 607.30 - lr: 0.000010 - momentum: 0.000000
2023-10-11 10:48:23,615 epoch 10 - iter 78/136 - loss 0.10287152 - time (sec): 52.31 - samples/sec: 597.26 - lr: 0.000008 - momentum: 0.000000
2023-10-11 10:48:32,122 epoch 10 - iter 91/136 - loss 0.09965866 - time (sec): 60.81 - samples/sec: 594.01 - lr: 0.000007 - momentum: 0.000000
2023-10-11 10:48:39,901 epoch 10 - iter 104/136 - loss 0.09762661 - time (sec): 68.59 - samples/sec: 586.79 - lr: 0.000005 - momentum: 0.000000
2023-10-11 10:48:49,022 epoch 10 - iter 117/136 - loss 0.09489476 - time (sec): 77.71 - samples/sec: 587.84 - lr: 0.000003 - momentum: 0.000000
2023-10-11 10:48:57,329 epoch 10 - iter 130/136 - loss 0.09269028 - time (sec): 86.02 - samples/sec: 585.36 - lr: 0.000002 - momentum: 0.000000
2023-10-11 10:49:00,616 ----------------------------------------------------------------------------------------------------
2023-10-11 10:49:00,616 EPOCH 10 done: loss 0.0925 - lr: 0.000002
2023-10-11 10:49:06,429 DEV : loss 0.15401233732700348 - f1-score (micro avg) 0.6643
2023-10-11 10:49:06,438 saving best model
2023-10-11 10:49:10,013 ----------------------------------------------------------------------------------------------------
|
2023-10-11 10:49:10,015 Loading model from best epoch ...
2023-10-11 10:49:13,557 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd, S-ORG, B-ORG, E-ORG, I-ORG
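The 17 predicted tags follow the BIOES scheme (S = single-token entity, B/I/E = begin/inside/end of a multi-token entity, O = outside). A minimal, self-contained sketch of turning such a tag sequence into entity spans (illustrative only, not Flair's internal decoder):

```python
def bioes_to_spans(tags):
    """Convert a BIOES tag sequence into (label, start, end) spans, end exclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        if tag == "O":
            start, label = None, None
            continue
        prefix, cur = tag.split("-", 1)
        if prefix == "S":                       # single-token entity
            spans.append((cur, i, i + 1))
            start, label = None, None
        elif prefix == "B":                     # entity begins
            start, label = i, cur
        elif prefix == "E" and label == cur:    # entity ends
            spans.append((cur, start, i + 1))
            start, label = None, None
        # "I" with a matching label just continues the current span
    return spans

tags = ["O", "S-LOC", "B-PER", "I-PER", "E-PER", "O"]
print(bioes_to_spans(tags))  # → [('LOC', 1, 2), ('PER', 2, 5)]
```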
|
2023-10-11 10:49:25,408
Results:
- F-score (micro) 0.626
- F-score (macro) 0.4382
- Accuracy 0.5055

By class:
              precision    recall  f1-score   support

         LOC     0.6203    0.8429    0.7147       312
         PER     0.6066    0.6154    0.6110       208
   HumanProd     0.2361    0.7727    0.3617        22
         ORG     0.3333    0.0364    0.0656        55

   micro avg     0.5750    0.6868    0.6260       597
   macro avg     0.4491    0.5669    0.4382       597
weighted avg     0.5749    0.6868    0.6057       597

2023-10-11 10:49:25,408 ----------------------------------------------------------------------------------------------------
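As a quick arithmetic check, the micro F-score reported above is the harmonic mean of the micro-average precision and recall from the table:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# micro avg precision 0.5750, recall 0.6868 from the table above
print(round(f1(0.5750, 0.6868), 3))  # → 0.626
```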
|