|
2023-10-11 09:04:56,657 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:04:56,659 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
|
2023-10-11 09:04:56,659 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:04:56,659 MultiCorpus: 1085 train + 148 dev + 364 test sentences |
|
- NER_HIPE_2022 Corpus: 1085 train + 148 dev + 364 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/sv/with_doc_seperator |
|
2023-10-11 09:04:56,659 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:04:56,659 Train: 1085 sentences |
|
2023-10-11 09:04:56,659 (train_with_dev=False, train_with_test=False) |
|
2023-10-11 09:04:56,659 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:04:56,660 Training Params: |
|
2023-10-11 09:04:56,660 - learning_rate: "0.00015" |
|
2023-10-11 09:04:56,660 - mini_batch_size: "8" |
|
2023-10-11 09:04:56,660 - max_epochs: "10" |
|
2023-10-11 09:04:56,660 - shuffle: "True" |
|
2023-10-11 09:04:56,660 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:04:56,660 Plugins: |
|
2023-10-11 09:04:56,660 - TensorboardLogger |
|
2023-10-11 09:04:56,660 - LinearScheduler | warmup_fraction: '0.1' |
|
2023-10-11 09:04:56,660 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:04:56,660 Final evaluation on model from best epoch (best-model.pt) |
|
2023-10-11 09:04:56,660 - metric: "('micro avg', 'f1-score')" |
|
2023-10-11 09:04:56,660 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:04:56,660 Computation: |
|
2023-10-11 09:04:56,660 - compute on device: cuda:0 |
|
2023-10-11 09:04:56,661 - embedding storage: none |
|
2023-10-11 09:04:56,661 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:04:56,661 Model training base path: "hmbench-newseye/sv-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-2" |
|
2023-10-11 09:04:56,661 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:04:56,661 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:04:56,661 Logging anything other than scalars to TensorBoard is currently not supported. |
|
2023-10-11 09:05:05,324 epoch 1 - iter 13/136 - loss 2.85454415 - time (sec): 8.66 - samples/sec: 591.27 - lr: 0.000013 - momentum: 0.000000 |
|
2023-10-11 09:05:13,976 epoch 1 - iter 26/136 - loss 2.84857093 - time (sec): 17.31 - samples/sec: 539.68 - lr: 0.000028 - momentum: 0.000000 |
|
2023-10-11 09:05:22,695 epoch 1 - iter 39/136 - loss 2.83797201 - time (sec): 26.03 - samples/sec: 562.37 - lr: 0.000042 - momentum: 0.000000 |
|
2023-10-11 09:05:31,290 epoch 1 - iter 52/136 - loss 2.81925649 - time (sec): 34.63 - samples/sec: 571.88 - lr: 0.000056 - momentum: 0.000000 |
|
2023-10-11 09:05:39,850 epoch 1 - iter 65/136 - loss 2.78851902 - time (sec): 43.19 - samples/sec: 579.84 - lr: 0.000071 - momentum: 0.000000 |
|
2023-10-11 09:05:48,157 epoch 1 - iter 78/136 - loss 2.73824936 - time (sec): 51.49 - samples/sec: 579.76 - lr: 0.000085 - momentum: 0.000000 |
|
2023-10-11 09:05:56,653 epoch 1 - iter 91/136 - loss 2.66944775 - time (sec): 59.99 - samples/sec: 580.76 - lr: 0.000099 - momentum: 0.000000 |
|
2023-10-11 09:06:05,007 epoch 1 - iter 104/136 - loss 2.59680995 - time (sec): 68.34 - samples/sec: 579.67 - lr: 0.000114 - momentum: 0.000000 |
|
2023-10-11 09:06:14,148 epoch 1 - iter 117/136 - loss 2.50961737 - time (sec): 77.49 - samples/sec: 580.56 - lr: 0.000128 - momentum: 0.000000 |
|
2023-10-11 09:06:22,904 epoch 1 - iter 130/136 - loss 2.42829796 - time (sec): 86.24 - samples/sec: 581.62 - lr: 0.000142 - momentum: 0.000000 |
|
2023-10-11 09:06:26,301 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:06:26,302 EPOCH 1 done: loss 2.3990 - lr: 0.000142 |
|
2023-10-11 09:06:31,357 DEV : loss 1.4164044857025146 - f1-score (micro avg) 0.0 |
|
2023-10-11 09:06:31,365 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:06:40,344 epoch 2 - iter 13/136 - loss 1.39753845 - time (sec): 8.98 - samples/sec: 620.23 - lr: 0.000149 - momentum: 0.000000 |
|
2023-10-11 09:06:48,792 epoch 2 - iter 26/136 - loss 1.30612822 - time (sec): 17.43 - samples/sec: 595.58 - lr: 0.000147 - momentum: 0.000000 |
|
2023-10-11 09:06:58,251 epoch 2 - iter 39/136 - loss 1.24698017 - time (sec): 26.88 - samples/sec: 601.44 - lr: 0.000145 - momentum: 0.000000 |
|
2023-10-11 09:07:06,826 epoch 2 - iter 52/136 - loss 1.16189236 - time (sec): 35.46 - samples/sec: 591.11 - lr: 0.000144 - momentum: 0.000000 |
|
2023-10-11 09:07:15,457 epoch 2 - iter 65/136 - loss 1.10052914 - time (sec): 44.09 - samples/sec: 585.86 - lr: 0.000142 - momentum: 0.000000 |
|
2023-10-11 09:07:23,553 epoch 2 - iter 78/136 - loss 1.04318846 - time (sec): 52.19 - samples/sec: 579.64 - lr: 0.000141 - momentum: 0.000000 |
|
2023-10-11 09:07:32,021 epoch 2 - iter 91/136 - loss 0.99457757 - time (sec): 60.65 - samples/sec: 575.15 - lr: 0.000139 - momentum: 0.000000 |
|
2023-10-11 09:07:40,643 epoch 2 - iter 104/136 - loss 0.93611893 - time (sec): 69.28 - samples/sec: 575.52 - lr: 0.000137 - momentum: 0.000000 |
|
2023-10-11 09:07:49,116 epoch 2 - iter 117/136 - loss 0.90403324 - time (sec): 77.75 - samples/sec: 573.59 - lr: 0.000136 - momentum: 0.000000 |
|
2023-10-11 09:07:57,517 epoch 2 - iter 130/136 - loss 0.88225517 - time (sec): 86.15 - samples/sec: 572.13 - lr: 0.000134 - momentum: 0.000000 |
|
2023-10-11 09:08:01,685 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:08:01,685 EPOCH 2 done: loss 0.8649 - lr: 0.000134 |
|
2023-10-11 09:08:07,581 DEV : loss 0.49027636647224426 - f1-score (micro avg) 0.0 |
|
2023-10-11 09:08:07,589 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:08:16,565 epoch 3 - iter 13/136 - loss 0.54946705 - time (sec): 8.97 - samples/sec: 579.70 - lr: 0.000132 - momentum: 0.000000 |
|
2023-10-11 09:08:25,341 epoch 3 - iter 26/136 - loss 0.57802032 - time (sec): 17.75 - samples/sec: 563.01 - lr: 0.000130 - momentum: 0.000000 |
|
2023-10-11 09:08:34,098 epoch 3 - iter 39/136 - loss 0.52629489 - time (sec): 26.51 - samples/sec: 557.63 - lr: 0.000129 - momentum: 0.000000 |
|
2023-10-11 09:08:42,539 epoch 3 - iter 52/136 - loss 0.50784907 - time (sec): 34.95 - samples/sec: 550.19 - lr: 0.000127 - momentum: 0.000000 |
|
2023-10-11 09:08:52,029 epoch 3 - iter 65/136 - loss 0.49141220 - time (sec): 44.44 - samples/sec: 563.15 - lr: 0.000126 - momentum: 0.000000 |
|
2023-10-11 09:09:00,500 epoch 3 - iter 78/136 - loss 0.46973894 - time (sec): 52.91 - samples/sec: 564.60 - lr: 0.000124 - momentum: 0.000000 |
|
2023-10-11 09:09:10,091 epoch 3 - iter 91/136 - loss 0.45652392 - time (sec): 62.50 - samples/sec: 574.68 - lr: 0.000122 - momentum: 0.000000 |
|
2023-10-11 09:09:18,756 epoch 3 - iter 104/136 - loss 0.44771354 - time (sec): 71.17 - samples/sec: 575.80 - lr: 0.000121 - momentum: 0.000000 |
|
2023-10-11 09:09:27,472 epoch 3 - iter 117/136 - loss 0.43194864 - time (sec): 79.88 - samples/sec: 575.18 - lr: 0.000119 - momentum: 0.000000 |
|
2023-10-11 09:09:35,262 epoch 3 - iter 130/136 - loss 0.42684263 - time (sec): 87.67 - samples/sec: 570.29 - lr: 0.000118 - momentum: 0.000000 |
|
2023-10-11 09:09:38,903 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:09:38,903 EPOCH 3 done: loss 0.4273 - lr: 0.000118 |
|
2023-10-11 09:09:45,058 DEV : loss 0.30092161893844604 - f1-score (micro avg) 0.2491 |
|
2023-10-11 09:09:45,067 saving best model |
|
2023-10-11 09:09:45,936 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:09:54,331 epoch 4 - iter 13/136 - loss 0.32672183 - time (sec): 8.39 - samples/sec: 568.98 - lr: 0.000115 - momentum: 0.000000 |
|
2023-10-11 09:10:02,521 epoch 4 - iter 26/136 - loss 0.33454165 - time (sec): 16.58 - samples/sec: 551.43 - lr: 0.000114 - momentum: 0.000000 |
|
2023-10-11 09:10:11,774 epoch 4 - iter 39/136 - loss 0.31845525 - time (sec): 25.84 - samples/sec: 577.18 - lr: 0.000112 - momentum: 0.000000 |
|
2023-10-11 09:10:20,950 epoch 4 - iter 52/136 - loss 0.31203326 - time (sec): 35.01 - samples/sec: 586.29 - lr: 0.000111 - momentum: 0.000000 |
|
2023-10-11 09:10:29,119 epoch 4 - iter 65/136 - loss 0.30935257 - time (sec): 43.18 - samples/sec: 581.35 - lr: 0.000109 - momentum: 0.000000 |
|
2023-10-11 09:10:37,905 epoch 4 - iter 78/136 - loss 0.30389004 - time (sec): 51.97 - samples/sec: 584.76 - lr: 0.000107 - momentum: 0.000000 |
|
2023-10-11 09:10:46,403 epoch 4 - iter 91/136 - loss 0.31155565 - time (sec): 60.46 - samples/sec: 579.12 - lr: 0.000106 - momentum: 0.000000 |
|
2023-10-11 09:10:55,125 epoch 4 - iter 104/136 - loss 0.30416576 - time (sec): 69.19 - samples/sec: 578.07 - lr: 0.000104 - momentum: 0.000000 |
|
2023-10-11 09:11:03,837 epoch 4 - iter 117/136 - loss 0.31157005 - time (sec): 77.90 - samples/sec: 574.88 - lr: 0.000103 - momentum: 0.000000 |
|
2023-10-11 09:11:12,717 epoch 4 - iter 130/136 - loss 0.31360435 - time (sec): 86.78 - samples/sec: 572.74 - lr: 0.000101 - momentum: 0.000000 |
|
2023-10-11 09:11:17,287 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:11:17,287 EPOCH 4 done: loss 0.3110 - lr: 0.000101 |
|
2023-10-11 09:11:23,281 DEV : loss 0.2556583285331726 - f1-score (micro avg) 0.3513 |
|
2023-10-11 09:11:23,297 saving best model |
|
2023-10-11 09:11:25,880 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:11:33,874 epoch 5 - iter 13/136 - loss 0.28788963 - time (sec): 7.99 - samples/sec: 567.52 - lr: 0.000099 - momentum: 0.000000 |
|
2023-10-11 09:11:42,322 epoch 5 - iter 26/136 - loss 0.27712001 - time (sec): 16.44 - samples/sec: 590.10 - lr: 0.000097 - momentum: 0.000000 |
|
2023-10-11 09:11:51,282 epoch 5 - iter 39/136 - loss 0.26346164 - time (sec): 25.40 - samples/sec: 591.58 - lr: 0.000096 - momentum: 0.000000 |
|
2023-10-11 09:11:59,175 epoch 5 - iter 52/136 - loss 0.26906131 - time (sec): 33.29 - samples/sec: 579.28 - lr: 0.000094 - momentum: 0.000000 |
|
2023-10-11 09:12:07,515 epoch 5 - iter 65/136 - loss 0.25519490 - time (sec): 41.63 - samples/sec: 580.16 - lr: 0.000092 - momentum: 0.000000 |
|
2023-10-11 09:12:16,284 epoch 5 - iter 78/136 - loss 0.24701124 - time (sec): 50.40 - samples/sec: 584.95 - lr: 0.000091 - momentum: 0.000000 |
|
2023-10-11 09:12:25,010 epoch 5 - iter 91/136 - loss 0.24986196 - time (sec): 59.13 - samples/sec: 583.53 - lr: 0.000089 - momentum: 0.000000 |
|
2023-10-11 09:12:33,752 epoch 5 - iter 104/136 - loss 0.25823714 - time (sec): 67.87 - samples/sec: 585.17 - lr: 0.000088 - momentum: 0.000000 |
|
2023-10-11 09:12:42,624 epoch 5 - iter 117/136 - loss 0.26070813 - time (sec): 76.74 - samples/sec: 586.28 - lr: 0.000086 - momentum: 0.000000 |
|
2023-10-11 09:12:50,866 epoch 5 - iter 130/136 - loss 0.26182593 - time (sec): 84.98 - samples/sec: 583.96 - lr: 0.000084 - momentum: 0.000000 |
|
2023-10-11 09:12:54,779 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:12:54,779 EPOCH 5 done: loss 0.2621 - lr: 0.000084 |
|
2023-10-11 09:13:00,530 DEV : loss 0.2347497045993805 - f1-score (micro avg) 0.3522 |
|
2023-10-11 09:13:00,539 saving best model |
|
2023-10-11 09:13:03,115 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:13:11,908 epoch 6 - iter 13/136 - loss 0.26730900 - time (sec): 8.79 - samples/sec: 590.52 - lr: 0.000082 - momentum: 0.000000 |
|
2023-10-11 09:13:20,432 epoch 6 - iter 26/136 - loss 0.25691153 - time (sec): 17.31 - samples/sec: 572.24 - lr: 0.000081 - momentum: 0.000000 |
|
2023-10-11 09:13:29,176 epoch 6 - iter 39/136 - loss 0.24647921 - time (sec): 26.06 - samples/sec: 573.94 - lr: 0.000079 - momentum: 0.000000 |
|
2023-10-11 09:13:37,933 epoch 6 - iter 52/136 - loss 0.25068436 - time (sec): 34.81 - samples/sec: 589.21 - lr: 0.000077 - momentum: 0.000000 |
|
2023-10-11 09:13:45,698 epoch 6 - iter 65/136 - loss 0.24521590 - time (sec): 42.58 - samples/sec: 582.87 - lr: 0.000076 - momentum: 0.000000 |
|
2023-10-11 09:13:54,218 epoch 6 - iter 78/136 - loss 0.23230441 - time (sec): 51.10 - samples/sec: 589.09 - lr: 0.000074 - momentum: 0.000000 |
|
2023-10-11 09:14:02,808 epoch 6 - iter 91/136 - loss 0.22651680 - time (sec): 59.69 - samples/sec: 594.18 - lr: 0.000073 - momentum: 0.000000 |
|
2023-10-11 09:14:10,918 epoch 6 - iter 104/136 - loss 0.23136907 - time (sec): 67.80 - samples/sec: 591.72 - lr: 0.000071 - momentum: 0.000000 |
|
2023-10-11 09:14:19,378 epoch 6 - iter 117/136 - loss 0.22970596 - time (sec): 76.26 - samples/sec: 589.50 - lr: 0.000069 - momentum: 0.000000 |
|
2023-10-11 09:14:27,857 epoch 6 - iter 130/136 - loss 0.22534419 - time (sec): 84.74 - samples/sec: 589.45 - lr: 0.000068 - momentum: 0.000000 |
|
2023-10-11 09:14:31,402 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:14:31,403 EPOCH 6 done: loss 0.2242 - lr: 0.000068 |
|
2023-10-11 09:14:37,061 DEV : loss 0.2328910082578659 - f1-score (micro avg) 0.4469 |
|
2023-10-11 09:14:37,069 saving best model |
|
2023-10-11 09:14:39,620 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:14:48,048 epoch 7 - iter 13/136 - loss 0.20055562 - time (sec): 8.42 - samples/sec: 546.90 - lr: 0.000066 - momentum: 0.000000 |
|
2023-10-11 09:14:56,720 epoch 7 - iter 26/136 - loss 0.19132013 - time (sec): 17.10 - samples/sec: 584.66 - lr: 0.000064 - momentum: 0.000000 |
|
2023-10-11 09:15:05,870 epoch 7 - iter 39/136 - loss 0.18504756 - time (sec): 26.25 - samples/sec: 597.42 - lr: 0.000062 - momentum: 0.000000 |
|
2023-10-11 09:15:13,741 epoch 7 - iter 52/136 - loss 0.18716742 - time (sec): 34.12 - samples/sec: 592.46 - lr: 0.000061 - momentum: 0.000000 |
|
2023-10-11 09:15:22,691 epoch 7 - iter 65/136 - loss 0.18737032 - time (sec): 43.07 - samples/sec: 594.71 - lr: 0.000059 - momentum: 0.000000 |
|
2023-10-11 09:15:31,175 epoch 7 - iter 78/136 - loss 0.18893187 - time (sec): 51.55 - samples/sec: 590.61 - lr: 0.000058 - momentum: 0.000000 |
|
2023-10-11 09:15:38,773 epoch 7 - iter 91/136 - loss 0.19416762 - time (sec): 59.15 - samples/sec: 580.16 - lr: 0.000056 - momentum: 0.000000 |
|
2023-10-11 09:15:47,664 epoch 7 - iter 104/136 - loss 0.19167814 - time (sec): 68.04 - samples/sec: 579.16 - lr: 0.000054 - momentum: 0.000000 |
|
2023-10-11 09:15:56,563 epoch 7 - iter 117/136 - loss 0.19355860 - time (sec): 76.94 - samples/sec: 580.89 - lr: 0.000053 - momentum: 0.000000 |
|
2023-10-11 09:16:04,831 epoch 7 - iter 130/136 - loss 0.19427477 - time (sec): 85.21 - samples/sec: 577.98 - lr: 0.000051 - momentum: 0.000000 |
|
2023-10-11 09:16:09,097 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:16:09,097 EPOCH 7 done: loss 0.1945 - lr: 0.000051 |
|
2023-10-11 09:16:14,969 DEV : loss 0.20801755785942078 - f1-score (micro avg) 0.5008 |
|
2023-10-11 09:16:14,978 saving best model |
|
2023-10-11 09:16:17,554 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:16:25,728 epoch 8 - iter 13/136 - loss 0.19111255 - time (sec): 8.17 - samples/sec: 553.12 - lr: 0.000049 - momentum: 0.000000 |
|
2023-10-11 09:16:33,725 epoch 8 - iter 26/136 - loss 0.16263921 - time (sec): 16.17 - samples/sec: 555.57 - lr: 0.000047 - momentum: 0.000000 |
|
2023-10-11 09:16:42,962 epoch 8 - iter 39/136 - loss 0.17294011 - time (sec): 25.40 - samples/sec: 580.06 - lr: 0.000046 - momentum: 0.000000 |
|
2023-10-11 09:16:51,627 epoch 8 - iter 52/136 - loss 0.17653631 - time (sec): 34.07 - samples/sec: 583.04 - lr: 0.000044 - momentum: 0.000000 |
|
2023-10-11 09:17:00,030 epoch 8 - iter 65/136 - loss 0.17843550 - time (sec): 42.47 - samples/sec: 579.20 - lr: 0.000043 - momentum: 0.000000 |
|
2023-10-11 09:17:08,509 epoch 8 - iter 78/136 - loss 0.18239003 - time (sec): 50.95 - samples/sec: 579.74 - lr: 0.000041 - momentum: 0.000000 |
|
2023-10-11 09:17:17,131 epoch 8 - iter 91/136 - loss 0.17848480 - time (sec): 59.57 - samples/sec: 583.57 - lr: 0.000039 - momentum: 0.000000 |
|
2023-10-11 09:17:26,021 epoch 8 - iter 104/136 - loss 0.17535801 - time (sec): 68.46 - samples/sec: 589.57 - lr: 0.000038 - momentum: 0.000000 |
|
2023-10-11 09:17:34,586 epoch 8 - iter 117/136 - loss 0.17156513 - time (sec): 77.03 - samples/sec: 588.99 - lr: 0.000036 - momentum: 0.000000 |
|
2023-10-11 09:17:42,440 epoch 8 - iter 130/136 - loss 0.17067507 - time (sec): 84.88 - samples/sec: 586.87 - lr: 0.000035 - momentum: 0.000000 |
|
2023-10-11 09:17:46,087 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:17:46,088 EPOCH 8 done: loss 0.1690 - lr: 0.000035 |
|
2023-10-11 09:17:51,758 DEV : loss 0.2015739232301712 - f1-score (micro avg) 0.5357 |
|
2023-10-11 09:17:51,766 saving best model |
|
2023-10-11 09:17:54,288 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:18:02,209 epoch 9 - iter 13/136 - loss 0.19176870 - time (sec): 7.92 - samples/sec: 521.35 - lr: 0.000032 - momentum: 0.000000 |
|
2023-10-11 09:18:10,535 epoch 9 - iter 26/136 - loss 0.17373868 - time (sec): 16.24 - samples/sec: 544.69 - lr: 0.000031 - momentum: 0.000000 |
|
2023-10-11 09:18:19,288 epoch 9 - iter 39/136 - loss 0.16354683 - time (sec): 25.00 - samples/sec: 560.01 - lr: 0.000029 - momentum: 0.000000 |
|
2023-10-11 09:18:28,315 epoch 9 - iter 52/136 - loss 0.17111041 - time (sec): 34.02 - samples/sec: 549.24 - lr: 0.000028 - momentum: 0.000000 |
|
2023-10-11 09:18:37,045 epoch 9 - iter 65/136 - loss 0.17730382 - time (sec): 42.75 - samples/sec: 543.18 - lr: 0.000026 - momentum: 0.000000 |
|
2023-10-11 09:18:46,344 epoch 9 - iter 78/136 - loss 0.16675119 - time (sec): 52.05 - samples/sec: 550.14 - lr: 0.000024 - momentum: 0.000000 |
|
2023-10-11 09:18:54,829 epoch 9 - iter 91/136 - loss 0.16491876 - time (sec): 60.54 - samples/sec: 552.88 - lr: 0.000023 - momentum: 0.000000 |
|
2023-10-11 09:19:04,115 epoch 9 - iter 104/136 - loss 0.16242229 - time (sec): 69.82 - samples/sec: 562.19 - lr: 0.000021 - momentum: 0.000000 |
|
2023-10-11 09:19:13,031 epoch 9 - iter 117/136 - loss 0.15534520 - time (sec): 78.74 - samples/sec: 565.64 - lr: 0.000020 - momentum: 0.000000 |
|
2023-10-11 09:19:21,897 epoch 9 - iter 130/136 - loss 0.15219382 - time (sec): 87.61 - samples/sec: 566.79 - lr: 0.000018 - momentum: 0.000000 |
|
2023-10-11 09:19:25,761 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:19:25,762 EPOCH 9 done: loss 0.1517 - lr: 0.000018 |
|
2023-10-11 09:19:31,594 DEV : loss 0.1982397586107254 - f1-score (micro avg) 0.5445 |
|
2023-10-11 09:19:31,603 saving best model |
|
2023-10-11 09:19:34,155 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:19:43,245 epoch 10 - iter 13/136 - loss 0.16390125 - time (sec): 9.08 - samples/sec: 564.67 - lr: 0.000016 - momentum: 0.000000 |
|
2023-10-11 09:19:51,864 epoch 10 - iter 26/136 - loss 0.16014339 - time (sec): 17.70 - samples/sec: 552.92 - lr: 0.000014 - momentum: 0.000000 |
|
2023-10-11 09:20:00,312 epoch 10 - iter 39/136 - loss 0.15538690 - time (sec): 26.15 - samples/sec: 553.66 - lr: 0.000013 - momentum: 0.000000 |
|
2023-10-11 09:20:08,630 epoch 10 - iter 52/136 - loss 0.15425086 - time (sec): 34.47 - samples/sec: 549.87 - lr: 0.000011 - momentum: 0.000000 |
|
2023-10-11 09:20:17,912 epoch 10 - iter 65/136 - loss 0.15381241 - time (sec): 43.75 - samples/sec: 565.00 - lr: 0.000009 - momentum: 0.000000 |
|
2023-10-11 09:20:26,756 epoch 10 - iter 78/136 - loss 0.14518053 - time (sec): 52.60 - samples/sec: 567.82 - lr: 0.000008 - momentum: 0.000000 |
|
2023-10-11 09:20:35,335 epoch 10 - iter 91/136 - loss 0.14825107 - time (sec): 61.17 - samples/sec: 566.97 - lr: 0.000006 - momentum: 0.000000 |
|
2023-10-11 09:20:44,119 epoch 10 - iter 104/136 - loss 0.14892497 - time (sec): 69.96 - samples/sec: 566.39 - lr: 0.000005 - momentum: 0.000000 |
|
2023-10-11 09:20:52,990 epoch 10 - iter 117/136 - loss 0.14425945 - time (sec): 78.83 - samples/sec: 568.35 - lr: 0.000003 - momentum: 0.000000 |
|
2023-10-11 09:21:01,634 epoch 10 - iter 130/136 - loss 0.14459632 - time (sec): 87.47 - samples/sec: 567.41 - lr: 0.000001 - momentum: 0.000000 |
|
2023-10-11 09:21:05,521 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:21:05,521 EPOCH 10 done: loss 0.1445 - lr: 0.000001 |
|
2023-10-11 09:21:11,286 DEV : loss 0.19525763392448425 - f1-score (micro avg) 0.5471 |
|
2023-10-11 09:21:11,295 saving best model |
|
2023-10-11 09:21:14,711 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 09:21:14,713 Loading model from best epoch ... |
|
2023-10-11 09:21:18,624 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd, S-ORG, B-ORG, E-ORG, I-ORG |
|
2023-10-11 09:21:30,954 |
|
Results:
- F-score (micro) 0.5107
- F-score (macro) 0.3264
- Accuracy 0.3794

By class:
              precision    recall  f1-score   support

         LOC     0.5385    0.7179    0.6154       312
         PER     0.4023    0.4952    0.4440       208
   HumanProd     0.1860    0.3636    0.2462        22
         ORG     0.0000    0.0000    0.0000        55

   micro avg     0.4685    0.5611    0.5107       597
   macro avg     0.2817    0.3942    0.3264       597
weighted avg     0.4284    0.5611    0.4854       597
|
|
|
2023-10-11 09:21:30,954 ---------------------------------------------------------------------------------------------------- |
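The averaged rows of the final test report can be cross-checked from the per-class rows. The snippet below is a minimal sketch, not part of the logged run: the numbers are copied from the report above, and the names `per_class` and `f1` are illustrative. It assumes the usual definitions (macro = unweighted mean of per-class F1, weighted = support-weighted mean, F1 = harmonic mean of precision and recall). Because the inputs are already rounded to four decimals, the recomputed values agree with the report only up to rounding.

```python
# Per-class rows copied from the report: (precision, recall, f1, support).
per_class = {
    "LOC": (0.5385, 0.7179, 0.6154, 312),
    "PER": (0.4023, 0.4952, 0.4440, 208),
    "HumanProd": (0.1860, 0.3636, 0.2462, 22),
    "ORG": (0.0000, 0.0000, 0.0000, 55),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


total_support = sum(support for *_, support in per_class.values())

# Macro average: every class counts equally, so ORG's 0.0 pulls it down hard.
macro_f1 = sum(score for _, _, score, _ in per_class.values()) / len(per_class)

# Weighted average: each class counts in proportion to its support.
weighted_f1 = (
    sum(score * support for _, _, score, support in per_class.values())
    / total_support
)

# Micro F1, recomputed from the rounded micro precision/recall in the report.
micro_f1 = f1(0.4685, 0.5611)

print(f"support={total_support} macro={macro_f1:.4f} "
      f"weighted={weighted_f1:.4f} micro={micro_f1:.4f}")
```

The gap between micro F1 (0.5107) and macro F1 (0.3264) comes largely from the ORG class, which scores 0.0 on 55 test entities and counts as a full quarter of the macro average.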
|
|
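The `lr` column in the per-iteration lines is consistent with a linear warmup followed by a linear decay, given the logged settings (`learning_rate: "0.00015"`, `warmup_fraction: '0.1'`, 136 iterations per epoch, 10 epochs). The snippet below is a generic sketch of such a schedule, not Flair's actual `LinearScheduler` code. The function name `linear_schedule_lr` is hypothetical, and it assumes that the logged value at iteration `i` of an epoch corresponds to the zero-based global step `(epoch - 1) * 136 + i - 1`.

```python
def linear_schedule_lr(
    step: int,
    peak_lr: float = 0.00015,      # learning_rate from the log
    total_steps: int = 136 * 10,   # iterations per epoch * max_epochs
    warmup_fraction: float = 0.1,  # LinearScheduler setting from the log
) -> float:
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / (total_steps - warmup_steps))


def global_step(epoch: int, iteration: int, iters_per_epoch: int = 136) -> int:
    """Zero-based global step for a 1-based (epoch, iteration) pair (assumed offset)."""
    return (epoch - 1) * iters_per_epoch + iteration - 1


# Compare against a few logged values (the log prints six decimals).
for epoch, iteration in [(1, 13), (1, 130), (2, 13), (5, 65), (10, 130)]:
    lr = linear_schedule_lr(global_step(epoch, iteration))
    print(f"epoch {epoch} - iter {iteration}/136 - lr: {lr:.6f}")
```

Under these assumptions the warmup covers exactly the first epoch (136 of 1360 steps), which matches the learning rate peaking near 0.00015 at the start of epoch 2 in the log.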