2023-10-11 08:10:20,580 ----------------------------------------------------------------------------------------------------
2023-10-11 08:10:20,582 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-11 08:10:20,582 ----------------------------------------------------------------------------------------------------
2023-10-11 08:10:20,582 MultiCorpus: 1085 train + 148 dev + 364 test sentences
 - NER_HIPE_2022 Corpus: 1085 train + 148 dev + 364 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/sv/with_doc_seperator
2023-10-11 08:10:20,582 ----------------------------------------------------------------------------------------------------
2023-10-11 08:10:20,582 Train:  1085 sentences
2023-10-11 08:10:20,582         (train_with_dev=False, train_with_test=False)
2023-10-11 08:10:20,583 ----------------------------------------------------------------------------------------------------
2023-10-11 08:10:20,583 Training Params:
2023-10-11 08:10:20,583  - learning_rate: "0.00016"
2023-10-11 08:10:20,583  - mini_batch_size: "8"
2023-10-11 08:10:20,583  - max_epochs: "10"
2023-10-11 08:10:20,583  - shuffle: "True"
2023-10-11 08:10:20,583 ----------------------------------------------------------------------------------------------------
2023-10-11 08:10:20,583 Plugins:
2023-10-11 08:10:20,583  - TensorboardLogger
2023-10-11 08:10:20,583  - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 08:10:20,583 ----------------------------------------------------------------------------------------------------
2023-10-11 08:10:20,583 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 08:10:20,583  - metric: "('micro avg', 'f1-score')"
2023-10-11 08:10:20,583 ----------------------------------------------------------------------------------------------------
2023-10-11 08:10:20,584 Computation:
2023-10-11 08:10:20,584  - compute on device: cuda:0
2023-10-11 08:10:20,584  - embedding storage: none
2023-10-11 08:10:20,584 ----------------------------------------------------------------------------------------------------
2023-10-11 08:10:20,584 Model training base path: "hmbench-newseye/sv-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-1"
2023-10-11 08:10:20,584 ----------------------------------------------------------------------------------------------------
2023-10-11 08:10:20,584 ----------------------------------------------------------------------------------------------------
2023-10-11 08:10:20,584 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 08:10:29,004 epoch 1 - iter 13/136 - loss 2.83008951 - time (sec): 8.42 - samples/sec: 585.64 - lr: 0.000014 - momentum: 0.000000
2023-10-11 08:10:37,673 epoch 1 - iter 26/136 - loss 2.82368887 - time (sec): 17.09 - samples/sec: 600.40 - lr: 0.000029 - momentum: 0.000000
2023-10-11 08:10:46,719 epoch 1 - iter 39/136 - loss 2.81340369 - time (sec): 26.13 - samples/sec: 596.28 - lr: 0.000045 - momentum: 0.000000
2023-10-11 08:10:55,351 epoch 1 - iter 52/136 - loss 2.79752564 - time (sec): 34.77 - samples/sec: 594.06 - lr: 0.000060 - momentum: 0.000000
2023-10-11 08:11:03,384 epoch 1 - iter 65/136 - loss 2.77294396 - time (sec): 42.80 - samples/sec: 582.92 - lr: 0.000075 - momentum: 0.000000
2023-10-11 08:11:12,489 epoch 1 - iter 78/136 - loss 2.72088276 - time (sec): 51.90 - samples/sec: 584.10 - lr: 0.000091 - momentum: 0.000000
2023-10-11 08:11:21,313 epoch 1 - iter 91/136 - loss 2.65722387 - time (sec): 60.73 - samples/sec: 580.82 - lr: 0.000106 - momentum: 0.000000
2023-10-11 08:11:30,471 epoch 1 - iter 104/136 - loss 2.57792535 - time (sec): 69.89 - samples/sec: 581.99 - lr: 0.000121 - momentum: 0.000000
2023-10-11 08:11:39,124 epoch 1 - iter 117/136 - loss 2.49490382 - time (sec): 78.54 - samples/sec: 584.23 - lr: 0.000136 - momentum: 0.000000
2023-10-11 08:11:47,132 epoch 1 - iter 130/136 - loss 2.42291945 - time (sec): 86.55 - samples/sec: 580.68 - lr: 0.000152 - momentum: 0.000000
2023-10-11 08:11:50,587 ----------------------------------------------------------------------------------------------------
2023-10-11 08:11:50,588 EPOCH 1 done: loss 2.3906 - lr: 0.000152
2023-10-11 08:11:55,498 DEV : loss 1.359419345855713 - f1-score (micro avg) 0.0
2023-10-11 08:11:55,507 ----------------------------------------------------------------------------------------------------
2023-10-11 08:12:03,838 epoch 2 - iter 13/136 - loss 1.34412946 - time (sec): 8.33 - samples/sec: 563.42 - lr: 0.000158 - momentum: 0.000000
2023-10-11 08:12:12,311 epoch 2 - iter 26/136 - loss 1.24937968 - time (sec): 16.80 - samples/sec: 571.46 - lr: 0.000157 - momentum: 0.000000
2023-10-11 08:12:21,109 epoch 2 - iter 39/136 - loss 1.17010663 - time (sec): 25.60 - samples/sec: 587.53 - lr: 0.000155 - momentum: 0.000000
2023-10-11 08:12:29,655 epoch 2 - iter 52/136 - loss 1.07940129 - time (sec): 34.15 - samples/sec: 591.20 - lr: 0.000153 - momentum: 0.000000
2023-10-11 08:12:37,608 epoch 2 - iter 65/136 - loss 1.02694391 - time (sec): 42.10 - samples/sec: 579.66 - lr: 0.000152 - momentum: 0.000000
2023-10-11 08:12:46,472 epoch 2 - iter 78/136 - loss 0.97987925 - time (sec): 50.96 - samples/sec: 587.74 - lr: 0.000150 - momentum: 0.000000
2023-10-11 08:12:55,008 epoch 2 - iter 91/136 - loss 0.93022685 - time (sec): 59.50 - samples/sec: 585.55 - lr: 0.000148 - momentum: 0.000000
2023-10-11 08:13:03,896 epoch 2 - iter 104/136 - loss 0.88253436 - time (sec): 68.39 - samples/sec: 587.50 - lr: 0.000147 - momentum: 0.000000
2023-10-11 08:13:12,864 epoch 2 - iter 117/136 - loss 0.83798555 - time (sec): 77.36 - samples/sec: 587.97 - lr: 0.000145 - momentum: 0.000000
2023-10-11 08:13:21,277 epoch 2 - iter 130/136 - loss 0.80219015 - time (sec): 85.77 - samples/sec: 585.96 - lr: 0.000143 - momentum: 0.000000
2023-10-11 08:13:24,864 ----------------------------------------------------------------------------------------------------
2023-10-11 08:13:24,864 EPOCH 2 done: loss 0.7897 - lr: 0.000143
2023-10-11 08:13:30,879 DEV : loss 0.3818140923976898 - f1-score (micro avg) 0.0
2023-10-11 08:13:30,887 ----------------------------------------------------------------------------------------------------
2023-10-11 08:13:38,973 epoch 3 - iter 13/136 - loss 0.43805734 - time (sec): 8.08 - samples/sec: 479.35 - lr: 0.000141 - momentum: 0.000000
2023-10-11 08:13:47,955 epoch 3 - iter 26/136 - loss 0.41234292 - time (sec): 17.07 - samples/sec: 518.27 - lr: 0.000139 - momentum: 0.000000
2023-10-11 08:13:56,633 epoch 3 - iter 39/136 - loss 0.40579529 - time (sec): 25.74 - samples/sec: 537.68 - lr: 0.000137 - momentum: 0.000000
2023-10-11 08:14:05,469 epoch 3 - iter 52/136 - loss 0.39245856 - time (sec): 34.58 - samples/sec: 544.04 - lr: 0.000136 - momentum: 0.000000
2023-10-11 08:14:14,347 epoch 3 - iter 65/136 - loss 0.39994977 - time (sec): 43.46 - samples/sec: 552.71 - lr: 0.000134 - momentum: 0.000000
2023-10-11 08:14:22,952 epoch 3 - iter 78/136 - loss 0.38752465 - time (sec): 52.06 - samples/sec: 555.44 - lr: 0.000132 - momentum: 0.000000
2023-10-11 08:14:32,366 epoch 3 - iter 91/136 - loss 0.38301996 - time (sec): 61.48 - samples/sec: 564.22 - lr: 0.000131 - momentum: 0.000000
2023-10-11 08:14:40,970 epoch 3 - iter 104/136 - loss 0.38396675 - time (sec): 70.08 - samples/sec: 564.65 - lr: 0.000129 - momentum: 0.000000
2023-10-11 08:14:50,261 epoch 3 - iter 117/136 - loss 0.37265386 - time (sec): 79.37 - samples/sec: 569.43 - lr: 0.000127 - momentum: 0.000000
2023-10-11 08:14:58,659 epoch 3 - iter 130/136 - loss 0.36462475 - time (sec): 87.77 - samples/sec: 568.86 - lr: 0.000126 - momentum: 0.000000
2023-10-11 08:15:02,285 ----------------------------------------------------------------------------------------------------
2023-10-11 08:15:02,285 EPOCH 3 done: loss 0.3658 - lr: 0.000126
2023-10-11 08:15:08,253 DEV : loss 0.2684977948665619 - f1-score (micro avg) 0.3173
2023-10-11 08:15:08,261 saving best model
2023-10-11 08:15:09,130 ----------------------------------------------------------------------------------------------------
2023-10-11 08:15:17,061 epoch 4 - iter 13/136 - loss 0.33221349 - time (sec): 7.93 - samples/sec: 539.80 - lr: 0.000123 - momentum: 0.000000
2023-10-11 08:15:26,385 epoch 4 - iter 26/136 - loss 0.30977777 - time (sec): 17.25 - samples/sec: 595.15 - lr: 0.000121 - momentum: 0.000000
2023-10-11 08:15:34,738 epoch 4 - iter 39/136 - loss 0.30789013 - time (sec): 25.61 - samples/sec: 588.82 - lr: 0.000120 - momentum: 0.000000
2023-10-11 08:15:43,242 epoch 4 - iter 52/136 - loss 0.29487947 - time (sec): 34.11 - samples/sec: 587.66 - lr: 0.000118 - momentum: 0.000000
2023-10-11 08:15:51,970 epoch 4 - iter 65/136 - loss 0.27112810 - time (sec): 42.84 - samples/sec: 595.38 - lr: 0.000116 - momentum: 0.000000
2023-10-11 08:16:00,574 epoch 4 - iter 78/136 - loss 0.27205837 - time (sec): 51.44 - samples/sec: 590.14 - lr: 0.000115 - momentum: 0.000000
2023-10-11 08:16:09,933 epoch 4 - iter 91/136 - loss 0.26009433 - time (sec): 60.80 - samples/sec: 593.62 - lr: 0.000113 - momentum: 0.000000
2023-10-11 08:16:18,318 epoch 4 - iter 104/136 - loss 0.25642960 - time (sec): 69.19 - samples/sec: 588.68 - lr: 0.000111 - momentum: 0.000000
2023-10-11 08:16:27,314 epoch 4 - iter 117/136 - loss 0.26043586 - time (sec): 78.18 - samples/sec: 587.27 - lr: 0.000109 - momentum: 0.000000
2023-10-11 08:16:35,665 epoch 4 - iter 130/136 - loss 0.26681548 - time (sec): 86.53 - samples/sec: 585.45 - lr: 0.000108 - momentum: 0.000000
2023-10-11 08:16:38,680 ----------------------------------------------------------------------------------------------------
2023-10-11 08:16:38,681 EPOCH 4 done: loss 0.2657 - lr: 0.000108
2023-10-11 08:16:44,299 DEV : loss 0.2195644974708557 - f1-score (micro avg) 0.4686
2023-10-11 08:16:44,307 saving best model
2023-10-11 08:16:46,842 ----------------------------------------------------------------------------------------------------
2023-10-11 08:16:55,497 epoch 5 - iter 13/136 - loss 0.19784218 - time (sec): 8.65 - samples/sec: 614.63 - lr: 0.000105 - momentum: 0.000000
2023-10-11 08:17:03,906 epoch 5 - iter 26/136 - loss 0.20836578 - time (sec): 17.06 - samples/sec: 576.51 - lr: 0.000104 - momentum: 0.000000
2023-10-11 08:17:12,106 epoch 5 - iter 39/136 - loss 0.22776639 - time (sec): 25.26 - samples/sec: 563.39 - lr: 0.000102 - momentum: 0.000000
2023-10-11 08:17:20,882 epoch 5 - iter 52/136 - loss 0.21597829 - time (sec): 34.04 - samples/sec: 571.63 - lr: 0.000100 - momentum: 0.000000
2023-10-11 08:17:29,434 epoch 5 - iter 65/136 - loss 0.21210043 - time (sec): 42.59 - samples/sec: 573.83 - lr: 0.000099 - momentum: 0.000000
2023-10-11 08:17:38,652 epoch 5 - iter 78/136 - loss 0.21109159 - time (sec): 51.81 - samples/sec: 565.16 - lr: 0.000097 - momentum: 0.000000
2023-10-11 08:17:47,133 epoch 5 - iter 91/136 - loss 0.20674644 - time (sec): 60.29 - samples/sec: 559.84 - lr: 0.000095 - momentum: 0.000000
2023-10-11 08:17:56,654 epoch 5 - iter 104/136 - loss 0.20252494 - time (sec): 69.81 - samples/sec: 566.80 - lr: 0.000093 - momentum: 0.000000
2023-10-11 08:18:05,375 epoch 5 - iter 117/136 - loss 0.20014526 - time (sec): 78.53 - samples/sec: 564.73 - lr: 0.000092 - momentum: 0.000000
2023-10-11 08:18:15,479 epoch 5 - iter 130/136 - loss 0.19880601 - time (sec): 88.63 - samples/sec: 560.94 - lr: 0.000090 - momentum: 0.000000
2023-10-11 08:18:19,494 ----------------------------------------------------------------------------------------------------
2023-10-11 08:18:19,494 EPOCH 5 done: loss 0.1981 - lr: 0.000090
2023-10-11 08:18:25,593 DEV : loss 0.18384359776973724 - f1-score (micro avg) 0.6234
2023-10-11 08:18:25,602 saving best model
2023-10-11 08:18:28,201 ----------------------------------------------------------------------------------------------------
2023-10-11 08:18:37,304 epoch 6 - iter 13/136 - loss 0.14650725 - time (sec): 9.10 - samples/sec: 556.15 - lr: 0.000088 - momentum: 0.000000
2023-10-11 08:18:46,013 epoch 6 - iter 26/136 - loss 0.16471832 - time (sec): 17.81 - samples/sec: 533.37 - lr: 0.000086 - momentum: 0.000000
2023-10-11 08:18:55,340 epoch 6 - iter 39/136 - loss 0.17446982 - time (sec): 27.13 - samples/sec: 554.95 - lr: 0.000084 - momentum: 0.000000
2023-10-11 08:19:04,064 epoch 6 - iter 52/136 - loss 0.16576348 - time (sec): 35.86 - samples/sec: 554.79 - lr: 0.000083 - momentum: 0.000000
2023-10-11 08:19:12,449 epoch 6 - iter 65/136 - loss 0.16736478 - time (sec): 44.24 - samples/sec: 546.55 - lr: 0.000081 - momentum: 0.000000
2023-10-11 08:19:21,134 epoch 6 - iter 78/136 - loss 0.16608000 - time (sec): 52.93 - samples/sec: 546.61 - lr: 0.000079 - momentum: 0.000000
2023-10-11 08:19:29,900 epoch 6 - iter 91/136 - loss 0.16395224 - time (sec): 61.69 - samples/sec: 546.16 - lr: 0.000077 - momentum: 0.000000
2023-10-11 08:19:38,795 epoch 6 - iter 104/136 - loss 0.16307175 - time (sec): 70.59 - samples/sec: 548.09 - lr: 0.000076 - momentum: 0.000000
2023-10-11 08:19:48,719 epoch 6 - iter 117/136 - loss 0.15725774 - time (sec): 80.51 - samples/sec: 556.95 - lr: 0.000074 - momentum: 0.000000
2023-10-11 08:19:56,956 epoch 6 - iter 130/136 - loss 0.15389111 - time (sec): 88.75 - samples/sec: 554.58 - lr: 0.000072 - momentum: 0.000000
2023-10-11 08:20:01,203 ----------------------------------------------------------------------------------------------------
2023-10-11 08:20:01,204 EPOCH 6 done: loss 0.1515 - lr: 0.000072
2023-10-11 08:20:07,027 DEV : loss 0.16635040938854218 - f1-score (micro avg) 0.6201
2023-10-11 08:20:07,036 ----------------------------------------------------------------------------------------------------
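The lr column in the records above is consistent with the LinearScheduler plugin listed at the top of the log: linear warmup over the first 10% of optimizer updates (warmup_fraction 0.1), then linear decay to zero. A minimal reconstruction sketch, not Flair's actual implementation, using step counts taken from the log (136 batches/epoch, 10 epochs); it matches the logged values to their printed precision when the rate is read just before each update:

```python
# Hypothetical reconstruction of the schedule behind the logged lr values.
PEAK_LR = 0.00016                 # learning_rate from "Training Params"
TOTAL_STEPS = 136 * 10            # 136 batches/epoch x 10 epochs = 1360 updates
WARMUP_STEPS = TOTAL_STEPS // 10  # warmup_fraction 0.1 -> 136 warmup updates

def linear_schedule_lr(step: int) -> float:
    """Learning rate in effect after `step` completed optimizer updates."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Linear decay from the peak down to 0 at the final step.
    return PEAK_LR * (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS)
```

For example, just before the 130th update of epoch 1 (step 129) the schedule gives roughly 0.000152, matching the epoch 1, iter 130/136 record.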
2023-10-11 08:20:14,907 epoch 7 - iter 13/136 - loss 0.15338606 - time (sec): 7.87 - samples/sec: 471.84 - lr: 0.000070 - momentum: 0.000000
2023-10-11 08:20:24,730 epoch 7 - iter 26/136 - loss 0.14255855 - time (sec): 17.69 - samples/sec: 571.99 - lr: 0.000068 - momentum: 0.000000
2023-10-11 08:20:33,927 epoch 7 - iter 39/136 - loss 0.12679887 - time (sec): 26.89 - samples/sec: 579.04 - lr: 0.000067 - momentum: 0.000000
2023-10-11 08:20:42,728 epoch 7 - iter 52/136 - loss 0.13133836 - time (sec): 35.69 - samples/sec: 577.13 - lr: 0.000065 - momentum: 0.000000
2023-10-11 08:20:51,857 epoch 7 - iter 65/136 - loss 0.13057029 - time (sec): 44.82 - samples/sec: 579.70 - lr: 0.000063 - momentum: 0.000000
2023-10-11 08:21:00,459 epoch 7 - iter 78/136 - loss 0.12851078 - time (sec): 53.42 - samples/sec: 576.71 - lr: 0.000061 - momentum: 0.000000
2023-10-11 08:21:09,029 epoch 7 - iter 91/136 - loss 0.12530179 - time (sec): 61.99 - samples/sec: 575.03 - lr: 0.000060 - momentum: 0.000000
2023-10-11 08:21:17,073 epoch 7 - iter 104/136 - loss 0.12324245 - time (sec): 70.04 - samples/sec: 568.12 - lr: 0.000058 - momentum: 0.000000
2023-10-11 08:21:25,676 epoch 7 - iter 117/136 - loss 0.12249819 - time (sec): 78.64 - samples/sec: 569.42 - lr: 0.000056 - momentum: 0.000000
2023-10-11 08:21:34,239 epoch 7 - iter 130/136 - loss 0.11928789 - time (sec): 87.20 - samples/sec: 568.99 - lr: 0.000055 - momentum: 0.000000
2023-10-11 08:21:38,203 ----------------------------------------------------------------------------------------------------
2023-10-11 08:21:38,203 EPOCH 7 done: loss 0.1190 - lr: 0.000055
2023-10-11 08:21:44,012 DEV : loss 0.15652361512184143 - f1-score (micro avg) 0.6535
2023-10-11 08:21:44,021 saving best model
2023-10-11 08:21:46,574 ----------------------------------------------------------------------------------------------------
2023-10-11 08:21:55,242 epoch 8 - iter 13/136 - loss 0.11592295 - time (sec): 8.66 - samples/sec: 569.40 - lr: 0.000052 - momentum: 0.000000
2023-10-11 08:22:03,447 epoch 8 - iter 26/136 - loss 0.10352624 - time (sec): 16.87 - samples/sec: 566.94 - lr: 0.000051 - momentum: 0.000000
2023-10-11 08:22:12,831 epoch 8 - iter 39/136 - loss 0.11076953 - time (sec): 26.25 - samples/sec: 586.80 - lr: 0.000049 - momentum: 0.000000
2023-10-11 08:22:21,315 epoch 8 - iter 52/136 - loss 0.10546970 - time (sec): 34.74 - samples/sec: 571.88 - lr: 0.000047 - momentum: 0.000000
2023-10-11 08:22:30,565 epoch 8 - iter 65/136 - loss 0.10215561 - time (sec): 43.99 - samples/sec: 572.24 - lr: 0.000045 - momentum: 0.000000
2023-10-11 08:22:39,732 epoch 8 - iter 78/136 - loss 0.10068370 - time (sec): 53.15 - samples/sec: 569.00 - lr: 0.000044 - momentum: 0.000000
2023-10-11 08:22:48,381 epoch 8 - iter 91/136 - loss 0.10082195 - time (sec): 61.80 - samples/sec: 563.30 - lr: 0.000042 - momentum: 0.000000
2023-10-11 08:22:57,403 epoch 8 - iter 104/136 - loss 0.09810977 - time (sec): 70.82 - samples/sec: 564.68 - lr: 0.000040 - momentum: 0.000000
2023-10-11 08:23:05,932 epoch 8 - iter 117/136 - loss 0.09729601 - time (sec): 79.35 - samples/sec: 560.82 - lr: 0.000039 - momentum: 0.000000
2023-10-11 08:23:15,356 epoch 8 - iter 130/136 - loss 0.09850851 - time (sec): 88.78 - samples/sec: 563.38 - lr: 0.000037 - momentum: 0.000000
2023-10-11 08:23:19,046 ----------------------------------------------------------------------------------------------------
2023-10-11 08:23:19,047 EPOCH 8 done: loss 0.0980 - lr: 0.000037
2023-10-11 08:23:25,011 DEV : loss 0.14729855954647064 - f1-score (micro avg) 0.6524
2023-10-11 08:23:25,019 ----------------------------------------------------------------------------------------------------
2023-10-11 08:23:32,916 epoch 9 - iter 13/136 - loss 0.08399411 - time (sec): 7.89 - samples/sec: 522.49 - lr: 0.000034 - momentum: 0.000000
2023-10-11 08:23:42,217 epoch 9 - iter 26/136 - loss 0.07704439 - time (sec): 17.20 - samples/sec: 578.09 - lr: 0.000033 - momentum: 0.000000
2023-10-11 08:23:50,654 epoch 9 - iter 39/136 - loss 0.08056037 - time (sec): 25.63 - samples/sec: 578.79 - lr: 0.000031 - momentum: 0.000000
2023-10-11 08:23:59,442 epoch 9 - iter 52/136 - loss 0.08098512 - time (sec): 34.42 - samples/sec: 576.92 - lr: 0.000029 - momentum: 0.000000
2023-10-11 08:24:08,244 epoch 9 - iter 65/136 - loss 0.08177950 - time (sec): 43.22 - samples/sec: 578.77 - lr: 0.000028 - momentum: 0.000000
2023-10-11 08:24:17,310 epoch 9 - iter 78/136 - loss 0.08167083 - time (sec): 52.29 - samples/sec: 583.52 - lr: 0.000026 - momentum: 0.000000
2023-10-11 08:24:26,105 epoch 9 - iter 91/136 - loss 0.08275593 - time (sec): 61.08 - samples/sec: 578.63 - lr: 0.000024 - momentum: 0.000000
2023-10-11 08:24:34,647 epoch 9 - iter 104/136 - loss 0.08395371 - time (sec): 69.63 - samples/sec: 572.54 - lr: 0.000023 - momentum: 0.000000
2023-10-11 08:24:43,409 epoch 9 - iter 117/136 - loss 0.08520719 - time (sec): 78.39 - samples/sec: 566.16 - lr: 0.000021 - momentum: 0.000000
2023-10-11 08:24:52,367 epoch 9 - iter 130/136 - loss 0.08850575 - time (sec): 87.35 - samples/sec: 568.06 - lr: 0.000019 - momentum: 0.000000
2023-10-11 08:24:56,600 ----------------------------------------------------------------------------------------------------
2023-10-11 08:24:56,601 EPOCH 9 done: loss 0.0874 - lr: 0.000019
2023-10-11 08:25:02,537 DEV : loss 0.14513665437698364 - f1-score (micro avg) 0.6908
2023-10-11 08:25:02,545 saving best model
2023-10-11 08:25:05,116 ----------------------------------------------------------------------------------------------------
2023-10-11 08:25:13,810 epoch 10 - iter 13/136 - loss 0.06488614 - time (sec): 8.69 - samples/sec: 535.05 - lr: 0.000017 - momentum: 0.000000
2023-10-11 08:25:22,179 epoch 10 - iter 26/136 - loss 0.06997266 - time (sec): 17.06 - samples/sec: 511.30 - lr: 0.000015 - momentum: 0.000000
2023-10-11 08:25:30,800 epoch 10 - iter 39/136 - loss 0.07622472 - time (sec): 25.68 - samples/sec: 510.60 - lr: 0.000013 - momentum: 0.000000
2023-10-11 08:25:40,000 epoch 10 - iter 52/136 - loss 0.07291127 - time (sec): 34.88 - samples/sec: 527.10 - lr: 0.000012 - momentum: 0.000000
2023-10-11 08:25:49,353 epoch 10 - iter 65/136 - loss 0.07368018 - time (sec): 44.23 - samples/sec: 546.46 - lr: 0.000010 - momentum: 0.000000
2023-10-11 08:25:59,836 epoch 10 - iter 78/136 - loss 0.07542636 - time (sec): 54.72 - samples/sec: 565.14 - lr: 0.000008 - momentum: 0.000000
2023-10-11 08:26:09,182 epoch 10 - iter 91/136 - loss 0.07656023 - time (sec): 64.06 - samples/sec: 568.36 - lr: 0.000007 - momentum: 0.000000
2023-10-11 08:26:18,254 epoch 10 - iter 104/136 - loss 0.07981575 - time (sec): 73.13 - samples/sec: 558.65 - lr: 0.000005 - momentum: 0.000000
2023-10-11 08:26:26,888 epoch 10 - iter 117/136 - loss 0.07941501 - time (sec): 81.77 - samples/sec: 554.61 - lr: 0.000003 - momentum: 0.000000
2023-10-11 08:26:35,531 epoch 10 - iter 130/136 - loss 0.07955517 - time (sec): 90.41 - samples/sec: 550.38 - lr: 0.000002 - momentum: 0.000000
2023-10-11 08:26:39,290 ----------------------------------------------------------------------------------------------------
2023-10-11 08:26:39,290 EPOCH 10 done: loss 0.0802 - lr: 0.000002
2023-10-11 08:26:45,344 DEV : loss 0.14399686455726624 - f1-score (micro avg) 0.7063
2023-10-11 08:26:45,353 saving best model
2023-10-11 08:26:52,577 ----------------------------------------------------------------------------------------------------
2023-10-11 08:26:52,579 Loading model from best epoch ...
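For reference, a run with the hyperparameters logged above is typically set up through Flair's fine-tuning API. The following is a hypothetical sketch, not the script that produced this log: the embedding model name is inferred from the training base path, and constructor arguments such as `hidden_size` are illustrative assumptions.

```python
# Hyperparameters copied from the "Training Params" section of the log.
HPARAMS = {"learning_rate": 0.00016, "mini_batch_size": 8, "max_epochs": 10}

def train() -> None:
    # Imports kept local so the sketch can be inspected without Flair installed.
    from flair.datasets import NER_HIPE_2022
    from flair.embeddings import TransformerWordEmbeddings
    from flair.models import SequenceTagger
    from flair.trainers import ModelTrainer

    # Swedish NewsEye subset of HIPE-2022, matching the corpus line in the log.
    corpus = NER_HIPE_2022(dataset_name="newseye", language="sv")

    # ByT5 encoder, last layer only, first-subtoken pooling (per the base path).
    embeddings = TransformerWordEmbeddings(
        model="hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax",
        layers="-1",
        subtoken_pooling="first",
        fine_tune=True,
    )

    # Plain linear classification head: no CRF, no RNN (crfFalse in the path).
    tagger = SequenceTagger(
        hidden_size=256,  # ignored when use_rnn=False
        embeddings=embeddings,
        tag_dictionary=corpus.make_label_dictionary(label_type="ner"),
        tag_type="ner",
        use_crf=False,
        use_rnn=False,
        reproject_embeddings=False,
    )

    ModelTrainer(tagger, corpus).fine_tune(
        "hmbench-newseye/sv-hmbyt5-preliminary/"
        "byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-"
        "lr0.00016-poolingfirst-layers-1-crfFalse-1",
        **HPARAMS,
    )
```

Calling `train()` would download the corpus and embedding model and run the full fine-tuning loop; the sketch only defines it.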
2023-10-11 08:26:57,447 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-11 08:27:09,616 
Results:
- F-score (micro) 0.6682
- F-score (macro) 0.4708
- Accuracy 0.5556

By class:
              precision    recall  f1-score   support

         LOC     0.6383    0.8654    0.7347       312
         PER     0.7249    0.6587    0.6902       208
   HumanProd     0.2931    0.7727    0.4250        22
         ORG     0.2000    0.0182    0.0333        55

   micro avg     0.6296    0.7119    0.6682       597
   macro avg     0.4641    0.5787    0.4708       597
weighted avg     0.6154    0.7119    0.6432       597

2023-10-11 08:27:09,617 ----------------------------------------------------------------------------------------------------
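The micro and macro rows in the final table follow directly from the per-class rows under the standard definitions (micro: pooled true-positive/predicted/gold counts; macro: unweighted mean of per-class F1, as in scikit-learn's classification_report). A small cross-check sketch, with the variable names chosen here for illustration:

```python
# Per-class rows copied from the evaluation table above:
# class -> (precision, recall, f1-score, support)
by_class = {
    "LOC":       (0.6383, 0.8654, 0.7347, 312),
    "PER":       (0.7249, 0.6587, 0.6902, 208),
    "HumanProd": (0.2931, 0.7727, 0.4250, 22),
    "ORG":       (0.2000, 0.0182, 0.0333, 55),
}

# Recover integer true-positive and predicted-span counts per class.
tp = {c: round(r * s) for c, (p, r, f, s) in by_class.items()}
pred = {c: round(tp[c] / p) for c, (p, r, f, s) in by_class.items()}

total_tp = sum(tp.values())                          # correctly predicted spans
total_pred = sum(pred.values())                      # all predicted spans
total_gold = sum(s for *_, s in by_class.values())   # all gold spans (597)

micro_p = total_tp / total_pred
micro_r = total_tp / total_gold
micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)
macro_f1 = sum(f for _, _, f, _ in by_class.values()) / len(by_class)
```

Rounding these to four decimals reproduces the logged micro avg (0.6296 / 0.7119 / 0.6682) and macro F1 (0.4708); the micro figures also confirm why ORG's near-zero recall drags the macro average far below the micro one.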