|
2023-10-11 11:59:51,169 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:59:51,171 Model: "SequenceTagger( |
|
(embeddings): ByT5Embeddings( |
|
(model): T5EncoderModel( |
|
(shared): Embedding(384, 1472) |
|
(encoder): T5Stack( |
|
(embed_tokens): Embedding(384, 1472) |
|
(block): ModuleList( |
|
(0): T5Block( |
|
(layer): ModuleList( |
|
(0): T5LayerSelfAttention( |
|
(SelfAttention): T5Attention( |
|
(q): Linear(in_features=1472, out_features=384, bias=False) |
|
(k): Linear(in_features=1472, out_features=384, bias=False) |
|
(v): Linear(in_features=1472, out_features=384, bias=False) |
|
(o): Linear(in_features=384, out_features=1472, bias=False) |
|
(relative_attention_bias): Embedding(32, 6) |
|
) |
|
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
) |
|
(1): T5LayerFF( |
|
(DenseReluDense): T5DenseGatedActDense( |
|
(wi_0): Linear(in_features=1472, out_features=3584, bias=False) |
|
(wi_1): Linear(in_features=1472, out_features=3584, bias=False) |
|
(wo): Linear(in_features=3584, out_features=1472, bias=False) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
(act): NewGELUActivation() |
|
) |
|
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
) |
|
) |
|
) |
|
(1-11): 11 x T5Block( |
|
(layer): ModuleList( |
|
(0): T5LayerSelfAttention( |
|
(SelfAttention): T5Attention( |
|
(q): Linear(in_features=1472, out_features=384, bias=False) |
|
(k): Linear(in_features=1472, out_features=384, bias=False) |
|
(v): Linear(in_features=1472, out_features=384, bias=False) |
|
(o): Linear(in_features=384, out_features=1472, bias=False) |
|
) |
|
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
) |
|
(1): T5LayerFF( |
|
(DenseReluDense): T5DenseGatedActDense( |
|
(wi_0): Linear(in_features=1472, out_features=3584, bias=False) |
|
(wi_1): Linear(in_features=1472, out_features=3584, bias=False) |
|
(wo): Linear(in_features=3584, out_features=1472, bias=False) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
(act): NewGELUActivation() |
|
) |
|
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
) |
|
) |
|
) |
|
) |
|
(final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
) |
|
) |
|
) |
|
(locked_dropout): LockedDropout(p=0.5) |
|
(linear): Linear(in_features=1472, out_features=17, bias=True) |
|
(loss_function): CrossEntropyLoss() |
|
)" |
|
2023-10-11 11:59:51,171 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:59:51,171 MultiCorpus: 1085 train + 148 dev + 364 test sentences |
|
- NER_HIPE_2022 Corpus: 1085 train + 148 dev + 364 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/sv/with_doc_seperator |
|
2023-10-11 11:59:51,171 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:59:51,172 Train: 1085 sentences |
|
2023-10-11 11:59:51,172 (train_with_dev=False, train_with_test=False) |
|
2023-10-11 11:59:51,172 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:59:51,172 Training Params: |
|
2023-10-11 11:59:51,172 - learning_rate: "0.00015" |
|
2023-10-11 11:59:51,172 - mini_batch_size: "4" |
|
2023-10-11 11:59:51,172 - max_epochs: "10" |
|
2023-10-11 11:59:51,172 - shuffle: "True" |
|
2023-10-11 11:59:51,172 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:59:51,172 Plugins: |
|
2023-10-11 11:59:51,172 - TensorboardLogger |
|
2023-10-11 11:59:51,172 - LinearScheduler | warmup_fraction: '0.1' |
|
2023-10-11 11:59:51,172 ---------------------------------------------------------------------------------------------------- |
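Note on the scheduler settings above: the run reports a LinearScheduler with warmup_fraction 0.1, a peak learning_rate of 0.00015, 272 iterations per epoch and 10 epochs. The sketch below shows what a linear warmup followed by linear decay to zero would compute under those settings. It is an illustration only: the function name `linear_warmup_lr` and the exact step alignment are assumptions and are not taken from Flair's source.

```python
def linear_warmup_lr(step, peak_lr=0.00015, steps_per_epoch=272,
                     epochs=10, warmup_fraction=0.1):
    """Hypothetical helper: learning rate after `step` optimizer updates.

    Assumes a linear ramp from 0 to peak_lr over the warmup steps,
    then a linear decay from peak_lr down to 0 at the final step.
    """
    total_steps = steps_per_epoch * epochs              # 2720 for this run
    warmup_steps = round(total_steps * warmup_fraction)  # 272 for this run
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # Print the rate at the end of each epoch for a quick visual check.
    for epoch in range(1, 11):
        print(epoch, f"{linear_warmup_lr(epoch * 272):.6f}")
```

Under these assumptions the warmup spans exactly the first epoch, which is in line with the lr column rising during epoch 1 and falling steadily from epoch 2 onward.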
|
2023-10-11 11:59:51,172 Final evaluation on model from best epoch (best-model.pt) |
|
2023-10-11 11:59:51,172 - metric: "('micro avg', 'f1-score')" |
|
2023-10-11 11:59:51,173 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:59:51,173 Computation: |
|
2023-10-11 11:59:51,173 - compute on device: cuda:0 |
|
2023-10-11 11:59:51,173 - embedding storage: none |
|
2023-10-11 11:59:51,173 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:59:51,173 Model training base path: "hmbench-newseye/sv-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-4" |
|
2023-10-11 11:59:51,173 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:59:51,173 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:59:51,173 Logging anything other than scalars to TensorBoard is currently not supported. |
|
2023-10-11 12:00:00,746 epoch 1 - iter 27/272 - loss 2.83808245 - time (sec): 9.57 - samples/sec: 582.50 - lr: 0.000014 - momentum: 0.000000 |
|
2023-10-11 12:00:10,911 epoch 1 - iter 54/272 - loss 2.83168516 - time (sec): 19.74 - samples/sec: 577.07 - lr: 0.000029 - momentum: 0.000000 |
|
2023-10-11 12:00:20,306 epoch 1 - iter 81/272 - loss 2.81100836 - time (sec): 29.13 - samples/sec: 572.17 - lr: 0.000044 - momentum: 0.000000 |
|
2023-10-11 12:00:29,430 epoch 1 - iter 108/272 - loss 2.76093772 - time (sec): 38.26 - samples/sec: 568.91 - lr: 0.000059 - momentum: 0.000000 |
|
2023-10-11 12:00:38,439 epoch 1 - iter 135/272 - loss 2.68694812 - time (sec): 47.26 - samples/sec: 557.33 - lr: 0.000074 - momentum: 0.000000 |
|
2023-10-11 12:00:47,327 epoch 1 - iter 162/272 - loss 2.59395144 - time (sec): 56.15 - samples/sec: 553.86 - lr: 0.000089 - momentum: 0.000000 |
|
2023-10-11 12:00:56,040 epoch 1 - iter 189/272 - loss 2.49110800 - time (sec): 64.86 - samples/sec: 551.64 - lr: 0.000104 - momentum: 0.000000 |
|
2023-10-11 12:01:05,617 epoch 1 - iter 216/272 - loss 2.37091645 - time (sec): 74.44 - samples/sec: 558.22 - lr: 0.000119 - momentum: 0.000000 |
|
2023-10-11 12:01:14,799 epoch 1 - iter 243/272 - loss 2.25247641 - time (sec): 83.62 - samples/sec: 556.31 - lr: 0.000133 - momentum: 0.000000 |
|
2023-10-11 12:01:24,116 epoch 1 - iter 270/272 - loss 2.12945547 - time (sec): 92.94 - samples/sec: 555.84 - lr: 0.000148 - momentum: 0.000000 |
|
2023-10-11 12:01:24,641 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 12:01:24,641 EPOCH 1 done: loss 2.1220 - lr: 0.000148 |
|
2023-10-11 12:01:29,446 DEV : loss 0.787132978439331 - f1-score (micro avg) 0.0 |
|
2023-10-11 12:01:29,453 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 12:01:38,673 epoch 2 - iter 27/272 - loss 0.76434729 - time (sec): 9.22 - samples/sec: 580.18 - lr: 0.000148 - momentum: 0.000000 |
|
2023-10-11 12:01:47,492 epoch 2 - iter 54/272 - loss 0.73284299 - time (sec): 18.04 - samples/sec: 560.18 - lr: 0.000147 - momentum: 0.000000 |
|
2023-10-11 12:01:55,521 epoch 2 - iter 81/272 - loss 0.69848232 - time (sec): 26.07 - samples/sec: 536.84 - lr: 0.000145 - momentum: 0.000000 |
|
2023-10-11 12:02:05,664 epoch 2 - iter 108/272 - loss 0.64417120 - time (sec): 36.21 - samples/sec: 558.51 - lr: 0.000143 - momentum: 0.000000 |
|
2023-10-11 12:02:15,021 epoch 2 - iter 135/272 - loss 0.61592547 - time (sec): 45.57 - samples/sec: 553.77 - lr: 0.000142 - momentum: 0.000000 |
|
2023-10-11 12:02:24,945 epoch 2 - iter 162/272 - loss 0.56608904 - time (sec): 55.49 - samples/sec: 560.30 - lr: 0.000140 - momentum: 0.000000 |
|
2023-10-11 12:02:33,341 epoch 2 - iter 189/272 - loss 0.54253010 - time (sec): 63.89 - samples/sec: 553.92 - lr: 0.000138 - momentum: 0.000000 |
|
2023-10-11 12:02:42,502 epoch 2 - iter 216/272 - loss 0.51543417 - time (sec): 73.05 - samples/sec: 552.83 - lr: 0.000137 - momentum: 0.000000 |
|
2023-10-11 12:02:51,397 epoch 2 - iter 243/272 - loss 0.50075965 - time (sec): 81.94 - samples/sec: 551.15 - lr: 0.000135 - momentum: 0.000000 |
|
2023-10-11 12:03:01,869 epoch 2 - iter 270/272 - loss 0.48128104 - time (sec): 92.41 - samples/sec: 560.52 - lr: 0.000134 - momentum: 0.000000 |
|
2023-10-11 12:03:02,288 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 12:03:02,288 EPOCH 2 done: loss 0.4809 - lr: 0.000134 |
|
2023-10-11 12:03:08,051 DEV : loss 0.2902776598930359 - f1-score (micro avg) 0.3249 |
|
2023-10-11 12:03:08,059 saving best model |
|
2023-10-11 12:03:08,905 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 12:03:18,487 epoch 3 - iter 27/272 - loss 0.25324371 - time (sec): 9.58 - samples/sec: 532.44 - lr: 0.000132 - momentum: 0.000000 |
|
2023-10-11 12:03:28,503 epoch 3 - iter 54/272 - loss 0.29239049 - time (sec): 19.60 - samples/sec: 552.15 - lr: 0.000130 - momentum: 0.000000 |
|
2023-10-11 12:03:38,111 epoch 3 - iter 81/272 - loss 0.28983456 - time (sec): 29.20 - samples/sec: 552.59 - lr: 0.000128 - momentum: 0.000000 |
|
2023-10-11 12:03:47,503 epoch 3 - iter 108/272 - loss 0.28946884 - time (sec): 38.60 - samples/sec: 544.15 - lr: 0.000127 - momentum: 0.000000 |
|
2023-10-11 12:03:57,218 epoch 3 - iter 135/272 - loss 0.28735415 - time (sec): 48.31 - samples/sec: 542.24 - lr: 0.000125 - momentum: 0.000000 |
|
2023-10-11 12:04:06,661 epoch 3 - iter 162/272 - loss 0.29499243 - time (sec): 57.75 - samples/sec: 542.78 - lr: 0.000123 - momentum: 0.000000 |
|
2023-10-11 12:04:16,599 epoch 3 - iter 189/272 - loss 0.29299739 - time (sec): 67.69 - samples/sec: 546.15 - lr: 0.000122 - momentum: 0.000000 |
|
2023-10-11 12:04:25,978 epoch 3 - iter 216/272 - loss 0.28805671 - time (sec): 77.07 - samples/sec: 543.70 - lr: 0.000120 - momentum: 0.000000 |
|
2023-10-11 12:04:35,125 epoch 3 - iter 243/272 - loss 0.29343930 - time (sec): 86.22 - samples/sec: 543.21 - lr: 0.000119 - momentum: 0.000000 |
|
2023-10-11 12:04:44,272 epoch 3 - iter 270/272 - loss 0.28780910 - time (sec): 95.37 - samples/sec: 541.74 - lr: 0.000117 - momentum: 0.000000 |
|
2023-10-11 12:04:44,825 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 12:04:44,825 EPOCH 3 done: loss 0.2863 - lr: 0.000117 |
|
2023-10-11 12:04:50,364 DEV : loss 0.21922904253005981 - f1-score (micro avg) 0.5105 |
|
2023-10-11 12:04:50,372 saving best model |
|
2023-10-11 12:04:52,880 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 12:05:02,022 epoch 4 - iter 27/272 - loss 0.21394877 - time (sec): 9.14 - samples/sec: 533.16 - lr: 0.000115 - momentum: 0.000000 |
|
2023-10-11 12:05:11,546 epoch 4 - iter 54/272 - loss 0.17957342 - time (sec): 18.66 - samples/sec: 554.65 - lr: 0.000113 - momentum: 0.000000 |
|
2023-10-11 12:05:21,430 epoch 4 - iter 81/272 - loss 0.20307451 - time (sec): 28.55 - samples/sec: 568.62 - lr: 0.000112 - momentum: 0.000000 |
|
2023-10-11 12:05:30,670 epoch 4 - iter 108/272 - loss 0.19980419 - time (sec): 37.79 - samples/sec: 568.55 - lr: 0.000110 - momentum: 0.000000 |
|
2023-10-11 12:05:40,288 epoch 4 - iter 135/272 - loss 0.18968939 - time (sec): 47.40 - samples/sec: 570.33 - lr: 0.000108 - momentum: 0.000000 |
|
2023-10-11 12:05:49,026 epoch 4 - iter 162/272 - loss 0.19031204 - time (sec): 56.14 - samples/sec: 561.51 - lr: 0.000107 - momentum: 0.000000 |
|
2023-10-11 12:05:58,645 epoch 4 - iter 189/272 - loss 0.19085205 - time (sec): 65.76 - samples/sec: 564.64 - lr: 0.000105 - momentum: 0.000000 |
|
2023-10-11 12:06:07,826 epoch 4 - iter 216/272 - loss 0.18864700 - time (sec): 74.94 - samples/sec: 557.99 - lr: 0.000103 - momentum: 0.000000 |
|
2023-10-11 12:06:16,802 epoch 4 - iter 243/272 - loss 0.18605938 - time (sec): 83.92 - samples/sec: 553.83 - lr: 0.000102 - momentum: 0.000000 |
|
2023-10-11 12:06:26,347 epoch 4 - iter 270/272 - loss 0.18742508 - time (sec): 93.46 - samples/sec: 554.00 - lr: 0.000100 - momentum: 0.000000 |
|
2023-10-11 12:06:26,780 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 12:06:26,780 EPOCH 4 done: loss 0.1869 - lr: 0.000100 |
|
2023-10-11 12:06:32,242 DEV : loss 0.15904375910758972 - f1-score (micro avg) 0.6264 |
|
2023-10-11 12:06:32,250 saving best model |
|
2023-10-11 12:06:34,755 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 12:06:44,289 epoch 5 - iter 27/272 - loss 0.13640380 - time (sec): 9.53 - samples/sec: 588.07 - lr: 0.000098 - momentum: 0.000000 |
|
2023-10-11 12:06:53,372 epoch 5 - iter 54/272 - loss 0.14436985 - time (sec): 18.61 - samples/sec: 564.88 - lr: 0.000097 - momentum: 0.000000 |
|
2023-10-11 12:07:03,111 epoch 5 - iter 81/272 - loss 0.15175419 - time (sec): 28.35 - samples/sec: 574.28 - lr: 0.000095 - momentum: 0.000000 |
|
2023-10-11 12:07:13,161 epoch 5 - iter 108/272 - loss 0.14004454 - time (sec): 38.40 - samples/sec: 576.06 - lr: 0.000093 - momentum: 0.000000 |
|
2023-10-11 12:07:22,370 epoch 5 - iter 135/272 - loss 0.13536960 - time (sec): 47.61 - samples/sec: 569.30 - lr: 0.000092 - momentum: 0.000000 |
|
2023-10-11 12:07:31,360 epoch 5 - iter 162/272 - loss 0.13210074 - time (sec): 56.60 - samples/sec: 561.25 - lr: 0.000090 - momentum: 0.000000 |
|
2023-10-11 12:07:40,344 epoch 5 - iter 189/272 - loss 0.12674601 - time (sec): 65.58 - samples/sec: 555.01 - lr: 0.000088 - momentum: 0.000000 |
|
2023-10-11 12:07:49,701 epoch 5 - iter 216/272 - loss 0.12421874 - time (sec): 74.94 - samples/sec: 555.07 - lr: 0.000087 - momentum: 0.000000 |
|
2023-10-11 12:07:59,031 epoch 5 - iter 243/272 - loss 0.12779456 - time (sec): 84.27 - samples/sec: 556.30 - lr: 0.000085 - momentum: 0.000000 |
|
2023-10-11 12:08:07,940 epoch 5 - iter 270/272 - loss 0.12379983 - time (sec): 93.18 - samples/sec: 554.13 - lr: 0.000084 - momentum: 0.000000 |
|
2023-10-11 12:08:08,508 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 12:08:08,508 EPOCH 5 done: loss 0.1239 - lr: 0.000084 |
|
2023-10-11 12:08:14,112 DEV : loss 0.14260436594486237 - f1-score (micro avg) 0.6396 |
|
2023-10-11 12:08:14,120 saving best model |
|
2023-10-11 12:08:16,636 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 12:08:25,967 epoch 6 - iter 27/272 - loss 0.09081814 - time (sec): 9.33 - samples/sec: 549.95 - lr: 0.000082 - momentum: 0.000000 |
|
2023-10-11 12:08:35,234 epoch 6 - iter 54/272 - loss 0.09044964 - time (sec): 18.59 - samples/sec: 549.29 - lr: 0.000080 - momentum: 0.000000 |
|
2023-10-11 12:08:45,497 epoch 6 - iter 81/272 - loss 0.08657561 - time (sec): 28.86 - samples/sec: 574.01 - lr: 0.000078 - momentum: 0.000000 |
|
2023-10-11 12:08:54,116 epoch 6 - iter 108/272 - loss 0.08572893 - time (sec): 37.48 - samples/sec: 565.04 - lr: 0.000077 - momentum: 0.000000 |
|
2023-10-11 12:09:02,994 epoch 6 - iter 135/272 - loss 0.08374920 - time (sec): 46.35 - samples/sec: 559.29 - lr: 0.000075 - momentum: 0.000000 |
|
2023-10-11 12:09:12,371 epoch 6 - iter 162/272 - loss 0.08312497 - time (sec): 55.73 - samples/sec: 558.08 - lr: 0.000073 - momentum: 0.000000 |
|
2023-10-11 12:09:21,244 epoch 6 - iter 189/272 - loss 0.08859240 - time (sec): 64.60 - samples/sec: 553.43 - lr: 0.000072 - momentum: 0.000000 |
|
2023-10-11 12:09:30,609 epoch 6 - iter 216/272 - loss 0.08839713 - time (sec): 73.97 - samples/sec: 553.75 - lr: 0.000070 - momentum: 0.000000 |
|
2023-10-11 12:09:40,425 epoch 6 - iter 243/272 - loss 0.08829484 - time (sec): 83.78 - samples/sec: 556.94 - lr: 0.000069 - momentum: 0.000000 |
|
2023-10-11 12:09:49,758 epoch 6 - iter 270/272 - loss 0.08896194 - time (sec): 93.12 - samples/sec: 556.03 - lr: 0.000067 - momentum: 0.000000 |
|
2023-10-11 12:09:50,169 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 12:09:50,169 EPOCH 6 done: loss 0.0886 - lr: 0.000067 |
|
2023-10-11 12:09:55,682 DEV : loss 0.13805799186229706 - f1-score (micro avg) 0.7097 |
|
2023-10-11 12:09:55,690 saving best model |
|
2023-10-11 12:09:58,183 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 12:10:07,627 epoch 7 - iter 27/272 - loss 0.06053447 - time (sec): 9.44 - samples/sec: 515.26 - lr: 0.000065 - momentum: 0.000000 |
|
2023-10-11 12:10:16,495 epoch 7 - iter 54/272 - loss 0.07179566 - time (sec): 18.31 - samples/sec: 509.11 - lr: 0.000063 - momentum: 0.000000 |
|
2023-10-11 12:10:25,250 epoch 7 - iter 81/272 - loss 0.06709301 - time (sec): 27.06 - samples/sec: 512.22 - lr: 0.000062 - momentum: 0.000000 |
|
2023-10-11 12:10:34,721 epoch 7 - iter 108/272 - loss 0.06756713 - time (sec): 36.53 - samples/sec: 522.78 - lr: 0.000060 - momentum: 0.000000 |
|
2023-10-11 12:10:44,611 epoch 7 - iter 135/272 - loss 0.06376829 - time (sec): 46.42 - samples/sec: 532.50 - lr: 0.000058 - momentum: 0.000000 |
|
2023-10-11 12:10:54,644 epoch 7 - iter 162/272 - loss 0.06218548 - time (sec): 56.46 - samples/sec: 540.48 - lr: 0.000057 - momentum: 0.000000 |
|
2023-10-11 12:11:04,197 epoch 7 - iter 189/272 - loss 0.06719311 - time (sec): 66.01 - samples/sec: 538.29 - lr: 0.000055 - momentum: 0.000000 |
|
2023-10-11 12:11:13,197 epoch 7 - iter 216/272 - loss 0.06548144 - time (sec): 75.01 - samples/sec: 529.93 - lr: 0.000053 - momentum: 0.000000 |
|
2023-10-11 12:11:23,292 epoch 7 - iter 243/272 - loss 0.06733417 - time (sec): 85.11 - samples/sec: 537.18 - lr: 0.000052 - momentum: 0.000000 |
|
2023-10-11 12:11:33,454 epoch 7 - iter 270/272 - loss 0.06807975 - time (sec): 95.27 - samples/sec: 543.97 - lr: 0.000050 - momentum: 0.000000 |
|
2023-10-11 12:11:33,852 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 12:11:33,852 EPOCH 7 done: loss 0.0682 - lr: 0.000050 |
|
2023-10-11 12:11:39,581 DEV : loss 0.13298040628433228 - f1-score (micro avg) 0.7486 |
|
2023-10-11 12:11:39,590 saving best model |
|
2023-10-11 12:11:42,108 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 12:11:51,545 epoch 8 - iter 27/272 - loss 0.03607497 - time (sec): 9.43 - samples/sec: 545.51 - lr: 0.000048 - momentum: 0.000000 |
|
2023-10-11 12:12:01,924 epoch 8 - iter 54/272 - loss 0.05540413 - time (sec): 19.81 - samples/sec: 553.76 - lr: 0.000047 - momentum: 0.000000 |
|
2023-10-11 12:12:11,888 epoch 8 - iter 81/272 - loss 0.05162211 - time (sec): 29.78 - samples/sec: 543.33 - lr: 0.000045 - momentum: 0.000000 |
|
2023-10-11 12:12:21,348 epoch 8 - iter 108/272 - loss 0.05054188 - time (sec): 39.24 - samples/sec: 533.64 - lr: 0.000043 - momentum: 0.000000 |
|
2023-10-11 12:12:31,201 epoch 8 - iter 135/272 - loss 0.04929120 - time (sec): 49.09 - samples/sec: 538.60 - lr: 0.000042 - momentum: 0.000000 |
|
2023-10-11 12:12:41,764 epoch 8 - iter 162/272 - loss 0.05142742 - time (sec): 59.65 - samples/sec: 548.02 - lr: 0.000040 - momentum: 0.000000 |
|
2023-10-11 12:12:50,755 epoch 8 - iter 189/272 - loss 0.05364109 - time (sec): 68.64 - samples/sec: 537.23 - lr: 0.000038 - momentum: 0.000000 |
|
2023-10-11 12:13:00,768 epoch 8 - iter 216/272 - loss 0.05394908 - time (sec): 78.66 - samples/sec: 538.88 - lr: 0.000037 - momentum: 0.000000 |
|
2023-10-11 12:13:10,105 epoch 8 - iter 243/272 - loss 0.05376396 - time (sec): 87.99 - samples/sec: 536.01 - lr: 0.000035 - momentum: 0.000000 |
|
2023-10-11 12:13:19,304 epoch 8 - iter 270/272 - loss 0.05261769 - time (sec): 97.19 - samples/sec: 532.25 - lr: 0.000034 - momentum: 0.000000 |
|
2023-10-11 12:13:19,767 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 12:13:19,768 EPOCH 8 done: loss 0.0524 - lr: 0.000034 |
|
2023-10-11 12:13:25,531 DEV : loss 0.1314464509487152 - f1-score (micro avg) 0.7656 |
|
2023-10-11 12:13:25,540 saving best model |
|
2023-10-11 12:13:28,081 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 12:13:37,314 epoch 9 - iter 27/272 - loss 0.04742634 - time (sec): 9.23 - samples/sec: 530.51 - lr: 0.000032 - momentum: 0.000000 |
|
2023-10-11 12:13:46,234 epoch 9 - iter 54/272 - loss 0.05748414 - time (sec): 18.15 - samples/sec: 510.72 - lr: 0.000030 - momentum: 0.000000 |
|
2023-10-11 12:13:55,554 epoch 9 - iter 81/272 - loss 0.05158405 - time (sec): 27.47 - samples/sec: 510.11 - lr: 0.000028 - momentum: 0.000000 |
|
2023-10-11 12:14:05,486 epoch 9 - iter 108/272 - loss 0.04642078 - time (sec): 37.40 - samples/sec: 523.07 - lr: 0.000027 - momentum: 0.000000 |
|
2023-10-11 12:14:15,195 epoch 9 - iter 135/272 - loss 0.04661014 - time (sec): 47.11 - samples/sec: 530.66 - lr: 0.000025 - momentum: 0.000000 |
|
2023-10-11 12:14:24,961 epoch 9 - iter 162/272 - loss 0.04351579 - time (sec): 56.88 - samples/sec: 526.63 - lr: 0.000023 - momentum: 0.000000 |
|
2023-10-11 12:14:34,863 epoch 9 - iter 189/272 - loss 0.04215739 - time (sec): 66.78 - samples/sec: 526.63 - lr: 0.000022 - momentum: 0.000000 |
|
2023-10-11 12:14:44,981 epoch 9 - iter 216/272 - loss 0.04283431 - time (sec): 76.90 - samples/sec: 533.37 - lr: 0.000020 - momentum: 0.000000 |
|
2023-10-11 12:14:54,506 epoch 9 - iter 243/272 - loss 0.04609896 - time (sec): 86.42 - samples/sec: 531.67 - lr: 0.000019 - momentum: 0.000000 |
|
2023-10-11 12:15:04,548 epoch 9 - iter 270/272 - loss 0.04490903 - time (sec): 96.46 - samples/sec: 536.01 - lr: 0.000017 - momentum: 0.000000 |
|
2023-10-11 12:15:05,044 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 12:15:05,044 EPOCH 9 done: loss 0.0447 - lr: 0.000017 |
|
2023-10-11 12:15:10,818 DEV : loss 0.13004814088344574 - f1-score (micro avg) 0.7729 |
|
2023-10-11 12:15:10,826 saving best model |
|
2023-10-11 12:15:13,329 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 12:15:23,113 epoch 10 - iter 27/272 - loss 0.04807385 - time (sec): 9.78 - samples/sec: 545.79 - lr: 0.000015 - momentum: 0.000000 |
|
2023-10-11 12:15:32,509 epoch 10 - iter 54/272 - loss 0.04823045 - time (sec): 19.18 - samples/sec: 544.06 - lr: 0.000013 - momentum: 0.000000 |
|
2023-10-11 12:15:41,444 epoch 10 - iter 81/272 - loss 0.04320814 - time (sec): 28.11 - samples/sec: 537.12 - lr: 0.000012 - momentum: 0.000000 |
|
2023-10-11 12:15:50,755 epoch 10 - iter 108/272 - loss 0.04362312 - time (sec): 37.42 - samples/sec: 535.03 - lr: 0.000010 - momentum: 0.000000 |
|
2023-10-11 12:16:01,496 epoch 10 - iter 135/272 - loss 0.03948215 - time (sec): 48.16 - samples/sec: 546.87 - lr: 0.000008 - momentum: 0.000000 |
|
2023-10-11 12:16:10,578 epoch 10 - iter 162/272 - loss 0.03854345 - time (sec): 57.24 - samples/sec: 536.05 - lr: 0.000007 - momentum: 0.000000 |
|
2023-10-11 12:16:20,657 epoch 10 - iter 189/272 - loss 0.03769292 - time (sec): 67.32 - samples/sec: 537.82 - lr: 0.000005 - momentum: 0.000000 |
|
2023-10-11 12:16:30,189 epoch 10 - iter 216/272 - loss 0.03774164 - time (sec): 76.86 - samples/sec: 537.11 - lr: 0.000003 - momentum: 0.000000 |
|
2023-10-11 12:16:39,841 epoch 10 - iter 243/272 - loss 0.03886886 - time (sec): 86.51 - samples/sec: 538.09 - lr: 0.000002 - momentum: 0.000000 |
|
2023-10-11 12:16:49,410 epoch 10 - iter 270/272 - loss 0.03961577 - time (sec): 96.08 - samples/sec: 538.68 - lr: 0.000000 - momentum: 0.000000 |
|
2023-10-11 12:16:49,871 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 12:16:49,871 EPOCH 10 done: loss 0.0395 - lr: 0.000000 |
|
2023-10-11 12:16:55,638 DEV : loss 0.1334371417760849 - f1-score (micro avg) 0.7701 |
|
2023-10-11 12:16:56,489 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 12:16:56,490 Loading model from best epoch ... |
|
2023-10-11 12:17:00,059 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd, S-ORG, B-ORG, E-ORG, I-ORG |
|
2023-10-11 12:17:12,402
Results:
- F-score (micro) 0.7609
- F-score (macro) 0.6793
- Accuracy 0.635

By class:
              precision    recall  f1-score   support

         LOC     0.7830    0.8558    0.8178       312
         PER     0.7000    0.8413    0.7642       208
         ORG     0.4314    0.4000    0.4151        55
   HumanProd     0.6429    0.8182    0.7200        22

   micro avg     0.7194    0.8074    0.7609       597
   macro avg     0.6393    0.7288    0.6793       597
weighted avg     0.7165    0.8074    0.7584       597
|
2023-10-11 12:17:12,403 ---------------------------------------------------------------------------------------------------- |
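Note on the averaged rows in the test results: the figures can be cross-checked from the per-class rows. The sketch below recomputes the macro and weighted F1 from the class scores and the micro F1 from the reported micro precision and recall. All input values are copied from the log; because they are rounded to four decimals, the recomputed values agree only up to rounding. The variable and function names are illustrative.

```python
# Per-class (precision, recall, f1, support) as reported in the log.
by_class = {
    "LOC":       (0.7830, 0.8558, 0.8178, 312),
    "PER":       (0.7000, 0.8413, 0.7642, 208),
    "ORG":       (0.4314, 0.4000, 0.4151, 55),
    "HumanProd": (0.6429, 0.8182, 0.7200, 22),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Macro average: unweighted mean of the per-class F1 scores.
macro_f1 = sum(row[2] for row in by_class.values()) / len(by_class)

# Weighted average: per-class F1 weighted by support.
total_support = sum(row[3] for row in by_class.values())
weighted_f1 = sum(row[2] * row[3] for row in by_class.values()) / total_support

# Micro average: F1 of the pooled precision and recall from the log.
micro_f1 = f1(0.7194, 0.8074)

if __name__ == "__main__":
    print(f"micro {micro_f1:.4f}  macro {macro_f1:.4f}  weighted {weighted_f1:.4f}")
```

The large gap between the micro and macro scores comes mostly from ORG, which has a low F1 but only 55 of the 597 gold spans, so it pulls the macro average down far more than the micro average.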
|
|