2023-10-13 13:41:03,373 ----------------------------------------------------------------------------------------------------
2023-10-13 13:41:03,375 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-13 13:41:03,375 ----------------------------------------------------------------------------------------------------
2023-10-13 13:41:03,375 MultiCorpus: 7936 train + 992 dev + 992 test sentences
- NER_ICDAR_EUROPEANA Corpus: 7936 train + 992 dev + 992 test sentences - /root/.flair/datasets/ner_icdar_europeana/fr
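For context, the corpus above is the French split of the ICDAR-Europeana NER dataset, wrapped in a MultiCorpus with a single member. A minimal loading sketch, assuming Flair's NER_ICDAR_EUROPEANA loader and its language argument behave as in recent releases:

```python
from flair.data import MultiCorpus
from flair.datasets import NER_ICDAR_EUROPEANA

# Downloads to ~/.flair/datasets/ner_icdar_europeana/fr on first use.
corpus = MultiCorpus([NER_ICDAR_EUROPEANA(language="fr")])
label_dict = corpus.make_label_dictionary(label_type="ner")  # PER / LOC / ORG span labels
# The tagger later expands these labels into the 13-tag BIOES dictionary
# (O, S-/B-/E-/I- for PER, LOC, ORG) reported at the end of this log.
```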
2023-10-13 13:41:03,375 ----------------------------------------------------------------------------------------------------
2023-10-13 13:41:03,375 Train: 7936 sentences
2023-10-13 13:41:03,375 (train_with_dev=False, train_with_test=False)
2023-10-13 13:41:03,376 ----------------------------------------------------------------------------------------------------
2023-10-13 13:41:03,376 Training Params:
2023-10-13 13:41:03,376 - learning_rate: "0.00016"
2023-10-13 13:41:03,376 - mini_batch_size: "4"
2023-10-13 13:41:03,376 - max_epochs: "10"
2023-10-13 13:41:03,376 - shuffle: "True"
2023-10-13 13:41:03,376 ----------------------------------------------------------------------------------------------------
2023-10-13 13:41:03,376 Plugins:
2023-10-13 13:41:03,376 - TensorboardLogger
2023-10-13 13:41:03,376 - LinearScheduler | warmup_fraction: '0.1'
2023-10-13 13:41:03,376 ----------------------------------------------------------------------------------------------------
2023-10-13 13:41:03,376 Final evaluation on model from best epoch (best-model.pt)
2023-10-13 13:41:03,376 - metric: "('micro avg', 'f1-score')"
2023-10-13 13:41:03,376 ----------------------------------------------------------------------------------------------------
2023-10-13 13:41:03,376 Computation:
2023-10-13 13:41:03,377 - compute on device: cuda:0
2023-10-13 13:41:03,377 - embedding storage: none
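The device and embedding-storage settings above correspond to Flair's global device and the embeddings_storage_mode option passed to the trainer; a small sketch:

```python
import torch
import flair

flair.device = torch.device("cuda:0")  # "compute on device: cuda:0"
# "embedding storage: none" means computed embeddings are not cached between
# passes; see the fine-tuning sketch below where it is passed to the trainer.
```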
2023-10-13 13:41:03,377 ----------------------------------------------------------------------------------------------------
2023-10-13 13:41:03,377 Model training base path: "hmbench-icdar/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-5"
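Putting the pieces together: the logged hyperparameters (learning rate 0.00016, mini-batch size 4, 10 epochs, linear schedule with warm-up fraction 0.1, micro-F1 model selection) map onto a ModelTrainer.fine_tune call roughly as below. This continues the embeddings and corpus sketches above and is an approximation of, not a copy of, the original script; parameter names follow recent Flair releases.

```python
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# `embeddings`, `corpus` and `label_dict` come from the sketches above.
tagger = SequenceTagger(
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,               # "crfFalse" in the base path
    use_rnn=False,
    reproject_embeddings=False,  # linear head maps 1472 features straight to 13 tags
)

trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "hmbench-icdar/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-5",
    learning_rate=0.00016,
    mini_batch_size=4,
    max_epochs=10,
    embeddings_storage_mode="none",
    main_evaluation_metric=("micro avg", "f1-score"),
    # fine_tune() uses a linear LR schedule with warm-up (the LinearScheduler plugin
    # above, warmup_fraction 0.1): the lr ramps up to 0.00016 during epoch 1 and then
    # decays linearly to 0, matching the lr column in the iteration lines below.
)
```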
2023-10-13 13:41:03,377 ----------------------------------------------------------------------------------------------------
2023-10-13 13:41:03,377 ----------------------------------------------------------------------------------------------------
2023-10-13 13:41:03,377 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-13 13:41:57,697 epoch 1 - iter 198/1984 - loss 2.53411240 - time (sec): 54.32 - samples/sec: 325.47 - lr: 0.000016 - momentum: 0.000000
2023-10-13 13:42:51,087 epoch 1 - iter 396/1984 - loss 2.34200986 - time (sec): 107.71 - samples/sec: 309.82 - lr: 0.000032 - momentum: 0.000000
2023-10-13 13:43:46,734 epoch 1 - iter 594/1984 - loss 2.00986386 - time (sec): 163.35 - samples/sec: 309.73 - lr: 0.000048 - momentum: 0.000000
2023-10-13 13:44:41,691 epoch 1 - iter 792/1984 - loss 1.71728426 - time (sec): 218.31 - samples/sec: 300.53 - lr: 0.000064 - momentum: 0.000000
2023-10-13 13:45:39,863 epoch 1 - iter 990/1984 - loss 1.47309135 - time (sec): 276.48 - samples/sec: 295.09 - lr: 0.000080 - momentum: 0.000000
2023-10-13 13:46:37,854 epoch 1 - iter 1188/1984 - loss 1.28054026 - time (sec): 334.48 - samples/sec: 291.62 - lr: 0.000096 - momentum: 0.000000
2023-10-13 13:47:32,695 epoch 1 - iter 1386/1984 - loss 1.13053154 - time (sec): 389.32 - samples/sec: 293.66 - lr: 0.000112 - momentum: 0.000000
2023-10-13 13:48:27,496 epoch 1 - iter 1584/1984 - loss 1.01770682 - time (sec): 444.12 - samples/sec: 293.40 - lr: 0.000128 - momentum: 0.000000
2023-10-13 13:49:25,748 epoch 1 - iter 1782/1984 - loss 0.91618266 - time (sec): 502.37 - samples/sec: 294.61 - lr: 0.000144 - momentum: 0.000000
2023-10-13 13:50:22,623 epoch 1 - iter 1980/1984 - loss 0.84428444 - time (sec): 559.24 - samples/sec: 292.81 - lr: 0.000160 - momentum: 0.000000
2023-10-13 13:50:23,697 ----------------------------------------------------------------------------------------------------
2023-10-13 13:50:23,697 EPOCH 1 done: loss 0.8433 - lr: 0.000160
2023-10-13 13:50:48,714 DEV : loss 0.13220053911209106 - f1-score (micro avg) 0.6771
2023-10-13 13:50:48,754 saving best model
2023-10-13 13:50:49,635 ----------------------------------------------------------------------------------------------------
2023-10-13 13:51:44,743 epoch 2 - iter 198/1984 - loss 0.15716909 - time (sec): 55.11 - samples/sec: 300.33 - lr: 0.000158 - momentum: 0.000000
2023-10-13 13:52:39,667 epoch 2 - iter 396/1984 - loss 0.14341696 - time (sec): 110.03 - samples/sec: 302.03 - lr: 0.000156 - momentum: 0.000000
2023-10-13 13:53:39,458 epoch 2 - iter 594/1984 - loss 0.13622103 - time (sec): 169.82 - samples/sec: 295.61 - lr: 0.000155 - momentum: 0.000000
2023-10-13 13:54:35,548 epoch 2 - iter 792/1984 - loss 0.13486939 - time (sec): 225.91 - samples/sec: 291.39 - lr: 0.000153 - momentum: 0.000000
2023-10-13 13:55:31,683 epoch 2 - iter 990/1984 - loss 0.13032807 - time (sec): 282.05 - samples/sec: 291.80 - lr: 0.000151 - momentum: 0.000000
2023-10-13 13:56:24,784 epoch 2 - iter 1188/1984 - loss 0.12849675 - time (sec): 335.15 - samples/sec: 294.12 - lr: 0.000149 - momentum: 0.000000
2023-10-13 13:57:19,699 epoch 2 - iter 1386/1984 - loss 0.12609233 - time (sec): 390.06 - samples/sec: 294.46 - lr: 0.000148 - momentum: 0.000000
2023-10-13 13:58:16,450 epoch 2 - iter 1584/1984 - loss 0.12288083 - time (sec): 446.81 - samples/sec: 292.77 - lr: 0.000146 - momentum: 0.000000
2023-10-13 13:59:10,814 epoch 2 - iter 1782/1984 - loss 0.12152366 - time (sec): 501.18 - samples/sec: 291.77 - lr: 0.000144 - momentum: 0.000000
2023-10-13 14:00:08,264 epoch 2 - iter 1980/1984 - loss 0.11885738 - time (sec): 558.63 - samples/sec: 293.08 - lr: 0.000142 - momentum: 0.000000
2023-10-13 14:00:09,548 ----------------------------------------------------------------------------------------------------
2023-10-13 14:00:09,549 EPOCH 2 done: loss 0.1188 - lr: 0.000142
2023-10-13 14:00:35,360 DEV : loss 0.09111367911100388 - f1-score (micro avg) 0.7334
2023-10-13 14:00:35,406 saving best model
2023-10-13 14:00:37,985 ----------------------------------------------------------------------------------------------------
2023-10-13 14:01:35,507 epoch 3 - iter 198/1984 - loss 0.06992910 - time (sec): 57.52 - samples/sec: 279.98 - lr: 0.000140 - momentum: 0.000000
2023-10-13 14:02:30,138 epoch 3 - iter 396/1984 - loss 0.07739811 - time (sec): 112.15 - samples/sec: 289.02 - lr: 0.000139 - momentum: 0.000000
2023-10-13 14:03:23,678 epoch 3 - iter 594/1984 - loss 0.07861216 - time (sec): 165.69 - samples/sec: 292.77 - lr: 0.000137 - momentum: 0.000000
2023-10-13 14:04:18,698 epoch 3 - iter 792/1984 - loss 0.07894264 - time (sec): 220.71 - samples/sec: 294.08 - lr: 0.000135 - momentum: 0.000000
2023-10-13 14:05:11,588 epoch 3 - iter 990/1984 - loss 0.07810640 - time (sec): 273.60 - samples/sec: 295.90 - lr: 0.000133 - momentum: 0.000000
2023-10-13 14:06:07,163 epoch 3 - iter 1188/1984 - loss 0.07738219 - time (sec): 329.17 - samples/sec: 296.93 - lr: 0.000132 - momentum: 0.000000
2023-10-13 14:07:04,929 epoch 3 - iter 1386/1984 - loss 0.07780412 - time (sec): 386.94 - samples/sec: 295.25 - lr: 0.000130 - momentum: 0.000000
2023-10-13 14:08:00,504 epoch 3 - iter 1584/1984 - loss 0.07607150 - time (sec): 442.51 - samples/sec: 295.09 - lr: 0.000128 - momentum: 0.000000
2023-10-13 14:08:57,596 epoch 3 - iter 1782/1984 - loss 0.07488198 - time (sec): 499.61 - samples/sec: 294.57 - lr: 0.000126 - momentum: 0.000000
2023-10-13 14:09:56,502 epoch 3 - iter 1980/1984 - loss 0.07537555 - time (sec): 558.51 - samples/sec: 293.01 - lr: 0.000125 - momentum: 0.000000
2023-10-13 14:09:57,672 ----------------------------------------------------------------------------------------------------
2023-10-13 14:09:57,673 EPOCH 3 done: loss 0.0753 - lr: 0.000125
2023-10-13 14:10:24,639 DEV : loss 0.09566155821084976 - f1-score (micro avg) 0.7588
2023-10-13 14:10:24,680 saving best model
2023-10-13 14:10:27,792 ----------------------------------------------------------------------------------------------------
2023-10-13 14:11:22,947 epoch 4 - iter 198/1984 - loss 0.05573856 - time (sec): 55.15 - samples/sec: 298.81 - lr: 0.000123 - momentum: 0.000000
2023-10-13 14:12:18,581 epoch 4 - iter 396/1984 - loss 0.05399853 - time (sec): 110.78 - samples/sec: 295.09 - lr: 0.000121 - momentum: 0.000000
2023-10-13 14:13:12,197 epoch 4 - iter 594/1984 - loss 0.05122298 - time (sec): 164.40 - samples/sec: 297.54 - lr: 0.000119 - momentum: 0.000000
2023-10-13 14:14:05,824 epoch 4 - iter 792/1984 - loss 0.05330777 - time (sec): 218.03 - samples/sec: 299.43 - lr: 0.000117 - momentum: 0.000000
2023-10-13 14:14:59,557 epoch 4 - iter 990/1984 - loss 0.05347162 - time (sec): 271.76 - samples/sec: 303.22 - lr: 0.000116 - momentum: 0.000000
2023-10-13 14:15:53,694 epoch 4 - iter 1188/1984 - loss 0.05216234 - time (sec): 325.90 - samples/sec: 304.32 - lr: 0.000114 - momentum: 0.000000
2023-10-13 14:16:49,035 epoch 4 - iter 1386/1984 - loss 0.05143628 - time (sec): 381.24 - samples/sec: 301.49 - lr: 0.000112 - momentum: 0.000000
2023-10-13 14:17:43,745 epoch 4 - iter 1584/1984 - loss 0.05184746 - time (sec): 435.95 - samples/sec: 300.80 - lr: 0.000110 - momentum: 0.000000
2023-10-13 14:18:38,926 epoch 4 - iter 1782/1984 - loss 0.05298005 - time (sec): 491.13 - samples/sec: 300.19 - lr: 0.000109 - momentum: 0.000000
2023-10-13 14:19:32,075 epoch 4 - iter 1980/1984 - loss 0.05388844 - time (sec): 544.28 - samples/sec: 300.75 - lr: 0.000107 - momentum: 0.000000
2023-10-13 14:19:33,222 ----------------------------------------------------------------------------------------------------
2023-10-13 14:19:33,223 EPOCH 4 done: loss 0.0538 - lr: 0.000107
2023-10-13 14:20:00,317 DEV : loss 0.12870270013809204 - f1-score (micro avg) 0.7573
2023-10-13 14:20:00,358 ----------------------------------------------------------------------------------------------------
2023-10-13 14:20:53,615 epoch 5 - iter 198/1984 - loss 0.03435675 - time (sec): 53.25 - samples/sec: 315.28 - lr: 0.000105 - momentum: 0.000000
2023-10-13 14:21:44,798 epoch 5 - iter 396/1984 - loss 0.03418837 - time (sec): 104.44 - samples/sec: 319.18 - lr: 0.000103 - momentum: 0.000000
2023-10-13 14:22:37,689 epoch 5 - iter 594/1984 - loss 0.03735641 - time (sec): 157.33 - samples/sec: 318.10 - lr: 0.000101 - momentum: 0.000000
2023-10-13 14:23:31,434 epoch 5 - iter 792/1984 - loss 0.03848171 - time (sec): 211.07 - samples/sec: 314.78 - lr: 0.000100 - momentum: 0.000000
2023-10-13 14:24:27,121 epoch 5 - iter 990/1984 - loss 0.03772221 - time (sec): 266.76 - samples/sec: 309.25 - lr: 0.000098 - momentum: 0.000000
2023-10-13 14:25:20,412 epoch 5 - iter 1188/1984 - loss 0.03984799 - time (sec): 320.05 - samples/sec: 307.07 - lr: 0.000096 - momentum: 0.000000
2023-10-13 14:26:15,156 epoch 5 - iter 1386/1984 - loss 0.03966161 - time (sec): 374.80 - samples/sec: 304.75 - lr: 0.000094 - momentum: 0.000000
2023-10-13 14:27:08,199 epoch 5 - iter 1584/1984 - loss 0.03987275 - time (sec): 427.84 - samples/sec: 306.78 - lr: 0.000093 - momentum: 0.000000
2023-10-13 14:28:03,847 epoch 5 - iter 1782/1984 - loss 0.04040681 - time (sec): 483.49 - samples/sec: 305.38 - lr: 0.000091 - momentum: 0.000000
2023-10-13 14:28:57,477 epoch 5 - iter 1980/1984 - loss 0.03995813 - time (sec): 537.12 - samples/sec: 304.92 - lr: 0.000089 - momentum: 0.000000
2023-10-13 14:28:58,562 ----------------------------------------------------------------------------------------------------
2023-10-13 14:28:58,563 EPOCH 5 done: loss 0.0399 - lr: 0.000089
2023-10-13 14:29:25,637 DEV : loss 0.15645428001880646 - f1-score (micro avg) 0.7602
2023-10-13 14:29:25,679 saving best model
2023-10-13 14:29:28,321 ----------------------------------------------------------------------------------------------------
2023-10-13 14:30:21,475 epoch 6 - iter 198/1984 - loss 0.02467542 - time (sec): 53.15 - samples/sec: 321.45 - lr: 0.000087 - momentum: 0.000000
2023-10-13 14:31:13,093 epoch 6 - iter 396/1984 - loss 0.02376319 - time (sec): 104.77 - samples/sec: 320.03 - lr: 0.000085 - momentum: 0.000000
2023-10-13 14:32:04,586 epoch 6 - iter 594/1984 - loss 0.02474508 - time (sec): 156.26 - samples/sec: 315.70 - lr: 0.000084 - momentum: 0.000000
2023-10-13 14:32:57,169 epoch 6 - iter 792/1984 - loss 0.02868125 - time (sec): 208.84 - samples/sec: 315.69 - lr: 0.000082 - momentum: 0.000000
2023-10-13 14:33:51,982 epoch 6 - iter 990/1984 - loss 0.02813004 - time (sec): 263.66 - samples/sec: 311.99 - lr: 0.000080 - momentum: 0.000000
2023-10-13 14:34:49,635 epoch 6 - iter 1188/1984 - loss 0.02767063 - time (sec): 321.31 - samples/sec: 306.76 - lr: 0.000078 - momentum: 0.000000
2023-10-13 14:35:44,096 epoch 6 - iter 1386/1984 - loss 0.02786452 - time (sec): 375.77 - samples/sec: 305.99 - lr: 0.000077 - momentum: 0.000000
2023-10-13 14:36:36,019 epoch 6 - iter 1584/1984 - loss 0.02885341 - time (sec): 427.69 - samples/sec: 306.12 - lr: 0.000075 - momentum: 0.000000
2023-10-13 14:37:29,223 epoch 6 - iter 1782/1984 - loss 0.02817717 - time (sec): 480.90 - samples/sec: 306.03 - lr: 0.000073 - momentum: 0.000000
2023-10-13 14:38:27,626 epoch 6 - iter 1980/1984 - loss 0.02939511 - time (sec): 539.30 - samples/sec: 303.56 - lr: 0.000071 - momentum: 0.000000
2023-10-13 14:38:28,789 ----------------------------------------------------------------------------------------------------
2023-10-13 14:38:28,789 EPOCH 6 done: loss 0.0295 - lr: 0.000071
2023-10-13 14:38:56,206 DEV : loss 0.16446241736412048 - f1-score (micro avg) 0.7492
2023-10-13 14:38:56,257 ----------------------------------------------------------------------------------------------------
2023-10-13 14:39:51,637 epoch 7 - iter 198/1984 - loss 0.01809156 - time (sec): 55.38 - samples/sec: 297.72 - lr: 0.000069 - momentum: 0.000000
2023-10-13 14:40:46,224 epoch 7 - iter 396/1984 - loss 0.01613317 - time (sec): 109.96 - samples/sec: 293.23 - lr: 0.000068 - momentum: 0.000000
2023-10-13 14:41:41,377 epoch 7 - iter 594/1984 - loss 0.01702457 - time (sec): 165.12 - samples/sec: 297.91 - lr: 0.000066 - momentum: 0.000000
2023-10-13 14:42:36,232 epoch 7 - iter 792/1984 - loss 0.01784643 - time (sec): 219.97 - samples/sec: 296.04 - lr: 0.000064 - momentum: 0.000000
2023-10-13 14:43:31,504 epoch 7 - iter 990/1984 - loss 0.01766601 - time (sec): 275.24 - samples/sec: 296.10 - lr: 0.000062 - momentum: 0.000000
2023-10-13 14:44:29,053 epoch 7 - iter 1188/1984 - loss 0.01798869 - time (sec): 332.79 - samples/sec: 293.84 - lr: 0.000061 - momentum: 0.000000
2023-10-13 14:45:23,368 epoch 7 - iter 1386/1984 - loss 0.01843451 - time (sec): 387.11 - samples/sec: 294.70 - lr: 0.000059 - momentum: 0.000000
2023-10-13 14:46:15,280 epoch 7 - iter 1584/1984 - loss 0.01841142 - time (sec): 439.02 - samples/sec: 296.13 - lr: 0.000057 - momentum: 0.000000
2023-10-13 14:47:10,847 epoch 7 - iter 1782/1984 - loss 0.01957244 - time (sec): 494.59 - samples/sec: 297.35 - lr: 0.000055 - momentum: 0.000000
2023-10-13 14:48:05,500 epoch 7 - iter 1980/1984 - loss 0.02031150 - time (sec): 549.24 - samples/sec: 298.17 - lr: 0.000053 - momentum: 0.000000
2023-10-13 14:48:06,548 ----------------------------------------------------------------------------------------------------
2023-10-13 14:48:06,548 EPOCH 7 done: loss 0.0203 - lr: 0.000053
2023-10-13 14:48:34,612 DEV : loss 0.19750244915485382 - f1-score (micro avg) 0.7567
2023-10-13 14:48:34,664 ----------------------------------------------------------------------------------------------------
2023-10-13 14:49:28,793 epoch 8 - iter 198/1984 - loss 0.01418068 - time (sec): 54.13 - samples/sec: 304.44 - lr: 0.000052 - momentum: 0.000000
2023-10-13 14:50:22,561 epoch 8 - iter 396/1984 - loss 0.01608296 - time (sec): 107.90 - samples/sec: 307.64 - lr: 0.000050 - momentum: 0.000000
2023-10-13 14:51:15,361 epoch 8 - iter 594/1984 - loss 0.01364728 - time (sec): 160.69 - samples/sec: 306.85 - lr: 0.000048 - momentum: 0.000000
2023-10-13 14:52:08,131 epoch 8 - iter 792/1984 - loss 0.01376922 - time (sec): 213.46 - samples/sec: 309.15 - lr: 0.000046 - momentum: 0.000000
2023-10-13 14:53:02,352 epoch 8 - iter 990/1984 - loss 0.01311662 - time (sec): 267.69 - samples/sec: 307.25 - lr: 0.000045 - momentum: 0.000000
2023-10-13 14:53:57,556 epoch 8 - iter 1188/1984 - loss 0.01375357 - time (sec): 322.89 - samples/sec: 305.33 - lr: 0.000043 - momentum: 0.000000
2023-10-13 14:54:52,392 epoch 8 - iter 1386/1984 - loss 0.01337745 - time (sec): 377.73 - samples/sec: 304.21 - lr: 0.000041 - momentum: 0.000000
2023-10-13 14:55:49,698 epoch 8 - iter 1584/1984 - loss 0.01404364 - time (sec): 435.03 - samples/sec: 300.09 - lr: 0.000039 - momentum: 0.000000
2023-10-13 14:56:44,413 epoch 8 - iter 1782/1984 - loss 0.01463217 - time (sec): 489.75 - samples/sec: 300.14 - lr: 0.000037 - momentum: 0.000000
2023-10-13 14:57:39,656 epoch 8 - iter 1980/1984 - loss 0.01450393 - time (sec): 544.99 - samples/sec: 300.46 - lr: 0.000036 - momentum: 0.000000
2023-10-13 14:57:40,732 ----------------------------------------------------------------------------------------------------
2023-10-13 14:57:40,733 EPOCH 8 done: loss 0.0145 - lr: 0.000036
2023-10-13 14:58:09,479 DEV : loss 0.21577665209770203 - f1-score (micro avg) 0.7529
2023-10-13 14:58:09,519 ----------------------------------------------------------------------------------------------------
2023-10-13 14:59:03,230 epoch 9 - iter 198/1984 - loss 0.00877228 - time (sec): 53.71 - samples/sec: 293.64 - lr: 0.000034 - momentum: 0.000000
2023-10-13 15:00:00,025 epoch 9 - iter 396/1984 - loss 0.00937105 - time (sec): 110.50 - samples/sec: 287.85 - lr: 0.000032 - momentum: 0.000000
2023-10-13 15:00:54,419 epoch 9 - iter 594/1984 - loss 0.00856868 - time (sec): 164.90 - samples/sec: 293.52 - lr: 0.000030 - momentum: 0.000000
2023-10-13 15:01:49,560 epoch 9 - iter 792/1984 - loss 0.00989234 - time (sec): 220.04 - samples/sec: 295.67 - lr: 0.000029 - momentum: 0.000000
2023-10-13 15:02:47,168 epoch 9 - iter 990/1984 - loss 0.01106797 - time (sec): 277.65 - samples/sec: 292.62 - lr: 0.000027 - momentum: 0.000000
2023-10-13 15:03:41,878 epoch 9 - iter 1188/1984 - loss 0.01081355 - time (sec): 332.36 - samples/sec: 288.46 - lr: 0.000025 - momentum: 0.000000
2023-10-13 15:04:36,367 epoch 9 - iter 1386/1984 - loss 0.01050288 - time (sec): 386.85 - samples/sec: 292.92 - lr: 0.000023 - momentum: 0.000000
2023-10-13 15:05:30,076 epoch 9 - iter 1584/1984 - loss 0.01056348 - time (sec): 440.55 - samples/sec: 294.63 - lr: 0.000021 - momentum: 0.000000
2023-10-13 15:06:27,189 epoch 9 - iter 1782/1984 - loss 0.01102912 - time (sec): 497.67 - samples/sec: 295.61 - lr: 0.000020 - momentum: 0.000000
2023-10-13 15:07:18,598 epoch 9 - iter 1980/1984 - loss 0.01087141 - time (sec): 549.08 - samples/sec: 297.93 - lr: 0.000018 - momentum: 0.000000
2023-10-13 15:07:19,699 ----------------------------------------------------------------------------------------------------
2023-10-13 15:07:19,699 EPOCH 9 done: loss 0.0108 - lr: 0.000018
2023-10-13 15:07:45,626 DEV : loss 0.22953416407108307 - f1-score (micro avg) 0.7647
2023-10-13 15:07:45,671 saving best model
2023-10-13 15:07:48,309 ----------------------------------------------------------------------------------------------------
2023-10-13 15:08:44,790 epoch 10 - iter 198/1984 - loss 0.00464760 - time (sec): 56.48 - samples/sec: 299.72 - lr: 0.000016 - momentum: 0.000000
2023-10-13 15:09:43,385 epoch 10 - iter 396/1984 - loss 0.00822453 - time (sec): 115.07 - samples/sec: 283.39 - lr: 0.000014 - momentum: 0.000000
2023-10-13 15:10:37,532 epoch 10 - iter 594/1984 - loss 0.00828961 - time (sec): 169.22 - samples/sec: 286.73 - lr: 0.000013 - momentum: 0.000000
2023-10-13 15:11:31,359 epoch 10 - iter 792/1984 - loss 0.00709603 - time (sec): 223.04 - samples/sec: 288.73 - lr: 0.000011 - momentum: 0.000000
2023-10-13 15:12:28,943 epoch 10 - iter 990/1984 - loss 0.00651970 - time (sec): 280.63 - samples/sec: 289.30 - lr: 0.000009 - momentum: 0.000000
2023-10-13 15:13:24,258 epoch 10 - iter 1188/1984 - loss 0.00667521 - time (sec): 335.94 - samples/sec: 291.68 - lr: 0.000007 - momentum: 0.000000
2023-10-13 15:14:23,470 epoch 10 - iter 1386/1984 - loss 0.00650330 - time (sec): 395.16 - samples/sec: 290.20 - lr: 0.000005 - momentum: 0.000000
2023-10-13 15:15:17,651 epoch 10 - iter 1584/1984 - loss 0.00684118 - time (sec): 449.34 - samples/sec: 292.52 - lr: 0.000004 - momentum: 0.000000
2023-10-13 15:16:11,772 epoch 10 - iter 1782/1984 - loss 0.00684599 - time (sec): 503.46 - samples/sec: 293.68 - lr: 0.000002 - momentum: 0.000000
2023-10-13 15:17:08,238 epoch 10 - iter 1980/1984 - loss 0.00707841 - time (sec): 559.92 - samples/sec: 292.19 - lr: 0.000000 - momentum: 0.000000
2023-10-13 15:17:09,533 ----------------------------------------------------------------------------------------------------
2023-10-13 15:17:09,534 EPOCH 10 done: loss 0.0071 - lr: 0.000000
2023-10-13 15:17:37,000 DEV : loss 0.23275645077228546 - f1-score (micro avg) 0.76
2023-10-13 15:17:38,020 ----------------------------------------------------------------------------------------------------
2023-10-13 15:17:38,023 Loading model from best epoch ...
2023-10-13 15:17:42,458 SequenceTagger predicts: Dictionary with 13 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-13 15:18:07,926
Results:
- F-score (micro) 0.7594
- F-score (macro) 0.6623
- Accuracy 0.6372
By class:
              precision    recall  f1-score   support

         LOC     0.8053    0.8397    0.8221       655
         PER     0.7076    0.7489    0.7277       223
         ORG     0.5341    0.3701    0.4372       127

   micro avg     0.7587    0.7602    0.7594      1005
   macro avg     0.6823    0.6529    0.6623      1005
weighted avg     0.7493    0.7602    0.7525      1005
2023-10-13 15:18:07,927 ----------------------------------------------------------------------------------------------------
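Finally, the best-model.pt selected above can be loaded for inference with the standard Flair API. A minimal usage sketch; the example sentence is invented:

```python
from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load(
    "hmbench-icdar/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-5/best-model.pt"
)

sentence = Sentence("Le président de la République est arrivé à Paris .")  # invented example
tagger.predict(sentence)
for span in sentence.get_spans("ner"):
    print(span)  # prints each predicted PER / LOC / ORG span with its score
```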