2023-10-12 08:04:12,035 ----------------------------------------------------------------------------------------------------
2023-10-12 08:04:12,037 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-12 08:04:12,038 ----------------------------------------------------------------------------------------------------
2023-10-12 08:04:12,038 MultiCorpus: 7936 train + 992 dev + 992 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 7936 train + 992 dev + 992 test sentences - /root/.flair/datasets/ner_icdar_europeana/fr
2023-10-12 08:04:12,038 ----------------------------------------------------------------------------------------------------
2023-10-12 08:04:12,038 Train: 7936 sentences
2023-10-12 08:04:12,038 (train_with_dev=False, train_with_test=False)
2023-10-12 08:04:12,038 ----------------------------------------------------------------------------------------------------
2023-10-12 08:04:12,038 Training Params:
2023-10-12 08:04:12,039 - learning_rate: "0.00015"
2023-10-12 08:04:12,039 - mini_batch_size: "8"
2023-10-12 08:04:12,039 - max_epochs: "10"
2023-10-12 08:04:12,039 - shuffle: "True"
2023-10-12 08:04:12,039 ----------------------------------------------------------------------------------------------------
2023-10-12 08:04:12,039 Plugins:
2023-10-12 08:04:12,039 - TensorboardLogger
2023-10-12 08:04:12,039 - LinearScheduler | warmup_fraction: '0.1'
2023-10-12 08:04:12,039 ----------------------------------------------------------------------------------------------------
2023-10-12 08:04:12,039 Final evaluation on model from best epoch (best-model.pt)
2023-10-12 08:04:12,039 - metric: "('micro avg', 'f1-score')"
2023-10-12 08:04:12,039 ----------------------------------------------------------------------------------------------------
2023-10-12 08:04:12,039 Computation:
2023-10-12 08:04:12,040 - compute on device: cuda:0
2023-10-12 08:04:12,040 - embedding storage: none
2023-10-12 08:04:12,040 ----------------------------------------------------------------------------------------------------
2023-10-12 08:04:12,040 Model training base path: "hmbench-icdar/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-1"
2023-10-12 08:04:12,040 ----------------------------------------------------------------------------------------------------
2023-10-12 08:04:12,040 ----------------------------------------------------------------------------------------------------
2023-10-12 08:04:12,040 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-12 08:05:06,449 epoch 1 - iter 99/992 - loss 2.58555570 - time (sec): 54.41 - samples/sec: 284.28 - lr: 0.000015 - momentum: 0.000000
2023-10-12 08:05:56,282 epoch 1 - iter 198/992 - loss 2.53686260 - time (sec): 104.24 - samples/sec: 302.44 - lr: 0.000030 - momentum: 0.000000
2023-10-12 08:06:46,233 epoch 1 - iter 297/992 - loss 2.34204530 - time (sec): 154.19 - samples/sec: 310.69 - lr: 0.000045 - momentum: 0.000000
2023-10-12 08:07:34,661 epoch 1 - iter 396/992 - loss 2.08169120 - time (sec): 202.62 - samples/sec: 317.45 - lr: 0.000060 - momentum: 0.000000
2023-10-12 08:08:24,602 epoch 1 - iter 495/992 - loss 1.82694454 - time (sec): 252.56 - samples/sec: 317.85 - lr: 0.000075 - momentum: 0.000000
2023-10-12 08:09:14,534 epoch 1 - iter 594/992 - loss 1.59247411 - time (sec): 302.49 - samples/sec: 320.69 - lr: 0.000090 - momentum: 0.000000
2023-10-12 08:10:03,550 epoch 1 - iter 693/992 - loss 1.39639771 - time (sec): 351.51 - samples/sec: 325.34 - lr: 0.000105 - momentum: 0.000000
2023-10-12 08:10:59,316 epoch 1 - iter 792/992 - loss 1.25466641 - time (sec): 407.27 - samples/sec: 321.81 - lr: 0.000120 - momentum: 0.000000
2023-10-12 08:11:53,277 epoch 1 - iter 891/992 - loss 1.14679909 - time (sec): 461.23 - samples/sec: 319.44 - lr: 0.000135 - momentum: 0.000000
2023-10-12 08:12:42,482 epoch 1 - iter 990/992 - loss 1.05527957 - time (sec): 510.44 - samples/sec: 320.66 - lr: 0.000150 - momentum: 0.000000
2023-10-12 08:12:43,887 ----------------------------------------------------------------------------------------------------
2023-10-12 08:12:43,888 EPOCH 1 done: loss 1.0537 - lr: 0.000150
2023-10-12 08:13:10,955 DEV : loss 0.18303227424621582 - f1-score (micro avg) 0.355
2023-10-12 08:13:11,005 saving best model
2023-10-12 08:13:11,967 ----------------------------------------------------------------------------------------------------
2023-10-12 08:14:02,123 epoch 2 - iter 99/992 - loss 0.24094875 - time (sec): 50.15 - samples/sec: 328.44 - lr: 0.000148 - momentum: 0.000000
2023-10-12 08:14:55,551 epoch 2 - iter 198/992 - loss 0.20476026 - time (sec): 103.58 - samples/sec: 315.61 - lr: 0.000147 - momentum: 0.000000
2023-10-12 08:15:49,201 epoch 2 - iter 297/992 - loss 0.19182712 - time (sec): 157.23 - samples/sec: 311.94 - lr: 0.000145 - momentum: 0.000000
2023-10-12 08:16:42,272 epoch 2 - iter 396/992 - loss 0.18719035 - time (sec): 210.30 - samples/sec: 312.52 - lr: 0.000143 - momentum: 0.000000
2023-10-12 08:17:36,327 epoch 2 - iter 495/992 - loss 0.18025399 - time (sec): 264.36 - samples/sec: 311.59 - lr: 0.000142 - momentum: 0.000000
2023-10-12 08:18:27,193 epoch 2 - iter 594/992 - loss 0.17568004 - time (sec): 315.22 - samples/sec: 312.61 - lr: 0.000140 - momentum: 0.000000
2023-10-12 08:19:21,888 epoch 2 - iter 693/992 - loss 0.17273836 - time (sec): 369.92 - samples/sec: 311.10 - lr: 0.000138 - momentum: 0.000000
2023-10-12 08:20:16,613 epoch 2 - iter 792/992 - loss 0.16679007 - time (sec): 424.64 - samples/sec: 308.04 - lr: 0.000137 - momentum: 0.000000
2023-10-12 08:21:07,673 epoch 2 - iter 891/992 - loss 0.16155770 - time (sec): 475.70 - samples/sec: 308.99 - lr: 0.000135 - momentum: 0.000000
2023-10-12 08:22:02,055 epoch 2 - iter 990/992 - loss 0.15729141 - time (sec): 530.08 - samples/sec: 308.45 - lr: 0.000133 - momentum: 0.000000
2023-10-12 08:22:03,276 ----------------------------------------------------------------------------------------------------
2023-10-12 08:22:03,276 EPOCH 2 done: loss 0.1570 - lr: 0.000133
2023-10-12 08:22:30,282 DEV : loss 0.0930715873837471 - f1-score (micro avg) 0.7059
2023-10-12 08:22:30,322 saving best model
2023-10-12 08:22:33,238 ----------------------------------------------------------------------------------------------------
2023-10-12 08:23:31,054 epoch 3 - iter 99/992 - loss 0.09366490 - time (sec): 57.81 - samples/sec: 272.24 - lr: 0.000132 - momentum: 0.000000
2023-10-12 08:24:26,452 epoch 3 - iter 198/992 - loss 0.09371713 - time (sec): 113.21 - samples/sec: 282.27 - lr: 0.000130 - momentum: 0.000000
2023-10-12 08:25:20,392 epoch 3 - iter 297/992 - loss 0.09379383 - time (sec): 167.15 - samples/sec: 291.97 - lr: 0.000128 - momentum: 0.000000
2023-10-12 08:26:13,353 epoch 3 - iter 396/992 - loss 0.09391945 - time (sec): 220.11 - samples/sec: 296.14 - lr: 0.000127 - momentum: 0.000000
2023-10-12 08:27:04,613 epoch 3 - iter 495/992 - loss 0.09225266 - time (sec): 271.37 - samples/sec: 299.92 - lr: 0.000125 - momentum: 0.000000
2023-10-12 08:27:58,875 epoch 3 - iter 594/992 - loss 0.09074357 - time (sec): 325.63 - samples/sec: 299.29 - lr: 0.000123 - momentum: 0.000000
2023-10-12 08:28:48,568 epoch 3 - iter 693/992 - loss 0.09041959 - time (sec): 375.33 - samples/sec: 301.62 - lr: 0.000122 - momentum: 0.000000
2023-10-12 08:29:39,210 epoch 3 - iter 792/992 - loss 0.08851234 - time (sec): 425.97 - samples/sec: 307.21 - lr: 0.000120 - momentum: 0.000000
2023-10-12 08:30:29,714 epoch 3 - iter 891/992 - loss 0.08699505 - time (sec): 476.47 - samples/sec: 310.07 - lr: 0.000118 - momentum: 0.000000
2023-10-12 08:31:24,309 epoch 3 - iter 990/992 - loss 0.08693648 - time (sec): 531.07 - samples/sec: 308.23 - lr: 0.000117 - momentum: 0.000000
2023-10-12 08:31:25,444 ----------------------------------------------------------------------------------------------------
2023-10-12 08:31:25,445 EPOCH 3 done: loss 0.0869 - lr: 0.000117
2023-10-12 08:31:51,548 DEV : loss 0.09189649671316147 - f1-score (micro avg) 0.7402
2023-10-12 08:31:51,594 saving best model
2023-10-12 08:31:54,213 ----------------------------------------------------------------------------------------------------
2023-10-12 08:32:43,918 epoch 4 - iter 99/992 - loss 0.06142325 - time (sec): 49.70 - samples/sec: 344.89 - lr: 0.000115 - momentum: 0.000000
2023-10-12 08:33:36,309 epoch 4 - iter 198/992 - loss 0.06089639 - time (sec): 102.09 - samples/sec: 333.62 - lr: 0.000113 - momentum: 0.000000
2023-10-12 08:34:25,710 epoch 4 - iter 297/992 - loss 0.06267594 - time (sec): 151.49 - samples/sec: 329.91 - lr: 0.000112 - momentum: 0.000000
2023-10-12 08:35:15,366 epoch 4 - iter 396/992 - loss 0.06005022 - time (sec): 201.15 - samples/sec: 326.69 - lr: 0.000110 - momentum: 0.000000
2023-10-12 08:36:07,476 epoch 4 - iter 495/992 - loss 0.05963130 - time (sec): 253.26 - samples/sec: 323.68 - lr: 0.000108 - momentum: 0.000000
2023-10-12 08:36:57,931 epoch 4 - iter 594/992 - loss 0.05891404 - time (sec): 303.71 - samples/sec: 322.82 - lr: 0.000107 - momentum: 0.000000
2023-10-12 08:37:46,481 epoch 4 - iter 693/992 - loss 0.05887390 - time (sec): 352.26 - samples/sec: 325.28 - lr: 0.000105 - momentum: 0.000000
2023-10-12 08:38:34,872 epoch 4 - iter 792/992 - loss 0.05887289 - time (sec): 400.65 - samples/sec: 326.32 - lr: 0.000103 - momentum: 0.000000
2023-10-12 08:39:29,526 epoch 4 - iter 891/992 - loss 0.05755141 - time (sec): 455.31 - samples/sec: 324.49 - lr: 0.000102 - momentum: 0.000000
2023-10-12 08:40:24,825 epoch 4 - iter 990/992 - loss 0.05782701 - time (sec): 510.61 - samples/sec: 320.69 - lr: 0.000100 - momentum: 0.000000
2023-10-12 08:40:25,816 ----------------------------------------------------------------------------------------------------
2023-10-12 08:40:25,816 EPOCH 4 done: loss 0.0578 - lr: 0.000100
2023-10-12 08:40:51,595 DEV : loss 0.09931203722953796 - f1-score (micro avg) 0.7623
2023-10-12 08:40:51,635 saving best model
2023-10-12 08:40:57,442 ----------------------------------------------------------------------------------------------------
2023-10-12 08:41:49,152 epoch 5 - iter 99/992 - loss 0.04436841 - time (sec): 51.71 - samples/sec: 312.44 - lr: 0.000098 - momentum: 0.000000
2023-10-12 08:42:42,375 epoch 5 - iter 198/992 - loss 0.03706072 - time (sec): 104.93 - samples/sec: 308.20 - lr: 0.000097 - momentum: 0.000000
2023-10-12 08:43:35,907 epoch 5 - iter 297/992 - loss 0.03821323 - time (sec): 158.46 - samples/sec: 307.02 - lr: 0.000095 - momentum: 0.000000
2023-10-12 08:44:31,403 epoch 5 - iter 396/992 - loss 0.03912269 - time (sec): 213.96 - samples/sec: 304.26 - lr: 0.000093 - momentum: 0.000000
2023-10-12 08:45:22,108 epoch 5 - iter 495/992 - loss 0.03917046 - time (sec): 264.66 - samples/sec: 307.19 - lr: 0.000092 - momentum: 0.000000
2023-10-12 08:46:11,499 epoch 5 - iter 594/992 - loss 0.04022926 - time (sec): 314.05 - samples/sec: 311.81 - lr: 0.000090 - momentum: 0.000000
2023-10-12 08:46:59,993 epoch 5 - iter 693/992 - loss 0.03989622 - time (sec): 362.55 - samples/sec: 315.65 - lr: 0.000088 - momentum: 0.000000
2023-10-12 08:47:58,985 epoch 5 - iter 792/992 - loss 0.04056929 - time (sec): 421.54 - samples/sec: 311.55 - lr: 0.000087 - momentum: 0.000000
2023-10-12 08:48:50,892 epoch 5 - iter 891/992 - loss 0.04088817 - time (sec): 473.45 - samples/sec: 312.27 - lr: 0.000085 - momentum: 0.000000
2023-10-12 08:49:38,976 epoch 5 - iter 990/992 - loss 0.04156030 - time (sec): 521.53 - samples/sec: 313.73 - lr: 0.000083 - momentum: 0.000000
2023-10-12 08:49:40,070 ----------------------------------------------------------------------------------------------------
2023-10-12 08:49:40,071 EPOCH 5 done: loss 0.0415 - lr: 0.000083
2023-10-12 08:50:06,253 DEV : loss 0.11372340470552444 - f1-score (micro avg) 0.756
2023-10-12 08:50:06,293 ----------------------------------------------------------------------------------------------------
2023-10-12 08:50:55,831 epoch 6 - iter 99/992 - loss 0.02534475 - time (sec): 49.54 - samples/sec: 316.24 - lr: 0.000082 - momentum: 0.000000
2023-10-12 08:51:50,289 epoch 6 - iter 198/992 - loss 0.02728538 - time (sec): 103.99 - samples/sec: 307.91 - lr: 0.000080 - momentum: 0.000000
2023-10-12 08:52:44,094 epoch 6 - iter 297/992 - loss 0.02693384 - time (sec): 157.80 - samples/sec: 305.30 - lr: 0.000078 - momentum: 0.000000
2023-10-12 08:53:36,994 epoch 6 - iter 396/992 - loss 0.02900133 - time (sec): 210.70 - samples/sec: 309.30 - lr: 0.000077 - momentum: 0.000000
2023-10-12 08:54:29,419 epoch 6 - iter 495/992 - loss 0.02831503 - time (sec): 263.12 - samples/sec: 308.36 - lr: 0.000075 - momentum: 0.000000
2023-10-12 08:55:19,177 epoch 6 - iter 594/992 - loss 0.02808324 - time (sec): 312.88 - samples/sec: 312.68 - lr: 0.000073 - momentum: 0.000000
2023-10-12 08:56:07,273 epoch 6 - iter 693/992 - loss 0.02892834 - time (sec): 360.98 - samples/sec: 317.73 - lr: 0.000072 - momentum: 0.000000
2023-10-12 08:56:55,649 epoch 6 - iter 792/992 - loss 0.03069744 - time (sec): 409.35 - samples/sec: 319.35 - lr: 0.000070 - momentum: 0.000000
2023-10-12 08:57:50,900 epoch 6 - iter 891/992 - loss 0.03124301 - time (sec): 464.60 - samples/sec: 317.11 - lr: 0.000068 - momentum: 0.000000
2023-10-12 08:58:43,506 epoch 6 - iter 990/992 - loss 0.03144417 - time (sec): 517.21 - samples/sec: 316.34 - lr: 0.000067 - momentum: 0.000000
2023-10-12 08:58:44,493 ----------------------------------------------------------------------------------------------------
2023-10-12 08:58:44,493 EPOCH 6 done: loss 0.0314 - lr: 0.000067
2023-10-12 08:59:08,717 DEV : loss 0.13427288830280304 - f1-score (micro avg) 0.7743
2023-10-12 08:59:08,761 saving best model
2023-10-12 08:59:11,805 ----------------------------------------------------------------------------------------------------
2023-10-12 09:00:02,976 epoch 7 - iter 99/992 - loss 0.01884774 - time (sec): 51.17 - samples/sec: 318.71 - lr: 0.000065 - momentum: 0.000000
2023-10-12 09:00:51,080 epoch 7 - iter 198/992 - loss 0.02301048 - time (sec): 99.27 - samples/sec: 332.12 - lr: 0.000063 - momentum: 0.000000
2023-10-12 09:01:38,861 epoch 7 - iter 297/992 - loss 0.02287236 - time (sec): 147.05 - samples/sec: 332.81 - lr: 0.000062 - momentum: 0.000000
2023-10-12 09:02:26,775 epoch 7 - iter 396/992 - loss 0.02348912 - time (sec): 194.97 - samples/sec: 336.71 - lr: 0.000060 - momentum: 0.000000
2023-10-12 09:03:14,307 epoch 7 - iter 495/992 - loss 0.02304502 - time (sec): 242.50 - samples/sec: 336.33 - lr: 0.000058 - momentum: 0.000000
2023-10-12 09:04:01,858 epoch 7 - iter 594/992 - loss 0.02320461 - time (sec): 290.05 - samples/sec: 337.37 - lr: 0.000057 - momentum: 0.000000
2023-10-12 09:04:51,398 epoch 7 - iter 693/992 - loss 0.02410117 - time (sec): 339.59 - samples/sec: 337.92 - lr: 0.000055 - momentum: 0.000000
2023-10-12 09:05:37,198 epoch 7 - iter 792/992 - loss 0.02467991 - time (sec): 385.39 - samples/sec: 336.38 - lr: 0.000053 - momentum: 0.000000
2023-10-12 09:06:24,113 epoch 7 - iter 891/992 - loss 0.02411616 - time (sec): 432.30 - samples/sec: 338.45 - lr: 0.000052 - momentum: 0.000000
2023-10-12 09:07:11,294 epoch 7 - iter 990/992 - loss 0.02385416 - time (sec): 479.48 - samples/sec: 341.20 - lr: 0.000050 - momentum: 0.000000
2023-10-12 09:07:12,268 ----------------------------------------------------------------------------------------------------
2023-10-12 09:07:12,268 EPOCH 7 done: loss 0.0238 - lr: 0.000050
2023-10-12 09:07:37,616 DEV : loss 0.16945815086364746 - f1-score (micro avg) 0.7625
2023-10-12 09:07:37,658 ----------------------------------------------------------------------------------------------------
2023-10-12 09:08:25,636 epoch 8 - iter 99/992 - loss 0.01826581 - time (sec): 47.98 - samples/sec: 352.82 - lr: 0.000048 - momentum: 0.000000
2023-10-12 09:09:15,818 epoch 8 - iter 198/992 - loss 0.01797263 - time (sec): 98.16 - samples/sec: 328.55 - lr: 0.000047 - momentum: 0.000000
2023-10-12 09:10:08,897 epoch 8 - iter 297/992 - loss 0.01946032 - time (sec): 151.24 - samples/sec: 315.74 - lr: 0.000045 - momentum: 0.000000
2023-10-12 09:10:58,922 epoch 8 - iter 396/992 - loss 0.01952599 - time (sec): 201.26 - samples/sec: 316.86 - lr: 0.000043 - momentum: 0.000000
2023-10-12 09:11:52,850 epoch 8 - iter 495/992 - loss 0.01857797 - time (sec): 255.19 - samples/sec: 315.56 - lr: 0.000042 - momentum: 0.000000
2023-10-12 09:12:45,735 epoch 8 - iter 594/992 - loss 0.02004084 - time (sec): 308.08 - samples/sec: 317.70 - lr: 0.000040 - momentum: 0.000000
2023-10-12 09:13:36,327 epoch 8 - iter 693/992 - loss 0.02014586 - time (sec): 358.67 - samples/sec: 317.68 - lr: 0.000038 - momentum: 0.000000
2023-10-12 09:14:30,138 epoch 8 - iter 792/992 - loss 0.01980906 - time (sec): 412.48 - samples/sec: 316.70 - lr: 0.000037 - momentum: 0.000000
2023-10-12 09:15:21,623 epoch 8 - iter 891/992 - loss 0.02006957 - time (sec): 463.96 - samples/sec: 316.10 - lr: 0.000035 - momentum: 0.000000
2023-10-12 09:16:09,180 epoch 8 - iter 990/992 - loss 0.01945183 - time (sec): 511.52 - samples/sec: 320.12 - lr: 0.000033 - momentum: 0.000000
2023-10-12 09:16:10,068 ----------------------------------------------------------------------------------------------------
2023-10-12 09:16:10,068 EPOCH 8 done: loss 0.0194 - lr: 0.000033
2023-10-12 09:16:34,568 DEV : loss 0.1777261346578598 - f1-score (micro avg) 0.7603
2023-10-12 09:16:34,606 ----------------------------------------------------------------------------------------------------
2023-10-12 09:17:22,132 epoch 9 - iter 99/992 - loss 0.02532699 - time (sec): 47.52 - samples/sec: 362.18 - lr: 0.000032 - momentum: 0.000000
2023-10-12 09:18:10,391 epoch 9 - iter 198/992 - loss 0.02156891 - time (sec): 95.78 - samples/sec: 351.32 - lr: 0.000030 - momentum: 0.000000
2023-10-12 09:18:56,755 epoch 9 - iter 297/992 - loss 0.01810471 - time (sec): 142.15 - samples/sec: 356.12 - lr: 0.000028 - momentum: 0.000000
2023-10-12 09:19:44,249 epoch 9 - iter 396/992 - loss 0.01799543 - time (sec): 189.64 - samples/sec: 348.96 - lr: 0.000027 - momentum: 0.000000
2023-10-12 09:20:31,198 epoch 9 - iter 495/992 - loss 0.01649332 - time (sec): 236.59 - samples/sec: 349.44 - lr: 0.000025 - momentum: 0.000000
2023-10-12 09:21:19,342 epoch 9 - iter 594/992 - loss 0.01544692 - time (sec): 284.73 - samples/sec: 347.94 - lr: 0.000023 - momentum: 0.000000
2023-10-12 09:22:07,048 epoch 9 - iter 693/992 - loss 0.01478198 - time (sec): 332.44 - samples/sec: 348.11 - lr: 0.000022 - momentum: 0.000000
2023-10-12 09:22:55,379 epoch 9 - iter 792/992 - loss 0.01567478 - time (sec): 380.77 - samples/sec: 344.64 - lr: 0.000020 - momentum: 0.000000
2023-10-12 09:23:42,926 epoch 9 - iter 891/992 - loss 0.01599589 - time (sec): 428.32 - samples/sec: 344.01 - lr: 0.000018 - momentum: 0.000000
2023-10-12 09:24:31,114 epoch 9 - iter 990/992 - loss 0.01558416 - time (sec): 476.51 - samples/sec: 343.42 - lr: 0.000017 - momentum: 0.000000
2023-10-12 09:24:32,095 ----------------------------------------------------------------------------------------------------
2023-10-12 09:24:32,095 EPOCH 9 done: loss 0.0156 - lr: 0.000017
2023-10-12 09:24:57,467 DEV : loss 0.18520045280456543 - f1-score (micro avg) 0.7619
2023-10-12 09:24:57,511 ----------------------------------------------------------------------------------------------------
2023-10-12 09:25:46,427 epoch 10 - iter 99/992 - loss 0.01050991 - time (sec): 48.91 - samples/sec: 341.19 - lr: 0.000015 - momentum: 0.000000
2023-10-12 09:26:34,054 epoch 10 - iter 198/992 - loss 0.01124717 - time (sec): 96.54 - samples/sec: 339.42 - lr: 0.000013 - momentum: 0.000000
2023-10-12 09:27:22,983 epoch 10 - iter 297/992 - loss 0.01103351 - time (sec): 145.47 - samples/sec: 339.57 - lr: 0.000012 - momentum: 0.000000
2023-10-12 09:28:15,052 epoch 10 - iter 396/992 - loss 0.01233967 - time (sec): 197.54 - samples/sec: 333.82 - lr: 0.000010 - momentum: 0.000000
2023-10-12 09:29:10,897 epoch 10 - iter 495/992 - loss 0.01168010 - time (sec): 253.38 - samples/sec: 325.72 - lr: 0.000008 - momentum: 0.000000
2023-10-12 09:30:07,424 epoch 10 - iter 594/992 - loss 0.01179804 - time (sec): 309.91 - samples/sec: 317.53 - lr: 0.000007 - momentum: 0.000000
2023-10-12 09:31:02,969 epoch 10 - iter 693/992 - loss 0.01193022 - time (sec): 365.46 - samples/sec: 312.37 - lr: 0.000005 - momentum: 0.000000
2023-10-12 09:31:59,360 epoch 10 - iter 792/992 - loss 0.01236357 - time (sec): 421.85 - samples/sec: 309.60 - lr: 0.000004 - momentum: 0.000000
2023-10-12 09:32:52,374 epoch 10 - iter 891/992 - loss 0.01270245 - time (sec): 474.86 - samples/sec: 310.06 - lr: 0.000002 - momentum: 0.000000
2023-10-12 09:33:40,701 epoch 10 - iter 990/992 - loss 0.01318705 - time (sec): 523.19 - samples/sec: 313.02 - lr: 0.000000 - momentum: 0.000000
2023-10-12 09:33:41,597 ----------------------------------------------------------------------------------------------------
2023-10-12 09:33:41,597 EPOCH 10 done: loss 0.0134 - lr: 0.000000
2023-10-12 09:34:08,746 DEV : loss 0.19303283095359802 - f1-score (micro avg) 0.7562
2023-10-12 09:34:09,762 ----------------------------------------------------------------------------------------------------
2023-10-12 09:34:09,764 Loading model from best epoch ...
2023-10-12 09:34:15,360 SequenceTagger predicts: Dictionary with 13 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-12 09:34:40,369
Results:
- F-score (micro) 0.7486
- F-score (macro) 0.6567
- Accuracy 0.6255

By class:
              precision    recall  f1-score   support

         LOC     0.8082    0.8107    0.8095       655
         PER     0.6795    0.7892    0.7303       223
         ORG     0.5000    0.3780    0.4305       127

   micro avg     0.7460    0.7512    0.7486      1005
   macro avg     0.6626    0.6593    0.6567      1005
weighted avg     0.7407    0.7512    0.7440      1005

2023-10-12 09:34:40,369 ----------------------------------------------------------------------------------------------------