2023-10-11 19:13:59,349 ---------------------------------------------------------------------------------------------------- 2023-10-11 19:13:59,352 Model: "SequenceTagger( (embeddings): ByT5Embeddings( (model): T5EncoderModel( (shared): Embedding(384, 1472) (encoder): T5Stack( (embed_tokens): Embedding(384, 1472) (block): ModuleList( (0): T5Block( (layer): ModuleList( (0): T5LayerSelfAttention( (SelfAttention): T5Attention( (q): Linear(in_features=1472, out_features=384, bias=False) (k): Linear(in_features=1472, out_features=384, bias=False) (v): Linear(in_features=1472, out_features=384, bias=False) (o): Linear(in_features=384, out_features=1472, bias=False) (relative_attention_bias): Embedding(32, 6) ) (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (1): T5LayerFF( (DenseReluDense): T5DenseGatedActDense( (wi_0): Linear(in_features=1472, out_features=3584, bias=False) (wi_1): Linear(in_features=1472, out_features=3584, bias=False) (wo): Linear(in_features=3584, out_features=1472, bias=False) (dropout): Dropout(p=0.1, inplace=False) (act): NewGELUActivation() ) (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) ) (1-11): 11 x T5Block( (layer): ModuleList( (0): T5LayerSelfAttention( (SelfAttention): T5Attention( (q): Linear(in_features=1472, out_features=384, bias=False) (k): Linear(in_features=1472, out_features=384, bias=False) (v): Linear(in_features=1472, out_features=384, bias=False) (o): Linear(in_features=384, out_features=1472, bias=False) ) (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (1): T5LayerFF( (DenseReluDense): T5DenseGatedActDense( (wi_0): Linear(in_features=1472, out_features=3584, bias=False) (wi_1): Linear(in_features=1472, out_features=3584, bias=False) (wo): Linear(in_features=3584, out_features=1472, 
bias=False) (dropout): Dropout(p=0.1, inplace=False) (act): NewGELUActivation() ) (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) ) ) (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) ) (locked_dropout): LockedDropout(p=0.5) (linear): Linear(in_features=1472, out_features=13, bias=True) (loss_function): CrossEntropyLoss() )" 2023-10-11 19:13:59,352 ---------------------------------------------------------------------------------------------------- 2023-10-11 19:13:59,352 MultiCorpus: 5777 train + 722 dev + 723 test sentences - NER_ICDAR_EUROPEANA Corpus: 5777 train + 722 dev + 723 test sentences - /root/.flair/datasets/ner_icdar_europeana/nl 2023-10-11 19:13:59,352 ---------------------------------------------------------------------------------------------------- 2023-10-11 19:13:59,352 Train: 5777 sentences 2023-10-11 19:13:59,352 (train_with_dev=False, train_with_test=False) 2023-10-11 19:13:59,352 ---------------------------------------------------------------------------------------------------- 2023-10-11 19:13:59,352 Training Params: 2023-10-11 19:13:59,352 - learning_rate: "0.00015" 2023-10-11 19:13:59,352 - mini_batch_size: "8" 2023-10-11 19:13:59,353 - max_epochs: "10" 2023-10-11 19:13:59,353 - shuffle: "True" 2023-10-11 19:13:59,353 ---------------------------------------------------------------------------------------------------- 2023-10-11 19:13:59,353 Plugins: 2023-10-11 19:13:59,353 - TensorboardLogger 2023-10-11 19:13:59,353 - LinearScheduler | warmup_fraction: '0.1' 2023-10-11 19:13:59,353 ---------------------------------------------------------------------------------------------------- 2023-10-11 19:13:59,353 Final evaluation on model from best epoch (best-model.pt) 2023-10-11 19:13:59,353 - metric: "('micro avg', 'f1-score')" 2023-10-11 19:13:59,353 
---------------------------------------------------------------------------------------------------- 2023-10-11 19:13:59,353 Computation: 2023-10-11 19:13:59,353 - compute on device: cuda:0 2023-10-11 19:13:59,353 - embedding storage: none 2023-10-11 19:13:59,353 ---------------------------------------------------------------------------------------------------- 2023-10-11 19:13:59,353 Model training base path: "hmbench-icdar/nl-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-2" 2023-10-11 19:13:59,354 ---------------------------------------------------------------------------------------------------- 2023-10-11 19:13:59,354 ---------------------------------------------------------------------------------------------------- 2023-10-11 19:13:59,354 Logging anything other than scalars to TensorBoard is currently not supported. 2023-10-11 19:14:44,985 epoch 1 - iter 72/723 - loss 2.57086872 - time (sec): 45.63 - samples/sec: 407.45 - lr: 0.000015 - momentum: 0.000000 2023-10-11 19:15:23,902 epoch 1 - iter 144/723 - loss 2.53330840 - time (sec): 84.55 - samples/sec: 421.96 - lr: 0.000030 - momentum: 0.000000 2023-10-11 19:16:07,025 epoch 1 - iter 216/723 - loss 2.37416485 - time (sec): 127.67 - samples/sec: 422.29 - lr: 0.000045 - momentum: 0.000000 2023-10-11 19:16:48,488 epoch 1 - iter 288/723 - loss 2.16789438 - time (sec): 169.13 - samples/sec: 421.74 - lr: 0.000060 - momentum: 0.000000 2023-10-11 19:17:29,619 epoch 1 - iter 360/723 - loss 1.94703601 - time (sec): 210.26 - samples/sec: 423.82 - lr: 0.000074 - momentum: 0.000000 2023-10-11 19:18:12,121 epoch 1 - iter 432/723 - loss 1.72485908 - time (sec): 252.77 - samples/sec: 422.53 - lr: 0.000089 - momentum: 0.000000 2023-10-11 19:18:54,982 epoch 1 - iter 504/723 - loss 1.52985339 - time (sec): 295.63 - samples/sec: 421.70 - lr: 0.000104 - momentum: 0.000000 2023-10-11 19:19:35,885 epoch 1 - iter 576/723 - loss 1.37738975 - time (sec): 
336.53 - samples/sec: 421.04 - lr: 0.000119 - momentum: 0.000000 2023-10-11 19:20:16,171 epoch 1 - iter 648/723 - loss 1.25078985 - time (sec): 376.81 - samples/sec: 422.83 - lr: 0.000134 - momentum: 0.000000 2023-10-11 19:20:56,215 epoch 1 - iter 720/723 - loss 1.15095694 - time (sec): 416.86 - samples/sec: 421.68 - lr: 0.000149 - momentum: 0.000000 2023-10-11 19:20:57,395 ---------------------------------------------------------------------------------------------------- 2023-10-11 19:20:57,396 EPOCH 1 done: loss 1.1488 - lr: 0.000149 2023-10-11 19:21:17,705 DEV : loss 0.21648679673671722 - f1-score (micro avg) 0.0123 2023-10-11 19:21:17,740 saving best model 2023-10-11 19:21:18,688 ---------------------------------------------------------------------------------------------------- 2023-10-11 19:22:01,303 epoch 2 - iter 72/723 - loss 0.16183662 - time (sec): 42.61 - samples/sec: 399.65 - lr: 0.000148 - momentum: 0.000000 2023-10-11 19:22:42,923 epoch 2 - iter 144/723 - loss 0.16103628 - time (sec): 84.23 - samples/sec: 411.91 - lr: 0.000147 - momentum: 0.000000 2023-10-11 19:23:25,027 epoch 2 - iter 216/723 - loss 0.15369470 - time (sec): 126.34 - samples/sec: 419.60 - lr: 0.000145 - momentum: 0.000000 2023-10-11 19:24:10,544 epoch 2 - iter 288/723 - loss 0.14990715 - time (sec): 171.85 - samples/sec: 413.84 - lr: 0.000143 - momentum: 0.000000 2023-10-11 19:24:54,049 epoch 2 - iter 360/723 - loss 0.14344494 - time (sec): 215.36 - samples/sec: 411.96 - lr: 0.000142 - momentum: 0.000000 2023-10-11 19:25:35,378 epoch 2 - iter 432/723 - loss 0.14132544 - time (sec): 256.69 - samples/sec: 412.37 - lr: 0.000140 - momentum: 0.000000 2023-10-11 19:26:17,204 epoch 2 - iter 504/723 - loss 0.13701095 - time (sec): 298.51 - samples/sec: 412.84 - lr: 0.000138 - momentum: 0.000000 2023-10-11 19:26:59,167 epoch 2 - iter 576/723 - loss 0.13330675 - time (sec): 340.48 - samples/sec: 413.59 - lr: 0.000137 - momentum: 0.000000 2023-10-11 19:27:42,057 epoch 2 - iter 648/723 - loss 
0.13052442 - time (sec): 383.37 - samples/sec: 410.86 - lr: 0.000135 - momentum: 0.000000 2023-10-11 19:28:23,642 epoch 2 - iter 720/723 - loss 0.12780591 - time (sec): 424.95 - samples/sec: 413.52 - lr: 0.000133 - momentum: 0.000000 2023-10-11 19:28:24,842 ---------------------------------------------------------------------------------------------------- 2023-10-11 19:28:24,843 EPOCH 2 done: loss 0.1278 - lr: 0.000133 2023-10-11 19:28:45,421 DEV : loss 0.1180637776851654 - f1-score (micro avg) 0.6647 2023-10-11 19:28:45,451 saving best model 2023-10-11 19:28:52,913 ---------------------------------------------------------------------------------------------------- 2023-10-11 19:29:33,850 epoch 3 - iter 72/723 - loss 0.10015265 - time (sec): 40.91 - samples/sec: 444.71 - lr: 0.000132 - momentum: 0.000000 2023-10-11 19:30:13,514 epoch 3 - iter 144/723 - loss 0.08972155 - time (sec): 80.57 - samples/sec: 438.21 - lr: 0.000130 - momentum: 0.000000 2023-10-11 19:30:52,539 epoch 3 - iter 216/723 - loss 0.08709046 - time (sec): 119.59 - samples/sec: 438.87 - lr: 0.000128 - momentum: 0.000000 2023-10-11 19:31:33,908 epoch 3 - iter 288/723 - loss 0.08496174 - time (sec): 160.96 - samples/sec: 430.63 - lr: 0.000127 - momentum: 0.000000 2023-10-11 19:32:13,347 epoch 3 - iter 360/723 - loss 0.08081415 - time (sec): 200.40 - samples/sec: 432.84 - lr: 0.000125 - momentum: 0.000000 2023-10-11 19:32:54,772 epoch 3 - iter 432/723 - loss 0.08200482 - time (sec): 241.83 - samples/sec: 437.58 - lr: 0.000123 - momentum: 0.000000 2023-10-11 19:33:38,370 epoch 3 - iter 504/723 - loss 0.08119768 - time (sec): 285.42 - samples/sec: 430.71 - lr: 0.000122 - momentum: 0.000000 2023-10-11 19:34:22,240 epoch 3 - iter 576/723 - loss 0.07951784 - time (sec): 329.29 - samples/sec: 425.75 - lr: 0.000120 - momentum: 0.000000 2023-10-11 19:35:04,706 epoch 3 - iter 648/723 - loss 0.07780987 - time (sec): 371.76 - samples/sec: 422.78 - lr: 0.000118 - momentum: 0.000000 2023-10-11 19:35:48,845 epoch 3 
- iter 720/723 - loss 0.07699412 - time (sec): 415.90 - samples/sec: 422.44 - lr: 0.000117 - momentum: 0.000000 2023-10-11 19:35:50,187 ---------------------------------------------------------------------------------------------------- 2023-10-11 19:35:50,188 EPOCH 3 done: loss 0.0769 - lr: 0.000117 2023-10-11 19:36:12,306 DEV : loss 0.08086864650249481 - f1-score (micro avg) 0.8377 2023-10-11 19:36:12,336 saving best model 2023-10-11 19:36:23,238 ---------------------------------------------------------------------------------------------------- 2023-10-11 19:37:04,812 epoch 4 - iter 72/723 - loss 0.06706830 - time (sec): 41.57 - samples/sec: 421.51 - lr: 0.000115 - momentum: 0.000000 2023-10-11 19:37:47,107 epoch 4 - iter 144/723 - loss 0.05669850 - time (sec): 83.86 - samples/sec: 423.73 - lr: 0.000113 - momentum: 0.000000 2023-10-11 19:38:29,958 epoch 4 - iter 216/723 - loss 0.05685534 - time (sec): 126.72 - samples/sec: 416.60 - lr: 0.000112 - momentum: 0.000000 2023-10-11 19:39:11,998 epoch 4 - iter 288/723 - loss 0.05541904 - time (sec): 168.76 - samples/sec: 416.31 - lr: 0.000110 - momentum: 0.000000 2023-10-11 19:39:54,241 epoch 4 - iter 360/723 - loss 0.05734549 - time (sec): 211.00 - samples/sec: 410.12 - lr: 0.000108 - momentum: 0.000000 2023-10-11 19:40:37,348 epoch 4 - iter 432/723 - loss 0.05576876 - time (sec): 254.10 - samples/sec: 412.05 - lr: 0.000107 - momentum: 0.000000 2023-10-11 19:41:20,058 epoch 4 - iter 504/723 - loss 0.05631673 - time (sec): 296.82 - samples/sec: 414.70 - lr: 0.000105 - momentum: 0.000000 2023-10-11 19:42:01,084 epoch 4 - iter 576/723 - loss 0.05413009 - time (sec): 337.84 - samples/sec: 416.26 - lr: 0.000103 - momentum: 0.000000 2023-10-11 19:42:44,112 epoch 4 - iter 648/723 - loss 0.05175789 - time (sec): 380.87 - samples/sec: 419.25 - lr: 0.000102 - momentum: 0.000000 2023-10-11 19:43:24,791 epoch 4 - iter 720/723 - loss 0.05190640 - time (sec): 421.55 - samples/sec: 417.18 - lr: 0.000100 - momentum: 0.000000 
2023-10-11 19:43:26,016 ---------------------------------------------------------------------------------------------------- 2023-10-11 19:43:26,016 EPOCH 4 done: loss 0.0519 - lr: 0.000100 2023-10-11 19:43:48,629 DEV : loss 0.07001111656427383 - f1-score (micro avg) 0.8727 2023-10-11 19:43:48,665 saving best model 2023-10-11 19:43:52,185 ---------------------------------------------------------------------------------------------------- 2023-10-11 19:44:35,248 epoch 5 - iter 72/723 - loss 0.03577280 - time (sec): 43.06 - samples/sec: 410.16 - lr: 0.000098 - momentum: 0.000000 2023-10-11 19:45:15,914 epoch 5 - iter 144/723 - loss 0.03264401 - time (sec): 83.72 - samples/sec: 421.27 - lr: 0.000097 - momentum: 0.000000 2023-10-11 19:45:57,621 epoch 5 - iter 216/723 - loss 0.03662028 - time (sec): 125.43 - samples/sec: 426.47 - lr: 0.000095 - momentum: 0.000000 2023-10-11 19:46:40,431 epoch 5 - iter 288/723 - loss 0.03545328 - time (sec): 168.24 - samples/sec: 420.72 - lr: 0.000093 - momentum: 0.000000 2023-10-11 19:47:22,294 epoch 5 - iter 360/723 - loss 0.03467577 - time (sec): 210.10 - samples/sec: 425.95 - lr: 0.000092 - momentum: 0.000000 2023-10-11 19:48:03,651 epoch 5 - iter 432/723 - loss 0.03412429 - time (sec): 251.46 - samples/sec: 420.87 - lr: 0.000090 - momentum: 0.000000 2023-10-11 19:48:47,048 epoch 5 - iter 504/723 - loss 0.03625783 - time (sec): 294.86 - samples/sec: 419.45 - lr: 0.000088 - momentum: 0.000000 2023-10-11 19:49:28,818 epoch 5 - iter 576/723 - loss 0.03576595 - time (sec): 336.63 - samples/sec: 419.01 - lr: 0.000087 - momentum: 0.000000 2023-10-11 19:50:09,866 epoch 5 - iter 648/723 - loss 0.03742931 - time (sec): 377.68 - samples/sec: 419.77 - lr: 0.000085 - momentum: 0.000000 2023-10-11 19:50:50,937 epoch 5 - iter 720/723 - loss 0.03701053 - time (sec): 418.75 - samples/sec: 419.33 - lr: 0.000083 - momentum: 0.000000 2023-10-11 19:50:52,310 
---------------------------------------------------------------------------------------------------- 2023-10-11 19:50:52,311 EPOCH 5 done: loss 0.0369 - lr: 0.000083 2023-10-11 19:51:16,106 DEV : loss 0.07903970032930374 - f1-score (micro avg) 0.8603 2023-10-11 19:51:16,143 ---------------------------------------------------------------------------------------------------- 2023-10-11 19:52:00,787 epoch 6 - iter 72/723 - loss 0.02447853 - time (sec): 44.64 - samples/sec: 416.76 - lr: 0.000082 - momentum: 0.000000 2023-10-11 19:52:44,152 epoch 6 - iter 144/723 - loss 0.02205426 - time (sec): 88.01 - samples/sec: 418.68 - lr: 0.000080 - momentum: 0.000000 2023-10-11 19:53:25,211 epoch 6 - iter 216/723 - loss 0.02750277 - time (sec): 129.07 - samples/sec: 410.91 - lr: 0.000078 - momentum: 0.000000 2023-10-11 19:54:08,739 epoch 6 - iter 288/723 - loss 0.02591301 - time (sec): 172.59 - samples/sec: 408.80 - lr: 0.000077 - momentum: 0.000000 2023-10-11 19:54:51,542 epoch 6 - iter 360/723 - loss 0.02561029 - time (sec): 215.40 - samples/sec: 403.07 - lr: 0.000075 - momentum: 0.000000 2023-10-11 19:55:37,784 epoch 6 - iter 432/723 - loss 0.02497769 - time (sec): 261.64 - samples/sec: 400.75 - lr: 0.000073 - momentum: 0.000000 2023-10-11 19:56:24,102 epoch 6 - iter 504/723 - loss 0.02676010 - time (sec): 307.96 - samples/sec: 401.09 - lr: 0.000072 - momentum: 0.000000 2023-10-11 19:57:11,010 epoch 6 - iter 576/723 - loss 0.02633032 - time (sec): 354.86 - samples/sec: 398.37 - lr: 0.000070 - momentum: 0.000000 2023-10-11 19:57:54,453 epoch 6 - iter 648/723 - loss 0.02681762 - time (sec): 398.31 - samples/sec: 396.04 - lr: 0.000068 - momentum: 0.000000 2023-10-11 19:58:40,721 epoch 6 - iter 720/723 - loss 0.02707283 - time (sec): 444.58 - samples/sec: 394.53 - lr: 0.000067 - momentum: 0.000000 2023-10-11 19:58:42,172 ---------------------------------------------------------------------------------------------------- 2023-10-11 19:58:42,172 EPOCH 6 done: loss 0.0270 - lr: 
0.000067 2023-10-11 19:59:07,335 DEV : loss 0.09442799538373947 - f1-score (micro avg) 0.8571 2023-10-11 19:59:07,373 ---------------------------------------------------------------------------------------------------- 2023-10-11 19:59:52,725 epoch 7 - iter 72/723 - loss 0.02094561 - time (sec): 45.35 - samples/sec: 391.65 - lr: 0.000065 - momentum: 0.000000 2023-10-11 20:00:33,300 epoch 7 - iter 144/723 - loss 0.01875934 - time (sec): 85.92 - samples/sec: 398.24 - lr: 0.000063 - momentum: 0.000000 2023-10-11 20:01:14,107 epoch 7 - iter 216/723 - loss 0.01681267 - time (sec): 126.73 - samples/sec: 398.42 - lr: 0.000062 - momentum: 0.000000 2023-10-11 20:01:57,563 epoch 7 - iter 288/723 - loss 0.01766153 - time (sec): 170.19 - samples/sec: 410.15 - lr: 0.000060 - momentum: 0.000000 2023-10-11 20:02:43,950 epoch 7 - iter 360/723 - loss 0.02292811 - time (sec): 216.58 - samples/sec: 405.16 - lr: 0.000058 - momentum: 0.000000 2023-10-11 20:03:27,340 epoch 7 - iter 432/723 - loss 0.02258017 - time (sec): 259.97 - samples/sec: 407.35 - lr: 0.000057 - momentum: 0.000000 2023-10-11 20:04:08,335 epoch 7 - iter 504/723 - loss 0.02155383 - time (sec): 300.96 - samples/sec: 407.69 - lr: 0.000055 - momentum: 0.000000 2023-10-11 20:04:50,397 epoch 7 - iter 576/723 - loss 0.02107573 - time (sec): 343.02 - samples/sec: 409.98 - lr: 0.000053 - momentum: 0.000000 2023-10-11 20:05:32,273 epoch 7 - iter 648/723 - loss 0.02110528 - time (sec): 384.90 - samples/sec: 411.17 - lr: 0.000052 - momentum: 0.000000 2023-10-11 20:06:15,185 epoch 7 - iter 720/723 - loss 0.02085093 - time (sec): 427.81 - samples/sec: 410.19 - lr: 0.000050 - momentum: 0.000000 2023-10-11 20:06:16,648 ---------------------------------------------------------------------------------------------------- 2023-10-11 20:06:16,649 EPOCH 7 done: loss 0.0208 - lr: 0.000050 2023-10-11 20:06:40,108 DEV : loss 0.09876430779695511 - f1-score (micro avg) 0.8703 2023-10-11 20:06:40,152 
---------------------------------------------------------------------------------------------------- 2023-10-11 20:07:22,836 epoch 8 - iter 72/723 - loss 0.00933933 - time (sec): 42.68 - samples/sec: 420.64 - lr: 0.000048 - momentum: 0.000000 2023-10-11 20:08:03,853 epoch 8 - iter 144/723 - loss 0.01292328 - time (sec): 83.70 - samples/sec: 408.45 - lr: 0.000047 - momentum: 0.000000 2023-10-11 20:08:43,013 epoch 8 - iter 216/723 - loss 0.01314725 - time (sec): 122.86 - samples/sec: 413.22 - lr: 0.000045 - momentum: 0.000000 2023-10-11 20:09:23,514 epoch 8 - iter 288/723 - loss 0.01277868 - time (sec): 163.36 - samples/sec: 415.17 - lr: 0.000043 - momentum: 0.000000 2023-10-11 20:10:04,265 epoch 8 - iter 360/723 - loss 0.01439634 - time (sec): 204.11 - samples/sec: 418.58 - lr: 0.000042 - momentum: 0.000000 2023-10-11 20:10:45,239 epoch 8 - iter 432/723 - loss 0.01543636 - time (sec): 245.09 - samples/sec: 423.69 - lr: 0.000040 - momentum: 0.000000 2023-10-11 20:11:26,857 epoch 8 - iter 504/723 - loss 0.01613212 - time (sec): 286.70 - samples/sec: 427.85 - lr: 0.000038 - momentum: 0.000000 2023-10-11 20:12:06,133 epoch 8 - iter 576/723 - loss 0.01588643 - time (sec): 325.98 - samples/sec: 429.05 - lr: 0.000037 - momentum: 0.000000 2023-10-11 20:12:48,288 epoch 8 - iter 648/723 - loss 0.01595246 - time (sec): 368.13 - samples/sec: 429.70 - lr: 0.000035 - momentum: 0.000000 2023-10-11 20:13:31,496 epoch 8 - iter 720/723 - loss 0.01649749 - time (sec): 411.34 - samples/sec: 427.11 - lr: 0.000033 - momentum: 0.000000 2023-10-11 20:13:32,739 ---------------------------------------------------------------------------------------------------- 2023-10-11 20:13:32,740 EPOCH 8 done: loss 0.0165 - lr: 0.000033 2023-10-11 20:13:55,278 DEV : loss 0.10501116514205933 - f1-score (micro avg) 0.8751 2023-10-11 20:13:55,327 saving best model 2023-10-11 20:13:56,488 ---------------------------------------------------------------------------------------------------- 2023-10-11 
20:14:41,953 epoch 9 - iter 72/723 - loss 0.00692157 - time (sec): 45.46 - samples/sec: 377.43 - lr: 0.000032 - momentum: 0.000000 2023-10-11 20:15:23,711 epoch 9 - iter 144/723 - loss 0.00987217 - time (sec): 87.22 - samples/sec: 390.20 - lr: 0.000030 - momentum: 0.000000 2023-10-11 20:16:06,341 epoch 9 - iter 216/723 - loss 0.01091573 - time (sec): 129.85 - samples/sec: 394.00 - lr: 0.000028 - momentum: 0.000000 2023-10-11 20:16:50,893 epoch 9 - iter 288/723 - loss 0.01283372 - time (sec): 174.40 - samples/sec: 400.17 - lr: 0.000027 - momentum: 0.000000 2023-10-11 20:17:36,364 epoch 9 - iter 360/723 - loss 0.01390500 - time (sec): 219.87 - samples/sec: 402.90 - lr: 0.000025 - momentum: 0.000000 2023-10-11 20:18:19,862 epoch 9 - iter 432/723 - loss 0.01345060 - time (sec): 263.37 - samples/sec: 403.65 - lr: 0.000023 - momentum: 0.000000 2023-10-11 20:19:02,380 epoch 9 - iter 504/723 - loss 0.01441546 - time (sec): 305.89 - samples/sec: 405.73 - lr: 0.000022 - momentum: 0.000000 2023-10-11 20:19:45,060 epoch 9 - iter 576/723 - loss 0.01394066 - time (sec): 348.57 - samples/sec: 406.20 - lr: 0.000020 - momentum: 0.000000 2023-10-11 20:20:25,766 epoch 9 - iter 648/723 - loss 0.01417785 - time (sec): 389.28 - samples/sec: 407.17 - lr: 0.000018 - momentum: 0.000000 2023-10-11 20:21:06,100 epoch 9 - iter 720/723 - loss 0.01360125 - time (sec): 429.61 - samples/sec: 408.77 - lr: 0.000017 - momentum: 0.000000 2023-10-11 20:21:07,399 ---------------------------------------------------------------------------------------------------- 2023-10-11 20:21:07,399 EPOCH 9 done: loss 0.0136 - lr: 0.000017 2023-10-11 20:21:30,023 DEV : loss 0.11595697700977325 - f1-score (micro avg) 0.8585 2023-10-11 20:21:30,064 ---------------------------------------------------------------------------------------------------- 2023-10-11 20:22:12,604 epoch 10 - iter 72/723 - loss 0.01406246 - time (sec): 42.54 - samples/sec: 423.37 - lr: 0.000015 - momentum: 0.000000 2023-10-11 20:22:55,515 epoch 
10 - iter 144/723 - loss 0.01245519 - time (sec): 85.45 - samples/sec: 424.31 - lr: 0.000013 - momentum: 0.000000 2023-10-11 20:23:38,829 epoch 10 - iter 216/723 - loss 0.01322961 - time (sec): 128.76 - samples/sec: 430.22 - lr: 0.000012 - momentum: 0.000000 2023-10-11 20:24:20,473 epoch 10 - iter 288/723 - loss 0.01274717 - time (sec): 170.41 - samples/sec: 428.27 - lr: 0.000010 - momentum: 0.000000 2023-10-11 20:25:01,572 epoch 10 - iter 360/723 - loss 0.01269929 - time (sec): 211.51 - samples/sec: 430.74 - lr: 0.000008 - momentum: 0.000000 2023-10-11 20:25:43,691 epoch 10 - iter 432/723 - loss 0.01264156 - time (sec): 253.62 - samples/sec: 423.44 - lr: 0.000007 - momentum: 0.000000 2023-10-11 20:26:25,384 epoch 10 - iter 504/723 - loss 0.01254290 - time (sec): 295.32 - samples/sec: 421.37 - lr: 0.000005 - momentum: 0.000000 2023-10-11 20:27:06,979 epoch 10 - iter 576/723 - loss 0.01252330 - time (sec): 336.91 - samples/sec: 420.64 - lr: 0.000003 - momentum: 0.000000 2023-10-11 20:27:46,703 epoch 10 - iter 648/723 - loss 0.01221020 - time (sec): 376.64 - samples/sec: 421.88 - lr: 0.000002 - momentum: 0.000000 2023-10-11 20:28:26,795 epoch 10 - iter 720/723 - loss 0.01195603 - time (sec): 416.73 - samples/sec: 421.67 - lr: 0.000000 - momentum: 0.000000 2023-10-11 20:28:27,946 ---------------------------------------------------------------------------------------------------- 2023-10-11 20:28:27,946 EPOCH 10 done: loss 0.0119 - lr: 0.000000 2023-10-11 20:28:49,635 DEV : loss 0.11474814265966415 - f1-score (micro avg) 0.8625 2023-10-11 20:28:50,709 ---------------------------------------------------------------------------------------------------- 2023-10-11 20:28:50,711 Loading model from best epoch ... 
2023-10-11 20:28:55,094 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-11 20:29:19,250 
Results:
- F-score (micro) 0.8422
- F-score (macro) 0.7359
- Accuracy 0.7429

By class:
              precision    recall  f1-score   support

         PER     0.8480    0.8797    0.8635       482
         LOC     0.9159    0.8319    0.8719       458
         ORG     0.5172    0.4348    0.4724        69

   micro avg     0.8573    0.8276    0.8422      1009
   macro avg     0.7604    0.7154    0.7359      1009
weighted avg     0.8562    0.8276    0.8406      1009

2023-10-11 20:29:19,250 ----------------------------------------------------------------------------------------------------