2023-10-10 21:41:37,288 ----------------------------------------------------------------------------------------------------
2023-10-10 21:41:37,290 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-10 21:41:37,290 ----------------------------------------------------------------------------------------------------
2023-10-10 21:41:37,291 MultiCorpus: 1166 train + 165 dev + 415 test sentences
 - NER_HIPE_2022 Corpus: 1166 train + 165 dev + 415 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fi/with_doc_seperator
2023-10-10 21:41:37,291 ----------------------------------------------------------------------------------------------------
2023-10-10 21:41:37,291 Train: 1166 sentences
2023-10-10 21:41:37,291 (train_with_dev=False, train_with_test=False)
2023-10-10 21:41:37,291 ----------------------------------------------------------------------------------------------------
2023-10-10 21:41:37,291 Training Params:
2023-10-10 21:41:37,291 - learning_rate: "0.00015"
2023-10-10 21:41:37,291 - mini_batch_size: "8"
2023-10-10 21:41:37,291 - max_epochs: "10"
2023-10-10 21:41:37,291 - shuffle: "True"
2023-10-10 21:41:37,291 ----------------------------------------------------------------------------------------------------
2023-10-10 21:41:37,291 Plugins:
2023-10-10 21:41:37,291 - TensorboardLogger
2023-10-10 21:41:37,292 - LinearScheduler | warmup_fraction: '0.1'
2023-10-10 21:41:37,292 ----------------------------------------------------------------------------------------------------
2023-10-10 21:41:37,292 Final evaluation on model from best epoch (best-model.pt)
2023-10-10 21:41:37,292 - metric: "('micro avg', 'f1-score')"
2023-10-10 21:41:37,292 ----------------------------------------------------------------------------------------------------
2023-10-10 21:41:37,292 Computation:
2023-10-10 21:41:37,292 - compute on device: cuda:0
2023-10-10 21:41:37,292 - embedding storage: none
2023-10-10 21:41:37,292 ----------------------------------------------------------------------------------------------------
2023-10-10 21:41:37,292 Model training base path: "hmbench-newseye/fi-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-1"
2023-10-10 21:41:37,292 ----------------------------------------------------------------------------------------------------
2023-10-10 21:41:37,292 ----------------------------------------------------------------------------------------------------
2023-10-10 21:41:37,292 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-10 21:41:47,892 epoch 1 - iter 14/146 - loss 2.83099006 - time (sec): 10.60 - samples/sec: 454.83 - lr: 0.000013 - momentum: 0.000000
2023-10-10 21:41:56,307 epoch 1 - iter 28/146 - loss 2.82672987 - time (sec): 19.01 - samples/sec: 453.39 - lr: 0.000028 - momentum: 0.000000
2023-10-10 21:42:05,412 epoch 1 - iter 42/146 - loss 2.81735922 - time (sec): 28.12 - samples/sec: 487.02 - lr: 0.000042 - momentum: 0.000000
2023-10-10 21:42:13,913 epoch 1 - iter 56/146 - loss 2.80389264 - time (sec): 36.62 - samples/sec: 492.89 - lr: 0.000057 - momentum: 0.000000
2023-10-10 21:42:21,745 epoch 1 - iter 70/146 - loss 2.78350842 - time (sec): 44.45 - samples/sec: 488.75 - lr: 0.000071 - momentum: 0.000000
2023-10-10 21:42:30,315 epoch 1 - iter 84/146 - loss 2.74176859 - time (sec): 53.02 - samples/sec: 484.59 - lr: 0.000085 - momentum: 0.000000
2023-10-10 21:42:38,433 epoch 1 - iter 98/146 - loss 2.68567253 - time (sec): 61.14 - samples/sec: 478.65 - lr: 0.000100 - momentum: 0.000000
2023-10-10 21:42:48,723 epoch 1 - iter 112/146 - loss 2.58862607 - time (sec): 71.43 - samples/sec: 486.62 - lr: 0.000114 - momentum: 0.000000
2023-10-10 21:42:57,392 epoch 1 - iter 126/146 - loss 2.51465264 - time (sec): 80.10 - samples/sec: 484.34 - lr: 0.000128 - momentum: 0.000000
2023-10-10 21:43:07,025 epoch 1 - iter 140/146 - loss 2.43562024 - time (sec): 89.73 - samples/sec: 478.39 - lr: 0.000143 - momentum: 0.000000
2023-10-10 21:43:10,647 ----------------------------------------------------------------------------------------------------
2023-10-10 21:43:10,648 EPOCH 1 done: loss 2.4069 - lr: 0.000143
2023-10-10 21:43:16,880 DEV : loss 1.356826901435852 - f1-score (micro avg) 0.0
2023-10-10 21:43:16,889 ----------------------------------------------------------------------------------------------------
2023-10-10 21:43:26,905 epoch 2 - iter 14/146 - loss 1.39370672 - time (sec): 10.01 - samples/sec: 457.36 - lr: 0.000149 - momentum: 0.000000
2023-10-10 21:43:36,699 epoch 2 - iter 28/146 - loss 1.35742229 - time (sec): 19.81 - samples/sec: 449.72 - lr: 0.000147 - momentum: 0.000000
2023-10-10 21:43:46,625 epoch 2 - iter 42/146 - loss 1.22339354 - time (sec): 29.73 - samples/sec: 441.39 - lr: 0.000145 - momentum: 0.000000
2023-10-10 21:43:56,477 epoch 2 - iter 56/146 - loss 1.12844539 - time (sec): 39.59 - samples/sec: 446.10 - lr: 0.000144 - momentum: 0.000000
2023-10-10 21:44:07,430 epoch 2 - iter 70/146 - loss 1.05603913 - time (sec): 50.54 - samples/sec: 453.57 - lr: 0.000142 - momentum: 0.000000
2023-10-10 21:44:16,779 epoch 2 - iter 84/146 - loss 0.99801684 - time (sec): 59.89 - samples/sec: 448.64 - lr: 0.000141 - momentum: 0.000000
2023-10-10 21:44:26,181 epoch 2 - iter 98/146 - loss 0.95042167 - time (sec): 69.29 - samples/sec: 443.73 - lr: 0.000139 - momentum: 0.000000
2023-10-10 21:44:35,483 epoch 2 - iter 112/146 - loss 0.91179574 - time (sec): 78.59 - samples/sec: 439.91 - lr: 0.000137 - momentum: 0.000000
2023-10-10 21:44:44,970 epoch 2 - iter 126/146 - loss 0.87949814 - time (sec): 88.08 - samples/sec: 440.06 - lr: 0.000136 - momentum: 0.000000
2023-10-10 21:44:54,351 epoch 2 - iter 140/146 - loss 0.84964841 - time (sec): 97.46 - samples/sec: 440.66 - lr: 0.000134 - momentum: 0.000000
2023-10-10 21:44:58,079 ----------------------------------------------------------------------------------------------------
2023-10-10 21:44:58,079 EPOCH 2 done: loss 0.8457 - lr: 0.000134
2023-10-10 21:45:04,659 DEV : loss 0.4102368950843811 - f1-score (micro avg) 0.0
2023-10-10 21:45:04,669 ----------------------------------------------------------------------------------------------------
2023-10-10 21:45:13,948 epoch 3 - iter 14/146 - loss 0.53677705 - time (sec): 9.28 - samples/sec: 380.64 - lr: 0.000132 - momentum: 0.000000
2023-10-10 21:45:23,837 epoch 3 - iter 28/146 - loss 0.43791549 - time (sec): 19.17 - samples/sec: 446.68 - lr: 0.000130 - momentum: 0.000000
2023-10-10 21:45:32,873 epoch 3 - iter 42/146 - loss 0.45683813 - time (sec): 28.20 - samples/sec: 453.06 - lr: 0.000129 - momentum: 0.000000
2023-10-10 21:45:41,710 epoch 3 - iter 56/146 - loss 0.44634295 - time (sec): 37.04 - samples/sec: 451.44 - lr: 0.000127 - momentum: 0.000000
2023-10-10 21:45:50,849 epoch 3 - iter 70/146 - loss 0.43680565 - time (sec): 46.18 - samples/sec: 454.80 - lr: 0.000126 - momentum: 0.000000
2023-10-10 21:45:59,653 epoch 3 - iter 84/146 - loss 0.43710977 - time (sec): 54.98 - samples/sec: 447.84 - lr: 0.000124 - momentum: 0.000000
2023-10-10 21:46:09,907 epoch 3 - iter 98/146 - loss 0.46074815 - time (sec): 65.24 - samples/sec: 457.86 - lr: 0.000122 - momentum: 0.000000
2023-10-10 21:46:19,746 epoch 3 - iter 112/146 - loss 0.44530575 - time (sec): 75.08 - samples/sec: 462.35 - lr: 0.000121 - momentum: 0.000000
2023-10-10 21:46:29,509 epoch 3 - iter 126/146 - loss 0.43292114 - time (sec): 84.84 - samples/sec: 461.28 - lr: 0.000119 - momentum: 0.000000
2023-10-10 21:46:37,992 epoch 3 - iter 140/146 - loss 0.43074177 - time (sec): 93.32 - samples/sec: 456.80 - lr: 0.000118 - momentum: 0.000000
2023-10-10 21:46:41,672 ----------------------------------------------------------------------------------------------------
2023-10-10 21:46:41,672 EPOCH 3 done: loss 0.4255 - lr: 0.000118
2023-10-10 21:46:47,458 DEV : loss 0.29022911190986633 - f1-score (micro avg) 0.0078
2023-10-10 21:46:47,467 saving best model
2023-10-10 21:46:48,888 ----------------------------------------------------------------------------------------------------
2023-10-10 21:46:59,134 epoch 4 - iter 14/146 - loss 0.30368856 - time (sec): 10.24 - samples/sec: 462.62 - lr: 0.000115 - momentum: 0.000000
2023-10-10 21:47:08,517 epoch 4 - iter 28/146 - loss 0.28383264 - time (sec): 19.63 - samples/sec: 438.24 - lr: 0.000114 - momentum: 0.000000
2023-10-10 21:47:18,673 epoch 4 - iter 42/146 - loss 0.34186303 - time (sec): 29.78 - samples/sec: 452.85 - lr: 0.000112 - momentum: 0.000000
2023-10-10 21:47:28,077 epoch 4 - iter 56/146 - loss 0.34206921 - time (sec): 39.19 - samples/sec: 446.05 - lr: 0.000111 - momentum: 0.000000
2023-10-10 21:47:38,755 epoch 4 - iter 70/146 - loss 0.33835857 - time (sec): 49.87 - samples/sec: 443.34 - lr: 0.000109 - momentum: 0.000000
2023-10-10 21:47:48,562 epoch 4 - iter 84/146 - loss 0.33883733 - time (sec): 59.67 - samples/sec: 441.55 - lr: 0.000107 - momentum: 0.000000
2023-10-10 21:47:59,381 epoch 4 - iter 98/146 - loss 0.33112961 - time (sec): 70.49 - samples/sec: 439.76 - lr: 0.000106 - momentum: 0.000000
2023-10-10 21:48:08,664 epoch 4 - iter 112/146 - loss 0.32683644 - time (sec): 79.77 - samples/sec: 436.20 - lr: 0.000104 - momentum: 0.000000
2023-10-10 21:48:18,314 epoch 4 - iter 126/146 - loss 0.32139242 - time (sec): 89.42 - samples/sec: 434.36 - lr: 0.000103 - momentum: 0.000000
2023-10-10 21:48:27,775 epoch 4 - iter 140/146 - loss 0.32278398 - time (sec): 98.88 - samples/sec: 429.61 - lr: 0.000101 - momentum: 0.000000
2023-10-10 21:48:32,032 ----------------------------------------------------------------------------------------------------
2023-10-10 21:48:32,033 EPOCH 4 done: loss 0.3173 - lr: 0.000101
2023-10-10 21:48:38,292 DEV : loss 0.23751258850097656 - f1-score (micro avg) 0.3686
2023-10-10 21:48:38,302 saving best model
2023-10-10 21:48:47,130 ----------------------------------------------------------------------------------------------------
2023-10-10 21:48:56,208 epoch 5 - iter 14/146 - loss 0.26938515 - time (sec): 9.07 - samples/sec: 453.04 - lr: 0.000099 - momentum: 0.000000
2023-10-10 21:49:06,203 epoch 5 - iter 28/146 - loss 0.33101405 - time (sec): 19.07 - samples/sec: 475.22 - lr: 0.000097 - momentum: 0.000000
2023-10-10 21:49:15,596 epoch 5 - iter 42/146 - loss 0.31691507 - time (sec): 28.46 - samples/sec: 466.90 - lr: 0.000096 - momentum: 0.000000
2023-10-10 21:49:24,647 epoch 5 - iter 56/146 - loss 0.28600023 - time (sec): 37.51 - samples/sec: 461.87 - lr: 0.000094 - momentum: 0.000000
2023-10-10 21:49:34,897 epoch 5 - iter 70/146 - loss 0.27334427 - time (sec): 47.76 - samples/sec: 459.23 - lr: 0.000092 - momentum: 0.000000
2023-10-10 21:49:44,423 epoch 5 - iter 84/146 - loss 0.26564120 - time (sec): 57.29 - samples/sec: 452.58 - lr: 0.000091 - momentum: 0.000000
2023-10-10 21:49:53,632 epoch 5 - iter 98/146 - loss 0.26142888 - time (sec): 66.50 - samples/sec: 454.26 - lr: 0.000089 - momentum: 0.000000
2023-10-10 21:50:02,850 epoch 5 - iter 112/146 - loss 0.25808288 - time (sec): 75.72 - samples/sec: 458.61 - lr: 0.000088 - momentum: 0.000000
2023-10-10 21:50:11,451 epoch 5 - iter 126/146 - loss 0.25610218 - time (sec): 84.32 - samples/sec: 458.55 - lr: 0.000086 - momentum: 0.000000
2023-10-10 21:50:20,211 epoch 5 - iter 140/146 - loss 0.25265262 - time (sec): 93.08 - samples/sec: 458.37 - lr: 0.000084 - momentum: 0.000000
2023-10-10 21:50:23,964 ----------------------------------------------------------------------------------------------------
2023-10-10 21:50:23,964 EPOCH 5 done: loss 0.2526 - lr: 0.000084
2023-10-10 21:50:29,941 DEV : loss 0.20196418464183807 - f1-score (micro avg) 0.4746
2023-10-10 21:50:29,950 saving best model
2023-10-10 21:50:37,460 ----------------------------------------------------------------------------------------------------
2023-10-10 21:50:47,309 epoch 6 - iter 14/146 - loss 0.21125154 - time (sec): 9.84 - samples/sec: 508.80 - lr: 0.000082 - momentum: 0.000000
2023-10-10 21:50:56,084 epoch 6 - iter 28/146 - loss 0.22445195 - time (sec): 18.62 - samples/sec: 481.00 - lr: 0.000081 - momentum: 0.000000
2023-10-10 21:51:05,040 epoch 6 - iter 42/146 - loss 0.21454292 - time (sec): 27.58 - samples/sec: 476.11 - lr: 0.000079 - momentum: 0.000000
2023-10-10 21:51:13,750 epoch 6 - iter 56/146 - loss 0.20511075 - time (sec): 36.29 - samples/sec: 485.70 - lr: 0.000077 - momentum: 0.000000
2023-10-10 21:51:23,059 epoch 6 - iter 70/146 - loss 0.20516919 - time (sec): 45.59 - samples/sec: 488.85 - lr: 0.000076 - momentum: 0.000000
2023-10-10 21:51:31,377 epoch 6 - iter 84/146 - loss 0.20221448 - time (sec): 53.91 - samples/sec: 480.18 - lr: 0.000074 - momentum: 0.000000
2023-10-10 21:51:40,433 epoch 6 - iter 98/146 - loss 0.20584764 - time (sec): 62.97 - samples/sec: 474.79 - lr: 0.000073 - momentum: 0.000000
2023-10-10 21:51:49,118 epoch 6 - iter 112/146 - loss 0.19933841 - time (sec): 71.65 - samples/sec: 474.95 - lr: 0.000071 - momentum: 0.000000
2023-10-10 21:51:58,017 epoch 6 - iter 126/146 - loss 0.20436836 - time (sec): 80.55 - samples/sec: 480.15 - lr: 0.000069 - momentum: 0.000000
2023-10-10 21:52:05,954 epoch 6 - iter 140/146 - loss 0.20428475 - time (sec): 88.49 - samples/sec: 475.67 - lr: 0.000068 - momentum: 0.000000
2023-10-10 21:52:10,321 ----------------------------------------------------------------------------------------------------
2023-10-10 21:52:10,322 EPOCH 6 done: loss 0.2022 - lr: 0.000068
2023-10-10 21:52:16,082 DEV : loss 0.17842039465904236 - f1-score (micro avg) 0.5636
2023-10-10 21:52:16,091 saving best model
2023-10-10 21:52:24,741 ----------------------------------------------------------------------------------------------------
2023-10-10 21:52:34,632 epoch 7 - iter 14/146 - loss 0.16519836 - time (sec): 9.89 - samples/sec: 484.44 - lr: 0.000066 - momentum: 0.000000
2023-10-10 21:52:43,455 epoch 7 - iter 28/146 - loss 0.15491464 - time (sec): 18.71 - samples/sec: 459.41 - lr: 0.000064 - momentum: 0.000000
2023-10-10 21:52:52,480 epoch 7 - iter 42/146 - loss 0.17736530 - time (sec): 27.74 - samples/sec: 460.60 - lr: 0.000062 - momentum: 0.000000
2023-10-10 21:53:01,686 epoch 7 - iter 56/146 - loss 0.16335559 - time (sec): 36.94 - samples/sec: 477.54 - lr: 0.000061 - momentum: 0.000000
2023-10-10 21:53:10,637 epoch 7 - iter 70/146 - loss 0.15852353 - time (sec): 45.89 - samples/sec: 477.37 - lr: 0.000059 - momentum: 0.000000
2023-10-10 21:53:18,935 epoch 7 - iter 84/146 - loss 0.16471171 - time (sec): 54.19 - samples/sec: 468.14 - lr: 0.000058 - momentum: 0.000000
2023-10-10 21:53:28,006 epoch 7 - iter 98/146 - loss 0.16359888 - time (sec): 63.26 - samples/sec: 467.46 - lr: 0.000056 - momentum: 0.000000
2023-10-10 21:53:37,417 epoch 7 - iter 112/146 - loss 0.15896974 - time (sec): 72.67 - samples/sec: 473.77 - lr: 0.000054 - momentum: 0.000000
2023-10-10 21:53:46,512 epoch 7 - iter 126/146 - loss 0.15746455 - time (sec): 81.77 - samples/sec: 474.51 - lr: 0.000053 - momentum: 0.000000
2023-10-10 21:53:56,111 epoch 7 - iter 140/146 - loss 0.16140715 - time (sec): 91.37 - samples/sec: 463.92 - lr: 0.000051 - momentum: 0.000000
2023-10-10 21:54:00,586 ----------------------------------------------------------------------------------------------------
2023-10-10 21:54:00,587 EPOCH 7 done: loss 0.1639 - lr: 0.000051
2023-10-10 21:54:07,600 DEV : loss 0.16299203038215637 - f1-score (micro avg) 0.6061
2023-10-10 21:54:07,610 saving best model
2023-10-10 21:54:15,265 ----------------------------------------------------------------------------------------------------
2023-10-10 21:54:24,807 epoch 8 - iter 14/146 - loss 0.14054633 - time (sec): 9.54 - samples/sec: 515.83 - lr: 0.000049 - momentum: 0.000000
2023-10-10 21:54:33,707 epoch 8 - iter 28/146 - loss 0.15463910 - time (sec): 18.44 - samples/sec: 526.26 - lr: 0.000047 - momentum: 0.000000
2023-10-10 21:54:42,354 epoch 8 - iter 42/146 - loss 0.15471171 - time (sec): 27.08 - samples/sec: 512.50 - lr: 0.000046 - momentum: 0.000000
2023-10-10 21:54:50,606 epoch 8 - iter 56/146 - loss 0.14788860 - time (sec): 35.34 - samples/sec: 498.83 - lr: 0.000044 - momentum: 0.000000
2023-10-10 21:54:59,640 epoch 8 - iter 70/146 - loss 0.14632606 - time (sec): 44.37 - samples/sec: 498.54 - lr: 0.000043 - momentum: 0.000000
2023-10-10 21:55:08,163 epoch 8 - iter 84/146 - loss 0.15098798 - time (sec): 52.89 - samples/sec: 491.80 - lr: 0.000041 - momentum: 0.000000
2023-10-10 21:55:17,245 epoch 8 - iter 98/146 - loss 0.14595901 - time (sec): 61.98 - samples/sec: 483.64 - lr: 0.000039 - momentum: 0.000000
2023-10-10 21:55:27,185 epoch 8 - iter 112/146 - loss 0.14346752 - time (sec): 71.92 - samples/sec: 480.22 - lr: 0.000038 - momentum: 0.000000
2023-10-10 21:55:36,937 epoch 8 - iter 126/146 - loss 0.14157061 - time (sec): 81.67 - samples/sec: 471.85 - lr: 0.000036 - momentum: 0.000000
2023-10-10 21:55:46,970 epoch 8 - iter 140/146 - loss 0.13629684 - time (sec): 91.70 - samples/sec: 467.06 - lr: 0.000035 - momentum: 0.000000
2023-10-10 21:55:50,860 ----------------------------------------------------------------------------------------------------
2023-10-10 21:55:50,860 EPOCH 8 done: loss 0.1398 - lr: 0.000035
2023-10-10 21:55:56,778 DEV : loss 0.15665240585803986 - f1-score (micro avg) 0.6504
2023-10-10 21:55:56,788 saving best model
2023-10-10 21:56:06,636 ----------------------------------------------------------------------------------------------------
2023-10-10 21:56:16,167 epoch 9 - iter 14/146 - loss 0.10274893 - time (sec): 9.53 - samples/sec: 441.91 - lr: 0.000032 - momentum: 0.000000
2023-10-10 21:56:26,757 epoch 9 - iter 28/146 - loss 0.13023131 - time (sec): 20.12 - samples/sec: 454.61 - lr: 0.000031 - momentum: 0.000000
2023-10-10 21:56:36,486 epoch 9 - iter 42/146 - loss 0.12947726 - time (sec): 29.85 - samples/sec: 444.52 - lr: 0.000029 - momentum: 0.000000
2023-10-10 21:56:46,259 epoch 9 - iter 56/146 - loss 0.12059411 - time (sec): 39.62 - samples/sec: 442.36 - lr: 0.000028 - momentum: 0.000000
2023-10-10 21:56:55,148 epoch 9 - iter 70/146 - loss 0.12012299 - time (sec): 48.51 - samples/sec: 450.59 - lr: 0.000026 - momentum: 0.000000
2023-10-10 21:57:03,385 epoch 9 - iter 84/146 - loss 0.12128595 - time (sec): 56.74 - samples/sec: 456.05 - lr: 0.000024 - momentum: 0.000000
2023-10-10 21:57:12,121 epoch 9 - iter 98/146 - loss 0.12559222 - time (sec): 65.48 - samples/sec: 461.22 - lr: 0.000023 - momentum: 0.000000
2023-10-10 21:57:20,492 epoch 9 - iter 112/146 - loss 0.12343906 - time (sec): 73.85 - samples/sec: 465.07 - lr: 0.000021 - momentum: 0.000000
2023-10-10 21:57:29,518 epoch 9 - iter 126/146 - loss 0.12193938 - time (sec): 82.88 - samples/sec: 472.03 - lr: 0.000020 - momentum: 0.000000
2023-10-10 21:57:37,883 epoch 9 - iter 140/146 - loss 0.12379814 - time (sec): 91.24 - samples/sec: 470.00 - lr: 0.000018 - momentum: 0.000000
2023-10-10 21:57:41,307 ----------------------------------------------------------------------------------------------------
2023-10-10 21:57:41,307 EPOCH 9 done: loss 0.1234 - lr: 0.000018
2023-10-10 21:57:47,299 DEV : loss 0.1573924571275711 - f1-score (micro avg) 0.6348
2023-10-10 21:57:47,308 ----------------------------------------------------------------------------------------------------
2023-10-10 21:57:56,128 epoch 10 - iter 14/146 - loss 0.10385368 - time (sec): 8.82 - samples/sec: 508.61 - lr: 0.000016 - momentum: 0.000000
2023-10-10 21:58:04,303 epoch 10 - iter 28/146 - loss 0.12007872 - time (sec): 16.99 - samples/sec: 479.61 - lr: 0.000014 - momentum: 0.000000
2023-10-10 21:58:12,720 epoch 10 - iter 42/146 - loss 0.11224087 - time (sec): 25.41 - samples/sec: 485.72 - lr: 0.000013 - momentum: 0.000000
2023-10-10 21:58:22,256 epoch 10 - iter 56/146 - loss 0.10217048 - time (sec): 34.95 - samples/sec: 503.12 - lr: 0.000011 - momentum: 0.000000
2023-10-10 21:58:31,488 epoch 10 - iter 70/146 - loss 0.10189941 - time (sec): 44.18 - samples/sec: 506.27 - lr: 0.000009 - momentum: 0.000000
2023-10-10 21:58:39,895 epoch 10 - iter 84/146 - loss 0.10134392 - time (sec): 52.59 - samples/sec: 499.85 - lr: 0.000008 - momentum: 0.000000
2023-10-10 21:58:48,328 epoch 10 - iter 98/146 - loss 0.10206351 - time (sec): 61.02 - samples/sec: 498.54 - lr: 0.000006 - momentum: 0.000000
2023-10-10 21:58:57,115 epoch 10 - iter 112/146 - loss 0.10614914 - time (sec): 69.81 - samples/sec: 497.17 - lr: 0.000005 - momentum: 0.000000
2023-10-10 21:59:05,599 epoch 10 - iter 126/146 - loss 0.10946359 - time (sec): 78.29 - samples/sec: 494.50 - lr: 0.000003 - momentum: 0.000000
2023-10-10 21:59:14,323 epoch 10 - iter 140/146 - loss 0.11293547 - time (sec): 87.01 - samples/sec: 494.56 - lr: 0.000001 - momentum: 0.000000
2023-10-10 21:59:17,682 ----------------------------------------------------------------------------------------------------
2023-10-10 21:59:17,683 EPOCH 10 done: loss 0.1160 - lr: 0.000001
2023-10-10 21:59:23,725 DEV : loss 0.15526741743087769 - f1-score (micro avg) 0.6783
2023-10-10 21:59:23,734 saving best model
2023-10-10 21:59:32,858 ----------------------------------------------------------------------------------------------------
2023-10-10 21:59:32,860 Loading model from best epoch ...
2023-10-10 21:59:38,272 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-10 21:59:51,377 Results:
- F-score (micro) 0.7005
- F-score (macro) 0.6111
- Accuracy 0.5721

By class:
              precision    recall  f1-score   support

         PER     0.7970    0.7672    0.7818       348
         LOC     0.5960    0.7969    0.6820       261
         ORG     0.2857    0.3077    0.2963        52
   HumanProd     0.8125    0.5909    0.6842        22

   micro avg     0.6667    0.7379    0.7005       683
   macro avg     0.6228    0.6157    0.6111       683
weighted avg     0.6818    0.7379    0.7036       683

2023-10-10 21:59:51,378 ----------------------------------------------------------------------------------------------------
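The module tree printed at the top of this log is enough to estimate the trainable parameter count of the tagger. The sketch below does the arithmetic from the printed shapes only. It assumes that `embed_tokens` reuses the `shared` embedding (as in the Hugging Face T5 encoder), that each `FusedRMSNorm` with `elementwise_affine=True` holds a single weight vector and no bias, and that dropout, `LockedDropout` and the loss have no parameters. It has not been checked against `sum(p.numel() for p in model.parameters())` on the actual model.

```python
# Parameter arithmetic from the printed SequenceTagger module tree (assumptions noted inline).
d_model, n_vocab = 1472, 384
d_attn = 384    # q/k/v project 1472 -> 384; Embedding(32, 6) suggests 6 heads
d_ff = 3584
n_blocks = 12   # block (0) plus (1-11)
n_tags = 17     # out_features of the final Linear

def linear(n_in: int, n_out: int, bias: bool = False) -> int:
    """Weights (plus optional bias) of an nn.Linear-style layer."""
    return n_in * n_out + (n_out if bias else 0)

attention = 3 * linear(d_model, d_attn) + linear(d_attn, d_model)  # q, k, v, o
feed_forward = 2 * linear(d_model, d_ff) + linear(d_ff, d_model)   # wi_0, wi_1, wo
rms_norm = d_model  # assumption: weight-only RMS norm, no bias
block = attention + feed_forward + 2 * rms_norm

encoder = (
    n_vocab * d_model  # shared embedding, assumed tied to embed_tokens
    + n_blocks * block
    + 32 * 6           # relative_attention_bias, present only in block (0)
    + rms_norm         # final_layer_norm
)
head = linear(d_model, n_tags, bias=True)  # the tagging Linear(1472, 17)

print(f"encoder: {encoder:,}  head: {head:,}  total: {encoder + head:,}")
```

Almost all of the roughly 217.7M parameters sit in the gated feed-forward layers; the task-specific head adds only about 25k.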
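The `lr` column rises during epoch 1 and then falls to almost zero, which is what the `LinearScheduler | warmup_fraction: '0.1'` plugin line suggests: a linear warmup followed by a linear decay. The sketch below reproduces the logged values under two assumptions that we inferred from the log rather than from Flair's source: the total step count is `int((1166 + 8 - 1) / 8 * 10) = 1466` (a fractional steps-per-epoch, truncated after multiplying by the epoch count) with `int(1466 * 0.1) = 146` warmup steps, and the value printed on an `iter k` line is the rate for the 0-indexed step `k - 1` of the run.

```python
import math

def linear_warmup_decay(step: int, peak_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at 0-indexed optimizer step `step`: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# Values taken from the log header.
train_sentences, batch_size, max_epochs = 1166, 8, 10
peak_lr, warmup_fraction = 0.00015, 0.1

batches_per_epoch = math.ceil(train_sentences / batch_size)  # 146, as in "iter k/146"
# Assumption: schedule length computed from a fractional steps-per-epoch, then truncated.
total_steps = int((train_sentences + batch_size - 1) / batch_size * max_epochs)  # 1466
warmup_steps = int(total_steps * warmup_fraction)  # 146

def logged_lr(epoch: int, it: int) -> str:
    """Formatted lr for an 'epoch e - iter k' line, assuming it reports step (e-1)*146 + (k-1)."""
    step = (epoch - 1) * batches_per_epoch + (it - 1)
    return f"{linear_warmup_decay(step, peak_lr, warmup_steps, total_steps):.6f}"

for epoch, it in [(1, 14), (1, 140), (2, 14), (10, 14), (10, 126), (10, 140)]:
    print(f"epoch {epoch} - iter {it}: lr {logged_lr(epoch, it)}")
```

With a schedule length of exactly 146 × 10 = 1460 steps the same formula would print 0.000015 at epoch 10, iter 14 instead of the logged 0.000016, which is why the truncated 1466 is used here.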
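The `saving best model` lines follow the dev micro F1 selected as the final-evaluation metric. A minimal replay of that selection, assuming a checkpoint is written only when the dev score strictly beats the best so far and that the initial best is 0.0 (consistent with epochs 1 and 2 scoring 0.0 and not being saved), is:

```python
# Dev micro-F1 per epoch, copied from the "DEV :" lines of the log.
dev_f1 = [0.0, 0.0, 0.0078, 0.3686, 0.4746, 0.5636, 0.6061, 0.6504, 0.6348, 0.6783]

def checkpoint_epochs(scores: list[float]) -> list[int]:
    """1-based epochs at which a strictly better dev score would trigger a save."""
    best = 0.0  # assumed starting value
    saved = []
    for epoch, score in enumerate(scores, start=1):
        if score > best:
            best = score
            saved.append(epoch)
    return saved

saved = checkpoint_epochs(dev_f1)
print("saved at epochs:", saved, "-> best-model.pt comes from epoch", saved[-1])
```

Epoch 9 (0.6348) falls below epoch 8 (0.6504) and is skipped, so the model loaded for the final test evaluation is the one from epoch 10.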
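The 17-tag dictionary is a BIOES scheme (S = single-token entity, B/I/E = begin/inside/end of a multi-token entity, O = outside) over four entity types, so 1 + 4 × 4 = 17 matches the `out_features=17` of the final `Linear`. The sketch below rebuilds the tag list in the printed order and turns a tag sequence back into entity spans. The decoder `bioes_to_spans` is a hypothetical helper, not Flair's internal implementation, and its policy of silently dropping malformed spans is an assumption.

```python
ENTITY_TYPES = ["LOC", "PER", "ORG", "HumanProd"]
# Same order as the "SequenceTagger predicts" line: O, then S/B/E/I per entity type.
TAGS = ["O"] + [f"{prefix}-{label}" for label in ENTITY_TYPES for prefix in ("S", "B", "E", "I")]

def bioes_to_spans(tags: list[str]) -> list[tuple[int, int, str]]:
    """Decode BIOES tags into (start, end_exclusive, label) spans; malformed spans are dropped."""
    spans: list[tuple[int, int, str]] = []
    start, current = None, None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        if prefix == "S":
            spans.append((i, i + 1, label))
            start, current = None, None
        elif prefix == "B":
            start, current = i, label
        elif prefix == "I" and current == label:
            continue  # still inside the open span
        elif prefix == "E" and current == label:
            spans.append((start, i + 1, label))
            start, current = None, None
        else:
            # "O", or an I/E tag that does not continue the open span, closes it without emitting.
            start, current = None, None
    return spans

print(len(TAGS), TAGS[:5])
print(bioes_to_spans(["B-PER", "I-PER", "E-PER", "O", "S-LOC", "B-ORG", "O"]))
```

In the example, the unterminated `B-ORG` is discarded, leaving one three-token PER span and one single-token LOC span.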
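The summary rows of the test report can be recomputed from the per-class rows: the macro average is the unweighted mean over the four classes, the weighted average weights each class by its support, and the micro F1 is the harmonic mean of the micro precision and recall. The sketch below uses only the rounded numbers printed in the table, so small rounding differences in the fourth decimal are possible in general.

```python
# Per-class rows from the "By class" table: (precision, recall, f1, support).
by_class = {
    "PER": (0.7970, 0.7672, 0.7818, 348),
    "LOC": (0.5960, 0.7969, 0.6820, 261),
    "ORG": (0.2857, 0.3077, 0.2963, 52),
    "HumanProd": (0.8125, 0.5909, 0.6842, 22),
}
micro_p, micro_r = 0.6667, 0.7379  # from the "micro avg" row

n = len(by_class)
total_support = sum(row[3] for row in by_class.values())
# Unweighted mean of precision, recall and F1 across classes.
macro = [sum(row[i] for row in by_class.values()) / n for i in range(3)]
# Support-weighted mean of precision, recall and F1.
weighted = [sum(row[i] * row[3] for row in by_class.values()) / total_support for i in range(3)]
# Harmonic mean of micro precision and recall.
micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)

print("macro   :", [f"{x:.4f}" for x in macro])
print("weighted:", [f"{x:.4f}" for x in weighted])
print("micro F1:", f"{micro_f1:.4f}")
```

The weighted recall equals the micro recall (both 0.7379) by construction: weighting each class recall TP_c / support_c by support_c and dividing by the total support gives total TP over total support. The large gap between micro (0.7005) and macro (0.6111) F1 comes mostly from the small ORG class.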