2023-10-10 23:35:37,071 ----------------------------------------------------------------------------------------------------
2023-10-10 23:35:37,073 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-10 23:35:37,073 ----------------------------------------------------------------------------------------------------
2023-10-10 23:35:37,074 MultiCorpus: 1166 train + 165 dev + 415 test sentences
 - NER_HIPE_2022 Corpus: 1166 train + 165 dev + 415 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fi/with_doc_seperator
2023-10-10 23:35:37,074 ----------------------------------------------------------------------------------------------------
2023-10-10 23:35:37,074 Train:  1166 sentences
2023-10-10 23:35:37,074         (train_with_dev=False, train_with_test=False)
2023-10-10 23:35:37,074 ----------------------------------------------------------------------------------------------------
2023-10-10 23:35:37,074 Training Params:
2023-10-10 23:35:37,074  - learning_rate: "0.00015"
2023-10-10 23:35:37,074  - mini_batch_size: "4"
2023-10-10 23:35:37,074  - max_epochs: "10"
2023-10-10 23:35:37,074  - shuffle: "True"
2023-10-10 23:35:37,074 ----------------------------------------------------------------------------------------------------
2023-10-10 23:35:37,074 Plugins:
2023-10-10 23:35:37,074  - TensorboardLogger
2023-10-10 23:35:37,075  - LinearScheduler | warmup_fraction: '0.1'
2023-10-10 23:35:37,075 ----------------------------------------------------------------------------------------------------
2023-10-10 23:35:37,075 Final evaluation on model from best epoch (best-model.pt)
2023-10-10 23:35:37,075  - metric: "('micro avg', 'f1-score')"
2023-10-10 23:35:37,075 ----------------------------------------------------------------------------------------------------
2023-10-10 23:35:37,075 Computation:
2023-10-10 23:35:37,075  - compute on device: cuda:0
2023-10-10 23:35:37,075  - embedding storage: none
2023-10-10 23:35:37,075 ----------------------------------------------------------------------------------------------------
2023-10-10 23:35:37,075 Model training base path: "hmbench-newseye/fi-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-2"
2023-10-10 23:35:37,075 ----------------------------------------------------------------------------------------------------
2023-10-10 23:35:37,075 ----------------------------------------------------------------------------------------------------
2023-10-10 23:35:37,075 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-10 23:35:46,918 epoch 1 - iter 29/292 - loss 2.85313542 - time (sec): 9.84 - samples/sec: 511.95 - lr: 0.000014 - momentum: 0.000000
2023-10-10 23:35:55,830 epoch 1 - iter 58/292 - loss 2.84372720 - time (sec): 18.75 - samples/sec: 483.82 - lr: 0.000029 - momentum: 0.000000
2023-10-10 23:36:05,558 epoch 1 - iter 87/292 - loss 2.82117874 - time (sec): 28.48 - samples/sec: 484.65 - lr: 0.000044 - momentum: 0.000000
2023-10-10 23:36:14,853 epoch 1 - iter 116/292 - loss 2.77756334 - time (sec): 37.78 - samples/sec: 481.28 - lr: 0.000059 - momentum: 0.000000
2023-10-10 23:36:23,528 epoch 1 - iter 145/292 - loss 2.70096029 - time (sec): 46.45 - samples/sec: 471.66 - lr: 0.000074 - momentum: 0.000000
2023-10-10 23:36:32,493 epoch 1 - iter 174/292 - loss 2.59885515 - time (sec): 55.42 - samples/sec: 466.51 - lr: 0.000089 - momentum: 0.000000
2023-10-10 23:36:42,145 epoch 1 - iter 203/292 - loss 2.47709292 - time (sec): 65.07 - samples/sec: 467.68 - lr: 0.000104 - momentum: 0.000000
2023-10-10 23:36:51,289 epoch 1 - iter 232/292 - loss 2.36240655 - time (sec): 74.21 - samples/sec: 466.99 - lr: 0.000119 - momentum: 0.000000
2023-10-10 23:37:01,345 epoch 1 - iter 261/292 - loss 2.21801160 - time (sec): 84.27 - samples/sec: 470.53 - lr: 0.000134 - momentum: 0.000000
2023-10-10 23:37:11,105 epoch 1 - iter 290/292 - loss 2.08431696 - time (sec): 94.03 - samples/sec: 471.53 - lr: 0.000148 - momentum: 0.000000
2023-10-10 23:37:11,490 ----------------------------------------------------------------------------------------------------
2023-10-10 23:37:11,490 EPOCH 1 done: loss 2.0821 - lr: 0.000148
2023-10-10 23:37:16,630 DEV : loss 0.7319196462631226 - f1-score (micro avg)  0.0
2023-10-10 23:37:16,639 ----------------------------------------------------------------------------------------------------
2023-10-10 23:37:25,760 epoch 2 - iter 29/292 - loss 0.78142018 - time (sec): 9.12 - samples/sec: 475.48 - lr: 0.000148 - momentum: 0.000000
2023-10-10 23:37:35,383 epoch 2 - iter 58/292 - loss 0.69086359 - time (sec): 18.74 - samples/sec: 487.15 - lr: 0.000147 - momentum: 0.000000
2023-10-10 23:37:45,245 epoch 2 - iter 87/292 - loss 0.67792000 - time (sec): 28.60 - samples/sec: 493.68 - lr: 0.000145 - momentum: 0.000000
2023-10-10 23:37:53,662 epoch 2 - iter 116/292 - loss 0.65922426 - time (sec): 37.02 - samples/sec: 479.84 - lr: 0.000143 - momentum: 0.000000
2023-10-10 23:38:03,204 epoch 2 - iter 145/292 - loss 0.62436588 - time (sec): 46.56 - samples/sec: 478.37 - lr: 0.000142 - momentum: 0.000000
2023-10-10 23:38:11,407 epoch 2 - iter 174/292 - loss 0.62263495 - time (sec): 54.77 - samples/sec: 465.55 - lr: 0.000140 - momentum: 0.000000
2023-10-10 23:38:21,079 epoch 2 - iter 203/292 - loss 0.59059183 - time (sec): 64.44 - samples/sec: 471.68 - lr: 0.000138 - momentum: 0.000000
2023-10-10 23:38:30,761 epoch 2 - iter 232/292 - loss 0.55458788 - time (sec): 74.12 - samples/sec: 476.41 - lr: 0.000137 - momentum: 0.000000
2023-10-10 23:38:39,702 epoch 2 - iter 261/292 - loss 0.52948949 - time (sec): 83.06 - samples/sec: 475.48 - lr: 0.000135 - momentum: 0.000000
2023-10-10 23:38:49,110 epoch 2 - iter 290/292 - loss 0.54146179 - time (sec): 92.47 - samples/sec: 478.37 - lr: 0.000134 - momentum: 0.000000
2023-10-10 23:38:49,561 ----------------------------------------------------------------------------------------------------
2023-10-10 23:38:49,561 EPOCH 2 done: loss 0.5406 - lr: 0.000134
2023-10-10 23:38:54,981 DEV : loss 0.30725252628326416 - f1-score (micro avg)  0.0
2023-10-10 23:38:54,990 ----------------------------------------------------------------------------------------------------
2023-10-10 23:39:03,937 epoch 3 - iter 29/292 - loss 0.44916497 - time (sec): 8.94 - samples/sec: 410.97 - lr: 0.000132 - momentum: 0.000000
2023-10-10 23:39:13,069 epoch 3 - iter 58/292 - loss 0.37025341 - time (sec): 18.08 - samples/sec: 444.55 - lr: 0.000130 - momentum: 0.000000
2023-10-10 23:39:22,841 epoch 3 - iter 87/292 - loss 0.43563961 - time (sec): 27.85 - samples/sec: 471.68 - lr: 0.000128 - momentum: 0.000000
2023-10-10 23:39:31,276 epoch 3 - iter 116/292 - loss 0.42624209 - time (sec): 36.28 - samples/sec: 460.09 - lr: 0.000127 - momentum: 0.000000
2023-10-10 23:39:41,047 epoch 3 - iter 145/292 - loss 0.40305206 - time (sec): 46.06 - samples/sec: 467.72 - lr: 0.000125 - momentum: 0.000000
2023-10-10 23:39:50,060 epoch 3 - iter 174/292 - loss 0.38723134 - time (sec): 55.07 - samples/sec: 466.78 - lr: 0.000123 - momentum: 0.000000
2023-10-10 23:39:59,322 epoch 3 - iter 203/292 - loss 0.37293211 - time (sec): 64.33 - samples/sec: 469.75 - lr: 0.000122 - momentum: 0.000000
2023-10-10 23:40:08,762 epoch 3 - iter 232/292 - loss 0.36065966 - time (sec): 73.77 - samples/sec: 473.13 - lr: 0.000120 - momentum: 0.000000
2023-10-10 23:40:18,095 epoch 3 - iter 261/292 - loss 0.35244201 - time (sec): 83.10 - samples/sec: 472.58 - lr: 0.000119 - momentum: 0.000000
2023-10-10 23:40:27,867 epoch 3 - iter 290/292 - loss 0.34283270 - time (sec): 92.88 - samples/sec: 476.03 - lr: 0.000117 - momentum: 0.000000
2023-10-10 23:40:28,376 ----------------------------------------------------------------------------------------------------
2023-10-10 23:40:28,376 EPOCH 3 done: loss 0.3487 - lr: 0.000117
2023-10-10 23:40:34,075 DEV : loss 0.25399211049079895 - f1-score (micro avg)  0.2737
2023-10-10 23:40:34,084 saving best model
2023-10-10 23:40:34,996 ----------------------------------------------------------------------------------------------------
2023-10-10 23:40:44,089 epoch 4 - iter 29/292 - loss 0.27665069 - time (sec): 9.09 - samples/sec: 456.67 - lr: 0.000115 - momentum: 0.000000
2023-10-10 23:40:53,537 epoch 4 - iter 58/292 - loss 0.35599118 - time (sec): 18.54 - samples/sec: 465.92 - lr: 0.000113 - momentum: 0.000000
2023-10-10 23:41:03,309 epoch 4 - iter 87/292 - loss 0.29107191 - time (sec): 28.31 - samples/sec: 461.84 - lr: 0.000112 - momentum: 0.000000
2023-10-10 23:41:12,868 epoch 4 - iter 116/292 - loss 0.28109945 - time (sec): 37.87 - samples/sec: 461.68 - lr: 0.000110 - momentum: 0.000000
2023-10-10 23:41:21,797 epoch 4 - iter 145/292 - loss 0.27689495 - time (sec): 46.80 - samples/sec: 455.20 - lr: 0.000108 - momentum: 0.000000
2023-10-10 23:41:31,466 epoch 4 - iter 174/292 - loss 0.27119259 - time (sec): 56.47 - samples/sec: 453.83 - lr: 0.000107 - momentum: 0.000000
2023-10-10 23:41:41,243 epoch 4 - iter 203/292 - loss 0.25991322 - time (sec): 66.25 - samples/sec: 455.98 - lr: 0.000105 - momentum: 0.000000
2023-10-10 23:41:50,603 epoch 4 - iter 232/292 - loss 0.25730647 - time (sec): 75.61 - samples/sec: 455.08 - lr: 0.000104 - momentum: 0.000000
2023-10-10 23:42:00,430 epoch 4 - iter 261/292 - loss 0.26262908 - time (sec): 85.43 - samples/sec: 459.51 - lr: 0.000102 - momentum: 0.000000
2023-10-10 23:42:10,745 epoch 4 - iter 290/292 - loss 0.25695125 - time (sec): 95.75 - samples/sec: 462.90 - lr: 0.000100 - momentum: 0.000000
2023-10-10 23:42:11,163 ----------------------------------------------------------------------------------------------------
2023-10-10 23:42:11,163 EPOCH 4 done: loss 0.2568 - lr: 0.000100
2023-10-10 23:42:16,884 DEV : loss 0.19706253707408905 - f1-score (micro avg)  0.4559
2023-10-10 23:42:16,894 saving best model
2023-10-10 23:42:24,060 ----------------------------------------------------------------------------------------------------
2023-10-10 23:42:33,369 epoch 5 - iter 29/292 - loss 0.21009808 - time (sec): 9.31 - samples/sec: 462.53 - lr: 0.000098 - momentum: 0.000000
2023-10-10 23:42:43,233 epoch 5 - iter 58/292 - loss 0.18064570 - time (sec): 19.17 - samples/sec: 471.90 - lr: 0.000097 - momentum: 0.000000
2023-10-10 23:42:52,542 epoch 5 - iter 87/292 - loss 0.17577573 - time (sec): 28.48 - samples/sec: 464.00 - lr: 0.000095 - momentum: 0.000000
2023-10-10 23:43:02,200 epoch 5 - iter 116/292 - loss 0.17534975 - time (sec): 38.14 - samples/sec: 463.29 - lr: 0.000093 - momentum: 0.000000
2023-10-10 23:43:11,628 epoch 5 - iter 145/292 - loss 0.17692015 - time (sec): 47.56 - samples/sec: 456.12 - lr: 0.000092 - momentum: 0.000000
2023-10-10 23:43:22,538 epoch 5 - iter 174/292 - loss 0.19355367 - time (sec): 58.47 - samples/sec: 468.31 - lr: 0.000090 - momentum: 0.000000
2023-10-10 23:43:32,633 epoch 5 - iter 203/292 - loss 0.18908499 - time (sec): 68.57 - samples/sec: 464.72 - lr: 0.000089 - momentum: 0.000000
2023-10-10 23:43:42,635 epoch 5 - iter 232/292 - loss 0.18645282 - time (sec): 78.57 - samples/sec: 464.25 - lr: 0.000087 - momentum: 0.000000
2023-10-10 23:43:51,591 epoch 5 - iter 261/292 - loss 0.18419102 - time (sec): 87.53 - samples/sec: 458.59 - lr: 0.000085 - momentum: 0.000000
2023-10-10 23:44:01,370 epoch 5 - iter 290/292 - loss 0.18219314 - time (sec): 97.31 - samples/sec: 455.09 - lr: 0.000084 - momentum: 0.000000
2023-10-10 23:44:01,814 ----------------------------------------------------------------------------------------------------
2023-10-10 23:44:01,814 EPOCH 5 done: loss 0.1825 - lr: 0.000084
2023-10-10 23:44:07,667 DEV : loss 0.16620376706123352 - f1-score (micro avg)  0.6582
2023-10-10 23:44:07,676 saving best model
2023-10-10 23:44:14,706 ----------------------------------------------------------------------------------------------------
2023-10-10 23:44:24,827 epoch 6 - iter 29/292 - loss 0.12681655 - time (sec): 10.12 - samples/sec: 476.31 - lr: 0.000082 - momentum: 0.000000
2023-10-10 23:44:34,359 epoch 6 - iter 58/292 - loss 0.13248581 - time (sec): 19.65 - samples/sec: 457.69 - lr: 0.000080 - momentum: 0.000000
2023-10-10 23:44:43,923 epoch 6 - iter 87/292 - loss 0.12617558 - time (sec): 29.21 - samples/sec: 454.60 - lr: 0.000078 - momentum: 0.000000
2023-10-10 23:44:53,324 epoch 6 - iter 116/292 - loss 0.13236752 - time (sec): 38.61 - samples/sec: 453.54 - lr: 0.000077 - momentum: 0.000000
2023-10-10 23:45:02,801 epoch 6 - iter 145/292 - loss 0.13561748 - time (sec): 48.09 - samples/sec: 452.93 - lr: 0.000075 - momentum: 0.000000
2023-10-10 23:45:12,212 epoch 6 - iter 174/292 - loss 0.13666887 - time (sec): 57.50 - samples/sec: 451.10 - lr: 0.000074 - momentum: 0.000000
2023-10-10 23:45:22,723 epoch 6 - iter 203/292 - loss 0.13676891 - time (sec): 68.01 - samples/sec: 460.56 - lr: 0.000072 - momentum: 0.000000
2023-10-10 23:45:31,946 epoch 6 - iter 232/292 - loss 0.13971349 - time (sec): 77.24 - samples/sec: 457.86 - lr: 0.000070 - momentum: 0.000000
2023-10-10 23:45:41,291 epoch 6 - iter 261/292 - loss 0.13570923 - time (sec): 86.58 - samples/sec: 459.15 - lr: 0.000069 - momentum: 0.000000
2023-10-10 23:45:50,981 epoch 6 - iter 290/292 - loss 0.13228776 - time (sec): 96.27 - samples/sec: 457.72 - lr: 0.000067 - momentum: 0.000000
2023-10-10 23:45:51,640 ----------------------------------------------------------------------------------------------------
2023-10-10 23:45:51,640 EPOCH 6 done: loss 0.1315 - lr: 0.000067
2023-10-10 23:45:57,282 DEV : loss 0.14966456592082977 - f1-score (micro avg)  0.7119
2023-10-10 23:45:57,292 saving best model
2023-10-10 23:46:05,725 ----------------------------------------------------------------------------------------------------
2023-10-10 23:46:15,412 epoch 7 - iter 29/292 - loss 0.10699187 - time (sec): 9.68 - samples/sec: 452.78 - lr: 0.000065 - momentum: 0.000000
2023-10-10 23:46:25,634 epoch 7 - iter 58/292 - loss 0.10066351 - time (sec): 19.91 - samples/sec: 474.60 - lr: 0.000063 - momentum: 0.000000
2023-10-10 23:46:34,493 epoch 7 - iter 87/292 - loss 0.09640648 - time (sec): 28.76 - samples/sec: 455.47 - lr: 0.000062 - momentum: 0.000000
2023-10-10 23:46:43,743 epoch 7 - iter 116/292 - loss 0.10508994 - time (sec): 38.01 - samples/sec: 454.73 - lr: 0.000060 - momentum: 0.000000
2023-10-10 23:46:52,292 epoch 7 - iter 145/292 - loss 0.10571912 - time (sec): 46.56 - samples/sec: 444.09 - lr: 0.000059 - momentum: 0.000000
2023-10-10 23:47:01,874 epoch 7 - iter 174/292 - loss 0.10302895 - time (sec): 56.14 - samples/sec: 452.90 - lr: 0.000057 - momentum: 0.000000
2023-10-10 23:47:11,954 epoch 7 - iter 203/292 - loss 0.10265168 - time (sec): 66.22 - samples/sec: 456.99 - lr: 0.000055 - momentum: 0.000000
2023-10-10 23:47:21,832 epoch 7 - iter 232/292 - loss 0.10359819 - time (sec): 76.10 - samples/sec: 461.95 - lr: 0.000054 - momentum: 0.000000
2023-10-10 23:47:31,190 epoch 7 - iter 261/292 - loss 0.10237414 - time (sec): 85.46 - samples/sec: 462.74 - lr: 0.000052 - momentum: 0.000000
2023-10-10 23:47:41,383 epoch 7 - iter 290/292 - loss 0.10022497 - time (sec): 95.65 - samples/sec: 462.90 - lr: 0.000050 - momentum: 0.000000
2023-10-10 23:47:41,885 ----------------------------------------------------------------------------------------------------
2023-10-10 23:47:41,886 EPOCH 7 done: loss 0.1003 - lr: 0.000050
2023-10-10 23:47:48,311 DEV : loss 0.14436788856983185 - f1-score (micro avg)  0.7431
2023-10-10 23:47:48,322 saving best model
2023-10-10 23:47:55,695 ----------------------------------------------------------------------------------------------------
2023-10-10 23:48:05,875 epoch 8 - iter 29/292 - loss 0.08606206 - time (sec): 10.18 - samples/sec: 428.08 - lr: 0.000048 - momentum: 0.000000
2023-10-10 23:48:15,176 epoch 8 - iter 58/292 - loss 0.10042979 - time (sec): 19.48 - samples/sec: 439.24 - lr: 0.000047 - momentum: 0.000000
2023-10-10 23:48:24,658 epoch 8 - iter 87/292 - loss 0.08629826 - time (sec): 28.96 - samples/sec: 451.68 - lr: 0.000045 - momentum: 0.000000
2023-10-10 23:48:33,961 epoch 8 - iter 116/292 - loss 0.09198991 - time (sec): 38.26 - samples/sec: 456.71 - lr: 0.000044 - momentum: 0.000000
2023-10-10 23:48:43,305 epoch 8 - iter 145/292 - loss 0.08720786 - time (sec): 47.61 - samples/sec: 458.40 - lr: 0.000042 - momentum: 0.000000
2023-10-10 23:48:51,893 epoch 8 - iter 174/292 - loss 0.08829853 - time (sec): 56.19 - samples/sec: 450.20 - lr: 0.000040 - momentum: 0.000000
2023-10-10 23:49:02,167 epoch 8 - iter 203/292 - loss 0.08393690 - time (sec): 66.47 - samples/sec: 461.17 - lr: 0.000039 - momentum: 0.000000
2023-10-10 23:49:11,359 epoch 8 - iter 232/292 - loss 0.08287430 - time (sec): 75.66 - samples/sec: 459.05 - lr: 0.000037 - momentum: 0.000000
2023-10-10 23:49:21,542 epoch 8 - iter 261/292 - loss 0.08141757 - time (sec): 85.84 - samples/sec: 466.25 - lr: 0.000035 - momentum: 0.000000
2023-10-10 23:49:30,780 epoch 8 - iter 290/292 - loss 0.08090538 - time (sec): 95.08 - samples/sec: 466.14 - lr: 0.000034 - momentum: 0.000000
2023-10-10 23:49:31,186 ----------------------------------------------------------------------------------------------------
2023-10-10 23:49:31,187 EPOCH 8 done: loss 0.0807 - lr: 0.000034
2023-10-10 23:49:36,774 DEV : loss 0.14600899815559387 - f1-score (micro avg)  0.7373
2023-10-10 23:49:36,784 ----------------------------------------------------------------------------------------------------
2023-10-10 23:49:46,636 epoch 9 - iter 29/292 - loss 0.07929525 - time (sec): 9.85 - samples/sec: 446.43 - lr: 0.000032 - momentum: 0.000000
2023-10-10 23:49:55,786 epoch 9 - iter 58/292 - loss 0.06917974 - time (sec): 19.00 - samples/sec: 446.43 - lr: 0.000030 - momentum: 0.000000
2023-10-10 23:50:04,404 epoch 9 - iter 87/292 - loss 0.07967206 - time (sec): 27.62 - samples/sec: 436.54 - lr: 0.000029 - momentum: 0.000000
2023-10-10 23:50:14,209 epoch 9 - iter 116/292 - loss 0.07544498 - time (sec): 37.42 - samples/sec: 451.67 - lr: 0.000027 - momentum: 0.000000
2023-10-10 23:50:24,534 epoch 9 - iter 145/292 - loss 0.07339838 - time (sec): 47.75 - samples/sec: 464.00 - lr: 0.000025 - momentum: 0.000000
2023-10-10 23:50:33,546 epoch 9 - iter 174/292 - loss 0.07269647 - time (sec): 56.76 - samples/sec: 460.21 - lr: 0.000024 - momentum: 0.000000
2023-10-10 23:50:43,184 epoch 9 - iter 203/292 - loss 0.07186638 - time (sec): 66.40 - samples/sec: 464.40 - lr: 0.000022 - momentum: 0.000000
2023-10-10 23:50:52,306 epoch 9 - iter 232/292 - loss 0.07095228 - time (sec): 75.52 - samples/sec: 461.80 - lr: 0.000020 - momentum: 0.000000
2023-10-10 23:51:02,019 epoch 9 - iter 261/292 - loss 0.07058102 - time (sec): 85.23 - samples/sec: 465.29 - lr: 0.000019 - momentum: 0.000000
2023-10-10 23:51:11,747 epoch 9 - iter 290/292 - loss 0.06922655 - time (sec): 94.96 - samples/sec: 466.40 - lr: 0.000017 - momentum: 0.000000
2023-10-10 23:51:12,183 ----------------------------------------------------------------------------------------------------
2023-10-10 23:51:12,184 EPOCH 9 done: loss 0.0691 - lr: 0.000017
2023-10-10 23:51:17,740 DEV : loss 0.1435050070285797 - f1-score (micro avg)  0.7431
2023-10-10 23:51:17,750 ----------------------------------------------------------------------------------------------------
2023-10-10 23:51:26,849 epoch 10 - iter 29/292 - loss 0.07707015 - time (sec): 9.10 - samples/sec: 469.17 - lr: 0.000015 - momentum: 0.000000
2023-10-10 23:51:36,524 epoch 10 - iter 58/292 - loss 0.07455355 - time (sec): 18.77 - samples/sec: 481.63 - lr: 0.000014 - momentum: 0.000000
2023-10-10 23:51:45,636 epoch 10 - iter 87/292 - loss 0.06383844 - time (sec): 27.88 - samples/sec: 465.36 - lr: 0.000012 - momentum: 0.000000
2023-10-10 23:51:55,848 epoch 10 - iter 116/292 - loss 0.06023946 - time (sec): 38.10 - samples/sec: 454.54 - lr: 0.000010 - momentum: 0.000000
2023-10-10 23:52:06,208 epoch 10 - iter 145/292 - loss 0.05687143 - time (sec): 48.46 - samples/sec: 444.57 - lr: 0.000009 - momentum: 0.000000
2023-10-10 23:52:16,612 epoch 10 - iter 174/292 - loss 0.06003429 - time (sec): 58.86 - samples/sec: 447.08 - lr: 0.000007 - momentum: 0.000000
2023-10-10 23:52:26,689 epoch 10 - iter 203/292 - loss 0.06203918 - time (sec): 68.94 - samples/sec: 454.17 - lr: 0.000005 - momentum: 0.000000
2023-10-10 23:52:35,890 epoch 10 - iter 232/292 - loss 0.06183864 - time (sec): 78.14 - samples/sec: 450.04 - lr: 0.000004 - momentum: 0.000000
2023-10-10 23:52:46,122 epoch 10 - iter 261/292 - loss 0.06243654 - time (sec): 88.37 - samples/sec: 455.81 - lr: 0.000002 - momentum: 0.000000
2023-10-10 23:52:55,079 epoch 10 - iter 290/292 - loss 0.06314414 - time (sec): 97.33 - samples/sec: 454.93 - lr: 0.000000 - momentum: 0.000000
2023-10-10 23:52:55,520 ----------------------------------------------------------------------------------------------------
2023-10-10 23:52:55,520 EPOCH 10 done: loss 0.0629 - lr: 0.000000
2023-10-10 23:53:00,971 DEV : loss 0.144588902592659 - f1-score (micro avg)  0.7553
2023-10-10 23:53:00,980 saving best model
2023-10-10 23:53:07,827 ----------------------------------------------------------------------------------------------------
2023-10-10 23:53:07,829 Loading model from best epoch ...
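The lr column above is driven by the LinearScheduler plugin with warmup_fraction '0.1': the learning rate ramps linearly from zero to the peak of 0.00015 over the first 10% of the 2920 total batches (292 iterations x 10 epochs), then decays linearly back to zero. A minimal sketch of that schedule (the function name and step bookkeeping are illustrative, not Flair's internal implementation; values match the log only up to the log's own rounding):

```python
def linear_warmup_lr(step: int,
                     peak_lr: float = 0.00015,
                     total_steps: int = 292 * 10,
                     warmup_fraction: float = 0.1) -> float:
    """Linear warmup to peak_lr, then linear decay to zero."""
    warmup_steps = int(warmup_fraction * total_steps)  # 292 steps
    if step <= warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

# Approximate agreement with the logged lr values:
#   epoch 1, iter 29   (step 29)   -> ~0.0000149 (log: 0.000014)
#   epoch 1, iter 290  (step 290)  -> ~0.0001490 (log: 0.000148)
#   epoch 5, iter 290  (step 1458) -> ~0.0000834 (log: 0.000084)
#   epoch 10, iter 290 (step 2918) -> ~0.0000001 (log: 0.000000)
```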
2023-10-10 23:53:11,365 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-10 23:53:25,880 
Results:
- F-score (micro) 0.7047
- F-score (macro) 0.6351
- Accuracy 0.5597

By class:
              precision    recall  f1-score   support

         PER     0.7429    0.8218    0.7804       348
         LOC     0.5879    0.7816    0.6711       261
         ORG     0.3585    0.3654    0.3619        52
   HumanProd     0.7273    0.7273    0.7273        22

   micro avg     0.6506    0.7687    0.7047       683
   macro avg     0.6041    0.6740    0.6351       683
weighted avg     0.6539    0.7687    0.7050       683

2023-10-10 23:53:25,881 ----------------------------------------------------------------------------------------------------
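As a sanity check, the aggregate rows of the final table follow from the per-class rows: macro-averaged F1 is the unweighted mean of the four class F1 scores, weighted-averaged F1 weights each class F1 by its support, and micro-averaged F1 is the harmonic mean of the micro precision and recall. A short sketch recomputing them from the logged numbers (small last-digit differences are expected, since the logged aggregates were computed from unrounded per-class values):

```python
# Per-class (precision, recall, f1, support) from the final evaluation above.
by_class = {
    "PER":       (0.7429, 0.8218, 0.7804, 348),
    "LOC":       (0.5879, 0.7816, 0.6711, 261),
    "ORG":       (0.3585, 0.3654, 0.3619, 52),
    "HumanProd": (0.7273, 0.7273, 0.7273, 22),
}

f1s = [f1 for _, _, f1, _ in by_class.values()]
supports = [n for _, _, _, n in by_class.values()]

# Macro: unweighted mean of class F1s (log: 0.6351).
macro_f1 = sum(f1s) / len(f1s)

# Weighted: class F1s weighted by support (log: 0.7050).
weighted_f1 = sum(f * n for f, n in zip(f1s, supports)) / sum(supports)

# Micro: harmonic mean of micro precision and recall (log: 0.7047).
micro_p, micro_r = 0.6506, 0.7687
micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)
```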