2023-04-10 13:00:16,297 ----------------------------------------------------------------------------------------------------
2023-04-10 13:00:16,300 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): RobertaModel(
      (embeddings): RobertaEmbeddings(
        (word_embeddings): Embedding(50263, 768)
        (position_embeddings): Embedding(514, 768, padding_idx=1)
        (token_type_embeddings): Embedding(1, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): RobertaEncoder(
        (layer): ModuleList(
          (0-11): 12 x RobertaLayer(
            (attention): RobertaAttention(
              (self): RobertaSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): RobertaSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): RobertaIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): RobertaOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): RobertaPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-04-10 13:00:16,300 ----------------------------------------------------------------------------------------------------
2023-04-10 13:00:16,302 Corpus: "Corpus: 12554 train + 4549 dev + 4505 test sentences"
2023-04-10 13:00:16,303 ----------------------------------------------------------------------------------------------------
2023-04-10 13:00:16,305 Parameters:
2023-04-10 13:00:16,305 - learning_rate: "0.000005"
2023-04-10 13:00:16,307 - mini_batch_size: "16"
2023-04-10 13:00:16,308 - patience: "3"
2023-04-10 13:00:16,310 - anneal_factor: "0.5"
2023-04-10 13:00:16,312 - max_epochs: "20"
2023-04-10 13:00:16,313 - shuffle: "True"
2023-04-10 13:00:16,315 - train_with_dev: "False"
2023-04-10 13:00:16,316 - batch_growth_annealing: "False"
2023-04-10 13:00:16,317 ----------------------------------------------------------------------------------------------------
2023-04-10 13:00:16,319 Model training base path: "CREBMSP_results"
2023-04-10 13:00:16,320 ----------------------------------------------------------------------------------------------------
2023-04-10 13:00:16,322 Device: cuda
2023-04-10 13:00:16,323 ----------------------------------------------------------------------------------------------------
2023-04-10 13:00:16,329 Embeddings storage mode: none
2023-04-10 13:00:16,329 ----------------------------------------------------------------------------------------------------
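The parameter block above maps directly onto Flair's ModelTrainer API. What follows is a minimal sketch of how a run with this configuration could be launched; it is not taken from the log. The corpus folder, column format, and the transformer checkpoint name are placeholder assumptions (the log only identifies a RoBERTa-base-sized encoder with a 50,263-token vocabulary and the output directory "CREBMSP_results"), and the learning-rate warm-up visible in the first epochs suggests a transformer fine-tuning schedule (e.g. trainer.fine_tune) rather than the default annealing behaviour of train().

# Sketch only: reproduces the logged hyperparameters with Flair's ModelTrainer.
# Corpus paths, column format and the checkpoint name are hypothetical.
from flair.datasets import ColumnCorpus
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Assumed CoNLL-style corpus (12554 train / 4549 dev / 4505 test sentences).
corpus = ColumnCorpus(
    "data/CREBMSP",                      # hypothetical data folder
    {0: "text", 1: "ner"},               # hypothetical column format
    train_file="train.conll", dev_file="dev.conll", test_file="test.conll",
)
label_type = "ner"
label_dict = corpus.make_label_dictionary(label_type=label_type)

# Fine-tuned RoBERTa word embeddings; the checkpoint name is a placeholder.
embeddings = TransformerWordEmbeddings("some-roberta-base-checkpoint", fine_tune=True)

# Matches the printed head: LockedDropout + Linear(768, 17), no RNN, no CRF.
tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type=label_type,
    use_rnn=False,
    use_crf=False,
    reproject_embeddings=False,
)

trainer = ModelTrainer(tagger, corpus)
trainer.train(
    "CREBMSP_results",
    learning_rate=0.000005,
    mini_batch_size=16,
    max_epochs=20,
    patience=3,
    anneal_factor=0.5,
    shuffle=True,
    train_with_dev=False,
    embeddings_storage_mode="none",
)

With use_rnn=False the 768-dimensional transformer output feeds the linear tag head directly, which is what the 768 -> 17 linear layer in the printed model reflects; 17 outputs are consistent with a BIOES-style scheme over the four entity classes plus O, though the exact tagging scheme is not shown in the log.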
2023-04-10 13:01:05,878 epoch 1 - iter 78/785 - loss 2.72638388 - time (sec): 49.55 - samples/sec: 640.40 - lr: 0.000000
2023-04-10 13:01:51,226 epoch 1 - iter 156/785 - loss 2.67369637 - time (sec): 94.89 - samples/sec: 661.65 - lr: 0.000000
2023-04-10 13:02:36,982 epoch 1 - iter 234/785 - loss 2.59907317 - time (sec): 140.65 - samples/sec: 673.76 - lr: 0.000001
2023-04-10 13:03:22,714 epoch 1 - iter 312/785 - loss 2.54109267 - time (sec): 186.38 - samples/sec: 607.84 - lr: 0.000001
2023-04-10 13:04:08,546 epoch 1 - iter 390/785 - loss 2.44853293 - time (sec): 232.21 - samples/sec: 553.27 - lr: 0.000001
2023-04-10 13:04:54,366 epoch 1 - iter 468/785 - loss 2.33490476 - time (sec): 278.03 - samples/sec: 516.47 - lr: 0.000001
2023-04-10 13:05:39,909 epoch 1 - iter 546/785 - loss 2.23039567 - time (sec): 323.58 - samples/sec: 493.22 - lr: 0.000002
2023-04-10 13:06:25,708 epoch 1 - iter 624/785 - loss 2.13753946 - time (sec): 369.38 - samples/sec: 474.69 - lr: 0.000002
2023-04-10 13:07:11,470 epoch 1 - iter 702/785 - loss 2.05023541 - time (sec): 415.14 - samples/sec: 459.71 - lr: 0.000002
2023-04-10 13:07:57,005 epoch 1 - iter 780/785 - loss 1.96306759 - time (sec): 460.67 - samples/sec: 449.14 - lr: 0.000002
2023-04-10 13:07:59,739 ----------------------------------------------------------------------------------------------------
2023-04-10 13:07:59,740 EPOCH 1 done: loss 1.9575 - lr 0.000002
2023-04-10 13:08:24,560 Evaluating as a multi-label problem: False
2023-04-10 13:08:24,631 DEV : loss 0.7945712804794312 - f1-score (micro avg) 0.1859
2023-04-10 13:08:24,716 ----------------------------------------------------------------------------------------------------
2023-04-10 13:09:10,697 epoch 2 - iter 78/785 - loss 0.80433163 - time (sec): 45.98 - samples/sec: 433.02 - lr: 0.000003
2023-04-10 13:09:56,456 epoch 2 - iter 156/785 - loss 0.77925513 - time (sec): 91.74 - samples/sec: 446.90 - lr: 0.000003
2023-04-10 13:10:42,110 epoch 2 - iter 234/785 - loss 0.75449825 - time (sec): 137.39 - samples/sec: 451.07 - lr: 0.000003
2023-04-10 13:11:28,041 epoch 2 - iter 312/785 - loss 0.73931223 - time (sec): 183.32 - samples/sec: 452.65 - lr: 0.000003
2023-04-10 13:12:13,619 epoch 2 - iter 390/785 - loss 0.71775506 - time (sec): 228.90 - samples/sec: 453.04 - lr: 0.000004
2023-04-10 13:12:59,428 epoch 2 - iter 468/785 - loss 0.70077066 - time (sec): 274.71 - samples/sec: 454.23 - lr: 0.000004
2023-04-10 13:13:45,255 epoch 2 - iter 546/785 - loss 0.67654616 - time (sec): 320.54 - samples/sec: 453.06 - lr: 0.000004
2023-04-10 13:14:31,147 epoch 2 - iter 624/785 - loss 0.65446315 - time (sec): 366.43 - samples/sec: 452.84 - lr: 0.000004
2023-04-10 13:15:17,010 epoch 2 - iter 702/785 - loss 0.63531321 - time (sec): 412.29 - samples/sec: 453.08 - lr: 0.000005
2023-04-10 13:16:02,846 epoch 2 - iter 780/785 - loss 0.61564112 - time (sec): 458.13 - samples/sec: 451.65 - lr: 0.000005
2023-04-10 13:16:05,578 ----------------------------------------------------------------------------------------------------
2023-04-10 13:16:05,580 EPOCH 2 done: loss 0.6148 - lr 0.000005
2023-04-10 13:16:31,519 Evaluating as a multi-label problem: False
2023-04-10 13:16:31,599 DEV : loss 0.3734082877635956 - f1-score (micro avg) 0.6995
2023-04-10 13:16:31,683 ----------------------------------------------------------------------------------------------------
2023-04-10 13:17:17,398 epoch 3 - iter 78/785 - loss 0.42211544 - time (sec): 45.71 - samples/sec: 460.06 - lr: 0.000005
2023-04-10 13:18:03,476 epoch 3 - iter 156/785 - loss 0.39062543 - time (sec): 91.79 - samples/sec: 450.57 - lr: 0.000005
2023-04-10 13:18:49,456 epoch 3 - iter 234/785 - loss 0.38367027 - time (sec): 137.77 - samples/sec: 448.39 - lr: 0.000005
2023-04-10 13:19:35,061 epoch 3 - iter 312/785 - loss 0.37454659 - time (sec): 183.38 - samples/sec: 450.10 - lr: 0.000005
2023-04-10 13:20:20,826 epoch 3 - iter 390/785 - loss 0.36558572 - time (sec): 229.14 - samples/sec: 448.39 - lr: 0.000005
2023-04-10 13:21:06,862 epoch 3 - iter 468/785 - loss 0.36016623 - time (sec): 275.18 - samples/sec: 450.31 - lr: 0.000005
2023-04-10 13:21:52,802 epoch 3 - iter 546/785 - loss 0.35301531 - time (sec): 321.12 - samples/sec: 450.40 - lr: 0.000005
2023-04-10 13:22:38,631 epoch 3 - iter 624/785 - loss 0.34785227 - time (sec): 366.95 - samples/sec: 451.31 - lr: 0.000005
2023-04-10 13:23:24,542 epoch 3 - iter 702/785 - loss 0.34289183 - time (sec): 412.86 - samples/sec: 450.14 - lr: 0.000005
2023-04-10 13:24:10,698 epoch 3 - iter 780/785 - loss 0.33617042 - time (sec): 459.01 - samples/sec: 450.47 - lr: 0.000005
2023-04-10 13:24:13,424 ----------------------------------------------------------------------------------------------------
2023-04-10 13:24:13,425 EPOCH 3 done: loss 0.3359 - lr 0.000005
2023-04-10 13:24:39,219 Evaluating as a multi-label problem: False
2023-04-10 13:24:39,294 DEV : loss 0.2478274405002594 - f1-score (micro avg) 0.7669
2023-04-10 13:24:39,378 ----------------------------------------------------------------------------------------------------
2023-04-10 13:25:25,057 epoch 4 - iter 78/785 - loss 0.24856806 - time (sec): 45.68 - samples/sec: 460.40 - lr: 0.000005
2023-04-10 13:26:11,029 epoch 4 - iter 156/785 - loss 0.24520289 - time (sec): 91.65 - samples/sec: 457.73 - lr: 0.000005
2023-04-10 13:26:56,833 epoch 4 - iter 234/785 - loss 0.24918267 - time (sec): 137.45 - samples/sec: 446.42 - lr: 0.000005
2023-04-10 13:27:42,768 epoch 4 - iter 312/785 - loss 0.24994835 - time (sec): 183.39 - samples/sec: 445.87 - lr: 0.000005
2023-04-10 13:28:28,729 epoch 4 - iter 390/785 - loss 0.24670791 - time (sec): 229.35 - samples/sec: 443.71 - lr: 0.000005
2023-04-10 13:29:14,517 epoch 4 - iter 468/785 - loss 0.24363947 - time (sec): 275.14 - samples/sec: 447.73 - lr: 0.000005
2023-04-10 13:30:00,442 epoch 4 - iter 546/785 - loss 0.24232958 - time (sec): 321.06 - samples/sec: 446.61 - lr: 0.000005
2023-04-10 13:30:46,468 epoch 4 - iter 624/785 - loss 0.23891458 - time (sec): 367.09 - samples/sec: 447.11 - lr: 0.000005
2023-04-10 13:31:32,246 epoch 4 - iter 702/785 - loss 0.23581434 - time (sec): 412.87 - samples/sec: 450.67 - lr: 0.000004
2023-04-10 13:32:18,461 epoch 4 - iter 780/785 - loss 0.23410588 - time (sec): 459.08 - samples/sec: 450.67 - lr: 0.000004
2023-04-10 13:32:21,126 ----------------------------------------------------------------------------------------------------
2023-04-10 13:32:21,128 EPOCH 4 done: loss 0.2340 - lr 0.000004
2023-04-10 13:32:46,956 Evaluating as a multi-label problem: False
2023-04-10 13:32:47,034 DEV : loss 0.21353298425674438 - f1-score (micro avg) 0.7899
2023-04-10 13:32:47,119 ----------------------------------------------------------------------------------------------------
2023-04-10 13:33:32,954 epoch 5 - iter 78/785 - loss 0.18971301 - time (sec): 45.83 - samples/sec: 455.71 - lr: 0.000004
2023-04-10 13:34:18,919 epoch 5 - iter 156/785 - loss 0.19396453 - time (sec): 91.80 - samples/sec: 453.89 - lr: 0.000004
2023-04-10 13:35:04,859 epoch 5 - iter 234/785 - loss 0.19108296 - time (sec): 137.74 - samples/sec: 453.20 - lr: 0.000004
2023-04-10 13:35:50,732 epoch 5 - iter 312/785 - loss 0.18832768 - time (sec): 183.61 - samples/sec: 449.66 - lr: 0.000004
2023-04-10 13:36:36,467 epoch 5 - iter 390/785 - loss 0.18825695 - time (sec): 229.35 - samples/sec: 452.64 - lr: 0.000004
2023-04-10 13:37:22,590 epoch 5 - iter 468/785 - loss 0.18787454 - time (sec): 275.47 - samples/sec: 451.57 - lr: 0.000004
2023-04-10 13:38:08,477 epoch 5 - iter 546/785 - loss 0.18615161 - time (sec): 321.36 - samples/sec: 451.47 - lr: 0.000004
2023-04-10 13:38:54,257 epoch 5 - iter 624/785 - loss 0.18594722 - time (sec): 367.14 - samples/sec: 450.59 - lr: 0.000004
2023-04-10 13:39:40,394 epoch 5 - iter 702/785 - loss 0.18508805 - time (sec): 413.27 - samples/sec: 450.44 - lr: 0.000004
2023-04-10 13:40:26,352 epoch 5 - iter 780/785 - loss 0.18421189 - time (sec): 459.23 - samples/sec: 450.18 - lr: 0.000004
2023-04-10 13:40:29,092 ----------------------------------------------------------------------------------------------------
2023-04-10 13:40:29,093 EPOCH 5 done: loss 0.1843 - lr 0.000004
2023-04-10 13:40:54,758 Evaluating as a multi-label problem: False
2023-04-10 13:40:54,836 DEV : loss 0.19248297810554504 - f1-score (micro avg) 0.8091
2023-04-10 13:40:54,920 ----------------------------------------------------------------------------------------------------
2023-04-10 13:41:40,954 epoch 6 - iter 78/785 - loss 0.16169561 - time (sec): 46.03 - samples/sec: 440.82 - lr: 0.000004
2023-04-10 13:42:26,837 epoch 6 - iter 156/785 - loss 0.15868632 - time (sec): 91.92 - samples/sec: 441.25 - lr: 0.000004
2023-04-10 13:43:12,870 epoch 6 - iter 234/785 - loss 0.16192991 - time (sec): 137.95 - samples/sec: 443.43 - lr: 0.000004
2023-04-10 13:43:58,927 epoch 6 - iter 312/785 - loss 0.15837734 - time (sec): 184.01 - samples/sec: 445.62 - lr: 0.000004
2023-04-10 13:44:44,996 epoch 6 - iter 390/785 - loss 0.15549650 - time (sec): 230.07 - samples/sec: 442.88 - lr: 0.000004
2023-04-10 13:45:31,130 epoch 6 - iter 468/785 - loss 0.15509965 - time (sec): 276.21 - samples/sec: 440.68 - lr: 0.000004
2023-04-10 13:46:17,430 epoch 6 - iter 546/785 - loss 0.15536700 - time (sec): 322.51 - samples/sec: 444.17 - lr: 0.000004
2023-04-10 13:47:03,271 epoch 6 - iter 624/785 - loss 0.15596272 - time (sec): 368.35 - samples/sec: 447.46 - lr: 0.000004
2023-04-10 13:47:49,333 epoch 6 - iter 702/785 - loss 0.15470882 - time (sec): 414.41 - samples/sec: 446.90 - lr: 0.000004
2023-04-10 13:48:35,335 epoch 6 - iter 780/785 - loss 0.15353726 - time (sec): 460.41 - samples/sec: 449.07 - lr: 0.000004
2023-04-10 13:48:38,091 ----------------------------------------------------------------------------------------------------
2023-04-10 13:48:38,093 EPOCH 6 done: loss 0.1537 - lr 0.000004
2023-04-10 13:49:03,872 Evaluating as a multi-label problem: False
2023-04-10 13:49:03,948 DEV : loss 0.19085420668125153 - f1-score (micro avg) 0.8218
2023-04-10 13:49:04,033 ----------------------------------------------------------------------------------------------------
2023-04-10 13:49:49,709 epoch 7 - iter 78/785 - loss 0.13949250 - time (sec): 45.68 - samples/sec: 451.69 - lr: 0.000004
2023-04-10 13:50:35,575 epoch 7 - iter 156/785 - loss 0.13991533 - time (sec): 91.54 - samples/sec: 451.74 - lr: 0.000004
2023-04-10 13:51:21,688 epoch 7 - iter 234/785 - loss 0.13727018 - time (sec): 137.65 - samples/sec: 446.11 - lr: 0.000004
2023-04-10 13:52:07,734 epoch 7 - iter 312/785 - loss 0.13962965 - time (sec): 183.70 - samples/sec: 443.38 - lr: 0.000004
2023-04-10 13:52:53,850 epoch 7 - iter 390/785 - loss 0.13871141 - time (sec): 229.82 - samples/sec: 444.60 - lr: 0.000004
2023-04-10 13:53:39,863 epoch 7 - iter 468/785 - loss 0.13783456 - time (sec): 275.83 - samples/sec: 446.99 - lr: 0.000004
2023-04-10 13:54:25,630 epoch 7 - iter 546/785 - loss 0.13700803 - time (sec): 321.60 - samples/sec: 449.44 - lr: 0.000004
2023-04-10 13:55:11,589 epoch 7 - iter 624/785 - loss 0.13522205 - time (sec): 367.55 - samples/sec: 450.51 - lr: 0.000004
2023-04-10 13:55:57,520 epoch 7 - iter 702/785 - loss 0.13499711 - time (sec): 413.49 - samples/sec: 449.78 - lr: 0.000004
2023-04-10 13:56:43,432 epoch 7 - iter 780/785 - loss 0.13300228 - time (sec): 459.40 - samples/sec: 450.08 - lr: 0.000004
2023-04-10 13:56:46,156 ----------------------------------------------------------------------------------------------------
2023-04-10 13:56:46,157 EPOCH 7 done: loss 0.1327 - lr 0.000004
2023-04-10 13:57:11,974 Evaluating as a multi-label problem: False
2023-04-10 13:57:12,051 DEV : loss 0.188308447599411 - f1-score (micro avg) 0.8331
2023-04-10 13:57:12,135 ----------------------------------------------------------------------------------------------------
2023-04-10 13:57:58,126 epoch 8 - iter 78/785 - loss 0.12315730 - time (sec): 45.99 - samples/sec: 450.64 - lr: 0.000004
2023-04-10 13:58:44,253 epoch 8 - iter 156/785 - loss 0.11813120 - time (sec): 92.12 - samples/sec: 436.42 - lr: 0.000004
2023-04-10 13:59:30,059 epoch 8 - iter 234/785 - loss 0.11978297 - time (sec): 137.92 - samples/sec: 444.20 - lr: 0.000004
2023-04-10 14:00:16,019 epoch 8 - iter 312/785 - loss 0.11865398 - time (sec): 183.88 - samples/sec: 448.73 - lr: 0.000004
2023-04-10 14:01:01,667 epoch 8 - iter 390/785 - loss 0.11649927 - time (sec): 229.53 - samples/sec: 449.84 - lr: 0.000003
2023-04-10 14:01:47,632 epoch 8 - iter 468/785 - loss 0.11678690 - time (sec): 275.50 - samples/sec: 450.66 - lr: 0.000003
2023-04-10 14:02:33,760 epoch 8 - iter 546/785 - loss 0.11768582 - time (sec): 321.62 - samples/sec: 451.69 - lr: 0.000003
2023-04-10 14:03:19,469 epoch 8 - iter 624/785 - loss 0.11698818 - time (sec): 367.33 - samples/sec: 450.46 - lr: 0.000003
2023-04-10 14:04:05,350 epoch 8 - iter 702/785 - loss 0.11655166 - time (sec): 413.21 - samples/sec: 449.45 - lr: 0.000003
2023-04-10 14:04:51,185 epoch 8 - iter 780/785 - loss 0.11644185 - time (sec): 459.05 - samples/sec: 450.72 - lr: 0.000003
2023-04-10 14:04:53,872 ----------------------------------------------------------------------------------------------------
2023-04-10 14:04:53,874 EPOCH 8 done: loss 0.1164 - lr 0.000003
2023-04-10 14:05:19,663 Evaluating as a multi-label problem: False
2023-04-10 14:05:19,743 DEV : loss 0.18473857641220093 - f1-score (micro avg) 0.8406
2023-04-10 14:05:19,828 ----------------------------------------------------------------------------------------------------
2023-04-10 14:06:05,453 epoch 9 - iter 78/785 - loss 0.10657376 - time (sec): 45.62 - samples/sec: 440.02 - lr: 0.000003
2023-04-10 14:06:51,368 epoch 9 - iter 156/785 - loss 0.10863598 - time (sec): 91.54 - samples/sec: 439.74 - lr: 0.000003
2023-04-10 14:07:36,988 epoch 9 - iter 234/785 - loss 0.10633920 - time (sec): 137.16 - samples/sec: 445.35 - lr: 0.000003
2023-04-10 14:08:22,898 epoch 9 - iter 312/785 - loss 0.10460097 - time (sec): 183.07 - samples/sec: 446.32 - lr: 0.000003
2023-04-10 14:09:08,636 epoch 9 - iter 390/785 - loss 0.10531387 - time (sec): 228.81 - samples/sec: 446.18 - lr: 0.000003
2023-04-10 14:09:54,238 epoch 9 - iter 468/785 - loss 0.10648494 - time (sec): 274.41 - samples/sec: 446.35 - lr: 0.000003
2023-04-10 14:10:39,806 epoch 9 - iter 546/785 - loss 0.10488251 - time (sec): 319.98 - samples/sec: 448.60 - lr: 0.000003
2023-04-10 14:11:25,286 epoch 9 - iter 624/785 - loss 0.10527523 - time (sec): 365.46 - samples/sec: 450.60 - lr: 0.000003
2023-04-10 14:12:10,986 epoch 9 - iter 702/785 - loss 0.10473876 - time (sec): 411.16 - samples/sec: 451.31 - lr: 0.000003
2023-04-10 14:12:56,700 epoch 9 - iter 780/785 - loss 0.10399221 - time (sec): 456.87 - samples/sec: 452.89 - lr: 0.000003
2023-04-10 14:12:59,380 ----------------------------------------------------------------------------------------------------
2023-04-10 14:12:59,382 EPOCH 9 done: loss 0.1042 - lr 0.000003
2023-04-10 14:13:25,061 Evaluating as a multi-label problem: False
2023-04-10 14:13:25,138 DEV : loss 0.19332602620124817 - f1-score (micro avg) 0.8443
2023-04-10 14:13:25,224 ----------------------------------------------------------------------------------------------------
2023-04-10 14:14:10,842 epoch 10 - iter 78/785 - loss 0.09722209 - time (sec): 45.62 - samples/sec: 457.53 - lr: 0.000003
2023-04-10 14:14:56,574 epoch 10 - iter 156/785 - loss 0.09960375 - time (sec): 91.35 - samples/sec: 452.13 - lr: 0.000003
2023-04-10 14:15:42,382 epoch 10 - iter 234/785 - loss 0.09791734 - time (sec): 137.16 - samples/sec: 451.64 - lr: 0.000003
2023-04-10 14:16:28,359 epoch 10 - iter 312/785 - loss 0.09533145 - time (sec): 183.13 - samples/sec: 455.08 - lr: 0.000003
2023-04-10 14:17:13,898 epoch 10 - iter 390/785 - loss 0.09546462 - time (sec): 228.67 - samples/sec: 455.61 - lr: 0.000003
2023-04-10 14:17:59,305 epoch 10 - iter 468/785 - loss 0.09469376 - time (sec): 274.08 - samples/sec: 452.76 - lr: 0.000003
2023-04-10 14:18:44,944 epoch 10 - iter 546/785 - loss 0.09461890 - time (sec): 319.72 - samples/sec: 453.17 - lr: 0.000003
2023-04-10 14:19:30,380 epoch 10 - iter 624/785 - loss 0.09481641 - time (sec): 365.16 - samples/sec: 453.85 - lr: 0.000003
2023-04-10 14:20:15,968 epoch 10 - iter 702/785 - loss 0.09459196 - time (sec): 410.74 - samples/sec: 453.32 - lr: 0.000003
2023-04-10 14:21:01,721 epoch 10 - iter 780/785 - loss 0.09403540 - time (sec): 456.50 - samples/sec: 452.79 - lr: 0.000003
2023-04-10 14:21:04,407 ----------------------------------------------------------------------------------------------------
2023-04-10 14:21:04,409 EPOCH 10 done: loss 0.0938 - lr 0.000003
2023-04-10 14:21:30,389 Evaluating as a multi-label problem: False
2023-04-10 14:21:30,467 DEV : loss 0.18941430747509003 - f1-score (micro avg) 0.8458
2023-04-10 14:21:30,553 ----------------------------------------------------------------------------------------------------
2023-04-10 14:22:16,229 epoch 11 - iter 78/785 - loss 0.07965436 - time (sec): 45.67 - samples/sec: 469.58 - lr: 0.000003
2023-04-10 14:23:02,286 epoch 11 - iter 156/785 - loss 0.08242477 - time (sec): 91.73 - samples/sec: 467.46 - lr: 0.000003
2023-04-10 14:23:47,919 epoch 11 - iter 234/785 - loss 0.08405410 - time (sec): 137.36 - samples/sec: 461.18 - lr: 0.000003
2023-04-10 14:24:33,686 epoch 11 - iter 312/785 - loss 0.08238391 - time (sec): 183.13 - samples/sec: 455.77 - lr: 0.000003
2023-04-10 14:25:19,484 epoch 11 - iter 390/785 - loss 0.08149592 - time (sec): 228.93 - samples/sec: 453.93 - lr: 0.000003
2023-04-10 14:26:05,233 epoch 11 - iter 468/785 - loss 0.08168820 - time (sec): 274.68 - samples/sec: 452.09 - lr: 0.000003
2023-04-10 14:26:50,901 epoch 11 - iter 546/785 - loss 0.08177046 - time (sec): 320.35 - samples/sec: 452.25 - lr: 0.000003
2023-04-10 14:27:36,661 epoch 11 - iter 624/785 - loss 0.08271731 - time (sec): 366.11 - samples/sec: 452.99 - lr: 0.000003
2023-04-10 14:28:22,099 epoch 11 - iter 702/785 - loss 0.08254577 - time (sec): 411.54 - samples/sec: 452.31 - lr: 0.000003
2023-04-10 14:29:07,671 epoch 11 - iter 780/785 - loss 0.08371043 - time (sec): 457.12 - samples/sec: 453.09 - lr: 0.000003
2023-04-10 14:29:10,316 ----------------------------------------------------------------------------------------------------
2023-04-10 14:29:10,318 EPOCH 11 done: loss 0.0837 - lr 0.000003
2023-04-10 14:29:35,090 Evaluating as a multi-label problem: False
2023-04-10 14:29:35,166 DEV : loss 0.2022610902786255 - f1-score (micro avg) 0.8404
2023-04-10 14:29:35,259 ----------------------------------------------------------------------------------------------------
2023-04-10 14:30:21,283 epoch 12 - iter 78/785 - loss 0.06767359 - time (sec): 46.02 - samples/sec: 454.91 - lr: 0.000002
2023-04-10 14:31:07,146 epoch 12 - iter 156/785 - loss 0.07443837 - time (sec): 91.89 - samples/sec: 449.99 - lr: 0.000002
2023-04-10 14:31:53,003 epoch 12 - iter 234/785 - loss 0.07629224 - time (sec): 137.74 - samples/sec: 451.86 - lr: 0.000002
2023-04-10 14:32:38,934 epoch 12 - iter 312/785 - loss 0.07741157 - time (sec): 183.67 - samples/sec: 452.03 - lr: 0.000002
2023-04-10 14:33:24,791 epoch 12 - iter 390/785 - loss 0.07706257 - time (sec): 229.53 - samples/sec: 454.13 - lr: 0.000002
2023-04-10 14:34:10,546 epoch 12 - iter 468/785 - loss 0.07581749 - time (sec): 275.29 - samples/sec: 454.58 - lr: 0.000002
2023-04-10 14:34:56,353 epoch 12 - iter 546/785 - loss 0.07615371 - time (sec): 321.09 - samples/sec: 453.24 - lr: 0.000002
2023-04-10 14:35:41,715 epoch 12 - iter 624/785 - loss 0.07630547 - time (sec): 366.45 - samples/sec: 451.94 - lr: 0.000002
2023-04-10 14:36:27,902 epoch 12 - iter 702/785 - loss 0.07703151 - time (sec): 412.64 - samples/sec: 451.11 - lr: 0.000002
2023-04-10 14:37:13,513 epoch 12 - iter 780/785 - loss 0.07688972 - time (sec): 458.25 - samples/sec: 451.11 - lr: 0.000002
2023-04-10 14:37:16,212 ----------------------------------------------------------------------------------------------------
2023-04-10 14:37:16,214 EPOCH 12 done: loss 0.0769 - lr 0.000002
2023-04-10 14:37:42,267 Evaluating as a multi-label problem: False
2023-04-10 14:37:42,343 DEV : loss 0.19032613933086395 - f1-score (micro avg) 0.8513
2023-04-10 14:37:42,429 ----------------------------------------------------------------------------------------------------
2023-04-10 14:38:28,309 epoch 13 - iter 78/785 - loss 0.06781882 - time (sec): 45.88 - samples/sec: 437.28 - lr: 0.000002
2023-04-10 14:39:13,998 epoch 13 - iter 156/785 - loss 0.06953428 - time (sec): 91.57 - samples/sec: 442.07 - lr: 0.000002
2023-04-10 14:39:59,400 epoch 13 - iter 234/785 - loss 0.06968786 - time (sec): 136.97 - samples/sec: 447.35 - lr: 0.000002
2023-04-10 14:40:45,242 epoch 13 - iter 312/785 - loss 0.07032229 - time (sec): 182.81 - samples/sec: 449.08 - lr: 0.000002
2023-04-10 14:41:30,932 epoch 13 - iter 390/785 - loss 0.07052987 - time (sec): 228.50 - samples/sec: 445.56 - lr: 0.000002
2023-04-10 14:42:16,884 epoch 13 - iter 468/785 - loss 0.07176712 - time (sec): 274.45 - samples/sec: 444.54 - lr: 0.000002
2023-04-10 14:43:02,911 epoch 13 - iter 546/785 - loss 0.07183614 - time (sec): 320.48 - samples/sec: 446.39 - lr: 0.000002
2023-04-10 14:43:48,816 epoch 13 - iter 624/785 - loss 0.07253765 - time (sec): 366.39 - samples/sec: 446.93 - lr: 0.000002
2023-04-10 14:44:34,491 epoch 13 - iter 702/785 - loss 0.07213498 - time (sec): 412.06 - samples/sec: 449.17 - lr: 0.000002
2023-04-10 14:45:20,007 epoch 13 - iter 780/785 - loss 0.07218568 - time (sec): 457.58 - samples/sec: 451.86 - lr: 0.000002
2023-04-10 14:45:22,772 ----------------------------------------------------------------------------------------------------
2023-04-10 14:45:22,774 EPOCH 13 done: loss 0.0722 - lr 0.000002
2023-04-10 14:45:48,608 Evaluating as a multi-label problem: False
2023-04-10 14:45:48,685 DEV : loss 0.19682374596595764 - f1-score (micro avg) 0.853
2023-04-10 14:45:48,772 ----------------------------------------------------------------------------------------------------
2023-04-10 14:46:34,526 epoch 14 - iter 78/785 - loss 0.05882194 - time (sec): 45.75 - samples/sec: 442.48 - lr: 0.000002
2023-04-10 14:47:20,308 epoch 14 - iter 156/785 - loss 0.06553124 - time (sec): 91.53 - samples/sec: 446.65 - lr: 0.000002
2023-04-10 14:48:06,130 epoch 14 - iter 234/785 - loss 0.06636154 - time (sec): 137.36 - samples/sec: 445.02 - lr: 0.000002
2023-04-10 14:48:51,621 epoch 14 - iter 312/785 - loss 0.06544912 - time (sec): 182.85 - samples/sec: 448.03 - lr: 0.000002
2023-04-10 14:49:37,323 epoch 14 - iter 390/785 - loss 0.06512617 - time (sec): 228.55 - samples/sec: 448.79 - lr: 0.000002
2023-04-10 14:50:23,228 epoch 14 - iter 468/785 - loss 0.06536846 - time (sec): 274.46 - samples/sec: 448.59 - lr: 0.000002
2023-04-10 14:51:08,762 epoch 14 - iter 546/785 - loss 0.06540547 - time (sec): 319.99 - samples/sec: 450.40 - lr: 0.000002
2023-04-10 14:51:54,701 epoch 14 - iter 624/785 - loss 0.06641531 - time (sec): 365.93 - samples/sec: 448.39 - lr: 0.000002
2023-04-10 14:52:40,613 epoch 14 - iter 702/785 - loss 0.06649606 - time (sec): 411.84 - samples/sec: 449.74 - lr: 0.000002
2023-04-10 14:53:26,281 epoch 14 - iter 780/785 - loss 0.06663863 - time (sec): 457.51 - samples/sec: 452.11 - lr: 0.000002
2023-04-10 14:53:29,011 ----------------------------------------------------------------------------------------------------
2023-04-10 14:53:29,013 EPOCH 14 done: loss 0.0665 - lr 0.000002
2023-04-10 14:53:54,922 Evaluating as a multi-label problem: False
2023-04-10 14:53:54,995 DEV : loss 0.19152763485908508 - f1-score (micro avg) 0.8543
2023-04-10 14:53:55,084 ----------------------------------------------------------------------------------------------------
2023-04-10 14:54:40,977 epoch 15 - iter 78/785 - loss 0.05893628 - time (sec): 45.89 - samples/sec: 434.64 - lr: 0.000002
2023-04-10 14:55:26,530 epoch 15 - iter 156/785 - loss 0.06296731 - time (sec): 91.44 - samples/sec: 438.22 - lr: 0.000002
2023-04-10 14:56:11,919 epoch 15 - iter 234/785 - loss 0.06296709 - time (sec): 136.83 - samples/sec: 449.49 - lr: 0.000002
2023-04-10 14:56:57,537 epoch 15 - iter 312/785 - loss 0.06042086 - time (sec): 182.45 - samples/sec: 452.57 - lr: 0.000002
2023-04-10 14:57:43,221 epoch 15 - iter 390/785 - loss 0.06400154 - time (sec): 228.13 - samples/sec: 453.81 - lr: 0.000002
2023-04-10 14:58:29,151 epoch 15 - iter 468/785 - loss 0.06384428 - time (sec): 274.07 - samples/sec: 450.67 - lr: 0.000002
2023-04-10 14:59:14,871 epoch 15 - iter 546/785 - loss 0.06211280 - time (sec): 319.79 - samples/sec: 452.05 - lr: 0.000001
2023-04-10 15:00:00,212 epoch 15 - iter 624/785 - loss 0.06312445 - time (sec): 365.13 - samples/sec: 453.40 - lr: 0.000001
2023-04-10 15:00:46,083 epoch 15 - iter 702/785 - loss 0.06365620 - time (sec): 411.00 - samples/sec: 453.08 - lr: 0.000001
2023-04-10 15:01:31,831 epoch 15 - iter 780/785 - loss 0.06357545 - time (sec): 456.74 - samples/sec: 452.90 - lr: 0.000001
2023-04-10 15:01:34,610 ----------------------------------------------------------------------------------------------------
2023-04-10 15:01:34,612 EPOCH 15 done: loss 0.0635 - lr 0.000001
2023-04-10 15:02:00,571 Evaluating as a multi-label problem: False
2023-04-10 15:02:00,650 DEV : loss 0.19623318314552307 - f1-score (micro avg) 0.8562
2023-04-10 15:02:00,739 ----------------------------------------------------------------------------------------------------
2023-04-10 15:02:46,516 epoch 16 - iter 78/785 - loss 0.05263069 - time (sec): 45.78 - samples/sec: 447.66 - lr: 0.000001
2023-04-10 15:03:32,217 epoch 16 - iter 156/785 - loss 0.05540555 - time (sec): 91.48 - samples/sec: 458.04 - lr: 0.000001
2023-04-10 15:04:17,802 epoch 16 - iter 234/785 - loss 0.05653095 - time (sec): 137.06 - samples/sec: 454.61 - lr: 0.000001
2023-04-10 15:05:03,756 epoch 16 - iter 312/785 - loss 0.05690468 - time (sec): 183.01 - samples/sec: 453.77 - lr: 0.000001
2023-04-10 15:05:49,407 epoch 16 - iter 390/785 - loss 0.05848835 - time (sec): 228.67 - samples/sec: 454.04 - lr: 0.000001
2023-04-10 15:06:35,060 epoch 16 - iter 468/785 - loss 0.05897047 - time (sec): 274.32 - samples/sec: 453.35 - lr: 0.000001
2023-04-10 15:07:20,765 epoch 16 - iter 546/785 - loss 0.05940641 - time (sec): 320.02 - samples/sec: 452.29 - lr: 0.000001
2023-04-10 15:08:06,518 epoch 16 - iter 624/785 - loss 0.05878874 - time (sec): 365.78 - samples/sec: 452.31 - lr: 0.000001
2023-04-10 15:08:52,406 epoch 16 - iter 702/785 - loss 0.05878710 - time (sec): 411.67 - samples/sec: 452.43 - lr: 0.000001
2023-04-10 15:09:38,261 epoch 16 - iter 780/785 - loss 0.05871527 - time (sec): 457.52 - samples/sec: 452.47 - lr: 0.000001
2023-04-10 15:09:41,139 ----------------------------------------------------------------------------------------------------
2023-04-10 15:09:41,141 EPOCH 16 done: loss 0.0587 - lr 0.000001
2023-04-10 15:10:06,206 Evaluating as a multi-label problem: False
2023-04-10 15:10:06,282 DEV : loss 0.19955378770828247 - f1-score (micro avg) 0.8578
2023-04-10 15:10:06,370 ----------------------------------------------------------------------------------------------------
2023-04-10 15:10:52,361 epoch 17 - iter 78/785 - loss 0.05076330 - time (sec): 45.99 - samples/sec: 462.06 - lr: 0.000001
2023-04-10 15:11:38,184 epoch 17 - iter 156/785 - loss 0.05519241 - time (sec): 91.81 - samples/sec: 462.21 - lr: 0.000001
2023-04-10 15:12:24,115 epoch 17 - iter 234/785 - loss 0.05342529 - time (sec): 137.74 - samples/sec: 457.55 - lr: 0.000001
2023-04-10 15:13:09,882 epoch 17 - iter 312/785 - loss 0.05189467 - time (sec): 183.51 - samples/sec: 455.00 - lr: 0.000001
2023-04-10 15:13:55,976 epoch 17 - iter 390/785 - loss 0.05405067 - time (sec): 229.60 - samples/sec: 453.17 - lr: 0.000001
2023-04-10 15:14:41,579 epoch 17 - iter 468/785 - loss 0.05398715 - time (sec): 275.21 - samples/sec: 453.21 - lr: 0.000001
2023-04-10 15:15:27,308 epoch 17 - iter 546/785 - loss 0.05539713 - time (sec): 320.94 - samples/sec: 454.08 - lr: 0.000001
2023-04-10 15:16:13,512 epoch 17 - iter 624/785 - loss 0.05586570 - time (sec): 367.14 - samples/sec: 453.67 - lr: 0.000001
2023-04-10 15:16:59,624 epoch 17 - iter 702/785 - loss 0.05576616 - time (sec): 413.25 - samples/sec: 452.94 - lr: 0.000001
2023-04-10 15:17:45,460 epoch 17 - iter 780/785 - loss 0.05531521 - time (sec): 459.09 - samples/sec: 450.39 - lr: 0.000001
2023-04-10 15:17:48,168 ----------------------------------------------------------------------------------------------------
2023-04-10 15:17:48,170 EPOCH 17 done: loss 0.0553 - lr 0.000001
2023-04-10 15:18:14,080 Evaluating as a multi-label problem: False
2023-04-10 15:18:14,155 DEV : loss 0.20788049697875977 - f1-score (micro avg) 0.8562
2023-04-10 15:18:14,243 ----------------------------------------------------------------------------------------------------
2023-04-10 15:18:48,097 ----------------------------------------------------------------------------------------------------
2023-04-10 15:18:48,099 Exiting from training early.
2023-04-10 15:18:48,100 Saving model ...
2023-04-10 15:18:48,949 Done.
2023-04-10 15:18:48,952 ----------------------------------------------------------------------------------------------------
2023-04-10 15:18:48,954 Testing using last state of model ...
2023-04-10 15:19:14,468 Evaluating as a multi-label problem: False
2023-04-10 15:19:14,541 0.8346  0.868   0.851   0.7477
2023-04-10 15:19:14,543 Results:
- F-score (micro) 0.851
- F-score (macro) 0.8197
- Accuracy 0.7477

By class:
              precision    recall  f1-score   support

        PROC     0.8033    0.8731    0.8368      3364
        DISO     0.8552    0.8722    0.8636      2472
        CHEM     0.8973    0.8933    0.8953      1565
        ANAT     0.7138    0.6551    0.6832       316

   micro avg     0.8346    0.8680    0.8510      7717
   macro avg     0.8174    0.8234    0.8197      7717
weighted avg     0.8353    0.8680    0.8509      7717
2023-04-10 15:19:14,544 ----------------------------------------------------------------------------------------------------
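The saved tagger can be loaded back with Flair's standard API. A minimal inference sketch follows; the checkpoint path assumes Flair's default final-model.pt file inside the logged base path "CREBMSP_results", and the input sentence is only a placeholder.

# Sketch only: load the trained tagger and tag one sentence.
from flair.data import Sentence
from flair.models import SequenceTagger

# Assumes Flair's default checkpoint name inside the logged base path.
tagger = SequenceTagger.load("CREBMSP_results/final-model.pt")

sentence = Sentence("Placeholder clinical sentence to tag .")
tagger.predict(sentence)

# Print predicted entity spans (PROC, DISO, CHEM, ANAT) with confidence scores.
for span in sentence.get_spans("ner"):
    label = span.get_label("ner")
    print(span.text, label.value, round(label.score, 3))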