2023-10-09 21:19:56,874 ---------------------------------------------------------------------------------------------------- 2023-10-09 21:19:56,877 Model: "SequenceTagger( (embeddings): ByT5Embeddings( (model): T5EncoderModel( (shared): Embedding(384, 1472) (encoder): T5Stack( (embed_tokens): Embedding(384, 1472) (block): ModuleList( (0): T5Block( (layer): ModuleList( (0): T5LayerSelfAttention( (SelfAttention): T5Attention( (q): Linear(in_features=1472, out_features=384, bias=False) (k): Linear(in_features=1472, out_features=384, bias=False) (v): Linear(in_features=1472, out_features=384, bias=False) (o): Linear(in_features=384, out_features=1472, bias=False) (relative_attention_bias): Embedding(32, 6) ) (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (1): T5LayerFF( (DenseReluDense): T5DenseGatedActDense( (wi_0): Linear(in_features=1472, out_features=3584, bias=False) (wi_1): Linear(in_features=1472, out_features=3584, bias=False) (wo): Linear(in_features=3584, out_features=1472, bias=False) (dropout): Dropout(p=0.1, inplace=False) (act): NewGELUActivation() ) (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) ) (1-11): 11 x T5Block( (layer): ModuleList( (0): T5LayerSelfAttention( (SelfAttention): T5Attention( (q): Linear(in_features=1472, out_features=384, bias=False) (k): Linear(in_features=1472, out_features=384, bias=False) (v): Linear(in_features=1472, out_features=384, bias=False) (o): Linear(in_features=384, out_features=1472, bias=False) ) (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (1): T5LayerFF( (DenseReluDense): T5DenseGatedActDense( (wi_0): Linear(in_features=1472, out_features=3584, bias=False) (wi_1): Linear(in_features=1472, out_features=3584, bias=False) (wo): Linear(in_features=3584, out_features=1472, 
bias=False) (dropout): Dropout(p=0.1, inplace=False) (act): NewGELUActivation() ) (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) ) ) (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) ) (locked_dropout): LockedDropout(p=0.5) (linear): Linear(in_features=1472, out_features=17, bias=True) (loss_function): CrossEntropyLoss() )" 2023-10-09 21:19:56,877 ---------------------------------------------------------------------------------------------------- 2023-10-09 21:19:56,877 MultiCorpus: 20847 train + 1123 dev + 3350 test sentences - NER_HIPE_2022 Corpus: 20847 train + 1123 dev + 3350 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/de/with_doc_seperator 2023-10-09 21:19:56,877 ---------------------------------------------------------------------------------------------------- 2023-10-09 21:19:56,877 Train: 20847 sentences 2023-10-09 21:19:56,877 (train_with_dev=False, train_with_test=False) 2023-10-09 21:19:56,878 ---------------------------------------------------------------------------------------------------- 2023-10-09 21:19:56,878 Training Params: 2023-10-09 21:19:56,878 - learning_rate: "0.00016" 2023-10-09 21:19:56,878 - mini_batch_size: "8" 2023-10-09 21:19:56,878 - max_epochs: "10" 2023-10-09 21:19:56,878 - shuffle: "True" 2023-10-09 21:19:56,878 ---------------------------------------------------------------------------------------------------- 2023-10-09 21:19:56,878 Plugins: 2023-10-09 21:19:56,878 - TensorboardLogger 2023-10-09 21:19:56,878 - LinearScheduler | warmup_fraction: '0.1' 2023-10-09 21:19:56,878 ---------------------------------------------------------------------------------------------------- 2023-10-09 21:19:56,878 Final evaluation on model from best epoch (best-model.pt) 2023-10-09 21:19:56,879 - metric: "('micro avg', 'f1-score')" 2023-10-09 21:19:56,879 
---------------------------------------------------------------------------------------------------- 2023-10-09 21:19:56,879 Computation: 2023-10-09 21:19:56,879 - compute on device: cuda:0 2023-10-09 21:19:56,879 - embedding storage: none 2023-10-09 21:19:56,879 ---------------------------------------------------------------------------------------------------- 2023-10-09 21:19:56,879 Model training base path: "hmbench-newseye/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-1" 2023-10-09 21:19:56,879 ---------------------------------------------------------------------------------------------------- 2023-10-09 21:19:56,879 ---------------------------------------------------------------------------------------------------- 2023-10-09 21:19:56,879 Logging anything other than scalars to TensorBoard is currently not supported. 2023-10-09 21:22:21,512 epoch 1 - iter 260/2606 - loss 2.80647480 - time (sec): 144.63 - samples/sec: 272.42 - lr: 0.000016 - momentum: 0.000000 2023-10-09 21:24:40,110 epoch 1 - iter 520/2606 - loss 2.57714924 - time (sec): 283.23 - samples/sec: 261.01 - lr: 0.000032 - momentum: 0.000000 2023-10-09 21:27:08,137 epoch 1 - iter 780/2606 - loss 2.16321673 - time (sec): 431.26 - samples/sec: 255.15 - lr: 0.000048 - momentum: 0.000000 2023-10-09 21:29:25,932 epoch 1 - iter 1040/2606 - loss 1.80319624 - time (sec): 569.05 - samples/sec: 253.10 - lr: 0.000064 - momentum: 0.000000 2023-10-09 21:31:44,572 epoch 1 - iter 1300/2606 - loss 1.51984788 - time (sec): 707.69 - samples/sec: 256.83 - lr: 0.000080 - momentum: 0.000000 2023-10-09 21:34:06,385 epoch 1 - iter 1560/2606 - loss 1.33539090 - time (sec): 849.50 - samples/sec: 258.86 - lr: 0.000096 - momentum: 0.000000 2023-10-09 21:36:25,234 epoch 1 - iter 1820/2606 - loss 1.20340260 - time (sec): 988.35 - samples/sec: 257.95 - lr: 0.000112 - momentum: 0.000000 2023-10-09 21:38:45,336 epoch 1 - iter 2080/2606 - loss 1.08973347 - 
time (sec): 1128.45 - samples/sec: 258.54 - lr: 0.000128 - momentum: 0.000000 2023-10-09 21:41:05,295 epoch 1 - iter 2340/2606 - loss 0.99333935 - time (sec): 1268.41 - samples/sec: 260.51 - lr: 0.000144 - momentum: 0.000000 2023-10-09 21:43:27,691 epoch 1 - iter 2600/2606 - loss 0.92110059 - time (sec): 1410.81 - samples/sec: 259.88 - lr: 0.000160 - momentum: 0.000000 2023-10-09 21:43:30,711 ---------------------------------------------------------------------------------------------------- 2023-10-09 21:43:30,711 EPOCH 1 done: loss 0.9198 - lr: 0.000160 2023-10-09 21:44:07,491 DEV : loss 0.1330375373363495 - f1-score (micro avg) 0.3013 2023-10-09 21:44:07,556 saving best model 2023-10-09 21:44:08,556 ---------------------------------------------------------------------------------------------------- 2023-10-09 21:46:29,548 epoch 2 - iter 260/2606 - loss 0.21276153 - time (sec): 140.99 - samples/sec: 282.02 - lr: 0.000158 - momentum: 0.000000 2023-10-09 21:48:49,379 epoch 2 - iter 520/2606 - loss 0.21284661 - time (sec): 280.82 - samples/sec: 279.79 - lr: 0.000156 - momentum: 0.000000 2023-10-09 21:51:07,729 epoch 2 - iter 780/2606 - loss 0.19909410 - time (sec): 419.17 - samples/sec: 276.40 - lr: 0.000155 - momentum: 0.000000 2023-10-09 21:53:23,117 epoch 2 - iter 1040/2606 - loss 0.19175427 - time (sec): 554.56 - samples/sec: 271.02 - lr: 0.000153 - momentum: 0.000000 2023-10-09 21:55:40,552 epoch 2 - iter 1300/2606 - loss 0.18755619 - time (sec): 691.99 - samples/sec: 269.33 - lr: 0.000151 - momentum: 0.000000 2023-10-09 21:57:56,677 epoch 2 - iter 1560/2606 - loss 0.18201966 - time (sec): 828.12 - samples/sec: 268.26 - lr: 0.000149 - momentum: 0.000000 2023-10-09 22:00:21,283 epoch 2 - iter 1820/2606 - loss 0.17717852 - time (sec): 972.72 - samples/sec: 265.45 - lr: 0.000148 - momentum: 0.000000 2023-10-09 22:02:40,967 epoch 2 - iter 2080/2606 - loss 0.17086806 - time (sec): 1112.41 - samples/sec: 265.29 - lr: 0.000146 - momentum: 0.000000 2023-10-09 
22:05:00,885 epoch 2 - iter 2340/2606 - loss 0.16582540 - time (sec): 1252.33 - samples/sec: 265.78 - lr: 0.000144 - momentum: 0.000000 2023-10-09 22:07:18,380 epoch 2 - iter 2600/2606 - loss 0.16149225 - time (sec): 1389.82 - samples/sec: 263.81 - lr: 0.000142 - momentum: 0.000000 2023-10-09 22:07:21,444 ---------------------------------------------------------------------------------------------------- 2023-10-09 22:07:21,445 EPOCH 2 done: loss 0.1613 - lr: 0.000142 2023-10-09 22:08:03,474 DEV : loss 0.11526025831699371 - f1-score (micro avg) 0.3843 2023-10-09 22:08:03,533 saving best model 2023-10-09 22:08:06,253 ---------------------------------------------------------------------------------------------------- 2023-10-09 22:10:30,351 epoch 3 - iter 260/2606 - loss 0.09533533 - time (sec): 144.09 - samples/sec: 252.71 - lr: 0.000140 - momentum: 0.000000 2023-10-09 22:12:48,656 epoch 3 - iter 520/2606 - loss 0.09967778 - time (sec): 282.40 - samples/sec: 252.71 - lr: 0.000139 - momentum: 0.000000 2023-10-09 22:15:13,286 epoch 3 - iter 780/2606 - loss 0.09556336 - time (sec): 427.02 - samples/sec: 259.00 - lr: 0.000137 - momentum: 0.000000 2023-10-09 22:17:29,465 epoch 3 - iter 1040/2606 - loss 0.09703779 - time (sec): 563.20 - samples/sec: 255.55 - lr: 0.000135 - momentum: 0.000000 2023-10-09 22:19:43,209 epoch 3 - iter 1300/2606 - loss 0.09641637 - time (sec): 696.95 - samples/sec: 254.29 - lr: 0.000133 - momentum: 0.000000 2023-10-09 22:22:06,685 epoch 3 - iter 1560/2606 - loss 0.09648911 - time (sec): 840.42 - samples/sec: 258.14 - lr: 0.000132 - momentum: 0.000000 2023-10-09 22:24:35,613 epoch 3 - iter 1820/2606 - loss 0.09589935 - time (sec): 989.35 - samples/sec: 258.39 - lr: 0.000130 - momentum: 0.000000 2023-10-09 22:26:56,117 epoch 3 - iter 2080/2606 - loss 0.09505206 - time (sec): 1129.86 - samples/sec: 259.55 - lr: 0.000128 - momentum: 0.000000 2023-10-09 22:29:16,749 epoch 3 - iter 2340/2606 - loss 0.09458005 - time (sec): 1270.49 - samples/sec: 
260.79 - lr: 0.000126 - momentum: 0.000000 2023-10-09 22:31:34,336 epoch 3 - iter 2600/2606 - loss 0.09389335 - time (sec): 1408.08 - samples/sec: 260.50 - lr: 0.000125 - momentum: 0.000000 2023-10-09 22:31:37,238 ---------------------------------------------------------------------------------------------------- 2023-10-09 22:31:37,238 EPOCH 3 done: loss 0.0940 - lr: 0.000125 2023-10-09 22:32:18,211 DEV : loss 0.21473725140094757 - f1-score (micro avg) 0.3466 2023-10-09 22:32:18,273 ---------------------------------------------------------------------------------------------------- 2023-10-09 22:34:36,848 epoch 4 - iter 260/2606 - loss 0.06749285 - time (sec): 138.57 - samples/sec: 263.49 - lr: 0.000123 - momentum: 0.000000 2023-10-09 22:36:57,878 epoch 4 - iter 520/2606 - loss 0.06262288 - time (sec): 279.60 - samples/sec: 257.39 - lr: 0.000121 - momentum: 0.000000 2023-10-09 22:39:15,206 epoch 4 - iter 780/2606 - loss 0.06093080 - time (sec): 416.93 - samples/sec: 258.17 - lr: 0.000119 - momentum: 0.000000 2023-10-09 22:41:33,785 epoch 4 - iter 1040/2606 - loss 0.06300101 - time (sec): 555.51 - samples/sec: 258.27 - lr: 0.000117 - momentum: 0.000000 2023-10-09 22:43:52,308 epoch 4 - iter 1300/2606 - loss 0.06730143 - time (sec): 694.03 - samples/sec: 260.72 - lr: 0.000116 - momentum: 0.000000 2023-10-09 22:46:21,314 epoch 4 - iter 1560/2606 - loss 0.06445156 - time (sec): 843.04 - samples/sec: 262.43 - lr: 0.000114 - momentum: 0.000000 2023-10-09 22:48:37,254 epoch 4 - iter 1820/2606 - loss 0.06404726 - time (sec): 978.98 - samples/sec: 261.77 - lr: 0.000112 - momentum: 0.000000 2023-10-09 22:50:58,245 epoch 4 - iter 2080/2606 - loss 0.06441531 - time (sec): 1119.97 - samples/sec: 260.79 - lr: 0.000110 - momentum: 0.000000 2023-10-09 22:53:19,391 epoch 4 - iter 2340/2606 - loss 0.06601434 - time (sec): 1261.12 - samples/sec: 261.63 - lr: 0.000109 - momentum: 0.000000 2023-10-09 22:55:43,080 epoch 4 - iter 2600/2606 - loss 0.06647076 - time (sec): 1404.80 - 
samples/sec: 261.04 - lr: 0.000107 - momentum: 0.000000 2023-10-09 22:55:46,216 ---------------------------------------------------------------------------------------------------- 2023-10-09 22:55:46,217 EPOCH 4 done: loss 0.0665 - lr: 0.000107 2023-10-09 22:56:28,195 DEV : loss 0.25312381982803345 - f1-score (micro avg) 0.3504 2023-10-09 22:56:28,256 ---------------------------------------------------------------------------------------------------- 2023-10-09 22:58:52,649 epoch 5 - iter 260/2606 - loss 0.04687563 - time (sec): 144.39 - samples/sec: 238.37 - lr: 0.000105 - momentum: 0.000000 2023-10-09 23:01:12,773 epoch 5 - iter 520/2606 - loss 0.05344267 - time (sec): 284.51 - samples/sec: 250.63 - lr: 0.000103 - momentum: 0.000000 2023-10-09 23:03:33,085 epoch 5 - iter 780/2606 - loss 0.05092848 - time (sec): 424.83 - samples/sec: 258.55 - lr: 0.000101 - momentum: 0.000000 2023-10-09 23:05:57,687 epoch 5 - iter 1040/2606 - loss 0.04913735 - time (sec): 569.43 - samples/sec: 260.71 - lr: 0.000100 - momentum: 0.000000 2023-10-09 23:08:12,298 epoch 5 - iter 1300/2606 - loss 0.04916492 - time (sec): 704.04 - samples/sec: 260.14 - lr: 0.000098 - momentum: 0.000000 2023-10-09 23:10:31,281 epoch 5 - iter 1560/2606 - loss 0.05150383 - time (sec): 843.02 - samples/sec: 261.58 - lr: 0.000096 - momentum: 0.000000 2023-10-09 23:12:55,333 epoch 5 - iter 1820/2606 - loss 0.05223482 - time (sec): 987.07 - samples/sec: 262.40 - lr: 0.000094 - momentum: 0.000000 2023-10-09 23:15:20,555 epoch 5 - iter 2080/2606 - loss 0.05153119 - time (sec): 1132.30 - samples/sec: 260.99 - lr: 0.000093 - momentum: 0.000000 2023-10-09 23:17:39,058 epoch 5 - iter 2340/2606 - loss 0.05043738 - time (sec): 1270.80 - samples/sec: 259.67 - lr: 0.000091 - momentum: 0.000000 2023-10-09 23:20:03,897 epoch 5 - iter 2600/2606 - loss 0.05060692 - time (sec): 1415.64 - samples/sec: 258.66 - lr: 0.000089 - momentum: 0.000000 2023-10-09 23:20:07,768 
---------------------------------------------------------------------------------------------------- 2023-10-09 23:20:07,768 EPOCH 5 done: loss 0.0505 - lr: 0.000089 2023-10-09 23:20:48,701 DEV : loss 0.2983781099319458 - f1-score (micro avg) 0.3832 2023-10-09 23:20:48,772 ---------------------------------------------------------------------------------------------------- 2023-10-09 23:23:07,560 epoch 6 - iter 260/2606 - loss 0.02937703 - time (sec): 138.79 - samples/sec: 264.82 - lr: 0.000087 - momentum: 0.000000 2023-10-09 23:25:30,695 epoch 6 - iter 520/2606 - loss 0.03318626 - time (sec): 281.92 - samples/sec: 251.56 - lr: 0.000085 - momentum: 0.000000 2023-10-09 23:27:52,282 epoch 6 - iter 780/2606 - loss 0.03155811 - time (sec): 423.51 - samples/sec: 259.36 - lr: 0.000084 - momentum: 0.000000 2023-10-09 23:30:13,592 epoch 6 - iter 1040/2606 - loss 0.03188441 - time (sec): 564.82 - samples/sec: 257.25 - lr: 0.000082 - momentum: 0.000000 2023-10-09 23:32:34,084 epoch 6 - iter 1300/2606 - loss 0.03353433 - time (sec): 705.31 - samples/sec: 257.77 - lr: 0.000080 - momentum: 0.000000 2023-10-09 23:34:58,642 epoch 6 - iter 1560/2606 - loss 0.03464167 - time (sec): 849.87 - samples/sec: 254.06 - lr: 0.000078 - momentum: 0.000000 2023-10-09 23:37:22,052 epoch 6 - iter 1820/2606 - loss 0.03470450 - time (sec): 993.28 - samples/sec: 254.45 - lr: 0.000077 - momentum: 0.000000 2023-10-09 23:39:41,751 epoch 6 - iter 2080/2606 - loss 0.03483777 - time (sec): 1132.98 - samples/sec: 256.43 - lr: 0.000075 - momentum: 0.000000 2023-10-09 23:42:05,986 epoch 6 - iter 2340/2606 - loss 0.03565070 - time (sec): 1277.21 - samples/sec: 257.37 - lr: 0.000073 - momentum: 0.000000 2023-10-09 23:44:26,648 epoch 6 - iter 2600/2606 - loss 0.03682143 - time (sec): 1417.87 - samples/sec: 258.81 - lr: 0.000071 - momentum: 0.000000 2023-10-09 23:44:29,467 ---------------------------------------------------------------------------------------------------- 2023-10-09 23:44:29,468 EPOCH 6 done: 
loss 0.0368 - lr: 0.000071 2023-10-09 23:45:10,942 DEV : loss 0.35610052943229675 - f1-score (micro avg) 0.3742 2023-10-09 23:45:11,003 ---------------------------------------------------------------------------------------------------- 2023-10-09 23:47:39,689 epoch 7 - iter 260/2606 - loss 0.02267644 - time (sec): 148.68 - samples/sec: 258.00 - lr: 0.000069 - momentum: 0.000000 2023-10-09 23:50:07,336 epoch 7 - iter 520/2606 - loss 0.02417424 - time (sec): 296.33 - samples/sec: 258.54 - lr: 0.000068 - momentum: 0.000000 2023-10-09 23:52:24,084 epoch 7 - iter 780/2606 - loss 0.02606608 - time (sec): 433.08 - samples/sec: 257.55 - lr: 0.000066 - momentum: 0.000000 2023-10-09 23:54:53,690 epoch 7 - iter 1040/2606 - loss 0.02526796 - time (sec): 582.68 - samples/sec: 255.54 - lr: 0.000064 - momentum: 0.000000 2023-10-09 23:57:14,548 epoch 7 - iter 1300/2606 - loss 0.02470524 - time (sec): 723.54 - samples/sec: 258.11 - lr: 0.000062 - momentum: 0.000000 2023-10-09 23:59:40,240 epoch 7 - iter 1560/2606 - loss 0.02608201 - time (sec): 869.23 - samples/sec: 257.31 - lr: 0.000061 - momentum: 0.000000 2023-10-10 00:02:10,450 epoch 7 - iter 1820/2606 - loss 0.02541811 - time (sec): 1019.44 - samples/sec: 254.49 - lr: 0.000059 - momentum: 0.000000 2023-10-10 00:04:28,537 epoch 7 - iter 2080/2606 - loss 0.02616950 - time (sec): 1157.53 - samples/sec: 255.27 - lr: 0.000057 - momentum: 0.000000 2023-10-10 00:06:52,824 epoch 7 - iter 2340/2606 - loss 0.02604706 - time (sec): 1301.82 - samples/sec: 255.00 - lr: 0.000055 - momentum: 0.000000 2023-10-10 00:09:10,093 epoch 7 - iter 2600/2606 - loss 0.02658662 - time (sec): 1439.09 - samples/sec: 254.78 - lr: 0.000053 - momentum: 0.000000 2023-10-10 00:09:13,227 ---------------------------------------------------------------------------------------------------- 2023-10-10 00:09:13,228 EPOCH 7 done: loss 0.0266 - lr: 0.000053 2023-10-10 00:09:54,499 DEV : loss 0.36638563871383667 - f1-score (micro avg) 0.393 2023-10-10 00:09:54,560 
saving best model 2023-10-10 00:09:57,284 ---------------------------------------------------------------------------------------------------- 2023-10-10 00:12:19,774 epoch 8 - iter 260/2606 - loss 0.01822913 - time (sec): 142.49 - samples/sec: 253.32 - lr: 0.000052 - momentum: 0.000000 2023-10-10 00:14:43,139 epoch 8 - iter 520/2606 - loss 0.01834896 - time (sec): 285.85 - samples/sec: 254.92 - lr: 0.000050 - momentum: 0.000000 2023-10-10 00:17:03,348 epoch 8 - iter 780/2606 - loss 0.01923891 - time (sec): 426.06 - samples/sec: 259.99 - lr: 0.000048 - momentum: 0.000000 2023-10-10 00:19:26,521 epoch 8 - iter 1040/2606 - loss 0.01918401 - time (sec): 569.23 - samples/sec: 257.56 - lr: 0.000046 - momentum: 0.000000 2023-10-10 00:21:46,854 epoch 8 - iter 1300/2606 - loss 0.02003374 - time (sec): 709.57 - samples/sec: 257.12 - lr: 0.000045 - momentum: 0.000000 2023-10-10 00:24:08,172 epoch 8 - iter 1560/2606 - loss 0.02027599 - time (sec): 850.88 - samples/sec: 258.03 - lr: 0.000043 - momentum: 0.000000 2023-10-10 00:26:31,945 epoch 8 - iter 1820/2606 - loss 0.02016303 - time (sec): 994.66 - samples/sec: 256.05 - lr: 0.000041 - momentum: 0.000000 2023-10-10 00:28:52,525 epoch 8 - iter 2080/2606 - loss 0.01973949 - time (sec): 1135.24 - samples/sec: 258.50 - lr: 0.000039 - momentum: 0.000000 2023-10-10 00:31:16,007 epoch 8 - iter 2340/2606 - loss 0.01926607 - time (sec): 1278.72 - samples/sec: 258.44 - lr: 0.000037 - momentum: 0.000000 2023-10-10 00:33:36,034 epoch 8 - iter 2600/2606 - loss 0.01941452 - time (sec): 1418.75 - samples/sec: 258.42 - lr: 0.000036 - momentum: 0.000000 2023-10-10 00:33:39,247 ---------------------------------------------------------------------------------------------------- 2023-10-10 00:33:39,248 EPOCH 8 done: loss 0.0194 - lr: 0.000036 2023-10-10 00:34:22,217 DEV : loss 0.4113345742225647 - f1-score (micro avg) 0.4105 2023-10-10 00:34:22,275 saving best model 2023-10-10 00:34:25,003 
---------------------------------------------------------------------------------------------------- 2023-10-10 00:36:50,095 epoch 9 - iter 260/2606 - loss 0.01806776 - time (sec): 145.09 - samples/sec: 260.15 - lr: 0.000034 - momentum: 0.000000 2023-10-10 00:39:15,157 epoch 9 - iter 520/2606 - loss 0.01665407 - time (sec): 290.15 - samples/sec: 260.23 - lr: 0.000032 - momentum: 0.000000 2023-10-10 00:41:35,122 epoch 9 - iter 780/2606 - loss 0.01567942 - time (sec): 430.11 - samples/sec: 255.60 - lr: 0.000030 - momentum: 0.000000 2023-10-10 00:44:04,493 epoch 9 - iter 1040/2606 - loss 0.01509980 - time (sec): 579.49 - samples/sec: 253.38 - lr: 0.000029 - momentum: 0.000000 2023-10-10 00:46:23,680 epoch 9 - iter 1300/2606 - loss 0.01591559 - time (sec): 718.67 - samples/sec: 255.04 - lr: 0.000027 - momentum: 0.000000 2023-10-10 00:48:42,310 epoch 9 - iter 1560/2606 - loss 0.01573140 - time (sec): 857.30 - samples/sec: 256.72 - lr: 0.000025 - momentum: 0.000000 2023-10-10 00:51:00,451 epoch 9 - iter 1820/2606 - loss 0.01521895 - time (sec): 995.44 - samples/sec: 256.73 - lr: 0.000023 - momentum: 0.000000 2023-10-10 00:53:21,383 epoch 9 - iter 2080/2606 - loss 0.01475674 - time (sec): 1136.38 - samples/sec: 256.09 - lr: 0.000021 - momentum: 0.000000 2023-10-10 00:55:45,112 epoch 9 - iter 2340/2606 - loss 0.01445053 - time (sec): 1280.10 - samples/sec: 256.35 - lr: 0.000020 - momentum: 0.000000 2023-10-10 00:58:03,910 epoch 9 - iter 2600/2606 - loss 0.01402838 - time (sec): 1418.90 - samples/sec: 258.18 - lr: 0.000018 - momentum: 0.000000 2023-10-10 00:58:07,250 ---------------------------------------------------------------------------------------------------- 2023-10-10 00:58:07,251 EPOCH 9 done: loss 0.0140 - lr: 0.000018 2023-10-10 00:58:48,323 DEV : loss 0.45426633954048157 - f1-score (micro avg) 0.3959 2023-10-10 00:58:48,375 ---------------------------------------------------------------------------------------------------- 2023-10-10 01:01:08,212 epoch 10 - 
iter 260/2606 - loss 0.01259898 - time (sec): 139.83 - samples/sec: 262.55 - lr: 0.000016 - momentum: 0.000000 2023-10-10 01:03:30,710 epoch 10 - iter 520/2606 - loss 0.01112512 - time (sec): 282.33 - samples/sec: 254.43 - lr: 0.000014 - momentum: 0.000000 2023-10-10 01:05:50,144 epoch 10 - iter 780/2606 - loss 0.01157811 - time (sec): 421.77 - samples/sec: 248.87 - lr: 0.000013 - momentum: 0.000000 2023-10-10 01:08:09,984 epoch 10 - iter 1040/2606 - loss 0.01040706 - time (sec): 561.61 - samples/sec: 256.04 - lr: 0.000011 - momentum: 0.000000 2023-10-10 01:10:35,004 epoch 10 - iter 1300/2606 - loss 0.01105992 - time (sec): 706.63 - samples/sec: 261.18 - lr: 0.000009 - momentum: 0.000000 2023-10-10 01:12:55,356 epoch 10 - iter 1560/2606 - loss 0.01090713 - time (sec): 846.98 - samples/sec: 259.41 - lr: 0.000007 - momentum: 0.000000 2023-10-10 01:15:24,167 epoch 10 - iter 1820/2606 - loss 0.01106558 - time (sec): 995.79 - samples/sec: 258.10 - lr: 0.000005 - momentum: 0.000000 2023-10-10 01:17:45,318 epoch 10 - iter 2080/2606 - loss 0.01060413 - time (sec): 1136.94 - samples/sec: 259.10 - lr: 0.000004 - momentum: 0.000000 2023-10-10 01:20:04,288 epoch 10 - iter 2340/2606 - loss 0.01019989 - time (sec): 1275.91 - samples/sec: 260.29 - lr: 0.000002 - momentum: 0.000000 2023-10-10 01:22:23,524 epoch 10 - iter 2600/2606 - loss 0.01013457 - time (sec): 1415.15 - samples/sec: 258.97 - lr: 0.000000 - momentum: 0.000000 2023-10-10 01:22:26,690 ---------------------------------------------------------------------------------------------------- 2023-10-10 01:22:26,690 EPOCH 10 done: loss 0.0101 - lr: 0.000000 2023-10-10 01:23:06,687 DEV : loss 0.4742611050605774 - f1-score (micro avg) 0.3928 2023-10-10 01:23:07,733 ---------------------------------------------------------------------------------------------------- 2023-10-10 01:23:07,735 Loading model from best epoch ... 
2023-10-10 01:23:11,754 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-10 01:24:55,577 
Results:
- F-score (micro) 0.4682
- F-score (macro) 0.3262
- Accuracy 0.3104

By class:
              precision    recall  f1-score   support

         LOC     0.5077    0.5700    0.5371      1214
         PER     0.3953    0.4554    0.4232       808
         ORG     0.3407    0.3484    0.3445       353
   HumanProd     0.0000    0.0000    0.0000        15

   micro avg     0.4442    0.4950    0.4682      2390
   macro avg     0.3109    0.3435    0.3262      2390
weighted avg     0.4418    0.4950    0.4668      2390

2023-10-10 01:24:55,577 ----------------------------------------------------------------------------------------------------
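
Note on the aggregate rows: the micro, macro, and weighted averages in the final results can be cross-checked from the per-class rows. The sketch below is not part of the training run and does not use Flair; it only re-derives the reported averages with the standard F1 formula, assuming the four-decimal values printed in the log (so small rounding differences are possible). The names `per_class`, `f1`, `micro_f1`, `macro_f1`, and `weighted_f1` are illustrative.

```python
# Per-class test scores copied from the "By class" table in the log:
# label -> (precision, recall, f1-score, support)
per_class = {
    "LOC": (0.5077, 0.5700, 0.5371, 1214),
    "PER": (0.3953, 0.4554, 0.4232, 808),
    "ORG": (0.3407, 0.3484, 0.3445, 353),
    "HumanProd": (0.0000, 0.0000, 0.0000, 15),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Micro F1: computed from the logged micro-averaged precision and recall.
micro_f1 = f1(0.4442, 0.4950)

# Macro F1: unweighted mean of the per-class F1 scores.
macro_f1 = sum(row[2] for row in per_class.values()) / len(per_class)

# Weighted F1: per-class F1 scores weighted by class support.
total_support = sum(row[3] for row in per_class.values())
weighted_f1 = sum(row[2] * row[3] for row in per_class.values()) / total_support

print(f"micro F1    {micro_f1:.4f}")
print(f"macro F1    {macro_f1:.4f}")
print(f"weighted F1 {weighted_f1:.4f}")
print(f"support     {total_support}")
```

The macro average is pulled down by HumanProd, which has only 15 test entities and an F1 of 0.0000; the micro average is dominated by LOC and PER, which together account for 2022 of the 2390 entities.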