2023-10-12 02:19:10,693 ----------------------------------------------------------------------------------------------------
2023-10-12 02:19:10,695 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-12 02:19:10,696 ----------------------------------------------------------------------------------------------------
2023-10-12 02:19:10,696 MultiCorpus: 20847 train + 1123 dev + 3350 test sentences
 - NER_HIPE_2022 Corpus: 20847 train + 1123 dev + 3350 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/de/with_doc_seperator
2023-10-12 02:19:10,696 ----------------------------------------------------------------------------------------------------
2023-10-12 02:19:10,696 Train: 20847 sentences
2023-10-12 02:19:10,696 (train_with_dev=False, train_with_test=False)
2023-10-12 02:19:10,696 ----------------------------------------------------------------------------------------------------
2023-10-12 02:19:10,696 Training Params:
2023-10-12 02:19:10,696 - learning_rate: "0.00015"
2023-10-12 02:19:10,696 - mini_batch_size: "4"
2023-10-12 02:19:10,696 - max_epochs: "10"
2023-10-12 02:19:10,696 - shuffle: "True"
2023-10-12 02:19:10,696 ----------------------------------------------------------------------------------------------------
2023-10-12 02:19:10,696 Plugins:
2023-10-12 02:19:10,697 - TensorboardLogger
2023-10-12 02:19:10,697 - LinearScheduler | warmup_fraction: '0.1'
2023-10-12 02:19:10,697 ----------------------------------------------------------------------------------------------------
2023-10-12 02:19:10,697 Final evaluation on model from best epoch (best-model.pt)
2023-10-12 02:19:10,697 - metric: "('micro avg', 'f1-score')"
2023-10-12 02:19:10,697 ----------------------------------------------------------------------------------------------------
2023-10-12 02:19:10,697 Computation:
2023-10-12 02:19:10,697 - compute on device: cuda:0
2023-10-12 02:19:10,697 - embedding storage: none
2023-10-12 02:19:10,697 ----------------------------------------------------------------------------------------------------
2023-10-12 02:19:10,697 Model training base path: "hmbench-newseye/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-4"
2023-10-12 02:19:10,697 ----------------------------------------------------------------------------------------------------
2023-10-12 02:19:10,697 ----------------------------------------------------------------------------------------------------
2023-10-12 02:19:10,697 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-12 02:21:27,177 epoch 1 - iter 521/5212 - loss 2.79668095 - time (sec): 136.48 - samples/sec: 242.74 - lr: 0.000015 - momentum: 0.000000
2023-10-12 02:23:45,273 epoch 1 - iter 1042/5212 - loss 2.35843978 - time (sec): 274.57 - samples/sec: 247.36 - lr: 0.000030 - momentum: 0.000000
2023-10-12 02:26:04,245 epoch 1 - iter 1563/5212 - loss 1.80796410 - time (sec): 413.55 - samples/sec: 254.16 - lr: 0.000045 - momentum: 0.000000
2023-10-12 02:28:22,402 epoch 1 - iter 2084/5212 - loss 1.45533862 - time (sec): 551.70 - samples/sec: 257.38 - lr: 0.000060 - momentum: 0.000000
2023-10-12 02:30:41,547 epoch 1 - iter 2605/5212 - loss 1.24266578 - time (sec): 690.85 - samples/sec: 260.63 - lr: 0.000075 - momentum: 0.000000
2023-10-12 02:32:58,209 epoch 1 - iter 3126/5212 - loss 1.09834367 - time (sec): 827.51 - samples/sec: 259.96 - lr: 0.000090 - momentum: 0.000000
2023-10-12 02:35:17,942 epoch 1 - iter 3647/5212 - loss 0.98084623 - time (sec): 967.24 - samples/sec: 261.96 - lr: 0.000105 - momentum: 0.000000
2023-10-12 02:37:35,810 epoch 1 - iter 4168/5212 - loss 0.88950826 - time (sec): 1105.11 - samples/sec: 262.06 - lr: 0.000120 - momentum: 0.000000
2023-10-12 02:39:57,111 epoch 1 - iter 4689/5212 - loss 0.80746283 - time (sec): 1246.41 - samples/sec: 264.35 - lr: 0.000135 - momentum: 0.000000
2023-10-12 02:42:16,837 epoch 1 - iter 5210/5212 - loss 0.74371560 - time (sec): 1386.14 - samples/sec: 264.92 - lr: 0.000150 - momentum: 0.000000
2023-10-12 02:42:17,401 ----------------------------------------------------------------------------------------------------
2023-10-12 02:42:17,402 EPOCH 1 done: loss 0.7433 - lr: 0.000150
2023-10-12 02:42:52,120 DEV : loss 0.12636248767375946 - f1-score (micro avg) 0.2776
2023-10-12 02:42:52,174 saving best model
2023-10-12 02:42:53,044 ----------------------------------------------------------------------------------------------------
2023-10-12 02:45:11,088 epoch 2 - iter 521/5212 - loss 0.17686924 - time (sec): 138.04 - samples/sec: 262.94 - lr: 0.000148 - momentum: 0.000000
2023-10-12 02:47:30,960 epoch 2 - iter 1042/5212 - loss 0.15464838 - time (sec): 277.91 - samples/sec: 267.07 - lr: 0.000147 - momentum: 0.000000
2023-10-12 02:49:50,501 epoch 2 - iter 1563/5212 - loss 0.15677141 - time (sec): 417.45 - samples/sec: 261.48 - lr: 0.000145 - momentum: 0.000000
2023-10-12 02:52:16,593 epoch 2 - iter 2084/5212 - loss 0.15541495 - time (sec): 563.55 - samples/sec: 263.09 - lr: 0.000143 - momentum: 0.000000
2023-10-12 02:54:38,805 epoch 2 - iter 2605/5212 - loss 0.15288785 - time (sec): 705.76 - samples/sec: 261.65 - lr: 0.000142 - momentum: 0.000000
2023-10-12 02:56:58,256 epoch 2 - iter 3126/5212 - loss 0.15178570 - time (sec): 845.21 - samples/sec: 258.20 - lr: 0.000140 - momentum: 0.000000
2023-10-12 02:59:16,224 epoch 2 - iter 3647/5212 - loss 0.15320470 - time (sec): 983.18 - samples/sec: 254.99 - lr: 0.000138 - momentum: 0.000000
2023-10-12 03:01:39,472 epoch 2 - iter 4168/5212 - loss 0.14998413 - time (sec): 1126.43 - samples/sec: 256.63 - lr: 0.000137 - momentum: 0.000000
2023-10-12 03:04:04,945 epoch 2 - iter 4689/5212 - loss 0.14664370 - time (sec): 1271.90 - samples/sec: 259.73 - lr: 0.000135 - momentum: 0.000000
2023-10-12 03:06:26,106 epoch 2 - iter 5210/5212 - loss 0.14460575 - time (sec): 1413.06 - samples/sec: 259.97 - lr: 0.000133 - momentum: 0.000000
2023-10-12 03:06:26,551 ----------------------------------------------------------------------------------------------------
2023-10-12 03:06:26,552 EPOCH 2 done: loss 0.1446 - lr: 0.000133
2023-10-12 03:07:05,601 DEV : loss 0.14167223870754242 - f1-score (micro avg) 0.3339
2023-10-12 03:07:05,653 saving best model
2023-10-12 03:07:08,265 ----------------------------------------------------------------------------------------------------
2023-10-12 03:09:23,808 epoch 3 - iter 521/5212 - loss 0.10082822 - time (sec): 135.54 - samples/sec: 254.81 - lr: 0.000132 - momentum: 0.000000
2023-10-12 03:11:37,979 epoch 3 - iter 1042/5212 - loss 0.09748085 - time (sec): 269.71 - samples/sec: 250.08 - lr: 0.000130 - momentum: 0.000000
2023-10-12 03:13:58,974 epoch 3 - iter 1563/5212 - loss 0.09985524 - time (sec): 410.70 - samples/sec: 264.24 - lr: 0.000128 - momentum: 0.000000
2023-10-12 03:16:15,894 epoch 3 - iter 2084/5212 - loss 0.09997962 - time (sec): 547.62 - samples/sec: 263.03 - lr: 0.000127 - momentum: 0.000000
2023-10-12 03:18:33,733 epoch 3 - iter 2605/5212 - loss 0.09885177 - time (sec): 685.46 - samples/sec: 261.01 - lr: 0.000125 - momentum: 0.000000
2023-10-12 03:20:55,408 epoch 3 - iter 3126/5212 - loss 0.09501689 - time (sec): 827.14 - samples/sec: 264.99 - lr: 0.000123 - momentum: 0.000000
2023-10-12 03:23:18,463 epoch 3 - iter 3647/5212 - loss 0.09597740 - time (sec): 970.19 - samples/sec: 267.70 - lr: 0.000122 - momentum: 0.000000
2023-10-12 03:25:40,852 epoch 3 - iter 4168/5212 - loss 0.09794887 - time (sec): 1112.58 - samples/sec: 263.01 - lr: 0.000120 - momentum: 0.000000
2023-10-12 03:28:07,225 epoch 3 - iter 4689/5212 - loss 0.09904610 - time (sec): 1258.96 - samples/sec: 261.73 - lr: 0.000118 - momentum: 0.000000
2023-10-12 03:30:34,184 epoch 3 - iter 5210/5212 - loss 0.09816484 - time (sec): 1405.91 - samples/sec: 261.22 - lr: 0.000117 - momentum: 0.000000
2023-10-12 03:30:34,730 ----------------------------------------------------------------------------------------------------
2023-10-12 03:30:34,731 EPOCH 3 done: loss 0.0981 - lr: 0.000117
2023-10-12 03:31:15,205 DEV : loss 0.2600547671318054 - f1-score (micro avg) 0.3618
2023-10-12 03:31:15,262 saving best model
2023-10-12 03:31:17,833 ----------------------------------------------------------------------------------------------------
2023-10-12 03:33:41,338 epoch 4 - iter 521/5212 - loss 0.06934331 - time (sec): 143.50 - samples/sec: 252.51 - lr: 0.000115 - momentum: 0.000000
2023-10-12 03:36:05,531 epoch 4 - iter 1042/5212 - loss 0.07084255 - time (sec): 287.69 - samples/sec: 257.98 - lr: 0.000113 - momentum: 0.000000
2023-10-12 03:38:29,695 epoch 4 - iter 1563/5212 - loss 0.06755543 - time (sec): 431.86 - samples/sec: 261.22 - lr: 0.000112 - momentum: 0.000000
2023-10-12 03:40:54,693 epoch 4 - iter 2084/5212 - loss 0.06616838 - time (sec): 576.86 - samples/sec: 261.23 - lr: 0.000110 - momentum: 0.000000
2023-10-12 03:43:17,865 epoch 4 - iter 2605/5212 - loss 0.06558266 - time (sec): 720.03 - samples/sec: 258.99 - lr: 0.000108 - momentum: 0.000000
2023-10-12 03:45:41,115 epoch 4 - iter 3126/5212 - loss 0.06435344 - time (sec): 863.28 - samples/sec: 259.71 - lr: 0.000107 - momentum: 0.000000
2023-10-12 03:48:03,559 epoch 4 - iter 3647/5212 - loss 0.06413682 - time (sec): 1005.72 - samples/sec: 259.21 - lr: 0.000105 - momentum: 0.000000
2023-10-12 03:50:23,570 epoch 4 - iter 4168/5212 - loss 0.06596094 - time (sec): 1145.73 - samples/sec: 257.63 - lr: 0.000103 - momentum: 0.000000
2023-10-12 03:52:45,111 epoch 4 - iter 4689/5212 - loss 0.06614742 - time (sec): 1287.27 - samples/sec: 258.43 - lr: 0.000102 - momentum: 0.000000
2023-10-12 03:55:04,718 epoch 4 - iter 5210/5212 - loss 0.06648116 - time (sec): 1426.88 - samples/sec: 257.45 - lr: 0.000100 - momentum: 0.000000
2023-10-12 03:55:05,152 ----------------------------------------------------------------------------------------------------
2023-10-12 03:55:05,153 EPOCH 4 done: loss 0.0665 - lr: 0.000100
2023-10-12 03:55:45,279 DEV : loss 0.3101561367511749 - f1-score (micro avg) 0.3555
2023-10-12 03:55:45,331 ----------------------------------------------------------------------------------------------------
2023-10-12 03:58:04,744 epoch 5 - iter 521/5212 - loss 0.04426358 - time (sec): 139.41 - samples/sec: 259.84 - lr: 0.000098 - momentum: 0.000000
2023-10-12 04:00:24,001 epoch 5 - iter 1042/5212 - loss 0.04273940 - time (sec): 278.67 - samples/sec: 260.65 - lr: 0.000097 - momentum: 0.000000
2023-10-12 04:02:41,900 epoch 5 - iter 1563/5212 - loss 0.04286202 - time (sec): 416.57 - samples/sec: 256.36 - lr: 0.000095 - momentum: 0.000000
2023-10-12 04:05:03,372 epoch 5 - iter 2084/5212 - loss 0.04497980 - time (sec): 558.04 - samples/sec: 260.25 - lr: 0.000093 - momentum: 0.000000
2023-10-12 04:07:19,613 epoch 5 - iter 2605/5212 - loss 0.04479044 - time (sec): 694.28 - samples/sec: 258.94 - lr: 0.000092 - momentum: 0.000000
2023-10-12 04:09:43,942 epoch 5 - iter 3126/5212 - loss 0.04436294 - time (sec): 838.61 - samples/sec: 258.68 - lr: 0.000090 - momentum: 0.000000
2023-10-12 04:12:11,821 epoch 5 - iter 3647/5212 - loss 0.04380965 - time (sec): 986.49 - samples/sec: 258.99 - lr: 0.000088 - momentum: 0.000000
2023-10-12 04:14:37,186 epoch 5 - iter 4168/5212 - loss 0.04530386 - time (sec): 1131.85 - samples/sec: 258.58 - lr: 0.000087 - momentum: 0.000000
2023-10-12 04:17:02,247 epoch 5 - iter 4689/5212 - loss 0.04665775 - time (sec): 1276.91 - samples/sec: 257.28 - lr: 0.000085 - momentum: 0.000000
2023-10-12 04:19:31,906 epoch 5 - iter 5210/5212 - loss 0.04638068 - time (sec): 1426.57 - samples/sec: 257.51 - lr: 0.000083 - momentum: 0.000000
2023-10-12 04:19:32,352 ----------------------------------------------------------------------------------------------------
2023-10-12 04:19:32,353 EPOCH 5 done: loss 0.0464 - lr: 0.000083
2023-10-12 04:20:13,213 DEV : loss 0.3113304078578949 - f1-score (micro avg) 0.4003
2023-10-12 04:20:13,271 saving best model
2023-10-12 04:20:14,217 ----------------------------------------------------------------------------------------------------
2023-10-12 04:22:41,995 epoch 6 - iter 521/5212 - loss 0.02551644 - time (sec): 147.78 - samples/sec: 258.80 - lr: 0.000082 - momentum: 0.000000
2023-10-12 04:25:08,081 epoch 6 - iter 1042/5212 - loss 0.02937621 - time (sec): 293.86 - samples/sec: 259.22 - lr: 0.000080 - momentum: 0.000000
2023-10-12 04:27:33,632 epoch 6 - iter 1563/5212 - loss 0.03011832 - time (sec): 439.41 - samples/sec: 254.55 - lr: 0.000078 - momentum: 0.000000
2023-10-12 04:29:57,124 epoch 6 - iter 2084/5212 - loss 0.03067784 - time (sec): 582.90 - samples/sec: 252.43 - lr: 0.000077 - momentum: 0.000000
2023-10-12 04:32:22,709 epoch 6 - iter 2605/5212 - loss 0.03033894 - time (sec): 728.49 - samples/sec: 256.01 - lr: 0.000075 - momentum: 0.000000
2023-10-12 04:34:46,393 epoch 6 - iter 3126/5212 - loss 0.03057800 - time (sec): 872.17 - samples/sec: 256.98 - lr: 0.000073 - momentum: 0.000000
2023-10-12 04:37:07,410 epoch 6 - iter 3647/5212 - loss 0.03169587 - time (sec): 1013.19 - samples/sec: 256.43 - lr: 0.000072 - momentum: 0.000000
2023-10-12 04:39:27,981 epoch 6 - iter 4168/5212 - loss 0.03188952 - time (sec): 1153.76 - samples/sec: 256.11 - lr: 0.000070 - momentum: 0.000000
2023-10-12 04:41:50,066 epoch 6 - iter 4689/5212 - loss 0.03265092 - time (sec): 1295.85 - samples/sec: 256.18 - lr: 0.000068 - momentum: 0.000000
2023-10-12 04:44:09,929 epoch 6 - iter 5210/5212 - loss 0.03297566 - time (sec): 1435.71 - samples/sec: 255.87 - lr: 0.000067 - momentum: 0.000000
2023-10-12 04:44:10,371 ----------------------------------------------------------------------------------------------------
2023-10-12 04:44:10,371 EPOCH 6 done: loss 0.0330 - lr: 0.000067
2023-10-12 04:44:50,633 DEV : loss 0.40718281269073486 - f1-score (micro avg) 0.3971
2023-10-12 04:44:50,685 ----------------------------------------------------------------------------------------------------
2023-10-12 04:47:11,977 epoch 7 - iter 521/5212 - loss 0.02552297 - time (sec): 141.29 - samples/sec: 259.67 - lr: 0.000065 - momentum: 0.000000
2023-10-12 04:49:32,146 epoch 7 - iter 1042/5212 - loss 0.02437180 - time (sec): 281.46 - samples/sec: 271.70 - lr: 0.000063 - momentum: 0.000000
2023-10-12 04:51:48,960 epoch 7 - iter 1563/5212 - loss 0.02514062 - time (sec): 418.27 - samples/sec: 267.26 - lr: 0.000062 - momentum: 0.000000
2023-10-12 04:54:07,753 epoch 7 - iter 2084/5212 - loss 0.02516668 - time (sec): 557.07 - samples/sec: 267.87 - lr: 0.000060 - momentum: 0.000000
2023-10-12 04:56:27,890 epoch 7 - iter 2605/5212 - loss 0.02469412 - time (sec): 697.20 - samples/sec: 265.74 - lr: 0.000058 - momentum: 0.000000
2023-10-12 04:58:49,869 epoch 7 - iter 3126/5212 - loss 0.02456715 - time (sec): 839.18 - samples/sec: 265.93 - lr: 0.000057 - momentum: 0.000000
2023-10-12 05:01:09,634 epoch 7 - iter 3647/5212 - loss 0.02477855 - time (sec): 978.95 - samples/sec: 263.65 - lr: 0.000055 - momentum: 0.000000
2023-10-12 05:03:36,700 epoch 7 - iter 4168/5212 - loss 0.02394687 - time (sec): 1126.01 - samples/sec: 262.72 - lr: 0.000053 - momentum: 0.000000
2023-10-12 05:06:00,466 epoch 7 - iter 4689/5212 - loss 0.02356060 - time (sec): 1269.78 - samples/sec: 260.95 - lr: 0.000052 - momentum: 0.000000
2023-10-12 05:08:24,608 epoch 7 - iter 5210/5212 - loss 0.02323141 - time (sec): 1413.92 - samples/sec: 259.78 - lr: 0.000050 - momentum: 0.000000
2023-10-12 05:08:25,097 ----------------------------------------------------------------------------------------------------
2023-10-12 05:08:25,097 EPOCH 7 done: loss 0.0232 - lr: 0.000050
2023-10-12 05:09:04,885 DEV : loss 0.408222496509552 - f1-score (micro avg) 0.4035
2023-10-12 05:09:04,936 saving best model
2023-10-12 05:09:07,501 ----------------------------------------------------------------------------------------------------
2023-10-12 05:11:27,797 epoch 8 - iter 521/5212 - loss 0.01537607 - time (sec): 140.29 - samples/sec: 265.21 - lr: 0.000048 - momentum: 0.000000
2023-10-12 05:13:47,952 epoch 8 - iter 1042/5212 - loss 0.01753064 - time (sec): 280.45 - samples/sec: 271.51 - lr: 0.000047 - momentum: 0.000000
2023-10-12 05:16:11,225 epoch 8 - iter 1563/5212 - loss 0.01696724 - time (sec): 423.72 - samples/sec: 277.40 - lr: 0.000045 - momentum: 0.000000
2023-10-12 05:18:28,174 epoch 8 - iter 2084/5212 - loss 0.01669412 - time (sec): 560.67 - samples/sec: 273.78 - lr: 0.000043 - momentum: 0.000000
2023-10-12 05:20:45,546 epoch 8 - iter 2605/5212 - loss 0.01686027 - time (sec): 698.04 - samples/sec: 269.95 - lr: 0.000042 - momentum: 0.000000
2023-10-12 05:23:00,618 epoch 8 - iter 3126/5212 - loss 0.01671656 - time (sec): 833.11 - samples/sec: 267.12 - lr: 0.000040 - momentum: 0.000000
2023-10-12 05:25:16,222 epoch 8 - iter 3647/5212 - loss 0.01600572 - time (sec): 968.72 - samples/sec: 265.31 - lr: 0.000038 - momentum: 0.000000
2023-10-12 05:27:35,013 epoch 8 - iter 4168/5212 - loss 0.01597679 - time (sec): 1107.51 - samples/sec: 264.67 - lr: 0.000037 - momentum: 0.000000
2023-10-12 05:29:56,551 epoch 8 - iter 4689/5212 - loss 0.01553053 - time (sec): 1249.05 - samples/sec: 263.33 - lr: 0.000035 - momentum: 0.000000
2023-10-12 05:32:19,571 epoch 8 - iter 5210/5212 - loss 0.01639257 - time (sec): 1392.06 - samples/sec: 263.90 - lr: 0.000033 - momentum: 0.000000
2023-10-12 05:32:19,996 ----------------------------------------------------------------------------------------------------
2023-10-12 05:32:19,997 EPOCH 8 done: loss 0.0164 - lr: 0.000033
2023-10-12 05:32:58,072 DEV : loss 0.4335840940475464 - f1-score (micro avg) 0.404
2023-10-12 05:32:58,123 saving best model
2023-10-12 05:33:00,797 ----------------------------------------------------------------------------------------------------
2023-10-12 05:35:21,905 epoch 9 - iter 521/5212 - loss 0.01030793 - time (sec): 141.10 - samples/sec: 275.22 - lr: 0.000032 - momentum: 0.000000
2023-10-12 05:37:41,298 epoch 9 - iter 1042/5212 - loss 0.01148381 - time (sec): 280.50 - samples/sec: 274.31 - lr: 0.000030 - momentum: 0.000000
2023-10-12 05:40:02,195 epoch 9 - iter 1563/5212 - loss 0.01090402 - time (sec): 421.39 - samples/sec: 263.64 - lr: 0.000028 - momentum: 0.000000
2023-10-12 05:42:24,062 epoch 9 - iter 2084/5212 - loss 0.01205054 - time (sec): 563.26 - samples/sec: 259.21 - lr: 0.000027 - momentum: 0.000000
2023-10-12 05:44:49,921 epoch 9 - iter 2605/5212 - loss 0.01249863 - time (sec): 709.12 - samples/sec: 258.65 - lr: 0.000025 - momentum: 0.000000
2023-10-12 05:47:12,873 epoch 9 - iter 3126/5212 - loss 0.01215066 - time (sec): 852.07 - samples/sec: 258.10 - lr: 0.000023 - momentum: 0.000000
2023-10-12 05:49:39,769 epoch 9 - iter 3647/5212 - loss 0.01116151 - time (sec): 998.97 - samples/sec: 259.45 - lr: 0.000022 - momentum: 0.000000
2023-10-12 05:52:01,961 epoch 9 - iter 4168/5212 - loss 0.01081521 - time (sec): 1141.16 - samples/sec: 257.32 - lr: 0.000020 - momentum: 0.000000
2023-10-12 05:54:25,864 epoch 9 - iter 4689/5212 - loss 0.01082631 - time (sec): 1285.06 - samples/sec: 256.89 - lr: 0.000018 - momentum: 0.000000
2023-10-12 05:56:50,574 epoch 9 - iter 5210/5212 - loss 0.01087435 - time (sec): 1429.77 - samples/sec: 256.94 - lr: 0.000017 - momentum: 0.000000
2023-10-12 05:56:51,009 ----------------------------------------------------------------------------------------------------
2023-10-12 05:56:51,009 EPOCH 9 done: loss 0.0109 - lr: 0.000017
2023-10-12 05:57:30,552 DEV : loss 0.47575101256370544 - f1-score (micro avg) 0.3948
2023-10-12 05:57:30,605 ----------------------------------------------------------------------------------------------------
2023-10-12 05:59:52,416 epoch 10 - iter 521/5212 - loss 0.00498377 - time (sec): 141.81 - samples/sec: 252.02 - lr: 0.000015 - momentum: 0.000000
2023-10-12 06:02:13,588 epoch 10 - iter 1042/5212 - loss 0.00681335 - time (sec): 282.98 - samples/sec: 255.31 - lr: 0.000013 - momentum: 0.000000
2023-10-12 06:04:37,692 epoch 10 - iter 1563/5212 - loss 0.00622093 - time (sec): 427.08 - samples/sec: 259.59 - lr: 0.000012 - momentum: 0.000000
2023-10-12 06:06:59,831 epoch 10 - iter 2084/5212 - loss 0.00629281 - time (sec): 569.22 - samples/sec: 258.24 - lr: 0.000010 - momentum: 0.000000
2023-10-12 06:09:21,704 epoch 10 - iter 2605/5212 - loss 0.00707237 - time (sec): 711.10 - samples/sec: 257.28 - lr: 0.000008 - momentum: 0.000000
2023-10-12 06:11:42,090 epoch 10 - iter 3126/5212 - loss 0.00730698 - time (sec): 851.48 - samples/sec: 256.91 - lr: 0.000007 - momentum: 0.000000
2023-10-12 06:14:01,717 epoch 10 - iter 3647/5212 - loss 0.00730729 - time (sec): 991.11 - samples/sec: 258.63 - lr: 0.000005 - momentum: 0.000000
2023-10-12 06:16:22,285 epoch 10 - iter 4168/5212 - loss 0.00690845 - time (sec): 1131.68 - samples/sec: 260.93 - lr: 0.000003 - momentum: 0.000000
2023-10-12 06:18:43,018 epoch 10 - iter 4689/5212 - loss 0.00696219 - time (sec): 1272.41 - samples/sec: 260.74 - lr: 0.000002 - momentum: 0.000000
2023-10-12 06:21:02,617 epoch 10 - iter 5210/5212 - loss 0.00715270 - time (sec): 1412.01 - samples/sec: 260.15 - lr: 0.000000 - momentum: 0.000000
2023-10-12 06:21:03,064 ----------------------------------------------------------------------------------------------------
2023-10-12 06:21:03,065 EPOCH 10 done: loss 0.0072 - lr: 0.000000
2023-10-12 06:21:42,939 DEV : loss 0.4921533763408661 - f1-score (micro avg) 0.3907
2023-10-12 06:21:43,893 ----------------------------------------------------------------------------------------------------
2023-10-12 06:21:43,895 Loading model from best epoch ...
2023-10-12 06:21:47,649 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-12 06:23:28,188 Results:
- F-score (micro) 0.4681
- F-score (macro) 0.3274
- Accuracy 0.3108

By class:
              precision    recall  f1-score   support

         LOC     0.5033    0.5610    0.5306      1214
         PER     0.4123    0.4567    0.4334       808
         ORG     0.3282    0.3654    0.3458       353
   HumanProd     0.0000    0.0000    0.0000        15

   micro avg     0.4454    0.4933    0.4681      2390
   macro avg     0.3110    0.3458    0.3274      2390
weighted avg     0.4435    0.4933    0.4671      2390

2023-10-12 06:23:28,188 ----------------------------------------------------------------------------------------------------
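Note on the `lr` column of this run: with 20847 training sentences and `mini_batch_size` 4 there are ceil(20847 / 4) = 5212 iterations per epoch (matching `iter .../5212`), 52120 over the 10 epochs, and a `warmup_fraction` of 0.1 therefore covers 5212 steps, exactly epoch 1. The sketch below assumes that the `LinearScheduler` plugin ramps the rate linearly from 0 to 0.00015 during warmup and then decays it linearly to 0 (the same shape as Hugging Face's `get_linear_schedule_with_warmup`). It is a reconstruction for checking the logged values, not Flair's own code; `linear_warmup_lr` is a hypothetical helper.

```python
import math

# Figures taken from the log header (Train / Training Params / Plugins).
TRAIN_SENTENCES = 20847
MINI_BATCH_SIZE = 4
MAX_EPOCHS = 10
PEAK_LR = 0.00015
WARMUP_FRACTION = 0.1

steps_per_epoch = math.ceil(TRAIN_SENTENCES / MINI_BATCH_SIZE)  # 5212, as in "iter .../5212"
total_steps = steps_per_epoch * MAX_EPOCHS                       # 52120
warmup_steps = round(WARMUP_FRACTION * total_steps)              # 5212, i.e. all of epoch 1


def linear_warmup_lr(step: int) -> float:
    """Assumed schedule: linear ramp 0 -> PEAK_LR during warmup, then linear decay to 0."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    remaining = (total_steps - step) / (total_steps - warmup_steps)
    return PEAK_LR * max(0.0, remaining)


# Global step at "iter 521" of epochs 1, 2 and 10; the log shows 0.000015, 0.000148, 0.000015.
for epoch in (1, 2, 10):
    step = (epoch - 1) * steps_per_epoch + 521
    print(f"epoch {epoch} iter 521: lr {linear_warmup_lr(step):.6f}")
```

Under this assumption the end-of-epoch values also line up: step 2 × 5212 gives 0.00015 × 8/9 ≈ 0.000133 (EPOCH 2) and step 4 × 5212 gives 0.000100 (EPOCH 4).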
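Note on checkpoint selection in this run: `saving best model` appears only after epochs whose dev micro-F1 beats every earlier epoch, and the final evaluation loads that checkpoint (`best-model.pt`). The sketch below replays the ten logged `DEV :` scores to show which epochs triggered a save and which epoch was loaded for the test evaluation. It assumes a strict "greater than" comparison; `replay_checkpointing` is a hypothetical helper, not part of Flair.

```python
# Dev micro-F1 per epoch, copied from the "DEV :" lines of the log.
DEV_F1 = [0.2776, 0.3339, 0.3618, 0.3555, 0.4003, 0.3971, 0.4035, 0.404, 0.3948, 0.3907]


def replay_checkpointing(scores):
    """Return (epochs that saved a checkpoint, best epoch), assuming a save on strict improvement."""
    best_score = float("-inf")
    best_epoch = None
    saved = []
    for epoch, score in enumerate(scores, start=1):
        if score > best_score:
            best_score, best_epoch = score, epoch
            saved.append(epoch)
    return saved, best_epoch


saved_epochs, best_epoch = replay_checkpointing(DEV_F1)
print(saved_epochs, best_epoch)  # [1, 2, 3, 5, 7, 8] 8
```

The last save rests on a gain of only 0.0005 (0.4035 at epoch 7 to 0.404 at epoch 8), so the two epochs are essentially tied on dev; the test scores come from the epoch-8 checkpoint.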
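Note on the results block: the summary rows can be rederived from the by-class table. Each F1 is the harmonic mean 2PR / (P + R); `macro avg` is the unweighted mean over the four classes (so `HumanProd`, with 15 gold spans and an F1 of 0.0, pulls it down to 0.3274); `weighted avg` weights each class F1 by its support. The averaging definitions assumed here are those of scikit-learn's `classification_report`. The check works from the rounded four-decimal figures printed in the log, so it can only agree to about ±0.0001.

```python
# Per-class (precision, recall, f1, support), copied from the "By class" table of the log.
BY_CLASS = {
    "LOC": (0.5033, 0.5610, 0.5306, 1214),
    "PER": (0.4123, 0.4567, 0.4334, 808),
    "ORG": (0.3282, 0.3654, 0.3458, 353),
    "HumanProd": (0.0000, 0.0000, 0.0000, 15),
}
MICRO_P, MICRO_R = 0.4454, 0.4933  # from the "micro avg" row


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (defined as 0 when both are 0)."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


support_total = sum(row[3] for row in BY_CLASS.values())
micro_f1 = f1(MICRO_P, MICRO_R)
# Macro: plain mean of per-class F1; weighted: per-class F1 weighted by support.
macro_f1 = sum(row[2] for row in BY_CLASS.values()) / len(BY_CLASS)
weighted_f1 = sum(row[2] * row[3] for row in BY_CLASS.values()) / support_total

print(support_total, f"{micro_f1:.4f}", f"{macro_f1:.4f}", f"{weighted_f1:.4f}")
```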