2023-10-11 02:36:00,775 ----------------------------------------------------------------------------------------------------
2023-10-11 02:36:00,778 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-11 02:36:00,778 ----------------------------------------------------------------------------------------------------
2023-10-11 02:36:00,778 MultiCorpus: 20847 train + 1123 dev + 3350 test sentences
 - NER_HIPE_2022 Corpus: 20847 train + 1123 dev + 3350 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/de/with_doc_seperator
2023-10-11 02:36:00,778 ----------------------------------------------------------------------------------------------------
2023-10-11 02:36:00,778 Train: 20847 sentences
2023-10-11 02:36:00,779 (train_with_dev=False, train_with_test=False)
2023-10-11 02:36:00,779 ----------------------------------------------------------------------------------------------------
2023-10-11 02:36:00,779 Training Params:
2023-10-11 02:36:00,779 - learning_rate: "0.00015"
2023-10-11 02:36:00,779 - mini_batch_size: "8"
2023-10-11 02:36:00,779 - max_epochs: "10"
2023-10-11 02:36:00,779 - shuffle: "True"
2023-10-11 02:36:00,779 ----------------------------------------------------------------------------------------------------
2023-10-11 02:36:00,779 Plugins:
2023-10-11 02:36:00,779 - TensorboardLogger
2023-10-11 02:36:00,779 - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 02:36:00,779 ----------------------------------------------------------------------------------------------------
2023-10-11 02:36:00,779 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 02:36:00,780 - metric: "('micro avg', 'f1-score')"
2023-10-11 02:36:00,780
----------------------------------------------------------------------------------------------------
2023-10-11 02:36:00,780 Computation:
2023-10-11 02:36:00,780 - compute on device: cuda:0
2023-10-11 02:36:00,780 - embedding storage: none
2023-10-11 02:36:00,780 ----------------------------------------------------------------------------------------------------
2023-10-11 02:36:00,780 Model training base path: "hmbench-newseye/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-3"
2023-10-11 02:36:00,780 ----------------------------------------------------------------------------------------------------
2023-10-11 02:36:00,780 ----------------------------------------------------------------------------------------------------
2023-10-11 02:36:00,780 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 02:38:18,607 epoch 1 - iter 260/2606 - loss 2.79875312 - time (sec): 137.82 - samples/sec: 270.82 - lr: 0.000015 - momentum: 0.000000
2023-10-11 02:40:38,845 epoch 1 - iter 520/2606 - loss 2.53797710 - time (sec): 278.06 - samples/sec: 275.79 - lr: 0.000030 - momentum: 0.000000
2023-10-11 02:42:54,895 epoch 1 - iter 780/2606 - loss 2.17089096 - time (sec): 414.11 - samples/sec: 273.19 - lr: 0.000045 - momentum: 0.000000
2023-10-11 02:45:09,429 epoch 1 - iter 1040/2606 - loss 1.79817367 - time (sec): 548.65 - samples/sec: 272.16 - lr: 0.000060 - momentum: 0.000000
2023-10-11 02:47:25,050 epoch 1 - iter 1300/2606 - loss 1.53189687 - time (sec): 684.27 - samples/sec: 273.25 - lr: 0.000075 - momentum: 0.000000
2023-10-11 02:49:38,076 epoch 1 - iter 1560/2606 - loss 1.35818094 - time (sec): 817.29 - samples/sec: 272.98 - lr: 0.000090 - momentum: 0.000000
2023-10-11 02:51:51,661 epoch 1 - iter 1820/2606 - loss 1.22736080 - time (sec): 950.88 - samples/sec: 271.45 - lr: 0.000105 - momentum: 0.000000
2023-10-11 02:54:05,050 epoch 1 - iter 2080/2606 - loss 1.12807500 - time (sec): 1084.27 - samples/sec: 269.35 - lr: 0.000120 - momentum: 0.000000
2023-10-11 02:56:20,813 epoch 1 - iter 2340/2606 - loss 1.03625081 - time (sec): 1220.03 - samples/sec: 270.57 - lr: 0.000135 - momentum: 0.000000
2023-10-11 02:58:36,030 epoch 1 - iter 2600/2606 - loss 0.95912226 - time (sec): 1355.25 - samples/sec: 270.56 - lr: 0.000150 - momentum: 0.000000
2023-10-11 02:58:39,080 ----------------------------------------------------------------------------------------------------
2023-10-11 02:58:39,080 EPOCH 1 done: loss 0.9578 - lr: 0.000150
2023-10-11 02:59:16,006 DEV : loss 0.1385965347290039 - f1-score (micro avg) 0.0
2023-10-11 02:59:16,068 ----------------------------------------------------------------------------------------------------
2023-10-11 03:01:33,372 epoch 2 - iter 260/2606 - loss 0.23906488 - time (sec): 137.30 - samples/sec: 268.82 - lr: 0.000148 - momentum: 0.000000
2023-10-11 03:03:48,996 epoch 2 - iter 520/2606 - loss 0.22848489 - time (sec): 272.92 - samples/sec: 268.13 - lr: 0.000147 - momentum: 0.000000
2023-10-11 03:06:09,371 epoch 2 - iter 780/2606 - loss 0.23177752 - time (sec): 413.30 - samples/sec: 272.39 - lr: 0.000145 - momentum: 0.000000
2023-10-11 03:08:29,617 epoch 2 - iter 1040/2606 - loss 0.22529060 - time (sec): 553.55 - samples/sec: 269.39 - lr: 0.000143 - momentum: 0.000000
2023-10-11 03:10:46,687 epoch 2 - iter 1300/2606 - loss 0.21839832 - time (sec): 690.62 - samples/sec: 266.76 - lr: 0.000142 - momentum: 0.000000
2023-10-11 03:13:03,201 epoch 2 - iter 1560/2606 - loss 0.20884696 - time (sec): 827.13 - samples/sec: 267.47 - lr: 0.000140 - momentum: 0.000000
2023-10-11 03:15:16,019 epoch 2 - iter 1820/2606 - loss 0.20568470 - time (sec): 959.95 - samples/sec: 265.71 - lr: 0.000138 - momentum: 0.000000
2023-10-11 03:17:32,731 epoch 2 - iter 2080/2606 - loss 0.19850457 - time (sec): 1096.66 - samples/sec: 265.70 - lr: 0.000137 - momentum: 0.000000
2023-10-11 03:19:49,480 epoch 2 - iter 2340/2606 - loss 0.19252617 - time (sec): 1233.41 - samples/sec: 267.36 - lr: 0.000135 - momentum: 0.000000
2023-10-11 03:22:04,628 epoch 2 - iter 2600/2606 - loss 0.18624742 - time (sec): 1368.56 - samples/sec: 267.88 - lr: 0.000133 - momentum: 0.000000
2023-10-11 03:22:07,610 ----------------------------------------------------------------------------------------------------
2023-10-11 03:22:07,610 EPOCH 2 done: loss 0.1859 - lr: 0.000133
2023-10-11 03:22:49,339 DEV : loss 0.14479756355285645 - f1-score (micro avg) 0.3101
2023-10-11 03:22:49,395 saving best model
2023-10-11 03:22:50,414 ----------------------------------------------------------------------------------------------------
2023-10-11 03:25:06,167 epoch 3 - iter 260/2606 - loss 0.10408270 - time (sec): 135.75 - samples/sec: 257.11 - lr: 0.000132 - momentum: 0.000000
2023-10-11 03:27:23,435 epoch 3 - iter 520/2606 - loss 0.10809063 - time (sec): 273.02 - samples/sec: 260.19 - lr: 0.000130 - momentum: 0.000000
2023-10-11 03:29:40,284 epoch 3 - iter 780/2606 - loss 0.10276919 - time (sec): 409.87 - samples/sec: 260.29 - lr: 0.000128 - momentum: 0.000000
2023-10-11 03:32:00,756 epoch 3 - iter 1040/2606 - loss 0.10650096 - time (sec): 550.34 - samples/sec: 264.48 - lr: 0.000127 - momentum: 0.000000
2023-10-11 03:34:19,482 epoch 3 - iter 1300/2606 - loss 0.10951685 - time (sec): 689.07 - samples/sec: 266.61 - lr: 0.000125 - momentum: 0.000000
2023-10-11 03:36:39,185 epoch 3 - iter 1560/2606 - loss 0.10633073 - time (sec): 828.77 - samples/sec: 264.78 - lr: 0.000123 - momentum: 0.000000
2023-10-11 03:38:59,124 epoch 3 - iter 1820/2606 - loss 0.10501204 - time (sec): 968.71 - samples/sec: 263.31 - lr: 0.000122 - momentum: 0.000000
2023-10-11 03:41:17,126 epoch 3 - iter 2080/2606 - loss 0.10470219 - time (sec): 1106.71 - samples/sec: 264.03 - lr: 0.000120 - momentum: 0.000000
2023-10-11 03:43:33,651 epoch 3 - iter 2340/2606 - loss 0.10526270 - time (sec): 1243.23 - samples/sec: 263.79 - lr: 0.000118 - momentum: 0.000000
2023-10-11 03:45:53,893 epoch 3 - iter 2600/2606 - loss 0.10410180 - time (sec): 1383.48 - samples/sec: 265.09 - lr: 0.000117 - momentum: 0.000000
2023-10-11 03:45:56,835 ----------------------------------------------------------------------------------------------------
2023-10-11 03:45:56,835 EPOCH 3 done: loss 0.1044 - lr: 0.000117
2023-10-11 03:46:38,205 DEV : loss 0.18133316934108734 - f1-score (micro avg) 0.3516
2023-10-11 03:46:38,261 saving best model
2023-10-11 03:46:44,488 ----------------------------------------------------------------------------------------------------
2023-10-11 03:48:57,610 epoch 4 - iter 260/2606 - loss 0.06859431 - time (sec): 133.12 - samples/sec: 264.04 - lr: 0.000115 - momentum: 0.000000
2023-10-11 03:51:11,126 epoch 4 - iter 520/2606 - loss 0.07027485 - time (sec): 266.63 - samples/sec: 269.61 - lr: 0.000113 - momentum: 0.000000
2023-10-11 03:53:25,561 epoch 4 - iter 780/2606 - loss 0.07085992 - time (sec): 401.07 - samples/sec: 271.87 - lr: 0.000112 - momentum: 0.000000
2023-10-11 03:55:39,197 epoch 4 - iter 1040/2606 - loss 0.07328355 - time (sec): 534.71 - samples/sec: 270.36 - lr: 0.000110 - momentum: 0.000000
2023-10-11 03:57:58,458 epoch 4 - iter 1300/2606 - loss 0.07139563 - time (sec): 673.97 - samples/sec: 274.04 - lr: 0.000108 - momentum: 0.000000
2023-10-11 04:00:15,073 epoch 4 - iter 1560/2606 - loss 0.07188588 - time (sec): 810.58 - samples/sec: 271.09 - lr: 0.000107 - momentum: 0.000000
2023-10-11 04:02:32,271 epoch 4 - iter 1820/2606 - loss 0.07168252 - time (sec): 947.78 - samples/sec: 271.34 - lr: 0.000105 - momentum: 0.000000
2023-10-11 04:04:51,046 epoch 4 - iter 2080/2606 - loss 0.07050739 - time (sec): 1086.55 - samples/sec: 273.65 - lr: 0.000103 - momentum: 0.000000
2023-10-11 04:07:02,394 epoch 4 - iter 2340/2606 - loss 0.07032051 - time (sec): 1217.90 - samples/sec: 272.18 - lr: 0.000102 - momentum: 0.000000
2023-10-11 04:09:15,575 epoch 4 - iter 2600/2606 - loss 0.07012799 - time (sec): 1351.08 - samples/sec: 271.60 - lr: 0.000100 - momentum: 0.000000
2023-10-11 04:09:18,332 ----------------------------------------------------------------------------------------------------
2023-10-11 04:09:18,332 EPOCH 4 done: loss 0.0701 - lr: 0.000100
2023-10-11 04:09:59,619 DEV : loss 0.25803107023239136 - f1-score (micro avg) 0.3519
2023-10-11 04:09:59,675 saving best model
2023-10-11 04:10:05,959 ----------------------------------------------------------------------------------------------------
2023-10-11 04:12:22,298 epoch 5 - iter 260/2606 - loss 0.05443710 - time (sec): 136.33 - samples/sec: 262.96 - lr: 0.000098 - momentum: 0.000000
2023-10-11 04:14:42,695 epoch 5 - iter 520/2606 - loss 0.05908808 - time (sec): 276.73 - samples/sec: 265.82 - lr: 0.000097 - momentum: 0.000000
2023-10-11 04:17:01,783 epoch 5 - iter 780/2606 - loss 0.05752056 - time (sec): 415.82 - samples/sec: 261.97 - lr: 0.000095 - momentum: 0.000000
2023-10-11 04:19:19,820 epoch 5 - iter 1040/2606 - loss 0.05497372 - time (sec): 553.86 - samples/sec: 262.52 - lr: 0.000093 - momentum: 0.000000
2023-10-11 04:21:39,929 epoch 5 - iter 1300/2606 - loss 0.05305092 - time (sec): 693.97 - samples/sec: 264.86 - lr: 0.000092 - momentum: 0.000000
2023-10-11 04:23:54,456 epoch 5 - iter 1560/2606 - loss 0.05305439 - time (sec): 828.49 - samples/sec: 264.58 - lr: 0.000090 - momentum: 0.000000
2023-10-11 04:26:10,801 epoch 5 - iter 1820/2606 - loss 0.05315558 - time (sec): 964.84 - samples/sec: 265.27 - lr: 0.000088 - momentum: 0.000000
2023-10-11 04:28:25,838 epoch 5 - iter 2080/2606 - loss 0.05281160 - time (sec): 1099.87 - samples/sec: 264.63 - lr: 0.000087 - momentum: 0.000000
2023-10-11 04:30:43,752 epoch 5 - iter 2340/2606 - loss 0.05256430 - time (sec): 1237.79 - samples/sec: 265.27 - lr: 0.000085 - momentum: 0.000000
2023-10-11 04:33:07,208 epoch 5 - iter 2600/2606 - loss 0.05326034 - time (sec): 1381.24 - samples/sec: 264.92 - lr: 0.000083 - momentum: 0.000000
2023-10-11 04:33:11,087
----------------------------------------------------------------------------------------------------
2023-10-11 04:33:11,087 EPOCH 5 done: loss 0.0531 - lr: 0.000083
2023-10-11 04:33:52,049 DEV : loss 0.3203273415565491 - f1-score (micro avg) 0.3574
2023-10-11 04:33:52,111 saving best model
2023-10-11 04:33:55,540 ----------------------------------------------------------------------------------------------------
2023-10-11 04:36:12,513 epoch 6 - iter 260/2606 - loss 0.03371390 - time (sec): 136.97 - samples/sec: 245.66 - lr: 0.000082 - momentum: 0.000000
2023-10-11 04:38:27,463 epoch 6 - iter 520/2606 - loss 0.03309046 - time (sec): 271.92 - samples/sec: 250.81 - lr: 0.000080 - momentum: 0.000000
2023-10-11 04:40:45,216 epoch 6 - iter 780/2606 - loss 0.03599853 - time (sec): 409.67 - samples/sec: 255.19 - lr: 0.000078 - momentum: 0.000000
2023-10-11 04:42:57,982 epoch 6 - iter 1040/2606 - loss 0.03691222 - time (sec): 542.44 - samples/sec: 258.71 - lr: 0.000077 - momentum: 0.000000
2023-10-11 04:45:12,781 epoch 6 - iter 1300/2606 - loss 0.03826054 - time (sec): 677.24 - samples/sec: 261.80 - lr: 0.000075 - momentum: 0.000000
2023-10-11 04:47:24,869 epoch 6 - iter 1560/2606 - loss 0.03728935 - time (sec): 809.32 - samples/sec: 262.45 - lr: 0.000073 - momentum: 0.000000
2023-10-11 04:49:41,097 epoch 6 - iter 1820/2606 - loss 0.03682751 - time (sec): 945.55 - samples/sec: 266.43 - lr: 0.000072 - momentum: 0.000000
2023-10-11 04:51:58,399 epoch 6 - iter 2080/2606 - loss 0.03749443 - time (sec): 1082.85 - samples/sec: 267.50 - lr: 0.000070 - momentum: 0.000000
2023-10-11 04:54:20,332 epoch 6 - iter 2340/2606 - loss 0.03754226 - time (sec): 1224.79 - samples/sec: 268.67 - lr: 0.000068 - momentum: 0.000000
2023-10-11 04:56:39,733 epoch 6 - iter 2600/2606 - loss 0.03778431 - time (sec): 1364.19 - samples/sec: 268.75 - lr: 0.000067 - momentum: 0.000000
2023-10-11 04:56:42,821
----------------------------------------------------------------------------------------------------
2023-10-11 04:56:42,821 EPOCH 6 done: loss 0.0378 - lr: 0.000067
2023-10-11 04:57:23,462 DEV : loss 0.4133872985839844 - f1-score (micro avg) 0.3448
2023-10-11 04:57:23,515 ----------------------------------------------------------------------------------------------------
2023-10-11 04:59:43,378 epoch 7 - iter 260/2606 - loss 0.02500855 - time (sec): 139.86 - samples/sec: 286.63 - lr: 0.000065 - momentum: 0.000000
2023-10-11 05:01:57,357 epoch 7 - iter 520/2606 - loss 0.02465180 - time (sec): 273.84 - samples/sec: 277.16 - lr: 0.000063 - momentum: 0.000000
2023-10-11 05:04:15,775 epoch 7 - iter 780/2606 - loss 0.02468536 - time (sec): 412.26 - samples/sec: 274.92 - lr: 0.000062 - momentum: 0.000000
2023-10-11 05:06:37,944 epoch 7 - iter 1040/2606 - loss 0.02691277 - time (sec): 554.43 - samples/sec: 273.88 - lr: 0.000060 - momentum: 0.000000
2023-10-11 05:08:55,685 epoch 7 - iter 1300/2606 - loss 0.03046514 - time (sec): 692.17 - samples/sec: 268.08 - lr: 0.000058 - momentum: 0.000000
2023-10-11 05:11:14,370 epoch 7 - iter 1560/2606 - loss 0.03146291 - time (sec): 830.85 - samples/sec: 268.60 - lr: 0.000057 - momentum: 0.000000
2023-10-11 05:13:33,566 epoch 7 - iter 1820/2606 - loss 0.03250450 - time (sec): 970.05 - samples/sec: 266.48 - lr: 0.000055 - momentum: 0.000000
2023-10-11 05:15:49,540 epoch 7 - iter 2080/2606 - loss 0.03159278 - time (sec): 1106.02 - samples/sec: 265.63 - lr: 0.000053 - momentum: 0.000000
2023-10-11 05:18:07,114 epoch 7 - iter 2340/2606 - loss 0.03124657 - time (sec): 1243.60 - samples/sec: 265.89 - lr: 0.000052 - momentum: 0.000000
2023-10-11 05:20:21,597 epoch 7 - iter 2600/2606 - loss 0.03094040 - time (sec): 1378.08 - samples/sec: 266.18 - lr: 0.000050 - momentum: 0.000000
2023-10-11 05:20:24,427 ----------------------------------------------------------------------------------------------------
2023-10-11 05:20:24,428 EPOCH 7 done: loss 0.0310 - lr: 0.000050
2023-10-11 05:21:05,201 DEV : loss 0.3855676054954529 - f1-score (micro avg) 0.349
2023-10-11 05:21:05,255 ----------------------------------------------------------------------------------------------------
2023-10-11 05:23:17,454 epoch 8 - iter 260/2606 - loss 0.01823084 - time (sec): 132.20 - samples/sec: 278.11 - lr: 0.000048 - momentum: 0.000000
2023-10-11 05:25:30,878 epoch 8 - iter 520/2606 - loss 0.02290933 - time (sec): 265.62 - samples/sec: 278.67 - lr: 0.000047 - momentum: 0.000000
2023-10-11 05:27:43,132 epoch 8 - iter 780/2606 - loss 0.02219691 - time (sec): 397.88 - samples/sec: 276.89 - lr: 0.000045 - momentum: 0.000000
2023-10-11 05:29:55,683 epoch 8 - iter 1040/2606 - loss 0.02069056 - time (sec): 530.43 - samples/sec: 276.21 - lr: 0.000043 - momentum: 0.000000
2023-10-11 05:32:15,605 epoch 8 - iter 1300/2606 - loss 0.02116472 - time (sec): 670.35 - samples/sec: 274.99 - lr: 0.000042 - momentum: 0.000000
2023-10-11 05:34:31,899 epoch 8 - iter 1560/2606 - loss 0.02187514 - time (sec): 806.64 - samples/sec: 273.92 - lr: 0.000040 - momentum: 0.000000
2023-10-11 05:36:46,629 epoch 8 - iter 1820/2606 - loss 0.02138843 - time (sec): 941.37 - samples/sec: 272.65 - lr: 0.000038 - momentum: 0.000000
2023-10-11 05:39:04,653 epoch 8 - iter 2080/2606 - loss 0.02142026 - time (sec): 1079.40 - samples/sec: 271.95 - lr: 0.000037 - momentum: 0.000000
2023-10-11 05:41:23,550 epoch 8 - iter 2340/2606 - loss 0.02137287 - time (sec): 1218.29 - samples/sec: 271.99 - lr: 0.000035 - momentum: 0.000000
2023-10-11 05:43:36,354 epoch 8 - iter 2600/2606 - loss 0.02183994 - time (sec): 1351.10 - samples/sec: 271.12 - lr: 0.000033 - momentum: 0.000000
2023-10-11 05:43:39,801 ----------------------------------------------------------------------------------------------------
2023-10-11 05:43:39,801 EPOCH 8 done: loss 0.0219 - lr: 0.000033
2023-10-11 05:44:21,249 DEV : loss 0.40076038241386414 - f1-score (micro avg) 0.3927
2023-10-11 05:44:21,303
saving best model
2023-10-11 05:44:27,529 ----------------------------------------------------------------------------------------------------
2023-10-11 05:46:47,143 epoch 9 - iter 260/2606 - loss 0.01797232 - time (sec): 139.61 - samples/sec: 274.39 - lr: 0.000032 - momentum: 0.000000
2023-10-11 05:49:06,372 epoch 9 - iter 520/2606 - loss 0.01701648 - time (sec): 278.84 - samples/sec: 271.57 - lr: 0.000030 - momentum: 0.000000
2023-10-11 05:51:24,008 epoch 9 - iter 780/2606 - loss 0.01514496 - time (sec): 416.47 - samples/sec: 267.41 - lr: 0.000028 - momentum: 0.000000
2023-10-11 05:53:38,087 epoch 9 - iter 1040/2606 - loss 0.01510022 - time (sec): 550.55 - samples/sec: 265.10 - lr: 0.000027 - momentum: 0.000000
2023-10-11 05:55:53,133 epoch 9 - iter 1300/2606 - loss 0.01530686 - time (sec): 685.60 - samples/sec: 267.48 - lr: 0.000025 - momentum: 0.000000
2023-10-11 05:58:07,417 epoch 9 - iter 1560/2606 - loss 0.01503083 - time (sec): 819.88 - samples/sec: 266.57 - lr: 0.000023 - momentum: 0.000000
2023-10-11 06:00:23,208 epoch 9 - iter 1820/2606 - loss 0.01507330 - time (sec): 955.67 - samples/sec: 267.36 - lr: 0.000022 - momentum: 0.000000
2023-10-11 06:02:38,780 epoch 9 - iter 2080/2606 - loss 0.01452415 - time (sec): 1091.25 - samples/sec: 268.44 - lr: 0.000020 - momentum: 0.000000
2023-10-11 06:04:56,244 epoch 9 - iter 2340/2606 - loss 0.01529258 - time (sec): 1228.71 - samples/sec: 268.86 - lr: 0.000018 - momentum: 0.000000
2023-10-11 06:07:13,677 epoch 9 - iter 2600/2606 - loss 0.01565526 - time (sec): 1366.14 - samples/sec: 268.43 - lr: 0.000017 - momentum: 0.000000
2023-10-11 06:07:16,755 ----------------------------------------------------------------------------------------------------
2023-10-11 06:07:16,755 EPOCH 9 done: loss 0.0156 - lr: 0.000017
2023-10-11 06:07:57,652 DEV : loss 0.4159277677536011 - f1-score (micro avg) 0.3883
2023-10-11 06:07:57,716
----------------------------------------------------------------------------------------------------
2023-10-11 06:10:16,257 epoch 10 - iter 260/2606 - loss 0.00788747 - time (sec): 138.54 - samples/sec: 263.07 - lr: 0.000015 - momentum: 0.000000
2023-10-11 06:12:30,532 epoch 10 - iter 520/2606 - loss 0.01146169 - time (sec): 272.81 - samples/sec: 263.35 - lr: 0.000013 - momentum: 0.000000
2023-10-11 06:14:50,774 epoch 10 - iter 780/2606 - loss 0.01052928 - time (sec): 413.06 - samples/sec: 262.47 - lr: 0.000012 - momentum: 0.000000
2023-10-11 06:17:07,263 epoch 10 - iter 1040/2606 - loss 0.01073654 - time (sec): 549.55 - samples/sec: 261.11 - lr: 0.000010 - momentum: 0.000000
2023-10-11 06:19:31,825 epoch 10 - iter 1300/2606 - loss 0.01015607 - time (sec): 694.11 - samples/sec: 263.35 - lr: 0.000008 - momentum: 0.000000
2023-10-11 06:21:48,930 epoch 10 - iter 1560/2606 - loss 0.01048530 - time (sec): 831.21 - samples/sec: 263.02 - lr: 0.000007 - momentum: 0.000000
2023-10-11 06:24:02,051 epoch 10 - iter 1820/2606 - loss 0.01120340 - time (sec): 964.33 - samples/sec: 264.27 - lr: 0.000005 - momentum: 0.000000
2023-10-11 06:26:14,763 epoch 10 - iter 2080/2606 - loss 0.01126706 - time (sec): 1097.05 - samples/sec: 264.41 - lr: 0.000003 - momentum: 0.000000
2023-10-11 06:28:30,333 epoch 10 - iter 2340/2606 - loss 0.01126768 - time (sec): 1232.61 - samples/sec: 267.29 - lr: 0.000002 - momentum: 0.000000
2023-10-11 06:30:43,698 epoch 10 - iter 2600/2606 - loss 0.01126218 - time (sec): 1365.98 - samples/sec: 268.19 - lr: 0.000000 - momentum: 0.000000
2023-10-11 06:30:46,919 ----------------------------------------------------------------------------------------------------
2023-10-11 06:30:46,919 EPOCH 10 done: loss 0.0112 - lr: 0.000000
2023-10-11 06:31:28,527 DEV : loss 0.4329761266708374 - f1-score (micro avg) 0.3906
2023-10-11 06:31:29,498 ----------------------------------------------------------------------------------------------------
2023-10-11 06:31:29,500
Loading model from best epoch ...
2023-10-11 06:31:34,240 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-11 06:33:16,669
Results:
- F-score (micro) 0.4521
- F-score (macro) 0.3045
- Accuracy 0.2967

By class:
              precision    recall  f1-score   support

         LOC     0.4853    0.5840    0.5301      1214
         PER     0.3764    0.4542    0.4117       808
         ORG     0.2745    0.2776    0.2761       353
   HumanProd     0.0000    0.0000    0.0000        15

   micro avg     0.4187    0.4912    0.4521      2390
   macro avg     0.2841    0.3290    0.3045      2390
weighted avg     0.4143    0.4912    0.4492      2390

2023-10-11 06:33:16,670 ----------------------------------------------------------------------------------------------------