2023-10-11 06:34:05,203 ----------------------------------------------------------------------------------------------------
2023-10-11 06:34:05,205 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-11 06:34:05,205 ----------------------------------------------------------------------------------------------------
2023-10-11 06:34:05,206 MultiCorpus: 20847 train + 1123 dev + 3350 test sentences
 - NER_HIPE_2022 Corpus: 20847 train + 1123 dev + 3350 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/de/with_doc_seperator
2023-10-11 06:34:05,206 ----------------------------------------------------------------------------------------------------
2023-10-11 06:34:05,206 Train: 20847 sentences
2023-10-11 06:34:05,206 (train_with_dev=False, train_with_test=False)
2023-10-11 06:34:05,206 ----------------------------------------------------------------------------------------------------
2023-10-11 06:34:05,206 Training Params:
2023-10-11 06:34:05,206 - learning_rate: "0.00016"
2023-10-11 06:34:05,206 - mini_batch_size: "8"
2023-10-11 06:34:05,206 - max_epochs: "10"
2023-10-11 06:34:05,206 - shuffle: "True"
2023-10-11 06:34:05,207 ----------------------------------------------------------------------------------------------------
2023-10-11 06:34:05,207 Plugins:
2023-10-11 06:34:05,207 - TensorboardLogger
2023-10-11 06:34:05,207 - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 06:34:05,207 ----------------------------------------------------------------------------------------------------
2023-10-11 06:34:05,207 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 06:34:05,207 - metric: "('micro avg', 'f1-score')"
2023-10-11 06:34:05,207
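The lr column in the per-batch entries below follows the LinearScheduler declared above: with warmup_fraction 0.1 over 10 epochs of 2606 batches each, the learning rate ramps linearly to the 0.00016 peak across the first epoch, then decays linearly to zero. A minimal sketch of that schedule, reconstructed from the logged values (the constants and helper function are illustrative, not Flair's internals):

```python
# Illustrative reconstruction of the linear warmup/decay schedule
# implied by: learning_rate 0.00016, warmup_fraction 0.1, 10 epochs x 2606 batches.
PEAK_LR = 0.00016
STEPS_PER_EPOCH = 2606
TOTAL_STEPS = 10 * STEPS_PER_EPOCH
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)  # = 2606, i.e. exactly the first epoch

def linear_schedule_lr(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to 0 at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS)

# Matches the lr logged every 260 batches, e.g. epoch 1 iter 260 -> 0.000016,
# epoch 1 iter 2600 -> 0.000160, epoch 2 iter 260 -> 0.000158.
print(round(linear_schedule_lr(2600), 6))  # -> 0.00016
```

The decay leg also explains the final epoch 10 entries bottoming out at lr 0.000000.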
----------------------------------------------------------------------------------------------------
2023-10-11 06:34:05,207 Computation:
2023-10-11 06:34:05,207 - compute on device: cuda:0
2023-10-11 06:34:05,207 - embedding storage: none
2023-10-11 06:34:05,207 ----------------------------------------------------------------------------------------------------
2023-10-11 06:34:05,207 Model training base path: "hmbench-newseye/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-3"
2023-10-11 06:34:05,207 ----------------------------------------------------------------------------------------------------
2023-10-11 06:34:05,208 ----------------------------------------------------------------------------------------------------
2023-10-11 06:34:05,208 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 06:36:23,104 epoch 1 - iter 260/2606 - loss 2.79623512 - time (sec): 137.89 - samples/sec: 270.68 - lr: 0.000016 - momentum: 0.000000
2023-10-11 06:38:42,899 epoch 1 - iter 520/2606 - loss 2.51320583 - time (sec): 277.69 - samples/sec: 276.16 - lr: 0.000032 - momentum: 0.000000
2023-10-11 06:41:02,243 epoch 1 - iter 780/2606 - loss 2.13123088 - time (sec): 417.03 - samples/sec: 271.28 - lr: 0.000048 - momentum: 0.000000
2023-10-11 06:43:21,533 epoch 1 - iter 1040/2606 - loss 1.75671481 - time (sec): 556.32 - samples/sec: 268.41 - lr: 0.000064 - momentum: 0.000000
2023-10-11 06:45:41,186 epoch 1 - iter 1300/2606 - loss 1.49838201 - time (sec): 695.98 - samples/sec: 268.65 - lr: 0.000080 - momentum: 0.000000
2023-10-11 06:47:59,654 epoch 1 - iter 1560/2606 - loss 1.32866928 - time (sec): 834.44 - samples/sec: 267.37 - lr: 0.000096 - momentum: 0.000000
2023-10-11 06:50:16,336 epoch 1 - iter 1820/2606 - loss 1.20054232 - time (sec): 971.13 - samples/sec: 265.79 - lr: 0.000112 - momentum: 0.000000
2023-10-11 06:52:32,593 epoch 1 - iter 2080/2606 - loss 1.10214569 - time (sec): 1107.38 - samples/sec: 263.73 - lr: 0.000128 - momentum: 0.000000
2023-10-11 06:54:49,367 epoch 1 - iter 2340/2606 - loss 1.01185782 - time (sec): 1244.16 - samples/sec: 265.32 - lr: 0.000144 - momentum: 0.000000
2023-10-11 06:57:04,584 epoch 1 - iter 2600/2606 - loss 0.93498109 - time (sec): 1379.37 - samples/sec: 265.83 - lr: 0.000160 - momentum: 0.000000
2023-10-11 06:57:07,647 ----------------------------------------------------------------------------------------------------
2023-10-11 06:57:07,648 EPOCH 1 done: loss 0.9337 - lr: 0.000160
2023-10-11 06:57:43,437 DEV : loss 0.13330958783626556 - f1-score (micro avg) 0.3228
2023-10-11 06:57:43,490 saving best model
2023-10-11 06:57:44,505 ----------------------------------------------------------------------------------------------------
2023-10-11 06:59:57,930 epoch 2 - iter 260/2606 - loss 0.21019294 - time (sec): 133.42 - samples/sec: 276.63 - lr: 0.000158 - momentum: 0.000000
2023-10-11 07:02:09,637 epoch 2 - iter 520/2606 - loss 0.19864513 - time (sec): 265.13 - samples/sec: 276.01 - lr: 0.000156 - momentum: 0.000000
2023-10-11 07:04:22,717 epoch 2 - iter 780/2606 - loss 0.20049738 - time (sec): 398.21 - samples/sec: 282.71 - lr: 0.000155 - momentum: 0.000000
2023-10-11 07:06:32,374 epoch 2 - iter 1040/2606 - loss 0.19536494 - time (sec): 527.87 - samples/sec: 282.50 - lr: 0.000153 - momentum: 0.000000
2023-10-11 07:08:44,974 epoch 2 - iter 1300/2606 - loss 0.18944632 - time (sec): 660.47 - samples/sec: 278.93 - lr: 0.000151 - momentum: 0.000000
2023-10-11 07:10:58,600 epoch 2 - iter 1560/2606 - loss 0.18221078 - time (sec): 794.09 - samples/sec: 278.60 - lr: 0.000149 - momentum: 0.000000
2023-10-11 07:13:07,761 epoch 2 - iter 1820/2606 - loss 0.18085900 - time (sec): 923.25 - samples/sec: 276.27 - lr: 0.000148 - momentum: 0.000000
2023-10-11 07:15:22,042 epoch 2 - iter 2080/2606 - loss 0.17490813 - time (sec): 1057.53 - samples/sec: 275.53 - lr: 0.000146 - momentum: 0.000000
2023-10-11 07:17:39,559 epoch 2 - iter 2340/2606 - loss 0.17046757 - time (sec): 1195.05 - samples/sec: 275.94 - lr: 0.000144 - momentum: 0.000000
2023-10-11 07:19:55,227 epoch 2 - iter 2600/2606 - loss 0.16672440 - time (sec): 1330.72 - samples/sec: 275.49 - lr: 0.000142 - momentum: 0.000000
2023-10-11 07:19:58,227 ----------------------------------------------------------------------------------------------------
2023-10-11 07:19:58,228 EPOCH 2 done: loss 0.1664 - lr: 0.000142
2023-10-11 07:20:39,656 DEV : loss 0.1355997771024704 - f1-score (micro avg) 0.3157
2023-10-11 07:20:39,711 ----------------------------------------------------------------------------------------------------
2023-10-11 07:22:55,760 epoch 3 - iter 260/2606 - loss 0.09722056 - time (sec): 136.05 - samples/sec: 256.55 - lr: 0.000140 - momentum: 0.000000
2023-10-11 07:25:13,756 epoch 3 - iter 520/2606 - loss 0.09970387 - time (sec): 274.04 - samples/sec: 259.22 - lr: 0.000139 - momentum: 0.000000
2023-10-11 07:27:29,818 epoch 3 - iter 780/2606 - loss 0.09543305 - time (sec): 410.11 - samples/sec: 260.14 - lr: 0.000137 - momentum: 0.000000
2023-10-11 07:29:51,073 epoch 3 - iter 1040/2606 - loss 0.10146619 - time (sec): 551.36 - samples/sec: 263.99 - lr: 0.000135 - momentum: 0.000000
2023-10-11 07:32:08,639 epoch 3 - iter 1300/2606 - loss 0.10377832 - time (sec): 688.93 - samples/sec: 266.67 - lr: 0.000133 - momentum: 0.000000
2023-10-11 07:34:22,168 epoch 3 - iter 1560/2606 - loss 0.10053981 - time (sec): 822.46 - samples/sec: 266.82 - lr: 0.000132 - momentum: 0.000000
2023-10-11 07:36:35,069 epoch 3 - iter 1820/2606 - loss 0.09949351 - time (sec): 955.36 - samples/sec: 266.99 - lr: 0.000130 - momentum: 0.000000
2023-10-11 07:38:49,369 epoch 3 - iter 2080/2606 - loss 0.09938405 - time (sec): 1089.66 - samples/sec: 268.16 - lr: 0.000128 - momentum: 0.000000
2023-10-11 07:41:02,378 epoch 3 - iter 2340/2606 - loss 0.09946291 - time (sec): 1222.66 - samples/sec: 268.23 - lr: 0.000126 - momentum: 0.000000
2023-10-11 07:43:18,809 epoch 3 - iter 2600/2606 - loss 0.09852433 - time (sec): 1359.10 - samples/sec: 269.85 - lr: 0.000125 - momentum: 0.000000
2023-10-11 07:43:21,680 ----------------------------------------------------------------------------------------------------
2023-10-11 07:43:21,680 EPOCH 3 done: loss 0.0987 - lr: 0.000125
2023-10-11 07:44:02,294 DEV : loss 0.21938827633857727 - f1-score (micro avg) 0.3278
2023-10-11 07:44:02,348 saving best model
2023-10-11 07:44:08,749 ----------------------------------------------------------------------------------------------------
2023-10-11 07:46:23,521 epoch 4 - iter 260/2606 - loss 0.07666126 - time (sec): 134.77 - samples/sec: 260.80 - lr: 0.000123 - momentum: 0.000000
2023-10-11 07:48:42,647 epoch 4 - iter 520/2606 - loss 0.07094144 - time (sec): 273.89 - samples/sec: 262.46 - lr: 0.000121 - momentum: 0.000000
2023-10-11 07:51:03,507 epoch 4 - iter 780/2606 - loss 0.06881513 - time (sec): 414.75 - samples/sec: 262.90 - lr: 0.000119 - momentum: 0.000000
2023-10-11 07:53:21,938 epoch 4 - iter 1040/2606 - loss 0.07087916 - time (sec): 553.18 - samples/sec: 261.33 - lr: 0.000117 - momentum: 0.000000
2023-10-11 07:55:44,233 epoch 4 - iter 1300/2606 - loss 0.06949960 - time (sec): 695.48 - samples/sec: 265.57 - lr: 0.000116 - momentum: 0.000000
2023-10-11 07:58:00,501 epoch 4 - iter 1560/2606 - loss 0.06914117 - time (sec): 831.75 - samples/sec: 264.19 - lr: 0.000114 - momentum: 0.000000
2023-10-11 08:00:17,923 epoch 4 - iter 1820/2606 - loss 0.07037335 - time (sec): 969.17 - samples/sec: 265.35 - lr: 0.000112 - momentum: 0.000000
2023-10-11 08:02:37,656 epoch 4 - iter 2080/2606 - loss 0.07026631 - time (sec): 1108.90 - samples/sec: 268.13 - lr: 0.000110 - momentum: 0.000000
2023-10-11 08:04:51,458 epoch 4 - iter 2340/2606 - loss 0.06994160 - time (sec): 1242.70 - samples/sec: 266.75 - lr: 0.000109 - momentum: 0.000000
2023-10-11 08:07:07,399 epoch 4 - iter 2600/2606 - loss 0.07025162 - time (sec): 1378.65 - samples/sec: 266.17 - lr: 0.000107 - momentum: 0.000000
2023-10-11 08:07:10,188 ----------------------------------------------------------------------------------------------------
2023-10-11 08:07:10,189 EPOCH 4 done: loss 0.0702 - lr: 0.000107
2023-10-11 08:07:49,536 DEV : loss 0.26091474294662476 - f1-score (micro avg) 0.3583
2023-10-11 08:07:49,591 saving best model
2023-10-11 08:07:55,779 ----------------------------------------------------------------------------------------------------
2023-10-11 08:10:11,216 epoch 5 - iter 260/2606 - loss 0.03843460 - time (sec): 135.43 - samples/sec: 264.71 - lr: 0.000105 - momentum: 0.000000
2023-10-11 08:12:27,730 epoch 5 - iter 520/2606 - loss 0.04394140 - time (sec): 271.95 - samples/sec: 270.49 - lr: 0.000103 - momentum: 0.000000
2023-10-11 08:14:43,439 epoch 5 - iter 780/2606 - loss 0.04684401 - time (sec): 407.66 - samples/sec: 267.22 - lr: 0.000101 - momentum: 0.000000
2023-10-11 08:17:04,023 epoch 5 - iter 1040/2606 - loss 0.04695587 - time (sec): 548.24 - samples/sec: 265.21 - lr: 0.000100 - momentum: 0.000000
2023-10-11 08:19:25,434 epoch 5 - iter 1300/2606 - loss 0.04626700 - time (sec): 689.65 - samples/sec: 266.52 - lr: 0.000098 - momentum: 0.000000
2023-10-11 08:21:43,551 epoch 5 - iter 1560/2606 - loss 0.04788600 - time (sec): 827.77 - samples/sec: 264.81 - lr: 0.000096 - momentum: 0.000000
2023-10-11 08:23:59,805 epoch 5 - iter 1820/2606 - loss 0.04881533 - time (sec): 964.02 - samples/sec: 265.49 - lr: 0.000094 - momentum: 0.000000
2023-10-11 08:26:14,240 epoch 5 - iter 2080/2606 - loss 0.04905495 - time (sec): 1098.46 - samples/sec: 264.97 - lr: 0.000093 - momentum: 0.000000
2023-10-11 08:28:29,966 epoch 5 - iter 2340/2606 - loss 0.04807348 - time (sec): 1234.18 - samples/sec: 266.04 - lr: 0.000091 - momentum: 0.000000
2023-10-11 08:30:46,446 epoch 5 - iter 2600/2606 - loss 0.04913741 - time (sec): 1370.66 - samples/sec: 266.96 - lr: 0.000089 - momentum: 0.000000
2023-10-11 08:30:50,213 ----------------------------------------------------------------------------------------------------
2023-10-11 08:30:50,213 EPOCH 5 done: loss 0.0491 - lr: 0.000089
2023-10-11 08:31:31,103 DEV : loss 0.3354221284389496 - f1-score (micro avg) 0.3411
2023-10-11 08:31:31,156 ----------------------------------------------------------------------------------------------------
2023-10-11 08:33:39,991 epoch 6 - iter 260/2606 - loss 0.03643917 - time (sec): 128.83 - samples/sec: 261.17 - lr: 0.000087 - momentum: 0.000000
2023-10-11 08:35:50,088 epoch 6 - iter 520/2606 - loss 0.03512044 - time (sec): 258.93 - samples/sec: 263.39 - lr: 0.000085 - momentum: 0.000000
2023-10-11 08:38:01,620 epoch 6 - iter 780/2606 - loss 0.03671140 - time (sec): 390.46 - samples/sec: 267.74 - lr: 0.000084 - momentum: 0.000000
2023-10-11 08:40:10,677 epoch 6 - iter 1040/2606 - loss 0.03608106 - time (sec): 519.52 - samples/sec: 270.12 - lr: 0.000082 - momentum: 0.000000
2023-10-11 08:42:21,009 epoch 6 - iter 1300/2606 - loss 0.03705224 - time (sec): 649.85 - samples/sec: 272.84 - lr: 0.000080 - momentum: 0.000000
2023-10-11 08:44:31,388 epoch 6 - iter 1560/2606 - loss 0.03561669 - time (sec): 780.23 - samples/sec: 272.24 - lr: 0.000078 - momentum: 0.000000
2023-10-11 08:46:47,025 epoch 6 - iter 1820/2606 - loss 0.03469015 - time (sec): 915.87 - samples/sec: 275.06 - lr: 0.000077 - momentum: 0.000000
2023-10-11 08:48:58,805 epoch 6 - iter 2080/2606 - loss 0.03543369 - time (sec): 1047.65 - samples/sec: 276.49 - lr: 0.000075 - momentum: 0.000000
2023-10-11 08:51:13,324 epoch 6 - iter 2340/2606 - loss 0.03550377 - time (sec): 1182.17 - samples/sec: 278.36 - lr: 0.000073 - momentum: 0.000000
2023-10-11 08:53:25,518 epoch 6 - iter 2600/2606 - loss 0.03521598 - time (sec): 1314.36 - samples/sec: 278.94 - lr: 0.000071 - momentum: 0.000000
2023-10-11 08:53:28,410 ----------------------------------------------------------------------------------------------------
2023-10-11 08:53:28,410 EPOCH 6 done: loss 0.0353 - lr: 0.000071
2023-10-11 08:54:06,572 DEV : loss 0.4164799451828003 - f1-score (micro avg) 0.3462
2023-10-11 08:54:06,624 ----------------------------------------------------------------------------------------------------
2023-10-11 08:56:18,996 epoch 7 - iter 260/2606 - loss 0.02421613 - time (sec): 132.37 - samples/sec: 302.86 - lr: 0.000069 - momentum: 0.000000
2023-10-11 08:58:27,905 epoch 7 - iter 520/2606 - loss 0.02396377 - time (sec): 261.28 - samples/sec: 290.49 - lr: 0.000068 - momentum: 0.000000
2023-10-11 09:00:42,064 epoch 7 - iter 780/2606 - loss 0.02348941 - time (sec): 395.44 - samples/sec: 286.61 - lr: 0.000066 - momentum: 0.000000
2023-10-11 09:02:59,932 epoch 7 - iter 1040/2606 - loss 0.02640947 - time (sec): 533.31 - samples/sec: 284.72 - lr: 0.000064 - momentum: 0.000000
2023-10-11 09:05:14,117 epoch 7 - iter 1300/2606 - loss 0.02707752 - time (sec): 667.49 - samples/sec: 277.99 - lr: 0.000062 - momentum: 0.000000
2023-10-11 09:07:30,502 epoch 7 - iter 1560/2606 - loss 0.02838498 - time (sec): 803.88 - samples/sec: 277.62 - lr: 0.000061 - momentum: 0.000000
2023-10-11 09:09:45,699 epoch 7 - iter 1820/2606 - loss 0.02906200 - time (sec): 939.07 - samples/sec: 275.27 - lr: 0.000059 - momentum: 0.000000
2023-10-11 09:12:01,451 epoch 7 - iter 2080/2606 - loss 0.02815842 - time (sec): 1074.82 - samples/sec: 273.34 - lr: 0.000057 - momentum: 0.000000
2023-10-11 09:14:19,544 epoch 7 - iter 2340/2606 - loss 0.02840422 - time (sec): 1212.92 - samples/sec: 272.62 - lr: 0.000055 - momentum: 0.000000
2023-10-11 09:16:32,556 epoch 7 - iter 2600/2606 - loss 0.02795787 - time (sec): 1345.93 - samples/sec: 272.54 - lr: 0.000053 - momentum: 0.000000
2023-10-11 09:16:35,342 ----------------------------------------------------------------------------------------------------
2023-10-11 09:16:35,342 EPOCH 7 done: loss 0.0280 - lr: 0.000053
2023-10-11 09:17:14,611 DEV : loss 0.38824594020843506 - f1-score (micro avg) 0.3855
2023-10-11 09:17:14,663 saving best model
2023-10-11 09:17:17,239 ----------------------------------------------------------------------------------------------------
2023-10-11 09:19:25,024 epoch 8 - iter 260/2606 - loss 0.01471915 - time (sec): 127.78 - samples/sec: 287.72 - lr: 0.000052 - momentum: 0.000000
2023-10-11 09:21:33,496 epoch 8 - iter 520/2606 - loss 0.01972084 - time (sec): 256.25 - samples/sec: 288.86 - lr: 0.000050 - momentum: 0.000000
2023-10-11 09:23:42,553 epoch 8 - iter 780/2606 - loss 0.02027082 - time (sec): 385.31 - samples/sec: 285.92 - lr: 0.000048 - momentum: 0.000000
2023-10-11 09:25:51,832 epoch 8 - iter 1040/2606 - loss 0.01980338 - time (sec): 514.59 - samples/sec: 284.71 - lr: 0.000046 - momentum: 0.000000
2023-10-11 09:28:01,892 epoch 8 - iter 1300/2606 - loss 0.02000664 - time (sec): 644.65 - samples/sec: 285.96 - lr: 0.000045 - momentum: 0.000000
2023-10-11 09:30:12,840 epoch 8 - iter 1560/2606 - loss 0.02105758 - time (sec): 775.60 - samples/sec: 284.89 - lr: 0.000043 - momentum: 0.000000
2023-10-11 09:32:22,237 epoch 8 - iter 1820/2606 - loss 0.02052649 - time (sec): 904.99 - samples/sec: 283.61 - lr: 0.000041 - momentum: 0.000000
2023-10-11 09:34:33,876 epoch 8 - iter 2080/2606 - loss 0.02031634 - time (sec): 1036.63 - samples/sec: 283.17 - lr: 0.000039 - momentum: 0.000000
2023-10-11 09:36:45,792 epoch 8 - iter 2340/2606 - loss 0.02078787 - time (sec): 1168.55 - samples/sec: 283.56 - lr: 0.000037 - momentum: 0.000000
2023-10-11 09:38:55,539 epoch 8 - iter 2600/2606 - loss 0.02156944 - time (sec): 1298.30 - samples/sec: 282.15 - lr: 0.000036 - momentum: 0.000000
2023-10-11 09:38:58,823 ----------------------------------------------------------------------------------------------------
2023-10-11 09:38:58,823 EPOCH 8 done: loss 0.0216 - lr: 0.000036
2023-10-11 09:39:38,820 DEV : loss 0.4608902931213379 - f1-score (micro avg) 0.3699
2023-10-11 09:39:38,874 ----------------------------------------------------------------------------------------------------
2023-10-11 09:41:56,379 epoch 9 - iter 260/2606 - loss 0.01668864 - time (sec): 137.50 - samples/sec: 278.59 - lr: 0.000034 - momentum: 0.000000
2023-10-11 09:44:11,267 epoch 9 - iter 520/2606 - loss 0.01829687 - time (sec): 272.39 - samples/sec: 278.00 - lr: 0.000032 - momentum: 0.000000
2023-10-11 09:46:23,543 epoch 9 - iter 780/2606 - loss 0.01610329 - time (sec): 404.67 - samples/sec: 275.21 - lr: 0.000030 - momentum: 0.000000
2023-10-11 09:48:36,901 epoch 9 - iter 1040/2606 - loss 0.01572074 - time (sec): 538.02 - samples/sec: 271.27 - lr: 0.000029 - momentum: 0.000000
2023-10-11 09:50:50,690 epoch 9 - iter 1300/2606 - loss 0.01553277 - time (sec): 671.81 - samples/sec: 272.97 - lr: 0.000027 - momentum: 0.000000
2023-10-11 09:53:01,793 epoch 9 - iter 1560/2606 - loss 0.01493528 - time (sec): 802.92 - samples/sec: 272.20 - lr: 0.000025 - momentum: 0.000000
2023-10-11 09:55:13,466 epoch 9 - iter 1820/2606 - loss 0.01487477 - time (sec): 934.59 - samples/sec: 273.39 - lr: 0.000023 - momentum: 0.000000
2023-10-11 09:57:26,097 epoch 9 - iter 2080/2606 - loss 0.01451586 - time (sec): 1067.22 - samples/sec: 274.48 - lr: 0.000021 - momentum: 0.000000
2023-10-11 09:59:36,968 epoch 9 - iter 2340/2606 - loss 0.01514862 - time (sec): 1198.09 - samples/sec: 275.73 - lr: 0.000020 - momentum: 0.000000
2023-10-11 10:01:48,255 epoch 9 - iter 2600/2606 - loss 0.01524259 - time (sec): 1329.38 - samples/sec: 275.86 - lr: 0.000018 - momentum: 0.000000
2023-10-11 10:01:51,153 ----------------------------------------------------------------------------------------------------
2023-10-11 10:01:51,154 EPOCH 9 done: loss 0.0152 - lr: 0.000018
2023-10-11 10:02:30,205 DEV : loss 0.4856250286102295 - f1-score (micro avg) 0.3617
2023-10-11 10:02:30,256 ----------------------------------------------------------------------------------------------------
2023-10-11 10:04:42,536 epoch 10 - iter 260/2606 - loss 0.01055031 - time (sec): 132.28 - samples/sec: 275.53 - lr: 0.000016 - momentum: 0.000000
2023-10-11 10:06:53,359 epoch 10 - iter 520/2606 - loss 0.01088598 - time (sec): 263.10 - samples/sec: 273.07 - lr: 0.000014 - momentum: 0.000000
2023-10-11 10:09:05,981 epoch 10 - iter 780/2606 - loss 0.01012233 - time (sec): 395.72 - samples/sec: 273.96 - lr: 0.000013 - momentum: 0.000000
2023-10-11 10:11:17,473 epoch 10 - iter 1040/2606 - loss 0.00939882 - time (sec): 527.21 - samples/sec: 272.16 - lr: 0.000011 - momentum: 0.000000
2023-10-11 10:13:30,498 epoch 10 - iter 1300/2606 - loss 0.00948977 - time (sec): 660.24 - samples/sec: 276.86 - lr: 0.000009 - momentum: 0.000000
2023-10-11 10:15:39,670 epoch 10 - iter 1560/2606 - loss 0.00947705 - time (sec): 789.41 - samples/sec: 276.95 - lr: 0.000007 - momentum: 0.000000
2023-10-11 10:17:49,663 epoch 10 - iter 1820/2606 - loss 0.01016429 - time (sec): 919.40 - samples/sec: 277.18 - lr: 0.000005 - momentum: 0.000000
2023-10-11 10:19:59,020 epoch 10 - iter 2080/2606 - loss 0.01020035 - time (sec): 1048.76 - samples/sec: 276.58 - lr: 0.000004 - momentum: 0.000000
2023-10-11 10:22:11,668 epoch 10 - iter 2340/2606 - loss 0.01061835 - time (sec): 1181.41 - samples/sec: 278.88 - lr: 0.000002 - momentum: 0.000000
2023-10-11 10:24:23,077 epoch 10 - iter 2600/2606 - loss 0.01044635 - time (sec): 1312.82 - samples/sec: 279.05 - lr: 0.000000 - momentum: 0.000000
2023-10-11 10:24:26,217 ----------------------------------------------------------------------------------------------------
2023-10-11 10:24:26,217 EPOCH 10 done: loss 0.0104 - lr: 0.000000
2023-10-11 10:25:04,805 DEV : loss 0.4848763942718506 - f1-score (micro avg) 0.3673
2023-10-11 10:25:05,729 ----------------------------------------------------------------------------------------------------
2023-10-11 10:25:05,732 Loading model from best epoch ...
2023-10-11 10:25:09,728 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-11 10:26:46,831 Results:
- F-score (micro) 0.4614
- F-score (macro) 0.3091
- Accuracy 0.3043

By class:
              precision    recall  f1-score   support

         LOC     0.4851    0.5783    0.5276      1214
         PER     0.4194    0.4765    0.4461       808
         ORG     0.2620    0.2635    0.2627       353
   HumanProd     0.0000    0.0000    0.0000        15

   micro avg     0.4330    0.4937    0.4614      2390
   macro avg     0.2916    0.3295    0.3091      2390
weighted avg     0.4269    0.4937    0.4576      2390

2023-10-11 10:26:46,832 ----------------------------------------------------------------------------------------------------
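As a consistency check on the final test results, the reported micro-avg F-score is the harmonic mean of the micro-avg precision and recall from the per-class table (a quick sketch, not part of the original run):

```python
# F1 is the harmonic mean of precision and recall; plugging in the
# micro-avg row (0.4330 / 0.4937) reproduces the reported 0.4614.
precision, recall = 0.4330, 0.4937
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # -> 0.4614
```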