2023-10-14 02:56:52,966 ----------------------------------------------------------------------------------------------------
2023-10-14 02:56:52,968 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-14 02:56:52,968 ----------------------------------------------------------------------------------------------------
2023-10-14 02:56:52,969 MultiCorpus: 14465 train + 1392 dev + 2432 test sentences
 - NER_HIPE_2022 Corpus: 14465 train + 1392 dev + 2432 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/letemps/fr/with_doc_seperator
2023-10-14 02:56:52,969 ----------------------------------------------------------------------------------------------------
2023-10-14 02:56:52,969 Train:  14465 sentences
2023-10-14 02:56:52,969         (train_with_dev=False, train_with_test=False)
2023-10-14 02:56:52,969 ----------------------------------------------------------------------------------------------------
2023-10-14 02:56:52,969 Training Params:
2023-10-14 02:56:52,969  - learning_rate: "0.00015"
2023-10-14 02:56:52,969  - mini_batch_size: "4"
2023-10-14 02:56:52,969  - max_epochs: "10"
2023-10-14 02:56:52,969  - shuffle: "True"
2023-10-14 02:56:52,969 ----------------------------------------------------------------------------------------------------
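Note: the log records only the resulting configuration, not the training script. As a point of reference, the sketch below shows how a run with these hyperparameters (ByT5 encoder, linear head without CRF or RNN, lr 0.00015, mini-batch 4, 10 epochs) could be set up in a recent Flair release. It is an approximation, not the original hmBench script: the log's `ByT5Embeddings` wrapper is replaced by Flair's standard `TransformerWordEmbeddings`, and the exact `NER_HIPE_2022` constructor arguments and `hidden_size` value are assumptions.

```python
# Hypothetical reconstruction of the recorded setup (not the original hmBench script).
from flair.datasets import NER_HIPE_2022
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Corpus: HIPE-2022 "letemps" (French), as listed in the MultiCorpus line above.
corpus = NER_HIPE_2022(dataset_name="letemps", language="fr")
label_dict = corpus.make_label_dictionary(label_type="ner")

# ByT5 character-level encoder; first-subtoken pooling, last layer only
# (matches "poolingfirst-layers-1" in the model base path below).
embeddings = TransformerWordEmbeddings(
    model="hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax",
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
)

# Plain linear classification head: no RNN, no CRF ("crfFalse" in the base path).
tagger = SequenceTagger(
    hidden_size=256,  # assumed default; unused without an RNN
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)

trainer = ModelTrainer(tagger, corpus)

# fine_tune() applies the linear LR schedule with warmup seen in the
# "Plugins" section below (warmup_fraction 0.1).
trainer.fine_tune(
    "hmbench-letemps/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-3",
    learning_rate=0.00015,
    mini_batch_size=4,
    max_epochs=10,
)
```

The warmup is visible in epoch 1 below: the learning rate climbs from 0.000015 to the peak 0.00015 over roughly the first 10% of steps, then decays linearly towards zero by epoch 10.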
2023-10-14 02:56:52,970 Plugins:
2023-10-14 02:56:52,970  - TensorboardLogger
2023-10-14 02:56:52,970  - LinearScheduler | warmup_fraction: '0.1'
2023-10-14 02:56:52,970 ----------------------------------------------------------------------------------------------------
2023-10-14 02:56:52,970 Final evaluation on model from best epoch (best-model.pt)
2023-10-14 02:56:52,970  - metric: "('micro avg', 'f1-score')"
2023-10-14 02:56:52,970 ----------------------------------------------------------------------------------------------------
2023-10-14 02:56:52,970 Computation:
2023-10-14 02:56:52,970  - compute on device: cuda:0
2023-10-14 02:56:52,970  - embedding storage: none
2023-10-14 02:56:52,970 ----------------------------------------------------------------------------------------------------
2023-10-14 02:56:52,970 Model training base path: "hmbench-letemps/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-3"
2023-10-14 02:56:52,970 ----------------------------------------------------------------------------------------------------
2023-10-14 02:56:52,970 ----------------------------------------------------------------------------------------------------
2023-10-14 02:56:52,971 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-14 02:58:32,206 epoch 1 - iter 361/3617 - loss 2.48047885 - time (sec): 99.23 - samples/sec: 380.07 - lr: 0.000015 - momentum: 0.000000
2023-10-14 03:00:10,564 epoch 1 - iter 722/3617 - loss 2.10182133 - time (sec): 197.59 - samples/sec: 376.99 - lr: 0.000030 - momentum: 0.000000
2023-10-14 03:01:51,336 epoch 1 - iter 1083/3617 - loss 1.65225929 - time (sec): 298.36 - samples/sec: 378.51 - lr: 0.000045 - momentum: 0.000000
2023-10-14 03:03:32,329 epoch 1 - iter 1444/3617 - loss 1.30928949 - time (sec): 399.36 - samples/sec: 378.96 - lr: 0.000060 - momentum: 0.000000
2023-10-14 03:05:11,749 epoch 1 - iter 1805/3617 - loss 1.08702017 - time (sec): 498.78 - samples/sec: 378.85 - lr: 0.000075 - momentum: 0.000000
2023-10-14 03:06:53,731 epoch 1 - iter 2166/3617 - loss 0.93505989 - time (sec): 600.76 - samples/sec: 377.19 - lr: 0.000090 - momentum: 0.000000
2023-10-14 03:08:31,230 epoch 1 - iter 2527/3617 - loss 0.82602626 - time (sec): 698.26 - samples/sec: 376.94 - lr: 0.000105 - momentum: 0.000000
2023-10-14 03:10:07,444 epoch 1 - iter 2888/3617 - loss 0.74069780 - time (sec): 794.47 - samples/sec: 379.05 - lr: 0.000120 - momentum: 0.000000
2023-10-14 03:11:45,292 epoch 1 - iter 3249/3617 - loss 0.66755548 - time (sec): 892.32 - samples/sec: 381.80 - lr: 0.000135 - momentum: 0.000000
2023-10-14 03:13:23,530 epoch 1 - iter 3610/3617 - loss 0.61237021 - time (sec): 990.56 - samples/sec: 382.89 - lr: 0.000150 - momentum: 0.000000
2023-10-14 03:13:25,225 ----------------------------------------------------------------------------------------------------
2023-10-14 03:13:25,225 EPOCH 1 done: loss 0.6115 - lr: 0.000150
2023-10-14 03:14:03,125 DEV : loss 0.12874433398246765 - f1-score (micro avg) 0.6138
2023-10-14 03:14:03,190 saving best model
2023-10-14 03:14:04,104 ----------------------------------------------------------------------------------------------------
2023-10-14 03:15:40,905 epoch 2 - iter 361/3617 - loss 0.09418970 - time (sec): 96.80 - samples/sec: 391.51 - lr: 0.000148 - momentum: 0.000000
2023-10-14 03:17:19,253 epoch 2 - iter 722/3617 - loss 0.09519731 - time (sec): 195.15 - samples/sec: 384.46 - lr: 0.000147 - momentum: 0.000000
2023-10-14 03:19:03,584 epoch 2 - iter 1083/3617 - loss 0.09418116 - time (sec): 299.48 - samples/sec: 388.15 - lr: 0.000145 - momentum: 0.000000
2023-10-14 03:20:44,465 epoch 2 - iter 1444/3617 - loss 0.09280081 - time (sec): 400.36 - samples/sec: 387.09 - lr: 0.000143 - momentum: 0.000000
2023-10-14 03:22:24,435 epoch 2 - iter 1805/3617 - loss 0.09367715 - time (sec): 500.33 - samples/sec: 386.16 - lr: 0.000142 - momentum: 0.000000
2023-10-14 03:24:00,186 epoch 2 - iter 2166/3617 - loss 0.09233796 - time (sec): 596.08 - samples/sec: 385.19 - lr: 0.000140 - momentum: 0.000000
2023-10-14 03:25:37,405 epoch 2 - iter 2527/3617 - loss 0.09099030 - time (sec): 693.30 - samples/sec: 385.19 - lr: 0.000138 - momentum: 0.000000
2023-10-14 03:27:18,849 epoch 2 - iter 2888/3617 - loss 0.09098331 - time (sec): 794.74 - samples/sec: 382.89 - lr: 0.000137 - momentum: 0.000000
2023-10-14 03:28:58,925 epoch 2 - iter 3249/3617 - loss 0.09046497 - time (sec): 894.82 - samples/sec: 382.77 - lr: 0.000135 - momentum: 0.000000
2023-10-14 03:30:35,687 epoch 2 - iter 3610/3617 - loss 0.08998216 - time (sec): 991.58 - samples/sec: 382.45 - lr: 0.000133 - momentum: 0.000000
2023-10-14 03:30:37,445 ----------------------------------------------------------------------------------------------------
2023-10-14 03:30:37,445 EPOCH 2 done: loss 0.0899 - lr: 0.000133
2023-10-14 03:31:21,578 DEV : loss 0.11994253098964691 - f1-score (micro avg) 0.6262
2023-10-14 03:31:21,672 saving best model
2023-10-14 03:31:24,476 ----------------------------------------------------------------------------------------------------
2023-10-14 03:33:17,847 epoch 3 - iter 361/3617 - loss 0.06787226 - time (sec): 113.36 - samples/sec: 330.36 - lr: 0.000132 - momentum: 0.000000
2023-10-14 03:35:08,246 epoch 3 - iter 722/3617 - loss 0.06752508 - time (sec): 223.76 - samples/sec: 344.88 - lr: 0.000130 - momentum: 0.000000
2023-10-14 03:36:56,698 epoch 3 - iter 1083/3617 - loss 0.06518068 - time (sec): 332.22 - samples/sec: 346.12 - lr: 0.000128 - momentum: 0.000000
2023-10-14 03:38:44,830 epoch 3 - iter 1444/3617 - loss 0.06371927 - time (sec): 440.35 - samples/sec: 346.81 - lr: 0.000127 - momentum: 0.000000
2023-10-14 03:40:32,739 epoch 3 - iter 1805/3617 - loss 0.06387658 - time (sec): 548.26 - samples/sec: 350.40 - lr: 0.000125 - momentum: 0.000000
2023-10-14 03:42:18,558 epoch 3 - iter 2166/3617 - loss 0.06346408 - time (sec): 654.08 - samples/sec: 352.37 - lr: 0.000123 - momentum: 0.000000
2023-10-14 03:44:05,888 epoch 3 - iter 2527/3617 - loss 0.06459000 - time (sec): 761.41 - samples/sec: 350.21 - lr: 0.000122 - momentum: 0.000000
2023-10-14 03:45:55,184 epoch 3 - iter 2888/3617 - loss 0.06401459 - time (sec): 870.70 - samples/sec: 348.70 - lr: 0.000120 - momentum: 0.000000
2023-10-14 03:47:39,172 epoch 3 - iter 3249/3617 - loss 0.06394856 - time (sec): 974.69 - samples/sec: 349.27 - lr: 0.000118 - momentum: 0.000000
2023-10-14 03:49:25,723 epoch 3 - iter 3610/3617 - loss 0.06409756 - time (sec): 1081.24 - samples/sec: 350.60 - lr: 0.000117 - momentum: 0.000000
2023-10-14 03:49:27,697 ----------------------------------------------------------------------------------------------------
2023-10-14 03:49:27,697 EPOCH 3 done: loss 0.0640 - lr: 0.000117
2023-10-14 03:50:10,253 DEV : loss 0.1677953451871872 - f1-score (micro avg) 0.6461
2023-10-14 03:50:10,312 saving best model
2023-10-14 03:50:13,061 ----------------------------------------------------------------------------------------------------
2023-10-14 03:52:00,514 epoch 4 - iter 361/3617 - loss 0.04549276 - time (sec): 107.45 - samples/sec: 340.97 - lr: 0.000115 - momentum: 0.000000
2023-10-14 03:53:47,270 epoch 4 - iter 722/3617 - loss 0.04285214 - time (sec): 214.20 - samples/sec: 351.22 - lr: 0.000113 - momentum: 0.000000
2023-10-14 03:55:39,687 epoch 4 - iter 1083/3617 - loss 0.04301779 - time (sec): 326.62 - samples/sec: 345.40 - lr: 0.000112 - momentum: 0.000000
2023-10-14 03:57:21,761 epoch 4 - iter 1444/3617 - loss 0.04227254 - time (sec): 428.70 - samples/sec: 349.22 - lr: 0.000110 - momentum: 0.000000
2023-10-14 03:59:08,557 epoch 4 - iter 1805/3617 - loss 0.04271265 - time (sec): 535.49 - samples/sec: 350.63 - lr: 0.000108 - momentum: 0.000000
2023-10-14 04:00:56,545 epoch 4 - iter 2166/3617 - loss 0.04401912 - time (sec): 643.48 - samples/sec: 353.09 - lr: 0.000107 - momentum: 0.000000
2023-10-14 04:02:49,728 epoch 4 - iter 2527/3617 - loss 0.04552488 - time (sec): 756.66 - samples/sec: 351.47 - lr: 0.000105 - momentum: 0.000000
2023-10-14 04:04:38,740 epoch 4 - iter 2888/3617 - loss 0.04620049 - time (sec): 865.67 - samples/sec: 350.11 - lr: 0.000103 - momentum: 0.000000
2023-10-14 04:06:26,364 epoch 4 - iter 3249/3617 - loss 0.04661299 - time (sec): 973.30 - samples/sec: 351.46 - lr: 0.000102 - momentum: 0.000000
2023-10-14 04:08:08,784 epoch 4 - iter 3610/3617 - loss 0.04677504 - time (sec): 1075.72 - samples/sec: 352.63 - lr: 0.000100 - momentum: 0.000000
2023-10-14 04:08:10,581 ----------------------------------------------------------------------------------------------------
2023-10-14 04:08:10,581 EPOCH 4 done: loss 0.0467 - lr: 0.000100
2023-10-14 04:08:52,500 DEV : loss 0.2164839208126068 - f1-score (micro avg) 0.6366
2023-10-14 04:08:52,567 ----------------------------------------------------------------------------------------------------
2023-10-14 04:10:41,619 epoch 5 - iter 361/3617 - loss 0.02826111 - time (sec): 109.05 - samples/sec: 352.78 - lr: 0.000098 - momentum: 0.000000
2023-10-14 04:12:31,812 epoch 5 - iter 722/3617 - loss 0.02975887 - time (sec): 219.24 - samples/sec: 348.59 - lr: 0.000097 - momentum: 0.000000
2023-10-14 04:14:13,673 epoch 5 - iter 1083/3617 - loss 0.03117931 - time (sec): 321.10 - samples/sec: 353.89 - lr: 0.000095 - momentum: 0.000000
2023-10-14 04:15:59,664 epoch 5 - iter 1444/3617 - loss 0.03114051 - time (sec): 427.09 - samples/sec: 351.78 - lr: 0.000093 - momentum: 0.000000
2023-10-14 04:17:51,747 epoch 5 - iter 1805/3617 - loss 0.03157137 - time (sec): 539.18 - samples/sec: 350.38 - lr: 0.000092 - momentum: 0.000000
2023-10-14 04:19:39,699 epoch 5 - iter 2166/3617 - loss 0.03117937 - time (sec): 647.13 - samples/sec: 349.96 - lr: 0.000090 - momentum: 0.000000
2023-10-14 04:21:21,870 epoch 5 - iter 2527/3617 - loss 0.03194759 - time (sec): 749.30 - samples/sec: 350.89 - lr: 0.000088 - momentum: 0.000000
2023-10-14 04:23:09,331 epoch 5 - iter 2888/3617 - loss 0.03227297 - time (sec): 856.76 - samples/sec: 351.11 - lr: 0.000087 - momentum: 0.000000
2023-10-14 04:24:50,326 epoch 5 - iter 3249/3617 - loss 0.03237174 - time (sec): 957.76 - samples/sec: 355.14 - lr: 0.000085 - momentum: 0.000000
2023-10-14 04:26:33,987 epoch 5 - iter 3610/3617 - loss 0.03258639 - time (sec): 1061.42 - samples/sec: 357.40 - lr: 0.000083 - momentum: 0.000000
2023-10-14 04:26:35,865 ----------------------------------------------------------------------------------------------------
2023-10-14 04:26:35,865 EPOCH 5 done: loss 0.0326 - lr: 0.000083
2023-10-14 04:27:17,687 DEV : loss 0.23494853079319 - f1-score (micro avg) 0.641
2023-10-14 04:27:17,752 ----------------------------------------------------------------------------------------------------
2023-10-14 04:29:14,894 epoch 6 - iter 361/3617 - loss 0.01939447 - time (sec): 117.14 - samples/sec: 335.01 - lr: 0.000082 - momentum: 0.000000
2023-10-14 04:30:58,369 epoch 6 - iter 722/3617 - loss 0.01924277 - time (sec): 220.61 - samples/sec: 349.27 - lr: 0.000080 - momentum: 0.000000
2023-10-14 04:32:36,989 epoch 6 - iter 1083/3617 - loss 0.02053705 - time (sec): 319.23 - samples/sec: 358.43 - lr: 0.000078 - momentum: 0.000000
2023-10-14 04:34:18,846 epoch 6 - iter 1444/3617 - loss 0.02164758 - time (sec): 421.09 - samples/sec: 358.61 - lr: 0.000077 - momentum: 0.000000
2023-10-14 04:36:07,097 epoch 6 - iter 1805/3617 - loss 0.02255132 - time (sec): 529.34 - samples/sec: 354.80 - lr: 0.000075 - momentum: 0.000000
2023-10-14 04:37:50,944 epoch 6 - iter 2166/3617 - loss 0.02251730 - time (sec): 633.19 - samples/sec: 355.87 - lr: 0.000073 - momentum: 0.000000
2023-10-14 04:39:38,861 epoch 6 - iter 2527/3617 - loss 0.02245883 - time (sec): 741.11 - samples/sec: 357.56 - lr: 0.000072 - momentum: 0.000000
2023-10-14 04:41:21,087 epoch 6 - iter 2888/3617 - loss 0.02197406 - time (sec): 843.33 - samples/sec: 360.49 - lr: 0.000070 - momentum: 0.000000
2023-10-14 04:43:02,150 epoch 6 - iter 3249/3617 - loss 0.02290527 - time (sec): 944.40 - samples/sec: 360.40 - lr: 0.000068 - momentum: 0.000000
2023-10-14 04:44:46,979 epoch 6 - iter 3610/3617 - loss 0.02272655 - time (sec): 1049.22 - samples/sec: 361.29 - lr: 0.000067 - momentum: 0.000000
2023-10-14 04:44:48,993 ----------------------------------------------------------------------------------------------------
2023-10-14 04:44:48,994 EPOCH 6 done: loss 0.0227 - lr: 0.000067
2023-10-14 04:45:30,501 DEV : loss 0.28848496079444885 - f1-score (micro avg) 0.6514
2023-10-14 04:45:30,570 saving best model
2023-10-14 04:45:35,631 ----------------------------------------------------------------------------------------------------
2023-10-14 04:47:22,333 epoch 7 - iter 361/3617 - loss 0.01165935 - time (sec): 106.69 - samples/sec: 359.92 - lr: 0.000065 - momentum: 0.000000
2023-10-14 04:49:03,875 epoch 7 - iter 722/3617 - loss 0.01137913 - time (sec): 208.23 - samples/sec: 365.84 - lr: 0.000063 - momentum: 0.000000
2023-10-14 04:50:48,226 epoch 7 - iter 1083/3617 - loss 0.01305213 - time (sec): 312.58 - samples/sec: 363.16 - lr: 0.000062 - momentum: 0.000000
2023-10-14 04:52:34,504 epoch 7 - iter 1444/3617 - loss 0.01287060 - time (sec): 418.86 - samples/sec: 365.53 - lr: 0.000060 - momentum: 0.000000
2023-10-14 04:54:21,015 epoch 7 - iter 1805/3617 - loss 0.01312161 - time (sec): 525.37 - samples/sec: 362.68 - lr: 0.000058 - momentum: 0.000000
2023-10-14 04:56:05,373 epoch 7 - iter 2166/3617 - loss 0.01362263 - time (sec): 629.73 - samples/sec: 362.43 - lr: 0.000057 - momentum: 0.000000
2023-10-14 04:57:53,808 epoch 7 - iter 2527/3617 - loss 0.01454342 - time (sec): 738.16 - samples/sec: 361.92 - lr: 0.000055 - momentum: 0.000000
2023-10-14 04:59:36,320 epoch 7 - iter 2888/3617 - loss 0.01455589 - time (sec): 840.68 - samples/sec: 362.15 - lr: 0.000053 - momentum: 0.000000
2023-10-14 05:01:16,947 epoch 7 - iter 3249/3617 - loss 0.01486078 - time (sec): 941.30 - samples/sec: 363.83 - lr: 0.000052 - momentum: 0.000000
2023-10-14 05:03:01,921 epoch 7 - iter 3610/3617 - loss 0.01483858 - time (sec): 1046.28 - samples/sec: 362.57 - lr: 0.000050 - momentum: 0.000000
2023-10-14 05:03:03,721 ----------------------------------------------------------------------------------------------------
2023-10-14 05:03:03,721 EPOCH 7 done: loss 0.0148 - lr: 0.000050
2023-10-14 05:03:46,800 DEV : loss 0.3004520535469055 - f1-score (micro avg) 0.6474
2023-10-14 05:03:46,866 ----------------------------------------------------------------------------------------------------
2023-10-14 05:05:31,450 epoch 8 - iter 361/3617 - loss 0.00618969 - time (sec): 104.58 - samples/sec: 355.17 - lr: 0.000048 - momentum: 0.000000
2023-10-14 05:07:14,817 epoch 8 - iter 722/3617 - loss 0.00964368 - time (sec): 207.95 - samples/sec: 361.93 - lr: 0.000047 - momentum: 0.000000
2023-10-14 05:09:01,458 epoch 8 - iter 1083/3617 - loss 0.01105306 - time (sec): 314.59 - samples/sec: 364.87 - lr: 0.000045 - momentum: 0.000000
2023-10-14 05:10:45,447 epoch 8 - iter 1444/3617 - loss 0.01059372 - time (sec): 418.58 - samples/sec: 364.30 - lr: 0.000043 - momentum: 0.000000
2023-10-14 05:12:30,524 epoch 8 - iter 1805/3617 - loss 0.01012078 - time (sec): 523.66 - samples/sec: 365.47 - lr: 0.000042 - momentum: 0.000000
2023-10-14 05:14:14,804 epoch 8 - iter 2166/3617 - loss 0.00963085 - time (sec): 627.94 - samples/sec: 364.14 - lr: 0.000040 - momentum: 0.000000
2023-10-14 05:16:02,304 epoch 8 - iter 2527/3617 - loss 0.00979590 - time (sec): 735.44 - samples/sec: 362.59 - lr: 0.000038 - momentum: 0.000000
2023-10-14 05:17:45,665 epoch 8 - iter 2888/3617 - loss 0.00961432 - time (sec): 838.80 - samples/sec: 362.84 - lr: 0.000037 - momentum: 0.000000
2023-10-14 05:19:28,394 epoch 8 - iter 3249/3617 - loss 0.00997350 - time (sec): 941.53 - samples/sec: 362.99 - lr: 0.000035 - momentum: 0.000000
2023-10-14 05:21:12,120 epoch 8 - iter 3610/3617 - loss 0.00969534 - time (sec): 1045.25 - samples/sec: 363.06 - lr: 0.000033 - momentum: 0.000000
2023-10-14 05:21:13,905 ----------------------------------------------------------------------------------------------------
2023-10-14 05:21:13,905 EPOCH 8 done: loss 0.0097 - lr: 0.000033
2023-10-14 05:21:55,993 DEV : loss 0.33705711364746094 - f1-score (micro avg) 0.6492
2023-10-14 05:21:56,061 ----------------------------------------------------------------------------------------------------
2023-10-14 05:23:41,737 epoch 9 - iter 361/3617 - loss 0.00768336 - time (sec): 105.67 - samples/sec: 366.27 - lr: 0.000032 - momentum: 0.000000
2023-10-14 05:25:32,318 epoch 9 - iter 722/3617 - loss 0.00809368 - time (sec): 216.25 - samples/sec: 361.32 - lr: 0.000030 - momentum: 0.000000
2023-10-14 05:27:20,923 epoch 9 - iter 1083/3617 - loss 0.00801755 - time (sec): 324.86 - samples/sec: 355.90 - lr: 0.000028 - momentum: 0.000000
2023-10-14 05:29:02,262 epoch 9 - iter 1444/3617 - loss 0.00794549 - time (sec): 426.20 - samples/sec: 361.14 - lr: 0.000027 - momentum: 0.000000
2023-10-14 05:30:41,183 epoch 9 - iter 1805/3617 - loss 0.00771750 - time (sec): 525.12 - samples/sec: 364.12 - lr: 0.000025 - momentum: 0.000000
2023-10-14 05:32:20,956 epoch 9 - iter 2166/3617 - loss 0.00754007 - time (sec): 624.89 - samples/sec: 367.42 - lr: 0.000023 - momentum: 0.000000
2023-10-14 05:34:00,453 epoch 9 - iter 2527/3617 - loss 0.00713992 - time (sec): 724.39 - samples/sec: 369.15 - lr: 0.000022 - momentum: 0.000000
2023-10-14 05:35:41,205 epoch 9 - iter 2888/3617 - loss 0.00704793 - time (sec): 825.14 - samples/sec: 368.97 - lr: 0.000020 - momentum: 0.000000
2023-10-14 05:37:23,105 epoch 9 - iter 3249/3617 - loss 0.00699019 - time (sec): 927.04 - samples/sec: 367.49 - lr: 0.000018 - momentum: 0.000000
2023-10-14 05:39:03,150 epoch 9 - iter 3610/3617 - loss 0.00658969 - time (sec): 1027.09 - samples/sec: 369.22 - lr: 0.000017 - momentum: 0.000000
2023-10-14 05:39:05,141 ----------------------------------------------------------------------------------------------------
2023-10-14 05:39:05,142 EPOCH 9 done: loss 0.0066 - lr: 0.000017
2023-10-14 05:39:46,293 DEV : loss 0.3554496467113495 - f1-score (micro avg) 0.65
2023-10-14 05:39:46,351 ----------------------------------------------------------------------------------------------------
2023-10-14 05:41:28,259 epoch 10 - iter 361/3617 - loss 0.00317370 - time (sec): 101.91 - samples/sec: 370.33 - lr: 0.000015 - momentum: 0.000000
2023-10-14 05:43:12,503 epoch 10 - iter 722/3617 - loss 0.00395930 - time (sec): 206.15 - samples/sec: 358.82 - lr: 0.000013 - momentum: 0.000000
2023-10-14 05:44:56,641 epoch 10 - iter 1083/3617 - loss 0.00484872 - time (sec): 310.29 - samples/sec: 362.49 - lr: 0.000012 - momentum: 0.000000
2023-10-14 05:46:40,943 epoch 10 - iter 1444/3617 - loss 0.00441603 - time (sec): 414.59 - samples/sec: 360.25 - lr: 0.000010 - momentum: 0.000000
2023-10-14 05:48:22,688 epoch 10 - iter 1805/3617 - loss 0.00440111 - time (sec): 516.33 - samples/sec: 364.19 - lr: 0.000008 - momentum: 0.000000
2023-10-14 05:50:03,498 epoch 10 - iter 2166/3617 - loss 0.00463574 - time (sec): 617.14 - samples/sec: 365.23 - lr: 0.000007 - momentum: 0.000000
2023-10-14 05:51:45,733 epoch 10 - iter 2527/3617 - loss 0.00475224 - time (sec): 719.38 - samples/sec: 368.15 - lr: 0.000005 - momentum: 0.000000
2023-10-14 05:53:30,970 epoch 10 - iter 2888/3617 - loss 0.00477992 - time (sec): 824.62 - samples/sec: 366.33 - lr: 0.000003 - momentum: 0.000000
2023-10-14 05:55:21,600 epoch 10 - iter 3249/3617 - loss 0.00469963 - time (sec): 935.25 - samples/sec: 364.85 - lr: 0.000002 - momentum: 0.000000
2023-10-14 05:57:08,911 epoch 10 - iter 3610/3617 - loss 0.00449982 - time (sec): 1042.56 - samples/sec: 363.50 - lr: 0.000000 - momentum: 0.000000
2023-10-14 05:57:11,042 ----------------------------------------------------------------------------------------------------
2023-10-14 05:57:11,043 EPOCH 10 done: loss 0.0045 - lr: 0.000000
2023-10-14 05:57:55,357 DEV : loss 0.3686419725418091 - f1-score (micro avg) 0.654
2023-10-14 05:57:55,427 saving best model
2023-10-14 05:58:03,839 ----------------------------------------------------------------------------------------------------
2023-10-14 05:58:03,841 Loading model from best epoch ...
2023-10-14 05:58:08,061 SequenceTagger predicts: Dictionary with 13 tags: O, S-loc, B-loc, E-loc, I-loc, S-pers, B-pers, E-pers, I-pers, S-org, B-org, E-org, I-org
2023-10-14 05:59:10,367 Results:
- F-score (micro) 0.6565
- F-score (macro) 0.5195
- Accuracy 0.5017

By class:
              precision    recall  f1-score   support

         loc     0.6609    0.7750    0.7134       591
        pers     0.5807    0.7255    0.6451       357
         org     0.2295    0.1772    0.2000        79

   micro avg     0.6092    0.7118    0.6565      1027
   macro avg     0.4904    0.5592    0.5195      1027
weighted avg     0.5998    0.7118    0.6502      1027

2023-10-14 05:59:10,367 ----------------------------------------------------------------------------------------------------
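Note: the best checkpoint is stored as best-model.pt under the training base path shown above. The snippet below is a usage sketch, not part of the log, showing how such a checkpoint can be loaded and applied with the standard Flair API; the example sentence is invented for illustration.

```python
# Hypothetical usage sketch for the saved checkpoint (path taken from the log above).
from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load(
    "hmbench-letemps/fr-hmbyt5-preliminary/"
    "byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-3/"
    "best-model.pt"
)

# Example sentence (illustrative only).
sentence = Sentence("Le Temps est un journal publié à Genève.")
tagger.predict(sentence)

# Entity spans are decoded from the BIOES tag set listed above
# (S-/B-/E-/I- prefixes over loc, pers, org).
for span in sentence.get_spans("ner"):
    print(span.text, span.tag, span.score)
```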