2023-10-08 20:37:47,174 ----------------------------------------------------------------------------------------------------
2023-10-08 20:37:47,175 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): T5LayerNorm()
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=25, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-08 20:37:47,175 ----------------------------------------------------------------------------------------------------
2023-10-08 20:37:47,175 MultiCorpus: 966 train + 219 dev + 204 test sentences
 - NER_HIPE_2022 Corpus: 966 train + 219 dev + 204 test sentences - /app/.flair/datasets/ner_hipe_2022/v2.1/ajmc/fr/with_doc_seperator
2023-10-08 20:37:47,175 ----------------------------------------------------------------------------------------------------
2023-10-08 20:37:47,175 Train:  966 sentences
2023-10-08 20:37:47,175 (train_with_dev=False, train_with_test=False)
2023-10-08 20:37:47,175 ----------------------------------------------------------------------------------------------------
2023-10-08 20:37:47,175 Training Params:
2023-10-08 20:37:47,175  - learning_rate: "0.00015"
2023-10-08 20:37:47,175  - mini_batch_size: "4"
2023-10-08 20:37:47,176  - max_epochs: "10"
2023-10-08 20:37:47,176  - shuffle: "True"
2023-10-08 20:37:47,176 ----------------------------------------------------------------------------------------------------
2023-10-08 20:37:47,176 Plugins:
2023-10-08 20:37:47,176  - TensorboardLogger
2023-10-08 20:37:47,176  - LinearScheduler | warmup_fraction: '0.1'
2023-10-08 20:37:47,176 ----------------------------------------------------------------------------------------------------
2023-10-08 20:37:47,176 Final evaluation on model from best epoch (best-model.pt)
2023-10-08 20:37:47,176  - metric: "('micro avg', 'f1-score')"
2023-10-08 20:37:47,176 ----------------------------------------------------------------------------------------------------
2023-10-08 20:37:47,176 Computation:
2023-10-08 20:37:47,176  - compute on device: cuda:0
2023-10-08 20:37:47,176  - embedding storage: none
2023-10-08 20:37:47,176
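The LinearScheduler plugin above warms the learning rate up linearly over the first 10% of optimizer steps (warmup_fraction '0.1') and then decays it linearly to zero. With 966 training sentences and mini_batch_size 4, each epoch runs ceil(966/4) = 242 iterations, so 10 epochs give 2420 total steps and 242 warmup steps. A minimal sketch of that schedule, consistent with the lr column in the epoch logs below (e.g. lr ≈ 0.000015 at iteration 24, peak 0.00015 after warmup, 0.0 at the final step); the helper name `linear_warmup_lr` is ours, not Flair's:

```python
import math

def linear_warmup_lr(step: int, peak_lr: float, total_steps: int,
                     warmup_fraction: float = 0.1) -> float:
    """Linear warmup to peak_lr, then linear decay to zero.

    A sketch of the behaviour suggested by the logged lr values,
    not Flair's actual scheduler code.
    """
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

steps_per_epoch = math.ceil(966 / 4)   # 242 iterations per epoch, as logged
total_steps = steps_per_epoch * 10     # 2420 steps over 10 epochs

print(linear_warmup_lr(242, 0.00015, total_steps))         # peak lr after warmup
print(linear_warmup_lr(total_steps, 0.00015, total_steps)) # decayed to 0.0
```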
----------------------------------------------------------------------------------------------------
2023-10-08 20:37:47,176 Model training base path: "hmbench-ajmc/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-3"
2023-10-08 20:37:47,176 ----------------------------------------------------------------------------------------------------
2023-10-08 20:37:47,176 ----------------------------------------------------------------------------------------------------
2023-10-08 20:37:47,176 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-08 20:37:56,399 epoch 1 - iter 24/242 - loss 3.21267637 - time (sec): 9.22 - samples/sec: 240.63 - lr: 0.000014 - momentum: 0.000000
2023-10-08 20:38:06,542 epoch 1 - iter 48/242 - loss 3.20307232 - time (sec): 19.36 - samples/sec: 245.92 - lr: 0.000029 - momentum: 0.000000
2023-10-08 20:38:17,230 epoch 1 - iter 72/242 - loss 3.18434051 - time (sec): 30.05 - samples/sec: 244.17 - lr: 0.000044 - momentum: 0.000000
2023-10-08 20:38:26,812 epoch 1 - iter 96/242 - loss 3.15281690 - time (sec): 39.63 - samples/sec: 241.99 - lr: 0.000059 - momentum: 0.000000
2023-10-08 20:38:36,753 epoch 1 - iter 120/242 - loss 3.08052411 - time (sec): 49.58 - samples/sec: 243.67 - lr: 0.000074 - momentum: 0.000000
2023-10-08 20:38:47,297 epoch 1 - iter 144/242 - loss 2.97207701 - time (sec): 60.12 - samples/sec: 245.76 - lr: 0.000089 - momentum: 0.000000
2023-10-08 20:38:57,373 epoch 1 - iter 168/242 - loss 2.87257333 - time (sec): 70.20 - samples/sec: 244.74 - lr: 0.000104 - momentum: 0.000000
2023-10-08 20:39:07,642 epoch 1 - iter 192/242 - loss 2.75990826 - time (sec): 80.47 - samples/sec: 244.40 - lr: 0.000118 - momentum: 0.000000
2023-10-08 20:39:17,902 epoch 1 - iter 216/242 - loss 2.63611288 - time (sec): 90.73 - samples/sec: 244.76 - lr: 0.000133 - momentum: 0.000000
2023-10-08 20:39:27,924 epoch 1 - iter 240/242 - loss 2.51731291 - time (sec): 100.75 - samples/sec: 243.38 - lr: 0.000148 - momentum: 0.000000
2023-10-08 20:39:28,831 ----------------------------------------------------------------------------------------------------
2023-10-08 20:39:28,831 EPOCH 1 done: loss 2.5055 - lr: 0.000148
2023-10-08 20:39:35,225 DEV : loss 1.1602489948272705 - f1-score (micro avg)  0.0
2023-10-08 20:39:35,231 ----------------------------------------------------------------------------------------------------
2023-10-08 20:39:45,425 epoch 2 - iter 24/242 - loss 1.12779841 - time (sec): 10.19 - samples/sec: 253.99 - lr: 0.000148 - momentum: 0.000000
2023-10-08 20:39:55,982 epoch 2 - iter 48/242 - loss 0.99394518 - time (sec): 20.75 - samples/sec: 255.24 - lr: 0.000147 - momentum: 0.000000
2023-10-08 20:40:05,573 epoch 2 - iter 72/242 - loss 0.90000041 - time (sec): 30.34 - samples/sec: 251.24 - lr: 0.000145 - momentum: 0.000000
2023-10-08 20:40:15,598 epoch 2 - iter 96/242 - loss 0.83514172 - time (sec): 40.37 - samples/sec: 247.46 - lr: 0.000143 - momentum: 0.000000
2023-10-08 20:40:25,634 epoch 2 - iter 120/242 - loss 0.78613642 - time (sec): 50.40 - samples/sec: 246.26 - lr: 0.000142 - momentum: 0.000000
2023-10-08 20:40:36,094 epoch 2 - iter 144/242 - loss 0.75456433 - time (sec): 60.86 - samples/sec: 244.32 - lr: 0.000140 - momentum: 0.000000
2023-10-08 20:40:46,424 epoch 2 - iter 168/242 - loss 0.73651448 - time (sec): 71.19 - samples/sec: 244.83 - lr: 0.000139 - momentum: 0.000000
2023-10-08 20:40:56,039 epoch 2 - iter 192/242 - loss 0.71263451 - time (sec): 80.81 - samples/sec: 243.79 - lr: 0.000137 - momentum: 0.000000
2023-10-08 20:41:05,910 epoch 2 - iter 216/242 - loss 0.67454872 - time (sec): 90.68 - samples/sec: 243.15 - lr: 0.000135 - momentum: 0.000000
2023-10-08 20:41:15,908 epoch 2 - iter 240/242 - loss 0.64339024 - time (sec): 100.68 - samples/sec: 243.62 - lr: 0.000134 - momentum: 0.000000
2023-10-08 20:41:16,645
----------------------------------------------------------------------------------------------------
2023-10-08 20:41:16,646 EPOCH 2 done: loss 0.6402 - lr: 0.000134
2023-10-08 20:41:22,973 DEV : loss 0.39065688848495483 - f1-score (micro avg)  0.0
2023-10-08 20:41:22,978 ----------------------------------------------------------------------------------------------------
2023-10-08 20:41:32,210 epoch 3 - iter 24/242 - loss 0.35800452 - time (sec): 9.23 - samples/sec: 246.91 - lr: 0.000132 - momentum: 0.000000
2023-10-08 20:41:41,958 epoch 3 - iter 48/242 - loss 0.34921218 - time (sec): 18.98 - samples/sec: 256.93 - lr: 0.000130 - momentum: 0.000000
2023-10-08 20:41:51,479 epoch 3 - iter 72/242 - loss 0.33892346 - time (sec): 28.50 - samples/sec: 256.60 - lr: 0.000128 - momentum: 0.000000
2023-10-08 20:42:00,975 epoch 3 - iter 96/242 - loss 0.33598328 - time (sec): 38.00 - samples/sec: 257.58 - lr: 0.000127 - momentum: 0.000000
2023-10-08 20:42:10,493 epoch 3 - iter 120/242 - loss 0.33623651 - time (sec): 47.51 - samples/sec: 260.10 - lr: 0.000125 - momentum: 0.000000
2023-10-08 20:42:19,463 epoch 3 - iter 144/242 - loss 0.32222127 - time (sec): 56.48 - samples/sec: 258.67 - lr: 0.000124 - momentum: 0.000000
2023-10-08 20:42:28,821 epoch 3 - iter 168/242 - loss 0.31486026 - time (sec): 65.84 - samples/sec: 259.35 - lr: 0.000122 - momentum: 0.000000
2023-10-08 20:42:38,426 epoch 3 - iter 192/242 - loss 0.30625824 - time (sec): 75.45 - samples/sec: 259.02 - lr: 0.000120 - momentum: 0.000000
2023-10-08 20:42:48,167 epoch 3 - iter 216/242 - loss 0.29409824 - time (sec): 85.19 - samples/sec: 261.14 - lr: 0.000119 - momentum: 0.000000
2023-10-08 20:42:57,432 epoch 3 - iter 240/242 - loss 0.28421914 - time (sec): 94.45 - samples/sec: 260.05 - lr: 0.000117 - momentum: 0.000000
2023-10-08 20:42:58,058 ----------------------------------------------------------------------------------------------------
2023-10-08 20:42:58,058 EPOCH 3 done: loss 0.2851 - lr: 0.000117
2023-10-08 20:43:03,885 DEV : loss 0.2419191300868988 - f1-score (micro avg)  0.5218
2023-10-08 20:43:03,891 saving best model
2023-10-08 20:43:04,744 ----------------------------------------------------------------------------------------------------
2023-10-08 20:43:13,856 epoch 4 - iter 24/242 - loss 0.18664465 - time (sec): 9.11 - samples/sec: 246.41 - lr: 0.000115 - momentum: 0.000000
2023-10-08 20:43:23,295 epoch 4 - iter 48/242 - loss 0.19053585 - time (sec): 18.55 - samples/sec: 254.29 - lr: 0.000113 - momentum: 0.000000
2023-10-08 20:43:31,995 epoch 4 - iter 72/242 - loss 0.19570698 - time (sec): 27.25 - samples/sec: 250.82 - lr: 0.000112 - momentum: 0.000000
2023-10-08 20:43:41,100 epoch 4 - iter 96/242 - loss 0.19846108 - time (sec): 36.35 - samples/sec: 254.05 - lr: 0.000110 - momentum: 0.000000
2023-10-08 20:43:50,906 epoch 4 - iter 120/242 - loss 0.18589080 - time (sec): 46.16 - samples/sec: 258.06 - lr: 0.000109 - momentum: 0.000000
2023-10-08 20:44:00,820 epoch 4 - iter 144/242 - loss 0.18736843 - time (sec): 56.07 - samples/sec: 259.44 - lr: 0.000107 - momentum: 0.000000
2023-10-08 20:44:09,953 epoch 4 - iter 168/242 - loss 0.18873113 - time (sec): 65.21 - samples/sec: 259.39 - lr: 0.000105 - momentum: 0.000000
2023-10-08 20:44:20,175 epoch 4 - iter 192/242 - loss 0.18433421 - time (sec): 75.43 - samples/sec: 261.04 - lr: 0.000104 - momentum: 0.000000
2023-10-08 20:44:29,441 epoch 4 - iter 216/242 - loss 0.18029319 - time (sec): 84.70 - samples/sec: 261.06 - lr: 0.000102 - momentum: 0.000000
2023-10-08 20:44:39,124 epoch 4 - iter 240/242 - loss 0.17659345 - time (sec): 94.38 - samples/sec: 260.60 - lr: 0.000100 - momentum: 0.000000
2023-10-08 20:44:39,732 ----------------------------------------------------------------------------------------------------
2023-10-08 20:44:39,733 EPOCH 4 done: loss 0.1767 - lr: 0.000100
2023-10-08 20:44:45,625 DEV : loss 0.17208659648895264 - f1-score (micro avg)  0.8213
2023-10-08 20:44:45,631 saving best model
2023-10-08 20:44:50,049 ----------------------------------------------------------------------------------------------------
2023-10-08 20:44:59,153 epoch 5 - iter 24/242 - loss 0.14219111 - time (sec): 9.10 - samples/sec: 250.58 - lr: 0.000098 - momentum: 0.000000
2023-10-08 20:45:09,188 epoch 5 - iter 48/242 - loss 0.13338823 - time (sec): 19.14 - samples/sec: 252.79 - lr: 0.000097 - momentum: 0.000000
2023-10-08 20:45:19,035 epoch 5 - iter 72/242 - loss 0.13785786 - time (sec): 28.98 - samples/sec: 259.62 - lr: 0.000095 - momentum: 0.000000
2023-10-08 20:45:28,363 epoch 5 - iter 96/242 - loss 0.13654911 - time (sec): 38.31 - samples/sec: 258.84 - lr: 0.000094 - momentum: 0.000000
2023-10-08 20:45:38,326 epoch 5 - iter 120/242 - loss 0.13694237 - time (sec): 48.28 - samples/sec: 258.18 - lr: 0.000092 - momentum: 0.000000
2023-10-08 20:45:48,243 epoch 5 - iter 144/242 - loss 0.12956289 - time (sec): 58.19 - samples/sec: 257.23 - lr: 0.000090 - momentum: 0.000000
2023-10-08 20:45:57,802 epoch 5 - iter 168/242 - loss 0.12786883 - time (sec): 67.75 - samples/sec: 255.47 - lr: 0.000089 - momentum: 0.000000
2023-10-08 20:46:07,983 epoch 5 - iter 192/242 - loss 0.12387386 - time (sec): 77.93 - samples/sec: 257.36 - lr: 0.000087 - momentum: 0.000000
2023-10-08 20:46:17,291 epoch 5 - iter 216/242 - loss 0.12074936 - time (sec): 87.24 - samples/sec: 255.43 - lr: 0.000085 - momentum: 0.000000
2023-10-08 20:46:26,564 epoch 5 - iter 240/242 - loss 0.11875479 - time (sec): 96.51 - samples/sec: 254.43 - lr: 0.000084 - momentum: 0.000000
2023-10-08 20:46:27,248 ----------------------------------------------------------------------------------------------------
2023-10-08 20:46:27,248 EPOCH 5 done: loss 0.1198 - lr: 0.000084
2023-10-08 20:46:33,549 DEV : loss 0.13837525248527527 - f1-score (micro avg)  0.8232
2023-10-08 20:46:33,555 saving best model
2023-10-08 20:46:37,900 ----------------------------------------------------------------------------------------------------
2023-10-08 20:46:48,282 epoch 6 - iter 24/242 - loss 0.09765854 - time (sec): 10.38 - samples/sec: 254.03 - lr: 0.000082 - momentum: 0.000000
2023-10-08 20:46:58,297 epoch 6 - iter 48/242 - loss 0.09653803 - time (sec): 20.40 - samples/sec: 250.98 - lr: 0.000080 - momentum: 0.000000
2023-10-08 20:47:07,880 epoch 6 - iter 72/242 - loss 0.09483115 - time (sec): 29.98 - samples/sec: 248.21 - lr: 0.000079 - momentum: 0.000000
2023-10-08 20:47:17,448 epoch 6 - iter 96/242 - loss 0.09842891 - time (sec): 39.55 - samples/sec: 242.50 - lr: 0.000077 - momentum: 0.000000
2023-10-08 20:47:27,117 epoch 6 - iter 120/242 - loss 0.09427364 - time (sec): 49.22 - samples/sec: 241.20 - lr: 0.000075 - momentum: 0.000000
2023-10-08 20:47:37,162 epoch 6 - iter 144/242 - loss 0.08993055 - time (sec): 59.26 - samples/sec: 241.92 - lr: 0.000074 - momentum: 0.000000
2023-10-08 20:47:47,380 epoch 6 - iter 168/242 - loss 0.08947974 - time (sec): 69.48 - samples/sec: 243.23 - lr: 0.000072 - momentum: 0.000000
2023-10-08 20:47:57,987 epoch 6 - iter 192/242 - loss 0.08637726 - time (sec): 80.09 - samples/sec: 244.33 - lr: 0.000070 - momentum: 0.000000
2023-10-08 20:48:08,148 epoch 6 - iter 216/242 - loss 0.08585719 - time (sec): 90.25 - samples/sec: 244.54 - lr: 0.000069 - momentum: 0.000000
2023-10-08 20:48:18,514 epoch 6 - iter 240/242 - loss 0.08554439 - time (sec): 100.61 - samples/sec: 244.24 - lr: 0.000067 - momentum: 0.000000
2023-10-08 20:48:19,251 ----------------------------------------------------------------------------------------------------
2023-10-08 20:48:19,251 EPOCH 6 done: loss 0.0852 - lr: 0.000067
2023-10-08 20:48:25,765 DEV : loss 0.1333693414926529 - f1-score (micro avg)  0.8385
2023-10-08 20:48:25,771 saving best model
2023-10-08 20:48:30,128 ----------------------------------------------------------------------------------------------------
2023-10-08 20:48:39,821 epoch 7 - iter 24/242 - loss 0.06494081 - time (sec): 9.69 - samples/sec: 240.10 - lr: 0.000065 - momentum: 0.000000
2023-10-08 20:48:49,721 epoch 7 - iter 48/242 - loss 0.07542430 - time (sec): 19.59 - samples/sec: 237.15 - lr: 0.000064 - momentum: 0.000000
2023-10-08 20:48:59,978 epoch 7 - iter 72/242 - loss 0.06532519 - time (sec): 29.85 - samples/sec: 241.62 - lr: 0.000062 - momentum: 0.000000
2023-10-08 20:49:09,773 epoch 7 - iter 96/242 - loss 0.06151443 - time (sec): 39.64 - samples/sec: 241.73 - lr: 0.000060 - momentum: 0.000000
2023-10-08 20:49:19,690 epoch 7 - iter 120/242 - loss 0.06601591 - time (sec): 49.56 - samples/sec: 241.59 - lr: 0.000059 - momentum: 0.000000
2023-10-08 20:49:30,024 epoch 7 - iter 144/242 - loss 0.06771002 - time (sec): 59.89 - samples/sec: 243.18 - lr: 0.000057 - momentum: 0.000000
2023-10-08 20:49:40,754 epoch 7 - iter 168/242 - loss 0.06487587 - time (sec): 70.62 - samples/sec: 242.82 - lr: 0.000055 - momentum: 0.000000
2023-10-08 20:49:51,342 epoch 7 - iter 192/242 - loss 0.06564763 - time (sec): 81.21 - samples/sec: 243.49 - lr: 0.000054 - momentum: 0.000000
2023-10-08 20:50:01,499 epoch 7 - iter 216/242 - loss 0.06563684 - time (sec): 91.37 - samples/sec: 244.02 - lr: 0.000052 - momentum: 0.000000
2023-10-08 20:50:11,337 epoch 7 - iter 240/242 - loss 0.06559082 - time (sec): 101.21 - samples/sec: 242.77 - lr: 0.000050 - momentum: 0.000000
2023-10-08 20:50:12,020 ----------------------------------------------------------------------------------------------------
2023-10-08 20:50:12,020 EPOCH 7 done: loss 0.0655 - lr: 0.000050
2023-10-08 20:50:18,799 DEV : loss 0.1311401128768921 - f1-score (micro avg)  0.8094
2023-10-08 20:50:18,805 ----------------------------------------------------------------------------------------------------
2023-10-08 20:50:28,618 epoch 8 - iter 24/242 - loss 0.04171124 - time (sec): 9.81 - samples/sec: 237.67 - lr: 0.000049 - momentum: 0.000000
2023-10-08 20:50:38,436 epoch 8 - iter 48/242 - loss 0.05372762 - time (sec): 19.63 - samples/sec: 248.49 - lr: 0.000047 - momentum: 0.000000
2023-10-08 20:50:48,132 epoch 8 - iter 72/242 - loss 0.04450090 - time (sec): 29.33 - samples/sec: 251.28 - lr: 0.000045 - momentum: 0.000000
2023-10-08 20:50:57,837 epoch 8 - iter 96/242 - loss 0.04381766 - time (sec): 39.03 - samples/sec: 254.52 - lr: 0.000044 - momentum: 0.000000
2023-10-08 20:51:07,936 epoch 8 - iter 120/242 - loss 0.04340234 - time (sec): 49.13 - samples/sec: 256.48 - lr: 0.000042 - momentum: 0.000000
2023-10-08 20:51:17,502 epoch 8 - iter 144/242 - loss 0.04675505 - time (sec): 58.70 - samples/sec: 255.90 - lr: 0.000040 - momentum: 0.000000
2023-10-08 20:51:26,614 epoch 8 - iter 168/242 - loss 0.04625383 - time (sec): 67.81 - samples/sec: 254.75 - lr: 0.000039 - momentum: 0.000000
2023-10-08 20:51:36,042 epoch 8 - iter 192/242 - loss 0.05160384 - time (sec): 77.24 - samples/sec: 254.83 - lr: 0.000037 - momentum: 0.000000
2023-10-08 20:51:45,733 epoch 8 - iter 216/242 - loss 0.04998427 - time (sec): 86.93 - samples/sec: 256.93 - lr: 0.000035 - momentum: 0.000000
2023-10-08 20:51:54,858 epoch 8 - iter 240/242 - loss 0.05024745 - time (sec): 96.05 - samples/sec: 256.50 - lr: 0.000034 - momentum: 0.000000
2023-10-08 20:51:55,375 ----------------------------------------------------------------------------------------------------
2023-10-08 20:51:55,375 EPOCH 8 done: loss 0.0501 - lr: 0.000034
2023-10-08 20:52:01,199 DEV : loss 0.12895697355270386 - f1-score (micro avg)  0.8292
2023-10-08 20:52:01,205 ----------------------------------------------------------------------------------------------------
2023-10-08 20:52:10,834 epoch 9 - iter 24/242 - loss 0.03088136 - time (sec): 9.63 - samples/sec: 251.25 - lr: 0.000032 - momentum: 0.000000
2023-10-08 20:52:20,209 epoch 9 - iter 48/242 - loss 0.03908230 - time (sec): 19.00 - samples/sec: 257.44 - lr: 0.000030 - momentum: 0.000000
2023-10-08 20:52:29,683 epoch 9 - iter 72/242 - loss 0.03860066 - time (sec): 28.48 - samples/sec: 256.67 - lr: 0.000029 - momentum: 0.000000
2023-10-08 20:52:39,417 epoch 9 - iter 96/242 - loss 0.03847794 - time (sec): 38.21 - samples/sec: 261.84 - lr: 0.000027 - momentum: 0.000000
2023-10-08 20:52:48,516 epoch 9 - iter 120/242 - loss 0.03757508 - time (sec): 47.31 - samples/sec: 260.96 - lr: 0.000025 - momentum: 0.000000
2023-10-08 20:52:58,084 epoch 9 - iter 144/242 - loss 0.04117309 - time (sec): 56.88 - samples/sec: 261.83 - lr: 0.000024 - momentum: 0.000000
2023-10-08 20:53:07,549 epoch 9 - iter 168/242 - loss 0.04203154 - time (sec): 66.34 - samples/sec: 263.00 - lr: 0.000022 - momentum: 0.000000
2023-10-08 20:53:16,561 epoch 9 - iter 192/242 - loss 0.04099426 - time (sec): 75.35 - samples/sec: 261.21 - lr: 0.000020 - momentum: 0.000000
2023-10-08 20:53:25,974 epoch 9 - iter 216/242 - loss 0.04015046 - time (sec): 84.77 - samples/sec: 260.89 - lr: 0.000019 - momentum: 0.000000
2023-10-08 20:53:35,409 epoch 9 - iter 240/242 - loss 0.04255973 - time (sec): 94.20 - samples/sec: 261.42 - lr: 0.000017 - momentum: 0.000000
2023-10-08 20:53:35,932 ----------------------------------------------------------------------------------------------------
2023-10-08 20:53:35,933 EPOCH 9 done: loss 0.0424 - lr: 0.000017
2023-10-08 20:53:41,787 DEV : loss 0.1345527321100235 - f1-score (micro avg)  0.8079
2023-10-08 20:53:41,792 ----------------------------------------------------------------------------------------------------
2023-10-08 20:53:52,236 epoch 10 - iter 24/242 - loss 0.05726141 - time (sec): 10.44 - samples/sec: 272.65 - lr: 0.000015 - momentum: 0.000000
2023-10-08 20:54:01,643 epoch 10 - iter 48/242 - loss 0.04996508 - time (sec): 19.85 - samples/sec: 268.12 - lr: 0.000014 - momentum: 0.000000
2023-10-08 20:54:11,055 epoch 10 - iter 72/242 - loss 0.04759714 - time (sec): 29.26 - samples/sec: 267.01 - lr: 0.000012 - momentum: 0.000000
2023-10-08 20:54:19,930 epoch 10 - iter 96/242 - loss 0.04300074 - time (sec): 38.14 - samples/sec: 262.46 - lr: 0.000010 - momentum: 0.000000
2023-10-08 20:54:29,853 epoch 10 - iter 120/242 - loss 0.03893002 - time (sec): 48.06 - samples/sec: 262.78 - lr: 0.000009 - momentum: 0.000000
2023-10-08 20:54:39,745 epoch 10 - iter 144/242 - loss 0.04154004 - time (sec): 57.95 - samples/sec: 262.72 - lr: 0.000007 - momentum: 0.000000
2023-10-08 20:54:48,658 epoch 10 - iter 168/242 - loss 0.04023619 - time (sec): 66.86 - samples/sec: 260.72 - lr: 0.000005 - momentum: 0.000000
2023-10-08 20:54:57,955 epoch 10 - iter 192/242 - loss 0.04075316 - time (sec): 76.16 - samples/sec: 261.24 - lr: 0.000004 - momentum: 0.000000
2023-10-08 20:55:07,523 epoch 10 - iter 216/242 - loss 0.03990472 - time (sec): 85.73 - samples/sec: 261.11 - lr: 0.000002 - momentum: 0.000000
2023-10-08 20:55:16,236 epoch 10 - iter 240/242 - loss 0.03893929 - time (sec): 94.44 - samples/sec: 259.37 - lr: 0.000000 - momentum: 0.000000
2023-10-08 20:55:17,017 ----------------------------------------------------------------------------------------------------
2023-10-08 20:55:17,018 EPOCH 10 done: loss 0.0386 - lr: 0.000000
2023-10-08 20:55:22,983 DEV : loss 0.13392986357212067 - f1-score (micro avg)  0.8243
2023-10-08 20:55:23,885 ----------------------------------------------------------------------------------------------------
2023-10-08 20:55:23,886 Loading model from best epoch ...
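The "saving best model" lines above fire whenever the dev micro-F1 reaches a new maximum, so the checkpoint loaded here is the epoch-6 model (dev F1 0.8385, the last epoch that triggered a save). Selecting it amounts to an argmax over the per-epoch dev scores; a sketch with the values copied from the DEV lines:

```python
# Dev micro-F1 per epoch, copied from the DEV lines in the log above.
dev_f1 = {1: 0.0, 2: 0.0, 3: 0.5218, 4: 0.8213, 5: 0.8232,
          6: 0.8385, 7: 0.8094, 8: 0.8292, 9: 0.8079, 10: 0.8243}

# best-model.pt corresponds to the epoch with the highest dev score.
best_epoch = max(dev_f1, key=dev_f1.get)
print(best_epoch, dev_f1[best_epoch])  # 6 0.8385
```

Note that the score never improves after epoch 6, which is consistent with no further "saving best model" lines appearing in epochs 7-10.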
2023-10-08 20:55:27,996 SequenceTagger predicts: Dictionary with 25 tags: O, S-scope, B-scope, E-scope, I-scope, S-pers, B-pers, E-pers, I-pers, S-work, B-work, E-work, I-work, S-loc, B-loc, E-loc, I-loc, S-object, B-object, E-object, I-object, S-date, B-date, E-date, I-date
2023-10-08 20:55:33,877
Results:
- F-score (micro) 0.7962
- F-score (macro) 0.4776
- Accuracy 0.6925

By class:
              precision    recall  f1-score   support

        pers     0.8182    0.8417    0.8298       139
       scope     0.8345    0.8992    0.8657       129
        work     0.6263    0.7750    0.6927        80
         loc     0.0000    0.0000    0.0000         9
        date     0.0000    0.0000    0.0000         3

   micro avg     0.7743    0.8194    0.7962       360
   macro avg     0.4558    0.5032    0.4776       360
weighted avg     0.7541    0.8194    0.7845       360

2023-10-08 20:55:33,877 ----------------------------------------------------------------------------------------------------
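The gap between the micro F1 (0.7962) and macro F1 (0.4776) in the results comes from the two tiny classes (loc, date) scoring zero: micro averaging pools true positives, predicted spans, and gold spans across all classes, while macro averaging weights each class equally regardless of support. A sketch of both averages; the per-class counts below are our back-calculation from the precision/recall/support columns (e.g. pers: recall 0.8417 × 139 gold ≈ 117 TP), so treat them as approximate reconstructions rather than logged values:

```python
# (true positives, predicted spans, gold spans) per class, back-calculated
# from the precision/recall/support columns of the report above (assumed).
counts = {"pers": (117, 143, 139), "scope": (116, 139, 129),
          "work": (62, 99, 80), "loc": (0, 0, 9), "date": (0, 0, 3)}

def f1(tp, pred, gold):
    # F1 = 2*TP / (pred + gold); defined as 0 when both denominators are empty.
    return 2 * tp / (pred + gold) if pred + gold else 0.0

# Micro: pool counts across classes, then compute a single F1.
micro = f1(sum(t for t, _, _ in counts.values()),
           sum(p for _, p, _ in counts.values()),
           sum(g for _, _, g in counts.values()))

# Macro: compute per-class F1, then take the unweighted mean.
macro = sum(f1(*c) for c in counts.values()) / len(counts)

print(round(micro, 4), round(macro, 4))  # 0.7962 0.4776
```

With these counts the two averages reproduce the logged scores, which illustrates why a model can look strong under micro averaging while rare classes drag the macro average down.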