2023-10-06 22:40:07,515 ----------------------------------------------------------------------------------------------------
2023-10-06 22:40:07,516 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): T5LayerNorm()
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=25, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-06 22:40:07,516 ----------------------------------------------------------------------------------------------------
2023-10-06 22:40:07,516 MultiCorpus: 1100 train + 206 dev + 240 test sentences
 - NER_HIPE_2022 Corpus: 1100 train + 206 dev + 240 test sentences - /app/.flair/datasets/ner_hipe_2022/v2.1/ajmc/de/with_doc_seperator
2023-10-06 22:40:07,516 ----------------------------------------------------------------------------------------------------
2023-10-06 22:40:07,516 Train:  1100 sentences
2023-10-06 22:40:07,516         (train_with_dev=False, train_with_test=False)
2023-10-06 22:40:07,516 ----------------------------------------------------------------------------------------------------
2023-10-06 22:40:07,516 Training Params:
2023-10-06 22:40:07,516  - learning_rate: "0.00015"
2023-10-06 22:40:07,516  - mini_batch_size: "8"
2023-10-06 22:40:07,517  - max_epochs: "10"
2023-10-06 22:40:07,517  - shuffle: "True"
2023-10-06 22:40:07,517 ----------------------------------------------------------------------------------------------------
2023-10-06 22:40:07,517 Plugins:
2023-10-06 22:40:07,517  - TensorboardLogger
2023-10-06 22:40:07,517  - LinearScheduler | warmup_fraction: '0.1'
2023-10-06 22:40:07,517 ----------------------------------------------------------------------------------------------------
2023-10-06 22:40:07,517 Final evaluation on model from best epoch (best-model.pt)
2023-10-06 22:40:07,517  - metric: "('micro avg', 'f1-score')"
2023-10-06 22:40:07,517 ----------------------------------------------------------------------------------------------------
2023-10-06 22:40:07,517 Computation:
2023-10-06 22:40:07,517  - compute on device: cuda:0
2023-10-06 22:40:07,517  - embedding storage: none
2023-10-06 22:40:07,517 ----------------------------------------------------------------------------------------------------
2023-10-06 22:40:07,517 Model training base path: "hmbench-ajmc/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-2"
2023-10-06 22:40:07,517 ----------------------------------------------------------------------------------------------------
2023-10-06 22:40:07,517 ----------------------------------------------------------------------------------------------------
2023-10-06 22:40:07,517 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-06 22:40:16,980 epoch 1 - iter 13/138 - loss 3.23270694 - time (sec): 9.46 - samples/sec: 218.90 - lr: 0.000013 - momentum: 0.000000
2023-10-06 22:40:26,901 epoch 1 - iter 26/138 - loss 3.22709210 - time (sec): 19.38 - samples/sec: 221.08 - lr: 0.000027 - momentum: 0.000000
2023-10-06 22:40:37,130 epoch 1 - iter 39/138 - loss 3.21951402 - time (sec): 29.61 - samples/sec: 225.55 - lr: 0.000041 - momentum: 0.000000
2023-10-06 22:40:46,360 epoch 1 - iter 52/138 - loss 3.20762449 - time (sec): 38.84 - samples/sec: 221.49 - lr: 0.000055 - momentum: 0.000000
2023-10-06 22:40:56,280 epoch 1 - iter 65/138 - loss 3.18212496 - time (sec): 48.76 - samples/sec: 222.31 - lr: 0.000070 - momentum: 0.000000
2023-10-06 22:41:05,244 epoch 1 - iter 78/138 - loss 3.14270868 - time (sec): 57.73 - samples/sec: 218.65 - lr: 0.000084 - momentum: 0.000000
2023-10-06 22:41:14,911 epoch 1 - iter 91/138 - loss 3.07754997 - time (sec): 67.39 - samples/sec: 220.43 - lr: 0.000098 - momentum: 0.000000
2023-10-06 22:41:24,874 epoch 1 - iter 104/138 - loss 3.00132907 - time (sec): 77.36 - samples/sec: 222.09 - lr: 0.000112 - momentum: 0.000000
2023-10-06 22:41:34,377 epoch 1 - iter 117/138 - loss 2.92445174 - time (sec): 86.86 - samples/sec: 222.04 - lr: 0.000126 - momentum: 0.000000
2023-10-06 22:41:44,534 epoch 1 - iter 130/138 - loss 2.83424009 - time (sec): 97.02 - samples/sec: 223.31 - lr: 0.000140 - momentum: 0.000000
2023-10-06 22:41:49,633 ----------------------------------------------------------------------------------------------------
2023-10-06 22:41:49,633 EPOCH 1 done: loss 2.7938 - lr: 0.000140
2023-10-06 22:41:56,124 DEV : loss 1.8360791206359863 - f1-score (micro avg)  0.0
2023-10-06 22:41:56,129 ----------------------------------------------------------------------------------------------------
2023-10-06 22:42:05,325 epoch 2 - iter 13/138 - loss 1.80734911 - time (sec): 9.19 - samples/sec: 227.54 - lr: 0.000149 - momentum: 0.000000
2023-10-06 22:42:14,867 epoch 2 - iter 26/138 - loss 1.71670342 - time (sec): 18.74 - samples/sec: 227.63 - lr: 0.000147 - momentum: 0.000000
2023-10-06 22:42:24,160 epoch 2 - iter 39/138 - loss 1.61366142 - time (sec): 28.03 - samples/sec: 224.59 - lr: 0.000145 - momentum: 0.000000
2023-10-06 22:42:33,636 epoch 2 - iter 52/138 - loss 1.54140300 - time (sec): 37.51 - samples/sec: 226.26 - lr: 0.000144 - momentum: 0.000000
2023-10-06 22:42:43,213 epoch 2 - iter 65/138 - loss 1.45588373 - time (sec): 47.08 - samples/sec: 225.39 - lr: 0.000142 - momentum: 0.000000
2023-10-06 22:42:53,393 epoch 2 - iter 78/138 - loss 1.37438730 - time (sec): 57.26 - samples/sec: 224.58 - lr: 0.000141 - momentum: 0.000000
2023-10-06 22:43:02,972 epoch 2 - iter 91/138 - loss 1.31681132 - time (sec): 66.84 - samples/sec: 222.32 - lr: 0.000139 - momentum: 0.000000
2023-10-06 22:43:12,826 epoch 2 - iter 104/138 - loss 1.24181067 - time (sec): 76.70 - samples/sec: 221.63 - lr: 0.000138 - momentum: 0.000000
2023-10-06 22:43:22,615 epoch 2 - iter 117/138 - loss 1.19731384 - time (sec): 86.48 - samples/sec: 222.01 - lr: 0.000136 - momentum: 0.000000
2023-10-06 22:43:32,349 epoch 2 - iter 130/138 - loss 1.17072614 - time (sec): 96.22 - samples/sec: 223.44 - lr: 0.000134 - momentum: 0.000000
2023-10-06 22:43:37,854 ----------------------------------------------------------------------------------------------------
2023-10-06 22:43:37,855 EPOCH 2 done: loss 1.1501 - lr: 0.000134
2023-10-06 22:43:44,284 DEV : loss 0.8543247580528259 - f1-score (micro avg)  0.0
2023-10-06 22:43:44,290 ----------------------------------------------------------------------------------------------------
2023-10-06 22:43:53,454 epoch 3 - iter 13/138 - loss 0.80170569 - time (sec): 9.16 - samples/sec: 225.05 - lr: 0.000132 - momentum: 0.000000
2023-10-06 22:44:03,495 epoch 3 - iter 26/138 - loss 0.74131971 - time (sec): 19.20 - samples/sec: 229.85 - lr: 0.000130 - momentum: 0.000000
2023-10-06 22:44:13,430 epoch 3 - iter 39/138 - loss 0.73800487 - time (sec): 29.14 - samples/sec: 229.18 - lr: 0.000129 - momentum: 0.000000
2023-10-06 22:44:22,895 epoch 3 - iter 52/138 - loss 0.70649031 - time (sec): 38.60 - samples/sec: 227.59 - lr: 0.000127 - momentum: 0.000000
2023-10-06 22:44:31,963 epoch 3 - iter 65/138 - loss 0.67637647 - time (sec): 47.67 - samples/sec: 223.65 - lr: 0.000126 - momentum: 0.000000
2023-10-06 22:44:42,000 epoch 3 - iter 78/138 - loss 0.64577064 - time (sec): 57.71 - samples/sec: 224.02 - lr: 0.000124 - momentum: 0.000000
2023-10-06 22:44:51,735 epoch 3 - iter 91/138 - loss 0.63965953 - time (sec): 67.44 - samples/sec: 224.53 - lr: 0.000123 - momentum: 0.000000
2023-10-06 22:45:01,184 epoch 3 - iter 104/138 - loss 0.62298647 - time (sec): 76.89 - samples/sec: 223.84 - lr: 0.000121 - momentum: 0.000000
2023-10-06 22:45:11,004 epoch 3 - iter 117/138 - loss 0.60241587 - time (sec): 86.71 - samples/sec: 223.11 - lr: 0.000119 - momentum: 0.000000
2023-10-06 22:45:20,957 epoch 3 - iter 130/138 - loss 0.58226301 - time (sec): 96.67 - samples/sec: 223.75 - lr: 0.000118 - momentum: 0.000000
2023-10-06 22:45:26,069 ----------------------------------------------------------------------------------------------------
2023-10-06 22:45:26,069 EPOCH 3 done: loss 0.5799 - lr: 0.000118
2023-10-06 22:45:32,625 DEV : loss 0.4383922815322876 - f1-score (micro avg)  0.4459
2023-10-06 22:45:32,631 saving best model
2023-10-06 22:45:33,498 ----------------------------------------------------------------------------------------------------
2023-10-06 22:45:43,340 epoch 4 - iter 13/138 - loss 0.39528726 - time (sec): 9.84 - samples/sec: 232.91 - lr: 0.000115 - momentum: 0.000000
2023-10-06 22:45:53,026 epoch 4 - iter 26/138 - loss 0.39137875 - time (sec): 19.53 - samples/sec: 232.40 - lr: 0.000114 - momentum: 0.000000
2023-10-06 22:46:02,353 epoch 4 - iter 39/138 - loss 0.40195115 - time (sec): 28.85 - samples/sec: 231.31 - lr: 0.000112 - momentum: 0.000000
2023-10-06 22:46:11,842 epoch 4 - iter 52/138 - loss 0.39403229 - time (sec): 38.34 - samples/sec: 230.53 - lr: 0.000111 - momentum: 0.000000
2023-10-06 22:46:21,207 epoch 4 - iter 65/138 - loss 0.38808218 - time (sec): 47.71 - samples/sec: 228.14 - lr: 0.000109 - momentum: 0.000000
2023-10-06 22:46:30,628 epoch 4 - iter 78/138 - loss 0.37807952 - time (sec): 57.13 - samples/sec: 226.61 - lr: 0.000107 - momentum: 0.000000
2023-10-06 22:46:40,919 epoch 4 - iter 91/138 - loss 0.36474449 - time (sec): 67.42 - samples/sec: 225.66 - lr: 0.000106 - momentum: 0.000000
2023-10-06 22:46:50,394 epoch 4 - iter 104/138 - loss 0.35159512 - time (sec): 76.89 - samples/sec: 224.06 - lr: 0.000104 - momentum: 0.000000
2023-10-06 22:46:59,579 epoch 4 - iter 117/138 - loss 0.35097923 - time (sec): 86.08 - samples/sec: 223.75 - lr: 0.000103 - momentum: 0.000000
2023-10-06 22:47:10,071 epoch 4 - iter 130/138 - loss 0.34641242 - time (sec): 96.57 - samples/sec: 223.91 - lr: 0.000101 - momentum: 0.000000
2023-10-06 22:47:15,408 ----------------------------------------------------------------------------------------------------
2023-10-06 22:47:15,408 EPOCH 4 done: loss 0.3417 - lr: 0.000101
2023-10-06 22:47:21,994 DEV : loss 0.27298662066459656 - f1-score (micro avg)  0.722
2023-10-06 22:47:21,999 saving best model
2023-10-06 22:47:22,922 ----------------------------------------------------------------------------------------------------
2023-10-06 22:47:32,969 epoch 5 - iter 13/138 - loss 0.31949968 - time (sec): 10.04 - samples/sec: 239.13 - lr: 0.000099 - momentum: 0.000000
2023-10-06 22:47:42,469 epoch 5 - iter 26/138 - loss 0.28567257 - time (sec): 19.54 - samples/sec: 231.16 - lr: 0.000097 - momentum: 0.000000
2023-10-06 22:47:51,751 epoch 5 - iter 39/138 - loss 0.26080385 - time (sec): 28.83 - samples/sec: 229.96 - lr: 0.000096 - momentum: 0.000000
2023-10-06 22:48:01,505 epoch 5 - iter 52/138 - loss 0.25780751 - time (sec): 38.58 - samples/sec: 230.09 - lr: 0.000094 - momentum: 0.000000
2023-10-06 22:48:10,196 epoch 5 - iter 65/138 - loss 0.24192842 - time (sec): 47.27 - samples/sec: 226.75 - lr: 0.000092 - momentum: 0.000000
2023-10-06 22:48:19,883 epoch 5 - iter 78/138 - loss 0.23299080 - time (sec): 56.96 - samples/sec: 226.79 - lr: 0.000091 - momentum: 0.000000
2023-10-06 22:48:30,225 epoch 5 - iter 91/138 - loss 0.23422848 - time (sec): 67.30 - samples/sec: 226.68 - lr: 0.000089 - momentum: 0.000000
2023-10-06 22:48:39,698 epoch 5 - iter 104/138 - loss 0.22888322 - time (sec): 76.77 - samples/sec: 224.85 - lr: 0.000088 - momentum: 0.000000
2023-10-06 22:48:49,605 epoch 5 - iter 117/138 - loss 0.22339740 - time (sec): 86.68 - samples/sec: 224.12 - lr: 0.000086 - momentum: 0.000000
2023-10-06 22:48:58,739 epoch 5 - iter 130/138 - loss 0.22210052 - time (sec): 95.81 - samples/sec: 223.67 - lr: 0.000085 - momentum: 0.000000
2023-10-06 22:49:04,653 ----------------------------------------------------------------------------------------------------
2023-10-06 22:49:04,654 EPOCH 5 done: loss 0.2231 - lr: 0.000085
2023-10-06 22:49:11,307 DEV : loss 0.2003016620874405 - f1-score (micro avg)  0.8
2023-10-06 22:49:11,313 saving best model
2023-10-06 22:49:12,246 ----------------------------------------------------------------------------------------------------
2023-10-06 22:49:22,323 epoch 6 - iter 13/138 - loss 0.17800230 - time (sec): 10.08 - samples/sec: 222.22 - lr: 0.000082 - momentum: 0.000000
2023-10-06 22:49:32,392 epoch 6 - iter 26/138 - loss 0.16319816 - time (sec): 20.14 - samples/sec: 223.83 - lr: 0.000080 - momentum: 0.000000
2023-10-06 22:49:42,740 epoch 6 - iter 39/138 - loss 0.16985443 - time (sec): 30.49 - samples/sec: 227.83 - lr: 0.000079 - momentum: 0.000000
2023-10-06 22:49:52,532 epoch 6 - iter 52/138 - loss 0.17010931 - time (sec): 40.28 - samples/sec: 226.57 - lr: 0.000077 - momentum: 0.000000
2023-10-06 22:50:01,578 epoch 6 - iter 65/138 - loss 0.17050727 - time (sec): 49.33 - samples/sec: 225.02 - lr: 0.000076 - momentum: 0.000000
2023-10-06 22:50:10,325 epoch 6 - iter 78/138 - loss 0.17216643 - time (sec): 58.08 - samples/sec: 223.18 - lr: 0.000074 - momentum: 0.000000
2023-10-06 22:50:19,892 epoch 6 - iter 91/138 - loss 0.16880803 - time (sec): 67.64 - samples/sec: 223.27 - lr: 0.000073 - momentum: 0.000000
2023-10-06 22:50:29,380 epoch 6 - iter 104/138 - loss 0.16960223 - time (sec): 77.13 - samples/sec: 223.56 - lr: 0.000071 - momentum: 0.000000
2023-10-06 22:50:38,800 epoch 6 - iter 117/138 - loss 0.16582669 - time (sec): 86.55 - samples/sec: 222.72 - lr: 0.000070 - momentum: 0.000000
2023-10-06 22:50:48,643 epoch 6 - iter 130/138 - loss 0.16097879 - time (sec): 96.39 - samples/sec: 223.21 - lr: 0.000068 - momentum: 0.000000
2023-10-06 22:50:54,423 ----------------------------------------------------------------------------------------------------
2023-10-06 22:50:54,423 EPOCH 6 done: loss 0.1573 - lr: 0.000068
2023-10-06 22:51:01,072 DEV : loss 0.15701791644096375 - f1-score (micro avg)  0.8432
2023-10-06 22:51:01,080 saving best model
2023-10-06 22:51:02,019 ----------------------------------------------------------------------------------------------------
2023-10-06 22:51:12,241 epoch 7 - iter 13/138 - loss 0.10149661 - time (sec): 10.22 - samples/sec: 231.20 - lr: 0.000065 - momentum: 0.000000
2023-10-06 22:51:21,746 epoch 7 - iter 26/138 - loss 0.11288742 - time (sec): 19.73 - samples/sec: 226.76 - lr: 0.000064 - momentum: 0.000000
2023-10-06 22:51:31,279 epoch 7 - iter 39/138 - loss 0.10855925 - time (sec): 29.26 - samples/sec: 223.05 - lr: 0.000062 - momentum: 0.000000
2023-10-06 22:51:40,157 epoch 7 - iter 52/138 - loss 0.11205464 - time (sec): 38.14 - samples/sec: 221.76 - lr: 0.000061 - momentum: 0.000000
2023-10-06 22:51:49,713 epoch 7 - iter 65/138 - loss 0.11113198 - time (sec): 47.69 - samples/sec: 221.46 - lr: 0.000059 - momentum: 0.000000
2023-10-06 22:51:59,803 epoch 7 - iter 78/138 - loss 0.11529537 - time (sec): 57.78 - samples/sec: 221.59 - lr: 0.000058 - momentum: 0.000000
2023-10-06 22:52:09,382 epoch 7 - iter 91/138 - loss 0.11301134 - time (sec): 67.36 - samples/sec: 221.12 - lr: 0.000056 - momentum: 0.000000
2023-10-06 22:52:18,932 epoch 7 - iter 104/138 - loss 0.11717520 - time (sec): 76.91 - samples/sec: 221.62 - lr: 0.000054 - momentum: 0.000000
2023-10-06 22:52:27,998 epoch 7 - iter 117/138 - loss 0.12193948 - time (sec): 85.98 - samples/sec: 221.35 - lr: 0.000053 - momentum: 0.000000
2023-10-06 22:52:38,041 epoch 7 - iter 130/138 - loss 0.11945526 - time (sec): 96.02 - samples/sec: 222.69 - lr: 0.000051 - momentum: 0.000000
2023-10-06 22:52:44,131 ----------------------------------------------------------------------------------------------------
2023-10-06 22:52:44,131 EPOCH 7 done: loss 0.1196 - lr: 0.000051
2023-10-06 22:52:50,785 DEV : loss 0.14921151101589203 - f1-score (micro avg)  0.8464
2023-10-06 22:52:50,790 saving best model
2023-10-06 22:52:51,747 ----------------------------------------------------------------------------------------------------
2023-10-06 22:53:01,131 epoch 8 - iter 13/138 - loss 0.11288043 - time (sec): 9.38 - samples/sec: 221.89 - lr: 0.000049 - momentum: 0.000000
2023-10-06 22:53:10,260 epoch 8 - iter 26/138 - loss 0.10790543 - time (sec): 18.51 - samples/sec: 219.81 - lr: 0.000047 - momentum: 0.000000
2023-10-06 22:53:20,639 epoch 8 - iter 39/138 - loss 0.11620050 - time (sec): 28.89 - samples/sec: 226.09 - lr: 0.000046 - momentum: 0.000000
2023-10-06 22:53:30,881 epoch 8 - iter 52/138 - loss 0.11596319 - time (sec): 39.13 - samples/sec: 226.59 - lr: 0.000044 - momentum: 0.000000
2023-10-06 22:53:40,614 epoch 8 - iter 65/138 - loss 0.10861121 - time (sec): 48.87 - samples/sec: 226.19 - lr: 0.000043 - momentum: 0.000000
2023-10-06 22:53:49,813 epoch 8 - iter 78/138 - loss 0.10317406 - time (sec): 58.06 - samples/sec: 225.23 - lr: 0.000041 - momentum: 0.000000
2023-10-06 22:53:59,379 epoch 8 - iter 91/138 - loss 0.10247958 - time (sec): 67.63 - samples/sec: 224.37 - lr: 0.000039 - momentum: 0.000000
2023-10-06 22:54:09,062 epoch 8 - iter 104/138 - loss 0.10646015 - time (sec): 77.31 - samples/sec: 224.64 - lr: 0.000038 - momentum: 0.000000
2023-10-06 22:54:18,537 epoch 8 - iter 117/138 - loss 0.10062796 - time (sec): 86.79 - samples/sec: 223.31 - lr: 0.000036 - momentum: 0.000000
2023-10-06 22:54:28,011 epoch 8 - iter 130/138 - loss 0.09854541 - time (sec): 96.26 - samples/sec: 222.05 - lr: 0.000035 - momentum: 0.000000
2023-10-06 22:54:34,092 ----------------------------------------------------------------------------------------------------
2023-10-06 22:54:34,093 EPOCH 8 done: loss 0.0969 - lr: 0.000035
2023-10-06 22:54:40,787 DEV : loss 0.1336197406053543 - f1-score (micro avg)  0.8626
2023-10-06 22:54:40,793 saving best model
2023-10-06 22:54:41,748 ----------------------------------------------------------------------------------------------------
2023-10-06 22:54:51,317 epoch 9 - iter 13/138 - loss 0.12348235 - time (sec): 9.57 - samples/sec: 227.13 - lr: 0.000032 - momentum: 0.000000
2023-10-06 22:55:02,066 epoch 9 - iter 26/138 - loss 0.09125223 - time (sec): 20.32 - samples/sec: 229.96 - lr: 0.000031 - momentum: 0.000000
2023-10-06 22:55:11,317 epoch 9 - iter 39/138 - loss 0.08898213 - time (sec): 29.57 - samples/sec: 223.73 - lr: 0.000029 - momentum: 0.000000
2023-10-06 22:55:20,915 epoch 9 - iter 52/138 - loss 0.08155307 - time (sec): 39.16 - samples/sec: 221.01 - lr: 0.000027 - momentum: 0.000000
2023-10-06 22:55:30,641 epoch 9 - iter 65/138 - loss 0.08070331 - time (sec): 48.89 - samples/sec: 221.12 - lr: 0.000026 - momentum: 0.000000
2023-10-06 22:55:40,979 epoch 9 - iter 78/138 - loss 0.07648315 - time (sec): 59.23 - samples/sec: 223.32 - lr: 0.000024 - momentum: 0.000000
2023-10-06 22:55:50,197 epoch 9 - iter 91/138 - loss 0.07279742 - time (sec): 68.45 - samples/sec: 222.20 - lr: 0.000023 - momentum: 0.000000
2023-10-06 22:55:59,539 epoch 9 - iter 104/138 - loss 0.07440837 - time (sec): 77.79 - samples/sec: 221.08 - lr: 0.000021 - momentum: 0.000000
2023-10-06 22:56:08,356 epoch 9 - iter 117/138 - loss 0.08033490 - time (sec): 86.61 - samples/sec: 220.66 - lr: 0.000020 - momentum: 0.000000
2023-10-06 22:56:18,372 epoch 9 - iter 130/138 - loss 0.08221682 - time (sec): 96.62 - samples/sec: 221.55 - lr: 0.000018 - momentum: 0.000000
2023-10-06 22:56:24,462 ----------------------------------------------------------------------------------------------------
2023-10-06 22:56:24,462 EPOCH 9 done: loss 0.0837 - lr: 0.000018
2023-10-06 22:56:31,157 DEV : loss 0.1325034648180008 - f1-score (micro avg)  0.8707
2023-10-06 22:56:31,166 saving best model
2023-10-06 22:56:32,401 ----------------------------------------------------------------------------------------------------
2023-10-06 22:56:41,996 epoch 10 - iter 13/138 - loss 0.07287829 - time (sec): 9.59 - samples/sec: 220.89 - lr: 0.000016 - momentum: 0.000000
2023-10-06 22:56:51,521 epoch 10 - iter 26/138 - loss 0.07381896 - time (sec): 19.12 - samples/sec: 222.67 - lr: 0.000014 - momentum: 0.000000
2023-10-06 22:57:01,039 epoch 10 - iter 39/138 - loss 0.08513765 - time (sec): 28.64 - samples/sec: 219.90 - lr: 0.000012 - momentum: 0.000000
2023-10-06 22:57:10,282 epoch 10 - iter 52/138 - loss 0.07902667 - time (sec): 37.88 - samples/sec: 217.40 - lr: 0.000011 - momentum: 0.000000
2023-10-06 22:57:21,111 epoch 10 - iter 65/138 - loss 0.08501577 - time (sec): 48.71 - samples/sec: 219.47 - lr: 0.000009 - momentum: 0.000000
2023-10-06 22:57:30,993 epoch 10 - iter 78/138 - loss 0.08257276 - time (sec): 58.59 - samples/sec: 221.05 - lr: 0.000008 - momentum: 0.000000
2023-10-06 22:57:40,514 epoch 10 - iter 91/138 - loss 0.08006238 - time (sec): 68.11 - samples/sec: 221.45 - lr: 0.000006 - momentum: 0.000000
2023-10-06 22:57:50,258 epoch 10 - iter 104/138 - loss 0.07722084 - time (sec): 77.86 - samples/sec: 222.03 - lr: 0.000005 - momentum: 0.000000
2023-10-06 22:58:00,547 epoch 10 - iter 117/138 - loss 0.07610917 - time (sec): 88.14 - samples/sec: 222.76 - lr: 0.000003 - momentum: 0.000000
2023-10-06 22:58:09,847 epoch 10 - iter 130/138 - loss 0.07577818 - time (sec): 97.44 - samples/sec: 221.91 - lr: 0.000001 - momentum: 0.000000
2023-10-06 22:58:15,066 ----------------------------------------------------------------------------------------------------
2023-10-06 22:58:15,067 EPOCH 10 done: loss 0.0780 - lr: 0.000001
2023-10-06 22:58:21,772 DEV : loss 0.1309366226196289 - f1-score (micro avg)  0.8707
2023-10-06 22:58:22,650 ----------------------------------------------------------------------------------------------------
2023-10-06 22:58:22,651 Loading model from best epoch ...
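A note on the lr column above: the LinearScheduler plugin with warmup_fraction '0.1' ramps the learning rate linearly from 0 to the peak 0.00015 over the first 10% of the 1380 total steps (10 epochs x 138 batches, so 138 warmup steps), then decays it linearly to 0. A minimal sketch of that schedule (the function name and 0-based step indexing are illustrative assumptions, not Flair's API):

```python
def linear_schedule_lr(step, total_steps, peak_lr, warmup_fraction=0.1):
    """Linear warmup to peak_lr, then linear decay to zero (0-based step)."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)
```

This reproduces the logged values, e.g. step 12 (epoch 1, iter 13) gives 0.000013 and step 150 (epoch 2, iter 13) gives 0.000149 after rounding.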
2023-10-06 22:58:25,647 SequenceTagger predicts: Dictionary with 25 tags: O, S-scope, B-scope, E-scope, I-scope, S-pers, B-pers, E-pers, I-pers, S-work, B-work, E-work, I-work, S-loc, B-loc, E-loc, I-loc, S-object, B-object, E-object, I-object, S-date, B-date, E-date, I-date
2023-10-06 22:58:32,909 Results:
- F-score (micro) 0.8786
- F-score (macro) 0.5233
- Accuracy 0.8057

By class:
              precision    recall  f1-score   support

       scope     0.8743    0.9091    0.8914       176
        pers     0.8947    0.9297    0.9119       128
        work     0.8026    0.8243    0.8133        74
      object     0.0000    0.0000    0.0000         2
         loc     0.0000    0.0000    0.0000         2

   micro avg     0.8673    0.8901    0.8786       382
   macro avg     0.5143    0.5326    0.5233       382
weighted avg     0.8581    0.8901    0.8738       382

2023-10-06 22:58:32,909 ----------------------------------------------------------------------------------------------------
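The aggregate rows in the final table follow directly from the per-class rows: macro F1 is the unweighted mean of the five per-class F1 scores (dragged down by the zero-support-2 object and loc classes), while micro F1 pools true positives, predicted spans, and gold spans across classes. A sanity-check sketch using only numbers from the table (true-positive and prediction counts are recovered by rounding recall x support and tp / precision):

```python
per_class = {
    # label: (precision, recall, support) -- copied from the results table
    "scope":  (0.8743, 0.9091, 176),
    "pers":   (0.8947, 0.9297, 128),
    "work":   (0.8026, 0.8243, 74),
    "object": (0.0000, 0.0000, 2),
    "loc":    (0.0000, 0.0000, 2),
}

def f1(p, r):
    return 2 * p * r / (p + r) if p + r else 0.0

# Macro F1: unweighted mean of per-class F1 scores.
macro_f1 = sum(f1(p, r) for p, r, _ in per_class.values()) / len(per_class)

# Micro F1: pool counts across classes, then take the harmonic mean.
tp   = sum(round(r * s) for _, r, s in per_class.values())   # true positives
pred = sum(round(round(r * s) / p) if p else 0
           for p, r, s in per_class.values())                # predicted spans
gold = sum(s for _, _, s in per_class.values())              # gold spans
micro_f1 = 2 * tp / (pred + gold)  # algebraically f1(tp/pred, tp/gold)

print(round(macro_f1, 4), round(micro_f1, 4))  # 0.5233 0.8786
```

This matches the logged F-score (micro) 0.8786 and F-score (macro) 0.5233, with 340 true positives over 392 predicted and 382 gold spans.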