2023-10-06 23:37:56,326 ----------------------------------------------------------------------------------------------------
2023-10-06 23:37:56,327 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): T5LayerNorm()
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=25, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-06 23:37:56,327 ----------------------------------------------------------------------------------------------------
2023-10-06 23:37:56,327 MultiCorpus: 1100 train + 206 dev + 240 test sentences
 - NER_HIPE_2022 Corpus: 1100 train + 206 dev + 240 test sentences - /app/.flair/datasets/ner_hipe_2022/v2.1/ajmc/de/with_doc_seperator
2023-10-06 23:37:56,327 ----------------------------------------------------------------------------------------------------
2023-10-06 23:37:56,327 Train: 1100 sentences
2023-10-06 23:37:56,327 (train_with_dev=False, train_with_test=False)
2023-10-06 23:37:56,327 ----------------------------------------------------------------------------------------------------
2023-10-06 23:37:56,328 Training Params:
2023-10-06 23:37:56,328  - learning_rate: "0.00016"
2023-10-06 23:37:56,328  - mini_batch_size: "4"
2023-10-06 23:37:56,328  - max_epochs: "10"
2023-10-06 23:37:56,328  - shuffle: "True"
2023-10-06 23:37:56,328 ----------------------------------------------------------------------------------------------------
2023-10-06 23:37:56,328 Plugins:
2023-10-06 23:37:56,328  - TensorboardLogger
2023-10-06 23:37:56,328  - LinearScheduler | warmup_fraction: '0.1'
2023-10-06 23:37:56,328 ----------------------------------------------------------------------------------------------------
2023-10-06 23:37:56,328 Final evaluation on model from best epoch (best-model.pt)
2023-10-06 23:37:56,328  - metric: "('micro avg', 'f1-score')"
2023-10-06 23:37:56,328 ----------------------------------------------------------------------------------------------------
2023-10-06 23:37:56,328 Computation:
2023-10-06 23:37:56,328  - compute on device: cuda:0
2023-10-06 23:37:56,328  - embedding storage: none
2023-10-06 23:37:56,328
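The LinearScheduler plugin with warmup_fraction '0.1' ramps the learning rate linearly from 0 to the peak of 0.00016 over the first 10% of batches (here one full epoch: 275 batches x 10 epochs = 2750 steps), then decays it linearly back toward 0. A minimal sketch of that schedule, which reproduces the per-iteration lr values logged below up to rounding; the exact step bookkeeping inside Flair is an assumption, not its actual code:

```python
# Linear warmup + linear decay schedule (sketch of Flair's LinearScheduler,
# warmup_fraction=0.1). Illustrative approximation, not Flair's exact code.

PEAK_LR = 0.00016          # learning_rate from the training params above
BATCHES_PER_EPOCH = 275    # 1100 train sentences / mini_batch_size 4
MAX_EPOCHS = 10

total_steps = BATCHES_PER_EPOCH * MAX_EPOCHS   # 2750
warmup_steps = int(0.1 * total_steps)          # 275, i.e. exactly one epoch here

def lr_at(step: int) -> float:
    """Learning rate after `step` optimizer steps."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps                              # warmup
    return PEAK_LR * (total_steps - step) / (total_steps - warmup_steps)  # decay

# Warmup peaks right at the end of epoch 1 (cf. "lr: 0.000157" at iter 270/275
# in the log), then decays to zero by the end of epoch 10.
print(round(lr_at(270), 6), round(lr_at(275), 6), round(lr_at(2750), 6))
```

This explains why the logged lr climbs through epoch 1 and then shrinks every logging interval afterwards, reaching 0.000001 late in epoch 10.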
----------------------------------------------------------------------------------------------------
2023-10-06 23:37:56,328 Model training base path: "hmbench-ajmc/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-3"
2023-10-06 23:37:56,328 ----------------------------------------------------------------------------------------------------
2023-10-06 23:37:56,328 ----------------------------------------------------------------------------------------------------
2023-10-06 23:37:56,329 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-06 23:38:06,529 epoch 1 - iter 27/275 - loss 3.20759716 - time (sec): 10.20 - samples/sec: 214.62 - lr: 0.000015 - momentum: 0.000000
2023-10-06 23:38:17,461 epoch 1 - iter 54/275 - loss 3.19820147 - time (sec): 21.13 - samples/sec: 211.58 - lr: 0.000031 - momentum: 0.000000
2023-10-06 23:38:29,075 epoch 1 - iter 81/275 - loss 3.17828879 - time (sec): 32.75 - samples/sec: 212.40 - lr: 0.000047 - momentum: 0.000000
2023-10-06 23:38:40,188 epoch 1 - iter 108/275 - loss 3.13738568 - time (sec): 43.86 - samples/sec: 209.88 - lr: 0.000062 - momentum: 0.000000
2023-10-06 23:38:51,695 epoch 1 - iter 135/275 - loss 3.05638381 - time (sec): 55.36 - samples/sec: 209.25 - lr: 0.000078 - momentum: 0.000000
2023-10-06 23:39:02,313 epoch 1 - iter 162/275 - loss 2.96583899 - time (sec): 65.98 - samples/sec: 207.96 - lr: 0.000094 - momentum: 0.000000
2023-10-06 23:39:13,054 epoch 1 - iter 189/275 - loss 2.85411795 - time (sec): 76.72 - samples/sec: 207.00 - lr: 0.000109 - momentum: 0.000000
2023-10-06 23:39:23,579 epoch 1 - iter 216/275 - loss 2.74239937 - time (sec): 87.25 - samples/sec: 207.43 - lr: 0.000125 - momentum: 0.000000
2023-10-06 23:39:33,970 epoch 1 - iter 243/275 - loss 2.62956778 - time (sec): 97.64 - samples/sec: 207.06 - lr: 0.000141 - momentum: 0.000000
2023-10-06 23:39:44,315 epoch 1 - iter 270/275 - loss 2.50508115 - time (sec): 107.99 - samples/sec: 206.19 - lr: 0.000157 - momentum: 0.000000
2023-10-06 23:39:46,527 ----------------------------------------------------------------------------------------------------
2023-10-06 23:39:46,527 EPOCH 1 done: loss 2.4748 - lr: 0.000157
2023-10-06 23:39:52,949 DEV : loss 1.1024742126464844 - f1-score (micro avg) 0.0
2023-10-06 23:39:52,955 ----------------------------------------------------------------------------------------------------
2023-10-06 23:40:03,743 epoch 2 - iter 27/275 - loss 0.95300002 - time (sec): 10.79 - samples/sec: 218.98 - lr: 0.000158 - momentum: 0.000000
2023-10-06 23:40:14,988 epoch 2 - iter 54/275 - loss 0.94803693 - time (sec): 22.03 - samples/sec: 215.64 - lr: 0.000157 - momentum: 0.000000
2023-10-06 23:40:25,807 epoch 2 - iter 81/275 - loss 0.86535965 - time (sec): 32.85 - samples/sec: 210.44 - lr: 0.000155 - momentum: 0.000000
2023-10-06 23:40:37,202 epoch 2 - iter 108/275 - loss 0.85393072 - time (sec): 44.25 - samples/sec: 210.89 - lr: 0.000153 - momentum: 0.000000
2023-10-06 23:40:48,270 epoch 2 - iter 135/275 - loss 0.81474395 - time (sec): 55.31 - samples/sec: 209.57 - lr: 0.000151 - momentum: 0.000000
2023-10-06 23:40:58,826 epoch 2 - iter 162/275 - loss 0.78487656 - time (sec): 65.87 - samples/sec: 207.84 - lr: 0.000150 - momentum: 0.000000
2023-10-06 23:41:09,696 epoch 2 - iter 189/275 - loss 0.76827034 - time (sec): 76.74 - samples/sec: 207.12 - lr: 0.000148 - momentum: 0.000000
2023-10-06 23:41:20,400 epoch 2 - iter 216/275 - loss 0.74333005 - time (sec): 87.44 - samples/sec: 207.36 - lr: 0.000146 - momentum: 0.000000
2023-10-06 23:41:30,441 epoch 2 - iter 243/275 - loss 0.71127276 - time (sec): 97.48 - samples/sec: 206.25 - lr: 0.000144 - momentum: 0.000000
2023-10-06 23:41:41,384 epoch 2 - iter 270/275 - loss 0.67603678 - time (sec): 108.43 - samples/sec: 206.43 - lr: 0.000143 - momentum: 0.000000
2023-10-06 23:41:43,238
----------------------------------------------------------------------------------------------------
2023-10-06 23:41:43,238 EPOCH 2 done: loss 0.6729 - lr: 0.000143
2023-10-06 23:41:49,895 DEV : loss 0.400844007730484 - f1-score (micro avg) 0.5569
2023-10-06 23:41:49,900 saving best model
2023-10-06 23:41:50,781 ----------------------------------------------------------------------------------------------------
2023-10-06 23:42:01,835 epoch 3 - iter 27/275 - loss 0.40468884 - time (sec): 11.05 - samples/sec: 214.61 - lr: 0.000141 - momentum: 0.000000
2023-10-06 23:42:12,301 epoch 3 - iter 54/275 - loss 0.36215445 - time (sec): 21.52 - samples/sec: 210.70 - lr: 0.000139 - momentum: 0.000000
2023-10-06 23:42:23,376 epoch 3 - iter 81/275 - loss 0.34555518 - time (sec): 32.59 - samples/sec: 212.13 - lr: 0.000137 - momentum: 0.000000
2023-10-06 23:42:34,050 epoch 3 - iter 108/275 - loss 0.33564958 - time (sec): 43.27 - samples/sec: 209.93 - lr: 0.000135 - momentum: 0.000000
2023-10-06 23:42:45,440 epoch 3 - iter 135/275 - loss 0.31662012 - time (sec): 54.66 - samples/sec: 209.34 - lr: 0.000134 - momentum: 0.000000
2023-10-06 23:42:55,854 epoch 3 - iter 162/275 - loss 0.30658692 - time (sec): 65.07 - samples/sec: 208.48 - lr: 0.000132 - momentum: 0.000000
2023-10-06 23:43:07,012 epoch 3 - iter 189/275 - loss 0.29749956 - time (sec): 76.23 - samples/sec: 208.19 - lr: 0.000130 - momentum: 0.000000
2023-10-06 23:43:18,227 epoch 3 - iter 216/275 - loss 0.29258974 - time (sec): 87.44 - samples/sec: 208.93 - lr: 0.000128 - momentum: 0.000000
2023-10-06 23:43:28,398 epoch 3 - iter 243/275 - loss 0.28746511 - time (sec): 97.62 - samples/sec: 207.21 - lr: 0.000127 - momentum: 0.000000
2023-10-06 23:43:38,934 epoch 3 - iter 270/275 - loss 0.28244896 - time (sec): 108.15 - samples/sec: 207.44 - lr: 0.000125 - momentum: 0.000000
2023-10-06 23:43:40,732 ----------------------------------------------------------------------------------------------------
2023-10-06 23:43:40,733 EPOCH 3 done: loss 0.2829 - lr: 0.000125
2023-10-06 23:43:47,329 DEV : loss 0.2035077065229416 - f1-score (micro avg) 0.7707
2023-10-06 23:43:47,335 saving best model
2023-10-06 23:43:48,306 ----------------------------------------------------------------------------------------------------
2023-10-06 23:43:59,128 epoch 4 - iter 27/275 - loss 0.21579728 - time (sec): 10.82 - samples/sec: 209.24 - lr: 0.000123 - momentum: 0.000000
2023-10-06 23:44:10,119 epoch 4 - iter 54/275 - loss 0.21108141 - time (sec): 21.81 - samples/sec: 211.82 - lr: 0.000121 - momentum: 0.000000
2023-10-06 23:44:20,928 epoch 4 - iter 81/275 - loss 0.19892179 - time (sec): 32.62 - samples/sec: 209.69 - lr: 0.000119 - momentum: 0.000000
2023-10-06 23:44:30,969 epoch 4 - iter 108/275 - loss 0.18241997 - time (sec): 42.66 - samples/sec: 205.55 - lr: 0.000118 - momentum: 0.000000
2023-10-06 23:44:42,280 epoch 4 - iter 135/275 - loss 0.17029466 - time (sec): 53.97 - samples/sec: 206.72 - lr: 0.000116 - momentum: 0.000000
2023-10-06 23:44:53,633 epoch 4 - iter 162/275 - loss 0.16263732 - time (sec): 65.33 - samples/sec: 209.08 - lr: 0.000114 - momentum: 0.000000
2023-10-06 23:45:04,038 epoch 4 - iter 189/275 - loss 0.16026098 - time (sec): 75.73 - samples/sec: 207.59 - lr: 0.000112 - momentum: 0.000000
2023-10-06 23:45:14,996 epoch 4 - iter 216/275 - loss 0.15427176 - time (sec): 86.69 - samples/sec: 206.97 - lr: 0.000111 - momentum: 0.000000
2023-10-06 23:45:25,408 epoch 4 - iter 243/275 - loss 0.14934868 - time (sec): 97.10 - samples/sec: 205.84 - lr: 0.000109 - momentum: 0.000000
2023-10-06 23:45:35,910 epoch 4 - iter 270/275 - loss 0.14414700 - time (sec): 107.60 - samples/sec: 206.71 - lr: 0.000107 - momentum: 0.000000
2023-10-06 23:45:38,344 ----------------------------------------------------------------------------------------------------
2023-10-06 23:45:38,345 EPOCH 4 done: loss 0.1436 - lr: 0.000107
2023-10-06 23:45:44,979 DEV : loss 0.14160457253456116 - f1-score (micro avg) 0.8403
2023-10-06 23:45:44,984 saving best model
2023-10-06 23:45:45,900 ----------------------------------------------------------------------------------------------------
2023-10-06 23:45:56,654 epoch 5 - iter 27/275 - loss 0.07496784 - time (sec): 10.75 - samples/sec: 203.59 - lr: 0.000105 - momentum: 0.000000
2023-10-06 23:46:07,540 epoch 5 - iter 54/275 - loss 0.08379570 - time (sec): 21.64 - samples/sec: 208.52 - lr: 0.000103 - momentum: 0.000000
2023-10-06 23:46:18,023 epoch 5 - iter 81/275 - loss 0.08385845 - time (sec): 32.12 - samples/sec: 207.12 - lr: 0.000102 - momentum: 0.000000
2023-10-06 23:46:28,720 epoch 5 - iter 108/275 - loss 0.09152714 - time (sec): 42.82 - samples/sec: 205.43 - lr: 0.000100 - momentum: 0.000000
2023-10-06 23:46:39,091 epoch 5 - iter 135/275 - loss 0.09302558 - time (sec): 53.19 - samples/sec: 204.37 - lr: 0.000098 - momentum: 0.000000
2023-10-06 23:46:50,304 epoch 5 - iter 162/275 - loss 0.08986553 - time (sec): 64.40 - samples/sec: 205.01 - lr: 0.000096 - momentum: 0.000000
2023-10-06 23:47:01,183 epoch 5 - iter 189/275 - loss 0.08459261 - time (sec): 75.28 - samples/sec: 206.04 - lr: 0.000095 - momentum: 0.000000
2023-10-06 23:47:12,185 epoch 5 - iter 216/275 - loss 0.08318350 - time (sec): 86.28 - samples/sec: 205.93 - lr: 0.000093 - momentum: 0.000000
2023-10-06 23:47:22,762 epoch 5 - iter 243/275 - loss 0.09007042 - time (sec): 96.86 - samples/sec: 206.23 - lr: 0.000091 - momentum: 0.000000
2023-10-06 23:47:33,559 epoch 5 - iter 270/275 - loss 0.08780662 - time (sec): 107.66 - samples/sec: 206.50 - lr: 0.000089 - momentum: 0.000000
2023-10-06 23:47:35,966 ----------------------------------------------------------------------------------------------------
2023-10-06 23:47:35,966 EPOCH 5 done: loss 0.0890 - lr: 0.000089
2023-10-06 23:47:42,635 DEV : loss 0.12801072001457214 - f1-score (micro avg) 0.8671
2023-10-06 23:47:42,641 saving best model
2023-10-06 23:47:43,559
----------------------------------------------------------------------------------------------------
2023-10-06 23:47:54,562 epoch 6 - iter 27/275 - loss 0.06769972 - time (sec): 11.00 - samples/sec: 207.52 - lr: 0.000087 - momentum: 0.000000
2023-10-06 23:48:04,606 epoch 6 - iter 54/275 - loss 0.08070495 - time (sec): 21.05 - samples/sec: 204.89 - lr: 0.000086 - momentum: 0.000000
2023-10-06 23:48:15,124 epoch 6 - iter 81/275 - loss 0.07360525 - time (sec): 31.56 - samples/sec: 204.29 - lr: 0.000084 - momentum: 0.000000
2023-10-06 23:48:25,640 epoch 6 - iter 108/275 - loss 0.07944394 - time (sec): 42.08 - samples/sec: 203.62 - lr: 0.000082 - momentum: 0.000000
2023-10-06 23:48:36,668 epoch 6 - iter 135/275 - loss 0.07420191 - time (sec): 53.11 - samples/sec: 202.67 - lr: 0.000080 - momentum: 0.000000
2023-10-06 23:48:47,943 epoch 6 - iter 162/275 - loss 0.07552311 - time (sec): 64.38 - samples/sec: 204.72 - lr: 0.000079 - momentum: 0.000000
2023-10-06 23:48:58,508 epoch 6 - iter 189/275 - loss 0.07600925 - time (sec): 74.95 - samples/sec: 204.38 - lr: 0.000077 - momentum: 0.000000
2023-10-06 23:49:09,816 epoch 6 - iter 216/275 - loss 0.08172512 - time (sec): 86.26 - samples/sec: 205.09 - lr: 0.000075 - momentum: 0.000000
2023-10-06 23:49:20,901 epoch 6 - iter 243/275 - loss 0.07486038 - time (sec): 97.34 - samples/sec: 206.00 - lr: 0.000073 - momentum: 0.000000
2023-10-06 23:49:31,721 epoch 6 - iter 270/275 - loss 0.06979166 - time (sec): 108.16 - samples/sec: 206.76 - lr: 0.000072 - momentum: 0.000000
2023-10-06 23:49:33,748 ----------------------------------------------------------------------------------------------------
2023-10-06 23:49:33,749 EPOCH 6 done: loss 0.0689 - lr: 0.000072
2023-10-06 23:49:40,392 DEV : loss 0.1296079158782959 - f1-score (micro avg) 0.864
2023-10-06 23:49:40,398 ----------------------------------------------------------------------------------------------------
2023-10-06 23:49:50,352 epoch 7 - iter 27/275 - loss 0.05177797 - time (sec): 9.95 - samples/sec: 199.84 - lr: 0.000070 - momentum: 0.000000
2023-10-06 23:50:02,018 epoch 7 - iter 54/275 - loss 0.04259000 - time (sec): 21.62 - samples/sec: 205.52 - lr: 0.000068 - momentum: 0.000000
2023-10-06 23:50:12,879 epoch 7 - iter 81/275 - loss 0.04131591 - time (sec): 32.48 - samples/sec: 207.36 - lr: 0.000066 - momentum: 0.000000
2023-10-06 23:50:23,584 epoch 7 - iter 108/275 - loss 0.04032018 - time (sec): 43.18 - samples/sec: 208.27 - lr: 0.000064 - momentum: 0.000000
2023-10-06 23:50:34,555 epoch 7 - iter 135/275 - loss 0.05017223 - time (sec): 54.16 - samples/sec: 210.23 - lr: 0.000063 - momentum: 0.000000
2023-10-06 23:50:45,553 epoch 7 - iter 162/275 - loss 0.05245721 - time (sec): 65.15 - samples/sec: 210.40 - lr: 0.000061 - momentum: 0.000000
2023-10-06 23:50:56,163 epoch 7 - iter 189/275 - loss 0.05097137 - time (sec): 75.76 - samples/sec: 209.34 - lr: 0.000059 - momentum: 0.000000
2023-10-06 23:51:07,012 epoch 7 - iter 216/275 - loss 0.05347213 - time (sec): 86.61 - samples/sec: 208.85 - lr: 0.000058 - momentum: 0.000000
2023-10-06 23:51:17,751 epoch 7 - iter 243/275 - loss 0.05400938 - time (sec): 97.35 - samples/sec: 207.33 - lr: 0.000056 - momentum: 0.000000
2023-10-06 23:51:28,984 epoch 7 - iter 270/275 - loss 0.05386263 - time (sec): 108.58 - samples/sec: 206.98 - lr: 0.000054 - momentum: 0.000000
2023-10-06 23:51:30,610 ----------------------------------------------------------------------------------------------------
2023-10-06 23:51:30,610 EPOCH 7 done: loss 0.0555 - lr: 0.000054
2023-10-06 23:51:37,281 DEV : loss 0.1361655443906784 - f1-score (micro avg) 0.8636
2023-10-06 23:51:37,287 ----------------------------------------------------------------------------------------------------
2023-10-06 23:51:47,916 epoch 8 - iter 27/275 - loss 0.06601610 - time (sec): 10.63 - samples/sec: 202.77 - lr: 0.000052 - momentum: 0.000000
2023-10-06 23:51:58,641 epoch 8 - iter 54/275 - loss 0.04950539 - time (sec): 21.35 - samples/sec: 204.43 - lr: 0.000050 - momentum: 0.000000
2023-10-06 23:52:08,922 epoch 8 - iter 81/275 - loss 0.04331609 - time (sec): 31.63 - samples/sec: 204.37 - lr: 0.000048 - momentum: 0.000000
2023-10-06 23:52:19,388 epoch 8 - iter 108/275 - loss 0.04614914 - time (sec): 42.10 - samples/sec: 205.20 - lr: 0.000047 - momentum: 0.000000
2023-10-06 23:52:30,210 epoch 8 - iter 135/275 - loss 0.04191499 - time (sec): 52.92 - samples/sec: 204.68 - lr: 0.000045 - momentum: 0.000000
2023-10-06 23:52:41,619 epoch 8 - iter 162/275 - loss 0.04206415 - time (sec): 64.33 - samples/sec: 206.48 - lr: 0.000043 - momentum: 0.000000
2023-10-06 23:52:53,301 epoch 8 - iter 189/275 - loss 0.04251098 - time (sec): 76.01 - samples/sec: 206.23 - lr: 0.000042 - momentum: 0.000000
2023-10-06 23:53:04,768 epoch 8 - iter 216/275 - loss 0.04682950 - time (sec): 87.48 - samples/sec: 206.65 - lr: 0.000040 - momentum: 0.000000
2023-10-06 23:53:15,298 epoch 8 - iter 243/275 - loss 0.04681140 - time (sec): 98.01 - samples/sec: 206.47 - lr: 0.000038 - momentum: 0.000000
2023-10-06 23:53:25,783 epoch 8 - iter 270/275 - loss 0.04555660 - time (sec): 108.49 - samples/sec: 206.02 - lr: 0.000036 - momentum: 0.000000
2023-10-06 23:53:27,850 ----------------------------------------------------------------------------------------------------
2023-10-06 23:53:27,851 EPOCH 8 done: loss 0.0449 - lr: 0.000036
2023-10-06 23:53:34,523 DEV : loss 0.13024164736270905 - f1-score (micro avg) 0.8854
2023-10-06 23:53:34,529 saving best model
2023-10-06 23:53:35,455 ----------------------------------------------------------------------------------------------------
2023-10-06 23:53:45,556 epoch 9 - iter 27/275 - loss 0.03640670 - time (sec): 10.10 - samples/sec: 198.63 - lr: 0.000034 - momentum: 0.000000
2023-10-06 23:53:56,381 epoch 9 - iter 54/275 - loss 0.04753717 - time (sec): 20.92 - samples/sec: 204.35 - lr: 0.000032 - momentum: 0.000000
2023-10-06 23:54:07,187 epoch 9 - iter 81/275 - loss 0.05212401 - time (sec): 31.73 - samples/sec: 206.08 - lr: 0.000031 - momentum: 0.000000
2023-10-06 23:54:17,311 epoch 9 - iter 108/275 - loss 0.04952237 - time (sec): 41.85 - samples/sec: 203.06 - lr: 0.000029 - momentum: 0.000000
2023-10-06 23:54:28,090 epoch 9 - iter 135/275 - loss 0.04447321 - time (sec): 52.63 - samples/sec: 203.06 - lr: 0.000027 - momentum: 0.000000
2023-10-06 23:54:39,213 epoch 9 - iter 162/275 - loss 0.04118156 - time (sec): 63.76 - samples/sec: 204.40 - lr: 0.000026 - momentum: 0.000000
2023-10-06 23:54:50,637 epoch 9 - iter 189/275 - loss 0.04036277 - time (sec): 75.18 - samples/sec: 206.00 - lr: 0.000024 - momentum: 0.000000
2023-10-06 23:55:02,057 epoch 9 - iter 216/275 - loss 0.03812569 - time (sec): 86.60 - samples/sec: 206.33 - lr: 0.000022 - momentum: 0.000000
2023-10-06 23:55:12,859 epoch 9 - iter 243/275 - loss 0.03872181 - time (sec): 97.40 - samples/sec: 206.03 - lr: 0.000020 - momentum: 0.000000
2023-10-06 23:55:23,676 epoch 9 - iter 270/275 - loss 0.03856253 - time (sec): 108.22 - samples/sec: 206.34 - lr: 0.000019 - momentum: 0.000000
2023-10-06 23:55:25,731 ----------------------------------------------------------------------------------------------------
2023-10-06 23:55:25,731 EPOCH 9 done: loss 0.0386 - lr: 0.000019
2023-10-06 23:55:32,396 DEV : loss 0.1292894184589386 - f1-score (micro avg) 0.8867
2023-10-06 23:55:32,401 saving best model
2023-10-06 23:55:33,312 ----------------------------------------------------------------------------------------------------
2023-10-06 23:55:43,914 epoch 10 - iter 27/275 - loss 0.03171847 - time (sec): 10.60 - samples/sec: 199.81 - lr: 0.000017 - momentum: 0.000000
2023-10-06 23:55:55,585 epoch 10 - iter 54/275 - loss 0.02231766 - time (sec): 22.27 - samples/sec: 201.97 - lr: 0.000015 - momentum: 0.000000
2023-10-06 23:56:05,592 epoch 10 - iter 81/275 - loss 0.03651030 - time (sec): 32.28 - samples/sec: 199.64 - lr: 0.000013 - momentum: 0.000000
2023-10-06 23:56:15,804 epoch 10 - iter 108/275 - loss 0.03816859 - time (sec): 42.49 - samples/sec: 199.62 - lr: 0.000011 - momentum: 0.000000
2023-10-06 23:56:27,109 epoch 10 - iter 135/275 - loss 0.03758415 - time (sec): 53.80 - samples/sec: 202.81 - lr: 0.000010 - momentum: 0.000000
2023-10-06 23:56:38,127 epoch 10 - iter 162/275 - loss 0.03607183 - time (sec): 64.81 - samples/sec: 203.06 - lr: 0.000008 - momentum: 0.000000
2023-10-06 23:56:48,330 epoch 10 - iter 189/275 - loss 0.03658297 - time (sec): 75.02 - samples/sec: 202.22 - lr: 0.000006 - momentum: 0.000000
2023-10-06 23:56:59,566 epoch 10 - iter 216/275 - loss 0.03641231 - time (sec): 86.25 - samples/sec: 203.26 - lr: 0.000004 - momentum: 0.000000
2023-10-06 23:57:10,474 epoch 10 - iter 243/275 - loss 0.03684785 - time (sec): 97.16 - samples/sec: 204.28 - lr: 0.000003 - momentum: 0.000000
2023-10-06 23:57:21,852 epoch 10 - iter 270/275 - loss 0.03541448 - time (sec): 108.54 - samples/sec: 205.31 - lr: 0.000001 - momentum: 0.000000
2023-10-06 23:57:23,954 ----------------------------------------------------------------------------------------------------
2023-10-06 23:57:23,954 EPOCH 10 done: loss 0.0354 - lr: 0.000001
2023-10-06 23:57:30,618 DEV : loss 0.13039009273052216 - f1-score (micro avg) 0.8867
2023-10-06 23:57:31,531 ----------------------------------------------------------------------------------------------------
2023-10-06 23:57:31,532 Loading model from best epoch ...
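The checkpoint that gets reloaded here is the one written at the last "saving best model" line, epoch 9: the dev micro-F1 of epoch 10 (0.8867) only ties, and a tie does not trigger a new save. A minimal sketch of that selection rule, using the dev scores copied from the log (the strict-improvement behavior and the "no save at 0.0" initialization are assumptions about Flair's checkpointing, not its exact code):

```python
# Dev micro-F1 per epoch, copied from the DEV lines in the log above.
dev_f1 = [0.0, 0.5569, 0.7707, 0.8403, 0.8671, 0.864, 0.8636, 0.8854, 0.8867, 0.8867]

# Assumed rule: "saving best model" fires only on strict improvement over the
# running best, so the reloaded checkpoint is the FIRST epoch reaching the max.
best_epoch, best_score = None, 0.0
for epoch, score in enumerate(dev_f1, start=1):
    if score > best_score:      # strict: the epoch-10 tie does not re-save
        best_score, best_epoch = score, epoch

print(best_epoch, best_score)
```

This matches the log: saves after epochs 2-5, 8 and 9, none after the flat or slightly worse epochs 6, 7 and 10.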
2023-10-06 23:57:34,402 SequenceTagger predicts: Dictionary with 25 tags: O, S-scope, B-scope, E-scope, I-scope, S-pers, B-pers, E-pers, I-pers, S-work, B-work, E-work, I-work, S-loc, B-loc, E-loc, I-loc, S-object, B-object, E-object, I-object, S-date, B-date, E-date, I-date
2023-10-06 23:57:41,638 Results:
- F-score (micro) 0.9141
- F-score (macro) 0.5476
- Accuracy 0.8561

By class:
              precision    recall  f1-score   support

       scope     0.9106    0.9261    0.9183       176
        pers     0.9385    0.9531    0.9457       128
        work     0.8571    0.8919    0.8742        74
      object     0.0000    0.0000    0.0000         2
         loc     0.0000    0.0000    0.0000         2

   micro avg     0.9093    0.9188    0.9141       382
   macro avg     0.5412    0.5542    0.5476       382
weighted avg     0.9001    0.9188    0.9093       382

2023-10-06 23:57:41,638 ----------------------------------------------------------------------------------------------------
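The averaged rows of the final report can be sanity-checked from the per-class rows: macro-F1 is the unweighted mean of the five class F1 scores (the two rare classes object and loc, with 2 gold spans each, score 0.0 and pull it down to ~0.55), weighted-F1 is the support-weighted mean, and micro-F1 is the harmonic mean of the micro-averaged precision and recall. A small verification sketch, using the values copied from the table (tiny deviations come from the table's own rounding of precision/recall):

```python
# Per-class rows copied from the final evaluation table above.
classes = {
    # name: (precision, recall, f1, support)
    "scope":  (0.9106, 0.9261, 0.9183, 176),
    "pers":   (0.9385, 0.9531, 0.9457, 128),
    "work":   (0.8571, 0.8919, 0.8742,  74),
    "object": (0.0000, 0.0000, 0.0000,   2),
    "loc":    (0.0000, 0.0000, 0.0000,   2),
}

total = sum(s for *_, s in classes.values())                       # 382 spans
macro_f1 = sum(f1 for _, _, f1, _ in classes.values()) / len(classes)
weighted_f1 = sum(f1 * s for _, _, f1, s in classes.values()) / total

# Micro-F1 = harmonic mean of micro-averaged precision and recall.
micro_p, micro_r = 0.9093, 0.9188
micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)

# macro_f1 and weighted_f1 match the table exactly (0.5476, 0.9093);
# micro_f1 matches 0.9141 up to the rounding already present in P/R.
print(round(macro_f1, 4), round(weighted_f1, 4), round(micro_f1, 4))
```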