2023-10-07 00:16:56,250 ----------------------------------------------------------------------------------------------------
2023-10-07 00:16:56,251 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): T5LayerNorm()
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=25, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-07 00:16:56,251 ----------------------------------------------------------------------------------------------------
2023-10-07 00:16:56,251 MultiCorpus: 1100 train + 206 dev + 240 test sentences
 - NER_HIPE_2022 Corpus: 1100 train + 206 dev + 240 test sentences - /app/.flair/datasets/ner_hipe_2022/v2.1/ajmc/de/with_doc_seperator
2023-10-07 00:16:56,251 ----------------------------------------------------------------------------------------------------
2023-10-07 00:16:56,251 Train: 1100 sentences
2023-10-07 00:16:56,251 (train_with_dev=False, train_with_test=False)
2023-10-07 00:16:56,252 ----------------------------------------------------------------------------------------------------
2023-10-07 00:16:56,252 Training Params:
2023-10-07 00:16:56,252 - learning_rate: "0.00016"
2023-10-07 00:16:56,252 - mini_batch_size: "8"
2023-10-07 00:16:56,252 - max_epochs: "10"
2023-10-07 00:16:56,252 - shuffle: "True"
2023-10-07 00:16:56,252 ----------------------------------------------------------------------------------------------------
2023-10-07 00:16:56,252 Plugins:
2023-10-07 00:16:56,252 - TensorboardLogger
2023-10-07 00:16:56,252 - LinearScheduler | warmup_fraction: '0.1'
2023-10-07 00:16:56,252 ----------------------------------------------------------------------------------------------------
2023-10-07 00:16:56,252 Final evaluation on model from best epoch (best-model.pt)
2023-10-07 00:16:56,252 - metric: "('micro avg', 'f1-score')"
2023-10-07 00:16:56,252 ----------------------------------------------------------------------------------------------------
2023-10-07 00:16:56,252 Computation:
2023-10-07 00:16:56,252 - compute on device: cuda:0
2023-10-07 00:16:56,252 - embedding storage: none
2023-10-07 00:16:56,252 ----------------------------------------------------------------------------------------------------
2023-10-07 00:16:56,252 Model training base path: "hmbench-ajmc/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-3"
2023-10-07 00:16:56,252 ----------------------------------------------------------------------------------------------------
2023-10-07 00:16:56,253 ----------------------------------------------------------------------------------------------------
2023-10-07 00:16:56,253 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-07 00:17:05,291 epoch 1 - iter 13/138 - loss 3.20987785 - time (sec): 9.04 - samples/sec: 232.04 - lr: 0.000014 - momentum: 0.000000
2023-10-07 00:17:14,933 epoch 1 - iter 26/138 - loss 3.20425074 - time (sec): 18.68 - samples/sec: 231.54 - lr: 0.000029 - momentum: 0.000000
2023-10-07 00:17:25,113 epoch 1 - iter 39/138 - loss 3.19417174 - time (sec): 28.86 - samples/sec: 231.22 - lr: 0.000044 - momentum: 0.000000
2023-10-07 00:17:35,007 epoch 1 - iter 52/138 - loss 3.17802627 - time (sec): 38.75 - samples/sec: 228.81 - lr: 0.000059 - momentum: 0.000000
2023-10-07 00:17:45,119 epoch 1 - iter 65/138 - loss 3.15047188 - time (sec): 48.87 - samples/sec: 228.36 - lr: 0.000074 - momentum: 0.000000
2023-10-07 00:17:54,810 epoch 1 - iter 78/138 - loss 3.10582695 - time (sec): 58.56 - samples/sec: 227.63 - lr: 0.000089 - momentum: 0.000000
2023-10-07 00:18:04,288 epoch 1 - iter 91/138 - loss 3.04601336 - time (sec): 68.03 - samples/sec: 225.92 - lr: 0.000104 - momentum: 0.000000
2023-10-07 00:18:13,398 epoch 1 - iter 104/138 - loss 2.98001811 - time (sec): 77.14 - samples/sec: 225.36 - lr: 0.000119 - momentum: 0.000000
2023-10-07 00:18:22,907 epoch 1 - iter 117/138 - loss 2.90532182 - time (sec): 86.65 - samples/sec: 225.74 - lr: 0.000134 - momentum: 0.000000
2023-10-07 00:18:31,964 epoch 1 - iter 130/138 - loss 2.82859113 - time (sec): 95.71 - samples/sec: 224.81 - lr: 0.000150 - momentum: 0.000000
2023-10-07 00:18:37,712 ----------------------------------------------------------------------------------------------------
2023-10-07 00:18:37,713 EPOCH 1 done: loss 2.7748 - lr: 0.000150
2023-10-07 00:18:44,373 DEV : loss 1.7666748762130737 - f1-score (micro avg) 0.0
2023-10-07 00:18:44,378 ----------------------------------------------------------------------------------------------------
2023-10-07 00:18:54,057 epoch 2 - iter 13/138 - loss 1.66897079 - time (sec): 9.68 - samples/sec: 237.03 - lr: 0.000158 - momentum: 0.000000
2023-10-07 00:19:03,862 epoch 2 - iter 26/138 - loss 1.61192996 - time (sec): 19.48 - samples/sec: 233.80 - lr: 0.000157 - momentum: 0.000000
2023-10-07 00:19:13,618 epoch 2 - iter 39/138 - loss 1.50140710 - time (sec): 29.24 - samples/sec: 228.19 - lr: 0.000155 - momentum: 0.000000
2023-10-07 00:19:23,629 epoch 2 - iter 52/138 - loss 1.42184205 - time (sec): 39.25 - samples/sec: 229.10 - lr: 0.000153 - momentum: 0.000000
2023-10-07 00:19:33,185 epoch 2 - iter 65/138 - loss 1.35730933 - time (sec): 48.81 - samples/sec: 228.39 - lr: 0.000152 - momentum: 0.000000
2023-10-07 00:19:42,763 epoch 2 - iter 78/138 - loss 1.28942228 - time (sec): 58.38 - samples/sec: 227.49 - lr: 0.000150 - momentum: 0.000000
2023-10-07 00:19:52,009 epoch 2 - iter 91/138 - loss 1.25068538 - time (sec): 67.63 - samples/sec: 225.92 - lr: 0.000148 - momentum: 0.000000
2023-10-07 00:20:01,311 epoch 2 - iter 104/138 - loss 1.21595365 - time (sec): 76.93 - samples/sec: 225.74 - lr: 0.000147 - momentum: 0.000000
2023-10-07 00:20:10,549 epoch 2 - iter 117/138 - loss 1.17883841 - time (sec): 86.17 - samples/sec: 224.90 - lr: 0.000145 - momentum: 0.000000
2023-10-07 00:20:20,033 epoch 2 - iter 130/138 - loss 1.13417483 - time (sec): 95.65 - samples/sec: 225.32 - lr: 0.000143 - momentum: 0.000000
2023-10-07 00:20:25,523 ----------------------------------------------------------------------------------------------------
2023-10-07 00:20:25,523 EPOCH 2 done: loss 1.1077 - lr: 0.000143
2023-10-07 00:20:32,135 DEV : loss 0.7003989815711975 - f1-score (micro avg) 0.0
2023-10-07 00:20:32,140 ----------------------------------------------------------------------------------------------------
2023-10-07 00:20:41,926 epoch 3 - iter 13/138 - loss 0.72502244 - time (sec): 9.79 - samples/sec: 229.22 - lr: 0.000141 - momentum: 0.000000
2023-10-07 00:20:51,078 epoch 3 - iter 26/138 - loss 0.65066668 - time (sec): 18.94 - samples/sec: 227.55 - lr: 0.000139 - momentum: 0.000000
2023-10-07 00:21:01,085 epoch 3 - iter 39/138 - loss 0.61943348 - time (sec): 28.94 - samples/sec: 229.83 - lr: 0.000137 - momentum: 0.000000
2023-10-07 00:21:10,438 epoch 3 - iter 52/138 - loss 0.59005170 - time (sec): 38.30 - samples/sec: 227.02 - lr: 0.000136 - momentum: 0.000000
2023-10-07 00:21:20,650 epoch 3 - iter 65/138 - loss 0.56795034 - time (sec): 48.51 - samples/sec: 227.40 - lr: 0.000134 - momentum: 0.000000
2023-10-07 00:21:30,091 epoch 3 - iter 78/138 - loss 0.55126050 - time (sec): 57.95 - samples/sec: 226.27 - lr: 0.000132 - momentum: 0.000000
2023-10-07 00:21:39,729 epoch 3 - iter 91/138 - loss 0.53964843 - time (sec): 67.59 - samples/sec: 225.53 - lr: 0.000131 - momentum: 0.000000
2023-10-07 00:21:50,093 epoch 3 - iter 104/138 - loss 0.52402792 - time (sec): 77.95 - samples/sec: 226.96 - lr: 0.000129 - momentum: 0.000000
2023-10-07 00:21:58,814 epoch 3 - iter 117/138 - loss 0.50653508 - time (sec): 86.67 - samples/sec: 224.94 - lr: 0.000127 - momentum: 0.000000
2023-10-07 00:22:08,092 epoch 3 - iter 130/138 - loss 0.50564663 - time (sec): 95.95 - samples/sec: 225.20 - lr: 0.000126 - momentum: 0.000000
2023-10-07 00:22:13,392 ----------------------------------------------------------------------------------------------------
2023-10-07 00:22:13,393 EPOCH 3 done: loss 0.5048 - lr: 0.000126
2023-10-07 00:22:19,968 DEV : loss 0.3794858157634735 - f1-score (micro avg) 0.6922
2023-10-07 00:22:19,973 saving best model
2023-10-07 00:22:20,794 ----------------------------------------------------------------------------------------------------
2023-10-07 00:22:30,239 epoch 4 - iter 13/138 - loss 0.43133783 - time (sec): 9.44 - samples/sec: 230.72 - lr: 0.000123 - momentum: 0.000000
2023-10-07 00:22:39,901 epoch 4 - iter 26/138 - loss 0.40345726 - time (sec): 19.11 - samples/sec: 229.66 - lr: 0.000121 - momentum: 0.000000
2023-10-07 00:22:49,780 epoch 4 - iter 39/138 - loss 0.36943352 - time (sec): 28.99 - samples/sec: 227.63 - lr: 0.000120 - momentum: 0.000000
2023-10-07 00:22:58,746 epoch 4 - iter 52/138 - loss 0.35749811 - time (sec): 37.95 - samples/sec: 224.77 - lr: 0.000118 - momentum: 0.000000
2023-10-07 00:23:08,595 epoch 4 - iter 65/138 - loss 0.34188002 - time (sec): 47.80 - samples/sec: 225.17 - lr: 0.000116 - momentum: 0.000000
2023-10-07 00:23:18,799 epoch 4 - iter 78/138 - loss 0.33821449 - time (sec): 58.00 - samples/sec: 227.05 - lr: 0.000115 - momentum: 0.000000
2023-10-07 00:23:28,030 epoch 4 - iter 91/138 - loss 0.33026077 - time (sec): 67.24 - samples/sec: 225.80 - lr: 0.000113 - momentum: 0.000000
2023-10-07 00:23:37,401 epoch 4 - iter 104/138 - loss 0.31683232 - time (sec): 76.61 - samples/sec: 224.98 - lr: 0.000111 - momentum: 0.000000
2023-10-07 00:23:46,759 epoch 4 - iter 117/138 - loss 0.30643412 - time (sec): 85.96 - samples/sec: 223.64 - lr: 0.000110 - momentum: 0.000000
2023-10-07 00:23:56,107 epoch 4 - iter 130/138 - loss 0.30029371 - time (sec): 95.31 - samples/sec: 224.38 - lr: 0.000108 - momentum: 0.000000
2023-10-07 00:24:01,999 ----------------------------------------------------------------------------------------------------
2023-10-07 00:24:02,000 EPOCH 4 done: loss 0.2998 - lr: 0.000108
2023-10-07 00:24:08,609 DEV : loss 0.23874996602535248 - f1-score (micro avg) 0.7368
2023-10-07 00:24:08,614 saving best model
2023-10-07 00:24:09,482 ----------------------------------------------------------------------------------------------------
2023-10-07 00:24:19,027 epoch 5 - iter 13/138 - loss 0.18661904 - time (sec): 9.54 - samples/sec: 218.26 - lr: 0.000105 - momentum: 0.000000
2023-10-07 00:24:28,588 epoch 5 - iter 26/138 - loss 0.20138618 - time (sec): 19.10 - samples/sec: 225.60 - lr: 0.000104 - momentum: 0.000000
2023-10-07 00:24:37,974 epoch 5 - iter 39/138 - loss 0.19886661 - time (sec): 28.49 - samples/sec: 225.37 - lr: 0.000102 - momentum: 0.000000
2023-10-07 00:24:47,176 epoch 5 - iter 52/138 - loss 0.20697436 - time (sec): 37.69 - samples/sec: 223.55 - lr: 0.000100 - momentum: 0.000000
2023-10-07 00:24:56,628 epoch 5 - iter 65/138 - loss 0.20524823 - time (sec): 47.15 - samples/sec: 223.63 - lr: 0.000099 - momentum: 0.000000
2023-10-07 00:25:05,962 epoch 5 - iter 78/138 - loss 0.20253462 - time (sec): 56.48 - samples/sec: 222.14 - lr: 0.000097 - momentum: 0.000000
2023-10-07 00:25:16,129 epoch 5 - iter 91/138 - loss 0.19761339 - time (sec): 66.65 - samples/sec: 224.76 - lr: 0.000095 - momentum: 0.000000
2023-10-07 00:25:25,832 epoch 5 - iter 104/138 - loss 0.19216663 - time (sec): 76.35 - samples/sec: 224.44 - lr: 0.000094 - momentum: 0.000000
2023-10-07 00:25:35,033 epoch 5 - iter 117/138 - loss 0.19141989 - time (sec): 85.55 - samples/sec: 224.49 - lr: 0.000092 - momentum: 0.000000
2023-10-07 00:25:44,579 epoch 5 - iter 130/138 - loss 0.19119707 - time (sec): 95.10 - samples/sec: 224.68 - lr: 0.000090 - momentum: 0.000000
2023-10-07 00:25:50,580 ----------------------------------------------------------------------------------------------------
2023-10-07 00:25:50,580 EPOCH 5 done: loss 0.1888 - lr: 0.000090
2023-10-07 00:25:57,191 DEV : loss 0.17390672862529755 - f1-score (micro avg) 0.8109
2023-10-07 00:25:57,197 saving best model
2023-10-07 00:25:58,071 ----------------------------------------------------------------------------------------------------
2023-10-07 00:26:07,875 epoch 6 - iter 13/138 - loss 0.15130469 - time (sec): 9.80 - samples/sec: 227.41 - lr: 0.000088 - momentum: 0.000000
2023-10-07 00:26:16,705 epoch 6 - iter 26/138 - loss 0.16728482 - time (sec): 18.63 - samples/sec: 224.66 - lr: 0.000086 - momentum: 0.000000
2023-10-07 00:26:26,229 epoch 6 - iter 39/138 - loss 0.15492027 - time (sec): 28.16 - samples/sec: 224.65 - lr: 0.000084 - momentum: 0.000000
2023-10-07 00:26:35,243 epoch 6 - iter 52/138 - loss 0.15016055 - time (sec): 37.17 - samples/sec: 220.66 - lr: 0.000083 - momentum: 0.000000
2023-10-07 00:26:45,070 epoch 6 - iter 65/138 - loss 0.13933937 - time (sec): 47.00 - samples/sec: 221.18 - lr: 0.000081 - momentum: 0.000000
2023-10-07 00:26:55,337 epoch 6 - iter 78/138 - loss 0.14271374 - time (sec): 57.26 - samples/sec: 222.11 - lr: 0.000079 - momentum: 0.000000
2023-10-07 00:27:04,451 epoch 6 - iter 91/138 - loss 0.14215746 - time (sec): 66.38 - samples/sec: 221.87 - lr: 0.000077 - momentum: 0.000000
2023-10-07 00:27:14,226 epoch 6 - iter 104/138 - loss 0.14585207 - time (sec): 76.15 - samples/sec: 222.54 - lr: 0.000076 - momentum: 0.000000
2023-10-07 00:27:24,216 epoch 6 - iter 117/138 - loss 0.13980804 - time (sec): 86.14 - samples/sec: 223.35 - lr: 0.000074 - momentum: 0.000000
2023-10-07 00:27:33,816 epoch 6 - iter 130/138 - loss 0.13442442 - time (sec): 95.74 - samples/sec: 224.71 - lr: 0.000072 - momentum: 0.000000
2023-10-07 00:27:39,472 ----------------------------------------------------------------------------------------------------
2023-10-07 00:27:39,473 EPOCH 6 done: loss 0.1310 - lr: 0.000072
2023-10-07 00:27:46,101 DEV : loss 0.1577848196029663 - f1-score (micro avg) 0.8091
2023-10-07 00:27:46,106 ----------------------------------------------------------------------------------------------------
2023-10-07 00:27:54,966 epoch 7 - iter 13/138 - loss 0.11973989 - time (sec): 8.86 - samples/sec: 215.94 - lr: 0.000070 - momentum: 0.000000
2023-10-07 00:28:05,176 epoch 7 - iter 26/138 - loss 0.09471475 - time (sec): 19.07 - samples/sec: 220.99 - lr: 0.000068 - momentum: 0.000000
2023-10-07 00:28:14,894 epoch 7 - iter 39/138 - loss 0.09268126 - time (sec): 28.79 - samples/sec: 224.44 - lr: 0.000066 - momentum: 0.000000
2023-10-07 00:28:24,468 epoch 7 - iter 52/138 - loss 0.09018254 - time (sec): 38.36 - samples/sec: 225.88 - lr: 0.000065 - momentum: 0.000000
2023-10-07 00:28:34,181 epoch 7 - iter 65/138 - loss 0.09647109 - time (sec): 48.07 - samples/sec: 227.71 - lr: 0.000063 - momentum: 0.000000
2023-10-07 00:28:44,052 epoch 7 - iter 78/138 - loss 0.10063479 - time (sec): 57.95 - samples/sec: 229.08 - lr: 0.000061 - momentum: 0.000000
2023-10-07 00:28:52,882 epoch 7 - iter 91/138 - loss 0.09702574 - time (sec): 66.78 - samples/sec: 226.84 - lr: 0.000060 - momentum: 0.000000
2023-10-07 00:29:02,611 epoch 7 - iter 104/138 - loss 0.09916712 - time (sec): 76.50 - samples/sec: 226.41 - lr: 0.000058 - momentum: 0.000000
2023-10-07 00:29:11,940 epoch 7 - iter 117/138 - loss 0.10195852 - time (sec): 85.83 - samples/sec: 225.47 - lr: 0.000056 - momentum: 0.000000
2023-10-07 00:29:21,630 epoch 7 - iter 130/138 - loss 0.09589982 - time (sec): 95.52 - samples/sec: 224.78 - lr: 0.000055 - momentum: 0.000000
2023-10-07 00:29:27,353 ----------------------------------------------------------------------------------------------------
2023-10-07 00:29:27,354 EPOCH 7 done: loss 0.0970 - lr: 0.000055
2023-10-07 00:29:33,977 DEV : loss 0.13684943318367004 - f1-score (micro avg) 0.8394
2023-10-07 00:29:33,982 saving best model
2023-10-07 00:29:34,854 ----------------------------------------------------------------------------------------------------
2023-10-07 00:29:43,896 epoch 8 - iter 13/138 - loss 0.11585992 - time (sec): 9.04 - samples/sec: 226.32 - lr: 0.000052 - momentum: 0.000000
2023-10-07 00:29:53,558 epoch 8 - iter 26/138 - loss 0.09084112 - time (sec): 18.70 - samples/sec: 224.84 - lr: 0.000050 - momentum: 0.000000
2023-10-07 00:30:02,560 epoch 8 - iter 39/138 - loss 0.07986481 - time (sec): 27.70 - samples/sec: 224.66 - lr: 0.000049 - momentum: 0.000000
2023-10-07 00:30:11,704 epoch 8 - iter 52/138 - loss 0.08459644 - time (sec): 36.85 - samples/sec: 225.19 - lr: 0.000047 - momentum: 0.000000
2023-10-07 00:30:21,294 epoch 8 - iter 65/138 - loss 0.08168173 - time (sec): 46.44 - samples/sec: 225.40 - lr: 0.000045 - momentum: 0.000000
2023-10-07 00:30:31,306 epoch 8 - iter 78/138 - loss 0.08031317 - time (sec): 56.45 - samples/sec: 226.70 - lr: 0.000044 - momentum: 0.000000
2023-10-07 00:30:41,171 epoch 8 - iter 91/138 - loss 0.08060537 - time (sec): 66.32 - samples/sec: 225.96 - lr: 0.000042 - momentum: 0.000000
2023-10-07 00:30:51,906 epoch 8 - iter 104/138 - loss 0.07999972 - time (sec): 77.05 - samples/sec: 226.34 - lr: 0.000040 - momentum: 0.000000
2023-10-07 00:31:01,659 epoch 8 - iter 117/138 - loss 0.07927948 - time (sec): 86.80 - samples/sec: 226.30 - lr: 0.000039 - momentum: 0.000000
2023-10-07 00:31:10,216 epoch 8 - iter 130/138 - loss 0.08110622 - time (sec): 95.36 - samples/sec: 225.09 - lr: 0.000037 - momentum: 0.000000
2023-10-07 00:31:16,089 ----------------------------------------------------------------------------------------------------
2023-10-07 00:31:16,089 EPOCH 8 done: loss 0.0784 - lr: 0.000037
2023-10-07 00:31:22,728 DEV : loss 0.12365750968456268 - f1-score (micro avg) 0.8599
2023-10-07 00:31:22,733 saving best model
2023-10-07 00:31:23,611 ----------------------------------------------------------------------------------------------------
2023-10-07 00:31:32,442 epoch 9 - iter 13/138 - loss 0.07018430 - time (sec): 8.83 - samples/sec: 216.99 - lr: 0.000034 - momentum: 0.000000
2023-10-07 00:31:42,139 epoch 9 - iter 26/138 - loss 0.07333448 - time (sec): 18.53 - samples/sec: 223.40 - lr: 0.000033 - momentum: 0.000000
2023-10-07 00:31:51,697 epoch 9 - iter 39/138 - loss 0.08003951 - time (sec): 28.09 - samples/sec: 224.46 - lr: 0.000031 - momentum: 0.000000
2023-10-07 00:32:00,287 epoch 9 - iter 52/138 - loss 0.07785033 - time (sec): 36.67 - samples/sec: 221.27 - lr: 0.000029 - momentum: 0.000000
2023-10-07 00:32:09,908 epoch 9 - iter 65/138 - loss 0.07606347 - time (sec): 46.30 - samples/sec: 221.96 - lr: 0.000028 - momentum: 0.000000
2023-10-07 00:32:20,128 epoch 9 - iter 78/138 - loss 0.07173574 - time (sec): 56.52 - samples/sec: 222.70 - lr: 0.000026 - momentum: 0.000000
2023-10-07 00:32:30,079 epoch 9 - iter 91/138 - loss 0.07173073 - time (sec): 66.47 - samples/sec: 224.59 - lr: 0.000024 - momentum: 0.000000
2023-10-07 00:32:39,699 epoch 9 - iter 104/138 - loss 0.06641157 - time (sec): 76.09 - samples/sec: 224.68 - lr: 0.000023 - momentum: 0.000000
2023-10-07 00:32:49,859 epoch 9 - iter 117/138 - loss 0.06555111 - time (sec): 86.25 - samples/sec: 224.90 - lr: 0.000021 - momentum: 0.000000
2023-10-07 00:32:58,973 epoch 9 - iter 130/138 - loss 0.06634256 - time (sec): 95.36 - samples/sec: 224.20 - lr: 0.000019 - momentum: 0.000000
2023-10-07 00:33:04,871 ----------------------------------------------------------------------------------------------------
2023-10-07 00:33:04,871 EPOCH 9 done: loss 0.0672 - lr: 0.000019
2023-10-07 00:33:11,528 DEV : loss 0.12403400987386703 - f1-score (micro avg) 0.8643
2023-10-07 00:33:11,533 saving best model
2023-10-07 00:33:12,413 ----------------------------------------------------------------------------------------------------
2023-10-07 00:33:21,759 epoch 10 - iter 13/138 - loss 0.05474985 - time (sec): 9.34 - samples/sec: 217.35 - lr: 0.000017 - momentum: 0.000000
2023-10-07 00:33:31,975 epoch 10 - iter 26/138 - loss 0.05153700 - time (sec): 19.56 - samples/sec: 218.15 - lr: 0.000015 - momentum: 0.000000
2023-10-07 00:33:41,167 epoch 10 - iter 39/138 - loss 0.06022659 - time (sec): 28.75 - samples/sec: 218.18 - lr: 0.000013 - momentum: 0.000000
2023-10-07 00:33:50,209 epoch 10 - iter 52/138 - loss 0.06304792 - time (sec): 37.79 - samples/sec: 218.05 - lr: 0.000012 - momentum: 0.000000
2023-10-07 00:34:00,055 epoch 10 - iter 65/138 - loss 0.06265450 - time (sec): 47.64 - samples/sec: 220.87 - lr: 0.000010 - momentum: 0.000000
2023-10-07 00:34:09,514 epoch 10 - iter 78/138 - loss 0.06199970 - time (sec): 57.10 - samples/sec: 222.35 - lr: 0.000008 - momentum: 0.000000
2023-10-07 00:34:18,612 epoch 10 - iter 91/138 - loss 0.06346734 - time (sec): 66.20 - samples/sec: 221.91 - lr: 0.000007 - momentum: 0.000000
2023-10-07 00:34:28,346 epoch 10 - iter 104/138 - loss 0.06295073 - time (sec): 75.93 - samples/sec: 222.27 - lr: 0.000005 - momentum: 0.000000
2023-10-07 00:34:37,953 epoch 10 - iter 117/138 - loss 0.06330168 - time (sec): 85.54 - samples/sec: 222.57 - lr: 0.000003 - momentum: 0.000000
2023-10-07 00:34:48,160 epoch 10 - iter 130/138 - loss 0.06429653 - time (sec): 95.74 - samples/sec: 224.22 - lr: 0.000002 - momentum: 0.000000
2023-10-07 00:34:53,765 ----------------------------------------------------------------------------------------------------
2023-10-07 00:34:53,766 EPOCH 10 done: loss 0.0629 - lr: 0.000002
2023-10-07 00:35:00,445 DEV : loss 0.12417016178369522 - f1-score (micro avg) 0.8599
2023-10-07 00:35:01,284 ----------------------------------------------------------------------------------------------------
2023-10-07 00:35:01,285 Loading model from best epoch ...
2023-10-07 00:35:03,806 SequenceTagger predicts: Dictionary with 25 tags: O, S-scope, B-scope, E-scope, I-scope, S-pers, B-pers, E-pers, I-pers, S-work, B-work, E-work, I-work, S-loc, B-loc, E-loc, I-loc, S-object, B-object, E-object, I-object, S-date, B-date, E-date, I-date
2023-10-07 00:35:11,017
Results:
- F-score (micro) 0.8886
- F-score (macro) 0.5281
- Accuracy 0.8147

By class:
              precision    recall  f1-score   support

       scope     0.8939    0.9091    0.9014       176
        pers     0.9104    0.9531    0.9313       128
        work     0.7922    0.8243    0.8079        74
      object     0.0000    0.0000    0.0000         2
         loc     0.0000    0.0000    0.0000         2

   micro avg     0.8795    0.8979    0.8886       382
   macro avg     0.5193    0.5373    0.5281       382
weighted avg     0.8704    0.8979    0.8839       382

2023-10-07 00:35:11,017 ----------------------------------------------------------------------------------------------------