2023-10-07 00:55:30,892 ---------------------------------------------------------------------------------------------------- 2023-10-07 00:55:30,893 Model: "SequenceTagger( (embeddings): ByT5Embeddings( (model): T5EncoderModel( (shared): Embedding(384, 1472) (encoder): T5Stack( (embed_tokens): Embedding(384, 1472) (block): ModuleList( (0): T5Block( (layer): ModuleList( (0): T5LayerSelfAttention( (SelfAttention): T5Attention( (q): Linear(in_features=1472, out_features=384, bias=False) (k): Linear(in_features=1472, out_features=384, bias=False) (v): Linear(in_features=1472, out_features=384, bias=False) (o): Linear(in_features=384, out_features=1472, bias=False) (relative_attention_bias): Embedding(32, 6) ) (layer_norm): T5LayerNorm() (dropout): Dropout(p=0.1, inplace=False) ) (1): T5LayerFF( (DenseReluDense): T5DenseGatedActDense( (wi_0): Linear(in_features=1472, out_features=3584, bias=False) (wi_1): Linear(in_features=1472, out_features=3584, bias=False) (wo): Linear(in_features=3584, out_features=1472, bias=False) (dropout): Dropout(p=0.1, inplace=False) (act): NewGELUActivation() ) (layer_norm): T5LayerNorm() (dropout): Dropout(p=0.1, inplace=False) ) ) ) (1-11): 11 x T5Block( (layer): ModuleList( (0): T5LayerSelfAttention( (SelfAttention): T5Attention( (q): Linear(in_features=1472, out_features=384, bias=False) (k): Linear(in_features=1472, out_features=384, bias=False) (v): Linear(in_features=1472, out_features=384, bias=False) (o): Linear(in_features=384, out_features=1472, bias=False) ) (layer_norm): T5LayerNorm() (dropout): Dropout(p=0.1, inplace=False) ) (1): T5LayerFF( (DenseReluDense): T5DenseGatedActDense( (wi_0): Linear(in_features=1472, out_features=3584, bias=False) (wi_1): Linear(in_features=1472, out_features=3584, bias=False) (wo): Linear(in_features=3584, out_features=1472, bias=False) (dropout): Dropout(p=0.1, inplace=False) (act): NewGELUActivation() ) (layer_norm): T5LayerNorm() (dropout): Dropout(p=0.1, inplace=False) ) ) ) ) 
        (final_layer_norm): T5LayerNorm()
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=25, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-07 00:55:30,893 ----------------------------------------------------------------------------------------------------
2023-10-07 00:55:30,894 MultiCorpus: 1100 train + 206 dev + 240 test sentences
 - NER_HIPE_2022 Corpus: 1100 train + 206 dev + 240 test sentences - /app/.flair/datasets/ner_hipe_2022/v2.1/ajmc/de/with_doc_seperator
2023-10-07 00:55:30,894 ----------------------------------------------------------------------------------------------------
2023-10-07 00:55:30,894 Train: 1100 sentences
2023-10-07 00:55:30,894 (train_with_dev=False, train_with_test=False)
2023-10-07 00:55:30,894 ----------------------------------------------------------------------------------------------------
2023-10-07 00:55:30,894 Training Params:
2023-10-07 00:55:30,894 - learning_rate: "0.00016"
2023-10-07 00:55:30,894 - mini_batch_size: "4"
2023-10-07 00:55:30,894 - max_epochs: "10"
2023-10-07 00:55:30,894 - shuffle: "True"
2023-10-07 00:55:30,894 ----------------------------------------------------------------------------------------------------
2023-10-07 00:55:30,894 Plugins:
2023-10-07 00:55:30,894 - TensorboardLogger
2023-10-07 00:55:30,894 - LinearScheduler | warmup_fraction: '0.1'
2023-10-07 00:55:30,894 ----------------------------------------------------------------------------------------------------
2023-10-07 00:55:30,894 Final evaluation on model from best epoch (best-model.pt)
2023-10-07 00:55:30,894 - metric: "('micro avg', 'f1-score')"
2023-10-07 00:55:30,895 ----------------------------------------------------------------------------------------------------
2023-10-07 00:55:30,895 Computation:
2023-10-07 00:55:30,895 - compute on device: cuda:0
2023-10-07 00:55:30,895 - embedding storage: none
2023-10-07 00:55:30,895 
---------------------------------------------------------------------------------------------------- 2023-10-07 00:55:30,895 Model training base path: "hmbench-ajmc/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-4" 2023-10-07 00:55:30,895 ---------------------------------------------------------------------------------------------------- 2023-10-07 00:55:30,895 ---------------------------------------------------------------------------------------------------- 2023-10-07 00:55:30,895 Logging anything other than scalars to TensorBoard is currently not supported. 2023-10-07 00:55:41,090 epoch 1 - iter 27/275 - loss 3.24443246 - time (sec): 10.19 - samples/sec: 210.41 - lr: 0.000015 - momentum: 0.000000 2023-10-07 00:55:51,393 epoch 1 - iter 54/275 - loss 3.23701926 - time (sec): 20.50 - samples/sec: 212.66 - lr: 0.000031 - momentum: 0.000000 2023-10-07 00:56:02,480 epoch 1 - iter 81/275 - loss 3.21453984 - time (sec): 31.58 - samples/sec: 218.40 - lr: 0.000047 - momentum: 0.000000 2023-10-07 00:56:13,307 epoch 1 - iter 108/275 - loss 3.16837283 - time (sec): 42.41 - samples/sec: 215.20 - lr: 0.000062 - momentum: 0.000000 2023-10-07 00:56:23,838 epoch 1 - iter 135/275 - loss 3.08279244 - time (sec): 52.94 - samples/sec: 215.20 - lr: 0.000078 - momentum: 0.000000 2023-10-07 00:56:34,051 epoch 1 - iter 162/275 - loss 2.98754839 - time (sec): 63.15 - samples/sec: 212.89 - lr: 0.000094 - momentum: 0.000000 2023-10-07 00:56:44,247 epoch 1 - iter 189/275 - loss 2.88239138 - time (sec): 73.35 - samples/sec: 211.69 - lr: 0.000109 - momentum: 0.000000 2023-10-07 00:56:55,468 epoch 1 - iter 216/275 - loss 2.74673167 - time (sec): 84.57 - samples/sec: 212.75 - lr: 0.000125 - momentum: 0.000000 2023-10-07 00:57:05,742 epoch 1 - iter 243/275 - loss 2.62261849 - time (sec): 94.85 - samples/sec: 212.90 - lr: 0.000141 - momentum: 0.000000 2023-10-07 00:57:16,262 epoch 1 - iter 270/275 - loss 2.48968329 - 
time (sec): 105.37 - samples/sec: 212.62 - lr: 0.000157 - momentum: 0.000000 2023-10-07 00:57:18,045 ---------------------------------------------------------------------------------------------------- 2023-10-07 00:57:18,046 EPOCH 1 done: loss 2.4709 - lr: 0.000157 2023-10-07 00:57:24,299 DEV : loss 1.115329623222351 - f1-score (micro avg) 0.0 2023-10-07 00:57:24,305 ---------------------------------------------------------------------------------------------------- 2023-10-07 00:57:34,951 epoch 2 - iter 27/275 - loss 1.00094398 - time (sec): 10.65 - samples/sec: 216.80 - lr: 0.000158 - momentum: 0.000000 2023-10-07 00:57:45,264 epoch 2 - iter 54/275 - loss 0.94362358 - time (sec): 20.96 - samples/sec: 208.08 - lr: 0.000157 - momentum: 0.000000 2023-10-07 00:57:55,177 epoch 2 - iter 81/275 - loss 0.91479557 - time (sec): 30.87 - samples/sec: 204.89 - lr: 0.000155 - momentum: 0.000000 2023-10-07 00:58:05,579 epoch 2 - iter 108/275 - loss 0.87818056 - time (sec): 41.27 - samples/sec: 207.62 - lr: 0.000153 - momentum: 0.000000 2023-10-07 00:58:16,280 epoch 2 - iter 135/275 - loss 0.82434084 - time (sec): 51.97 - samples/sec: 209.03 - lr: 0.000151 - momentum: 0.000000 2023-10-07 00:58:26,740 epoch 2 - iter 162/275 - loss 0.79135315 - time (sec): 62.43 - samples/sec: 208.63 - lr: 0.000150 - momentum: 0.000000 2023-10-07 00:58:38,413 epoch 2 - iter 189/275 - loss 0.75582299 - time (sec): 74.11 - samples/sec: 210.20 - lr: 0.000148 - momentum: 0.000000 2023-10-07 00:58:48,976 epoch 2 - iter 216/275 - loss 0.72861169 - time (sec): 84.67 - samples/sec: 210.75 - lr: 0.000146 - momentum: 0.000000 2023-10-07 00:58:59,465 epoch 2 - iter 243/275 - loss 0.70252342 - time (sec): 95.16 - samples/sec: 211.41 - lr: 0.000144 - momentum: 0.000000 2023-10-07 00:59:09,614 epoch 2 - iter 270/275 - loss 0.68296208 - time (sec): 105.31 - samples/sec: 211.65 - lr: 0.000143 - momentum: 0.000000 2023-10-07 00:59:11,778 
---------------------------------------------------------------------------------------------------- 2023-10-07 00:59:11,778 EPOCH 2 done: loss 0.6746 - lr: 0.000143 2023-10-07 00:59:18,294 DEV : loss 0.41209274530410767 - f1-score (micro avg) 0.5789 2023-10-07 00:59:18,299 saving best model 2023-10-07 00:59:19,111 ---------------------------------------------------------------------------------------------------- 2023-10-07 00:59:29,314 epoch 3 - iter 27/275 - loss 0.41221825 - time (sec): 10.20 - samples/sec: 207.51 - lr: 0.000141 - momentum: 0.000000 2023-10-07 00:59:40,081 epoch 3 - iter 54/275 - loss 0.40527715 - time (sec): 20.97 - samples/sec: 213.23 - lr: 0.000139 - momentum: 0.000000 2023-10-07 00:59:51,134 epoch 3 - iter 81/275 - loss 0.36241217 - time (sec): 32.02 - samples/sec: 214.61 - lr: 0.000137 - momentum: 0.000000 2023-10-07 01:00:02,062 epoch 3 - iter 108/275 - loss 0.34465609 - time (sec): 42.95 - samples/sec: 213.46 - lr: 0.000135 - momentum: 0.000000 2023-10-07 01:00:12,289 epoch 3 - iter 135/275 - loss 0.33797591 - time (sec): 53.18 - samples/sec: 212.74 - lr: 0.000134 - momentum: 0.000000 2023-10-07 01:00:22,648 epoch 3 - iter 162/275 - loss 0.32917430 - time (sec): 63.54 - samples/sec: 212.01 - lr: 0.000132 - momentum: 0.000000 2023-10-07 01:00:33,465 epoch 3 - iter 189/275 - loss 0.31514816 - time (sec): 74.35 - samples/sec: 213.56 - lr: 0.000130 - momentum: 0.000000 2023-10-07 01:00:43,891 epoch 3 - iter 216/275 - loss 0.30660635 - time (sec): 84.78 - samples/sec: 212.45 - lr: 0.000128 - momentum: 0.000000 2023-10-07 01:00:54,409 epoch 3 - iter 243/275 - loss 0.29992221 - time (sec): 95.30 - samples/sec: 212.08 - lr: 0.000127 - momentum: 0.000000 2023-10-07 01:01:04,911 epoch 3 - iter 270/275 - loss 0.29138113 - time (sec): 105.80 - samples/sec: 211.92 - lr: 0.000125 - momentum: 0.000000 2023-10-07 01:01:06,716 ---------------------------------------------------------------------------------------------------- 2023-10-07 01:01:06,716 
EPOCH 3 done: loss 0.2913 - lr: 0.000125 2023-10-07 01:01:13,165 DEV : loss 0.21348148584365845 - f1-score (micro avg) 0.763 2023-10-07 01:01:13,170 saving best model 2023-10-07 01:01:14,049 ---------------------------------------------------------------------------------------------------- 2023-10-07 01:01:23,346 epoch 4 - iter 27/275 - loss 0.20827435 - time (sec): 9.30 - samples/sec: 189.87 - lr: 0.000123 - momentum: 0.000000 2023-10-07 01:01:34,246 epoch 4 - iter 54/275 - loss 0.17653774 - time (sec): 20.20 - samples/sec: 205.14 - lr: 0.000121 - momentum: 0.000000 2023-10-07 01:01:44,895 epoch 4 - iter 81/275 - loss 0.18622337 - time (sec): 30.85 - samples/sec: 209.63 - lr: 0.000119 - momentum: 0.000000 2023-10-07 01:01:55,896 epoch 4 - iter 108/275 - loss 0.18424493 - time (sec): 41.85 - samples/sec: 210.58 - lr: 0.000118 - momentum: 0.000000 2023-10-07 01:02:05,965 epoch 4 - iter 135/275 - loss 0.18539041 - time (sec): 51.92 - samples/sec: 208.98 - lr: 0.000116 - momentum: 0.000000 2023-10-07 01:02:16,802 epoch 4 - iter 162/275 - loss 0.17445705 - time (sec): 62.75 - samples/sec: 209.09 - lr: 0.000114 - momentum: 0.000000 2023-10-07 01:02:27,041 epoch 4 - iter 189/275 - loss 0.16461385 - time (sec): 72.99 - samples/sec: 208.15 - lr: 0.000112 - momentum: 0.000000 2023-10-07 01:02:38,001 epoch 4 - iter 216/275 - loss 0.16130998 - time (sec): 83.95 - samples/sec: 209.03 - lr: 0.000111 - momentum: 0.000000 2023-10-07 01:02:49,387 epoch 4 - iter 243/275 - loss 0.15975395 - time (sec): 95.34 - samples/sec: 209.13 - lr: 0.000109 - momentum: 0.000000 2023-10-07 01:03:00,756 epoch 4 - iter 270/275 - loss 0.15506250 - time (sec): 106.71 - samples/sec: 208.71 - lr: 0.000107 - momentum: 0.000000 2023-10-07 01:03:03,045 ---------------------------------------------------------------------------------------------------- 2023-10-07 01:03:03,045 EPOCH 4 done: loss 0.1550 - lr: 0.000107 2023-10-07 01:03:09,671 DEV : loss 0.1472424566745758 - f1-score (micro avg) 0.8397 
2023-10-07 01:03:09,676 saving best model 2023-10-07 01:03:10,560 ---------------------------------------------------------------------------------------------------- 2023-10-07 01:03:20,858 epoch 5 - iter 27/275 - loss 0.11595235 - time (sec): 10.30 - samples/sec: 212.98 - lr: 0.000105 - momentum: 0.000000 2023-10-07 01:03:31,113 epoch 5 - iter 54/275 - loss 0.10138073 - time (sec): 20.55 - samples/sec: 204.61 - lr: 0.000103 - momentum: 0.000000 2023-10-07 01:03:42,404 epoch 5 - iter 81/275 - loss 0.08570661 - time (sec): 31.84 - samples/sec: 209.75 - lr: 0.000102 - momentum: 0.000000 2023-10-07 01:03:52,573 epoch 5 - iter 108/275 - loss 0.07846763 - time (sec): 42.01 - samples/sec: 209.99 - lr: 0.000100 - momentum: 0.000000 2023-10-07 01:04:03,575 epoch 5 - iter 135/275 - loss 0.08006098 - time (sec): 53.01 - samples/sec: 211.34 - lr: 0.000098 - momentum: 0.000000 2023-10-07 01:04:13,820 epoch 5 - iter 162/275 - loss 0.08048465 - time (sec): 63.26 - samples/sec: 210.29 - lr: 0.000096 - momentum: 0.000000 2023-10-07 01:04:24,836 epoch 5 - iter 189/275 - loss 0.08506759 - time (sec): 74.28 - samples/sec: 211.32 - lr: 0.000095 - momentum: 0.000000 2023-10-07 01:04:35,413 epoch 5 - iter 216/275 - loss 0.08461603 - time (sec): 84.85 - samples/sec: 211.11 - lr: 0.000093 - momentum: 0.000000 2023-10-07 01:04:46,683 epoch 5 - iter 243/275 - loss 0.08541012 - time (sec): 96.12 - samples/sec: 212.00 - lr: 0.000091 - momentum: 0.000000 2023-10-07 01:04:57,110 epoch 5 - iter 270/275 - loss 0.09192583 - time (sec): 106.55 - samples/sec: 210.93 - lr: 0.000089 - momentum: 0.000000 2023-10-07 01:04:58,847 ---------------------------------------------------------------------------------------------------- 2023-10-07 01:04:58,847 EPOCH 5 done: loss 0.0916 - lr: 0.000089 2023-10-07 01:05:05,513 DEV : loss 0.13818754255771637 - f1-score (micro avg) 0.8473 2023-10-07 01:05:05,518 saving best model 2023-10-07 01:05:06,393 
---------------------------------------------------------------------------------------------------- 2023-10-07 01:05:17,437 epoch 6 - iter 27/275 - loss 0.06324321 - time (sec): 11.04 - samples/sec: 218.43 - lr: 0.000087 - momentum: 0.000000 2023-10-07 01:05:28,473 epoch 6 - iter 54/275 - loss 0.07326289 - time (sec): 22.08 - samples/sec: 214.06 - lr: 0.000086 - momentum: 0.000000 2023-10-07 01:05:38,955 epoch 6 - iter 81/275 - loss 0.07296084 - time (sec): 32.56 - samples/sec: 212.50 - lr: 0.000084 - momentum: 0.000000 2023-10-07 01:05:49,663 epoch 6 - iter 108/275 - loss 0.06654062 - time (sec): 43.27 - samples/sec: 209.30 - lr: 0.000082 - momentum: 0.000000 2023-10-07 01:05:59,765 epoch 6 - iter 135/275 - loss 0.07514183 - time (sec): 53.37 - samples/sec: 207.47 - lr: 0.000080 - momentum: 0.000000 2023-10-07 01:06:10,720 epoch 6 - iter 162/275 - loss 0.07133695 - time (sec): 64.33 - samples/sec: 208.75 - lr: 0.000079 - momentum: 0.000000 2023-10-07 01:06:21,532 epoch 6 - iter 189/275 - loss 0.07266455 - time (sec): 75.14 - samples/sec: 208.72 - lr: 0.000077 - momentum: 0.000000 2023-10-07 01:06:32,273 epoch 6 - iter 216/275 - loss 0.06751770 - time (sec): 85.88 - samples/sec: 208.87 - lr: 0.000075 - momentum: 0.000000 2023-10-07 01:06:42,182 epoch 6 - iter 243/275 - loss 0.06759819 - time (sec): 95.79 - samples/sec: 208.15 - lr: 0.000073 - momentum: 0.000000 2023-10-07 01:06:53,776 epoch 6 - iter 270/275 - loss 0.06724337 - time (sec): 107.38 - samples/sec: 208.17 - lr: 0.000072 - momentum: 0.000000 2023-10-07 01:06:55,742 ---------------------------------------------------------------------------------------------------- 2023-10-07 01:06:55,742 EPOCH 6 done: loss 0.0679 - lr: 0.000072 2023-10-07 01:07:02,383 DEV : loss 0.13090500235557556 - f1-score (micro avg) 0.8681 2023-10-07 01:07:02,389 saving best model 2023-10-07 01:07:03,269 ---------------------------------------------------------------------------------------------------- 2023-10-07 01:07:14,331 
epoch 7 - iter 27/275 - loss 0.06572405 - time (sec): 11.06 - samples/sec: 209.40 - lr: 0.000070 - momentum: 0.000000 2023-10-07 01:07:24,829 epoch 7 - iter 54/275 - loss 0.06193543 - time (sec): 21.56 - samples/sec: 210.87 - lr: 0.000068 - momentum: 0.000000 2023-10-07 01:07:34,746 epoch 7 - iter 81/275 - loss 0.06948728 - time (sec): 31.48 - samples/sec: 204.61 - lr: 0.000066 - momentum: 0.000000 2023-10-07 01:07:45,704 epoch 7 - iter 108/275 - loss 0.05424317 - time (sec): 42.43 - samples/sec: 204.93 - lr: 0.000064 - momentum: 0.000000 2023-10-07 01:07:55,958 epoch 7 - iter 135/275 - loss 0.04896158 - time (sec): 52.69 - samples/sec: 204.34 - lr: 0.000063 - momentum: 0.000000 2023-10-07 01:08:06,396 epoch 7 - iter 162/275 - loss 0.04789517 - time (sec): 63.13 - samples/sec: 204.42 - lr: 0.000061 - momentum: 0.000000 2023-10-07 01:08:18,096 epoch 7 - iter 189/275 - loss 0.05134562 - time (sec): 74.83 - samples/sec: 207.15 - lr: 0.000059 - momentum: 0.000000 2023-10-07 01:08:28,755 epoch 7 - iter 216/275 - loss 0.05370121 - time (sec): 85.48 - samples/sec: 207.87 - lr: 0.000058 - momentum: 0.000000 2023-10-07 01:08:39,841 epoch 7 - iter 243/275 - loss 0.05143749 - time (sec): 96.57 - samples/sec: 208.56 - lr: 0.000056 - momentum: 0.000000 2023-10-07 01:08:50,902 epoch 7 - iter 270/275 - loss 0.05373658 - time (sec): 107.63 - samples/sec: 207.46 - lr: 0.000054 - momentum: 0.000000 2023-10-07 01:08:52,924 ---------------------------------------------------------------------------------------------------- 2023-10-07 01:08:52,924 EPOCH 7 done: loss 0.0533 - lr: 0.000054 2023-10-07 01:08:59,544 DEV : loss 0.13015246391296387 - f1-score (micro avg) 0.8843 2023-10-07 01:08:59,549 saving best model 2023-10-07 01:09:00,420 ---------------------------------------------------------------------------------------------------- 2023-10-07 01:09:11,331 epoch 8 - iter 27/275 - loss 0.05000391 - time (sec): 10.91 - samples/sec: 210.46 - lr: 0.000052 - momentum: 0.000000 2023-10-07 
01:09:22,473 epoch 8 - iter 54/275 - loss 0.04632122 - time (sec): 22.05 - samples/sec: 211.42 - lr: 0.000050 - momentum: 0.000000 2023-10-07 01:09:32,739 epoch 8 - iter 81/275 - loss 0.03773412 - time (sec): 32.32 - samples/sec: 209.24 - lr: 0.000048 - momentum: 0.000000 2023-10-07 01:09:43,508 epoch 8 - iter 108/275 - loss 0.04268720 - time (sec): 43.09 - samples/sec: 209.53 - lr: 0.000047 - momentum: 0.000000 2023-10-07 01:09:53,550 epoch 8 - iter 135/275 - loss 0.04063720 - time (sec): 53.13 - samples/sec: 207.21 - lr: 0.000045 - momentum: 0.000000 2023-10-07 01:10:04,334 epoch 8 - iter 162/275 - loss 0.04177450 - time (sec): 63.91 - samples/sec: 207.08 - lr: 0.000043 - momentum: 0.000000 2023-10-07 01:10:15,171 epoch 8 - iter 189/275 - loss 0.04050759 - time (sec): 74.75 - samples/sec: 207.00 - lr: 0.000042 - momentum: 0.000000 2023-10-07 01:10:26,152 epoch 8 - iter 216/275 - loss 0.03942396 - time (sec): 85.73 - samples/sec: 206.96 - lr: 0.000040 - momentum: 0.000000 2023-10-07 01:10:37,454 epoch 8 - iter 243/275 - loss 0.03983832 - time (sec): 97.03 - samples/sec: 207.50 - lr: 0.000038 - momentum: 0.000000 2023-10-07 01:10:48,041 epoch 8 - iter 270/275 - loss 0.04350975 - time (sec): 107.62 - samples/sec: 207.70 - lr: 0.000036 - momentum: 0.000000 2023-10-07 01:10:50,082 ---------------------------------------------------------------------------------------------------- 2023-10-07 01:10:50,082 EPOCH 8 done: loss 0.0435 - lr: 0.000036 2023-10-07 01:10:56,737 DEV : loss 0.1378580927848816 - f1-score (micro avg) 0.8678 2023-10-07 01:10:56,742 ---------------------------------------------------------------------------------------------------- 2023-10-07 01:11:07,588 epoch 9 - iter 27/275 - loss 0.03043676 - time (sec): 10.84 - samples/sec: 205.82 - lr: 0.000034 - momentum: 0.000000 2023-10-07 01:11:18,824 epoch 9 - iter 54/275 - loss 0.03274688 - time (sec): 22.08 - samples/sec: 209.51 - lr: 0.000032 - momentum: 0.000000 2023-10-07 01:11:29,646 epoch 9 - iter 
81/275 - loss 0.03587991 - time (sec): 32.90 - samples/sec: 210.53 - lr: 0.000031 - momentum: 0.000000 2023-10-07 01:11:41,475 epoch 9 - iter 108/275 - loss 0.03678092 - time (sec): 44.73 - samples/sec: 214.17 - lr: 0.000029 - momentum: 0.000000 2023-10-07 01:11:51,930 epoch 9 - iter 135/275 - loss 0.03250785 - time (sec): 55.19 - samples/sec: 211.21 - lr: 0.000027 - momentum: 0.000000 2023-10-07 01:12:03,378 epoch 9 - iter 162/275 - loss 0.03490330 - time (sec): 66.63 - samples/sec: 210.84 - lr: 0.000026 - momentum: 0.000000 2023-10-07 01:12:13,323 epoch 9 - iter 189/275 - loss 0.03206847 - time (sec): 76.58 - samples/sec: 209.19 - lr: 0.000024 - momentum: 0.000000 2023-10-07 01:12:24,299 epoch 9 - iter 216/275 - loss 0.03398931 - time (sec): 87.56 - samples/sec: 209.22 - lr: 0.000022 - momentum: 0.000000 2023-10-07 01:12:34,742 epoch 9 - iter 243/275 - loss 0.03971760 - time (sec): 98.00 - samples/sec: 208.16 - lr: 0.000020 - momentum: 0.000000 2023-10-07 01:12:44,730 epoch 9 - iter 270/275 - loss 0.03899938 - time (sec): 107.99 - samples/sec: 207.91 - lr: 0.000019 - momentum: 0.000000 2023-10-07 01:12:46,542 ---------------------------------------------------------------------------------------------------- 2023-10-07 01:12:46,543 EPOCH 9 done: loss 0.0390 - lr: 0.000019 2023-10-07 01:12:53,114 DEV : loss 0.1367572695016861 - f1-score (micro avg) 0.8743 2023-10-07 01:12:53,120 ---------------------------------------------------------------------------------------------------- 2023-10-07 01:13:03,405 epoch 10 - iter 27/275 - loss 0.01399907 - time (sec): 10.28 - samples/sec: 203.42 - lr: 0.000017 - momentum: 0.000000 2023-10-07 01:13:13,637 epoch 10 - iter 54/275 - loss 0.01049303 - time (sec): 20.52 - samples/sec: 205.69 - lr: 0.000015 - momentum: 0.000000 2023-10-07 01:13:24,392 epoch 10 - iter 81/275 - loss 0.01595940 - time (sec): 31.27 - samples/sec: 207.80 - lr: 0.000013 - momentum: 0.000000 2023-10-07 01:13:34,981 epoch 10 - iter 108/275 - loss 0.02816700 
- time (sec): 41.86 - samples/sec: 207.88 - lr: 0.000011 - momentum: 0.000000 2023-10-07 01:13:46,448 epoch 10 - iter 135/275 - loss 0.02794080 - time (sec): 53.33 - samples/sec: 210.32 - lr: 0.000010 - momentum: 0.000000 2023-10-07 01:13:56,726 epoch 10 - iter 162/275 - loss 0.03455779 - time (sec): 63.61 - samples/sec: 210.60 - lr: 0.000008 - momentum: 0.000000 2023-10-07 01:14:07,438 epoch 10 - iter 189/275 - loss 0.03289012 - time (sec): 74.32 - samples/sec: 210.60 - lr: 0.000006 - momentum: 0.000000 2023-10-07 01:14:17,805 epoch 10 - iter 216/275 - loss 0.03201020 - time (sec): 84.68 - samples/sec: 209.86 - lr: 0.000004 - momentum: 0.000000 2023-10-07 01:14:28,887 epoch 10 - iter 243/275 - loss 0.03402215 - time (sec): 95.77 - samples/sec: 210.30 - lr: 0.000003 - momentum: 0.000000 2023-10-07 01:14:39,054 epoch 10 - iter 270/275 - loss 0.03636924 - time (sec): 105.93 - samples/sec: 209.25 - lr: 0.000001 - momentum: 0.000000 2023-10-07 01:14:41,548 ---------------------------------------------------------------------------------------------------- 2023-10-07 01:14:41,548 EPOCH 10 done: loss 0.0355 - lr: 0.000001 2023-10-07 01:14:48,236 DEV : loss 0.13855089247226715 - f1-score (micro avg) 0.8746 2023-10-07 01:14:49,079 ---------------------------------------------------------------------------------------------------- 2023-10-07 01:14:49,080 Loading model from best epoch ... 
2023-10-07 01:14:51,806 SequenceTagger predicts: Dictionary with 25 tags: O, S-scope, B-scope, E-scope, I-scope, S-pers, B-pers, E-pers, I-pers, S-work, B-work, E-work, I-work, S-loc, B-loc, E-loc, I-loc, S-object, B-object, E-object, I-object, S-date, B-date, E-date, I-date
2023-10-07 01:14:58,828 
Results:
- F-score (micro) 0.8869
- F-score (macro) 0.6619
- Accuracy 0.8158

By class:
              precision    recall  f1-score   support

       scope     0.8696    0.9091    0.8889       176
        pers     0.9297    0.9297    0.9297       128
        work     0.8243    0.8243    0.8243        74
         loc     1.0000    0.5000    0.6667         2
      object     0.0000    0.0000    0.0000         2

   micro avg     0.8811    0.8927    0.8869       382
   macro avg     0.7247    0.6326    0.6619       382
weighted avg     0.8771    0.8927    0.8842       382

2023-10-07 01:14:58,828 ----------------------------------------------------------------------------------------------------