2023-10-06 14:54:04,117 ----------------------------------------------------------------------------------------------------
2023-10-06 14:54:04,118 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): T5LayerNorm()
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=25, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-06 14:54:04,118 ----------------------------------------------------------------------------------------------------
2023-10-06 14:54:04,118 MultiCorpus: 1214 train + 266 dev + 251 test sentences
 - NER_HIPE_2022 Corpus: 1214 train + 266 dev + 251 test sentences - /app/.flair/datasets/ner_hipe_2022/v2.1/ajmc/en/with_doc_seperator
2023-10-06 14:54:04,119 ----------------------------------------------------------------------------------------------------
2023-10-06 14:54:04,119 Train: 1214 sentences
2023-10-06 14:54:04,119 (train_with_dev=False, train_with_test=False)
2023-10-06 14:54:04,119 ----------------------------------------------------------------------------------------------------
2023-10-06 14:54:04,119 Training Params:
2023-10-06 14:54:04,119 - learning_rate: "0.00016"
2023-10-06 14:54:04,119 - mini_batch_size: "8"
2023-10-06 14:54:04,119 - max_epochs: "10"
2023-10-06 14:54:04,119 - shuffle: "True"
2023-10-06 14:54:04,119 ----------------------------------------------------------------------------------------------------
2023-10-06 14:54:04,119 Plugins:
2023-10-06 14:54:04,119 - TensorboardLogger
2023-10-06 14:54:04,119 - LinearScheduler | warmup_fraction: '0.1'
2023-10-06 14:54:04,119 ----------------------------------------------------------------------------------------------------
2023-10-06 14:54:04,119 Final evaluation on model from best epoch (best-model.pt)
2023-10-06 14:54:04,119 - metric: "('micro avg', 'f1-score')"
2023-10-06 14:54:04,119 ----------------------------------------------------------------------------------------------------
2023-10-06 14:54:04,120 Computation:
2023-10-06 14:54:04,120 - compute on device: cuda:0
2023-10-06 14:54:04,120 - embedding storage: none
2023-10-06 14:54:04,120
----------------------------------------------------------------------------------------------------
2023-10-06 14:54:04,120 Model training base path: "hmbench-ajmc/en-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-4"
2023-10-06 14:54:04,120 ----------------------------------------------------------------------------------------------------
2023-10-06 14:54:04,120 ----------------------------------------------------------------------------------------------------
2023-10-06 14:54:04,120 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-06 14:54:15,045 epoch 1 - iter 15/152 - loss 3.24944544 - time (sec): 10.92 - samples/sec: 280.48 - lr: 0.000015 - momentum: 0.000000
2023-10-06 14:54:26,358 epoch 1 - iter 30/152 - loss 3.24250264 - time (sec): 22.24 - samples/sec: 277.56 - lr: 0.000031 - momentum: 0.000000
2023-10-06 14:54:37,201 epoch 1 - iter 45/152 - loss 3.23097880 - time (sec): 33.08 - samples/sec: 276.79 - lr: 0.000046 - momentum: 0.000000
2023-10-06 14:54:48,538 epoch 1 - iter 60/152 - loss 3.20599492 - time (sec): 44.42 - samples/sec: 275.76 - lr: 0.000062 - momentum: 0.000000
2023-10-06 14:55:00,091 epoch 1 - iter 75/152 - loss 3.15179797 - time (sec): 55.97 - samples/sec: 277.22 - lr: 0.000078 - momentum: 0.000000
2023-10-06 14:55:11,261 epoch 1 - iter 90/152 - loss 3.08113062 - time (sec): 67.14 - samples/sec: 276.63 - lr: 0.000094 - momentum: 0.000000
2023-10-06 14:55:22,033 epoch 1 - iter 105/152 - loss 2.99589247 - time (sec): 77.91 - samples/sec: 275.94 - lr: 0.000109 - momentum: 0.000000
2023-10-06 14:55:32,979 epoch 1 - iter 120/152 - loss 2.90320056 - time (sec): 88.86 - samples/sec: 274.66 - lr: 0.000125 - momentum: 0.000000
2023-10-06 14:55:44,611 epoch 1 - iter 135/152 - loss 2.79285826 - time (sec): 100.49 - samples/sec: 275.87 - lr: 0.000141 - momentum: 0.000000
2023-10-06 14:55:55,343 epoch 1 - iter 150/152 - loss 2.69377810 - time (sec): 111.22 - samples/sec: 274.81 - lr: 0.000157 - momentum: 0.000000
2023-10-06 14:55:56,812 ----------------------------------------------------------------------------------------------------
2023-10-06 14:55:56,813 EPOCH 1 done: loss 2.6808 - lr: 0.000157
2023-10-06 14:56:04,646 DEV : loss 1.5423763990402222 - f1-score (micro avg) 0.0
2023-10-06 14:56:04,653 ----------------------------------------------------------------------------------------------------
2023-10-06 14:56:16,114 epoch 2 - iter 15/152 - loss 1.45028936 - time (sec): 11.46 - samples/sec: 281.01 - lr: 0.000158 - momentum: 0.000000
2023-10-06 14:56:27,084 epoch 2 - iter 30/152 - loss 1.31419700 - time (sec): 22.43 - samples/sec: 277.85 - lr: 0.000157 - momentum: 0.000000
2023-10-06 14:56:38,327 epoch 2 - iter 45/152 - loss 1.22141358 - time (sec): 33.67 - samples/sec: 275.63 - lr: 0.000155 - momentum: 0.000000
2023-10-06 14:56:49,884 epoch 2 - iter 60/152 - loss 1.13843406 - time (sec): 45.23 - samples/sec: 277.78 - lr: 0.000153 - momentum: 0.000000
2023-10-06 14:57:00,600 epoch 2 - iter 75/152 - loss 1.08039112 - time (sec): 55.94 - samples/sec: 277.54 - lr: 0.000151 - momentum: 0.000000
2023-10-06 14:57:11,495 epoch 2 - iter 90/152 - loss 1.01221563 - time (sec): 66.84 - samples/sec: 277.48 - lr: 0.000150 - momentum: 0.000000
2023-10-06 14:57:22,606 epoch 2 - iter 105/152 - loss 0.97571095 - time (sec): 77.95 - samples/sec: 277.16 - lr: 0.000148 - momentum: 0.000000
2023-10-06 14:57:33,499 epoch 2 - iter 120/152 - loss 0.92119821 - time (sec): 88.84 - samples/sec: 276.72 - lr: 0.000146 - momentum: 0.000000
2023-10-06 14:57:44,980 epoch 2 - iter 135/152 - loss 0.87915528 - time (sec): 100.33 - samples/sec: 278.04 - lr: 0.000144 - momentum: 0.000000
2023-10-06 14:57:55,416 epoch 2 - iter 150/152 - loss 0.84045686 - time (sec): 110.76 - samples/sec: 277.49 - lr: 0.000143 - momentum: 0.000000
2023-10-06 14:57:56,460
----------------------------------------------------------------------------------------------------
2023-10-06 14:57:56,460 EPOCH 2 done: loss 0.8375 - lr: 0.000143
2023-10-06 14:58:04,278 DEV : loss 0.5047337412834167 - f1-score (micro avg) 0.0142
2023-10-06 14:58:04,285 saving best model
2023-10-06 14:58:05,137 ----------------------------------------------------------------------------------------------------
2023-10-06 14:58:16,206 epoch 3 - iter 15/152 - loss 0.36830879 - time (sec): 11.07 - samples/sec: 269.53 - lr: 0.000141 - momentum: 0.000000
2023-10-06 14:58:27,597 epoch 3 - iter 30/152 - loss 0.37695983 - time (sec): 22.46 - samples/sec: 274.92 - lr: 0.000139 - momentum: 0.000000
2023-10-06 14:58:37,905 epoch 3 - iter 45/152 - loss 0.37352735 - time (sec): 32.77 - samples/sec: 271.68 - lr: 0.000137 - momentum: 0.000000
2023-10-06 14:58:48,591 epoch 3 - iter 60/152 - loss 0.37890694 - time (sec): 43.45 - samples/sec: 271.52 - lr: 0.000135 - momentum: 0.000000
2023-10-06 14:59:00,051 epoch 3 - iter 75/152 - loss 0.36339867 - time (sec): 54.91 - samples/sec: 275.66 - lr: 0.000134 - momentum: 0.000000
2023-10-06 14:59:10,655 epoch 3 - iter 90/152 - loss 0.35784250 - time (sec): 65.52 - samples/sec: 275.17 - lr: 0.000132 - momentum: 0.000000
2023-10-06 14:59:21,918 epoch 3 - iter 105/152 - loss 0.35112702 - time (sec): 76.78 - samples/sec: 277.68 - lr: 0.000130 - momentum: 0.000000
2023-10-06 14:59:33,043 epoch 3 - iter 120/152 - loss 0.34716540 - time (sec): 87.90 - samples/sec: 278.01 - lr: 0.000128 - momentum: 0.000000
2023-10-06 14:59:44,360 epoch 3 - iter 135/152 - loss 0.33725530 - time (sec): 99.22 - samples/sec: 278.72 - lr: 0.000127 - momentum: 0.000000
2023-10-06 14:59:55,038 epoch 3 - iter 150/152 - loss 0.33077678 - time (sec): 109.90 - samples/sec: 277.95 - lr: 0.000125 - momentum: 0.000000
2023-10-06 14:59:56,568 ----------------------------------------------------------------------------------------------------
2023-10-06 14:59:56,568 EPOCH 3 done: loss 0.3300 - lr: 0.000125
2023-10-06 15:00:04,361 DEV : loss 0.29971709847450256 - f1-score (micro avg) 0.5147
2023-10-06 15:00:04,367 saving best model
2023-10-06 15:00:08,657 ----------------------------------------------------------------------------------------------------
2023-10-06 15:00:19,398 epoch 4 - iter 15/152 - loss 0.22111745 - time (sec): 10.74 - samples/sec: 274.60 - lr: 0.000123 - momentum: 0.000000
2023-10-06 15:00:30,602 epoch 4 - iter 30/152 - loss 0.22991226 - time (sec): 21.94 - samples/sec: 279.98 - lr: 0.000121 - momentum: 0.000000
2023-10-06 15:00:41,086 epoch 4 - iter 45/152 - loss 0.22353854 - time (sec): 32.43 - samples/sec: 275.69 - lr: 0.000119 - momentum: 0.000000
2023-10-06 15:00:51,889 epoch 4 - iter 60/152 - loss 0.21205288 - time (sec): 43.23 - samples/sec: 275.40 - lr: 0.000118 - momentum: 0.000000
2023-10-06 15:01:02,655 epoch 4 - iter 75/152 - loss 0.20545775 - time (sec): 54.00 - samples/sec: 275.74 - lr: 0.000116 - momentum: 0.000000
2023-10-06 15:01:13,843 epoch 4 - iter 90/152 - loss 0.20843455 - time (sec): 65.19 - samples/sec: 276.49 - lr: 0.000114 - momentum: 0.000000
2023-10-06 15:01:25,156 epoch 4 - iter 105/152 - loss 0.20784356 - time (sec): 76.50 - samples/sec: 278.16 - lr: 0.000112 - momentum: 0.000000
2023-10-06 15:01:36,444 epoch 4 - iter 120/152 - loss 0.20651959 - time (sec): 87.79 - samples/sec: 277.86 - lr: 0.000111 - momentum: 0.000000
2023-10-06 15:01:47,962 epoch 4 - iter 135/152 - loss 0.20646267 - time (sec): 99.30 - samples/sec: 278.36 - lr: 0.000109 - momentum: 0.000000
2023-10-06 15:01:58,990 epoch 4 - iter 150/152 - loss 0.20044522 - time (sec): 110.33 - samples/sec: 278.64 - lr: 0.000107 - momentum: 0.000000
2023-10-06 15:02:00,046 ----------------------------------------------------------------------------------------------------
2023-10-06 15:02:00,046 EPOCH 4 done: loss 0.2003 - lr: 0.000107
2023-10-06 15:02:07,930 DEV : loss 0.2131974995136261 - f1-score (micro avg) 0.696
2023-10-06 15:02:07,937 saving best model
2023-10-06 15:02:12,292 ----------------------------------------------------------------------------------------------------
2023-10-06 15:02:23,063 epoch 5 - iter 15/152 - loss 0.13651155 - time (sec): 10.77 - samples/sec: 275.49 - lr: 0.000105 - momentum: 0.000000
2023-10-06 15:02:34,190 epoch 5 - iter 30/152 - loss 0.14246880 - time (sec): 21.90 - samples/sec: 285.98 - lr: 0.000104 - momentum: 0.000000
2023-10-06 15:02:45,061 epoch 5 - iter 45/152 - loss 0.14377517 - time (sec): 32.77 - samples/sec: 286.84 - lr: 0.000102 - momentum: 0.000000
2023-10-06 15:02:55,518 epoch 5 - iter 60/152 - loss 0.14293566 - time (sec): 43.23 - samples/sec: 285.25 - lr: 0.000100 - momentum: 0.000000
2023-10-06 15:03:05,926 epoch 5 - iter 75/152 - loss 0.13509826 - time (sec): 53.63 - samples/sec: 286.43 - lr: 0.000098 - momentum: 0.000000
2023-10-06 15:03:17,090 epoch 5 - iter 90/152 - loss 0.13560247 - time (sec): 64.80 - samples/sec: 289.58 - lr: 0.000097 - momentum: 0.000000
2023-10-06 15:03:27,733 epoch 5 - iter 105/152 - loss 0.14033493 - time (sec): 75.44 - samples/sec: 291.66 - lr: 0.000095 - momentum: 0.000000
2023-10-06 15:03:37,869 epoch 5 - iter 120/152 - loss 0.13801788 - time (sec): 85.58 - samples/sec: 290.50 - lr: 0.000093 - momentum: 0.000000
2023-10-06 15:03:48,244 epoch 5 - iter 135/152 - loss 0.13495779 - time (sec): 95.95 - samples/sec: 290.31 - lr: 0.000091 - momentum: 0.000000
2023-10-06 15:03:58,015 epoch 5 - iter 150/152 - loss 0.13282031 - time (sec): 105.72 - samples/sec: 289.40 - lr: 0.000090 - momentum: 0.000000
2023-10-06 15:03:59,322 ----------------------------------------------------------------------------------------------------
2023-10-06 15:03:59,323 EPOCH 5 done: loss 0.1328 - lr: 0.000090
2023-10-06 15:04:06,330 DEV : loss 0.16594909131526947 - f1-score (micro avg) 0.7552
2023-10-06 15:04:06,339 saving best model
2023-10-06 15:04:10,659
----------------------------------------------------------------------------------------------------
2023-10-06 15:04:20,574 epoch 6 - iter 15/152 - loss 0.07286253 - time (sec): 9.91 - samples/sec: 297.17 - lr: 0.000088 - momentum: 0.000000
2023-10-06 15:04:30,596 epoch 6 - iter 30/152 - loss 0.08190653 - time (sec): 19.94 - samples/sec: 292.85 - lr: 0.000086 - momentum: 0.000000
2023-10-06 15:04:40,957 epoch 6 - iter 45/152 - loss 0.08201220 - time (sec): 30.30 - samples/sec: 288.51 - lr: 0.000084 - momentum: 0.000000
2023-10-06 15:04:51,641 epoch 6 - iter 60/152 - loss 0.08484603 - time (sec): 40.98 - samples/sec: 290.41 - lr: 0.000082 - momentum: 0.000000
2023-10-06 15:05:01,979 epoch 6 - iter 75/152 - loss 0.08338837 - time (sec): 51.32 - samples/sec: 293.50 - lr: 0.000081 - momentum: 0.000000
2023-10-06 15:05:12,517 epoch 6 - iter 90/152 - loss 0.08585472 - time (sec): 61.86 - samples/sec: 294.02 - lr: 0.000079 - momentum: 0.000000
2023-10-06 15:05:23,099 epoch 6 - iter 105/152 - loss 0.09058195 - time (sec): 72.44 - samples/sec: 293.12 - lr: 0.000077 - momentum: 0.000000
2023-10-06 15:05:33,403 epoch 6 - iter 120/152 - loss 0.09149141 - time (sec): 82.74 - samples/sec: 292.40 - lr: 0.000075 - momentum: 0.000000
2023-10-06 15:05:43,822 epoch 6 - iter 135/152 - loss 0.09044122 - time (sec): 93.16 - samples/sec: 292.03 - lr: 0.000074 - momentum: 0.000000
2023-10-06 15:05:54,785 epoch 6 - iter 150/152 - loss 0.08888543 - time (sec): 104.12 - samples/sec: 292.59 - lr: 0.000072 - momentum: 0.000000
2023-10-06 15:05:56,348 ----------------------------------------------------------------------------------------------------
2023-10-06 15:05:56,349 EPOCH 6 done: loss 0.0902 - lr: 0.000072
2023-10-06 15:06:03,654 DEV : loss 0.14003103971481323 - f1-score (micro avg) 0.8216
2023-10-06 15:06:03,661 saving best model
2023-10-06 15:06:07,966 ----------------------------------------------------------------------------------------------------
2023-10-06 15:06:19,336 epoch 7 - iter 15/152 - loss 0.09304549 - time (sec): 11.37 - samples/sec: 304.52 - lr: 0.000070 - momentum: 0.000000
2023-10-06 15:06:29,974 epoch 7 - iter 30/152 - loss 0.07562276 - time (sec): 22.01 - samples/sec: 292.14 - lr: 0.000068 - momentum: 0.000000
2023-10-06 15:06:40,446 epoch 7 - iter 45/152 - loss 0.07334318 - time (sec): 32.48 - samples/sec: 291.76 - lr: 0.000066 - momentum: 0.000000
2023-10-06 15:06:50,617 epoch 7 - iter 60/152 - loss 0.07407120 - time (sec): 42.65 - samples/sec: 284.56 - lr: 0.000065 - momentum: 0.000000
2023-10-06 15:07:01,846 epoch 7 - iter 75/152 - loss 0.07204809 - time (sec): 53.88 - samples/sec: 284.47 - lr: 0.000063 - momentum: 0.000000
2023-10-06 15:07:12,504 epoch 7 - iter 90/152 - loss 0.07172554 - time (sec): 64.54 - samples/sec: 282.82 - lr: 0.000061 - momentum: 0.000000
2023-10-06 15:07:23,447 epoch 7 - iter 105/152 - loss 0.06797482 - time (sec): 75.48 - samples/sec: 282.22 - lr: 0.000059 - momentum: 0.000000
2023-10-06 15:07:34,152 epoch 7 - iter 120/152 - loss 0.06978457 - time (sec): 86.18 - samples/sec: 281.50 - lr: 0.000058 - momentum: 0.000000
2023-10-06 15:07:45,419 epoch 7 - iter 135/152 - loss 0.06738419 - time (sec): 97.45 - samples/sec: 282.09 - lr: 0.000056 - momentum: 0.000000
2023-10-06 15:07:56,784 epoch 7 - iter 150/152 - loss 0.06940149 - time (sec): 108.82 - samples/sec: 281.76 - lr: 0.000054 - momentum: 0.000000
2023-10-06 15:07:58,063 ----------------------------------------------------------------------------------------------------
2023-10-06 15:07:58,063 EPOCH 7 done: loss 0.0689 - lr: 0.000054
2023-10-06 15:08:06,065 DEV : loss 0.14181111752986908 - f1-score (micro avg) 0.8228
2023-10-06 15:08:06,073 saving best model
2023-10-06 15:08:10,369 ----------------------------------------------------------------------------------------------------
2023-10-06 15:08:21,506 epoch 8 - iter 15/152 - loss 0.05064222 - time (sec): 11.14 - samples/sec: 274.15 - lr: 0.000052 - momentum: 0.000000
2023-10-06 15:08:32,890 epoch 8 - iter 30/152 - loss 0.07198941 - time (sec): 22.52 - samples/sec: 276.91 - lr: 0.000050 - momentum: 0.000000
2023-10-06 15:08:43,836 epoch 8 - iter 45/152 - loss 0.06404128 - time (sec): 33.47 - samples/sec: 277.15 - lr: 0.000049 - momentum: 0.000000
2023-10-06 15:08:55,623 epoch 8 - iter 60/152 - loss 0.05734585 - time (sec): 45.25 - samples/sec: 279.41 - lr: 0.000047 - momentum: 0.000000
2023-10-06 15:09:06,691 epoch 8 - iter 75/152 - loss 0.06032725 - time (sec): 56.32 - samples/sec: 278.33 - lr: 0.000045 - momentum: 0.000000
2023-10-06 15:09:17,087 epoch 8 - iter 90/152 - loss 0.05864651 - time (sec): 66.72 - samples/sec: 276.18 - lr: 0.000043 - momentum: 0.000000
2023-10-06 15:09:27,963 epoch 8 - iter 105/152 - loss 0.05785205 - time (sec): 77.59 - samples/sec: 275.58 - lr: 0.000042 - momentum: 0.000000
2023-10-06 15:09:38,769 epoch 8 - iter 120/152 - loss 0.05653871 - time (sec): 88.40 - samples/sec: 275.04 - lr: 0.000040 - momentum: 0.000000
2023-10-06 15:09:50,187 epoch 8 - iter 135/152 - loss 0.05700357 - time (sec): 99.82 - samples/sec: 274.75 - lr: 0.000038 - momentum: 0.000000
2023-10-06 15:10:01,519 epoch 8 - iter 150/152 - loss 0.05517782 - time (sec): 111.15 - samples/sec: 275.22 - lr: 0.000036 - momentum: 0.000000
2023-10-06 15:10:02,946 ----------------------------------------------------------------------------------------------------
2023-10-06 15:10:02,947 EPOCH 8 done: loss 0.0546 - lr: 0.000036
2023-10-06 15:10:10,834 DEV : loss 0.13887561857700348 - f1-score (micro avg) 0.8339
2023-10-06 15:10:10,841 saving best model
2023-10-06 15:10:15,124 ----------------------------------------------------------------------------------------------------
2023-10-06 15:10:26,833 epoch 9 - iter 15/152 - loss 0.03018573 - time (sec): 11.71 - samples/sec: 283.47 - lr: 0.000034 - momentum: 0.000000
2023-10-06 15:10:38,375 epoch 9 - iter 30/152 - loss 0.03319836 - time (sec): 23.25 - samples/sec: 280.52 - lr: 0.000033 - momentum: 0.000000
2023-10-06 15:10:48,902 epoch 9 - iter 45/152 - loss 0.03938238 - time (sec): 33.78 - samples/sec: 275.75 - lr: 0.000031 - momentum: 0.000000
2023-10-06 15:11:00,133 epoch 9 - iter 60/152 - loss 0.03910265 - time (sec): 45.01 - samples/sec: 277.55 - lr: 0.000029 - momentum: 0.000000
2023-10-06 15:11:11,452 epoch 9 - iter 75/152 - loss 0.04443264 - time (sec): 56.33 - samples/sec: 278.50 - lr: 0.000027 - momentum: 0.000000
2023-10-06 15:11:22,466 epoch 9 - iter 90/152 - loss 0.04504801 - time (sec): 67.34 - samples/sec: 278.51 - lr: 0.000026 - momentum: 0.000000
2023-10-06 15:11:33,644 epoch 9 - iter 105/152 - loss 0.04819797 - time (sec): 78.52 - samples/sec: 279.31 - lr: 0.000024 - momentum: 0.000000
2023-10-06 15:11:44,142 epoch 9 - iter 120/152 - loss 0.04718031 - time (sec): 89.02 - samples/sec: 277.53 - lr: 0.000022 - momentum: 0.000000
2023-10-06 15:11:55,011 epoch 9 - iter 135/152 - loss 0.04711017 - time (sec): 99.89 - samples/sec: 277.60 - lr: 0.000020 - momentum: 0.000000
2023-10-06 15:12:05,499 epoch 9 - iter 150/152 - loss 0.04672030 - time (sec): 110.37 - samples/sec: 276.95 - lr: 0.000019 - momentum: 0.000000
2023-10-06 15:12:06,962 ----------------------------------------------------------------------------------------------------
2023-10-06 15:12:06,962 EPOCH 9 done: loss 0.0464 - lr: 0.000019
2023-10-06 15:12:14,653 DEV : loss 0.137730672955513 - f1-score (micro avg) 0.83
2023-10-06 15:12:14,659 ----------------------------------------------------------------------------------------------------
2023-10-06 15:12:25,326 epoch 10 - iter 15/152 - loss 0.04516188 - time (sec): 10.67 - samples/sec: 280.54 - lr: 0.000017 - momentum: 0.000000
2023-10-06 15:12:36,332 epoch 10 - iter 30/152 - loss 0.04842008 - time (sec): 21.67 - samples/sec: 278.94 - lr: 0.000015 - momentum: 0.000000
2023-10-06 15:12:46,987 epoch 10 - iter 45/152 - loss 0.04303074 - time (sec): 32.33 - samples/sec: 280.51 - lr: 0.000013 - momentum: 0.000000
2023-10-06 15:12:57,904 epoch 10 - iter 60/152 - loss 0.04090721 - time (sec): 43.24 - samples/sec: 277.69 - lr: 0.000012 - momentum: 0.000000
2023-10-06 15:13:08,514 epoch 10 - iter 75/152 - loss 0.03829420 - time (sec): 53.85 - samples/sec: 273.85 - lr: 0.000010 - momentum: 0.000000
2023-10-06 15:13:19,914 epoch 10 - iter 90/152 - loss 0.04368647 - time (sec): 65.25 - samples/sec: 274.54 - lr: 0.000008 - momentum: 0.000000
2023-10-06 15:13:30,800 epoch 10 - iter 105/152 - loss 0.04082935 - time (sec): 76.14 - samples/sec: 274.42 - lr: 0.000006 - momentum: 0.000000
2023-10-06 15:13:42,223 epoch 10 - iter 120/152 - loss 0.04090036 - time (sec): 87.56 - samples/sec: 275.54 - lr: 0.000005 - momentum: 0.000000
2023-10-06 15:13:53,578 epoch 10 - iter 135/152 - loss 0.04032698 - time (sec): 98.92 - samples/sec: 277.23 - lr: 0.000003 - momentum: 0.000000
2023-10-06 15:14:04,848 epoch 10 - iter 150/152 - loss 0.04052295 - time (sec): 110.19 - samples/sec: 278.04 - lr: 0.000001 - momentum: 0.000000
2023-10-06 15:14:06,117 ----------------------------------------------------------------------------------------------------
2023-10-06 15:14:06,118 EPOCH 10 done: loss 0.0418 - lr: 0.000001
2023-10-06 15:14:13,851 DEV : loss 0.13857229053974152 - f1-score (micro avg) 0.8296
2023-10-06 15:14:14,654 ----------------------------------------------------------------------------------------------------
2023-10-06 15:14:14,655 Loading model from best epoch ...
2023-10-06 15:14:18,184 SequenceTagger predicts: Dictionary with 25 tags: O, S-scope, B-scope, E-scope, I-scope, S-pers, B-pers, E-pers, I-pers, S-work, B-work, E-work, I-work, S-loc, B-loc, E-loc, I-loc, S-date, B-date, E-date, I-date, S-object, B-object, E-object, I-object
2023-10-06 15:14:25,471
Results:
- F-score (micro) 0.7984
- F-score (macro) 0.4845
- Accuracy 0.6751

By class:
              precision    recall  f1-score   support

       scope     0.7640    0.8146    0.7885       151
        pers     0.7826    0.9375    0.8531        96
        work     0.7130    0.8632    0.7810        95
         loc     0.0000    0.0000    0.0000         3
        date     0.0000    0.0000    0.0000         3

   micro avg     0.7545    0.8477    0.7984       348
   macro avg     0.4519    0.5230    0.4845       348
weighted avg     0.7420    0.8477    0.7906       348

2023-10-06 15:14:25,472 ----------------------------------------------------------------------------------------------------