2023-10-06 16:19:45,746 ----------------------------------------------------------------------------------------------------
2023-10-06 16:19:45,747 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): T5LayerNorm()
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=25, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-06 16:19:45,747 ----------------------------------------------------------------------------------------------------
2023-10-06 16:19:45,747 MultiCorpus: 1214 train + 266 dev + 251 test sentences
 - NER_HIPE_2022 Corpus: 1214 train + 266 dev + 251 test sentences - /app/.flair/datasets/ner_hipe_2022/v2.1/ajmc/en/with_doc_seperator
2023-10-06 16:19:45,747 ----------------------------------------------------------------------------------------------------
2023-10-06 16:19:45,748 Train: 1214 sentences
2023-10-06 16:19:45,748 (train_with_dev=False, train_with_test=False)
2023-10-06 16:19:45,748 ----------------------------------------------------------------------------------------------------
2023-10-06 16:19:45,748 Training Params:
2023-10-06 16:19:45,748 - learning_rate: "0.00016"
2023-10-06 16:19:45,748 - mini_batch_size: "8"
2023-10-06 16:19:45,748 - max_epochs: "10"
2023-10-06 16:19:45,748 - shuffle: "True"
2023-10-06 16:19:45,748 ----------------------------------------------------------------------------------------------------
2023-10-06 16:19:45,748 Plugins:
2023-10-06 16:19:45,748 - TensorboardLogger
2023-10-06 16:19:45,748 - LinearScheduler | warmup_fraction: '0.1'
2023-10-06 16:19:45,748 ----------------------------------------------------------------------------------------------------
2023-10-06 16:19:45,748 Final evaluation on model from best epoch (best-model.pt)
2023-10-06 16:19:45,748 - metric: "('micro avg', 'f1-score')"
2023-10-06 16:19:45,748 ----------------------------------------------------------------------------------------------------
2023-10-06 16:19:45,748 Computation:
2023-10-06 16:19:45,748 - compute on device: cuda:0
2023-10-06 16:19:45,748 - embedding storage: none
2023-10-06 16:19:45,749 ----------------------------------------------------------------------------------------------------
2023-10-06 16:19:45,749 Model training base path: "hmbench-ajmc/en-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-5"
2023-10-06 16:19:45,749 ----------------------------------------------------------------------------------------------------
2023-10-06 16:19:45,749 ----------------------------------------------------------------------------------------------------
2023-10-06 16:19:45,749 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-06 16:19:56,807 epoch 1 - iter 15/152 - loss 3.23540578 - time (sec): 11.06 - samples/sec: 289.15 - lr: 0.000015 - momentum: 0.000000
2023-10-06 16:20:07,385 epoch 1 - iter 30/152 - loss 3.22879502 - time (sec): 21.63 - samples/sec: 284.86 - lr: 0.000031 - momentum: 0.000000
2023-10-06 16:20:18,149 epoch 1 - iter 45/152 - loss 3.21582019 - time (sec): 32.40 - samples/sec: 276.56 - lr: 0.000046 - momentum: 0.000000
2023-10-06 16:20:29,219 epoch 1 - iter 60/152 - loss 3.19005230 - time (sec): 43.47 - samples/sec: 274.54 - lr: 0.000062 - momentum: 0.000000
2023-10-06 16:20:40,097 epoch 1 - iter 75/152 - loss 3.14005105 - time (sec): 54.35 - samples/sec: 274.00 - lr: 0.000078 - momentum: 0.000000
2023-10-06 16:20:51,996 epoch 1 - iter 90/152 - loss 3.06101250 - time (sec): 66.25 - samples/sec: 276.44 - lr: 0.000094 - momentum: 0.000000
2023-10-06 16:21:01,874 epoch 1 - iter 105/152 - loss 2.99199941 - time (sec): 76.12 - samples/sec: 274.17 - lr: 0.000109 - momentum: 0.000000
2023-10-06 16:21:13,317 epoch 1 - iter 120/152 - loss 2.89372585 - time (sec): 87.57 - samples/sec: 274.65 - lr: 0.000125 - momentum: 0.000000
2023-10-06 16:21:24,577 epoch 1 - iter 135/152 - loss 2.79047626 - time (sec): 98.83 - samples/sec: 275.47 - lr: 0.000141 - momentum: 0.000000
2023-10-06 16:21:36,219 epoch 1 - iter 150/152 - loss 2.68092815 - time (sec): 110.47 - samples/sec: 276.72 - lr: 0.000157 - momentum: 0.000000
2023-10-06 16:21:37,642 ----------------------------------------------------------------------------------------------------
2023-10-06 16:21:37,642 EPOCH 1 done: loss 2.6662 - lr: 0.000157
2023-10-06 16:21:45,376 DEV : loss 1.55168879032135 - f1-score (micro avg) 0.0
2023-10-06 16:21:45,384 ----------------------------------------------------------------------------------------------------
2023-10-06 16:21:56,354 epoch 2 - iter 15/152 - loss 1.46794260 - time (sec): 10.97 - samples/sec: 279.35 - lr: 0.000158 - momentum: 0.000000
2023-10-06 16:22:07,158 epoch 2 - iter 30/152 - loss 1.34797918 - time (sec): 21.77 - samples/sec: 280.03 - lr: 0.000157 - momentum: 0.000000
2023-10-06 16:22:18,856 epoch 2 - iter 45/152 - loss 1.25232371 - time (sec): 33.47 - samples/sec: 278.31 - lr: 0.000155 - momentum: 0.000000
2023-10-06 16:22:30,255 epoch 2 - iter 60/152 - loss 1.14965941 - time (sec): 44.87 - samples/sec: 279.01 - lr: 0.000153 - momentum: 0.000000
2023-10-06 16:22:40,140 epoch 2 - iter 75/152 - loss 1.07885480 - time (sec): 54.76 - samples/sec: 275.34 - lr: 0.000151 - momentum: 0.000000
2023-10-06 16:22:50,748 epoch 2 - iter 90/152 - loss 1.00441250 - time (sec): 65.36 - samples/sec: 277.03 - lr: 0.000150 - momentum: 0.000000
2023-10-06 16:23:01,996 epoch 2 - iter 105/152 - loss 0.97487658 - time (sec): 76.61 - samples/sec: 275.74 - lr: 0.000148 - momentum: 0.000000
2023-10-06 16:23:13,023 epoch 2 - iter 120/152 - loss 0.93412980 - time (sec): 87.64 - samples/sec: 276.93 - lr: 0.000146 - momentum: 0.000000
2023-10-06 16:23:24,251 epoch 2 - iter 135/152 - loss 0.88937827 - time (sec): 98.87 - samples/sec: 277.97 - lr: 0.000144 - momentum: 0.000000
2023-10-06 16:23:35,145 epoch 2 - iter 150/152 - loss 0.84649586 - time (sec): 109.76 - samples/sec: 278.95 - lr: 0.000143 - momentum: 0.000000
2023-10-06 16:23:36,498 ----------------------------------------------------------------------------------------------------
2023-10-06 16:23:36,498 EPOCH 2 done: loss 0.8446 - lr: 0.000143
2023-10-06 16:23:44,291 DEV : loss 0.513270378112793 - f1-score (micro avg) 0.0
2023-10-06 16:23:44,298 ----------------------------------------------------------------------------------------------------
2023-10-06 16:23:55,057 epoch 3 - iter 15/152 - loss 0.52773187 - time (sec): 10.76 - samples/sec: 268.54 - lr: 0.000141 - momentum: 0.000000
2023-10-06 16:24:05,818 epoch 3 - iter 30/152 - loss 0.45778212 - time (sec): 21.52 - samples/sec: 272.32 - lr: 0.000139 - momentum: 0.000000
2023-10-06 16:24:16,081 epoch 3 - iter 45/152 - loss 0.42200152 - time (sec): 31.78 - samples/sec: 270.75 - lr: 0.000137 - momentum: 0.000000
2023-10-06 16:24:27,766 epoch 3 - iter 60/152 - loss 0.41488350 - time (sec): 43.47 - samples/sec: 275.43 - lr: 0.000135 - momentum: 0.000000
2023-10-06 16:24:38,613 epoch 3 - iter 75/152 - loss 0.39878706 - time (sec): 54.31 - samples/sec: 274.18 - lr: 0.000134 - momentum: 0.000000
2023-10-06 16:24:49,655 epoch 3 - iter 90/152 - loss 0.39305017 - time (sec): 65.36 - samples/sec: 274.74 - lr: 0.000132 - momentum: 0.000000
2023-10-06 16:25:00,851 epoch 3 - iter 105/152 - loss 0.38050487 - time (sec): 76.55 - samples/sec: 275.58 - lr: 0.000130 - momentum: 0.000000
2023-10-06 16:25:12,061 epoch 3 - iter 120/152 - loss 0.36951706 - time (sec): 87.76 - samples/sec: 276.58 - lr: 0.000128 - momentum: 0.000000
2023-10-06 16:25:22,826 epoch 3 - iter 135/152 - loss 0.35655812 - time (sec): 98.53 - samples/sec: 277.14 - lr: 0.000127 - momentum: 0.000000
2023-10-06 16:25:34,090 epoch 3 - iter 150/152 - loss 0.34510699 - time (sec): 109.79 - samples/sec: 278.17 - lr: 0.000125 - momentum: 0.000000
2023-10-06 16:25:35,588 ----------------------------------------------------------------------------------------------------
2023-10-06 16:25:35,588 EPOCH 3 done: loss 0.3446 - lr: 0.000125
2023-10-06 16:25:43,031 DEV : loss 0.3055611252784729 - f1-score (micro avg) 0.5161
2023-10-06 16:25:43,038 saving best model
2023-10-06 16:25:43,861 ----------------------------------------------------------------------------------------------------
2023-10-06 16:25:54,725 epoch 4 - iter 15/152 - loss 0.26960635 - time (sec): 10.86 - samples/sec: 291.45 - lr: 0.000123 - momentum: 0.000000
2023-10-06 16:26:04,591 epoch 4 - iter 30/152 - loss 0.24910661 - time (sec): 20.73 - samples/sec: 285.60 - lr: 0.000121 - momentum: 0.000000
2023-10-06 16:26:14,738 epoch 4 - iter 45/152 - loss 0.24214415 - time (sec): 30.88 - samples/sec: 290.13 - lr: 0.000119 - momentum: 0.000000
2023-10-06 16:26:25,154 epoch 4 - iter 60/152 - loss 0.23251342 - time (sec): 41.29 - samples/sec: 291.07 - lr: 0.000118 - momentum: 0.000000
2023-10-06 16:26:35,591 epoch 4 - iter 75/152 - loss 0.23189634 - time (sec): 51.73 - samples/sec: 292.14 - lr: 0.000116 - momentum: 0.000000
2023-10-06 16:26:45,809 epoch 4 - iter 90/152 - loss 0.22377769 - time (sec): 61.95 - samples/sec: 293.25 - lr: 0.000114 - momentum: 0.000000
2023-10-06 16:26:55,960 epoch 4 - iter 105/152 - loss 0.21822695 - time (sec): 72.10 - samples/sec: 292.37 - lr: 0.000112 - momentum: 0.000000
2023-10-06 16:27:07,084 epoch 4 - iter 120/152 - loss 0.21412196 - time (sec): 83.22 - samples/sec: 295.90 - lr: 0.000111 - momentum: 0.000000
2023-10-06 16:27:18,034 epoch 4 - iter 135/152 - loss 0.20764481 - time (sec): 94.17 - samples/sec: 296.33 - lr: 0.000109 - momentum: 0.000000
2023-10-06 16:27:27,724 epoch 4 - iter 150/152 - loss 0.20211165 - time (sec): 103.86 - samples/sec: 295.11 - lr: 0.000107 - momentum: 0.000000
2023-10-06 16:27:28,912 ----------------------------------------------------------------------------------------------------
2023-10-06 16:27:28,912 EPOCH 4 done: loss 0.2015 - lr: 0.000107
2023-10-06 16:27:35,913 DEV : loss 0.20635436475276947 - f1-score (micro avg) 0.6913
2023-10-06 16:27:35,919 saving best model
2023-10-06 16:27:40,225 ----------------------------------------------------------------------------------------------------
2023-10-06 16:27:50,736 epoch 5 - iter 15/152 - loss 0.12257114 - time (sec): 10.51 - samples/sec: 292.61 - lr: 0.000105 - momentum: 0.000000
2023-10-06 16:28:00,863 epoch 5 - iter 30/152 - loss 0.14636035 - time (sec): 20.64 - samples/sec: 293.99 - lr: 0.000104 - momentum: 0.000000
2023-10-06 16:28:11,596 epoch 5 - iter 45/152 - loss 0.14997723 - time (sec): 31.37 - samples/sec: 294.97 - lr: 0.000102 - momentum: 0.000000
2023-10-06 16:28:22,020 epoch 5 - iter 60/152 - loss 0.14533547 - time (sec): 41.79 - samples/sec: 297.08 - lr: 0.000100 - momentum: 0.000000
2023-10-06 16:28:32,657 epoch 5 - iter 75/152 - loss 0.14715061 - time (sec): 52.43 - samples/sec: 297.79 - lr: 0.000098 - momentum: 0.000000
2023-10-06 16:28:42,839 epoch 5 - iter 90/152 - loss 0.13914782 - time (sec): 62.61 - samples/sec: 296.20 - lr: 0.000097 - momentum: 0.000000
2023-10-06 16:28:53,712 epoch 5 - iter 105/152 - loss 0.13306359 - time (sec): 73.48 - samples/sec: 295.88 - lr: 0.000095 - momentum: 0.000000
2023-10-06 16:29:04,301 epoch 5 - iter 120/152 - loss 0.13385254 - time (sec): 84.07 - samples/sec: 296.24 - lr: 0.000093 - momentum: 0.000000
2023-10-06 16:29:14,803 epoch 5 - iter 135/152 - loss 0.13129539 - time (sec): 94.58 - samples/sec: 294.48 - lr: 0.000091 - momentum: 0.000000
2023-10-06 16:29:25,131 epoch 5 - iter 150/152 - loss 0.13187403 - time (sec): 104.90 - samples/sec: 293.24 - lr: 0.000090 - momentum: 0.000000
2023-10-06 16:29:26,138 ----------------------------------------------------------------------------------------------------
2023-10-06 16:29:26,138 EPOCH 5 done: loss 0.1312 - lr: 0.000090
2023-10-06 16:29:33,532 DEV : loss 0.15531174838542938 - f1-score (micro avg) 0.7911
2023-10-06 16:29:33,539 saving best model
2023-10-06 16:29:37,867 ----------------------------------------------------------------------------------------------------
2023-10-06 16:29:48,682 epoch 6 - iter 15/152 - loss 0.12109111 - time (sec): 10.81 - samples/sec: 294.45 - lr: 0.000088 - momentum: 0.000000
2023-10-06 16:29:59,738 epoch 6 - iter 30/152 - loss 0.11090099 - time (sec): 21.87 - samples/sec: 294.39 - lr: 0.000086 - momentum: 0.000000
2023-10-06 16:30:11,182 epoch 6 - iter 45/152 - loss 0.10440084 - time (sec): 33.31 - samples/sec: 292.23 - lr: 0.000084 - momentum: 0.000000
2023-10-06 16:30:22,294 epoch 6 - iter 60/152 - loss 0.09930167 - time (sec): 44.43 - samples/sec: 289.12 - lr: 0.000082 - momentum: 0.000000
2023-10-06 16:30:33,112 epoch 6 - iter 75/152 - loss 0.09471203 - time (sec): 55.24 - samples/sec: 288.20 - lr: 0.000081 - momentum: 0.000000
2023-10-06 16:30:43,143 epoch 6 - iter 90/152 - loss 0.09074533 - time (sec): 65.27 - samples/sec: 284.72 - lr: 0.000079 - momentum: 0.000000
2023-10-06 16:30:54,624 epoch 6 - iter 105/152 - loss 0.09014343 - time (sec): 76.76 - samples/sec: 284.46 - lr: 0.000077 - momentum: 0.000000
2023-10-06 16:31:05,287 epoch 6 - iter 120/152 - loss 0.08993628 - time (sec): 87.42 - samples/sec: 282.47 - lr: 0.000075 - momentum: 0.000000
2023-10-06 16:31:16,181 epoch 6 - iter 135/152 - loss 0.08746524 - time (sec): 98.31 - samples/sec: 280.77 - lr: 0.000074 - momentum: 0.000000
2023-10-06 16:31:27,287 epoch 6 - iter 150/152 - loss 0.08855062 - time (sec): 109.42 - samples/sec: 280.37 - lr: 0.000072 - momentum: 0.000000
2023-10-06 16:31:28,470 ----------------------------------------------------------------------------------------------------
2023-10-06 16:31:28,470 EPOCH 6 done: loss 0.0895 - lr: 0.000072
2023-10-06 16:31:36,442 DEV : loss 0.14460936188697815 - f1-score (micro avg) 0.819
2023-10-06 16:31:36,449 saving best model
2023-10-06 16:31:40,765 ----------------------------------------------------------------------------------------------------
2023-10-06 16:31:51,479 epoch 7 - iter 15/152 - loss 0.06953968 - time (sec): 10.71 - samples/sec: 267.73 - lr: 0.000070 - momentum: 0.000000
2023-10-06 16:32:02,182 epoch 7 - iter 30/152 - loss 0.07050808 - time (sec): 21.42 - samples/sec: 268.68 - lr: 0.000068 - momentum: 0.000000
2023-10-06 16:32:13,212 epoch 7 - iter 45/152 - loss 0.06094662 - time (sec): 32.45 - samples/sec: 269.99 - lr: 0.000066 - momentum: 0.000000
2023-10-06 16:32:24,315 epoch 7 - iter 60/152 - loss 0.05752575 - time (sec): 43.55 - samples/sec: 271.63 - lr: 0.000065 - momentum: 0.000000
2023-10-06 16:32:35,212 epoch 7 - iter 75/152 - loss 0.05735934 - time (sec): 54.45 - samples/sec: 272.12 - lr: 0.000063 - momentum: 0.000000
2023-10-06 16:32:46,573 epoch 7 - iter 90/152 - loss 0.06572225 - time (sec): 65.81 - samples/sec: 275.06 - lr: 0.000061 - momentum: 0.000000
2023-10-06 16:32:57,480 epoch 7 - iter 105/152 - loss 0.06380567 - time (sec): 76.71 - samples/sec: 276.39 - lr: 0.000059 - momentum: 0.000000
2023-10-06 16:33:08,303 epoch 7 - iter 120/152 - loss 0.06421396 - time (sec): 87.54 - samples/sec: 276.07 - lr: 0.000058 - momentum: 0.000000
2023-10-06 16:33:19,644 epoch 7 - iter 135/152 - loss 0.06519409 - time (sec): 98.88 - samples/sec: 278.17 - lr: 0.000056 - momentum: 0.000000
2023-10-06 16:33:30,745 epoch 7 - iter 150/152 - loss 0.07035232 - time (sec): 109.98 - samples/sec: 278.35 - lr: 0.000054 - momentum: 0.000000
2023-10-06 16:33:32,065 ----------------------------------------------------------------------------------------------------
2023-10-06 16:33:32,066 EPOCH 7 done: loss 0.0697 - lr: 0.000054
2023-10-06 16:33:39,825 DEV : loss 0.13930074870586395 - f1-score (micro avg) 0.8202
2023-10-06 16:33:39,832 saving best model
2023-10-06 16:33:44,192 ----------------------------------------------------------------------------------------------------
2023-10-06 16:33:54,847 epoch 8 - iter 15/152 - loss 0.06736346 - time (sec): 10.65 - samples/sec: 284.33 - lr: 0.000052 - momentum: 0.000000
2023-10-06 16:34:05,996 epoch 8 - iter 30/152 - loss 0.07138290 - time (sec): 21.80 - samples/sec: 274.01 - lr: 0.000050 - momentum: 0.000000
2023-10-06 16:34:16,435 epoch 8 - iter 45/152 - loss 0.06326499 - time (sec): 32.24 - samples/sec: 269.75 - lr: 0.000049 - momentum: 0.000000
2023-10-06 16:34:27,607 epoch 8 - iter 60/152 - loss 0.06559809 - time (sec): 43.41 - samples/sec: 271.19 - lr: 0.000047 - momentum: 0.000000
2023-10-06 16:34:39,121 epoch 8 - iter 75/152 - loss 0.06035025 - time (sec): 54.93 - samples/sec: 272.87 - lr: 0.000045 - momentum: 0.000000
2023-10-06 16:34:50,151 epoch 8 - iter 90/152 - loss 0.05872432 - time (sec): 65.96 - samples/sec: 272.18 - lr: 0.000043 - momentum: 0.000000
2023-10-06 16:35:01,701 epoch 8 - iter 105/152 - loss 0.05638853 - time (sec): 77.51 - samples/sec: 273.41 - lr: 0.000042 - momentum: 0.000000
2023-10-06 16:35:13,519 epoch 8 - iter 120/152 - loss 0.05567008 - time (sec): 89.33 - samples/sec: 275.39 - lr: 0.000040 - momentum: 0.000000
2023-10-06 16:35:24,854 epoch 8 - iter 135/152 - loss 0.05635795 - time (sec): 100.66 - samples/sec: 275.93 - lr: 0.000038 - momentum: 0.000000
2023-10-06 16:35:35,353 epoch 8 - iter 150/152 - loss 0.05701935 - time (sec): 111.16 - samples/sec: 275.05 - lr: 0.000036 - momentum: 0.000000
2023-10-06 16:35:36,839 ----------------------------------------------------------------------------------------------------
2023-10-06 16:35:36,840 EPOCH 8 done: loss 0.0569 - lr: 0.000036
2023-10-06 16:35:44,889 DEV : loss 0.13281205296516418 - f1-score (micro avg) 0.8264
2023-10-06 16:35:44,897 saving best model
2023-10-06 16:35:49,205 ----------------------------------------------------------------------------------------------------
2023-10-06 16:36:00,808 epoch 9 - iter 15/152 - loss 0.06683062 - time (sec): 11.60 - samples/sec: 291.25 - lr: 0.000034 - momentum: 0.000000
2023-10-06 16:36:11,803 epoch 9 - iter 30/152 - loss 0.05790843 - time (sec): 22.60 - samples/sec: 290.44 - lr: 0.000033 - momentum: 0.000000
2023-10-06 16:36:22,556 epoch 9 - iter 45/152 - loss 0.05600634 - time (sec): 33.35 - samples/sec: 285.52 - lr: 0.000031 - momentum: 0.000000
2023-10-06 16:36:33,214 epoch 9 - iter 60/152 - loss 0.05304022 - time (sec): 44.01 - samples/sec: 281.77 - lr: 0.000029 - momentum: 0.000000
2023-10-06 16:36:43,962 epoch 9 - iter 75/152 - loss 0.04923524 - time (sec): 54.76 - samples/sec: 279.99 - lr: 0.000027 - momentum: 0.000000
2023-10-06 16:36:55,071 epoch 9 - iter 90/152 - loss 0.04860929 - time (sec): 65.86 - samples/sec: 280.38 - lr: 0.000026 - momentum: 0.000000
2023-10-06 16:37:06,633 epoch 9 - iter 105/152 - loss 0.04819044 - time (sec): 77.43 - samples/sec: 280.03 - lr: 0.000024 - momentum: 0.000000
2023-10-06 16:37:17,550 epoch 9 - iter 120/152 - loss 0.05061531 - time (sec): 88.34 - samples/sec: 278.29 - lr: 0.000022 - momentum: 0.000000
2023-10-06 16:37:28,427 epoch 9 - iter 135/152 - loss 0.04742917 - time (sec): 99.22 - samples/sec: 278.07 - lr: 0.000020 - momentum: 0.000000
2023-10-06 16:37:39,380 epoch 9 - iter 150/152 - loss 0.04774681 - time (sec): 110.17 - samples/sec: 277.85 - lr: 0.000019 - momentum: 0.000000
2023-10-06 16:37:40,609 ----------------------------------------------------------------------------------------------------
2023-10-06 16:37:40,609 EPOCH 9 done: loss 0.0482 - lr: 0.000019
2023-10-06 16:37:48,514 DEV : loss 0.1400037407875061 - f1-score (micro avg) 0.8197
2023-10-06 16:37:48,522 ----------------------------------------------------------------------------------------------------
2023-10-06 16:37:59,171 epoch 10 - iter 15/152 - loss 0.05615972 - time (sec): 10.65 - samples/sec: 269.93 - lr: 0.000017 - momentum: 0.000000
2023-10-06 16:38:11,284 epoch 10 - iter 30/152 - loss 0.04641276 - time (sec): 22.76 - samples/sec: 280.13 - lr: 0.000015 - momentum: 0.000000
2023-10-06 16:38:22,062 epoch 10 - iter 45/152 - loss 0.04558062 - time (sec): 33.54 - samples/sec: 279.29 - lr: 0.000013 - momentum: 0.000000
2023-10-06 16:38:32,593 epoch 10 - iter 60/152 - loss 0.04186607 - time (sec): 44.07 - samples/sec: 280.42 - lr: 0.000012 - momentum: 0.000000
2023-10-06 16:38:42,523 epoch 10 - iter 75/152 - loss 0.04540781 - time (sec): 54.00 - samples/sec: 280.13 - lr: 0.000010 - momentum: 0.000000
2023-10-06 16:38:52,937 epoch 10 - iter 90/152 - loss 0.04336269 - time (sec): 64.41 - samples/sec: 283.62 - lr: 0.000008 - momentum: 0.000000
2023-10-06 16:39:03,010 epoch 10 - iter 105/152 - loss 0.04154053 - time (sec): 74.49 - samples/sec: 283.07 - lr: 0.000006 - momentum: 0.000000
2023-10-06 16:39:13,719 epoch 10 - iter 120/152 - loss 0.04194713 - time (sec): 85.20 - samples/sec: 285.38 - lr: 0.000005 - momentum: 0.000000
2023-10-06 16:39:24,367 epoch 10 - iter 135/152 - loss 0.04126831 - time (sec): 95.84 - samples/sec: 286.89 - lr: 0.000003 - momentum: 0.000000
2023-10-06 16:39:34,864 epoch 10 - iter 150/152 - loss 0.04270560 - time (sec): 106.34 - samples/sec: 288.95 - lr: 0.000001 - momentum: 0.000000
2023-10-06 16:39:35,902 ----------------------------------------------------------------------------------------------------
2023-10-06 16:39:35,902 EPOCH 10 done: loss 0.0431 - lr: 0.000001
2023-10-06 16:39:43,530 DEV : loss 0.14096252620220184 - f1-score (micro avg) 0.8235
2023-10-06 16:39:44,371 ----------------------------------------------------------------------------------------------------
2023-10-06 16:39:44,372 Loading model from best epoch ...
2023-10-06 16:39:47,649 SequenceTagger predicts: Dictionary with 25 tags: O, S-scope, B-scope, E-scope, I-scope, S-pers, B-pers, E-pers, I-pers, S-work, B-work, E-work, I-work, S-loc, B-loc, E-loc, I-loc, S-date, B-date, E-date, I-date, S-object, B-object, E-object, I-object
2023-10-06 16:39:54,210 
Results:
- F-score (micro) 0.7913
- F-score (macro) 0.4814
- Accuracy 0.6621

By class:
              precision    recall  f1-score   support

       scope     0.7453    0.7947    0.7692       151
        pers     0.8000    0.9583    0.8720        96
        work     0.7018    0.8421    0.7656        95
         loc     0.0000    0.0000    0.0000         3
        date     0.0000    0.0000    0.0000         3

   micro avg     0.7487    0.8391    0.7913       348
   macro avg     0.4494    0.5190    0.4814       348
weighted avg     0.7357    0.8391    0.7833       348

2023-10-06 16:39:54,210 ----------------------------------------------------------------------------------------------------