2023-10-06 20:41:12,480 ----------------------------------------------------------------------------------------------------
2023-10-06 20:41:12,481 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): T5LayerNorm()
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=25, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-06 20:41:12,482 ----------------------------------------------------------------------------------------------------
2023-10-06 20:41:12,482 MultiCorpus: 1100 train + 206 dev + 240 test sentences
 - NER_HIPE_2022 Corpus: 1100 train + 206 dev + 240 test sentences - /app/.flair/datasets/ner_hipe_2022/v2.1/ajmc/de/with_doc_seperator
2023-10-06 20:41:12,482 ----------------------------------------------------------------------------------------------------
2023-10-06 20:41:12,482 Train: 1100 sentences
2023-10-06 20:41:12,482 (train_with_dev=False, train_with_test=False)
2023-10-06 20:41:12,482 ----------------------------------------------------------------------------------------------------
2023-10-06 20:41:12,482 Training Params:
2023-10-06 20:41:12,482 - learning_rate: "0.00015"
2023-10-06 20:41:12,482 - mini_batch_size: "4"
2023-10-06 20:41:12,482 - max_epochs: "10"
2023-10-06 20:41:12,482 - shuffle: "True"
2023-10-06 20:41:12,482 ----------------------------------------------------------------------------------------------------
2023-10-06 20:41:12,482 Plugins:
2023-10-06 20:41:12,482 - TensorboardLogger
2023-10-06 20:41:12,483 - LinearScheduler | warmup_fraction: '0.1'
2023-10-06 20:41:12,483 ----------------------------------------------------------------------------------------------------
2023-10-06 20:41:12,483 Final evaluation on model from best epoch (best-model.pt)
2023-10-06 20:41:12,483 - metric: "('micro avg', 'f1-score')"
2023-10-06 20:41:12,483 ----------------------------------------------------------------------------------------------------
2023-10-06 20:41:12,483 Computation:
2023-10-06 20:41:12,483 - compute on device: cuda:0
2023-10-06 20:41:12,483 - embedding storage: none
2023-10-06 20:41:12,483 ----------------------------------------------------------------------------------------------------
2023-10-06 20:41:12,483 Model training base path: "hmbench-ajmc/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-1"
2023-10-06 20:41:12,483 ----------------------------------------------------------------------------------------------------
2023-10-06 20:41:12,483 ----------------------------------------------------------------------------------------------------
2023-10-06 20:41:12,483 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-06 20:41:23,097 epoch 1 - iter 27/275 - loss 3.22851574 - time (sec): 10.61 - samples/sec: 208.24 - lr: 0.000014 - momentum: 0.000000
2023-10-06 20:41:33,019 epoch 1 - iter 54/275 - loss 3.21829774 - time (sec): 20.53 - samples/sec: 213.98 - lr: 0.000029 - momentum: 0.000000
2023-10-06 20:41:42,867 epoch 1 - iter 81/275 - loss 3.19979413 - time (sec): 30.38 - samples/sec: 215.91 - lr: 0.000044 - momentum: 0.000000
2023-10-06 20:41:52,594 epoch 1 - iter 108/275 - loss 3.15653748 - time (sec): 40.11 - samples/sec: 215.83 - lr: 0.000058 - momentum: 0.000000
2023-10-06 20:42:03,379 epoch 1 - iter 135/275 - loss 3.06509180 - time (sec): 50.89 - samples/sec: 218.86 - lr: 0.000073 - momentum: 0.000000
2023-10-06 20:42:13,893 epoch 1 - iter 162/275 - loss 2.96221944 - time (sec): 61.41 - samples/sec: 219.38 - lr: 0.000088 - momentum: 0.000000
2023-10-06 20:42:24,478 epoch 1 - iter 189/275 - loss 2.85684614 - time (sec): 71.99 - samples/sec: 219.80 - lr: 0.000103 - momentum: 0.000000
2023-10-06 20:42:34,575 epoch 1 - iter 216/275 - loss 2.74750307 - time (sec): 82.09 - samples/sec: 219.59 - lr: 0.000117 - momentum: 0.000000
2023-10-06 20:42:44,911 epoch 1 - iter 243/275 - loss 2.63108071 - time (sec): 92.43 - samples/sec: 219.19 - lr: 0.000132 - momentum: 0.000000
2023-10-06 20:42:54,630 epoch 1 - iter 270/275 - loss 2.53023247 - time (sec): 102.15 - samples/sec: 218.06 - lr: 0.000147 - momentum: 0.000000
2023-10-06 20:42:56,909 ----------------------------------------------------------------------------------------------------
2023-10-06 20:42:56,910 EPOCH 1 done: loss 2.5051 - lr: 0.000147
2023-10-06 20:43:03,213 DEV : loss 1.1949074268341064 - f1-score (micro avg) 0.0
2023-10-06 20:43:03,219 ----------------------------------------------------------------------------------------------------
2023-10-06 20:43:13,749 epoch 2 - iter 27/275 - loss 1.09853432 - time (sec): 10.53 - samples/sec: 209.06 - lr: 0.000148 - momentum: 0.000000
2023-10-06 20:43:24,679 epoch 2 - iter 54/275 - loss 0.97325545 - time (sec): 21.46 - samples/sec: 208.26 - lr: 0.000147 - momentum: 0.000000
2023-10-06 20:43:34,960 epoch 2 - iter 81/275 - loss 0.96420986 - time (sec): 31.74 - samples/sec: 207.53 - lr: 0.000145 - momentum: 0.000000
2023-10-06 20:43:44,907 epoch 2 - iter 108/275 - loss 0.90817676 - time (sec): 41.69 - samples/sec: 204.38 - lr: 0.000144 - momentum: 0.000000
2023-10-06 20:43:55,205 epoch 2 - iter 135/275 - loss 0.89704867 - time (sec): 51.98 - samples/sec: 203.33 - lr: 0.000142 - momentum: 0.000000
2023-10-06 20:44:06,528 epoch 2 - iter 162/275 - loss 0.86146920 - time (sec): 63.31 - samples/sec: 204.97 - lr: 0.000140 - momentum: 0.000000
2023-10-06 20:44:17,829 epoch 2 - iter 189/275 - loss 0.82102558 - time (sec): 74.61 - samples/sec: 206.69 - lr: 0.000139 - momentum: 0.000000
2023-10-06 20:44:28,642 epoch 2 - iter 216/275 - loss 0.78226959 - time (sec): 85.42 - samples/sec: 207.29 - lr: 0.000137 - momentum: 0.000000
2023-10-06 20:44:39,384 epoch 2 - iter 243/275 - loss 0.74764483 - time (sec): 96.16 - samples/sec: 208.07 - lr: 0.000135 - momentum: 0.000000
2023-10-06 20:44:49,949 epoch 2 - iter 270/275 - loss 0.72320304 - time (sec): 106.73 - samples/sec: 208.87 - lr: 0.000134 - momentum: 0.000000
2023-10-06 20:44:51,899 ----------------------------------------------------------------------------------------------------
2023-10-06 20:44:51,899 EPOCH 2 done: loss 0.7199 - lr: 0.000134
2023-10-06 20:44:58,340 DEV : loss 0.43983766436576843 - f1-score (micro avg) 0.2969
2023-10-06 20:44:58,345 saving best model
2023-10-06 20:44:59,199 ----------------------------------------------------------------------------------------------------
2023-10-06 20:45:09,642 epoch 3 - iter 27/275 - loss 0.40940852 - time (sec): 10.44 - samples/sec: 212.23 - lr: 0.000132 - momentum: 0.000000
2023-10-06 20:45:20,822 epoch 3 - iter 54/275 - loss 0.38706250 - time (sec): 21.62 - samples/sec: 215.90 - lr: 0.000130 - momentum: 0.000000
2023-10-06 20:45:31,632 epoch 3 - iter 81/275 - loss 0.38027867 - time (sec): 32.43 - samples/sec: 216.40 - lr: 0.000129 - momentum: 0.000000
2023-10-06 20:45:42,720 epoch 3 - iter 108/275 - loss 0.37562173 - time (sec): 43.52 - samples/sec: 215.70 - lr: 0.000127 - momentum: 0.000000
2023-10-06 20:45:53,200 epoch 3 - iter 135/275 - loss 0.36728998 - time (sec): 54.00 - samples/sec: 214.06 - lr: 0.000125 - momentum: 0.000000
2023-10-06 20:46:04,063 epoch 3 - iter 162/275 - loss 0.36364722 - time (sec): 64.86 - samples/sec: 213.19 - lr: 0.000124 - momentum: 0.000000
2023-10-06 20:46:14,084 epoch 3 - iter 189/275 - loss 0.35221302 - time (sec): 74.88 - samples/sec: 211.14 - lr: 0.000122 - momentum: 0.000000
2023-10-06 20:46:24,319 epoch 3 - iter 216/275 - loss 0.34021923 - time (sec): 85.12 - samples/sec: 209.43 - lr: 0.000120 - momentum: 0.000000
2023-10-06 20:46:35,911 epoch 3 - iter 243/275 - loss 0.33081667 - time (sec): 96.71 - samples/sec: 210.07 - lr: 0.000119 - momentum: 0.000000
2023-10-06 20:46:45,972 epoch 3 - iter 270/275 - loss 0.31882959 - time (sec): 106.77 - samples/sec: 208.92 - lr: 0.000117 - momentum: 0.000000
2023-10-06 20:46:48,190 ----------------------------------------------------------------------------------------------------
2023-10-06 20:46:48,190 EPOCH 3 done: loss 0.3166 - lr: 0.000117
2023-10-06 20:46:54,743 DEV : loss 0.22956699132919312 - f1-score (micro avg) 0.7306
2023-10-06 20:46:54,748 saving best model
2023-10-06 20:46:59,225 ----------------------------------------------------------------------------------------------------
2023-10-06 20:47:11,123 epoch 4 - iter 27/275 - loss 0.18862218 - time (sec): 11.90 - samples/sec: 219.13 - lr: 0.000115 - momentum: 0.000000
2023-10-06 20:47:21,611 epoch 4 - iter 54/275 - loss 0.17081658 - time (sec): 22.38 - samples/sec: 212.87 - lr: 0.000114 - momentum: 0.000000
2023-10-06 20:47:31,893 epoch 4 - iter 81/275 - loss 0.17792602 - time (sec): 32.67 - samples/sec: 207.43 - lr: 0.000112 - momentum: 0.000000
2023-10-06 20:47:42,500 epoch 4 - iter 108/275 - loss 0.18147922 - time (sec): 43.27 - samples/sec: 208.49 - lr: 0.000110 - momentum: 0.000000
2023-10-06 20:47:53,725 epoch 4 - iter 135/275 - loss 0.18415122 - time (sec): 54.50 - samples/sec: 210.10 - lr: 0.000109 - momentum: 0.000000
2023-10-06 20:48:04,226 epoch 4 - iter 162/275 - loss 0.17880585 - time (sec): 65.00 - samples/sec: 208.51 - lr: 0.000107 - momentum: 0.000000
2023-10-06 20:48:15,156 epoch 4 - iter 189/275 - loss 0.17398312 - time (sec): 75.93 - samples/sec: 208.20 - lr: 0.000105 - momentum: 0.000000
2023-10-06 20:48:25,454 epoch 4 - iter 216/275 - loss 0.17691687 - time (sec): 86.23 - samples/sec: 208.11 - lr: 0.000104 - momentum: 0.000000
2023-10-06 20:48:35,926 epoch 4 - iter 243/275 - loss 0.17122472 - time (sec): 96.70 - samples/sec: 208.21 - lr: 0.000102 - momentum: 0.000000
2023-10-06 20:48:46,776 epoch 4 - iter 270/275 - loss 0.16701451 - time (sec): 107.55 - samples/sec: 207.35 - lr: 0.000101 - momentum: 0.000000
2023-10-06 20:48:48,883 ----------------------------------------------------------------------------------------------------
2023-10-06 20:48:48,883 EPOCH 4 done: loss 0.1661 - lr: 0.000101
2023-10-06 20:48:55,611 DEV : loss 0.14865034818649292 - f1-score (micro avg) 0.8125
2023-10-06 20:48:55,616 saving best model
2023-10-06 20:48:59,970 ----------------------------------------------------------------------------------------------------
2023-10-06 20:49:10,651 epoch 5 - iter 27/275 - loss 0.13835944 - time (sec): 10.68 - samples/sec: 204.71 - lr: 0.000099 - momentum: 0.000000
2023-10-06 20:49:21,407 epoch 5 - iter 54/275 - loss 0.12077603 - time (sec): 21.44 - samples/sec: 203.68 - lr: 0.000097 - momentum: 0.000000
2023-10-06 20:49:31,869 epoch 5 - iter 81/275 - loss 0.12505616 - time (sec): 31.90 - samples/sec: 205.13 - lr: 0.000095 - momentum: 0.000000
2023-10-06 20:49:43,704 epoch 5 - iter 108/275 - loss 0.11524283 - time (sec): 43.73 - samples/sec: 208.68 - lr: 0.000094 - momentum: 0.000000
2023-10-06 20:49:55,748 epoch 5 - iter 135/275 - loss 0.11047258 - time (sec): 55.78 - samples/sec: 209.28 - lr: 0.000092 - momentum: 0.000000
2023-10-06 20:50:06,657 epoch 5 - iter 162/275 - loss 0.10857292 - time (sec): 66.68 - samples/sec: 207.63 - lr: 0.000090 - momentum: 0.000000
2023-10-06 20:50:17,559 epoch 5 - iter 189/275 - loss 0.10346163 - time (sec): 77.59 - samples/sec: 207.55 - lr: 0.000089 - momentum: 0.000000
2023-10-06 20:50:28,156 epoch 5 - iter 216/275 - loss 0.10354840 - time (sec): 88.18 - samples/sec: 208.70 - lr: 0.000087 - momentum: 0.000000
2023-10-06 20:50:38,244 epoch 5 - iter 243/275 - loss 0.10378967 - time (sec): 98.27 - samples/sec: 207.14 - lr: 0.000086 - momentum: 0.000000
2023-10-06 20:50:48,462 epoch 5 - iter 270/275 - loss 0.09977533 - time (sec): 108.49 - samples/sec: 206.57 - lr: 0.000084 - momentum: 0.000000
2023-10-06 20:50:50,302 ----------------------------------------------------------------------------------------------------
2023-10-06 20:50:50,302 EPOCH 5 done: loss 0.1025 - lr: 0.000084
2023-10-06 20:50:57,063 DEV : loss 0.12752477824687958 - f1-score (micro avg) 0.8685
2023-10-06 20:50:57,069 saving best model
2023-10-06 20:51:01,781 ----------------------------------------------------------------------------------------------------
2023-10-06 20:51:12,742 epoch 6 - iter 27/275 - loss 0.07310406 - time (sec): 10.96 - samples/sec: 214.16 - lr: 0.000082 - momentum: 0.000000
2023-10-06 20:51:23,205 epoch 6 - iter 54/275 - loss 0.07407181 - time (sec): 21.42 - samples/sec: 209.41 - lr: 0.000080 - momentum: 0.000000
2023-10-06 20:51:33,976 epoch 6 - iter 81/275 - loss 0.06876723 - time (sec): 32.19 - samples/sec: 209.02 - lr: 0.000079 - momentum: 0.000000
2023-10-06 20:51:45,214 epoch 6 - iter 108/275 - loss 0.06624089 - time (sec): 43.43 - samples/sec: 207.59 - lr: 0.000077 - momentum: 0.000000
2023-10-06 20:51:56,261 epoch 6 - iter 135/275 - loss 0.06596938 - time (sec): 54.48 - samples/sec: 206.93 - lr: 0.000075 - momentum: 0.000000
2023-10-06 20:52:07,021 epoch 6 - iter 162/275 - loss 0.06566254 - time (sec): 65.24 - samples/sec: 206.49 - lr: 0.000074 - momentum: 0.000000
2023-10-06 20:52:18,206 epoch 6 - iter 189/275 - loss 0.07319061 - time (sec): 76.42 - samples/sec: 208.38 - lr: 0.000072 - momentum: 0.000000
2023-10-06 20:52:28,891 epoch 6 - iter 216/275 - loss 0.07175614 - time (sec): 87.11 - samples/sec: 207.72 - lr: 0.000071 - momentum: 0.000000
2023-10-06 20:52:39,707 epoch 6 - iter 243/275 - loss 0.07368640 - time (sec): 97.92 - samples/sec: 207.27 - lr: 0.000069 - momentum: 0.000000
2023-10-06 20:52:49,870 epoch 6 - iter 270/275 - loss 0.07386029 - time (sec): 108.09 - samples/sec: 206.37 - lr: 0.000067 - momentum: 0.000000
2023-10-06 20:52:52,149 ----------------------------------------------------------------------------------------------------
2023-10-06 20:52:52,150 EPOCH 6 done: loss 0.0737 - lr: 0.000067
2023-10-06 20:52:58,861 DEV : loss 0.1320181041955948 - f1-score (micro avg) 0.8616
2023-10-06 20:52:58,867 ----------------------------------------------------------------------------------------------------
2023-10-06 20:53:09,121 epoch 7 - iter 27/275 - loss 0.04497303 - time (sec): 10.25 - samples/sec: 199.75 - lr: 0.000065 - momentum: 0.000000
2023-10-06 20:53:19,415 epoch 7 - iter 54/275 - loss 0.04642627 - time (sec): 20.55 - samples/sec: 198.86 - lr: 0.000064 - momentum: 0.000000
2023-10-06 20:53:30,703 epoch 7 - iter 81/275 - loss 0.05601185 - time (sec): 31.83 - samples/sec: 203.08 - lr: 0.000062 - momentum: 0.000000
2023-10-06 20:53:41,853 epoch 7 - iter 108/275 - loss 0.04747209 - time (sec): 42.98 - samples/sec: 207.33 - lr: 0.000060 - momentum: 0.000000
2023-10-06 20:53:53,189 epoch 7 - iter 135/275 - loss 0.04916826 - time (sec): 54.32 - samples/sec: 207.47 - lr: 0.000059 - momentum: 0.000000
2023-10-06 20:54:03,634 epoch 7 - iter 162/275 - loss 0.05453263 - time (sec): 64.77 - samples/sec: 205.22 - lr: 0.000057 - momentum: 0.000000
2023-10-06 20:54:14,657 epoch 7 - iter 189/275 - loss 0.05089647 - time (sec): 75.79 - samples/sec: 205.49 - lr: 0.000056 - momentum: 0.000000
2023-10-06 20:54:25,774 epoch 7 - iter 216/275 - loss 0.05096154 - time (sec): 86.91 - samples/sec: 205.92 - lr: 0.000054 - momentum: 0.000000
2023-10-06 20:54:37,052 epoch 7 - iter 243/275 - loss 0.05331658 - time (sec): 98.18 - samples/sec: 207.30 - lr: 0.000052 - momentum: 0.000000
2023-10-06 20:54:47,632 epoch 7 - iter 270/275 - loss 0.05694342 - time (sec): 108.76 - samples/sec: 206.60 - lr: 0.000051 - momentum: 0.000000
2023-10-06 20:54:49,366 ----------------------------------------------------------------------------------------------------
2023-10-06 20:54:49,366 EPOCH 7 done: loss 0.0568 - lr: 0.000051
2023-10-06 20:54:56,092 DEV : loss 0.12154703587293625 - f1-score (micro avg) 0.8695
2023-10-06 20:54:56,097 saving best model
2023-10-06 20:55:00,483 ----------------------------------------------------------------------------------------------------
2023-10-06 20:55:10,631 epoch 8 - iter 27/275 - loss 0.04565948 - time (sec): 10.15 - samples/sec: 191.40 - lr: 0.000049 - momentum: 0.000000
2023-10-06 20:55:21,037 epoch 8 - iter 54/275 - loss 0.06174029 - time (sec): 20.55 - samples/sec: 201.59 - lr: 0.000047 - momentum: 0.000000
2023-10-06 20:55:31,987 epoch 8 - iter 81/275 - loss 0.05427538 - time (sec): 31.50 - samples/sec: 205.38 - lr: 0.000045 - momentum: 0.000000
2023-10-06 20:55:42,945 epoch 8 - iter 108/275 - loss 0.05040491 - time (sec): 42.46 - samples/sec: 206.34 - lr: 0.000044 - momentum: 0.000000
2023-10-06 20:55:53,321 epoch 8 - iter 135/275 - loss 0.04921226 - time (sec): 52.84 - samples/sec: 203.27 - lr: 0.000042 - momentum: 0.000000
2023-10-06 20:56:04,857 epoch 8 - iter 162/275 - loss 0.05030560 - time (sec): 64.37 - samples/sec: 205.65 - lr: 0.000041 - momentum: 0.000000
2023-10-06 20:56:14,975 epoch 8 - iter 189/275 - loss 0.04905001 - time (sec): 74.49 - samples/sec: 205.13 - lr: 0.000039 - momentum: 0.000000
2023-10-06 20:56:26,443 epoch 8 - iter 216/275 - loss 0.04739645 - time (sec): 85.96 - samples/sec: 206.03 - lr: 0.000037 - momentum: 0.000000
2023-10-06 20:56:38,162 epoch 8 - iter 243/275 - loss 0.04880165 - time (sec): 97.68 - samples/sec: 206.19 - lr: 0.000036 - momentum: 0.000000
2023-10-06 20:56:48,961 epoch 8 - iter 270/275 - loss 0.04732420 - time (sec): 108.48 - samples/sec: 205.91 - lr: 0.000034 - momentum: 0.000000
2023-10-06 20:56:51,030 ----------------------------------------------------------------------------------------------------
2023-10-06 20:56:51,030 EPOCH 8 done: loss 0.0465 - lr: 0.000034
2023-10-06 20:56:57,714 DEV : loss 0.12373079359531403 - f1-score (micro avg) 0.8668
2023-10-06 20:56:57,719 ----------------------------------------------------------------------------------------------------
2023-10-06 20:57:08,225 epoch 9 - iter 27/275 - loss 0.02881863 - time (sec): 10.50 - samples/sec: 204.50 - lr: 0.000032 - momentum: 0.000000
2023-10-06 20:57:18,849 epoch 9 - iter 54/275 - loss 0.04766868 - time (sec): 21.13 - samples/sec: 204.81 - lr: 0.000030 - momentum: 0.000000
2023-10-06 20:57:29,593 epoch 9 - iter 81/275 - loss 0.04945213 - time (sec): 31.87 - samples/sec: 204.48 - lr: 0.000029 - momentum: 0.000000
2023-10-06 20:57:40,917 epoch 9 - iter 108/275 - loss 0.04948668 - time (sec): 43.20 - samples/sec: 204.81 - lr: 0.000027 - momentum: 0.000000
2023-10-06 20:57:51,779 epoch 9 - iter 135/275 - loss 0.05271936 - time (sec): 54.06 - samples/sec: 205.15 - lr: 0.000026 - momentum: 0.000000
2023-10-06 20:58:02,196 epoch 9 - iter 162/275 - loss 0.05315498 - time (sec): 64.47 - samples/sec: 205.43 - lr: 0.000024 - momentum: 0.000000
2023-10-06 20:58:14,113 epoch 9 - iter 189/275 - loss 0.04591171 - time (sec): 76.39 - samples/sec: 206.84 - lr: 0.000022 - momentum: 0.000000
2023-10-06 20:58:24,875 epoch 9 - iter 216/275 - loss 0.04283486 - time (sec): 87.15 - samples/sec: 206.73 - lr: 0.000021 - momentum: 0.000000
2023-10-06 20:58:35,090 epoch 9 - iter 243/275 - loss 0.03967070 - time (sec): 97.37 - samples/sec: 205.61 - lr: 0.000019 - momentum: 0.000000
2023-10-06 20:58:45,978 epoch 9 - iter 270/275 - loss 0.04036106 - time (sec): 108.26 - samples/sec: 206.32 - lr: 0.000017 - momentum: 0.000000
2023-10-06 20:58:47,978 ----------------------------------------------------------------------------------------------------
2023-10-06 20:58:47,978 EPOCH 9 done: loss 0.0402 - lr: 0.000017
2023-10-06 20:58:54,661 DEV : loss 0.126312717795372 - f1-score (micro avg) 0.8752
2023-10-06 20:58:54,667 saving best model
2023-10-06 20:58:59,057 ----------------------------------------------------------------------------------------------------
2023-10-06 20:59:10,177 epoch 10 - iter 27/275 - loss 0.06065561 - time (sec): 11.12 - samples/sec: 204.36 - lr: 0.000015 - momentum: 0.000000
2023-10-06 20:59:20,842 epoch 10 - iter 54/275 - loss 0.05483908 - time (sec): 21.78 - samples/sec: 203.50 - lr: 0.000014 - momentum: 0.000000
2023-10-06 20:59:31,444 epoch 10 - iter 81/275 - loss 0.04534089 - time (sec): 32.38 - samples/sec: 205.00 - lr: 0.000012 - momentum: 0.000000
2023-10-06 20:59:42,516 epoch 10 - iter 108/275 - loss 0.04445337 - time (sec): 43.46 - samples/sec: 206.32 - lr: 0.000011 - momentum: 0.000000
2023-10-06 20:59:53,025 epoch 10 - iter 135/275 - loss 0.03812986 - time (sec): 53.97 - samples/sec: 204.48 - lr: 0.000009 - momentum: 0.000000
2023-10-06 21:00:03,647 epoch 10 - iter 162/275 - loss 0.04007420 - time (sec): 64.59 - samples/sec: 204.17 - lr: 0.000007 - momentum: 0.000000
2023-10-06 21:00:14,653 epoch 10 - iter 189/275 - loss 0.03835348 - time (sec): 75.59 - samples/sec: 204.39 - lr: 0.000006 - momentum: 0.000000
2023-10-06 21:00:25,689 epoch 10 - iter 216/275 - loss 0.03808426 - time (sec): 86.63 - samples/sec: 205.10 - lr: 0.000004 - momentum: 0.000000
2023-10-06 21:00:36,553 epoch 10 - iter 243/275 - loss 0.03670371 - time (sec): 97.49 - samples/sec: 205.86 - lr: 0.000002 - momentum: 0.000000
2023-10-06 21:00:47,225 epoch 10 - iter 270/275 - loss 0.03786440 - time (sec): 108.17 - samples/sec: 206.28 - lr: 0.000001 - momentum: 0.000000
2023-10-06 21:00:49,405 ----------------------------------------------------------------------------------------------------
2023-10-06 21:00:49,405 EPOCH 10 done: loss 0.0376 - lr: 0.000001
2023-10-06 21:00:56,101 DEV : loss 0.12622681260108948 - f1-score (micro avg) 0.8786
2023-10-06 21:00:56,107 saving best model
2023-10-06 21:01:01,443 ----------------------------------------------------------------------------------------------------
2023-10-06 21:01:01,445 Loading model from best epoch ...
2023-10-06 21:01:06,293 SequenceTagger predicts: Dictionary with 25 tags: O, S-scope, B-scope, E-scope, I-scope, S-pers, B-pers, E-pers, I-pers, S-work, B-work, E-work, I-work, S-loc, B-loc, E-loc, I-loc, S-object, B-object, E-object, I-object, S-date, B-date, E-date, I-date
2023-10-06 21:01:13,499
Results:
- F-score (micro) 0.8903
- F-score (macro) 0.6643
- Accuracy 0.8197

By class:
              precision    recall  f1-score   support

       scope     0.8977    0.8977    0.8977       176
        pers     0.9154    0.9297    0.9225       128
        work     0.8182    0.8514    0.8344        74
         loc     1.0000    0.5000    0.6667         2
      object     0.0000    0.0000    0.0000         2

   micro avg     0.8880    0.8927    0.8903       382
   macro avg     0.7263    0.6358    0.6643       382
weighted avg     0.8841    0.8927    0.8879       382

2023-10-06 21:01:13,499 ----------------------------------------------------------------------------------------------------