2023-10-06 09:54:44,470 ---------------------------------------------------------------------------------------------------- 2023-10-06 09:54:44,471 Model: "SequenceTagger( (embeddings): ByT5Embeddings( (model): T5EncoderModel( (shared): Embedding(384, 1472) (encoder): T5Stack( (embed_tokens): Embedding(384, 1472) (block): ModuleList( (0): T5Block( (layer): ModuleList( (0): T5LayerSelfAttention( (SelfAttention): T5Attention( (q): Linear(in_features=1472, out_features=384, bias=False) (k): Linear(in_features=1472, out_features=384, bias=False) (v): Linear(in_features=1472, out_features=384, bias=False) (o): Linear(in_features=384, out_features=1472, bias=False) (relative_attention_bias): Embedding(32, 6) ) (layer_norm): T5LayerNorm() (dropout): Dropout(p=0.1, inplace=False) ) (1): T5LayerFF( (DenseReluDense): T5DenseGatedActDense( (wi_0): Linear(in_features=1472, out_features=3584, bias=False) (wi_1): Linear(in_features=1472, out_features=3584, bias=False) (wo): Linear(in_features=3584, out_features=1472, bias=False) (dropout): Dropout(p=0.1, inplace=False) (act): NewGELUActivation() ) (layer_norm): T5LayerNorm() (dropout): Dropout(p=0.1, inplace=False) ) ) ) (1-11): 11 x T5Block( (layer): ModuleList( (0): T5LayerSelfAttention( (SelfAttention): T5Attention( (q): Linear(in_features=1472, out_features=384, bias=False) (k): Linear(in_features=1472, out_features=384, bias=False) (v): Linear(in_features=1472, out_features=384, bias=False) (o): Linear(in_features=384, out_features=1472, bias=False) ) (layer_norm): T5LayerNorm() (dropout): Dropout(p=0.1, inplace=False) ) (1): T5LayerFF( (DenseReluDense): T5DenseGatedActDense( (wi_0): Linear(in_features=1472, out_features=3584, bias=False) (wi_1): Linear(in_features=1472, out_features=3584, bias=False) (wo): Linear(in_features=3584, out_features=1472, bias=False) (dropout): Dropout(p=0.1, inplace=False) (act): NewGELUActivation() ) (layer_norm): T5LayerNorm() (dropout): Dropout(p=0.1, inplace=False) ) ) ) ) 
(final_layer_norm): T5LayerNorm() (dropout): Dropout(p=0.1, inplace=False) ) ) ) (locked_dropout): LockedDropout(p=0.5) (linear): Linear(in_features=1472, out_features=25, bias=True) (loss_function): CrossEntropyLoss() )" 2023-10-06 09:54:44,471 ---------------------------------------------------------------------------------------------------- 2023-10-06 09:54:44,472 MultiCorpus: 1214 train + 266 dev + 251 test sentences - NER_HIPE_2022 Corpus: 1214 train + 266 dev + 251 test sentences - /app/.flair/datasets/ner_hipe_2022/v2.1/ajmc/en/with_doc_seperator 2023-10-06 09:54:44,472 ---------------------------------------------------------------------------------------------------- 2023-10-06 09:54:44,472 Train: 1214 sentences 2023-10-06 09:54:44,472 (train_with_dev=False, train_with_test=False) 2023-10-06 09:54:44,472 ---------------------------------------------------------------------------------------------------- 2023-10-06 09:54:44,472 Training Params: 2023-10-06 09:54:44,472 - learning_rate: "0.00016" 2023-10-06 09:54:44,472 - mini_batch_size: "4" 2023-10-06 09:54:44,472 - max_epochs: "10" 2023-10-06 09:54:44,472 - shuffle: "True" 2023-10-06 09:54:44,472 ---------------------------------------------------------------------------------------------------- 2023-10-06 09:54:44,472 Plugins: 2023-10-06 09:54:44,472 - TensorboardLogger 2023-10-06 09:54:44,472 - LinearScheduler | warmup_fraction: '0.1' 2023-10-06 09:54:44,472 ---------------------------------------------------------------------------------------------------- 2023-10-06 09:54:44,472 Final evaluation on model from best epoch (best-model.pt) 2023-10-06 09:54:44,472 - metric: "('micro avg', 'f1-score')" 2023-10-06 09:54:44,473 ---------------------------------------------------------------------------------------------------- 2023-10-06 09:54:44,473 Computation: 2023-10-06 09:54:44,473 - compute on device: cuda:0 2023-10-06 09:54:44,473 - embedding storage: none 2023-10-06 09:54:44,473 
---------------------------------------------------------------------------------------------------- 2023-10-06 09:54:44,473 Model training base path: "hmbench-ajmc/en-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-1" 2023-10-06 09:54:44,473 ---------------------------------------------------------------------------------------------------- 2023-10-06 09:54:44,473 ---------------------------------------------------------------------------------------------------- 2023-10-06 09:54:44,473 Logging anything other than scalars to TensorBoard is currently not supported. 2023-10-06 09:54:56,376 epoch 1 - iter 30/304 - loss 3.22742643 - time (sec): 11.90 - samples/sec: 288.60 - lr: 0.000015 - momentum: 0.000000 2023-10-06 09:55:07,887 epoch 1 - iter 60/304 - loss 3.21915042 - time (sec): 23.41 - samples/sec: 287.96 - lr: 0.000031 - momentum: 0.000000 2023-10-06 09:55:18,992 epoch 1 - iter 90/304 - loss 3.19773545 - time (sec): 34.52 - samples/sec: 282.12 - lr: 0.000047 - momentum: 0.000000 2023-10-06 09:55:30,310 epoch 1 - iter 120/304 - loss 3.13133843 - time (sec): 45.84 - samples/sec: 278.62 - lr: 0.000063 - momentum: 0.000000 2023-10-06 09:55:41,121 epoch 1 - iter 150/304 - loss 3.03631970 - time (sec): 56.65 - samples/sec: 274.31 - lr: 0.000078 - momentum: 0.000000 2023-10-06 09:55:51,674 epoch 1 - iter 180/304 - loss 2.92654304 - time (sec): 67.20 - samples/sec: 271.15 - lr: 0.000094 - momentum: 0.000000 2023-10-06 09:56:02,500 epoch 1 - iter 210/304 - loss 2.80021159 - time (sec): 78.03 - samples/sec: 270.63 - lr: 0.000110 - momentum: 0.000000 2023-10-06 09:56:13,664 epoch 1 - iter 240/304 - loss 2.66006911 - time (sec): 89.19 - samples/sec: 270.26 - lr: 0.000126 - momentum: 0.000000 2023-10-06 09:56:25,332 epoch 1 - iter 270/304 - loss 2.50546540 - time (sec): 100.86 - samples/sec: 269.65 - lr: 0.000142 - momentum: 0.000000 2023-10-06 09:56:37,599 epoch 1 - iter 300/304 - loss 2.33574461 - 
time (sec): 113.13 - samples/sec: 271.35 - lr: 0.000157 - momentum: 0.000000 2023-10-06 09:56:38,870 ---------------------------------------------------------------------------------------------------- 2023-10-06 09:56:38,870 EPOCH 1 done: loss 2.3228 - lr: 0.000157 2023-10-06 09:56:46,110 DEV : loss 0.8970457911491394 - f1-score (micro avg) 0.0 2023-10-06 09:56:46,116 ---------------------------------------------------------------------------------------------------- 2023-10-06 09:56:57,526 epoch 2 - iter 30/304 - loss 0.81144182 - time (sec): 11.41 - samples/sec: 269.45 - lr: 0.000158 - momentum: 0.000000 2023-10-06 09:57:09,002 epoch 2 - iter 60/304 - loss 0.75433976 - time (sec): 22.89 - samples/sec: 263.93 - lr: 0.000157 - momentum: 0.000000 2023-10-06 09:57:20,565 epoch 2 - iter 90/304 - loss 0.71555452 - time (sec): 34.45 - samples/sec: 263.96 - lr: 0.000155 - momentum: 0.000000 2023-10-06 09:57:32,831 epoch 2 - iter 120/304 - loss 0.66835125 - time (sec): 46.71 - samples/sec: 268.44 - lr: 0.000153 - momentum: 0.000000 2023-10-06 09:57:43,867 epoch 2 - iter 150/304 - loss 0.63094298 - time (sec): 57.75 - samples/sec: 265.47 - lr: 0.000151 - momentum: 0.000000 2023-10-06 09:57:55,696 epoch 2 - iter 180/304 - loss 0.57924068 - time (sec): 69.58 - samples/sec: 264.84 - lr: 0.000150 - momentum: 0.000000 2023-10-06 09:58:07,350 epoch 2 - iter 210/304 - loss 0.53846233 - time (sec): 81.23 - samples/sec: 262.91 - lr: 0.000148 - momentum: 0.000000 2023-10-06 09:58:19,675 epoch 2 - iter 240/304 - loss 0.51040511 - time (sec): 93.56 - samples/sec: 262.44 - lr: 0.000146 - momentum: 0.000000 2023-10-06 09:58:31,862 epoch 2 - iter 270/304 - loss 0.49514733 - time (sec): 105.74 - samples/sec: 261.92 - lr: 0.000144 - momentum: 0.000000 2023-10-06 09:58:43,426 epoch 2 - iter 300/304 - loss 0.47727700 - time (sec): 117.31 - samples/sec: 261.15 - lr: 0.000143 - momentum: 0.000000 2023-10-06 09:58:44,875 
---------------------------------------------------------------------------------------------------- 2023-10-06 09:58:44,876 EPOCH 2 done: loss 0.4740 - lr: 0.000143 2023-10-06 09:58:53,032 DEV : loss 0.31851717829704285 - f1-score (micro avg) 0.4416 2023-10-06 09:58:53,040 saving best model 2023-10-06 09:58:53,870 ---------------------------------------------------------------------------------------------------- 2023-10-06 09:59:06,320 epoch 3 - iter 30/304 - loss 0.26608500 - time (sec): 12.45 - samples/sec: 257.05 - lr: 0.000141 - momentum: 0.000000 2023-10-06 09:59:18,741 epoch 3 - iter 60/304 - loss 0.23335679 - time (sec): 24.87 - samples/sec: 257.38 - lr: 0.000139 - momentum: 0.000000 2023-10-06 09:59:30,077 epoch 3 - iter 90/304 - loss 0.22225252 - time (sec): 36.21 - samples/sec: 253.38 - lr: 0.000137 - momentum: 0.000000 2023-10-06 09:59:42,424 epoch 3 - iter 120/304 - loss 0.22829801 - time (sec): 48.55 - samples/sec: 256.38 - lr: 0.000135 - momentum: 0.000000 2023-10-06 09:59:53,935 epoch 3 - iter 150/304 - loss 0.22362351 - time (sec): 60.06 - samples/sec: 254.76 - lr: 0.000134 - momentum: 0.000000 2023-10-06 10:00:05,592 epoch 3 - iter 180/304 - loss 0.22215273 - time (sec): 71.72 - samples/sec: 255.33 - lr: 0.000132 - momentum: 0.000000 2023-10-06 10:00:17,824 epoch 3 - iter 210/304 - loss 0.21460363 - time (sec): 83.95 - samples/sec: 255.66 - lr: 0.000130 - momentum: 0.000000 2023-10-06 10:00:30,321 epoch 3 - iter 240/304 - loss 0.20673630 - time (sec): 96.45 - samples/sec: 256.78 - lr: 0.000128 - momentum: 0.000000 2023-10-06 10:00:41,730 epoch 3 - iter 270/304 - loss 0.20231564 - time (sec): 107.86 - samples/sec: 256.59 - lr: 0.000127 - momentum: 0.000000 2023-10-06 10:00:52,833 epoch 3 - iter 300/304 - loss 0.19777168 - time (sec): 118.96 - samples/sec: 256.91 - lr: 0.000125 - momentum: 0.000000 2023-10-06 10:00:54,278 ---------------------------------------------------------------------------------------------------- 2023-10-06 10:00:54,278 
EPOCH 3 done: loss 0.1959 - lr: 0.000125 2023-10-06 10:01:01,511 DEV : loss 0.18969886004924774 - f1-score (micro avg) 0.7057 2023-10-06 10:01:01,520 saving best model 2023-10-06 10:01:05,856 ---------------------------------------------------------------------------------------------------- 2023-10-06 10:01:17,088 epoch 4 - iter 30/304 - loss 0.12053707 - time (sec): 11.23 - samples/sec: 270.96 - lr: 0.000123 - momentum: 0.000000 2023-10-06 10:01:28,458 epoch 4 - iter 60/304 - loss 0.13105014 - time (sec): 22.60 - samples/sec: 268.75 - lr: 0.000121 - momentum: 0.000000 2023-10-06 10:01:39,577 epoch 4 - iter 90/304 - loss 0.12897336 - time (sec): 33.72 - samples/sec: 266.85 - lr: 0.000119 - momentum: 0.000000 2023-10-06 10:01:50,991 epoch 4 - iter 120/304 - loss 0.12192808 - time (sec): 45.13 - samples/sec: 266.81 - lr: 0.000118 - momentum: 0.000000 2023-10-06 10:02:03,220 epoch 4 - iter 150/304 - loss 0.12197006 - time (sec): 57.36 - samples/sec: 271.13 - lr: 0.000116 - momentum: 0.000000 2023-10-06 10:02:14,426 epoch 4 - iter 180/304 - loss 0.11629911 - time (sec): 68.57 - samples/sec: 270.53 - lr: 0.000114 - momentum: 0.000000 2023-10-06 10:02:25,342 epoch 4 - iter 210/304 - loss 0.11116342 - time (sec): 79.48 - samples/sec: 270.01 - lr: 0.000112 - momentum: 0.000000 2023-10-06 10:02:36,770 epoch 4 - iter 240/304 - loss 0.10660005 - time (sec): 90.91 - samples/sec: 270.52 - lr: 0.000111 - momentum: 0.000000 2023-10-06 10:02:48,070 epoch 4 - iter 270/304 - loss 0.10527643 - time (sec): 102.21 - samples/sec: 269.97 - lr: 0.000109 - momentum: 0.000000 2023-10-06 10:02:59,578 epoch 4 - iter 300/304 - loss 0.10675985 - time (sec): 113.72 - samples/sec: 269.88 - lr: 0.000107 - momentum: 0.000000 2023-10-06 10:03:00,863 ---------------------------------------------------------------------------------------------------- 2023-10-06 10:03:00,864 EPOCH 4 done: loss 0.1062 - lr: 0.000107 2023-10-06 10:03:08,009 DEV : loss 0.14084239304065704 - f1-score (micro avg) 0.8033 
2023-10-06 10:03:08,018 saving best model 2023-10-06 10:03:12,371 ---------------------------------------------------------------------------------------------------- 2023-10-06 10:03:23,771 epoch 5 - iter 30/304 - loss 0.06645253 - time (sec): 11.40 - samples/sec: 281.09 - lr: 0.000105 - momentum: 0.000000 2023-10-06 10:03:35,248 epoch 5 - iter 60/304 - loss 0.06334492 - time (sec): 22.88 - samples/sec: 274.19 - lr: 0.000103 - momentum: 0.000000 2023-10-06 10:03:46,693 epoch 5 - iter 90/304 - loss 0.07146608 - time (sec): 34.32 - samples/sec: 270.86 - lr: 0.000102 - momentum: 0.000000 2023-10-06 10:03:58,057 epoch 5 - iter 120/304 - loss 0.06442638 - time (sec): 45.68 - samples/sec: 268.61 - lr: 0.000100 - momentum: 0.000000 2023-10-06 10:04:09,037 epoch 5 - iter 150/304 - loss 0.06651026 - time (sec): 56.66 - samples/sec: 264.08 - lr: 0.000098 - momentum: 0.000000 2023-10-06 10:04:21,637 epoch 5 - iter 180/304 - loss 0.07204840 - time (sec): 69.26 - samples/sec: 266.08 - lr: 0.000096 - momentum: 0.000000 2023-10-06 10:04:32,999 epoch 5 - iter 210/304 - loss 0.07177304 - time (sec): 80.63 - samples/sec: 265.09 - lr: 0.000094 - momentum: 0.000000 2023-10-06 10:04:45,215 epoch 5 - iter 240/304 - loss 0.07011091 - time (sec): 92.84 - samples/sec: 265.16 - lr: 0.000093 - momentum: 0.000000 2023-10-06 10:04:56,839 epoch 5 - iter 270/304 - loss 0.07061563 - time (sec): 104.47 - samples/sec: 265.47 - lr: 0.000091 - momentum: 0.000000 2023-10-06 10:05:08,637 epoch 5 - iter 300/304 - loss 0.06762143 - time (sec): 116.26 - samples/sec: 264.50 - lr: 0.000089 - momentum: 0.000000 2023-10-06 10:05:09,823 ---------------------------------------------------------------------------------------------------- 2023-10-06 10:05:09,823 EPOCH 5 done: loss 0.0676 - lr: 0.000089 2023-10-06 10:05:17,506 DEV : loss 0.14746029675006866 - f1-score (micro avg) 0.8037 2023-10-06 10:05:17,513 saving best model 2023-10-06 10:05:22,299 
---------------------------------------------------------------------------------------------------- 2023-10-06 10:05:34,206 epoch 6 - iter 30/304 - loss 0.04698576 - time (sec): 11.90 - samples/sec: 261.58 - lr: 0.000087 - momentum: 0.000000 2023-10-06 10:05:45,667 epoch 6 - iter 60/304 - loss 0.05984385 - time (sec): 23.37 - samples/sec: 253.91 - lr: 0.000085 - momentum: 0.000000 2023-10-06 10:05:57,678 epoch 6 - iter 90/304 - loss 0.05653784 - time (sec): 35.38 - samples/sec: 257.09 - lr: 0.000084 - momentum: 0.000000 2023-10-06 10:06:09,777 epoch 6 - iter 120/304 - loss 0.04619937 - time (sec): 47.48 - samples/sec: 256.55 - lr: 0.000082 - momentum: 0.000000 2023-10-06 10:06:22,306 epoch 6 - iter 150/304 - loss 0.05426085 - time (sec): 60.01 - samples/sec: 257.01 - lr: 0.000080 - momentum: 0.000000 2023-10-06 10:06:34,087 epoch 6 - iter 180/304 - loss 0.05153527 - time (sec): 71.79 - samples/sec: 257.04 - lr: 0.000078 - momentum: 0.000000 2023-10-06 10:06:46,005 epoch 6 - iter 210/304 - loss 0.04893707 - time (sec): 83.70 - samples/sec: 255.79 - lr: 0.000077 - momentum: 0.000000 2023-10-06 10:06:57,998 epoch 6 - iter 240/304 - loss 0.05206313 - time (sec): 95.70 - samples/sec: 255.94 - lr: 0.000075 - momentum: 0.000000 2023-10-06 10:07:10,628 epoch 6 - iter 270/304 - loss 0.05177056 - time (sec): 108.33 - samples/sec: 255.76 - lr: 0.000073 - momentum: 0.000000 2023-10-06 10:07:22,152 epoch 6 - iter 300/304 - loss 0.05044602 - time (sec): 119.85 - samples/sec: 255.02 - lr: 0.000071 - momentum: 0.000000 2023-10-06 10:07:23,729 ---------------------------------------------------------------------------------------------------- 2023-10-06 10:07:23,730 EPOCH 6 done: loss 0.0507 - lr: 0.000071 2023-10-06 10:07:31,724 DEV : loss 0.15308107435703278 - f1-score (micro avg) 0.8071 2023-10-06 10:07:31,733 saving best model 2023-10-06 10:07:36,056 ---------------------------------------------------------------------------------------------------- 2023-10-06 10:07:47,550 
epoch 7 - iter 30/304 - loss 0.04688673 - time (sec): 11.49 - samples/sec: 245.11 - lr: 0.000069 - momentum: 0.000000 2023-10-06 10:07:59,541 epoch 7 - iter 60/304 - loss 0.05174308 - time (sec): 23.48 - samples/sec: 252.60 - lr: 0.000068 - momentum: 0.000000 2023-10-06 10:08:11,552 epoch 7 - iter 90/304 - loss 0.05838398 - time (sec): 35.49 - samples/sec: 255.59 - lr: 0.000066 - momentum: 0.000000 2023-10-06 10:08:23,942 epoch 7 - iter 120/304 - loss 0.04573919 - time (sec): 47.89 - samples/sec: 258.72 - lr: 0.000064 - momentum: 0.000000 2023-10-06 10:08:36,137 epoch 7 - iter 150/304 - loss 0.04438194 - time (sec): 60.08 - samples/sec: 258.27 - lr: 0.000062 - momentum: 0.000000 2023-10-06 10:08:47,827 epoch 7 - iter 180/304 - loss 0.04129594 - time (sec): 71.77 - samples/sec: 257.35 - lr: 0.000061 - momentum: 0.000000 2023-10-06 10:08:59,370 epoch 7 - iter 210/304 - loss 0.04159775 - time (sec): 83.31 - samples/sec: 255.87 - lr: 0.000059 - momentum: 0.000000 2023-10-06 10:09:11,738 epoch 7 - iter 240/304 - loss 0.04032401 - time (sec): 95.68 - samples/sec: 255.99 - lr: 0.000057 - momentum: 0.000000 2023-10-06 10:09:23,519 epoch 7 - iter 270/304 - loss 0.04184823 - time (sec): 107.46 - samples/sec: 255.77 - lr: 0.000055 - momentum: 0.000000 2023-10-06 10:09:35,883 epoch 7 - iter 300/304 - loss 0.03910036 - time (sec): 119.83 - samples/sec: 256.10 - lr: 0.000054 - momentum: 0.000000 2023-10-06 10:09:37,347 ---------------------------------------------------------------------------------------------------- 2023-10-06 10:09:37,347 EPOCH 7 done: loss 0.0400 - lr: 0.000054 2023-10-06 10:09:45,183 DEV : loss 0.15628251433372498 - f1-score (micro avg) 0.8181 2023-10-06 10:09:45,190 saving best model 2023-10-06 10:09:49,516 ---------------------------------------------------------------------------------------------------- 2023-10-06 10:10:02,225 epoch 8 - iter 30/304 - loss 0.01383221 - time (sec): 12.71 - samples/sec: 262.38 - lr: 0.000052 - momentum: 0.000000 2023-10-06 
10:10:14,273 epoch 8 - iter 60/304 - loss 0.02596985 - time (sec): 24.76 - samples/sec: 261.20 - lr: 0.000050 - momentum: 0.000000 2023-10-06 10:10:26,103 epoch 8 - iter 90/304 - loss 0.02185886 - time (sec): 36.59 - samples/sec: 258.16 - lr: 0.000048 - momentum: 0.000000 2023-10-06 10:10:37,817 epoch 8 - iter 120/304 - loss 0.02907262 - time (sec): 48.30 - samples/sec: 255.92 - lr: 0.000046 - momentum: 0.000000 2023-10-06 10:10:49,679 epoch 8 - iter 150/304 - loss 0.02586838 - time (sec): 60.16 - samples/sec: 255.13 - lr: 0.000045 - momentum: 0.000000 2023-10-06 10:11:02,166 epoch 8 - iter 180/304 - loss 0.03153234 - time (sec): 72.65 - samples/sec: 256.35 - lr: 0.000043 - momentum: 0.000000 2023-10-06 10:11:12,938 epoch 8 - iter 210/304 - loss 0.03238097 - time (sec): 83.42 - samples/sec: 253.75 - lr: 0.000041 - momentum: 0.000000 2023-10-06 10:11:25,085 epoch 8 - iter 240/304 - loss 0.03115189 - time (sec): 95.57 - samples/sec: 254.39 - lr: 0.000039 - momentum: 0.000000 2023-10-06 10:11:37,295 epoch 8 - iter 270/304 - loss 0.03069472 - time (sec): 107.78 - samples/sec: 254.97 - lr: 0.000038 - momentum: 0.000000 2023-10-06 10:11:49,510 epoch 8 - iter 300/304 - loss 0.03294117 - time (sec): 119.99 - samples/sec: 255.33 - lr: 0.000036 - momentum: 0.000000 2023-10-06 10:11:50,987 ---------------------------------------------------------------------------------------------------- 2023-10-06 10:11:50,987 EPOCH 8 done: loss 0.0326 - lr: 0.000036 2023-10-06 10:11:58,982 DEV : loss 0.16147054731845856 - f1-score (micro avg) 0.842 2023-10-06 10:11:58,990 saving best model 2023-10-06 10:12:03,319 ---------------------------------------------------------------------------------------------------- 2023-10-06 10:12:14,831 epoch 9 - iter 30/304 - loss 0.03375994 - time (sec): 11.51 - samples/sec: 265.76 - lr: 0.000034 - momentum: 0.000000 2023-10-06 10:12:26,424 epoch 9 - iter 60/304 - loss 0.03249143 - time (sec): 23.10 - samples/sec: 268.41 - lr: 0.000032 - momentum: 
0.000000 2023-10-06 10:12:37,836 epoch 9 - iter 90/304 - loss 0.03227701 - time (sec): 34.51 - samples/sec: 268.12 - lr: 0.000030 - momentum: 0.000000 2023-10-06 10:12:49,318 epoch 9 - iter 120/304 - loss 0.03168926 - time (sec): 46.00 - samples/sec: 269.93 - lr: 0.000029 - momentum: 0.000000 2023-10-06 10:12:59,865 epoch 9 - iter 150/304 - loss 0.02780577 - time (sec): 56.54 - samples/sec: 267.92 - lr: 0.000027 - momentum: 0.000000 2023-10-06 10:13:11,363 epoch 9 - iter 180/304 - loss 0.02743918 - time (sec): 68.04 - samples/sec: 269.23 - lr: 0.000025 - momentum: 0.000000 2023-10-06 10:13:22,294 epoch 9 - iter 210/304 - loss 0.02537944 - time (sec): 78.97 - samples/sec: 269.91 - lr: 0.000023 - momentum: 0.000000 2023-10-06 10:13:33,769 epoch 9 - iter 240/304 - loss 0.02592961 - time (sec): 90.45 - samples/sec: 270.22 - lr: 0.000022 - momentum: 0.000000 2023-10-06 10:13:44,979 epoch 9 - iter 270/304 - loss 0.03003227 - time (sec): 101.66 - samples/sec: 270.14 - lr: 0.000020 - momentum: 0.000000 2023-10-06 10:13:56,535 epoch 9 - iter 300/304 - loss 0.02736023 - time (sec): 113.21 - samples/sec: 270.09 - lr: 0.000018 - momentum: 0.000000 2023-10-06 10:13:57,990 ---------------------------------------------------------------------------------------------------- 2023-10-06 10:13:57,990 EPOCH 9 done: loss 0.0270 - lr: 0.000018 2023-10-06 10:14:05,126 DEV : loss 0.1611461490392685 - f1-score (micro avg) 0.838 2023-10-06 10:14:05,135 ---------------------------------------------------------------------------------------------------- 2023-10-06 10:14:16,176 epoch 10 - iter 30/304 - loss 0.05830548 - time (sec): 11.04 - samples/sec: 263.86 - lr: 0.000016 - momentum: 0.000000 2023-10-06 10:14:26,762 epoch 10 - iter 60/304 - loss 0.03839883 - time (sec): 21.63 - samples/sec: 259.36 - lr: 0.000014 - momentum: 0.000000 2023-10-06 10:14:38,590 epoch 10 - iter 90/304 - loss 0.02785645 - time (sec): 33.45 - samples/sec: 266.82 - lr: 0.000013 - momentum: 0.000000 2023-10-06 
10:14:50,176 epoch 10 - iter 120/304 - loss 0.02580621 - time (sec): 45.04 - samples/sec: 270.00 - lr: 0.000011 - momentum: 0.000000 2023-10-06 10:15:01,694 epoch 10 - iter 150/304 - loss 0.02166950 - time (sec): 56.56 - samples/sec: 269.72 - lr: 0.000009 - momentum: 0.000000 2023-10-06 10:15:13,131 epoch 10 - iter 180/304 - loss 0.02232735 - time (sec): 67.99 - samples/sec: 270.60 - lr: 0.000007 - momentum: 0.000000 2023-10-06 10:15:24,968 epoch 10 - iter 210/304 - loss 0.02484298 - time (sec): 79.83 - samples/sec: 271.22 - lr: 0.000006 - momentum: 0.000000 2023-10-06 10:15:36,078 epoch 10 - iter 240/304 - loss 0.02347145 - time (sec): 90.94 - samples/sec: 270.75 - lr: 0.000004 - momentum: 0.000000 2023-10-06 10:15:47,348 epoch 10 - iter 270/304 - loss 0.02655606 - time (sec): 102.21 - samples/sec: 270.85 - lr: 0.000002 - momentum: 0.000000 2023-10-06 10:15:58,494 epoch 10 - iter 300/304 - loss 0.02544801 - time (sec): 113.36 - samples/sec: 270.31 - lr: 0.000000 - momentum: 0.000000 2023-10-06 10:15:59,811 ---------------------------------------------------------------------------------------------------- 2023-10-06 10:15:59,811 EPOCH 10 done: loss 0.0252 - lr: 0.000000 2023-10-06 10:16:07,059 DEV : loss 0.16277427971363068 - f1-score (micro avg) 0.8367 2023-10-06 10:16:07,919 ---------------------------------------------------------------------------------------------------- 2023-10-06 10:16:07,921 Loading model from best epoch ... 
2023-10-06 10:16:10,830 SequenceTagger predicts: Dictionary with 25 tags: O, S-scope, B-scope, E-scope, I-scope, S-pers, B-pers, E-pers, I-pers, S-work, B-work, E-work, I-work, S-loc, B-loc, E-loc, I-loc, S-date, B-date, E-date, I-date, S-object, B-object, E-object, I-object
2023-10-06 10:16:17,666 Results:
- F-score (micro) 0.7995
- F-score (macro) 0.5565
- Accuracy 0.6767

By class:
              precision    recall  f1-score   support

       scope     0.7595    0.7947    0.7767       151
        work     0.7455    0.8632    0.8000        95
        pers     0.8241    0.9271    0.8725        96
         loc     0.2222    0.6667    0.3333         3
        date     0.0000    0.0000    0.0000         3

   micro avg     0.7610    0.8420    0.7995       348
   macro avg     0.5102    0.6503    0.5565       348
weighted avg     0.7623    0.8420    0.7990       348

2023-10-06 10:16:17,666 ----------------------------------------------------------------------------------------------------