2023-10-14 00:15:26,580 ----------------------------------------------------------------------------------------------------
2023-10-14 00:15:26,582 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-14 00:15:26,582 ----------------------------------------------------------------------------------------------------
2023-10-14 00:15:26,582 MultiCorpus: 14465 train + 1392 dev + 2432 test sentences
 - NER_HIPE_2022 Corpus: 14465 train + 1392 dev + 2432 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/letemps/fr/with_doc_seperator
2023-10-14 00:15:26,583 ----------------------------------------------------------------------------------------------------
2023-10-14 00:15:26,583 Train: 14465 sentences
2023-10-14 00:15:26,583 (train_with_dev=False, train_with_test=False)
2023-10-14 00:15:26,583 ----------------------------------------------------------------------------------------------------
2023-10-14 00:15:26,583 Training Params:
2023-10-14 00:15:26,583 - learning_rate: "0.00016"
2023-10-14 00:15:26,583 - mini_batch_size: "8"
2023-10-14 00:15:26,583 - max_epochs: "10"
2023-10-14 00:15:26,583 - shuffle: "True"
2023-10-14 00:15:26,583 ----------------------------------------------------------------------------------------------------
2023-10-14 00:15:26,583 Plugins:
2023-10-14 00:15:26,583 - TensorboardLogger
2023-10-14 00:15:26,583 - LinearScheduler | warmup_fraction: '0.1'
2023-10-14 00:15:26,584 ----------------------------------------------------------------------------------------------------
2023-10-14 00:15:26,584 Final evaluation on model from best epoch (best-model.pt)
2023-10-14 00:15:26,584 - metric: "('micro avg', 'f1-score')"
2023-10-14 00:15:26,584 ----------------------------------------------------------------------------------------------------
2023-10-14 00:15:26,584 Computation:
2023-10-14 00:15:26,584 - compute on device: cuda:0
2023-10-14 00:15:26,584 - embedding storage: none
2023-10-14 00:15:26,584 ----------------------------------------------------------------------------------------------------
2023-10-14 00:15:26,584 Model training base path: "hmbench-letemps/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-3"
2023-10-14 00:15:26,584 ----------------------------------------------------------------------------------------------------
2023-10-14 00:15:26,584 ----------------------------------------------------------------------------------------------------
2023-10-14 00:15:26,584 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-14 00:16:57,553 epoch 1 - iter 180/1809 - loss 2.51944389 - time (sec): 90.97 - samples/sec: 413.65 - lr: 0.000016 - momentum: 0.000000
2023-10-14 00:18:29,092 epoch 1 - iter 360/1809 - loss 2.26637505 - time (sec): 182.51 - samples/sec: 406.16 - lr: 0.000032 - momentum: 0.000000
2023-10-14 00:20:01,815 epoch 1 - iter 540/1809 - loss 1.90816370 - time (sec): 275.23 - samples/sec: 409.55 - lr: 0.000048 - momentum: 0.000000
2023-10-14 00:21:31,625 epoch 1 - iter 720/1809 - loss 1.56194036 - time (sec): 365.04 - samples/sec: 413.28 - lr: 0.000064 - momentum: 0.000000
2023-10-14 00:23:01,817 epoch 1 - iter 900/1809 - loss 1.30444465 - time (sec): 455.23 - samples/sec: 414.24 - lr: 0.000080 - momentum: 0.000000
2023-10-14 00:24:33,627 epoch 1 - iter 1080/1809 - loss 1.11961360 - time (sec): 547.04 - samples/sec: 413.43 - lr: 0.000095 - momentum: 0.000000
2023-10-14 00:26:03,228 epoch 1 - iter 1260/1809 - loss 0.98696147 - time (sec): 636.64 - samples/sec: 412.31 - lr: 0.000111 - momentum: 0.000000
2023-10-14 00:27:35,596 epoch 1 - iter 1440/1809 - loss 0.88259280 - time (sec): 729.01 - samples/sec: 411.99 - lr: 0.000127 - momentum: 0.000000
2023-10-14 00:29:08,136 epoch 1 - iter 1620/1809 - loss 0.79457894 - time (sec): 821.55 - samples/sec: 413.43 - lr: 0.000143 - momentum: 0.000000
2023-10-14 00:30:36,487 epoch 1 - iter 1800/1809 - loss 0.72720679 - time (sec): 909.90 - samples/sec: 415.88 - lr: 0.000159 - momentum: 0.000000
2023-10-14 00:30:40,375 ----------------------------------------------------------------------------------------------------
2023-10-14 00:30:40,375 EPOCH 1 done: loss 0.7248 - lr: 0.000159
2023-10-14 00:31:18,366 DEV : loss 0.13587747514247894 - f1-score (micro avg) 0.5512
2023-10-14 00:31:18,430 saving best model
2023-10-14 00:31:19,337 ----------------------------------------------------------------------------------------------------
2023-10-14 00:32:49,760 epoch 2 - iter 180/1809 - loss 0.10182046 - time (sec): 90.42 - samples/sec: 417.47 - lr: 0.000158 - momentum: 0.000000
2023-10-14 00:34:19,933 epoch 2 - iter 360/1809 - loss 0.10191876 - time (sec): 180.59 - samples/sec: 413.42 - lr: 0.000156 - momentum: 0.000000
2023-10-14 00:35:49,015 epoch 2 - iter 540/1809 - loss 0.09941428 - time (sec): 269.68 - samples/sec: 429.78 - lr: 0.000155 - momentum: 0.000000
2023-10-14 00:37:16,788 epoch 2 - iter 720/1809 - loss 0.09701065 - time (sec): 357.45 - samples/sec: 432.34 - lr: 0.000153 - momentum: 0.000000
2023-10-14 00:38:45,501 epoch 2 - iter 900/1809 - loss 0.09626668 - time (sec): 446.16 - samples/sec: 431.73 - lr: 0.000151 - momentum: 0.000000
2023-10-14 00:40:13,863 epoch 2 - iter 1080/1809 - loss 0.09449862 - time (sec): 534.52 - samples/sec: 428.08 - lr: 0.000149 - momentum: 0.000000
2023-10-14 00:41:41,920 epoch 2 - iter 1260/1809 - loss 0.09221887 - time (sec): 622.58 - samples/sec: 427.87 - lr: 0.000148 - momentum: 0.000000
2023-10-14 00:43:10,709 epoch 2 - iter 1440/1809 - loss 0.09076303 - time (sec): 711.37 - samples/sec: 426.55 - lr: 0.000146 - momentum: 0.000000
2023-10-14 00:44:41,103 epoch 2 - iter 1620/1809 - loss 0.08931886 - time (sec): 801.76 - samples/sec: 425.61 - lr: 0.000144 - momentum: 0.000000
2023-10-14 00:46:10,438 epoch 2 - iter 1800/1809 - loss 0.08787461 - time (sec): 891.10 - samples/sec: 424.53 - lr: 0.000142 - momentum: 0.000000
2023-10-14 00:46:14,383 ----------------------------------------------------------------------------------------------------
2023-10-14 00:46:14,383 EPOCH 2 done: loss 0.0879 - lr: 0.000142
2023-10-14 00:46:52,480 DEV : loss 0.10200479626655579 - f1-score (micro avg) 0.6091
2023-10-14 00:46:52,536 saving best model
2023-10-14 00:46:55,099 ----------------------------------------------------------------------------------------------------
2023-10-14 00:48:24,885 epoch 3 - iter 180/1809 - loss 0.06314980 - time (sec): 89.78 - samples/sec: 416.42 - lr: 0.000140 - momentum: 0.000000
2023-10-14 00:49:56,919 epoch 3 - iter 360/1809 - loss 0.05882268 - time (sec): 181.82 - samples/sec: 422.98 - lr: 0.000139 - momentum: 0.000000
2023-10-14 00:51:27,745 epoch 3 - iter 540/1809 - loss 0.05798093 - time (sec): 272.64 - samples/sec: 420.56 - lr: 0.000137 - momentum: 0.000000
2023-10-14 00:52:57,778 epoch 3 - iter 720/1809 - loss 0.05728226 - time (sec): 362.67 - samples/sec: 419.92 - lr: 0.000135 - momentum: 0.000000
2023-10-14 00:54:27,582 epoch 3 - iter 900/1809 - loss 0.05773776 - time (sec): 452.48 - samples/sec: 423.51 - lr: 0.000133 - momentum: 0.000000
2023-10-14 00:55:56,659 epoch 3 - iter 1080/1809 - loss 0.05718530 - time (sec): 541.56 - samples/sec: 424.38 - lr: 0.000132 - momentum: 0.000000
2023-10-14 00:57:26,791 epoch 3 - iter 1260/1809 - loss 0.05787219 - time (sec): 631.69 - samples/sec: 420.89 - lr: 0.000130 - momentum: 0.000000
2023-10-14 00:58:55,840 epoch 3 - iter 1440/1809 - loss 0.05720897 - time (sec): 720.74 - samples/sec: 420.23 - lr: 0.000128 - momentum: 0.000000
2023-10-14 01:00:24,566 epoch 3 - iter 1620/1809 - loss 0.05641677 - time (sec): 809.46 - samples/sec: 419.21 - lr: 0.000126 - momentum: 0.000000
2023-10-14 01:01:57,412 epoch 3 - iter 1800/1809 - loss 0.05640597 - time (sec): 902.31 - samples/sec: 418.82 - lr: 0.000125 - momentum: 0.000000
2023-10-14 01:02:01,841 ----------------------------------------------------------------------------------------------------
2023-10-14 01:02:01,842 EPOCH 3 done: loss 0.0563 - lr: 0.000125
2023-10-14 01:02:41,767 DEV : loss 0.13950783014297485 - f1-score (micro avg) 0.6198
2023-10-14 01:02:41,831 saving best model
2023-10-14 01:02:44,406 ----------------------------------------------------------------------------------------------------
2023-10-14 01:04:14,083 epoch 4 - iter 180/1809 - loss 0.04170364 - time (sec): 89.67 - samples/sec: 407.77 - lr: 0.000123 - momentum: 0.000000
2023-10-14 01:05:46,581 epoch 4 - iter 360/1809 - loss 0.03858075 - time (sec): 182.17 - samples/sec: 411.86 - lr: 0.000121 - momentum: 0.000000
2023-10-14 01:07:17,093 epoch 4 - iter 540/1809 - loss 0.03981754 - time (sec): 272.68 - samples/sec: 412.97 - lr: 0.000119 - momentum: 0.000000
2023-10-14 01:08:47,525 epoch 4 - iter 720/1809 - loss 0.03847050 - time (sec): 363.11 - samples/sec: 411.25 - lr: 0.000117 - momentum: 0.000000
2023-10-14 01:10:18,340 epoch 4 - iter 900/1809 - loss 0.03892181 - time (sec): 453.93 - samples/sec: 411.93 - lr: 0.000116 - momentum: 0.000000
2023-10-14 01:11:50,274 epoch 4 - iter 1080/1809 - loss 0.04023787 - time (sec): 545.86 - samples/sec: 415.14 - lr: 0.000114 - momentum: 0.000000
2023-10-14 01:13:22,824 epoch 4 - iter 1260/1809 - loss 0.04131278 - time (sec): 638.41 - samples/sec: 415.03 - lr: 0.000112 - momentum: 0.000000
2023-10-14 01:14:51,158 epoch 4 - iter 1440/1809 - loss 0.04134067 - time (sec): 726.75 - samples/sec: 416.20 - lr: 0.000110 - momentum: 0.000000
2023-10-14 01:16:20,419 epoch 4 - iter 1620/1809 - loss 0.04065918 - time (sec): 816.01 - samples/sec: 418.02 - lr: 0.000109 - momentum: 0.000000
2023-10-14 01:17:50,525 epoch 4 - iter 1800/1809 - loss 0.04036183 - time (sec): 906.11 - samples/sec: 417.55 - lr: 0.000107 - momentum: 0.000000
2023-10-14 01:17:54,624 ----------------------------------------------------------------------------------------------------
2023-10-14 01:17:54,624 EPOCH 4 done: loss 0.0403 - lr: 0.000107
2023-10-14 01:18:35,366 DEV : loss 0.19245745241641998 - f1-score (micro avg) 0.6321
2023-10-14 01:18:35,431 saving best model
2023-10-14 01:18:38,017 ----------------------------------------------------------------------------------------------------
2023-10-14 01:20:09,743 epoch 5 - iter 180/1809 - loss 0.02576443 - time (sec): 91.72 - samples/sec: 418.78 - lr: 0.000105 - momentum: 0.000000
2023-10-14 01:21:44,400 epoch 5 - iter 360/1809 - loss 0.02641732 - time (sec): 186.38 - samples/sec: 408.40 - lr: 0.000103 - momentum: 0.000000
2023-10-14 01:23:15,834 epoch 5 - iter 540/1809 - loss 0.02651451 - time (sec): 277.81 - samples/sec: 407.70 - lr: 0.000101 - momentum: 0.000000
2023-10-14 01:24:43,605 epoch 5 - iter 720/1809 - loss 0.02751988 - time (sec): 365.58 - samples/sec: 409.65 - lr: 0.000100 - momentum: 0.000000
2023-10-14 01:26:12,328 epoch 5 - iter 900/1809 - loss 0.02767393 - time (sec): 454.31 - samples/sec: 414.53 - lr: 0.000098 - momentum: 0.000000
2023-10-14 01:27:41,085 epoch 5 - iter 1080/1809 - loss 0.02796036 - time (sec): 543.06 - samples/sec: 416.11 - lr: 0.000096 - momentum: 0.000000
2023-10-14 01:29:10,393 epoch 5 - iter 1260/1809 - loss 0.02785250 - time (sec): 632.37 - samples/sec: 414.60 - lr: 0.000094 - momentum: 0.000000
2023-10-14 01:30:45,505 epoch 5 - iter 1440/1809 - loss 0.02843191 - time (sec): 727.48 - samples/sec: 412.25 - lr: 0.000093 - momentum: 0.000000
2023-10-14 01:32:18,477 epoch 5 - iter 1620/1809 - loss 0.02924215 - time (sec): 820.46 - samples/sec: 412.99 - lr: 0.000091 - momentum: 0.000000
2023-10-14 01:33:51,099 epoch 5 - iter 1800/1809 - loss 0.02925813 - time (sec): 913.08 - samples/sec: 414.13 - lr: 0.000089 - momentum: 0.000000
2023-10-14 01:33:55,154 ----------------------------------------------------------------------------------------------------
2023-10-14 01:33:55,155 EPOCH 5 done: loss 0.0293 - lr: 0.000089
2023-10-14 01:34:33,142 DEV : loss 0.2196071296930313 - f1-score (micro avg) 0.6454
2023-10-14 01:34:33,199 saving best model
2023-10-14 01:34:35,762 ----------------------------------------------------------------------------------------------------
2023-10-14 01:36:06,097 epoch 6 - iter 180/1809 - loss 0.01696868 - time (sec): 90.33 - samples/sec: 434.02 - lr: 0.000087 - momentum: 0.000000
2023-10-14 01:37:36,160 epoch 6 - iter 360/1809 - loss 0.01685237 - time (sec): 180.39 - samples/sec: 426.33 - lr: 0.000085 - momentum: 0.000000
2023-10-14 01:39:05,475 epoch 6 - iter 540/1809 - loss 0.02067304 - time (sec): 269.71 - samples/sec: 422.42 - lr: 0.000084 - momentum: 0.000000
2023-10-14 01:40:35,631 epoch 6 - iter 720/1809 - loss 0.02166279 - time (sec): 359.87 - samples/sec: 418.94 - lr: 0.000082 - momentum: 0.000000
2023-10-14 01:42:08,370 epoch 6 - iter 900/1809 - loss 0.02213544 - time (sec): 452.60 - samples/sec: 412.98 - lr: 0.000080 - momentum: 0.000000
2023-10-14 01:43:40,493 epoch 6 - iter 1080/1809 - loss 0.02224410 - time (sec): 544.73 - samples/sec: 412.51 - lr: 0.000078 - momentum: 0.000000
2023-10-14 01:45:13,554 epoch 6 - iter 1260/1809 - loss 0.02163785 - time (sec): 637.79 - samples/sec: 414.18 - lr: 0.000077 - momentum: 0.000000
2023-10-14 01:46:46,037 epoch 6 - iter 1440/1809 - loss 0.02152121 - time (sec): 730.27 - samples/sec: 415.13 - lr: 0.000075 - momentum: 0.000000
2023-10-14 01:48:18,422 epoch 6 - iter 1620/1809 - loss 0.02245358 - time (sec): 822.66 - samples/sec: 412.61 - lr: 0.000073 - momentum: 0.000000
2023-10-14 01:49:50,889 epoch 6 - iter 1800/1809 - loss 0.02230601 - time (sec): 915.12 - samples/sec: 413.40 - lr: 0.000071 - momentum: 0.000000
2023-10-14 01:49:54,955 ----------------------------------------------------------------------------------------------------
2023-10-14 01:49:54,956 EPOCH 6 done: loss 0.0222 - lr: 0.000071
2023-10-14 01:50:36,500 DEV : loss 0.25768402218818665 - f1-score (micro avg) 0.6629
2023-10-14 01:50:36,559 saving best model
2023-10-14 01:50:39,137 ----------------------------------------------------------------------------------------------------
2023-10-14 01:52:12,077 epoch 7 - iter 180/1809 - loss 0.00921054 - time (sec): 92.94 - samples/sec: 411.47 - lr: 0.000069 - momentum: 0.000000
2023-10-14 01:53:50,733 epoch 7 - iter 360/1809 - loss 0.01168701 - time (sec): 191.59 - samples/sec: 395.89 - lr: 0.000068 - momentum: 0.000000
2023-10-14 01:55:21,611 epoch 7 - iter 540/1809 - loss 0.01305153 - time (sec): 282.47 - samples/sec: 400.63 - lr: 0.000066 - momentum: 0.000000
2023-10-14 01:56:55,925 epoch 7 - iter 720/1809 - loss 0.01260349 - time (sec): 376.78 - samples/sec: 404.80 - lr: 0.000064 - momentum: 0.000000
2023-10-14 01:58:26,714 epoch 7 - iter 900/1809 - loss 0.01297523 - time (sec): 467.57 - samples/sec: 406.34 - lr: 0.000062 - momentum: 0.000000
2023-10-14 02:00:02,490 epoch 7 - iter 1080/1809 - loss 0.01344991 - time (sec): 563.35 - samples/sec: 404.29 - lr: 0.000061 - momentum: 0.000000
2023-10-14 02:01:34,331 epoch 7 - iter 1260/1809 - loss 0.01433938 - time (sec): 655.19 - samples/sec: 406.81 - lr: 0.000059 - momentum: 0.000000
2023-10-14 02:03:06,795 epoch 7 - iter 1440/1809 - loss 0.01477713 - time (sec): 747.65 - samples/sec: 406.47 - lr: 0.000057 - momentum: 0.000000
2023-10-14 02:04:40,603 epoch 7 - iter 1620/1809 - loss 0.01477570 - time (sec): 841.46 - samples/sec: 406.11 - lr: 0.000055 - momentum: 0.000000
2023-10-14 02:06:15,178 epoch 7 - iter 1800/1809 - loss 0.01511790 - time (sec): 936.04 - samples/sec: 404.12 - lr: 0.000053 - momentum: 0.000000
2023-10-14 02:06:19,433 ----------------------------------------------------------------------------------------------------
2023-10-14 02:06:19,433 EPOCH 7 done: loss 0.0151 - lr: 0.000053
2023-10-14 02:06:59,410 DEV : loss 0.2951534390449524 - f1-score (micro avg) 0.6663
2023-10-14 02:06:59,467 saving best model
2023-10-14 02:07:02,034 ----------------------------------------------------------------------------------------------------
2023-10-14 02:08:31,365 epoch 8 - iter 180/1809 - loss 0.00690787 - time (sec): 89.33 - samples/sec: 414.43 - lr: 0.000052 - momentum: 0.000000
2023-10-14 02:10:01,793 epoch 8 - iter 360/1809 - loss 0.01015270 - time (sec): 179.75 - samples/sec: 417.93 - lr: 0.000050 - momentum: 0.000000
2023-10-14 02:11:31,245 epoch 8 - iter 540/1809 - loss 0.01075555 - time (sec): 269.21 - samples/sec: 425.42 - lr: 0.000048 - momentum: 0.000000
2023-10-14 02:12:59,192 epoch 8 - iter 720/1809 - loss 0.01026295 - time (sec): 357.15 - samples/sec: 426.20 - lr: 0.000046 - momentum: 0.000000
2023-10-14 02:14:29,940 epoch 8 - iter 900/1809 - loss 0.00994762 - time (sec): 447.90 - samples/sec: 425.93 - lr: 0.000044 - momentum: 0.000000
2023-10-14 02:16:04,150 epoch 8 - iter 1080/1809 - loss 0.00979174 - time (sec): 542.11 - samples/sec: 420.45 - lr: 0.000043 - momentum: 0.000000
2023-10-14 02:17:34,169 epoch 8 - iter 1260/1809 - loss 0.01043605 - time (sec): 632.13 - samples/sec: 420.83 - lr: 0.000041 - momentum: 0.000000
2023-10-14 02:19:02,279 epoch 8 - iter 1440/1809 - loss 0.01048305 - time (sec): 720.24 - samples/sec: 420.80 - lr: 0.000039 - momentum: 0.000000
2023-10-14 02:20:31,156 epoch 8 - iter 1620/1809 - loss 0.01031134 - time (sec): 809.12 - samples/sec: 421.20 - lr: 0.000037 - momentum: 0.000000
2023-10-14 02:21:59,294 epoch 8 - iter 1800/1809 - loss 0.01018123 - time (sec): 897.26 - samples/sec: 421.63 - lr: 0.000036 - momentum: 0.000000
2023-10-14 02:22:03,202 ----------------------------------------------------------------------------------------------------
2023-10-14 02:22:03,202 EPOCH 8 done: loss 0.0102 - lr: 0.000036
2023-10-14 02:22:42,340 DEV : loss 0.31866809725761414 - f1-score (micro avg) 0.6638
2023-10-14 02:22:42,399 ----------------------------------------------------------------------------------------------------
2023-10-14 02:24:15,293 epoch 9 - iter 180/1809 - loss 0.00817136 - time (sec): 92.89 - samples/sec: 415.41 - lr: 0.000034 - momentum: 0.000000
2023-10-14 02:25:48,900 epoch 9 - iter 360/1809 - loss 0.00980844 - time (sec): 186.50 - samples/sec: 417.93 - lr: 0.000032 - momentum: 0.000000
2023-10-14 02:27:21,286 epoch 9 - iter 540/1809 - loss 0.00968269 - time (sec): 278.88 - samples/sec: 413.52 - lr: 0.000030 - momentum: 0.000000
2023-10-14 02:28:53,866 epoch 9 - iter 720/1809 - loss 0.00851234 - time (sec): 371.46 - samples/sec: 413.77 - lr: 0.000028 - momentum: 0.000000
2023-10-14 02:30:25,601 epoch 9 - iter 900/1809 - loss 0.00876138 - time (sec): 463.20 - samples/sec: 412.00 - lr: 0.000027 - momentum: 0.000000
2023-10-14 02:31:58,342 epoch 9 - iter 1080/1809 - loss 0.00835601 - time (sec): 555.94 - samples/sec: 411.87 - lr: 0.000025 - momentum: 0.000000
2023-10-14 02:33:29,812 epoch 9 - iter 1260/1809 - loss 0.00813399 - time (sec): 647.41 - samples/sec: 411.93 - lr: 0.000023 - momentum: 0.000000
2023-10-14 02:35:00,772 epoch 9 - iter 1440/1809 - loss 0.00791238 - time (sec): 738.37 - samples/sec: 411.24 - lr: 0.000021 - momentum: 0.000000
2023-10-14 02:36:32,257 epoch 9 - iter 1620/1809 - loss 0.00783074 - time (sec): 829.86 - samples/sec: 409.66 - lr: 0.000020 - momentum: 0.000000
2023-10-14 02:38:03,837 epoch 9 - iter 1800/1809 - loss 0.00756760 - time (sec): 921.44 - samples/sec: 410.66 - lr: 0.000018 - momentum: 0.000000
2023-10-14 02:38:07,713 ----------------------------------------------------------------------------------------------------
2023-10-14 02:38:07,713 EPOCH 9 done: loss 0.0076 - lr: 0.000018
2023-10-14 02:38:45,463 DEV : loss 0.3397609293460846 - f1-score (micro avg) 0.6699
2023-10-14 02:38:45,524 saving best model
2023-10-14 02:38:48,090 ----------------------------------------------------------------------------------------------------
2023-10-14 02:40:17,848 epoch 10 - iter 180/1809 - loss 0.00675340 - time (sec): 89.75 - samples/sec: 418.97 - lr: 0.000016 - momentum: 0.000000
2023-10-14 02:41:47,430 epoch 10 - iter 360/1809 - loss 0.00699592 - time (sec): 179.34 - samples/sec: 410.52 - lr: 0.000014 - momentum: 0.000000
2023-10-14 02:43:19,155 epoch 10 - iter 540/1809 - loss 0.00674174 - time (sec): 271.06 - samples/sec: 414.12 - lr: 0.000012 - momentum: 0.000000
2023-10-14 02:44:53,881 epoch 10 - iter 720/1809 - loss 0.00580321 - time (sec): 365.79 - samples/sec: 407.45 - lr: 0.000011 - momentum: 0.000000
2023-10-14 02:46:33,110 epoch 10 - iter 900/1809 - loss 0.00563464 - time (sec): 465.02 - samples/sec: 403.22 - lr: 0.000009 - momentum: 0.000000
2023-10-14 02:48:05,921 epoch 10 - iter 1080/1809 - loss 0.00536878 - time (sec): 557.83 - samples/sec: 403.01 - lr: 0.000007 - momentum: 0.000000
2023-10-14 02:49:38,498 epoch 10 - iter 1260/1809 - loss 0.00519036 - time (sec): 650.40 - samples/sec: 406.07 - lr: 0.000005 - momentum: 0.000000
2023-10-14 02:51:10,923 epoch 10 - iter 1440/1809 - loss 0.00506535 - time (sec): 742.83 - samples/sec: 405.69 - lr: 0.000004 - momentum: 0.000000
2023-10-14 02:52:43,275 epoch 10 - iter 1620/1809 - loss 0.00520096 - time (sec): 835.18 - samples/sec: 407.55 - lr: 0.000002 - momentum: 0.000000
2023-10-14 02:54:18,302 epoch 10 - iter 1800/1809 - loss 0.00505952 - time (sec): 930.21 - samples/sec: 406.29 - lr: 0.000000 - momentum: 0.000000
2023-10-14 02:54:22,857 ----------------------------------------------------------------------------------------------------
2023-10-14 02:54:22,857 EPOCH 10 done: loss 0.0050 - lr: 0.000000
2023-10-14 02:55:02,618 DEV : loss 0.34813550114631653 - f1-score (micro avg) 0.6654
2023-10-14 02:55:03,588 ----------------------------------------------------------------------------------------------------
2023-10-14 02:55:03,590 Loading model from best epoch ...
2023-10-14 02:55:07,452 SequenceTagger predicts: Dictionary with 13 tags: O, S-loc, B-loc, E-loc, I-loc, S-pers, B-pers, E-pers, I-pers, S-org, B-org, E-org, I-org
2023-10-14 02:56:07,325 
Results:
- F-score (micro) 0.6407
- F-score (macro) 0.4712
- Accuracy 0.4813

By class:
              precision    recall  f1-score   support

         loc     0.6545    0.7563    0.7017       591
        pers     0.5705    0.7143    0.6343       357
         org     0.1000    0.0633    0.0775        79

   micro avg     0.5992    0.6884    0.6407      1027
   macro avg     0.4416    0.5113    0.4712      1027
weighted avg     0.5826    0.6884    0.6303      1027

2023-10-14 02:56:07,325 ----------------------------------------------------------------------------------------------------