2023-10-13 01:26:15,069 ----------------------------------------------------------------------------------------------------
2023-10-13 01:26:15,072 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-13 01:26:15,072 ----------------------------------------------------------------------------------------------------
2023-10-13 01:26:15,072 MultiCorpus: 14465 train + 1392 dev + 2432 test sentences
 - NER_HIPE_2022 Corpus: 14465 train + 1392 dev + 2432 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/letemps/fr/with_doc_seperator
2023-10-13 01:26:15,072 ----------------------------------------------------------------------------------------------------
2023-10-13 01:26:15,072 Train: 14465 sentences
2023-10-13 01:26:15,072 (train_with_dev=False, train_with_test=False)
2023-10-13 01:26:15,072 ----------------------------------------------------------------------------------------------------
2023-10-13 01:26:15,072 Training Params:
2023-10-13 01:26:15,072 - learning_rate: "0.00016"
2023-10-13 01:26:15,072 - mini_batch_size: "8"
2023-10-13 01:26:15,073 - max_epochs: "10"
2023-10-13 01:26:15,073 - shuffle: "True"
2023-10-13 01:26:15,073 ----------------------------------------------------------------------------------------------------
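The configuration above can be reproduced, approximately, with Flair's standard fine-tuning API. The sketch below is an assumption reconstructed from the logged settings, not the script that produced this log: TransformerWordEmbeddings stands in for the ByT5Embeddings wrapper shown in the model dump, the embedding checkpoint name (hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax) is inferred from the training base path further down, and the exact corpus-loader arguments may differ.

from flair.datasets import NER_HIPE_2022
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# HIPE-2022 "letemps" corpus (French), matching the corpus line and the
# ".../letemps/fr/with_doc_seperator" dataset path in the header.
corpus = NER_HIPE_2022(dataset_name="letemps", language="fr", add_document_separator=True)

label_type = "ner"
label_dict = corpus.make_label_dictionary(label_type=label_type)

# Stand-in for ByT5Embeddings; checkpoint name inferred from the base path (assumption).
embeddings = TransformerWordEmbeddings(
    model="hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax",
    layers="-1",                 # "layers-1" in the base path
    subtoken_pooling="first",    # "poolingfirst" in the base path
    fine_tune=True,
)

tagger = SequenceTagger(
    hidden_size=256,             # unused here, since no RNN is stacked on the embeddings
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type=label_type,
    use_crf=False,               # "crfFalse" in the base path
    use_rnn=False,
    reproject_embeddings=False,
)

trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "hmbench-letemps/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-1",
    learning_rate=0.00016,       # peak rate; warmed up over the first 10% of steps
    mini_batch_size=8,
    max_epochs=10,
)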
2023-10-13 01:26:15,073 Plugins:
2023-10-13 01:26:15,073 - TensorboardLogger
2023-10-13 01:26:15,073 - LinearScheduler | warmup_fraction: '0.1'
2023-10-13 01:26:15,073 ----------------------------------------------------------------------------------------------------
2023-10-13 01:26:15,073 Final evaluation on model from best epoch (best-model.pt)
2023-10-13 01:26:15,073 - metric: "('micro avg', 'f1-score')"
2023-10-13 01:26:15,073 ----------------------------------------------------------------------------------------------------
2023-10-13 01:26:15,073 Computation:
2023-10-13 01:26:15,073 - compute on device: cuda:0
2023-10-13 01:26:15,073 - embedding storage: none
2023-10-13 01:26:15,073 ----------------------------------------------------------------------------------------------------
2023-10-13 01:26:15,073 Model training base path: "hmbench-letemps/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-1"
2023-10-13 01:26:15,074 ----------------------------------------------------------------------------------------------------
2023-10-13 01:26:15,074 ----------------------------------------------------------------------------------------------------
2023-10-13 01:26:15,074 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-13 01:27:50,591 epoch 1 - iter 180/1809 - loss 2.57027178 - time (sec): 95.52 - samples/sec: 402.43 - lr: 0.000016 - momentum: 0.000000
2023-10-13 01:29:25,564 epoch 1 - iter 360/1809 - loss 2.33739383 - time (sec): 190.49 - samples/sec: 398.49 - lr: 0.000032 - momentum: 0.000000
2023-10-13 01:30:59,649 epoch 1 - iter 540/1809 - loss 1.98337869 - time (sec): 284.57 - samples/sec: 396.72 - lr: 0.000048 - momentum: 0.000000
2023-10-13 01:32:35,319 epoch 1 - iter 720/1809 - loss 1.61896865 - time (sec): 380.24 - samples/sec: 398.89 - lr: 0.000064 - momentum: 0.000000
2023-10-13 01:34:08,395 epoch 1 - iter 900/1809 - loss 1.35128031 - time (sec): 473.32 - samples/sec: 400.08 - lr: 0.000080 - momentum: 0.000000
2023-10-13 01:35:39,939 epoch 1 - iter 1080/1809 - loss 1.16606342 - time (sec): 564.86 - samples/sec: 400.35 - lr: 0.000095 - momentum: 0.000000
2023-10-13 01:37:11,149 epoch 1 - iter 1260/1809 - loss 1.02972052 - time (sec): 656.07 - samples/sec: 400.53 - lr: 0.000111 - momentum: 0.000000
2023-10-13 01:38:42,954 epoch 1 - iter 1440/1809 - loss 0.92088707 - time (sec): 747.88 - samples/sec: 402.22 - lr: 0.000127 - momentum: 0.000000
2023-10-13 01:40:18,744 epoch 1 - iter 1620/1809 - loss 0.83665554 - time (sec): 843.67 - samples/sec: 401.67 - lr: 0.000143 - momentum: 0.000000
2023-10-13 01:41:56,057 epoch 1 - iter 1800/1809 - loss 0.76361147 - time (sec): 940.98 - samples/sec: 401.57 - lr: 0.000159 - momentum: 0.000000
2023-10-13 01:42:00,723 ----------------------------------------------------------------------------------------------------
2023-10-13 01:42:00,723 EPOCH 1 done: loss 0.7603 - lr: 0.000159
2023-10-13 01:42:37,965 DEV : loss 0.14501185715198517 - f1-score (micro avg) 0.4122
2023-10-13 01:42:38,027 saving best model
2023-10-13 01:42:38,900 ----------------------------------------------------------------------------------------------------
2023-10-13 01:44:12,141 epoch 2 - iter 180/1809 - loss 0.11022017 - time (sec): 93.24 - samples/sec: 415.46 - lr: 0.000158 - momentum: 0.000000
2023-10-13 01:45:45,611 epoch 2 - iter 360/1809 - loss 0.11221991 - time (sec): 186.71 - samples/sec: 414.87 - lr: 0.000156 - momentum: 0.000000
2023-10-13 01:47:16,997 epoch 2 - iter 540/1809 - loss 0.10902795 - time (sec): 278.09 - samples/sec: 414.67 - lr: 0.000155 - momentum: 0.000000
2023-10-13 01:48:48,299 epoch 2 - iter 720/1809 - loss 0.10834577 - time (sec): 369.40 - samples/sec: 411.87 - lr: 0.000153 - momentum: 0.000000
2023-10-13 01:50:21,166 epoch 2 - iter 900/1809 - loss 0.10540217 - time (sec): 462.26 - samples/sec: 407.82 - lr: 0.000151 - momentum: 0.000000
2023-10-13 01:51:51,954 epoch 2 - iter 1080/1809 - loss 0.10504549 - time (sec): 553.05 - samples/sec: 410.17 - lr: 0.000149 - momentum: 0.000000
2023-10-13 01:53:21,268 epoch 2 - iter 1260/1809 - loss 0.10280905 - time (sec): 642.37 - samples/sec: 411.75 - lr: 0.000148 - momentum: 0.000000
2023-10-13 01:54:50,585 epoch 2 - iter 1440/1809 - loss 0.10065484 - time (sec): 731.68 - samples/sec: 413.77 - lr: 0.000146 - momentum: 0.000000
2023-10-13 01:56:20,686 epoch 2 - iter 1620/1809 - loss 0.09800215 - time (sec): 821.78 - samples/sec: 415.31 - lr: 0.000144 - momentum: 0.000000
2023-10-13 01:57:49,286 epoch 2 - iter 1800/1809 - loss 0.09732421 - time (sec): 910.38 - samples/sec: 415.28 - lr: 0.000142 - momentum: 0.000000
2023-10-13 01:57:53,403 ----------------------------------------------------------------------------------------------------
2023-10-13 01:57:53,404 EPOCH 2 done: loss 0.0971 - lr: 0.000142
2023-10-13 01:58:32,248 DEV : loss 0.10631529986858368 - f1-score (micro avg) 0.618
2023-10-13 01:58:32,304 saving best model
2023-10-13 01:58:34,913 ----------------------------------------------------------------------------------------------------
2023-10-13 02:00:04,640 epoch 3 - iter 180/1809 - loss 0.06167102 - time (sec): 89.72 - samples/sec: 425.26 - lr: 0.000140 - momentum: 0.000000
2023-10-13 02:01:36,034 epoch 3 - iter 360/1809 - loss 0.06052679 - time (sec): 181.12 - samples/sec: 422.80 - lr: 0.000139 - momentum: 0.000000
2023-10-13 02:03:04,768 epoch 3 - iter 540/1809 - loss 0.06164459 - time (sec): 269.85 - samples/sec: 419.93 - lr: 0.000137 - momentum: 0.000000
2023-10-13 02:04:33,141 epoch 3 - iter 720/1809 - loss 0.06028257 - time (sec): 358.22 - samples/sec: 420.01 - lr: 0.000135 - momentum: 0.000000
2023-10-13 02:06:04,511 epoch 3 - iter 900/1809 - loss 0.06093924 - time (sec): 449.59 - samples/sec: 418.97 - lr: 0.000133 - momentum: 0.000000
2023-10-13 02:07:33,194 epoch 3 - iter 1080/1809 - loss 0.06123599 - time (sec): 538.28 - samples/sec: 420.59 - lr: 0.000132 - momentum: 0.000000
2023-10-13 02:09:05,494 epoch 3 - iter 1260/1809 - loss 0.06096843 - time (sec): 630.58 - samples/sec: 420.26 - lr: 0.000130 - momentum: 0.000000
2023-10-13 02:10:36,079 epoch 3 - iter 1440/1809 - loss 0.06042058 - time (sec): 721.16 - samples/sec: 419.13 - lr: 0.000128 - momentum: 0.000000
2023-10-13 02:12:05,652 epoch 3 - iter 1620/1809 - loss 0.06085687 - time (sec): 810.73 - samples/sec: 419.46 - lr: 0.000126 - momentum: 0.000000
2023-10-13 02:13:34,541 epoch 3 - iter 1800/1809 - loss 0.06010337 - time (sec): 899.62 - samples/sec: 420.31 - lr: 0.000125 - momentum: 0.000000
2023-10-13 02:13:38,557 ----------------------------------------------------------------------------------------------------
2023-10-13 02:13:38,557 EPOCH 3 done: loss 0.0600 - lr: 0.000125
2023-10-13 02:14:17,081 DEV : loss 0.1486276537179947 - f1-score (micro avg) 0.6279
2023-10-13 02:14:17,138 saving best model
2023-10-13 02:14:19,719 ----------------------------------------------------------------------------------------------------
2023-10-13 02:15:49,206 epoch 4 - iter 180/1809 - loss 0.04438082 - time (sec): 89.48 - samples/sec: 412.12 - lr: 0.000123 - momentum: 0.000000
2023-10-13 02:17:20,752 epoch 4 - iter 360/1809 - loss 0.04634535 - time (sec): 181.03 - samples/sec: 421.41 - lr: 0.000121 - momentum: 0.000000
2023-10-13 02:18:53,652 epoch 4 - iter 540/1809 - loss 0.04429804 - time (sec): 273.93 - samples/sec: 414.29 - lr: 0.000119 - momentum: 0.000000
2023-10-13 02:20:26,564 epoch 4 - iter 720/1809 - loss 0.04282892 - time (sec): 366.84 - samples/sec: 410.30 - lr: 0.000117 - momentum: 0.000000
2023-10-13 02:21:58,754 epoch 4 - iter 900/1809 - loss 0.04357623 - time (sec): 459.03 - samples/sec: 408.03 - lr: 0.000116 - momentum: 0.000000
2023-10-13 02:23:32,299 epoch 4 - iter 1080/1809 - loss 0.04492686 - time (sec): 552.57 - samples/sec: 407.98 - lr: 0.000114 - momentum: 0.000000
2023-10-13 02:25:06,137 epoch 4 - iter 1260/1809 - loss 0.04500505 - time (sec): 646.41 - samples/sec: 406.96 - lr: 0.000112 - momentum: 0.000000
2023-10-13 02:26:42,107 epoch 4 - iter 1440/1809 - loss 0.04465500 - time (sec): 742.38 - samples/sec: 405.46 - lr: 0.000110 - momentum: 0.000000
2023-10-13 02:28:20,025 epoch 4 - iter 1620/1809 - loss 0.04378505 - time (sec): 840.30 - samples/sec: 405.00 - lr: 0.000109 - momentum: 0.000000
2023-10-13 02:29:56,157 epoch 4 - iter 1800/1809 - loss 0.04381280 - time (sec): 936.43 - samples/sec: 403.84 - lr: 0.000107 - momentum: 0.000000
2023-10-13 02:30:00,550 ----------------------------------------------------------------------------------------------------
2023-10-13 02:30:00,551 EPOCH 4 done: loss 0.0440 - lr: 0.000107
2023-10-13 02:30:42,177 DEV : loss 0.1783849447965622 - f1-score (micro avg) 0.5908
2023-10-13 02:30:42,244 ----------------------------------------------------------------------------------------------------
2023-10-13 02:32:18,971 epoch 5 - iter 180/1809 - loss 0.02710856 - time (sec): 96.72 - samples/sec: 396.17 - lr: 0.000105 - momentum: 0.000000
2023-10-13 02:33:56,021 epoch 5 - iter 360/1809 - loss 0.02681704 - time (sec): 193.77 - samples/sec: 395.65 - lr: 0.000103 - momentum: 0.000000
2023-10-13 02:35:30,935 epoch 5 - iter 540/1809 - loss 0.02671099 - time (sec): 288.69 - samples/sec: 392.24 - lr: 0.000101 - momentum: 0.000000
2023-10-13 02:37:05,596 epoch 5 - iter 720/1809 - loss 0.02957950 - time (sec): 383.35 - samples/sec: 390.35 - lr: 0.000100 - momentum: 0.000000
2023-10-13 02:38:39,902 epoch 5 - iter 900/1809 - loss 0.03107445 - time (sec): 477.66 - samples/sec: 391.89 - lr: 0.000098 - momentum: 0.000000
2023-10-13 02:40:15,203 epoch 5 - iter 1080/1809 - loss 0.03053907 - time (sec): 572.96 - samples/sec: 393.23 - lr: 0.000096 - momentum: 0.000000
2023-10-13 02:41:51,148 epoch 5 - iter 1260/1809 - loss 0.03099549 - time (sec): 668.90 - samples/sec: 392.67 - lr: 0.000094 - momentum: 0.000000
2023-10-13 02:43:24,310 epoch 5 - iter 1440/1809 - loss 0.03239734 - time (sec): 762.06 - samples/sec: 393.07 - lr: 0.000093 - momentum: 0.000000
2023-10-13 02:45:01,678 epoch 5 - iter 1620/1809 - loss 0.03173686 - time (sec): 859.43 - samples/sec: 395.54 - lr: 0.000091 - momentum: 0.000000
2023-10-13 02:46:39,758 epoch 5 - iter 1800/1809 - loss 0.03205991 - time (sec): 957.51 - samples/sec: 394.91 - lr: 0.000089 - momentum: 0.000000
2023-10-13 02:46:44,304 ----------------------------------------------------------------------------------------------------
2023-10-13 02:46:44,305 EPOCH 5 done: loss 0.0320 - lr: 0.000089
2023-10-13 02:47:26,896 DEV : loss 0.2254001647233963 - f1-score (micro avg) 0.6268
2023-10-13 02:47:26,979 ----------------------------------------------------------------------------------------------------
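The per-iteration lr values above trace the LinearScheduler plugin listed in the header: the rate climbs to the 0.00016 peak over the first 10% of the 18,090 total steps (10 epochs x 1,809 batches) and then decays linearly to zero by the last batch of epoch 10. A minimal sketch of that schedule, useful for cross-checking the logged values (an illustration, not the trainer's own implementation):

def linear_schedule_lr(step, peak_lr=0.00016, total_steps=10 * 1809, warmup_fraction=0.1):
    """Linear warmup to peak_lr, then linear decay to zero."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

# step 180  (epoch 1, iter 180) -> ~0.000016, as logged at 01:27:50
# step 1989 (epoch 2, iter 180) -> ~0.000158, as logged at 01:44:12
print(linear_schedule_lr(180), linear_schedule_lr(1809 + 180))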
2023-10-13 02:49:02,907 epoch 6 - iter 180/1809 - loss 0.02309495 - time (sec): 95.92 - samples/sec: 392.21 - lr: 0.000087 - momentum: 0.000000
2023-10-13 02:50:42,922 epoch 6 - iter 360/1809 - loss 0.02270880 - time (sec): 195.94 - samples/sec: 386.67 - lr: 0.000085 - momentum: 0.000000
2023-10-13 02:52:19,953 epoch 6 - iter 540/1809 - loss 0.02278981 - time (sec): 292.97 - samples/sec: 386.21 - lr: 0.000084 - momentum: 0.000000
2023-10-13 02:53:57,447 epoch 6 - iter 720/1809 - loss 0.02331295 - time (sec): 390.47 - samples/sec: 387.70 - lr: 0.000082 - momentum: 0.000000
2023-10-13 02:55:33,309 epoch 6 - iter 900/1809 - loss 0.02407055 - time (sec): 486.33 - samples/sec: 389.86 - lr: 0.000080 - momentum: 0.000000
2023-10-13 02:57:09,646 epoch 6 - iter 1080/1809 - loss 0.02408103 - time (sec): 582.66 - samples/sec: 389.96 - lr: 0.000078 - momentum: 0.000000
2023-10-13 02:58:45,520 epoch 6 - iter 1260/1809 - loss 0.02505139 - time (sec): 678.54 - samples/sec: 390.43 - lr: 0.000077 - momentum: 0.000000
2023-10-13 03:00:20,100 epoch 6 - iter 1440/1809 - loss 0.02498905 - time (sec): 773.12 - samples/sec: 389.95 - lr: 0.000075 - momentum: 0.000000
2023-10-13 03:01:52,543 epoch 6 - iter 1620/1809 - loss 0.02440205 - time (sec): 865.56 - samples/sec: 391.85 - lr: 0.000073 - momentum: 0.000000
2023-10-13 03:03:26,329 epoch 6 - iter 1800/1809 - loss 0.02416447 - time (sec): 959.35 - samples/sec: 394.14 - lr: 0.000071 - momentum: 0.000000
2023-10-13 03:03:30,703 ----------------------------------------------------------------------------------------------------
2023-10-13 03:03:30,704 EPOCH 6 done: loss 0.0241 - lr: 0.000071
2023-10-13 03:04:12,643 DEV : loss 0.26813653111457825 - f1-score (micro avg) 0.6519
2023-10-13 03:04:12,704 saving best model
2023-10-13 03:04:15,437 ----------------------------------------------------------------------------------------------------
2023-10-13 03:05:49,546 epoch 7 - iter 180/1809 - loss 0.01558310 - time (sec): 94.11 - samples/sec: 392.76 - lr: 0.000069 - momentum: 0.000000
2023-10-13 03:07:23,547 epoch 7 - iter 360/1809 - loss 0.01547849 - time (sec): 188.11 - samples/sec: 402.58 - lr: 0.000068 - momentum: 0.000000
2023-10-13 03:08:56,013 epoch 7 - iter 540/1809 - loss 0.01669214 - time (sec): 280.57 - samples/sec: 404.44 - lr: 0.000066 - momentum: 0.000000
2023-10-13 03:10:28,801 epoch 7 - iter 720/1809 - loss 0.01856464 - time (sec): 373.36 - samples/sec: 404.84 - lr: 0.000064 - momentum: 0.000000
2023-10-13 03:12:00,985 epoch 7 - iter 900/1809 - loss 0.01851849 - time (sec): 465.54 - samples/sec: 405.68 - lr: 0.000062 - momentum: 0.000000
2023-10-13 03:13:32,908 epoch 7 - iter 1080/1809 - loss 0.01798567 - time (sec): 557.47 - samples/sec: 405.34 - lr: 0.000061 - momentum: 0.000000
2023-10-13 03:15:06,435 epoch 7 - iter 1260/1809 - loss 0.01767135 - time (sec): 650.99 - samples/sec: 405.31 - lr: 0.000059 - momentum: 0.000000
2023-10-13 03:16:41,207 epoch 7 - iter 1440/1809 - loss 0.01898520 - time (sec): 745.77 - samples/sec: 404.12 - lr: 0.000057 - momentum: 0.000000
2023-10-13 03:18:16,201 epoch 7 - iter 1620/1809 - loss 0.01916789 - time (sec): 840.76 - samples/sec: 404.08 - lr: 0.000055 - momentum: 0.000000
2023-10-13 03:19:50,732 epoch 7 - iter 1800/1809 - loss 0.01895459 - time (sec): 935.29 - samples/sec: 404.42 - lr: 0.000053 - momentum: 0.000000
2023-10-13 03:19:55,054 ----------------------------------------------------------------------------------------------------
2023-10-13 03:19:55,054 EPOCH 7 done: loss 0.0189 - lr: 0.000053
2023-10-13 03:20:35,004 DEV : loss 0.29598313570022583 - f1-score (micro avg) 0.6553
2023-10-13 03:20:35,066 saving best model
2023-10-13 03:20:37,700 ----------------------------------------------------------------------------------------------------
2023-10-13 03:22:10,318 epoch 8 - iter 180/1809 - loss 0.01284778 - time (sec): 92.61 - samples/sec: 405.30 - lr: 0.000052 - momentum: 0.000000
2023-10-13 03:23:42,134 epoch 8 - iter 360/1809 - loss 0.01152205 - time (sec): 184.43 - samples/sec: 411.62 - lr: 0.000050 - momentum: 0.000000
2023-10-13 03:25:17,893 epoch 8 - iter 540/1809 - loss 0.01144334 - time (sec): 280.19 - samples/sec: 406.68 - lr: 0.000048 - momentum: 0.000000
2023-10-13 03:26:55,544 epoch 8 - iter 720/1809 - loss 0.01247695 - time (sec): 377.84 - samples/sec: 404.95 - lr: 0.000046 - momentum: 0.000000
2023-10-13 03:28:29,861 epoch 8 - iter 900/1809 - loss 0.01242235 - time (sec): 472.16 - samples/sec: 405.85 - lr: 0.000044 - momentum: 0.000000
2023-10-13 03:30:00,572 epoch 8 - iter 1080/1809 - loss 0.01249440 - time (sec): 562.87 - samples/sec: 404.47 - lr: 0.000043 - momentum: 0.000000
2023-10-13 03:31:32,676 epoch 8 - iter 1260/1809 - loss 0.01284750 - time (sec): 654.97 - samples/sec: 404.21 - lr: 0.000041 - momentum: 0.000000
2023-10-13 03:33:05,856 epoch 8 - iter 1440/1809 - loss 0.01286960 - time (sec): 748.15 - samples/sec: 405.03 - lr: 0.000039 - momentum: 0.000000
2023-10-13 03:34:39,321 epoch 8 - iter 1620/1809 - loss 0.01296803 - time (sec): 841.62 - samples/sec: 405.50 - lr: 0.000037 - momentum: 0.000000
2023-10-13 03:36:14,046 epoch 8 - iter 1800/1809 - loss 0.01287811 - time (sec): 936.34 - samples/sec: 404.13 - lr: 0.000036 - momentum: 0.000000
2023-10-13 03:36:18,143 ----------------------------------------------------------------------------------------------------
2023-10-13 03:36:18,143 EPOCH 8 done: loss 0.0128 - lr: 0.000036
2023-10-13 03:36:57,138 DEV : loss 0.33492255210876465 - f1-score (micro avg) 0.647
2023-10-13 03:36:57,201 ----------------------------------------------------------------------------------------------------
2023-10-13 03:38:34,200 epoch 9 - iter 180/1809 - loss 0.00784411 - time (sec): 97.00 - samples/sec: 383.03 - lr: 0.000034 - momentum: 0.000000
2023-10-13 03:40:11,602 epoch 9 - iter 360/1809 - loss 0.01133300 - time (sec): 194.40 - samples/sec: 384.49 - lr: 0.000032 - momentum: 0.000000
2023-10-13 03:41:47,232 epoch 9 - iter 540/1809 - loss 0.01015941 - time (sec): 290.03 - samples/sec: 387.16 - lr: 0.000030 - momentum: 0.000000
2023-10-13 03:43:23,081 epoch 9 - iter 720/1809 - loss 0.01025025 - time (sec): 385.88 - samples/sec: 394.75 - lr: 0.000028 - momentum: 0.000000
2023-10-13 03:44:57,677 epoch 9 - iter 900/1809 - loss 0.01070525 - time (sec): 480.47 - samples/sec: 394.93 - lr: 0.000027 - momentum: 0.000000
2023-10-13 03:46:31,742 epoch 9 - iter 1080/1809 - loss 0.01028318 - time (sec): 574.54 - samples/sec: 396.01 - lr: 0.000025 - momentum: 0.000000
2023-10-13 03:48:04,906 epoch 9 - iter 1260/1809 - loss 0.01021383 - time (sec): 667.70 - samples/sec: 396.05 - lr: 0.000023 - momentum: 0.000000
2023-10-13 03:49:39,964 epoch 9 - iter 1440/1809 - loss 0.01089912 - time (sec): 762.76 - samples/sec: 396.96 - lr: 0.000021 - momentum: 0.000000
2023-10-13 03:51:13,537 epoch 9 - iter 1620/1809 - loss 0.01101134 - time (sec): 856.33 - samples/sec: 398.65 - lr: 0.000020 - momentum: 0.000000
2023-10-13 03:52:47,726 epoch 9 - iter 1800/1809 - loss 0.01064439 - time (sec): 950.52 - samples/sec: 398.11 - lr: 0.000018 - momentum: 0.000000
2023-10-13 03:52:51,949 ----------------------------------------------------------------------------------------------------
2023-10-13 03:52:51,950 EPOCH 9 done: loss 0.0106 - lr: 0.000018
2023-10-13 03:53:31,082 DEV : loss 0.3527080714702606 - f1-score (micro avg) 0.6497
2023-10-13 03:53:31,150 ----------------------------------------------------------------------------------------------------
2023-10-13 03:55:10,271 epoch 10 - iter 180/1809 - loss 0.01074091 - time (sec): 99.12 - samples/sec: 381.35 - lr: 0.000016 - momentum: 0.000000
2023-10-13 03:56:50,888 epoch 10 - iter 360/1809 - loss 0.00827756 - time (sec): 199.74 - samples/sec: 375.73 - lr: 0.000014 - momentum: 0.000000
2023-10-13 03:58:30,839 epoch 10 - iter 540/1809 - loss 0.00823781 - time (sec): 299.69 - samples/sec: 376.56 - lr: 0.000012 - momentum: 0.000000
2023-10-13 04:00:10,229 epoch 10 - iter 720/1809 - loss 0.00811760 - time (sec): 399.08 - samples/sec: 375.88 - lr: 0.000011 - momentum: 0.000000
2023-10-13 04:01:47,773 epoch 10 - iter 900/1809 - loss 0.00799705 - time (sec): 496.62 - samples/sec: 378.88 - lr: 0.000009 - momentum: 0.000000
2023-10-13 04:03:23,198 epoch 10 - iter 1080/1809 - loss 0.00755984 - time (sec): 592.05 - samples/sec: 382.04 - lr: 0.000007 - momentum: 0.000000
2023-10-13 04:04:58,210 epoch 10 - iter 1260/1809 - loss 0.00766798 - time (sec): 687.06 - samples/sec: 384.62 - lr: 0.000005 - momentum: 0.000000
2023-10-13 04:06:32,716 epoch 10 - iter 1440/1809 - loss 0.00809054 - time (sec): 781.56 - samples/sec: 387.08 - lr: 0.000004 - momentum: 0.000000
2023-10-13 04:08:08,141 epoch 10 - iter 1620/1809 - loss 0.00862815 - time (sec): 876.99 - samples/sec: 388.22 - lr: 0.000002 - momentum: 0.000000
2023-10-13 04:09:43,867 epoch 10 - iter 1800/1809 - loss 0.00839058 - time (sec): 972.71 - samples/sec: 389.06 - lr: 0.000000 - momentum: 0.000000
2023-10-13 04:09:48,012 ----------------------------------------------------------------------------------------------------
2023-10-13 04:09:48,012 EPOCH 10 done: loss 0.0084 - lr: 0.000000
2023-10-13 04:10:26,982 DEV : loss 0.3519783318042755 - f1-score (micro avg) 0.6454
2023-10-13 04:10:27,904 ----------------------------------------------------------------------------------------------------
2023-10-13 04:10:27,906 Loading model from best epoch ...
2023-10-13 04:10:32,462 SequenceTagger predicts: Dictionary with 13 tags: O, S-loc, B-loc, E-loc, I-loc, S-pers, B-pers, E-pers, I-pers, S-org, B-org, E-org, I-org
2023-10-13 04:11:31,927 Results:
- F-score (micro) 0.6338
- F-score (macro) 0.4822
- Accuracy 0.4769

By class:
              precision    recall  f1-score   support

         loc     0.6496    0.7530    0.6975       591
        pers     0.5405    0.7479    0.6275       357
         org     0.1304    0.1139    0.1216        79

   micro avg     0.5777    0.7020    0.6338      1027
   macro avg     0.4402    0.5383    0.4822      1027
weighted avg     0.5718    0.7020    0.6289      1027

2023-10-13 04:11:31,927 ----------------------------------------------------------------------------------------------------
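The test numbers above come from reloading best-model.pt (the epoch 7 checkpoint, dev micro F1 0.6553); the reported micro-average F1 is consistent with its precision/recall row, since 2 * 0.5777 * 0.7020 / (0.5777 + 0.7020) ≈ 0.6338. The saved checkpoint can also be used for inference with Flair's usual prediction API; a minimal sketch, with the path taken from the base path in this log and an illustrative input sentence that is not from the corpus:

from flair.data import Sentence
from flair.models import SequenceTagger

# Best checkpoint saved under the training base path from this log.
tagger = SequenceTagger.load(
    "hmbench-letemps/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-"
    "bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-1/best-model.pt"
)

# Illustrative French sentence, not taken from the corpus.
sentence = Sentence("Le Conseil fédéral s'est réuni hier à Berne.")
tagger.predict(sentence)

# The tagger emits BIOES tags over loc / pers / org, as listed above.
for span in sentence.get_spans("ner"):
    print(span.text, span.get_label("ner").value, span.get_label("ner").score)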