2023-10-13 07:19:53,183 ---------------------------------------------------------------------------------------------------- 2023-10-13 07:19:53,185 Model: "SequenceTagger( (embeddings): ByT5Embeddings( (model): T5EncoderModel( (shared): Embedding(384, 1472) (encoder): T5Stack( (embed_tokens): Embedding(384, 1472) (block): ModuleList( (0): T5Block( (layer): ModuleList( (0): T5LayerSelfAttention( (SelfAttention): T5Attention( (q): Linear(in_features=1472, out_features=384, bias=False) (k): Linear(in_features=1472, out_features=384, bias=False) (v): Linear(in_features=1472, out_features=384, bias=False) (o): Linear(in_features=384, out_features=1472, bias=False) (relative_attention_bias): Embedding(32, 6) ) (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (1): T5LayerFF( (DenseReluDense): T5DenseGatedActDense( (wi_0): Linear(in_features=1472, out_features=3584, bias=False) (wi_1): Linear(in_features=1472, out_features=3584, bias=False) (wo): Linear(in_features=3584, out_features=1472, bias=False) (dropout): Dropout(p=0.1, inplace=False) (act): NewGELUActivation() ) (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) ) (1-11): 11 x T5Block( (layer): ModuleList( (0): T5LayerSelfAttention( (SelfAttention): T5Attention( (q): Linear(in_features=1472, out_features=384, bias=False) (k): Linear(in_features=1472, out_features=384, bias=False) (v): Linear(in_features=1472, out_features=384, bias=False) (o): Linear(in_features=384, out_features=1472, bias=False) ) (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (1): T5LayerFF( (DenseReluDense): T5DenseGatedActDense( (wi_0): Linear(in_features=1472, out_features=3584, bias=False) (wi_1): Linear(in_features=1472, out_features=3584, bias=False) (wo): Linear(in_features=3584, out_features=1472, 
bias=False) (dropout): Dropout(p=0.1, inplace=False) (act): NewGELUActivation() ) (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) ) ) (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) ) (locked_dropout): LockedDropout(p=0.5) (linear): Linear(in_features=1472, out_features=13, bias=True) (loss_function): CrossEntropyLoss() )" 2023-10-13 07:19:53,186 ---------------------------------------------------------------------------------------------------- 2023-10-13 07:19:53,186 MultiCorpus: 7936 train + 992 dev + 992 test sentences - NER_ICDAR_EUROPEANA Corpus: 7936 train + 992 dev + 992 test sentences - /root/.flair/datasets/ner_icdar_europeana/fr 2023-10-13 07:19:53,186 ---------------------------------------------------------------------------------------------------- 2023-10-13 07:19:53,186 Train: 7936 sentences 2023-10-13 07:19:53,186 (train_with_dev=False, train_with_test=False) 2023-10-13 07:19:53,186 ---------------------------------------------------------------------------------------------------- 2023-10-13 07:19:53,186 Training Params: 2023-10-13 07:19:53,186 - learning_rate: "0.00016" 2023-10-13 07:19:53,186 - mini_batch_size: "4" 2023-10-13 07:19:53,187 - max_epochs: "10" 2023-10-13 07:19:53,187 - shuffle: "True" 2023-10-13 07:19:53,187 ---------------------------------------------------------------------------------------------------- 2023-10-13 07:19:53,187 Plugins: 2023-10-13 07:19:53,187 - TensorboardLogger 2023-10-13 07:19:53,187 - LinearScheduler | warmup_fraction: '0.1' 2023-10-13 07:19:53,187 ---------------------------------------------------------------------------------------------------- 2023-10-13 07:19:53,187 Final evaluation on model from best epoch (best-model.pt) 2023-10-13 07:19:53,187 - metric: "('micro avg', 'f1-score')" 2023-10-13 07:19:53,187 
---------------------------------------------------------------------------------------------------- 2023-10-13 07:19:53,187 Computation: 2023-10-13 07:19:53,187 - compute on device: cuda:0 2023-10-13 07:19:53,187 - embedding storage: none 2023-10-13 07:19:53,187 ---------------------------------------------------------------------------------------------------- 2023-10-13 07:19:53,187 Model training base path: "hmbench-icdar/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-4" 2023-10-13 07:19:53,188 ---------------------------------------------------------------------------------------------------- 2023-10-13 07:19:53,188 ---------------------------------------------------------------------------------------------------- 2023-10-13 07:19:53,188 Logging anything other than scalars to TensorBoard is currently not supported. 2023-10-13 07:20:45,128 epoch 1 - iter 198/1984 - loss 2.55818872 - time (sec): 51.94 - samples/sec: 314.07 - lr: 0.000016 - momentum: 0.000000 2023-10-13 07:21:40,000 epoch 1 - iter 396/1984 - loss 2.33587782 - time (sec): 106.81 - samples/sec: 312.15 - lr: 0.000032 - momentum: 0.000000 2023-10-13 07:22:33,605 epoch 1 - iter 594/1984 - loss 2.01970139 - time (sec): 160.41 - samples/sec: 312.74 - lr: 0.000048 - momentum: 0.000000 2023-10-13 07:23:30,081 epoch 1 - iter 792/1984 - loss 1.70381571 - time (sec): 216.89 - samples/sec: 305.22 - lr: 0.000064 - momentum: 0.000000 2023-10-13 07:24:23,213 epoch 1 - iter 990/1984 - loss 1.45996090 - time (sec): 270.02 - samples/sec: 305.90 - lr: 0.000080 - momentum: 0.000000 2023-10-13 07:25:15,127 epoch 1 - iter 1188/1984 - loss 1.26768891 - time (sec): 321.94 - samples/sec: 307.04 - lr: 0.000096 - momentum: 0.000000 2023-10-13 07:26:08,262 epoch 1 - iter 1386/1984 - loss 1.12328892 - time (sec): 375.07 - samples/sec: 305.91 - lr: 0.000112 - momentum: 0.000000 2023-10-13 07:27:05,255 epoch 1 - iter 1584/1984 - loss 1.01186913 - time 
(sec): 432.06 - samples/sec: 303.95 - lr: 0.000128 - momentum: 0.000000 2023-10-13 07:28:00,428 epoch 1 - iter 1782/1984 - loss 0.91951880 - time (sec): 487.24 - samples/sec: 303.33 - lr: 0.000144 - momentum: 0.000000 2023-10-13 07:28:52,462 epoch 1 - iter 1980/1984 - loss 0.84899421 - time (sec): 539.27 - samples/sec: 303.25 - lr: 0.000160 - momentum: 0.000000 2023-10-13 07:28:53,563 ---------------------------------------------------------------------------------------------------- 2023-10-13 07:28:53,563 EPOCH 1 done: loss 0.8477 - lr: 0.000160 2023-10-13 07:29:18,574 DEV : loss 0.15308877825737 - f1-score (micro avg) 0.6327 2023-10-13 07:29:18,615 saving best model 2023-10-13 07:29:19,514 ---------------------------------------------------------------------------------------------------- 2023-10-13 07:30:13,242 epoch 2 - iter 198/1984 - loss 0.15815308 - time (sec): 53.73 - samples/sec: 307.56 - lr: 0.000158 - momentum: 0.000000 2023-10-13 07:31:05,331 epoch 2 - iter 396/1984 - loss 0.15294294 - time (sec): 105.81 - samples/sec: 310.42 - lr: 0.000156 - momentum: 0.000000 2023-10-13 07:31:57,246 epoch 2 - iter 594/1984 - loss 0.14762650 - time (sec): 157.73 - samples/sec: 311.13 - lr: 0.000155 - momentum: 0.000000 2023-10-13 07:32:47,565 epoch 2 - iter 792/1984 - loss 0.13839157 - time (sec): 208.05 - samples/sec: 309.53 - lr: 0.000153 - momentum: 0.000000 2023-10-13 07:33:38,915 epoch 2 - iter 990/1984 - loss 0.13451793 - time (sec): 259.40 - samples/sec: 314.83 - lr: 0.000151 - momentum: 0.000000 2023-10-13 07:34:29,616 epoch 2 - iter 1188/1984 - loss 0.13430916 - time (sec): 310.10 - samples/sec: 315.86 - lr: 0.000149 - momentum: 0.000000 2023-10-13 07:35:21,056 epoch 2 - iter 1386/1984 - loss 0.13142661 - time (sec): 361.54 - samples/sec: 316.87 - lr: 0.000148 - momentum: 0.000000 2023-10-13 07:36:12,839 epoch 2 - iter 1584/1984 - loss 0.12933421 - time (sec): 413.32 - samples/sec: 316.49 - lr: 0.000146 - momentum: 0.000000 2023-10-13 07:37:09,596 epoch 2 - 
iter 1782/1984 - loss 0.12598983 - time (sec): 470.08 - samples/sec: 313.38 - lr: 0.000144 - momentum: 0.000000 2023-10-13 07:38:05,462 epoch 2 - iter 1980/1984 - loss 0.12391478 - time (sec): 525.95 - samples/sec: 311.29 - lr: 0.000142 - momentum: 0.000000 2023-10-13 07:38:06,548 ---------------------------------------------------------------------------------------------------- 2023-10-13 07:38:06,548 EPOCH 2 done: loss 0.1239 - lr: 0.000142 2023-10-13 07:38:33,487 DEV : loss 0.09046369045972824 - f1-score (micro avg) 0.7366 2023-10-13 07:38:33,530 saving best model 2023-10-13 07:38:36,623 ---------------------------------------------------------------------------------------------------- 2023-10-13 07:39:29,157 epoch 3 - iter 198/1984 - loss 0.07114802 - time (sec): 52.53 - samples/sec: 326.44 - lr: 0.000140 - momentum: 0.000000 2023-10-13 07:40:20,054 epoch 3 - iter 396/1984 - loss 0.07583877 - time (sec): 103.43 - samples/sec: 321.00 - lr: 0.000139 - momentum: 0.000000 2023-10-13 07:41:10,929 epoch 3 - iter 594/1984 - loss 0.08206199 - time (sec): 154.30 - samples/sec: 320.46 - lr: 0.000137 - momentum: 0.000000 2023-10-13 07:42:07,330 epoch 3 - iter 792/1984 - loss 0.07672763 - time (sec): 210.70 - samples/sec: 313.03 - lr: 0.000135 - momentum: 0.000000 2023-10-13 07:43:00,039 epoch 3 - iter 990/1984 - loss 0.08019713 - time (sec): 263.41 - samples/sec: 309.16 - lr: 0.000133 - momentum: 0.000000 2023-10-13 07:43:52,103 epoch 3 - iter 1188/1984 - loss 0.07965612 - time (sec): 315.48 - samples/sec: 309.16 - lr: 0.000132 - momentum: 0.000000 2023-10-13 07:44:43,909 epoch 3 - iter 1386/1984 - loss 0.07648297 - time (sec): 367.28 - samples/sec: 309.91 - lr: 0.000130 - momentum: 0.000000 2023-10-13 07:45:36,700 epoch 3 - iter 1584/1984 - loss 0.07597711 - time (sec): 420.07 - samples/sec: 310.28 - lr: 0.000128 - momentum: 0.000000 2023-10-13 07:46:30,204 epoch 3 - iter 1782/1984 - loss 0.07670438 - time (sec): 473.58 - samples/sec: 311.61 - lr: 0.000126 - momentum: 
0.000000 2023-10-13 07:47:27,417 epoch 3 - iter 1980/1984 - loss 0.07649439 - time (sec): 530.79 - samples/sec: 308.14 - lr: 0.000125 - momentum: 0.000000 2023-10-13 07:47:28,745 ---------------------------------------------------------------------------------------------------- 2023-10-13 07:47:28,745 EPOCH 3 done: loss 0.0763 - lr: 0.000125 2023-10-13 07:47:59,318 DEV : loss 0.10242562741041183 - f1-score (micro avg) 0.7461 2023-10-13 07:47:59,367 saving best model 2023-10-13 07:48:02,023 ---------------------------------------------------------------------------------------------------- 2023-10-13 07:48:54,136 epoch 4 - iter 198/1984 - loss 0.06357581 - time (sec): 52.11 - samples/sec: 319.93 - lr: 0.000123 - momentum: 0.000000 2023-10-13 07:49:46,911 epoch 4 - iter 396/1984 - loss 0.05408411 - time (sec): 104.88 - samples/sec: 310.42 - lr: 0.000121 - momentum: 0.000000 2023-10-13 07:50:42,323 epoch 4 - iter 594/1984 - loss 0.05453776 - time (sec): 160.30 - samples/sec: 315.16 - lr: 0.000119 - momentum: 0.000000 2023-10-13 07:51:39,996 epoch 4 - iter 792/1984 - loss 0.05390139 - time (sec): 217.97 - samples/sec: 305.84 - lr: 0.000117 - momentum: 0.000000 2023-10-13 07:52:32,946 epoch 4 - iter 990/1984 - loss 0.05557298 - time (sec): 270.92 - samples/sec: 307.16 - lr: 0.000116 - momentum: 0.000000 2023-10-13 07:53:26,134 epoch 4 - iter 1188/1984 - loss 0.05492451 - time (sec): 324.11 - samples/sec: 306.84 - lr: 0.000114 - momentum: 0.000000 2023-10-13 07:54:18,780 epoch 4 - iter 1386/1984 - loss 0.05394514 - time (sec): 376.75 - samples/sec: 307.31 - lr: 0.000112 - momentum: 0.000000 2023-10-13 07:55:10,916 epoch 4 - iter 1584/1984 - loss 0.05474911 - time (sec): 428.89 - samples/sec: 306.19 - lr: 0.000110 - momentum: 0.000000 2023-10-13 07:56:01,745 epoch 4 - iter 1782/1984 - loss 0.05498902 - time (sec): 479.72 - samples/sec: 308.69 - lr: 0.000109 - momentum: 0.000000 2023-10-13 07:56:53,986 epoch 4 - iter 1980/1984 - loss 0.05555249 - time (sec): 531.96 - 
samples/sec: 307.85 - lr: 0.000107 - momentum: 0.000000 2023-10-13 07:56:54,977 ---------------------------------------------------------------------------------------------------- 2023-10-13 07:56:54,977 EPOCH 4 done: loss 0.0558 - lr: 0.000107 2023-10-13 07:57:20,222 DEV : loss 0.11803846806287766 - f1-score (micro avg) 0.7635 2023-10-13 07:57:20,263 saving best model 2023-10-13 07:57:23,319 ---------------------------------------------------------------------------------------------------- 2023-10-13 07:58:16,078 epoch 5 - iter 198/1984 - loss 0.03671910 - time (sec): 52.75 - samples/sec: 307.65 - lr: 0.000105 - momentum: 0.000000 2023-10-13 07:59:07,480 epoch 5 - iter 396/1984 - loss 0.03393313 - time (sec): 104.16 - samples/sec: 302.23 - lr: 0.000103 - momentum: 0.000000 2023-10-13 08:00:04,822 epoch 5 - iter 594/1984 - loss 0.03710980 - time (sec): 161.50 - samples/sec: 298.74 - lr: 0.000101 - momentum: 0.000000 2023-10-13 08:00:57,737 epoch 5 - iter 792/1984 - loss 0.03775117 - time (sec): 214.41 - samples/sec: 300.91 - lr: 0.000100 - momentum: 0.000000 2023-10-13 08:01:51,384 epoch 5 - iter 990/1984 - loss 0.03707890 - time (sec): 268.06 - samples/sec: 308.40 - lr: 0.000098 - momentum: 0.000000 2023-10-13 08:02:43,475 epoch 5 - iter 1188/1984 - loss 0.03912596 - time (sec): 320.15 - samples/sec: 307.34 - lr: 0.000096 - momentum: 0.000000 2023-10-13 08:03:39,718 epoch 5 - iter 1386/1984 - loss 0.04177612 - time (sec): 376.40 - samples/sec: 303.59 - lr: 0.000094 - momentum: 0.000000 2023-10-13 08:04:32,815 epoch 5 - iter 1584/1984 - loss 0.04251075 - time (sec): 429.49 - samples/sec: 301.96 - lr: 0.000093 - momentum: 0.000000 2023-10-13 08:05:28,158 epoch 5 - iter 1782/1984 - loss 0.04104742 - time (sec): 484.83 - samples/sec: 301.28 - lr: 0.000091 - momentum: 0.000000 2023-10-13 08:06:23,416 epoch 5 - iter 1980/1984 - loss 0.04173445 - time (sec): 540.09 - samples/sec: 302.98 - lr: 0.000089 - momentum: 0.000000 2023-10-13 08:06:24,568 
---------------------------------------------------------------------------------------------------- 2023-10-13 08:06:24,568 EPOCH 5 done: loss 0.0417 - lr: 0.000089 2023-10-13 08:06:53,692 DEV : loss 0.15268999338150024 - f1-score (micro avg) 0.7547 2023-10-13 08:06:53,735 ---------------------------------------------------------------------------------------------------- 2023-10-13 08:07:46,397 epoch 6 - iter 198/1984 - loss 0.02535796 - time (sec): 52.66 - samples/sec: 292.77 - lr: 0.000087 - momentum: 0.000000 2023-10-13 08:08:40,288 epoch 6 - iter 396/1984 - loss 0.02718950 - time (sec): 106.55 - samples/sec: 295.78 - lr: 0.000085 - momentum: 0.000000 2023-10-13 08:09:40,348 epoch 6 - iter 594/1984 - loss 0.03096970 - time (sec): 166.61 - samples/sec: 286.38 - lr: 0.000084 - momentum: 0.000000 2023-10-13 08:10:33,306 epoch 6 - iter 792/1984 - loss 0.03063680 - time (sec): 219.57 - samples/sec: 293.51 - lr: 0.000082 - momentum: 0.000000 2023-10-13 08:11:32,247 epoch 6 - iter 990/1984 - loss 0.03001187 - time (sec): 278.51 - samples/sec: 293.48 - lr: 0.000080 - momentum: 0.000000 2023-10-13 08:12:27,392 epoch 6 - iter 1188/1984 - loss 0.03035169 - time (sec): 333.65 - samples/sec: 295.48 - lr: 0.000078 - momentum: 0.000000 2023-10-13 08:13:22,976 epoch 6 - iter 1386/1984 - loss 0.02900396 - time (sec): 389.24 - samples/sec: 294.91 - lr: 0.000077 - momentum: 0.000000 2023-10-13 08:14:14,076 epoch 6 - iter 1584/1984 - loss 0.02940746 - time (sec): 440.34 - samples/sec: 295.84 - lr: 0.000075 - momentum: 0.000000 2023-10-13 08:15:14,609 epoch 6 - iter 1782/1984 - loss 0.02963627 - time (sec): 500.87 - samples/sec: 294.25 - lr: 0.000073 - momentum: 0.000000 2023-10-13 08:16:10,077 epoch 6 - iter 1980/1984 - loss 0.02994504 - time (sec): 556.34 - samples/sec: 294.06 - lr: 0.000071 - momentum: 0.000000 2023-10-13 08:16:11,303 ---------------------------------------------------------------------------------------------------- 2023-10-13 08:16:11,303 EPOCH 6 done: loss 
0.0299 - lr: 0.000071 2023-10-13 08:16:39,069 DEV : loss 0.17402999103069305 - f1-score (micro avg) 0.7606 2023-10-13 08:16:39,114 ---------------------------------------------------------------------------------------------------- 2023-10-13 08:17:34,989 epoch 7 - iter 198/1984 - loss 0.01598902 - time (sec): 55.87 - samples/sec: 280.01 - lr: 0.000069 - momentum: 0.000000 2023-10-13 08:18:28,873 epoch 7 - iter 396/1984 - loss 0.01943854 - time (sec): 109.76 - samples/sec: 288.92 - lr: 0.000068 - momentum: 0.000000 2023-10-13 08:19:26,960 epoch 7 - iter 594/1984 - loss 0.02024313 - time (sec): 167.84 - samples/sec: 287.81 - lr: 0.000066 - momentum: 0.000000 2023-10-13 08:20:22,810 epoch 7 - iter 792/1984 - loss 0.01931686 - time (sec): 223.69 - samples/sec: 287.41 - lr: 0.000064 - momentum: 0.000000 2023-10-13 08:21:19,600 epoch 7 - iter 990/1984 - loss 0.01880883 - time (sec): 280.48 - samples/sec: 287.09 - lr: 0.000062 - momentum: 0.000000 2023-10-13 08:22:20,464 epoch 7 - iter 1188/1984 - loss 0.01968767 - time (sec): 341.35 - samples/sec: 284.54 - lr: 0.000061 - momentum: 0.000000 2023-10-13 08:23:19,883 epoch 7 - iter 1386/1984 - loss 0.02039686 - time (sec): 400.77 - samples/sec: 284.40 - lr: 0.000059 - momentum: 0.000000 2023-10-13 08:24:16,699 epoch 7 - iter 1584/1984 - loss 0.01969449 - time (sec): 457.58 - samples/sec: 284.46 - lr: 0.000057 - momentum: 0.000000 2023-10-13 08:25:12,168 epoch 7 - iter 1782/1984 - loss 0.01944575 - time (sec): 513.05 - samples/sec: 284.69 - lr: 0.000055 - momentum: 0.000000 2023-10-13 08:26:06,899 epoch 7 - iter 1980/1984 - loss 0.02144356 - time (sec): 567.78 - samples/sec: 288.28 - lr: 0.000053 - momentum: 0.000000 2023-10-13 08:26:07,959 ---------------------------------------------------------------------------------------------------- 2023-10-13 08:26:07,959 EPOCH 7 done: loss 0.0214 - lr: 0.000053 2023-10-13 08:26:35,116 DEV : loss 0.19614897668361664 - f1-score (micro avg) 0.7503 2023-10-13 08:26:35,167 
---------------------------------------------------------------------------------------------------- 2023-10-13 08:27:29,035 epoch 8 - iter 198/1984 - loss 0.01158850 - time (sec): 53.87 - samples/sec: 295.21 - lr: 0.000052 - momentum: 0.000000 2023-10-13 08:28:22,570 epoch 8 - iter 396/1984 - loss 0.01135671 - time (sec): 107.40 - samples/sec: 298.21 - lr: 0.000050 - momentum: 0.000000 2023-10-13 08:29:14,780 epoch 8 - iter 594/1984 - loss 0.01283228 - time (sec): 159.61 - samples/sec: 297.53 - lr: 0.000048 - momentum: 0.000000 2023-10-13 08:30:08,374 epoch 8 - iter 792/1984 - loss 0.01290769 - time (sec): 213.20 - samples/sec: 299.95 - lr: 0.000046 - momentum: 0.000000 2023-10-13 08:31:04,454 epoch 8 - iter 990/1984 - loss 0.01378612 - time (sec): 269.28 - samples/sec: 299.54 - lr: 0.000045 - momentum: 0.000000 2023-10-13 08:31:58,431 epoch 8 - iter 1188/1984 - loss 0.01439749 - time (sec): 323.26 - samples/sec: 301.50 - lr: 0.000043 - momentum: 0.000000 2023-10-13 08:32:50,392 epoch 8 - iter 1386/1984 - loss 0.01433923 - time (sec): 375.22 - samples/sec: 302.17 - lr: 0.000041 - momentum: 0.000000 2023-10-13 08:33:46,683 epoch 8 - iter 1584/1984 - loss 0.01467700 - time (sec): 431.51 - samples/sec: 301.57 - lr: 0.000039 - momentum: 0.000000 2023-10-13 08:34:41,708 epoch 8 - iter 1782/1984 - loss 0.01511737 - time (sec): 486.54 - samples/sec: 303.30 - lr: 0.000037 - momentum: 0.000000 2023-10-13 08:35:36,094 epoch 8 - iter 1980/1984 - loss 0.01616362 - time (sec): 540.92 - samples/sec: 302.46 - lr: 0.000036 - momentum: 0.000000 2023-10-13 08:35:37,175 ---------------------------------------------------------------------------------------------------- 2023-10-13 08:35:37,176 EPOCH 8 done: loss 0.0162 - lr: 0.000036 2023-10-13 08:36:04,089 DEV : loss 0.21642562747001648 - f1-score (micro avg) 0.7525 2023-10-13 08:36:04,133 ---------------------------------------------------------------------------------------------------- 2023-10-13 08:36:57,949 epoch 9 - iter 
198/1984 - loss 0.00640065 - time (sec): 53.81 - samples/sec: 309.76 - lr: 0.000034 - momentum: 0.000000 2023-10-13 08:37:52,235 epoch 9 - iter 396/1984 - loss 0.01156542 - time (sec): 108.10 - samples/sec: 314.42 - lr: 0.000032 - momentum: 0.000000 2023-10-13 08:38:45,254 epoch 9 - iter 594/1984 - loss 0.01213383 - time (sec): 161.12 - samples/sec: 311.52 - lr: 0.000030 - momentum: 0.000000 2023-10-13 08:39:39,026 epoch 9 - iter 792/1984 - loss 0.01092639 - time (sec): 214.89 - samples/sec: 308.89 - lr: 0.000029 - momentum: 0.000000 2023-10-13 08:40:32,750 epoch 9 - iter 990/1984 - loss 0.01079153 - time (sec): 268.61 - samples/sec: 307.59 - lr: 0.000027 - momentum: 0.000000 2023-10-13 08:41:26,372 epoch 9 - iter 1188/1984 - loss 0.01156083 - time (sec): 322.24 - samples/sec: 302.27 - lr: 0.000025 - momentum: 0.000000 2023-10-13 08:42:19,915 epoch 9 - iter 1386/1984 - loss 0.01100464 - time (sec): 375.78 - samples/sec: 301.17 - lr: 0.000023 - momentum: 0.000000 2023-10-13 08:43:14,716 epoch 9 - iter 1584/1984 - loss 0.01062470 - time (sec): 430.58 - samples/sec: 301.15 - lr: 0.000021 - momentum: 0.000000 2023-10-13 08:44:09,975 epoch 9 - iter 1782/1984 - loss 0.01144723 - time (sec): 485.84 - samples/sec: 301.52 - lr: 0.000020 - momentum: 0.000000 2023-10-13 08:45:05,444 epoch 9 - iter 1980/1984 - loss 0.01103215 - time (sec): 541.31 - samples/sec: 302.41 - lr: 0.000018 - momentum: 0.000000 2023-10-13 08:45:06,469 ---------------------------------------------------------------------------------------------------- 2023-10-13 08:45:06,469 EPOCH 9 done: loss 0.0111 - lr: 0.000018 2023-10-13 08:45:33,139 DEV : loss 0.23107928037643433 - f1-score (micro avg) 0.7558 2023-10-13 08:45:33,190 ---------------------------------------------------------------------------------------------------- 2023-10-13 08:46:29,480 epoch 10 - iter 198/1984 - loss 0.01096974 - time (sec): 56.29 - samples/sec: 293.21 - lr: 0.000016 - momentum: 0.000000 2023-10-13 08:47:24,775 epoch 10 - iter 
396/1984 - loss 0.01022958 - time (sec): 111.58 - samples/sec: 295.82 - lr: 0.000014 - momentum: 0.000000 2023-10-13 08:48:18,420 epoch 10 - iter 594/1984 - loss 0.01081945 - time (sec): 165.23 - samples/sec: 298.57 - lr: 0.000013 - momentum: 0.000000 2023-10-13 08:49:14,804 epoch 10 - iter 792/1984 - loss 0.00943856 - time (sec): 221.61 - samples/sec: 299.73 - lr: 0.000011 - momentum: 0.000000 2023-10-13 08:50:09,344 epoch 10 - iter 990/1984 - loss 0.00912387 - time (sec): 276.15 - samples/sec: 298.42 - lr: 0.000009 - momentum: 0.000000 2023-10-13 08:51:06,217 epoch 10 - iter 1188/1984 - loss 0.00944281 - time (sec): 333.02 - samples/sec: 295.66 - lr: 0.000007 - momentum: 0.000000 2023-10-13 08:52:01,841 epoch 10 - iter 1386/1984 - loss 0.00876352 - time (sec): 388.65 - samples/sec: 297.79 - lr: 0.000005 - momentum: 0.000000 2023-10-13 08:52:56,639 epoch 10 - iter 1584/1984 - loss 0.00844070 - time (sec): 443.45 - samples/sec: 298.59 - lr: 0.000004 - momentum: 0.000000 2023-10-13 08:53:50,959 epoch 10 - iter 1782/1984 - loss 0.00854251 - time (sec): 497.77 - samples/sec: 296.91 - lr: 0.000002 - momentum: 0.000000 2023-10-13 08:54:44,573 epoch 10 - iter 1980/1984 - loss 0.00843691 - time (sec): 551.38 - samples/sec: 296.74 - lr: 0.000000 - momentum: 0.000000 2023-10-13 08:54:45,746 ---------------------------------------------------------------------------------------------------- 2023-10-13 08:54:45,747 EPOCH 10 done: loss 0.0085 - lr: 0.000000 2023-10-13 08:55:12,311 DEV : loss 0.23015783727169037 - f1-score (micro avg) 0.7565 2023-10-13 08:55:13,309 ---------------------------------------------------------------------------------------------------- 2023-10-13 08:55:13,311 Loading model from best epoch ... 
2023-10-13 08:55:18,271 SequenceTagger predicts: Dictionary with 13 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-13 08:55:44,303 
Results:
- F-score (micro) 0.7718
- F-score (macro) 0.6805
- Accuracy 0.6534

By class:
              precision    recall  f1-score   support

         LOC     0.8261    0.8412    0.8336       655
         PER     0.7296    0.7623    0.7456       223
         ORG     0.5306    0.4094    0.4622       127

   micro avg     0.7745    0.7692    0.7718      1005
   macro avg     0.6954    0.6710    0.6805      1005
weighted avg     0.7673    0.7692    0.7671      1005

2023-10-13 08:55:44,303 ----------------------------------------------------------------------------------------------------
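
Note on the averages in the final report: the sketch below recomputes the macro, weighted, and micro F1 rows from the per-class figures logged above. It is not part of the training run or of Flair. The variable and function names are illustrative assumptions, and because the inputs are the four-decimal values from the log, the recomputed numbers can differ from the logged ones in the last digit.

```python
# Per-class rows copied from the "By class" report in the log:
# label -> (precision, recall, f1, support)
rows = {
    "LOC": (0.8261, 0.8412, 0.8336, 655),
    "PER": (0.7296, 0.7623, 0.7456, 223),
    "ORG": (0.5306, 0.4094, 0.4622, 127),
}

def macro_avg(col):
    """Unweighted mean of one metric column across classes."""
    return sum(r[col] for r in rows.values()) / len(rows)

def weighted_avg(col):
    """Mean of one metric column weighted by class support."""
    total = sum(r[3] for r in rows.values())
    return sum(r[col] * r[3] for r in rows.values()) / total

macro_f1 = macro_avg(2)
weighted_f1 = weighted_avg(2)

# Micro F1 is the harmonic mean of the logged micro precision and recall.
micro_p, micro_r = 0.7745, 0.7692
micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)

print(f"macro F1    {macro_f1:.4f}")
print(f"weighted F1 {weighted_f1:.4f}")
print(f"micro F1    {micro_f1:.4f}")
```

The gap between micro F1 (0.7718) and macro F1 (0.6805) comes from ORG: it has the lowest F1 and the smallest support (127 of 1005 spans), so it pulls the unweighted macro average down more than the support-sensitive micro average.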