2023-10-13 05:43:31,169 ----------------------------------------------------------------------------------------------------
2023-10-13 05:43:31,171 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-13 05:43:31,172 ----------------------------------------------------------------------------------------------------
2023-10-13 05:43:31,172 MultiCorpus: 7936 train + 992 dev + 992 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 7936 train + 992 dev + 992 test sentences - /root/.flair/datasets/ner_icdar_europeana/fr
2023-10-13 05:43:31,172 ----------------------------------------------------------------------------------------------------
2023-10-13 05:43:31,172 Train: 7936 sentences
2023-10-13 05:43:31,172 (train_with_dev=False, train_with_test=False)
2023-10-13 05:43:31,172 ----------------------------------------------------------------------------------------------------
2023-10-13 05:43:31,172 Training Params:
2023-10-13 05:43:31,172 - learning_rate: "0.00015"
2023-10-13 05:43:31,172 - mini_batch_size: "4"
2023-10-13 05:43:31,172 - max_epochs: "10"
2023-10-13 05:43:31,172 - shuffle: "True"
2023-10-13 05:43:31,173 ----------------------------------------------------------------------------------------------------
2023-10-13 05:43:31,173 Plugins:
2023-10-13 05:43:31,173 - TensorboardLogger
2023-10-13 05:43:31,173 - LinearScheduler | warmup_fraction: '0.1'
2023-10-13 05:43:31,173 ----------------------------------------------------------------------------------------------------
2023-10-13 05:43:31,173 Final evaluation on model from best epoch (best-model.pt)
2023-10-13 05:43:31,173 - metric: "('micro avg', 'f1-score')"
2023-10-13 05:43:31,173 ----------------------------------------------------------------------------------------------------
2023-10-13 05:43:31,173 Computation:
2023-10-13 05:43:31,173 - compute on device: cuda:0
2023-10-13 05:43:31,173 - embedding storage: none
2023-10-13 05:43:31,173 ----------------------------------------------------------------------------------------------------
2023-10-13 05:43:31,173 Model training base path: "hmbench-icdar/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-4"
2023-10-13 05:43:31,173 ----------------------------------------------------------------------------------------------------
2023-10-13 05:43:31,173 ----------------------------------------------------------------------------------------------------
2023-10-13 05:43:31,174 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-13 05:44:23,669 epoch 1 - iter 198/1984 - loss 2.56023642 - time (sec): 52.49 - samples/sec: 310.74 - lr: 0.000015 - momentum: 0.000000
2023-10-13 05:45:18,867 epoch 1 - iter 396/1984 - loss 2.35774357 - time (sec): 107.69 - samples/sec: 309.60 - lr: 0.000030 - momentum: 0.000000
2023-10-13 05:46:14,303 epoch 1 - iter 594/1984 - loss 2.05704108 - time (sec): 163.13 - samples/sec: 307.54 - lr: 0.000045 - momentum: 0.000000
2023-10-13 05:47:06,479 epoch 1 - iter 792/1984 - loss 1.74769940 - time (sec): 215.30 - samples/sec: 307.47 - lr: 0.000060 - momentum: 0.000000
2023-10-13 05:47:57,574 epoch 1 - iter 990/1984 - loss 1.49982570 - time (sec): 266.40 - samples/sec: 310.06 - lr: 0.000075 - momentum: 0.000000
2023-10-13 05:48:51,134 epoch 1 - iter 1188/1984 - loss 1.30358343 - time (sec): 319.96 - samples/sec: 308.94 - lr: 0.000090 - momentum: 0.000000
2023-10-13 05:49:47,616 epoch 1 - iter 1386/1984 - loss 1.15490690 - time (sec): 376.44 - samples/sec: 304.80 - lr: 0.000105 - momentum: 0.000000
2023-10-13 05:50:43,445 epoch 1 - iter 1584/1984 - loss 1.03965057 - time (sec): 432.27 - samples/sec: 303.80 - lr: 0.000120 - momentum: 0.000000
2023-10-13 05:51:37,606 epoch 1 - iter 1782/1984 - loss 0.94460620 - time (sec): 486.43 - samples/sec: 303.83 - lr: 0.000135 - momentum: 0.000000
2023-10-13 05:52:30,621 epoch 1 - iter 1980/1984 - loss 0.87250029 - time (sec): 539.45 - samples/sec: 303.15 - lr: 0.000150 - momentum: 0.000000
2023-10-13 05:52:31,783 ----------------------------------------------------------------------------------------------------
2023-10-13 05:52:31,783 EPOCH 1 done: loss 0.8711 - lr: 0.000150
2023-10-13 05:52:57,860 DEV : loss 0.16231723129749298 - f1-score (micro avg) 0.663
2023-10-13 05:52:57,906 saving best model
2023-10-13 05:52:58,787 ----------------------------------------------------------------------------------------------------
2023-10-13 05:53:54,708 epoch 2 - iter 198/1984 - loss 0.15963194 - time (sec): 55.92 - samples/sec: 295.50 - lr: 0.000148 - momentum: 0.000000
2023-10-13 05:54:49,891 epoch 2 - iter 396/1984 - loss 0.15446338 - time (sec): 111.10 - samples/sec: 295.65 - lr: 0.000147 - momentum: 0.000000
2023-10-13 05:55:42,962 epoch 2 - iter 594/1984 - loss 0.14934617 - time (sec): 164.17 - samples/sec: 298.92 - lr: 0.000145 - momentum: 0.000000
2023-10-13 05:56:38,570 epoch 2 - iter 792/1984 - loss 0.14004568 - time (sec): 219.78 - samples/sec: 293.01 - lr: 0.000143 - momentum: 0.000000
2023-10-13 05:57:34,636 epoch 2 - iter 990/1984 - loss 0.13664831 - time (sec): 275.85 - samples/sec: 296.06 - lr: 0.000142 - momentum: 0.000000
2023-10-13 05:58:28,427 epoch 2 - iter 1188/1984 - loss 0.13641309 - time (sec): 329.64 - samples/sec: 297.14 - lr: 0.000140 - momentum: 0.000000
2023-10-13 05:59:22,135 epoch 2 - iter 1386/1984 - loss 0.13311507 - time (sec): 383.35 - samples/sec: 298.85 - lr: 0.000138 - momentum: 0.000000
2023-10-13 06:00:19,474 epoch 2 - iter 1584/1984 - loss 0.13084502 - time (sec): 440.68 - samples/sec: 296.84 - lr: 0.000137 - momentum: 0.000000
2023-10-13 06:01:14,119 epoch 2 - iter 1782/1984 - loss 0.12778979 - time (sec): 495.33 - samples/sec: 297.40 - lr: 0.000135 - momentum: 0.000000
2023-10-13 06:02:07,974 epoch 2 - iter 1980/1984 - loss 0.12586842 - time (sec): 549.18 - samples/sec: 298.12 - lr: 0.000133 - momentum: 0.000000
2023-10-13 06:02:08,971 ----------------------------------------------------------------------------------------------------
2023-10-13 06:02:08,972 EPOCH 2 done: loss 0.1258 - lr: 0.000133
2023-10-13 06:02:35,243 DEV : loss 0.08949719369411469 - f1-score (micro avg) 0.7352
2023-10-13 06:02:35,285 saving best model
2023-10-13 06:02:37,860 ----------------------------------------------------------------------------------------------------
2023-10-13 06:03:32,717 epoch 3 - iter 198/1984 - loss 0.07354557 - time (sec): 54.85 - samples/sec: 312.62 - lr: 0.000132 - momentum: 0.000000
2023-10-13 06:04:25,496 epoch 3 - iter 396/1984 - loss 0.07621576 - time (sec): 107.63 - samples/sec: 308.46 - lr: 0.000130 - momentum: 0.000000
2023-10-13 06:05:19,074 epoch 3 - iter 594/1984 - loss 0.08125367 - time (sec): 161.21 - samples/sec: 306.72 - lr: 0.000128 - momentum: 0.000000
2023-10-13 06:06:15,512 epoch 3 - iter 792/1984 - loss 0.07585305 - time (sec): 217.65 - samples/sec: 303.05 - lr: 0.000127 - momentum: 0.000000
2023-10-13 06:07:10,722 epoch 3 - iter 990/1984 - loss 0.07972188 - time (sec): 272.86 - samples/sec: 298.46 - lr: 0.000125 - momentum: 0.000000
2023-10-13 06:08:07,849 epoch 3 - iter 1188/1984 - loss 0.07889533 - time (sec): 329.98 - samples/sec: 295.56 - lr: 0.000123 - momentum: 0.000000
2023-10-13 06:09:01,056 epoch 3 - iter 1386/1984 - loss 0.07614265 - time (sec): 383.19 - samples/sec: 297.04 - lr: 0.000122 - momentum: 0.000000
2023-10-13 06:09:55,354 epoch 3 - iter 1584/1984 - loss 0.07610978 - time (sec): 437.49 - samples/sec: 297.93 - lr: 0.000120 - momentum: 0.000000
2023-10-13 06:10:50,635 epoch 3 - iter 1782/1984 - loss 0.07645598 - time (sec): 492.77 - samples/sec: 299.47 - lr: 0.000118 - momentum: 0.000000
2023-10-13 06:11:45,893 epoch 3 - iter 1980/1984 - loss 0.07656055 - time (sec): 548.03 - samples/sec: 298.44 - lr: 0.000117 - momentum: 0.000000
2023-10-13 06:11:47,067 ----------------------------------------------------------------------------------------------------
2023-10-13 06:11:47,067 EPOCH 3 done: loss 0.0764 - lr: 0.000117
2023-10-13 06:12:13,772 DEV : loss 0.10229434072971344 - f1-score (micro avg) 0.7421
2023-10-13 06:12:13,819 saving best model
2023-10-13 06:12:16,515 ----------------------------------------------------------------------------------------------------
2023-10-13 06:13:11,781 epoch 4 - iter 198/1984 - loss 0.06257893 - time (sec): 55.26 - samples/sec: 301.67 - lr: 0.000115 - momentum: 0.000000
2023-10-13 06:14:06,955 epoch 4 - iter 396/1984 - loss 0.05453809 - time (sec): 110.44 - samples/sec: 294.81 - lr: 0.000113 - momentum: 0.000000
2023-10-13 06:15:02,500 epoch 4 - iter 594/1984 - loss 0.05487829 - time (sec): 165.98 - samples/sec: 304.37 - lr: 0.000112 - momentum: 0.000000
2023-10-13 06:15:57,389 epoch 4 - iter 792/1984 - loss 0.05252948 - time (sec): 220.87 - samples/sec: 301.83 - lr: 0.000110 - momentum: 0.000000
2023-10-13 06:16:50,636 epoch 4 - iter 990/1984 - loss 0.05421408 - time (sec): 274.12 - samples/sec: 303.57 - lr: 0.000108 - momentum: 0.000000
2023-10-13 06:17:42,895 epoch 4 - iter 1188/1984 - loss 0.05404910 - time (sec): 326.38 - samples/sec: 304.71 - lr: 0.000107 - momentum: 0.000000
2023-10-13 06:18:36,607 epoch 4 - iter 1386/1984 - loss 0.05254585 - time (sec): 380.09 - samples/sec: 304.61 - lr: 0.000105 - momentum: 0.000000
2023-10-13 06:19:34,329 epoch 4 - iter 1584/1984 - loss 0.05326865 - time (sec): 437.81 - samples/sec: 299.95 - lr: 0.000103 - momentum: 0.000000
2023-10-13 06:20:28,956 epoch 4 - iter 1782/1984 - loss 0.05356980 - time (sec): 492.44 - samples/sec: 300.72 - lr: 0.000102 - momentum: 0.000000
2023-10-13 06:21:26,190 epoch 4 - iter 1980/1984 - loss 0.05411207 - time (sec): 549.67 - samples/sec: 297.93 - lr: 0.000100 - momentum: 0.000000
2023-10-13 06:21:27,442 ----------------------------------------------------------------------------------------------------
2023-10-13 06:21:27,442 EPOCH 4 done: loss 0.0544 - lr: 0.000100
2023-10-13 06:21:56,062 DEV : loss 0.1296338140964508 - f1-score (micro avg) 0.7448
2023-10-13 06:21:56,106 saving best model
2023-10-13 06:22:00,166 ----------------------------------------------------------------------------------------------------
2023-10-13 06:22:57,064 epoch 5 - iter 198/1984 - loss 0.03401430 - time (sec): 56.89 - samples/sec: 285.27 - lr: 0.000098 - momentum: 0.000000
2023-10-13 06:23:49,847 epoch 5 - iter 396/1984 - loss 0.03367653 - time (sec): 109.68 - samples/sec: 287.03 - lr: 0.000097 - momentum: 0.000000
2023-10-13 06:24:44,133 epoch 5 - iter 594/1984 - loss 0.03772318 - time (sec): 163.96 - samples/sec: 294.26 - lr: 0.000095 - momentum: 0.000000
2023-10-13 06:25:37,666 epoch 5 - iter 792/1984 - loss 0.03702507 - time (sec): 217.49 - samples/sec: 296.65 - lr: 0.000093 - momentum: 0.000000
2023-10-13 06:26:37,327 epoch 5 - iter 990/1984 - loss 0.03647125 - time (sec): 277.16 - samples/sec: 298.28 - lr: 0.000092 - momentum: 0.000000
2023-10-13 06:27:29,315 epoch 5 - iter 1188/1984 - loss 0.03821053 - time (sec): 329.14 - samples/sec: 298.94 - lr: 0.000090 - momentum: 0.000000
2023-10-13 06:28:20,911 epoch 5 - iter 1386/1984 - loss 0.04020892 - time (sec): 380.74 - samples/sec: 300.13 - lr: 0.000088 - momentum: 0.000000
2023-10-13 06:29:15,349 epoch 5 - iter 1584/1984 - loss 0.04121393 - time (sec): 435.18 - samples/sec: 298.01 - lr: 0.000087 - momentum: 0.000000
2023-10-13 06:30:15,057 epoch 5 - iter 1782/1984 - loss 0.03983358 - time (sec): 494.89 - samples/sec: 295.16 - lr: 0.000085 - momentum: 0.000000
2023-10-13 06:31:16,160 epoch 5 - iter 1980/1984 - loss 0.04078366 - time (sec): 555.99 - samples/sec: 294.31 - lr: 0.000083 - momentum: 0.000000
2023-10-13 06:31:17,222 ----------------------------------------------------------------------------------------------------
2023-10-13 06:31:17,222 EPOCH 5 done: loss 0.0407 - lr: 0.000083
2023-10-13 06:31:42,184 DEV : loss 0.14384247362613678 - f1-score (micro avg) 0.7497
2023-10-13 06:31:42,224 saving best model
2023-10-13 06:31:44,772 ----------------------------------------------------------------------------------------------------
2023-10-13 06:32:37,857 epoch 6 - iter 198/1984 - loss 0.02600490 - time (sec): 53.08 - samples/sec: 290.45 - lr: 0.000082 - momentum: 0.000000
2023-10-13 06:33:33,217 epoch 6 - iter 396/1984 - loss 0.02863585 - time (sec): 108.44 - samples/sec: 290.63 - lr: 0.000080 - momentum: 0.000000
2023-10-13 06:34:28,450 epoch 6 - iter 594/1984 - loss 0.03111200 - time (sec): 163.67 - samples/sec: 291.52 - lr: 0.000078 - momentum: 0.000000
2023-10-13 06:35:21,690 epoch 6 - iter 792/1984 - loss 0.03154475 - time (sec): 216.91 - samples/sec: 297.11 - lr: 0.000077 - momentum: 0.000000
2023-10-13 06:36:14,761 epoch 6 - iter 990/1984 - loss 0.03052400 - time (sec): 269.98 - samples/sec: 302.74 - lr: 0.000075 - momentum: 0.000000
2023-10-13 06:37:09,857 epoch 6 - iter 1188/1984 - loss 0.02980984 - time (sec): 325.08 - samples/sec: 303.28 - lr: 0.000073 - momentum: 0.000000
2023-10-13 06:38:05,132 epoch 6 - iter 1386/1984 - loss 0.02858646 - time (sec): 380.36 - samples/sec: 301.79 - lr: 0.000072 - momentum: 0.000000
2023-10-13 06:38:57,131 epoch 6 - iter 1584/1984 - loss 0.02911257 - time (sec): 432.35 - samples/sec: 301.30 - lr: 0.000070 - momentum: 0.000000
2023-10-13 06:39:51,485 epoch 6 - iter 1782/1984 - loss 0.02938630 - time (sec): 486.71 - samples/sec: 302.81 - lr: 0.000068 - momentum: 0.000000
2023-10-13 06:40:47,211 epoch 6 - iter 1980/1984 - loss 0.02932146 - time (sec): 542.43 - samples/sec: 301.60 - lr: 0.000067 - momentum: 0.000000
2023-10-13 06:40:48,350 ----------------------------------------------------------------------------------------------------
2023-10-13 06:40:48,350 EPOCH 6 done: loss 0.0293 - lr: 0.000067
2023-10-13 06:41:17,254 DEV : loss 0.1786336749792099 - f1-score (micro avg) 0.7585
2023-10-13 06:41:17,296 saving best model
2023-10-13 06:41:18,383 ----------------------------------------------------------------------------------------------------
2023-10-13 06:42:15,602 epoch 7 - iter 198/1984 - loss 0.01565821 - time (sec): 57.22 - samples/sec: 273.44 - lr: 0.000065 - momentum: 0.000000
2023-10-13 06:43:13,372 epoch 7 - iter 396/1984 - loss 0.02160073 - time (sec): 114.99 - samples/sec: 275.78 - lr: 0.000063 - momentum: 0.000000
2023-10-13 06:44:11,042 epoch 7 - iter 594/1984 - loss 0.02297071 - time (sec): 172.66 - samples/sec: 279.79 - lr: 0.000062 - momentum: 0.000000
2023-10-13 06:45:06,994 epoch 7 - iter 792/1984 - loss 0.02194959 - time (sec): 228.61 - samples/sec: 281.23 - lr: 0.000060 - momentum: 0.000000
2023-10-13 06:46:02,561 epoch 7 - iter 990/1984 - loss 0.02145332 - time (sec): 284.18 - samples/sec: 283.36 - lr: 0.000058 - momentum: 0.000000
2023-10-13 06:46:55,114 epoch 7 - iter 1188/1984 - loss 0.02157394 - time (sec): 336.73 - samples/sec: 288.44 - lr: 0.000057 - momentum: 0.000000
2023-10-13 06:47:45,302 epoch 7 - iter 1386/1984 - loss 0.02232190 - time (sec): 386.92 - samples/sec: 294.58 - lr: 0.000055 - momentum: 0.000000
2023-10-13 06:48:36,186 epoch 7 - iter 1584/1984 - loss 0.02126233 - time (sec): 437.80 - samples/sec: 297.31 - lr: 0.000053 - momentum: 0.000000
2023-10-13 06:49:30,550 epoch 7 - iter 1782/1984 - loss 0.02124752 - time (sec): 492.16 - samples/sec: 296.78 - lr: 0.000052 - momentum: 0.000000
2023-10-13 06:50:24,014 epoch 7 - iter 1980/1984 - loss 0.02216943 - time (sec): 545.63 - samples/sec: 299.99 - lr: 0.000050 - momentum: 0.000000
2023-10-13 06:50:25,017 ----------------------------------------------------------------------------------------------------
2023-10-13 06:50:25,018 EPOCH 7 done: loss 0.0221 - lr: 0.000050
2023-10-13 06:50:50,990 DEV : loss 0.19668884575366974 - f1-score (micro avg) 0.7557
2023-10-13 06:50:51,030 ----------------------------------------------------------------------------------------------------
2023-10-13 06:51:42,700 epoch 8 - iter 198/1984 - loss 0.00771193 - time (sec): 51.67 - samples/sec: 307.77 - lr: 0.000048 - momentum: 0.000000
2023-10-13 06:52:33,732 epoch 8 - iter 396/1984 - loss 0.01096548 - time (sec): 102.70 - samples/sec: 311.86 - lr: 0.000047 - momentum: 0.000000
2023-10-13 06:53:25,866 epoch 8 - iter 594/1984 - loss 0.01124620 - time (sec): 154.83 - samples/sec: 306.71 - lr: 0.000045 - momentum: 0.000000
2023-10-13 06:54:21,829 epoch 8 - iter 792/1984 - loss 0.01189251 - time (sec): 210.80 - samples/sec: 303.37 - lr: 0.000043 - momentum: 0.000000
2023-10-13 06:55:16,537 epoch 8 - iter 990/1984 - loss 0.01234024 - time (sec): 265.50 - samples/sec: 303.81 - lr: 0.000042 - momentum: 0.000000
2023-10-13 06:56:09,844 epoch 8 - iter 1188/1984 - loss 0.01263419 - time (sec): 318.81 - samples/sec: 305.71 - lr: 0.000040 - momentum: 0.000000
2023-10-13 06:57:04,037 epoch 8 - iter 1386/1984 - loss 0.01265776 - time (sec): 373.00 - samples/sec: 303.96 - lr: 0.000038 - momentum: 0.000000
2023-10-13 06:57:55,226 epoch 8 - iter 1584/1984 - loss 0.01301713 - time (sec): 424.19 - samples/sec: 306.78 - lr: 0.000037 - momentum: 0.000000
2023-10-13 06:58:47,190 epoch 8 - iter 1782/1984 - loss 0.01413211 - time (sec): 476.16 - samples/sec: 309.92 - lr: 0.000035 - momentum: 0.000000
2023-10-13 06:59:38,513 epoch 8 - iter 1980/1984 - loss 0.01490286 - time (sec): 527.48 - samples/sec: 310.17 - lr: 0.000033 - momentum: 0.000000
2023-10-13 06:59:39,610 ----------------------------------------------------------------------------------------------------
2023-10-13 06:59:39,610 EPOCH 8 done: loss 0.0149 - lr: 0.000033
2023-10-13 07:00:04,753 DEV : loss 0.2151404768228531 - f1-score (micro avg) 0.7413
2023-10-13 07:00:04,794 ----------------------------------------------------------------------------------------------------
2023-10-13 07:00:55,949 epoch 9 - iter 198/1984 - loss 0.00807895 - time (sec): 51.15 - samples/sec: 325.86 - lr: 0.000032 - momentum: 0.000000
2023-10-13 07:01:51,407 epoch 9 - iter 396/1984 - loss 0.01077008 - time (sec): 106.61 - samples/sec: 318.81 - lr: 0.000030 - momentum: 0.000000
2023-10-13 07:02:43,202 epoch 9 - iter 594/1984 - loss 0.01143847 - time (sec): 158.41 - samples/sec: 316.85 - lr: 0.000028 - momentum: 0.000000
2023-10-13 07:03:35,088 epoch 9 - iter 792/1984 - loss 0.01163397 - time (sec): 210.29 - samples/sec: 315.64 - lr: 0.000027 - momentum: 0.000000
2023-10-13 07:04:27,415 epoch 9 - iter 990/1984 - loss 0.01100476 - time (sec): 262.62 - samples/sec: 314.62 - lr: 0.000025 - momentum: 0.000000
2023-10-13 07:05:20,750 epoch 9 - iter 1188/1984 - loss 0.01112938 - time (sec): 315.95 - samples/sec: 308.28 - lr: 0.000023 - momentum: 0.000000
2023-10-13 07:06:13,433 epoch 9 - iter 1386/1984 - loss 0.01138202 - time (sec): 368.64 - samples/sec: 307.01 - lr: 0.000022 - momentum: 0.000000
2023-10-13 07:07:07,062 epoch 9 - iter 1584/1984 - loss 0.01089230 - time (sec): 422.27 - samples/sec: 307.08 - lr: 0.000020 - momentum: 0.000000
2023-10-13 07:08:02,575 epoch 9 - iter 1782/1984 - loss 0.01217880 - time (sec): 477.78 - samples/sec: 306.61 - lr: 0.000018 - momentum: 0.000000
2023-10-13 07:08:58,446 epoch 9 - iter 1980/1984 - loss 0.01165758 - time (sec): 533.65 - samples/sec: 306.75 - lr: 0.000017 - momentum: 0.000000
2023-10-13 07:08:59,508 ----------------------------------------------------------------------------------------------------
2023-10-13 07:08:59,508 EPOCH 9 done: loss 0.0117 - lr: 0.000017
2023-10-13 07:09:24,667 DEV : loss 0.22990703582763672 - f1-score (micro avg) 0.7597
2023-10-13 07:09:24,711 saving best model
2023-10-13 07:09:27,881 ----------------------------------------------------------------------------------------------------
2023-10-13 07:10:20,160 epoch 10 - iter 198/1984 - loss 0.00796034 - time (sec): 52.27 - samples/sec: 315.72 - lr: 0.000015 - momentum: 0.000000
2023-10-13 07:11:12,693 epoch 10 - iter 396/1984 - loss 0.00819348 - time (sec): 104.81 - samples/sec: 314.94 - lr: 0.000013 - momentum: 0.000000
2023-10-13 07:12:06,325 epoch 10 - iter 594/1984 - loss 0.00998088 - time (sec): 158.44 - samples/sec: 311.36 - lr: 0.000012 - momentum: 0.000000
2023-10-13 07:12:58,511 epoch 10 - iter 792/1984 - loss 0.00872009 - time (sec): 210.63 - samples/sec: 315.36 - lr: 0.000010 - momentum: 0.000000
2023-10-13 07:13:49,092 epoch 10 - iter 990/1984 - loss 0.00830259 - time (sec): 261.21 - samples/sec: 315.50 - lr: 0.000008 - momentum: 0.000000
2023-10-13 07:14:40,128 epoch 10 - iter 1188/1984 - loss 0.00852866 - time (sec): 312.24 - samples/sec: 315.34 - lr: 0.000007 - momentum: 0.000000
2023-10-13 07:15:33,304 epoch 10 - iter 1386/1984 - loss 0.00801943 - time (sec): 365.42 - samples/sec: 316.72 - lr: 0.000005 - momentum: 0.000000
2023-10-13 07:16:28,514 epoch 10 - iter 1584/1984 - loss 0.00810705 - time (sec): 420.63 - samples/sec: 314.79 - lr: 0.000003 - momentum: 0.000000
2023-10-13 07:17:20,494 epoch 10 - iter 1782/1984 - loss 0.00802002 - time (sec): 472.61 - samples/sec: 312.72 - lr: 0.000002 - momentum: 0.000000
2023-10-13 07:18:10,876 epoch 10 - iter 1980/1984 - loss 0.00791535 - time (sec): 522.99 - samples/sec: 312.85 - lr: 0.000000 - momentum: 0.000000
2023-10-13 07:18:11,919 ----------------------------------------------------------------------------------------------------
2023-10-13 07:18:11,920 EPOCH 10 done: loss 0.0079 - lr: 0.000000
2023-10-13 07:18:36,515 DEV : loss 0.23257124423980713 - f1-score (micro avg) 0.7575
2023-10-13 07:18:37,476 ----------------------------------------------------------------------------------------------------
2023-10-13 07:18:37,478 Loading model from best epoch ...
2023-10-13 07:18:41,683 SequenceTagger predicts: Dictionary with 13 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-13 07:19:08,466 
Results:
- F-score (micro) 0.7605
- F-score (macro) 0.6686
- Accuracy 0.6421

By class:
              precision    recall  f1-score   support

         LOC     0.8172    0.8397    0.8283       655
         PER     0.6693    0.7713    0.7167       223
         ORG     0.5146    0.4173    0.4609       127

   micro avg     0.7502    0.7711    0.7605      1005
   macro avg     0.6670    0.6761    0.6686      1005
weighted avg     0.7462    0.7711    0.7571      1005

2023-10-13 07:19:08,466 ----------------------------------------------------------------------------------------------------