2023-10-13 05:43:31,169 ----------------------------------------------------------------------------------------------------
2023-10-13 05:43:31,171 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-13 05:43:31,172 ----------------------------------------------------------------------------------------------------
2023-10-13 05:43:31,172 MultiCorpus: 7936 train + 992 dev + 992 test sentences
- NER_ICDAR_EUROPEANA Corpus: 7936 train + 992 dev + 992 test sentences - /root/.flair/datasets/ner_icdar_europeana/fr
2023-10-13 05:43:31,172 ----------------------------------------------------------------------------------------------------
2023-10-13 05:43:31,172 Train: 7936 sentences
2023-10-13 05:43:31,172 (train_with_dev=False, train_with_test=False)
2023-10-13 05:43:31,172 ----------------------------------------------------------------------------------------------------
2023-10-13 05:43:31,172 Training Params:
2023-10-13 05:43:31,172 - learning_rate: "0.00015"
2023-10-13 05:43:31,172 - mini_batch_size: "4"
2023-10-13 05:43:31,172 - max_epochs: "10"
2023-10-13 05:43:31,172 - shuffle: "True"
2023-10-13 05:43:31,173 ----------------------------------------------------------------------------------------------------
2023-10-13 05:43:31,173 Plugins:
2023-10-13 05:43:31,173 - TensorboardLogger
2023-10-13 05:43:31,173 - LinearScheduler | warmup_fraction: '0.1'
2023-10-13 05:43:31,173 ----------------------------------------------------------------------------------------------------
2023-10-13 05:43:31,173 Final evaluation on model from best epoch (best-model.pt)
2023-10-13 05:43:31,173 - metric: "('micro avg', 'f1-score')"
2023-10-13 05:43:31,173 ----------------------------------------------------------------------------------------------------
2023-10-13 05:43:31,173 Computation:
2023-10-13 05:43:31,173 - compute on device: cuda:0
2023-10-13 05:43:31,173 - embedding storage: none
2023-10-13 05:43:31,173 ----------------------------------------------------------------------------------------------------
2023-10-13 05:43:31,173 Model training base path: "hmbench-icdar/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-4"
2023-10-13 05:43:31,173 ----------------------------------------------------------------------------------------------------
2023-10-13 05:43:31,173 ----------------------------------------------------------------------------------------------------
2023-10-13 05:43:31,174 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-13 05:44:23,669 epoch 1 - iter 198/1984 - loss 2.56023642 - time (sec): 52.49 - samples/sec: 310.74 - lr: 0.000015 - momentum: 0.000000
2023-10-13 05:45:18,867 epoch 1 - iter 396/1984 - loss 2.35774357 - time (sec): 107.69 - samples/sec: 309.60 - lr: 0.000030 - momentum: 0.000000
2023-10-13 05:46:14,303 epoch 1 - iter 594/1984 - loss 2.05704108 - time (sec): 163.13 - samples/sec: 307.54 - lr: 0.000045 - momentum: 0.000000
2023-10-13 05:47:06,479 epoch 1 - iter 792/1984 - loss 1.74769940 - time (sec): 215.30 - samples/sec: 307.47 - lr: 0.000060 - momentum: 0.000000
2023-10-13 05:47:57,574 epoch 1 - iter 990/1984 - loss 1.49982570 - time (sec): 266.40 - samples/sec: 310.06 - lr: 0.000075 - momentum: 0.000000
2023-10-13 05:48:51,134 epoch 1 - iter 1188/1984 - loss 1.30358343 - time (sec): 319.96 - samples/sec: 308.94 - lr: 0.000090 - momentum: 0.000000
2023-10-13 05:49:47,616 epoch 1 - iter 1386/1984 - loss 1.15490690 - time (sec): 376.44 - samples/sec: 304.80 - lr: 0.000105 - momentum: 0.000000
2023-10-13 05:50:43,445 epoch 1 - iter 1584/1984 - loss 1.03965057 - time (sec): 432.27 - samples/sec: 303.80 - lr: 0.000120 - momentum: 0.000000
2023-10-13 05:51:37,606 epoch 1 - iter 1782/1984 - loss 0.94460620 - time (sec): 486.43 - samples/sec: 303.83 - lr: 0.000135 - momentum: 0.000000
2023-10-13 05:52:30,621 epoch 1 - iter 1980/1984 - loss 0.87250029 - time (sec): 539.45 - samples/sec: 303.15 - lr: 0.000150 - momentum: 0.000000
2023-10-13 05:52:31,783 ----------------------------------------------------------------------------------------------------
2023-10-13 05:52:31,783 EPOCH 1 done: loss 0.8711 - lr: 0.000150
2023-10-13 05:52:57,860 DEV : loss 0.16231723129749298 - f1-score (micro avg) 0.663
2023-10-13 05:52:57,906 saving best model
2023-10-13 05:52:58,787 ----------------------------------------------------------------------------------------------------
2023-10-13 05:53:54,708 epoch 2 - iter 198/1984 - loss 0.15963194 - time (sec): 55.92 - samples/sec: 295.50 - lr: 0.000148 - momentum: 0.000000
2023-10-13 05:54:49,891 epoch 2 - iter 396/1984 - loss 0.15446338 - time (sec): 111.10 - samples/sec: 295.65 - lr: 0.000147 - momentum: 0.000000
2023-10-13 05:55:42,962 epoch 2 - iter 594/1984 - loss 0.14934617 - time (sec): 164.17 - samples/sec: 298.92 - lr: 0.000145 - momentum: 0.000000
2023-10-13 05:56:38,570 epoch 2 - iter 792/1984 - loss 0.14004568 - time (sec): 219.78 - samples/sec: 293.01 - lr: 0.000143 - momentum: 0.000000
2023-10-13 05:57:34,636 epoch 2 - iter 990/1984 - loss 0.13664831 - time (sec): 275.85 - samples/sec: 296.06 - lr: 0.000142 - momentum: 0.000000
2023-10-13 05:58:28,427 epoch 2 - iter 1188/1984 - loss 0.13641309 - time (sec): 329.64 - samples/sec: 297.14 - lr: 0.000140 - momentum: 0.000000
2023-10-13 05:59:22,135 epoch 2 - iter 1386/1984 - loss 0.13311507 - time (sec): 383.35 - samples/sec: 298.85 - lr: 0.000138 - momentum: 0.000000
2023-10-13 06:00:19,474 epoch 2 - iter 1584/1984 - loss 0.13084502 - time (sec): 440.68 - samples/sec: 296.84 - lr: 0.000137 - momentum: 0.000000
2023-10-13 06:01:14,119 epoch 2 - iter 1782/1984 - loss 0.12778979 - time (sec): 495.33 - samples/sec: 297.40 - lr: 0.000135 - momentum: 0.000000
2023-10-13 06:02:07,974 epoch 2 - iter 1980/1984 - loss 0.12586842 - time (sec): 549.18 - samples/sec: 298.12 - lr: 0.000133 - momentum: 0.000000
2023-10-13 06:02:08,971 ----------------------------------------------------------------------------------------------------
2023-10-13 06:02:08,972 EPOCH 2 done: loss 0.1258 - lr: 0.000133
2023-10-13 06:02:35,243 DEV : loss 0.08949719369411469 - f1-score (micro avg) 0.7352
2023-10-13 06:02:35,285 saving best model
2023-10-13 06:02:37,860 ----------------------------------------------------------------------------------------------------
2023-10-13 06:03:32,717 epoch 3 - iter 198/1984 - loss 0.07354557 - time (sec): 54.85 - samples/sec: 312.62 - lr: 0.000132 - momentum: 0.000000
2023-10-13 06:04:25,496 epoch 3 - iter 396/1984 - loss 0.07621576 - time (sec): 107.63 - samples/sec: 308.46 - lr: 0.000130 - momentum: 0.000000
2023-10-13 06:05:19,074 epoch 3 - iter 594/1984 - loss 0.08125367 - time (sec): 161.21 - samples/sec: 306.72 - lr: 0.000128 - momentum: 0.000000
2023-10-13 06:06:15,512 epoch 3 - iter 792/1984 - loss 0.07585305 - time (sec): 217.65 - samples/sec: 303.05 - lr: 0.000127 - momentum: 0.000000
2023-10-13 06:07:10,722 epoch 3 - iter 990/1984 - loss 0.07972188 - time (sec): 272.86 - samples/sec: 298.46 - lr: 0.000125 - momentum: 0.000000
2023-10-13 06:08:07,849 epoch 3 - iter 1188/1984 - loss 0.07889533 - time (sec): 329.98 - samples/sec: 295.56 - lr: 0.000123 - momentum: 0.000000
2023-10-13 06:09:01,056 epoch 3 - iter 1386/1984 - loss 0.07614265 - time (sec): 383.19 - samples/sec: 297.04 - lr: 0.000122 - momentum: 0.000000
2023-10-13 06:09:55,354 epoch 3 - iter 1584/1984 - loss 0.07610978 - time (sec): 437.49 - samples/sec: 297.93 - lr: 0.000120 - momentum: 0.000000
2023-10-13 06:10:50,635 epoch 3 - iter 1782/1984 - loss 0.07645598 - time (sec): 492.77 - samples/sec: 299.47 - lr: 0.000118 - momentum: 0.000000
2023-10-13 06:11:45,893 epoch 3 - iter 1980/1984 - loss 0.07656055 - time (sec): 548.03 - samples/sec: 298.44 - lr: 0.000117 - momentum: 0.000000
2023-10-13 06:11:47,067 ----------------------------------------------------------------------------------------------------
2023-10-13 06:11:47,067 EPOCH 3 done: loss 0.0764 - lr: 0.000117
2023-10-13 06:12:13,772 DEV : loss 0.10229434072971344 - f1-score (micro avg) 0.7421
2023-10-13 06:12:13,819 saving best model
2023-10-13 06:12:16,515 ----------------------------------------------------------------------------------------------------
2023-10-13 06:13:11,781 epoch 4 - iter 198/1984 - loss 0.06257893 - time (sec): 55.26 - samples/sec: 301.67 - lr: 0.000115 - momentum: 0.000000
2023-10-13 06:14:06,955 epoch 4 - iter 396/1984 - loss 0.05453809 - time (sec): 110.44 - samples/sec: 294.81 - lr: 0.000113 - momentum: 0.000000
2023-10-13 06:15:02,500 epoch 4 - iter 594/1984 - loss 0.05487829 - time (sec): 165.98 - samples/sec: 304.37 - lr: 0.000112 - momentum: 0.000000
2023-10-13 06:15:57,389 epoch 4 - iter 792/1984 - loss 0.05252948 - time (sec): 220.87 - samples/sec: 301.83 - lr: 0.000110 - momentum: 0.000000
2023-10-13 06:16:50,636 epoch 4 - iter 990/1984 - loss 0.05421408 - time (sec): 274.12 - samples/sec: 303.57 - lr: 0.000108 - momentum: 0.000000
2023-10-13 06:17:42,895 epoch 4 - iter 1188/1984 - loss 0.05404910 - time (sec): 326.38 - samples/sec: 304.71 - lr: 0.000107 - momentum: 0.000000
2023-10-13 06:18:36,607 epoch 4 - iter 1386/1984 - loss 0.05254585 - time (sec): 380.09 - samples/sec: 304.61 - lr: 0.000105 - momentum: 0.000000
2023-10-13 06:19:34,329 epoch 4 - iter 1584/1984 - loss 0.05326865 - time (sec): 437.81 - samples/sec: 299.95 - lr: 0.000103 - momentum: 0.000000
2023-10-13 06:20:28,956 epoch 4 - iter 1782/1984 - loss 0.05356980 - time (sec): 492.44 - samples/sec: 300.72 - lr: 0.000102 - momentum: 0.000000
2023-10-13 06:21:26,190 epoch 4 - iter 1980/1984 - loss 0.05411207 - time (sec): 549.67 - samples/sec: 297.93 - lr: 0.000100 - momentum: 0.000000
2023-10-13 06:21:27,442 ----------------------------------------------------------------------------------------------------
2023-10-13 06:21:27,442 EPOCH 4 done: loss 0.0544 - lr: 0.000100
2023-10-13 06:21:56,062 DEV : loss 0.1296338140964508 - f1-score (micro avg) 0.7448
2023-10-13 06:21:56,106 saving best model
2023-10-13 06:22:00,166 ----------------------------------------------------------------------------------------------------
2023-10-13 06:22:57,064 epoch 5 - iter 198/1984 - loss 0.03401430 - time (sec): 56.89 - samples/sec: 285.27 - lr: 0.000098 - momentum: 0.000000
2023-10-13 06:23:49,847 epoch 5 - iter 396/1984 - loss 0.03367653 - time (sec): 109.68 - samples/sec: 287.03 - lr: 0.000097 - momentum: 0.000000
2023-10-13 06:24:44,133 epoch 5 - iter 594/1984 - loss 0.03772318 - time (sec): 163.96 - samples/sec: 294.26 - lr: 0.000095 - momentum: 0.000000
2023-10-13 06:25:37,666 epoch 5 - iter 792/1984 - loss 0.03702507 - time (sec): 217.49 - samples/sec: 296.65 - lr: 0.000093 - momentum: 0.000000
2023-10-13 06:26:37,327 epoch 5 - iter 990/1984 - loss 0.03647125 - time (sec): 277.16 - samples/sec: 298.28 - lr: 0.000092 - momentum: 0.000000
2023-10-13 06:27:29,315 epoch 5 - iter 1188/1984 - loss 0.03821053 - time (sec): 329.14 - samples/sec: 298.94 - lr: 0.000090 - momentum: 0.000000
2023-10-13 06:28:20,911 epoch 5 - iter 1386/1984 - loss 0.04020892 - time (sec): 380.74 - samples/sec: 300.13 - lr: 0.000088 - momentum: 0.000000
2023-10-13 06:29:15,349 epoch 5 - iter 1584/1984 - loss 0.04121393 - time (sec): 435.18 - samples/sec: 298.01 - lr: 0.000087 - momentum: 0.000000
2023-10-13 06:30:15,057 epoch 5 - iter 1782/1984 - loss 0.03983358 - time (sec): 494.89 - samples/sec: 295.16 - lr: 0.000085 - momentum: 0.000000
2023-10-13 06:31:16,160 epoch 5 - iter 1980/1984 - loss 0.04078366 - time (sec): 555.99 - samples/sec: 294.31 - lr: 0.000083 - momentum: 0.000000
2023-10-13 06:31:17,222 ----------------------------------------------------------------------------------------------------
2023-10-13 06:31:17,222 EPOCH 5 done: loss 0.0407 - lr: 0.000083
2023-10-13 06:31:42,184 DEV : loss 0.14384247362613678 - f1-score (micro avg) 0.7497
2023-10-13 06:31:42,224 saving best model
2023-10-13 06:31:44,772 ----------------------------------------------------------------------------------------------------
2023-10-13 06:32:37,857 epoch 6 - iter 198/1984 - loss 0.02600490 - time (sec): 53.08 - samples/sec: 290.45 - lr: 0.000082 - momentum: 0.000000
2023-10-13 06:33:33,217 epoch 6 - iter 396/1984 - loss 0.02863585 - time (sec): 108.44 - samples/sec: 290.63 - lr: 0.000080 - momentum: 0.000000
2023-10-13 06:34:28,450 epoch 6 - iter 594/1984 - loss 0.03111200 - time (sec): 163.67 - samples/sec: 291.52 - lr: 0.000078 - momentum: 0.000000
2023-10-13 06:35:21,690 epoch 6 - iter 792/1984 - loss 0.03154475 - time (sec): 216.91 - samples/sec: 297.11 - lr: 0.000077 - momentum: 0.000000
2023-10-13 06:36:14,761 epoch 6 - iter 990/1984 - loss 0.03052400 - time (sec): 269.98 - samples/sec: 302.74 - lr: 0.000075 - momentum: 0.000000
2023-10-13 06:37:09,857 epoch 6 - iter 1188/1984 - loss 0.02980984 - time (sec): 325.08 - samples/sec: 303.28 - lr: 0.000073 - momentum: 0.000000
2023-10-13 06:38:05,132 epoch 6 - iter 1386/1984 - loss 0.02858646 - time (sec): 380.36 - samples/sec: 301.79 - lr: 0.000072 - momentum: 0.000000
2023-10-13 06:38:57,131 epoch 6 - iter 1584/1984 - loss 0.02911257 - time (sec): 432.35 - samples/sec: 301.30 - lr: 0.000070 - momentum: 0.000000
2023-10-13 06:39:51,485 epoch 6 - iter 1782/1984 - loss 0.02938630 - time (sec): 486.71 - samples/sec: 302.81 - lr: 0.000068 - momentum: 0.000000
2023-10-13 06:40:47,211 epoch 6 - iter 1980/1984 - loss 0.02932146 - time (sec): 542.43 - samples/sec: 301.60 - lr: 0.000067 - momentum: 0.000000
2023-10-13 06:40:48,350 ----------------------------------------------------------------------------------------------------
2023-10-13 06:40:48,350 EPOCH 6 done: loss 0.0293 - lr: 0.000067
2023-10-13 06:41:17,254 DEV : loss 0.1786336749792099 - f1-score (micro avg) 0.7585
2023-10-13 06:41:17,296 saving best model
2023-10-13 06:41:18,383 ----------------------------------------------------------------------------------------------------
2023-10-13 06:42:15,602 epoch 7 - iter 198/1984 - loss 0.01565821 - time (sec): 57.22 - samples/sec: 273.44 - lr: 0.000065 - momentum: 0.000000
2023-10-13 06:43:13,372 epoch 7 - iter 396/1984 - loss 0.02160073 - time (sec): 114.99 - samples/sec: 275.78 - lr: 0.000063 - momentum: 0.000000
2023-10-13 06:44:11,042 epoch 7 - iter 594/1984 - loss 0.02297071 - time (sec): 172.66 - samples/sec: 279.79 - lr: 0.000062 - momentum: 0.000000
2023-10-13 06:45:06,994 epoch 7 - iter 792/1984 - loss 0.02194959 - time (sec): 228.61 - samples/sec: 281.23 - lr: 0.000060 - momentum: 0.000000
2023-10-13 06:46:02,561 epoch 7 - iter 990/1984 - loss 0.02145332 - time (sec): 284.18 - samples/sec: 283.36 - lr: 0.000058 - momentum: 0.000000
2023-10-13 06:46:55,114 epoch 7 - iter 1188/1984 - loss 0.02157394 - time (sec): 336.73 - samples/sec: 288.44 - lr: 0.000057 - momentum: 0.000000
2023-10-13 06:47:45,302 epoch 7 - iter 1386/1984 - loss 0.02232190 - time (sec): 386.92 - samples/sec: 294.58 - lr: 0.000055 - momentum: 0.000000
2023-10-13 06:48:36,186 epoch 7 - iter 1584/1984 - loss 0.02126233 - time (sec): 437.80 - samples/sec: 297.31 - lr: 0.000053 - momentum: 0.000000
2023-10-13 06:49:30,550 epoch 7 - iter 1782/1984 - loss 0.02124752 - time (sec): 492.16 - samples/sec: 296.78 - lr: 0.000052 - momentum: 0.000000
2023-10-13 06:50:24,014 epoch 7 - iter 1980/1984 - loss 0.02216943 - time (sec): 545.63 - samples/sec: 299.99 - lr: 0.000050 - momentum: 0.000000
2023-10-13 06:50:25,017 ----------------------------------------------------------------------------------------------------
2023-10-13 06:50:25,018 EPOCH 7 done: loss 0.0221 - lr: 0.000050
2023-10-13 06:50:50,990 DEV : loss 0.19668884575366974 - f1-score (micro avg) 0.7557
2023-10-13 06:50:51,030 ----------------------------------------------------------------------------------------------------
2023-10-13 06:51:42,700 epoch 8 - iter 198/1984 - loss 0.00771193 - time (sec): 51.67 - samples/sec: 307.77 - lr: 0.000048 - momentum: 0.000000
2023-10-13 06:52:33,732 epoch 8 - iter 396/1984 - loss 0.01096548 - time (sec): 102.70 - samples/sec: 311.86 - lr: 0.000047 - momentum: 0.000000
2023-10-13 06:53:25,866 epoch 8 - iter 594/1984 - loss 0.01124620 - time (sec): 154.83 - samples/sec: 306.71 - lr: 0.000045 - momentum: 0.000000
2023-10-13 06:54:21,829 epoch 8 - iter 792/1984 - loss 0.01189251 - time (sec): 210.80 - samples/sec: 303.37 - lr: 0.000043 - momentum: 0.000000
2023-10-13 06:55:16,537 epoch 8 - iter 990/1984 - loss 0.01234024 - time (sec): 265.50 - samples/sec: 303.81 - lr: 0.000042 - momentum: 0.000000
2023-10-13 06:56:09,844 epoch 8 - iter 1188/1984 - loss 0.01263419 - time (sec): 318.81 - samples/sec: 305.71 - lr: 0.000040 - momentum: 0.000000
2023-10-13 06:57:04,037 epoch 8 - iter 1386/1984 - loss 0.01265776 - time (sec): 373.00 - samples/sec: 303.96 - lr: 0.000038 - momentum: 0.000000
2023-10-13 06:57:55,226 epoch 8 - iter 1584/1984 - loss 0.01301713 - time (sec): 424.19 - samples/sec: 306.78 - lr: 0.000037 - momentum: 0.000000
2023-10-13 06:58:47,190 epoch 8 - iter 1782/1984 - loss 0.01413211 - time (sec): 476.16 - samples/sec: 309.92 - lr: 0.000035 - momentum: 0.000000
2023-10-13 06:59:38,513 epoch 8 - iter 1980/1984 - loss 0.01490286 - time (sec): 527.48 - samples/sec: 310.17 - lr: 0.000033 - momentum: 0.000000
2023-10-13 06:59:39,610 ----------------------------------------------------------------------------------------------------
2023-10-13 06:59:39,610 EPOCH 8 done: loss 0.0149 - lr: 0.000033
2023-10-13 07:00:04,753 DEV : loss 0.2151404768228531 - f1-score (micro avg) 0.7413
2023-10-13 07:00:04,794 ----------------------------------------------------------------------------------------------------
2023-10-13 07:00:55,949 epoch 9 - iter 198/1984 - loss 0.00807895 - time (sec): 51.15 - samples/sec: 325.86 - lr: 0.000032 - momentum: 0.000000
2023-10-13 07:01:51,407 epoch 9 - iter 396/1984 - loss 0.01077008 - time (sec): 106.61 - samples/sec: 318.81 - lr: 0.000030 - momentum: 0.000000
2023-10-13 07:02:43,202 epoch 9 - iter 594/1984 - loss 0.01143847 - time (sec): 158.41 - samples/sec: 316.85 - lr: 0.000028 - momentum: 0.000000
2023-10-13 07:03:35,088 epoch 9 - iter 792/1984 - loss 0.01163397 - time (sec): 210.29 - samples/sec: 315.64 - lr: 0.000027 - momentum: 0.000000
2023-10-13 07:04:27,415 epoch 9 - iter 990/1984 - loss 0.01100476 - time (sec): 262.62 - samples/sec: 314.62 - lr: 0.000025 - momentum: 0.000000
2023-10-13 07:05:20,750 epoch 9 - iter 1188/1984 - loss 0.01112938 - time (sec): 315.95 - samples/sec: 308.28 - lr: 0.000023 - momentum: 0.000000
2023-10-13 07:06:13,433 epoch 9 - iter 1386/1984 - loss 0.01138202 - time (sec): 368.64 - samples/sec: 307.01 - lr: 0.000022 - momentum: 0.000000
2023-10-13 07:07:07,062 epoch 9 - iter 1584/1984 - loss 0.01089230 - time (sec): 422.27 - samples/sec: 307.08 - lr: 0.000020 - momentum: 0.000000
2023-10-13 07:08:02,575 epoch 9 - iter 1782/1984 - loss 0.01217880 - time (sec): 477.78 - samples/sec: 306.61 - lr: 0.000018 - momentum: 0.000000
2023-10-13 07:08:58,446 epoch 9 - iter 1980/1984 - loss 0.01165758 - time (sec): 533.65 - samples/sec: 306.75 - lr: 0.000017 - momentum: 0.000000
2023-10-13 07:08:59,508 ----------------------------------------------------------------------------------------------------
2023-10-13 07:08:59,508 EPOCH 9 done: loss 0.0117 - lr: 0.000017
2023-10-13 07:09:24,667 DEV : loss 0.22990703582763672 - f1-score (micro avg) 0.7597
2023-10-13 07:09:24,711 saving best model
2023-10-13 07:09:27,881 ----------------------------------------------------------------------------------------------------
2023-10-13 07:10:20,160 epoch 10 - iter 198/1984 - loss 0.00796034 - time (sec): 52.27 - samples/sec: 315.72 - lr: 0.000015 - momentum: 0.000000
2023-10-13 07:11:12,693 epoch 10 - iter 396/1984 - loss 0.00819348 - time (sec): 104.81 - samples/sec: 314.94 - lr: 0.000013 - momentum: 0.000000
2023-10-13 07:12:06,325 epoch 10 - iter 594/1984 - loss 0.00998088 - time (sec): 158.44 - samples/sec: 311.36 - lr: 0.000012 - momentum: 0.000000
2023-10-13 07:12:58,511 epoch 10 - iter 792/1984 - loss 0.00872009 - time (sec): 210.63 - samples/sec: 315.36 - lr: 0.000010 - momentum: 0.000000
2023-10-13 07:13:49,092 epoch 10 - iter 990/1984 - loss 0.00830259 - time (sec): 261.21 - samples/sec: 315.50 - lr: 0.000008 - momentum: 0.000000
2023-10-13 07:14:40,128 epoch 10 - iter 1188/1984 - loss 0.00852866 - time (sec): 312.24 - samples/sec: 315.34 - lr: 0.000007 - momentum: 0.000000
2023-10-13 07:15:33,304 epoch 10 - iter 1386/1984 - loss 0.00801943 - time (sec): 365.42 - samples/sec: 316.72 - lr: 0.000005 - momentum: 0.000000
2023-10-13 07:16:28,514 epoch 10 - iter 1584/1984 - loss 0.00810705 - time (sec): 420.63 - samples/sec: 314.79 - lr: 0.000003 - momentum: 0.000000
2023-10-13 07:17:20,494 epoch 10 - iter 1782/1984 - loss 0.00802002 - time (sec): 472.61 - samples/sec: 312.72 - lr: 0.000002 - momentum: 0.000000
2023-10-13 07:18:10,876 epoch 10 - iter 1980/1984 - loss 0.00791535 - time (sec): 522.99 - samples/sec: 312.85 - lr: 0.000000 - momentum: 0.000000
2023-10-13 07:18:11,919 ----------------------------------------------------------------------------------------------------
2023-10-13 07:18:11,920 EPOCH 10 done: loss 0.0079 - lr: 0.000000
2023-10-13 07:18:36,515 DEV : loss 0.23257124423980713 - f1-score (micro avg) 0.7575
2023-10-13 07:18:37,476 ----------------------------------------------------------------------------------------------------
2023-10-13 07:18:37,478 Loading model from best epoch ...
2023-10-13 07:18:41,683 SequenceTagger predicts: Dictionary with 13 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-13 07:19:08,466
Results:
- F-score (micro) 0.7605
- F-score (macro) 0.6686
- Accuracy 0.6421
By class:
              precision    recall  f1-score   support

         LOC     0.8172    0.8397    0.8283       655
         PER     0.6693    0.7713    0.7167       223
         ORG     0.5146    0.4173    0.4609       127

   micro avg     0.7502    0.7711    0.7605      1005
   macro avg     0.6670    0.6761    0.6686      1005
weighted avg     0.7462    0.7711    0.7571      1005
2023-10-13 07:19:08,466 ----------------------------------------------------------------------------------------------------