2023-10-24 17:53:07,606 ----------------------------------------------------------------------------------------------------
2023-10-24 17:53:07,607 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(64001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0): BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (1): BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (2): BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (3): BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (4): BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (5): BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (6): BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (7): BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (8): BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (9): BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (10): BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (11): BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-24 17:53:07,607 ----------------------------------------------------------------------------------------------------
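[Editor's note] The module dump above pins down the configuration: a dbmdz/bert-base-historic-multilingual-64k-td-cased encoder (64k vocabulary, 12 layers, hidden size 768), word embeddings taken from the first subtoken of the last layer only, locked dropout 0.5, and a plain 768 -> 13 linear projection with cross-entropy loss, i.e. no CRF. A minimal Flair sketch that would build the same module graph follows; the constructor arguments are inferred from the dump and from the base-path naming (poolingfirst, layers-1, crfFalse), so treat this as an assumption rather than the exact training script. label_dict is the 13-tag NER dictionary built from the corpus (see the loading sketch below).

from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger

# Historic multilingual BERT with a 64k vocabulary; only the last
# layer is used, and word vectors come from the first subtoken.
embeddings = TransformerWordEmbeddings(
    model="dbmdz/bert-base-historic-multilingual-64k-td-cased",
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
)

# No RNN and no CRF: just a linear layer on top of the embeddings,
# matching the (linear) and (loss_function) entries in the dump.
tagger = SequenceTagger(
    embeddings=embeddings,
    tag_dictionary=label_dict,  # assumed: built from the corpus below
    tag_type="ner",
    use_rnn=False,
    use_crf=False,
    reproject_embeddings=False,
)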
2023-10-24 17:53:07,607 MultiCorpus: 7936 train + 992 dev + 992 test sentences
- NER_ICDAR_EUROPEANA Corpus: 7936 train + 992 dev + 992 test sentences - /home/ubuntu/.flair/datasets/ner_icdar_europeana/fr
2023-10-24 17:53:07,607 ----------------------------------------------------------------------------------------------------
2023-10-24 17:53:07,607 Train: 7936 sentences
2023-10-24 17:53:07,607 (train_with_dev=False, train_with_test=False)
2023-10-24 17:53:07,607 ----------------------------------------------------------------------------------------------------
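[Editor's note] The corpus logged above ships with Flair; a sketch of the loading step (the language argument is assumed from the dataset path ending in /fr):

from flair.datasets import NER_ICDAR_EUROPEANA

# French split of the ICDAR-Europeana NER corpus:
# 7,936 train / 992 dev / 992 test sentences.
corpus = NER_ICDAR_EUROPEANA(language="fr")

# 13-tag BIOES dictionary over PER, LOC and ORG (plus O).
label_dict = corpus.make_label_dictionary(label_type="ner")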
2023-10-24 17:53:07,607 Training Params:
2023-10-24 17:53:07,607 - learning_rate: "5e-05"
2023-10-24 17:53:07,607 - mini_batch_size: "8"
2023-10-24 17:53:07,607 - max_epochs: "10"
2023-10-24 17:53:07,607 - shuffle: "True"
2023-10-24 17:53:07,607 ----------------------------------------------------------------------------------------------------
2023-10-24 17:53:07,607 Plugins:
2023-10-24 17:53:07,607 - TensorboardLogger
2023-10-24 17:53:07,607 - LinearScheduler | warmup_fraction: '0.1'
2023-10-24 17:53:07,607 ----------------------------------------------------------------------------------------------------
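[Editor's note] These parameters correspond to a Flair fine-tuning run; a hedged sketch of the training call is below. Flair's fine_tune defaults to AdamW (hence momentum logged as 0.0) with a linear schedule and a 0.1 warmup fraction, matching the LinearScheduler plugin above; how the TensorboardLogger plugin is attached depends on the Flair version, so it is omitted here.

from flair.trainers import ModelTrainer

trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    # Base path exactly as logged further below.
    "hmbench-icdar/fr-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs8-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-3",
    learning_rate=5e-05,
    mini_batch_size=8,
    max_epochs=10,
    # Model selection uses ('micro avg', 'f1-score') on the dev set,
    # which is also the default main evaluation metric.
)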
2023-10-24 17:53:07,607 Final evaluation on model from best epoch (best-model.pt)
2023-10-24 17:53:07,608 - metric: "('micro avg', 'f1-score')"
2023-10-24 17:53:07,608 ----------------------------------------------------------------------------------------------------
2023-10-24 17:53:07,608 Computation:
2023-10-24 17:53:07,608 - compute on device: cuda:0
2023-10-24 17:53:07,608 - embedding storage: none
2023-10-24 17:53:07,608 ----------------------------------------------------------------------------------------------------
2023-10-24 17:53:07,608 Model training base path: "hmbench-icdar/fr-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs8-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-3"
2023-10-24 17:53:07,608 ----------------------------------------------------------------------------------------------------
2023-10-24 17:53:07,608 ----------------------------------------------------------------------------------------------------
2023-10-24 17:53:07,608 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-24 17:53:16,110 epoch 1 - iter 99/992 - loss 1.45019353 - time (sec): 8.50 - samples/sec: 2052.11 - lr: 0.000005 - momentum: 0.000000
2023-10-24 17:53:24,479 epoch 1 - iter 198/992 - loss 0.90651188 - time (sec): 16.87 - samples/sec: 1995.74 - lr: 0.000010 - momentum: 0.000000
2023-10-24 17:53:32,526 epoch 1 - iter 297/992 - loss 0.68242155 - time (sec): 24.92 - samples/sec: 1970.50 - lr: 0.000015 - momentum: 0.000000
2023-10-24 17:53:40,903 epoch 1 - iter 396/992 - loss 0.55232472 - time (sec): 33.29 - samples/sec: 1971.25 - lr: 0.000020 - momentum: 0.000000
2023-10-24 17:53:49,010 epoch 1 - iter 495/992 - loss 0.47511537 - time (sec): 41.40 - samples/sec: 1964.36 - lr: 0.000025 - momentum: 0.000000
2023-10-24 17:53:57,155 epoch 1 - iter 594/992 - loss 0.42026811 - time (sec): 49.55 - samples/sec: 1960.16 - lr: 0.000030 - momentum: 0.000000
2023-10-24 17:54:05,777 epoch 1 - iter 693/992 - loss 0.37516782 - time (sec): 58.17 - samples/sec: 1957.77 - lr: 0.000035 - momentum: 0.000000
2023-10-24 17:54:14,242 epoch 1 - iter 792/992 - loss 0.34277188 - time (sec): 66.63 - samples/sec: 1955.19 - lr: 0.000040 - momentum: 0.000000
2023-10-24 17:54:22,644 epoch 1 - iter 891/992 - loss 0.32033942 - time (sec): 75.04 - samples/sec: 1963.07 - lr: 0.000045 - momentum: 0.000000
2023-10-24 17:54:31,054 epoch 1 - iter 990/992 - loss 0.30228680 - time (sec): 83.45 - samples/sec: 1960.73 - lr: 0.000050 - momentum: 0.000000
2023-10-24 17:54:31,234 ----------------------------------------------------------------------------------------------------
2023-10-24 17:54:31,235 EPOCH 1 done: loss 0.3019 - lr: 0.000050
2023-10-24 17:54:34,306 DEV : loss 0.08691307157278061 - f1-score (micro avg) 0.7201
2023-10-24 17:54:34,321 saving best model
2023-10-24 17:54:34,791 ----------------------------------------------------------------------------------------------------
2023-10-24 17:54:42,930 epoch 2 - iter 99/992 - loss 0.09527419 - time (sec): 8.14 - samples/sec: 2003.53 - lr: 0.000049 - momentum: 0.000000
2023-10-24 17:54:51,239 epoch 2 - iter 198/992 - loss 0.09556154 - time (sec): 16.45 - samples/sec: 1975.24 - lr: 0.000049 - momentum: 0.000000
2023-10-24 17:54:59,412 epoch 2 - iter 297/992 - loss 0.09905757 - time (sec): 24.62 - samples/sec: 1984.65 - lr: 0.000048 - momentum: 0.000000
2023-10-24 17:55:07,962 epoch 2 - iter 396/992 - loss 0.10241445 - time (sec): 33.17 - samples/sec: 1977.76 - lr: 0.000048 - momentum: 0.000000
2023-10-24 17:55:16,323 epoch 2 - iter 495/992 - loss 0.10157096 - time (sec): 41.53 - samples/sec: 1984.19 - lr: 0.000047 - momentum: 0.000000
2023-10-24 17:55:24,569 epoch 2 - iter 594/992 - loss 0.10195348 - time (sec): 49.78 - samples/sec: 1983.29 - lr: 0.000047 - momentum: 0.000000
2023-10-24 17:55:33,030 epoch 2 - iter 693/992 - loss 0.10065037 - time (sec): 58.24 - samples/sec: 1983.87 - lr: 0.000046 - momentum: 0.000000
2023-10-24 17:55:41,369 epoch 2 - iter 792/992 - loss 0.09922714 - time (sec): 66.58 - samples/sec: 1970.91 - lr: 0.000046 - momentum: 0.000000
2023-10-24 17:55:49,709 epoch 2 - iter 891/992 - loss 0.09994013 - time (sec): 74.92 - samples/sec: 1964.70 - lr: 0.000045 - momentum: 0.000000
2023-10-24 17:55:58,196 epoch 2 - iter 990/992 - loss 0.10114388 - time (sec): 83.40 - samples/sec: 1963.04 - lr: 0.000044 - momentum: 0.000000
2023-10-24 17:55:58,341 ----------------------------------------------------------------------------------------------------
2023-10-24 17:55:58,341 EPOCH 2 done: loss 0.1011 - lr: 0.000044
2023-10-24 17:56:01,444 DEV : loss 0.09098362177610397 - f1-score (micro avg) 0.743
2023-10-24 17:56:01,459 saving best model
2023-10-24 17:56:02,049 ----------------------------------------------------------------------------------------------------
2023-10-24 17:56:10,259 epoch 3 - iter 99/992 - loss 0.06165896 - time (sec): 8.21 - samples/sec: 1971.22 - lr: 0.000044 - momentum: 0.000000
2023-10-24 17:56:18,758 epoch 3 - iter 198/992 - loss 0.06593462 - time (sec): 16.71 - samples/sec: 1971.74 - lr: 0.000043 - momentum: 0.000000
2023-10-24 17:56:27,234 epoch 3 - iter 297/992 - loss 0.07089123 - time (sec): 25.18 - samples/sec: 1940.51 - lr: 0.000043 - momentum: 0.000000
2023-10-24 17:56:35,331 epoch 3 - iter 396/992 - loss 0.06905276 - time (sec): 33.28 - samples/sec: 1946.67 - lr: 0.000042 - momentum: 0.000000
2023-10-24 17:56:44,066 epoch 3 - iter 495/992 - loss 0.06665018 - time (sec): 42.02 - samples/sec: 1961.30 - lr: 0.000042 - momentum: 0.000000
2023-10-24 17:56:52,473 epoch 3 - iter 594/992 - loss 0.06883315 - time (sec): 50.42 - samples/sec: 1959.12 - lr: 0.000041 - momentum: 0.000000
2023-10-24 17:57:00,614 epoch 3 - iter 693/992 - loss 0.06972989 - time (sec): 58.56 - samples/sec: 1959.41 - lr: 0.000041 - momentum: 0.000000
2023-10-24 17:57:09,003 epoch 3 - iter 792/992 - loss 0.06920701 - time (sec): 66.95 - samples/sec: 1961.79 - lr: 0.000040 - momentum: 0.000000
2023-10-24 17:57:17,427 epoch 3 - iter 891/992 - loss 0.06875715 - time (sec): 75.38 - samples/sec: 1961.18 - lr: 0.000039 - momentum: 0.000000
2023-10-24 17:57:25,554 epoch 3 - iter 990/992 - loss 0.06866266 - time (sec): 83.50 - samples/sec: 1960.90 - lr: 0.000039 - momentum: 0.000000
2023-10-24 17:57:25,711 ----------------------------------------------------------------------------------------------------
2023-10-24 17:57:25,711 EPOCH 3 done: loss 0.0686 - lr: 0.000039
2023-10-24 17:57:28,825 DEV : loss 0.10797995328903198 - f1-score (micro avg) 0.7225
2023-10-24 17:57:28,840 ----------------------------------------------------------------------------------------------------
2023-10-24 17:57:36,940 epoch 4 - iter 99/992 - loss 0.04262884 - time (sec): 8.10 - samples/sec: 1951.86 - lr: 0.000038 - momentum: 0.000000
2023-10-24 17:57:45,681 epoch 4 - iter 198/992 - loss 0.04995344 - time (sec): 16.84 - samples/sec: 1947.00 - lr: 0.000038 - momentum: 0.000000
2023-10-24 17:57:54,126 epoch 4 - iter 297/992 - loss 0.04925014 - time (sec): 25.28 - samples/sec: 1947.25 - lr: 0.000037 - momentum: 0.000000
2023-10-24 17:58:02,267 epoch 4 - iter 396/992 - loss 0.05068279 - time (sec): 33.43 - samples/sec: 1948.76 - lr: 0.000037 - momentum: 0.000000
2023-10-24 17:58:10,475 epoch 4 - iter 495/992 - loss 0.05043456 - time (sec): 41.63 - samples/sec: 1960.24 - lr: 0.000036 - momentum: 0.000000
2023-10-24 17:58:18,115 epoch 4 - iter 594/992 - loss 0.04948632 - time (sec): 49.27 - samples/sec: 1955.28 - lr: 0.000036 - momentum: 0.000000
2023-10-24 17:58:26,645 epoch 4 - iter 693/992 - loss 0.05029241 - time (sec): 57.80 - samples/sec: 1963.09 - lr: 0.000035 - momentum: 0.000000
2023-10-24 17:58:35,036 epoch 4 - iter 792/992 - loss 0.05054238 - time (sec): 66.19 - samples/sec: 1959.75 - lr: 0.000034 - momentum: 0.000000
2023-10-24 17:58:43,142 epoch 4 - iter 891/992 - loss 0.04973429 - time (sec): 74.30 - samples/sec: 1968.56 - lr: 0.000034 - momentum: 0.000000
2023-10-24 17:58:52,072 epoch 4 - iter 990/992 - loss 0.04918421 - time (sec): 83.23 - samples/sec: 1966.28 - lr: 0.000033 - momentum: 0.000000
2023-10-24 17:58:52,222 ----------------------------------------------------------------------------------------------------
2023-10-24 17:58:52,222 EPOCH 4 done: loss 0.0491 - lr: 0.000033
2023-10-24 17:58:55,339 DEV : loss 0.16018341481685638 - f1-score (micro avg) 0.7368
2023-10-24 17:58:55,354 ----------------------------------------------------------------------------------------------------
2023-10-24 17:59:03,855 epoch 5 - iter 99/992 - loss 0.03275354 - time (sec): 8.50 - samples/sec: 1998.54 - lr: 0.000033 - momentum: 0.000000
2023-10-24 17:59:12,128 epoch 5 - iter 198/992 - loss 0.03475824 - time (sec): 16.77 - samples/sec: 1968.33 - lr: 0.000032 - momentum: 0.000000
2023-10-24 17:59:20,968 epoch 5 - iter 297/992 - loss 0.03421394 - time (sec): 25.61 - samples/sec: 1935.93 - lr: 0.000032 - momentum: 0.000000
2023-10-24 17:59:29,192 epoch 5 - iter 396/992 - loss 0.03443327 - time (sec): 33.84 - samples/sec: 1928.20 - lr: 0.000031 - momentum: 0.000000
2023-10-24 17:59:37,436 epoch 5 - iter 495/992 - loss 0.03822480 - time (sec): 42.08 - samples/sec: 1946.88 - lr: 0.000031 - momentum: 0.000000
2023-10-24 17:59:45,453 epoch 5 - iter 594/992 - loss 0.03666575 - time (sec): 50.10 - samples/sec: 1952.70 - lr: 0.000030 - momentum: 0.000000
2023-10-24 17:59:54,159 epoch 5 - iter 693/992 - loss 0.03753706 - time (sec): 58.80 - samples/sec: 1951.41 - lr: 0.000029 - momentum: 0.000000
2023-10-24 18:00:02,511 epoch 5 - iter 792/992 - loss 0.03848741 - time (sec): 67.16 - samples/sec: 1952.21 - lr: 0.000029 - momentum: 0.000000
2023-10-24 18:00:10,602 epoch 5 - iter 891/992 - loss 0.03850444 - time (sec): 75.25 - samples/sec: 1953.08 - lr: 0.000028 - momentum: 0.000000
2023-10-24 18:00:19,094 epoch 5 - iter 990/992 - loss 0.03746891 - time (sec): 83.74 - samples/sec: 1954.15 - lr: 0.000028 - momentum: 0.000000
2023-10-24 18:00:19,260 ----------------------------------------------------------------------------------------------------
2023-10-24 18:00:19,260 EPOCH 5 done: loss 0.0374 - lr: 0.000028
2023-10-24 18:00:22,383 DEV : loss 0.17979347705841064 - f1-score (micro avg) 0.7377
2023-10-24 18:00:22,399 ----------------------------------------------------------------------------------------------------
2023-10-24 18:00:30,989 epoch 6 - iter 99/992 - loss 0.02323895 - time (sec): 8.59 - samples/sec: 1890.29 - lr: 0.000027 - momentum: 0.000000
2023-10-24 18:00:39,421 epoch 6 - iter 198/992 - loss 0.02299469 - time (sec): 17.02 - samples/sec: 1940.04 - lr: 0.000027 - momentum: 0.000000
2023-10-24 18:00:47,699 epoch 6 - iter 297/992 - loss 0.02440000 - time (sec): 25.30 - samples/sec: 1959.56 - lr: 0.000026 - momentum: 0.000000
2023-10-24 18:00:55,811 epoch 6 - iter 396/992 - loss 0.02517086 - time (sec): 33.41 - samples/sec: 1970.95 - lr: 0.000026 - momentum: 0.000000
2023-10-24 18:01:04,347 epoch 6 - iter 495/992 - loss 0.02699649 - time (sec): 41.95 - samples/sec: 1970.74 - lr: 0.000025 - momentum: 0.000000
2023-10-24 18:01:12,638 epoch 6 - iter 594/992 - loss 0.02691355 - time (sec): 50.24 - samples/sec: 1963.22 - lr: 0.000024 - momentum: 0.000000
2023-10-24 18:01:20,840 epoch 6 - iter 693/992 - loss 0.02767072 - time (sec): 58.44 - samples/sec: 1957.76 - lr: 0.000024 - momentum: 0.000000
2023-10-24 18:01:29,206 epoch 6 - iter 792/992 - loss 0.02671390 - time (sec): 66.81 - samples/sec: 1956.90 - lr: 0.000023 - momentum: 0.000000
2023-10-24 18:01:37,458 epoch 6 - iter 891/992 - loss 0.02828160 - time (sec): 75.06 - samples/sec: 1948.57 - lr: 0.000023 - momentum: 0.000000
2023-10-24 18:01:45,697 epoch 6 - iter 990/992 - loss 0.02810958 - time (sec): 83.30 - samples/sec: 1965.05 - lr: 0.000022 - momentum: 0.000000
2023-10-24 18:01:45,857 ----------------------------------------------------------------------------------------------------
2023-10-24 18:01:45,857 EPOCH 6 done: loss 0.0281 - lr: 0.000022
2023-10-24 18:01:48,981 DEV : loss 0.18152180314064026 - f1-score (micro avg) 0.7691
2023-10-24 18:01:48,996 saving best model
2023-10-24 18:01:49,629 ----------------------------------------------------------------------------------------------------
2023-10-24 18:01:58,369 epoch 7 - iter 99/992 - loss 0.02452345 - time (sec): 8.74 - samples/sec: 1920.30 - lr: 0.000022 - momentum: 0.000000
2023-10-24 18:02:06,462 epoch 7 - iter 198/992 - loss 0.02583170 - time (sec): 16.83 - samples/sec: 1928.62 - lr: 0.000021 - momentum: 0.000000
2023-10-24 18:02:15,105 epoch 7 - iter 297/992 - loss 0.02293195 - time (sec): 25.48 - samples/sec: 1913.00 - lr: 0.000021 - momentum: 0.000000
2023-10-24 18:02:23,523 epoch 7 - iter 396/992 - loss 0.01981722 - time (sec): 33.89 - samples/sec: 1901.12 - lr: 0.000020 - momentum: 0.000000
2023-10-24 18:02:31,697 epoch 7 - iter 495/992 - loss 0.01985616 - time (sec): 42.07 - samples/sec: 1910.50 - lr: 0.000019 - momentum: 0.000000
2023-10-24 18:02:40,392 epoch 7 - iter 594/992 - loss 0.01977100 - time (sec): 50.76 - samples/sec: 1926.84 - lr: 0.000019 - momentum: 0.000000
2023-10-24 18:02:48,921 epoch 7 - iter 693/992 - loss 0.02041123 - time (sec): 59.29 - samples/sec: 1935.23 - lr: 0.000018 - momentum: 0.000000
2023-10-24 18:02:57,131 epoch 7 - iter 792/992 - loss 0.02070652 - time (sec): 67.50 - samples/sec: 1941.00 - lr: 0.000018 - momentum: 0.000000
2023-10-24 18:03:05,223 epoch 7 - iter 891/992 - loss 0.02081829 - time (sec): 75.59 - samples/sec: 1949.40 - lr: 0.000017 - momentum: 0.000000
2023-10-24 18:03:13,362 epoch 7 - iter 990/992 - loss 0.02144692 - time (sec): 83.73 - samples/sec: 1952.78 - lr: 0.000017 - momentum: 0.000000
2023-10-24 18:03:13,536 ----------------------------------------------------------------------------------------------------
2023-10-24 18:03:13,536 EPOCH 7 done: loss 0.0214 - lr: 0.000017
2023-10-24 18:03:16,649 DEV : loss 0.18771061301231384 - f1-score (micro avg) 0.7667
2023-10-24 18:03:16,664 ----------------------------------------------------------------------------------------------------
2023-10-24 18:03:25,249 epoch 8 - iter 99/992 - loss 0.01897961 - time (sec): 8.58 - samples/sec: 2021.42 - lr: 0.000016 - momentum: 0.000000
2023-10-24 18:03:33,938 epoch 8 - iter 198/992 - loss 0.01571900 - time (sec): 17.27 - samples/sec: 1977.75 - lr: 0.000016 - momentum: 0.000000
2023-10-24 18:03:42,080 epoch 8 - iter 297/992 - loss 0.01464608 - time (sec): 25.41 - samples/sec: 1955.74 - lr: 0.000015 - momentum: 0.000000
2023-10-24 18:03:50,487 epoch 8 - iter 396/992 - loss 0.01451842 - time (sec): 33.82 - samples/sec: 1945.87 - lr: 0.000014 - momentum: 0.000000
2023-10-24 18:03:58,546 epoch 8 - iter 495/992 - loss 0.01464158 - time (sec): 41.88 - samples/sec: 1950.62 - lr: 0.000014 - momentum: 0.000000
2023-10-24 18:04:07,019 epoch 8 - iter 594/992 - loss 0.01494271 - time (sec): 50.35 - samples/sec: 1962.71 - lr: 0.000013 - momentum: 0.000000
2023-10-24 18:04:15,324 epoch 8 - iter 693/992 - loss 0.01427937 - time (sec): 58.66 - samples/sec: 1965.49 - lr: 0.000013 - momentum: 0.000000
2023-10-24 18:04:23,141 epoch 8 - iter 792/992 - loss 0.01444238 - time (sec): 66.48 - samples/sec: 1961.96 - lr: 0.000012 - momentum: 0.000000
2023-10-24 18:04:31,590 epoch 8 - iter 891/992 - loss 0.01470428 - time (sec): 74.93 - samples/sec: 1960.87 - lr: 0.000012 - momentum: 0.000000
2023-10-24 18:04:39,955 epoch 8 - iter 990/992 - loss 0.01468421 - time (sec): 83.29 - samples/sec: 1964.56 - lr: 0.000011 - momentum: 0.000000
2023-10-24 18:04:40,104 ----------------------------------------------------------------------------------------------------
2023-10-24 18:04:40,104 EPOCH 8 done: loss 0.0147 - lr: 0.000011
2023-10-24 18:04:43,222 DEV : loss 0.2224731296300888 - f1-score (micro avg) 0.7444
2023-10-24 18:04:43,237 ----------------------------------------------------------------------------------------------------
2023-10-24 18:04:51,724 epoch 9 - iter 99/992 - loss 0.01508641 - time (sec): 8.49 - samples/sec: 1869.36 - lr: 0.000011 - momentum: 0.000000
2023-10-24 18:04:59,936 epoch 9 - iter 198/992 - loss 0.01126348 - time (sec): 16.70 - samples/sec: 1893.49 - lr: 0.000010 - momentum: 0.000000
2023-10-24 18:05:08,057 epoch 9 - iter 297/992 - loss 0.01011817 - time (sec): 24.82 - samples/sec: 1903.84 - lr: 0.000009 - momentum: 0.000000
2023-10-24 18:05:17,194 epoch 9 - iter 396/992 - loss 0.01014998 - time (sec): 33.96 - samples/sec: 1904.20 - lr: 0.000009 - momentum: 0.000000
2023-10-24 18:05:25,870 epoch 9 - iter 495/992 - loss 0.00901112 - time (sec): 42.63 - samples/sec: 1918.22 - lr: 0.000008 - momentum: 0.000000
2023-10-24 18:05:34,447 epoch 9 - iter 594/992 - loss 0.00957614 - time (sec): 51.21 - samples/sec: 1921.71 - lr: 0.000008 - momentum: 0.000000
2023-10-24 18:05:42,470 epoch 9 - iter 693/992 - loss 0.00985237 - time (sec): 59.23 - samples/sec: 1932.18 - lr: 0.000007 - momentum: 0.000000
2023-10-24 18:05:50,719 epoch 9 - iter 792/992 - loss 0.00966421 - time (sec): 67.48 - samples/sec: 1935.77 - lr: 0.000007 - momentum: 0.000000
2023-10-24 18:05:58,741 epoch 9 - iter 891/992 - loss 0.00952795 - time (sec): 75.50 - samples/sec: 1944.93 - lr: 0.000006 - momentum: 0.000000
2023-10-24 18:06:06,945 epoch 9 - iter 990/992 - loss 0.00951350 - time (sec): 83.71 - samples/sec: 1955.66 - lr: 0.000006 - momentum: 0.000000
2023-10-24 18:06:07,091 ----------------------------------------------------------------------------------------------------
2023-10-24 18:06:07,092 EPOCH 9 done: loss 0.0095 - lr: 0.000006
2023-10-24 18:06:10,221 DEV : loss 0.2356439083814621 - f1-score (micro avg) 0.7551
2023-10-24 18:06:10,236 ----------------------------------------------------------------------------------------------------
2023-10-24 18:06:18,255 epoch 10 - iter 99/992 - loss 0.00471428 - time (sec): 8.02 - samples/sec: 2021.65 - lr: 0.000005 - momentum: 0.000000
2023-10-24 18:06:26,499 epoch 10 - iter 198/992 - loss 0.00492676 - time (sec): 16.26 - samples/sec: 1988.79 - lr: 0.000004 - momentum: 0.000000
2023-10-24 18:06:34,960 epoch 10 - iter 297/992 - loss 0.00537611 - time (sec): 24.72 - samples/sec: 1985.92 - lr: 0.000004 - momentum: 0.000000
2023-10-24 18:06:43,429 epoch 10 - iter 396/992 - loss 0.00591290 - time (sec): 33.19 - samples/sec: 1993.67 - lr: 0.000003 - momentum: 0.000000
2023-10-24 18:06:51,659 epoch 10 - iter 495/992 - loss 0.00619826 - time (sec): 41.42 - samples/sec: 1987.82 - lr: 0.000003 - momentum: 0.000000
2023-10-24 18:07:00,038 epoch 10 - iter 594/992 - loss 0.00579102 - time (sec): 49.80 - samples/sec: 1972.80 - lr: 0.000002 - momentum: 0.000000
2023-10-24 18:07:08,440 epoch 10 - iter 693/992 - loss 0.00584032 - time (sec): 58.20 - samples/sec: 1968.97 - lr: 0.000002 - momentum: 0.000000
2023-10-24 18:07:16,506 epoch 10 - iter 792/992 - loss 0.00552839 - time (sec): 66.27 - samples/sec: 1964.77 - lr: 0.000001 - momentum: 0.000000
2023-10-24 18:07:25,019 epoch 10 - iter 891/992 - loss 0.00572974 - time (sec): 74.78 - samples/sec: 1962.97 - lr: 0.000001 - momentum: 0.000000
2023-10-24 18:07:33,501 epoch 10 - iter 990/992 - loss 0.00560021 - time (sec): 83.26 - samples/sec: 1965.24 - lr: 0.000000 - momentum: 0.000000
2023-10-24 18:07:33,670 ----------------------------------------------------------------------------------------------------
2023-10-24 18:07:33,671 EPOCH 10 done: loss 0.0056 - lr: 0.000000
2023-10-24 18:07:36,792 DEV : loss 0.24207349121570587 - f1-score (micro avg) 0.7541
2023-10-24 18:07:37,277 ----------------------------------------------------------------------------------------------------
2023-10-24 18:07:37,277 Loading model from best epoch ...
2023-10-24 18:07:39,090 SequenceTagger predicts: Dictionary with 13 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-24 18:07:41,834
Results:
- F-score (micro) 0.7721
- F-score (macro) 0.6822
- Accuracy 0.6487
By class:
              precision    recall  f1-score   support

         LOC     0.8067    0.8473    0.8265       655
         PER     0.6980    0.7982    0.7448       223
         ORG     0.6400    0.3780    0.4752       127

   micro avg     0.7672    0.7771    0.7721      1005
   macro avg     0.7149    0.6745    0.6822      1005
weighted avg     0.7615    0.7771    0.7640      1005
2023-10-24 18:07:41,834 ----------------------------------------------------------------------------------------------------
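[Editor's note] The 13 tags listed above form a BIOES scheme over PER, LOC and ORG (plus O); the final scores come from decoding those tags into entity spans on the 992 test sentences. A small usage sketch for the resulting checkpoint (the path prefix is the training base path logged above; the example sentence is illustrative only):

from flair.data import Sentence
from flair.models import SequenceTagger

# Load the checkpoint that scored best on the dev set.
tagger = SequenceTagger.load(
    "hmbench-icdar/fr-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs8-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-3/best-model.pt"
)

sentence = Sentence("Victor Hugo est né à Besançon .")
tagger.predict(sentence)
for span in sentence.get_spans("ner"):
    print(span)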