2023-10-13 13:26:55,352 ----------------------------------------------------------------------------------------------------
2023-10-13 13:26:55,353 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(32001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=21, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-13 13:26:55,353 ----------------------------------------------------------------------------------------------------
2023-10-13 13:26:55,353 MultiCorpus: 3575 train + 1235 dev + 1266 test sentences
 - NER_HIPE_2022 Corpus: 3575 train + 1235 dev + 1266 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/hipe2020/de/with_doc_seperator
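The final linear layer in the model summary above has `out_features=21`, which matches the tagger's 21-entry tag dictionary printed again near the end of this log. As an illustration (not part of the original run), the 21-tag inventory can be reconstructed from the five HIPE-2020 entity types in a BIOES scheme (S-ingle, B-egin, E-nd, I-nside, plus the O tag for non-entity tokens):

```python
# Reconstruct the 21-tag BIOES dictionary implied by Linear(..., out_features=21).
# Tag order follows the dictionary printed by the tagger at the end of the log.
entity_types = ["loc", "pers", "org", "prod", "time"]
tags = ["O"] + [f"{prefix}-{t}" for t in entity_types for prefix in ("S", "B", "E", "I")]

print(len(tags))   # 21 logits per token
print(tags[:5])    # ['O', 'S-loc', 'B-loc', 'E-loc', 'I-loc']
```

This is why 5 entity types yield 21 output classes: 5 × 4 positional prefixes + 1 for O.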
2023-10-13 13:26:55,353 ----------------------------------------------------------------------------------------------------
2023-10-13 13:26:55,353 Train: 3575 sentences
2023-10-13 13:26:55,353 (train_with_dev=False, train_with_test=False)
2023-10-13 13:26:55,353 ----------------------------------------------------------------------------------------------------
2023-10-13 13:26:55,353 Training Params:
2023-10-13 13:26:55,353  - learning_rate: "3e-05"
2023-10-13 13:26:55,353  - mini_batch_size: "4"
2023-10-13 13:26:55,353  - max_epochs: "10"
2023-10-13 13:26:55,353  - shuffle: "True"
2023-10-13 13:26:55,353 ----------------------------------------------------------------------------------------------------
2023-10-13 13:26:55,353 Plugins:
2023-10-13 13:26:55,353  - LinearScheduler | warmup_fraction: '0.1'
2023-10-13 13:26:55,353 ----------------------------------------------------------------------------------------------------
2023-10-13 13:26:55,353 Final evaluation on model from best epoch (best-model.pt)
2023-10-13 13:26:55,354  - metric: "('micro avg', 'f1-score')"
2023-10-13 13:26:55,354 ----------------------------------------------------------------------------------------------------
2023-10-13 13:26:55,354 Computation:
2023-10-13 13:26:55,354  - compute on device: cuda:0
2023-10-13 13:26:55,354  - embedding storage: none
2023-10-13 13:26:55,354 ----------------------------------------------------------------------------------------------------
2023-10-13 13:26:55,354 Model training base path: "hmbench-hipe2020/de-dbmdz/bert-base-historic-multilingual-cased-bs4-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-4"
2023-10-13 13:26:55,354 ----------------------------------------------------------------------------------------------------
2023-10-13 13:26:55,354 ----------------------------------------------------------------------------------------------------
2023-10-13 13:26:59,604 epoch 1 - iter 89/894 - loss 2.95558386 - time (sec): 4.25 - samples/sec: 2080.59
 - lr: 0.000003 - momentum: 0.000000
2023-10-13 13:27:03,798 epoch 1 - iter 178/894 - loss 1.86962681 - time (sec): 8.44 - samples/sec: 2140.01 - lr: 0.000006 - momentum: 0.000000
2023-10-13 13:27:07,907 epoch 1 - iter 267/894 - loss 1.43922230 - time (sec): 12.55 - samples/sec: 2070.79 - lr: 0.000009 - momentum: 0.000000
2023-10-13 13:27:12,189 epoch 1 - iter 356/894 - loss 1.16332138 - time (sec): 16.83 - samples/sec: 2074.45 - lr: 0.000012 - momentum: 0.000000
2023-10-13 13:27:16,305 epoch 1 - iter 445/894 - loss 0.99639681 - time (sec): 20.95 - samples/sec: 2063.30 - lr: 0.000015 - momentum: 0.000000
2023-10-13 13:27:20,409 epoch 1 - iter 534/894 - loss 0.88501649 - time (sec): 25.05 - samples/sec: 2062.28 - lr: 0.000018 - momentum: 0.000000
2023-10-13 13:27:24,535 epoch 1 - iter 623/894 - loss 0.80459753 - time (sec): 29.18 - samples/sec: 2051.80 - lr: 0.000021 - momentum: 0.000000
2023-10-13 13:27:28,784 epoch 1 - iter 712/894 - loss 0.73830399 - time (sec): 33.43 - samples/sec: 2048.10 - lr: 0.000024 - momentum: 0.000000
2023-10-13 13:27:32,761 epoch 1 - iter 801/894 - loss 0.68135095 - time (sec): 37.41 - samples/sec: 2044.69 - lr: 0.000027 - momentum: 0.000000
2023-10-13 13:27:37,145 epoch 1 - iter 890/894 - loss 0.63356747 - time (sec): 41.79 - samples/sec: 2062.28 - lr: 0.000030 - momentum: 0.000000
2023-10-13 13:27:37,323 ----------------------------------------------------------------------------------------------------
2023-10-13 13:27:37,324 EPOCH 1 done: loss 0.6313 - lr: 0.000030
2023-10-13 13:27:42,761 DEV : loss 0.1941666156053543 - f1-score (micro avg) 0.6106
2023-10-13 13:27:42,790 saving best model
2023-10-13 13:27:43,110 ----------------------------------------------------------------------------------------------------
2023-10-13 13:27:47,503 epoch 2 - iter 89/894 - loss 0.18794288 - time (sec): 4.39 - samples/sec: 2054.48 - lr: 0.000030 - momentum: 0.000000
2023-10-13 13:27:52,262 epoch 2 - iter 178/894 - loss 0.17923060 - time (sec): 9.15 - samples/sec: 2032.58 - lr: 0.000029 - momentum: 0.000000
2023-10-13 13:27:56,703 epoch 2 - iter 267/894 - loss 0.16858546 - time (sec): 13.59 - samples/sec: 1950.55 - lr: 0.000029 - momentum: 0.000000
2023-10-13 13:28:00,953 epoch 2 - iter 356/894 - loss 0.17427875 - time (sec): 17.84 - samples/sec: 1952.36 - lr: 0.000029 - momentum: 0.000000
2023-10-13 13:28:05,270 epoch 2 - iter 445/894 - loss 0.17144978 - time (sec): 22.16 - samples/sec: 1968.14 - lr: 0.000028 - momentum: 0.000000
2023-10-13 13:28:09,579 epoch 2 - iter 534/894 - loss 0.16359629 - time (sec): 26.47 - samples/sec: 1976.51 - lr: 0.000028 - momentum: 0.000000
2023-10-13 13:28:13,700 epoch 2 - iter 623/894 - loss 0.16276876 - time (sec): 30.59 - samples/sec: 1976.29 - lr: 0.000028 - momentum: 0.000000
2023-10-13 13:28:17,828 epoch 2 - iter 712/894 - loss 0.16121921 - time (sec): 34.72 - samples/sec: 1977.28 - lr: 0.000027 - momentum: 0.000000
2023-10-13 13:28:22,050 epoch 2 - iter 801/894 - loss 0.15844776 - time (sec): 38.94 - samples/sec: 1990.54 - lr: 0.000027 - momentum: 0.000000
2023-10-13 13:28:26,052 epoch 2 - iter 890/894 - loss 0.15768333 - time (sec): 42.94 - samples/sec: 2007.70 - lr: 0.000027 - momentum: 0.000000
2023-10-13 13:28:26,230 ----------------------------------------------------------------------------------------------------
2023-10-13 13:28:26,230 EPOCH 2 done: loss 0.1574 - lr: 0.000027
2023-10-13 13:28:35,343 DEV : loss 0.1417360156774521 - f1-score (micro avg) 0.6804
2023-10-13 13:28:35,382 saving best model
2023-10-13 13:28:35,903 ----------------------------------------------------------------------------------------------------
2023-10-13 13:28:40,362 epoch 3 - iter 89/894 - loss 0.10870491 - time (sec): 4.46 - samples/sec: 1945.07 - lr: 0.000026 - momentum: 0.000000
2023-10-13 13:28:44,659 epoch 3 - iter 178/894 - loss 0.09511837 - time (sec): 8.75 - samples/sec: 2060.06 - lr: 0.000026 - momentum: 0.000000
2023-10-13 13:28:48,835 epoch 3 - iter 267/894 - loss 0.08476480 - time (sec): 12.93 - samples/sec: 2045.85 - lr: 0.000026 - momentum: 0.000000
2023-10-13 13:28:52,984 epoch 3 - iter 356/894 - loss 0.08905987 - time (sec): 17.08 - samples/sec: 2036.20 - lr: 0.000025 - momentum: 0.000000
2023-10-13 13:28:57,234 epoch 3 - iter 445/894 - loss 0.08587345 - time (sec): 21.33 - samples/sec: 1995.76 - lr: 0.000025 - momentum: 0.000000
2023-10-13 13:29:01,699 epoch 3 - iter 534/894 - loss 0.08587494 - time (sec): 25.79 - samples/sec: 1983.40 - lr: 0.000025 - momentum: 0.000000
2023-10-13 13:29:06,110 epoch 3 - iter 623/894 - loss 0.08738808 - time (sec): 30.20 - samples/sec: 1965.13 - lr: 0.000024 - momentum: 0.000000
2023-10-13 13:29:10,497 epoch 3 - iter 712/894 - loss 0.08364026 - time (sec): 34.59 - samples/sec: 1980.18 - lr: 0.000024 - momentum: 0.000000
2023-10-13 13:29:14,555 epoch 3 - iter 801/894 - loss 0.08812239 - time (sec): 38.65 - samples/sec: 1985.64 - lr: 0.000024 - momentum: 0.000000
2023-10-13 13:29:18,956 epoch 3 - iter 890/894 - loss 0.08729426 - time (sec): 43.05 - samples/sec: 2001.73 - lr: 0.000023 - momentum: 0.000000
2023-10-13 13:29:19,149 ----------------------------------------------------------------------------------------------------
2023-10-13 13:29:19,149 EPOCH 3 done: loss 0.0872 - lr: 0.000023
2023-10-13 13:29:28,104 DEV : loss 0.1514357179403305 - f1-score (micro avg) 0.7256
2023-10-13 13:29:28,139 saving best model
2023-10-13 13:29:28,599 ----------------------------------------------------------------------------------------------------
2023-10-13 13:29:33,101 epoch 4 - iter 89/894 - loss 0.05364078 - time (sec): 4.50 - samples/sec: 1918.44 - lr: 0.000023 - momentum: 0.000000
2023-10-13 13:29:37,515 epoch 4 - iter 178/894 - loss 0.05117238 - time (sec): 8.91 - samples/sec: 1863.71 - lr: 0.000023 - momentum: 0.000000
2023-10-13 13:29:42,069 epoch 4 - iter 267/894 - loss 0.05442001 - time (sec): 13.46 - samples/sec: 1876.18 - lr: 0.000022 - momentum: 0.000000
2023-10-13 13:29:46,745 epoch 4 - iter 356/894 - loss 0.04904272 - time (sec): 18.14 - samples/sec: 1878.12 - lr: 0.000022 - momentum: 0.000000
2023-10-13 13:29:51,714 epoch 4 - iter 445/894 - loss 0.05288936 - time (sec): 23.11 - samples/sec: 1892.72 - lr: 0.000022 - momentum: 0.000000
2023-10-13 13:29:56,451 epoch 4 - iter 534/894 - loss 0.05337299 - time (sec): 27.84 - samples/sec: 1885.36 - lr: 0.000021 - momentum: 0.000000
2023-10-13 13:30:01,001 epoch 4 - iter 623/894 - loss 0.05444078 - time (sec): 32.40 - samples/sec: 1864.41 - lr: 0.000021 - momentum: 0.000000
2023-10-13 13:30:05,689 epoch 4 - iter 712/894 - loss 0.05523828 - time (sec): 37.08 - samples/sec: 1872.35 - lr: 0.000021 - momentum: 0.000000
2023-10-13 13:30:10,379 epoch 4 - iter 801/894 - loss 0.05634012 - time (sec): 41.77 - samples/sec: 1870.31 - lr: 0.000020 - momentum: 0.000000
2023-10-13 13:30:14,915 epoch 4 - iter 890/894 - loss 0.05893335 - time (sec): 46.31 - samples/sec: 1860.70 - lr: 0.000020 - momentum: 0.000000
2023-10-13 13:30:15,109 ----------------------------------------------------------------------------------------------------
2023-10-13 13:30:15,109 EPOCH 4 done: loss 0.0589 - lr: 0.000020
2023-10-13 13:30:23,755 DEV : loss 0.17774192988872528 - f1-score (micro avg) 0.7457
2023-10-13 13:30:23,789 saving best model
2023-10-13 13:30:24,260 ----------------------------------------------------------------------------------------------------
2023-10-13 13:30:28,888 epoch 5 - iter 89/894 - loss 0.04958656 - time (sec): 4.63 - samples/sec: 1958.63 - lr: 0.000020 - momentum: 0.000000
2023-10-13 13:30:32,984 epoch 5 - iter 178/894 - loss 0.04439640 - time (sec): 8.72 - samples/sec: 1995.20 - lr: 0.000019 - momentum: 0.000000
2023-10-13 13:30:37,156 epoch 5 - iter 267/894 - loss 0.04404992 - time (sec): 12.89 - samples/sec: 2038.74 - lr: 0.000019 - momentum: 0.000000
2023-10-13 13:30:41,256 epoch 5 - iter 356/894 - loss 0.04148328 - time (sec): 16.99 - samples/sec: 2062.03 - lr: 0.000019 - momentum: 0.000000
2023-10-13 13:30:45,339 epoch 5 - iter 445/894 - loss 0.04024971 - time (sec): 21.08 - samples/sec: 2053.98 - lr: 0.000018 - momentum: 0.000000
2023-10-13 13:30:49,695 epoch 5 - iter 534/894 - loss 0.04182158 - time (sec): 25.43 - samples/sec: 2035.14 - lr: 0.000018 - momentum: 0.000000
2023-10-13 13:30:54,179 epoch 5 - iter 623/894 - loss 0.03942623 - time (sec): 29.92 - samples/sec: 2047.61 - lr: 0.000018 - momentum: 0.000000
2023-10-13 13:30:58,471 epoch 5 - iter 712/894 - loss 0.04026267 - time (sec): 34.21 - samples/sec: 2025.62 - lr: 0.000017 - momentum: 0.000000
2023-10-13 13:31:02,647 epoch 5 - iter 801/894 - loss 0.03995825 - time (sec): 38.38 - samples/sec: 2024.13 - lr: 0.000017 - momentum: 0.000000
2023-10-13 13:31:06,816 epoch 5 - iter 890/894 - loss 0.03960817 - time (sec): 42.55 - samples/sec: 2026.89 - lr: 0.000017 - momentum: 0.000000
2023-10-13 13:31:07,001 ----------------------------------------------------------------------------------------------------
2023-10-13 13:31:07,002 EPOCH 5 done: loss 0.0397 - lr: 0.000017
2023-10-13 13:31:15,787 DEV : loss 0.2007289081811905 - f1-score (micro avg) 0.7645
2023-10-13 13:31:15,820 saving best model
2023-10-13 13:31:16,295 ----------------------------------------------------------------------------------------------------
2023-10-13 13:31:20,481 epoch 6 - iter 89/894 - loss 0.02040547 - time (sec): 4.18 - samples/sec: 2093.62 - lr: 0.000016 - momentum: 0.000000
2023-10-13 13:31:24,680 epoch 6 - iter 178/894 - loss 0.01964149 - time (sec): 8.38 - samples/sec: 2121.16 - lr: 0.000016 - momentum: 0.000000
2023-10-13 13:31:28,716 epoch 6 - iter 267/894 - loss 0.02088570 - time (sec): 12.42 - samples/sec: 2112.13 - lr: 0.000016 - momentum: 0.000000
2023-10-13 13:31:33,092 epoch 6 - iter 356/894 - loss 0.02324113 - time (sec): 16.79 - samples/sec: 2156.66 - lr: 0.000015 - momentum: 0.000000
2023-10-13 13:31:37,174 epoch 6 - iter 445/894 - loss 0.02270645 - time (sec): 20.88 - samples/sec: 2094.45 - lr: 0.000015 - momentum: 0.000000
2023-10-13 13:31:41,187 epoch 6 - iter 534/894 - loss 0.02337935 - time (sec): 24.89 - samples/sec: 2090.81 - lr: 0.000015 - momentum: 0.000000
2023-10-13 13:31:45,296 epoch 6 - iter 623/894 - loss 0.02406282 - time (sec): 29.00 - samples/sec: 2087.74 - lr: 0.000014 - momentum: 0.000000
2023-10-13 13:31:49,382 epoch 6 - iter 712/894 - loss 0.02361249 - time (sec): 33.08 - samples/sec: 2074.79 - lr: 0.000014 - momentum: 0.000000
2023-10-13 13:31:53,449 epoch 6 - iter 801/894 - loss 0.02432036 - time (sec): 37.15 - samples/sec: 2088.20 - lr: 0.000014 - momentum: 0.000000
2023-10-13 13:31:57,438 epoch 6 - iter 890/894 - loss 0.02630157 - time (sec): 41.14 - samples/sec: 2095.76 - lr: 0.000013 - momentum: 0.000000
2023-10-13 13:31:57,617 ----------------------------------------------------------------------------------------------------
2023-10-13 13:31:57,617 EPOCH 6 done: loss 0.0264 - lr: 0.000013
2023-10-13 13:32:06,141 DEV : loss 0.20405448973178864 - f1-score (micro avg) 0.7684
2023-10-13 13:32:06,171 saving best model
2023-10-13 13:32:06,604 ----------------------------------------------------------------------------------------------------
2023-10-13 13:32:11,208 epoch 7 - iter 89/894 - loss 0.01619462 - time (sec): 4.60 - samples/sec: 2178.62 - lr: 0.000013 - momentum: 0.000000
2023-10-13 13:32:15,438 epoch 7 - iter 178/894 - loss 0.01588258 - time (sec): 8.83 - samples/sec: 2050.83 - lr: 0.000013 - momentum: 0.000000
2023-10-13 13:32:19,895 epoch 7 - iter 267/894 - loss 0.01351689 - time (sec): 13.28 - samples/sec: 2029.35 - lr: 0.000012 - momentum: 0.000000
2023-10-13 13:32:24,203 epoch 7 - iter 356/894 - loss 0.01461563 - time (sec): 17.59 - samples/sec: 2039.65 - lr: 0.000012 - momentum: 0.000000
2023-10-13 13:32:28,462 epoch 7 - iter 445/894 - loss 0.01677621 - time (sec): 21.85 - samples/sec: 2033.09 - lr: 0.000012 - momentum: 0.000000
2023-10-13 13:32:32,582 epoch 7 - iter 534/894 - loss 0.01693157 - time (sec): 25.97 - samples/sec: 2006.28 - lr: 0.000011 - momentum: 0.000000
2023-10-13 13:32:36,783 epoch 7 - iter 623/894 - loss 0.01777874 - time (sec): 30.17 - samples/sec: 2017.12 - lr: 0.000011 - momentum: 0.000000
2023-10-13 13:32:40,885 epoch 7 - iter 712/894 - loss 0.01692073 - time (sec): 34.27 - samples/sec: 2018.10 - lr: 0.000011 - momentum: 0.000000
2023-10-13 13:32:44,949 epoch 7 - iter 801/894 - loss 0.01780651 - time (sec): 38.34 - samples/sec: 2021.61 - lr: 0.000010 - momentum: 0.000000
2023-10-13 13:32:48,987 epoch 7 - iter 890/894 - loss 0.01723398 - time (sec): 42.38 - samples/sec: 2034.07 - lr: 0.000010 - momentum: 0.000000
2023-10-13 13:32:49,163 ----------------------------------------------------------------------------------------------------
2023-10-13 13:32:49,163 EPOCH 7 done: loss 0.0174 - lr: 0.000010
2023-10-13 13:32:58,154 DEV : loss 0.22012537717819214 - f1-score (micro avg) 0.7795
2023-10-13 13:32:58,194 saving best model
2023-10-13 13:32:58,701 ----------------------------------------------------------------------------------------------------
2023-10-13 13:33:03,051 epoch 8 - iter 89/894 - loss 0.00832136 - time (sec): 4.34 - samples/sec: 2013.98 - lr: 0.000010 - momentum: 0.000000
2023-10-13 13:33:07,564 epoch 8 - iter 178/894 - loss 0.00808894 - time (sec): 8.86 - samples/sec: 1950.24 - lr: 0.000009 - momentum: 0.000000
2023-10-13 13:33:11,637 epoch 8 - iter 267/894 - loss 0.01090431 - time (sec): 12.93 - samples/sec: 1983.97 - lr: 0.000009 - momentum: 0.000000
2023-10-13 13:33:15,841 epoch 8 - iter 356/894 - loss 0.01087231 - time (sec): 17.13 - samples/sec: 1976.09 - lr: 0.000009 - momentum: 0.000000
2023-10-13 13:33:20,134 epoch 8 - iter 445/894 - loss 0.01052967 - time (sec): 21.43 - samples/sec: 1971.66 - lr: 0.000008 - momentum: 0.000000
2023-10-13 13:33:24,383 epoch 8 - iter 534/894 - loss 0.01078439 - time (sec): 25.68 - samples/sec: 1970.62 - lr: 0.000008 - momentum: 0.000000
2023-10-13 13:33:28,522 epoch 8 - iter 623/894 - loss 0.01109453 - time (sec): 29.82 - samples/sec: 1983.29 - lr: 0.000008 - momentum: 0.000000
2023-10-13 13:33:32,944 epoch 8 - iter 712/894 - loss 0.01240084 - time (sec): 34.24 - samples/sec: 1995.51 - lr: 0.000007 - momentum: 0.000000
2023-10-13 13:33:37,178 epoch 8 - iter 801/894 - loss 0.01237060 - time (sec): 38.47 - samples/sec: 2016.12 - lr: 0.000007 - momentum: 0.000000
2023-10-13 13:33:41,265 epoch 8 - iter 890/894 - loss 0.01219660 - time (sec): 42.56 - samples/sec: 2027.23 - lr: 0.000007 - momentum: 0.000000
2023-10-13 13:33:41,440 ----------------------------------------------------------------------------------------------------
2023-10-13 13:33:41,440 EPOCH 8 done: loss 0.0122 - lr: 0.000007
2023-10-13 13:33:50,357 DEV : loss 0.2315651774406433 - f1-score (micro avg) 0.7705
2023-10-13 13:33:50,387 ----------------------------------------------------------------------------------------------------
2023-10-13 13:33:54,612 epoch 9 - iter 89/894 - loss 0.00348902 - time (sec): 4.22 - samples/sec: 1985.17 - lr: 0.000006 - momentum: 0.000000
2023-10-13 13:33:58,941 epoch 9 - iter 178/894 - loss 0.00486255 - time (sec): 8.55 - samples/sec: 2008.41 - lr: 0.000006 - momentum: 0.000000
2023-10-13 13:34:03,159 epoch 9 - iter 267/894 - loss 0.00642317 - time (sec): 12.77 - samples/sec: 1974.93 - lr: 0.000006 - momentum: 0.000000
2023-10-13 13:34:07,390 epoch 9 - iter 356/894 - loss 0.01103944 - time (sec): 17.00 - samples/sec: 1971.80 - lr: 0.000005 - momentum: 0.000000
2023-10-13 13:34:11,668 epoch 9 - iter 445/894 - loss 0.00897085 - time (sec): 21.28 - samples/sec: 1999.42 - lr: 0.000005 - momentum: 0.000000
2023-10-13 13:34:15,923 epoch 9 - iter 534/894 - loss 0.00808237 - time (sec): 25.53 - samples/sec: 1998.01 - lr: 0.000005 - momentum: 0.000000
2023-10-13 13:34:20,374 epoch 9 - iter 623/894 - loss 0.00817370 - time (sec): 29.99 - samples/sec: 2006.05 - lr: 0.000004 - momentum: 0.000000
2023-10-13 13:34:24,751 epoch 9 - iter 712/894 - loss 0.00803605 - time (sec): 34.36 - samples/sec: 2035.31 - lr: 0.000004 - momentum: 0.000000
2023-10-13 13:34:28,751 epoch 9 - iter 801/894 - loss 0.00806848 - time (sec): 38.36 - samples/sec: 2035.57 - lr: 0.000004 - momentum: 0.000000
2023-10-13 13:34:32,862 epoch 9 - iter 890/894 - loss 0.00765097 - time (sec): 42.47 - samples/sec: 2032.01 - lr: 0.000003 - momentum: 0.000000
2023-10-13 13:34:33,037 ----------------------------------------------------------------------------------------------------
2023-10-13 13:34:33,038 EPOCH 9 done: loss 0.0076 - lr: 0.000003
2023-10-13 13:34:41,714 DEV : loss 0.23668904602527618 - f1-score (micro avg) 0.7859
2023-10-13 13:34:41,747 saving best model
2023-10-13 13:34:42,200 ----------------------------------------------------------------------------------------------------
2023-10-13 13:34:46,536 epoch 10 - iter 89/894 - loss 0.00697084 - time (sec): 4.33 - samples/sec: 2289.44 - lr: 0.000003 - momentum: 0.000000
2023-10-13 13:34:50,883 epoch 10 - iter 178/894 - loss 0.00707544 - time (sec): 8.68 - samples/sec: 2158.02 - lr: 0.000003 - momentum: 0.000000
2023-10-13 13:34:55,094 epoch 10 - iter 267/894 - loss 0.00741832 - time (sec): 12.89 - samples/sec: 2097.03 - lr: 0.000002 - momentum: 0.000000
2023-10-13 13:34:59,329 epoch 10 - iter 356/894 - loss 0.00805166 - time (sec): 17.13 - samples/sec: 2052.37 - lr: 0.000002 - momentum: 0.000000
2023-10-13 13:35:03,696 epoch 10 - iter 445/894 - loss 0.00661446 - time (sec): 21.49 - samples/sec: 2039.46 - lr: 0.000002 - momentum: 0.000000
2023-10-13 13:35:07,973 epoch 10 - iter 534/894 - loss 0.00680689 - time (sec): 25.77 - samples/sec: 2014.68 - lr: 0.000001 - momentum: 0.000000
2023-10-13 13:35:12,132 epoch 10 - iter 623/894 - loss 0.00641756 - time (sec): 29.93 - samples/sec: 2009.15 - lr: 0.000001 - momentum: 0.000000
2023-10-13 13:35:16,236 epoch 10 - iter 712/894 - loss 0.00594203 - time (sec): 34.03 - samples/sec: 2025.05 - lr: 0.000001 - momentum: 0.000000
2023-10-13 13:35:20,200 epoch 10 - iter 801/894 - loss 0.00534108 - time (sec): 38.00 - samples/sec: 2040.65 - lr: 0.000000 - momentum: 0.000000
2023-10-13 13:35:24,171 epoch 10 - iter 890/894 - loss 0.00532176 - time (sec): 41.97 - samples/sec: 2054.23 - lr: 0.000000 - momentum: 0.000000
2023-10-13 13:35:24,347 ----------------------------------------------------------------------------------------------------
2023-10-13 13:35:24,348 EPOCH 10 done: loss 0.0053 - lr: 0.000000
2023-10-13 13:35:33,250 DEV : loss 0.23821337521076202 - f1-score (micro avg) 0.7852
2023-10-13 13:35:33,648 ----------------------------------------------------------------------------------------------------
2023-10-13 13:35:33,649 Loading model from best epoch ...
2023-10-13 13:35:35,424 SequenceTagger predicts: Dictionary with 21 tags: O, S-loc, B-loc, E-loc, I-loc, S-pers, B-pers, E-pers, I-pers, S-org, B-org, E-org, I-org, S-prod, B-prod, E-prod, I-prod, S-time, B-time, E-time, I-time
2023-10-13 13:35:40,059 Results:
- F-score (micro) 0.7491
- F-score (macro) 0.6783
- Accuracy 0.6188

By class:
              precision    recall  f1-score   support

         loc     0.8225    0.8473    0.8347       596
        pers     0.6649    0.7447    0.7025       333
         org     0.6018    0.5152    0.5551       132
        prod     0.6346    0.5000    0.5593        66
        time     0.7255    0.7551    0.7400        49

   micro avg     0.7406    0.7577    0.7491      1176
   macro avg     0.6898    0.6725    0.6783      1176
weighted avg     0.7385    0.7577    0.7465      1176
2023-10-13 13:35:40,059 ----------------------------------------------------------------------------------------------------
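As a consistency check (not part of the original run), the reported aggregate F-scores can be re-derived from the per-class table in the final evaluation. True-positive counts are approximated by rounding recall × support, and predicted counts by TP / precision, so this is a sketch under that rounding assumption:

```python
# Re-derive the micro/macro F-scores from the per-class evaluation table.
per_class = {
    # class: (precision, recall, f1, support)
    "loc":  (0.8225, 0.8473, 0.8347, 596),
    "pers": (0.6649, 0.7447, 0.7025, 333),
    "org":  (0.6018, 0.5152, 0.5551, 132),
    "prod": (0.6346, 0.5000, 0.5593,  66),
    "time": (0.7255, 0.7551, 0.7400,  49),
}

# Approximate true-positive and predicted-entity counts per class.
tp = {c: round(r * s) for c, (p, r, f, s) in per_class.items()}
pred = {c: round(tp[c] / p) for c, (p, r, f, s) in per_class.items()}

# Micro average pools counts over all classes; macro average means the f1 column.
micro_p = sum(tp.values()) / sum(pred.values())
micro_r = sum(tp.values()) / sum(s for (_, _, _, s) in per_class.values())
micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)
macro_f1 = sum(f for (_, _, f, _) in per_class.values()) / len(per_class)

print(round(micro_f1, 4))  # 0.7491, the reported F-score (micro)
print(round(macro_f1, 4))  # 0.6783, the reported F-score (macro)
```

The recovered micro precision and recall (0.7406 / 0.7577) also match the "micro avg" row of the table, which confirms the per-class and aggregate numbers are internally consistent.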