2023-10-12 06:24:15,769 ----------------------------------------------------------------------------------------------------
2023-10-12 06:24:15,771 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
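The shapes in the module dump above pin down the encoder's weight count. As a back-of-the-envelope sanity check (pure Python; it counts only the linear and embedding weight matrices and deliberately ignores the RMSNorm gains and the relative-attention bias, which add only a few thousand parameters):

```python
# Shapes taken from the module dump above (encoder-only ByT5 small)
d_model, d_attn, d_ff, vocab, n_blocks = 1472, 384, 3584, 384, 12

attn = 3 * d_model * d_attn + d_attn * d_model  # q, k, v projections + output o
ff = 2 * d_model * d_ff + d_ff * d_model        # gated FF: wi_0, wi_1 + wo
per_block = attn + ff

# 12 transformer blocks plus the shared byte-level embedding table
encoder_weights = n_blocks * per_block + vocab * d_model
print(f"per block: {per_block:,}  encoder total: {encoder_weights:,}")
```

That is roughly 218M encoder weights, consistent with a byt5-small-style encoder-only checkpoint.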
2023-10-12 06:24:15,772 ----------------------------------------------------------------------------------------------------
2023-10-12 06:24:15,772 MultiCorpus: 20847 train + 1123 dev + 3350 test sentences
 - NER_HIPE_2022 Corpus: 20847 train + 1123 dev + 3350 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/de/with_doc_seperator
2023-10-12 06:24:15,772 ----------------------------------------------------------------------------------------------------
2023-10-12 06:24:15,772 Train: 20847 sentences
2023-10-12 06:24:15,772 (train_with_dev=False, train_with_test=False)
2023-10-12 06:24:15,772 ----------------------------------------------------------------------------------------------------
2023-10-12 06:24:15,772 Training Params:
2023-10-12 06:24:15,772 - learning_rate: "0.00016"
2023-10-12 06:24:15,772 - mini_batch_size: "4"
2023-10-12 06:24:15,772 - max_epochs: "10"
2023-10-12 06:24:15,773 - shuffle: "True"
2023-10-12 06:24:15,773 ----------------------------------------------------------------------------------------------------
2023-10-12 06:24:15,773 Plugins:
2023-10-12 06:24:15,773 - TensorboardLogger
2023-10-12 06:24:15,773 - LinearScheduler | warmup_fraction: '0.1'
2023-10-12 06:24:15,773 ----------------------------------------------------------------------------------------------------
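The LinearScheduler with warmup_fraction '0.1' produces the lr trace visible in the per-iteration lines: the learning rate ramps linearly from 0 to the peak of 0.00016 over the first 10% of the 52,120 total steps (10 epochs x 5,212 iterations), then decays linearly back to 0. A minimal sketch of that schedule (plain Python; not Flair's actual implementation, just the same piecewise-linear shape):

```python
def linear_schedule(step, total_steps, peak_lr, warmup_fraction=0.1):
    """Linear warmup to peak_lr over warmup_fraction of training, then linear decay to 0."""
    warmup_steps = warmup_fraction * total_steps
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

total = 10 * 5212   # 10 epochs x 5212 iterations per epoch
peak = 0.00016

print(f"{linear_schedule(521, total, peak):.6f}")    # logged at epoch 1, iter 521: lr 0.000016
print(f"{linear_schedule(10422, total, peak):.6f}")  # logged at end of epoch 2:   lr 0.000142
```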
2023-10-12 06:24:15,773 Final evaluation on model from best epoch (best-model.pt)
2023-10-12 06:24:15,773 - metric: "('micro avg', 'f1-score')"
2023-10-12 06:24:15,773 ----------------------------------------------------------------------------------------------------
2023-10-12 06:24:15,773 Computation:
2023-10-12 06:24:15,773 - compute on device: cuda:0
2023-10-12 06:24:15,773 - embedding storage: none
2023-10-12 06:24:15,773 ----------------------------------------------------------------------------------------------------
2023-10-12 06:24:15,773 Model training base path: "hmbench-newseye/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-4"
2023-10-12 06:24:15,774 ----------------------------------------------------------------------------------------------------
2023-10-12 06:24:15,774 ----------------------------------------------------------------------------------------------------
2023-10-12 06:24:15,774 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-12 06:26:32,674 epoch 1 - iter 521/5212 - loss 2.79144553 - time (sec): 136.90 - samples/sec: 242.00 - lr: 0.000016 - momentum: 0.000000
2023-10-12 06:28:51,123 epoch 1 - iter 1042/5212 - loss 2.32825107 - time (sec): 275.35 - samples/sec: 246.67 - lr: 0.000032 - momentum: 0.000000
2023-10-12 06:31:13,745 epoch 1 - iter 1563/5212 - loss 1.76882170 - time (sec): 417.97 - samples/sec: 251.47 - lr: 0.000048 - momentum: 0.000000
2023-10-12 06:33:39,975 epoch 1 - iter 2084/5212 - loss 1.42611448 - time (sec): 564.20 - samples/sec: 251.68 - lr: 0.000064 - momentum: 0.000000
2023-10-12 06:36:10,203 epoch 1 - iter 2605/5212 - loss 1.21917708 - time (sec): 714.43 - samples/sec: 252.03 - lr: 0.000080 - momentum: 0.000000
2023-10-12 06:38:38,555 epoch 1 - iter 3126/5212 - loss 1.07781127 - time (sec): 862.78 - samples/sec: 249.33 - lr: 0.000096 - momentum: 0.000000
2023-10-12 06:41:11,145 epoch 1 - iter 3647/5212 - loss 0.96200152 - time (sec): 1015.37 - samples/sec: 249.54 - lr: 0.000112 - momentum: 0.000000
2023-10-12 06:43:28,559 epoch 1 - iter 4168/5212 - loss 0.87230031 - time (sec): 1152.78 - samples/sec: 251.22 - lr: 0.000128 - momentum: 0.000000
2023-10-12 06:45:46,933 epoch 1 - iter 4689/5212 - loss 0.79145949 - time (sec): 1291.16 - samples/sec: 255.19 - lr: 0.000144 - momentum: 0.000000
2023-10-12 06:48:04,832 epoch 1 - iter 5210/5212 - loss 0.72825829 - time (sec): 1429.06 - samples/sec: 256.96 - lr: 0.000160 - momentum: 0.000000
2023-10-12 06:48:05,387 ----------------------------------------------------------------------------------------------------
2023-10-12 06:48:05,387 EPOCH 1 done: loss 0.7279 - lr: 0.000160
2023-10-12 06:48:40,151 DEV : loss 0.12137877196073532 - f1-score (micro avg) 0.2683
2023-10-12 06:48:40,200 saving best model
2023-10-12 06:48:41,028 ----------------------------------------------------------------------------------------------------
2023-10-12 06:50:57,738 epoch 2 - iter 521/5212 - loss 0.16649058 - time (sec): 136.71 - samples/sec: 265.51 - lr: 0.000158 - momentum: 0.000000
2023-10-12 06:53:18,013 epoch 2 - iter 1042/5212 - loss 0.15471879 - time (sec): 276.98 - samples/sec: 267.97 - lr: 0.000156 - momentum: 0.000000
2023-10-12 06:55:29,293 epoch 2 - iter 1563/5212 - loss 0.15616420 - time (sec): 408.26 - samples/sec: 267.36 - lr: 0.000155 - momentum: 0.000000
2023-10-12 06:57:48,778 epoch 2 - iter 2084/5212 - loss 0.15588028 - time (sec): 547.75 - samples/sec: 270.67 - lr: 0.000153 - momentum: 0.000000
2023-10-12 07:00:07,196 epoch 2 - iter 2605/5212 - loss 0.15245153 - time (sec): 686.17 - samples/sec: 269.12 - lr: 0.000151 - momentum: 0.000000
2023-10-12 07:02:22,055 epoch 2 - iter 3126/5212 - loss 0.15216461 - time (sec): 821.03 - samples/sec: 265.80 - lr: 0.000149 - momentum: 0.000000
2023-10-12 07:04:35,943 epoch 2 - iter 3647/5212 - loss 0.15284241 - time (sec): 954.91 - samples/sec: 262.54 - lr: 0.000148 - momentum: 0.000000
2023-10-12 07:06:55,423 epoch 2 - iter 4168/5212 - loss 0.14914771 - time (sec): 1094.39 - samples/sec: 264.14 - lr: 0.000146 - momentum: 0.000000
2023-10-12 07:09:18,351 epoch 2 - iter 4689/5212 - loss 0.14565854 - time (sec): 1237.32 - samples/sec: 266.98 - lr: 0.000144 - momentum: 0.000000
2023-10-12 07:11:33,790 epoch 2 - iter 5210/5212 - loss 0.14406346 - time (sec): 1372.76 - samples/sec: 267.60 - lr: 0.000142 - momentum: 0.000000
2023-10-12 07:11:34,209 ----------------------------------------------------------------------------------------------------
2023-10-12 07:11:34,209 EPOCH 2 done: loss 0.1441 - lr: 0.000142
2023-10-12 07:12:11,927 DEV : loss 0.1253765970468521 - f1-score (micro avg) 0.3506
2023-10-12 07:12:11,979 saving best model
2023-10-12 07:12:14,555 ----------------------------------------------------------------------------------------------------
2023-10-12 07:14:35,479 epoch 3 - iter 521/5212 - loss 0.09734501 - time (sec): 140.92 - samples/sec: 245.08 - lr: 0.000140 - momentum: 0.000000
2023-10-12 07:16:52,067 epoch 3 - iter 1042/5212 - loss 0.09555292 - time (sec): 277.51 - samples/sec: 243.06 - lr: 0.000139 - momentum: 0.000000
2023-10-12 07:19:15,804 epoch 3 - iter 1563/5212 - loss 0.09971065 - time (sec): 421.24 - samples/sec: 257.63 - lr: 0.000137 - momentum: 0.000000
2023-10-12 07:21:33,165 epoch 3 - iter 2084/5212 - loss 0.09886889 - time (sec): 558.61 - samples/sec: 257.86 - lr: 0.000135 - momentum: 0.000000
2023-10-12 07:23:51,596 epoch 3 - iter 2605/5212 - loss 0.09840080 - time (sec): 697.04 - samples/sec: 256.67 - lr: 0.000133 - momentum: 0.000000
2023-10-12 07:26:18,620 epoch 3 - iter 3126/5212 - loss 0.09591881 - time (sec): 844.06 - samples/sec: 259.68 - lr: 0.000132 - momentum: 0.000000
2023-10-12 07:28:46,160 epoch 3 - iter 3647/5212 - loss 0.09702574 - time (sec): 991.60 - samples/sec: 261.92 - lr: 0.000130 - momentum: 0.000000
2023-10-12 07:31:05,439 epoch 3 - iter 4168/5212 - loss 0.09812998 - time (sec): 1130.88 - samples/sec: 258.75 - lr: 0.000128 - momentum: 0.000000
2023-10-12 07:33:30,593 epoch 3 - iter 4689/5212 - loss 0.10064749 - time (sec): 1276.03 - samples/sec: 258.23 - lr: 0.000126 - momentum: 0.000000
2023-10-12 07:35:54,377 epoch 3 - iter 5210/5212 - loss 0.09998955 - time (sec): 1419.82 - samples/sec: 258.66 - lr: 0.000124 - momentum: 0.000000
2023-10-12 07:35:54,900 ----------------------------------------------------------------------------------------------------
2023-10-12 07:35:54,901 EPOCH 3 done: loss 0.1000 - lr: 0.000124
2023-10-12 07:36:33,027 DEV : loss 0.2415073961019516 - f1-score (micro avg) 0.3658
2023-10-12 07:36:33,078 saving best model
2023-10-12 07:36:35,643 ----------------------------------------------------------------------------------------------------
2023-10-12 07:38:56,125 epoch 4 - iter 521/5212 - loss 0.06307318 - time (sec): 140.48 - samples/sec: 257.95 - lr: 0.000123 - momentum: 0.000000
2023-10-12 07:41:17,505 epoch 4 - iter 1042/5212 - loss 0.06517042 - time (sec): 281.86 - samples/sec: 263.32 - lr: 0.000121 - momentum: 0.000000
2023-10-12 07:43:40,499 epoch 4 - iter 1563/5212 - loss 0.06318112 - time (sec): 424.85 - samples/sec: 265.53 - lr: 0.000119 - momentum: 0.000000
2023-10-12 07:46:02,286 epoch 4 - iter 2084/5212 - loss 0.06508855 - time (sec): 566.64 - samples/sec: 265.94 - lr: 0.000117 - momentum: 0.000000
2023-10-12 07:48:22,695 epoch 4 - iter 2605/5212 - loss 0.06477669 - time (sec): 707.05 - samples/sec: 263.75 - lr: 0.000116 - momentum: 0.000000
2023-10-12 07:50:43,698 epoch 4 - iter 3126/5212 - loss 0.06348635 - time (sec): 848.05 - samples/sec: 264.37 - lr: 0.000114 - momentum: 0.000000
2023-10-12 07:53:11,823 epoch 4 - iter 3647/5212 - loss 0.06400079 - time (sec): 996.18 - samples/sec: 261.69 - lr: 0.000112 - momentum: 0.000000
2023-10-12 07:55:39,211 epoch 4 - iter 4168/5212 - loss 0.06527752 - time (sec): 1143.56 - samples/sec: 258.12 - lr: 0.000110 - momentum: 0.000000
2023-10-12 07:58:13,334 epoch 4 - iter 4689/5212 - loss 0.06466237 - time (sec): 1297.69 - samples/sec: 256.36 - lr: 0.000108 - momentum: 0.000000
2023-10-12 08:00:43,967 epoch 4 - iter 5210/5212 - loss 0.06493695 - time (sec): 1448.32 - samples/sec: 253.64 - lr: 0.000107 - momentum: 0.000000
2023-10-12 08:00:44,441 ----------------------------------------------------------------------------------------------------
2023-10-12 08:00:44,441 EPOCH 4 done: loss 0.0649 - lr: 0.000107
2023-10-12 08:01:24,860 DEV : loss 0.3256777822971344 - f1-score (micro avg) 0.3561
2023-10-12 08:01:24,915 ----------------------------------------------------------------------------------------------------
2023-10-12 08:03:55,582 epoch 5 - iter 521/5212 - loss 0.03739825 - time (sec): 150.66 - samples/sec: 240.43 - lr: 0.000105 - momentum: 0.000000
2023-10-12 08:06:27,648 epoch 5 - iter 1042/5212 - loss 0.04090493 - time (sec): 302.73 - samples/sec: 239.94 - lr: 0.000103 - momentum: 0.000000
2023-10-12 08:08:54,401 epoch 5 - iter 1563/5212 - loss 0.04076288 - time (sec): 449.48 - samples/sec: 237.59 - lr: 0.000101 - momentum: 0.000000
2023-10-12 08:11:24,674 epoch 5 - iter 2084/5212 - loss 0.04217115 - time (sec): 599.76 - samples/sec: 242.15 - lr: 0.000100 - momentum: 0.000000
2023-10-12 08:13:53,967 epoch 5 - iter 2605/5212 - loss 0.04382995 - time (sec): 749.05 - samples/sec: 240.00 - lr: 0.000098 - momentum: 0.000000
2023-10-12 08:16:24,187 epoch 5 - iter 3126/5212 - loss 0.04433151 - time (sec): 899.27 - samples/sec: 241.23 - lr: 0.000096 - momentum: 0.000000
2023-10-12 08:18:53,633 epoch 5 - iter 3647/5212 - loss 0.04289213 - time (sec): 1048.72 - samples/sec: 243.62 - lr: 0.000094 - momentum: 0.000000
2023-10-12 08:21:24,409 epoch 5 - iter 4168/5212 - loss 0.04379848 - time (sec): 1199.49 - samples/sec: 244.00 - lr: 0.000092 - momentum: 0.000000
2023-10-12 08:23:52,868 epoch 5 - iter 4689/5212 - loss 0.04476832 - time (sec): 1347.95 - samples/sec: 243.72 - lr: 0.000091 - momentum: 0.000000
2023-10-12 08:26:28,783 epoch 5 - iter 5210/5212 - loss 0.04555013 - time (sec): 1503.87 - samples/sec: 244.28 - lr: 0.000089 - momentum: 0.000000
2023-10-12 08:26:29,210 ----------------------------------------------------------------------------------------------------
2023-10-12 08:26:29,211 EPOCH 5 done: loss 0.0456 - lr: 0.000089
2023-10-12 08:27:10,531 DEV : loss 0.263163298368454 - f1-score (micro avg) 0.4003
2023-10-12 08:27:10,587 saving best model
2023-10-12 08:27:13,284 ----------------------------------------------------------------------------------------------------
2023-10-12 08:29:48,146 epoch 6 - iter 521/5212 - loss 0.03123258 - time (sec): 154.86 - samples/sec: 246.96 - lr: 0.000087 - momentum: 0.000000
2023-10-12 08:32:19,404 epoch 6 - iter 1042/5212 - loss 0.02842938 - time (sec): 306.12 - samples/sec: 248.85 - lr: 0.000085 - momentum: 0.000000
2023-10-12 08:34:49,046 epoch 6 - iter 1563/5212 - loss 0.02734753 - time (sec): 455.76 - samples/sec: 245.42 - lr: 0.000084 - momentum: 0.000000
2023-10-12 08:37:18,531 epoch 6 - iter 2084/5212 - loss 0.02823876 - time (sec): 605.24 - samples/sec: 243.11 - lr: 0.000082 - momentum: 0.000000
2023-10-12 08:39:53,567 epoch 6 - iter 2605/5212 - loss 0.02786838 - time (sec): 760.28 - samples/sec: 245.30 - lr: 0.000080 - momentum: 0.000000
2023-10-12 08:42:25,382 epoch 6 - iter 3126/5212 - loss 0.02839744 - time (sec): 912.09 - samples/sec: 245.74 - lr: 0.000078 - momentum: 0.000000
2023-10-12 08:44:57,149 epoch 6 - iter 3647/5212 - loss 0.02951283 - time (sec): 1063.86 - samples/sec: 244.21 - lr: 0.000076 - momentum: 0.000000
2023-10-12 08:47:28,472 epoch 6 - iter 4168/5212 - loss 0.02979875 - time (sec): 1215.18 - samples/sec: 243.16 - lr: 0.000075 - momentum: 0.000000
2023-10-12 08:50:03,415 epoch 6 - iter 4689/5212 - loss 0.03174850 - time (sec): 1370.13 - samples/sec: 242.29 - lr: 0.000073 - momentum: 0.000000
2023-10-12 08:52:35,553 epoch 6 - iter 5210/5212 - loss 0.03173940 - time (sec): 1522.26 - samples/sec: 241.32 - lr: 0.000071 - momentum: 0.000000
2023-10-12 08:52:36,019 ----------------------------------------------------------------------------------------------------
2023-10-12 08:52:36,019 EPOCH 6 done: loss 0.0317 - lr: 0.000071
2023-10-12 08:53:17,512 DEV : loss 0.38352152705192566 - f1-score (micro avg) 0.3891
2023-10-12 08:53:17,573 ----------------------------------------------------------------------------------------------------
2023-10-12 08:55:50,278 epoch 7 - iter 521/5212 - loss 0.02027192 - time (sec): 152.70 - samples/sec: 240.26 - lr: 0.000069 - momentum: 0.000000
2023-10-12 08:58:23,318 epoch 7 - iter 1042/5212 - loss 0.02172861 - time (sec): 305.74 - samples/sec: 250.12 - lr: 0.000068 - momentum: 0.000000
2023-10-12 09:00:54,230 epoch 7 - iter 1563/5212 - loss 0.02271727 - time (sec): 456.65 - samples/sec: 244.80 - lr: 0.000066 - momentum: 0.000000
2023-10-12 09:03:26,584 epoch 7 - iter 2084/5212 - loss 0.02225368 - time (sec): 609.01 - samples/sec: 245.02 - lr: 0.000064 - momentum: 0.000000
2023-10-12 09:05:56,723 epoch 7 - iter 2605/5212 - loss 0.02220231 - time (sec): 759.15 - samples/sec: 244.06 - lr: 0.000062 - momentum: 0.000000
2023-10-12 09:08:30,667 epoch 7 - iter 3126/5212 - loss 0.02190183 - time (sec): 913.09 - samples/sec: 244.41 - lr: 0.000060 - momentum: 0.000000
2023-10-12 09:11:01,507 epoch 7 - iter 3647/5212 - loss 0.02278464 - time (sec): 1063.93 - samples/sec: 242.59 - lr: 0.000059 - momentum: 0.000000
2023-10-12 09:13:32,093 epoch 7 - iter 4168/5212 - loss 0.02211897 - time (sec): 1214.52 - samples/sec: 243.57 - lr: 0.000057 - momentum: 0.000000
2023-10-12 09:16:02,927 epoch 7 - iter 4689/5212 - loss 0.02201655 - time (sec): 1365.35 - samples/sec: 242.68 - lr: 0.000055 - momentum: 0.000000
2023-10-12 09:18:33,963 epoch 7 - iter 5210/5212 - loss 0.02167740 - time (sec): 1516.39 - samples/sec: 242.23 - lr: 0.000053 - momentum: 0.000000
2023-10-12 09:18:34,449 ----------------------------------------------------------------------------------------------------
2023-10-12 09:18:34,449 EPOCH 7 done: loss 0.0217 - lr: 0.000053
2023-10-12 09:19:15,575 DEV : loss 0.4565373957157135 - f1-score (micro avg) 0.3855
2023-10-12 09:19:15,632 ----------------------------------------------------------------------------------------------------
2023-10-12 09:21:49,016 epoch 8 - iter 521/5212 - loss 0.01518849 - time (sec): 153.38 - samples/sec: 242.58 - lr: 0.000052 - momentum: 0.000000
2023-10-12 09:24:21,066 epoch 8 - iter 1042/5212 - loss 0.01802269 - time (sec): 305.43 - samples/sec: 249.30 - lr: 0.000050 - momentum: 0.000000
2023-10-12 09:26:55,977 epoch 8 - iter 1563/5212 - loss 0.01788164 - time (sec): 460.34 - samples/sec: 255.33 - lr: 0.000048 - momentum: 0.000000
2023-10-12 09:29:26,001 epoch 8 - iter 2084/5212 - loss 0.01803102 - time (sec): 610.37 - samples/sec: 251.49 - lr: 0.000046 - momentum: 0.000000
2023-10-12 09:31:54,221 epoch 8 - iter 2605/5212 - loss 0.01787651 - time (sec): 758.59 - samples/sec: 248.40 - lr: 0.000044 - momentum: 0.000000
2023-10-12 09:34:22,255 epoch 8 - iter 3126/5212 - loss 0.01738394 - time (sec): 906.62 - samples/sec: 245.47 - lr: 0.000043 - momentum: 0.000000
2023-10-12 09:36:52,062 epoch 8 - iter 3647/5212 - loss 0.01601736 - time (sec): 1056.43 - samples/sec: 243.28 - lr: 0.000041 - momentum: 0.000000
2023-10-12 09:39:21,737 epoch 8 - iter 4168/5212 - loss 0.01630703 - time (sec): 1206.10 - samples/sec: 243.03 - lr: 0.000039 - momentum: 0.000000
2023-10-12 09:41:51,186 epoch 8 - iter 4689/5212 - loss 0.01592637 - time (sec): 1355.55 - samples/sec: 242.64 - lr: 0.000037 - momentum: 0.000000
2023-10-12 09:44:24,639 epoch 8 - iter 5210/5212 - loss 0.01658403 - time (sec): 1509.00 - samples/sec: 243.45 - lr: 0.000036 - momentum: 0.000000
2023-10-12 09:44:25,102 ----------------------------------------------------------------------------------------------------
2023-10-12 09:44:25,102 EPOCH 8 done: loss 0.0166 - lr: 0.000036
2023-10-12 09:45:06,697 DEV : loss 0.47024932503700256 - f1-score (micro avg) 0.3874
2023-10-12 09:45:06,754 ----------------------------------------------------------------------------------------------------
2023-10-12 09:47:40,905 epoch 9 - iter 521/5212 - loss 0.00960047 - time (sec): 154.15 - samples/sec: 251.93 - lr: 0.000034 - momentum: 0.000000
2023-10-12 09:50:19,961 epoch 9 - iter 1042/5212 - loss 0.01249724 - time (sec): 313.20 - samples/sec: 245.66 - lr: 0.000032 - momentum: 0.000000
2023-10-12 09:52:56,352 epoch 9 - iter 1563/5212 - loss 0.01171469 - time (sec): 469.60 - samples/sec: 236.58 - lr: 0.000030 - momentum: 0.000000
2023-10-12 09:55:26,979 epoch 9 - iter 2084/5212 - loss 0.01188106 - time (sec): 620.22 - samples/sec: 235.40 - lr: 0.000028 - momentum: 0.000000
2023-10-12 09:57:57,662 epoch 9 - iter 2605/5212 - loss 0.01152806 - time (sec): 770.90 - samples/sec: 237.92 - lr: 0.000027 - momentum: 0.000000
2023-10-12 10:00:21,367 epoch 9 - iter 3126/5212 - loss 0.01157546 - time (sec): 914.61 - samples/sec: 240.46 - lr: 0.000025 - momentum: 0.000000
2023-10-12 10:02:55,245 epoch 9 - iter 3647/5212 - loss 0.01097260 - time (sec): 1068.49 - samples/sec: 242.57 - lr: 0.000023 - momentum: 0.000000
2023-10-12 10:05:24,412 epoch 9 - iter 4168/5212 - loss 0.01044367 - time (sec): 1217.65 - samples/sec: 241.16 - lr: 0.000021 - momentum: 0.000000
2023-10-12 10:07:55,416 epoch 9 - iter 4689/5212 - loss 0.01056193 - time (sec): 1368.66 - samples/sec: 241.20 - lr: 0.000020 - momentum: 0.000000
2023-10-12 10:10:32,837 epoch 9 - iter 5210/5212 - loss 0.01115296 - time (sec): 1526.08 - samples/sec: 240.73 - lr: 0.000018 - momentum: 0.000000
2023-10-12 10:10:33,340 ----------------------------------------------------------------------------------------------------
2023-10-12 10:10:33,340 EPOCH 9 done: loss 0.0112 - lr: 0.000018
2023-10-12 10:11:14,544 DEV : loss 0.46671026945114136 - f1-score (micro avg) 0.3997
2023-10-12 10:11:14,602 ----------------------------------------------------------------------------------------------------
2023-10-12 10:13:42,802 epoch 10 - iter 521/5212 - loss 0.00611141 - time (sec): 148.20 - samples/sec: 241.15 - lr: 0.000016 - momentum: 0.000000
2023-10-12 10:16:14,619 epoch 10 - iter 1042/5212 - loss 0.00568024 - time (sec): 300.02 - samples/sec: 240.82 - lr: 0.000014 - momentum: 0.000000
2023-10-12 10:18:46,930 epoch 10 - iter 1563/5212 - loss 0.00566509 - time (sec): 452.33 - samples/sec: 245.10 - lr: 0.000012 - momentum: 0.000000
2023-10-12 10:21:16,717 epoch 10 - iter 2084/5212 - loss 0.00562846 - time (sec): 602.11 - samples/sec: 244.14 - lr: 0.000011 - momentum: 0.000000
2023-10-12 10:23:46,749 epoch 10 - iter 2605/5212 - loss 0.00614487 - time (sec): 752.15 - samples/sec: 243.23 - lr: 0.000009 - momentum: 0.000000
2023-10-12 10:26:18,183 epoch 10 - iter 3126/5212 - loss 0.00606442 - time (sec): 903.58 - samples/sec: 242.10 - lr: 0.000007 - momentum: 0.000000
2023-10-12 10:28:50,844 epoch 10 - iter 3647/5212 - loss 0.00629741 - time (sec): 1056.24 - samples/sec: 242.68 - lr: 0.000005 - momentum: 0.000000
2023-10-12 10:31:25,229 epoch 10 - iter 4168/5212 - loss 0.00581098 - time (sec): 1210.62 - samples/sec: 243.91 - lr: 0.000004 - momentum: 0.000000
2023-10-12 10:33:56,605 epoch 10 - iter 4689/5212 - loss 0.00597572 - time (sec): 1362.00 - samples/sec: 243.59 - lr: 0.000002 - momentum: 0.000000
2023-10-12 10:36:29,666 epoch 10 - iter 5210/5212 - loss 0.00617301 - time (sec): 1515.06 - samples/sec: 242.46 - lr: 0.000000 - momentum: 0.000000
2023-10-12 10:36:30,159 ----------------------------------------------------------------------------------------------------
2023-10-12 10:36:30,160 EPOCH 10 done: loss 0.0062 - lr: 0.000000
2023-10-12 10:37:12,484 DEV : loss 0.4969789683818817 - f1-score (micro avg) 0.4007
2023-10-12 10:37:12,548 saving best model
2023-10-12 10:37:16,211 ----------------------------------------------------------------------------------------------------
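The per-epoch dev scores explain why "saving best model" appears only after epochs 1, 2, 3, 5 and 10: a checkpoint is written only when the dev micro-F1 improves on the best seen so far. A sketch of that selection logic over the scores logged above (plain Python; the dict mirrors the DEV lines, not Flair's internal state):

```python
# Dev micro-F1 per epoch, copied from the DEV lines in the log above
dev_f1 = {1: 0.2683, 2: 0.3506, 3: 0.3658, 4: 0.3561, 5: 0.4003,
          6: 0.3891, 7: 0.3855, 8: 0.3874, 9: 0.3997, 10: 0.4007}

best, saved = float("-inf"), []
for epoch, score in dev_f1.items():
    if score > best:            # "saving best model" fires only on improvement
        best, saved = score, saved + [epoch]

print(saved)  # epochs where best-model.pt was rewritten
print(best)   # final best dev micro-F1
```

Note the final evaluation therefore uses the epoch-10 checkpoint (dev micro-F1 0.4007), even though dev loss kept rising after epoch 2.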
2023-10-12 10:37:16,213 Loading model from best epoch ...
2023-10-12 10:37:20,237 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-12 10:39:01,582
Results:
- F-score (micro) 0.472
- F-score (macro) 0.3229
- Accuracy 0.3142

By class:
              precision    recall  f1-score   support

         LOC     0.4978    0.5601    0.5271      1214
         PER     0.4233    0.5297    0.4706       808
         ORG     0.2930    0.2946    0.2938       353
   HumanProd     0.0000    0.0000    0.0000        15

   micro avg     0.4414    0.5071    0.4720      2390
   macro avg     0.3035    0.3461    0.3229      2390
weighted avg     0.4393    0.5071    0.4702      2390
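The three averages in the table follow the usual conventions: micro-F1 is the harmonic mean of pooled precision and recall, macro-F1 is the unweighted mean of the per-class F1 scores, and weighted-F1 is the support-weighted mean. A quick consistency check against the table's own numbers (plain Python, values copied from the rows above):

```python
# (precision, recall, f1, support) per class, from the table above
classes = {
    "LOC":       (0.4978, 0.5601, 0.5271, 1214),
    "PER":       (0.4233, 0.5297, 0.4706,  808),
    "ORG":       (0.2930, 0.2946, 0.2938,  353),
    "HumanProd": (0.0000, 0.0000, 0.0000,   15),
}
total_support = sum(s for *_, s in classes.values())

# Micro F1: harmonic mean of the pooled precision and recall
p, r = 0.4414, 0.5071
micro_f1 = 2 * p * r / (p + r)                       # -> ~0.4720

# Macro F1: unweighted mean; weighted F1: support-weighted mean
macro_f1 = sum(f1 for _, _, f1, _ in classes.values()) / len(classes)
weighted_f1 = sum(f1 * s for _, _, f1, s in classes.values()) / total_support

print(micro_f1, macro_f1, weighted_f1)
```

All three reproduce the table's micro (0.4720), macro (0.3229) and weighted (0.4702) rows to four decimals, so the report is internally consistent; the low macro average is driven by the zero-support-hit HumanProd class.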
2023-10-12 10:39:01,583 ----------------------------------------------------------------------------------------------------