|
2023-10-10 22:19:18,056 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 22:19:18,058 Model: "SequenceTagger( |
|
(embeddings): ByT5Embeddings( |
|
(model): T5EncoderModel( |
|
(shared): Embedding(384, 1472) |
|
(encoder): T5Stack( |
|
(embed_tokens): Embedding(384, 1472) |
|
(block): ModuleList( |
|
(0): T5Block( |
|
(layer): ModuleList( |
|
(0): T5LayerSelfAttention( |
|
(SelfAttention): T5Attention( |
|
(q): Linear(in_features=1472, out_features=384, bias=False) |
|
(k): Linear(in_features=1472, out_features=384, bias=False) |
|
(v): Linear(in_features=1472, out_features=384, bias=False) |
|
(o): Linear(in_features=384, out_features=1472, bias=False) |
|
(relative_attention_bias): Embedding(32, 6) |
|
) |
|
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
) |
|
(1): T5LayerFF( |
|
(DenseReluDense): T5DenseGatedActDense( |
|
(wi_0): Linear(in_features=1472, out_features=3584, bias=False) |
|
(wi_1): Linear(in_features=1472, out_features=3584, bias=False) |
|
(wo): Linear(in_features=3584, out_features=1472, bias=False) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
(act): NewGELUActivation() |
|
) |
|
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
) |
|
) |
|
) |
|
(1-11): 11 x T5Block( |
|
(layer): ModuleList( |
|
(0): T5LayerSelfAttention( |
|
(SelfAttention): T5Attention( |
|
(q): Linear(in_features=1472, out_features=384, bias=False) |
|
(k): Linear(in_features=1472, out_features=384, bias=False) |
|
(v): Linear(in_features=1472, out_features=384, bias=False) |
|
(o): Linear(in_features=384, out_features=1472, bias=False) |
|
) |
|
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
) |
|
(1): T5LayerFF( |
|
(DenseReluDense): T5DenseGatedActDense( |
|
(wi_0): Linear(in_features=1472, out_features=3584, bias=False) |
|
(wi_1): Linear(in_features=1472, out_features=3584, bias=False) |
|
(wo): Linear(in_features=3584, out_features=1472, bias=False) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
(act): NewGELUActivation() |
|
) |
|
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
) |
|
) |
|
) |
|
) |
|
(final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
) |
|
) |
|
) |
|
(locked_dropout): LockedDropout(p=0.5) |
|
(linear): Linear(in_features=1472, out_features=17, bias=True) |
|
(loss_function): CrossEntropyLoss() |
|
)" |
|
2023-10-10 22:19:18,059 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 22:19:18,059 MultiCorpus: 1166 train + 165 dev + 415 test sentences |
|
- NER_HIPE_2022 Corpus: 1166 train + 165 dev + 415 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fi/with_doc_seperator |
|
2023-10-10 22:19:18,059 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 22:19:18,059 Train: 1166 sentences |
|
2023-10-10 22:19:18,059 (train_with_dev=False, train_with_test=False) |
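
The split sizes reported here can be checked directly from the loader. Same assumed arguments as in the sketch above; add_document_separator is an assumption meant to mirror the with_doc_seperator cache directory.

    from flair.datasets import NER_HIPE_2022

    corpus = NER_HIPE_2022(dataset_name="newseye", language="fi", add_document_separator=True)
    print(corpus)  # summary line similar to the one logged above
    print(len(corpus.train), len(corpus.dev), len(corpus.test))  # expected: 1166 165 415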
|
2023-10-10 22:19:18,059 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 22:19:18,059 Training Params: |
|
2023-10-10 22:19:18,059 - learning_rate: "0.00015" |
|
2023-10-10 22:19:18,059 - mini_batch_size: "4" |
|
2023-10-10 22:19:18,059 - max_epochs: "10" |
|
2023-10-10 22:19:18,060 - shuffle: "True" |
|
2023-10-10 22:19:18,060 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 22:19:18,060 Plugins: |
|
2023-10-10 22:19:18,060 - TensorboardLogger |
|
2023-10-10 22:19:18,060 - LinearScheduler | warmup_fraction: '0.1' |
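
Taken together, the parameter and plugin sections correspond to a fine-tuning call along these lines. This is a sketch assuming Flair's ModelTrainer.fine_tune API, with tagger and corpus as in the sketches above; how the LinearScheduler warmup (warmup_fraction 0.1) and the TensorboardLogger were attached is not shown in this log, so they are left out here.

    from flair.trainers import ModelTrainer

    trainer = ModelTrainer(tagger, corpus)  # tagger and corpus from the sketches above

    trainer.fine_tune(
        "hmbench-newseye/fi-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-1",
        learning_rate=0.00015,
        mini_batch_size=4,
        max_epochs=10,
        # shuffle=True and the 0.1 warmup fraction reported above are the values logged for this run
    )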
|
2023-10-10 22:19:18,060 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 22:19:18,060 Final evaluation on model from best epoch (best-model.pt) |
|
2023-10-10 22:19:18,060 - metric: "('micro avg', 'f1-score')" |
|
2023-10-10 22:19:18,060 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 22:19:18,060 Computation: |
|
2023-10-10 22:19:18,060 - compute on device: cuda:0 |
|
2023-10-10 22:19:18,060 - embedding storage: none |
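
For completeness, the compute device is pinned by setting flair.device before training; a minimal sketch:

    import torch
    import flair

    # Run all Flair computation on the first GPU, matching "compute on device: cuda:0" above.
    # "embedding storage: none" corresponds to embeddings_storage_mode="none" in the train/fine_tune call.
    flair.device = torch.device("cuda:0")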
|
2023-10-10 22:19:18,060 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 22:19:18,060 Model training base path: "hmbench-newseye/fi-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-1" |
|
2023-10-10 22:19:18,060 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 22:19:18,061 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 22:19:18,061 Logging anything other than scalars to TensorBoard is currently not supported. |
|
2023-10-10 22:19:28,613 epoch 1 - iter 29/292 - loss 2.82926505 - time (sec): 10.55 - samples/sec: 484.14 - lr: 0.000014 - momentum: 0.000000 |
|
2023-10-10 22:19:38,229 epoch 1 - iter 58/292 - loss 2.82094593 - time (sec): 20.17 - samples/sec: 452.45 - lr: 0.000029 - momentum: 0.000000 |
|
2023-10-10 22:19:49,440 epoch 1 - iter 87/292 - loss 2.79891280 - time (sec): 31.38 - samples/sec: 466.73 - lr: 0.000044 - momentum: 0.000000 |
|
2023-10-10 22:19:59,444 epoch 1 - iter 116/292 - loss 2.76576425 - time (sec): 41.38 - samples/sec: 450.64 - lr: 0.000059 - momentum: 0.000000 |
|
2023-10-10 22:20:08,988 epoch 1 - iter 145/292 - loss 2.69583102 - time (sec): 50.93 - samples/sec: 442.63 - lr: 0.000074 - momentum: 0.000000 |
|
2023-10-10 22:20:18,043 epoch 1 - iter 174/292 - loss 2.60800245 - time (sec): 59.98 - samples/sec: 438.04 - lr: 0.000089 - momentum: 0.000000 |
|
2023-10-10 22:20:27,984 epoch 1 - iter 203/292 - loss 2.47632788 - time (sec): 69.92 - samples/sec: 444.58 - lr: 0.000104 - momentum: 0.000000 |
|
2023-10-10 22:20:37,938 epoch 1 - iter 232/292 - loss 2.35519679 - time (sec): 79.88 - samples/sec: 447.56 - lr: 0.000119 - momentum: 0.000000 |
|
2023-10-10 22:20:47,347 epoch 1 - iter 261/292 - loss 2.23758720 - time (sec): 89.28 - samples/sec: 447.63 - lr: 0.000134 - momentum: 0.000000 |
|
2023-10-10 22:20:56,952 epoch 1 - iter 290/292 - loss 2.12289055 - time (sec): 98.89 - samples/sec: 447.60 - lr: 0.000148 - momentum: 0.000000 |
|
2023-10-10 22:20:57,414 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 22:20:57,414 EPOCH 1 done: loss 2.1182 - lr: 0.000148 |
|
2023-10-10 22:21:02,854 DEV : loss 0.7325010299682617 - f1-score (micro avg) 0.0 |
|
2023-10-10 22:21:02,865 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 22:21:13,161 epoch 2 - iter 29/292 - loss 0.79297456 - time (sec): 10.29 - samples/sec: 454.36 - lr: 0.000148 - momentum: 0.000000 |
|
2023-10-10 22:21:23,229 epoch 2 - iter 58/292 - loss 0.82708352 - time (sec): 20.36 - samples/sec: 451.29 - lr: 0.000147 - momentum: 0.000000 |
|
2023-10-10 22:21:33,413 epoch 2 - iter 87/292 - loss 0.72111522 - time (sec): 30.55 - samples/sec: 448.51 - lr: 0.000145 - momentum: 0.000000 |
|
2023-10-10 22:21:43,537 epoch 2 - iter 116/292 - loss 0.64880272 - time (sec): 40.67 - samples/sec: 454.17 - lr: 0.000143 - momentum: 0.000000 |
|
2023-10-10 22:21:54,263 epoch 2 - iter 145/292 - loss 0.61803861 - time (sec): 51.40 - samples/sec: 458.36 - lr: 0.000142 - momentum: 0.000000 |
|
2023-10-10 22:22:03,618 epoch 2 - iter 174/292 - loss 0.57992157 - time (sec): 60.75 - samples/sec: 455.90 - lr: 0.000140 - momentum: 0.000000 |
|
2023-10-10 22:22:13,432 epoch 2 - iter 203/292 - loss 0.56502344 - time (sec): 70.56 - samples/sec: 447.15 - lr: 0.000138 - momentum: 0.000000 |
|
2023-10-10 22:22:23,089 epoch 2 - iter 232/292 - loss 0.54830154 - time (sec): 80.22 - samples/sec: 444.07 - lr: 0.000137 - momentum: 0.000000 |
|
2023-10-10 22:22:33,044 epoch 2 - iter 261/292 - loss 0.52602966 - time (sec): 90.18 - samples/sec: 444.78 - lr: 0.000135 - momentum: 0.000000 |
|
2023-10-10 22:22:43,474 epoch 2 - iter 290/292 - loss 0.52486245 - time (sec): 100.61 - samples/sec: 441.05 - lr: 0.000134 - momentum: 0.000000 |
|
2023-10-10 22:22:43,929 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 22:22:43,930 EPOCH 2 done: loss 0.5243 - lr: 0.000134 |
|
2023-10-10 22:22:49,985 DEV : loss 0.30209726095199585 - f1-score (micro avg) 0.1084 |
|
2023-10-10 22:22:49,994 saving best model |
|
2023-10-10 22:22:50,936 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 22:23:00,714 epoch 3 - iter 29/292 - loss 0.38517526 - time (sec): 9.77 - samples/sec: 379.75 - lr: 0.000132 - momentum: 0.000000 |
|
2023-10-10 22:23:11,354 epoch 3 - iter 58/292 - loss 0.30206939 - time (sec): 20.41 - samples/sec: 435.28 - lr: 0.000130 - momentum: 0.000000 |
|
2023-10-10 22:23:21,441 epoch 3 - iter 87/292 - loss 0.33227764 - time (sec): 30.50 - samples/sec: 429.17 - lr: 0.000128 - momentum: 0.000000 |
|
2023-10-10 22:23:31,122 epoch 3 - iter 116/292 - loss 0.32587409 - time (sec): 40.18 - samples/sec: 425.78 - lr: 0.000127 - momentum: 0.000000 |
|
2023-10-10 22:23:40,829 epoch 3 - iter 145/292 - loss 0.32100867 - time (sec): 49.89 - samples/sec: 432.90 - lr: 0.000125 - momentum: 0.000000 |
|
2023-10-10 22:23:50,144 epoch 3 - iter 174/292 - loss 0.33023055 - time (sec): 59.20 - samples/sec: 432.40 - lr: 0.000123 - momentum: 0.000000 |
|
2023-10-10 22:24:02,391 epoch 3 - iter 203/292 - loss 0.33747240 - time (sec): 71.45 - samples/sec: 435.51 - lr: 0.000122 - momentum: 0.000000 |
|
2023-10-10 22:24:13,507 epoch 3 - iter 232/292 - loss 0.33787885 - time (sec): 82.57 - samples/sec: 432.12 - lr: 0.000120 - momentum: 0.000000 |
|
2023-10-10 22:24:24,624 epoch 3 - iter 261/292 - loss 0.33054624 - time (sec): 93.68 - samples/sec: 432.20 - lr: 0.000119 - momentum: 0.000000 |
|
2023-10-10 22:24:34,917 epoch 3 - iter 290/292 - loss 0.32750486 - time (sec): 103.98 - samples/sec: 426.04 - lr: 0.000117 - momentum: 0.000000 |
|
2023-10-10 22:24:35,419 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 22:24:35,419 EPOCH 3 done: loss 0.3268 - lr: 0.000117 |
|
2023-10-10 22:24:41,411 DEV : loss 0.24043378233909607 - f1-score (micro avg) 0.2967 |
|
2023-10-10 22:24:41,424 saving best model |
|
2023-10-10 22:24:50,241 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 22:25:01,073 epoch 4 - iter 29/292 - loss 0.23406478 - time (sec): 10.82 - samples/sec: 462.15 - lr: 0.000115 - momentum: 0.000000 |
|
2023-10-10 22:25:11,188 epoch 4 - iter 58/292 - loss 0.22046035 - time (sec): 20.94 - samples/sec: 435.43 - lr: 0.000113 - momentum: 0.000000 |
|
2023-10-10 22:25:22,136 epoch 4 - iter 87/292 - loss 0.26702786 - time (sec): 31.89 - samples/sec: 439.90 - lr: 0.000112 - momentum: 0.000000 |
|
2023-10-10 22:25:32,876 epoch 4 - iter 116/292 - loss 0.27026625 - time (sec): 42.63 - samples/sec: 431.15 - lr: 0.000110 - momentum: 0.000000 |
|
2023-10-10 22:25:43,367 epoch 4 - iter 145/292 - loss 0.26372476 - time (sec): 53.12 - samples/sec: 430.58 - lr: 0.000108 - momentum: 0.000000 |
|
2023-10-10 22:25:54,534 epoch 4 - iter 174/292 - loss 0.26533000 - time (sec): 64.28 - samples/sec: 421.48 - lr: 0.000107 - momentum: 0.000000 |
|
2023-10-10 22:26:05,644 epoch 4 - iter 203/292 - loss 0.25791821 - time (sec): 75.39 - samples/sec: 420.71 - lr: 0.000105 - momentum: 0.000000 |
|
2023-10-10 22:26:16,097 epoch 4 - iter 232/292 - loss 0.25332297 - time (sec): 85.85 - samples/sec: 420.57 - lr: 0.000104 - momentum: 0.000000 |
|
2023-10-10 22:26:25,223 epoch 4 - iter 261/292 - loss 0.25035247 - time (sec): 94.97 - samples/sec: 418.14 - lr: 0.000102 - momentum: 0.000000 |
|
2023-10-10 22:26:34,882 epoch 4 - iter 290/292 - loss 0.24508743 - time (sec): 104.63 - samples/sec: 422.74 - lr: 0.000100 - momentum: 0.000000 |
|
2023-10-10 22:26:35,337 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 22:26:35,337 EPOCH 4 done: loss 0.2441 - lr: 0.000100 |
|
2023-10-10 22:26:41,068 DEV : loss 0.1876380443572998 - f1-score (micro avg) 0.547 |
|
2023-10-10 22:26:41,079 saving best model |
|
2023-10-10 22:26:48,720 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 22:26:58,116 epoch 5 - iter 29/292 - loss 0.20545870 - time (sec): 9.39 - samples/sec: 462.55 - lr: 0.000098 - momentum: 0.000000 |
|
2023-10-10 22:27:08,281 epoch 5 - iter 58/292 - loss 0.22599380 - time (sec): 19.56 - samples/sec: 478.78 - lr: 0.000097 - momentum: 0.000000 |
|
2023-10-10 22:27:17,552 epoch 5 - iter 87/292 - loss 0.21763180 - time (sec): 28.83 - samples/sec: 471.25 - lr: 0.000095 - momentum: 0.000000 |
|
2023-10-10 22:27:26,972 epoch 5 - iter 116/292 - loss 0.18956745 - time (sec): 38.25 - samples/sec: 468.52 - lr: 0.000093 - momentum: 0.000000 |
|
2023-10-10 22:27:36,660 epoch 5 - iter 145/292 - loss 0.18521650 - time (sec): 47.93 - samples/sec: 474.60 - lr: 0.000092 - momentum: 0.000000 |
|
2023-10-10 22:27:45,741 epoch 5 - iter 174/292 - loss 0.17898882 - time (sec): 57.02 - samples/sec: 464.50 - lr: 0.000090 - momentum: 0.000000 |
|
2023-10-10 22:27:56,010 epoch 5 - iter 203/292 - loss 0.17521317 - time (sec): 67.29 - samples/sec: 467.17 - lr: 0.000089 - momentum: 0.000000 |
|
2023-10-10 22:28:06,581 epoch 5 - iter 232/292 - loss 0.17531990 - time (sec): 77.86 - samples/sec: 462.76 - lr: 0.000087 - momentum: 0.000000 |
|
2023-10-10 22:28:16,357 epoch 5 - iter 261/292 - loss 0.17062989 - time (sec): 87.63 - samples/sec: 453.22 - lr: 0.000085 - momentum: 0.000000 |
|
2023-10-10 22:28:26,404 epoch 5 - iter 290/292 - loss 0.16869967 - time (sec): 97.68 - samples/sec: 454.22 - lr: 0.000084 - momentum: 0.000000 |
|
2023-10-10 22:28:26,797 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 22:28:26,798 EPOCH 5 done: loss 0.1685 - lr: 0.000084 |
|
2023-10-10 22:28:32,756 DEV : loss 0.15193822979927063 - f1-score (micro avg) 0.674 |
|
2023-10-10 22:28:32,766 saving best model |
|
2023-10-10 22:28:40,480 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 22:28:51,531 epoch 6 - iter 29/292 - loss 0.13346566 - time (sec): 11.05 - samples/sec: 486.42 - lr: 0.000082 - momentum: 0.000000 |
|
2023-10-10 22:29:00,612 epoch 6 - iter 58/292 - loss 0.13900519 - time (sec): 20.13 - samples/sec: 452.48 - lr: 0.000080 - momentum: 0.000000 |
|
2023-10-10 22:29:10,836 epoch 6 - iter 87/292 - loss 0.12975395 - time (sec): 30.35 - samples/sec: 458.54 - lr: 0.000078 - momentum: 0.000000 |
|
2023-10-10 22:29:20,807 epoch 6 - iter 116/292 - loss 0.12347829 - time (sec): 40.32 - samples/sec: 458.73 - lr: 0.000077 - momentum: 0.000000 |
|
2023-10-10 22:29:30,877 epoch 6 - iter 145/292 - loss 0.12149791 - time (sec): 50.39 - samples/sec: 454.71 - lr: 0.000075 - momentum: 0.000000 |
|
2023-10-10 22:29:40,980 epoch 6 - iter 174/292 - loss 0.12536062 - time (sec): 60.50 - samples/sec: 448.45 - lr: 0.000074 - momentum: 0.000000 |
|
2023-10-10 22:29:50,814 epoch 6 - iter 203/292 - loss 0.12241815 - time (sec): 70.33 - samples/sec: 438.94 - lr: 0.000072 - momentum: 0.000000 |
|
2023-10-10 22:30:00,514 epoch 6 - iter 232/292 - loss 0.12167750 - time (sec): 80.03 - samples/sec: 437.37 - lr: 0.000070 - momentum: 0.000000 |
|
2023-10-10 22:30:10,869 epoch 6 - iter 261/292 - loss 0.12098743 - time (sec): 90.39 - samples/sec: 440.57 - lr: 0.000069 - momentum: 0.000000 |
|
2023-10-10 22:30:20,772 epoch 6 - iter 290/292 - loss 0.11981467 - time (sec): 100.29 - samples/sec: 439.95 - lr: 0.000067 - momentum: 0.000000 |
|
2023-10-10 22:30:21,347 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 22:30:21,348 EPOCH 6 done: loss 0.1204 - lr: 0.000067 |
|
2023-10-10 22:30:27,448 DEV : loss 0.13653677701950073 - f1-score (micro avg) 0.6985 |
|
2023-10-10 22:30:27,458 saving best model |
|
2023-10-10 22:30:35,269 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 22:30:45,651 epoch 7 - iter 29/292 - loss 0.10364982 - time (sec): 10.38 - samples/sec: 472.45 - lr: 0.000065 - momentum: 0.000000 |
|
2023-10-10 22:30:54,861 epoch 7 - iter 58/292 - loss 0.09344940 - time (sec): 19.59 - samples/sec: 444.41 - lr: 0.000063 - momentum: 0.000000 |
|
2023-10-10 22:31:04,398 epoch 7 - iter 87/292 - loss 0.09750181 - time (sec): 29.12 - samples/sec: 450.75 - lr: 0.000062 - momentum: 0.000000 |
|
2023-10-10 22:31:14,866 epoch 7 - iter 116/292 - loss 0.08693134 - time (sec): 39.59 - samples/sec: 465.92 - lr: 0.000060 - momentum: 0.000000 |
|
2023-10-10 22:31:23,789 epoch 7 - iter 145/292 - loss 0.08318189 - time (sec): 48.52 - samples/sec: 463.39 - lr: 0.000059 - momentum: 0.000000 |
|
2023-10-10 22:31:33,095 epoch 7 - iter 174/292 - loss 0.09163479 - time (sec): 57.82 - samples/sec: 457.97 - lr: 0.000057 - momentum: 0.000000 |
|
2023-10-10 22:31:42,534 epoch 7 - iter 203/292 - loss 0.08993163 - time (sec): 67.26 - samples/sec: 454.05 - lr: 0.000055 - momentum: 0.000000 |
|
2023-10-10 22:31:52,643 epoch 7 - iter 232/292 - loss 0.08882304 - time (sec): 77.37 - samples/sec: 458.73 - lr: 0.000054 - momentum: 0.000000 |
|
2023-10-10 22:32:02,192 epoch 7 - iter 261/292 - loss 0.08995054 - time (sec): 86.92 - samples/sec: 458.07 - lr: 0.000052 - momentum: 0.000000 |
|
2023-10-10 22:32:11,752 epoch 7 - iter 290/292 - loss 0.09260519 - time (sec): 96.48 - samples/sec: 458.64 - lr: 0.000050 - momentum: 0.000000 |
|
2023-10-10 22:32:12,222 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 22:32:12,222 EPOCH 7 done: loss 0.0929 - lr: 0.000050 |
|
2023-10-10 22:32:18,044 DEV : loss 0.13227997720241547 - f1-score (micro avg) 0.7339 |
|
2023-10-10 22:32:18,055 saving best model |
|
2023-10-10 22:32:26,797 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 22:32:37,310 epoch 8 - iter 29/292 - loss 0.07389758 - time (sec): 10.51 - samples/sec: 481.09 - lr: 0.000048 - momentum: 0.000000 |
|
2023-10-10 22:32:47,318 epoch 8 - iter 58/292 - loss 0.07781066 - time (sec): 20.52 - samples/sec: 483.16 - lr: 0.000047 - momentum: 0.000000 |
|
2023-10-10 22:32:57,022 epoch 8 - iter 87/292 - loss 0.07989627 - time (sec): 30.22 - samples/sec: 476.35 - lr: 0.000045 - momentum: 0.000000 |
|
2023-10-10 22:33:05,754 epoch 8 - iter 116/292 - loss 0.07567283 - time (sec): 38.95 - samples/sec: 463.02 - lr: 0.000044 - momentum: 0.000000 |
|
2023-10-10 22:33:15,391 epoch 8 - iter 145/292 - loss 0.07437945 - time (sec): 48.59 - samples/sec: 469.54 - lr: 0.000042 - momentum: 0.000000 |
|
2023-10-10 22:33:24,636 epoch 8 - iter 174/292 - loss 0.07758936 - time (sec): 57.84 - samples/sec: 466.94 - lr: 0.000040 - momentum: 0.000000 |
|
2023-10-10 22:33:33,871 epoch 8 - iter 203/292 - loss 0.07636403 - time (sec): 67.07 - samples/sec: 460.98 - lr: 0.000039 - momentum: 0.000000 |
|
2023-10-10 22:33:44,131 epoch 8 - iter 232/292 - loss 0.07443540 - time (sec): 77.33 - samples/sec: 462.34 - lr: 0.000037 - momentum: 0.000000 |
|
2023-10-10 22:33:53,435 epoch 8 - iter 261/292 - loss 0.07286737 - time (sec): 86.63 - samples/sec: 458.54 - lr: 0.000035 - momentum: 0.000000 |
|
2023-10-10 22:34:03,445 epoch 8 - iter 290/292 - loss 0.07492659 - time (sec): 96.64 - samples/sec: 458.25 - lr: 0.000034 - momentum: 0.000000 |
|
2023-10-10 22:34:03,886 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 22:34:03,886 EPOCH 8 done: loss 0.0749 - lr: 0.000034 |
|
2023-10-10 22:34:09,835 DEV : loss 0.1255185902118683 - f1-score (micro avg) 0.7106 |
|
2023-10-10 22:34:09,844 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 22:34:20,242 epoch 9 - iter 29/292 - loss 0.05795037 - time (sec): 10.40 - samples/sec: 448.34 - lr: 0.000032 - momentum: 0.000000 |
|
2023-10-10 22:34:30,842 epoch 9 - iter 58/292 - loss 0.05725661 - time (sec): 21.00 - samples/sec: 446.10 - lr: 0.000030 - momentum: 0.000000 |
|
2023-10-10 22:34:40,729 epoch 9 - iter 87/292 - loss 0.06037949 - time (sec): 30.88 - samples/sec: 447.63 - lr: 0.000029 - momentum: 0.000000 |
|
2023-10-10 22:34:51,425 epoch 9 - iter 116/292 - loss 0.05494455 - time (sec): 41.58 - samples/sec: 442.41 - lr: 0.000027 - momentum: 0.000000 |
|
2023-10-10 22:35:00,997 epoch 9 - iter 145/292 - loss 0.05906472 - time (sec): 51.15 - samples/sec: 437.78 - lr: 0.000025 - momentum: 0.000000 |
|
2023-10-10 22:35:12,000 epoch 9 - iter 174/292 - loss 0.06382807 - time (sec): 62.15 - samples/sec: 437.79 - lr: 0.000024 - momentum: 0.000000 |
|
2023-10-10 22:35:21,289 epoch 9 - iter 203/292 - loss 0.06188120 - time (sec): 71.44 - samples/sec: 434.30 - lr: 0.000022 - momentum: 0.000000 |
|
2023-10-10 22:35:31,447 epoch 9 - iter 232/292 - loss 0.06116108 - time (sec): 81.60 - samples/sec: 441.26 - lr: 0.000020 - momentum: 0.000000 |
|
2023-10-10 22:35:41,511 epoch 9 - iter 261/292 - loss 0.05935939 - time (sec): 91.66 - samples/sec: 439.74 - lr: 0.000019 - momentum: 0.000000 |
|
2023-10-10 22:35:50,707 epoch 9 - iter 290/292 - loss 0.06218808 - time (sec): 100.86 - samples/sec: 438.73 - lr: 0.000017 - momentum: 0.000000 |
|
2023-10-10 22:35:51,183 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 22:35:51,183 EPOCH 9 done: loss 0.0622 - lr: 0.000017 |
|
2023-10-10 22:35:57,859 DEV : loss 0.1263173669576645 - f1-score (micro avg) 0.7729 |
|
2023-10-10 22:35:57,869 saving best model |
|
2023-10-10 22:36:06,588 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 22:36:17,008 epoch 10 - iter 29/292 - loss 0.04466337 - time (sec): 10.42 - samples/sec: 443.76 - lr: 0.000015 - momentum: 0.000000 |
|
2023-10-10 22:36:26,256 epoch 10 - iter 58/292 - loss 0.05701445 - time (sec): 19.67 - samples/sec: 435.39 - lr: 0.000014 - momentum: 0.000000 |
|
2023-10-10 22:36:35,703 epoch 10 - iter 87/292 - loss 0.05271940 - time (sec): 29.11 - samples/sec: 440.61 - lr: 0.000012 - momentum: 0.000000 |
|
2023-10-10 22:36:46,382 epoch 10 - iter 116/292 - loss 0.04764948 - time (sec): 39.79 - samples/sec: 456.08 - lr: 0.000010 - momentum: 0.000000 |
|
2023-10-10 22:36:56,840 epoch 10 - iter 145/292 - loss 0.04801644 - time (sec): 50.25 - samples/sec: 456.55 - lr: 0.000009 - momentum: 0.000000 |
|
2023-10-10 22:37:06,730 epoch 10 - iter 174/292 - loss 0.04758792 - time (sec): 60.14 - samples/sec: 453.98 - lr: 0.000007 - momentum: 0.000000 |
|
2023-10-10 22:37:17,217 epoch 10 - iter 203/292 - loss 0.04879119 - time (sec): 70.63 - samples/sec: 452.69 - lr: 0.000005 - momentum: 0.000000 |
|
2023-10-10 22:37:28,159 epoch 10 - iter 232/292 - loss 0.05379231 - time (sec): 81.57 - samples/sec: 441.09 - lr: 0.000004 - momentum: 0.000000 |
|
2023-10-10 22:37:38,659 epoch 10 - iter 261/292 - loss 0.05272024 - time (sec): 92.07 - samples/sec: 433.95 - lr: 0.000002 - momentum: 0.000000 |
|
2023-10-10 22:37:48,216 epoch 10 - iter 290/292 - loss 0.05624633 - time (sec): 101.62 - samples/sec: 435.19 - lr: 0.000000 - momentum: 0.000000 |
|
2023-10-10 22:37:48,784 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 22:37:48,785 EPOCH 10 done: loss 0.0566 - lr: 0.000000 |
|
2023-10-10 22:37:54,495 DEV : loss 0.12670132517814636 - f1-score (micro avg) 0.7355 |
|
2023-10-10 22:37:55,464 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 22:37:55,466 Loading model from best epoch ... |
|
2023-10-10 22:37:59,258 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd |
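
Once training is finished, the best checkpoint can be reloaded and used for tagging; a minimal sketch (the Finnish example sentence is made up for illustration):

    from flair.data import Sentence
    from flair.models import SequenceTagger

    # Load the checkpoint written as best-model.pt under the training base path.
    tagger = SequenceTagger.load(
        "hmbench-newseye/fi-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-1/best-model.pt"
    )

    sentence = Sentence("Helsingin kaupunki sai kirjeen Turusta.")  # made-up example
    tagger.predict(sentence)

    # Print predicted spans with their labels (LOC, PER, ORG, HumanProd).
    for span in sentence.get_spans("ner"):
        print(span.text, span.get_label("ner").value, span.get_label("ner").score)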
|
2023-10-10 22:38:11,937 |
|
Results: |
|
- F-score (micro) 0.732 |
|
- F-score (macro) 0.661 |
|
- Accuracy 0.5936 |
|
|
|
By class: |
|
              precision    recall  f1-score   support

         PER     0.7811    0.8305    0.8050       348
         LOC     0.6645    0.7663    0.7117       261
         ORG     0.3091    0.3269    0.3178        52
   HumanProd     0.8500    0.7727    0.8095        22

   micro avg     0.7011    0.7657    0.7320       683
   macro avg     0.6512    0.6741    0.6610       683
weighted avg     0.7028    0.7657    0.7324       683
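
As a sanity check, the micro-average F1 follows from its own precision and recall row:

    # F1 = 2PR / (P + R) for the "micro avg" row above.
    p, r = 0.7011, 0.7657
    print(round(2 * p * r / (p + r), 4))  # 0.732, matching the reported micro F-score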
|
|
|
2023-10-10 22:38:11,937 ---------------------------------------------------------------------------------------------------- |
|
|