2023-10-11 02:22:17,955 ----------------------------------------------------------------------------------------------------
2023-10-11 02:22:17,957 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
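The shapes in the module printout pin down the encoder's size. A minimal stdlib sketch of the parameter count (my own arithmetic from the printed shapes, not part of the log; it assumes the usual ByT5 weight tying between `shared` and `embed_tokens`):

```python
# Shapes read off the module tree above.
d_model, d_inner, d_ff, vocab = 1472, 384, 3584, 384

# Self-attention: q/k/v map d_model -> d_inner, o maps d_inner -> d_model (no biases).
attn = 3 * d_model * d_inner + d_inner * d_model      # 2,260,992

# Gated feed-forward: wi_0 and wi_1 map d_model -> d_ff, wo maps d_ff back.
ff = 2 * d_model * d_ff + d_ff * d_model              # 15,826,944

# Plus two RMSNorm weight vectors per block.
per_block = attn + ff + 2 * d_model                   # 18,090,880

# 12 blocks, the relative-position bias table in block 0, the (tied) token
# embedding, and the final layer norm: ~217.7M encoder weights.
encoder = 12 * per_block + 32 * 6 + vocab * d_model + d_model
print(f"{encoder:,}")  # 217,657,472
```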
2023-10-11 02:22:17,957 ----------------------------------------------------------------------------------------------------
2023-10-11 02:22:17,957 MultiCorpus: 1166 train + 165 dev + 415 test sentences
 - NER_HIPE_2022 Corpus: 1166 train + 165 dev + 415 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fi/with_doc_seperator
2023-10-11 02:22:17,957 ----------------------------------------------------------------------------------------------------
2023-10-11 02:22:17,957 Train: 1166 sentences
2023-10-11 02:22:17,957 (train_with_dev=False, train_with_test=False)
2023-10-11 02:22:17,958 ----------------------------------------------------------------------------------------------------
2023-10-11 02:22:17,958 Training Params:
2023-10-11 02:22:17,958 - learning_rate: "0.00016"
2023-10-11 02:22:17,958 - mini_batch_size: "4"
2023-10-11 02:22:17,958 - max_epochs: "10"
2023-10-11 02:22:17,958 - shuffle: "True"
2023-10-11 02:22:17,958 ----------------------------------------------------------------------------------------------------
2023-10-11 02:22:17,958 Plugins:
2023-10-11 02:22:17,958 - TensorboardLogger
2023-10-11 02:22:17,958 - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 02:22:17,958 ----------------------------------------------------------------------------------------------------
2023-10-11 02:22:17,958 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 02:22:17,958 - metric: "('micro avg', 'f1-score')"
2023-10-11 02:22:17,958 ----------------------------------------------------------------------------------------------------
2023-10-11 02:22:17,958 Computation:
2023-10-11 02:22:17,959 - compute on device: cuda:0
2023-10-11 02:22:17,959 - embedding storage: none
2023-10-11 02:22:17,959 ----------------------------------------------------------------------------------------------------
2023-10-11 02:22:17,959 Model training base path: "hmbench-newseye/fi-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-4"
2023-10-11 02:22:17,959 ----------------------------------------------------------------------------------------------------
2023-10-11 02:22:17,959 ----------------------------------------------------------------------------------------------------
2023-10-11 02:22:17,959 Logging anything other than scalars to TensorBoard is currently not supported.
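The LinearScheduler plugin warms the learning rate up over the first `warmup_fraction` of steps and then decays it linearly to zero, which is the shape traced by the lr column in the iteration logs that follow (rising through epoch 1, then falling to 0). A minimal sketch of that schedule (`linear_lr` is a hypothetical helper, not Flair's implementation, and Flair's exact step bookkeeping may differ by a step):

```python
def linear_lr(step, total_steps, peak_lr, warmup_fraction=0.1):
    """Linear warmup to peak_lr over warmup_fraction of steps, then linear decay to 0."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

# This run: 292 iterations/epoch x 10 epochs, peak lr 0.00016, 10% warmup.
total = 292 * 10
assert abs(linear_lr(292, total, 16e-5) - 16e-5) < 1e-12  # peak right after warmup
assert linear_lr(total, total, 16e-5) == 0.0              # reaches zero at the last step
```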
2023-10-11 02:22:27,986 epoch 1 - iter 29/292 - loss 2.84405011 - time (sec): 10.02 - samples/sec: 469.93 - lr: 0.000015 - momentum: 0.000000
2023-10-11 02:22:37,370 epoch 1 - iter 58/292 - loss 2.83170628 - time (sec): 19.41 - samples/sec: 457.20 - lr: 0.000031 - momentum: 0.000000
2023-10-11 02:22:46,528 epoch 1 - iter 87/292 - loss 2.80816659 - time (sec): 28.57 - samples/sec: 445.96 - lr: 0.000047 - momentum: 0.000000
2023-10-11 02:22:56,398 epoch 1 - iter 116/292 - loss 2.74484364 - time (sec): 38.44 - samples/sec: 452.01 - lr: 0.000063 - momentum: 0.000000
2023-10-11 02:23:06,048 epoch 1 - iter 145/292 - loss 2.65830898 - time (sec): 48.09 - samples/sec: 441.16 - lr: 0.000079 - momentum: 0.000000
2023-10-11 02:23:15,959 epoch 1 - iter 174/292 - loss 2.54286823 - time (sec): 58.00 - samples/sec: 442.84 - lr: 0.000095 - momentum: 0.000000
2023-10-11 02:23:26,700 epoch 1 - iter 203/292 - loss 2.41236971 - time (sec): 68.74 - samples/sec: 455.77 - lr: 0.000111 - momentum: 0.000000
2023-10-11 02:23:36,342 epoch 1 - iter 232/292 - loss 2.29850246 - time (sec): 78.38 - samples/sec: 456.38 - lr: 0.000127 - momentum: 0.000000
2023-10-11 02:23:45,555 epoch 1 - iter 261/292 - loss 2.19378808 - time (sec): 87.59 - samples/sec: 451.85 - lr: 0.000142 - momentum: 0.000000
2023-10-11 02:23:55,974 epoch 1 - iter 290/292 - loss 2.05654901 - time (sec): 98.01 - samples/sec: 451.23 - lr: 0.000158 - momentum: 0.000000
2023-10-11 02:23:56,470 ----------------------------------------------------------------------------------------------------
2023-10-11 02:23:56,470 EPOCH 1 done: loss 2.0507 - lr: 0.000158
2023-10-11 02:24:01,919 DEV : loss 0.6568392515182495 - f1-score (micro avg) 0.0
2023-10-11 02:24:01,927 ----------------------------------------------------------------------------------------------------
2023-10-11 02:24:11,061 epoch 2 - iter 29/292 - loss 0.62598909 - time (sec): 9.13 - samples/sec: 461.56 - lr: 0.000158 - momentum: 0.000000
2023-10-11 02:24:20,167 epoch 2 - iter 58/292 - loss 0.57978019 - time (sec): 18.24 - samples/sec: 467.49 - lr: 0.000157 - momentum: 0.000000
2023-10-11 02:24:29,160 epoch 2 - iter 87/292 - loss 0.57477199 - time (sec): 27.23 - samples/sec: 471.57 - lr: 0.000155 - momentum: 0.000000
2023-10-11 02:24:38,531 epoch 2 - iter 116/292 - loss 0.53886914 - time (sec): 36.60 - samples/sec: 479.18 - lr: 0.000153 - momentum: 0.000000
2023-10-11 02:24:47,735 epoch 2 - iter 145/292 - loss 0.51975581 - time (sec): 45.81 - samples/sec: 480.73 - lr: 0.000151 - momentum: 0.000000
2023-10-11 02:24:57,373 epoch 2 - iter 174/292 - loss 0.53807317 - time (sec): 55.44 - samples/sec: 486.32 - lr: 0.000149 - momentum: 0.000000
2023-10-11 02:25:07,457 epoch 2 - iter 203/292 - loss 0.51656765 - time (sec): 65.53 - samples/sec: 485.08 - lr: 0.000148 - momentum: 0.000000
2023-10-11 02:25:17,793 epoch 2 - iter 232/292 - loss 0.50681885 - time (sec): 75.86 - samples/sec: 474.56 - lr: 0.000146 - momentum: 0.000000
2023-10-11 02:25:29,042 epoch 2 - iter 261/292 - loss 0.49819345 - time (sec): 87.11 - samples/sec: 464.32 - lr: 0.000144 - momentum: 0.000000
2023-10-11 02:25:39,415 epoch 2 - iter 290/292 - loss 0.48361749 - time (sec): 97.49 - samples/sec: 454.44 - lr: 0.000142 - momentum: 0.000000
2023-10-11 02:25:39,903 ----------------------------------------------------------------------------------------------------
2023-10-11 02:25:39,903 EPOCH 2 done: loss 0.4844 - lr: 0.000142
2023-10-11 02:25:45,697 DEV : loss 0.27027052640914917 - f1-score (micro avg) 0.2532
2023-10-11 02:25:45,706 saving best model
2023-10-11 02:25:46,553 ----------------------------------------------------------------------------------------------------
2023-10-11 02:25:56,518 epoch 3 - iter 29/292 - loss 0.34171181 - time (sec): 9.96 - samples/sec: 385.13 - lr: 0.000141 - momentum: 0.000000
2023-10-11 02:26:06,230 epoch 3 - iter 58/292 - loss 0.33011318 - time (sec): 19.67 - samples/sec: 435.89 - lr: 0.000139 - momentum: 0.000000
2023-10-11 02:26:15,438 epoch 3 - iter 87/292 - loss 0.29524142 - time (sec): 28.88 - samples/sec: 428.18 - lr: 0.000137 - momentum: 0.000000
2023-10-11 02:26:25,582 epoch 3 - iter 116/292 - loss 0.32286885 - time (sec): 39.03 - samples/sec: 442.75 - lr: 0.000135 - momentum: 0.000000
2023-10-11 02:26:35,208 epoch 3 - iter 145/292 - loss 0.30129883 - time (sec): 48.65 - samples/sec: 452.27 - lr: 0.000133 - momentum: 0.000000
2023-10-11 02:26:45,353 epoch 3 - iter 174/292 - loss 0.30000804 - time (sec): 58.80 - samples/sec: 459.49 - lr: 0.000132 - momentum: 0.000000
2023-10-11 02:26:54,123 epoch 3 - iter 203/292 - loss 0.29053407 - time (sec): 67.57 - samples/sec: 454.97 - lr: 0.000130 - momentum: 0.000000
2023-10-11 02:27:02,974 epoch 3 - iter 232/292 - loss 0.28629204 - time (sec): 76.42 - samples/sec: 452.51 - lr: 0.000128 - momentum: 0.000000
2023-10-11 02:27:12,219 epoch 3 - iter 261/292 - loss 0.28138015 - time (sec): 85.66 - samples/sec: 455.33 - lr: 0.000126 - momentum: 0.000000
2023-10-11 02:27:22,634 epoch 3 - iter 290/292 - loss 0.27715632 - time (sec): 96.08 - samples/sec: 461.29 - lr: 0.000125 - momentum: 0.000000
2023-10-11 02:27:23,054 ----------------------------------------------------------------------------------------------------
2023-10-11 02:27:23,054 EPOCH 3 done: loss 0.2770 - lr: 0.000125
2023-10-11 02:27:28,787 DEV : loss 0.19208243489265442 - f1-score (micro avg) 0.498
2023-10-11 02:27:28,795 saving best model
2023-10-11 02:27:31,430 ----------------------------------------------------------------------------------------------------
2023-10-11 02:27:40,794 epoch 4 - iter 29/292 - loss 0.15296171 - time (sec): 9.36 - samples/sec: 452.03 - lr: 0.000123 - momentum: 0.000000
2023-10-11 02:27:50,090 epoch 4 - iter 58/292 - loss 0.19406134 - time (sec): 18.66 - samples/sec: 458.51 - lr: 0.000121 - momentum: 0.000000
2023-10-11 02:27:59,801 epoch 4 - iter 87/292 - loss 0.17747525 - time (sec): 28.37 - samples/sec: 464.34 - lr: 0.000119 - momentum: 0.000000
2023-10-11 02:28:09,135 epoch 4 - iter 116/292 - loss 0.16770571 - time (sec): 37.70 - samples/sec: 463.17 - lr: 0.000117 - momentum: 0.000000
2023-10-11 02:28:18,141 epoch 4 - iter 145/292 - loss 0.17038541 - time (sec): 46.71 - samples/sec: 459.89 - lr: 0.000116 - momentum: 0.000000
2023-10-11 02:28:28,400 epoch 4 - iter 174/292 - loss 0.17589765 - time (sec): 56.97 - samples/sec: 458.13 - lr: 0.000114 - momentum: 0.000000
2023-10-11 02:28:38,133 epoch 4 - iter 203/292 - loss 0.17832868 - time (sec): 66.70 - samples/sec: 457.92 - lr: 0.000112 - momentum: 0.000000
2023-10-11 02:28:47,802 epoch 4 - iter 232/292 - loss 0.17493554 - time (sec): 76.37 - samples/sec: 458.90 - lr: 0.000110 - momentum: 0.000000
2023-10-11 02:28:57,190 epoch 4 - iter 261/292 - loss 0.17196372 - time (sec): 85.76 - samples/sec: 455.30 - lr: 0.000109 - momentum: 0.000000
2023-10-11 02:29:07,230 epoch 4 - iter 290/292 - loss 0.16898623 - time (sec): 95.80 - samples/sec: 459.64 - lr: 0.000107 - momentum: 0.000000
2023-10-11 02:29:07,912 ----------------------------------------------------------------------------------------------------
2023-10-11 02:29:07,913 EPOCH 4 done: loss 0.1681 - lr: 0.000107
2023-10-11 02:29:13,730 DEV : loss 0.147971972823143 - f1-score (micro avg) 0.7257
2023-10-11 02:29:13,739 saving best model
2023-10-11 02:29:14,672 ----------------------------------------------------------------------------------------------------
2023-10-11 02:29:25,259 epoch 5 - iter 29/292 - loss 0.15169791 - time (sec): 10.58 - samples/sec: 526.70 - lr: 0.000105 - momentum: 0.000000
2023-10-11 02:29:35,330 epoch 5 - iter 58/292 - loss 0.13029739 - time (sec): 20.66 - samples/sec: 500.82 - lr: 0.000103 - momentum: 0.000000
2023-10-11 02:29:45,248 epoch 5 - iter 87/292 - loss 0.13245643 - time (sec): 30.57 - samples/sec: 462.98 - lr: 0.000101 - momentum: 0.000000
2023-10-11 02:29:55,075 epoch 5 - iter 116/292 - loss 0.12631426 - time (sec): 40.40 - samples/sec: 464.12 - lr: 0.000100 - momentum: 0.000000
2023-10-11 02:30:04,502 epoch 5 - iter 145/292 - loss 0.13024278 - time (sec): 49.83 - samples/sec: 462.23 - lr: 0.000098 - momentum: 0.000000
2023-10-11 02:30:13,782 epoch 5 - iter 174/292 - loss 0.12849868 - time (sec): 59.11 - samples/sec: 457.18 - lr: 0.000096 - momentum: 0.000000
2023-10-11 02:30:23,235 epoch 5 - iter 203/292 - loss 0.12118872 - time (sec): 68.56 - samples/sec: 455.06 - lr: 0.000094 - momentum: 0.000000
2023-10-11 02:30:32,460 epoch 5 - iter 232/292 - loss 0.11674466 - time (sec): 77.79 - samples/sec: 455.60 - lr: 0.000093 - momentum: 0.000000
2023-10-11 02:30:41,348 epoch 5 - iter 261/292 - loss 0.11497375 - time (sec): 86.67 - samples/sec: 452.60 - lr: 0.000091 - momentum: 0.000000
2023-10-11 02:30:51,317 epoch 5 - iter 290/292 - loss 0.11482274 - time (sec): 96.64 - samples/sec: 456.46 - lr: 0.000089 - momentum: 0.000000
2023-10-11 02:30:51,888 ----------------------------------------------------------------------------------------------------
2023-10-11 02:30:51,889 EPOCH 5 done: loss 0.1147 - lr: 0.000089
2023-10-11 02:30:57,394 DEV : loss 0.13591936230659485 - f1-score (micro avg) 0.7312
2023-10-11 02:30:57,403 saving best model
2023-10-11 02:30:59,986 ----------------------------------------------------------------------------------------------------
2023-10-11 02:31:10,259 epoch 6 - iter 29/292 - loss 0.08136425 - time (sec): 10.27 - samples/sec: 534.03 - lr: 0.000087 - momentum: 0.000000
2023-10-11 02:31:19,438 epoch 6 - iter 58/292 - loss 0.06936994 - time (sec): 19.45 - samples/sec: 515.27 - lr: 0.000085 - momentum: 0.000000
2023-10-11 02:31:27,937 epoch 6 - iter 87/292 - loss 0.07071139 - time (sec): 27.95 - samples/sec: 490.85 - lr: 0.000084 - momentum: 0.000000
2023-10-11 02:31:37,281 epoch 6 - iter 116/292 - loss 0.06752745 - time (sec): 37.29 - samples/sec: 492.85 - lr: 0.000082 - momentum: 0.000000
2023-10-11 02:31:46,112 epoch 6 - iter 145/292 - loss 0.07212155 - time (sec): 46.12 - samples/sec: 480.88 - lr: 0.000080 - momentum: 0.000000
2023-10-11 02:31:55,850 epoch 6 - iter 174/292 - loss 0.08003889 - time (sec): 55.86 - samples/sec: 486.31 - lr: 0.000078 - momentum: 0.000000
2023-10-11 02:32:05,557 epoch 6 - iter 203/292 - loss 0.07724681 - time (sec): 65.57 - samples/sec: 488.70 - lr: 0.000077 - momentum: 0.000000
2023-10-11 02:32:15,268 epoch 6 - iter 232/292 - loss 0.07725976 - time (sec): 75.28 - samples/sec: 478.53 - lr: 0.000075 - momentum: 0.000000
2023-10-11 02:32:24,897 epoch 6 - iter 261/292 - loss 0.08058987 - time (sec): 84.91 - samples/sec: 468.19 - lr: 0.000073 - momentum: 0.000000
2023-10-11 02:32:34,909 epoch 6 - iter 290/292 - loss 0.07823012 - time (sec): 94.92 - samples/sec: 464.64 - lr: 0.000071 - momentum: 0.000000
2023-10-11 02:32:35,538 ----------------------------------------------------------------------------------------------------
2023-10-11 02:32:35,538 EPOCH 6 done: loss 0.0785 - lr: 0.000071
2023-10-11 02:32:42,085 DEV : loss 0.12154516577720642 - f1-score (micro avg) 0.7733
2023-10-11 02:32:42,094 saving best model
2023-10-11 02:32:44,717 ----------------------------------------------------------------------------------------------------
2023-10-11 02:32:53,994 epoch 7 - iter 29/292 - loss 0.05511245 - time (sec): 9.27 - samples/sec: 411.85 - lr: 0.000069 - momentum: 0.000000
2023-10-11 02:33:03,636 epoch 7 - iter 58/292 - loss 0.05492446 - time (sec): 18.91 - samples/sec: 435.00 - lr: 0.000068 - momentum: 0.000000
2023-10-11 02:33:12,894 epoch 7 - iter 87/292 - loss 0.06175282 - time (sec): 28.17 - samples/sec: 428.04 - lr: 0.000066 - momentum: 0.000000
2023-10-11 02:33:23,190 epoch 7 - iter 116/292 - loss 0.05569448 - time (sec): 38.47 - samples/sec: 441.53 - lr: 0.000064 - momentum: 0.000000
2023-10-11 02:33:33,372 epoch 7 - iter 145/292 - loss 0.05718601 - time (sec): 48.65 - samples/sec: 458.43 - lr: 0.000062 - momentum: 0.000000
2023-10-11 02:33:42,561 epoch 7 - iter 174/292 - loss 0.06103347 - time (sec): 57.84 - samples/sec: 458.39 - lr: 0.000061 - momentum: 0.000000
2023-10-11 02:33:52,399 epoch 7 - iter 203/292 - loss 0.05935794 - time (sec): 67.68 - samples/sec: 463.39 - lr: 0.000059 - momentum: 0.000000
2023-10-11 02:34:01,519 epoch 7 - iter 232/292 - loss 0.05778039 - time (sec): 76.80 - samples/sec: 462.19 - lr: 0.000057 - momentum: 0.000000
2023-10-11 02:34:10,853 epoch 7 - iter 261/292 - loss 0.05858678 - time (sec): 86.13 - samples/sec: 461.09 - lr: 0.000055 - momentum: 0.000000
2023-10-11 02:34:20,513 epoch 7 - iter 290/292 - loss 0.05966390 - time (sec): 95.79 - samples/sec: 461.88 - lr: 0.000054 - momentum: 0.000000
2023-10-11 02:34:20,982 ----------------------------------------------------------------------------------------------------
2023-10-11 02:34:20,982 EPOCH 7 done: loss 0.0596 - lr: 0.000054
2023-10-11 02:34:26,485 DEV : loss 0.12782339751720428 - f1-score (micro avg) 0.7749
2023-10-11 02:34:26,494 saving best model
2023-10-11 02:34:29,017 ----------------------------------------------------------------------------------------------------
2023-10-11 02:34:38,399 epoch 8 - iter 29/292 - loss 0.05488144 - time (sec): 9.38 - samples/sec: 460.14 - lr: 0.000052 - momentum: 0.000000
2023-10-11 02:34:47,950 epoch 8 - iter 58/292 - loss 0.04194912 - time (sec): 18.93 - samples/sec: 481.44 - lr: 0.000050 - momentum: 0.000000
2023-10-11 02:34:56,990 epoch 8 - iter 87/292 - loss 0.04292430 - time (sec): 27.97 - samples/sec: 470.25 - lr: 0.000048 - momentum: 0.000000
2023-10-11 02:35:06,302 epoch 8 - iter 116/292 - loss 0.05008338 - time (sec): 37.28 - samples/sec: 462.18 - lr: 0.000046 - momentum: 0.000000
2023-10-11 02:35:16,594 epoch 8 - iter 145/292 - loss 0.04445838 - time (sec): 47.57 - samples/sec: 473.68 - lr: 0.000045 - momentum: 0.000000
2023-10-11 02:35:26,793 epoch 8 - iter 174/292 - loss 0.04716916 - time (sec): 57.77 - samples/sec: 475.06 - lr: 0.000043 - momentum: 0.000000
2023-10-11 02:35:36,141 epoch 8 - iter 203/292 - loss 0.04687333 - time (sec): 67.12 - samples/sec: 471.02 - lr: 0.000041 - momentum: 0.000000
2023-10-11 02:35:45,092 epoch 8 - iter 232/292 - loss 0.04640217 - time (sec): 76.07 - samples/sec: 468.05 - lr: 0.000039 - momentum: 0.000000
2023-10-11 02:35:54,331 epoch 8 - iter 261/292 - loss 0.04752936 - time (sec): 85.31 - samples/sec: 466.07 - lr: 0.000038 - momentum: 0.000000
2023-10-11 02:36:03,829 epoch 8 - iter 290/292 - loss 0.04677794 - time (sec): 94.81 - samples/sec: 466.35 - lr: 0.000036 - momentum: 0.000000
2023-10-11 02:36:04,315 ----------------------------------------------------------------------------------------------------
2023-10-11 02:36:04,316 EPOCH 8 done: loss 0.0471 - lr: 0.000036
2023-10-11 02:36:09,809 DEV : loss 0.13004955649375916 - f1-score (micro avg) 0.7759
2023-10-11 02:36:09,818 saving best model
2023-10-11 02:36:12,333 ----------------------------------------------------------------------------------------------------
2023-10-11 02:36:21,618 epoch 9 - iter 29/292 - loss 0.02873439 - time (sec): 9.28 - samples/sec: 485.32 - lr: 0.000034 - momentum: 0.000000
2023-10-11 02:36:30,730 epoch 9 - iter 58/292 - loss 0.03268934 - time (sec): 18.39 - samples/sec: 472.14 - lr: 0.000032 - momentum: 0.000000
2023-10-11 02:36:40,098 epoch 9 - iter 87/292 - loss 0.03158168 - time (sec): 27.76 - samples/sec: 479.63 - lr: 0.000030 - momentum: 0.000000
2023-10-11 02:36:49,302 epoch 9 - iter 116/292 - loss 0.02918808 - time (sec): 36.96 - samples/sec: 473.43 - lr: 0.000029 - momentum: 0.000000
2023-10-11 02:36:59,109 epoch 9 - iter 145/292 - loss 0.02774731 - time (sec): 46.77 - samples/sec: 475.68 - lr: 0.000027 - momentum: 0.000000
2023-10-11 02:37:08,619 epoch 9 - iter 174/292 - loss 0.02995775 - time (sec): 56.28 - samples/sec: 466.07 - lr: 0.000025 - momentum: 0.000000
2023-10-11 02:37:18,451 epoch 9 - iter 203/292 - loss 0.03214478 - time (sec): 66.11 - samples/sec: 468.14 - lr: 0.000023 - momentum: 0.000000
2023-10-11 02:37:28,951 epoch 9 - iter 232/292 - loss 0.03234939 - time (sec): 76.61 - samples/sec: 469.83 - lr: 0.000022 - momentum: 0.000000
2023-10-11 02:37:38,213 epoch 9 - iter 261/292 - loss 0.03627506 - time (sec): 85.88 - samples/sec: 463.78 - lr: 0.000020 - momentum: 0.000000
2023-10-11 02:37:48,218 epoch 9 - iter 290/292 - loss 0.03898896 - time (sec): 95.88 - samples/sec: 461.31 - lr: 0.000018 - momentum: 0.000000
2023-10-11 02:37:48,705 ----------------------------------------------------------------------------------------------------
2023-10-11 02:37:48,705 EPOCH 9 done: loss 0.0390 - lr: 0.000018
2023-10-11 02:37:54,501 DEV : loss 0.1367434859275818 - f1-score (micro avg) 0.7613
2023-10-11 02:37:54,510 ----------------------------------------------------------------------------------------------------
2023-10-11 02:38:04,763 epoch 10 - iter 29/292 - loss 0.02971200 - time (sec): 10.25 - samples/sec: 498.51 - lr: 0.000016 - momentum: 0.000000
2023-10-11 02:38:14,588 epoch 10 - iter 58/292 - loss 0.03029689 - time (sec): 20.08 - samples/sec: 481.86 - lr: 0.000014 - momentum: 0.000000
2023-10-11 02:38:24,532 epoch 10 - iter 87/292 - loss 0.02849897 - time (sec): 30.02 - samples/sec: 477.58 - lr: 0.000013 - momentum: 0.000000
2023-10-11 02:38:33,766 epoch 10 - iter 116/292 - loss 0.03324429 - time (sec): 39.25 - samples/sec: 462.91 - lr: 0.000011 - momentum: 0.000000
2023-10-11 02:38:43,535 epoch 10 - iter 145/292 - loss 0.03540899 - time (sec): 49.02 - samples/sec: 463.44 - lr: 0.000009 - momentum: 0.000000
2023-10-11 02:38:53,923 epoch 10 - iter 174/292 - loss 0.03318948 - time (sec): 59.41 - samples/sec: 466.48 - lr: 0.000007 - momentum: 0.000000
2023-10-11 02:39:02,992 epoch 10 - iter 203/292 - loss 0.03240122 - time (sec): 68.48 - samples/sec: 455.63 - lr: 0.000006 - momentum: 0.000000
2023-10-11 02:39:12,960 epoch 10 - iter 232/292 - loss 0.03389265 - time (sec): 78.45 - samples/sec: 451.94 - lr: 0.000004 - momentum: 0.000000
2023-10-11 02:39:22,905 epoch 10 - iter 261/292 - loss 0.03543843 - time (sec): 88.39 - samples/sec: 450.18 - lr: 0.000002 - momentum: 0.000000
2023-10-11 02:39:32,940 epoch 10 - iter 290/292 - loss 0.03586228 - time (sec): 98.43 - samples/sec: 449.45 - lr: 0.000000 - momentum: 0.000000
2023-10-11 02:39:33,421 ----------------------------------------------------------------------------------------------------
2023-10-11 02:39:33,422 EPOCH 10 done: loss 0.0358 - lr: 0.000000
2023-10-11 02:39:38,962 DEV : loss 0.1346944272518158 - f1-score (micro avg) 0.7646
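Model selection is by dev micro-F1 (the configured metric), so the checkpoint reloaded below is the epoch-8 model with dev F1 0.7759, the last point at which "saving best model" fired. A quick sketch of that selection, with the dev scores copied from the epoch summaries above:

```python
# Dev micro-F1 per epoch, from the DEV lines of this log.
dev_f1 = [0.0, 0.2532, 0.498, 0.7257, 0.7312, 0.7733, 0.7749, 0.7759, 0.7613, 0.7646]

# best-model.pt holds the checkpoint from the epoch with the highest dev score.
best_epoch = max(range(len(dev_f1)), key=dev_f1.__getitem__) + 1
print(best_epoch, dev_f1[best_epoch - 1])  # 8 0.7759
```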
2023-10-11 02:39:39,825 ----------------------------------------------------------------------------------------------------
2023-10-11 02:39:39,827 Loading model from best epoch ...
2023-10-11 02:39:43,540 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-11 02:39:55,836

Results:
- F-score (micro) 0.7191
- F-score (macro) 0.6682
- Accuracy 0.5799

By class:
              precision    recall  f1-score   support

         PER     0.7908    0.8362    0.8128       348
         LOC     0.5718    0.7778    0.6591       261
         ORG     0.3830    0.3462    0.3636        52
   HumanProd     0.8571    0.8182    0.8372        22

   micro avg     0.6700    0.7760    0.7191       683
   macro avg     0.6507    0.6946    0.6682       683
weighted avg     0.6782    0.7760    0.7207       683
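The micro and macro averages can be cross-checked from the per-class rows alone: reconstruct each class's true-positive and predicted-span counts from its precision, recall, and support, then re-aggregate. A stdlib sketch of that check (my own arithmetic on the logged numbers, not part of the log):

```python
# (precision, recall, support) per class, copied from the table above.
rows = {
    "PER":       (0.7908, 0.8362, 348),
    "LOC":       (0.5718, 0.7778, 261),
    "ORG":       (0.3830, 0.3462,  52),
    "HumanProd": (0.8571, 0.8182,  22),
}

tp = {c: round(r * s) for c, (p, r, s) in rows.items()}        # recall = tp / support
pred = {c: round(tp[c] / p) for c, (p, r, s) in rows.items()}  # precision = tp / predicted

# Per-class F1 from the reconstructed counts, macro F1 as their unweighted mean.
f1 = {c: 2 * tp[c] / (pred[c] + rows[c][2]) for c in rows}
macro_f1 = sum(f1.values()) / len(f1)

# Micro F1 pools counts across classes before computing the score.
total_support = sum(s for _, _, s in rows.values())
micro_f1 = 2 * sum(tp.values()) / (sum(pred.values()) + total_support)

print(round(micro_f1, 4), round(macro_f1, 4))  # 0.7191 0.6682, matching the log
```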
2023-10-11 02:39:55,837 ----------------------------------------------------------------------------------------------------