2023-10-10 23:54:05,281 ----------------------------------------------------------------------------------------------------
2023-10-10 23:54:05,283 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
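The module tree above fixes every weight shape, so the model size can be recovered by arithmetic alone. The sketch below does that count using only the printed shapes. It assumes, as in Hugging Face's `T5EncoderModel`, that `embed_tokens` is the same tensor as `shared` (counted once), and that each `FusedRMSNorm` carries a weight vector but no bias. The helper names are hypothetical.

```python
# Hypothetical sketch: count parameters from the shapes printed in the model summary.
D_MODEL = 1472    # hidden size, from Embedding(384, 1472)
INNER = 384       # q/k/v out_features (attention projection width)
D_FF = 3584       # wi_0 / wi_1 out_features (gated feed-forward width)
VOCAB = 384       # byte-level vocabulary size
NUM_BLOCKS = 12   # block (0) plus blocks (1-11)
NUM_TAGS = 17     # linear head out_features


def linear_params(in_f: int, out_f: int, bias: bool = False) -> int:
    """Weights of an nn.Linear-style layer, plus the bias vector if present."""
    return in_f * out_f + (out_f if bias else 0)


def block_params(has_rel_bias: bool) -> int:
    """One T5Block: self-attention, gated feed-forward, and two RMSNorm weights."""
    attn = 3 * linear_params(D_MODEL, INNER) + linear_params(INNER, D_MODEL)
    if has_rel_bias:
        attn += 32 * 6  # Embedding(32, 6): 32 buckets x 6 heads, only in block (0)
    ff = 2 * linear_params(D_MODEL, D_FF) + linear_params(D_FF, D_MODEL)
    norms = 2 * D_MODEL  # assumed: RMSNorm has a weight but no bias
    return attn + ff + norms


encoder = (
    VOCAB * D_MODEL                          # shared embedding (assumed tied to embed_tokens)
    + block_params(has_rel_bias=True)        # block (0)
    + (NUM_BLOCKS - 1) * block_params(False) # blocks (1-11)
    + D_MODEL                                # final_layer_norm
)
head = linear_params(D_MODEL, NUM_TAGS, bias=True)
total = encoder + head

print(f"encoder: {encoder:,}  head: {head:,}  total: {total:,}")
```

Under these assumptions the encoder accounts for about 217.7M parameters. The tagging head adds only 25,041, and roughly 87% of each block sits in the gated feed-forward layers.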
|
2023-10-10 23:54:05,283 ----------------------------------------------------------------------------------------------------
2023-10-10 23:54:05,283 MultiCorpus: 1166 train + 165 dev + 415 test sentences
- NER_HIPE_2022 Corpus: 1166 train + 165 dev + 415 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fi/with_doc_seperator
2023-10-10 23:54:05,283 ----------------------------------------------------------------------------------------------------
2023-10-10 23:54:05,283 Train: 1166 sentences
2023-10-10 23:54:05,283 (train_with_dev=False, train_with_test=False)
2023-10-10 23:54:05,283 ----------------------------------------------------------------------------------------------------
2023-10-10 23:54:05,283 Training Params:
2023-10-10 23:54:05,283 - learning_rate: "0.00016"
2023-10-10 23:54:05,284 - mini_batch_size: "4"
2023-10-10 23:54:05,284 - max_epochs: "10"
2023-10-10 23:54:05,284 - shuffle: "True"
2023-10-10 23:54:05,284 ----------------------------------------------------------------------------------------------------
2023-10-10 23:54:05,284 Plugins:
2023-10-10 23:54:05,284 - TensorboardLogger
2023-10-10 23:54:05,284 - LinearScheduler | warmup_fraction: '0.1'
2023-10-10 23:54:05,284 ----------------------------------------------------------------------------------------------------
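The training parameters and the `LinearScheduler` plugin together determine the `lr` column printed later in this log. The sketch below models the schedule as linear warmup from 0 to the peak learning rate over the first 10% of steps, followed by linear decay to 0. This is an assumed shape, not Flair's source. The mapping from `(epoch, iter)` to a step index in the hypothetical `logged_step` helper was inferred by matching the printed values; it is not documented behaviour.

```python
import math

PEAK_LR = 0.00016       # learning_rate
TRAIN_SENTENCES = 1166  # "Train: 1166 sentences"
BATCH_SIZE = 4          # mini_batch_size
MAX_EPOCHS = 10         # max_epochs
WARMUP_FRACTION = 0.1   # LinearScheduler warmup_fraction

steps_per_epoch = math.ceil(TRAIN_SENTENCES / BATCH_SIZE)  # last batch is partial
total_steps = steps_per_epoch * MAX_EPOCHS
warmup_steps = round(WARMUP_FRACTION * total_steps)


def linear_lr(step: int) -> float:
    """Assumed schedule: linear warmup to PEAK_LR, then linear decay to 0."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    return PEAK_LR * max(0.0, (total_steps - step) / (total_steps - warmup_steps))


def logged_step(epoch: int, it: int) -> int:
    """Hypothetical mapping: steps completed before batch `it` of `epoch` (1-based)."""
    return (epoch - 1) * steps_per_epoch + it - 1


for epoch, it in [(1, 29), (1, 290), (2, 29), (5, 145), (10, 290)]:
    print(f"epoch {epoch} - iter {it}/{steps_per_epoch} - lr: {linear_lr(logged_step(epoch, it)):.6f}")
```

With 1166 sentences and a batch size of 4, each epoch has 292 iterations, which matches the `iter .../292` counter. Warmup therefore spans exactly the first epoch, which is why the learning rate peaks near the end of epoch 1 and decays afterwards.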
|
2023-10-10 23:54:05,284 Final evaluation on model from best epoch (best-model.pt)
2023-10-10 23:54:05,284 - metric: "('micro avg', 'f1-score')"
2023-10-10 23:54:05,284 ----------------------------------------------------------------------------------------------------
2023-10-10 23:54:05,284 Computation:
2023-10-10 23:54:05,284 - compute on device: cuda:0
2023-10-10 23:54:05,284 - embedding storage: none
2023-10-10 23:54:05,284 ----------------------------------------------------------------------------------------------------
2023-10-10 23:54:05,285 Model training base path: "hmbench-newseye/fi-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-2"
2023-10-10 23:54:05,285 ----------------------------------------------------------------------------------------------------
2023-10-10 23:54:05,285 ----------------------------------------------------------------------------------------------------
2023-10-10 23:54:05,285 Logging anything other than scalars to TensorBoard is currently not supported.
|
2023-10-10 23:54:16,090 epoch 1 - iter 29/292 - loss 2.85292695 - time (sec): 10.80 - samples/sec: 466.34 - lr: 0.000015 - momentum: 0.000000
2023-10-10 23:54:25,760 epoch 1 - iter 58/292 - loss 2.84278330 - time (sec): 20.47 - samples/sec: 443.15 - lr: 0.000031 - momentum: 0.000000
2023-10-10 23:54:36,713 epoch 1 - iter 87/292 - loss 2.81776914 - time (sec): 31.43 - samples/sec: 439.22 - lr: 0.000047 - momentum: 0.000000
2023-10-10 23:54:47,841 epoch 1 - iter 116/292 - loss 2.76831466 - time (sec): 42.55 - samples/sec: 427.25 - lr: 0.000063 - momentum: 0.000000
2023-10-10 23:54:58,018 epoch 1 - iter 145/292 - loss 2.68656274 - time (sec): 52.73 - samples/sec: 415.48 - lr: 0.000079 - momentum: 0.000000
2023-10-10 23:55:08,442 epoch 1 - iter 174/292 - loss 2.57976127 - time (sec): 63.16 - samples/sec: 409.34 - lr: 0.000095 - momentum: 0.000000
2023-10-10 23:55:18,924 epoch 1 - iter 203/292 - loss 2.45236738 - time (sec): 73.64 - samples/sec: 413.26 - lr: 0.000111 - momentum: 0.000000
2023-10-10 23:55:28,867 epoch 1 - iter 232/292 - loss 2.33278661 - time (sec): 83.58 - samples/sec: 414.64 - lr: 0.000127 - momentum: 0.000000
2023-10-10 23:55:39,838 epoch 1 - iter 261/292 - loss 2.18255479 - time (sec): 94.55 - samples/sec: 419.36 - lr: 0.000142 - momentum: 0.000000
2023-10-10 23:55:50,363 epoch 1 - iter 290/292 - loss 2.04497265 - time (sec): 105.08 - samples/sec: 421.95 - lr: 0.000158 - momentum: 0.000000
2023-10-10 23:55:50,796 ----------------------------------------------------------------------------------------------------
2023-10-10 23:55:50,797 EPOCH 1 done: loss 2.0427 - lr: 0.000158
2023-10-10 23:55:56,831 DEV : loss 0.6711924076080322 - f1-score (micro avg) 0.0
2023-10-10 23:55:56,840 ----------------------------------------------------------------------------------------------------
|
2023-10-10 23:56:06,264 epoch 2 - iter 29/292 - loss 0.72788195 - time (sec): 9.42 - samples/sec: 460.20 - lr: 0.000158 - momentum: 0.000000
2023-10-10 23:56:16,572 epoch 2 - iter 58/292 - loss 0.65066851 - time (sec): 19.73 - samples/sec: 462.75 - lr: 0.000157 - momentum: 0.000000
2023-10-10 23:56:27,835 epoch 2 - iter 87/292 - loss 0.64893825 - time (sec): 30.99 - samples/sec: 455.63 - lr: 0.000155 - momentum: 0.000000
2023-10-10 23:56:38,177 epoch 2 - iter 116/292 - loss 0.63051086 - time (sec): 41.33 - samples/sec: 429.76 - lr: 0.000153 - momentum: 0.000000
2023-10-10 23:56:49,107 epoch 2 - iter 145/292 - loss 0.59200239 - time (sec): 52.27 - samples/sec: 426.17 - lr: 0.000151 - momentum: 0.000000
2023-10-10 23:56:58,355 epoch 2 - iter 174/292 - loss 0.59108378 - time (sec): 61.51 - samples/sec: 414.48 - lr: 0.000149 - momentum: 0.000000
2023-10-10 23:57:08,769 epoch 2 - iter 203/292 - loss 0.56265754 - time (sec): 71.93 - samples/sec: 422.57 - lr: 0.000148 - momentum: 0.000000
2023-10-10 23:57:18,903 epoch 2 - iter 232/292 - loss 0.52815545 - time (sec): 82.06 - samples/sec: 430.30 - lr: 0.000146 - momentum: 0.000000
2023-10-10 23:57:28,177 epoch 2 - iter 261/292 - loss 0.50476180 - time (sec): 91.33 - samples/sec: 432.41 - lr: 0.000144 - momentum: 0.000000
2023-10-10 23:57:38,370 epoch 2 - iter 290/292 - loss 0.51847967 - time (sec): 101.53 - samples/sec: 435.68 - lr: 0.000142 - momentum: 0.000000
2023-10-10 23:57:38,886 ----------------------------------------------------------------------------------------------------
2023-10-10 23:57:38,887 EPOCH 2 done: loss 0.5176 - lr: 0.000142
2023-10-10 23:57:44,863 DEV : loss 0.3161855638027191 - f1-score (micro avg) 0.0623
2023-10-10 23:57:44,872 saving best model
2023-10-10 23:57:45,807 ----------------------------------------------------------------------------------------------------
|
2023-10-10 23:57:54,679 epoch 3 - iter 29/292 - loss 0.41070537 - time (sec): 8.87 - samples/sec: 414.46 - lr: 0.000141 - momentum: 0.000000
2023-10-10 23:58:03,954 epoch 3 - iter 58/292 - loss 0.34583803 - time (sec): 18.14 - samples/sec: 442.90 - lr: 0.000139 - momentum: 0.000000
2023-10-10 23:58:13,921 epoch 3 - iter 87/292 - loss 0.41258944 - time (sec): 28.11 - samples/sec: 467.28 - lr: 0.000137 - momentum: 0.000000
2023-10-10 23:58:22,598 epoch 3 - iter 116/292 - loss 0.40555208 - time (sec): 36.79 - samples/sec: 453.78 - lr: 0.000135 - momentum: 0.000000
2023-10-10 23:58:32,658 epoch 3 - iter 145/292 - loss 0.38549466 - time (sec): 46.85 - samples/sec: 459.81 - lr: 0.000133 - momentum: 0.000000
2023-10-10 23:58:41,945 epoch 3 - iter 174/292 - loss 0.37158746 - time (sec): 56.14 - samples/sec: 457.91 - lr: 0.000132 - momentum: 0.000000
2023-10-10 23:58:51,749 epoch 3 - iter 203/292 - loss 0.35890713 - time (sec): 65.94 - samples/sec: 458.28 - lr: 0.000130 - momentum: 0.000000
2023-10-10 23:59:01,562 epoch 3 - iter 232/292 - loss 0.34783392 - time (sec): 75.75 - samples/sec: 460.75 - lr: 0.000128 - momentum: 0.000000
2023-10-10 23:59:11,656 epoch 3 - iter 261/292 - loss 0.34033929 - time (sec): 85.85 - samples/sec: 457.48 - lr: 0.000126 - momentum: 0.000000
2023-10-10 23:59:21,987 epoch 3 - iter 290/292 - loss 0.33163979 - time (sec): 96.18 - samples/sec: 459.69 - lr: 0.000125 - momentum: 0.000000
2023-10-10 23:59:22,517 ----------------------------------------------------------------------------------------------------
2023-10-10 23:59:22,517 EPOCH 3 done: loss 0.3376 - lr: 0.000125
2023-10-10 23:59:28,109 DEV : loss 0.25150060653686523 - f1-score (micro avg) 0.2521
2023-10-10 23:59:28,118 saving best model
2023-10-10 23:59:34,996 ----------------------------------------------------------------------------------------------------
|
2023-10-10 23:59:44,938 epoch 4 - iter 29/292 - loss 0.27546017 - time (sec): 9.94 - samples/sec: 417.80 - lr: 0.000123 - momentum: 0.000000
2023-10-10 23:59:55,490 epoch 4 - iter 58/292 - loss 0.35419864 - time (sec): 20.49 - samples/sec: 421.57 - lr: 0.000121 - momentum: 0.000000
2023-10-11 00:00:05,886 epoch 4 - iter 87/292 - loss 0.28871423 - time (sec): 30.89 - samples/sec: 423.34 - lr: 0.000119 - momentum: 0.000000
2023-10-11 00:00:16,538 epoch 4 - iter 116/292 - loss 0.27766491 - time (sec): 41.54 - samples/sec: 420.92 - lr: 0.000117 - momentum: 0.000000
2023-10-11 00:00:26,064 epoch 4 - iter 145/292 - loss 0.27300843 - time (sec): 51.06 - samples/sec: 417.19 - lr: 0.000116 - momentum: 0.000000
2023-10-11 00:00:36,178 epoch 4 - iter 174/292 - loss 0.26748700 - time (sec): 61.18 - samples/sec: 418.89 - lr: 0.000114 - momentum: 0.000000
2023-10-11 00:00:45,939 epoch 4 - iter 203/292 - loss 0.25642219 - time (sec): 70.94 - samples/sec: 425.81 - lr: 0.000112 - momentum: 0.000000
2023-10-11 00:00:55,685 epoch 4 - iter 232/292 - loss 0.25338406 - time (sec): 80.68 - samples/sec: 426.43 - lr: 0.000110 - momentum: 0.000000
2023-10-11 00:01:06,318 epoch 4 - iter 261/292 - loss 0.25874665 - time (sec): 91.32 - samples/sec: 429.89 - lr: 0.000109 - momentum: 0.000000
2023-10-11 00:01:18,053 epoch 4 - iter 290/292 - loss 0.25322113 - time (sec): 103.05 - samples/sec: 430.08 - lr: 0.000107 - momentum: 0.000000
2023-10-11 00:01:18,507 ----------------------------------------------------------------------------------------------------
2023-10-11 00:01:18,508 EPOCH 4 done: loss 0.2532 - lr: 0.000107
2023-10-11 00:01:24,255 DEV : loss 0.19647738337516785 - f1-score (micro avg) 0.4458
2023-10-11 00:01:24,264 saving best model
2023-10-11 00:01:30,995 ----------------------------------------------------------------------------------------------------
|
2023-10-11 00:01:41,291 epoch 5 - iter 29/292 - loss 0.20651694 - time (sec): 10.29 - samples/sec: 418.24 - lr: 0.000105 - momentum: 0.000000
2023-10-11 00:01:52,132 epoch 5 - iter 58/292 - loss 0.17543958 - time (sec): 21.13 - samples/sec: 428.09 - lr: 0.000103 - momentum: 0.000000
2023-10-11 00:02:02,418 epoch 5 - iter 87/292 - loss 0.17084180 - time (sec): 31.42 - samples/sec: 420.60 - lr: 0.000101 - momentum: 0.000000
2023-10-11 00:02:12,891 epoch 5 - iter 116/292 - loss 0.16868452 - time (sec): 41.89 - samples/sec: 421.77 - lr: 0.000100 - momentum: 0.000000
2023-10-11 00:02:22,341 epoch 5 - iter 145/292 - loss 0.16960028 - time (sec): 51.34 - samples/sec: 422.58 - lr: 0.000098 - momentum: 0.000000
2023-10-11 00:02:33,183 epoch 5 - iter 174/292 - loss 0.18355358 - time (sec): 62.18 - samples/sec: 440.39 - lr: 0.000096 - momentum: 0.000000
2023-10-11 00:02:42,823 epoch 5 - iter 203/292 - loss 0.17979713 - time (sec): 71.82 - samples/sec: 443.66 - lr: 0.000094 - momentum: 0.000000
2023-10-11 00:02:53,349 epoch 5 - iter 232/292 - loss 0.17660361 - time (sec): 82.35 - samples/sec: 442.96 - lr: 0.000093 - momentum: 0.000000
2023-10-11 00:03:02,616 epoch 5 - iter 261/292 - loss 0.17489136 - time (sec): 91.62 - samples/sec: 438.13 - lr: 0.000091 - momentum: 0.000000
2023-10-11 00:03:12,400 epoch 5 - iter 290/292 - loss 0.17466389 - time (sec): 101.40 - samples/sec: 436.72 - lr: 0.000089 - momentum: 0.000000
2023-10-11 00:03:12,893 ----------------------------------------------------------------------------------------------------
2023-10-11 00:03:12,893 EPOCH 5 done: loss 0.1748 - lr: 0.000089
2023-10-11 00:03:18,827 DEV : loss 0.16777318716049194 - f1-score (micro avg) 0.5745
2023-10-11 00:03:18,837 saving best model
2023-10-11 00:03:26,400 ----------------------------------------------------------------------------------------------------
|
2023-10-11 00:03:36,472 epoch 6 - iter 29/292 - loss 0.11628886 - time (sec): 10.07 - samples/sec: 478.62 - lr: 0.000087 - momentum: 0.000000
2023-10-11 00:03:45,925 epoch 6 - iter 58/292 - loss 0.12246444 - time (sec): 19.52 - samples/sec: 460.68 - lr: 0.000085 - momentum: 0.000000
2023-10-11 00:03:55,483 epoch 6 - iter 87/292 - loss 0.11954585 - time (sec): 29.08 - samples/sec: 456.69 - lr: 0.000084 - momentum: 0.000000
2023-10-11 00:04:05,383 epoch 6 - iter 116/292 - loss 0.12489577 - time (sec): 38.98 - samples/sec: 449.30 - lr: 0.000082 - momentum: 0.000000
2023-10-11 00:04:14,806 epoch 6 - iter 145/292 - loss 0.12818154 - time (sec): 48.40 - samples/sec: 450.02 - lr: 0.000080 - momentum: 0.000000
2023-10-11 00:04:24,229 epoch 6 - iter 174/292 - loss 0.13073488 - time (sec): 57.82 - samples/sec: 448.58 - lr: 0.000078 - momentum: 0.000000
2023-10-11 00:04:34,811 epoch 6 - iter 203/292 - loss 0.13015890 - time (sec): 68.41 - samples/sec: 457.91 - lr: 0.000077 - momentum: 0.000000
2023-10-11 00:04:44,607 epoch 6 - iter 232/292 - loss 0.13300589 - time (sec): 78.20 - samples/sec: 452.20 - lr: 0.000075 - momentum: 0.000000
2023-10-11 00:04:54,487 epoch 6 - iter 261/292 - loss 0.12882566 - time (sec): 88.08 - samples/sec: 451.32 - lr: 0.000073 - momentum: 0.000000
2023-10-11 00:05:03,962 epoch 6 - iter 290/292 - loss 0.12651322 - time (sec): 97.56 - samples/sec: 451.68 - lr: 0.000071 - momentum: 0.000000
2023-10-11 00:05:04,643 ----------------------------------------------------------------------------------------------------
2023-10-11 00:05:04,643 EPOCH 6 done: loss 0.1256 - lr: 0.000071
2023-10-11 00:05:10,396 DEV : loss 0.16338036954402924 - f1-score (micro avg) 0.6681
2023-10-11 00:05:10,405 saving best model
2023-10-11 00:05:17,046 ----------------------------------------------------------------------------------------------------
|
2023-10-11 00:05:26,299 epoch 7 - iter 29/292 - loss 0.10546819 - time (sec): 9.25 - samples/sec: 474.01 - lr: 0.000069 - momentum: 0.000000
2023-10-11 00:05:36,757 epoch 7 - iter 58/292 - loss 0.09332528 - time (sec): 19.71 - samples/sec: 479.39 - lr: 0.000068 - momentum: 0.000000
2023-10-11 00:05:45,596 epoch 7 - iter 87/292 - loss 0.08784883 - time (sec): 28.55 - samples/sec: 458.94 - lr: 0.000066 - momentum: 0.000000
2023-10-11 00:05:55,071 epoch 7 - iter 116/292 - loss 0.09824998 - time (sec): 38.02 - samples/sec: 454.65 - lr: 0.000064 - momentum: 0.000000
2023-10-11 00:06:04,287 epoch 7 - iter 145/292 - loss 0.09962287 - time (sec): 47.24 - samples/sec: 437.76 - lr: 0.000062 - momentum: 0.000000
2023-10-11 00:06:14,804 epoch 7 - iter 174/292 - loss 0.09942096 - time (sec): 57.75 - samples/sec: 440.28 - lr: 0.000061 - momentum: 0.000000
2023-10-11 00:06:25,173 epoch 7 - iter 203/292 - loss 0.09994166 - time (sec): 68.12 - samples/sec: 444.26 - lr: 0.000059 - momentum: 0.000000
2023-10-11 00:06:35,887 epoch 7 - iter 232/292 - loss 0.09925328 - time (sec): 78.84 - samples/sec: 445.93 - lr: 0.000057 - momentum: 0.000000
2023-10-11 00:06:45,736 epoch 7 - iter 261/292 - loss 0.09753226 - time (sec): 88.69 - samples/sec: 445.91 - lr: 0.000055 - momentum: 0.000000
2023-10-11 00:06:56,594 epoch 7 - iter 290/292 - loss 0.09556655 - time (sec): 99.54 - samples/sec: 444.81 - lr: 0.000054 - momentum: 0.000000
2023-10-11 00:06:57,108 ----------------------------------------------------------------------------------------------------
2023-10-11 00:06:57,109 EPOCH 7 done: loss 0.0954 - lr: 0.000054
2023-10-11 00:07:03,477 DEV : loss 0.15695013105869293 - f1-score (micro avg) 0.7137
2023-10-11 00:07:03,487 saving best model
2023-10-11 00:07:11,348 ----------------------------------------------------------------------------------------------------
|
2023-10-11 00:07:22,503 epoch 8 - iter 29/292 - loss 0.07758372 - time (sec): 11.15 - samples/sec: 390.65 - lr: 0.000052 - momentum: 0.000000
2023-10-11 00:07:32,880 epoch 8 - iter 58/292 - loss 0.09159419 - time (sec): 21.53 - samples/sec: 397.40 - lr: 0.000050 - momentum: 0.000000
2023-10-11 00:07:44,317 epoch 8 - iter 87/292 - loss 0.07999343 - time (sec): 32.96 - samples/sec: 396.79 - lr: 0.000048 - momentum: 0.000000
2023-10-11 00:07:55,122 epoch 8 - iter 116/292 - loss 0.08349239 - time (sec): 43.77 - samples/sec: 399.24 - lr: 0.000046 - momentum: 0.000000
2023-10-11 00:08:05,305 epoch 8 - iter 145/292 - loss 0.08024406 - time (sec): 53.95 - samples/sec: 404.48 - lr: 0.000045 - momentum: 0.000000
2023-10-11 00:08:14,966 epoch 8 - iter 174/292 - loss 0.08260764 - time (sec): 63.61 - samples/sec: 397.69 - lr: 0.000043 - momentum: 0.000000
2023-10-11 00:08:26,513 epoch 8 - iter 203/292 - loss 0.07913294 - time (sec): 75.16 - samples/sec: 407.83 - lr: 0.000041 - momentum: 0.000000
2023-10-11 00:08:36,850 epoch 8 - iter 232/292 - loss 0.07779059 - time (sec): 85.50 - samples/sec: 406.23 - lr: 0.000039 - momentum: 0.000000
2023-10-11 00:08:47,631 epoch 8 - iter 261/292 - loss 0.07763753 - time (sec): 96.28 - samples/sec: 415.71 - lr: 0.000038 - momentum: 0.000000
2023-10-11 00:08:57,163 epoch 8 - iter 290/292 - loss 0.07708456 - time (sec): 105.81 - samples/sec: 418.87 - lr: 0.000036 - momentum: 0.000000
2023-10-11 00:08:57,561 ----------------------------------------------------------------------------------------------------
2023-10-11 00:08:57,562 EPOCH 8 done: loss 0.0769 - lr: 0.000036
2023-10-11 00:09:03,594 DEV : loss 0.15200063586235046 - f1-score (micro avg) 0.7158
2023-10-11 00:09:03,603 saving best model
2023-10-11 00:09:10,900 ----------------------------------------------------------------------------------------------------
|
2023-10-11 00:09:20,671 epoch 9 - iter 29/292 - loss 0.07231048 - time (sec): 9.77 - samples/sec: 450.13 - lr: 0.000034 - momentum: 0.000000
2023-10-11 00:09:29,933 epoch 9 - iter 58/292 - loss 0.06442967 - time (sec): 19.03 - samples/sec: 445.71 - lr: 0.000032 - momentum: 0.000000
2023-10-11 00:09:38,821 epoch 9 - iter 87/292 - loss 0.07412397 - time (sec): 27.92 - samples/sec: 431.82 - lr: 0.000030 - momentum: 0.000000
2023-10-11 00:09:49,204 epoch 9 - iter 116/292 - loss 0.06959765 - time (sec): 38.30 - samples/sec: 441.31 - lr: 0.000029 - momentum: 0.000000
2023-10-11 00:10:00,330 epoch 9 - iter 145/292 - loss 0.06728567 - time (sec): 49.43 - samples/sec: 448.23 - lr: 0.000027 - momentum: 0.000000
2023-10-11 00:10:09,929 epoch 9 - iter 174/292 - loss 0.06657777 - time (sec): 59.03 - samples/sec: 442.53 - lr: 0.000025 - momentum: 0.000000
2023-10-11 00:10:19,909 epoch 9 - iter 203/292 - loss 0.06542487 - time (sec): 69.01 - samples/sec: 446.84 - lr: 0.000023 - momentum: 0.000000
2023-10-11 00:10:29,409 epoch 9 - iter 232/292 - loss 0.06463835 - time (sec): 78.51 - samples/sec: 444.23 - lr: 0.000022 - momentum: 0.000000
2023-10-11 00:10:39,706 epoch 9 - iter 261/292 - loss 0.06426085 - time (sec): 88.80 - samples/sec: 446.58 - lr: 0.000020 - momentum: 0.000000
2023-10-11 00:10:50,193 epoch 9 - iter 290/292 - loss 0.06250526 - time (sec): 99.29 - samples/sec: 446.07 - lr: 0.000018 - momentum: 0.000000
2023-10-11 00:10:50,659 ----------------------------------------------------------------------------------------------------
2023-10-11 00:10:50,659 EPOCH 9 done: loss 0.0624 - lr: 0.000018
2023-10-11 00:10:56,556 DEV : loss 0.14497257769107819 - f1-score (micro avg) 0.73
2023-10-11 00:10:56,566 saving best model
2023-10-11 00:11:01,168 ----------------------------------------------------------------------------------------------------
|
2023-10-11 00:11:11,357 epoch 10 - iter 29/292 - loss 0.06224818 - time (sec): 10.18 - samples/sec: 419.09 - lr: 0.000016 - momentum: 0.000000
2023-10-11 00:11:22,403 epoch 10 - iter 58/292 - loss 0.06532578 - time (sec): 21.23 - samples/sec: 425.84 - lr: 0.000014 - momentum: 0.000000
2023-10-11 00:11:33,271 epoch 10 - iter 87/292 - loss 0.05906433 - time (sec): 32.10 - samples/sec: 404.26 - lr: 0.000013 - momentum: 0.000000
2023-10-11 00:11:44,384 epoch 10 - iter 116/292 - loss 0.05573679 - time (sec): 43.21 - samples/sec: 400.72 - lr: 0.000011 - momentum: 0.000000
2023-10-11 00:11:55,480 epoch 10 - iter 145/292 - loss 0.05252369 - time (sec): 54.31 - samples/sec: 396.67 - lr: 0.000009 - momentum: 0.000000
2023-10-11 00:12:07,290 epoch 10 - iter 174/292 - loss 0.05442054 - time (sec): 66.12 - samples/sec: 398.00 - lr: 0.000007 - momentum: 0.000000
2023-10-11 00:12:19,426 epoch 10 - iter 203/292 - loss 0.05703866 - time (sec): 78.25 - samples/sec: 400.09 - lr: 0.000006 - momentum: 0.000000
2023-10-11 00:12:30,207 epoch 10 - iter 232/292 - loss 0.05634774 - time (sec): 89.03 - samples/sec: 394.96 - lr: 0.000004 - momentum: 0.000000
2023-10-11 00:12:42,095 epoch 10 - iter 261/292 - loss 0.05627346 - time (sec): 100.92 - samples/sec: 399.12 - lr: 0.000002 - momentum: 0.000000
2023-10-11 00:12:52,967 epoch 10 - iter 290/292 - loss 0.05708459 - time (sec): 111.79 - samples/sec: 396.06 - lr: 0.000000 - momentum: 0.000000
2023-10-11 00:12:53,475 ----------------------------------------------------------------------------------------------------
2023-10-11 00:12:53,476 EPOCH 10 done: loss 0.0569 - lr: 0.000000
2023-10-11 00:12:59,917 DEV : loss 0.14803124964237213 - f1-score (micro avg) 0.7406
2023-10-11 00:12:59,927 saving best model
2023-10-11 00:13:03,728 ----------------------------------------------------------------------------------------------------
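The per-epoch `DEV` lines are the only place where model selection is visible. The sketch below extracts them with a regular expression and picks the best epoch by dev micro-F1, which is the selection metric named in the header. The function name and regex are hypothetical. The sketch also assumes DEV lines appear once per epoch in order, so the epoch number is simply their position in the list.

```python
import re

# Hypothetical pattern for the "DEV : loss ... - f1-score (micro avg) ..." lines.
DEV_RE = re.compile(r"DEV : loss (?P<loss>[\d.]+) - f1-score \(micro avg\)\s+(?P<f1>[\d.]+)")

LOG_LINES = [  # DEV lines copied verbatim from the log above
    "2023-10-10 23:55:56,831 DEV : loss 0.6711924076080322 - f1-score (micro avg) 0.0",
    "2023-10-10 23:57:44,863 DEV : loss 0.3161855638027191 - f1-score (micro avg) 0.0623",
    "2023-10-10 23:59:28,109 DEV : loss 0.25150060653686523 - f1-score (micro avg) 0.2521",
    "2023-10-11 00:01:24,255 DEV : loss 0.19647738337516785 - f1-score (micro avg) 0.4458",
    "2023-10-11 00:03:18,827 DEV : loss 0.16777318716049194 - f1-score (micro avg) 0.5745",
    "2023-10-11 00:05:10,396 DEV : loss 0.16338036954402924 - f1-score (micro avg) 0.6681",
    "2023-10-11 00:07:03,477 DEV : loss 0.15695013105869293 - f1-score (micro avg) 0.7137",
    "2023-10-11 00:09:03,594 DEV : loss 0.15200063586235046 - f1-score (micro avg) 0.7158",
    "2023-10-11 00:10:56,556 DEV : loss 0.14497257769107819 - f1-score (micro avg) 0.73",
    "2023-10-11 00:12:59,917 DEV : loss 0.14803124964237213 - f1-score (micro avg) 0.7406",
]


def dev_scores(lines):
    """Return (epoch, dev_loss, dev_f1) tuples, numbering DEV lines in order of appearance."""
    scores = []
    for line in lines:
        m = DEV_RE.search(line)
        if m:
            scores.append((len(scores) + 1, float(m["loss"]), float(m["f1"])))
    return scores


scores = dev_scores(LOG_LINES)
best = max(scores, key=lambda s: s[2])           # selection by micro-F1
lowest_loss = min(scores, key=lambda s: s[1])    # for comparison only
print(f"best F1: epoch {best[0]} ({best[2]}), lowest dev loss: epoch {lowest_loss[0]}")
```

The two criteria disagree on this run. Dev loss is lowest after epoch 9 (0.1450), while dev micro-F1 peaks after epoch 10 (0.7406). Because selection uses F1, the epoch-10 checkpoint is the one saved as `best-model.pt`.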
|
2023-10-11 00:13:03,730 Loading model from best epoch ...
2023-10-11 00:13:07,606 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
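The 17-tag dictionary is a BIOES encoding: `O` plus the prefixes S (single), B (begin), E (end), and I (inside) for each of the four entity types. Span-level scores such as those below are computed over entities rather than individual tags. The following is a minimal sketch of turning a BIOES sequence into entity spans. It is not Flair's own decoder; the function name is hypothetical, and the choice to silently drop ill-formed fragments (for example, a `B-` never closed by a matching `E-`) is an assumption.

```python
# Rebuild the tag dictionary in the order the log prints it.
TYPES = ["LOC", "PER", "ORG", "HumanProd"]
TAGS = ["O"] + [f"{prefix}-{typ}" for typ in TYPES for prefix in "SBEI"]


def bioes_to_spans(tags):
    """Convert BIOES tags to (start, end_exclusive, label) spans.

    Hypothetical helper: ill-formed fragments are dropped rather than repaired.
    """
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        prefix, _, typ = tag.partition("-")
        if prefix == "S":                      # single-token entity
            spans.append((i, i + 1, typ))
            start, label = None, None
        elif prefix == "B":                    # open a multi-token entity
            start, label = i, typ
        elif prefix == "I":                    # continue only if the type matches
            if label != typ:
                start, label = None, None
        elif prefix == "E":                    # close the entity if it was opened
            if label == typ and start is not None:
                spans.append((start, i + 1, typ))
            start, label = None, None
        else:                                  # "O" resets any open entity
            start, label = None, None
    return spans


example = ["S-PER", "O", "B-LOC", "I-LOC", "E-LOC", "B-ORG", "O", "S-HumanProd"]
print(len(TAGS), bioes_to_spans(example))
```

In the example, the unterminated `B-ORG` is discarded. A decoder that repaired it into a one-token ORG span would produce different counts, so the decoding policy affects the reported precision and recall.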
|
2023-10-11 00:13:22,595
Results:
- F-score (micro) 0.6983
- F-score (macro) 0.6233
- Accuracy 0.5556

By class:
              precision    recall  f1-score   support

         PER     0.7487    0.8305    0.7875       348
         LOC     0.5710    0.7395    0.6444       261
         ORG     0.3830    0.3462    0.3636        52
   HumanProd     0.7143    0.6818    0.6977        22

   micro avg     0.6503    0.7540    0.6983       683
   macro avg     0.6042    0.6495    0.6233       683
weighted avg     0.6518    0.7540    0.6976       683

2023-10-11 00:13:22,595 ----------------------------------------------------------------------------------------------------
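The three summary rows of the test-set table use different averaging rules. The sketch below recomputes them from the per-class rows. Because the inputs are the table's own rounded 4-digit values, agreement is only expected to within rounding. It also shows that "macro avg" F1 is the mean of per-class F1 scores, not the harmonic mean of macro precision and macro recall.

```python
# Per-class rows copied from the "By class" table: (precision, recall, f1, support).
by_class = {
    "PER": (0.7487, 0.8305, 0.7875, 348),
    "LOC": (0.5710, 0.7395, 0.6444, 261),
    "ORG": (0.3830, 0.3462, 0.3636, 52),
    "HumanProd": (0.7143, 0.6818, 0.6977, 22),
}


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    return 2 * p * r / (p + r) if p + r else 0.0


support = sum(s for _, _, _, s in by_class.values())
# Macro: unweighted mean of per-class F1.
macro_f1 = sum(f for _, _, f, _ in by_class.values()) / len(by_class)
# Weighted: per-class F1 weighted by support.
weighted_f1 = sum(f * s for _, _, f, s in by_class.values()) / support
# Micro: F1 of the pooled precision/recall from the "micro avg" row.
micro_f1 = f1(0.6503, 0.7540)
# Not how macro F1 is defined; shown for contrast.
f1_of_macro_pr = f1(0.6042, 0.6495)

print(f"support={support} micro={micro_f1:.4f} macro={macro_f1:.4f} "
      f"weighted={weighted_f1:.4f} f1(macro P, macro R)={f1_of_macro_pr:.4f}")
```

The gap between micro F1 (0.6983) and macro F1 (0.6233) comes mainly from ORG. It has only 52 test mentions, so it barely moves the micro average, but its low F1 of 0.3636 counts as a full quarter of the macro average.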
|
|