|
2023-10-12 18:58:20,800 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 18:58:20,802 Model: "SequenceTagger( |
|
(embeddings): ByT5Embeddings( |
|
(model): T5EncoderModel( |
|
(shared): Embedding(384, 1472) |
|
(encoder): T5Stack( |
|
(embed_tokens): Embedding(384, 1472) |
|
(block): ModuleList( |
|
(0): T5Block( |
|
(layer): ModuleList( |
|
(0): T5LayerSelfAttention( |
|
(SelfAttention): T5Attention( |
|
(q): Linear(in_features=1472, out_features=384, bias=False) |
|
(k): Linear(in_features=1472, out_features=384, bias=False) |
|
(v): Linear(in_features=1472, out_features=384, bias=False) |
|
(o): Linear(in_features=384, out_features=1472, bias=False) |
|
(relative_attention_bias): Embedding(32, 6) |
|
) |
|
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
) |
|
(1): T5LayerFF( |
|
(DenseReluDense): T5DenseGatedActDense( |
|
(wi_0): Linear(in_features=1472, out_features=3584, bias=False) |
|
(wi_1): Linear(in_features=1472, out_features=3584, bias=False) |
|
(wo): Linear(in_features=3584, out_features=1472, bias=False) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
(act): NewGELUActivation() |
|
) |
|
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
) |
|
) |
|
) |
|
(1-11): 11 x T5Block( |
|
(layer): ModuleList( |
|
(0): T5LayerSelfAttention( |
|
(SelfAttention): T5Attention( |
|
(q): Linear(in_features=1472, out_features=384, bias=False) |
|
(k): Linear(in_features=1472, out_features=384, bias=False) |
|
(v): Linear(in_features=1472, out_features=384, bias=False) |
|
(o): Linear(in_features=384, out_features=1472, bias=False) |
|
) |
|
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
) |
|
(1): T5LayerFF( |
|
(DenseReluDense): T5DenseGatedActDense( |
|
(wi_0): Linear(in_features=1472, out_features=3584, bias=False) |
|
(wi_1): Linear(in_features=1472, out_features=3584, bias=False) |
|
(wo): Linear(in_features=3584, out_features=1472, bias=False) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
(act): NewGELUActivation() |
|
) |
|
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
) |
|
) |
|
) |
|
) |
|
(final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
) |
|
) |
|
) |
|
(locked_dropout): LockedDropout(p=0.5) |
|
(linear): Linear(in_features=1472, out_features=13, bias=True) |
|
(loss_function): CrossEntropyLoss() |
|
)" |
|
2023-10-12 18:58:20,803 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 18:58:20,803 MultiCorpus: 5777 train + 722 dev + 723 test sentences |
|
- NER_ICDAR_EUROPEANA Corpus: 5777 train + 722 dev + 723 test sentences - /root/.flair/datasets/ner_icdar_europeana/nl |
|
2023-10-12 18:58:20,803 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 18:58:20,803 Train: 5777 sentences |
|
2023-10-12 18:58:20,803 (train_with_dev=False, train_with_test=False) |
|
2023-10-12 18:58:20,803 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 18:58:20,803 Training Params: |
|
2023-10-12 18:58:20,803 - learning_rate: "0.00016" |
|
2023-10-12 18:58:20,803 - mini_batch_size: "8" |
|
2023-10-12 18:58:20,803 - max_epochs: "10" |
|
2023-10-12 18:58:20,804 - shuffle: "True" |
|
2023-10-12 18:58:20,804 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 18:58:20,804 Plugins: |
|
2023-10-12 18:58:20,804 - TensorboardLogger |
|
2023-10-12 18:58:20,804 - LinearScheduler | warmup_fraction: '0.1' |
|
2023-10-12 18:58:20,804 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 18:58:20,804 Final evaluation on model from best epoch (best-model.pt) |
|
2023-10-12 18:58:20,804 - metric: "('micro avg', 'f1-score')" |
|
2023-10-12 18:58:20,804 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 18:58:20,804 Computation: |
|
2023-10-12 18:58:20,804 - compute on device: cuda:0 |
|
2023-10-12 18:58:20,804 - embedding storage: none |
|
2023-10-12 18:58:20,804 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 18:58:20,804 Model training base path: "hmbench-icdar/nl-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-5" |
|
2023-10-12 18:58:20,804 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 18:58:20,804 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 18:58:20,805 Logging anything other than scalars to TensorBoard is currently not supported. |
|
2023-10-12 18:59:00,054 epoch 1 - iter 72/723 - loss 2.54355580 - time (sec): 39.25 - samples/sec: 460.13 - lr: 0.000016 - momentum: 0.000000 |
|
2023-10-12 18:59:38,638 epoch 1 - iter 144/723 - loss 2.47351916 - time (sec): 77.83 - samples/sec: 463.81 - lr: 0.000032 - momentum: 0.000000 |
|
2023-10-12 19:00:17,358 epoch 1 - iter 216/723 - loss 2.30912831 - time (sec): 116.55 - samples/sec: 449.38 - lr: 0.000048 - momentum: 0.000000 |
|
2023-10-12 19:00:59,860 epoch 1 - iter 288/723 - loss 2.09160479 - time (sec): 159.05 - samples/sec: 440.50 - lr: 0.000064 - momentum: 0.000000 |
|
2023-10-12 19:01:41,319 epoch 1 - iter 360/723 - loss 1.86564000 - time (sec): 200.51 - samples/sec: 438.01 - lr: 0.000079 - momentum: 0.000000 |
|
2023-10-12 19:02:21,012 epoch 1 - iter 432/723 - loss 1.64989151 - time (sec): 240.21 - samples/sec: 435.87 - lr: 0.000095 - momentum: 0.000000 |
|
2023-10-12 19:03:00,546 epoch 1 - iter 504/723 - loss 1.45108438 - time (sec): 279.74 - samples/sec: 437.65 - lr: 0.000111 - momentum: 0.000000 |
|
2023-10-12 19:03:39,300 epoch 1 - iter 576/723 - loss 1.30837807 - time (sec): 318.49 - samples/sec: 437.09 - lr: 0.000127 - momentum: 0.000000 |
|
2023-10-12 19:04:19,048 epoch 1 - iter 648/723 - loss 1.18554857 - time (sec): 358.24 - samples/sec: 438.05 - lr: 0.000143 - momentum: 0.000000 |
|
2023-10-12 19:05:00,227 epoch 1 - iter 720/723 - loss 1.07844162 - time (sec): 399.42 - samples/sec: 439.29 - lr: 0.000159 - momentum: 0.000000 |
|
2023-10-12 19:05:01,624 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 19:05:01,624 EPOCH 1 done: loss 1.0744 - lr: 0.000159 |
|
2023-10-12 19:05:21,904 DEV : loss 0.2226296067237854 - f1-score (micro avg) 0.0 |
|
2023-10-12 19:05:21,937 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 19:06:00,883 epoch 2 - iter 72/723 - loss 0.16071835 - time (sec): 38.94 - samples/sec: 447.92 - lr: 0.000158 - momentum: 0.000000 |
|
2023-10-12 19:06:40,410 epoch 2 - iter 144/723 - loss 0.14928826 - time (sec): 78.47 - samples/sec: 451.95 - lr: 0.000156 - momentum: 0.000000 |
|
2023-10-12 19:07:19,001 epoch 2 - iter 216/723 - loss 0.14640929 - time (sec): 117.06 - samples/sec: 444.07 - lr: 0.000155 - momentum: 0.000000 |
|
2023-10-12 19:07:58,952 epoch 2 - iter 288/723 - loss 0.14351447 - time (sec): 157.01 - samples/sec: 444.43 - lr: 0.000153 - momentum: 0.000000 |
|
2023-10-12 19:08:37,927 epoch 2 - iter 360/723 - loss 0.13927936 - time (sec): 195.99 - samples/sec: 445.13 - lr: 0.000151 - momentum: 0.000000 |
|
2023-10-12 19:09:17,408 epoch 2 - iter 432/723 - loss 0.13562770 - time (sec): 235.47 - samples/sec: 445.08 - lr: 0.000149 - momentum: 0.000000 |
|
2023-10-12 19:09:56,251 epoch 2 - iter 504/723 - loss 0.13508475 - time (sec): 274.31 - samples/sec: 445.09 - lr: 0.000148 - momentum: 0.000000 |
|
2023-10-12 19:10:35,074 epoch 2 - iter 576/723 - loss 0.13232611 - time (sec): 313.13 - samples/sec: 446.82 - lr: 0.000146 - momentum: 0.000000 |
|
2023-10-12 19:11:14,451 epoch 2 - iter 648/723 - loss 0.12941887 - time (sec): 352.51 - samples/sec: 448.75 - lr: 0.000144 - momentum: 0.000000 |
|
2023-10-12 19:11:53,004 epoch 2 - iter 720/723 - loss 0.12559223 - time (sec): 391.07 - samples/sec: 449.04 - lr: 0.000142 - momentum: 0.000000 |
|
2023-10-12 19:11:54,252 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 19:11:54,252 EPOCH 2 done: loss 0.1254 - lr: 0.000142 |
|
2023-10-12 19:12:15,858 DEV : loss 0.10742620378732681 - f1-score (micro avg) 0.7805 |
|
2023-10-12 19:12:15,891 saving best model |
|
2023-10-12 19:12:16,807 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 19:12:56,695 epoch 3 - iter 72/723 - loss 0.08288091 - time (sec): 39.89 - samples/sec: 448.88 - lr: 0.000140 - momentum: 0.000000 |
|
2023-10-12 19:13:36,974 epoch 3 - iter 144/723 - loss 0.07985183 - time (sec): 80.16 - samples/sec: 448.64 - lr: 0.000139 - momentum: 0.000000 |
|
2023-10-12 19:14:16,040 epoch 3 - iter 216/723 - loss 0.07952059 - time (sec): 119.23 - samples/sec: 448.91 - lr: 0.000137 - momentum: 0.000000 |
|
2023-10-12 19:14:54,814 epoch 3 - iter 288/723 - loss 0.07742485 - time (sec): 158.01 - samples/sec: 449.88 - lr: 0.000135 - momentum: 0.000000 |
|
2023-10-12 19:15:33,909 epoch 3 - iter 360/723 - loss 0.07655356 - time (sec): 197.10 - samples/sec: 451.75 - lr: 0.000133 - momentum: 0.000000 |
|
2023-10-12 19:16:13,793 epoch 3 - iter 432/723 - loss 0.07627322 - time (sec): 236.98 - samples/sec: 454.76 - lr: 0.000132 - momentum: 0.000000 |
|
2023-10-12 19:16:52,959 epoch 3 - iter 504/723 - loss 0.07598384 - time (sec): 276.15 - samples/sec: 453.26 - lr: 0.000130 - momentum: 0.000000 |
|
2023-10-12 19:17:32,309 epoch 3 - iter 576/723 - loss 0.07582371 - time (sec): 315.50 - samples/sec: 450.64 - lr: 0.000128 - momentum: 0.000000 |
|
2023-10-12 19:18:11,512 epoch 3 - iter 648/723 - loss 0.07469627 - time (sec): 354.70 - samples/sec: 447.48 - lr: 0.000126 - momentum: 0.000000 |
|
2023-10-12 19:18:50,760 epoch 3 - iter 720/723 - loss 0.07352052 - time (sec): 393.95 - samples/sec: 445.53 - lr: 0.000125 - momentum: 0.000000 |
|
2023-10-12 19:18:52,075 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 19:18:52,075 EPOCH 3 done: loss 0.0736 - lr: 0.000125 |
|
2023-10-12 19:19:13,745 DEV : loss 0.07580851018428802 - f1-score (micro avg) 0.8611 |
|
2023-10-12 19:19:13,776 saving best model |
|
2023-10-12 19:19:24,857 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 19:20:05,578 epoch 4 - iter 72/723 - loss 0.05191155 - time (sec): 40.72 - samples/sec: 440.48 - lr: 0.000123 - momentum: 0.000000 |
|
2023-10-12 19:20:44,504 epoch 4 - iter 144/723 - loss 0.05203988 - time (sec): 79.64 - samples/sec: 436.31 - lr: 0.000121 - momentum: 0.000000 |
|
2023-10-12 19:21:22,936 epoch 4 - iter 216/723 - loss 0.04893082 - time (sec): 118.07 - samples/sec: 444.03 - lr: 0.000119 - momentum: 0.000000 |
|
2023-10-12 19:22:01,345 epoch 4 - iter 288/723 - loss 0.04936358 - time (sec): 156.48 - samples/sec: 456.68 - lr: 0.000117 - momentum: 0.000000 |
|
2023-10-12 19:22:39,287 epoch 4 - iter 360/723 - loss 0.04704445 - time (sec): 194.43 - samples/sec: 458.90 - lr: 0.000116 - momentum: 0.000000 |
|
2023-10-12 19:23:17,810 epoch 4 - iter 432/723 - loss 0.04632235 - time (sec): 232.95 - samples/sec: 455.86 - lr: 0.000114 - momentum: 0.000000 |
|
2023-10-12 19:23:57,203 epoch 4 - iter 504/723 - loss 0.04588963 - time (sec): 272.34 - samples/sec: 452.32 - lr: 0.000112 - momentum: 0.000000 |
|
2023-10-12 19:24:36,716 epoch 4 - iter 576/723 - loss 0.04525655 - time (sec): 311.85 - samples/sec: 453.47 - lr: 0.000110 - momentum: 0.000000 |
|
2023-10-12 19:25:18,379 epoch 4 - iter 648/723 - loss 0.04859378 - time (sec): 353.52 - samples/sec: 449.36 - lr: 0.000109 - momentum: 0.000000 |
|
2023-10-12 19:25:56,777 epoch 4 - iter 720/723 - loss 0.04735869 - time (sec): 391.92 - samples/sec: 448.64 - lr: 0.000107 - momentum: 0.000000 |
|
2023-10-12 19:25:57,841 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 19:25:57,841 EPOCH 4 done: loss 0.0475 - lr: 0.000107 |
|
2023-10-12 19:26:19,114 DEV : loss 0.09613429009914398 - f1-score (micro avg) 0.8346 |
|
2023-10-12 19:26:19,147 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 19:26:59,987 epoch 5 - iter 72/723 - loss 0.03592579 - time (sec): 40.84 - samples/sec: 462.23 - lr: 0.000105 - momentum: 0.000000 |
|
2023-10-12 19:27:38,528 epoch 5 - iter 144/723 - loss 0.03119838 - time (sec): 79.38 - samples/sec: 457.18 - lr: 0.000103 - momentum: 0.000000 |
|
2023-10-12 19:28:16,016 epoch 5 - iter 216/723 - loss 0.03042988 - time (sec): 116.87 - samples/sec: 443.70 - lr: 0.000101 - momentum: 0.000000 |
|
2023-10-12 19:28:54,004 epoch 5 - iter 288/723 - loss 0.03005227 - time (sec): 154.85 - samples/sec: 439.46 - lr: 0.000100 - momentum: 0.000000 |
|
2023-10-12 19:29:34,928 epoch 5 - iter 360/723 - loss 0.03217606 - time (sec): 195.78 - samples/sec: 442.46 - lr: 0.000098 - momentum: 0.000000 |
|
2023-10-12 19:30:14,684 epoch 5 - iter 432/723 - loss 0.03112412 - time (sec): 235.53 - samples/sec: 441.59 - lr: 0.000096 - momentum: 0.000000 |
|
2023-10-12 19:30:55,240 epoch 5 - iter 504/723 - loss 0.03176554 - time (sec): 276.09 - samples/sec: 443.79 - lr: 0.000094 - momentum: 0.000000 |
|
2023-10-12 19:31:34,003 epoch 5 - iter 576/723 - loss 0.03180365 - time (sec): 314.85 - samples/sec: 445.61 - lr: 0.000093 - momentum: 0.000000 |
|
2023-10-12 19:32:13,719 epoch 5 - iter 648/723 - loss 0.03213951 - time (sec): 354.57 - samples/sec: 445.33 - lr: 0.000091 - momentum: 0.000000 |
|
2023-10-12 19:32:54,951 epoch 5 - iter 720/723 - loss 0.03253080 - time (sec): 395.80 - samples/sec: 443.07 - lr: 0.000089 - momentum: 0.000000 |
|
2023-10-12 19:32:56,682 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 19:32:56,683 EPOCH 5 done: loss 0.0326 - lr: 0.000089 |
|
2023-10-12 19:33:18,519 DEV : loss 0.08075438439846039 - f1-score (micro avg) 0.8604 |
|
2023-10-12 19:33:18,549 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 19:33:57,261 epoch 6 - iter 72/723 - loss 0.02372279 - time (sec): 38.71 - samples/sec: 444.51 - lr: 0.000087 - momentum: 0.000000 |
|
2023-10-12 19:34:36,141 epoch 6 - iter 144/723 - loss 0.02332990 - time (sec): 77.59 - samples/sec: 445.00 - lr: 0.000085 - momentum: 0.000000 |
|
2023-10-12 19:35:15,551 epoch 6 - iter 216/723 - loss 0.02529176 - time (sec): 117.00 - samples/sec: 447.90 - lr: 0.000084 - momentum: 0.000000 |
|
2023-10-12 19:35:56,603 epoch 6 - iter 288/723 - loss 0.02502072 - time (sec): 158.05 - samples/sec: 445.49 - lr: 0.000082 - momentum: 0.000000 |
|
2023-10-12 19:36:37,040 epoch 6 - iter 360/723 - loss 0.02478190 - time (sec): 198.49 - samples/sec: 444.01 - lr: 0.000080 - momentum: 0.000000 |
|
2023-10-12 19:37:16,368 epoch 6 - iter 432/723 - loss 0.02272948 - time (sec): 237.82 - samples/sec: 448.09 - lr: 0.000078 - momentum: 0.000000 |
|
2023-10-12 19:37:55,143 epoch 6 - iter 504/723 - loss 0.02482061 - time (sec): 276.59 - samples/sec: 449.75 - lr: 0.000077 - momentum: 0.000000 |
|
2023-10-12 19:38:32,812 epoch 6 - iter 576/723 - loss 0.02360557 - time (sec): 314.26 - samples/sec: 448.53 - lr: 0.000075 - momentum: 0.000000 |
|
2023-10-12 19:39:11,274 epoch 6 - iter 648/723 - loss 0.02300141 - time (sec): 352.72 - samples/sec: 446.97 - lr: 0.000073 - momentum: 0.000000 |
|
2023-10-12 19:39:53,495 epoch 6 - iter 720/723 - loss 0.02395673 - time (sec): 394.94 - samples/sec: 444.81 - lr: 0.000071 - momentum: 0.000000 |
|
2023-10-12 19:39:54,699 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 19:39:54,700 EPOCH 6 done: loss 0.0240 - lr: 0.000071 |
|
2023-10-12 19:40:16,957 DEV : loss 0.09915146231651306 - f1-score (micro avg) 0.8614 |
|
2023-10-12 19:40:16,992 saving best model |
|
2023-10-12 19:40:19,580 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 19:41:00,926 epoch 7 - iter 72/723 - loss 0.02262591 - time (sec): 41.34 - samples/sec: 426.67 - lr: 0.000069 - momentum: 0.000000 |
|
2023-10-12 19:41:42,902 epoch 7 - iter 144/723 - loss 0.02190572 - time (sec): 83.32 - samples/sec: 426.99 - lr: 0.000068 - momentum: 0.000000 |
|
2023-10-12 19:42:24,325 epoch 7 - iter 216/723 - loss 0.02121091 - time (sec): 124.74 - samples/sec: 417.62 - lr: 0.000066 - momentum: 0.000000 |
|
2023-10-12 19:43:04,878 epoch 7 - iter 288/723 - loss 0.02062404 - time (sec): 165.29 - samples/sec: 415.31 - lr: 0.000064 - momentum: 0.000000 |
|
2023-10-12 19:43:46,681 epoch 7 - iter 360/723 - loss 0.02239849 - time (sec): 207.10 - samples/sec: 419.10 - lr: 0.000062 - momentum: 0.000000 |
|
2023-10-12 19:44:28,351 epoch 7 - iter 432/723 - loss 0.02093754 - time (sec): 248.77 - samples/sec: 418.93 - lr: 0.000061 - momentum: 0.000000 |
|
2023-10-12 19:45:08,141 epoch 7 - iter 504/723 - loss 0.02078108 - time (sec): 288.56 - samples/sec: 422.55 - lr: 0.000059 - momentum: 0.000000 |
|
2023-10-12 19:45:47,271 epoch 7 - iter 576/723 - loss 0.02052113 - time (sec): 327.69 - samples/sec: 424.00 - lr: 0.000057 - momentum: 0.000000 |
|
2023-10-12 19:46:26,176 epoch 7 - iter 648/723 - loss 0.01984905 - time (sec): 366.59 - samples/sec: 426.95 - lr: 0.000055 - momentum: 0.000000 |
|
2023-10-12 19:47:06,241 epoch 7 - iter 720/723 - loss 0.01954505 - time (sec): 406.66 - samples/sec: 431.54 - lr: 0.000053 - momentum: 0.000000 |
|
2023-10-12 19:47:07,629 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 19:47:07,630 EPOCH 7 done: loss 0.0195 - lr: 0.000053 |
|
2023-10-12 19:47:29,089 DEV : loss 0.10769647359848022 - f1-score (micro avg) 0.8602 |
|
2023-10-12 19:47:29,124 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 19:48:08,516 epoch 8 - iter 72/723 - loss 0.01456090 - time (sec): 39.39 - samples/sec: 471.19 - lr: 0.000052 - momentum: 0.000000 |
|
2023-10-12 19:48:47,249 epoch 8 - iter 144/723 - loss 0.01638180 - time (sec): 78.12 - samples/sec: 459.65 - lr: 0.000050 - momentum: 0.000000 |
|
2023-10-12 19:49:25,637 epoch 8 - iter 216/723 - loss 0.01560092 - time (sec): 116.51 - samples/sec: 453.37 - lr: 0.000048 - momentum: 0.000000 |
|
2023-10-12 19:50:05,829 epoch 8 - iter 288/723 - loss 0.01509465 - time (sec): 156.70 - samples/sec: 459.02 - lr: 0.000046 - momentum: 0.000000 |
|
2023-10-12 19:50:44,804 epoch 8 - iter 360/723 - loss 0.01478213 - time (sec): 195.68 - samples/sec: 456.37 - lr: 0.000045 - momentum: 0.000000 |
|
2023-10-12 19:51:23,381 epoch 8 - iter 432/723 - loss 0.01528980 - time (sec): 234.25 - samples/sec: 451.48 - lr: 0.000043 - momentum: 0.000000 |
|
2023-10-12 19:52:02,332 epoch 8 - iter 504/723 - loss 0.01593641 - time (sec): 273.21 - samples/sec: 450.38 - lr: 0.000041 - momentum: 0.000000 |
|
2023-10-12 19:52:41,277 epoch 8 - iter 576/723 - loss 0.01546593 - time (sec): 312.15 - samples/sec: 446.97 - lr: 0.000039 - momentum: 0.000000 |
|
2023-10-12 19:53:22,174 epoch 8 - iter 648/723 - loss 0.01707868 - time (sec): 353.05 - samples/sec: 447.25 - lr: 0.000037 - momentum: 0.000000 |
|
2023-10-12 19:54:02,545 epoch 8 - iter 720/723 - loss 0.01621935 - time (sec): 393.42 - samples/sec: 446.66 - lr: 0.000036 - momentum: 0.000000 |
|
2023-10-12 19:54:03,704 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 19:54:03,705 EPOCH 8 done: loss 0.0162 - lr: 0.000036 |
|
2023-10-12 19:54:24,898 DEV : loss 0.11788733303546906 - f1-score (micro avg) 0.8613 |
|
2023-10-12 19:54:24,929 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 19:55:04,405 epoch 9 - iter 72/723 - loss 0.00435665 - time (sec): 39.47 - samples/sec: 466.21 - lr: 0.000034 - momentum: 0.000000 |
|
2023-10-12 19:55:43,405 epoch 9 - iter 144/723 - loss 0.01561935 - time (sec): 78.47 - samples/sec: 474.03 - lr: 0.000032 - momentum: 0.000000 |
|
2023-10-12 19:56:21,332 epoch 9 - iter 216/723 - loss 0.01515669 - time (sec): 116.40 - samples/sec: 472.95 - lr: 0.000030 - momentum: 0.000000 |
|
2023-10-12 19:56:58,414 epoch 9 - iter 288/723 - loss 0.01423773 - time (sec): 153.48 - samples/sec: 463.50 - lr: 0.000028 - momentum: 0.000000 |
|
2023-10-12 19:57:36,161 epoch 9 - iter 360/723 - loss 0.01346231 - time (sec): 191.23 - samples/sec: 454.95 - lr: 0.000027 - momentum: 0.000000 |
|
2023-10-12 19:58:16,429 epoch 9 - iter 432/723 - loss 0.01303858 - time (sec): 231.50 - samples/sec: 453.21 - lr: 0.000025 - momentum: 0.000000 |
|
2023-10-12 19:58:56,064 epoch 9 - iter 504/723 - loss 0.01320394 - time (sec): 271.13 - samples/sec: 451.94 - lr: 0.000023 - momentum: 0.000000 |
|
2023-10-12 19:59:36,961 epoch 9 - iter 576/723 - loss 0.01367903 - time (sec): 312.03 - samples/sec: 453.35 - lr: 0.000021 - momentum: 0.000000 |
|
2023-10-12 20:00:16,437 epoch 9 - iter 648/723 - loss 0.01284167 - time (sec): 351.51 - samples/sec: 450.96 - lr: 0.000020 - momentum: 0.000000 |
|
2023-10-12 20:00:56,177 epoch 9 - iter 720/723 - loss 0.01273017 - time (sec): 391.25 - samples/sec: 449.00 - lr: 0.000018 - momentum: 0.000000 |
|
2023-10-12 20:00:57,336 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 20:00:57,336 EPOCH 9 done: loss 0.0127 - lr: 0.000018 |
|
2023-10-12 20:01:18,773 DEV : loss 0.11393096297979355 - f1-score (micro avg) 0.8665 |
|
2023-10-12 20:01:18,808 saving best model |
|
2023-10-12 20:01:23,910 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 20:02:03,031 epoch 10 - iter 72/723 - loss 0.00615811 - time (sec): 39.12 - samples/sec: 460.52 - lr: 0.000016 - momentum: 0.000000 |
|
2023-10-12 20:02:41,238 epoch 10 - iter 144/723 - loss 0.00676133 - time (sec): 77.32 - samples/sec: 436.23 - lr: 0.000014 - momentum: 0.000000 |
|
2023-10-12 20:03:20,138 epoch 10 - iter 216/723 - loss 0.00881176 - time (sec): 116.22 - samples/sec: 436.14 - lr: 0.000012 - momentum: 0.000000 |
|
2023-10-12 20:04:00,711 epoch 10 - iter 288/723 - loss 0.01105617 - time (sec): 156.80 - samples/sec: 440.79 - lr: 0.000011 - momentum: 0.000000 |
|
2023-10-12 20:04:40,165 epoch 10 - iter 360/723 - loss 0.01009560 - time (sec): 196.25 - samples/sec: 439.76 - lr: 0.000009 - momentum: 0.000000 |
|
2023-10-12 20:05:21,218 epoch 10 - iter 432/723 - loss 0.00922833 - time (sec): 237.30 - samples/sec: 441.24 - lr: 0.000007 - momentum: 0.000000 |
|
2023-10-12 20:06:01,972 epoch 10 - iter 504/723 - loss 0.00991438 - time (sec): 278.06 - samples/sec: 443.34 - lr: 0.000005 - momentum: 0.000000 |
|
2023-10-12 20:06:40,462 epoch 10 - iter 576/723 - loss 0.00967514 - time (sec): 316.55 - samples/sec: 442.17 - lr: 0.000004 - momentum: 0.000000 |
|
2023-10-12 20:07:19,416 epoch 10 - iter 648/723 - loss 0.00999608 - time (sec): 355.50 - samples/sec: 443.56 - lr: 0.000002 - momentum: 0.000000 |
|
2023-10-12 20:07:58,652 epoch 10 - iter 720/723 - loss 0.00993301 - time (sec): 394.74 - samples/sec: 445.20 - lr: 0.000000 - momentum: 0.000000 |
|
2023-10-12 20:07:59,756 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 20:07:59,756 EPOCH 10 done: loss 0.0099 - lr: 0.000000 |
|
2023-10-12 20:08:21,111 DEV : loss 0.11986048519611359 - f1-score (micro avg) 0.8657 |
|
2023-10-12 20:08:21,984 ---------------------------------------------------------------------------------------------------- |
|
2023-10-12 20:08:21,986 Loading model from best epoch ... |
|
2023-10-12 20:08:26,181 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG |
|
2023-10-12 20:08:47,461 |
|
Results: |
|
- F-score (micro) 0.8641 |
|
- F-score (macro) 0.7722 |
|
- Accuracy 0.7736 |
|
|
|
By class: |
|
              precision    recall  f1-score   support |
|
|
|
         PER     0.8645    0.8734    0.8689       482 |
|
         LOC     0.9154    0.8974    0.9063       458 |
|
         ORG     0.5625    0.5217    0.5414        69 |
|
|
|
   micro avg     0.8680    0.8603    0.8641      1009 |
|
   macro avg     0.7808    0.7642    0.7722      1009 |
|
weighted avg     0.8669    0.8603    0.8635      1009 |
|
|
|
2023-10-12 20:08:47,461 ---------------------------------------------------------------------------------------------------- |
|
|