Upload folder using huggingface_hub
- best-model.pt +3 -0
- dev.tsv +0 -0
- final-model.pt +3 -0
- loss.tsv +11 -0
- runs/events.out.tfevents.1697077150.6d4c7681f95b.1253.14 +3 -0
- test.tsv +0 -0
- training.log +264 -0
best-model.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d73122cc85eabb2c8e138e02799c3d71b22710b23003ce1c289facbe45ca28bb
+size 870817519
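The three added lines are a Git LFS pointer rather than the model weights themselves: a spec version, a SHA-256 object ID, and the object size in bytes. A minimal sketch of reading such a pointer follows; the helper name `parse_lfs_pointer` is hypothetical and not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid value has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from best-model.pt in this commit.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:d73122cc85eabb2c8e138e02799c3d71b22710b23003ce1c289facbe45ca28bb
size 870817519
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["size"], "bytes")
print(f"{info['size'] / 2**20:.1f} MiB")
```

The `size` field is the byte length of the real object stored in LFS, so it can be used to sanity-check a download before hashing it.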
dev.tsv
ADDED
The diff for this file is too large to render.
final-model.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:531e0026cb5f9bce97aedf428c29f02d57146ae7beaaac9059ca5837f21c5c55
+size 870817636
loss.tsv
ADDED
@@ -0,0 +1,11 @@
+EPOCH  TIMESTAMP  LEARNING_RATE  TRAIN_LOSS  DEV_LOSS  DEV_PRECISION  DEV_RECALL  DEV_F1  DEV_ACCURACY
+1      02:42:52   0.0001         0.7433      0.1264    0.2155         0.3902      0.2776  0.1612
+2      03:07:05   0.0001         0.1446      0.1417    0.2918         0.3902      0.3339  0.2010
+3      03:31:15   0.0001         0.0981      0.2601    0.2530         0.6345      0.3618  0.2224
+4      03:55:45   0.0001         0.0665      0.3102    0.2560         0.5814      0.3555  0.2171
+5      04:20:13   0.0001         0.0464      0.3113    0.3113         0.5606      0.4003  0.2519
+6      04:44:50   0.0001         0.0330      0.4072    0.2902         0.6288      0.3971  0.2494
+7      05:09:04   0.0001         0.0232      0.4082    0.3015         0.6098      0.4035  0.2543
+8      05:32:58   0.0000         0.0164      0.4336    0.3006         0.6155      0.4040  0.2543
+9      05:57:30   0.0000         0.0109      0.4758    0.2941         0.6004      0.3948  0.2475
+10     06:21:43   0.0000         0.0072      0.4922    0.2891         0.6023      0.3907  0.2441
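The per-epoch table in loss.tsv is enough to recover which checkpoint was kept as best-model.pt. The sketch below assumes whitespace-separated columns (the real file is tab-separated) and selects the epoch with the highest DEV_F1; the names `load_rows`, `best` and `lowest_dev_loss` are illustrative only.

```python
# Rows copied from loss.tsv in this commit.
LOSS_TABLE = """\
EPOCH TIMESTAMP LEARNING_RATE TRAIN_LOSS DEV_LOSS DEV_PRECISION DEV_RECALL DEV_F1 DEV_ACCURACY
1 02:42:52 0.0001 0.7433 0.1264 0.2155 0.3902 0.2776 0.1612
2 03:07:05 0.0001 0.1446 0.1417 0.2918 0.3902 0.3339 0.2010
3 03:31:15 0.0001 0.0981 0.2601 0.2530 0.6345 0.3618 0.2224
4 03:55:45 0.0001 0.0665 0.3102 0.2560 0.5814 0.3555 0.2171
5 04:20:13 0.0001 0.0464 0.3113 0.3113 0.5606 0.4003 0.2519
6 04:44:50 0.0001 0.0330 0.4072 0.2902 0.6288 0.3971 0.2494
7 05:09:04 0.0001 0.0232 0.4082 0.3015 0.6098 0.4035 0.2543
8 05:32:58 0.0000 0.0164 0.4336 0.3006 0.6155 0.4040 0.2543
9 05:57:30 0.0000 0.0109 0.4758 0.2941 0.6004 0.3948 0.2475
10 06:21:43 0.0000 0.0072 0.4922 0.2891 0.6023 0.3907 0.2441
"""


def load_rows(text):
    """Turn the table into a list of dicts keyed by column name."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


rows = load_rows(LOSS_TABLE)

# Model selection in this run uses dev micro F1, so pick the row that maximises it.
best = max(rows, key=lambda r: float(r["DEV_F1"]))
best_epoch = int(best["EPOCH"])

# For comparison: the epoch with the lowest dev loss.
lowest_dev_loss = min(rows, key=lambda r: float(r["DEV_LOSS"]))

print("best DEV_F1:", best_epoch, best["DEV_F1"])
print("lowest DEV_LOSS:", lowest_dev_loss["EPOCH"], lowest_dev_loss["DEV_LOSS"])
```

In this table the dev loss is lowest at epoch 1 and rises afterwards while the training loss keeps falling, yet DEV_F1 peaks at epoch 8. The LEARNING_RATE column is rounded to four decimals, which is why the last three rows read 0.0000.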
runs/events.out.tfevents.1697077150.6d4c7681f95b.1253.14
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:dd916826dcdda4e92192cfe7df8b3fa0cd8f2af49a3635d4fa327312be52fbb2
+size 2923780
test.tsv
ADDED
The diff for this file is too large to render.
training.log
ADDED
@@ -0,0 +1,264 @@
+2023-10-12 02:19:10,693 ----------------------------------------------------------------------------------------------------
+2023-10-12 02:19:10,695 Model: "SequenceTagger(
+  (embeddings): ByT5Embeddings(
+    (model): T5EncoderModel(
+      (shared): Embedding(384, 1472)
+      (encoder): T5Stack(
+        (embed_tokens): Embedding(384, 1472)
+        (block): ModuleList(
+          (0): T5Block(
+            (layer): ModuleList(
+              (0): T5LayerSelfAttention(
+                (SelfAttention): T5Attention(
+                  (q): Linear(in_features=1472, out_features=384, bias=False)
+                  (k): Linear(in_features=1472, out_features=384, bias=False)
+                  (v): Linear(in_features=1472, out_features=384, bias=False)
+                  (o): Linear(in_features=384, out_features=1472, bias=False)
+                  (relative_attention_bias): Embedding(32, 6)
+                )
+                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+              (1): T5LayerFF(
+                (DenseReluDense): T5DenseGatedActDense(
+                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
+                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
+                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
+                  (dropout): Dropout(p=0.1, inplace=False)
+                  (act): NewGELUActivation()
+                )
+                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+            )
+          )
+          (1-11): 11 x T5Block(
+            (layer): ModuleList(
+              (0): T5LayerSelfAttention(
+                (SelfAttention): T5Attention(
+                  (q): Linear(in_features=1472, out_features=384, bias=False)
+                  (k): Linear(in_features=1472, out_features=384, bias=False)
+                  (v): Linear(in_features=1472, out_features=384, bias=False)
+                  (o): Linear(in_features=384, out_features=1472, bias=False)
+                )
+                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+              (1): T5LayerFF(
+                (DenseReluDense): T5DenseGatedActDense(
+                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
+                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
+                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
+                  (dropout): Dropout(p=0.1, inplace=False)
+                  (act): NewGELUActivation()
+                )
+                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+            )
+          )
+        )
+        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+        (dropout): Dropout(p=0.1, inplace=False)
+      )
+    )
+  )
+  (locked_dropout): LockedDropout(p=0.5)
+  (linear): Linear(in_features=1472, out_features=17, bias=True)
+  (loss_function): CrossEntropyLoss()
+)"
+2023-10-12 02:19:10,696 ----------------------------------------------------------------------------------------------------
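The layer shapes in the model printout allow a rough parameter count, which can be compared with the checkpoint size in the best-model.pt pointer. The sketch below rests on three assumptions not stated in the log: `shared` and `embed_tokens` are the same tied tensor, each `FusedRMSNorm` carries only a weight vector of length 1472, and weights are stored as 4-byte floats. The helper `linear_params` is hypothetical.

```python
def linear_params(in_features, out_features, bias=False):
    """Number of parameters in a Linear layer."""
    return in_features * out_features + (out_features if bias else 0)


# Dimensions read off the SequenceTagger printout.
D_MODEL = 1472      # hidden size
D_ATTN = 384        # q/k/v projection width
D_FF = 3584         # gated feed-forward width
VOCAB = 384         # byte-level vocabulary size
N_BLOCKS = 12       # block (0) plus blocks (1-11)
N_TAGS = 17         # output size of the tagging head

# q, k, v map 1472 -> 384 and o maps 384 -> 1472: four matrices of equal size.
attention = 4 * linear_params(D_MODEL, D_ATTN)
# wi_0, wi_1 map 1472 -> 3584 and wo maps 3584 -> 1472.
feed_forward = 3 * linear_params(D_MODEL, D_FF)
# Two RMSNorm weight vectors per block (assumption: no bias term).
norms = 2 * D_MODEL
block = attention + feed_forward + norms

embedding = VOCAB * D_MODEL    # counted once (assumption: tied weights)
relative_bias = 32 * 6         # only block (0) has relative_attention_bias
final_norm = D_MODEL

encoder = N_BLOCKS * block + embedding + relative_bias + final_norm
head = linear_params(D_MODEL, N_TAGS, bias=True)
total_params = encoder + head

approx_bytes = total_params * 4  # assumption: float32 storage
print("parameters:", total_params)
print("approx. bytes:", approx_bytes)
```

Under these assumptions the estimate comes to roughly 870.7 million bytes, close to the 870817519 bytes recorded for best-model.pt; the remainder would be serialization overhead and non-weight state such as the tag dictionary.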
+2023-10-12 02:19:10,696 MultiCorpus: 20847 train + 1123 dev + 3350 test sentences
+ - NER_HIPE_2022 Corpus: 20847 train + 1123 dev + 3350 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/de/with_doc_seperator
+2023-10-12 02:19:10,696 ----------------------------------------------------------------------------------------------------
+2023-10-12 02:19:10,696 Train: 20847 sentences
+2023-10-12 02:19:10,696 (train_with_dev=False, train_with_test=False)
+2023-10-12 02:19:10,696 ----------------------------------------------------------------------------------------------------
+2023-10-12 02:19:10,696 Training Params:
+2023-10-12 02:19:10,696 - learning_rate: "0.00015"
+2023-10-12 02:19:10,696 - mini_batch_size: "4"
+2023-10-12 02:19:10,696 - max_epochs: "10"
+2023-10-12 02:19:10,696 - shuffle: "True"
+2023-10-12 02:19:10,696 ----------------------------------------------------------------------------------------------------
+2023-10-12 02:19:10,696 Plugins:
+2023-10-12 02:19:10,697 - TensorboardLogger
+2023-10-12 02:19:10,697 - LinearScheduler | warmup_fraction: '0.1'
+2023-10-12 02:19:10,697 ----------------------------------------------------------------------------------------------------
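The training parameters and the LinearScheduler plugin together determine the `lr` values reported in the iteration lines of this log. The sketch below assumes the usual linear warmup followed by linear decay to zero, with the step count derived from the corpus size and mini-batch size; it is a reconstruction that matches the logged values, not Flair's own scheduler code, and `linear_lr` is a hypothetical name.

```python
import math

# Values taken from the log above.
TRAIN_SENTENCES = 20847
MINI_BATCH_SIZE = 4
MAX_EPOCHS = 10
PEAK_LR = 0.00015
WARMUP_FRACTION = 0.1

# 20847 sentences in batches of 4 gives 5212 iterations per epoch.
steps_per_epoch = math.ceil(TRAIN_SENTENCES / MINI_BATCH_SIZE)
total_steps = steps_per_epoch * MAX_EPOCHS
warmup_steps = round(total_steps * WARMUP_FRACTION)


def linear_lr(step):
    """Learning rate after `step` optimizer steps (assumed schedule)."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to the peak learning rate.
        return PEAK_LR * step / warmup_steps
    # Decay: fall linearly from the peak to 0 at the final step.
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / (total_steps - warmup_steps)


for step in (521, steps_per_epoch, 2 * steps_per_epoch, total_steps):
    print(step, f"{linear_lr(step):.6f}")
```

With a warmup fraction of 0.1 and 10 epochs, the warmup spans exactly the first epoch, which is consistent with the learning rate climbing to 0.000150 by the end of epoch 1 and declining afterwards.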
+2023-10-12 02:19:10,697 Final evaluation on model from best epoch (best-model.pt)
+2023-10-12 02:19:10,697 - metric: "('micro avg', 'f1-score')"
+2023-10-12 02:19:10,697 ----------------------------------------------------------------------------------------------------
+2023-10-12 02:19:10,697 Computation:
+2023-10-12 02:19:10,697 - compute on device: cuda:0
+2023-10-12 02:19:10,697 - embedding storage: none
+2023-10-12 02:19:10,697 ----------------------------------------------------------------------------------------------------
+2023-10-12 02:19:10,697 Model training base path: "hmbench-newseye/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-4"
+2023-10-12 02:19:10,697 ----------------------------------------------------------------------------------------------------
+2023-10-12 02:19:10,697 ----------------------------------------------------------------------------------------------------
+2023-10-12 02:19:10,697 Logging anything other than scalars to TensorBoard is currently not supported.
+2023-10-12 02:21:27,177 epoch 1 - iter 521/5212 - loss 2.79668095 - time (sec): 136.48 - samples/sec: 242.74 - lr: 0.000015 - momentum: 0.000000
+2023-10-12 02:23:45,273 epoch 1 - iter 1042/5212 - loss 2.35843978 - time (sec): 274.57 - samples/sec: 247.36 - lr: 0.000030 - momentum: 0.000000
+2023-10-12 02:26:04,245 epoch 1 - iter 1563/5212 - loss 1.80796410 - time (sec): 413.55 - samples/sec: 254.16 - lr: 0.000045 - momentum: 0.000000
+2023-10-12 02:28:22,402 epoch 1 - iter 2084/5212 - loss 1.45533862 - time (sec): 551.70 - samples/sec: 257.38 - lr: 0.000060 - momentum: 0.000000
+2023-10-12 02:30:41,547 epoch 1 - iter 2605/5212 - loss 1.24266578 - time (sec): 690.85 - samples/sec: 260.63 - lr: 0.000075 - momentum: 0.000000
+2023-10-12 02:32:58,209 epoch 1 - iter 3126/5212 - loss 1.09834367 - time (sec): 827.51 - samples/sec: 259.96 - lr: 0.000090 - momentum: 0.000000
+2023-10-12 02:35:17,942 epoch 1 - iter 3647/5212 - loss 0.98084623 - time (sec): 967.24 - samples/sec: 261.96 - lr: 0.000105 - momentum: 0.000000
+2023-10-12 02:37:35,810 epoch 1 - iter 4168/5212 - loss 0.88950826 - time (sec): 1105.11 - samples/sec: 262.06 - lr: 0.000120 - momentum: 0.000000
+2023-10-12 02:39:57,111 epoch 1 - iter 4689/5212 - loss 0.80746283 - time (sec): 1246.41 - samples/sec: 264.35 - lr: 0.000135 - momentum: 0.000000
+2023-10-12 02:42:16,837 epoch 1 - iter 5210/5212 - loss 0.74371560 - time (sec): 1386.14 - samples/sec: 264.92 - lr: 0.000150 - momentum: 0.000000
+2023-10-12 02:42:17,401 ----------------------------------------------------------------------------------------------------
+2023-10-12 02:42:17,402 EPOCH 1 done: loss 0.7433 - lr: 0.000150
+2023-10-12 02:42:52,120 DEV : loss 0.12636248767375946 - f1-score (micro avg) 0.2776
+2023-10-12 02:42:52,174 saving best model
+2023-10-12 02:42:53,044 ----------------------------------------------------------------------------------------------------
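Every iteration line in this log follows one fixed layout: timestamp, epoch, iteration counter, running loss, elapsed seconds, throughput, learning rate and momentum. A minimal sketch of a parser for that layout follows, assuming the format stays exactly as seen here; the regular expression and the name `parse_iter_line` are illustrative and not part of Flair.

```python
import re

# Pattern for lines such as:
# 2023-10-12 02:21:27,177 epoch 1 - iter 521/5212 - loss 2.79668095 - ...
ITER_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"epoch (?P<epoch>\d+) - iter (?P<iter>\d+)/(?P<total>\d+) - "
    r"loss (?P<loss>[\d.]+) - time \(sec\): (?P<seconds>[\d.]+) - "
    r"samples/sec: (?P<rate>[\d.]+) - lr: (?P<lr>[\d.]+) - "
    r"momentum: (?P<momentum>[\d.]+)$"
)


def parse_iter_line(line):
    """Return a dict of typed fields, or None if the line is not an iteration line."""
    # Drop surrounding whitespace and a leading diff "+" marker if present.
    match = ITER_LINE.match(line.strip().lstrip("+"))
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "timestamp": fields["timestamp"],
        "epoch": int(fields["epoch"]),
        "iter": int(fields["iter"]),
        "total": int(fields["total"]),
        "loss": float(fields["loss"]),
        "seconds": float(fields["seconds"]),
        "rate": float(fields["rate"]),
        "lr": float(fields["lr"]),
        "momentum": float(fields["momentum"]),
    }


SAMPLE = (
    "2023-10-12 02:21:27,177 epoch 1 - iter 521/5212 - loss 2.79668095 - "
    "time (sec): 136.48 - samples/sec: 242.74 - lr: 0.000015 - momentum: 0.000000"
)
record = parse_iter_line(SAMPLE)
print(record["epoch"], record["iter"], record["loss"], record["lr"])

# Lines with a different layout (separators, DEV summaries) are skipped.
skipped = parse_iter_line("2023-10-12 02:42:52,174 saving best model")
```

Returning `None` for non-matching lines lets the same function be mapped over the whole file and filtered afterwards.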
+2023-10-12 02:45:11,088 epoch 2 - iter 521/5212 - loss 0.17686924 - time (sec): 138.04 - samples/sec: 262.94 - lr: 0.000148 - momentum: 0.000000
+2023-10-12 02:47:30,960 epoch 2 - iter 1042/5212 - loss 0.15464838 - time (sec): 277.91 - samples/sec: 267.07 - lr: 0.000147 - momentum: 0.000000
+2023-10-12 02:49:50,501 epoch 2 - iter 1563/5212 - loss 0.15677141 - time (sec): 417.45 - samples/sec: 261.48 - lr: 0.000145 - momentum: 0.000000
+2023-10-12 02:52:16,593 epoch 2 - iter 2084/5212 - loss 0.15541495 - time (sec): 563.55 - samples/sec: 263.09 - lr: 0.000143 - momentum: 0.000000
+2023-10-12 02:54:38,805 epoch 2 - iter 2605/5212 - loss 0.15288785 - time (sec): 705.76 - samples/sec: 261.65 - lr: 0.000142 - momentum: 0.000000
+2023-10-12 02:56:58,256 epoch 2 - iter 3126/5212 - loss 0.15178570 - time (sec): 845.21 - samples/sec: 258.20 - lr: 0.000140 - momentum: 0.000000
+2023-10-12 02:59:16,224 epoch 2 - iter 3647/5212 - loss 0.15320470 - time (sec): 983.18 - samples/sec: 254.99 - lr: 0.000138 - momentum: 0.000000
+2023-10-12 03:01:39,472 epoch 2 - iter 4168/5212 - loss 0.14998413 - time (sec): 1126.43 - samples/sec: 256.63 - lr: 0.000137 - momentum: 0.000000
+2023-10-12 03:04:04,945 epoch 2 - iter 4689/5212 - loss 0.14664370 - time (sec): 1271.90 - samples/sec: 259.73 - lr: 0.000135 - momentum: 0.000000
+2023-10-12 03:06:26,106 epoch 2 - iter 5210/5212 - loss 0.14460575 - time (sec): 1413.06 - samples/sec: 259.97 - lr: 0.000133 - momentum: 0.000000
+2023-10-12 03:06:26,551 ----------------------------------------------------------------------------------------------------
+2023-10-12 03:06:26,552 EPOCH 2 done: loss 0.1446 - lr: 0.000133
+2023-10-12 03:07:05,601 DEV : loss 0.14167223870754242 - f1-score (micro avg) 0.3339
+2023-10-12 03:07:05,653 saving best model
+2023-10-12 03:07:08,265 ----------------------------------------------------------------------------------------------------
+2023-10-12 03:09:23,808 epoch 3 - iter 521/5212 - loss 0.10082822 - time (sec): 135.54 - samples/sec: 254.81 - lr: 0.000132 - momentum: 0.000000
+2023-10-12 03:11:37,979 epoch 3 - iter 1042/5212 - loss 0.09748085 - time (sec): 269.71 - samples/sec: 250.08 - lr: 0.000130 - momentum: 0.000000
+2023-10-12 03:13:58,974 epoch 3 - iter 1563/5212 - loss 0.09985524 - time (sec): 410.70 - samples/sec: 264.24 - lr: 0.000128 - momentum: 0.000000
+2023-10-12 03:16:15,894 epoch 3 - iter 2084/5212 - loss 0.09997962 - time (sec): 547.62 - samples/sec: 263.03 - lr: 0.000127 - momentum: 0.000000
+2023-10-12 03:18:33,733 epoch 3 - iter 2605/5212 - loss 0.09885177 - time (sec): 685.46 - samples/sec: 261.01 - lr: 0.000125 - momentum: 0.000000
+2023-10-12 03:20:55,408 epoch 3 - iter 3126/5212 - loss 0.09501689 - time (sec): 827.14 - samples/sec: 264.99 - lr: 0.000123 - momentum: 0.000000
+2023-10-12 03:23:18,463 epoch 3 - iter 3647/5212 - loss 0.09597740 - time (sec): 970.19 - samples/sec: 267.70 - lr: 0.000122 - momentum: 0.000000
+2023-10-12 03:25:40,852 epoch 3 - iter 4168/5212 - loss 0.09794887 - time (sec): 1112.58 - samples/sec: 263.01 - lr: 0.000120 - momentum: 0.000000
+2023-10-12 03:28:07,225 epoch 3 - iter 4689/5212 - loss 0.09904610 - time (sec): 1258.96 - samples/sec: 261.73 - lr: 0.000118 - momentum: 0.000000
+2023-10-12 03:30:34,184 epoch 3 - iter 5210/5212 - loss 0.09816484 - time (sec): 1405.91 - samples/sec: 261.22 - lr: 0.000117 - momentum: 0.000000
+2023-10-12 03:30:34,730 ----------------------------------------------------------------------------------------------------
+2023-10-12 03:30:34,731 EPOCH 3 done: loss 0.0981 - lr: 0.000117
+2023-10-12 03:31:15,205 DEV : loss 0.2600547671318054 - f1-score (micro avg) 0.3618
+2023-10-12 03:31:15,262 saving best model
+2023-10-12 03:31:17,833 ----------------------------------------------------------------------------------------------------
+2023-10-12 03:33:41,338 epoch 4 - iter 521/5212 - loss 0.06934331 - time (sec): 143.50 - samples/sec: 252.51 - lr: 0.000115 - momentum: 0.000000
+2023-10-12 03:36:05,531 epoch 4 - iter 1042/5212 - loss 0.07084255 - time (sec): 287.69 - samples/sec: 257.98 - lr: 0.000113 - momentum: 0.000000
+2023-10-12 03:38:29,695 epoch 4 - iter 1563/5212 - loss 0.06755543 - time (sec): 431.86 - samples/sec: 261.22 - lr: 0.000112 - momentum: 0.000000
+2023-10-12 03:40:54,693 epoch 4 - iter 2084/5212 - loss 0.06616838 - time (sec): 576.86 - samples/sec: 261.23 - lr: 0.000110 - momentum: 0.000000
+2023-10-12 03:43:17,865 epoch 4 - iter 2605/5212 - loss 0.06558266 - time (sec): 720.03 - samples/sec: 258.99 - lr: 0.000108 - momentum: 0.000000
+2023-10-12 03:45:41,115 epoch 4 - iter 3126/5212 - loss 0.06435344 - time (sec): 863.28 - samples/sec: 259.71 - lr: 0.000107 - momentum: 0.000000
+2023-10-12 03:48:03,559 epoch 4 - iter 3647/5212 - loss 0.06413682 - time (sec): 1005.72 - samples/sec: 259.21 - lr: 0.000105 - momentum: 0.000000
+2023-10-12 03:50:23,570 epoch 4 - iter 4168/5212 - loss 0.06596094 - time (sec): 1145.73 - samples/sec: 257.63 - lr: 0.000103 - momentum: 0.000000
+2023-10-12 03:52:45,111 epoch 4 - iter 4689/5212 - loss 0.06614742 - time (sec): 1287.27 - samples/sec: 258.43 - lr: 0.000102 - momentum: 0.000000
+2023-10-12 03:55:04,718 epoch 4 - iter 5210/5212 - loss 0.06648116 - time (sec): 1426.88 - samples/sec: 257.45 - lr: 0.000100 - momentum: 0.000000
+2023-10-12 03:55:05,152 ----------------------------------------------------------------------------------------------------
+2023-10-12 03:55:05,153 EPOCH 4 done: loss 0.0665 - lr: 0.000100
+2023-10-12 03:55:45,279 DEV : loss 0.3101561367511749 - f1-score (micro avg) 0.3555
+2023-10-12 03:55:45,331 ----------------------------------------------------------------------------------------------------
+2023-10-12 03:58:04,744 epoch 5 - iter 521/5212 - loss 0.04426358 - time (sec): 139.41 - samples/sec: 259.84 - lr: 0.000098 - momentum: 0.000000
+2023-10-12 04:00:24,001 epoch 5 - iter 1042/5212 - loss 0.04273940 - time (sec): 278.67 - samples/sec: 260.65 - lr: 0.000097 - momentum: 0.000000
+2023-10-12 04:02:41,900 epoch 5 - iter 1563/5212 - loss 0.04286202 - time (sec): 416.57 - samples/sec: 256.36 - lr: 0.000095 - momentum: 0.000000
+2023-10-12 04:05:03,372 epoch 5 - iter 2084/5212 - loss 0.04497980 - time (sec): 558.04 - samples/sec: 260.25 - lr: 0.000093 - momentum: 0.000000
+2023-10-12 04:07:19,613 epoch 5 - iter 2605/5212 - loss 0.04479044 - time (sec): 694.28 - samples/sec: 258.94 - lr: 0.000092 - momentum: 0.000000
+2023-10-12 04:09:43,942 epoch 5 - iter 3126/5212 - loss 0.04436294 - time (sec): 838.61 - samples/sec: 258.68 - lr: 0.000090 - momentum: 0.000000
+2023-10-12 04:12:11,821 epoch 5 - iter 3647/5212 - loss 0.04380965 - time (sec): 986.49 - samples/sec: 258.99 - lr: 0.000088 - momentum: 0.000000
+2023-10-12 04:14:37,186 epoch 5 - iter 4168/5212 - loss 0.04530386 - time (sec): 1131.85 - samples/sec: 258.58 - lr: 0.000087 - momentum: 0.000000
+2023-10-12 04:17:02,247 epoch 5 - iter 4689/5212 - loss 0.04665775 - time (sec): 1276.91 - samples/sec: 257.28 - lr: 0.000085 - momentum: 0.000000
+2023-10-12 04:19:31,906 epoch 5 - iter 5210/5212 - loss 0.04638068 - time (sec): 1426.57 - samples/sec: 257.51 - lr: 0.000083 - momentum: 0.000000
+2023-10-12 04:19:32,352 ----------------------------------------------------------------------------------------------------
+2023-10-12 04:19:32,353 EPOCH 5 done: loss 0.0464 - lr: 0.000083
+2023-10-12 04:20:13,213 DEV : loss 0.3113304078578949 - f1-score (micro avg) 0.4003
+2023-10-12 04:20:13,271 saving best model
+2023-10-12 04:20:14,217 ----------------------------------------------------------------------------------------------------
+2023-10-12 04:22:41,995 epoch 6 - iter 521/5212 - loss 0.02551644 - time (sec): 147.78 - samples/sec: 258.80 - lr: 0.000082 - momentum: 0.000000
+2023-10-12 04:25:08,081 epoch 6 - iter 1042/5212 - loss 0.02937621 - time (sec): 293.86 - samples/sec: 259.22 - lr: 0.000080 - momentum: 0.000000
+2023-10-12 04:27:33,632 epoch 6 - iter 1563/5212 - loss 0.03011832 - time (sec): 439.41 - samples/sec: 254.55 - lr: 0.000078 - momentum: 0.000000
+2023-10-12 04:29:57,124 epoch 6 - iter 2084/5212 - loss 0.03067784 - time (sec): 582.90 - samples/sec: 252.43 - lr: 0.000077 - momentum: 0.000000
+2023-10-12 04:32:22,709 epoch 6 - iter 2605/5212 - loss 0.03033894 - time (sec): 728.49 - samples/sec: 256.01 - lr: 0.000075 - momentum: 0.000000
+2023-10-12 04:34:46,393 epoch 6 - iter 3126/5212 - loss 0.03057800 - time (sec): 872.17 - samples/sec: 256.98 - lr: 0.000073 - momentum: 0.000000
+2023-10-12 04:37:07,410 epoch 6 - iter 3647/5212 - loss 0.03169587 - time (sec): 1013.19 - samples/sec: 256.43 - lr: 0.000072 - momentum: 0.000000
+2023-10-12 04:39:27,981 epoch 6 - iter 4168/5212 - loss 0.03188952 - time (sec): 1153.76 - samples/sec: 256.11 - lr: 0.000070 - momentum: 0.000000
+2023-10-12 04:41:50,066 epoch 6 - iter 4689/5212 - loss 0.03265092 - time (sec): 1295.85 - samples/sec: 256.18 - lr: 0.000068 - momentum: 0.000000
+2023-10-12 04:44:09,929 epoch 6 - iter 5210/5212 - loss 0.03297566 - time (sec): 1435.71 - samples/sec: 255.87 - lr: 0.000067 - momentum: 0.000000
+2023-10-12 04:44:10,371 ----------------------------------------------------------------------------------------------------
+2023-10-12 04:44:10,371 EPOCH 6 done: loss 0.0330 - lr: 0.000067
+2023-10-12 04:44:50,633 DEV : loss 0.40718281269073486 - f1-score (micro avg) 0.3971
+2023-10-12 04:44:50,685 ----------------------------------------------------------------------------------------------------
+2023-10-12 04:47:11,977 epoch 7 - iter 521/5212 - loss 0.02552297 - time (sec): 141.29 - samples/sec: 259.67 - lr: 0.000065 - momentum: 0.000000
+2023-10-12 04:49:32,146 epoch 7 - iter 1042/5212 - loss 0.02437180 - time (sec): 281.46 - samples/sec: 271.70 - lr: 0.000063 - momentum: 0.000000
+2023-10-12 04:51:48,960 epoch 7 - iter 1563/5212 - loss 0.02514062 - time (sec): 418.27 - samples/sec: 267.26 - lr: 0.000062 - momentum: 0.000000
+2023-10-12 04:54:07,753 epoch 7 - iter 2084/5212 - loss 0.02516668 - time (sec): 557.07 - samples/sec: 267.87 - lr: 0.000060 - momentum: 0.000000
+2023-10-12 04:56:27,890 epoch 7 - iter 2605/5212 - loss 0.02469412 - time (sec): 697.20 - samples/sec: 265.74 - lr: 0.000058 - momentum: 0.000000
+2023-10-12 04:58:49,869 epoch 7 - iter 3126/5212 - loss 0.02456715 - time (sec): 839.18 - samples/sec: 265.93 - lr: 0.000057 - momentum: 0.000000
+2023-10-12 05:01:09,634 epoch 7 - iter 3647/5212 - loss 0.02477855 - time (sec): 978.95 - samples/sec: 263.65 - lr: 0.000055 - momentum: 0.000000
+2023-10-12 05:03:36,700 epoch 7 - iter 4168/5212 - loss 0.02394687 - time (sec): 1126.01 - samples/sec: 262.72 - lr: 0.000053 - momentum: 0.000000
+2023-10-12 05:06:00,466 epoch 7 - iter 4689/5212 - loss 0.02356060 - time (sec): 1269.78 - samples/sec: 260.95 - lr: 0.000052 - momentum: 0.000000
+2023-10-12 05:08:24,608 epoch 7 - iter 5210/5212 - loss 0.02323141 - time (sec): 1413.92 - samples/sec: 259.78 - lr: 0.000050 - momentum: 0.000000
+2023-10-12 05:08:25,097 ----------------------------------------------------------------------------------------------------
+2023-10-12 05:08:25,097 EPOCH 7 done: loss 0.0232 - lr: 0.000050
+2023-10-12 05:09:04,885 DEV : loss 0.408222496509552 - f1-score (micro avg) 0.4035
+2023-10-12 05:09:04,936 saving best model
+2023-10-12 05:09:07,501 ----------------------------------------------------------------------------------------------------
+2023-10-12 05:11:27,797 epoch 8 - iter 521/5212 - loss 0.01537607 - time (sec): 140.29 - samples/sec: 265.21 - lr: 0.000048 - momentum: 0.000000
+2023-10-12 05:13:47,952 epoch 8 - iter 1042/5212 - loss 0.01753064 - time (sec): 280.45 - samples/sec: 271.51 - lr: 0.000047 - momentum: 0.000000
+2023-10-12 05:16:11,225 epoch 8 - iter 1563/5212 - loss 0.01696724 - time (sec): 423.72 - samples/sec: 277.40 - lr: 0.000045 - momentum: 0.000000
+2023-10-12 05:18:28,174 epoch 8 - iter 2084/5212 - loss 0.01669412 - time (sec): 560.67 - samples/sec: 273.78 - lr: 0.000043 - momentum: 0.000000
+2023-10-12 05:20:45,546 epoch 8 - iter 2605/5212 - loss 0.01686027 - time (sec): 698.04 - samples/sec: 269.95 - lr: 0.000042 - momentum: 0.000000
+2023-10-12 05:23:00,618 epoch 8 - iter 3126/5212 - loss 0.01671656 - time (sec): 833.11 - samples/sec: 267.12 - lr: 0.000040 - momentum: 0.000000
+2023-10-12 05:25:16,222 epoch 8 - iter 3647/5212 - loss 0.01600572 - time (sec): 968.72 - samples/sec: 265.31 - lr: 0.000038 - momentum: 0.000000
+2023-10-12 05:27:35,013 epoch 8 - iter 4168/5212 - loss 0.01597679 - time (sec): 1107.51 - samples/sec: 264.67 - lr: 0.000037 - momentum: 0.000000
+2023-10-12 05:29:56,551 epoch 8 - iter 4689/5212 - loss 0.01553053 - time (sec): 1249.05 - samples/sec: 263.33 - lr: 0.000035 - momentum: 0.000000
+2023-10-12 05:32:19,571 epoch 8 - iter 5210/5212 - loss 0.01639257 - time (sec): 1392.06 - samples/sec: 263.90 - lr: 0.000033 - momentum: 0.000000
+2023-10-12 05:32:19,996 ----------------------------------------------------------------------------------------------------
+2023-10-12 05:32:19,997 EPOCH 8 done: loss 0.0164 - lr: 0.000033
+2023-10-12 05:32:58,072 DEV : loss 0.4335840940475464 - f1-score (micro avg) 0.404
+2023-10-12 05:32:58,123 saving best model
+2023-10-12 05:33:00,797 ----------------------------------------------------------------------------------------------------
+2023-10-12 05:35:21,905 epoch 9 - iter 521/5212 - loss 0.01030793 - time (sec): 141.10 - samples/sec: 275.22 - lr: 0.000032 - momentum: 0.000000
+2023-10-12 05:37:41,298 epoch 9 - iter 1042/5212 - loss 0.01148381 - time (sec): 280.50 - samples/sec: 274.31 - lr: 0.000030 - momentum: 0.000000
+2023-10-12 05:40:02,195 epoch 9 - iter 1563/5212 - loss 0.01090402 - time (sec): 421.39 - samples/sec: 263.64 - lr: 0.000028 - momentum: 0.000000
+2023-10-12 05:42:24,062 epoch 9 - iter 2084/5212 - loss 0.01205054 - time (sec): 563.26 - samples/sec: 259.21 - lr: 0.000027 - momentum: 0.000000
+2023-10-12 05:44:49,921 epoch 9 - iter 2605/5212 - loss 0.01249863 - time (sec): 709.12 - samples/sec: 258.65 - lr: 0.000025 - momentum: 0.000000
+2023-10-12 05:47:12,873 epoch 9 - iter 3126/5212 - loss 0.01215066 - time (sec): 852.07 - samples/sec: 258.10 - lr: 0.000023 - momentum: 0.000000
+2023-10-12 05:49:39,769 epoch 9 - iter 3647/5212 - loss 0.01116151 - time (sec): 998.97 - samples/sec: 259.45 - lr: 0.000022 - momentum: 0.000000
+2023-10-12 05:52:01,961 epoch 9 - iter 4168/5212 - loss 0.01081521 - time (sec): 1141.16 - samples/sec: 257.32 - lr: 0.000020 - momentum: 0.000000
+2023-10-12 05:54:25,864 epoch 9 - iter 4689/5212 - loss 0.01082631 - time (sec): 1285.06 - samples/sec: 256.89 - lr: 0.000018 - momentum: 0.000000
+2023-10-12 05:56:50,574 epoch 9 - iter 5210/5212 - loss 0.01087435 - time (sec): 1429.77 - samples/sec: 256.94 - lr: 0.000017 - momentum: 0.000000
+2023-10-12 05:56:51,009 ----------------------------------------------------------------------------------------------------
+2023-10-12 05:56:51,009 EPOCH 9 done: loss 0.0109 - lr: 0.000017
+2023-10-12 05:57:30,552 DEV : loss 0.47575101256370544 - f1-score (micro avg) 0.3948
+2023-10-12 05:57:30,605 ----------------------------------------------------------------------------------------------------
+2023-10-12 05:59:52,416 epoch 10 - iter 521/5212 - loss 0.00498377 - time (sec): 141.81 - samples/sec: 252.02 - lr: 0.000015 - momentum: 0.000000
+2023-10-12 06:02:13,588 epoch 10 - iter 1042/5212 - loss 0.00681335 - time (sec): 282.98 - samples/sec: 255.31 - lr: 0.000013 - momentum: 0.000000
+2023-10-12 06:04:37,692 epoch 10 - iter 1563/5212 - loss 0.00622093 - time (sec): 427.08 - samples/sec: 259.59 - lr: 0.000012 - momentum: 0.000000
+2023-10-12 06:06:59,831 epoch 10 - iter 2084/5212 - loss 0.00629281 - time (sec): 569.22 - samples/sec: 258.24 - lr: 0.000010 - momentum: 0.000000
+2023-10-12 06:09:21,704 epoch 10 - iter 2605/5212 - loss 0.00707237 - time (sec): 711.10 - samples/sec: 257.28 - lr: 0.000008 - momentum: 0.000000
+2023-10-12 06:11:42,090 epoch 10 - iter 3126/5212 - loss 0.00730698 - time (sec): 851.48 - samples/sec: 256.91 - lr: 0.000007 - momentum: 0.000000
+2023-10-12 06:14:01,717 epoch 10 - iter 3647/5212 - loss 0.00730729 - time (sec): 991.11 - samples/sec: 258.63 - lr: 0.000005 - momentum: 0.000000
+2023-10-12 06:16:22,285 epoch 10 - iter 4168/5212 - loss 0.00690845 - time (sec): 1131.68 - samples/sec: 260.93 - lr: 0.000003 - momentum: 0.000000
+2023-10-12 06:18:43,018 epoch 10 - iter 4689/5212 - loss 0.00696219 - time (sec): 1272.41 - samples/sec: 260.74 - lr: 0.000002 - momentum: 0.000000
+2023-10-12 06:21:02,617 epoch 10 - iter 5210/5212 - loss 0.00715270 - time (sec): 1412.01 - samples/sec: 260.15 - lr: 0.000000 - momentum: 0.000000
+2023-10-12 06:21:03,064 ----------------------------------------------------------------------------------------------------
+2023-10-12 06:21:03,065 EPOCH 10 done: loss 0.0072 - lr: 0.000000
+2023-10-12 06:21:42,939 DEV : loss 0.4921533763408661 - f1-score (micro avg) 0.3907
+2023-10-12 06:21:43,893 ----------------------------------------------------------------------------------------------------
+2023-10-12 06:21:43,895 Loading model from best epoch ...
+2023-10-12 06:21:47,649 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
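The 17-tag dictionary is a BIOES scheme: four position prefixes (S, B, E, I) for each of the four entity types, plus `O`. The sketch below rebuilds the tag list in the logged order and decodes a label sequence into spans. The decoder `bioes_to_spans` is a hypothetical strict reading of BIOES written for illustration; Flair's own span extraction may treat malformed sequences differently.

```python
ENTITY_TYPES = ["LOC", "PER", "ORG", "HumanProd"]
PREFIXES = ["S", "B", "E", "I"]  # Single, Begin, End, Inside

# "O" plus every prefix/type combination: 1 + 4 * 4 = 17 tags.
tags = ["O"] + [f"{p}-{t}" for t in ENTITY_TYPES for p in PREFIXES]


def bioes_to_spans(labels):
    """Decode BIOES labels into (start, end_exclusive, type) spans.

    Strict decoding: a multi-token span needs B-, optional I-, then E-
    of the same type; anything else is dropped.
    """
    spans = []
    open_span = None  # (start index, entity type) of a span begun with B-
    for i, label in enumerate(labels):
        if label == "O":
            open_span = None
            continue
        prefix, etype = label.split("-", 1)
        if prefix == "S":
            spans.append((i, i + 1, etype))
            open_span = None
        elif prefix == "B":
            open_span = (i, etype)
        elif prefix == "I" and open_span is not None and open_span[1] == etype:
            continue  # still inside the current span
        elif prefix == "E" and open_span is not None and open_span[1] == etype:
            spans.append((open_span[0], i + 1, etype))
            open_span = None
        else:
            open_span = None  # malformed sequence: discard the partial span
    return spans


example = ["O", "B-PER", "E-PER", "O", "S-LOC", "B-ORG", "I-ORG", "E-ORG"]
spans = bioes_to_spans(example)
print(len(tags), tags[:5])
print(spans)
```

The output width of the final `Linear` layer in the model printout (17) matches the length of this tag list.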
+2023-10-12 06:23:28,188
+Results:
+- F-score (micro) 0.4681
+- F-score (macro) 0.3274
+- Accuracy 0.3108
+
+By class:
+              precision    recall  f1-score   support
+
+         LOC     0.5033    0.5610    0.5306      1214
+         PER     0.4123    0.4567    0.4334       808
+         ORG     0.3282    0.3654    0.3458       353
+   HumanProd     0.0000    0.0000    0.0000        15
+
+   micro avg     0.4454    0.4933    0.4681      2390
+   macro avg     0.3110    0.3458    0.3274      2390
+weighted avg     0.4435    0.4933    0.4671      2390
+
+2023-10-12 06:23:28,188 ----------------------------------------------------------------------------------------------------
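The averaged rows of the test report can be cross-checked against the per-class rows. The sketch below recomputes the macro and weighted F1 from the four class rows and the micro F1 from the micro-averaged precision and recall. Because the inputs are the rounded four-decimal values from the report, results agree with the logged figures only to about three decimals; the function and variable names are illustrative.

```python
# (precision, recall, f1, support) per class, copied from the test report.
BY_CLASS = {
    "LOC": (0.5033, 0.5610, 0.5306, 1214),
    "PER": (0.4123, 0.4567, 0.4334, 808),
    "ORG": (0.3282, 0.3654, 0.3458, 353),
    "HumanProd": (0.0000, 0.0000, 0.0000, 15),
}
MICRO_PRECISION, MICRO_RECALL = 0.4454, 0.4933


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


total_support = sum(row[3] for row in BY_CLASS.values())

# Macro average: unweighted mean over classes, so HumanProd's 0.0 counts fully.
macro_f1 = sum(row[2] for row in BY_CLASS.values()) / len(BY_CLASS)

# Weighted average: mean over classes weighted by support.
weighted_f1 = sum(row[2] * row[3] for row in BY_CLASS.values()) / total_support

# Micro F1 follows from the pooled precision and recall.
micro_f1 = f1_score(MICRO_PRECISION, MICRO_RECALL)

print(f"support={total_support}")
print(f"micro={micro_f1:.4f} macro={macro_f1:.4f} weighted={weighted_f1:.4f}")
```

The gap between the micro (0.4681) and macro (0.3274) scores comes largely from HumanProd, which has only 15 test mentions and an F1 of 0.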