Upload folder using huggingface_hub
- best-model.pt +3 -0
- dev.tsv +0 -0
- final-model.pt +3 -0
- loss.tsv +11 -0
- runs/events.out.tfevents.1697107196.6d4c7681f95b.1253.16 +3 -0
- test.tsv +0 -0
- training.log +266 -0
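
For reference, an upload like this can be produced with the huggingface_hub client's upload_folder helper. A minimal sketch, assuming the training output lives in a local folder; the repo id and path below are hypothetical placeholders:

    # Push a local training-output folder to the Hub in a single commit.
    # repo_id and folder_path are hypothetical placeholders.
    from huggingface_hub import upload_folder

    upload_folder(
        repo_id="user/hmbench-newseye-de-hmbyt5",  # hypothetical target repo
        folder_path="./training_output",           # folder containing the files listed above
        commit_message="Upload folder using huggingface_hub",
    )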
best-model.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:36d69da44ec26317c249e1c708b9223a296d212d18e55cc1004ed38fad8531d3
+size 870817519
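
The checkpoint is tracked with Git LFS, so the diff shows only the three-line pointer file (spec version, SHA-256 object id, size in bytes) rather than the ~870 MB of weights. A minimal sketch of reading such a pointer, assuming an un-smudged checkout in which best-model.pt still contains the pointer text:

    # Parse a Git LFS pointer file into a key/value dict,
    # e.g. {"version": "...", "oid": "sha256:...", "size": "870817519"}.
    def parse_lfs_pointer(path: str) -> dict:
        fields = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                key, _, value = line.strip().partition(" ")
                if key:
                    fields[key] = value
        return fields

    pointer = parse_lfs_pointer("best-model.pt")
    print(pointer["oid"], int(pointer["size"]))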
dev.tsv ADDED
The diff for this file is too large to render.
final-model.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:612b5c2d72323154797a914eb5ab99b3aaa9f4a1c669a0c9a2251c19601bf039
+size 870817636
loss.tsv ADDED
@@ -0,0 +1,11 @@
+EPOCH  TIMESTAMP  LEARNING_RATE  TRAIN_LOSS  DEV_LOSS  DEV_PRECISION  DEV_RECALL  DEV_F1  DEV_ACCURACY
+1      11:03:52   0.0001         0.9515      0.1219    0.4361         0.2652      0.3298  0.1977
+2      11:27:44   0.0001         0.1530      0.1417    0.2613         0.6875      0.3787  0.2354
+3      11:51:46   0.0001         0.0910      0.1857    0.2805         0.5928      0.3808  0.2362
+4      12:16:00   0.0001         0.0637      0.2290    0.2924         0.5057      0.3706  0.2284
+5      12:39:59   0.0001         0.0447      0.2849    0.2915         0.5152      0.3723  0.2299
+6      13:04:01   0.0001         0.0337      0.3598    0.2902         0.5814      0.3871  0.2417
+7      13:28:03   0.0001         0.0257      0.4001    0.2951         0.6098      0.3978  0.2486
+8      13:52:07   0.0000         0.0177      0.4319    0.2958         0.6364      0.4038  0.2538
+9      14:16:18   0.0000         0.0122      0.4377    0.3060         0.6212      0.4100  0.2589
+10     14:40:22   0.0000         0.0093      0.4512    0.3084         0.6326      0.4146  0.2624
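
Train loss falls monotonically while dev loss rises after the first epoch, so model selection follows DEV_F1 instead, which peaks at 0.4146 in epoch 10. Assuming the file is tab-separated, as the .tsv extension suggests, the curve loads directly with pandas:

    # Load the per-epoch metrics and pick the epoch with the best dev F1.
    import pandas as pd

    df = pd.read_csv("loss.tsv", sep="\t")
    best = df.loc[df["DEV_F1"].idxmax()]
    print(int(best["EPOCH"]), best["DEV_F1"])  # 10 0.4146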
runs/events.out.tfevents.1697107196.6d4c7681f95b.1253.16 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fe4a1b5a2b57ca9a13e32107ef2ab1c2c7e0d758d57cb0854a08a950bc849b6e
+size 1464420
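
This is the TensorBoard event file written by the TensorboardLogger plugin noted in training.log, again committed as an LFS pointer. Assuming the real file has been fetched, its scalar streams can be read with TensorBoard's event-processing API; a sketch:

    # Inspect the scalar summaries stored in the committed tfevents file.
    from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

    ea = EventAccumulator("runs/events.out.tfevents.1697107196.6d4c7681f95b.1253.16")
    ea.Reload()                  # parse the event file
    print(ea.Tags()["scalars"])  # names of the logged scalar series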
test.tsv ADDED
The diff for this file is too large to render.
training.log ADDED
@@ -0,0 +1,266 @@
+2023-10-12 10:39:56,361 ----------------------------------------------------------------------------------------------------
+2023-10-12 10:39:56,363 Model: "SequenceTagger(
+  (embeddings): ByT5Embeddings(
+    (model): T5EncoderModel(
+      (shared): Embedding(384, 1472)
+      (encoder): T5Stack(
+        (embed_tokens): Embedding(384, 1472)
+        (block): ModuleList(
+          (0): T5Block(
+            (layer): ModuleList(
+              (0): T5LayerSelfAttention(
+                (SelfAttention): T5Attention(
+                  (q): Linear(in_features=1472, out_features=384, bias=False)
+                  (k): Linear(in_features=1472, out_features=384, bias=False)
+                  (v): Linear(in_features=1472, out_features=384, bias=False)
+                  (o): Linear(in_features=384, out_features=1472, bias=False)
+                  (relative_attention_bias): Embedding(32, 6)
+                )
+                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+              (1): T5LayerFF(
+                (DenseReluDense): T5DenseGatedActDense(
+                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
+                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
+                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
+                  (dropout): Dropout(p=0.1, inplace=False)
+                  (act): NewGELUActivation()
+                )
+                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+            )
+          )
+          (1-11): 11 x T5Block(
+            (layer): ModuleList(
+              (0): T5LayerSelfAttention(
+                (SelfAttention): T5Attention(
+                  (q): Linear(in_features=1472, out_features=384, bias=False)
+                  (k): Linear(in_features=1472, out_features=384, bias=False)
+                  (v): Linear(in_features=1472, out_features=384, bias=False)
+                  (o): Linear(in_features=384, out_features=1472, bias=False)
+                )
+                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+              (1): T5LayerFF(
+                (DenseReluDense): T5DenseGatedActDense(
+                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
+                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
+                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
+                  (dropout): Dropout(p=0.1, inplace=False)
+                  (act): NewGELUActivation()
+                )
+                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+            )
+          )
+        )
+        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+        (dropout): Dropout(p=0.1, inplace=False)
+      )
+    )
+  )
+  (locked_dropout): LockedDropout(p=0.5)
+  (linear): Linear(in_features=1472, out_features=17, bias=True)
+  (loss_function): CrossEntropyLoss()
+)"
+2023-10-12 10:39:56,363 ----------------------------------------------------------------------------------------------------
+2023-10-12 10:39:56,363 MultiCorpus: 20847 train + 1123 dev + 3350 test sentences
+ - NER_HIPE_2022 Corpus: 20847 train + 1123 dev + 3350 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/de/with_doc_seperator
+2023-10-12 10:39:56,363 ----------------------------------------------------------------------------------------------------
+2023-10-12 10:39:56,363 Train: 20847 sentences
+2023-10-12 10:39:56,364 (train_with_dev=False, train_with_test=False)
+2023-10-12 10:39:56,364 ----------------------------------------------------------------------------------------------------
+2023-10-12 10:39:56,364 Training Params:
+2023-10-12 10:39:56,364 - learning_rate: "0.00015"
+2023-10-12 10:39:56,364 - mini_batch_size: "8"
+2023-10-12 10:39:56,364 - max_epochs: "10"
+2023-10-12 10:39:56,364 - shuffle: "True"
+2023-10-12 10:39:56,364 ----------------------------------------------------------------------------------------------------
+2023-10-12 10:39:56,364 Plugins:
+2023-10-12 10:39:56,364 - TensorboardLogger
+2023-10-12 10:39:56,364 - LinearScheduler | warmup_fraction: '0.1'
+2023-10-12 10:39:56,364 ----------------------------------------------------------------------------------------------------
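
These hyperparameters correspond to Flair's fine-tuning entry point. The sketch below reconstructs how a comparable run might be launched; it is not the actual hmBench training script, and the corpus loader arguments, embedding model id, and output path are assumptions inferred from the log:

    # Hypothetical reconstruction of the logged Flair fine-tuning run.
    from flair.datasets import NER_HIPE_2022
    from flair.embeddings import TransformerWordEmbeddings
    from flair.models import SequenceTagger
    from flair.trainers import ModelTrainer

    corpus = NER_HIPE_2022(dataset_name="newseye", language="de")  # assumed loader args
    label_dict = corpus.make_label_dictionary(label_type="ner")

    # Embedding id inferred from the base path in the log; treat as an assumption.
    embeddings = TransformerWordEmbeddings("hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax")

    tagger = SequenceTagger(
        hidden_size=256,           # assumed; the log does not print this value
        embeddings=embeddings,
        tag_dictionary=label_dict,
        tag_type="ner",
        use_crf=False,             # crfFalse in the base path
    )

    trainer = ModelTrainer(tagger, corpus)
    trainer.fine_tune(
        "hmbench-newseye/de-hmbyt5-preliminary/...",  # base path as in the log
        learning_rate=0.00015,
        mini_batch_size=8,
        max_epochs=10,             # fine_tune applies a linear schedule with warmup by default
    )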
+2023-10-12 10:39:56,364 Final evaluation on model from best epoch (best-model.pt)
+2023-10-12 10:39:56,364 - metric: "('micro avg', 'f1-score')"
+2023-10-12 10:39:56,364 ----------------------------------------------------------------------------------------------------
+2023-10-12 10:39:56,365 Computation:
+2023-10-12 10:39:56,365 - compute on device: cuda:0
+2023-10-12 10:39:56,365 - embedding storage: none
+2023-10-12 10:39:56,365 ----------------------------------------------------------------------------------------------------
+2023-10-12 10:39:56,365 Model training base path: "hmbench-newseye/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-5"
+2023-10-12 10:39:56,365 ----------------------------------------------------------------------------------------------------
+2023-10-12 10:39:56,365 ----------------------------------------------------------------------------------------------------
+2023-10-12 10:39:56,365 Logging anything other than scalars to TensorBoard is currently not supported.
+2023-10-12 10:42:18,682 epoch 1 - iter 260/2606 - loss 2.79098046 - time (sec): 142.31 - samples/sec: 285.86 - lr: 0.000015 - momentum: 0.000000
+2023-10-12 10:44:40,875 epoch 1 - iter 520/2606 - loss 2.53978742 - time (sec): 284.51 - samples/sec: 278.51 - lr: 0.000030 - momentum: 0.000000
+2023-10-12 10:47:00,532 epoch 1 - iter 780/2606 - loss 2.17133359 - time (sec): 424.16 - samples/sec: 271.17 - lr: 0.000045 - momentum: 0.000000
+2023-10-12 10:49:20,483 epoch 1 - iter 1040/2606 - loss 1.79656729 - time (sec): 564.12 - samples/sec: 269.30 - lr: 0.000060 - momentum: 0.000000
+2023-10-12 10:51:42,179 epoch 1 - iter 1300/2606 - loss 1.53837733 - time (sec): 705.81 - samples/sec: 267.96 - lr: 0.000075 - momentum: 0.000000
+2023-10-12 10:54:02,563 epoch 1 - iter 1560/2606 - loss 1.35780764 - time (sec): 846.20 - samples/sec: 268.62 - lr: 0.000090 - momentum: 0.000000
+2023-10-12 10:56:22,180 epoch 1 - iter 1820/2606 - loss 1.21922934 - time (sec): 985.81 - samples/sec: 266.92 - lr: 0.000105 - momentum: 0.000000
+2023-10-12 10:58:41,031 epoch 1 - iter 2080/2606 - loss 1.10847958 - time (sec): 1124.66 - samples/sec: 266.18 - lr: 0.000120 - momentum: 0.000000
+2023-10-12 11:00:56,908 epoch 1 - iter 2340/2606 - loss 1.02422584 - time (sec): 1260.54 - samples/sec: 264.41 - lr: 0.000135 - momentum: 0.000000
+2023-10-12 11:03:12,052 epoch 1 - iter 2600/2606 - loss 0.95341170 - time (sec): 1395.68 - samples/sec: 262.62 - lr: 0.000150 - momentum: 0.000000
+2023-10-12 11:03:15,142 ----------------------------------------------------------------------------------------------------
+2023-10-12 11:03:15,143 EPOCH 1 done: loss 0.9515 - lr: 0.000150
+2023-10-12 11:03:52,364 DEV : loss 0.12193801254034042 - f1-score (micro avg) 0.3298
+2023-10-12 11:03:52,416 saving best model
+2023-10-12 11:03:53,327 ----------------------------------------------------------------------------------------------------
+2023-10-12 11:06:10,460 epoch 2 - iter 260/2606 - loss 0.22817029 - time (sec): 137.13 - samples/sec: 260.71 - lr: 0.000148 - momentum: 0.000000
+2023-10-12 11:08:28,429 epoch 2 - iter 520/2606 - loss 0.19785334 - time (sec): 275.10 - samples/sec: 261.21 - lr: 0.000147 - momentum: 0.000000
+2023-10-12 11:10:44,641 epoch 2 - iter 780/2606 - loss 0.18391103 - time (sec): 411.31 - samples/sec: 259.17 - lr: 0.000145 - momentum: 0.000000
+2023-10-12 11:13:03,481 epoch 2 - iter 1040/2606 - loss 0.17657181 - time (sec): 550.15 - samples/sec: 260.46 - lr: 0.000143 - momentum: 0.000000
+2023-10-12 11:15:22,641 epoch 2 - iter 1300/2606 - loss 0.17249729 - time (sec): 689.31 - samples/sec: 260.29 - lr: 0.000142 - momentum: 0.000000
+2023-10-12 11:17:42,766 epoch 2 - iter 1560/2606 - loss 0.16463122 - time (sec): 829.44 - samples/sec: 262.66 - lr: 0.000140 - momentum: 0.000000
+2023-10-12 11:20:00,409 epoch 2 - iter 1820/2606 - loss 0.16141368 - time (sec): 967.08 - samples/sec: 261.73 - lr: 0.000138 - momentum: 0.000000
+2023-10-12 11:22:18,980 epoch 2 - iter 2080/2606 - loss 0.15898957 - time (sec): 1105.65 - samples/sec: 263.51 - lr: 0.000137 - momentum: 0.000000
+2023-10-12 11:24:40,875 epoch 2 - iter 2340/2606 - loss 0.15595656 - time (sec): 1247.55 - samples/sec: 263.93 - lr: 0.000135 - momentum: 0.000000
+2023-10-12 11:26:59,696 epoch 2 - iter 2600/2606 - loss 0.15306281 - time (sec): 1386.37 - samples/sec: 264.42 - lr: 0.000133 - momentum: 0.000000
+2023-10-12 11:27:02,847 ----------------------------------------------------------------------------------------------------
+2023-10-12 11:27:02,847 EPOCH 2 done: loss 0.1530 - lr: 0.000133
+2023-10-12 11:27:44,631 DEV : loss 0.14174272119998932 - f1-score (micro avg) 0.3787
+2023-10-12 11:27:44,688 saving best model
+2023-10-12 11:27:47,305 ----------------------------------------------------------------------------------------------------
+2023-10-12 11:30:06,778 epoch 3 - iter 260/2606 - loss 0.09674644 - time (sec): 139.47 - samples/sec: 254.60 - lr: 0.000132 - momentum: 0.000000
+2023-10-12 11:32:22,055 epoch 3 - iter 520/2606 - loss 0.10014325 - time (sec): 274.75 - samples/sec: 250.67 - lr: 0.000130 - momentum: 0.000000
+2023-10-12 11:34:42,409 epoch 3 - iter 780/2606 - loss 0.09513807 - time (sec): 415.10 - samples/sec: 256.98 - lr: 0.000128 - momentum: 0.000000
+2023-10-12 11:37:01,896 epoch 3 - iter 1040/2606 - loss 0.09185215 - time (sec): 554.59 - samples/sec: 257.11 - lr: 0.000127 - momentum: 0.000000
+2023-10-12 11:39:21,533 epoch 3 - iter 1300/2606 - loss 0.09008465 - time (sec): 694.22 - samples/sec: 258.02 - lr: 0.000125 - momentum: 0.000000
+2023-10-12 11:41:41,861 epoch 3 - iter 1560/2606 - loss 0.09055602 - time (sec): 834.55 - samples/sec: 259.20 - lr: 0.000123 - momentum: 0.000000
+2023-10-12 11:44:01,170 epoch 3 - iter 1820/2606 - loss 0.09064034 - time (sec): 973.86 - samples/sec: 258.25 - lr: 0.000122 - momentum: 0.000000
+2023-10-12 11:46:21,153 epoch 3 - iter 2080/2606 - loss 0.09076692 - time (sec): 1113.84 - samples/sec: 259.66 - lr: 0.000120 - momentum: 0.000000
+2023-10-12 11:48:43,071 epoch 3 - iter 2340/2606 - loss 0.08970600 - time (sec): 1255.76 - samples/sec: 262.05 - lr: 0.000118 - momentum: 0.000000
+2023-10-12 11:51:01,574 epoch 3 - iter 2600/2606 - loss 0.09003826 - time (sec): 1394.26 - samples/sec: 262.94 - lr: 0.000117 - momentum: 0.000000
+2023-10-12 11:51:04,717 ----------------------------------------------------------------------------------------------------
+2023-10-12 11:51:04,718 EPOCH 3 done: loss 0.0910 - lr: 0.000117
+2023-10-12 11:51:46,787 DEV : loss 0.1857159584760666 - f1-score (micro avg) 0.3808
+2023-10-12 11:51:46,841 saving best model
+2023-10-12 11:51:49,486 ----------------------------------------------------------------------------------------------------
+2023-10-12 11:54:08,085 epoch 4 - iter 260/2606 - loss 0.06864050 - time (sec): 138.59 - samples/sec: 257.51 - lr: 0.000115 - momentum: 0.000000
+2023-10-12 11:56:28,898 epoch 4 - iter 520/2606 - loss 0.06148994 - time (sec): 279.41 - samples/sec: 261.33 - lr: 0.000113 - momentum: 0.000000
+2023-10-12 11:58:48,559 epoch 4 - iter 780/2606 - loss 0.06145061 - time (sec): 419.07 - samples/sec: 260.81 - lr: 0.000112 - momentum: 0.000000
+2023-10-12 12:01:11,693 epoch 4 - iter 1040/2606 - loss 0.06341903 - time (sec): 562.20 - samples/sec: 256.24 - lr: 0.000110 - momentum: 0.000000
+2023-10-12 12:03:37,108 epoch 4 - iter 1300/2606 - loss 0.06522316 - time (sec): 707.62 - samples/sec: 257.77 - lr: 0.000108 - momentum: 0.000000
+2023-10-12 12:06:02,487 epoch 4 - iter 1560/2606 - loss 0.06572690 - time (sec): 853.00 - samples/sec: 259.00 - lr: 0.000107 - momentum: 0.000000
+2023-10-12 12:08:21,818 epoch 4 - iter 1820/2606 - loss 0.06365649 - time (sec): 992.33 - samples/sec: 259.68 - lr: 0.000105 - momentum: 0.000000
+2023-10-12 12:10:38,937 epoch 4 - iter 2080/2606 - loss 0.06307986 - time (sec): 1129.45 - samples/sec: 260.42 - lr: 0.000103 - momentum: 0.000000
+2023-10-12 12:12:58,069 epoch 4 - iter 2340/2606 - loss 0.06305110 - time (sec): 1268.58 - samples/sec: 260.22 - lr: 0.000102 - momentum: 0.000000
+2023-10-12 12:15:16,561 epoch 4 - iter 2600/2606 - loss 0.06362841 - time (sec): 1407.07 - samples/sec: 260.73 - lr: 0.000100 - momentum: 0.000000
+2023-10-12 12:15:19,400 ----------------------------------------------------------------------------------------------------
+2023-10-12 12:15:19,401 EPOCH 4 done: loss 0.0637 - lr: 0.000100
+2023-10-12 12:16:00,766 DEV : loss 0.22897003591060638 - f1-score (micro avg) 0.3706
+2023-10-12 12:16:00,819 ----------------------------------------------------------------------------------------------------
+2023-10-12 12:18:16,863 epoch 5 - iter 260/2606 - loss 0.04354472 - time (sec): 136.04 - samples/sec: 264.82 - lr: 0.000098 - momentum: 0.000000
+2023-10-12 12:20:33,081 epoch 5 - iter 520/2606 - loss 0.04345957 - time (sec): 272.26 - samples/sec: 257.57 - lr: 0.000097 - momentum: 0.000000
+2023-10-12 12:22:52,495 epoch 5 - iter 780/2606 - loss 0.04448711 - time (sec): 411.67 - samples/sec: 261.23 - lr: 0.000095 - momentum: 0.000000
+2023-10-12 12:25:11,546 epoch 5 - iter 1040/2606 - loss 0.04417456 - time (sec): 550.72 - samples/sec: 260.87 - lr: 0.000093 - momentum: 0.000000
+2023-10-12 12:27:30,883 epoch 5 - iter 1300/2606 - loss 0.04446586 - time (sec): 690.06 - samples/sec: 262.18 - lr: 0.000092 - momentum: 0.000000
+2023-10-12 12:29:53,071 epoch 5 - iter 1560/2606 - loss 0.04447562 - time (sec): 832.25 - samples/sec: 265.01 - lr: 0.000090 - momentum: 0.000000
+2023-10-12 12:32:13,835 epoch 5 - iter 1820/2606 - loss 0.04488614 - time (sec): 973.01 - samples/sec: 266.42 - lr: 0.000088 - momentum: 0.000000
+2023-10-12 12:34:27,520 epoch 5 - iter 2080/2606 - loss 0.04528989 - time (sec): 1106.70 - samples/sec: 264.22 - lr: 0.000087 - momentum: 0.000000
+2023-10-12 12:36:51,484 epoch 5 - iter 2340/2606 - loss 0.04482478 - time (sec): 1250.66 - samples/sec: 263.88 - lr: 0.000085 - momentum: 0.000000
+2023-10-12 12:39:13,945 epoch 5 - iter 2600/2606 - loss 0.04472883 - time (sec): 1393.12 - samples/sec: 263.19 - lr: 0.000083 - momentum: 0.000000
+2023-10-12 12:39:17,034 ----------------------------------------------------------------------------------------------------
+2023-10-12 12:39:17,034 EPOCH 5 done: loss 0.0447 - lr: 0.000083
+2023-10-12 12:39:59,133 DEV : loss 0.2849178612232208 - f1-score (micro avg) 0.3723
+2023-10-12 12:39:59,190 ----------------------------------------------------------------------------------------------------
+2023-10-12 12:42:16,257 epoch 6 - iter 260/2606 - loss 0.03205628 - time (sec): 137.07 - samples/sec: 257.42 - lr: 0.000082 - momentum: 0.000000
+2023-10-12 12:44:40,129 epoch 6 - iter 520/2606 - loss 0.03060171 - time (sec): 280.94 - samples/sec: 264.85 - lr: 0.000080 - momentum: 0.000000
+2023-10-12 12:46:59,629 epoch 6 - iter 780/2606 - loss 0.02986452 - time (sec): 420.44 - samples/sec: 265.75 - lr: 0.000078 - momentum: 0.000000
+2023-10-12 12:49:19,956 epoch 6 - iter 1040/2606 - loss 0.03282256 - time (sec): 560.76 - samples/sec: 262.17 - lr: 0.000077 - momentum: 0.000000
+2023-10-12 12:51:37,247 epoch 6 - iter 1300/2606 - loss 0.03337104 - time (sec): 698.06 - samples/sec: 260.05 - lr: 0.000075 - momentum: 0.000000
+2023-10-12 12:53:58,164 epoch 6 - iter 1560/2606 - loss 0.03383846 - time (sec): 838.97 - samples/sec: 262.62 - lr: 0.000073 - momentum: 0.000000
+2023-10-12 12:56:20,297 epoch 6 - iter 1820/2606 - loss 0.03513634 - time (sec): 981.11 - samples/sec: 263.52 - lr: 0.000072 - momentum: 0.000000
+2023-10-12 12:58:37,971 epoch 6 - iter 2080/2606 - loss 0.03457993 - time (sec): 1118.78 - samples/sec: 261.28 - lr: 0.000070 - momentum: 0.000000
+2023-10-12 13:00:55,395 epoch 6 - iter 2340/2606 - loss 0.03450918 - time (sec): 1256.20 - samples/sec: 261.02 - lr: 0.000068 - momentum: 0.000000
+2023-10-12 13:03:15,842 epoch 6 - iter 2600/2606 - loss 0.03375577 - time (sec): 1396.65 - samples/sec: 262.22 - lr: 0.000067 - momentum: 0.000000
+2023-10-12 13:03:19,438 ----------------------------------------------------------------------------------------------------
+2023-10-12 13:03:19,439 EPOCH 6 done: loss 0.0337 - lr: 0.000067
+2023-10-12 13:04:01,257 DEV : loss 0.35980162024497986 - f1-score (micro avg) 0.3871
+2023-10-12 13:04:01,331 saving best model
+2023-10-12 13:04:04,002 ----------------------------------------------------------------------------------------------------
+2023-10-12 13:06:24,564 epoch 7 - iter 260/2606 - loss 0.02056707 - time (sec): 140.56 - samples/sec: 262.29 - lr: 0.000065 - momentum: 0.000000
+2023-10-12 13:08:44,030 epoch 7 - iter 520/2606 - loss 0.02211430 - time (sec): 280.02 - samples/sec: 259.85 - lr: 0.000063 - momentum: 0.000000
+2023-10-12 13:11:03,498 epoch 7 - iter 780/2606 - loss 0.02238197 - time (sec): 419.49 - samples/sec: 261.02 - lr: 0.000062 - momentum: 0.000000
+2023-10-12 13:13:22,390 epoch 7 - iter 1040/2606 - loss 0.02341070 - time (sec): 558.38 - samples/sec: 262.32 - lr: 0.000060 - momentum: 0.000000
+2023-10-12 13:15:40,113 epoch 7 - iter 1300/2606 - loss 0.02460152 - time (sec): 696.11 - samples/sec: 264.02 - lr: 0.000058 - momentum: 0.000000
+2023-10-12 13:18:05,020 epoch 7 - iter 1560/2606 - loss 0.02429302 - time (sec): 841.01 - samples/sec: 268.16 - lr: 0.000057 - momentum: 0.000000
+2023-10-12 13:20:23,524 epoch 7 - iter 1820/2606 - loss 0.02376757 - time (sec): 979.52 - samples/sec: 266.99 - lr: 0.000055 - momentum: 0.000000
+2023-10-12 13:22:39,436 epoch 7 - iter 2080/2606 - loss 0.02507358 - time (sec): 1115.43 - samples/sec: 264.44 - lr: 0.000053 - momentum: 0.000000
+2023-10-12 13:24:57,530 epoch 7 - iter 2340/2606 - loss 0.02604305 - time (sec): 1253.52 - samples/sec: 263.11 - lr: 0.000052 - momentum: 0.000000
+2023-10-12 13:27:18,710 epoch 7 - iter 2600/2606 - loss 0.02570231 - time (sec): 1394.70 - samples/sec: 262.85 - lr: 0.000050 - momentum: 0.000000
+2023-10-12 13:27:21,846 ----------------------------------------------------------------------------------------------------
+2023-10-12 13:27:21,846 EPOCH 7 done: loss 0.0257 - lr: 0.000050
+2023-10-12 13:28:03,580 DEV : loss 0.40007448196411133 - f1-score (micro avg) 0.3978
+2023-10-12 13:28:03,637 saving best model
+2023-10-12 13:28:06,266 ----------------------------------------------------------------------------------------------------
+2023-10-12 13:30:27,485 epoch 8 - iter 260/2606 - loss 0.01482737 - time (sec): 141.21 - samples/sec: 266.04 - lr: 0.000048 - momentum: 0.000000
+2023-10-12 13:32:45,532 epoch 8 - iter 520/2606 - loss 0.01696598 - time (sec): 279.26 - samples/sec: 261.91 - lr: 0.000047 - momentum: 0.000000
+2023-10-12 13:35:06,637 epoch 8 - iter 780/2606 - loss 0.01817455 - time (sec): 420.37 - samples/sec: 262.67 - lr: 0.000045 - momentum: 0.000000
+2023-10-12 13:37:25,483 epoch 8 - iter 1040/2606 - loss 0.01837600 - time (sec): 559.21 - samples/sec: 264.78 - lr: 0.000043 - momentum: 0.000000
+2023-10-12 13:39:41,894 epoch 8 - iter 1300/2606 - loss 0.01883484 - time (sec): 695.62 - samples/sec: 261.77 - lr: 0.000042 - momentum: 0.000000
+2023-10-12 13:41:59,734 epoch 8 - iter 1560/2606 - loss 0.01783081 - time (sec): 833.46 - samples/sec: 262.78 - lr: 0.000040 - momentum: 0.000000
+2023-10-12 13:44:16,165 epoch 8 - iter 1820/2606 - loss 0.01833982 - time (sec): 969.89 - samples/sec: 261.10 - lr: 0.000038 - momentum: 0.000000
+2023-10-12 13:46:35,668 epoch 8 - iter 2080/2606 - loss 0.01813588 - time (sec): 1109.40 - samples/sec: 260.69 - lr: 0.000037 - momentum: 0.000000
+2023-10-12 13:48:57,758 epoch 8 - iter 2340/2606 - loss 0.01823581 - time (sec): 1251.49 - samples/sec: 263.09 - lr: 0.000035 - momentum: 0.000000
+2023-10-12 13:51:20,031 epoch 8 - iter 2600/2606 - loss 0.01778836 - time (sec): 1393.76 - samples/sec: 262.81 - lr: 0.000033 - momentum: 0.000000
+2023-10-12 13:51:23,576 ----------------------------------------------------------------------------------------------------
+2023-10-12 13:51:23,576 EPOCH 8 done: loss 0.0177 - lr: 0.000033
+2023-10-12 13:52:07,288 DEV : loss 0.4319295883178711 - f1-score (micro avg) 0.4038
+2023-10-12 13:52:07,356 saving best model
+2023-10-12 13:52:09,992 ----------------------------------------------------------------------------------------------------
+2023-10-12 13:54:30,936 epoch 9 - iter 260/2606 - loss 0.01164345 - time (sec): 140.94 - samples/sec: 265.00 - lr: 0.000032 - momentum: 0.000000
+2023-10-12 13:56:44,434 epoch 9 - iter 520/2606 - loss 0.01383209 - time (sec): 274.44 - samples/sec: 255.72 - lr: 0.000030 - momentum: 0.000000
+2023-10-12 13:59:02,722 epoch 9 - iter 780/2606 - loss 0.01292411 - time (sec): 412.73 - samples/sec: 260.02 - lr: 0.000028 - momentum: 0.000000
+2023-10-12 14:01:23,730 epoch 9 - iter 1040/2606 - loss 0.01215414 - time (sec): 553.73 - samples/sec: 260.86 - lr: 0.000027 - momentum: 0.000000
+2023-10-12 14:03:45,315 epoch 9 - iter 1300/2606 - loss 0.01270955 - time (sec): 695.32 - samples/sec: 263.37 - lr: 0.000025 - momentum: 0.000000
+2023-10-12 14:06:04,587 epoch 9 - iter 1560/2606 - loss 0.01279780 - time (sec): 834.59 - samples/sec: 262.06 - lr: 0.000023 - momentum: 0.000000
+2023-10-12 14:08:30,951 epoch 9 - iter 1820/2606 - loss 0.01225063 - time (sec): 980.95 - samples/sec: 261.88 - lr: 0.000022 - momentum: 0.000000
+2023-10-12 14:10:55,018 epoch 9 - iter 2080/2606 - loss 0.01260601 - time (sec): 1125.02 - samples/sec: 261.01 - lr: 0.000020 - momentum: 0.000000
+2023-10-12 14:13:17,567 epoch 9 - iter 2340/2606 - loss 0.01255088 - time (sec): 1267.57 - samples/sec: 260.39 - lr: 0.000018 - momentum: 0.000000
+2023-10-12 14:15:34,884 epoch 9 - iter 2600/2606 - loss 0.01226504 - time (sec): 1404.89 - samples/sec: 261.09 - lr: 0.000017 - momentum: 0.000000
+2023-10-12 14:15:37,671 ----------------------------------------------------------------------------------------------------
+2023-10-12 14:15:37,671 EPOCH 9 done: loss 0.0122 - lr: 0.000017
+2023-10-12 14:16:18,359 DEV : loss 0.43767231702804565 - f1-score (micro avg) 0.41
+2023-10-12 14:16:18,411 saving best model
+2023-10-12 14:16:20,975 ----------------------------------------------------------------------------------------------------
+2023-10-12 14:18:40,818 epoch 10 - iter 260/2606 - loss 0.00960636 - time (sec): 139.84 - samples/sec: 261.49 - lr: 0.000015 - momentum: 0.000000
+2023-10-12 14:20:58,673 epoch 10 - iter 520/2606 - loss 0.01062704 - time (sec): 277.69 - samples/sec: 259.88 - lr: 0.000013 - momentum: 0.000000
+2023-10-12 14:23:18,738 epoch 10 - iter 780/2606 - loss 0.01059562 - time (sec): 417.76 - samples/sec: 262.80 - lr: 0.000012 - momentum: 0.000000
+2023-10-12 14:25:36,355 epoch 10 - iter 1040/2606 - loss 0.00990986 - time (sec): 555.38 - samples/sec: 260.97 - lr: 0.000010 - momentum: 0.000000
+2023-10-12 14:27:56,432 epoch 10 - iter 1300/2606 - loss 0.00958578 - time (sec): 695.45 - samples/sec: 261.48 - lr: 0.000008 - momentum: 0.000000
+2023-10-12 14:30:16,906 epoch 10 - iter 1560/2606 - loss 0.00964887 - time (sec): 835.93 - samples/sec: 261.59 - lr: 0.000007 - momentum: 0.000000
+2023-10-12 14:32:34,264 epoch 10 - iter 1820/2606 - loss 0.00928930 - time (sec): 973.28 - samples/sec: 260.18 - lr: 0.000005 - momentum: 0.000000
+2023-10-12 14:34:54,973 epoch 10 - iter 2080/2606 - loss 0.00963159 - time (sec): 1113.99 - samples/sec: 261.10 - lr: 0.000003 - momentum: 0.000000
+2023-10-12 14:37:15,806 epoch 10 - iter 2340/2606 - loss 0.00957205 - time (sec): 1254.83 - samples/sec: 261.18 - lr: 0.000002 - momentum: 0.000000
+2023-10-12 14:39:35,671 epoch 10 - iter 2600/2606 - loss 0.00934680 - time (sec): 1394.69 - samples/sec: 262.81 - lr: 0.000000 - momentum: 0.000000
+2023-10-12 14:39:38,739 ----------------------------------------------------------------------------------------------------
+2023-10-12 14:39:38,739 EPOCH 10 done: loss 0.0093 - lr: 0.000000
+2023-10-12 14:40:22,122 DEV : loss 0.45118266344070435 - f1-score (micro avg) 0.4146
+2023-10-12 14:40:22,181 saving best model
+2023-10-12 14:40:24,269 ----------------------------------------------------------------------------------------------------
+2023-10-12 14:40:24,271 Loading model from best epoch ...
+2023-10-12 14:40:28,579 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
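
The 17-tag dictionary is the BIOES scheme over the four entity types (LOC, PER, ORG, HumanProd): 4 types x 4 span positions + O = 17. The committed best-model.pt can be reloaded the same way for inference; a minimal sketch (the sample sentence is arbitrary):

    # Load the committed checkpoint and tag a sentence.
    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load("best-model.pt")
    sentence = Sentence("Berlin liegt an der Spree .")
    tagger.predict(sentence)
    for span in sentence.get_spans("ner"):
        print(span.text, span.tag)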
+2023-10-12 14:42:13,016
+Results:
+- F-score (micro) 0.4344
+- F-score (macro) 0.3036
+- Accuracy 0.2816
+
+By class:
+              precision    recall  f1-score   support
+
+         LOC     0.4256    0.4992    0.4594      1214
+         PER     0.4038    0.5272    0.4573       808
+         ORG     0.2895    0.3059    0.2975       353
+   HumanProd     0.0000    0.0000    0.0000        15
+
+   micro avg     0.3987    0.4770    0.4344      2390
+   macro avg     0.2797    0.3331    0.3036      2390
+weighted avg     0.3954    0.4770    0.4319      2390
+
+2023-10-12 14:42:13,016 ----------------------------------------------------------------------------------------------------
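
As a sanity check, the micro-average F-score is the harmonic mean of the micro precision and recall in the table above, F1 = 2PR / (P + R):

    # Recompute the micro-average F1 from the reported precision and recall.
    precision, recall = 0.3987, 0.4770
    f1 = 2 * precision * recall / (precision + recall)
    print(f"{f1:.4f}")  # 0.4343 -- matches the reported 0.4344 up to rounding of P and R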