stefan-it committed on
Commit 40d3df7
1 parent: 003f8e8

Upload folder using huggingface_hub

best-model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5365ebcdc43459dc04d0da82b433d4a7862dff2625130b62abfcf8edf32bf284
+ size 870817519
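The three added lines above are a Git LFS pointer rather than the checkpoint itself: the repository tracks only the spec version, the SHA-256 object ID, and the size in bytes. As a minimal sketch of how such a pointer can be read (the helper name `parse_lfs_pointer` is hypothetical and not part of any library):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is a key, one space, then the value.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form '<hash algorithm>:<hex digest>'.
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer content copied from the best-model.pt diff.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:5365ebcdc43459dc04d0da82b433d4a7862dff2625130b62abfcf8edf32bf284
size 870817519
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algorithm"], f"{info['size_bytes'] / 1e6:.1f} MB")
```

The size field shows that the real checkpoint is roughly 870 MB, which is why only the pointer appears in the diff.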
dev.tsv ADDED
The diff for this file is too large to render. See raw diff
 
final-model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:460c98f47485f7333757b4be38f4c848d208ea920eda25c8b3a98c0fdf97ecdc
+ size 870817636
loss.tsv ADDED
@@ -0,0 +1,11 @@
+ EPOCH TIMESTAMP LEARNING_RATE TRAIN_LOSS DEV_LOSS DEV_PRECISION DEV_RECALL DEV_F1 DEV_ACCURACY
+ 1 06:57:43 0.0002 0.9337 0.1333 0.3805 0.2803 0.3228 0.1925
+ 2 07:20:39 0.0001 0.1664 0.1356 0.2397 0.4621 0.3157 0.1886
+ 3 07:44:02 0.0001 0.0987 0.2194 0.2285 0.5795 0.3278 0.1972
+ 4 08:07:49 0.0001 0.0702 0.2609 0.2672 0.5436 0.3583 0.2196
+ 5 08:31:31 0.0001 0.0491 0.3354 0.2458 0.5568 0.3411 0.2072
+ 6 08:54:06 0.0001 0.0353 0.4165 0.2392 0.6269 0.3462 0.2106
+ 7 09:17:14 0.0001 0.0280 0.3882 0.2827 0.6061 0.3855 0.2402
+ 8 09:39:38 0.0000 0.0216 0.4609 0.2630 0.6231 0.3699 0.2282
+ 9 10:02:30 0.0000 0.0152 0.4856 0.2588 0.6004 0.3617 0.2218
+ 10 10:25:04 0.0000 0.0104 0.4849 0.2654 0.5966 0.3673 0.2261
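The rows of loss.tsv can be used to see which epoch would be kept as the best checkpoint. The following is a sketch only: it embeds the table with whitespace-separated columns (the real file is presumably tab-separated) and assumes the best epoch is the one with the highest `DEV_F1`.

```python
# Table content copied from the loss.tsv diff, whitespace-separated.
LOSS_TSV = """\
EPOCH TIMESTAMP LEARNING_RATE TRAIN_LOSS DEV_LOSS DEV_PRECISION DEV_RECALL DEV_F1 DEV_ACCURACY
1 06:57:43 0.0002 0.9337 0.1333 0.3805 0.2803 0.3228 0.1925
2 07:20:39 0.0001 0.1664 0.1356 0.2397 0.4621 0.3157 0.1886
3 07:44:02 0.0001 0.0987 0.2194 0.2285 0.5795 0.3278 0.1972
4 08:07:49 0.0001 0.0702 0.2609 0.2672 0.5436 0.3583 0.2196
5 08:31:31 0.0001 0.0491 0.3354 0.2458 0.5568 0.3411 0.2072
6 08:54:06 0.0001 0.0353 0.4165 0.2392 0.6269 0.3462 0.2106
7 09:17:14 0.0001 0.0280 0.3882 0.2827 0.6061 0.3855 0.2402
8 09:39:38 0.0000 0.0216 0.4609 0.2630 0.6231 0.3699 0.2282
9 10:02:30 0.0000 0.0152 0.4856 0.2588 0.6004 0.3617 0.2218
10 10:25:04 0.0000 0.0104 0.4849 0.2654 0.5966 0.3673 0.2261
"""

# Split on any whitespace so the sketch works for tabs or spaces.
rows = [line.split() for line in LOSS_TSV.strip().splitlines()]
header, data = rows[0], rows[1:]
records = [dict(zip(header, row)) for row in data]

# Epoch with the highest dev F1, and epoch with the lowest dev loss.
best_f1 = max(records, key=lambda r: float(r["DEV_F1"]))
lowest_loss = min(records, key=lambda r: float(r["DEV_LOSS"]))

print("best DEV_F1:", best_f1["EPOCH"], best_f1["DEV_F1"])
print("lowest DEV_LOSS:", lowest_loss["EPOCH"], lowest_loss["DEV_LOSS"])
```

In this run the two criteria disagree: dev F1 peaks at epoch 7, while dev loss is lowest after epoch 1 and rises as the training loss keeps falling.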
runs/events.out.tfevents.1697006045.6d4c7681f95b.1253.9 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a18fe5dd6df0577169b4320323061afb50867d1bcbf5ef7b6721540205329d74
+ size 1464420
test.tsv ADDED
The diff for this file is too large to render. See raw diff
 
training.log ADDED
@@ -0,0 +1,262 @@
+ 2023-10-11 06:34:05,203 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 06:34:05,205 Model: "SequenceTagger(
+   (embeddings): ByT5Embeddings(
+     (model): T5EncoderModel(
+       (shared): Embedding(384, 1472)
+       (encoder): T5Stack(
+         (embed_tokens): Embedding(384, 1472)
+         (block): ModuleList(
+           (0): T5Block(
+             (layer): ModuleList(
+               (0): T5LayerSelfAttention(
+                 (SelfAttention): T5Attention(
+                   (q): Linear(in_features=1472, out_features=384, bias=False)
+                   (k): Linear(in_features=1472, out_features=384, bias=False)
+                   (v): Linear(in_features=1472, out_features=384, bias=False)
+                   (o): Linear(in_features=384, out_features=1472, bias=False)
+                   (relative_attention_bias): Embedding(32, 6)
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (1): T5LayerFF(
+                 (DenseReluDense): T5DenseGatedActDense(
+                   (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wo): Linear(in_features=3584, out_features=1472, bias=False)
+                   (dropout): Dropout(p=0.1, inplace=False)
+                   (act): NewGELUActivation()
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+           )
+           (1-11): 11 x T5Block(
+             (layer): ModuleList(
+               (0): T5LayerSelfAttention(
+                 (SelfAttention): T5Attention(
+                   (q): Linear(in_features=1472, out_features=384, bias=False)
+                   (k): Linear(in_features=1472, out_features=384, bias=False)
+                   (v): Linear(in_features=1472, out_features=384, bias=False)
+                   (o): Linear(in_features=384, out_features=1472, bias=False)
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (1): T5LayerFF(
+                 (DenseReluDense): T5DenseGatedActDense(
+                   (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wo): Linear(in_features=3584, out_features=1472, bias=False)
+                   (dropout): Dropout(p=0.1, inplace=False)
+                   (act): NewGELUActivation()
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+           )
+         )
+         (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+         (dropout): Dropout(p=0.1, inplace=False)
+       )
+     )
+   )
+   (locked_dropout): LockedDropout(p=0.5)
+   (linear): Linear(in_features=1472, out_features=17, bias=True)
+   (loss_function): CrossEntropyLoss()
+ )"
+ 2023-10-11 06:34:05,205 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 06:34:05,206 MultiCorpus: 20847 train + 1123 dev + 3350 test sentences
+ - NER_HIPE_2022 Corpus: 20847 train + 1123 dev + 3350 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/de/with_doc_seperator
+ 2023-10-11 06:34:05,206 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 06:34:05,206 Train: 20847 sentences
+ 2023-10-11 06:34:05,206 (train_with_dev=False, train_with_test=False)
+ 2023-10-11 06:34:05,206 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 06:34:05,206 Training Params:
+ 2023-10-11 06:34:05,206 - learning_rate: "0.00016"
+ 2023-10-11 06:34:05,206 - mini_batch_size: "8"
+ 2023-10-11 06:34:05,206 - max_epochs: "10"
+ 2023-10-11 06:34:05,206 - shuffle: "True"
+ 2023-10-11 06:34:05,207 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 06:34:05,207 Plugins:
+ 2023-10-11 06:34:05,207 - TensorboardLogger
+ 2023-10-11 06:34:05,207 - LinearScheduler | warmup_fraction: '0.1'
+ 2023-10-11 06:34:05,207 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 06:34:05,207 Final evaluation on model from best epoch (best-model.pt)
+ 2023-10-11 06:34:05,207 - metric: "('micro avg', 'f1-score')"
+ 2023-10-11 06:34:05,207 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 06:34:05,207 Computation:
+ 2023-10-11 06:34:05,207 - compute on device: cuda:0
+ 2023-10-11 06:34:05,207 - embedding storage: none
+ 2023-10-11 06:34:05,207 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 06:34:05,207 Model training base path: "hmbench-newseye/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-3"
+ 2023-10-11 06:34:05,207 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 06:34:05,208 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 06:34:05,208 Logging anything other than scalars to TensorBoard is currently not supported.
+ 2023-10-11 06:36:23,104 epoch 1 - iter 260/2606 - loss 2.79623512 - time (sec): 137.89 - samples/sec: 270.68 - lr: 0.000016 - momentum: 0.000000
+ 2023-10-11 06:38:42,899 epoch 1 - iter 520/2606 - loss 2.51320583 - time (sec): 277.69 - samples/sec: 276.16 - lr: 0.000032 - momentum: 0.000000
+ 2023-10-11 06:41:02,243 epoch 1 - iter 780/2606 - loss 2.13123088 - time (sec): 417.03 - samples/sec: 271.28 - lr: 0.000048 - momentum: 0.000000
+ 2023-10-11 06:43:21,533 epoch 1 - iter 1040/2606 - loss 1.75671481 - time (sec): 556.32 - samples/sec: 268.41 - lr: 0.000064 - momentum: 0.000000
+ 2023-10-11 06:45:41,186 epoch 1 - iter 1300/2606 - loss 1.49838201 - time (sec): 695.98 - samples/sec: 268.65 - lr: 0.000080 - momentum: 0.000000
+ 2023-10-11 06:47:59,654 epoch 1 - iter 1560/2606 - loss 1.32866928 - time (sec): 834.44 - samples/sec: 267.37 - lr: 0.000096 - momentum: 0.000000
+ 2023-10-11 06:50:16,336 epoch 1 - iter 1820/2606 - loss 1.20054232 - time (sec): 971.13 - samples/sec: 265.79 - lr: 0.000112 - momentum: 0.000000
+ 2023-10-11 06:52:32,593 epoch 1 - iter 2080/2606 - loss 1.10214569 - time (sec): 1107.38 - samples/sec: 263.73 - lr: 0.000128 - momentum: 0.000000
+ 2023-10-11 06:54:49,367 epoch 1 - iter 2340/2606 - loss 1.01185782 - time (sec): 1244.16 - samples/sec: 265.32 - lr: 0.000144 - momentum: 0.000000
+ 2023-10-11 06:57:04,584 epoch 1 - iter 2600/2606 - loss 0.93498109 - time (sec): 1379.37 - samples/sec: 265.83 - lr: 0.000160 - momentum: 0.000000
+ 2023-10-11 06:57:07,647 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 06:57:07,648 EPOCH 1 done: loss 0.9337 - lr: 0.000160
+ 2023-10-11 06:57:43,437 DEV : loss 0.13330958783626556 - f1-score (micro avg) 0.3228
+ 2023-10-11 06:57:43,490 saving best model
+ 2023-10-11 06:57:44,505 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 06:59:57,930 epoch 2 - iter 260/2606 - loss 0.21019294 - time (sec): 133.42 - samples/sec: 276.63 - lr: 0.000158 - momentum: 0.000000
+ 2023-10-11 07:02:09,637 epoch 2 - iter 520/2606 - loss 0.19864513 - time (sec): 265.13 - samples/sec: 276.01 - lr: 0.000156 - momentum: 0.000000
+ 2023-10-11 07:04:22,717 epoch 2 - iter 780/2606 - loss 0.20049738 - time (sec): 398.21 - samples/sec: 282.71 - lr: 0.000155 - momentum: 0.000000
+ 2023-10-11 07:06:32,374 epoch 2 - iter 1040/2606 - loss 0.19536494 - time (sec): 527.87 - samples/sec: 282.50 - lr: 0.000153 - momentum: 0.000000
+ 2023-10-11 07:08:44,974 epoch 2 - iter 1300/2606 - loss 0.18944632 - time (sec): 660.47 - samples/sec: 278.93 - lr: 0.000151 - momentum: 0.000000
+ 2023-10-11 07:10:58,600 epoch 2 - iter 1560/2606 - loss 0.18221078 - time (sec): 794.09 - samples/sec: 278.60 - lr: 0.000149 - momentum: 0.000000
+ 2023-10-11 07:13:07,761 epoch 2 - iter 1820/2606 - loss 0.18085900 - time (sec): 923.25 - samples/sec: 276.27 - lr: 0.000148 - momentum: 0.000000
+ 2023-10-11 07:15:22,042 epoch 2 - iter 2080/2606 - loss 0.17490813 - time (sec): 1057.53 - samples/sec: 275.53 - lr: 0.000146 - momentum: 0.000000
+ 2023-10-11 07:17:39,559 epoch 2 - iter 2340/2606 - loss 0.17046757 - time (sec): 1195.05 - samples/sec: 275.94 - lr: 0.000144 - momentum: 0.000000
+ 2023-10-11 07:19:55,227 epoch 2 - iter 2600/2606 - loss 0.16672440 - time (sec): 1330.72 - samples/sec: 275.49 - lr: 0.000142 - momentum: 0.000000
+ 2023-10-11 07:19:58,227 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 07:19:58,228 EPOCH 2 done: loss 0.1664 - lr: 0.000142
+ 2023-10-11 07:20:39,656 DEV : loss 0.1355997771024704 - f1-score (micro avg) 0.3157
+ 2023-10-11 07:20:39,711 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 07:22:55,760 epoch 3 - iter 260/2606 - loss 0.09722056 - time (sec): 136.05 - samples/sec: 256.55 - lr: 0.000140 - momentum: 0.000000
+ 2023-10-11 07:25:13,756 epoch 3 - iter 520/2606 - loss 0.09970387 - time (sec): 274.04 - samples/sec: 259.22 - lr: 0.000139 - momentum: 0.000000
+ 2023-10-11 07:27:29,818 epoch 3 - iter 780/2606 - loss 0.09543305 - time (sec): 410.11 - samples/sec: 260.14 - lr: 0.000137 - momentum: 0.000000
+ 2023-10-11 07:29:51,073 epoch 3 - iter 1040/2606 - loss 0.10146619 - time (sec): 551.36 - samples/sec: 263.99 - lr: 0.000135 - momentum: 0.000000
+ 2023-10-11 07:32:08,639 epoch 3 - iter 1300/2606 - loss 0.10377832 - time (sec): 688.93 - samples/sec: 266.67 - lr: 0.000133 - momentum: 0.000000
+ 2023-10-11 07:34:22,168 epoch 3 - iter 1560/2606 - loss 0.10053981 - time (sec): 822.46 - samples/sec: 266.82 - lr: 0.000132 - momentum: 0.000000
+ 2023-10-11 07:36:35,069 epoch 3 - iter 1820/2606 - loss 0.09949351 - time (sec): 955.36 - samples/sec: 266.99 - lr: 0.000130 - momentum: 0.000000
+ 2023-10-11 07:38:49,369 epoch 3 - iter 2080/2606 - loss 0.09938405 - time (sec): 1089.66 - samples/sec: 268.16 - lr: 0.000128 - momentum: 0.000000
+ 2023-10-11 07:41:02,378 epoch 3 - iter 2340/2606 - loss 0.09946291 - time (sec): 1222.66 - samples/sec: 268.23 - lr: 0.000126 - momentum: 0.000000
+ 2023-10-11 07:43:18,809 epoch 3 - iter 2600/2606 - loss 0.09852433 - time (sec): 1359.10 - samples/sec: 269.85 - lr: 0.000125 - momentum: 0.000000
+ 2023-10-11 07:43:21,680 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 07:43:21,680 EPOCH 3 done: loss 0.0987 - lr: 0.000125
+ 2023-10-11 07:44:02,294 DEV : loss 0.21938827633857727 - f1-score (micro avg) 0.3278
+ 2023-10-11 07:44:02,348 saving best model
+ 2023-10-11 07:44:08,749 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 07:46:23,521 epoch 4 - iter 260/2606 - loss 0.07666126 - time (sec): 134.77 - samples/sec: 260.80 - lr: 0.000123 - momentum: 0.000000
+ 2023-10-11 07:48:42,647 epoch 4 - iter 520/2606 - loss 0.07094144 - time (sec): 273.89 - samples/sec: 262.46 - lr: 0.000121 - momentum: 0.000000
+ 2023-10-11 07:51:03,507 epoch 4 - iter 780/2606 - loss 0.06881513 - time (sec): 414.75 - samples/sec: 262.90 - lr: 0.000119 - momentum: 0.000000
+ 2023-10-11 07:53:21,938 epoch 4 - iter 1040/2606 - loss 0.07087916 - time (sec): 553.18 - samples/sec: 261.33 - lr: 0.000117 - momentum: 0.000000
+ 2023-10-11 07:55:44,233 epoch 4 - iter 1300/2606 - loss 0.06949960 - time (sec): 695.48 - samples/sec: 265.57 - lr: 0.000116 - momentum: 0.000000
+ 2023-10-11 07:58:00,501 epoch 4 - iter 1560/2606 - loss 0.06914117 - time (sec): 831.75 - samples/sec: 264.19 - lr: 0.000114 - momentum: 0.000000
+ 2023-10-11 08:00:17,923 epoch 4 - iter 1820/2606 - loss 0.07037335 - time (sec): 969.17 - samples/sec: 265.35 - lr: 0.000112 - momentum: 0.000000
+ 2023-10-11 08:02:37,656 epoch 4 - iter 2080/2606 - loss 0.07026631 - time (sec): 1108.90 - samples/sec: 268.13 - lr: 0.000110 - momentum: 0.000000
+ 2023-10-11 08:04:51,458 epoch 4 - iter 2340/2606 - loss 0.06994160 - time (sec): 1242.70 - samples/sec: 266.75 - lr: 0.000109 - momentum: 0.000000
+ 2023-10-11 08:07:07,399 epoch 4 - iter 2600/2606 - loss 0.07025162 - time (sec): 1378.65 - samples/sec: 266.17 - lr: 0.000107 - momentum: 0.000000
+ 2023-10-11 08:07:10,188 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 08:07:10,189 EPOCH 4 done: loss 0.0702 - lr: 0.000107
+ 2023-10-11 08:07:49,536 DEV : loss 0.26091474294662476 - f1-score (micro avg) 0.3583
+ 2023-10-11 08:07:49,591 saving best model
+ 2023-10-11 08:07:55,779 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 08:10:11,216 epoch 5 - iter 260/2606 - loss 0.03843460 - time (sec): 135.43 - samples/sec: 264.71 - lr: 0.000105 - momentum: 0.000000
+ 2023-10-11 08:12:27,730 epoch 5 - iter 520/2606 - loss 0.04394140 - time (sec): 271.95 - samples/sec: 270.49 - lr: 0.000103 - momentum: 0.000000
+ 2023-10-11 08:14:43,439 epoch 5 - iter 780/2606 - loss 0.04684401 - time (sec): 407.66 - samples/sec: 267.22 - lr: 0.000101 - momentum: 0.000000
+ 2023-10-11 08:17:04,023 epoch 5 - iter 1040/2606 - loss 0.04695587 - time (sec): 548.24 - samples/sec: 265.21 - lr: 0.000100 - momentum: 0.000000
+ 2023-10-11 08:19:25,434 epoch 5 - iter 1300/2606 - loss 0.04626700 - time (sec): 689.65 - samples/sec: 266.52 - lr: 0.000098 - momentum: 0.000000
+ 2023-10-11 08:21:43,551 epoch 5 - iter 1560/2606 - loss 0.04788600 - time (sec): 827.77 - samples/sec: 264.81 - lr: 0.000096 - momentum: 0.000000
+ 2023-10-11 08:23:59,805 epoch 5 - iter 1820/2606 - loss 0.04881533 - time (sec): 964.02 - samples/sec: 265.49 - lr: 0.000094 - momentum: 0.000000
+ 2023-10-11 08:26:14,240 epoch 5 - iter 2080/2606 - loss 0.04905495 - time (sec): 1098.46 - samples/sec: 264.97 - lr: 0.000093 - momentum: 0.000000
+ 2023-10-11 08:28:29,966 epoch 5 - iter 2340/2606 - loss 0.04807348 - time (sec): 1234.18 - samples/sec: 266.04 - lr: 0.000091 - momentum: 0.000000
+ 2023-10-11 08:30:46,446 epoch 5 - iter 2600/2606 - loss 0.04913741 - time (sec): 1370.66 - samples/sec: 266.96 - lr: 0.000089 - momentum: 0.000000
+ 2023-10-11 08:30:50,213 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 08:30:50,213 EPOCH 5 done: loss 0.0491 - lr: 0.000089
+ 2023-10-11 08:31:31,103 DEV : loss 0.3354221284389496 - f1-score (micro avg) 0.3411
+ 2023-10-11 08:31:31,156 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 08:33:39,991 epoch 6 - iter 260/2606 - loss 0.03643917 - time (sec): 128.83 - samples/sec: 261.17 - lr: 0.000087 - momentum: 0.000000
+ 2023-10-11 08:35:50,088 epoch 6 - iter 520/2606 - loss 0.03512044 - time (sec): 258.93 - samples/sec: 263.39 - lr: 0.000085 - momentum: 0.000000
+ 2023-10-11 08:38:01,620 epoch 6 - iter 780/2606 - loss 0.03671140 - time (sec): 390.46 - samples/sec: 267.74 - lr: 0.000084 - momentum: 0.000000
+ 2023-10-11 08:40:10,677 epoch 6 - iter 1040/2606 - loss 0.03608106 - time (sec): 519.52 - samples/sec: 270.12 - lr: 0.000082 - momentum: 0.000000
+ 2023-10-11 08:42:21,009 epoch 6 - iter 1300/2606 - loss 0.03705224 - time (sec): 649.85 - samples/sec: 272.84 - lr: 0.000080 - momentum: 0.000000
+ 2023-10-11 08:44:31,388 epoch 6 - iter 1560/2606 - loss 0.03561669 - time (sec): 780.23 - samples/sec: 272.24 - lr: 0.000078 - momentum: 0.000000
+ 2023-10-11 08:46:47,025 epoch 6 - iter 1820/2606 - loss 0.03469015 - time (sec): 915.87 - samples/sec: 275.06 - lr: 0.000077 - momentum: 0.000000
+ 2023-10-11 08:48:58,805 epoch 6 - iter 2080/2606 - loss 0.03543369 - time (sec): 1047.65 - samples/sec: 276.49 - lr: 0.000075 - momentum: 0.000000
+ 2023-10-11 08:51:13,324 epoch 6 - iter 2340/2606 - loss 0.03550377 - time (sec): 1182.17 - samples/sec: 278.36 - lr: 0.000073 - momentum: 0.000000
+ 2023-10-11 08:53:25,518 epoch 6 - iter 2600/2606 - loss 0.03521598 - time (sec): 1314.36 - samples/sec: 278.94 - lr: 0.000071 - momentum: 0.000000
+ 2023-10-11 08:53:28,410 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 08:53:28,410 EPOCH 6 done: loss 0.0353 - lr: 0.000071
+ 2023-10-11 08:54:06,572 DEV : loss 0.4164799451828003 - f1-score (micro avg) 0.3462
+ 2023-10-11 08:54:06,624 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 08:56:18,996 epoch 7 - iter 260/2606 - loss 0.02421613 - time (sec): 132.37 - samples/sec: 302.86 - lr: 0.000069 - momentum: 0.000000
+ 2023-10-11 08:58:27,905 epoch 7 - iter 520/2606 - loss 0.02396377 - time (sec): 261.28 - samples/sec: 290.49 - lr: 0.000068 - momentum: 0.000000
+ 2023-10-11 09:00:42,064 epoch 7 - iter 780/2606 - loss 0.02348941 - time (sec): 395.44 - samples/sec: 286.61 - lr: 0.000066 - momentum: 0.000000
+ 2023-10-11 09:02:59,932 epoch 7 - iter 1040/2606 - loss 0.02640947 - time (sec): 533.31 - samples/sec: 284.72 - lr: 0.000064 - momentum: 0.000000
+ 2023-10-11 09:05:14,117 epoch 7 - iter 1300/2606 - loss 0.02707752 - time (sec): 667.49 - samples/sec: 277.99 - lr: 0.000062 - momentum: 0.000000
+ 2023-10-11 09:07:30,502 epoch 7 - iter 1560/2606 - loss 0.02838498 - time (sec): 803.88 - samples/sec: 277.62 - lr: 0.000061 - momentum: 0.000000
+ 2023-10-11 09:09:45,699 epoch 7 - iter 1820/2606 - loss 0.02906200 - time (sec): 939.07 - samples/sec: 275.27 - lr: 0.000059 - momentum: 0.000000
+ 2023-10-11 09:12:01,451 epoch 7 - iter 2080/2606 - loss 0.02815842 - time (sec): 1074.82 - samples/sec: 273.34 - lr: 0.000057 - momentum: 0.000000
+ 2023-10-11 09:14:19,544 epoch 7 - iter 2340/2606 - loss 0.02840422 - time (sec): 1212.92 - samples/sec: 272.62 - lr: 0.000055 - momentum: 0.000000
+ 2023-10-11 09:16:32,556 epoch 7 - iter 2600/2606 - loss 0.02795787 - time (sec): 1345.93 - samples/sec: 272.54 - lr: 0.000053 - momentum: 0.000000
+ 2023-10-11 09:16:35,342 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 09:16:35,342 EPOCH 7 done: loss 0.0280 - lr: 0.000053
+ 2023-10-11 09:17:14,611 DEV : loss 0.38824594020843506 - f1-score (micro avg) 0.3855
+ 2023-10-11 09:17:14,663 saving best model
+ 2023-10-11 09:17:17,239 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 09:19:25,024 epoch 8 - iter 260/2606 - loss 0.01471915 - time (sec): 127.78 - samples/sec: 287.72 - lr: 0.000052 - momentum: 0.000000
+ 2023-10-11 09:21:33,496 epoch 8 - iter 520/2606 - loss 0.01972084 - time (sec): 256.25 - samples/sec: 288.86 - lr: 0.000050 - momentum: 0.000000
+ 2023-10-11 09:23:42,553 epoch 8 - iter 780/2606 - loss 0.02027082 - time (sec): 385.31 - samples/sec: 285.92 - lr: 0.000048 - momentum: 0.000000
+ 2023-10-11 09:25:51,832 epoch 8 - iter 1040/2606 - loss 0.01980338 - time (sec): 514.59 - samples/sec: 284.71 - lr: 0.000046 - momentum: 0.000000
+ 2023-10-11 09:28:01,892 epoch 8 - iter 1300/2606 - loss 0.02000664 - time (sec): 644.65 - samples/sec: 285.96 - lr: 0.000045 - momentum: 0.000000
+ 2023-10-11 09:30:12,840 epoch 8 - iter 1560/2606 - loss 0.02105758 - time (sec): 775.60 - samples/sec: 284.89 - lr: 0.000043 - momentum: 0.000000
+ 2023-10-11 09:32:22,237 epoch 8 - iter 1820/2606 - loss 0.02052649 - time (sec): 904.99 - samples/sec: 283.61 - lr: 0.000041 - momentum: 0.000000
+ 2023-10-11 09:34:33,876 epoch 8 - iter 2080/2606 - loss 0.02031634 - time (sec): 1036.63 - samples/sec: 283.17 - lr: 0.000039 - momentum: 0.000000
+ 2023-10-11 09:36:45,792 epoch 8 - iter 2340/2606 - loss 0.02078787 - time (sec): 1168.55 - samples/sec: 283.56 - lr: 0.000037 - momentum: 0.000000
+ 2023-10-11 09:38:55,539 epoch 8 - iter 2600/2606 - loss 0.02156944 - time (sec): 1298.30 - samples/sec: 282.15 - lr: 0.000036 - momentum: 0.000000
+ 2023-10-11 09:38:58,823 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 09:38:58,823 EPOCH 8 done: loss 0.0216 - lr: 0.000036
+ 2023-10-11 09:39:38,820 DEV : loss 0.4608902931213379 - f1-score (micro avg) 0.3699
+ 2023-10-11 09:39:38,874 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 09:41:56,379 epoch 9 - iter 260/2606 - loss 0.01668864 - time (sec): 137.50 - samples/sec: 278.59 - lr: 0.000034 - momentum: 0.000000
+ 2023-10-11 09:44:11,267 epoch 9 - iter 520/2606 - loss 0.01829687 - time (sec): 272.39 - samples/sec: 278.00 - lr: 0.000032 - momentum: 0.000000
+ 2023-10-11 09:46:23,543 epoch 9 - iter 780/2606 - loss 0.01610329 - time (sec): 404.67 - samples/sec: 275.21 - lr: 0.000030 - momentum: 0.000000
+ 2023-10-11 09:48:36,901 epoch 9 - iter 1040/2606 - loss 0.01572074 - time (sec): 538.02 - samples/sec: 271.27 - lr: 0.000029 - momentum: 0.000000
+ 2023-10-11 09:50:50,690 epoch 9 - iter 1300/2606 - loss 0.01553277 - time (sec): 671.81 - samples/sec: 272.97 - lr: 0.000027 - momentum: 0.000000
+ 2023-10-11 09:53:01,793 epoch 9 - iter 1560/2606 - loss 0.01493528 - time (sec): 802.92 - samples/sec: 272.20 - lr: 0.000025 - momentum: 0.000000
+ 2023-10-11 09:55:13,466 epoch 9 - iter 1820/2606 - loss 0.01487477 - time (sec): 934.59 - samples/sec: 273.39 - lr: 0.000023 - momentum: 0.000000
+ 2023-10-11 09:57:26,097 epoch 9 - iter 2080/2606 - loss 0.01451586 - time (sec): 1067.22 - samples/sec: 274.48 - lr: 0.000021 - momentum: 0.000000
+ 2023-10-11 09:59:36,968 epoch 9 - iter 2340/2606 - loss 0.01514862 - time (sec): 1198.09 - samples/sec: 275.73 - lr: 0.000020 - momentum: 0.000000
+ 2023-10-11 10:01:48,255 epoch 9 - iter 2600/2606 - loss 0.01524259 - time (sec): 1329.38 - samples/sec: 275.86 - lr: 0.000018 - momentum: 0.000000
+ 2023-10-11 10:01:51,153 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 10:01:51,154 EPOCH 9 done: loss 0.0152 - lr: 0.000018
+ 2023-10-11 10:02:30,205 DEV : loss 0.4856250286102295 - f1-score (micro avg) 0.3617
+ 2023-10-11 10:02:30,256 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 10:04:42,536 epoch 10 - iter 260/2606 - loss 0.01055031 - time (sec): 132.28 - samples/sec: 275.53 - lr: 0.000016 - momentum: 0.000000
+ 2023-10-11 10:06:53,359 epoch 10 - iter 520/2606 - loss 0.01088598 - time (sec): 263.10 - samples/sec: 273.07 - lr: 0.000014 - momentum: 0.000000
+ 2023-10-11 10:09:05,981 epoch 10 - iter 780/2606 - loss 0.01012233 - time (sec): 395.72 - samples/sec: 273.96 - lr: 0.000013 - momentum: 0.000000
+ 2023-10-11 10:11:17,473 epoch 10 - iter 1040/2606 - loss 0.00939882 - time (sec): 527.21 - samples/sec: 272.16 - lr: 0.000011 - momentum: 0.000000
+ 2023-10-11 10:13:30,498 epoch 10 - iter 1300/2606 - loss 0.00948977 - time (sec): 660.24 - samples/sec: 276.86 - lr: 0.000009 - momentum: 0.000000
+ 2023-10-11 10:15:39,670 epoch 10 - iter 1560/2606 - loss 0.00947705 - time (sec): 789.41 - samples/sec: 276.95 - lr: 0.000007 - momentum: 0.000000
+ 2023-10-11 10:17:49,663 epoch 10 - iter 1820/2606 - loss 0.01016429 - time (sec): 919.40 - samples/sec: 277.18 - lr: 0.000005 - momentum: 0.000000
+ 2023-10-11 10:19:59,020 epoch 10 - iter 2080/2606 - loss 0.01020035 - time (sec): 1048.76 - samples/sec: 276.58 - lr: 0.000004 - momentum: 0.000000
+ 2023-10-11 10:22:11,668 epoch 10 - iter 2340/2606 - loss 0.01061835 - time (sec): 1181.41 - samples/sec: 278.88 - lr: 0.000002 - momentum: 0.000000
+ 2023-10-11 10:24:23,077 epoch 10 - iter 2600/2606 - loss 0.01044635 - time (sec): 1312.82 - samples/sec: 279.05 - lr: 0.000000 - momentum: 0.000000
+ 2023-10-11 10:24:26,217 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 10:24:26,217 EPOCH 10 done: loss 0.0104 - lr: 0.000000
+ 2023-10-11 10:25:04,805 DEV : loss 0.4848763942718506 - f1-score (micro avg) 0.3673
+ 2023-10-11 10:25:05,729 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 10:25:05,732 Loading model from best epoch ...
+ 2023-10-11 10:25:09,728 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
+ 2023-10-11 10:26:46,831
+ Results:
+ - F-score (micro) 0.4614
+ - F-score (macro) 0.3091
+ - Accuracy 0.3043
+
+ By class:
+               precision    recall  f1-score   support
+
+          LOC     0.4851    0.5783    0.5276      1214
+          PER     0.4194    0.4765    0.4461       808
+          ORG     0.2620    0.2635    0.2627       353
+    HumanProd     0.0000    0.0000    0.0000        15
+
+    micro avg     0.4330    0.4937    0.4614      2390
+    macro avg     0.2916    0.3295    0.3091      2390
+ weighted avg     0.4269    0.4937    0.4576      2390
+
+ 2023-10-11 10:26:46,832 ----------------------------------------------------------------------------------------------------
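The averaged rows in the final test report follow from the per-class rows and can be checked by hand. The sketch below assumes the usual definitions (macro F1 as the unweighted mean of per-class F1, weighted F1 as the support-weighted mean, micro F1 as the harmonic mean of micro precision and recall). It works only from the rounded values printed in the log, so agreement is expected to about four decimal places.

```python
# Per-class rows copied from the report: (precision, recall, f1, support).
PER_CLASS = {
    "LOC": (0.4851, 0.5783, 0.5276, 1214),
    "PER": (0.4194, 0.4765, 0.4461, 808),
    "ORG": (0.2620, 0.2635, 0.2627, 353),
    "HumanProd": (0.0000, 0.0000, 0.0000, 15),
}

total_support = sum(row[3] for row in PER_CLASS.values())

# Macro F1: every class counts equally, so the zero for HumanProd
# (15 test entities) pulls the average down noticeably.
macro_f1 = sum(row[2] for row in PER_CLASS.values()) / len(PER_CLASS)

# Weighted F1: classes contribute in proportion to their support.
weighted_f1 = sum(row[2] * row[3] for row in PER_CLASS.values()) / total_support

# Micro recall: correctly found entities over all gold entities.
micro_recall = sum(row[1] * row[3] for row in PER_CLASS.values()) / total_support

# Micro F1 from the reported micro precision and recall.
micro_p, micro_r = 0.4330, 0.4937
micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)

print(f"support      {total_support}")
print(f"macro F1     {macro_f1:.4f}")
print(f"weighted F1  {weighted_f1:.4f}")
print(f"micro recall {micro_recall:.4f}")
print(f"micro F1     {micro_f1:.4f}")
```

Micro precision is not recomputed here because the number of predicted HumanProd spans cannot be recovered from a precision of 0.0000.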