stefan-it committed on
Commit
3cca4bf
1 Parent(s): 6ed9dfe

Upload folder using huggingface_hub

best-model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8b4daaf693e28c70332f5cdc2c27585cfc9d760a2d143808da380dd17865e2d6
+ size 870793839
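The three added lines are a Git LFS pointer, not the model weights themselves: a spec version, a SHA-256 object ID, and the size in bytes of the real file. As a minimal sketch, assuming only the `key value` layout visible above, such a pointer can be read like this (`parse_lfs_pointer` is a hypothetical helper name, not part of any library):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its fields (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer content copied from the best-model.pt diff above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:8b4daaf693e28c70332f5cdc2c27585cfc9d760a2d143808da380dd17865e2d6
size 870793839
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size_bytes"])
```

The size field shows that `best-model.pt` is roughly 870 MB; the checkpoint itself is stored in LFS and only this pointer is tracked in the Git history.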
dev.tsv ADDED
The diff for this file is too large to render. See raw diff
 
final-model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bd40a4ee2ecc7f9d4900fe4487585346ceacaa208ba354371071e01e9b24aa2c
+ size 870793956
loss.tsv ADDED
@@ -0,0 +1,11 @@
+ EPOCH TIMESTAMP LEARNING_RATE TRAIN_LOSS DEV_LOSS DEV_PRECISION DEV_RECALL DEV_F1 DEV_ACCURACY
+ 1 12:57:33 0.0001 1.1237 0.2241 0.5000 0.0010 0.0021 0.0010
+ 2 13:04:25 0.0001 0.1363 0.1259 0.7870 0.6260 0.6974 0.5509
+ 3 13:11:24 0.0001 0.0803 0.0913 0.8357 0.7831 0.8085 0.6910
+ 4 13:18:21 0.0001 0.0537 0.0855 0.8890 0.7862 0.8344 0.7234
+ 5 13:25:42 0.0001 0.0355 0.0984 0.8839 0.7789 0.8281 0.7160
+ 6 13:32:38 0.0001 0.0261 0.0910 0.8746 0.8357 0.8547 0.7568
+ 7 13:39:26 0.0001 0.0209 0.1229 0.8893 0.7965 0.8403 0.7343
+ 8 13:46:31 0.0000 0.0162 0.1279 0.8855 0.8068 0.8443 0.7410
+ 9 13:53:49 0.0000 0.0136 0.1361 0.8861 0.8037 0.8429 0.7374
+ 10 14:01:12 0.0000 0.0113 0.1389 0.8890 0.8027 0.8436 0.7386
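The per-epoch table can be used to find the epoch that the best checkpoint corresponds to. The following is a minimal sketch that embeds the rows shown above as a string and splits on whitespace; reading the actual `loss.tsv` from disk and its exact delimiter are left as assumptions. Note that the `LEARNING_RATE` column is rounded to four decimals, so `0.0000` in epochs 8 to 10 is a rounding artifact rather than a zero learning rate.

```python
# Rows copied from the loss.tsv diff above; whitespace-separated for simplicity.
LOSS_TSV = """\
EPOCH TIMESTAMP LEARNING_RATE TRAIN_LOSS DEV_LOSS DEV_PRECISION DEV_RECALL DEV_F1 DEV_ACCURACY
1 12:57:33 0.0001 1.1237 0.2241 0.5000 0.0010 0.0021 0.0010
2 13:04:25 0.0001 0.1363 0.1259 0.7870 0.6260 0.6974 0.5509
3 13:11:24 0.0001 0.0803 0.0913 0.8357 0.7831 0.8085 0.6910
4 13:18:21 0.0001 0.0537 0.0855 0.8890 0.7862 0.8344 0.7234
5 13:25:42 0.0001 0.0355 0.0984 0.8839 0.7789 0.8281 0.7160
6 13:32:38 0.0001 0.0261 0.0910 0.8746 0.8357 0.8547 0.7568
7 13:39:26 0.0001 0.0209 0.1229 0.8893 0.7965 0.8403 0.7343
8 13:46:31 0.0000 0.0162 0.1279 0.8855 0.8068 0.8443 0.7410
9 13:53:49 0.0000 0.0136 0.1361 0.8861 0.8037 0.8429 0.7374
10 14:01:12 0.0000 0.0113 0.1389 0.8890 0.8027 0.8436 0.7386
"""


def read_rows(text: str) -> list[dict]:
    """Turn the header line and data lines into a list of dicts."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


rows = read_rows(LOSS_TSV)

# Model selection in this run uses micro-averaged F1 on the dev set.
best = max(rows, key=lambda r: float(r["DEV_F1"]))
# For comparison: the epoch with the lowest dev loss is a different one.
lowest_dev_loss = min(rows, key=lambda r: float(r["DEV_LOSS"]))

print("best DEV_F1:", best["EPOCH"], best["DEV_F1"])
print("lowest DEV_LOSS:", lowest_dev_loss["EPOCH"], lowest_dev_loss["DEV_LOSS"])
```

Dev F1 peaks at epoch 6 (0.8547) while dev loss is lowest at epoch 4 (0.0855) and train loss keeps falling, which suggests the later epochs mostly overfit; this is why `best-model.pt` and `final-model.pt` are different files.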
runs/events.out.tfevents.1697115033.c8b2203b18a8.2408.4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6b1e33a8811ad30ce46aa8470ce88a946c2cd4e4c3249910d838cc181558d3c9
+ size 407048
test.tsv ADDED
The diff for this file is too large to render. See raw diff
 
training.log ADDED
@@ -0,0 +1,262 @@
+ 2023-10-12 12:50:33,877 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 12:50:33,879 Model: "SequenceTagger(
+   (embeddings): ByT5Embeddings(
+     (model): T5EncoderModel(
+       (shared): Embedding(384, 1472)
+       (encoder): T5Stack(
+         (embed_tokens): Embedding(384, 1472)
+         (block): ModuleList(
+           (0): T5Block(
+             (layer): ModuleList(
+               (0): T5LayerSelfAttention(
+                 (SelfAttention): T5Attention(
+                   (q): Linear(in_features=1472, out_features=384, bias=False)
+                   (k): Linear(in_features=1472, out_features=384, bias=False)
+                   (v): Linear(in_features=1472, out_features=384, bias=False)
+                   (o): Linear(in_features=384, out_features=1472, bias=False)
+                   (relative_attention_bias): Embedding(32, 6)
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (1): T5LayerFF(
+                 (DenseReluDense): T5DenseGatedActDense(
+                   (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wo): Linear(in_features=3584, out_features=1472, bias=False)
+                   (dropout): Dropout(p=0.1, inplace=False)
+                   (act): NewGELUActivation()
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+           )
+           (1-11): 11 x T5Block(
+             (layer): ModuleList(
+               (0): T5LayerSelfAttention(
+                 (SelfAttention): T5Attention(
+                   (q): Linear(in_features=1472, out_features=384, bias=False)
+                   (k): Linear(in_features=1472, out_features=384, bias=False)
+                   (v): Linear(in_features=1472, out_features=384, bias=False)
+                   (o): Linear(in_features=384, out_features=1472, bias=False)
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (1): T5LayerFF(
+                 (DenseReluDense): T5DenseGatedActDense(
+                   (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wo): Linear(in_features=3584, out_features=1472, bias=False)
+                   (dropout): Dropout(p=0.1, inplace=False)
+                   (act): NewGELUActivation()
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+           )
+         )
+         (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+         (dropout): Dropout(p=0.1, inplace=False)
+       )
+     )
+   )
+   (locked_dropout): LockedDropout(p=0.5)
+   (linear): Linear(in_features=1472, out_features=13, bias=True)
+   (loss_function): CrossEntropyLoss()
+ )"
+ 2023-10-12 12:50:33,879 ----------------------------------------------------------------------------------------------------
71
+ 2023-10-12 12:50:33,880 MultiCorpus: 5777 train + 722 dev + 723 test sentences
72
+ - NER_ICDAR_EUROPEANA Corpus: 5777 train + 722 dev + 723 test sentences - /root/.flair/datasets/ner_icdar_europeana/nl
73
+ 2023-10-12 12:50:33,880 ----------------------------------------------------------------------------------------------------
74
+ 2023-10-12 12:50:33,880 Train: 5777 sentences
75
+ 2023-10-12 12:50:33,880 (train_with_dev=False, train_with_test=False)
76
+ 2023-10-12 12:50:33,880 ----------------------------------------------------------------------------------------------------
77
+ 2023-10-12 12:50:33,880 Training Params:
78
+ 2023-10-12 12:50:33,880 - learning_rate: "0.00015"
79
+ 2023-10-12 12:50:33,880 - mini_batch_size: "8"
80
+ 2023-10-12 12:50:33,880 - max_epochs: "10"
81
+ 2023-10-12 12:50:33,880 - shuffle: "True"
82
+ 2023-10-12 12:50:33,880 ----------------------------------------------------------------------------------------------------
83
+ 2023-10-12 12:50:33,880 Plugins:
84
+ 2023-10-12 12:50:33,881 - TensorboardLogger
85
+ 2023-10-12 12:50:33,881 - LinearScheduler | warmup_fraction: '0.1'
86
+ 2023-10-12 12:50:33,881 ----------------------------------------------------------------------------------------------------
87
+ 2023-10-12 12:50:33,881 Final evaluation on model from best epoch (best-model.pt)
88
+ 2023-10-12 12:50:33,881 - metric: "('micro avg', 'f1-score')"
89
+ 2023-10-12 12:50:33,881 ----------------------------------------------------------------------------------------------------
90
+ 2023-10-12 12:50:33,881 Computation:
91
+ 2023-10-12 12:50:33,881 - compute on device: cuda:0
92
+ 2023-10-12 12:50:33,881 - embedding storage: none
93
+ 2023-10-12 12:50:33,881 ----------------------------------------------------------------------------------------------------
94
+ 2023-10-12 12:50:33,881 Model training base path: "hmbench-icdar/nl-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-4"
95
+ 2023-10-12 12:50:33,881 ----------------------------------------------------------------------------------------------------
96
+ 2023-10-12 12:50:33,881 ----------------------------------------------------------------------------------------------------
97
+ 2023-10-12 12:50:33,882 Logging anything other than scalars to TensorBoard is currently not supported.
98
+ 2023-10-12 12:51:13,856 epoch 1 - iter 72/723 - loss 2.57210097 - time (sec): 39.97 - samples/sec: 446.88 - lr: 0.000015 - momentum: 0.000000
99
+ 2023-10-12 12:51:55,334 epoch 1 - iter 144/723 - loss 2.50368577 - time (sec): 81.45 - samples/sec: 443.12 - lr: 0.000030 - momentum: 0.000000
100
+ 2023-10-12 12:52:35,583 epoch 1 - iter 216/723 - loss 2.34479028 - time (sec): 121.70 - samples/sec: 436.36 - lr: 0.000045 - momentum: 0.000000
101
+ 2023-10-12 12:53:16,570 epoch 1 - iter 288/723 - loss 2.14084440 - time (sec): 162.69 - samples/sec: 434.66 - lr: 0.000060 - momentum: 0.000000
102
+ 2023-10-12 12:53:56,024 epoch 1 - iter 360/723 - loss 1.92004853 - time (sec): 202.14 - samples/sec: 437.13 - lr: 0.000074 - momentum: 0.000000
103
+ 2023-10-12 12:54:36,420 epoch 1 - iter 432/723 - loss 1.71012873 - time (sec): 242.54 - samples/sec: 433.77 - lr: 0.000089 - momentum: 0.000000
104
+ 2023-10-12 12:55:15,712 epoch 1 - iter 504/723 - loss 1.51773573 - time (sec): 281.83 - samples/sec: 434.20 - lr: 0.000104 - momentum: 0.000000
105
+ 2023-10-12 12:55:55,929 epoch 1 - iter 576/723 - loss 1.35508499 - time (sec): 322.05 - samples/sec: 436.31 - lr: 0.000119 - momentum: 0.000000
106
+ 2023-10-12 12:56:35,029 epoch 1 - iter 648/723 - loss 1.22422963 - time (sec): 361.15 - samples/sec: 439.92 - lr: 0.000134 - momentum: 0.000000
107
+ 2023-10-12 12:57:12,712 epoch 1 - iter 720/723 - loss 1.12678212 - time (sec): 398.83 - samples/sec: 440.47 - lr: 0.000149 - momentum: 0.000000
108
+ 2023-10-12 12:57:13,890 ----------------------------------------------------------------------------------------------------
109
+ 2023-10-12 12:57:13,891 EPOCH 1 done: loss 1.1237 - lr: 0.000149
110
+ 2023-10-12 12:57:33,815 DEV : loss 0.22408561408519745 - f1-score (micro avg) 0.0021
111
+ 2023-10-12 12:57:33,845 saving best model
112
+ 2023-10-12 12:57:34,699 ----------------------------------------------------------------------------------------------------
113
+ 2023-10-12 12:58:12,792 epoch 2 - iter 72/723 - loss 0.16692332 - time (sec): 38.09 - samples/sec: 465.66 - lr: 0.000148 - momentum: 0.000000
114
+ 2023-10-12 12:58:50,804 epoch 2 - iter 144/723 - loss 0.17013500 - time (sec): 76.10 - samples/sec: 458.04 - lr: 0.000147 - momentum: 0.000000
115
+ 2023-10-12 12:59:29,373 epoch 2 - iter 216/723 - loss 0.16696133 - time (sec): 114.67 - samples/sec: 455.25 - lr: 0.000145 - momentum: 0.000000
116
+ 2023-10-12 13:00:06,773 epoch 2 - iter 288/723 - loss 0.15925242 - time (sec): 152.07 - samples/sec: 456.96 - lr: 0.000143 - momentum: 0.000000
117
+ 2023-10-12 13:00:44,163 epoch 2 - iter 360/723 - loss 0.15257157 - time (sec): 189.46 - samples/sec: 453.07 - lr: 0.000142 - momentum: 0.000000
118
+ 2023-10-12 13:01:23,135 epoch 2 - iter 432/723 - loss 0.14873337 - time (sec): 228.43 - samples/sec: 453.50 - lr: 0.000140 - momentum: 0.000000
119
+ 2023-10-12 13:02:02,355 epoch 2 - iter 504/723 - loss 0.14684350 - time (sec): 267.65 - samples/sec: 455.16 - lr: 0.000138 - momentum: 0.000000
120
+ 2023-10-12 13:02:43,047 epoch 2 - iter 576/723 - loss 0.14308762 - time (sec): 308.35 - samples/sec: 453.78 - lr: 0.000137 - momentum: 0.000000
121
+ 2023-10-12 13:03:23,828 epoch 2 - iter 648/723 - loss 0.13860972 - time (sec): 349.13 - samples/sec: 452.24 - lr: 0.000135 - momentum: 0.000000
122
+ 2023-10-12 13:04:03,474 epoch 2 - iter 720/723 - loss 0.13666210 - time (sec): 388.77 - samples/sec: 451.37 - lr: 0.000133 - momentum: 0.000000
123
+ 2023-10-12 13:04:04,944 ----------------------------------------------------------------------------------------------------
124
+ 2023-10-12 13:04:04,945 EPOCH 2 done: loss 0.1363 - lr: 0.000133
125
+ 2023-10-12 13:04:25,559 DEV : loss 0.125865638256073 - f1-score (micro avg) 0.6974
126
+ 2023-10-12 13:04:25,588 saving best model
127
+ 2023-10-12 13:04:28,483 ----------------------------------------------------------------------------------------------------
128
+ 2023-10-12 13:05:06,593 epoch 3 - iter 72/723 - loss 0.10614972 - time (sec): 38.11 - samples/sec: 441.85 - lr: 0.000132 - momentum: 0.000000
129
+ 2023-10-12 13:05:45,790 epoch 3 - iter 144/723 - loss 0.09543565 - time (sec): 77.30 - samples/sec: 448.25 - lr: 0.000130 - momentum: 0.000000
130
+ 2023-10-12 13:06:24,554 epoch 3 - iter 216/723 - loss 0.09411193 - time (sec): 116.07 - samples/sec: 445.86 - lr: 0.000128 - momentum: 0.000000
131
+ 2023-10-12 13:07:02,711 epoch 3 - iter 288/723 - loss 0.09004094 - time (sec): 154.22 - samples/sec: 447.22 - lr: 0.000127 - momentum: 0.000000
132
+ 2023-10-12 13:07:42,238 epoch 3 - iter 360/723 - loss 0.08961960 - time (sec): 193.75 - samples/sec: 446.27 - lr: 0.000125 - momentum: 0.000000
133
+ 2023-10-12 13:08:22,102 epoch 3 - iter 432/723 - loss 0.08897250 - time (sec): 233.61 - samples/sec: 451.71 - lr: 0.000123 - momentum: 0.000000
134
+ 2023-10-12 13:09:02,522 epoch 3 - iter 504/723 - loss 0.08525340 - time (sec): 274.03 - samples/sec: 450.43 - lr: 0.000122 - momentum: 0.000000
135
+ 2023-10-12 13:09:41,313 epoch 3 - iter 576/723 - loss 0.08340939 - time (sec): 312.83 - samples/sec: 449.52 - lr: 0.000120 - momentum: 0.000000
136
+ 2023-10-12 13:10:21,839 epoch 3 - iter 648/723 - loss 0.08185803 - time (sec): 353.35 - samples/sec: 446.67 - lr: 0.000118 - momentum: 0.000000
137
+ 2023-10-12 13:11:01,410 epoch 3 - iter 720/723 - loss 0.08020714 - time (sec): 392.92 - samples/sec: 447.11 - lr: 0.000117 - momentum: 0.000000
138
+ 2023-10-12 13:11:02,572 ----------------------------------------------------------------------------------------------------
139
+ 2023-10-12 13:11:02,573 EPOCH 3 done: loss 0.0803 - lr: 0.000117
140
+ 2023-10-12 13:11:24,304 DEV : loss 0.09134244173765182 - f1-score (micro avg) 0.8085
141
+ 2023-10-12 13:11:24,336 saving best model
142
+ 2023-10-12 13:11:26,940 ----------------------------------------------------------------------------------------------------
143
+ 2023-10-12 13:12:08,816 epoch 4 - iter 72/723 - loss 0.04375890 - time (sec): 41.87 - samples/sec: 451.41 - lr: 0.000115 - momentum: 0.000000
144
+ 2023-10-12 13:12:45,861 epoch 4 - iter 144/723 - loss 0.04780814 - time (sec): 78.92 - samples/sec: 443.89 - lr: 0.000113 - momentum: 0.000000
145
+ 2023-10-12 13:13:26,615 epoch 4 - iter 216/723 - loss 0.04927505 - time (sec): 119.67 - samples/sec: 431.85 - lr: 0.000112 - momentum: 0.000000
146
+ 2023-10-12 13:14:05,422 epoch 4 - iter 288/723 - loss 0.05251675 - time (sec): 158.48 - samples/sec: 435.09 - lr: 0.000110 - momentum: 0.000000
147
+ 2023-10-12 13:14:42,756 epoch 4 - iter 360/723 - loss 0.05197170 - time (sec): 195.81 - samples/sec: 439.06 - lr: 0.000108 - momentum: 0.000000
148
+ 2023-10-12 13:15:22,534 epoch 4 - iter 432/723 - loss 0.05444043 - time (sec): 235.59 - samples/sec: 441.72 - lr: 0.000107 - momentum: 0.000000
149
+ 2023-10-12 13:16:02,299 epoch 4 - iter 504/723 - loss 0.05358124 - time (sec): 275.36 - samples/sec: 447.12 - lr: 0.000105 - momentum: 0.000000
150
+ 2023-10-12 13:16:40,164 epoch 4 - iter 576/723 - loss 0.05390487 - time (sec): 313.22 - samples/sec: 447.55 - lr: 0.000103 - momentum: 0.000000
151
+ 2023-10-12 13:17:20,651 epoch 4 - iter 648/723 - loss 0.05448671 - time (sec): 353.71 - samples/sec: 445.79 - lr: 0.000102 - momentum: 0.000000
152
+ 2023-10-12 13:17:59,869 epoch 4 - iter 720/723 - loss 0.05377454 - time (sec): 392.93 - samples/sec: 447.28 - lr: 0.000100 - momentum: 0.000000
153
+ 2023-10-12 13:18:00,975 ----------------------------------------------------------------------------------------------------
154
+ 2023-10-12 13:18:00,975 EPOCH 4 done: loss 0.0537 - lr: 0.000100
155
+ 2023-10-12 13:18:21,723 DEV : loss 0.08551333099603653 - f1-score (micro avg) 0.8344
156
+ 2023-10-12 13:18:21,755 saving best model
157
+ 2023-10-12 13:18:22,733 ----------------------------------------------------------------------------------------------------
158
+ 2023-10-12 13:19:01,088 epoch 5 - iter 72/723 - loss 0.02887265 - time (sec): 38.35 - samples/sec: 442.57 - lr: 0.000098 - momentum: 0.000000
159
+ 2023-10-12 13:19:42,050 epoch 5 - iter 144/723 - loss 0.03460162 - time (sec): 79.32 - samples/sec: 433.60 - lr: 0.000097 - momentum: 0.000000
160
+ 2023-10-12 13:20:22,461 epoch 5 - iter 216/723 - loss 0.03491227 - time (sec): 119.73 - samples/sec: 435.34 - lr: 0.000095 - momentum: 0.000000
161
+ 2023-10-12 13:21:05,047 epoch 5 - iter 288/723 - loss 0.03630309 - time (sec): 162.31 - samples/sec: 434.65 - lr: 0.000093 - momentum: 0.000000
162
+ 2023-10-12 13:21:48,859 epoch 5 - iter 360/723 - loss 0.03649219 - time (sec): 206.12 - samples/sec: 427.40 - lr: 0.000092 - momentum: 0.000000
163
+ 2023-10-12 13:22:31,786 epoch 5 - iter 432/723 - loss 0.03618346 - time (sec): 249.05 - samples/sec: 424.25 - lr: 0.000090 - momentum: 0.000000
164
+ 2023-10-12 13:23:12,902 epoch 5 - iter 504/723 - loss 0.03519474 - time (sec): 290.17 - samples/sec: 420.92 - lr: 0.000088 - momentum: 0.000000
165
+ 2023-10-12 13:23:54,082 epoch 5 - iter 576/723 - loss 0.03451399 - time (sec): 331.35 - samples/sec: 422.25 - lr: 0.000087 - momentum: 0.000000
166
+ 2023-10-12 13:24:35,954 epoch 5 - iter 648/723 - loss 0.03511239 - time (sec): 373.22 - samples/sec: 422.11 - lr: 0.000085 - momentum: 0.000000
167
+ 2023-10-12 13:25:18,788 epoch 5 - iter 720/723 - loss 0.03557155 - time (sec): 416.05 - samples/sec: 422.28 - lr: 0.000083 - momentum: 0.000000
168
+ 2023-10-12 13:25:20,108 ----------------------------------------------------------------------------------------------------
169
+ 2023-10-12 13:25:20,108 EPOCH 5 done: loss 0.0355 - lr: 0.000083
170
+ 2023-10-12 13:25:42,303 DEV : loss 0.09844549000263214 - f1-score (micro avg) 0.8281
171
+ 2023-10-12 13:25:42,335 ----------------------------------------------------------------------------------------------------
172
+ 2023-10-12 13:26:24,818 epoch 6 - iter 72/723 - loss 0.03450655 - time (sec): 42.48 - samples/sec: 423.14 - lr: 0.000082 - momentum: 0.000000
173
+ 2023-10-12 13:27:04,856 epoch 6 - iter 144/723 - loss 0.03035545 - time (sec): 82.52 - samples/sec: 423.88 - lr: 0.000080 - momentum: 0.000000
174
+ 2023-10-12 13:27:43,676 epoch 6 - iter 216/723 - loss 0.02931547 - time (sec): 121.34 - samples/sec: 436.00 - lr: 0.000078 - momentum: 0.000000
175
+ 2023-10-12 13:28:23,185 epoch 6 - iter 288/723 - loss 0.02672535 - time (sec): 160.85 - samples/sec: 445.88 - lr: 0.000077 - momentum: 0.000000
176
+ 2023-10-12 13:29:00,446 epoch 6 - iter 360/723 - loss 0.02600815 - time (sec): 198.11 - samples/sec: 441.33 - lr: 0.000075 - momentum: 0.000000
177
+ 2023-10-12 13:29:40,823 epoch 6 - iter 432/723 - loss 0.02605262 - time (sec): 238.49 - samples/sec: 447.85 - lr: 0.000073 - momentum: 0.000000
178
+ 2023-10-12 13:30:19,459 epoch 6 - iter 504/723 - loss 0.02591407 - time (sec): 277.12 - samples/sec: 446.06 - lr: 0.000072 - momentum: 0.000000
179
+ 2023-10-12 13:30:58,816 epoch 6 - iter 576/723 - loss 0.02584013 - time (sec): 316.48 - samples/sec: 446.29 - lr: 0.000070 - momentum: 0.000000
180
+ 2023-10-12 13:31:38,066 epoch 6 - iter 648/723 - loss 0.02608188 - time (sec): 355.73 - samples/sec: 446.29 - lr: 0.000068 - momentum: 0.000000
181
+ 2023-10-12 13:32:16,080 epoch 6 - iter 720/723 - loss 0.02616533 - time (sec): 393.74 - samples/sec: 446.09 - lr: 0.000067 - momentum: 0.000000
182
+ 2023-10-12 13:32:17,295 ----------------------------------------------------------------------------------------------------
183
+ 2023-10-12 13:32:17,296 EPOCH 6 done: loss 0.0261 - lr: 0.000067
184
+ 2023-10-12 13:32:38,779 DEV : loss 0.0909653976559639 - f1-score (micro avg) 0.8547
185
+ 2023-10-12 13:32:38,811 saving best model
186
+ 2023-10-12 13:32:41,417 ----------------------------------------------------------------------------------------------------
187
+ 2023-10-12 13:33:18,878 epoch 7 - iter 72/723 - loss 0.02548483 - time (sec): 37.46 - samples/sec: 446.16 - lr: 0.000065 - momentum: 0.000000
188
+ 2023-10-12 13:33:58,757 epoch 7 - iter 144/723 - loss 0.02514500 - time (sec): 77.34 - samples/sec: 462.95 - lr: 0.000063 - momentum: 0.000000
189
+ 2023-10-12 13:34:37,377 epoch 7 - iter 216/723 - loss 0.02611835 - time (sec): 115.96 - samples/sec: 462.23 - lr: 0.000062 - momentum: 0.000000
190
+ 2023-10-12 13:35:15,889 epoch 7 - iter 288/723 - loss 0.02415853 - time (sec): 154.47 - samples/sec: 459.97 - lr: 0.000060 - momentum: 0.000000
191
+ 2023-10-12 13:35:54,789 epoch 7 - iter 360/723 - loss 0.02340115 - time (sec): 193.37 - samples/sec: 458.91 - lr: 0.000058 - momentum: 0.000000
192
+ 2023-10-12 13:36:34,765 epoch 7 - iter 432/723 - loss 0.02218556 - time (sec): 233.34 - samples/sec: 458.27 - lr: 0.000057 - momentum: 0.000000
193
+ 2023-10-12 13:37:13,961 epoch 7 - iter 504/723 - loss 0.02199272 - time (sec): 272.54 - samples/sec: 459.46 - lr: 0.000055 - momentum: 0.000000
194
+ 2023-10-12 13:37:51,396 epoch 7 - iter 576/723 - loss 0.02149862 - time (sec): 309.98 - samples/sec: 459.97 - lr: 0.000053 - momentum: 0.000000
195
+ 2023-10-12 13:38:28,567 epoch 7 - iter 648/723 - loss 0.02110408 - time (sec): 347.15 - samples/sec: 457.63 - lr: 0.000052 - momentum: 0.000000
196
+ 2023-10-12 13:39:04,853 epoch 7 - iter 720/723 - loss 0.02089846 - time (sec): 383.43 - samples/sec: 458.45 - lr: 0.000050 - momentum: 0.000000
197
+ 2023-10-12 13:39:05,894 ----------------------------------------------------------------------------------------------------
198
+ 2023-10-12 13:39:05,895 EPOCH 7 done: loss 0.0209 - lr: 0.000050
199
+ 2023-10-12 13:39:26,497 DEV : loss 0.12286769598722458 - f1-score (micro avg) 0.8403
200
+ 2023-10-12 13:39:26,530 ----------------------------------------------------------------------------------------------------
201
+ 2023-10-12 13:40:05,852 epoch 8 - iter 72/723 - loss 0.01313140 - time (sec): 39.32 - samples/sec: 454.78 - lr: 0.000048 - momentum: 0.000000
202
+ 2023-10-12 13:40:45,572 epoch 8 - iter 144/723 - loss 0.01250697 - time (sec): 79.04 - samples/sec: 454.78 - lr: 0.000047 - momentum: 0.000000
203
+ 2023-10-12 13:41:25,089 epoch 8 - iter 216/723 - loss 0.01246682 - time (sec): 118.56 - samples/sec: 451.77 - lr: 0.000045 - momentum: 0.000000
204
+ 2023-10-12 13:42:05,028 epoch 8 - iter 288/723 - loss 0.01132435 - time (sec): 158.50 - samples/sec: 455.71 - lr: 0.000043 - momentum: 0.000000
205
+ 2023-10-12 13:42:42,164 epoch 8 - iter 360/723 - loss 0.01353114 - time (sec): 195.63 - samples/sec: 450.26 - lr: 0.000042 - momentum: 0.000000
206
+ 2023-10-12 13:43:21,974 epoch 8 - iter 432/723 - loss 0.01342207 - time (sec): 235.44 - samples/sec: 446.73 - lr: 0.000040 - momentum: 0.000000
207
+ 2023-10-12 13:44:01,853 epoch 8 - iter 504/723 - loss 0.01495552 - time (sec): 275.32 - samples/sec: 444.93 - lr: 0.000038 - momentum: 0.000000
208
+ 2023-10-12 13:44:44,572 epoch 8 - iter 576/723 - loss 0.01542540 - time (sec): 318.04 - samples/sec: 441.01 - lr: 0.000037 - momentum: 0.000000
209
+ 2023-10-12 13:45:26,594 epoch 8 - iter 648/723 - loss 0.01508947 - time (sec): 360.06 - samples/sec: 438.48 - lr: 0.000035 - momentum: 0.000000
210
+ 2023-10-12 13:46:08,421 epoch 8 - iter 720/723 - loss 0.01601048 - time (sec): 401.89 - samples/sec: 437.43 - lr: 0.000033 - momentum: 0.000000
211
+ 2023-10-12 13:46:09,570 ----------------------------------------------------------------------------------------------------
212
+ 2023-10-12 13:46:09,571 EPOCH 8 done: loss 0.0162 - lr: 0.000033
213
+ 2023-10-12 13:46:31,884 DEV : loss 0.12785491347312927 - f1-score (micro avg) 0.8443
214
+ 2023-10-12 13:46:31,946 ----------------------------------------------------------------------------------------------------
215
+ 2023-10-12 13:47:15,224 epoch 9 - iter 72/723 - loss 0.01226192 - time (sec): 43.28 - samples/sec: 424.75 - lr: 0.000032 - momentum: 0.000000
216
+ 2023-10-12 13:47:57,051 epoch 9 - iter 144/723 - loss 0.01230280 - time (sec): 85.10 - samples/sec: 413.44 - lr: 0.000030 - momentum: 0.000000
217
+ 2023-10-12 13:48:39,683 epoch 9 - iter 216/723 - loss 0.01384980 - time (sec): 127.73 - samples/sec: 408.83 - lr: 0.000028 - momentum: 0.000000
218
+ 2023-10-12 13:49:21,709 epoch 9 - iter 288/723 - loss 0.01248849 - time (sec): 169.76 - samples/sec: 410.18 - lr: 0.000027 - momentum: 0.000000
219
+ 2023-10-12 13:50:02,274 epoch 9 - iter 360/723 - loss 0.01133267 - time (sec): 210.32 - samples/sec: 418.21 - lr: 0.000025 - momentum: 0.000000
220
+ 2023-10-12 13:50:44,723 epoch 9 - iter 432/723 - loss 0.01131670 - time (sec): 252.77 - samples/sec: 417.56 - lr: 0.000023 - momentum: 0.000000
221
+ 2023-10-12 13:51:25,911 epoch 9 - iter 504/723 - loss 0.01149792 - time (sec): 293.96 - samples/sec: 421.43 - lr: 0.000022 - momentum: 0.000000
222
+ 2023-10-12 13:52:06,053 epoch 9 - iter 576/723 - loss 0.01200648 - time (sec): 334.10 - samples/sec: 422.14 - lr: 0.000020 - momentum: 0.000000
223
+ 2023-10-12 13:52:44,347 epoch 9 - iter 648/723 - loss 0.01233164 - time (sec): 372.40 - samples/sec: 422.87 - lr: 0.000018 - momentum: 0.000000
224
+ 2023-10-12 13:53:25,270 epoch 9 - iter 720/723 - loss 0.01241683 - time (sec): 413.32 - samples/sec: 423.69 - lr: 0.000017 - momentum: 0.000000
225
+ 2023-10-12 13:53:27,201 ----------------------------------------------------------------------------------------------------
226
+ 2023-10-12 13:53:27,202 EPOCH 9 done: loss 0.0136 - lr: 0.000017
227
+ 2023-10-12 13:53:49,325 DEV : loss 0.13609179854393005 - f1-score (micro avg) 0.8429
228
+ 2023-10-12 13:53:49,364 ----------------------------------------------------------------------------------------------------
229
+ 2023-10-12 13:54:34,158 epoch 10 - iter 72/723 - loss 0.02070022 - time (sec): 44.79 - samples/sec: 420.68 - lr: 0.000015 - momentum: 0.000000
230
+ 2023-10-12 13:55:14,534 epoch 10 - iter 144/723 - loss 0.01572521 - time (sec): 85.17 - samples/sec: 427.85 - lr: 0.000013 - momentum: 0.000000
231
+ 2023-10-12 13:55:57,299 epoch 10 - iter 216/723 - loss 0.01436480 - time (sec): 127.93 - samples/sec: 418.71 - lr: 0.000012 - momentum: 0.000000
232
+ 2023-10-12 13:56:38,887 epoch 10 - iter 288/723 - loss 0.01335702 - time (sec): 169.52 - samples/sec: 421.14 - lr: 0.000010 - momentum: 0.000000
233
+ 2023-10-12 13:57:20,435 epoch 10 - iter 360/723 - loss 0.01325818 - time (sec): 211.07 - samples/sec: 422.83 - lr: 0.000008 - momentum: 0.000000
234
+ 2023-10-12 13:58:02,750 epoch 10 - iter 432/723 - loss 0.01235929 - time (sec): 253.38 - samples/sec: 425.90 - lr: 0.000007 - momentum: 0.000000
235
+ 2023-10-12 13:58:43,798 epoch 10 - iter 504/723 - loss 0.01217336 - time (sec): 294.43 - samples/sec: 420.51 - lr: 0.000005 - momentum: 0.000000
236
+ 2023-10-12 13:59:25,942 epoch 10 - iter 576/723 - loss 0.01172104 - time (sec): 336.58 - samples/sec: 422.72 - lr: 0.000003 - momentum: 0.000000
237
+ 2023-10-12 14:00:06,168 epoch 10 - iter 648/723 - loss 0.01125967 - time (sec): 376.80 - samples/sec: 421.48 - lr: 0.000002 - momentum: 0.000000
238
+ 2023-10-12 14:00:47,810 epoch 10 - iter 720/723 - loss 0.01135370 - time (sec): 418.44 - samples/sec: 420.11 - lr: 0.000000 - momentum: 0.000000
239
+ 2023-10-12 14:00:48,941 ----------------------------------------------------------------------------------------------------
240
+ 2023-10-12 14:00:48,941 EPOCH 10 done: loss 0.0113 - lr: 0.000000
241
+ 2023-10-12 14:01:12,472 DEV : loss 0.13888543844223022 - f1-score (micro avg) 0.8436
242
+ 2023-10-12 14:01:13,512 ----------------------------------------------------------------------------------------------------
243
+ 2023-10-12 14:01:13,514 Loading model from best epoch ...
244
+ 2023-10-12 14:01:17,667 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG
245
+ 2023-10-12 14:01:40,330
246
+ Results:
247
+ - F-score (micro) 0.8564
248
+ - F-score (macro) 0.7697
249
+ - Accuracy 0.7601
250
+
251
+ By class:
252
+ precision recall f1-score support
253
+
254
+ PER 0.8566 0.8672 0.8619 482
255
+ LOC 0.8937 0.8996 0.8966 458
256
+ ORG 0.5507 0.5507 0.5507 69
257
+
258
+ micro avg 0.8527 0.8603 0.8564 1009
259
+ macro avg 0.7670 0.7725 0.7697 1009
260
+ weighted avg 0.8525 0.8603 0.8564 1009
261
+
262
+ 2023-10-12 14:01:40,330 ----------------------------------------------------------------------------------------------------
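The `lr` values in the log rise linearly to the configured `learning_rate` of 0.00015 during epoch 1 and then fall linearly to zero by the end of epoch 10, which is consistent with the `LinearScheduler` plugin and `warmup_fraction: '0.1'` (723 iterations per epoch over 10 epochs gives 7230 steps, of which the first 723 are warmup). The sketch below reproduces that shape; it is an approximation inferred from the logged numbers, not the scheduler's actual implementation, and `linear_warmup_lr` is a hypothetical name.

```python
def linear_warmup_lr(
    step: int,
    peak_lr: float = 0.00015,      # learning_rate from the training params
    total_steps: int = 723 * 10,   # iterations per epoch x max_epochs
    warmup_fraction: float = 0.1,  # from the LinearScheduler plugin line
) -> float:
    """Linear warmup to peak_lr, then linear decay to zero (assumed shape)."""
    warmup_steps = round(total_steps * warmup_fraction)
    if step < warmup_steps:
        # Warmup phase: scale up proportionally to the step count.
        return peak_lr * step / warmup_steps
    # Decay phase: scale down proportionally to the remaining steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


# Compare against a few logged values (global step = (epoch - 1) * 723 + iter).
for epoch, it in [(1, 72), (1, 720), (2, 72), (4, 720), (7, 720), (10, 720)]:
    step = (epoch - 1) * 723 + it
    print(f"epoch {epoch} iter {it}: lr {linear_warmup_lr(step):.6f}")
```

Under these assumptions the computed values match the logged ones to six decimals, for example 0.000015 at epoch 1 iter 72 and 0.000148 at epoch 2 iter 72.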