stefan-it committed on
Commit c4efc39
1 Parent(s): 1adc1e9

Upload folder using huggingface_hub

best-model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f38b3ec45a8199bea26e33aaefc36fb1425480e1bc130593f043bf26a22a2a59
+ size 870793839
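The three added lines are a Git LFS pointer, not the model weights themselves: the pointer records a spec version, a SHA-256 object ID, and the object size in bytes. As a minimal sketch (the function name `parse_lfs_pointer` and the returned keys are illustrative choices, not part of any library), such a pointer can be read like this:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file (hypothetical helper)."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer contents copied from the best-model.pt diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:f38b3ec45a8199bea26e33aaefc36fb1425480e1bc130593f043bf26a22a2a59
size 870793839
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["digest"][:12], f"{info['size_bytes'] / 2**20:.1f} MiB")
```

The size field shows that best-model.pt is roughly 830 MiB, which is why the repository stores only a pointer in Git.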
dev.tsv ADDED
The diff for this file is too large to render. See raw diff
 
final-model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2bb0274179d026a742d81654d1742588df7bd926a62fc744a0d3d5f0dd6e1184
+ size 870793956
loss.tsv ADDED
@@ -0,0 +1,11 @@
+ EPOCH  TIMESTAMP  LEARNING_RATE  TRAIN_LOSS  DEV_LOSS  DEV_PRECISION  DEV_RECALL  DEV_F1  DEV_ACCURACY
+ 1      14:09:26   0.0002         1.0975      0.2361    0.3056         0.0114      0.0219  0.0111
+ 2      14:16:42   0.0001         0.1355      0.1225    0.8060         0.6136      0.6968  0.5450
+ 3      14:24:03   0.0001         0.0773      0.0773    0.8529         0.8264      0.8395  0.7353
+ 4      14:31:11   0.0001         0.0510      0.0844    0.8915         0.7893      0.8373  0.7269
+ 5      14:38:32   0.0001         0.0336      0.0960    0.8749         0.7872      0.8287  0.7195
+ 6      14:45:31   0.0001         0.0252      0.1010    0.8760         0.8099      0.8417  0.7368
+ 7      14:52:45   0.0001         0.0196      0.1374    0.8762         0.7459      0.8058  0.6831
+ 8      14:59:56   0.0000         0.0152      0.1267    0.8906         0.8161      0.8518  0.7495
+ 9      15:07:06   0.0000         0.0115      0.1332    0.8966         0.8151      0.8539  0.7521
+ 10     15:14:11   0.0000         0.0093      0.1387    0.8977         0.8068      0.8498  0.7459
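The loss.tsv rows can be checked programmatically. The sketch below is only an illustration: it embeds the rows shown in the diff as a whitespace-separated string (the real file is presumably tab-separated), and the helper name `parse_loss_table` is hypothetical. It finds the epoch with the highest dev F1 and the epoch with the lowest dev loss.

```python
# Rows copied from the loss.tsv diff above (whitespace-separated here for readability).
LOSS_TABLE = """
EPOCH TIMESTAMP LEARNING_RATE TRAIN_LOSS DEV_LOSS DEV_PRECISION DEV_RECALL DEV_F1 DEV_ACCURACY
1 14:09:26 0.0002 1.0975 0.2361 0.3056 0.0114 0.0219 0.0111
2 14:16:42 0.0001 0.1355 0.1225 0.8060 0.6136 0.6968 0.5450
3 14:24:03 0.0001 0.0773 0.0773 0.8529 0.8264 0.8395 0.7353
4 14:31:11 0.0001 0.0510 0.0844 0.8915 0.7893 0.8373 0.7269
5 14:38:32 0.0001 0.0336 0.0960 0.8749 0.7872 0.8287 0.7195
6 14:45:31 0.0001 0.0252 0.1010 0.8760 0.8099 0.8417 0.7368
7 14:52:45 0.0001 0.0196 0.1374 0.8762 0.7459 0.8058 0.6831
8 14:59:56 0.0000 0.0152 0.1267 0.8906 0.8161 0.8518 0.7495
9 15:07:06 0.0000 0.0115 0.1332 0.8966 0.8151 0.8539 0.7521
10 15:14:11 0.0000 0.0093 0.1387 0.8977 0.8068 0.8498 0.7459
"""


def parse_loss_table(text: str) -> list[dict]:
    """Split the table into one dict per epoch, keyed by the header names."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


rows = parse_loss_table(LOSS_TABLE)
best_f1 = max(rows, key=lambda r: float(r["DEV_F1"]))
lowest_dev_loss = min(rows, key=lambda r: float(r["DEV_LOSS"]))

print("best dev F1:", best_f1["EPOCH"], best_f1["DEV_F1"])
print("lowest dev loss:", lowest_dev_loss["EPOCH"], lowest_dev_loss["DEV_LOSS"])
```

Dev F1 peaks at epoch 9 while dev loss is lowest at epoch 3, so the two criteria pick different checkpoints. Note also that the LEARNING_RATE column is rounded to four decimals, which is why the last three epochs show 0.0000.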
runs/events.out.tfevents.1697119340.c8b2203b18a8.2408.5 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c2dcbc0a78627e7215edd44fb14df2cbab84888ebded24c72d78df808160c984
+ size 407048
test.tsv ADDED
The diff for this file is too large to render. See raw diff
 
training.log ADDED
@@ -0,0 +1,263 @@
+ 2023-10-12 14:02:20,581 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 14:02:20,583 Model: "SequenceTagger(
+   (embeddings): ByT5Embeddings(
+     (model): T5EncoderModel(
+       (shared): Embedding(384, 1472)
+       (encoder): T5Stack(
+         (embed_tokens): Embedding(384, 1472)
+         (block): ModuleList(
+           (0): T5Block(
+             (layer): ModuleList(
+               (0): T5LayerSelfAttention(
+                 (SelfAttention): T5Attention(
+                   (q): Linear(in_features=1472, out_features=384, bias=False)
+                   (k): Linear(in_features=1472, out_features=384, bias=False)
+                   (v): Linear(in_features=1472, out_features=384, bias=False)
+                   (o): Linear(in_features=384, out_features=1472, bias=False)
+                   (relative_attention_bias): Embedding(32, 6)
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (1): T5LayerFF(
+                 (DenseReluDense): T5DenseGatedActDense(
+                   (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wo): Linear(in_features=3584, out_features=1472, bias=False)
+                   (dropout): Dropout(p=0.1, inplace=False)
+                   (act): NewGELUActivation()
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+           )
+           (1-11): 11 x T5Block(
+             (layer): ModuleList(
+               (0): T5LayerSelfAttention(
+                 (SelfAttention): T5Attention(
+                   (q): Linear(in_features=1472, out_features=384, bias=False)
+                   (k): Linear(in_features=1472, out_features=384, bias=False)
+                   (v): Linear(in_features=1472, out_features=384, bias=False)
+                   (o): Linear(in_features=384, out_features=1472, bias=False)
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (1): T5LayerFF(
+                 (DenseReluDense): T5DenseGatedActDense(
+                   (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wo): Linear(in_features=3584, out_features=1472, bias=False)
+                   (dropout): Dropout(p=0.1, inplace=False)
+                   (act): NewGELUActivation()
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+           )
+         )
+         (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+         (dropout): Dropout(p=0.1, inplace=False)
+       )
+     )
+   )
+   (locked_dropout): LockedDropout(p=0.5)
+   (linear): Linear(in_features=1472, out_features=13, bias=True)
+   (loss_function): CrossEntropyLoss()
+ )"
+ 2023-10-12 14:02:20,583 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 14:02:20,584 MultiCorpus: 5777 train + 722 dev + 723 test sentences
+ - NER_ICDAR_EUROPEANA Corpus: 5777 train + 722 dev + 723 test sentences - /root/.flair/datasets/ner_icdar_europeana/nl
+ 2023-10-12 14:02:20,584 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 14:02:20,584 Train: 5777 sentences
+ 2023-10-12 14:02:20,584 (train_with_dev=False, train_with_test=False)
+ 2023-10-12 14:02:20,584 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 14:02:20,584 Training Params:
+ 2023-10-12 14:02:20,584 - learning_rate: "0.00016"
+ 2023-10-12 14:02:20,584 - mini_batch_size: "8"
+ 2023-10-12 14:02:20,584 - max_epochs: "10"
+ 2023-10-12 14:02:20,584 - shuffle: "True"
+ 2023-10-12 14:02:20,584 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 14:02:20,584 Plugins:
+ 2023-10-12 14:02:20,585 - TensorboardLogger
+ 2023-10-12 14:02:20,585 - LinearScheduler | warmup_fraction: '0.1'
+ 2023-10-12 14:02:20,585 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 14:02:20,585 Final evaluation on model from best epoch (best-model.pt)
+ 2023-10-12 14:02:20,585 - metric: "('micro avg', 'f1-score')"
+ 2023-10-12 14:02:20,585 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 14:02:20,585 Computation:
+ 2023-10-12 14:02:20,585 - compute on device: cuda:0
+ 2023-10-12 14:02:20,585 - embedding storage: none
+ 2023-10-12 14:02:20,585 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 14:02:20,585 Model training base path: "hmbench-icdar/nl-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-4"
+ 2023-10-12 14:02:20,585 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 14:02:20,585 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 14:02:20,586 Logging anything other than scalars to TensorBoard is currently not supported.
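The learning rates logged in the epochs that follow are consistent with a linear warmup over the first 10% of steps up to the configured peak of 0.00016, followed by a linear decay to zero. The sketch below is an assumption-based reconstruction, not Flair's actual scheduler code: the function name `linear_schedule_lr` is hypothetical, and the total of 7230 steps assumes 723 iterations per epoch over 10 epochs, as the log reports.

```python
PEAK_LR = 0.00016          # learning_rate from "Training Params"
ITERS_PER_EPOCH = 723      # "iter N/723" in the log
MAX_EPOCHS = 10            # max_epochs from "Training Params"
WARMUP_FRACTION = 0.1      # LinearScheduler warmup_fraction


def linear_schedule_lr(step: int,
                       peak_lr: float = PEAK_LR,
                       total_steps: int = ITERS_PER_EPOCH * MAX_EPOCHS,
                       warmup_fraction: float = WARMUP_FRACTION) -> float:
    """Linear warmup to peak_lr, then linear decay to zero (assumed schedule shape)."""
    warmup_steps = round(total_steps * warmup_fraction)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))


def global_step(epoch: int, iteration: int) -> int:
    """Convert a 1-based epoch and in-epoch iteration into a global step count."""
    return (epoch - 1) * ITERS_PER_EPOCH + iteration


for epoch, iteration in [(1, 72), (1, 720), (2, 72), (5, 288), (10, 720)]:
    lr = linear_schedule_lr(global_step(epoch, iteration))
    print(f"epoch {epoch} iter {iteration}: lr {lr:.6f}")
```

Under these assumptions the warmup ends exactly at the end of epoch 1, which matches the logged rate climbing to 0.000159 at iteration 720 and falling from epoch 2 onward.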
+ 2023-10-12 14:03:00,450 epoch 1 - iter 72/723 - loss 2.57146864 - time (sec): 39.86 - samples/sec: 448.11 - lr: 0.000016 - momentum: 0.000000
+ 2023-10-12 14:03:40,814 epoch 1 - iter 144/723 - loss 2.49589202 - time (sec): 80.23 - samples/sec: 449.88 - lr: 0.000032 - momentum: 0.000000
+ 2023-10-12 14:04:19,302 epoch 1 - iter 216/723 - loss 2.32807345 - time (sec): 118.71 - samples/sec: 447.33 - lr: 0.000048 - momentum: 0.000000
+ 2023-10-12 14:04:57,778 epoch 1 - iter 288/723 - loss 2.11742467 - time (sec): 157.19 - samples/sec: 449.86 - lr: 0.000064 - momentum: 0.000000
+ 2023-10-12 14:05:37,192 epoch 1 - iter 360/723 - loss 1.88902537 - time (sec): 196.60 - samples/sec: 449.44 - lr: 0.000079 - momentum: 0.000000
+ 2023-10-12 14:06:17,712 epoch 1 - iter 432/723 - loss 1.67414156 - time (sec): 237.12 - samples/sec: 443.67 - lr: 0.000095 - momentum: 0.000000
+ 2023-10-12 14:06:57,989 epoch 1 - iter 504/723 - loss 1.48274010 - time (sec): 277.40 - samples/sec: 441.13 - lr: 0.000111 - momentum: 0.000000
+ 2023-10-12 14:07:40,425 epoch 1 - iter 576/723 - loss 1.32332329 - time (sec): 319.84 - samples/sec: 439.32 - lr: 0.000127 - momentum: 0.000000
+ 2023-10-12 14:08:22,595 epoch 1 - iter 648/723 - loss 1.19542792 - time (sec): 362.01 - samples/sec: 438.88 - lr: 0.000143 - momentum: 0.000000
+ 2023-10-12 14:09:04,093 epoch 1 - iter 720/723 - loss 1.10050274 - time (sec): 403.51 - samples/sec: 435.36 - lr: 0.000159 - momentum: 0.000000
+ 2023-10-12 14:09:05,529 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 14:09:05,529 EPOCH 1 done: loss 1.0975 - lr: 0.000159
+ 2023-10-12 14:09:26,230 DEV : loss 0.2361268699169159 - f1-score (micro avg) 0.0219
+ 2023-10-12 14:09:26,265 saving best model
+ 2023-10-12 14:09:27,169 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 14:10:08,416 epoch 2 - iter 72/723 - loss 0.16985132 - time (sec): 41.24 - samples/sec: 430.05 - lr: 0.000158 - momentum: 0.000000
+ 2023-10-12 14:10:49,483 epoch 2 - iter 144/723 - loss 0.17040915 - time (sec): 82.31 - samples/sec: 423.49 - lr: 0.000156 - momentum: 0.000000
+ 2023-10-12 14:11:31,985 epoch 2 - iter 216/723 - loss 0.16710193 - time (sec): 124.81 - samples/sec: 418.26 - lr: 0.000155 - momentum: 0.000000
+ 2023-10-12 14:12:13,326 epoch 2 - iter 288/723 - loss 0.15978625 - time (sec): 166.15 - samples/sec: 418.23 - lr: 0.000153 - momentum: 0.000000
+ 2023-10-12 14:12:52,562 epoch 2 - iter 360/723 - loss 0.15267936 - time (sec): 205.39 - samples/sec: 417.94 - lr: 0.000151 - momentum: 0.000000
+ 2023-10-12 14:13:33,393 epoch 2 - iter 432/723 - loss 0.14849362 - time (sec): 246.22 - samples/sec: 420.74 - lr: 0.000149 - momentum: 0.000000
+ 2023-10-12 14:14:13,896 epoch 2 - iter 504/723 - loss 0.14625781 - time (sec): 286.72 - samples/sec: 424.88 - lr: 0.000148 - momentum: 0.000000
+ 2023-10-12 14:14:53,859 epoch 2 - iter 576/723 - loss 0.14267707 - time (sec): 326.69 - samples/sec: 428.30 - lr: 0.000146 - momentum: 0.000000
+ 2023-10-12 14:15:35,999 epoch 2 - iter 648/723 - loss 0.13812316 - time (sec): 368.83 - samples/sec: 428.09 - lr: 0.000144 - momentum: 0.000000
+ 2023-10-12 14:16:17,416 epoch 2 - iter 720/723 - loss 0.13583196 - time (sec): 410.24 - samples/sec: 427.74 - lr: 0.000142 - momentum: 0.000000
+ 2023-10-12 14:16:19,108 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 14:16:19,109 EPOCH 2 done: loss 0.1355 - lr: 0.000142
+ 2023-10-12 14:16:42,231 DEV : loss 0.12248684465885162 - f1-score (micro avg) 0.6968
+ 2023-10-12 14:16:42,279 saving best model
+ 2023-10-12 14:16:45,002 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 14:17:24,300 epoch 3 - iter 72/723 - loss 0.10358979 - time (sec): 39.29 - samples/sec: 428.50 - lr: 0.000140 - momentum: 0.000000
+ 2023-10-12 14:18:05,010 epoch 3 - iter 144/723 - loss 0.09371822 - time (sec): 80.00 - samples/sec: 433.12 - lr: 0.000139 - momentum: 0.000000
+ 2023-10-12 14:18:46,397 epoch 3 - iter 216/723 - loss 0.09346647 - time (sec): 121.39 - samples/sec: 426.31 - lr: 0.000137 - momentum: 0.000000
+ 2023-10-12 14:19:28,061 epoch 3 - iter 288/723 - loss 0.08825349 - time (sec): 163.05 - samples/sec: 423.00 - lr: 0.000135 - momentum: 0.000000
+ 2023-10-12 14:20:09,909 epoch 3 - iter 360/723 - loss 0.08701050 - time (sec): 204.90 - samples/sec: 421.98 - lr: 0.000133 - momentum: 0.000000
+ 2023-10-12 14:20:54,725 epoch 3 - iter 432/723 - loss 0.08620917 - time (sec): 249.72 - samples/sec: 422.59 - lr: 0.000132 - momentum: 0.000000
+ 2023-10-12 14:21:36,824 epoch 3 - iter 504/723 - loss 0.08267895 - time (sec): 291.82 - samples/sec: 422.98 - lr: 0.000130 - momentum: 0.000000
+ 2023-10-12 14:22:18,077 epoch 3 - iter 576/723 - loss 0.08059396 - time (sec): 333.07 - samples/sec: 422.20 - lr: 0.000128 - momentum: 0.000000
+ 2023-10-12 14:23:00,039 epoch 3 - iter 648/723 - loss 0.07876330 - time (sec): 375.03 - samples/sec: 420.84 - lr: 0.000126 - momentum: 0.000000
+ 2023-10-12 14:23:39,699 epoch 3 - iter 720/723 - loss 0.07731069 - time (sec): 414.69 - samples/sec: 423.64 - lr: 0.000125 - momentum: 0.000000
+ 2023-10-12 14:23:41,000 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 14:23:41,001 EPOCH 3 done: loss 0.0773 - lr: 0.000125
+ 2023-10-12 14:24:03,948 DEV : loss 0.07725054025650024 - f1-score (micro avg) 0.8395
+ 2023-10-12 14:24:03,984 saving best model
+ 2023-10-12 14:24:06,590 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 14:24:47,396 epoch 4 - iter 72/723 - loss 0.04244017 - time (sec): 40.80 - samples/sec: 463.29 - lr: 0.000123 - momentum: 0.000000
+ 2023-10-12 14:25:24,545 epoch 4 - iter 144/723 - loss 0.04566430 - time (sec): 77.95 - samples/sec: 449.41 - lr: 0.000121 - momentum: 0.000000
+ 2023-10-12 14:26:02,955 epoch 4 - iter 216/723 - loss 0.04742743 - time (sec): 116.36 - samples/sec: 444.15 - lr: 0.000119 - momentum: 0.000000
+ 2023-10-12 14:26:42,655 epoch 4 - iter 288/723 - loss 0.04960497 - time (sec): 156.06 - samples/sec: 441.84 - lr: 0.000117 - momentum: 0.000000
+ 2023-10-12 14:27:21,464 epoch 4 - iter 360/723 - loss 0.04991076 - time (sec): 194.87 - samples/sec: 441.19 - lr: 0.000116 - momentum: 0.000000
+ 2023-10-12 14:28:01,068 epoch 4 - iter 432/723 - loss 0.05291596 - time (sec): 234.47 - samples/sec: 443.83 - lr: 0.000114 - momentum: 0.000000
+ 2023-10-12 14:28:42,792 epoch 4 - iter 504/723 - loss 0.05159753 - time (sec): 276.20 - samples/sec: 445.76 - lr: 0.000112 - momentum: 0.000000
+ 2023-10-12 14:29:23,833 epoch 4 - iter 576/723 - loss 0.05151396 - time (sec): 317.24 - samples/sec: 441.89 - lr: 0.000110 - momentum: 0.000000
+ 2023-10-12 14:30:06,126 epoch 4 - iter 648/723 - loss 0.05194550 - time (sec): 359.53 - samples/sec: 438.57 - lr: 0.000109 - momentum: 0.000000
+ 2023-10-12 14:30:48,071 epoch 4 - iter 720/723 - loss 0.05103521 - time (sec): 401.48 - samples/sec: 437.75 - lr: 0.000107 - momentum: 0.000000
+ 2023-10-12 14:30:49,234 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 14:30:49,234 EPOCH 4 done: loss 0.0510 - lr: 0.000107
+ 2023-10-12 14:31:11,165 DEV : loss 0.08439276367425919 - f1-score (micro avg) 0.8373
+ 2023-10-12 14:31:11,197 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 14:31:51,823 epoch 5 - iter 72/723 - loss 0.02827603 - time (sec): 40.62 - samples/sec: 417.84 - lr: 0.000105 - momentum: 0.000000
+ 2023-10-12 14:32:34,630 epoch 5 - iter 144/723 - loss 0.03309349 - time (sec): 83.43 - samples/sec: 412.21 - lr: 0.000103 - momentum: 0.000000
+ 2023-10-12 14:33:17,993 epoch 5 - iter 216/723 - loss 0.03235199 - time (sec): 126.79 - samples/sec: 411.07 - lr: 0.000101 - momentum: 0.000000
+ 2023-10-12 14:34:00,697 epoch 5 - iter 288/723 - loss 0.03466735 - time (sec): 169.50 - samples/sec: 416.22 - lr: 0.000100 - momentum: 0.000000
+ 2023-10-12 14:34:41,998 epoch 5 - iter 360/723 - loss 0.03514176 - time (sec): 210.80 - samples/sec: 417.92 - lr: 0.000098 - momentum: 0.000000
+ 2023-10-12 14:35:24,599 epoch 5 - iter 432/723 - loss 0.03437310 - time (sec): 253.40 - samples/sec: 416.97 - lr: 0.000096 - momentum: 0.000000
+ 2023-10-12 14:36:08,609 epoch 5 - iter 504/723 - loss 0.03332593 - time (sec): 297.41 - samples/sec: 410.67 - lr: 0.000094 - momentum: 0.000000
+ 2023-10-12 14:36:52,064 epoch 5 - iter 576/723 - loss 0.03276601 - time (sec): 340.87 - samples/sec: 410.46 - lr: 0.000093 - momentum: 0.000000
+ 2023-10-12 14:37:31,645 epoch 5 - iter 648/723 - loss 0.03304384 - time (sec): 380.45 - samples/sec: 414.10 - lr: 0.000091 - momentum: 0.000000
+ 2023-10-12 14:38:10,349 epoch 5 - iter 720/723 - loss 0.03360075 - time (sec): 419.15 - samples/sec: 419.16 - lr: 0.000089 - momentum: 0.000000
+ 2023-10-12 14:38:11,539 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 14:38:11,539 EPOCH 5 done: loss 0.0336 - lr: 0.000089
+ 2023-10-12 14:38:32,325 DEV : loss 0.09600105881690979 - f1-score (micro avg) 0.8287
+ 2023-10-12 14:38:32,357 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 14:39:11,419 epoch 6 - iter 72/723 - loss 0.03220877 - time (sec): 39.06 - samples/sec: 460.19 - lr: 0.000087 - momentum: 0.000000
+ 2023-10-12 14:39:49,366 epoch 6 - iter 144/723 - loss 0.03071465 - time (sec): 77.01 - samples/sec: 454.22 - lr: 0.000085 - momentum: 0.000000
+ 2023-10-12 14:40:28,696 epoch 6 - iter 216/723 - loss 0.02895270 - time (sec): 116.34 - samples/sec: 454.75 - lr: 0.000084 - momentum: 0.000000
+ 2023-10-12 14:41:08,773 epoch 6 - iter 288/723 - loss 0.02631946 - time (sec): 156.41 - samples/sec: 458.51 - lr: 0.000082 - momentum: 0.000000
+ 2023-10-12 14:41:48,052 epoch 6 - iter 360/723 - loss 0.02581441 - time (sec): 195.69 - samples/sec: 446.78 - lr: 0.000080 - momentum: 0.000000
+ 2023-10-12 14:42:31,220 epoch 6 - iter 432/723 - loss 0.02580522 - time (sec): 238.86 - samples/sec: 447.15 - lr: 0.000078 - momentum: 0.000000
+ 2023-10-12 14:43:09,959 epoch 6 - iter 504/723 - loss 0.02470462 - time (sec): 277.60 - samples/sec: 445.29 - lr: 0.000077 - momentum: 0.000000
+ 2023-10-12 14:43:48,878 epoch 6 - iter 576/723 - loss 0.02475012 - time (sec): 316.52 - samples/sec: 446.23 - lr: 0.000075 - momentum: 0.000000
+ 2023-10-12 14:44:29,409 epoch 6 - iter 648/723 - loss 0.02481761 - time (sec): 357.05 - samples/sec: 444.63 - lr: 0.000073 - momentum: 0.000000
+ 2023-10-12 14:45:08,583 epoch 6 - iter 720/723 - loss 0.02526761 - time (sec): 396.22 - samples/sec: 443.30 - lr: 0.000071 - momentum: 0.000000
+ 2023-10-12 14:45:09,781 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 14:45:09,782 EPOCH 6 done: loss 0.0252 - lr: 0.000071
+ 2023-10-12 14:45:31,300 DEV : loss 0.10096623748540878 - f1-score (micro avg) 0.8417
+ 2023-10-12 14:45:31,331 saving best model
+ 2023-10-12 14:45:33,926 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 14:46:12,365 epoch 7 - iter 72/723 - loss 0.02687994 - time (sec): 38.44 - samples/sec: 434.81 - lr: 0.000069 - momentum: 0.000000
+ 2023-10-12 14:46:53,700 epoch 7 - iter 144/723 - loss 0.02462178 - time (sec): 79.77 - samples/sec: 448.83 - lr: 0.000068 - momentum: 0.000000
+ 2023-10-12 14:47:35,239 epoch 7 - iter 216/723 - loss 0.02481712 - time (sec): 121.31 - samples/sec: 441.84 - lr: 0.000066 - momentum: 0.000000
+ 2023-10-12 14:48:14,912 epoch 7 - iter 288/723 - loss 0.02288093 - time (sec): 160.98 - samples/sec: 441.36 - lr: 0.000064 - momentum: 0.000000
+ 2023-10-12 14:48:54,552 epoch 7 - iter 360/723 - loss 0.02190559 - time (sec): 200.62 - samples/sec: 442.32 - lr: 0.000062 - momentum: 0.000000
+ 2023-10-12 14:49:37,326 epoch 7 - iter 432/723 - loss 0.02096859 - time (sec): 243.40 - samples/sec: 439.35 - lr: 0.000061 - momentum: 0.000000
+ 2023-10-12 14:50:18,758 epoch 7 - iter 504/723 - loss 0.02057194 - time (sec): 284.83 - samples/sec: 439.64 - lr: 0.000059 - momentum: 0.000000
+ 2023-10-12 14:50:57,549 epoch 7 - iter 576/723 - loss 0.01995539 - time (sec): 323.62 - samples/sec: 440.58 - lr: 0.000057 - momentum: 0.000000
+ 2023-10-12 14:51:36,488 epoch 7 - iter 648/723 - loss 0.01987628 - time (sec): 362.56 - samples/sec: 438.18 - lr: 0.000055 - momentum: 0.000000
+ 2023-10-12 14:52:19,649 epoch 7 - iter 720/723 - loss 0.01959379 - time (sec): 405.72 - samples/sec: 433.27 - lr: 0.000053 - momentum: 0.000000
+ 2023-10-12 14:52:20,987 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 14:52:20,988 EPOCH 7 done: loss 0.0196 - lr: 0.000053
+ 2023-10-12 14:52:45,021 DEV : loss 0.13744854927062988 - f1-score (micro avg) 0.8058
+ 2023-10-12 14:52:45,064 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 14:53:26,090 epoch 8 - iter 72/723 - loss 0.01120851 - time (sec): 41.02 - samples/sec: 435.90 - lr: 0.000052 - momentum: 0.000000
+ 2023-10-12 14:54:05,919 epoch 8 - iter 144/723 - loss 0.01234430 - time (sec): 80.85 - samples/sec: 444.59 - lr: 0.000050 - momentum: 0.000000
+ 2023-10-12 14:54:47,450 epoch 8 - iter 216/723 - loss 0.01255276 - time (sec): 122.38 - samples/sec: 437.65 - lr: 0.000048 - momentum: 0.000000
+ 2023-10-12 14:55:29,854 epoch 8 - iter 288/723 - loss 0.01141486 - time (sec): 164.79 - samples/sec: 438.32 - lr: 0.000046 - momentum: 0.000000
+ 2023-10-12 14:56:10,467 epoch 8 - iter 360/723 - loss 0.01320089 - time (sec): 205.40 - samples/sec: 428.85 - lr: 0.000045 - momentum: 0.000000
+ 2023-10-12 14:56:51,709 epoch 8 - iter 432/723 - loss 0.01317686 - time (sec): 246.64 - samples/sec: 426.45 - lr: 0.000043 - momentum: 0.000000
+ 2023-10-12 14:57:32,364 epoch 8 - iter 504/723 - loss 0.01453607 - time (sec): 287.30 - samples/sec: 426.39 - lr: 0.000041 - momentum: 0.000000
+ 2023-10-12 14:58:13,699 epoch 8 - iter 576/723 - loss 0.01465906 - time (sec): 328.63 - samples/sec: 426.80 - lr: 0.000039 - momentum: 0.000000
+ 2023-10-12 14:58:53,964 epoch 8 - iter 648/723 - loss 0.01412604 - time (sec): 368.90 - samples/sec: 427.98 - lr: 0.000037 - momentum: 0.000000
+ 2023-10-12 14:59:34,520 epoch 8 - iter 720/723 - loss 0.01516939 - time (sec): 409.45 - samples/sec: 429.35 - lr: 0.000036 - momentum: 0.000000
+ 2023-10-12 14:59:35,627 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 14:59:35,627 EPOCH 8 done: loss 0.0152 - lr: 0.000036
+ 2023-10-12 14:59:56,735 DEV : loss 0.12667076289653778 - f1-score (micro avg) 0.8518
+ 2023-10-12 14:59:56,766 saving best model
+ 2023-10-12 14:59:59,434 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 15:00:40,080 epoch 9 - iter 72/723 - loss 0.01201924 - time (sec): 40.64 - samples/sec: 452.26 - lr: 0.000034 - momentum: 0.000000
+ 2023-10-12 15:01:20,276 epoch 9 - iter 144/723 - loss 0.01119713 - time (sec): 80.84 - samples/sec: 435.24 - lr: 0.000032 - momentum: 0.000000
+ 2023-10-12 15:02:01,633 epoch 9 - iter 216/723 - loss 0.01319119 - time (sec): 122.19 - samples/sec: 427.37 - lr: 0.000030 - momentum: 0.000000
+ 2023-10-12 15:02:42,950 epoch 9 - iter 288/723 - loss 0.01132013 - time (sec): 163.51 - samples/sec: 425.85 - lr: 0.000028 - momentum: 0.000000
+ 2023-10-12 15:03:25,210 epoch 9 - iter 360/723 - loss 0.01028013 - time (sec): 205.77 - samples/sec: 427.47 - lr: 0.000027 - momentum: 0.000000
+ 2023-10-12 15:04:06,118 epoch 9 - iter 432/723 - loss 0.01003947 - time (sec): 246.68 - samples/sec: 427.88 - lr: 0.000025 - momentum: 0.000000
+ 2023-10-12 15:04:46,490 epoch 9 - iter 504/723 - loss 0.01002813 - time (sec): 287.05 - samples/sec: 431.57 - lr: 0.000023 - momentum: 0.000000
+ 2023-10-12 15:05:25,758 epoch 9 - iter 576/723 - loss 0.01025793 - time (sec): 326.32 - samples/sec: 432.21 - lr: 0.000021 - momentum: 0.000000
+ 2023-10-12 15:06:04,302 epoch 9 - iter 648/723 - loss 0.01067156 - time (sec): 364.86 - samples/sec: 431.60 - lr: 0.000020 - momentum: 0.000000
+ 2023-10-12 15:06:44,309 epoch 9 - iter 720/723 - loss 0.01056221 - time (sec): 404.87 - samples/sec: 432.53 - lr: 0.000018 - momentum: 0.000000
+ 2023-10-12 15:06:46,196 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 15:06:46,196 EPOCH 9 done: loss 0.0115 - lr: 0.000018
+ 2023-10-12 15:07:06,773 DEV : loss 0.13315436244010925 - f1-score (micro avg) 0.8539
+ 2023-10-12 15:07:06,804 saving best model
+ 2023-10-12 15:07:09,856 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 15:07:50,190 epoch 10 - iter 72/723 - loss 0.01793669 - time (sec): 40.33 - samples/sec: 467.25 - lr: 0.000016 - momentum: 0.000000
+ 2023-10-12 15:08:29,257 epoch 10 - iter 144/723 - loss 0.01259728 - time (sec): 79.39 - samples/sec: 458.96 - lr: 0.000014 - momentum: 0.000000
+ 2023-10-12 15:09:08,646 epoch 10 - iter 216/723 - loss 0.01175785 - time (sec): 118.78 - samples/sec: 450.96 - lr: 0.000012 - momentum: 0.000000
+ 2023-10-12 15:09:48,830 epoch 10 - iter 288/723 - loss 0.01130089 - time (sec): 158.97 - samples/sec: 449.10 - lr: 0.000011 - momentum: 0.000000
+ 2023-10-12 15:10:28,554 epoch 10 - iter 360/723 - loss 0.01084751 - time (sec): 198.69 - samples/sec: 449.18 - lr: 0.000009 - momentum: 0.000000
+ 2023-10-12 15:11:09,274 epoch 10 - iter 432/723 - loss 0.00990835 - time (sec): 239.41 - samples/sec: 450.76 - lr: 0.000007 - momentum: 0.000000
+ 2023-10-12 15:11:48,871 epoch 10 - iter 504/723 - loss 0.00967903 - time (sec): 279.01 - samples/sec: 443.76 - lr: 0.000005 - momentum: 0.000000
+ 2023-10-12 15:12:29,990 epoch 10 - iter 576/723 - loss 0.00932348 - time (sec): 320.13 - samples/sec: 444.44 - lr: 0.000004 - momentum: 0.000000
+ 2023-10-12 15:13:08,637 epoch 10 - iter 648/723 - loss 0.00892390 - time (sec): 358.77 - samples/sec: 442.66 - lr: 0.000002 - momentum: 0.000000
+ 2023-10-12 15:13:47,594 epoch 10 - iter 720/723 - loss 0.00930951 - time (sec): 397.73 - samples/sec: 441.99 - lr: 0.000000 - momentum: 0.000000
+ 2023-10-12 15:13:48,661 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 15:13:48,661 EPOCH 10 done: loss 0.0093 - lr: 0.000000
+ 2023-10-12 15:14:11,235 DEV : loss 0.138749361038208 - f1-score (micro avg) 0.8498
+ 2023-10-12 15:14:12,139 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 15:14:12,141 Loading model from best epoch ...
+ 2023-10-12 15:14:16,462 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG
+ 2023-10-12 15:14:38,321
+ Results:
+ - F-score (micro) 0.8482
+ - F-score (macro) 0.7513
+ - Accuracy 0.7489
+
+ By class:
+               precision    recall  f1-score   support
+
+          PER     0.8630    0.8361    0.8493       482
+          LOC     0.9302    0.8734    0.9009       458
+          ORG     0.5000    0.5072    0.5036        69
+
+    micro avg     0.8666    0.8305    0.8482      1009
+    macro avg     0.7644    0.7389    0.7513      1009
+ weighted avg     0.8687    0.8305    0.8491      1009
+
+ 2023-10-12 15:14:38,322 ----------------------------------------------------------------------------------------------------
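The three averages in the final report can be reproduced from per-class span counts. In the sketch below the gold counts are the "support" column from the log, while the true-positive and predicted counts are assumptions back-derived from the rounded precision and recall values (for example, 0.8361 × 482 ≈ 403 correct PER spans); the log itself does not print them.

```python
# (true positives, predicted spans, gold spans) per class.
# Gold spans come from the "support" column; the other two values are
# back-derived from the rounded precision/recall and are assumptions.
COUNTS = {
    "PER": (403, 467, 482),
    "LOC": (400, 430, 458),
    "ORG": (35, 70, 69),
}


def f1_from_counts(tp: int, predicted: int, gold: int) -> float:
    """F1 is the harmonic mean of precision and recall, i.e. 2*TP / (predicted + gold)."""
    return 2 * tp / (predicted + gold)


per_class_f1 = {label: f1_from_counts(*c) for label, c in COUNTS.items()}

# Micro average: pool the counts over all classes, then compute F1 once.
total_tp = sum(tp for tp, _, _ in COUNTS.values())
total_pred = sum(pred for _, pred, _ in COUNTS.values())
total_gold = sum(gold for _, _, gold in COUNTS.values())
micro_f1 = f1_from_counts(total_tp, total_pred, total_gold)

# Macro average: unweighted mean of the per-class F1 scores.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

# Weighted average: per-class F1 weighted by gold support.
weighted_f1 = sum(per_class_f1[label] * COUNTS[label][2] for label in COUNTS) / total_gold

print(f"micro {micro_f1:.4f}  macro {macro_f1:.4f}  weighted {weighted_f1:.4f}")
```

The gap between micro F1 (0.8482) and macro F1 (0.7513) comes almost entirely from ORG: with only 69 gold spans it barely moves the pooled counts, but it counts for a full third of the macro average.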