stefan-it committed
Commit f29a286
1 Parent(s): ba11536

Upload folder using huggingface_hub

best-model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:17c31350179d6e8846c8431a38581cd8528f437529b3112339b37f7fa9ddbf61
+ size 870793839
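The three added lines are a Git LFS pointer rather than the checkpoint itself: each line is a `key value` pair giving the spec version, the SHA-256 object ID, and the object size in bytes. As a minimal sketch (the helper name `parse_lfs_pointer` is hypothetical and not part of any library), a pointer like this could be parsed as follows:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the text of a Git LFS pointer file (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>"; split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer contents copied from the best-model.pt diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:17c31350179d6e8846c8431a38581cd8528f437529b3112339b37f7fa9ddbf61
size 870793839
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], len(info["digest"]), info["size_bytes"])
```

The `size` field shows that the real checkpoint is roughly 871 MB, which is why only the pointer is stored in the Git history.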
dev.tsv ADDED
The diff for this file is too large to render.
final-model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9b5bcbc35690c1e40b89ff3502457671d8f372555ea356aaab1376868903664b
+ size 870793956
loss.tsv ADDED
@@ -0,0 +1,11 @@
+ EPOCH TIMESTAMP LEARNING_RATE TRAIN_LOSS DEV_LOSS DEV_PRECISION DEV_RECALL DEV_F1 DEV_ACCURACY
+ 1 08:13:11 0.0001 1.0537 0.1830 0.3494 0.3609 0.3550 0.2463
+ 2 08:22:30 0.0001 0.1570 0.0931 0.6992 0.7127 0.7059 0.5681
+ 3 08:31:51 0.0001 0.0869 0.0919 0.6966 0.7896 0.7402 0.6107
+ 4 08:40:51 0.0001 0.0578 0.0993 0.7438 0.7817 0.7623 0.6363
+ 5 08:50:06 0.0001 0.0415 0.1137 0.7187 0.7975 0.7560 0.6306
+ 6 08:59:08 0.0001 0.0314 0.1343 0.7725 0.7760 0.7743 0.6502
+ 7 09:07:37 0.0001 0.0238 0.1695 0.7401 0.7862 0.7625 0.6382
+ 8 09:16:34 0.0000 0.0194 0.1777 0.7462 0.7749 0.7603 0.6360
+ 9 09:24:57 0.0000 0.0156 0.1852 0.7350 0.7907 0.7619 0.6378
+ 10 09:34:08 0.0000 0.0134 0.1930 0.7363 0.7771 0.7562 0.6303
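The per-epoch rows above can be read programmatically to find the epoch that `best-model.pt` most likely corresponds to. The sketch below assumes the selection criterion is the dev micro-F1 (the `DEV_F1` column), which is the metric named in the training log, and embeds the rows inline. `read_rows` is a hypothetical helper; it uses `split()` so it works whether the columns are separated by tabs or by spaces.

```python
LOSS_TSV = """\
EPOCH TIMESTAMP LEARNING_RATE TRAIN_LOSS DEV_LOSS DEV_PRECISION DEV_RECALL DEV_F1 DEV_ACCURACY
1 08:13:11 0.0001 1.0537 0.1830 0.3494 0.3609 0.3550 0.2463
2 08:22:30 0.0001 0.1570 0.0931 0.6992 0.7127 0.7059 0.5681
3 08:31:51 0.0001 0.0869 0.0919 0.6966 0.7896 0.7402 0.6107
4 08:40:51 0.0001 0.0578 0.0993 0.7438 0.7817 0.7623 0.6363
5 08:50:06 0.0001 0.0415 0.1137 0.7187 0.7975 0.7560 0.6306
6 08:59:08 0.0001 0.0314 0.1343 0.7725 0.7760 0.7743 0.6502
7 09:07:37 0.0001 0.0238 0.1695 0.7401 0.7862 0.7625 0.6382
8 09:16:34 0.0000 0.0194 0.1777 0.7462 0.7749 0.7603 0.6360
9 09:24:57 0.0000 0.0156 0.1852 0.7350 0.7907 0.7619 0.6378
10 09:34:08 0.0000 0.0134 0.1930 0.7363 0.7771 0.7562 0.6303
"""


def read_rows(text: str) -> list:
    """Turn the whitespace-separated table into a list of dicts (hypothetical helper)."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


rows = read_rows(LOSS_TSV)

# Epoch with the highest dev F1 (assumed selection criterion for best-model.pt).
best_f1 = max(rows, key=lambda row: float(row["DEV_F1"]))
# Epoch with the lowest dev loss, for comparison.
lowest_loss = min(rows, key=lambda row: float(row["DEV_LOSS"]))

# F1 is the harmonic mean of precision and recall; recompute it from the
# rounded columns as a consistency check (small rounding differences expected).
p = float(best_f1["DEV_PRECISION"])
r = float(best_f1["DEV_RECALL"])
f1_from_pr = 2 * p * r / (p + r)

print("best F1 at epoch", best_f1["EPOCH"], "with DEV_F1", best_f1["DEV_F1"])
print("lowest dev loss at epoch", lowest_loss["EPOCH"])
print("F1 recomputed from precision and recall:", f1_from_pr)
```

On these rows the highest `DEV_F1` is at epoch 6, while the lowest `DEV_LOSS` is at epoch 3, so selecting by loss instead of F1 would pick a different checkpoint.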
runs/events.out.tfevents.1697097852.de2e83fddbee.1952.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e304db418498c3a77199b1fefb2adf9a11d92178649b23bcef7449669dc4980f
+ size 556612
test.tsv ADDED
The diff for this file is too large to render.
training.log ADDED
@@ -0,0 +1,262 @@
+ 2023-10-12 08:04:12,035 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 08:04:12,037 Model: "SequenceTagger(
+   (embeddings): ByT5Embeddings(
+     (model): T5EncoderModel(
+       (shared): Embedding(384, 1472)
+       (encoder): T5Stack(
+         (embed_tokens): Embedding(384, 1472)
+         (block): ModuleList(
+           (0): T5Block(
+             (layer): ModuleList(
+               (0): T5LayerSelfAttention(
+                 (SelfAttention): T5Attention(
+                   (q): Linear(in_features=1472, out_features=384, bias=False)
+                   (k): Linear(in_features=1472, out_features=384, bias=False)
+                   (v): Linear(in_features=1472, out_features=384, bias=False)
+                   (o): Linear(in_features=384, out_features=1472, bias=False)
+                   (relative_attention_bias): Embedding(32, 6)
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (1): T5LayerFF(
+                 (DenseReluDense): T5DenseGatedActDense(
+                   (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wo): Linear(in_features=3584, out_features=1472, bias=False)
+                   (dropout): Dropout(p=0.1, inplace=False)
+                   (act): NewGELUActivation()
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+           )
+           (1-11): 11 x T5Block(
+             (layer): ModuleList(
+               (0): T5LayerSelfAttention(
+                 (SelfAttention): T5Attention(
+                   (q): Linear(in_features=1472, out_features=384, bias=False)
+                   (k): Linear(in_features=1472, out_features=384, bias=False)
+                   (v): Linear(in_features=1472, out_features=384, bias=False)
+                   (o): Linear(in_features=384, out_features=1472, bias=False)
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (1): T5LayerFF(
+                 (DenseReluDense): T5DenseGatedActDense(
+                   (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wo): Linear(in_features=3584, out_features=1472, bias=False)
+                   (dropout): Dropout(p=0.1, inplace=False)
+                   (act): NewGELUActivation()
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+           )
+         )
+         (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+         (dropout): Dropout(p=0.1, inplace=False)
+       )
+     )
+   )
+   (locked_dropout): LockedDropout(p=0.5)
+   (linear): Linear(in_features=1472, out_features=13, bias=True)
+   (loss_function): CrossEntropyLoss()
+ )"
+ 2023-10-12 08:04:12,038 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 08:04:12,038 MultiCorpus: 7936 train + 992 dev + 992 test sentences
+ - NER_ICDAR_EUROPEANA Corpus: 7936 train + 992 dev + 992 test sentences - /root/.flair/datasets/ner_icdar_europeana/fr
+ 2023-10-12 08:04:12,038 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 08:04:12,038 Train: 7936 sentences
+ 2023-10-12 08:04:12,038 (train_with_dev=False, train_with_test=False)
+ 2023-10-12 08:04:12,038 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 08:04:12,038 Training Params:
+ 2023-10-12 08:04:12,039 - learning_rate: "0.00015"
+ 2023-10-12 08:04:12,039 - mini_batch_size: "8"
+ 2023-10-12 08:04:12,039 - max_epochs: "10"
+ 2023-10-12 08:04:12,039 - shuffle: "True"
+ 2023-10-12 08:04:12,039 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 08:04:12,039 Plugins:
+ 2023-10-12 08:04:12,039 - TensorboardLogger
+ 2023-10-12 08:04:12,039 - LinearScheduler | warmup_fraction: '0.1'
+ 2023-10-12 08:04:12,039 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 08:04:12,039 Final evaluation on model from best epoch (best-model.pt)
+ 2023-10-12 08:04:12,039 - metric: "('micro avg', 'f1-score')"
+ 2023-10-12 08:04:12,039 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 08:04:12,039 Computation:
+ 2023-10-12 08:04:12,040 - compute on device: cuda:0
+ 2023-10-12 08:04:12,040 - embedding storage: none
+ 2023-10-12 08:04:12,040 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 08:04:12,040 Model training base path: "hmbench-icdar/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-1"
+ 2023-10-12 08:04:12,040 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 08:04:12,040 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 08:04:12,040 Logging anything other than scalars to TensorBoard is currently not supported.
+ 2023-10-12 08:05:06,449 epoch 1 - iter 99/992 - loss 2.58555570 - time (sec): 54.41 - samples/sec: 284.28 - lr: 0.000015 - momentum: 0.000000
+ 2023-10-12 08:05:56,282 epoch 1 - iter 198/992 - loss 2.53686260 - time (sec): 104.24 - samples/sec: 302.44 - lr: 0.000030 - momentum: 0.000000
+ 2023-10-12 08:06:46,233 epoch 1 - iter 297/992 - loss 2.34204530 - time (sec): 154.19 - samples/sec: 310.69 - lr: 0.000045 - momentum: 0.000000
+ 2023-10-12 08:07:34,661 epoch 1 - iter 396/992 - loss 2.08169120 - time (sec): 202.62 - samples/sec: 317.45 - lr: 0.000060 - momentum: 0.000000
+ 2023-10-12 08:08:24,602 epoch 1 - iter 495/992 - loss 1.82694454 - time (sec): 252.56 - samples/sec: 317.85 - lr: 0.000075 - momentum: 0.000000
+ 2023-10-12 08:09:14,534 epoch 1 - iter 594/992 - loss 1.59247411 - time (sec): 302.49 - samples/sec: 320.69 - lr: 0.000090 - momentum: 0.000000
+ 2023-10-12 08:10:03,550 epoch 1 - iter 693/992 - loss 1.39639771 - time (sec): 351.51 - samples/sec: 325.34 - lr: 0.000105 - momentum: 0.000000
+ 2023-10-12 08:10:59,316 epoch 1 - iter 792/992 - loss 1.25466641 - time (sec): 407.27 - samples/sec: 321.81 - lr: 0.000120 - momentum: 0.000000
+ 2023-10-12 08:11:53,277 epoch 1 - iter 891/992 - loss 1.14679909 - time (sec): 461.23 - samples/sec: 319.44 - lr: 0.000135 - momentum: 0.000000
+ 2023-10-12 08:12:42,482 epoch 1 - iter 990/992 - loss 1.05527957 - time (sec): 510.44 - samples/sec: 320.66 - lr: 0.000150 - momentum: 0.000000
+ 2023-10-12 08:12:43,887 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 08:12:43,888 EPOCH 1 done: loss 1.0537 - lr: 0.000150
+ 2023-10-12 08:13:10,955 DEV : loss 0.18303227424621582 - f1-score (micro avg) 0.355
+ 2023-10-12 08:13:11,005 saving best model
+ 2023-10-12 08:13:11,967 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 08:14:02,123 epoch 2 - iter 99/992 - loss 0.24094875 - time (sec): 50.15 - samples/sec: 328.44 - lr: 0.000148 - momentum: 0.000000
+ 2023-10-12 08:14:55,551 epoch 2 - iter 198/992 - loss 0.20476026 - time (sec): 103.58 - samples/sec: 315.61 - lr: 0.000147 - momentum: 0.000000
+ 2023-10-12 08:15:49,201 epoch 2 - iter 297/992 - loss 0.19182712 - time (sec): 157.23 - samples/sec: 311.94 - lr: 0.000145 - momentum: 0.000000
+ 2023-10-12 08:16:42,272 epoch 2 - iter 396/992 - loss 0.18719035 - time (sec): 210.30 - samples/sec: 312.52 - lr: 0.000143 - momentum: 0.000000
+ 2023-10-12 08:17:36,327 epoch 2 - iter 495/992 - loss 0.18025399 - time (sec): 264.36 - samples/sec: 311.59 - lr: 0.000142 - momentum: 0.000000
+ 2023-10-12 08:18:27,193 epoch 2 - iter 594/992 - loss 0.17568004 - time (sec): 315.22 - samples/sec: 312.61 - lr: 0.000140 - momentum: 0.000000
+ 2023-10-12 08:19:21,888 epoch 2 - iter 693/992 - loss 0.17273836 - time (sec): 369.92 - samples/sec: 311.10 - lr: 0.000138 - momentum: 0.000000
+ 2023-10-12 08:20:16,613 epoch 2 - iter 792/992 - loss 0.16679007 - time (sec): 424.64 - samples/sec: 308.04 - lr: 0.000137 - momentum: 0.000000
+ 2023-10-12 08:21:07,673 epoch 2 - iter 891/992 - loss 0.16155770 - time (sec): 475.70 - samples/sec: 308.99 - lr: 0.000135 - momentum: 0.000000
+ 2023-10-12 08:22:02,055 epoch 2 - iter 990/992 - loss 0.15729141 - time (sec): 530.08 - samples/sec: 308.45 - lr: 0.000133 - momentum: 0.000000
+ 2023-10-12 08:22:03,276 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 08:22:03,276 EPOCH 2 done: loss 0.1570 - lr: 0.000133
+ 2023-10-12 08:22:30,282 DEV : loss 0.0930715873837471 - f1-score (micro avg) 0.7059
+ 2023-10-12 08:22:30,322 saving best model
+ 2023-10-12 08:22:33,238 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 08:23:31,054 epoch 3 - iter 99/992 - loss 0.09366490 - time (sec): 57.81 - samples/sec: 272.24 - lr: 0.000132 - momentum: 0.000000
+ 2023-10-12 08:24:26,452 epoch 3 - iter 198/992 - loss 0.09371713 - time (sec): 113.21 - samples/sec: 282.27 - lr: 0.000130 - momentum: 0.000000
+ 2023-10-12 08:25:20,392 epoch 3 - iter 297/992 - loss 0.09379383 - time (sec): 167.15 - samples/sec: 291.97 - lr: 0.000128 - momentum: 0.000000
+ 2023-10-12 08:26:13,353 epoch 3 - iter 396/992 - loss 0.09391945 - time (sec): 220.11 - samples/sec: 296.14 - lr: 0.000127 - momentum: 0.000000
+ 2023-10-12 08:27:04,613 epoch 3 - iter 495/992 - loss 0.09225266 - time (sec): 271.37 - samples/sec: 299.92 - lr: 0.000125 - momentum: 0.000000
+ 2023-10-12 08:27:58,875 epoch 3 - iter 594/992 - loss 0.09074357 - time (sec): 325.63 - samples/sec: 299.29 - lr: 0.000123 - momentum: 0.000000
+ 2023-10-12 08:28:48,568 epoch 3 - iter 693/992 - loss 0.09041959 - time (sec): 375.33 - samples/sec: 301.62 - lr: 0.000122 - momentum: 0.000000
+ 2023-10-12 08:29:39,210 epoch 3 - iter 792/992 - loss 0.08851234 - time (sec): 425.97 - samples/sec: 307.21 - lr: 0.000120 - momentum: 0.000000
+ 2023-10-12 08:30:29,714 epoch 3 - iter 891/992 - loss 0.08699505 - time (sec): 476.47 - samples/sec: 310.07 - lr: 0.000118 - momentum: 0.000000
+ 2023-10-12 08:31:24,309 epoch 3 - iter 990/992 - loss 0.08693648 - time (sec): 531.07 - samples/sec: 308.23 - lr: 0.000117 - momentum: 0.000000
+ 2023-10-12 08:31:25,444 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 08:31:25,445 EPOCH 3 done: loss 0.0869 - lr: 0.000117
+ 2023-10-12 08:31:51,548 DEV : loss 0.09189649671316147 - f1-score (micro avg) 0.7402
+ 2023-10-12 08:31:51,594 saving best model
+ 2023-10-12 08:31:54,213 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 08:32:43,918 epoch 4 - iter 99/992 - loss 0.06142325 - time (sec): 49.70 - samples/sec: 344.89 - lr: 0.000115 - momentum: 0.000000
+ 2023-10-12 08:33:36,309 epoch 4 - iter 198/992 - loss 0.06089639 - time (sec): 102.09 - samples/sec: 333.62 - lr: 0.000113 - momentum: 0.000000
+ 2023-10-12 08:34:25,710 epoch 4 - iter 297/992 - loss 0.06267594 - time (sec): 151.49 - samples/sec: 329.91 - lr: 0.000112 - momentum: 0.000000
+ 2023-10-12 08:35:15,366 epoch 4 - iter 396/992 - loss 0.06005022 - time (sec): 201.15 - samples/sec: 326.69 - lr: 0.000110 - momentum: 0.000000
+ 2023-10-12 08:36:07,476 epoch 4 - iter 495/992 - loss 0.05963130 - time (sec): 253.26 - samples/sec: 323.68 - lr: 0.000108 - momentum: 0.000000
+ 2023-10-12 08:36:57,931 epoch 4 - iter 594/992 - loss 0.05891404 - time (sec): 303.71 - samples/sec: 322.82 - lr: 0.000107 - momentum: 0.000000
+ 2023-10-12 08:37:46,481 epoch 4 - iter 693/992 - loss 0.05887390 - time (sec): 352.26 - samples/sec: 325.28 - lr: 0.000105 - momentum: 0.000000
+ 2023-10-12 08:38:34,872 epoch 4 - iter 792/992 - loss 0.05887289 - time (sec): 400.65 - samples/sec: 326.32 - lr: 0.000103 - momentum: 0.000000
+ 2023-10-12 08:39:29,526 epoch 4 - iter 891/992 - loss 0.05755141 - time (sec): 455.31 - samples/sec: 324.49 - lr: 0.000102 - momentum: 0.000000
+ 2023-10-12 08:40:24,825 epoch 4 - iter 990/992 - loss 0.05782701 - time (sec): 510.61 - samples/sec: 320.69 - lr: 0.000100 - momentum: 0.000000
+ 2023-10-12 08:40:25,816 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 08:40:25,816 EPOCH 4 done: loss 0.0578 - lr: 0.000100
+ 2023-10-12 08:40:51,595 DEV : loss 0.09931203722953796 - f1-score (micro avg) 0.7623
+ 2023-10-12 08:40:51,635 saving best model
+ 2023-10-12 08:40:57,442 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 08:41:49,152 epoch 5 - iter 99/992 - loss 0.04436841 - time (sec): 51.71 - samples/sec: 312.44 - lr: 0.000098 - momentum: 0.000000
+ 2023-10-12 08:42:42,375 epoch 5 - iter 198/992 - loss 0.03706072 - time (sec): 104.93 - samples/sec: 308.20 - lr: 0.000097 - momentum: 0.000000
+ 2023-10-12 08:43:35,907 epoch 5 - iter 297/992 - loss 0.03821323 - time (sec): 158.46 - samples/sec: 307.02 - lr: 0.000095 - momentum: 0.000000
+ 2023-10-12 08:44:31,403 epoch 5 - iter 396/992 - loss 0.03912269 - time (sec): 213.96 - samples/sec: 304.26 - lr: 0.000093 - momentum: 0.000000
+ 2023-10-12 08:45:22,108 epoch 5 - iter 495/992 - loss 0.03917046 - time (sec): 264.66 - samples/sec: 307.19 - lr: 0.000092 - momentum: 0.000000
+ 2023-10-12 08:46:11,499 epoch 5 - iter 594/992 - loss 0.04022926 - time (sec): 314.05 - samples/sec: 311.81 - lr: 0.000090 - momentum: 0.000000
+ 2023-10-12 08:46:59,993 epoch 5 - iter 693/992 - loss 0.03989622 - time (sec): 362.55 - samples/sec: 315.65 - lr: 0.000088 - momentum: 0.000000
+ 2023-10-12 08:47:58,985 epoch 5 - iter 792/992 - loss 0.04056929 - time (sec): 421.54 - samples/sec: 311.55 - lr: 0.000087 - momentum: 0.000000
+ 2023-10-12 08:48:50,892 epoch 5 - iter 891/992 - loss 0.04088817 - time (sec): 473.45 - samples/sec: 312.27 - lr: 0.000085 - momentum: 0.000000
+ 2023-10-12 08:49:38,976 epoch 5 - iter 990/992 - loss 0.04156030 - time (sec): 521.53 - samples/sec: 313.73 - lr: 0.000083 - momentum: 0.000000
+ 2023-10-12 08:49:40,070 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 08:49:40,071 EPOCH 5 done: loss 0.0415 - lr: 0.000083
+ 2023-10-12 08:50:06,253 DEV : loss 0.11372340470552444 - f1-score (micro avg) 0.756
+ 2023-10-12 08:50:06,293 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 08:50:55,831 epoch 6 - iter 99/992 - loss 0.02534475 - time (sec): 49.54 - samples/sec: 316.24 - lr: 0.000082 - momentum: 0.000000
+ 2023-10-12 08:51:50,289 epoch 6 - iter 198/992 - loss 0.02728538 - time (sec): 103.99 - samples/sec: 307.91 - lr: 0.000080 - momentum: 0.000000
+ 2023-10-12 08:52:44,094 epoch 6 - iter 297/992 - loss 0.02693384 - time (sec): 157.80 - samples/sec: 305.30 - lr: 0.000078 - momentum: 0.000000
+ 2023-10-12 08:53:36,994 epoch 6 - iter 396/992 - loss 0.02900133 - time (sec): 210.70 - samples/sec: 309.30 - lr: 0.000077 - momentum: 0.000000
+ 2023-10-12 08:54:29,419 epoch 6 - iter 495/992 - loss 0.02831503 - time (sec): 263.12 - samples/sec: 308.36 - lr: 0.000075 - momentum: 0.000000
+ 2023-10-12 08:55:19,177 epoch 6 - iter 594/992 - loss 0.02808324 - time (sec): 312.88 - samples/sec: 312.68 - lr: 0.000073 - momentum: 0.000000
+ 2023-10-12 08:56:07,273 epoch 6 - iter 693/992 - loss 0.02892834 - time (sec): 360.98 - samples/sec: 317.73 - lr: 0.000072 - momentum: 0.000000
+ 2023-10-12 08:56:55,649 epoch 6 - iter 792/992 - loss 0.03069744 - time (sec): 409.35 - samples/sec: 319.35 - lr: 0.000070 - momentum: 0.000000
+ 2023-10-12 08:57:50,900 epoch 6 - iter 891/992 - loss 0.03124301 - time (sec): 464.60 - samples/sec: 317.11 - lr: 0.000068 - momentum: 0.000000
+ 2023-10-12 08:58:43,506 epoch 6 - iter 990/992 - loss 0.03144417 - time (sec): 517.21 - samples/sec: 316.34 - lr: 0.000067 - momentum: 0.000000
+ 2023-10-12 08:58:44,493 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 08:58:44,493 EPOCH 6 done: loss 0.0314 - lr: 0.000067
+ 2023-10-12 08:59:08,717 DEV : loss 0.13427288830280304 - f1-score (micro avg) 0.7743
+ 2023-10-12 08:59:08,761 saving best model
+ 2023-10-12 08:59:11,805 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 09:00:02,976 epoch 7 - iter 99/992 - loss 0.01884774 - time (sec): 51.17 - samples/sec: 318.71 - lr: 0.000065 - momentum: 0.000000
+ 2023-10-12 09:00:51,080 epoch 7 - iter 198/992 - loss 0.02301048 - time (sec): 99.27 - samples/sec: 332.12 - lr: 0.000063 - momentum: 0.000000
+ 2023-10-12 09:01:38,861 epoch 7 - iter 297/992 - loss 0.02287236 - time (sec): 147.05 - samples/sec: 332.81 - lr: 0.000062 - momentum: 0.000000
+ 2023-10-12 09:02:26,775 epoch 7 - iter 396/992 - loss 0.02348912 - time (sec): 194.97 - samples/sec: 336.71 - lr: 0.000060 - momentum: 0.000000
+ 2023-10-12 09:03:14,307 epoch 7 - iter 495/992 - loss 0.02304502 - time (sec): 242.50 - samples/sec: 336.33 - lr: 0.000058 - momentum: 0.000000
+ 2023-10-12 09:04:01,858 epoch 7 - iter 594/992 - loss 0.02320461 - time (sec): 290.05 - samples/sec: 337.37 - lr: 0.000057 - momentum: 0.000000
+ 2023-10-12 09:04:51,398 epoch 7 - iter 693/992 - loss 0.02410117 - time (sec): 339.59 - samples/sec: 337.92 - lr: 0.000055 - momentum: 0.000000
+ 2023-10-12 09:05:37,198 epoch 7 - iter 792/992 - loss 0.02467991 - time (sec): 385.39 - samples/sec: 336.38 - lr: 0.000053 - momentum: 0.000000
+ 2023-10-12 09:06:24,113 epoch 7 - iter 891/992 - loss 0.02411616 - time (sec): 432.30 - samples/sec: 338.45 - lr: 0.000052 - momentum: 0.000000
+ 2023-10-12 09:07:11,294 epoch 7 - iter 990/992 - loss 0.02385416 - time (sec): 479.48 - samples/sec: 341.20 - lr: 0.000050 - momentum: 0.000000
+ 2023-10-12 09:07:12,268 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 09:07:12,268 EPOCH 7 done: loss 0.0238 - lr: 0.000050
+ 2023-10-12 09:07:37,616 DEV : loss 0.16945815086364746 - f1-score (micro avg) 0.7625
+ 2023-10-12 09:07:37,658 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 09:08:25,636 epoch 8 - iter 99/992 - loss 0.01826581 - time (sec): 47.98 - samples/sec: 352.82 - lr: 0.000048 - momentum: 0.000000
+ 2023-10-12 09:09:15,818 epoch 8 - iter 198/992 - loss 0.01797263 - time (sec): 98.16 - samples/sec: 328.55 - lr: 0.000047 - momentum: 0.000000
+ 2023-10-12 09:10:08,897 epoch 8 - iter 297/992 - loss 0.01946032 - time (sec): 151.24 - samples/sec: 315.74 - lr: 0.000045 - momentum: 0.000000
+ 2023-10-12 09:10:58,922 epoch 8 - iter 396/992 - loss 0.01952599 - time (sec): 201.26 - samples/sec: 316.86 - lr: 0.000043 - momentum: 0.000000
+ 2023-10-12 09:11:52,850 epoch 8 - iter 495/992 - loss 0.01857797 - time (sec): 255.19 - samples/sec: 315.56 - lr: 0.000042 - momentum: 0.000000
+ 2023-10-12 09:12:45,735 epoch 8 - iter 594/992 - loss 0.02004084 - time (sec): 308.08 - samples/sec: 317.70 - lr: 0.000040 - momentum: 0.000000
+ 2023-10-12 09:13:36,327 epoch 8 - iter 693/992 - loss 0.02014586 - time (sec): 358.67 - samples/sec: 317.68 - lr: 0.000038 - momentum: 0.000000
+ 2023-10-12 09:14:30,138 epoch 8 - iter 792/992 - loss 0.01980906 - time (sec): 412.48 - samples/sec: 316.70 - lr: 0.000037 - momentum: 0.000000
+ 2023-10-12 09:15:21,623 epoch 8 - iter 891/992 - loss 0.02006957 - time (sec): 463.96 - samples/sec: 316.10 - lr: 0.000035 - momentum: 0.000000
+ 2023-10-12 09:16:09,180 epoch 8 - iter 990/992 - loss 0.01945183 - time (sec): 511.52 - samples/sec: 320.12 - lr: 0.000033 - momentum: 0.000000
+ 2023-10-12 09:16:10,068 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 09:16:10,068 EPOCH 8 done: loss 0.0194 - lr: 0.000033
+ 2023-10-12 09:16:34,568 DEV : loss 0.1777261346578598 - f1-score (micro avg) 0.7603
+ 2023-10-12 09:16:34,606 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 09:17:22,132 epoch 9 - iter 99/992 - loss 0.02532699 - time (sec): 47.52 - samples/sec: 362.18 - lr: 0.000032 - momentum: 0.000000
+ 2023-10-12 09:18:10,391 epoch 9 - iter 198/992 - loss 0.02156891 - time (sec): 95.78 - samples/sec: 351.32 - lr: 0.000030 - momentum: 0.000000
+ 2023-10-12 09:18:56,755 epoch 9 - iter 297/992 - loss 0.01810471 - time (sec): 142.15 - samples/sec: 356.12 - lr: 0.000028 - momentum: 0.000000
+ 2023-10-12 09:19:44,249 epoch 9 - iter 396/992 - loss 0.01799543 - time (sec): 189.64 - samples/sec: 348.96 - lr: 0.000027 - momentum: 0.000000
+ 2023-10-12 09:20:31,198 epoch 9 - iter 495/992 - loss 0.01649332 - time (sec): 236.59 - samples/sec: 349.44 - lr: 0.000025 - momentum: 0.000000
+ 2023-10-12 09:21:19,342 epoch 9 - iter 594/992 - loss 0.01544692 - time (sec): 284.73 - samples/sec: 347.94 - lr: 0.000023 - momentum: 0.000000
+ 2023-10-12 09:22:07,048 epoch 9 - iter 693/992 - loss 0.01478198 - time (sec): 332.44 - samples/sec: 348.11 - lr: 0.000022 - momentum: 0.000000
+ 2023-10-12 09:22:55,379 epoch 9 - iter 792/992 - loss 0.01567478 - time (sec): 380.77 - samples/sec: 344.64 - lr: 0.000020 - momentum: 0.000000
+ 2023-10-12 09:23:42,926 epoch 9 - iter 891/992 - loss 0.01599589 - time (sec): 428.32 - samples/sec: 344.01 - lr: 0.000018 - momentum: 0.000000
+ 2023-10-12 09:24:31,114 epoch 9 - iter 990/992 - loss 0.01558416 - time (sec): 476.51 - samples/sec: 343.42 - lr: 0.000017 - momentum: 0.000000
+ 2023-10-12 09:24:32,095 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 09:24:32,095 EPOCH 9 done: loss 0.0156 - lr: 0.000017
+ 2023-10-12 09:24:57,467 DEV : loss 0.18520045280456543 - f1-score (micro avg) 0.7619
+ 2023-10-12 09:24:57,511 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 09:25:46,427 epoch 10 - iter 99/992 - loss 0.01050991 - time (sec): 48.91 - samples/sec: 341.19 - lr: 0.000015 - momentum: 0.000000
+ 2023-10-12 09:26:34,054 epoch 10 - iter 198/992 - loss 0.01124717 - time (sec): 96.54 - samples/sec: 339.42 - lr: 0.000013 - momentum: 0.000000
+ 2023-10-12 09:27:22,983 epoch 10 - iter 297/992 - loss 0.01103351 - time (sec): 145.47 - samples/sec: 339.57 - lr: 0.000012 - momentum: 0.000000
+ 2023-10-12 09:28:15,052 epoch 10 - iter 396/992 - loss 0.01233967 - time (sec): 197.54 - samples/sec: 333.82 - lr: 0.000010 - momentum: 0.000000
+ 2023-10-12 09:29:10,897 epoch 10 - iter 495/992 - loss 0.01168010 - time (sec): 253.38 - samples/sec: 325.72 - lr: 0.000008 - momentum: 0.000000
+ 2023-10-12 09:30:07,424 epoch 10 - iter 594/992 - loss 0.01179804 - time (sec): 309.91 - samples/sec: 317.53 - lr: 0.000007 - momentum: 0.000000
+ 2023-10-12 09:31:02,969 epoch 10 - iter 693/992 - loss 0.01193022 - time (sec): 365.46 - samples/sec: 312.37 - lr: 0.000005 - momentum: 0.000000
+ 2023-10-12 09:31:59,360 epoch 10 - iter 792/992 - loss 0.01236357 - time (sec): 421.85 - samples/sec: 309.60 - lr: 0.000004 - momentum: 0.000000
+ 2023-10-12 09:32:52,374 epoch 10 - iter 891/992 - loss 0.01270245 - time (sec): 474.86 - samples/sec: 310.06 - lr: 0.000002 - momentum: 0.000000
+ 2023-10-12 09:33:40,701 epoch 10 - iter 990/992 - loss 0.01318705 - time (sec): 523.19 - samples/sec: 313.02 - lr: 0.000000 - momentum: 0.000000
+ 2023-10-12 09:33:41,597 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 09:33:41,597 EPOCH 10 done: loss 0.0134 - lr: 0.000000
+ 2023-10-12 09:34:08,746 DEV : loss 0.19303283095359802 - f1-score (micro avg) 0.7562
+ 2023-10-12 09:34:09,762 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 09:34:09,764 Loading model from best epoch ...
+ 2023-10-12 09:34:15,360 SequenceTagger predicts: Dictionary with 13 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG
+ 2023-10-12 09:34:40,369
+ Results:
+ - F-score (micro) 0.7486
+ - F-score (macro) 0.6567
+ - Accuracy 0.6255
+
+ By class:
+               precision    recall  f1-score   support
+
+          LOC     0.8082    0.8107    0.8095       655
+          PER     0.6795    0.7892    0.7303       223
+          ORG     0.5000    0.3780    0.4305       127
+
+    micro avg     0.7460    0.7512    0.7486      1005
+    macro avg     0.6626    0.6593    0.6567      1005
+ weighted avg     0.7407    0.7512    0.7440      1005
+
+ 2023-10-12 09:34:40,369 ----------------------------------------------------------------------------------------------------
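The `lr` values printed in the log are consistent with a linear warmup over the first 10% of updates followed by a linear decay to zero. The sketch below is an assumption about the shape of that schedule, not Flair's actual `LinearScheduler` code. The function names are hypothetical, and the constants are taken from the log: 992 iterations per epoch, 10 epochs, a peak learning rate of 0.00015, and a `warmup_fraction` of 0.1.

```python
def linear_warmup_decay_lr(step, peak_lr, total_steps, warmup_fraction):
    """Learning rate after `step` updates (1-based).

    Assumed shape: linear ramp from 0 to peak_lr during warmup,
    then linear decay from peak_lr to 0 over the remaining steps.
    """
    warmup_steps = round(total_steps * warmup_fraction)
    if step <= warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


# Constants read from the training log (7936 sentences / batch size 8 = 992).
ITERS_PER_EPOCH = 992
EPOCHS = 10
TOTAL_STEPS = ITERS_PER_EPOCH * EPOCHS
PEAK_LR = 0.00015
WARMUP_FRACTION = 0.1


def lr_at(epoch, iteration):
    """Learning rate at a given (1-based) epoch and iteration within that epoch."""
    step = (epoch - 1) * ITERS_PER_EPOCH + iteration
    return linear_warmup_decay_lr(step, PEAK_LR, TOTAL_STEPS, WARMUP_FRACTION)


for epoch, iteration in [(1, 99), (1, 990), (2, 99), (4, 990), (10, 990)]:
    print(f"epoch {epoch} - iter {iteration}/992 - lr: {lr_at(epoch, iteration):.6f}")
```

For the sampled iterations, the values formatted to six decimals agree with the `lr` column of the log (0.000015, 0.000150, 0.000148, 0.000100, and 0.000000). This also explains why warmup ends exactly at the end of epoch 1: 10% of 9920 updates is 992, one full epoch.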