stefan-it committed on
Commit
3280f8b
1 Parent(s): 3fa5a9e

Upload folder using huggingface_hub

best-model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:31c16296e221d2ca81eb429e0ee99ad5b572f2a976e628668113b8ef6fd899b7
+ size 870793839
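
These three lines are not the checkpoint itself but a Git LFS pointer (spec v1) that the Hub keeps in-repo in place of the large file. A minimal sketch of parsing such a pointer into its fields, with the pointer text from above inlined so it runs standalone:

```python
# Parse a Git LFS pointer file (version / oid / size) into a dict.
# The pointer text is copied from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:31c16296e221d2ca81eb429e0ee99ad5b572f2a976e628668113b8ef6fd899b7
size 870793839
"""

def parse_lfs_pointer(text: str) -> dict:
    # Each line is "key value"; oid is "algorithm:hex-digest".
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }

info = parse_lfs_pointer(POINTER)
print(info["size"])  # 870793839 bytes, i.e. the ~830 MiB best-model.pt
```

The same three-field layout applies to the `final-model.pt` and TensorBoard event-file pointers below.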
dev.tsv ADDED
The diff for this file is too large to render. See raw diff
 
final-model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:46b2f4e95ae3622feebe06f0b777b2acc66b96a1b4b879fc130ee9a43698cf27
+ size 870793956
loss.tsv ADDED
@@ -0,0 +1,11 @@
+ EPOCH TIMESTAMP LEARNING_RATE TRAIN_LOSS DEV_LOSS DEV_PRECISION DEV_RECALL DEV_F1 DEV_ACCURACY
+ 1 11:16:24 0.0001 0.8622 0.1457 0.6215 0.6482 0.6346 0.5018
+ 2 11:26:19 0.0001 0.1257 0.0886 0.7206 0.7353 0.7279 0.5936
+ 3 11:35:54 0.0001 0.0764 0.0982 0.7372 0.7805 0.7582 0.6313
+ 4 11:45:33 0.0001 0.0570 0.1236 0.7287 0.7839 0.7553 0.6271
+ 5 11:55:01 0.0001 0.0429 0.1341 0.7452 0.7941 0.7689 0.6464
+ 6 12:04:42 0.0001 0.0333 0.1606 0.7479 0.7952 0.7708 0.6455
+ 7 12:14:00 0.0001 0.0223 0.1936 0.7433 0.7896 0.7658 0.6415
+ 8 12:23:15 0.0000 0.0159 0.2119 0.7550 0.7738 0.7642 0.6393
+ 9 12:32:07 0.0000 0.0123 0.2116 0.7634 0.7885 0.7757 0.6551
+ 10 12:42:28 0.0000 0.0090 0.2183 0.7519 0.7817 0.7665 0.6440
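
The `loss.tsv` above is whitespace-separated and easy to inspect programmatically. A minimal plain-Python sketch (no pandas assumed) that picks the epoch with the best dev F1; an excerpt of the table is inlined so the snippet runs without the file:

```python
# Find the best epoch by DEV_F1 in a Flair-style loss.tsv.
# Excerpt of the table above (whitespace-separated, same header).
LOSS_TSV = """\
EPOCH TIMESTAMP LEARNING_RATE TRAIN_LOSS DEV_LOSS DEV_PRECISION DEV_RECALL DEV_F1 DEV_ACCURACY
1 11:16:24 0.0001 0.8622 0.1457 0.6215 0.6482 0.6346 0.5018
5 11:55:01 0.0001 0.0429 0.1341 0.7452 0.7941 0.7689 0.6464
9 12:32:07 0.0000 0.0123 0.2116 0.7634 0.7885 0.7757 0.6551
10 12:42:28 0.0000 0.0090 0.2183 0.7519 0.7817 0.7665 0.6440
"""

def best_epoch(tsv_text: str) -> tuple[int, float]:
    """Return (epoch, dev_f1) for the row with the highest DEV_F1."""
    lines = tsv_text.strip().splitlines()
    header = lines[0].split()
    f1_col = header.index("DEV_F1")
    best = max((row.split() for row in lines[1:]),
               key=lambda r: float(r[f1_col]))
    return int(best[0]), float(best[f1_col])

print(best_epoch(LOSS_TSV))  # (9, 0.7757) — matches the "saving best model" log
```

To use the real file, replace `LOSS_TSV` with `open("loss.tsv").read()`; note that epoch 9 is the last checkpoint the training log saves as `best-model.pt`.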
runs/events.out.tfevents.1697108791.de2e83fddbee.1952.2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:208552e0787d8e2af0bbfd51bffd1de02aa25b8214a405809543ed8e328616c7
+ size 1108164
test.tsv ADDED
The diff for this file is too large to render. See raw diff
 
training.log ADDED
@@ -0,0 +1,263 @@
+ 2023-10-12 11:06:31,480 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 11:06:31,482 Model: "SequenceTagger(
+   (embeddings): ByT5Embeddings(
+     (model): T5EncoderModel(
+       (shared): Embedding(384, 1472)
+       (encoder): T5Stack(
+         (embed_tokens): Embedding(384, 1472)
+         (block): ModuleList(
+           (0): T5Block(
+             (layer): ModuleList(
+               (0): T5LayerSelfAttention(
+                 (SelfAttention): T5Attention(
+                   (q): Linear(in_features=1472, out_features=384, bias=False)
+                   (k): Linear(in_features=1472, out_features=384, bias=False)
+                   (v): Linear(in_features=1472, out_features=384, bias=False)
+                   (o): Linear(in_features=384, out_features=1472, bias=False)
+                   (relative_attention_bias): Embedding(32, 6)
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (1): T5LayerFF(
+                 (DenseReluDense): T5DenseGatedActDense(
+                   (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wo): Linear(in_features=3584, out_features=1472, bias=False)
+                   (dropout): Dropout(p=0.1, inplace=False)
+                   (act): NewGELUActivation()
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+           )
+           (1-11): 11 x T5Block(
+             (layer): ModuleList(
+               (0): T5LayerSelfAttention(
+                 (SelfAttention): T5Attention(
+                   (q): Linear(in_features=1472, out_features=384, bias=False)
+                   (k): Linear(in_features=1472, out_features=384, bias=False)
+                   (v): Linear(in_features=1472, out_features=384, bias=False)
+                   (o): Linear(in_features=384, out_features=1472, bias=False)
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (1): T5LayerFF(
+                 (DenseReluDense): T5DenseGatedActDense(
+                   (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wo): Linear(in_features=3584, out_features=1472, bias=False)
+                   (dropout): Dropout(p=0.1, inplace=False)
+                   (act): NewGELUActivation()
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+           )
+         )
+         (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+         (dropout): Dropout(p=0.1, inplace=False)
+       )
+     )
+   )
+   (locked_dropout): LockedDropout(p=0.5)
+   (linear): Linear(in_features=1472, out_features=13, bias=True)
+   (loss_function): CrossEntropyLoss()
+ )"
+ 2023-10-12 11:06:31,482 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 11:06:31,482 MultiCorpus: 7936 train + 992 dev + 992 test sentences
+  - NER_ICDAR_EUROPEANA Corpus: 7936 train + 992 dev + 992 test sentences - /root/.flair/datasets/ner_icdar_europeana/fr
+ 2023-10-12 11:06:31,482 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 11:06:31,482 Train:  7936 sentences
+ 2023-10-12 11:06:31,483         (train_with_dev=False, train_with_test=False)
+ 2023-10-12 11:06:31,483 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 11:06:31,483 Training Params:
+ 2023-10-12 11:06:31,483  - learning_rate: "0.00015"
+ 2023-10-12 11:06:31,483  - mini_batch_size: "4"
+ 2023-10-12 11:06:31,483  - max_epochs: "10"
+ 2023-10-12 11:06:31,483  - shuffle: "True"
+ 2023-10-12 11:06:31,483 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 11:06:31,483 Plugins:
+ 2023-10-12 11:06:31,483  - TensorboardLogger
+ 2023-10-12 11:06:31,483  - LinearScheduler | warmup_fraction: '0.1'
+ 2023-10-12 11:06:31,483 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 11:06:31,483 Final evaluation on model from best epoch (best-model.pt)
+ 2023-10-12 11:06:31,483  - metric: "('micro avg', 'f1-score')"
+ 2023-10-12 11:06:31,484 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 11:06:31,484 Computation:
+ 2023-10-12 11:06:31,484  - compute on device: cuda:0
+ 2023-10-12 11:06:31,484  - embedding storage: none
+ 2023-10-12 11:06:31,484 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 11:06:31,484 Model training base path: "hmbench-icdar/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-1"
+ 2023-10-12 11:06:31,484 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 11:06:31,484 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 11:06:31,484 Logging anything other than scalars to TensorBoard is currently not supported.
+ 2023-10-12 11:07:26,022 epoch 1 - iter 198/1984 - loss 2.57999255 - time (sec): 54.54 - samples/sec: 283.61 - lr: 0.000015 - momentum: 0.000000
+ 2023-10-12 11:08:22,459 epoch 1 - iter 396/1984 - loss 2.44981053 - time (sec): 110.97 - samples/sec: 284.09 - lr: 0.000030 - momentum: 0.000000
+ 2023-10-12 11:09:15,722 epoch 1 - iter 594/1984 - loss 2.13411946 - time (sec): 164.24 - samples/sec: 291.69 - lr: 0.000045 - momentum: 0.000000
+ 2023-10-12 11:10:10,549 epoch 1 - iter 792/1984 - loss 1.79298227 - time (sec): 219.06 - samples/sec: 293.62 - lr: 0.000060 - momentum: 0.000000
+ 2023-10-12 11:11:04,778 epoch 1 - iter 990/1984 - loss 1.52165692 - time (sec): 273.29 - samples/sec: 293.74 - lr: 0.000075 - momentum: 0.000000
+ 2023-10-12 11:12:01,502 epoch 1 - iter 1188/1984 - loss 1.30787581 - time (sec): 330.02 - samples/sec: 293.94 - lr: 0.000090 - momentum: 0.000000
+ 2023-10-12 11:13:02,102 epoch 1 - iter 1386/1984 - loss 1.14333260 - time (sec): 390.62 - samples/sec: 292.77 - lr: 0.000105 - momentum: 0.000000
+ 2023-10-12 11:14:03,282 epoch 1 - iter 1584/1984 - loss 1.02809814 - time (sec): 451.80 - samples/sec: 290.09 - lr: 0.000120 - momentum: 0.000000
+ 2023-10-12 11:14:59,314 epoch 1 - iter 1782/1984 - loss 0.93905606 - time (sec): 507.83 - samples/sec: 290.13 - lr: 0.000135 - momentum: 0.000000
+ 2023-10-12 11:15:57,372 epoch 1 - iter 1980/1984 - loss 0.86344728 - time (sec): 565.89 - samples/sec: 289.24 - lr: 0.000150 - momentum: 0.000000
+ 2023-10-12 11:15:58,552 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 11:15:58,552 EPOCH 1 done: loss 0.8622 - lr: 0.000150
+ 2023-10-12 11:16:24,769 DEV : loss 0.14566047489643097 - f1-score (micro avg)  0.6346
+ 2023-10-12 11:16:24,809 saving best model
+ 2023-10-12 11:16:25,688 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 11:17:20,017 epoch 2 - iter 198/1984 - loss 0.16983836 - time (sec): 54.33 - samples/sec: 303.20 - lr: 0.000148 - momentum: 0.000000
+ 2023-10-12 11:18:13,216 epoch 2 - iter 396/1984 - loss 0.14876062 - time (sec): 107.53 - samples/sec: 304.03 - lr: 0.000147 - momentum: 0.000000
+ 2023-10-12 11:19:12,234 epoch 2 - iter 594/1984 - loss 0.14225847 - time (sec): 166.54 - samples/sec: 294.50 - lr: 0.000145 - momentum: 0.000000
+ 2023-10-12 11:20:16,804 epoch 2 - iter 792/1984 - loss 0.14082827 - time (sec): 231.11 - samples/sec: 284.38 - lr: 0.000143 - momentum: 0.000000
+ 2023-10-12 11:21:13,372 epoch 2 - iter 990/1984 - loss 0.13649551 - time (sec): 287.68 - samples/sec: 286.33 - lr: 0.000142 - momentum: 0.000000
+ 2023-10-12 11:22:09,716 epoch 2 - iter 1188/1984 - loss 0.13419348 - time (sec): 344.03 - samples/sec: 286.44 - lr: 0.000140 - momentum: 0.000000
+ 2023-10-12 11:23:05,261 epoch 2 - iter 1386/1984 - loss 0.13452875 - time (sec): 399.57 - samples/sec: 288.02 - lr: 0.000138 - momentum: 0.000000
+ 2023-10-12 11:24:01,182 epoch 2 - iter 1584/1984 - loss 0.13063731 - time (sec): 455.49 - samples/sec: 287.18 - lr: 0.000137 - momentum: 0.000000
+ 2023-10-12 11:24:57,266 epoch 2 - iter 1782/1984 - loss 0.12821277 - time (sec): 511.58 - samples/sec: 287.32 - lr: 0.000135 - momentum: 0.000000
+ 2023-10-12 11:25:50,691 epoch 2 - iter 1980/1984 - loss 0.12598244 - time (sec): 565.00 - samples/sec: 289.39 - lr: 0.000133 - momentum: 0.000000
+ 2023-10-12 11:25:51,942 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 11:25:51,942 EPOCH 2 done: loss 0.1257 - lr: 0.000133
+ 2023-10-12 11:26:18,959 DEV : loss 0.08862575143575668 - f1-score (micro avg)  0.7279
+ 2023-10-12 11:26:19,000 saving best model
+ 2023-10-12 11:26:21,592 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 11:27:15,300 epoch 3 - iter 198/1984 - loss 0.08035378 - time (sec): 53.70 - samples/sec: 293.08 - lr: 0.000132 - momentum: 0.000000
+ 2023-10-12 11:28:07,651 epoch 3 - iter 396/1984 - loss 0.08045312 - time (sec): 106.05 - samples/sec: 301.32 - lr: 0.000130 - momentum: 0.000000
+ 2023-10-12 11:29:04,386 epoch 3 - iter 594/1984 - loss 0.08136632 - time (sec): 162.79 - samples/sec: 299.79 - lr: 0.000128 - momentum: 0.000000
+ 2023-10-12 11:29:59,889 epoch 3 - iter 792/1984 - loss 0.08074706 - time (sec): 218.29 - samples/sec: 298.62 - lr: 0.000127 - momentum: 0.000000
+ 2023-10-12 11:30:55,332 epoch 3 - iter 990/1984 - loss 0.07862473 - time (sec): 273.73 - samples/sec: 297.34 - lr: 0.000125 - momentum: 0.000000
+ 2023-10-12 11:31:48,283 epoch 3 - iter 1188/1984 - loss 0.07766833 - time (sec): 326.69 - samples/sec: 298.33 - lr: 0.000123 - momentum: 0.000000
+ 2023-10-12 11:32:41,000 epoch 3 - iter 1386/1984 - loss 0.07860726 - time (sec): 379.40 - samples/sec: 298.38 - lr: 0.000122 - momentum: 0.000000
+ 2023-10-12 11:33:38,620 epoch 3 - iter 1584/1984 - loss 0.07716988 - time (sec): 437.02 - samples/sec: 299.44 - lr: 0.000120 - momentum: 0.000000
+ 2023-10-12 11:34:33,138 epoch 3 - iter 1782/1984 - loss 0.07591753 - time (sec): 491.54 - samples/sec: 300.57 - lr: 0.000118 - momentum: 0.000000
+ 2023-10-12 11:35:27,361 epoch 3 - iter 1980/1984 - loss 0.07646179 - time (sec): 545.76 - samples/sec: 299.93 - lr: 0.000117 - momentum: 0.000000
+ 2023-10-12 11:35:28,378 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 11:35:28,378 EPOCH 3 done: loss 0.0764 - lr: 0.000117
+ 2023-10-12 11:35:54,450 DEV : loss 0.09820140898227692 - f1-score (micro avg)  0.7582
+ 2023-10-12 11:35:54,501 saving best model
+ 2023-10-12 11:35:57,216 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 11:36:54,454 epoch 4 - iter 198/1984 - loss 0.05107641 - time (sec): 57.23 - samples/sec: 299.51 - lr: 0.000115 - momentum: 0.000000
+ 2023-10-12 11:37:49,239 epoch 4 - iter 396/1984 - loss 0.05351624 - time (sec): 112.01 - samples/sec: 304.06 - lr: 0.000113 - momentum: 0.000000
+ 2023-10-12 11:38:41,612 epoch 4 - iter 594/1984 - loss 0.05652355 - time (sec): 164.39 - samples/sec: 304.03 - lr: 0.000112 - momentum: 0.000000
+ 2023-10-12 11:39:34,981 epoch 4 - iter 792/1984 - loss 0.05647937 - time (sec): 217.76 - samples/sec: 301.77 - lr: 0.000110 - momentum: 0.000000
+ 2023-10-12 11:40:27,996 epoch 4 - iter 990/1984 - loss 0.05659483 - time (sec): 270.77 - samples/sec: 302.74 - lr: 0.000108 - momentum: 0.000000
+ 2023-10-12 11:41:21,031 epoch 4 - iter 1188/1984 - loss 0.05624160 - time (sec): 323.81 - samples/sec: 302.79 - lr: 0.000107 - momentum: 0.000000
+ 2023-10-12 11:42:19,315 epoch 4 - iter 1386/1984 - loss 0.05573244 - time (sec): 382.09 - samples/sec: 299.89 - lr: 0.000105 - momentum: 0.000000
+ 2023-10-12 11:43:14,049 epoch 4 - iter 1584/1984 - loss 0.05755650 - time (sec): 436.82 - samples/sec: 299.30 - lr: 0.000103 - momentum: 0.000000
+ 2023-10-12 11:44:08,504 epoch 4 - iter 1782/1984 - loss 0.05629255 - time (sec): 491.28 - samples/sec: 300.73 - lr: 0.000102 - momentum: 0.000000
+ 2023-10-12 11:45:04,669 epoch 4 - iter 1980/1984 - loss 0.05700810 - time (sec): 547.44 - samples/sec: 299.11 - lr: 0.000100 - momentum: 0.000000
+ 2023-10-12 11:45:05,749 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 11:45:05,749 EPOCH 4 done: loss 0.0570 - lr: 0.000100
+ 2023-10-12 11:45:33,306 DEV : loss 0.12358254194259644 - f1-score (micro avg)  0.7553
+ 2023-10-12 11:45:33,357 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 11:46:26,712 epoch 5 - iter 198/1984 - loss 0.04324754 - time (sec): 53.35 - samples/sec: 302.80 - lr: 0.000098 - momentum: 0.000000
+ 2023-10-12 11:47:19,893 epoch 5 - iter 396/1984 - loss 0.03649823 - time (sec): 106.53 - samples/sec: 303.56 - lr: 0.000097 - momentum: 0.000000
+ 2023-10-12 11:48:13,499 epoch 5 - iter 594/1984 - loss 0.03805430 - time (sec): 160.14 - samples/sec: 303.80 - lr: 0.000095 - momentum: 0.000000
+ 2023-10-12 11:49:05,591 epoch 5 - iter 792/1984 - loss 0.03840236 - time (sec): 212.23 - samples/sec: 306.73 - lr: 0.000093 - momentum: 0.000000
+ 2023-10-12 11:49:59,452 epoch 5 - iter 990/1984 - loss 0.03877875 - time (sec): 266.09 - samples/sec: 305.54 - lr: 0.000092 - momentum: 0.000000
+ 2023-10-12 11:50:53,853 epoch 5 - iter 1188/1984 - loss 0.04135513 - time (sec): 320.49 - samples/sec: 305.54 - lr: 0.000090 - momentum: 0.000000
+ 2023-10-12 11:51:50,814 epoch 5 - iter 1386/1984 - loss 0.04131838 - time (sec): 377.45 - samples/sec: 303.18 - lr: 0.000088 - momentum: 0.000000
+ 2023-10-12 11:52:45,024 epoch 5 - iter 1584/1984 - loss 0.04180636 - time (sec): 431.66 - samples/sec: 304.24 - lr: 0.000087 - momentum: 0.000000
+ 2023-10-12 11:53:37,845 epoch 5 - iter 1782/1984 - loss 0.04178494 - time (sec): 484.49 - samples/sec: 305.16 - lr: 0.000085 - momentum: 0.000000
+ 2023-10-12 11:54:32,459 epoch 5 - iter 1980/1984 - loss 0.04294845 - time (sec): 539.10 - samples/sec: 303.51 - lr: 0.000083 - momentum: 0.000000
+ 2023-10-12 11:54:33,704 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 11:54:33,704 EPOCH 5 done: loss 0.0429 - lr: 0.000083
+ 2023-10-12 11:55:01,009 DEV : loss 0.13407257199287415 - f1-score (micro avg)  0.7689
+ 2023-10-12 11:55:01,055 saving best model
+ 2023-10-12 11:55:03,705 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 11:55:59,952 epoch 6 - iter 198/1984 - loss 0.03113160 - time (sec): 56.24 - samples/sec: 278.53 - lr: 0.000082 - momentum: 0.000000
+ 2023-10-12 11:56:56,451 epoch 6 - iter 396/1984 - loss 0.02887060 - time (sec): 112.74 - samples/sec: 284.02 - lr: 0.000080 - momentum: 0.000000
+ 2023-10-12 11:57:53,189 epoch 6 - iter 594/1984 - loss 0.02945609 - time (sec): 169.48 - samples/sec: 284.26 - lr: 0.000078 - momentum: 0.000000
+ 2023-10-12 11:58:50,010 epoch 6 - iter 792/1984 - loss 0.02963186 - time (sec): 226.30 - samples/sec: 287.98 - lr: 0.000077 - momentum: 0.000000
+ 2023-10-12 11:59:45,930 epoch 6 - iter 990/1984 - loss 0.02807978 - time (sec): 282.22 - samples/sec: 287.49 - lr: 0.000075 - momentum: 0.000000
+ 2023-10-12 12:00:42,181 epoch 6 - iter 1188/1984 - loss 0.02971360 - time (sec): 338.47 - samples/sec: 289.04 - lr: 0.000073 - momentum: 0.000000
+ 2023-10-12 12:01:34,537 epoch 6 - iter 1386/1984 - loss 0.02997489 - time (sec): 390.83 - samples/sec: 293.47 - lr: 0.000072 - momentum: 0.000000
+ 2023-10-12 12:02:27,693 epoch 6 - iter 1584/1984 - loss 0.03200441 - time (sec): 443.98 - samples/sec: 294.44 - lr: 0.000070 - momentum: 0.000000
+ 2023-10-12 12:03:23,539 epoch 6 - iter 1782/1984 - loss 0.03269817 - time (sec): 499.83 - samples/sec: 294.76 - lr: 0.000068 - momentum: 0.000000
+ 2023-10-12 12:04:15,284 epoch 6 - iter 1980/1984 - loss 0.03337058 - time (sec): 551.57 - samples/sec: 296.63 - lr: 0.000067 - momentum: 0.000000
+ 2023-10-12 12:04:16,359 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 12:04:16,359 EPOCH 6 done: loss 0.0333 - lr: 0.000067
+ 2023-10-12 12:04:42,197 DEV : loss 0.1605551391839981 - f1-score (micro avg)  0.7708
+ 2023-10-12 12:04:42,242 saving best model
+ 2023-10-12 12:04:44,860 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 12:05:38,725 epoch 7 - iter 198/1984 - loss 0.01630351 - time (sec): 53.86 - samples/sec: 302.76 - lr: 0.000065 - momentum: 0.000000
+ 2023-10-12 12:06:32,943 epoch 7 - iter 396/1984 - loss 0.02088592 - time (sec): 108.08 - samples/sec: 305.05 - lr: 0.000063 - momentum: 0.000000
+ 2023-10-12 12:07:25,178 epoch 7 - iter 594/1984 - loss 0.02089722 - time (sec): 160.31 - samples/sec: 305.28 - lr: 0.000062 - momentum: 0.000000
+ 2023-10-12 12:08:19,504 epoch 7 - iter 792/1984 - loss 0.02127973 - time (sec): 214.64 - samples/sec: 305.85 - lr: 0.000060 - momentum: 0.000000
+ 2023-10-12 12:09:12,089 epoch 7 - iter 990/1984 - loss 0.02061894 - time (sec): 267.23 - samples/sec: 305.21 - lr: 0.000058 - momentum: 0.000000
+ 2023-10-12 12:10:03,837 epoch 7 - iter 1188/1984 - loss 0.02132952 - time (sec): 318.97 - samples/sec: 306.78 - lr: 0.000057 - momentum: 0.000000
+ 2023-10-12 12:10:56,791 epoch 7 - iter 1386/1984 - loss 0.02181938 - time (sec): 371.93 - samples/sec: 308.54 - lr: 0.000055 - momentum: 0.000000
+ 2023-10-12 12:11:48,135 epoch 7 - iter 1584/1984 - loss 0.02244607 - time (sec): 423.27 - samples/sec: 306.27 - lr: 0.000053 - momentum: 0.000000
+ 2023-10-12 12:12:41,566 epoch 7 - iter 1782/1984 - loss 0.02208528 - time (sec): 476.70 - samples/sec: 306.92 - lr: 0.000052 - momentum: 0.000000
+ 2023-10-12 12:13:34,953 epoch 7 - iter 1980/1984 - loss 0.02228252 - time (sec): 530.09 - samples/sec: 308.63 - lr: 0.000050 - momentum: 0.000000
+ 2023-10-12 12:13:36,037 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 12:13:36,037 EPOCH 7 done: loss 0.0223 - lr: 0.000050
+ 2023-10-12 12:14:00,840 DEV : loss 0.193552166223526 - f1-score (micro avg)  0.7658
+ 2023-10-12 12:14:00,880 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 12:14:53,000 epoch 8 - iter 198/1984 - loss 0.01942802 - time (sec): 52.12 - samples/sec: 324.79 - lr: 0.000048 - momentum: 0.000000
+ 2023-10-12 12:15:44,809 epoch 8 - iter 396/1984 - loss 0.01739382 - time (sec): 103.93 - samples/sec: 310.32 - lr: 0.000047 - momentum: 0.000000
+ 2023-10-12 12:16:38,516 epoch 8 - iter 594/1984 - loss 0.01651369 - time (sec): 157.63 - samples/sec: 302.92 - lr: 0.000045 - momentum: 0.000000
+ 2023-10-12 12:17:34,341 epoch 8 - iter 792/1984 - loss 0.01549617 - time (sec): 213.46 - samples/sec: 298.75 - lr: 0.000043 - momentum: 0.000000
+ 2023-10-12 12:18:27,661 epoch 8 - iter 990/1984 - loss 0.01499724 - time (sec): 266.78 - samples/sec: 301.86 - lr: 0.000042 - momentum: 0.000000
+ 2023-10-12 12:19:19,537 epoch 8 - iter 1188/1984 - loss 0.01661391 - time (sec): 318.65 - samples/sec: 307.15 - lr: 0.000040 - momentum: 0.000000
+ 2023-10-12 12:20:11,598 epoch 8 - iter 1386/1984 - loss 0.01696979 - time (sec): 370.72 - samples/sec: 307.35 - lr: 0.000038 - momentum: 0.000000
+ 2023-10-12 12:21:05,660 epoch 8 - iter 1584/1984 - loss 0.01658658 - time (sec): 424.78 - samples/sec: 307.53 - lr: 0.000037 - momentum: 0.000000
+ 2023-10-12 12:21:59,556 epoch 8 - iter 1782/1984 - loss 0.01660014 - time (sec): 478.67 - samples/sec: 306.38 - lr: 0.000035 - momentum: 0.000000
+ 2023-10-12 12:22:50,214 epoch 8 - iter 1980/1984 - loss 0.01587470 - time (sec): 529.33 - samples/sec: 309.34 - lr: 0.000033 - momentum: 0.000000
+ 2023-10-12 12:22:51,156 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 12:22:51,156 EPOCH 8 done: loss 0.0159 - lr: 0.000033
+ 2023-10-12 12:23:15,305 DEV : loss 0.21188023686408997 - f1-score (micro avg)  0.7642
+ 2023-10-12 12:23:15,344 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 12:24:05,207 epoch 9 - iter 198/1984 - loss 0.01503978 - time (sec): 49.86 - samples/sec: 345.20 - lr: 0.000032 - momentum: 0.000000
+ 2023-10-12 12:24:55,182 epoch 9 - iter 396/1984 - loss 0.01537065 - time (sec): 99.84 - samples/sec: 337.06 - lr: 0.000030 - momentum: 0.000000
+ 2023-10-12 12:25:45,441 epoch 9 - iter 594/1984 - loss 0.01380477 - time (sec): 150.09 - samples/sec: 337.26 - lr: 0.000028 - momentum: 0.000000
+ 2023-10-12 12:26:34,719 epoch 9 - iter 792/1984 - loss 0.01424107 - time (sec): 199.37 - samples/sec: 331.93 - lr: 0.000027 - momentum: 0.000000
+ 2023-10-12 12:27:24,733 epoch 9 - iter 990/1984 - loss 0.01368776 - time (sec): 249.39 - samples/sec: 331.51 - lr: 0.000025 - momentum: 0.000000
+ 2023-10-12 12:28:16,126 epoch 9 - iter 1188/1984 - loss 0.01282396 - time (sec): 300.78 - samples/sec: 329.38 - lr: 0.000023 - momentum: 0.000000
+ 2023-10-12 12:29:07,987 epoch 9 - iter 1386/1984 - loss 0.01240754 - time (sec): 352.64 - samples/sec: 328.17 - lr: 0.000022 - momentum: 0.000000
+ 2023-10-12 12:29:58,734 epoch 9 - iter 1584/1984 - loss 0.01272824 - time (sec): 403.39 - samples/sec: 325.31 - lr: 0.000020 - momentum: 0.000000
+ 2023-10-12 12:30:49,736 epoch 9 - iter 1782/1984 - loss 0.01309551 - time (sec): 454.39 - samples/sec: 324.27 - lr: 0.000018 - momentum: 0.000000
+ 2023-10-12 12:31:40,744 epoch 9 - iter 1980/1984 - loss 0.01231972 - time (sec): 505.40 - samples/sec: 323.79 - lr: 0.000017 - momentum: 0.000000
+ 2023-10-12 12:31:41,792 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 12:31:41,793 EPOCH 9 done: loss 0.0123 - lr: 0.000017
+ 2023-10-12 12:32:07,050 DEV : loss 0.21163929998874664 - f1-score (micro avg)  0.7757
+ 2023-10-12 12:32:07,090 saving best model
+ 2023-10-12 12:32:09,671 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 12:33:08,394 epoch 10 - iter 198/1984 - loss 0.00652323 - time (sec): 58.72 - samples/sec: 284.22 - lr: 0.000015 - momentum: 0.000000
+ 2023-10-12 12:34:09,493 epoch 10 - iter 396/1984 - loss 0.00848912 - time (sec): 119.82 - samples/sec: 273.48 - lr: 0.000013 - momentum: 0.000000
+ 2023-10-12 12:35:07,187 epoch 10 - iter 594/1984 - loss 0.00744853 - time (sec): 177.51 - samples/sec: 278.27 - lr: 0.000012 - momentum: 0.000000
+ 2023-10-12 12:36:07,684 epoch 10 - iter 792/1984 - loss 0.00864282 - time (sec): 238.01 - samples/sec: 277.06 - lr: 0.000010 - momentum: 0.000000
+ 2023-10-12 12:37:07,295 epoch 10 - iter 990/1984 - loss 0.00791604 - time (sec): 297.62 - samples/sec: 277.30 - lr: 0.000008 - momentum: 0.000000
+ 2023-10-12 12:38:04,898 epoch 10 - iter 1188/1984 - loss 0.00821374 - time (sec): 355.22 - samples/sec: 277.02 - lr: 0.000007 - momentum: 0.000000
+ 2023-10-12 12:39:04,148 epoch 10 - iter 1386/1984 - loss 0.00830391 - time (sec): 414.47 - samples/sec: 275.43 - lr: 0.000005 - momentum: 0.000000
+ 2023-10-12 12:40:03,881 epoch 10 - iter 1584/1984 - loss 0.00845608 - time (sec): 474.21 - samples/sec: 275.41 - lr: 0.000003 - momentum: 0.000000
+ 2023-10-12 12:41:01,804 epoch 10 - iter 1782/1984 - loss 0.00817011 - time (sec): 532.13 - samples/sec: 276.69 - lr: 0.000002 - momentum: 0.000000
+ 2023-10-12 12:41:59,075 epoch 10 - iter 1980/1984 - loss 0.00863686 - time (sec): 589.40 - samples/sec: 277.86 - lr: 0.000000 - momentum: 0.000000
+ 2023-10-12 12:42:00,374 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 12:42:00,374 EPOCH 10 done: loss 0.0090 - lr: 0.000000
+ 2023-10-12 12:42:28,840 DEV : loss 0.2182641178369522 - f1-score (micro avg)  0.7665
+ 2023-10-12 12:42:29,907 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 12:42:29,909 Loading model from best epoch ...
+ 2023-10-12 12:42:33,658 SequenceTagger predicts: Dictionary with 13 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG
+ 2023-10-12 12:42:58,091 
+ Results:
+ - F-score (micro) 0.7533
+ - F-score (macro) 0.6701
+ - Accuracy 0.6309
+ 
+ By class:
+                precision    recall  f1-score   support
+ 
+          LOC     0.8056    0.8290    0.8172       655
+          PER     0.6577    0.7668    0.7081       223
+          ORG     0.5278    0.4488    0.4851       127
+ 
+    micro avg     0.7399    0.7672    0.7533      1005
+    macro avg     0.6637    0.6815    0.6701      1005
+ weighted avg     0.7377    0.7672    0.7510      1005
+ 
+ 2023-10-12 12:42:58,091 ----------------------------------------------------------------------------------------------------
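
As a sanity check on the report above: the "weighted avg" row is the support-weighted mean of the per-class scores. A minimal sketch reproducing the weighted F1 from the per-class values in the table:

```python
# Reproduce the "weighted avg" f1-score from the per-class rows above.
classes = {            # (f1-score, support), copied from the test report
    "LOC": (0.8172, 655),
    "PER": (0.7081, 223),
    "ORG": (0.4851, 127),
}

total_support = sum(n for _, n in classes.values())          # 1005
weighted_f1 = sum(f1 * n for f1, n in classes.values()) / total_support

print(round(weighted_f1, 4))  # ≈ 0.7510, matching the report
```

The macro average, by contrast, is the unweighted mean of the three class F1 scores, and the micro average is computed from the pooled true/false positive counts rather than from the per-class scores.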