stefan-it committed
Commit 01f75c1
1 Parent(s): 04e4782

Upload folder using huggingface_hub
best-model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ed9cca8762b4490f318017c38ed7f254dbfc6a457a123a0bb919242985532d0b
+ size 870817519
dev.tsv ADDED
The diff for this file is too large to render. See raw diff
 
final-model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:aecf01e582cfc65adf73bd987b1e9faa21281e81f01e76f31bba4858f80c564e
+ size 870817636
loss.tsv ADDED
@@ -0,0 +1,11 @@
+ EPOCH TIMESTAMP LEARNING_RATE TRAIN_LOSS DEV_LOSS DEV_PRECISION DEV_RECALL DEV_F1 DEV_ACCURACY
+ 1 05:11:06 0.0001 1.1525 0.2012 0.5061 0.5061 0.5061 0.3773
+ 2 05:20:14 0.0001 0.1372 0.1166 0.7241 0.7891 0.7552 0.6223
+ 3 05:30:11 0.0001 0.0754 0.1217 0.7616 0.8041 0.7823 0.6626
+ 4 05:39:55 0.0001 0.0520 0.1584 0.7871 0.8000 0.7935 0.6759
+ 5 05:50:15 0.0001 0.0398 0.1628 0.8057 0.8122 0.8089 0.6950
+ 6 05:59:32 0.0001 0.0283 0.1736 0.7937 0.8218 0.8075 0.6919
+ 7 06:08:46 0.0001 0.0204 0.1936 0.7940 0.8286 0.8109 0.6960
+ 8 06:18:42 0.0000 0.0161 0.2112 0.7958 0.8163 0.8059 0.6889
+ 9 06:28:04 0.0000 0.0112 0.2223 0.7894 0.8109 0.8000 0.6811
+ 10 06:37:23 0.0000 0.0082 0.2234 0.7822 0.8109 0.7963 0.6742
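The loss.tsv above is a per-epoch summary of the run; the epoch with the highest DEV_F1 is the one kept as best-model.pt. A small sketch of picking that row programmatically (the table is embedded inline here for self-containment):

```python
# Parse the loss.tsv contents shown above (whitespace-separated) and find
# the epoch with the best dev F1 -- the checkpoint saved as best-model.pt.
LOSS_TSV = """\
EPOCH TIMESTAMP LEARNING_RATE TRAIN_LOSS DEV_LOSS DEV_PRECISION DEV_RECALL DEV_F1 DEV_ACCURACY
1 05:11:06 0.0001 1.1525 0.2012 0.5061 0.5061 0.5061 0.3773
2 05:20:14 0.0001 0.1372 0.1166 0.7241 0.7891 0.7552 0.6223
3 05:30:11 0.0001 0.0754 0.1217 0.7616 0.8041 0.7823 0.6626
4 05:39:55 0.0001 0.0520 0.1584 0.7871 0.8000 0.7935 0.6759
5 05:50:15 0.0001 0.0398 0.1628 0.8057 0.8122 0.8089 0.6950
6 05:59:32 0.0001 0.0283 0.1736 0.7937 0.8218 0.8075 0.6919
7 06:08:46 0.0001 0.0204 0.1936 0.7940 0.8286 0.8109 0.6960
8 06:18:42 0.0000 0.0161 0.2112 0.7958 0.8163 0.8059 0.6889
9 06:28:04 0.0000 0.0112 0.2223 0.7894 0.8109 0.8000 0.6811
10 06:37:23 0.0000 0.0082 0.2234 0.7822 0.8109 0.7963 0.6742
"""

header, *rows = [line.split() for line in LOSS_TSV.strip().splitlines()]
f1_col = header.index("DEV_F1")
best = max(rows, key=lambda row: float(row[f1_col]))
print(f"best epoch: {best[0]} (dev F1 = {best[f1_col]})")  # epoch 7, F1 0.8109
```

This matches the training log below: "saving best model" appears for the last time after epoch 7 (dev F1 0.8109), after which dev F1 declines while train loss keeps falling.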
runs/events.out.tfevents.1697000494.de2e83fddbee.1120.6 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3a49527bf1d0126a0fa9605a669edb1770af42b53f312956f3a81bcccd049ae8
+ size 999862
test.tsv ADDED
The diff for this file is too large to render. See raw diff
 
training.log ADDED
@@ -0,0 +1,264 @@
+ 2023-10-11 05:01:34,727 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 05:01:34,729 Model: "SequenceTagger(
+   (embeddings): ByT5Embeddings(
+     (model): T5EncoderModel(
+       (shared): Embedding(384, 1472)
+       (encoder): T5Stack(
+         (embed_tokens): Embedding(384, 1472)
+         (block): ModuleList(
+           (0): T5Block(
+             (layer): ModuleList(
+               (0): T5LayerSelfAttention(
+                 (SelfAttention): T5Attention(
+                   (q): Linear(in_features=1472, out_features=384, bias=False)
+                   (k): Linear(in_features=1472, out_features=384, bias=False)
+                   (v): Linear(in_features=1472, out_features=384, bias=False)
+                   (o): Linear(in_features=384, out_features=1472, bias=False)
+                   (relative_attention_bias): Embedding(32, 6)
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (1): T5LayerFF(
+                 (DenseReluDense): T5DenseGatedActDense(
+                   (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wo): Linear(in_features=3584, out_features=1472, bias=False)
+                   (dropout): Dropout(p=0.1, inplace=False)
+                   (act): NewGELUActivation()
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+           )
+           (1-11): 11 x T5Block(
+             (layer): ModuleList(
+               (0): T5LayerSelfAttention(
+                 (SelfAttention): T5Attention(
+                   (q): Linear(in_features=1472, out_features=384, bias=False)
+                   (k): Linear(in_features=1472, out_features=384, bias=False)
+                   (v): Linear(in_features=1472, out_features=384, bias=False)
+                   (o): Linear(in_features=384, out_features=1472, bias=False)
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (1): T5LayerFF(
+                 (DenseReluDense): T5DenseGatedActDense(
+                   (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wo): Linear(in_features=3584, out_features=1472, bias=False)
+                   (dropout): Dropout(p=0.1, inplace=False)
+                   (act): NewGELUActivation()
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+           )
+         )
+         (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+         (dropout): Dropout(p=0.1, inplace=False)
+       )
+     )
+   )
+   (locked_dropout): LockedDropout(p=0.5)
+   (linear): Linear(in_features=1472, out_features=17, bias=True)
+   (loss_function): CrossEntropyLoss()
+ )"
+ 2023-10-11 05:01:34,729 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 05:01:34,729 MultiCorpus: 7142 train + 698 dev + 2570 test sentences
+  - NER_HIPE_2022 Corpus: 7142 train + 698 dev + 2570 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fr/with_doc_seperator
+ 2023-10-11 05:01:34,729 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 05:01:34,729 Train: 7142 sentences
+ 2023-10-11 05:01:34,729 (train_with_dev=False, train_with_test=False)
+ 2023-10-11 05:01:34,729 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 05:01:34,730 Training Params:
+ 2023-10-11 05:01:34,730  - learning_rate: "0.00015"
+ 2023-10-11 05:01:34,730  - mini_batch_size: "4"
+ 2023-10-11 05:01:34,730  - max_epochs: "10"
+ 2023-10-11 05:01:34,730  - shuffle: "True"
+ 2023-10-11 05:01:34,730 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 05:01:34,730 Plugins:
+ 2023-10-11 05:01:34,730  - TensorboardLogger
+ 2023-10-11 05:01:34,730  - LinearScheduler | warmup_fraction: '0.1'
+ 2023-10-11 05:01:34,730 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 05:01:34,730 Final evaluation on model from best epoch (best-model.pt)
+ 2023-10-11 05:01:34,730  - metric: "('micro avg', 'f1-score')"
+ 2023-10-11 05:01:34,730 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 05:01:34,730 Computation:
+ 2023-10-11 05:01:34,730  - compute on device: cuda:0
+ 2023-10-11 05:01:34,730  - embedding storage: none
+ 2023-10-11 05:01:34,731 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 05:01:34,731 Model training base path: "hmbench-newseye/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-2"
+ 2023-10-11 05:01:34,731 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 05:01:34,731 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 05:01:34,731 Logging anything other than scalars to TensorBoard is currently not supported.
+ 2023-10-11 05:02:27,827 epoch 1 - iter 178/1786 - loss 2.83041151 - time (sec): 53.09 - samples/sec: 500.15 - lr: 0.000015 - momentum: 0.000000
+ 2023-10-11 05:03:20,133 epoch 1 - iter 356/1786 - loss 2.69429508 - time (sec): 105.40 - samples/sec: 491.24 - lr: 0.000030 - momentum: 0.000000
+ 2023-10-11 05:04:14,338 epoch 1 - iter 534/1786 - loss 2.41681822 - time (sec): 159.61 - samples/sec: 486.67 - lr: 0.000045 - momentum: 0.000000
+ 2023-10-11 05:05:10,271 epoch 1 - iter 712/1786 - loss 2.10924834 - time (sec): 215.54 - samples/sec: 477.63 - lr: 0.000060 - momentum: 0.000000
+ 2023-10-11 05:06:05,087 epoch 1 - iter 890/1786 - loss 1.84410867 - time (sec): 270.35 - samples/sec: 470.72 - lr: 0.000075 - momentum: 0.000000
+ 2023-10-11 05:07:05,666 epoch 1 - iter 1068/1786 - loss 1.63794432 - time (sec): 330.93 - samples/sec: 462.45 - lr: 0.000090 - momentum: 0.000000
+ 2023-10-11 05:08:04,516 epoch 1 - iter 1246/1786 - loss 1.47552627 - time (sec): 389.78 - samples/sec: 456.44 - lr: 0.000105 - momentum: 0.000000
+ 2023-10-11 05:09:00,303 epoch 1 - iter 1424/1786 - loss 1.35086011 - time (sec): 445.57 - samples/sec: 451.33 - lr: 0.000120 - momentum: 0.000000
+ 2023-10-11 05:09:52,316 epoch 1 - iter 1602/1786 - loss 1.24502390 - time (sec): 497.58 - samples/sec: 451.20 - lr: 0.000134 - momentum: 0.000000
+ 2023-10-11 05:10:43,774 epoch 1 - iter 1780/1786 - loss 1.15419317 - time (sec): 549.04 - samples/sec: 452.16 - lr: 0.000149 - momentum: 0.000000
+ 2023-10-11 05:10:45,213 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 05:10:45,213 EPOCH 1 done: loss 1.1525 - lr: 0.000149
+ 2023-10-11 05:11:06,239 DEV : loss 0.20120850205421448 - f1-score (micro avg)  0.5061
+ 2023-10-11 05:11:06,271 saving best model
+ 2023-10-11 05:11:07,380 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 05:12:00,838 epoch 2 - iter 178/1786 - loss 0.20647638 - time (sec): 53.46 - samples/sec: 470.14 - lr: 0.000148 - momentum: 0.000000
+ 2023-10-11 05:12:55,186 epoch 2 - iter 356/1786 - loss 0.19196816 - time (sec): 107.80 - samples/sec: 477.92 - lr: 0.000147 - momentum: 0.000000
+ 2023-10-11 05:13:49,188 epoch 2 - iter 534/1786 - loss 0.17945236 - time (sec): 161.81 - samples/sec: 471.84 - lr: 0.000145 - momentum: 0.000000
+ 2023-10-11 05:14:41,335 epoch 2 - iter 712/1786 - loss 0.17099654 - time (sec): 213.95 - samples/sec: 469.79 - lr: 0.000143 - momentum: 0.000000
+ 2023-10-11 05:15:33,535 epoch 2 - iter 890/1786 - loss 0.16314737 - time (sec): 266.15 - samples/sec: 468.04 - lr: 0.000142 - momentum: 0.000000
+ 2023-10-11 05:16:27,194 epoch 2 - iter 1068/1786 - loss 0.15692538 - time (sec): 319.81 - samples/sec: 464.19 - lr: 0.000140 - momentum: 0.000000
+ 2023-10-11 05:17:17,836 epoch 2 - iter 1246/1786 - loss 0.15088690 - time (sec): 370.45 - samples/sec: 464.60 - lr: 0.000138 - momentum: 0.000000
+ 2023-10-11 05:18:09,337 epoch 2 - iter 1424/1786 - loss 0.14597758 - time (sec): 421.96 - samples/sec: 467.64 - lr: 0.000137 - momentum: 0.000000
+ 2023-10-11 05:19:01,609 epoch 2 - iter 1602/1786 - loss 0.14238346 - time (sec): 474.23 - samples/sec: 471.09 - lr: 0.000135 - momentum: 0.000000
+ 2023-10-11 05:19:52,592 epoch 2 - iter 1780/1786 - loss 0.13731741 - time (sec): 525.21 - samples/sec: 472.20 - lr: 0.000133 - momentum: 0.000000
+ 2023-10-11 05:19:54,174 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 05:19:54,174 EPOCH 2 done: loss 0.1372 - lr: 0.000133
+ 2023-10-11 05:20:14,463 DEV : loss 0.11661199480295181 - f1-score (micro avg)  0.7552
+ 2023-10-11 05:20:14,492 saving best model
+ 2023-10-11 05:20:17,182 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 05:21:15,759 epoch 3 - iter 178/1786 - loss 0.07763298 - time (sec): 58.57 - samples/sec: 441.25 - lr: 0.000132 - momentum: 0.000000
+ 2023-10-11 05:22:11,182 epoch 3 - iter 356/1786 - loss 0.07495553 - time (sec): 113.99 - samples/sec: 426.96 - lr: 0.000130 - momentum: 0.000000
+ 2023-10-11 05:23:09,155 epoch 3 - iter 534/1786 - loss 0.07669306 - time (sec): 171.97 - samples/sec: 430.39 - lr: 0.000128 - momentum: 0.000000
+ 2023-10-11 05:24:05,593 epoch 3 - iter 712/1786 - loss 0.07775822 - time (sec): 228.41 - samples/sec: 430.05 - lr: 0.000127 - momentum: 0.000000
+ 2023-10-11 05:25:03,308 epoch 3 - iter 890/1786 - loss 0.07467629 - time (sec): 286.12 - samples/sec: 429.61 - lr: 0.000125 - momentum: 0.000000
+ 2023-10-11 05:26:00,547 epoch 3 - iter 1068/1786 - loss 0.07681723 - time (sec): 343.36 - samples/sec: 431.47 - lr: 0.000123 - momentum: 0.000000
+ 2023-10-11 05:26:58,181 epoch 3 - iter 1246/1786 - loss 0.07505527 - time (sec): 400.99 - samples/sec: 435.90 - lr: 0.000122 - momentum: 0.000000
+ 2023-10-11 05:27:52,072 epoch 3 - iter 1424/1786 - loss 0.07490696 - time (sec): 454.88 - samples/sec: 438.92 - lr: 0.000120 - momentum: 0.000000
+ 2023-10-11 05:28:46,503 epoch 3 - iter 1602/1786 - loss 0.07479295 - time (sec): 509.32 - samples/sec: 439.46 - lr: 0.000118 - momentum: 0.000000
+ 2023-10-11 05:29:46,396 epoch 3 - iter 1780/1786 - loss 0.07517529 - time (sec): 569.21 - samples/sec: 436.00 - lr: 0.000117 - momentum: 0.000000
+ 2023-10-11 05:29:48,337 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 05:29:48,338 EPOCH 3 done: loss 0.0754 - lr: 0.000117
+ 2023-10-11 05:30:11,692 DEV : loss 0.1217304989695549 - f1-score (micro avg)  0.7823
+ 2023-10-11 05:30:11,727 saving best model
+ 2023-10-11 05:30:14,519 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 05:31:12,835 epoch 4 - iter 178/1786 - loss 0.06283829 - time (sec): 58.31 - samples/sec: 424.49 - lr: 0.000115 - momentum: 0.000000
+ 2023-10-11 05:32:06,563 epoch 4 - iter 356/1786 - loss 0.05265726 - time (sec): 112.04 - samples/sec: 426.97 - lr: 0.000113 - momentum: 0.000000
+ 2023-10-11 05:33:01,812 epoch 4 - iter 534/1786 - loss 0.05130324 - time (sec): 167.29 - samples/sec: 441.84 - lr: 0.000112 - momentum: 0.000000
+ 2023-10-11 05:33:57,758 epoch 4 - iter 712/1786 - loss 0.05156142 - time (sec): 223.24 - samples/sec: 450.76 - lr: 0.000110 - momentum: 0.000000
+ 2023-10-11 05:34:50,238 epoch 4 - iter 890/1786 - loss 0.05132046 - time (sec): 275.72 - samples/sec: 448.93 - lr: 0.000108 - momentum: 0.000000
+ 2023-10-11 05:35:46,545 epoch 4 - iter 1068/1786 - loss 0.05071305 - time (sec): 332.02 - samples/sec: 447.62 - lr: 0.000107 - momentum: 0.000000
+ 2023-10-11 05:36:42,170 epoch 4 - iter 1246/1786 - loss 0.05180539 - time (sec): 387.65 - samples/sec: 450.20 - lr: 0.000105 - momentum: 0.000000
+ 2023-10-11 05:37:38,584 epoch 4 - iter 1424/1786 - loss 0.05196249 - time (sec): 444.06 - samples/sec: 447.63 - lr: 0.000103 - momentum: 0.000000
+ 2023-10-11 05:38:33,429 epoch 4 - iter 1602/1786 - loss 0.05160455 - time (sec): 498.91 - samples/sec: 447.19 - lr: 0.000102 - momentum: 0.000000
+ 2023-10-11 05:39:31,504 epoch 4 - iter 1780/1786 - loss 0.05213671 - time (sec): 556.98 - samples/sec: 445.74 - lr: 0.000100 - momentum: 0.000000
+ 2023-10-11 05:39:33,161 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 05:39:33,161 EPOCH 4 done: loss 0.0520 - lr: 0.000100
+ 2023-10-11 05:39:55,740 DEV : loss 0.15838001668453217 - f1-score (micro avg)  0.7935
+ 2023-10-11 05:39:55,771 saving best model
+ 2023-10-11 05:39:58,431 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 05:40:58,919 epoch 5 - iter 178/1786 - loss 0.04547964 - time (sec): 60.48 - samples/sec: 411.44 - lr: 0.000098 - momentum: 0.000000
+ 2023-10-11 05:41:53,865 epoch 5 - iter 356/1786 - loss 0.04560730 - time (sec): 115.43 - samples/sec: 411.80 - lr: 0.000097 - momentum: 0.000000
+ 2023-10-11 05:42:52,796 epoch 5 - iter 534/1786 - loss 0.04303715 - time (sec): 174.36 - samples/sec: 415.11 - lr: 0.000095 - momentum: 0.000000
+ 2023-10-11 05:43:49,710 epoch 5 - iter 712/1786 - loss 0.04135883 - time (sec): 231.27 - samples/sec: 420.84 - lr: 0.000093 - momentum: 0.000000
+ 2023-10-11 05:44:48,917 epoch 5 - iter 890/1786 - loss 0.04142354 - time (sec): 290.48 - samples/sec: 417.48 - lr: 0.000092 - momentum: 0.000000
+ 2023-10-11 05:45:52,793 epoch 5 - iter 1068/1786 - loss 0.03982523 - time (sec): 354.36 - samples/sec: 412.77 - lr: 0.000090 - momentum: 0.000000
+ 2023-10-11 05:47:01,510 epoch 5 - iter 1246/1786 - loss 0.04033156 - time (sec): 423.07 - samples/sec: 408.31 - lr: 0.000088 - momentum: 0.000000
+ 2023-10-11 05:48:01,676 epoch 5 - iter 1424/1786 - loss 0.04015505 - time (sec): 483.24 - samples/sec: 408.92 - lr: 0.000087 - momentum: 0.000000
+ 2023-10-11 05:48:58,632 epoch 5 - iter 1602/1786 - loss 0.03983915 - time (sec): 540.20 - samples/sec: 411.43 - lr: 0.000085 - momentum: 0.000000
+ 2023-10-11 05:49:52,916 epoch 5 - iter 1780/1786 - loss 0.03991586 - time (sec): 594.48 - samples/sec: 417.29 - lr: 0.000083 - momentum: 0.000000
+ 2023-10-11 05:49:54,495 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 05:49:54,496 EPOCH 5 done: loss 0.0398 - lr: 0.000083
+ 2023-10-11 05:50:15,738 DEV : loss 0.1628066748380661 - f1-score (micro avg)  0.8089
+ 2023-10-11 05:50:15,773 saving best model
+ 2023-10-11 05:50:18,533 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 05:51:12,723 epoch 6 - iter 178/1786 - loss 0.02789412 - time (sec): 54.19 - samples/sec: 456.91 - lr: 0.000082 - momentum: 0.000000
+ 2023-10-11 05:52:07,670 epoch 6 - iter 356/1786 - loss 0.02958477 - time (sec): 109.13 - samples/sec: 456.59 - lr: 0.000080 - momentum: 0.000000
+ 2023-10-11 05:52:59,150 epoch 6 - iter 534/1786 - loss 0.02757124 - time (sec): 160.61 - samples/sec: 463.60 - lr: 0.000078 - momentum: 0.000000
+ 2023-10-11 05:53:50,942 epoch 6 - iter 712/1786 - loss 0.02729736 - time (sec): 212.41 - samples/sec: 467.18 - lr: 0.000077 - momentum: 0.000000
+ 2023-10-11 05:54:43,400 epoch 6 - iter 890/1786 - loss 0.02840525 - time (sec): 264.86 - samples/sec: 468.78 - lr: 0.000075 - momentum: 0.000000
+ 2023-10-11 05:55:37,498 epoch 6 - iter 1068/1786 - loss 0.02774450 - time (sec): 318.96 - samples/sec: 470.91 - lr: 0.000073 - momentum: 0.000000
+ 2023-10-11 05:56:32,273 epoch 6 - iter 1246/1786 - loss 0.02655476 - time (sec): 373.74 - samples/sec: 466.63 - lr: 0.000072 - momentum: 0.000000
+ 2023-10-11 05:57:26,142 epoch 6 - iter 1424/1786 - loss 0.02755150 - time (sec): 427.61 - samples/sec: 466.59 - lr: 0.000070 - momentum: 0.000000
+ 2023-10-11 05:58:19,208 epoch 6 - iter 1602/1786 - loss 0.02776326 - time (sec): 480.67 - samples/sec: 468.70 - lr: 0.000068 - momentum: 0.000000
+ 2023-10-11 05:59:09,786 epoch 6 - iter 1780/1786 - loss 0.02837332 - time (sec): 531.25 - samples/sec: 467.37 - lr: 0.000067 - momentum: 0.000000
+ 2023-10-11 05:59:11,193 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 05:59:11,193 EPOCH 6 done: loss 0.0283 - lr: 0.000067
+ 2023-10-11 05:59:32,445 DEV : loss 0.17363940179347992 - f1-score (micro avg)  0.8075
+ 2023-10-11 05:59:32,474 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 06:00:27,457 epoch 7 - iter 178/1786 - loss 0.02062730 - time (sec): 54.98 - samples/sec: 497.62 - lr: 0.000065 - momentum: 0.000000
+ 2023-10-11 06:01:26,025 epoch 7 - iter 356/1786 - loss 0.02080307 - time (sec): 113.55 - samples/sec: 457.95 - lr: 0.000063 - momentum: 0.000000
+ 2023-10-11 06:02:18,555 epoch 7 - iter 534/1786 - loss 0.02116990 - time (sec): 166.08 - samples/sec: 455.85 - lr: 0.000062 - momentum: 0.000000
+ 2023-10-11 06:03:10,462 epoch 7 - iter 712/1786 - loss 0.02046165 - time (sec): 217.99 - samples/sec: 458.64 - lr: 0.000060 - momentum: 0.000000
+ 2023-10-11 06:04:02,581 epoch 7 - iter 890/1786 - loss 0.02124109 - time (sec): 270.10 - samples/sec: 460.37 - lr: 0.000058 - momentum: 0.000000
+ 2023-10-11 06:04:55,342 epoch 7 - iter 1068/1786 - loss 0.01994612 - time (sec): 322.86 - samples/sec: 464.15 - lr: 0.000057 - momentum: 0.000000
+ 2023-10-11 06:05:46,465 epoch 7 - iter 1246/1786 - loss 0.02017532 - time (sec): 373.99 - samples/sec: 464.41 - lr: 0.000055 - momentum: 0.000000
+ 2023-10-11 06:06:39,403 epoch 7 - iter 1424/1786 - loss 0.01973361 - time (sec): 426.93 - samples/sec: 468.63 - lr: 0.000053 - momentum: 0.000000
+ 2023-10-11 06:07:31,780 epoch 7 - iter 1602/1786 - loss 0.01962053 - time (sec): 479.30 - samples/sec: 468.69 - lr: 0.000052 - momentum: 0.000000
+ 2023-10-11 06:08:23,880 epoch 7 - iter 1780/1786 - loss 0.02041587 - time (sec): 531.40 - samples/sec: 466.78 - lr: 0.000050 - momentum: 0.000000
+ 2023-10-11 06:08:25,507 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 06:08:25,507 EPOCH 7 done: loss 0.0204 - lr: 0.000050
+ 2023-10-11 06:08:46,785 DEV : loss 0.1936260312795639 - f1-score (micro avg)  0.8109
+ 2023-10-11 06:08:46,815 saving best model
+ 2023-10-11 06:08:49,446 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 06:09:41,617 epoch 8 - iter 178/1786 - loss 0.01944064 - time (sec): 52.17 - samples/sec: 461.54 - lr: 0.000048 - momentum: 0.000000
+ 2023-10-11 06:10:33,775 epoch 8 - iter 356/1786 - loss 0.01633750 - time (sec): 104.32 - samples/sec: 465.74 - lr: 0.000047 - momentum: 0.000000
+ 2023-10-11 06:11:26,536 epoch 8 - iter 534/1786 - loss 0.01426110 - time (sec): 157.09 - samples/sec: 468.29 - lr: 0.000045 - momentum: 0.000000
+ 2023-10-11 06:12:20,764 epoch 8 - iter 712/1786 - loss 0.01661153 - time (sec): 211.31 - samples/sec: 470.04 - lr: 0.000043 - momentum: 0.000000
+ 2023-10-11 06:13:16,553 epoch 8 - iter 890/1786 - loss 0.01686518 - time (sec): 267.10 - samples/sec: 462.43 - lr: 0.000042 - momentum: 0.000000
+ 2023-10-11 06:14:17,499 epoch 8 - iter 1068/1786 - loss 0.01792696 - time (sec): 328.05 - samples/sec: 452.14 - lr: 0.000040 - momentum: 0.000000
+ 2023-10-11 06:15:22,194 epoch 8 - iter 1246/1786 - loss 0.01811955 - time (sec): 392.74 - samples/sec: 439.83 - lr: 0.000038 - momentum: 0.000000
+ 2023-10-11 06:16:23,362 epoch 8 - iter 1424/1786 - loss 0.01715705 - time (sec): 453.91 - samples/sec: 437.14 - lr: 0.000037 - momentum: 0.000000
+ 2023-10-11 06:17:23,666 epoch 8 - iter 1602/1786 - loss 0.01642572 - time (sec): 514.22 - samples/sec: 433.14 - lr: 0.000035 - momentum: 0.000000
+ 2023-10-11 06:18:18,761 epoch 8 - iter 1780/1786 - loss 0.01609207 - time (sec): 569.31 - samples/sec: 435.25 - lr: 0.000033 - momentum: 0.000000
+ 2023-10-11 06:18:20,641 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 06:18:20,641 EPOCH 8 done: loss 0.0161 - lr: 0.000033
+ 2023-10-11 06:18:42,738 DEV : loss 0.21119572222232819 - f1-score (micro avg)  0.8059
+ 2023-10-11 06:18:42,770 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 06:19:41,501 epoch 9 - iter 178/1786 - loss 0.01445335 - time (sec): 58.73 - samples/sec: 439.60 - lr: 0.000032 - momentum: 0.000000
+ 2023-10-11 06:20:35,546 epoch 9 - iter 356/1786 - loss 0.01611300 - time (sec): 112.77 - samples/sec: 445.46 - lr: 0.000030 - momentum: 0.000000
+ 2023-10-11 06:21:27,480 epoch 9 - iter 534/1786 - loss 0.01242337 - time (sec): 164.71 - samples/sec: 454.46 - lr: 0.000028 - momentum: 0.000000
+ 2023-10-11 06:22:19,171 epoch 9 - iter 712/1786 - loss 0.01188148 - time (sec): 216.40 - samples/sec: 458.47 - lr: 0.000027 - momentum: 0.000000
+ 2023-10-11 06:23:11,965 epoch 9 - iter 890/1786 - loss 0.01133948 - time (sec): 269.19 - samples/sec: 460.33 - lr: 0.000025 - momentum: 0.000000
+ 2023-10-11 06:24:06,959 epoch 9 - iter 1068/1786 - loss 0.01089479 - time (sec): 324.19 - samples/sec: 458.23 - lr: 0.000023 - momentum: 0.000000
+ 2023-10-11 06:24:58,784 epoch 9 - iter 1246/1786 - loss 0.01082033 - time (sec): 376.01 - samples/sec: 456.11 - lr: 0.000022 - momentum: 0.000000
+ 2023-10-11 06:25:52,869 epoch 9 - iter 1424/1786 - loss 0.01140983 - time (sec): 430.10 - samples/sec: 458.01 - lr: 0.000020 - momentum: 0.000000
+ 2023-10-11 06:26:48,015 epoch 9 - iter 1602/1786 - loss 0.01172453 - time (sec): 485.24 - samples/sec: 460.19 - lr: 0.000018 - momentum: 0.000000
+ 2023-10-11 06:27:40,966 epoch 9 - iter 1780/1786 - loss 0.01123782 - time (sec): 538.19 - samples/sec: 460.59 - lr: 0.000017 - momentum: 0.000000
+ 2023-10-11 06:27:42,742 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 06:27:42,743 EPOCH 9 done: loss 0.0112 - lr: 0.000017
+ 2023-10-11 06:28:04,945 DEV : loss 0.222304567694664 - f1-score (micro avg)  0.8
+ 2023-10-11 06:28:04,975 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 06:28:57,976 epoch 10 - iter 178/1786 - loss 0.00772526 - time (sec): 53.00 - samples/sec: 443.35 - lr: 0.000015 - momentum: 0.000000
+ 2023-10-11 06:29:51,408 epoch 10 - iter 356/1786 - loss 0.00840680 - time (sec): 106.43 - samples/sec: 454.34 - lr: 0.000013 - momentum: 0.000000
+ 2023-10-11 06:30:46,308 epoch 10 - iter 534/1786 - loss 0.00752016 - time (sec): 161.33 - samples/sec: 458.27 - lr: 0.000012 - momentum: 0.000000
+ 2023-10-11 06:31:38,377 epoch 10 - iter 712/1786 - loss 0.00697489 - time (sec): 213.40 - samples/sec: 458.38 - lr: 0.000010 - momentum: 0.000000
+ 2023-10-11 06:32:31,964 epoch 10 - iter 890/1786 - loss 0.00802755 - time (sec): 266.99 - samples/sec: 464.30 - lr: 0.000008 - momentum: 0.000000
+ 2023-10-11 06:33:27,144 epoch 10 - iter 1068/1786 - loss 0.00896322 - time (sec): 322.17 - samples/sec: 467.72 - lr: 0.000007 - momentum: 0.000000
+ 2023-10-11 06:34:18,958 epoch 10 - iter 1246/1786 - loss 0.00910142 - time (sec): 373.98 - samples/sec: 464.87 - lr: 0.000005 - momentum: 0.000000
+ 2023-10-11 06:35:13,297 epoch 10 - iter 1424/1786 - loss 0.00868295 - time (sec): 428.32 - samples/sec: 465.41 - lr: 0.000003 - momentum: 0.000000
+ 2023-10-11 06:36:07,043 epoch 10 - iter 1602/1786 - loss 0.00832301 - time (sec): 482.07 - samples/sec: 464.24 - lr: 0.000002 - momentum: 0.000000
+ 2023-10-11 06:37:00,746 epoch 10 - iter 1780/1786 - loss 0.00823723 - time (sec): 535.77 - samples/sec: 463.07 - lr: 0.000000 - momentum: 0.000000
+ 2023-10-11 06:37:02,365 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 06:37:02,365 EPOCH 10 done: loss 0.0082 - lr: 0.000000
+ 2023-10-11 06:37:23,464 DEV : loss 0.22344990074634552 - f1-score (micro avg)  0.7963
+ 2023-10-11 06:37:24,411 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 06:37:24,413 Loading model from best epoch ...
+ 2023-10-11 06:37:28,377 SequenceTagger predicts: Dictionary with 17 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
+ 2023-10-11 06:38:40,166
+ Results:
+ - F-score (micro) 0.6861
+ - F-score (macro) 0.5975
+ - Accuracy 0.5426
+
+ By class:
+                precision    recall  f1-score   support
+
+          LOC      0.7197    0.6986    0.7090      1095
+          PER      0.7824    0.7638    0.7730      1012
+          ORG      0.3843    0.5770    0.4614       357
+    HumanProd      0.3443    0.6364    0.4468        33
+
+    micro avg      0.6665    0.7068    0.6861      2497
+    macro avg      0.5577    0.6690    0.5975      2497
+ weighted avg      0.6922    0.7068    0.6961      2497
+
+ 2023-10-11 06:38:40,166 ----------------------------------------------------------------------------------------------------
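As a quick sanity check on the final evaluation table: each f1-score is the harmonic mean of its precision and recall columns. A minimal sketch, using the micro avg and LOC rows copied from the results above:

```python
# F1 is the harmonic mean of precision and recall: F1 = 2PR / (P + R).
def f1(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)

# micro avg row from the final test results: P=0.6665, R=0.7068, F1=0.6861
print(round(f1(0.6665, 0.7068), 4))

# LOC row: P=0.7197, R=0.6986, F1=0.7090
print(round(f1(0.7197, 0.6986), 4))
```

Both recompute to the reported values (0.6861 and 0.7090, up to rounding), confirming the table is internally consistent; the gap between dev F1 (0.8109 at the best epoch) and test micro F1 (0.6861) is then the real dev/test difference, not a reporting artifact.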