stefan-it committed on
Commit
f2bb399
1 Parent(s): a3d39ae

Upload ./training.log with huggingface_hub

Files changed (1)
  1. training.log +242 -0
training.log ADDED
@@ -0,0 +1,242 @@
+ 2023-10-25 17:00:22,362 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 17:00:22,363 Model: "SequenceTagger(
+   (embeddings): TransformerWordEmbeddings(
+     (model): BertModel(
+       (embeddings): BertEmbeddings(
+         (word_embeddings): Embedding(64001, 768)
+         (position_embeddings): Embedding(512, 768)
+         (token_type_embeddings): Embedding(2, 768)
+         (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+         (dropout): Dropout(p=0.1, inplace=False)
+       )
+       (encoder): BertEncoder(
+         (layer): ModuleList(
+           (0-11): 12 x BertLayer(
+             (attention): BertAttention(
+               (self): BertSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): BertSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): BertIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+               (intermediate_act_fn): GELUActivation()
+             )
+             (output): BertOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+         )
+       )
+       (pooler): BertPooler(
+         (dense): Linear(in_features=768, out_features=768, bias=True)
+         (activation): Tanh()
+       )
+     )
+   )
+   (locked_dropout): LockedDropout(p=0.5)
+   (linear): Linear(in_features=768, out_features=17, bias=True)
+   (loss_function): CrossEntropyLoss()
+ )"
+ 2023-10-25 17:00:22,363 ----------------------------------------------------------------------------------------------------
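The layer shapes in the model dump above are enough to estimate the model size. A minimal sketch of that arithmetic (plain Python, not part of the log; the grouping into embeddings, encoder, pooler, and tagging head follows the printout):

```python
# Parameter counts derived from the layer shapes printed in the model dump:
# vocab 64001, max positions 512, hidden 768, intermediate 3072, 12 layers, 17 tags.
V, P, H, I, L, TAGS = 64001, 512, 768, 3072, 12, 17

def linear(n_in, n_out):
    # Linear(in_features=n_in, out_features=n_out, bias=True)
    return n_in * n_out + n_out

layer_norm = 2 * H  # LayerNorm((768,)): weight + bias

# word + position + token_type embeddings + LayerNorm
embeddings = V * H + P * H + 2 * H + layer_norm

per_layer = (
    4 * linear(H, H)   # query, key, value, attention output dense
    + layer_norm       # attention LayerNorm
    + linear(H, I)     # intermediate dense
    + linear(I, H)     # output dense
    + layer_norm       # output LayerNorm
)

encoder = L * per_layer
pooler = linear(H, H)
head = linear(H, TAGS)  # (linear): Linear(768 -> 17)

total = embeddings + encoder + pooler + head
print(per_layer, head, total)  # 7087872 13073 135207185
```

So the tagger has roughly 135 M parameters, dominated by the 64 k-subword embedding matrix and the 12 encoder layers; the task-specific head on top is only about 13 k parameters.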
+ 2023-10-25 17:00:22,364 MultiCorpus: 7142 train + 698 dev + 2570 test sentences
+  - NER_HIPE_2022 Corpus: 7142 train + 698 dev + 2570 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fr/with_doc_seperator
+ 2023-10-25 17:00:22,364 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 17:00:22,364 Train: 7142 sentences
+ 2023-10-25 17:00:22,364 (train_with_dev=False, train_with_test=False)
+ 2023-10-25 17:00:22,364 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 17:00:22,364 Training Params:
+ 2023-10-25 17:00:22,364 - learning_rate: "3e-05"
+ 2023-10-25 17:00:22,364 - mini_batch_size: "8"
+ 2023-10-25 17:00:22,364 - max_epochs: "10"
+ 2023-10-25 17:00:22,364 - shuffle: "True"
+ 2023-10-25 17:00:22,364 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 17:00:22,364 Plugins:
+ 2023-10-25 17:00:22,364 - TensorboardLogger
+ 2023-10-25 17:00:22,364 - LinearScheduler | warmup_fraction: '0.1'
+ 2023-10-25 17:00:22,364 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 17:00:22,364 Final evaluation on model from best epoch (best-model.pt)
+ 2023-10-25 17:00:22,364 - metric: "('micro avg', 'f1-score')"
+ 2023-10-25 17:00:22,364 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 17:00:22,364 Computation:
+ 2023-10-25 17:00:22,364 - compute on device: cuda:0
+ 2023-10-25 17:00:22,364 - embedding storage: none
+ 2023-10-25 17:00:22,364 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 17:00:22,364 Model training base path: "hmbench-newseye/fr-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs8-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-4"
+ 2023-10-25 17:00:22,364 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 17:00:22,364 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 17:00:22,365 Logging anything other than scalars to TensorBoard is currently not supported.
+ 2023-10-25 17:00:28,672 epoch 1 - iter 89/893 - loss 2.19742480 - time (sec): 6.31 - samples/sec: 4004.07 - lr: 0.000003 - momentum: 0.000000
+ 2023-10-25 17:00:35,048 epoch 1 - iter 178/893 - loss 1.39693997 - time (sec): 12.68 - samples/sec: 4002.88 - lr: 0.000006 - momentum: 0.000000
+ 2023-10-25 17:00:41,309 epoch 1 - iter 267/893 - loss 1.07633123 - time (sec): 18.94 - samples/sec: 3947.26 - lr: 0.000009 - momentum: 0.000000
+ 2023-10-25 17:00:47,306 epoch 1 - iter 356/893 - loss 0.87429157 - time (sec): 24.94 - samples/sec: 3993.37 - lr: 0.000012 - momentum: 0.000000
+ 2023-10-25 17:00:53,202 epoch 1 - iter 445/893 - loss 0.74799460 - time (sec): 30.84 - samples/sec: 4002.33 - lr: 0.000015 - momentum: 0.000000
+ 2023-10-25 17:00:59,211 epoch 1 - iter 534/893 - loss 0.65718250 - time (sec): 36.85 - samples/sec: 4018.73 - lr: 0.000018 - momentum: 0.000000
+ 2023-10-25 17:01:05,222 epoch 1 - iter 623/893 - loss 0.58410501 - time (sec): 42.86 - samples/sec: 4035.30 - lr: 0.000021 - momentum: 0.000000
+ 2023-10-25 17:01:11,176 epoch 1 - iter 712/893 - loss 0.52802697 - time (sec): 48.81 - samples/sec: 4073.71 - lr: 0.000024 - momentum: 0.000000
+ 2023-10-25 17:01:17,243 epoch 1 - iter 801/893 - loss 0.48658916 - time (sec): 54.88 - samples/sec: 4080.04 - lr: 0.000027 - momentum: 0.000000
+ 2023-10-25 17:01:23,202 epoch 1 - iter 890/893 - loss 0.45333206 - time (sec): 60.84 - samples/sec: 4071.24 - lr: 0.000030 - momentum: 0.000000
+ 2023-10-25 17:01:23,417 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 17:01:23,417 EPOCH 1 done: loss 0.4518 - lr: 0.000030
+ 2023-10-25 17:01:27,249 DEV : loss 0.0998985692858696 - f1-score (micro avg) 0.7288
+ 2023-10-25 17:01:27,270 saving best model
+ 2023-10-25 17:01:27,743 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 17:01:33,963 epoch 2 - iter 89/893 - loss 0.11010133 - time (sec): 6.22 - samples/sec: 3972.66 - lr: 0.000030 - momentum: 0.000000
+ 2023-10-25 17:01:40,098 epoch 2 - iter 178/893 - loss 0.10005667 - time (sec): 12.35 - samples/sec: 3975.15 - lr: 0.000029 - momentum: 0.000000
+ 2023-10-25 17:01:46,248 epoch 2 - iter 267/893 - loss 0.10041275 - time (sec): 18.50 - samples/sec: 4054.32 - lr: 0.000029 - momentum: 0.000000
+ 2023-10-25 17:01:52,378 epoch 2 - iter 356/893 - loss 0.10308505 - time (sec): 24.63 - samples/sec: 4108.35 - lr: 0.000029 - momentum: 0.000000
+ 2023-10-25 17:01:58,565 epoch 2 - iter 445/893 - loss 0.10163209 - time (sec): 30.82 - samples/sec: 4120.74 - lr: 0.000028 - momentum: 0.000000
+ 2023-10-25 17:02:04,485 epoch 2 - iter 534/893 - loss 0.10147740 - time (sec): 36.74 - samples/sec: 4101.34 - lr: 0.000028 - momentum: 0.000000
+ 2023-10-25 17:02:10,425 epoch 2 - iter 623/893 - loss 0.10276131 - time (sec): 42.68 - samples/sec: 4107.64 - lr: 0.000028 - momentum: 0.000000
+ 2023-10-25 17:02:16,395 epoch 2 - iter 712/893 - loss 0.10195994 - time (sec): 48.65 - samples/sec: 4123.54 - lr: 0.000027 - momentum: 0.000000
+ 2023-10-25 17:02:22,164 epoch 2 - iter 801/893 - loss 0.10209269 - time (sec): 54.42 - samples/sec: 4096.53 - lr: 0.000027 - momentum: 0.000000
+ 2023-10-25 17:02:28,238 epoch 2 - iter 890/893 - loss 0.10100025 - time (sec): 60.49 - samples/sec: 4102.24 - lr: 0.000027 - momentum: 0.000000
+ 2023-10-25 17:02:28,441 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 17:02:28,441 EPOCH 2 done: loss 0.1009 - lr: 0.000027
+ 2023-10-25 17:02:33,319 DEV : loss 0.09367502480745316 - f1-score (micro avg) 0.7629
+ 2023-10-25 17:02:33,342 saving best model
+ 2023-10-25 17:02:34,008 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 17:02:39,971 epoch 3 - iter 89/893 - loss 0.06332527 - time (sec): 5.96 - samples/sec: 3937.76 - lr: 0.000026 - momentum: 0.000000
+ 2023-10-25 17:02:46,251 epoch 3 - iter 178/893 - loss 0.06194394 - time (sec): 12.24 - samples/sec: 4036.39 - lr: 0.000026 - momentum: 0.000000
+ 2023-10-25 17:02:52,083 epoch 3 - iter 267/893 - loss 0.06017200 - time (sec): 18.07 - samples/sec: 4109.10 - lr: 0.000026 - momentum: 0.000000
+ 2023-10-25 17:02:58,255 epoch 3 - iter 356/893 - loss 0.06164862 - time (sec): 24.24 - samples/sec: 4086.99 - lr: 0.000025 - momentum: 0.000000
+ 2023-10-25 17:03:04,363 epoch 3 - iter 445/893 - loss 0.06151607 - time (sec): 30.35 - samples/sec: 4106.56 - lr: 0.000025 - momentum: 0.000000
+ 2023-10-25 17:03:10,344 epoch 3 - iter 534/893 - loss 0.06067296 - time (sec): 36.33 - samples/sec: 4120.16 - lr: 0.000025 - momentum: 0.000000
+ 2023-10-25 17:03:16,234 epoch 3 - iter 623/893 - loss 0.06086561 - time (sec): 42.22 - samples/sec: 4129.45 - lr: 0.000024 - momentum: 0.000000
+ 2023-10-25 17:03:22,147 epoch 3 - iter 712/893 - loss 0.06099317 - time (sec): 48.13 - samples/sec: 4094.37 - lr: 0.000024 - momentum: 0.000000
+ 2023-10-25 17:03:28,280 epoch 3 - iter 801/893 - loss 0.06051995 - time (sec): 54.27 - samples/sec: 4119.00 - lr: 0.000024 - momentum: 0.000000
+ 2023-10-25 17:03:34,251 epoch 3 - iter 890/893 - loss 0.06220034 - time (sec): 60.24 - samples/sec: 4117.78 - lr: 0.000023 - momentum: 0.000000
+ 2023-10-25 17:03:34,451 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 17:03:34,451 EPOCH 3 done: loss 0.0624 - lr: 0.000023
+ 2023-10-25 17:03:39,555 DEV : loss 0.10349678248167038 - f1-score (micro avg) 0.7851
+ 2023-10-25 17:03:39,573 saving best model
+ 2023-10-25 17:03:40,237 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 17:03:46,372 epoch 4 - iter 89/893 - loss 0.03754159 - time (sec): 6.13 - samples/sec: 4230.43 - lr: 0.000023 - momentum: 0.000000
+ 2023-10-25 17:03:52,281 epoch 4 - iter 178/893 - loss 0.04483007 - time (sec): 12.04 - samples/sec: 4282.44 - lr: 0.000023 - momentum: 0.000000
+ 2023-10-25 17:03:57,977 epoch 4 - iter 267/893 - loss 0.04464268 - time (sec): 17.74 - samples/sec: 4228.06 - lr: 0.000022 - momentum: 0.000000
+ 2023-10-25 17:04:04,177 epoch 4 - iter 356/893 - loss 0.04410251 - time (sec): 23.94 - samples/sec: 4134.91 - lr: 0.000022 - momentum: 0.000000
+ 2023-10-25 17:04:10,450 epoch 4 - iter 445/893 - loss 0.04290747 - time (sec): 30.21 - samples/sec: 4118.64 - lr: 0.000022 - momentum: 0.000000
+ 2023-10-25 17:04:16,490 epoch 4 - iter 534/893 - loss 0.04337588 - time (sec): 36.25 - samples/sec: 4135.74 - lr: 0.000021 - momentum: 0.000000
+ 2023-10-25 17:04:22,526 epoch 4 - iter 623/893 - loss 0.04450695 - time (sec): 42.29 - samples/sec: 4100.04 - lr: 0.000021 - momentum: 0.000000
+ 2023-10-25 17:04:28,608 epoch 4 - iter 712/893 - loss 0.04474457 - time (sec): 48.37 - samples/sec: 4104.66 - lr: 0.000021 - momentum: 0.000000
+ 2023-10-25 17:04:34,686 epoch 4 - iter 801/893 - loss 0.04576971 - time (sec): 54.45 - samples/sec: 4112.57 - lr: 0.000020 - momentum: 0.000000
+ 2023-10-25 17:04:40,591 epoch 4 - iter 890/893 - loss 0.04490676 - time (sec): 60.35 - samples/sec: 4097.87 - lr: 0.000020 - momentum: 0.000000
+ 2023-10-25 17:04:40,899 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 17:04:40,904 EPOCH 4 done: loss 0.0447 - lr: 0.000020
+ 2023-10-25 17:04:45,230 DEV : loss 0.14620383083820343 - f1-score (micro avg) 0.8037
+ 2023-10-25 17:04:45,256 saving best model
+ 2023-10-25 17:04:46,044 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 17:04:52,073 epoch 5 - iter 89/893 - loss 0.03530497 - time (sec): 6.03 - samples/sec: 3848.91 - lr: 0.000020 - momentum: 0.000000
+ 2023-10-25 17:04:58,050 epoch 5 - iter 178/893 - loss 0.03402744 - time (sec): 12.00 - samples/sec: 4001.34 - lr: 0.000019 - momentum: 0.000000
+ 2023-10-25 17:05:04,150 epoch 5 - iter 267/893 - loss 0.03369793 - time (sec): 18.10 - samples/sec: 4023.43 - lr: 0.000019 - momentum: 0.000000
+ 2023-10-25 17:05:10,344 epoch 5 - iter 356/893 - loss 0.03388932 - time (sec): 24.30 - samples/sec: 4021.69 - lr: 0.000019 - momentum: 0.000000
+ 2023-10-25 17:05:16,425 epoch 5 - iter 445/893 - loss 0.03377847 - time (sec): 30.38 - samples/sec: 4048.47 - lr: 0.000018 - momentum: 0.000000
+ 2023-10-25 17:05:22,583 epoch 5 - iter 534/893 - loss 0.03360074 - time (sec): 36.53 - samples/sec: 4053.71 - lr: 0.000018 - momentum: 0.000000
+ 2023-10-25 17:05:28,654 epoch 5 - iter 623/893 - loss 0.03242307 - time (sec): 42.61 - samples/sec: 4046.48 - lr: 0.000018 - momentum: 0.000000
+ 2023-10-25 17:05:34,820 epoch 5 - iter 712/893 - loss 0.03229538 - time (sec): 48.77 - samples/sec: 4034.35 - lr: 0.000017 - momentum: 0.000000
+ 2023-10-25 17:05:40,937 epoch 5 - iter 801/893 - loss 0.03238963 - time (sec): 54.89 - samples/sec: 4065.34 - lr: 0.000017 - momentum: 0.000000
+ 2023-10-25 17:05:47,041 epoch 5 - iter 890/893 - loss 0.03261197 - time (sec): 60.99 - samples/sec: 4063.33 - lr: 0.000017 - momentum: 0.000000
+ 2023-10-25 17:05:47,251 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 17:05:47,251 EPOCH 5 done: loss 0.0325 - lr: 0.000017
+ 2023-10-25 17:05:52,885 DEV : loss 0.1633528769016266 - f1-score (micro avg) 0.797
+ 2023-10-25 17:05:52,915 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 17:05:59,081 epoch 6 - iter 89/893 - loss 0.03029989 - time (sec): 6.16 - samples/sec: 3842.51 - lr: 0.000016 - momentum: 0.000000
+ 2023-10-25 17:06:05,172 epoch 6 - iter 178/893 - loss 0.02564591 - time (sec): 12.26 - samples/sec: 3799.19 - lr: 0.000016 - momentum: 0.000000
+ 2023-10-25 17:06:11,310 epoch 6 - iter 267/893 - loss 0.02415048 - time (sec): 18.39 - samples/sec: 3923.45 - lr: 0.000016 - momentum: 0.000000
+ 2023-10-25 17:06:17,327 epoch 6 - iter 356/893 - loss 0.02531047 - time (sec): 24.41 - samples/sec: 3979.94 - lr: 0.000015 - momentum: 0.000000
+ 2023-10-25 17:06:23,353 epoch 6 - iter 445/893 - loss 0.02540534 - time (sec): 30.44 - samples/sec: 4031.53 - lr: 0.000015 - momentum: 0.000000
+ 2023-10-25 17:06:29,489 epoch 6 - iter 534/893 - loss 0.02638207 - time (sec): 36.57 - samples/sec: 4054.77 - lr: 0.000015 - momentum: 0.000000
+ 2023-10-25 17:06:35,690 epoch 6 - iter 623/893 - loss 0.02582057 - time (sec): 42.77 - samples/sec: 4044.98 - lr: 0.000014 - momentum: 0.000000
+ 2023-10-25 17:06:41,917 epoch 6 - iter 712/893 - loss 0.02512173 - time (sec): 49.00 - samples/sec: 4050.05 - lr: 0.000014 - momentum: 0.000000
+ 2023-10-25 17:06:48,057 epoch 6 - iter 801/893 - loss 0.02591665 - time (sec): 55.14 - samples/sec: 4038.91 - lr: 0.000014 - momentum: 0.000000
+ 2023-10-25 17:06:54,250 epoch 6 - iter 890/893 - loss 0.02583713 - time (sec): 61.33 - samples/sec: 4048.55 - lr: 0.000013 - momentum: 0.000000
+ 2023-10-25 17:06:54,451 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 17:06:54,452 EPOCH 6 done: loss 0.0259 - lr: 0.000013
+ 2023-10-25 17:06:59,824 DEV : loss 0.18684536218643188 - f1-score (micro avg) 0.7976
+ 2023-10-25 17:06:59,848 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 17:07:06,024 epoch 7 - iter 89/893 - loss 0.01485614 - time (sec): 6.17 - samples/sec: 3881.67 - lr: 0.000013 - momentum: 0.000000
+ 2023-10-25 17:07:12,130 epoch 7 - iter 178/893 - loss 0.01598830 - time (sec): 12.28 - samples/sec: 3958.05 - lr: 0.000013 - momentum: 0.000000
+ 2023-10-25 17:07:18,166 epoch 7 - iter 267/893 - loss 0.01783078 - time (sec): 18.32 - samples/sec: 4076.48 - lr: 0.000012 - momentum: 0.000000
+ 2023-10-25 17:07:24,070 epoch 7 - iter 356/893 - loss 0.01936177 - time (sec): 24.22 - samples/sec: 4105.59 - lr: 0.000012 - momentum: 0.000000
+ 2023-10-25 17:07:30,059 epoch 7 - iter 445/893 - loss 0.01988732 - time (sec): 30.21 - samples/sec: 4146.37 - lr: 0.000012 - momentum: 0.000000
+ 2023-10-25 17:07:36,108 epoch 7 - iter 534/893 - loss 0.01928793 - time (sec): 36.26 - samples/sec: 4162.19 - lr: 0.000011 - momentum: 0.000000
+ 2023-10-25 17:07:42,411 epoch 7 - iter 623/893 - loss 0.02039273 - time (sec): 42.56 - samples/sec: 4121.89 - lr: 0.000011 - momentum: 0.000000
+ 2023-10-25 17:07:48,261 epoch 7 - iter 712/893 - loss 0.02001692 - time (sec): 48.41 - samples/sec: 4092.95 - lr: 0.000011 - momentum: 0.000000
+ 2023-10-25 17:07:54,353 epoch 7 - iter 801/893 - loss 0.02017325 - time (sec): 54.50 - samples/sec: 4087.57 - lr: 0.000010 - momentum: 0.000000
+ 2023-10-25 17:08:00,473 epoch 7 - iter 890/893 - loss 0.01991182 - time (sec): 60.62 - samples/sec: 4094.74 - lr: 0.000010 - momentum: 0.000000
+ 2023-10-25 17:08:00,656 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 17:08:00,657 EPOCH 7 done: loss 0.0199 - lr: 0.000010
+ 2023-10-25 17:08:05,217 DEV : loss 0.2105928510427475 - f1-score (micro avg) 0.8011
+ 2023-10-25 17:08:05,237 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 17:08:11,326 epoch 8 - iter 89/893 - loss 0.01743845 - time (sec): 6.09 - samples/sec: 4235.38 - lr: 0.000010 - momentum: 0.000000
+ 2023-10-25 17:08:17,447 epoch 8 - iter 178/893 - loss 0.01794746 - time (sec): 12.21 - samples/sec: 4130.72 - lr: 0.000009 - momentum: 0.000000
+ 2023-10-25 17:08:23,444 epoch 8 - iter 267/893 - loss 0.01517857 - time (sec): 18.21 - samples/sec: 4109.13 - lr: 0.000009 - momentum: 0.000000
+ 2023-10-25 17:08:29,466 epoch 8 - iter 356/893 - loss 0.01542169 - time (sec): 24.23 - samples/sec: 4050.01 - lr: 0.000009 - momentum: 0.000000
+ 2023-10-25 17:08:35,548 epoch 8 - iter 445/893 - loss 0.01465711 - time (sec): 30.31 - samples/sec: 4032.48 - lr: 0.000008 - momentum: 0.000000
+ 2023-10-25 17:08:41,861 epoch 8 - iter 534/893 - loss 0.01456755 - time (sec): 36.62 - samples/sec: 4029.83 - lr: 0.000008 - momentum: 0.000000
+ 2023-10-25 17:08:47,636 epoch 8 - iter 623/893 - loss 0.01409835 - time (sec): 42.40 - samples/sec: 4055.86 - lr: 0.000008 - momentum: 0.000000
+ 2023-10-25 17:08:53,646 epoch 8 - iter 712/893 - loss 0.01384014 - time (sec): 48.41 - samples/sec: 4051.49 - lr: 0.000007 - momentum: 0.000000
+ 2023-10-25 17:08:59,620 epoch 8 - iter 801/893 - loss 0.01422010 - time (sec): 54.38 - samples/sec: 4078.14 - lr: 0.000007 - momentum: 0.000000
+ 2023-10-25 17:09:05,944 epoch 8 - iter 890/893 - loss 0.01456695 - time (sec): 60.71 - samples/sec: 4085.39 - lr: 0.000007 - momentum: 0.000000
+ 2023-10-25 17:09:06,139 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 17:09:06,140 EPOCH 8 done: loss 0.0146 - lr: 0.000007
+ 2023-10-25 17:09:11,159 DEV : loss 0.21266496181488037 - f1-score (micro avg) 0.7947
+ 2023-10-25 17:09:11,180 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 17:09:17,250 epoch 9 - iter 89/893 - loss 0.00472214 - time (sec): 6.07 - samples/sec: 4171.11 - lr: 0.000006 - momentum: 0.000000
+ 2023-10-25 17:09:23,216 epoch 9 - iter 178/893 - loss 0.00879912 - time (sec): 12.03 - samples/sec: 4177.79 - lr: 0.000006 - momentum: 0.000000
+ 2023-10-25 17:09:29,325 epoch 9 - iter 267/893 - loss 0.01001564 - time (sec): 18.14 - samples/sec: 4086.48 - lr: 0.000006 - momentum: 0.000000
+ 2023-10-25 17:09:35,358 epoch 9 - iter 356/893 - loss 0.01086924 - time (sec): 24.18 - samples/sec: 4140.04 - lr: 0.000005 - momentum: 0.000000
+ 2023-10-25 17:09:41,382 epoch 9 - iter 445/893 - loss 0.01063271 - time (sec): 30.20 - samples/sec: 4151.32 - lr: 0.000005 - momentum: 0.000000
+ 2023-10-25 17:09:47,352 epoch 9 - iter 534/893 - loss 0.01049232 - time (sec): 36.17 - samples/sec: 4114.54 - lr: 0.000005 - momentum: 0.000000
+ 2023-10-25 17:09:53,532 epoch 9 - iter 623/893 - loss 0.01032612 - time (sec): 42.35 - samples/sec: 4133.27 - lr: 0.000004 - momentum: 0.000000
+ 2023-10-25 17:09:59,456 epoch 9 - iter 712/893 - loss 0.01040970 - time (sec): 48.27 - samples/sec: 4105.70 - lr: 0.000004 - momentum: 0.000000
+ 2023-10-25 17:10:05,526 epoch 9 - iter 801/893 - loss 0.01045359 - time (sec): 54.34 - samples/sec: 4091.22 - lr: 0.000004 - momentum: 0.000000
+ 2023-10-25 17:10:11,623 epoch 9 - iter 890/893 - loss 0.01060696 - time (sec): 60.44 - samples/sec: 4100.12 - lr: 0.000003 - momentum: 0.000000
+ 2023-10-25 17:10:11,820 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 17:10:11,820 EPOCH 9 done: loss 0.0106 - lr: 0.000003
+ 2023-10-25 17:10:17,087 DEV : loss 0.2295289933681488 - f1-score (micro avg) 0.8011
+ 2023-10-25 17:10:17,112 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 17:10:23,006 epoch 10 - iter 89/893 - loss 0.01002804 - time (sec): 5.89 - samples/sec: 4117.60 - lr: 0.000003 - momentum: 0.000000
+ 2023-10-25 17:10:29,006 epoch 10 - iter 178/893 - loss 0.01019071 - time (sec): 11.89 - samples/sec: 3983.06 - lr: 0.000003 - momentum: 0.000000
+ 2023-10-25 17:10:35,268 epoch 10 - iter 267/893 - loss 0.00952875 - time (sec): 18.15 - samples/sec: 4042.88 - lr: 0.000002 - momentum: 0.000000
+ 2023-10-25 17:10:41,474 epoch 10 - iter 356/893 - loss 0.00888864 - time (sec): 24.36 - samples/sec: 4047.46 - lr: 0.000002 - momentum: 0.000000
+ 2023-10-25 17:10:47,407 epoch 10 - iter 445/893 - loss 0.00924368 - time (sec): 30.29 - samples/sec: 4025.62 - lr: 0.000002 - momentum: 0.000000
+ 2023-10-25 17:10:53,592 epoch 10 - iter 534/893 - loss 0.00922564 - time (sec): 36.48 - samples/sec: 4056.96 - lr: 0.000001 - momentum: 0.000000
+ 2023-10-25 17:10:59,611 epoch 10 - iter 623/893 - loss 0.00891649 - time (sec): 42.50 - samples/sec: 4071.51 - lr: 0.000001 - momentum: 0.000000
+ 2023-10-25 17:11:05,620 epoch 10 - iter 712/893 - loss 0.00840213 - time (sec): 48.51 - samples/sec: 4045.77 - lr: 0.000001 - momentum: 0.000000
+ 2023-10-25 17:11:11,776 epoch 10 - iter 801/893 - loss 0.00803404 - time (sec): 54.66 - samples/sec: 4059.14 - lr: 0.000000 - momentum: 0.000000
+ 2023-10-25 17:11:18,055 epoch 10 - iter 890/893 - loss 0.00778595 - time (sec): 60.94 - samples/sec: 4068.83 - lr: 0.000000 - momentum: 0.000000
+ 2023-10-25 17:11:18,253 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 17:11:18,253 EPOCH 10 done: loss 0.0078 - lr: 0.000000
+ 2023-10-25 17:11:22,853 DEV : loss 0.23914724588394165 - f1-score (micro avg) 0.7997
+ 2023-10-25 17:11:23,516 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 17:11:23,517 Loading model from best epoch ...
+ 2023-10-25 17:11:25,628 SequenceTagger predicts: Dictionary with 17 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
+ 2023-10-25 17:11:37,463
+ Results:
+ - F-score (micro) 0.6825
+ - F-score (macro) 0.5925
+ - Accuracy 0.5411
+
+ By class:
+               precision    recall  f1-score   support
+
+          LOC     0.7044    0.6813    0.6927      1095
+          PER     0.7967    0.7628    0.7794      1012
+          ORG     0.3908    0.5966    0.4723       357
+    HumanProd     0.3279    0.6061    0.4255        33
+
+    micro avg     0.6648    0.7012    0.6825      2497
+    macro avg     0.5550    0.6617    0.5925      2497
+ weighted avg     0.6920    0.7012    0.6928      2497
+
+ 2023-10-25 17:11:37,464 ----------------------------------------------------------------------------------------------------
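The averaged test scores above follow directly from the per-class rows. A quick consistency check (plain Python; all numbers are copied from the evaluation table):

```python
# Per-class rows from the test evaluation: class -> (precision, recall, f1, support)
rows = {
    "LOC":       (0.7044, 0.6813, 0.6927, 1095),
    "PER":       (0.7967, 0.7628, 0.7794, 1012),
    "ORG":       (0.3908, 0.5966, 0.4723, 357),
    "HumanProd": (0.3279, 0.6061, 0.4255, 33),
}

def f1(p, r):
    # F1 is the harmonic mean of precision and recall
    return 2 * p * r / (p + r)

# micro F1 from the micro-avg precision/recall row
micro = f1(0.6648, 0.7012)
# macro F1: unweighted mean of the per-class F1 scores
macro = sum(v[2] for v in rows.values()) / len(rows)
# weighted F1: per-class F1 weighted by support
total = sum(v[3] for v in rows.values())
weighted = sum(v[2] * v[3] for v in rows.values()) / total

print(round(micro, 4), round(macro, 4), round(weighted, 4), total)
```

Micro averaging weights every entity mention equally, so the frequent LOC and PER classes dominate; the gap between the micro F1 (0.6825) and the macro F1 (0.5925) reflects the much weaker ORG and HumanProd results.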