stefan-it committed
Commit fa54ea9
1 Parent(s): e8de840

Upload ./training.log with huggingface_hub

Files changed (1)
  1. training.log +245 -0
training.log ADDED
@@ -0,0 +1,245 @@
+ 2023-10-25 15:07:16,252 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 15:07:16,253 Model: "SequenceTagger(
+   (embeddings): TransformerWordEmbeddings(
+     (model): BertModel(
+       (embeddings): BertEmbeddings(
+         (word_embeddings): Embedding(64001, 768)
+         (position_embeddings): Embedding(512, 768)
+         (token_type_embeddings): Embedding(2, 768)
+         (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+         (dropout): Dropout(p=0.1, inplace=False)
+       )
+       (encoder): BertEncoder(
+         (layer): ModuleList(
+           (0-11): 12 x BertLayer(
+             (attention): BertAttention(
+               (self): BertSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): BertSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): BertIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+               (intermediate_act_fn): GELUActivation()
+             )
+             (output): BertOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+         )
+       )
+       (pooler): BertPooler(
+         (dense): Linear(in_features=768, out_features=768, bias=True)
+         (activation): Tanh()
+       )
+     )
+   )
+   (locked_dropout): LockedDropout(p=0.5)
+   (linear): Linear(in_features=768, out_features=17, bias=True)
+   (loss_function): CrossEntropyLoss()
+ )"
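The embedding shapes printed in the model summary above fix the size of the BERT embedding tables. As a quick back-of-the-envelope check (a sketch based only on the printed shapes, not part of the log):

```python
# Parameter count of the BertEmbeddings tables printed above
# (embedding weights only; the LayerNorm's affine parameters are ignored).
vocab_size, hidden = 64001, 768       # Embedding(64001, 768)
max_positions, type_vocab = 512, 2    # Embedding(512, 768), Embedding(2, 768)

word_params = vocab_size * hidden         # word_embeddings
position_params = max_positions * hidden  # position_embeddings
type_params = type_vocab * hidden         # token_type_embeddings

total = word_params + position_params + type_params
print(total)  # 49547520
```

The 64k-entry vocabulary alone accounts for roughly 49.5 M of the model's parameters, which is typical for multilingual BERT variants with enlarged vocabularies.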
+ 2023-10-25 15:07:16,253 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 15:07:16,254 MultiCorpus: 7142 train + 698 dev + 2570 test sentences
+  - NER_HIPE_2022 Corpus: 7142 train + 698 dev + 2570 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fr/with_doc_seperator
+ 2023-10-25 15:07:16,254 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 15:07:16,254 Train: 7142 sentences
+ 2023-10-25 15:07:16,254 (train_with_dev=False, train_with_test=False)
+ 2023-10-25 15:07:16,254 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 15:07:16,254 Training Params:
+ 2023-10-25 15:07:16,254 - learning_rate: "3e-05"
+ 2023-10-25 15:07:16,254 - mini_batch_size: "8"
+ 2023-10-25 15:07:16,254 - max_epochs: "10"
+ 2023-10-25 15:07:16,254 - shuffle: "True"
+ 2023-10-25 15:07:16,254 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 15:07:16,254 Plugins:
+ 2023-10-25 15:07:16,254 - TensorboardLogger
+ 2023-10-25 15:07:16,254 - LinearScheduler | warmup_fraction: '0.1'
+ 2023-10-25 15:07:16,254 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 15:07:16,254 Final evaluation on model from best epoch (best-model.pt)
+ 2023-10-25 15:07:16,254 - metric: "('micro avg', 'f1-score')"
+ 2023-10-25 15:07:16,254 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 15:07:16,254 Computation:
+ 2023-10-25 15:07:16,254 - compute on device: cuda:0
+ 2023-10-25 15:07:16,254 - embedding storage: none
+ 2023-10-25 15:07:16,254 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 15:07:16,254 Model training base path: "hmbench-newseye/fr-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs8-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-2"
+ 2023-10-25 15:07:16,255 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 15:07:16,255 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 15:07:16,255 Logging anything other than scalars to TensorBoard is currently not supported.
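The `LinearScheduler | warmup_fraction: '0.1'` plugin listed above explains the lr column in the iteration logs that follow: with 10 epochs of 893 iterations each (8930 steps), the rate climbs linearly to the peak 3e-05 during the first ~10% of steps and then decays linearly to zero. A minimal sketch of that schedule (hypothetical helper name, inferred from the logged lr values, not Flair's actual implementation):

```python
def linear_schedule_lr(step, total_steps, warmup_fraction, peak_lr):
    """Linear warmup to peak_lr over the first warmup_fraction of steps,
    then linear decay to 0 (behaviour inferred from the logged lr values)."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step <= warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

total = 10 * 893  # 10 epochs x 893 iterations per epoch

# End of warmup (around epoch 1, iter 890): lr reaches the peak 3e-05.
print(linear_schedule_lr(893, total, 0.1, 3e-05))
# Epoch 1, iter 89: ~3e-06, matching the first logged "lr: 0.000003".
print(linear_schedule_lr(89, total, 0.1, 3e-05))
```

This also matches the tail of the log, where the lr column reaches 0.000000 at the end of epoch 10.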
+ 2023-10-25 15:07:22,287 epoch 1 - iter 89/893 - loss 2.32679756 - time (sec): 6.03 - samples/sec: 4228.16 - lr: 0.000003 - momentum: 0.000000
+ 2023-10-25 15:07:27,984 epoch 1 - iter 178/893 - loss 1.51396462 - time (sec): 11.73 - samples/sec: 4163.39 - lr: 0.000006 - momentum: 0.000000
+ 2023-10-25 15:07:33,714 epoch 1 - iter 267/893 - loss 1.14478279 - time (sec): 17.46 - samples/sec: 4138.90 - lr: 0.000009 - momentum: 0.000000
+ 2023-10-25 15:07:39,758 epoch 1 - iter 356/893 - loss 0.92624548 - time (sec): 23.50 - samples/sec: 4118.92 - lr: 0.000012 - momentum: 0.000000
+ 2023-10-25 15:07:45,647 epoch 1 - iter 445/893 - loss 0.77920214 - time (sec): 29.39 - samples/sec: 4149.48 - lr: 0.000015 - momentum: 0.000000
+ 2023-10-25 15:07:51,499 epoch 1 - iter 534/893 - loss 0.67179663 - time (sec): 35.24 - samples/sec: 4208.38 - lr: 0.000018 - momentum: 0.000000
+ 2023-10-25 15:07:57,019 epoch 1 - iter 623/893 - loss 0.60130729 - time (sec): 40.76 - samples/sec: 4247.79 - lr: 0.000021 - momentum: 0.000000
+ 2023-10-25 15:08:02,490 epoch 1 - iter 712/893 - loss 0.54657076 - time (sec): 46.23 - samples/sec: 4282.03 - lr: 0.000024 - momentum: 0.000000
+ 2023-10-25 15:08:07,955 epoch 1 - iter 801/893 - loss 0.50230078 - time (sec): 51.70 - samples/sec: 4306.93 - lr: 0.000027 - momentum: 0.000000
+ 2023-10-25 15:08:13,549 epoch 1 - iter 890/893 - loss 0.46653814 - time (sec): 57.29 - samples/sec: 4330.30 - lr: 0.000030 - momentum: 0.000000
+ 2023-10-25 15:08:13,712 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 15:08:13,712 EPOCH 1 done: loss 0.4656 - lr: 0.000030
+ 2023-10-25 15:08:17,345 DEV : loss 0.1060444563627243 - f1-score (micro avg) 0.7387
+ 2023-10-25 15:08:17,369 saving best model
+ 2023-10-25 15:08:17,905 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 15:08:23,847 epoch 2 - iter 89/893 - loss 0.11786204 - time (sec): 5.94 - samples/sec: 4321.04 - lr: 0.000030 - momentum: 0.000000
+ 2023-10-25 15:08:29,333 epoch 2 - iter 178/893 - loss 0.11830957 - time (sec): 11.43 - samples/sec: 4126.75 - lr: 0.000029 - momentum: 0.000000
+ 2023-10-25 15:08:35,534 epoch 2 - iter 267/893 - loss 0.11117703 - time (sec): 17.63 - samples/sec: 4187.28 - lr: 0.000029 - momentum: 0.000000
+ 2023-10-25 15:08:41,384 epoch 2 - iter 356/893 - loss 0.11088492 - time (sec): 23.48 - samples/sec: 4194.66 - lr: 0.000029 - momentum: 0.000000
+ 2023-10-25 15:08:47,307 epoch 2 - iter 445/893 - loss 0.10753992 - time (sec): 29.40 - samples/sec: 4218.91 - lr: 0.000028 - momentum: 0.000000
+ 2023-10-25 15:08:53,058 epoch 2 - iter 534/893 - loss 0.10789428 - time (sec): 35.15 - samples/sec: 4232.20 - lr: 0.000028 - momentum: 0.000000
+ 2023-10-25 15:08:58,673 epoch 2 - iter 623/893 - loss 0.10558977 - time (sec): 40.77 - samples/sec: 4289.97 - lr: 0.000028 - momentum: 0.000000
+ 2023-10-25 15:09:04,088 epoch 2 - iter 712/893 - loss 0.10375529 - time (sec): 46.18 - samples/sec: 4271.55 - lr: 0.000027 - momentum: 0.000000
+ 2023-10-25 15:09:09,613 epoch 2 - iter 801/893 - loss 0.10372405 - time (sec): 51.71 - samples/sec: 4305.77 - lr: 0.000027 - momentum: 0.000000
+ 2023-10-25 15:09:15,240 epoch 2 - iter 890/893 - loss 0.10324515 - time (sec): 57.33 - samples/sec: 4321.41 - lr: 0.000027 - momentum: 0.000000
+ 2023-10-25 15:09:15,429 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 15:09:15,430 EPOCH 2 done: loss 0.1031 - lr: 0.000027
+ 2023-10-25 15:09:20,245 DEV : loss 0.09593858569860458 - f1-score (micro avg) 0.777
+ 2023-10-25 15:09:20,268 saving best model
+ 2023-10-25 15:09:20,917 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 15:09:26,476 epoch 3 - iter 89/893 - loss 0.06107853 - time (sec): 5.56 - samples/sec: 4578.32 - lr: 0.000026 - momentum: 0.000000
+ 2023-10-25 15:09:31,970 epoch 3 - iter 178/893 - loss 0.05655927 - time (sec): 11.05 - samples/sec: 4428.83 - lr: 0.000026 - momentum: 0.000000
+ 2023-10-25 15:09:37,621 epoch 3 - iter 267/893 - loss 0.05765556 - time (sec): 16.70 - samples/sec: 4489.95 - lr: 0.000026 - momentum: 0.000000
+ 2023-10-25 15:09:43,134 epoch 3 - iter 356/893 - loss 0.05899019 - time (sec): 22.22 - samples/sec: 4487.90 - lr: 0.000025 - momentum: 0.000000
+ 2023-10-25 15:09:48,684 epoch 3 - iter 445/893 - loss 0.06059285 - time (sec): 27.77 - samples/sec: 4430.84 - lr: 0.000025 - momentum: 0.000000
+ 2023-10-25 15:09:54,437 epoch 3 - iter 534/893 - loss 0.06203883 - time (sec): 33.52 - samples/sec: 4406.62 - lr: 0.000025 - momentum: 0.000000
+ 2023-10-25 15:10:00,394 epoch 3 - iter 623/893 - loss 0.06228959 - time (sec): 39.48 - samples/sec: 4403.41 - lr: 0.000024 - momentum: 0.000000
+ 2023-10-25 15:10:06,271 epoch 3 - iter 712/893 - loss 0.06223592 - time (sec): 45.35 - samples/sec: 4403.81 - lr: 0.000024 - momentum: 0.000000
+ 2023-10-25 15:10:12,116 epoch 3 - iter 801/893 - loss 0.06176464 - time (sec): 51.20 - samples/sec: 4397.44 - lr: 0.000024 - momentum: 0.000000
+ 2023-10-25 15:10:17,781 epoch 3 - iter 890/893 - loss 0.06140542 - time (sec): 56.86 - samples/sec: 4364.69 - lr: 0.000023 - momentum: 0.000000
+ 2023-10-25 15:10:17,965 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 15:10:17,965 EPOCH 3 done: loss 0.0613 - lr: 0.000023
+ 2023-10-25 15:10:22,849 DEV : loss 0.10392870754003525 - f1-score (micro avg) 0.7824
+ 2023-10-25 15:10:22,870 saving best model
+ 2023-10-25 15:10:23,572 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 15:10:29,407 epoch 4 - iter 89/893 - loss 0.04410301 - time (sec): 5.83 - samples/sec: 4278.04 - lr: 0.000023 - momentum: 0.000000
+ 2023-10-25 15:10:35,183 epoch 4 - iter 178/893 - loss 0.04544728 - time (sec): 11.61 - samples/sec: 4301.00 - lr: 0.000023 - momentum: 0.000000
+ 2023-10-25 15:10:40,777 epoch 4 - iter 267/893 - loss 0.04597693 - time (sec): 17.20 - samples/sec: 4268.80 - lr: 0.000022 - momentum: 0.000000
+ 2023-10-25 15:10:46,358 epoch 4 - iter 356/893 - loss 0.04537082 - time (sec): 22.78 - samples/sec: 4353.99 - lr: 0.000022 - momentum: 0.000000
+ 2023-10-25 15:10:52,237 epoch 4 - iter 445/893 - loss 0.04624815 - time (sec): 28.66 - samples/sec: 4335.26 - lr: 0.000022 - momentum: 0.000000
+ 2023-10-25 15:10:58,294 epoch 4 - iter 534/893 - loss 0.04475990 - time (sec): 34.72 - samples/sec: 4348.02 - lr: 0.000021 - momentum: 0.000000
+ 2023-10-25 15:11:04,125 epoch 4 - iter 623/893 - loss 0.04559760 - time (sec): 40.55 - samples/sec: 4316.02 - lr: 0.000021 - momentum: 0.000000
+ 2023-10-25 15:11:09,950 epoch 4 - iter 712/893 - loss 0.04461195 - time (sec): 46.38 - samples/sec: 4271.43 - lr: 0.000021 - momentum: 0.000000
+ 2023-10-25 15:11:15,961 epoch 4 - iter 801/893 - loss 0.04371497 - time (sec): 52.39 - samples/sec: 4285.47 - lr: 0.000020 - momentum: 0.000000
+ 2023-10-25 15:11:21,628 epoch 4 - iter 890/893 - loss 0.04355757 - time (sec): 58.05 - samples/sec: 4275.23 - lr: 0.000020 - momentum: 0.000000
+ 2023-10-25 15:11:21,806 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 15:11:21,807 EPOCH 4 done: loss 0.0437 - lr: 0.000020
+ 2023-10-25 15:11:25,871 DEV : loss 0.1405394971370697 - f1-score (micro avg) 0.7739
+ 2023-10-25 15:11:25,895 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 15:11:31,806 epoch 5 - iter 89/893 - loss 0.03086483 - time (sec): 5.91 - samples/sec: 4018.56 - lr: 0.000020 - momentum: 0.000000
+ 2023-10-25 15:11:37,763 epoch 5 - iter 178/893 - loss 0.03408901 - time (sec): 11.87 - samples/sec: 4169.34 - lr: 0.000019 - momentum: 0.000000
+ 2023-10-25 15:11:43,934 epoch 5 - iter 267/893 - loss 0.03368610 - time (sec): 18.04 - samples/sec: 4168.56 - lr: 0.000019 - momentum: 0.000000
+ 2023-10-25 15:11:49,742 epoch 5 - iter 356/893 - loss 0.03342150 - time (sec): 23.84 - samples/sec: 4171.56 - lr: 0.000019 - momentum: 0.000000
+ 2023-10-25 15:11:55,551 epoch 5 - iter 445/893 - loss 0.03366136 - time (sec): 29.65 - samples/sec: 4169.40 - lr: 0.000018 - momentum: 0.000000
+ 2023-10-25 15:12:01,573 epoch 5 - iter 534/893 - loss 0.03390282 - time (sec): 35.68 - samples/sec: 4202.78 - lr: 0.000018 - momentum: 0.000000
+ 2023-10-25 15:12:07,368 epoch 5 - iter 623/893 - loss 0.03357706 - time (sec): 41.47 - samples/sec: 4197.80 - lr: 0.000018 - momentum: 0.000000
+ 2023-10-25 15:12:13,096 epoch 5 - iter 712/893 - loss 0.03266215 - time (sec): 47.20 - samples/sec: 4199.37 - lr: 0.000017 - momentum: 0.000000
+ 2023-10-25 15:12:18,874 epoch 5 - iter 801/893 - loss 0.03391376 - time (sec): 52.98 - samples/sec: 4207.00 - lr: 0.000017 - momentum: 0.000000
+ 2023-10-25 15:12:24,683 epoch 5 - iter 890/893 - loss 0.03419362 - time (sec): 58.79 - samples/sec: 4219.53 - lr: 0.000017 - momentum: 0.000000
+ 2023-10-25 15:12:24,885 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 15:12:24,885 EPOCH 5 done: loss 0.0341 - lr: 0.000017
+ 2023-10-25 15:12:29,919 DEV : loss 0.16618064045906067 - f1-score (micro avg) 0.8051
+ 2023-10-25 15:12:29,940 saving best model
+ 2023-10-25 15:12:30,620 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 15:12:36,386 epoch 6 - iter 89/893 - loss 0.01934906 - time (sec): 5.76 - samples/sec: 4367.06 - lr: 0.000016 - momentum: 0.000000
+ 2023-10-25 15:12:42,034 epoch 6 - iter 178/893 - loss 0.01971571 - time (sec): 11.41 - samples/sec: 4277.35 - lr: 0.000016 - momentum: 0.000000
+ 2023-10-25 15:12:48,099 epoch 6 - iter 267/893 - loss 0.01891099 - time (sec): 17.48 - samples/sec: 4240.69 - lr: 0.000016 - momentum: 0.000000
+ 2023-10-25 15:12:54,052 epoch 6 - iter 356/893 - loss 0.02463843 - time (sec): 23.43 - samples/sec: 4232.23 - lr: 0.000015 - momentum: 0.000000
+ 2023-10-25 15:12:59,977 epoch 6 - iter 445/893 - loss 0.02485032 - time (sec): 29.35 - samples/sec: 4266.10 - lr: 0.000015 - momentum: 0.000000
+ 2023-10-25 15:13:05,810 epoch 6 - iter 534/893 - loss 0.02602922 - time (sec): 35.19 - samples/sec: 4217.86 - lr: 0.000015 - momentum: 0.000000
+ 2023-10-25 15:13:11,965 epoch 6 - iter 623/893 - loss 0.02550398 - time (sec): 41.34 - samples/sec: 4206.73 - lr: 0.000014 - momentum: 0.000000
+ 2023-10-25 15:13:17,850 epoch 6 - iter 712/893 - loss 0.02490406 - time (sec): 47.23 - samples/sec: 4219.72 - lr: 0.000014 - momentum: 0.000000
+ 2023-10-25 15:13:23,687 epoch 6 - iter 801/893 - loss 0.02516364 - time (sec): 53.06 - samples/sec: 4220.25 - lr: 0.000014 - momentum: 0.000000
+ 2023-10-25 15:13:29,433 epoch 6 - iter 890/893 - loss 0.02571510 - time (sec): 58.81 - samples/sec: 4220.50 - lr: 0.000013 - momentum: 0.000000
+ 2023-10-25 15:13:29,621 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 15:13:29,622 EPOCH 6 done: loss 0.0258 - lr: 0.000013
+ 2023-10-25 15:13:34,396 DEV : loss 0.17371046543121338 - f1-score (micro avg) 0.8112
+ 2023-10-25 15:13:34,417 saving best model
+ 2023-10-25 15:13:35,072 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 15:13:41,193 epoch 7 - iter 89/893 - loss 0.02011470 - time (sec): 6.12 - samples/sec: 4355.79 - lr: 0.000013 - momentum: 0.000000
+ 2023-10-25 15:13:47,009 epoch 7 - iter 178/893 - loss 0.02325248 - time (sec): 11.94 - samples/sec: 4245.96 - lr: 0.000013 - momentum: 0.000000
+ 2023-10-25 15:13:52,906 epoch 7 - iter 267/893 - loss 0.02032808 - time (sec): 17.83 - samples/sec: 4206.61 - lr: 0.000012 - momentum: 0.000000
+ 2023-10-25 15:13:58,744 epoch 7 - iter 356/893 - loss 0.02038788 - time (sec): 23.67 - samples/sec: 4235.40 - lr: 0.000012 - momentum: 0.000000
+ 2023-10-25 15:14:04,629 epoch 7 - iter 445/893 - loss 0.01997941 - time (sec): 29.56 - samples/sec: 4217.35 - lr: 0.000012 - momentum: 0.000000
+ 2023-10-25 15:14:10,590 epoch 7 - iter 534/893 - loss 0.01925592 - time (sec): 35.52 - samples/sec: 4185.73 - lr: 0.000011 - momentum: 0.000000
+ 2023-10-25 15:14:16,316 epoch 7 - iter 623/893 - loss 0.01978486 - time (sec): 41.24 - samples/sec: 4162.69 - lr: 0.000011 - momentum: 0.000000
+ 2023-10-25 15:14:22,572 epoch 7 - iter 712/893 - loss 0.01991371 - time (sec): 47.50 - samples/sec: 4168.20 - lr: 0.000011 - momentum: 0.000000
+ 2023-10-25 15:14:28,265 epoch 7 - iter 801/893 - loss 0.01994353 - time (sec): 53.19 - samples/sec: 4181.97 - lr: 0.000010 - momentum: 0.000000
+ 2023-10-25 15:14:34,108 epoch 7 - iter 890/893 - loss 0.02008156 - time (sec): 59.03 - samples/sec: 4202.26 - lr: 0.000010 - momentum: 0.000000
+ 2023-10-25 15:14:34,306 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 15:14:34,306 EPOCH 7 done: loss 0.0200 - lr: 0.000010
+ 2023-10-25 15:14:38,320 DEV : loss 0.17937816679477692 - f1-score (micro avg) 0.8123
+ 2023-10-25 15:14:38,342 saving best model
+ 2023-10-25 15:14:39,015 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 15:14:44,770 epoch 8 - iter 89/893 - loss 0.01881002 - time (sec): 5.75 - samples/sec: 4163.23 - lr: 0.000010 - momentum: 0.000000
+ 2023-10-25 15:14:50,519 epoch 8 - iter 178/893 - loss 0.01785540 - time (sec): 11.50 - samples/sec: 4223.67 - lr: 0.000009 - momentum: 0.000000
+ 2023-10-25 15:14:56,466 epoch 8 - iter 267/893 - loss 0.01663373 - time (sec): 17.45 - samples/sec: 4239.61 - lr: 0.000009 - momentum: 0.000000
+ 2023-10-25 15:15:02,200 epoch 8 - iter 356/893 - loss 0.01641502 - time (sec): 23.18 - samples/sec: 4212.11 - lr: 0.000009 - momentum: 0.000000
+ 2023-10-25 15:15:08,033 epoch 8 - iter 445/893 - loss 0.01638939 - time (sec): 29.02 - samples/sec: 4217.12 - lr: 0.000008 - momentum: 0.000000
+ 2023-10-25 15:15:14,046 epoch 8 - iter 534/893 - loss 0.01532943 - time (sec): 35.03 - samples/sec: 4218.47 - lr: 0.000008 - momentum: 0.000000
+ 2023-10-25 15:15:20,415 epoch 8 - iter 623/893 - loss 0.01502360 - time (sec): 41.40 - samples/sec: 4207.79 - lr: 0.000008 - momentum: 0.000000
+ 2023-10-25 15:15:26,296 epoch 8 - iter 712/893 - loss 0.01549364 - time (sec): 47.28 - samples/sec: 4180.53 - lr: 0.000007 - momentum: 0.000000
+ 2023-10-25 15:15:32,060 epoch 8 - iter 801/893 - loss 0.01600174 - time (sec): 53.04 - samples/sec: 4193.62 - lr: 0.000007 - momentum: 0.000000
+ 2023-10-25 15:15:38,072 epoch 8 - iter 890/893 - loss 0.01593321 - time (sec): 59.06 - samples/sec: 4200.02 - lr: 0.000007 - momentum: 0.000000
+ 2023-10-25 15:15:38,267 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 15:15:38,267 EPOCH 8 done: loss 0.0159 - lr: 0.000007
+ 2023-10-25 15:15:43,263 DEV : loss 0.21227356791496277 - f1-score (micro avg) 0.7971
+ 2023-10-25 15:15:43,284 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 15:15:49,118 epoch 9 - iter 89/893 - loss 0.00920994 - time (sec): 5.83 - samples/sec: 4231.87 - lr: 0.000006 - momentum: 0.000000
+ 2023-10-25 15:15:54,872 epoch 9 - iter 178/893 - loss 0.00891932 - time (sec): 11.59 - samples/sec: 4226.85 - lr: 0.000006 - momentum: 0.000000
+ 2023-10-25 15:16:00,508 epoch 9 - iter 267/893 - loss 0.01039436 - time (sec): 17.22 - samples/sec: 4278.15 - lr: 0.000006 - momentum: 0.000000
+ 2023-10-25 15:16:06,782 epoch 9 - iter 356/893 - loss 0.01056567 - time (sec): 23.50 - samples/sec: 4283.75 - lr: 0.000005 - momentum: 0.000000
+ 2023-10-25 15:16:12,664 epoch 9 - iter 445/893 - loss 0.01205154 - time (sec): 29.38 - samples/sec: 4283.65 - lr: 0.000005 - momentum: 0.000000
+ 2023-10-25 15:16:18,609 epoch 9 - iter 534/893 - loss 0.01179973 - time (sec): 35.32 - samples/sec: 4285.62 - lr: 0.000005 - momentum: 0.000000
+ 2023-10-25 15:16:24,456 epoch 9 - iter 623/893 - loss 0.01148151 - time (sec): 41.17 - samples/sec: 4253.07 - lr: 0.000004 - momentum: 0.000000
+ 2023-10-25 15:16:30,157 epoch 9 - iter 712/893 - loss 0.01108116 - time (sec): 46.87 - samples/sec: 4279.34 - lr: 0.000004 - momentum: 0.000000
+ 2023-10-25 15:16:35,747 epoch 9 - iter 801/893 - loss 0.01083303 - time (sec): 52.46 - samples/sec: 4280.96 - lr: 0.000004 - momentum: 0.000000
+ 2023-10-25 15:16:41,190 epoch 9 - iter 890/893 - loss 0.01082633 - time (sec): 57.90 - samples/sec: 4284.21 - lr: 0.000003 - momentum: 0.000000
+ 2023-10-25 15:16:41,365 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 15:16:41,366 EPOCH 9 done: loss 0.0108 - lr: 0.000003
+ 2023-10-25 15:16:46,208 DEV : loss 0.21176157891750336 - f1-score (micro avg) 0.8104
+ 2023-10-25 15:16:46,230 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 15:16:51,777 epoch 10 - iter 89/893 - loss 0.01317723 - time (sec): 5.55 - samples/sec: 4308.65 - lr: 0.000003 - momentum: 0.000000
+ 2023-10-25 15:16:57,614 epoch 10 - iter 178/893 - loss 0.00887501 - time (sec): 11.38 - samples/sec: 4360.90 - lr: 0.000003 - momentum: 0.000000
+ 2023-10-25 15:17:03,568 epoch 10 - iter 267/893 - loss 0.00690912 - time (sec): 17.34 - samples/sec: 4311.16 - lr: 0.000002 - momentum: 0.000000
+ 2023-10-25 15:17:09,586 epoch 10 - iter 356/893 - loss 0.00707851 - time (sec): 23.35 - samples/sec: 4278.17 - lr: 0.000002 - momentum: 0.000000
+ 2023-10-25 15:17:15,125 epoch 10 - iter 445/893 - loss 0.00657208 - time (sec): 28.89 - samples/sec: 4225.74 - lr: 0.000002 - momentum: 0.000000
+ 2023-10-25 15:17:20,950 epoch 10 - iter 534/893 - loss 0.00656213 - time (sec): 34.72 - samples/sec: 4263.47 - lr: 0.000001 - momentum: 0.000000
+ 2023-10-25 15:17:26,715 epoch 10 - iter 623/893 - loss 0.00748677 - time (sec): 40.48 - samples/sec: 4268.60 - lr: 0.000001 - momentum: 0.000000
+ 2023-10-25 15:17:32,254 epoch 10 - iter 712/893 - loss 0.00756648 - time (sec): 46.02 - samples/sec: 4296.21 - lr: 0.000001 - momentum: 0.000000
+ 2023-10-25 15:17:37,995 epoch 10 - iter 801/893 - loss 0.00771944 - time (sec): 51.76 - samples/sec: 4279.89 - lr: 0.000000 - momentum: 0.000000
+ 2023-10-25 15:17:43,827 epoch 10 - iter 890/893 - loss 0.00812425 - time (sec): 57.60 - samples/sec: 4307.35 - lr: 0.000000 - momentum: 0.000000
+ 2023-10-25 15:17:43,999 ----------------------------------------------------------------------------------------------------
+ 2023-10-25 15:17:43,999 EPOCH 10 done: loss 0.0081 - lr: 0.000000
+ 2023-10-25 15:17:47,978 DEV : loss 0.21720275282859802 - f1-score (micro avg) 0.8147
+ 2023-10-25 15:17:47,999 saving best model
+ 2023-10-25 15:17:49,119 ----------------------------------------------------------------------------------------------------
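Across the ten epochs, "saving best model" appears exactly when the dev micro-F1 improves on its running maximum (epochs 1-3, 5-7 and 10). A minimal sketch of that selection logic, using the dev scores from the log (hypothetical variable names, not Flair's trainer code):

```python
# Dev f1-score (micro avg) per epoch, copied from the log above.
dev_f1 = [0.7387, 0.7770, 0.7824, 0.7739, 0.8051,
          0.8112, 0.8123, 0.7971, 0.8104, 0.8147]

best, saved_epochs = float("-inf"), []
for epoch, score in enumerate(dev_f1, start=1):
    if score > best:                  # strictly better than any earlier epoch
        best = score
        saved_epochs.append(epoch)    # the trainer would write best-model.pt here

print(saved_epochs)  # [1, 2, 3, 5, 6, 7, 10]
```

Epoch 10 is therefore the checkpoint loaded for the final test evaluation below, even though its dev loss (0.217) is higher than epoch 2's: selection is on F1, not loss.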
+ 2023-10-25 15:17:49,120 Loading model from best epoch ...
+ 2023-10-25 15:17:51,035 SequenceTagger predicts: Dictionary with 17 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
+ 2023-10-25 15:18:03,575
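The 17-tag dictionary is a BIOES scheme over four entity types (PER, LOC, ORG, HumanProd) plus O: S- marks a single-token entity, while B-/I-/E- mark the beginning, inside and end of a multi-token one. A minimal span decoder for such sequences (a hypothetical helper for illustration, not Flair's own span extraction):

```python
def bioes_spans(tags):
    """Decode (start, end, label) entity spans from a BIOES tag sequence."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        if prefix == "S":                              # single-token entity
            spans.append((i, i, label))
        elif prefix == "B":                            # open a multi-token entity
            start = i
        elif prefix == "E" and start is not None:      # close it
            spans.append((start, i, label))
            start = None
    return spans

tags = ["S-LOC", "O", "B-PER", "I-PER", "E-PER", "O", "S-HumanProd"]
print(bioes_spans(tags))  # [(0, 0, 'LOC'), (2, 4, 'PER'), (6, 6, 'HumanProd')]
```

The per-class results below are computed over spans decoded this way, which is why precision and recall are span-level, not token-level.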
+ Results:
+ - F-score (micro) 0.6992
+ - F-score (macro) 0.6245
+ - Accuracy 0.5561
+
+ By class:
+               precision    recall  f1-score   support
+
+         LOC      0.7038    0.6986    0.7012      1095
+         PER      0.7808    0.7816    0.7812      1012
+         ORG      0.4549    0.5798    0.5099       357
+   HumanProd      0.4074    0.6667    0.5057        33
+
+   micro avg      0.6842    0.7149    0.6992      2497
+   macro avg      0.5867    0.6817    0.6245      2497
+ weighted avg     0.6955    0.7149    0.7037      2497
+
+ 2023-10-25 15:18:03,576 ----------------------------------------------------------------------------------------------------
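The summary scores follow directly from the per-class table: micro F1 is the harmonic mean of the pooled (micro avg) precision and recall, and macro F1 is the unweighted mean of the four per-class F1 scores. A quick consistency check against the numbers above:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# micro avg row: precision 0.6842, recall 0.7149 -> F-score (micro) 0.6992
micro_f1 = f1(0.6842, 0.7149)

# macro: unweighted mean of the per-class f1-scores (LOC, PER, ORG, HumanProd)
macro_f1 = sum([0.7012, 0.7812, 0.5099, 0.5057]) / 4

print(round(micro_f1, 4), round(macro_f1, 4))  # 0.6992 0.6245
```

The gap between micro (0.6992) and macro (0.6245) reflects the weaker, low-support ORG and HumanProd classes pulling the unweighted average down.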