stefan-it committed
Commit cf98020 · 1 Parent(s): 08f68ef

Upload folder using huggingface_hub
best-model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:36d69da44ec26317c249e1c708b9223a296d212d18e55cc1004ed38fad8531d3
+ size 870817519
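best-model.pt is stored as a Git LFS pointer: the three lines record only the spec version, the SHA-256 of the real checkpoint, and its size in bytes. After fetching the actual file (e.g. with `git lfs pull`), the download can be checked against the `oid` line. A minimal sketch in Python; the local path is an assumption for illustration:

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks so a ~870 MB checkpoint fits in constant memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Expected digest taken from the "oid sha256:..." line of the pointer above.
EXPECTED = "36d69da44ec26317c249e1c708b9223a296d212d18e55cc1004ed38fad8531d3"

# After `git lfs pull`, a hypothetical check would be:
# assert sha256_of_file("best-model.pt") == EXPECTED
```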
dev.tsv ADDED
The diff for this file is too large to render. See raw diff
 
final-model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:612b5c2d72323154797a914eb5ab99b3aaa9f4a1c669a0c9a2251c19601bf039
+ size 870817636
loss.tsv ADDED
@@ -0,0 +1,11 @@
+ EPOCH	TIMESTAMP	LEARNING_RATE	TRAIN_LOSS	DEV_LOSS	DEV_PRECISION	DEV_RECALL	DEV_F1	DEV_ACCURACY
+ 1	11:03:52	0.0001	0.9515	0.1219	0.4361	0.2652	0.3298	0.1977
+ 2	11:27:44	0.0001	0.1530	0.1417	0.2613	0.6875	0.3787	0.2354
+ 3	11:51:46	0.0001	0.0910	0.1857	0.2805	0.5928	0.3808	0.2362
+ 4	12:16:00	0.0001	0.0637	0.2290	0.2924	0.5057	0.3706	0.2284
+ 5	12:39:59	0.0001	0.0447	0.2849	0.2915	0.5152	0.3723	0.2299
+ 6	13:04:01	0.0001	0.0337	0.3598	0.2902	0.5814	0.3871	0.2417
+ 7	13:28:03	0.0001	0.0257	0.4001	0.2951	0.6098	0.3978	0.2486
+ 8	13:52:07	0.0000	0.0177	0.4319	0.2958	0.6364	0.4038	0.2538
+ 9	14:16:18	0.0000	0.0122	0.4377	0.3060	0.6212	0.4100	0.2589
+ 10	14:40:22	0.0000	0.0093	0.4512	0.3084	0.6326	0.4146	0.2624
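The table above tracks one row per epoch: train loss falls monotonically while dev loss rises after epoch 1, yet dev F1 keeps improving, so the best checkpoint lands on the last epoch. A small sketch that re-reads the rows (copied verbatim, whitespace-separated here) and picks the epoch with the highest DEV_F1:

```python
# Rows copied from loss.tsv above.
LOSS_TABLE = """\
EPOCH TIMESTAMP LEARNING_RATE TRAIN_LOSS DEV_LOSS DEV_PRECISION DEV_RECALL DEV_F1 DEV_ACCURACY
1 11:03:52 0.0001 0.9515 0.1219 0.4361 0.2652 0.3298 0.1977
2 11:27:44 0.0001 0.1530 0.1417 0.2613 0.6875 0.3787 0.2354
3 11:51:46 0.0001 0.0910 0.1857 0.2805 0.5928 0.3808 0.2362
4 12:16:00 0.0001 0.0637 0.2290 0.2924 0.5057 0.3706 0.2284
5 12:39:59 0.0001 0.0447 0.2849 0.2915 0.5152 0.3723 0.2299
6 13:04:01 0.0001 0.0337 0.3598 0.2902 0.5814 0.3871 0.2417
7 13:28:03 0.0001 0.0257 0.4001 0.2951 0.6098 0.3978 0.2486
8 13:52:07 0.0000 0.0177 0.4319 0.2958 0.6364 0.4038 0.2538
9 14:16:18 0.0000 0.0122 0.4377 0.3060 0.6212 0.4100 0.2589
10 14:40:22 0.0000 0.0093 0.4512 0.3084 0.6326 0.4146 0.2624
"""

rows = [line.split() for line in LOSS_TABLE.strip().splitlines()]
header, data = rows[0], rows[1:]
f1 = header.index("DEV_F1")

# The epoch whose row maximizes dev F1 is the one saved as best-model.pt.
best = max(data, key=lambda row: float(row[f1]))
print(f"best epoch: {best[0]} (dev F1 {best[f1]})")  # best epoch: 10 (dev F1 0.4146)
```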
runs/events.out.tfevents.1697107196.6d4c7681f95b.1253.16 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fe4a1b5a2b57ca9a13e32107ef2ab1c2c7e0d758d57cb0854a08a950bc849b6e
+ size 1464420
test.tsv ADDED
The diff for this file is too large to render. See raw diff
 
training.log ADDED
@@ -0,0 +1,266 @@
+ 2023-10-12 10:39:56,361 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 10:39:56,363 Model: "SequenceTagger(
+   (embeddings): ByT5Embeddings(
+     (model): T5EncoderModel(
+       (shared): Embedding(384, 1472)
+       (encoder): T5Stack(
+         (embed_tokens): Embedding(384, 1472)
+         (block): ModuleList(
+           (0): T5Block(
+             (layer): ModuleList(
+               (0): T5LayerSelfAttention(
+                 (SelfAttention): T5Attention(
+                   (q): Linear(in_features=1472, out_features=384, bias=False)
+                   (k): Linear(in_features=1472, out_features=384, bias=False)
+                   (v): Linear(in_features=1472, out_features=384, bias=False)
+                   (o): Linear(in_features=384, out_features=1472, bias=False)
+                   (relative_attention_bias): Embedding(32, 6)
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (1): T5LayerFF(
+                 (DenseReluDense): T5DenseGatedActDense(
+                   (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wo): Linear(in_features=3584, out_features=1472, bias=False)
+                   (dropout): Dropout(p=0.1, inplace=False)
+                   (act): NewGELUActivation()
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+           )
+           (1-11): 11 x T5Block(
+             (layer): ModuleList(
+               (0): T5LayerSelfAttention(
+                 (SelfAttention): T5Attention(
+                   (q): Linear(in_features=1472, out_features=384, bias=False)
+                   (k): Linear(in_features=1472, out_features=384, bias=False)
+                   (v): Linear(in_features=1472, out_features=384, bias=False)
+                   (o): Linear(in_features=384, out_features=1472, bias=False)
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (1): T5LayerFF(
+                 (DenseReluDense): T5DenseGatedActDense(
+                   (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wo): Linear(in_features=3584, out_features=1472, bias=False)
+                   (dropout): Dropout(p=0.1, inplace=False)
+                   (act): NewGELUActivation()
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+           )
+         )
+         (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+         (dropout): Dropout(p=0.1, inplace=False)
+       )
+     )
+   )
+   (locked_dropout): LockedDropout(p=0.5)
+   (linear): Linear(in_features=1472, out_features=17, bias=True)
+   (loss_function): CrossEntropyLoss()
+ )"
+ 2023-10-12 10:39:56,363 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 10:39:56,363 MultiCorpus: 20847 train + 1123 dev + 3350 test sentences
+  - NER_HIPE_2022 Corpus: 20847 train + 1123 dev + 3350 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/de/with_doc_seperator
+ 2023-10-12 10:39:56,363 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 10:39:56,363 Train: 20847 sentences
+ 2023-10-12 10:39:56,364 (train_with_dev=False, train_with_test=False)
+ 2023-10-12 10:39:56,364 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 10:39:56,364 Training Params:
+ 2023-10-12 10:39:56,364 - learning_rate: "0.00015"
+ 2023-10-12 10:39:56,364 - mini_batch_size: "8"
+ 2023-10-12 10:39:56,364 - max_epochs: "10"
+ 2023-10-12 10:39:56,364 - shuffle: "True"
+ 2023-10-12 10:39:56,364 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 10:39:56,364 Plugins:
+ 2023-10-12 10:39:56,364 - TensorboardLogger
+ 2023-10-12 10:39:56,364 - LinearScheduler | warmup_fraction: '0.1'
+ 2023-10-12 10:39:56,364 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 10:39:56,364 Final evaluation on model from best epoch (best-model.pt)
+ 2023-10-12 10:39:56,364 - metric: "('micro avg', 'f1-score')"
+ 2023-10-12 10:39:56,364 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 10:39:56,365 Computation:
+ 2023-10-12 10:39:56,365 - compute on device: cuda:0
+ 2023-10-12 10:39:56,365 - embedding storage: none
+ 2023-10-12 10:39:56,365 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 10:39:56,365 Model training base path: "hmbench-newseye/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-5"
+ 2023-10-12 10:39:56,365 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 10:39:56,365 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 10:39:56,365 Logging anything other than scalars to TensorBoard is currently not supported.
+ 2023-10-12 10:42:18,682 epoch 1 - iter 260/2606 - loss 2.79098046 - time (sec): 142.31 - samples/sec: 285.86 - lr: 0.000015 - momentum: 0.000000
+ 2023-10-12 10:44:40,875 epoch 1 - iter 520/2606 - loss 2.53978742 - time (sec): 284.51 - samples/sec: 278.51 - lr: 0.000030 - momentum: 0.000000
+ 2023-10-12 10:47:00,532 epoch 1 - iter 780/2606 - loss 2.17133359 - time (sec): 424.16 - samples/sec: 271.17 - lr: 0.000045 - momentum: 0.000000
+ 2023-10-12 10:49:20,483 epoch 1 - iter 1040/2606 - loss 1.79656729 - time (sec): 564.12 - samples/sec: 269.30 - lr: 0.000060 - momentum: 0.000000
+ 2023-10-12 10:51:42,179 epoch 1 - iter 1300/2606 - loss 1.53837733 - time (sec): 705.81 - samples/sec: 267.96 - lr: 0.000075 - momentum: 0.000000
+ 2023-10-12 10:54:02,563 epoch 1 - iter 1560/2606 - loss 1.35780764 - time (sec): 846.20 - samples/sec: 268.62 - lr: 0.000090 - momentum: 0.000000
+ 2023-10-12 10:56:22,180 epoch 1 - iter 1820/2606 - loss 1.21922934 - time (sec): 985.81 - samples/sec: 266.92 - lr: 0.000105 - momentum: 0.000000
+ 2023-10-12 10:58:41,031 epoch 1 - iter 2080/2606 - loss 1.10847958 - time (sec): 1124.66 - samples/sec: 266.18 - lr: 0.000120 - momentum: 0.000000
+ 2023-10-12 11:00:56,908 epoch 1 - iter 2340/2606 - loss 1.02422584 - time (sec): 1260.54 - samples/sec: 264.41 - lr: 0.000135 - momentum: 0.000000
+ 2023-10-12 11:03:12,052 epoch 1 - iter 2600/2606 - loss 0.95341170 - time (sec): 1395.68 - samples/sec: 262.62 - lr: 0.000150 - momentum: 0.000000
+ 2023-10-12 11:03:15,142 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 11:03:15,143 EPOCH 1 done: loss 0.9515 - lr: 0.000150
+ 2023-10-12 11:03:52,364 DEV : loss 0.12193801254034042 - f1-score (micro avg) 0.3298
+ 2023-10-12 11:03:52,416 saving best model
+ 2023-10-12 11:03:53,327 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 11:06:10,460 epoch 2 - iter 260/2606 - loss 0.22817029 - time (sec): 137.13 - samples/sec: 260.71 - lr: 0.000148 - momentum: 0.000000
+ 2023-10-12 11:08:28,429 epoch 2 - iter 520/2606 - loss 0.19785334 - time (sec): 275.10 - samples/sec: 261.21 - lr: 0.000147 - momentum: 0.000000
+ 2023-10-12 11:10:44,641 epoch 2 - iter 780/2606 - loss 0.18391103 - time (sec): 411.31 - samples/sec: 259.17 - lr: 0.000145 - momentum: 0.000000
+ 2023-10-12 11:13:03,481 epoch 2 - iter 1040/2606 - loss 0.17657181 - time (sec): 550.15 - samples/sec: 260.46 - lr: 0.000143 - momentum: 0.000000
+ 2023-10-12 11:15:22,641 epoch 2 - iter 1300/2606 - loss 0.17249729 - time (sec): 689.31 - samples/sec: 260.29 - lr: 0.000142 - momentum: 0.000000
+ 2023-10-12 11:17:42,766 epoch 2 - iter 1560/2606 - loss 0.16463122 - time (sec): 829.44 - samples/sec: 262.66 - lr: 0.000140 - momentum: 0.000000
+ 2023-10-12 11:20:00,409 epoch 2 - iter 1820/2606 - loss 0.16141368 - time (sec): 967.08 - samples/sec: 261.73 - lr: 0.000138 - momentum: 0.000000
+ 2023-10-12 11:22:18,980 epoch 2 - iter 2080/2606 - loss 0.15898957 - time (sec): 1105.65 - samples/sec: 263.51 - lr: 0.000137 - momentum: 0.000000
+ 2023-10-12 11:24:40,875 epoch 2 - iter 2340/2606 - loss 0.15595656 - time (sec): 1247.55 - samples/sec: 263.93 - lr: 0.000135 - momentum: 0.000000
+ 2023-10-12 11:26:59,696 epoch 2 - iter 2600/2606 - loss 0.15306281 - time (sec): 1386.37 - samples/sec: 264.42 - lr: 0.000133 - momentum: 0.000000
+ 2023-10-12 11:27:02,847 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 11:27:02,847 EPOCH 2 done: loss 0.1530 - lr: 0.000133
+ 2023-10-12 11:27:44,631 DEV : loss 0.14174272119998932 - f1-score (micro avg) 0.3787
+ 2023-10-12 11:27:44,688 saving best model
+ 2023-10-12 11:27:47,305 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 11:30:06,778 epoch 3 - iter 260/2606 - loss 0.09674644 - time (sec): 139.47 - samples/sec: 254.60 - lr: 0.000132 - momentum: 0.000000
+ 2023-10-12 11:32:22,055 epoch 3 - iter 520/2606 - loss 0.10014325 - time (sec): 274.75 - samples/sec: 250.67 - lr: 0.000130 - momentum: 0.000000
+ 2023-10-12 11:34:42,409 epoch 3 - iter 780/2606 - loss 0.09513807 - time (sec): 415.10 - samples/sec: 256.98 - lr: 0.000128 - momentum: 0.000000
+ 2023-10-12 11:37:01,896 epoch 3 - iter 1040/2606 - loss 0.09185215 - time (sec): 554.59 - samples/sec: 257.11 - lr: 0.000127 - momentum: 0.000000
+ 2023-10-12 11:39:21,533 epoch 3 - iter 1300/2606 - loss 0.09008465 - time (sec): 694.22 - samples/sec: 258.02 - lr: 0.000125 - momentum: 0.000000
+ 2023-10-12 11:41:41,861 epoch 3 - iter 1560/2606 - loss 0.09055602 - time (sec): 834.55 - samples/sec: 259.20 - lr: 0.000123 - momentum: 0.000000
+ 2023-10-12 11:44:01,170 epoch 3 - iter 1820/2606 - loss 0.09064034 - time (sec): 973.86 - samples/sec: 258.25 - lr: 0.000122 - momentum: 0.000000
+ 2023-10-12 11:46:21,153 epoch 3 - iter 2080/2606 - loss 0.09076692 - time (sec): 1113.84 - samples/sec: 259.66 - lr: 0.000120 - momentum: 0.000000
+ 2023-10-12 11:48:43,071 epoch 3 - iter 2340/2606 - loss 0.08970600 - time (sec): 1255.76 - samples/sec: 262.05 - lr: 0.000118 - momentum: 0.000000
+ 2023-10-12 11:51:01,574 epoch 3 - iter 2600/2606 - loss 0.09003826 - time (sec): 1394.26 - samples/sec: 262.94 - lr: 0.000117 - momentum: 0.000000
+ 2023-10-12 11:51:04,717 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 11:51:04,718 EPOCH 3 done: loss 0.0910 - lr: 0.000117
+ 2023-10-12 11:51:46,787 DEV : loss 0.1857159584760666 - f1-score (micro avg) 0.3808
+ 2023-10-12 11:51:46,841 saving best model
+ 2023-10-12 11:51:49,486 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 11:54:08,085 epoch 4 - iter 260/2606 - loss 0.06864050 - time (sec): 138.59 - samples/sec: 257.51 - lr: 0.000115 - momentum: 0.000000
+ 2023-10-12 11:56:28,898 epoch 4 - iter 520/2606 - loss 0.06148994 - time (sec): 279.41 - samples/sec: 261.33 - lr: 0.000113 - momentum: 0.000000
+ 2023-10-12 11:58:48,559 epoch 4 - iter 780/2606 - loss 0.06145061 - time (sec): 419.07 - samples/sec: 260.81 - lr: 0.000112 - momentum: 0.000000
+ 2023-10-12 12:01:11,693 epoch 4 - iter 1040/2606 - loss 0.06341903 - time (sec): 562.20 - samples/sec: 256.24 - lr: 0.000110 - momentum: 0.000000
+ 2023-10-12 12:03:37,108 epoch 4 - iter 1300/2606 - loss 0.06522316 - time (sec): 707.62 - samples/sec: 257.77 - lr: 0.000108 - momentum: 0.000000
+ 2023-10-12 12:06:02,487 epoch 4 - iter 1560/2606 - loss 0.06572690 - time (sec): 853.00 - samples/sec: 259.00 - lr: 0.000107 - momentum: 0.000000
+ 2023-10-12 12:08:21,818 epoch 4 - iter 1820/2606 - loss 0.06365649 - time (sec): 992.33 - samples/sec: 259.68 - lr: 0.000105 - momentum: 0.000000
+ 2023-10-12 12:10:38,937 epoch 4 - iter 2080/2606 - loss 0.06307986 - time (sec): 1129.45 - samples/sec: 260.42 - lr: 0.000103 - momentum: 0.000000
+ 2023-10-12 12:12:58,069 epoch 4 - iter 2340/2606 - loss 0.06305110 - time (sec): 1268.58 - samples/sec: 260.22 - lr: 0.000102 - momentum: 0.000000
+ 2023-10-12 12:15:16,561 epoch 4 - iter 2600/2606 - loss 0.06362841 - time (sec): 1407.07 - samples/sec: 260.73 - lr: 0.000100 - momentum: 0.000000
+ 2023-10-12 12:15:19,400 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 12:15:19,401 EPOCH 4 done: loss 0.0637 - lr: 0.000100
+ 2023-10-12 12:16:00,766 DEV : loss 0.22897003591060638 - f1-score (micro avg) 0.3706
+ 2023-10-12 12:16:00,819 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 12:18:16,863 epoch 5 - iter 260/2606 - loss 0.04354472 - time (sec): 136.04 - samples/sec: 264.82 - lr: 0.000098 - momentum: 0.000000
+ 2023-10-12 12:20:33,081 epoch 5 - iter 520/2606 - loss 0.04345957 - time (sec): 272.26 - samples/sec: 257.57 - lr: 0.000097 - momentum: 0.000000
+ 2023-10-12 12:22:52,495 epoch 5 - iter 780/2606 - loss 0.04448711 - time (sec): 411.67 - samples/sec: 261.23 - lr: 0.000095 - momentum: 0.000000
+ 2023-10-12 12:25:11,546 epoch 5 - iter 1040/2606 - loss 0.04417456 - time (sec): 550.72 - samples/sec: 260.87 - lr: 0.000093 - momentum: 0.000000
+ 2023-10-12 12:27:30,883 epoch 5 - iter 1300/2606 - loss 0.04446586 - time (sec): 690.06 - samples/sec: 262.18 - lr: 0.000092 - momentum: 0.000000
+ 2023-10-12 12:29:53,071 epoch 5 - iter 1560/2606 - loss 0.04447562 - time (sec): 832.25 - samples/sec: 265.01 - lr: 0.000090 - momentum: 0.000000
+ 2023-10-12 12:32:13,835 epoch 5 - iter 1820/2606 - loss 0.04488614 - time (sec): 973.01 - samples/sec: 266.42 - lr: 0.000088 - momentum: 0.000000
+ 2023-10-12 12:34:27,520 epoch 5 - iter 2080/2606 - loss 0.04528989 - time (sec): 1106.70 - samples/sec: 264.22 - lr: 0.000087 - momentum: 0.000000
+ 2023-10-12 12:36:51,484 epoch 5 - iter 2340/2606 - loss 0.04482478 - time (sec): 1250.66 - samples/sec: 263.88 - lr: 0.000085 - momentum: 0.000000
+ 2023-10-12 12:39:13,945 epoch 5 - iter 2600/2606 - loss 0.04472883 - time (sec): 1393.12 - samples/sec: 263.19 - lr: 0.000083 - momentum: 0.000000
+ 2023-10-12 12:39:17,034 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 12:39:17,034 EPOCH 5 done: loss 0.0447 - lr: 0.000083
+ 2023-10-12 12:39:59,133 DEV : loss 0.2849178612232208 - f1-score (micro avg) 0.3723
+ 2023-10-12 12:39:59,190 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 12:42:16,257 epoch 6 - iter 260/2606 - loss 0.03205628 - time (sec): 137.07 - samples/sec: 257.42 - lr: 0.000082 - momentum: 0.000000
+ 2023-10-12 12:44:40,129 epoch 6 - iter 520/2606 - loss 0.03060171 - time (sec): 280.94 - samples/sec: 264.85 - lr: 0.000080 - momentum: 0.000000
+ 2023-10-12 12:46:59,629 epoch 6 - iter 780/2606 - loss 0.02986452 - time (sec): 420.44 - samples/sec: 265.75 - lr: 0.000078 - momentum: 0.000000
+ 2023-10-12 12:49:19,956 epoch 6 - iter 1040/2606 - loss 0.03282256 - time (sec): 560.76 - samples/sec: 262.17 - lr: 0.000077 - momentum: 0.000000
+ 2023-10-12 12:51:37,247 epoch 6 - iter 1300/2606 - loss 0.03337104 - time (sec): 698.06 - samples/sec: 260.05 - lr: 0.000075 - momentum: 0.000000
+ 2023-10-12 12:53:58,164 epoch 6 - iter 1560/2606 - loss 0.03383846 - time (sec): 838.97 - samples/sec: 262.62 - lr: 0.000073 - momentum: 0.000000
+ 2023-10-12 12:56:20,297 epoch 6 - iter 1820/2606 - loss 0.03513634 - time (sec): 981.11 - samples/sec: 263.52 - lr: 0.000072 - momentum: 0.000000
+ 2023-10-12 12:58:37,971 epoch 6 - iter 2080/2606 - loss 0.03457993 - time (sec): 1118.78 - samples/sec: 261.28 - lr: 0.000070 - momentum: 0.000000
+ 2023-10-12 13:00:55,395 epoch 6 - iter 2340/2606 - loss 0.03450918 - time (sec): 1256.20 - samples/sec: 261.02 - lr: 0.000068 - momentum: 0.000000
+ 2023-10-12 13:03:15,842 epoch 6 - iter 2600/2606 - loss 0.03375577 - time (sec): 1396.65 - samples/sec: 262.22 - lr: 0.000067 - momentum: 0.000000
+ 2023-10-12 13:03:19,438 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 13:03:19,439 EPOCH 6 done: loss 0.0337 - lr: 0.000067
+ 2023-10-12 13:04:01,257 DEV : loss 0.35980162024497986 - f1-score (micro avg) 0.3871
+ 2023-10-12 13:04:01,331 saving best model
+ 2023-10-12 13:04:04,002 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 13:06:24,564 epoch 7 - iter 260/2606 - loss 0.02056707 - time (sec): 140.56 - samples/sec: 262.29 - lr: 0.000065 - momentum: 0.000000
+ 2023-10-12 13:08:44,030 epoch 7 - iter 520/2606 - loss 0.02211430 - time (sec): 280.02 - samples/sec: 259.85 - lr: 0.000063 - momentum: 0.000000
+ 2023-10-12 13:11:03,498 epoch 7 - iter 780/2606 - loss 0.02238197 - time (sec): 419.49 - samples/sec: 261.02 - lr: 0.000062 - momentum: 0.000000
+ 2023-10-12 13:13:22,390 epoch 7 - iter 1040/2606 - loss 0.02341070 - time (sec): 558.38 - samples/sec: 262.32 - lr: 0.000060 - momentum: 0.000000
+ 2023-10-12 13:15:40,113 epoch 7 - iter 1300/2606 - loss 0.02460152 - time (sec): 696.11 - samples/sec: 264.02 - lr: 0.000058 - momentum: 0.000000
+ 2023-10-12 13:18:05,020 epoch 7 - iter 1560/2606 - loss 0.02429302 - time (sec): 841.01 - samples/sec: 268.16 - lr: 0.000057 - momentum: 0.000000
+ 2023-10-12 13:20:23,524 epoch 7 - iter 1820/2606 - loss 0.02376757 - time (sec): 979.52 - samples/sec: 266.99 - lr: 0.000055 - momentum: 0.000000
+ 2023-10-12 13:22:39,436 epoch 7 - iter 2080/2606 - loss 0.02507358 - time (sec): 1115.43 - samples/sec: 264.44 - lr: 0.000053 - momentum: 0.000000
+ 2023-10-12 13:24:57,530 epoch 7 - iter 2340/2606 - loss 0.02604305 - time (sec): 1253.52 - samples/sec: 263.11 - lr: 0.000052 - momentum: 0.000000
+ 2023-10-12 13:27:18,710 epoch 7 - iter 2600/2606 - loss 0.02570231 - time (sec): 1394.70 - samples/sec: 262.85 - lr: 0.000050 - momentum: 0.000000
+ 2023-10-12 13:27:21,846 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 13:27:21,846 EPOCH 7 done: loss 0.0257 - lr: 0.000050
+ 2023-10-12 13:28:03,580 DEV : loss 0.40007448196411133 - f1-score (micro avg) 0.3978
+ 2023-10-12 13:28:03,637 saving best model
+ 2023-10-12 13:28:06,266 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 13:30:27,485 epoch 8 - iter 260/2606 - loss 0.01482737 - time (sec): 141.21 - samples/sec: 266.04 - lr: 0.000048 - momentum: 0.000000
+ 2023-10-12 13:32:45,532 epoch 8 - iter 520/2606 - loss 0.01696598 - time (sec): 279.26 - samples/sec: 261.91 - lr: 0.000047 - momentum: 0.000000
+ 2023-10-12 13:35:06,637 epoch 8 - iter 780/2606 - loss 0.01817455 - time (sec): 420.37 - samples/sec: 262.67 - lr: 0.000045 - momentum: 0.000000
+ 2023-10-12 13:37:25,483 epoch 8 - iter 1040/2606 - loss 0.01837600 - time (sec): 559.21 - samples/sec: 264.78 - lr: 0.000043 - momentum: 0.000000
+ 2023-10-12 13:39:41,894 epoch 8 - iter 1300/2606 - loss 0.01883484 - time (sec): 695.62 - samples/sec: 261.77 - lr: 0.000042 - momentum: 0.000000
+ 2023-10-12 13:41:59,734 epoch 8 - iter 1560/2606 - loss 0.01783081 - time (sec): 833.46 - samples/sec: 262.78 - lr: 0.000040 - momentum: 0.000000
+ 2023-10-12 13:44:16,165 epoch 8 - iter 1820/2606 - loss 0.01833982 - time (sec): 969.89 - samples/sec: 261.10 - lr: 0.000038 - momentum: 0.000000
+ 2023-10-12 13:46:35,668 epoch 8 - iter 2080/2606 - loss 0.01813588 - time (sec): 1109.40 - samples/sec: 260.69 - lr: 0.000037 - momentum: 0.000000
+ 2023-10-12 13:48:57,758 epoch 8 - iter 2340/2606 - loss 0.01823581 - time (sec): 1251.49 - samples/sec: 263.09 - lr: 0.000035 - momentum: 0.000000
+ 2023-10-12 13:51:20,031 epoch 8 - iter 2600/2606 - loss 0.01778836 - time (sec): 1393.76 - samples/sec: 262.81 - lr: 0.000033 - momentum: 0.000000
+ 2023-10-12 13:51:23,576 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 13:51:23,576 EPOCH 8 done: loss 0.0177 - lr: 0.000033
+ 2023-10-12 13:52:07,288 DEV : loss 0.4319295883178711 - f1-score (micro avg) 0.4038
+ 2023-10-12 13:52:07,356 saving best model
+ 2023-10-12 13:52:09,992 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 13:54:30,936 epoch 9 - iter 260/2606 - loss 0.01164345 - time (sec): 140.94 - samples/sec: 265.00 - lr: 0.000032 - momentum: 0.000000
+ 2023-10-12 13:56:44,434 epoch 9 - iter 520/2606 - loss 0.01383209 - time (sec): 274.44 - samples/sec: 255.72 - lr: 0.000030 - momentum: 0.000000
+ 2023-10-12 13:59:02,722 epoch 9 - iter 780/2606 - loss 0.01292411 - time (sec): 412.73 - samples/sec: 260.02 - lr: 0.000028 - momentum: 0.000000
+ 2023-10-12 14:01:23,730 epoch 9 - iter 1040/2606 - loss 0.01215414 - time (sec): 553.73 - samples/sec: 260.86 - lr: 0.000027 - momentum: 0.000000
+ 2023-10-12 14:03:45,315 epoch 9 - iter 1300/2606 - loss 0.01270955 - time (sec): 695.32 - samples/sec: 263.37 - lr: 0.000025 - momentum: 0.000000
+ 2023-10-12 14:06:04,587 epoch 9 - iter 1560/2606 - loss 0.01279780 - time (sec): 834.59 - samples/sec: 262.06 - lr: 0.000023 - momentum: 0.000000
+ 2023-10-12 14:08:30,951 epoch 9 - iter 1820/2606 - loss 0.01225063 - time (sec): 980.95 - samples/sec: 261.88 - lr: 0.000022 - momentum: 0.000000
+ 2023-10-12 14:10:55,018 epoch 9 - iter 2080/2606 - loss 0.01260601 - time (sec): 1125.02 - samples/sec: 261.01 - lr: 0.000020 - momentum: 0.000000
+ 2023-10-12 14:13:17,567 epoch 9 - iter 2340/2606 - loss 0.01255088 - time (sec): 1267.57 - samples/sec: 260.39 - lr: 0.000018 - momentum: 0.000000
+ 2023-10-12 14:15:34,884 epoch 9 - iter 2600/2606 - loss 0.01226504 - time (sec): 1404.89 - samples/sec: 261.09 - lr: 0.000017 - momentum: 0.000000
+ 2023-10-12 14:15:37,671 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 14:15:37,671 EPOCH 9 done: loss 0.0122 - lr: 0.000017
+ 2023-10-12 14:16:18,359 DEV : loss 0.43767231702804565 - f1-score (micro avg) 0.41
+ 2023-10-12 14:16:18,411 saving best model
+ 2023-10-12 14:16:20,975 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 14:18:40,818 epoch 10 - iter 260/2606 - loss 0.00960636 - time (sec): 139.84 - samples/sec: 261.49 - lr: 0.000015 - momentum: 0.000000
+ 2023-10-12 14:20:58,673 epoch 10 - iter 520/2606 - loss 0.01062704 - time (sec): 277.69 - samples/sec: 259.88 - lr: 0.000013 - momentum: 0.000000
+ 2023-10-12 14:23:18,738 epoch 10 - iter 780/2606 - loss 0.01059562 - time (sec): 417.76 - samples/sec: 262.80 - lr: 0.000012 - momentum: 0.000000
+ 2023-10-12 14:25:36,355 epoch 10 - iter 1040/2606 - loss 0.00990986 - time (sec): 555.38 - samples/sec: 260.97 - lr: 0.000010 - momentum: 0.000000
+ 2023-10-12 14:27:56,432 epoch 10 - iter 1300/2606 - loss 0.00958578 - time (sec): 695.45 - samples/sec: 261.48 - lr: 0.000008 - momentum: 0.000000
+ 2023-10-12 14:30:16,906 epoch 10 - iter 1560/2606 - loss 0.00964887 - time (sec): 835.93 - samples/sec: 261.59 - lr: 0.000007 - momentum: 0.000000
+ 2023-10-12 14:32:34,264 epoch 10 - iter 1820/2606 - loss 0.00928930 - time (sec): 973.28 - samples/sec: 260.18 - lr: 0.000005 - momentum: 0.000000
+ 2023-10-12 14:34:54,973 epoch 10 - iter 2080/2606 - loss 0.00963159 - time (sec): 1113.99 - samples/sec: 261.10 - lr: 0.000003 - momentum: 0.000000
+ 2023-10-12 14:37:15,806 epoch 10 - iter 2340/2606 - loss 0.00957205 - time (sec): 1254.83 - samples/sec: 261.18 - lr: 0.000002 - momentum: 0.000000
+ 2023-10-12 14:39:35,671 epoch 10 - iter 2600/2606 - loss 0.00934680 - time (sec): 1394.69 - samples/sec: 262.81 - lr: 0.000000 - momentum: 0.000000
+ 2023-10-12 14:39:38,739 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 14:39:38,739 EPOCH 10 done: loss 0.0093 - lr: 0.000000
+ 2023-10-12 14:40:22,122 DEV : loss 0.45118266344070435 - f1-score (micro avg) 0.4146
+ 2023-10-12 14:40:22,181 saving best model
+ 2023-10-12 14:40:24,269 ----------------------------------------------------------------------------------------------------
+ 2023-10-12 14:40:24,271 Loading model from best epoch ...
+ 2023-10-12 14:40:28,579 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
+ 2023-10-12 14:42:13,016
+ Results:
+ - F-score (micro) 0.4344
+ - F-score (macro) 0.3036
+ - Accuracy 0.2816
+
+ By class:
+                precision    recall  f1-score   support
+
+          LOC      0.4256    0.4992    0.4594      1214
+          PER      0.4038    0.5272    0.4573       808
+          ORG      0.2895    0.3059    0.2975       353
+    HumanProd      0.0000    0.0000    0.0000        15
+
+    micro avg      0.3987    0.4770    0.4344      2390
+    macro avg      0.2797    0.3331    0.3036      2390
+ weighted avg      0.3954    0.4770    0.4319      2390
+
+ 2023-10-12 14:42:13,016 ----------------------------------------------------------------------------------------------------
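The lr column in the log follows the `LinearScheduler | warmup_fraction: '0.1'` plugin noted at the top: the rate climbs linearly to the 0.00015 peak over the first 10% of all batch steps and then decays linearly to zero by the end of training. A sketch of that shape only, not Flair's actual implementation; `total_steps = 26060` is an assumption derived from 2606 iterations per epoch × 10 epochs:

```python
def linear_warmup_decay_lr(step: int,
                           peak_lr: float = 0.00015,
                           total_steps: int = 26060,
                           warmup_fraction: float = 0.1) -> float:
    """Linear warmup to peak_lr over the first warmup_fraction of steps,
    then linear decay to 0 over the remaining steps (shape only)."""
    warmup_steps = int(total_steps * warmup_fraction)  # 2606 here: all of epoch 1
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

# Consistent with the log: ~0.000015 at iter 260 of epoch 1,
# the 0.00015 peak at the end of epoch 1, and 0 at the final step.
```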