stefan-it committed on
Commit e2ee494 · 1 Parent(s): 82f1901

Upload folder using huggingface_hub

best-model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:42bbad9e0a98c295de69b36a412db65e37c5fb44fd921e6c0e7900e2a65e3831
+ size 870793839
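The three added lines above are a Git LFS pointer, not the checkpoint itself: they record the spec version, the SHA-256 object id (`oid`), and the size in bytes of the real ~870 MB `best-model.pt`. A minimal Python sketch for reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper, and it assumes only the simple `key value` layout visible here rather than every optional field the LFS spec allows.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file (one "key value" pair per line)."""
    fields = {}
    for line in text.strip().splitlines():
        # Key and value are separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        raise ValueError("not a Git LFS pointer")
    # The oid is "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:42bbad9e0a98c295de69b36a412db65e37c5fb44fd921e6c0e7900e2a65e3831
size 870793839
"""
info = parse_lfs_pointer(pointer)
print(info["algo"], info["size"])  # sha256 870793839
```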
dev.tsv ADDED
The diff for this file is too large to render.
 
final-model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7d3af10133e1a85c4b2fe5a457b7e2c87686f4a60b4b4e6229dc7b10be6f6b6f
+ size 870793956
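`final-model.pt` is stored the same way, with its own `oid` and a `size` 117 bytes larger than `best-model.pt`. Once the real file has been fetched, one way to confirm it matches its pointer is to compare the byte count and the SHA-256 digest. The sketch below assumes the checkpoint has been downloaded locally; `matches_lfs_pointer` and the local path are hypothetical.

```python
import hashlib
import os


def matches_lfs_pointer(path: str, expected_oid: str, expected_size: int,
                        chunk_size: int = 1 << 20) -> bool:
    """Check a local file against the size and SHA-256 oid from a Git LFS pointer."""
    # Cheap check first: a size mismatch already means a truncated or wrong download.
    if os.path.getsize(path) != expected_size:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Hash in 1 MiB chunks so an ~870 MB checkpoint is never read into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_oid


# Hypothetical usage, assuming the real checkpoint was downloaded to ./final-model.pt:
# matches_lfs_pointer(
#     "final-model.pt",
#     "7d3af10133e1a85c4b2fe5a457b7e2c87686f4a60b4b4e6229dc7b10be6f6b6f",
#     870793956,
# )
```

Checking the size before hashing lets an incomplete download fail fast without reading the whole file.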
loss.tsv ADDED
@@ -0,0 +1,11 @@
+ EPOCH TIMESTAMP LEARNING_RATE TRAIN_LOSS DEV_LOSS DEV_PRECISION DEV_RECALL DEV_F1 DEV_ACCURACY
+ 1 10:25:30 0.0001 0.7705 0.1487 0.3518 0.4714 0.4029 0.2651
+ 2 10:42:25 0.0001 0.1022 0.0991 0.5314 0.6190 0.5719 0.4062
+ 3 10:59:24 0.0001 0.0632 0.1173 0.5590 0.7368 0.6357 0.4753
+ 4 11:16:44 0.0001 0.0457 0.1688 0.5385 0.7769 0.6361 0.4765
+ 5 11:34:04 0.0001 0.0337 0.2216 0.5587 0.7735 0.6488 0.4874
+ 6 11:50:42 0.0001 0.0248 0.2643 0.5562 0.7815 0.6499 0.4879
+ 7 12:07:37 0.0001 0.0190 0.3006 0.5588 0.7723 0.6484 0.4884
+ 8 12:24:39 0.0000 0.0136 0.3134 0.5600 0.7586 0.6443 0.4829
+ 9 12:41:52 0.0000 0.0112 0.3353 0.5612 0.7654 0.6476 0.4862
+ 10 12:58:38 0.0000 0.0071 0.3420 0.5649 0.7769 0.6541 0.4931
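`loss.tsv` holds one row per epoch. Since the run selects its checkpoint by micro-averaged dev F1 (the `metric` line in `training.log`), the best epoch can be recovered from this table alone. A sketch, assuming whitespace-separated columns as rendered here (the on-disk file is likely tab-separated, which `str.split()` handles the same way); the function names are hypothetical.

```python
LOSS_TSV = """\
EPOCH TIMESTAMP LEARNING_RATE TRAIN_LOSS DEV_LOSS DEV_PRECISION DEV_RECALL DEV_F1 DEV_ACCURACY
1 10:25:30 0.0001 0.7705 0.1487 0.3518 0.4714 0.4029 0.2651
2 10:42:25 0.0001 0.1022 0.0991 0.5314 0.6190 0.5719 0.4062
3 10:59:24 0.0001 0.0632 0.1173 0.5590 0.7368 0.6357 0.4753
4 11:16:44 0.0001 0.0457 0.1688 0.5385 0.7769 0.6361 0.4765
5 11:34:04 0.0001 0.0337 0.2216 0.5587 0.7735 0.6488 0.4874
6 11:50:42 0.0001 0.0248 0.2643 0.5562 0.7815 0.6499 0.4879
7 12:07:37 0.0001 0.0190 0.3006 0.5588 0.7723 0.6484 0.4884
8 12:24:39 0.0000 0.0136 0.3134 0.5600 0.7586 0.6443 0.4829
9 12:41:52 0.0000 0.0112 0.3353 0.5612 0.7654 0.6476 0.4862
10 12:58:38 0.0000 0.0071 0.3420 0.5649 0.7769 0.6541 0.4931
"""


def read_rows(tsv_text: str) -> list[dict[str, str]]:
    """Split each line on whitespace (tabs or spaces) and key values by the header row."""
    lines = tsv_text.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def best_epoch(rows: list[dict[str, str]], metric: str = "DEV_F1") -> tuple[int, float]:
    """Return (epoch, score) for the row with the highest value of `metric`."""
    best = max(rows, key=lambda r: float(r[metric]))
    return int(best["EPOCH"]), float(best[metric])


def improving_epochs(rows: list[dict[str, str]], metric: str = "DEV_F1") -> list[int]:
    """Epochs whose `metric` beats every earlier epoch (when a new best model would be saved)."""
    epochs, running_best = [], float("-inf")
    for r in rows:
        score = float(r[metric])
        if score > running_best:
            epochs.append(int(r["EPOCH"]))
            running_best = score
    return epochs


rows = read_rows(LOSS_TSV)
print(best_epoch(rows))        # (10, 0.6541)
print(improving_epochs(rows))  # [1, 2, 3, 4, 5, 6, 10]
```

The second list matches the epochs after which `training.log` prints `saving best model`. `DEV_LOSS` is lowest at epoch 2 (0.0991) and rises afterwards, so selecting by dev loss instead of dev F1 would have kept a different checkpoint.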
runs/events.out.tfevents.1697191722.c8b2203b18a8.2923.4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c95853a85881346b1ad1bc25664a6dcf6b52348a7f73ef95ebaf0b4342fb38b0
+ size 1018100
test.tsv ADDED
The diff for this file is too large to render.
 
training.log ADDED
@@ -0,0 +1,264 @@
+ 2023-10-13 10:08:42,076 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 10:08:42,079 Model: "SequenceTagger(
+   (embeddings): ByT5Embeddings(
+     (model): T5EncoderModel(
+       (shared): Embedding(384, 1472)
+       (encoder): T5Stack(
+         (embed_tokens): Embedding(384, 1472)
+         (block): ModuleList(
+           (0): T5Block(
+             (layer): ModuleList(
+               (0): T5LayerSelfAttention(
+                 (SelfAttention): T5Attention(
+                   (q): Linear(in_features=1472, out_features=384, bias=False)
+                   (k): Linear(in_features=1472, out_features=384, bias=False)
+                   (v): Linear(in_features=1472, out_features=384, bias=False)
+                   (o): Linear(in_features=384, out_features=1472, bias=False)
+                   (relative_attention_bias): Embedding(32, 6)
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (1): T5LayerFF(
+                 (DenseReluDense): T5DenseGatedActDense(
+                   (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wo): Linear(in_features=3584, out_features=1472, bias=False)
+                   (dropout): Dropout(p=0.1, inplace=False)
+                   (act): NewGELUActivation()
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+           )
+           (1-11): 11 x T5Block(
+             (layer): ModuleList(
+               (0): T5LayerSelfAttention(
+                 (SelfAttention): T5Attention(
+                   (q): Linear(in_features=1472, out_features=384, bias=False)
+                   (k): Linear(in_features=1472, out_features=384, bias=False)
+                   (v): Linear(in_features=1472, out_features=384, bias=False)
+                   (o): Linear(in_features=384, out_features=1472, bias=False)
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (1): T5LayerFF(
+                 (DenseReluDense): T5DenseGatedActDense(
+                   (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wo): Linear(in_features=3584, out_features=1472, bias=False)
+                   (dropout): Dropout(p=0.1, inplace=False)
+                   (act): NewGELUActivation()
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+           )
+         )
+         (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+         (dropout): Dropout(p=0.1, inplace=False)
+       )
+     )
+   )
+   (locked_dropout): LockedDropout(p=0.5)
+   (linear): Linear(in_features=1472, out_features=13, bias=True)
+   (loss_function): CrossEntropyLoss()
+ )"
+ 2023-10-13 10:08:42,079 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 10:08:42,079 MultiCorpus: 14465 train + 1392 dev + 2432 test sentences
+ - NER_HIPE_2022 Corpus: 14465 train + 1392 dev + 2432 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/letemps/fr/with_doc_seperator
+ 2023-10-13 10:08:42,079 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 10:08:42,079 Train: 14465 sentences
+ 2023-10-13 10:08:42,079 (train_with_dev=False, train_with_test=False)
+ 2023-10-13 10:08:42,079 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 10:08:42,079 Training Params:
+ 2023-10-13 10:08:42,079 - learning_rate: "0.00015"
+ 2023-10-13 10:08:42,080 - mini_batch_size: "8"
+ 2023-10-13 10:08:42,080 - max_epochs: "10"
+ 2023-10-13 10:08:42,080 - shuffle: "True"
+ 2023-10-13 10:08:42,080 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 10:08:42,080 Plugins:
+ 2023-10-13 10:08:42,080 - TensorboardLogger
+ 2023-10-13 10:08:42,080 - LinearScheduler | warmup_fraction: '0.1'
+ 2023-10-13 10:08:42,080 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 10:08:42,080 Final evaluation on model from best epoch (best-model.pt)
+ 2023-10-13 10:08:42,080 - metric: "('micro avg', 'f1-score')"
+ 2023-10-13 10:08:42,080 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 10:08:42,080 Computation:
+ 2023-10-13 10:08:42,080 - compute on device: cuda:0
+ 2023-10-13 10:08:42,080 - embedding storage: none
+ 2023-10-13 10:08:42,080 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 10:08:42,081 Model training base path: "hmbench-letemps/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-2"
+ 2023-10-13 10:08:42,081 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 10:08:42,081 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 10:08:42,081 Logging anything other than scalars to TensorBoard is currently not supported.
+ 2023-10-13 10:10:16,670 epoch 1 - iter 180/1809 - loss 2.55193452 - time (sec): 94.59 - samples/sec: 393.52 - lr: 0.000015 - momentum: 0.000000
+ 2023-10-13 10:11:50,746 epoch 1 - iter 360/1809 - loss 2.32664425 - time (sec): 188.66 - samples/sec: 395.41 - lr: 0.000030 - momentum: 0.000000
+ 2023-10-13 10:13:30,449 epoch 1 - iter 540/1809 - loss 1.97458754 - time (sec): 288.37 - samples/sec: 391.31 - lr: 0.000045 - momentum: 0.000000
+ 2023-10-13 10:15:09,687 epoch 1 - iter 720/1809 - loss 1.63149756 - time (sec): 387.60 - samples/sec: 388.96 - lr: 0.000060 - momentum: 0.000000
+ 2023-10-13 10:16:47,097 epoch 1 - iter 900/1809 - loss 1.36469605 - time (sec): 485.01 - samples/sec: 389.16 - lr: 0.000075 - momentum: 0.000000
+ 2023-10-13 10:18:23,052 epoch 1 - iter 1080/1809 - loss 1.17974807 - time (sec): 580.97 - samples/sec: 389.57 - lr: 0.000089 - momentum: 0.000000
+ 2023-10-13 10:19:58,384 epoch 1 - iter 1260/1809 - loss 1.03904632 - time (sec): 676.30 - samples/sec: 390.55 - lr: 0.000104 - momentum: 0.000000
+ 2023-10-13 10:21:34,085 epoch 1 - iter 1440/1809 - loss 0.92828376 - time (sec): 772.00 - samples/sec: 391.23 - lr: 0.000119 - momentum: 0.000000
+ 2023-10-13 10:23:11,299 epoch 1 - iter 1620/1809 - loss 0.84195771 - time (sec): 869.22 - samples/sec: 391.84 - lr: 0.000134 - momentum: 0.000000
+ 2023-10-13 10:24:46,248 epoch 1 - iter 1800/1809 - loss 0.77324166 - time (sec): 964.17 - samples/sec: 392.14 - lr: 0.000149 - momentum: 0.000000
+ 2023-10-13 10:24:50,774 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 10:24:50,774 EPOCH 1 done: loss 0.7705 - lr: 0.000149
+ 2023-10-13 10:25:30,426 DEV : loss 0.14874930679798126 - f1-score (micro avg) 0.4029
+ 2023-10-13 10:25:30,486 saving best model
+ 2023-10-13 10:25:31,350 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 10:27:05,720 epoch 2 - iter 180/1809 - loss 0.13805801 - time (sec): 94.37 - samples/sec: 388.33 - lr: 0.000148 - momentum: 0.000000
+ 2023-10-13 10:28:42,834 epoch 2 - iter 360/1809 - loss 0.12933060 - time (sec): 191.48 - samples/sec: 389.43 - lr: 0.000147 - momentum: 0.000000
+ 2023-10-13 10:30:19,877 epoch 2 - iter 540/1809 - loss 0.12513132 - time (sec): 288.52 - samples/sec: 390.84 - lr: 0.000145 - momentum: 0.000000
+ 2023-10-13 10:31:57,990 epoch 2 - iter 720/1809 - loss 0.12053367 - time (sec): 386.64 - samples/sec: 389.16 - lr: 0.000143 - momentum: 0.000000
+ 2023-10-13 10:33:35,527 epoch 2 - iter 900/1809 - loss 0.11629545 - time (sec): 484.17 - samples/sec: 391.21 - lr: 0.000142 - momentum: 0.000000
+ 2023-10-13 10:35:08,984 epoch 2 - iter 1080/1809 - loss 0.11333380 - time (sec): 577.63 - samples/sec: 391.35 - lr: 0.000140 - momentum: 0.000000
+ 2023-10-13 10:36:43,056 epoch 2 - iter 1260/1809 - loss 0.10988685 - time (sec): 671.70 - samples/sec: 391.95 - lr: 0.000138 - momentum: 0.000000
+ 2023-10-13 10:38:22,782 epoch 2 - iter 1440/1809 - loss 0.10663465 - time (sec): 771.43 - samples/sec: 392.15 - lr: 0.000137 - momentum: 0.000000
+ 2023-10-13 10:40:03,308 epoch 2 - iter 1620/1809 - loss 0.10350687 - time (sec): 871.95 - samples/sec: 390.60 - lr: 0.000135 - momentum: 0.000000
+ 2023-10-13 10:41:38,900 epoch 2 - iter 1800/1809 - loss 0.10224515 - time (sec): 967.55 - samples/sec: 390.97 - lr: 0.000133 - momentum: 0.000000
+ 2023-10-13 10:41:43,136 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 10:41:43,137 EPOCH 2 done: loss 0.1022 - lr: 0.000133
+ 2023-10-13 10:42:24,954 DEV : loss 0.09910175204277039 - f1-score (micro avg) 0.5719
+ 2023-10-13 10:42:25,015 saving best model
+ 2023-10-13 10:42:27,591 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 10:44:04,602 epoch 3 - iter 180/1809 - loss 0.06155697 - time (sec): 97.01 - samples/sec: 403.70 - lr: 0.000132 - momentum: 0.000000
+ 2023-10-13 10:45:38,856 epoch 3 - iter 360/1809 - loss 0.06070254 - time (sec): 191.26 - samples/sec: 395.73 - lr: 0.000130 - momentum: 0.000000
+ 2023-10-13 10:47:14,475 epoch 3 - iter 540/1809 - loss 0.06120311 - time (sec): 286.88 - samples/sec: 394.26 - lr: 0.000128 - momentum: 0.000000
+ 2023-10-13 10:48:53,782 epoch 3 - iter 720/1809 - loss 0.06292844 - time (sec): 386.19 - samples/sec: 390.36 - lr: 0.000127 - momentum: 0.000000
+ 2023-10-13 10:50:32,623 epoch 3 - iter 900/1809 - loss 0.06307044 - time (sec): 485.03 - samples/sec: 389.58 - lr: 0.000125 - momentum: 0.000000
+ 2023-10-13 10:52:10,549 epoch 3 - iter 1080/1809 - loss 0.06402904 - time (sec): 582.95 - samples/sec: 386.68 - lr: 0.000123 - momentum: 0.000000
+ 2023-10-13 10:53:49,156 epoch 3 - iter 1260/1809 - loss 0.06377227 - time (sec): 681.56 - samples/sec: 389.15 - lr: 0.000122 - momentum: 0.000000
+ 2023-10-13 10:55:24,344 epoch 3 - iter 1440/1809 - loss 0.06426367 - time (sec): 776.75 - samples/sec: 388.15 - lr: 0.000120 - momentum: 0.000000
+ 2023-10-13 10:57:00,411 epoch 3 - iter 1620/1809 - loss 0.06362259 - time (sec): 872.82 - samples/sec: 389.28 - lr: 0.000118 - momentum: 0.000000
+ 2023-10-13 10:58:38,665 epoch 3 - iter 1800/1809 - loss 0.06327889 - time (sec): 971.07 - samples/sec: 389.10 - lr: 0.000117 - momentum: 0.000000
+ 2023-10-13 10:58:43,368 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 10:58:43,368 EPOCH 3 done: loss 0.0632 - lr: 0.000117
+ 2023-10-13 10:59:24,367 DEV : loss 0.11729110032320023 - f1-score (micro avg) 0.6357
+ 2023-10-13 10:59:24,427 saving best model
+ 2023-10-13 10:59:26,988 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 11:01:01,873 epoch 4 - iter 180/1809 - loss 0.03980255 - time (sec): 94.88 - samples/sec: 390.73 - lr: 0.000115 - momentum: 0.000000
+ 2023-10-13 11:02:41,954 epoch 4 - iter 360/1809 - loss 0.04218853 - time (sec): 194.96 - samples/sec: 389.89 - lr: 0.000113 - momentum: 0.000000
+ 2023-10-13 11:04:22,545 epoch 4 - iter 540/1809 - loss 0.04514616 - time (sec): 295.55 - samples/sec: 382.99 - lr: 0.000112 - momentum: 0.000000
+ 2023-10-13 11:06:00,574 epoch 4 - iter 720/1809 - loss 0.04552944 - time (sec): 393.58 - samples/sec: 382.84 - lr: 0.000110 - momentum: 0.000000
+ 2023-10-13 11:07:37,214 epoch 4 - iter 900/1809 - loss 0.04691891 - time (sec): 490.22 - samples/sec: 384.07 - lr: 0.000108 - momentum: 0.000000
+ 2023-10-13 11:09:17,007 epoch 4 - iter 1080/1809 - loss 0.04597866 - time (sec): 590.01 - samples/sec: 382.76 - lr: 0.000107 - momentum: 0.000000
+ 2023-10-13 11:10:57,169 epoch 4 - iter 1260/1809 - loss 0.04513610 - time (sec): 690.18 - samples/sec: 381.27 - lr: 0.000105 - momentum: 0.000000
+ 2023-10-13 11:12:40,828 epoch 4 - iter 1440/1809 - loss 0.04438682 - time (sec): 793.83 - samples/sec: 379.78 - lr: 0.000103 - momentum: 0.000000
+ 2023-10-13 11:14:18,442 epoch 4 - iter 1620/1809 - loss 0.04429229 - time (sec): 891.45 - samples/sec: 381.80 - lr: 0.000102 - momentum: 0.000000
+ 2023-10-13 11:16:00,362 epoch 4 - iter 1800/1809 - loss 0.04572766 - time (sec): 993.37 - samples/sec: 380.72 - lr: 0.000100 - momentum: 0.000000
+ 2023-10-13 11:16:04,771 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 11:16:04,772 EPOCH 4 done: loss 0.0457 - lr: 0.000100
+ 2023-10-13 11:16:44,751 DEV : loss 0.16882555186748505 - f1-score (micro avg) 0.6361
+ 2023-10-13 11:16:44,823 saving best model
+ 2023-10-13 11:16:47,519 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 11:18:28,087 epoch 5 - iter 180/1809 - loss 0.02745004 - time (sec): 100.56 - samples/sec: 383.98 - lr: 0.000098 - momentum: 0.000000
+ 2023-10-13 11:20:05,038 epoch 5 - iter 360/1809 - loss 0.02948707 - time (sec): 197.51 - samples/sec: 391.81 - lr: 0.000097 - momentum: 0.000000
+ 2023-10-13 11:21:43,338 epoch 5 - iter 540/1809 - loss 0.02991506 - time (sec): 295.81 - samples/sec: 385.31 - lr: 0.000095 - momentum: 0.000000
+ 2023-10-13 11:23:23,915 epoch 5 - iter 720/1809 - loss 0.03207680 - time (sec): 396.39 - samples/sec: 386.48 - lr: 0.000093 - momentum: 0.000000
+ 2023-10-13 11:25:03,777 epoch 5 - iter 900/1809 - loss 0.03173525 - time (sec): 496.25 - samples/sec: 386.05 - lr: 0.000092 - momentum: 0.000000
+ 2023-10-13 11:26:41,708 epoch 5 - iter 1080/1809 - loss 0.03291552 - time (sec): 594.18 - samples/sec: 383.07 - lr: 0.000090 - momentum: 0.000000
+ 2023-10-13 11:28:20,067 epoch 5 - iter 1260/1809 - loss 0.03293994 - time (sec): 692.54 - samples/sec: 383.92 - lr: 0.000088 - momentum: 0.000000
+ 2023-10-13 11:29:59,571 epoch 5 - iter 1440/1809 - loss 0.03244028 - time (sec): 792.05 - samples/sec: 383.50 - lr: 0.000087 - momentum: 0.000000
+ 2023-10-13 11:31:39,767 epoch 5 - iter 1620/1809 - loss 0.03323533 - time (sec): 892.24 - samples/sec: 381.40 - lr: 0.000085 - momentum: 0.000000
+ 2023-10-13 11:33:19,274 epoch 5 - iter 1800/1809 - loss 0.03363279 - time (sec): 991.75 - samples/sec: 381.47 - lr: 0.000083 - momentum: 0.000000
+ 2023-10-13 11:33:23,698 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 11:33:23,698 EPOCH 5 done: loss 0.0337 - lr: 0.000083
+ 2023-10-13 11:34:04,732 DEV : loss 0.22161424160003662 - f1-score (micro avg) 0.6488
+ 2023-10-13 11:34:04,800 saving best model
+ 2023-10-13 11:34:07,393 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 11:35:48,090 epoch 6 - iter 180/1809 - loss 0.01989408 - time (sec): 100.69 - samples/sec: 377.17 - lr: 0.000082 - momentum: 0.000000
+ 2023-10-13 11:37:24,788 epoch 6 - iter 360/1809 - loss 0.02145466 - time (sec): 197.39 - samples/sec: 380.27 - lr: 0.000080 - momentum: 0.000000
+ 2023-10-13 11:39:00,908 epoch 6 - iter 540/1809 - loss 0.02242469 - time (sec): 293.51 - samples/sec: 381.09 - lr: 0.000078 - momentum: 0.000000
+ 2023-10-13 11:40:34,463 epoch 6 - iter 720/1809 - loss 0.02387583 - time (sec): 387.06 - samples/sec: 388.26 - lr: 0.000077 - momentum: 0.000000
+ 2023-10-13 11:42:09,699 epoch 6 - iter 900/1809 - loss 0.02427821 - time (sec): 482.30 - samples/sec: 388.84 - lr: 0.000075 - momentum: 0.000000
+ 2023-10-13 11:43:42,056 epoch 6 - iter 1080/1809 - loss 0.02375360 - time (sec): 574.66 - samples/sec: 391.77 - lr: 0.000073 - momentum: 0.000000
+ 2023-10-13 11:45:16,686 epoch 6 - iter 1260/1809 - loss 0.02379900 - time (sec): 669.29 - samples/sec: 393.07 - lr: 0.000072 - momentum: 0.000000
+ 2023-10-13 11:46:50,964 epoch 6 - iter 1440/1809 - loss 0.02386266 - time (sec): 763.57 - samples/sec: 395.39 - lr: 0.000070 - momentum: 0.000000
+ 2023-10-13 11:48:23,979 epoch 6 - iter 1620/1809 - loss 0.02406347 - time (sec): 856.58 - samples/sec: 396.80 - lr: 0.000068 - momentum: 0.000000
+ 2023-10-13 11:49:57,153 epoch 6 - iter 1800/1809 - loss 0.02485654 - time (sec): 949.75 - samples/sec: 398.12 - lr: 0.000067 - momentum: 0.000000
+ 2023-10-13 11:50:01,456 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 11:50:01,456 EPOCH 6 done: loss 0.0248 - lr: 0.000067
+ 2023-10-13 11:50:42,664 DEV : loss 0.26427823305130005 - f1-score (micro avg) 0.6499
+ 2023-10-13 11:50:42,729 saving best model
+ 2023-10-13 11:50:45,273 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 11:52:24,565 epoch 7 - iter 180/1809 - loss 0.01829401 - time (sec): 99.29 - samples/sec: 388.56 - lr: 0.000065 - momentum: 0.000000
+ 2023-10-13 11:54:00,473 epoch 7 - iter 360/1809 - loss 0.01662527 - time (sec): 195.20 - samples/sec: 390.08 - lr: 0.000063 - momentum: 0.000000
+ 2023-10-13 11:55:35,250 epoch 7 - iter 540/1809 - loss 0.01711330 - time (sec): 289.97 - samples/sec: 396.97 - lr: 0.000062 - momentum: 0.000000
+ 2023-10-13 11:57:12,171 epoch 7 - iter 720/1809 - loss 0.01763868 - time (sec): 386.89 - samples/sec: 391.83 - lr: 0.000060 - momentum: 0.000000
+ 2023-10-13 11:58:52,384 epoch 7 - iter 900/1809 - loss 0.01892285 - time (sec): 487.11 - samples/sec: 388.52 - lr: 0.000058 - momentum: 0.000000
+ 2023-10-13 12:00:33,274 epoch 7 - iter 1080/1809 - loss 0.01907594 - time (sec): 588.00 - samples/sec: 389.53 - lr: 0.000057 - momentum: 0.000000
+ 2023-10-13 12:02:10,151 epoch 7 - iter 1260/1809 - loss 0.01948090 - time (sec): 684.87 - samples/sec: 389.17 - lr: 0.000055 - momentum: 0.000000
+ 2023-10-13 12:03:44,517 epoch 7 - iter 1440/1809 - loss 0.01959196 - time (sec): 779.24 - samples/sec: 389.11 - lr: 0.000053 - momentum: 0.000000
+ 2023-10-13 12:05:18,459 epoch 7 - iter 1620/1809 - loss 0.01974823 - time (sec): 873.18 - samples/sec: 389.17 - lr: 0.000052 - momentum: 0.000000
+ 2023-10-13 12:06:52,523 epoch 7 - iter 1800/1809 - loss 0.01891607 - time (sec): 967.24 - samples/sec: 390.92 - lr: 0.000050 - momentum: 0.000000
+ 2023-10-13 12:06:56,738 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 12:06:56,738 EPOCH 7 done: loss 0.0190 - lr: 0.000050
+ 2023-10-13 12:07:37,764 DEV : loss 0.3006477653980255 - f1-score (micro avg) 0.6484
+ 2023-10-13 12:07:37,830 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 12:09:14,953 epoch 8 - iter 180/1809 - loss 0.01107605 - time (sec): 97.12 - samples/sec: 390.74 - lr: 0.000048 - momentum: 0.000000
+ 2023-10-13 12:10:55,384 epoch 8 - iter 360/1809 - loss 0.01371757 - time (sec): 197.55 - samples/sec: 391.33 - lr: 0.000047 - momentum: 0.000000
+ 2023-10-13 12:12:34,854 epoch 8 - iter 540/1809 - loss 0.01237565 - time (sec): 297.02 - samples/sec: 389.62 - lr: 0.000045 - momentum: 0.000000
+ 2023-10-13 12:14:13,295 epoch 8 - iter 720/1809 - loss 0.01229570 - time (sec): 395.46 - samples/sec: 389.84 - lr: 0.000043 - momentum: 0.000000
+ 2023-10-13 12:15:49,208 epoch 8 - iter 900/1809 - loss 0.01315215 - time (sec): 491.38 - samples/sec: 387.35 - lr: 0.000042 - momentum: 0.000000
+ 2023-10-13 12:17:23,939 epoch 8 - iter 1080/1809 - loss 0.01311433 - time (sec): 586.11 - samples/sec: 391.13 - lr: 0.000040 - momentum: 0.000000
+ 2023-10-13 12:18:59,994 epoch 8 - iter 1260/1809 - loss 0.01292896 - time (sec): 682.16 - samples/sec: 389.79 - lr: 0.000038 - momentum: 0.000000
+ 2023-10-13 12:20:36,705 epoch 8 - iter 1440/1809 - loss 0.01275007 - time (sec): 778.87 - samples/sec: 389.16 - lr: 0.000037 - momentum: 0.000000
+ 2023-10-13 12:22:13,261 epoch 8 - iter 1620/1809 - loss 0.01281415 - time (sec): 875.43 - samples/sec: 389.99 - lr: 0.000035 - momentum: 0.000000
+ 2023-10-13 12:23:53,434 epoch 8 - iter 1800/1809 - loss 0.01361457 - time (sec): 975.60 - samples/sec: 387.96 - lr: 0.000033 - momentum: 0.000000
+ 2023-10-13 12:23:57,584 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 12:23:57,584 EPOCH 8 done: loss 0.0136 - lr: 0.000033
+ 2023-10-13 12:24:39,214 DEV : loss 0.3133712708950043 - f1-score (micro avg) 0.6443
+ 2023-10-13 12:24:39,281 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 12:26:14,295 epoch 9 - iter 180/1809 - loss 0.00811211 - time (sec): 95.01 - samples/sec: 381.60 - lr: 0.000032 - momentum: 0.000000
+ 2023-10-13 12:27:51,642 epoch 9 - iter 360/1809 - loss 0.01014665 - time (sec): 192.36 - samples/sec: 386.96 - lr: 0.000030 - momentum: 0.000000
+ 2023-10-13 12:29:28,757 epoch 9 - iter 540/1809 - loss 0.01204379 - time (sec): 289.47 - samples/sec: 389.86 - lr: 0.000028 - momentum: 0.000000
+ 2023-10-13 12:31:06,089 epoch 9 - iter 720/1809 - loss 0.01210666 - time (sec): 386.81 - samples/sec: 389.63 - lr: 0.000027 - momentum: 0.000000
+ 2023-10-13 12:32:48,351 epoch 9 - iter 900/1809 - loss 0.01191859 - time (sec): 489.07 - samples/sec: 387.06 - lr: 0.000025 - momentum: 0.000000
+ 2023-10-13 12:34:29,374 epoch 9 - iter 1080/1809 - loss 0.01103504 - time (sec): 590.09 - samples/sec: 385.32 - lr: 0.000023 - momentum: 0.000000
+ 2023-10-13 12:36:06,450 epoch 9 - iter 1260/1809 - loss 0.01185981 - time (sec): 687.17 - samples/sec: 384.86 - lr: 0.000022 - momentum: 0.000000
+ 2023-10-13 12:37:44,048 epoch 9 - iter 1440/1809 - loss 0.01164861 - time (sec): 784.76 - samples/sec: 383.08 - lr: 0.000020 - momentum: 0.000000
+ 2023-10-13 12:39:21,038 epoch 9 - iter 1620/1809 - loss 0.01129939 - time (sec): 881.75 - samples/sec: 384.22 - lr: 0.000018 - momentum: 0.000000
+ 2023-10-13 12:41:04,369 epoch 9 - iter 1800/1809 - loss 0.01127319 - time (sec): 985.09 - samples/sec: 383.82 - lr: 0.000017 - momentum: 0.000000
+ 2023-10-13 12:41:09,282 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 12:41:09,283 EPOCH 9 done: loss 0.0112 - lr: 0.000017
+ 2023-10-13 12:41:52,191 DEV : loss 0.33530837297439575 - f1-score (micro avg) 0.6476
+ 2023-10-13 12:41:52,272 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 12:43:32,493 epoch 10 - iter 180/1809 - loss 0.00503398 - time (sec): 100.22 - samples/sec: 380.94 - lr: 0.000015 - momentum: 0.000000
+ 2023-10-13 12:45:10,018 epoch 10 - iter 360/1809 - loss 0.00514473 - time (sec): 197.74 - samples/sec: 383.57 - lr: 0.000013 - momentum: 0.000000
+ 2023-10-13 12:46:46,023 epoch 10 - iter 540/1809 - loss 0.00587847 - time (sec): 293.75 - samples/sec: 385.18 - lr: 0.000012 - momentum: 0.000000
+ 2023-10-13 12:48:21,614 epoch 10 - iter 720/1809 - loss 0.00651019 - time (sec): 389.34 - samples/sec: 389.69 - lr: 0.000010 - momentum: 0.000000
+ 2023-10-13 12:49:57,424 epoch 10 - iter 900/1809 - loss 0.00679671 - time (sec): 485.15 - samples/sec: 389.14 - lr: 0.000008 - momentum: 0.000000
+ 2023-10-13 12:51:33,136 epoch 10 - iter 1080/1809 - loss 0.00712549 - time (sec): 580.86 - samples/sec: 389.66 - lr: 0.000007 - momentum: 0.000000
+ 2023-10-13 12:53:07,935 epoch 10 - iter 1260/1809 - loss 0.00705147 - time (sec): 675.66 - samples/sec: 391.82 - lr: 0.000005 - momentum: 0.000000
+ 2023-10-13 12:54:42,621 epoch 10 - iter 1440/1809 - loss 0.00701041 - time (sec): 770.35 - samples/sec: 394.49 - lr: 0.000003 - momentum: 0.000000
+ 2023-10-13 12:56:16,729 epoch 10 - iter 1620/1809 - loss 0.00703049 - time (sec): 864.45 - samples/sec: 392.77 - lr: 0.000002 - momentum: 0.000000
+ 2023-10-13 12:57:53,169 epoch 10 - iter 1800/1809 - loss 0.00712609 - time (sec): 960.89 - samples/sec: 393.85 - lr: 0.000000 - momentum: 0.000000
+ 2023-10-13 12:57:57,308 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 12:57:57,308 EPOCH 10 done: loss 0.0071 - lr: 0.000000
+ 2023-10-13 12:58:38,220 DEV : loss 0.3420470952987671 - f1-score (micro avg) 0.6541
+ 2023-10-13 12:58:38,294 saving best model
+ 2023-10-13 12:58:45,503 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 12:58:45,506 Loading model from best epoch ...
+ 2023-10-13 12:58:51,051 SequenceTagger predicts: Dictionary with 13 tags: O, S-loc, B-loc, E-loc, I-loc, S-pers, B-pers, E-pers, I-pers, S-org, B-org, E-org, I-org
+ 2023-10-13 12:59:49,682
+ Results:
+ - F-score (micro) 0.6291
+ - F-score (macro) 0.5022
+ - Accuracy 0.4688
+
+ By class:
+               precision    recall  f1-score   support
+
+          loc     0.6234    0.7479    0.6800       591
+         pers     0.5742    0.6611    0.6146       357
+          org     0.2642    0.1772    0.2121        79
+
+    micro avg     0.5899    0.6738    0.6291      1027
+    macro avg     0.4873    0.5287    0.5022      1027
+ weighted avg     0.5787    0.6738    0.6213      1027
+
+ 2023-10-13 12:59:49,682 ----------------------------------------------------------------------------------------------------
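Two quick consistency checks on the numbers in `training.log`, as a Python sketch. First, the logged `lr` values fit a linear warmup over the first 10% of the 18,090 optimizer steps (1809 iterations × 10 epochs) up to the peak of 0.00015, followed by linear decay to zero. This reading of `LinearScheduler | warmup_fraction: '0.1'`, and the off-by-one in which step the printed value belongs to, are assumptions checked against the log, not taken from Flair's source. Second, the summary F-scores follow from the per-class table: micro F1 is the harmonic mean of micro precision and recall, while macro and weighted F1 average the per-class F1 scores (unweighted and by support, respectively). All names below are hypothetical.

```python
# Run constants read off training.log (assumed to fully describe the schedule).
PEAK_LR = 0.00015
ITERS_PER_EPOCH = 1809
EPOCHS = 10
TOTAL_STEPS = ITERS_PER_EPOCH * EPOCHS   # 18090 optimizer steps
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)    # warmup_fraction 0.1 -> 1809 steps


def scheduled_lr(step: int) -> float:
    """Linear warmup from 0 to PEAK_LR, then linear decay back to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * max(0.0, (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS))


def logged_lr(epoch: int, iteration: int) -> str:
    """lr printed at 'epoch E - iter I', formatted like the log (6 decimals).

    Assumption: the printed value is the rate used for batch I, i.e. the
    schedule evaluated after (E - 1) * ITERS_PER_EPOCH + I - 1 completed steps.
    """
    step = (epoch - 1) * ITERS_PER_EPOCH + iteration - 1
    return f"{scheduled_lr(step):.6f}"


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Per-class (f1-score, support) from the "By class" table.
PER_CLASS = {"loc": (0.6800, 591), "pers": (0.6146, 357), "org": (0.2121, 79)}

micro_f1 = f1(0.5899, 0.6738)
macro_f1 = sum(score for score, _ in PER_CLASS.values()) / len(PER_CLASS)
total_support = sum(n for _, n in PER_CLASS.values())
weighted_f1 = sum(score * n for score, n in PER_CLASS.values()) / total_support

print(logged_lr(1, 1080), logged_lr(2, 180), logged_lr(10, 1800))  # 0.000089 0.000148 0.000000
print(round(micro_f1, 4), round(macro_f1, 4), round(weighted_f1, 4))  # 0.6291 0.5022 0.6213
```

The macro F1 of 0.5022 is the mean of the per-class F1 scores; the harmonic mean of macro precision (0.4873) and macro recall (0.5287) would instead give about 0.507, so the two are not interchangeable.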