lombardata committed on
Commit
1761b00
1 Parent(s): e00d895

🍻 cheers

README.md CHANGED
@@ -1,7 +1,11 @@
  ---
+ language:
+ - eng
  license: apache-2.0
  base_model: facebook/dinov2-large
  tags:
+ - multilabel-image-classification
+ - multilabel
  - generated_from_trainer
  metrics:
  - accuracy
@@ -15,13 +19,13 @@ should probably proofread and complete it, then remove this comment. -->
 
  # dinov2-large-2024_01_24-with_data_aug_batch-size32_epochs85_freeze
 
- This model is a fine-tuned version of [facebook/dinov2-large](https://huggingface.co/facebook/dinov2-large) on the None dataset.
+ This model is a fine-tuned version of [facebook/dinov2-large](https://huggingface.co/facebook/dinov2-large) on the multilabel_complete_dataset dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.0871
- - F1 Micro: 0.8657
- - F1 Macro: 0.8277
- - Roc Auc: 0.9141
- - Accuracy: 0.5807
+ - Loss: 0.0864
+ - F1 Micro: 0.8668
+ - F1 Macro: 0.8381
+ - Roc Auc: 0.9138
+ - Accuracy: 0.5805
  - Learning Rate: 0.0000
 
  ## Model description
all_results.json ADDED
@@ -0,0 +1,16 @@
+ {
+ "epoch": 85.0,
+ "eval_accuracy": 0.5804980708523325,
+ "eval_f1_macro": 0.8380969738679429,
+ "eval_f1_micro": 0.866814335207588,
+ "eval_loss": 0.0863719955086708,
+ "eval_roc_auc": 0.9138424851830095,
+ "eval_runtime": 679.2028,
+ "eval_samples_per_second": 4.198,
+ "eval_steps_per_second": 0.133,
+ "learning_rate": 1.0000000000000002e-07,
+ "train_loss": 0.10122811819873383,
+ "train_runtime": 238033.4867,
+ "train_samples_per_second": 3.13,
+ "train_steps_per_second": 0.098
+ }
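The keys in all_results.json look like the union of the keys in eval_results.json and train_results.json, which are added in the same commit, listed in alphabetical order. The Python sketch below illustrates that relationship with values copied from this diff. The merge step is an assumption made for illustration, not a record of how the training script wrote the file.

```python
import json

# Values copied from eval_results.json as shown in this commit.
eval_results = {
    "epoch": 85.0,
    "eval_accuracy": 0.5804980708523325,
    "eval_f1_macro": 0.8380969738679429,
    "eval_f1_micro": 0.866814335207588,
    "eval_loss": 0.0863719955086708,
    "eval_roc_auc": 0.9138424851830095,
    "eval_runtime": 679.2028,
    "eval_samples_per_second": 4.198,
    "eval_steps_per_second": 0.133,
    "learning_rate": 1.0000000000000002e-07,
}

# Values copied from train_results.json as shown in this commit.
train_results = {
    "epoch": 85.0,
    "learning_rate": 1.0000000000000002e-07,
    "train_loss": 0.10122811819873383,
    "train_runtime": 238033.4867,
    "train_samples_per_second": 3.13,
    "train_steps_per_second": 0.098,
}

# Hypothetical merge: shared keys ("epoch", "learning_rate") hold the same
# value in both files, so the order of the merge does not matter here.
merged = {**eval_results, **train_results}

# Serialise with sorted keys, then parse back to inspect the key order.
all_results_text = json.dumps(merged, indent=4, sort_keys=True)
all_results = json.loads(all_results_text)

print(all_results_text)
```

The two per-split files share only `epoch` and `learning_rate`, so the merged dictionary has 14 keys, matching the 16-line file (14 entries plus the two braces).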
config.json CHANGED
@@ -1,5 +1,5 @@
  {
- "_name_or_path": "facebook/dinov2-large",
+ "_name_or_path": "facebook/dinov2-large2024_01_24",
  "apply_layernorm": true,
  "architectures": [
  "NewheadDinov2ForImageClassification"
eval_results.json ADDED
@@ -0,0 +1,12 @@
+ {
+ "epoch": 85.0,
+ "eval_accuracy": 0.5804980708523325,
+ "eval_f1_macro": 0.8380969738679429,
+ "eval_f1_micro": 0.866814335207588,
+ "eval_loss": 0.0863719955086708,
+ "eval_roc_auc": 0.9138424851830095,
+ "eval_runtime": 679.2028,
+ "eval_samples_per_second": 4.198,
+ "eval_steps_per_second": 0.133,
+ "learning_rate": 1.0000000000000002e-07
+ }
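The README figures in this commit match the values in eval_results.json rounded to four decimal places. That rounding also explains why the README reports "Learning Rate: 0.0000" although the stored value is about 1e-07. The Python sketch below reproduces it. The label mapping and the `format_metric` helper are hypothetical names introduced for illustration, not taken from the model card generator.

```python
# Values copied from eval_results.json as shown in this commit.
eval_results = {
    "eval_loss": 0.0863719955086708,
    "eval_f1_micro": 0.866814335207588,
    "eval_f1_macro": 0.8380969738679429,
    "eval_roc_auc": 0.9138424851830095,
    "eval_accuracy": 0.5804980708523325,
    "learning_rate": 1.0000000000000002e-07,
}

# Hypothetical mapping from JSON keys to the labels used in the README.
readme_labels = {
    "eval_loss": "Loss",
    "eval_f1_micro": "F1 Micro",
    "eval_f1_macro": "F1 Macro",
    "eval_roc_auc": "Roc Auc",
    "eval_accuracy": "Accuracy",
    "learning_rate": "Learning Rate",
}


def format_metric(value: float) -> str:
    """Format a metric with four fixed decimal places."""
    return f"{value:.4f}"


readme_lines = [
    f"- {label}: {format_metric(eval_results[key])}"
    for key, label in readme_labels.items()
]

for line in readme_lines:
    print(line)
```

With four fixed decimals, any learning rate below 0.00005 prints as 0.0000, so the README line cannot tell 1e-07 apart from zero. Scientific notation (for example `f"{value:.1e}"`) would preserve it.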
runs/Jan24_12-39-13_datavisu4/events.out.tfevents.1706335170.datavisu4.49957.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:44d0610a3743ee2428bd61ce11d14094a25de61298256d70fbb2f1a947b2cb86
+ size 634
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 85.0,
+ "learning_rate": 1.0000000000000002e-07,
+ "train_loss": 0.10122811819873383,
+ "train_runtime": 238033.4867,
+ "train_samples_per_second": 3.13,
+ "train_steps_per_second": 0.098
+ }
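The throughput figures in train_results.json can be cross-checked against `global_step` and `best_model_checkpoint` in trainer_state.json from the same commit. The Python sketch below does that arithmetic. The batch size of 32 is an assumption read from the model name (`batch-size32`), and the samples-per-epoch figure is only an estimate derived from rounded throughput values.

```python
# Values copied from train_results.json as shown in this commit.
train_runtime_s = 238033.4867
train_samples_per_second = 3.13
train_steps_per_second = 0.098
num_epochs = 85.0

# Values copied from the head of trainer_state.json in the same commit.
global_step = 23290
best_checkpoint_step = 22742  # from ".../checkpoint-22742"

# Assumption: per-step batch size taken from the model name ("batch-size32").
assumed_batch_size = 32

# Optimizer steps in one epoch.
steps_per_epoch = global_step / num_epochs

# Recompute steps per second from the raw totals.
derived_steps_per_second = global_step / train_runtime_s

# Total wall-clock training time in hours.
train_hours = train_runtime_s / 3600

# Epoch at which the best checkpoint was saved.
best_epoch = best_checkpoint_step / steps_per_epoch

# Rough training-set size, estimated two independent ways.
samples_per_epoch_from_throughput = (
    train_samples_per_second * train_runtime_s / num_epochs
)
samples_per_epoch_upper_bound = steps_per_epoch * assumed_batch_size

print(f"steps per epoch:       {steps_per_epoch:.0f}")
print(f"steps per second:      {derived_steps_per_second:.3f}")
print(f"training time (hours): {train_hours:.1f}")
print(f"best checkpoint epoch: {best_epoch:.0f}")
print(f"samples per epoch:     ~{samples_per_epoch_from_throughput:.0f} "
      f"(at most {samples_per_epoch_upper_bound:.0f})")
```

Under these assumptions the two estimates of samples per epoch differ by less than one batch, which is consistent with a final partial batch in each epoch.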
trainer_state.json ADDED
@@ -0,0 +1,1412 @@
+ {
+ "best_metric": 0.08695908635854721,
+ "best_model_checkpoint": "/home1/datawork/mcontini/models/multilabel/huggingface/dinov2-large-2024_01_24-with_data_aug_batch-size32_epochs85_freeze/checkpoint-22742",
+ "epoch": 85.0,
+ "eval_steps": 500,
+ "global_step": 23290,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 1.0,
+ "eval_accuracy": 0.45894224077940154,
+ "eval_f1_macro": 0.6395389989693074,
+ "eval_f1_micro": 0.7737575503857426,
+ "eval_loss": 0.13585977256298065,
+ "eval_roc_auc": 0.8471240403763409,
+ "eval_runtime": 675.8068,
+ "eval_samples_per_second": 4.253,
+ "eval_steps_per_second": 0.133,
+ "learning_rate": 0.001,
+ "step": 274
+ },
+ {
+ "epoch": 1.82,
+ "learning_rate": 0.001,
+ "loss": 0.2459,
+ "step": 500
+ },
+ {
+ "epoch": 2.0,
+ "eval_accuracy": 0.4940848990953375,
+ "eval_f1_macro": 0.7304998296932924,
+ "eval_f1_micro": 0.8032231694499591,
+ "eval_loss": 0.12362784147262573,
+ "eval_roc_auc": 0.8697341470820456,
+ "eval_runtime": 678.2974,
+ "eval_samples_per_second": 4.237,
+ "eval_steps_per_second": 0.133,
+ "learning_rate": 0.001,
+ "step": 548
+ },
+ {
+ "epoch": 3.0,
+ "eval_accuracy": 0.5125260960334029,
+ "eval_f1_macro": 0.7426440054746392,
+ "eval_f1_micro": 0.8174202432866652,
+ "eval_loss": 0.11671263724565506,
+ "eval_roc_auc": 0.8827824537503088,
+ "eval_runtime": 674.2849,
+ "eval_samples_per_second": 4.262,
+ "eval_steps_per_second": 0.133,
+ "learning_rate": 0.001,
+ "step": 822
+ },
+ {
+ "epoch": 3.65,
+ "learning_rate": 0.001,
+ "loss": 0.1403,
+ "step": 1000
+ },
+ {
+ "epoch": 4.0,
+ "eval_accuracy": 0.5100904662491301,
+ "eval_f1_macro": 0.7481206268648029,
+ "eval_f1_micro": 0.817623068527773,
+ "eval_loss": 0.11555441468954086,
+ "eval_roc_auc": 0.8825597364016536,
+ "eval_runtime": 684.1218,
+ "eval_samples_per_second": 4.201,
+ "eval_steps_per_second": 0.132,
+ "learning_rate": 0.001,
+ "step": 1096
+ },
+ {
+ "epoch": 5.0,
+ "eval_accuracy": 0.5243562978427279,
+ "eval_f1_macro": 0.7614020034586013,
+ "eval_f1_micro": 0.8267689489351958,
+ "eval_loss": 0.11359219998121262,
+ "eval_roc_auc": 0.8886760312325277,
+ "eval_runtime": 674.0166,
+ "eval_samples_per_second": 4.264,
+ "eval_steps_per_second": 0.134,
+ "learning_rate": 0.001,
+ "step": 1370
+ },
+ {
+ "epoch": 5.47,
+ "learning_rate": 0.001,
+ "loss": 0.1313,
+ "step": 1500
+ },
+ {
+ "epoch": 6.0,
+ "eval_accuracy": 0.5219206680584552,
+ "eval_f1_macro": 0.7508698006051816,
+ "eval_f1_micro": 0.8210489222998767,
+ "eval_loss": 0.11100047826766968,
+ "eval_roc_auc": 0.877677266975988,
+ "eval_runtime": 676.1,
+ "eval_samples_per_second": 4.251,
+ "eval_steps_per_second": 0.133,
+ "learning_rate": 0.001,
+ "step": 1644
+ },
+ {
+ "epoch": 7.0,
+ "eval_accuracy": 0.5323590814196242,
+ "eval_f1_macro": 0.7613673312506429,
+ "eval_f1_micro": 0.8288991092740292,
+ "eval_loss": 0.10846547037363052,
+ "eval_roc_auc": 0.8846228046955259,
+ "eval_runtime": 682.0096,
+ "eval_samples_per_second": 4.214,
+ "eval_steps_per_second": 0.132,
+ "learning_rate": 0.001,
+ "step": 1918
+ },
+ {
+ "epoch": 7.3,
+ "learning_rate": 0.001,
+ "loss": 0.1289,
+ "step": 2000
+ },
+ {
+ "epoch": 8.0,
+ "eval_accuracy": 0.5379262352122477,
+ "eval_f1_macro": 0.7711215001442554,
+ "eval_f1_micro": 0.8331729408434757,
+ "eval_loss": 0.11005302518606186,
+ "eval_roc_auc": 0.8958012673255937,
+ "eval_runtime": 682.26,
+ "eval_samples_per_second": 4.212,
+ "eval_steps_per_second": 0.132,
+ "learning_rate": 0.001,
+ "step": 2192
+ },
+ {
+ "epoch": 9.0,
+ "eval_accuracy": 0.5139178844815588,
+ "eval_f1_macro": 0.7669688558128348,
+ "eval_f1_micro": 0.8271255519076193,
+ "eval_loss": 0.11129175871610641,
+ "eval_roc_auc": 0.8924250608458335,
+ "eval_runtime": 683.3423,
+ "eval_samples_per_second": 4.206,
+ "eval_steps_per_second": 0.132,
+ "learning_rate": 0.001,
+ "step": 2466
+ },
+ {
+ "epoch": 9.12,
+ "learning_rate": 0.001,
+ "loss": 0.1268,
+ "step": 2500
+ },
+ {
+ "epoch": 10.0,
+ "eval_accuracy": 0.5313152400835073,
+ "eval_f1_macro": 0.7610925982620881,
+ "eval_f1_micro": 0.8258011503697616,
+ "eval_loss": 0.11381296068429947,
+ "eval_roc_auc": 0.880444980112697,
+ "eval_runtime": 679.9943,
+ "eval_samples_per_second": 4.227,
+ "eval_steps_per_second": 0.132,
+ "learning_rate": 0.001,
+ "step": 2740
+ },
+ {
+ "epoch": 10.95,
+ "learning_rate": 0.001,
+ "loss": 0.1255,
+ "step": 3000
+ },
+ {
+ "epoch": 11.0,
+ "eval_accuracy": 0.5260960334029228,
+ "eval_f1_macro": 0.762697586166308,
+ "eval_f1_micro": 0.8262265016047684,
+ "eval_loss": 0.11390296369791031,
+ "eval_roc_auc": 0.8880168466934987,
+ "eval_runtime": 678.1509,
+ "eval_samples_per_second": 4.238,
+ "eval_steps_per_second": 0.133,
+ "learning_rate": 0.001,
+ "step": 3014
+ },
+ {
+ "epoch": 12.0,
+ "eval_accuracy": 0.5337508698677801,
+ "eval_f1_macro": 0.7573087365131856,
+ "eval_f1_micro": 0.8210012500744092,
+ "eval_loss": 0.11208122968673706,
+ "eval_roc_auc": 0.8736066784464123,
+ "eval_runtime": 680.166,
+ "eval_samples_per_second": 4.225,
+ "eval_steps_per_second": 0.132,
+ "learning_rate": 0.001,
+ "step": 3288
+ },
+ {
+ "epoch": 12.77,
+ "learning_rate": 0.001,
+ "loss": 0.1253,
+ "step": 3500
+ },
+ {
+ "epoch": 13.0,
+ "eval_accuracy": 0.5219206680584552,
+ "eval_f1_macro": 0.7489136029171714,
+ "eval_f1_micro": 0.8207366032466399,
+ "eval_loss": 0.1110881045460701,
+ "eval_roc_auc": 0.8803454162802951,
+ "eval_runtime": 682.0648,
+ "eval_samples_per_second": 4.214,
+ "eval_steps_per_second": 0.132,
+ "learning_rate": 0.001,
+ "step": 3562
+ },
+ {
+ "epoch": 14.0,
+ "eval_accuracy": 0.5400139178844816,
+ "eval_f1_macro": 0.7776741330298375,
+ "eval_f1_micro": 0.8408186469584993,
+ "eval_loss": 0.10247301310300827,
+ "eval_roc_auc": 0.8987147268632997,
+ "eval_runtime": 676.5367,
+ "eval_samples_per_second": 4.248,
+ "eval_steps_per_second": 0.133,
+ "learning_rate": 0.0001,
+ "step": 3836
+ },
+ {
+ "epoch": 14.6,
+ "learning_rate": 0.0001,
+ "loss": 0.1171,
+ "step": 4000
+ },
+ {
+ "epoch": 15.0,
+ "eval_accuracy": 0.5403618649965205,
+ "eval_f1_macro": 0.7795139529876273,
+ "eval_f1_micro": 0.842865329512894,
+ "eval_loss": 0.0998576357960701,
+ "eval_roc_auc": 0.897277663148542,
+ "eval_runtime": 675.6889,
+ "eval_samples_per_second": 4.253,
+ "eval_steps_per_second": 0.133,
+ "learning_rate": 0.0001,
+ "step": 4110
+ },
+ {
+ "epoch": 16.0,
+ "eval_accuracy": 0.5407098121085595,
+ "eval_f1_macro": 0.7861162275453341,
+ "eval_f1_micro": 0.8462626605556499,
+ "eval_loss": 0.10081179440021515,
+ "eval_roc_auc": 0.9032963122022265,
+ "eval_runtime": 680.4113,
+ "eval_samples_per_second": 4.224,
+ "eval_steps_per_second": 0.132,
+ "learning_rate": 0.0001,
+ "step": 4384
+ },
+ {
+ "epoch": 16.42,
+ "learning_rate": 0.0001,
+ "loss": 0.1107,
+ "step": 4500
+ },
+ {
+ "epoch": 17.0,
+ "eval_accuracy": 0.545929018789144,
+ "eval_f1_macro": 0.7877890037679841,
+ "eval_f1_micro": 0.8474232610532244,
+ "eval_loss": 0.10136950016021729,
+ "eval_roc_auc": 0.9054715489545434,
+ "eval_runtime": 689.6336,
+ "eval_samples_per_second": 4.167,
+ "eval_steps_per_second": 0.131,
+ "learning_rate": 0.0001,
+ "step": 4658
+ },
+ {
+ "epoch": 18.0,
+ "eval_accuracy": 0.5480167014613778,
+ "eval_f1_macro": 0.7867996984352024,
+ "eval_f1_micro": 0.8471123755334281,
+ "eval_loss": 0.09731467068195343,
+ "eval_roc_auc": 0.9019535814277009,
+ "eval_runtime": 689.7429,
+ "eval_samples_per_second": 4.167,
+ "eval_steps_per_second": 0.13,
+ "learning_rate": 0.0001,
+ "step": 4932
+ },
+ {
+ "epoch": 18.25,
+ "learning_rate": 0.0001,
+ "loss": 0.1078,
+ "step": 5000
+ },
+ {
+ "epoch": 19.0,
+ "eval_accuracy": 0.5480167014613778,
+ "eval_f1_macro": 0.789354289479613,
+ "eval_f1_micro": 0.849087519068874,
+ "eval_loss": 0.09738590568304062,
+ "eval_roc_auc": 0.9053669532212902,
+ "eval_runtime": 687.0367,
+ "eval_samples_per_second": 4.183,
+ "eval_steps_per_second": 0.131,
+ "learning_rate": 0.0001,
+ "step": 5206
+ },
+ {
+ "epoch": 20.0,
+ "eval_accuracy": 0.5549756437021572,
+ "eval_f1_macro": 0.7947863154349663,
+ "eval_f1_micro": 0.8497521508745941,
+ "eval_loss": 0.0971071869134903,
+ "eval_roc_auc": 0.9029799344302967,
+ "eval_runtime": 693.4393,
+ "eval_samples_per_second": 4.145,
+ "eval_steps_per_second": 0.13,
+ "learning_rate": 0.0001,
+ "step": 5480
+ },
+ {
+ "epoch": 20.07,
+ "learning_rate": 0.0001,
+ "loss": 0.1061,
+ "step": 5500
+ },
+ {
+ "epoch": 21.0,
+ "eval_accuracy": 0.5532359081419624,
+ "eval_f1_macro": 0.793994619616555,
+ "eval_f1_micro": 0.850910726332359,
+ "eval_loss": 0.09643097966909409,
+ "eval_roc_auc": 0.908055677859469,
+ "eval_runtime": 689.9756,
+ "eval_samples_per_second": 4.165,
+ "eval_steps_per_second": 0.13,
+ "learning_rate": 0.0001,
+ "step": 5754
+ },
+ {
+ "epoch": 21.9,
+ "learning_rate": 0.0001,
+ "loss": 0.1048,
+ "step": 6000
+ },
+ {
+ "epoch": 22.0,
+ "eval_accuracy": 0.5563674321503131,
+ "eval_f1_macro": 0.7973736665550476,
+ "eval_f1_micro": 0.8519603424966201,
+ "eval_loss": 0.096234992146492,
+ "eval_roc_auc": 0.9079748210535556,
+ "eval_runtime": 688.8118,
+ "eval_samples_per_second": 4.172,
+ "eval_steps_per_second": 0.131,
+ "learning_rate": 0.0001,
+ "step": 6028
+ },
+ {
+ "epoch": 23.0,
+ "eval_accuracy": 0.558455114822547,
+ "eval_f1_macro": 0.7969454250638132,
+ "eval_f1_micro": 0.8504731861198739,
+ "eval_loss": 0.09601961821317673,
+ "eval_roc_auc": 0.9012155078858011,
+ "eval_runtime": 688.0026,
+ "eval_samples_per_second": 4.177,
+ "eval_steps_per_second": 0.131,
+ "learning_rate": 0.0001,
+ "step": 6302
+ },
+ {
+ "epoch": 23.72,
+ "learning_rate": 0.0001,
+ "loss": 0.1038,
+ "step": 6500
+ },
+ {
+ "epoch": 24.0,
+ "eval_accuracy": 0.5626304801670147,
+ "eval_f1_macro": 0.7974458635640262,
+ "eval_f1_micro": 0.8510467909850132,
+ "eval_loss": 0.09510745108127594,
+ "eval_roc_auc": 0.9024119192380319,
+ "eval_runtime": 688.423,
+ "eval_samples_per_second": 4.175,
+ "eval_steps_per_second": 0.131,
+ "learning_rate": 0.0001,
+ "step": 6576
+ },
+ {
+ "epoch": 25.0,
+ "eval_accuracy": 0.5643702157272095,
+ "eval_f1_macro": 0.795289513465328,
+ "eval_f1_micro": 0.8511713367018835,
+ "eval_loss": 0.0944407731294632,
+ "eval_roc_auc": 0.9012469687818218,
+ "eval_runtime": 683.8812,
+ "eval_samples_per_second": 4.202,
+ "eval_steps_per_second": 0.132,
+ "learning_rate": 0.0001,
+ "step": 6850
+ },
+ {
+ "epoch": 25.55,
+ "learning_rate": 0.0001,
+ "loss": 0.1017,
+ "step": 7000
+ },
+ {
+ "epoch": 26.0,
+ "eval_accuracy": 0.5640222686151705,
+ "eval_f1_macro": 0.8036711965439244,
+ "eval_f1_micro": 0.8572393605043909,
+ "eval_loss": 0.0948282852768898,
+ "eval_roc_auc": 0.9111790013806387,
+ "eval_runtime": 681.6858,
+ "eval_samples_per_second": 4.216,
+ "eval_steps_per_second": 0.132,
+ "learning_rate": 0.0001,
+ "step": 7124
+ },
+ {
+ "epoch": 27.0,
+ "eval_accuracy": 0.5636743215031316,
+ "eval_f1_macro": 0.8034638180358344,
+ "eval_f1_micro": 0.8551240743881069,
+ "eval_loss": 0.09229259192943573,
+ "eval_roc_auc": 0.9086109391822021,
+ "eval_runtime": 683.6776,
+ "eval_samples_per_second": 4.204,
+ "eval_steps_per_second": 0.132,
+ "learning_rate": 0.0001,
+ "step": 7398
+ },
+ {
+ "epoch": 27.37,
+ "learning_rate": 0.0001,
+ "loss": 0.1008,
+ "step": 7500
+ },
+ {
+ "epoch": 28.0,
+ "eval_accuracy": 0.5643702157272095,
+ "eval_f1_macro": 0.8072611584992022,
+ "eval_f1_micro": 0.8561391580259505,
+ "eval_loss": 0.0919216200709343,
+ "eval_roc_auc": 0.9083895171196321,
+ "eval_runtime": 676.936,
+ "eval_samples_per_second": 4.246,
+ "eval_steps_per_second": 0.133,
+ "learning_rate": 0.0001,
+ "step": 7672
+ },
+ {
+ "epoch": 29.0,
+ "eval_accuracy": 0.5681976339596382,
+ "eval_f1_macro": 0.807775544791943,
+ "eval_f1_micro": 0.8571590844550463,
+ "eval_loss": 0.09229801595211029,
+ "eval_roc_auc": 0.9081680950570622,
+ "eval_runtime": 680.3447,
+ "eval_samples_per_second": 4.224,
+ "eval_steps_per_second": 0.132,
+ "learning_rate": 0.0001,
+ "step": 7946
+ },
+ {
+ "epoch": 29.2,
+ "learning_rate": 0.0001,
+ "loss": 0.1006,
+ "step": 8000
+ },
+ {
+ "epoch": 30.0,
+ "eval_accuracy": 0.5636743215031316,
+ "eval_f1_macro": 0.8078629475879894,
+ "eval_f1_micro": 0.8560661454525001,
+ "eval_loss": 0.09243426471948624,
+ "eval_roc_auc": 0.9107996520688381,
+ "eval_runtime": 679.1764,
+ "eval_samples_per_second": 4.232,
+ "eval_steps_per_second": 0.133,
+ "learning_rate": 0.0001,
+ "step": 8220
+ },
+ {
+ "epoch": 31.0,
+ "eval_accuracy": 0.5688935281837161,
+ "eval_f1_macro": 0.8043753783436429,
+ "eval_f1_micro": 0.8549068890666057,
+ "eval_loss": 0.09250637888908386,
+ "eval_roc_auc": 0.9050076062220636,
+ "eval_runtime": 675.7031,
+ "eval_samples_per_second": 4.253,
+ "eval_steps_per_second": 0.133,
+ "learning_rate": 0.0001,
+ "step": 8494
+ },
+ {
+ "epoch": 31.02,
+ "learning_rate": 0.0001,
+ "loss": 0.0987,
+ "step": 8500
+ },
+ {
+ "epoch": 32.0,
+ "eval_accuracy": 0.5678496868475992,
+ "eval_f1_macro": 0.8071226305218325,
+ "eval_f1_micro": 0.858236685057989,
+ "eval_loss": 0.09133294969797134,
+ "eval_roc_auc": 0.9117040473456065,
+ "eval_runtime": 677.7385,
+ "eval_samples_per_second": 4.241,
+ "eval_steps_per_second": 0.133,
+ "learning_rate": 0.0001,
+ "step": 8768
+ },
+ {
+ "epoch": 32.85,
+ "learning_rate": 0.0001,
+ "loss": 0.0983,
+ "step": 9000
+ },
+ {
+ "epoch": 33.0,
+ "eval_accuracy": 0.5692414752957551,
+ "eval_f1_macro": 0.8081519622072744,
+ "eval_f1_micro": 0.8570938803496942,
+ "eval_loss": 0.09114891290664673,
+ "eval_roc_auc": 0.9061295874509765,
+ "eval_runtime": 681.1845,
+ "eval_samples_per_second": 4.219,
+ "eval_steps_per_second": 0.132,
+ "learning_rate": 0.0001,
+ "step": 9042
+ },
+ {
+ "epoch": 34.0,
+ "eval_accuracy": 0.5709812108559499,
+ "eval_f1_macro": 0.8059984375887345,
+ "eval_f1_micro": 0.8570447522032734,
+ "eval_loss": 0.09058225899934769,
+ "eval_roc_auc": 0.9055923606377748,
+ "eval_runtime": 681.0802,
+ "eval_samples_per_second": 4.22,
+ "eval_steps_per_second": 0.132,
+ "learning_rate": 0.0001,
+ "step": 9316
+ },
+ {
+ "epoch": 34.67,
+ "learning_rate": 0.0001,
+ "loss": 0.0967,
+ "step": 9500
+ },
+ {
+ "epoch": 35.0,
+ "eval_accuracy": 0.5692414752957551,
+ "eval_f1_macro": 0.8103551770491668,
+ "eval_f1_micro": 0.857759845428198,
+ "eval_loss": 0.09091359376907349,
+ "eval_roc_auc": 0.9083150869963146,
+ "eval_runtime": 683.7099,
+ "eval_samples_per_second": 4.204,
+ "eval_steps_per_second": 0.132,
+ "learning_rate": 0.0001,
+ "step": 9590
+ },
+ {
+ "epoch": 36.0,
+ "eval_accuracy": 0.5748086290883786,
+ "eval_f1_macro": 0.8114188986781382,
+ "eval_f1_micro": 0.8582166040314315,
+ "eval_loss": 0.09166968613862991,
+ "eval_roc_auc": 0.9079081626467485,
+ "eval_runtime": 677.0062,
+ "eval_samples_per_second": 4.245,
+ "eval_steps_per_second": 0.133,
+ "learning_rate": 0.0001,
+ "step": 9864
+ },
+ {
+ "epoch": 36.5,
+ "learning_rate": 0.0001,
+ "loss": 0.0963,
+ "step": 10000
+ },
+ {
+ "epoch": 37.0,
+ "eval_accuracy": 0.5741127348643006,
+ "eval_f1_macro": 0.8104359485439742,
+ "eval_f1_micro": 0.8571918983865431,
+ "eval_loss": 0.09075025469064713,
+ "eval_roc_auc": 0.9057496153700481,
+ "eval_runtime": 682.1714,
+ "eval_samples_per_second": 4.213,
+ "eval_steps_per_second": 0.132,
+ "learning_rate": 0.0001,
+ "step": 10138
+ },
+ {
+ "epoch": 38.0,
+ "eval_accuracy": 0.5709812108559499,
+ "eval_f1_macro": 0.8135949001200257,
+ "eval_f1_micro": 0.8594423033325777,
+ "eval_loss": 0.09104561805725098,
+ "eval_roc_auc": 0.9101469439602342,
+ "eval_runtime": 690.1946,
+ "eval_samples_per_second": 4.164,
+ "eval_steps_per_second": 0.13,
+ "learning_rate": 0.0001,
+ "step": 10412
+ },
+ {
+ "epoch": 38.32,
+ "learning_rate": 0.0001,
+ "loss": 0.0957,
+ "step": 10500
+ },
+ {
+ "epoch": 39.0,
+ "eval_accuracy": 0.5685455810716771,
+ "eval_f1_macro": 0.808520223441343,
+ "eval_f1_micro": 0.8577247270464444,
+ "eval_loss": 0.09074629843235016,
+ "eval_roc_auc": 0.9098080230513902,
+ "eval_runtime": 678.1058,
+ "eval_samples_per_second": 4.238,
+ "eval_steps_per_second": 0.133,
+ "learning_rate": 0.0001,
+ "step": 10686
+ },
+ {
+ "epoch": 40.0,
+ "eval_accuracy": 0.5730688935281837,
+ "eval_f1_macro": 0.8111504469893477,
+ "eval_f1_micro": 0.8592332123411979,
+ "eval_loss": 0.09030281752347946,
+ "eval_roc_auc": 0.909802268885752,
+ "eval_runtime": 695.8681,
+ "eval_samples_per_second": 4.13,
+ "eval_steps_per_second": 0.129,
+ "learning_rate": 0.0001,
+ "step": 10960
+ },
+ {
+ "epoch": 40.15,
+ "learning_rate": 0.0001,
+ "loss": 0.0953,
+ "step": 11000
+ },
+ {
+ "epoch": 41.0,
+ "eval_accuracy": 0.5716771050800278,
+ "eval_f1_macro": 0.8133805742659422,
+ "eval_f1_micro": 0.8586208856801775,
+ "eval_loss": 0.09064245969057083,
+ "eval_roc_auc": 0.9086828782290092,
+ "eval_runtime": 687.8411,
+ "eval_samples_per_second": 4.178,
+ "eval_steps_per_second": 0.131,
+ "learning_rate": 0.0001,
+ "step": 11234
+ },
+ {
+ "epoch": 41.97,
+ "learning_rate": 0.0001,
+ "loss": 0.0943,
+ "step": 11500
+ },
+ {
+ "epoch": 42.0,
+ "eval_accuracy": 0.5664578983994433,
+ "eval_f1_macro": 0.8135815799291138,
+ "eval_f1_micro": 0.8584246692032484,
+ "eval_loss": 0.09031981229782104,
+ "eval_roc_auc": 0.9089139403154726,
+ "eval_runtime": 684.2332,
+ "eval_samples_per_second": 4.2,
+ "eval_steps_per_second": 0.132,
+ "learning_rate": 0.0001,
+ "step": 11508
+ },
+ {
+ "epoch": 43.0,
+ "eval_accuracy": 0.569937369519833,
+ "eval_f1_macro": 0.8177715667121555,
+ "eval_f1_micro": 0.8603735373537355,
+ "eval_loss": 0.09048929065465927,
+ "eval_roc_auc": 0.9131758350455123,
+ "eval_runtime": 683.871,
+ "eval_samples_per_second": 4.203,
+ "eval_steps_per_second": 0.132,
+ "learning_rate": 0.0001,
+ "step": 11782
+ },
+ {
+ "epoch": 43.8,
+ "learning_rate": 0.0001,
+ "loss": 0.0947,
+ "step": 12000
+ },
+ {
+ "epoch": 44.0,
+ "eval_accuracy": 0.5727209464161448,
+ "eval_f1_macro": 0.8149031816603105,
+ "eval_f1_micro": 0.8585443759981747,
+ "eval_loss": 0.090988889336586,
+ "eval_roc_auc": 0.9075230591693096,
+ "eval_runtime": 686.4073,
+ "eval_samples_per_second": 4.187,
+ "eval_steps_per_second": 0.131,
+ "learning_rate": 0.0001,
+ "step": 12056
+ },
+ {
+ "epoch": 45.0,
+ "eval_accuracy": 0.5727209464161448,
+ "eval_f1_macro": 0.8112945515986235,
+ "eval_f1_micro": 0.8590971272229823,
+ "eval_loss": 0.09051001071929932,
+ "eval_roc_auc": 0.9080583679272985,
+ "eval_runtime": 690.3088,
+ "eval_samples_per_second": 4.163,
+ "eval_steps_per_second": 0.13,
+ "learning_rate": 0.0001,
+ "step": 12330
+ },
+ {
+ "epoch": 45.62,
+ "learning_rate": 0.0001,
+ "loss": 0.0925,
+ "step": 12500
+ },
+ {
+ "epoch": 46.0,
+ "eval_accuracy": 0.5727209464161448,
+ "eval_f1_macro": 0.8138956921603455,
+ "eval_f1_micro": 0.8608370193943518,
+ "eval_loss": 0.08959119021892548,
+ "eval_roc_auc": 0.9107387478276688,
+ "eval_runtime": 684.5538,
+ "eval_samples_per_second": 4.198,
+ "eval_steps_per_second": 0.131,
+ "learning_rate": 0.0001,
+ "step": 12604
+ },
+ {
+ "epoch": 47.0,
+ "eval_accuracy": 0.5744606819763396,
+ "eval_f1_macro": 0.8154159530277365,
+ "eval_f1_micro": 0.8598835217540253,
+ "eval_loss": 0.08953865617513657,
+ "eval_roc_auc": 0.9079274426945352,
+ "eval_runtime": 681.6068,
+ "eval_samples_per_second": 4.217,
+ "eval_steps_per_second": 0.132,
+ "learning_rate": 0.0001,
+ "step": 12878
+ },
+ {
+ "epoch": 47.45,
+ "learning_rate": 0.0001,
+ "loss": 0.0928,
+ "step": 13000
+ },
+ {
+ "epoch": 48.0,
+ "eval_accuracy": 0.5744606819763396,
+ "eval_f1_macro": 0.8154966869589858,
+ "eval_f1_micro": 0.8605536922289807,
+ "eval_loss": 0.08962185680866241,
+ "eval_roc_auc": 0.9097631357688805,
+ "eval_runtime": 684.997,
+ "eval_samples_per_second": 4.196,
+ "eval_steps_per_second": 0.131,
+ "learning_rate": 0.0001,
+ "step": 13152
+ },
+ {
+ "epoch": 49.0,
+ "eval_accuracy": 0.5727209464161448,
+ "eval_f1_macro": 0.8168754926591527,
+ "eval_f1_micro": 0.8606169781580725,
+ "eval_loss": 0.08909053355455399,
+ "eval_roc_auc": 0.9130853382157057,
+ "eval_runtime": 683.2092,
+ "eval_samples_per_second": 4.207,
+ "eval_steps_per_second": 0.132,
+ "learning_rate": 0.0001,
+ "step": 13426
+ },
+ {
+ "epoch": 49.27,
+ "learning_rate": 0.0001,
+ "loss": 0.0914,
+ "step": 13500
+ },
+ {
+ "epoch": 50.0,
+ "eval_accuracy": 0.5734168406402227,
+ "eval_f1_macro": 0.8182687784925751,
+ "eval_f1_micro": 0.8616618652205841,
+ "eval_loss": 0.08951092511415482,
+ "eval_roc_auc": 0.9125141096821429,
+ "eval_runtime": 683.8641,
+ "eval_samples_per_second": 4.203,
+ "eval_steps_per_second": 0.132,
+ "learning_rate": 0.0001,
+ "step": 13700
+ },
+ {
+ "epoch": 51.0,
+ "eval_accuracy": 0.5668058455114823,
+ "eval_f1_macro": 0.8184177894108883,
+ "eval_f1_micro": 0.8608232987958555,
+ "eval_loss": 0.09029122442007065,
+ "eval_roc_auc": 0.914931294083072,
+ "eval_runtime": 685.4274,
+ "eval_samples_per_second": 4.193,
+ "eval_steps_per_second": 0.131,
+ "learning_rate": 0.0001,
+ "step": 13974
+ },
+ {
+ "epoch": 51.09,
+ "learning_rate": 0.0001,
+ "loss": 0.0919,
+ "step": 14000
+ },
842
+ {
843
+ "epoch": 52.0,
844
+ "eval_accuracy": 0.5762004175365344,
845
+ "eval_f1_macro": 0.8172163352414866,
846
+ "eval_f1_micro": 0.8617045454545454,
847
+ "eval_loss": 0.09041330218315125,
848
+ "eval_roc_auc": 0.9105776569849702,
849
+ "eval_runtime": 686.3022,
850
+ "eval_samples_per_second": 4.188,
851
+ "eval_steps_per_second": 0.131,
852
+ "learning_rate": 0.0001,
853
+ "step": 14248
854
+ },
855
+ {
856
+ "epoch": 52.92,
857
+ "learning_rate": 0.0001,
858
+ "loss": 0.091,
859
+ "step": 14500
860
+ },
861
+ {
862
+ "epoch": 53.0,
863
+ "eval_accuracy": 0.5734168406402227,
864
+ "eval_f1_macro": 0.8154347454270638,
865
+ "eval_f1_micro": 0.8604036655984708,
866
+ "eval_loss": 0.09106075763702393,
867
+ "eval_roc_auc": 0.913401765735465,
868
+ "eval_runtime": 686.9936,
869
+ "eval_samples_per_second": 4.183,
870
+ "eval_steps_per_second": 0.131,
871
+ "learning_rate": 0.0001,
872
+ "step": 14522
873
+ },
874
+ {
875
+ "epoch": 54.0,
876
+ "eval_accuracy": 0.5751565762004175,
877
+ "eval_f1_macro": 0.822392587875712,
878
+ "eval_f1_micro": 0.8628963639457711,
879
+ "eval_loss": 0.09085189551115036,
880
+ "eval_roc_auc": 0.9117971844954131,
881
+ "eval_runtime": 691.549,
882
+ "eval_samples_per_second": 4.156,
883
+ "eval_steps_per_second": 0.13,
884
+ "learning_rate": 0.0001,
885
+ "step": 14796
886
+ },
887
+ {
888
+ "epoch": 54.74,
889
+ "learning_rate": 0.0001,
890
+ "loss": 0.0907,
891
+ "step": 15000
892
+ },
893
+ {
894
+ "epoch": 55.0,
+ "eval_accuracy": 0.5720250521920668,
+ "eval_f1_macro": 0.8246722143238872,
+ "eval_f1_micro": 0.862824401752612,
+ "eval_loss": 0.0893503949046135,
+ "eval_roc_auc": 0.9150558423810694,
+ "eval_runtime": 687.0743,
+ "eval_samples_per_second": 4.183,
+ "eval_steps_per_second": 0.131,
+ "learning_rate": 0.0001,
+ "step": 15070
+ },
+ {
+ "epoch": 56.0,
+ "eval_accuracy": 0.5723729993041058,
+ "eval_f1_macro": 0.8197285299784532,
+ "eval_f1_micro": 0.8613505337062617,
+ "eval_loss": 0.0895121842622757,
+ "eval_roc_auc": 0.9088388874230271,
+ "eval_runtime": 688.6878,
+ "eval_samples_per_second": 4.173,
+ "eval_steps_per_second": 0.131,
+ "learning_rate": 1e-05,
+ "step": 15344
+ },
+ {
+ "epoch": 56.57,
+ "learning_rate": 1e-05,
+ "loss": 0.0883,
+ "step": 15500
+ },
+ {
+ "epoch": 57.0,
+ "eval_accuracy": 0.5755045233124565,
+ "eval_f1_macro": 0.8261680546876228,
+ "eval_f1_micro": 0.8653240324032403,
+ "eval_loss": 0.08795319497585297,
+ "eval_roc_auc": 0.9159717957369441,
+ "eval_runtime": 680.4805,
+ "eval_samples_per_second": 4.223,
+ "eval_steps_per_second": 0.132,
+ "learning_rate": 1e-05,
+ "step": 15618
+ },
+ {
+ "epoch": 58.0,
+ "eval_accuracy": 0.5782881002087683,
+ "eval_f1_macro": 0.8227228870436498,
+ "eval_f1_micro": 0.8639262127078114,
+ "eval_loss": 0.08846761286258698,
+ "eval_roc_auc": 0.9111322457907458,
+ "eval_runtime": 678.456,
+ "eval_samples_per_second": 4.236,
+ "eval_steps_per_second": 0.133,
+ "learning_rate": 1e-05,
+ "step": 15892
+ },
+ {
+ "epoch": 58.39,
+ "learning_rate": 1e-05,
+ "loss": 0.0872,
+ "step": 16000
+ },
+ {
+ "epoch": 59.0,
+ "eval_accuracy": 0.5765483646485734,
+ "eval_f1_macro": 0.8262742568594247,
+ "eval_f1_micro": 0.8655003656409969,
+ "eval_loss": 0.0878983661532402,
+ "eval_roc_auc": 0.9160905401214736,
+ "eval_runtime": 680.5904,
+ "eval_samples_per_second": 4.223,
+ "eval_steps_per_second": 0.132,
+ "learning_rate": 1e-05,
+ "step": 16166
+ },
+ {
+ "epoch": 60.0,
+ "eval_accuracy": 0.5800278357689631,
+ "eval_f1_macro": 0.8238378094426198,
+ "eval_f1_micro": 0.8654139156932453,
+ "eval_loss": 0.08844566345214844,
+ "eval_roc_auc": 0.914969231409518,
+ "eval_runtime": 682.0838,
+ "eval_samples_per_second": 4.214,
+ "eval_steps_per_second": 0.132,
+ "learning_rate": 1e-05,
+ "step": 16440
+ },
+ {
+ "epoch": 60.22,
+ "learning_rate": 1e-05,
+ "loss": 0.0873,
+ "step": 16500
+ },
+ {
+ "epoch": 61.0,
+ "eval_accuracy": 0.5744606819763396,
+ "eval_f1_macro": 0.8265572971487117,
+ "eval_f1_micro": 0.8651893408134642,
+ "eval_loss": 0.0878659188747406,
+ "eval_roc_auc": 0.9168337948077135,
+ "eval_runtime": 683.217,
+ "eval_samples_per_second": 4.207,
+ "eval_steps_per_second": 0.132,
+ "learning_rate": 1e-05,
+ "step": 16714
+ },
+ {
+ "epoch": 62.0,
+ "eval_accuracy": 0.5765483646485734,
+ "eval_f1_macro": 0.8251828516128455,
+ "eval_f1_micro": 0.8649870071178397,
+ "eval_loss": 0.08799029141664505,
+ "eval_roc_auc": 0.9143652466938494,
+ "eval_runtime": 680.2736,
+ "eval_samples_per_second": 4.225,
+ "eval_steps_per_second": 0.132,
+ "learning_rate": 1e-05,
+ "step": 16988
+ },
+ {
+ "epoch": 62.04,
+ "learning_rate": 1e-05,
+ "loss": 0.0864,
+ "step": 17000
+ },
+ {
+ "epoch": 63.0,
+ "eval_accuracy": 0.5800278357689631,
+ "eval_f1_macro": 0.8266891115852992,
+ "eval_f1_micro": 0.8650424929178471,
+ "eval_loss": 0.08828118443489075,
+ "eval_roc_auc": 0.9134011927141672,
+ "eval_runtime": 677.3735,
+ "eval_samples_per_second": 4.243,
+ "eval_steps_per_second": 0.133,
+ "learning_rate": 1e-05,
+ "step": 17262
+ },
+ {
+ "epoch": 63.87,
+ "learning_rate": 1e-05,
+ "loss": 0.086,
+ "step": 17500
+ },
+ {
+ "epoch": 64.0,
+ "eval_accuracy": 0.5782881002087683,
+ "eval_f1_macro": 0.8256635970178378,
+ "eval_f1_micro": 0.8667077889306342,
+ "eval_loss": 0.08754145354032516,
+ "eval_roc_auc": 0.9178472944451183,
+ "eval_runtime": 682.5828,
+ "eval_samples_per_second": 4.21,
+ "eval_steps_per_second": 0.132,
+ "learning_rate": 1e-05,
+ "step": 17536
+ },
+ {
+ "epoch": 65.0,
+ "eval_accuracy": 0.58107167710508,
+ "eval_f1_macro": 0.8277460823758025,
+ "eval_f1_micro": 0.8669750648764526,
+ "eval_loss": 0.08722905069589615,
+ "eval_roc_auc": 0.9159442206991787,
+ "eval_runtime": 673.5072,
+ "eval_samples_per_second": 4.267,
+ "eval_steps_per_second": 0.134,
+ "learning_rate": 1e-05,
+ "step": 17810
+ },
+ {
+ "epoch": 65.69,
+ "learning_rate": 1e-05,
+ "loss": 0.0855,
+ "step": 18000
+ },
+ {
+ "epoch": 66.0,
+ "eval_accuracy": 0.581767571329158,
+ "eval_f1_macro": 0.8263083392061107,
+ "eval_f1_micro": 0.8662405972512867,
+ "eval_loss": 0.0872766524553299,
+ "eval_roc_auc": 0.9146675753101624,
+ "eval_runtime": 674.2325,
+ "eval_samples_per_second": 4.263,
+ "eval_steps_per_second": 0.133,
+ "learning_rate": 1e-05,
+ "step": 18084
+ },
+ {
+ "epoch": 67.0,
+ "eval_accuracy": 0.5796798886569241,
+ "eval_f1_macro": 0.8236507380069967,
+ "eval_f1_micro": 0.8647603888351997,
+ "eval_loss": 0.08779256045818329,
+ "eval_roc_auc": 0.9121142845321298,
+ "eval_runtime": 672.7686,
+ "eval_samples_per_second": 4.272,
+ "eval_steps_per_second": 0.134,
+ "learning_rate": 1e-05,
+ "step": 18358
+ },
+ {
+ "epoch": 67.52,
+ "learning_rate": 1e-05,
+ "loss": 0.0853,
+ "step": 18500
+ },
+ {
+ "epoch": 68.0,
+ "eval_accuracy": 0.580723729993041,
+ "eval_f1_macro": 0.82334160354742,
+ "eval_f1_micro": 0.8644058136221144,
+ "eval_loss": 0.08787883818149567,
+ "eval_roc_auc": 0.9110366175644288,
+ "eval_runtime": 678.5717,
+ "eval_samples_per_second": 4.235,
+ "eval_steps_per_second": 0.133,
+ "learning_rate": 1e-05,
+ "step": 18632
+ },
+ {
+ "epoch": 69.0,
+ "eval_accuracy": 0.5831593597773138,
+ "eval_f1_macro": 0.8274164123414606,
+ "eval_f1_micro": 0.8653988078342322,
+ "eval_loss": 0.08730249851942062,
+ "eval_roc_auc": 0.9129307238034322,
+ "eval_runtime": 682.302,
+ "eval_samples_per_second": 4.212,
+ "eval_steps_per_second": 0.132,
+ "learning_rate": 1e-05,
+ "step": 18906
+ },
+ {
+ "epoch": 69.34,
+ "learning_rate": 1e-05,
+ "loss": 0.0854,
+ "step": 19000
+ },
+ {
+ "epoch": 70.0,
+ "eval_accuracy": 0.58107167710508,
+ "eval_f1_macro": 0.8286701109278063,
+ "eval_f1_micro": 0.8661381908135155,
+ "eval_loss": 0.08733326941728592,
+ "eval_roc_auc": 0.9166425383550794,
+ "eval_runtime": 673.3186,
+ "eval_samples_per_second": 4.268,
+ "eval_steps_per_second": 0.134,
+ "learning_rate": 1e-05,
+ "step": 19180
+ },
+ {
+ "epoch": 71.0,
+ "eval_accuracy": 0.5779401530967293,
+ "eval_f1_macro": 0.8262073521627441,
+ "eval_f1_micro": 0.865708650324035,
+ "eval_loss": 0.08731996268033981,
+ "eval_roc_auc": 0.9155950369973136,
+ "eval_runtime": 672.8744,
+ "eval_samples_per_second": 4.271,
+ "eval_steps_per_second": 0.134,
+ "learning_rate": 1e-05,
+ "step": 19454
+ },
+ {
+ "epoch": 71.17,
+ "learning_rate": 1.0000000000000002e-06,
+ "loss": 0.0847,
+ "step": 19500
+ },
+ {
+ "epoch": 72.0,
+ "eval_accuracy": 0.5803757828810021,
+ "eval_f1_macro": 0.8279492189021646,
+ "eval_f1_micro": 0.8660418654245468,
+ "eval_loss": 0.08729101717472076,
+ "eval_roc_auc": 0.9172015860404081,
+ "eval_runtime": 676.9231,
+ "eval_samples_per_second": 4.246,
+ "eval_steps_per_second": 0.133,
+ "learning_rate": 1.0000000000000002e-06,
+ "step": 19728
+ },
+ {
+ "epoch": 72.99,
+ "learning_rate": 1.0000000000000002e-06,
+ "loss": 0.0852,
+ "step": 20000
+ },
+ {
+ "epoch": 73.0,
+ "eval_accuracy": 0.5765483646485734,
+ "eval_f1_macro": 0.8258696329291023,
+ "eval_f1_micro": 0.8661956034096008,
+ "eval_loss": 0.08899407833814621,
+ "eval_roc_auc": 0.917537916377082,
+ "eval_runtime": 674.8648,
+ "eval_samples_per_second": 4.259,
+ "eval_steps_per_second": 0.133,
+ "learning_rate": 1.0000000000000002e-06,
+ "step": 20002
+ },
+ {
+ "epoch": 74.0,
+ "eval_accuracy": 0.5835073068893528,
+ "eval_f1_macro": 0.8266751443826955,
+ "eval_f1_micro": 0.8663119764546072,
+ "eval_loss": 0.08706125617027283,
+ "eval_roc_auc": 0.9144583340958263,
+ "eval_runtime": 676.3788,
+ "eval_samples_per_second": 4.249,
+ "eval_steps_per_second": 0.133,
+ "learning_rate": 1.0000000000000002e-06,
+ "step": 20276
+ },
+ {
+ "epoch": 74.82,
+ "learning_rate": 1.0000000000000002e-06,
+ "loss": 0.0845,
+ "step": 20500
+ },
+ {
+ "epoch": 75.0,
+ "eval_accuracy": 0.5762004175365344,
+ "eval_f1_macro": 0.8242525331164202,
+ "eval_f1_micro": 0.8650994982806247,
+ "eval_loss": 0.08718431740999222,
+ "eval_roc_auc": 0.9151367489348123,
+ "eval_runtime": 674.1856,
+ "eval_samples_per_second": 4.263,
+ "eval_steps_per_second": 0.133,
+ "learning_rate": 1.0000000000000002e-06,
+ "step": 20550
+ },
+ {
+ "epoch": 76.0,
+ "eval_accuracy": 0.5775922059846903,
+ "eval_f1_macro": 0.8258404959868192,
+ "eval_f1_micro": 0.8660362490149724,
+ "eval_loss": 0.08712752908468246,
+ "eval_roc_auc": 0.9161823322373652,
+ "eval_runtime": 676.0536,
+ "eval_samples_per_second": 4.251,
+ "eval_steps_per_second": 0.133,
+ "learning_rate": 1.0000000000000002e-06,
+ "step": 20824
+ },
+ {
+ "epoch": 76.64,
+ "learning_rate": 1.0000000000000002e-06,
+ "loss": 0.0849,
+ "step": 21000
+ },
+ {
+ "epoch": 77.0,
+ "eval_accuracy": 0.5779401530967293,
+ "eval_f1_macro": 0.8262597281814207,
+ "eval_f1_micro": 0.8654561858576745,
+ "eval_loss": 0.08787967264652252,
+ "eval_roc_auc": 0.915242017185023,
+ "eval_runtime": 678.4216,
+ "eval_samples_per_second": 4.236,
+ "eval_steps_per_second": 0.133,
+ "learning_rate": 1.0000000000000002e-06,
+ "step": 21098
+ },
+ {
+ "epoch": 78.0,
+ "eval_accuracy": 0.5779401530967293,
+ "eval_f1_macro": 0.824064674812195,
+ "eval_f1_micro": 0.8647364849581541,
+ "eval_loss": 0.08832630515098572,
+ "eval_roc_auc": 0.9138800063627106,
+ "eval_runtime": 674.504,
+ "eval_samples_per_second": 4.261,
+ "eval_steps_per_second": 0.133,
+ "learning_rate": 1.0000000000000002e-06,
+ "step": 21372
+ },
+ {
+ "epoch": 78.47,
+ "learning_rate": 1.0000000000000002e-06,
+ "loss": 0.0853,
+ "step": 21500
+ },
+ {
+ "epoch": 79.0,
+ "eval_accuracy": 0.580723729993041,
+ "eval_f1_macro": 0.8283767069034536,
+ "eval_f1_micro": 0.8667153859126425,
+ "eval_loss": 0.08727473765611649,
+ "eval_roc_auc": 0.9170071162464759,
+ "eval_runtime": 680.1183,
+ "eval_samples_per_second": 4.226,
+ "eval_steps_per_second": 0.132,
+ "learning_rate": 1.0000000000000002e-06,
+ "step": 21646
+ },
+ {
+ "epoch": 80.0,
+ "eval_accuracy": 0.581419624217119,
+ "eval_f1_macro": 0.8257519474670673,
+ "eval_f1_micro": 0.8654216185625353,
+ "eval_loss": 0.08734780550003052,
+ "eval_roc_auc": 0.9139968326920274,
+ "eval_runtime": 682.9935,
+ "eval_samples_per_second": 4.208,
+ "eval_steps_per_second": 0.132,
+ "learning_rate": 1.0000000000000002e-06,
+ "step": 21920
+ },
+ {
+ "epoch": 80.29,
+ "learning_rate": 1.0000000000000002e-07,
+ "loss": 0.0838,
+ "step": 22000
+ },
+ {
+ "epoch": 81.0,
+ "eval_accuracy": 0.5828114126652749,
+ "eval_f1_macro": 0.8261813753948223,
+ "eval_f1_micro": 0.8653922514039366,
+ "eval_loss": 0.08708538860082626,
+ "eval_roc_auc": 0.9131951648411291,
+ "eval_runtime": 690.614,
+ "eval_samples_per_second": 4.162,
+ "eval_steps_per_second": 0.13,
+ "learning_rate": 1.0000000000000002e-07,
+ "step": 22194
+ },
+ {
+ "epoch": 82.0,
+ "eval_accuracy": 0.581767571329158,
+ "eval_f1_macro": 0.8253000981325144,
+ "eval_f1_micro": 0.866888801039137,
+ "eval_loss": 0.08740255981683731,
+ "eval_roc_auc": 0.9155308696670169,
+ "eval_runtime": 680.0034,
+ "eval_samples_per_second": 4.226,
+ "eval_steps_per_second": 0.132,
+ "learning_rate": 1.0000000000000002e-07,
+ "step": 22468
+ },
+ {
+ "epoch": 82.12,
+ "learning_rate": 1.0000000000000002e-07,
+ "loss": 0.0842,
+ "step": 22500
+ },
+ {
+ "epoch": 83.0,
+ "eval_accuracy": 0.5845511482254697,
+ "eval_f1_macro": 0.8282173993454429,
+ "eval_f1_micro": 0.8666929710839298,
+ "eval_loss": 0.08695908635854721,
+ "eval_roc_auc": 0.9160732278767293,
+ "eval_runtime": 685.2501,
+ "eval_samples_per_second": 4.194,
+ "eval_steps_per_second": 0.131,
+ "learning_rate": 1.0000000000000002e-07,
+ "step": 22742
+ },
+ {
+ "epoch": 83.94,
+ "learning_rate": 1.0000000000000002e-07,
+ "loss": 0.0842,
+ "step": 23000
+ },
+ {
+ "epoch": 84.0,
+ "eval_accuracy": 0.58107167710508,
+ "eval_f1_macro": 0.8233437650206237,
+ "eval_f1_micro": 0.8627316009866345,
+ "eval_loss": 0.08810650557279587,
+ "eval_roc_auc": 0.9079679208453217,
+ "eval_runtime": 681.9462,
+ "eval_samples_per_second": 4.214,
+ "eval_steps_per_second": 0.132,
+ "learning_rate": 1.0000000000000002e-07,
+ "step": 23016
+ },
+ {
+ "epoch": 85.0,
+ "eval_accuracy": 0.580723729993041,
+ "eval_f1_macro": 0.8276925304690478,
+ "eval_f1_micro": 0.8657459814353634,
+ "eval_loss": 0.08707784116268158,
+ "eval_roc_auc": 0.9141406112899818,
+ "eval_runtime": 686.2064,
+ "eval_samples_per_second": 4.188,
+ "eval_steps_per_second": 0.131,
+ "learning_rate": 1.0000000000000002e-07,
+ "step": 23290
+ },
+ {
+ "epoch": 85.0,
+ "learning_rate": 1.0000000000000002e-07,
+ "step": 23290,
+ "total_flos": 1.1045912459199104e+21,
+ "train_loss": 0.10122811819873383,
+ "train_runtime": 238033.4867,
+ "train_samples_per_second": 3.13,
+ "train_steps_per_second": 0.098
+ }
+ ],
+ "logging_steps": 500,
+ "max_steps": 23290,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 85,
+ "save_steps": 500,
+ "total_flos": 1.1045912459199104e+21,
+ "train_batch_size": 32,
+ "trial_name": null,
+ "trial_params": null
+ }