zlucia committed on
Commit
dd8cb87
1 Parent(s): 070b8fd

End of training

README.md CHANGED
@@ -4,28 +4,19 @@ library_name: peft
4
  tags:
5
  - generated_from_trainer
6
  base_model: mistralai/Mistral-7B-v0.1
7
- metrics:
8
- - accuracy
9
  model-index:
10
- - name: Mistral-7B-v0.1_district-court-db
11
  results: []
12
  ---
13
 
14
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
15
  should probably proofread and complete it, then remove this comment. -->
16
 
17
- # Mistral-7B-v0.1_district-court-db
18
 
19
  This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the None dataset.
20
  It achieves the following results on the evaluation set:
21
- - Loss: 0.0358
22
- - Precision Micro: 0.8142
23
- - Precision Macro: 0.7222
24
- - Recall Micro: 0.8142
25
- - Recall Macro: 0.7126
26
- - F1 Micro: 0.8142
27
- - F1 Macro: 0.7098
28
- - Accuracy: 0.8142
29
 
30
  ## Model description
31
 
@@ -48,46 +39,50 @@ The following hyperparameters were used during training:
48
  - train_batch_size: 4
49
  - eval_batch_size: 4
50
  - seed: 42
 
 
51
  - gradient_accumulation_steps: 4
52
- - total_train_batch_size: 16
 
53
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
54
  - lr_scheduler_type: constant
55
  - lr_scheduler_warmup_ratio: 0.03
56
- - training_steps: 1450
57
 
58
  ### Training results
59
 
60
- | Training Loss | Epoch | Step | Validation Loss | Precision Micro | Precision Macro | Recall Micro | Recall Macro | F1 Micro | F1 Macro | Accuracy |
61
- |:-------------:|:-----:|:----:|:---------------:|:---------------:|:---------------:|:------------:|:------------:|:--------:|:--------:|:--------:|
62
- | 0.1255 | 0.04 | 50 | 0.2459 | 0.2330 | 0.0980 | 0.2330 | 0.0939 | 0.2330 | 0.0773 | 0.2330 |
63
- | 0.1076 | 0.08 | 100 | 0.1451 | 0.4075 | 0.1951 | 0.4075 | 0.1846 | 0.4075 | 0.1681 | 0.4075 |
64
- | 0.066 | 0.12 | 150 | 0.1095 | 0.5387 | 0.3493 | 0.5387 | 0.2872 | 0.5387 | 0.2780 | 0.5387 |
65
- | 0.0699 | 0.16 | 200 | 0.0901 | 0.6208 | 0.3837 | 0.6208 | 0.3992 | 0.6208 | 0.3798 | 0.6208 |
66
- | 0.066 | 0.2 | 250 | 0.0883 | 0.6104 | 0.4544 | 0.6104 | 0.4312 | 0.6104 | 0.4135 | 0.6104 |
67
- | 0.0452 | 0.24 | 300 | 0.0879 | 0.6877 | 0.5649 | 0.6877 | 0.5135 | 0.6877 | 0.5092 | 0.6877 |
68
- | 0.0545 | 0.28 | 350 | 0.0761 | 0.6764 | 0.5194 | 0.6764 | 0.5288 | 0.6764 | 0.5040 | 0.6764 |
69
- | 0.0647 | 0.32 | 400 | 0.0665 | 0.7340 | 0.6193 | 0.7340 | 0.5252 | 0.7340 | 0.5493 | 0.7340 |
70
- | 0.056 | 0.36 | 450 | 0.0514 | 0.7396 | 0.6097 | 0.7396 | 0.5767 | 0.7396 | 0.5672 | 0.7396 |
71
- | 0.0513 | 0.4 | 500 | 0.0479 | 0.7613 | 0.6384 | 0.7613 | 0.6145 | 0.7613 | 0.6020 | 0.7613 |
72
- | 0.0501 | 0.44 | 550 | 0.0502 | 0.7509 | 0.6245 | 0.7509 | 0.6167 | 0.7509 | 0.6075 | 0.7509 |
73
- | 0.0533 | 0.48 | 600 | 0.0481 | 0.7642 | 0.6500 | 0.7642 | 0.6139 | 0.7642 | 0.6073 | 0.7642 |
74
- | 0.0462 | 0.52 | 650 | 0.0473 | 0.7481 | 0.5942 | 0.7481 | 0.5740 | 0.7481 | 0.5679 | 0.7481 |
75
- | 0.0496 | 0.56 | 700 | 0.0419 | 0.7972 | 0.6678 | 0.7972 | 0.6480 | 0.7972 | 0.6518 | 0.7972 |
76
- | 0.0614 | 0.6 | 750 | 0.0489 | 0.7774 | 0.6678 | 0.7774 | 0.6360 | 0.7774 | 0.6308 | 0.7774 |
77
- | 0.0468 | 0.64 | 800 | 0.0443 | 0.7830 | 0.6435 | 0.7830 | 0.6816 | 0.7830 | 0.6494 | 0.7830 |
78
- | 0.0477 | 0.68 | 850 | 0.0420 | 0.7972 | 0.7040 | 0.7972 | 0.6567 | 0.7972 | 0.6663 | 0.7972 |
79
- | 0.0519 | 0.72 | 900 | 0.0463 | 0.7632 | 0.6519 | 0.7632 | 0.6291 | 0.7632 | 0.6292 | 0.7632 |
80
- | 0.0453 | 0.76 | 950 | 0.0429 | 0.7802 | 0.6757 | 0.7802 | 0.6698 | 0.7802 | 0.6564 | 0.7802 |
81
- | 0.0452 | 0.79 | 1000 | 0.0471 | 0.7377 | 0.6182 | 0.7377 | 0.6300 | 0.7377 | 0.6049 | 0.7377 |
82
- | 0.0367 | 0.83 | 1050 | 0.0388 | 0.7981 | 0.6857 | 0.7981 | 0.6992 | 0.7981 | 0.6801 | 0.7981 |
83
- | 0.0377 | 0.87 | 1100 | 0.0382 | 0.8 | 0.6636 | 0.8 | 0.6698 | 0.8000 | 0.6591 | 0.8 |
84
- | 0.0429 | 0.91 | 1150 | 0.0398 | 0.7953 | 0.6924 | 0.7953 | 0.6441 | 0.7953 | 0.6466 | 0.7953 |
85
- | 0.0451 | 0.95 | 1200 | 0.0378 | 0.7943 | 0.6713 | 0.7943 | 0.6538 | 0.7943 | 0.6535 | 0.7943 |
86
- | 0.0347 | 0.99 | 1250 | 0.0413 | 0.7840 | 0.6735 | 0.7840 | 0.6450 | 0.7840 | 0.6331 | 0.7840 |
87
- | 0.0378 | 1.03 | 1300 | 0.0377 | 0.8047 | 0.7109 | 0.8047 | 0.6387 | 0.8047 | 0.6489 | 0.8047 |
88
- | 0.0357 | 1.07 | 1350 | 0.0386 | 0.8028 | 0.6899 | 0.8028 | 0.6559 | 0.8028 | 0.6649 | 0.8028 |
89
- | 0.0418 | 1.11 | 1400 | 0.0368 | 0.7962 | 0.7114 | 0.7962 | 0.6942 | 0.7962 | 0.6910 | 0.7962 |
90
- | 0.0293 | 1.15 | 1450 | 0.0358 | 0.8142 | 0.7222 | 0.8142 | 0.7126 | 0.8142 | 0.7098 | 0.8142 |
 
91
 
92
 
93
  ### Framework versions
 
4
  tags:
5
  - generated_from_trainer
6
  base_model: mistralai/Mistral-7B-v0.1
 
 
7
  model-index:
8
+ - name: Mistral-7B-v0.1_caselaw
9
  results: []
10
  ---
11
 
12
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
13
  should probably proofread and complete it, then remove this comment. -->
14
 
15
+ # Mistral-7B-v0.1_caselaw
16
 
17
  This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the None dataset.
18
  It achieves the following results on the evaluation set:
19
+ - Loss: 1.1640
 
 
 
 
 
 
 
20
 
21
  ## Model description
22
 
 
39
  - train_batch_size: 4
40
  - eval_batch_size: 4
41
  - seed: 42
42
+ - distributed_type: multi-GPU
43
+ - num_devices: 4
44
  - gradient_accumulation_steps: 4
45
+ - total_train_batch_size: 64
46
+ - total_eval_batch_size: 16
47
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
48
  - lr_scheduler_type: constant
49
  - lr_scheduler_warmup_ratio: 0.03
50
+ - num_epochs: 2.0
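
The new `total_train_batch_size` and `total_eval_batch_size` values follow from the other hyperparameters in this list. A minimal sketch of the arithmetic, assuming the effective train batch is the per-device batch times the device count times the accumulation steps, and that evaluation does not accumulate (the variable names mirror the list entries and are illustrative only):

```python
# Values copied from the hyperparameter list in this README diff.
train_batch_size = 4             # per-device train batch size
eval_batch_size = 4              # per-device eval batch size
num_devices = 4
gradient_accumulation_steps = 4

# Assumption: effective train batch = per-device batch * devices * accumulation steps.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# Assumption: evaluation has no gradient accumulation, so only the device count multiplies.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)
```

Under these assumptions the results match the reported 64 and 16. The previous run listed no `num_devices` entry and reported a total of 16, which is what the same formula gives for a single device.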
51
 
52
  ### Training results
53
 
54
+ | Training Loss | Epoch | Step | Validation Loss |
55
+ |:-------------:|:-----:|:----:|:---------------:|
56
+ | 1.2324 | 0.07 | 50 | 1.2373 |
57
+ | 1.2114 | 0.13 | 100 | 1.2199 |
58
+ | 1.1831 | 0.2 | 150 | 1.2111 |
59
+ | 1.2027 | 0.26 | 200 | 1.2048 |
60
+ | 1.1827 | 0.33 | 250 | 1.2001 |
61
+ | 1.1696 | 0.39 | 300 | 1.1973 |
62
+ | 1.2186 | 0.46 | 350 | 1.1938 |
63
+ | 1.1795 | 0.52 | 400 | 1.1919 |
64
+ | 1.2167 | 0.59 | 450 | 1.1884 |
65
+ | 1.1992 | 0.66 | 500 | 1.1840 |
66
+ | 1.2032 | 0.72 | 550 | 1.1824 |
67
+ | 1.1841 | 0.79 | 600 | 1.1798 |
68
+ | 1.166 | 0.85 | 650 | 1.1789 |
69
+ | 1.1641 | 0.92 | 700 | 1.1761 |
70
+ | 1.1859 | 0.98 | 750 | 1.1752 |
71
+ | 1.132 | 1.05 | 800 | 1.1736 |
72
+ | 1.1461 | 1.12 | 850 | 1.1724 |
73
+ | 1.0965 | 1.18 | 900 | 1.1726 |
74
+ | 1.1064 | 1.25 | 950 | 1.1724 |
75
+ | 1.123 | 1.31 | 1000 | 1.1729 |
76
+ | 1.1079 | 1.38 | 1050 | 1.1695 |
77
+ | 1.12 | 1.44 | 1100 | 1.1707 |
78
+ | 1.1288 | 1.51 | 1150 | 1.1693 |
79
+ | 1.133 | 1.57 | 1200 | 1.1676 |
80
+ | 1.1647 | 1.64 | 1250 | 1.1693 |
81
+ | 1.1269 | 1.71 | 1300 | 1.1658 |
82
+ | 1.1332 | 1.77 | 1350 | 1.1657 |
83
+ | 1.1276 | 1.84 | 1400 | 1.1681 |
84
+ | 1.1361 | 1.9 | 1450 | 1.1633 |
85
+ | 1.1205 | 1.97 | 1500 | 1.1640 |
86
 
87
 
88
  ### Framework versions
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:3e45b46710ea888a4dc454788bc0b057c6410011048773120d0a6de4f1e81f5b
3
  size 335605144
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a87aa1c22b9194aebc5bf0d5bfa563c8300e1a6549621dbf298d2728fc3e3a70
3
  size 335605144
all_results.json CHANGED
@@ -1,5 +1,5 @@
1
  {
2
- "epoch": 1.15,
3
  "eval_accuracy": 0.8141509433962264,
4
  "eval_f1_macro": 0.7097996478763092,
5
  "eval_f1_micro": 0.8141509433962264,
@@ -11,8 +11,8 @@
11
  "eval_runtime": 66.7734,
12
  "eval_samples_per_second": 15.875,
13
  "eval_steps_per_second": 3.969,
14
- "train_loss": 0.07879953698865298,
15
- "train_runtime": 5948.326,
16
- "train_samples_per_second": 3.9,
17
- "train_steps_per_second": 0.244
18
  }
 
1
  {
2
+ "epoch": 2.0,
3
  "eval_accuracy": 0.8141509433962264,
4
  "eval_f1_macro": 0.7097996478763092,
5
  "eval_f1_micro": 0.8141509433962264,
 
11
  "eval_runtime": 66.7734,
12
  "eval_samples_per_second": 15.875,
13
  "eval_steps_per_second": 3.969,
14
+ "train_loss": 1.1488912840840697,
15
+ "train_runtime": 10306.9039,
16
+ "train_samples_per_second": 9.463,
17
+ "train_steps_per_second": 0.148
18
  }
train_results.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
- "epoch": 1.15,
3
- "train_loss": 0.07879953698865298,
4
- "train_runtime": 5948.326,
5
- "train_samples_per_second": 3.9,
6
- "train_steps_per_second": 0.244
7
  }
 
1
  {
2
+ "epoch": 2.0,
3
+ "train_loss": 1.1488912840840697,
4
+ "train_runtime": 10306.9039,
5
+ "train_samples_per_second": 9.463,
6
+ "train_steps_per_second": 0.148
7
  }
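
The new throughput figures can be cross-checked against the batch size reported in README.md. The sketch below uses only the rounded values shown in this diff, so the results are approximate; the derived names (`approx_steps`, `approx_samples`, and so on) are illustrative and do not appear in any of the JSON files.

```python
# Values copied from the updated train_results.json in this commit.
train_runtime = 10306.9039            # seconds
train_samples_per_second = 9.463
train_steps_per_second = 0.148
num_epochs = 2.0

# Estimates derived from the rounded rates above (approximate by construction).
approx_steps = train_runtime * train_steps_per_second
approx_samples = train_runtime * train_samples_per_second

# Samples consumed per optimizer step; expected to be close to the
# total_train_batch_size of 64 reported in README.md.
samples_per_step = approx_samples / approx_steps

# Rough number of training samples seen per epoch.
samples_per_epoch = approx_samples / num_epochs

print(round(approx_steps), round(samples_per_step, 1), round(samples_per_epoch))
```

The estimate of roughly 1,500 optimizer steps is in line with the last row of the training results table (step 1500 at epoch 1.97), and the samples-per-step ratio comes out close to 64.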
trainer_state.json CHANGED
@@ -1,9 +1,9 @@
1
  {
2
  "best_metric": null,
3
  "best_model_checkpoint": null,
4
- "epoch": 1.1523941982912775,
5
  "eval_steps": 50,
6
- "global_step": 1450,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
@@ -11,1324 +11,1171 @@
11
  {
12
  "epoch": 0.01,
13
  "learning_rate": 3e-05,
14
- "loss": 2.0927,
15
  "step": 10
16
  },
17
  {
18
- "epoch": 0.02,
19
  "learning_rate": 3e-05,
20
- "loss": 0.267,
21
  "step": 20
22
  },
23
  {
24
- "epoch": 0.02,
25
  "learning_rate": 3e-05,
26
- "loss": 0.1926,
27
  "step": 30
28
  },
29
  {
30
- "epoch": 0.03,
31
  "learning_rate": 3e-05,
32
- "loss": 0.1601,
33
  "step": 40
34
  },
35
  {
36
- "epoch": 0.04,
37
  "learning_rate": 3e-05,
38
- "loss": 0.1255,
39
  "step": 50
40
  },
41
  {
42
- "epoch": 0.04,
43
- "eval_accuracy": 0.2330188679245283,
44
- "eval_f1_macro": 0.07731831394851697,
45
- "eval_f1_micro": 0.2330188679245283,
46
- "eval_loss": 0.24590551853179932,
47
- "eval_precision_macro": 0.09801465635567799,
48
- "eval_precision_micro": 0.2330188679245283,
49
- "eval_recall_macro": 0.0939265996231733,
50
- "eval_recall_micro": 0.2330188679245283,
51
- "eval_runtime": 67.1714,
52
- "eval_samples_per_second": 15.781,
53
- "eval_steps_per_second": 3.945,
54
  "step": 50
55
  },
56
  {
57
- "epoch": 0.05,
58
  "learning_rate": 3e-05,
59
- "loss": 0.6981,
60
  "step": 60
61
  },
62
  {
63
- "epoch": 0.06,
64
  "learning_rate": 3e-05,
65
- "loss": 0.1356,
66
  "step": 70
67
  },
68
  {
69
- "epoch": 0.06,
70
  "learning_rate": 3e-05,
71
- "loss": 0.0993,
72
  "step": 80
73
  },
74
  {
75
- "epoch": 0.07,
76
  "learning_rate": 3e-05,
77
- "loss": 0.1038,
78
  "step": 90
79
  },
80
  {
81
- "epoch": 0.08,
82
  "learning_rate": 3e-05,
83
- "loss": 0.1076,
84
  "step": 100
85
  },
86
  {
87
- "epoch": 0.08,
88
- "eval_accuracy": 0.4075471698113208,
89
- "eval_f1_macro": 0.1681261191284492,
90
- "eval_f1_micro": 0.4075471698113208,
91
- "eval_loss": 0.14505280554294586,
92
- "eval_precision_macro": 0.19505399860785297,
93
- "eval_precision_micro": 0.4075471698113208,
94
- "eval_recall_macro": 0.18462138174503467,
95
- "eval_recall_micro": 0.4075471698113208,
96
- "eval_runtime": 67.1009,
97
- "eval_samples_per_second": 15.797,
98
- "eval_steps_per_second": 3.949,
99
  "step": 100
100
  },
101
  {
102
- "epoch": 0.09,
103
  "learning_rate": 3e-05,
104
- "loss": 0.331,
105
  "step": 110
106
  },
107
  {
108
- "epoch": 0.1,
109
  "learning_rate": 3e-05,
110
- "loss": 0.0809,
111
  "step": 120
112
  },
113
  {
114
- "epoch": 0.1,
115
  "learning_rate": 3e-05,
116
- "loss": 0.0812,
117
  "step": 130
118
  },
119
  {
120
- "epoch": 0.11,
121
  "learning_rate": 3e-05,
122
- "loss": 0.0601,
123
  "step": 140
124
  },
125
  {
126
- "epoch": 0.12,
127
  "learning_rate": 3e-05,
128
- "loss": 0.066,
129
  "step": 150
130
  },
131
  {
132
- "epoch": 0.12,
133
- "eval_accuracy": 0.5386792452830189,
134
- "eval_f1_macro": 0.2780127225833117,
135
- "eval_f1_micro": 0.5386792452830189,
136
- "eval_loss": 0.10953618586063385,
137
- "eval_precision_macro": 0.3493311966119182,
138
- "eval_precision_micro": 0.5386792452830189,
139
- "eval_recall_macro": 0.2871523900319283,
140
- "eval_recall_micro": 0.5386792452830189,
141
- "eval_runtime": 67.0903,
142
- "eval_samples_per_second": 15.8,
143
- "eval_steps_per_second": 3.95,
144
  "step": 150
145
  },
146
  {
147
- "epoch": 0.13,
148
  "learning_rate": 3e-05,
149
- "loss": 0.2732,
150
  "step": 160
151
  },
152
  {
153
- "epoch": 0.14,
154
  "learning_rate": 3e-05,
155
- "loss": 0.0754,
156
  "step": 170
157
  },
158
  {
159
- "epoch": 0.14,
160
  "learning_rate": 3e-05,
161
- "loss": 0.0649,
162
  "step": 180
163
  },
164
  {
165
- "epoch": 0.15,
166
  "learning_rate": 3e-05,
167
- "loss": 0.0674,
168
  "step": 190
169
  },
170
  {
171
- "epoch": 0.16,
172
  "learning_rate": 3e-05,
173
- "loss": 0.0699,
174
  "step": 200
175
  },
176
  {
177
- "epoch": 0.16,
178
- "eval_accuracy": 0.620754716981132,
179
- "eval_f1_macro": 0.3797608124202816,
180
- "eval_f1_micro": 0.620754716981132,
181
- "eval_loss": 0.09009388834238052,
182
- "eval_precision_macro": 0.3837197141355178,
183
- "eval_precision_micro": 0.620754716981132,
184
- "eval_recall_macro": 0.39915842112719735,
185
- "eval_recall_micro": 0.620754716981132,
186
- "eval_runtime": 67.0671,
187
- "eval_samples_per_second": 15.805,
188
- "eval_steps_per_second": 3.951,
189
  "step": 200
190
  },
191
  {
192
- "epoch": 0.17,
193
  "learning_rate": 3e-05,
194
- "loss": 0.1946,
195
  "step": 210
196
  },
197
  {
198
- "epoch": 0.17,
199
  "learning_rate": 3e-05,
200
- "loss": 0.0657,
201
  "step": 220
202
  },
203
  {
204
- "epoch": 0.18,
205
  "learning_rate": 3e-05,
206
- "loss": 0.0547,
207
  "step": 230
208
  },
209
  {
210
- "epoch": 0.19,
211
  "learning_rate": 3e-05,
212
- "loss": 0.0615,
213
  "step": 240
214
  },
215
  {
216
- "epoch": 0.2,
217
  "learning_rate": 3e-05,
218
- "loss": 0.066,
219
  "step": 250
220
  },
221
  {
222
- "epoch": 0.2,
223
- "eval_accuracy": 0.6103773584905661,
224
- "eval_f1_macro": 0.41348516498355786,
225
- "eval_f1_micro": 0.6103773584905661,
226
- "eval_loss": 0.08832413703203201,
227
- "eval_precision_macro": 0.45439839834135715,
228
- "eval_precision_micro": 0.6103773584905661,
229
- "eval_recall_macro": 0.4312111435526721,
230
- "eval_recall_micro": 0.6103773584905661,
231
- "eval_runtime": 66.9345,
232
- "eval_samples_per_second": 15.836,
233
- "eval_steps_per_second": 3.959,
234
  "step": 250
235
  },
236
  {
237
- "epoch": 0.21,
238
  "learning_rate": 3e-05,
239
- "loss": 0.1494,
240
  "step": 260
241
  },
242
  {
243
- "epoch": 0.21,
244
  "learning_rate": 3e-05,
245
- "loss": 0.0655,
246
  "step": 270
247
  },
248
  {
249
- "epoch": 0.22,
250
  "learning_rate": 3e-05,
251
- "loss": 0.06,
252
  "step": 280
253
  },
254
  {
255
- "epoch": 0.23,
256
  "learning_rate": 3e-05,
257
- "loss": 0.0616,
258
  "step": 290
259
  },
260
  {
261
- "epoch": 0.24,
262
  "learning_rate": 3e-05,
263
- "loss": 0.0452,
264
  "step": 300
265
  },
266
  {
267
- "epoch": 0.24,
268
- "eval_accuracy": 0.6877358490566038,
269
- "eval_f1_macro": 0.5091555575082085,
270
- "eval_f1_micro": 0.6877358490566038,
271
- "eval_loss": 0.08789286762475967,
272
- "eval_precision_macro": 0.5649217974276287,
273
- "eval_precision_micro": 0.6877358490566038,
274
- "eval_recall_macro": 0.513496327466451,
275
- "eval_recall_micro": 0.6877358490566038,
276
- "eval_runtime": 67.1619,
277
- "eval_samples_per_second": 15.783,
278
- "eval_steps_per_second": 3.946,
279
  "step": 300
280
  },
281
  {
282
- "epoch": 0.25,
283
  "learning_rate": 3e-05,
284
- "loss": 0.1535,
285
  "step": 310
286
  },
287
  {
288
- "epoch": 0.25,
289
  "learning_rate": 3e-05,
290
- "loss": 0.0731,
291
  "step": 320
292
  },
293
  {
294
- "epoch": 0.26,
295
  "learning_rate": 3e-05,
296
- "loss": 0.044,
297
  "step": 330
298
  },
299
  {
300
- "epoch": 0.27,
301
  "learning_rate": 3e-05,
302
- "loss": 0.053,
303
  "step": 340
304
  },
305
  {
306
- "epoch": 0.28,
307
  "learning_rate": 3e-05,
308
- "loss": 0.0545,
309
  "step": 350
310
  },
311
  {
312
- "epoch": 0.28,
313
- "eval_accuracy": 0.6764150943396227,
314
- "eval_f1_macro": 0.503999030020007,
315
- "eval_f1_micro": 0.6764150943396227,
316
- "eval_loss": 0.07607663422822952,
317
- "eval_precision_macro": 0.5194445629359009,
318
- "eval_precision_micro": 0.6764150943396227,
319
- "eval_recall_macro": 0.5287937722322651,
320
- "eval_recall_micro": 0.6764150943396227,
321
- "eval_runtime": 67.2353,
322
- "eval_samples_per_second": 15.766,
323
- "eval_steps_per_second": 3.941,
324
  "step": 350
325
  },
326
  {
327
- "epoch": 0.29,
328
  "learning_rate": 3e-05,
329
- "loss": 0.1543,
330
  "step": 360
331
  },
332
  {
333
- "epoch": 0.29,
334
  "learning_rate": 3e-05,
335
- "loss": 0.0609,
336
  "step": 370
337
  },
338
  {
339
- "epoch": 0.3,
340
  "learning_rate": 3e-05,
341
- "loss": 0.0479,
342
  "step": 380
343
  },
344
  {
345
- "epoch": 0.31,
346
  "learning_rate": 3e-05,
347
- "loss": 0.0532,
348
  "step": 390
349
  },
350
  {
351
- "epoch": 0.32,
352
  "learning_rate": 3e-05,
353
- "loss": 0.0647,
354
  "step": 400
355
  },
356
  {
357
- "epoch": 0.32,
358
- "eval_accuracy": 0.7339622641509433,
359
- "eval_f1_macro": 0.5492932704438783,
360
- "eval_f1_micro": 0.7339622641509433,
361
- "eval_loss": 0.06653406471014023,
362
- "eval_precision_macro": 0.6193164476598846,
363
- "eval_precision_micro": 0.7339622641509433,
364
- "eval_recall_macro": 0.5252411264940735,
365
- "eval_recall_micro": 0.7339622641509433,
366
- "eval_runtime": 67.4334,
367
- "eval_samples_per_second": 15.719,
368
- "eval_steps_per_second": 3.93,
369
  "step": 400
370
  },
371
  {
372
- "epoch": 0.33,
373
  "learning_rate": 3e-05,
374
- "loss": 0.104,
375
  "step": 410
376
  },
377
  {
378
- "epoch": 0.33,
379
  "learning_rate": 3e-05,
380
- "loss": 0.0458,
381
  "step": 420
382
  },
383
  {
384
- "epoch": 0.34,
385
  "learning_rate": 3e-05,
386
- "loss": 0.0552,
387
  "step": 430
388
  },
389
  {
390
- "epoch": 0.35,
391
  "learning_rate": 3e-05,
392
- "loss": 0.0512,
393
  "step": 440
394
  },
395
  {
396
- "epoch": 0.36,
397
  "learning_rate": 3e-05,
398
- "loss": 0.056,
399
  "step": 450
400
  },
401
  {
402
- "epoch": 0.36,
403
- "eval_accuracy": 0.7396226415094339,
404
- "eval_f1_macro": 0.5671730153967399,
405
- "eval_f1_micro": 0.7396226415094339,
406
- "eval_loss": 0.05136344954371452,
407
- "eval_precision_macro": 0.6096698581228938,
408
- "eval_precision_micro": 0.7396226415094339,
409
- "eval_recall_macro": 0.5767264087198709,
410
- "eval_recall_micro": 0.7396226415094339,
411
- "eval_runtime": 66.8962,
412
- "eval_samples_per_second": 15.845,
413
- "eval_steps_per_second": 3.961,
414
  "step": 450
415
  },
416
  {
417
- "epoch": 0.37,
418
  "learning_rate": 3e-05,
419
- "loss": 0.0773,
420
  "step": 460
421
  },
422
  {
423
- "epoch": 0.37,
424
  "learning_rate": 3e-05,
425
- "loss": 0.0474,
426
  "step": 470
427
  },
428
  {
429
- "epoch": 0.38,
430
  "learning_rate": 3e-05,
431
- "loss": 0.0405,
432
  "step": 480
433
  },
434
  {
435
- "epoch": 0.39,
436
  "learning_rate": 3e-05,
437
- "loss": 0.0461,
438
  "step": 490
439
  },
440
  {
441
- "epoch": 0.4,
442
  "learning_rate": 3e-05,
443
- "loss": 0.0513,
444
  "step": 500
445
  },
446
  {
447
- "epoch": 0.4,
448
- "eval_accuracy": 0.7613207547169811,
449
- "eval_f1_macro": 0.601977568492687,
450
- "eval_f1_micro": 0.761320754716981,
451
- "eval_loss": 0.047934673726558685,
452
- "eval_precision_macro": 0.638418606498986,
453
- "eval_precision_micro": 0.7613207547169811,
454
- "eval_recall_macro": 0.6145296570629574,
455
- "eval_recall_micro": 0.7613207547169811,
456
- "eval_runtime": 66.9411,
457
- "eval_samples_per_second": 15.835,
458
- "eval_steps_per_second": 3.959,
459
  "step": 500
460
  },
461
  {
462
- "epoch": 0.41,
463
  "learning_rate": 3e-05,
464
- "loss": 0.0788,
465
  "step": 510
466
  },
467
  {
468
- "epoch": 0.41,
469
  "learning_rate": 3e-05,
470
- "loss": 0.0495,
471
  "step": 520
472
  },
473
  {
474
- "epoch": 0.42,
475
  "learning_rate": 3e-05,
476
- "loss": 0.0552,
477
  "step": 530
478
  },
479
  {
480
- "epoch": 0.43,
481
  "learning_rate": 3e-05,
482
- "loss": 0.0415,
483
  "step": 540
484
  },
485
  {
486
- "epoch": 0.44,
487
  "learning_rate": 3e-05,
488
- "loss": 0.0501,
489
  "step": 550
490
  },
491
  {
492
- "epoch": 0.44,
493
- "eval_accuracy": 0.7509433962264151,
494
- "eval_f1_macro": 0.6074975120648255,
495
- "eval_f1_micro": 0.7509433962264151,
496
- "eval_loss": 0.05019384250044823,
497
- "eval_precision_macro": 0.624502704252128,
498
- "eval_precision_micro": 0.7509433962264151,
499
- "eval_recall_macro": 0.6167049341328479,
500
- "eval_recall_micro": 0.7509433962264151,
501
- "eval_runtime": 67.3498,
502
- "eval_samples_per_second": 15.739,
503
- "eval_steps_per_second": 3.935,
504
  "step": 550
505
  },
506
  {
507
- "epoch": 0.45,
508
  "learning_rate": 3e-05,
509
- "loss": 0.0633,
510
  "step": 560
511
  },
512
  {
513
- "epoch": 0.45,
514
  "learning_rate": 3e-05,
515
- "loss": 0.0484,
516
  "step": 570
517
  },
518
  {
519
- "epoch": 0.46,
520
  "learning_rate": 3e-05,
521
- "loss": 0.0418,
522
  "step": 580
523
  },
524
  {
525
- "epoch": 0.47,
526
  "learning_rate": 3e-05,
527
- "loss": 0.0524,
528
  "step": 590
529
  },
530
  {
531
- "epoch": 0.48,
532
  "learning_rate": 3e-05,
533
- "loss": 0.0533,
534
  "step": 600
535
  },
536
  {
537
- "epoch": 0.48,
538
- "eval_accuracy": 0.7641509433962265,
539
- "eval_f1_macro": 0.607265930345707,
540
- "eval_f1_micro": 0.7641509433962265,
541
- "eval_loss": 0.048058342188596725,
542
- "eval_precision_macro": 0.6499724898555727,
543
- "eval_precision_micro": 0.7641509433962265,
544
- "eval_recall_macro": 0.6139175086252339,
545
- "eval_recall_micro": 0.7641509433962265,
546
- "eval_runtime": 66.897,
547
- "eval_samples_per_second": 15.845,
548
- "eval_steps_per_second": 3.961,
549
  "step": 600
550
  },
551
  {
552
- "epoch": 0.48,
553
  "learning_rate": 3e-05,
554
- "loss": 0.0418,
555
  "step": 610
556
  },
557
  {
558
- "epoch": 0.49,
559
  "learning_rate": 3e-05,
560
- "loss": 0.0482,
561
  "step": 620
562
  },
563
  {
564
- "epoch": 0.5,
565
  "learning_rate": 3e-05,
566
- "loss": 0.0458,
567
  "step": 630
568
  },
569
  {
570
- "epoch": 0.51,
571
  "learning_rate": 3e-05,
572
- "loss": 0.0432,
573
  "step": 640
574
  },
575
  {
576
- "epoch": 0.52,
577
  "learning_rate": 3e-05,
578
- "loss": 0.0462,
579
  "step": 650
580
  },
581
  {
582
- "epoch": 0.52,
583
- "eval_accuracy": 0.7481132075471698,
584
- "eval_f1_macro": 0.5679477471859753,
585
- "eval_f1_micro": 0.7481132075471698,
586
- "eval_loss": 0.047320980578660965,
587
- "eval_precision_macro": 0.5941670973495327,
588
- "eval_precision_micro": 0.7481132075471698,
589
- "eval_recall_macro": 0.5739727328111488,
590
- "eval_recall_micro": 0.7481132075471698,
591
- "eval_runtime": 67.2106,
592
- "eval_samples_per_second": 15.771,
593
- "eval_steps_per_second": 3.943,
594
  "step": 650
595
  },
596
  {
597
- "epoch": 0.52,
598
  "learning_rate": 3e-05,
599
- "loss": 0.0668,
600
  "step": 660
601
  },
602
  {
603
- "epoch": 0.53,
604
  "learning_rate": 3e-05,
605
- "loss": 0.0501,
606
  "step": 670
607
  },
608
  {
609
- "epoch": 0.54,
610
  "learning_rate": 3e-05,
611
- "loss": 0.0366,
612
  "step": 680
613
  },
614
  {
615
- "epoch": 0.55,
616
  "learning_rate": 3e-05,
617
- "loss": 0.0374,
618
  "step": 690
619
  },
620
  {
621
- "epoch": 0.56,
622
  "learning_rate": 3e-05,
623
- "loss": 0.0496,
624
  "step": 700
625
  },
626
  {
627
- "epoch": 0.56,
628
- "eval_accuracy": 0.7971698113207547,
629
- "eval_f1_macro": 0.6517694520426227,
630
- "eval_f1_micro": 0.7971698113207546,
631
- "eval_loss": 0.04193812981247902,
632
- "eval_precision_macro": 0.6678204026981202,
633
- "eval_precision_micro": 0.7971698113207547,
634
- "eval_recall_macro": 0.6480125227888868,
635
- "eval_recall_micro": 0.7971698113207547,
636
- "eval_runtime": 67.3982,
637
- "eval_samples_per_second": 15.727,
638
- "eval_steps_per_second": 3.932,
639
  "step": 700
640
  },
641
  {
642
- "epoch": 0.56,
643
  "learning_rate": 3e-05,
644
- "loss": 0.0649,
645
  "step": 710
646
  },
647
  {
648
- "epoch": 0.57,
649
  "learning_rate": 3e-05,
650
- "loss": 0.0447,
651
  "step": 720
652
  },
653
  {
654
- "epoch": 0.58,
655
  "learning_rate": 3e-05,
656
- "loss": 0.0442,
657
  "step": 730
658
  },
659
  {
660
- "epoch": 0.59,
661
  "learning_rate": 3e-05,
662
- "loss": 0.037,
663
  "step": 740
664
  },
665
  {
666
- "epoch": 0.6,
667
  "learning_rate": 3e-05,
668
- "loss": 0.0614,
669
  "step": 750
670
  },
671
  {
672
- "epoch": 0.6,
673
- "eval_accuracy": 0.7773584905660378,
674
- "eval_f1_macro": 0.6308119664331103,
675
- "eval_f1_micro": 0.7773584905660378,
676
- "eval_loss": 0.04885416477918625,
677
- "eval_precision_macro": 0.6677975283624125,
678
- "eval_precision_micro": 0.7773584905660378,
679
- "eval_recall_macro": 0.6360471775658058,
680
- "eval_recall_micro": 0.7773584905660378,
681
- "eval_runtime": 67.7832,
682
- "eval_samples_per_second": 15.638,
683
- "eval_steps_per_second": 3.91,
684
  "step": 750
685
  },
686
  {
687
- "epoch": 0.6,
688
  "learning_rate": 3e-05,
689
- "loss": 0.0649,
690
  "step": 760
691
  },
692
  {
693
- "epoch": 0.61,
694
  "learning_rate": 3e-05,
695
- "loss": 0.0426,
696
  "step": 770
697
  },
698
  {
699
- "epoch": 0.62,
700
  "learning_rate": 3e-05,
701
- "loss": 0.0347,
702
  "step": 780
703
  },
704
  {
705
- "epoch": 0.63,
706
  "learning_rate": 3e-05,
707
- "loss": 0.0414,
708
  "step": 790
709
  },
710
  {
711
- "epoch": 0.64,
712
  "learning_rate": 3e-05,
713
- "loss": 0.0468,
714
  "step": 800
715
  },
716
  {
717
- "epoch": 0.64,
718
- "eval_accuracy": 0.7830188679245284,
719
- "eval_f1_macro": 0.6493890925237205,
720
- "eval_f1_micro": 0.7830188679245284,
721
- "eval_loss": 0.044340912252664566,
722
- "eval_precision_macro": 0.6435014283226803,
723
- "eval_precision_micro": 0.7830188679245284,
724
- "eval_recall_macro": 0.6816157451405587,
725
- "eval_recall_micro": 0.7830188679245284,
726
- "eval_runtime": 67.2351,
727
- "eval_samples_per_second": 15.766,
728
- "eval_steps_per_second": 3.941,
729
  "step": 800
730
  },
731
  {
732
- "epoch": 0.64,
733
  "learning_rate": 3e-05,
734
- "loss": 0.052,
735
  "step": 810
736
  },
737
  {
738
- "epoch": 0.65,
739
  "learning_rate": 3e-05,
740
- "loss": 0.0414,
741
  "step": 820
742
  },
743
  {
744
- "epoch": 0.66,
745
  "learning_rate": 3e-05,
746
- "loss": 0.0342,
747
  "step": 830
748
  },
749
  {
750
- "epoch": 0.67,
751
  "learning_rate": 3e-05,
752
- "loss": 0.0451,
753
  "step": 840
754
  },
755
  {
756
- "epoch": 0.68,
757
  "learning_rate": 3e-05,
758
- "loss": 0.0477,
759
  "step": 850
760
  },
761
  {
762
- "epoch": 0.68,
763
- "eval_accuracy": 0.7971698113207547,
764
- "eval_f1_macro": 0.6662808099368048,
765
- "eval_f1_micro": 0.7971698113207546,
766
- "eval_loss": 0.041995830833911896,
767
- "eval_precision_macro": 0.7040157648486967,
768
- "eval_precision_micro": 0.7971698113207547,
769
- "eval_recall_macro": 0.6567342355863813,
770
- "eval_recall_micro": 0.7971698113207547,
771
- "eval_runtime": 67.3249,
772
- "eval_samples_per_second": 15.745,
773
- "eval_steps_per_second": 3.936,
774
  "step": 850
775
  },
776
  {
777
- "epoch": 0.68,
778
  "learning_rate": 3e-05,
779
- "loss": 0.0468,
780
  "step": 860
781
  },
782
  {
783
- "epoch": 0.69,
784
  "learning_rate": 3e-05,
785
- "loss": 0.0461,
786
  "step": 870
787
  },
788
  {
789
- "epoch": 0.7,
790
  "learning_rate": 3e-05,
791
- "loss": 0.0436,
792
  "step": 880
793
  },
794
  {
795
- "epoch": 0.71,
796
  "learning_rate": 3e-05,
797
- "loss": 0.0369,
798
  "step": 890
799
  },
800
  {
801
- "epoch": 0.72,
802
  "learning_rate": 3e-05,
803
- "loss": 0.0519,
804
  "step": 900
805
  },
806
  {
807
- "epoch": 0.72,
808
- "eval_accuracy": 0.7632075471698113,
809
- "eval_f1_macro": 0.6291599323302522,
810
- "eval_f1_micro": 0.7632075471698113,
811
- "eval_loss": 0.04627140238881111,
812
- "eval_precision_macro": 0.6519385252086033,
813
- "eval_precision_micro": 0.7632075471698113,
814
- "eval_recall_macro": 0.6290591814696965,
815
- "eval_recall_micro": 0.7632075471698113,
816
- "eval_runtime": 67.0228,
817
- "eval_samples_per_second": 15.816,
818
- "eval_steps_per_second": 3.954,
819
  "step": 900
820
  },
821
  {
822
- "epoch": 0.72,
823
  "learning_rate": 3e-05,
824
- "loss": 0.0543,
825
  "step": 910
826
  },
827
  {
828
- "epoch": 0.73,
829
  "learning_rate": 3e-05,
830
- "loss": 0.0426,
831
  "step": 920
832
  },
833
  {
834
- "epoch": 0.74,
835
  "learning_rate": 3e-05,
836
- "loss": 0.0421,
837
  "step": 930
838
  },
839
  {
840
- "epoch": 0.75,
841
  "learning_rate": 3e-05,
842
- "loss": 0.0338,
843
  "step": 940
844
  },
845
  {
846
- "epoch": 0.76,
847
  "learning_rate": 3e-05,
848
- "loss": 0.0453,
849
  "step": 950
850
  },
851
  {
852
- "epoch": 0.76,
853
- "eval_accuracy": 0.780188679245283,
854
- "eval_f1_macro": 0.6564187596520696,
855
- "eval_f1_micro": 0.780188679245283,
856
- "eval_loss": 0.042860858142375946,
857
- "eval_precision_macro": 0.67574812222591,
858
- "eval_precision_micro": 0.780188679245283,
859
- "eval_recall_macro": 0.6697872775950671,
860
- "eval_recall_micro": 0.780188679245283,
861
- "eval_runtime": 67.3483,
862
- "eval_samples_per_second": 15.739,
863
- "eval_steps_per_second": 3.935,
864
  "step": 950
865
  },
866
  {
867
- "epoch": 0.76,
868
  "learning_rate": 3e-05,
869
- "loss": 0.0554,
870
  "step": 960
871
  },
872
  {
873
- "epoch": 0.77,
874
  "learning_rate": 3e-05,
875
- "loss": 0.0397,
876
  "step": 970
877
  },
878
  {
879
- "epoch": 0.78,
880
  "learning_rate": 3e-05,
881
- "loss": 0.0407,
882
  "step": 980
883
  },
884
  {
885
- "epoch": 0.79,
886
  "learning_rate": 3e-05,
887
- "loss": 0.0361,
888
  "step": 990
889
  },
890
  {
891
- "epoch": 0.79,
892
  "learning_rate": 3e-05,
893
- "loss": 0.0452,
894
  "step": 1000
895
  },
896
  {
897
- "epoch": 0.79,
898
- "eval_accuracy": 0.7377358490566037,
899
- "eval_f1_macro": 0.6049285124615932,
900
- "eval_f1_micro": 0.7377358490566037,
901
- "eval_loss": 0.047125279903411865,
902
- "eval_precision_macro": 0.6181852032037266,
903
- "eval_precision_micro": 0.7377358490566037,
904
- "eval_recall_macro": 0.6300074429793591,
905
- "eval_recall_micro": 0.7377358490566037,
906
- "eval_runtime": 66.8035,
907
- "eval_samples_per_second": 15.867,
908
- "eval_steps_per_second": 3.967,
909
  "step": 1000
910
  },
911
  {
912
- "epoch": 0.8,
913
  "learning_rate": 3e-05,
914
- "loss": 0.0482,
915
  "step": 1010
916
  },
917
  {
918
- "epoch": 0.81,
919
  "learning_rate": 3e-05,
920
- "loss": 0.0379,
921
  "step": 1020
922
  },
923
  {
924
- "epoch": 0.82,
925
  "learning_rate": 3e-05,
926
- "loss": 0.0403,
927
  "step": 1030
928
  },
929
  {
930
- "epoch": 0.83,
931
  "learning_rate": 3e-05,
932
- "loss": 0.0471,
933
  "step": 1040
934
  },
935
  {
936
- "epoch": 0.83,
937
  "learning_rate": 3e-05,
938
- "loss": 0.0367,
939
  "step": 1050
940
  },
941
  {
942
- "epoch": 0.83,
943
- "eval_accuracy": 0.7981132075471699,
944
- "eval_f1_macro": 0.6800660818700823,
945
- "eval_f1_micro": 0.79811320754717,
946
- "eval_loss": 0.03875497728586197,
947
- "eval_precision_macro": 0.6856812225733196,
948
- "eval_precision_micro": 0.7981132075471699,
949
- "eval_recall_macro": 0.6992476720564776,
950
- "eval_recall_micro": 0.7981132075471699,
951
- "eval_runtime": 66.8444,
952
- "eval_samples_per_second": 15.858,
953
- "eval_steps_per_second": 3.964,
954
  "step": 1050
955
  },
956
  {
957
- "epoch": 0.84,
958
  "learning_rate": 3e-05,
959
- "loss": 0.0351,
960
  "step": 1060
961
  },
962
  {
963
- "epoch": 0.85,
964
  "learning_rate": 3e-05,
965
- "loss": 0.0479,
966
  "step": 1070
967
  },
968
  {
969
- "epoch": 0.86,
970
  "learning_rate": 3e-05,
971
- "loss": 0.0421,
972
  "step": 1080
973
  },
974
  {
975
- "epoch": 0.87,
976
  "learning_rate": 3e-05,
977
- "loss": 0.0406,
978
  "step": 1090
979
  },
980
  {
981
- "epoch": 0.87,
982
  "learning_rate": 3e-05,
983
- "loss": 0.0377,
984
  "step": 1100
985
  },
986
  {
987
- "epoch": 0.87,
988
- "eval_accuracy": 0.8,
989
- "eval_f1_macro": 0.6590911576508658,
990
- "eval_f1_micro": 0.8000000000000002,
991
- "eval_loss": 0.03815627098083496,
992
- "eval_precision_macro": 0.6636349851737382,
993
- "eval_precision_micro": 0.8,
994
- "eval_recall_macro": 0.6697553358712118,
995
- "eval_recall_micro": 0.8,
996
- "eval_runtime": 66.9434,
997
- "eval_samples_per_second": 15.834,
998
- "eval_steps_per_second": 3.959,
999
  "step": 1100
1000
  },
1001
  {
1002
- "epoch": 0.88,
1003
  "learning_rate": 3e-05,
1004
- "loss": 0.0365,
1005
  "step": 1110
1006
  },
1007
  {
1008
- "epoch": 0.89,
1009
  "learning_rate": 3e-05,
1010
- "loss": 0.0353,
1011
  "step": 1120
1012
  },
1013
  {
1014
- "epoch": 0.9,
1015
  "learning_rate": 3e-05,
1016
- "loss": 0.0388,
1017
  "step": 1130
1018
  },
1019
  {
1020
- "epoch": 0.91,
1021
  "learning_rate": 3e-05,
1022
- "loss": 0.0358,
1023
  "step": 1140
1024
  },
1025
  {
1026
- "epoch": 0.91,
1027
  "learning_rate": 3e-05,
1028
- "loss": 0.0429,
1029
  "step": 1150
1030
  },
1031
  {
1032
- "epoch": 0.91,
1033
- "eval_accuracy": 0.7952830188679245,
1034
- "eval_f1_macro": 0.6465609013784224,
1035
- "eval_f1_micro": 0.7952830188679245,
1036
- "eval_loss": 0.03976297378540039,
1037
- "eval_precision_macro": 0.6923924758215005,
1038
- "eval_precision_micro": 0.7952830188679245,
1039
- "eval_recall_macro": 0.6441492192889419,
1040
- "eval_recall_micro": 0.7952830188679245,
1041
- "eval_runtime": 67.1705,
1042
- "eval_samples_per_second": 15.781,
1043
- "eval_steps_per_second": 3.945,
1044
  "step": 1150
1045
  },
1046
  {
1047
- "epoch": 0.92,
1048
  "learning_rate": 3e-05,
1049
- "loss": 0.0461,
1050
  "step": 1160
1051
  },
1052
  {
1053
- "epoch": 0.93,
1054
  "learning_rate": 3e-05,
1055
- "loss": 0.0434,
1056
  "step": 1170
1057
  },
1058
  {
1059
- "epoch": 0.94,
1060
  "learning_rate": 3e-05,
1061
- "loss": 0.0524,
1062
  "step": 1180
1063
  },
1064
  {
1065
- "epoch": 0.95,
1066
  "learning_rate": 3e-05,
1067
- "loss": 0.0362,
1068
  "step": 1190
1069
  },
1070
  {
1071
- "epoch": 0.95,
1072
  "learning_rate": 3e-05,
1073
- "loss": 0.0451,
1074
  "step": 1200
1075
  },
1076
  {
1077
- "epoch": 0.95,
1078
- "eval_accuracy": 0.7943396226415095,
1079
- "eval_f1_macro": 0.6535399936575059,
1080
- "eval_f1_micro": 0.7943396226415095,
1081
- "eval_loss": 0.037755727767944336,
1082
- "eval_precision_macro": 0.6712905678869693,
1083
- "eval_precision_micro": 0.7943396226415095,
1084
- "eval_recall_macro": 0.6537773538776073,
1085
- "eval_recall_micro": 0.7943396226415095,
1086
- "eval_runtime": 66.9611,
1087
- "eval_samples_per_second": 15.83,
1088
- "eval_steps_per_second": 3.958,
1089
  "step": 1200
1090
  },
1091
  {
1092
- "epoch": 0.96,
1093
  "learning_rate": 3e-05,
1094
- "loss": 0.0456,
1095
  "step": 1210
1096
  },
1097
  {
1098
- "epoch": 0.97,
1099
  "learning_rate": 3e-05,
1100
- "loss": 0.0455,
1101
  "step": 1220
1102
  },
1103
  {
1104
- "epoch": 0.98,
1105
  "learning_rate": 3e-05,
1106
- "loss": 0.0409,
1107
  "step": 1230
1108
  },
1109
  {
1110
- "epoch": 0.99,
1111
  "learning_rate": 3e-05,
1112
- "loss": 0.037,
1113
  "step": 1240
1114
  },
1115
  {
1116
- "epoch": 0.99,
1117
  "learning_rate": 3e-05,
1118
- "loss": 0.0347,
1119
  "step": 1250
1120
  },
1121
  {
1122
- "epoch": 0.99,
1123
- "eval_accuracy": 0.7839622641509434,
1124
- "eval_f1_macro": 0.6330944207402169,
1125
- "eval_f1_micro": 0.7839622641509434,
1126
- "eval_loss": 0.041340529918670654,
1127
- "eval_precision_macro": 0.6735372413807635,
1128
- "eval_precision_micro": 0.7839622641509434,
1129
- "eval_recall_macro": 0.6450299050285588,
1130
- "eval_recall_micro": 0.7839622641509434,
1131
- "eval_runtime": 66.9053,
1132
- "eval_samples_per_second": 15.843,
1133
- "eval_steps_per_second": 3.961,
1134
  "step": 1250
1135
  },
1136
  {
1137
- "epoch": 1.0,
1138
  "learning_rate": 3e-05,
1139
- "loss": 0.0421,
1140
  "step": 1260
1141
  },
1142
  {
1143
- "epoch": 1.01,
1144
  "learning_rate": 3e-05,
1145
- "loss": 0.041,
1146
  "step": 1270
1147
  },
1148
  {
1149
- "epoch": 1.02,
1150
  "learning_rate": 3e-05,
1151
- "loss": 0.033,
1152
  "step": 1280
1153
  },
1154
  {
1155
- "epoch": 1.03,
1156
  "learning_rate": 3e-05,
1157
- "loss": 0.036,
1158
  "step": 1290
1159
  },
1160
  {
1161
- "epoch": 1.03,
1162
  "learning_rate": 3e-05,
1163
- "loss": 0.0378,
1164
  "step": 1300
1165
  },
1166
  {
1167
- "epoch": 1.03,
1168
- "eval_accuracy": 0.8047169811320755,
1169
- "eval_f1_macro": 0.6488791804614907,
1170
- "eval_f1_micro": 0.8047169811320755,
1171
- "eval_loss": 0.037683386355638504,
1172
- "eval_precision_macro": 0.7109359814450084,
1173
- "eval_precision_micro": 0.8047169811320755,
1174
- "eval_recall_macro": 0.6387082579227776,
1175
- "eval_recall_micro": 0.8047169811320755,
1176
- "eval_runtime": 67.3206,
1177
- "eval_samples_per_second": 15.746,
1178
- "eval_steps_per_second": 3.936,
1179
  "step": 1300
1180
  },
1181
  {
1182
- "epoch": 1.04,
1183
  "learning_rate": 3e-05,
1184
- "loss": 0.0343,
1185
  "step": 1310
1186
  },
1187
  {
1188
- "epoch": 1.05,
1189
  "learning_rate": 3e-05,
1190
- "loss": 0.0321,
1191
  "step": 1320
1192
  },
1193
  {
1194
- "epoch": 1.06,
1195
  "learning_rate": 3e-05,
1196
- "loss": 0.031,
1197
  "step": 1330
1198
  },
1199
  {
1200
- "epoch": 1.06,
1201
  "learning_rate": 3e-05,
1202
- "loss": 0.039,
1203
  "step": 1340
1204
  },
1205
  {
1206
- "epoch": 1.07,
1207
  "learning_rate": 3e-05,
1208
- "loss": 0.0357,
1209
  "step": 1350
1210
  },
1211
  {
1212
- "epoch": 1.07,
1213
- "eval_accuracy": 0.8028301886792453,
1214
- "eval_f1_macro": 0.6648963473667772,
1215
- "eval_f1_micro": 0.8028301886792453,
1216
- "eval_loss": 0.03860827535390854,
1217
- "eval_precision_macro": 0.6898539099210392,
1218
- "eval_precision_micro": 0.8028301886792453,
1219
- "eval_recall_macro": 0.6558796396655843,
1220
- "eval_recall_micro": 0.8028301886792453,
1221
- "eval_runtime": 67.0656,
1222
- "eval_samples_per_second": 15.805,
1223
- "eval_steps_per_second": 3.951,
1224
  "step": 1350
1225
  },
1226
  {
1227
- "epoch": 1.08,
1228
  "learning_rate": 3e-05,
1229
- "loss": 0.0445,
1230
  "step": 1360
1231
  },
1232
  {
1233
- "epoch": 1.09,
1234
  "learning_rate": 3e-05,
1235
- "loss": 0.0375,
1236
  "step": 1370
1237
  },
1238
  {
1239
- "epoch": 1.1,
1240
  "learning_rate": 3e-05,
1241
- "loss": 0.0375,
1242
  "step": 1380
1243
  },
1244
  {
1245
- "epoch": 1.1,
1246
  "learning_rate": 3e-05,
1247
- "loss": 0.0333,
1248
  "step": 1390
1249
  },
1250
  {
1251
- "epoch": 1.11,
1252
  "learning_rate": 3e-05,
1253
- "loss": 0.0418,
1254
  "step": 1400
1255
  },
1256
  {
1257
- "epoch": 1.11,
1258
- "eval_accuracy": 0.7962264150943397,
1259
- "eval_f1_macro": 0.6910242491250081,
1260
- "eval_f1_micro": 0.7962264150943396,
1261
- "eval_loss": 0.0368194542825222,
1262
- "eval_precision_macro": 0.7114033533579757,
1263
- "eval_precision_micro": 0.7962264150943397,
1264
- "eval_recall_macro": 0.6942176996685531,
1265
- "eval_recall_micro": 0.7962264150943397,
1266
- "eval_runtime": 66.8832,
1267
- "eval_samples_per_second": 15.849,
1268
- "eval_steps_per_second": 3.962,
1269
  "step": 1400
1270
  },
1271
  {
1272
- "epoch": 1.12,
1273
  "learning_rate": 3e-05,
1274
- "loss": 0.0414,
1275
  "step": 1410
1276
  },
1277
  {
1278
- "epoch": 1.13,
1279
  "learning_rate": 3e-05,
1280
- "loss": 0.0357,
1281
  "step": 1420
1282
  },
1283
  {
1284
- "epoch": 1.14,
1285
  "learning_rate": 3e-05,
1286
- "loss": 0.0272,
1287
  "step": 1430
1288
  },
1289
  {
1290
- "epoch": 1.14,
1291
  "learning_rate": 3e-05,
1292
- "loss": 0.0323,
1293
  "step": 1440
1294
  },
1295
  {
1296
- "epoch": 1.15,
1297
  "learning_rate": 3e-05,
1298
- "loss": 0.0293,
1299
  "step": 1450
1300
  },
1301
  {
1302
- "epoch": 1.15,
1303
- "eval_accuracy": 0.8141509433962264,
1304
- "eval_f1_macro": 0.7097996478763092,
1305
- "eval_f1_micro": 0.8141509433962264,
1306
- "eval_loss": 0.035770244896411896,
1307
- "eval_precision_macro": 0.7222302630120379,
1308
- "eval_precision_micro": 0.8141509433962264,
1309
- "eval_recall_macro": 0.7125706602249756,
1310
- "eval_recall_micro": 0.8141509433962264,
1311
- "eval_runtime": 67.0694,
1312
- "eval_samples_per_second": 15.805,
1313
- "eval_steps_per_second": 3.951,
1314
  "step": 1450
1315
  },
1316
  {
1317
- "epoch": 1.15,
1318
- "step": 1450,
1319
- "total_flos": 3.612646182806976e+17,
1320
- "train_loss": 0.07879953698865298,
1321
- "train_runtime": 5948.326,
1322
- "train_samples_per_second": 3.9,
1323
- "train_steps_per_second": 0.244
1324
  }
1325
  ],
1326
  "logging_steps": 10,
1327
- "max_steps": 1450,
1328
  "num_input_tokens_seen": 0,
1329
  "num_train_epochs": 2,
1330
- "save_steps": 250,
1331
- "total_flos": 3.612646182806976e+17,
1332
  "train_batch_size": 4,
1333
  "trial_name": null,
1334
  "trial_params": null
 
1
  {
2
  "best_metric": null,
3
  "best_model_checkpoint": null,
4
+ "epoch": 2.0,
5
  "eval_steps": 50,
6
+ "global_step": 1524,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
 
11
  {
12
  "epoch": 0.01,
13
  "learning_rate": 3e-05,
14
+ "loss": 1.5105,
15
  "step": 10
16
  },
17
  {
18
+ "epoch": 0.03,
19
  "learning_rate": 3e-05,
20
+ "loss": 1.3015,
21
  "step": 20
22
  },
23
  {
24
+ "epoch": 0.04,
25
  "learning_rate": 3e-05,
26
+ "loss": 1.2713,
27
  "step": 30
28
  },
29
  {
30
+ "epoch": 0.05,
31
  "learning_rate": 3e-05,
32
+ "loss": 1.2762,
33
  "step": 40
34
  },
35
  {
36
+ "epoch": 0.07,
37
  "learning_rate": 3e-05,
38
+ "loss": 1.2324,
39
  "step": 50
40
  },
41
  {
42
+ "epoch": 0.07,
43
+ "eval_loss": 1.237317442893982,
44
+ "eval_runtime": 19.2113,
45
+ "eval_samples_per_second": 25.662,
46
+ "eval_steps_per_second": 1.614,
47
  "step": 50
48
  },
49
  {
50
+ "epoch": 0.08,
51
  "learning_rate": 3e-05,
52
+ "loss": 1.2385,
53
  "step": 60
54
  },
55
  {
56
+ "epoch": 0.09,
57
  "learning_rate": 3e-05,
58
+ "loss": 1.2294,
59
  "step": 70
60
  },
61
  {
62
+ "epoch": 0.1,
63
  "learning_rate": 3e-05,
64
+ "loss": 1.2065,
65
  "step": 80
66
  },
67
  {
68
+ "epoch": 0.12,
69
  "learning_rate": 3e-05,
70
+ "loss": 1.2171,
71
  "step": 90
72
  },
73
  {
74
+ "epoch": 0.13,
75
  "learning_rate": 3e-05,
76
+ "loss": 1.2114,
77
  "step": 100
78
  },
79
  {
80
+ "epoch": 0.13,
81
+ "eval_loss": 1.2198712825775146,
82
+ "eval_runtime": 19.1971,
83
+ "eval_samples_per_second": 25.681,
84
+ "eval_steps_per_second": 1.615,
85
  "step": 100
86
  },
87
  {
88
+ "epoch": 0.14,
89
  "learning_rate": 3e-05,
90
+ "loss": 1.1593,
91
  "step": 110
92
  },
93
  {
94
+ "epoch": 0.16,
95
  "learning_rate": 3e-05,
96
+ "loss": 1.1865,
97
  "step": 120
98
  },
99
  {
100
+ "epoch": 0.17,
101
  "learning_rate": 3e-05,
102
+ "loss": 1.1765,
103
  "step": 130
104
  },
105
  {
106
+ "epoch": 0.18,
107
  "learning_rate": 3e-05,
108
+ "loss": 1.2079,
109
  "step": 140
110
  },
111
  {
112
+ "epoch": 0.2,
113
  "learning_rate": 3e-05,
114
+ "loss": 1.1831,
115
  "step": 150
116
  },
117
  {
118
+ "epoch": 0.2,
119
+ "eval_loss": 1.2110768556594849,
120
+ "eval_runtime": 19.4082,
121
+ "eval_samples_per_second": 25.402,
122
+ "eval_steps_per_second": 1.597,
123
  "step": 150
124
  },
125
  {
126
+ "epoch": 0.21,
127
  "learning_rate": 3e-05,
128
+ "loss": 1.2075,
129
  "step": 160
130
  },
131
  {
132
+ "epoch": 0.22,
133
  "learning_rate": 3e-05,
134
+ "loss": 1.2124,
135
  "step": 170
136
  },
137
  {
138
+ "epoch": 0.24,
139
  "learning_rate": 3e-05,
140
+ "loss": 1.2023,
141
  "step": 180
142
  },
143
  {
144
+ "epoch": 0.25,
145
  "learning_rate": 3e-05,
146
+ "loss": 1.1721,
147
  "step": 190
148
  },
149
  {
150
+ "epoch": 0.26,
151
  "learning_rate": 3e-05,
152
+ "loss": 1.2027,
153
  "step": 200
154
  },
155
  {
156
+ "epoch": 0.26,
157
+ "eval_loss": 1.2048192024230957,
158
+ "eval_runtime": 19.3037,
159
+ "eval_samples_per_second": 25.539,
160
+ "eval_steps_per_second": 1.606,
161
  "step": 200
162
  },
163
  {
164
+ "epoch": 0.28,
165
  "learning_rate": 3e-05,
166
+ "loss": 1.1674,
167
  "step": 210
168
  },
169
  {
170
+ "epoch": 0.29,
171
  "learning_rate": 3e-05,
172
+ "loss": 1.1882,
173
  "step": 220
174
  },
175
  {
176
+ "epoch": 0.3,
177
  "learning_rate": 3e-05,
178
+ "loss": 1.2099,
179
  "step": 230
180
  },
181
  {
182
+ "epoch": 0.31,
183
  "learning_rate": 3e-05,
184
+ "loss": 1.1988,
185
  "step": 240
186
  },
187
  {
188
+ "epoch": 0.33,
189
  "learning_rate": 3e-05,
190
+ "loss": 1.1827,
191
  "step": 250
192
  },
193
  {
194
+ "epoch": 0.33,
195
+ "eval_loss": 1.2000844478607178,
196
+ "eval_runtime": 19.2851,
197
+ "eval_samples_per_second": 25.564,
198
+ "eval_steps_per_second": 1.607,
199
  "step": 250
200
  },
201
  {
202
+ "epoch": 0.34,
203
  "learning_rate": 3e-05,
204
+ "loss": 1.1452,
205
  "step": 260
206
  },
207
  {
208
+ "epoch": 0.35,
209
  "learning_rate": 3e-05,
210
+ "loss": 1.185,
211
  "step": 270
212
  },
213
  {
214
+ "epoch": 0.37,
215
  "learning_rate": 3e-05,
216
+ "loss": 1.1979,
217
  "step": 280
218
  },
219
  {
220
+ "epoch": 0.38,
221
  "learning_rate": 3e-05,
222
+ "loss": 1.2155,
223
  "step": 290
224
  },
225
  {
226
+ "epoch": 0.39,
227
  "learning_rate": 3e-05,
228
+ "loss": 1.1696,
229
  "step": 300
230
  },
231
  {
232
+ "epoch": 0.39,
233
+ "eval_loss": 1.1973345279693604,
234
+ "eval_runtime": 18.7078,
235
+ "eval_samples_per_second": 26.353,
236
+ "eval_steps_per_second": 1.657,
237
  "step": 300
238
  },
239
  {
240
+ "epoch": 0.41,
241
  "learning_rate": 3e-05,
242
+ "loss": 1.1426,
243
  "step": 310
244
  },
245
  {
246
+ "epoch": 0.42,
247
  "learning_rate": 3e-05,
248
+ "loss": 1.1691,
249
  "step": 320
250
  },
251
  {
252
+ "epoch": 0.43,
253
  "learning_rate": 3e-05,
254
+ "loss": 1.1991,
255
  "step": 330
256
  },
257
  {
258
+ "epoch": 0.45,
259
  "learning_rate": 3e-05,
260
+ "loss": 1.1992,
261
  "step": 340
262
  },
263
  {
264
+ "epoch": 0.46,
265
  "learning_rate": 3e-05,
266
+ "loss": 1.2186,
267
  "step": 350
268
  },
269
  {
270
+ "epoch": 0.46,
271
+ "eval_loss": 1.193804383277893,
272
+ "eval_runtime": 18.8972,
273
+ "eval_samples_per_second": 26.088,
274
+ "eval_steps_per_second": 1.64,
275
  "step": 350
276
  },
277
  {
278
+ "epoch": 0.47,
279
  "learning_rate": 3e-05,
280
+ "loss": 1.1691,
281
  "step": 360
282
  },
283
  {
284
+ "epoch": 0.49,
285
  "learning_rate": 3e-05,
286
+ "loss": 1.1595,
287
  "step": 370
288
  },
289
  {
290
+ "epoch": 0.5,
291
  "learning_rate": 3e-05,
292
+ "loss": 1.1494,
293
  "step": 380
294
  },
295
  {
296
+ "epoch": 0.51,
297
  "learning_rate": 3e-05,
298
+ "loss": 1.1985,
299
  "step": 390
300
  },
301
  {
302
+ "epoch": 0.52,
303
  "learning_rate": 3e-05,
304
+ "loss": 1.1795,
305
  "step": 400
306
  },
307
  {
308
+ "epoch": 0.52,
309
+ "eval_loss": 1.1919257640838623,
310
+ "eval_runtime": 19.3777,
311
+ "eval_samples_per_second": 25.442,
312
+ "eval_steps_per_second": 1.6,
313
  "step": 400
314
  },
315
  {
316
+ "epoch": 0.54,
317
  "learning_rate": 3e-05,
318
+ "loss": 1.1254,
319
  "step": 410
320
  },
321
  {
322
+ "epoch": 0.55,
323
  "learning_rate": 3e-05,
324
+ "loss": 1.1772,
325
  "step": 420
326
  },
327
  {
328
+ "epoch": 0.56,
329
  "learning_rate": 3e-05,
330
+ "loss": 1.1956,
331
  "step": 430
332
  },
333
  {
334
+ "epoch": 0.58,
335
  "learning_rate": 3e-05,
336
+ "loss": 1.1959,
337
  "step": 440
338
  },
339
  {
340
+ "epoch": 0.59,
341
  "learning_rate": 3e-05,
342
+ "loss": 1.2167,
343
  "step": 450
344
  },
345
  {
346
+ "epoch": 0.59,
347
+ "eval_loss": 1.188421607017517,
348
+ "eval_runtime": 18.7028,
349
+ "eval_samples_per_second": 26.36,
350
+ "eval_steps_per_second": 1.658,
351
  "step": 450
352
  },
353
  {
354
+ "epoch": 0.6,
355
  "learning_rate": 3e-05,
356
+ "loss": 1.1625,
357
  "step": 460
358
  },
359
  {
360
+ "epoch": 0.62,
361
  "learning_rate": 3e-05,
362
+ "loss": 1.1979,
363
  "step": 470
364
  },
365
  {
366
+ "epoch": 0.63,
367
  "learning_rate": 3e-05,
368
+ "loss": 1.1705,
369
  "step": 480
370
  },
371
  {
372
+ "epoch": 0.64,
373
  "learning_rate": 3e-05,
374
+ "loss": 1.1998,
375
  "step": 490
376
  },
377
  {
378
+ "epoch": 0.66,
379
  "learning_rate": 3e-05,
380
+ "loss": 1.1992,
381
  "step": 500
382
  },
383
  {
384
+ "epoch": 0.66,
385
+ "eval_loss": 1.1840450763702393,
386
+ "eval_runtime": 19.5434,
387
+ "eval_samples_per_second": 25.226,
388
+ "eval_steps_per_second": 1.586,
389
  "step": 500
390
  },
391
  {
392
+ "epoch": 0.67,
393
  "learning_rate": 3e-05,
394
+ "loss": 1.1842,
395
  "step": 510
396
  },
397
  {
398
+ "epoch": 0.68,
399
  "learning_rate": 3e-05,
400
+ "loss": 1.1598,
401
  "step": 520
402
  },
403
  {
404
+ "epoch": 0.7,
405
  "learning_rate": 3e-05,
406
+ "loss": 1.1538,
407
  "step": 530
408
  },
409
  {
410
+ "epoch": 0.71,
411
  "learning_rate": 3e-05,
412
+ "loss": 1.1506,
413
  "step": 540
414
  },
415
  {
416
+ "epoch": 0.72,
417
  "learning_rate": 3e-05,
418
+ "loss": 1.2032,
419
  "step": 550
420
  },
421
  {
422
+ "epoch": 0.72,
423
+ "eval_loss": 1.1824493408203125,
424
+ "eval_runtime": 18.4972,
425
+ "eval_samples_per_second": 26.653,
426
+ "eval_steps_per_second": 1.676,
427
  "step": 550
428
  },
429
  {
430
+ "epoch": 0.73,
431
  "learning_rate": 3e-05,
432
+ "loss": 1.1795,
433
  "step": 560
434
  },
435
  {
436
+ "epoch": 0.75,
437
  "learning_rate": 3e-05,
438
+ "loss": 1.1604,
439
  "step": 570
440
  },
441
  {
442
+ "epoch": 0.76,
443
  "learning_rate": 3e-05,
444
+ "loss": 1.1548,
445
  "step": 580
446
  },
447
  {
448
+ "epoch": 0.77,
449
  "learning_rate": 3e-05,
450
+ "loss": 1.1876,
451
  "step": 590
452
  },
453
  {
454
+ "epoch": 0.79,
455
  "learning_rate": 3e-05,
456
+ "loss": 1.1841,
457
  "step": 600
458
  },
459
  {
460
+ "epoch": 0.79,
461
+ "eval_loss": 1.1797986030578613,
462
+ "eval_runtime": 19.5627,
463
+ "eval_samples_per_second": 25.201,
464
+ "eval_steps_per_second": 1.585,
465
  "step": 600
466
  },
467
  {
468
+ "epoch": 0.8,
469
  "learning_rate": 3e-05,
470
+ "loss": 1.1579,
471
  "step": 610
472
  },
473
  {
474
+ "epoch": 0.81,
475
  "learning_rate": 3e-05,
476
+ "loss": 1.1858,
477
  "step": 620
478
  },
479
  {
480
+ "epoch": 0.83,
481
  "learning_rate": 3e-05,
482
+ "loss": 1.1994,
483
  "step": 630
484
  },
485
  {
486
+ "epoch": 0.84,
487
  "learning_rate": 3e-05,
488
+ "loss": 1.1712,
489
  "step": 640
490
  },
491
  {
492
+ "epoch": 0.85,
493
  "learning_rate": 3e-05,
494
+ "loss": 1.166,
495
  "step": 650
496
  },
497
  {
498
+ "epoch": 0.85,
499
+ "eval_loss": 1.1789214611053467,
500
+ "eval_runtime": 19.1568,
501
+ "eval_samples_per_second": 25.735,
502
+ "eval_steps_per_second": 1.618,
503
  "step": 650
504
  },
505
  {
506
+ "epoch": 0.87,
507
  "learning_rate": 3e-05,
508
+ "loss": 1.1426,
509
  "step": 660
510
  },
511
  {
512
+ "epoch": 0.88,
513
  "learning_rate": 3e-05,
514
+ "loss": 1.1291,
515
  "step": 670
516
  },
517
  {
518
+ "epoch": 0.89,
519
  "learning_rate": 3e-05,
520
+ "loss": 1.1825,
521
  "step": 680
522
  },
523
  {
524
+ "epoch": 0.91,
525
  "learning_rate": 3e-05,
526
+ "loss": 1.1214,
527
  "step": 690
528
  },
529
  {
530
+ "epoch": 0.92,
531
  "learning_rate": 3e-05,
532
+ "loss": 1.1641,
533
  "step": 700
534
  },
535
  {
536
+ "epoch": 0.92,
537
+ "eval_loss": 1.1761133670806885,
538
+ "eval_runtime": 18.1966,
539
+ "eval_samples_per_second": 27.093,
540
+ "eval_steps_per_second": 1.704,
541
  "step": 700
542
  },
543
  {
544
+ "epoch": 0.93,
545
  "learning_rate": 3e-05,
546
+ "loss": 1.1069,
547
  "step": 710
548
  },
549
  {
550
+ "epoch": 0.94,
551
  "learning_rate": 3e-05,
552
+ "loss": 1.1267,
553
  "step": 720
554
  },
555
  {
556
+ "epoch": 0.96,
557
  "learning_rate": 3e-05,
558
+ "loss": 1.1472,
559
  "step": 730
560
  },
561
  {
562
+ "epoch": 0.97,
563
  "learning_rate": 3e-05,
564
+ "loss": 1.2204,
565
  "step": 740
566
  },
567
  {
568
+ "epoch": 0.98,
569
  "learning_rate": 3e-05,
570
+ "loss": 1.1859,
571
  "step": 750
572
  },
573
  {
574
+ "epoch": 0.98,
575
+ "eval_loss": 1.1751502752304077,
576
+ "eval_runtime": 18.7143,
577
+ "eval_samples_per_second": 26.343,
578
+ "eval_steps_per_second": 1.656,
579
  "step": 750
580
  },
581
  {
582
+ "epoch": 1.0,
583
  "learning_rate": 3e-05,
584
+ "loss": 1.1314,
585
  "step": 760
586
  },
587
  {
588
+ "epoch": 1.01,
589
  "learning_rate": 3e-05,
590
+ "loss": 1.12,
591
  "step": 770
592
  },
593
  {
594
+ "epoch": 1.02,
595
  "learning_rate": 3e-05,
596
+ "loss": 1.1007,
597
  "step": 780
598
  },
599
  {
600
+ "epoch": 1.04,
601
  "learning_rate": 3e-05,
602
+ "loss": 1.0822,
603
  "step": 790
604
  },
605
  {
606
+ "epoch": 1.05,
607
  "learning_rate": 3e-05,
608
+ "loss": 1.132,
609
  "step": 800
610
  },
611
  {
612
+ "epoch": 1.05,
613
+ "eval_loss": 1.1736373901367188,
614
+ "eval_runtime": 19.2149,
615
+ "eval_samples_per_second": 25.657,
616
+ "eval_steps_per_second": 1.613,
617
  "step": 800
618
  },
619
  {
620
+ "epoch": 1.06,
621
  "learning_rate": 3e-05,
622
+ "loss": 1.1076,
623
  "step": 810
624
  },
625
  {
626
+ "epoch": 1.08,
627
  "learning_rate": 3e-05,
628
+ "loss": 1.1007,
629
  "step": 820
630
  },
631
  {
632
+ "epoch": 1.09,
633
  "learning_rate": 3e-05,
634
+ "loss": 1.1215,
635
  "step": 830
636
  },
637
  {
638
+ "epoch": 1.1,
639
  "learning_rate": 3e-05,
640
+ "loss": 1.0956,
641
  "step": 840
642
  },
643
  {
644
+ "epoch": 1.12,
645
  "learning_rate": 3e-05,
646
+ "loss": 1.1461,
647
  "step": 850
648
  },
649
  {
650
+ "epoch": 1.12,
651
+ "eval_loss": 1.1723910570144653,
652
+ "eval_runtime": 18.6093,
653
+ "eval_samples_per_second": 26.492,
654
+ "eval_steps_per_second": 1.666,
655
  "step": 850
656
  },
657
  {
658
+ "epoch": 1.13,
659
  "learning_rate": 3e-05,
660
+ "loss": 1.0818,
661
  "step": 860
662
  },
663
  {
664
+ "epoch": 1.14,
665
  "learning_rate": 3e-05,
666
+ "loss": 1.0959,
667
  "step": 870
668
  },
669
  {
670
+ "epoch": 1.15,
671
  "learning_rate": 3e-05,
672
+ "loss": 1.0948,
673
  "step": 880
674
  },
675
  {
676
+ "epoch": 1.17,
677
  "learning_rate": 3e-05,
678
+ "loss": 1.1246,
679
  "step": 890
680
  },
681
  {
682
+ "epoch": 1.18,
683
  "learning_rate": 3e-05,
684
+ "loss": 1.0965,
685
  "step": 900
686
  },
687
  {
688
+ "epoch": 1.18,
689
+ "eval_loss": 1.172638177871704,
690
+ "eval_runtime": 18.6729,
691
+ "eval_samples_per_second": 26.402,
692
+ "eval_steps_per_second": 1.66,
693
  "step": 900
694
  },
695
  {
696
+ "epoch": 1.19,
697
  "learning_rate": 3e-05,
698
+ "loss": 1.114,
699
  "step": 910
700
  },
701
  {
702
+ "epoch": 1.21,
703
  "learning_rate": 3e-05,
704
+ "loss": 1.1152,
705
  "step": 920
706
  },
707
  {
708
+ "epoch": 1.22,
709
  "learning_rate": 3e-05,
710
+ "loss": 1.0885,
711
  "step": 930
712
  },
713
  {
714
+ "epoch": 1.23,
715
  "learning_rate": 3e-05,
716
+ "loss": 1.1221,
717
  "step": 940
718
  },
719
  {
720
+ "epoch": 1.25,
721
  "learning_rate": 3e-05,
722
+ "loss": 1.1064,
723
  "step": 950
724
  },
725
  {
726
+ "epoch": 1.25,
727
+ "eval_loss": 1.172351598739624,
728
+ "eval_runtime": 19.5348,
729
+ "eval_samples_per_second": 25.237,
730
+ "eval_steps_per_second": 1.587,
731
  "step": 950
732
  },
733
  {
734
+ "epoch": 1.26,
735
  "learning_rate": 3e-05,
736
+ "loss": 1.0518,
737
  "step": 960
738
  },
739
  {
740
+ "epoch": 1.27,
741
  "learning_rate": 3e-05,
742
+ "loss": 1.0938,
743
  "step": 970
744
  },
745
  {
746
+ "epoch": 1.29,
747
  "learning_rate": 3e-05,
748
+ "loss": 1.1184,
749
  "step": 980
750
  },
751
  {
752
+ "epoch": 1.3,
753
  "learning_rate": 3e-05,
754
+ "loss": 1.09,
755
  "step": 990
756
  },
757
  {
758
+ "epoch": 1.31,
759
  "learning_rate": 3e-05,
760
+ "loss": 1.123,
761
  "step": 1000
762
  },
763
  {
764
+ "epoch": 1.31,
765
+ "eval_loss": 1.1728639602661133,
766
+ "eval_runtime": 18.8741,
767
+ "eval_samples_per_second": 26.12,
768
+ "eval_steps_per_second": 1.642,
769
  "step": 1000
770
  },
771
  {
772
+ "epoch": 1.33,
773
  "learning_rate": 3e-05,
774
+ "loss": 1.1059,
775
  "step": 1010
776
  },
777
  {
778
+ "epoch": 1.34,
779
  "learning_rate": 3e-05,
780
+ "loss": 1.1061,
781
  "step": 1020
782
  },
783
  {
784
+ "epoch": 1.35,
785
  "learning_rate": 3e-05,
786
+ "loss": 1.1147,
787
  "step": 1030
788
  },
789
  {
790
+ "epoch": 1.36,
791
  "learning_rate": 3e-05,
792
+ "loss": 1.1322,
793
  "step": 1040
794
  },
795
  {
796
+ "epoch": 1.38,
797
  "learning_rate": 3e-05,
798
+ "loss": 1.1079,
799
  "step": 1050
800
  },
801
  {
802
+ "epoch": 1.38,
803
+ "eval_loss": 1.1694797277450562,
804
+ "eval_runtime": 18.842,
805
+ "eval_samples_per_second": 26.165,
806
+ "eval_steps_per_second": 1.645,
807
  "step": 1050
808
  },
809
  {
810
+ "epoch": 1.39,
811
  "learning_rate": 3e-05,
812
+ "loss": 1.0826,
813
  "step": 1060
814
  },
815
  {
816
+ "epoch": 1.4,
817
  "learning_rate": 3e-05,
818
+ "loss": 1.1194,
819
  "step": 1070
820
  },
821
  {
822
+ "epoch": 1.42,
823
  "learning_rate": 3e-05,
824
+ "loss": 1.1398,
825
  "step": 1080
826
  },
827
  {
828
+ "epoch": 1.43,
829
  "learning_rate": 3e-05,
830
+ "loss": 1.11,
831
  "step": 1090
832
  },
833
  {
834
+ "epoch": 1.44,
835
  "learning_rate": 3e-05,
836
+ "loss": 1.12,
837
  "step": 1100
838
  },
839
  {
840
+ "epoch": 1.44,
841
+ "eval_loss": 1.1707435846328735,
842
+ "eval_runtime": 19.0362,
843
+ "eval_samples_per_second": 25.898,
844
+ "eval_steps_per_second": 1.628,
845
  "step": 1100
846
  },
847
  {
848
+ "epoch": 1.46,
849
  "learning_rate": 3e-05,
850
+ "loss": 1.0891,
851
  "step": 1110
852
  },
853
  {
854
+ "epoch": 1.47,
855
  "learning_rate": 3e-05,
856
+ "loss": 1.1216,
857
  "step": 1120
858
  },
859
  {
860
+ "epoch": 1.48,
861
  "learning_rate": 3e-05,
862
+ "loss": 1.1122,
863
  "step": 1130
864
  },
865
  {
866
+ "epoch": 1.5,
867
  "learning_rate": 3e-05,
868
+ "loss": 1.1065,
869
  "step": 1140
870
  },
871
  {
872
+ "epoch": 1.51,
873
  "learning_rate": 3e-05,
874
+ "loss": 1.1288,
875
  "step": 1150
876
  },
877
  {
878
+ "epoch": 1.51,
879
+ "eval_loss": 1.1693464517593384,
880
+ "eval_runtime": 19.1181,
881
+ "eval_samples_per_second": 25.787,
882
+ "eval_steps_per_second": 1.622,
883
  "step": 1150
884
  },
885
  {
886
+ "epoch": 1.52,
887
  "learning_rate": 3e-05,
888
+ "loss": 1.1145,
889
  "step": 1160
890
  },
891
  {
892
+ "epoch": 1.54,
893
  "learning_rate": 3e-05,
894
+ "loss": 1.0812,
895
  "step": 1170
896
  },
897
  {
898
+ "epoch": 1.55,
899
  "learning_rate": 3e-05,
900
+ "loss": 1.1291,
901
  "step": 1180
902
  },
903
  {
904
+ "epoch": 1.56,
905
  "learning_rate": 3e-05,
906
+ "loss": 1.1114,
907
  "step": 1190
908
  },
909
  {
910
+ "epoch": 1.57,
911
  "learning_rate": 3e-05,
912
+ "loss": 1.133,
913
  "step": 1200
914
  },
915
  {
916
+ "epoch": 1.57,
917
+ "eval_loss": 1.1675716638565063,
918
+ "eval_runtime": 19.1116,
919
+ "eval_samples_per_second": 25.796,
920
+ "eval_steps_per_second": 1.622,
921
  "step": 1200
922
  },
923
  {
924
+ "epoch": 1.59,
925
  "learning_rate": 3e-05,
926
+ "loss": 1.0918,
927
  "step": 1210
928
  },
929
  {
930
+ "epoch": 1.6,
931
  "learning_rate": 3e-05,
932
+ "loss": 1.1009,
933
  "step": 1220
934
  },
935
  {
936
+ "epoch": 1.61,
937
  "learning_rate": 3e-05,
938
+ "loss": 1.1279,
939
  "step": 1230
940
  },
941
  {
942
+ "epoch": 1.63,
943
  "learning_rate": 3e-05,
944
+ "loss": 1.1314,
945
  "step": 1240
946
  },
947
  {
948
+ "epoch": 1.64,
949
  "learning_rate": 3e-05,
950
+ "loss": 1.1647,
951
  "step": 1250
952
  },
953
  {
954
+ "epoch": 1.64,
955
+ "eval_loss": 1.1693305969238281,
956
+ "eval_runtime": 18.9258,
957
+ "eval_samples_per_second": 26.049,
958
+ "eval_steps_per_second": 1.638,
959
  "step": 1250
960
  },
961
  {
962
+ "epoch": 1.65,
963
  "learning_rate": 3e-05,
964
+ "loss": 1.0633,
965
  "step": 1260
966
  },
967
  {
968
+ "epoch": 1.67,
969
  "learning_rate": 3e-05,
970
+ "loss": 1.0961,
971
  "step": 1270
972
  },
973
  {
974
+ "epoch": 1.68,
975
  "learning_rate": 3e-05,
976
+ "loss": 1.1106,
977
  "step": 1280
978
  },
979
  {
980
+ "epoch": 1.69,
981
  "learning_rate": 3e-05,
982
+ "loss": 1.1233,
983
  "step": 1290
984
  },
985
  {
986
+ "epoch": 1.71,
987
  "learning_rate": 3e-05,
988
+ "loss": 1.1269,
989
  "step": 1300
990
  },
991
  {
992
+ "epoch": 1.71,
993
+ "eval_loss": 1.1658315658569336,
994
+ "eval_runtime": 18.972,
995
+ "eval_samples_per_second": 25.986,
996
+ "eval_steps_per_second": 1.634,
997
  "step": 1300
998
  },
999
  {
1000
+ "epoch": 1.72,
1001
  "learning_rate": 3e-05,
1002
+ "loss": 1.0683,
1003
  "step": 1310
1004
  },
1005
  {
1006
+ "epoch": 1.73,
1007
  "learning_rate": 3e-05,
1008
+ "loss": 1.1079,
1009
  "step": 1320
1010
  },
1011
  {
1012
+ "epoch": 1.75,
1013
  "learning_rate": 3e-05,
1014
+ "loss": 1.1367,
1015
  "step": 1330
1016
  },
1017
  {
1018
+ "epoch": 1.76,
1019
  "learning_rate": 3e-05,
1020
+ "loss": 1.1077,
1021
  "step": 1340
1022
  },
1023
  {
1024
+ "epoch": 1.77,
1025
  "learning_rate": 3e-05,
1026
+ "loss": 1.1332,
1027
  "step": 1350
1028
  },
1029
  {
1030
+ "epoch": 1.77,
1031
+ "eval_loss": 1.1656816005706787,
1032
+ "eval_runtime": 19.0244,
1033
+ "eval_samples_per_second": 25.914,
1034
+ "eval_steps_per_second": 1.629,
1035
  "step": 1350
1036
  },
1037
  {
1038
+ "epoch": 1.78,
1039
  "learning_rate": 3e-05,
1040
+ "loss": 1.0921,
1041
  "step": 1360
1042
  },
1043
  {
1044
+ "epoch": 1.8,
1045
  "learning_rate": 3e-05,
1046
+ "loss": 1.0669,
1047
  "step": 1370
1048
  },
1049
  {
1050
+ "epoch": 1.81,
1051
  "learning_rate": 3e-05,
1052
+ "loss": 1.1185,
1053
  "step": 1380
1054
  },
1055
  {
1056
+ "epoch": 1.82,
1057
  "learning_rate": 3e-05,
1058
+ "loss": 1.108,
1059
  "step": 1390
1060
  },
1061
  {
1062
+ "epoch": 1.84,
1063
  "learning_rate": 3e-05,
1064
+ "loss": 1.1276,
1065
  "step": 1400
1066
  },
1067
  {
1068
+ "epoch": 1.84,
1069
+ "eval_loss": 1.1681002378463745,
1070
+ "eval_runtime": 18.4913,
1071
+ "eval_samples_per_second": 26.661,
1072
+ "eval_steps_per_second": 1.676,
1073
  "step": 1400
1074
  },
1075
  {
1076
+ "epoch": 1.85,
1077
  "learning_rate": 3e-05,
1078
+ "loss": 1.0666,
1079
  "step": 1410
1080
  },
1081
  {
1082
+ "epoch": 1.86,
1083
  "learning_rate": 3e-05,
1084
+ "loss": 1.1286,
1085
  "step": 1420
1086
  },
1087
  {
1088
+ "epoch": 1.88,
1089
  "learning_rate": 3e-05,
1090
+ "loss": 1.1286,
1091
  "step": 1430
1092
  },
1093
  {
1094
+ "epoch": 1.89,
1095
  "learning_rate": 3e-05,
1096
+ "loss": 1.0967,
1097
  "step": 1440
1098
  },
1099
  {
1100
+ "epoch": 1.9,
1101
  "learning_rate": 3e-05,
1102
+ "loss": 1.1361,
1103
  "step": 1450
1104
  },
1105
  {
1106
+ "epoch": 1.9,
1107
+ "eval_loss": 1.1633367538452148,
1108
+ "eval_runtime": 18.9278,
1109
+ "eval_samples_per_second": 26.046,
1110
+ "eval_steps_per_second": 1.638,
1111
  "step": 1450
1112
  },
1113
  {
1114
+ "epoch": 1.92,
1115
+ "learning_rate": 3e-05,
1116
+ "loss": 1.0907,
1117
+ "step": 1460
1118
+ },
1119
+ {
1120
+ "epoch": 1.93,
1121
+ "learning_rate": 3e-05,
1122
+ "loss": 1.1137,
1123
+ "step": 1470
1124
+ },
1125
+ {
1126
+ "epoch": 1.94,
1127
+ "learning_rate": 3e-05,
1128
+ "loss": 1.125,
1129
+ "step": 1480
1130
+ },
1131
+ {
1132
+ "epoch": 1.96,
1133
+ "learning_rate": 3e-05,
1134
+ "loss": 1.1047,
1135
+ "step": 1490
1136
+ },
1137
+ {
1138
+ "epoch": 1.97,
1139
+ "learning_rate": 3e-05,
1140
+ "loss": 1.1205,
1141
+ "step": 1500
1142
+ },
1143
+ {
1144
+ "epoch": 1.97,
1145
+ "eval_loss": 1.1639732122421265,
1146
+ "eval_runtime": 19.1908,
1147
+ "eval_samples_per_second": 25.689,
1148
+ "eval_steps_per_second": 1.615,
1149
+ "step": 1500
1150
+ },
1151
+ {
1152
+ "epoch": 1.98,
1153
+ "learning_rate": 3e-05,
1154
+ "loss": 1.0902,
1155
+ "step": 1510
1156
+ },
1157
+ {
1158
+ "epoch": 1.99,
1159
+ "learning_rate": 3e-05,
1160
+ "loss": 1.0794,
1161
+ "step": 1520
1162
+ },
1163
+ {
1164
+ "epoch": 2.0,
1165
+ "step": 1524,
1166
+ "total_flos": 2.2001611985471406e+18,
1167
+ "train_loss": 1.1488912840840697,
1168
+ "train_runtime": 10306.9039,
1169
+ "train_samples_per_second": 9.463,
1170
+ "train_steps_per_second": 0.148
1171
  }
1172
  ],
1173
  "logging_steps": 10,
1174
+ "max_steps": 1524,
1175
  "num_input_tokens_seen": 0,
1176
  "num_train_epochs": 2,
1177
+ "save_steps": 100,
1178
+ "total_flos": 2.2001611985471406e+18,
1179
  "train_batch_size": 4,
1180
  "trial_name": null,
1181
  "trial_params": null