Matthew Finlayson committed on
Commit
b39dcfe
1 Parent(s): 47268a8

adding model

README.md ADDED
@@ -0,0 +1,238 @@
+ ---
+ license: apache-2.0
+ tags:
+ - generated_from_trainer
+ metrics:
+ - accuracy
+ model-index:
+ - name: output
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # output
+
+ This model is a fine-tuned version of [EleutherAI/gpt-neo-2.7B](https://huggingface.co/EleutherAI/gpt-neo-2.7B) on the [Lila dataset](https://github.com/allenai/Lila).
+ It achieves the following results on the evaluation set:
+ - Loss: 0.5884
+ - Accuracy: 0.8664
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-05
+ - train_batch_size: 4
+ - eval_batch_size: 4
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 2
+ - total_train_batch_size: 8
+ - total_eval_batch_size: 8
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - num_epochs: 10.0
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Accuracy |
+ |:-------------:|:-----:|:-----:|:---------------:|:--------:|
+ | No log | 0.06 | 100 | 0.7930 | 0.8214 |
+ | No log | 0.11 | 200 | 0.7544 | 0.8290 |
+ | No log | 0.17 | 300 | 0.7358 | 0.8328 |
+ | No log | 0.23 | 400 | 0.7192 | 0.8357 |
+ | 0.8156 | 0.28 | 500 | 0.7012 | 0.8397 |
+ | 0.8156 | 0.34 | 600 | 0.6904 | 0.8419 |
+ | 0.8156 | 0.4 | 700 | 0.6802 | 0.8440 |
+ | 0.8156 | 0.45 | 800 | 0.6670 | 0.8465 |
+ | 0.8156 | 0.51 | 900 | 0.6572 | 0.8486 |
+ | 0.7219 | 0.57 | 1000 | 0.6499 | 0.8500 |
+ | 0.7219 | 0.62 | 1100 | 0.6411 | 0.8522 |
+ | 0.7219 | 0.68 | 1200 | 0.6343 | 0.8537 |
+ | 0.7219 | 0.74 | 1300 | 0.6299 | 0.8546 |
+ | 0.7219 | 0.79 | 1400 | 0.6221 | 0.8561 |
+ | 0.662 | 0.85 | 1500 | 0.6157 | 0.8574 |
+ | 0.662 | 0.91 | 1600 | 0.6138 | 0.8579 |
+ | 0.662 | 0.96 | 1700 | 0.6055 | 0.8595 |
+ | 0.662 | 1.02 | 1800 | 0.6143 | 0.8598 |
+ | 0.662 | 1.08 | 1900 | 0.6191 | 0.8599 |
+ | 0.5707 | 1.14 | 2000 | 0.6118 | 0.8607 |
+ | 0.5707 | 1.19 | 2100 | 0.6123 | 0.8611 |
+ | 0.5707 | 1.25 | 2200 | 0.6089 | 0.8617 |
+ | 0.5707 | 1.31 | 2300 | 0.6064 | 0.8619 |
+ | 0.5707 | 1.36 | 2400 | 0.6079 | 0.8625 |
+ | 0.4923 | 1.42 | 2500 | 0.6040 | 0.8625 |
+ | 0.4923 | 1.48 | 2600 | 0.6030 | 0.8630 |
+ | 0.4923 | 1.53 | 2700 | 0.6021 | 0.8636 |
+ | 0.4923 | 1.59 | 2800 | 0.6001 | 0.8643 |
+ | 0.4923 | 1.65 | 2900 | 0.5981 | 0.8644 |
+ | 0.4909 | 1.7 | 3000 | 0.5942 | 0.8648 |
+ | 0.4909 | 1.76 | 3100 | 0.5918 | 0.8650 |
+ | 0.4909 | 1.82 | 3200 | 0.5923 | 0.8659 |
+ | 0.4909 | 1.87 | 3300 | 0.5884 | 0.8664 |
+ | 0.4909 | 1.93 | 3400 | 0.5884 | 0.8663 |
+ | 0.4964 | 1.99 | 3500 | 0.5903 | 0.8669 |
+ | 0.4964 | 2.04 | 3600 | 0.6421 | 0.8655 |
+ | 0.4964 | 2.1 | 3700 | 0.6401 | 0.8651 |
+ | 0.4964 | 2.16 | 3800 | 0.6411 | 0.8649 |
+ | 0.4964 | 2.21 | 3900 | 0.6387 | 0.8645 |
+ | 0.345 | 2.27 | 4000 | 0.6362 | 0.8654 |
+ | 0.345 | 2.33 | 4100 | 0.6362 | 0.8654 |
+ | 0.345 | 2.38 | 4200 | 0.6362 | 0.8654 |
+ | 0.345 | 2.44 | 4300 | 0.6357 | 0.8655 |
+ | 0.345 | 2.5 | 4400 | 0.6362 | 0.8656 |
+ | 0.3463 | 2.55 | 4500 | 0.6377 | 0.8658 |
+ | 0.3463 | 2.61 | 4600 | 0.6357 | 0.8660 |
+ | 0.3463 | 2.67 | 4700 | 0.6294 | 0.8665 |
+ | 0.3463 | 2.72 | 4800 | 0.6333 | 0.8665 |
+ | 0.3463 | 2.78 | 4900 | 0.6362 | 0.8662 |
+ | 0.3508 | 2.84 | 5000 | 0.6357 | 0.8666 |
+ | 0.3508 | 2.89 | 5100 | 0.6299 | 0.8673 |
+ | 0.3508 | 2.95 | 5200 | 0.6313 | 0.8668 |
+ | 0.3508 | 3.01 | 5300 | 0.7188 | 0.8646 |
+ | 0.3508 | 3.06 | 5400 | 0.7017 | 0.8656 |
+ | 0.295 | 3.12 | 5500 | 0.6982 | 0.8653 |
+ | 0.295 | 3.18 | 5600 | 0.7031 | 0.8655 |
+ | 0.295 | 3.23 | 5700 | 0.6992 | 0.8651 |
+ | 0.295 | 3.29 | 5800 | 0.6997 | 0.8653 |
+ | 0.295 | 3.35 | 5900 | 0.7041 | 0.8651 |
+ | 0.2348 | 3.41 | 6000 | 0.7075 | 0.8649 |
+ | 0.2348 | 3.46 | 6100 | 0.6992 | 0.8650 |
+ | 0.2348 | 3.52 | 6200 | 0.7065 | 0.8647 |
+ | 0.2348 | 3.58 | 6300 | 0.6997 | 0.8652 |
+ | 0.2348 | 3.63 | 6400 | 0.7026 | 0.8651 |
+ | 0.2411 | 3.69 | 6500 | 0.7046 | 0.8656 |
+ | 0.2411 | 3.75 | 6600 | 0.7007 | 0.8655 |
+ | 0.2411 | 3.8 | 6700 | 0.7026 | 0.8651 |
+ | 0.2411 | 3.86 | 6800 | 0.7031 | 0.8655 |
+ | 0.2411 | 3.92 | 6900 | 0.7012 | 0.8658 |
+ | 0.251 | 3.97 | 7000 | 0.7051 | 0.8656 |
+ | 0.251 | 4.03 | 7100 | 0.7607 | 0.8650 |
+ | 0.251 | 4.09 | 7200 | 0.7632 | 0.8656 |
+ | 0.251 | 4.14 | 7300 | 0.7588 | 0.8655 |
+ | 0.251 | 4.2 | 7400 | 0.7578 | 0.8651 |
+ | 0.1797 | 4.26 | 7500 | 0.7710 | 0.8645 |
+ | 0.1797 | 4.31 | 7600 | 0.7627 | 0.8648 |
+ | 0.1797 | 4.37 | 7700 | 0.7583 | 0.8650 |
+ | 0.1797 | 4.43 | 7800 | 0.7646 | 0.8649 |
+ | 0.1797 | 4.48 | 7900 | 0.7598 | 0.8646 |
+ | 0.1784 | 4.54 | 8000 | 0.7656 | 0.8650 |
+ | 0.1784 | 4.6 | 8100 | 0.7617 | 0.8648 |
+ | 0.1784 | 4.65 | 8200 | 0.7573 | 0.8651 |
+ | 0.1784 | 4.71 | 8300 | 0.7671 | 0.8648 |
+ | 0.1784 | 4.77 | 8400 | 0.7563 | 0.8651 |
+ | 0.1827 | 4.82 | 8500 | 0.7651 | 0.8649 |
+ | 0.1827 | 4.88 | 8600 | 0.7637 | 0.8650 |
+ | 0.1827 | 4.94 | 8700 | 0.7607 | 0.8654 |
+ | 0.1827 | 4.99 | 8800 | 0.7607 | 0.8650 |
+ | 0.1827 | 5.05 | 8900 | 0.8149 | 0.8646 |
+ | 0.167 | 5.11 | 9000 | 0.8081 | 0.8648 |
+ | 0.167 | 5.16 | 9100 | 0.8184 | 0.8644 |
+ | 0.167 | 5.22 | 9200 | 0.8140 | 0.8647 |
+ | 0.167 | 5.28 | 9300 | 0.8169 | 0.8644 |
+ | 0.167 | 5.33 | 9400 | 0.8120 | 0.8645 |
+ | 0.1371 | 5.39 | 9500 | 0.8154 | 0.8643 |
+ | 0.1371 | 5.45 | 9600 | 0.8179 | 0.8642 |
+ | 0.1371 | 5.51 | 9700 | 0.8154 | 0.8643 |
+ | 0.1371 | 5.56 | 9800 | 0.8120 | 0.8645 |
+ | 0.1371 | 5.62 | 9900 | 0.8110 | 0.8650 |
+ | 0.1425 | 5.68 | 10000 | 0.8159 | 0.8645 |
+ | 0.1425 | 5.73 | 10100 | 0.8174 | 0.8646 |
+ | 0.1425 | 5.79 | 10200 | 0.8159 | 0.8649 |
+ | 0.1425 | 5.85 | 10300 | 0.8110 | 0.8639 |
+ | 0.1425 | 5.9 | 10400 | 0.8135 | 0.8645 |
+ | 0.1505 | 5.96 | 10500 | 0.8140 | 0.8642 |
+ | 0.1505 | 6.02 | 10600 | 0.8628 | 0.8640 |
+ | 0.1505 | 6.07 | 10700 | 0.8540 | 0.8644 |
+ | 0.1505 | 6.13 | 10800 | 0.8530 | 0.8642 |
+ | 0.1505 | 6.19 | 10900 | 0.8560 | 0.8647 |
+ | 0.1086 | 6.24 | 11000 | 0.8555 | 0.8649 |
+ | 0.1086 | 6.3 | 11100 | 0.8604 | 0.8644 |
+ | 0.1086 | 6.36 | 11200 | 0.8569 | 0.8642 |
+ | 0.1086 | 6.41 | 11300 | 0.8530 | 0.8639 |
+ | 0.1086 | 6.47 | 11400 | 0.8589 | 0.8643 |
+ | 0.1076 | 6.53 | 11500 | 0.8525 | 0.8639 |
+ | 0.1076 | 6.58 | 11600 | 0.8579 | 0.8640 |
+ | 0.1076 | 6.64 | 11700 | 0.8594 | 0.8640 |
+ | 0.1076 | 6.7 | 11800 | 0.8599 | 0.8643 |
+ | 0.1076 | 6.75 | 11900 | 0.8564 | 0.8640 |
+ | 0.1109 | 6.81 | 12000 | 0.8633 | 0.8640 |
+ | 0.1109 | 6.87 | 12100 | 0.8584 | 0.8638 |
+ | 0.1109 | 6.92 | 12200 | 0.8647 | 0.8636 |
+ | 0.1109 | 6.98 | 12300 | 0.8599 | 0.8635 |
+ | 0.1109 | 7.04 | 12400 | 0.8979 | 0.8632 |
+ | 0.1028 | 7.09 | 12500 | 0.8936 | 0.8635 |
+ | 0.1028 | 7.15 | 12600 | 0.9043 | 0.8637 |
+ | 0.1028 | 7.21 | 12700 | 0.8989 | 0.8642 |
+ | 0.1028 | 7.26 | 12800 | 0.8936 | 0.8642 |
+ | 0.1028 | 7.32 | 12900 | 0.8921 | 0.8641 |
+ | 0.0774 | 7.38 | 13000 | 0.8955 | 0.8634 |
+ | 0.0774 | 7.43 | 13100 | 0.8950 | 0.8636 |
+ | 0.0774 | 7.49 | 13200 | 0.8994 | 0.8635 |
+ | 0.0774 | 7.55 | 13300 | 0.8999 | 0.8635 |
+ | 0.0774 | 7.6 | 13400 | 0.8936 | 0.8631 |
+ | 0.0852 | 7.66 | 13500 | 0.9048 | 0.8634 |
+ | 0.0852 | 7.72 | 13600 | 0.8960 | 0.8632 |
+ | 0.0852 | 7.78 | 13700 | 0.9023 | 0.8635 |
+ | 0.0852 | 7.83 | 13800 | 0.8984 | 0.8638 |
+ | 0.0852 | 7.89 | 13900 | 0.9019 | 0.8635 |
+ | 0.0879 | 7.95 | 14000 | 0.9014 | 0.8634 |
+ | 0.0879 | 8.0 | 14100 | 0.9136 | 0.8630 |
+ | 0.0879 | 8.06 | 14200 | 0.9312 | 0.8639 |
+ | 0.0879 | 8.12 | 14300 | 0.9346 | 0.8635 |
+ | 0.0879 | 8.17 | 14400 | 0.9307 | 0.8635 |
+ | 0.0611 | 8.23 | 14500 | 0.9419 | 0.8641 |
+ | 0.0611 | 8.29 | 14600 | 0.9331 | 0.8631 |
+ | 0.0611 | 8.34 | 14700 | 0.9375 | 0.8636 |
+ | 0.0611 | 8.4 | 14800 | 0.9292 | 0.8626 |
+ | 0.0611 | 8.46 | 14900 | 0.9458 | 0.8637 |
+ | 0.061 | 8.51 | 15000 | 0.9336 | 0.8634 |
+ | 0.061 | 8.57 | 15100 | 0.9409 | 0.8630 |
+ | 0.061 | 8.63 | 15200 | 0.9390 | 0.8632 |
+ | 0.061 | 8.68 | 15300 | 0.9375 | 0.8628 |
+ | 0.061 | 8.74 | 15400 | 0.9365 | 0.8630 |
+ | 0.0646 | 8.8 | 15500 | 0.9370 | 0.8628 |
+ | 0.0646 | 8.85 | 15600 | 0.9355 | 0.8629 |
+ | 0.0646 | 8.91 | 15700 | 0.9375 | 0.8632 |
+ | 0.0646 | 8.97 | 15800 | 0.9390 | 0.8630 |
+ | 0.0646 | 9.02 | 15900 | 0.9717 | 0.8630 |
+ | 0.0593 | 9.08 | 16000 | 0.9673 | 0.8626 |
+ | 0.0593 | 9.14 | 16100 | 0.9644 | 0.8630 |
+ | 0.0593 | 9.19 | 16200 | 0.9624 | 0.8631 |
+ | 0.0593 | 9.25 | 16300 | 0.9648 | 0.8633 |
+ | 0.0593 | 9.31 | 16400 | 0.9673 | 0.8632 |
+ | 0.0415 | 9.36 | 16500 | 0.9658 | 0.8633 |
+ | 0.0415 | 9.42 | 16600 | 0.9688 | 0.8628 |
+ | 0.0415 | 9.48 | 16700 | 0.9653 | 0.8632 |
+ | 0.0415 | 9.53 | 16800 | 0.9658 | 0.8628 |
+ | 0.0415 | 9.59 | 16900 | 0.9668 | 0.8629 |
+ | 0.0471 | 9.65 | 17000 | 0.9604 | 0.8625 |
+ | 0.0471 | 9.7 | 17100 | 0.9658 | 0.8621 |
+ | 0.0471 | 9.76 | 17200 | 0.9731 | 0.8630 |
+ | 0.0471 | 9.82 | 17300 | 0.9692 | 0.8626 |
+ | 0.0471 | 9.88 | 17400 | 0.9673 | 0.8623 |
+ | 0.0528 | 9.93 | 17500 | 0.9614 | 0.8620 |
+ | 0.0528 | 9.99 | 17600 | 0.9697 | 0.8621 |
+
+
+ ### Framework versions
+
+ - Transformers 4.21.0.dev0
+ - Pytorch 1.12.1+cu113
+ - Datasets 2.4.0
+ - Tokenizers 0.12.1
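The batch-size and step figures in the README hunk above can be cross-checked with a little arithmetic. The sketch below is only a consistency check, not the Trainer's own code: it takes `train_samples` (14090) from train_results.json and assumes that the final, partial batch of each epoch still counts as one optimizer step and that no gradient accumulation was used (neither is stated in the card).

```python
import math

# Values copied from the model card and train_results.json in this commit.
train_batch_size = 4   # per-device batch size
num_devices = 2
train_samples = 14090
num_epochs = 10

# Effective batch size across both GPUs.
total_train_batch_size = train_batch_size * num_devices

# Assumption: a trailing partial batch still counts as one step per epoch.
steps_per_epoch = math.ceil(train_samples / total_train_batch_size)
total_steps = steps_per_epoch * num_epochs

print(total_train_batch_size, steps_per_epoch, total_steps)
```

Under these assumptions the total comes out equal to the `global_step` value recorded in trainer_state.json (17620), which suggests the assumptions are at least plausible for this run.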
all_results.json ADDED
@@ -0,0 +1,15 @@
+ {
+   "epoch": 10.0,
+   "eval_accuracy": 0.8663780555904803,
+   "eval_loss": 0.58837890625,
+   "eval_runtime": 291.8211,
+   "eval_samples": 4315,
+   "eval_samples_per_second": 14.786,
+   "eval_steps_per_second": 1.85,
+   "perplexity": 1.8010663501633464,
+   "train_loss": 0.2421213565700847,
+   "train_runtime": 122603.1424,
+   "train_samples": 14090,
+   "train_samples_per_second": 1.149,
+   "train_steps_per_second": 0.144
+ }
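The `perplexity` field in all_results.json appears to be the exponential of `eval_loss`. The short check below, a sketch that assumes the loss is a mean per-token cross-entropy in nats, recomputes it with the standard library and compares it with the reported value.

```python
import math

# Values copied from all_results.json in this commit.
eval_loss = 0.58837890625
reported_perplexity = 1.8010663501633464

# Assumption: eval_loss is a mean cross-entropy in nats, so perplexity = exp(loss).
perplexity = math.exp(eval_loss)

# Absolute difference between the recomputed and reported values.
difference = abs(perplexity - reported_perplexity)
print(perplexity, difference)
```

Note that the reported loss and accuracy are those of the best checkpoint (step 3300), not of the final step, so this perplexity describes that checkpoint.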
config.json ADDED
@@ -0,0 +1,82 @@
+ {
+   "_name_or_path": "EleutherAI/gpt-neo-2.7B",
+   "activation_function": "gelu_new",
+   "architectures": [
+     "GPTNeoForCausalLM"
+   ],
+   "attention_dropout": 0,
+   "attention_layers": [
+     "global",
+     "local",
+     "global",
+     "local",
+     "global",
+     "local",
+     "global",
+     "local",
+     "global",
+     "local",
+     "global",
+     "local",
+     "global",
+     "local",
+     "global",
+     "local",
+     "global",
+     "local",
+     "global",
+     "local",
+     "global",
+     "local",
+     "global",
+     "local",
+     "global",
+     "local",
+     "global",
+     "local",
+     "global",
+     "local",
+     "global",
+     "local"
+   ],
+   "attention_types": [
+     [
+       [
+         "global",
+         "local"
+       ],
+       16
+     ]
+   ],
+   "bos_token_id": 50256,
+   "embed_dropout": 0,
+   "eos_token_id": 50256,
+   "gradient_checkpointing": false,
+   "hidden_size": 2560,
+   "initializer_range": 0.02,
+   "intermediate_size": null,
+   "layer_norm_epsilon": 1e-05,
+   "max_position_embeddings": 2048,
+   "model_type": "gpt_neo",
+   "num_heads": 20,
+   "num_layers": 32,
+   "resid_dropout": 0,
+   "summary_activation": null,
+   "summary_first_dropout": 0.1,
+   "summary_proj_to_labels": true,
+   "summary_type": "cls_index",
+   "summary_use_proj": true,
+   "task_specific_params": {
+     "text-generation": {
+       "do_sample": true,
+       "max_length": 50,
+       "temperature": 0.9
+     }
+   },
+   "tokenizer_class": "GPT2Tokenizer",
+   "torch_dtype": "float16",
+   "transformers_version": "4.21.0.dev0",
+   "use_cache": true,
+   "vocab_size": 50257,
+   "window_size": 256
+ }
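In config.json, the compact `attention_types` entry and the 32-item `attention_layers` list describe the same layer pattern. The sketch below expands one into the other in plain Python. It does not call the transformers library, and the expansion loop is an illustration of the relationship rather than the library's own implementation. It also derives the per-head dimension from `hidden_size` and `num_heads`.

```python
# Values copied from config.json in this commit.
attention_types = [[["global", "local"], 16]]
num_layers = 32
hidden_size = 2560
num_heads = 20

# Expand each (pattern, repeats) pair: the pattern is repeated `repeats` times.
attention_layers = []
for pattern, repeats in attention_types:
    for _ in range(repeats):
        attention_layers.extend(pattern)

# Each attention head works on an equal slice of the hidden dimension.
head_dim = hidden_size // num_heads

print(len(attention_layers), attention_layers[:4], head_dim)
```

The expanded list has one entry per transformer layer, alternating global and local attention, which is the sequence spelled out in full under `attention_layers`.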
eval_results.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "epoch": 10.0,
+   "eval_accuracy": 0.8663780555904803,
+   "eval_loss": 0.58837890625,
+   "eval_runtime": 291.8211,
+   "eval_samples": 4315,
+   "eval_samples_per_second": 14.786,
+   "eval_steps_per_second": 1.85,
+   "perplexity": 1.8010663501633464
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:078fb54c98e33d61f8d121f0b99e7bbbe3d79b6dd9f40fe7c021bedf08162fdf
+ size 5436910218
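The `size` field of the Git LFS pointer gives the checkpoint size in bytes, and config.json lists `torch_dtype` as float16. As a rough sanity check, the sketch below divides the file size by two bytes per value. It assumes every stored value is a 16-bit float and ignores serialization overhead and any non-float buffers, so the result is an upper-bound estimate rather than an exact parameter count.

```python
# Size in bytes, copied from the Git LFS pointer for pytorch_model.bin.
checkpoint_bytes = 5436910218

# Assumption: all stored values are float16, i.e. 2 bytes each.
bytes_per_value = 2

approx_values = checkpoint_bytes // bytes_per_value
approx_billions = approx_values / 1e9

print(approx_values, round(approx_billions, 2))
```

The estimate lands close to the 2.7B in the base model's name, which is consistent with the weights having been saved in half precision.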
special_tokens_map.json ADDED
@@ -0,0 +1,5 @@
+ {
+   "bos_token": "<|endoftext|>",
+   "eos_token": "<|endoftext|>",
+   "unk_token": "<|endoftext|>"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
tokenizer_config.json ADDED
@@ -0,0 +1,34 @@
+ {
+   "add_bos_token": false,
+   "add_prefix_space": false,
+   "bos_token": {
+     "__type": "AddedToken",
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "__type": "AddedToken",
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "errors": "replace",
+   "model_max_length": 2048,
+   "name_or_path": "EleutherAI/gpt-neo-2.7B",
+   "pad_token": null,
+   "special_tokens_map_file": null,
+   "tokenizer_class": "GPT2Tokenizer",
+   "unk_token": {
+     "__type": "AddedToken",
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "epoch": 10.0,
+   "train_loss": 0.2421213565700847,
+   "train_runtime": 122603.1424,
+   "train_samples": 14090,
+   "train_samples_per_second": 1.149,
+   "train_steps_per_second": 0.144
+ }
trainer_state.json ADDED
@@ -0,0 +1,1819 @@
+ {
+   "best_metric": 0.58837890625,
+   "best_model_checkpoint": "/output/checkpoint-3300",
+   "epoch": 10.0,
+   "global_step": 17620,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.06,
+       "eval_accuracy": 0.8214358287770616,
+       "eval_loss": 0.79296875,
+       "eval_runtime": 301.7299,
+       "eval_samples_per_second": 14.301,
+       "eval_steps_per_second": 1.79,
+       "step": 100
+     },
+     {
+       "epoch": 0.11,
+       "eval_accuracy": 0.828983212304709,
+       "eval_loss": 0.75439453125,
+       "eval_runtime": 295.5652,
+       "eval_samples_per_second": 14.599,
+       "eval_steps_per_second": 1.827,
+       "step": 200
+     },
+     {
+       "epoch": 0.17,
+       "eval_accuracy": 0.83275463867547,
+       "eval_loss": 0.73583984375,
+       "eval_runtime": 297.8323,
+       "eval_samples_per_second": 14.488,
+       "eval_steps_per_second": 1.813,
+       "step": 300
+     },
+     {
+       "epoch": 0.23,
+       "eval_accuracy": 0.8357279670702464,
+       "eval_loss": 0.71923828125,
+       "eval_runtime": 297.4382,
+       "eval_samples_per_second": 14.507,
+       "eval_steps_per_second": 1.816,
+       "step": 400
+     },
+     {
+       "epoch": 0.28,
+       "learning_rate": 5e-05,
+       "loss": 0.8156,
+       "step": 500
+     },
+     {
+       "epoch": 0.28,
+       "eval_accuracy": 0.8396754144819782,
+       "eval_loss": 0.701171875,
+       "eval_runtime": 295.955,
+       "eval_samples_per_second": 14.58,
+       "eval_steps_per_second": 1.825,
+       "step": 500
+     },
+     {
+       "epoch": 0.34,
+       "eval_accuracy": 0.8418578941585707,
+       "eval_loss": 0.6904296875,
+       "eval_runtime": 305.3997,
+       "eval_samples_per_second": 14.129,
+       "eval_steps_per_second": 1.768,
+       "step": 600
+     },
+     {
+       "epoch": 0.4,
+       "eval_accuracy": 0.8439807939976145,
+       "eval_loss": 0.68017578125,
+       "eval_runtime": 303.944,
+       "eval_samples_per_second": 14.197,
+       "eval_steps_per_second": 1.777,
+       "step": 700
+     },
+     {
+       "epoch": 0.45,
+       "eval_accuracy": 0.8464999110833223,
+       "eval_loss": 0.6669921875,
+       "eval_runtime": 297.0827,
+       "eval_samples_per_second": 14.525,
+       "eval_steps_per_second": 1.818,
+       "step": 800
+     },
+     {
+       "epoch": 0.51,
+       "eval_accuracy": 0.8485815807686252,
+       "eval_loss": 0.6572265625,
+       "eval_runtime": 292.5803,
+       "eval_samples_per_second": 14.748,
+       "eval_steps_per_second": 1.846,
+       "step": 900
+     },
+     {
+       "epoch": 0.57,
+       "learning_rate": 5e-05,
+       "loss": 0.7219,
+       "step": 1000
+     },
+     {
+       "epoch": 0.57,
+       "eval_accuracy": 0.8500273546212319,
+       "eval_loss": 0.64990234375,
+       "eval_runtime": 292.7704,
+       "eval_samples_per_second": 14.739,
+       "eval_steps_per_second": 1.844,
+       "step": 1000
+     },
+     {
+       "epoch": 0.62,
+       "eval_accuracy": 0.8521808372666221,
+       "eval_loss": 0.64111328125,
+       "eval_runtime": 292.5084,
+       "eval_samples_per_second": 14.752,
+       "eval_steps_per_second": 1.846,
+       "step": 1100
+     },
+     {
+       "epoch": 0.68,
+       "eval_accuracy": 0.853712016437692,
+       "eval_loss": 0.63427734375,
+       "eval_runtime": 292.6573,
+       "eval_samples_per_second": 14.744,
+       "eval_steps_per_second": 1.845,
+       "step": 1200
+     },
+     {
+       "epoch": 0.74,
+       "eval_accuracy": 0.8545884970136456,
+       "eval_loss": 0.6298828125,
+       "eval_runtime": 291.6972,
+       "eval_samples_per_second": 14.793,
+       "eval_steps_per_second": 1.851,
+       "step": 1300
+     },
+     {
+       "epoch": 0.79,
+       "eval_accuracy": 0.856088866839063,
+       "eval_loss": 0.6220703125,
+       "eval_runtime": 291.6264,
+       "eval_samples_per_second": 14.796,
+       "eval_steps_per_second": 1.852,
+       "step": 1400
+     },
+     {
+       "epoch": 0.85,
+       "learning_rate": 5e-05,
+       "loss": 0.662,
+       "step": 1500
+     },
+     {
+       "epoch": 0.85,
+       "eval_accuracy": 0.857388975917739,
+       "eval_loss": 0.61572265625,
+       "eval_runtime": 292.8293,
+       "eval_samples_per_second": 14.736,
+       "eval_steps_per_second": 1.844,
+       "step": 1500
+     },
+     {
+       "epoch": 0.91,
+       "eval_accuracy": 0.8579150001868949,
+       "eval_loss": 0.61376953125,
+       "eval_runtime": 292.6424,
+       "eval_samples_per_second": 14.745,
+       "eval_steps_per_second": 1.845,
+       "step": 1600
+     },
+     {
+       "epoch": 0.96,
+       "eval_accuracy": 0.8595497078209298,
+       "eval_loss": 0.60546875,
+       "eval_runtime": 292.7935,
+       "eval_samples_per_second": 14.737,
+       "eval_steps_per_second": 1.844,
+       "step": 1700
+     },
+     {
+       "epoch": 1.02,
+       "eval_accuracy": 0.8597678651728665,
+       "eval_loss": 0.6142578125,
+       "eval_runtime": 292.6887,
+       "eval_samples_per_second": 14.743,
+       "eval_steps_per_second": 1.845,
+       "step": 1800
+     },
+     {
+       "epoch": 1.08,
+       "eval_accuracy": 0.8598800021294695,
+       "eval_loss": 0.619140625,
+       "eval_runtime": 291.7039,
+       "eval_samples_per_second": 14.792,
+       "eval_steps_per_second": 1.851,
+       "step": 1900
+     },
199
+ {
200
+ "epoch": 1.14,
201
+ "learning_rate": 5e-05,
202
+ "loss": 0.5707,
203
+ "step": 2000
204
+ },
205
+ {
206
+ "epoch": 1.14,
207
+ "eval_accuracy": 0.8606622423540152,
208
+ "eval_loss": 0.61181640625,
209
+ "eval_runtime": 292.661,
210
+ "eval_samples_per_second": 14.744,
211
+ "eval_steps_per_second": 1.845,
212
+ "step": 2000
213
+ },
214
+ {
215
+ "epoch": 1.19,
216
+ "eval_accuracy": 0.8611497549411055,
217
+ "eval_loss": 0.6123046875,
218
+ "eval_runtime": 292.734,
219
+ "eval_samples_per_second": 14.74,
220
+ "eval_steps_per_second": 1.845,
221
+ "step": 2100
222
+ },
223
+ {
224
+ "epoch": 1.25,
225
+ "eval_accuracy": 0.8616524456617156,
226
+ "eval_loss": 0.60888671875,
227
+ "eval_runtime": 291.6011,
228
+ "eval_samples_per_second": 14.798,
229
+ "eval_steps_per_second": 1.852,
230
+ "step": 2200
231
+ },
232
+ {
233
+ "epoch": 1.31,
234
+ "eval_accuracy": 0.8618950692587294,
235
+ "eval_loss": 0.6064453125,
236
+ "eval_runtime": 291.5809,
237
+ "eval_samples_per_second": 14.799,
238
+ "eval_steps_per_second": 1.852,
239
+ "step": 2300
240
+ },
241
+ {
242
+ "epoch": 1.36,
243
+ "eval_accuracy": 0.8625089907787176,
244
+ "eval_loss": 0.60791015625,
245
+ "eval_runtime": 292.7393,
246
+ "eval_samples_per_second": 14.74,
247
+ "eval_steps_per_second": 1.845,
248
+ "step": 2400
249
+ },
250
+ {
251
+ "epoch": 1.42,
252
+ "learning_rate": 5e-05,
253
+ "loss": 0.4923,
254
+ "step": 2500
255
+ },
256
+ {
257
+ "epoch": 1.42,
258
+ "eval_accuracy": 0.8624713852538769,
259
+ "eval_loss": 0.60400390625,
260
+ "eval_runtime": 292.7894,
261
+ "eval_samples_per_second": 14.738,
262
+ "eval_steps_per_second": 1.844,
263
+ "step": 2500
264
+ },
265
+ {
266
+ "epoch": 1.48,
267
+ "eval_accuracy": 0.8630089630276525,
268
+ "eval_loss": 0.60302734375,
269
+ "eval_runtime": 291.5842,
270
+ "eval_samples_per_second": 14.798,
271
+ "eval_steps_per_second": 1.852,
272
+ "step": 2600
273
+ },
274
+ {
275
+ "epoch": 1.53,
276
+ "eval_accuracy": 0.8636167679863714,
277
+ "eval_loss": 0.60205078125,
278
+ "eval_runtime": 291.6586,
279
+ "eval_samples_per_second": 14.795,
280
+ "eval_steps_per_second": 1.851,
281
+ "step": 2700
282
+ },
283
+ {
284
+ "epoch": 1.59,
285
+ "eval_accuracy": 0.8643006901519965,
286
+ "eval_loss": 0.60009765625,
287
+ "eval_runtime": 291.5536,
288
+ "eval_samples_per_second": 14.8,
289
+ "eval_steps_per_second": 1.852,
290
+ "step": 2800
291
+ },
292
+ {
293
+ "epoch": 1.65,
294
+ "eval_accuracy": 0.864360949607464,
295
+ "eval_loss": 0.59814453125,
296
+ "eval_runtime": 292.9127,
297
+ "eval_samples_per_second": 14.731,
298
+ "eval_steps_per_second": 1.844,
299
+ "step": 2900
300
+ },
301
+ {
302
+ "epoch": 1.7,
303
+ "learning_rate": 5e-05,
304
+ "loss": 0.4909,
305
+ "step": 3000
306
+ },
307
+ {
308
+ "epoch": 1.7,
309
+ "eval_accuracy": 0.8647961316148062,
310
+ "eval_loss": 0.59423828125,
311
+ "eval_runtime": 292.7189,
312
+ "eval_samples_per_second": 14.741,
313
+ "eval_steps_per_second": 1.845,
314
+ "step": 3000
315
+ },
316
+ {
317
+ "epoch": 1.76,
318
+ "eval_accuracy": 0.8649898227216659,
319
+ "eval_loss": 0.591796875,
320
+ "eval_runtime": 291.4828,
321
+ "eval_samples_per_second": 14.804,
322
+ "eval_steps_per_second": 1.853,
323
+ "step": 3100
324
+ },
325
+ {
326
+ "epoch": 1.82,
327
+ "eval_accuracy": 0.8659145561698547,
328
+ "eval_loss": 0.59228515625,
329
+ "eval_runtime": 291.6085,
330
+ "eval_samples_per_second": 14.797,
331
+ "eval_steps_per_second": 1.852,
332
+ "step": 3200
333
+ },
334
+ {
335
+ "epoch": 1.87,
336
+ "eval_accuracy": 0.8663780555904803,
337
+ "eval_loss": 0.58837890625,
338
+ "eval_runtime": 291.6163,
339
+ "eval_samples_per_second": 14.797,
340
+ "eval_steps_per_second": 1.852,
341
+ "step": 3300
342
+ },
343
+ {
344
+ "epoch": 1.93,
345
+ "eval_accuracy": 0.8662514201182762,
346
+ "eval_loss": 0.58837890625,
347
+ "eval_runtime": 291.7936,
348
+ "eval_samples_per_second": 14.788,
349
+ "eval_steps_per_second": 1.851,
350
+ "step": 3400
351
+ },
352
+ {
353
+ "epoch": 1.99,
354
+ "learning_rate": 5e-05,
355
+ "loss": 0.4964,
356
+ "step": 3500
357
+ },
358
+ {
359
+ "epoch": 1.99,
360
+ "eval_accuracy": 0.8669489346422774,
361
+ "eval_loss": 0.59033203125,
362
+ "eval_runtime": 291.5679,
363
+ "eval_samples_per_second": 14.799,
364
+ "eval_steps_per_second": 1.852,
365
+ "step": 3500
366
+ },
367
+ {
368
+ "epoch": 2.04,
369
+ "eval_accuracy": 0.8654691798937304,
370
+ "eval_loss": 0.64208984375,
371
+ "eval_runtime": 291.7326,
372
+ "eval_samples_per_second": 14.791,
373
+ "eval_steps_per_second": 1.851,
374
+ "step": 3600
375
+ },
376
+ {
377
+ "epoch": 2.1,
378
+ "eval_accuracy": 0.8650994677459,
379
+ "eval_loss": 0.64013671875,
380
+ "eval_runtime": 293.0337,
381
+ "eval_samples_per_second": 14.725,
382
+ "eval_steps_per_second": 1.843,
383
+ "step": 3700
384
+ },
385
+ {
386
+ "epoch": 2.16,
387
+ "eval_accuracy": 0.8649474598713937,
388
+ "eval_loss": 0.64111328125,
389
+ "eval_runtime": 291.5404,
390
+ "eval_samples_per_second": 14.801,
391
+ "eval_steps_per_second": 1.852,
392
+ "step": 3800
393
+ },
394
+ {
395
+ "epoch": 2.21,
396
+ "eval_accuracy": 0.8645410483559476,
397
+ "eval_loss": 0.638671875,
398
+ "eval_runtime": 291.6679,
399
+ "eval_samples_per_second": 14.794,
400
+ "eval_steps_per_second": 1.851,
401
+ "step": 3900
402
+ },
403
+ {
404
+ "epoch": 2.27,
405
+ "learning_rate": 5e-05,
406
+ "loss": 0.345,
407
+ "step": 4000
408
+ },
409
+ {
410
+ "epoch": 2.27,
411
+ "eval_accuracy": 0.8653572694764337,
412
+ "eval_loss": 0.63623046875,
413
+ "eval_runtime": 291.7901,
414
+ "eval_samples_per_second": 14.788,
415
+ "eval_steps_per_second": 1.851,
416
+ "step": 4000
417
+ },
418
+ {
419
+ "epoch": 2.33,
420
+ "eval_accuracy": 0.8654186616284325,
421
+ "eval_loss": 0.63623046875,
422
+ "eval_runtime": 291.6935,
423
+ "eval_samples_per_second": 14.793,
424
+ "eval_steps_per_second": 1.851,
425
+ "step": 4100
426
+ },
427
+ {
428
+ "epoch": 2.38,
429
+ "eval_accuracy": 0.8653559102405961,
430
+ "eval_loss": 0.63623046875,
431
+ "eval_runtime": 291.8601,
432
+ "eval_samples_per_second": 14.784,
433
+ "eval_steps_per_second": 1.85,
434
+ "step": 4200
435
+ },
436
+ {
437
+ "epoch": 2.44,
438
+ "eval_accuracy": 0.8654696329723429,
439
+ "eval_loss": 0.6357421875,
440
+ "eval_runtime": 292.876,
441
+ "eval_samples_per_second": 14.733,
442
+ "eval_steps_per_second": 1.844,
443
+ "step": 4300
444
+ },
445
+ {
446
+ "epoch": 2.5,
447
+ "eval_accuracy": 0.8655647794809758,
448
+ "eval_loss": 0.63623046875,
449
+ "eval_runtime": 291.8742,
450
+ "eval_samples_per_second": 14.784,
451
+ "eval_steps_per_second": 1.85,
452
+ "step": 4400
453
+ },
454
+ {
455
+ "epoch": 2.55,
456
+ "learning_rate": 5e-05,
457
+ "loss": 0.3463,
458
+ "step": 4500
459
+ },
460
+ {
461
+ "epoch": 2.55,
462
+ "eval_accuracy": 0.865790865708632,
463
+ "eval_loss": 0.6376953125,
464
+ "eval_runtime": 292.8657,
465
+ "eval_samples_per_second": 14.734,
466
+ "eval_steps_per_second": 1.844,
467
+ "step": 4500
468
+ },
469
+ {
470
+ "epoch": 2.61,
471
+ "eval_accuracy": 0.8660047188137496,
472
+ "eval_loss": 0.6357421875,
473
+ "eval_runtime": 292.8962,
474
+ "eval_samples_per_second": 14.732,
475
+ "eval_steps_per_second": 1.844,
476
+ "step": 4600
477
+ },
478
+ {
479
+ "epoch": 2.67,
480
+ "eval_accuracy": 0.8664827167499765,
481
+ "eval_loss": 0.62939453125,
482
+ "eval_runtime": 291.8634,
483
+ "eval_samples_per_second": 14.784,
484
+ "eval_steps_per_second": 1.85,
485
+ "step": 4700
486
+ },
487
+ {
488
+ "epoch": 2.72,
489
+ "eval_accuracy": 0.8664838494465078,
490
+ "eval_loss": 0.63330078125,
491
+ "eval_runtime": 292.9139,
492
+ "eval_samples_per_second": 14.731,
493
+ "eval_steps_per_second": 1.844,
494
+ "step": 4800
495
+ },
496
+ {
497
+ "epoch": 2.78,
498
+ "eval_accuracy": 0.8661707721252445,
499
+ "eval_loss": 0.63623046875,
500
+ "eval_runtime": 291.8657,
501
+ "eval_samples_per_second": 14.784,
502
+ "eval_steps_per_second": 1.85,
503
+ "step": 4900
504
+ },
505
+ {
506
+ "epoch": 2.84,
507
+ "learning_rate": 5e-05,
508
+ "loss": 0.3508,
509
+ "step": 5000
510
+ },
511
+ {
512
+ "epoch": 2.84,
513
+ "eval_accuracy": 0.8666297407597449,
514
+ "eval_loss": 0.6357421875,
515
+ "eval_runtime": 291.8826,
516
+ "eval_samples_per_second": 14.783,
517
+ "eval_steps_per_second": 1.85,
518
+ "step": 5000
519
+ },
520
+ {
521
+ "epoch": 2.89,
522
+ "eval_accuracy": 0.8673172875542703,
523
+ "eval_loss": 0.6298828125,
524
+ "eval_runtime": 291.7729,
525
+ "eval_samples_per_second": 14.789,
526
+ "eval_steps_per_second": 1.851,
527
+ "step": 5100
528
+ },
529
+ {
530
+ "epoch": 2.95,
531
+ "eval_accuracy": 0.866801004475284,
532
+ "eval_loss": 0.63134765625,
533
+ "eval_runtime": 291.9322,
534
+ "eval_samples_per_second": 14.781,
535
+ "eval_steps_per_second": 1.85,
536
+ "step": 5200
537
+ },
538
+ {
539
+ "epoch": 3.01,
540
+ "eval_accuracy": 0.8646230555848169,
541
+ "eval_loss": 0.71875,
542
+ "eval_runtime": 291.8776,
543
+ "eval_samples_per_second": 14.784,
544
+ "eval_steps_per_second": 1.85,
545
+ "step": 5300
546
+ },
547
+ {
548
+ "epoch": 3.06,
549
+ "eval_accuracy": 0.8655867537936839,
550
+ "eval_loss": 0.70166015625,
551
+ "eval_runtime": 293.0438,
552
+ "eval_samples_per_second": 14.725,
553
+ "eval_steps_per_second": 1.843,
554
+ "step": 5400
555
+ },
556
+ {
557
+ "epoch": 3.12,
558
+ "learning_rate": 5e-05,
559
+ "loss": 0.295,
560
+ "step": 5500
561
+ },
562
+ {
563
+ "epoch": 3.12,
564
+ "eval_accuracy": 0.8652775276406272,
565
+ "eval_loss": 0.6982421875,
566
+ "eval_runtime": 291.9251,
567
+ "eval_samples_per_second": 14.781,
568
+ "eval_steps_per_second": 1.85,
569
+ "step": 5500
570
+ },
571
+ {
572
+ "epoch": 3.18,
573
+ "eval_accuracy": 0.8654533221422916,
574
+ "eval_loss": 0.703125,
575
+ "eval_runtime": 292.9873,
576
+ "eval_samples_per_second": 14.728,
577
+ "eval_steps_per_second": 1.843,
578
+ "step": 5600
579
+ },
580
+ {
581
+ "epoch": 3.23,
582
+ "eval_accuracy": 0.8650992412065936,
583
+ "eval_loss": 0.69921875,
584
+ "eval_runtime": 293.1169,
585
+ "eval_samples_per_second": 14.721,
586
+ "eval_steps_per_second": 1.842,
587
+ "step": 5700
588
+ },
589
+ {
590
+ "epoch": 3.29,
591
+ "eval_accuracy": 0.8652641618215573,
592
+ "eval_loss": 0.69970703125,
593
+ "eval_runtime": 293.0467,
594
+ "eval_samples_per_second": 14.725,
595
+ "eval_steps_per_second": 1.843,
596
+ "step": 5800
597
+ },
598
+ {
599
+ "epoch": 3.35,
600
+ "eval_accuracy": 0.865102865835494,
601
+ "eval_loss": 0.7041015625,
602
+ "eval_runtime": 292.9654,
603
+ "eval_samples_per_second": 14.729,
604
+ "eval_steps_per_second": 1.843,
605
+ "step": 5900
606
+ },
607
+ {
608
+ "epoch": 3.41,
609
+ "learning_rate": 5e-05,
610
+ "loss": 0.2348,
611
+ "step": 6000
612
+ },
613
+ {
614
+ "epoch": 3.41,
615
+ "eval_accuracy": 0.8649191424581101,
616
+ "eval_loss": 0.70751953125,
617
+ "eval_runtime": 291.6546,
618
+ "eval_samples_per_second": 14.795,
619
+ "eval_steps_per_second": 1.852,
620
+ "step": 6000
621
+ },
622
+ {
623
+ "epoch": 3.46,
624
+ "eval_accuracy": 0.8649929942719536,
625
+ "eval_loss": 0.69921875,
626
+ "eval_runtime": 293.0033,
627
+ "eval_samples_per_second": 14.727,
628
+ "eval_steps_per_second": 1.843,
629
+ "step": 6100
630
+ },
631
+ {
632
+ "epoch": 3.52,
633
+ "eval_accuracy": 0.8647333802269698,
634
+ "eval_loss": 0.70654296875,
635
+ "eval_runtime": 292.0433,
636
+ "eval_samples_per_second": 14.775,
637
+ "eval_steps_per_second": 1.849,
638
+ "step": 6200
639
+ },
640
+ {
641
+ "epoch": 3.58,
642
+ "eval_accuracy": 0.8651932550186952,
643
+ "eval_loss": 0.69970703125,
644
+ "eval_runtime": 292.8338,
645
+ "eval_samples_per_second": 14.735,
646
+ "eval_steps_per_second": 1.844,
647
+ "step": 6300
648
+ },
649
+ {
650
+ "epoch": 3.63,
651
+ "eval_accuracy": 0.8651128335649698,
652
+ "eval_loss": 0.70263671875,
653
+ "eval_runtime": 291.8152,
654
+ "eval_samples_per_second": 14.787,
655
+ "eval_steps_per_second": 1.85,
656
+ "step": 6400
657
+ },
658
+ {
659
+ "epoch": 3.69,
660
+ "learning_rate": 5e-05,
661
+ "loss": 0.2411,
662
+ "step": 6500
663
+ },
664
+ {
665
+ "epoch": 3.69,
666
+ "eval_accuracy": 0.8655713491208575,
667
+ "eval_loss": 0.70458984375,
668
+ "eval_runtime": 292.9241,
669
+ "eval_samples_per_second": 14.731,
670
+ "eval_steps_per_second": 1.843,
671
+ "step": 6500
672
+ },
673
+ {
674
+ "epoch": 3.75,
675
+ "eval_accuracy": 0.8655097304295525,
676
+ "eval_loss": 0.70068359375,
677
+ "eval_runtime": 293.0902,
678
+ "eval_samples_per_second": 14.722,
679
+ "eval_steps_per_second": 1.842,
680
+ "step": 6600
681
+ },
682
+ {
683
+ "epoch": 3.8,
684
+ "eval_accuracy": 0.8651318628666963,
685
+ "eval_loss": 0.70263671875,
686
+ "eval_runtime": 292.9879,
687
+ "eval_samples_per_second": 14.728,
688
+ "eval_steps_per_second": 1.843,
689
+ "step": 6700
690
+ },
691
+ {
692
+ "epoch": 3.86,
693
+ "eval_accuracy": 0.8654825457128003,
694
+ "eval_loss": 0.703125,
695
+ "eval_runtime": 292.8939,
696
+ "eval_samples_per_second": 14.732,
697
+ "eval_steps_per_second": 1.844,
698
+ "step": 6800
699
+ },
700
+ {
701
+ "epoch": 3.92,
702
+ "eval_accuracy": 0.8657906391693256,
703
+ "eval_loss": 0.701171875,
704
+ "eval_runtime": 292.7957,
705
+ "eval_samples_per_second": 14.737,
706
+ "eval_steps_per_second": 1.844,
707
+ "step": 6900
708
+ },
709
+ {
710
+ "epoch": 3.97,
711
+ "learning_rate": 5e-05,
712
+ "loss": 0.251,
713
+ "step": 7000
714
+ },
715
+ {
716
+ "epoch": 3.97,
717
+ "eval_accuracy": 0.8656236797006056,
718
+ "eval_loss": 0.705078125,
719
+ "eval_runtime": 292.768,
720
+ "eval_samples_per_second": 14.739,
721
+ "eval_steps_per_second": 1.844,
722
+ "step": 7000
723
+ },
724
+ {
725
+ "epoch": 4.03,
726
+ "eval_accuracy": 0.8650197259100934,
727
+ "eval_loss": 0.7607421875,
728
+ "eval_runtime": 293.0243,
729
+ "eval_samples_per_second": 14.726,
730
+ "eval_steps_per_second": 1.843,
731
+ "step": 7100
732
+ },
733
+ {
734
+ "epoch": 4.09,
735
+ "eval_accuracy": 0.8655654590988946,
736
+ "eval_loss": 0.76318359375,
737
+ "eval_runtime": 293.1258,
738
+ "eval_samples_per_second": 14.721,
739
+ "eval_steps_per_second": 1.842,
740
+ "step": 7200
741
+ },
742
+ {
743
+ "epoch": 4.14,
744
+ "eval_accuracy": 0.8654986300035453,
745
+ "eval_loss": 0.7587890625,
746
+ "eval_runtime": 291.7489,
747
+ "eval_samples_per_second": 14.79,
748
+ "eval_steps_per_second": 1.851,
749
+ "step": 7300
750
+ },
751
+ {
752
+ "epoch": 4.2,
753
+ "eval_accuracy": 0.8650992412065936,
754
+ "eval_loss": 0.7578125,
755
+ "eval_runtime": 291.7365,
756
+ "eval_samples_per_second": 14.791,
757
+ "eval_steps_per_second": 1.851,
758
+ "step": 7400
759
+ },
760
+ {
761
+ "epoch": 4.26,
762
+ "learning_rate": 5e-05,
763
+ "loss": 0.1797,
764
+ "step": 7500
765
+ },
766
+ {
767
+ "epoch": 4.26,
768
+ "eval_accuracy": 0.8644635719132038,
769
+ "eval_loss": 0.77099609375,
770
+ "eval_runtime": 292.8962,
771
+ "eval_samples_per_second": 14.732,
772
+ "eval_steps_per_second": 1.844,
773
+ "step": 7500
774
+ },
775
+ {
776
+ "epoch": 4.31,
777
+ "eval_accuracy": 0.8648369086899346,
778
+ "eval_loss": 0.7626953125,
779
+ "eval_runtime": 291.9443,
780
+ "eval_samples_per_second": 14.78,
781
+ "eval_steps_per_second": 1.85,
782
+ "step": 7600
783
+ },
784
+ {
785
+ "epoch": 4.37,
786
+ "eval_accuracy": 0.8650006966083668,
787
+ "eval_loss": 0.75830078125,
788
+ "eval_runtime": 292.6223,
789
+ "eval_samples_per_second": 14.746,
790
+ "eval_steps_per_second": 1.845,
791
+ "step": 7700
792
+ },
793
+ {
794
+ "epoch": 4.43,
795
+ "eval_accuracy": 0.8648572972274987,
796
+ "eval_loss": 0.7646484375,
797
+ "eval_runtime": 292.6746,
798
+ "eval_samples_per_second": 14.743,
799
+ "eval_steps_per_second": 1.845,
800
+ "step": 7800
801
+ },
802
+ {
803
+ "epoch": 4.48,
804
+ "eval_accuracy": 0.8645709515443751,
805
+ "eval_loss": 0.759765625,
806
+ "eval_runtime": 292.8238,
807
+ "eval_samples_per_second": 14.736,
808
+ "eval_steps_per_second": 1.844,
809
+ "step": 7900
810
+ },
811
+ {
812
+ "epoch": 4.54,
813
+ "learning_rate": 5e-05,
814
+ "loss": 0.1784,
815
+ "step": 8000
816
+ },
817
+ {
818
+ "epoch": 4.54,
819
+ "eval_accuracy": 0.8649886900251346,
820
+ "eval_loss": 0.765625,
821
+ "eval_runtime": 292.7739,
822
+ "eval_samples_per_second": 14.738,
823
+ "eval_steps_per_second": 1.844,
824
+ "step": 8000
825
+ },
826
+ {
827
+ "epoch": 4.6,
828
+ "eval_accuracy": 0.8647777819309984,
829
+ "eval_loss": 0.76171875,
830
+ "eval_runtime": 291.5794,
831
+ "eval_samples_per_second": 14.799,
832
+ "eval_steps_per_second": 1.852,
833
+ "step": 8100
834
+ },
835
+ {
836
+ "epoch": 4.65,
837
+ "eval_accuracy": 0.8650895000164242,
838
+ "eval_loss": 0.75732421875,
839
+ "eval_runtime": 292.5766,
840
+ "eval_samples_per_second": 14.748,
841
+ "eval_steps_per_second": 1.846,
842
+ "step": 8200
843
+ },
844
+ {
845
+ "epoch": 4.71,
846
+ "eval_accuracy": 0.8647773288523859,
847
+ "eval_loss": 0.76708984375,
848
+ "eval_runtime": 292.8751,
849
+ "eval_samples_per_second": 14.733,
850
+ "eval_steps_per_second": 1.844,
851
+ "step": 8300
852
+ },
853
+ {
854
+ "epoch": 4.77,
855
+ "eval_accuracy": 0.8651295974736336,
856
+ "eval_loss": 0.75634765625,
857
+ "eval_runtime": 292.7435,
858
+ "eval_samples_per_second": 14.74,
859
+ "eval_steps_per_second": 1.845,
860
+ "step": 8400
861
+ },
862
+ {
863
+ "epoch": 4.82,
864
+ "learning_rate": 5e-05,
865
+ "loss": 0.1827,
866
+ "step": 8500
867
+ },
868
+ {
869
+ "epoch": 4.82,
870
+ "eval_accuracy": 0.8648883331124575,
871
+ "eval_loss": 0.76513671875,
872
+ "eval_runtime": 291.561,
873
+ "eval_samples_per_second": 14.8,
874
+ "eval_steps_per_second": 1.852,
875
+ "step": 8500
876
+ },
877
+ {
878
+ "epoch": 4.88,
879
+ "eval_accuracy": 0.8649513110396002,
880
+ "eval_loss": 0.763671875,
881
+ "eval_runtime": 292.871,
882
+ "eval_samples_per_second": 14.733,
883
+ "eval_steps_per_second": 1.844,
884
+ "step": 8600
885
+ },
886
+ {
887
+ "epoch": 4.94,
888
+ "eval_accuracy": 0.8653917034509865,
889
+ "eval_loss": 0.7607421875,
890
+ "eval_runtime": 292.9668,
891
+ "eval_samples_per_second": 14.729,
892
+ "eval_steps_per_second": 1.843,
893
+ "step": 8700
894
+ },
895
+ {
896
+ "epoch": 4.99,
897
+ "eval_accuracy": 0.86499322081126,
898
+ "eval_loss": 0.7607421875,
899
+ "eval_runtime": 292.8128,
900
+ "eval_samples_per_second": 14.736,
901
+ "eval_steps_per_second": 1.844,
902
+ "step": 8800
903
+ },
904
+ {
905
+ "epoch": 5.05,
906
+ "eval_accuracy": 0.8645997220362712,
907
+ "eval_loss": 0.81494140625,
908
+ "eval_runtime": 292.864,
909
+ "eval_samples_per_second": 14.734,
910
+ "eval_steps_per_second": 1.844,
911
+ "step": 8900
912
+ },
913
+ {
914
+ "epoch": 5.11,
915
+ "learning_rate": 5e-05,
916
+ "loss": 0.167,
917
+ "step": 9000
918
+ },
919
+ {
920
+ "epoch": 5.11,
921
+ "eval_accuracy": 0.8647852577281052,
922
+ "eval_loss": 0.80810546875,
923
+ "eval_runtime": 292.7519,
924
+ "eval_samples_per_second": 14.739,
925
+ "eval_steps_per_second": 1.845,
926
+ "step": 9000
927
+ },
928
+ {
929
+ "epoch": 5.16,
930
+ "eval_accuracy": 0.8643582311357888,
931
+ "eval_loss": 0.818359375,
932
+ "eval_runtime": 291.6434,
933
+ "eval_samples_per_second": 14.795,
934
+ "eval_steps_per_second": 1.852,
935
+ "step": 9100
936
+ },
937
+ {
938
+ "epoch": 5.22,
939
+ "eval_accuracy": 0.8647263575084754,
940
+ "eval_loss": 0.81396484375,
941
+ "eval_runtime": 292.5189,
942
+ "eval_samples_per_second": 14.751,
943
+ "eval_steps_per_second": 1.846,
944
+ "step": 9200
945
+ },
946
+ {
947
+ "epoch": 5.28,
948
+ "eval_accuracy": 0.8643802054484968,
949
+ "eval_loss": 0.81689453125,
950
+ "eval_runtime": 291.4921,
951
+ "eval_samples_per_second": 14.803,
952
+ "eval_steps_per_second": 1.853,
953
+ "step": 9300
954
+ },
955
+ {
956
+ "epoch": 5.33,
957
+ "eval_accuracy": 0.8644685557779417,
958
+ "eval_loss": 0.81201171875,
959
+ "eval_runtime": 292.6907,
960
+ "eval_samples_per_second": 14.743,
961
+ "eval_steps_per_second": 1.845,
962
+ "step": 9400
963
+ },
964
+ {
965
+ "epoch": 5.39,
966
+ "learning_rate": 5e-05,
967
+ "loss": 0.1371,
968
+ "step": 9500
969
+ },
970
+ {
971
+ "epoch": 5.39,
972
+ "eval_accuracy": 0.8642560619086617,
973
+ "eval_loss": 0.8154296875,
974
+ "eval_runtime": 292.8129,
975
+ "eval_samples_per_second": 14.736,
976
+ "eval_steps_per_second": 1.844,
977
+ "step": 9500
978
+ },
979
+ {
980
+ "epoch": 5.45,
981
+ "eval_accuracy": 0.8642103009687954,
982
+ "eval_loss": 0.81787109375,
983
+ "eval_runtime": 292.7443,
984
+ "eval_samples_per_second": 14.74,
985
+ "eval_steps_per_second": 1.845,
986
+ "step": 9600
987
+ },
988
+ {
989
+ "epoch": 5.51,
990
+ "eval_accuracy": 0.8642599130768682,
991
+ "eval_loss": 0.8154296875,
992
+ "eval_runtime": 291.6813,
993
+ "eval_samples_per_second": 14.794,
994
+ "eval_steps_per_second": 1.851,
995
+ "step": 9700
996
+ },
997
+ {
998
+ "epoch": 5.56,
999
+ "eval_accuracy": 0.8645023101345757,
1000
+ "eval_loss": 0.81201171875,
1001
+ "eval_runtime": 292.6716,
1002
+ "eval_samples_per_second": 14.743,
1003
+ "eval_steps_per_second": 1.845,
1004
+ "step": 9800
1005
+ },
1006
+ {
1007
+ "epoch": 5.62,
1008
+ "eval_accuracy": 0.8649979781366915,
1009
+ "eval_loss": 0.81103515625,
1010
+ "eval_runtime": 292.7654,
1011
+ "eval_samples_per_second": 14.739,
1012
+ "eval_steps_per_second": 1.844,
1013
+ "step": 9900
1014
+ },
1015
+ {
1016
+ "epoch": 5.68,
1017
+ "learning_rate": 5e-05,
1018
+ "loss": 0.1425,
1019
+ "step": 10000
1020
+ },
1021
+ {
1022
+ "epoch": 5.68,
1023
+ "eval_accuracy": 0.8645428606703978,
1024
+ "eval_loss": 0.81591796875,
1025
+ "eval_runtime": 292.6875,
1026
+ "eval_samples_per_second": 14.743,
1027
+ "eval_steps_per_second": 1.845,
1028
+ "step": 10000
1029
+ },
1030
+ {
1031
+ "epoch": 5.73,
1032
+ "eval_accuracy": 0.8646024405079464,
1033
+ "eval_loss": 0.8173828125,
1034
+ "eval_runtime": 292.6736,
1035
+ "eval_samples_per_second": 14.743,
1036
+ "eval_steps_per_second": 1.845,
1037
+ "step": 10100
1038
+ },
1039
+ {
1040
+ "epoch": 5.79,
1041
+ "eval_accuracy": 0.8649073624141841,
1042
+ "eval_loss": 0.81591796875,
1043
+ "eval_runtime": 292.7868,
1044
+ "eval_samples_per_second": 14.738,
1045
+ "eval_steps_per_second": 1.844,
1046
+ "step": 10200
1047
+ },
1048
+ {
1049
+ "epoch": 5.85,
1050
+ "eval_accuracy": 0.8639400395764169,
1051
+ "eval_loss": 0.81103515625,
1052
+ "eval_runtime": 292.8417,
1053
+ "eval_samples_per_second": 14.735,
1054
+ "eval_steps_per_second": 1.844,
1055
+ "step": 10300
1056
+ },
1057
+ {
1058
+ "epoch": 5.9,
1059
+ "eval_accuracy": 0.8645482976137482,
1060
+ "eval_loss": 0.8134765625,
1061
+ "eval_runtime": 292.765,
1062
+ "eval_samples_per_second": 14.739,
1063
+ "eval_steps_per_second": 1.844,
1064
+ "step": 10400
1065
+ },
1066
+ {
1067
+ "epoch": 5.96,
1068
+ "learning_rate": 5e-05,
1069
+ "loss": 0.1505,
1070
+ "step": 10500
1071
+ },
1072
+ {
1073
+ "epoch": 5.96,
1074
+ "eval_accuracy": 0.8642195890803523,
1075
+ "eval_loss": 0.81396484375,
1076
+ "eval_runtime": 292.5807,
1077
+ "eval_samples_per_second": 14.748,
1078
+ "eval_steps_per_second": 1.846,
1079
+ "step": 10500
1080
+ },
1081
+ {
1082
+ "epoch": 6.02,
1083
+ "eval_accuracy": 0.8639674508324753,
1084
+ "eval_loss": 0.86279296875,
1085
+ "eval_runtime": 292.6261,
1086
+ "eval_samples_per_second": 14.746,
1087
+ "eval_steps_per_second": 1.845,
1088
+ "step": 10600
1089
+ },
1090
+ {
1091
+ "epoch": 6.07,
1092
+ "eval_accuracy": 0.8644205294450127,
1093
+ "eval_loss": 0.85400390625,
1094
+ "eval_runtime": 292.6297,
1095
+ "eval_samples_per_second": 14.746,
1096
+ "eval_steps_per_second": 1.845,
1097
+ "step": 10700
1098
+ },
1099
+ {
1100
+ "epoch": 6.13,
1101
+ "eval_accuracy": 0.8642433757075105,
1102
+ "eval_loss": 0.85302734375,
1103
+ "eval_runtime": 291.5398,
1104
+ "eval_samples_per_second": 14.801,
1105
+ "eval_steps_per_second": 1.852,
1106
+ "step": 10800
1107
+ },
1108
+ {
1109
+ "epoch": 6.19,
1110
+ "eval_accuracy": 0.8646898846801662,
1111
+ "eval_loss": 0.85595703125,
1112
+ "eval_runtime": 291.664,
1113
+ "eval_samples_per_second": 14.794,
1114
+ "eval_steps_per_second": 1.851,
1115
+ "step": 10900
1116
+ },
1117
+ {
1118
+ "epoch": 6.24,
1119
+ "learning_rate": 5e-05,
1120
+ "loss": 0.1086,
1121
+ "step": 11000
1122
+ },
1123
+ {
1124
+ "epoch": 6.24,
1125
+ "eval_accuracy": 0.864855937991661,
1126
+ "eval_loss": 0.85546875,
1127
+ "eval_runtime": 292.8292,
1128
+ "eval_samples_per_second": 14.736,
1129
+ "eval_steps_per_second": 1.844,
1130
+ "step": 11000
1131
+ },
1132
+ {
1133
+ "epoch": 6.3,
1134
+ "eval_accuracy": 0.8643829239201721,
1135
+ "eval_loss": 0.8603515625,
1136
+ "eval_runtime": 292.6815,
1137
+ "eval_samples_per_second": 14.743,
1138
+ "eval_steps_per_second": 1.845,
1139
+ "step": 11100
1140
+ },
1141
+ {
1142
+ "epoch": 6.36,
1143
+ "eval_accuracy": 0.8641971616890317,
1144
+ "eval_loss": 0.85693359375,
1145
+ "eval_runtime": 292.4181,
1146
+ "eval_samples_per_second": 14.756,
1147
+ "eval_steps_per_second": 1.847,
1148
+ "step": 11200
1149
+ },
1150
+ {
1151
+ "epoch": 6.41,
1152
+ "eval_accuracy": 0.8638854436036061,
1153
+ "eval_loss": 0.85302734375,
1154
+ "eval_runtime": 293.018,
1155
+ "eval_samples_per_second": 14.726,
1156
+ "eval_steps_per_second": 1.843,
1157
+ "step": 11300
1158
+ },
1159
+ {
1160
+ "epoch": 6.47,
1161
+ "eval_accuracy": 0.8642818873895762,
1162
+ "eval_loss": 0.85888671875,
1163
+ "eval_runtime": 291.7384,
1164
+ "eval_samples_per_second": 14.791,
1165
+ "eval_steps_per_second": 1.851,
1166
+ "step": 11400
1167
+ },
1168
+ {
1169
+ "epoch": 6.53,
1170
+ "learning_rate": 5e-05,
1171
+ "loss": 0.1076,
1172
+ "step": 11500
1173
+ },
1174
+ {
1175
+ "epoch": 6.53,
1176
+ "eval_accuracy": 0.8638872559180562,
1177
+ "eval_loss": 0.8525390625,
1178
+ "eval_runtime": 293.4652,
1179
+ "eval_samples_per_second": 14.704,
1180
+ "eval_steps_per_second": 1.84,
1181
+ "step": 11500
1182
+ },
1183
+ {
1184
+ "epoch": 6.58,
1185
+ "eval_accuracy": 0.8640002990318842,
1186
+ "eval_loss": 0.85791015625,
1187
+ "eval_runtime": 292.9912,
1188
+ "eval_samples_per_second": 14.727,
1189
+ "eval_steps_per_second": 1.843,
1190
+ "step": 11600
1191
+ },
1192
+ {
1193
+ "epoch": 6.64,
1194
+ "eval_accuracy": 0.8639792308764013,
1195
+ "eval_loss": 0.859375,
1196
+ "eval_runtime": 292.1362,
1197
+ "eval_samples_per_second": 14.771,
1198
+ "eval_steps_per_second": 1.848,
1199
+ "step": 11700
1200
+ },
1201
+ {
1202
+ "epoch": 6.7,
1203
+ "eval_accuracy": 0.864300237073384,
1204
+ "eval_loss": 0.85986328125,
1205
+ "eval_runtime": 292.9015,
1206
+ "eval_samples_per_second": 14.732,
1207
+ "eval_steps_per_second": 1.844,
1208
+ "step": 11800
1209
+ },
1210
+ {
1211
+ "epoch": 6.75,
1212
+ "eval_accuracy": 0.8639912374596336,
1213
+ "eval_loss": 0.8564453125,
1214
+ "eval_runtime": 292.927,
1215
+ "eval_samples_per_second": 14.731,
1216
+ "eval_steps_per_second": 1.843,
1217
+ "step": 11900
1218
+ },
1219
+ {
1220
+ "epoch": 6.81,
1221
+ "learning_rate": 5e-05,
1222
+ "loss": 0.1109,
1223
+ "step": 12000
1224
+ },
1225
+ {
1226
+ "epoch": 6.81,
1227
+ "eval_accuracy": 0.8640392637925625,
1228
+ "eval_loss": 0.86328125,
1229
+ "eval_runtime": 292.9284,
1230
+ "eval_samples_per_second": 14.731,
1231
+ "eval_steps_per_second": 1.843,
1232
+ "step": 12000
1233
+ },
1234
+ {
1235
+ "epoch": 6.87,
1236
+ "eval_accuracy": 0.8638109121718437,
1237
+ "eval_loss": 0.8583984375,
1238
+ "eval_runtime": 292.6011,
1239
+ "eval_samples_per_second": 14.747,
1240
+ "eval_steps_per_second": 1.846,
1241
+ "step": 12100
1242
+ },
1243
+ {
1244
+ "epoch": 6.92,
1245
+ "eval_accuracy": 0.863599550999095,
1246
+ "eval_loss": 0.86474609375,
1247
+ "eval_runtime": 292.8714,
1248
+ "eval_samples_per_second": 14.733,
1249
+ "eval_steps_per_second": 1.844,
1250
+ "step": 12200
1251
+ },
1252
+ {
1253
+ "epoch": 6.98,
1254
+ "eval_accuracy": 0.8634767666950973,
1255
+ "eval_loss": 0.85986328125,
1256
+ "eval_runtime": 291.6716,
1257
+ "eval_samples_per_second": 14.794,
1258
+ "eval_steps_per_second": 1.851,
1259
+ "step": 12300
1260
+ },
1261
+ {
1262
+ "epoch": 7.04,
1263
+ "eval_accuracy": 0.8632459231420095,
1264
+ "eval_loss": 0.89794921875,
1265
+ "eval_runtime": 292.8834,
1266
+ "eval_samples_per_second": 14.733,
1267
+ "eval_steps_per_second": 1.844,
1268
+ "step": 12400
1269
+ },
1270
+ {
1271
+ "epoch": 7.09,
1272
+ "learning_rate": 5e-05,
1273
+ "loss": 0.1028,
1274
+ "step": 12500
1275
+ },
1276
+ {
1277
+ "epoch": 7.09,
1278
+ "eval_accuracy": 0.8634595497078209,
1279
+ "eval_loss": 0.8935546875,
1280
+ "eval_runtime": 291.8248,
1281
+ "eval_samples_per_second": 14.786,
1282
+ "eval_steps_per_second": 1.85,
1283
+ "step": 12500
1284
+ },
1285
+ {
1286
+ "epoch": 7.15,
1287
+ "eval_accuracy": 0.8637445361551069,
1288
+ "eval_loss": 0.904296875,
1289
+ "eval_runtime": 293.127,
1290
+ "eval_samples_per_second": 14.721,
1291
+ "eval_steps_per_second": 1.842,
1292
+ "step": 12600
1293
+ },
1294
+ {
1295
+ "epoch": 7.21,
1296
+ "eval_accuracy": 0.8641618215572539,
1297
+ "eval_loss": 0.89892578125,
1298
+ "eval_runtime": 291.7379,
1299
+ "eval_samples_per_second": 14.791,
1300
+ "eval_steps_per_second": 1.851,
1301
+ "step": 12700
1302
+ },
1303
+ {
1304
+ "epoch": 7.26,
1305
+ "eval_accuracy": 0.8641742812190987,
1306
+ "eval_loss": 0.8935546875,
1307
+ "eval_runtime": 291.8762,
1308
+ "eval_samples_per_second": 14.784,
1309
+ "eval_steps_per_second": 1.85,
1310
+ "step": 12800
1311
+ },
1312
+ {
1313
+ "epoch": 7.32,
1314
+ "eval_accuracy": 0.8641423391769147,
1315
+ "eval_loss": 0.89208984375,
1316
+ "eval_runtime": 293.0894,
1317
+ "eval_samples_per_second": 14.722,
1318
+ "eval_steps_per_second": 1.842,
1319
+ "step": 12900
1320
+ },
1321
+ {
1322
+ "epoch": 7.38,
1323
+ "learning_rate": 5e-05,
1324
+ "loss": 0.0774,
1325
+ "step": 13000
1326
+ },
1327
+ {
1328
+ "epoch": 7.38,
1329
+ "eval_accuracy": 0.8633791282540956,
1330
+ "eval_loss": 0.8955078125,
1331
+ "eval_runtime": 292.6585,
1332
+ "eval_samples_per_second": 14.744,
1333
+ "eval_steps_per_second": 1.845,
1334
+ "step": 13000
1335
+ },
1336
+ {
1337
+ "epoch": 7.43,
1338
+ "eval_accuracy": 0.8636269622551535,
1339
+ "eval_loss": 0.89501953125,
1340
+ "eval_runtime": 294.4392,
1341
+ "eval_samples_per_second": 14.655,
1342
+ "eval_steps_per_second": 1.834,
1343
+ "step": 13100
1344
+ },
1345
+ {
1346
+ "epoch": 7.49,
1347
+ "eval_accuracy": 0.8635223010956573,
1348
+ "eval_loss": 0.8994140625,
1349
+ "eval_runtime": 292.0929,
1350
+ "eval_samples_per_second": 14.773,
1351
+ "eval_steps_per_second": 1.849,
1352
+ "step": 13200
1353
+ },
1354
+ {
1355
+ "epoch": 7.55,
1356
+ "eval_accuracy": 0.8635028187153182,
1357
+ "eval_loss": 0.89990234375,
1358
+ "eval_runtime": 291.6539,
1359
+ "eval_samples_per_second": 14.795,
1360
+ "eval_steps_per_second": 1.852,
1361
+ "step": 13300
1362
+ },
1363
+ {
1364
+ "epoch": 7.6,
1365
+ "eval_accuracy": 0.8631285757813624,
1366
+ "eval_loss": 0.8935546875,
1367
+ "eval_runtime": 293.1835,
1368
+ "eval_samples_per_second": 14.718,
1369
+ "eval_steps_per_second": 1.842,
1370
+ "step": 13400
1371
+ },
1372
+ {
1373
+ "epoch": 7.66,
1374
+ "learning_rate": 5e-05,
1375
+ "loss": 0.0852,
1376
+ "step": 13500
1377
+ },
1378
+ {
1379
+ "epoch": 7.66,
1380
+ "eval_accuracy": 0.863441879641932,
1381
+ "eval_loss": 0.90478515625,
1382
+ "eval_runtime": 292.8044,
1383
+ "eval_samples_per_second": 14.737,
1384
+ "eval_steps_per_second": 1.844,
1385
+ "step": 13500
1386
+ },
1387
+ {
1388
+ "epoch": 7.72,
1389
+ "eval_accuracy": 0.8632284796154269,
1390
+ "eval_loss": 0.89599609375,
1391
+ "eval_runtime": 292.7129,
1392
+ "eval_samples_per_second": 14.741,
1393
+ "eval_steps_per_second": 1.845,
1394
+ "step": 13600
1395
+ },
1396
+ {
1397
+ "epoch": 7.78,
1398
+ "eval_accuracy": 0.8634731420661971,
1399
+ "eval_loss": 0.90234375,
1400
+ "eval_runtime": 292.9408,
1401
+ "eval_samples_per_second": 14.73,
1402
+ "eval_steps_per_second": 1.843,
1403
+ "step": 13700
1404
+ },
1405
+ {
1406
+ "epoch": 7.83,
1407
+ "eval_accuracy": 0.8638301680128765,
1408
+ "eval_loss": 0.8984375,
1409
+ "eval_runtime": 292.9793,
1410
+ "eval_samples_per_second": 14.728,
1411
+ "eval_steps_per_second": 1.843,
1412
+ "step": 13800
1413
+ },
1414
+ {
1415
+ "epoch": 7.89,
1416
+ "eval_accuracy": 0.8635458611835093,
1417
+ "eval_loss": 0.90185546875,
1418
+ "eval_runtime": 292.8153,
1419
+ "eval_samples_per_second": 14.736,
1420
+ "eval_steps_per_second": 1.844,
1421
+ "step": 13900
1422
+ },
1423
+ {
1424
+ "epoch": 7.95,
1425
+ "learning_rate": 5e-05,
1426
+ "loss": 0.0879,
1427
+ "step": 14000
1428
+ },
1429
+ {
1430
+ "epoch": 7.95,
1431
+ "eval_accuracy": 0.863396345241372,
1432
+ "eval_loss": 0.9013671875,
1433
+ "eval_runtime": 292.8988,
1434
+ "eval_samples_per_second": 14.732,
1435
+ "eval_steps_per_second": 1.844,
1436
+ "step": 14000
1437
+ },
1438
+ {
1439
+ "epoch": 8.0,
1440
+ "eval_accuracy": 0.8630044322415271,
1441
+ "eval_loss": 0.91357421875,
1442
+ "eval_runtime": 292.7911,
1443
+ "eval_samples_per_second": 14.737,
1444
+ "eval_steps_per_second": 1.844,
1445
+ "step": 14100
1446
+ },
1447
+ {
1448
+ "epoch": 8.06,
1449
+ "eval_accuracy": 0.8638926928614067,
1450
+ "eval_loss": 0.93115234375,
1451
+ "eval_runtime": 292.7997,
1452
+ "eval_samples_per_second": 14.737,
1453
+ "eval_steps_per_second": 1.844,
1454
+ "step": 14200
1455
+ },
1456
+ {
1457
+ "epoch": 8.12,
1458
+ "eval_accuracy": 0.8635213949384323,
1459
+ "eval_loss": 0.9345703125,
1460
+ "eval_runtime": 292.7958,
1461
+ "eval_samples_per_second": 14.737,
1462
+ "eval_steps_per_second": 1.844,
1463
+ "step": 14300
1464
+ },
1465
+ {
1466
+ "epoch": 8.17,
1467
+ "eval_accuracy": 0.8635272849603952,
1468
+ "eval_loss": 0.9306640625,
1469
+ "eval_runtime": 291.7965,
1470
+ "eval_samples_per_second": 14.788,
1471
+ "eval_steps_per_second": 1.851,
1472
+ "step": 14400
1473
+ },
1474
+ {
1475
+ "epoch": 8.23,
1476
+ "learning_rate": 5e-05,
1477
+ "loss": 0.0611,
1478
+ "step": 14500
1479
+ },
1480
+ {
1481
+ "epoch": 8.23,
1482
+ "eval_accuracy": 0.8640974843942736,
1483
+ "eval_loss": 0.94189453125,
1484
+ "eval_runtime": 292.9438,
1485
+ "eval_samples_per_second": 14.73,
1486
+ "eval_steps_per_second": 1.843,
1487
+ "step": 14500
1488
+ },
1489
+ {
1490
+ "epoch": 8.29,
1491
+ "eval_accuracy": 0.863091196795828,
1492
+ "eval_loss": 0.93310546875,
1493
+ "eval_runtime": 292.6114,
1494
+ "eval_samples_per_second": 14.747,
1495
+ "eval_steps_per_second": 1.845,
1496
+ "step": 14600
1497
+ },
1498
+ {
1499
+ "epoch": 8.34,
1500
+ "eval_accuracy": 0.8635660231817672,
1501
+ "eval_loss": 0.9375,
1502
+ "eval_runtime": 292.5076,
1503
+ "eval_samples_per_second": 14.752,
1504
+ "eval_steps_per_second": 1.846,
1505
+ "step": 14700
1506
+ },
1507
+ {
1508
+ "epoch": 8.4,
1509
+ "eval_accuracy": 0.8626125192416824,
1510
+ "eval_loss": 0.92919921875,
1511
+ "eval_runtime": 292.6676,
1512
+ "eval_samples_per_second": 14.744,
1513
+ "eval_steps_per_second": 1.845,
1514
+ "step": 14800
1515
+ },
1516
+ {
1517
+ "epoch": 8.46,
1518
+ "eval_accuracy": 0.8637236945389302,
1519
+ "eval_loss": 0.94580078125,
1520
+ "eval_runtime": 291.6673,
1521
+ "eval_samples_per_second": 14.794,
1522
+ "eval_steps_per_second": 1.851,
1523
+ "step": 14900
1524
+ },
1525
+ {
1526
+ "epoch": 8.51,
1527
+ "learning_rate": 5e-05,
1528
+ "loss": 0.061,
1529
+ "step": 15000
1530
+ },
1531
+ {
1532
+ "epoch": 8.51,
1533
+ "eval_accuracy": 0.8634267015084119,
1534
+ "eval_loss": 0.93359375,
1535
+ "eval_runtime": 291.6632,
1536
+ "eval_samples_per_second": 14.794,
1537
+ "eval_steps_per_second": 1.851,
1538
+ "step": 15000
1539
+ },
1540
+ {
1541
+ "epoch": 8.57,
1542
+ "eval_accuracy": 0.8629693186490555,
1543
+ "eval_loss": 0.94091796875,
1544
+ "eval_runtime": 292.9394,
1545
+ "eval_samples_per_second": 14.73,
1546
+ "eval_steps_per_second": 1.843,
1547
+ "step": 15100
1548
+ },
1549
+ {
1550
+ "epoch": 8.63,
1551
+ "eval_accuracy": 0.8632108095495379,
1552
+ "eval_loss": 0.93896484375,
1553
+ "eval_runtime": 292.836,
1554
+ "eval_samples_per_second": 14.735,
1555
+ "eval_steps_per_second": 1.844,
1556
+ "step": 15200
1557
+ },
1558
+ {
1559
+ "epoch": 8.68,
1560
+ "eval_accuracy": 0.862787634125428,
1561
+ "eval_loss": 0.9375,
1562
+ "eval_runtime": 291.5974,
1563
+ "eval_samples_per_second": 14.798,
1564
+ "eval_steps_per_second": 1.852,
1565
+ "step": 15300
1566
+ },
1567
+ {
1568
+ "epoch": 8.74,
1569
+ "eval_accuracy": 0.8629693186490555,
1570
+ "eval_loss": 0.9365234375,
1571
+ "eval_runtime": 292.847,
1572
+ "eval_samples_per_second": 14.735,
1573
+ "eval_steps_per_second": 1.844,
1574
+ "step": 15400
1575
+ },
1576
+ {
1577
+ "epoch": 8.8,
1578
+ "learning_rate": 5e-05,
1579
+ "loss": 0.0646,
1580
+ "step": 15500
1581
+ },
1582
+ {
1583
+ "epoch": 8.8,
1584
+ "eval_accuracy": 0.8628259192681874,
1585
+ "eval_loss": 0.93701171875,
1586
+ "eval_runtime": 292.6735,
1587
+ "eval_samples_per_second": 14.743,
1588
+ "eval_steps_per_second": 1.845,
1589
+ "step": 15500
1590
+ },
1591
+ {
1592
+ "epoch": 8.85,
1593
+ "eval_accuracy": 0.8629194800016764,
1594
+ "eval_loss": 0.935546875,
1595
+ "eval_runtime": 292.789,
1596
+ "eval_samples_per_second": 14.738,
1597
+ "eval_steps_per_second": 1.844,
1598
+ "step": 15600
1599
+ },
1600
+ {
1601
+ "epoch": 8.91,
1602
+ "eval_accuracy": 0.8632305184691833,
1603
+ "eval_loss": 0.9375,
1604
+ "eval_runtime": 291.6374,
1605
+ "eval_samples_per_second": 14.796,
1606
+ "eval_steps_per_second": 1.852,
1607
+ "step": 15700
1608
+ },
1609
+ {
1610
+ "epoch": 8.97,
1611
+ "eval_accuracy": 0.8629763413675499,
1612
+ "eval_loss": 0.93896484375,
1613
+ "eval_runtime": 293.0327,
1614
+ "eval_samples_per_second": 14.725,
1615
+ "eval_steps_per_second": 1.843,
1616
+ "step": 15800
1617
+ },
1618
+ {
1619
+ "epoch": 9.02,
1620
+ "eval_accuracy": 0.8630309373403606,
1621
+ "eval_loss": 0.9716796875,
1622
+ "eval_runtime": 292.645,
1623
+ "eval_samples_per_second": 14.745,
1624
+ "eval_steps_per_second": 1.845,
1625
+ "step": 15900
1626
+ },
1627
+ {
1628
+ "epoch": 9.08,
1629
+ "learning_rate": 5e-05,
1630
+ "loss": 0.0593,
1631
+ "step": 16000
1632
+ },
1633
+ {
1634
+ "epoch": 9.08,
1635
+ "eval_accuracy": 0.8626426489694161,
1636
+ "eval_loss": 0.96728515625,
1637
+ "eval_runtime": 292.8134,
1638
+ "eval_samples_per_second": 14.736,
1639
+ "eval_steps_per_second": 1.844,
1640
+ "step": 16000
1641
+ },
1642
+ {
1643
+ "epoch": 9.14,
1644
+ "eval_accuracy": 0.862975661749631,
1645
+ "eval_loss": 0.96435546875,
1646
+ "eval_runtime": 292.9061,
1647
+ "eval_samples_per_second": 14.732,
1648
+ "eval_steps_per_second": 1.844,
1649
+ "step": 16100
1650
+ },
1651
+ {
1652
+ "epoch": 9.19,
1653
+ "eval_accuracy": 0.8630644651576883,
1654
+ "eval_loss": 0.96240234375,
1655
+ "eval_runtime": 293.8107,
1656
+ "eval_samples_per_second": 14.686,
1657
+ "eval_steps_per_second": 1.838,
1658
+ "step": 16200
1659
+ },
1660
+ {
1661
+ "epoch": 9.25,
1662
+ "eval_accuracy": 0.8633050499009457,
1663
+ "eval_loss": 0.96484375,
1664
+ "eval_runtime": 291.7944,
1665
+ "eval_samples_per_second": 14.788,
1666
+ "eval_steps_per_second": 1.851,
1667
+ "step": 16300
1668
+ },
1669
+ {
1670
+ "epoch": 9.31,
1671
+ "eval_accuracy": 0.8632493212316036,
1672
+ "eval_loss": 0.96728515625,
1673
+ "eval_runtime": 291.6912,
1674
+ "eval_samples_per_second": 14.793,
1675
+ "eval_steps_per_second": 1.851,
1676
+ "step": 16400
1677
+ },
1678
+ {
1679
+ "epoch": 9.36,
1680
+ "learning_rate": 5e-05,
1681
+ "loss": 0.0415,
1682
+ "step": 16500
1683
+ },
1684
+ {
1685
+ "epoch": 9.36,
1686
+ "eval_accuracy": 0.8633073152940084,
1687
+ "eval_loss": 0.9658203125,
1688
+ "eval_runtime": 291.6781,
1689
+ "eval_samples_per_second": 14.794,
1690
+ "eval_steps_per_second": 1.851,
1691
+ "step": 16500
1692
+ },
+ {
+ "epoch": 9.42,
+ "eval_accuracy": 0.8627819706427713,
+ "eval_loss": 0.96875,
+ "eval_runtime": 292.5363,
+ "eval_samples_per_second": 14.75,
+ "eval_steps_per_second": 1.846,
+ "step": 16600
+ },
+ {
+ "epoch": 9.48,
+ "eval_accuracy": 0.8632289326940394,
+ "eval_loss": 0.96533203125,
+ "eval_runtime": 292.6096,
+ "eval_samples_per_second": 14.747,
+ "eval_steps_per_second": 1.845,
+ "step": 16700
+ },
+ {
+ "epoch": 9.53,
+ "eval_accuracy": 0.862821388482062,
+ "eval_loss": 0.9658203125,
+ "eval_runtime": 291.5436,
+ "eval_samples_per_second": 14.801,
+ "eval_steps_per_second": 1.852,
+ "step": 16800
+ },
+ {
+ "epoch": 9.59,
+ "eval_accuracy": 0.8629199330802889,
+ "eval_loss": 0.966796875,
+ "eval_runtime": 292.7541,
+ "eval_samples_per_second": 14.739,
+ "eval_steps_per_second": 1.845,
+ "step": 16900
+ },
+ {
+ "epoch": 9.65,
+ "learning_rate": 5e-05,
+ "loss": 0.0471,
+ "step": 17000
+ },
+ {
+ "epoch": 9.65,
+ "eval_accuracy": 0.8625384408885325,
+ "eval_loss": 0.96044921875,
+ "eval_runtime": 292.6838,
+ "eval_samples_per_second": 14.743,
+ "eval_steps_per_second": 1.845,
+ "step": 17000
+ },
+ {
+ "epoch": 9.7,
+ "eval_accuracy": 0.8620747149286004,
+ "eval_loss": 0.9658203125,
+ "eval_runtime": 291.636,
+ "eval_samples_per_second": 14.796,
+ "eval_steps_per_second": 1.852,
+ "step": 17100
+ },
+ {
+ "epoch": 9.76,
+ "eval_accuracy": 0.8629695451883618,
+ "eval_loss": 0.97314453125,
+ "eval_runtime": 291.7564,
+ "eval_samples_per_second": 14.79,
+ "eval_steps_per_second": 1.851,
+ "step": 17200
+ },
+ {
+ "epoch": 9.82,
+ "eval_accuracy": 0.8626276973752023,
+ "eval_loss": 0.96923828125,
+ "eval_runtime": 291.4249,
+ "eval_samples_per_second": 14.807,
+ "eval_steps_per_second": 1.853,
+ "step": 17300
+ },
+ {
+ "epoch": 9.88,
+ "eval_accuracy": 0.8622577586880655,
+ "eval_loss": 0.96728515625,
+ "eval_runtime": 291.5949,
+ "eval_samples_per_second": 14.798,
+ "eval_steps_per_second": 1.852,
+ "step": 17400
+ },
+ {
+ "epoch": 9.93,
+ "learning_rate": 5e-05,
+ "loss": 0.0528,
+ "step": 17500
+ },
+ {
+ "epoch": 9.93,
+ "eval_accuracy": 0.8619999569575318,
+ "eval_loss": 0.96142578125,
+ "eval_runtime": 292.6583,
+ "eval_samples_per_second": 14.744,
+ "eval_steps_per_second": 1.845,
+ "step": 17500
+ },
+ {
+ "epoch": 9.99,
+ "eval_accuracy": 0.8620715433783127,
+ "eval_loss": 0.9697265625,
+ "eval_runtime": 291.7031,
+ "eval_samples_per_second": 14.792,
+ "eval_steps_per_second": 1.851,
+ "step": 17600
+ },
+ {
+ "epoch": 10.0,
+ "step": 17620,
+ "total_flos": 2.179292736824279e+18,
+ "train_loss": 0.2421213565700847,
+ "train_runtime": 122603.1424,
+ "train_samples_per_second": 1.149,
+ "train_steps_per_second": 0.144
+ }
+ ],
+ "max_steps": 17620,
+ "num_train_epochs": 10,
+ "total_flos": 2.179292736824279e+18,
+ "trial_name": null,
+ "trial_params": null
+ }
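
The log entries above alternate between training records (with `loss` and `learning_rate`) and evaluation records (with `eval_loss` and `eval_accuracy`). In these final steps the training loss sits around 0.04 to 0.06 while the evaluation loss stays near 0.96 to 0.97. A minimal sketch for summarizing such a log follows. It assumes the state has already been loaded into a dict with a `log_history` list, as in the file above; the helper name `summarize_log_history` and the sample excerpt are illustrative and not part of the commit.

```python
def summarize_log_history(state: dict) -> dict:
    """Pick out notable evaluation records from a trainer-state dict.

    Assumes each evaluation record carries "step", "eval_loss" and
    "eval_accuracy", as the entries in this file do.
    """
    # Evaluation records are the ones that carry an "eval_loss" key.
    evals = [e for e in state.get("log_history", []) if "eval_loss" in e]
    if not evals:
        return {}
    lowest = min(evals, key=lambda e: e["eval_loss"])
    highest = max(evals, key=lambda e: e["eval_accuracy"])
    last = evals[-1]
    return {
        "lowest_eval_loss": (lowest["step"], lowest["eval_loss"]),
        "highest_eval_accuracy": (highest["step"], highest["eval_accuracy"]),
        "last_eval": (last["step"], last["eval_loss"]),
    }


# Hypothetical excerpt built from a few of the records shown above.
sample_state = {
    "log_history": [
        {"epoch": 9.65, "learning_rate": 5e-05, "loss": 0.0471, "step": 17000},
        {"epoch": 9.65, "eval_accuracy": 0.8625384408885325,
         "eval_loss": 0.96044921875, "step": 17000},
        {"epoch": 9.76, "eval_accuracy": 0.8629695451883618,
         "eval_loss": 0.97314453125, "step": 17200},
        {"epoch": 9.99, "eval_accuracy": 0.8620715433783127,
         "eval_loss": 0.9697265625, "step": 17600},
    ]
}
summary = summarize_log_history(sample_state)
```

To run this on the real file, `json.load` the contents of `trainer_state.json` and pass the resulting dict in place of `sample_state`.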
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1329a0f14c187b18f591e204cf458a0fc5d096cbe27c3326c557bbea66879f40
+ size 4463
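
The three added lines are a Git LFS pointer, not the binary itself: each line is a key, a space, and a value, and the `oid` value names the hash algorithm before the colon. A minimal sketch of reading those fields is below; the helper name `parse_lfs_pointer` is hypothetical and the parser only handles the three keys seen here.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its version, oid and size fields."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each pointer line has the form "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid value has the form "<algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:1329a0f14c187b18f591e204cf458a0fc5d096cbe27c3326c557bbea66879f40
size 4463
"""
pointer = parse_lfs_pointer(pointer_text)
```

Here `pointer["size"]` is the byte size recorded for `training_args.bin`, and `pointer["digest"]` is the SHA-256 that the downloaded file should match.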
vocab.json ADDED
The diff for this file is too large to render. See raw diff