Shiro committed on
Commit
514d950
1 Parent(s): 58fb8f1

Upload 13 files

README.md CHANGED
@@ -1,3 +1,213 @@
- ---
  license: mit
- ---
  license: mit
+
+
+
+ # roberta-large-movies
+
+ This model is a fine-tuned version of [roberta-large](https://huggingface.co/roberta-large) on the movie competition dataset.
+ Link: https://huggingface.co/spaces/competitions/movie-genre-prediction
+ This model is based on MLM (masked language modeling) fine-tuning. The goal is domain transfer; the model still needs to be fine-tuned on labels afterwards.
+
+ It achieves the following results on the evaluation set:
+ - Loss: 1.3261
+ - Accuracy: 0.7375
+
+ ## Model description
+
+ roberta-large
+
+
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-05
+ - train_batch_size: 32
+ - eval_batch_size: 16
+ - seed: 42
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - num_epochs: 30.0
+ - mixed_precision_training: Native AMP
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Accuracy |
+ |:-------------:|:-----:|:-----:|:---------------:|:--------:|
+ | 1.7698 | 0.18 | 500 | 1.6168 | 0.6738 |
+ | 1.7761 | 0.36 | 1000 | 1.6522 | 0.6830 |
+ | 1.7626 | 0.54 | 1500 | 1.6534 | 0.6660 |
+ | 1.7602 | 0.72 | 2000 | 1.6576 | 0.6787 |
+ | 1.7587 | 0.89 | 2500 | 1.6266 | 0.6773 |
+ | 1.7047 | 1.07 | 3000 | 1.6060 | 0.6852 |
+ | 1.6782 | 1.25 | 3500 | 1.5990 | 0.6906 |
+ | 1.6733 | 1.43 | 4000 | 1.5377 | 0.6967 |
+ | 1.6664 | 1.61 | 4500 | 1.6435 | 0.6747 |
+ | 1.6719 | 1.79 | 5000 | 1.4839 | 0.6907 |
+ | 1.6502 | 1.97 | 5500 | 1.5351 | 0.6897 |
+ | 1.6233 | 2.15 | 6000 | 1.6818 | 0.6763 |
+ | 1.6127 | 2.32 | 6500 | 1.5865 | 0.6853 |
+ | 1.6274 | 2.5 | 7000 | 1.5004 | 0.7004 |
+ | 1.601 | 2.68 | 7500 | 1.4522 | 0.6930 |
+ | 1.6123 | 2.86 | 8000 | 1.5371 | 0.6894 |
+ | 1.6074 | 3.04 | 8500 | 1.5342 | 0.6952 |
+ | 1.563 | 3.22 | 9000 | 1.5682 | 0.6876 |
+ | 1.5746 | 3.4 | 9500 | 1.5705 | 0.6958 |
+ | 1.5539 | 3.58 | 10000 | 1.4711 | 0.7041 |
+ | 1.578 | 3.75 | 10500 | 1.5466 | 0.6889 |
+ | 1.5492 | 3.93 | 11000 | 1.4629 | 0.6969 |
+ | 1.5291 | 4.11 | 11500 | 1.4265 | 0.7200 |
+ | 1.5079 | 4.29 | 12000 | 1.5053 | 0.6966 |
+ | 1.5283 | 4.47 | 12500 | 1.5257 | 0.6903 |
+ | 1.5141 | 4.65 | 13000 | 1.5063 | 0.6950 |
+ | 1.4979 | 4.83 | 13500 | 1.5636 | 0.6956 |
+ | 1.5294 | 5.01 | 14000 | 1.5878 | 0.6835 |
+ | 1.4641 | 5.18 | 14500 | 1.5575 | 0.6962 |
+ | 1.4754 | 5.36 | 15000 | 1.4779 | 0.7007 |
+ | 1.4696 | 5.54 | 15500 | 1.4520 | 0.6965 |
+ | 1.4655 | 5.72 | 16000 | 1.6320 | 0.6830 |
+ | 1.4792 | 5.9 | 16500 | 1.4152 | 0.7134 |
+ | 1.4379 | 6.08 | 17000 | 1.4900 | 0.7042 |
+ | 1.4281 | 6.26 | 17500 | 1.5407 | 0.6990 |
+ | 1.436 | 6.44 | 18000 | 1.5343 | 0.6914 |
+ | 1.4342 | 6.61 | 18500 | 1.5324 | 0.7024 |
+ | 1.4176 | 6.79 | 19000 | 1.4486 | 0.7133 |
+ | 1.4308 | 6.97 | 19500 | 1.4598 | 0.7032 |
+ | 1.4014 | 7.15 | 20000 | 1.5750 | 0.6938 |
+ | 1.3661 | 7.33 | 20500 | 1.5404 | 0.6985 |
+ | 1.3857 | 7.51 | 21000 | 1.4692 | 0.7037 |
+ | 1.3846 | 7.69 | 21500 | 1.5511 | 0.6941 |
+ | 1.3867 | 7.87 | 22000 | 1.5321 | 0.6925 |
+ | 1.3658 | 8.04 | 22500 | 1.5500 | 0.7021 |
+ | 1.3406 | 8.22 | 23000 | 1.5239 | 0.6960 |
+ | 1.3405 | 8.4 | 23500 | 1.4414 | 0.7055 |
+ | 1.3373 | 8.58 | 24000 | 1.5994 | 0.6784 |
+ | 1.3527 | 8.76 | 24500 | 1.5106 | 0.6970 |
+ | 1.3436 | 8.94 | 25000 | 1.4714 | 0.7080 |
+ | 1.3069 | 9.12 | 25500 | 1.4990 | 0.6953 |
+ | 1.2969 | 9.3 | 26000 | 1.4810 | 0.6964 |
+ | 1.3009 | 9.47 | 26500 | 1.5965 | 0.6876 |
+ | 1.3227 | 9.65 | 27000 | 1.4296 | 0.7014 |
+ | 1.3259 | 9.83 | 27500 | 1.4137 | 0.7189 |
+ | 1.3131 | 10.01 | 28000 | 1.5342 | 0.7020 |
+ | 1.271 | 10.19 | 28500 | 1.4708 | 0.7113 |
+ | 1.2684 | 10.37 | 29000 | 1.4342 | 0.7046 |
+ | 1.2767 | 10.55 | 29500 | 1.4703 | 0.7094 |
+ | 1.2861 | 10.73 | 30000 | 1.3323 | 0.7309 |
+ | 1.2617 | 10.9 | 30500 | 1.4562 | 0.7003 |
+ | 1.2551 | 11.08 | 31000 | 1.4361 | 0.7170 |
+ | 1.2404 | 11.26 | 31500 | 1.4537 | 0.7035 |
+ | 1.2562 | 11.44 | 32000 | 1.4039 | 0.7132 |
+ | 1.2489 | 11.62 | 32500 | 1.4372 | 0.7064 |
+ | 1.2406 | 11.8 | 33000 | 1.4926 | 0.7087 |
+ | 1.2285 | 11.98 | 33500 | 1.4080 | 0.7152 |
+ | 1.2213 | 12.16 | 34000 | 1.4031 | 0.7170 |
+ | 1.1998 | 12.33 | 34500 | 1.3541 | 0.7223 |
+ | 1.2184 | 12.51 | 35000 | 1.3630 | 0.7308 |
+ | 1.2195 | 12.69 | 35500 | 1.3125 | 0.7281 |
+ | 1.2178 | 12.87 | 36000 | 1.4257 | 0.7119 |
+ | 1.1918 | 13.05 | 36500 | 1.4108 | 0.7153 |
+ | 1.1664 | 13.23 | 37000 | 1.3577 | 0.7227 |
+ | 1.1754 | 13.41 | 37500 | 1.3777 | 0.7206 |
+ | 1.1855 | 13.59 | 38000 | 1.3501 | 0.7354 |
+ | 1.1644 | 13.76 | 38500 | 1.3747 | 0.7207 |
+ | 1.1709 | 13.94 | 39000 | 1.3704 | 0.7184 |
+ | 1.1613 | 14.12 | 39500 | 1.4307 | 0.7247 |
+ | 1.1443 | 14.3 | 40000 | 1.3190 | 0.7221 |
+ | 1.1356 | 14.48 | 40500 | 1.3288 | 0.7331 |
+ | 1.1493 | 14.66 | 41000 | 1.3505 | 0.7240 |
+ | 1.1417 | 14.84 | 41500 | 1.3146 | 0.7320 |
+ | 1.1349 | 15.02 | 42000 | 1.3546 | 0.7333 |
+ | 1.1169 | 15.19 | 42500 | 1.3709 | 0.7247 |
+ | 1.1187 | 15.37 | 43000 | 1.4243 | 0.7218 |
+ | 1.118 | 15.55 | 43500 | 1.3835 | 0.7264 |
+ | 1.1165 | 15.73 | 44000 | 1.3240 | 0.7254 |
+ | 1.114 | 15.91 | 44500 | 1.3264 | 0.7382 |
+ | 1.105 | 16.09 | 45000 | 1.3214 | 0.7334 |
+ | 1.0924 | 16.27 | 45500 | 1.3847 | 0.7282 |
+ | 1.0915 | 16.45 | 46000 | 1.3604 | 0.7317 |
+ | 1.0968 | 16.62 | 46500 | 1.3540 | 0.7319 |
+ | 1.0772 | 16.8 | 47000 | 1.2475 | 0.7306 |
+ | 1.0975 | 16.98 | 47500 | 1.2636 | 0.7448 |
+ | 1.0708 | 17.16 | 48000 | 1.4056 | 0.7182 |
+ | 1.0654 | 17.34 | 48500 | 1.3769 | 0.7276 |
+ | 1.0676 | 17.52 | 49000 | 1.3357 | 0.7224 |
+ | 1.0507 | 17.7 | 49500 | 1.4088 | 0.7124 |
+ | 1.0424 | 17.88 | 50000 | 1.3146 | 0.7315 |
+ | 1.0524 | 18.06 | 50500 | 1.2896 | 0.7393 |
+ | 1.0349 | 18.23 | 51000 | 1.3987 | 0.7192 |
+ | 1.0217 | 18.41 | 51500 | 1.2938 | 0.7381 |
+ | 1.0238 | 18.59 | 52000 | 1.2962 | 0.7387 |
+ | 1.0292 | 18.77 | 52500 | 1.3195 | 0.7371 |
+ | 1.0426 | 18.95 | 53000 | 1.2835 | 0.7412 |
+ | 1.0196 | 19.13 | 53500 | 1.2346 | 0.7473 |
+ | 1.012 | 19.31 | 54000 | 1.3666 | 0.7338 |
+ | 1.0256 | 19.49 | 54500 | 1.3140 | 0.7365 |
+ | 0.9824 | 19.66 | 55000 | 1.2764 | 0.7416 |
+ | 1.0048 | 19.84 | 55500 | 1.2514 | 0.7488 |
+ | 0.9947 | 20.02 | 56000 | 1.3351 | 0.7432 |
+ | 0.977 | 20.2 | 56500 | 1.2854 | 0.7451 |
+ | 0.9862 | 20.38 | 57000 | 1.3666 | 0.7285 |
+ | 0.9699 | 20.56 | 57500 | 1.3123 | 0.7348 |
+ | 0.977 | 20.74 | 58000 | 1.3426 | 0.7255 |
+ | 0.9749 | 20.92 | 58500 | 1.3763 | 0.7297 |
+ | 0.9505 | 21.09 | 59000 | 1.2372 | 0.7434 |
+ | 0.9438 | 21.27 | 59500 | 1.4334 | 0.7159 |
+ | 0.944 | 21.45 | 60000 | 1.2690 | 0.7508 |
+ | 0.9427 | 21.63 | 60500 | 1.2186 | 0.7486 |
+ | 0.9553 | 21.81 | 61000 | 1.3941 | 0.7269 |
+ | 0.9571 | 21.99 | 61500 | 1.4163 | 0.7274 |
+ | 0.932 | 22.17 | 62000 | 1.2717 | 0.7523 |
+ | 0.9166 | 22.35 | 62500 | 1.2177 | 0.7396 |
+ | 0.9301 | 22.52 | 63000 | 1.3264 | 0.7378 |
+ | 0.9351 | 22.7 | 63500 | 1.2570 | 0.7520 |
+ | 0.9211 | 22.88 | 64000 | 1.2639 | 0.75 |
+ | 0.9211 | 23.06 | 64500 | 1.2377 | 0.7606 |
+ | 0.9196 | 23.24 | 65000 | 1.2739 | 0.7485 |
+ | 0.9062 | 23.42 | 65500 | 1.3263 | 0.7365 |
+ | 0.8965 | 23.6 | 66000 | 1.2814 | 0.7455 |
+ | 0.9004 | 23.78 | 66500 | 1.2109 | 0.7562 |
+ | 0.9094 | 23.95 | 67000 | 1.2629 | 0.7528 |
+ | 0.8937 | 24.13 | 67500 | 1.2771 | 0.7375 |
+ | 0.8711 | 24.31 | 68000 | 1.3746 | 0.7353 |
+ | 0.8972 | 24.49 | 68500 | 1.2529 | 0.7454 |
+ | 0.8863 | 24.67 | 69000 | 1.3219 | 0.7359 |
+ | 0.8823 | 24.85 | 69500 | 1.3136 | 0.7367 |
+ | 0.8759 | 25.03 | 70000 | 1.3152 | 0.7428 |
+ | 0.8722 | 25.21 | 70500 | 1.3108 | 0.7570 |
+ | 0.8548 | 25.38 | 71000 | 1.3503 | 0.7368 |
+ | 0.8728 | 25.56 | 71500 | 1.3091 | 0.7403 |
+ | 0.8633 | 25.74 | 72000 | 1.2952 | 0.7416 |
+ | 0.8612 | 25.92 | 72500 | 1.1612 | 0.7719 |
+ | 0.8677 | 26.1 | 73000 | 1.2855 | 0.7450 |
+ | 0.8526 | 26.28 | 73500 | 1.2979 | 0.7545 |
+ | 0.8594 | 26.46 | 74000 | 1.2570 | 0.7598 |
+ | 0.8481 | 26.64 | 74500 | 1.2337 | 0.7492 |
+ | 0.855 | 26.81 | 75000 | 1.2875 | 0.7444 |
+ | 0.835 | 26.99 | 75500 | 1.2270 | 0.7585 |
+ | 0.8309 | 27.17 | 76000 | 1.2540 | 0.7389 |
+ | 0.8326 | 27.35 | 76500 | 1.3611 | 0.7375 |
+ | 0.8398 | 27.53 | 77000 | 1.2248 | 0.7505 |
+ | 0.8304 | 27.71 | 77500 | 1.2403 | 0.7607 |
+ | 0.8373 | 27.89 | 78000 | 1.1709 | 0.7611 |
+ | 0.8462 | 28.07 | 78500 | 1.2891 | 0.7508 |
+ | 0.8259 | 28.24 | 79000 | 1.2452 | 0.7501 |
+ | 0.8334 | 28.42 | 79500 | 1.2986 | 0.7468 |
+ | 0.8115 | 28.6 | 80000 | 1.2880 | 0.7515 |
+ | 0.8205 | 28.78 | 80500 | 1.2728 | 0.7562 |
+ | 0.8261 | 28.96 | 81000 | 1.2661 | 0.7524 |
+ | 0.8299 | 29.14 | 81500 | 1.2592 | 0.7486 |
+ | 0.8276 | 29.32 | 82000 | 1.2325 | 0.7530 |
+ | 0.8112 | 29.5 | 82500 | 1.3154 | 0.7478 |
+ | 0.8111 | 29.67 | 83000 | 1.3343 | 0.7405 |
+ | 0.8148 | 29.85 | 83500 | 1.2806 | 0.7485 |
+
+
+ ### Framework versions
+
+ - Transformers 4.21.3
+ - Pytorch 1.12.1+cu116
+ - Datasets 2.4.0
+ - Tokenizers 0.12.1
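The summary at the top of the README reports the metrics of the final evaluation (loss 1.3261, accuracy 0.7375), while the results table peaks earlier in training. A minimal sketch of picking the best row by accuracy, using a hand-copied excerpt of the table rather than the full log; the `rows` list and the variable names are illustrative and not part of the repository:

```python
# Excerpt of the README results table: (step, validation_loss, accuracy).
# Only a few rows are copied here for illustration.
rows = [
    (500, 1.6168, 0.6738),
    (30000, 1.3323, 0.7309),
    (64500, 1.2377, 0.7606),
    (72500, 1.1612, 0.7719),
    (78000, 1.1709, 0.7611),
    (83500, 1.2806, 0.7485),
]

# Select the row with the highest accuracy.
best_step, best_loss, best_acc = max(rows, key=lambda row: row[2])
print(f"best step={best_step} loss={best_loss} accuracy={best_acc}")
```

In this excerpt the best row is step 72500, which is also the checkpoint named in `best_model_checkpoint` in trainer_state.json.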
all_results.json ADDED
@@ -0,0 +1,15 @@
+ {
+   "epoch": 30.0,
+   "eval_accuracy": 0.737508327781479,
+   "eval_loss": 1.3261024951934814,
+   "eval_runtime": 0.7869,
+   "eval_samples": 500,
+   "eval_samples_per_second": 635.414,
+   "eval_steps_per_second": 40.666,
+   "perplexity": 3.766335433571256,
+   "train_loss": 1.1746184680817338,
+   "train_runtime": 16410.0948,
+   "train_samples": 89500,
+   "train_samples_per_second": 163.619,
+   "train_steps_per_second": 5.113
+ }
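The `perplexity` field appears to be the exponential of `eval_loss`, which is the usual definition for a language-modeling loss measured in nats. A small sketch that checks this relation with the two values copied from all_results.json; the variable names are illustrative:

```python
import math

# Values copied from all_results.json.
eval_loss = 1.3261024951934814
reported_perplexity = 3.766335433571256

# Perplexity as the exponential of the mean cross-entropy loss.
perplexity = math.exp(eval_loss)
print(f"computed={perplexity:.6f} reported={reported_perplexity:.6f}")
```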
config.json ADDED
@@ -0,0 +1,27 @@
+ {
+   "_name_or_path": "roberta-large",
+   "architectures": [
+     "RobertaForMaskedLM"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 0,
+   "classifier_dropout": null,
+   "eos_token_id": 2,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 1024,
+   "initializer_range": 0.02,
+   "intermediate_size": 4096,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 514,
+   "model_type": "roberta",
+   "num_attention_heads": 16,
+   "num_hidden_layers": 24,
+   "pad_token_id": 1,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.21.3",
+   "type_vocab_size": 1,
+   "use_cache": true,
+   "vocab_size": 50265
+ }
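The sizes in config.json are enough for a rough parameter count. The sketch below assumes the standard RoBERTa encoder layout (four attention projections and a two-layer feed-forward block per layer, each followed by a LayerNorm) and a masked-LM head whose decoder weights are tied to the token embeddings; the helper names `linear` and `layer_norm` are made up for this example, and non-trainable buffers are ignored:

```python
# Values copied from config.json.
config = {
    "vocab_size": 50265,
    "hidden_size": 1024,
    "intermediate_size": 4096,
    "num_hidden_layers": 24,
    "num_attention_heads": 16,
    "max_position_embeddings": 514,
    "type_vocab_size": 1,
}


def linear(n_in, n_out):
    """Weights plus bias of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(n):
    """Scale and shift vectors of a LayerNorm."""
    return 2 * n


h = config["hidden_size"]
ffn = config["intermediate_size"]
vocab = config["vocab_size"]

# Token, position and token-type embeddings, followed by a LayerNorm.
embeddings = (
    vocab + config["max_position_embeddings"] + config["type_vocab_size"]
) * h + layer_norm(h)

# One encoder layer: Q, K, V and output projections plus LayerNorm,
# then the feed-forward block plus LayerNorm.
per_layer = (
    4 * linear(h, h) + layer_norm(h)
    + linear(h, ffn) + linear(ffn, h) + layer_norm(h)
)

# Masked-LM head: dense + LayerNorm + output bias. The decoder weights are
# assumed to be tied to the token embeddings, so they are not counted again.
lm_head = linear(h, h) + layer_norm(h) + vocab

total_params = embeddings + config["num_hidden_layers"] * per_layer + lm_head
head_dim = h // config["num_attention_heads"]

print(f"parameters: {total_params:,}")
print(f"approx. float32 size: {total_params * 4 / 1e9:.2f} GB")
print(f"attention head dimension: {head_dim}")
```

Under these assumptions the estimate is about 355 million parameters, or roughly 1.42 GB in float32, which is close to the 1421785643-byte size recorded in the pytorch_model.bin pointer.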
eval_results.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "epoch": 30.0,
+   "eval_accuracy": 0.737508327781479,
+   "eval_loss": 1.3261024951934814,
+   "eval_runtime": 0.7869,
+   "eval_samples": 500,
+   "eval_samples_per_second": 635.414,
+   "eval_steps_per_second": 40.666,
+   "perplexity": 3.766335433571256
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3fb916109c43901c88702eb47544190f727b4267dfdee854887f87b50224c07a
+ size 1421785643
special_tokens_map.json ADDED
@@ -0,0 +1,15 @@
+ {
+   "bos_token": "<s>",
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "<pad>",
+   "sep_token": "</s>",
+   "unk_token": "<unk>"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,16 @@
+ {
+   "add_prefix_space": false,
+   "bos_token": "<s>",
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "errors": "replace",
+   "mask_token": "<mask>",
+   "model_max_length": 512,
+   "name_or_path": "roberta-large",
+   "pad_token": "<pad>",
+   "sep_token": "</s>",
+   "special_tokens_map_file": null,
+   "tokenizer_class": "RobertaTokenizer",
+   "trim_offsets": true,
+   "unk_token": "<unk>"
+ }
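`model_max_length` is 512 here, while config.json sets `max_position_embeddings` to 514. In the Hugging Face RoBERTa implementation, position ids for real tokens start at `pad_token_id + 1`, so the lowest position slots are not available for sequence positions. A small sketch of that arithmetic, with values copied from the two files and illustrative variable names:

```python
# Values copied from config.json and tokenizer_config.json.
max_position_embeddings = 514
pad_token_id = 1
model_max_length = 512

# Position ids for real tokens start right after the padding index,
# so ids 0..pad_token_id are never used for sequence positions.
first_position = pad_token_id + 1
usable_positions = max_position_embeddings - first_position

print(f"first position id: {first_position}")
print(f"usable positions: {usable_positions}")
```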
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "epoch": 30.0,
+   "train_loss": 1.1746184680817338,
+   "train_runtime": 16410.0948,
+   "train_samples": 89500,
+   "train_samples_per_second": 163.619,
+   "train_steps_per_second": 5.113
+ }
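The throughput figures can be cross-checked against the README hyperparameters. The sketch below assumes a single device and no gradient accumulation, neither of which is stated in the source; the variable names are illustrative:

```python
import math

# Values copied from train_results.json and the README hyperparameters.
train_samples = 89500
train_batch_size = 32
num_epochs = 30
train_runtime = 16410.0948  # seconds

# Assumption: one optimizer step per batch (single device, no accumulation).
steps_per_epoch = math.ceil(train_samples / train_batch_size)
total_steps = steps_per_epoch * num_epochs

samples_per_second = train_samples * num_epochs / train_runtime
steps_per_second = total_steps / train_runtime

print(f"steps per epoch: {steps_per_epoch}")
print(f"total steps: {total_steps}")
print(f"samples/s: {samples_per_second:.3f}, steps/s: {steps_per_second:.3f}")
```

Under these assumptions the total comes to 83910 steps, matching `global_step` in trainer_state.json, and the two rates agree with the reported 163.619 samples and 5.113 steps per second. The runtime of about 16410 seconds is roughly 4.6 hours.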
trainer_state.json ADDED
@@ -0,0 +1,2530 @@
+ {
+   "best_metric": 0.7719072164948454,
+   "best_model_checkpoint": "roberta-large-movies/checkpoint-72500",
+   "epoch": 30.0,
+   "global_step": 83910,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.18,
+       "learning_rate": 4.970504111548088e-05,
+       "loss": 1.7698,
+       "step": 500
+     },
+     {
+       "epoch": 0.18,
+       "eval_accuracy": 0.6738421395955643,
+       "eval_loss": 1.6167851686477661,
+       "eval_runtime": 0.8246,
+       "eval_samples_per_second": 606.37,
+       "eval_steps_per_second": 38.808,
+       "step": 500
+     },
+     {
+       "epoch": 0.36,
+       "learning_rate": 4.94082946013586e-05,
+       "loss": 1.7761,
+       "step": 1000
+     },
+     {
+       "epoch": 0.36,
+       "eval_accuracy": 0.6829508196721311,
+       "eval_loss": 1.6522468328475952,
+       "eval_runtime": 0.7873,
+       "eval_samples_per_second": 635.049,
+       "eval_steps_per_second": 40.643,
+       "step": 1000
+     },
+     {
+       "epoch": 0.54,
+       "learning_rate": 4.9110356334167565e-05,
+       "loss": 1.7626,
+       "step": 1500
+     },
+     {
+       "epoch": 0.54,
+       "eval_accuracy": 0.6660117878192534,
+       "eval_loss": 1.6534239053726196,
+       "eval_runtime": 0.7869,
+       "eval_samples_per_second": 635.425,
+       "eval_steps_per_second": 40.667,
+       "step": 1500
+     },
+     {
+       "epoch": 0.72,
+       "learning_rate": 4.8812418066976524e-05,
+       "loss": 1.7602,
+       "step": 2000
+     },
+     {
+       "epoch": 0.72,
+       "eval_accuracy": 0.6787299419597133,
+       "eval_loss": 1.6575504541397095,
+       "eval_runtime": 0.7882,
+       "eval_samples_per_second": 634.385,
+       "eval_steps_per_second": 40.601,
+       "step": 2000
+     },
+     {
+       "epoch": 0.89,
+       "learning_rate": 4.851447979978549e-05,
+       "loss": 1.7587,
+       "step": 2500
+     },
+     {
+       "epoch": 0.89,
+       "eval_accuracy": 0.6772697150430749,
+       "eval_loss": 1.6266298294067383,
+       "eval_runtime": 0.7893,
+       "eval_samples_per_second": 633.509,
+       "eval_steps_per_second": 40.545,
+       "step": 2500
+     },
+     {
+       "epoch": 1.07,
+       "learning_rate": 4.821654153259445e-05,
+       "loss": 1.7047,
+       "step": 3000
+     },
+     {
+       "epoch": 1.07,
+       "eval_accuracy": 0.6851971557853911,
+       "eval_loss": 1.605985164642334,
+       "eval_runtime": 0.8181,
+       "eval_samples_per_second": 611.179,
+       "eval_steps_per_second": 39.115,
+       "step": 3000
+     },
+     {
+       "epoch": 1.25,
+       "learning_rate": 4.791860326540341e-05,
+       "loss": 1.6782,
+       "step": 3500
+     },
+     {
+       "epoch": 1.25,
+       "eval_accuracy": 0.6906354515050167,
+       "eval_loss": 1.599035382270813,
+       "eval_runtime": 0.8184,
+       "eval_samples_per_second": 610.967,
+       "eval_steps_per_second": 39.102,
+       "step": 3500
+     },
+     {
+       "epoch": 1.43,
+       "learning_rate": 4.7620664998212375e-05,
+       "loss": 1.6733,
+       "step": 4000
+     },
+     {
+       "epoch": 1.43,
+       "eval_accuracy": 0.6967426710097719,
+       "eval_loss": 1.5377483367919922,
+       "eval_runtime": 0.819,
+       "eval_samples_per_second": 610.521,
+       "eval_steps_per_second": 39.073,
+       "step": 4000
+     },
+     {
+       "epoch": 1.61,
+       "learning_rate": 4.7322726731021334e-05,
+       "loss": 1.6664,
+       "step": 4500
+     },
+     {
+       "epoch": 1.61,
+       "eval_accuracy": 0.6746607762701168,
+       "eval_loss": 1.6434643268585205,
+       "eval_runtime": 0.7966,
+       "eval_samples_per_second": 627.631,
+       "eval_steps_per_second": 40.168,
+       "step": 4500
+     },
+     {
+       "epoch": 1.79,
+       "learning_rate": 4.70247884638303e-05,
+       "loss": 1.6719,
+       "step": 5000
+     },
+     {
+       "epoch": 1.79,
+       "eval_accuracy": 0.6907181571815718,
+       "eval_loss": 1.483905553817749,
+       "eval_runtime": 0.7989,
+       "eval_samples_per_second": 625.841,
+       "eval_steps_per_second": 40.054,
+       "step": 5000
+     },
+     {
+       "epoch": 1.97,
+       "learning_rate": 4.672685019663926e-05,
+       "loss": 1.6502,
+       "step": 5500
+     },
+     {
+       "epoch": 1.97,
+       "eval_accuracy": 0.6896661367249602,
+       "eval_loss": 1.535127878189087,
+       "eval_runtime": 0.823,
+       "eval_samples_per_second": 607.558,
+       "eval_steps_per_second": 38.884,
+       "step": 5500
+     },
+     {
+       "epoch": 2.15,
+       "learning_rate": 4.642891192944822e-05,
+       "loss": 1.6233,
+       "step": 6000
+     },
+     {
+       "epoch": 2.15,
+       "eval_accuracy": 0.6763219939373526,
+       "eval_loss": 1.6817570924758911,
+       "eval_runtime": 0.7881,
+       "eval_samples_per_second": 634.403,
+       "eval_steps_per_second": 40.602,
+       "step": 6000
+     },
+     {
+       "epoch": 2.32,
+       "learning_rate": 4.6130973662257184e-05,
+       "loss": 1.6127,
+       "step": 6500
+     },
+     {
+       "epoch": 2.32,
+       "eval_accuracy": 0.685335059889932,
+       "eval_loss": 1.5865211486816406,
+       "eval_runtime": 0.787,
+       "eval_samples_per_second": 635.291,
+       "eval_steps_per_second": 40.659,
+       "step": 6500
+     },
+     {
+       "epoch": 2.5,
+       "learning_rate": 4.5833035395066143e-05,
+       "loss": 1.6274,
+       "step": 7000
+     },
+     {
+       "epoch": 2.5,
+       "eval_accuracy": 0.7003633961017509,
+       "eval_loss": 1.5004233121871948,
+       "eval_runtime": 0.8009,
+       "eval_samples_per_second": 624.318,
+       "eval_steps_per_second": 39.956,
+       "step": 7000
+     },
+     {
+       "epoch": 2.68,
+       "learning_rate": 4.553628888094387e-05,
+       "loss": 1.601,
+       "step": 7500
+     },
+     {
+       "epoch": 2.68,
+       "eval_accuracy": 0.6929970129439097,
+       "eval_loss": 1.452188491821289,
+       "eval_runtime": 0.7898,
+       "eval_samples_per_second": 633.056,
+       "eval_steps_per_second": 40.516,
+       "step": 7500
+     },
+     {
+       "epoch": 2.86,
+       "learning_rate": 4.523835061375284e-05,
+       "loss": 1.6123,
+       "step": 8000
+     },
+     {
+       "epoch": 2.86,
+       "eval_accuracy": 0.689419795221843,
+       "eval_loss": 1.5370689630508423,
+       "eval_runtime": 0.8546,
+       "eval_samples_per_second": 585.05,
+       "eval_steps_per_second": 37.443,
+       "step": 8000
+     },
+     {
+       "epoch": 3.04,
+       "learning_rate": 4.4940412346561796e-05,
+       "loss": 1.6074,
+       "step": 8500
+     },
+     {
+       "epoch": 3.04,
+       "eval_accuracy": 0.6952157912345266,
+       "eval_loss": 1.5342369079589844,
+       "eval_runtime": 0.8214,
+       "eval_samples_per_second": 608.68,
+       "eval_steps_per_second": 38.956,
+       "step": 8500
+     },
+     {
+       "epoch": 3.22,
+       "learning_rate": 4.4642474079370755e-05,
+       "loss": 1.563,
+       "step": 9000
+     },
+     {
+       "epoch": 3.22,
+       "eval_accuracy": 0.6875834445927904,
+       "eval_loss": 1.568178415298462,
+       "eval_runtime": 0.8488,
+       "eval_samples_per_second": 589.06,
+       "eval_steps_per_second": 37.7,
+       "step": 9000
+     },
+     {
+       "epoch": 3.4,
+       "learning_rate": 4.4344535812179714e-05,
+       "loss": 1.5746,
+       "step": 9500
+     },
+     {
+       "epoch": 3.4,
+       "eval_accuracy": 0.6957663275352806,
+       "eval_loss": 1.5704632997512817,
+       "eval_runtime": 0.852,
+       "eval_samples_per_second": 586.84,
+       "eval_steps_per_second": 37.558,
+       "step": 9500
+     },
+     {
+       "epoch": 3.58,
+       "learning_rate": 4.404778929805745e-05,
+       "loss": 1.5539,
+       "step": 10000
+     },
+     {
+       "epoch": 3.58,
+       "eval_accuracy": 0.7040711597673623,
+       "eval_loss": 1.4710707664489746,
+       "eval_runtime": 0.85,
+       "eval_samples_per_second": 588.248,
+       "eval_steps_per_second": 37.648,
+       "step": 10000
+     },
+     {
+       "epoch": 3.75,
+       "learning_rate": 4.374985103086641e-05,
+       "loss": 1.578,
+       "step": 10500
+     },
+     {
+       "epoch": 3.75,
+       "eval_accuracy": 0.6888888888888889,
+       "eval_loss": 1.5465725660324097,
+       "eval_runtime": 0.8902,
+       "eval_samples_per_second": 561.645,
+       "eval_steps_per_second": 35.945,
+       "step": 10500
+     },
+     {
+       "epoch": 3.93,
+       "learning_rate": 4.345191276367537e-05,
+       "loss": 1.5492,
+       "step": 11000
+     },
+     {
+       "epoch": 3.93,
+       "eval_accuracy": 0.6968894771674388,
+       "eval_loss": 1.4628891944885254,
+       "eval_runtime": 0.8368,
+       "eval_samples_per_second": 597.487,
+       "eval_steps_per_second": 38.239,
+       "step": 11000
+     },
+     {
+       "epoch": 4.11,
+       "learning_rate": 4.3153974496484326e-05,
+       "loss": 1.5291,
+       "step": 11500
+     },
+     {
+       "epoch": 4.11,
+       "eval_accuracy": 0.7200132538104705,
+       "eval_loss": 1.4264894723892212,
+       "eval_runtime": 0.8798,
+       "eval_samples_per_second": 568.319,
+       "eval_steps_per_second": 36.372,
+       "step": 11500
+     },
+     {
+       "epoch": 4.29,
+       "learning_rate": 4.285603622929329e-05,
+       "loss": 1.5079,
+       "step": 12000
+     },
+     {
+       "epoch": 4.29,
+       "eval_accuracy": 0.6966074313408723,
+       "eval_loss": 1.5052707195281982,
+       "eval_runtime": 0.8186,
+       "eval_samples_per_second": 610.796,
+       "eval_steps_per_second": 39.091,
+       "step": 12000
+     },
+     {
+       "epoch": 4.47,
+       "learning_rate": 4.255809796210226e-05,
+       "loss": 1.5283,
+       "step": 12500
+     },
+     {
+       "epoch": 4.47,
+       "eval_accuracy": 0.6902654867256637,
+       "eval_loss": 1.5257039070129395,
+       "eval_runtime": 0.8002,
+       "eval_samples_per_second": 624.861,
+       "eval_steps_per_second": 39.991,
+       "step": 12500
+     },
+     {
+       "epoch": 4.65,
+       "learning_rate": 4.226015969491122e-05,
+       "loss": 1.5141,
+       "step": 13000
+     },
+     {
+       "epoch": 4.65,
+       "eval_accuracy": 0.6949898442789438,
+       "eval_loss": 1.5063292980194092,
+       "eval_runtime": 0.8654,
+       "eval_samples_per_second": 577.759,
+       "eval_steps_per_second": 36.977,
+       "step": 13000
+     },
+     {
+       "epoch": 4.83,
+       "learning_rate": 4.1962221427720176e-05,
+       "loss": 1.4979,
+       "step": 13500
+     },
+     {
+       "epoch": 4.83,
+       "eval_accuracy": 0.6955945677376615,
+       "eval_loss": 1.5636450052261353,
+       "eval_runtime": 0.8149,
+       "eval_samples_per_second": 613.582,
+       "eval_steps_per_second": 39.269,
+       "step": 13500
+     },
+     {
+       "epoch": 5.01,
+       "learning_rate": 4.1664283160529136e-05,
+       "loss": 1.5294,
+       "step": 14000
+     },
+     {
+       "epoch": 5.01,
+       "eval_accuracy": 0.6835193696651346,
+       "eval_loss": 1.587847113609314,
+       "eval_runtime": 0.8296,
+       "eval_samples_per_second": 602.733,
+       "eval_steps_per_second": 38.575,
+       "step": 14000
+     },
+     {
+       "epoch": 5.18,
+       "learning_rate": 4.13663448933381e-05,
+       "loss": 1.4641,
+       "step": 14500
+     },
+     {
+       "epoch": 5.18,
+       "eval_accuracy": 0.6962067807989258,
+       "eval_loss": 1.5574804544448853,
+       "eval_runtime": 0.81,
+       "eval_samples_per_second": 617.287,
+       "eval_steps_per_second": 39.506,
+       "step": 14500
+     },
+     {
+       "epoch": 5.36,
+       "learning_rate": 4.106840662614707e-05,
+       "loss": 1.4754,
+       "step": 15000
+     },
+     {
+       "epoch": 5.36,
+       "eval_accuracy": 0.7006847081838931,
+       "eval_loss": 1.4779187440872192,
+       "eval_runtime": 0.8312,
+       "eval_samples_per_second": 601.557,
+       "eval_steps_per_second": 38.5,
+       "step": 15000
+     },
+     {
+       "epoch": 5.54,
+       "learning_rate": 4.077046835895603e-05,
+       "loss": 1.4696,
+       "step": 15500
+     },
+     {
+       "epoch": 5.54,
+       "eval_accuracy": 0.6965271015903928,
+       "eval_loss": 1.451996922492981,
+       "eval_runtime": 0.7909,
+       "eval_samples_per_second": 632.19,
+       "eval_steps_per_second": 40.46,
+       "step": 15500
+     },
+     {
+       "epoch": 5.72,
+       "learning_rate": 4.0472530091764986e-05,
+       "loss": 1.4655,
+       "step": 16000
+     },
+     {
+       "epoch": 5.72,
+       "eval_accuracy": 0.683049147442327,
+       "eval_loss": 1.6320295333862305,
+       "eval_runtime": 0.8309,
+       "eval_samples_per_second": 601.76,
+       "eval_steps_per_second": 38.513,
+       "step": 16000
+     },
+     {
+       "epoch": 5.9,
+       "learning_rate": 4.0174591824573945e-05,
+       "loss": 1.4792,
+       "step": 16500
+     },
+     {
+       "epoch": 5.9,
+       "eval_accuracy": 0.7134165866154338,
+       "eval_loss": 1.415226697921753,
+       "eval_runtime": 0.8575,
+       "eval_samples_per_second": 583.097,
+       "eval_steps_per_second": 37.318,
+       "step": 16500
+     },
+     {
+       "epoch": 6.08,
+       "learning_rate": 3.98772494339173e-05,
+       "loss": 1.4379,
+       "step": 17000
+     },
+     {
+       "epoch": 6.08,
+       "eval_accuracy": 0.7041935483870968,
+       "eval_loss": 1.4900156259536743,
+       "eval_runtime": 0.8413,
+       "eval_samples_per_second": 594.352,
+       "eval_steps_per_second": 38.039,
+       "step": 17000
+     },
+     {
+       "epoch": 6.26,
+       "learning_rate": 3.957931116672626e-05,
+       "loss": 1.4281,
+       "step": 17500
+     },
+     {
+       "epoch": 6.26,
+       "eval_accuracy": 0.6989864864864865,
+       "eval_loss": 1.5407416820526123,
+       "eval_runtime": 0.8677,
+       "eval_samples_per_second": 576.232,
+       "eval_steps_per_second": 36.879,
+       "step": 17500
+     },
+     {
+       "epoch": 6.44,
+       "learning_rate": 3.928137289953522e-05,
+       "loss": 1.436,
+       "step": 18000
+     },
+     {
+       "epoch": 6.44,
+       "eval_accuracy": 0.6914175506268081,
+       "eval_loss": 1.534258246421814,
+       "eval_runtime": 0.843,
+       "eval_samples_per_second": 593.143,
+       "eval_steps_per_second": 37.961,
+       "step": 18000
+     },
+     {
+       "epoch": 6.61,
+       "learning_rate": 3.8983434632344176e-05,
+       "loss": 1.4342,
+       "step": 18500
+     },
+     {
+       "epoch": 6.61,
+       "eval_accuracy": 0.7023696682464455,
+       "eval_loss": 1.5323561429977417,
+       "eval_runtime": 0.7874,
+       "eval_samples_per_second": 635.024,
+       "eval_steps_per_second": 40.642,
+       "step": 18500
+     },
+     {
+       "epoch": 6.79,
+       "learning_rate": 3.868549636515314e-05,
+       "loss": 1.4176,
+       "step": 19000
+     },
+     {
+       "epoch": 6.79,
+       "eval_accuracy": 0.7132913490222075,
+       "eval_loss": 1.4485751390457153,
+       "eval_runtime": 0.8567,
+       "eval_samples_per_second": 583.665,
+       "eval_steps_per_second": 37.355,
+       "step": 19000
+     },
+     {
+       "epoch": 6.97,
+       "learning_rate": 3.838755809796211e-05,
+       "loss": 1.4308,
+       "step": 19500
+     },
+     {
+       "epoch": 6.97,
+       "eval_accuracy": 0.7031503734978889,
+       "eval_loss": 1.4598056077957153,
+       "eval_runtime": 0.79,
+       "eval_samples_per_second": 632.872,
+       "eval_steps_per_second": 40.504,
+       "step": 19500
+     },
+     {
+       "epoch": 7.15,
+       "learning_rate": 3.809021570730545e-05,
+       "loss": 1.4014,
+       "step": 20000
+     },
+     {
+       "epoch": 7.15,
+       "eval_accuracy": 0.6938435940099834,
+       "eval_loss": 1.575023889541626,
+       "eval_runtime": 0.8292,
+       "eval_samples_per_second": 603.024,
+       "eval_steps_per_second": 38.594,
+       "step": 20000
+     },
+     {
+       "epoch": 7.33,
+       "learning_rate": 3.779227744011441e-05,
+       "loss": 1.3661,
+       "step": 20500
+     },
+     {
+       "epoch": 7.33,
+       "eval_accuracy": 0.6985221674876847,
+       "eval_loss": 1.5403505563735962,
620
+ "eval_runtime": 0.8319,
621
+ "eval_samples_per_second": 601.063,
622
+ "eval_steps_per_second": 38.468,
623
+ "step": 20500
624
+ },
625
+ {
626
+ "epoch": 7.51,
627
+ "learning_rate": 3.7494935049457754e-05,
628
+ "loss": 1.3857,
629
+ "step": 21000
630
+ },
631
+ {
632
+ "epoch": 7.51,
633
+ "eval_accuracy": 0.7037155669442665,
634
+ "eval_loss": 1.4692307710647583,
635
+ "eval_runtime": 0.8177,
636
+ "eval_samples_per_second": 611.5,
637
+ "eval_steps_per_second": 39.136,
638
+ "step": 21000
639
+ },
640
+ {
641
+ "epoch": 7.69,
642
+ "learning_rate": 3.719699678226672e-05,
643
+ "loss": 1.3846,
644
+ "step": 21500
645
+ },
646
+ {
647
+ "epoch": 7.69,
648
+ "eval_accuracy": 0.6941445861956166,
649
+ "eval_loss": 1.5511342287063599,
650
+ "eval_runtime": 0.7898,
651
+ "eval_samples_per_second": 633.076,
652
+ "eval_steps_per_second": 40.517,
653
+ "step": 21500
654
+ },
655
+ {
656
+ "epoch": 7.87,
657
+ "learning_rate": 3.689905851507568e-05,
658
+ "loss": 1.3867,
659
+ "step": 22000
660
+ },
661
+ {
662
+ "epoch": 7.87,
663
+ "eval_accuracy": 0.6925124792013311,
664
+ "eval_loss": 1.5321439504623413,
665
+ "eval_runtime": 0.8379,
666
+ "eval_samples_per_second": 596.713,
667
+ "eval_steps_per_second": 38.19,
668
+ "step": 22000
669
+ },
670
+ {
671
+ "epoch": 8.04,
672
+ "learning_rate": 3.660112024788464e-05,
673
+ "loss": 1.3658,
674
+ "step": 22500
675
+ },
676
+ {
677
+ "epoch": 8.04,
678
+ "eval_accuracy": 0.7020917678812416,
679
+ "eval_loss": 1.5499885082244873,
680
+ "eval_runtime": 0.8209,
681
+ "eval_samples_per_second": 609.075,
682
+ "eval_steps_per_second": 38.981,
683
+ "step": 22500
684
+ },
685
+ {
686
+ "epoch": 8.22,
687
+ "learning_rate": 3.6303181980693604e-05,
688
+ "loss": 1.3406,
689
+ "step": 23000
690
+ },
691
+ {
692
+ "epoch": 8.22,
693
+ "eval_accuracy": 0.6959503592423253,
694
+ "eval_loss": 1.523918628692627,
695
+ "eval_runtime": 0.8298,
696
+ "eval_samples_per_second": 602.525,
697
+ "eval_steps_per_second": 38.562,
698
+ "step": 23000
699
+ },
700
+ {
701
+ "epoch": 8.4,
702
+ "learning_rate": 3.600524371350256e-05,
703
+ "loss": 1.3405,
704
+ "step": 23500
705
+ },
706
+ {
707
+ "epoch": 8.4,
708
+ "eval_accuracy": 0.7055256064690026,
709
+ "eval_loss": 1.4414023160934448,
710
+ "eval_runtime": 0.8516,
711
+ "eval_samples_per_second": 587.105,
712
+ "eval_steps_per_second": 37.575,
713
+ "step": 23500
714
+ },
715
+ {
716
+ "epoch": 8.58,
717
+ "learning_rate": 3.570730544631153e-05,
718
+ "loss": 1.3373,
719
+ "step": 24000
720
+ },
721
+ {
722
+ "epoch": 8.58,
723
+ "eval_accuracy": 0.6784238957737527,
724
+ "eval_loss": 1.599377155303955,
725
+ "eval_runtime": 0.791,
726
+ "eval_samples_per_second": 632.109,
727
+ "eval_steps_per_second": 40.455,
728
+ "step": 24000
729
+ },
730
+ {
731
+ "epoch": 8.76,
732
+ "learning_rate": 3.540936717912049e-05,
733
+ "loss": 1.3527,
734
+ "step": 24500
735
+ },
736
+ {
737
+ "epoch": 8.76,
738
+ "eval_accuracy": 0.6970387243735763,
739
+ "eval_loss": 1.5105814933776855,
740
+ "eval_runtime": 0.8594,
741
+ "eval_samples_per_second": 581.797,
742
+ "eval_steps_per_second": 37.235,
743
+ "step": 24500
744
+ },
745
+ {
746
+ "epoch": 8.94,
747
+ "learning_rate": 3.511142891192945e-05,
748
+ "loss": 1.3436,
749
+ "step": 25000
750
+ },
751
+ {
752
+ "epoch": 8.94,
753
+ "eval_accuracy": 0.7079758500158881,
754
+ "eval_loss": 1.471426010131836,
755
+ "eval_runtime": 0.8427,
756
+ "eval_samples_per_second": 593.355,
757
+ "eval_steps_per_second": 37.975,
758
+ "step": 25000
759
+ },
760
+ {
761
+ "epoch": 9.12,
762
+ "learning_rate": 3.4813490644738414e-05,
763
+ "loss": 1.3069,
764
+ "step": 25500
765
+ },
766
+ {
767
+ "epoch": 9.12,
768
+ "eval_accuracy": 0.6953099376844867,
769
+ "eval_loss": 1.4990392923355103,
770
+ "eval_runtime": 0.8575,
771
+ "eval_samples_per_second": 583.12,
772
+ "eval_steps_per_second": 37.32,
773
+ "step": 25500
774
+ },
775
+ {
776
+ "epoch": 9.3,
777
+ "learning_rate": 3.451555237754737e-05,
778
+ "loss": 1.2969,
779
+ "step": 26000
780
+ },
781
+ {
782
+ "epoch": 9.3,
783
+ "eval_accuracy": 0.6964285714285714,
784
+ "eval_loss": 1.4809668064117432,
785
+ "eval_runtime": 0.8312,
786
+ "eval_samples_per_second": 601.512,
787
+ "eval_steps_per_second": 38.497,
788
+ "step": 26000
789
+ },
790
+ {
791
+ "epoch": 9.47,
792
+ "learning_rate": 3.421761411035634e-05,
793
+ "loss": 1.3009,
794
+ "step": 26500
795
+ },
796
+ {
797
+ "epoch": 9.47,
798
+ "eval_accuracy": 0.6875602700096431,
799
+ "eval_loss": 1.5964903831481934,
800
+ "eval_runtime": 0.8752,
801
+ "eval_samples_per_second": 571.296,
802
+ "eval_steps_per_second": 36.563,
803
+ "step": 26500
804
+ },
805
+ {
806
+ "epoch": 9.65,
807
+ "learning_rate": 3.392086759623406e-05,
808
+ "loss": 1.3227,
809
+ "step": 27000
810
+ },
811
+ {
812
+ "epoch": 9.65,
813
+ "eval_accuracy": 0.7013662979830839,
814
+ "eval_loss": 1.429559588432312,
815
+ "eval_runtime": 0.7904,
816
+ "eval_samples_per_second": 632.561,
817
+ "eval_steps_per_second": 40.484,
818
+ "step": 27000
819
+ },
820
+ {
821
+ "epoch": 9.83,
822
+ "learning_rate": 3.3622929329043025e-05,
823
+ "loss": 1.3259,
824
+ "step": 27500
825
+ },
826
+ {
827
+ "epoch": 9.83,
828
+ "eval_accuracy": 0.7189224277831873,
829
+ "eval_loss": 1.413652777671814,
830
+ "eval_runtime": 0.8134,
831
+ "eval_samples_per_second": 614.697,
832
+ "eval_steps_per_second": 39.341,
833
+ "step": 27500
834
+ },
835
+ {
836
+ "epoch": 10.01,
837
+ "learning_rate": 3.3324991061851985e-05,
838
+ "loss": 1.3131,
839
+ "step": 28000
840
+ },
841
+ {
842
+ "epoch": 10.01,
843
+ "eval_accuracy": 0.7019570099454604,
844
+ "eval_loss": 1.534200668334961,
845
+ "eval_runtime": 0.8056,
846
+ "eval_samples_per_second": 620.653,
847
+ "eval_steps_per_second": 39.722,
848
+ "step": 28000
849
+ },
850
+ {
851
+ "epoch": 10.19,
852
+ "learning_rate": 3.3027052794660944e-05,
853
+ "loss": 1.271,
854
+ "step": 28500
855
+ },
856
+ {
857
+ "epoch": 10.19,
858
+ "eval_accuracy": 0.711340206185567,
859
+ "eval_loss": 1.470828890800476,
860
+ "eval_runtime": 0.7815,
861
+ "eval_samples_per_second": 639.779,
862
+ "eval_steps_per_second": 40.946,
863
+ "step": 28500
864
+ },
865
+ {
866
+ "epoch": 10.37,
867
+ "learning_rate": 3.272911452746991e-05,
868
+ "loss": 1.2684,
869
+ "step": 29000
870
+ },
871
+ {
872
+ "epoch": 10.37,
873
+ "eval_accuracy": 0.7045747422680413,
874
+ "eval_loss": 1.4341672658920288,
875
+ "eval_runtime": 0.7954,
876
+ "eval_samples_per_second": 628.629,
877
+ "eval_steps_per_second": 40.232,
878
+ "step": 29000
879
+ },
880
+ {
881
+ "epoch": 10.55,
882
+ "learning_rate": 3.2431176260278876e-05,
883
+ "loss": 1.2767,
884
+ "step": 29500
885
+ },
886
+ {
887
+ "epoch": 10.55,
888
+ "eval_accuracy": 0.709353000335233,
889
+ "eval_loss": 1.4703407287597656,
890
+ "eval_runtime": 0.8179,
891
+ "eval_samples_per_second": 611.351,
892
+ "eval_steps_per_second": 39.126,
893
+ "step": 29500
894
+ },
895
+ {
896
+ "epoch": 10.73,
897
+ "learning_rate": 3.2133237993087835e-05,
898
+ "loss": 1.2861,
899
+ "step": 30000
900
+ },
901
+ {
902
+ "epoch": 10.73,
903
+ "eval_accuracy": 0.7308937823834197,
904
+ "eval_loss": 1.3323109149932861,
905
+ "eval_runtime": 0.7855,
906
+ "eval_samples_per_second": 636.523,
907
+ "eval_steps_per_second": 40.737,
908
+ "step": 30000
909
+ },
910
+ {
911
+ "epoch": 10.9,
912
+ "learning_rate": 3.1835299725896794e-05,
913
+ "loss": 1.2617,
914
+ "step": 30500
915
+ },
916
+ {
917
+ "epoch": 10.9,
918
+ "eval_accuracy": 0.7003344481605351,
919
+ "eval_loss": 1.4562044143676758,
920
+ "eval_runtime": 0.7951,
921
+ "eval_samples_per_second": 628.826,
922
+ "eval_steps_per_second": 40.245,
923
+ "step": 30500
924
+ },
925
+ {
926
+ "epoch": 11.08,
927
+ "learning_rate": 3.153736145870575e-05,
928
+ "loss": 1.2551,
929
+ "step": 31000
930
+ },
931
+ {
932
+ "epoch": 11.08,
933
+ "eval_accuracy": 0.7169689119170984,
934
+ "eval_loss": 1.4361472129821777,
935
+ "eval_runtime": 0.8647,
936
+ "eval_samples_per_second": 578.22,
937
+ "eval_steps_per_second": 37.006,
938
+ "step": 31000
939
+ },
940
+ {
941
+ "epoch": 11.26,
942
+ "learning_rate": 3.124001906804911e-05,
943
+ "loss": 1.2404,
944
+ "step": 31500
945
+ },
946
+ {
947
+ "epoch": 11.26,
948
+ "eval_accuracy": 0.7034617896799478,
949
+ "eval_loss": 1.4536628723144531,
950
+ "eval_runtime": 0.7907,
951
+ "eval_samples_per_second": 632.325,
952
+ "eval_steps_per_second": 40.469,
953
+ "step": 31500
954
+ },
955
+ {
956
+ "epoch": 11.44,
957
+ "learning_rate": 3.0942080800858066e-05,
958
+ "loss": 1.2562,
959
+ "step": 32000
960
+ },
961
+ {
962
+ "epoch": 11.44,
963
+ "eval_accuracy": 0.7132209980557356,
964
+ "eval_loss": 1.4038574695587158,
965
+ "eval_runtime": 0.7924,
966
+ "eval_samples_per_second": 631.001,
967
+ "eval_steps_per_second": 40.384,
968
+ "step": 32000
969
+ },
970
+ {
971
+ "epoch": 11.62,
972
+ "learning_rate": 3.0644142533667025e-05,
973
+ "loss": 1.2489,
974
+ "step": 32500
975
+ },
976
+ {
977
+ "epoch": 11.62,
978
+ "eval_accuracy": 0.706418918918919,
979
+ "eval_loss": 1.4372212886810303,
980
+ "eval_runtime": 0.8024,
981
+ "eval_samples_per_second": 623.122,
982
+ "eval_steps_per_second": 39.88,
983
+ "step": 32500
984
+ },
985
+ {
986
+ "epoch": 11.8,
987
+ "learning_rate": 3.0346204266475984e-05,
988
+ "loss": 1.2406,
989
+ "step": 33000
990
+ },
991
+ {
992
+ "epoch": 11.8,
993
+ "eval_accuracy": 0.7087442472057857,
994
+ "eval_loss": 1.4926137924194336,
995
+ "eval_runtime": 0.8525,
996
+ "eval_samples_per_second": 586.532,
997
+ "eval_steps_per_second": 37.538,
998
+ "step": 33000
999
+ },
1000
+ {
1001
+ "epoch": 11.98,
1002
+ "learning_rate": 3.0048265999284947e-05,
1003
+ "loss": 1.2285,
1004
+ "step": 33500
1005
+ },
1006
+ {
1007
+ "epoch": 11.98,
1008
+ "eval_accuracy": 0.7152005392652511,
1009
+ "eval_loss": 1.4080321788787842,
1010
+ "eval_runtime": 0.8108,
1011
+ "eval_samples_per_second": 616.703,
1012
+ "eval_steps_per_second": 39.469,
1013
+ "step": 33500
1014
+ },
1015
+ {
1016
+ "epoch": 12.16,
1017
+ "learning_rate": 2.9750327732093913e-05,
1018
+ "loss": 1.2213,
1019
+ "step": 34000
1020
+ },
1021
+ {
1022
+ "epoch": 12.16,
1023
+ "eval_accuracy": 0.7170240415854451,
1024
+ "eval_loss": 1.403072476387024,
1025
+ "eval_runtime": 0.8459,
1026
+ "eval_samples_per_second": 591.089,
1027
+ "eval_steps_per_second": 37.83,
1028
+ "step": 34000
1029
+ },
1030
+ {
1031
+ "epoch": 12.33,
1032
+ "learning_rate": 2.9452389464902875e-05,
1033
+ "loss": 1.1998,
1034
+ "step": 34500
1035
+ },
1036
+ {
1037
+ "epoch": 12.33,
1038
+ "eval_accuracy": 0.7222584856396866,
1039
+ "eval_loss": 1.3541438579559326,
1040
+ "eval_runtime": 0.7909,
1041
+ "eval_samples_per_second": 632.16,
1042
+ "eval_steps_per_second": 40.458,
1043
+ "step": 34500
1044
+ },
1045
+ {
1046
+ "epoch": 12.51,
1047
+ "learning_rate": 2.9154451197711835e-05,
1048
+ "loss": 1.2184,
1049
+ "step": 35000
1050
+ },
1051
+ {
1052
+ "epoch": 12.51,
1053
+ "eval_accuracy": 0.7308441558441559,
1054
+ "eval_loss": 1.3629957437515259,
1055
+ "eval_runtime": 0.8716,
1056
+ "eval_samples_per_second": 573.677,
1057
+ "eval_steps_per_second": 36.715,
1058
+ "step": 35000
1059
+ },
1060
+ {
1061
+ "epoch": 12.69,
1062
+ "learning_rate": 2.8856512930520797e-05,
1063
+ "loss": 1.2195,
1064
+ "step": 35500
1065
+ },
1066
+ {
1067
+ "epoch": 12.69,
1068
+ "eval_accuracy": 0.7281362594169669,
1069
+ "eval_loss": 1.312456488609314,
1070
+ "eval_runtime": 0.852,
1071
+ "eval_samples_per_second": 586.847,
1072
+ "eval_steps_per_second": 37.558,
1073
+ "step": 35500
1074
+ },
1075
+ {
1076
+ "epoch": 12.87,
1077
+ "learning_rate": 2.8558574663329756e-05,
1078
+ "loss": 1.2178,
1079
+ "step": 36000
1080
+ },
1081
+ {
1082
+ "epoch": 12.87,
1083
+ "eval_accuracy": 0.7119236883942767,
1084
+ "eval_loss": 1.4257023334503174,
1085
+ "eval_runtime": 0.8597,
1086
+ "eval_samples_per_second": 581.571,
1087
+ "eval_steps_per_second": 37.221,
1088
+ "step": 36000
1089
+ },
1090
+ {
1091
+ "epoch": 13.05,
1092
+ "learning_rate": 2.8260636396138722e-05,
1093
+ "loss": 1.1918,
1094
+ "step": 36500
1095
+ },
1096
+ {
1097
+ "epoch": 13.05,
1098
+ "eval_accuracy": 0.7152686762778506,
1099
+ "eval_loss": 1.4108035564422607,
1100
+ "eval_runtime": 0.9192,
1101
+ "eval_samples_per_second": 543.96,
1102
+ "eval_steps_per_second": 34.813,
1103
+ "step": 36500
1104
+ },
1105
+ {
1106
+ "epoch": 13.23,
1107
+ "learning_rate": 2.7963294005482066e-05,
1108
+ "loss": 1.1664,
1109
+ "step": 37000
1110
+ },
1111
+ {
1112
+ "epoch": 13.23,
1113
+ "eval_accuracy": 0.7226588081204977,
1114
+ "eval_loss": 1.3577048778533936,
1115
+ "eval_runtime": 0.7887,
1116
+ "eval_samples_per_second": 633.948,
1117
+ "eval_steps_per_second": 40.573,
1118
+ "step": 37000
1119
+ },
1120
+ {
1121
+ "epoch": 13.41,
1122
+ "learning_rate": 2.7665355738291028e-05,
1123
+ "loss": 1.1754,
1124
+ "step": 37500
1125
+ },
1126
+ {
1127
+ "epoch": 13.41,
1128
+ "eval_accuracy": 0.720593191776205,
1129
+ "eval_loss": 1.377700924873352,
1130
+ "eval_runtime": 0.8445,
1131
+ "eval_samples_per_second": 592.06,
1132
+ "eval_steps_per_second": 37.892,
1133
+ "step": 37500
1134
+ },
1135
+ {
1136
+ "epoch": 13.59,
1137
+ "learning_rate": 2.7367417471099987e-05,
1138
+ "loss": 1.1855,
1139
+ "step": 38000
1140
+ },
1141
+ {
1142
+ "epoch": 13.59,
1143
+ "eval_accuracy": 0.7354008578027054,
1144
+ "eval_loss": 1.350059151649475,
1145
+ "eval_runtime": 0.8109,
1146
+ "eval_samples_per_second": 616.607,
1147
+ "eval_steps_per_second": 39.463,
1148
+ "step": 38000
1149
+ },
1150
+ {
1151
+ "epoch": 13.76,
1152
+ "learning_rate": 2.7070075080443334e-05,
1153
+ "loss": 1.1644,
1154
+ "step": 38500
1155
+ },
1156
+ {
1157
+ "epoch": 13.76,
1158
+ "eval_accuracy": 0.7206685953069752,
1159
+ "eval_loss": 1.374656081199646,
1160
+ "eval_runtime": 0.8397,
1161
+ "eval_samples_per_second": 595.482,
1162
+ "eval_steps_per_second": 38.111,
1163
+ "step": 38500
1164
+ },
1165
+ {
1166
+ "epoch": 13.94,
1167
+ "learning_rate": 2.6772136813252297e-05,
1168
+ "loss": 1.1709,
1169
+ "step": 39000
1170
+ },
1171
+ {
1172
+ "epoch": 13.94,
1173
+ "eval_accuracy": 0.7183739837398374,
1174
+ "eval_loss": 1.3703839778900146,
1175
+ "eval_runtime": 0.8025,
1176
+ "eval_samples_per_second": 623.038,
1177
+ "eval_steps_per_second": 39.874,
1178
+ "step": 39000
1179
+ },
1180
+ {
1181
+ "epoch": 14.12,
1182
+ "learning_rate": 2.6474198546061256e-05,
1183
+ "loss": 1.1613,
1184
+ "step": 39500
1185
+ },
1186
+ {
1187
+ "epoch": 14.12,
1188
+ "eval_accuracy": 0.7246875,
1189
+ "eval_loss": 1.4306718111038208,
1190
+ "eval_runtime": 0.8499,
1191
+ "eval_samples_per_second": 588.275,
1192
+ "eval_steps_per_second": 37.65,
1193
+ "step": 39500
1194
+ },
1195
+ {
1196
+ "epoch": 14.3,
1197
+ "learning_rate": 2.617626027887022e-05,
1198
+ "loss": 1.1443,
1199
+ "step": 40000
1200
+ },
1201
+ {
1202
+ "epoch": 14.3,
1203
+ "eval_accuracy": 0.7220978573712824,
1204
+ "eval_loss": 1.3189983367919922,
1205
+ "eval_runtime": 0.7903,
1206
+ "eval_samples_per_second": 632.651,
1207
+ "eval_steps_per_second": 40.49,
1208
+ "step": 40000
1209
+ },
1210
+ {
1211
+ "epoch": 14.48,
1212
+ "learning_rate": 2.5878322011679178e-05,
1213
+ "loss": 1.1356,
1214
+ "step": 40500
1215
+ },
1216
+ {
1217
+ "epoch": 14.48,
1218
+ "eval_accuracy": 0.7331329325317302,
1219
+ "eval_loss": 1.3287793397903442,
1220
+ "eval_runtime": 0.7921,
1221
+ "eval_samples_per_second": 631.257,
1222
+ "eval_steps_per_second": 40.4,
1223
+ "step": 40500
1224
+ },
1225
+ {
1226
+ "epoch": 14.66,
1227
+ "learning_rate": 2.5580383744488147e-05,
1228
+ "loss": 1.1493,
1229
+ "step": 41000
1230
+ },
1231
+ {
1232
+ "epoch": 14.66,
1233
+ "eval_accuracy": 0.7240227196792516,
1234
+ "eval_loss": 1.3504801988601685,
1235
+ "eval_runtime": 0.8432,
1236
+ "eval_samples_per_second": 592.975,
1237
+ "eval_steps_per_second": 37.95,
1238
+ "step": 41000
1239
+ },
1240
+ {
1241
+ "epoch": 14.84,
1242
+ "learning_rate": 2.5283041353831487e-05,
1243
+ "loss": 1.1417,
1244
+ "step": 41500
1245
+ },
1246
+ {
1247
+ "epoch": 14.84,
1248
+ "eval_accuracy": 0.7320369149637442,
1249
+ "eval_loss": 1.31459379196167,
1250
+ "eval_runtime": 0.8272,
1251
+ "eval_samples_per_second": 604.463,
1252
+ "eval_steps_per_second": 38.686,
1253
+ "step": 41500
1254
+ },
1255
+ {
1256
+ "epoch": 15.02,
1257
+ "learning_rate": 2.498569896317483e-05,
1258
+ "loss": 1.1349,
1259
+ "step": 42000
1260
+ },
1261
+ {
1262
+ "epoch": 15.02,
1263
+ "eval_accuracy": 0.7333114107201578,
1264
+ "eval_loss": 1.3545522689819336,
1265
+ "eval_runtime": 0.8634,
1266
+ "eval_samples_per_second": 579.106,
1267
+ "eval_steps_per_second": 37.063,
1268
+ "step": 42000
1269
+ },
1270
+ {
1271
+ "epoch": 15.19,
1272
+ "learning_rate": 2.4687760695983793e-05,
1273
+ "loss": 1.1169,
1274
+ "step": 42500
1275
+ },
1276
+ {
1277
+ "epoch": 15.19,
1278
+ "eval_accuracy": 0.7246922024623803,
1279
+ "eval_loss": 1.37086021900177,
1280
+ "eval_runtime": 0.8611,
1281
+ "eval_samples_per_second": 580.685,
1282
+ "eval_steps_per_second": 37.164,
1283
+ "step": 42500
1284
+ },
1285
+ {
1286
+ "epoch": 15.37,
1287
+ "learning_rate": 2.4390418305327136e-05,
1288
+ "loss": 1.1187,
1289
+ "step": 43000
1290
+ },
1291
+ {
1292
+ "epoch": 15.37,
1293
+ "eval_accuracy": 0.7217795484727756,
1294
+ "eval_loss": 1.4242717027664185,
1295
+ "eval_runtime": 0.8265,
1296
+ "eval_samples_per_second": 604.985,
1297
+ "eval_steps_per_second": 38.719,
1298
+ "step": 43000
1299
+ },
1300
+ {
1301
+ "epoch": 15.55,
1302
+ "learning_rate": 2.4092480038136102e-05,
1303
+ "loss": 1.118,
1304
+ "step": 43500
1305
+ },
1306
+ {
1307
+ "epoch": 15.55,
1308
+ "eval_accuracy": 0.7264245251582806,
1309
+ "eval_loss": 1.3835431337356567,
1310
+ "eval_runtime": 0.8374,
1311
+ "eval_samples_per_second": 597.064,
1312
+ "eval_steps_per_second": 38.212,
1313
+ "step": 43500
1314
+ },
1315
+ {
1316
+ "epoch": 15.73,
1317
+ "learning_rate": 2.379454177094506e-05,
1318
+ "loss": 1.1165,
1319
+ "step": 44000
1320
+ },
1321
+ {
1322
+ "epoch": 15.73,
1323
+ "eval_accuracy": 0.7253818654533637,
1324
+ "eval_loss": 1.3239895105361938,
1325
+ "eval_runtime": 0.8499,
1326
+ "eval_samples_per_second": 588.29,
1327
+ "eval_steps_per_second": 37.651,
1328
+ "step": 44000
1329
+ },
1330
+ {
1331
+ "epoch": 15.91,
1332
+ "learning_rate": 2.3496603503754024e-05,
1333
+ "loss": 1.114,
1334
+ "step": 44500
1335
+ },
1336
+ {
1337
+ "epoch": 15.91,
1338
+ "eval_accuracy": 0.7382113821138211,
1339
+ "eval_loss": 1.3263858556747437,
1340
+ "eval_runtime": 0.8424,
1341
+ "eval_samples_per_second": 593.546,
1342
+ "eval_steps_per_second": 37.987,
1343
+ "step": 44500
1344
+ },
1345
+ {
1346
+ "epoch": 16.09,
1347
+ "learning_rate": 2.3198665236562986e-05,
1348
+ "loss": 1.105,
1349
+ "step": 45000
1350
+ },
1351
+ {
1352
+ "epoch": 16.09,
1353
+ "eval_accuracy": 0.7333548804137039,
1354
+ "eval_loss": 1.3213739395141602,
1355
+ "eval_runtime": 0.8677,
1356
+ "eval_samples_per_second": 576.224,
1357
+ "eval_steps_per_second": 36.878,
1358
+ "step": 45000
1359
+ },
1360
+ {
1361
+ "epoch": 16.27,
1362
+ "learning_rate": 2.2900726969371946e-05,
1363
+ "loss": 1.0924,
1364
+ "step": 45500
1365
+ },
1366
+ {
1367
+ "epoch": 16.27,
1368
+ "eval_accuracy": 0.7282392026578073,
1369
+ "eval_loss": 1.384667992591858,
1370
+ "eval_runtime": 0.9421,
1371
+ "eval_samples_per_second": 530.704,
1372
+ "eval_steps_per_second": 33.965,
1373
+ "step": 45500
1374
+ },
1375
+ {
1376
+ "epoch": 16.45,
1377
+ "learning_rate": 2.260278870218091e-05,
1378
+ "loss": 1.0915,
1379
+ "step": 46000
1380
+ },
1381
+ {
1382
+ "epoch": 16.45,
1383
+ "eval_accuracy": 0.7317073170731707,
1384
+ "eval_loss": 1.3603721857070923,
1385
+ "eval_runtime": 0.7951,
1386
+ "eval_samples_per_second": 628.874,
1387
+ "eval_steps_per_second": 40.248,
1388
+ "step": 46000
1389
+ },
1390
+ {
1391
+ "epoch": 16.62,
1392
+ "learning_rate": 2.230485043498987e-05,
1393
+ "loss": 1.0968,
1394
+ "step": 46500
1395
+ },
1396
+ {
1397
+ "epoch": 16.62,
1398
+ "eval_accuracy": 0.7319177173191772,
1399
+ "eval_loss": 1.3539705276489258,
1400
+ "eval_runtime": 0.8815,
1401
+ "eval_samples_per_second": 567.187,
1402
+ "eval_steps_per_second": 36.3,
1403
+ "step": 46500
1404
+ },
1405
+ {
1406
+ "epoch": 16.8,
1407
+ "learning_rate": 2.2006912167798833e-05,
1408
+ "loss": 1.0772,
1409
+ "step": 47000
1410
+ },
1411
+ {
1412
+ "epoch": 16.8,
1413
+ "eval_accuracy": 0.7306332369013179,
1414
+ "eval_loss": 1.2475004196166992,
1415
+ "eval_runtime": 0.8301,
1416
+ "eval_samples_per_second": 602.308,
1417
+ "eval_steps_per_second": 38.548,
1418
+ "step": 47000
1419
+ },
1420
+ {
1421
+ "epoch": 16.98,
1422
+ "learning_rate": 2.1708973900607796e-05,
1423
+ "loss": 1.0975,
1424
+ "step": 47500
1425
+ },
1426
+ {
1427
+ "epoch": 16.98,
1428
+ "eval_accuracy": 0.7448207826372903,
1429
+ "eval_loss": 1.2635700702667236,
1430
+ "eval_runtime": 0.8269,
1431
+ "eval_samples_per_second": 604.655,
1432
+ "eval_steps_per_second": 38.698,
1433
+ "step": 47500
1434
+ },
1435
+ {
1436
+ "epoch": 17.16,
1437
+ "learning_rate": 2.1411035633416755e-05,
1438
+ "loss": 1.0708,
1439
+ "step": 48000
1440
+ },
1441
+ {
1442
+ "epoch": 17.16,
1443
+ "eval_accuracy": 0.7182085648904871,
1444
+ "eval_loss": 1.4056382179260254,
1445
+ "eval_runtime": 0.8973,
1446
+ "eval_samples_per_second": 557.236,
1447
+ "eval_steps_per_second": 35.663,
1448
+ "step": 48000
1449
+ },
1450
+ {
1451
+ "epoch": 17.34,
1452
+ "learning_rate": 2.111309736622572e-05,
1453
+ "loss": 1.0654,
1454
+ "step": 48500
1455
+ },
1456
+ {
1457
+ "epoch": 17.34,
1458
+ "eval_accuracy": 0.727630285152409,
1459
+ "eval_loss": 1.3769292831420898,
1460
+ "eval_runtime": 0.8377,
1461
+ "eval_samples_per_second": 596.886,
1462
+ "eval_steps_per_second": 38.201,
1463
+ "step": 48500
1464
+ },
1465
+ {
1466
+ "epoch": 17.52,
1467
+ "learning_rate": 2.081515909903468e-05,
1468
+ "loss": 1.0676,
1469
+ "step": 49000
1470
+ },
1471
+ {
1472
+ "epoch": 17.52,
1473
+ "eval_accuracy": 0.7224234441883438,
1474
+ "eval_loss": 1.33571457862854,
1475
+ "eval_runtime": 0.7909,
1476
+ "eval_samples_per_second": 632.166,
1477
+ "eval_steps_per_second": 40.459,
1478
+ "step": 49000
1479
+ },
1480
+ {
1481
+ "epoch": 17.7,
1482
+ "learning_rate": 2.0517220831843643e-05,
1483
+ "loss": 1.0507,
1484
+ "step": 49500
1485
+ },
1486
+ {
1487
+ "epoch": 17.7,
1488
+ "eval_accuracy": 0.712369109947644,
1489
+ "eval_loss": 1.4087713956832886,
1490
+ "eval_runtime": 0.7955,
1491
+ "eval_samples_per_second": 628.504,
1492
+ "eval_steps_per_second": 40.224,
1493
+ "step": 49500
1494
+ },
1495
+ {
1496
+ "epoch": 17.88,
1497
+ "learning_rate": 2.0219282564652605e-05,
1498
+ "loss": 1.0424,
1499
+ "step": 50000
1500
+ },
1501
+ {
1502
+ "epoch": 17.88,
1503
+ "eval_accuracy": 0.7314667515112949,
1504
+ "eval_loss": 1.3146371841430664,
1505
+ "eval_runtime": 0.7881,
1506
+ "eval_samples_per_second": 634.428,
1507
+ "eval_steps_per_second": 40.603,
1508
+ "step": 50000
1509
+ },
1510
+ {
1511
+ "epoch": 18.06,
1512
+ "learning_rate": 1.9921344297461568e-05,
1513
+ "loss": 1.0524,
1514
+ "step": 50500
1515
+ },
1516
+ {
1517
+ "epoch": 18.06,
1518
+ "eval_accuracy": 0.7393395319012503,
1519
+ "eval_loss": 1.28960382938385,
1520
+ "eval_runtime": 0.8581,
1521
+ "eval_samples_per_second": 582.683,
1522
+ "eval_steps_per_second": 37.292,
1523
+ "step": 50500
1524
+ },
1525
+ {
1526
+ "epoch": 18.23,
1527
+ "learning_rate": 1.962340603027053e-05,
1528
+ "loss": 1.0349,
1529
+ "step": 51000
1530
+ },
1531
+ {
1532
+ "epoch": 18.23,
1533
+ "eval_accuracy": 0.7191558441558441,
1534
+ "eval_loss": 1.3986730575561523,
1535
+ "eval_runtime": 0.7904,
1536
+ "eval_samples_per_second": 632.599,
1537
+ "eval_steps_per_second": 40.486,
1538
+ "step": 51000
1539
+ },
1540
+ {
1541
+ "epoch": 18.41,
1542
+ "learning_rate": 1.932546776307949e-05,
1543
+ "loss": 1.0217,
1544
+ "step": 51500
1545
+ },
1546
+ {
1547
+ "epoch": 18.41,
1548
+ "eval_accuracy": 0.7380645161290322,
1549
+ "eval_loss": 1.2937612533569336,
1550
+ "eval_runtime": 0.8575,
1551
+ "eval_samples_per_second": 583.089,
1552
+ "eval_steps_per_second": 37.318,
1553
+ "step": 51500
1554
+ },
1555
+ {
1556
+ "epoch": 18.59,
1557
+ "learning_rate": 1.9028125372422833e-05,
1558
+ "loss": 1.0238,
1559
+ "step": 52000
1560
+ },
1561
+ {
1562
+ "epoch": 18.59,
1563
+ "eval_accuracy": 0.738654650788542,
1564
+ "eval_loss": 1.296163558959961,
1565
+ "eval_runtime": 0.8423,
1566
+ "eval_samples_per_second": 593.617,
1567
+ "eval_steps_per_second": 37.992,
1568
+ "step": 52000
1569
+ },
1570
+ {
1571
+ "epoch": 18.77,
1572
+ "learning_rate": 1.87301871052318e-05,
1573
+ "loss": 1.0292,
1574
+ "step": 52500
1575
+ },
1576
+ {
1577
+ "epoch": 18.77,
1578
+ "eval_accuracy": 0.737131757850437,
1579
+ "eval_loss": 1.3194587230682373,
1580
+ "eval_runtime": 0.8232,
1581
+ "eval_samples_per_second": 607.358,
1582
+ "eval_steps_per_second": 38.871,
1583
+ "step": 52500
1584
+ },
1585
+ {
1586
+ "epoch": 18.95,
1587
+ "learning_rate": 1.8433440591109523e-05,
1588
+ "loss": 1.0426,
1589
+ "step": 53000
1590
+ },
1591
+ {
1592
+ "epoch": 18.95,
1593
+ "eval_accuracy": 0.7411687025420931,
1594
+ "eval_loss": 1.2835460901260376,
1595
+ "eval_runtime": 0.7859,
1596
+ "eval_samples_per_second": 636.221,
1597
+ "eval_steps_per_second": 40.718,
1598
+ "step": 53000
1599
+ },
1600
+ {
1601
+ "epoch": 19.13,
1602
+ "learning_rate": 1.8135502323918486e-05,
1603
+ "loss": 1.0196,
1604
+ "step": 53500
1605
+ },
1606
+ {
1607
+ "epoch": 19.13,
1608
+ "eval_accuracy": 0.747275204359673,
1609
+ "eval_loss": 1.234621524810791,
1610
+ "eval_runtime": 0.8361,
1611
+ "eval_samples_per_second": 597.997,
1612
+ "eval_steps_per_second": 38.272,
1613
+ "step": 53500
1614
+ },
1615
+ {
1616
+ "epoch": 19.31,
1617
+ "learning_rate": 1.7837564056727445e-05,
1618
+ "loss": 1.012,
1619
+ "step": 54000
1620
+ },
1621
+ {
1622
+ "epoch": 19.31,
1623
+ "eval_accuracy": 0.7338292367399741,
1624
+ "eval_loss": 1.3665757179260254,
1625
+ "eval_runtime": 0.8157,
1626
+ "eval_samples_per_second": 612.938,
1627
+ "eval_steps_per_second": 39.228,
1628
+ "step": 54000
1629
+ },
1630
+ {
1631
+ "epoch": 19.49,
1632
+ "learning_rate": 1.753962578953641e-05,
1633
+ "loss": 1.0256,
1634
+ "step": 54500
1635
+ },
1636
+ {
1637
+ "epoch": 19.49,
1638
+ "eval_accuracy": 0.7364842991259307,
1639
+ "eval_loss": 1.3140363693237305,
1640
+ "eval_runtime": 0.7949,
1641
+ "eval_samples_per_second": 628.974,
1642
+ "eval_steps_per_second": 40.254,
1643
+ "step": 54500
1644
+ },
1645
+ {
1646
+ "epoch": 19.66,
1647
+ "learning_rate": 1.724168752234537e-05,
1648
+ "loss": 0.9824,
1649
+ "step": 55000
1650
+ },
1651
+ {
1652
+ "epoch": 19.66,
1653
+ "eval_accuracy": 0.7416496250852079,
1654
+ "eval_loss": 1.2764383554458618,
1655
+ "eval_runtime": 0.8178,
1656
+ "eval_samples_per_second": 611.417,
1657
+ "eval_steps_per_second": 39.131,
1658
+ "step": 55000
1659
+ },
1660
+ {
1661
+ "epoch": 19.84,
1662
+ "learning_rate": 1.6943749255154336e-05,
1663
+ "loss": 1.0048,
1664
+ "step": 55500
1665
+ },
1666
+ {
1667
+ "epoch": 19.84,
1668
+ "eval_accuracy": 0.7487891507910881,
1669
+ "eval_loss": 1.2514091730117798,
1670
+ "eval_runtime": 0.8164,
1671
+ "eval_samples_per_second": 612.474,
1672
+ "eval_steps_per_second": 39.198,
1673
+ "step": 55500
1674
+ },
1675
+ {
1676
+ "epoch": 20.02,
1677
+ "learning_rate": 1.6645810987963295e-05,
1678
+ "loss": 0.9947,
1679
+ "step": 56000
1680
+ },
1681
+ {
1682
+ "epoch": 20.02,
1683
+ "eval_accuracy": 0.7431572246976448,
1684
+ "eval_loss": 1.3350915908813477,
1685
+ "eval_runtime": 0.7912,
1686
+ "eval_samples_per_second": 631.988,
1687
+ "eval_steps_per_second": 40.447,
1688
+ "step": 56000
1689
+ },
1690
+ {
1691
+ "epoch": 20.2,
1692
+ "learning_rate": 1.634846859730664e-05,
1693
+ "loss": 0.977,
1694
+ "step": 56500
1695
+ },
1696
+ {
1697
+ "epoch": 20.2,
1698
+ "eval_accuracy": 0.7451045469631596,
1699
+ "eval_loss": 1.2854044437408447,
1700
+ "eval_runtime": 0.8499,
1701
+ "eval_samples_per_second": 588.28,
1702
+ "eval_steps_per_second": 37.65,
1703
+ "step": 56500
1704
+ },
1705
+ {
1706
+ "epoch": 20.38,
1707
+ "learning_rate": 1.60505303301156e-05,
1708
+ "loss": 0.9862,
1709
+ "step": 57000
1710
+ },
1711
+ {
1712
+ "epoch": 20.38,
1713
+ "eval_accuracy": 0.7285475792988314,
1714
+ "eval_loss": 1.366584300994873,
1715
+ "eval_runtime": 0.816,
1716
+ "eval_samples_per_second": 612.774,
1717
+ "eval_steps_per_second": 39.218,
1718
+ "step": 57000
1719
+ },
1720
+ {
1721
+ "epoch": 20.56,
1722
+ "learning_rate": 1.5752592062924564e-05,
1723
+ "loss": 0.9699,
1724
+ "step": 57500
1725
+ },
1726
+ {
1727
+ "epoch": 20.56,
1728
+ "eval_accuracy": 0.7347811780190853,
1729
+ "eval_loss": 1.3123427629470825,
1730
+ "eval_runtime": 0.7779,
1731
+ "eval_samples_per_second": 642.731,
1732
+ "eval_steps_per_second": 41.135,
1733
+ "step": 57500
1734
+ },
1735
+ {
1736
+ "epoch": 20.74,
1737
+ "learning_rate": 1.5454653795733526e-05,
1738
+ "loss": 0.977,
1739
+ "step": 58000
1740
+ },
1741
+ {
1742
+ "epoch": 20.74,
1743
+ "eval_accuracy": 0.7254770672915969,
1744
+ "eval_loss": 1.3425793647766113,
1745
+ "eval_runtime": 0.8285,
1746
+ "eval_samples_per_second": 603.485,
1747
+ "eval_steps_per_second": 38.623,
1748
+ "step": 58000
1749
+ },
1750
+ {
1751
+ "epoch": 20.92,
1752
+ "learning_rate": 1.5157311405076868e-05,
1753
+ "loss": 0.9749,
1754
+ "step": 58500
1755
+ },
1756
+ {
1757
+ "epoch": 20.92,
1758
+ "eval_accuracy": 0.7296604740550929,
1759
+ "eval_loss": 1.3763371706008911,
1760
+ "eval_runtime": 0.7855,
1761
+ "eval_samples_per_second": 636.556,
1762
+ "eval_steps_per_second": 40.74,
1763
+ "step": 58500
1764
+ },
1765
+ {
1766
+ "epoch": 21.09,
1767
+ "learning_rate": 1.4859373137885832e-05,
1768
+ "loss": 0.9505,
1769
+ "step": 59000
1770
+ },
1771
+ {
1772
+ "epoch": 21.09,
1773
+ "eval_accuracy": 0.7434469200524246,
1774
+ "eval_loss": 1.2372225522994995,
1775
+ "eval_runtime": 0.7967,
1776
+ "eval_samples_per_second": 627.592,
1777
+ "eval_steps_per_second": 40.166,
1778
+ "step": 59000
1779
+ },
1780
+ {
1781
+ "epoch": 21.27,
1782
+ "learning_rate": 1.4561434870694793e-05,
1783
+ "loss": 0.9438,
1784
+ "step": 59500
1785
+ },
1786
+ {
1787
+ "epoch": 21.27,
1788
+ "eval_accuracy": 0.7159090909090909,
1789
+ "eval_loss": 1.433412790298462,
1790
+ "eval_runtime": 0.7929,
1791
+ "eval_samples_per_second": 630.567,
1792
+ "eval_steps_per_second": 40.356,
1793
+ "step": 59500
1794
+ },
1795
+ {
1796
+ "epoch": 21.45,
1797
+ "learning_rate": 1.4263496603503754e-05,
1798
+ "loss": 0.944,
1799
+ "step": 60000
1800
+ },
1801
+ {
1802
+ "epoch": 21.45,
1803
+ "eval_accuracy": 0.7507936507936508,
1804
+ "eval_loss": 1.269033432006836,
1805
+ "eval_runtime": 0.8274,
1806
+ "eval_samples_per_second": 604.314,
1807
+ "eval_steps_per_second": 38.676,
1808
+ "step": 60000
1809
+ },
1810
+ {
1811
+ "epoch": 21.63,
1812
+ "learning_rate": 1.3965558336312718e-05,
1813
+ "loss": 0.9427,
1814
+ "step": 60500
1815
+ },
1816
+ {
1817
+ "epoch": 21.63,
1818
+ "eval_accuracy": 0.7485941118094608,
1819
+ "eval_loss": 1.2185914516448975,
1820
+ "eval_runtime": 0.7923,
1821
+ "eval_samples_per_second": 631.05,
1822
+ "eval_steps_per_second": 40.387,
1823
+ "step": 60500
1824
+ },
1825
+ {
1826
+ "epoch": 21.81,
1827
+ "learning_rate": 1.3667620069121679e-05,
1828
+ "loss": 0.9553,
1829
+ "step": 61000
1830
+ },
1831
+ {
1832
+ "epoch": 21.81,
1833
+ "eval_accuracy": 0.726882430647292,
1834
+ "eval_loss": 1.3940554857254028,
1835
+ "eval_runtime": 0.7961,
1836
+ "eval_samples_per_second": 628.083,
1837
+ "eval_steps_per_second": 40.197,
1838
+ "step": 61000
1839
+ },
1840
+ {
1841
+ "epoch": 21.99,
1842
+ "learning_rate": 1.3369681801930641e-05,
1843
+ "loss": 0.9571,
1844
+ "step": 61500
1845
+ },
1846
+ {
1847
+ "epoch": 21.99,
1848
+ "eval_accuracy": 0.7273940607273941,
1849
+ "eval_loss": 1.4162867069244385,
1850
+ "eval_runtime": 0.791,
1851
+ "eval_samples_per_second": 632.128,
1852
+ "eval_steps_per_second": 40.456,
1853
+ "step": 61500
1854
+ },
1855
+ {
1856
+ "epoch": 22.17,
1857
+ "learning_rate": 1.3071743534739602e-05,
1858
+ "loss": 0.932,
1859
+ "step": 62000
1860
+ },
1861
+ {
1862
+ "epoch": 22.17,
1863
+ "eval_accuracy": 0.7522727272727273,
1864
+ "eval_loss": 1.2717351913452148,
1865
+ "eval_runtime": 0.796,
1866
+ "eval_samples_per_second": 628.103,
1867
+ "eval_steps_per_second": 40.199,
1868
+ "step": 62000
1869
+ },
1870
+ {
1871
+ "epoch": 22.35,
1872
+ "learning_rate": 1.2773805267548563e-05,
1873
+ "loss": 0.9166,
1874
+ "step": 62500
1875
+ },
1876
+ {
1877
+ "epoch": 22.35,
1878
+ "eval_accuracy": 0.73956326268465,
1879
+ "eval_loss": 1.217714786529541,
1880
+ "eval_runtime": 0.8289,
1881
+ "eval_samples_per_second": 603.185,
1882
+ "eval_steps_per_second": 38.604,
1883
+ "step": 62500
1884
+ },
1885
+ {
1886
+ "epoch": 22.52,
1887
+ "learning_rate": 1.2475867000357526e-05,
1888
+ "loss": 0.9301,
1889
+ "step": 63000
1890
+ },
1891
+ {
1892
+ "epoch": 22.52,
1893
+ "eval_accuracy": 0.7377950210151956,
1894
+ "eval_loss": 1.3264496326446533,
1895
+ "eval_runtime": 0.8524,
1896
+ "eval_samples_per_second": 586.56,
1897
+ "eval_steps_per_second": 37.54,
1898
+ "step": 63000
1899
+ },
1900
+ {
1901
+ "epoch": 22.7,
1902
+ "learning_rate": 1.2177928733166488e-05,
1903
+ "loss": 0.9351,
1904
+ "step": 63500
1905
+ },
1906
+ {
1907
+ "epoch": 22.7,
1908
+ "eval_accuracy": 0.752010292698617,
1909
+ "eval_loss": 1.2570440769195557,
1910
+ "eval_runtime": 0.785,
1911
+ "eval_samples_per_second": 636.94,
1912
+ "eval_steps_per_second": 40.764,
1913
+ "step": 63500
1914
+ },
1915
+ {
1916
+ "epoch": 22.88,
1917
+ "learning_rate": 1.1879990465975451e-05,
1918
+ "loss": 0.9211,
1919
+ "step": 64000
1920
+ },
1921
+ {
1922
+ "epoch": 22.88,
1923
+ "eval_accuracy": 0.75,
1924
+ "eval_loss": 1.2638896703720093,
1925
+ "eval_runtime": 0.8753,
1926
+ "eval_samples_per_second": 571.265,
1927
+ "eval_steps_per_second": 36.561,
1928
+ "step": 64000
1929
+ },
1930
+ {
1931
+ "epoch": 23.06,
1932
+ "learning_rate": 1.1582052198784414e-05,
1933
+ "loss": 0.9211,
1934
+ "step": 64500
1935
+ },
1936
+ {
1937
+ "epoch": 23.06,
1938
+ "eval_accuracy": 0.7605543022881083,
1939
+ "eval_loss": 1.2376515865325928,
1940
+ "eval_runtime": 0.7946,
1941
+ "eval_samples_per_second": 629.265,
1942
+ "eval_steps_per_second": 40.273,
1943
+ "step": 64500
1944
+ },
1945
+ {
1946
+ "epoch": 23.24,
1947
+ "learning_rate": 1.1284113931593374e-05,
1948
+ "loss": 0.9196,
1949
+ "step": 65000
1950
+ },
1951
+ {
1952
+ "epoch": 23.24,
1953
+ "eval_accuracy": 0.7485168094924193,
1954
+ "eval_loss": 1.2738728523254395,
1955
+ "eval_runtime": 0.8576,
1956
+ "eval_samples_per_second": 583.036,
1957
+ "eval_steps_per_second": 37.314,
1958
+ "step": 65000
1959
+ },
1960
+ {
1961
+ "epoch": 23.42,
1962
+ "learning_rate": 1.098677154093672e-05,
1963
+ "loss": 0.9062,
1964
+ "step": 65500
1965
+ },
1966
+ {
1967
+ "epoch": 23.42,
1968
+ "eval_accuracy": 0.7365366010964205,
1969
+ "eval_loss": 1.3262896537780762,
1970
+ "eval_runtime": 0.8401,
1971
+ "eval_samples_per_second": 595.164,
1972
+ "eval_steps_per_second": 38.09,
1973
+ "step": 65500
1974
+ },
1975
+ {
1976
+ "epoch": 23.6,
1977
+ "learning_rate": 1.068883327374568e-05,
1978
+ "loss": 0.8965,
1979
+ "step": 66000
1980
+ },
1981
+ {
1982
+ "epoch": 23.6,
1983
+ "eval_accuracy": 0.7455209024552091,
1984
+ "eval_loss": 1.2814128398895264,
1985
+ "eval_runtime": 0.778,
1986
+ "eval_samples_per_second": 642.691,
1987
+ "eval_steps_per_second": 41.132,
1988
+ "step": 66000
1989
+ },
1990
+ {
1991
+ "epoch": 23.78,
1992
+ "learning_rate": 1.0392086759623406e-05,
1993
+ "loss": 0.9004,
1994
+ "step": 66500
1995
+ },
1996
+ {
1997
+ "epoch": 23.78,
1998
+ "eval_accuracy": 0.7561779242174629,
1999
+ "eval_loss": 1.2108628749847412,
2000
+ "eval_runtime": 0.8669,
2001
+ "eval_samples_per_second": 576.736,
2002
+ "eval_steps_per_second": 36.911,
2003
+ "step": 66500
2004
+ },
2005
+ {
2006
+ "epoch": 23.95,
2007
+ "learning_rate": 1.0094148492432369e-05,
2008
+ "loss": 0.9094,
2009
+ "step": 67000
2010
+ },
2011
+ {
2012
+ "epoch": 23.95,
2013
+ "eval_accuracy": 0.7528089887640449,
2014
+ "eval_loss": 1.2629289627075195,
2015
+ "eval_runtime": 0.8653,
2016
+ "eval_samples_per_second": 577.859,
2017
+ "eval_steps_per_second": 36.983,
2018
+ "step": 67000
2019
+ },
2020
+ {
2021
+ "epoch": 24.13,
2022
+ "learning_rate": 9.79621022524133e-06,
2023
+ "loss": 0.8937,
2024
+ "step": 67500
2025
+ },
2026
+ {
2027
+ "epoch": 24.13,
2028
+ "eval_accuracy": 0.7375168690958165,
2029
+ "eval_loss": 1.2770532369613647,
2030
+ "eval_runtime": 0.8492,
2031
+ "eval_samples_per_second": 588.814,
2032
+ "eval_steps_per_second": 37.684,
2033
+ "step": 67500
2034
+ },
2035
+ {
2036
+ "epoch": 24.31,
2037
+ "learning_rate": 9.498271958050292e-06,
2038
+ "loss": 0.8711,
2039
+ "step": 68000
2040
+ },
2041
+ {
2042
+ "epoch": 24.31,
2043
+ "eval_accuracy": 0.7353233830845771,
2044
+ "eval_loss": 1.3746039867401123,
2045
+ "eval_runtime": 0.7929,
2046
+ "eval_samples_per_second": 630.629,
2047
+ "eval_steps_per_second": 40.36,
2048
+ "step": 68000
2049
+ },
2050
+ {
2051
+ "epoch": 24.49,
2052
+ "learning_rate": 9.200333690859255e-06,
2053
+ "loss": 0.8972,
2054
+ "step": 68500
2055
+ },
2056
+ {
2057
+ "epoch": 24.49,
2058
+ "eval_accuracy": 0.7453750420450723,
2059
+ "eval_loss": 1.2529133558273315,
2060
+ "eval_runtime": 0.8497,
2061
+ "eval_samples_per_second": 588.462,
2062
+ "eval_steps_per_second": 37.662,
2063
+ "step": 68500
2064
+ },
2065
+ {
2066
+ "epoch": 24.67,
2067
+ "learning_rate": 8.902395423668217e-06,
2068
+ "loss": 0.8863,
2069
+ "step": 69000
2070
+ },
2071
+ {
2072
+ "epoch": 24.67,
2073
+ "eval_accuracy": 0.7359154929577465,
2074
+ "eval_loss": 1.3219196796417236,
2075
+ "eval_runtime": 0.8149,
2076
+ "eval_samples_per_second": 613.598,
2077
+ "eval_steps_per_second": 39.27,
2078
+ "step": 69000
2079
+ },
2080
+ {
2081
+ "epoch": 24.85,
2082
+ "learning_rate": 8.604457156477178e-06,
2083
+ "loss": 0.8823,
2084
+ "step": 69500
2085
+ },
2086
+ {
2087
+ "epoch": 24.85,
2088
+ "eval_accuracy": 0.7367235275185066,
2089
+ "eval_loss": 1.313620924949646,
2090
+ "eval_runtime": 0.8311,
2091
+ "eval_samples_per_second": 601.621,
2092
+ "eval_steps_per_second": 38.504,
2093
+ "step": 69500
2094
+ },
2095
+ {
2096
+ "epoch": 25.03,
2097
+ "learning_rate": 8.306518889286139e-06,
2098
+ "loss": 0.8759,
2099
+ "step": 70000
2100
+ },
2101
+ {
2102
+ "epoch": 25.03,
2103
+ "eval_accuracy": 0.7427812811151676,
2104
+ "eval_loss": 1.3151708841323853,
2105
+ "eval_runtime": 0.7986,
2106
+ "eval_samples_per_second": 626.093,
2107
+ "eval_steps_per_second": 40.07,
2108
+ "step": 70000
2109
+ },
2110
+ {
2111
+ "epoch": 25.21,
2112
+ "learning_rate": 8.008580622095102e-06,
2113
+ "loss": 0.8722,
2114
+ "step": 70500
2115
+ },
2116
+ {
2117
+ "epoch": 25.21,
2118
+ "eval_accuracy": 0.7569644572526417,
2119
+ "eval_loss": 1.3108021020889282,
2120
+ "eval_runtime": 0.8281,
2121
+ "eval_samples_per_second": 603.782,
2122
+ "eval_steps_per_second": 38.642,
2123
+ "step": 70500
2124
+ },
2125
+ {
2126
+ "epoch": 25.38,
2127
+ "learning_rate": 7.710642354904064e-06,
2128
+ "loss": 0.8548,
2129
+ "step": 71000
2130
+ },
2131
+ {
2132
+ "epoch": 25.38,
2133
+ "eval_accuracy": 0.7367716008037508,
2134
+ "eval_loss": 1.3503183126449585,
2135
+ "eval_runtime": 0.7871,
2136
+ "eval_samples_per_second": 635.233,
2137
+ "eval_steps_per_second": 40.655,
2138
+ "step": 71000
2139
+ },
2140
+ {
2141
+ "epoch": 25.56,
2142
+ "learning_rate": 7.412704087713027e-06,
2143
+ "loss": 0.8728,
2144
+ "step": 71500
2145
+ },
2146
+ {
2147
+ "epoch": 25.56,
2148
+ "eval_accuracy": 0.7402768622280818,
2149
+ "eval_loss": 1.3091211318969727,
2150
+ "eval_runtime": 0.8581,
2151
+ "eval_samples_per_second": 582.712,
2152
+ "eval_steps_per_second": 37.294,
2153
+ "step": 71500
2154
+ },
2155
+ {
2156
+ "epoch": 25.74,
2157
+ "learning_rate": 7.114765820521989e-06,
2158
+ "loss": 0.8633,
2159
+ "step": 72000
2160
+ },
2161
+ {
2162
+ "epoch": 25.74,
2163
+ "eval_accuracy": 0.7416481069042317,
2164
+ "eval_loss": 1.2952070236206055,
2165
+ "eval_runtime": 0.8515,
2166
+ "eval_samples_per_second": 587.213,
2167
+ "eval_steps_per_second": 37.582,
2168
+ "step": 72000
2169
+ },
2170
+ {
2171
+ "epoch": 25.92,
2172
+ "learning_rate": 6.816827553330949e-06,
2173
+ "loss": 0.8612,
2174
+ "step": 72500
2175
+ },
2176
+ {
2177
+ "epoch": 25.92,
2178
+ "eval_accuracy": 0.7719072164948454,
2179
+ "eval_loss": 1.1612097024917603,
2180
+ "eval_runtime": 0.7967,
2181
+ "eval_samples_per_second": 627.618,
2182
+ "eval_steps_per_second": 40.168,
2183
+ "step": 72500
2184
+ },
2185
+ {
2186
+ "epoch": 26.1,
2187
+ "learning_rate": 6.5194851626742935e-06,
2188
+ "loss": 0.8677,
2189
+ "step": 73000
2190
+ },
2191
+ {
2192
+ "epoch": 26.1,
2193
+ "eval_accuracy": 0.7449731903485255,
2194
+ "eval_loss": 1.2855061292648315,
2195
+ "eval_runtime": 0.8112,
2196
+ "eval_samples_per_second": 616.391,
2197
+ "eval_steps_per_second": 39.449,
2198
+ "step": 73000
2199
+ },
2200
+ {
2201
+ "epoch": 26.28,
2202
+ "learning_rate": 6.2221427720176384e-06,
2203
+ "loss": 0.8526,
2204
+ "step": 73500
2205
+ },
2206
+ {
2207
+ "epoch": 26.28,
2208
+ "eval_accuracy": 0.7544929396662388,
2209
+ "eval_loss": 1.297914981842041,
2210
+ "eval_runtime": 0.8472,
2211
+ "eval_samples_per_second": 590.203,
2212
+ "eval_steps_per_second": 37.773,
2213
+ "step": 73500
2214
+ },
2215
+ {
2216
+ "epoch": 26.46,
2217
+ "learning_rate": 5.9242045048266e-06,
2218
+ "loss": 0.8594,
2219
+ "step": 74000
2220
+ },
2221
+ {
2222
+ "epoch": 26.46,
2223
+ "eval_accuracy": 0.7598070739549839,
2224
+ "eval_loss": 1.2569819688796997,
2225
+ "eval_runtime": 0.7923,
2226
+ "eval_samples_per_second": 631.066,
2227
+ "eval_steps_per_second": 40.388,
2228
+ "step": 74000
2229
+ },
2230
+ {
2231
+ "epoch": 26.64,
2232
+ "learning_rate": 5.626266237635562e-06,
2233
+ "loss": 0.8481,
2234
+ "step": 74500
2235
+ },
2236
+ {
2237
+ "epoch": 26.64,
2238
+ "eval_accuracy": 0.7491992312620115,
2239
+ "eval_loss": 1.2336714267730713,
2240
+ "eval_runtime": 0.8668,
2241
+ "eval_samples_per_second": 576.839,
2242
+ "eval_steps_per_second": 36.918,
2243
+ "step": 74500
2244
+ },
2245
+ {
2246
+ "epoch": 26.81,
2247
+ "learning_rate": 5.3283279704445245e-06,
2248
+ "loss": 0.855,
2249
+ "step": 75000
2250
+ },
2251
+ {
2252
+ "epoch": 26.81,
2253
+ "eval_accuracy": 0.7443507588532884,
2254
+ "eval_loss": 1.2874828577041626,
2255
+ "eval_runtime": 0.7926,
2256
+ "eval_samples_per_second": 630.803,
2257
+ "eval_steps_per_second": 40.371,
2258
+ "step": 75000
2259
+ },
2260
+ {
2261
+ "epoch": 26.99,
2262
+ "learning_rate": 5.030389703253486e-06,
2263
+ "loss": 0.835,
2264
+ "step": 75500
2265
+ },
2266
+ {
2267
+ "epoch": 26.99,
2268
+ "eval_accuracy": 0.7584731819677526,
2269
+ "eval_loss": 1.2270281314849854,
2270
+ "eval_runtime": 0.8172,
2271
+ "eval_samples_per_second": 611.826,
2272
+ "eval_steps_per_second": 39.157,
2273
+ "step": 75500
2274
+ },
2275
+ {
2276
+ "epoch": 27.17,
2277
+ "learning_rate": 4.732451436062448e-06,
2278
+ "loss": 0.8309,
2279
+ "step": 76000
2280
+ },
2281
+ {
2282
+ "epoch": 27.17,
2283
+ "eval_accuracy": 0.7389322916666666,
2284
+ "eval_loss": 1.2539992332458496,
2285
+ "eval_runtime": 0.8357,
2286
+ "eval_samples_per_second": 598.292,
2287
+ "eval_steps_per_second": 38.291,
2288
+ "step": 76000
2289
+ },
2290
+ {
2291
+ "epoch": 27.35,
2292
+ "learning_rate": 4.43451316887141e-06,
2293
+ "loss": 0.8326,
2294
+ "step": 76500
2295
+ },
2296
+ {
2297
+ "epoch": 27.35,
2298
+ "eval_accuracy": 0.7374631268436578,
2299
+ "eval_loss": 1.3610546588897705,
2300
+ "eval_runtime": 0.7953,
2301
+ "eval_samples_per_second": 628.676,
2302
+ "eval_steps_per_second": 40.235,
2303
+ "step": 76500
2304
+ },
2305
+ {
2306
+ "epoch": 27.53,
2307
+ "learning_rate": 4.136574901680372e-06,
2308
+ "loss": 0.8398,
2309
+ "step": 77000
2310
+ },
2311
+ {
2312
+ "epoch": 27.53,
2313
+ "eval_accuracy": 0.7504918032786885,
2314
+ "eval_loss": 1.2247506380081177,
2315
+ "eval_runtime": 0.859,
2316
+ "eval_samples_per_second": 582.099,
2317
+ "eval_steps_per_second": 37.254,
2318
+ "step": 77000
2319
+ },
2320
+ {
2321
+ "epoch": 27.71,
2322
+ "learning_rate": 3.838636634489334e-06,
2323
+ "loss": 0.8304,
2324
+ "step": 77500
2325
+ },
2326
+ {
2327
+ "epoch": 27.71,
2328
+ "eval_accuracy": 0.7607282184655396,
2329
+ "eval_loss": 1.2403171062469482,
2330
+ "eval_runtime": 0.9471,
2331
+ "eval_samples_per_second": 527.922,
2332
+ "eval_steps_per_second": 33.787,
2333
+ "step": 77500
2334
+ },
2335
+ {
2336
+ "epoch": 27.89,
2337
+ "learning_rate": 3.5406983672982957e-06,
2338
+ "loss": 0.8373,
2339
+ "step": 78000
2340
+ },
2341
+ {
2342
+ "epoch": 27.89,
2343
+ "eval_accuracy": 0.7611295681063123,
2344
+ "eval_loss": 1.1708660125732422,
2345
+ "eval_runtime": 0.8284,
2346
+ "eval_samples_per_second": 603.609,
2347
+ "eval_steps_per_second": 38.631,
2348
+ "step": 78000
2349
+ },
2350
+ {
2351
+ "epoch": 28.07,
2352
+ "learning_rate": 3.2427601001072583e-06,
2353
+ "loss": 0.8462,
2354
+ "step": 78500
2355
+ },
2356
+ {
2357
+ "epoch": 28.07,
2358
+ "eval_accuracy": 0.7508185985592666,
2359
+ "eval_loss": 1.289104700088501,
2360
+ "eval_runtime": 0.8603,
2361
+ "eval_samples_per_second": 581.16,
2362
+ "eval_steps_per_second": 37.194,
2363
+ "step": 78500
2364
+ },
2365
+ {
2366
+ "epoch": 28.24,
2367
+ "learning_rate": 2.945417709450602e-06,
2368
+ "loss": 0.8259,
2369
+ "step": 79000
2370
+ },
2371
+ {
2372
+ "epoch": 28.24,
2373
+ "eval_accuracy": 0.7500814597588791,
2374
+ "eval_loss": 1.2452012300491333,
2375
+ "eval_runtime": 0.8046,
2376
+ "eval_samples_per_second": 621.394,
2377
+ "eval_steps_per_second": 39.769,
2378
+ "step": 79000
2379
+ },
2380
+ {
2381
+ "epoch": 28.42,
2382
+ "learning_rate": 2.647479442259564e-06,
2383
+ "loss": 0.8334,
2384
+ "step": 79500
2385
+ },
2386
+ {
2387
+ "epoch": 28.42,
2388
+ "eval_accuracy": 0.746810598626104,
2389
+ "eval_loss": 1.2985996007919312,
2390
+ "eval_runtime": 0.9197,
2391
+ "eval_samples_per_second": 543.676,
2392
+ "eval_steps_per_second": 34.795,
2393
+ "step": 79500
2394
+ },
2395
+ {
2396
+ "epoch": 28.6,
2397
+ "learning_rate": 2.349541175068526e-06,
2398
+ "loss": 0.8115,
2399
+ "step": 80000
2400
+ },
2401
+ {
2402
+ "epoch": 28.6,
2403
+ "eval_accuracy": 0.7514638906961614,
2404
+ "eval_loss": 1.2879589796066284,
2405
+ "eval_runtime": 0.7986,
2406
+ "eval_samples_per_second": 626.129,
2407
+ "eval_steps_per_second": 40.072,
2408
+ "step": 80000
2409
+ },
2410
+ {
2411
+ "epoch": 28.78,
2412
+ "learning_rate": 2.0516029078774876e-06,
2413
+ "loss": 0.8205,
2414
+ "step": 80500
2415
+ },
2416
+ {
2417
+ "epoch": 28.78,
2418
+ "eval_accuracy": 0.75615359369872,
2419
+ "eval_loss": 1.2727956771850586,
2420
+ "eval_runtime": 0.8652,
2421
+ "eval_samples_per_second": 577.899,
2422
+ "eval_steps_per_second": 36.986,
2423
+ "step": 80500
2424
+ },
2425
+ {
2426
+ "epoch": 28.96,
2427
+ "learning_rate": 1.7536646406864498e-06,
2428
+ "loss": 0.8261,
2429
+ "step": 81000
2430
+ },
2431
+ {
2432
+ "epoch": 28.96,
2433
+ "eval_accuracy": 0.7523561910952227,
2434
+ "eval_loss": 1.2660555839538574,
2435
+ "eval_runtime": 0.7893,
2436
+ "eval_samples_per_second": 633.494,
2437
+ "eval_steps_per_second": 40.544,
2438
+ "step": 81000
2439
+ },
2440
+ {
2441
+ "epoch": 29.14,
2442
+ "learning_rate": 1.4563222500297937e-06,
2443
+ "loss": 0.8299,
2444
+ "step": 81500
2445
+ },
2446
+ {
2447
+ "epoch": 29.14,
2448
+ "eval_accuracy": 0.7486106570774763,
2449
+ "eval_loss": 1.25924813747406,
2450
+ "eval_runtime": 0.8513,
2451
+ "eval_samples_per_second": 587.342,
2452
+ "eval_steps_per_second": 37.59,
2453
+ "step": 81500
2454
+ },
2455
+ {
2456
+ "epoch": 29.32,
2457
+ "learning_rate": 1.1583839828387559e-06,
2458
+ "loss": 0.8276,
2459
+ "step": 82000
2460
+ },
2461
+ {
2462
+ "epoch": 29.32,
2463
+ "eval_accuracy": 0.7529644268774703,
2464
+ "eval_loss": 1.2325080633163452,
2465
+ "eval_runtime": 0.8587,
2466
+ "eval_samples_per_second": 582.291,
2467
+ "eval_steps_per_second": 37.267,
2468
+ "step": 82000
2469
+ },
2470
+ {
2471
+ "epoch": 29.5,
2472
+ "learning_rate": 8.604457156477178e-07,
2473
+ "loss": 0.8112,
2474
+ "step": 82500
2475
+ },
2476
+ {
2477
+ "epoch": 29.5,
2478
+ "eval_accuracy": 0.7477890599410416,
2479
+ "eval_loss": 1.3154096603393555,
2480
+ "eval_runtime": 0.8166,
2481
+ "eval_samples_per_second": 612.267,
2482
+ "eval_steps_per_second": 39.185,
2483
+ "step": 82500
2484
+ },
2485
+ {
2486
+ "epoch": 29.67,
2487
+ "learning_rate": 5.625074484566799e-07,
2488
+ "loss": 0.8111,
2489
+ "step": 83000
2490
+ },
2491
+ {
2492
+ "epoch": 29.67,
2493
+ "eval_accuracy": 0.740531561461794,
2494
+ "eval_loss": 1.3342524766921997,
2495
+ "eval_runtime": 0.8076,
2496
+ "eval_samples_per_second": 619.083,
2497
+ "eval_steps_per_second": 39.621,
2498
+ "step": 83000
2499
+ },
2500
+ {
2501
+ "epoch": 29.85,
2502
+ "learning_rate": 2.645691812656418e-07,
2503
+ "loss": 0.8148,
2504
+ "step": 83500
2505
+ },
2506
+ {
2507
+ "epoch": 29.85,
2508
+ "eval_accuracy": 0.7484622855292975,
2509
+ "eval_loss": 1.2806158065795898,
2510
+ "eval_runtime": 0.8122,
2511
+ "eval_samples_per_second": 615.596,
2512
+ "eval_steps_per_second": 39.398,
2513
+ "step": 83500
2514
+ },
2515
+ {
2516
+ "epoch": 30.0,
2517
+ "step": 83910,
2518
+ "total_flos": 3.583580261367381e+17,
2519
+ "train_loss": 1.1746184680817338,
2520
+ "train_runtime": 16410.0948,
2521
+ "train_samples_per_second": 163.619,
2522
+ "train_steps_per_second": 5.113
2523
+ }
2524
+ ],
2525
+ "max_steps": 83910,
2526
+ "num_train_epochs": 30,
2527
+ "total_flos": 3.583580261367381e+17,
2528
+ "trial_name": null,
2529
+ "trial_params": null
2530
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f45b37904a4dc306c6f60adc2099b53a85a7087592415bc16975aeb7298b9aa0
3
+ size 3375
vocab.json ADDED
The diff for this file is too large to render. See raw diff