chansung committed on
Commit
989ee20
1 Parent(s): 4769f68

Model save

README.md ADDED
@@ -0,0 +1,78 @@
+ ---
+ license: gemma
+ library_name: peft
+ tags:
+ - trl
+ - sft
+ - generated_from_trainer
+ base_model: google/gemma-7b
+ datasets:
+ - generator
+ model-index:
+ - name: coding_llamaduo_60k_v0.2
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # coding_llamaduo_60k_v0.2
+
+ This model is a fine-tuned version of [google/gemma-7b](https://huggingface.co/google/gemma-7b) on the generator dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 1.3326
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.0002
+ - train_batch_size: 4
+ - eval_batch_size: 4
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 4
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 32
+ - total_eval_batch_size: 16
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 10
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:-----:|:----:|:---------------:|
+ | 0.7499 | 1.0 | 126 | 1.2580 |
+ | 0.6058 | 2.0 | 252 | 1.1687 |
+ | 0.5571 | 3.0 | 378 | 1.1492 |
+ | 0.5118 | 4.0 | 504 | 1.1551 |
+ | 0.4711 | 5.0 | 630 | 1.1767 |
+ | 0.4287 | 6.0 | 756 | 1.1948 |
+ | 0.3943 | 7.0 | 882 | 1.2383 |
+ | 0.3612 | 8.0 | 1008 | 1.2904 |
+ | 0.3457 | 9.0 | 1134 | 1.3253 |
+ | 0.3328 | 10.0 | 1260 | 1.3326 |
+
+ ### Framework versions
+
+ - PEFT 0.7.1
+ - Transformers 4.40.1
+ - Pytorch 2.2.2+cu121
+ - Datasets 2.19.0
+ - Tokenizers 0.19.1
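For reference, the two total batch sizes listed above follow from the per-device batch size, the device count, and gradient accumulation; a minimal sketch of that arithmetic (plain Python, no Trainer involved):

```python
# Effective batch sizes implied by the hyperparameters in the model card.
train_batch_size = 4             # per-device train batch size
eval_batch_size = 4              # per-device eval batch size
num_devices = 4                  # multi-GPU setup
gradient_accumulation_steps = 2  # micro-batches accumulated per optimizer step

# One optimizer step sees per-device batch x devices x accumulation samples.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# Evaluation does no gradient accumulation, so only the device count scales it.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size)  # 32, matching total_train_batch_size above
print(total_eval_batch_size)   # 16, matching total_eval_batch_size above
```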
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:623fbfc6abc9137caedcf170d530ea0bb4b371b182b32ca3a64cbc5d8229f9f9
+ oid sha256:ad3430977fbf67d56dea82f8eb026d5d3d24e08a481959da1f646cd6eff46fca
  size 200068904
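The adapter weights live in Git LFS, so the diff above covers only the three-line pointer file (version, oid, size), not the ~200 MB payload itself. A minimal sketch of reading such a pointer, using the pointer text from this diff:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into a key -> value dict."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", separated by a single space.
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:ad3430977fbf67d56dea82f8eb026d5d3d24e08a481959da1f646cd6eff46fca
size 200068904"""

info = parse_lfs_pointer(pointer)
print(info["oid"])        # the content hash of the real adapter file
print(int(info["size"]))  # 200068904 bytes (~200 MB)
```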
all_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+ "epoch": 10.0,
+ "total_flos": 3.8899222565240177e+18,
+ "train_loss": 1.363766341266178,
+ "train_runtime": 8098.2535,
+ "train_samples": 60531,
+ "train_samples_per_second": 4.97,
+ "train_steps_per_second": 0.156
+ }
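As a sanity check, the reported step throughput follows directly from this runtime and the 1260 optimizer steps shown in the training-results table; a minimal sketch:

```python
# Values taken from all_results.json and the model card's training table.
train_runtime = 8098.2535    # seconds
global_step = 1260           # total optimizer steps (10 epochs x 126 steps)
total_train_batch_size = 32  # from the hyperparameter list

steps_per_second = global_step / train_runtime
samples_per_second = steps_per_second * total_train_batch_size

print(round(steps_per_second, 3))    # 0.156, matching train_steps_per_second
print(round(samples_per_second, 2))  # ~4.98, in line with the reported 4.97
```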
runs/Apr23_22-36-57_deep-diver-main-ordinary-zebra-1-0-0/events.out.tfevents.1713926426.deep-diver-main-ordinary-zebra-1-0-0.520.0 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:e3b2dccd904385904b4d194e533d7b7258002d91fd036c5eb93a926ad3ea2875
- size 58645
+ oid sha256:10b6e287b3a6cb3f928613155dee2b20705a605137df2bde40eb9975f8e21804
+ size 61802
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+ "epoch": 10.0,
+ "total_flos": 3.8899222565240177e+18,
+ "train_loss": 1.363766341266178,
+ "train_runtime": 8098.2535,
+ "train_samples": 60531,
+ "train_samples_per_second": 4.97,
+ "train_steps_per_second": 0.156
+ }
trainer_state.json ADDED
@@ -0,0 +1,1881 @@
+ {
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 10.0,
+ "eval_steps": 500,
+ "global_step": 1260,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.007936507936507936,
+ "grad_norm": 300.0,
+ "learning_rate": 1.5873015873015873e-06,
+ "loss": 44.0514,
+ "step": 1
+ },
+ {
+ "epoch": 0.03968253968253968,
+ "grad_norm": 294.0,
+ "learning_rate": 7.936507936507936e-06,
+ "loss": 44.4732,
+ "step": 5
+ },
+ {
+ "epoch": 0.07936507936507936,
+ "grad_norm": 172.0,
+ "learning_rate": 1.5873015873015872e-05,
+ "loss": 39.0478,
+ "step": 10
+ },
+ {
+ "epoch": 0.11904761904761904,
+ "grad_norm": 61.5,
+ "learning_rate": 2.380952380952381e-05,
+ "loss": 27.5033,
+ "step": 15
+ },
+ {
+ "epoch": 0.15873015873015872,
+ "grad_norm": 19.625,
+ "learning_rate": 3.1746031746031745e-05,
+ "loss": 19.8771,
+ "step": 20
+ },
+ {
+ "epoch": 0.1984126984126984,
+ "grad_norm": 12.6875,
+ "learning_rate": 3.968253968253968e-05,
+ "loss": 18.3177,
+ "step": 25
+ },
+ {
+ "epoch": 0.23809523809523808,
+ "grad_norm": 5.875,
+ "learning_rate": 4.761904761904762e-05,
+ "loss": 16.5284,
+ "step": 30
+ },
+ {
+ "epoch": 0.2777777777777778,
+ "grad_norm": 3.015625,
+ "learning_rate": 5.555555555555556e-05,
+ "loss": 15.6508,
+ "step": 35
+ },
+ {
+ "epoch": 0.31746031746031744,
+ "grad_norm": 3.828125,
+ "learning_rate": 6.349206349206349e-05,
+ "loss": 14.7333,
+ "step": 40
+ },
+ {
+ "epoch": 0.35714285714285715,
+ "grad_norm": 7.34375,
+ "learning_rate": 7.142857142857143e-05,
+ "loss": 14.189,
+ "step": 45
+ },
+ {
+ "epoch": 0.3968253968253968,
+ "grad_norm": 19.0,
+ "learning_rate": 7.936507936507937e-05,
+ "loss": 11.8511,
+ "step": 50
+ },
+ {
+ "epoch": 0.4365079365079365,
+ "grad_norm": 16.625,
+ "learning_rate": 8.730158730158731e-05,
+ "loss": 5.6454,
+ "step": 55
+ },
+ {
+ "epoch": 0.47619047619047616,
+ "grad_norm": 2.203125,
+ "learning_rate": 9.523809523809524e-05,
+ "loss": 1.7925,
+ "step": 60
+ },
+ {
+ "epoch": 0.5158730158730159,
+ "grad_norm": 2.265625,
+ "learning_rate": 0.00010317460317460319,
+ "loss": 1.4147,
+ "step": 65
+ },
+ {
+ "epoch": 0.5555555555555556,
+ "grad_norm": 0.74609375,
+ "learning_rate": 0.00011111111111111112,
+ "loss": 1.2306,
+ "step": 70
+ },
+ {
+ "epoch": 0.5952380952380952,
+ "grad_norm": 2.046875,
+ "learning_rate": 0.00011904761904761905,
+ "loss": 1.1237,
+ "step": 75
+ },
+ {
+ "epoch": 0.6349206349206349,
+ "grad_norm": 2.0,
+ "learning_rate": 0.00012698412698412698,
+ "loss": 1.0467,
+ "step": 80
+ },
+ {
+ "epoch": 0.6746031746031746,
+ "grad_norm": 7.90625,
+ "learning_rate": 0.00013492063492063494,
+ "loss": 0.9706,
+ "step": 85
+ },
+ {
+ "epoch": 0.7142857142857143,
+ "grad_norm": 1.5078125,
+ "learning_rate": 0.00014285714285714287,
+ "loss": 0.944,
+ "step": 90
+ },
+ {
+ "epoch": 0.753968253968254,
+ "grad_norm": 2.765625,
+ "learning_rate": 0.0001507936507936508,
+ "loss": 0.9101,
+ "step": 95
+ },
+ {
+ "epoch": 0.7936507936507936,
+ "grad_norm": 2.140625,
+ "learning_rate": 0.00015873015873015873,
+ "loss": 0.8892,
+ "step": 100
+ },
+ {
+ "epoch": 0.8333333333333334,
+ "grad_norm": 1.640625,
+ "learning_rate": 0.0001666666666666667,
+ "loss": 0.8475,
+ "step": 105
+ },
+ {
+ "epoch": 0.873015873015873,
+ "grad_norm": 27.125,
+ "learning_rate": 0.00017460317460317462,
+ "loss": 0.8386,
+ "step": 110
+ },
+ {
+ "epoch": 0.9126984126984127,
+ "grad_norm": 1.8828125,
+ "learning_rate": 0.00018253968253968255,
+ "loss": 0.8187,
+ "step": 115
+ },
+ {
+ "epoch": 0.9523809523809523,
+ "grad_norm": 1.6015625,
+ "learning_rate": 0.00019047619047619048,
+ "loss": 0.7724,
+ "step": 120
+ },
+ {
+ "epoch": 0.9920634920634921,
+ "grad_norm": 1.6640625,
+ "learning_rate": 0.00019841269841269844,
+ "loss": 0.7499,
+ "step": 125
+ },
+ {
+ "epoch": 1.0,
+ "eval_loss": 1.2579786777496338,
+ "eval_runtime": 1.0146,
+ "eval_samples_per_second": 1.971,
+ "eval_steps_per_second": 0.986,
+ "step": 126
+ },
+ {
+ "epoch": 1.0317460317460316,
+ "grad_norm": 0.453125,
+ "learning_rate": 0.00019999386012995552,
+ "loss": 0.7176,
+ "step": 130
+ },
+ {
+ "epoch": 1.0714285714285714,
+ "grad_norm": 0.5078125,
+ "learning_rate": 0.00019996891820008164,
+ "loss": 0.7074,
+ "step": 135
+ },
+ {
+ "epoch": 1.1111111111111112,
+ "grad_norm": 0.609375,
+ "learning_rate": 0.00019992479525042303,
+ "loss": 0.6984,
+ "step": 140
+ },
+ {
+ "epoch": 1.1507936507936507,
+ "grad_norm": 7.375,
+ "learning_rate": 0.0001998614997468427,
+ "loss": 0.6891,
+ "step": 145
+ },
+ {
+ "epoch": 1.1904761904761905,
+ "grad_norm": 0.90625,
+ "learning_rate": 0.0001997790438338385,
+ "loss": 0.6864,
+ "step": 150
+ },
+ {
+ "epoch": 1.2301587301587302,
+ "grad_norm": 0.306640625,
+ "learning_rate": 0.00019967744333221278,
+ "loss": 0.6766,
+ "step": 155
+ },
+ {
+ "epoch": 1.2698412698412698,
+ "grad_norm": 8.1875,
+ "learning_rate": 0.00019955671773603696,
+ "loss": 0.6776,
+ "step": 160
+ },
+ {
+ "epoch": 1.3095238095238095,
+ "grad_norm": 2.203125,
+ "learning_rate": 0.0001994168902089112,
+ "loss": 0.6565,
+ "step": 165
+ },
+ {
+ "epoch": 1.3492063492063493,
+ "grad_norm": 0.80859375,
+ "learning_rate": 0.00019925798757952,
+ "loss": 0.6495,
+ "step": 170
+ },
+ {
+ "epoch": 1.3888888888888888,
+ "grad_norm": 1.90625,
+ "learning_rate": 0.00019908004033648453,
+ "loss": 0.6625,
+ "step": 175
+ },
+ {
+ "epoch": 1.4285714285714286,
+ "grad_norm": 1.0234375,
+ "learning_rate": 0.00019888308262251285,
+ "loss": 0.6444,
+ "step": 180
+ },
+ {
+ "epoch": 1.4682539682539684,
+ "grad_norm": 0.9609375,
+ "learning_rate": 0.00019866715222784895,
+ "loss": 0.635,
+ "step": 185
+ },
+ {
+ "epoch": 1.507936507936508,
+ "grad_norm": 1.953125,
+ "learning_rate": 0.0001984322905830219,
+ "loss": 0.6417,
+ "step": 190
+ },
+ {
+ "epoch": 1.5476190476190477,
+ "grad_norm": 1.0859375,
+ "learning_rate": 0.0001981785427508966,
+ "loss": 0.6381,
+ "step": 195
+ },
+ {
+ "epoch": 1.5873015873015874,
+ "grad_norm": 0.69140625,
+ "learning_rate": 0.00019790595741802757,
+ "loss": 0.6256,
+ "step": 200
+ },
+ {
+ "epoch": 1.626984126984127,
+ "grad_norm": 0.30078125,
+ "learning_rate": 0.00019761458688531756,
+ "loss": 0.6247,
+ "step": 205
+ },
+ {
+ "epoch": 1.6666666666666665,
+ "grad_norm": 0.921875,
+ "learning_rate": 0.00019730448705798239,
+ "loss": 0.6244,
+ "step": 210
+ },
+ {
+ "epoch": 1.7063492063492065,
+ "grad_norm": 0.6328125,
+ "learning_rate": 0.0001969757174348246,
+ "loss": 0.6094,
+ "step": 215
+ },
+ {
+ "epoch": 1.746031746031746,
+ "grad_norm": 0.76953125,
+ "learning_rate": 0.0001966283410968174,
+ "loss": 0.6156,
+ "step": 220
+ },
+ {
+ "epoch": 1.7857142857142856,
+ "grad_norm": 1.1484375,
+ "learning_rate": 0.0001962624246950012,
+ "loss": 0.6037,
+ "step": 225
+ },
+ {
+ "epoch": 1.8253968253968254,
+ "grad_norm": 0.984375,
+ "learning_rate": 0.0001958780384376955,
+ "loss": 0.6068,
+ "step": 230
+ },
+ {
+ "epoch": 1.8650793650793651,
+ "grad_norm": 0.640625,
+ "learning_rate": 0.00019547525607702774,
+ "loss": 0.5994,
+ "step": 235
+ },
+ {
+ "epoch": 1.9047619047619047,
+ "grad_norm": 1.8203125,
+ "learning_rate": 0.0001950541548947829,
+ "loss": 0.6115,
+ "step": 240
+ },
+ {
+ "epoch": 1.9444444444444444,
+ "grad_norm": 0.30078125,
+ "learning_rate": 0.00019461481568757506,
+ "loss": 0.598,
+ "step": 245
+ },
+ {
+ "epoch": 1.9841269841269842,
+ "grad_norm": 0.396484375,
+ "learning_rate": 0.00019415732275134513,
+ "loss": 0.6058,
+ "step": 250
+ },
+ {
+ "epoch": 2.0,
+ "eval_loss": 1.1687145233154297,
+ "eval_runtime": 1.0157,
+ "eval_samples_per_second": 1.969,
+ "eval_steps_per_second": 0.985,
+ "step": 252
+ },
+ {
+ "epoch": 2.0238095238095237,
+ "grad_norm": 0.458984375,
+ "learning_rate": 0.0001936817638651871,
+ "loss": 0.5677,
+ "step": 255
+ },
+ {
+ "epoch": 2.0634920634920633,
+ "grad_norm": 0.5078125,
+ "learning_rate": 0.0001931882302745057,
+ "loss": 0.5648,
+ "step": 260
+ },
+ {
+ "epoch": 2.1031746031746033,
+ "grad_norm": 0.66796875,
+ "learning_rate": 0.00019267681667350928,
+ "loss": 0.5502,
+ "step": 265
+ },
+ {
+ "epoch": 2.142857142857143,
+ "grad_norm": 0.48828125,
+ "learning_rate": 0.00019214762118704076,
+ "loss": 0.5573,
+ "step": 270
+ },
+ {
+ "epoch": 2.1825396825396823,
+ "grad_norm": 0.26953125,
+ "learning_rate": 0.00019160074535175058,
+ "loss": 0.5622,
+ "step": 275
+ },
+ {
+ "epoch": 2.2222222222222223,
+ "grad_norm": 0.291015625,
+ "learning_rate": 0.0001910362940966147,
+ "loss": 0.5586,
+ "step": 280
+ },
+ {
+ "epoch": 2.261904761904762,
+ "grad_norm": 0.3984375,
+ "learning_rate": 0.00019045437572280194,
+ "loss": 0.5545,
+ "step": 285
+ },
+ {
+ "epoch": 2.3015873015873014,
+ "grad_norm": 0.34765625,
+ "learning_rate": 0.0001898551018828944,
+ "loss": 0.5489,
+ "step": 290
+ },
+ {
+ "epoch": 2.3412698412698414,
+ "grad_norm": 0.275390625,
+ "learning_rate": 0.0001892385875594645,
+ "loss": 0.5577,
+ "step": 295
+ },
+ {
+ "epoch": 2.380952380952381,
+ "grad_norm": 0.2578125,
+ "learning_rate": 0.00018860495104301345,
+ "loss": 0.5462,
+ "step": 300
+ },
+ {
+ "epoch": 2.4206349206349205,
+ "grad_norm": 0.38671875,
+ "learning_rate": 0.0001879543139092747,
+ "loss": 0.557,
+ "step": 305
+ },
+ {
+ "epoch": 2.4603174603174605,
+ "grad_norm": 0.44921875,
+ "learning_rate": 0.00018728680099588748,
+ "loss": 0.5531,
+ "step": 310
+ },
+ {
+ "epoch": 2.5,
+ "grad_norm": 0.27734375,
+ "learning_rate": 0.00018660254037844388,
+ "loss": 0.5597,
+ "step": 315
+ },
+ {
+ "epoch": 2.5396825396825395,
+ "grad_norm": 0.71484375,
+ "learning_rate": 0.00018590166334591531,
+ "loss": 0.5578,
+ "step": 320
+ },
+ {
+ "epoch": 2.5793650793650795,
+ "grad_norm": 0.375,
+ "learning_rate": 0.000185184304375462,
+ "loss": 0.5546,
+ "step": 325
+ },
+ {
+ "epoch": 2.619047619047619,
+ "grad_norm": 0.2109375,
+ "learning_rate": 0.0001844506011066308,
+ "loss": 0.5532,
+ "step": 330
+ },
+ {
+ "epoch": 2.6587301587301586,
+ "grad_norm": 0.349609375,
+ "learning_rate": 0.00018370069431494646,
+ "loss": 0.5509,
+ "step": 335
+ },
+ {
+ "epoch": 2.6984126984126986,
+ "grad_norm": 0.27734375,
+ "learning_rate": 0.00018293472788490095,
+ "loss": 0.5479,
+ "step": 340
+ },
+ {
+ "epoch": 2.738095238095238,
+ "grad_norm": 0.279296875,
+ "learning_rate": 0.00018215284878234642,
+ "loss": 0.5505,
+ "step": 345
+ },
+ {
+ "epoch": 2.7777777777777777,
+ "grad_norm": 0.28125,
+ "learning_rate": 0.00018135520702629675,
+ "loss": 0.5466,
+ "step": 350
+ },
+ {
+ "epoch": 2.817460317460317,
+ "grad_norm": 0.373046875,
+ "learning_rate": 0.0001805419556601437,
+ "loss": 0.5548,
+ "step": 355
+ },
+ {
+ "epoch": 2.857142857142857,
+ "grad_norm": 0.25,
+ "learning_rate": 0.00017971325072229226,
+ "loss": 0.5521,
+ "step": 360
+ },
+ {
+ "epoch": 2.8968253968253967,
+ "grad_norm": 0.318359375,
+ "learning_rate": 0.0001788692512162216,
+ "loss": 0.5413,
+ "step": 365
+ },
+ {
+ "epoch": 2.9365079365079367,
+ "grad_norm": 0.390625,
+ "learning_rate": 0.00017801011907997725,
+ "loss": 0.5546,
+ "step": 370
+ },
+ {
+ "epoch": 2.9761904761904763,
+ "grad_norm": 0.26171875,
+ "learning_rate": 0.0001771360191551,
+ "loss": 0.5571,
+ "step": 375
+ },
+ {
+ "epoch": 3.0,
+ "eval_loss": 1.1491789817810059,
+ "eval_runtime": 1.0142,
+ "eval_samples_per_second": 1.972,
+ "eval_steps_per_second": 0.986,
+ "step": 378
+ },
+ {
+ "epoch": 3.015873015873016,
+ "grad_norm": 0.30859375,
+ "learning_rate": 0.00017624711915499764,
+ "loss": 0.5262,
+ "step": 380
+ },
+ {
+ "epoch": 3.0555555555555554,
+ "grad_norm": 0.2734375,
+ "learning_rate": 0.00017534358963276607,
+ "loss": 0.5035,
+ "step": 385
+ },
+ {
+ "epoch": 3.0952380952380953,
+ "grad_norm": 0.203125,
+ "learning_rate": 0.00017442560394846516,
+ "loss": 0.5017,
+ "step": 390
+ },
+ {
+ "epoch": 3.134920634920635,
+ "grad_norm": 0.21484375,
+ "learning_rate": 0.00017349333823585617,
+ "loss": 0.5052,
+ "step": 395
+ },
+ {
+ "epoch": 3.1746031746031744,
+ "grad_norm": 0.232421875,
+ "learning_rate": 0.00017254697136860703,
+ "loss": 0.5056,
+ "step": 400
+ },
+ {
+ "epoch": 3.2142857142857144,
+ "grad_norm": 0.333984375,
+ "learning_rate": 0.00017158668492597186,
+ "loss": 0.5199,
+ "step": 405
+ },
+ {
+ "epoch": 3.253968253968254,
+ "grad_norm": 0.2216796875,
+ "learning_rate": 0.00017061266315795146,
+ "loss": 0.5038,
+ "step": 410
+ },
+ {
+ "epoch": 3.2936507936507935,
+ "grad_norm": 0.333984375,
+ "learning_rate": 0.0001696250929499412,
+ "loss": 0.501,
+ "step": 415
+ },
+ {
+ "epoch": 3.3333333333333335,
+ "grad_norm": 0.79296875,
+ "learning_rate": 0.0001686241637868734,
+ "loss": 0.5044,
+ "step": 420
+ },
+ {
+ "epoch": 3.373015873015873,
+ "grad_norm": 0.4296875,
+ "learning_rate": 0.0001676100677168608,
+ "loss": 0.4998,
+ "step": 425
+ },
+ {
+ "epoch": 3.4126984126984126,
+ "grad_norm": 0.265625,
+ "learning_rate": 0.00016658299931434858,
+ "loss": 0.5172,
+ "step": 430
+ },
+ {
+ "epoch": 3.4523809523809526,
+ "grad_norm": 0.212890625,
+ "learning_rate": 0.000165543155642781,
+ "loss": 0.5082,
+ "step": 435
+ },
+ {
+ "epoch": 3.492063492063492,
+ "grad_norm": 0.22265625,
+ "learning_rate": 0.00016449073621679127,
+ "loss": 0.5016,
+ "step": 440
+ },
+ {
+ "epoch": 3.5317460317460316,
+ "grad_norm": 0.1865234375,
+ "learning_rate": 0.0001634259429639203,
+ "loss": 0.5109,
+ "step": 445
+ },
+ {
+ "epoch": 3.571428571428571,
+ "grad_norm": 0.197265625,
+ "learning_rate": 0.00016234898018587337,
+ "loss": 0.5178,
+ "step": 450
+ },
+ {
+ "epoch": 3.611111111111111,
+ "grad_norm": 0.267578125,
+ "learning_rate": 0.0001612600545193203,
+ "loss": 0.5163,
+ "step": 455
+ },
+ {
+ "epoch": 3.6507936507936507,
+ "grad_norm": 0.29296875,
+ "learning_rate": 0.00016015937489624848,
+ "loss": 0.5078,
+ "step": 460
+ },
+ {
+ "epoch": 3.6904761904761907,
+ "grad_norm": 0.25,
+ "learning_rate": 0.00015904715250387498,
+ "loss": 0.508,
+ "step": 465
+ },
+ {
+ "epoch": 3.7301587301587302,
+ "grad_norm": 0.2890625,
+ "learning_rate": 0.00015792360074412613,
+ "loss": 0.5055,
+ "step": 470
+ },
+ {
+ "epoch": 3.7698412698412698,
+ "grad_norm": 0.19140625,
+ "learning_rate": 0.00015678893519269197,
+ "loss": 0.5083,
+ "step": 475
+ },
+ {
+ "epoch": 3.8095238095238093,
+ "grad_norm": 0.322265625,
+ "learning_rate": 0.00015564337355766412,
+ "loss": 0.5121,
+ "step": 480
+ },
+ {
+ "epoch": 3.8492063492063493,
+ "grad_norm": 0.2109375,
+ "learning_rate": 0.00015448713563776374,
+ "loss": 0.4984,
+ "step": 485
+ },
+ {
+ "epoch": 3.888888888888889,
+ "grad_norm": 0.412109375,
+ "learning_rate": 0.00015332044328016914,
+ "loss": 0.5044,
+ "step": 490
+ },
+ {
+ "epoch": 3.928571428571429,
+ "grad_norm": 0.44140625,
+ "learning_rate": 0.0001521435203379498,
+ "loss": 0.5127,
+ "step": 495
+ },
+ {
+ "epoch": 3.9682539682539684,
+ "grad_norm": 0.54296875,
+ "learning_rate": 0.0001509565926271159,
+ "loss": 0.5118,
+ "step": 500
+ },
+ {
+ "epoch": 4.0,
+ "eval_loss": 1.155090570449829,
+ "eval_runtime": 1.0165,
+ "eval_samples_per_second": 1.968,
+ "eval_steps_per_second": 0.984,
+ "step": 504
+ },
+ {
+ "epoch": 4.007936507936508,
+ "grad_norm": 0.97265625,
+ "learning_rate": 0.00014975988788329064,
+ "loss": 0.4977,
+ "step": 505
+ },
+ {
+ "epoch": 4.0476190476190474,
+ "grad_norm": 0.515625,
+ "learning_rate": 0.00014855363571801523,
+ "loss": 0.4642,
+ "step": 510
+ },
+ {
+ "epoch": 4.087301587301587,
+ "grad_norm": 0.251953125,
+ "learning_rate": 0.00014733806757469286,
+ "loss": 0.457,
+ "step": 515
+ },
+ {
+ "epoch": 4.1269841269841265,
+ "grad_norm": 0.51953125,
+ "learning_rate": 0.000146113416684182,
+ "loss": 0.4674,
+ "step": 520
+ },
+ {
+ "epoch": 4.166666666666667,
+ "grad_norm": 0.259765625,
+ "learning_rate": 0.00014487991802004623,
+ "loss": 0.4608,
+ "step": 525
+ },
+ {
+ "epoch": 4.2063492063492065,
+ "grad_norm": 0.45703125,
+ "learning_rate": 0.00014363780825347005,
+ "loss": 0.4601,
+ "step": 530
+ },
+ {
+ "epoch": 4.246031746031746,
+ "grad_norm": 0.408203125,
+ "learning_rate": 0.00014238732570784866,
+ "loss": 0.4656,
+ "step": 535
+ },
+ {
+ "epoch": 4.285714285714286,
+ "grad_norm": 0.263671875,
+ "learning_rate": 0.00014112871031306119,
+ "loss": 0.4661,
+ "step": 540
+ },
+ {
+ "epoch": 4.325396825396825,
+ "grad_norm": 0.42578125,
+ "learning_rate": 0.00013986220355943494,
+ "loss": 0.4652,
+ "step": 545
+ },
+ {
+ "epoch": 4.365079365079365,
+ "grad_norm": 0.5,
+ "learning_rate": 0.00013858804845141116,
+ "loss": 0.4667,
+ "step": 550
+ },
+ {
+ "epoch": 4.404761904761905,
+ "grad_norm": 0.205078125,
+ "learning_rate": 0.0001373064894609194,
+ "loss": 0.469,
+ "step": 555
+ },
+ {
+ "epoch": 4.444444444444445,
+ "grad_norm": 0.59375,
+ "learning_rate": 0.00013601777248047105,
+ "loss": 0.4654,
+ "step": 560
+ },
+ {
+ "epoch": 4.484126984126984,
+ "grad_norm": 0.298828125,
+ "learning_rate": 0.00013472214477597977,
+ "loss": 0.4662,
+ "step": 565
+ },
+ {
+ "epoch": 4.523809523809524,
+ "grad_norm": 0.28515625,
+ "learning_rate": 0.00013341985493931877,
+ "loss": 0.4669,
+ "step": 570
+ },
+ {
+ "epoch": 4.563492063492063,
+ "grad_norm": 0.3125,
+ "learning_rate": 0.00013211115284062335,
+ "loss": 0.465,
+ "step": 575
+ },
+ {
+ "epoch": 4.603174603174603,
+ "grad_norm": 0.30859375,
+ "learning_rate": 0.00013079628958034855,
+ "loss": 0.4696,
+ "step": 580
+ },
+ {
+ "epoch": 4.642857142857143,
+ "grad_norm": 0.38671875,
+ "learning_rate": 0.00012947551744109043,
+ "loss": 0.4837,
+ "step": 585
+ },
+ {
+ "epoch": 4.682539682539683,
+ "grad_norm": 0.75390625,
+ "learning_rate": 0.00012814908983918073,
+ "loss": 0.4752,
+ "step": 590
+ },
+ {
+ "epoch": 4.722222222222222,
+ "grad_norm": 0.59375,
+ "learning_rate": 0.00012681726127606376,
+ "loss": 0.4678,
+ "step": 595
+ },
+ {
+ "epoch": 4.761904761904762,
+ "grad_norm": 0.306640625,
+ "learning_rate": 0.0001254802872894655,
+ "loss": 0.4753,
+ "step": 600
+ },
+ {
+ "epoch": 4.801587301587301,
+ "grad_norm": 0.333984375,
+ "learning_rate": 0.00012413842440436333,
+ "loss": 0.473,
+ "step": 605
+ },
+ {
+ "epoch": 4.841269841269841,
+ "grad_norm": 0.353515625,
+ "learning_rate": 0.000122791930083767,
+ "loss": 0.4646,
+ "step": 610
+ },
+ {
+ "epoch": 4.880952380952381,
+ "grad_norm": 0.224609375,
+ "learning_rate": 0.00012144106267931876,
+ "loss": 0.4715,
+ "step": 615
+ },
+ {
+ "epoch": 4.920634920634921,
+ "grad_norm": 0.2734375,
+ "learning_rate": 0.00012008608138172393,
+ "loss": 0.4704,
+ "step": 620
+ },
+ {
+ "epoch": 4.9603174603174605,
+ "grad_norm": 0.31640625,
+ "learning_rate": 0.00011872724617101969,
+ "loss": 0.4657,
+ "step": 625
+ },
+ {
+ "epoch": 5.0,
+ "grad_norm": 0.224609375,
+ "learning_rate": 0.00011736481776669306,
+ "loss": 0.4711,
+ "step": 630
+ },
+ {
+ "epoch": 5.0,
+ "eval_loss": 1.1766771078109741,
+ "eval_runtime": 1.015,
+ "eval_samples_per_second": 1.971,
+ "eval_steps_per_second": 0.985,
+ "step": 630
+ },
+ {
+ "epoch": 5.0396825396825395,
+ "grad_norm": 0.2890625,
+ "learning_rate": 0.0001159990575776563,
+ "loss": 0.4228,
+ "step": 635
+ },
+ {
+ "epoch": 5.079365079365079,
+ "grad_norm": 0.3125,
+ "learning_rate": 0.00011463022765209088,
+ "loss": 0.4163,
+ "step": 640
+ },
+ {
+ "epoch": 5.119047619047619,
+ "grad_norm": 0.353515625,
+ "learning_rate": 0.00011325859062716795,
+ "loss": 0.4213,
+ "step": 645
+ },
+ {
+ "epoch": 5.158730158730159,
+ "grad_norm": 0.365234375,
+ "learning_rate": 0.00011188440967865641,
+ "loss": 0.4319,
+ "step": 650
+ },
+ {
+ "epoch": 5.198412698412699,
+ "grad_norm": 0.3203125,
+ "learning_rate": 0.00011050794847042731,
+ "loss": 0.4258,
+ "step": 655
+ },
+ {
+ "epoch": 5.238095238095238,
+ "grad_norm": 0.26953125,
+ "learning_rate": 0.00010912947110386484,
+ "loss": 0.4237,
+ "step": 660
+ },
+ {
+ "epoch": 5.277777777777778,
+ "grad_norm": 0.5078125,
+ "learning_rate": 0.0001077492420671931,
+ "loss": 0.4238,
+ "step": 665
+ },
+ {
+ "epoch": 5.317460317460317,
+ "grad_norm": 0.22265625,
+ "learning_rate": 0.00010636752618472887,
+ "loss": 0.4225,
+ "step": 670
+ },
+ {
+ "epoch": 5.357142857142857,
+ "grad_norm": 0.2451171875,
+ "learning_rate": 0.00010498458856606972,
+ "loss": 0.4214,
+ "step": 675
+ },
+ {
+ "epoch": 5.396825396825397,
+ "grad_norm": 0.283203125,
+ "learning_rate": 0.00010360069455522765,
+ "loss": 0.425,
+ "step": 680
+ },
+ {
+ "epoch": 5.436507936507937,
+ "grad_norm": 0.43359375,
+ "learning_rate": 0.00010221610967971735,
+ "loss": 0.4281,
+ "step": 685
+ },
+ {
+ "epoch": 5.476190476190476,
+ "grad_norm": 0.275390625,
+ "learning_rate": 0.00010083109959960973,
+ "loss": 0.427,
+ "step": 690
+ },
+ {
+ "epoch": 5.515873015873016,
+ "grad_norm": 0.2099609375,
+ "learning_rate": 9.944593005655947e-05,
+ "loss": 0.4299,
+ "step": 695
+ },
+ {
+ "epoch": 5.555555555555555,
+ "grad_norm": 0.216796875,
+ "learning_rate": 9.806086682281758e-05,
+ "loss": 0.4413,
+ "step": 700
+ },
+ {
+ "epoch": 5.595238095238095,
+ "grad_norm": 0.2158203125,
+ "learning_rate": 9.667617565023735e-05,
+ "loss": 0.4352,
+ "step": 705
+ },
+ {
+ "epoch": 5.634920634920634,
+ "grad_norm": 0.2578125,
+ "learning_rate": 9.529212221928483e-05,
+ "loss": 0.4337,
+ "step": 710
+ },
+ {
+ "epoch": 5.674603174603175,
+ "grad_norm": 0.2294921875,
+ "learning_rate": 9.390897208806266e-05,
+ "loss": 0.4242,
+ "step": 715
+ },
+ {
+ "epoch": 5.714285714285714,
+ "grad_norm": 0.365234375,
+ "learning_rate": 9.252699064135758e-05,
+ "loss": 0.4286,
+ "step": 720
+ },
+ {
+ "epoch": 5.753968253968254,
+ "grad_norm": 0.5078125,
+ "learning_rate": 9.114644303972096e-05,
+ "loss": 0.4349,
+ "step": 725
+ },
+ {
+ "epoch": 5.7936507936507935,
+ "grad_norm": 0.318359375,
+ "learning_rate": 8.976759416859256e-05,
+ "loss": 0.4311,
+ "step": 730
+ },
+ {
+ "epoch": 5.833333333333333,
+ "grad_norm": 0.2470703125,
+ "learning_rate": 8.839070858747697e-05,
+ "loss": 0.4257,
+ "step": 735
+ },
+ {
+ "epoch": 5.8730158730158735,
+ "grad_norm": 0.236328125,
+ "learning_rate": 8.701605047918276e-05,
+ "loss": 0.4332,
+ "step": 740
+ },
+ {
+ "epoch": 5.912698412698413,
+ "grad_norm": 0.2255859375,
1097
+ "learning_rate": 8.564388359913356e-05,
1098
+ "loss": 0.4309,
1099
+ "step": 745
1100
+ },
1101
+ {
1102
+ "epoch": 5.9523809523809526,
1103
+ "grad_norm": 0.2138671875,
1104
+ "learning_rate": 8.427447122476148e-05,
1105
+ "loss": 0.4311,
1106
+ "step": 750
1107
+ },
1108
+ {
1109
+ "epoch": 5.992063492063492,
1110
+ "grad_norm": 0.244140625,
1111
+ "learning_rate": 8.290807610499206e-05,
1112
+ "loss": 0.4287,
1113
+ "step": 755
1114
+ },
1115
+ {
1116
+ "epoch": 6.0,
1117
+ "eval_loss": 1.1948192119598389,
1118
+ "eval_runtime": 1.0151,
1119
+ "eval_samples_per_second": 1.97,
1120
+ "eval_steps_per_second": 0.985,
1121
+ "step": 756
1122
+ },
1123
+ {
+ "epoch": 6.031746031746032,
+ "grad_norm": 0.265625,
+ "learning_rate": 8.154496040983073e-05,
+ "loss": 0.3917,
+ "step": 760
+ },
+ {
+ "epoch": 6.071428571428571,
+ "grad_norm": 0.322265625,
+ "learning_rate": 8.018538568006027e-05,
+ "loss": 0.3851,
+ "step": 765
+ },
+ {
+ "epoch": 6.111111111111111,
+ "grad_norm": 0.267578125,
+ "learning_rate": 7.882961277705895e-05,
+ "loss": 0.3846,
+ "step": 770
+ },
+ {
+ "epoch": 6.150793650793651,
+ "grad_norm": 0.392578125,
+ "learning_rate": 7.747790183274922e-05,
+ "loss": 0.3897,
+ "step": 775
+ },
+ {
+ "epoch": 6.190476190476191,
+ "grad_norm": 0.25,
+ "learning_rate": 7.613051219968623e-05,
+ "loss": 0.3894,
+ "step": 780
+ },
+ {
+ "epoch": 6.23015873015873,
+ "grad_norm": 0.30859375,
+ "learning_rate": 7.478770240129579e-05,
+ "loss": 0.386,
+ "step": 785
+ },
+ {
+ "epoch": 6.26984126984127,
+ "grad_norm": 0.29296875,
+ "learning_rate": 7.344973008227161e-05,
+ "loss": 0.383,
+ "step": 790
+ },
+ {
+ "epoch": 6.309523809523809,
+ "grad_norm": 0.265625,
+ "learning_rate": 7.211685195914097e-05,
+ "loss": 0.3867,
+ "step": 795
+ },
+ {
+ "epoch": 6.349206349206349,
+ "grad_norm": 0.263671875,
+ "learning_rate": 7.078932377100877e-05,
+ "loss": 0.393,
+ "step": 800
+ },
+ {
+ "epoch": 6.388888888888889,
+ "grad_norm": 0.345703125,
+ "learning_rate": 6.94674002304887e-05,
+ "loss": 0.3856,
+ "step": 805
+ },
+ {
+ "epoch": 6.428571428571429,
+ "grad_norm": 0.2255859375,
+ "learning_rate": 6.815133497483157e-05,
+ "loss": 0.397,
+ "step": 810
+ },
+ {
+ "epoch": 6.468253968253968,
+ "grad_norm": 0.302734375,
+ "learning_rate": 6.684138051726012e-05,
+ "loss": 0.3879,
+ "step": 815
+ },
+ {
+ "epoch": 6.507936507936508,
+ "grad_norm": 0.248046875,
+ "learning_rate": 6.553778819851926e-05,
+ "loss": 0.3852,
+ "step": 820
+ },
+ {
+ "epoch": 6.5476190476190474,
+ "grad_norm": 0.24609375,
+ "learning_rate": 6.424080813865138e-05,
+ "loss": 0.3956,
+ "step": 825
+ },
+ {
+ "epoch": 6.587301587301587,
+ "grad_norm": 0.4296875,
+ "learning_rate": 6.295068918900586e-05,
+ "loss": 0.394,
+ "step": 830
+ },
+ {
+ "epoch": 6.6269841269841265,
+ "grad_norm": 0.2392578125,
+ "learning_rate": 6.16676788844919e-05,
+ "loss": 0.3904,
+ "step": 835
+ },
+ {
+ "epoch": 6.666666666666667,
+ "grad_norm": 0.33984375,
+ "learning_rate": 6.039202339608432e-05,
+ "loss": 0.395,
+ "step": 840
+ },
+ {
+ "epoch": 6.7063492063492065,
+ "grad_norm": 0.326171875,
+ "learning_rate": 5.912396748359046e-05,
+ "loss": 0.3892,
+ "step": 845
+ },
+ {
+ "epoch": 6.746031746031746,
+ "grad_norm": 0.337890625,
+ "learning_rate": 5.786375444868828e-05,
+ "loss": 0.3945,
+ "step": 850
+ },
+ {
+ "epoch": 6.785714285714286,
+ "grad_norm": 0.443359375,
+ "learning_rate": 5.6611626088244194e-05,
+ "loss": 0.3948,
+ "step": 855
+ },
+ {
+ "epoch": 6.825396825396825,
+ "grad_norm": 0.314453125,
+ "learning_rate": 5.5367822647919424e-05,
+ "loss": 0.3953,
+ "step": 860
+ },
+ {
+ "epoch": 6.865079365079366,
+ "grad_norm": 0.283203125,
+ "learning_rate": 5.4132582776074126e-05,
+ "loss": 0.3983,
+ "step": 865
+ },
+ {
+ "epoch": 6.904761904761905,
+ "grad_norm": 0.236328125,
+ "learning_rate": 5.290614347797802e-05,
+ "loss": 0.3862,
+ "step": 870
+ },
+ {
+ "epoch": 6.944444444444445,
+ "grad_norm": 0.2431640625,
+ "learning_rate": 5.168874007033615e-05,
+ "loss": 0.39,
+ "step": 875
+ },
+ {
+ "epoch": 6.984126984126984,
+ "grad_norm": 0.232421875,
+ "learning_rate": 5.048060613613888e-05,
+ "loss": 0.3943,
+ "step": 880
+ },
+ {
+ "epoch": 7.0,
+ "eval_loss": 1.2383077144622803,
+ "eval_runtime": 1.0141,
+ "eval_samples_per_second": 1.972,
+ "eval_steps_per_second": 0.986,
+ "step": 882
+ },
+ {
+ "epoch": 7.023809523809524,
+ "grad_norm": 0.2216796875,
+ "learning_rate": 4.92819734798441e-05,
+ "loss": 0.3718,
+ "step": 885
+ },
+ {
+ "epoch": 7.063492063492063,
+ "grad_norm": 0.41015625,
+ "learning_rate": 4.809307208290114e-05,
+ "loss": 0.3505,
+ "step": 890
+ },
+ {
+ "epoch": 7.103174603174603,
+ "grad_norm": 0.236328125,
+ "learning_rate": 4.691413005962415e-05,
+ "loss": 0.3559,
+ "step": 895
+ },
+ {
+ "epoch": 7.142857142857143,
+ "grad_norm": 0.248046875,
+ "learning_rate": 4.574537361342407e-05,
+ "loss": 0.3581,
+ "step": 900
+ },
+ {
+ "epoch": 7.182539682539683,
+ "grad_norm": 0.224609375,
+ "learning_rate": 4.458702699340667e-05,
+ "loss": 0.3601,
+ "step": 905
+ },
+ {
+ "epoch": 7.222222222222222,
+ "grad_norm": 0.259765625,
+ "learning_rate": 4.343931245134616e-05,
+ "loss": 0.3587,
+ "step": 910
+ },
+ {
+ "epoch": 7.261904761904762,
+ "grad_norm": 0.240234375,
+ "learning_rate": 4.23024501990417e-05,
+ "loss": 0.3589,
+ "step": 915
+ },
+ {
+ "epoch": 7.301587301587301,
+ "grad_norm": 0.26171875,
+ "learning_rate": 4.117665836606549e-05,
+ "loss": 0.3595,
+ "step": 920
+ },
+ {
+ "epoch": 7.341269841269841,
+ "grad_norm": 0.259765625,
+ "learning_rate": 4.00621529579101e-05,
+ "loss": 0.359,
+ "step": 925
+ },
+ {
+ "epoch": 7.380952380952381,
+ "grad_norm": 0.24609375,
+ "learning_rate": 3.89591478145437e-05,
+ "loss": 0.3637,
+ "step": 930
+ },
+ {
+ "epoch": 7.420634920634921,
+ "grad_norm": 0.26953125,
+ "learning_rate": 3.786785456938049e-05,
+ "loss": 0.3667,
+ "step": 935
+ },
+ {
+ "epoch": 7.4603174603174605,
+ "grad_norm": 0.26171875,
+ "learning_rate": 3.6788482608674826e-05,
+ "loss": 0.3566,
+ "step": 940
+ },
+ {
+ "epoch": 7.5,
+ "grad_norm": 0.248046875,
+ "learning_rate": 3.5721239031346066e-05,
+ "loss": 0.3538,
+ "step": 945
+ },
+ {
+ "epoch": 7.5396825396825395,
+ "grad_norm": 0.2392578125,
+ "learning_rate": 3.4666328609242725e-05,
+ "loss": 0.3564,
+ "step": 950
+ },
+ {
+ "epoch": 7.579365079365079,
+ "grad_norm": 0.2421875,
+ "learning_rate": 3.362395374785283e-05,
+ "loss": 0.3588,
+ "step": 955
+ },
+ {
+ "epoch": 7.619047619047619,
+ "grad_norm": 0.25390625,
+ "learning_rate": 3.259431444746846e-05,
+ "loss": 0.3617,
+ "step": 960
+ },
+ {
+ "epoch": 7.658730158730159,
+ "grad_norm": 0.2470703125,
+ "learning_rate": 3.157760826481174e-05,
+ "loss": 0.3616,
+ "step": 965
+ },
+ {
+ "epoch": 7.698412698412699,
+ "grad_norm": 0.25,
+ "learning_rate": 3.057403027512963e-05,
+ "loss": 0.3531,
+ "step": 970
+ },
+ {
+ "epoch": 7.738095238095238,
+ "grad_norm": 0.287109375,
+ "learning_rate": 2.9583773034764826e-05,
+ "loss": 0.3547,
+ "step": 975
+ },
+ {
+ "epoch": 7.777777777777778,
+ "grad_norm": 0.2421875,
+ "learning_rate": 2.8607026544210114e-05,
+ "loss": 0.3609,
+ "step": 980
+ },
+ {
+ "epoch": 7.817460317460317,
+ "grad_norm": 0.2392578125,
+ "learning_rate": 2.764397821165292e-05,
+ "loss": 0.3588,
+ "step": 985
+ },
+ {
+ "epoch": 7.857142857142857,
+ "grad_norm": 0.271484375,
+ "learning_rate": 2.669481281701739e-05,
+ "loss": 0.3648,
+ "step": 990
+ },
+ {
+ "epoch": 7.896825396825397,
+ "grad_norm": 0.251953125,
+ "learning_rate": 2.5759712476510622e-05,
+ "loss": 0.3635,
+ "step": 995
+ },
+ {
+ "epoch": 7.936507936507937,
+ "grad_norm": 0.2412109375,
+ "learning_rate": 2.4838856607680183e-05,
+ "loss": 0.3568,
+ "step": 1000
+ },
+ {
+ "epoch": 7.976190476190476,
+ "grad_norm": 0.25390625,
+ "learning_rate": 2.3932421894989167e-05,
+ "loss": 0.3612,
+ "step": 1005
+ },
+ {
+ "epoch": 8.0,
+ "eval_loss": 1.2904332876205444,
+ "eval_runtime": 1.0153,
+ "eval_samples_per_second": 1.97,
+ "eval_steps_per_second": 0.985,
+ "step": 1008
+ },
+ {
+ "epoch": 8.015873015873016,
+ "grad_norm": 0.26171875,
+ "learning_rate": 2.304058225591581e-05,
+ "loss": 0.354,
+ "step": 1010
+ },
+ {
+ "epoch": 8.055555555555555,
+ "grad_norm": 0.23828125,
+ "learning_rate": 2.2163508807583998e-05,
+ "loss": 0.3464,
+ "step": 1015
+ },
+ {
+ "epoch": 8.095238095238095,
+ "grad_norm": 0.259765625,
+ "learning_rate": 2.1301369833931117e-05,
+ "loss": 0.3411,
+ "step": 1020
+ },
+ {
+ "epoch": 8.134920634920634,
+ "grad_norm": 0.259765625,
+ "learning_rate": 2.045433075341927e-05,
+ "loss": 0.3369,
+ "step": 1025
+ },
+ {
+ "epoch": 8.174603174603174,
+ "grad_norm": 0.232421875,
+ "learning_rate": 1.962255408729662e-05,
+ "loss": 0.3369,
+ "step": 1030
+ },
+ {
+ "epoch": 8.214285714285714,
+ "grad_norm": 0.2353515625,
+ "learning_rate": 1.880619942841435e-05,
+ "loss": 0.3391,
+ "step": 1035
+ },
+ {
+ "epoch": 8.253968253968253,
+ "grad_norm": 0.2470703125,
+ "learning_rate": 1.8005423410605772e-05,
+ "loss": 0.3395,
+ "step": 1040
+ },
+ {
+ "epoch": 8.293650793650794,
+ "grad_norm": 0.244140625,
+ "learning_rate": 1.7220379678632814e-05,
+ "loss": 0.3409,
+ "step": 1045
+ },
+ {
+ "epoch": 8.333333333333334,
+ "grad_norm": 0.2412109375,
+ "learning_rate": 1.6451218858706374e-05,
+ "loss": 0.3483,
+ "step": 1050
+ },
+ {
+ "epoch": 8.373015873015873,
+ "grad_norm": 0.2421875,
+ "learning_rate": 1.5698088529585597e-05,
+ "loss": 0.3459,
+ "step": 1055
+ },
+ {
+ "epoch": 8.412698412698413,
+ "grad_norm": 0.2412109375,
+ "learning_rate": 1.49611331942621e-05,
+ "loss": 0.336,
+ "step": 1060
+ },
+ {
+ "epoch": 8.452380952380953,
+ "grad_norm": 0.2578125,
+ "learning_rate": 1.4240494252234049e-05,
+ "loss": 0.3349,
+ "step": 1065
+ },
+ {
+ "epoch": 8.492063492063492,
+ "grad_norm": 0.2412109375,
+ "learning_rate": 1.3536309972375948e-05,
+ "loss": 0.3463,
+ "step": 1070
+ },
+ {
+ "epoch": 8.531746031746032,
+ "grad_norm": 0.23828125,
+ "learning_rate": 1.2848715466408967e-05,
+ "loss": 0.3372,
+ "step": 1075
+ },
+ {
+ "epoch": 8.571428571428571,
+ "grad_norm": 0.2314453125,
+ "learning_rate": 1.2177842662977135e-05,
+ "loss": 0.3346,
+ "step": 1080
+ },
+ {
+ "epoch": 8.61111111111111,
+ "grad_norm": 0.236328125,
+ "learning_rate": 1.1523820282334219e-05,
+ "loss": 0.3449,
+ "step": 1085
+ },
+ {
+ "epoch": 8.65079365079365,
+ "grad_norm": 0.234375,
+ "learning_rate": 1.088677381164609e-05,
+ "loss": 0.3368,
+ "step": 1090
+ },
+ {
+ "epoch": 8.69047619047619,
+ "grad_norm": 0.23828125,
+ "learning_rate": 1.0266825480913611e-05,
+ "loss": 0.3379,
+ "step": 1095
+ },
+ {
+ "epoch": 8.73015873015873,
+ "grad_norm": 0.2421875,
+ "learning_rate": 9.664094239520372e-06,
+ "loss": 0.348,
+ "step": 1100
+ },
+ {
+ "epoch": 8.76984126984127,
+ "grad_norm": 0.244140625,
+ "learning_rate": 9.07869573340987e-06,
+ "loss": 0.343,
+ "step": 1105
+ },
+ {
+ "epoch": 8.80952380952381,
+ "grad_norm": 0.236328125,
+ "learning_rate": 8.510742282896544e-06,
+ "loss": 0.3396,
+ "step": 1110
+ },
+ {
+ "epoch": 8.84920634920635,
+ "grad_norm": 0.23828125,
+ "learning_rate": 7.960342861114921e-06,
+ "loss": 0.3391,
+ "step": 1115
+ },
+ {
+ "epoch": 8.88888888888889,
+ "grad_norm": 0.2333984375,
+ "learning_rate": 7.427603073110967e-06,
+ "loss": 0.3405,
+ "step": 1120
+ },
+ {
+ "epoch": 8.928571428571429,
+ "grad_norm": 0.2314453125,
+ "learning_rate": 6.9126251355795864e-06,
+ "loss": 0.3378,
+ "step": 1125
+ },
+ {
+ "epoch": 8.968253968253968,
+ "grad_norm": 0.263671875,
+ "learning_rate": 6.415507857252389e-06,
+ "loss": 0.3457,
+ "step": 1130
+ },
+ {
+ "epoch": 9.0,
+ "eval_loss": 1.3253473043441772,
+ "eval_runtime": 1.0138,
+ "eval_samples_per_second": 1.973,
+ "eval_steps_per_second": 0.986,
+ "step": 1134
+ },
+ {
+ "epoch": 9.007936507936508,
+ "grad_norm": 0.232421875,
+ "learning_rate": 5.936346619939271e-06,
+ "loss": 0.3402,
+ "step": 1135
+ },
+ {
+ "epoch": 9.047619047619047,
+ "grad_norm": 0.248046875,
+ "learning_rate": 5.475233360227516e-06,
+ "loss": 0.3362,
+ "step": 1140
+ },
+ {
+ "epoch": 9.087301587301587,
+ "grad_norm": 0.232421875,
+ "learning_rate": 5.03225655184194e-06,
+ "loss": 0.3346,
+ "step": 1145
+ },
+ {
+ "epoch": 9.126984126984127,
+ "grad_norm": 0.228515625,
+ "learning_rate": 4.607501188669394e-06,
+ "loss": 0.3358,
+ "step": 1150
+ },
+ {
+ "epoch": 9.166666666666666,
+ "grad_norm": 0.232421875,
+ "learning_rate": 4.20104876845111e-06,
+ "loss": 0.333,
+ "step": 1155
+ },
+ {
+ "epoch": 9.206349206349206,
+ "grad_norm": 0.2373046875,
+ "learning_rate": 3.8129772771456797e-06,
+ "loss": 0.3393,
+ "step": 1160
+ },
+ {
+ "epoch": 9.246031746031745,
+ "grad_norm": 0.232421875,
+ "learning_rate": 3.4433611739658645e-06,
+ "loss": 0.3409,
+ "step": 1165
+ },
+ {
+ "epoch": 9.285714285714286,
+ "grad_norm": 0.2373046875,
+ "learning_rate": 3.092271377092215e-06,
+ "loss": 0.34,
+ "step": 1170
+ },
+ {
+ "epoch": 9.325396825396826,
+ "grad_norm": 0.2353515625,
+ "learning_rate": 2.759775250065899e-06,
+ "loss": 0.3362,
+ "step": 1175
+ },
+ {
+ "epoch": 9.365079365079366,
+ "grad_norm": 0.23828125,
+ "learning_rate": 2.4459365888638062e-06,
+ "loss": 0.333,
+ "step": 1180
+ },
+ {
+ "epoch": 9.404761904761905,
+ "grad_norm": 0.2314453125,
+ "learning_rate": 2.150815609657875e-06,
+ "loss": 0.3315,
+ "step": 1185
+ },
+ {
+ "epoch": 9.444444444444445,
+ "grad_norm": 0.23046875,
+ "learning_rate": 1.874468937261531e-06,
+ "loss": 0.3371,
+ "step": 1190
+ },
+ {
+ "epoch": 9.484126984126984,
+ "grad_norm": 0.24609375,
+ "learning_rate": 1.6169495942650714e-06,
+ "loss": 0.339,
+ "step": 1195
+ },
+ {
+ "epoch": 9.523809523809524,
+ "grad_norm": 0.23046875,
+ "learning_rate": 1.378306990862177e-06,
+ "loss": 0.3413,
+ "step": 1200
+ },
+ {
+ "epoch": 9.563492063492063,
+ "grad_norm": 0.2333984375,
+ "learning_rate": 1.158586915369675e-06,
+ "loss": 0.3313,
+ "step": 1205
+ },
+ {
+ "epoch": 9.603174603174603,
+ "grad_norm": 0.2333984375,
+ "learning_rate": 9.578315254420767e-07,
+ "loss": 0.339,
+ "step": 1210
+ },
+ {
+ "epoch": 9.642857142857142,
+ "grad_norm": 0.2392578125,
+ "learning_rate": 7.760793399827937e-07,
+ "loss": 0.3284,
+ "step": 1215
+ },
+ {
+ "epoch": 9.682539682539682,
+ "grad_norm": 0.23828125,
+ "learning_rate": 6.13365231753571e-07,
+ "loss": 0.3359,
+ "step": 1220
+ },
+ {
+ "epoch": 9.722222222222221,
+ "grad_norm": 0.228515625,
+ "learning_rate": 4.6972042068341714e-07,
+ "loss": 0.3394,
+ "step": 1225
+ },
+ {
+ "epoch": 9.761904761904763,
+ "grad_norm": 0.2373046875,
+ "learning_rate": 3.451724678784518e-07,
+ "loss": 0.337,
+ "step": 1230
+ },
+ {
+ "epoch": 9.801587301587302,
+ "grad_norm": 0.2421875,
+ "learning_rate": 2.397452703337577e-07,
+ "loss": 0.3344,
+ "step": 1235
+ },
+ {
+ "epoch": 9.841269841269842,
+ "grad_norm": 0.234375,
+ "learning_rate": 1.5345905634827074e-07,
+ "loss": 0.3307,
+ "step": 1240
+ },
+ {
+ "epoch": 9.880952380952381,
+ "grad_norm": 0.2373046875,
+ "learning_rate": 8.633038164358454e-08,
+ "loss": 0.3361,
+ "step": 1245
+ },
+ {
+ "epoch": 9.920634920634921,
+ "grad_norm": 0.2392578125,
+ "learning_rate": 3.8372126187413704e-08,
+ "loss": 0.34,
+ "step": 1250
+ },
+ {
+ "epoch": 9.96031746031746,
+ "grad_norm": 0.2314453125,
+ "learning_rate": 9.593491722270642e-09,
+ "loss": 0.3385,
+ "step": 1255
+ },
+ {
+ "epoch": 10.0,
+ "grad_norm": 0.2470703125,
+ "learning_rate": 0.0,
+ "loss": 0.3328,
+ "step": 1260
+ },
+ {
+ "epoch": 10.0,
+ "eval_loss": 1.332617163658142,
+ "eval_runtime": 1.0141,
+ "eval_samples_per_second": 1.972,
+ "eval_steps_per_second": 0.986,
+ "step": 1260
+ },
+ {
+ "epoch": 10.0,
+ "step": 1260,
+ "total_flos": 3.8899222565240177e+18,
+ "train_loss": 1.363766341266178,
+ "train_runtime": 8098.2535,
+ "train_samples_per_second": 4.97,
+ "train_steps_per_second": 0.156
+ }
+ ],
+ "logging_steps": 5,
+ "max_steps": 1260,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 10,
+ "save_steps": 100,
+ "total_flos": 3.8899222565240177e+18,
+ "train_batch_size": 4,
+ "trial_name": null,
+ "trial_params": null
+ }
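
The `log_history` entries above end each epoch with an `eval_loss` record, which is where the model card's final `Loss: 1.3326` comes from. As a minimal sketch of how one might recover those per-epoch evaluation points from a `trainer_state.json` of this shape (the inline JSON below is a hypothetical abbreviated sample, not the full file), one could filter on the presence of the `eval_loss` key:

```python
import json

# Hypothetical abbreviated sample with the same structure as the
# Trainer's trainer_state.json; in practice you would json.load() the file.
state = json.loads("""
{
  "log_history": [
    {"epoch": 1.0, "eval_loss": 1.2580, "step": 126},
    {"epoch": 5.04, "grad_norm": 0.289, "learning_rate": 1.16e-4, "loss": 0.4228, "step": 635},
    {"epoch": 10.0, "eval_loss": 1.3326, "step": 1260}
  ]
}
""")

# Training-step records carry "loss"; end-of-epoch evaluation records
# carry "eval_loss", so filtering on that key isolates the eval curve.
eval_points = [(entry["epoch"], entry["eval_loss"])
               for entry in state["log_history"]
               if "eval_loss" in entry]
print(eval_points)  # [(1.0, 1.258), (10.0, 1.3326)]
```

The same filter applied to the full log would yield one point per epoch, suitable for plotting the validation-loss curve summarized in the README's training-results table.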