PEFT
Safetensors
English
Spanish
Commit 8b18696 by nmarafo
1 parent: e6e5e1a

Upload 12 files
README.md CHANGED
@@ -1,95 +1,203 @@
  ---
  library_name: peft
  base_model: google/gemma-7b-it
- datasets:
- - nmarafo/truthful_qa_TrueFalse-Feedback
- language:
- - en
- - es
- license: other
- license_name: gemma-terms-of-use
- license_link: https://ai.google.dev/gemma/terms
-
  ---
 
  # Model Card for Model ID
 
- This adapter returns True or False depending on whether the student's answer ("student_answer") to a question ("question") is correct when compared with a given reference answer ("best_answer").
- The prompt has the following structure:
- ```
- <start_of_turn>user\nAnalyze the question, the expected answer, and the student's response.
- Determine if the student's answer is correct or not. It only returns True if the student's answer is correct with respect to the expected answer or False otherwise.
- Add a brief comment explaining why the answer is correct or incorrect.\n\n
- Question: {question}\n
- Expected Answer: {best_answer}\n
- Student Answer: {student_answer}<end_of_turn><start_of_turn>model
- ```
 
 
  ## How to Get Started with the Model
- In Google Colab:
- ```
-
- !pip install -q -U bitsandbytes
- !pip install -q -U git+https://github.com/huggingface/transformers.git
- !pip install -q -U git+https://github.com/huggingface/peft.git
- !pip install -q -U git+https://github.com/huggingface/accelerate.git
- !pip install -q -U gradio
-
- from transformers import AutoTokenizer, BitsAndBytesConfig
- from peft import AutoPeftModelForCausalLM
- import torch
-
- # Load the model and the tokenizer
- model_id = "google/gemma-7b-it"
- adapter = "nmarafo/Gemma-7B-it-4bit-TrueFalse-Feedback"
-
- # 4-bit NF4 quantization with bfloat16 compute
- bnb_config = BitsAndBytesConfig(
-     load_in_4bit=True,
-     bnb_4bit_quant_type="nf4",
-     bnb_4bit_compute_dtype=torch.bfloat16
- )
-
- tokenizer = AutoTokenizer.from_pretrained(model_id)
-
- # Load the adapter on top of the quantized base model, placed on GPU 0
- model = AutoPeftModelForCausalLM.from_pretrained(adapter, quantization_config=bnb_config, device_map={"":0})
-
- def predict(question, best_answer, student_answer, language):
-     if language == "English":
-         system_message = "Analyze the question, the expected answer, and the student's response. Determine if the student's answer is conceptually correct in relation to the expected answer, regardless of the exact wording. Return True if the student's answer is correct or False otherwise. Add a brief comment explaining the rationale behind the answer being correct or incorrect."
-     else:  # Assume any other option is Spanish
-         system_message = "Analiza la pregunta, la respuesta esperada y la respuesta del estudiante. Determina si la respuesta del estudiante es conceptualmente correcta en relación con la respuesta esperada, independientemente de la redacción exacta. Devuelve Verdadero si la respuesta del estudiante es correcta o Falso en caso contrario. Añade un breve comentario explicando el razonamiento detrás de la corrección o incorrección de la respuesta."
-
-     prompt = f"{system_message}\n\nQuestion: {question}\nExpected Answer: {best_answer}\nStudent Answer: {student_answer}"
-     # Wrap the prompt in Gemma's turn markers, following the prompt structure documented in this card
-     prompt_template = f"<start_of_turn>user\n{prompt}<end_of_turn><start_of_turn>model"
-
-     # Tokenize and keep the attention_mask so it can be passed to generate()
-     encoding = tokenizer(prompt_template, return_tensors='pt', padding=True, truncation=True, max_length=256)
-     input_ids = encoding['input_ids'].cuda()
-     attention_mask = encoding['attention_mask'].cuda()
-
-     # Sample a response; the decoded text includes the prompt followed by the model's answer
-     output = model.generate(input_ids, attention_mask=attention_mask,
-                             temperature=0.7, do_sample=True, top_p=0.95,
-                             top_k=40, max_new_tokens=256, pad_token_id=tokenizer.eos_token_id)
-     response = tokenizer.decode(output[0], skip_special_tokens=True)
-
-     return response
-
- import gradio as gr
-
- # Simple Gradio UI: three text inputs plus a language selector
- iface = gr.Interface(
-     fn=predict,
-     inputs=[
-         gr.Textbox(lines=2, placeholder="Pregunta"),
-         gr.Textbox(lines=2, placeholder="Mejor Respuesta"),
-         gr.Textbox(lines=2, placeholder="Respuesta del Estudiante"),
-         gr.Radio(choices=["English", "Español"], label="Idioma")
-     ],
-     outputs=gr.Textbox(label="Respuesta del Modelo")
- )
- iface.launch(share=True,debug=True)
-
- ```
 
  ### Framework versions
 
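The removed quick-start code returns the model's raw text, which the card describes as a True/False verdict followed by a brief comment. As a minimal sketch (not part of this commit), the verdict could be pulled out of that text as shown here. The helper name `extract_verdict` is hypothetical, and the sketch assumes the model writes the verdict as a standalone word ("True"/"False", or "Verdadero"/"Falso" for Spanish prompts); the actual output format is not shown in the diff.

```python
import re

# Hypothetical helper, not part of the repository. It assumes the verdict
# appears as a standalone word in the generated text, in English or Spanish.
_VERDICT = re.compile(r"\b(true|false|verdadero|falso)\b", re.IGNORECASE)


def extract_verdict(generated_text):
    """Return True/False for the first verdict word found, or None if absent."""
    match = _VERDICT.search(generated_text)
    if match is None:
        return None
    return match.group(1).lower() in ("true", "verdadero")
```

Because the instruction text itself contains the words "True" and "False", this would only be meaningful on the newly generated portion of the output, for example by decoding `output[0][input_ids.shape[1]:]` rather than the full sequence.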
  ---
  library_name: peft
  base_model: google/gemma-7b-it
  ---
 
  # Model Card for Model ID
 
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
 
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
 
  ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+
 
  ### Framework versions
 
adapter_config.json CHANGED
@@ -19,13 +19,13 @@
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
- "o_proj",
- "k_proj",
  "down_proj",
- "v_proj",
  "gate_proj",
  "up_proj",
- "q_proj"
  ],
  "task_type": "CAUSAL_LM",
  "use_rslora": false
 
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
+ "q_proj",
  "down_proj",
  "gate_proj",
  "up_proj",
+ "v_proj",
+ "k_proj",
+ "o_proj"
  ],
  "task_type": "CAUSAL_LM",
  "use_rslora": false
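The `adapter_config.json` hunk only moves entries around inside `target_modules`. A minimal sketch of how that can be checked, using the two lists exactly as they appear in the diff (the variable and function names are illustrative, not from the repository):

```python
# The two lists as shown on the old and new sides of the hunk.
OLD_TARGET_MODULES = ["o_proj", "k_proj", "down_proj", "v_proj",
                      "gate_proj", "up_proj", "q_proj"]
NEW_TARGET_MODULES = ["q_proj", "down_proj", "gate_proj", "up_proj",
                      "v_proj", "k_proj", "o_proj"]


def same_modules(old, new):
    """True when both lists name the same modules, ignoring order."""
    return len(old) == len(new) and set(old) == set(new)


# Same seven modules on both sides, but in a different order.
reordered_only = (same_modules(OLD_TARGET_MODULES, NEW_TARGET_MODULES)
                  and OLD_TARGET_MODULES != NEW_TARGET_MODULES)
```

A plausible explanation, stated here as an assumption rather than something the commit confirms, is that the list is serialized from an unordered collection, so its order can differ between saves without any change in which layers the adapter targets.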
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:30f4a246c1f4f9856ab88a883fd0e60bc5595caa8199955de3c9993f0ae5171d
  size 100059752
 
  version https://git-lfs.github.com/spec/v1
+ oid sha256:2f49b115cbce3ef3723ae99f825dc14bfebed4333b03f710f7edc1cade4f1991
  size 100059752
optimizer.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:0e9822545dcd04649a67476c0930a0020fde0d4172429320eedd4b24afa59839
- size 50545780
 
  version https://git-lfs.github.com/spec/v1
+ oid sha256:3d459d0f52aa7a9d1f02adc583cd92762d3b40bbcf157bc414c1f04e1daf177b
+ size 50546164
rng_state.pth CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:a0e32a00d1dc583ddebb6ee0c3b9cb549aa96ce10f763167188e46f4b89e143b
  size 14244
 
  version https://git-lfs.github.com/spec/v1
+ oid sha256:65ac4c8840217741756b14f988f2593764f15747778c32be0b59f17319024854
  size 14244
scheduler.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:312b1a4e327e3843fca725781dfeb890e3325a1f08b77b1a3b7f934fb5467564
  size 1064
 
  version https://git-lfs.github.com/spec/v1
+ oid sha256:d309a7074f8d0cca2409ea08c87118790cf0c34b0431932fd72cfef6772e0aa0
  size 1064
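The four binary files above are stored as Git LFS pointers, so each diff shows only a `version` line, an `oid` digest, and a `size` in bytes. A minimal sketch of reading such a pointer and comparing two revisions, using the `optimizer.pt` values from this commit (the function name `parse_lfs_pointer` is illustrative):

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


OLD_OPTIMIZER = """version https://git-lfs.github.com/spec/v1
oid sha256:0e9822545dcd04649a67476c0930a0020fde0d4172429320eedd4b24afa59839
size 50545780
"""

NEW_OPTIMIZER = """version https://git-lfs.github.com/spec/v1
oid sha256:3d459d0f52aa7a9d1f02adc583cd92762d3b40bbcf157bc414c1f04e1daf177b
size 50546164
"""

old = parse_lfs_pointer(OLD_OPTIMIZER)
new = parse_lfs_pointer(NEW_OPTIMIZER)
content_changed = old["digest"] != new["digest"]
size_delta = new["size"] - old["size"]  # bytes added in the new revision
```

Of the four pointers, only `optimizer.pt` changes size; the other three keep the same byte count and differ only in their digest.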
trainer_state.json CHANGED
@@ -1,9 +1,9 @@
  {
  "best_metric": null,
  "best_model_checkpoint": null,
- "epoch": 100.0,
  "eval_steps": 500,
- "global_step": 100,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
@@ -12,709 +12,3509 @@
  "epoch": 1.0,
  "grad_norm": NaN,
  "learning_rate": 0.0,
- "loss": 7.3765,
  "step": 1
  },
  {
  "epoch": 2.0,
- "grad_norm": 2.7495672702789307,
  "learning_rate": 0.0001,
- "loss": 7.3765,
  "step": 2
  },
  {
  "epoch": 3.0,
- "grad_norm": 2.7495672702789307,
  "learning_rate": 0.0002,
- "loss": 7.3765,
  "step": 3
  },
  {
  "epoch": 4.0,
- "grad_norm": 3.6275627613067627,
- "learning_rate": 0.00019795918367346938,
- "loss": 6.994,
  "step": 4
  },
  {
  "epoch": 5.0,
- "grad_norm": 7.796189308166504,
- "learning_rate": 0.0001959183673469388,
- "loss": 6.341,
  "step": 5
  },
  {
  "epoch": 6.0,
- "grad_norm": 11.919865608215332,
- "learning_rate": 0.00019387755102040816,
- "loss": 5.805,
  "step": 6
  },
  {
  "epoch": 7.0,
- "grad_norm": Infinity,
- "learning_rate": 0.00019387755102040816,
- "loss": 5.2771,
  "step": 7
  },
  {
  "epoch": 8.0,
- "grad_norm": 15.628558158874512,
- "learning_rate": 0.00019183673469387756,
- "loss": 5.2771,
  "step": 8
  },
  {
  "epoch": 9.0,
- "grad_norm": 18.900388717651367,
- "learning_rate": 0.00018979591836734697,
- "loss": 4.7626,
  "step": 9
  },
  {
  "epoch": 10.0,
- "grad_norm": 21.62285614013672,
- "learning_rate": 0.00018775510204081634,
- "loss": 4.2169,
  "step": 10
  },
  {
  "epoch": 11.0,
- "grad_norm": 23.690582275390625,
- "learning_rate": 0.00018571428571428572,
- "loss": 3.623,
  "step": 11
  },
  {
  "epoch": 12.0,
- "grad_norm": 25.02626609802246,
- "learning_rate": 0.00018367346938775512,
- "loss": 2.9824,
  "step": 12
  },
  {
  "epoch": 13.0,
- "grad_norm": 25.598007202148438,
- "learning_rate": 0.0001816326530612245,
- "loss": 2.3122,
  "step": 13
  },
  {
  "epoch": 14.0,
- "grad_norm": 25.378807067871094,
- "learning_rate": 0.0001795918367346939,
- "loss": 1.6226,
  "step": 14
  },
  {
  "epoch": 15.0,
- "grad_norm": 24.527645111083984,
- "learning_rate": 0.00017755102040816327,
- "loss": 0.9334,
  "step": 15
  },
  {
  "epoch": 16.0,
- "grad_norm": 23.03998565673828,
- "learning_rate": 0.00017551020408163265,
- "loss": 0.2465,
  "step": 16
  },
  {
  "epoch": 17.0,
- "grad_norm": 4.810272216796875,
- "learning_rate": 0.00017346938775510205,
- "loss": 0.2217,
  "step": 17
  },
  {
  "epoch": 18.0,
- "grad_norm": 5.40369987487793,
- "learning_rate": 0.00017142857142857143,
- "loss": 0.2093,
  "step": 18
  },
  {
  "epoch": 19.0,
- "grad_norm": 5.298532962799072,
- "learning_rate": 0.00016938775510204083,
- "loss": 0.1797,
  "step": 19
  },
  {
  "epoch": 20.0,
- "grad_norm": 4.629075050354004,
- "learning_rate": 0.00016734693877551023,
- "loss": 0.1339,
  "step": 20
  },
  {
  "epoch": 21.0,
- "grad_norm": 1.2457849979400635,
- "learning_rate": 0.0001653061224489796,
- "loss": 0.092,
  "step": 21
  },
  {
  "epoch": 22.0,
- "grad_norm": 0.8375206589698792,
- "learning_rate": 0.00016326530612244898,
- "loss": 0.0933,
  "step": 22
  },
  {
  "epoch": 23.0,
- "grad_norm": 0.7440481185913086,
- "learning_rate": 0.00016122448979591838,
- "loss": 0.081,
  "step": 23
  },
  {
  "epoch": 24.0,
- "grad_norm": 0.728550910949707,
- "learning_rate": 0.00015918367346938776,
- "loss": 0.0715,
  "step": 24
  },
  {
  "epoch": 25.0,
- "grad_norm": 0.729324460029602,
- "learning_rate": 0.00015714285714285716,
- "loss": 0.0583,
  "step": 25
  },
  {
  "epoch": 26.0,
- "grad_norm": 0.7445201873779297,
- "learning_rate": 0.00015510204081632654,
- "loss": 0.044,
  "step": 26
  },
  {
  "epoch": 27.0,
- "grad_norm": 0.64507657289505,
- "learning_rate": 0.0001530612244897959,
- "loss": 0.0256,
  "step": 27
  },
  {
  "epoch": 28.0,
- "grad_norm": 0.3869144916534424,
- "learning_rate": 0.0001510204081632653,
- "loss": 0.0138,
  "step": 28
  },
  {
  "epoch": 29.0,
- "grad_norm": 0.17224831879138947,
- "learning_rate": 0.00014897959183673472,
- "loss": 0.0087,
  "step": 29
  },
  {
  "epoch": 30.0,
- "grad_norm": 0.0585104376077652,
- "learning_rate": 0.0001469387755102041,
- "loss": 0.0072,
  "step": 30
  },
  {
  "epoch": 31.0,
- "grad_norm": 0.18696996569633484,
- "learning_rate": 0.0001448979591836735,
- "loss": 0.0081,
  "step": 31
  },
  {
  "epoch": 32.0,
- "grad_norm": 0.10075689852237701,
- "learning_rate": 0.00014285714285714287,
- "loss": 0.0072,
  "step": 32
  },
  {
  "epoch": 33.0,
- "grad_norm": 0.04343040660023689,
- "learning_rate": 0.00014081632653061224,
- "loss": 0.0069,
  "step": 33
  },
  {
  "epoch": 34.0,
- "grad_norm": 0.13335004448890686,
- "learning_rate": 0.00013877551020408165,
- "loss": 0.0074,
  "step": 34
  },
  {
  "epoch": 35.0,
- "grad_norm": 0.0894094929099083,
- "learning_rate": 0.00013673469387755102,
- "loss": 0.007,
  "step": 35
  },
  {
  "epoch": 36.0,
- "grad_norm": 0.01999577507376671,
- "learning_rate": 0.0001346938775510204,
- "loss": 0.0067,
  "step": 36
  },
  {
  "epoch": 37.0,
- "grad_norm": 0.1184980571269989,
- "learning_rate": 0.0001326530612244898,
- "loss": 0.0072,
  "step": 37
  },
  {
  "epoch": 38.0,
- "grad_norm": 0.09607323259115219,
- "learning_rate": 0.00013061224489795917,
- "loss": 0.007,
  "step": 38
  },
  {
  "epoch": 39.0,
- "grad_norm": 0.027331219986081123,
- "learning_rate": 0.00012857142857142858,
- "loss": 0.0067,
  "step": 39
  },
  {
  "epoch": 40.0,
- "grad_norm": 0.08817232400178909,
- "learning_rate": 0.00012653061224489798,
- "loss": 0.0069,
  "step": 40
  },
  {
  "epoch": 41.0,
- "grad_norm": 0.08792853355407715,
- "learning_rate": 0.00012448979591836735,
- "loss": 0.0069,
  "step": 41
  },
  {
  "epoch": 42.0,
- "grad_norm": 0.04289069399237633,
- "learning_rate": 0.00012244897959183676,
- "loss": 0.0067,
  "step": 42
  },
  {
  "epoch": 43.0,
- "grad_norm": 0.04996877163648605,
- "learning_rate": 0.00012040816326530613,
- "loss": 0.0067,
  "step": 43
  },
  {
  "epoch": 44.0,
- "grad_norm": 0.07244863361120224,
- "learning_rate": 0.00011836734693877552,
- "loss": 0.0068,
  "step": 44
  },
  {
  "epoch": 45.0,
- "grad_norm": 0.07215742021799088,
- "learning_rate": 0.0001163265306122449,
- "loss": 0.0068,
  "step": 45
  },
  {
  "epoch": 46.0,
- "grad_norm": 0.01955232582986355,
- "learning_rate": 0.00011428571428571428,
- "loss": 0.0067,
  "step": 46
  },
  {
  "epoch": 47.0,
- "grad_norm": 0.06493868678808212,
- "learning_rate": 0.00011224489795918367,
- "loss": 0.0068,
  "step": 47
  },
  {
  "epoch": 48.0,
- "grad_norm": 0.06490014493465424,
- "learning_rate": 0.00011020408163265306,
- "loss": 0.0068,
  "step": 48
  },
  {
  "epoch": 49.0,
- "grad_norm": 0.019649550318717957,
- "learning_rate": 0.00010816326530612246,
- "loss": 0.0067,
  "step": 49
  },
  {
  "epoch": 50.0,
- "grad_norm": 0.04920223355293274,
- "learning_rate": 0.00010612244897959185,
- "loss": 0.0067,
  "step": 50
  },
  {
  "epoch": 51.0,
- "grad_norm": 0.07163064181804657,
- "learning_rate": 0.00010408163265306123,
- "loss": 0.0068,
  "step": 51
  },
  {
  "epoch": 52.0,
- "grad_norm": 0.005953885614871979,
- "learning_rate": 0.00010204081632653062,
- "loss": 0.0066,
  "step": 52
  },
  {
  "epoch": 53.0,
- "grad_norm": 0.01944654807448387,
- "learning_rate": 0.0001,
- "loss": 0.0066,
  "step": 53
  },
  {
  "epoch": 54.0,
- "grad_norm": 0.0421106182038784,
- "learning_rate": 9.79591836734694e-05,
- "loss": 0.0067,
  "step": 54
  },
  {
  "epoch": 55.0,
- "grad_norm": 0.019489118829369545,
- "learning_rate": 9.591836734693878e-05,
- "loss": 0.0066,
  "step": 55
  },
  {
  "epoch": 56.0,
- "grad_norm": 0.004421094432473183,
- "learning_rate": 9.387755102040817e-05,
- "loss": 0.0066,
  "step": 56
  },
  {
  "epoch": 57.0,
- "grad_norm": 0.026416227221488953,
- "learning_rate": 9.183673469387756e-05,
- "loss": 0.0067,
  "step": 57
  },
  {
  "epoch": 58.0,
- "grad_norm": 0.003954235929995775,
- "learning_rate": 8.979591836734695e-05,
- "loss": 0.0066,
  "step": 58
  },
  {
  "epoch": 59.0,
- "grad_norm": 0.003926219418644905,
- "learning_rate": 8.775510204081632e-05,
- "loss": 0.0066,
  "step": 59
  },
  {
  "epoch": 60.0,
- "grad_norm": 0.0038123615086078644,
- "learning_rate": 8.571428571428571e-05,
- "loss": 0.0066,
  "step": 60
  },
  {
  "epoch": 61.0,
- "grad_norm": 0.003582009579986334,
- "learning_rate": 8.367346938775511e-05,
- "loss": 0.0066,
  "step": 61
  },
  {
  "epoch": 62.0,
- "grad_norm": 0.0035740730818361044,
- "learning_rate": 8.163265306122449e-05,
- "loss": 0.0066,
  "step": 62
  },
  {
  "epoch": 63.0,
- "grad_norm": 0.01964273676276207,
- "learning_rate": 7.959183673469388e-05,
- "loss": 0.0066,
  "step": 63
  },
  {
  "epoch": 64.0,
- "grad_norm": 0.01971287839114666,
- "learning_rate": 7.755102040816327e-05,
- "loss": 0.0066,
  "step": 64
  },
  {
  "epoch": 65.0,
- "grad_norm": 0.0035709121730178595,
- "learning_rate": 7.551020408163266e-05,
- "loss": 0.0066,
  "step": 65
  },
  {
  "epoch": 66.0,
- "grad_norm": 0.003548271721228957,
- "learning_rate": 7.346938775510205e-05,
- "loss": 0.0066,
  "step": 66
  },
  {
  "epoch": 67.0,
- "grad_norm": 0.02695435844361782,
- "learning_rate": 7.142857142857143e-05,
- "loss": 0.0066,
  "step": 67
  },
  {
  "epoch": 68.0,
- "grad_norm": 0.026985742151737213,
- "learning_rate": 6.938775510204082e-05,
- "loss": 0.0066,
  "step": 68
  },
  {
  "epoch": 69.0,
- "grad_norm": 0.00358410133048892,
- "learning_rate": 6.73469387755102e-05,
- "loss": 0.0066,
  "step": 69
  },
  {
  "epoch": 70.0,
- "grad_norm": 0.04342804476618767,
- "learning_rate": 6.530612244897959e-05,
- "loss": 0.0067,
  "step": 70
  },
  {
  "epoch": 71.0,
- "grad_norm": 0.020023003220558167,
- "learning_rate": 6.326530612244899e-05,
- "loss": 0.0066,
  "step": 71
  },
  {
  "epoch": 72.0,
- "grad_norm": 0.020061027258634567,
- "learning_rate": 6.122448979591838e-05,
- "loss": 0.0066,
  "step": 72
  },
  {
  "epoch": 73.0,
- "grad_norm": 0.003791953669860959,
- "learning_rate": 5.918367346938776e-05,
- "loss": 0.0066,
  "step": 73
  },
  {
  "epoch": 74.0,
- "grad_norm": 0.050881966948509216,
- "learning_rate": 5.714285714285714e-05,
- "loss": 0.0067,
  "step": 74
  },
  {
  "epoch": 75.0,
- "grad_norm": 0.027295473963022232,
- "learning_rate": 5.510204081632653e-05,
- "loss": 0.0066,
  "step": 75
  },
  {
  "epoch": 76.0,
- "grad_norm": 0.0037258469965308905,
- "learning_rate": 5.3061224489795926e-05,
- "loss": 0.0066,
  "step": 76
  },
  {
  "epoch": 77.0,
- "grad_norm": 0.020169131457805634,
- "learning_rate": 5.102040816326531e-05,
- "loss": 0.0066,
  "step": 77
  },
  {
  "epoch": 78.0,
- "grad_norm": 0.04392065480351448,
- "learning_rate": 4.89795918367347e-05,
- "loss": 0.0067,
  "step": 78
  },
  {
  "epoch": 79.0,
- "grad_norm": 0.02023773454129696,
- "learning_rate": 4.6938775510204086e-05,
- "loss": 0.0066,
  "step": 79
  },
  {
  "epoch": 80.0,
- "grad_norm": 0.003931655548512936,
- "learning_rate": 4.4897959183673474e-05,
- "loss": 0.0066,
  "step": 80
  },
  {
  "epoch": 81.0,
- "grad_norm": 0.027433717623353004,
- "learning_rate": 4.2857142857142856e-05,
- "loss": 0.0066,
  "step": 81
  },
  {
  "epoch": 82.0,
- "grad_norm": 0.027440495789051056,
- "learning_rate": 4.0816326530612245e-05,
- "loss": 0.0066,
  "step": 82
  },
  {
  "epoch": 83.0,
- "grad_norm": 0.003971911035478115,
- "learning_rate": 3.8775510204081634e-05,
- "loss": 0.0066,
  "step": 83
  },
  {
  "epoch": 84.0,
- "grad_norm": 0.0040692477487027645,
- "learning_rate": 3.673469387755102e-05,
- "loss": 0.0066,
  "step": 84
  },
  {
  "epoch": 85.0,
- "grad_norm": 0.02032075822353363,
- "learning_rate": 3.469387755102041e-05,
- "loss": 0.0066,
  "step": 85
  },
  {
  "epoch": 86.0,
- "grad_norm": 0.004029002971947193,
- "learning_rate": 3.265306122448979e-05,
- "loss": 0.0066,
  "step": 86
  },
  {
  "epoch": 87.0,
- "grad_norm": 0.02034132555127144,
- "learning_rate": 3.061224489795919e-05,
- "loss": 0.0066,
  "step": 87
  },
  {
  "epoch": 88.0,
- "grad_norm": 0.003994234371930361,
- "learning_rate": 2.857142857142857e-05,
- "loss": 0.0066,
  "step": 88
  },
  {
  "epoch": 89.0,
- "grad_norm": 0.004034143406897783,
- "learning_rate": 2.6530612244897963e-05,
- "loss": 0.0066,
  "step": 89
  },
  {
  "epoch": 90.0,
- "grad_norm": 0.004001120571047068,
- "learning_rate": 2.448979591836735e-05,
- "loss": 0.0066,
  "step": 90
  },
  {
  "epoch": 91.0,
- "grad_norm": 0.020308438688516617,
- "learning_rate": 2.2448979591836737e-05,
- "loss": 0.0066,
  "step": 91
  },
  {
  "epoch": 92.0,
- "grad_norm": 0.004174523055553436,
- "learning_rate": 2.0408163265306123e-05,
- "loss": 0.0066,
  "step": 92
  },
  {
  "epoch": 93.0,
- "grad_norm": 0.004282441921532154,
- "learning_rate": 1.836734693877551e-05,
- "loss": 0.0066,
  "step": 93
  },
  {
  "epoch": 94.0,
- "grad_norm": 0.004000538494437933,
- "learning_rate": 1.6326530612244897e-05,
- "loss": 0.0066,
  "step": 94
  },
  {
  "epoch": 95.0,
- "grad_norm": 0.003992615267634392,
- "learning_rate": 1.4285714285714285e-05,
- "loss": 0.0066,
  "step": 95
  },
  {
  "epoch": 96.0,
- "grad_norm": 0.0273627657443285,
- "learning_rate": 1.2244897959183674e-05,
- "loss": 0.0066,
  "step": 96
  },
  {
  "epoch": 97.0,
- "grad_norm": 0.027324387803673744,
- "learning_rate": 1.0204081632653061e-05,
- "loss": 0.0066,
  "step": 97
  },
  {
  "epoch": 98.0,
- "grad_norm": 0.027326995506882668,
- "learning_rate": 8.163265306122448e-06,
- "loss": 0.0066,
  "step": 98
  },
  {
  "epoch": 99.0,
- "grad_norm": 0.020300107076764107,
- "learning_rate": 6.122448979591837e-06,
- "loss": 0.0066,
  "step": 99
  },
  {
  "epoch": 100.0,
- "grad_norm": 0.003976646810770035,
- "learning_rate": 4.081632653061224e-06,
- "loss": 0.0066,
  "step": 100
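
The `learning_rate` values in the removed log entries look like a linear warmup followed by a linear decay. The sketch below reproduces them under assumptions inferred from the numbers rather than stated anywhere in the commit: a peak rate of 2e-4, 2 warmup steps, 100 total steps, and a scheduler that does not advance on steps 1 and 7, where `grad_norm` is logged as NaN and Infinity. The function names are illustrative.

```python
import math


def linear_schedule_lr(t, base_lr=2e-4, warmup=2, total=100):
    """Learning rate after t scheduler updates: linear warmup, then linear decay."""
    if t < warmup:
        return base_lr * t / warmup
    return base_lr * max(0.0, (total - t) / (total - warmup))


def logged_lrs(num_steps=100, skipped=(1, 7)):
    """Map each training step to the rate it would log.

    Assumption: on steps whose gradients were NaN/Infinity the optimizer step
    is skipped and the scheduler counter t stays where it was.
    """
    t = 0
    rates = {}
    for step in range(1, num_steps + 1):
        if step not in skipped:
            t += 1
        rates[step] = linear_schedule_lr(t)
    return rates


rates = logged_lrs()
# Spot checks against values taken from the log above.
matches_log = (
    rates[1] == 0.0
    and math.isclose(rates[4], 0.00019795918367346938, rel_tol=1e-9)
    and math.isclose(rates[7], rates[6], rel_tol=1e-12)
    and math.isclose(rates[100], 4.081632653061224e-06, rel_tol=1e-9)
)
```

Under the same assumptions the repeated loss values also fit: step 7 is skipped, so step 8 logs the same loss (5.2771), and steps 1 to 3 share 7.3765 because step 1 is skipped and step 2 would update with a rate of zero.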
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
710
  }
711
  ],
712
  "logging_steps": 1,
713
- "max_steps": 100,
714
  "num_input_tokens_seen": 0,
715
- "num_train_epochs": 100,
716
  "save_steps": 500,
717
- "total_flos": 499235306496000.0,
718
  "train_batch_size": 1,
719
  "trial_name": null,
720
  "trial_params": null
 
1
  {
2
  "best_metric": null,
3
  "best_model_checkpoint": null,
4
+ "epoch": 500.0,
5
  "eval_steps": 500,
6
+ "global_step": 500,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
 
12
  "epoch": 1.0,
13
  "grad_norm": NaN,
14
  "learning_rate": 0.0,
15
+ "loss": 5.453,
16
  "step": 1
17
  },
18
  {
19
  "epoch": 2.0,
20
+ "grad_norm": 2.349853515625,
21
  "learning_rate": 0.0001,
22
+ "loss": 5.453,
23
  "step": 2
24
  },
25
  {
26
  "epoch": 3.0,
27
+ "grad_norm": 2.349853515625,
28
  "learning_rate": 0.0002,
29
+ "loss": 5.453,
30
  "step": 3
31
  },
32
  {
33
  "epoch": 4.0,
34
+ "grad_norm": 2.8772072792053223,
35
+ "learning_rate": 0.0001995983935742972,
36
+ "loss": 5.1576,
37
  "step": 4
38
  },
39
  {
40
  "epoch": 5.0,
41
+ "grad_norm": 5.726413726806641,
42
+ "learning_rate": 0.0001991967871485944,
43
+ "loss": 4.6773,
44
  "step": 5
45
  },
46
  {
47
  "epoch": 6.0,
48
+ "grad_norm": 8.641162872314453,
49
+ "learning_rate": 0.00019879518072289158,
50
+ "loss": 4.2517,
51
  "step": 6
52
  },
53
  {
54
  "epoch": 7.0,
55
+ "grad_norm": 11.281049728393555,
56
+ "learning_rate": 0.00019839357429718877,
57
+ "loss": 3.8401,
58
  "step": 7
59
  },
60
  {
61
  "epoch": 8.0,
62
+ "grad_norm": 13.561233520507812,
63
+ "learning_rate": 0.00019799196787148596,
64
+ "loss": 3.428,
65
  "step": 8
66
  },
67
  {
68
  "epoch": 9.0,
69
+ "grad_norm": Infinity,
70
+ "learning_rate": 0.00019799196787148596,
71
+ "loss": 2.9945,
72
  "step": 9
73
  },
74
  {
75
  "epoch": 10.0,
76
+ "grad_norm": 15.408284187316895,
77
+ "learning_rate": 0.00019759036144578314,
78
+ "loss": 2.9945,
79
  "step": 10
80
  },
81
  {
82
  "epoch": 11.0,
83
+ "grad_norm": 16.737504959106445,
84
+ "learning_rate": 0.00019718875502008033,
85
+ "loss": 2.5293,
86
  "step": 11
87
  },
88
  {
89
  "epoch": 12.0,
90
+ "grad_norm": 17.475238800048828,
91
+ "learning_rate": 0.00019678714859437752,
92
+ "loss": 2.0314,
93
  "step": 12
94
  },
95
  {
96
  "epoch": 13.0,
97
+ "grad_norm": 17.607587814331055,
98
+ "learning_rate": 0.0001963855421686747,
99
+ "loss": 1.5013,
100
  "step": 13
101
  },
102
  {
103
  "epoch": 14.0,
104
+ "grad_norm": 17.160503387451172,
105
+ "learning_rate": 0.0001959839357429719,
106
+ "loss": 0.9549,
107
  "step": 14
108
  },
109
  {
110
  "epoch": 15.0,
111
+ "grad_norm": 16.20315933227539,
112
+ "learning_rate": 0.00019558232931726906,
113
+ "loss": 0.4056,
114
  "step": 15
115
  },
116
  {
117
  "epoch": 16.0,
118
+ "grad_norm": 2.948519229888916,
119
+ "learning_rate": 0.00019518072289156628,
120
+ "loss": 0.1432,
121
  "step": 16
122
  },
123
  {
124
  "epoch": 17.0,
125
+ "grad_norm": 3.228358507156372,
126
+ "learning_rate": 0.00019477911646586347,
127
+ "loss": 0.1246,
128
  "step": 17
129
  },
130
  {
131
  "epoch": 18.0,
132
+ "grad_norm": 3.2282843589782715,
133
+ "learning_rate": 0.00019437751004016066,
134
+ "loss": 0.1034,
135
  "step": 18
136
  },
137
  {
138
  "epoch": 19.0,
139
+ "grad_norm": 1.0867902040481567,
140
+ "learning_rate": 0.00019397590361445782,
141
+ "loss": 0.0786,
142
  "step": 19
143
  },
144
  {
145
  "epoch": 20.0,
146
+ "grad_norm": 0.5086488723754883,
147
+ "learning_rate": 0.00019357429718875504,
148
+ "loss": 0.0706,
149
  "step": 20
150
  },
151
  {
152
  "epoch": 21.0,
153
+ "grad_norm": 0.4829910397529602,
154
+ "learning_rate": 0.00019317269076305223,
155
+ "loss": 0.063,
156
  "step": 21
157
  },
158
  {
159
  "epoch": 22.0,
160
+ "grad_norm": 0.5215936899185181,
161
+ "learning_rate": 0.00019277108433734942,
162
+ "loss": 0.0565,
163
  "step": 22
164
  },
165
  {
166
  "epoch": 23.0,
167
+ "grad_norm": 0.5226811766624451,
168
+ "learning_rate": 0.00019236947791164658,
169
+ "loss": 0.0445,
170
  "step": 23
171
  },
172
  {
173
  "epoch": 24.0,
174
+ "grad_norm": 0.5145213603973389,
175
+ "learning_rate": 0.00019196787148594377,
176
+ "loss": 0.0316,
177
  "step": 24
178
  },
179
  {
180
  "epoch": 25.0,
181
+ "grad_norm": 0.50876384973526,
182
+ "learning_rate": 0.00019156626506024098,
183
+ "loss": 0.0197,
184
  "step": 25
185
  },
186
  {
187
  "epoch": 26.0,
188
+ "grad_norm": 0.23416705429553986,
189
+ "learning_rate": 0.00019116465863453817,
190
+ "loss": 0.0081,
191
  "step": 26
192
  },
193
  {
194
  "epoch": 27.0,
195
+ "grad_norm": 0.07123460620641708,
196
+ "learning_rate": 0.00019076305220883533,
197
+ "loss": 0.0055,
198
  "step": 27
199
  },
200
  {
201
  "epoch": 28.0,
202
+ "grad_norm": 0.17463913559913635,
203
+ "learning_rate": 0.00019036144578313252,
204
+ "loss": 0.0056,
205
  "step": 28
206
  },
207
  {
208
  "epoch": 29.0,
209
+ "grad_norm": 0.08099503815174103,
210
+ "learning_rate": 0.00018995983935742974,
211
+ "loss": 0.005,
212
  "step": 29
213
  },
214
  {
215
  "epoch": 30.0,
216
+ "grad_norm": 0.10401125252246857,
217
+ "learning_rate": 0.00018955823293172693,
218
+ "loss": 0.005,
219
  "step": 30
220
  },
221
  {
222
  "epoch": 31.0,
223
+ "grad_norm": 0.18095582723617554,
224
+ "learning_rate": 0.0001891566265060241,
225
+ "loss": 0.0057,
226
  "step": 31
227
  },
228
  {
229
  "epoch": 32.0,
230
+ "grad_norm": 0.056214649230241776,
231
+ "learning_rate": 0.00018875502008032128,
232
+ "loss": 0.0048,
233
  "step": 32
234
  },
235
  {
236
  "epoch": 33.0,
237
+ "grad_norm": 0.12311957776546478,
238
+ "learning_rate": 0.0001883534136546185,
239
+ "loss": 0.0051,
240
  "step": 33
241
  },
242
  {
243
  "epoch": 34.0,
244
+ "grad_norm": 0.1186085045337677,
245
+ "learning_rate": 0.00018795180722891569,
246
+ "loss": 0.0052,
247
  "step": 34
248
  },
249
  {
250
  "epoch": 35.0,
251
+ "grad_norm": 0.05622096359729767,
252
+ "learning_rate": 0.00018755020080321285,
253
+ "loss": 0.0048,
254
  "step": 35
255
  },
256
  {
257
  "epoch": 36.0,
258
+ "grad_norm": 0.1179257333278656,
259
+ "learning_rate": 0.00018714859437751004,
260
+ "loss": 0.0051,
261
  "step": 36
262
  },
263
  {
264
  "epoch": 37.0,
265
+ "grad_norm": 0.02104870229959488,
266
+ "learning_rate": 0.00018674698795180723,
267
+ "loss": 0.005,
268
  "step": 37
269
  },
270
  {
271
  "epoch": 38.0,
272
+ "grad_norm": 0.09374430030584335,
273
+ "learning_rate": 0.00018634538152610444,
274
+ "loss": 0.005,
275
  "step": 38
276
  },
277
  {
278
  "epoch": 39.0,
279
+ "grad_norm": 0.03337598219513893,
280
+ "learning_rate": 0.0001859437751004016,
281
+ "loss": 0.0048,
282
  "step": 39
283
  },
284
  {
285
  "epoch": 40.0,
286
+ "grad_norm": 0.09240953624248505,
287
+ "learning_rate": 0.0001855421686746988,
288
+ "loss": 0.005,
289
  "step": 40
290
  },
291
  {
292
  "epoch": 41.0,
293
+ "grad_norm": 0.02956731617450714,
294
+ "learning_rate": 0.00018514056224899598,
295
+ "loss": 0.0048,
296
  "step": 41
297
  },
298
  {
299
  "epoch": 42.0,
300
+ "grad_norm": 0.08981137722730637,
301
+ "learning_rate": 0.0001847389558232932,
302
+ "loss": 0.005,
303
  "step": 42
304
  },
305
  {
306
  "epoch": 43.0,
307
+ "grad_norm": 0.0320754237473011,
308
+ "learning_rate": 0.00018433734939759036,
309
+ "loss": 0.0048,
310
  "step": 43
311
  },
312
  {
313
  "epoch": 44.0,
314
+ "grad_norm": 0.088747039437294,
315
+ "learning_rate": 0.00018393574297188755,
316
+ "loss": 0.005,
317
  "step": 44
318
  },
319
  {
320
  "epoch": 45.0,
321
+ "grad_norm": 0.04862065240740776,
322
+ "learning_rate": 0.00018353413654618474,
323
+ "loss": 0.0048,
324
  "step": 45
325
  },
326
  {
327
  "epoch": 46.0,
328
+ "grad_norm": 0.08655441552400589,
329
+ "learning_rate": 0.00018313253012048193,
330
+ "loss": 0.005,
331
  "step": 46
332
  },
333
  {
334
  "epoch": 47.0,
335
+ "grad_norm": 0.06792537122964859,
336
+ "learning_rate": 0.00018273092369477912,
337
+ "loss": 0.0049,
338
  "step": 47
339
  },
340
  {
341
  "epoch": 48.0,
342
+ "grad_norm": 0.08522295206785202,
343
+ "learning_rate": 0.0001823293172690763,
344
+ "loss": 0.005,
345
  "step": 48
346
  },
347
  {
348
  "epoch": 49.0,
349
+ "grad_norm": 0.04663475975394249,
350
+ "learning_rate": 0.0001819277108433735,
351
+ "loss": 0.0048,
352
  "step": 49
353
  },
354
  {
355
  "epoch": 50.0,
356
+ "grad_norm": 0.029744861647486687,
357
+ "learning_rate": 0.0001815261044176707,
358
+ "loss": 0.0048,
359
  "step": 50
360
  },
361
  {
362
  "epoch": 51.0,
363
+ "grad_norm": 0.06534643471240997,
364
+ "learning_rate": 0.0001811244979919679,
365
+ "loss": 0.0049,
366
  "step": 51
367
  },
368
  {
369
  "epoch": 52.0,
370
+ "grad_norm": 0.007808285299688578,
371
+ "learning_rate": 0.00018072289156626507,
372
+ "loss": 0.0047,
373
  "step": 52
374
  },
375
  {
376
  "epoch": 53.0,
377
+ "grad_norm": 0.0546131432056427,
378
+ "learning_rate": 0.00018032128514056225,
379
+ "loss": 0.0051,
380
  "step": 53
381
  },
382
  {
383
  "epoch": 54.0,
384
+ "grad_norm": 0.016065070405602455,
385
+ "learning_rate": 0.00017991967871485944,
386
+ "loss": 0.0045,
387
  "step": 54
388
  },
389
  {
390
  "epoch": 55.0,
391
+ "grad_norm": 0.02857162430882454,
392
+ "learning_rate": 0.00017951807228915663,
393
+ "loss": 0.0048,
394
  "step": 55
395
  },
396
  {
397
  "epoch": 56.0,
398
+ "grad_norm": 0.03628357872366905,
399
+ "learning_rate": 0.00017911646586345382,
400
+ "loss": 0.005,
401
  "step": 56
402
  },
403
  {
404
  "epoch": 57.0,
405
+ "grad_norm": 0.0075448257848620415,
406
+ "learning_rate": 0.000178714859437751,
407
+ "loss": 0.0047,
408
  "step": 57
409
  },
410
  {
411
  "epoch": 58.0,
412
+ "grad_norm": 0.04328390583395958,
413
+ "learning_rate": 0.0001783132530120482,
414
+ "loss": 0.0048,
415
  "step": 58
416
  },
417
  {
418
  "epoch": 59.0,
419
+ "grad_norm": 0.0074650561437010765,
420
+ "learning_rate": 0.0001779116465863454,
421
+ "loss": 0.0047,
422
  "step": 59
423
  },
424
  {
425
  "epoch": 60.0,
426
+ "grad_norm": 0.04476163163781166,
427
+ "learning_rate": 0.00017751004016064258,
428
+ "loss": 0.0048,
429
  "step": 60
430
  },
431
  {
432
  "epoch": 61.0,
433
+ "grad_norm": 0.010204787366092205,
434
+ "learning_rate": 0.00017710843373493977,
435
+ "loss": 0.0047,
436
  "step": 61
437
  },
438
  {
439
  "epoch": 62.0,
440
+ "grad_norm": 0.007382046431303024,
441
+ "learning_rate": 0.00017670682730923696,
442
+ "loss": 0.0047,
443
  "step": 62
444
  },
445
  {
446
  "epoch": 63.0,
447
+ "grad_norm": 0.042141955345869064,
448
+ "learning_rate": 0.00017630522088353415,
449
+ "loss": 0.0048,
450
  "step": 63
451
  },
452
  {
453
  "epoch": 64.0,
454
+ "grad_norm": 0.007281558588147163,
455
+ "learning_rate": 0.00017590361445783134,
456
+ "loss": 0.0047,
457
  "step": 64
458
  },
459
  {
460
  "epoch": 65.0,
461
+ "grad_norm": 0.04382139816880226,
462
+ "learning_rate": 0.00017550200803212853,
463
+ "loss": 0.0048,
464
  "step": 65
465
  },
466
  {
467
  "epoch": 66.0,
468
+ "grad_norm": 0.027046991512179375,
469
+ "learning_rate": 0.00017510040160642571,
470
+ "loss": 0.0048,
471
  "step": 66
472
  },
473
  {
474
  "epoch": 67.0,
475
+ "grad_norm": 0.041668713092803955,
476
+ "learning_rate": 0.0001746987951807229,
477
+ "loss": 0.0048,
478
  "step": 67
479
  },
480
  {
481
  "epoch": 68.0,
482
+ "grad_norm": 0.024397345259785652,
483
+ "learning_rate": 0.0001742971887550201,
484
+ "loss": 0.0048,
485
  "step": 68
486
  },
487
  {
488
  "epoch": 69.0,
489
+ "grad_norm": 0.010038570500910282,
490
+ "learning_rate": 0.00017389558232931728,
491
+ "loss": 0.0047,
492
  "step": 69
493
  },
494
  {
495
  "epoch": 70.0,
496
+ "grad_norm": 0.04328250139951706,
497
+ "learning_rate": 0.00017349397590361447,
498
+ "loss": 0.0048,
499
  "step": 70
500
  },
501
  {
502
  "epoch": 71.0,
503
+ "grad_norm": 0.007169268559664488,
504
+ "learning_rate": 0.00017309236947791166,
505
+ "loss": 0.0047,
506
  "step": 71
507
  },
508
  {
509
  "epoch": 72.0,
510
+ "grad_norm": 0.024196363985538483,
511
+ "learning_rate": 0.00017269076305220885,
512
+ "loss": 0.0048,
513
  "step": 72
514
  },
515
  {
516
  "epoch": 73.0,
517
+ "grad_norm": 0.016377076506614685,
518
+ "learning_rate": 0.00017228915662650604,
519
+ "loss": 0.005,
520
  "step": 73
521
  },
522
  {
523
  "epoch": 74.0,
524
+ "grad_norm": 0.009935774840414524,
525
+ "learning_rate": 0.00017188755020080323,
526
+ "loss": 0.0047,
527
  "step": 74
528
  },
529
  {
530
  "epoch": 75.0,
531
+ "grad_norm": 0.026620058342814445,
532
+ "learning_rate": 0.00017148594377510042,
533
+ "loss": 0.0048,
534
  "step": 75
535
  },
536
  {
537
  "epoch": 76.0,
538
+ "grad_norm": 0.00991370715200901,
539
+ "learning_rate": 0.0001710843373493976,
540
+ "loss": 0.0047,
541
  "step": 76
542
  },
543
  {
544
  "epoch": 77.0,
545
+ "grad_norm": 0.041056908667087555,
546
+ "learning_rate": 0.00017068273092369477,
547
+ "loss": 0.0048,
548
  "step": 77
549
  },
550
  {
551
  "epoch": 78.0,
552
+ "grad_norm": 0.0022562043741345406,
553
+ "learning_rate": 0.00017028112449799199,
554
+ "loss": 0.0045,
555
  "step": 78
556
  },
557
  {
558
  "epoch": 79.0,
559
+ "grad_norm": 0.009892994537949562,
560
+ "learning_rate": 0.00016987951807228917,
561
+ "loss": 0.0047,
562
  "step": 79
563
  },
564
  {
565
  "epoch": 80.0,
566
+ "grad_norm": 0.026598593220114708,
567
+ "learning_rate": 0.00016947791164658636,
568
+ "loss": 0.0048,
569
  "step": 80
570
  },
571
  {
572
  "epoch": 81.0,
573
+ "grad_norm": 0.007094119675457478,
574
+ "learning_rate": 0.00016907630522088353,
575
+ "loss": 0.0047,
576
  "step": 81
577
  },
578
  {
579
  "epoch": 82.0,
580
+ "grad_norm": 0.024074360728263855,
581
+ "learning_rate": 0.00016867469879518074,
582
+ "loss": 0.0048,
583
  "step": 82
584
  },
585
  {
586
  "epoch": 83.0,
587
+ "grad_norm": 0.007143808528780937,
588
+ "learning_rate": 0.00016827309236947793,
589
+ "loss": 0.0047,
590
  "step": 83
591
  },
592
  {
593
  "epoch": 84.0,
594
+ "grad_norm": 0.00991752091795206,
595
+ "learning_rate": 0.00016787148594377512,
596
+ "loss": 0.0047,
597
  "step": 84
598
  },
599
  {
600
  "epoch": 85.0,
601
+ "grad_norm": 0.026643570512533188,
602
+ "learning_rate": 0.00016746987951807228,
603
+ "loss": 0.0048,
604
  "step": 85
605
  },
606
  {
607
  "epoch": 86.0,
608
+ "grad_norm": 0.0071501159109175205,
609
+ "learning_rate": 0.00016706827309236947,
610
+ "loss": 0.0047,
611
  "step": 86
612
  },
613
  {
614
  "epoch": 87.0,
615
+ "grad_norm": 0.041228219866752625,
616
+ "learning_rate": 0.0001666666666666667,
617
+ "loss": 0.0048,
618
  "step": 87
619
  },
620
  {
621
  "epoch": 88.0,
622
+ "grad_norm": 0.026733651757240295,
623
+ "learning_rate": 0.00016626506024096388,
624
+ "loss": 0.0048,
625
  "step": 88
626
  },
627
  {
628
  "epoch": 89.0,
629
+ "grad_norm": 0.01906830444931984,
630
+ "learning_rate": 0.00016586345381526104,
631
+ "loss": 0.0045,
632
  "step": 89
633
  },
634
  {
635
  "epoch": 90.0,
636
+ "grad_norm": 0.016484271734952927,
637
+ "learning_rate": 0.00016546184738955823,
638
+ "loss": 0.005,
639
  "step": 90
640
  },
641
  {
642
  "epoch": 91.0,
643
+ "grad_norm": 0.007174884434789419,
644
+ "learning_rate": 0.00016506024096385545,
645
+ "loss": 0.0047,
646
  "step": 91
647
  },
648
  {
649
  "epoch": 92.0,
650
+ "grad_norm": 0.007238594815135002,
651
+ "learning_rate": 0.00016465863453815263,
652
+ "loss": 0.0047,
653
  "step": 92
654
  },
655
  {
656
  "epoch": 93.0,
657
+ "grad_norm": 0.010034309700131416,
658
+ "learning_rate": 0.0001642570281124498,
659
+ "loss": 0.0047,
660
  "step": 93
661
  },
662
  {
663
  "epoch": 94.0,
664
+ "grad_norm": 0.02700984664261341,
665
+ "learning_rate": 0.00016385542168674699,
666
+ "loss": 0.0048,
667
  "step": 94
668
  },
669
  {
670
  "epoch": 95.0,
671
+ "grad_norm": 0.015189659781754017,
672
+ "learning_rate": 0.00016345381526104417,
673
+ "loss": 0.0045,
674
  "step": 95
675
  },
676
  {
677
  "epoch": 96.0,
678
+ "grad_norm": 0.024589456617832184,
679
+ "learning_rate": 0.0001630522088353414,
680
+ "loss": 0.0048,
681
  "step": 96
682
  },
683
  {
684
  "epoch": 97.0,
685
+ "grad_norm": 0.001375726773403585,
686
+ "learning_rate": 0.00016265060240963855,
687
+ "loss": 0.0049,
688
  "step": 97
689
  },
690
  {
691
  "epoch": 98.0,
692
+ "grad_norm": 0.04419811815023422,
693
+ "learning_rate": 0.00016224899598393574,
694
+ "loss": 0.0048,
695
  "step": 98
696
  },
697
  {
698
  "epoch": 99.0,
699
+ "grad_norm": 0.007295660208910704,
700
+ "learning_rate": 0.00016184738955823293,
701
+ "loss": 0.0047,
702
  "step": 99
703
  },
704
  {
705
  "epoch": 100.0,
706
+ "grad_norm": 0.0422837920486927,
707
+ "learning_rate": 0.00016144578313253015,
708
+ "loss": 0.0048,
709
  "step": 100
710
+ },
711
+ {
712
+ "epoch": 101.0,
713
+ "grad_norm": 0.007468740921467543,
714
+ "learning_rate": 0.0001610441767068273,
715
+ "loss": 0.0047,
716
+ "step": 101
717
+ },
718
+ {
719
+ "epoch": 102.0,
720
+ "grad_norm": 0.052220698446035385,
721
+ "learning_rate": 0.0001606425702811245,
722
+ "loss": 0.0051,
723
+ "step": 102
724
+ },
725
+ {
726
+ "epoch": 103.0,
727
+ "grad_norm": 0.010211293585598469,
728
+ "learning_rate": 0.0001602409638554217,
729
+ "loss": 0.0047,
730
+ "step": 103
731
+ },
732
+ {
733
+ "epoch": 104.0,
734
+ "grad_norm": 0.0423286072909832,
735
+ "learning_rate": 0.00015983935742971888,
736
+ "loss": 0.0048,
737
+ "step": 104
738
+ },
739
+ {
740
+ "epoch": 105.0,
741
+ "grad_norm": 0.02506270445883274,
742
+ "learning_rate": 0.00015943775100401607,
743
+ "loss": 0.0048,
744
+ "step": 105
745
+ },
746
+ {
747
+ "epoch": 106.0,
748
+ "grad_norm": 0.04458456113934517,
749
+ "learning_rate": 0.00015903614457831326,
750
+ "loss": 0.0048,
751
+ "step": 106
752
+ },
753
+ {
754
+ "epoch": 107.0,
755
+ "grad_norm": 0.02753458172082901,
756
+ "learning_rate": 0.00015863453815261045,
757
+ "loss": 0.0048,
758
+ "step": 107
759
+ },
760
+ {
761
+ "epoch": 108.0,
762
+ "grad_norm": 0.04234497249126434,
763
+ "learning_rate": 0.00015823293172690763,
764
+ "loss": 0.0048,
765
+ "step": 108
766
+ },
767
+ {
768
+ "epoch": 109.0,
769
+ "grad_norm": 0.03287286311388016,
770
+ "learning_rate": 0.00015783132530120482,
771
+ "loss": 0.0046,
772
+ "step": 109
773
+ },
774
+ {
775
+ "epoch": 110.0,
776
+ "grad_norm": 0.03671009838581085,
777
+ "learning_rate": 0.000157429718875502,
778
+ "loss": 0.0046,
779
+ "step": 110
780
+ },
781
+ {
782
+ "epoch": 111.0,
783
+ "grad_norm": 0.044372282922267914,
784
+ "learning_rate": 0.0001570281124497992,
785
+ "loss": 0.0048,
786
+ "step": 111
787
+ },
788
+ {
789
+ "epoch": 112.0,
790
+ "grad_norm": 0.03433435037732124,
791
+ "learning_rate": 0.0001566265060240964,
792
+ "loss": 0.005,
793
+ "step": 112
794
+ },
795
+ {
796
+ "epoch": 113.0,
797
+ "grad_norm": 0.04201051965355873,
798
+ "learning_rate": 0.00015622489959839358,
799
+ "loss": 0.0048,
800
+ "step": 113
801
+ },
802
+ {
803
+ "epoch": 114.0,
804
+ "grad_norm": 0.027228495106101036,
805
+ "learning_rate": 0.00015582329317269077,
806
+ "loss": 0.0048,
807
+ "step": 114
808
+ },
809
+ {
810
+ "epoch": 115.0,
811
+ "grad_norm": 0.04395684599876404,
812
+ "learning_rate": 0.00015542168674698796,
813
+ "loss": 0.0048,
814
+ "step": 115
815
+ },
816
+ {
817
+ "epoch": 116.0,
818
+ "grad_norm": 0.0418144129216671,
819
+ "learning_rate": 0.00015502008032128515,
820
+ "loss": 0.0048,
821
+ "step": 116
822
+ },
823
+ {
824
+ "epoch": 117.0,
825
+ "grad_norm": 0.015051459893584251,
826
+ "learning_rate": 0.00015461847389558234,
827
+ "loss": 0.0045,
828
+ "step": 117
829
+ },
830
+ {
831
+ "epoch": 118.0,
832
+ "grad_norm": 0.010102898813784122,
833
+ "learning_rate": 0.00015421686746987953,
834
+ "loss": 0.0047,
835
+ "step": 118
836
+ },
837
+ {
838
+ "epoch": 119.0,
839
+ "grad_norm": 0.026845330372452736,
840
+ "learning_rate": 0.00015381526104417672,
841
+ "loss": 0.0048,
842
+ "step": 119
843
+ },
844
+ {
845
+ "epoch": 120.0,
846
+ "grad_norm": 0.010000316426157951,
847
+ "learning_rate": 0.0001534136546184739,
848
+ "loss": 0.0047,
849
+ "step": 120
850
+ },
851
+ {
852
+ "epoch": 121.0,
853
+ "grad_norm": 0.041023530066013336,
854
+ "learning_rate": 0.0001530120481927711,
855
+ "loss": 0.0048,
856
+ "step": 121
857
+ },
858
+ {
859
+ "epoch": 122.0,
860
+ "grad_norm": 0.0072840056382119656,
861
+ "learning_rate": 0.00015261044176706828,
862
+ "loss": 0.0047,
863
+ "step": 122
864
+ },
865
+ {
866
+ "epoch": 123.0,
867
+ "grad_norm": 0.02653086557984352,
868
+ "learning_rate": 0.00015220883534136547,
869
+ "loss": 0.0048,
870
+ "step": 123
871
+ },
872
+ {
873
+ "epoch": 124.0,
874
+ "grad_norm": 0.01739220879971981,
875
+ "learning_rate": 0.00015180722891566266,
876
+ "loss": 0.005,
877
+ "step": 124
878
+ },
879
+ {
880
+ "epoch": 125.0,
881
+ "grad_norm": 0.007263750769197941,
882
+ "learning_rate": 0.00015140562248995985,
883
+ "loss": 0.0047,
884
+ "step": 125
885
+ },
886
+ {
887
+ "epoch": 126.0,
888
+ "grad_norm": 0.04079929739236832,
889
+ "learning_rate": 0.00015100401606425701,
890
+ "loss": 0.0048,
891
+ "step": 126
892
+ },
893
+ {
894
+ "epoch": 127.0,
895
+ "grad_norm": 0.009831869974732399,
896
+ "learning_rate": 0.00015060240963855423,
897
+ "loss": 0.0047,
898
+ "step": 127
899
+ },
900
+ {
901
+ "epoch": 128.0,
902
+ "grad_norm": 0.026599382981657982,
903
+ "learning_rate": 0.00015020080321285142,
904
+ "loss": 0.0048,
905
+ "step": 128
906
+ },
907
+ {
908
+ "epoch": 129.0,
909
+ "grad_norm": 0.018224092200398445,
910
+ "learning_rate": 0.0001497991967871486,
911
+ "loss": 0.0047,
912
+ "step": 129
913
+ },
914
+ {
915
+ "epoch": 130.0,
916
+ "grad_norm": 0.02792244777083397,
917
+ "learning_rate": 0.00014939759036144577,
918
+ "loss": 0.0047,
919
+ "step": 130
920
+ },
921
+ {
922
+ "epoch": 131.0,
923
+ "grad_norm": 0.024047773331403732,
924
+ "learning_rate": 0.000148995983935743,
925
+ "loss": 0.0048,
926
+ "step": 131
927
+ },
928
+ {
929
+ "epoch": 132.0,
930
+ "grad_norm": 0.018245236948132515,
931
+ "learning_rate": 0.00014859437751004018,
932
+ "loss": 0.0047,
933
+ "step": 132
934
+ },
935
+ {
936
+ "epoch": 133.0,
937
+ "grad_norm": 0.026581475511193275,
938
+ "learning_rate": 0.00014819277108433737,
939
+ "loss": 0.0048,
940
+ "step": 133
941
+ },
942
+ {
943
+ "epoch": 134.0,
944
+ "grad_norm": 0.007153503131121397,
945
+ "learning_rate": 0.00014779116465863453,
946
+ "loss": 0.0047,
947
+ "step": 134
948
+ },
949
+ {
950
+ "epoch": 135.0,
951
+ "grad_norm": 0.0323675237596035,
952
+ "learning_rate": 0.00014738955823293172,
953
+ "loss": 0.0048,
954
+ "step": 135
955
+ },
956
+ {
957
+ "epoch": 136.0,
958
+ "grad_norm": 0.005924216937273741,
959
+ "learning_rate": 0.00014698795180722893,
960
+ "loss": 0.0046,
961
+ "step": 136
962
+ },
963
+ {
964
+ "epoch": 137.0,
965
+ "grad_norm": 0.026371264830231667,
966
+ "learning_rate": 0.00014658634538152612,
967
+ "loss": 0.0048,
968
+ "step": 137
969
+ },
970
+ {
971
+ "epoch": 138.0,
972
+ "grad_norm": 0.002094594296067953,
973
+ "learning_rate": 0.00014618473895582328,
974
+ "loss": 0.0047,
975
+ "step": 138
976
+ },
977
+ {
978
+ "epoch": 139.0,
979
+ "grad_norm": 0.023835647851228714,
980
+ "learning_rate": 0.00014578313253012047,
981
+ "loss": 0.0048,
982
+ "step": 139
983
+ },
984
+ {
985
+ "epoch": 140.0,
986
+ "grad_norm": 0.007189361844211817,
987
+ "learning_rate": 0.0001453815261044177,
988
+ "loss": 0.0047,
989
+ "step": 140
990
+ },
991
+ {
992
+ "epoch": 141.0,
993
+ "grad_norm": 0.021903127431869507,
994
+ "learning_rate": 0.00014497991967871488,
995
+ "loss": 0.0049,
996
+ "step": 141
997
+ },
998
+ {
999
+ "epoch": 142.0,
1000
+ "grad_norm": 0.009862157516181469,
1001
+ "learning_rate": 0.00014457831325301204,
1002
+ "loss": 0.0047,
1003
+ "step": 142
1004
+ },
1005
+ {
1006
+ "epoch": 143.0,
1007
+ "grad_norm": 0.01557657215744257,
1008
+ "learning_rate": 0.00014417670682730923,
1009
+ "loss": 0.0047,
1010
+ "step": 143
1011
+ },
1012
+ {
1013
+ "epoch": 144.0,
1014
+ "grad_norm": 0.01178675051778555,
1015
+ "learning_rate": 0.00014377510040160642,
1016
+ "loss": 0.0048,
1017
+ "step": 144
1018
+ },
1019
+ {
1020
+ "epoch": 145.0,
1021
+ "grad_norm": 0.009945346042513847,
1022
+ "learning_rate": 0.00014337349397590364,
1023
+ "loss": 0.0047,
1024
+ "step": 145
1025
+ },
1026
+ {
1027
+ "epoch": 146.0,
1028
+ "grad_norm": 0.013681800104677677,
1029
+ "learning_rate": 0.0001429718875502008,
1030
+ "loss": 0.0048,
1031
+ "step": 146
1032
+ },
1033
+ {
1034
+ "epoch": 147.0,
1035
+ "grad_norm": 0.0072199697606265545,
1036
+ "learning_rate": 0.000142570281124498,
1037
+ "loss": 0.0047,
1038
+ "step": 147
1039
+ },
1040
+ {
1041
+ "epoch": 148.0,
1042
+ "grad_norm": 0.015607825480401516,
1043
+ "learning_rate": 0.00014216867469879518,
1044
+ "loss": 0.0047,
1045
+ "step": 148
1046
+ },
1047
+ {
1048
+ "epoch": 149.0,
1049
+ "grad_norm": 0.00990898534655571,
1050
+ "learning_rate": 0.0001417670682730924,
1051
+ "loss": 0.0047,
1052
+ "step": 149
1053
+ },
1054
+ {
1055
+ "epoch": 150.0,
1056
+ "grad_norm": 0.009972168132662773,
1057
+ "learning_rate": 0.00014136546184738956,
1058
+ "loss": 0.0047,
1059
+ "step": 150
1060
+ },
1061
+ {
1062
+ "epoch": 151.0,
1063
+ "grad_norm": 0.007324350066483021,
1064
+ "learning_rate": 0.00014096385542168674,
1065
+ "loss": 0.0047,
1066
+ "step": 151
1067
+ },
1068
+ {
1069
+ "epoch": 152.0,
1070
+ "grad_norm": 0.007397031411528587,
1071
+ "learning_rate": 0.00014056224899598393,
1072
+ "loss": 0.0047,
1073
+ "step": 152
1074
+ },
1075
+ {
1076
+ "epoch": 153.0,
1077
+ "grad_norm": 0.00632756482809782,
1078
+ "learning_rate": 0.00014016064257028115,
1079
+ "loss": 0.0046,
1080
+ "step": 153
1081
+ },
1082
+ {
1083
+ "epoch": 154.0,
1084
+ "grad_norm": 0.0021140226162970066,
1085
+ "learning_rate": 0.00013975903614457834,
1086
+ "loss": 0.0047,
1087
+ "step": 154
1088
+ },
1089
+ {
1090
+ "epoch": 155.0,
1091
+ "grad_norm": 0.0021610369440168142,
1092
+ "learning_rate": 0.0001393574297188755,
1093
+ "loss": 0.0047,
1094
+ "step": 155
1095
+ },
1096
+ {
1097
+ "epoch": 156.0,
1098
+ "grad_norm": 0.004180264193564653,
1099
+ "learning_rate": 0.0001389558232931727,
1100
+ "loss": 0.0048,
1101
+ "step": 156
1102
+ },
1103
+ {
1104
+ "epoch": 157.0,
1105
+ "grad_norm": 0.002081150421872735,
1106
+ "learning_rate": 0.00013855421686746988,
1107
+ "loss": 0.0047,
1108
+ "step": 157
1109
+ },
1110
+ {
1111
+ "epoch": 158.0,
1112
+ "grad_norm": 0.004214874934405088,
1113
+ "learning_rate": 0.0001381526104417671,
1114
+ "loss": 0.0048,
1115
+ "step": 158
1116
+ },
1117
+ {
1118
+ "epoch": 159.0,
1119
+ "grad_norm": 0.002199581591412425,
1120
+ "learning_rate": 0.00013775100401606426,
1121
+ "loss": 0.0047,
1122
+ "step": 159
1123
+ },
1124
+ {
1125
+ "epoch": 160.0,
1126
+ "grad_norm": 0.010448895394802094,
1127
+ "learning_rate": 0.00013734939759036145,
1128
+ "loss": 0.0047,
1129
+ "step": 160
1130
+ },
1131
+ {
1132
+ "epoch": 161.0,
1133
+ "grad_norm": 0.007638930808752775,
1134
+ "learning_rate": 0.00013694779116465864,
1135
+ "loss": 0.0047,
1136
+ "step": 161
1137
+ },
1138
+ {
1139
+ "epoch": 162.0,
1140
+ "grad_norm": 0.007661975454539061,
1141
+ "learning_rate": 0.00013654618473895585,
1142
+ "loss": 0.0047,
1143
+ "step": 162
1144
+ },
1145
+ {
1146
+ "epoch": 163.0,
1147
+ "grad_norm": 0.0014610282378271222,
1148
+ "learning_rate": 0.00013614457831325302,
1149
+ "loss": 0.0047,
1150
+ "step": 163
1151
+ },
1152
+ {
1153
+ "epoch": 164.0,
1154
+ "grad_norm": 0.01049406360834837,
1155
+ "learning_rate": 0.0001357429718875502,
1156
+ "loss": 0.0047,
1157
+ "step": 164
1158
+ },
1159
+ {
1160
+ "epoch": 165.0,
1161
+ "grad_norm": 0.0021653317380696535,
1162
+ "learning_rate": 0.0001353413654618474,
1163
+ "loss": 0.0047,
1164
+ "step": 165
1165
+ },
1166
+ {
1167
+ "epoch": 166.0,
1168
+ "grad_norm": 0.016612282022833824,
1169
+ "learning_rate": 0.00013493975903614458,
1170
+ "loss": 0.0047,
1171
+ "step": 166
1172
+ },
1173
+ {
1174
+ "epoch": 167.0,
1175
+ "grad_norm": 0.005656089633703232,
1176
+ "learning_rate": 0.00013453815261044177,
1177
+ "loss": 0.0048,
1178
+ "step": 167
1179
+ },
1180
+ {
1181
+ "epoch": 168.0,
1182
+ "grad_norm": 0.015557671897113323,
1183
+ "learning_rate": 0.00013413654618473896,
1184
+ "loss": 0.0046,
1185
+ "step": 168
1186
+ },
1187
+ {
1188
+ "epoch": 169.0,
1189
+ "grad_norm": 0.00793896708637476,
1190
+ "learning_rate": 0.00013373493975903615,
1191
+ "loss": 0.0047,
1192
+ "step": 169
1193
+ },
1194
+ {
1195
+ "epoch": 170.0,
1196
+ "grad_norm": 0.016965791583061218,
1197
+ "learning_rate": 0.00013333333333333334,
1198
+ "loss": 0.0047,
1199
+ "step": 170
1200
+ },
1201
+ {
1202
+ "epoch": 171.0,
1203
+ "grad_norm": 0.010896142572164536,
1204
+ "learning_rate": 0.00013293172690763053,
1205
+ "loss": 0.0047,
1206
+ "step": 171
1207
+ },
1208
+ {
1209
+ "epoch": 172.0,
1210
+ "grad_norm": 0.014977889135479927,
1211
+ "learning_rate": 0.00013253012048192772,
1212
+ "loss": 0.0048,
1213
+ "step": 172
1214
+ },
1215
+ {
1216
+ "epoch": 173.0,
1217
+ "grad_norm": 0.007990415208041668,
1218
+ "learning_rate": 0.0001321285140562249,
1219
+ "loss": 0.0047,
1220
+ "step": 173
1221
+ },
1222
+ {
1223
+ "epoch": 174.0,
1224
+ "grad_norm": 0.01746082492172718,
1225
+ "learning_rate": 0.0001317269076305221,
1226
+ "loss": 0.0047,
1227
+ "step": 174
1228
+ },
1229
+ {
1230
+ "epoch": 175.0,
1231
+ "grad_norm": 0.011155808344483376,
1232
+ "learning_rate": 0.00013132530120481929,
1233
+ "loss": 0.0047,
1234
+ "step": 175
1235
+ },
1236
+ {
1237
+ "epoch": 176.0,
1238
+ "grad_norm": 0.020374253392219543,
1239
+ "learning_rate": 0.00013092369477911648,
1240
+ "loss": 0.0047,
1241
+ "step": 176
1242
+ },
1243
+ {
1244
+ "epoch": 177.0,
1245
+ "grad_norm": 0.027189314365386963,
1246
+ "learning_rate": 0.00013052208835341366,
1247
+ "loss": 0.0048,
1248
+ "step": 177
1249
+ },
1250
+ {
1251
+ "epoch": 178.0,
1252
+ "grad_norm": 0.0035853274166584015,
1253
+ "learning_rate": 0.00013012048192771085,
1254
+ "loss": 0.0046,
1255
+ "step": 178
1256
+ },
1257
+ {
1258
+ "epoch": 179.0,
1259
+ "grad_norm": 0.02057839184999466,
1260
+ "learning_rate": 0.00012971887550200804,
1261
+ "loss": 0.0047,
1262
+ "step": 179
1263
+ },
1264
+ {
1265
+ "epoch": 180.0,
1266
+ "grad_norm": 0.004144900944083929,
1267
+ "learning_rate": 0.00012931726907630523,
1268
+ "loss": 0.0047,
1269
+ "step": 180
1270
+ },
1271
+ {
1272
+ "epoch": 181.0,
1273
+ "grad_norm": 0.027396870777010918,
1274
+ "learning_rate": 0.00012891566265060242,
1275
+ "loss": 0.0048,
1276
+ "step": 181
1277
+ },
1278
+ {
1279
+ "epoch": 182.0,
1280
+ "grad_norm": 0.004257265478372574,
1281
+ "learning_rate": 0.0001285140562248996,
1282
+ "loss": 0.0048,
1283
+ "step": 182
1284
+ },
1285
+ {
1286
+ "epoch": 183.0,
1287
+ "grad_norm": 0.029999705031514168,
1288
+ "learning_rate": 0.0001281124497991968,
1289
+ "loss": 0.0048,
1290
+ "step": 183
1291
+ },
1292
+ {
1293
+ "epoch": 184.0,
1294
+ "grad_norm": 0.012998619116842747,
1295
+ "learning_rate": 0.00012771084337349396,
1296
+ "loss": 0.0047,
1297
+ "step": 184
1298
+ },
1299
+ {
1300
+ "epoch": 185.0,
1301
+ "grad_norm": 0.027277518063783646,
1302
+ "learning_rate": 0.00012730923694779118,
1303
+ "loss": 0.0048,
1304
+ "step": 185
1305
+ },
1306
+ {
1307
+ "epoch": 186.0,
1308
+ "grad_norm": 0.020436229184269905,
1309
+ "learning_rate": 0.00012690763052208837,
1310
+ "loss": 0.0047,
1311
+ "step": 186
1312
+ },
1313
+ {
1314
+ "epoch": 187.0,
1315
+ "grad_norm": 0.017912449315190315,
1316
+ "learning_rate": 0.00012650602409638556,
1317
+ "loss": 0.0048,
1318
+ "step": 187
1319
+ },
1320
+ {
1321
+ "epoch": 188.0,
1322
+ "grad_norm": 0.022624023258686066,
1323
+ "learning_rate": 0.00012610441767068272,
1324
+ "loss": 0.0047,
1325
+ "step": 188
1326
+ },
1327
+ {
1328
+ "epoch": 189.0,
1329
+ "grad_norm": 0.00817954819649458,
1330
+ "learning_rate": 0.00012570281124497994,
1331
+ "loss": 0.0047,
1332
+ "step": 189
1333
+ },
1334
+ {
1335
+ "epoch": 190.0,
1336
+ "grad_norm": 0.01837238110601902,
1337
+ "learning_rate": 0.00012530120481927712,
1338
+ "loss": 0.0047,
1339
+ "step": 190
1340
+ },
1341
+ {
1342
+ "epoch": 191.0,
1343
+ "grad_norm": 0.006303516216576099,
1344
+ "learning_rate": 0.0001248995983935743,
1345
+ "loss": 0.0047,
1346
+ "step": 191
1347
+ },
1348
+ {
1349
+ "epoch": 192.0,
1350
+ "grad_norm": 0.02228759415447712,
1351
+ "learning_rate": 0.00012449799196787148,
1352
+ "loss": 0.0047,
1353
+ "step": 192
1354
+ },
1355
+ {
1356
+ "epoch": 193.0,
1357
+ "grad_norm": 0.004886090289801359,
1358
+ "learning_rate": 0.0001240963855421687,
1359
+ "loss": 0.0047,
1360
+ "step": 193
1361
+ },
1362
+ {
1363
+ "epoch": 194.0,
1364
+ "grad_norm": 0.017957722768187523,
1365
+ "learning_rate": 0.00012369477911646588,
1366
+ "loss": 0.0048,
1367
+ "step": 194
1368
+ },
1369
+ {
1370
+ "epoch": 195.0,
1371
+ "grad_norm": 0.008203946985304356,
1372
+ "learning_rate": 0.00012329317269076307,
1373
+ "loss": 0.0047,
1374
+ "step": 195
1375
+ },
1376
+ {
1377
+ "epoch": 196.0,
1378
+ "grad_norm": 0.013346477411687374,
1379
+ "learning_rate": 0.00012289156626506023,
1380
+ "loss": 0.0047,
1381
+ "step": 196
1382
+ },
1383
+ {
1384
+ "epoch": 197.0,
1385
+ "grad_norm": 0.008646669797599316,
1386
+ "learning_rate": 0.00012248995983935742,
1387
+ "loss": 0.0048,
1388
+ "step": 197
1389
+ },
1390
+ {
1391
+ "epoch": 198.0,
1392
+ "grad_norm": 0.011337646283209324,
1393
+ "learning_rate": 0.00012208835341365464,
1394
+ "loss": 0.0047,
1395
+ "step": 198
1396
+ },
1397
+ {
1398
+ "epoch": 199.0,
1399
+ "grad_norm": 0.01342825498431921,
1400
+ "learning_rate": 0.00012168674698795181,
1401
+ "loss": 0.0047,
1402
+ "step": 199
1403
+ },
1404
+ {
1405
+ "epoch": 200.0,
1406
+ "grad_norm": 0.006050780415534973,
1407
+ "learning_rate": 0.000121285140562249,
1408
+ "loss": 0.0048,
1409
+ "step": 200
1410
+ },
1411
+ {
1412
+ "epoch": 201.0,
1413
+ "grad_norm": 0.01836731843650341,
1414
+ "learning_rate": 0.00012088353413654618,
1415
+ "loss": 0.0048,
1416
+ "step": 201
1417
+ },
1418
+ {
1419
+ "epoch": 202.0,
1420
+ "grad_norm": 0.008505699224770069,
1421
+ "learning_rate": 0.0001204819277108434,
1422
+ "loss": 0.0047,
1423
+ "step": 202
1424
+ },
1425
+ {
1426
+ "epoch": 203.0,
1427
+ "grad_norm": 0.008753190748393536,
1428
+ "learning_rate": 0.00012008032128514057,
1429
+ "loss": 0.0047,
1430
+ "step": 203
1431
+ },
1432
+ {
1433
+ "epoch": 204.0,
1434
+ "grad_norm": 0.009133870713412762,
1435
+ "learning_rate": 0.00011967871485943776,
1436
+ "loss": 0.0048,
1437
+ "step": 204
1438
+ },
1439
+ {
1440
+ "epoch": 205.0,
1441
+ "grad_norm": 0.0029124633874744177,
1442
+ "learning_rate": 0.00011927710843373494,
1443
+ "loss": 0.0048,
1444
+ "step": 205
1445
+ },
1446
+ {
1447
+ "epoch": 206.0,
1448
+ "grad_norm": 0.00847614649683237,
1449
+ "learning_rate": 0.00011887550200803212,
1450
+ "loss": 0.0047,
1451
+ "step": 206
1452
+ },
1453
+ {
1454
+ "epoch": 207.0,
1455
+ "grad_norm": 0.0028652322944253683,
1456
+ "learning_rate": 0.00011847389558232933,
1457
+ "loss": 0.0047,
1458
+ "step": 207
1459
+ },
1460
+ {
1461
+ "epoch": 208.0,
1462
+ "grad_norm": 0.009550940245389938,
1463
+ "learning_rate": 0.00011807228915662652,
1464
+ "loss": 0.0047,
1465
+ "step": 208
1466
+ },
1467
+ {
1468
+ "epoch": 209.0,
1469
+ "grad_norm": 0.008687314577400684,
1470
+ "learning_rate": 0.00011767068273092369,
1471
+ "loss": 0.0047,
1472
+ "step": 209
1473
+ },
1474
+ {
1475
+ "epoch": 210.0,
1476
+ "grad_norm": 0.004487687721848488,
1477
+ "learning_rate": 0.00011726907630522088,
1478
+ "loss": 0.0047,
1479
+ "step": 210
1480
+ },
1481
+ {
1482
+ "epoch": 211.0,
1483
+ "grad_norm": 0.009615003131330013,
1484
+ "learning_rate": 0.00011686746987951808,
1485
+ "loss": 0.0047,
1486
+ "step": 211
1487
+ },
1488
+ {
1489
+ "epoch": 212.0,
1490
+ "grad_norm": 0.001644686795771122,
1491
+ "learning_rate": 0.00011646586345381527,
1492
+ "loss": 0.0047,
1493
+ "step": 212
1494
+ },
1495
+ {
1496
+ "epoch": 213.0,
1497
+ "grad_norm": 0.012544874101877213,
1498
+ "learning_rate": 0.00011606425702811245,
1499
+ "loss": 0.0047,
1500
+ "step": 213
1501
+ },
1502
+ {
1503
+ "epoch": 214.0,
1504
+ "grad_norm": 0.009784480556845665,
1505
+ "learning_rate": 0.00011566265060240964,
1506
+ "loss": 0.0047,
1507
+ "step": 214
1508
+ },
1509
+ {
1510
+ "epoch": 215.0,
1511
+ "grad_norm": 0.008885402232408524,
1512
+ "learning_rate": 0.00011526104417670683,
1513
+ "loss": 0.0047,
1514
+ "step": 215
1515
+ },
1516
+ {
1517
+ "epoch": 216.0,
1518
+ "grad_norm": 0.015116319991648197,
1519
+ "learning_rate": 0.00011485943775100403,
1520
+ "loss": 0.0047,
1521
+ "step": 216
1522
+ },
1523
+ {
1524
+ "epoch": 217.0,
1525
+ "grad_norm": 0.0030038722325116396,
1526
+ "learning_rate": 0.0001144578313253012,
1527
+ "loss": 0.0047,
1528
+ "step": 217
1529
+ },
1530
+ {
1531
+ "epoch": 218.0,
1532
+ "grad_norm": 0.014432215131819248,
1533
+ "learning_rate": 0.0001140562248995984,
1534
+ "loss": 0.0047,
1535
+ "step": 218
1536
+ },
1537
+ {
1538
+ "epoch": 219.0,
1539
+ "grad_norm": 0.012555493041872978,
1540
+ "learning_rate": 0.00011365461847389558,
1541
+ "loss": 0.0047,
1542
+ "step": 219
1543
+ },
1544
+ {
1545
+ "epoch": 220.0,
1546
+ "grad_norm": 0.004350865725427866,
1547
+ "learning_rate": 0.00011325301204819279,
1548
+ "loss": 0.0047,
1549
+ "step": 220
1550
+ },
1551
+ {
1552
+ "epoch": 221.0,
1553
+ "grad_norm": 0.011415142565965652,
1554
+ "learning_rate": 0.00011285140562248996,
1555
+ "loss": 0.0047,
1556
+ "step": 221
1557
+ },
1558
+ {
1559
+ "epoch": 222.0,
1560
+ "grad_norm": 0.002292638411745429,
1561
+ "learning_rate": 0.00011244979919678715,
1562
+ "loss": 0.0047,
1563
+ "step": 222
1564
+ },
1565
+ {
1566
+ "epoch": 223.0,
1567
+ "grad_norm": 0.011944664642214775,
1568
+ "learning_rate": 0.00011204819277108434,
1569
+ "loss": 0.0047,
1570
+ "step": 223
1571
+ },
1572
+ {
1573
+ "epoch": 224.0,
1574
+ "grad_norm": 0.009497747756540775,
1575
+ "learning_rate": 0.00011164658634538152,
1576
+ "loss": 0.0047,
1577
+ "step": 224
1578
+ },
1579
+ {
1580
+ "epoch": 225.0,
1581
+ "grad_norm": 0.004834707360714674,
1582
+ "learning_rate": 0.00011124497991967872,
1583
+ "loss": 0.0047,
1584
+ "step": 225
1585
+ },
1586
+ {
1587
+ "epoch": 226.0,
1588
+ "grad_norm": 0.012152622453868389,
1589
+ "learning_rate": 0.00011084337349397591,
1590
+ "loss": 0.0048,
1591
+ "step": 226
1592
+ },
1593
+ {
1594
+ "epoch": 227.0,
1595
+ "grad_norm": 0.005508288741111755,
1596
+ "learning_rate": 0.0001104417670682731,
1597
+ "loss": 0.0047,
1598
+ "step": 227
1599
+ },
1600
+ {
1601
+ "epoch": 228.0,
1602
+ "grad_norm": 0.004662630148231983,
1603
+ "learning_rate": 0.00011004016064257027,
1604
+ "loss": 0.0047,
1605
+ "step": 228
1606
+ },
1607
+ {
1608
+ "epoch": 229.0,
1609
+ "grad_norm": 0.0046459161676466465,
1610
+ "learning_rate": 0.00010963855421686749,
1611
+ "loss": 0.0047,
1612
+ "step": 229
1613
+ },
1614
+ {
1615
+ "epoch": 230.0,
1616
+ "grad_norm": 0.003354718443006277,
1617
+ "learning_rate": 0.00010923694779116467,
1618
+ "loss": 0.0047,
1619
+ "step": 230
1620
+ },
1621
+ {
1622
+ "epoch": 231.0,
1623
+ "grad_norm": 0.005278999917209148,
1624
+ "learning_rate": 0.00010883534136546186,
1625
+ "loss": 0.0048,
1626
+ "step": 231
1627
+ },
1628
+ {
1629
+ "epoch": 232.0,
1630
+ "grad_norm": 0.006012341473251581,
1631
+ "learning_rate": 0.00010843373493975903,
1632
+ "loss": 0.0047,
1633
+ "step": 232
1634
+ },
1635
+ {
1636
+ "epoch": 233.0,
1637
+ "grad_norm": 0.0016911854036152363,
1638
+ "learning_rate": 0.00010803212851405625,
1639
+ "loss": 0.0047,
1640
+ "step": 233
1641
+ },
1642
+ {
1643
+ "epoch": 234.0,
1644
+ "grad_norm": 0.004478626884520054,
1645
+ "learning_rate": 0.00010763052208835342,
1646
+ "loss": 0.0047,
1647
+ "step": 234
1648
+ },
1649
+ {
1650
+ "epoch": 235.0,
1651
+ "grad_norm": 0.005508603993803263,
1652
+ "learning_rate": 0.00010722891566265061,
1653
+ "loss": 0.0047,
1654
+ "step": 235
1655
+ },
1656
+ {
1657
+ "epoch": 236.0,
1658
+ "grad_norm": 0.0032062928657978773,
1659
+ "learning_rate": 0.00010682730923694779,
1660
+ "loss": 0.0047,
1661
+ "step": 236
1662
+ },
1663
+ {
1664
+ "epoch": 237.0,
1665
+ "grad_norm": 0.006789966020733118,
1666
+ "learning_rate": 0.00010642570281124498,
1667
+ "loss": 0.0048,
1668
+ "step": 237
1669
+ },
1670
+ {
1671
+ "epoch": 238.0,
1672
+ "grad_norm": 0.005682968068867922,
1673
+ "learning_rate": 0.00010602409638554218,
1674
+ "loss": 0.0047,
1675
+ "step": 238
1676
+ },
1677
+ {
1678
+ "epoch": 239.0,
1679
+ "grad_norm": 0.00296304514631629,
1680
+ "learning_rate": 0.00010562248995983937,
1681
+ "loss": 0.0047,
1682
+ "step": 239
1683
+ },
1684
+ {
1685
+ "epoch": 240.0,
1686
+ "grad_norm": 0.006762088742107153,
1687
+ "learning_rate": 0.00010522088353413654,
1688
+ "loss": 0.0047,
1689
+ "step": 240
1690
+ },
1691
+ {
1692
+ "epoch": 241.0,
1693
+ "grad_norm": 0.0028762409929186106,
1694
+ "learning_rate": 0.00010481927710843373,
1695
+ "loss": 0.0047,
1696
+ "step": 241
1697
+ },
1698
+ {
1699
+ "epoch": 242.0,
1700
+ "grad_norm": 0.009846026077866554,
1701
+ "learning_rate": 0.00010441767068273094,
1702
+ "loss": 0.0047,
1703
+ "step": 242
1704
+ },
1705
+ {
1706
+ "epoch": 243.0,
1707
+ "grad_norm": 0.01071733795106411,
1708
+ "learning_rate": 0.00010401606425702813,
1709
+ "loss": 0.0047,
1710
+ "step": 243
1711
+ },
1712
+ {
1713
+ "epoch": 244.0,
1714
+ "grad_norm": 0.001721803448162973,
1715
+ "learning_rate": 0.0001036144578313253,
1716
+ "loss": 0.0047,
1717
+ "step": 244
1718
+ },
1719
+ {
1720
+ "epoch": 245.0,
1721
+ "grad_norm": 0.015101822093129158,
1722
+ "learning_rate": 0.00010321285140562249,
1723
+ "loss": 0.0047,
1724
+ "step": 245
1725
+ },
1726
+ {
1727
+ "epoch": 246.0,
1728
+ "grad_norm": 0.012098951265215874,
1729
+ "learning_rate": 0.00010281124497991968,
1730
+ "loss": 0.0047,
1731
+ "step": 246
1732
+ },
1733
+ {
1734
+ "epoch": 247.0,
1735
+ "grad_norm": 0.006853122264146805,
1736
+ "learning_rate": 0.00010240963855421688,
1737
+ "loss": 0.0047,
1738
+ "step": 247
1739
+ },
1740
+ {
1741
+ "epoch": 248.0,
1742
+ "grad_norm": 0.016420654952526093,
1743
+ "learning_rate": 0.00010200803212851406,
1744
+ "loss": 0.0047,
1745
+ "step": 248
1746
+ },
1747
+ {
1748
+ "epoch": 249.0,
1749
+ "grad_norm": 0.00433447165414691,
1750
+ "learning_rate": 0.00010160642570281125,
1751
+ "loss": 0.0047,
1752
+ "step": 249
1753
+ },
1754
+ {
1755
+ "epoch": 250.0,
1756
+ "grad_norm": 0.012108503840863705,
1757
+ "learning_rate": 0.00010120481927710844,
1758
+ "loss": 0.0047,
1759
+ "step": 250
1760
+ },
1761
+ {
1762
+ "epoch": 251.0,
1763
+ "grad_norm": 0.008616012521088123,
1764
+ "learning_rate": 0.00010080321285140564,
1765
+ "loss": 0.0047,
1766
+ "step": 251
1767
+ },
1768
+ {
1769
+ "epoch": 252.0,
1770
+ "grad_norm": 0.008652638643980026,
1771
+ "learning_rate": 0.00010040160642570282,
1772
+ "loss": 0.0047,
1773
+ "step": 252
1774
+ },
1775
+ {
1776
+ "epoch": 253.0,
1777
+ "grad_norm": 0.01692233793437481,
1778
+ "learning_rate": 0.0001,
1779
+ "loss": 0.0047,
1780
+ "step": 253
1781
+ },
1782
+ {
1783
+ "epoch": 254.0,
1784
+ "grad_norm": 0.006091867107897997,
1785
+ "learning_rate": 9.95983935742972e-05,
1786
+ "loss": 0.0047,
1787
+ "step": 254
1788
+ },
1789
+ {
1790
+ "epoch": 255.0,
1791
+ "grad_norm": 0.011222707107663155,
1792
+ "learning_rate": 9.919678714859438e-05,
1793
+ "loss": 0.0047,
1794
+ "step": 255
1795
+ },
1796
+ {
1797
+ "epoch": 256.0,
1798
+ "grad_norm": 0.011931284330785275,
1799
+ "learning_rate": 9.879518072289157e-05,
1800
+ "loss": 0.0047,
1801
+ "step": 256
1802
+ },
1803
+ {
1804
+ "epoch": 257.0,
1805
+ "grad_norm": 0.004332751967012882,
1806
+ "learning_rate": 9.839357429718876e-05,
1807
+ "loss": 0.0048,
1808
+ "step": 257
1809
+ },
1810
+ {
1811
+ "epoch": 258.0,
1812
+ "grad_norm": 0.013347852043807507,
1813
+ "learning_rate": 9.799196787148595e-05,
1814
+ "loss": 0.0047,
1815
+ "step": 258
1816
+ },
1817
+ {
1818
+ "epoch": 259.0,
1819
+ "grad_norm": 0.006608230993151665,
1820
+ "learning_rate": 9.759036144578314e-05,
1821
+ "loss": 0.0047,
1822
+ "step": 259
1823
+ },
1824
+ {
1825
+ "epoch": 260.0,
1826
+ "grad_norm": 0.004314239602535963,
1827
+ "learning_rate": 9.718875502008033e-05,
1828
+ "loss": 0.0048,
1829
+ "step": 260
1830
+ },
1831
+ {
1832
+ "epoch": 261.0,
1833
+ "grad_norm": 0.0060504162684082985,
1834
+ "learning_rate": 9.678714859437752e-05,
1835
+ "loss": 0.0048,
1836
+ "step": 261
1837
+ },
1838
+ {
1839
+ "epoch": 262.0,
1840
+ "grad_norm": 0.0033889045007526875,
1841
+ "learning_rate": 9.638554216867471e-05,
1842
+ "loss": 0.0047,
1843
+ "step": 262
1844
+ },
1845
+ {
1846
+ "epoch": 263.0,
1847
+ "grad_norm": 0.008607766591012478,
1848
+ "learning_rate": 9.598393574297188e-05,
1849
+ "loss": 0.0048,
1850
+ "step": 263
1851
+ },
1852
+ {
1853
+ "epoch": 264.0,
1854
+ "grad_norm": 0.001759338192641735,
1855
+ "learning_rate": 9.558232931726909e-05,
1856
+ "loss": 0.0047,
1857
+ "step": 264
1858
+ },
1859
+ {
1860
+ "epoch": 265.0,
1861
+ "grad_norm": 0.008092672564089298,
1862
+ "learning_rate": 9.518072289156626e-05,
1863
+ "loss": 0.0047,
1864
+ "step": 265
1865
+ },
1866
+ {
1867
+ "epoch": 266.0,
1868
+ "grad_norm": 0.0017322164494544268,
1869
+ "learning_rate": 9.477911646586346e-05,
1870
+ "loss": 0.0047,
1871
+ "step": 266
1872
+ },
1873
+ {
1874
+ "epoch": 267.0,
1875
+ "grad_norm": 0.0017360730562359095,
1876
+ "learning_rate": 9.437751004016064e-05,
1877
+ "loss": 0.0047,
1878
+ "step": 267
1879
+ },
1880
+ {
1881
+ "epoch": 268.0,
1882
+ "grad_norm": 0.0017291605472564697,
1883
+ "learning_rate": 9.397590361445784e-05,
1884
+ "loss": 0.0047,
1885
+ "step": 268
1886
+ },
1887
+ {
1888
+ "epoch": 269.0,
1889
+ "grad_norm": 0.0016873552231118083,
1890
+ "learning_rate": 9.357429718875502e-05,
1891
+ "loss": 0.0047,
1892
+ "step": 269
1893
+ },
1894
+ {
1895
+ "epoch": 270.0,
1896
+ "grad_norm": 0.007920237258076668,
1897
+ "learning_rate": 9.317269076305222e-05,
1898
+ "loss": 0.0047,
1899
+ "step": 270
1900
+ },
1901
+ {
1902
+ "epoch": 271.0,
1903
+ "grad_norm": 0.0036651096306741238,
1904
+ "learning_rate": 9.27710843373494e-05,
1905
+ "loss": 0.0048,
1906
+ "step": 271
1907
+ },
1908
+ {
1909
+ "epoch": 272.0,
1910
+ "grad_norm": 0.01999688521027565,
1911
+ "learning_rate": 9.23694779116466e-05,
1912
+ "loss": 0.0047,
1913
+ "step": 272
1914
+ },
1915
+ {
1916
+ "epoch": 273.0,
1917
+ "grad_norm": 0.007801530417054892,
1918
+ "learning_rate": 9.196787148594378e-05,
1919
+ "loss": 0.0047,
1920
+ "step": 273
1921
+ },
1922
+ {
1923
+ "epoch": 274.0,
1924
+ "grad_norm": 0.012862344272434711,
1925
+ "learning_rate": 9.156626506024096e-05,
1926
+ "loss": 0.0048,
1927
+ "step": 274
1928
+ },
1929
+ {
1930
+ "epoch": 275.0,
1931
+ "grad_norm": 0.0016800492303445935,
1932
+ "learning_rate": 9.116465863453815e-05,
1933
+ "loss": 0.0047,
1934
+ "step": 275
1935
+ },
1936
+ {
1937
+ "epoch": 276.0,
1938
+ "grad_norm": 0.019683390855789185,
1939
+ "learning_rate": 9.076305220883534e-05,
1940
+ "loss": 0.0047,
1941
+ "step": 276
1942
+ },
1943
+ {
1944
+ "epoch": 277.0,
1945
+ "grad_norm": 0.0017172418301925063,
1946
+ "learning_rate": 9.036144578313253e-05,
1947
+ "loss": 0.0047,
1948
+ "step": 277
1949
+ },
1950
+ {
1951
+ "epoch": 278.0,
1952
+ "grad_norm": 0.030774248763918877,
1953
+ "learning_rate": 8.995983935742972e-05,
1954
+ "loss": 0.0049,
1955
+ "step": 278
1956
+ },
1957
+ {
1958
+ "epoch": 279.0,
1959
+ "grad_norm": 0.0035772966220974922,
1960
+ "learning_rate": 8.955823293172691e-05,
1961
+ "loss": 0.0048,
1962
+ "step": 279
1963
+ },
1964
+ {
1965
+ "epoch": 280.0,
1966
+ "grad_norm": 0.03694264590740204,
1967
+ "learning_rate": 8.91566265060241e-05,
1968
+ "loss": 0.0048,
1969
+ "step": 280
1970
+ },
1971
+ {
1972
+ "epoch": 281.0,
1973
+ "grad_norm": 0.0031574727036058903,
1974
+ "learning_rate": 8.875502008032129e-05,
1975
+ "loss": 0.0045,
1976
+ "step": 281
1977
+ },
1978
+ {
1979
+ "epoch": 282.0,
1980
+ "grad_norm": 0.03539184108376503,
1981
+ "learning_rate": 8.835341365461848e-05,
1982
+ "loss": 0.005,
1983
+ "step": 282
1984
+ },
1985
+ {
1986
+ "epoch": 283.0,
1987
+ "grad_norm": 0.010516048409044743,
1988
+ "learning_rate": 8.795180722891567e-05,
1989
+ "loss": 0.0047,
1990
+ "step": 283
1991
+ },
1992
+ {
1993
+ "epoch": 284.0,
1994
+ "grad_norm": 0.010487216524779797,
1995
+ "learning_rate": 8.755020080321286e-05,
1996
+ "loss": 0.0047,
1997
+ "step": 284
1998
+ },
1999
+ {
2000
+ "epoch": 285.0,
2001
+ "grad_norm": 0.010434217751026154,
2002
+ "learning_rate": 8.714859437751005e-05,
2003
+ "loss": 0.0047,
2004
+ "step": 285
2005
+ },
2006
+ {
2007
+ "epoch": 286.0,
2008
+ "grad_norm": 0.007612716872245073,
2009
+ "learning_rate": 8.674698795180724e-05,
2010
+ "loss": 0.0047,
2011
+ "step": 286
2012
+ },
2013
+ {
2014
+ "epoch": 287.0,
2015
+ "grad_norm": 0.025456130504608154,
2016
+ "learning_rate": 8.634538152610442e-05,
2017
+ "loss": 0.0048,
2018
+ "step": 287
2019
+ },
2020
+ {
2021
+ "epoch": 288.0,
2022
+ "grad_norm": 0.010371106676757336,
2023
+ "learning_rate": 8.594377510040161e-05,
2024
+ "loss": 0.0047,
2025
+ "step": 288
2026
+ },
2027
+ {
2028
+ "epoch": 289.0,
2029
+ "grad_norm": 0.028378885239362717,
2030
+ "learning_rate": 8.55421686746988e-05,
2031
+ "loss": 0.0048,
2032
+ "step": 289
2033
+ },
2034
+ {
2035
+ "epoch": 290.0,
2036
+ "grad_norm": 0.007674542721360922,
2037
+ "learning_rate": 8.514056224899599e-05,
2038
+ "loss": 0.0047,
2039
+ "step": 290
2040
+ },
2041
+ {
2042
+ "epoch": 291.0,
2043
+ "grad_norm": 0.025507695972919464,
2044
+ "learning_rate": 8.473895582329318e-05,
2045
+ "loss": 0.0048,
2046
+ "step": 291
2047
+ },
2048
+ {
2049
+ "epoch": 292.0,
2050
+ "grad_norm": 0.0009824644075706601,
2051
+ "learning_rate": 8.433734939759037e-05,
2052
+ "loss": 0.0049,
2053
+ "step": 292
2054
+ },
2055
+ {
2056
+ "epoch": 293.0,
2057
+ "grad_norm": 0.03781874477863312,
2058
+ "learning_rate": 8.393574297188756e-05,
2059
+ "loss": 0.0046,
2060
+ "step": 293
2061
+ },
2062
+ {
2063
+ "epoch": 294.0,
2064
+ "grad_norm": 0.007710340432822704,
2065
+ "learning_rate": 8.353413654618474e-05,
2066
+ "loss": 0.0047,
2067
+ "step": 294
2068
+ },
2069
+ {
2070
+ "epoch": 295.0,
2071
+ "grad_norm": 0.03508257865905762,
2072
+ "learning_rate": 8.313253012048194e-05,
2073
+ "loss": 0.005,
2074
+ "step": 295
2075
+ },
2076
+ {
2077
+ "epoch": 296.0,
2078
+ "grad_norm": 0.007686258759349585,
2079
+ "learning_rate": 8.273092369477911e-05,
2080
+ "loss": 0.0047,
2081
+ "step": 296
2082
+ },
2083
+ {
2084
+ "epoch": 297.0,
2085
+ "grad_norm": 0.04559750109910965,
2086
+ "learning_rate": 8.232931726907632e-05,
2087
+ "loss": 0.0048,
2088
+ "step": 297
2089
+ },
2090
+ {
2091
+ "epoch": 298.0,
2092
+ "grad_norm": 0.010225856676697731,
2093
+ "learning_rate": 8.192771084337349e-05,
2094
+ "loss": 0.0047,
2095
+ "step": 298
2096
+ },
2097
+ {
2098
+ "epoch": 299.0,
2099
+ "grad_norm": 0.02517218515276909,
2100
+ "learning_rate": 8.15261044176707e-05,
2101
+ "loss": 0.0048,
2102
+ "step": 299
2103
+ },
2104
+ {
2105
+ "epoch": 300.0,
2106
+ "grad_norm": 0.025102809071540833,
2107
+ "learning_rate": 8.112449799196787e-05,
2108
+ "loss": 0.0048,
2109
+ "step": 300
2110
+ },
2111
+ {
2112
+ "epoch": 301.0,
2113
+ "grad_norm": 0.0025317783001810312,
2114
+ "learning_rate": 8.072289156626507e-05,
2115
+ "loss": 0.0045,
2116
+ "step": 301
2117
+ },
2118
+ {
2119
+ "epoch": 302.0,
2120
+ "grad_norm": 0.0447322279214859,
2121
+ "learning_rate": 8.032128514056225e-05,
2122
+ "loss": 0.0048,
2123
+ "step": 302
2124
+ },
2125
+ {
2126
+ "epoch": 303.0,
2127
+ "grad_norm": 0.0074921357445418835,
2128
+ "learning_rate": 7.991967871485944e-05,
2129
+ "loss": 0.0047,
2130
+ "step": 303
2131
+ },
2132
+ {
2133
+ "epoch": 304.0,
2134
+ "grad_norm": 0.02473229542374611,
2135
+ "learning_rate": 7.951807228915663e-05,
2136
+ "loss": 0.0048,
2137
+ "step": 304
2138
+ },
2139
+ {
2140
+ "epoch": 305.0,
2141
+ "grad_norm": 0.007424222771078348,
2142
+ "learning_rate": 7.911646586345382e-05,
2143
+ "loss": 0.0047,
2144
+ "step": 305
2145
+ },
2146
+ {
2147
+ "epoch": 306.0,
2148
+ "grad_norm": 0.027306661009788513,
2149
+ "learning_rate": 7.8714859437751e-05,
2150
+ "loss": 0.0048,
2151
+ "step": 306
2152
+ },
2153
+ {
2154
+ "epoch": 307.0,
2155
+ "grad_norm": 0.007345912978053093,
2156
+ "learning_rate": 7.83132530120482e-05,
2157
+ "loss": 0.0047,
2158
+ "step": 307
2159
+ },
2160
+ {
2161
+ "epoch": 308.0,
2162
+ "grad_norm": 0.02444930374622345,
2163
+ "learning_rate": 7.791164658634539e-05,
2164
+ "loss": 0.0048,
2165
+ "step": 308
2166
+ },
2167
+ {
2168
+ "epoch": 309.0,
2169
+ "grad_norm": 0.02708006091415882,
2170
+ "learning_rate": 7.751004016064257e-05,
2171
+ "loss": 0.0048,
2172
+ "step": 309
2173
+ },
2174
+ {
2175
+ "epoch": 310.0,
2176
+ "grad_norm": 0.007334452122449875,
2177
+ "learning_rate": 7.710843373493976e-05,
2178
+ "loss": 0.0047,
2179
+ "step": 310
2180
+ },
2181
+ {
2182
+ "epoch": 311.0,
2183
+ "grad_norm": 0.009786078706383705,
2184
+ "learning_rate": 7.670682730923695e-05,
2185
+ "loss": 0.0047,
2186
+ "step": 311
2187
+ },
2188
+ {
2189
+ "epoch": 312.0,
2190
+ "grad_norm": 0.007255980744957924,
2191
+ "learning_rate": 7.630522088353414e-05,
2192
+ "loss": 0.0047,
2193
+ "step": 312
2194
+ },
2195
+ {
2196
+ "epoch": 313.0,
2197
+ "grad_norm": 0.014863966032862663,
2198
+ "learning_rate": 7.590361445783133e-05,
2199
+ "loss": 0.0045,
2200
+ "step": 313
2201
+ },
2202
+ {
2203
+ "epoch": 314.0,
2204
+ "grad_norm": 0.009730237536132336,
2205
+ "learning_rate": 7.550200803212851e-05,
2206
+ "loss": 0.0047,
2207
+ "step": 314
2208
+ },
2209
+ {
2210
+ "epoch": 315.0,
2211
+ "grad_norm": 0.00974965188652277,
2212
+ "learning_rate": 7.510040160642571e-05,
2213
+ "loss": 0.0047,
2214
+ "step": 315
2215
+ },
2216
+ {
2217
+ "epoch": 316.0,
2218
+ "grad_norm": 0.0008471576729789376,
2219
+ "learning_rate": 7.469879518072289e-05,
2220
+ "loss": 0.0049,
2221
+ "step": 316
2222
+ },
2223
+ {
2224
+ "epoch": 317.0,
2225
+ "grad_norm": 0.007247288711369038,
2226
+ "learning_rate": 7.429718875502009e-05,
2227
+ "loss": 0.0047,
2228
+ "step": 317
2229
+ },
2230
+ {
2231
+ "epoch": 318.0,
2232
+ "grad_norm": 0.00724770175293088,
2233
+ "learning_rate": 7.389558232931726e-05,
2234
+ "loss": 0.0047,
2235
+ "step": 318
2236
+ },
2237
+ {
2238
+ "epoch": 319.0,
2239
+ "grad_norm": 0.009726744145154953,
2240
+ "learning_rate": 7.349397590361447e-05,
2241
+ "loss": 0.0047,
2242
+ "step": 319
2243
+ },
2244
+ {
2245
+ "epoch": 320.0,
2246
+ "grad_norm": 0.002370405476540327,
2247
+ "learning_rate": 7.309236947791164e-05,
2248
+ "loss": 0.0045,
2249
+ "step": 320
2250
+ },
2251
+ {
2252
+ "epoch": 321.0,
2253
+ "grad_norm": 0.007258490659296513,
2254
+ "learning_rate": 7.269076305220885e-05,
2255
+ "loss": 0.0047,
2256
+ "step": 321
2257
+ },
2258
+ {
2259
+ "epoch": 322.0,
2260
+ "grad_norm": 0.009763582609593868,
2261
+ "learning_rate": 7.228915662650602e-05,
2262
+ "loss": 0.0047,
2263
+ "step": 322
2264
+ },
2265
+ {
2266
+ "epoch": 323.0,
2267
+ "grad_norm": 0.0072934250347316265,
2268
+ "learning_rate": 7.188755020080321e-05,
2269
+ "loss": 0.0047,
2270
+ "step": 323
2271
+ },
2272
+ {
2273
+ "epoch": 324.0,
2274
+ "grad_norm": 0.007325456012040377,
2275
+ "learning_rate": 7.14859437751004e-05,
2276
+ "loss": 0.0047,
2277
+ "step": 324
2278
+ },
2279
+ {
2280
+ "epoch": 325.0,
2281
+ "grad_norm": 0.02716483175754547,
2282
+ "learning_rate": 7.108433734939759e-05,
2283
+ "loss": 0.0048,
2284
+ "step": 325
2285
+ },
2286
+ {
2287
+ "epoch": 326.0,
2288
+ "grad_norm": 0.007358341012150049,
2289
+ "learning_rate": 7.068273092369478e-05,
2290
+ "loss": 0.0047,
2291
+ "step": 326
2292
+ },
2293
+ {
2294
+ "epoch": 327.0,
2295
+ "grad_norm": 0.04177234321832657,
2296
+ "learning_rate": 7.028112449799197e-05,
2297
+ "loss": 0.0048,
2298
+ "step": 327
2299
+ },
2300
+ {
2301
+ "epoch": 328.0,
2302
+ "grad_norm": 0.027355682104825974,
2303
+ "learning_rate": 6.987951807228917e-05,
2304
+ "loss": 0.0048,
2305
+ "step": 328
2306
+ },
2307
+ {
2308
+ "epoch": 329.0,
2309
+ "grad_norm": 0.02739114686846733,
2310
+ "learning_rate": 6.947791164658635e-05,
2311
+ "loss": 0.0048,
2312
+ "step": 329
2313
+ },
2314
+ {
2315
+ "epoch": 330.0,
2316
+ "grad_norm": 0.024521242827177048,
2317
+ "learning_rate": 6.907630522088355e-05,
2318
+ "loss": 0.0048,
2319
+ "step": 330
2320
+ },
2321
+ {
2322
+ "epoch": 331.0,
2323
+ "grad_norm": 0.007352576591074467,
2324
+ "learning_rate": 6.867469879518072e-05,
2325
+ "loss": 0.0047,
2326
+ "step": 331
2327
+ },
2328
+ {
2329
+ "epoch": 332.0,
2330
+ "grad_norm": 0.0023073432967066765,
2331
+ "learning_rate": 6.827309236947793e-05,
2332
+ "loss": 0.0045,
2333
+ "step": 332
2334
+ },
2335
+ {
2336
+ "epoch": 333.0,
2337
+ "grad_norm": 0.009870602749288082,
2338
+ "learning_rate": 6.78714859437751e-05,
2339
+ "loss": 0.0047,
2340
+ "step": 333
2341
+ },
2342
+ {
2343
+ "epoch": 334.0,
2344
+ "grad_norm": 0.00988033413887024,
2345
+ "learning_rate": 6.746987951807229e-05,
2346
+ "loss": 0.0047,
2347
+ "step": 334
2348
+ },
2349
+ {
2350
+ "epoch": 335.0,
2351
+ "grad_norm": 0.024536525830626488,
2352
+ "learning_rate": 6.706827309236948e-05,
2353
+ "loss": 0.0048,
2354
+ "step": 335
2355
+ },
2356
+ {
2357
+ "epoch": 336.0,
2358
+ "grad_norm": 0.009867388755083084,
2359
+ "learning_rate": 6.666666666666667e-05,
2360
+ "loss": 0.0047,
2361
+ "step": 336
2362
+ },
2363
+ {
2364
+ "epoch": 337.0,
2365
+ "grad_norm": 0.00988020095974207,
2366
+ "learning_rate": 6.626506024096386e-05,
2367
+ "loss": 0.0047,
2368
+ "step": 337
2369
+ },
2370
+ {
2371
+ "epoch": 338.0,
2372
+ "grad_norm": 0.007355022244155407,
2373
+ "learning_rate": 6.586345381526105e-05,
2374
+ "loss": 0.0047,
2375
+ "step": 338
2376
+ },
2377
+ {
2378
+ "epoch": 339.0,
2379
+ "grad_norm": 0.007368543650954962,
2380
+ "learning_rate": 6.546184738955824e-05,
2381
+ "loss": 0.0047,
2382
+ "step": 339
2383
+ },
2384
+ {
2385
+ "epoch": 340.0,
2386
+ "grad_norm": 0.0073932805098593235,
2387
+ "learning_rate": 6.506024096385543e-05,
2388
+ "loss": 0.0047,
2389
+ "step": 340
2390
+ },
2391
+ {
2392
+ "epoch": 341.0,
2393
+ "grad_norm": 0.027419747784733772,
2394
+ "learning_rate": 6.465863453815262e-05,
2395
+ "loss": 0.0048,
2396
+ "step": 341
2397
+ },
2398
+ {
2399
+ "epoch": 342.0,
2400
+ "grad_norm": 0.009928101673722267,
2401
+ "learning_rate": 6.42570281124498e-05,
2402
+ "loss": 0.0047,
2403
+ "step": 342
2404
+ },
2405
+ {
2406
+ "epoch": 343.0,
2407
+ "grad_norm": 0.03391006961464882,
2408
+ "learning_rate": 6.385542168674698e-05,
2409
+ "loss": 0.005,
2410
+ "step": 343
2411
+ },
2412
+ {
2413
+ "epoch": 344.0,
2414
+ "grad_norm": 0.024691926315426826,
2415
+ "learning_rate": 6.345381526104418e-05,
2416
+ "loss": 0.0048,
2417
+ "step": 344
2418
+ },
2419
+ {
2420
+ "epoch": 345.0,
2421
+ "grad_norm": 0.036551907658576965,
2422
+ "learning_rate": 6.305220883534136e-05,
2423
+ "loss": 0.0046,
2424
+ "step": 345
2425
+ },
2426
+ {
2427
+ "epoch": 346.0,
2428
+ "grad_norm": 0.027336876839399338,
2429
+ "learning_rate": 6.265060240963856e-05,
2430
+ "loss": 0.0048,
2431
+ "step": 346
2432
+ },
2433
+ {
2434
+ "epoch": 347.0,
2435
+ "grad_norm": 0.016788320615887642,
2436
+ "learning_rate": 6.224899598393574e-05,
2437
+ "loss": 0.005,
2438
+ "step": 347
2439
+ },
2440
+ {
2441
+ "epoch": 348.0,
2442
+ "grad_norm": 0.0243778508156538,
2443
+ "learning_rate": 6.184738955823294e-05,
2444
+ "loss": 0.0048,
2445
+ "step": 348
2446
+ },
2447
+ {
2448
+ "epoch": 349.0,
2449
+ "grad_norm": 0.007300014141947031,
2450
+ "learning_rate": 6.144578313253012e-05,
2451
+ "loss": 0.0047,
2452
+ "step": 349
2453
+ },
2454
+ {
2455
+ "epoch": 350.0,
2456
+ "grad_norm": 0.01766958087682724,
2457
+ "learning_rate": 6.104417670682732e-05,
2458
+ "loss": 0.005,
2459
+ "step": 350
2460
+ },
2461
+ {
2462
+ "epoch": 351.0,
2463
+ "grad_norm": 0.026927420869469643,
2464
+ "learning_rate": 6.06425702811245e-05,
2465
+ "loss": 0.0048,
2466
+ "step": 351
2467
+ },
2468
+ {
2469
+ "epoch": 352.0,
2470
+ "grad_norm": 0.016552148386836052,
2471
+ "learning_rate": 6.02409638554217e-05,
2472
+ "loss": 0.005,
2473
+ "step": 352
2474
+ },
2475
+ {
2476
+ "epoch": 353.0,
2477
+ "grad_norm": 0.02408471331000328,
2478
+ "learning_rate": 5.983935742971888e-05,
2479
+ "loss": 0.0048,
2480
+ "step": 353
2481
+ },
2482
+ {
2483
+ "epoch": 354.0,
2484
+ "grad_norm": 0.0007216089288704097,
2485
+ "learning_rate": 5.943775100401606e-05,
2486
+ "loss": 0.0049,
2487
+ "step": 354
2488
+ },
2489
+ {
2490
+ "epoch": 355.0,
2491
+ "grad_norm": 0.009755726903676987,
2492
+ "learning_rate": 5.903614457831326e-05,
2493
+ "loss": 0.0047,
2494
+ "step": 355
2495
+ },
2496
+ {
2497
+ "epoch": 356.0,
2498
+ "grad_norm": 0.026764798909425735,
2499
+ "learning_rate": 5.863453815261044e-05,
2500
+ "loss": 0.0048,
2501
+ "step": 356
2502
+ },
2503
+ {
2504
+ "epoch": 357.0,
2505
+ "grad_norm": 0.007179186213761568,
2506
+ "learning_rate": 5.823293172690764e-05,
2507
+ "loss": 0.0047,
2508
+ "step": 357
2509
+ },
2510
+ {
2511
+ "epoch": 358.0,
2512
+ "grad_norm": 0.023883482441306114,
2513
+ "learning_rate": 5.783132530120482e-05,
2514
+ "loss": 0.0048,
2515
+ "step": 358
2516
+ },
2517
+ {
2518
+ "epoch": 359.0,
2519
+ "grad_norm": 0.007125131320208311,
2520
+ "learning_rate": 5.7429718875502015e-05,
2521
+ "loss": 0.0047,
2522
+ "step": 359
2523
+ },
2524
+ {
2525
+ "epoch": 360.0,
2526
+ "grad_norm": 0.026448102667927742,
2527
+ "learning_rate": 5.70281124497992e-05,
2528
+ "loss": 0.0048,
2529
+ "step": 360
2530
+ },
2531
+ {
2532
+ "epoch": 361.0,
2533
+ "grad_norm": 0.009603764861822128,
2534
+ "learning_rate": 5.6626506024096394e-05,
2535
+ "loss": 0.0047,
2536
+ "step": 361
2537
+ },
2538
+ {
2539
+ "epoch": 362.0,
2540
+ "grad_norm": 0.023749757558107376,
2541
+ "learning_rate": 5.6224899598393576e-05,
2542
+ "loss": 0.0048,
2543
+ "step": 362
2544
+ },
2545
+ {
2546
+ "epoch": 363.0,
2547
+ "grad_norm": 0.007082722615450621,
2548
+ "learning_rate": 5.582329317269076e-05,
2549
+ "loss": 0.0047,
2550
+ "step": 363
2551
+ },
2552
+ {
2553
+ "epoch": 364.0,
2554
+ "grad_norm": 0.0006819483824074268,
2555
+ "learning_rate": 5.5421686746987955e-05,
2556
+ "loss": 0.0049,
2557
+ "step": 364
2558
+ },
2559
+ {
2560
+ "epoch": 365.0,
2561
+ "grad_norm": 0.026299143210053444,
2562
+ "learning_rate": 5.502008032128514e-05,
2563
+ "loss": 0.0048,
2564
+ "step": 365
2565
+ },
2566
+ {
2567
+ "epoch": 366.0,
2568
+ "grad_norm": 0.007077427115291357,
2569
+ "learning_rate": 5.461847389558233e-05,
2570
+ "loss": 0.0047,
2571
+ "step": 366
2572
+ },
2573
+ {
2574
+ "epoch": 367.0,
2575
+ "grad_norm": 0.0070725963450968266,
2576
+ "learning_rate": 5.4216867469879516e-05,
2577
+ "loss": 0.0047,
2578
+ "step": 367
2579
+ },
2580
+ {
2581
+ "epoch": 368.0,
2582
+ "grad_norm": 0.007042643614113331,
2583
+ "learning_rate": 5.381526104417671e-05,
2584
+ "loss": 0.0047,
2585
+ "step": 368
2586
+ },
2587
+ {
2588
+ "epoch": 369.0,
2589
+ "grad_norm": 0.009521468542516232,
2590
+ "learning_rate": 5.3413654618473894e-05,
2591
+ "loss": 0.0047,
2592
+ "step": 369
2593
+ },
2594
+ {
2595
+ "epoch": 370.0,
2596
+ "grad_norm": 0.0022259766701608896,
2597
+ "learning_rate": 5.301204819277109e-05,
2598
+ "loss": 0.0045,
2599
+ "step": 370
2600
+ },
2601
+ {
2602
+ "epoch": 371.0,
2603
+ "grad_norm": 0.023667145520448685,
2604
+ "learning_rate": 5.261044176706827e-05,
2605
+ "loss": 0.0048,
2606
+ "step": 371
2607
+ },
2608
+ {
2609
+ "epoch": 372.0,
2610
+ "grad_norm": 0.009567582048475742,
2611
+ "learning_rate": 5.220883534136547e-05,
2612
+ "loss": 0.0047,
2613
+ "step": 372
2614
+ },
2615
+ {
2616
+ "epoch": 373.0,
2617
+ "grad_norm": 0.00958697684109211,
2618
+ "learning_rate": 5.180722891566265e-05,
2619
+ "loss": 0.0047,
2620
+ "step": 373
2621
+ },
2622
+ {
2623
+ "epoch": 374.0,
2624
+ "grad_norm": 0.00959386583417654,
2625
+ "learning_rate": 5.140562248995984e-05,
2626
+ "loss": 0.0047,
2627
+ "step": 374
2628
+ },
2629
+ {
2630
+ "epoch": 375.0,
2631
+ "grad_norm": 0.00957813672721386,
2632
+ "learning_rate": 5.100401606425703e-05,
2633
+ "loss": 0.0047,
2634
+ "step": 375
2635
+ },
2636
+ {
2637
+ "epoch": 376.0,
2638
+ "grad_norm": 0.016183258965611458,
2639
+ "learning_rate": 5.060240963855422e-05,
2640
+ "loss": 0.005,
2641
+ "step": 376
2642
+ },
2643
+ {
2644
+ "epoch": 377.0,
2645
+ "grad_norm": 0.007078688126057386,
2646
+ "learning_rate": 5.020080321285141e-05,
2647
+ "loss": 0.0047,
2648
+ "step": 377
2649
+ },
2650
+ {
2651
+ "epoch": 378.0,
2652
+ "grad_norm": 0.007077342830598354,
2653
+ "learning_rate": 4.97991967871486e-05,
2654
+ "loss": 0.0047,
2655
+ "step": 378
2656
+ },
2657
+ {
2658
+ "epoch": 379.0,
2659
+ "grad_norm": 0.002201406517997384,
2660
+ "learning_rate": 4.9397590361445786e-05,
2661
+ "loss": 0.0045,
2662
+ "step": 379
2663
+ },
2664
+ {
2665
+ "epoch": 380.0,
2666
+ "grad_norm": 0.009655999019742012,
2667
+ "learning_rate": 4.8995983935742975e-05,
2668
+ "loss": 0.0047,
2669
+ "step": 380
2670
+ },
2671
+ {
2672
+ "epoch": 381.0,
2673
+ "grad_norm": 0.017296917736530304,
2674
+ "learning_rate": 4.8594377510040165e-05,
2675
+ "loss": 0.005,
2676
+ "step": 381
2677
+ },
2678
+ {
2679
+ "epoch": 382.0,
2680
+ "grad_norm": 0.007125664968043566,
2681
+ "learning_rate": 4.8192771084337354e-05,
2682
+ "loss": 0.0047,
2683
+ "step": 382
2684
+ },
2685
+ {
2686
+ "epoch": 383.0,
2687
+ "grad_norm": 0.0238560251891613,
2688
+ "learning_rate": 4.779116465863454e-05,
2689
+ "loss": 0.0048,
2690
+ "step": 383
2691
+ },
2692
+ {
2693
+ "epoch": 384.0,
2694
+ "grad_norm": 0.009672388434410095,
2695
+ "learning_rate": 4.738955823293173e-05,
2696
+ "loss": 0.0047,
2697
+ "step": 384
2698
+ },
2699
+ {
2700
+ "epoch": 385.0,
2701
+ "grad_norm": 0.0022134389728307724,
2702
+ "learning_rate": 4.698795180722892e-05,
2703
+ "loss": 0.0045,
2704
+ "step": 385
2705
+ },
2706
+ {
2707
+ "epoch": 386.0,
2708
+ "grad_norm": 0.00969780795276165,
2709
+ "learning_rate": 4.658634538152611e-05,
2710
+ "loss": 0.0047,
2711
+ "step": 386
2712
+ },
2713
+ {
2714
+ "epoch": 387.0,
2715
+ "grad_norm": 0.007160300388932228,
2716
+ "learning_rate": 4.61847389558233e-05,
2717
+ "loss": 0.0047,
2718
+ "step": 387
2719
+ },
2720
+ {
2721
+ "epoch": 388.0,
2722
+ "grad_norm": 0.007146279327571392,
2723
+ "learning_rate": 4.578313253012048e-05,
2724
+ "loss": 0.0047,
2725
+ "step": 388
2726
+ },
2727
+ {
2728
+ "epoch": 389.0,
2729
+ "grad_norm": 0.002192781073972583,
2730
+ "learning_rate": 4.538152610441767e-05,
2731
+ "loss": 0.0045,
2732
+ "step": 389
2733
+ },
2734
+ {
2735
+ "epoch": 390.0,
2736
+ "grad_norm": 0.009754459373652935,
2737
+ "learning_rate": 4.497991967871486e-05,
2738
+ "loss": 0.0047,
2739
+ "step": 390
2740
+ },
2741
+ {
2742
+ "epoch": 391.0,
2743
+ "grad_norm": 0.00970425084233284,
2744
+ "learning_rate": 4.457831325301205e-05,
2745
+ "loss": 0.0047,
2746
+ "step": 391
2747
+ },
2748
+ {
2749
+ "epoch": 392.0,
2750
+ "grad_norm": 0.007177860010415316,
2751
+ "learning_rate": 4.417670682730924e-05,
2752
+ "loss": 0.0047,
2753
+ "step": 392
2754
+ },
2755
+ {
2756
+ "epoch": 393.0,
2757
+ "grad_norm": 0.024075862020254135,
2758
+ "learning_rate": 4.377510040160643e-05,
2759
+ "loss": 0.0048,
2760
+ "step": 393
2761
+ },
2762
+ {
2763
+ "epoch": 394.0,
2764
+ "grad_norm": 0.009754209779202938,
2765
+ "learning_rate": 4.337349397590362e-05,
2766
+ "loss": 0.0047,
2767
+ "step": 394
2768
+ },
2769
+ {
2770
+ "epoch": 395.0,
2771
+ "grad_norm": 0.009791336953639984,
2772
+ "learning_rate": 4.297188755020081e-05,
2773
+ "loss": 0.0047,
2774
+ "step": 395
2775
+ },
2776
+ {
2777
+ "epoch": 396.0,
2778
+ "grad_norm": 0.009780234657227993,
2779
+ "learning_rate": 4.2570281124497996e-05,
2780
+ "loss": 0.0047,
2781
+ "step": 396
2782
+ },
2783
+ {
2784
+ "epoch": 397.0,
2785
+ "grad_norm": 0.007180201821029186,
2786
+ "learning_rate": 4.2168674698795186e-05,
2787
+ "loss": 0.0047,
2788
+ "step": 397
2789
+ },
2790
+ {
2791
+ "epoch": 398.0,
2792
+ "grad_norm": 0.007196042221039534,
2793
+ "learning_rate": 4.176706827309237e-05,
2794
+ "loss": 0.0047,
2795
+ "step": 398
2796
+ },
2797
+ {
2798
+ "epoch": 399.0,
2799
+ "grad_norm": 0.00722030783072114,
2800
+ "learning_rate": 4.136546184738956e-05,
2801
+ "loss": 0.0047,
2802
+ "step": 399
2803
+ },
2804
+ {
2805
+ "epoch": 400.0,
2806
+ "grad_norm": 0.009809617884457111,
2807
+ "learning_rate": 4.0963855421686746e-05,
2808
+ "loss": 0.0047,
2809
+ "step": 400
2810
+ },
2811
+ {
2812
+ "epoch": 401.0,
2813
+ "grad_norm": 0.007194210775196552,
2814
+ "learning_rate": 4.0562248995983936e-05,
2815
+ "loss": 0.0047,
2816
+ "step": 401
2817
+ },
2818
+ {
2819
+ "epoch": 402.0,
2820
+ "grad_norm": 0.00984671525657177,
2821
+ "learning_rate": 4.0160642570281125e-05,
2822
+ "loss": 0.0047,
2823
+ "step": 402
2824
+ },
2825
+ {
2826
+ "epoch": 403.0,
2827
+ "grad_norm": 0.00983081478625536,
2828
+ "learning_rate": 3.9759036144578314e-05,
2829
+ "loss": 0.0047,
2830
+ "step": 403
2831
+ },
2832
+ {
2833
+ "epoch": 404.0,
2834
+ "grad_norm": 0.007195820100605488,
2835
+ "learning_rate": 3.93574297188755e-05,
2836
+ "loss": 0.0047,
2837
+ "step": 404
2838
+ },
2839
+ {
2840
+ "epoch": 405.0,
2841
+ "grad_norm": 0.007182563189417124,
2842
+ "learning_rate": 3.895582329317269e-05,
2843
+ "loss": 0.0047,
2844
+ "step": 405
2845
+ },
2846
+ {
2847
+ "epoch": 406.0,
2848
+ "grad_norm": 0.007222824264317751,
2849
+ "learning_rate": 3.855421686746988e-05,
2850
+ "loss": 0.0047,
2851
+ "step": 406
2852
+ },
2853
+ {
2854
+ "epoch": 407.0,
2855
+ "grad_norm": 0.007211462128907442,
2856
+ "learning_rate": 3.815261044176707e-05,
2857
+ "loss": 0.0047,
2858
+ "step": 407
2859
+ },
2860
+ {
2861
+ "epoch": 408.0,
2862
+ "grad_norm": 0.009921679273247719,
2863
+ "learning_rate": 3.7751004016064253e-05,
2864
+ "loss": 0.0047,
2865
+ "step": 408
2866
+ },
2867
+ {
2868
+ "epoch": 409.0,
2869
+ "grad_norm": 0.02687370777130127,
2870
+ "learning_rate": 3.734939759036144e-05,
2871
+ "loss": 0.0048,
2872
+ "step": 409
2873
+ },
2874
+ {
2875
+ "epoch": 410.0,
2876
+ "grad_norm": 0.009853348135948181,
2877
+ "learning_rate": 3.694779116465863e-05,
2878
+ "loss": 0.0047,
2879
+ "step": 410
2880
+ },
2881
+ {
2882
+ "epoch": 411.0,
2883
+ "grad_norm": 0.024240443482995033,
2884
+ "learning_rate": 3.654618473895582e-05,
2885
+ "loss": 0.0048,
2886
+ "step": 411
2887
+ },
2888
+ {
2889
+ "epoch": 412.0,
2890
+ "grad_norm": 0.04131508618593216,
2891
+ "learning_rate": 3.614457831325301e-05,
2892
+ "loss": 0.0048,
2893
+ "step": 412
2894
+ },
2895
+ {
2896
+ "epoch": 413.0,
2897
+ "grad_norm": 0.009910643100738525,
2898
+ "learning_rate": 3.57429718875502e-05,
2899
+ "loss": 0.0047,
2900
+ "step": 413
2901
+ },
2902
+ {
2903
+ "epoch": 414.0,
2904
+ "grad_norm": 0.03583821654319763,
2905
+ "learning_rate": 3.534136546184739e-05,
2906
+ "loss": 0.0046,
2907
+ "step": 414
2908
+ },
2909
+ {
2910
+ "epoch": 415.0,
2911
+ "grad_norm": 0.017576098442077637,
2912
+ "learning_rate": 3.4939759036144585e-05,
2913
+ "loss": 0.005,
2914
+ "step": 415
2915
+ },
2916
+ {
2917
+ "epoch": 416.0,
2918
+ "grad_norm": 0.01653764583170414,
2919
+ "learning_rate": 3.4538152610441774e-05,
2920
+ "loss": 0.005,
2921
+ "step": 416
2922
+ },
2923
+ {
2924
+ "epoch": 417.0,
2925
+ "grad_norm": 0.024247560650110245,
2926
+ "learning_rate": 3.413654618473896e-05,
2927
+ "loss": 0.0048,
2928
+ "step": 417
2929
+ },
2930
+ {
2931
+ "epoch": 418.0,
2932
+ "grad_norm": 0.007164428010582924,
2933
+ "learning_rate": 3.3734939759036146e-05,
2934
+ "loss": 0.0047,
2935
+ "step": 418
2936
+ },
2937
+ {
2938
+ "epoch": 419.0,
2939
+ "grad_norm": 0.0098257539793849,
2940
+ "learning_rate": 3.3333333333333335e-05,
2941
+ "loss": 0.0047,
2942
+ "step": 419
2943
+ },
2944
+ {
2945
+ "epoch": 420.0,
2946
+ "grad_norm": 0.017520597204566002,
2947
+ "learning_rate": 3.2931726907630524e-05,
2948
+ "loss": 0.005,
2949
+ "step": 420
2950
+ },
2951
+ {
2952
+ "epoch": 421.0,
2953
+ "grad_norm": 0.018933523446321487,
2954
+ "learning_rate": 3.253012048192771e-05,
2955
+ "loss": 0.0045,
2956
+ "step": 421
2957
+ },
2958
+ {
2959
+ "epoch": 422.0,
2960
+ "grad_norm": 0.007097120396792889,
2961
+ "learning_rate": 3.21285140562249e-05,
2962
+ "loss": 0.0047,
2963
+ "step": 422
2964
+ },
2965
+ {
2966
+ "epoch": 423.0,
2967
+ "grad_norm": 0.024037552997469902,
2968
+ "learning_rate": 3.172690763052209e-05,
2969
+ "loss": 0.0048,
2970
+ "step": 423
2971
+ },
2972
+ {
2973
+ "epoch": 424.0,
2974
+ "grad_norm": 0.024066118523478508,
2975
+ "learning_rate": 3.132530120481928e-05,
2976
+ "loss": 0.0048,
2977
+ "step": 424
2978
+ },
2979
+ {
2980
+ "epoch": 425.0,
2981
+ "grad_norm": 0.0006517537985928357,
2982
+ "learning_rate": 3.092369477911647e-05,
2983
+ "loss": 0.0049,
2984
+ "step": 425
2985
+ },
2986
+ {
2987
+ "epoch": 426.0,
2988
+ "grad_norm": 0.034052085131406784,
2989
+ "learning_rate": 3.052208835341366e-05,
2990
+ "loss": 0.005,
2991
+ "step": 426
2992
+ },
2993
+ {
2994
+ "epoch": 427.0,
2995
+ "grad_norm": 0.017422359436750412,
2996
+ "learning_rate": 3.012048192771085e-05,
2997
+ "loss": 0.005,
2998
+ "step": 427
2999
+ },
3000
+ {
3001
+ "epoch": 428.0,
3002
+ "grad_norm": 0.009725161828100681,
3003
+ "learning_rate": 2.971887550200803e-05,
3004
+ "loss": 0.0047,
3005
+ "step": 428
3006
+ },
3007
+ {
3008
+ "epoch": 429.0,
3009
+ "grad_norm": 0.023919757455587387,
3010
+ "learning_rate": 2.931726907630522e-05,
3011
+ "loss": 0.0048,
3012
+ "step": 429
3013
+ },
3014
+ {
3015
+ "epoch": 430.0,
3016
+ "grad_norm": 0.04076967015862465,
3017
+ "learning_rate": 2.891566265060241e-05,
3018
+ "loss": 0.0048,
3019
+ "step": 430
3020
+ },
3021
+ {
3022
+ "epoch": 431.0,
3023
+ "grad_norm": 0.002202529925853014,
3024
+ "learning_rate": 2.85140562248996e-05,
3025
+ "loss": 0.0045,
3026
+ "step": 431
3027
+ },
3028
+ {
3029
+ "epoch": 432.0,
3030
+ "grad_norm": 0.026460327208042145,
3031
+ "learning_rate": 2.8112449799196788e-05,
3032
+ "loss": 0.0048,
3033
+ "step": 432
3034
+ },
3035
+ {
3036
+ "epoch": 433.0,
3037
+ "grad_norm": 0.026399288326501846,
3038
+ "learning_rate": 2.7710843373493977e-05,
3039
+ "loss": 0.0048,
3040
+ "step": 433
3041
+ },
3042
+ {
3043
+ "epoch": 434.0,
3044
+ "grad_norm": 0.007069241255521774,
3045
+ "learning_rate": 2.7309236947791167e-05,
3046
+ "loss": 0.0047,
3047
+ "step": 434
3048
+ },
3049
+ {
3050
+ "epoch": 435.0,
3051
+ "grad_norm": 0.023836608976125717,
3052
+ "learning_rate": 2.6907630522088356e-05,
3053
+ "loss": 0.0048,
3054
+ "step": 435
3055
+ },
3056
+ {
3057
+ "epoch": 436.0,
3058
+ "grad_norm": 0.007101175840944052,
3059
+ "learning_rate": 2.6506024096385545e-05,
3060
+ "loss": 0.0047,
3061
+ "step": 436
3062
+ },
3063
+ {
3064
+ "epoch": 437.0,
3065
+ "grad_norm": 0.009633008390665054,
3066
+ "learning_rate": 2.6104417670682734e-05,
3067
+ "loss": 0.0047,
3068
+ "step": 437
3069
+ },
3070
+ {
3071
+ "epoch": 438.0,
3072
+ "grad_norm": 0.00965973362326622,
3073
+ "learning_rate": 2.570281124497992e-05,
3074
+ "loss": 0.0047,
3075
+ "step": 438
3076
+ },
3077
+ {
3078
+ "epoch": 439.0,
3079
+ "grad_norm": 0.009659296832978725,
3080
+ "learning_rate": 2.530120481927711e-05,
3081
+ "loss": 0.0047,
3082
+ "step": 439
3083
+ },
3084
+ {
3085
+ "epoch": 440.0,
3086
+ "grad_norm": 0.00703496765345335,
3087
+ "learning_rate": 2.48995983935743e-05,
3088
+ "loss": 0.0047,
3089
+ "step": 440
3090
+ },
3091
+ {
3092
+ "epoch": 441.0,
3093
+ "grad_norm": 0.007030785549432039,
3094
+ "learning_rate": 2.4497991967871488e-05,
3095
+ "loss": 0.0047,
3096
+ "step": 441
3097
+ },
3098
+ {
3099
+ "epoch": 442.0,
3100
+ "grad_norm": 0.007043666671961546,
3101
+ "learning_rate": 2.4096385542168677e-05,
3102
+ "loss": 0.0047,
3103
+ "step": 442
3104
+ },
3105
+ {
3106
+ "epoch": 443.0,
3107
+ "grad_norm": 0.007040888071060181,
3108
+ "learning_rate": 2.3694779116465866e-05,
3109
+ "loss": 0.0047,
3110
+ "step": 443
3111
+ },
3112
+ {
3113
+ "epoch": 444.0,
3114
+ "grad_norm": 0.009699525311589241,
3115
+ "learning_rate": 2.3293172690763055e-05,
3116
+ "loss": 0.0047,
3117
+ "step": 444
3118
+ },
3119
+ {
3120
+ "epoch": 445.0,
3121
+ "grad_norm": 0.0007015722803771496,
3122
+ "learning_rate": 2.289156626506024e-05,
3123
+ "loss": 0.0049,
3124
+ "step": 445
3125
+ },
3126
+ {
3127
+ "epoch": 446.0,
3128
+ "grad_norm": 0.00966339185833931,
3129
+ "learning_rate": 2.248995983935743e-05,
3130
+ "loss": 0.0047,
3131
+ "step": 446
3132
+ },
3133
+ {
3134
+ "epoch": 447.0,
3135
+ "grad_norm": 0.009668245911598206,
3136
+ "learning_rate": 2.208835341365462e-05,
3137
+ "loss": 0.0047,
3138
+ "step": 447
3139
+ },
3140
+ {
3141
+ "epoch": 448.0,
3142
+ "grad_norm": 0.007046937942504883,
3143
+ "learning_rate": 2.168674698795181e-05,
3144
+ "loss": 0.0047,
3145
+ "step": 448
3146
+ },
3147
+ {
3148
+ "epoch": 449.0,
3149
+ "grad_norm": 0.0070527587085962296,
3150
+ "learning_rate": 2.1285140562248998e-05,
3151
+ "loss": 0.0047,
3152
+ "step": 449
3153
+ },
3154
+ {
3155
+ "epoch": 450.0,
3156
+ "grad_norm": 0.007038488052785397,
3157
+ "learning_rate": 2.0883534136546184e-05,
3158
+ "loss": 0.0047,
3159
+ "step": 450
3160
+ },
3161
+ {
3162
+ "epoch": 451.0,
3163
+ "grad_norm": 0.0021306683775037527,
3164
+ "learning_rate": 2.0481927710843373e-05,
3165
+ "loss": 0.0045,
3166
+ "step": 451
3167
+ },
3168
+ {
3169
+ "epoch": 452.0,
3170
+ "grad_norm": 0.007048788480460644,
3171
+ "learning_rate": 2.0080321285140562e-05,
3172
+ "loss": 0.0047,
3173
+ "step": 452
3174
+ },
3175
+ {
3176
+ "epoch": 453.0,
3177
+ "grad_norm": 0.0006838923436589539,
3178
+ "learning_rate": 1.967871485943775e-05,
3179
+ "loss": 0.0049,
3180
+ "step": 453
3181
+ },
3182
+ {
3183
+ "epoch": 454.0,
3184
+ "grad_norm": 0.00969866942614317,
3185
+ "learning_rate": 1.927710843373494e-05,
3186
+ "loss": 0.0047,
3187
+ "step": 454
3188
+ },
3189
+ {
3190
+ "epoch": 455.0,
3191
+ "grad_norm": 0.009712344966828823,
3192
+ "learning_rate": 1.8875502008032127e-05,
3193
+ "loss": 0.0047,
3194
+ "step": 455
3195
+ },
3196
+ {
3197
+ "epoch": 456.0,
3198
+ "grad_norm": 0.009688866324722767,
3199
+ "learning_rate": 1.8473895582329316e-05,
3200
+ "loss": 0.0047,
3201
+ "step": 456
3202
+ },
3203
+ {
3204
+ "epoch": 457.0,
3205
+ "grad_norm": 0.007071357686072588,
3206
+ "learning_rate": 1.8072289156626505e-05,
3207
+ "loss": 0.0047,
3208
+ "step": 457
3209
+ },
3210
+ {
3211
+ "epoch": 458.0,
3212
+ "grad_norm": 0.014747700653970242,
3213
+ "learning_rate": 1.7670682730923694e-05,
3214
+ "loss": 0.0045,
3215
+ "step": 458
3216
+ },
3217
+ {
3218
+ "epoch": 459.0,
3219
+ "grad_norm": 0.007095323875546455,
3220
+ "learning_rate": 1.7269076305220887e-05,
3221
+ "loss": 0.0047,
3222
+ "step": 459
3223
+ },
3224
+ {
3225
+ "epoch": 460.0,
3226
+ "grad_norm": 0.0021590902470052242,
3227
+ "learning_rate": 1.6867469879518073e-05,
3228
+ "loss": 0.0045,
3229
+ "step": 460
3230
+ },
3231
+ {
3232
+ "epoch": 461.0,
3233
+ "grad_norm": 0.0006770145264454186,
3234
+ "learning_rate": 1.6465863453815262e-05,
3235
+ "loss": 0.0049,
3236
+ "step": 461
3237
+ },
3238
+ {
3239
+ "epoch": 462.0,
3240
+ "grad_norm": 0.0006719469674862921,
3241
+ "learning_rate": 1.606425702811245e-05,
3242
+ "loss": 0.0049,
3243
+ "step": 462
3244
+ },
3245
+ {
3246
+ "epoch": 463.0,
3247
+ "grad_norm": 0.009731734171509743,
3248
+ "learning_rate": 1.566265060240964e-05,
3249
+ "loss": 0.0047,
3250
+ "step": 463
3251
+ },
3252
+ {
3253
+ "epoch": 464.0,
3254
+ "grad_norm": 0.009717794135212898,
3255
+ "learning_rate": 1.526104417670683e-05,
3256
+ "loss": 0.0047,
3257
+ "step": 464
3258
+ },
3259
+ {
3260
+ "epoch": 465.0,
3261
+ "grad_norm": 0.009746178984642029,
3262
+ "learning_rate": 1.4859437751004016e-05,
3263
+ "loss": 0.0047,
3264
+ "step": 465
3265
+ },
3266
+ {
3267
+ "epoch": 466.0,
3268
+ "grad_norm": 0.007102122530341148,
3269
+ "learning_rate": 1.4457831325301205e-05,
3270
+ "loss": 0.0047,
3271
+ "step": 466
3272
+ },
3273
+ {
3274
+ "epoch": 467.0,
3275
+ "grad_norm": 0.00708032725378871,
3276
+ "learning_rate": 1.4056224899598394e-05,
3277
+ "loss": 0.0047,
3278
+ "step": 467
3279
+ },
3280
+ {
3281
+ "epoch": 468.0,
3282
+ "grad_norm": 0.007086113095283508,
3283
+ "learning_rate": 1.3654618473895583e-05,
3284
+ "loss": 0.0047,
3285
+ "step": 468
3286
+ },
3287
+ {
3288
+ "epoch": 469.0,
3289
+ "grad_norm": 0.014743141829967499,
3290
+ "learning_rate": 1.3253012048192772e-05,
3291
+ "loss": 0.0045,
3292
+ "step": 469
3293
+ },
3294
+ {
3295
+ "epoch": 470.0,
3296
+ "grad_norm": 0.0021814049687236547,
3297
+ "learning_rate": 1.285140562248996e-05,
3298
+ "loss": 0.0045,
3299
+ "step": 470
3300
+ },
3301
+ {
3302
+ "epoch": 471.0,
3303
+ "grad_norm": 0.007091291714459658,
3304
+ "learning_rate": 1.244979919678715e-05,
3305
+ "loss": 0.0047,
3306
+ "step": 471
3307
+ },
3308
+ {
3309
+ "epoch": 472.0,
3310
+ "grad_norm": 0.0071353972889482975,
3311
+ "learning_rate": 1.2048192771084338e-05,
3312
+ "loss": 0.0047,
3313
+ "step": 472
3314
+ },
3315
+ {
3316
+ "epoch": 473.0,
3317
+ "grad_norm": 0.009766222909092903,
3318
+ "learning_rate": 1.1646586345381528e-05,
3319
+ "loss": 0.0047,
3320
+ "step": 473
3321
+ },
3322
+ {
3323
+ "epoch": 474.0,
3324
+ "grad_norm": 0.017397722229361534,
3325
+ "learning_rate": 1.1244979919678715e-05,
3326
+ "loss": 0.005,
3327
+ "step": 474
3328
+ },
3329
+ {
3330
+ "epoch": 475.0,
3331
+ "grad_norm": 0.026444077491760254,
3332
+ "learning_rate": 1.0843373493975904e-05,
3333
+ "loss": 0.0048,
3334
+ "step": 475
3335
+ },
3336
+ {
3337
+ "epoch": 476.0,
3338
+ "grad_norm": 0.009789888747036457,
3339
+ "learning_rate": 1.0441767068273092e-05,
3340
+ "loss": 0.0047,
3341
+ "step": 476
3342
+ },
3343
+ {
3344
+ "epoch": 477.0,
3345
+ "grad_norm": 0.007101455237716436,
3346
+ "learning_rate": 1.0040160642570281e-05,
3347
+ "loss": 0.0047,
3348
+ "step": 477
3349
+ },
3350
+ {
3351
+ "epoch": 478.0,
3352
+ "grad_norm": 0.007093328982591629,
3353
+ "learning_rate": 9.63855421686747e-06,
3354
+ "loss": 0.0047,
3355
+ "step": 478
3356
+ },
3357
+ {
3358
+ "epoch": 479.0,
3359
+ "grad_norm": 0.014785111881792545,
3360
+ "learning_rate": 9.236947791164658e-06,
3361
+ "loss": 0.0045,
3362
+ "step": 479
3363
+ },
3364
+ {
3365
+ "epoch": 480.0,
3366
+ "grad_norm": 0.014800012111663818,
3367
+ "learning_rate": 8.835341365461847e-06,
3368
+ "loss": 0.0045,
3369
+ "step": 480
3370
+ },
3371
+ {
3372
+ "epoch": 481.0,
3373
+ "grad_norm": 0.00710406294092536,
3374
+ "learning_rate": 8.433734939759036e-06,
3375
+ "loss": 0.0047,
3376
+ "step": 481
3377
+ },
3378
+ {
3379
+ "epoch": 482.0,
3380
+ "grad_norm": 0.0163735318928957,
3381
+ "learning_rate": 8.032128514056226e-06,
3382
+ "loss": 0.005,
3383
+ "step": 482
3384
+ },
3385
+ {
3386
+ "epoch": 483.0,
3387
+ "grad_norm": 0.00218359031714499,
3388
+ "learning_rate": 7.630522088353415e-06,
3389
+ "loss": 0.0045,
3390
+ "step": 483
3391
+ },
3392
+ {
3393
+ "epoch": 484.0,
3394
+ "grad_norm": 0.009749580174684525,
3395
+ "learning_rate": 7.228915662650602e-06,
3396
+ "loss": 0.0047,
3397
+ "step": 484
3398
+ },
3399
+ {
3400
+ "epoch": 485.0,
3401
+ "grad_norm": 0.009765205904841423,
3402
+ "learning_rate": 6.827309236947792e-06,
3403
+ "loss": 0.0047,
3404
+ "step": 485
3405
+ },
3406
+ {
3407
+ "epoch": 486.0,
3408
+ "grad_norm": 0.000685015635099262,
3409
+ "learning_rate": 6.42570281124498e-06,
3410
+ "loss": 0.0049,
3411
+ "step": 486
3412
+ },
3413
+ {
3414
+ "epoch": 487.0,
3415
+ "grad_norm": 0.00976163987070322,
3416
+ "learning_rate": 6.024096385542169e-06,
3417
+ "loss": 0.0047,
3418
+ "step": 487
3419
+ },
3420
+ {
3421
+ "epoch": 488.0,
3422
+ "grad_norm": 0.0006887281779199839,
3423
+ "learning_rate": 5.622489959839358e-06,
3424
+ "loss": 0.0049,
3425
+ "step": 488
3426
+ },
3427
+ {
3428
+ "epoch": 489.0,
3429
+ "grad_norm": 0.009809976443648338,
3430
+ "learning_rate": 5.220883534136546e-06,
3431
+ "loss": 0.0047,
3432
+ "step": 489
3433
+ },
3434
+ {
3435
+ "epoch": 490.0,
3436
+ "grad_norm": 0.009795522317290306,
3437
+ "learning_rate": 4.819277108433735e-06,
3438
+ "loss": 0.0047,
3439
+ "step": 490
3440
+ },
3441
+ {
3442
+ "epoch": 491.0,
3443
+ "grad_norm": 0.009749695658683777,
3444
+ "learning_rate": 4.417670682730924e-06,
3445
+ "loss": 0.0047,
3446
+ "step": 491
3447
+ },
3448
+ {
3449
+ "epoch": 492.0,
3450
+ "grad_norm": 0.0021824862342327833,
3451
+ "learning_rate": 4.016064257028113e-06,
3452
+ "loss": 0.0045,
3453
+ "step": 492
3454
+ },
3455
+ {
3456
+ "epoch": 493.0,
3457
+ "grad_norm": 0.007140113040804863,
3458
+ "learning_rate": 3.614457831325301e-06,
3459
+ "loss": 0.0047,
3460
+ "step": 493
3461
+ },
3462
+ {
3463
+ "epoch": 494.0,
3464
+ "grad_norm": 0.007124903611838818,
3465
+ "learning_rate": 3.21285140562249e-06,
3466
+ "loss": 0.0047,
3467
+ "step": 494
3468
+ },
3469
+ {
3470
+ "epoch": 495.0,
3471
+ "grad_norm": 0.0071158865466713905,
3472
+ "learning_rate": 2.811244979919679e-06,
3473
+ "loss": 0.0047,
3474
+ "step": 495
3475
+ },
3476
+ {
3477
+ "epoch": 496.0,
3478
+ "grad_norm": 0.007119073532521725,
3479
+ "learning_rate": 2.4096385542168676e-06,
3480
+ "loss": 0.0047,
3481
+ "step": 496
3482
+ },
3483
+ {
3484
+ "epoch": 497.0,
3485
+ "grad_norm": 0.007112898863852024,
3486
+ "learning_rate": 2.0080321285140564e-06,
3487
+ "loss": 0.0047,
3488
+ "step": 497
3489
+ },
3490
+ {
3491
+ "epoch": 498.0,
3492
+ "grad_norm": 0.014866248704493046,
3493
+ "learning_rate": 1.606425702811245e-06,
3494
+ "loss": 0.0045,
3495
+ "step": 498
3496
+ },
3497
+ {
3498
+ "epoch": 499.0,
3499
+ "grad_norm": 0.007127212826162577,
3500
+ "learning_rate": 1.2048192771084338e-06,
3501
+ "loss": 0.0047,
3502
+ "step": 499
3503
+ },
3504
+ {
3505
+ "epoch": 500.0,
3506
+ "grad_norm": 0.007142535876482725,
3507
+ "learning_rate": 8.032128514056225e-07,
3508
+ "loss": 0.0047,
3509
+ "step": 500
3510
  }
3511
  ],
3512
  "logging_steps": 1,
3513
+ "max_steps": 500,
3514
  "num_input_tokens_seen": 0,
3515
+ "num_train_epochs": 500,
3516
  "save_steps": 500,
3517
+ "total_flos": 3475984143360000.0,
3518
  "train_batch_size": 1,
3519
  "trial_name": null,
3520
  "trial_params": null
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:c913740270ce755c394445d072d8363aa026f0debcf3f5a3a5f27a1eea73cb07
+ oid sha256:be3c39cb3e8fc5eccbc435ae7d38b2c12dcd5cd4225e0927ad68ac23f57a5051
  size 4984