PEFT
barbaroo committed on
Commit 197100e
1 Parent(s): ab66be7

Upload 8 files

README.md ADDED
@@ -0,0 +1,218 @@
---
library_name: peft
base_model: AI-Sweden-Models/gpt-sw3-1.3b
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->



## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->



- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]
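The card does not yet include a loading snippet, so here is a minimal sketch of the usual `transformers` + `peft` inference path for this adapter. It assumes 8-bit loading of the base model, matching the quantization config under "Training procedure" below; `your-username/your-adapter-repo` is a placeholder, not a value taken from the card.

```python
# Minimal sketch (not from the model card): load the base model in 8-bit and attach the LoRA adapter.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "AI-Sweden-Models/gpt-sw3-1.3b"       # base model from the YAML front matter
adapter_id = "your-username/your-adapter-repo"  # placeholder: replace with this repository's id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # same 8-bit setting as during training
    device_map="auto",
)

# PeftModel.from_pretrained reads adapter_config.json and adapter_model.bin from the adapter repo.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

inputs = tokenizer("Example prompt text", return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```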
## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[More Information Needed]

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

[More Information Needed]


#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary



## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]

## Training procedure

The following `bitsandbytes` quantization config was used during training (an equivalent `BitsAndBytesConfig` is sketched after this list):
- quant_method: bitsandbytes
- load_in_8bit: True
- load_in_4bit: False
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: fp4
- bnb_4bit_use_double_quant: False
- bnb_4bit_compute_dtype: float32
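The list above is the auto-generated record of the quantization settings. As a hedged convenience, the sketch below restates them as a `transformers.BitsAndBytesConfig`; it is a reconstruction, not the original training script, and the `quant_method` entry is implied by the class rather than passed as an argument.

```python
# Sketch: the training-time quantization settings expressed as a BitsAndBytesConfig.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,
    load_in_4bit=False,
    llm_int8_threshold=6.0,
    llm_int8_skip_modules=None,
    llm_int8_enable_fp32_cpu_offload=False,
    llm_int8_has_fp16_weight=False,
    bnb_4bit_quant_type="fp4",
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype="float32",
)

# Example use: load the base model with the same quantization as during training.
model = AutoModelForCausalLM.from_pretrained(
    "AI-Sweden-Models/gpt-sw3-1.3b",
    quantization_config=bnb_config,
    device_map="auto",
)
```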
### Framework versions

- PEFT 0.6.2.dev0
adapter_config.json ADDED
@@ -0,0 +1,23 @@
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "AI-Sweden-Models/gpt-sw3-1.3b",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layers_pattern": null,
  "layers_to_transform": null,
  "lora_alpha": 4,
  "lora_dropout": 0.1,
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 4,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "c_proj",
    "c_attn"
  ],
  "task_type": "CAUSAL_LM"
}
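For readability, the LoRA settings in `adapter_config.json` correspond roughly to the `peft.LoraConfig` sketched below (rank 4, alpha 4, dropout 0.1, targeting the `c_attn` and `c_proj` projections of GPT-SW3). The JSON file itself remains the authoritative configuration.

```python
# Sketch: a LoraConfig mirroring the adapter_config.json above.
from peft import LoraConfig

lora_config = LoraConfig(
    r=4,
    lora_alpha=4,
    lora_dropout=0.1,
    bias="none",
    target_modules=["c_attn", "c_proj"],
    task_type="CAUSAL_LM",
    fan_in_fan_out=False,
    init_lora_weights=True,
)
```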
adapter_model.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bb2decd3194becaa80d28e6c3efd8eaf6af2623c565623d38fc8d4520ed13167
size 8702090
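This is a Git LFS pointer rather than the weights themselves: `oid` is the SHA-256 of the real `adapter_model.bin` and `size` is its length in bytes. As a hedged illustration (the repository id below is a placeholder), a downloaded copy can be checked against the pointer like this:

```python
# Sketch: verify a downloaded adapter_model.bin against the LFS pointer fields above.
import hashlib
import os
from huggingface_hub import hf_hub_download

path = hf_hub_download("your-username/your-adapter-repo", "adapter_model.bin")  # placeholder repo id

with open(path, "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

print(digest == "bb2decd3194becaa80d28e6c3efd8eaf6af2623c565623d38fc8d4520ed13167")  # oid from the pointer
print(os.path.getsize(path) == 8702090)                                              # size from the pointer
```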
optimizer.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5574d4d1767919cc69415fc2bb95c9d7e3aeb2f4cdb8ea23efa71f2d99fa3e63
size 17422522
rng_state.pth ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:acc44475eaa37b1fc34677a8c262f99486feb08a541bd05f44be261f4e14b289
size 14244
scheduler.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a3260dbb521f258c8cac26e24d13a21999783314dc623088dba4d6dc168ca1b9
size 1064
trainer_state.json ADDED
@@ -0,0 +1,3267 @@
1
+ {
2
+ "best_metric": 3.2050249576568604,
3
+ "best_model_checkpoint": "outputs-small/checkpoint-232000",
4
+ "epoch": 9.961698984937223,
5
+ "eval_steps": 4000,
6
+ "global_step": 232000,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.02,
13
+ "learning_rate": 0.000399312685252803,
14
+ "loss": 2.7854,
15
+ "step": 500
16
+ },
17
+ {
18
+ "epoch": 0.04,
19
+ "learning_rate": 0.0003984535418188067,
20
+ "loss": 2.7344,
21
+ "step": 1000
22
+ },
23
+ {
24
+ "epoch": 0.06,
25
+ "learning_rate": 0.00039759439838481036,
26
+ "loss": 2.7156,
27
+ "step": 1500
28
+ },
29
+ {
30
+ "epoch": 0.09,
31
+ "learning_rate": 0.00039673525495081404,
32
+ "loss": 2.6988,
33
+ "step": 2000
34
+ },
35
+ {
36
+ "epoch": 0.11,
37
+ "learning_rate": 0.00039587611151681773,
38
+ "loss": 2.6938,
39
+ "step": 2500
40
+ },
41
+ {
42
+ "epoch": 0.13,
43
+ "learning_rate": 0.0003950169680828214,
44
+ "loss": 2.6735,
45
+ "step": 3000
46
+ },
47
+ {
48
+ "epoch": 0.15,
49
+ "learning_rate": 0.00039415782464882515,
50
+ "loss": 2.6691,
51
+ "step": 3500
52
+ },
53
+ {
54
+ "epoch": 0.17,
55
+ "learning_rate": 0.00039329868121482884,
56
+ "loss": 2.6607,
57
+ "step": 4000
58
+ },
59
+ {
60
+ "epoch": 0.17,
61
+ "eval_loss": 3.4210398197174072,
62
+ "eval_runtime": 69.8141,
63
+ "eval_samples_per_second": 35.809,
64
+ "eval_steps_per_second": 8.952,
65
+ "step": 4000
66
+ },
67
+ {
68
+ "epoch": 0.19,
69
+ "learning_rate": 0.0003924395377808325,
70
+ "loss": 2.6441,
71
+ "step": 4500
72
+ },
73
+ {
74
+ "epoch": 0.21,
75
+ "learning_rate": 0.0003915803943468362,
76
+ "loss": 2.6413,
77
+ "step": 5000
78
+ },
79
+ {
80
+ "epoch": 0.24,
81
+ "learning_rate": 0.0003907212509128399,
82
+ "loss": 2.6438,
83
+ "step": 5500
84
+ },
85
+ {
86
+ "epoch": 0.26,
87
+ "learning_rate": 0.0003898621074788436,
88
+ "loss": 2.64,
89
+ "step": 6000
90
+ },
91
+ {
92
+ "epoch": 0.28,
93
+ "learning_rate": 0.00038900296404484726,
94
+ "loss": 2.6249,
95
+ "step": 6500
96
+ },
97
+ {
98
+ "epoch": 0.3,
99
+ "learning_rate": 0.000388143820610851,
100
+ "loss": 2.6251,
101
+ "step": 7000
102
+ },
103
+ {
104
+ "epoch": 0.32,
105
+ "learning_rate": 0.0003872846771768547,
106
+ "loss": 2.6362,
107
+ "step": 7500
108
+ },
109
+ {
110
+ "epoch": 0.34,
111
+ "learning_rate": 0.00038642553374285837,
112
+ "loss": 2.6154,
113
+ "step": 8000
114
+ },
115
+ {
116
+ "epoch": 0.34,
117
+ "eval_loss": 3.375439167022705,
118
+ "eval_runtime": 69.7038,
119
+ "eval_samples_per_second": 35.866,
120
+ "eval_steps_per_second": 8.967,
121
+ "step": 8000
122
+ },
123
+ {
124
+ "epoch": 0.36,
125
+ "learning_rate": 0.00038556639030886205,
126
+ "loss": 2.6244,
127
+ "step": 8500
128
+ },
129
+ {
130
+ "epoch": 0.39,
131
+ "learning_rate": 0.0003847072468748658,
132
+ "loss": 2.605,
133
+ "step": 9000
134
+ },
135
+ {
136
+ "epoch": 0.41,
137
+ "learning_rate": 0.0003838481034408695,
138
+ "loss": 2.607,
139
+ "step": 9500
140
+ },
141
+ {
142
+ "epoch": 0.43,
143
+ "learning_rate": 0.00038298896000687316,
144
+ "loss": 2.5968,
145
+ "step": 10000
146
+ },
147
+ {
148
+ "epoch": 0.45,
149
+ "learning_rate": 0.0003821298165728769,
150
+ "loss": 2.6011,
151
+ "step": 10500
152
+ },
153
+ {
154
+ "epoch": 0.47,
155
+ "learning_rate": 0.0003812706731388806,
156
+ "loss": 2.6044,
157
+ "step": 11000
158
+ },
159
+ {
160
+ "epoch": 0.49,
161
+ "learning_rate": 0.00038041152970488427,
162
+ "loss": 2.6152,
163
+ "step": 11500
164
+ },
165
+ {
166
+ "epoch": 0.52,
167
+ "learning_rate": 0.00037955238627088796,
168
+ "loss": 2.5931,
169
+ "step": 12000
170
+ },
171
+ {
172
+ "epoch": 0.52,
173
+ "eval_loss": 3.3464293479919434,
174
+ "eval_runtime": 70.0461,
175
+ "eval_samples_per_second": 35.691,
176
+ "eval_steps_per_second": 8.923,
177
+ "step": 12000
178
+ },
179
+ {
180
+ "epoch": 0.54,
181
+ "learning_rate": 0.00037869324283689164,
182
+ "loss": 2.6057,
183
+ "step": 12500
184
+ },
185
+ {
186
+ "epoch": 0.56,
187
+ "learning_rate": 0.0003778340994028953,
188
+ "loss": 2.5971,
189
+ "step": 13000
190
+ },
191
+ {
192
+ "epoch": 0.58,
193
+ "learning_rate": 0.00037697495596889907,
194
+ "loss": 2.5972,
195
+ "step": 13500
196
+ },
197
+ {
198
+ "epoch": 0.6,
199
+ "learning_rate": 0.00037611581253490275,
200
+ "loss": 2.5886,
201
+ "step": 14000
202
+ },
203
+ {
204
+ "epoch": 0.62,
205
+ "learning_rate": 0.00037525666910090644,
206
+ "loss": 2.6032,
207
+ "step": 14500
208
+ },
209
+ {
210
+ "epoch": 0.64,
211
+ "learning_rate": 0.0003743975256669101,
212
+ "loss": 2.5948,
213
+ "step": 15000
214
+ },
215
+ {
216
+ "epoch": 0.67,
217
+ "learning_rate": 0.0003735383822329138,
218
+ "loss": 2.5832,
219
+ "step": 15500
220
+ },
221
+ {
222
+ "epoch": 0.69,
223
+ "learning_rate": 0.0003726792387989175,
224
+ "loss": 2.5915,
225
+ "step": 16000
226
+ },
227
+ {
228
+ "epoch": 0.69,
229
+ "eval_loss": 3.3282320499420166,
230
+ "eval_runtime": 69.8767,
231
+ "eval_samples_per_second": 35.777,
232
+ "eval_steps_per_second": 8.944,
233
+ "step": 16000
234
+ },
235
+ {
236
+ "epoch": 0.71,
237
+ "learning_rate": 0.0003718200953649212,
238
+ "loss": 2.5876,
239
+ "step": 16500
240
+ },
241
+ {
242
+ "epoch": 0.73,
243
+ "learning_rate": 0.0003709609519309249,
244
+ "loss": 2.5797,
245
+ "step": 17000
246
+ },
247
+ {
248
+ "epoch": 0.75,
249
+ "learning_rate": 0.0003701018084969286,
250
+ "loss": 2.5902,
251
+ "step": 17500
252
+ },
253
+ {
254
+ "epoch": 0.77,
255
+ "learning_rate": 0.0003692426650629323,
256
+ "loss": 2.5837,
257
+ "step": 18000
258
+ },
259
+ {
260
+ "epoch": 0.79,
261
+ "learning_rate": 0.00036838352162893597,
262
+ "loss": 2.5814,
263
+ "step": 18500
264
+ },
265
+ {
266
+ "epoch": 0.82,
267
+ "learning_rate": 0.00036752437819493965,
268
+ "loss": 2.5798,
269
+ "step": 19000
270
+ },
271
+ {
272
+ "epoch": 0.84,
273
+ "learning_rate": 0.00036666523476094334,
274
+ "loss": 2.5868,
275
+ "step": 19500
276
+ },
277
+ {
278
+ "epoch": 0.86,
279
+ "learning_rate": 0.000365806091326947,
280
+ "loss": 2.5812,
281
+ "step": 20000
282
+ },
283
+ {
284
+ "epoch": 0.86,
285
+ "eval_loss": 3.3161280155181885,
286
+ "eval_runtime": 69.8895,
287
+ "eval_samples_per_second": 35.771,
288
+ "eval_steps_per_second": 8.943,
289
+ "step": 20000
290
+ },
291
+ {
292
+ "epoch": 0.88,
293
+ "learning_rate": 0.00036494694789295076,
294
+ "loss": 2.5681,
295
+ "step": 20500
296
+ },
297
+ {
298
+ "epoch": 0.9,
299
+ "learning_rate": 0.00036408780445895445,
300
+ "loss": 2.583,
301
+ "step": 21000
302
+ },
303
+ {
304
+ "epoch": 0.92,
305
+ "learning_rate": 0.00036322866102495813,
306
+ "loss": 2.5875,
307
+ "step": 21500
308
+ },
309
+ {
310
+ "epoch": 0.94,
311
+ "learning_rate": 0.0003623695175909618,
312
+ "loss": 2.5692,
313
+ "step": 22000
314
+ },
315
+ {
316
+ "epoch": 0.97,
317
+ "learning_rate": 0.0003615103741569655,
318
+ "loss": 2.5728,
319
+ "step": 22500
320
+ },
321
+ {
322
+ "epoch": 0.99,
323
+ "learning_rate": 0.0003606512307229692,
324
+ "loss": 2.58,
325
+ "step": 23000
326
+ },
327
+ {
328
+ "epoch": 1.01,
329
+ "learning_rate": 0.00035979208728897287,
330
+ "loss": 2.5674,
331
+ "step": 23500
332
+ },
333
+ {
334
+ "epoch": 1.03,
335
+ "learning_rate": 0.0003589329438549766,
336
+ "loss": 2.5685,
337
+ "step": 24000
338
+ },
339
+ {
340
+ "epoch": 1.03,
341
+ "eval_loss": 3.305285930633545,
342
+ "eval_runtime": 70.0352,
343
+ "eval_samples_per_second": 35.696,
344
+ "eval_steps_per_second": 8.924,
345
+ "step": 24000
346
+ },
347
+ {
348
+ "epoch": 1.05,
349
+ "learning_rate": 0.0003580738004209803,
350
+ "loss": 2.5612,
351
+ "step": 24500
352
+ },
353
+ {
354
+ "epoch": 1.07,
355
+ "learning_rate": 0.000357214656986984,
356
+ "loss": 2.5571,
357
+ "step": 25000
358
+ },
359
+ {
360
+ "epoch": 1.09,
361
+ "learning_rate": 0.00035635551355298766,
362
+ "loss": 2.5599,
363
+ "step": 25500
364
+ },
365
+ {
366
+ "epoch": 1.12,
367
+ "learning_rate": 0.00035549637011899135,
368
+ "loss": 2.5691,
369
+ "step": 26000
370
+ },
371
+ {
372
+ "epoch": 1.14,
373
+ "learning_rate": 0.00035463722668499503,
374
+ "loss": 2.5652,
375
+ "step": 26500
376
+ },
377
+ {
378
+ "epoch": 1.16,
379
+ "learning_rate": 0.0003537780832509988,
380
+ "loss": 2.5615,
381
+ "step": 27000
382
+ },
383
+ {
384
+ "epoch": 1.18,
385
+ "learning_rate": 0.00035291893981700246,
386
+ "loss": 2.5624,
387
+ "step": 27500
388
+ },
389
+ {
390
+ "epoch": 1.2,
391
+ "learning_rate": 0.00035205979638300614,
392
+ "loss": 2.5551,
393
+ "step": 28000
394
+ },
395
+ {
396
+ "epoch": 1.2,
397
+ "eval_loss": 3.3031015396118164,
398
+ "eval_runtime": 70.3897,
399
+ "eval_samples_per_second": 35.517,
400
+ "eval_steps_per_second": 8.879,
401
+ "step": 28000
402
+ },
403
+ {
404
+ "epoch": 1.22,
405
+ "learning_rate": 0.0003512006529490099,
406
+ "loss": 2.5608,
407
+ "step": 28500
408
+ },
409
+ {
410
+ "epoch": 1.25,
411
+ "learning_rate": 0.00035034150951501357,
412
+ "loss": 2.5662,
413
+ "step": 29000
414
+ },
415
+ {
416
+ "epoch": 1.27,
417
+ "learning_rate": 0.00034948236608101725,
418
+ "loss": 2.5642,
419
+ "step": 29500
420
+ },
421
+ {
422
+ "epoch": 1.29,
423
+ "learning_rate": 0.00034862322264702094,
424
+ "loss": 2.5727,
425
+ "step": 30000
426
+ },
427
+ {
428
+ "epoch": 1.31,
429
+ "learning_rate": 0.0003477640792130247,
430
+ "loss": 2.5536,
431
+ "step": 30500
432
+ },
433
+ {
434
+ "epoch": 1.33,
435
+ "learning_rate": 0.00034690493577902836,
436
+ "loss": 2.5683,
437
+ "step": 31000
438
+ },
439
+ {
440
+ "epoch": 1.35,
441
+ "learning_rate": 0.00034604579234503205,
442
+ "loss": 2.5515,
443
+ "step": 31500
444
+ },
445
+ {
446
+ "epoch": 1.37,
447
+ "learning_rate": 0.00034518664891103573,
448
+ "loss": 2.5568,
449
+ "step": 32000
450
+ },
451
+ {
452
+ "epoch": 1.37,
453
+ "eval_loss": 3.2884230613708496,
454
+ "eval_runtime": 70.9923,
455
+ "eval_samples_per_second": 35.215,
456
+ "eval_steps_per_second": 8.804,
457
+ "step": 32000
458
+ },
459
+ {
460
+ "epoch": 1.4,
461
+ "learning_rate": 0.0003443275054770394,
462
+ "loss": 2.5517,
463
+ "step": 32500
464
+ },
465
+ {
466
+ "epoch": 1.42,
467
+ "learning_rate": 0.0003434683620430431,
468
+ "loss": 2.5621,
469
+ "step": 33000
470
+ },
471
+ {
472
+ "epoch": 1.44,
473
+ "learning_rate": 0.0003426092186090468,
474
+ "loss": 2.5494,
475
+ "step": 33500
476
+ },
477
+ {
478
+ "epoch": 1.46,
479
+ "learning_rate": 0.0003417500751750505,
480
+ "loss": 2.55,
481
+ "step": 34000
482
+ },
483
+ {
484
+ "epoch": 1.48,
485
+ "learning_rate": 0.0003408909317410542,
486
+ "loss": 2.5584,
487
+ "step": 34500
488
+ },
489
+ {
490
+ "epoch": 1.5,
491
+ "learning_rate": 0.0003400317883070579,
492
+ "loss": 2.5564,
493
+ "step": 35000
494
+ },
495
+ {
496
+ "epoch": 1.52,
497
+ "learning_rate": 0.0003391726448730616,
498
+ "loss": 2.5529,
499
+ "step": 35500
500
+ },
501
+ {
502
+ "epoch": 1.55,
503
+ "learning_rate": 0.00033831350143906526,
504
+ "loss": 2.5552,
505
+ "step": 36000
506
+ },
507
+ {
508
+ "epoch": 1.55,
509
+ "eval_loss": 3.2850100994110107,
510
+ "eval_runtime": 69.9369,
511
+ "eval_samples_per_second": 35.747,
512
+ "eval_steps_per_second": 8.937,
513
+ "step": 36000
514
+ },
515
+ {
516
+ "epoch": 1.57,
517
+ "learning_rate": 0.00033745435800506895,
518
+ "loss": 2.5711,
519
+ "step": 36500
520
+ },
521
+ {
522
+ "epoch": 1.59,
523
+ "learning_rate": 0.00033659521457107263,
524
+ "loss": 2.5477,
525
+ "step": 37000
526
+ },
527
+ {
528
+ "epoch": 1.61,
529
+ "learning_rate": 0.00033573607113707637,
530
+ "loss": 2.5592,
531
+ "step": 37500
532
+ },
533
+ {
534
+ "epoch": 1.63,
535
+ "learning_rate": 0.00033487692770308006,
536
+ "loss": 2.5594,
537
+ "step": 38000
538
+ },
539
+ {
540
+ "epoch": 1.65,
541
+ "learning_rate": 0.00033401778426908374,
542
+ "loss": 2.5456,
543
+ "step": 38500
544
+ },
545
+ {
546
+ "epoch": 1.67,
547
+ "learning_rate": 0.0003331586408350874,
548
+ "loss": 2.5583,
549
+ "step": 39000
550
+ },
551
+ {
552
+ "epoch": 1.7,
553
+ "learning_rate": 0.0003322994974010911,
554
+ "loss": 2.5503,
555
+ "step": 39500
556
+ },
557
+ {
558
+ "epoch": 1.72,
559
+ "learning_rate": 0.0003314403539670948,
560
+ "loss": 2.5502,
561
+ "step": 40000
562
+ },
563
+ {
564
+ "epoch": 1.72,
565
+ "eval_loss": 3.289710521697998,
566
+ "eval_runtime": 69.6403,
567
+ "eval_samples_per_second": 35.899,
568
+ "eval_steps_per_second": 8.975,
569
+ "step": 40000
570
+ },
571
+ {
572
+ "epoch": 1.74,
573
+ "learning_rate": 0.0003305812105330985,
574
+ "loss": 2.5422,
575
+ "step": 40500
576
+ },
577
+ {
578
+ "epoch": 1.76,
579
+ "learning_rate": 0.0003297220670991022,
580
+ "loss": 2.5566,
581
+ "step": 41000
582
+ },
583
+ {
584
+ "epoch": 1.78,
585
+ "learning_rate": 0.0003288629236651059,
586
+ "loss": 2.5492,
587
+ "step": 41500
588
+ },
589
+ {
590
+ "epoch": 1.8,
591
+ "learning_rate": 0.0003280037802311096,
592
+ "loss": 2.5482,
593
+ "step": 42000
594
+ },
595
+ {
596
+ "epoch": 1.82,
597
+ "learning_rate": 0.0003271446367971133,
598
+ "loss": 2.5538,
599
+ "step": 42500
600
+ },
601
+ {
602
+ "epoch": 1.85,
603
+ "learning_rate": 0.00032628549336311696,
604
+ "loss": 2.5482,
605
+ "step": 43000
606
+ },
607
+ {
608
+ "epoch": 1.87,
609
+ "learning_rate": 0.00032542634992912064,
610
+ "loss": 2.5499,
611
+ "step": 43500
612
+ },
613
+ {
614
+ "epoch": 1.89,
615
+ "learning_rate": 0.0003245672064951244,
616
+ "loss": 2.5553,
617
+ "step": 44000
618
+ },
619
+ {
620
+ "epoch": 1.89,
621
+ "eval_loss": 3.2857840061187744,
622
+ "eval_runtime": 69.6297,
623
+ "eval_samples_per_second": 35.904,
624
+ "eval_steps_per_second": 8.976,
625
+ "step": 44000
626
+ },
627
+ {
628
+ "epoch": 1.91,
629
+ "learning_rate": 0.00032370806306112807,
630
+ "loss": 2.5506,
631
+ "step": 44500
632
+ },
633
+ {
634
+ "epoch": 1.93,
635
+ "learning_rate": 0.00032284891962713175,
636
+ "loss": 2.5452,
637
+ "step": 45000
638
+ },
639
+ {
640
+ "epoch": 1.95,
641
+ "learning_rate": 0.00032198977619313544,
642
+ "loss": 2.5505,
643
+ "step": 45500
644
+ },
645
+ {
646
+ "epoch": 1.98,
647
+ "learning_rate": 0.0003211306327591392,
648
+ "loss": 2.5403,
649
+ "step": 46000
650
+ },
651
+ {
652
+ "epoch": 2.0,
653
+ "learning_rate": 0.00032027148932514286,
654
+ "loss": 2.5493,
655
+ "step": 46500
656
+ },
657
+ {
658
+ "epoch": 2.02,
659
+ "learning_rate": 0.00031941234589114655,
660
+ "loss": 2.5413,
661
+ "step": 47000
662
+ },
663
+ {
664
+ "epoch": 2.04,
665
+ "learning_rate": 0.00031855320245715023,
666
+ "loss": 2.538,
667
+ "step": 47500
668
+ },
669
+ {
670
+ "epoch": 2.06,
671
+ "learning_rate": 0.00031769405902315397,
672
+ "loss": 2.5408,
673
+ "step": 48000
674
+ },
675
+ {
676
+ "epoch": 2.06,
677
+ "eval_loss": 3.2898008823394775,
678
+ "eval_runtime": 68.7611,
679
+ "eval_samples_per_second": 36.358,
680
+ "eval_steps_per_second": 9.089,
681
+ "step": 48000
682
+ },
683
+ {
684
+ "epoch": 2.08,
685
+ "learning_rate": 0.00031683491558915765,
686
+ "loss": 2.5362,
687
+ "step": 48500
688
+ },
689
+ {
690
+ "epoch": 2.1,
691
+ "learning_rate": 0.00031597577215516134,
692
+ "loss": 2.5368,
693
+ "step": 49000
694
+ },
695
+ {
696
+ "epoch": 2.13,
697
+ "learning_rate": 0.000315116628721165,
698
+ "loss": 2.5366,
699
+ "step": 49500
700
+ },
701
+ {
702
+ "epoch": 2.15,
703
+ "learning_rate": 0.0003142574852871687,
704
+ "loss": 2.5375,
705
+ "step": 50000
706
+ },
707
+ {
708
+ "epoch": 2.17,
709
+ "learning_rate": 0.0003133983418531724,
710
+ "loss": 2.5455,
711
+ "step": 50500
712
+ },
713
+ {
714
+ "epoch": 2.19,
715
+ "learning_rate": 0.00031253919841917613,
716
+ "loss": 2.5465,
717
+ "step": 51000
718
+ },
719
+ {
720
+ "epoch": 2.21,
721
+ "learning_rate": 0.0003116800549851798,
722
+ "loss": 2.5352,
723
+ "step": 51500
724
+ },
725
+ {
726
+ "epoch": 2.23,
727
+ "learning_rate": 0.0003108209115511835,
728
+ "loss": 2.5377,
729
+ "step": 52000
730
+ },
731
+ {
732
+ "epoch": 2.23,
733
+ "eval_loss": 3.282806158065796,
734
+ "eval_runtime": 69.9972,
735
+ "eval_samples_per_second": 35.716,
736
+ "eval_steps_per_second": 8.929,
737
+ "step": 52000
738
+ },
739
+ {
740
+ "epoch": 2.25,
741
+ "learning_rate": 0.0003099617681171872,
742
+ "loss": 2.5459,
743
+ "step": 52500
744
+ },
745
+ {
746
+ "epoch": 2.28,
747
+ "learning_rate": 0.00030910262468319087,
748
+ "loss": 2.5332,
749
+ "step": 53000
750
+ },
751
+ {
752
+ "epoch": 2.3,
753
+ "learning_rate": 0.00030824348124919456,
754
+ "loss": 2.5324,
755
+ "step": 53500
756
+ },
757
+ {
758
+ "epoch": 2.32,
759
+ "learning_rate": 0.00030738433781519824,
760
+ "loss": 2.5464,
761
+ "step": 54000
762
+ },
763
+ {
764
+ "epoch": 2.34,
765
+ "learning_rate": 0.000306525194381202,
766
+ "loss": 2.538,
767
+ "step": 54500
768
+ },
769
+ {
770
+ "epoch": 2.36,
771
+ "learning_rate": 0.00030566605094720567,
772
+ "loss": 2.5338,
773
+ "step": 55000
774
+ },
775
+ {
776
+ "epoch": 2.38,
777
+ "learning_rate": 0.00030480690751320935,
778
+ "loss": 2.5329,
779
+ "step": 55500
780
+ },
781
+ {
782
+ "epoch": 2.4,
783
+ "learning_rate": 0.00030394776407921304,
784
+ "loss": 2.5418,
785
+ "step": 56000
786
+ },
787
+ {
788
+ "epoch": 2.4,
789
+ "eval_loss": 3.2688469886779785,
790
+ "eval_runtime": 69.5754,
791
+ "eval_samples_per_second": 35.932,
792
+ "eval_steps_per_second": 8.983,
793
+ "step": 56000
794
+ },
795
+ {
796
+ "epoch": 2.43,
797
+ "learning_rate": 0.0003030886206452167,
798
+ "loss": 2.5293,
799
+ "step": 56500
800
+ },
801
+ {
802
+ "epoch": 2.45,
803
+ "learning_rate": 0.0003022294772112204,
804
+ "loss": 2.53,
805
+ "step": 57000
806
+ },
807
+ {
808
+ "epoch": 2.47,
809
+ "learning_rate": 0.0003013703337772241,
810
+ "loss": 2.5316,
811
+ "step": 57500
812
+ },
813
+ {
814
+ "epoch": 2.49,
815
+ "learning_rate": 0.00030051119034322783,
816
+ "loss": 2.5439,
817
+ "step": 58000
818
+ },
819
+ {
820
+ "epoch": 2.51,
821
+ "learning_rate": 0.0002996520469092315,
822
+ "loss": 2.5343,
823
+ "step": 58500
824
+ },
825
+ {
826
+ "epoch": 2.53,
827
+ "learning_rate": 0.0002987929034752352,
828
+ "loss": 2.5396,
829
+ "step": 59000
830
+ },
831
+ {
832
+ "epoch": 2.55,
833
+ "learning_rate": 0.0002979337600412389,
834
+ "loss": 2.5536,
835
+ "step": 59500
836
+ },
837
+ {
838
+ "epoch": 2.58,
839
+ "learning_rate": 0.00029707461660724257,
840
+ "loss": 2.5441,
841
+ "step": 60000
842
+ },
843
+ {
844
+ "epoch": 2.58,
845
+ "eval_loss": 3.2687106132507324,
846
+ "eval_runtime": 68.9594,
847
+ "eval_samples_per_second": 36.253,
848
+ "eval_steps_per_second": 9.063,
849
+ "step": 60000
850
+ },
851
+ {
852
+ "epoch": 2.6,
853
+ "learning_rate": 0.00029621547317324625,
854
+ "loss": 2.5359,
855
+ "step": 60500
856
+ },
857
+ {
858
+ "epoch": 2.62,
859
+ "learning_rate": 0.00029535632973925,
860
+ "loss": 2.5327,
861
+ "step": 61000
862
+ },
863
+ {
864
+ "epoch": 2.64,
865
+ "learning_rate": 0.0002944971863052537,
866
+ "loss": 2.5418,
867
+ "step": 61500
868
+ },
869
+ {
870
+ "epoch": 2.66,
871
+ "learning_rate": 0.00029363804287125736,
872
+ "loss": 2.5382,
873
+ "step": 62000
874
+ },
875
+ {
876
+ "epoch": 2.68,
877
+ "learning_rate": 0.00029277889943726105,
878
+ "loss": 2.529,
879
+ "step": 62500
880
+ },
881
+ {
882
+ "epoch": 2.71,
883
+ "learning_rate": 0.00029191975600326473,
884
+ "loss": 2.5239,
885
+ "step": 63000
886
+ },
887
+ {
888
+ "epoch": 2.73,
889
+ "learning_rate": 0.0002910606125692684,
890
+ "loss": 2.538,
891
+ "step": 63500
892
+ },
893
+ {
894
+ "epoch": 2.75,
895
+ "learning_rate": 0.00029020146913527216,
896
+ "loss": 2.5468,
897
+ "step": 64000
898
+ },
899
+ {
900
+ "epoch": 2.75,
901
+ "eval_loss": 3.265432357788086,
902
+ "eval_runtime": 70.53,
903
+ "eval_samples_per_second": 35.446,
904
+ "eval_steps_per_second": 8.861,
905
+ "step": 64000
906
+ },
907
+ {
908
+ "epoch": 2.77,
909
+ "learning_rate": 0.00028934232570127584,
910
+ "loss": 2.5357,
911
+ "step": 64500
912
+ },
913
+ {
914
+ "epoch": 2.79,
915
+ "learning_rate": 0.0002884831822672795,
916
+ "loss": 2.5266,
917
+ "step": 65000
918
+ },
919
+ {
920
+ "epoch": 2.81,
921
+ "learning_rate": 0.00028762403883328326,
922
+ "loss": 2.5329,
923
+ "step": 65500
924
+ },
925
+ {
926
+ "epoch": 2.83,
927
+ "learning_rate": 0.00028676489539928695,
928
+ "loss": 2.5356,
929
+ "step": 66000
930
+ },
931
+ {
932
+ "epoch": 2.86,
933
+ "learning_rate": 0.00028590575196529063,
934
+ "loss": 2.5332,
935
+ "step": 66500
936
+ },
937
+ {
938
+ "epoch": 2.88,
939
+ "learning_rate": 0.0002850466085312943,
940
+ "loss": 2.531,
941
+ "step": 67000
942
+ },
943
+ {
944
+ "epoch": 2.9,
945
+ "learning_rate": 0.000284187465097298,
946
+ "loss": 2.5347,
947
+ "step": 67500
948
+ },
949
+ {
950
+ "epoch": 2.92,
951
+ "learning_rate": 0.00028332832166330174,
952
+ "loss": 2.545,
953
+ "step": 68000
954
+ },
955
+ {
956
+ "epoch": 2.92,
957
+ "eval_loss": 3.2531003952026367,
958
+ "eval_runtime": 69.6224,
959
+ "eval_samples_per_second": 35.908,
960
+ "eval_steps_per_second": 8.977,
961
+ "step": 68000
962
+ },
963
+ {
964
+ "epoch": 2.94,
965
+ "learning_rate": 0.00028246917822930543,
966
+ "loss": 2.5359,
967
+ "step": 68500
968
+ },
969
+ {
970
+ "epoch": 2.96,
971
+ "learning_rate": 0.0002816100347953091,
972
+ "loss": 2.5332,
973
+ "step": 69000
974
+ },
975
+ {
976
+ "epoch": 2.98,
977
+ "learning_rate": 0.0002807508913613128,
978
+ "loss": 2.5355,
979
+ "step": 69500
980
+ },
981
+ {
982
+ "epoch": 3.01,
983
+ "learning_rate": 0.0002798917479273165,
984
+ "loss": 2.5359,
985
+ "step": 70000
986
+ },
987
+ {
988
+ "epoch": 3.03,
989
+ "learning_rate": 0.00027903260449332017,
990
+ "loss": 2.514,
991
+ "step": 70500
992
+ },
993
+ {
994
+ "epoch": 3.05,
995
+ "learning_rate": 0.00027817346105932385,
996
+ "loss": 2.5217,
997
+ "step": 71000
998
+ },
999
+ {
1000
+ "epoch": 3.07,
1001
+ "learning_rate": 0.0002773143176253276,
1002
+ "loss": 2.5188,
1003
+ "step": 71500
1004
+ },
1005
+ {
1006
+ "epoch": 3.09,
1007
+ "learning_rate": 0.0002764551741913313,
1008
+ "loss": 2.5232,
1009
+ "step": 72000
1010
+ },
1011
+ {
1012
+ "epoch": 3.09,
1013
+ "eval_loss": 3.2648589611053467,
1014
+ "eval_runtime": 69.9722,
1015
+ "eval_samples_per_second": 35.728,
1016
+ "eval_steps_per_second": 8.932,
1017
+ "step": 72000
1018
+ },
1019
+ {
1020
+ "epoch": 3.11,
1021
+ "learning_rate": 0.00027559603075733496,
1022
+ "loss": 2.51,
1023
+ "step": 72500
1024
+ },
1025
+ {
1026
+ "epoch": 3.13,
1027
+ "learning_rate": 0.00027473688732333865,
1028
+ "loss": 2.5234,
1029
+ "step": 73000
1030
+ },
1031
+ {
1032
+ "epoch": 3.16,
1033
+ "learning_rate": 0.00027387774388934233,
1034
+ "loss": 2.5139,
1035
+ "step": 73500
1036
+ },
1037
+ {
1038
+ "epoch": 3.18,
1039
+ "learning_rate": 0.000273018600455346,
1040
+ "loss": 2.5233,
1041
+ "step": 74000
1042
+ },
1043
+ {
1044
+ "epoch": 3.2,
1045
+ "learning_rate": 0.0002721594570213497,
1046
+ "loss": 2.5247,
1047
+ "step": 74500
1048
+ },
1049
+ {
1050
+ "epoch": 3.22,
1051
+ "learning_rate": 0.00027130031358735344,
1052
+ "loss": 2.5225,
1053
+ "step": 75000
1054
+ },
1055
+ {
1056
+ "epoch": 3.24,
1057
+ "learning_rate": 0.0002704411701533571,
1058
+ "loss": 2.528,
1059
+ "step": 75500
1060
+ },
1061
+ {
1062
+ "epoch": 3.26,
1063
+ "learning_rate": 0.0002695820267193608,
1064
+ "loss": 2.5187,
1065
+ "step": 76000
1066
+ },
1067
+ {
1068
+ "epoch": 3.26,
1069
+ "eval_loss": 3.2556352615356445,
1070
+ "eval_runtime": 70.0453,
1071
+ "eval_samples_per_second": 35.691,
1072
+ "eval_steps_per_second": 8.923,
1073
+ "step": 76000
1074
+ },
1075
+ {
1076
+ "epoch": 3.28,
1077
+ "learning_rate": 0.0002687228832853645,
1078
+ "loss": 2.5246,
1079
+ "step": 76500
1080
+ },
1081
+ {
1082
+ "epoch": 3.31,
1083
+ "learning_rate": 0.0002678637398513682,
1084
+ "loss": 2.521,
1085
+ "step": 77000
1086
+ },
1087
+ {
1088
+ "epoch": 3.33,
1089
+ "learning_rate": 0.00026700459641737186,
1090
+ "loss": 2.5226,
1091
+ "step": 77500
1092
+ },
1093
+ {
1094
+ "epoch": 3.35,
1095
+ "learning_rate": 0.0002661454529833756,
1096
+ "loss": 2.5239,
1097
+ "step": 78000
1098
+ },
1099
+ {
1100
+ "epoch": 3.37,
1101
+ "learning_rate": 0.0002652863095493793,
1102
+ "loss": 2.5243,
1103
+ "step": 78500
1104
+ },
1105
+ {
1106
+ "epoch": 3.39,
1107
+ "learning_rate": 0.00026442716611538297,
1108
+ "loss": 2.5316,
1109
+ "step": 79000
1110
+ },
1111
+ {
1112
+ "epoch": 3.41,
1113
+ "learning_rate": 0.00026356802268138666,
1114
+ "loss": 2.531,
1115
+ "step": 79500
1116
+ },
1117
+ {
1118
+ "epoch": 3.44,
1119
+ "learning_rate": 0.00026270887924739034,
1120
+ "loss": 2.5125,
1121
+ "step": 80000
1122
+ },
1123
+ {
1124
+ "epoch": 3.44,
1125
+ "eval_loss": 3.2571377754211426,
1126
+ "eval_runtime": 69.7633,
1127
+ "eval_samples_per_second": 35.835,
1128
+ "eval_steps_per_second": 8.959,
1129
+ "step": 80000
1130
+ },
1131
+ {
1132
+ "epoch": 3.46,
1133
+ "learning_rate": 0.000261849735813394,
1134
+ "loss": 2.5253,
1135
+ "step": 80500
1136
+ },
1137
+ {
1138
+ "epoch": 3.48,
1139
+ "learning_rate": 0.0002609905923793977,
1140
+ "loss": 2.5195,
1141
+ "step": 81000
1142
+ },
1143
+ {
1144
+ "epoch": 3.5,
1145
+ "learning_rate": 0.00026013144894540145,
1146
+ "loss": 2.5314,
1147
+ "step": 81500
1148
+ },
1149
+ {
1150
+ "epoch": 3.52,
1151
+ "learning_rate": 0.00025927230551140514,
1152
+ "loss": 2.5207,
1153
+ "step": 82000
1154
+ },
1155
+ {
1156
+ "epoch": 3.54,
1157
+ "learning_rate": 0.0002584131620774088,
1158
+ "loss": 2.5342,
1159
+ "step": 82500
1160
+ },
1161
+ {
1162
+ "epoch": 3.56,
1163
+ "learning_rate": 0.0002575540186434125,
1164
+ "loss": 2.5344,
1165
+ "step": 83000
1166
+ },
1167
+ {
1168
+ "epoch": 3.59,
1169
+ "learning_rate": 0.00025669487520941624,
1170
+ "loss": 2.5287,
1171
+ "step": 83500
1172
+ },
1173
+ {
1174
+ "epoch": 3.61,
1175
+ "learning_rate": 0.00025583573177541993,
1176
+ "loss": 2.5225,
1177
+ "step": 84000
1178
+ },
1179
+ {
1180
+ "epoch": 3.61,
1181
+ "eval_loss": 3.2613072395324707,
1182
+ "eval_runtime": 70.5525,
1183
+ "eval_samples_per_second": 35.435,
1184
+ "eval_steps_per_second": 8.859,
1185
+ "step": 84000
1186
+ },
1187
+ {
1188
+ "epoch": 3.63,
1189
+ "learning_rate": 0.0002549765883414236,
1190
+ "loss": 2.522,
1191
+ "step": 84500
1192
+ },
1193
+ {
1194
+ "epoch": 3.65,
1195
+ "learning_rate": 0.00025411744490742735,
1196
+ "loss": 2.5343,
1197
+ "step": 85000
1198
+ },
1199
+ {
1200
+ "epoch": 3.67,
1201
+ "learning_rate": 0.00025325830147343104,
1202
+ "loss": 2.5234,
1203
+ "step": 85500
1204
+ },
1205
+ {
1206
+ "epoch": 3.69,
1207
+ "learning_rate": 0.0002523991580394347,
1208
+ "loss": 2.5293,
1209
+ "step": 86000
1210
+ },
1211
+ {
1212
+ "epoch": 3.71,
1213
+ "learning_rate": 0.0002515400146054384,
1214
+ "loss": 2.527,
1215
+ "step": 86500
1216
+ },
1217
+ {
1218
+ "epoch": 3.74,
1219
+ "learning_rate": 0.0002506808711714421,
1220
+ "loss": 2.5245,
1221
+ "step": 87000
1222
+ },
1223
+ {
1224
+ "epoch": 3.76,
1225
+ "learning_rate": 0.0002498217277374458,
1226
+ "loss": 2.529,
1227
+ "step": 87500
1228
+ },
1229
+ {
1230
+ "epoch": 3.78,
1231
+ "learning_rate": 0.00024896258430344946,
1232
+ "loss": 2.5259,
1233
+ "step": 88000
1234
+ },
1235
+ {
1236
+ "epoch": 3.78,
1237
+ "eval_loss": 3.2457616329193115,
1238
+ "eval_runtime": 69.6751,
1239
+ "eval_samples_per_second": 35.881,
1240
+ "eval_steps_per_second": 8.97,
1241
+ "step": 88000
1242
+ },
1243
+ {
1244
+ "epoch": 3.8,
1245
+ "learning_rate": 0.0002481034408694532,
1246
+ "loss": 2.5205,
1247
+ "step": 88500
1248
+ },
1249
+ {
1250
+ "epoch": 3.82,
1251
+ "learning_rate": 0.0002472442974354569,
1252
+ "loss": 2.5183,
1253
+ "step": 89000
1254
+ },
1255
+ {
1256
+ "epoch": 3.84,
1257
+ "learning_rate": 0.00024638515400146057,
1258
+ "loss": 2.5257,
1259
+ "step": 89500
1260
+ },
1261
+ {
1262
+ "epoch": 3.86,
1263
+ "learning_rate": 0.00024552601056746426,
1264
+ "loss": 2.5272,
1265
+ "step": 90000
1266
+ },
1267
+ {
1268
+ "epoch": 3.89,
1269
+ "learning_rate": 0.00024466686713346794,
1270
+ "loss": 2.5228,
1271
+ "step": 90500
1272
+ },
1273
+ {
1274
+ "epoch": 3.91,
1275
+ "learning_rate": 0.00024380772369947163,
1276
+ "loss": 2.5306,
1277
+ "step": 91000
1278
+ },
1279
+ {
1280
+ "epoch": 3.93,
1281
+ "learning_rate": 0.0002429485802654753,
1282
+ "loss": 2.5149,
1283
+ "step": 91500
1284
+ },
1285
+ {
1286
+ "epoch": 3.95,
1287
+ "learning_rate": 0.00024208943683147905,
1288
+ "loss": 2.5303,
1289
+ "step": 92000
1290
+ },
1291
+ {
1292
+ "epoch": 3.95,
1293
+ "eval_loss": 3.2447612285614014,
1294
+ "eval_runtime": 78.9548,
1295
+ "eval_samples_per_second": 31.664,
1296
+ "eval_steps_per_second": 7.916,
1297
+ "step": 92000
1298
+ },
1299
+ {
1300
+ "epoch": 3.97,
1301
+ "learning_rate": 0.00024123029339748273,
1302
+ "loss": 2.5277,
1303
+ "step": 92500
1304
+ },
1305
+ {
1306
+ "epoch": 3.99,
1307
+ "learning_rate": 0.00024037114996348642,
1308
+ "loss": 2.5204,
1309
+ "step": 93000
1310
+ },
1311
+ {
1312
+ "epoch": 4.01,
1313
+ "learning_rate": 0.0002395120065294901,
1314
+ "loss": 2.5176,
1315
+ "step": 93500
1316
+ },
1317
+ {
1318
+ "epoch": 4.04,
1319
+ "learning_rate": 0.0002386528630954938,
1320
+ "loss": 2.5151,
1321
+ "step": 94000
1322
+ },
1323
+ {
1324
+ "epoch": 4.06,
1325
+ "learning_rate": 0.00023779371966149747,
1326
+ "loss": 2.5152,
1327
+ "step": 94500
1328
+ },
1329
+ {
1330
+ "epoch": 4.08,
1331
+ "learning_rate": 0.0002369345762275012,
1332
+ "loss": 2.5124,
1333
+ "step": 95000
1334
+ },
1335
+ {
1336
+ "epoch": 4.1,
1337
+ "learning_rate": 0.0002360754327935049,
1338
+ "loss": 2.5208,
1339
+ "step": 95500
1340
+ },
1341
+ {
1342
+ "epoch": 4.12,
1343
+ "learning_rate": 0.00023521628935950858,
1344
+ "loss": 2.5129,
1345
+ "step": 96000
1346
+ },
1347
+ {
1348
+ "epoch": 4.12,
1349
+ "eval_loss": 3.2467501163482666,
1350
+ "eval_runtime": 69.6325,
1351
+ "eval_samples_per_second": 35.903,
1352
+ "eval_steps_per_second": 8.976,
1353
+ "step": 96000
1354
+ },
1355
+ {
1356
+ "epoch": 4.14,
1357
+ "learning_rate": 0.00023435714592551227,
1358
+ "loss": 2.5189,
1359
+ "step": 96500
1360
+ },
1361
+ {
1362
+ "epoch": 4.17,
1363
+ "learning_rate": 0.00023349800249151598,
1364
+ "loss": 2.5064,
1365
+ "step": 97000
1366
+ },
1367
+ {
1368
+ "epoch": 4.19,
1369
+ "learning_rate": 0.00023263885905751966,
1370
+ "loss": 2.5088,
1371
+ "step": 97500
1372
+ },
1373
+ {
1374
+ "epoch": 4.21,
1375
+ "learning_rate": 0.00023177971562352335,
1376
+ "loss": 2.5204,
1377
+ "step": 98000
1378
+ },
1379
+ {
1380
+ "epoch": 4.23,
1381
+ "learning_rate": 0.00023092057218952706,
1382
+ "loss": 2.523,
1383
+ "step": 98500
1384
+ },
1385
+ {
1386
+ "epoch": 4.25,
1387
+ "learning_rate": 0.00023006142875553077,
1388
+ "loss": 2.5147,
1389
+ "step": 99000
1390
+ },
1391
+ {
1392
+ "epoch": 4.27,
1393
+ "learning_rate": 0.00022920228532153446,
1394
+ "loss": 2.5034,
1395
+ "step": 99500
1396
+ },
1397
+ {
1398
+ "epoch": 4.29,
1399
+ "learning_rate": 0.00022834314188753814,
1400
+ "loss": 2.5081,
1401
+ "step": 100000
1402
+ },
1403
+ {
1404
+ "epoch": 4.29,
1405
+ "eval_loss": 3.2460429668426514,
1406
+ "eval_runtime": 69.2535,
1407
+ "eval_samples_per_second": 36.099,
1408
+ "eval_steps_per_second": 9.025,
1409
+ "step": 100000
1410
+ },
1411
+ {
1412
+ "epoch": 4.32,
1413
+ "learning_rate": 0.00022748399845354183,
1414
+ "loss": 2.5043,
1415
+ "step": 100500
1416
+ },
1417
+ {
1418
+ "epoch": 4.34,
1419
+ "learning_rate": 0.0002266248550195455,
1420
+ "loss": 2.5175,
1421
+ "step": 101000
1422
+ },
1423
+ {
1424
+ "epoch": 4.36,
1425
+ "learning_rate": 0.0002257657115855492,
1426
+ "loss": 2.5049,
1427
+ "step": 101500
1428
+ },
1429
+ {
1430
+ "epoch": 4.38,
1431
+ "learning_rate": 0.00022490656815155294,
1432
+ "loss": 2.5042,
1433
+ "step": 102000
1434
+ },
1435
+ {
1436
+ "epoch": 4.4,
1437
+ "learning_rate": 0.00022404742471755662,
1438
+ "loss": 2.5209,
1439
+ "step": 102500
1440
+ },
1441
+ {
1442
+ "epoch": 4.42,
1443
+ "learning_rate": 0.0002231882812835603,
1444
+ "loss": 2.5175,
1445
+ "step": 103000
1446
+ },
1447
+ {
1448
+ "epoch": 4.44,
1449
+ "learning_rate": 0.000222329137849564,
1450
+ "loss": 2.5228,
1451
+ "step": 103500
1452
+ },
1453
+ {
1454
+ "epoch": 4.47,
1455
+ "learning_rate": 0.00022146999441556767,
1456
+ "loss": 2.5198,
1457
+ "step": 104000
1458
+ },
1459
+ {
1460
+ "epoch": 4.47,
1461
+ "eval_loss": 3.2494592666625977,
1462
+ "eval_runtime": 70.2669,
1463
+ "eval_samples_per_second": 35.579,
1464
+ "eval_steps_per_second": 8.895,
1465
+ "step": 104000
1466
+ },
1467
+ {
1468
+ "epoch": 4.49,
1469
+ "learning_rate": 0.00022061085098157136,
1470
+ "loss": 2.5168,
1471
+ "step": 104500
1472
+ },
1473
+ {
1474
+ "epoch": 4.51,
1475
+ "learning_rate": 0.00021975170754757507,
1476
+ "loss": 2.514,
1477
+ "step": 105000
1478
+ },
1479
+ {
1480
+ "epoch": 4.53,
1481
+ "learning_rate": 0.00021889256411357878,
1482
+ "loss": 2.5276,
1483
+ "step": 105500
1484
+ },
1485
+ {
1486
+ "epoch": 4.55,
1487
+ "learning_rate": 0.00021803342067958247,
1488
+ "loss": 2.5154,
1489
+ "step": 106000
1490
+ },
1491
+ {
1492
+ "epoch": 4.57,
1493
+ "learning_rate": 0.00021717427724558615,
1494
+ "loss": 2.5151,
1495
+ "step": 106500
1496
+ },
1497
+ {
1498
+ "epoch": 4.59,
1499
+ "learning_rate": 0.00021631513381158987,
1500
+ "loss": 2.5186,
1501
+ "step": 107000
1502
+ },
1503
+ {
1504
+ "epoch": 4.62,
1505
+ "learning_rate": 0.00021545599037759355,
1506
+ "loss": 2.5236,
1507
+ "step": 107500
1508
+ },
1509
+ {
1510
+ "epoch": 4.64,
1511
+ "learning_rate": 0.00021459684694359724,
1512
+ "loss": 2.5123,
1513
+ "step": 108000
1514
+ },
1515
+ {
1516
+ "epoch": 4.64,
1517
+ "eval_loss": 3.2353665828704834,
1518
+ "eval_runtime": 69.8092,
1519
+ "eval_samples_per_second": 35.812,
1520
+ "eval_steps_per_second": 8.953,
1521
+ "step": 108000
1522
+ },
1523
+ {
1524
+ "epoch": 4.66,
1525
+ "learning_rate": 0.00021373770350960092,
1526
+ "loss": 2.5078,
1527
+ "step": 108500
1528
+ },
1529
+ {
1530
+ "epoch": 4.68,
1531
+ "learning_rate": 0.00021287856007560466,
1532
+ "loss": 2.5114,
1533
+ "step": 109000
1534
+ },
1535
+ {
1536
+ "epoch": 4.7,
1537
+ "learning_rate": 0.00021201941664160834,
1538
+ "loss": 2.5164,
1539
+ "step": 109500
1540
+ },
1541
+ {
1542
+ "epoch": 4.72,
1543
+ "learning_rate": 0.00021116027320761203,
1544
+ "loss": 2.5136,
1545
+ "step": 110000
1546
+ },
1547
+ {
1548
+ "epoch": 4.74,
1549
+ "learning_rate": 0.0002103011297736157,
1550
+ "loss": 2.5181,
1551
+ "step": 110500
1552
+ },
1553
+ {
1554
+ "epoch": 4.77,
1555
+ "learning_rate": 0.0002094419863396194,
1556
+ "loss": 2.5091,
1557
+ "step": 111000
1558
+ },
1559
+ {
1560
+ "epoch": 4.79,
1561
+ "learning_rate": 0.00020858284290562308,
1562
+ "loss": 2.4991,
1563
+ "step": 111500
1564
+ },
1565
+ {
1566
+ "epoch": 4.81,
1567
+ "learning_rate": 0.00020772369947162677,
1568
+ "loss": 2.5035,
1569
+ "step": 112000
1570
+ },
1571
+ {
1572
+ "epoch": 4.81,
1573
+ "eval_loss": 3.242424726486206,
1574
+ "eval_runtime": 70.3737,
1575
+ "eval_samples_per_second": 35.525,
1576
+ "eval_steps_per_second": 8.881,
1577
+ "step": 112000
1578
+ },
1579
+ {
1580
+ "epoch": 4.83,
1581
+ "learning_rate": 0.0002068645560376305,
1582
+ "loss": 2.5244,
1583
+ "step": 112500
1584
+ },
1585
+ {
1586
+ "epoch": 4.85,
1587
+ "learning_rate": 0.0002060054126036342,
1588
+ "loss": 2.5139,
1589
+ "step": 113000
1590
+ },
1591
+ {
1592
+ "epoch": 4.87,
1593
+ "learning_rate": 0.00020514626916963788,
1594
+ "loss": 2.5144,
1595
+ "step": 113500
1596
+ },
1597
+ {
1598
+ "epoch": 4.89,
1599
+ "learning_rate": 0.00020428712573564156,
1600
+ "loss": 2.4944,
1601
+ "step": 114000
1602
+ },
1603
+ {
1604
+ "epoch": 4.92,
1605
+ "learning_rate": 0.00020342798230164525,
1606
+ "loss": 2.5014,
1607
+ "step": 114500
1608
+ },
1609
+ {
1610
+ "epoch": 4.94,
1611
+ "learning_rate": 0.00020256883886764896,
1612
+ "loss": 2.5067,
1613
+ "step": 115000
1614
+ },
1615
+ {
1616
+ "epoch": 4.96,
1617
+ "learning_rate": 0.00020170969543365267,
1618
+ "loss": 2.5061,
1619
+ "step": 115500
1620
+ },
1621
+ {
1622
+ "epoch": 4.98,
1623
+ "learning_rate": 0.00020085055199965636,
1624
+ "loss": 2.5177,
1625
+ "step": 116000
1626
+ },
1627
+ {
1628
+ "epoch": 4.98,
1629
+ "eval_loss": 3.2430241107940674,
1630
+ "eval_runtime": 69.1442,
1631
+ "eval_samples_per_second": 36.156,
1632
+ "eval_steps_per_second": 9.039,
1633
+ "step": 116000
1634
+ },
1635
+ {
1636
+ "epoch": 5.0,
1637
+ "learning_rate": 0.00019999140856566007,
1638
+ "loss": 2.5116,
1639
+ "step": 116500
1640
+ },
1641
+ {
1642
+ "epoch": 5.02,
1643
+ "learning_rate": 0.00019913226513166375,
1644
+ "loss": 2.501,
1645
+ "step": 117000
1646
+ },
1647
+ {
1648
+ "epoch": 5.05,
1649
+ "learning_rate": 0.00019827312169766744,
1650
+ "loss": 2.4926,
1651
+ "step": 117500
1652
+ },
1653
+ {
1654
+ "epoch": 5.07,
1655
+ "learning_rate": 0.00019741397826367115,
1656
+ "loss": 2.4918,
1657
+ "step": 118000
1658
+ },
1659
+ {
1660
+ "epoch": 5.09,
1661
+ "learning_rate": 0.00019655483482967483,
1662
+ "loss": 2.4971,
1663
+ "step": 118500
1664
+ },
1665
+ {
1666
+ "epoch": 5.11,
1667
+ "learning_rate": 0.00019569569139567852,
1668
+ "loss": 2.4957,
1669
+ "step": 119000
1670
+ },
1671
+ {
1672
+ "epoch": 5.13,
1673
+ "learning_rate": 0.0001948365479616822,
1674
+ "loss": 2.5049,
1675
+ "step": 119500
1676
+ },
1677
+ {
1678
+ "epoch": 5.15,
1679
+ "learning_rate": 0.00019397740452768592,
1680
+ "loss": 2.4946,
1681
+ "step": 120000
1682
+ },
1683
+ {
1684
+ "epoch": 5.15,
1685
+ "eval_loss": 3.233226776123047,
1686
+ "eval_runtime": 69.9335,
1687
+ "eval_samples_per_second": 35.748,
1688
+ "eval_steps_per_second": 8.937,
1689
+ "step": 120000
1690
+ },
1691
+ {
1692
+ "epoch": 5.17,
1693
+ "learning_rate": 0.0001931182610936896,
1694
+ "loss": 2.5004,
1695
+ "step": 120500
1696
+ },
1697
+ {
1698
+ "epoch": 5.2,
1699
+ "learning_rate": 0.00019225911765969328,
1700
+ "loss": 2.5101,
1701
+ "step": 121000
1702
+ },
1703
+ {
1704
+ "epoch": 5.22,
1705
+ "learning_rate": 0.000191399974225697,
1706
+ "loss": 2.4973,
1707
+ "step": 121500
1708
+ },
1709
+ {
1710
+ "epoch": 5.24,
1711
+ "learning_rate": 0.00019054083079170068,
1712
+ "loss": 2.5075,
1713
+ "step": 122000
1714
+ },
1715
+ {
1716
+ "epoch": 5.26,
1717
+ "learning_rate": 0.00018968168735770437,
1718
+ "loss": 2.503,
1719
+ "step": 122500
1720
+ },
1721
+ {
1722
+ "epoch": 5.28,
1723
+ "learning_rate": 0.00018882254392370805,
1724
+ "loss": 2.5034,
1725
+ "step": 123000
1726
+ },
1727
+ {
1728
+ "epoch": 5.3,
1729
+ "learning_rate": 0.00018796340048971176,
1730
+ "loss": 2.5022,
1731
+ "step": 123500
1732
+ },
1733
+ {
1734
+ "epoch": 5.32,
1735
+ "learning_rate": 0.00018710425705571545,
1736
+ "loss": 2.501,
1737
+ "step": 124000
1738
+ },
1739
+ {
1740
+ "epoch": 5.32,
1741
+ "eval_loss": 3.242777109146118,
1742
+ "eval_runtime": 69.4476,
1743
+ "eval_samples_per_second": 35.998,
1744
+ "eval_steps_per_second": 9.0,
1745
+ "step": 124000
1746
+ },
1747
+ {
1748
+ "epoch": 5.35,
1749
+ "learning_rate": 0.00018624511362171916,
1750
+ "loss": 2.5108,
1751
+ "step": 124500
1752
+ },
1753
+ {
1754
+ "epoch": 5.37,
1755
+ "learning_rate": 0.00018538597018772284,
1756
+ "loss": 2.5104,
1757
+ "step": 125000
1758
+ },
1759
+ {
1760
+ "epoch": 5.39,
1761
+ "learning_rate": 0.00018452682675372656,
1762
+ "loss": 2.5043,
1763
+ "step": 125500
1764
+ },
1765
+ {
1766
+ "epoch": 5.41,
1767
+ "learning_rate": 0.00018366768331973024,
1768
+ "loss": 2.5205,
1769
+ "step": 126000
1770
+ },
1771
+ {
1772
+ "epoch": 5.43,
1773
+ "learning_rate": 0.00018280853988573395,
1774
+ "loss": 2.4971,
1775
+ "step": 126500
1776
+ },
1777
+ {
1778
+ "epoch": 5.45,
1779
+ "learning_rate": 0.00018194939645173764,
1780
+ "loss": 2.5098,
1781
+ "step": 127000
1782
+ },
1783
+ {
1784
+ "epoch": 5.47,
1785
+ "learning_rate": 0.00018109025301774132,
1786
+ "loss": 2.5088,
1787
+ "step": 127500
1788
+ },
1789
+ {
1790
+ "epoch": 5.5,
1791
+ "learning_rate": 0.000180231109583745,
1792
+ "loss": 2.4968,
1793
+ "step": 128000
1794
+ },
1795
+ {
1796
+ "epoch": 5.5,
1797
+ "eval_loss": 3.2393054962158203,
1798
+ "eval_runtime": 70.4742,
1799
+ "eval_samples_per_second": 35.474,
1800
+ "eval_steps_per_second": 8.868,
1801
+ "step": 128000
1802
+ },
1803
+ {
1804
+ "epoch": 5.52,
1805
+ "learning_rate": 0.00017937196614974872,
1806
+ "loss": 2.5051,
1807
+ "step": 128500
1808
+ },
1809
+ {
1810
+ "epoch": 5.54,
1811
+ "learning_rate": 0.0001785128227157524,
1812
+ "loss": 2.5041,
1813
+ "step": 129000
1814
+ },
1815
+ {
1816
+ "epoch": 5.56,
1817
+ "learning_rate": 0.0001776536792817561,
1818
+ "loss": 2.5002,
1819
+ "step": 129500
1820
+ },
1821
+ {
1822
+ "epoch": 5.58,
1823
+ "learning_rate": 0.0001767945358477598,
1824
+ "loss": 2.5095,
1825
+ "step": 130000
1826
+ },
1827
+ {
1828
+ "epoch": 5.6,
1829
+ "learning_rate": 0.0001759353924137635,
1830
+ "loss": 2.5044,
1831
+ "step": 130500
1832
+ },
1833
+ {
1834
+ "epoch": 5.62,
1835
+ "learning_rate": 0.00017507624897976717,
1836
+ "loss": 2.508,
1837
+ "step": 131000
1838
+ },
1839
+ {
1840
+ "epoch": 5.65,
1841
+ "learning_rate": 0.00017421710554577086,
1842
+ "loss": 2.5052,
1843
+ "step": 131500
1844
+ },
1845
+ {
1846
+ "epoch": 5.67,
1847
+ "learning_rate": 0.00017335796211177457,
1848
+ "loss": 2.5086,
1849
+ "step": 132000
1850
+ },
1851
+ {
1852
+ "epoch": 5.67,
1853
+ "eval_loss": 3.2350261211395264,
1854
+ "eval_runtime": 70.6424,
1855
+ "eval_samples_per_second": 35.39,
1856
+ "eval_steps_per_second": 8.847,
1857
+ "step": 132000
1858
+ },
1859
+ {
1860
+ "epoch": 5.69,
1861
+ "learning_rate": 0.00017249881867777825,
1862
+ "loss": 2.5015,
1863
+ "step": 132500
1864
+ },
1865
+ {
1866
+ "epoch": 5.71,
1867
+ "learning_rate": 0.00017163967524378194,
1868
+ "loss": 2.5,
1869
+ "step": 133000
1870
+ },
1871
+ {
1872
+ "epoch": 5.73,
1873
+ "learning_rate": 0.00017078053180978565,
1874
+ "loss": 2.5026,
1875
+ "step": 133500
1876
+ },
1877
+ {
1878
+ "epoch": 5.75,
1879
+ "learning_rate": 0.00016992138837578933,
1880
+ "loss": 2.4933,
1881
+ "step": 134000
1882
+ },
1883
+ {
1884
+ "epoch": 5.78,
1885
+ "learning_rate": 0.00016906224494179305,
1886
+ "loss": 2.5079,
1887
+ "step": 134500
1888
+ },
1889
+ {
1890
+ "epoch": 5.8,
1891
+ "learning_rate": 0.00016820310150779676,
1892
+ "loss": 2.4959,
1893
+ "step": 135000
1894
+ },
1895
+ {
1896
+ "epoch": 5.82,
1897
+ "learning_rate": 0.00016734395807380044,
1898
+ "loss": 2.5109,
1899
+ "step": 135500
1900
+ },
1901
+ {
1902
+ "epoch": 5.84,
1903
+ "learning_rate": 0.00016648481463980413,
1904
+ "loss": 2.5091,
1905
+ "step": 136000
1906
+ },
1907
+ {
1908
+ "epoch": 5.84,
1909
+ "eval_loss": 3.2328109741210938,
1910
+ "eval_runtime": 70.9654,
1911
+ "eval_samples_per_second": 35.228,
1912
+ "eval_steps_per_second": 8.807,
1913
+ "step": 136000
1914
+ },
1915
+ {
1916
+ "epoch": 5.86,
1917
+ "learning_rate": 0.0001656256712058078,
1918
+ "loss": 2.4965,
1919
+ "step": 136500
1920
+ },
1921
+ {
1922
+ "epoch": 5.88,
1923
+ "learning_rate": 0.00016476652777181152,
1924
+ "loss": 2.509,
1925
+ "step": 137000
1926
+ },
1927
+ {
1928
+ "epoch": 5.9,
1929
+ "learning_rate": 0.0001639073843378152,
1930
+ "loss": 2.5055,
1931
+ "step": 137500
1932
+ },
1933
+ {
1934
+ "epoch": 5.93,
1935
+ "learning_rate": 0.0001630482409038189,
1936
+ "loss": 2.5145,
1937
+ "step": 138000
1938
+ },
1939
+ {
1940
+ "epoch": 5.95,
1941
+ "learning_rate": 0.0001621890974698226,
1942
+ "loss": 2.5099,
1943
+ "step": 138500
1944
+ },
1945
+ {
1946
+ "epoch": 5.97,
1947
+ "learning_rate": 0.0001613299540358263,
1948
+ "loss": 2.497,
1949
+ "step": 139000
1950
+ },
1951
+ {
1952
+ "epoch": 5.99,
1953
+ "learning_rate": 0.00016047081060182998,
1954
+ "loss": 2.4981,
1955
+ "step": 139500
1956
+ },
1957
+ {
1958
+ "epoch": 6.01,
1959
+ "learning_rate": 0.00015961166716783366,
1960
+ "loss": 2.4972,
1961
+ "step": 140000
1962
+ },
1963
+ {
1964
+ "epoch": 6.01,
1965
+ "eval_loss": 3.2302370071411133,
1966
+ "eval_runtime": 71.0669,
1967
+ "eval_samples_per_second": 35.178,
1968
+ "eval_steps_per_second": 8.795,
1969
+ "step": 140000
1970
+ },
1971
+ {
1972
+ "epoch": 6.03,
1973
+ "learning_rate": 0.00015875252373383737,
1974
+ "loss": 2.4869,
1975
+ "step": 140500
1976
+ },
1977
+ {
1978
+ "epoch": 6.05,
1979
+ "learning_rate": 0.00015789338029984106,
1980
+ "loss": 2.5036,
1981
+ "step": 141000
1982
+ },
1983
+ {
1984
+ "epoch": 6.08,
1985
+ "learning_rate": 0.00015703423686584474,
1986
+ "loss": 2.4872,
1987
+ "step": 141500
1988
+ },
1989
+ {
1990
+ "epoch": 6.1,
1991
+ "learning_rate": 0.00015617509343184845,
1992
+ "loss": 2.4802,
1993
+ "step": 142000
1994
+ },
1995
+ {
1996
+ "epoch": 6.12,
1997
+ "learning_rate": 0.00015531594999785214,
1998
+ "loss": 2.4812,
1999
+ "step": 142500
2000
+ },
2001
+ {
2002
+ "epoch": 6.14,
2003
+ "learning_rate": 0.00015445680656385585,
2004
+ "loss": 2.486,
2005
+ "step": 143000
2006
+ },
2007
+ {
2008
+ "epoch": 6.16,
2009
+ "learning_rate": 0.00015359766312985954,
2010
+ "loss": 2.5004,
2011
+ "step": 143500
2012
+ },
2013
+ {
2014
+ "epoch": 6.18,
2015
+ "learning_rate": 0.00015273851969586325,
2016
+ "loss": 2.496,
2017
+ "step": 144000
2018
+ },
2019
+ {
2020
+ "epoch": 6.18,
2021
+ "eval_loss": 3.224409341812134,
2022
+ "eval_runtime": 69.085,
2023
+ "eval_samples_per_second": 36.187,
2024
+ "eval_steps_per_second": 9.047,
2025
+ "step": 144000
2026
+ },
2027
+ {
2028
+ "epoch": 6.2,
2029
+ "learning_rate": 0.00015187937626186693,
2030
+ "loss": 2.4869,
2031
+ "step": 144500
2032
+ },
2033
+ {
2034
+ "epoch": 6.23,
2035
+ "learning_rate": 0.00015102023282787062,
2036
+ "loss": 2.5018,
2037
+ "step": 145000
2038
+ },
2039
+ {
2040
+ "epoch": 6.25,
2041
+ "learning_rate": 0.00015016108939387433,
2042
+ "loss": 2.5033,
2043
+ "step": 145500
2044
+ },
2045
+ {
2046
+ "epoch": 6.27,
2047
+ "learning_rate": 0.00014930194595987801,
2048
+ "loss": 2.4944,
2049
+ "step": 146000
2050
+ },
2051
+ {
2052
+ "epoch": 6.29,
2053
+ "learning_rate": 0.0001484428025258817,
2054
+ "loss": 2.4889,
2055
+ "step": 146500
2056
+ },
2057
+ {
2058
+ "epoch": 6.31,
2059
+ "learning_rate": 0.0001475836590918854,
2060
+ "loss": 2.4913,
2061
+ "step": 147000
2062
+ },
2063
+ {
2064
+ "epoch": 6.33,
2065
+ "learning_rate": 0.0001467245156578891,
2066
+ "loss": 2.5018,
2067
+ "step": 147500
2068
+ },
2069
+ {
2070
+ "epoch": 6.35,
2071
+ "learning_rate": 0.00014586537222389278,
2072
+ "loss": 2.4943,
2073
+ "step": 148000
2074
+ },
2075
+ {
2076
+ "epoch": 6.35,
2077
+ "eval_loss": 3.2242684364318848,
2078
+ "eval_runtime": 70.9743,
2079
+ "eval_samples_per_second": 35.224,
2080
+ "eval_steps_per_second": 8.806,
2081
+ "step": 148000
2082
+ },
2083
+ {
2084
+ "epoch": 6.38,
2085
+ "learning_rate": 0.00014500622878989647,
2086
+ "loss": 2.5,
2087
+ "step": 148500
2088
+ },
2089
+ {
2090
+ "epoch": 6.4,
2091
+ "learning_rate": 0.00014414708535590018,
2092
+ "loss": 2.4994,
2093
+ "step": 149000
2094
+ },
2095
+ {
2096
+ "epoch": 6.42,
2097
+ "learning_rate": 0.00014328794192190386,
2098
+ "loss": 2.4986,
2099
+ "step": 149500
2100
+ },
2101
+ {
2102
+ "epoch": 6.44,
2103
+ "learning_rate": 0.00014242879848790755,
2104
+ "loss": 2.4934,
2105
+ "step": 150000
2106
+ },
2107
+ {
2108
+ "epoch": 6.46,
2109
+ "learning_rate": 0.00014156965505391126,
2110
+ "loss": 2.4887,
2111
+ "step": 150500
2112
+ },
2113
+ {
2114
+ "epoch": 6.48,
2115
+ "learning_rate": 0.00014071051161991494,
2116
+ "loss": 2.4922,
2117
+ "step": 151000
2118
+ },
2119
+ {
2120
+ "epoch": 6.51,
2121
+ "learning_rate": 0.00013985136818591863,
2122
+ "loss": 2.4943,
2123
+ "step": 151500
2124
+ },
2125
+ {
2126
+ "epoch": 6.53,
2127
+ "learning_rate": 0.00013899222475192234,
2128
+ "loss": 2.491,
2129
+ "step": 152000
2130
+ },
2131
+ {
2132
+ "epoch": 6.53,
2133
+ "eval_loss": 3.220249652862549,
2134
+ "eval_runtime": 70.6987,
2135
+ "eval_samples_per_second": 35.361,
2136
+ "eval_steps_per_second": 8.84,
2137
+ "step": 152000
2138
+ },
2139
+ {
2140
+ "epoch": 6.55,
2141
+ "learning_rate": 0.00013813308131792603,
2142
+ "loss": 2.4967,
2143
+ "step": 152500
2144
+ },
2145
+ {
2146
+ "epoch": 6.57,
2147
+ "learning_rate": 0.00013727393788392974,
2148
+ "loss": 2.4959,
2149
+ "step": 153000
2150
+ },
2151
+ {
2152
+ "epoch": 6.59,
2153
+ "learning_rate": 0.00013641479444993342,
2154
+ "loss": 2.4777,
2155
+ "step": 153500
2156
+ },
2157
+ {
2158
+ "epoch": 6.61,
2159
+ "learning_rate": 0.00013555565101593713,
2160
+ "loss": 2.4931,
2161
+ "step": 154000
2162
+ },
2163
+ {
2164
+ "epoch": 6.63,
2165
+ "learning_rate": 0.00013469650758194082,
2166
+ "loss": 2.497,
2167
+ "step": 154500
2168
+ },
2169
+ {
2170
+ "epoch": 6.66,
2171
+ "learning_rate": 0.0001338373641479445,
2172
+ "loss": 2.4964,
2173
+ "step": 155000
2174
+ },
2175
+ {
2176
+ "epoch": 6.68,
2177
+ "learning_rate": 0.00013297822071394822,
2178
+ "loss": 2.4959,
2179
+ "step": 155500
2180
+ },
2181
+ {
2182
+ "epoch": 6.7,
2183
+ "learning_rate": 0.0001321190772799519,
2184
+ "loss": 2.4913,
2185
+ "step": 156000
2186
+ },
2187
+ {
2188
+ "epoch": 6.7,
2189
+ "eval_loss": 3.2231602668762207,
2190
+ "eval_runtime": 70.5445,
2191
+ "eval_samples_per_second": 35.439,
2192
+ "eval_steps_per_second": 8.86,
2193
+ "step": 156000
2194
+ },
2195
+ {
2196
+ "epoch": 6.72,
2197
+ "learning_rate": 0.00013125993384595559,
2198
+ "loss": 2.4955,
2199
+ "step": 156500
2200
+ },
2201
+ {
2202
+ "epoch": 6.74,
2203
+ "learning_rate": 0.00013040079041195927,
2204
+ "loss": 2.4916,
2205
+ "step": 157000
2206
+ },
2207
+ {
2208
+ "epoch": 6.76,
2209
+ "learning_rate": 0.00012954164697796298,
2210
+ "loss": 2.4898,
2211
+ "step": 157500
2212
+ },
2213
+ {
2214
+ "epoch": 6.78,
2215
+ "learning_rate": 0.00012868250354396667,
2216
+ "loss": 2.5014,
2217
+ "step": 158000
2218
+ },
2219
+ {
2220
+ "epoch": 6.81,
2221
+ "learning_rate": 0.00012782336010997035,
2222
+ "loss": 2.4953,
2223
+ "step": 158500
2224
+ },
2225
+ {
2226
+ "epoch": 6.83,
2227
+ "learning_rate": 0.00012696421667597406,
2228
+ "loss": 2.5013,
2229
+ "step": 159000
2230
+ },
2231
+ {
2232
+ "epoch": 6.85,
2233
+ "learning_rate": 0.00012610507324197775,
2234
+ "loss": 2.5066,
2235
+ "step": 159500
2236
+ },
2237
+ {
2238
+ "epoch": 6.87,
2239
+ "learning_rate": 0.00012524592980798143,
2240
+ "loss": 2.4968,
2241
+ "step": 160000
2242
+ },
2243
+ {
2244
+ "epoch": 6.87,
2245
+ "eval_loss": 3.2243635654449463,
2246
+ "eval_runtime": 70.4612,
2247
+ "eval_samples_per_second": 35.481,
2248
+ "eval_steps_per_second": 8.87,
2249
+ "step": 160000
2250
+ },
2251
+ {
2252
+ "epoch": 6.89,
2253
+ "learning_rate": 0.00012438678637398515,
2254
+ "loss": 2.5027,
2255
+ "step": 160500
2256
+ },
2257
+ {
2258
+ "epoch": 6.91,
2259
+ "learning_rate": 0.00012352764293998883,
2260
+ "loss": 2.4855,
2261
+ "step": 161000
2262
+ },
2263
+ {
2264
+ "epoch": 6.93,
2265
+ "learning_rate": 0.00012266849950599252,
2266
+ "loss": 2.4915,
2267
+ "step": 161500
2268
+ },
2269
+ {
2270
+ "epoch": 6.96,
2271
+ "learning_rate": 0.00012180935607199621,
2272
+ "loss": 2.5066,
2273
+ "step": 162000
2274
+ },
2275
+ {
2276
+ "epoch": 6.98,
2277
+ "learning_rate": 0.00012095021263799993,
2278
+ "loss": 2.4947,
2279
+ "step": 162500
2280
+ },
2281
+ {
2282
+ "epoch": 7.0,
2283
+ "learning_rate": 0.00012009106920400361,
2284
+ "loss": 2.4971,
2285
+ "step": 163000
2286
+ },
2287
+ {
2288
+ "epoch": 7.02,
2289
+ "learning_rate": 0.0001192319257700073,
2290
+ "loss": 2.4976,
2291
+ "step": 163500
2292
+ },
2293
+ {
2294
+ "epoch": 7.04,
2295
+ "learning_rate": 0.00011837278233601101,
2296
+ "loss": 2.4854,
2297
+ "step": 164000
2298
+ },
2299
+ {
2300
+ "epoch": 7.04,
2301
+ "eval_loss": 3.2264649868011475,
2302
+ "eval_runtime": 69.9835,
2303
+ "eval_samples_per_second": 35.723,
2304
+ "eval_steps_per_second": 8.931,
2305
+ "step": 164000
2306
+ },
2307
+ {
2308
+ "epoch": 7.06,
2309
+ "learning_rate": 0.0001175136389020147,
2310
+ "loss": 2.4712,
2311
+ "step": 164500
2312
+ },
2313
+ {
2314
+ "epoch": 7.08,
2315
+ "learning_rate": 0.00011665449546801839,
2316
+ "loss": 2.4761,
2317
+ "step": 165000
2318
+ },
2319
+ {
2320
+ "epoch": 7.11,
2321
+ "learning_rate": 0.00011579535203402208,
2322
+ "loss": 2.4873,
2323
+ "step": 165500
2324
+ },
2325
+ {
2326
+ "epoch": 7.13,
2327
+ "learning_rate": 0.00011493620860002579,
2328
+ "loss": 2.4862,
2329
+ "step": 166000
2330
+ },
2331
+ {
2332
+ "epoch": 7.15,
2333
+ "learning_rate": 0.00011407706516602947,
2334
+ "loss": 2.4801,
2335
+ "step": 166500
2336
+ },
2337
+ {
2338
+ "epoch": 7.17,
2339
+ "learning_rate": 0.00011321792173203316,
2340
+ "loss": 2.4785,
2341
+ "step": 167000
2342
+ },
2343
+ {
2344
+ "epoch": 7.19,
2345
+ "learning_rate": 0.00011235877829803687,
2346
+ "loss": 2.4861,
2347
+ "step": 167500
2348
+ },
2349
+ {
2350
+ "epoch": 7.21,
2351
+ "learning_rate": 0.00011149963486404055,
2352
+ "loss": 2.4865,
2353
+ "step": 168000
2354
+ },
2355
+ {
2356
+ "epoch": 7.21,
2357
+ "eval_loss": 3.221519947052002,
2358
+ "eval_runtime": 70.7932,
2359
+ "eval_samples_per_second": 35.314,
2360
+ "eval_steps_per_second": 8.829,
2361
+ "step": 168000
2362
+ },
2363
+ {
2364
+ "epoch": 7.24,
2365
+ "learning_rate": 0.00011064049143004425,
2366
+ "loss": 2.4908,
2367
+ "step": 168500
2368
+ },
2369
+ {
2370
+ "epoch": 7.26,
2371
+ "learning_rate": 0.00010978134799604795,
2372
+ "loss": 2.4814,
2373
+ "step": 169000
2374
+ },
2375
+ {
2376
+ "epoch": 7.28,
2377
+ "learning_rate": 0.00010892220456205165,
2378
+ "loss": 2.4759,
2379
+ "step": 169500
2380
+ },
2381
+ {
2382
+ "epoch": 7.3,
2383
+ "learning_rate": 0.00010806306112805533,
2384
+ "loss": 2.495,
2385
+ "step": 170000
2386
+ },
2387
+ {
2388
+ "epoch": 7.32,
2389
+ "learning_rate": 0.00010720391769405902,
2390
+ "loss": 2.4837,
2391
+ "step": 170500
2392
+ },
2393
+ {
2394
+ "epoch": 7.34,
2395
+ "learning_rate": 0.00010634477426006273,
2396
+ "loss": 2.4946,
2397
+ "step": 171000
2398
+ },
2399
+ {
2400
+ "epoch": 7.36,
2401
+ "learning_rate": 0.00010548563082606642,
2402
+ "loss": 2.4762,
2403
+ "step": 171500
2404
+ },
2405
+ {
2406
+ "epoch": 7.39,
2407
+ "learning_rate": 0.0001046264873920701,
2408
+ "loss": 2.4854,
2409
+ "step": 172000
2410
+ },
2411
+ {
2412
+ "epoch": 7.39,
2413
+ "eval_loss": 3.220597982406616,
2414
+ "eval_runtime": 69.1381,
2415
+ "eval_samples_per_second": 36.159,
2416
+ "eval_steps_per_second": 9.04,
2417
+ "step": 172000
2418
+ },
2419
+ {
2420
+ "epoch": 7.41,
2421
+ "learning_rate": 0.00010376734395807381,
2422
+ "loss": 2.4899,
2423
+ "step": 172500
2424
+ },
2425
+ {
2426
+ "epoch": 7.43,
2427
+ "learning_rate": 0.0001029082005240775,
2428
+ "loss": 2.486,
2429
+ "step": 173000
2430
+ },
2431
+ {
2432
+ "epoch": 7.45,
2433
+ "learning_rate": 0.0001020490570900812,
2434
+ "loss": 2.4792,
2435
+ "step": 173500
2436
+ },
2437
+ {
2438
+ "epoch": 7.47,
2439
+ "learning_rate": 0.00010118991365608488,
2440
+ "loss": 2.4907,
2441
+ "step": 174000
2442
+ },
2443
+ {
2444
+ "epoch": 7.49,
2445
+ "learning_rate": 0.00010033077022208859,
2446
+ "loss": 2.486,
2447
+ "step": 174500
2448
+ },
2449
+ {
2450
+ "epoch": 7.51,
2451
+ "learning_rate": 9.947162678809228e-05,
2452
+ "loss": 2.4801,
2453
+ "step": 175000
2454
+ },
2455
+ {
2456
+ "epoch": 7.54,
2457
+ "learning_rate": 9.861248335409598e-05,
2458
+ "loss": 2.4931,
2459
+ "step": 175500
2460
+ },
2461
+ {
2462
+ "epoch": 7.56,
2463
+ "learning_rate": 9.775333992009966e-05,
2464
+ "loss": 2.4853,
2465
+ "step": 176000
2466
+ },
2467
+ {
2468
+ "epoch": 7.56,
2469
+ "eval_loss": 3.2134642601013184,
2470
+ "eval_runtime": 70.239,
2471
+ "eval_samples_per_second": 35.593,
2472
+ "eval_steps_per_second": 8.898,
2473
+ "step": 176000
2474
+ },
2475
+ {
2476
+ "epoch": 7.58,
2477
+ "learning_rate": 9.689419648610336e-05,
2478
+ "loss": 2.4821,
2479
+ "step": 176500
2480
+ },
2481
+ {
2482
+ "epoch": 7.6,
2483
+ "learning_rate": 9.603505305210704e-05,
2484
+ "loss": 2.4801,
2485
+ "step": 177000
2486
+ },
2487
+ {
2488
+ "epoch": 7.62,
2489
+ "learning_rate": 9.517590961811074e-05,
2490
+ "loss": 2.4888,
2491
+ "step": 177500
2492
+ },
2493
+ {
2494
+ "epoch": 7.64,
2495
+ "learning_rate": 9.431676618411444e-05,
2496
+ "loss": 2.4847,
2497
+ "step": 178000
2498
+ },
2499
+ {
2500
+ "epoch": 7.66,
2501
+ "learning_rate": 9.345762275011814e-05,
2502
+ "loss": 2.4929,
2503
+ "step": 178500
2504
+ },
2505
+ {
2506
+ "epoch": 7.69,
2507
+ "learning_rate": 9.259847931612184e-05,
2508
+ "loss": 2.4911,
2509
+ "step": 179000
2510
+ },
2511
+ {
2512
+ "epoch": 7.71,
2513
+ "learning_rate": 9.173933588212552e-05,
2514
+ "loss": 2.4881,
2515
+ "step": 179500
2516
+ },
2517
+ {
2518
+ "epoch": 7.73,
2519
+ "learning_rate": 9.088019244812922e-05,
2520
+ "loss": 2.4894,
2521
+ "step": 180000
2522
+ },
2523
+ {
2524
+ "epoch": 7.73,
2525
+ "eval_loss": 3.2165322303771973,
2526
+ "eval_runtime": 70.2927,
2527
+ "eval_samples_per_second": 35.566,
2528
+ "eval_steps_per_second": 8.891,
2529
+ "step": 180000
2530
+ },
2531
+ {
2532
+ "epoch": 7.75,
2533
+ "learning_rate": 9.002104901413292e-05,
2534
+ "loss": 2.4906,
2535
+ "step": 180500
2536
+ },
2537
+ {
2538
+ "epoch": 7.77,
2539
+ "learning_rate": 8.91619055801366e-05,
2540
+ "loss": 2.4783,
2541
+ "step": 181000
2542
+ },
2543
+ {
2544
+ "epoch": 7.79,
2545
+ "learning_rate": 8.83027621461403e-05,
2546
+ "loss": 2.4734,
2547
+ "step": 181500
2548
+ },
2549
+ {
2550
+ "epoch": 7.81,
2551
+ "learning_rate": 8.744361871214399e-05,
2552
+ "loss": 2.4803,
2553
+ "step": 182000
2554
+ },
2555
+ {
2556
+ "epoch": 7.84,
2557
+ "learning_rate": 8.658447527814769e-05,
2558
+ "loss": 2.489,
2559
+ "step": 182500
2560
+ },
2561
+ {
2562
+ "epoch": 7.86,
2563
+ "learning_rate": 8.572533184415138e-05,
2564
+ "loss": 2.4844,
2565
+ "step": 183000
2566
+ },
2567
+ {
2568
+ "epoch": 7.88,
2569
+ "learning_rate": 8.486618841015508e-05,
2570
+ "loss": 2.4934,
2571
+ "step": 183500
2572
+ },
2573
+ {
2574
+ "epoch": 7.9,
2575
+ "learning_rate": 8.400704497615878e-05,
2576
+ "loss": 2.4811,
2577
+ "step": 184000
2578
+ },
2579
+ {
2580
+ "epoch": 7.9,
2581
+ "eval_loss": 3.214756727218628,
2582
+ "eval_runtime": 70.5306,
2583
+ "eval_samples_per_second": 35.446,
2584
+ "eval_steps_per_second": 8.861,
2585
+ "step": 184000
2586
+ },
2587
+ {
2588
+ "epoch": 7.92,
2589
+ "learning_rate": 8.314790154216247e-05,
2590
+ "loss": 2.4828,
2591
+ "step": 184500
2592
+ },
2593
+ {
2594
+ "epoch": 7.94,
2595
+ "learning_rate": 8.228875810816616e-05,
2596
+ "loss": 2.4778,
2597
+ "step": 185000
2598
+ },
2599
+ {
2600
+ "epoch": 7.97,
2601
+ "learning_rate": 8.142961467416985e-05,
2602
+ "loss": 2.4866,
2603
+ "step": 185500
2604
+ },
2605
+ {
2606
+ "epoch": 7.99,
2607
+ "learning_rate": 8.057047124017355e-05,
2608
+ "loss": 2.4847,
2609
+ "step": 186000
2610
+ },
2611
+ {
2612
+ "epoch": 8.01,
2613
+ "learning_rate": 7.971132780617725e-05,
2614
+ "loss": 2.4715,
2615
+ "step": 186500
2616
+ },
2617
+ {
2618
+ "epoch": 8.03,
2619
+ "learning_rate": 7.885218437218093e-05,
2620
+ "loss": 2.4782,
2621
+ "step": 187000
2622
+ },
2623
+ {
2624
+ "epoch": 8.05,
2625
+ "learning_rate": 7.799304093818464e-05,
2626
+ "loss": 2.4781,
2627
+ "step": 187500
2628
+ },
2629
+ {
2630
+ "epoch": 8.07,
2631
+ "learning_rate": 7.713389750418833e-05,
2632
+ "loss": 2.4789,
2633
+ "step": 188000
2634
+ },
2635
+ {
2636
+ "epoch": 8.07,
2637
+ "eval_loss": 3.2158737182617188,
2638
+ "eval_runtime": 71.443,
2639
+ "eval_samples_per_second": 34.993,
2640
+ "eval_steps_per_second": 8.748,
2641
+ "step": 188000
2642
+ },
2643
+ {
2644
+ "epoch": 8.09,
2645
+ "learning_rate": 7.627475407019203e-05,
2646
+ "loss": 2.4867,
2647
+ "step": 188500
2648
+ },
2649
+ {
2650
+ "epoch": 8.12,
2651
+ "learning_rate": 7.541561063619572e-05,
2652
+ "loss": 2.4799,
2653
+ "step": 189000
2654
+ },
2655
+ {
2656
+ "epoch": 8.14,
2657
+ "learning_rate": 7.455646720219941e-05,
2658
+ "loss": 2.4779,
2659
+ "step": 189500
2660
+ },
2661
+ {
2662
+ "epoch": 8.16,
2663
+ "learning_rate": 7.369732376820311e-05,
2664
+ "loss": 2.4703,
2665
+ "step": 190000
2666
+ },
2667
+ {
2668
+ "epoch": 8.18,
2669
+ "learning_rate": 7.283818033420679e-05,
2670
+ "loss": 2.4715,
2671
+ "step": 190500
2672
+ },
2673
+ {
2674
+ "epoch": 8.2,
2675
+ "learning_rate": 7.197903690021049e-05,
2676
+ "loss": 2.4754,
2677
+ "step": 191000
2678
+ },
2679
+ {
2680
+ "epoch": 8.22,
2681
+ "learning_rate": 7.111989346621419e-05,
2682
+ "loss": 2.4819,
2683
+ "step": 191500
2684
+ },
2685
+ {
2686
+ "epoch": 8.24,
2687
+ "learning_rate": 7.026075003221789e-05,
2688
+ "loss": 2.4768,
2689
+ "step": 192000
2690
+ },
2691
+ {
2692
+ "epoch": 8.24,
2693
+ "eval_loss": 3.2115867137908936,
2694
+ "eval_runtime": 70.9666,
2695
+ "eval_samples_per_second": 35.228,
2696
+ "eval_steps_per_second": 8.807,
2697
+ "step": 192000
2698
+ },
2699
+ {
2700
+ "epoch": 8.27,
2701
+ "learning_rate": 6.940160659822159e-05,
2702
+ "loss": 2.4697,
2703
+ "step": 192500
2704
+ },
2705
+ {
2706
+ "epoch": 8.29,
2707
+ "learning_rate": 6.854246316422527e-05,
2708
+ "loss": 2.4843,
2709
+ "step": 193000
2710
+ },
2711
+ {
2712
+ "epoch": 8.31,
2713
+ "learning_rate": 6.768331973022897e-05,
2714
+ "loss": 2.484,
2715
+ "step": 193500
2716
+ },
2717
+ {
2718
+ "epoch": 8.33,
2719
+ "learning_rate": 6.682417629623265e-05,
2720
+ "loss": 2.4753,
2721
+ "step": 194000
2722
+ },
2723
+ {
2724
+ "epoch": 8.35,
2725
+ "learning_rate": 6.596503286223635e-05,
2726
+ "loss": 2.4827,
2727
+ "step": 194500
2728
+ },
2729
+ {
2730
+ "epoch": 8.37,
2731
+ "learning_rate": 6.510588942824005e-05,
2732
+ "loss": 2.4798,
2733
+ "step": 195000
2734
+ },
2735
+ {
2736
+ "epoch": 8.39,
2737
+ "learning_rate": 6.424674599424374e-05,
2738
+ "loss": 2.4785,
2739
+ "step": 195500
2740
+ },
2741
+ {
2742
+ "epoch": 8.42,
2743
+ "learning_rate": 6.338760256024743e-05,
2744
+ "loss": 2.4832,
2745
+ "step": 196000
2746
+ },
2747
+ {
2748
+ "epoch": 8.42,
2749
+ "eval_loss": 3.2087242603302,
2750
+ "eval_runtime": 70.623,
2751
+ "eval_samples_per_second": 35.399,
2752
+ "eval_steps_per_second": 8.85,
2753
+ "step": 196000
2754
+ },
2755
+ {
2756
+ "epoch": 8.44,
2757
+ "learning_rate": 6.252845912625113e-05,
2758
+ "loss": 2.4944,
2759
+ "step": 196500
2760
+ },
2761
+ {
2762
+ "epoch": 8.46,
2763
+ "learning_rate": 6.166931569225483e-05,
2764
+ "loss": 2.4881,
2765
+ "step": 197000
2766
+ },
2767
+ {
2768
+ "epoch": 8.48,
2769
+ "learning_rate": 6.081017225825852e-05,
2770
+ "loss": 2.4742,
2771
+ "step": 197500
2772
+ },
2773
+ {
2774
+ "epoch": 8.5,
2775
+ "learning_rate": 5.9951028824262214e-05,
2776
+ "loss": 2.478,
2777
+ "step": 198000
2778
+ },
2779
+ {
2780
+ "epoch": 8.52,
2781
+ "learning_rate": 5.909188539026591e-05,
2782
+ "loss": 2.4703,
2783
+ "step": 198500
2784
+ },
2785
+ {
2786
+ "epoch": 8.54,
2787
+ "learning_rate": 5.82327419562696e-05,
2788
+ "loss": 2.479,
2789
+ "step": 199000
2790
+ },
2791
+ {
2792
+ "epoch": 8.57,
2793
+ "learning_rate": 5.7373598522273295e-05,
2794
+ "loss": 2.4808,
2795
+ "step": 199500
2796
+ },
2797
+ {
2798
+ "epoch": 8.59,
2799
+ "learning_rate": 5.651445508827699e-05,
2800
+ "loss": 2.4774,
2801
+ "step": 200000
2802
+ },
2803
+ {
2804
+ "epoch": 8.59,
2805
+ "eval_loss": 3.211932420730591,
2806
+ "eval_runtime": 70.6236,
2807
+ "eval_samples_per_second": 35.399,
2808
+ "eval_steps_per_second": 8.85,
2809
+ "step": 200000
2810
+ },
2811
+ {
2812
+ "epoch": 8.61,
2813
+ "learning_rate": 5.5655311654280685e-05,
2814
+ "loss": 2.4699,
2815
+ "step": 200500
2816
+ },
2817
+ {
2818
+ "epoch": 8.63,
2819
+ "learning_rate": 5.4796168220284384e-05,
2820
+ "loss": 2.4727,
2821
+ "step": 201000
2822
+ },
2823
+ {
2824
+ "epoch": 8.65,
2825
+ "learning_rate": 5.393702478628807e-05,
2826
+ "loss": 2.4784,
2827
+ "step": 201500
2828
+ },
2829
+ {
2830
+ "epoch": 8.67,
2831
+ "learning_rate": 5.307788135229177e-05,
2832
+ "loss": 2.4695,
2833
+ "step": 202000
2834
+ },
2835
+ {
2836
+ "epoch": 8.7,
2837
+ "learning_rate": 5.221873791829546e-05,
2838
+ "loss": 2.4829,
2839
+ "step": 202500
2840
+ },
2841
+ {
2842
+ "epoch": 8.72,
2843
+ "learning_rate": 5.135959448429916e-05,
2844
+ "loss": 2.4752,
2845
+ "step": 203000
2846
+ },
2847
+ {
2848
+ "epoch": 8.74,
2849
+ "learning_rate": 5.0500451050302855e-05,
2850
+ "loss": 2.4781,
2851
+ "step": 203500
2852
+ },
2853
+ {
2854
+ "epoch": 8.76,
2855
+ "learning_rate": 4.964130761630654e-05,
2856
+ "loss": 2.4748,
2857
+ "step": 204000
2858
+ },
2859
+ {
2860
+ "epoch": 8.76,
2861
+ "eval_loss": 3.210552453994751,
2862
+ "eval_runtime": 69.9859,
2863
+ "eval_samples_per_second": 35.721,
2864
+ "eval_steps_per_second": 8.93,
2865
+ "step": 204000
2866
+ },
2867
+ {
2868
+ "epoch": 8.78,
2869
+ "learning_rate": 4.878216418231024e-05,
2870
+ "loss": 2.474,
2871
+ "step": 204500
2872
+ },
2873
+ {
2874
+ "epoch": 8.8,
2875
+ "learning_rate": 4.792302074831394e-05,
2876
+ "loss": 2.463,
2877
+ "step": 205000
2878
+ },
2879
+ {
2880
+ "epoch": 8.82,
2881
+ "learning_rate": 4.706387731431763e-05,
2882
+ "loss": 2.4736,
2883
+ "step": 205500
2884
+ },
2885
+ {
2886
+ "epoch": 8.85,
2887
+ "learning_rate": 4.620473388032132e-05,
2888
+ "loss": 2.4883,
2889
+ "step": 206000
2890
+ },
2891
+ {
2892
+ "epoch": 8.87,
2893
+ "learning_rate": 4.534559044632501e-05,
2894
+ "loss": 2.4745,
2895
+ "step": 206500
2896
+ },
2897
+ {
2898
+ "epoch": 8.89,
2899
+ "learning_rate": 4.448644701232871e-05,
2900
+ "loss": 2.4775,
2901
+ "step": 207000
2902
+ },
2903
+ {
2904
+ "epoch": 8.91,
2905
+ "learning_rate": 4.362730357833241e-05,
2906
+ "loss": 2.4785,
2907
+ "step": 207500
2908
+ },
2909
+ {
2910
+ "epoch": 8.93,
2911
+ "learning_rate": 4.27681601443361e-05,
2912
+ "loss": 2.472,
2913
+ "step": 208000
2914
+ },
2915
+ {
2916
+ "epoch": 8.93,
2917
+ "eval_loss": 3.2094578742980957,
2918
+ "eval_runtime": 70.3513,
2919
+ "eval_samples_per_second": 35.536,
2920
+ "eval_steps_per_second": 8.884,
2921
+ "step": 208000
2922
+ },
2923
+ {
2924
+ "epoch": 8.95,
2925
+ "learning_rate": 4.190901671033979e-05,
2926
+ "loss": 2.4694,
2927
+ "step": 208500
2928
+ },
2929
+ {
2930
+ "epoch": 8.97,
2931
+ "learning_rate": 4.1049873276343484e-05,
2932
+ "loss": 2.4754,
2933
+ "step": 209000
2934
+ },
2935
+ {
2936
+ "epoch": 9.0,
2937
+ "learning_rate": 4.019072984234718e-05,
2938
+ "loss": 2.4789,
2939
+ "step": 209500
2940
+ },
2941
+ {
2942
+ "epoch": 9.02,
2943
+ "learning_rate": 3.9331586408350874e-05,
2944
+ "loss": 2.4632,
2945
+ "step": 210000
2946
+ },
2947
+ {
2948
+ "epoch": 9.04,
2949
+ "learning_rate": 3.847244297435457e-05,
2950
+ "loss": 2.4676,
2951
+ "step": 210500
2952
+ },
2953
+ {
2954
+ "epoch": 9.06,
2955
+ "learning_rate": 3.7613299540358264e-05,
2956
+ "loss": 2.4675,
2957
+ "step": 211000
2958
+ },
2959
+ {
2960
+ "epoch": 9.08,
2961
+ "learning_rate": 3.675415610636196e-05,
2962
+ "loss": 2.4588,
2963
+ "step": 211500
2964
+ },
2965
+ {
2966
+ "epoch": 9.1,
2967
+ "learning_rate": 3.5895012672365654e-05,
2968
+ "loss": 2.4749,
2969
+ "step": 212000
2970
+ },
2971
+ {
2972
+ "epoch": 9.1,
2973
+ "eval_loss": 3.204545497894287,
2974
+ "eval_runtime": 69.8241,
2975
+ "eval_samples_per_second": 35.804,
2976
+ "eval_steps_per_second": 8.951,
2977
+ "step": 212000
2978
+ },
2979
+ {
2980
+ "epoch": 9.12,
2981
+ "learning_rate": 3.5035869238369345e-05,
2982
+ "loss": 2.4771,
2983
+ "step": 212500
2984
+ },
2985
+ {
2986
+ "epoch": 9.15,
2987
+ "learning_rate": 3.4176725804373044e-05,
2988
+ "loss": 2.4606,
2989
+ "step": 213000
2990
+ },
2991
+ {
2992
+ "epoch": 9.17,
2993
+ "learning_rate": 3.3317582370376735e-05,
2994
+ "loss": 2.4672,
2995
+ "step": 213500
2996
+ },
2997
+ {
2998
+ "epoch": 9.19,
2999
+ "learning_rate": 3.2458438936380434e-05,
3000
+ "loss": 2.4725,
3001
+ "step": 214000
3002
+ },
3003
+ {
3004
+ "epoch": 9.21,
3005
+ "learning_rate": 3.1599295502384125e-05,
3006
+ "loss": 2.4669,
3007
+ "step": 214500
3008
+ },
3009
+ {
3010
+ "epoch": 9.23,
3011
+ "learning_rate": 3.074015206838782e-05,
3012
+ "loss": 2.4674,
3013
+ "step": 215000
3014
+ },
3015
+ {
3016
+ "epoch": 9.25,
3017
+ "learning_rate": 2.9881008634391512e-05,
3018
+ "loss": 2.4661,
3019
+ "step": 215500
3020
+ },
3021
+ {
3022
+ "epoch": 9.27,
3023
+ "learning_rate": 2.902186520039521e-05,
3024
+ "loss": 2.4722,
3025
+ "step": 216000
3026
+ },
3027
+ {
3028
+ "epoch": 9.27,
3029
+ "eval_loss": 3.2078707218170166,
3030
+ "eval_runtime": 69.832,
3031
+ "eval_samples_per_second": 35.8,
3032
+ "eval_steps_per_second": 8.95,
3033
+ "step": 216000
3034
+ },
3035
+ {
3036
+ "epoch": 9.3,
3037
+ "learning_rate": 2.8162721766398902e-05,
3038
+ "loss": 2.474,
3039
+ "step": 216500
3040
+ },
3041
+ {
3042
+ "epoch": 9.32,
3043
+ "learning_rate": 2.7303578332402597e-05,
3044
+ "loss": 2.4714,
3045
+ "step": 217000
3046
+ },
3047
+ {
3048
+ "epoch": 9.34,
3049
+ "learning_rate": 2.644443489840629e-05,
3050
+ "loss": 2.465,
3051
+ "step": 217500
3052
+ },
3053
+ {
3054
+ "epoch": 9.36,
3055
+ "learning_rate": 2.5585291464409983e-05,
3056
+ "loss": 2.4691,
3057
+ "step": 218000
3058
+ },
3059
+ {
3060
+ "epoch": 9.38,
3061
+ "learning_rate": 2.472614803041368e-05,
3062
+ "loss": 2.4803,
3063
+ "step": 218500
3064
+ },
3065
+ {
3066
+ "epoch": 9.4,
3067
+ "learning_rate": 2.3867004596417373e-05,
3068
+ "loss": 2.4742,
3069
+ "step": 219000
3070
+ },
3071
+ {
3072
+ "epoch": 9.42,
3073
+ "learning_rate": 2.300786116242107e-05,
3074
+ "loss": 2.4675,
3075
+ "step": 219500
3076
+ },
3077
+ {
3078
+ "epoch": 9.45,
3079
+ "learning_rate": 2.214871772842476e-05,
3080
+ "loss": 2.4707,
3081
+ "step": 220000
3082
+ },
3083
+ {
3084
+ "epoch": 9.45,
3085
+ "eval_loss": 3.203800678253174,
3086
+ "eval_runtime": 69.9274,
3087
+ "eval_samples_per_second": 35.751,
3088
+ "eval_steps_per_second": 8.938,
3089
+ "step": 220000
3090
+ },
3091
+ {
3092
+ "epoch": 9.47,
3093
+ "learning_rate": 2.1289574294428455e-05,
3094
+ "loss": 2.4683,
3095
+ "step": 220500
3096
+ },
3097
+ {
3098
+ "epoch": 9.49,
3099
+ "learning_rate": 2.043043086043215e-05,
3100
+ "loss": 2.4674,
3101
+ "step": 221000
3102
+ },
3103
+ {
3104
+ "epoch": 9.51,
3105
+ "learning_rate": 1.9571287426435845e-05,
3106
+ "loss": 2.4748,
3107
+ "step": 221500
3108
+ },
3109
+ {
3110
+ "epoch": 9.53,
3111
+ "learning_rate": 1.8712143992439537e-05,
3112
+ "loss": 2.462,
3113
+ "step": 222000
3114
+ },
3115
+ {
3116
+ "epoch": 9.55,
3117
+ "learning_rate": 1.7853000558443235e-05,
3118
+ "loss": 2.4709,
3119
+ "step": 222500
3120
+ },
3121
+ {
3122
+ "epoch": 9.58,
3123
+ "learning_rate": 1.6993857124446927e-05,
3124
+ "loss": 2.4593,
3125
+ "step": 223000
3126
+ },
3127
+ {
3128
+ "epoch": 9.6,
3129
+ "learning_rate": 1.6134713690450622e-05,
3130
+ "loss": 2.4553,
3131
+ "step": 223500
3132
+ },
3133
+ {
3134
+ "epoch": 9.62,
3135
+ "learning_rate": 1.5275570256454317e-05,
3136
+ "loss": 2.4683,
3137
+ "step": 224000
3138
+ },
3139
+ {
3140
+ "epoch": 9.62,
3141
+ "eval_loss": 3.205549478530884,
3142
+ "eval_runtime": 69.3247,
3143
+ "eval_samples_per_second": 36.062,
3144
+ "eval_steps_per_second": 9.016,
3145
+ "step": 224000
3146
+ },
3147
+ {
3148
+ "epoch": 9.64,
3149
+ "learning_rate": 1.441642682245801e-05,
3150
+ "loss": 2.4676,
3151
+ "step": 224500
3152
+ },
3153
+ {
3154
+ "epoch": 9.66,
3155
+ "learning_rate": 1.3557283388461705e-05,
3156
+ "loss": 2.4728,
3157
+ "step": 225000
3158
+ },
3159
+ {
3160
+ "epoch": 9.68,
3161
+ "learning_rate": 1.2698139954465398e-05,
3162
+ "loss": 2.468,
3163
+ "step": 225500
3164
+ },
3165
+ {
3166
+ "epoch": 9.7,
3167
+ "learning_rate": 1.1838996520469092e-05,
3168
+ "loss": 2.4725,
3169
+ "step": 226000
3170
+ },
3171
+ {
3172
+ "epoch": 9.73,
3173
+ "learning_rate": 1.0979853086472787e-05,
3174
+ "loss": 2.4756,
3175
+ "step": 226500
3176
+ },
3177
+ {
3178
+ "epoch": 9.75,
3179
+ "learning_rate": 1.0120709652476482e-05,
3180
+ "loss": 2.4708,
3181
+ "step": 227000
3182
+ },
3183
+ {
3184
+ "epoch": 9.77,
3185
+ "learning_rate": 9.261566218480177e-06,
3186
+ "loss": 2.4695,
3187
+ "step": 227500
3188
+ },
3189
+ {
3190
+ "epoch": 9.79,
3191
+ "learning_rate": 8.40242278448387e-06,
3192
+ "loss": 2.4601,
3193
+ "step": 228000
3194
+ },
3195
+ {
3196
+ "epoch": 9.79,
3197
+ "eval_loss": 3.203986406326294,
3198
+ "eval_runtime": 70.61,
3199
+ "eval_samples_per_second": 35.406,
3200
+ "eval_steps_per_second": 8.851,
3201
+ "step": 228000
3202
+ },
3203
+ {
3204
+ "epoch": 9.81,
3205
+ "learning_rate": 7.543279350487564e-06,
3206
+ "loss": 2.4745,
3207
+ "step": 228500
3208
+ },
3209
+ {
3210
+ "epoch": 9.83,
3211
+ "learning_rate": 6.684135916491259e-06,
3212
+ "loss": 2.4692,
3213
+ "step": 229000
3214
+ },
3215
+ {
3216
+ "epoch": 9.85,
3217
+ "learning_rate": 5.824992482494953e-06,
3218
+ "loss": 2.4679,
3219
+ "step": 229500
3220
+ },
3221
+ {
3222
+ "epoch": 9.88,
3223
+ "learning_rate": 4.965849048498647e-06,
3224
+ "loss": 2.4675,
3225
+ "step": 230000
3226
+ },
3227
+ {
3228
+ "epoch": 9.9,
3229
+ "learning_rate": 4.106705614502342e-06,
3230
+ "loss": 2.4678,
3231
+ "step": 230500
3232
+ },
3233
+ {
3234
+ "epoch": 9.92,
3235
+ "learning_rate": 3.2475621805060358e-06,
3236
+ "loss": 2.4679,
3237
+ "step": 231000
3238
+ },
3239
+ {
3240
+ "epoch": 9.94,
3241
+ "learning_rate": 2.38841874650973e-06,
3242
+ "loss": 2.4683,
3243
+ "step": 231500
3244
+ },
3245
+ {
3246
+ "epoch": 9.96,
3247
+ "learning_rate": 1.529275312513424e-06,
3248
+ "loss": 2.4613,
3249
+ "step": 232000
3250
+ },
3251
+ {
3252
+ "epoch": 9.96,
3253
+ "eval_loss": 3.2050249576568604,
3254
+ "eval_runtime": 69.6048,
3255
+ "eval_samples_per_second": 35.917,
3256
+ "eval_steps_per_second": 8.979,
3257
+ "step": 232000
+ }
+ ],
+ "logging_steps": 500,
+ "max_steps": 232890,
+ "num_train_epochs": 10,
+ "save_steps": 8000,
+ "total_flos": 5.270438359850582e+18,
+ "trial_name": null,
+ "trial_params": null
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c7ae6060e2dd22815f89a335924b12b85b7889e44828b24a16a0f7fe1752e1f4
+ size 4472