habanoz committed on
Commit 21a1653 (1 parent: 17434fc)

Upload folder using huggingface_hub

adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:58c4abac213a98ad1575e336811bac987fa1af4a807d81c136bd4215e0e3ccb4
+ oid sha256:0b78df766e4e468b9aab567b6ceaecb08b3db111cb1b090bc4e219c1be8f668e
  size 201892112
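This commit stores every `.safetensors`, `.pt`, `.pth`, `.bin` and `tokenizer.model` entry as a Git LFS pointer, not as the binary itself. A pointer is three `key value` lines: the spec `version`, the SHA-256 `oid`, and the real file's `size` in bytes.

The minimal Python sketch below parses such pointers and compares them. `parse_lfs_pointer` and `LfsPointer` are hypothetical helpers, not part of `huggingface_hub` or `git-lfs`. Using the pointer text from this commit, the comparison shows two things:
- The updated root `adapter_model.safetensors` has the same oid as `checkpoint-1488/adapter_model.safetensors`.
- The replaced root oid is the one stored in `checkpoint-744/adapter_model.safetensors`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LfsPointer:
    version: str
    oid: str   # "sha256:<hex digest>" of the real file
    size: int  # size of the real file in bytes


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse a Git LFS pointer made of 'key value' lines (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        raise ValueError("not a Git LFS pointer")
    return LfsPointer(fields["version"], fields["oid"], int(fields["size"]))


# Pointer contents copied from this commit.
root_new = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:0b78df766e4e468b9aab567b6ceaecb08b3db111cb1b090bc4e219c1be8f668e
size 201892112
""")
ckpt_744 = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:58c4abac213a98ad1575e336811bac987fa1af4a807d81c136bd4215e0e3ccb4
size 201892112
""")
ckpt_1488 = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:0b78df766e4e468b9aab567b6ceaecb08b3db111cb1b090bc4e219c1be8f668e
size 201892112
""")

print(root_new == ckpt_1488)         # True: same version, oid and size
print(root_new.oid == ckpt_744.oid)  # False: the old root oid is checkpoint-744's
```

Equal sizes alone prove nothing here, because every LoRA checkpoint of the same shape has the same byte count. Only the `oid` identifies the content.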
checkpoint-1488/README.md ADDED
@@ -0,0 +1,219 @@
+ ---
+ library_name: peft
+ base_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Data Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+
+
+ ## Training procedure
+
+
+ The following `bitsandbytes` quantization config was used during training:
+ - quant_method: bitsandbytes
+ - load_in_8bit: False
+ - load_in_4bit: True
+ - llm_int8_threshold: 6.0
+ - llm_int8_skip_modules: None
+ - llm_int8_enable_fp32_cpu_offload: False
+ - llm_int8_has_fp16_weight: False
+ - bnb_4bit_quant_type: nf4
+ - bnb_4bit_use_double_quant: True
+ - bnb_4bit_compute_dtype: float16
+
+ ### Framework versions
+
+
+ - PEFT 0.6.2
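The quantization config above stores the base weights as 4-bit NF4 (`bnb_4bit_quant_type: nf4`). With `bnb_4bit_use_double_quant: True`, it also quantizes the per-block scaling constants themselves.

The small sketch below computes how many extra bits per weight those constants cost. It assumes the block sizes reported in the QLoRA paper: 64 weights per first-level block, and 256 first-level constants per fp32 second-level constant stored in 8 bits. These are assumptions about the bitsandbytes defaults, not values recorded in this commit.

```python
def constant_overhead_bits(block_size: int = 64,
                           double_quant: bool = True,
                           dq_bits: int = 8,
                           dq_block_size: int = 256) -> float:
    """Extra bits per quantized weight spent on block scaling constants."""
    if not double_quant:
        # One fp32 absmax constant per block of `block_size` weights.
        return 32 / block_size
    # Each block constant is itself stored in `dq_bits` bits, plus one fp32
    # constant for every `dq_block_size` first-level constants.
    return dq_bits / block_size + 32 / (block_size * dq_block_size)


print(constant_overhead_bits(double_quant=False))  # 0.5
print(constant_overhead_bits(double_quant=True))   # 0.126953125
```

Under these assumptions, double quantization brings the stored cost of an NF4 weight from about 4.5 bits down to about 4.127 bits.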
checkpoint-1488/adapter_config.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T",
+   "bias": "none",
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "lora_alpha": 16.0,
+   "lora_dropout": 0.1,
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "r": 64,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "k_proj",
+     "v_proj",
+     "o_proj",
+     "up_proj",
+     "q_proj",
+     "down_proj",
+     "gate_proj"
+   ],
+   "task_type": "CAUSAL_LM"
+ }
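With `r: 64` and `lora_alpha: 16.0`, PEFT scales the low-rank branch by `lora_alpha / r`. Each adapted projection therefore computes `W x + 0.25 * B A x`, with `lora_dropout` applied to `x` on the LoRA path during training.

The sketch below derives that scale and the number of trainable LoRA parameters. The TinyLlama shapes are assumptions taken from the base model's published config, not from this commit:
- hidden size 2048
- MLP size 5632
- 22 layers
- 4 key/value heads of dimension 64

Under those assumptions the count, stored as fp32, comes to 201,850,880 bytes. That is about 41 KB less than the 201,892,112-byte `adapter_model.safetensors` pointer size; the gap is plausibly the safetensors JSON header.

```python
import json

# Subset of checkpoint-1488/adapter_config.json from this commit.
ADAPTER_CONFIG = json.loads("""
{
  "lora_alpha": 16.0,
  "lora_dropout": 0.1,
  "r": 64,
  "target_modules": ["k_proj", "v_proj", "o_proj", "up_proj",
                     "q_proj", "down_proj", "gate_proj"]
}
""")

# Assumed TinyLlama-1.1B shapes (not part of this commit).
HIDDEN, INTERMEDIATE, LAYERS = 2048, 5632, 22
KV_DIM = 4 * 64  # 4 key/value heads x head_dim 64 (grouped-query attention)

# (in_features, out_features) of each Llama projection.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_scale(cfg: dict) -> float:
    # The LoRA branch output is multiplied by lora_alpha / r.
    return cfg["lora_alpha"] / cfg["r"]


def lora_trainable_params(cfg: dict) -> int:
    r = cfg["r"]
    # lora_A is r x in_features and lora_B is out_features x r,
    # for every targeted module in every decoder layer.
    per_layer = sum(r * (SHAPES[m][0] + SHAPES[m][1]) for m in cfg["target_modules"])
    return per_layer * LAYERS


params = lora_trainable_params(ADAPTER_CONFIG)
print(lora_scale(ADAPTER_CONFIG))  # 0.25
print(params)                      # 50462720
print(params * 4)                  # 201850880 bytes if stored as fp32
```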
checkpoint-1488/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0b78df766e4e468b9aab567b6ceaecb08b3db111cb1b090bc4e219c1be8f668e
+ size 201892112
checkpoint-1488/adapter_model/README.md ADDED
@@ -0,0 +1,219 @@
+ ---
+ library_name: peft
+ base_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Data Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+
+
+ ## Training procedure
+
+
+ The following `bitsandbytes` quantization config was used during training:
+ - quant_method: bitsandbytes
+ - load_in_8bit: False
+ - load_in_4bit: True
+ - llm_int8_threshold: 6.0
+ - llm_int8_skip_modules: None
+ - llm_int8_enable_fp32_cpu_offload: False
+ - llm_int8_has_fp16_weight: False
+ - bnb_4bit_quant_type: nf4
+ - bnb_4bit_use_double_quant: True
+ - bnb_4bit_compute_dtype: float16
+
+ ### Framework versions
+
+
+ - PEFT 0.6.2
checkpoint-1488/adapter_model/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0b468d544465a2b6cc1e65c27e7c64e41da179202a4fce0f4bbe3189df84d283
+ size 9912320
checkpoint-1488/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:121f203e1f87745866ecf4ce56c64ee4e592768bffcfdeb75c450637697c60b4
+ size 403965498
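At 403,965,498 bytes, `optimizer.pt` is almost exactly twice the size of the adapter weights. That fits an Adam-family optimizer keeping two fp32 moment buffers (`exp_avg` and `exp_avg_sq`) per trainable parameter. The optimizer type is an assumption, because `training_args.bin` appears here only as an LFS pointer.

The quick check below takes 50,462,720 trainable LoRA parameters, which is also an assumption: r = 64 on the seven target modules of a 22-layer TinyLlama base.

```python
TRAINABLE_PARAMS = 50_462_720  # assumed: LoRA r=64 on 7 projections x 22 layers
BYTES_FP32 = 4
MOMENTS = 2  # assumed Adam-style state: exp_avg and exp_avg_sq per parameter

expected_state = TRAINABLE_PARAMS * BYTES_FP32 * MOMENTS
observed = 403_965_498  # size of checkpoint-1488/optimizer.pt from its LFS pointer

print(expected_state)             # 403701760
print(observed - expected_state)  # 263738
```

The remaining ~264 KB is presumably `torch.save` serialization overhead, such as per-parameter step counters and `param_groups` metadata.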
checkpoint-1488/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5bfa6b207b520c1195c1cfcbdf8fc7dedd039d977889bc961a574c174d01f6eb
+ size 14244
checkpoint-1488/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:db7f9d155f11eec6bdf3d43c5ca6f5f6cf43cf75f6656a15f9fe0a28724699a8
+ size 1064
checkpoint-1488/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "<unk>",
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
checkpoint-1488/tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
+ size 499723
checkpoint-1488/tokenizer_config.json ADDED
@@ -0,0 +1,43 @@
+ {
+   "add_bos_token": true,
+   "add_eos_token": false,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "chat_template": "{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% elif false == true and not '<<SYS>>' in messages[0]['content'] %}{% set loop_messages = messages %}{% set system_message = 'You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\\n\\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don\\'t know the answer to a question, please don\\'t share false information.' %}{% else %}{% set loop_messages = messages %}{% set system_message = false %}{% endif %}{% for message in loop_messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if loop.index0 == 0 and system_message != false %}{% set content = '<<SYS>>\\n' + system_message + '\\n<</SYS>>\\n\\n' + message['content'] %}{% else %}{% set content = message['content'] %}{% endif %}{% if message['role'] == 'user' %}{{ bos_token + '[INST] ' + content.strip() + ' [/INST]' }}{% elif message['role'] == 'system' %}{{ '<<SYS>>\\n' + content.strip() + '\\n<</SYS>>\\n\\n' }}{% elif message['role'] == 'assistant' %}{{ ' ' + content.strip() + ' ' + eos_token }}{% endif %}{% endfor %}",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "</s>",
+   "legacy": false,
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "<unk>",
+   "padding_side": "right",
+   "sp_model_kwargs": {},
+   "spaces_between_special_tokens": false,
+   "tokenizer_class": "LlamaTokenizer",
+   "unk_token": "<unk>",
+   "use_default_system_prompt": false
+ }
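The `chat_template` above is the Llama-2 `[INST]` / `<<SYS>>` prompt format. Its `elif false == true ...` branch can never be taken, so the long default system prompt is dead code; this is consistent with `use_default_system_prompt: false`.

Below is a simplified pure-Python sketch of what the template renders in the usual case: an optional leading system message, then strictly alternating user/assistant turns. `render_llama2_chat` is a hypothetical helper. In practice, `tokenizer.apply_chat_template` in `transformers` renders the Jinja template directly.

```python
BOS, EOS = "<s>", "</s>"  # bos_token / eos_token from this tokenizer_config.json


def render_llama2_chat(messages: list) -> str:
    """Mimic the chat_template for an optional system message plus user/assistant turns."""
    system = None
    if messages and messages[0]["role"] == "system":
        system, messages = messages[0]["content"], messages[1:]

    parts = []
    for i, msg in enumerate(messages):
        # Roles must alternate, starting with the user (same check as the template).
        if (msg["role"] == "user") != (i % 2 == 0):
            raise ValueError("Conversation roles must alternate user/assistant/user/assistant/...")
        content = msg["content"]
        if i == 0 and system is not None:
            # The system prompt is folded into the first user turn.
            content = f"<<SYS>>\n{system}\n<</SYS>>\n\n{content}"
        if msg["role"] == "user":
            parts.append(f"{BOS}[INST] {content.strip()} [/INST]")
        elif msg["role"] == "assistant":
            parts.append(f" {content.strip()} {EOS}")
    return "".join(parts)


prompt = render_llama2_chat([
    {"role": "system", "content": "Answer briefly."},
    {"role": "user", "content": "What is LoRA?"},
    {"role": "assistant", "content": "A low-rank adapter."},
    {"role": "user", "content": "Rank used here?"},
])
print(repr(prompt))
```

The rendered text already begins with `<s>`, and `add_bos_token` is `true`. Tokenizing the string with default special-token handling may therefore prepend a second BOS.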
checkpoint-1488/trainer_state.json ADDED
The diff for this file is too large to render.
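The Hugging Face `Trainer` writes `trainer_state.json` at each checkpoint. Its `log_history` list holds the logged metrics, with entries carrying keys such as `loss`, `learning_rate`, `epoch` and `step`.

The minimal standard-library sketch below pulls the training-loss curve out of it. The keys present depend on the run's logging settings, and the checkpoint path in the usage comment is illustrative, so treat both as assumptions.

```python
import json
from pathlib import Path


def load_trainer_state(path: str) -> dict:
    return json.loads(Path(path).read_text())


def loss_curve(state: dict) -> list:
    """(step, loss) pairs from the log_history of a Trainer state dict."""
    # Evaluation entries carry "eval_loss" and the final summary carries
    # "train_loss", so only per-step training logs are kept.
    return [(entry["step"], entry["loss"])
            for entry in state.get("log_history", [])
            if "loss" in entry]


# Usage against the checkpoint added in this commit (path assumed):
# print(loss_curve(load_trainer_state("checkpoint-1488/trainer_state.json"))[-5:])
```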
 
checkpoint-1488/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:256eb98e1514db6b5f4110313cf6834a546393b9505cfda85e800f159569f9ec
+ size 6840
checkpoint-744/README.md ADDED
@@ -0,0 +1,219 @@
+ ---
+ library_name: peft
+ base_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Data Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+
+
+ ## Training procedure
+
+
+ The following `bitsandbytes` quantization config was used during training:
+ - quant_method: bitsandbytes
+ - load_in_8bit: False
+ - load_in_4bit: True
+ - llm_int8_threshold: 6.0
+ - llm_int8_skip_modules: None
+ - llm_int8_enable_fp32_cpu_offload: False
+ - llm_int8_has_fp16_weight: False
+ - bnb_4bit_quant_type: nf4
+ - bnb_4bit_use_double_quant: True
+ - bnb_4bit_compute_dtype: float16
+
+ ### Framework versions
+
+
+ - PEFT 0.6.2
checkpoint-744/adapter_config.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T",
+   "bias": "none",
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "lora_alpha": 16.0,
+   "lora_dropout": 0.1,
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "r": 64,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "k_proj",
+     "v_proj",
+     "o_proj",
+     "up_proj",
+     "q_proj",
+     "down_proj",
+     "gate_proj"
+   ],
+   "task_type": "CAUSAL_LM"
+ }
checkpoint-744/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:58c4abac213a98ad1575e336811bac987fa1af4a807d81c136bd4215e0e3ccb4
+ size 201892112
checkpoint-744/adapter_model/README.md ADDED
@@ -0,0 +1,219 @@
+ ---
+ library_name: peft
+ base_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Data Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
162
+ #### Hardware
163
+
164
+ [More Information Needed]
165
+
166
+ #### Software
167
+
168
+ [More Information Needed]
169
+
170
+ ## Citation [optional]
171
+
172
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
173
+
174
+ **BibTeX:**
175
+
176
+ [More Information Needed]
177
+
178
+ **APA:**
179
+
180
+ [More Information Needed]
181
+
182
+ ## Glossary [optional]
183
+
184
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
185
+
186
+ [More Information Needed]
187
+
188
+ ## More Information [optional]
189
+
190
+ [More Information Needed]
191
+
192
+ ## Model Card Authors [optional]
193
+
194
+ [More Information Needed]
195
+
196
+ ## Model Card Contact
197
+
198
+ [More Information Needed]
199
+
200
+
201
+ ## Training procedure
202
+
203
+
204
+ The following `bitsandbytes` quantization config was used during training:
205
+ - quant_method: bitsandbytes
206
+ - load_in_8bit: False
207
+ - load_in_4bit: True
208
+ - llm_int8_threshold: 6.0
209
+ - llm_int8_skip_modules: None
210
+ - llm_int8_enable_fp32_cpu_offload: False
211
+ - llm_int8_has_fp16_weight: False
212
+ - bnb_4bit_quant_type: nf4
213
+ - bnb_4bit_use_double_quant: True
214
+ - bnb_4bit_compute_dtype: float16
215
+
216
+ ### Framework versions
217
+
218
+
219
+ - PEFT 0.6.2
checkpoint-744/adapter_model/adapter_config.json ADDED
@@ -0,0 +1,28 @@
1
+ {
2
+ "alpha_pattern": {},
3
+ "auto_mapping": null,
4
+ "base_model_name_or_path": "TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T",
5
+ "bias": "none",
6
+ "fan_in_fan_out": false,
7
+ "inference_mode": true,
8
+ "init_lora_weights": true,
9
+ "layers_pattern": null,
10
+ "layers_to_transform": null,
11
+ "lora_alpha": 16.0,
12
+ "lora_dropout": 0.1,
13
+ "modules_to_save": null,
14
+ "peft_type": "LORA",
15
+ "r": 64,
16
+ "rank_pattern": {},
17
+ "revision": null,
18
+ "target_modules": [
19
+ "k_proj",
20
+ "v_proj",
21
+ "o_proj",
22
+ "up_proj",
23
+ "q_proj",
24
+ "down_proj",
25
+ "gate_proj"
26
+ ],
27
+ "task_type": "CAUSAL_LM"
28
+ }
checkpoint-744/adapter_model/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:58c4abac213a98ad1575e336811bac987fa1af4a807d81c136bd4215e0e3ccb4
3
+ size 201892112
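The `adapter_config.json` above sets `r` to 64 and `lora_alpha` to 16, and it adapts seven projection modules with `bias` set to `"none"`. This gives a quick way to sanity-check the 201,892,112-byte size in the pointer above:

- With standard (non-rsLoRA) scaling, PEFT multiplies the update `B @ A` by `lora_alpha / r`, which is 0.25 here.
- Each adapted `Linear(in, out)` adds `r * (in + out)` weights.

The layer dimensions in the sketch are an assumption: they come from the public TinyLlama-1.1B config, not from this commit. Under those dimensions, fp32 weights take 201,850,880 bytes. That is about 41 KB less than the LFS size, which is consistent with fp32 storage plus a small file header.

```python
"""Back-of-the-envelope LoRA size check for the adapter config above.

Model dimensions are ASSUMED from the public TinyLlama-1.1B config;
they are not recorded in this commit.
"""

# Values copied from adapter_config.json
R = 64
LORA_ALPHA = 16.0

# ASSUMED TinyLlama-1.1B dimensions
HIDDEN = 2048
INTERMEDIATE = 5632
NUM_LAYERS = 22
KV_DIM = 4 * 64  # num_key_value_heads * head_dim (grouped-query attention)

# (in_features, out_features) for every module listed in target_modules
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_scaling(alpha: float, r: int) -> float:
    """Factor applied to B @ A in standard (non-rsLoRA) LoRA."""
    return alpha / r


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """A is r x in_features and B is out_features x r; no bias since bias='none'."""
    return r * in_features + out_features * r


def total_lora_params() -> int:
    """Trainable LoRA weights summed over all target modules and layers."""
    per_layer = sum(lora_params(i, o, R) for i, o in TARGET_SHAPES.values())
    return per_layer * NUM_LAYERS


if __name__ == "__main__":
    n = total_lora_params()
    print(f"scaling = {lora_scaling(LORA_ALPHA, R)}")
    print(f"trainable params = {n:,}")
    print(f"fp32 bytes = {n * 4:,} (LFS pointer size: 201,892,112)")
```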
checkpoint-744/optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:20536752a5e86854d1f53ba55fba9fe428b8f0fe5b37b2ea5e33ea021e8aeffa
3
+ size 403965498
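The binary files in this commit (`adapter_model.safetensors`, `optimizer.pt`, `rng_state.pth`, and the rest) are stored as Git LFS pointers, not as their actual contents. Each pointer has three `key value` lines: the spec version, the SHA-256 of the real object, and its size in bytes.

The small parser below is a sketch based only on the pointer layout shown here; it is not the git-lfs implementation. It makes the file sizes easy to compare. For example, `optimizer.pt` is almost exactly twice the size of the adapter weights. That is what you would expect if the optimizer keeps two fp32 state tensors per trainable weight, as Adam-style optimizers do. This commit does not name the optimizer, so that reading is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LfsPointer:
    version: str
    oid: str   # hex SHA-256 digest of the real file contents
    size: int  # size of the real file in bytes


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse a Git LFS pointer made of 'key value' lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unexpected oid algorithm: {algo!r}")
    return LfsPointer(fields["version"], digest, int(fields["size"]))


# Pointer texts copied from checkpoint-744 in this commit
ADAPTER = """version https://git-lfs.github.com/spec/v1
oid sha256:58c4abac213a98ad1575e336811bac987fa1af4a807d81c136bd4215e0e3ccb4
size 201892112"""

OPTIMIZER = """version https://git-lfs.github.com/spec/v1
oid sha256:20536752a5e86854d1f53ba55fba9fe428b8f0fe5b37b2ea5e33ea021e8aeffa
size 403965498"""

if __name__ == "__main__":
    adapter = parse_lfs_pointer(ADAPTER)
    optimizer = parse_lfs_pointer(OPTIMIZER)
    print(f"optimizer / adapter size ratio: {optimizer.size / adapter.size:.4f}")
```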
checkpoint-744/rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8b19be971bf9250d62e5aaa12aa362a1a843d0744f35f26f8d5da21b0c056d9c
3
+ size 14244
checkpoint-744/scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3093b555292e8ea86086206280646d262f8b13b1127fd83bff56d7228bca1ca3
3
+ size 1064
checkpoint-744/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
1
+ {
2
+ "bos_token": {
3
+ "content": "<s>",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "eos_token": {
10
+ "content": "</s>",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": "<unk>",
17
+ "unk_token": {
18
+ "content": "<unk>",
19
+ "lstrip": false,
20
+ "normalized": false,
21
+ "rstrip": false,
22
+ "single_word": false
23
+ }
24
+ }
checkpoint-744/tokenizer.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
3
+ size 499723
checkpoint-744/tokenizer_config.json ADDED
@@ -0,0 +1,43 @@
1
+ {
2
+ "add_bos_token": true,
3
+ "add_eos_token": false,
4
+ "added_tokens_decoder": {
5
+ "0": {
6
+ "content": "<unk>",
7
+ "lstrip": false,
8
+ "normalized": false,
9
+ "rstrip": false,
10
+ "single_word": false,
11
+ "special": true
12
+ },
13
+ "1": {
14
+ "content": "<s>",
15
+ "lstrip": false,
16
+ "normalized": false,
17
+ "rstrip": false,
18
+ "single_word": false,
19
+ "special": true
20
+ },
21
+ "2": {
22
+ "content": "</s>",
23
+ "lstrip": false,
24
+ "normalized": false,
25
+ "rstrip": false,
26
+ "single_word": false,
27
+ "special": true
28
+ }
29
+ },
30
+ "bos_token": "<s>",
31
+ "chat_template": "{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% elif false == true and not '<<SYS>>' in messages[0]['content'] %}{% set loop_messages = messages %}{% set system_message = 'You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\\n\\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don\\'t know the answer to a question, please don\\'t share false information.' %}{% else %}{% set loop_messages = messages %}{% set system_message = false %}{% endif %}{% for message in loop_messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if loop.index0 == 0 and system_message != false %}{% set content = '<<SYS>>\\n' + system_message + '\\n<</SYS>>\\n\\n' + message['content'] %}{% else %}{% set content = message['content'] %}{% endif %}{% if message['role'] == 'user' %}{{ bos_token + '[INST] ' + content.strip() + ' [/INST]' }}{% elif message['role'] == 'system' %}{{ '<<SYS>>\\n' + content.strip() + '\\n<</SYS>>\\n\\n' }}{% elif message['role'] == 'assistant' %}{{ ' ' + content.strip() + ' ' + eos_token }}{% endif %}{% endfor %}",
32
+ "clean_up_tokenization_spaces": false,
33
+ "eos_token": "</s>",
34
+ "legacy": false,
35
+ "model_max_length": 1000000000000000019884624838656,
36
+ "pad_token": "<unk>",
37
+ "padding_side": "right",
38
+ "sp_model_kwargs": {},
39
+ "spaces_between_special_tokens": false,
40
+ "tokenizer_class": "LlamaTokenizer",
41
+ "unk_token": "<unk>",
42
+ "use_default_system_prompt": false
43
+ }
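The `chat_template` in the `tokenizer_config.json` above is the Llama-2 `[INST]` format. Its `elif false == true ...` branch can never run, so no default system prompt is injected, which matches `"use_default_system_prompt": false`.

The function below is a plain-Python transliteration of that Jinja template, meant for reading and testing prompt strings. It is not the tokenizer's own code path: in `transformers`, the template is rendered by `tokenizer.apply_chat_template`.

```python
BOS_TOKEN = "<s>"   # "bos_token" in tokenizer_config.json
EOS_TOKEN = "</s>"  # "eos_token" in tokenizer_config.json


def render_chat(messages: list[dict[str, str]]) -> str:
    """Mirror of the Jinja chat_template above (Llama-2 [INST] style)."""
    # An optional leading system message is folded into the first user turn.
    if messages and messages[0]["role"] == "system":
        loop_messages = messages[1:]
        system_message = messages[0]["content"]
    else:
        loop_messages = messages
        system_message = None  # plays the role of the template's `false`

    parts = []
    for index, message in enumerate(loop_messages):
        # User turns must sit at even positions; any other role must sit at odd ones.
        if (message["role"] == "user") != (index % 2 == 0):
            raise ValueError(
                "Conversation roles must alternate user/assistant/user/assistant/..."
            )
        content = message["content"]
        if index == 0 and system_message is not None:
            content = "<<SYS>>\n" + system_message + "\n<</SYS>>\n\n" + content
        if message["role"] == "user":
            parts.append(BOS_TOKEN + "[INST] " + content.strip() + " [/INST]")
        elif message["role"] == "system":
            parts.append("<<SYS>>\n" + content.strip() + "\n<</SYS>>\n\n")
        elif message["role"] == "assistant":
            parts.append(" " + content.strip() + " " + EOS_TOKEN)
    return "".join(parts)
```

For a single user turn `"Hi"`, this yields `<s>[INST] Hi [/INST]`. Each completed assistant reply is closed with `</s>`, and each later user turn starts again with `<s>`.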
checkpoint-744/trainer_state.json ADDED
@@ -0,0 +1,4539 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 0.9996640913671482,
5
+ "eval_steps": 100,
6
+ "global_step": 744,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.0,
13
+ "learning_rate": 0.0002,
14
+ "loss": 1.4925,
15
+ "step": 1
16
+ },
17
+ {
18
+ "epoch": 0.0,
19
+ "learning_rate": 0.0002,
20
+ "loss": 1.5144,
21
+ "step": 2
22
+ },
23
+ {
24
+ "epoch": 0.0,
25
+ "learning_rate": 0.0002,
26
+ "loss": 1.3874,
27
+ "step": 3
28
+ },
29
+ {
30
+ "epoch": 0.01,
31
+ "learning_rate": 0.0002,
32
+ "loss": 1.8667,
33
+ "step": 4
34
+ },
35
+ {
36
+ "epoch": 0.01,
37
+ "learning_rate": 0.0002,
38
+ "loss": 1.5413,
39
+ "step": 5
40
+ },
41
+ {
42
+ "epoch": 0.01,
43
+ "learning_rate": 0.0002,
44
+ "loss": 1.3561,
45
+ "step": 6
46
+ },
47
+ {
48
+ "epoch": 0.01,
49
+ "learning_rate": 0.0002,
50
+ "loss": 1.6307,
51
+ "step": 7
52
+ },
53
+ {
54
+ "epoch": 0.01,
55
+ "learning_rate": 0.0002,
56
+ "loss": 1.6405,
57
+ "step": 8
58
+ },
59
+ {
60
+ "epoch": 0.01,
61
+ "learning_rate": 0.0002,
62
+ "loss": 1.5505,
63
+ "step": 9
64
+ },
65
+ {
66
+ "epoch": 0.01,
67
+ "learning_rate": 0.0002,
68
+ "loss": 1.3125,
69
+ "step": 10
70
+ },
71
+ {
72
+ "epoch": 0.01,
73
+ "learning_rate": 0.0002,
74
+ "loss": 1.3672,
75
+ "step": 11
76
+ },
77
+ {
78
+ "epoch": 0.02,
79
+ "learning_rate": 0.0002,
80
+ "loss": 1.5262,
81
+ "step": 12
82
+ },
83
+ {
84
+ "epoch": 0.02,
85
+ "learning_rate": 0.0002,
86
+ "loss": 1.6935,
87
+ "step": 13
88
+ },
89
+ {
90
+ "epoch": 0.02,
91
+ "learning_rate": 0.0002,
92
+ "loss": 1.4954,
93
+ "step": 14
94
+ },
95
+ {
96
+ "epoch": 0.02,
97
+ "learning_rate": 0.0002,
98
+ "loss": 1.4848,
99
+ "step": 15
100
+ },
101
+ {
102
+ "epoch": 0.02,
103
+ "learning_rate": 0.0002,
104
+ "loss": 1.4264,
105
+ "step": 16
106
+ },
107
+ {
108
+ "epoch": 0.02,
109
+ "learning_rate": 0.0002,
110
+ "loss": 1.473,
111
+ "step": 17
112
+ },
113
+ {
114
+ "epoch": 0.02,
115
+ "learning_rate": 0.0002,
116
+ "loss": 1.4026,
117
+ "step": 18
118
+ },
119
+ {
120
+ "epoch": 0.03,
121
+ "learning_rate": 0.0002,
122
+ "loss": 1.5937,
123
+ "step": 19
124
+ },
125
+ {
126
+ "epoch": 0.03,
127
+ "learning_rate": 0.0002,
128
+ "loss": 1.6744,
129
+ "step": 20
130
+ },
131
+ {
132
+ "epoch": 0.03,
133
+ "learning_rate": 0.0002,
134
+ "loss": 1.3461,
135
+ "step": 21
136
+ },
137
+ {
138
+ "epoch": 0.03,
139
+ "learning_rate": 0.0002,
140
+ "loss": 1.4358,
141
+ "step": 22
142
+ },
143
+ {
144
+ "epoch": 0.03,
145
+ "learning_rate": 0.0002,
146
+ "loss": 1.2995,
147
+ "step": 23
148
+ },
149
+ {
150
+ "epoch": 0.03,
151
+ "learning_rate": 0.0002,
152
+ "loss": 1.4141,
153
+ "step": 24
154
+ },
155
+ {
156
+ "epoch": 0.03,
157
+ "learning_rate": 0.0002,
158
+ "loss": 1.6971,
159
+ "step": 25
160
+ },
161
+ {
162
+ "epoch": 0.03,
163
+ "learning_rate": 0.0002,
164
+ "loss": 1.4573,
165
+ "step": 26
166
+ },
167
+ {
168
+ "epoch": 0.04,
169
+ "learning_rate": 0.0002,
170
+ "loss": 1.3961,
171
+ "step": 27
172
+ },
173
+ {
174
+ "epoch": 0.04,
175
+ "learning_rate": 0.0002,
176
+ "loss": 1.2911,
177
+ "step": 28
178
+ },
179
+ {
180
+ "epoch": 0.04,
181
+ "learning_rate": 0.0002,
182
+ "loss": 1.5642,
183
+ "step": 29
184
+ },
185
+ {
186
+ "epoch": 0.04,
187
+ "learning_rate": 0.0002,
188
+ "loss": 1.6783,
189
+ "step": 30
190
+ },
191
+ {
192
+ "epoch": 0.04,
193
+ "learning_rate": 0.0002,
194
+ "loss": 1.6658,
195
+ "step": 31
196
+ },
197
+ {
198
+ "epoch": 0.04,
199
+ "learning_rate": 0.0002,
200
+ "loss": 1.6881,
201
+ "step": 32
202
+ },
203
+ {
204
+ "epoch": 0.04,
205
+ "learning_rate": 0.0002,
206
+ "loss": 1.5346,
207
+ "step": 33
208
+ },
209
+ {
210
+ "epoch": 0.05,
211
+ "learning_rate": 0.0002,
212
+ "loss": 1.4087,
213
+ "step": 34
214
+ },
215
+ {
216
+ "epoch": 0.05,
217
+ "learning_rate": 0.0002,
218
+ "loss": 1.3902,
219
+ "step": 35
220
+ },
221
+ {
222
+ "epoch": 0.05,
223
+ "learning_rate": 0.0002,
224
+ "loss": 1.5317,
225
+ "step": 36
226
+ },
227
+ {
228
+ "epoch": 0.05,
229
+ "learning_rate": 0.0002,
230
+ "loss": 1.902,
231
+ "step": 37
232
+ },
233
+ {
234
+ "epoch": 0.05,
235
+ "learning_rate": 0.0002,
236
+ "loss": 1.5535,
237
+ "step": 38
238
+ },
239
+ {
240
+ "epoch": 0.05,
241
+ "learning_rate": 0.0002,
242
+ "loss": 1.2245,
243
+ "step": 39
244
+ },
245
+ {
246
+ "epoch": 0.05,
247
+ "learning_rate": 0.0002,
248
+ "loss": 1.5001,
249
+ "step": 40
250
+ },
251
+ {
252
+ "epoch": 0.06,
253
+ "learning_rate": 0.0002,
254
+ "loss": 1.3615,
255
+ "step": 41
256
+ },
257
+ {
258
+ "epoch": 0.06,
259
+ "learning_rate": 0.0002,
260
+ "loss": 1.3751,
261
+ "step": 42
262
+ },
263
+ {
264
+ "epoch": 0.06,
265
+ "learning_rate": 0.0002,
266
+ "loss": 1.4114,
267
+ "step": 43
268
+ },
269
+ {
270
+ "epoch": 0.06,
271
+ "learning_rate": 0.0002,
272
+ "loss": 1.4872,
273
+ "step": 44
274
+ },
275
+ {
276
+ "epoch": 0.06,
277
+ "learning_rate": 0.0002,
278
+ "loss": 1.1861,
279
+ "step": 45
280
+ },
281
+ {
282
+ "epoch": 0.06,
283
+ "learning_rate": 0.0002,
284
+ "loss": 1.4556,
285
+ "step": 46
286
+ },
287
+ {
288
+ "epoch": 0.06,
289
+ "learning_rate": 0.0002,
290
+ "loss": 1.4738,
291
+ "step": 47
292
+ },
293
+ {
294
+ "epoch": 0.06,
295
+ "learning_rate": 0.0002,
296
+ "loss": 1.5168,
297
+ "step": 48
298
+ },
299
+ {
300
+ "epoch": 0.07,
301
+ "learning_rate": 0.0002,
302
+ "loss": 1.4411,
303
+ "step": 49
304
+ },
305
+ {
306
+ "epoch": 0.07,
307
+ "learning_rate": 0.0002,
308
+ "loss": 1.4251,
309
+ "step": 50
310
+ },
311
+ {
312
+ "epoch": 0.07,
313
+ "learning_rate": 0.0002,
314
+ "loss": 1.2558,
315
+ "step": 51
316
+ },
317
+ {
318
+ "epoch": 0.07,
319
+ "learning_rate": 0.0002,
320
+ "loss": 1.3872,
321
+ "step": 52
322
+ },
323
+ {
324
+ "epoch": 0.07,
325
+ "learning_rate": 0.0002,
326
+ "loss": 1.3716,
327
+ "step": 53
328
+ },
329
+ {
330
+ "epoch": 0.07,
331
+ "learning_rate": 0.0002,
332
+ "loss": 1.2279,
333
+ "step": 54
334
+ },
335
+ {
336
+ "epoch": 0.07,
337
+ "learning_rate": 0.0002,
338
+ "loss": 1.378,
339
+ "step": 55
340
+ },
341
+ {
342
+ "epoch": 0.08,
343
+ "learning_rate": 0.0002,
344
+ "loss": 1.4844,
345
+ "step": 56
346
+ },
347
+ {
348
+ "epoch": 0.08,
349
+ "learning_rate": 0.0002,
350
+ "loss": 1.5299,
351
+ "step": 57
352
+ },
353
+ {
354
+ "epoch": 0.08,
355
+ "learning_rate": 0.0002,
356
+ "loss": 1.5403,
357
+ "step": 58
358
+ },
359
+ {
360
+ "epoch": 0.08,
361
+ "learning_rate": 0.0002,
362
+ "loss": 1.653,
363
+ "step": 59
364
+ },
365
+ {
366
+ "epoch": 0.08,
367
+ "learning_rate": 0.0002,
368
+ "loss": 1.7322,
369
+ "step": 60
370
+ },
371
+ {
372
+ "epoch": 0.08,
373
+ "learning_rate": 0.0002,
374
+ "loss": 1.3715,
375
+ "step": 61
376
+ },
377
+ {
378
+ "epoch": 0.08,
379
+ "learning_rate": 0.0002,
380
+ "loss": 1.5525,
381
+ "step": 62
382
+ },
383
+ {
384
+ "epoch": 0.08,
385
+ "learning_rate": 0.0002,
386
+ "loss": 1.1855,
387
+ "step": 63
388
+ },
389
+ {
390
+ "epoch": 0.09,
391
+ "learning_rate": 0.0002,
392
+ "loss": 1.6929,
393
+ "step": 64
394
+ },
395
+ {
396
+ "epoch": 0.09,
397
+ "learning_rate": 0.0002,
398
+ "loss": 1.3304,
399
+ "step": 65
400
+ },
401
+ {
402
+ "epoch": 0.09,
403
+ "learning_rate": 0.0002,
404
+ "loss": 1.4673,
405
+ "step": 66
406
+ },
407
+ {
408
+ "epoch": 0.09,
409
+ "learning_rate": 0.0002,
410
+ "loss": 1.3078,
411
+ "step": 67
412
+ },
413
+ {
414
+ "epoch": 0.09,
415
+ "learning_rate": 0.0002,
416
+ "loss": 1.5174,
417
+ "step": 68
418
+ },
419
+ {
420
+ "epoch": 0.09,
421
+ "learning_rate": 0.0002,
422
+ "loss": 1.2391,
423
+ "step": 69
424
+ },
425
+ {
426
+ "epoch": 0.09,
427
+ "learning_rate": 0.0002,
428
+ "loss": 1.1477,
429
+ "step": 70
430
+ },
431
+ {
432
+ "epoch": 0.1,
433
+ "learning_rate": 0.0002,
434
+ "loss": 1.5104,
435
+ "step": 71
436
+ },
437
+ {
438
+ "epoch": 0.1,
439
+ "learning_rate": 0.0002,
440
+ "loss": 1.3076,
441
+ "step": 72
442
+ },
443
+ {
444
+ "epoch": 0.1,
445
+ "learning_rate": 0.0002,
446
+ "loss": 1.4435,
447
+ "step": 73
448
+ },
449
+ {
450
+ "epoch": 0.1,
451
+ "learning_rate": 0.0002,
452
+ "loss": 1.622,
453
+ "step": 74
454
+ },
455
+ {
456
+ "epoch": 0.1,
457
+ "learning_rate": 0.0002,
458
+ "loss": 1.5879,
459
+ "step": 75
460
+ },
461
+ {
462
+ "epoch": 0.1,
463
+ "learning_rate": 0.0002,
464
+ "loss": 1.375,
465
+ "step": 76
466
+ },
467
+ {
468
+ "epoch": 0.1,
469
+ "learning_rate": 0.0002,
470
+ "loss": 1.5987,
471
+ "step": 77
472
+ },
473
+ {
474
+ "epoch": 0.1,
475
+ "learning_rate": 0.0002,
476
+ "loss": 1.4196,
477
+ "step": 78
478
+ },
479
+ {
480
+ "epoch": 0.11,
481
+ "learning_rate": 0.0002,
482
+ "loss": 1.291,
483
+ "step": 79
484
+ },
485
+ {
486
+ "epoch": 0.11,
487
+ "learning_rate": 0.0002,
488
+ "loss": 1.3158,
489
+ "step": 80
490
+ },
491
+ {
492
+ "epoch": 0.11,
493
+ "learning_rate": 0.0002,
494
+ "loss": 1.5917,
495
+ "step": 81
496
+ },
497
+ {
498
+ "epoch": 0.11,
499
+ "learning_rate": 0.0002,
500
+ "loss": 1.5557,
501
+ "step": 82
502
+ },
503
+ {
504
+ "epoch": 0.11,
505
+ "learning_rate": 0.0002,
506
+ "loss": 1.6552,
507
+ "step": 83
508
+ },
509
+ {
510
+ "epoch": 0.11,
511
+ "learning_rate": 0.0002,
512
+ "loss": 1.2357,
513
+ "step": 84
514
+ },
515
+ {
516
+ "epoch": 0.11,
517
+ "learning_rate": 0.0002,
518
+ "loss": 1.2287,
519
+ "step": 85
520
+ },
521
+ {
522
+ "epoch": 0.12,
523
+ "learning_rate": 0.0002,
524
+ "loss": 1.4418,
525
+ "step": 86
526
+ },
527
+ {
528
+ "epoch": 0.12,
529
+ "learning_rate": 0.0002,
530
+ "loss": 1.6311,
531
+ "step": 87
532
+ },
533
+ {
534
+ "epoch": 0.12,
535
+ "learning_rate": 0.0002,
536
+ "loss": 1.4767,
537
+ "step": 88
538
+ },
539
+ {
540
+ "epoch": 0.12,
541
+ "learning_rate": 0.0002,
542
+ "loss": 1.5289,
543
+ "step": 89
544
+ },
545
+ {
546
+ "epoch": 0.12,
547
+ "learning_rate": 0.0002,
548
+ "loss": 1.3354,
549
+ "step": 90
550
+ },
551
+ {
552
+ "epoch": 0.12,
553
+ "learning_rate": 0.0002,
554
+ "loss": 1.3328,
555
+ "step": 91
556
+ },
557
+ {
558
+ "epoch": 0.12,
559
+ "learning_rate": 0.0002,
560
+ "loss": 1.319,
561
+ "step": 92
562
+ },
563
+ {
564
+ "epoch": 0.12,
565
+ "learning_rate": 0.0002,
566
+ "loss": 1.382,
567
+ "step": 93
568
+ },
569
+ {
570
+ "epoch": 0.13,
571
+ "learning_rate": 0.0002,
572
+ "loss": 1.6372,
573
+ "step": 94
574
+ },
575
+ {
576
+ "epoch": 0.13,
577
+ "learning_rate": 0.0002,
578
+ "loss": 1.6074,
579
+ "step": 95
580
+ },
581
+ {
582
+ "epoch": 0.13,
583
+ "learning_rate": 0.0002,
584
+ "loss": 1.3375,
585
+ "step": 96
586
+ },
587
+ {
588
+ "epoch": 0.13,
589
+ "learning_rate": 0.0002,
590
+ "loss": 1.3432,
591
+ "step": 97
592
+ },
593
+ {
594
+ "epoch": 0.13,
595
+ "learning_rate": 0.0002,
596
+ "loss": 1.4305,
597
+ "step": 98
598
+ },
599
+ {
600
+ "epoch": 0.13,
601
+ "learning_rate": 0.0002,
602
+ "loss": 1.2407,
603
+ "step": 99
604
+ },
605
+ {
606
+ "epoch": 0.13,
607
+ "learning_rate": 0.0002,
608
+ "loss": 1.5083,
609
+ "step": 100
610
+ },
611
+ {
612
+ "epoch": 0.13,
613
+ "eval_loss": 1.4049460887908936,
614
+ "eval_runtime": 441.5212,
615
+ "eval_samples_per_second": 1.563,
616
+ "eval_steps_per_second": 0.392,
617
+ "step": 100
618
+ },
619
+ {
620
+ "epoch": 0.14,
621
+ "learning_rate": 0.0002,
622
+ "loss": 1.4292,
623
+ "step": 101
624
+ },
625
+ {
626
+ "epoch": 0.14,
627
+ "learning_rate": 0.0002,
628
+ "loss": 1.3542,
629
+ "step": 102
630
+ },
631
+ {
632
+ "epoch": 0.14,
633
+ "learning_rate": 0.0002,
634
+ "loss": 1.4069,
635
+ "step": 103
636
+ },
637
+ {
638
+ "epoch": 0.14,
639
+ "learning_rate": 0.0002,
640
+ "loss": 1.4706,
641
+ "step": 104
642
+ },
643
+ {
644
+ "epoch": 0.14,
645
+ "learning_rate": 0.0002,
646
+ "loss": 1.3688,
647
+ "step": 105
648
+ },
649
+ {
650
+ "epoch": 0.14,
651
+ "learning_rate": 0.0002,
652
+ "loss": 1.4376,
653
+ "step": 106
654
+ },
655
+ {
656
+ "epoch": 0.14,
657
+ "learning_rate": 0.0002,
658
+ "loss": 1.4636,
659
+ "step": 107
660
+ },
661
+ {
662
+ "epoch": 0.15,
663
+ "learning_rate": 0.0002,
664
+ "loss": 1.5471,
665
+ "step": 108
666
+ },
667
+ {
668
+ "epoch": 0.15,
669
+ "learning_rate": 0.0002,
670
+ "loss": 1.4346,
671
+ "step": 109
672
+ },
673
+ {
674
+ "epoch": 0.15,
675
+ "learning_rate": 0.0002,
676
+ "loss": 1.2338,
677
+ "step": 110
678
+ },
679
+ {
680
+ "epoch": 0.15,
681
+ "learning_rate": 0.0002,
682
+ "loss": 1.4768,
683
+ "step": 111
684
+ },
685
+ {
686
+ "epoch": 0.15,
687
+ "learning_rate": 0.0002,
688
+ "loss": 1.432,
689
+ "step": 112
690
+ },
691
+ {
692
+ "epoch": 0.15,
693
+ "learning_rate": 0.0002,
694
+ "loss": 1.2932,
695
+ "step": 113
696
+ },
697
+ {
698
+ "epoch": 0.15,
699
+ "learning_rate": 0.0002,
700
+ "loss": 1.6056,
701
+ "step": 114
702
+ },
703
+ {
704
+ "epoch": 0.15,
705
+ "learning_rate": 0.0002,
706
+ "loss": 1.2941,
707
+ "step": 115
708
+ },
709
+ {
710
+ "epoch": 0.16,
711
+ "learning_rate": 0.0002,
712
+ "loss": 1.4151,
713
+ "step": 116
714
+ },
715
+ {
716
+ "epoch": 0.16,
717
+ "learning_rate": 0.0002,
718
+ "loss": 1.5091,
719
+ "step": 117
720
+ },
721
+ {
722
+ "epoch": 0.16,
723
+ "learning_rate": 0.0002,
724
+ "loss": 1.3322,
725
+ "step": 118
726
+ },
727
+ {
728
+ "epoch": 0.16,
729
+ "learning_rate": 0.0002,
730
+ "loss": 1.5314,
731
+ "step": 119
732
+ },
733
+ {
734
+ "epoch": 0.16,
735
+ "learning_rate": 0.0002,
736
+ "loss": 1.5164,
737
+ "step": 120
738
+ },
739
+ {
740
+ "epoch": 0.16,
741
+ "learning_rate": 0.0002,
742
+ "loss": 1.7211,
743
+ "step": 121
744
+ },
745
+ {
746
+ "epoch": 0.16,
747
+ "learning_rate": 0.0002,
748
+ "loss": 1.2817,
749
+ "step": 122
750
+ },
751
+ {
752
+ "epoch": 0.17,
753
+ "learning_rate": 0.0002,
754
+ "loss": 1.3317,
755
+ "step": 123
756
+ },
757
+ {
758
+ "epoch": 0.17,
759
+ "learning_rate": 0.0002,
760
+ "loss": 1.5745,
761
+ "step": 124
762
+ },
763
+ {
764
+ "epoch": 0.17,
765
+ "learning_rate": 0.0002,
766
+ "loss": 1.2308,
767
+ "step": 125
768
+ },
769
+ {
770
+ "epoch": 0.17,
771
+ "learning_rate": 0.0002,
772
+ "loss": 1.411,
773
+ "step": 126
774
+ },
775
+ {
776
+ "epoch": 0.17,
777
+ "learning_rate": 0.0002,
778
+ "loss": 1.2042,
779
+ "step": 127
780
+ },
781
+ {
782
+ "epoch": 0.17,
783
+ "learning_rate": 0.0002,
784
+ "loss": 1.4981,
785
+ "step": 128
786
+ },
787
+ {
788
+ "epoch": 0.17,
789
+ "learning_rate": 0.0002,
790
+ "loss": 1.4421,
791
+ "step": 129
792
+ },
793
+ {
794
+ "epoch": 0.17,
795
+ "learning_rate": 0.0002,
796
+ "loss": 1.2531,
797
+ "step": 130
798
+ },
799
+ {
800
+ "epoch": 0.18,
801
+ "learning_rate": 0.0002,
802
+ "loss": 1.1973,
803
+ "step": 131
804
+ },
805
+ {
806
+ "epoch": 0.18,
807
+ "learning_rate": 0.0002,
808
+ "loss": 1.6006,
809
+ "step": 132
810
+ },
811
+ {
812
+ "epoch": 0.18,
813
+ "learning_rate": 0.0002,
814
+ "loss": 1.594,
815
+ "step": 133
816
+ },
817
+ {
818
+ "epoch": 0.18,
819
+ "learning_rate": 0.0002,
820
+ "loss": 1.4344,
821
+ "step": 134
822
+ },
823
+ {
824
+ "epoch": 0.18,
825
+ "learning_rate": 0.0002,
826
+ "loss": 1.554,
827
+ "step": 135
828
+ },
829
+ {
830
+ "epoch": 0.18,
831
+ "learning_rate": 0.0002,
832
+ "loss": 1.2604,
833
+ "step": 136
834
+ },
835
+ {
836
+ "epoch": 0.18,
837
+ "learning_rate": 0.0002,
838
+ "loss": 1.3399,
839
+ "step": 137
840
+ },
841
+ {
842
+ "epoch": 0.19,
843
+ "learning_rate": 0.0002,
844
+ "loss": 1.3839,
845
+ "step": 138
846
+ },
847
+ {
848
+ "epoch": 0.19,
849
+ "learning_rate": 0.0002,
850
+ "loss": 1.4957,
851
+ "step": 139
852
+ },
853
+ {
854
+ "epoch": 0.19,
855
+ "learning_rate": 0.0002,
856
+ "loss": 1.3904,
857
+ "step": 140
858
+ },
859
+ {
860
+ "epoch": 0.19,
861
+ "learning_rate": 0.0002,
862
+ "loss": 1.6935,
863
+ "step": 141
864
+ },
865
+ {
866
+ "epoch": 0.19,
867
+ "learning_rate": 0.0002,
868
+ "loss": 1.1986,
869
+ "step": 142
870
+ },
871
+ {
872
+ "epoch": 0.19,
873
+ "learning_rate": 0.0002,
874
+ "loss": 1.5167,
875
+ "step": 143
876
+ },
877
+ {
878
+ "epoch": 0.19,
879
+ "learning_rate": 0.0002,
880
+ "loss": 1.5019,
881
+ "step": 144
882
+ },
883
+ {
884
+ "epoch": 0.19,
885
+ "learning_rate": 0.0002,
886
+ "loss": 1.3443,
887
+ "step": 145
888
+ },
889
+ {
890
+ "epoch": 0.2,
891
+ "learning_rate": 0.0002,
892
+ "loss": 1.21,
893
+ "step": 146
894
+ },
895
+ {
896
+ "epoch": 0.2,
897
+ "learning_rate": 0.0002,
898
+ "loss": 1.8859,
899
+ "step": 147
900
+ },
901
+ {
902
+ "epoch": 0.2,
903
+ "learning_rate": 0.0002,
904
+ "loss": 1.5173,
905
+ "step": 148
906
+ },
907
+ {
908
+ "epoch": 0.2,
909
+ "learning_rate": 0.0002,
910
+ "loss": 1.2812,
911
+ "step": 149
912
+ },
913
+ {
914
+ "epoch": 0.2,
915
+ "learning_rate": 0.0002,
916
+ "loss": 1.5561,
917
+ "step": 150
918
+ },
919
+ {
920
+ "epoch": 0.2,
921
+ "learning_rate": 0.0002,
922
+ "loss": 1.511,
923
+ "step": 151
924
+ },
925
+ {
926
+ "epoch": 0.2,
927
+ "learning_rate": 0.0002,
928
+ "loss": 1.6042,
929
+ "step": 152
930
+ },
931
+ {
932
+ "epoch": 0.21,
933
+ "learning_rate": 0.0002,
934
+ "loss": 1.2779,
935
+ "step": 153
936
+ },
937
+ {
938
+ "epoch": 0.21,
939
+ "learning_rate": 0.0002,
940
+ "loss": 1.3322,
941
+ "step": 154
942
+ },
943
+ {
944
+ "epoch": 0.21,
945
+ "learning_rate": 0.0002,
946
+ "loss": 1.4381,
947
+ "step": 155
948
+ },
949
+ {
950
+ "epoch": 0.21,
951
+ "learning_rate": 0.0002,
952
+ "loss": 1.6009,
953
+ "step": 156
954
+ },
955
+ {
956
+ "epoch": 0.21,
957
+ "learning_rate": 0.0002,
958
+ "loss": 1.5746,
959
+ "step": 157
960
+ },
961
+ {
962
+ "epoch": 0.21,
963
+ "learning_rate": 0.0002,
964
+ "loss": 1.5367,
965
+ "step": 158
966
+ },
967
+ {
968
+ "epoch": 0.21,
969
+ "learning_rate": 0.0002,
970
+ "loss": 1.586,
971
+ "step": 159
972
+ },
973
+ {
974
+ "epoch": 0.21,
975
+ "learning_rate": 0.0002,
976
+ "loss": 1.3541,
977
+ "step": 160
978
+ },
979
+ {
980
+ "epoch": 0.22,
981
+ "learning_rate": 0.0002,
982
+ "loss": 1.4011,
983
+ "step": 161
984
+ },
985
+ {
986
+ "epoch": 0.22,
987
+ "learning_rate": 0.0002,
988
+ "loss": 1.6345,
989
+ "step": 162
990
+ },
991
+ {
992
+ "epoch": 0.22,
993
+ "learning_rate": 0.0002,
994
+ "loss": 1.691,
995
+ "step": 163
996
+ },
997
+ {
998
+ "epoch": 0.22,
999
+ "learning_rate": 0.0002,
1000
+ "loss": 1.5831,
1001
+ "step": 164
1002
+ },
1003
+ {
1004
+ "epoch": 0.22,
1005
+ "learning_rate": 0.0002,
1006
+ "loss": 1.3157,
1007
+ "step": 165
1008
+ },
1009
+ {
1010
+ "epoch": 0.22,
1011
+ "learning_rate": 0.0002,
1012
+ "loss": 1.3137,
1013
+ "step": 166
1014
+ },
1015
+ {
1016
+ "epoch": 0.22,
1017
+ "learning_rate": 0.0002,
1018
+ "loss": 1.0595,
1019
+ "step": 167
1020
+ },
1021
+ {
1022
+ "epoch": 0.23,
1023
+ "learning_rate": 0.0002,
1024
+ "loss": 1.2418,
1025
+ "step": 168
1026
+ },
1027
+ {
1028
+ "epoch": 0.23,
1029
+ "learning_rate": 0.0002,
1030
+ "loss": 1.2534,
1031
+ "step": 169
1032
+ },
1033
+ {
1034
+ "epoch": 0.23,
1035
+ "learning_rate": 0.0002,
1036
+ "loss": 1.3005,
1037
+ "step": 170
1038
+ },
1039
+ {
1040
+ "epoch": 0.23,
1041
+ "learning_rate": 0.0002,
1042
+ "loss": 1.4944,
1043
+ "step": 171
1044
+ },
1045
+ {
1046
+ "epoch": 0.23,
1047
+ "learning_rate": 0.0002,
1048
+ "loss": 1.3034,
1049
+ "step": 172
1050
+ },
1051
+ {
1052
+ "epoch": 0.23,
1053
+ "learning_rate": 0.0002,
1054
+ "loss": 1.4854,
1055
+ "step": 173
1056
+ },
1057
+ {
1058
+ "epoch": 0.23,
1059
+ "learning_rate": 0.0002,
1060
+ "loss": 1.3637,
1061
+ "step": 174
1062
+ },
1063
+ {
1064
+ "epoch": 0.24,
1065
+ "learning_rate": 0.0002,
1066
+ "loss": 1.4306,
1067
+ "step": 175
1068
+ },
1069
+ {
1070
+ "epoch": 0.24,
1071
+ "learning_rate": 0.0002,
1072
+ "loss": 1.6367,
1073
+ "step": 176
1074
+ },
1075
+ {
1076
+ "epoch": 0.24,
1077
+ "learning_rate": 0.0002,
1078
+ "loss": 1.2033,
1079
+ "step": 177
1080
+ },
1081
+ {
1082
+ "epoch": 0.24,
1083
+ "learning_rate": 0.0002,
1084
+ "loss": 1.5012,
1085
+ "step": 178
1086
+ },
1087
+ {
1088
+ "epoch": 0.24,
1089
+ "learning_rate": 0.0002,
1090
+ "loss": 1.4991,
1091
+ "step": 179
1092
+ },
1093
+ {
1094
+ "epoch": 0.24,
1095
+ "learning_rate": 0.0002,
1096
+ "loss": 1.2487,
1097
+ "step": 180
1098
+ },
1099
+ {
1100
+ "epoch": 0.24,
1101
+ "learning_rate": 0.0002,
1102
+ "loss": 1.4416,
1103
+ "step": 181
1104
+ },
1105
+ {
1106
+ "epoch": 0.24,
1107
+ "learning_rate": 0.0002,
1108
+ "loss": 1.3695,
1109
+ "step": 182
1110
+ },
1111
+ {
1112
+ "epoch": 0.25,
1113
+ "learning_rate": 0.0002,
1114
+ "loss": 1.0741,
1115
+ "step": 183
1116
+ },
1117
+ {
1118
+ "epoch": 0.25,
1119
+ "learning_rate": 0.0002,
1120
+ "loss": 1.4816,
1121
+ "step": 184
1122
+ },
1123
+ {
1124
+ "epoch": 0.25,
1125
+ "learning_rate": 0.0002,
1126
+ "loss": 1.3346,
1127
+ "step": 185
1128
+ },
1129
+ {
1130
+ "epoch": 0.25,
1131
+ "learning_rate": 0.0002,
1132
+ "loss": 1.4782,
1133
+ "step": 186
1134
+ },
1135
+ {
1136
+ "epoch": 0.25,
1137
+ "learning_rate": 0.0002,
1138
+ "loss": 1.4808,
1139
+ "step": 187
1140
+ },
1141
+ {
1142
+ "epoch": 0.25,
1143
+ "learning_rate": 0.0002,
1144
+ "loss": 1.4079,
1145
+ "step": 188
1146
+ },
1147
+ {
1148
+ "epoch": 0.25,
1149
+ "learning_rate": 0.0002,
1150
+ "loss": 1.3433,
1151
+ "step": 189
1152
+ },
1153
+ {
1154
+ "epoch": 0.26,
1155
+ "learning_rate": 0.0002,
1156
+ "loss": 1.6758,
1157
+ "step": 190
1158
+ },
1159
+ {
1160
+ "epoch": 0.26,
1161
+ "learning_rate": 0.0002,
1162
+ "loss": 1.3544,
1163
+ "step": 191
1164
+ },
1165
+ {
1166
+ "epoch": 0.26,
1167
+ "learning_rate": 0.0002,
1168
+ "loss": 1.1564,
1169
+ "step": 192
1170
+ },
1171
+ {
1172
+ "epoch": 0.26,
1173
+ "learning_rate": 0.0002,
1174
+ "loss": 1.3612,
1175
+ "step": 193
1176
+ },
1177
+ {
1178
+ "epoch": 0.26,
1179
+ "learning_rate": 0.0002,
1180
+ "loss": 1.3226,
1181
+ "step": 194
1182
+ },
1183
+ {
1184
+ "epoch": 0.26,
1185
+ "learning_rate": 0.0002,
1186
+ "loss": 1.365,
1187
+ "step": 195
1188
+ },
1189
+ {
1190
+ "epoch": 0.26,
1191
+ "learning_rate": 0.0002,
1192
+ "loss": 1.4344,
1193
+ "step": 196
1194
+ },
1195
+ {
1196
+ "epoch": 0.26,
1197
+ "learning_rate": 0.0002,
1198
+ "loss": 1.2987,
1199
+ "step": 197
1200
+ },
1201
+ {
1202
+ "epoch": 0.27,
1203
+ "learning_rate": 0.0002,
1204
+ "loss": 1.3551,
1205
+ "step": 198
1206
+ },
1207
+ {
1208
+ "epoch": 0.27,
1209
+ "learning_rate": 0.0002,
1210
+ "loss": 1.2806,
1211
+ "step": 199
1212
+ },
1213
+ {
1214
+ "epoch": 0.27,
1215
+ "learning_rate": 0.0002,
1216
+ "loss": 1.2726,
1217
+ "step": 200
1218
+ },
1219
+ {
1220
+ "epoch": 0.27,
1221
+ "eval_loss": 1.3933924436569214,
1222
+ "eval_runtime": 441.7187,
1223
+ "eval_samples_per_second": 1.562,
1224
+ "eval_steps_per_second": 0.392,
1225
+ "step": 200
1226
+ },
1227
+ {
1228
+ "epoch": 0.27,
1229
+ "learning_rate": 0.0002,
1230
+ "loss": 1.4918,
1231
+ "step": 201
1232
+ },
1233
+ {
1234
+ "epoch": 0.27,
1235
+ "learning_rate": 0.0002,
1236
+ "loss": 1.6278,
1237
+ "step": 202
1238
+ },
1239
+ {
1240
+ "epoch": 0.27,
1241
+ "learning_rate": 0.0002,
1242
+ "loss": 1.2418,
1243
+ "step": 203
1244
+ },
1245
+ {
1246
+ "epoch": 0.27,
1247
+ "learning_rate": 0.0002,
1248
+ "loss": 1.4545,
1249
+ "step": 204
1250
+ },
1251
+ {
1252
+ "epoch": 0.28,
1253
+ "learning_rate": 0.0002,
1254
+ "loss": 1.4311,
1255
+ "step": 205
1256
+ },
1257
+ {
1258
+ "epoch": 0.28,
1259
+ "learning_rate": 0.0002,
1260
+ "loss": 1.294,
1261
+ "step": 206
1262
+ },
1263
+ {
1264
+ "epoch": 0.28,
1265
+ "learning_rate": 0.0002,
1266
+ "loss": 1.3711,
1267
+ "step": 207
1268
+ },
1269
+ {
1270
+ "epoch": 0.28,
1271
+ "learning_rate": 0.0002,
1272
+ "loss": 1.2889,
1273
+ "step": 208
1274
+ },
1275
+ {
1276
+ "epoch": 0.28,
1277
+ "learning_rate": 0.0002,
1278
+ "loss": 1.483,
1279
+ "step": 209
1280
+ },
1281
+ {
1282
+ "epoch": 0.28,
1283
+ "learning_rate": 0.0002,
1284
+ "loss": 1.4393,
1285
+ "step": 210
1286
+ },
1287
+ {
1288
+ "epoch": 0.28,
1289
+ "learning_rate": 0.0002,
1290
+ "loss": 1.45,
1291
+ "step": 211
1292
+ },
1293
+ {
1294
+ "epoch": 0.28,
1295
+ "learning_rate": 0.0002,
1296
+ "loss": 1.1867,
1297
+ "step": 212
1298
+ },
1299
+ {
1300
+ "epoch": 0.29,
1301
+ "learning_rate": 0.0002,
1302
+ "loss": 1.2354,
1303
+ "step": 213
1304
+ },
1305
+ {
1306
+ "epoch": 0.29,
1307
+ "learning_rate": 0.0002,
1308
+ "loss": 1.5312,
1309
+ "step": 214
1310
+ },
1311
+ {
1312
+ "epoch": 0.29,
1313
+ "learning_rate": 0.0002,
1314
+ "loss": 1.2599,
1315
+ "step": 215
1316
+ },
1317
+ {
1318
+ "epoch": 0.29,
1319
+ "learning_rate": 0.0002,
1320
+ "loss": 1.316,
1321
+ "step": 216
1322
+ },
1323
+ {
1324
+ "epoch": 0.29,
1325
+ "learning_rate": 0.0002,
1326
+ "loss": 1.5382,
1327
+ "step": 217
1328
+ },
1329
+ {
1330
+ "epoch": 0.29,
1331
+ "learning_rate": 0.0002,
1332
+ "loss": 1.581,
1333
+ "step": 218
1334
+ },
1335
+ {
1336
+ "epoch": 0.29,
1337
+ "learning_rate": 0.0002,
1338
+ "loss": 1.2455,
1339
+ "step": 219
1340
+ },
1341
+ {
1342
+ "epoch": 0.3,
1343
+ "learning_rate": 0.0002,
1344
+ "loss": 1.6401,
1345
+ "step": 220
1346
+ },
1347
+ {
1348
+ "epoch": 0.3,
1349
+ "learning_rate": 0.0002,
1350
+ "loss": 1.5745,
1351
+ "step": 221
1352
+ },
1353
+ {
1354
+ "epoch": 0.3,
1355
+ "learning_rate": 0.0002,
1356
+ "loss": 1.3209,
1357
+ "step": 222
1358
+ },
1359
+ {
1360
+ "epoch": 0.3,
1361
+ "learning_rate": 0.0002,
1362
+ "loss": 1.5797,
1363
+ "step": 223
1364
+ },
1365
+ {
1366
+ "epoch": 0.3,
1367
+ "learning_rate": 0.0002,
1368
+ "loss": 1.1661,
1369
+ "step": 224
1370
+ },
1371
+ {
1372
+ "epoch": 0.3,
1373
+ "learning_rate": 0.0002,
1374
+ "loss": 1.3139,
1375
+ "step": 225
1376
+ },
1377
+ {
1378
+ "epoch": 0.3,
1379
+ "learning_rate": 0.0002,
1380
+ "loss": 1.5553,
1381
+ "step": 226
1382
+ },
1383
+ {
1384
+ "epoch": 0.31,
1385
+ "learning_rate": 0.0002,
1386
+ "loss": 1.3963,
1387
+ "step": 227
1388
+ },
1389
+ {
1390
+ "epoch": 0.31,
1391
+ "learning_rate": 0.0002,
1392
+ "loss": 1.4288,
1393
+ "step": 228
1394
+ },
1395
+ {
1396
+ "epoch": 0.31,
1397
+ "learning_rate": 0.0002,
1398
+ "loss": 1.621,
1399
+ "step": 229
1400
+ },
1401
+ {
1402
+ "epoch": 0.31,
1403
+ "learning_rate": 0.0002,
1404
+ "loss": 1.3305,
1405
+ "step": 230
1406
+ },
1407
+ {
1408
+ "epoch": 0.31,
1409
+ "learning_rate": 0.0002,
1410
+ "loss": 1.4525,
1411
+ "step": 231
1412
+ },
1413
+ {
1414
+ "epoch": 0.31,
1415
+ "learning_rate": 0.0002,
1416
+ "loss": 1.5967,
1417
+ "step": 232
1418
+ },
1419
+ {
1420
+ "epoch": 0.31,
1421
+ "learning_rate": 0.0002,
1422
+ "loss": 1.2565,
1423
+ "step": 233
1424
+ },
1425
+ {
1426
+ "epoch": 0.31,
1427
+ "learning_rate": 0.0002,
1428
+ "loss": 1.387,
1429
+ "step": 234
1430
+ },
1431
+ {
1432
+ "epoch": 0.32,
1433
+ "learning_rate": 0.0002,
1434
+ "loss": 1.2859,
1435
+ "step": 235
1436
+ },
1437
+ {
1438
+ "epoch": 0.32,
1439
+ "learning_rate": 0.0002,
1440
+ "loss": 1.4987,
1441
+ "step": 236
1442
+ },
1443
+ {
1444
+ "epoch": 0.32,
1445
+ "learning_rate": 0.0002,
1446
+ "loss": 1.3214,
1447
+ "step": 237
1448
+ },
1449
+ {
1450
+ "epoch": 0.32,
1451
+ "learning_rate": 0.0002,
1452
+ "loss": 1.2937,
1453
+ "step": 238
1454
+ },
1455
+ {
1456
+ "epoch": 0.32,
1457
+ "learning_rate": 0.0002,
1458
+ "loss": 1.1512,
1459
+ "step": 239
1460
+ },
1461
+ {
1462
+ "epoch": 0.32,
1463
+ "learning_rate": 0.0002,
1464
+ "loss": 1.621,
1465
+ "step": 240
1466
+ },
1467
+ {
1468
+ "epoch": 0.32,
1469
+ "learning_rate": 0.0002,
1470
+ "loss": 1.4683,
1471
+ "step": 241
1472
+ },
1473
+ {
1474
+ "epoch": 0.33,
1475
+ "learning_rate": 0.0002,
1476
+ "loss": 1.1805,
1477
+ "step": 242
1478
+ },
1479
+ {
1480
+ "epoch": 0.33,
1481
+ "learning_rate": 0.0002,
1482
+ "loss": 1.238,
1483
+ "step": 243
1484
+ },
1485
+ {
1486
+ "epoch": 0.33,
1487
+ "learning_rate": 0.0002,
1488
+ "loss": 1.5211,
1489
+ "step": 244
1490
+ },
1491
+ {
1492
+ "epoch": 0.33,
1493
+ "learning_rate": 0.0002,
1494
+ "loss": 1.4926,
1495
+ "step": 245
1496
+ },
1497
+ {
1498
+ "epoch": 0.33,
1499
+ "learning_rate": 0.0002,
1500
+ "loss": 1.5397,
1501
+ "step": 246
1502
+ },
1503
+ {
1504
+ "epoch": 0.33,
1505
+ "learning_rate": 0.0002,
1506
+ "loss": 1.5255,
1507
+ "step": 247
1508
+ },
1509
+ {
1510
+ "epoch": 0.33,
1511
+ "learning_rate": 0.0002,
1512
+ "loss": 1.4253,
1513
+ "step": 248
1514
+ },
1515
+ {
1516
+ "epoch": 0.33,
1517
+ "learning_rate": 0.0002,
1518
+ "loss": 1.3297,
1519
+ "step": 249
1520
+ },
1521
+ {
1522
+ "epoch": 0.34,
1523
+ "learning_rate": 0.0002,
1524
+ "loss": 1.2316,
1525
+ "step": 250
1526
+ },
1527
+ {
1528
+ "epoch": 0.34,
1529
+ "learning_rate": 0.0002,
1530
+ "loss": 1.429,
1531
+ "step": 251
1532
+ },
1533
+ {
1534
+ "epoch": 0.34,
1535
+ "learning_rate": 0.0002,
1536
+ "loss": 1.2792,
1537
+ "step": 252
1538
+ },
1539
+ {
1540
+ "epoch": 0.34,
1541
+ "learning_rate": 0.0002,
1542
+ "loss": 1.5727,
1543
+ "step": 253
1544
+ },
1545
+ {
1546
+ "epoch": 0.34,
1547
+ "learning_rate": 0.0002,
1548
+ "loss": 1.2032,
1549
+ "step": 254
1550
+ },
1551
+ {
1552
+ "epoch": 0.34,
1553
+ "learning_rate": 0.0002,
1554
+ "loss": 1.4153,
1555
+ "step": 255
1556
+ },
1557
+ {
1558
+ "epoch": 0.34,
1559
+ "learning_rate": 0.0002,
1560
+ "loss": 1.2724,
1561
+ "step": 256
1562
+ },
1563
+ {
1564
+ "epoch": 0.35,
1565
+ "learning_rate": 0.0002,
1566
+ "loss": 1.4178,
1567
+ "step": 257
1568
+ },
1569
+ {
1570
+ "epoch": 0.35,
1571
+ "learning_rate": 0.0002,
1572
+ "loss": 1.3131,
1573
+ "step": 258
1574
+ },
1575
+ {
1576
+ "epoch": 0.35,
1577
+ "learning_rate": 0.0002,
1578
+ "loss": 1.6291,
1579
+ "step": 259
1580
+ },
1581
+ {
1582
+ "epoch": 0.35,
1583
+ "learning_rate": 0.0002,
1584
+ "loss": 1.1144,
1585
+ "step": 260
1586
+ },
1587
+ {
1588
+ "epoch": 0.35,
1589
+ "learning_rate": 0.0002,
1590
+ "loss": 1.425,
1591
+ "step": 261
1592
+ },
1593
+ {
1594
+ "epoch": 0.35,
1595
+ "learning_rate": 0.0002,
1596
+ "loss": 1.5624,
1597
+ "step": 262
1598
+ },
1599
+ {
1600
+ "epoch": 0.35,
1601
+ "learning_rate": 0.0002,
1602
+ "loss": 1.4533,
1603
+ "step": 263
1604
+ },
1605
+ {
1606
+ "epoch": 0.35,
1607
+ "learning_rate": 0.0002,
1608
+ "loss": 1.209,
1609
+ "step": 264
1610
+ },
1611
+ {
1612
+ "epoch": 0.36,
1613
+ "learning_rate": 0.0002,
1614
+ "loss": 1.6137,
1615
+ "step": 265
1616
+ },
1617
+ {
1618
+ "epoch": 0.36,
1619
+ "learning_rate": 0.0002,
1620
+ "loss": 1.2784,
1621
+ "step": 266
1622
+ },
1623
+ {
1624
+ "epoch": 0.36,
1625
+ "learning_rate": 0.0002,
1626
+ "loss": 1.4203,
1627
+ "step": 267
1628
+ },
1629
+ {
1630
+ "epoch": 0.36,
1631
+ "learning_rate": 0.0002,
1632
+ "loss": 1.2836,
1633
+ "step": 268
1634
+ },
1635
+ {
1636
+ "epoch": 0.36,
1637
+ "learning_rate": 0.0002,
1638
+ "loss": 1.4429,
1639
+ "step": 269
1640
+ },
1641
+ {
1642
+ "epoch": 0.36,
1643
+ "learning_rate": 0.0002,
1644
+ "loss": 1.5235,
1645
+ "step": 270
1646
+ },
1647
+ {
1648
+ "epoch": 0.36,
1649
+ "learning_rate": 0.0002,
1650
+ "loss": 1.2781,
1651
+ "step": 271
1652
+ },
1653
+ {
1654
+ "epoch": 0.37,
1655
+ "learning_rate": 0.0002,
1656
+ "loss": 1.2376,
1657
+ "step": 272
1658
+ },
1659
+ {
1660
+ "epoch": 0.37,
1661
+ "learning_rate": 0.0002,
1662
+ "loss": 1.4518,
1663
+ "step": 273
1664
+ },
1665
+ {
1666
+ "epoch": 0.37,
1667
+ "learning_rate": 0.0002,
1668
+ "loss": 1.2264,
1669
+ "step": 274
1670
+ },
1671
+ {
1672
+ "epoch": 0.37,
1673
+ "learning_rate": 0.0002,
1674
+ "loss": 1.3288,
1675
+ "step": 275
1676
+ },
1677
+ {
1678
+ "epoch": 0.37,
1679
+ "learning_rate": 0.0002,
1680
+ "loss": 1.2508,
1681
+ "step": 276
1682
+ },
1683
+ {
1684
+ "epoch": 0.37,
1685
+ "learning_rate": 0.0002,
1686
+ "loss": 1.971,
1687
+ "step": 277
1688
+ },
1689
+ {
1690
+ "epoch": 0.37,
1691
+ "learning_rate": 0.0002,
1692
+ "loss": 1.1255,
1693
+ "step": 278
1694
+ },
1695
+ {
1696
+ "epoch": 0.37,
1697
+ "learning_rate": 0.0002,
1698
+ "loss": 1.6362,
1699
+ "step": 279
1700
+ },
1701
+ {
1702
+ "epoch": 0.38,
1703
+ "learning_rate": 0.0002,
1704
+ "loss": 1.2952,
1705
+ "step": 280
1706
+ },
1707
+ {
1708
+ "epoch": 0.38,
1709
+ "learning_rate": 0.0002,
1710
+ "loss": 1.3496,
1711
+ "step": 281
1712
+ },
1713
+ {
1714
+ "epoch": 0.38,
1715
+ "learning_rate": 0.0002,
1716
+ "loss": 1.2185,
1717
+ "step": 282
1718
+ },
1719
+ {
1720
+ "epoch": 0.38,
1721
+ "learning_rate": 0.0002,
1722
+ "loss": 1.4449,
1723
+ "step": 283
1724
+ },
1725
+ {
1726
+ "epoch": 0.38,
1727
+ "learning_rate": 0.0002,
1728
+ "loss": 1.7358,
1729
+ "step": 284
1730
+ },
1731
+ {
1732
+ "epoch": 0.38,
1733
+ "learning_rate": 0.0002,
1734
+ "loss": 1.3203,
1735
+ "step": 285
1736
+ },
1737
+ {
1738
+ "epoch": 0.38,
1739
+ "learning_rate": 0.0002,
1740
+ "loss": 1.3007,
1741
+ "step": 286
1742
+ },
1743
+ {
1744
+ "epoch": 0.39,
1745
+ "learning_rate": 0.0002,
1746
+ "loss": 1.6082,
1747
+ "step": 287
1748
+ },
1749
+ {
1750
+ "epoch": 0.39,
1751
+ "learning_rate": 0.0002,
1752
+ "loss": 1.2585,
1753
+ "step": 288
1754
+ },
1755
+ {
1756
+ "epoch": 0.39,
1757
+ "learning_rate": 0.0002,
1758
+ "loss": 1.9611,
1759
+ "step": 289
1760
+ },
1761
+ {
1762
+ "epoch": 0.39,
1763
+ "learning_rate": 0.0002,
1764
+ "loss": 0.9947,
1765
+ "step": 290
1766
+ },
1767
+ {
1768
+ "epoch": 0.39,
1769
+ "learning_rate": 0.0002,
1770
+ "loss": 1.4437,
1771
+ "step": 291
1772
+ },
1773
+ {
1774
+ "epoch": 0.39,
1775
+ "learning_rate": 0.0002,
1776
+ "loss": 1.269,
1777
+ "step": 292
1778
+ },
1779
+ {
1780
+ "epoch": 0.39,
1781
+ "learning_rate": 0.0002,
1782
+ "loss": 1.4283,
1783
+ "step": 293
1784
+ },
1785
+ {
1786
+ "epoch": 0.4,
1787
+ "learning_rate": 0.0002,
1788
+ "loss": 1.5007,
1789
+ "step": 294
1790
+ },
1791
+ {
1792
+ "epoch": 0.4,
1793
+ "learning_rate": 0.0002,
1794
+ "loss": 1.3605,
1795
+ "step": 295
1796
+ },
1797
+ {
1798
+ "epoch": 0.4,
1799
+ "learning_rate": 0.0002,
1800
+ "loss": 1.3069,
1801
+ "step": 296
1802
+ },
1803
+ {
1804
+ "epoch": 0.4,
1805
+ "learning_rate": 0.0002,
1806
+ "loss": 1.0557,
1807
+ "step": 297
1808
+ },
1809
+ {
1810
+ "epoch": 0.4,
1811
+ "learning_rate": 0.0002,
1812
+ "loss": 1.2875,
1813
+ "step": 298
1814
+ },
1815
+ {
1816
+ "epoch": 0.4,
1817
+ "learning_rate": 0.0002,
1818
+ "loss": 1.3322,
1819
+ "step": 299
1820
+ },
1821
+ {
1822
+ "epoch": 0.4,
1823
+ "learning_rate": 0.0002,
1824
+ "loss": 1.3506,
1825
+ "step": 300
1826
+ },
1827
+ {
1828
+ "epoch": 0.4,
1829
+ "eval_loss": 1.3853427171707153,
1830
+ "eval_runtime": 441.2843,
1831
+ "eval_samples_per_second": 1.564,
1832
+ "eval_steps_per_second": 0.392,
1833
+ "step": 300
1834
+ },
1835
+ {
1836
+ "epoch": 0.4,
1837
+ "learning_rate": 0.0002,
1838
+ "loss": 1.6926,
1839
+ "step": 301
1840
+ },
1841
+ {
1842
+ "epoch": 0.41,
1843
+ "learning_rate": 0.0002,
1844
+ "loss": 1.5522,
1845
+ "step": 302
1846
+ },
1847
+ {
1848
+ "epoch": 0.41,
1849
+ "learning_rate": 0.0002,
1850
+ "loss": 1.3527,
1851
+ "step": 303
1852
+ },
1853
+ {
1854
+ "epoch": 0.41,
1855
+ "learning_rate": 0.0002,
1856
+ "loss": 1.4214,
1857
+ "step": 304
1858
+ },
1859
+ {
1860
+ "epoch": 0.41,
1861
+ "learning_rate": 0.0002,
1862
+ "loss": 1.3068,
1863
+ "step": 305
1864
+ },
1865
+ {
1866
+ "epoch": 0.41,
1867
+ "learning_rate": 0.0002,
1868
+ "loss": 1.5722,
1869
+ "step": 306
1870
+ },
1871
+ {
1872
+ "epoch": 0.41,
1873
+ "learning_rate": 0.0002,
1874
+ "loss": 1.2584,
1875
+ "step": 307
1876
+ },
1877
+ {
1878
+ "epoch": 0.41,
1879
+ "learning_rate": 0.0002,
1880
+ "loss": 1.5793,
1881
+ "step": 308
1882
+ },
1883
+ {
1884
+ "epoch": 0.42,
1885
+ "learning_rate": 0.0002,
1886
+ "loss": 1.3942,
1887
+ "step": 309
1888
+ },
1889
+ {
1890
+ "epoch": 0.42,
1891
+ "learning_rate": 0.0002,
1892
+ "loss": 1.5487,
1893
+ "step": 310
1894
+ },
1895
+ {
1896
+ "epoch": 0.42,
1897
+ "learning_rate": 0.0002,
1898
+ "loss": 1.3595,
1899
+ "step": 311
1900
+ },
1901
+ {
1902
+ "epoch": 0.42,
1903
+ "learning_rate": 0.0002,
1904
+ "loss": 1.271,
1905
+ "step": 312
1906
+ },
1907
+ {
1908
+ "epoch": 0.42,
1909
+ "learning_rate": 0.0002,
1910
+ "loss": 1.6985,
1911
+ "step": 313
1912
+ },
1913
+ {
1914
+ "epoch": 0.42,
1915
+ "learning_rate": 0.0002,
1916
+ "loss": 1.2786,
1917
+ "step": 314
1918
+ },
1919
+ {
1920
+ "epoch": 0.42,
1921
+ "learning_rate": 0.0002,
1922
+ "loss": 1.7656,
1923
+ "step": 315
1924
+ },
1925
+ {
1926
+ "epoch": 0.42,
1927
+ "learning_rate": 0.0002,
1928
+ "loss": 1.5713,
1929
+ "step": 316
1930
+ },
1931
+ {
1932
+ "epoch": 0.43,
1933
+ "learning_rate": 0.0002,
1934
+ "loss": 1.3235,
1935
+ "step": 317
1936
+ },
1937
+ {
1938
+ "epoch": 0.43,
1939
+ "learning_rate": 0.0002,
1940
+ "loss": 1.3829,
1941
+ "step": 318
1942
+ },
1943
+ {
1944
+ "epoch": 0.43,
1945
+ "learning_rate": 0.0002,
1946
+ "loss": 1.4187,
1947
+ "step": 319
1948
+ },
1949
+ {
1950
+ "epoch": 0.43,
1951
+ "learning_rate": 0.0002,
1952
+ "loss": 1.3544,
1953
+ "step": 320
1954
+ },
1955
+ {
1956
+ "epoch": 0.43,
1957
+ "learning_rate": 0.0002,
1958
+ "loss": 1.5638,
1959
+ "step": 321
1960
+ },
1961
+ {
1962
+ "epoch": 0.43,
1963
+ "learning_rate": 0.0002,
1964
+ "loss": 1.269,
1965
+ "step": 322
1966
+ },
1967
+ {
1968
+ "epoch": 0.43,
1969
+ "learning_rate": 0.0002,
1970
+ "loss": 1.4917,
1971
+ "step": 323
1972
+ },
1973
+ {
1974
+ "epoch": 0.44,
1975
+ "learning_rate": 0.0002,
1976
+ "loss": 1.4635,
1977
+ "step": 324
1978
+ },
1979
+ {
1980
+ "epoch": 0.44,
1981
+ "learning_rate": 0.0002,
1982
+ "loss": 1.3772,
1983
+ "step": 325
1984
+ },
1985
+ {
1986
+ "epoch": 0.44,
1987
+ "learning_rate": 0.0002,
1988
+ "loss": 1.3561,
1989
+ "step": 326
1990
+ },
1991
+ {
1992
+ "epoch": 0.44,
1993
+ "learning_rate": 0.0002,
1994
+ "loss": 1.3586,
1995
+ "step": 327
1996
+ },
1997
+ {
1998
+ "epoch": 0.44,
1999
+ "learning_rate": 0.0002,
2000
+ "loss": 1.1845,
2001
+ "step": 328
2002
+ },
2003
+ {
2004
+ "epoch": 0.44,
2005
+ "learning_rate": 0.0002,
2006
+ "loss": 1.4362,
2007
+ "step": 329
2008
+ },
2009
+ {
2010
+ "epoch": 0.44,
2011
+ "learning_rate": 0.0002,
2012
+ "loss": 1.1881,
2013
+ "step": 330
2014
+ },
2015
+ {
2016
+ "epoch": 0.44,
2017
+ "learning_rate": 0.0002,
2018
+ "loss": 1.3098,
2019
+ "step": 331
2020
+ },
2021
+ {
2022
+ "epoch": 0.45,
2023
+ "learning_rate": 0.0002,
2024
+ "loss": 1.4673,
2025
+ "step": 332
2026
+ },
2027
+ {
2028
+ "epoch": 0.45,
2029
+ "learning_rate": 0.0002,
2030
+ "loss": 1.4496,
2031
+ "step": 333
2032
+ },
2033
+ {
2034
+ "epoch": 0.45,
2035
+ "learning_rate": 0.0002,
2036
+ "loss": 1.5788,
2037
+ "step": 334
2038
+ },
2039
+ {
2040
+ "epoch": 0.45,
2041
+ "learning_rate": 0.0002,
2042
+ "loss": 1.2582,
2043
+ "step": 335
2044
+ },
2045
+ {
2046
+ "epoch": 0.45,
2047
+ "learning_rate": 0.0002,
2048
+ "loss": 1.5255,
2049
+ "step": 336
2050
+ },
2051
+ {
2052
+ "epoch": 0.45,
2053
+ "learning_rate": 0.0002,
2054
+ "loss": 1.7017,
2055
+ "step": 337
2056
+ },
2057
+ {
2058
+ "epoch": 0.45,
2059
+ "learning_rate": 0.0002,
2060
+ "loss": 1.7231,
2061
+ "step": 338
2062
+ },
2063
+ {
2064
+ "epoch": 0.46,
2065
+ "learning_rate": 0.0002,
2066
+ "loss": 1.4447,
2067
+ "step": 339
2068
+ },
2069
+ {
2070
+ "epoch": 0.46,
2071
+ "learning_rate": 0.0002,
2072
+ "loss": 1.3386,
2073
+ "step": 340
2074
+ },
2075
+ {
2076
+ "epoch": 0.46,
2077
+ "learning_rate": 0.0002,
2078
+ "loss": 1.3791,
2079
+ "step": 341
2080
+ },
2081
+ {
2082
+ "epoch": 0.46,
2083
+ "learning_rate": 0.0002,
2084
+ "loss": 1.3071,
2085
+ "step": 342
2086
+ },
2087
+ {
2088
+ "epoch": 0.46,
2089
+ "learning_rate": 0.0002,
2090
+ "loss": 1.2949,
2091
+ "step": 343
2092
+ },
2093
+ {
2094
+ "epoch": 0.46,
2095
+ "learning_rate": 0.0002,
2096
+ "loss": 1.3033,
2097
+ "step": 344
2098
+ },
2099
+ {
2100
+ "epoch": 0.46,
2101
+ "learning_rate": 0.0002,
2102
+ "loss": 1.4243,
2103
+ "step": 345
2104
+ },
2105
+ {
2106
+ "epoch": 0.46,
2107
+ "learning_rate": 0.0002,
2108
+ "loss": 1.3747,
2109
+ "step": 346
2110
+ },
2111
+ {
2112
+ "epoch": 0.47,
2113
+ "learning_rate": 0.0002,
2114
+ "loss": 1.427,
2115
+ "step": 347
2116
+ },
2117
+ {
2118
+ "epoch": 0.47,
2119
+ "learning_rate": 0.0002,
2120
+ "loss": 1.8376,
2121
+ "step": 348
2122
+ },
2123
+ {
2124
+ "epoch": 0.47,
2125
+ "learning_rate": 0.0002,
2126
+ "loss": 1.7076,
2127
+ "step": 349
2128
+ },
2129
+ {
2130
+ "epoch": 0.47,
2131
+ "learning_rate": 0.0002,
2132
+ "loss": 1.3889,
2133
+ "step": 350
2134
+ },
2135
+ {
2136
+ "epoch": 0.47,
2137
+ "learning_rate": 0.0002,
2138
+ "loss": 1.4117,
2139
+ "step": 351
2140
+ },
2141
+ {
2142
+ "epoch": 0.47,
2143
+ "learning_rate": 0.0002,
2144
+ "loss": 1.4598,
2145
+ "step": 352
2146
+ },
2147
+ {
2148
+ "epoch": 0.47,
2149
+ "learning_rate": 0.0002,
2150
+ "loss": 1.2908,
2151
+ "step": 353
2152
+ },
2153
+ {
2154
+ "epoch": 0.48,
2155
+ "learning_rate": 0.0002,
2156
+ "loss": 1.4376,
2157
+ "step": 354
2158
+ },
2159
+ {
2160
+ "epoch": 0.48,
2161
+ "learning_rate": 0.0002,
2162
+ "loss": 1.5732,
2163
+ "step": 355
2164
+ },
2165
+ {
2166
+ "epoch": 0.48,
2167
+ "learning_rate": 0.0002,
2168
+ "loss": 1.3376,
2169
+ "step": 356
2170
+ },
2171
+ {
2172
+ "epoch": 0.48,
2173
+ "learning_rate": 0.0002,
2174
+ "loss": 1.3073,
2175
+ "step": 357
2176
+ },
2177
+ {
2178
+ "epoch": 0.48,
2179
+ "learning_rate": 0.0002,
2180
+ "loss": 1.592,
2181
+ "step": 358
2182
+ },
2183
+ {
2184
+ "epoch": 0.48,
2185
+ "learning_rate": 0.0002,
2186
+ "loss": 1.5166,
2187
+ "step": 359
2188
+ },
2189
+ {
2190
+ "epoch": 0.48,
2191
+ "learning_rate": 0.0002,
2192
+ "loss": 1.2739,
2193
+ "step": 360
2194
+ },
2195
+ {
2196
+ "epoch": 0.49,
2197
+ "learning_rate": 0.0002,
2198
+ "loss": 1.3329,
2199
+ "step": 361
2200
+ },
2201
+ {
2202
+ "epoch": 0.49,
2203
+ "learning_rate": 0.0002,
2204
+ "loss": 1.5451,
2205
+ "step": 362
2206
+ },
2207
+ {
2208
+ "epoch": 0.49,
2209
+ "learning_rate": 0.0002,
2210
+ "loss": 1.3675,
2211
+ "step": 363
2212
+ },
2213
+ {
2214
+ "epoch": 0.49,
2215
+ "learning_rate": 0.0002,
2216
+ "loss": 1.1963,
2217
+ "step": 364
2218
+ },
2219
+ {
2220
+ "epoch": 0.49,
2221
+ "learning_rate": 0.0002,
2222
+ "loss": 1.2345,
2223
+ "step": 365
2224
+ },
2225
+ {
2226
+ "epoch": 0.49,
2227
+ "learning_rate": 0.0002,
2228
+ "loss": 1.705,
2229
+ "step": 366
2230
+ },
2231
+ {
2232
+ "epoch": 0.49,
2233
+ "learning_rate": 0.0002,
2234
+ "loss": 1.3867,
2235
+ "step": 367
2236
+ },
2237
+ {
2238
+ "epoch": 0.49,
2239
+ "learning_rate": 0.0002,
2240
+ "loss": 1.3674,
2241
+ "step": 368
2242
+ },
2243
+ {
2244
+ "epoch": 0.5,
2245
+ "learning_rate": 0.0002,
2246
+ "loss": 1.4713,
2247
+ "step": 369
2248
+ },
2249
+ {
2250
+ "epoch": 0.5,
2251
+ "learning_rate": 0.0002,
2252
+ "loss": 1.2732,
2253
+ "step": 370
2254
+ },
2255
+ {
2256
+ "epoch": 0.5,
2257
+ "learning_rate": 0.0002,
2258
+ "loss": 1.3148,
2259
+ "step": 371
2260
+ },
2261
+ {
2262
+ "epoch": 0.5,
2263
+ "learning_rate": 0.0002,
2264
+ "loss": 1.2673,
2265
+ "step": 372
2266
+ },
2267
+ {
2268
+ "epoch": 0.5,
2269
+ "learning_rate": 0.0002,
2270
+ "loss": 1.2504,
2271
+ "step": 373
2272
+ },
2273
+ {
2274
+ "epoch": 0.5,
2275
+ "learning_rate": 0.0002,
2276
+ "loss": 1.4689,
2277
+ "step": 374
2278
+ },
2279
+ {
2280
+ "epoch": 0.5,
2281
+ "learning_rate": 0.0002,
2282
+ "loss": 1.3824,
2283
+ "step": 375
2284
+ },
2285
+ {
2286
+ "epoch": 0.51,
2287
+ "learning_rate": 0.0002,
2288
+ "loss": 1.2314,
2289
+ "step": 376
2290
+ },
2291
+ {
2292
+ "epoch": 0.51,
2293
+ "learning_rate": 0.0002,
2294
+ "loss": 1.6187,
2295
+ "step": 377
2296
+ },
2297
+ {
2298
+ "epoch": 0.51,
2299
+ "learning_rate": 0.0002,
2300
+ "loss": 1.27,
2301
+ "step": 378
2302
+ },
2303
+ {
2304
+ "epoch": 0.51,
2305
+ "learning_rate": 0.0002,
2306
+ "loss": 1.4199,
2307
+ "step": 379
2308
+ },
2309
+ {
2310
+ "epoch": 0.51,
2311
+ "learning_rate": 0.0002,
2312
+ "loss": 1.5226,
2313
+ "step": 380
2314
+ },
2315
+ {
2316
+ "epoch": 0.51,
2317
+ "learning_rate": 0.0002,
2318
+ "loss": 1.4534,
2319
+ "step": 381
2320
+ },
2321
+ {
2322
+ "epoch": 0.51,
2323
+ "learning_rate": 0.0002,
2324
+ "loss": 1.1076,
2325
+ "step": 382
2326
+ },
2327
+ {
2328
+ "epoch": 0.51,
2329
+ "learning_rate": 0.0002,
2330
+ "loss": 1.438,
2331
+ "step": 383
2332
+ },
2333
+ {
2334
+ "epoch": 0.52,
2335
+ "learning_rate": 0.0002,
2336
+ "loss": 1.2989,
2337
+ "step": 384
2338
+ },
2339
+ {
2340
+ "epoch": 0.52,
2341
+ "learning_rate": 0.0002,
2342
+ "loss": 1.2926,
2343
+ "step": 385
2344
+ },
2345
+ {
2346
+ "epoch": 0.52,
2347
+ "learning_rate": 0.0002,
2348
+ "loss": 1.2179,
2349
+ "step": 386
2350
+ },
2351
+ {
2352
+ "epoch": 0.52,
2353
+ "learning_rate": 0.0002,
2354
+ "loss": 1.3079,
2355
+ "step": 387
2356
+ },
2357
+ {
2358
+ "epoch": 0.52,
2359
+ "learning_rate": 0.0002,
2360
+ "loss": 1.3239,
2361
+ "step": 388
2362
+ },
2363
+ {
2364
+ "epoch": 0.52,
2365
+ "learning_rate": 0.0002,
2366
+ "loss": 1.4648,
2367
+ "step": 389
2368
+ },
2369
+ {
2370
+ "epoch": 0.52,
2371
+ "learning_rate": 0.0002,
2372
+ "loss": 1.309,
2373
+ "step": 390
2374
+ },
2375
+ {
2376
+ "epoch": 0.53,
2377
+ "learning_rate": 0.0002,
2378
+ "loss": 1.2591,
2379
+ "step": 391
2380
+ },
2381
+ {
2382
+ "epoch": 0.53,
2383
+ "learning_rate": 0.0002,
2384
+ "loss": 1.2762,
2385
+ "step": 392
2386
+ },
2387
+ {
2388
+ "epoch": 0.53,
2389
+ "learning_rate": 0.0002,
2390
+ "loss": 1.3492,
2391
+ "step": 393
2392
+ },
2393
+ {
2394
+ "epoch": 0.53,
2395
+ "learning_rate": 0.0002,
2396
+ "loss": 1.4163,
2397
+ "step": 394
2398
+ },
2399
+ {
2400
+ "epoch": 0.53,
2401
+ "learning_rate": 0.0002,
2402
+ "loss": 1.6296,
2403
+ "step": 395
2404
+ },
2405
+ {
2406
+ "epoch": 0.53,
2407
+ "learning_rate": 0.0002,
2408
+ "loss": 1.3282,
2409
+ "step": 396
2410
+ },
2411
+ {
2412
+ "epoch": 0.53,
2413
+ "learning_rate": 0.0002,
2414
+ "loss": 1.3957,
2415
+ "step": 397
2416
+ },
2417
+ {
2418
+ "epoch": 0.53,
2419
+ "learning_rate": 0.0002,
2420
+ "loss": 1.4954,
2421
+ "step": 398
2422
+ },
2423
+ {
2424
+ "epoch": 0.54,
2425
+ "learning_rate": 0.0002,
2426
+ "loss": 1.0683,
2427
+ "step": 399
2428
+ },
2429
+ {
2430
+ "epoch": 0.54,
2431
+ "learning_rate": 0.0002,
2432
+ "loss": 1.4019,
2433
+ "step": 400
2434
+ },
2435
+ {
2436
+ "epoch": 0.54,
2437
+ "eval_loss": 1.3792049884796143,
2438
+ "eval_runtime": 441.0954,
2439
+ "eval_samples_per_second": 1.564,
2440
+ "eval_steps_per_second": 0.392,
2441
+ "step": 400
2442
+ },
2443
+ {
+ "epoch": 0.54,
+ "learning_rate": 0.0002,
+ "loss": 1.4533,
+ "step": 401
+ },
+ {
+ "epoch": 0.54,
+ "learning_rate": 0.0002,
+ "loss": 1.5101,
+ "step": 402
+ },
+ {
+ "epoch": 0.54,
+ "learning_rate": 0.0002,
+ "loss": 1.3062,
+ "step": 403
+ },
+ {
+ "epoch": 0.54,
+ "learning_rate": 0.0002,
+ "loss": 1.4517,
+ "step": 404
+ },
+ {
+ "epoch": 0.54,
+ "learning_rate": 0.0002,
+ "loss": 1.4478,
+ "step": 405
+ },
+ {
+ "epoch": 0.55,
+ "learning_rate": 0.0002,
+ "loss": 1.4938,
+ "step": 406
+ },
+ {
+ "epoch": 0.55,
+ "learning_rate": 0.0002,
+ "loss": 1.3137,
+ "step": 407
+ },
+ {
+ "epoch": 0.55,
+ "learning_rate": 0.0002,
+ "loss": 1.254,
+ "step": 408
+ },
+ {
+ "epoch": 0.55,
+ "learning_rate": 0.0002,
+ "loss": 1.4163,
+ "step": 409
+ },
+ {
+ "epoch": 0.55,
+ "learning_rate": 0.0002,
+ "loss": 1.7432,
+ "step": 410
+ },
+ {
+ "epoch": 0.55,
+ "learning_rate": 0.0002,
+ "loss": 1.5261,
+ "step": 411
+ },
+ {
+ "epoch": 0.55,
+ "learning_rate": 0.0002,
+ "loss": 1.2845,
+ "step": 412
+ },
+ {
+ "epoch": 0.55,
+ "learning_rate": 0.0002,
+ "loss": 1.2449,
+ "step": 413
+ },
+ {
+ "epoch": 0.56,
+ "learning_rate": 0.0002,
+ "loss": 1.68,
+ "step": 414
+ },
+ {
+ "epoch": 0.56,
+ "learning_rate": 0.0002,
+ "loss": 2.0014,
+ "step": 415
+ },
+ {
+ "epoch": 0.56,
+ "learning_rate": 0.0002,
+ "loss": 1.3557,
+ "step": 416
+ },
+ {
+ "epoch": 0.56,
+ "learning_rate": 0.0002,
+ "loss": 1.8836,
+ "step": 417
+ },
+ {
+ "epoch": 0.56,
+ "learning_rate": 0.0002,
+ "loss": 1.496,
+ "step": 418
+ },
+ {
+ "epoch": 0.56,
+ "learning_rate": 0.0002,
+ "loss": 1.283,
+ "step": 419
+ },
+ {
+ "epoch": 0.56,
+ "learning_rate": 0.0002,
+ "loss": 1.4569,
+ "step": 420
+ },
+ {
+ "epoch": 0.57,
+ "learning_rate": 0.0002,
+ "loss": 1.455,
+ "step": 421
+ },
+ {
+ "epoch": 0.57,
+ "learning_rate": 0.0002,
+ "loss": 1.4418,
+ "step": 422
+ },
+ {
+ "epoch": 0.57,
+ "learning_rate": 0.0002,
+ "loss": 1.4152,
+ "step": 423
+ },
+ {
+ "epoch": 0.57,
+ "learning_rate": 0.0002,
+ "loss": 1.3991,
+ "step": 424
+ },
+ {
+ "epoch": 0.57,
+ "learning_rate": 0.0002,
+ "loss": 1.194,
+ "step": 425
+ },
+ {
+ "epoch": 0.57,
+ "learning_rate": 0.0002,
+ "loss": 1.4039,
+ "step": 426
+ },
+ {
+ "epoch": 0.57,
+ "learning_rate": 0.0002,
+ "loss": 1.4844,
+ "step": 427
+ },
+ {
+ "epoch": 0.58,
+ "learning_rate": 0.0002,
+ "loss": 1.5439,
+ "step": 428
+ },
+ {
+ "epoch": 0.58,
+ "learning_rate": 0.0002,
+ "loss": 1.3603,
+ "step": 429
+ },
+ {
+ "epoch": 0.58,
+ "learning_rate": 0.0002,
+ "loss": 1.4299,
+ "step": 430
+ },
+ {
+ "epoch": 0.58,
+ "learning_rate": 0.0002,
+ "loss": 1.6496,
+ "step": 431
+ },
+ {
+ "epoch": 0.58,
+ "learning_rate": 0.0002,
+ "loss": 1.447,
+ "step": 432
+ },
+ {
+ "epoch": 0.58,
+ "learning_rate": 0.0002,
+ "loss": 1.1043,
+ "step": 433
+ },
+ {
+ "epoch": 0.58,
+ "learning_rate": 0.0002,
+ "loss": 1.5972,
+ "step": 434
+ },
+ {
+ "epoch": 0.58,
+ "learning_rate": 0.0002,
+ "loss": 1.5962,
+ "step": 435
+ },
+ {
+ "epoch": 0.59,
+ "learning_rate": 0.0002,
+ "loss": 1.4682,
+ "step": 436
+ },
+ {
+ "epoch": 0.59,
+ "learning_rate": 0.0002,
+ "loss": 1.4733,
+ "step": 437
+ },
+ {
+ "epoch": 0.59,
+ "learning_rate": 0.0002,
+ "loss": 1.5552,
+ "step": 438
+ },
+ {
+ "epoch": 0.59,
+ "learning_rate": 0.0002,
+ "loss": 1.3559,
+ "step": 439
+ },
+ {
+ "epoch": 0.59,
+ "learning_rate": 0.0002,
+ "loss": 1.4644,
+ "step": 440
+ },
+ {
+ "epoch": 0.59,
+ "learning_rate": 0.0002,
+ "loss": 1.3019,
+ "step": 441
+ },
+ {
+ "epoch": 0.59,
+ "learning_rate": 0.0002,
+ "loss": 1.3299,
+ "step": 442
+ },
+ {
+ "epoch": 0.6,
+ "learning_rate": 0.0002,
+ "loss": 1.2782,
+ "step": 443
+ },
+ {
+ "epoch": 0.6,
+ "learning_rate": 0.0002,
+ "loss": 1.4651,
+ "step": 444
+ },
+ {
+ "epoch": 0.6,
+ "learning_rate": 0.0002,
+ "loss": 1.3575,
+ "step": 445
+ },
+ {
+ "epoch": 0.6,
+ "learning_rate": 0.0002,
+ "loss": 1.4675,
+ "step": 446
+ },
+ {
+ "epoch": 0.6,
+ "learning_rate": 0.0002,
+ "loss": 1.2348,
+ "step": 447
+ },
+ {
+ "epoch": 0.6,
+ "learning_rate": 0.0002,
+ "loss": 1.3047,
+ "step": 448
+ },
+ {
+ "epoch": 0.6,
+ "learning_rate": 0.0002,
+ "loss": 1.3681,
+ "step": 449
+ },
+ {
+ "epoch": 0.6,
+ "learning_rate": 0.0002,
+ "loss": 1.1474,
+ "step": 450
+ },
+ {
+ "epoch": 0.61,
+ "learning_rate": 0.0002,
+ "loss": 1.328,
+ "step": 451
+ },
+ {
+ "epoch": 0.61,
+ "learning_rate": 0.0002,
+ "loss": 1.5314,
+ "step": 452
+ },
+ {
+ "epoch": 0.61,
+ "learning_rate": 0.0002,
+ "loss": 1.2926,
+ "step": 453
+ },
+ {
+ "epoch": 0.61,
+ "learning_rate": 0.0002,
+ "loss": 1.4488,
+ "step": 454
+ },
+ {
+ "epoch": 0.61,
+ "learning_rate": 0.0002,
+ "loss": 1.5578,
+ "step": 455
+ },
+ {
+ "epoch": 0.61,
+ "learning_rate": 0.0002,
+ "loss": 1.2888,
+ "step": 456
+ },
+ {
+ "epoch": 0.61,
+ "learning_rate": 0.0002,
+ "loss": 1.4423,
+ "step": 457
+ },
+ {
+ "epoch": 0.62,
+ "learning_rate": 0.0002,
+ "loss": 1.5929,
+ "step": 458
+ },
+ {
+ "epoch": 0.62,
+ "learning_rate": 0.0002,
+ "loss": 1.5411,
+ "step": 459
+ },
+ {
+ "epoch": 0.62,
+ "learning_rate": 0.0002,
+ "loss": 1.2644,
+ "step": 460
+ },
+ {
+ "epoch": 0.62,
+ "learning_rate": 0.0002,
+ "loss": 1.3733,
+ "step": 461
+ },
+ {
+ "epoch": 0.62,
+ "learning_rate": 0.0002,
+ "loss": 1.6307,
+ "step": 462
+ },
+ {
+ "epoch": 0.62,
+ "learning_rate": 0.0002,
+ "loss": 1.492,
+ "step": 463
+ },
+ {
+ "epoch": 0.62,
+ "learning_rate": 0.0002,
+ "loss": 1.4052,
+ "step": 464
+ },
+ {
+ "epoch": 0.62,
+ "learning_rate": 0.0002,
+ "loss": 1.2535,
+ "step": 465
+ },
+ {
+ "epoch": 0.63,
+ "learning_rate": 0.0002,
+ "loss": 1.6804,
+ "step": 466
+ },
+ {
+ "epoch": 0.63,
+ "learning_rate": 0.0002,
+ "loss": 1.306,
+ "step": 467
+ },
+ {
+ "epoch": 0.63,
+ "learning_rate": 0.0002,
+ "loss": 1.3982,
+ "step": 468
+ },
+ {
+ "epoch": 0.63,
+ "learning_rate": 0.0002,
+ "loss": 1.3402,
+ "step": 469
+ },
+ {
+ "epoch": 0.63,
+ "learning_rate": 0.0002,
+ "loss": 1.2094,
+ "step": 470
+ },
+ {
+ "epoch": 0.63,
+ "learning_rate": 0.0002,
+ "loss": 1.3615,
+ "step": 471
+ },
+ {
+ "epoch": 0.63,
+ "learning_rate": 0.0002,
+ "loss": 1.3427,
+ "step": 472
+ },
+ {
+ "epoch": 0.64,
+ "learning_rate": 0.0002,
+ "loss": 1.4148,
+ "step": 473
+ },
+ {
+ "epoch": 0.64,
+ "learning_rate": 0.0002,
+ "loss": 1.4688,
+ "step": 474
+ },
+ {
+ "epoch": 0.64,
+ "learning_rate": 0.0002,
+ "loss": 1.2682,
+ "step": 475
+ },
+ {
+ "epoch": 0.64,
+ "learning_rate": 0.0002,
+ "loss": 1.3589,
+ "step": 476
+ },
+ {
+ "epoch": 0.64,
+ "learning_rate": 0.0002,
+ "loss": 1.2851,
+ "step": 477
+ },
+ {
+ "epoch": 0.64,
+ "learning_rate": 0.0002,
+ "loss": 1.3374,
+ "step": 478
+ },
+ {
+ "epoch": 0.64,
+ "learning_rate": 0.0002,
+ "loss": 1.3439,
+ "step": 479
+ },
+ {
+ "epoch": 0.64,
+ "learning_rate": 0.0002,
+ "loss": 1.4842,
+ "step": 480
+ },
+ {
+ "epoch": 0.65,
+ "learning_rate": 0.0002,
+ "loss": 1.5513,
+ "step": 481
+ },
+ {
+ "epoch": 0.65,
+ "learning_rate": 0.0002,
+ "loss": 1.48,
+ "step": 482
+ },
+ {
+ "epoch": 0.65,
+ "learning_rate": 0.0002,
+ "loss": 1.2883,
+ "step": 483
+ },
+ {
+ "epoch": 0.65,
+ "learning_rate": 0.0002,
+ "loss": 1.4636,
+ "step": 484
+ },
+ {
+ "epoch": 0.65,
+ "learning_rate": 0.0002,
+ "loss": 1.4196,
+ "step": 485
+ },
+ {
+ "epoch": 0.65,
+ "learning_rate": 0.0002,
+ "loss": 1.3489,
+ "step": 486
+ },
+ {
+ "epoch": 0.65,
+ "learning_rate": 0.0002,
+ "loss": 1.3909,
+ "step": 487
+ },
+ {
+ "epoch": 0.66,
+ "learning_rate": 0.0002,
+ "loss": 1.2938,
+ "step": 488
+ },
+ {
+ "epoch": 0.66,
+ "learning_rate": 0.0002,
+ "loss": 1.3898,
+ "step": 489
+ },
+ {
+ "epoch": 0.66,
+ "learning_rate": 0.0002,
+ "loss": 1.6946,
+ "step": 490
+ },
+ {
+ "epoch": 0.66,
+ "learning_rate": 0.0002,
+ "loss": 1.528,
+ "step": 491
+ },
+ {
+ "epoch": 0.66,
+ "learning_rate": 0.0002,
+ "loss": 1.4905,
+ "step": 492
+ },
+ {
+ "epoch": 0.66,
+ "learning_rate": 0.0002,
+ "loss": 1.2424,
+ "step": 493
+ },
+ {
+ "epoch": 0.66,
+ "learning_rate": 0.0002,
+ "loss": 1.475,
+ "step": 494
+ },
+ {
+ "epoch": 0.67,
+ "learning_rate": 0.0002,
+ "loss": 1.3993,
+ "step": 495
+ },
+ {
+ "epoch": 0.67,
+ "learning_rate": 0.0002,
+ "loss": 1.3028,
+ "step": 496
+ },
+ {
+ "epoch": 0.67,
+ "learning_rate": 0.0002,
+ "loss": 1.4869,
+ "step": 497
+ },
+ {
+ "epoch": 0.67,
+ "learning_rate": 0.0002,
+ "loss": 1.6581,
+ "step": 498
+ },
+ {
+ "epoch": 0.67,
+ "learning_rate": 0.0002,
+ "loss": 1.4166,
+ "step": 499
+ },
+ {
+ "epoch": 0.67,
+ "learning_rate": 0.0002,
+ "loss": 1.9966,
+ "step": 500
+ },
+ {
+ "epoch": 0.67,
+ "eval_loss": 1.3746258020401,
+ "eval_runtime": 440.7359,
+ "eval_samples_per_second": 1.566,
+ "eval_steps_per_second": 0.393,
+ "step": 500
+ },
+ {
+ "epoch": 0.67,
+ "learning_rate": 0.0002,
+ "loss": 1.2995,
+ "step": 501
+ },
+ {
+ "epoch": 0.67,
+ "learning_rate": 0.0002,
+ "loss": 1.2557,
+ "step": 502
+ },
+ {
+ "epoch": 0.68,
+ "learning_rate": 0.0002,
+ "loss": 1.2462,
+ "step": 503
+ },
+ {
+ "epoch": 0.68,
+ "learning_rate": 0.0002,
+ "loss": 1.5088,
+ "step": 504
+ },
+ {
+ "epoch": 0.68,
+ "learning_rate": 0.0002,
+ "loss": 1.6118,
+ "step": 505
+ },
+ {
+ "epoch": 0.68,
+ "learning_rate": 0.0002,
+ "loss": 1.1935,
+ "step": 506
+ },
+ {
+ "epoch": 0.68,
+ "learning_rate": 0.0002,
+ "loss": 1.4858,
+ "step": 507
+ },
+ {
+ "epoch": 0.68,
+ "learning_rate": 0.0002,
+ "loss": 1.6135,
+ "step": 508
+ },
+ {
+ "epoch": 0.68,
+ "learning_rate": 0.0002,
+ "loss": 1.329,
+ "step": 509
+ },
+ {
+ "epoch": 0.69,
+ "learning_rate": 0.0002,
+ "loss": 1.6557,
+ "step": 510
+ },
+ {
+ "epoch": 0.69,
+ "learning_rate": 0.0002,
+ "loss": 1.5889,
+ "step": 511
+ },
+ {
+ "epoch": 0.69,
+ "learning_rate": 0.0002,
+ "loss": 1.3667,
+ "step": 512
+ },
+ {
+ "epoch": 0.69,
+ "learning_rate": 0.0002,
+ "loss": 1.7799,
+ "step": 513
+ },
+ {
+ "epoch": 0.69,
+ "learning_rate": 0.0002,
+ "loss": 1.3817,
+ "step": 514
+ },
+ {
+ "epoch": 0.69,
+ "learning_rate": 0.0002,
+ "loss": 1.4662,
+ "step": 515
+ },
+ {
+ "epoch": 0.69,
+ "learning_rate": 0.0002,
+ "loss": 1.4186,
+ "step": 516
+ },
+ {
+ "epoch": 0.69,
+ "learning_rate": 0.0002,
+ "loss": 1.4437,
+ "step": 517
+ },
+ {
+ "epoch": 0.7,
+ "learning_rate": 0.0002,
+ "loss": 1.3603,
+ "step": 518
+ },
+ {
+ "epoch": 0.7,
+ "learning_rate": 0.0002,
+ "loss": 1.6023,
+ "step": 519
+ },
+ {
+ "epoch": 0.7,
+ "learning_rate": 0.0002,
+ "loss": 1.6167,
+ "step": 520
+ },
+ {
+ "epoch": 0.7,
+ "learning_rate": 0.0002,
+ "loss": 1.5113,
+ "step": 521
+ },
+ {
+ "epoch": 0.7,
+ "learning_rate": 0.0002,
+ "loss": 1.3075,
+ "step": 522
+ },
+ {
+ "epoch": 0.7,
+ "learning_rate": 0.0002,
+ "loss": 1.5514,
+ "step": 523
+ },
+ {
+ "epoch": 0.7,
+ "learning_rate": 0.0002,
+ "loss": 1.6074,
+ "step": 524
+ },
+ {
+ "epoch": 0.71,
+ "learning_rate": 0.0002,
+ "loss": 1.3738,
+ "step": 525
+ },
+ {
+ "epoch": 0.71,
+ "learning_rate": 0.0002,
+ "loss": 1.766,
+ "step": 526
+ },
+ {
+ "epoch": 0.71,
+ "learning_rate": 0.0002,
+ "loss": 1.1326,
+ "step": 527
+ },
+ {
+ "epoch": 0.71,
+ "learning_rate": 0.0002,
+ "loss": 1.6338,
+ "step": 528
+ },
+ {
+ "epoch": 0.71,
+ "learning_rate": 0.0002,
+ "loss": 1.3261,
+ "step": 529
+ },
+ {
+ "epoch": 0.71,
+ "learning_rate": 0.0002,
+ "loss": 1.4421,
+ "step": 530
+ },
+ {
+ "epoch": 0.71,
+ "learning_rate": 0.0002,
+ "loss": 1.1819,
+ "step": 531
+ },
+ {
+ "epoch": 0.71,
+ "learning_rate": 0.0002,
+ "loss": 1.3441,
+ "step": 532
+ },
+ {
+ "epoch": 0.72,
+ "learning_rate": 0.0002,
+ "loss": 1.2674,
+ "step": 533
+ },
+ {
+ "epoch": 0.72,
+ "learning_rate": 0.0002,
+ "loss": 1.0905,
+ "step": 534
+ },
+ {
+ "epoch": 0.72,
+ "learning_rate": 0.0002,
+ "loss": 1.7163,
+ "step": 535
+ },
+ {
+ "epoch": 0.72,
+ "learning_rate": 0.0002,
+ "loss": 1.4708,
+ "step": 536
+ },
+ {
+ "epoch": 0.72,
+ "learning_rate": 0.0002,
+ "loss": 1.2213,
+ "step": 537
+ },
+ {
+ "epoch": 0.72,
+ "learning_rate": 0.0002,
+ "loss": 1.4032,
+ "step": 538
+ },
+ {
+ "epoch": 0.72,
+ "learning_rate": 0.0002,
+ "loss": 1.4613,
+ "step": 539
+ },
+ {
+ "epoch": 0.73,
+ "learning_rate": 0.0002,
+ "loss": 1.1315,
+ "step": 540
+ },
+ {
+ "epoch": 0.73,
+ "learning_rate": 0.0002,
+ "loss": 1.4049,
+ "step": 541
+ },
+ {
+ "epoch": 0.73,
+ "learning_rate": 0.0002,
+ "loss": 1.2075,
+ "step": 542
+ },
+ {
+ "epoch": 0.73,
+ "learning_rate": 0.0002,
+ "loss": 1.2874,
+ "step": 543
+ },
+ {
+ "epoch": 0.73,
+ "learning_rate": 0.0002,
+ "loss": 1.9946,
+ "step": 544
+ },
+ {
+ "epoch": 0.73,
+ "learning_rate": 0.0002,
+ "loss": 1.2956,
+ "step": 545
+ },
+ {
+ "epoch": 0.73,
+ "learning_rate": 0.0002,
+ "loss": 1.5638,
+ "step": 546
+ },
+ {
+ "epoch": 0.73,
+ "learning_rate": 0.0002,
+ "loss": 1.4105,
+ "step": 547
+ },
+ {
+ "epoch": 0.74,
+ "learning_rate": 0.0002,
+ "loss": 1.2435,
+ "step": 548
+ },
+ {
+ "epoch": 0.74,
+ "learning_rate": 0.0002,
+ "loss": 1.3654,
+ "step": 549
+ },
+ {
+ "epoch": 0.74,
+ "learning_rate": 0.0002,
+ "loss": 1.7154,
+ "step": 550
+ },
+ {
+ "epoch": 0.74,
+ "learning_rate": 0.0002,
+ "loss": 1.2973,
+ "step": 551
+ },
+ {
+ "epoch": 0.74,
+ "learning_rate": 0.0002,
+ "loss": 1.2755,
+ "step": 552
+ },
+ {
+ "epoch": 0.74,
+ "learning_rate": 0.0002,
+ "loss": 1.5998,
+ "step": 553
+ },
+ {
+ "epoch": 0.74,
+ "learning_rate": 0.0002,
+ "loss": 1.4952,
+ "step": 554
+ },
+ {
+ "epoch": 0.75,
+ "learning_rate": 0.0002,
+ "loss": 1.0843,
+ "step": 555
+ },
+ {
+ "epoch": 0.75,
+ "learning_rate": 0.0002,
+ "loss": 1.4332,
+ "step": 556
+ },
+ {
+ "epoch": 0.75,
+ "learning_rate": 0.0002,
+ "loss": 1.3382,
+ "step": 557
+ },
+ {
+ "epoch": 0.75,
+ "learning_rate": 0.0002,
+ "loss": 1.6568,
+ "step": 558
+ },
+ {
+ "epoch": 0.75,
+ "learning_rate": 0.0002,
+ "loss": 1.4465,
+ "step": 559
+ },
+ {
+ "epoch": 0.75,
+ "learning_rate": 0.0002,
+ "loss": 1.7039,
+ "step": 560
+ },
+ {
+ "epoch": 0.75,
+ "learning_rate": 0.0002,
+ "loss": 1.3814,
+ "step": 561
+ },
+ {
+ "epoch": 0.76,
+ "learning_rate": 0.0002,
+ "loss": 1.3159,
+ "step": 562
+ },
+ {
+ "epoch": 0.76,
+ "learning_rate": 0.0002,
+ "loss": 1.2362,
+ "step": 563
+ },
+ {
+ "epoch": 0.76,
+ "learning_rate": 0.0002,
+ "loss": 1.3358,
+ "step": 564
+ },
+ {
+ "epoch": 0.76,
+ "learning_rate": 0.0002,
+ "loss": 1.5882,
+ "step": 565
+ },
+ {
+ "epoch": 0.76,
+ "learning_rate": 0.0002,
+ "loss": 1.4905,
+ "step": 566
+ },
+ {
+ "epoch": 0.76,
+ "learning_rate": 0.0002,
+ "loss": 1.2225,
+ "step": 567
+ },
+ {
+ "epoch": 0.76,
+ "learning_rate": 0.0002,
+ "loss": 1.8351,
+ "step": 568
+ },
+ {
+ "epoch": 0.76,
+ "learning_rate": 0.0002,
+ "loss": 1.6711,
+ "step": 569
+ },
+ {
+ "epoch": 0.77,
+ "learning_rate": 0.0002,
+ "loss": 1.3793,
+ "step": 570
+ },
+ {
+ "epoch": 0.77,
+ "learning_rate": 0.0002,
+ "loss": 1.6333,
+ "step": 571
+ },
+ {
+ "epoch": 0.77,
+ "learning_rate": 0.0002,
+ "loss": 1.3526,
+ "step": 572
+ },
+ {
+ "epoch": 0.77,
+ "learning_rate": 0.0002,
+ "loss": 1.6924,
+ "step": 573
+ },
+ {
+ "epoch": 0.77,
+ "learning_rate": 0.0002,
+ "loss": 1.3053,
+ "step": 574
+ },
+ {
+ "epoch": 0.77,
+ "learning_rate": 0.0002,
+ "loss": 1.0704,
+ "step": 575
+ },
+ {
+ "epoch": 0.77,
+ "learning_rate": 0.0002,
+ "loss": 1.2368,
+ "step": 576
+ },
+ {
+ "epoch": 0.78,
+ "learning_rate": 0.0002,
+ "loss": 1.2332,
+ "step": 577
+ },
+ {
+ "epoch": 0.78,
+ "learning_rate": 0.0002,
+ "loss": 1.4293,
+ "step": 578
+ },
+ {
+ "epoch": 0.78,
+ "learning_rate": 0.0002,
+ "loss": 1.4907,
+ "step": 579
+ },
+ {
+ "epoch": 0.78,
+ "learning_rate": 0.0002,
+ "loss": 1.7018,
+ "step": 580
+ },
+ {
+ "epoch": 0.78,
+ "learning_rate": 0.0002,
+ "loss": 1.4077,
+ "step": 581
+ },
+ {
+ "epoch": 0.78,
+ "learning_rate": 0.0002,
+ "loss": 1.3053,
+ "step": 582
+ },
+ {
+ "epoch": 0.78,
+ "learning_rate": 0.0002,
+ "loss": 1.3998,
+ "step": 583
+ },
+ {
+ "epoch": 0.78,
+ "learning_rate": 0.0002,
+ "loss": 1.2415,
+ "step": 584
+ },
+ {
+ "epoch": 0.79,
+ "learning_rate": 0.0002,
+ "loss": 1.3822,
+ "step": 585
+ },
+ {
+ "epoch": 0.79,
+ "learning_rate": 0.0002,
+ "loss": 1.3607,
+ "step": 586
+ },
+ {
+ "epoch": 0.79,
+ "learning_rate": 0.0002,
+ "loss": 1.483,
+ "step": 587
+ },
+ {
+ "epoch": 0.79,
+ "learning_rate": 0.0002,
+ "loss": 1.6341,
+ "step": 588
+ },
+ {
+ "epoch": 0.79,
+ "learning_rate": 0.0002,
+ "loss": 1.5254,
+ "step": 589
+ },
+ {
+ "epoch": 0.79,
+ "learning_rate": 0.0002,
+ "loss": 1.5788,
+ "step": 590
+ },
+ {
+ "epoch": 0.79,
+ "learning_rate": 0.0002,
+ "loss": 1.3189,
+ "step": 591
+ },
+ {
+ "epoch": 0.8,
+ "learning_rate": 0.0002,
+ "loss": 1.4361,
+ "step": 592
+ },
+ {
+ "epoch": 0.8,
+ "learning_rate": 0.0002,
+ "loss": 1.3071,
+ "step": 593
+ },
+ {
+ "epoch": 0.8,
+ "learning_rate": 0.0002,
+ "loss": 1.4418,
+ "step": 594
+ },
+ {
+ "epoch": 0.8,
+ "learning_rate": 0.0002,
+ "loss": 1.4925,
+ "step": 595
+ },
+ {
+ "epoch": 0.8,
+ "learning_rate": 0.0002,
+ "loss": 1.4968,
+ "step": 596
+ },
+ {
+ "epoch": 0.8,
+ "learning_rate": 0.0002,
+ "loss": 1.6274,
+ "step": 597
+ },
+ {
+ "epoch": 0.8,
+ "learning_rate": 0.0002,
+ "loss": 1.7317,
+ "step": 598
+ },
+ {
+ "epoch": 0.8,
+ "learning_rate": 0.0002,
+ "loss": 1.3714,
+ "step": 599
+ },
+ {
+ "epoch": 0.81,
+ "learning_rate": 0.0002,
+ "loss": 1.3965,
+ "step": 600
+ },
+ {
+ "epoch": 0.81,
+ "eval_loss": 1.3718293905258179,
+ "eval_runtime": 440.6925,
+ "eval_samples_per_second": 1.566,
+ "eval_steps_per_second": 0.393,
+ "step": 600
+ },
+ {
+ "epoch": 0.81,
+ "learning_rate": 0.0002,
+ "loss": 1.5125,
+ "step": 601
+ },
+ {
+ "epoch": 0.81,
+ "learning_rate": 0.0002,
+ "loss": 1.3987,
+ "step": 602
+ },
+ {
+ "epoch": 0.81,
+ "learning_rate": 0.0002,
+ "loss": 1.3577,
+ "step": 603
+ },
+ {
+ "epoch": 0.81,
+ "learning_rate": 0.0002,
+ "loss": 1.3159,
+ "step": 604
+ },
+ {
+ "epoch": 0.81,
+ "learning_rate": 0.0002,
+ "loss": 1.197,
+ "step": 605
+ },
+ {
+ "epoch": 0.81,
+ "learning_rate": 0.0002,
+ "loss": 1.2876,
+ "step": 606
+ },
+ {
+ "epoch": 0.82,
+ "learning_rate": 0.0002,
+ "loss": 1.3119,
+ "step": 607
+ },
+ {
+ "epoch": 0.82,
+ "learning_rate": 0.0002,
+ "loss": 1.6125,
+ "step": 608
+ },
+ {
+ "epoch": 0.82,
+ "learning_rate": 0.0002,
+ "loss": 1.2761,
+ "step": 609
+ },
+ {
+ "epoch": 0.82,
+ "learning_rate": 0.0002,
+ "loss": 1.7309,
+ "step": 610
+ },
+ {
+ "epoch": 0.82,
+ "learning_rate": 0.0002,
+ "loss": 1.4789,
+ "step": 611
+ },
+ {
+ "epoch": 0.82,
+ "learning_rate": 0.0002,
+ "loss": 1.3247,
+ "step": 612
+ },
+ {
+ "epoch": 0.82,
+ "learning_rate": 0.0002,
+ "loss": 1.3337,
+ "step": 613
+ },
+ {
+ "epoch": 0.82,
+ "learning_rate": 0.0002,
+ "loss": 1.5705,
+ "step": 614
+ },
+ {
+ "epoch": 0.83,
+ "learning_rate": 0.0002,
+ "loss": 1.3059,
+ "step": 615
+ },
+ {
+ "epoch": 0.83,
+ "learning_rate": 0.0002,
+ "loss": 1.4452,
+ "step": 616
+ },
+ {
+ "epoch": 0.83,
+ "learning_rate": 0.0002,
+ "loss": 1.6685,
+ "step": 617
+ },
+ {
+ "epoch": 0.83,
+ "learning_rate": 0.0002,
+ "loss": 1.3522,
+ "step": 618
+ },
+ {
+ "epoch": 0.83,
+ "learning_rate": 0.0002,
+ "loss": 1.1878,
+ "step": 619
+ },
+ {
+ "epoch": 0.83,
+ "learning_rate": 0.0002,
+ "loss": 1.294,
+ "step": 620
+ },
+ {
+ "epoch": 0.83,
+ "learning_rate": 0.0002,
+ "loss": 1.613,
+ "step": 621
+ },
+ {
+ "epoch": 0.84,
+ "learning_rate": 0.0002,
+ "loss": 1.3739,
+ "step": 622
+ },
+ {
+ "epoch": 0.84,
+ "learning_rate": 0.0002,
+ "loss": 1.4221,
+ "step": 623
+ },
+ {
+ "epoch": 0.84,
+ "learning_rate": 0.0002,
+ "loss": 1.5149,
+ "step": 624
+ },
+ {
+ "epoch": 0.84,
+ "learning_rate": 0.0002,
+ "loss": 1.3332,
+ "step": 625
+ },
+ {
+ "epoch": 0.84,
+ "learning_rate": 0.0002,
+ "loss": 1.6892,
+ "step": 626
+ },
+ {
+ "epoch": 0.84,
+ "learning_rate": 0.0002,
+ "loss": 1.1803,
+ "step": 627
+ },
+ {
+ "epoch": 0.84,
+ "learning_rate": 0.0002,
+ "loss": 1.4843,
+ "step": 628
+ },
+ {
+ "epoch": 0.85,
+ "learning_rate": 0.0002,
+ "loss": 1.5341,
+ "step": 629
+ },
+ {
+ "epoch": 0.85,
+ "learning_rate": 0.0002,
+ "loss": 1.2203,
+ "step": 630
+ },
+ {
+ "epoch": 0.85,
+ "learning_rate": 0.0002,
+ "loss": 1.4969,
+ "step": 631
+ },
+ {
+ "epoch": 0.85,
+ "learning_rate": 0.0002,
+ "loss": 1.5029,
+ "step": 632
+ },
+ {
+ "epoch": 0.85,
+ "learning_rate": 0.0002,
+ "loss": 1.2501,
+ "step": 633
+ },
+ {
+ "epoch": 0.85,
+ "learning_rate": 0.0002,
+ "loss": 1.5621,
+ "step": 634
+ },
+ {
+ "epoch": 0.85,
+ "learning_rate": 0.0002,
+ "loss": 1.4174,
+ "step": 635
+ },
+ {
+ "epoch": 0.85,
+ "learning_rate": 0.0002,
+ "loss": 1.3022,
+ "step": 636
+ },
+ {
+ "epoch": 0.86,
+ "learning_rate": 0.0002,
+ "loss": 1.4917,
+ "step": 637
+ },
+ {
+ "epoch": 0.86,
+ "learning_rate": 0.0002,
+ "loss": 1.4227,
+ "step": 638
+ },
+ {
+ "epoch": 0.86,
+ "learning_rate": 0.0002,
+ "loss": 1.6772,
+ "step": 639
+ },
+ {
+ "epoch": 0.86,
+ "learning_rate": 0.0002,
+ "loss": 1.4155,
+ "step": 640
+ },
+ {
+ "epoch": 0.86,
+ "learning_rate": 0.0002,
+ "loss": 1.4245,
+ "step": 641
+ },
+ {
+ "epoch": 0.86,
+ "learning_rate": 0.0002,
+ "loss": 1.3916,
+ "step": 642
+ },
+ {
+ "epoch": 0.86,
+ "learning_rate": 0.0002,
+ "loss": 1.2547,
+ "step": 643
+ },
+ {
+ "epoch": 0.87,
+ "learning_rate": 0.0002,
+ "loss": 1.6559,
+ "step": 644
+ },
+ {
+ "epoch": 0.87,
+ "learning_rate": 0.0002,
+ "loss": 1.3959,
+ "step": 645
+ },
+ {
+ "epoch": 0.87,
+ "learning_rate": 0.0002,
+ "loss": 1.6932,
+ "step": 646
+ },
+ {
+ "epoch": 0.87,
+ "learning_rate": 0.0002,
+ "loss": 1.412,
+ "step": 647
+ },
+ {
+ "epoch": 0.87,
+ "learning_rate": 0.0002,
+ "loss": 1.4734,
+ "step": 648
+ },
+ {
+ "epoch": 0.87,
+ "learning_rate": 0.0002,
+ "loss": 1.4544,
+ "step": 649
+ },
+ {
+ "epoch": 0.87,
+ "learning_rate": 0.0002,
+ "loss": 1.3993,
+ "step": 650
+ },
+ {
+ "epoch": 0.87,
+ "learning_rate": 0.0002,
+ "loss": 1.4305,
+ "step": 651
+ },
+ {
+ "epoch": 0.88,
+ "learning_rate": 0.0002,
+ "loss": 1.4364,
+ "step": 652
+ },
+ {
+ "epoch": 0.88,
+ "learning_rate": 0.0002,
+ "loss": 1.481,
+ "step": 653
+ },
+ {
+ "epoch": 0.88,
+ "learning_rate": 0.0002,
+ "loss": 1.3716,
+ "step": 654
+ },
+ {
+ "epoch": 0.88,
+ "learning_rate": 0.0002,
+ "loss": 1.4739,
+ "step": 655
+ },
+ {
+ "epoch": 0.88,
+ "learning_rate": 0.0002,
+ "loss": 1.2871,
+ "step": 656
+ },
+ {
+ "epoch": 0.88,
+ "learning_rate": 0.0002,
+ "loss": 1.3972,
+ "step": 657
+ },
+ {
+ "epoch": 0.88,
+ "learning_rate": 0.0002,
+ "loss": 1.1626,
+ "step": 658
+ },
+ {
+ "epoch": 0.89,
+ "learning_rate": 0.0002,
+ "loss": 1.7518,
+ "step": 659
+ },
+ {
+ "epoch": 0.89,
+ "learning_rate": 0.0002,
+ "loss": 1.5674,
+ "step": 660
+ },
+ {
+ "epoch": 0.89,
+ "learning_rate": 0.0002,
+ "loss": 1.5055,
+ "step": 661
+ },
+ {
+ "epoch": 0.89,
+ "learning_rate": 0.0002,
+ "loss": 1.1769,
+ "step": 662
+ },
+ {
+ "epoch": 0.89,
+ "learning_rate": 0.0002,
+ "loss": 1.4755,
+ "step": 663
+ },
+ {
+ "epoch": 0.89,
+ "learning_rate": 0.0002,
+ "loss": 1.4907,
+ "step": 664
+ },
+ {
+ "epoch": 0.89,
+ "learning_rate": 0.0002,
+ "loss": 1.3265,
+ "step": 665
+ },
+ {
+ "epoch": 0.89,
+ "learning_rate": 0.0002,
+ "loss": 1.3154,
+ "step": 666
+ },
+ {
+ "epoch": 0.9,
+ "learning_rate": 0.0002,
+ "loss": 1.3409,
+ "step": 667
+ },
+ {
+ "epoch": 0.9,
+ "learning_rate": 0.0002,
+ "loss": 1.378,
+ "step": 668
+ },
+ {
+ "epoch": 0.9,
+ "learning_rate": 0.0002,
+ "loss": 1.4048,
+ "step": 669
+ },
+ {
+ "epoch": 0.9,
+ "learning_rate": 0.0002,
+ "loss": 1.4964,
+ "step": 670
+ },
+ {
+ "epoch": 0.9,
+ "learning_rate": 0.0002,
+ "loss": 1.6212,
+ "step": 671
+ },
+ {
+ "epoch": 0.9,
+ "learning_rate": 0.0002,
+ "loss": 1.3127,
+ "step": 672
+ },
+ {
+ "epoch": 0.9,
+ "learning_rate": 0.0002,
+ "loss": 1.4169,
+ "step": 673
+ },
+ {
+ "epoch": 0.91,
+ "learning_rate": 0.0002,
+ "loss": 1.2498,
+ "step": 674
+ },
+ {
+ "epoch": 0.91,
+ "learning_rate": 0.0002,
+ "loss": 1.4045,
+ "step": 675
+ },
+ {
+ "epoch": 0.91,
+ "learning_rate": 0.0002,
+ "loss": 1.5758,
+ "step": 676
+ },
+ {
+ "epoch": 0.91,
+ "learning_rate": 0.0002,
+ "loss": 1.3823,
+ "step": 677
+ },
+ {
+ "epoch": 0.91,
+ "learning_rate": 0.0002,
+ "loss": 1.6601,
+ "step": 678
+ },
+ {
+ "epoch": 0.91,
+ "learning_rate": 0.0002,
+ "loss": 1.5562,
+ "step": 679
+ },
+ {
+ "epoch": 0.91,
+ "learning_rate": 0.0002,
+ "loss": 1.1358,
+ "step": 680
+ },
+ {
+ "epoch": 0.92,
+ "learning_rate": 0.0002,
+ "loss": 1.1325,
+ "step": 681
+ },
+ {
+ "epoch": 0.92,
+ "learning_rate": 0.0002,
+ "loss": 1.4813,
+ "step": 682
+ },
+ {
+ "epoch": 0.92,
+ "learning_rate": 0.0002,
+ "loss": 1.335,
+ "step": 683
+ },
+ {
+ "epoch": 0.92,
+ "learning_rate": 0.0002,
+ "loss": 1.614,
+ "step": 684
+ },
+ {
+ "epoch": 0.92,
+ "learning_rate": 0.0002,
+ "loss": 1.448,
+ "step": 685
+ },
+ {
+ "epoch": 0.92,
+ "learning_rate": 0.0002,
+ "loss": 1.3724,
+ "step": 686
+ },
+ {
+ "epoch": 0.92,
+ "learning_rate": 0.0002,
+ "loss": 1.4873,
+ "step": 687
+ },
+ {
+ "epoch": 0.92,
+ "learning_rate": 0.0002,
+ "loss": 1.4579,
+ "step": 688
+ },
+ {
+ "epoch": 0.93,
+ "learning_rate": 0.0002,
+ "loss": 1.4331,
+ "step": 689
+ },
+ {
+ "epoch": 0.93,
+ "learning_rate": 0.0002,
+ "loss": 1.6089,
+ "step": 690
+ },
+ {
+ "epoch": 0.93,
+ "learning_rate": 0.0002,
+ "loss": 1.4011,
+ "step": 691
+ },
+ {
+ "epoch": 0.93,
+ "learning_rate": 0.0002,
+ "loss": 1.3296,
+ "step": 692
+ },
+ {
+ "epoch": 0.93,
+ "learning_rate": 0.0002,
+ "loss": 1.4143,
+ "step": 693
+ },
+ {
+ "epoch": 0.93,
+ "learning_rate": 0.0002,
+ "loss": 1.4736,
+ "step": 694
+ },
+ {
+ "epoch": 0.93,
+ "learning_rate": 0.0002,
+ "loss": 1.406,
+ "step": 695
+ },
+ {
+ "epoch": 0.94,
+ "learning_rate": 0.0002,
+ "loss": 1.5285,
+ "step": 696
+ },
+ {
+ "epoch": 0.94,
+ "learning_rate": 0.0002,
4238
+ "loss": 1.2369,
4239
+ "step": 697
4240
+ },
4241
+ {
4242
+ "epoch": 0.94,
4243
+ "learning_rate": 0.0002,
4244
+ "loss": 1.3969,
4245
+ "step": 698
4246
+ },
4247
+ {
4248
+ "epoch": 0.94,
4249
+ "learning_rate": 0.0002,
4250
+ "loss": 1.4348,
4251
+ "step": 699
4252
+ },
4253
+ {
4254
+ "epoch": 0.94,
4255
+ "learning_rate": 0.0002,
4256
+ "loss": 1.5787,
4257
+ "step": 700
4258
+ },
4259
+ {
4260
+ "epoch": 0.94,
4261
+ "eval_loss": 1.3678728342056274,
4262
+ "eval_runtime": 440.729,
4263
+ "eval_samples_per_second": 1.566,
4264
+ "eval_steps_per_second": 0.393,
4265
+ "step": 700
4266
+ },
4267
+ {
4268
+ "epoch": 0.94,
4269
+ "learning_rate": 0.0002,
4270
+ "loss": 1.3193,
4271
+ "step": 701
4272
+ },
4273
+ {
4274
+ "epoch": 0.94,
4275
+ "learning_rate": 0.0002,
4276
+ "loss": 1.2932,
4277
+ "step": 702
4278
+ },
4279
+ {
4280
+ "epoch": 0.94,
4281
+ "learning_rate": 0.0002,
4282
+ "loss": 1.4183,
4283
+ "step": 703
4284
+ },
4285
+ {
4286
+ "epoch": 0.95,
4287
+ "learning_rate": 0.0002,
4288
+ "loss": 1.5328,
4289
+ "step": 704
4290
+ },
4291
+ {
4292
+ "epoch": 0.95,
4293
+ "learning_rate": 0.0002,
4294
+ "loss": 1.4639,
4295
+ "step": 705
4296
+ },
4297
+ {
4298
+ "epoch": 0.95,
4299
+ "learning_rate": 0.0002,
4300
+ "loss": 1.3475,
4301
+ "step": 706
4302
+ },
4303
+ {
4304
+ "epoch": 0.95,
4305
+ "learning_rate": 0.0002,
4306
+ "loss": 1.3079,
4307
+ "step": 707
4308
+ },
4309
+ {
4310
+ "epoch": 0.95,
4311
+ "learning_rate": 0.0002,
4312
+ "loss": 1.2619,
4313
+ "step": 708
4314
+ },
4315
+ {
4316
+ "epoch": 0.95,
4317
+ "learning_rate": 0.0002,
4318
+ "loss": 1.5947,
4319
+ "step": 709
4320
+ },
4321
+ {
4322
+ "epoch": 0.95,
4323
+ "learning_rate": 0.0002,
4324
+ "loss": 1.1239,
4325
+ "step": 710
4326
+ },
4327
+ {
4328
+ "epoch": 0.96,
4329
+ "learning_rate": 0.0002,
4330
+ "loss": 1.129,
4331
+ "step": 711
4332
+ },
4333
+ {
4334
+ "epoch": 0.96,
4335
+ "learning_rate": 0.0002,
4336
+ "loss": 1.4643,
4337
+ "step": 712
4338
+ },
4339
+ {
4340
+ "epoch": 0.96,
4341
+ "learning_rate": 0.0002,
4342
+ "loss": 1.5388,
4343
+ "step": 713
4344
+ },
4345
+ {
4346
+ "epoch": 0.96,
4347
+ "learning_rate": 0.0002,
4348
+ "loss": 1.4328,
4349
+ "step": 714
4350
+ },
4351
+ {
4352
+ "epoch": 0.96,
4353
+ "learning_rate": 0.0002,
4354
+ "loss": 1.4876,
4355
+ "step": 715
4356
+ },
4357
+ {
4358
+ "epoch": 0.96,
4359
+ "learning_rate": 0.0002,
4360
+ "loss": 1.7079,
4361
+ "step": 716
4362
+ },
4363
+ {
4364
+ "epoch": 0.96,
4365
+ "learning_rate": 0.0002,
4366
+ "loss": 1.4483,
4367
+ "step": 717
4368
+ },
4369
+ {
4370
+ "epoch": 0.96,
4371
+ "learning_rate": 0.0002,
4372
+ "loss": 1.4254,
4373
+ "step": 718
4374
+ },
4375
+ {
4376
+ "epoch": 0.97,
4377
+ "learning_rate": 0.0002,
4378
+ "loss": 1.5946,
4379
+ "step": 719
4380
+ },
4381
+ {
4382
+ "epoch": 0.97,
4383
+ "learning_rate": 0.0002,
4384
+ "loss": 1.5887,
4385
+ "step": 720
4386
+ },
4387
+ {
4388
+ "epoch": 0.97,
4389
+ "learning_rate": 0.0002,
4390
+ "loss": 1.2913,
4391
+ "step": 721
4392
+ },
4393
+ {
4394
+ "epoch": 0.97,
4395
+ "learning_rate": 0.0002,
4396
+ "loss": 1.612,
4397
+ "step": 722
4398
+ },
4399
+ {
4400
+ "epoch": 0.97,
4401
+ "learning_rate": 0.0002,
4402
+ "loss": 1.2837,
4403
+ "step": 723
4404
+ },
4405
+ {
4406
+ "epoch": 0.97,
4407
+ "learning_rate": 0.0002,
4408
+ "loss": 1.3668,
4409
+ "step": 724
4410
+ },
4411
+ {
4412
+ "epoch": 0.97,
4413
+ "learning_rate": 0.0002,
4414
+ "loss": 1.3397,
4415
+ "step": 725
4416
+ },
4417
+ {
4418
+ "epoch": 0.98,
4419
+ "learning_rate": 0.0002,
4420
+ "loss": 1.5159,
4421
+ "step": 726
4422
+ },
4423
+ {
4424
+ "epoch": 0.98,
4425
+ "learning_rate": 0.0002,
4426
+ "loss": 1.7313,
4427
+ "step": 727
4428
+ },
4429
+ {
4430
+ "epoch": 0.98,
4431
+ "learning_rate": 0.0002,
4432
+ "loss": 1.3203,
4433
+ "step": 728
4434
+ },
4435
+ {
4436
+ "epoch": 0.98,
4437
+ "learning_rate": 0.0002,
4438
+ "loss": 1.3875,
4439
+ "step": 729
4440
+ },
4441
+ {
4442
+ "epoch": 0.98,
4443
+ "learning_rate": 0.0002,
4444
+ "loss": 1.4126,
4445
+ "step": 730
4446
+ },
4447
+ {
4448
+ "epoch": 0.98,
4449
+ "learning_rate": 0.0002,
4450
+ "loss": 1.5195,
4451
+ "step": 731
4452
+ },
4453
+ {
4454
+ "epoch": 0.98,
4455
+ "learning_rate": 0.0002,
4456
+ "loss": 1.5687,
4457
+ "step": 732
4458
+ },
4459
+ {
4460
+ "epoch": 0.98,
4461
+ "learning_rate": 0.0002,
4462
+ "loss": 1.7246,
4463
+ "step": 733
4464
+ },
4465
+ {
4466
+ "epoch": 0.99,
4467
+ "learning_rate": 0.0002,
4468
+ "loss": 1.392,
4469
+ "step": 734
4470
+ },
4471
+ {
4472
+ "epoch": 0.99,
4473
+ "learning_rate": 0.0002,
4474
+ "loss": 1.3392,
4475
+ "step": 735
4476
+ },
4477
+ {
4478
+ "epoch": 0.99,
4479
+ "learning_rate": 0.0002,
4480
+ "loss": 1.1387,
4481
+ "step": 736
4482
+ },
4483
+ {
4484
+ "epoch": 0.99,
4485
+ "learning_rate": 0.0002,
4486
+ "loss": 1.4896,
4487
+ "step": 737
4488
+ },
4489
+ {
4490
+ "epoch": 0.99,
4491
+ "learning_rate": 0.0002,
4492
+ "loss": 1.5993,
4493
+ "step": 738
4494
+ },
4495
+ {
4496
+ "epoch": 0.99,
4497
+ "learning_rate": 0.0002,
4498
+ "loss": 1.4317,
4499
+ "step": 739
4500
+ },
4501
+ {
4502
+ "epoch": 0.99,
4503
+ "learning_rate": 0.0002,
4504
+ "loss": 1.0769,
4505
+ "step": 740
4506
+ },
4507
+ {
4508
+ "epoch": 1.0,
4509
+ "learning_rate": 0.0002,
4510
+ "loss": 1.7145,
4511
+ "step": 741
4512
+ },
4513
+ {
4514
+ "epoch": 1.0,
4515
+ "learning_rate": 0.0002,
4516
+ "loss": 1.4863,
4517
+ "step": 742
4518
+ },
4519
+ {
4520
+ "epoch": 1.0,
4521
+ "learning_rate": 0.0002,
4522
+ "loss": 1.1356,
4523
+ "step": 743
4524
+ },
4525
+ {
4526
+ "epoch": 1.0,
4527
+ "learning_rate": 0.0002,
4528
+ "loss": 1.3969,
4529
+ "step": 744
4530
+ }
4531
+ ],
4532
+ "logging_steps": 1,
4533
+ "max_steps": 1488,
4534
+ "num_train_epochs": 2,
4535
+ "save_steps": 250,
4536
+ "total_flos": 5.095548469787443e+16,
4537
+ "trial_name": null,
4538
+ "trial_params": null
4539
+ }
checkpoint-744/training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:256eb98e1514db6b5f4110313cf6834a546393b9505cfda85e800f159569f9ec
3
+ size 6840
runs/Jan17_01-10-57_melek-GL502VS/events.out.tfevents.1705443105.melek-GL502VS.80553.0 CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:4a2ef162feee0ee068d3b2165c9d0c1d7cbb71b3f6d0846957c64e572ae80d3d
3
- size 125367
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6dd778de1201d5a0e8187ec3435191d86cdb0b75561e562d6cbfceafb88f26d8
3
+ size 244072