Daniel23Stack committed
Commit b71cb9c
1 Parent(s): 67cbaf1

Delete aliceinwonderland-llama3

aliceinwonderland-llama3/README.md DELETED
@@ -1,202 +0,0 @@
- ---
- library_name: peft
- base_model: models\Meta-Llama-3-8b
- ---
-
- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->
-
-
-
- ## Model Details
-
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
-
-
- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
-
- ## Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ### Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
-
- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- [More Information Needed]
-
- ## Training Details
-
- ### Training Data
-
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
-
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]
-
-
- #### Training Hyperparameters
-
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
-
-
-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
- ## Environmental Impact
-
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]
-
- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]
-
- ## Model Card Contact
-
- [More Information Needed]
- ### Framework versions
-
- - PEFT 0.8.2
aliceinwonderland-llama3/adapter_config.json DELETED
@@ -1,27 +0,0 @@
- {
- "alpha_pattern": {},
- "auto_mapping": null,
- "base_model_name_or_path": "models\\Meta-Llama-3-8b",
- "bias": "none",
- "fan_in_fan_out": false,
- "inference_mode": true,
- "init_lora_weights": true,
- "layers_pattern": null,
- "layers_to_transform": null,
- "loftq_config": {},
- "lora_alpha": 64,
- "lora_dropout": 0.05,
- "megatron_config": null,
- "megatron_core": "megatron.core",
- "modules_to_save": null,
- "peft_type": "LORA",
- "r": 32,
- "rank_pattern": {},
- "revision": null,
- "target_modules": [
- "q_proj",
- "v_proj"
- ],
- "task_type": "CAUSAL_LM",
- "use_rslora": false
- }
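The config above describes a LoRA adapter (r=32, lora_alpha=64) over the q_proj and v_proj attention projections of a causal Llama model. For reference, a minimal sketch of how an adapter with this config is typically attached with PEFT; the local paths are assumptions taken from base_model_name_or_path and the repo layout, not anything this commit ships:

```python
# Sketch only: attach the deleted LoRA adapter to its base model.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

base_path = "models/Meta-Llama-3-8b"        # assumed, from base_model_name_or_path
adapter_path = "aliceinwonderland-llama3"   # assumed dir with adapter_config.json + adapter_model.bin

# training_log.json below records base_loaded_in_4bit: true, so quantize the base.
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
base = AutoModelForCausalLM.from_pretrained(base_path, quantization_config=bnb, device_map="auto")

# PeftModel reads adapter_config.json (r=32, lora_alpha=64, target_modules q_proj/v_proj)
# and injects the trained low-rank matrices into those projections.
model = PeftModel.from_pretrained(base, adapter_path)
```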
aliceinwonderland-llama3/adapter_model.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:8c8aea5a6c4a4dd8d4e7b4abc49349557f05d86a74217158bf968ca948c06ed5
- size 54572362
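The 54,572,362-byte size in this LFS pointer is consistent with the adapter config above. A rough back-of-the-envelope check, assuming standard Llama-3-8B shapes (32 layers, hidden size 4096, v_proj output width 1024 under grouped-query attention):

```python
# Rough size check for adapter_model.bin; the shapes are assumptions about Llama-3-8B.
r, layers = 32, 32
q_params = r * (4096 + 4096)            # LoRA A and B matrices for q_proj, per layer
v_params = r * (4096 + 1024)            # v_proj is narrower under grouped-query attention
total = layers * (q_params + v_params)  # 13,631,488 trainable parameters
print(total * 4)                        # 54,525,952 bytes in fp32 -- close to the pointer's
                                        # 54,572,362; the remainder is serialization overhead
```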
aliceinwonderland-llama3/checkpoint-13-loss-0_98/README.md DELETED
@@ -1,202 +0,0 @@
- ---
- library_name: peft
- base_model: models\Meta-Llama-3-8b
- ---
-
- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->
-
-
-
- ## Model Details
-
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
-
-
- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
-
- ## Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ### Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
-
- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- [More Information Needed]
-
- ## Training Details
-
- ### Training Data
-
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
-
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]
-
-
- #### Training Hyperparameters
-
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
-
-
-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
- ## Environmental Impact
-
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]
-
- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]
-
- ## Model Card Contact
-
- [More Information Needed]
- ### Framework versions
-
- - PEFT 0.8.2
aliceinwonderland-llama3/checkpoint-13-loss-0_98/adapter_config.json DELETED
@@ -1,27 +0,0 @@
- {
- "alpha_pattern": {},
- "auto_mapping": null,
- "base_model_name_or_path": "models\\Meta-Llama-3-8b",
- "bias": "none",
- "fan_in_fan_out": false,
- "inference_mode": true,
- "init_lora_weights": true,
- "layers_pattern": null,
- "layers_to_transform": null,
- "loftq_config": {},
- "lora_alpha": 64,
- "lora_dropout": 0.05,
- "megatron_config": null,
- "megatron_core": "megatron.core",
- "modules_to_save": null,
- "peft_type": "LORA",
- "r": 32,
- "rank_pattern": {},
- "revision": null,
- "target_modules": [
- "q_proj",
- "v_proj"
- ],
- "task_type": "CAUSAL_LM",
- "use_rslora": false
- }
aliceinwonderland-llama3/checkpoint-13-loss-0_98/adapter_model.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:cb176e0bbe60607d99ff9fd4795a52ceb55a6f0024bebd0f5852a24acc5e7121
- size 54572362
aliceinwonderland-llama3/checkpoint-13-loss-0_98/training_log.json DELETED
@@ -1,14 +0,0 @@
- {
- "base_model_name": "Meta-Llama-3-8b",
- "base_model_class": "LlamaForCausalLM",
- "base_loaded_in_4bit": true,
- "base_loaded_in_8bit": false,
- "projections": "q, v",
- "loss": 0.9825,
- "grad_norm": 2.2934579849243164,
- "learning_rate": 1.2e-07,
- "epoch": 0.1326530612244898,
- "current_steps": 12,
- "current_steps_adjusted": 12,
- "epoch_adjusted": 0.1326530612244898
- }
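This log pins the checkpoint to step 13 of a run with 98 optimizer steps per epoch (epoch 13/98 ≈ 0.1327), and the 0.9825 loss explains the 0_98 suffix in the checkpoint directory name. A small sketch of that arithmetic, assuming a hypothetical local copy at the path shown:

```python
# Read the checkpoint's training log and recover the steps-per-epoch it implies.
import json

with open("checkpoint-13-loss-0_98/training_log.json") as f:  # hypothetical path
    log = json.load(f)

steps_per_epoch = round((log["current_steps"] + 1) / log["epoch"])
print(steps_per_epoch)  # 98; checkpoint-392 below lands at exactly epoch 4.0 (4 * 98)
print(log["loss"])      # 0.9825, rounded to 0_98 in the directory name
```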
aliceinwonderland-llama3/checkpoint-13-loss-0_98/training_prompt.json DELETED
@@ -1,3 +0,0 @@
- {
- "template_type": "raw_text"
- }
aliceinwonderland-llama3/checkpoint-392-loss-0_86/README.md DELETED
@@ -1,202 +0,0 @@
- ---
- library_name: peft
- base_model: models\Meta-Llama-3-8b
- ---
-
- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->
-
-
-
- ## Model Details
-
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
-
-
- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
-
- ## Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ### Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
-
- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- [More Information Needed]
-
- ## Training Details
-
- ### Training Data
-
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
-
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]
-
-
- #### Training Hyperparameters
-
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
-
-
-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
- ## Environmental Impact
-
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]
-
- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]
-
- ## Model Card Contact
-
- [More Information Needed]
- ### Framework versions
-
- - PEFT 0.8.2
aliceinwonderland-llama3/checkpoint-392-loss-0_86/adapter_config.json DELETED
@@ -1,27 +0,0 @@
- {
- "alpha_pattern": {},
- "auto_mapping": null,
- "base_model_name_or_path": "models\\Meta-Llama-3-8b",
- "bias": "none",
- "fan_in_fan_out": false,
- "inference_mode": true,
- "init_lora_weights": true,
- "layers_pattern": null,
- "layers_to_transform": null,
- "loftq_config": {},
- "lora_alpha": 64,
- "lora_dropout": 0.05,
- "megatron_config": null,
- "megatron_core": "megatron.core",
- "modules_to_save": null,
- "peft_type": "LORA",
- "r": 32,
- "rank_pattern": {},
- "revision": null,
- "target_modules": [
- "q_proj",
- "v_proj"
- ],
- "task_type": "CAUSAL_LM",
- "use_rslora": false
- }
aliceinwonderland-llama3/checkpoint-392-loss-0_86/adapter_model.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:8b5cfba2103baac5a1734f17ae2e0f95004cdb0ad415ba264862240a4900a127
- size 54572362
aliceinwonderland-llama3/checkpoint-392-loss-0_86/training_log.json DELETED
@@ -1,14 +0,0 @@
- {
- "base_model_name": "Meta-Llama-3-8b",
- "base_model_class": "LlamaForCausalLM",
- "base_loaded_in_4bit": true,
- "base_loaded_in_8bit": false,
- "projections": "q, v",
- "loss": 0.862,
- "grad_norm": 3.4459314346313477,
- "learning_rate": 2.5897435897435897e-07,
- "epoch": 4.0,
- "current_steps": 391,
- "current_steps_adjusted": 391,
- "epoch_adjusted": 4.0
- }
aliceinwonderland-llama3/checkpoint-392-loss-0_86/training_prompt.json DELETED
@@ -1,3 +0,0 @@
- {
- "template_type": "raw_text"
- }
aliceinwonderland-llama3/runs/Jun04_16-40-59_DESKTOP-7QRHF82/events.out.tfevents.1717537260.DESKTOP-7QRHF82.4688.0 DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:e052f98b17ecd4340beb555eb5f48b1a6b16cfe76c599dda1421aa779e5ce7f1
- size 108486
aliceinwonderland-llama3/training_graph.json DELETED
@@ -1,2948 +0,0 @@
- [
- {"current_steps": 0, "loss": 1.0375, "learning_rate": 1e-08, "epoch": 0.01020408163265306},
- {"current_steps": 1, "loss": 0.9218, "learning_rate": 1e-08, "epoch": 0.02040816326530612},
- {"current_steps": 2, "loss": 1.2099, "learning_rate": 2e-08, "epoch": 0.030612244897959183},
- {"current_steps": 3, "loss": 0.8966, "learning_rate": 3e-08, "epoch": 0.04081632653061224},
- {"current_steps": 4, "loss": 1.0577, "learning_rate": 4e-08, "epoch": 0.05102040816326531},
- {"current_steps": 5, "loss": 1.3639, "learning_rate": 5e-08, "epoch": 0.061224489795918366},
- {"current_steps": 6, "loss": 1.2809, "learning_rate": 6e-08, "epoch": 0.07142857142857142},
- {"current_steps": 7, "loss": 1.0623, "learning_rate": 7e-08, "epoch": 0.08163265306122448},
- {"current_steps": 8, "loss": 1.4736, "learning_rate": 8e-08, "epoch": 0.09183673469387756},
- {"current_steps": 9, "loss": 1.0629, "learning_rate": 9e-08, "epoch": 0.10204081632653061},
- {"current_steps": 10, "loss": 0.9894, "learning_rate": 1e-07, "epoch": 0.11224489795918367},
- {"current_steps": 11, "loss": 1.3042, "learning_rate": 1.0999999999999999e-07, "epoch": 0.12244897959183673},
- {"current_steps": 12, "loss": 0.9825, "learning_rate": 1.2e-07, "epoch": 0.1326530612244898},
- {"current_steps": 13, "loss": 0.9608, "learning_rate": 1.3e-07, "epoch": 0.14285714285714285},
- {"current_steps": 14, "loss": 1.1691, "learning_rate": 1.4e-07, "epoch": 0.15306122448979592},
- {"current_steps": 15, "loss": 0.9785, "learning_rate": 1.5e-07, "epoch": 0.16326530612244897},
- {"current_steps": 16, "loss": 1.0136, "learning_rate": 1.6e-07, "epoch": 0.17346938775510204},
- {"current_steps": 17, "loss": 1.0683, "learning_rate": 1.7000000000000001e-07, "epoch": 0.1836734693877551},
- {"current_steps": 18, "loss": 1.354, "learning_rate": 1.8e-07, "epoch": 0.19387755102040816},
- {"current_steps": 19, "loss": 0.8537, "learning_rate": 1.8999999999999998e-07, "epoch": 0.20408163265306123},
- {"current_steps": 20, "loss": 1.3124, "learning_rate": 2e-07, "epoch": 0.21428571428571427},
- {"current_steps": 21, "loss": 1.0973, "learning_rate": 2.0999999999999997e-07, "epoch": 0.22448979591836735},
- {"current_steps": 22, "loss": 0.8491, "learning_rate": 2.1999999999999998e-07, "epoch": 0.23469387755102042},
- {"current_steps": 23, "loss": 0.8702, "learning_rate": 2.3e-07, "epoch": 0.24489795918367346},
- {"current_steps": 24, "loss": 1.065, "learning_rate": 2.4e-07, "epoch": 0.25510204081632654},
- {"current_steps": 25, "loss": 0.9593, "learning_rate": 2.5e-07, "epoch": 0.2653061224489796},
- {"current_steps": 26, "loss": 1.0278, "learning_rate": 2.6e-07, "epoch": 0.2755102040816326},
- {"current_steps": 27, "loss": 1.0488, "learning_rate": 2.7e-07, "epoch": 0.2857142857142857},
- {"current_steps": 28, "loss": 0.8877, "learning_rate": 2.8e-07, "epoch": 0.29591836734693877},
- {"current_steps": 29, "loss": 1.2834, "learning_rate": 2.9e-07, "epoch": 0.30612244897959184},
- {"current_steps": 30, "loss": 1.1416, "learning_rate": 3e-07, "epoch": 0.3163265306122449},
- {"current_steps": 31, "loss": 1.2615, "learning_rate": 3.1e-07, "epoch": 0.32653061224489793},
- {"current_steps": 32, "loss": 1.522, "learning_rate": 3.2e-07, "epoch": 0.336734693877551},
- {"current_steps": 33, "loss": 1.2533, "learning_rate": 3.3e-07, "epoch": 0.3469387755102041},
- {"current_steps": 34, "loss": 1.3297, "learning_rate": 3.4000000000000003e-07, "epoch": 0.35714285714285715},
- {"current_steps": 35, "loss": 1.1923, "learning_rate": 3.4000000000000003e-07, "epoch": 0.3673469387755102},
- {"current_steps": 36, "loss": 1.0751, "learning_rate": 3.5e-07, "epoch": 0.37755102040816324},
- {"current_steps": 37, "loss": 0.9824, "learning_rate": 3.6e-07, "epoch": 0.3877551020408163},
- {"current_steps": 38, "loss": 1.2065, "learning_rate": 3.7e-07, "epoch": 0.3979591836734694},
- {"current_steps": 39, "loss": 0.6985, "learning_rate": 3.7999999999999996e-07, "epoch": 0.40816326530612246},
- {"current_steps": 40, "loss": 1.2638, "learning_rate": 3.8999999999999997e-07, "epoch": 0.41836734693877553},
- {"current_steps": 41, "loss": 1.0332, "learning_rate": 4e-07, "epoch": 0.42857142857142855},
- {"current_steps": 42, "loss": 0.9709, "learning_rate": 4.0999999999999994e-07, "epoch": 0.4387755102040816},
- {"current_steps": 43, "loss": 1.0852, "learning_rate": 4.1999999999999995e-07, "epoch": 0.4489795918367347},
- {"current_steps": 44, "loss": 1.0733, "learning_rate": 4.2999999999999996e-07, "epoch": 0.45918367346938777},
- {"current_steps": 45, "loss": 1.3022, "learning_rate": 4.3999999999999997e-07, "epoch": 0.46938775510204084},
- {"current_steps": 46, "loss": 1.0857, "learning_rate": 4.5e-07, "epoch": 0.47959183673469385},
- {"current_steps": 47, "loss": 1.1048, "learning_rate": 4.6e-07, "epoch": 0.4897959183673469},
- {"current_steps": 48, "loss": 0.8515, "learning_rate": 4.6999999999999995e-07, "epoch": 0.5},
- {"current_steps": 49, "loss": 1.0769, "learning_rate": 4.8e-07, "epoch": 0.5102040816326531},
- {"current_steps": 50, "loss": 1.2454, "learning_rate": 4.9e-07, "epoch": 0.5204081632653061},
- {"current_steps": 51, "loss": 1.309, "learning_rate": 5e-07, "epoch": 0.5306122448979592},
- {"current_steps": 52, "loss": 1.0071, "learning_rate": 5.1e-07, "epoch": 0.5408163265306123},
- {"current_steps": 53, "loss": 1.01, "learning_rate": 5.2e-07, "epoch": 0.5510204081632653},
- {"current_steps": 54, "loss": 0.9557, "learning_rate": 5.3e-07, "epoch": 0.5612244897959183},
- {"current_steps": 55, "loss": 1.0609, "learning_rate": 5.4e-07, "epoch": 0.5714285714285714},
- {"current_steps": 56, "loss": 0.8179, "learning_rate": 5.5e-07, "epoch": 0.5816326530612245},
- {"current_steps": 57, "loss": 1.0753, "learning_rate": 5.6e-07, "epoch": 0.5918367346938775},
- {"current_steps": 58, "loss": 0.9524, "learning_rate": 5.699999999999999e-07, "epoch": 0.6020408163265306},
- {"current_steps": 59, "loss": 1.0423, "learning_rate": 5.8e-07, "epoch": 0.6122448979591837},
- {"current_steps": 60, "loss": 1.1175, "learning_rate": 5.9e-07, "epoch": 0.6224489795918368},
- {"current_steps": 61, "loss": 1.1842, "learning_rate": 6e-07, "epoch": 0.6326530612244898},
- {"current_steps": 62, "loss": 0.9762, "learning_rate": 6.1e-07, "epoch": 0.6428571428571429},
- {"current_steps": 63, "loss": 0.9862, "learning_rate": 6.2e-07, "epoch": 0.6530612244897959},
- {"current_steps": 64, "loss": 1.3298, "learning_rate": 6.3e-07, "epoch": 0.6632653061224489},
- {"current_steps": 65, "loss": 0.892, "learning_rate": 6.4e-07, "epoch": 0.673469387755102},
- {"current_steps": 66, "loss": 1.3031, "learning_rate": 6.5e-07, "epoch": 0.6836734693877551},
- {"current_steps": 67, "loss": 1.0295, "learning_rate": 6.6e-07, "epoch": 0.6938775510204082},
- {"current_steps": 68, "loss": 1.0984, "learning_rate": 6.7e-07, "epoch": 0.7040816326530612},
- {"current_steps": 69, "loss": 0.961, "learning_rate": 6.800000000000001e-07, "epoch": 0.7142857142857143},
- {"current_steps": 70, "loss": 1.1315, "learning_rate": 6.9e-07, "epoch": 0.7244897959183674},
- {"current_steps": 71, "loss": 1.151, "learning_rate": 7e-07, "epoch": 0.7346938775510204},
- {"current_steps": 72, "loss": 1.1734, "learning_rate": 7.1e-07, "epoch": 0.7448979591836735},
- {"current_steps": 73, "loss": 1.2119, "learning_rate": 7.2e-07, "epoch": 0.7551020408163265},
- {"current_steps": 74, "loss": 1.2703, "learning_rate": 7.3e-07, "epoch": 0.7653061224489796},
- {"current_steps": 75, "loss": 1.171, "learning_rate": 7.4e-07, "epoch": 0.7755102040816326},
- {"current_steps": 76, "loss": 1.0916, "learning_rate": 7.5e-07, "epoch": 0.7857142857142857},
- {"current_steps": 77, "loss": 0.9059, "learning_rate": 7.599999999999999e-07, "epoch": 0.7959183673469388},
- {"current_steps": 78, "loss": 1.0347, "learning_rate": 7.699999999999999e-07, "epoch": 0.8061224489795918},
- {"current_steps": 79, "loss": 1.2085, "learning_rate": 7.799999999999999e-07, "epoch": 0.8163265306122449},
- {"current_steps": 80, "loss": 1.0081, "learning_rate": 7.9e-07, "epoch": 0.826530612244898},
- {"current_steps": 81, "loss": 1.0151, "learning_rate": 8e-07, "epoch": 0.8367346938775511},
- {"current_steps": 82, "loss": 1.0571, "learning_rate": 8.1e-07, "epoch": 0.8469387755102041},
- {"current_steps": 83, "loss": 1.3335, "learning_rate": 8.199999999999999e-07, "epoch": 0.8571428571428571},
- {"current_steps": 84, "loss": 1.0442, "learning_rate": 8.299999999999999e-07, "epoch": 0.8673469387755102},
- {"current_steps": 85, "loss": 0.9886, "learning_rate": 8.399999999999999e-07, "epoch": 0.8775510204081632},
- {"current_steps": 86, "loss": 1.1209, "learning_rate": 8.499999999999999e-07, "epoch": 0.8877551020408163},
- {"current_steps": 87, "loss": 0.9748, "learning_rate": 8.599999999999999e-07, "epoch": 0.8979591836734694},
- {"current_steps": 88, "loss": 0.9632, "learning_rate": 8.699999999999999e-07, "epoch": 0.9081632653061225},
- {"current_steps": 89, "loss": 1.3457, "learning_rate": 8.799999999999999e-07, "epoch": 0.9183673469387755},
- {"current_steps": 90, "loss": 0.9974, "learning_rate": 8.9e-07, "epoch": 0.9285714285714286},
- {"current_steps": 91, "loss": 1.1924, "learning_rate": 9e-07, "epoch": 0.9387755102040817},
- {"current_steps": 92, "loss": 1.2519, "learning_rate": 9.1e-07, "epoch": 0.9489795918367347},
- {"current_steps": 93, "loss": 1.0482, "learning_rate": 9.2e-07, "epoch": 0.9591836734693877},
- {"current_steps": 94, "loss": 0.9984, "learning_rate": 9.3e-07, "epoch": 0.9693877551020408},
- {"current_steps": 95, "loss": 1.2079, "learning_rate": 9.399999999999999e-07, "epoch": 0.9795918367346939},
- {"current_steps": 96, "loss": 0.9468, "learning_rate": 9.499999999999999e-07, "epoch": 0.9897959183673469},
- {"current_steps": 97, "loss": 0.9655, "learning_rate": 9.6e-07, "epoch": 1.0},
- {"current_steps": 98, "loss": 0.9911, "learning_rate": 9.7e-07, "epoch": 1.010204081632653},
- {"current_steps": 99, "loss": 1.019, "learning_rate": 9.8e-07, "epoch": 1.0204081632653061},
- {"current_steps": 100, "loss": 1.0679, "learning_rate": 9.9e-07, "epoch": 1.030612244897959},
- {"current_steps": 101, "loss": 0.8909, "learning_rate": 1e-06, "epoch": 1.0408163265306123},
- {"current_steps": 102, "loss": 1.1474, "learning_rate": 9.974358974358974e-07, "epoch": 1.0510204081632653},
- {"current_steps": 103, "loss": 0.9872, "learning_rate": 9.948717948717949e-07, "epoch": 1.0612244897959184},
- {"current_steps": 104, "loss": 0.8438, "learning_rate": 9.923076923076923e-07, "epoch": 1.0714285714285714},
- {"current_steps": 105, "loss": 0.9473, "learning_rate": 9.897435897435898e-07, "epoch": 1.0816326530612246},
- {"current_steps": 106, "loss": 1.0933, "learning_rate": 9.871794871794872e-07, "epoch": 1.0918367346938775},
- {"current_steps": 107, "loss": 0.8545, "learning_rate": 9.846153846153847e-07, "epoch": 1.1020408163265305},
- {"current_steps": 108, "loss": 1.1711, "learning_rate": 9.820512820512819e-07, "epoch": 1.1122448979591837},
- {"current_steps": 109, "loss": 0.905, "learning_rate": 9.794871794871793e-07, "epoch": 1.1224489795918366},
- {"current_steps": 110, "loss": 0.9026, "learning_rate": 9.769230769230768e-07, "epoch": 1.1326530612244898},
- {"current_steps": 111, "loss": 1.2263, "learning_rate": 9.743589743589742e-07, "epoch": 1.1428571428571428},
- {"current_steps": 112, "loss": 1.0883, "learning_rate": 9.717948717948717e-07, "epoch": 1.153061224489796},
- {"current_steps": 113, "loss": 1.1185, "learning_rate": 9.692307692307691e-07, "epoch": 1.163265306122449},
- {"current_steps": 114, "loss": 1.2212, "learning_rate": 9.666666666666666e-07, "epoch": 1.1734693877551021},
- {"current_steps": 115, "loss": 1.2142, "learning_rate": 9.64102564102564e-07, "epoch": 1.183673469387755},
- {"current_steps": 116, "loss": 1.8625, "learning_rate": 9.615384615384615e-07, "epoch": 1.193877551020408},
- {"current_steps": 117, "loss": 1.0982, "learning_rate": 9.58974358974359e-07, "epoch": 1.2040816326530612},
- {"current_steps": 118, "loss": 1.2912, "learning_rate": 9.564102564102564e-07, "epoch": 1.2142857142857142},
- {"current_steps": 119, "loss": 1.2424, "learning_rate": 9.538461538461538e-07, "epoch": 1.2244897959183674},
- {"current_steps": 120, "loss": 1.0336, "learning_rate": 9.512820512820512e-07, "epoch": 1.2346938775510203},
- {"current_steps": 121, "loss": 1.1639, "learning_rate": 9.487179487179486e-07, "epoch": 1.2448979591836735},
- {"current_steps": 122, "loss": 1.1928, "learning_rate": 9.461538461538461e-07, "epoch": 1.2551020408163265},
- {"current_steps": 123, "loss": 0.8534, "learning_rate": 9.435897435897435e-07, "epoch": 1.2653061224489797},
- {"current_steps": 124, "loss": 1.0712, "learning_rate": 9.41025641025641e-07, "epoch": 1.2755102040816326},
- {"current_steps": 125, "loss": 1.2151, "learning_rate": 9.384615384615384e-07, "epoch": 1.2857142857142856},
- {"current_steps": 126, "loss": 0.8883, "learning_rate": 9.358974358974359e-07, "epoch": 1.2959183673469388},
- {"current_steps": 127, "loss": 1.1596, "learning_rate": 9.333333333333333e-07, "epoch": 1.306122448979592},
- {"current_steps": 128, "loss": 1.0218, "learning_rate": 9.307692307692308e-07, "epoch": 1.316326530612245},
- {"current_steps": 129, "loss": 0.953, "learning_rate": 9.282051282051282e-07, "epoch": 1.3265306122448979},
- {"current_steps": 130, "loss": 0.9616, "learning_rate": 9.282051282051282e-07, "epoch": 1.336734693877551},
- {"current_steps": 131, "loss": 1.0368, "learning_rate": 9.256410256410257e-07, "epoch": 1.346938775510204},
- {"current_steps": 132, "loss": 1.008, "learning_rate": 9.230769230769231e-07, "epoch": 1.3571428571428572},
- {"current_steps": 133, "loss": 0.9845, "learning_rate": 9.205128205128205e-07, "epoch": 1.3673469387755102},
- {"current_steps": 134, "loss": 0.8404, "learning_rate": 9.179487179487179e-07, "epoch": 1.3775510204081631},
- {"current_steps": 135, "loss": 0.9514, "learning_rate": 9.153846153846153e-07, "epoch": 1.3877551020408163},
- {"current_steps": 136, "loss": 1.0032, "learning_rate": 9.128205128205127e-07, "epoch": 1.3979591836734695},
- {"current_steps": 137, "loss": 0.9905, "learning_rate": 9.102564102564102e-07, "epoch": 1.4081632653061225},
- {"current_steps": 138, "loss": 1.0208, "learning_rate": 9.076923076923076e-07, "epoch": 1.4183673469387754},
- {"current_steps": 139, "loss": 0.9982, "learning_rate": 9.051282051282051e-07, "epoch": 1.4285714285714286},
- {"current_steps": 140, "loss": 0.9697, "learning_rate": 9.025641025641025e-07, "epoch": 1.4387755102040816},
- {"current_steps": 141, "loss": 1.031, "learning_rate": 9e-07, "epoch": 1.4489795918367347},
- {"current_steps": 142, "loss": 1.1713, "learning_rate": 8.974358974358974e-07, "epoch": 1.4591836734693877},
- {"current_steps": 143, "loss": 1.3207, "learning_rate": 8.948717948717949e-07, "epoch": 1.469387755102041},
- {"current_steps": 144, "loss": 0.9822, "learning_rate": 8.923076923076923e-07, "epoch": 1.4795918367346939},
- {"current_steps": 145, "loss": 0.9969, "learning_rate": 8.897435897435897e-07, "epoch": 1.489795918367347},
- {"current_steps": 146, "loss": 1.216, "learning_rate": 8.871794871794871e-07, "epoch": 1.5},
- {"current_steps": 147, "loss": 1.0927, "learning_rate": 8.846153846153846e-07, "epoch": 1.510204081632653},
- {"current_steps": 148, "loss": 1.1464, "learning_rate": 8.82051282051282e-07, "epoch": 1.5204081632653061},
- {"current_steps": 149, "loss": 0.9045, "learning_rate": 8.794871794871795e-07, "epoch": 1.5306122448979593},
- {"current_steps": 150, "loss": 1.0216, "learning_rate": 8.769230769230769e-07, "epoch": 1.5408163265306123},
- {"current_steps": 151, "loss": 0.8687, "learning_rate": 8.743589743589743e-07, "epoch": 1.5510204081632653},
- {"current_steps": 152, "loss": 1.3796, "learning_rate": 8.717948717948718e-07, "epoch": 1.5612244897959182},
- {"current_steps": 153, "loss": 1.0517, "learning_rate": 8.692307692307692e-07, "epoch": 1.5714285714285714},
- {"current_steps": 154, "loss": 1.2114, "learning_rate": 8.666666666666667e-07, "epoch": 1.5816326530612246},
- {"current_steps": 155, "loss": 0.8193, "learning_rate": 8.641025641025641e-07, "epoch": 1.5918367346938775},
- {"current_steps": 156, "loss": 1.1385, "learning_rate": 8.615384615384616e-07, "epoch": 1.6020408163265305},
- {"current_steps": 157, "loss": 0.9739, "learning_rate": 8.589743589743588e-07, "epoch": 1.6122448979591837},
- {"current_steps": 158, "loss": 1.5266, "learning_rate": 8.564102564102563e-07, "epoch": 1.6224489795918369},
- {"current_steps": 159, "loss": 1.0021, "learning_rate": 8.538461538461537e-07, "epoch": 1.6326530612244898},
- {"current_steps": 160, "loss": 1.119, "learning_rate": 8.512820512820512e-07, "epoch": 1.6428571428571428},
- {"current_steps": 161, "loss": 0.9662, "learning_rate": 8.487179487179486e-07, "epoch": 1.6530612244897958},
- {"current_steps": 162, "loss": 1.1719, "learning_rate": 8.461538461538461e-07, "epoch": 1.663265306122449},
- {"current_steps": 163, "loss": 0.9716, "learning_rate": 8.435897435897435e-07, "epoch": 1.6734693877551021},
- {"current_steps": 164, "loss": 1.5846, "learning_rate": 8.41025641025641e-07, "epoch": 1.683673469387755},
- {"current_steps": 165, "loss": 0.8901, "learning_rate": 8.384615384615384e-07, "epoch": 1.693877551020408},
- {"current_steps": 166, "loss": 0.9545, "learning_rate": 8.358974358974359e-07, "epoch": 1.7040816326530612},
- {"current_steps": 167, "loss": 1.1515, "learning_rate": 8.333333333333333e-07, "epoch": 1.7142857142857144},
- {"current_steps": 168, "loss": 1.3852, "learning_rate": 8.307692307692308e-07, "epoch": 1.7244897959183674},
- {"current_steps": 169, "loss": 1.0304, "learning_rate": 8.282051282051282e-07, "epoch": 1.7346938775510203},
- {"current_steps": 170, "loss": 1.0555, "learning_rate": 8.256410256410256e-07, "epoch": 1.7448979591836735},
- {"current_steps": 171, "loss": 0.7715, "learning_rate": 8.23076923076923e-07, "epoch": 1.7551020408163265},
- {"current_steps": 172, "loss": 0.9879, "learning_rate": 8.205128205128205e-07, "epoch": 1.7653061224489797},
- {"current_steps": 173, "loss": 1.244, "learning_rate": 8.179487179487179e-07, "epoch": 1.7755102040816326},
- {"current_steps": 174, "loss": 1.1134, "learning_rate": 8.153846153846154e-07, "epoch": 1.7857142857142856},
- {"current_steps": 175, "loss": 1.1013, "learning_rate": 8.128205128205128e-07, "epoch": 1.7959183673469388},
- {"current_steps": 176, "loss": 1.1775, "learning_rate": 8.102564102564103e-07, "epoch": 1.806122448979592},
- {"current_steps": 177, "loss": 1.1815, "learning_rate": 8.076923076923077e-07, "epoch": 1.816326530612245},
- {"current_steps": 178, "loss": 1.1765, "learning_rate": 8.051282051282052e-07, "epoch": 1.8265306122448979},
- {"current_steps": 179, "loss": 0.9524, "learning_rate": 8.025641025641025e-07, "epoch": 1.836734693877551},
- {"current_steps": 180, "loss": 1.1465, "learning_rate": 8e-07, "epoch": 1.8469387755102042},
- {"current_steps": 181, "loss": 1.198, "learning_rate": 7.974358974358974e-07, "epoch": 1.8571428571428572},
- {"current_steps": 182, "loss": 1.0793, "learning_rate": 7.948717948717948e-07, "epoch": 1.8673469387755102},
- {"current_steps": 183, "loss": 1.2496, "learning_rate": 7.923076923076922e-07, "epoch": 1.8775510204081631},
- {"current_steps": 184, "loss": 0.986, "learning_rate": 7.897435897435897e-07, "epoch": 1.8877551020408163},
- {"current_steps": 185, "loss": 1.0159, "learning_rate": 7.871794871794871e-07, "epoch": 1.8979591836734695},
- {"current_steps": 186, "loss": 1.0971, "learning_rate": 7.846153846153846e-07, "epoch": 1.9081632653061225},
- {"current_steps": 187, "loss": 0.9174, "learning_rate": 7.82051282051282e-07, "epoch": 1.9183673469387754},
- {"current_steps": 188, "loss": 0.9513, "learning_rate": 7.794871794871795e-07, "epoch": 1.9285714285714286},
- {"current_steps": 189, "loss": 1.0736, "learning_rate": 7.769230769230769e-07, "epoch": 1.9387755102040818},
- {"current_steps": 190, "loss": 1.1306, "learning_rate": 7.743589743589744e-07, "epoch": 1.9489795918367347},
- {"current_steps": 191, "loss": 0.9481, "learning_rate": 7.717948717948718e-07, "epoch": 1.9591836734693877},
- {"current_steps": 192, "loss": 0.8325, "learning_rate": 7.692307692307693e-07, "epoch": 1.9693877551020407},
- {"current_steps": 193, "loss": 1.0885, "learning_rate": 7.666666666666667e-07, "epoch": 1.9795918367346939},
- {"current_steps": 194, "loss": 1.1409, "learning_rate": 7.64102564102564e-07, "epoch": 1.989795918367347},
- {"current_steps": 195, "loss": 0.7824, "learning_rate": 7.615384615384615e-07, "epoch": 2.0},
- {"current_steps": 196, "loss": 1.1916, "learning_rate": 7.589743589743589e-07, "epoch": 2.010204081632653},
- {"current_steps": 197, "loss": 1.1858, "learning_rate": 7.564102564102564e-07, "epoch": 2.020408163265306},
- {"current_steps": 198, "loss": 0.8909, "learning_rate": 7.538461538461538e-07, "epoch": 2.0306122448979593},
- {"current_steps": 199, "loss": 1.2537, "learning_rate": 7.512820512820513e-07, "epoch": 2.0408163265306123},
- {"current_steps": 200, "loss": 1.1669, "learning_rate": 7.487179487179486e-07, "epoch": 2.0510204081632653},
- {"current_steps": 201, "loss": 0.9486, "learning_rate": 7.461538461538461e-07, "epoch": 2.061224489795918},
- {"current_steps": 202, "loss": 0.9397, "learning_rate": 7.435897435897435e-07, "epoch": 2.0714285714285716},
- {"current_steps": 203, "loss": 1.0689, "learning_rate": 7.41025641025641e-07, "epoch": 2.0816326530612246},
- {"current_steps": 204, "loss": 0.8393, "learning_rate": 7.384615384615384e-07, "epoch": 2.0918367346938775},
- {"current_steps": 205, "loss": 0.9834, "learning_rate": 7.358974358974359e-07, "epoch": 2.1020408163265305},
- {"current_steps": 206, "loss": 0.9443, "learning_rate": 7.333333333333332e-07, "epoch": 2.1122448979591835},
- {"current_steps": 207, "loss": 0.8711, "learning_rate": 7.307692307692307e-07, "epoch": 2.122448979591837},
- {"current_steps": 208, "loss": 1.0088, "learning_rate": 7.282051282051281e-07, "epoch": 2.13265306122449},
- {"current_steps": 209, "loss": 0.9495, "learning_rate": 7.256410256410256e-07, "epoch": 2.142857142857143},
- {"current_steps": 210, "loss": 1.0246, "learning_rate": 7.23076923076923e-07, "epoch": 2.1530612244897958},
- {"current_steps": 211, "loss": 0.8712, "learning_rate": 7.205128205128205e-07, "epoch": 2.163265306122449},
- {"current_steps": 212, "loss": 1.0604, "learning_rate": 7.179487179487179e-07, "epoch": 2.173469387755102},
- {"current_steps": 213, "loss": 0.9848, "learning_rate": 7.153846153846154e-07, "epoch": 2.183673469387755},
- {"current_steps": 214, "loss": 1.0133, "learning_rate": 7.128205128205128e-07, "epoch": 2.193877551020408},
- {"current_steps": 215, "loss": 1.0367, "learning_rate": 7.102564102564103e-07, "epoch": 2.204081632653061},
- {"current_steps": 216, "loss": 1.0779, "learning_rate": 7.076923076923077e-07, "epoch": 2.2142857142857144},
- {"current_steps": 217, "loss": 1.1813, "learning_rate": 7.051282051282052e-07, "epoch": 2.2244897959183674},
- {"current_steps": 218, "loss": 1.209, "learning_rate": 7.025641025641025e-07, "epoch": 2.2346938775510203},
- {"current_steps": 219, "loss": 0.7289, "learning_rate": 7e-07, "epoch": 2.2448979591836733},
- {"current_steps": 220, "loss": 1.0283, "learning_rate": 6.974358974358974e-07, "epoch": 2.2551020408163267},
- {"current_steps": 221, "loss": 1.141, "learning_rate": 6.948717948717948e-07, "epoch": 2.2653061224489797},
- {"current_steps": 222, "loss": 0.8899, "learning_rate": 6.923076923076922e-07, "epoch": 2.2755102040816326},
- {"current_steps": 223, "loss": 1.0982, "learning_rate": 6.897435897435897e-07, "epoch": 2.2857142857142856},
- {"current_steps": 224, "loss": 0.9901, "learning_rate": 6.871794871794871e-07, "epoch": 2.295918367346939},
- {"current_steps": 225, "loss": 1.0405, "learning_rate": 6.846153846153846e-07, "epoch": 2.306122448979592},
- {"current_steps": 226, "loss": 0.8221, "learning_rate": 6.82051282051282e-07, "epoch": 2.316326530612245},
- {"current_steps": 227, "loss": 1.2293, "learning_rate": 6.794871794871795e-07, "epoch": 2.326530612244898},
- {"current_steps": 228, "loss": 1.2191, "learning_rate": 6.769230769230769e-07, "epoch": 2.336734693877551},
- {"current_steps": 229, "loss": 1.0375, "learning_rate": 6.743589743589744e-07, "epoch": 2.3469387755102042},
- {"current_steps": 230, "loss": 0.9397, "learning_rate": 6.717948717948717e-07, "epoch": 2.357142857142857},
- {"current_steps": 231, "loss": 0.9278, "learning_rate": 6.692307692307692e-07, "epoch": 2.36734693877551},
- {"current_steps": 232, "loss": 1.0717, "learning_rate": 6.666666666666666e-07, "epoch": 2.377551020408163},
- {"current_steps": 233, "loss": 1.1695, "learning_rate": 6.64102564102564e-07, "epoch": 2.387755102040816},
- {"current_steps": 234, "loss": 1.1653, "learning_rate": 6.615384615384615e-07, "epoch": 2.3979591836734695},
- {"current_steps": 235, "loss": 1.1682, "learning_rate": 6.58974358974359e-07, "epoch": 2.4081632653061225},
- {"current_steps": 236, "loss": 0.9805, "learning_rate": 6.564102564102564e-07, "epoch": 2.4183673469387754},
- {"current_steps": 237, "loss": 1.1046, "learning_rate": 6.538461538461538e-07, "epoch": 2.4285714285714284},
- {"current_steps": 238, "loss": 0.9734, "learning_rate": 6.512820512820513e-07, "epoch": 2.438775510204082},
- {"current_steps": 239, "loss": 1.2178, "learning_rate": 6.487179487179487e-07, "epoch": 2.4489795918367347},
- {"current_steps": 240, "loss": 1.1556, "learning_rate": 6.461538461538462e-07, "epoch": 2.4591836734693877},
- {"current_steps": 241, "loss": 1.0326, "learning_rate": 6.435897435897436e-07, "epoch": 2.4693877551020407},
- {"current_steps": 242, "loss": 0.9868, "learning_rate": 6.410256410256411e-07, "epoch": 2.479591836734694},
- {"current_steps": 243, "loss": 0.8947, "learning_rate": 6.384615384615383e-07, "epoch": 2.489795918367347},
- {"current_steps": 244, "loss": 0.9736, "learning_rate": 6.358974358974358e-07, "epoch": 2.5},
- {"current_steps": 245, "loss": 1.5294, "learning_rate": 6.333333333333332e-07, "epoch": 2.510204081632653},
- {"current_steps": 246, "loss": 1.176, "learning_rate": 6.307692307692307e-07, "epoch": 2.520408163265306},
- {"current_steps": 247, "loss": 1.0583, "learning_rate": 6.282051282051281e-07, "epoch": 2.5306122448979593},
- {"current_steps": 248, "loss": 0.9439, "learning_rate": 6.256410256410256e-07, "epoch": 2.5408163265306123},
- {"current_steps": 249, "loss": 0.9092,
1499
- "learning_rate": 6.23076923076923e-07,
1500
- "epoch": 2.5510204081632653
1501
- },
1502
- {
1503
- "current_steps": 250,
1504
- "loss": 1.2091,
1505
- "learning_rate": 6.205128205128205e-07,
1506
- "epoch": 2.561224489795918
1507
- },
1508
- {
1509
- "current_steps": 251,
1510
- "loss": 0.9285,
1511
- "learning_rate": 6.179487179487179e-07,
1512
- "epoch": 2.571428571428571
1513
- },
1514
- {
1515
- "current_steps": 252,
1516
- "loss": 1.039,
1517
- "learning_rate": 6.153846153846154e-07,
1518
- "epoch": 2.5816326530612246
1519
- },
1520
- {
1521
- "current_steps": 253,
1522
- "loss": 1.0148,
1523
- "learning_rate": 6.128205128205128e-07,
1524
- "epoch": 2.5918367346938775
1525
- },
1526
- {
1527
- "current_steps": 254,
1528
- "loss": 0.9658,
1529
- "learning_rate": 6.102564102564103e-07,
1530
- "epoch": 2.6020408163265305
1531
- },
1532
- {
1533
- "current_steps": 255,
1534
- "loss": 0.7872,
1535
- "learning_rate": 6.076923076923076e-07,
1536
- "epoch": 2.612244897959184
1537
- },
1538
- {
1539
- "current_steps": 256,
1540
- "loss": 0.9161,
1541
- "learning_rate": 6.051282051282051e-07,
1542
- "epoch": 2.622448979591837
1543
- },
1544
- {
1545
- "current_steps": 257,
1546
- "loss": 1.2506,
1547
- "learning_rate": 6.025641025641025e-07,
1548
- "epoch": 2.63265306122449
1549
- },
1550
- {
1551
- "current_steps": 258,
1552
- "loss": 0.969,
1553
- "learning_rate": 6e-07,
1554
- "epoch": 2.642857142857143
1555
- },
1556
- {
1557
- "current_steps": 259,
1558
- "loss": 0.8615,
1559
- "learning_rate": 5.974358974358974e-07,
1560
- "epoch": 2.6530612244897958
1561
- },
1562
- {
1563
- "current_steps": 260,
1564
- "loss": 0.8467,
1565
- "learning_rate": 5.948717948717949e-07,
1566
- "epoch": 2.663265306122449
1567
- },
1568
- {
1569
- "current_steps": 261,
1570
- "loss": 1.1803,
1571
- "learning_rate": 5.923076923076923e-07,
1572
- "epoch": 2.673469387755102
1573
- },
1574
- {
1575
- "current_steps": 262,
1576
- "loss": 0.6155,
1577
- "learning_rate": 5.897435897435898e-07,
1578
- "epoch": 2.683673469387755
1579
- },
1580
- {
1581
- "current_steps": 263,
1582
- "loss": 1.148,
1583
- "learning_rate": 5.871794871794872e-07,
1584
- "epoch": 2.693877551020408
1585
- },
1586
- {
1587
- "current_steps": 264,
1588
- "loss": 1.2103,
1589
- "learning_rate": 5.846153846153847e-07,
1590
- "epoch": 2.704081632653061
1591
- },
1592
- {
1593
- "current_steps": 265,
1594
- "loss": 0.8083,
1595
- "learning_rate": 5.82051282051282e-07,
1596
- "epoch": 2.7142857142857144
1597
- },
1598
- {
1599
- "current_steps": 266,
1600
- "loss": 0.9858,
1601
- "learning_rate": 5.794871794871795e-07,
1602
- "epoch": 2.7244897959183674
1603
- },
1604
- {
1605
- "current_steps": 267,
1606
- "loss": 0.9855,
1607
- "learning_rate": 5.769230769230768e-07,
1608
- "epoch": 2.7346938775510203
1609
- },
1610
- {
1611
- "current_steps": 268,
1612
- "loss": 1.1443,
1613
- "learning_rate": 5.743589743589743e-07,
1614
- "epoch": 2.7448979591836737
1615
- },
1616
- {
1617
- "current_steps": 269,
1618
- "loss": 0.8432,
1619
- "learning_rate": 5.717948717948717e-07,
1620
- "epoch": 2.7551020408163263
1621
- },
1622
- {
1623
- "current_steps": 270,
1624
- "loss": 1.1858,
1625
- "learning_rate": 5.692307692307692e-07,
1626
- "epoch": 2.7653061224489797
1627
- },
1628
- {
1629
- "current_steps": 271,
1630
- "loss": 1.5042,
1631
- "learning_rate": 5.666666666666666e-07,
1632
- "epoch": 2.7755102040816326
1633
- },
1634
- {
1635
- "current_steps": 272,
1636
- "loss": 1.0224,
1637
- "learning_rate": 5.641025641025641e-07,
1638
- "epoch": 2.7857142857142856
1639
- },
1640
- {
1641
- "current_steps": 273,
1642
- "loss": 1.243,
1643
- "learning_rate": 5.615384615384615e-07,
1644
- "epoch": 2.795918367346939
1645
- },
1646
- {
1647
- "current_steps": 274,
1648
- "loss": 0.9863,
1649
- "learning_rate": 5.58974358974359e-07,
1650
- "epoch": 2.806122448979592
1651
- },
1652
- {
1653
- "current_steps": 275,
1654
- "loss": 1.2529,
1655
- "learning_rate": 5.564102564102564e-07,
1656
- "epoch": 2.816326530612245
1657
- },
1658
- {
1659
- "current_steps": 276,
1660
- "loss": 0.9227,
1661
- "learning_rate": 5.538461538461539e-07,
1662
- "epoch": 2.826530612244898
1663
- },
1664
- {
1665
- "current_steps": 277,
1666
- "loss": 0.8991,
1667
- "learning_rate": 5.512820512820513e-07,
1668
- "epoch": 2.836734693877551
1669
- },
1670
- {
1671
- "current_steps": 278,
1672
- "loss": 0.9586,
1673
- "learning_rate": 5.487179487179488e-07,
1674
- "epoch": 2.8469387755102042
1675
- },
1676
- {
1677
- "current_steps": 279,
1678
- "loss": 0.9635,
1679
- "learning_rate": 5.461538461538461e-07,
1680
- "epoch": 2.857142857142857
1681
- },
1682
- {
1683
- "current_steps": 280,
1684
- "loss": 1.0263,
1685
- "learning_rate": 5.435897435897435e-07,
1686
- "epoch": 2.86734693877551
1687
- },
1688
- {
1689
- "current_steps": 281,
1690
- "loss": 0.9285,
1691
- "learning_rate": 5.41025641025641e-07,
1692
- "epoch": 2.877551020408163
1693
- },
1694
- {
1695
- "current_steps": 282,
1696
- "loss": 1.2307,
1697
- "learning_rate": 5.384615384615384e-07,
1698
- "epoch": 2.887755102040816
1699
- },
1700
- {
1701
- "current_steps": 283,
1702
- "loss": 1.0337,
1703
- "learning_rate": 5.358974358974359e-07,
1704
- "epoch": 2.8979591836734695
1705
- },
1706
- {
1707
- "current_steps": 284,
1708
- "loss": 0.9798,
1709
- "learning_rate": 5.333333333333333e-07,
1710
- "epoch": 2.9081632653061225
1711
- },
1712
- {
1713
- "current_steps": 285,
1714
- "loss": 1.0283,
1715
- "learning_rate": 5.307692307692308e-07,
1716
- "epoch": 2.9183673469387754
1717
- },
1718
- {
1719
- "current_steps": 286,
1720
- "loss": 0.946,
1721
- "learning_rate": 5.282051282051282e-07,
1722
- "epoch": 2.928571428571429
1723
- },
1724
- {
1725
- "current_steps": 287,
1726
- "loss": 0.9716,
1727
- "learning_rate": 5.256410256410256e-07,
1728
- "epoch": 2.938775510204082
1729
- },
1730
- {
1731
- "current_steps": 288,
1732
- "loss": 1.0291,
1733
- "learning_rate": 5.23076923076923e-07,
1734
- "epoch": 2.9489795918367347
1735
- },
1736
- {
1737
- "current_steps": 289,
1738
- "loss": 1.1638,
1739
- "learning_rate": 5.205128205128205e-07,
1740
- "epoch": 2.9591836734693877
1741
- },
1742
- {
1743
- "current_steps": 290,
1744
- "loss": 1.249,
1745
- "learning_rate": 5.179487179487179e-07,
1746
- "epoch": 2.9693877551020407
1747
- },
1748
- {
1749
- "current_steps": 291,
1750
- "loss": 1.1532,
1751
- "learning_rate": 5.153846153846153e-07,
1752
- "epoch": 2.979591836734694
1753
- },
1754
- {
1755
- "current_steps": 292,
1756
- "loss": 0.8845,
1757
- "learning_rate": 5.128205128205127e-07,
1758
- "epoch": 2.989795918367347
1759
- },
1760
- {
1761
- "current_steps": 293,
1762
- "loss": 1.2865,
1763
- "learning_rate": 5.102564102564102e-07,
1764
- "epoch": 3.0
1765
- },
1766
- {
1767
- "current_steps": 294,
1768
- "loss": 1.0558,
1769
- "learning_rate": 5.076923076923076e-07,
1770
- "epoch": 3.010204081632653
1771
- },
1772
- {
1773
- "current_steps": 295,
1774
- "loss": 0.8381,
1775
- "learning_rate": 5.051282051282051e-07,
1776
- "epoch": 3.020408163265306
1777
- },
1778
- {
1779
- "current_steps": 296,
1780
- "loss": 0.9348,
1781
- "learning_rate": 5.025641025641025e-07,
1782
- "epoch": 3.0306122448979593
1783
- },
1784
- {
1785
- "current_steps": 297,
1786
- "loss": 0.9659,
1787
- "learning_rate": 5e-07,
1788
- "epoch": 3.0408163265306123
1789
- },
1790
- {
1791
- "current_steps": 298,
1792
- "loss": 1.3105,
1793
- "learning_rate": 4.974358974358974e-07,
1794
- "epoch": 3.0510204081632653
1795
- },
1796
- {
1797
- "current_steps": 299,
1798
- "loss": 0.7721,
1799
- "learning_rate": 4.948717948717949e-07,
1800
- "epoch": 3.061224489795918
1801
- },
1802
- {
1803
- "current_steps": 300,
1804
- "loss": 0.9762,
1805
- "learning_rate": 4.923076923076923e-07,
1806
- "epoch": 3.0714285714285716
1807
- },
1808
- {
1809
- "current_steps": 301,
1810
- "loss": 0.9398,
1811
- "learning_rate": 4.897435897435897e-07,
1812
- "epoch": 3.0816326530612246
1813
- },
1814
- {
1815
- "current_steps": 302,
1816
- "loss": 1.2612,
1817
- "learning_rate": 4.871794871794871e-07,
1818
- "epoch": 3.0918367346938775
1819
- },
1820
- {
1821
- "current_steps": 303,
1822
- "loss": 1.2505,
1823
- "learning_rate": 4.846153846153846e-07,
1824
- "epoch": 3.1020408163265305
1825
- },
1826
- {
1827
- "current_steps": 304,
1828
- "loss": 1.1641,
1829
- "learning_rate": 4.82051282051282e-07,
1830
- "epoch": 3.1122448979591835
1831
- },
1832
- {
1833
- "current_steps": 305,
1834
- "loss": 1.0805,
1835
- "learning_rate": 4.794871794871795e-07,
1836
- "epoch": 3.122448979591837
1837
- },
1838
- {
1839
- "current_steps": 306,
1840
- "loss": 0.8856,
1841
- "learning_rate": 4.769230769230769e-07,
1842
- "epoch": 3.13265306122449
1843
- },
1844
- {
1845
- "current_steps": 307,
1846
- "loss": 1.1931,
1847
- "learning_rate": 4.743589743589743e-07,
1848
- "epoch": 3.142857142857143
1849
- },
1850
- {
1851
- "current_steps": 308,
1852
- "loss": 1.0618,
1853
- "learning_rate": 4.7179487179487176e-07,
1854
- "epoch": 3.1530612244897958
1855
- },
1856
- {
1857
- "current_steps": 309,
1858
- "loss": 0.9113,
1859
- "learning_rate": 4.692307692307692e-07,
1860
- "epoch": 3.163265306122449
1861
- },
1862
- {
1863
- "current_steps": 310,
1864
- "loss": 1.0872,
1865
- "learning_rate": 4.6666666666666666e-07,
1866
- "epoch": 3.173469387755102
1867
- },
1868
- {
1869
- "current_steps": 311,
1870
- "loss": 1.0619,
1871
- "learning_rate": 4.641025641025641e-07,
1872
- "epoch": 3.183673469387755
1873
- },
1874
- {
1875
- "current_steps": 312,
1876
- "loss": 0.8784,
1877
- "learning_rate": 4.6153846153846156e-07,
1878
- "epoch": 3.193877551020408
1879
- },
1880
- {
1881
- "current_steps": 313,
1882
- "loss": 1.0601,
1883
- "learning_rate": 4.5897435897435896e-07,
1884
- "epoch": 3.204081632653061
1885
- },
1886
- {
1887
- "current_steps": 314,
1888
- "loss": 0.959,
1889
- "learning_rate": 4.5641025641025636e-07,
1890
- "epoch": 3.2142857142857144
1891
- },
1892
- {
1893
- "current_steps": 315,
1894
- "loss": 1.0085,
1895
- "learning_rate": 4.538461538461538e-07,
1896
- "epoch": 3.2244897959183674
1897
- },
1898
- {
1899
- "current_steps": 316,
1900
- "loss": 1.0718,
1901
- "learning_rate": 4.5128205128205125e-07,
1902
- "epoch": 3.2346938775510203
1903
- },
1904
- {
1905
- "current_steps": 317,
1906
- "loss": 1.1865,
1907
- "learning_rate": 4.487179487179487e-07,
1908
- "epoch": 3.2448979591836733
1909
- },
1910
- {
1911
- "current_steps": 318,
1912
- "loss": 0.9815,
1913
- "learning_rate": 4.4615384615384615e-07,
1914
- "epoch": 3.2551020408163267
1915
- },
1916
- {
1917
- "current_steps": 319,
1918
- "loss": 1.1176,
1919
- "learning_rate": 4.4358974358974355e-07,
1920
- "epoch": 3.2653061224489797
1921
- },
1922
- {
1923
- "current_steps": 320,
1924
- "loss": 0.7826,
1925
- "learning_rate": 4.41025641025641e-07,
1926
- "epoch": 3.2755102040816326
1927
- },
1928
- {
1929
- "current_steps": 321,
1930
- "loss": 1.0717,
1931
- "learning_rate": 4.3846153846153845e-07,
1932
- "epoch": 3.2857142857142856
1933
- },
1934
- {
1935
- "current_steps": 322,
1936
- "loss": 0.9129,
1937
- "learning_rate": 4.358974358974359e-07,
1938
- "epoch": 3.295918367346939
1939
- },
1940
- {
1941
- "current_steps": 323,
1942
- "loss": 0.8037,
1943
- "learning_rate": 4.3333333333333335e-07,
1944
- "epoch": 3.306122448979592
1945
- },
1946
- {
1947
- "current_steps": 324,
1948
- "loss": 1.1173,
1949
- "learning_rate": 4.307692307692308e-07,
1950
- "epoch": 3.316326530612245
1951
- },
1952
- {
1953
- "current_steps": 325,
1954
- "loss": 1.2198,
1955
- "learning_rate": 4.2820512820512814e-07,
1956
- "epoch": 3.326530612244898
1957
- },
1958
- {
1959
- "current_steps": 326,
1960
- "loss": 1.0326,
1961
- "learning_rate": 4.256410256410256e-07,
1962
- "epoch": 3.336734693877551
1963
- },
1964
- {
1965
- "current_steps": 327,
1966
- "loss": 1.237,
1967
- "learning_rate": 4.2307692307692304e-07,
1968
- "epoch": 3.3469387755102042
1969
- },
1970
- {
1971
- "current_steps": 328,
1972
- "loss": 1.1247,
1973
- "learning_rate": 4.205128205128205e-07,
1974
- "epoch": 3.357142857142857
1975
- },
1976
- {
1977
- "current_steps": 329,
1978
- "loss": 1.0071,
1979
- "learning_rate": 4.1794871794871794e-07,
1980
- "epoch": 3.36734693877551
1981
- },
1982
- {
1983
- "current_steps": 330,
1984
- "loss": 1.2143,
1985
- "learning_rate": 4.153846153846154e-07,
1986
- "epoch": 3.377551020408163
1987
- },
1988
- {
1989
- "current_steps": 331,
1990
- "loss": 0.9502,
1991
- "learning_rate": 4.128205128205128e-07,
1992
- "epoch": 3.387755102040816
1993
- },
1994
- {
1995
- "current_steps": 332,
1996
- "loss": 0.8315,
1997
- "learning_rate": 4.1025641025641024e-07,
1998
- "epoch": 3.3979591836734695
1999
- },
2000
- {
2001
- "current_steps": 333,
2002
- "loss": 0.8832,
2003
- "learning_rate": 4.076923076923077e-07,
2004
- "epoch": 3.4081632653061225
2005
- },
2006
- {
2007
- "current_steps": 334,
2008
- "loss": 1.0922,
2009
- "learning_rate": 4.0512820512820514e-07,
2010
- "epoch": 3.4183673469387754
2011
- },
2012
- {
2013
- "current_steps": 335,
2014
- "loss": 0.8653,
2015
- "learning_rate": 4.025641025641026e-07,
2016
- "epoch": 3.4285714285714284
2017
- },
2018
- {
2019
- "current_steps": 336,
2020
- "loss": 0.8588,
2021
- "learning_rate": 4e-07,
2022
- "epoch": 3.438775510204082
2023
- },
2024
- {
2025
- "current_steps": 337,
2026
- "loss": 1.0424,
2027
- "learning_rate": 3.974358974358974e-07,
2028
- "epoch": 3.4489795918367347
2029
- },
2030
- {
2031
- "current_steps": 338,
2032
- "loss": 1.0364,
2033
- "learning_rate": 3.9487179487179483e-07,
2034
- "epoch": 3.4591836734693877
2035
- },
2036
- {
2037
- "current_steps": 339,
2038
- "loss": 1.1119,
2039
- "learning_rate": 3.923076923076923e-07,
2040
- "epoch": 3.4693877551020407
2041
- },
2042
- {
2043
- "current_steps": 340,
2044
- "loss": 1.412,
2045
- "learning_rate": 3.8974358974358973e-07,
2046
- "epoch": 3.479591836734694
2047
- },
2048
- {
2049
- "current_steps": 341,
2050
- "loss": 1.0895,
2051
- "learning_rate": 3.871794871794872e-07,
2052
- "epoch": 3.489795918367347
2053
- },
2054
- {
2055
- "current_steps": 342,
2056
- "loss": 0.937,
2057
- "learning_rate": 3.8461538461538463e-07,
2058
- "epoch": 3.5
2059
- },
2060
- {
2061
- "current_steps": 343,
2062
- "loss": 0.9634,
2063
- "learning_rate": 3.82051282051282e-07,
2064
- "epoch": 3.510204081632653
2065
- },
2066
- {
2067
- "current_steps": 344,
2068
- "loss": 1.0896,
2069
- "learning_rate": 3.7948717948717947e-07,
2070
- "epoch": 3.520408163265306
2071
- },
2072
- {
2073
- "current_steps": 345,
2074
- "loss": 1.1336,
2075
- "learning_rate": 3.769230769230769e-07,
2076
- "epoch": 3.5306122448979593
2077
- },
2078
- {
2079
- "current_steps": 346,
2080
- "loss": 1.1216,
2081
- "learning_rate": 3.743589743589743e-07,
2082
- "epoch": 3.5408163265306123
2083
- },
2084
- {
2085
- "current_steps": 347,
2086
- "loss": 1.0329,
2087
- "learning_rate": 3.7179487179487177e-07,
2088
- "epoch": 3.5510204081632653
2089
- },
2090
- {
2091
- "current_steps": 348,
2092
- "loss": 0.9592,
2093
- "learning_rate": 3.692307692307692e-07,
2094
- "epoch": 3.561224489795918
2095
- },
2096
- {
2097
- "current_steps": 349,
2098
- "loss": 1.1346,
2099
- "learning_rate": 3.666666666666666e-07,
2100
- "epoch": 3.571428571428571
2101
- },
2102
- {
2103
- "current_steps": 350,
2104
- "loss": 0.9627,
2105
- "learning_rate": 3.6410256410256406e-07,
2106
- "epoch": 3.5816326530612246
2107
- },
2108
- {
2109
- "current_steps": 351,
2110
- "loss": 0.8698,
2111
- "learning_rate": 3.615384615384615e-07,
2112
- "epoch": 3.5918367346938775
2113
- },
2114
- {
2115
- "current_steps": 352,
2116
- "loss": 1.0556,
2117
- "learning_rate": 3.5897435897435896e-07,
2118
- "epoch": 3.6020408163265305
2119
- },
2120
- {
2121
- "current_steps": 353,
2122
- "loss": 1.1182,
2123
- "learning_rate": 3.564102564102564e-07,
2124
- "epoch": 3.612244897959184
2125
- },
2126
- {
2127
- "current_steps": 354,
2128
- "loss": 1.1447,
2129
- "learning_rate": 3.5384615384615386e-07,
2130
- "epoch": 3.622448979591837
2131
- },
2132
- {
2133
- "current_steps": 355,
2134
- "loss": 0.8987,
2135
- "learning_rate": 3.5128205128205126e-07,
2136
- "epoch": 3.63265306122449
2137
- },
2138
- {
2139
- "current_steps": 356,
2140
- "loss": 0.8758,
2141
- "learning_rate": 3.487179487179487e-07,
2142
- "epoch": 3.642857142857143
2143
- },
2144
- {
2145
- "current_steps": 357,
2146
- "loss": 1.7681,
2147
- "learning_rate": 3.461538461538461e-07,
2148
- "epoch": 3.6530612244897958
2149
- },
2150
- {
2151
- "current_steps": 358,
2152
- "loss": 1.0395,
2153
- "learning_rate": 3.4358974358974356e-07,
2154
- "epoch": 3.663265306122449
2155
- },
2156
- {
2157
- "current_steps": 359,
2158
- "loss": 0.7837,
2159
- "learning_rate": 3.41025641025641e-07,
2160
- "epoch": 3.673469387755102
2161
- },
2162
- {
2163
- "current_steps": 360,
2164
- "loss": 1.0153,
2165
- "learning_rate": 3.3846153846153845e-07,
2166
- "epoch": 3.683673469387755
2167
- },
2168
- {
2169
- "current_steps": 361,
2170
- "loss": 1.1471,
2171
- "learning_rate": 3.3589743589743585e-07,
2172
- "epoch": 3.693877551020408
2173
- },
2174
- {
2175
- "current_steps": 362,
2176
- "loss": 0.8569,
2177
- "learning_rate": 3.333333333333333e-07,
2178
- "epoch": 3.704081632653061
2179
- },
2180
- {
2181
- "current_steps": 363,
2182
- "loss": 1.0616,
2183
- "learning_rate": 3.3076923076923075e-07,
2184
- "epoch": 3.7142857142857144
2185
- },
2186
- {
2187
- "current_steps": 364,
2188
- "loss": 1.0197,
2189
- "learning_rate": 3.282051282051282e-07,
2190
- "epoch": 3.7244897959183674
2191
- },
2192
- {
2193
- "current_steps": 365,
2194
- "loss": 0.7505,
2195
- "learning_rate": 3.2564102564102565e-07,
2196
- "epoch": 3.7346938775510203
2197
- },
2198
- {
2199
- "current_steps": 366,
2200
- "loss": 1.1823,
2201
- "learning_rate": 3.230769230769231e-07,
2202
- "epoch": 3.7448979591836737
2203
- },
2204
- {
2205
- "current_steps": 367,
2206
- "loss": 1.0465,
2207
- "learning_rate": 3.2051282051282055e-07,
2208
- "epoch": 3.7551020408163263
2209
- },
2210
- {
2211
- "current_steps": 368,
2212
- "loss": 1.2175,
2213
- "learning_rate": 3.179487179487179e-07,
2214
- "epoch": 3.7653061224489797
2215
- },
2216
- {
2217
- "current_steps": 369,
2218
- "loss": 0.7914,
2219
- "learning_rate": 3.1538461538461534e-07,
2220
- "epoch": 3.7755102040816326
2221
- },
2222
- {
2223
- "current_steps": 370,
2224
- "loss": 1.0506,
2225
- "learning_rate": 3.128205128205128e-07,
2226
- "epoch": 3.7857142857142856
2227
- },
2228
- {
2229
- "current_steps": 371,
2230
- "loss": 1.2272,
2231
- "learning_rate": 3.1025641025641024e-07,
2232
- "epoch": 3.795918367346939
2233
- },
2234
- {
2235
- "current_steps": 372,
2236
- "loss": 0.9431,
2237
- "learning_rate": 3.076923076923077e-07,
2238
- "epoch": 3.806122448979592
2239
- },
2240
- {
2241
- "current_steps": 373,
2242
- "loss": 1.0228,
2243
- "learning_rate": 3.0512820512820514e-07,
2244
- "epoch": 3.816326530612245
2245
- },
2246
- {
2247
- "current_steps": 374,
2248
- "loss": 0.8733,
2249
- "learning_rate": 3.0256410256410254e-07,
2250
- "epoch": 3.826530612244898
2251
- },
2252
- {
2253
- "current_steps": 375,
2254
- "loss": 1.1344,
2255
- "learning_rate": 3e-07,
2256
- "epoch": 3.836734693877551
2257
- },
2258
- {
2259
- "current_steps": 376,
2260
- "loss": 1.0388,
2261
- "learning_rate": 2.9743589743589744e-07,
2262
- "epoch": 3.8469387755102042
2263
- },
2264
- {
2265
- "current_steps": 377,
2266
- "loss": 1.0039,
2267
- "learning_rate": 2.948717948717949e-07,
2268
- "epoch": 3.857142857142857
2269
- },
2270
- {
2271
- "current_steps": 378,
2272
- "loss": 1.161,
2273
- "learning_rate": 2.9230769230769234e-07,
2274
- "epoch": 3.86734693877551
2275
- },
2276
- {
2277
- "current_steps": 379,
2278
- "loss": 0.7736,
2279
- "learning_rate": 2.8974358974358973e-07,
2280
- "epoch": 3.877551020408163
2281
- },
2282
- {
2283
- "current_steps": 380,
2284
- "loss": 1.0812,
2285
- "learning_rate": 2.8717948717948713e-07,
2286
- "epoch": 3.887755102040816
2287
- },
2288
- {
2289
- "current_steps": 381,
2290
- "loss": 1.067,
2291
- "learning_rate": 2.846153846153846e-07,
2292
- "epoch": 3.8979591836734695
2293
- },
2294
- {
2295
- "current_steps": 382,
2296
- "loss": 0.8961,
2297
- "learning_rate": 2.8205128205128203e-07,
2298
- "epoch": 3.9081632653061225
2299
- },
2300
- {
2301
- "current_steps": 383,
2302
- "loss": 0.7864,
2303
- "learning_rate": 2.794871794871795e-07,
2304
- "epoch": 3.9183673469387754
2305
- },
2306
- {
2307
- "current_steps": 384,
2308
- "loss": 0.8673,
2309
- "learning_rate": 2.7692307692307693e-07,
2310
- "epoch": 3.928571428571429
2311
- },
2312
- {
2313
- "current_steps": 385,
2314
- "loss": 0.9608,
2315
- "learning_rate": 2.743589743589744e-07,
2316
- "epoch": 3.938775510204082
2317
- },
2318
- {
2319
- "current_steps": 386,
2320
- "loss": 1.0788,
2321
- "learning_rate": 2.7179487179487177e-07,
2322
- "epoch": 3.9489795918367347
2323
- },
2324
- {
2325
- "current_steps": 387,
2326
- "loss": 1.2868,
2327
- "learning_rate": 2.692307692307692e-07,
2328
- "epoch": 3.9591836734693877
2329
- },
2330
- {
2331
- "current_steps": 388,
2332
- "loss": 0.8304,
2333
- "learning_rate": 2.6666666666666667e-07,
2334
- "epoch": 3.9693877551020407
2335
- },
2336
- {
2337
- "current_steps": 389,
2338
- "loss": 0.7779,
2339
- "learning_rate": 2.641025641025641e-07,
2340
- "epoch": 3.979591836734694
2341
- },
2342
- {
2343
- "current_steps": 390,
2344
- "loss": 0.7525,
2345
- "learning_rate": 2.615384615384615e-07,
2346
- "epoch": 3.989795918367347
2347
- },
2348
- {
2349
- "current_steps": 391,
2350
- "loss": 0.862,
2351
- "learning_rate": 2.5897435897435897e-07,
2352
- "epoch": 4.0
2353
- },
2354
- {
2355
- "current_steps": 392,
2356
- "loss": 1.0776,
2357
- "learning_rate": 2.5641025641025636e-07,
2358
- "epoch": 4.010204081632653
2359
- },
2360
- {
2361
- "current_steps": 393,
2362
- "loss": 0.9312,
2363
- "learning_rate": 2.538461538461538e-07,
2364
- "epoch": 4.020408163265306
2365
- },
2366
- {
2367
- "current_steps": 394,
2368
- "loss": 1.3004,
2369
- "learning_rate": 2.5128205128205126e-07,
2370
- "epoch": 4.030612244897959
2371
- },
2372
- {
2373
- "current_steps": 395,
2374
- "loss": 1.2295,
2375
- "learning_rate": 2.487179487179487e-07,
2376
- "epoch": 4.040816326530612
2377
- },
2378
- {
2379
- "current_steps": 396,
2380
- "loss": 1.0688,
2381
- "learning_rate": 2.4615384615384616e-07,
2382
- "epoch": 4.051020408163265
2383
- },
2384
- {
2385
- "current_steps": 397,
2386
- "loss": 0.906,
2387
- "learning_rate": 2.4358974358974356e-07,
2388
- "epoch": 4.061224489795919
2389
- },
2390
- {
2391
- "current_steps": 398,
2392
- "loss": 0.8381,
2393
- "learning_rate": 2.41025641025641e-07,
2394
- "epoch": 4.071428571428571
2395
- },
2396
- {
2397
- "current_steps": 399,
2398
- "loss": 0.9278,
2399
- "learning_rate": 2.3846153846153846e-07,
2400
- "epoch": 4.081632653061225
2401
- },
2402
- {
2403
- "current_steps": 400,
2404
- "loss": 1.3734,
2405
- "learning_rate": 2.3589743589743588e-07,
2406
- "epoch": 4.091836734693878
2407
- },
2408
- {
2409
- "current_steps": 401,
2410
- "loss": 1.2469,
2411
- "learning_rate": 2.3333333333333333e-07,
2412
- "epoch": 4.1020408163265305
2413
- },
2414
- {
2415
- "current_steps": 402,
2416
- "loss": 1.3845,
2417
- "learning_rate": 2.3076923076923078e-07,
2418
- "epoch": 4.112244897959184
2419
- },
2420
- {
2421
- "current_steps": 403,
2422
- "loss": 0.9578,
2423
- "learning_rate": 2.2820512820512818e-07,
2424
- "epoch": 4.122448979591836
2425
- },
2426
- {
2427
- "current_steps": 404,
2428
- "loss": 0.9949,
2429
- "learning_rate": 2.2564102564102563e-07,
2430
- "epoch": 4.13265306122449
2431
- },
2432
- {
2433
- "current_steps": 405,
2434
- "loss": 1.0655,
2435
- "learning_rate": 2.2307692307692308e-07,
2436
- "epoch": 4.142857142857143
2437
- },
2438
- {
2439
- "current_steps": 406,
2440
- "loss": 0.8242,
2441
- "learning_rate": 2.205128205128205e-07,
2442
- "epoch": 4.153061224489796
2443
- },
2444
- {
2445
- "current_steps": 407,
2446
- "loss": 1.1328,
2447
- "learning_rate": 2.1794871794871795e-07,
2448
- "epoch": 4.163265306122449
2449
- },
2450
- {
2451
- "current_steps": 408,
2452
- "loss": 0.9127,
2453
- "learning_rate": 2.153846153846154e-07,
2454
- "epoch": 4.173469387755102
2455
- },
2456
- {
2457
- "current_steps": 409,
2458
- "loss": 0.9174,
2459
- "learning_rate": 2.128205128205128e-07,
2460
- "epoch": 4.183673469387755
2461
- },
2462
- {
2463
- "current_steps": 410,
2464
- "loss": 1.2596,
2465
- "learning_rate": 2.1025641025641025e-07,
2466
- "epoch": 4.1938775510204085
2467
- },
2468
- {
2469
- "current_steps": 411,
2470
- "loss": 1.109,
2471
- "learning_rate": 2.076923076923077e-07,
2472
- "epoch": 4.204081632653061
2473
- },
2474
- {
2475
- "current_steps": 412,
2476
- "loss": 1.313,
2477
- "learning_rate": 2.0512820512820512e-07,
2478
- "epoch": 4.214285714285714
2479
- },
2480
- {
2481
- "current_steps": 413,
2482
- "loss": 1.0135,
2483
- "learning_rate": 2.0256410256410257e-07,
2484
- "epoch": 4.224489795918367
2485
- },
2486
- {
2487
- "current_steps": 414,
2488
- "loss": 0.9048,
2489
- "learning_rate": 2e-07,
2490
- "epoch": 4.23469387755102
2491
- },
2492
- {
2493
- "current_steps": 415,
2494
- "loss": 1.1162,
2495
- "learning_rate": 1.9743589743589741e-07,
2496
- "epoch": 4.244897959183674
2497
- },
2498
- {
2499
- "current_steps": 416,
2500
- "loss": 0.8849,
2501
- "learning_rate": 1.9487179487179486e-07,
2502
- "epoch": 4.255102040816326
2503
- },
2504
- {
2505
- "current_steps": 417,
2506
- "loss": 0.8803,
2507
- "learning_rate": 1.9230769230769231e-07,
2508
- "epoch": 4.26530612244898
2509
- },
2510
- {
2511
- "current_steps": 418,
2512
- "loss": 0.9497,
2513
- "learning_rate": 1.8974358974358974e-07,
2514
- "epoch": 4.275510204081632
2515
- },
2516
- {
2517
- "current_steps": 419,
2518
- "loss": 0.9216,
2519
- "learning_rate": 1.8717948717948716e-07,
2520
- "epoch": 4.285714285714286
2521
- },
2522
- {
2523
- "current_steps": 420,
2524
- "loss": 0.8915,
2525
- "learning_rate": 1.846153846153846e-07,
2526
- "epoch": 4.295918367346939
2527
- },
2528
- {
2529
- "current_steps": 421,
2530
- "loss": 0.9967,
2531
- "learning_rate": 1.8205128205128203e-07,
2532
- "epoch": 4.3061224489795915
2533
- },
2534
- {
2535
- "current_steps": 422,
2536
- "loss": 0.9392,
2537
- "learning_rate": 1.7948717948717948e-07,
2538
- "epoch": 4.316326530612245
2539
- },
2540
- {
2541
- "current_steps": 423,
2542
- "loss": 1.1217,
2543
- "learning_rate": 1.7692307692307693e-07,
2544
- "epoch": 4.326530612244898
2545
- },
2546
- {
2547
- "current_steps": 424,
2548
- "loss": 1.0389,
2549
- "learning_rate": 1.7435897435897435e-07,
2550
- "epoch": 4.336734693877551
2551
- },
2552
- {
2553
- "current_steps": 425,
2554
- "loss": 1.106,
2555
- "learning_rate": 1.7179487179487178e-07,
2556
- "epoch": 4.346938775510204
2557
- },
2558
- {
2559
- "current_steps": 426,
2560
- "loss": 1.0438,
2561
- "learning_rate": 1.6923076923076923e-07,
2562
- "epoch": 4.357142857142857
2563
- },
2564
- {
2565
- "current_steps": 427,
2566
- "loss": 1.0671,
2567
- "learning_rate": 1.6666666666666665e-07,
2568
- "epoch": 4.36734693877551
2569
- },
2570
- {
2571
- "current_steps": 428,
2572
- "loss": 1.0963,
2573
- "learning_rate": 1.641025641025641e-07,
2574
- "epoch": 4.377551020408164
2575
- },
2576
- {
2577
- "current_steps": 429,
2578
- "loss": 1.0008,
2579
- "learning_rate": 1.6153846153846155e-07,
2580
- "epoch": 4.387755102040816
2581
- },
2582
- {
2583
- "current_steps": 430,
2584
- "loss": 0.8875,
2585
- "learning_rate": 1.5897435897435895e-07,
2586
- "epoch": 4.3979591836734695
2587
- },
2588
- {
2589
- "current_steps": 431,
2590
- "loss": 0.989,
2591
- "learning_rate": 1.564102564102564e-07,
2592
- "epoch": 4.408163265306122
2593
- },
2594
- {
2595
- "current_steps": 432,
2596
- "loss": 1.0361,
2597
- "learning_rate": 1.5384615384615385e-07,
2598
- "epoch": 4.418367346938775
2599
- },
2600
- {
2601
- "current_steps": 433,
2602
- "loss": 0.9075,
2603
- "learning_rate": 1.5128205128205127e-07,
2604
- "epoch": 4.428571428571429
2605
- },
2606
- {
2607
- "current_steps": 434,
2608
- "loss": 0.9151,
2609
- "learning_rate": 1.4871794871794872e-07,
2610
- "epoch": 4.438775510204081
2611
- },
2612
- {
2613
- "current_steps": 435,
2614
- "loss": 1.2658,
2615
- "learning_rate": 1.4615384615384617e-07,
2616
- "epoch": 4.448979591836735
2617
- },
2618
- {
2619
- "current_steps": 436,
2620
- "loss": 0.9476,
2621
- "learning_rate": 1.4358974358974356e-07,
2622
- "epoch": 4.459183673469388
2623
- },
2624
- {
2625
- "current_steps": 437,
2626
- "loss": 1.2257,
2627
- "learning_rate": 1.4102564102564101e-07,
2628
- "epoch": 4.469387755102041
2629
- },
2630
- {
2631
- "current_steps": 438,
2632
- "loss": 1.0466,
2633
- "learning_rate": 1.3846153846153846e-07,
2634
- "epoch": 4.479591836734694
2635
- },
2636
- {
2637
- "current_steps": 439,
2638
- "loss": 1.0721,
2639
- "learning_rate": 1.3589743589743589e-07,
2640
- "epoch": 4.489795918367347
2641
- },
2642
- {
2643
- "current_steps": 440,
2644
- "loss": 0.9636,
2645
- "learning_rate": 1.3333333333333334e-07,
2646
- "epoch": 4.5
2647
- },
2648
- {
2649
- "current_steps": 441,
2650
- "loss": 0.7291,
2651
- "learning_rate": 1.3076923076923076e-07,
2652
- "epoch": 4.510204081632653
2653
- },
2654
- {
2655
- "current_steps": 442,
2656
- "loss": 1.1522,
2657
- "learning_rate": 1.2820512820512818e-07,
2658
- "epoch": 4.520408163265306
2659
- },
2660
- {
2661
- "current_steps": 443,
2662
- "loss": 0.8466,
2663
- "learning_rate": 1.2564102564102563e-07,
2664
- "epoch": 4.530612244897959
2665
- },
2666
- {
2667
- "current_steps": 444,
2668
- "loss": 0.9945,
2669
- "learning_rate": 1.2307692307692308e-07,
2670
- "epoch": 4.540816326530612
2671
- },
2672
- {
2673
- "current_steps": 445,
2674
- "loss": 0.9127,
2675
- "learning_rate": 1.205128205128205e-07,
2676
- "epoch": 4.551020408163265
2677
- },
2678
- {
2679
- "current_steps": 446,
2680
- "loss": 0.6726,
2681
- "learning_rate": 1.1794871794871794e-07,
2682
- "epoch": 4.561224489795919
2683
- },
2684
- {
2685
- "current_steps": 447,
2686
- "loss": 1.2726,
2687
- "learning_rate": 1.1538461538461539e-07,
2688
- "epoch": 4.571428571428571
2689
- },
2690
- {
2691
- "current_steps": 448,
2692
- "loss": 0.9189,
2693
- "learning_rate": 1.1282051282051281e-07,
2694
- "epoch": 4.581632653061225
2695
- },
2696
- {
2697
- "current_steps": 449,
2698
- "loss": 0.7115,
2699
- "learning_rate": 1.1025641025641025e-07,
2700
- "epoch": 4.591836734693878
2701
- },
2702
- {
2703
- "current_steps": 450,
2704
- "loss": 0.8172,
2705
- "learning_rate": 1.076923076923077e-07,
2706
- "epoch": 4.6020408163265305
2707
- },
2708
- {
2709
- "current_steps": 451,
2710
- "loss": 0.9128,
2711
- "learning_rate": 1.0512820512820512e-07,
2712
- "epoch": 4.612244897959184
2713
- },
2714
- {
2715
- "current_steps": 452,
2716
- "loss": 0.8185,
2717
- "learning_rate": 1.0256410256410256e-07,
2718
- "epoch": 4.622448979591836
2719
- },
2720
- {
2721
- "current_steps": 453,
2722
- "loss": 0.6656,
2723
- "learning_rate": 1e-07,
2724
- "epoch": 4.63265306122449
2725
- },
2726
- {
2727
- "current_steps": 454,
2728
- "loss": 0.9927,
2729
- "learning_rate": 9.743589743589743e-08,
2730
- "epoch": 4.642857142857143
2731
- },
2732
- {
2733
- "current_steps": 455,
2734
- "loss": 1.038,
2735
- "learning_rate": 9.487179487179487e-08,
2736
- "epoch": 4.653061224489796
2737
- },
2738
- {
2739
- "current_steps": 456,
2740
- "loss": 1.0286,
2741
- "learning_rate": 9.23076923076923e-08,
2742
- "epoch": 4.663265306122449
2743
- },
2744
- {
2745
- "current_steps": 457,
2746
- "loss": 0.9324,
2747
- "learning_rate": 8.974358974358974e-08,
2748
- "epoch": 4.673469387755102
2749
- },
2750
- {
2751
- "current_steps": 458,
2752
- "loss": 1.1796,
2753
- "learning_rate": 8.717948717948718e-08,
2754
- "epoch": 4.683673469387755
2755
- },
2756
- {
2757
- "current_steps": 459,
2758
- "loss": 1.0709,
2759
- "learning_rate": 8.461538461538461e-08,
2760
- "epoch": 4.6938775510204085
2761
- },
2762
- {
2763
- "current_steps": 460,
2764
- "loss": 1.2764,
2765
- "learning_rate": 8.205128205128205e-08,
2766
- "epoch": 4.704081632653061
2767
- },
2768
- {
2769
- "current_steps": 461,
2770
- "loss": 0.9408,
2771
- "learning_rate": 7.948717948717947e-08,
2772
- "epoch": 4.714285714285714
2773
- },
2774
- {
2775
- "current_steps": 462,
2776
- "loss": 1.0265,
2777
- "learning_rate": 7.692307692307692e-08,
2778
- "epoch": 4.724489795918368
2779
- },
2780
- {
2781
- "current_steps": 463,
2782
- "loss": 0.8105,
2783
- "learning_rate": 7.435897435897436e-08,
2784
- "epoch": 4.73469387755102
2785
- },
2786
- {
2787
- "current_steps": 464,
2788
- "loss": 0.8581,
2789
- "learning_rate": 7.179487179487178e-08,
2790
- "epoch": 4.744897959183674
2791
- },
2792
- {
2793
- "current_steps": 465,
2794
- "loss": 1.0193,
2795
- "learning_rate": 6.923076923076923e-08,
2796
- "epoch": 4.755102040816326
2797
- },
2798
- {
2799
- "current_steps": 466,
2800
- "loss": 0.9166,
2801
- "learning_rate": 6.666666666666667e-08,
2802
- "epoch": 4.76530612244898
2803
- },
2804
- {
2805
- "current_steps": 467,
2806
- "loss": 1.0751,
2807
- "learning_rate": 6.410256410256409e-08,
2808
- "epoch": 4.775510204081632
2809
- },
2810
- {
2811
- "current_steps": 468,
2812
- "loss": 1.0601,
2813
- "learning_rate": 6.153846153846154e-08,
2814
- "epoch": 4.785714285714286
2815
- },
2816
- {
2817
- "current_steps": 469,
2818
- "loss": 1.2884,
2819
- "learning_rate": 5.897435897435897e-08,
2820
- "epoch": 4.795918367346939
2821
- },
2822
- {
2823
- "current_steps": 470,
2824
- "loss": 0.9581,
2825
- "learning_rate": 5.641025641025641e-08,
2826
- "epoch": 4.8061224489795915
2827
- },
2828
- {
2829
- "current_steps": 471,
2830
- "loss": 1.0762,
2831
- "learning_rate": 5.384615384615385e-08,
2832
- "epoch": 4.816326530612245
2833
- },
2834
- {
2835
- "current_steps": 472,
2836
- "loss": 0.9197,
2837
- "learning_rate": 5.128205128205128e-08,
2838
- "epoch": 4.826530612244898
2839
- },
2840
- {
2841
- "current_steps": 473,
2842
- "loss": 1.0248,
2843
- "learning_rate": 4.8717948717948716e-08,
2844
- "epoch": 4.836734693877551
2845
- },
2846
- {
2847
- "current_steps": 474,
2848
- "loss": 1.0387,
2849
- "learning_rate": 4.615384615384615e-08,
2850
- "epoch": 4.846938775510204
2851
- },
2852
- {
2853
- "current_steps": 475,
2854
- "loss": 0.9627,
2855
- "learning_rate": 4.358974358974359e-08,
2856
- "epoch": 4.857142857142857
2857
- },
2858
- {
2859
- "current_steps": 476,
2860
- "loss": 0.951,
2861
- "learning_rate": 4.1025641025641025e-08,
2862
- "epoch": 4.86734693877551
2863
- },
2864
- {
2865
- "current_steps": 477,
2866
- "loss": 1.0213,
2867
- "learning_rate": 3.846153846153846e-08,
2868
- "epoch": 4.877551020408164
2869
- },
2870
- {
2871
- "current_steps": 478,
2872
- "loss": 1.4383,
2873
- "learning_rate": 3.589743589743589e-08,
2874
- "epoch": 4.887755102040816
2875
- },
2876
- {
2877
- "current_steps": 479,
2878
- "loss": 1.0652,
2879
- "learning_rate": 3.3333333333333334e-08,
2880
- "epoch": 4.8979591836734695
2881
- },
2882
- {
2883
- "current_steps": 480,
2884
- "loss": 0.9408,
2885
- "learning_rate": 3.076923076923077e-08,
2886
- "epoch": 4.908163265306122
2887
- },
2888
- {
2889
- "current_steps": 481,
2890
- "loss": 1.2111,
2891
- "learning_rate": 2.8205128205128203e-08,
2892
- "epoch": 4.918367346938775
2893
- },
2894
- {
2895
- "current_steps": 482,
2896
- "loss": 0.9162,
2897
- "learning_rate": 2.564102564102564e-08,
2898
- "epoch": 4.928571428571429
2899
- },
2900
- {
2901
- "current_steps": 483,
2902
- "loss": 0.9021,
2903
- "learning_rate": 2.3076923076923076e-08,
2904
- "epoch": 4.938775510204081
2905
- },
2906
- {
2907
- "current_steps": 484,
2908
- "loss": 1.2209,
2909
- "learning_rate": 2.0512820512820512e-08,
2910
- "epoch": 4.948979591836735
2911
- },
2912
- {
2913
- "current_steps": 485,
2914
- "loss": 0.8742,
2915
- "learning_rate": 1.7948717948717946e-08,
2916
- "epoch": 4.959183673469388
2917
- },
2918
- {
2919
- "current_steps": 486,
2920
- "loss": 1.0272,
2921
- "learning_rate": 1.5384615384615385e-08,
2922
- "epoch": 4.969387755102041
2923
- },
2924
- {
2925
- "current_steps": 487,
2926
- "loss": 1.3311,
2927
- "learning_rate": 1.282051282051282e-08,
2928
- "epoch": 4.979591836734694
2929
- },
2930
- {
2931
- "current_steps": 488,
2932
- "loss": 0.7024,
2933
- "learning_rate": 1.0256410256410256e-08,
2934
- "epoch": 4.989795918367347
2935
- },
2936
- {
2937
- "current_steps": 489,
2938
- "loss": 1.0515,
2939
- "learning_rate": 7.692307692307693e-09,
2940
- "epoch": 5.0
2941
- },
2942
- {
2943
- "current_steps": 489,
2944
- "loss": 1.0515,
2945
- "learning_rate": 7.692307692307693e-09,
2946
- "epoch": 5.0
2947
- }
2948
- ]
 
aliceinwonderland-llama3/training_graph.png DELETED
Binary file (90 kB)
 
aliceinwonderland-llama3/training_log.json DELETED
@@ -1,19 +0,0 @@
- {
- "base_model_name": "Meta-Llama-3-8b",
- "base_model_class": "LlamaForCausalLM",
- "base_loaded_in_4bit": true,
- "base_loaded_in_8bit": false,
- "projections": "q, v",
- "loss": 1.0515,
- "grad_norm": 3.200089693069458,
- "learning_rate": 7.692307692307693e-09,
- "epoch": 5.0,
- "current_steps": 489,
- "current_steps_adjusted": 489,
- "epoch_adjusted": 5.0,
- "train_runtime": 779.2785,
- "train_samples_per_second": 2.502,
- "train_steps_per_second": 0.629,
- "total_flos": 2.2519579410432e+16,
- "train_loss": 1.0483642434587284
- }
 
aliceinwonderland-llama3/training_parameters.json DELETED
@@ -1,37 +0,0 @@
- {
- "lora_name": "aliceinwonderland-llama3",
- "always_override": true,
- "save_steps": 0,
- "micro_batch_size": 4,
- "batch_size": 0,
- "epochs": 5,
- "learning_rate": "1e-6",
- "lr_scheduler_type": "linear",
- "lora_rank": 32,
- "lora_alpha": 64,
- "lora_dropout": 0.05,
- "cutoff_len": 256,
- "dataset": "None",
- "eval_dataset": "None",
- "format": "None",
- "eval_steps": 100,
- "raw_text_file": "aliceandwonderland",
- "higher_rank_limit": false,
- "warmup_steps": 100,
- "optimizer": "adamw_torch",
- "hard_cut_string": "\\n\\n\\n",
- "train_only_after": "",
- "stop_at_loss": 0,
- "add_eos_token": false,
- "min_chars": 20,
- "report_to": "None",
- "precize_slicing_overlap": true,
- "add_eos_token_type": "Every Block",
- "save_steps_under_loss": 1.8,
- "add_bos_token": true,
- "training_projection": "q-v",
- "sliding_window": false,
- "warmup_ratio": 0,
- "grad_accumulation": 1,
- "neft_noise_alpha": 0
- }
 
aliceinwonderland-llama3/training_prompt.json DELETED
@@ -1,3 +0,0 @@
- {
- "template_type": "raw_text"
- }