Daniel23Stack committed on
Commit
f175a7c
1 Parent(s): b71cb9c

Delete aliceinwonderland

Files changed (26)
  1. aliceinwonderland/README.md +0 -202
  2. aliceinwonderland/adapter_config.json +0 -27
  3. aliceinwonderland/adapter_model.bin +0 -3
  4. aliceinwonderland/checkpoint-15-loss-1_17/README.md +0 -202
  5. aliceinwonderland/checkpoint-15-loss-1_17/adapter_config.json +0 -27
  6. aliceinwonderland/checkpoint-15-loss-1_17/adapter_model.bin +0 -3
  7. aliceinwonderland/checkpoint-15-loss-1_17/training_log.json +0 -19
  8. aliceinwonderland/checkpoint-15-loss-1_17/training_prompt.json +0 -3
  9. aliceinwonderland/checkpoint-19-loss-0_90/README.md +0 -202
  10. aliceinwonderland/checkpoint-19-loss-0_90/adapter_config.json +0 -27
  11. aliceinwonderland/checkpoint-19-loss-0_90/adapter_model.bin +0 -3
  12. aliceinwonderland/checkpoint-19-loss-0_90/training_log.json +0 -19
  13. aliceinwonderland/checkpoint-19-loss-0_90/training_prompt.json +0 -3
  14. aliceinwonderland/checkpoint-23-loss-0_60/README.md +0 -202
  15. aliceinwonderland/checkpoint-23-loss-0_60/adapter_config.json +0 -27
  16. aliceinwonderland/checkpoint-23-loss-0_60/adapter_model.bin +0 -3
  17. aliceinwonderland/checkpoint-23-loss-0_60/training_log.json +0 -19
  18. aliceinwonderland/checkpoint-23-loss-0_60/training_prompt.json +0 -3
  19. aliceinwonderland/runs/Jun04_00-27-53_DESKTOP-7QRHF82/events.out.tfevents.1717478875.DESKTOP-7QRHF82.5780.0 +0 -3
  20. aliceinwonderland/runs/Jun04_00-32-30_DESKTOP-7QRHF82/events.out.tfevents.1717479151.DESKTOP-7QRHF82.5780.1 +0 -3
  21. aliceinwonderland/runs/Jun04_00-34-12_DESKTOP-7QRHF82/events.out.tfevents.1717479252.DESKTOP-7QRHF82.5780.2 +0 -3
  22. aliceinwonderland/training_graph.json +0 -3368
  23. aliceinwonderland/training_graph.png +0 -0
  24. aliceinwonderland/training_log.json +0 -19
  25. aliceinwonderland/training_parameters.json +0 -37
  26. aliceinwonderland/training_prompt.json +0 -3
aliceinwonderland/README.md DELETED
@@ -1,202 +0,0 @@
- ---
- library_name: peft
- base_model: models\Llama-2-13b-hf
- ---
-
- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->
-
-
-
- ## Model Details
-
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
-
-
- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
-
- ## Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ### Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
-
- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- [More Information Needed]
-
- ## Training Details
-
- ### Training Data
-
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
-
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]
-
-
- #### Training Hyperparameters
-
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
-
-
-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
- ## Environmental Impact
-
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]
-
- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]
-
- ## Model Card Contact
-
- [More Information Needed]
- ### Framework versions
-
- - PEFT 0.8.2
aliceinwonderland/adapter_config.json DELETED
@@ -1,27 +0,0 @@
- {
- "alpha_pattern": {},
- "auto_mapping": null,
- "base_model_name_or_path": "models\\Llama-2-13b-hf",
- "bias": "none",
- "fan_in_fan_out": false,
- "inference_mode": true,
- "init_lora_weights": true,
- "layers_pattern": null,
- "layers_to_transform": null,
- "loftq_config": {},
- "lora_alpha": 64,
- "lora_dropout": 0.05,
- "megatron_config": null,
- "megatron_core": "megatron.core",
- "modules_to_save": null,
- "peft_type": "LORA",
- "r": 32,
- "rank_pattern": {},
- "revision": null,
- "target_modules": [
- "q_proj",
- "v_proj"
- ],
- "task_type": "CAUSAL_LM",
- "use_rslora": false
- }
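For orientation, the deleted `adapter_config.json` describes a rank-32 LoRA adapter on the `q_proj` and `v_proj` modules. The sketch below is not part of the commit; it only illustrates how the `lora_alpha`, `r`, and `use_rslora` fields combine into the scaling factor applied to the low-rank update, using the usual LoRA convention (`lora_alpha / r`, or `lora_alpha / sqrt(r)` when rank-stabilized LoRA is enabled). The helper name `lora_scaling` is hypothetical.

```python
import json
import math

# Subset of fields copied from the deleted adapter_config.json.
ADAPTER_CONFIG = json.loads("""
{
  "lora_alpha": 64,
  "lora_dropout": 0.05,
  "r": 32,
  "target_modules": ["q_proj", "v_proj"],
  "use_rslora": false
}
""")


def lora_scaling(config: dict) -> float:
    """Return the factor that multiplies the low-rank update B @ A.

    Hypothetical helper for illustration: standard LoRA uses alpha / r,
    rank-stabilized LoRA uses alpha / sqrt(r).
    """
    rank = config["r"]
    if config.get("use_rslora", False):
        return config["lora_alpha"] / math.sqrt(rank)
    return config["lora_alpha"] / rank


scaling = lora_scaling(ADAPTER_CONFIG)
print(f"targets={ADAPTER_CONFIG['target_modules']} scaling={scaling}")
```

With the values in this config (`lora_alpha` 64, `r` 32, `use_rslora` false) the factor works out to 2.0.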
aliceinwonderland/adapter_model.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:8ec95cd40d9469ab7d56f444483c09b8727f0c834258363b46306bb4387fd5dd
- size 104915722
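The three deleted lines are a Git LFS pointer, not the weights themselves: the actual `adapter_model.bin` lives in LFS storage and is identified by the `oid` hash and `size` in bytes. As a rough sanity check, the sketch below parses the pointer and compares the recorded size with what a rank-32 adapter on `q_proj` and `v_proj` would occupy. The hidden size (5120), layer count (40), and fp32 storage are assumptions about Llama-2-13B and the save format; none of them is stated in this diff.

```python
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:8ec95cd40d9469ab7d56f444483c09b8727f0c834258363b46306bb4387fd5dd
size 104915722
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])
    return fields


pointer = parse_lfs_pointer(POINTER_TEXT)

# Assumptions (not from the diff): Llama-2-13B dimensions and fp32 weights.
HIDDEN_SIZE = 5120
NUM_LAYERS = 40
BYTES_PER_PARAM = 4
# From the adapter config in this commit.
RANK = 32
TARGETS_PER_LAYER = 2  # q_proj and v_proj

# Each adapted projection adds A (rank x in) and B (out x rank).
lora_params = NUM_LAYERS * TARGETS_PER_LAYER * RANK * (HIDDEN_SIZE + HIDDEN_SIZE)
estimated_bytes = lora_params * BYTES_PER_PARAM
remainder = pointer["size"] - estimated_bytes

print(f"recorded size:   {pointer['size']} bytes")
print(f"estimated LoRA:  {estimated_bytes} bytes ({lora_params} parameters)")
print(f"unexplained:     {remainder} bytes")
```

Under those assumptions the estimate is 26,214,400 parameters, or 104,857,600 bytes, which leaves 58,122 bytes of the recorded size unaccounted for; serialization overhead is one plausible explanation, but the diff does not confirm it.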
aliceinwonderland/checkpoint-15-loss-1_17/README.md DELETED
@@ -1,202 +0,0 @@
- ---
- library_name: peft
- base_model: models\Llama-2-13b-hf
- ---
-
- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->
-
-
-
- ## Model Details
-
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
-
-
- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
-
- ## Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ### Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
-
- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- [More Information Needed]
-
- ## Training Details
-
- ### Training Data
-
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
-
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]
-
-
- #### Training Hyperparameters
-
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
-
-
-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
- ## Environmental Impact
-
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]
-
- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]
-
- ## Model Card Contact
-
- [More Information Needed]
- ### Framework versions
-
- - PEFT 0.8.2
aliceinwonderland/checkpoint-15-loss-1_17/adapter_config.json DELETED
@@ -1,27 +0,0 @@
- {
- "alpha_pattern": {},
- "auto_mapping": null,
- "base_model_name_or_path": "models\\Llama-2-13b-hf",
- "bias": "none",
- "fan_in_fan_out": false,
- "inference_mode": true,
- "init_lora_weights": true,
- "layers_pattern": null,
- "layers_to_transform": null,
- "loftq_config": {},
- "lora_alpha": 64,
- "lora_dropout": 0.05,
- "megatron_config": null,
- "megatron_core": "megatron.core",
- "modules_to_save": null,
- "peft_type": "LORA",
- "r": 32,
- "rank_pattern": {},
- "revision": null,
- "target_modules": [
- "q_proj",
- "v_proj"
- ],
- "task_type": "CAUSAL_LM",
- "use_rslora": false
- }
aliceinwonderland/checkpoint-15-loss-1_17/adapter_model.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:67856ecab2080beca8aca0a5e47b291f0805c418a022b4b2a97bb80d7f901ee7
- size 104915722
aliceinwonderland/checkpoint-15-loss-1_17/training_log.json DELETED
@@ -1,19 +0,0 @@
- {
- "base_model_name": "Llama-2-13b-hf",
- "base_model_class": "LlamaForCausalLM",
- "base_loaded_in_4bit": true,
- "base_loaded_in_8bit": false,
- "projections": "q, v",
- "loss": 1.1716,
- "grad_norm": 1.0258234739303589,
- "learning_rate": 1.3e-07,
- "epoch": 0.13392857142857142,
- "current_steps": 14,
- "current_steps_adjusted": 14,
- "epoch_adjusted": 0.13392857142857142,
- "train_runtime": 60.8524,
- "train_steps_per_second": 1.841,
- "train_samples_per_second": 7.313,
- "total_flos": 1819849670000640.0,
- "train_loss": 0.7478187213773313
- }
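The fields in this deleted log can be cross-checked with a little arithmetic. The sketch below is an illustration only: it assumes the optimizer step of the checkpoint is the number in the directory name `checkpoint-15-loss-1_17` (that is, `current_steps + 1`), and that `train_runtime` and `train_steps_per_second` describe the whole run rather than this checkpoint. Neither assumption is confirmed by the diff.

```python
# Values copied from the deleted checkpoint-15-loss-1_17/training_log.json.
LOG = {
    "epoch": 0.13392857142857142,
    "current_steps": 14,
    "train_runtime": 60.8524,          # seconds
    "train_steps_per_second": 1.841,
}

# Assumption: the step number in "checkpoint-15-loss-1_17" is current_steps + 1.
checkpoint_step = LOG["current_steps"] + 1

# If `epoch` is the fraction of one epoch completed at that step,
# dividing gives the number of optimizer steps in a full epoch.
steps_per_epoch = checkpoint_step / LOG["epoch"]

# Independent estimate of the steps in the run: runtime times throughput.
steps_from_throughput = LOG["train_runtime"] * LOG["train_steps_per_second"]

print(f"steps per epoch (from epoch fraction): {steps_per_epoch:.1f}")
print(f"steps in run (from runtime x rate):    {steps_from_throughput:.1f}")
```

Both estimates land near 112 steps, which would be consistent with a run of roughly one epoch, though that reading depends on the assumptions stated above.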
aliceinwonderland/checkpoint-15-loss-1_17/training_prompt.json DELETED
@@ -1,3 +0,0 @@
- {
- "template_type": "raw_text"
- }
aliceinwonderland/checkpoint-19-loss-0_90/README.md DELETED
@@ -1,202 +0,0 @@
- ---
- library_name: peft
- base_model: models\Llama-2-13b-hf
- ---
-
- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->
-
-
-
- ## Model Details
-
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
-
-
- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
-
- ## Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ### Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
-
- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- [More Information Needed]
-
- ## Training Details
-
- ### Training Data
-
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
-
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]
-
-
- #### Training Hyperparameters
-
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
-
-
-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
- ## Environmental Impact
-
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]
-
- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]
-
- ## Model Card Contact
-
- [More Information Needed]
- ### Framework versions
-
- - PEFT 0.8.2
aliceinwonderland/checkpoint-19-loss-0_90/adapter_config.json DELETED
@@ -1,27 +0,0 @@
- {
- "alpha_pattern": {},
- "auto_mapping": null,
- "base_model_name_or_path": "models\\Llama-2-13b-hf",
- "bias": "none",
- "fan_in_fan_out": false,
- "inference_mode": true,
- "init_lora_weights": true,
- "layers_pattern": null,
- "layers_to_transform": null,
- "loftq_config": {},
- "lora_alpha": 64,
- "lora_dropout": 0.05,
- "megatron_config": null,
- "megatron_core": "megatron.core",
- "modules_to_save": null,
- "peft_type": "LORA",
- "r": 32,
- "rank_pattern": {},
- "revision": null,
- "target_modules": [
- "q_proj",
- "v_proj"
- ],
- "task_type": "CAUSAL_LM",
- "use_rslora": false
- }
aliceinwonderland/checkpoint-19-loss-0_90/adapter_model.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:33728eded59d53993902b4320dd496b739df642eebfd71df1f2fded242f1cf5e
- size 104915722
aliceinwonderland/checkpoint-19-loss-0_90/training_log.json DELETED
@@ -1,19 +0,0 @@
- {
- "base_model_name": "Llama-2-13b-hf",
- "base_model_class": "LlamaForCausalLM",
- "base_loaded_in_4bit": true,
- "base_loaded_in_8bit": false,
- "projections": "q, v",
- "loss": 0.9004,
- "grad_norm": 0.8880526423454285,
- "learning_rate": 1.7000000000000001e-07,
- "epoch": 0.16964285714285715,
- "current_steps": 18,
- "current_steps_adjusted": 18,
- "epoch_adjusted": 0.16964285714285715,
- "train_runtime": 60.8524,
- "train_samples_per_second": 7.313,
- "train_steps_per_second": 1.841,
- "total_flos": 1819849670000640.0,
- "train_loss": 0.7478187213773313
- }
aliceinwonderland/checkpoint-19-loss-0_90/training_prompt.json DELETED
@@ -1,3 +0,0 @@
- {
- "template_type": "raw_text"
- }
aliceinwonderland/checkpoint-23-loss-0_60/README.md DELETED
@@ -1,202 +0,0 @@
- ---
- library_name: peft
- base_model: models\Llama-2-13b-hf
- ---
-
- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->
-
-
-
- ## Model Details
-
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
-
-
- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
-
- ## Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ### Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
-
- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- [More Information Needed]
-
- ## Training Details
-
- ### Training Data
-
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
-
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]
-
-
- #### Training Hyperparameters
-
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
-
-
-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
- ## Environmental Impact
-
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]
-
- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]
-
- ## Model Card Contact
-
- [More Information Needed]
- ### Framework versions
-
- PEFT 0.8.2
aliceinwonderland/checkpoint-23-loss-0_60/adapter_config.json DELETED
@@ -1,27 +0,0 @@
1
- {
2
- "alpha_pattern": {},
3
- "auto_mapping": null,
4
- "base_model_name_or_path": "models\\Llama-2-13b-hf",
5
- "bias": "none",
6
- "fan_in_fan_out": false,
7
- "inference_mode": true,
8
- "init_lora_weights": true,
9
- "layers_pattern": null,
10
- "layers_to_transform": null,
11
- "loftq_config": {},
12
- "lora_alpha": 64,
13
- "lora_dropout": 0.05,
14
- "megatron_config": null,
15
- "megatron_core": "megatron.core",
16
- "modules_to_save": null,
17
- "peft_type": "LORA",
18
- "r": 32,
19
- "rank_pattern": {},
20
- "revision": null,
21
- "target_modules": [
22
- "q_proj",
23
- "v_proj"
24
- ],
25
- "task_type": "CAUSAL_LM",
26
- "use_rslora": false
27
- }
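A rough sanity check on this adapter config, as a sketch rather than anything taken from the repository: with "use_rslora": false the LoRA update is scaled by lora_alpha / r, and each adapted projection adds r * (d_in + d_out) weights. The hidden size (5120) and layer count (40) below are assumptions about Llama-2-13b and do not appear in this diff. The fp32 byte total lands close to the 104915722-byte adapter_model.bin pointer in the same checkpoint, which is consistent with (but does not prove) fp32 storage.

```python
# Values copied from adapter_config.json in this diff.
LORA_R = 32
LORA_ALPHA = 64
TARGET_MODULES = ["q_proj", "v_proj"]

# Assumptions (NOT in the diff): Llama-2-13b has hidden size 5120 and
# 40 decoder layers, and q_proj / v_proj are square 5120 x 5120 projections.
HIDDEN_SIZE = 5120
NUM_LAYERS = 40


def lora_scaling(alpha: int, r: int) -> float:
    """Standard LoRA scaling factor (applies when use_rslora is false)."""
    return alpha / r


def lora_param_count(r: int, d_in: int, d_out: int,
                     n_modules: int, n_layers: int) -> int:
    """Each adapted module adds A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out) * n_modules * n_layers


scaling = lora_scaling(LORA_ALPHA, LORA_R)
params = lora_param_count(LORA_R, HIDDEN_SIZE, HIDDEN_SIZE,
                          len(TARGET_MODULES), NUM_LAYERS)
fp32_bytes = params * 4  # 4 bytes per float32 weight

print(f"scaling={scaling}, trainable params={params}, fp32 bytes={fp32_bytes}")
```

The small gap between the fp32 estimate and the recorded file size would be serialization overhead if the assumptions hold.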
aliceinwonderland/checkpoint-23-loss-0_60/adapter_model.bin DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:bfb41f5ef71b08818902ba0f659dca42b6e428c3fca8dc04cccd89f096be730b
3
- size 104915722
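The three deleted lines above are a Git LFS pointer, not the weights themselves: a spec version, an object id (hash algorithm plus digest), and the size in bytes of the real file. A minimal sketch of reading such a pointer follows; parse_lfs_pointer is a hypothetical helper written for illustration, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file.

    Hypothetical helper for illustration; it handles only the three
    fields that appear in this diff (version, oid, size).
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value

    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer content copied from adapter_model.bin in this diff.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:bfb41f5ef71b08818902ba0f659dca42b6e428c3fca8dc04cccd89f096be730b
size 104915722
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["size"])
```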
aliceinwonderland/checkpoint-23-loss-0_60/training_log.json DELETED
@@ -1,19 +0,0 @@
1
- {
2
- "base_model_name": "Llama-2-13b-hf",
3
- "base_model_class": "LlamaForCausalLM",
4
- "base_loaded_in_4bit": true,
5
- "base_loaded_in_8bit": false,
6
- "projections": "q, v",
7
- "loss": 0.6049,
8
- "grad_norm": 1.030413269996643,
9
- "learning_rate": 2.0999999999999997e-07,
10
- "epoch": 0.20535714285714285,
11
- "current_steps": 22,
12
- "current_steps_adjusted": 22,
13
- "epoch_adjusted": 0.20535714285714285,
14
- "train_runtime": 60.8524,
15
- "train_samples_per_second": 7.313,
16
- "train_steps_per_second": 1.841,
17
- "total_flos": 1819849670000640.0,
18
- "train_loss": 0.7478187213773313
19
- }
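The "epoch" and "current_steps" fields in this log appear to be related by epoch = (current_steps + 1) / steps_per_epoch. The value of 112 steps per epoch used below is inferred from the logged numbers (0.20535714285714285 is 23/112) and is not stated anywhere in the deleted files, so treat it as an assumption. Under the same reading, the directory name checkpoint-23 would correspond to the one-based step count.

```python
import math

# Inferred, NOT stated in the files: 23 / 0.20535714285714285 == 112.
STEPS_PER_EPOCH = 112


def epoch_for_step(current_steps: int,
                   steps_per_epoch: int = STEPS_PER_EPOCH) -> float:
    """Fractional epoch after finishing a zero-based step index."""
    return (current_steps + 1) / steps_per_epoch


# Values copied from training_log.json in this diff.
logged = {"current_steps": 22, "epoch": 0.20535714285714285, "loss": 0.6049}

expected_epoch = epoch_for_step(logged["current_steps"])
matches = math.isclose(expected_epoch, logged["epoch"], rel_tol=1e-12)
print(f"expected epoch {expected_epoch:.6f}, logged {logged['epoch']:.6f}, match={matches}")
```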
aliceinwonderland/checkpoint-23-loss-0_60/training_prompt.json DELETED
@@ -1,3 +0,0 @@
1
- {
2
- "template_type": "raw_text"
3
- }
aliceinwonderland/runs/Jun04_00-27-53_DESKTOP-7QRHF82/events.out.tfevents.1717478875.DESKTOP-7QRHF82.5780.0 DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:4d9c48c3ff261ea97428f58f99fdcc72538405b49e4ca84c9eb991bd5963f9b3
3
- size 10741
aliceinwonderland/runs/Jun04_00-32-30_DESKTOP-7QRHF82/events.out.tfevents.1717479151.DESKTOP-7QRHF82.5780.1 DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:8acbbef9133cbacd0f4cc53e3cd9da98fe8510f03648ccc3a82c169bf8daa6c2
3
- size 10326
aliceinwonderland/runs/Jun04_00-34-12_DESKTOP-7QRHF82/events.out.tfevents.1717479252.DESKTOP-7QRHF82.5780.2 DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:7a194065673c018cd47b84d86a22c61ccebe7dc95cda5512cd1658d56cec6938
3
- size 123223
aliceinwonderland/training_graph.json DELETED
@@ -1,3368 +0,0 @@
1
- [
2
- {
3
- "current_steps": 0,
4
- "loss": 0.6046,
5
- "learning_rate": 1e-08,
6
- "epoch": 0.008928571428571428
7
- },
8
- {
9
- "current_steps": 1,
10
- "loss": 0.6431,
11
- "learning_rate": 2e-08,
12
- "epoch": 0.017857142857142856
13
- },
14
- {
15
- "current_steps": 2,
16
- "loss": 0.6447,
17
- "learning_rate": 3e-08,
18
- "epoch": 0.026785714285714284
19
- },
20
- {
21
- "current_steps": 3,
22
- "loss": 0.7972,
23
- "learning_rate": 4e-08,
24
- "epoch": 0.03571428571428571
25
- },
26
- {
27
- "current_steps": 4,
28
- "loss": 0.6911,
29
- "learning_rate": 5e-08,
30
- "epoch": 0.044642857142857144
31
- },
32
- {
33
- "current_steps": 5,
34
- "loss": 0.8546,
35
- "learning_rate": 6e-08,
36
- "epoch": 0.05357142857142857
37
- },
38
- {
39
- "current_steps": 6,
40
- "loss": 0.7624,
41
- "learning_rate": 6e-08,
42
- "epoch": 0.0625
43
- },
44
- {
45
- "current_steps": 7,
46
- "loss": 0.6565,
47
- "learning_rate": 7e-08,
48
- "epoch": 0.07142857142857142
49
- },
50
- {
51
- "current_steps": 8,
52
- "loss": 0.6789,
53
- "learning_rate": 8e-08,
54
- "epoch": 0.08035714285714286
55
- },
56
- {
57
- "current_steps": 9,
58
- "loss": 0.8562,
59
- "learning_rate": 8e-08,
60
- "epoch": 0.08928571428571429
61
- },
62
- {
63
- "current_steps": 10,
64
- "loss": 0.8084,
65
- "learning_rate": 9e-08,
66
- "epoch": 0.09821428571428571
67
- },
68
- {
69
- "current_steps": 11,
70
- "loss": 0.7024,
71
- "learning_rate": 1e-07,
72
- "epoch": 0.10714285714285714
73
- },
74
- {
75
- "current_steps": 12,
76
- "loss": 0.7454,
77
- "learning_rate": 1.0999999999999999e-07,
78
- "epoch": 0.11607142857142858
79
- },
80
- {
81
- "current_steps": 13,
82
- "loss": 0.5896,
83
- "learning_rate": 1.2e-07,
84
- "epoch": 0.125
85
- },
86
- {
87
- "current_steps": 14,
88
- "loss": 1.1716,
89
- "learning_rate": 1.3e-07,
90
- "epoch": 0.13392857142857142
91
- },
92
- {
93
- "current_steps": 15,
94
- "loss": 0.8561,
95
- "learning_rate": 1.4e-07,
96
- "epoch": 0.14285714285714285
97
- },
98
- {
99
- "current_steps": 16,
100
- "loss": 0.9048,
101
- "learning_rate": 1.5e-07,
102
- "epoch": 0.15178571428571427
103
- },
104
- {
105
- "current_steps": 17,
106
- "loss": 0.6079,
107
- "learning_rate": 1.6e-07,
108
- "epoch": 0.16071428571428573
109
- },
110
- {
111
- "current_steps": 18,
112
- "loss": 0.9004,
113
- "learning_rate": 1.7000000000000001e-07,
114
- "epoch": 0.16964285714285715
115
- },
116
- {
117
- "current_steps": 19,
118
- "loss": 0.5512,
119
- "learning_rate": 1.8e-07,
120
- "epoch": 0.17857142857142858
121
- },
122
- {
123
- "current_steps": 20,
124
- "loss": 0.7782,
125
- "learning_rate": 1.8999999999999998e-07,
126
- "epoch": 0.1875
127
- },
128
- {
129
- "current_steps": 21,
130
- "loss": 0.7905,
131
- "learning_rate": 2e-07,
132
- "epoch": 0.19642857142857142
133
- },
134
- {
135
- "current_steps": 22,
136
- "loss": 0.6049,
137
- "learning_rate": 2.0999999999999997e-07,
138
- "epoch": 0.20535714285714285
139
- },
140
- {
141
- "current_steps": 23,
142
- "loss": 0.685,
143
- "learning_rate": 2.1999999999999998e-07,
144
- "epoch": 0.21428571428571427
145
- },
146
- {
147
- "current_steps": 24,
148
- "loss": 0.8171,
149
- "learning_rate": 2.3e-07,
150
- "epoch": 0.22321428571428573
151
- },
152
- {
153
- "current_steps": 25,
154
- "loss": 0.8018,
155
- "learning_rate": 2.4e-07,
156
- "epoch": 0.23214285714285715
157
- },
158
- {
159
- "current_steps": 26,
160
- "loss": 0.4959,
161
- "learning_rate": 2.5e-07,
162
- "epoch": 0.24107142857142858
163
- },
164
- {
165
- "current_steps": 27,
166
- "loss": 0.6348,
167
- "learning_rate": 2.6e-07,
168
- "epoch": 0.25
169
- },
170
- {
171
- "current_steps": 28,
172
- "loss": 0.8005,
173
- "learning_rate": 2.7e-07,
174
- "epoch": 0.25892857142857145
175
- },
176
- {
177
- "current_steps": 29,
178
- "loss": 0.6777,
179
- "learning_rate": 2.8e-07,
180
- "epoch": 0.26785714285714285
181
- },
182
- {
183
- "current_steps": 30,
184
- "loss": 0.9042,
185
- "learning_rate": 2.9e-07,
186
- "epoch": 0.2767857142857143
187
- },
188
- {
189
- "current_steps": 31,
190
- "loss": 0.6491,
191
- "learning_rate": 3e-07,
192
- "epoch": 0.2857142857142857
193
- },
194
- {
195
- "current_steps": 32,
196
- "loss": 1.0966,
197
- "learning_rate": 3.1e-07,
198
- "epoch": 0.29464285714285715
199
- },
200
- {
201
- "current_steps": 33,
202
- "loss": 0.7451,
203
- "learning_rate": 3.2e-07,
204
- "epoch": 0.30357142857142855
205
- },
206
- {
207
- "current_steps": 34,
208
- "loss": 1.1446,
209
- "learning_rate": 3.2e-07,
210
- "epoch": 0.3125
211
- },
212
- {
213
- "current_steps": 35,
214
- "loss": 0.7644,
215
- "learning_rate": 3.3e-07,
216
- "epoch": 0.32142857142857145
217
- },
218
- {
219
- "current_steps": 36,
220
- "loss": 0.7742,
221
- "learning_rate": 3.4000000000000003e-07,
222
- "epoch": 0.33035714285714285
223
- },
224
- {
225
- "current_steps": 37,
226
- "loss": 0.8247,
227
- "learning_rate": 3.5e-07,
228
- "epoch": 0.3392857142857143
229
- },
230
- {
231
- "current_steps": 38,
232
- "loss": 0.8667,
233
- "learning_rate": 3.6e-07,
234
- "epoch": 0.3482142857142857
235
- },
236
- {
237
- "current_steps": 39,
238
- "loss": 0.8309,
239
- "learning_rate": 3.7e-07,
240
- "epoch": 0.35714285714285715
241
- },
242
- {
243
- "current_steps": 40,
244
- "loss": 0.5913,
245
- "learning_rate": 3.7999999999999996e-07,
246
- "epoch": 0.36607142857142855
247
- },
248
- {
249
- "current_steps": 41,
250
- "loss": 0.5562,
251
- "learning_rate": 3.8999999999999997e-07,
252
- "epoch": 0.375
253
- },
254
- {
255
- "current_steps": 42,
256
- "loss": 1.6276,
257
- "learning_rate": 4e-07,
258
- "epoch": 0.38392857142857145
259
- },
260
- {
261
- "current_steps": 43,
262
- "loss": 0.682,
263
- "learning_rate": 4.0999999999999994e-07,
264
- "epoch": 0.39285714285714285
265
- },
266
- {
267
- "current_steps": 44,
268
- "loss": 0.8022,
269
- "learning_rate": 4.1999999999999995e-07,
270
- "epoch": 0.4017857142857143
271
- },
272
- {
273
- "current_steps": 45,
274
- "loss": 0.6702,
275
- "learning_rate": 4.2999999999999996e-07,
276
- "epoch": 0.4107142857142857
277
- },
278
- {
279
- "current_steps": 46,
280
- "loss": 0.6993,
281
- "learning_rate": 4.3999999999999997e-07,
282
- "epoch": 0.41964285714285715
283
- },
284
- {
285
- "current_steps": 47,
286
- "loss": 0.9685,
287
- "learning_rate": 4.5e-07,
288
- "epoch": 0.42857142857142855
289
- },
290
- {
291
- "current_steps": 48,
292
- "loss": 0.6637,
293
- "learning_rate": 4.6e-07,
294
- "epoch": 0.4375
295
- },
296
- {
297
- "current_steps": 49,
298
- "loss": 0.908,
299
- "learning_rate": 4.6999999999999995e-07,
300
- "epoch": 0.44642857142857145
301
- },
302
- {
303
- "current_steps": 50,
304
- "loss": 0.8683,
305
- "learning_rate": 4.8e-07,
306
- "epoch": 0.45535714285714285
307
- },
308
- {
309
- "current_steps": 51,
310
- "loss": 0.9243,
311
- "learning_rate": 4.9e-07,
312
- "epoch": 0.4642857142857143
313
- },
314
- {
315
- "current_steps": 52,
316
- "loss": 0.7933,
317
- "learning_rate": 5e-07,
318
- "epoch": 0.4732142857142857
319
- },
320
- {
321
- "current_steps": 53,
322
- "loss": 0.5856,
323
- "learning_rate": 5.1e-07,
324
- "epoch": 0.48214285714285715
325
- },
326
- {
327
- "current_steps": 54,
328
- "loss": 0.7097,
329
- "learning_rate": 5.2e-07,
330
- "epoch": 0.49107142857142855
331
- },
332
- {
333
- "current_steps": 55,
334
- "loss": 0.6476,
335
- "learning_rate": 5.3e-07,
336
- "epoch": 0.5
337
- },
338
- {
339
- "current_steps": 56,
340
- "loss": 0.8212,
341
- "learning_rate": 5.4e-07,
342
- "epoch": 0.5089285714285714
343
- },
344
- {
345
- "current_steps": 57,
346
- "loss": 0.7932,
347
- "learning_rate": 5.5e-07,
348
- "epoch": 0.5178571428571429
349
- },
350
- {
351
- "current_steps": 58,
352
- "loss": 0.8155,
353
- "learning_rate": 5.6e-07,
354
- "epoch": 0.5267857142857143
355
- },
356
- {
357
- "current_steps": 59,
358
- "loss": 0.5644,
359
- "learning_rate": 5.699999999999999e-07,
360
- "epoch": 0.5357142857142857
361
- },
362
- {
363
- "current_steps": 60,
364
- "loss": 0.8935,
365
- "learning_rate": 5.8e-07,
366
- "epoch": 0.5446428571428571
367
- },
368
- {
369
- "current_steps": 61,
370
- "loss": 0.6935,
371
- "learning_rate": 5.9e-07,
372
- "epoch": 0.5535714285714286
373
- },
374
- {
375
- "current_steps": 62,
376
- "loss": 0.6186,
377
- "learning_rate": 6e-07,
378
- "epoch": 0.5625
379
- },
380
- {
381
- "current_steps": 63,
382
- "loss": 0.7528,
383
- "learning_rate": 6.1e-07,
384
- "epoch": 0.5714285714285714
385
- },
386
- {
387
- "current_steps": 64,
388
- "loss": 0.7043,
389
- "learning_rate": 6.2e-07,
390
- "epoch": 0.5803571428571429
391
- },
392
- {
393
- "current_steps": 65,
394
- "loss": 0.5926,
395
- "learning_rate": 6.3e-07,
396
- "epoch": 0.5892857142857143
397
- },
398
- {
399
- "current_steps": 66,
400
- "loss": 0.7927,
401
- "learning_rate": 6.4e-07,
402
- "epoch": 0.5982142857142857
403
- },
404
- {
405
- "current_steps": 67,
406
- "loss": 0.5625,
407
- "learning_rate": 6.5e-07,
408
- "epoch": 0.6071428571428571
409
- },
410
- {
411
- "current_steps": 68,
412
- "loss": 0.707,
413
- "learning_rate": 6.6e-07,
414
- "epoch": 0.6160714285714286
415
- },
416
- {
417
- "current_steps": 69,
418
- "loss": 0.7023,
419
- "learning_rate": 6.7e-07,
420
- "epoch": 0.625
421
- },
422
- {
423
- "current_steps": 70,
424
- "loss": 0.586,
425
- "learning_rate": 6.800000000000001e-07,
426
- "epoch": 0.6339285714285714
427
- },
428
- {
429
- "current_steps": 71,
430
- "loss": 0.5741,
431
- "learning_rate": 6.9e-07,
432
- "epoch": 0.6428571428571429
433
- },
434
- {
435
- "current_steps": 72,
436
- "loss": 1.086,
437
- "learning_rate": 7e-07,
438
- "epoch": 0.6517857142857143
439
- },
440
- {
441
- "current_steps": 73,
442
- "loss": 0.6381,
443
- "learning_rate": 7.1e-07,
444
- "epoch": 0.6607142857142857
445
- },
446
- {
447
- "current_steps": 74,
448
- "loss": 0.7509,
449
- "learning_rate": 7.2e-07,
450
- "epoch": 0.6696428571428571
451
- },
452
- {
453
- "current_steps": 75,
454
- "loss": 0.8276,
455
- "learning_rate": 7.3e-07,
456
- "epoch": 0.6785714285714286
457
- },
458
- {
459
- "current_steps": 76,
460
- "loss": 0.7623,
461
- "learning_rate": 7.4e-07,
462
- "epoch": 0.6875
463
- },
464
- {
465
- "current_steps": 77,
466
- "loss": 0.9499,
467
- "learning_rate": 7.5e-07,
468
- "epoch": 0.6964285714285714
469
- },
470
- {
471
- "current_steps": 78,
472
- "loss": 0.8563,
473
- "learning_rate": 7.599999999999999e-07,
474
- "epoch": 0.7053571428571429
475
- },
476
- {
477
- "current_steps": 79,
478
- "loss": 0.6512,
479
- "learning_rate": 7.699999999999999e-07,
480
- "epoch": 0.7142857142857143
481
- },
482
- {
483
- "current_steps": 80,
484
- "loss": 0.843,
485
- "learning_rate": 7.799999999999999e-07,
486
- "epoch": 0.7232142857142857
487
- },
488
- {
489
- "current_steps": 81,
490
- "loss": 0.7272,
491
- "learning_rate": 7.9e-07,
492
- "epoch": 0.7321428571428571
493
- },
494
- {
495
- "current_steps": 82,
496
- "loss": 0.5161,
497
- "learning_rate": 8e-07,
498
- "epoch": 0.7410714285714286
499
- },
500
- {
501
- "current_steps": 83,
502
- "loss": 0.8293,
503
- "learning_rate": 8.1e-07,
504
- "epoch": 0.75
505
- },
506
- {
507
- "current_steps": 84,
508
- "loss": 0.8704,
509
- "learning_rate": 8.199999999999999e-07,
510
- "epoch": 0.7589285714285714
511
- },
512
- {
513
- "current_steps": 85,
514
- "loss": 0.7255,
515
- "learning_rate": 8.299999999999999e-07,
516
- "epoch": 0.7678571428571429
517
- },
518
- {
519
- "current_steps": 86,
520
- "loss": 0.6252,
521
- "learning_rate": 8.399999999999999e-07,
522
- "epoch": 0.7767857142857143
523
- },
524
- {
525
- "current_steps": 87,
526
- "loss": 0.8116,
527
- "learning_rate": 8.499999999999999e-07,
528
- "epoch": 0.7857142857142857
529
- },
530
- {
531
- "current_steps": 88,
532
- "loss": 0.7703,
533
- "learning_rate": 8.599999999999999e-07,
534
- "epoch": 0.7946428571428571
535
- },
536
- {
537
- "current_steps": 89,
538
- "loss": 0.6496,
539
- "learning_rate": 8.699999999999999e-07,
540
- "epoch": 0.8035714285714286
541
- },
542
- {
543
- "current_steps": 90,
544
- "loss": 0.8585,
545
- "learning_rate": 8.799999999999999e-07,
546
- "epoch": 0.8125
547
- },
548
- {
549
- "current_steps": 91,
550
- "loss": 0.905,
551
- "learning_rate": 8.9e-07,
552
- "epoch": 0.8214285714285714
553
- },
554
- {
555
- "current_steps": 92,
556
- "loss": 0.9139,
557
- "learning_rate": 9e-07,
558
- "epoch": 0.8303571428571429
559
- },
560
- {
561
- "current_steps": 93,
562
- "loss": 0.9925,
563
- "learning_rate": 9.1e-07,
564
- "epoch": 0.8392857142857143
565
- },
566
- {
567
- "current_steps": 94,
568
- "loss": 0.7344,
569
- "learning_rate": 9.2e-07,
570
- "epoch": 0.8482142857142857
571
- },
572
- {
573
- "current_steps": 95,
574
- "loss": 0.7477,
575
- "learning_rate": 9.3e-07,
576
- "epoch": 0.8571428571428571
577
- },
578
- {
579
- "current_steps": 96,
580
- "loss": 0.671,
581
- "learning_rate": 9.399999999999999e-07,
582
- "epoch": 0.8660714285714286
583
- },
584
- {
585
- "current_steps": 97,
586
- "loss": 0.9654,
587
- "learning_rate": 9.499999999999999e-07,
588
- "epoch": 0.875
589
- },
590
- {
591
- "current_steps": 98,
592
- "loss": 0.6788,
593
- "learning_rate": 9.6e-07,
594
- "epoch": 0.8839285714285714
595
- },
596
- {
597
- "current_steps": 99,
598
- "loss": 0.764,
599
- "learning_rate": 9.7e-07,
600
- "epoch": 0.8928571428571429
601
- },
602
- {
603
- "current_steps": 100,
604
- "loss": 0.7536,
605
- "learning_rate": 9.8e-07,
606
- "epoch": 0.9017857142857143
607
- },
608
- {
609
- "current_steps": 101,
610
- "loss": 0.6409,
611
- "learning_rate": 9.9e-07,
612
- "epoch": 0.9107142857142857
613
- },
614
- {
615
- "current_steps": 102,
616
- "loss": 0.904,
617
- "learning_rate": 1e-06,
618
- "epoch": 0.9196428571428571
619
- },
620
- {
621
- "current_steps": 103,
622
- "loss": 0.7079,
623
- "learning_rate": 9.978260869565217e-07,
624
- "epoch": 0.9285714285714286
625
- },
626
- {
627
- "current_steps": 104,
628
- "loss": 0.748,
629
- "learning_rate": 9.956521739130434e-07,
630
- "epoch": 0.9375
631
- },
632
- {
633
- "current_steps": 105,
634
- "loss": 0.7228,
635
- "learning_rate": 9.934782608695653e-07,
636
- "epoch": 0.9464285714285714
637
- },
638
- {
639
- "current_steps": 106,
640
- "loss": 0.722,
641
- "learning_rate": 9.91304347826087e-07,
642
- "epoch": 0.9553571428571429
643
- },
644
- {
645
- "current_steps": 107,
646
- "loss": 0.8011,
647
- "learning_rate": 9.891304347826085e-07,
648
- "epoch": 0.9642857142857143
649
- },
650
- {
651
- "current_steps": 108,
652
- "loss": 0.8125,
653
- "learning_rate": 9.869565217391304e-07,
654
- "epoch": 0.9732142857142857
655
- },
656
- {
657
- "current_steps": 109,
658
- "loss": 0.8091,
659
- "learning_rate": 9.847826086956522e-07,
660
- "epoch": 0.9821428571428571
661
- },
662
- {
663
- "current_steps": 110,
664
- "loss": 0.9399,
665
- "learning_rate": 9.826086956521739e-07,
666
- "epoch": 0.9910714285714286
667
- },
668
- {
669
- "current_steps": 111,
670
- "loss": 1.0917,
671
- "learning_rate": 9.804347826086956e-07,
672
- "epoch": 1.0
673
- },
674
- {
675
- "current_steps": 112,
676
- "loss": 0.9014,
677
- "learning_rate": 9.782608695652173e-07,
678
- "epoch": 1.0089285714285714
679
- },
680
- {
681
- "current_steps": 113,
682
- "loss": 0.873,
683
- "learning_rate": 9.782608695652173e-07,
684
- "epoch": 1.0178571428571428
685
- },
686
- {
687
- "current_steps": 114,
688
- "loss": 0.7153,
689
- "learning_rate": 9.76086956521739e-07,
690
- "epoch": 1.0267857142857142
691
- },
692
- {
693
- "current_steps": 115,
694
- "loss": 0.8828,
695
- "learning_rate": 9.73913043478261e-07,
696
- "epoch": 1.0357142857142858
697
- },
698
- {
699
- "current_steps": 116,
700
- "loss": 1.0329,
701
- "learning_rate": 9.717391304347827e-07,
702
- "epoch": 1.0446428571428572
703
- },
704
- {
705
- "current_steps": 117,
706
- "loss": 1.057,
707
- "learning_rate": 9.695652173913042e-07,
708
- "epoch": 1.0535714285714286
709
- },
710
- {
711
- "current_steps": 118,
712
- "loss": 0.8047,
713
- "learning_rate": 9.67391304347826e-07,
714
- "epoch": 1.0625
715
- },
716
- {
717
- "current_steps": 119,
718
- "loss": 0.7098,
719
- "learning_rate": 9.652173913043478e-07,
720
- "epoch": 1.0714285714285714
721
- },
722
- {
723
- "current_steps": 120,
724
- "loss": 1.094,
725
- "learning_rate": 9.630434782608695e-07,
726
- "epoch": 1.0803571428571428
727
- },
728
- {
729
- "current_steps": 121,
730
- "loss": 0.7521,
731
- "learning_rate": 9.608695652173912e-07,
732
- "epoch": 1.0892857142857142
733
- },
734
- {
735
- "current_steps": 122,
736
- "loss": 0.9738,
737
- "learning_rate": 9.58695652173913e-07,
738
- "epoch": 1.0982142857142858
739
- },
740
- {
741
- "current_steps": 123,
742
- "loss": 0.5577,
743
- "learning_rate": 9.565217391304349e-07,
744
- "epoch": 1.1071428571428572
745
- },
746
- {
747
- "current_steps": 124,
748
- "loss": 1.046,
749
- "learning_rate": 9.543478260869566e-07,
750
- "epoch": 1.1160714285714286
751
- },
752
- {
753
- "current_steps": 125,
754
- "loss": 0.597,
755
- "learning_rate": 9.521739130434783e-07,
756
- "epoch": 1.125
757
- },
758
- {
759
- "current_steps": 126,
760
- "loss": 0.7996,
761
- "learning_rate": 9.499999999999999e-07,
762
- "epoch": 1.1339285714285714
763
- },
764
- {
765
- "current_steps": 127,
766
- "loss": 0.9885,
767
- "learning_rate": 9.478260869565216e-07,
768
- "epoch": 1.1428571428571428
769
- },
770
- {
771
- "current_steps": 128,
772
- "loss": 0.6274,
773
- "learning_rate": 9.456521739130434e-07,
774
- "epoch": 1.1517857142857142
775
- },
776
- {
777
- "current_steps": 129,
778
- "loss": 0.8557,
779
- "learning_rate": 9.434782608695652e-07,
780
- "epoch": 1.1607142857142858
781
- },
782
- {
783
- "current_steps": 130,
784
- "loss": 0.702,
785
- "learning_rate": 9.41304347826087e-07,
786
- "epoch": 1.1696428571428572
787
- },
788
- {
789
- "current_steps": 131,
790
- "loss": 0.6905,
791
- "learning_rate": 9.391304347826087e-07,
792
- "epoch": 1.1785714285714286
793
- },
794
- {
795
- "current_steps": 132,
796
- "loss": 0.5707,
797
- "learning_rate": 9.369565217391304e-07,
798
- "epoch": 1.1875
799
- },
800
- {
801
- "current_steps": 133,
802
- "loss": 0.6121,
803
- "learning_rate": 9.347826086956522e-07,
804
- "epoch": 1.1964285714285714
805
- },
806
- {
807
- "current_steps": 134,
808
- "loss": 0.8348,
809
- "learning_rate": 9.326086956521738e-07,
810
- "epoch": 1.2053571428571428
811
- },
812
- {
813
- "current_steps": 135,
814
- "loss": 0.8768,
815
- "learning_rate": 9.304347826086955e-07,
816
- "epoch": 1.2142857142857142
817
- },
818
- {
819
- "current_steps": 136,
820
- "loss": 0.5648,
821
- "learning_rate": 9.282608695652174e-07,
822
- "epoch": 1.2232142857142858
823
- },
824
- {
825
- "current_steps": 137,
826
- "loss": 0.6316,
827
- "learning_rate": 9.260869565217391e-07,
828
- "epoch": 1.2321428571428572
829
- },
830
- {
831
- "current_steps": 138,
832
- "loss": 1.1728,
833
- "learning_rate": 9.239130434782608e-07,
834
- "epoch": 1.2410714285714286
835
- },
836
- {
837
- "current_steps": 139,
838
- "loss": 0.7299,
839
- "learning_rate": 9.217391304347826e-07,
840
- "epoch": 1.25
841
- },
842
- {
843
- "current_steps": 140,
844
- "loss": 0.6284,
845
- "learning_rate": 9.195652173913043e-07,
846
- "epoch": 1.2589285714285714
847
- },
848
- {
849
- "current_steps": 141,
850
- "loss": 0.6366,
851
- "learning_rate": 9.17391304347826e-07,
852
- "epoch": 1.2678571428571428
853
- },
854
- {
855
- "current_steps": 142,
856
- "loss": 0.7357,
857
- "learning_rate": 9.152173913043479e-07,
858
- "epoch": 1.2767857142857144
859
- },
860
- {
861
- "current_steps": 143,
862
- "loss": 0.8618,
863
- "learning_rate": 9.130434782608695e-07,
864
- "epoch": 1.2857142857142856
865
- },
866
- {
867
- "current_steps": 144,
868
- "loss": 0.6803,
869
- "learning_rate": 9.108695652173912e-07,
870
- "epoch": 1.2946428571428572
871
- },
872
- {
873
- "current_steps": 145,
874
- "loss": 0.8093,
875
- "learning_rate": 9.08695652173913e-07,
876
- "epoch": 1.3035714285714286
877
- },
878
- {
879
- "current_steps": 146,
880
- "loss": 0.6808,
881
- "learning_rate": 9.065217391304347e-07,
882
- "epoch": 1.3125
883
- },
884
- {
885
- "current_steps": 147,
886
- "loss": 0.7173,
887
- "learning_rate": 9.043478260869564e-07,
888
- "epoch": 1.3214285714285714
889
- },
890
- {
891
- "current_steps": 148,
892
- "loss": 0.6964,
893
- "learning_rate": 9.021739130434782e-07,
894
- "epoch": 1.3303571428571428
895
- },
896
- {
897
- "current_steps": 149,
898
- "loss": 0.5458,
899
- "learning_rate": 9e-07,
900
- "epoch": 1.3392857142857144
901
- },
902
- {
903
- "current_steps": 150,
904
- "loss": 0.5362,
905
- "learning_rate": 8.978260869565218e-07,
906
- "epoch": 1.3482142857142856
907
- },
908
- {
909
- "current_steps": 151,
910
- "loss": 0.7248,
911
- "learning_rate": 8.956521739130435e-07,
912
- "epoch": 1.3571428571428572
913
- },
914
- {
915
- "current_steps": 152,
916
- "loss": 0.9701,
917
- "learning_rate": 8.934782608695651e-07,
918
- "epoch": 1.3660714285714286
919
- },
920
- {
921
- "current_steps": 153,
922
- "loss": 0.6072,
923
- "learning_rate": 8.913043478260869e-07,
924
- "epoch": 1.375
925
- },
926
- {
927
- "current_steps": 154,
928
- "loss": 0.8135,
929
- "learning_rate": 8.891304347826086e-07,
930
- "epoch": 1.3839285714285714
931
- },
932
- {
933
- "current_steps": 155,
934
- "loss": 0.6519,
935
- "learning_rate": 8.869565217391303e-07,
936
- "epoch": 1.3928571428571428
937
- },
938
- {
939
- "current_steps": 156,
940
- "loss": 0.7911,
941
- "learning_rate": 8.847826086956522e-07,
942
- "epoch": 1.4017857142857144
943
- },
944
- {
945
- "current_steps": 157,
946
- "loss": 0.7084,
947
- "learning_rate": 8.826086956521739e-07,
948
- "epoch": 1.4107142857142856
949
- },
950
- {
951
- "current_steps": 158,
952
- "loss": 0.6062,
953
- "learning_rate": 8.804347826086956e-07,
954
- "epoch": 1.4196428571428572
955
- },
956
- {
957
- "current_steps": 159,
958
- "loss": 0.5372,
959
- "learning_rate": 8.782608695652174e-07,
960
- "epoch": 1.4285714285714286
961
- },
962
- {
963
- "current_steps": 160,
964
- "loss": 0.7001,
965
- "learning_rate": 8.760869565217391e-07,
966
- "epoch": 1.4375
967
- },
968
- {
969
- "current_steps": 161,
970
- "loss": 0.628,
971
- "learning_rate": 8.739130434782607e-07,
972
- "epoch": 1.4464285714285714
973
- },
974
- {
975
- "current_steps": 162,
976
- "loss": 0.6766,
977
- "learning_rate": 8.717391304347826e-07,
978
- "epoch": 1.4553571428571428
979
- },
980
- {
981
- "current_steps": 163,
982
- "loss": 0.7406,
983
- "learning_rate": 8.695652173913043e-07,
984
- "epoch": 1.4642857142857144
985
- },
986
- {
987
- "current_steps": 164,
988
- "loss": 0.7032,
989
- "learning_rate": 8.67391304347826e-07,
990
- "epoch": 1.4732142857142856
991
- },
992
- {
993
- "current_steps": 165,
994
- "loss": 0.8338,
995
- "learning_rate": 8.652173913043478e-07,
996
- "epoch": 1.4821428571428572
997
- },
998
- {
999
- "current_steps": 166,
1000
- "loss": 0.6067,
1001
- "learning_rate": 8.630434782608695e-07,
1002
- "epoch": 1.4910714285714286
1003
- },
1004
- {
1005
- "current_steps": 167,
1006
- "loss": 0.6988,
1007
- "learning_rate": 8.608695652173913e-07,
1008
- "epoch": 1.5
1009
- },
1010
- {
1011
- "current_steps": 168,
1012
- "loss": 0.6294,
1013
- "learning_rate": 8.586956521739131e-07,
1014
- "epoch": 1.5089285714285714
1015
- },
1016
- {
1017
- "current_steps": 169,
1018
- "loss": 0.7358,
1019
- "learning_rate": 8.565217391304348e-07,
1020
- "epoch": 1.5178571428571428
1021
- },
1022
- {
1023
- "current_steps": 170,
1024
- "loss": 0.7709,
1025
- "learning_rate": 8.543478260869565e-07,
1026
- "epoch": 1.5267857142857144
1027
- },
1028
- {
1029
- "current_steps": 171,
1030
- "loss": 0.8913,
1031
- "learning_rate": 8.521739130434782e-07,
1032
- "epoch": 1.5357142857142856
1033
- },
1034
- {
1035
- "current_steps": 172,
1036
- "loss": 0.697,
1037
- "learning_rate": 8.499999999999999e-07,
1038
- "epoch": 1.5446428571428572
1039
- },
1040
- {
1041
- "current_steps": 173,
1042
- "loss": 0.7902,
1043
- "learning_rate": 8.478260869565217e-07,
1044
- "epoch": 1.5535714285714286
1045
- },
1046
- {
1047
- "current_steps": 174,
1048
- "loss": 0.7858,
1049
- "learning_rate": 8.456521739130434e-07,
1050
- "epoch": 1.5625
1051
- },
1052
- {
1053
- "current_steps": 175,
1054
- "loss": 0.8903,
1055
- "learning_rate": 8.434782608695652e-07,
1056
- "epoch": 1.5714285714285714
1057
- },
1058
- {
1059
- "current_steps": 176,
1060
- "loss": 0.8324,
1061
- "learning_rate": 8.41304347826087e-07,
1062
- "epoch": 1.5803571428571428
1063
- },
1064
- {
1065
- "current_steps": 177,
1066
- "loss": 0.7323,
1067
- "learning_rate": 8.391304347826087e-07,
1068
- "epoch": 1.5892857142857144
1069
- },
1070
- {
1071
- "current_steps": 178,
1072
- "loss": 0.7527,
1073
- "learning_rate": 8.369565217391304e-07,
1074
- "epoch": 1.5982142857142856
1075
- },
1076
- {
1077
- "current_steps": 179,
1078
- "loss": 0.8336,
1079
- "learning_rate": 8.347826086956521e-07,
1080
- "epoch": 1.6071428571428572
1081
- },
1082
- {
1083
- "current_steps": 180,
1084
- "loss": 0.7886,
1085
- "learning_rate": 8.326086956521738e-07,
1086
- "epoch": 1.6160714285714286
1087
- },
1088
- {
1089
- "current_steps": 181,
1090
- "loss": 0.7455,
1091
- "learning_rate": 8.304347826086955e-07,
1092
- "epoch": 1.625
1093
- },
1094
- {
1095
- "current_steps": 182,
1096
- "loss": 0.7702,
1097
- "learning_rate": 8.282608695652174e-07,
1098
- "epoch": 1.6339285714285714
1099
- },
1100
- {
1101
- "current_steps": 183,
1102
- "loss": 0.6935,
1103
- "learning_rate": 8.260869565217391e-07,
1104
- "epoch": 1.6428571428571428
1105
- },
1106
- {
1107
- "current_steps": 184,
1108
- "loss": 0.6778,
1109
- "learning_rate": 8.239130434782609e-07,
1110
- "epoch": 1.6517857142857144
1111
- },
1112
- {
1113
- "current_steps": 185,
1114
- "loss": 0.7623,
1115
- "learning_rate": 8.217391304347826e-07,
1116
- "epoch": 1.6607142857142856
1117
- },
1118
- {
1119
- "current_steps": 186,
1120
- "loss": 0.8068,
1121
- "learning_rate": 8.195652173913043e-07,
1122
- "epoch": 1.6696428571428572
1123
- },
1124
- {
1125
- "current_steps": 187,
1126
- "loss": 0.6384,
1127
- "learning_rate": 8.173913043478261e-07,
1128
- "epoch": 1.6785714285714286
1129
- },
1130
- {
1131
- "current_steps": 188,
1132
- "loss": 0.9876,
1133
- "learning_rate": 8.152173913043478e-07,
1134
- "epoch": 1.6875
1135
- },
1136
- {
1137
- "current_steps": 189,
1138
- "loss": 0.5316,
1139
- "learning_rate": 8.130434782608695e-07,
1140
- "epoch": 1.6964285714285714
1141
- },
1142
- {
1143
- "current_steps": 190,
1144
- "loss": 0.6117,
1145
- "learning_rate": 8.108695652173913e-07,
1146
- "epoch": 1.7053571428571428
1147
- },
1148
- {
1149
- "current_steps": 191,
1150
- "loss": 0.5897,
1151
- "learning_rate": 8.08695652173913e-07,
1152
- "epoch": 1.7142857142857144
1153
- },
1154
- {
1155
- "current_steps": 192,
1156
- "loss": 0.7045,
1157
- "learning_rate": 8.065217391304347e-07,
1158
- "epoch": 1.7232142857142856
1159
- },
1160
- {
1161
- "current_steps": 193,
1162
- "loss": 0.7491,
1163
- "learning_rate": 8.043478260869565e-07,
1164
- "epoch": 1.7321428571428572
1165
- },
1166
- {
1167
- "current_steps": 194,
1168
- "loss": 0.8067,
1169
- "learning_rate": 8.021739130434782e-07,
1170
- "epoch": 1.7410714285714286
1171
- },
1172
- {
1173
- "current_steps": 195,
1174
- "loss": 0.9085,
1175
- "learning_rate": 8e-07,
1176
- "epoch": 1.75
1177
- },
1178
- {
1179
- "current_steps": 196,
1180
- "loss": 0.7977,
1181
- "learning_rate": 7.978260869565217e-07,
1182
- "epoch": 1.7589285714285714
1183
- },
1184
- {
1185
- "current_steps": 197,
1186
- "loss": 0.7509,
1187
- "learning_rate": 7.956521739130434e-07,
1188
- "epoch": 1.7678571428571428
1189
- },
1190
- {
1191
- "current_steps": 198,
1192
- "loss": 0.7048,
1193
- "learning_rate": 7.934782608695651e-07,
1194
- "epoch": 1.7767857142857144
1195
- },
1196
- {
1197
- "current_steps": 199,
1198
- "loss": 0.6452,
1199
- "learning_rate": 7.913043478260869e-07,
1200
- "epoch": 1.7857142857142856
1201
- },
1202
- {
1203
- "current_steps": 200,
1204
- "loss": 0.7265,
1205
- "learning_rate": 7.891304347826086e-07,
1206
- "epoch": 1.7946428571428572
1207
- },
1208
- {
1209
- "current_steps": 201,
1210
- "loss": 0.7936,
1211
- "learning_rate": 7.869565217391305e-07,
1212
- "epoch": 1.8035714285714286
1213
- },
1214
- {
1215
- "current_steps": 202,
1216
- "loss": 0.7336,
1217
- "learning_rate": 7.847826086956522e-07,
1218
- "epoch": 1.8125
1219
- },
1220
- {
1221
- "current_steps": 203,
1222
- "loss": 0.6462,
1223
- "learning_rate": 7.826086956521739e-07,
1224
- "epoch": 1.8214285714285714
1225
- },
1226
- {
1227
- "current_steps": 204,
1228
- "loss": 0.579,
1229
- "learning_rate": 7.804347826086957e-07,
1230
- "epoch": 1.8303571428571428
1231
- },
1232
- {
1233
- "current_steps": 205,
1234
- "loss": 0.6014,
1235
- "learning_rate": 7.782608695652173e-07,
1236
- "epoch": 1.8392857142857144
1237
- },
1238
- {
1239
- "current_steps": 206,
1240
- "loss": 0.684,
1241
- "learning_rate": 7.76086956521739e-07,
1242
- "epoch": 1.8482142857142856
1243
- },
1244
- {
1245
- "current_steps": 207,
1246
- "loss": 0.5932,
1247
- "learning_rate": 7.739130434782608e-07,
1248
- "epoch": 1.8571428571428572
1249
- },
1250
- {
1251
- "current_steps": 208,
1252
- "loss": 0.7736,
1253
- "learning_rate": 7.717391304347826e-07,
1254
- "epoch": 1.8660714285714286
1255
- },
1256
- {
1257
- "current_steps": 209,
1258
- "loss": 0.7601,
1259
- "learning_rate": 7.695652173913043e-07,
1260
- "epoch": 1.875
1261
- },
1262
- {
1263
- "current_steps": 210,
1264
- "loss": 0.8428,
1265
- "learning_rate": 7.673913043478261e-07,
1266
- "epoch": 1.8839285714285714
1267
- },
1268
- {
1269
- "current_steps": 211,
1270
- "loss": 0.8017,
1271
- "learning_rate": 7.652173913043478e-07,
1272
- "epoch": 1.8928571428571428
1273
- },
1274
- {
1275
- "current_steps": 212,
1276
- "loss": 0.5998,
1277
- "learning_rate": 7.630434782608695e-07,
1278
- "epoch": 1.9017857142857144
1279
- },
1280
- {
1281
- "current_steps": 213,
1282
- "loss": 0.9071,
1283
- "learning_rate": 7.608695652173913e-07,
1284
- "epoch": 1.9107142857142856
1285
- },
1286
- {
1287
- "current_steps": 214,
1288
- "loss": 0.8255,
1289
- "learning_rate": 7.58695652173913e-07,
1290
- "epoch": 1.9196428571428572
1291
- },
1292
- {
1293
- "current_steps": 215,
1294
- "loss": 0.9256,
1295
- "learning_rate": 7.565217391304347e-07,
1296
- "epoch": 1.9285714285714286
1297
- },
1298
- {
1299
- "current_steps": 216,
1300
- "loss": 0.6745,
1301
- "learning_rate": 7.543478260869565e-07,
1302
- "epoch": 1.9375
1303
- },
1304
- {
1305
- "current_steps": 217,
1306
- "loss": 0.6372,
1307
- "learning_rate": 7.521739130434782e-07,
1308
- "epoch": 1.9464285714285714
1309
- },
1310
- {
1311
- "current_steps": 218,
1312
- "loss": 0.6495,
1313
- "learning_rate": 7.5e-07,
1314
- "epoch": 1.9553571428571428
1315
- },
1316
- {
1317
- "current_steps": 219,
1318
- "loss": 0.6054,
1319
- "learning_rate": 7.478260869565217e-07,
1320
- "epoch": 1.9642857142857144
1321
- },
1322
- {
1323
- "current_steps": 220,
1324
- "loss": 0.9751,
1325
- "learning_rate": 7.478260869565217e-07,
1326
- "epoch": 1.9732142857142856
1327
- },
1328
- {
1329
- "current_steps": 221,
1330
- "loss": 0.6258,
1331
- "learning_rate": 7.456521739130434e-07,
1332
- "epoch": 1.9821428571428572
1333
- },
1334
- {
1335
- "current_steps": 222,
1336
- "loss": 0.794,
1337
- "learning_rate": 7.434782608695653e-07,
1338
- "epoch": 1.9910714285714286
1339
- },
1340
- {
1341
- "current_steps": 223,
1342
- "loss": 0.9991,
1343
- "learning_rate": 7.41304347826087e-07,
1344
- "epoch": 2.0
1345
- },
1346
- {
1347
- "current_steps": 224,
1348
- "loss": 0.8048,
1349
- "learning_rate": 7.391304347826086e-07,
1350
- "epoch": 2.0089285714285716
1351
- },
1352
- {
1353
- "current_steps": 225,
1354
- "loss": 0.8439,
1355
- "learning_rate": 7.369565217391304e-07,
1356
- "epoch": 2.017857142857143
1357
- },
1358
- {
1359
- "current_steps": 226,
1360
- "loss": 0.7546,
1361
- "learning_rate": 7.347826086956521e-07,
1362
- "epoch": 2.0267857142857144
1363
- },
1364
- {
1365
- "current_steps": 227,
1366
- "loss": 0.8195,
1367
- "learning_rate": 7.326086956521738e-07,
1368
- "epoch": 2.0357142857142856
1369
- },
1370
- {
1371
- "current_steps": 228,
1372
- "loss": 0.6988,
1373
- "learning_rate": 7.304347826086957e-07,
1374
- "epoch": 2.044642857142857
1375
- },
1376
- {
1377
- "current_steps": 229,
1378
- "loss": 0.8419,
1379
- "learning_rate": 7.282608695652174e-07,
1380
- "epoch": 2.0535714285714284
1381
- },
1382
- {
1383
- "current_steps": 230,
1384
- "loss": 0.6133,
1385
- "learning_rate": 7.260869565217391e-07,
1386
- "epoch": 2.0625
1387
- },
1388
- {
1389
- "current_steps": 231,
1390
- "loss": 0.6307,
1391
- "learning_rate": 7.239130434782609e-07,
1392
- "epoch": 2.0714285714285716
1393
- },
1394
- {
1395
- "current_steps": 232,
1396
- "loss": 0.7852,
1397
- "learning_rate": 7.217391304347826e-07,
1398
- "epoch": 2.080357142857143
1399
- },
1400
- {
1401
- "current_steps": 233,
1402
- "loss": 0.4894,
1403
- "learning_rate": 7.195652173913042e-07,
1404
- "epoch": 2.0892857142857144
1405
- },
1406
- {
1407
- "current_steps": 234,
1408
- "loss": 0.6806,
1409
- "learning_rate": 7.17391304347826e-07,
1410
- "epoch": 2.0982142857142856
1411
- },
1412
- {
1413
- "current_steps": 235,
1414
- "loss": 0.7798,
1415
- "learning_rate": 7.152173913043478e-07,
1416
- "epoch": 2.107142857142857
1417
- },
1418
- {
1419
- "current_steps": 236,
1420
- "loss": 0.934,
1421
- "learning_rate": 7.130434782608695e-07,
1422
- "epoch": 2.1160714285714284
1423
- },
1424
- {
1425
- "current_steps": 237,
1426
- "loss": 0.8044,
1427
- "learning_rate": 7.108695652173913e-07,
1428
- "epoch": 2.125
1429
- },
1430
- {
1431
- "current_steps": 238,
1432
- "loss": 0.8984,
1433
- "learning_rate": 7.08695652173913e-07,
1434
- "epoch": 2.1339285714285716
1435
- },
1436
- {
1437
- "current_steps": 239,
1438
- "loss": 0.7468,
1439
- "learning_rate": 7.065217391304348e-07,
1440
- "epoch": 2.142857142857143
1441
- },
1442
- {
1443
- "current_steps": 240,
1444
- "loss": 0.744,
1445
- "learning_rate": 7.043478260869565e-07,
1446
- "epoch": 2.1517857142857144
1447
- },
1448
- {
1449
- "current_steps": 241,
1450
- "loss": 0.5531,
1451
- "learning_rate": 7.021739130434783e-07,
1452
- "epoch": 2.1607142857142856
1453
- },
1454
- {
1455
- "current_steps": 242,
1456
- "loss": 0.8155,
1457
- "learning_rate": 7e-07,
1458
- "epoch": 2.169642857142857
1459
- },
1460
- {
1461
- "current_steps": 243,
1462
- "loss": 0.7626,
1463
- "learning_rate": 6.978260869565217e-07,
1464
- "epoch": 2.1785714285714284
1465
- },
1466
- {
1467
- "current_steps": 244,
1468
- "loss": 0.5438,
1469
- "learning_rate": 6.956521739130434e-07,
1470
- "epoch": 2.1875
1471
- },
1472
- {
1473
- "current_steps": 245,
1474
- "loss": 0.7638,
1475
- "learning_rate": 6.934782608695652e-07,
1476
- "epoch": 2.1964285714285716
1477
- },
1478
- {
1479
- "current_steps": 246,
1480
- "loss": 0.5092,
1481
- "learning_rate": 6.913043478260869e-07,
1482
- "epoch": 2.205357142857143
1483
- },
1484
- {
1485
- "current_steps": 247,
1486
- "loss": 0.7026,
1487
- "learning_rate": 6.891304347826086e-07,
1488
- "epoch": 2.2142857142857144
1489
- },
1490
- {
1491
- "current_steps": 248,
1492
- "loss": 0.727,
1493
- "learning_rate": 6.869565217391305e-07,
1494
- "epoch": 2.2232142857142856
1495
- },
1496
- {
1497
- "current_steps": 249,
1498
- "loss": 0.6229,
1499
- "learning_rate": 6.847826086956522e-07,
1500
- "epoch": 2.232142857142857
1501
- },
1502
- {
1503
- "current_steps": 250,
1504
- "loss": 0.6695,
1505
- "learning_rate": 6.826086956521738e-07,
1506
- "epoch": 2.2410714285714284
1507
- },
1508
- {
1509
- "current_steps": 251,
1510
- "loss": 0.6603,
1511
- "learning_rate": 6.804347826086956e-07,
1512
- "epoch": 2.25
1513
- },
1514
- {
1515
- "current_steps": 252,
1516
- "loss": 0.7804,
1517
- "learning_rate": 6.782608695652173e-07,
1518
- "epoch": 2.2589285714285716
1519
- },
1520
- {
1521
- "current_steps": 253,
1522
- "loss": 0.9138,
1523
- "learning_rate": 6.76086956521739e-07,
1524
- "epoch": 2.267857142857143
1525
- },
1526
- {
1527
- "current_steps": 254,
1528
- "loss": 0.7793,
1529
- "learning_rate": 6.739130434782609e-07,
1530
- "epoch": 2.2767857142857144
1531
- },
1532
- {
1533
- "current_steps": 255,
1534
- "loss": 0.7045,
1535
- "learning_rate": 6.717391304347826e-07,
1536
- "epoch": 2.2857142857142856
1537
- },
1538
- {
1539
- "current_steps": 256,
1540
- "loss": 0.8594,
1541
- "learning_rate": 6.695652173913044e-07,
1542
- "epoch": 2.294642857142857
1543
- },
1544
- {
1545
- "current_steps": 257,
1546
- "loss": 0.9529,
1547
- "learning_rate": 6.673913043478261e-07,
1548
- "epoch": 2.3035714285714284
1549
- },
1550
- {
1551
- "current_steps": 258,
1552
- "loss": 0.7477,
1553
- "learning_rate": 6.652173913043478e-07,
1554
- "epoch": 2.3125
1555
- },
1556
- {
1557
- "current_steps": 259,
1558
- "loss": 0.7676,
1559
- "learning_rate": 6.630434782608695e-07,
1560
- "epoch": 2.3214285714285716
1561
- },
1562
- {
1563
- "current_steps": 260,
1564
- "loss": 0.6468,
1565
- "learning_rate": 6.608695652173912e-07,
1566
- "epoch": 2.330357142857143
1567
- },
1568
- {
1569
- "current_steps": 261,
1570
- "loss": 0.6665,
1571
- "learning_rate": 6.58695652173913e-07,
1572
- "epoch": 2.3392857142857144
1573
- },
1574
- {
1575
- "current_steps": 262,
1576
- "loss": 0.838,
1577
- "learning_rate": 6.565217391304348e-07,
1578
- "epoch": 2.3482142857142856
1579
- },
1580
- {
1581
- "current_steps": 263,
1582
- "loss": 0.7129,
1583
- "learning_rate": 6.543478260869565e-07,
1584
- "epoch": 2.357142857142857
1585
- },
1586
- {
1587
- "current_steps": 264,
1588
- "loss": 0.8685,
1589
- "learning_rate": 6.521739130434782e-07,
1590
- "epoch": 2.3660714285714284
1591
- },
1592
- {
1593
- "current_steps": 265,
1594
- "loss": 0.7224,
1595
- "learning_rate": 6.5e-07,
1596
- "epoch": 2.375
1597
- },
1598
- {
1599
- "current_steps": 266,
1600
- "loss": 0.7037,
1601
- "learning_rate": 6.478260869565217e-07,
1602
- "epoch": 2.3839285714285716
1603
- },
1604
- {
1605
- "current_steps": 267,
1606
- "loss": 0.5596,
1607
- "learning_rate": 6.456521739130435e-07,
1608
- "epoch": 2.392857142857143
1609
- },
1610
- {
1611
- "current_steps": 268,
1612
- "loss": 0.8887,
1613
- "learning_rate": 6.434782608695652e-07,
1614
- "epoch": 2.4017857142857144
1615
- },
1616
- {
1617
- "current_steps": 269,
1618
- "loss": 0.6721,
1619
- "learning_rate": 6.413043478260869e-07,
1620
- "epoch": 2.4107142857142856
1621
- },
1622
- {
1623
- "current_steps": 270,
1624
- "loss": 0.7387,
1625
- "learning_rate": 6.391304347826086e-07,
1626
- "epoch": 2.419642857142857
1627
- },
1628
- {
1629
- "current_steps": 271,
1630
- "loss": 0.6304,
1631
- "learning_rate": 6.369565217391304e-07,
1632
- "epoch": 2.4285714285714284
1633
- },
1634
- {
1635
- "current_steps": 272,
1636
- "loss": 0.7563,
1637
- "learning_rate": 6.347826086956521e-07,
1638
- "epoch": 2.4375
1639
- },
1640
- {
1641
- "current_steps": 273,
1642
- "loss": 0.6833,
1643
- "learning_rate": 6.326086956521739e-07,
1644
- "epoch": 2.4464285714285716
1645
- },
1646
- {
1647
- "current_steps": 274,
1648
- "loss": 0.722,
1649
- "learning_rate": 6.304347826086957e-07,
1650
- "epoch": 2.455357142857143
1651
- },
1652
- {
1653
- "current_steps": 275,
1654
- "loss": 0.8583,
1655
- "learning_rate": 6.282608695652174e-07,
1656
- "epoch": 2.4642857142857144
1657
- },
1658
- {
1659
- "current_steps": 276,
1660
- "loss": 0.8988,
1661
- "learning_rate": 6.260869565217392e-07,
1662
- "epoch": 2.4732142857142856
1663
- },
1664
- {
1665
- "current_steps": 277,
1666
- "loss": 0.6269,
1667
- "learning_rate": 6.239130434782608e-07,
1668
- "epoch": 2.482142857142857
1669
- },
1670
- {
1671
- "current_steps": 278,
1672
- "loss": 0.473,
1673
- "learning_rate": 6.217391304347825e-07,
1674
- "epoch": 2.4910714285714284
1675
- },
1676
- {
1677
- "current_steps": 279,
1678
- "loss": 0.7065,
1679
- "learning_rate": 6.195652173913043e-07,
1680
- "epoch": 2.5
1681
- },
1682
- {
1683
- "current_steps": 280,
1684
- "loss": 0.7912,
1685
- "learning_rate": 6.17391304347826e-07,
1686
- "epoch": 2.508928571428571
1687
- },
1688
- {
1689
- "current_steps": 281,
1690
- "loss": 0.6589,
1691
- "learning_rate": 6.152173913043478e-07,
1692
- "epoch": 2.517857142857143
1693
- },
1694
- {
1695
- "current_steps": 282,
1696
- "loss": 0.5908,
1697
- "learning_rate": 6.130434782608696e-07,
1698
- "epoch": 2.5267857142857144
1699
- },
1700
- {
1701
- "current_steps": 283,
1702
- "loss": 0.839,
1703
- "learning_rate": 6.108695652173913e-07,
1704
- "epoch": 2.5357142857142856
1705
- },
1706
- {
1707
- "current_steps": 284,
1708
- "loss": 0.9573,
1709
- "learning_rate": 6.08695652173913e-07,
1710
- "epoch": 2.544642857142857
1711
- },
1712
- {
1713
- "current_steps": 285,
1714
- "loss": 0.8881,
1715
- "learning_rate": 6.065217391304348e-07,
1716
- "epoch": 2.553571428571429
1717
- },
1718
- {
1719
- "current_steps": 286,
1720
- "loss": 0.5213,
1721
- "learning_rate": 6.043478260869564e-07,
1722
- "epoch": 2.5625
1723
- },
1724
- {
1725
- "current_steps": 287,
1726
- "loss": 0.5668,
1727
- "learning_rate": 6.021739130434782e-07,
1728
- "epoch": 2.571428571428571
1729
- },
1730
- {
1731
- "current_steps": 288,
1732
- "loss": 0.6856,
1733
- "learning_rate": 6e-07,
1734
- "epoch": 2.580357142857143
1735
- },
1736
- {
1737
- "current_steps": 289,
1738
- "loss": 0.6793,
1739
- "learning_rate": 5.978260869565217e-07,
1740
- "epoch": 2.5892857142857144
1741
- },
1742
- {
1743
- "current_steps": 290,
1744
- "loss": 0.6176,
1745
- "learning_rate": 5.956521739130435e-07,
1746
- "epoch": 2.5982142857142856
1747
- },
1748
- {
1749
- "current_steps": 291,
1750
- "loss": 0.5633,
1751
- "learning_rate": 5.934782608695652e-07,
1752
- "epoch": 2.607142857142857
1753
- },
1754
- {
1755
- "current_steps": 292,
1756
- "loss": 0.8512,
1757
- "learning_rate": 5.913043478260869e-07,
1758
- "epoch": 2.616071428571429
1759
- },
1760
- {
1761
- "current_steps": 293,
1762
- "loss": 0.9664,
1763
- "learning_rate": 5.891304347826088e-07,
1764
- "epoch": 2.625
1765
- },
1766
- {
1767
- "current_steps": 294,
1768
- "loss": 0.6124,
1769
- "learning_rate": 5.869565217391305e-07,
1770
- "epoch": 2.633928571428571
1771
- },
1772
- {
1773
- "current_steps": 295,
1774
- "loss": 0.6244,
1775
- "learning_rate": 5.847826086956521e-07,
1776
- "epoch": 2.642857142857143
1777
- },
1778
- {
1779
- "current_steps": 296,
1780
- "loss": 0.7879,
1781
- "learning_rate": 5.826086956521739e-07,
1782
- "epoch": 2.6517857142857144
1783
- },
1784
- {
1785
- "current_steps": 297,
1786
- "loss": 0.6862,
1787
- "learning_rate": 5.804347826086956e-07,
1788
- "epoch": 2.6607142857142856
1789
- },
1790
- {
1791
- "current_steps": 298,
1792
- "loss": 0.6368,
1793
- "learning_rate": 5.782608695652173e-07,
1794
- "epoch": 2.669642857142857
1795
- },
1796
- {
1797
- "current_steps": 299,
1798
- "loss": 0.8478,
1799
- "learning_rate": 5.760869565217391e-07,
1800
- "epoch": 2.678571428571429
1801
- },
1802
- {
1803
- "current_steps": 300,
1804
- "loss": 0.6466,
1805
- "learning_rate": 5.739130434782609e-07,
1806
- "epoch": 2.6875
1807
- },
1808
- {
1809
- "current_steps": 301,
1810
- "loss": 0.7323,
1811
- "learning_rate": 5.717391304347826e-07,
1812
- "epoch": 2.696428571428571
1813
- },
1814
- {
1815
- "current_steps": 302,
1816
- "loss": 0.7611,
1817
- "learning_rate": 5.695652173913044e-07,
1818
- "epoch": 2.705357142857143
1819
- },
1820
- {
1821
- "current_steps": 303,
1822
- "loss": 0.7075,
1823
- "learning_rate": 5.673913043478261e-07,
1824
- "epoch": 2.7142857142857144
1825
- },
1826
- {
1827
- "current_steps": 304,
1828
- "loss": 0.5448,
1829
- "learning_rate": 5.652173913043477e-07,
1830
- "epoch": 2.7232142857142856
1831
- },
1832
- {
1833
- "current_steps": 305,
1834
- "loss": 0.704,
1835
- "learning_rate": 5.630434782608695e-07,
1836
- "epoch": 2.732142857142857
1837
- },
1838
- {
1839
- "current_steps": 306,
1840
- "loss": 0.8591,
1841
- "learning_rate": 5.608695652173912e-07,
1842
- "epoch": 2.741071428571429
1843
- },
1844
- {
1845
- "current_steps": 307,
1846
- "loss": 0.6702,
1847
- "learning_rate": 5.58695652173913e-07,
1848
- "epoch": 2.75
1849
- },
1850
- {
1851
- "current_steps": 308,
1852
- "loss": 0.6652,
1853
- "learning_rate": 5.565217391304348e-07,
1854
- "epoch": 2.758928571428571
1855
- },
1856
- {
1857
- "current_steps": 309,
1858
- "loss": 0.7208,
1859
- "learning_rate": 5.543478260869565e-07,
1860
- "epoch": 2.767857142857143
1861
- },
1862
- {
1863
- "current_steps": 310,
1864
- "loss": 0.7334,
1865
- "learning_rate": 5.521739130434783e-07,
1866
- "epoch": 2.7767857142857144
1867
- },
1868
- {
1869
- "current_steps": 311,
1870
- "loss": 0.865,
1871
- "learning_rate": 5.5e-07,
1872
- "epoch": 2.7857142857142856
1873
- },
1874
- {
1875
- "current_steps": 312,
1876
- "loss": 0.5955,
1877
- "learning_rate": 5.478260869565216e-07,
1878
- "epoch": 2.794642857142857
1879
- },
1880
- {
1881
- "current_steps": 313,
1882
- "loss": 0.5059,
1883
- "learning_rate": 5.456521739130435e-07,
1884
- "epoch": 2.803571428571429
1885
- },
1886
- {
1887
- "current_steps": 314,
1888
- "loss": 1.0855,
1889
- "learning_rate": 5.434782608695652e-07,
1890
- "epoch": 2.8125
1891
- },
1892
- {
1893
- "current_steps": 315,
1894
- "loss": 0.7484,
1895
- "learning_rate": 5.413043478260869e-07,
1896
- "epoch": 2.821428571428571
1897
- },
1898
- {
1899
- "current_steps": 316,
1900
- "loss": 0.8017,
1901
- "learning_rate": 5.391304347826087e-07,
1902
- "epoch": 2.830357142857143
1903
- },
1904
- {
1905
- "current_steps": 317,
1906
- "loss": 0.7272,
1907
- "learning_rate": 5.369565217391304e-07,
1908
- "epoch": 2.8392857142857144
1909
- },
1910
- {
1911
- "current_steps": 318,
1912
- "loss": 0.6897,
1913
- "learning_rate": 5.347826086956521e-07,
1914
- "epoch": 2.8482142857142856
1915
- },
1916
- {
1917
- "current_steps": 319,
1918
- "loss": 0.634,
1919
- "learning_rate": 5.32608695652174e-07,
1920
- "epoch": 2.857142857142857
1921
- },
1922
- {
1923
- "current_steps": 320,
1924
- "loss": 0.7684,
1925
- "learning_rate": 5.304347826086957e-07,
1926
- "epoch": 2.866071428571429
1927
- },
1928
- {
1929
- "current_steps": 321,
1930
- "loss": 0.5758,
1931
- "learning_rate": 5.282608695652173e-07,
1932
- "epoch": 2.875
1933
- },
1934
- {
1935
- "current_steps": 322,
1936
- "loss": 0.687,
1937
- "learning_rate": 5.260869565217391e-07,
1938
- "epoch": 2.883928571428571
1939
- },
1940
- {
1941
- "current_steps": 323,
1942
- "loss": 0.6942,
1943
- "learning_rate": 5.239130434782608e-07,
1944
- "epoch": 2.892857142857143
1945
- },
1946
- {
1947
- "current_steps": 324,
1948
- "loss": 0.7698,
1949
- "learning_rate": 5.217391304347825e-07,
1950
- "epoch": 2.9017857142857144
1951
- },
1952
- {
1953
- "current_steps": 325,
1954
- "loss": 0.815,
1955
- "learning_rate": 5.195652173913043e-07,
1956
- "epoch": 2.9107142857142856
1957
- },
1958
- {
1959
- "current_steps": 326,
1960
- "loss": 0.6837,
1961
- "learning_rate": 5.173913043478261e-07,
1962
- "epoch": 2.919642857142857
1963
- },
1964
- {
1965
- "current_steps": 327,
1966
- "loss": 0.7103,
1967
- "learning_rate": 5.152173913043479e-07,
1968
- "epoch": 2.928571428571429
1969
- },
1970
- {
1971
- "current_steps": 328,
1972
- "loss": 0.6798,
1973
- "learning_rate": 5.130434782608696e-07,
1974
- "epoch": 2.9375
1975
- },
1976
- {
1977
- "current_steps": 329,
1978
- "loss": 0.767,
1979
- "learning_rate": 5.108695652173913e-07,
1980
- "epoch": 2.946428571428571
1981
- },
1982
- {
1983
- "current_steps": 330,
1984
- "loss": 0.6161,
1985
- "learning_rate": 5.08695652173913e-07,
1986
- "epoch": 2.955357142857143
1987
- },
1988
- {
1989
- "current_steps": 331,
1990
- "loss": 0.6607,
1991
- "learning_rate": 5.065217391304347e-07,
1992
- "epoch": 2.9642857142857144
1993
- },
1994
- {
1995
- "current_steps": 332,
1996
- "loss": 0.6875,
1997
- "learning_rate": 5.043478260869564e-07,
1998
- "epoch": 2.9732142857142856
1999
- },
2000
- {
2001
- "current_steps": 333,
2002
- "loss": 0.746,
2003
- "learning_rate": 5.021739130434783e-07,
2004
- "epoch": 2.982142857142857
2005
- },
2006
- {
2007
- "current_steps": 334,
2008
- "loss": 0.6093,
2009
- "learning_rate": 5e-07,
2010
- "epoch": 2.991071428571429
2011
- },
2012
- {
2013
- "current_steps": 335,
2014
- "loss": 0.5599,
2015
- "learning_rate": 4.978260869565217e-07,
2016
- "epoch": 3.0
2017
- },
2018
- {
2019
- "current_steps": 336,
2020
- "loss": 0.5985,
2021
- "learning_rate": 4.956521739130435e-07,
2022
- "epoch": 3.0089285714285716
2023
- },
2024
- {
2025
- "current_steps": 337,
2026
- "loss": 0.6692,
2027
- "learning_rate": 4.934782608695652e-07,
2028
- "epoch": 3.017857142857143
2029
- },
2030
- {
2031
- "current_steps": 338,
2032
- "loss": 0.5887,
2033
- "learning_rate": 4.913043478260869e-07,
2034
- "epoch": 3.0267857142857144
2035
- },
2036
- {
2037
- "current_steps": 339,
2038
- "loss": 0.5831,
2039
- "learning_rate": 4.891304347826087e-07,
2040
- "epoch": 3.0357142857142856
2041
- },
2042
- {
2043
- "current_steps": 340,
2044
- "loss": 0.5424,
2045
- "learning_rate": 4.869565217391305e-07,
2046
- "epoch": 3.044642857142857
2047
- },
2048
- {
2049
- "current_steps": 341,
2050
- "loss": 1.0041,
2051
- "learning_rate": 4.847826086956521e-07,
2052
- "epoch": 3.0535714285714284
2053
- },
2054
- {
2055
- "current_steps": 342,
2056
- "loss": 0.6989,
2057
- "learning_rate": 4.826086956521739e-07,
2058
- "epoch": 3.0625
2059
- },
2060
- {
2061
- "current_steps": 343,
2062
- "loss": 0.7104,
2063
- "learning_rate": 4.804347826086956e-07,
2064
- "epoch": 3.0714285714285716
2065
- },
2066
- {
2067
- "current_steps": 344,
2068
- "loss": 0.6493,
2069
- "learning_rate": 4.782608695652174e-07,
2070
- "epoch": 3.080357142857143
2071
- },
2072
- {
2073
- "current_steps": 345,
2074
- "loss": 0.8018,
2075
- "learning_rate": 4.7608695652173915e-07,
2076
- "epoch": 3.0892857142857144
2077
- },
2078
- {
2079
- "current_steps": 346,
2080
- "loss": 0.638,
2081
- "learning_rate": 4.739130434782608e-07,
2082
- "epoch": 3.0982142857142856
2083
- },
2084
- {
2085
- "current_steps": 347,
2086
- "loss": 0.7714,
2087
- "learning_rate": 4.717391304347826e-07,
2088
- "epoch": 3.107142857142857
2089
- },
2090
- {
2091
- "current_steps": 348,
2092
- "loss": 0.7103,
2093
- "learning_rate": 4.6956521739130434e-07,
2094
- "epoch": 3.1160714285714284
2095
- },
2096
- {
2097
- "current_steps": 349,
2098
- "loss": 0.5937,
2099
- "learning_rate": 4.673913043478261e-07,
2100
- "epoch": 3.125
2101
- },
2102
- {
2103
- "current_steps": 350,
2104
- "loss": 0.7256,
2105
- "learning_rate": 4.6521739130434777e-07,
2106
- "epoch": 3.1339285714285716
2107
- },
2108
- {
2109
- "current_steps": 351,
2110
- "loss": 0.864,
2111
- "learning_rate": 4.6304347826086954e-07,
2112
- "epoch": 3.142857142857143
2113
- },
2114
- {
2115
- "current_steps": 352,
2116
- "loss": 0.7429,
2117
- "learning_rate": 4.608695652173913e-07,
2118
- "epoch": 3.1517857142857144
2119
- },
2120
- {
2121
- "current_steps": 353,
2122
- "loss": 0.6658,
2123
- "learning_rate": 4.58695652173913e-07,
2124
- "epoch": 3.1607142857142856
2125
- },
2126
- {
2127
- "current_steps": 354,
2128
- "loss": 0.647,
2129
- "learning_rate": 4.5652173913043473e-07,
2130
- "epoch": 3.169642857142857
2131
- },
2132
- {
2133
- "current_steps": 355,
2134
- "loss": 0.7772,
2135
- "learning_rate": 4.543478260869565e-07,
2136
- "epoch": 3.1785714285714284
2137
- },
2138
- {
2139
- "current_steps": 356,
2140
- "loss": 0.6939,
2141
- "learning_rate": 4.521739130434782e-07,
2142
- "epoch": 3.1875
2143
- },
2144
- {
2145
- "current_steps": 357,
2146
- "loss": 0.5744,
2147
- "learning_rate": 4.5e-07,
2148
- "epoch": 3.1964285714285716
2149
- },
2150
- {
2151
- "current_steps": 358,
2152
- "loss": 0.7193,
2153
- "learning_rate": 4.4782608695652175e-07,
2154
- "epoch": 3.205357142857143
2155
- },
2156
- {
2157
- "current_steps": 359,
2158
- "loss": 0.667,
2159
- "learning_rate": 4.4565217391304346e-07,
2160
- "epoch": 3.2142857142857144
2161
- },
2162
- {
2163
- "current_steps": 360,
2164
- "loss": 0.6671,
2165
- "learning_rate": 4.434782608695652e-07,
2166
- "epoch": 3.2232142857142856
2167
- },
2168
- {
2169
- "current_steps": 361,
2170
- "loss": 0.8531,
2171
- "learning_rate": 4.4130434782608694e-07,
2172
- "epoch": 3.232142857142857
2173
- },
2174
- {
2175
- "current_steps": 362,
2176
- "loss": 0.6706,
2177
- "learning_rate": 4.391304347826087e-07,
2178
- "epoch": 3.2410714285714284
2179
- },
2180
- {
2181
- "current_steps": 363,
2182
- "loss": 0.8786,
2183
- "learning_rate": 4.3695652173913037e-07,
2184
- "epoch": 3.25
2185
- },
2186
- {
2187
- "current_steps": 364,
2188
- "loss": 0.6281,
2189
- "learning_rate": 4.3478260869565214e-07,
2190
- "epoch": 3.2589285714285716
2191
- },
2192
- {
2193
- "current_steps": 365,
2194
- "loss": 0.8648,
2195
- "learning_rate": 4.326086956521739e-07,
2196
- "epoch": 3.267857142857143
2197
- },
2198
- {
2199
- "current_steps": 366,
2200
- "loss": 0.5872,
2201
- "learning_rate": 4.3043478260869567e-07,
2202
- "epoch": 3.2767857142857144
2203
- },
2204
- {
2205
- "current_steps": 367,
2206
- "loss": 0.5874,
2207
- "learning_rate": 4.282608695652174e-07,
2208
- "epoch": 3.2857142857142856
2209
- },
2210
- {
2211
- "current_steps": 368,
2212
- "loss": 0.7057,
2213
- "learning_rate": 4.260869565217391e-07,
2214
- "epoch": 3.294642857142857
2215
- },
2216
- {
2217
- "current_steps": 369,
2218
- "loss": 0.6076,
2219
- "learning_rate": 4.2391304347826086e-07,
2220
- "epoch": 3.3035714285714284
2221
- },
2222
- {
2223
- "current_steps": 370,
2224
- "loss": 0.7514,
2225
- "learning_rate": 4.217391304347826e-07,
2226
- "epoch": 3.3125
2227
- },
2228
- {
2229
- "current_steps": 371,
2230
- "loss": 0.689,
2231
- "learning_rate": 4.1956521739130434e-07,
2232
- "epoch": 3.3214285714285716
2233
- },
2234
- {
2235
- "current_steps": 372,
2236
- "loss": 0.7074,
2237
- "learning_rate": 4.1739130434782606e-07,
2238
- "epoch": 3.330357142857143
2239
- },
2240
- {
2241
- "current_steps": 373,
2242
- "loss": 0.6425,
2243
- "learning_rate": 4.1521739130434777e-07,
2244
- "epoch": 3.3392857142857144
2245
- },
2246
- {
2247
- "current_steps": 374,
2248
- "loss": 0.5247,
2249
- "learning_rate": 4.1304347826086954e-07,
2250
- "epoch": 3.3482142857142856
2251
- },
2252
- {
2253
- "current_steps": 375,
2254
- "loss": 0.7755,
2255
- "learning_rate": 4.108695652173913e-07,
2256
- "epoch": 3.357142857142857
2257
- },
2258
- {
2259
- "current_steps": 376,
2260
- "loss": 0.7774,
2261
- "learning_rate": 4.0869565217391307e-07,
2262
- "epoch": 3.3660714285714284
2263
- },
2264
- {
2265
- "current_steps": 377,
2266
- "loss": 0.6871,
2267
- "learning_rate": 4.0652173913043473e-07,
2268
- "epoch": 3.375
2269
- },
2270
- {
2271
- "current_steps": 378,
2272
- "loss": 0.566,
2273
- "learning_rate": 4.043478260869565e-07,
2274
- "epoch": 3.3839285714285716
2275
- },
2276
- {
2277
- "current_steps": 379,
2278
- "loss": 1.0922,
2279
- "learning_rate": 4.0217391304347827e-07,
2280
- "epoch": 3.392857142857143
2281
- },
2282
- {
2283
- "current_steps": 380,
2284
- "loss": 0.5958,
2285
- "learning_rate": 4e-07,
2286
- "epoch": 3.4017857142857144
2287
- },
2288
- {
2289
- "current_steps": 381,
2290
- "loss": 0.9182,
2291
- "learning_rate": 3.978260869565217e-07,
2292
- "epoch": 3.4107142857142856
2293
- },
2294
- {
2295
- "current_steps": 382,
2296
- "loss": 0.7356,
2297
- "learning_rate": 3.9565217391304346e-07,
2298
- "epoch": 3.419642857142857
2299
- },
2300
- {
2301
- "current_steps": 383,
2302
- "loss": 0.8677,
2303
- "learning_rate": 3.9347826086956523e-07,
2304
- "epoch": 3.4285714285714284
2305
- },
2306
- {
2307
- "current_steps": 384,
2308
- "loss": 0.6885,
2309
- "learning_rate": 3.9130434782608694e-07,
2310
- "epoch": 3.4375
2311
- },
2312
- {
2313
- "current_steps": 385,
2314
- "loss": 0.7982,
2315
- "learning_rate": 3.8913043478260866e-07,
2316
- "epoch": 3.4464285714285716
2317
- },
2318
- {
2319
- "current_steps": 386,
2320
- "loss": 0.8466,
2321
- "learning_rate": 3.869565217391304e-07,
2322
- "epoch": 3.455357142857143
2323
- },
2324
- {
2325
- "current_steps": 387,
2326
- "loss": 0.4563,
2327
- "learning_rate": 3.8478260869565214e-07,
2328
- "epoch": 3.4642857142857144
2329
- },
2330
- {
2331
- "current_steps": 388,
2332
- "loss": 0.7675,
2333
- "learning_rate": 3.826086956521739e-07,
2334
- "epoch": 3.4732142857142856
2335
- },
2336
- {
2337
- "current_steps": 389,
2338
- "loss": 0.7642,
2339
- "learning_rate": 3.8043478260869567e-07,
2340
- "epoch": 3.482142857142857
2341
- },
2342
- {
2343
- "current_steps": 390,
2344
- "loss": 0.6065,
2345
- "learning_rate": 3.7826086956521733e-07,
2346
- "epoch": 3.4910714285714284
2347
- },
2348
- {
2349
- "current_steps": 391,
2350
- "loss": 0.6121,
2351
- "learning_rate": 3.760869565217391e-07,
2352
- "epoch": 3.5
2353
- },
2354
- {
2355
- "current_steps": 392,
2356
- "loss": 0.8562,
2357
- "learning_rate": 3.7391304347826087e-07,
2358
- "epoch": 3.508928571428571
2359
- },
2360
- {
2361
- "current_steps": 393,
2362
- "loss": 0.8169,
2363
- "learning_rate": 3.7173913043478263e-07,
2364
- "epoch": 3.517857142857143
2365
- },
2366
- {
2367
- "current_steps": 394,
2368
- "loss": 0.7264,
2369
- "learning_rate": 3.695652173913043e-07,
2370
- "epoch": 3.5267857142857144
2371
- },
2372
- {
2373
- "current_steps": 395,
2374
- "loss": 0.6761,
2375
- "learning_rate": 3.6739130434782606e-07,
2376
- "epoch": 3.5357142857142856
2377
- },
2378
- {
2379
- "current_steps": 396,
2380
- "loss": 0.485,
2381
- "learning_rate": 3.6521739130434783e-07,
2382
- "epoch": 3.544642857142857
2383
- },
2384
- {
2385
- "current_steps": 397,
2386
- "loss": 0.6992,
2387
- "learning_rate": 3.6304347826086954e-07,
2388
- "epoch": 3.553571428571429
2389
- },
2390
- {
2391
- "current_steps": 398,
2392
- "loss": 0.6543,
2393
- "learning_rate": 3.608695652173913e-07,
2394
- "epoch": 3.5625
2395
- },
2396
- {
2397
- "current_steps": 399,
2398
- "loss": 0.6019,
2399
- "learning_rate": 3.58695652173913e-07,
2400
- "epoch": 3.571428571428571
2401
- },
2402
- {
2403
- "current_steps": 400,
2404
- "loss": 0.8135,
2405
- "learning_rate": 3.5652173913043474e-07,
2406
- "epoch": 3.580357142857143
2407
- },
2408
- {
2409
- "current_steps": 401,
2410
- "loss": 0.5053,
2411
- "learning_rate": 3.543478260869565e-07,
2412
- "epoch": 3.5892857142857144
2413
- },
2414
- {
2415
- "current_steps": 402,
2416
- "loss": 0.6121,
2417
- "learning_rate": 3.5217391304347827e-07,
2418
- "epoch": 3.5982142857142856
2419
- },
2420
- {
2421
- "current_steps": 403,
2422
- "loss": 0.5648,
2423
- "learning_rate": 3.5e-07,
2424
- "epoch": 3.607142857142857
2425
- },
2426
- {
2427
- "current_steps": 404,
2428
- "loss": 0.6023,
2429
- "learning_rate": 3.478260869565217e-07,
2430
- "epoch": 3.616071428571429
2431
- },
2432
- {
2433
- "current_steps": 405,
2434
- "loss": 0.7843,
2435
- "learning_rate": 3.4565217391304346e-07,
2436
- "epoch": 3.625
2437
- },
2438
- {
2439
- "current_steps": 406,
2440
- "loss": 0.6902,
2441
- "learning_rate": 3.4347826086956523e-07,
2442
- "epoch": 3.633928571428571
2443
- },
2444
- {
2445
- "current_steps": 407,
2446
- "loss": 0.6103,
2447
- "learning_rate": 3.413043478260869e-07,
2448
- "epoch": 3.642857142857143
2449
- },
2450
- {
2451
- "current_steps": 408,
2452
- "loss": 0.759,
2453
- "learning_rate": 3.3913043478260866e-07,
2454
- "epoch": 3.6517857142857144
2455
- },
2456
- {
2457
- "current_steps": 409,
2458
- "loss": 0.7823,
2459
- "learning_rate": 3.369565217391304e-07,
2460
- "epoch": 3.6607142857142856
2461
- },
2462
- {
2463
- "current_steps": 410,
2464
- "loss": 0.8021,
2465
- "learning_rate": 3.347826086956522e-07,
2466
- "epoch": 3.669642857142857
2467
- },
2468
- {
2469
- "current_steps": 411,
2470
- "loss": 0.5927,
2471
- "learning_rate": 3.326086956521739e-07,
2472
- "epoch": 3.678571428571429
2473
- },
2474
- {
2475
- "current_steps": 412,
2476
- "loss": 0.6503,
2477
- "learning_rate": 3.304347826086956e-07,
2478
- "epoch": 3.6875
2479
- },
2480
- {
2481
- "current_steps": 413,
2482
- "loss": 0.886,
2483
- "learning_rate": 3.282608695652174e-07,
2484
- "epoch": 3.696428571428571
2485
- },
2486
- {
2487
- "current_steps": 414,
2488
- "loss": 0.6331,
2489
- "learning_rate": 3.260869565217391e-07,
2490
- "epoch": 3.705357142857143
2491
- },
2492
- {
2493
- "current_steps": 415,
2494
- "loss": 0.7633,
2495
- "learning_rate": 3.2391304347826087e-07,
2496
- "epoch": 3.7142857142857144
2497
- },
2498
- {
2499
- "current_steps": 416,
2500
- "loss": 0.6538,
2501
- "learning_rate": 3.217391304347826e-07,
2502
- "epoch": 3.7232142857142856
2503
- },
2504
- {
2505
- "current_steps": 417,
2506
- "loss": 0.6156,
2507
- "learning_rate": 3.195652173913043e-07,
2508
- "epoch": 3.732142857142857
2509
- },
2510
- {
2511
- "current_steps": 418,
2512
- "loss": 0.6973,
2513
- "learning_rate": 3.1739130434782606e-07,
2514
- "epoch": 3.741071428571429
2515
- },
2516
- {
2517
- "current_steps": 419,
2518
- "loss": 0.6521,
2519
- "learning_rate": 3.1521739130434783e-07,
2520
- "epoch": 3.75
2521
- },
2522
- {
2523
- "current_steps": 420,
2524
- "loss": 0.6931,
2525
- "learning_rate": 3.130434782608696e-07,
2526
- "epoch": 3.758928571428571
2527
- },
2528
- {
2529
- "current_steps": 421,
2530
- "loss": 0.8192,
2531
- "learning_rate": 3.1086956521739126e-07,
2532
- "epoch": 3.767857142857143
2533
- },
2534
- {
2535
- "current_steps": 422,
2536
- "loss": 0.5986,
2537
- "learning_rate": 3.08695652173913e-07,
2538
- "epoch": 3.7767857142857144
2539
- },
2540
- {
2541
- "current_steps": 423,
2542
- "loss": 0.9986,
2543
- "learning_rate": 3.065217391304348e-07,
2544
- "epoch": 3.7857142857142856
2545
- },
2546
- {
2547
- "current_steps": 424,
2548
- "loss": 0.7645,
2549
- "learning_rate": 3.043478260869565e-07,
2550
- "epoch": 3.794642857142857
2551
- },
2552
- {
2553
- "current_steps": 425,
2554
- "loss": 0.6489,
2555
- "learning_rate": 3.021739130434782e-07,
2556
- "epoch": 3.803571428571429
2557
- },
2558
- {
2559
- "current_steps": 426,
2560
- "loss": 0.5974,
2561
- "learning_rate": 3e-07,
2562
- "epoch": 3.8125
2563
- },
2564
- {
2565
- "current_steps": 427,
2566
- "loss": 0.7392,
2567
- "learning_rate": 2.9782608695652175e-07,
2568
- "epoch": 3.821428571428571
2569
- },
2570
- {
2571
- "current_steps": 428,
2572
- "loss": 0.7813,
2573
- "learning_rate": 2.9565217391304347e-07,
2574
- "epoch": 3.830357142857143
2575
- },
2576
- {
2577
- "current_steps": 429,
2578
- "loss": 0.7818,
2579
- "learning_rate": 2.9347826086956523e-07,
2580
- "epoch": 3.8392857142857144
2581
- },
2582
- {
2583
- "current_steps": 430,
2584
- "loss": 1.0693,
2585
- "learning_rate": 2.9130434782608695e-07,
2586
- "epoch": 3.8482142857142856
2587
- },
2588
- {
2589
- "current_steps": 431,
2590
- "loss": 0.6324,
2591
- "learning_rate": 2.8913043478260866e-07,
2592
- "epoch": 3.857142857142857
2593
- },
2594
- {
2595
- "current_steps": 432,
2596
- "loss": 0.5228,
2597
- "learning_rate": 2.8695652173913043e-07,
2598
- "epoch": 3.866071428571429
2599
- },
2600
- {
2601
- "current_steps": 433,
2602
- "loss": 0.6631,
2603
- "learning_rate": 2.847826086956522e-07,
2604
- "epoch": 3.875
2605
- },
2606
- {
2607
- "current_steps": 434,
2608
- "loss": 0.6685,
2609
- "learning_rate": 2.8260869565217386e-07,
2610
- "epoch": 3.883928571428571
2611
- },
2612
- {
2613
- "current_steps": 435,
2614
- "loss": 0.6566,
2615
- "learning_rate": 2.804347826086956e-07,
2616
- "epoch": 3.892857142857143
2617
- },
2618
- {
2619
- "current_steps": 436,
2620
- "loss": 0.6169,
2621
- "learning_rate": 2.782608695652174e-07,
2622
- "epoch": 3.9017857142857144
2623
- },
2624
- {
2625
- "current_steps": 437,
2626
- "loss": 0.5012,
2627
- "learning_rate": 2.7608695652173916e-07,
2628
- "epoch": 3.9107142857142856
2629
- },
2630
- {
2631
- "current_steps": 438,
2632
- "loss": 0.637,
2633
- "learning_rate": 2.739130434782608e-07,
2634
- "epoch": 3.919642857142857
2635
- },
2636
- {
2637
- "current_steps": 439,
2638
- "loss": 0.7777,
2639
- "learning_rate": 2.717391304347826e-07,
2640
- "epoch": 3.928571428571429
2641
- },
2642
- {
2643
- "current_steps": 440,
2644
- "loss": 0.6963,
2645
- "learning_rate": 2.6956521739130435e-07,
2646
- "epoch": 3.9375
2647
- },
2648
- {
2649
- "current_steps": 441,
2650
- "loss": 0.5398,
2651
- "learning_rate": 2.6739130434782607e-07,
2652
- "epoch": 3.946428571428571
2653
- },
2654
- {
2655
- "current_steps": 442,
2656
- "loss": 1.0029,
2657
- "learning_rate": 2.6521739130434783e-07,
2658
- "epoch": 3.955357142857143
2659
- },
2660
- {
2661
- "current_steps": 443,
2662
- "loss": 0.8166,
2663
- "learning_rate": 2.6304347826086955e-07,
2664
- "epoch": 3.9642857142857144
2665
- },
2666
- {
2667
- "current_steps": 444,
2668
- "loss": 0.8981,
2669
- "learning_rate": 2.6086956521739126e-07,
2670
- "epoch": 3.9732142857142856
2671
- },
2672
- {
2673
- "current_steps": 445,
2674
- "loss": 0.536,
2675
- "learning_rate": 2.5869565217391303e-07,
2676
- "epoch": 3.982142857142857
2677
- },
2678
- {
2679
- "current_steps": 446,
2680
- "loss": 0.7719,
2681
- "learning_rate": 2.565217391304348e-07,
2682
- "epoch": 3.991071428571429
2683
- },
2684
- {
2685
- "current_steps": 447,
2686
- "loss": 3.9574,
2687
- "learning_rate": 2.565217391304348e-07,
2688
- "epoch": 4.0
2689
- },
2690
- {
2691
- "current_steps": 448,
2692
- "loss": 0.6567,
2693
- "learning_rate": 2.543478260869565e-07,
2694
- "epoch": 4.008928571428571
2695
- },
2696
- {
2697
- "current_steps": 449,
2698
- "loss": 0.8622,
2699
- "learning_rate": 2.521739130434782e-07,
2700
- "epoch": 4.017857142857143
2701
- },
2702
- {
2703
- "current_steps": 450,
2704
- "loss": 0.5737,
2705
- "learning_rate": 2.5e-07,
2706
- "epoch": 4.026785714285714
2707
- },
2708
- {
2709
- "current_steps": 451,
2710
- "loss": 0.736,
2711
- "learning_rate": 2.4782608695652176e-07,
2712
- "epoch": 4.035714285714286
2713
- },
2714
- {
2715
- "current_steps": 452,
2716
- "loss": 0.8457,
2717
- "learning_rate": 2.4565217391304347e-07,
2718
- "epoch": 4.044642857142857
2719
- },
2720
- {
2721
- "current_steps": 453,
2722
- "loss": 0.7416,
2723
- "learning_rate": 2.4347826086956524e-07,
2724
- "epoch": 4.053571428571429
2725
- },
2726
- {
2727
- "current_steps": 454,
2728
- "loss": 1.0355,
2729
- "learning_rate": 2.4130434782608695e-07,
2730
- "epoch": 4.0625
2731
- },
2732
- {
2733
- "current_steps": 455,
2734
- "loss": 0.7162,
2735
- "learning_rate": 2.391304347826087e-07,
2736
- "epoch": 4.071428571428571
2737
- },
2738
- {
2739
- "current_steps": 456,
2740
- "loss": 0.8163,
2741
- "learning_rate": 2.369565217391304e-07,
2742
- "epoch": 4.080357142857143
2743
- },
2744
- {
2745
- "current_steps": 457,
2746
- "loss": 0.5188,
2747
- "learning_rate": 2.3478260869565217e-07,
2748
- "epoch": 4.089285714285714
2749
- },
2750
- {
2751
- "current_steps": 458,
2752
- "loss": 0.9544,
2753
- "learning_rate": 2.3260869565217389e-07,
2754
- "epoch": 4.098214285714286
2755
- },
2756
- {
2757
- "current_steps": 459,
2758
- "loss": 0.6205,
2759
- "learning_rate": 2.3043478260869565e-07,
2760
- "epoch": 4.107142857142857
2761
- },
2762
- {
2763
- "current_steps": 460,
2764
- "loss": 0.6643,
2765
- "learning_rate": 2.2826086956521737e-07,
2766
- "epoch": 4.116071428571429
2767
- },
2768
- {
2769
- "current_steps": 461,
2770
- "loss": 0.6465,
2771
- "learning_rate": 2.260869565217391e-07,
2772
- "epoch": 4.125
2773
- },
2774
- {
2775
- "current_steps": 462,
2776
- "loss": 0.6697,
2777
- "learning_rate": 2.2391304347826087e-07,
2778
- "epoch": 4.133928571428571
2779
- },
2780
- {
2781
- "current_steps": 463,
2782
- "loss": 0.7041,
2783
- "learning_rate": 2.217391304347826e-07,
2784
- "epoch": 4.142857142857143
2785
- },
2786
- {
2787
- "current_steps": 464,
2788
- "loss": 0.802,
2789
- "learning_rate": 2.1956521739130435e-07,
2790
- "epoch": 4.151785714285714
2791
- },
2792
- {
2793
- "current_steps": 465,
2794
- "loss": 0.623,
2795
- "learning_rate": 2.1739130434782607e-07,
2796
- "epoch": 4.160714285714286
2797
- },
2798
- {
2799
- "current_steps": 466,
2800
- "loss": 0.6071,
2801
- "learning_rate": 2.1521739130434783e-07,
2802
- "epoch": 4.169642857142857
2803
- },
2804
- {
2805
- "current_steps": 467,
2806
- "loss": 0.718,
2807
- "learning_rate": 2.1304347826086955e-07,
2808
- "epoch": 4.178571428571429
2809
- },
2810
- {
2811
- "current_steps": 468,
2812
- "loss": 0.6337,
2813
- "learning_rate": 2.108695652173913e-07,
2814
- "epoch": 4.1875
2815
- },
2816
- {
2817
- "current_steps": 469,
2818
- "loss": 0.5689,
2819
- "learning_rate": 2.0869565217391303e-07,
2820
- "epoch": 4.196428571428571
2821
- },
2822
- {
2823
- "current_steps": 470,
2824
- "loss": 0.62,
2825
- "learning_rate": 2.0652173913043477e-07,
2826
- "epoch": 4.205357142857143
2827
- },
2828
- {
2829
- "current_steps": 471,
2830
- "loss": 1.0191,
2831
- "learning_rate": 2.0434782608695654e-07,
2832
- "epoch": 4.214285714285714
2833
- },
2834
- {
2835
- "current_steps": 472,
2836
- "loss": 0.6678,
2837
- "learning_rate": 2.0217391304347825e-07,
2838
- "epoch": 4.223214285714286
2839
- },
2840
- {
2841
- "current_steps": 473,
2842
- "loss": 0.6296,
2843
- "learning_rate": 2e-07,
2844
- "epoch": 4.232142857142857
2845
- },
2846
- {
2847
- "current_steps": 474,
2848
- "loss": 0.884,
2849
- "learning_rate": 1.9782608695652173e-07,
2850
- "epoch": 4.241071428571429
2851
- },
2852
- {
2853
- "current_steps": 475,
2854
- "loss": 0.7207,
2855
- "learning_rate": 1.9565217391304347e-07,
2856
- "epoch": 4.25
2857
- },
2858
- {
2859
- "current_steps": 476,
2860
- "loss": 0.6856,
2861
- "learning_rate": 1.934782608695652e-07,
2862
- "epoch": 4.258928571428571
2863
- },
2864
- {
2865
- "current_steps": 477,
2866
- "loss": 0.6314,
2867
- "learning_rate": 1.9130434782608695e-07,
2868
- "epoch": 4.267857142857143
2869
- },
2870
- {
2871
- "current_steps": 478,
2872
- "loss": 0.5759,
2873
- "learning_rate": 1.8913043478260867e-07,
2874
- "epoch": 4.276785714285714
2875
- },
2876
- {
2877
- "current_steps": 479,
2878
- "loss": 0.6925,
2879
- "learning_rate": 1.8695652173913043e-07,
2880
- "epoch": 4.285714285714286
2881
- },
2882
- {
2883
- "current_steps": 480,
2884
- "loss": 0.6237,
2885
- "learning_rate": 1.8478260869565215e-07,
2886
- "epoch": 4.294642857142857
2887
- },
2888
- {
2889
- "current_steps": 481,
2890
- "loss": 0.6666,
2891
- "learning_rate": 1.8260869565217391e-07,
2892
- "epoch": 4.303571428571429
2893
- },
2894
- {
2895
- "current_steps": 482,
2896
- "loss": 0.709,
2897
- "learning_rate": 1.8043478260869565e-07,
2898
- "epoch": 4.3125
2899
- },
2900
- {
2901
- "current_steps": 483,
2902
- "loss": 0.8078,
2903
- "learning_rate": 1.7826086956521737e-07,
2904
- "epoch": 4.321428571428571
2905
- },
2906
- {
2907
- "current_steps": 484,
2908
- "loss": 0.7355,
2909
- "learning_rate": 1.7608695652173914e-07,
2910
- "epoch": 4.330357142857143
2911
- },
2912
- {
2913
- "current_steps": 485,
2914
- "loss": 0.8901,
2915
- "learning_rate": 1.7391304347826085e-07,
2916
- "epoch": 4.339285714285714
2917
- },
2918
- {
2919
- "current_steps": 486,
2920
- "loss": 0.565,
2921
- "learning_rate": 1.7173913043478262e-07,
2922
- "epoch": 4.348214285714286
2923
- },
2924
- {
2925
- "current_steps": 487,
2926
- "loss": 0.6396,
2927
- "learning_rate": 1.6956521739130433e-07,
2928
- "epoch": 4.357142857142857
2929
- },
2930
- {
2931
- "current_steps": 488,
2932
- "loss": 0.531,
2933
- "learning_rate": 1.673913043478261e-07,
2934
- "epoch": 4.366071428571429
2935
- },
2936
- {
2937
- "current_steps": 489,
2938
- "loss": 0.5726,
2939
- "learning_rate": 1.652173913043478e-07,
2940
- "epoch": 4.375
2941
- },
2942
- {
2943
- "current_steps": 490,
2944
- "loss": 0.602,
2945
- "learning_rate": 1.6304347826086955e-07,
2946
- "epoch": 4.383928571428571
2947
- },
2948
- {
2949
- "current_steps": 491,
2950
- "loss": 0.7032,
2951
- "learning_rate": 1.608695652173913e-07,
2952
- "epoch": 4.392857142857143
2953
- },
2954
- {
2955
- "current_steps": 492,
2956
- "loss": 0.8984,
2957
- "learning_rate": 1.5869565217391303e-07,
2958
- "epoch": 4.401785714285714
2959
- },
2960
- {
2961
- "current_steps": 493,
2962
- "loss": 0.5913,
2963
- "learning_rate": 1.565217391304348e-07,
2964
- "epoch": 4.410714285714286
2965
- },
2966
- {
2967
- "current_steps": 494,
2968
- "loss": 0.6021,
2969
- "learning_rate": 1.543478260869565e-07,
2970
- "epoch": 4.419642857142857
2971
- },
2972
- {
2973
- "current_steps": 495,
2974
- "loss": 0.7554,
2975
- "learning_rate": 1.5217391304347825e-07,
2976
- "epoch": 4.428571428571429
2977
- },
2978
- {
2979
- "current_steps": 496,
2980
- "loss": 0.8683,
2981
- "learning_rate": 1.5e-07,
2982
- "epoch": 4.4375
2983
- },
2984
- {
2985
- "current_steps": 497,
2986
- "loss": 0.5465,
2987
- "learning_rate": 1.4782608695652173e-07,
2988
- "epoch": 4.446428571428571
2989
- },
2990
- {
2991
- "current_steps": 498,
2992
- "loss": 0.6903,
2993
- "learning_rate": 1.4565217391304347e-07,
2994
- "epoch": 4.455357142857143
2995
- },
2996
- {
2997
- "current_steps": 499,
2998
- "loss": 0.4821,
2999
- "learning_rate": 1.4347826086956521e-07,
3000
- "epoch": 4.464285714285714
3001
- },
3002
- {
3003
- "current_steps": 500,
3004
- "loss": 0.6731,
3005
- "learning_rate": 1.4130434782608693e-07,
3006
- "epoch": 4.473214285714286
3007
- },
3008
- {
3009
- "current_steps": 501,
3010
- "loss": 0.7423,
3011
- "learning_rate": 1.391304347826087e-07,
3012
- "epoch": 4.482142857142857
3013
- },
3014
- {
3015
- "current_steps": 502,
3016
- "loss": 0.6967,
3017
- "learning_rate": 1.369565217391304e-07,
3018
- "epoch": 4.491071428571429
3019
- },
3020
- {
3021
- "current_steps": 503,
3022
- "loss": 0.5918,
3023
- "learning_rate": 1.3478260869565218e-07,
3024
- "epoch": 4.5
3025
- },
3026
- {
3027
- "current_steps": 504,
3028
- "loss": 0.8028,
3029
- "learning_rate": 1.3260869565217392e-07,
3030
- "epoch": 4.508928571428571
3031
- },
3032
- {
3033
- "current_steps": 505,
3034
- "loss": 0.9578,
3035
- "learning_rate": 1.3043478260869563e-07,
3036
- "epoch": 4.517857142857143
3037
- },
3038
- {
3039
- "current_steps": 506,
3040
- "loss": 0.6187,
3041
- "learning_rate": 1.282608695652174e-07,
3042
- "epoch": 4.526785714285714
3043
- },
3044
- {
3045
- "current_steps": 507,
3046
- "loss": 0.6426,
3047
- "learning_rate": 1.260869565217391e-07,
3048
- "epoch": 4.535714285714286
3049
- },
3050
- {
3051
- "current_steps": 508,
3052
- "loss": 0.5835,
3053
- "learning_rate": 1.2391304347826088e-07,
3054
- "epoch": 4.544642857142857
3055
- },
3056
- {
3057
- "current_steps": 509,
3058
- "loss": 0.7218,
3059
- "learning_rate": 1.2173913043478262e-07,
3060
- "epoch": 4.553571428571429
3061
- },
3062
- {
3063
- "current_steps": 510,
3064
- "loss": 0.812,
3065
- "learning_rate": 1.1956521739130436e-07,
3066
- "epoch": 4.5625
3067
- },
3068
- {
3069
- "current_steps": 511,
3070
- "loss": 0.5526,
3071
- "learning_rate": 1.1739130434782609e-07,
3072
- "epoch": 4.571428571428571
3073
- },
3074
- {
3075
- "current_steps": 512,
3076
- "loss": 0.8554,
3077
- "learning_rate": 1.1521739130434783e-07,
3078
- "epoch": 4.580357142857143
3079
- },
3080
- {
3081
- "current_steps": 513,
3082
- "loss": 0.7209,
3083
- "learning_rate": 1.1304347826086955e-07,
3084
- "epoch": 4.589285714285714
3085
- },
3086
- {
3087
- "current_steps": 514,
3088
- "loss": 0.7154,
3089
- "learning_rate": 1.108695652173913e-07,
3090
- "epoch": 4.598214285714286
3091
- },
3092
- {
3093
- "current_steps": 515,
3094
- "loss": 0.7147,
3095
- "learning_rate": 1.0869565217391303e-07,
3096
- "epoch": 4.607142857142857
3097
- },
3098
- {
3099
- "current_steps": 516,
3100
- "loss": 0.6997,
3101
- "learning_rate": 1.0652173913043477e-07,
3102
- "epoch": 4.616071428571429
3103
- },
3104
- {
3105
- "current_steps": 517,
3106
- "loss": 0.6283,
3107
- "learning_rate": 1.0434782608695651e-07,
3108
- "epoch": 4.625
3109
- },
3110
- {
3111
- "current_steps": 518,
3112
- "loss": 0.6279,
3113
- "learning_rate": 1.0217391304347827e-07,
3114
- "epoch": 4.633928571428571
3115
- },
3116
- {
3117
- "current_steps": 519,
3118
- "loss": 0.8152,
3119
- "learning_rate": 1e-07,
3120
- "epoch": 4.642857142857143
3121
- },
3122
- {
3123
- "current_steps": 520,
3124
- "loss": 0.6155,
3125
- "learning_rate": 9.782608695652174e-08,
3126
- "epoch": 4.651785714285714
3127
- },
3128
- {
3129
- "current_steps": 521,
3130
- "loss": 0.4727,
3131
- "learning_rate": 9.565217391304348e-08,
3132
- "epoch": 4.660714285714286
3133
- },
3134
- {
3135
- "current_steps": 522,
3136
- "loss": 0.7457,
3137
- "learning_rate": 9.347826086956522e-08,
3138
- "epoch": 4.669642857142857
3139
- },
3140
- {
3141
- "current_steps": 523,
3142
- "loss": 0.9712,
3143
- "learning_rate": 9.130434782608696e-08,
3144
- "epoch": 4.678571428571429
3145
- },
3146
- {
3147
- "current_steps": 524,
3148
- "loss": 0.7759,
3149
- "learning_rate": 8.913043478260868e-08,
3150
- "epoch": 4.6875
3151
- },
3152
- {
3153
- "current_steps": 525,
3154
- "loss": 0.6597,
3155
- "learning_rate": 8.695652173913042e-08,
3156
- "epoch": 4.696428571428571
3157
- },
3158
- {
3159
- "current_steps": 526,
3160
- "loss": 0.6258,
3161
- "learning_rate": 8.478260869565216e-08,
3162
- "epoch": 4.705357142857143
3163
- },
3164
- {
3165
- "current_steps": 527,
3166
- "loss": 0.6443,
3167
- "learning_rate": 8.26086956521739e-08,
3168
- "epoch": 4.714285714285714
3169
- },
3170
- {
3171
- "current_steps": 528,
3172
- "loss": 0.5547,
3173
- "learning_rate": 8.043478260869565e-08,
3174
- "epoch": 4.723214285714286
3175
- },
3176
- {
3177
- "current_steps": 529,
3178
- "loss": 0.7149,
3179
- "learning_rate": 7.82608695652174e-08,
3180
- "epoch": 4.732142857142857
3181
- },
3182
- {
3183
- "current_steps": 530,
3184
- "loss": 0.6138,
3185
- "learning_rate": 7.608695652173913e-08,
3186
- "epoch": 4.741071428571429
3187
- },
3188
- {
3189
- "current_steps": 531,
3190
- "loss": 0.8032,
3191
- "learning_rate": 7.391304347826087e-08,
3192
- "epoch": 4.75
3193
- },
3194
- {
3195
- "current_steps": 532,
3196
- "loss": 0.7141,
3197
- "learning_rate": 7.173913043478261e-08,
3198
- "epoch": 4.758928571428571
3199
- },
3200
- {
3201
- "current_steps": 533,
3202
- "loss": 0.724,
3203
- "learning_rate": 6.956521739130435e-08,
3204
- "epoch": 4.767857142857143
3205
- },
3206
- {
3207
- "current_steps": 534,
3208
- "loss": 0.7707,
3209
- "learning_rate": 6.739130434782609e-08,
3210
- "epoch": 4.776785714285714
3211
- },
3212
- {
3213
- "current_steps": 535,
3214
- "loss": 0.6754,
3215
- "learning_rate": 6.521739130434782e-08,
3216
- "epoch": 4.785714285714286
3217
- },
3218
- {
3219
- "current_steps": 536,
3220
- "loss": 0.5861,
3221
- "learning_rate": 6.304347826086956e-08,
3222
- "epoch": 4.794642857142857
3223
- },
3224
- {
3225
- "current_steps": 537,
3226
- "loss": 0.8395,
3227
- "learning_rate": 6.086956521739131e-08,
3228
- "epoch": 4.803571428571429
3229
- },
3230
- {
3231
- "current_steps": 538,
3232
- "loss": 0.7642,
3233
- "learning_rate": 5.869565217391304e-08,
3234
- "epoch": 4.8125
3235
- },
3236
- {
3237
- "current_steps": 539,
3238
- "loss": 0.735,
3239
- "learning_rate": 5.6521739130434777e-08,
3240
- "epoch": 4.821428571428571
3241
- },
3242
- {
3243
- "current_steps": 540,
3244
- "loss": 0.6153,
3245
- "learning_rate": 5.434782608695652e-08,
3246
- "epoch": 4.830357142857143
3247
- },
3248
- {
3249
- "current_steps": 541,
3250
- "loss": 0.6299,
3251
- "learning_rate": 5.217391304347826e-08,
3252
- "epoch": 4.839285714285714
3253
- },
3254
- {
3255
- "current_steps": 542,
3256
- "loss": 1.078,
3257
- "learning_rate": 5e-08,
3258
- "epoch": 4.848214285714286
3259
- },
3260
- {
3261
- "current_steps": 543,
3262
- "loss": 0.7314,
3263
- "learning_rate": 4.782608695652174e-08,
3264
- "epoch": 4.857142857142857
3265
- },
3266
- {
3267
- "current_steps": 544,
3268
- "loss": 0.8515,
3269
- "learning_rate": 4.565217391304348e-08,
3270
- "epoch": 4.866071428571429
3271
- },
3272
- {
3273
- "current_steps": 545,
3274
- "loss": 0.5401,
3275
- "learning_rate": 4.347826086956521e-08,
3276
- "epoch": 4.875
3277
- },
3278
- {
3279
- "current_steps": 546,
3280
- "loss": 0.7315,
3281
- "learning_rate": 4.130434782608695e-08,
3282
- "epoch": 4.883928571428571
3283
- },
3284
- {
3285
- "current_steps": 547,
3286
- "loss": 0.6113,
3287
- "learning_rate": 3.91304347826087e-08,
3288
- "epoch": 4.892857142857143
3289
- },
3290
- {
3291
- "current_steps": 548,
3292
- "loss": 0.6239,
3293
- "learning_rate": 3.6956521739130433e-08,
3294
- "epoch": 4.901785714285714
3295
- },
3296
- {
3297
- "current_steps": 549,
3298
- "loss": 0.7292,
3299
- "learning_rate": 3.4782608695652174e-08,
3300
- "epoch": 4.910714285714286
3301
- },
3302
- {
3303
- "current_steps": 550,
3304
- "loss": 0.5297,
3305
- "learning_rate": 3.260869565217391e-08,
3306
- "epoch": 4.919642857142857
3307
- },
3308
- {
3309
- "current_steps": 551,
3310
- "loss": 0.6269,
3311
- "learning_rate": 3.0434782608695655e-08,
3312
- "epoch": 4.928571428571429
3313
- },
3314
- {
3315
- "current_steps": 552,
3316
- "loss": 0.6724,
3317
- "learning_rate": 2.8260869565217388e-08,
3318
- "epoch": 4.9375
3319
- },
3320
- {
3321
- "current_steps": 553,
3322
- "loss": 0.5109,
3323
- "learning_rate": 2.608695652173913e-08,
3324
- "epoch": 4.946428571428571
3325
- },
3326
- {
3327
- "current_steps": 554,
3328
- "loss": 0.9446,
3329
- "learning_rate": 2.391304347826087e-08,
3330
- "epoch": 4.955357142857143
3331
- },
3332
- {
3333
- "current_steps": 555,
3334
- "loss": 0.6897,
3335
- "learning_rate": 2.1739130434782606e-08,
3336
- "epoch": 4.964285714285714
3337
- },
3338
- {
3339
- "current_steps": 556,
3340
- "loss": 0.5511,
3341
- "learning_rate": 1.956521739130435e-08,
3342
- "epoch": 4.973214285714286
3343
- },
3344
- {
3345
- "current_steps": 557,
3346
- "loss": 0.7246,
3347
- "learning_rate": 1.7391304347826087e-08,
3348
- "epoch": 4.982142857142857
3349
- },
3350
- {
3351
- "current_steps": 558,
3352
- "loss": 0.6332,
3353
- "learning_rate": 1.5217391304347827e-08,
3354
- "epoch": 4.991071428571429
3355
- },
3356
- {
3357
- "current_steps": 559,
3358
- "loss": 1.0499,
3359
- "learning_rate": 1.3043478260869564e-08,
3360
- "epoch": 5.0
3361
- },
3362
- {
3363
- "current_steps": 559,
3364
- "loss": 1.0499,
3365
- "learning_rate": 1.3043478260869564e-08,
3366
- "epoch": 5.0
3367
- }
3368
- ]

aliceinwonderland/training_graph.png DELETED
Binary file (64.6 kB)
 
aliceinwonderland/training_log.json DELETED
@@ -1,19 +0,0 @@
- {
- "base_model_name": "Llama-2-13b-hf",
- "base_model_class": "LlamaForCausalLM",
- "base_loaded_in_4bit": true,
- "base_loaded_in_8bit": false,
- "projections": "q, v",
- "loss": 1.0499,
- "grad_norm": 5.645450592041016,
- "learning_rate": 1.3043478260869564e-08,
- "epoch": 5.0,
- "current_steps": 559,
- "current_steps_adjusted": 559,
- "epoch_adjusted": 5.0,
- "train_runtime": 1468.5439,
- "train_samples_per_second": 1.515,
- "train_steps_per_second": 0.381,
- "total_flos": 4.4012668649472e+16,
- "train_loss": 0.7355319578732763
- }

aliceinwonderland/training_parameters.json DELETED
@@ -1,37 +0,0 @@
- {
- "lora_name": "aliceinwonderland",
- "always_override": true,
- "save_steps": 0,
- "micro_batch_size": 4,
- "batch_size": 0,
- "epochs": 5,
- "learning_rate": "1e-6",
- "lr_scheduler_type": "linear",
- "lora_rank": 32,
- "lora_alpha": 64,
- "lora_dropout": 0.05,
- "cutoff_len": 256,
- "dataset": "None",
- "eval_dataset": "None",
- "format": "None",
- "eval_steps": 100,
- "raw_text_file": "aliceandwonderland",
- "higher_rank_limit": false,
- "warmup_steps": 100,
- "optimizer": "adamw_torch",
- "hard_cut_string": "\\n\\n\\n",
- "train_only_after": "",
- "stop_at_loss": 0,
- "add_eos_token": false,
- "min_chars": 20,
- "report_to": "None",
- "precize_slicing_overlap": true,
- "add_eos_token_type": "Every Block",
- "save_steps_under_loss": 1.8,
- "add_bos_token": true,
- "training_projection": "q-v",
- "sliding_window": false,
- "warmup_ratio": 0,
- "grad_accumulation": 1,
- "neft_noise_alpha": 0
- }

aliceinwonderland/training_prompt.json DELETED
@@ -1,3 +0,0 @@
- {
- "template_type": "raw_text"
- }