Upload folder using huggingface_hub
- Mistral-7B-Instruct-v0.3_int4_medmcqa_full_con_lr-0.0002_e-8_seq-512_lora-a-32-d-0.05-r-64_bs-1_gas-2_tf32-True_tunedata-portion-p-0.4-num-51190-sd-42/checkpoint-6423/README.md +202 -0
- Mistral-7B-Instruct-v0.3_int4_medmcqa_full_con_lr-0.0002_e-8_seq-512_lora-a-32-d-0.05-r-64_bs-1_gas-2_tf32-True_tunedata-portion-p-0.4-num-51190-sd-42/checkpoint-6423/adapter_config.json +29 -0
- Mistral-7B-Instruct-v0.3_int4_medmcqa_full_con_lr-0.0002_e-8_seq-512_lora-a-32-d-0.05-r-64_bs-1_gas-2_tf32-True_tunedata-portion-p-0.4-num-51190-sd-42/checkpoint-6423/adapter_model.safetensors +3 -0
- Mistral-7B-Instruct-v0.3_int4_medmcqa_full_con_lr-0.0002_e-8_seq-512_lora-a-32-d-0.05-r-64_bs-1_gas-2_tf32-True_tunedata-portion-p-0.4-num-51190-sd-42/checkpoint-6423/optimizer.pt +3 -0
- Mistral-7B-Instruct-v0.3_int4_medmcqa_full_con_lr-0.0002_e-8_seq-512_lora-a-32-d-0.05-r-64_bs-1_gas-2_tf32-True_tunedata-portion-p-0.4-num-51190-sd-42/checkpoint-6423/rng_state.pth +3 -0
- Mistral-7B-Instruct-v0.3_int4_medmcqa_full_con_lr-0.0002_e-8_seq-512_lora-a-32-d-0.05-r-64_bs-1_gas-2_tf32-True_tunedata-portion-p-0.4-num-51190-sd-42/checkpoint-6423/scheduler.pt +3 -0
- Mistral-7B-Instruct-v0.3_int4_medmcqa_full_con_lr-0.0002_e-8_seq-512_lora-a-32-d-0.05-r-64_bs-1_gas-2_tf32-True_tunedata-portion-p-0.4-num-51190-sd-42/checkpoint-6423/special_tokens_map.json +24 -0
- Mistral-7B-Instruct-v0.3_int4_medmcqa_full_con_lr-0.0002_e-8_seq-512_lora-a-32-d-0.05-r-64_bs-1_gas-2_tf32-True_tunedata-portion-p-0.4-num-51190-sd-42/checkpoint-6423/tokenizer.json +0 -0
- Mistral-7B-Instruct-v0.3_int4_medmcqa_full_con_lr-0.0002_e-8_seq-512_lora-a-32-d-0.05-r-64_bs-1_gas-2_tf32-True_tunedata-portion-p-0.4-num-51190-sd-42/checkpoint-6423/tokenizer.model +3 -0
- Mistral-7B-Instruct-v0.3_int4_medmcqa_full_con_lr-0.0002_e-8_seq-512_lora-a-32-d-0.05-r-64_bs-1_gas-2_tf32-True_tunedata-portion-p-0.4-num-51190-sd-42/checkpoint-6423/tokenizer_config.json +0 -0
- Mistral-7B-Instruct-v0.3_int4_medmcqa_full_con_lr-0.0002_e-8_seq-512_lora-a-32-d-0.05-r-64_bs-1_gas-2_tf32-True_tunedata-portion-p-0.4-num-51190-sd-42/checkpoint-6423/trainer_state.json +0 -0
- Mistral-7B-Instruct-v0.3_int4_medmcqa_full_con_lr-0.0002_e-8_seq-512_lora-a-32-d-0.05-r-64_bs-1_gas-2_tf32-True_tunedata-portion-p-0.4-num-51190-sd-42/checkpoint-6423/training_args.bin +3 -0
- Mistral-7B-Instruct-v0.3_int4_medmcqa_full_con_lr-0.0002_e-8_seq-512_lora-a-32-d-0.05-r-64_bs-1_gas-2_tf32-True_tunedata-portion-p-0.4-num-51190-sd-42/training_log.jsonl +1 -0
Mistral-7B-Instruct-v0.3_int4_medmcqa_full_con_lr-0.0002_e-8_seq-512_lora-a-32-d-0.05-r-64_bs-1_gas-2_tf32-True_tunedata-portion-p-0.4-num-51190-sd-42/checkpoint-6423/README.md
ADDED
@@ -0,0 +1,202 @@
---
base_model: mistralai/Mistral-7B-Instruct-v0.3
library_name: peft
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->



## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->



- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[More Information Needed]

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

[More Information Needed]


#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary



## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]
### Framework versions

- PEFT 0.13.1
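The model card's "How to Get Started" section is still a placeholder. A minimal loading sketch under stated assumptions — the local adapter directory name `checkpoint-6423` mirrors this commit but is hypothetical, and `transformers`, `peft`, and `bitsandbytes` must be installed; the 4-bit quantization mirrors the "int4" tag in the run name:

```python
"""Loading sketch for this LoRA checkpoint (paths and 4-bit setup are assumptions)."""

BASE_MODEL = "mistralai/Mistral-7B-Instruct-v0.3"
ADAPTER_DIR = "checkpoint-6423"  # hypothetical local path to the adapter files


def load_model(base_model: str = BASE_MODEL, adapter_dir: str = ADAPTER_DIR):
    # Imports live inside the function so this module stays importable
    # without the heavyweight dependencies installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
    from peft import PeftModel

    # 4-bit base model, matching the "int4" tag of the training run.
    bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
    model = AutoModelForCausalLM.from_pretrained(
        base_model, quantization_config=bnb, device_map="auto"
    )
    model = PeftModel.from_pretrained(model, adapter_dir)  # attach LoRA weights
    tokenizer = AutoTokenizer.from_pretrained(adapter_dir)  # checkpoint ships tokenizer files
    return model, tokenizer


if __name__ == "__main__":
    model, tokenizer = load_model()
```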
Mistral-7B-Instruct-v0.3_int4_medmcqa_full_con_lr-0.0002_e-8_seq-512_lora-a-32-d-0.05-r-64_bs-1_gas-2_tf32-True_tunedata-portion-p-0.4-num-51190-sd-42/checkpoint-6423/adapter_config.json
ADDED
@@ -0,0 +1,29 @@
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "mistralai/Mistral-7B-Instruct-v0.3",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 32,
  "lora_dropout": 0.05,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 64,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "q_proj",
    "v_proj"
  ],
  "task_type": "CAUSAL_LM",
  "use_dora": false,
  "use_rslora": false
}
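As a sanity check on this config: with Mistral-7B's layer shapes (stated as assumptions in the code below), `r = 64` on `q_proj` and `v_proj` implies an adapter close in size to the 109,069,176-byte `adapter_model.safetensors` recorded in this commit. A small sketch:

```python
# Estimate the LoRA adapter's parameter count from the adapter_config.json values.
# Mistral-7B shapes are assumptions here: hidden_size 4096, 32 decoder layers,
# and 1024-dim k/v projections (8 KV heads x 128 head_dim).
HIDDEN = 4096
N_LAYERS = 32
KV_DIM = 1024
R = 64  # "r" in adapter_config.json


def lora_params(in_dim: int, out_dim: int, r: int) -> int:
    # Each adapted Linear(in, out) gets two low-rank factors: A (r x in), B (out x r).
    return r * in_dim + out_dim * r


per_layer = lora_params(HIDDEN, HIDDEN, R) + lora_params(HIDDEN, KV_DIM, R)  # q_proj + v_proj
total = per_layer * N_LAYERS
print(total)      # 27262976 trainable parameters
print(total * 4)  # 109051904 bytes in fp32 -- the safetensors file adds only a small header
```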
Mistral-7B-Instruct-v0.3_int4_medmcqa_full_con_lr-0.0002_e-8_seq-512_lora-a-32-d-0.05-r-64_bs-1_gas-2_tf32-True_tunedata-portion-p-0.4-num-51190-sd-42/checkpoint-6423/adapter_model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0a5bbc58d19eb50c247845fd90a2989680a84ab0590d8cf6979faeb95b35b796
size 109069176
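The binary files in this commit are stored as Git LFS pointers like the one above (a `version` line, a `sha256` oid, and a byte `size`). A small, library-free sketch for parsing such a pointer and verifying a downloaded blob against it; the tiny stand-in blob is for illustration only (the real files are fetched via `git lfs pull` or the Hub client):

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse a git-lfs pointer file into {"oid": ..., "size": ...}."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    return {
        "oid": fields["oid"].removeprefix("sha256:"),
        "size": int(fields["size"]),
    }


def verify_blob(blob: bytes, pointer: dict) -> bool:
    """Check a downloaded blob against the pointer's byte size and sha256."""
    return (
        len(blob) == pointer["size"]
        and hashlib.sha256(blob).hexdigest() == pointer["oid"]
    )


# Usage with a stand-in blob and a pointer constructed to match it:
blob = b"hello"
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    f"oid sha256:{hashlib.sha256(blob).hexdigest()}\n"
    f"size {len(blob)}\n"
)
assert verify_blob(blob, pointer)
```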
Mistral-7B-Instruct-v0.3_int4_medmcqa_full_con_lr-0.0002_e-8_seq-512_lora-a-32-d-0.05-r-64_bs-1_gas-2_tf32-True_tunedata-portion-p-0.4-num-51190-sd-42/checkpoint-6423/optimizer.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7f7ce635182c91d81882839eca791ee99f791e5fa9868224eccb2814f4faf24d
size 55532666
Mistral-7B-Instruct-v0.3_int4_medmcqa_full_con_lr-0.0002_e-8_seq-512_lora-a-32-d-0.05-r-64_bs-1_gas-2_tf32-True_tunedata-portion-p-0.4-num-51190-sd-42/checkpoint-6423/rng_state.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:06581e61fd1a58568d3b9852340a8011370d2a4592e3cd7d1ccb189139f1b0f2
size 14244
Mistral-7B-Instruct-v0.3_int4_medmcqa_full_con_lr-0.0002_e-8_seq-512_lora-a-32-d-0.05-r-64_bs-1_gas-2_tf32-True_tunedata-portion-p-0.4-num-51190-sd-42/checkpoint-6423/scheduler.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:75be25b4c8bc91ab1af55884c6479e35984f30a4060ab91f248ff756e05b7471
size 1064
Mistral-7B-Instruct-v0.3_int4_medmcqa_full_con_lr-0.0002_e-8_seq-512_lora-a-32-d-0.05-r-64_bs-1_gas-2_tf32-True_tunedata-portion-p-0.4-num-51190-sd-42/checkpoint-6423/special_tokens_map.json
ADDED
@@ -0,0 +1,24 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "</s>",
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
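One detail worth noting in `special_tokens_map.json`: the run reuses the EOS token `</s>` as the padding token, a common workaround when a base tokenizer defines no dedicated pad token. A small sketch that reads such a map (a trimmed copy is embedded below, keeping only the fields used) and checks that convention:

```python
import json

# Trimmed copy of the special_tokens_map.json above (only the fields used here).
SPECIAL_TOKENS_MAP = json.loads("""
{
  "bos_token": {"content": "<s>"},
  "eos_token": {"content": "</s>"},
  "pad_token": "</s>",
  "unk_token": {"content": "<unk>"}
}
""")


def token_content(entry) -> str:
    # Map entries are either a bare string or an AddedToken-style dict.
    return entry if isinstance(entry, str) else entry["content"]


eos = token_content(SPECIAL_TOKENS_MAP["eos_token"])
pad = token_content(SPECIAL_TOKENS_MAP["pad_token"])
assert pad == eos == "</s>"  # padding reuses EOS, as in this checkpoint
```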
Mistral-7B-Instruct-v0.3_int4_medmcqa_full_con_lr-0.0002_e-8_seq-512_lora-a-32-d-0.05-r-64_bs-1_gas-2_tf32-True_tunedata-portion-p-0.4-num-51190-sd-42/checkpoint-6423/tokenizer.json
ADDED
The diff for this file is too large to render.
Mistral-7B-Instruct-v0.3_int4_medmcqa_full_con_lr-0.0002_e-8_seq-512_lora-a-32-d-0.05-r-64_bs-1_gas-2_tf32-True_tunedata-portion-p-0.4-num-51190-sd-42/checkpoint-6423/tokenizer.model
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:37f00374dea48658ee8f5d0f21895b9bc55cb0103939607c8185bfd1c6ca1f89
size 587404
Mistral-7B-Instruct-v0.3_int4_medmcqa_full_con_lr-0.0002_e-8_seq-512_lora-a-32-d-0.05-r-64_bs-1_gas-2_tf32-True_tunedata-portion-p-0.4-num-51190-sd-42/checkpoint-6423/tokenizer_config.json
ADDED
The diff for this file is too large to render.
Mistral-7B-Instruct-v0.3_int4_medmcqa_full_con_lr-0.0002_e-8_seq-512_lora-a-32-d-0.05-r-64_bs-1_gas-2_tf32-True_tunedata-portion-p-0.4-num-51190-sd-42/checkpoint-6423/trainer_state.json
ADDED
The diff for this file is too large to render.
Mistral-7B-Instruct-v0.3_int4_medmcqa_full_con_lr-0.0002_e-8_seq-512_lora-a-32-d-0.05-r-64_bs-1_gas-2_tf32-True_tunedata-portion-p-0.4-num-51190-sd-42/checkpoint-6423/training_args.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7bc228d88473a7726f9cdf402a5c8b6f865262f73489f565d3b9a71e23604269
size 5560
Mistral-7B-Instruct-v0.3_int4_medmcqa_full_con_lr-0.0002_e-8_seq-512_lora-a-32-d-0.05-r-64_bs-1_gas-2_tf32-True_tunedata-portion-p-0.4-num-51190-sd-42/training_log.jsonl
ADDED
@@ -0,0 +1 @@
{"epoch": 1.0, "step": 6423, "epoch_duration": 15985.253971338272, "total_accumulated_duration": 15985.253971338272, "gpu_info": {"GPU_0": "NVIDIA A100-PCIE-40GB"}, "memory_usage": {"avg_memory_usage": {"GPU_0": 4584.45361328125}, "peak_memory_usage": {"GPU_0": 5464.775390625}, "avg_memory_reserved": {"GPU_0": 5698.0}, "peak_memory_reserved": {"GPU_0": 5698.0}, "total_memory": {"GPU_0": 40444.375}}, "best_checkpoint_path": "N/A", "params": {"epochs": 8, "batch_size": 1, "learning_rate": 0.0002, "gradient_accumulation_steps": 2, "warmup_ratio": 0.03, "max_grad_norm": 0.3, "lora_alpha": 32, "lora_dropout": 0.05, "lora_r": 64, "tf32": true, "seed": 42}, "log_history": [{"loss": 1.9297, "grad_norm": 1.0625836849212646, "learning_rate": 0.0002, "epoch": 0.0015569048731122529, "step": 10}, {"loss": 1.7912, "grad_norm": 1.0186495780944824, "learning_rate": 0.0002, "epoch": 0.0031138097462245057, "step": 20}, {"loss": 1.6922, "grad_norm": 0.9173291325569153, "learning_rate": 0.0002, "epoch": 0.004670714619336758, "step": 30}, {"loss": 1.3488, "grad_norm": 0.7627642154693604, "learning_rate": 0.0002, "epoch": 0.0062276194924490115, "step": 40}, {"loss": 1.1782, "grad_norm": 0.7223761081695557, "learning_rate": 0.0002, "epoch": 0.0077845243655612646, "step": 50}, {"loss": 1.2297, "grad_norm": 0.6947183012962341, "learning_rate": 0.0002, "epoch": 0.009341429238673517, "step": 60}, {"loss": 1.1711, "grad_norm": 0.6863043308258057, "learning_rate": 0.0002, "epoch": 0.01089833411178577, "step": 70}, {"loss": 1.1723, "grad_norm": 0.5997875332832336, "learning_rate": 0.0002, "epoch": 0.012455238984898023, "step": 80}, {"loss": 1.1309, "grad_norm": 0.601932942867279, "learning_rate": 0.0002, "epoch": 0.014012143858010275, "step": 90}, {"loss": 1.1873, "grad_norm": 0.6305660009384155, "learning_rate": 0.0002, "epoch": 0.015569048731122529, "step": 100}, {"loss": 1.1171, "grad_norm": 0.6064867377281189, "learning_rate": 0.0002, "epoch": 0.01712595360423478, "step": 110}, {"loss": 
1.1137, "grad_norm": 0.539921224117279, "learning_rate": 0.0002, "epoch": 0.018682858477347034, "step": 120}, {"loss": 1.1857, "grad_norm": 0.9058607220649719, "learning_rate": 0.0002, "epoch": 0.020239763350459287, "step": 130}, {"loss": 1.1087, "grad_norm": 0.6353763937950134, "learning_rate": 0.0002, "epoch": 0.02179666822357154, "step": 140}, {"loss": 1.1362, "grad_norm": 0.6469660401344299, "learning_rate": 0.0002, "epoch": 0.023353573096683792, "step": 150}, {"loss": 1.1406, "grad_norm": 0.7145527005195618, "learning_rate": 0.0002, "epoch": 0.024910477969796046, "step": 160}, {"loss": 1.1471, "grad_norm": 0.5554297566413879, "learning_rate": 0.0002, "epoch": 0.0264673828429083, "step": 170}, {"loss": 1.1957, "grad_norm": 0.5539569854736328, "learning_rate": 0.0002, "epoch": 0.02802428771602055, "step": 180}, {"loss": 1.1516, "grad_norm": 0.5214126110076904, "learning_rate": 0.0002, "epoch": 0.029581192589132804, "step": 190}, {"loss": 1.162, "grad_norm": 0.663166344165802, "learning_rate": 0.0002, "epoch": 0.031138097462245058, "step": 200}, {"loss": 1.1891, "grad_norm": 0.6545661687850952, "learning_rate": 0.0002, "epoch": 0.03269500233535731, "step": 210}, {"loss": 1.1353, "grad_norm": 0.7013102173805237, "learning_rate": 0.0002, "epoch": 0.03425190720846956, "step": 220}, {"loss": 1.1439, "grad_norm": 0.7057784795761108, "learning_rate": 0.0002, "epoch": 0.03580881208158181, "step": 230}, {"loss": 1.158, "grad_norm": 0.612928569316864, "learning_rate": 0.0002, "epoch": 0.03736571695469407, "step": 240}, {"loss": 1.1178, "grad_norm": 0.5062541365623474, "learning_rate": 0.0002, "epoch": 0.03892262182780632, "step": 250}, {"loss": 1.1734, "grad_norm": 0.5950959920883179, "learning_rate": 0.0002, "epoch": 0.040479526700918575, "step": 260}, {"loss": 1.1644, "grad_norm": 0.5848205089569092, "learning_rate": 0.0002, "epoch": 0.04203643157403083, "step": 270}, {"loss": 1.1392, "grad_norm": 0.5236861109733582, "learning_rate": 0.0002, "epoch": 
0.04359333644714308, "step": 280}, {"loss": 1.1252, "grad_norm": 0.6309433579444885, "learning_rate": 0.0002, "epoch": 0.04515024132025533, "step": 290}, {"loss": 1.1749, "grad_norm": 0.7145726084709167, "learning_rate": 0.0002, "epoch": 0.046707146193367584, "step": 300}, {"loss": 1.1445, "grad_norm": 0.5978087186813354, "learning_rate": 0.0002, "epoch": 0.04826405106647984, "step": 310}, {"loss": 1.1583, "grad_norm": 0.5891343355178833, "learning_rate": 0.0002, "epoch": 0.04982095593959209, "step": 320}, {"loss": 1.1593, "grad_norm": 0.5066903233528137, "learning_rate": 0.0002, "epoch": 0.051377860812704346, "step": 330}, {"loss": 1.098, "grad_norm": 0.676838755607605, "learning_rate": 0.0002, "epoch": 0.0529347656858166, "step": 340}, {"loss": 1.1124, "grad_norm": 0.6622801423072815, "learning_rate": 0.0002, "epoch": 0.05449167055892885, "step": 350}, {"loss": 1.1195, "grad_norm": 0.5221540927886963, "learning_rate": 0.0002, "epoch": 0.0560485754320411, "step": 360}, {"loss": 1.0962, "grad_norm": 0.6157132387161255, "learning_rate": 0.0002, "epoch": 0.057605480305153355, "step": 370}, {"loss": 1.1671, "grad_norm": 0.6694879531860352, "learning_rate": 0.0002, "epoch": 0.05916238517826561, "step": 380}, {"loss": 1.1469, "grad_norm": 0.5748186707496643, "learning_rate": 0.0002, "epoch": 0.06071929005137786, "step": 390}, {"loss": 1.1183, "grad_norm": 0.6818327307701111, "learning_rate": 0.0002, "epoch": 0.062276194924490116, "step": 400}, {"loss": 1.0979, "grad_norm": 0.6167246103286743, "learning_rate": 0.0002, "epoch": 0.06383309979760236, "step": 410}, {"loss": 1.0974, "grad_norm": 0.6108185052871704, "learning_rate": 0.0002, "epoch": 0.06539000467071462, "step": 420}, {"loss": 1.0879, "grad_norm": 0.5813191533088684, "learning_rate": 0.0002, "epoch": 0.06694690954382687, "step": 430}, {"loss": 1.1193, "grad_norm": 0.5398768186569214, "learning_rate": 0.0002, "epoch": 0.06850381441693912, "step": 440}, {"loss": 1.1159, "grad_norm": 0.5680503249168396, 
"learning_rate": 0.0002, "epoch": 0.07006071929005138, "step": 450}, {"loss": 1.1265, "grad_norm": 0.6427502632141113, "learning_rate": 0.0002, "epoch": 0.07161762416316363, "step": 460}, {"loss": 1.1104, "grad_norm": 0.8448787331581116, "learning_rate": 0.0002, "epoch": 0.07317452903627589, "step": 470}, {"loss": 1.1219, "grad_norm": 0.6331955790519714, "learning_rate": 0.0002, "epoch": 0.07473143390938813, "step": 480}, {"loss": 1.1036, "grad_norm": 0.6323956251144409, "learning_rate": 0.0002, "epoch": 0.0762883387825004, "step": 490}, {"loss": 1.1129, "grad_norm": 0.597777783870697, "learning_rate": 0.0002, "epoch": 0.07784524365561264, "step": 500}, {"loss": 1.13, "grad_norm": 0.5934968590736389, "learning_rate": 0.0002, "epoch": 0.07940214852872489, "step": 510}, {"loss": 1.095, "grad_norm": 0.5090578198432922, "learning_rate": 0.0002, "epoch": 0.08095905340183715, "step": 520}, {"loss": 1.1693, "grad_norm": 0.5294740796089172, "learning_rate": 0.0002, "epoch": 0.0825159582749494, "step": 530}, {"loss": 1.1262, "grad_norm": 0.7146711349487305, "learning_rate": 0.0002, "epoch": 0.08407286314806166, "step": 540}, {"loss": 1.1735, "grad_norm": 0.5683578848838806, "learning_rate": 0.0002, "epoch": 0.0856297680211739, "step": 550}, {"loss": 1.098, "grad_norm": 0.6001752614974976, "learning_rate": 0.0002, "epoch": 0.08718667289428617, "step": 560}, {"loss": 1.1436, "grad_norm": 0.6592530012130737, "learning_rate": 0.0002, "epoch": 0.08874357776739841, "step": 570}, {"loss": 1.1198, "grad_norm": 0.769167959690094, "learning_rate": 0.0002, "epoch": 0.09030048264051066, "step": 580}, {"loss": 1.1425, "grad_norm": 0.5630056262016296, "learning_rate": 0.0002, "epoch": 0.09185738751362292, "step": 590}, {"loss": 1.1559, "grad_norm": 0.5160059332847595, "learning_rate": 0.0002, "epoch": 0.09341429238673517, "step": 600}, {"loss": 1.1291, "grad_norm": 0.596750795841217, "learning_rate": 0.0002, "epoch": 0.09497119725984743, "step": 610}, {"loss": 1.13, "grad_norm": 
0.5554390549659729, "learning_rate": 0.0002, "epoch": 0.09652810213295968, "step": 620}, {"loss": 1.1399, "grad_norm": 0.5168308019638062, "learning_rate": 0.0002, "epoch": 0.09808500700607192, "step": 630}, {"loss": 1.105, "grad_norm": 0.7179445624351501, "learning_rate": 0.0002, "epoch": 0.09964191187918418, "step": 640}, {"loss": 1.0831, "grad_norm": 0.795161247253418, "learning_rate": 0.0002, "epoch": 0.10119881675229643, "step": 650}, {"loss": 1.1175, "grad_norm": 0.5458967089653015, "learning_rate": 0.0002, "epoch": 0.10275572162540869, "step": 660}, {"loss": 1.0904, "grad_norm": 0.6403341889381409, "learning_rate": 0.0002, "epoch": 0.10431262649852094, "step": 670}, {"loss": 1.1325, "grad_norm": 0.599587082862854, "learning_rate": 0.0002, "epoch": 0.1058695313716332, "step": 680}, {"loss": 1.1586, "grad_norm": 0.5924763083457947, "learning_rate": 0.0002, "epoch": 0.10742643624474545, "step": 690}, {"loss": 1.1499, "grad_norm": 0.6349402070045471, "learning_rate": 0.0002, "epoch": 0.1089833411178577, "step": 700}, {"loss": 1.0972, "grad_norm": 0.6329488158226013, "learning_rate": 0.0002, "epoch": 0.11054024599096995, "step": 710}, {"loss": 1.1016, "grad_norm": 0.6591516137123108, "learning_rate": 0.0002, "epoch": 0.1120971508640822, "step": 720}, {"loss": 1.0814, "grad_norm": 0.5567219853401184, "learning_rate": 0.0002, "epoch": 0.11365405573719446, "step": 730}, {"loss": 1.1321, "grad_norm": 2.051968574523926, "learning_rate": 0.0002, "epoch": 0.11521096061030671, "step": 740}, {"loss": 1.0956, "grad_norm": 0.534951388835907, "learning_rate": 0.0002, "epoch": 0.11676786548341897, "step": 750}, {"loss": 1.1718, "grad_norm": 0.524635910987854, "learning_rate": 0.0002, "epoch": 0.11832477035653122, "step": 760}, {"loss": 1.1135, "grad_norm": 0.6297617554664612, "learning_rate": 0.0002, "epoch": 0.11988167522964346, "step": 770}, {"loss": 1.0575, "grad_norm": 0.69248366355896, "learning_rate": 0.0002, "epoch": 0.12143858010275572, "step": 780}, {"loss": 1.1351, 
"grad_norm": 0.6420977115631104, "learning_rate": 0.0002, "epoch": 0.12299548497586797, "step": 790}, {"loss": 1.1081, "grad_norm": 0.46199411153793335, "learning_rate": 0.0002, "epoch": 0.12455238984898023, "step": 800}, {"loss": 1.0724, "grad_norm": 0.6685129404067993, "learning_rate": 0.0002, "epoch": 0.12610929472209248, "step": 810}, {"loss": 1.1477, "grad_norm": 0.6581668853759766, "learning_rate": 0.0002, "epoch": 0.12766619959520473, "step": 820}, {"loss": 1.106, "grad_norm": 0.535027027130127, "learning_rate": 0.0002, "epoch": 0.12922310446831697, "step": 830}, {"loss": 1.1828, "grad_norm": 0.5848594903945923, "learning_rate": 0.0002, "epoch": 0.13078000934142925, "step": 840}, {"loss": 1.118, "grad_norm": 0.7530373930931091, "learning_rate": 0.0002, "epoch": 0.1323369142145415, "step": 850}, {"loss": 1.1268, "grad_norm": 0.511100709438324, "learning_rate": 0.0002, "epoch": 0.13389381908765374, "step": 860}, {"loss": 1.122, "grad_norm": 0.5951367020606995, "learning_rate": 0.0002, "epoch": 0.135450723960766, "step": 870}, {"loss": 1.0885, "grad_norm": 0.4586924612522125, "learning_rate": 0.0002, "epoch": 0.13700762883387824, "step": 880}, {"loss": 1.0708, "grad_norm": 0.7017108798027039, "learning_rate": 0.0002, "epoch": 0.1385645337069905, "step": 890}, {"loss": 1.0935, "grad_norm": 0.7491217255592346, "learning_rate": 0.0002, "epoch": 0.14012143858010276, "step": 900}, {"loss": 1.1446, "grad_norm": 0.6166567802429199, "learning_rate": 0.0002, "epoch": 0.141678343453215, "step": 910}, {"loss": 1.1185, "grad_norm": 0.6532944440841675, "learning_rate": 0.0002, "epoch": 0.14323524832632725, "step": 920}, {"loss": 1.1016, "grad_norm": 0.6104971766471863, "learning_rate": 0.0002, "epoch": 0.14479215319943953, "step": 930}, {"loss": 1.136, "grad_norm": 0.7276142835617065, "learning_rate": 0.0002, "epoch": 0.14634905807255177, "step": 940}, {"loss": 1.1095, "grad_norm": 0.638933002948761, "learning_rate": 0.0002, "epoch": 0.14790596294566402, "step": 950}, 
{"loss": 1.1178, "grad_norm": 0.5478529334068298, "learning_rate": 0.0002, "epoch": 0.14946286781877627, "step": 960}, {"loss": 1.1215, "grad_norm": 0.6534238457679749, "learning_rate": 0.0002, "epoch": 0.15101977269188852, "step": 970}, {"loss": 1.0974, "grad_norm": 0.5965027809143066, "learning_rate": 0.0002, "epoch": 0.1525766775650008, "step": 980}, {"loss": 1.1457, "grad_norm": 0.5843324065208435, "learning_rate": 0.0002, "epoch": 0.15413358243811304, "step": 990}, {"loss": 1.0734, "grad_norm": 0.559089720249176, "learning_rate": 0.0002, "epoch": 0.15569048731122528, "step": 1000}, {"loss": 1.0947, "grad_norm": 0.605983316898346, "learning_rate": 0.0002, "epoch": 0.15724739218433753, "step": 1010}, {"loss": 1.0545, "grad_norm": 0.6432632207870483, "learning_rate": 0.0002, "epoch": 0.15880429705744978, "step": 1020}, {"loss": 1.1447, "grad_norm": 0.613200306892395, "learning_rate": 0.0002, "epoch": 0.16036120193056205, "step": 1030}, {"loss": 1.0739, "grad_norm": 0.8461366295814514, "learning_rate": 0.0002, "epoch": 0.1619181068036743, "step": 1040}, {"loss": 1.1326, "grad_norm": 0.6300869584083557, "learning_rate": 0.0002, "epoch": 0.16347501167678655, "step": 1050}, {"loss": 1.1726, "grad_norm": 0.6111822724342346, "learning_rate": 0.0002, "epoch": 0.1650319165498988, "step": 1060}, {"loss": 1.1684, "grad_norm": 0.6561546921730042, "learning_rate": 0.0002, "epoch": 0.16658882142301104, "step": 1070}, {"loss": 1.0887, "grad_norm": 0.6523616313934326, "learning_rate": 0.0002, "epoch": 0.16814572629612332, "step": 1080}, {"loss": 1.1844, "grad_norm": 0.6651533842086792, "learning_rate": 0.0002, "epoch": 0.16970263116923556, "step": 1090}, {"loss": 1.1447, "grad_norm": 0.6121208071708679, "learning_rate": 0.0002, "epoch": 0.1712595360423478, "step": 1100}, {"loss": 1.1139, "grad_norm": 0.6707288026809692, "learning_rate": 0.0002, "epoch": 0.17281644091546006, "step": 1110}, {"loss": 1.1203, "grad_norm": 0.5783587098121643, "learning_rate": 0.0002, "epoch": 
0.17437334578857233, "step": 1120}, {"loss": 1.1448, "grad_norm": 0.672231912612915, "learning_rate": 0.0002, "epoch": 0.17593025066168458, "step": 1130}, {"loss": 1.1399, "grad_norm": 0.7658334374427795, "learning_rate": 0.0002, "epoch": 0.17748715553479683, "step": 1140}, {"loss": 1.1074, "grad_norm": 0.5522013902664185, "learning_rate": 0.0002, "epoch": 0.17904406040790907, "step": 1150}, {"loss": 1.1243, "grad_norm": 0.6899687647819519, "learning_rate": 0.0002, "epoch": 0.18060096528102132, "step": 1160}, {"loss": 1.1037, "grad_norm": 0.5737553238868713, "learning_rate": 0.0002, "epoch": 0.1821578701541336, "step": 1170}, {"loss": 1.1545, "grad_norm": 0.5949761867523193, "learning_rate": 0.0002, "epoch": 0.18371477502724584, "step": 1180}, {"loss": 1.1605, "grad_norm": 0.9239516258239746, "learning_rate": 0.0002, "epoch": 0.1852716799003581, "step": 1190}, {"loss": 1.0773, "grad_norm": 0.6174934506416321, "learning_rate": 0.0002, "epoch": 0.18682858477347034, "step": 1200}, {"loss": 1.0421, "grad_norm": 0.6435124278068542, "learning_rate": 0.0002, "epoch": 0.18838548964658258, "step": 1210}, {"loss": 1.1263, "grad_norm": 0.6969723701477051, "learning_rate": 0.0002, "epoch": 0.18994239451969486, "step": 1220}, {"loss": 1.1227, "grad_norm": 6.226013660430908, "learning_rate": 0.0002, "epoch": 0.1914992993928071, "step": 1230}, {"loss": 1.1269, "grad_norm": 0.5392214059829712, "learning_rate": 0.0002, "epoch": 0.19305620426591935, "step": 1240}, {"loss": 1.0963, "grad_norm": 0.6338254809379578, "learning_rate": 0.0002, "epoch": 0.1946131091390316, "step": 1250}, {"loss": 1.1319, "grad_norm": 0.7355178594589233, "learning_rate": 0.0002, "epoch": 0.19617001401214385, "step": 1260}, {"loss": 1.1068, "grad_norm": 0.6277201771736145, "learning_rate": 0.0002, "epoch": 0.19772691888525612, "step": 1270}, {"loss": 1.1088, "grad_norm": 0.583827793598175, "learning_rate": 0.0002, "epoch": 0.19928382375836837, "step": 1280}, {"loss": 1.1031, "grad_norm": 0.6162388920783997, 
"learning_rate": 0.0002, "epoch": 0.20084072863148061, "step": 1290}, {"loss": 1.0732, "grad_norm": 0.515679121017456, "learning_rate": 0.0002, "epoch": 0.20239763350459286, "step": 1300}, {"loss": 1.0995, "grad_norm": 0.6534159183502197, "learning_rate": 0.0002, "epoch": 0.20395453837770514, "step": 1310}, {"loss": 1.1293, "grad_norm": 0.7893869280815125, "learning_rate": 0.0002, "epoch": 0.20551144325081738, "step": 1320}, {"loss": 1.102, "grad_norm": 0.6813029646873474, "learning_rate": 0.0002, "epoch": 0.20706834812392963, "step": 1330}, {"loss": 1.11, "grad_norm": 0.7346246242523193, "learning_rate": 0.0002, "epoch": 0.20862525299704188, "step": 1340}, {"loss": 1.0635, "grad_norm": 0.5932967066764832, "learning_rate": 0.0002, "epoch": 0.21018215787015412, "step": 1350}, {"loss": 1.1002, "grad_norm": 0.7866950035095215, "learning_rate": 0.0002, "epoch": 0.2117390627432664, "step": 1360}, {"loss": 1.1071, "grad_norm": 0.6781790852546692, "learning_rate": 0.0002, "epoch": 0.21329596761637865, "step": 1370}, {"loss": 1.139, "grad_norm": 0.7438035607337952, "learning_rate": 0.0002, "epoch": 0.2148528724894909, "step": 1380}, {"loss": 1.0589, "grad_norm": 0.5854487419128418, "learning_rate": 0.0002, "epoch": 0.21640977736260314, "step": 1390}, {"loss": 1.0986, "grad_norm": 0.7214435338973999, "learning_rate": 0.0002, "epoch": 0.2179666822357154, "step": 1400}, {"loss": 1.0967, "grad_norm": 0.5515540838241577, "learning_rate": 0.0002, "epoch": 0.21952358710882766, "step": 1410}, {"loss": 1.0918, "grad_norm": 0.625274121761322, "learning_rate": 0.0002, "epoch": 0.2210804919819399, "step": 1420}, {"loss": 1.1095, "grad_norm": 0.670879065990448, "learning_rate": 0.0002, "epoch": 0.22263739685505216, "step": 1430}, {"loss": 1.0972, "grad_norm": 0.6580260396003723, "learning_rate": 0.0002, "epoch": 0.2241943017281644, "step": 1440}, {"loss": 1.107, "grad_norm": 0.744412899017334, "learning_rate": 0.0002, "epoch": 0.22575120660127665, "step": 1450}, {"loss": 1.0778, 
"grad_norm": 0.6362533569335938, "learning_rate": 0.0002, "epoch": 0.22730811147438892, "step": 1460}, {"loss": 1.0428, "grad_norm": 0.6822949051856995, "learning_rate": 0.0002, "epoch": 0.22886501634750117, "step": 1470}, {"loss": 1.0846, "grad_norm": 0.5765610933303833, "learning_rate": 0.0002, "epoch": 0.23042192122061342, "step": 1480}, {"loss": 1.103, "grad_norm": 0.5817660689353943, "learning_rate": 0.0002, "epoch": 0.23197882609372567, "step": 1490}, {"loss": 1.1709, "grad_norm": 0.7262201309204102, "learning_rate": 0.0002, "epoch": 0.23353573096683794, "step": 1500}, {"loss": 1.1055, "grad_norm": 0.6998226046562195, "learning_rate": 0.0002, "epoch": 0.2350926358399502, "step": 1510}, {"loss": 1.1176, "grad_norm": 0.6553370952606201, "learning_rate": 0.0002, "epoch": 0.23664954071306243, "step": 1520}, {"loss": 1.1229, "grad_norm": 0.5544524788856506, "learning_rate": 0.0002, "epoch": 0.23820644558617468, "step": 1530}, {"loss": 1.1157, "grad_norm": 0.6616725325584412, "learning_rate": 0.0002, "epoch": 0.23976335045928693, "step": 1540}, {"loss": 1.0853, "grad_norm": 0.634032666683197, "learning_rate": 0.0002, "epoch": 0.2413202553323992, "step": 1550}, {"loss": 1.0946, "grad_norm": 0.73193359375, "learning_rate": 0.0002, "epoch": 0.24287716020551145, "step": 1560}, {"loss": 1.0634, "grad_norm": 0.6141799688339233, "learning_rate": 0.0002, "epoch": 0.2444340650786237, "step": 1570}, {"loss": 1.0902, "grad_norm": 0.6891711950302124, "learning_rate": 0.0002, "epoch": 0.24599096995173594, "step": 1580}, {"loss": 1.0413, "grad_norm": 0.6239911317825317, "learning_rate": 0.0002, "epoch": 0.2475478748248482, "step": 1590}, {"loss": 1.0733, "grad_norm": 0.7287254333496094, "learning_rate": 0.0002, "epoch": 0.24910477969796047, "step": 1600}, {"loss": 1.1551, "grad_norm": 0.6797583103179932, "learning_rate": 0.0002, "epoch": 0.2506616845710727, "step": 1610}, {"loss": 1.1153, "grad_norm": 0.8263372182846069, "learning_rate": 0.0002, "epoch": 0.25221858944418496, 
"step": 1620}, {"loss": 1.0628, "grad_norm": 0.6124197244644165, "learning_rate": 0.0002, "epoch": 0.25377549431729723, "step": 1630}, {"loss": 1.1301, "grad_norm": 0.7104243636131287, "learning_rate": 0.0002, "epoch": 0.25533239919040945, "step": 1640}, {"loss": 1.0985, "grad_norm": 0.6841777563095093, "learning_rate": 0.0002, "epoch": 0.25688930406352173, "step": 1650}, {"loss": 1.0711, "grad_norm": 0.8895524740219116, "learning_rate": 0.0002, "epoch": 0.25844620893663395, "step": 1660}, {"loss": 1.0803, "grad_norm": 0.723491907119751, "learning_rate": 0.0002, "epoch": 0.2600031138097462, "step": 1670}, {"loss": 1.1472, "grad_norm": 0.6887248158454895, "learning_rate": 0.0002, "epoch": 0.2615600186828585, "step": 1680}, {"loss": 1.0883, "grad_norm": 0.6614824533462524, "learning_rate": 0.0002, "epoch": 0.2631169235559707, "step": 1690}, {"loss": 1.0892, "grad_norm": 0.6652423143386841, "learning_rate": 0.0002, "epoch": 0.264673828429083, "step": 1700}, {"loss": 1.1211, "grad_norm": 0.8568106889724731, "learning_rate": 0.0002, "epoch": 0.2662307333021952, "step": 1710}, {"loss": 1.1166, "grad_norm": 0.733070969581604, "learning_rate": 0.0002, "epoch": 0.2677876381753075, "step": 1720}, {"loss": 1.1297, "grad_norm": 0.7037351131439209, "learning_rate": 0.0002, "epoch": 0.26934454304841976, "step": 1730}, {"loss": 1.0807, "grad_norm": 0.6304486989974976, "learning_rate": 0.0002, "epoch": 0.270901447921532, "step": 1740}, {"loss": 1.0647, "grad_norm": 0.7296901345252991, "learning_rate": 0.0002, "epoch": 0.27245835279464425, "step": 1750}, {"loss": 1.0756, "grad_norm": 0.6070392727851868, "learning_rate": 0.0002, "epoch": 0.2740152576677565, "step": 1760}, {"loss": 1.085, "grad_norm": 0.659273624420166, "learning_rate": 0.0002, "epoch": 0.27557216254086875, "step": 1770}, {"loss": 1.1124, "grad_norm": 0.6617187261581421, "learning_rate": 0.0002, "epoch": 0.277129067413981, "step": 1780}, {"loss": 1.0875, "grad_norm": 0.5451317429542542, "learning_rate": 0.0002, 
"epoch": 0.27868597228709324, "step": 1790}, {"loss": 1.0726, "grad_norm": 0.6661293506622314, "learning_rate": 0.0002, "epoch": 0.2802428771602055, "step": 1800}, {"loss": 1.0706, "grad_norm": 0.7937290072441101, "learning_rate": 0.0002, "epoch": 0.2817997820333178, "step": 1810}, {"loss": 1.1361, "grad_norm": 0.6947421431541443, "learning_rate": 0.0002, "epoch": 0.28335668690643, "step": 1820}, {"loss": 1.1408, "grad_norm": 0.729793131351471, "learning_rate": 0.0002, "epoch": 0.2849135917795423, "step": 1830}, {"loss": 1.1, "grad_norm": 0.702356219291687, "learning_rate": 0.0002, "epoch": 0.2864704966526545, "step": 1840}, {"loss": 1.0708, "grad_norm": 0.5542839169502258, "learning_rate": 0.0002, "epoch": 0.2880274015257668, "step": 1850}, {"loss": 1.1157, "grad_norm": 0.7186998128890991, "learning_rate": 0.0002, "epoch": 0.28958430639887905, "step": 1860}, {"loss": 1.084, "grad_norm": 1.0709528923034668, "learning_rate": 0.0002, "epoch": 0.2911412112719913, "step": 1870}, {"loss": 1.0804, "grad_norm": 0.6950598955154419, "learning_rate": 0.0002, "epoch": 0.29269811614510355, "step": 1880}, {"loss": 1.0829, "grad_norm": 0.8781602382659912, "learning_rate": 0.0002, "epoch": 0.29425502101821577, "step": 1890}, {"loss": 1.061, "grad_norm": 0.6020617485046387, "learning_rate": 0.0002, "epoch": 0.29581192589132804, "step": 1900}, {"loss": 1.0876, "grad_norm": 0.6175223588943481, "learning_rate": 0.0002, "epoch": 0.2973688307644403, "step": 1910}, {"loss": 1.0439, "grad_norm": 0.6156674027442932, "learning_rate": 0.0002, "epoch": 0.29892573563755254, "step": 1920}, {"loss": 1.0593, "grad_norm": 0.6090167760848999, "learning_rate": 0.0002, "epoch": 0.3004826405106648, "step": 1930}, {"loss": 1.0762, "grad_norm": 1.018808364868164, "learning_rate": 0.0002, "epoch": 0.30203954538377703, "step": 1940}, {"loss": 1.1386, "grad_norm": 1.0168933868408203, "learning_rate": 0.0002, "epoch": 0.3035964502568893, "step": 1950}, {"loss": 1.055, "grad_norm": 0.598308265209198, 
"learning_rate": 0.0002, "epoch": 0.3051533551300016, "step": 1960}, {"loss": 1.0883, "grad_norm": 0.6474918723106384, "learning_rate": 0.0002, "epoch": 0.3067102600031138, "step": 1970}, {"loss": 1.1321, "grad_norm": 0.5655245184898376, "learning_rate": 0.0002, "epoch": 0.3082671648762261, "step": 1980}, {"loss": 1.1428, "grad_norm": 0.6680483222007751, "learning_rate": 0.0002, "epoch": 0.3098240697493383, "step": 1990}, {"loss": 1.1369, "grad_norm": 0.665328323841095, "learning_rate": 0.0002, "epoch": 0.31138097462245057, "step": 2000}, {"loss": 1.0895, "grad_norm": 0.5541640520095825, "learning_rate": 0.0002, "epoch": 0.31293787949556284, "step": 2010}, {"loss": 1.1434, "grad_norm": 0.8245078921318054, "learning_rate": 0.0002, "epoch": 0.31449478436867506, "step": 2020}, {"loss": 1.0766, "grad_norm": 0.6890619993209839, "learning_rate": 0.0002, "epoch": 0.31605168924178734, "step": 2030}, {"loss": 1.1116, "grad_norm": 0.6615879535675049, "learning_rate": 0.0002, "epoch": 0.31760859411489956, "step": 2040}, {"loss": 1.1019, "grad_norm": 0.6049224734306335, "learning_rate": 0.0002, "epoch": 0.31916549898801183, "step": 2050}, {"loss": 1.0777, "grad_norm": 0.6408320665359497, "learning_rate": 0.0002, "epoch": 0.3207224038611241, "step": 2060}, {"loss": 1.0915, "grad_norm": 0.6702662706375122, "learning_rate": 0.0002, "epoch": 0.3222793087342363, "step": 2070}, {"loss": 1.1043, "grad_norm": 0.645772397518158, "learning_rate": 0.0002, "epoch": 0.3238362136073486, "step": 2080}, {"loss": 1.0745, "grad_norm": 0.7813620567321777, "learning_rate": 0.0002, "epoch": 0.3253931184804608, "step": 2090}, {"loss": 1.0972, "grad_norm": 0.710206151008606, "learning_rate": 0.0002, "epoch": 0.3269500233535731, "step": 2100}, {"loss": 1.0975, "grad_norm": 0.696354866027832, "learning_rate": 0.0002, "epoch": 0.32850692822668537, "step": 2110}, {"loss": 1.0826, "grad_norm": 0.6182078719139099, "learning_rate": 0.0002, "epoch": 0.3300638330997976, "step": 2120}, {"loss": 1.0985, 
"grad_norm": 0.7604923844337463, "learning_rate": 0.0002, "epoch": 0.33162073797290986, "step": 2130}, {"loss": 1.089, "grad_norm": 0.610990583896637, "learning_rate": 0.0002, "epoch": 0.3331776428460221, "step": 2140}, {"loss": 1.1743, "grad_norm": 0.6476627588272095, "learning_rate": 0.0002, "epoch": 0.33473454771913436, "step": 2150}, {"loss": 1.0877, "grad_norm": 0.6220194101333618, "learning_rate": 0.0002, "epoch": 0.33629145259224663, "step": 2160}, {"loss": 1.1573, "grad_norm": 0.6761205792427063, "learning_rate": 0.0002, "epoch": 0.33784835746535885, "step": 2170}, {"loss": 1.1161, "grad_norm": 0.7645694613456726, "learning_rate": 0.0002, "epoch": 0.3394052623384711, "step": 2180}, {"loss": 1.0712, "grad_norm": 1.0127054452896118, "learning_rate": 0.0002, "epoch": 0.34096216721158334, "step": 2190}, {"loss": 1.1081, "grad_norm": 0.7457563877105713, "learning_rate": 0.0002, "epoch": 0.3425190720846956, "step": 2200}, {"loss": 1.0874, "grad_norm": 0.7844580411911011, "learning_rate": 0.0002, "epoch": 0.3440759769578079, "step": 2210}, {"loss": 1.0751, "grad_norm": 0.6543853282928467, "learning_rate": 0.0002, "epoch": 0.3456328818309201, "step": 2220}, {"loss": 1.0959, "grad_norm": 0.6699133515357971, "learning_rate": 0.0002, "epoch": 0.3471897867040324, "step": 2230}, {"loss": 1.0965, "grad_norm": 0.7180582284927368, "learning_rate": 0.0002, "epoch": 0.34874669157714466, "step": 2240}, {"loss": 1.0177, "grad_norm": 0.7387579083442688, "learning_rate": 0.0002, "epoch": 0.3503035964502569, "step": 2250}, {"loss": 1.0786, "grad_norm": 0.6241863369941711, "learning_rate": 0.0002, "epoch": 0.35186050132336916, "step": 2260}, {"loss": 1.101, "grad_norm": 0.5506595969200134, "learning_rate": 0.0002, "epoch": 0.3534174061964814, "step": 2270}, {"loss": 1.1404, "grad_norm": 2.04541277885437, "learning_rate": 0.0002, "epoch": 0.35497431106959365, "step": 2280}, {"loss": 1.1066, "grad_norm": 0.6534666419029236, "learning_rate": 0.0002, "epoch": 0.3565312159427059, 
"step": 2290}, {"loss": 1.0903, "grad_norm": 0.6315365433692932, "learning_rate": 0.0002, "epoch": 0.35808812081581815, "step": 2300}, {"loss": 1.0809, "grad_norm": 0.7615909576416016, "learning_rate": 0.0002, "epoch": 0.3596450256889304, "step": 2310}, {"loss": 1.0671, "grad_norm": 0.7002543807029724, "learning_rate": 0.0002, "epoch": 0.36120193056204264, "step": 2320}, {"loss": 1.0849, "grad_norm": 0.7433227896690369, "learning_rate": 0.0002, "epoch": 0.3627588354351549, "step": 2330}, {"loss": 1.0309, "grad_norm": 0.7358414530754089, "learning_rate": 0.0002, "epoch": 0.3643157403082672, "step": 2340}, {"loss": 1.1154, "grad_norm": 1.0423749685287476, "learning_rate": 0.0002, "epoch": 0.3658726451813794, "step": 2350}, {"loss": 1.0376, "grad_norm": 0.7239764928817749, "learning_rate": 0.0002, "epoch": 0.3674295500544917, "step": 2360}, {"loss": 1.1064, "grad_norm": 0.6718716025352478, "learning_rate": 0.0002, "epoch": 0.3689864549276039, "step": 2370}, {"loss": 1.0515, "grad_norm": 0.7648200392723083, "learning_rate": 0.0002, "epoch": 0.3705433598007162, "step": 2380}, {"loss": 1.1316, "grad_norm": 0.694695234298706, "learning_rate": 0.0002, "epoch": 0.37210026467382845, "step": 2390}, {"loss": 1.101, "grad_norm": 0.5861249566078186, "learning_rate": 0.0002, "epoch": 0.37365716954694067, "step": 2400}, {"loss": 1.1102, "grad_norm": 0.6659696698188782, "learning_rate": 0.0002, "epoch": 0.37521407442005295, "step": 2410}, {"loss": 1.0964, "grad_norm": 0.8343538641929626, "learning_rate": 0.0002, "epoch": 0.37677097929316516, "step": 2420}, {"loss": 1.0711, "grad_norm": 0.5911353826522827, "learning_rate": 0.0002, "epoch": 0.37832788416627744, "step": 2430}, {"loss": 1.1488, "grad_norm": 0.713294506072998, "learning_rate": 0.0002, "epoch": 0.3798847890393897, "step": 2440}, {"loss": 1.0825, "grad_norm": 0.6990512013435364, "learning_rate": 0.0002, "epoch": 0.38144169391250193, "step": 2450}, {"loss": 1.0565, "grad_norm": 0.6704243421554565, "learning_rate": 0.0002, 
"epoch": 0.3829985987856142, "step": 2460}, {"loss": 1.0888, "grad_norm": 1.7147644758224487, "learning_rate": 0.0002, "epoch": 0.3845555036587264, "step": 2470}, {"loss": 1.1124, "grad_norm": 0.6609890460968018, "learning_rate": 0.0002, "epoch": 0.3861124085318387, "step": 2480}, {"loss": 1.107, "grad_norm": 0.7996148467063904, "learning_rate": 0.0002, "epoch": 0.387669313404951, "step": 2490}, {"loss": 1.0707, "grad_norm": 0.6513879299163818, "learning_rate": 0.0002, "epoch": 0.3892262182780632, "step": 2500}, {"loss": 1.0731, "grad_norm": 0.7325628995895386, "learning_rate": 0.0002, "epoch": 0.39078312315117547, "step": 2510}, {"loss": 1.088, "grad_norm": 0.6879380345344543, "learning_rate": 0.0002, "epoch": 0.3923400280242877, "step": 2520}, {"loss": 1.1215, "grad_norm": 0.7462451457977295, "learning_rate": 0.0002, "epoch": 0.39389693289739997, "step": 2530}, {"loss": 1.0911, "grad_norm": 0.5704542398452759, "learning_rate": 0.0002, "epoch": 0.39545383777051224, "step": 2540}, {"loss": 1.07, "grad_norm": 0.7691283822059631, "learning_rate": 0.0002, "epoch": 0.39701074264362446, "step": 2550}, {"loss": 1.1244, "grad_norm": 0.8406426310539246, "learning_rate": 0.0002, "epoch": 0.39856764751673673, "step": 2560}, {"loss": 1.0831, "grad_norm": 0.8305632472038269, "learning_rate": 0.0002, "epoch": 0.40012455238984895, "step": 2570}, {"loss": 1.1022, "grad_norm": 0.648078441619873, "learning_rate": 0.0002, "epoch": 0.40168145726296123, "step": 2580}, {"loss": 1.071, "grad_norm": 0.6585285067558289, "learning_rate": 0.0002, "epoch": 0.4032383621360735, "step": 2590}, {"loss": 1.1339, "grad_norm": 0.6653701663017273, "learning_rate": 0.0002, "epoch": 0.4047952670091857, "step": 2600}, {"loss": 1.0768, "grad_norm": 0.6572180986404419, "learning_rate": 0.0002, "epoch": 0.406352171882298, "step": 2610}, {"loss": 1.1058, "grad_norm": 1.275527000427246, "learning_rate": 0.0002, "epoch": 0.40790907675541027, "step": 2620}, {"loss": 1.1092, "grad_norm": 0.7313233017921448, 
"learning_rate": 0.0002, "epoch": 0.4094659816285225, "step": 2630}, {"loss": 1.1115, "grad_norm": 0.7788786888122559, "learning_rate": 0.0002, "epoch": 0.41102288650163477, "step": 2640}, {"loss": 1.1229, "grad_norm": 0.671308696269989, "learning_rate": 0.0002, "epoch": 0.412579791374747, "step": 2650}, {"loss": 1.0766, "grad_norm": 0.6768915057182312, "learning_rate": 0.0002, "epoch": 0.41413669624785926, "step": 2660}, {"loss": 1.1119, "grad_norm": 0.7758002877235413, "learning_rate": 0.0002, "epoch": 0.41569360112097153, "step": 2670}, {"loss": 1.0677, "grad_norm": 0.8067342638969421, "learning_rate": 0.0002, "epoch": 0.41725050599408375, "step": 2680}, {"loss": 1.0917, "grad_norm": 0.8075875043869019, "learning_rate": 0.0002, "epoch": 0.41880741086719603, "step": 2690}, {"loss": 1.0918, "grad_norm": 0.689170241355896, "learning_rate": 0.0002, "epoch": 0.42036431574030825, "step": 2700}, {"loss": 1.1443, "grad_norm": 0.5704254508018494, "learning_rate": 0.0002, "epoch": 0.4219212206134205, "step": 2710}, {"loss": 1.0876, "grad_norm": 0.6495749354362488, "learning_rate": 0.0002, "epoch": 0.4234781254865328, "step": 2720}, {"loss": 1.1156, "grad_norm": 0.6903294920921326, "learning_rate": 0.0002, "epoch": 0.425035030359645, "step": 2730}, {"loss": 1.0945, "grad_norm": 0.6182425022125244, "learning_rate": 0.0002, "epoch": 0.4265919352327573, "step": 2740}, {"loss": 1.0888, "grad_norm": 0.8160443305969238, "learning_rate": 0.0002, "epoch": 0.4281488401058695, "step": 2750}, {"loss": 1.0632, "grad_norm": 0.7278578877449036, "learning_rate": 0.0002, "epoch": 0.4297057449789818, "step": 2760}, {"loss": 1.0699, "grad_norm": 0.6571283340454102, "learning_rate": 0.0002, "epoch": 0.43126264985209406, "step": 2770}, {"loss": 1.0841, "grad_norm": 0.6829530000686646, "learning_rate": 0.0002, "epoch": 0.4328195547252063, "step": 2780}, {"loss": 1.1289, "grad_norm": 0.6663913726806641, "learning_rate": 0.0002, "epoch": 0.43437645959831855, "step": 2790}, {"loss": 1.0924, 
"grad_norm": 0.6987531185150146, "learning_rate": 0.0002, "epoch": 0.4359333644714308, "step": 2800}, {"loss": 1.097, "grad_norm": 0.6285653710365295, "learning_rate": 0.0002, "epoch": 0.43749026934454305, "step": 2810}, {"loss": 1.0634, "grad_norm": 0.8040802478790283, "learning_rate": 0.0002, "epoch": 0.4390471742176553, "step": 2820}, {"loss": 1.0542, "grad_norm": 0.7612026929855347, "learning_rate": 0.0002, "epoch": 0.44060407909076754, "step": 2830}, {"loss": 1.0419, "grad_norm": 0.597648561000824, "learning_rate": 0.0002, "epoch": 0.4421609839638798, "step": 2840}, {"loss": 1.0959, "grad_norm": 0.957578718662262, "learning_rate": 0.0002, "epoch": 0.44371788883699204, "step": 2850}, {"loss": 1.1038, "grad_norm": 0.7712880969047546, "learning_rate": 0.0002, "epoch": 0.4452747937101043, "step": 2860}, {"loss": 1.0711, "grad_norm": 0.6894059181213379, "learning_rate": 0.0002, "epoch": 0.4468316985832166, "step": 2870}, {"loss": 1.0675, "grad_norm": 0.71763014793396, "learning_rate": 0.0002, "epoch": 0.4483886034563288, "step": 2880}, {"loss": 1.0983, "grad_norm": 0.7187833189964294, "learning_rate": 0.0002, "epoch": 0.4499455083294411, "step": 2890}, {"loss": 1.0712, "grad_norm": 0.669449508190155, "learning_rate": 0.0002, "epoch": 0.4515024132025533, "step": 2900}, {"loss": 1.1717, "grad_norm": 3.782758951187134, "learning_rate": 0.0002, "epoch": 0.4530593180756656, "step": 2910}, {"loss": 1.0734, "grad_norm": 0.7955448627471924, "learning_rate": 0.0002, "epoch": 0.45461622294877785, "step": 2920}, {"loss": 1.0905, "grad_norm": 0.5675301551818848, "learning_rate": 0.0002, "epoch": 0.45617312782189007, "step": 2930}, {"loss": 1.086, "grad_norm": 0.7283480167388916, "learning_rate": 0.0002, "epoch": 0.45773003269500234, "step": 2940}, {"loss": 1.081, "grad_norm": 0.8040274977684021, "learning_rate": 0.0002, "epoch": 0.45928693756811456, "step": 2950}, {"loss": 1.0816, "grad_norm": 0.7220824956893921, "learning_rate": 0.0002, "epoch": 0.46084384244122684, "step": 
2960}, {"loss": 1.0681, "grad_norm": 2.838085889816284, "learning_rate": 0.0002, "epoch": 0.4624007473143391, "step": 2970}, {"loss": 1.0574, "grad_norm": 0.6726126074790955, "learning_rate": 0.0002, "epoch": 0.46395765218745133, "step": 2980}, {"loss": 1.0426, "grad_norm": 0.6943784356117249, "learning_rate": 0.0002, "epoch": 0.4655145570605636, "step": 2990}, {"loss": 1.0749, "grad_norm": 0.6766270399093628, "learning_rate": 0.0002, "epoch": 0.4670714619336759, "step": 3000}, {"loss": 1.0757, "grad_norm": 0.5841159820556641, "learning_rate": 0.0002, "epoch": 0.4686283668067881, "step": 3010}, {"loss": 1.1016, "grad_norm": 2.5167019367218018, "learning_rate": 0.0002, "epoch": 0.4701852716799004, "step": 3020}, {"loss": 1.4291, "grad_norm": 62.75643539428711, "learning_rate": 0.0002, "epoch": 0.4717421765530126, "step": 3030}, {"loss": 1.1692, "grad_norm": 5.448807716369629, "learning_rate": 0.0002, "epoch": 0.47329908142612487, "step": 3040}, {"loss": 1.127, "grad_norm": 0.7601955533027649, "learning_rate": 0.0002, "epoch": 0.47485598629923714, "step": 3050}, {"loss": 1.0891, "grad_norm": 0.989210844039917, "learning_rate": 0.0002, "epoch": 0.47641289117234936, "step": 3060}, {"loss": 1.1359, "grad_norm": 0.8469926714897156, "learning_rate": 0.0002, "epoch": 0.47796979604546164, "step": 3070}, {"loss": 1.1149, "grad_norm": 0.9349185824394226, "learning_rate": 0.0002, "epoch": 0.47952670091857386, "step": 3080}, {"loss": 1.0983, "grad_norm": 0.6271135807037354, "learning_rate": 0.0002, "epoch": 0.48108360579168613, "step": 3090}, {"loss": 1.1094, "grad_norm": 0.7917095422744751, "learning_rate": 0.0002, "epoch": 0.4826405106647984, "step": 3100}, {"loss": 1.0135, "grad_norm": 0.6934359073638916, "learning_rate": 0.0002, "epoch": 0.4841974155379106, "step": 3110}, {"loss": 1.0877, "grad_norm": 0.6818416118621826, "learning_rate": 0.0002, "epoch": 0.4857543204110229, "step": 3120}, {"loss": 1.083, "grad_norm": 1.0593913793563843, "learning_rate": 0.0002, "epoch": 
0.4873112252841351, "step": 3130}, {"loss": 1.0935, "grad_norm": 0.6998370289802551, "learning_rate": 0.0002, "epoch": 0.4888681301572474, "step": 3140}, {"loss": 1.0784, "grad_norm": 0.7944499254226685, "learning_rate": 0.0002, "epoch": 0.49042503503035967, "step": 3150}, {"loss": 1.0905, "grad_norm": 1.089996099472046, "learning_rate": 0.0002, "epoch": 0.4919819399034719, "step": 3160}, {"loss": 1.0593, "grad_norm": 0.700448215007782, "learning_rate": 0.0002, "epoch": 0.49353884477658416, "step": 3170}, {"loss": 1.1113, "grad_norm": 0.6886814832687378, "learning_rate": 0.0002, "epoch": 0.4950957496496964, "step": 3180}, {"loss": 1.0626, "grad_norm": 0.6269518136978149, "learning_rate": 0.0002, "epoch": 0.49665265452280866, "step": 3190}, {"loss": 1.043, "grad_norm": 0.7439284920692444, "learning_rate": 0.0002, "epoch": 0.49820955939592093, "step": 3200}, {"loss": 1.1239, "grad_norm": 0.8870360255241394, "learning_rate": 0.0002, "epoch": 0.49976646426903315, "step": 3210}, {"loss": 1.0922, "grad_norm": 0.7199103236198425, "learning_rate": 0.0002, "epoch": 0.5013233691421454, "step": 3220}, {"loss": 1.0979, "grad_norm": 0.7634034752845764, "learning_rate": 0.0002, "epoch": 0.5028802740152577, "step": 3230}, {"loss": 1.1356, "grad_norm": 0.8855092525482178, "learning_rate": 0.0002, "epoch": 0.5044371788883699, "step": 3240}, {"loss": 1.1478, "grad_norm": 0.9303096532821655, "learning_rate": 0.0002, "epoch": 0.5059940837614821, "step": 3250}, {"loss": 1.0473, "grad_norm": 0.6604179739952087, "learning_rate": 0.0002, "epoch": 0.5075509886345945, "step": 3260}, {"loss": 1.1155, "grad_norm": 0.6351062059402466, "learning_rate": 0.0002, "epoch": 0.5091078935077067, "step": 3270}, {"loss": 1.0595, "grad_norm": 0.630638599395752, "learning_rate": 0.0002, "epoch": 0.5106647983808189, "step": 3280}, {"loss": 1.0938, "grad_norm": 0.707846999168396, "learning_rate": 0.0002, "epoch": 0.5122217032539312, "step": 3290}, {"loss": 1.0848, "grad_norm": 0.833063006401062, 
"learning_rate": 0.0002, "epoch": 0.5137786081270435, "step": 3300}, {"loss": 1.0734, "grad_norm": 1.3204951286315918, "learning_rate": 0.0002, "epoch": 0.5153355130001557, "step": 3310}, {"loss": 1.0932, "grad_norm": 0.70233154296875, "learning_rate": 0.0002, "epoch": 0.5168924178732679, "step": 3320}, {"loss": 1.1014, "grad_norm": 0.8448212146759033, "learning_rate": 0.0002, "epoch": 0.5184493227463802, "step": 3330}, {"loss": 1.0423, "grad_norm": 0.6670085191726685, "learning_rate": 0.0002, "epoch": 0.5200062276194924, "step": 3340}, {"loss": 1.1122, "grad_norm": 0.829553484916687, "learning_rate": 0.0002, "epoch": 0.5215631324926047, "step": 3350}, {"loss": 1.1781, "grad_norm": 0.9076400399208069, "learning_rate": 0.0002, "epoch": 0.523120037365717, "step": 3360}, {"loss": 1.0546, "grad_norm": 0.8321594595909119, "learning_rate": 0.0002, "epoch": 0.5246769422388292, "step": 3370}, {"loss": 1.0595, "grad_norm": 0.8174448013305664, "learning_rate": 0.0002, "epoch": 0.5262338471119414, "step": 3380}, {"loss": 1.0639, "grad_norm": 0.7878963947296143, "learning_rate": 0.0002, "epoch": 0.5277907519850538, "step": 3390}, {"loss": 1.1515, "grad_norm": 0.7636891603469849, "learning_rate": 0.0002, "epoch": 0.529347656858166, "step": 3400}, {"loss": 1.1158, "grad_norm": 0.9053562879562378, "learning_rate": 0.0002, "epoch": 0.5309045617312782, "step": 3410}, {"loss": 1.0616, "grad_norm": 0.6890588402748108, "learning_rate": 0.0002, "epoch": 0.5324614666043904, "step": 3420}, {"loss": 1.0477, "grad_norm": 0.7571008205413818, "learning_rate": 0.0002, "epoch": 0.5340183714775028, "step": 3430}, {"loss": 1.0961, "grad_norm": 0.6796272993087769, "learning_rate": 0.0002, "epoch": 0.535575276350615, "step": 3440}, {"loss": 1.0556, "grad_norm": 0.6687950491905212, "learning_rate": 0.0002, "epoch": 0.5371321812237272, "step": 3450}, {"loss": 1.1041, "grad_norm": 0.6518206000328064, "learning_rate": 0.0002, "epoch": 0.5386890860968395, "step": 3460}, {"loss": 1.1103, "grad_norm": 
0.7498114109039307, "learning_rate": 0.0002, "epoch": 0.5402459909699517, "step": 3470}, {"loss": 1.0797, "grad_norm": 0.7383188605308533, "learning_rate": 0.0002, "epoch": 0.541802895843064, "step": 3480}, {"loss": 1.0885, "grad_norm": 0.7201677560806274, "learning_rate": 0.0002, "epoch": 0.5433598007161763, "step": 3490}, {"loss": 1.0963, "grad_norm": 0.6782627701759338, "learning_rate": 0.0002, "epoch": 0.5449167055892885, "step": 3500}, {"loss": 1.0885, "grad_norm": 0.6866056323051453, "learning_rate": 0.0002, "epoch": 0.5464736104624007, "step": 3510}, {"loss": 1.0904, "grad_norm": 0.7693064212799072, "learning_rate": 0.0002, "epoch": 0.548030515335513, "step": 3520}, {"loss": 1.1055, "grad_norm": 0.7992173433303833, "learning_rate": 0.0002, "epoch": 0.5495874202086253, "step": 3530}, {"loss": 1.0783, "grad_norm": 0.7489389777183533, "learning_rate": 0.0002, "epoch": 0.5511443250817375, "step": 3540}, {"loss": 1.0892, "grad_norm": 0.9006646871566772, "learning_rate": 0.0002, "epoch": 0.5527012299548497, "step": 3550}, {"loss": 1.0813, "grad_norm": 0.6955394744873047, "learning_rate": 0.0002, "epoch": 0.554258134827962, "step": 3560}, {"loss": 1.0846, "grad_norm": 0.8455405831336975, "learning_rate": 0.0002, "epoch": 0.5558150397010743, "step": 3570}, {"loss": 1.0735, "grad_norm": 0.6958834528923035, "learning_rate": 0.0002, "epoch": 0.5573719445741865, "step": 3580}, {"loss": 1.1052, "grad_norm": 0.6896408796310425, "learning_rate": 0.0002, "epoch": 0.5589288494472988, "step": 3590}, {"loss": 1.0773, "grad_norm": 0.8004612922668457, "learning_rate": 0.0002, "epoch": 0.560485754320411, "step": 3600}, {"loss": 1.0603, "grad_norm": 0.9905720353126526, "learning_rate": 0.0002, "epoch": 0.5620426591935233, "step": 3610}, {"loss": 1.1081, "grad_norm": 0.7359225153923035, "learning_rate": 0.0002, "epoch": 0.5635995640666356, "step": 3620}, {"loss": 1.0429, "grad_norm": 0.696476936340332, "learning_rate": 0.0002, "epoch": 0.5651564689397478, "step": 3630}, {"loss": 
1.1515, "grad_norm": 0.8042669296264648, "learning_rate": 0.0002, "epoch": 0.56671337381286, "step": 3640}, {"loss": 1.0732, "grad_norm": 0.835766077041626, "learning_rate": 0.0002, "epoch": 0.5682702786859722, "step": 3650}, {"loss": 1.1112, "grad_norm": 0.886236846446991, "learning_rate": 0.0002, "epoch": 0.5698271835590846, "step": 3660}, {"loss": 1.1164, "grad_norm": 0.9304346442222595, "learning_rate": 0.0002, "epoch": 0.5713840884321968, "step": 3670}, {"loss": 1.0412, "grad_norm": 0.6224237084388733, "learning_rate": 0.0002, "epoch": 0.572940993305309, "step": 3680}, {"loss": 1.0783, "grad_norm": 0.7759581208229065, "learning_rate": 0.0002, "epoch": 0.5744978981784213, "step": 3690}, {"loss": 1.0661, "grad_norm": 0.6068091988563538, "learning_rate": 0.0002, "epoch": 0.5760548030515336, "step": 3700}, {"loss": 1.0961, "grad_norm": 0.8322245478630066, "learning_rate": 0.0002, "epoch": 0.5776117079246458, "step": 3710}, {"loss": 1.0681, "grad_norm": 0.8465868234634399, "learning_rate": 0.0002, "epoch": 0.5791686127977581, "step": 3720}, {"loss": 1.0578, "grad_norm": 0.7247440814971924, "learning_rate": 0.0002, "epoch": 0.5807255176708703, "step": 3730}, {"loss": 1.0638, "grad_norm": 0.8392718434333801, "learning_rate": 0.0002, "epoch": 0.5822824225439825, "step": 3740}, {"loss": 1.0694, "grad_norm": 0.7680680155754089, "learning_rate": 0.0002, "epoch": 0.5838393274170948, "step": 3750}, {"loss": 1.0664, "grad_norm": 0.7348233461380005, "learning_rate": 0.0002, "epoch": 0.5853962322902071, "step": 3760}, {"loss": 1.1248, "grad_norm": 0.7348080277442932, "learning_rate": 0.0002, "epoch": 0.5869531371633193, "step": 3770}, {"loss": 1.0932, "grad_norm": 0.9335888028144836, "learning_rate": 0.0002, "epoch": 0.5885100420364315, "step": 3780}, {"loss": 1.0923, "grad_norm": 0.8341727256774902, "learning_rate": 0.0002, "epoch": 0.5900669469095439, "step": 3790}, {"loss": 1.0878, "grad_norm": 0.7428248524665833, "learning_rate": 0.0002, "epoch": 0.5916238517826561, 
"step": 3800}, {"loss": 1.0233, "grad_norm": 0.7464084625244141, "learning_rate": 0.0002, "epoch": 0.5931807566557683, "step": 3810}, {"loss": 1.0248, "grad_norm": 0.7931474447250366, "learning_rate": 0.0002, "epoch": 0.5947376615288806, "step": 3820}, {"loss": 1.0991, "grad_norm": 0.715437650680542, "learning_rate": 0.0002, "epoch": 0.5962945664019929, "step": 3830}, {"loss": 1.0926, "grad_norm": 0.9891166090965271, "learning_rate": 0.0002, "epoch": 0.5978514712751051, "step": 3840}, {"loss": 1.0474, "grad_norm": 0.7681272029876709, "learning_rate": 0.0002, "epoch": 0.5994083761482173, "step": 3850}, {"loss": 1.0141, "grad_norm": 1.1160913705825806, "learning_rate": 0.0002, "epoch": 0.6009652810213296, "step": 3860}, {"loss": 1.1201, "grad_norm": 0.7390976548194885, "learning_rate": 0.0002, "epoch": 0.6025221858944418, "step": 3870}, {"loss": 1.115, "grad_norm": 0.7421828508377075, "learning_rate": 0.0002, "epoch": 0.6040790907675541, "step": 3880}, {"loss": 1.1017, "grad_norm": 0.672709047794342, "learning_rate": 0.0002, "epoch": 0.6056359956406664, "step": 3890}, {"loss": 1.0553, "grad_norm": 0.7313169836997986, "learning_rate": 0.0002, "epoch": 0.6071929005137786, "step": 3900}, {"loss": 1.0824, "grad_norm": 0.6218095421791077, "learning_rate": 0.0002, "epoch": 0.6087498053868908, "step": 3910}, {"loss": 1.0686, "grad_norm": 0.8796320557594299, "learning_rate": 0.0002, "epoch": 0.6103067102600032, "step": 3920}, {"loss": 1.0643, "grad_norm": 0.9690935611724854, "learning_rate": 0.0002, "epoch": 0.6118636151331154, "step": 3930}, {"loss": 1.038, "grad_norm": 0.7001955509185791, "learning_rate": 0.0002, "epoch": 0.6134205200062276, "step": 3940}, {"loss": 1.0827, "grad_norm": 0.6987056732177734, "learning_rate": 0.0002, "epoch": 0.6149774248793398, "step": 3950}, {"loss": 1.0754, "grad_norm": 0.6997740864753723, "learning_rate": 0.0002, "epoch": 0.6165343297524521, "step": 3960}, {"loss": 1.1164, "grad_norm": 0.87599778175354, "learning_rate": 0.0002, "epoch": 
0.6180912346255644, "step": 3970}, {"loss": 1.0904, "grad_norm": 0.7927989959716797, "learning_rate": 0.0002, "epoch": 0.6196481394986766, "step": 3980}, {"loss": 1.064, "grad_norm": 0.7939152717590332, "learning_rate": 0.0002, "epoch": 0.6212050443717889, "step": 3990}, {"loss": 1.0676, "grad_norm": 0.6806561350822449, "learning_rate": 0.0002, "epoch": 0.6227619492449011, "step": 4000}, {"loss": 1.096, "grad_norm": 0.8112443685531616, "learning_rate": 0.0002, "epoch": 0.6243188541180134, "step": 4010}, {"loss": 1.05, "grad_norm": 0.6750677227973938, "learning_rate": 0.0002, "epoch": 0.6258757589911257, "step": 4020}, {"loss": 1.0901, "grad_norm": 0.6818493604660034, "learning_rate": 0.0002, "epoch": 0.6274326638642379, "step": 4030}, {"loss": 1.075, "grad_norm": 0.808699369430542, "learning_rate": 0.0002, "epoch": 0.6289895687373501, "step": 4040}, {"loss": 1.0935, "grad_norm": 0.6548172235488892, "learning_rate": 0.0002, "epoch": 0.6305464736104625, "step": 4050}, {"loss": 1.0121, "grad_norm": 0.7432080507278442, "learning_rate": 0.0002, "epoch": 0.6321033784835747, "step": 4060}, {"loss": 1.0735, "grad_norm": 0.9340347647666931, "learning_rate": 0.0002, "epoch": 0.6336602833566869, "step": 4070}, {"loss": 1.0453, "grad_norm": 0.6241884231567383, "learning_rate": 0.0002, "epoch": 0.6352171882297991, "step": 4080}, {"loss": 1.0491, "grad_norm": 0.8011093735694885, "learning_rate": 0.0002, "epoch": 0.6367740931029114, "step": 4090}, {"loss": 1.0918, "grad_norm": 0.6794643402099609, "learning_rate": 0.0002, "epoch": 0.6383309979760237, "step": 4100}, {"loss": 1.1108, "grad_norm": 0.810511589050293, "learning_rate": 0.0002, "epoch": 0.6398879028491359, "step": 4110}, {"loss": 1.07, "grad_norm": 0.7479730844497681, "learning_rate": 0.0002, "epoch": 0.6414448077222482, "step": 4120}, {"loss": 1.0626, "grad_norm": 0.9362952709197998, "learning_rate": 0.0002, "epoch": 0.6430017125953604, "step": 4130}, {"loss": 1.0687, "grad_norm": 0.658596932888031, "learning_rate": 
0.0002, "epoch": 0.6445586174684726, "step": 4140}, {"loss": 1.0705, "grad_norm": 0.816819429397583, "learning_rate": 0.0002, "epoch": 0.646115522341585, "step": 4150}, {"loss": 1.086, "grad_norm": 1.035759687423706, "learning_rate": 0.0002, "epoch": 0.6476724272146972, "step": 4160}, {"loss": 1.0501, "grad_norm": 0.9264973998069763, "learning_rate": 0.0002, "epoch": 0.6492293320878094, "step": 4170}, {"loss": 1.1168, "grad_norm": 0.7711799740791321, "learning_rate": 0.0002, "epoch": 0.6507862369609216, "step": 4180}, {"loss": 1.091, "grad_norm": 0.7456914782524109, "learning_rate": 0.0002, "epoch": 0.652343141834034, "step": 4190}, {"loss": 1.039, "grad_norm": 0.9097701907157898, "learning_rate": 0.0002, "epoch": 0.6539000467071462, "step": 4200}, {"loss": 1.0766, "grad_norm": 0.7438989877700806, "learning_rate": 0.0002, "epoch": 0.6554569515802584, "step": 4210}, {"loss": 1.0316, "grad_norm": 0.7387025356292725, "learning_rate": 0.0002, "epoch": 0.6570138564533707, "step": 4220}, {"loss": 1.0629, "grad_norm": 0.982597291469574, "learning_rate": 0.0002, "epoch": 0.658570761326483, "step": 4230}, {"loss": 1.0291, "grad_norm": 0.7802243232727051, "learning_rate": 0.0002, "epoch": 0.6601276661995952, "step": 4240}, {"loss": 1.1107, "grad_norm": 1.1484220027923584, "learning_rate": 0.0002, "epoch": 0.6616845710727075, "step": 4250}, {"loss": 1.014, "grad_norm": 0.7660313844680786, "learning_rate": 0.0002, "epoch": 0.6632414759458197, "step": 4260}, {"loss": 1.0586, "grad_norm": 0.8125105500221252, "learning_rate": 0.0002, "epoch": 0.6647983808189319, "step": 4270}, {"loss": 1.0439, "grad_norm": 0.6372778415679932, "learning_rate": 0.0002, "epoch": 0.6663552856920442, "step": 4280}, {"loss": 1.061, "grad_norm": 0.6706284880638123, "learning_rate": 0.0002, "epoch": 0.6679121905651565, "step": 4290}, {"loss": 1.039, "grad_norm": 0.6464365124702454, "learning_rate": 0.0002, "epoch": 0.6694690954382687, "step": 4300}, {"loss": 1.0438, "grad_norm": 0.7389585971832275, 
"learning_rate": 0.0002, "epoch": 0.6710260003113809, "step": 4310}, {"loss": 1.0977, "grad_norm": 0.7603920102119446, "learning_rate": 0.0002, "epoch": 0.6725829051844933, "step": 4320}, {"loss": 1.0695, "grad_norm": 1.0394083261489868, "learning_rate": 0.0002, "epoch": 0.6741398100576055, "step": 4330}, {"loss": 1.0769, "grad_norm": 1.0419425964355469, "learning_rate": 0.0002, "epoch": 0.6756967149307177, "step": 4340}, {"loss": 1.0955, "grad_norm": 0.8922641277313232, "learning_rate": 0.0002, "epoch": 0.67725361980383, "step": 4350}, {"loss": 1.0998, "grad_norm": 0.8732121586799622, "learning_rate": 0.0002, "epoch": 0.6788105246769423, "step": 4360}, {"loss": 1.1436, "grad_norm": 0.8333417773246765, "learning_rate": 0.0002, "epoch": 0.6803674295500545, "step": 4370}, {"loss": 1.0742, "grad_norm": 0.8446813225746155, "learning_rate": 0.0002, "epoch": 0.6819243344231667, "step": 4380}, {"loss": 1.1036, "grad_norm": 0.8006939888000488, "learning_rate": 0.0002, "epoch": 0.683481239296279, "step": 4390}, {"loss": 1.0982, "grad_norm": 0.9227681159973145, "learning_rate": 0.0002, "epoch": 0.6850381441693912, "step": 4400}, {"loss": 1.1049, "grad_norm": 0.8276755213737488, "learning_rate": 0.0002, "epoch": 0.6865950490425035, "step": 4410}, {"loss": 1.0988, "grad_norm": 1.3938783407211304, "learning_rate": 0.0002, "epoch": 0.6881519539156158, "step": 4420}, {"loss": 1.1049, "grad_norm": 0.8166589736938477, "learning_rate": 0.0002, "epoch": 0.689708858788728, "step": 4430}, {"loss": 1.1702, "grad_norm": 0.675490140914917, "learning_rate": 0.0002, "epoch": 0.6912657636618402, "step": 4440}, {"loss": 1.0836, "grad_norm": 0.617508053779602, "learning_rate": 0.0002, "epoch": 0.6928226685349526, "step": 4450}, {"loss": 1.118, "grad_norm": 0.7889925241470337, "learning_rate": 0.0002, "epoch": 0.6943795734080648, "step": 4460}, {"loss": 1.0854, "grad_norm": 0.7918288111686707, "learning_rate": 0.0002, "epoch": 0.695936478281177, "step": 4470}, {"loss": 1.0917, "grad_norm": 
0.8474521636962891, "learning_rate": 0.0002, "epoch": 0.6974933831542893, "step": 4480}, {"loss": 1.0649, "grad_norm": 0.7213913798332214, "learning_rate": 0.0002, "epoch": 0.6990502880274015, "step": 4490}, {"loss": 1.0699, "grad_norm": 0.6864824295043945, "learning_rate": 0.0002, "epoch": 0.7006071929005138, "step": 4500}, {"loss": 1.0521, "grad_norm": 0.7951189279556274, "learning_rate": 0.0002, "epoch": 0.702164097773626, "step": 4510}, {"loss": 1.0833, "grad_norm": 0.6230813264846802, "learning_rate": 0.0002, "epoch": 0.7037210026467383, "step": 4520}, {"loss": 1.0398, "grad_norm": 0.9469047784805298, "learning_rate": 0.0002, "epoch": 0.7052779075198505, "step": 4530}, {"loss": 1.0816, "grad_norm": 0.8481845855712891, "learning_rate": 0.0002, "epoch": 0.7068348123929628, "step": 4540}, {"loss": 1.0725, "grad_norm": 0.819776713848114, "learning_rate": 0.0002, "epoch": 0.7083917172660751, "step": 4550}, {"loss": 1.0592, "grad_norm": 0.8301685452461243, "learning_rate": 0.0002, "epoch": 0.7099486221391873, "step": 4560}, {"loss": 1.0318, "grad_norm": 0.9077170491218567, "learning_rate": 0.0002, "epoch": 0.7115055270122995, "step": 4570}, {"loss": 1.041, "grad_norm": 0.7909579873085022, "learning_rate": 0.0002, "epoch": 0.7130624318854119, "step": 4580}, {"loss": 1.0602, "grad_norm": 0.6523225903511047, "learning_rate": 0.0002, "epoch": 0.7146193367585241, "step": 4590}, {"loss": 1.0703, "grad_norm": 0.8301730155944824, "learning_rate": 0.0002, "epoch": 0.7161762416316363, "step": 4600}, {"loss": 1.0591, "grad_norm": 0.7911930680274963, "learning_rate": 0.0002, "epoch": 0.7177331465047485, "step": 4610}, {"loss": 1.1137, "grad_norm": 0.8627018928527832, "learning_rate": 0.0002, "epoch": 0.7192900513778608, "step": 4620}, {"loss": 1.083, "grad_norm": 0.9536554217338562, "learning_rate": 0.0002, "epoch": 0.7208469562509731, "step": 4630}, {"loss": 1.0666, "grad_norm": 0.7584307193756104, "learning_rate": 0.0002, "epoch": 0.7224038611240853, "step": 4640}, {"loss": 
1.1222, "grad_norm": 0.815437376499176, "learning_rate": 0.0002, "epoch": 0.7239607659971976, "step": 4650}, {"loss": 1.0653, "grad_norm": 0.7486551403999329, "learning_rate": 0.0002, "epoch": 0.7255176708703098, "step": 4660}, {"loss": 1.0955, "grad_norm": 0.7780658006668091, "learning_rate": 0.0002, "epoch": 0.727074575743422, "step": 4670}, {"loss": 1.0695, "grad_norm": 0.6655897498130798, "learning_rate": 0.0002, "epoch": 0.7286314806165344, "step": 4680}, {"loss": 1.0957, "grad_norm": 0.8646983504295349, "learning_rate": 0.0002, "epoch": 0.7301883854896466, "step": 4690}, {"loss": 1.045, "grad_norm": 0.8383577466011047, "learning_rate": 0.0002, "epoch": 0.7317452903627588, "step": 4700}, {"loss": 1.0543, "grad_norm": 0.7498257756233215, "learning_rate": 0.0002, "epoch": 0.733302195235871, "step": 4710}, {"loss": 1.0462, "grad_norm": 0.8027714490890503, "learning_rate": 0.0002, "epoch": 0.7348591001089834, "step": 4720}, {"loss": 1.0607, "grad_norm": 0.7282228469848633, "learning_rate": 0.0002, "epoch": 0.7364160049820956, "step": 4730}, {"loss": 1.1279, "grad_norm": 0.8892863988876343, "learning_rate": 0.0002, "epoch": 0.7379729098552078, "step": 4740}, {"loss": 1.0638, "grad_norm": 0.8727153539657593, "learning_rate": 0.0002, "epoch": 0.7395298147283201, "step": 4750}, {"loss": 1.0949, "grad_norm": 0.9239740967750549, "learning_rate": 0.0002, "epoch": 0.7410867196014324, "step": 4760}, {"loss": 1.0902, "grad_norm": 0.8102642893791199, "learning_rate": 0.0002, "epoch": 0.7426436244745446, "step": 4770}, {"loss": 1.0783, "grad_norm": 0.8584149479866028, "learning_rate": 0.0002, "epoch": 0.7442005293476569, "step": 4780}, {"loss": 1.0717, "grad_norm": 0.9363023638725281, "learning_rate": 0.0002, "epoch": 0.7457574342207691, "step": 4790}, {"loss": 1.039, "grad_norm": 0.7935735583305359, "learning_rate": 0.0002, "epoch": 0.7473143390938813, "step": 4800}, {"loss": 1.0981, "grad_norm": 0.6224421858787537, "learning_rate": 0.0002, "epoch": 0.7488712439669937, 
"step": 4810}, {"loss": 1.0687, "grad_norm": 0.9423881769180298, "learning_rate": 0.0002, "epoch": 0.7504281488401059, "step": 4820}, {"loss": 1.0814, "grad_norm": 0.7841699719429016, "learning_rate": 0.0002, "epoch": 0.7519850537132181, "step": 4830}, {"loss": 1.0708, "grad_norm": 0.7534794211387634, "learning_rate": 0.0002, "epoch": 0.7535419585863303, "step": 4840}, {"loss": 1.0664, "grad_norm": 0.8285418748855591, "learning_rate": 0.0002, "epoch": 0.7550988634594427, "step": 4850}, {"loss": 1.1207, "grad_norm": 0.7921267151832581, "learning_rate": 0.0002, "epoch": 0.7566557683325549, "step": 4860}, {"loss": 1.0413, "grad_norm": 0.774894654750824, "learning_rate": 0.0002, "epoch": 0.7582126732056671, "step": 4870}, {"loss": 1.069, "grad_norm": 0.8826556205749512, "learning_rate": 0.0002, "epoch": 0.7597695780787794, "step": 4880}, {"loss": 1.1188, "grad_norm": 0.7641022205352783, "learning_rate": 0.0002, "epoch": 0.7613264829518916, "step": 4890}, {"loss": 1.1125, "grad_norm": 0.8973536491394043, "learning_rate": 0.0002, "epoch": 0.7628833878250039, "step": 4900}, {"loss": 1.0588, "grad_norm": 0.7350989580154419, "learning_rate": 0.0002, "epoch": 0.7644402926981162, "step": 4910}, {"loss": 1.1035, "grad_norm": 14.1688871383667, "learning_rate": 0.0002, "epoch": 0.7659971975712284, "step": 4920}, {"loss": 1.0357, "grad_norm": 0.6861683130264282, "learning_rate": 0.0002, "epoch": 0.7675541024443406, "step": 4930}, {"loss": 1.0713, "grad_norm": 0.8251516819000244, "learning_rate": 0.0002, "epoch": 0.7691110073174529, "step": 4940}, {"loss": 1.0944, "grad_norm": 0.8444218635559082, "learning_rate": 0.0002, "epoch": 0.7706679121905652, "step": 4950}, {"loss": 1.0951, "grad_norm": 0.7211325168609619, "learning_rate": 0.0002, "epoch": 0.7722248170636774, "step": 4960}, {"loss": 0.9944, "grad_norm": 0.7545472979545593, "learning_rate": 0.0002, "epoch": 0.7737817219367896, "step": 4970}, {"loss": 1.0554, "grad_norm": 0.7790022492408752, "learning_rate": 0.0002, "epoch": 
0.775338626809902, "step": 4980}, {"loss": 1.065, "grad_norm": 0.9435457587242126, "learning_rate": 0.0002, "epoch": 0.7768955316830142, "step": 4990}, {"loss": 1.0908, "grad_norm": 3.6213088035583496, "learning_rate": 0.0002, "epoch": 0.7784524365561264, "step": 5000}, {"loss": 1.0569, "grad_norm": 0.843288779258728, "learning_rate": 0.0002, "epoch": 0.7800093414292387, "step": 5010}, {"loss": 1.0666, "grad_norm": 0.7558038830757141, "learning_rate": 0.0002, "epoch": 0.7815662463023509, "step": 5020}, {"loss": 1.1004, "grad_norm": 0.851983368396759, "learning_rate": 0.0002, "epoch": 0.7831231511754632, "step": 5030}, {"loss": 1.0614, "grad_norm": 0.7531154751777649, "learning_rate": 0.0002, "epoch": 0.7846800560485754, "step": 5040}, {"loss": 1.0599, "grad_norm": 0.7359105348587036, "learning_rate": 0.0002, "epoch": 0.7862369609216877, "step": 5050}, {"loss": 1.0856, "grad_norm": 0.8272745609283447, "learning_rate": 0.0002, "epoch": 0.7877938657947999, "step": 5060}, {"loss": 1.0599, "grad_norm": 0.7510097622871399, "learning_rate": 0.0002, "epoch": 0.7893507706679121, "step": 5070}, {"loss": 1.0387, "grad_norm": 1.0069338083267212, "learning_rate": 0.0002, "epoch": 0.7909076755410245, "step": 5080}, {"loss": 1.08, "grad_norm": 0.7729558348655701, "learning_rate": 0.0002, "epoch": 0.7924645804141367, "step": 5090}, {"loss": 1.0519, "grad_norm": 0.8252530694007874, "learning_rate": 0.0002, "epoch": 0.7940214852872489, "step": 5100}, {"loss": 1.0751, "grad_norm": 0.8848336935043335, "learning_rate": 0.0002, "epoch": 0.7955783901603612, "step": 5110}, {"loss": 1.0416, "grad_norm": 0.9576271176338196, "learning_rate": 0.0002, "epoch": 0.7971352950334735, "step": 5120}, {"loss": 1.095, "grad_norm": 0.9695903062820435, "learning_rate": 0.0002, "epoch": 0.7986921999065857, "step": 5130}, {"loss": 1.0906, "grad_norm": 0.854999840259552, "learning_rate": 0.0002, "epoch": 0.8002491047796979, "step": 5140}, {"loss": 1.0609, "grad_norm": 0.6366874575614929, "learning_rate": 
0.0002, "epoch": 0.8018060096528102, "step": 5150}, {"loss": 1.057, "grad_norm": 0.8735486268997192, "learning_rate": 0.0002, "epoch": 0.8033629145259225, "step": 5160}, {"loss": 1.0437, "grad_norm": 0.973213791847229, "learning_rate": 0.0002, "epoch": 0.8049198193990347, "step": 5170}, {"loss": 1.0706, "grad_norm": 0.7560285925865173, "learning_rate": 0.0002, "epoch": 0.806476724272147, "step": 5180}, {"loss": 1.0821, "grad_norm": 0.8631170392036438, "learning_rate": 0.0002, "epoch": 0.8080336291452592, "step": 5190}, {"loss": 1.1323, "grad_norm": 0.8319111466407776, "learning_rate": 0.0002, "epoch": 0.8095905340183714, "step": 5200}, {"loss": 1.0686, "grad_norm": 0.88200443983078, "learning_rate": 0.0002, "epoch": 0.8111474388914838, "step": 5210}, {"loss": 1.092, "grad_norm": 0.7943915128707886, "learning_rate": 0.0002, "epoch": 0.812704343764596, "step": 5220}, {"loss": 1.0477, "grad_norm": 0.8400788307189941, "learning_rate": 0.0002, "epoch": 0.8142612486377082, "step": 5230}, {"loss": 1.0919, "grad_norm": 0.836159348487854, "learning_rate": 0.0002, "epoch": 0.8158181535108205, "step": 5240}, {"loss": 1.1151, "grad_norm": 0.8944069147109985, "learning_rate": 0.0002, "epoch": 0.8173750583839328, "step": 5250}, {"loss": 1.034, "grad_norm": 0.8507514595985413, "learning_rate": 0.0002, "epoch": 0.818931963257045, "step": 5260}, {"loss": 1.1083, "grad_norm": 0.7153893709182739, "learning_rate": 0.0002, "epoch": 0.8204888681301572, "step": 5270}, {"loss": 1.0554, "grad_norm": 0.9960679411888123, "learning_rate": 0.0002, "epoch": 0.8220457730032695, "step": 5280}, {"loss": 1.0689, "grad_norm": 0.8010078072547913, "learning_rate": 0.0002, "epoch": 0.8236026778763818, "step": 5290}, {"loss": 1.066, "grad_norm": 0.828545093536377, "learning_rate": 0.0002, "epoch": 0.825159582749494, "step": 5300}, {"loss": 1.0483, "grad_norm": 0.8475313782691956, "learning_rate": 0.0002, "epoch": 0.8267164876226063, "step": 5310}, {"loss": 1.0651, "grad_norm": 0.7004364132881165, 
"learning_rate": 0.0002, "epoch": 0.8282733924957185, "step": 5320}, {"loss": 1.0619, "grad_norm": 0.9940724968910217, "learning_rate": 0.0002, "epoch": 0.8298302973688307, "step": 5330}, {"loss": 1.1068, "grad_norm": 0.7745581269264221, "learning_rate": 0.0002, "epoch": 0.8313872022419431, "step": 5340}, {"loss": 1.0099, "grad_norm": 0.8869436383247375, "learning_rate": 0.0002, "epoch": 0.8329441071150553, "step": 5350}, {"loss": 1.1077, "grad_norm": 0.7553290128707886, "learning_rate": 0.0002, "epoch": 0.8345010119881675, "step": 5360}, {"loss": 1.0694, "grad_norm": 0.7386214733123779, "learning_rate": 0.0002, "epoch": 0.8360579168612797, "step": 5370}, {"loss": 1.0421, "grad_norm": 0.7427400350570679, "learning_rate": 0.0002, "epoch": 0.8376148217343921, "step": 5380}, {"loss": 1.0785, "grad_norm": 0.8802869915962219, "learning_rate": 0.0002, "epoch": 0.8391717266075043, "step": 5390}, {"loss": 1.0108, "grad_norm": 0.9886623024940491, "learning_rate": 0.0002, "epoch": 0.8407286314806165, "step": 5400}, {"loss": 1.0826, "grad_norm": 0.7445552349090576, "learning_rate": 0.0002, "epoch": 0.8422855363537288, "step": 5410}, {"loss": 1.0553, "grad_norm": 1.035735845565796, "learning_rate": 0.0002, "epoch": 0.843842441226841, "step": 5420}, {"loss": 1.0694, "grad_norm": 0.786275327205658, "learning_rate": 0.0002, "epoch": 0.8453993460999533, "step": 5430}, {"loss": 1.0806, "grad_norm": 0.8249770402908325, "learning_rate": 0.0002, "epoch": 0.8469562509730656, "step": 5440}, {"loss": 1.0761, "grad_norm": 0.8181759715080261, "learning_rate": 0.0002, "epoch": 0.8485131558461778, "step": 5450}, {"loss": 1.0958, "grad_norm": 0.8673132658004761, "learning_rate": 0.0002, "epoch": 0.85007006071929, "step": 5460}, {"loss": 1.0457, "grad_norm": 0.9361504316329956, "learning_rate": 0.0002, "epoch": 0.8516269655924023, "step": 5470}, {"loss": 1.1216, "grad_norm": 0.9014797806739807, "learning_rate": 0.0002, "epoch": 0.8531838704655146, "step": 5480}, {"loss": 1.0861, "grad_norm": 
0.8566169738769531, "learning_rate": 0.0002, "epoch": 0.8547407753386268, "step": 5490}, {"loss": 1.0582, "grad_norm": 0.758193850517273, "learning_rate": 0.0002, "epoch": 0.856297680211739, "step": 5500}, {"loss": 1.033, "grad_norm": 1.0685954093933105, "learning_rate": 0.0002, "epoch": 0.8578545850848514, "step": 5510}, {"loss": 1.081, "grad_norm": 2.039160966873169, "learning_rate": 0.0002, "epoch": 0.8594114899579636, "step": 5520}, {"loss": 1.0092, "grad_norm": 0.8142752051353455, "learning_rate": 0.0002, "epoch": 0.8609683948310758, "step": 5530}, {"loss": 1.0731, "grad_norm": 0.8965482115745544, "learning_rate": 0.0002, "epoch": 0.8625252997041881, "step": 5540}, {"loss": 1.0667, "grad_norm": 0.8454253673553467, "learning_rate": 0.0002, "epoch": 0.8640822045773003, "step": 5550}, {"loss": 1.1077, "grad_norm": 0.9199718832969666, "learning_rate": 0.0002, "epoch": 0.8656391094504126, "step": 5560}, {"loss": 1.1153, "grad_norm": 1.3581570386886597, "learning_rate": 0.0002, "epoch": 0.8671960143235248, "step": 5570}, {"loss": 1.0976, "grad_norm": 1.0530487298965454, "learning_rate": 0.0002, "epoch": 0.8687529191966371, "step": 5580}, {"loss": 1.064, "grad_norm": 0.8709384202957153, "learning_rate": 0.0002, "epoch": 0.8703098240697493, "step": 5590}, {"loss": 1.05, "grad_norm": 0.7050154209136963, "learning_rate": 0.0002, "epoch": 0.8718667289428615, "step": 5600}, {"loss": 1.056, "grad_norm": 0.7182510495185852, "learning_rate": 0.0002, "epoch": 0.8734236338159739, "step": 5610}, {"loss": 1.0801, "grad_norm": 0.8140570521354675, "learning_rate": 0.0002, "epoch": 0.8749805386890861, "step": 5620}, {"loss": 1.1154, "grad_norm": 0.8480454087257385, "learning_rate": 0.0002, "epoch": 0.8765374435621983, "step": 5630}, {"loss": 1.0615, "grad_norm": 0.6963641047477722, "learning_rate": 0.0002, "epoch": 0.8780943484353106, "step": 5640}, {"loss": 1.0694, "grad_norm": 0.9263169765472412, "learning_rate": 0.0002, "epoch": 0.8796512533084229, "step": 5650}, {"loss": 
1.0548, "grad_norm": 0.7170609831809998, "learning_rate": 0.0002, "epoch": 0.8812081581815351, "step": 5660}, {"loss": 1.0626, "grad_norm": 0.8241595029830933, "learning_rate": 0.0002, "epoch": 0.8827650630546474, "step": 5670}, {"loss": 1.0947, "grad_norm": 1.0092189311981201, "learning_rate": 0.0002, "epoch": 0.8843219679277596, "step": 5680}, {"loss": 1.0692, "grad_norm": 0.8658111095428467, "learning_rate": 0.0002, "epoch": 0.8858788728008719, "step": 5690}, {"loss": 1.0723, "grad_norm": 0.8604783415794373, "learning_rate": 0.0002, "epoch": 0.8874357776739841, "step": 5700}, {"loss": 1.0806, "grad_norm": 0.9609537720680237, "learning_rate": 0.0002, "epoch": 0.8889926825470964, "step": 5710}, {"loss": 1.0297, "grad_norm": 0.906218945980072, "learning_rate": 0.0002, "epoch": 0.8905495874202086, "step": 5720}, {"loss": 1.0758, "grad_norm": 1.1734384298324585, "learning_rate": 0.0002, "epoch": 0.8921064922933208, "step": 5730}, {"loss": 1.0846, "grad_norm": 0.9665660262107849, "learning_rate": 0.0002, "epoch": 0.8936633971664332, "step": 5740}, {"loss": 1.013, "grad_norm": 1.030881643295288, "learning_rate": 0.0002, "epoch": 0.8952203020395454, "step": 5750}, {"loss": 1.0805, "grad_norm": 0.8214338421821594, "learning_rate": 0.0002, "epoch": 0.8967772069126576, "step": 5760}, {"loss": 1.0705, "grad_norm": 0.9224949479103088, "learning_rate": 0.0002, "epoch": 0.8983341117857699, "step": 5770}, {"loss": 1.0836, "grad_norm": 0.8578333854675293, "learning_rate": 0.0002, "epoch": 0.8998910166588822, "step": 5780}, {"loss": 1.0857, "grad_norm": 0.7420307397842407, "learning_rate": 0.0002, "epoch": 0.9014479215319944, "step": 5790}, {"loss": 1.125, "grad_norm": 0.7854830622673035, "learning_rate": 0.0002, "epoch": 0.9030048264051066, "step": 5800}, {"loss": 1.0958, "grad_norm": 0.764635443687439, "learning_rate": 0.0002, "epoch": 0.9045617312782189, "step": 5810}, {"loss": 1.1135, "grad_norm": 0.8214240074157715, "learning_rate": 0.0002, "epoch": 0.9061186361513311, 
"step": 5820}, {"loss": 1.0671, "grad_norm": 0.9154710173606873, "learning_rate": 0.0002, "epoch": 0.9076755410244434, "step": 5830}, {"loss": 1.0634, "grad_norm": 0.8590622544288635, "learning_rate": 0.0002, "epoch": 0.9092324458975557, "step": 5840}, {"loss": 1.11, "grad_norm": 0.8412153720855713, "learning_rate": 0.0002, "epoch": 0.9107893507706679, "step": 5850}, {"loss": 1.0638, "grad_norm": 1.9675449132919312, "learning_rate": 0.0002, "epoch": 0.9123462556437801, "step": 5860}, {"loss": 1.0296, "grad_norm": 1.1379388570785522, "learning_rate": 0.0002, "epoch": 0.9139031605168925, "step": 5870}, {"loss": 1.0216, "grad_norm": 0.7163333296775818, "learning_rate": 0.0002, "epoch": 0.9154600653900047, "step": 5880}, {"loss": 1.0477, "grad_norm": 0.9098557233810425, "learning_rate": 0.0002, "epoch": 0.9170169702631169, "step": 5890}, {"loss": 1.0432, "grad_norm": 0.7249660491943359, "learning_rate": 0.0002, "epoch": 0.9185738751362291, "step": 5900}, {"loss": 1.1456, "grad_norm": 0.7825692892074585, "learning_rate": 0.0002, "epoch": 0.9201307800093415, "step": 5910}, {"loss": 1.0636, "grad_norm": 0.8548965454101562, "learning_rate": 0.0002, "epoch": 0.9216876848824537, "step": 5920}, {"loss": 1.0882, "grad_norm": 0.9693538546562195, "learning_rate": 0.0002, "epoch": 0.9232445897555659, "step": 5930}, {"loss": 1.0624, "grad_norm": 0.7708289623260498, "learning_rate": 0.0002, "epoch": 0.9248014946286782, "step": 5940}, {"loss": 1.1018, "grad_norm": 0.8006708025932312, "learning_rate": 0.0002, "epoch": 0.9263583995017904, "step": 5950}, {"loss": 1.0193, "grad_norm": 0.6454119086265564, "learning_rate": 0.0002, "epoch": 0.9279153043749027, "step": 5960}, {"loss": 1.0527, "grad_norm": 0.7499276995658875, "learning_rate": 0.0002, "epoch": 0.929472209248015, "step": 5970}, {"loss": 1.0905, "grad_norm": 0.822564959526062, "learning_rate": 0.0002, "epoch": 0.9310291141211272, "step": 5980}, {"loss": 1.0865, "grad_norm": 0.9264562129974365, "learning_rate": 0.0002, "epoch": 
0.9325860189942394, "step": 5990}, {"loss": 1.0341, "grad_norm": 0.793655276298523, "learning_rate": 0.0002, "epoch": 0.9341429238673518, "step": 6000}, {"loss": 1.1133, "grad_norm": 0.957902729511261, "learning_rate": 0.0002, "epoch": 0.935699828740464, "step": 6010}, {"loss": 1.0464, "grad_norm": 0.8112550377845764, "learning_rate": 0.0002, "epoch": 0.9372567336135762, "step": 6020}, {"loss": 1.0623, "grad_norm": 1.0112614631652832, "learning_rate": 0.0002, "epoch": 0.9388136384866884, "step": 6030}, {"loss": 1.0408, "grad_norm": 0.8313361406326294, "learning_rate": 0.0002, "epoch": 0.9403705433598007, "step": 6040}, {"loss": 1.0841, "grad_norm": 1.0863341093063354, "learning_rate": 0.0002, "epoch": 0.941927448232913, "step": 6050}, {"loss": 1.0491, "grad_norm": 0.7765035033226013, "learning_rate": 0.0002, "epoch": 0.9434843531060252, "step": 6060}, {"loss": 1.1264, "grad_norm": 0.9313017129898071, "learning_rate": 0.0002, "epoch": 0.9450412579791375, "step": 6070}, {"loss": 1.0546, "grad_norm": 0.7248339653015137, "learning_rate": 0.0002, "epoch": 0.9465981628522497, "step": 6080}, {"loss": 1.0344, "grad_norm": 0.8163012862205505, "learning_rate": 0.0002, "epoch": 0.948155067725362, "step": 6090}, {"loss": 1.0442, "grad_norm": 0.7007320523262024, "learning_rate": 0.0002, "epoch": 0.9497119725984743, "step": 6100}, {"loss": 1.1153, "grad_norm": 0.8500741720199585, "learning_rate": 0.0002, "epoch": 0.9512688774715865, "step": 6110}, {"loss": 1.0632, "grad_norm": 1.0168312788009644, "learning_rate": 0.0002, "epoch": 0.9528257823446987, "step": 6120}, {"loss": 1.0429, "grad_norm": 0.884118378162384, "learning_rate": 0.0002, "epoch": 0.9543826872178109, "step": 6130}, {"loss": 1.0902, "grad_norm": 0.878180742263794, "learning_rate": 0.0002, "epoch": 0.9559395920909233, "step": 6140}, {"loss": 1.0709, "grad_norm": 0.8097935914993286, "learning_rate": 0.0002, "epoch": 0.9574964969640355, "step": 6150}, {"loss": 1.0256, "grad_norm": 1.012940526008606, "learning_rate": 
0.0002, "epoch": 0.9590534018371477, "step": 6160}, {"loss": 1.0577, "grad_norm": 0.7821488380432129, "learning_rate": 0.0002, "epoch": 0.96061030671026, "step": 6170}, {"loss": 1.0836, "grad_norm": 0.8818637728691101, "learning_rate": 0.0002, "epoch": 0.9621672115833723, "step": 6180}, {"loss": 1.0676, "grad_norm": 0.6988415122032166, "learning_rate": 0.0002, "epoch": 0.9637241164564845, "step": 6190}, {"loss": 1.0554, "grad_norm": 0.7380658388137817, "learning_rate": 0.0002, "epoch": 0.9652810213295968, "step": 6200}, {"loss": 1.0251, "grad_norm": 0.8896392583847046, "learning_rate": 0.0002, "epoch": 0.966837926202709, "step": 6210}, {"loss": 1.0748, "grad_norm": 0.7214707732200623, "learning_rate": 0.0002, "epoch": 0.9683948310758212, "step": 6220}, {"loss": 1.0829, "grad_norm": 0.8196309804916382, "learning_rate": 0.0002, "epoch": 0.9699517359489335, "step": 6230}, {"loss": 1.0552, "grad_norm": 1.0904492139816284, "learning_rate": 0.0002, "epoch": 0.9715086408220458, "step": 6240}, {"loss": 1.112, "grad_norm": 0.9569701552391052, "learning_rate": 0.0002, "epoch": 0.973065545695158, "step": 6250}, {"loss": 1.0625, "grad_norm": 0.6737582683563232, "learning_rate": 0.0002, "epoch": 0.9746224505682702, "step": 6260}, {"loss": 1.1142, "grad_norm": 0.8714433312416077, "learning_rate": 0.0002, "epoch": 0.9761793554413826, "step": 6270}, {"loss": 1.0581, "grad_norm": 0.9081109762191772, "learning_rate": 0.0002, "epoch": 0.9777362603144948, "step": 6280}, {"loss": 0.9908, "grad_norm": 0.8933042883872986, "learning_rate": 0.0002, "epoch": 0.979293165187607, "step": 6290}, {"loss": 1.0499, "grad_norm": 0.7654024958610535, "learning_rate": 0.0002, "epoch": 0.9808500700607193, "step": 6300}, {"loss": 1.1293, "grad_norm": 0.7197521328926086, "learning_rate": 0.0002, "epoch": 0.9824069749338316, "step": 6310}, {"loss": 1.0719, "grad_norm": 0.9743903279304504, "learning_rate": 0.0002, "epoch": 0.9839638798069438, "step": 6320}, {"loss": 1.0792, "grad_norm": 0.7470610737800598, 
"learning_rate": 0.0002, "epoch": 0.985520784680056, "step": 6330}, {"loss": 1.0719, "grad_norm": 0.9095385074615479, "learning_rate": 0.0002, "epoch": 0.9870776895531683, "step": 6340}, {"loss": 1.0516, "grad_norm": 0.7956041693687439, "learning_rate": 0.0002, "epoch": 0.9886345944262805, "step": 6350}, {"loss": 1.0969, "grad_norm": 0.8058228492736816, "learning_rate": 0.0002, "epoch": 0.9901914992993928, "step": 6360}, {"loss": 1.0584, "grad_norm": 0.8515527248382568, "learning_rate": 0.0002, "epoch": 0.9917484041725051, "step": 6370}, {"loss": 1.0168, "grad_norm": 0.9754130840301514, "learning_rate": 0.0002, "epoch": 0.9933053090456173, "step": 6380}, {"loss": 1.0738, "grad_norm": 0.8905683755874634, "learning_rate": 0.0002, "epoch": 0.9948622139187295, "step": 6390}, {"loss": 1.0609, "grad_norm": 0.9041321277618408, "learning_rate": 0.0002, "epoch": 0.9964191187918419, "step": 6400}, {"loss": 1.0788, "grad_norm": 0.8536698818206787, "learning_rate": 0.0002, "epoch": 0.9979760236649541, "step": 6410}, {"loss": 1.075, "grad_norm": 0.8174132704734802, "learning_rate": 0.0002, "epoch": 0.9995329285380663, "step": 6420}]}