Upload folder using huggingface_hub

Files changed (14)

adapter_config.json +2 -2
adapter_model.bin +1 -1
checkpoint-1500/README.md +202 -0
checkpoint-1500/adapter_config.json +30 -0
checkpoint-1500/adapter_model.safetensors +3 -0
checkpoint-1500/optimizer.pt +3 -0
checkpoint-1500/rng_state.pth +3 -0
checkpoint-1500/scheduler.pt +3 -0
checkpoint-1500/special_tokens_map.json +23 -0
checkpoint-1500/tokenizer.model +3 -0
checkpoint-1500/tokenizer_config.json +43 -0
checkpoint-1500/trainer_state.json +2658 -0
checkpoint-1500/training_args.bin +3 -0
trainer_state.json +0 -0

adapter_config.json CHANGED

@@ -20,9 +20,9 @@
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
-    "k_proj",
+    "q_proj",
     "v_proj",
-    "q_proj"
+    "k_proj"
   ],
   "task_type": "CAUSAL_LM",
   "use_dora": false,

adapter_model.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:303f74739d41df8f7708618f177ce8c620f3099088c0c3007c811e5460844c87
+oid sha256:e1efd888f4565a0c6f252f53fb2d520cdae353f3ac6f7b318cf5a66dac8c4c86
 size 100733002
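
The binary files in this commit appear only as Git LFS pointers: three lines giving the spec version, a sha256 object id, and the size in bytes. As a minimal sketch, a pointer can be parsed and a downloaded file checked against it as shown below. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and are not part of git-lfs or huggingface_hub.

```python
import hashlib


def parse_lfs_pointer(text):
    """Split a Git LFS pointer into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Check raw bytes against the size and sha256 digest of a parsed pointer."""
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


# New-side pointer for adapter_model.bin, copied from this commit.
pointer = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:e1efd888f4565a0c6f252f53fb2d520cdae353f3ac6f7b318cf5a66dac8c4c86
size 100733002
""")
print(pointer["algo"], pointer["size"])
```

`matches_pointer` reads the whole payload into memory, which keeps the sketch short; for files of this size, hashing in chunks would be the more careful choice.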

checkpoint-1500/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+library_name: peft
+base_model: FlagAlpha/Llama2-Chinese-7b-Chat
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.11.1

checkpoint-1500/adapter_config.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "FlagAlpha/Llama2-Chinese-7b-Chat",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 128,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 64,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "q_proj",
+    "v_proj",
+    "k_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
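
For orientation, the LoRA settings in this adapter_config.json imply a scaling factor of `lora_alpha / r` (since `use_rslora` is false) and a fixed number of trainable parameters. The sketch below works both out. The hidden size (4096) and layer count (32) are assumptions taken from the standard Llama-2-7B architecture and do not appear in this commit; the 2 bytes per parameter assume the adapter is stored in 16-bit precision.

```python
# Values copied from checkpoint-1500/adapter_config.json in this commit.
lora_config = {
    "r": 64,
    "lora_alpha": 128,
    "target_modules": ["q_proj", "v_proj", "k_proj"],
    "use_rslora": False,
}

# ASSUMPTION: standard Llama-2-7B shapes; not stated anywhere in this commit.
HIDDEN_SIZE = 4096
NUM_LAYERS = 32


def lora_scaling(config):
    """Scaling of the low-rank update: alpha / r, or alpha / sqrt(r) with rsLoRA."""
    r = config["r"]
    if config["use_rslora"]:
        return config["lora_alpha"] / r ** 0.5
    return config["lora_alpha"] / r


def lora_param_count(config, hidden_size, num_layers):
    """Each targeted square projection adds A (r x in) and B (out x r)."""
    per_module = config["r"] * (hidden_size + hidden_size)
    return per_module * len(config["target_modules"]) * num_layers


scaling = lora_scaling(lora_config)
n_params = lora_param_count(lora_config, HIDDEN_SIZE, NUM_LAYERS)
approx_bytes = n_params * 2  # ASSUMPTION: 16-bit weights

print(scaling, n_params, approx_bytes)
```

Under those assumptions the estimate comes to 100663296 bytes, which is within about 26 KB of the 100689344-byte adapter_model.safetensors pointer in this commit; the small remainder would be consistent with file metadata, though the commit itself does not confirm the storage dtype.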

checkpoint-1500/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8607b9b56bcdfaeb6b3c76736547e896aba4ac5151c781e13e3fd6b4549e38ab
+size 100689344

checkpoint-1500/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f450b6a41305430d5bbd8d6d61835cc8651a7979b67aa7200bfdbbb2ea6c642b
+size 201488570

checkpoint-1500/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8d03a096db7ee5a1191528b365ea5b44b41de47ef2fcd633a48ead136d5f3f4f
+size 14244

checkpoint-1500/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:16bb6eb3e071ae677d4ff02f958ab690aa9cdb0fd86a22d9cdf4c9c04c998c00
+size 1064

checkpoint-1500/special_tokens_map.json ADDED

	@@ -0,0 +1,23 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-1500/tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
+size 499723

checkpoint-1500/tokenizer_config.json ADDED

	@@ -0,0 +1,43 @@

+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "add_prefix_space": true,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</s>",
+  "legacy": false,
+  "model_max_length": 512,
+  "pad_token": null,
+  "sp_model_kwargs": {},
+  "spaces_between_special_tokens": false,
+  "tokenizer_class": "LlamaTokenizer",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": false,
+  "use_fast": false
+}
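
The `learning_rate` values logged in checkpoint-1500/trainer_state.json (added in this commit) look like a linear warmup followed by cosine decay. The sketch below reproduces a few of them in plain Python. None of the schedule parameters are stated in the commit: the 20 warmup steps, the 3e-4 peak, and the 1954 total steps (two epochs at 977 steps per epoch, derived from the logged epoch fractions) are all inferred from the logged numbers, so treat them as assumptions rather than the actual training arguments.

```python
import math

# ASSUMPTIONS inferred from log_history, not read from training_args.bin.
PEAK_LR = 3e-4        # learning_rate logged at step 20
WARMUP_STEPS = 20     # the rate rises linearly until step 20
STEPS_PER_EPOCH = round(4 / 0.0040941658137154556)  # step 4 is logged at this epoch
TOTAL_STEPS = 2 * STEPS_PER_EPOCH  # hypothetical: a two-epoch run


def cosine_lr_with_warmup(step):
    """Linear warmup to PEAK_LR, then half-cosine decay to zero at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Compare against (step, learning_rate) pairs copied from log_history.
logged = {
    4: 5.9999999999999995e-05,
    20: 0.0003,
    24: 0.0002999968335945527,
    500: 0.00025666732802383463,
}
max_error = max(abs(cosine_lr_with_warmup(s) - lr) for s, lr in logged.items())
print(STEPS_PER_EPOCH, TOTAL_STEPS, max_error)
```

If the inferred schedule is right, `max_error` should stay at floating-point noise level for these samples; a visibly larger value would mean the warmup length or total step count was guessed wrong.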

checkpoint-1500/trainer_state.json ADDED

	@@ -0,0 +1,2658 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 1.5353121801432958,
+  "eval_steps": 500,
+  "global_step": 1500,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.0040941658137154556,
+      "grad_norm": 0.59375,
+      "learning_rate": 5.9999999999999995e-05,
+      "loss": 0.6558,
+      "step": 4
+    },
+    {
+      "epoch": 0.008188331627430911,
+      "grad_norm": 0.46875,
+      "learning_rate": 0.00011999999999999999,
+      "loss": 0.6895,
+      "step": 8
+    },
+    {
+      "epoch": 0.012282497441146366,
+      "grad_norm": 0.4296875,
+      "learning_rate": 0.00017999999999999998,
+      "loss": 0.641,
+      "step": 12
+    },
+    {
+      "epoch": 0.016376663254861822,
+      "grad_norm": 0.357421875,
+      "learning_rate": 0.00023999999999999998,
+      "loss": 0.6635,
+      "step": 16
+    },
+    {
+      "epoch": 0.02047082906857728,
+      "grad_norm": 0.4296875,
+      "learning_rate": 0.0003,
+      "loss": 0.6486,
+      "step": 20
+    },
+    {
+      "epoch": 0.02456499488229273,
+      "grad_norm": 0.408203125,
+      "learning_rate": 0.0002999968335945527,
+      "loss": 0.6335,
+      "step": 24
+    },
+    {
+      "epoch": 0.028659160696008188,
+      "grad_norm": 0.404296875,
+      "learning_rate": 0.00029998733451189267,
+      "loss": 0.6513,
+      "step": 28
+    },
+    {
+      "epoch": 0.032753326509723645,
+      "grad_norm": 0.388671875,
+      "learning_rate": 0.0002999715031530591,
+      "loss": 0.6439,
+      "step": 32
+    },
+    {
+      "epoch": 0.0368474923234391,
+      "grad_norm": 0.4296875,
+      "learning_rate": 0.000299949340186432,
+      "loss": 0.6304,
+      "step": 36
+    },
+    {
+      "epoch": 0.04094165813715456,
+      "grad_norm": 0.423828125,
+      "learning_rate": 0.0002999208465477039,
+      "loss": 0.6772,
+      "step": 40
+    },
+    {
+      "epoch": 0.04503582395087001,
+      "grad_norm": 0.404296875,
+      "learning_rate": 0.0002998860234398403,
+      "loss": 0.6294,
+      "step": 44
+    },
+    {
+      "epoch": 0.04912998976458546,
+      "grad_norm": 0.396484375,
+      "learning_rate": 0.0002998448723330289,
+      "loss": 0.6093,
+      "step": 48
+    },
+    {
+      "epoch": 0.05322415557830092,
+      "grad_norm": 0.408203125,
+      "learning_rate": 0.0002997973949646176,
+      "loss": 0.6516,
+      "step": 52
+    },
+    {
+      "epoch": 0.057318321392016376,
+      "grad_norm": 0.443359375,
+      "learning_rate": 0.0002997435933390409,
+      "loss": 0.6261,
+      "step": 56
+    },
+    {
+      "epoch": 0.06141248720573183,
+      "grad_norm": 0.390625,
+      "learning_rate": 0.0002996834697277358,
+      "loss": 0.6294,
+      "step": 60
+    },
+    {
+      "epoch": 0.06550665301944729,
+      "grad_norm": 0.38671875,
+      "learning_rate": 0.00029961702666904524,
+      "loss": 0.615,
+      "step": 64
+    },
+    {
+      "epoch": 0.06960081883316274,
+      "grad_norm": 0.40234375,
+      "learning_rate": 0.00029954426696811147,
+      "loss": 0.6036,
+      "step": 68
+    },
+    {
+      "epoch": 0.0736949846468782,
+      "grad_norm": 0.3984375,
+      "learning_rate": 0.00029946519369675726,
+      "loss": 0.5834,
+      "step": 72
+    },
+    {
+      "epoch": 0.07778915046059365,
+      "grad_norm": 0.37109375,
+      "learning_rate": 0.0002993798101933565,
+      "loss": 0.628,
+      "step": 76
+    },
+    {
+      "epoch": 0.08188331627430911,
+      "grad_norm": 0.39453125,
+      "learning_rate": 0.000299288120062693,
+      "loss": 0.6307,
+      "step": 80
+    },
+    {
+      "epoch": 0.08597748208802457,
+      "grad_norm": 0.3984375,
+      "learning_rate": 0.0002991901271758085,
+      "loss": 0.6392,
+      "step": 84
+    },
+    {
+      "epoch": 0.09007164790174002,
+      "grad_norm": 0.419921875,
+      "learning_rate": 0.0002990858356698392,
+      "loss": 0.6184,
+      "step": 88
+    },
+    {
+      "epoch": 0.09416581371545547,
+      "grad_norm": 0.390625,
+      "learning_rate": 0.00029897524994784095,
+      "loss": 0.6669,
+      "step": 92
+    },
+    {
+      "epoch": 0.09825997952917093,
+      "grad_norm": 0.353515625,
+      "learning_rate": 0.0002988583746786035,
+      "loss": 0.6474,
+      "step": 96
+    },
+    {
+      "epoch": 0.1023541453428864,
+      "grad_norm": 0.41015625,
+      "learning_rate": 0.0002987352147964534,
+      "loss": 0.6427,
+      "step": 100
+    },
+    {
+      "epoch": 0.10644831115660185,
+      "grad_norm": 0.369140625,
+      "learning_rate": 0.00029860577550104567,
+      "loss": 0.6331,
+      "step": 104
+    },
+    {
+      "epoch": 0.1105424769703173,
+      "grad_norm": 0.35546875,
+      "learning_rate": 0.0002984700622571441,
+      "loss": 0.6274,
+      "step": 108
+    },
+    {
+      "epoch": 0.11463664278403275,
+      "grad_norm": 0.40234375,
+      "learning_rate": 0.00029832808079439076,
+      "loss": 0.6363,
+      "step": 112
+    },
+    {
+      "epoch": 0.1187308085977482,
+      "grad_norm": 0.57421875,
+      "learning_rate": 0.000298179837107064,
+      "loss": 0.6413,
+      "step": 116
+    },
+    {
+      "epoch": 0.12282497441146366,
+      "grad_norm": 0.390625,
+      "learning_rate": 0.00029802533745382546,
+      "loss": 0.6439,
+      "step": 120
+    },
+    {
+      "epoch": 0.1269191402251791,
+      "grad_norm": 0.3984375,
+      "learning_rate": 0.00029786458835745564,
+      "loss": 0.6274,
+      "step": 124
+    },
+    {
+      "epoch": 0.13101330603889458,
+      "grad_norm": 0.37890625,
+      "learning_rate": 0.0002976975966045788,
+      "loss": 0.605,
+      "step": 128
+    },
+    {
+      "epoch": 0.13510747185261002,
+      "grad_norm": 0.384765625,
+      "learning_rate": 0.00029752436924537616,
+      "loss": 0.6316,
+      "step": 132
+    },
+    {
+      "epoch": 0.13920163766632548,
+      "grad_norm": 0.376953125,
+      "learning_rate": 0.00029734491359328854,
+      "loss": 0.6802,
+      "step": 136
+    },
+    {
+      "epoch": 0.14329580348004095,
+      "grad_norm": 0.373046875,
+      "learning_rate": 0.00029715923722470724,
+      "loss": 0.6841,
+      "step": 140
+    },
+    {
+      "epoch": 0.1473899692937564,
+      "grad_norm": 0.396484375,
+      "learning_rate": 0.0002969673479786545,
+      "loss": 0.6004,
+      "step": 144
+    },
+    {
+      "epoch": 0.15148413510747186,
+      "grad_norm": 0.380859375,
+      "learning_rate": 0.00029676925395645233,
+      "loss": 0.671,
+      "step": 148
+    },
+    {
+      "epoch": 0.1555783009211873,
+      "grad_norm": 0.37890625,
+      "learning_rate": 0.00029656496352138066,
+      "loss": 0.6257,
+      "step": 152
+    },
+    {
+      "epoch": 0.15967246673490276,
+      "grad_norm": 0.384765625,
+      "learning_rate": 0.00029635448529832407,
+      "loss": 0.6141,
+      "step": 156
+    },
+    {
+      "epoch": 0.16376663254861823,
+      "grad_norm": 0.390625,
+      "learning_rate": 0.0002961378281734078,
+      "loss": 0.6391,
+      "step": 160
+    },
+    {
+      "epoch": 0.16786079836233367,
+      "grad_norm": 0.396484375,
+      "learning_rate": 0.00029591500129362255,
+      "loss": 0.6211,
+      "step": 164
+    },
+    {
+      "epoch": 0.17195496417604914,
+      "grad_norm": 0.369140625,
+      "learning_rate": 0.00029568601406643826,
+      "loss": 0.6644,
+      "step": 168
+    },
+    {
+      "epoch": 0.17604912998976457,
+      "grad_norm": 0.388671875,
+      "learning_rate": 0.0002954508761594069,
+      "loss": 0.6667,
+      "step": 172
+    },
+    {
+      "epoch": 0.18014329580348004,
+      "grad_norm": 0.412109375,
+      "learning_rate": 0.0002952095974997546,
+      "loss": 0.6351,
+      "step": 176
+    },
+    {
+      "epoch": 0.1842374616171955,
+      "grad_norm": 0.390625,
+      "learning_rate": 0.0002949621882739621,
+      "loss": 0.6142,
+      "step": 180
+    },
+    {
+      "epoch": 0.18833162743091095,
+      "grad_norm": 0.412109375,
+      "learning_rate": 0.000294708658927335,
+      "loss": 0.6569,
+      "step": 184
+    },
+    {
+      "epoch": 0.19242579324462641,
+      "grad_norm": 0.353515625,
+      "learning_rate": 0.00029444902016356267,
+      "loss": 0.6272,
+      "step": 188
+    },
+    {
+      "epoch": 0.19651995905834185,
+      "grad_norm": 0.37109375,
+      "learning_rate": 0.00029418328294426643,
+      "loss": 0.6742,
+      "step": 192
+    },
+    {
+      "epoch": 0.20061412487205732,
+      "grad_norm": 0.404296875,
+      "learning_rate": 0.00029391145848853674,
+      "loss": 0.6513,
+      "step": 196
+    },
+    {
+      "epoch": 0.2047082906857728,
+      "grad_norm": 0.37109375,
+      "learning_rate": 0.00029363355827245925,
+      "loss": 0.6369,
+      "step": 200
+    },
+    {
+      "epoch": 0.20880245649948823,
+      "grad_norm": 0.376953125,
+      "learning_rate": 0.0002933495940286309,
+      "loss": 0.6371,
+      "step": 204
+    },
+    {
+      "epoch": 0.2128966223132037,
+      "grad_norm": 0.376953125,
+      "learning_rate": 0.000293059577745664,
+      "loss": 0.6745,
+      "step": 208
+    },
+    {
+      "epoch": 0.21699078812691913,
+      "grad_norm": 0.376953125,
+      "learning_rate": 0.00029276352166768033,
+      "loss": 0.6577,
+      "step": 212
+    },
+    {
+      "epoch": 0.2210849539406346,
+      "grad_norm": 0.392578125,
+      "learning_rate": 0.0002924614382937944,
+      "loss": 0.6224,
+      "step": 216
+    },
+    {
+      "epoch": 0.22517911975435004,
+      "grad_norm": 0.37109375,
+      "learning_rate": 0.0002921533403775853,
+      "loss": 0.6471,
+      "step": 220
+    },
+    {
+      "epoch": 0.2292732855680655,
+      "grad_norm": 0.3828125,
+      "learning_rate": 0.0002918392409265587,
+      "loss": 0.6583,
+      "step": 224
+    },
+    {
+      "epoch": 0.23336745138178097,
+      "grad_norm": 0.390625,
+      "learning_rate": 0.00029151915320159747,
+      "loss": 0.6408,
+      "step": 228
+    },
+    {
+      "epoch": 0.2374616171954964,
+      "grad_norm": 0.396484375,
+      "learning_rate": 0.0002911930907164017,
+      "loss": 0.6275,
+      "step": 232
+    },
+    {
+      "epoch": 0.24155578300921188,
+      "grad_norm": 0.359375,
+      "learning_rate": 0.00029086106723691857,
+      "loss": 0.6083,
+      "step": 236
+    },
+    {
+      "epoch": 0.24564994882292732,
+      "grad_norm": 0.37890625,
+      "learning_rate": 0.00029052309678076065,
+      "loss": 0.5966,
+      "step": 240
+    },
+    {
+      "epoch": 0.24974411463664278,
+      "grad_norm": 0.3671875,
+      "learning_rate": 0.0002901791936166147,
+      "loss": 0.6294,
+      "step": 244
+    },
+    {
+      "epoch": 0.2538382804503582,
+      "grad_norm": 0.390625,
+      "learning_rate": 0.0002898293722636386,
+      "loss": 0.647,
+      "step": 248
+    },
+    {
+      "epoch": 0.2579324462640737,
+      "grad_norm": 0.421875,
+      "learning_rate": 0.00028947364749084897,
+      "loss": 0.6532,
+      "step": 252
+    },
+    {
+      "epoch": 0.26202661207778916,
+      "grad_norm": 0.38671875,
+      "learning_rate": 0.0002891120343164972,
+      "loss": 0.6059,
+      "step": 256
+    },
+    {
+      "epoch": 0.2661207778915046,
+      "grad_norm": 0.39453125,
+      "learning_rate": 0.00028874454800743556,
+      "loss": 0.6545,
+      "step": 260
+    },
+    {
+      "epoch": 0.27021494370522003,
+      "grad_norm": 0.365234375,
+      "learning_rate": 0.00028837120407847286,
+      "loss": 0.6462,
+      "step": 264
+    },
+    {
+      "epoch": 0.2743091095189355,
+      "grad_norm": 0.373046875,
+      "learning_rate": 0.000287992018291719,
+      "loss": 0.6337,
+      "step": 268
+    },
+    {
+      "epoch": 0.27840327533265097,
+      "grad_norm": 0.365234375,
+      "learning_rate": 0.00028760700665591985,
+      "loss": 0.6431,
+      "step": 272
+    },
+    {
+      "epoch": 0.28249744114636643,
+      "grad_norm": 0.353515625,
+      "learning_rate": 0.0002872161854257814,
+      "loss": 0.6797,
+      "step": 276
+    },
+    {
+      "epoch": 0.2865916069600819,
+      "grad_norm": 0.369140625,
+      "learning_rate": 0.00028681957110128313,
+      "loss": 0.6191,
+      "step": 280
+    },
+    {
+      "epoch": 0.2906857727737973,
+      "grad_norm": 0.384765625,
+      "learning_rate": 0.000286417180426982,
+      "loss": 0.6064,
+      "step": 284
+    },
+    {
+      "epoch": 0.2947799385875128,
+      "grad_norm": 0.369140625,
+      "learning_rate": 0.0002860090303913048,
+      "loss": 0.6451,
+      "step": 288
+    },
+    {
+      "epoch": 0.29887410440122825,
+      "grad_norm": 0.380859375,
+      "learning_rate": 0.00028559513822583153,
+      "loss": 0.6402,
+      "step": 292
+    },
+    {
+      "epoch": 0.3029682702149437,
+      "grad_norm": 0.408203125,
+      "learning_rate": 0.0002851755214045676,
+      "loss": 0.655,
+      "step": 296
+    },
+    {
+      "epoch": 0.3070624360286592,
+      "grad_norm": 0.349609375,
+      "learning_rate": 0.00028475019764320634,
+      "loss": 0.6627,
+      "step": 300
+    },
+    {
+      "epoch": 0.3111566018423746,
+      "grad_norm": 0.390625,
+      "learning_rate": 0.00028431918489838057,
+      "loss": 0.6654,
+      "step": 304
+    },
+    {
+      "epoch": 0.31525076765609006,
+      "grad_norm": 0.365234375,
+      "learning_rate": 0.0002838825013669051,
+      "loss": 0.6193,
+      "step": 308
+    },
+    {
+      "epoch": 0.3193449334698055,
+      "grad_norm": 0.396484375,
+      "learning_rate": 0.000283440165485008,
+      "loss": 0.6411,
+      "step": 312
+    },
+    {
+      "epoch": 0.323439099283521,
+      "grad_norm": 0.35546875,
+      "learning_rate": 0.00028299219592755264,
+      "loss": 0.5887,
+      "step": 316
+    },
+    {
+      "epoch": 0.32753326509723646,
+      "grad_norm": 0.373046875,
+      "learning_rate": 0.000282538611607249,
+      "loss": 0.5907,
+      "step": 320
+    },
+    {
+      "epoch": 0.33162743091095187,
+      "grad_norm": 0.369140625,
+      "learning_rate": 0.00028207943167385516,
+      "loss": 0.6408,
+      "step": 324
+    },
+    {
+      "epoch": 0.33572159672466734,
+      "grad_norm": 0.4140625,
+      "learning_rate": 0.000281614675513369,
+      "loss": 0.5642,
+      "step": 328
+    },
+    {
+      "epoch": 0.3398157625383828,
+      "grad_norm": 0.392578125,
+      "learning_rate": 0.0002811443627472098,
+      "loss": 0.6303,
+      "step": 332
+    },
+    {
+      "epoch": 0.34390992835209827,
+      "grad_norm": 0.369140625,
+      "learning_rate": 0.0002806685132313896,
+      "loss": 0.6267,
+      "step": 336
+    },
+    {
+      "epoch": 0.34800409416581374,
+      "grad_norm": 0.35546875,
+      "learning_rate": 0.00028018714705567503,
+      "loss": 0.6681,
+      "step": 340
+    },
+    {
+      "epoch": 0.35209825997952915,
+      "grad_norm": 0.3671875,
+      "learning_rate": 0.00027970028454273917,
+      "loss": 0.6606,
+      "step": 344
+    },
+    {
+      "epoch": 0.3561924257932446,
+      "grad_norm": 0.373046875,
+      "learning_rate": 0.0002792079462473035,
+      "loss": 0.6027,
+      "step": 348
+    },
+    {
+      "epoch": 0.3602865916069601,
+      "grad_norm": 0.400390625,
+      "learning_rate": 0.0002787101529552702,
+      "loss": 0.6485,
+      "step": 352
+    },
+    {
+      "epoch": 0.36438075742067555,
+      "grad_norm": 0.33984375,
+      "learning_rate": 0.0002782069256828445,
+      "loss": 0.6345,
+      "step": 356
+    },
+    {
+      "epoch": 0.368474923234391,
+      "grad_norm": 0.38671875,
+      "learning_rate": 0.0002776982856756473,
+      "loss": 0.6211,
+      "step": 360
+    },
+    {
+      "epoch": 0.3725690890481064,
+      "grad_norm": 0.384765625,
+      "learning_rate": 0.0002771842544078187,
+      "loss": 0.598,
+      "step": 364
+    },
+    {
+      "epoch": 0.3766632548618219,
+      "grad_norm": 0.369140625,
+      "learning_rate": 0.0002766648535811105,
+      "loss": 0.6719,
+      "step": 368
+    },
+    {
+      "epoch": 0.38075742067553736,
+      "grad_norm": 0.341796875,
+      "learning_rate": 0.000276140105123971,
+      "loss": 0.6142,
+      "step": 372
+    },
+    {
+      "epoch": 0.38485158648925283,
+      "grad_norm": 0.365234375,
+      "learning_rate": 0.0002756100311906185,
+      "loss": 0.6187,
+      "step": 376
+    },
+    {
+      "epoch": 0.3889457523029683,
+      "grad_norm": 0.365234375,
+      "learning_rate": 0.000275074654160106,
+      "loss": 0.6373,
+      "step": 380
+    },
+    {
+      "epoch": 0.3930399181166837,
+      "grad_norm": 0.361328125,
+      "learning_rate": 0.00027453399663537707,
+      "loss": 0.6376,
+      "step": 384
+    },
+    {
+      "epoch": 0.3971340839303992,
+      "grad_norm": 0.365234375,
+      "learning_rate": 0.0002739880814423106,
+      "loss": 0.6187,
+      "step": 388
+    },
+    {
+      "epoch": 0.40122824974411464,
+      "grad_norm": 0.357421875,
+      "learning_rate": 0.0002734369316287578,
+      "loss": 0.648,
+      "step": 392
+    },
+    {
+      "epoch": 0.4053224155578301,
+      "grad_norm": 0.365234375,
+      "learning_rate": 0.0002728805704635691,
+      "loss": 0.6342,
+      "step": 396
+    },
+    {
+      "epoch": 0.4094165813715456,
+      "grad_norm": 0.3671875,
+      "learning_rate": 0.0002723190214356113,
+      "loss": 0.584,
+      "step": 400
+    },
+    {
+      "epoch": 0.413510747185261,
+      "grad_norm": 0.416015625,
+      "learning_rate": 0.0002717523082527766,
+      "loss": 0.6497,
+      "step": 404
+    },
+    {
+      "epoch": 0.41760491299897645,
+      "grad_norm": 0.365234375,
+      "learning_rate": 0.00027118045484098095,
+      "loss": 0.6038,
+      "step": 408
+    },
+    {
+      "epoch": 0.4216990788126919,
+      "grad_norm": 0.37109375,
+      "learning_rate": 0.0002706034853431546,
+      "loss": 0.6665,
+      "step": 412
+    },
+    {
+      "epoch": 0.4257932446264074,
+      "grad_norm": 0.341796875,
+      "learning_rate": 0.0002700214241182223,
+      "loss": 0.6422,
+      "step": 416
+    },
+    {
+      "epoch": 0.42988741044012285,
+      "grad_norm": 0.36328125,
+      "learning_rate": 0.00026943429574007515,
+      "loss": 0.5954,
+      "step": 420
+    },
+    {
+      "epoch": 0.43398157625383826,
+      "grad_norm": 0.388671875,
+      "learning_rate": 0.0002688421249965331,
+      "loss": 0.5899,
+      "step": 424
+    },
+    {
+      "epoch": 0.43807574206755373,
+      "grad_norm": 0.388671875,
+      "learning_rate": 0.0002682449368882984,
+      "loss": 0.5858,
+      "step": 428
+    },
+    {
+      "epoch": 0.4421699078812692,
+      "grad_norm": 0.376953125,
+      "learning_rate": 0.00026764275662790005,
+      "loss": 0.6247,
+      "step": 432
+    },
+    {
+      "epoch": 0.44626407369498466,
+      "grad_norm": 0.357421875,
+      "learning_rate": 0.00026703560963862956,
+      "loss": 0.5961,
+      "step": 436
+    },
+    {
+      "epoch": 0.4503582395087001,
+      "grad_norm": 0.390625,
+      "learning_rate": 0.0002664235215534673,
+      "loss": 0.6428,
+      "step": 440
+    },
+    {
+      "epoch": 0.45445240532241554,
+      "grad_norm": 0.376953125,
+      "learning_rate": 0.00026580651821400057,
+      "loss": 0.6387,
+      "step": 444
+    },
+    {
+      "epoch": 0.458546571136131,
+      "grad_norm": 0.3828125,
+      "learning_rate": 0.0002651846256693326,
+      "loss": 0.6024,
+      "step": 448
+    },
+    {
+      "epoch": 0.4626407369498465,
+      "grad_norm": 0.369140625,
+      "learning_rate": 0.00026455787017498253,
+      "loss": 0.6385,
+      "step": 452
+    },
+    {
+      "epoch": 0.46673490276356194,
+      "grad_norm": 0.365234375,
+      "learning_rate": 0.0002639262781917771,
+      "loss": 0.6228,
+      "step": 456
+    },
+    {
+      "epoch": 0.47082906857727735,
+      "grad_norm": 0.373046875,
+      "learning_rate": 0.0002632898763847338,
+      "loss": 0.6307,
+      "step": 460
+    },
+    {
+      "epoch": 0.4749232343909928,
+      "grad_norm": 0.392578125,
+      "learning_rate": 0.0002626486916219344,
+      "loss": 0.6465,
+      "step": 464
+    },
+    {
+      "epoch": 0.4790174002047083,
+      "grad_norm": 0.384765625,
+      "learning_rate": 0.0002620027509733914,
+      "loss": 0.6349,
+      "step": 468
+    },
+    {
+      "epoch": 0.48311156601842375,
+      "grad_norm": 0.3671875,
+      "learning_rate": 0.0002613520817099045,
+      "loss": 0.6223,
+      "step": 472
+    },
+    {
+      "epoch": 0.4872057318321392,
+      "grad_norm": 0.373046875,
+      "learning_rate": 0.0002606967113019098,
+      "loss": 0.605,
+      "step": 476
+    },
+    {
+      "epoch": 0.49129989764585463,
+      "grad_norm": 0.40234375,
+      "learning_rate": 0.0002600366674183196,
+      "loss": 0.6169,
+      "step": 480
+    },
+    {
+      "epoch": 0.4953940634595701,
+      "grad_norm": 0.3515625,
+      "learning_rate": 0.0002593719779253548,
+      "loss": 0.6289,
+      "step": 484
+    },
+    {
+      "epoch": 0.49948822927328557,
+      "grad_norm": 0.380859375,
+      "learning_rate": 0.0002587026708853674,
+      "loss": 0.6718,
+      "step": 488
+    },
+    {
+      "epoch": 0.503582395087001,
+      "grad_norm": 0.396484375,
+      "learning_rate": 0.0002580287745556572,
+      "loss": 0.5592,
+      "step": 492
+    },
+    {
+      "epoch": 0.5076765609007164,
+      "grad_norm": 0.392578125,
+      "learning_rate": 0.00025735031738727753,
+      "loss": 0.6118,
+      "step": 496
+    },
+    {
+      "epoch": 0.5117707267144319,
+      "grad_norm": 0.380859375,
+      "learning_rate": 0.00025666732802383463,
+      "loss": 0.5798,
+      "step": 500
+    },
+    {
+      "epoch": 0.5158648925281474,
+      "grad_norm": 0.341796875,
+      "learning_rate": 0.0002559798353002785,
+      "loss": 0.6488,
+      "step": 504
+    },
+    {
+      "epoch": 0.5199590583418628,
+      "grad_norm": 0.359375,
+      "learning_rate": 0.0002552878682416851,
+      "loss": 0.6363,
+      "step": 508
+    },
+    {
+      "epoch": 0.5240532241555783,
+      "grad_norm": 0.34375,
+      "learning_rate": 0.0002545914560620313,
+      "loss": 0.6246,
+      "step": 512
+    },
+    {
+      "epoch": 0.5281473899692938,
+      "grad_norm": 0.37109375,
+      "learning_rate": 0.00025389062816296153,
+      "loss": 0.6277,
+      "step": 516
+    },
+    {
+      "epoch": 0.5322415557830092,
+      "grad_norm": 0.357421875,
+      "learning_rate": 0.00025318541413254587,
+      "loss": 0.5822,
+      "step": 520
+    },
+    {
+      "epoch": 0.5363357215967247,
+      "grad_norm": 0.36328125,
+      "learning_rate": 0.0002524758437440318,
+      "loss": 0.581,
+      "step": 524
+    },
+    {
+      "epoch": 0.5404298874104401,
+      "grad_norm": 0.345703125,
+      "learning_rate": 0.00025176194695458644,
+      "loss": 0.6365,
+      "step": 528
+    },
+    {
+      "epoch": 0.5445240532241555,
+      "grad_norm": 0.357421875,
+      "learning_rate": 0.0002510437539040324,
+      "loss": 0.5974,
+      "step": 532
+    },
+    {
+      "epoch": 0.548618219037871,
+      "grad_norm": 0.3671875,
+      "learning_rate": 0.0002503212949135747,
+      "loss": 0.646,
+      "step": 536
+    },
+    {
+      "epoch": 0.5527123848515865,
+      "grad_norm": 0.373046875,
+      "learning_rate": 0.00024959460048452117,
+      "loss": 0.6508,
+      "step": 540
+    },
+    {
+      "epoch": 0.5568065506653019,
+      "grad_norm": 0.357421875,
+      "learning_rate": 0.0002488637012969945,
+      "loss": 0.5838,
+      "step": 544
+    },
+    {
+      "epoch": 0.5609007164790174,
+      "grad_norm": 0.337890625,
+      "learning_rate": 0.0002481286282086368,
+      "loss": 0.597,
+      "step": 548
+    },
+    {
+      "epoch": 0.5649948822927329,
+      "grad_norm": 0.380859375,
+      "learning_rate": 0.00024738941225330727,
+      "loss": 0.6617,
+      "step": 552
+    },
+    {
+      "epoch": 0.5690890481064483,
+      "grad_norm": 0.3671875,
+      "learning_rate": 0.00024664608463977164,
+      "loss": 0.5968,
+      "step": 556
+    },
+    {
+      "epoch": 0.5731832139201638,
+      "grad_norm": 0.3359375,
+      "learning_rate": 0.0002458986767503845,
+      "loss": 0.5837,
+      "step": 560
+    },
+    {
+      "epoch": 0.5772773797338793,
+      "grad_norm": 0.3828125,
+      "learning_rate": 0.00024514722013976485,
+      "loss": 0.6175,
+      "step": 564
+    },
+    {
+      "epoch": 0.5813715455475946,
+      "grad_norm": 0.4140625,
+      "learning_rate": 0.00024439174653346325,
+      "loss": 0.592,
+      "step": 568
+    },
+    {
+      "epoch": 0.5854657113613101,
+      "grad_norm": 0.3515625,
+      "learning_rate": 0.00024363228782662308,
+      "loss": 0.6434,
+      "step": 572
+    },
+    {
+      "epoch": 0.5895598771750256,
+      "grad_norm": 0.3671875,
+      "learning_rate": 0.0002428688760826334,
+      "loss": 0.6441,
+      "step": 576
+    },
+    {
+      "epoch": 0.593654042988741,
+      "grad_norm": 0.3671875,
+      "learning_rate": 0.00024210154353177562,
+      "loss": 0.5881,
+      "step": 580
+    },
+    {
+      "epoch": 0.5977482088024565,
+      "grad_norm": 0.349609375,
+      "learning_rate": 0.00024133032256986274,
+      "loss": 0.6454,
+      "step": 584
+    },
+    {
+      "epoch": 0.601842374616172,
+      "grad_norm": 0.3203125,
+      "learning_rate": 0.00024055524575687136,
+      "loss": 0.5999,
+      "step": 588
+    },
+    {
+      "epoch": 0.6059365404298874,
+      "grad_norm": 0.345703125,
+      "learning_rate": 0.00023977634581556743,
+      "loss": 0.6028,
+      "step": 592
+    },
+    {
+      "epoch": 0.6100307062436029,
+      "grad_norm": 0.376953125,
+      "learning_rate": 0.00023899365563012455,
+      "loss": 0.5945,
+      "step": 596
+    },
+    {
+      "epoch": 0.6141248720573184,
+      "grad_norm": 0.38671875,
+      "learning_rate": 0.00023820720824473555,
+      "loss": 0.6106,
+      "step": 600
+    },
+    {
+      "epoch": 0.6182190378710338,
+      "grad_norm": 0.34765625,
+      "learning_rate": 0.00023741703686221767,
+      "loss": 0.6626,
+      "step": 604
+    },
+    {
+      "epoch": 0.6223132036847492,
+      "grad_norm": 0.361328125,
+      "learning_rate": 0.00023662317484261038,
+      "loss": 0.6107,
+      "step": 608
+    },
+    {
+      "epoch": 0.6264073694984647,
+      "grad_norm": 0.34765625,
+      "learning_rate": 0.00023582565570176738,
+      "loss": 0.5691,
+      "step": 612
+    },
+    {
+      "epoch": 0.6305015353121801,
+      "grad_norm": 0.330078125,
+      "learning_rate": 0.00023502451310994138,
+      "loss": 0.648,
+      "step": 616
+    },
+    {
+      "epoch": 0.6345957011258956,
+      "grad_norm": 0.359375,
+      "learning_rate": 0.0002342197808903626,
+      "loss": 0.6255,
+      "step": 620
+    },
+    {
+      "epoch": 0.638689866939611,
+      "grad_norm": 0.33984375,
+      "learning_rate": 0.00023341149301781076,
+      "loss": 0.6423,
+      "step": 624
+    },
+    {
+      "epoch": 0.6427840327533265,
+      "grad_norm": 0.353515625,
+      "learning_rate": 0.00023259968361718093,
+      "loss": 0.6358,
+      "step": 628
+    },
+    {
+      "epoch": 0.646878198567042,
+      "grad_norm": 0.328125,
+      "learning_rate": 0.00023178438696204248,
+      "loss": 0.6217,
+      "step": 632
+    },
+    {
+      "epoch": 0.6509723643807575,
+      "grad_norm": 0.380859375,
+      "learning_rate": 0.0002309656374731923,
+      "loss": 0.6543,
+      "step": 636
+    },
+    {
+      "epoch": 0.6550665301944729,
+      "grad_norm": 0.326171875,
+      "learning_rate": 0.00023014346971720172,
+      "loss": 0.6438,
+      "step": 640
+    },
+    {
+      "epoch": 0.6591606960081884,
+      "grad_norm": 0.337890625,
+      "learning_rate": 0.00022931791840495683,
+      "loss": 0.6364,
+      "step": 644
+    },
+    {
+      "epoch": 0.6632548618219037,
+      "grad_norm": 0.375,
+      "learning_rate": 0.00022848901839019325,
+      "loss": 0.5709,
+      "step": 648
+    },
+    {
+      "epoch": 0.6673490276356192,
+      "grad_norm": 0.3671875,
+      "learning_rate": 0.00022765680466802467,
+      "loss": 0.6298,
+      "step": 652
+    },
+    {
+      "epoch": 0.6714431934493347,
+      "grad_norm": 0.380859375,
+      "learning_rate": 0.00022682131237346514,
+      "loss": 0.6143,
+      "step": 656
+    },
+    {
+      "epoch": 0.6755373592630501,
+      "grad_norm": 0.326171875,
+      "learning_rate": 0.00022598257677994616,
+      "loss": 0.64,
+      "step": 660
+    },
+    {
+      "epoch": 0.6796315250767656,
+      "grad_norm": 0.365234375,
+      "learning_rate": 0.00022514063329782702,
+      "loss": 0.6509,
+      "step": 664
+    },
+    {
+      "epoch": 0.6837256908904811,
+      "grad_norm": 0.384765625,
+      "learning_rate": 0.0002242955174729001,
+      "loss": 0.6393,
+      "step": 668
+    },
+    {
+      "epoch": 0.6878198567041965,
+      "grad_norm": 0.357421875,
+      "learning_rate": 0.00022344726498489009,
+      "loss": 0.6492,
+      "step": 672
+    },
+    {
+      "epoch": 0.691914022517912,
+      "grad_norm": 0.384765625,
+      "learning_rate": 0.0002225959116459477,
+      "loss": 0.6519,
+      "step": 676
+    },
+    {
+      "epoch": 0.6960081883316275,
+      "grad_norm": 0.373046875,
+      "learning_rate": 0.00022174149339913745,
+      "loss": 0.5808,
+      "step": 680
+    },
+    {
+      "epoch": 0.7001023541453428,
+      "grad_norm": 0.341796875,
+      "learning_rate": 0.0002208840463169207,
+      "loss": 0.6531,
+      "step": 684
+    },
+    {
+      "epoch": 0.7041965199590583,
+      "grad_norm": 0.37109375,
+      "learning_rate": 0.0002200236065996322,
+      "loss": 0.677,
+      "step": 688
+    },
+    {
+      "epoch": 0.7082906857727738,
+      "grad_norm": 0.34765625,
+      "learning_rate": 0.0002191602105739521,
+      "loss": 0.5903,
+      "step": 692
+    },
+    {
+      "epoch": 0.7123848515864892,
+      "grad_norm": 0.40625,
+      "learning_rate": 0.00021829389469137206,
+      "loss": 0.6429,
+      "step": 696
+    },
+    {
+      "epoch": 0.7164790174002047,
+      "grad_norm": 0.34375,
+      "learning_rate": 0.0002174246955266565,
+      "loss": 0.6316,
+      "step": 700
+    },
+    {
+      "epoch": 0.7205731832139202,
+      "grad_norm": 0.3671875,
+      "learning_rate": 0.00021655264977629842,
+      "loss": 0.6191,
+      "step": 704
+    },
+    {
+      "epoch": 0.7246673490276356,
+      "grad_norm": 0.3828125,
+      "learning_rate": 0.00021567779425696993,
+      "loss": 0.5909,
+      "step": 708
+    },
+    {
+      "epoch": 0.7287615148413511,
+      "grad_norm": 0.353515625,
+      "learning_rate": 0.00021480016590396807,
+      "loss": 0.5745,
+      "step": 712
+    },
+    {
+      "epoch": 0.7328556806550666,
+      "grad_norm": 0.359375,
+      "learning_rate": 0.0002139198017696556,
+      "loss": 0.627,
+      "step": 716
+    },
+    {
+      "epoch": 0.736949846468782,
+      "grad_norm": 0.375,
+      "learning_rate": 0.00021303673902189636,
+      "loss": 0.5907,
+      "step": 720
+    },
+    {
+      "epoch": 0.7410440122824974,
+      "grad_norm": 0.34375,
+      "learning_rate": 0.00021215101494248618,
+      "loss": 0.6565,
+      "step": 724
+    },
+    {
+      "epoch": 0.7451381780962129,
+      "grad_norm": 0.37890625,
+      "learning_rate": 0.00021126266692557917,
+      "loss": 0.6313,
+      "step": 728
+    },
+    {
+      "epoch": 0.7492323439099283,
+      "grad_norm": 0.337890625,
+      "learning_rate": 0.00021037173247610863,
+      "loss": 0.6126,
+      "step": 732
+    },
+    {
+      "epoch": 0.7533265097236438,
+      "grad_norm": 0.34765625,
+      "learning_rate": 0.00020947824920820383,
+      "loss": 0.6302,
+      "step": 736
+    },
+    {
+      "epoch": 0.7574206755373593,
+      "grad_norm": 0.3828125,
+      "learning_rate": 0.00020858225484360186,
+      "loss": 0.6709,
+      "step": 740
+    },
+    {
+      "epoch": 0.7615148413510747,
+      "grad_norm": 0.36328125,
+      "learning_rate": 0.00020768378721005526,
+      "loss": 0.6173,
+      "step": 744
+    },
+    {
+      "epoch": 0.7656090071647902,
+      "grad_norm": 0.357421875,
+      "learning_rate": 0.00020678288423973476,
+      "loss": 0.5911,
+      "step": 748
+    },
+    {
+      "epoch": 0.7697031729785057,
+      "grad_norm": 0.396484375,
+      "learning_rate": 0.00020587958396762815,
+      "loss": 0.6153,
+      "step": 752
+    },
+    {
+      "epoch": 0.7737973387922211,
+      "grad_norm": 0.375,
+      "learning_rate": 0.00020497392452993395,
+      "loss": 0.5763,
+      "step": 756
+    },
+    {
+      "epoch": 0.7778915046059366,
+      "grad_norm": 0.34375,
+      "learning_rate": 0.0002040659441624519,
+      "loss": 0.5981,
+      "step": 760
+    },
+    {
+      "epoch": 0.781985670419652,
+      "grad_norm": 0.35546875,
+      "learning_rate": 0.00020315568119896846,
+      "loss": 0.6124,
+      "step": 764
+    },
+    {
+      "epoch": 0.7860798362333674,
+      "grad_norm": 0.35546875,
+      "learning_rate": 0.00020224317406963835,
+      "loss": 0.6245,
+      "step": 768
+    },
+    {
+      "epoch": 0.7901740020470829,
+      "grad_norm": 0.361328125,
+      "learning_rate": 0.00020132846129936223,
+      "loss": 0.6093,
+      "step": 772
+    },
+    {
+      "epoch": 0.7942681678607983,
+      "grad_norm": 0.359375,
+      "learning_rate": 0.00020041158150615996,
+      "loss": 0.6212,
+      "step": 776
+    },
+    {
+      "epoch": 0.7983623336745138,
+      "grad_norm": 0.376953125,
+      "learning_rate": 0.00019949257339954056,
+      "loss": 0.6338,
+      "step": 780
+    },
+    {
+      "epoch": 0.8024564994882293,
+      "grad_norm": 0.33203125,
+      "learning_rate": 0.0001985714757788677,
+      "loss": 0.6196,
+      "step": 784
+    },
+    {
+      "epoch": 0.8065506653019447,
+      "grad_norm": 0.337890625,
+      "learning_rate": 0.00019764832753172172,
+      "loss": 0.5654,
+      "step": 788
+    },
+    {
+      "epoch": 0.8106448311156602,
+      "grad_norm": 0.47265625,
+      "learning_rate": 0.00019672316763225773,
+      "loss": 0.5876,
+      "step": 792
+    },
+    {
+      "epoch": 0.8147389969293757,
+      "grad_norm": 0.369140625,
+      "learning_rate": 0.0001957960351395604,
+      "loss": 0.5951,
+      "step": 796
+    },
+    {
+      "epoch": 0.8188331627430911,
+      "grad_norm": 0.361328125,
+      "learning_rate": 0.0001948669691959947,
+      "loss": 0.5991,
+      "step": 800
+    },
+    {
+      "epoch": 0.8229273285568065,
+      "grad_norm": 0.353515625,
+      "learning_rate": 0.0001939360090255535,
+      "loss": 0.6002,
+      "step": 804
+    },
+    {
+      "epoch": 0.827021494370522,
+      "grad_norm": 0.357421875,
+      "learning_rate": 0.00019300319393220146,
+      "loss": 0.5882,
+      "step": 808
+    },
+    {
+      "epoch": 0.8311156601842374,
+      "grad_norm": 0.365234375,
+      "learning_rate": 0.00019206856329821595,
+      "loss": 0.6324,
+      "step": 812
+    },
+    {
+      "epoch": 0.8352098259979529,
+      "grad_norm": 0.353515625,
+      "learning_rate": 0.00019113215658252394,
+      "loss": 0.608,
+      "step": 816
+    },
+    {
+      "epoch": 0.8393039918116684,
+      "grad_norm": 0.37890625,
+      "learning_rate": 0.0001901940133190365,
+      "loss": 0.659,
+      "step": 820
+    },
+    {
+      "epoch": 0.8433981576253838,
+      "grad_norm": 0.34765625,
+      "learning_rate": 0.00018925417311497944,
+      "loss": 0.641,
+      "step": 824
+    },
+    {
+      "epoch": 0.8474923234390993,
+      "grad_norm": 0.33984375,
+      "learning_rate": 0.00018831267564922135,
+      "loss": 0.6171,
+      "step": 828
+    },
+    {
+      "epoch": 0.8515864892528148,
+      "grad_norm": 0.326171875,
+      "learning_rate": 0.00018736956067059827,
+      "loss": 0.6022,
+      "step": 832
+    },
+    {
+      "epoch": 0.8556806550665302,
+      "grad_norm": 0.34765625,
+      "learning_rate": 0.00018642486799623563,
+      "loss": 0.6303,
+      "step": 836
+    },
+    {
+      "epoch": 0.8597748208802457,
+      "grad_norm": 0.3515625,
+      "learning_rate": 0.00018547863750986715,
+      "loss": 0.6694,
+      "step": 840
+    },
+    {
+      "epoch": 0.8638689866939611,
+      "grad_norm": 0.3125,
+      "learning_rate": 0.000184530909160151,
+      "loss": 0.5785,
+      "step": 844
+    },
+    {
+      "epoch": 0.8679631525076765,
+      "grad_norm": 0.373046875,
+      "learning_rate": 0.0001835817229589834,
+      "loss": 0.6293,
+      "step": 848
+    },
+    {
+      "epoch": 0.872057318321392,
+      "grad_norm": 0.349609375,
+      "learning_rate": 0.00018263111897980907,
+      "loss": 0.6031,
+      "step": 852
+    },
+    {
+      "epoch": 0.8761514841351075,
+      "grad_norm": 0.326171875,
+      "learning_rate": 0.00018167913735592955,
+      "loss": 0.5936,
+      "step": 856
+    },
+    {
+      "epoch": 0.8802456499488229,
+      "grad_norm": 0.375,
+      "learning_rate": 0.00018072581827880885,
+      "loss": 0.6135,
+      "step": 860
+    },
+    {
+      "epoch": 0.8843398157625384,
+      "grad_norm": 0.349609375,
+      "learning_rate": 0.0001797712019963766,
+      "loss": 0.5845,
+      "step": 864
+    },
+    {
+      "epoch": 0.8884339815762539,
+      "grad_norm": 0.349609375,
+      "learning_rate": 0.00017881532881132878,
+      "loss": 0.5956,
+      "step": 868
+    },
+    {
+      "epoch": 0.8925281473899693,
+      "grad_norm": 0.373046875,
+      "learning_rate": 0.00017785823907942602,
+      "loss": 0.6384,
+      "step": 872
+    },
+    {
+      "epoch": 0.8966223132036848,
+      "grad_norm": 0.3671875,
+      "learning_rate": 0.00017689997320779037,
+      "loss": 0.6084,
+      "step": 876
+    },
+    {
+      "epoch": 0.9007164790174002,
+      "grad_norm": 0.390625,
+      "learning_rate": 0.00017594057165319876,
+      "loss": 0.6023,
+      "step": 880
+    },
+    {
+      "epoch": 0.9048106448311156,
+      "grad_norm": 0.365234375,
+      "learning_rate": 0.00017498007492037536,
+      "loss": 0.6271,
+      "step": 884
+    },
+    {
+      "epoch": 0.9089048106448311,
+      "grad_norm": 0.3515625,
+      "learning_rate": 0.00017401852356028124,
+      "loss": 0.6071,
+      "step": 888
+    },
+    {
+      "epoch": 0.9129989764585466,
+      "grad_norm": 0.341796875,
+      "learning_rate": 0.00017305595816840267,
+      "loss": 0.5881,
+      "step": 892
+    },
+    {
+      "epoch": 0.917093142272262,
+      "grad_norm": 0.359375,
+      "learning_rate": 0.00017209241938303697,
+      "loss": 0.6022,
+      "step": 896
+    },
+    {
+      "epoch": 0.9211873080859775,
+      "grad_norm": 0.3671875,
+      "learning_rate": 0.00017112794788357686,
+      "loss": 0.5948,
+      "step": 900
+    },
+    {
+      "epoch": 0.925281473899693,
+      "grad_norm": 0.345703125,
+      "learning_rate": 0.00017016258438879323,
+      "loss": 0.5529,
+      "step": 904
+    },
+    {
+      "epoch": 0.9293756397134084,
+      "grad_norm": 0.33984375,
+      "learning_rate": 0.00016919636965511572,
+      "loss": 0.61,
+      "step": 908
+    },
+    {
+      "epoch": 0.9334698055271239,
+      "grad_norm": 0.345703125,
+      "learning_rate": 0.00016822934447491232,
+      "loss": 0.6209,
+      "step": 912
+    },
+    {
+      "epoch": 0.9375639713408394,
+      "grad_norm": 0.396484375,
+      "learning_rate": 0.000167261549674767,
+      "loss": 0.5949,
+      "step": 916
+    },
+    {
+      "epoch": 0.9416581371545547,
+      "grad_norm": 0.3359375,
+      "learning_rate": 0.0001662930261137561,
+      "loss": 0.5936,
+      "step": 920
+    },
+    {
+      "epoch": 0.9457523029682702,
+      "grad_norm": 0.365234375,
+      "learning_rate": 0.0001653238146817233,
+      "loss": 0.6019,
+      "step": 924
+    },
+    {
+      "epoch": 0.9498464687819856,
+      "grad_norm": 0.326171875,
+      "learning_rate": 0.00016435395629755346,
+      "loss": 0.5651,
+      "step": 928
+    },
+    {
+      "epoch": 0.9539406345957011,
+      "grad_norm": 0.34765625,
+      "learning_rate": 0.00016338349190744486,
+      "loss": 0.6279,
+      "step": 932
+    },
+    {
+      "epoch": 0.9580348004094166,
+      "grad_norm": 0.365234375,
+      "learning_rate": 0.0001624124624831805,
+      "loss": 0.599,
+      "step": 936
+    },
+    {
+      "epoch": 0.962128966223132,
+      "grad_norm": 0.3671875,
+      "learning_rate": 0.00016144090902039856,
+      "loss": 0.593,
+      "step": 940
+    },
+    {
+      "epoch": 0.9662231320368475,
+      "grad_norm": 0.32421875,
+      "learning_rate": 0.00016046887253686135,
+      "loss": 0.5827,
+      "step": 944
+    },
+    {
+      "epoch": 0.970317297850563,
+      "grad_norm": 0.333984375,
+      "learning_rate": 0.00015949639407072383,
+      "loss": 0.6371,
+      "step": 948
+    },
+    {
+      "epoch": 0.9744114636642784,
+      "grad_norm": 0.37109375,
+      "learning_rate": 0.00015852351467880076,
+      "loss": 0.5856,
+      "step": 952
+    },
+    {
+      "epoch": 0.9785056294779939,
+      "grad_norm": 0.359375,
+      "learning_rate": 0.00015755027543483353,
+      "loss": 0.6166,
+      "step": 956
+    },
+    {
+      "epoch": 0.9825997952917093,
+      "grad_norm": 0.349609375,
+      "learning_rate": 0.00015657671742775613,
+      "loss": 0.5667,
+      "step": 960
+    },
+    {
+      "epoch": 0.9866939611054247,
+      "grad_norm": 0.345703125,
+      "learning_rate": 0.00015560288175996023,
+      "loss": 0.5446,
+      "step": 964
+    },
+    {
+      "epoch": 0.9907881269191402,
+      "grad_norm": 0.380859375,
+      "learning_rate": 0.00015462880954555998,
+      "loss": 0.6376,
+      "step": 968
+    },
+    {
+      "epoch": 0.9948822927328557,
+      "grad_norm": 0.3515625,
+      "learning_rate": 0.0001536545419086563,
+      "loss": 0.6071,
+      "step": 972
+    },
+    {
+      "epoch": 0.9989764585465711,
+      "grad_norm": 0.376953125,
+      "learning_rate": 0.00015268011998160048,
+      "loss": 0.6143,
+      "step": 976
+    },
+    {
+      "epoch": 1.0030706243602865,
+      "grad_norm": 0.302734375,
+      "learning_rate": 0.00015170558490325793,
+      "loss": 0.5123,
+      "step": 980
+    },
+    {
+      "epoch": 1.007164790174002,
+      "grad_norm": 0.337890625,
+      "learning_rate": 0.000150730977817271,
+      "loss": 0.5264,
+      "step": 984
+    },
+    {
+      "epoch": 1.0112589559877174,
+      "grad_norm": 0.34375,
+      "learning_rate": 0.00014975633987032212,
+      "loss": 0.4917,
+      "step": 988
+    },
+    {
+      "epoch": 1.015353121801433,
+      "grad_norm": 0.34375,
+      "learning_rate": 0.00014878171221039676,
+      "loss": 0.5258,
+      "step": 992
+    },
+    {
+      "epoch": 1.0194472876151484,
+      "grad_norm": 0.353515625,
+      "learning_rate": 0.000147807135985046,
+      "loss": 0.4466,
+      "step": 996
+    },
+    {
+      "epoch": 1.0235414534288638,
+      "grad_norm": 0.3671875,
+      "learning_rate": 0.00014683265233964937,
+      "loss": 0.5049,
+      "step": 1000
+    },
+    {
+      "epoch": 1.0276356192425793,
+      "grad_norm": 0.345703125,
+      "learning_rate": 0.00014585830241567785,
+      "loss": 0.469,
+      "step": 1004
+    },
+    {
+      "epoch": 1.0317297850562948,
+      "grad_norm": 0.337890625,
+      "learning_rate": 0.00014488412734895692,
+      "loss": 0.4901,
+      "step": 1008
+    },
+    {
+      "epoch": 1.0358239508700102,
+      "grad_norm": 0.34375,
+      "learning_rate": 0.00014391016826792972,
+      "loss": 0.5008,
+      "step": 1012
+    },
+    {
+      "epoch": 1.0399181166837257,
+      "grad_norm": 0.35546875,
+      "learning_rate": 0.0001429364662919208,
+      "loss": 0.5037,
+      "step": 1016
+    },
+    {
+      "epoch": 1.0440122824974412,
+      "grad_norm": 0.32421875,
+      "learning_rate": 0.00014196306252939998,
+      "loss": 0.5418,
+      "step": 1020
+    },
+    {
+      "epoch": 1.0481064483111566,
+      "grad_norm": 0.373046875,
+      "learning_rate": 0.00014098999807624695,
+      "loss": 0.5068,
+      "step": 1024
+    },
+    {
+      "epoch": 1.052200614124872,
+      "grad_norm": 0.369140625,
+      "learning_rate": 0.00014001731401401622,
+      "loss": 0.523,
+      "step": 1028
+    },
+    {
+      "epoch": 1.0562947799385876,
+      "grad_norm": 0.34765625,
+      "learning_rate": 0.00013904505140820264,
+      "loss": 0.486,
+      "step": 1032
+    },
+    {
+      "epoch": 1.060388945752303,
+      "grad_norm": 0.322265625,
+      "learning_rate": 0.00013807325130650764,
+      "loss": 0.4964,
+      "step": 1036
+    },
+    {
+      "epoch": 1.0644831115660185,
+      "grad_norm": 0.349609375,
+      "learning_rate": 0.00013710195473710636,
+      "loss": 0.4921,
+      "step": 1040
+    },
+    {
+      "epoch": 1.068577277379734,
+      "grad_norm": 0.36328125,
+      "learning_rate": 0.00013613120270691552,
+      "loss": 0.5132,
+      "step": 1044
+    },
+    {
+      "epoch": 1.0726714431934494,
+      "grad_norm": 0.349609375,
+      "learning_rate": 0.00013516103619986192,
+      "loss": 0.5205,
+      "step": 1048
+    },
+    {
+      "epoch": 1.076765609007165,
+      "grad_norm": 0.341796875,
+      "learning_rate": 0.00013419149617515243,
+      "loss": 0.5278,
+      "step": 1052
+    },
+    {
+      "epoch": 1.0808597748208801,
+      "grad_norm": 0.3515625,
+      "learning_rate": 0.00013322262356554456,
+      "loss": 0.4682,
+      "step": 1056
+    },
+    {
+      "epoch": 1.0849539406345956,
+      "grad_norm": 0.35546875,
+      "learning_rate": 0.0001322544592756185,
+      "loss": 0.5016,
+      "step": 1060
+    },
+    {
+      "epoch": 1.089048106448311,
+      "grad_norm": 0.333984375,
+      "learning_rate": 0.00013128704418004995,
+      "loss": 0.5081,
+      "step": 1064
+    },
+    {
+      "epoch": 1.0931422722620265,
+      "grad_norm": 0.33203125,
+      "learning_rate": 0.00013032041912188467,
+      "loss": 0.5117,
+      "step": 1068
+    },
+    {
+      "epoch": 1.097236438075742,
+      "grad_norm": 0.3359375,
+      "learning_rate": 0.00012935462491081391,
+      "loss": 0.4805,
+      "step": 1072
+    },
+    {
+      "epoch": 1.1013306038894575,
+      "grad_norm": 0.369140625,
+      "learning_rate": 0.00012838970232145172,
+      "loss": 0.5378,
+      "step": 1076
+    },
+    {
+      "epoch": 1.105424769703173,
+      "grad_norm": 0.349609375,
+      "learning_rate": 0.00012742569209161334,
+      "loss": 0.494,
+      "step": 1080
+    },
+    {
+      "epoch": 1.1095189355168884,
+      "grad_norm": 0.34375,
+      "learning_rate": 0.00012646263492059528,
+      "loss": 0.4742,
+      "step": 1084
+    },
+    {
+      "epoch": 1.1136131013306039,
+      "grad_norm": 0.357421875,
+      "learning_rate": 0.0001255005714674573,
+      "loss": 0.5341,
+      "step": 1088
+    },
+    {
+      "epoch": 1.1177072671443193,
+      "grad_norm": 0.3515625,
+      "learning_rate": 0.00012453954234930542,
+      "loss": 0.5028,
+      "step": 1092
+    },
+    {
+      "epoch": 1.1218014329580348,
+      "grad_norm": 0.353515625,
+      "learning_rate": 0.00012357958813957748,
+      "loss": 0.4893,
+      "step": 1096
+    },
+    {
+      "epoch": 1.1258955987717503,
+      "grad_norm": 0.33984375,
+      "learning_rate": 0.00012262074936632994,
+      "loss": 0.548,
+      "step": 1100
+    },
+    {
+      "epoch": 1.1299897645854657,
+      "grad_norm": 0.36328125,
+      "learning_rate": 0.00012166306651052708,
+      "loss": 0.4871,
+      "step": 1104
+    },
+    {
+      "epoch": 1.1340839303991812,
+      "grad_norm": 0.349609375,
+      "learning_rate": 0.00012070658000433166,
+      "loss": 0.4393,
+      "step": 1108
+    },
+    {
+      "epoch": 1.1381780962128967,
+      "grad_norm": 0.353515625,
+      "learning_rate": 0.00011975133022939816,
+      "loss": 0.5077,
+      "step": 1112
+    },
+    {
+      "epoch": 1.1422722620266121,
+      "grad_norm": 0.369140625,
+      "learning_rate": 0.0001187973575151677,
+      "loss": 0.5037,
+      "step": 1116
+    },
+    {
+      "epoch": 1.1463664278403276,
+      "grad_norm": 0.341796875,
+      "learning_rate": 0.00011784470213716574,
+      "loss": 0.4682,
+      "step": 1120
+    },
+    {
+      "epoch": 1.150460593654043,
+      "grad_norm": 0.373046875,
+      "learning_rate": 0.00011689340431530123,
+      "loss": 0.5539,
+      "step": 1124
+    },
+    {
+      "epoch": 1.1545547594677585,
+      "grad_norm": 0.33984375,
+      "learning_rate": 0.00011594350421216891,
+      "loss": 0.4853,
+      "step": 1128
+    },
+    {
+      "epoch": 1.158648925281474,
+      "grad_norm": 0.36328125,
+      "learning_rate": 0.00011499504193135363,
+      "loss": 0.5045,
+      "step": 1132
+    },
+    {
+      "epoch": 1.1627430910951895,
+      "grad_norm": 0.388671875,
+      "learning_rate": 0.00011404805751573712,
+      "loss": 0.4964,
+      "step": 1136
+    },
+    {
+      "epoch": 1.1668372569089047,
+      "grad_norm": 0.37109375,
+      "learning_rate": 0.00011310259094580754,
+      "loss": 0.4819,
+      "step": 1140
+    },
+    {
+      "epoch": 1.1709314227226202,
+      "grad_norm": 0.3359375,
+      "learning_rate": 0.00011215868213797156,
+      "loss": 0.4805,
+      "step": 1144
+    },
+    {
+      "epoch": 1.1750255885363357,
+      "grad_norm": 0.361328125,
+      "learning_rate": 0.00011121637094286903,
+      "loss": 0.4872,
+      "step": 1148
+    },
+    {
+      "epoch": 1.1791197543500511,
+      "grad_norm": 0.359375,
+      "learning_rate": 0.00011027569714369059,
+      "loss": 0.4955,
+      "step": 1152
+    },
+    {
+      "epoch": 1.1832139201637666,
+      "grad_norm": 0.384765625,
+      "learning_rate": 0.00010933670045449822,
+      "loss": 0.4939,
+      "step": 1156
+    },
+    {
+      "epoch": 1.187308085977482,
+      "grad_norm": 0.345703125,
+      "learning_rate": 0.00010839942051854829,
+      "loss": 0.5074,
+      "step": 1160
+    },
+    {
+      "epoch": 1.1914022517911975,
+      "grad_norm": 0.384765625,
+      "learning_rate": 0.00010746389690661808,
+      "loss": 0.5396,
+      "step": 1164
+    },
+    {
+      "epoch": 1.195496417604913,
+      "grad_norm": 0.369140625,
+      "learning_rate": 0.000106530169115335,
+      "loss": 0.5302,
+      "step": 1168
+    },
+    {
+      "epoch": 1.1995905834186285,
+      "grad_norm": 0.376953125,
+      "learning_rate": 0.00010559827656550933,
+      "loss": 0.5012,
+      "step": 1172
+    },
+    {
+      "epoch": 1.203684749232344,
+      "grad_norm": 0.357421875,
+      "learning_rate": 0.00010466825860046967,
+      "loss": 0.5324,
+      "step": 1176
+    },
+    {
+      "epoch": 1.2077789150460594,
+      "grad_norm": 0.32421875,
+      "learning_rate": 0.00010374015448440203,
+      "loss": 0.498,
+      "step": 1180
+    },
+    {
+      "epoch": 1.2118730808597749,
+      "grad_norm": 0.365234375,
+      "learning_rate": 0.00010281400340069205,
+      "loss": 0.4906,
+      "step": 1184
+    },
+    {
+      "epoch": 1.2159672466734903,
+      "grad_norm": 0.357421875,
+      "learning_rate": 0.00010188984445027097,
+      "loss": 0.4885,
+      "step": 1188
+    },
+    {
+      "epoch": 1.2200614124872058,
+      "grad_norm": 0.376953125,
+      "learning_rate": 0.00010096771664996456,
+      "loss": 0.5133,
+      "step": 1192
+    },
+    {
+      "epoch": 1.2241555783009213,
+      "grad_norm": 0.349609375,
+      "learning_rate": 0.00010004765893084603,
+      "loss": 0.4521,
+      "step": 1196
+    },
+    {
+      "epoch": 1.2282497441146367,
+      "grad_norm": 0.37109375,
+      "learning_rate": 9.912971013659232e-05,
+      "loss": 0.5168,
+      "step": 1200
+    },
+    {
+      "epoch": 1.2323439099283522,
+      "grad_norm": 0.498046875,
+      "learning_rate": 9.821390902184426e-05,
+      "loss": 0.4759,
+      "step": 1204
+    },
+    {
+      "epoch": 1.2364380757420674,
+      "grad_norm": 0.36328125,
+      "learning_rate": 9.730029425057045e-05,
+      "loss": 0.5194,
+      "step": 1208
+    },
+    {
+      "epoch": 1.240532241555783,
+      "grad_norm": 0.376953125,
+      "learning_rate": 9.638890439443464e-05,
+      "loss": 0.4903,
+      "step": 1212
+    },
+    {
+      "epoch": 1.2446264073694984,
+      "grad_norm": 0.33984375,
+      "learning_rate": 9.547977793116762e-05,
+      "loss": 0.5149,
+      "step": 1216
+    },
+    {
+      "epoch": 1.2487205731832138,
+      "grad_norm": 0.3359375,
+      "learning_rate": 9.457295324294247e-05,
+      "loss": 0.4665,
+      "step": 1220
+    },
+    {
+      "epoch": 1.2528147389969293,
+      "grad_norm": 0.408203125,
+      "learning_rate": 9.366846861475435e-05,
+      "loss": 0.4834,
+      "step": 1224
+    },
+    {
+      "epoch": 1.2569089048106448,
+      "grad_norm": 0.34375,
+      "learning_rate": 9.276636223280396e-05,
+      "loss": 0.5027,
+      "step": 1228
+    },
+    {
+      "epoch": 1.2610030706243602,
+      "grad_norm": 0.36328125,
+      "learning_rate": 9.186667218288549e-05,
+      "loss": 0.5093,
+      "step": 1232
+    },
+    {
+      "epoch": 1.2650972364380757,
+      "grad_norm": 0.35546875,
+      "learning_rate": 9.096943644877854e-05,
+      "loss": 0.5105,
+      "step": 1236
+    },
+    {
+      "epoch": 1.2691914022517912,
+      "grad_norm": 0.380859375,
+      "learning_rate": 9.007469291064467e-05,
+      "loss": 0.5212,
+      "step": 1240
+    },
+    {
+      "epoch": 1.2732855680655066,
+      "grad_norm": 0.353515625,
+      "learning_rate": 8.918247934342806e-05,
+      "loss": 0.5191,
+      "step": 1244
+    },
+    {
+      "epoch": 1.277379733879222,
+      "grad_norm": 0.365234375,
+      "learning_rate": 8.829283341526067e-05,
+      "loss": 0.5019,
+      "step": 1248
+    },
+    {
+      "epoch": 1.2814738996929376,
+      "grad_norm": 0.384765625,
+      "learning_rate": 8.74057926858721e-05,
+      "loss": 0.5162,
+      "step": 1252
+    },
+    {
+      "epoch": 1.285568065506653,
+      "grad_norm": 0.3828125,
+      "learning_rate": 8.652139460500359e-05,
+      "loss": 0.5061,
+      "step": 1256
+    },
+    {
+      "epoch": 1.2896622313203685,
+      "grad_norm": 0.376953125,
+      "learning_rate": 8.563967651082713e-05,
+      "loss": 0.5003,
+      "step": 1260
+    },
+    {
+      "epoch": 1.293756397134084,
+      "grad_norm": 0.3671875,
+      "learning_rate": 8.47606756283691e-05,
+      "loss": 0.5196,
+      "step": 1264
+    },
+    {
+      "epoch": 1.2978505629477994,
+      "grad_norm": 0.34765625,
+      "learning_rate": 8.388442906793862e-05,
+      "loss": 0.4932,
+      "step": 1268
+    },
+    {
+      "epoch": 1.301944728761515,
+      "grad_norm": 0.369140625,
+      "learning_rate": 8.301097382356067e-05,
+      "loss": 0.4871,
+      "step": 1272
+    },
+    {
+      "epoch": 1.3060388945752304,
+      "grad_norm": 0.384765625,
+      "learning_rate": 8.214034677141465e-05,
+      "loss": 0.494,
+      "step": 1276
+    },
+    {
+      "epoch": 1.3101330603889458,
+      "grad_norm": 0.357421875,
+      "learning_rate": 8.127258466827704e-05,
+      "loss": 0.5034,
+      "step": 1280
+    },
+    {
+      "epoch": 1.3142272262026613,
+      "grad_norm": 0.365234375,
+      "learning_rate": 8.040772414996984e-05,
+      "loss": 0.5111,
+      "step": 1284
+    },
+    {
+      "epoch": 1.3183213920163768,
+      "grad_norm": 0.32421875,
+      "learning_rate": 7.95458017298138e-05,
+      "loss": 0.5169,
+      "step": 1288
+    },
+    {
+      "epoch": 1.3224155578300922,
+      "grad_norm": 0.37109375,
+      "learning_rate": 7.868685379708686e-05,
+      "loss": 0.4631,
+      "step": 1292
+    },
+    {
+      "epoch": 1.3265097236438077,
+      "grad_norm": 0.341796875,
+      "learning_rate": 7.783091661548789e-05,
+      "loss": 0.4756,
+      "step": 1296
+    },
+    {
+      "epoch": 1.330603889457523,
+      "grad_norm": 0.34765625,
+      "learning_rate": 7.697802632160557e-05,
+      "loss": 0.4705,
+      "step": 1300
+    },
+    {
+      "epoch": 1.3346980552712384,
+      "grad_norm": 0.357421875,
+      "learning_rate": 7.612821892339284e-05,
+      "loss": 0.522,
+      "step": 1304
+    },
+    {
+      "epoch": 1.3387922210849539,
+      "grad_norm": 0.375,
+      "learning_rate": 7.528153029864682e-05,
+      "loss": 0.5192,
+      "step": 1308
+    },
+    {
+      "epoch": 1.3428863868986693,
+      "grad_norm": 0.341796875,
+      "learning_rate": 7.443799619349374e-05,
+      "loss": 0.5183,
+      "step": 1312
+    },
+    {
+      "epoch": 1.3469805527123848,
+      "grad_norm": 0.35546875,
+      "learning_rate": 7.359765222088008e-05,
+      "loss": 0.506,
+      "step": 1316
+    },
+    {
+      "epoch": 1.3510747185261003,
+      "grad_norm": 0.3828125,
+      "learning_rate": 7.276053385906896e-05,
+      "loss": 0.5021,
+      "step": 1320
+    },
+    {
+      "epoch": 1.3551688843398157,
+      "grad_norm": 0.3515625,
+      "learning_rate": 7.192667645014223e-05,
+      "loss": 0.4803,
+      "step": 1324
+    },
+    {
+      "epoch": 1.3592630501535312,
+      "grad_norm": 0.365234375,
+      "learning_rate": 7.109611519850845e-05,
+      "loss": 0.4941,
+      "step": 1328
+    },
+    {
+      "epoch": 1.3633572159672467,
+      "grad_norm": 0.37890625,
+      "learning_rate": 7.026888516941658e-05,
+      "loss": 0.508,
+      "step": 1332
+    },
+    {
+      "epoch": 1.3674513817809621,
+      "grad_norm": 0.36328125,
+      "learning_rate": 6.944502128747558e-05,
+      "loss": 0.5139,
+      "step": 1336
+    },
+    {
+      "epoch": 1.3715455475946776,
+      "grad_norm": 0.36328125,
+      "learning_rate": 6.862455833517979e-05,
+      "loss": 0.4899,
+      "step": 1340
+    },
+    {
+      "epoch": 1.375639713408393,
+      "grad_norm": 0.38671875,
+      "learning_rate": 6.780753095144086e-05,
+      "loss": 0.526,
+      "step": 1344
+    },
+    {
+      "epoch": 1.3797338792221086,
+      "grad_norm": 0.373046875,
+      "learning_rate": 6.699397363012482e-05,
+      "loss": 0.499,
+      "step": 1348
+    },
+    {
+      "epoch": 1.383828045035824,
+      "grad_norm": 0.39453125,
+      "learning_rate": 6.618392071859612e-05,
+      "loss": 0.5155,
+      "step": 1352
+    },
+    {
+      "epoch": 1.3879222108495395,
+      "grad_norm": 0.384765625,
+      "learning_rate": 6.537740641626746e-05,
+      "loss": 0.5165,
+      "step": 1356
+    },
+    {
+      "epoch": 1.3920163766632547,
+      "grad_norm": 0.357421875,
+      "learning_rate": 6.457446477315588e-05,
+      "loss": 0.4815,
+      "step": 1360
+    },
+    {
+      "epoch": 1.3961105424769702,
+      "grad_norm": 0.3671875,
+      "learning_rate": 6.377512968844533e-05,
+      "loss": 0.5091,
+      "step": 1364
+    },
+    {
+      "epoch": 1.4002047082906857,
+      "grad_norm": 0.3515625,
+      "learning_rate": 6.297943490905531e-05,
+      "loss": 0.4868,
+      "step": 1368
+    },
+    {
+      "epoch": 1.4042988741044011,
+      "grad_norm": 0.373046875,
+      "learning_rate": 6.218741402821624e-05,
+      "loss": 0.4928,
+      "step": 1372
+    },
+    {
+      "epoch": 1.4083930399181166,
+      "grad_norm": 0.369140625,
+      "learning_rate": 6.139910048405134e-05,
+      "loss": 0.5173,
+      "step": 1376
+    },
+    {
+      "epoch": 1.412487205731832,
+      "grad_norm": 0.3515625,
+      "learning_rate": 6.061452755816451e-05,
+      "loss": 0.492,
+      "step": 1380
+    },
+    {
+      "epoch": 1.4165813715455475,
+      "grad_norm": 0.357421875,
+      "learning_rate": 5.9833728374235615e-05,
+      "loss": 0.5033,
+      "step": 1384
+    },
+    {
+      "epoch": 1.420675537359263,
+      "grad_norm": 0.33984375,
+      "learning_rate": 5.9056735896621796e-05,
+      "loss": 0.5119,
+      "step": 1388
+    },
+    {
+      "epoch": 1.4247697031729785,
+      "grad_norm": 0.34375,
+      "learning_rate": 5.8283582928965986e-05,
+      "loss": 0.4938,
+      "step": 1392
+    },
+    {
+      "epoch": 1.428863868986694,
+      "grad_norm": 0.36328125,
+      "learning_rate": 5.751430211281165e-05,
+      "loss": 0.4877,
+      "step": 1396
+    },
+    {
+      "epoch": 1.4329580348004094,
+      "grad_norm": 0.359375,
+      "learning_rate": 5.674892592622502e-05,
+      "loss": 0.4866,
+      "step": 1400
+    },
+    {
+      "epoch": 1.4370522006141249,
+      "grad_norm": 0.369140625,
+      "learning_rate": 5.5987486682423865e-05,
+      "loss": 0.4863,
+      "step": 1404
+    },
+    {
+      "epoch": 1.4411463664278403,
+      "grad_norm": 0.361328125,
+      "learning_rate": 5.5230016528413076e-05,
+      "loss": 0.5,
+      "step": 1408
+    },
+    {
+      "epoch": 1.4452405322415558,
+      "grad_norm": 0.357421875,
+      "learning_rate": 5.447654744362761e-05,
+      "loss": 0.4917,
+      "step": 1412
+    },
+    {
+      "epoch": 1.4493346980552713,
+      "grad_norm": 0.34765625,
+      "learning_rate": 5.37271112385823e-05,
+      "loss": 0.4828,
+      "step": 1416
+    },
+    {
+      "epoch": 1.4534288638689867,
+      "grad_norm": 0.3515625,
+      "learning_rate": 5.2981739553528944e-05,
+      "loss": 0.5278,
+      "step": 1420
+    },
+    {
+      "epoch": 1.4575230296827022,
+      "grad_norm": 0.365234375,
+      "learning_rate": 5.2240463857120365e-05,
+      "loss": 0.4959,
+      "step": 1424
+    },
+    {
+      "epoch": 1.4616171954964177,
+      "grad_norm": 0.384765625,
+      "learning_rate": 5.1503315445081946e-05,
+      "loss": 0.4767,
+      "step": 1428
+    },
+    {
+      "epoch": 1.4657113613101331,
+      "grad_norm": 0.357421875,
+      "learning_rate": 5.0770325438890304e-05,
+      "loss": 0.5052,
+      "step": 1432
+    },
+    {
+      "epoch": 1.4698055271238486,
+      "grad_norm": 0.365234375,
+      "learning_rate": 5.004152478445939e-05,
+      "loss": 0.4988,
+      "step": 1436
+    },
+    {
+      "epoch": 1.473899692937564,
+      "grad_norm": 0.36328125,
+      "learning_rate": 4.9316944250834126e-05,
+      "loss": 0.486,
+      "step": 1440
+    },
+    {
+      "epoch": 1.4779938587512795,
+      "grad_norm": 0.384765625,
+      "learning_rate": 4.8596614428891094e-05,
+      "loss": 0.5126,
+      "step": 1444
+    },
+    {
+      "epoch": 1.482088024564995,
+      "grad_norm": 0.35546875,
+      "learning_rate": 4.788056573004726e-05,
+      "loss": 0.5124,
+      "step": 1448
+    },
+    {
+      "epoch": 1.4861821903787105,
+      "grad_norm": 0.365234375,
+      "learning_rate": 4.7168828384975985e-05,
+      "loss": 0.5304,
+      "step": 1452
+    },
+    {
+      "epoch": 1.4902763561924257,
+      "grad_norm": 0.376953125,
+      "learning_rate": 4.646143244233068e-05,
+      "loss": 0.5023,
+      "step": 1456
+    },
+    {
+      "epoch": 1.4943705220061412,
+      "grad_norm": 0.33203125,
+      "learning_rate": 4.575840776747621e-05,
+      "loss": 0.453,
+      "step": 1460
+    },
+    {
+      "epoch": 1.4984646878198566,
+      "grad_norm": 0.365234375,
+      "learning_rate": 4.505978404122805e-05,
+      "loss": 0.4769,
+      "step": 1464
+    },
+    {
+      "epoch": 1.5025588536335721,
+      "grad_norm": 0.36328125,
+      "learning_rate": 4.436559075859911e-05,
+      "loss": 0.54,
+      "step": 1468
+    },
+    {
+      "epoch": 1.5066530194472876,
+      "grad_norm": 0.365234375,
+      "learning_rate": 4.367585722755474e-05,
+      "loss": 0.5083,
+      "step": 1472
+    },
+    {
+      "epoch": 1.510747185261003,
+      "grad_norm": 0.341796875,
+      "learning_rate": 4.299061256777498e-05,
+      "loss": 0.4746,
+      "step": 1476
+    },
+    {
+      "epoch": 1.5148413510747185,
+      "grad_norm": 0.3671875,
+      "learning_rate": 4.23098857094255e-05,
+      "loss": 0.511,
+      "step": 1480
+    },
+    {
+      "epoch": 1.518935516888434,
+      "grad_norm": 0.380859375,
+      "learning_rate": 4.163370539193606e-05,
+      "loss": 0.4853,
+      "step": 1484
+    },
+    {
+      "epoch": 1.5230296827021494,
+      "grad_norm": 0.35546875,
+      "learning_rate": 4.0962100162787195e-05,
+      "loss": 0.4949,
+      "step": 1488
+    },
+    {
+      "epoch": 1.527123848515865,
+      "grad_norm": 0.37109375,
+      "learning_rate": 4.029509837630499e-05,
+      "loss": 0.4859,
+      "step": 1492
+    },
+    {
+      "epoch": 1.5312180143295804,
+      "grad_norm": 0.353515625,
+      "learning_rate": 3.9632728192463986e-05,
+      "loss": 0.5075,
+      "step": 1496
+    },
+    {
+      "epoch": 1.5353121801432958,
+      "grad_norm": 0.380859375,
+      "learning_rate": 3.897501757569827e-05,
+      "loss": 0.5268,
+      "step": 1500
+    }
+  ],
+  "logging_steps": 4,
+  "max_steps": 1954,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 2,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 9.81652806593151e+17,
+  "train_batch_size": 32,
+  "trial_name": null,
+  "trial_params": null
+}

checkpoint-1500/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f9c03ab85612736ff806f80c0760cb10efb622262d7cd71babdea03fc557f761
+size 5368

trainer_state.json CHANGED

The diff for this file is too large to render. See raw diff
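
Since this diff is not rendered, the file can be inspected locally instead. Below is a minimal sketch, assuming the top-level `trainer_state.json` has the same layout as the `checkpoint-1500/trainer_state.json` shown above, and that the list holding the per-step entries is stored under the key `log_history` (the key name is not visible in this part of the diff, so treat it as an assumption). The helper names `load_state` and `summarize_log_history` are hypothetical and not part of any library. The inline sample reuses the last three logged entries from the checkpoint.

```python
import json
from pathlib import Path
from statistics import mean


def load_state(path: str) -> dict:
    """Read a trainer_state.json file into a dict (path is supplied by the caller)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize_log_history(state: dict, last_n: int = 3) -> dict:
    """Summarize the training-loss entries of a trainer_state.json-style dict."""
    # Assumption: per-step entries live under "log_history"; skip entries without "loss".
    entries = [e for e in state.get("log_history", []) if "loss" in e]
    if not entries:
        return {"num_entries": 0}
    tail = entries[-last_n:]
    last = entries[-1]
    return {
        "num_entries": len(entries),
        "last_step": last["step"],
        "last_learning_rate": last["learning_rate"],
        # Mean loss over the final `last_n` logged entries.
        "mean_loss_tail": mean(e["loss"] for e in tail),
        # Fraction of the planned optimizer steps completed at the last log entry.
        "progress": last["step"] / state["max_steps"],
    }


# Inline sample: the last three entries and max_steps from the checkpoint above.
sample_state = {
    "max_steps": 1954,
    "log_history": [
        {"epoch": 1.527123848515865, "grad_norm": 0.37109375,
         "learning_rate": 4.029509837630499e-05, "loss": 0.4859, "step": 1492},
        {"epoch": 1.5312180143295804, "grad_norm": 0.353515625,
         "learning_rate": 3.9632728192463986e-05, "loss": 0.5075, "step": 1496},
        {"epoch": 1.5353121801432958, "grad_norm": 0.380859375,
         "learning_rate": 3.897501757569827e-05, "loss": 0.5268, "step": 1500},
    ],
}

summary = summarize_log_history(sample_state)
print(summary)
```

To run it against a real file, pass the result of `load_state("checkpoint-1500/trainer_state.json")` to `summarize_log_history`; the path is only an example and depends on where the repository was downloaded.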