Training in progress, step 375, checkpoint


Files changed (13)

last-checkpoint/README.md +202 -0
last-checkpoint/adapter_config.json +36 -0
last-checkpoint/adapter_model.safetensors +3 -0
last-checkpoint/merges.txt +0 -0
last-checkpoint/optimizer.pt +3 -0
last-checkpoint/rng_state.pth +3 -0
last-checkpoint/scheduler.pt +3 -0
last-checkpoint/special_tokens_map.json +70 -0
last-checkpoint/tokenizer.json +0 -0
last-checkpoint/tokenizer_config.json +358 -0
last-checkpoint/trainer_state.json +2666 -0
last-checkpoint/training_args.bin +3 -0
last-checkpoint/vocab.json +0 -0

last-checkpoint/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: bigcode/starcoder2-3b
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.14.0

last-checkpoint/adapter_config.json ADDED

	@@ -0,0 +1,36 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "bigcode/starcoder2-3b",
+  "bias": "none",
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "q_proj",
+    "v_proj",
+    "c_fc",
+    "c_proj",
+    "k_proj",
+    "o_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
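
The `lora_alpha`, `r`, and `use_rslora` fields above together determine how strongly the low-rank update is weighted when it is added to the frozen base weights. The following is a minimal sketch of that arithmetic, assuming the standard LoRA convention (`lora_alpha / r`) and the rank-stabilized variant (`lora_alpha / sqrt(r)`); the helper name `lora_scaling` is hypothetical and only the fields needed for the calculation are copied from the diff.

```python
import json

# Subset of the adapter_config.json shown above; other fields are omitted
# because they do not affect the scaling arithmetic.
ADAPTER_CONFIG = json.loads("""
{
  "lora_alpha": 16,
  "r": 8,
  "use_rslora": false
}
""")


def lora_scaling(config):
    """Hypothetical helper: factor applied to the low-rank update B @ A."""
    alpha = config["lora_alpha"]
    rank = config["r"]
    if config.get("use_rslora"):
        # Rank-stabilized LoRA divides by sqrt(r) instead of r.
        return alpha / rank ** 0.5
    # Standard LoRA scaling.
    return alpha / rank


scaling = lora_scaling(ADAPTER_CONFIG)
print(scaling)
```

With the values in this checkpoint the update is scaled by 16 / 8, so doubling `r` without changing `lora_alpha` would halve the effective weight of the adapter.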

last-checkpoint/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:823c97ac8a4371fa9f02bfbdb7020411457589584c7e352b373297156f19fb26
+size 47724600

last-checkpoint/merges.txt ADDED

The diff for this file is too large to render. See raw diff

last-checkpoint/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:409ae18b3ec3dee4277db2aefa0ebd50faf87fff215738c48c9324bf2fcda1d2
+size 25331900

last-checkpoint/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:31c079f5a53b33b8611e6bc35e8a6b11ea1da8ec7f7a7b36cf42e94d17aa464f
+size 14244

last-checkpoint/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4bf304e342001350c82d6970cec50fb92a4329a84dcb76ae8031bca03ca92aa9
+size 1064
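
The three-line diffs for the binary files above are Git LFS pointer files rather than the binaries themselves: each records the spec version, a SHA-256 object ID, and the size in bytes of the real file. As a rough sketch, a pointer can be parsed as shown here; the function name `parse_lfs_pointer` is an illustrative assumption and the sample text is copied from the `adapter_model.safetensors` entry.

```python
def parse_lfs_pointer(text):
    """Hypothetical helper: split a Git LFS pointer file into its fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", separated by a single space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents from the adapter_model.safetensors diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:823c97ac8a4371fa9f02bfbdb7020411457589584c7e352b373297156f19fb26
size 47724600
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer["hash_algo"], pointer["size"])
```

Read this way, the adapter weights are about 47.7 MB and the optimizer state about 25.3 MB, while the scheduler and RNG state files are only a few kilobytes.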

last-checkpoint/special_tokens_map.json ADDED

	@@ -0,0 +1,70 @@

+{
+  "additional_special_tokens": [
+    "<|endoftext|>",
+    "<fim_prefix>",
+    "<fim_middle>",
+    "<fim_suffix>",
+    "<fim_pad>",
+    "<repo_name>",
+    "<file_sep>",
+    "<issue_start>",
+    "<issue_comment>",
+    "<issue_closed>",
+    "<jupyter_start>",
+    "<jupyter_text>",
+    "<jupyter_code>",
+    "<jupyter_output>",
+    "<jupyter_script>",
+    "<empty_output>",
+    "<code_to_intermediate>",
+    "<intermediate_to_code>",
+    "<pr>",
+    "<pr_status>",
+    "<pr_is_merged>",
+    "<pr_base>",
+    "<pr_file>",
+    "<pr_base_code>",
+    "<pr_diff>",
+    "<pr_diff_hunk>",
+    "<pr_comment>",
+    "<pr_event_id>",
+    "<pr_review>",
+    "<pr_review_state>",
+    "<pr_review_comment>",
+    "<pr_in_reply_to_review_id>",
+    "<pr_in_reply_to_comment_id>",
+    "<pr_diff_hunk_comment_line>",
+    "<NAME>",
+    "<EMAIL>",
+    "<KEY>",
+    "<PASSWORD>"
+  ],
+  "bos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

last-checkpoint/tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

last-checkpoint/tokenizer_config.json ADDED

	@@ -0,0 +1,358 @@

+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<fim_prefix>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "<fim_middle>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<fim_suffix>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "4": {
+      "content": "<fim_pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "5": {
+      "content": "<repo_name>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "6": {
+      "content": "<file_sep>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "7": {
+      "content": "<issue_start>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "8": {
+      "content": "<issue_comment>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "9": {
+      "content": "<issue_closed>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "10": {
+      "content": "<jupyter_start>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "11": {
+      "content": "<jupyter_text>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "12": {
+      "content": "<jupyter_code>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "13": {
+      "content": "<jupyter_output>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "14": {
+      "content": "<jupyter_script>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "15": {
+      "content": "<empty_output>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "16": {
+      "content": "<code_to_intermediate>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "17": {
+      "content": "<intermediate_to_code>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "18": {
+      "content": "<pr>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "19": {
+      "content": "<pr_status>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "20": {
+      "content": "<pr_is_merged>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "21": {
+      "content": "<pr_base>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "22": {
+      "content": "<pr_file>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "23": {
+      "content": "<pr_base_code>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "24": {
+      "content": "<pr_diff>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "25": {
+      "content": "<pr_diff_hunk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "26": {
+      "content": "<pr_comment>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "27": {
+      "content": "<pr_event_id>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "28": {
+      "content": "<pr_review>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "29": {
+      "content": "<pr_review_state>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "30": {
+      "content": "<pr_review_comment>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "31": {
+      "content": "<pr_in_reply_to_review_id>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32": {
+      "content": "<pr_in_reply_to_comment_id>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "33": {
+      "content": "<pr_diff_hunk_comment_line>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "34": {
+      "content": "<NAME>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "35": {
+      "content": "<EMAIL>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "36": {
+      "content": "<KEY>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "37": {
+      "content": "<PASSWORD>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "<|endoftext|>",
+    "<fim_prefix>",
+    "<fim_middle>",
+    "<fim_suffix>",
+    "<fim_pad>",
+    "<repo_name>",
+    "<file_sep>",
+    "<issue_start>",
+    "<issue_comment>",
+    "<issue_closed>",
+    "<jupyter_start>",
+    "<jupyter_text>",
+    "<jupyter_code>",
+    "<jupyter_output>",
+    "<jupyter_script>",
+    "<empty_output>",
+    "<code_to_intermediate>",
+    "<intermediate_to_code>",
+    "<pr>",
+    "<pr_status>",
+    "<pr_is_merged>",
+    "<pr_base>",
+    "<pr_file>",
+    "<pr_base_code>",
+    "<pr_diff>",
+    "<pr_diff_hunk>",
+    "<pr_comment>",
+    "<pr_event_id>",
+    "<pr_review>",
+    "<pr_review_state>",
+    "<pr_review_comment>",
+    "<pr_in_reply_to_review_id>",
+    "<pr_in_reply_to_comment_id>",
+    "<pr_diff_hunk_comment_line>",
+    "<NAME>",
+    "<EMAIL>",
+    "<KEY>",
+    "<PASSWORD>"
+  ],
+  "bos_token": "<|endoftext|>",
+  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
+  "clean_up_tokenization_spaces": true,
+  "eos_token": "<|endoftext|>",
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "<|endoftext|>",
+  "tokenizer_class": "GPT2Tokenizer",
+  "unk_token": "<|endoftext|>",
+  "vocab_size": 49152
+}
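
One detail in the config above may be worth checking: the `chat_template` emits marker strings of the form `<|...|>` that do not appear among the 38 entries of `additional_special_tokens`. The sketch below is one way to list them, assuming a simple regular expression is enough to find the markers; the variable names are hypothetical and the template is reduced to the two fragments that contain literal markers.

```python
import re

# Fragments of the chat_template above that contain literal marker strings.
TEMPLATE_FRAGMENTS = [
    "'<|start_header_id|>' + message['role'] + '<|end_header_id|>' + "
    "message['content'] | trim + '<|eot_id|>'",
    "'<|start_header_id|>assistant<|end_header_id|>'",
]

# The additional_special_tokens list from the config above.
SPECIAL_TOKENS = {
    "<|endoftext|>", "<fim_prefix>", "<fim_middle>", "<fim_suffix>",
    "<fim_pad>", "<repo_name>", "<file_sep>", "<issue_start>",
    "<issue_comment>", "<issue_closed>", "<jupyter_start>", "<jupyter_text>",
    "<jupyter_code>", "<jupyter_output>", "<jupyter_script>",
    "<empty_output>", "<code_to_intermediate>", "<intermediate_to_code>",
    "<pr>", "<pr_status>", "<pr_is_merged>", "<pr_base>", "<pr_file>",
    "<pr_base_code>", "<pr_diff>", "<pr_diff_hunk>", "<pr_comment>",
    "<pr_event_id>", "<pr_review>", "<pr_review_state>",
    "<pr_review_comment>", "<pr_in_reply_to_review_id>",
    "<pr_in_reply_to_comment_id>", "<pr_diff_hunk_comment_line>",
    "<NAME>", "<EMAIL>", "<KEY>", "<PASSWORD>",
}

# Collect every "<|name|>" marker the template would emit.
markers = set()
for fragment in TEMPLATE_FRAGMENTS:
    markers.update(re.findall(r"<\|[a-z_]+\|>", fragment))

# Markers used by the template but not registered as special tokens.
missing = sorted(markers - SPECIAL_TOKENS)
print(missing)
```

If these markers are indeed absent from the vocabulary, the tokenizer would presumably split them into ordinary subword pieces instead of treating each one as a single token, which may or may not be what the training setup intended.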

last-checkpoint/trainer_state.json ADDED

	@@ -0,0 +1,2666 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.12340600575894693,
+  "eval_steps": 375,
+  "global_step": 375,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.00032908268202385847,
+      "grad_norm": 46.0558967590332,
+      "learning_rate": 2e-05,
+      "loss": 4.5639,
+      "step": 1
+    },
+    {
+      "epoch": 0.0006581653640477169,
+      "grad_norm": 116.75971221923828,
+      "learning_rate": 4e-05,
+      "loss": 4.7399,
+      "step": 2
+    },
+    {
+      "epoch": 0.0009872480460715754,
+      "grad_norm": 51.51197814941406,
+      "learning_rate": 6e-05,
+      "loss": 4.8224,
+      "step": 3
+    },
+    {
+      "epoch": 0.0013163307280954339,
+      "grad_norm": 78.98606872558594,
+      "learning_rate": 8e-05,
+      "loss": 5.1893,
+      "step": 4
+    },
+    {
+      "epoch": 0.0016454134101192926,
+      "grad_norm": 48.04075622558594,
+      "learning_rate": 0.0001,
+      "loss": 4.4059,
+      "step": 5
+    },
+    {
+      "epoch": 0.001974496092143151,
+      "grad_norm": 38.80451202392578,
+      "learning_rate": 0.00012,
+      "loss": 4.0177,
+      "step": 6
+    },
+    {
+      "epoch": 0.0023035787741670093,
+      "grad_norm": 39.55317687988281,
+      "learning_rate": 0.00014,
+      "loss": 4.291,
+      "step": 7
+    },
+    {
+      "epoch": 0.0026326614561908677,
+      "grad_norm": 40.044979095458984,
+      "learning_rate": 0.00016,
+      "loss": 3.793,
+      "step": 8
+    },
+    {
+      "epoch": 0.0029617441382147267,
+      "grad_norm": 58.86537170410156,
+      "learning_rate": 0.00018,
+      "loss": 4.1226,
+      "step": 9
+    },
+    {
+      "epoch": 0.003290826820238585,
+      "grad_norm": 24.430255889892578,
+      "learning_rate": 0.0002,
+      "loss": 3.6641,
+      "step": 10
+    },
+    {
+      "epoch": 0.0036199095022624436,
+      "grad_norm": 54.99418640136719,
+      "learning_rate": 0.00019999977772170748,
+      "loss": 3.8054,
+      "step": 11
+    },
+    {
+      "epoch": 0.003948992184286302,
+      "grad_norm": 76.18338775634766,
+      "learning_rate": 0.00019999911088781805,
+      "loss": 3.4282,
+      "step": 12
+    },
+    {
+      "epoch": 0.0042780748663101605,
+      "grad_norm": 38.46460723876953,
+      "learning_rate": 0.0001999979995012962,
+      "loss": 3.3058,
+      "step": 13
+    },
+    {
+      "epoch": 0.0046071575483340186,
+      "grad_norm": 44.550357818603516,
+      "learning_rate": 0.00019999644356708261,
+      "loss": 3.4775,
+      "step": 14
+    },
+    {
+      "epoch": 0.0049362402303578775,
+      "grad_norm": 34.525020599365234,
+      "learning_rate": 0.00019999444309209432,
+      "loss": 3.3947,
+      "step": 15
+    },
+    {
+      "epoch": 0.0052653229123817355,
+      "grad_norm": 57.00188446044922,
+      "learning_rate": 0.0001999919980852246,
+      "loss": 3.3006,
+      "step": 16
+    },
+    {
+      "epoch": 0.005594405594405594,
+      "grad_norm": 117.598876953125,
+      "learning_rate": 0.00019998910855734288,
+      "loss": 3.2207,
+      "step": 17
+    },
+    {
+      "epoch": 0.005923488276429453,
+      "grad_norm": 37.49771499633789,
+      "learning_rate": 0.0001999857745212947,
+      "loss": 3.1492,
+      "step": 18
+    },
+    {
+      "epoch": 0.006252570958453311,
+      "grad_norm": 30.6367244720459,
+      "learning_rate": 0.00019998199599190178,
+      "loss": 3.2436,
+      "step": 19
+    },
+    {
+      "epoch": 0.00658165364047717,
+      "grad_norm": 39.12260055541992,
+      "learning_rate": 0.0001999777729859618,
+      "loss": 3.2412,
+      "step": 20
+    },
+    {
+      "epoch": 0.006910736322501028,
+      "grad_norm": 172.7960968017578,
+      "learning_rate": 0.00019997310552224846,
+      "loss": 3.1229,
+      "step": 21
+    },
+    {
+      "epoch": 0.007239819004524887,
+      "grad_norm": 39.700347900390625,
+      "learning_rate": 0.00019996799362151122,
+      "loss": 3.1227,
+      "step": 22
+    },
+    {
+      "epoch": 0.007568901686548745,
+      "grad_norm": 72.12504577636719,
+      "learning_rate": 0.00019996243730647538,
+      "loss": 3.23,
+      "step": 23
+    },
+    {
+      "epoch": 0.007897984368572603,
+      "grad_norm": 35.2486457824707,
+      "learning_rate": 0.00019995643660184191,
+      "loss": 3.1196,
+      "step": 24
+    },
+    {
+      "epoch": 0.008227067050596462,
+      "grad_norm": 38.94593048095703,
+      "learning_rate": 0.00019994999153428737,
+      "loss": 3.1875,
+      "step": 25
+    },
+    {
+      "epoch": 0.008556149732620321,
+      "grad_norm": 32.85285568237305,
+      "learning_rate": 0.00019994310213246368,
+      "loss": 3.0243,
+      "step": 26
+    },
+    {
+      "epoch": 0.00888523241464418,
+      "grad_norm": 36.16541290283203,
+      "learning_rate": 0.00019993576842699816,
+      "loss": 2.9224,
+      "step": 27
+    },
+    {
+      "epoch": 0.009214315096668037,
+      "grad_norm": 37.95417785644531,
+      "learning_rate": 0.0001999279904504933,
+      "loss": 3.1117,
+      "step": 28
+    },
+    {
+      "epoch": 0.009543397778691896,
+      "grad_norm": 30.928470611572266,
+      "learning_rate": 0.00019991976823752653,
+      "loss": 3.0161,
+      "step": 29
+    },
+    {
+      "epoch": 0.009872480460715755,
+      "grad_norm": 30.129854202270508,
+      "learning_rate": 0.00019991110182465032,
+      "loss": 2.9128,
+      "step": 30
+    },
+    {
+      "epoch": 0.010201563142739614,
+      "grad_norm": 32.489471435546875,
+      "learning_rate": 0.00019990199125039174,
+      "loss": 2.8793,
+      "step": 31
+    },
+    {
+      "epoch": 0.010530645824763471,
+      "grad_norm": 40.40331268310547,
+      "learning_rate": 0.00019989243655525247,
+      "loss": 3.0345,
+      "step": 32
+    },
+    {
+      "epoch": 0.01085972850678733,
+      "grad_norm": 36.23051834106445,
+      "learning_rate": 0.00019988243778170853,
+      "loss": 2.9974,
+      "step": 33
+    },
+    {
+      "epoch": 0.011188811188811189,
+      "grad_norm": 46.42790603637695,
+      "learning_rate": 0.0001998719949742101,
+      "loss": 3.0721,
+      "step": 34
+    },
+    {
+      "epoch": 0.011517893870835048,
+      "grad_norm": 41.625057220458984,
+      "learning_rate": 0.0001998611081791814,
+      "loss": 2.9996,
+      "step": 35
+    },
+    {
+      "epoch": 0.011846976552858907,
+      "grad_norm": 45.90873718261719,
+      "learning_rate": 0.00019984977744502038,
+      "loss": 2.9567,
+      "step": 36
+    },
+    {
+      "epoch": 0.012176059234882764,
+      "grad_norm": 184.02110290527344,
+      "learning_rate": 0.00019983800282209857,
+      "loss": 3.202,
+      "step": 37
+    },
+    {
+      "epoch": 0.012505141916906623,
+      "grad_norm": 39.56529998779297,
+      "learning_rate": 0.00019982578436276082,
+      "loss": 3.1411,
+      "step": 38
+    },
+    {
+      "epoch": 0.012834224598930482,
+      "grad_norm": 146.53335571289062,
+      "learning_rate": 0.00019981312212132512,
+      "loss": 3.1131,
+      "step": 39
+    },
+    {
+      "epoch": 0.01316330728095434,
+      "grad_norm": 150.16696166992188,
+      "learning_rate": 0.00019980001615408228,
+      "loss": 3.1807,
+      "step": 40
+    },
+    {
+      "epoch": 0.013492389962978198,
+      "grad_norm": 55.29084014892578,
+      "learning_rate": 0.00019978646651929572,
+      "loss": 3.0728,
+      "step": 41
+    },
+    {
+      "epoch": 0.013821472645002057,
+      "grad_norm": 67.03924560546875,
+      "learning_rate": 0.00019977247327720128,
+      "loss": 3.1384,
+      "step": 42
+    },
+    {
+      "epoch": 0.014150555327025915,
+      "grad_norm": 52.389244079589844,
+      "learning_rate": 0.0001997580364900068,
+      "loss": 3.0188,
+      "step": 43
+    },
+    {
+      "epoch": 0.014479638009049774,
+      "grad_norm": 52.6592903137207,
+      "learning_rate": 0.000199743156221892,
+      "loss": 3.2027,
+      "step": 44
+    },
+    {
+      "epoch": 0.014808720691073632,
+      "grad_norm": 60.01515197753906,
+      "learning_rate": 0.00019972783253900808,
+      "loss": 3.3363,
+      "step": 45
+    },
+    {
+      "epoch": 0.01513780337309749,
+      "grad_norm": 63.57032012939453,
+      "learning_rate": 0.00019971206550947748,
+      "loss": 3.4156,
+      "step": 46
+    },
+    {
+      "epoch": 0.01546688605512135,
+      "grad_norm": 95.09534454345703,
+      "learning_rate": 0.00019969585520339354,
+      "loss": 3.019,
+      "step": 47
+    },
+    {
+      "epoch": 0.015795968737145206,
+      "grad_norm": 75.03887939453125,
+      "learning_rate": 0.0001996792016928203,
+      "loss": 3.6353,
+      "step": 48
+    },
+    {
+      "epoch": 0.016125051419169065,
+      "grad_norm": 208.92349243164062,
+      "learning_rate": 0.00019966210505179197,
+      "loss": 3.8851,
+      "step": 49
+    },
+    {
+      "epoch": 0.016454134101192924,
+      "grad_norm": 96.15306091308594,
+      "learning_rate": 0.00019964456535631286,
+      "loss": 3.951,
+      "step": 50
+    },
+    {
+      "epoch": 0.016783216783216783,
+      "grad_norm": 205.12437438964844,
+      "learning_rate": 0.0001996265826843568,
+      "loss": 2.9162,
+      "step": 51
+    },
+    {
+      "epoch": 0.017112299465240642,
+      "grad_norm": 65.38352966308594,
+      "learning_rate": 0.00019960815711586696,
+      "loss": 2.8065,
+      "step": 52
+    },
+    {
+      "epoch": 0.0174413821472645,
+      "grad_norm": 59.9429931640625,
+      "learning_rate": 0.00019958928873275539,
+      "loss": 2.646,
+      "step": 53
+    },
+    {
+      "epoch": 0.01777046482928836,
+      "grad_norm": 40.976078033447266,
+      "learning_rate": 0.00019956997761890277,
+      "loss": 2.622,
+      "step": 54
+    },
+    {
+      "epoch": 0.01809954751131222,
+      "grad_norm": 47.47517776489258,
+      "learning_rate": 0.00019955022386015792,
+      "loss": 2.799,
+      "step": 55
+    },
+    {
+      "epoch": 0.018428630193336074,
+      "grad_norm": 47.42388153076172,
+      "learning_rate": 0.00019953002754433743,
+      "loss": 2.6488,
+      "step": 56
+    },
+    {
+      "epoch": 0.018757712875359933,
+      "grad_norm": 25.498687744140625,
+      "learning_rate": 0.00019950938876122542,
+      "loss": 2.4878,
+      "step": 57
+    },
+    {
+      "epoch": 0.019086795557383792,
+      "grad_norm": 23.83648681640625,
+      "learning_rate": 0.00019948830760257291,
+      "loss": 2.6812,
+      "step": 58
+    },
+    {
+      "epoch": 0.01941587823940765,
+      "grad_norm": 37.21333694458008,
+      "learning_rate": 0.0001994667841620976,
+      "loss": 2.6438,
+      "step": 59
+    },
+    {
+      "epoch": 0.01974496092143151,
+      "grad_norm": 56.65325927734375,
+      "learning_rate": 0.00019944481853548335,
+      "loss": 2.7186,
+      "step": 60
+    },
+    {
+      "epoch": 0.02007404360345537,
+      "grad_norm": 25.59310531616211,
+      "learning_rate": 0.00019942241082037982,
+      "loss": 2.6601,
+      "step": 61
+    },
+    {
+      "epoch": 0.020403126285479228,
+      "grad_norm": 36.67039108276367,
+      "learning_rate": 0.00019939956111640197,
+      "loss": 2.4964,
+      "step": 62
+    },
+    {
+      "epoch": 0.020732208967503087,
+      "grad_norm": 25.524160385131836,
+      "learning_rate": 0.00019937626952512964,
+      "loss": 2.6319,
+      "step": 63
+    },
+    {
+      "epoch": 0.021061291649526942,
+      "grad_norm": 103.86463165283203,
+      "learning_rate": 0.0001993525361501072,
+      "loss": 2.6137,
+      "step": 64
+    },
+    {
+      "epoch": 0.0213903743315508,
+      "grad_norm": 28.601654052734375,
+      "learning_rate": 0.00019932836109684286,
+      "loss": 2.6402,
+      "step": 65
+    },
+    {
+      "epoch": 0.02171945701357466,
+      "grad_norm": 45.02948760986328,
+      "learning_rate": 0.00019930374447280845,
+      "loss": 2.5556,
+      "step": 66
+    },
+    {
+      "epoch": 0.02204853969559852,
+      "grad_norm": 48.84526062011719,
+      "learning_rate": 0.00019927868638743875,
+      "loss": 2.6032,
+      "step": 67
+    },
+    {
+      "epoch": 0.022377622377622378,
+      "grad_norm": 29.451082229614258,
+      "learning_rate": 0.0001992531869521312,
+      "loss": 2.6241,
+      "step": 68
+    },
+    {
+      "epoch": 0.022706705059646237,
+      "grad_norm": 66.02137756347656,
+      "learning_rate": 0.00019922724628024515,
+      "loss": 2.6018,
+      "step": 69
+    },
+    {
+      "epoch": 0.023035787741670095,
+      "grad_norm": 30.177003860473633,
+      "learning_rate": 0.0001992008644871016,
+      "loss": 2.6833,
+      "step": 70
+    },
+    {
+      "epoch": 0.023364870423693954,
+      "grad_norm": 57.67784118652344,
+      "learning_rate": 0.00019917404168998256,
+      "loss": 2.4953,
+      "step": 71
+    },
+    {
+      "epoch": 0.023693953105717813,
+      "grad_norm": 29.414043426513672,
+      "learning_rate": 0.0001991467780081305,
+      "loss": 2.4697,
+      "step": 72
+    },
+    {
+      "epoch": 0.02402303578774167,
+      "grad_norm": 30.574806213378906,
+      "learning_rate": 0.00019911907356274795,
+      "loss": 2.524,
+      "step": 73
+    },
+    {
+      "epoch": 0.024352118469765528,
+      "grad_norm": 44.30821228027344,
+      "learning_rate": 0.00019909092847699683,
+      "loss": 2.5585,
+      "step": 74
+    },
+    {
+      "epoch": 0.024681201151789386,
+      "grad_norm": 34.140777587890625,
+      "learning_rate": 0.00019906234287599798,
+      "loss": 2.4821,
+      "step": 75
+    },
+    {
+      "epoch": 0.025010283833813245,
+      "grad_norm": 39.644832611083984,
+      "learning_rate": 0.00019903331688683057,
+      "loss": 2.7098,
+      "step": 76
+    },
+    {
+      "epoch": 0.025339366515837104,
+      "grad_norm": 73.76107025146484,
+      "learning_rate": 0.00019900385063853154,
+      "loss": 2.6193,
+      "step": 77
+    },
+    {
+      "epoch": 0.025668449197860963,
+      "grad_norm": 34.471954345703125,
+      "learning_rate": 0.00019897394426209505,
+      "loss": 2.3583,
+      "step": 78
+    },
+    {
+      "epoch": 0.025997531879884822,
+      "grad_norm": 44.11329650878906,
+      "learning_rate": 0.00019894359789047187,
+      "loss": 2.5031,
+      "step": 79
+    },
+    {
+      "epoch": 0.02632661456190868,
+      "grad_norm": 38.81841278076172,
+      "learning_rate": 0.00019891281165856873,
+      "loss": 2.7198,
+      "step": 80
+    },
+    {
+      "epoch": 0.026655697243932536,
+      "grad_norm": 35.21909713745117,
+      "learning_rate": 0.00019888158570324795,
+      "loss": 2.5912,
+      "step": 81
+    },
+    {
+      "epoch": 0.026984779925956395,
+      "grad_norm": 47.79863357543945,
+      "learning_rate": 0.0001988499201633265,
+      "loss": 2.562,
+      "step": 82
+    },
+    {
+      "epoch": 0.027313862607980254,
+      "grad_norm": 50.25193786621094,
+      "learning_rate": 0.00019881781517957562,
+      "loss": 2.7047,
+      "step": 83
+    },
+    {
+      "epoch": 0.027642945290004113,
+      "grad_norm": 36.6878776550293,
+      "learning_rate": 0.0001987852708947202,
+      "loss": 2.4972,
+      "step": 84
+    },
+    {
+      "epoch": 0.027972027972027972,
+      "grad_norm": 42.61648941040039,
+      "learning_rate": 0.00019875228745343794,
+      "loss": 2.5156,
+      "step": 85
+    },
+    {
+      "epoch": 0.02830111065405183,
+      "grad_norm": 81.98995208740234,
+      "learning_rate": 0.0001987188650023589,
+      "loss": 2.569,
+      "step": 86
+    },
+    {
+      "epoch": 0.02863019333607569,
+      "grad_norm": 46.91239929199219,
+      "learning_rate": 0.0001986850036900648,
+      "loss": 2.7152,
+      "step": 87
+    },
+    {
+      "epoch": 0.02895927601809955,
+      "grad_norm": 52.46195602416992,
+      "learning_rate": 0.00019865070366708836,
+      "loss": 2.7093,
+      "step": 88
+    },
+    {
+      "epoch": 0.029288358700123408,
+      "grad_norm": 41.07624435424805,
+      "learning_rate": 0.00019861596508591255,
+      "loss": 2.7295,
+      "step": 89
+    },
+    {
+      "epoch": 0.029617441382147263,
+      "grad_norm": 55.525753021240234,
+      "learning_rate": 0.00019858078810097002,
+      "loss": 2.6919,
+      "step": 90
+    },
+    {
+      "epoch": 0.029946524064171122,
+      "grad_norm": 50.650325775146484,
+      "learning_rate": 0.00019854517286864245,
+      "loss": 2.7554,
+      "step": 91
+    },
+    {
+      "epoch": 0.03027560674619498,
+      "grad_norm": 53.117122650146484,
+      "learning_rate": 0.0001985091195472596,
+      "loss": 2.7014,
+      "step": 92
+    },
+    {
+      "epoch": 0.03060468942821884,
+      "grad_norm": 50.88463592529297,
+      "learning_rate": 0.0001984726282970989,
+      "loss": 2.9423,
+      "step": 93
+    },
+    {
+      "epoch": 0.0309337721102427,
+      "grad_norm": 56.69411849975586,
+      "learning_rate": 0.0001984356992803847,
+      "loss": 2.9391,
+      "step": 94
+    },
+    {
+      "epoch": 0.03126285479226656,
+      "grad_norm": 49.7108268737793,
+      "learning_rate": 0.00019839833266128724,
+      "loss": 2.806,
+      "step": 95
+    },
+    {
+      "epoch": 0.03159193747429041,
+      "grad_norm": 55.716922760009766,
+      "learning_rate": 0.00019836052860592237,
+      "loss": 2.7375,
+      "step": 96
+    },
+    {
+      "epoch": 0.031921020156314275,
+      "grad_norm": 63.348724365234375,
+      "learning_rate": 0.0001983222872823505,
+      "loss": 2.8574,
+      "step": 97
+    },
+    {
+      "epoch": 0.03225010283833813,
+      "grad_norm": 61.3756103515625,
+      "learning_rate": 0.00019828360886057594,
+      "loss": 3.3003,
+      "step": 98
+    },
+    {
+      "epoch": 0.03257918552036199,
+      "grad_norm": 102.8317642211914,
+      "learning_rate": 0.00019824449351254616,
+      "loss": 2.9397,
+      "step": 99
+    },
+    {
+      "epoch": 0.03290826820238585,
+      "grad_norm": 85.07316589355469,
+      "learning_rate": 0.00019820494141215104,
+      "loss": 3.2812,
+      "step": 100
+    },
+    {
+      "epoch": 0.03323735088440971,
+      "grad_norm": 44.999996185302734,
+      "learning_rate": 0.000198164952735222,
+      "loss": 2.4481,
+      "step": 101
+    },
+    {
+      "epoch": 0.033566433566433566,
+      "grad_norm": 35.384925842285156,
+      "learning_rate": 0.00019812452765953135,
+      "loss": 2.2091,
+      "step": 102
+    },
+    {
+      "epoch": 0.03389551624845742,
+      "grad_norm": 29.201196670532227,
+      "learning_rate": 0.00019808366636479147,
+      "loss": 2.4956,
+      "step": 103
+    },
+    {
+      "epoch": 0.034224598930481284,
+      "grad_norm": 18.83892822265625,
+      "learning_rate": 0.00019804236903265388,
+      "loss": 2.3206,
+      "step": 104
+    },
+    {
+      "epoch": 0.03455368161250514,
+      "grad_norm": 17.895370483398438,
+      "learning_rate": 0.00019800063584670863,
+      "loss": 2.3439,
+      "step": 105
+    },
+    {
+      "epoch": 0.034882764294529,
+      "grad_norm": 17.173036575317383,
+      "learning_rate": 0.00019795846699248332,
+      "loss": 2.2369,
+      "step": 106
+    },
+    {
+      "epoch": 0.03521184697655286,
+      "grad_norm": 20.797094345092773,
+      "learning_rate": 0.00019791586265744237,
+      "loss": 2.2587,
+      "step": 107
+    },
+    {
+      "epoch": 0.03554092965857672,
+      "grad_norm": 20.905153274536133,
+      "learning_rate": 0.00019787282303098617,
+      "loss": 2.3331,
+      "step": 108
+    },
+    {
+      "epoch": 0.035870012340600575,
+      "grad_norm": 24.041580200195312,
+      "learning_rate": 0.0001978293483044502,
+      "loss": 2.3722,
+      "step": 109
+    },
+    {
+      "epoch": 0.03619909502262444,
+      "grad_norm": 21.71147918701172,
+      "learning_rate": 0.00019778543867110426,
+      "loss": 2.381,
+      "step": 110
+    },
+    {
+      "epoch": 0.03652817770464829,
+      "grad_norm": 19.62755012512207,
+      "learning_rate": 0.00019774109432615147,
+      "loss": 2.2829,
+      "step": 111
+    },
+    {
+      "epoch": 0.03685726038667215,
+      "grad_norm": 19.470176696777344,
+      "learning_rate": 0.00019769631546672756,
+      "loss": 2.3016,
+      "step": 112
+    },
+    {
+      "epoch": 0.03718634306869601,
+      "grad_norm": 25.57579803466797,
+      "learning_rate": 0.00019765110229189988,
+      "loss": 2.2059,
+      "step": 113
+    },
+    {
+      "epoch": 0.037515425750719866,
+      "grad_norm": 26.536754608154297,
+      "learning_rate": 0.00019760545500266657,
+      "loss": 2.3904,
+      "step": 114
+    },
+    {
+      "epoch": 0.03784450843274373,
+      "grad_norm": 21.48318099975586,
+      "learning_rate": 0.00019755937380195568,
+      "loss": 2.3908,
+      "step": 115
+    },
+    {
+      "epoch": 0.038173591114767584,
+      "grad_norm": 33.38320541381836,
+      "learning_rate": 0.00019751285889462423,
+      "loss": 2.4642,
+      "step": 116
+    },
+    {
+      "epoch": 0.038502673796791446,
+      "grad_norm": 24.473419189453125,
+      "learning_rate": 0.0001974659104874573,
+      "loss": 2.3328,
+      "step": 117
+    },
+    {
+      "epoch": 0.0388317564788153,
+      "grad_norm": 23.20199203491211,
+      "learning_rate": 0.0001974185287891671,
+      "loss": 2.2691,
+      "step": 118
+    },
+    {
+      "epoch": 0.039160839160839164,
+      "grad_norm": 27.398014068603516,
+      "learning_rate": 0.0001973707140103921,
+      "loss": 2.3794,
+      "step": 119
+    },
+    {
+      "epoch": 0.03948992184286302,
+      "grad_norm": 30.54142189025879,
+      "learning_rate": 0.00019732246636369605,
+      "loss": 2.3396,
+      "step": 120
+    },
+    {
+      "epoch": 0.039819004524886875,
+      "grad_norm": 28.70248031616211,
+      "learning_rate": 0.00019727378606356703,
+      "loss": 2.3872,
+      "step": 121
+    },
+    {
+      "epoch": 0.04014808720691074,
+      "grad_norm": 25.75336456298828,
+      "learning_rate": 0.00019722467332641656,
+      "loss": 2.2814,
+      "step": 122
+    },
+    {
+      "epoch": 0.04047716988893459,
+      "grad_norm": 29.759145736694336,
+      "learning_rate": 0.00019717512837057855,
+      "loss": 2.3478,
+      "step": 123
+    },
+    {
+      "epoch": 0.040806252570958455,
+      "grad_norm": 26.37862777709961,
+      "learning_rate": 0.0001971251514163083,
+      "loss": 2.3774,
+      "step": 124
+    },
+    {
+      "epoch": 0.04113533525298231,
+      "grad_norm": 28.256498336791992,
+      "learning_rate": 0.0001970747426857817,
+      "loss": 2.2388,
+      "step": 125
+    },
+    {
+      "epoch": 0.04146441793500617,
+      "grad_norm": 33.0186767578125,
+      "learning_rate": 0.00019702390240309404,
+      "loss": 2.3465,
+      "step": 126
+    },
+    {
+      "epoch": 0.04179350061703003,
+      "grad_norm": 34.24951934814453,
+      "learning_rate": 0.0001969726307942592,
+      "loss": 2.4891,
+      "step": 127
+    },
+    {
+      "epoch": 0.042122583299053884,
+      "grad_norm": 31.087078094482422,
+      "learning_rate": 0.00019692092808720846,
+      "loss": 2.4853,
+      "step": 128
+    },
+    {
+      "epoch": 0.042451665981077746,
+      "grad_norm": 35.06305694580078,
+      "learning_rate": 0.0001968687945117896,
+      "loss": 2.3925,
+      "step": 129
+    },
+    {
+      "epoch": 0.0427807486631016,
+      "grad_norm": 29.33173370361328,
+      "learning_rate": 0.00019681623029976588,
+      "loss": 2.4487,
+      "step": 130
+    },
+    {
+      "epoch": 0.043109831345125464,
+      "grad_norm": 31.75111198425293,
+      "learning_rate": 0.00019676323568481498,
+      "loss": 2.3077,
+      "step": 131
+    },
+    {
+      "epoch": 0.04343891402714932,
+      "grad_norm": 27.908418655395508,
+      "learning_rate": 0.00019670981090252792,
+      "loss": 2.4815,
+      "step": 132
+    },
+    {
+      "epoch": 0.04376799670917318,
+      "grad_norm": 32.187171936035156,
+      "learning_rate": 0.00019665595619040808,
+      "loss": 2.4108,
+      "step": 133
+    },
+    {
+      "epoch": 0.04409707939119704,
+      "grad_norm": 35.1542854309082,
+      "learning_rate": 0.0001966016717878702,
+      "loss": 2.441,
+      "step": 134
+    },
+    {
+      "epoch": 0.0444261620732209,
+      "grad_norm": 33.850250244140625,
+      "learning_rate": 0.00019654695793623907,
+      "loss": 2.5023,
+      "step": 135
+    },
+    {
+      "epoch": 0.044755244755244755,
+      "grad_norm": 38.28538513183594,
+      "learning_rate": 0.0001964918148787488,
+      "loss": 2.6209,
+      "step": 136
+    },
+    {
+      "epoch": 0.04508432743726861,
+      "grad_norm": 45.59490203857422,
+      "learning_rate": 0.00019643624286054144,
+      "loss": 2.563,
+      "step": 137
+    },
+    {
+      "epoch": 0.04541341011929247,
+      "grad_norm": 41.408103942871094,
+      "learning_rate": 0.00019638024212866606,
+      "loss": 2.5537,
+      "step": 138
+    },
+    {
+      "epoch": 0.04574249280131633,
+      "grad_norm": 35.567420959472656,
+      "learning_rate": 0.0001963238129320776,
+      "loss": 2.5059,
+      "step": 139
+    },
+    {
+      "epoch": 0.04607157548334019,
+      "grad_norm": 51.440792083740234,
+      "learning_rate": 0.00019626695552163578,
+      "loss": 2.5288,
+      "step": 140
+    },
+    {
+      "epoch": 0.046400658165364046,
+      "grad_norm": 55.620033264160156,
+      "learning_rate": 0.00019620967015010395,
+      "loss": 2.7742,
+      "step": 141
+    },
+    {
+      "epoch": 0.04672974084738791,
+      "grad_norm": 47.921180725097656,
+      "learning_rate": 0.00019615195707214803,
+      "loss": 2.545,
+      "step": 142
+    },
+    {
+      "epoch": 0.047058823529411764,
+      "grad_norm": 52.1619758605957,
+      "learning_rate": 0.0001960938165443353,
+      "loss": 2.4808,
+      "step": 143
+    },
+    {
+      "epoch": 0.047387906211435626,
+      "grad_norm": 52.686729431152344,
+      "learning_rate": 0.00019603524882513327,
+      "loss": 2.5127,
+      "step": 144
+    },
+    {
+      "epoch": 0.04771698889345948,
+      "grad_norm": 48.75902557373047,
+      "learning_rate": 0.0001959762541749086,
+      "loss": 2.5492,
+      "step": 145
+    },
+    {
+      "epoch": 0.04804607157548334,
+      "grad_norm": 57.62579345703125,
+      "learning_rate": 0.00019591683285592593,
+      "loss": 2.5136,
+      "step": 146
+    },
+    {
+      "epoch": 0.0483751542575072,
+      "grad_norm": 66.0849380493164,
+      "learning_rate": 0.00019585698513234663,
+      "loss": 2.9436,
+      "step": 147
+    },
+    {
+      "epoch": 0.048704236939531055,
+      "grad_norm": 65.15868377685547,
+      "learning_rate": 0.0001957967112702277,
+      "loss": 2.9614,
+      "step": 148
+    },
+    {
+      "epoch": 0.04903331962155492,
+      "grad_norm": 61.37369155883789,
+      "learning_rate": 0.00019573601153752052,
+      "loss": 3.0038,
+      "step": 149
+    },
+    {
+      "epoch": 0.04936240230357877,
+      "grad_norm": 132.2886505126953,
+      "learning_rate": 0.00019567488620406983,
+      "loss": 3.1973,
+      "step": 150
+    },
+    {
+      "epoch": 0.049691484985602635,
+      "grad_norm": 35.6921272277832,
+      "learning_rate": 0.00019561333554161224,
+      "loss": 2.1981,
+      "step": 151
+    },
+    {
+      "epoch": 0.05002056766762649,
+      "grad_norm": 32.4654541015625,
+      "learning_rate": 0.0001955513598237753,
+      "loss": 2.197,
+      "step": 152
+    },
+    {
+      "epoch": 0.05034965034965035,
+      "grad_norm": 25.712648391723633,
+      "learning_rate": 0.00019548895932607621,
+      "loss": 2.338,
+      "step": 153
+    },
+    {
+      "epoch": 0.05067873303167421,
+      "grad_norm": 19.411991119384766,
+      "learning_rate": 0.00019542613432592038,
+      "loss": 2.2655,
+      "step": 154
+    },
+    {
+      "epoch": 0.051007815713698064,
+      "grad_norm": 13.076166152954102,
+      "learning_rate": 0.00019536288510260056,
+      "loss": 1.9767,
+      "step": 155
+    },
+    {
+      "epoch": 0.051336898395721926,
+      "grad_norm": 15.647604942321777,
+      "learning_rate": 0.00019529921193729534,
+      "loss": 2.2871,
+      "step": 156
+    },
+    {
+      "epoch": 0.05166598107774578,
+      "grad_norm": 15.999380111694336,
+      "learning_rate": 0.00019523511511306793,
+      "loss": 2.4586,
+      "step": 157
+    },
+    {
+      "epoch": 0.051995063759769644,
+      "grad_norm": 16.724533081054688,
+      "learning_rate": 0.000195170594914865,
+      "loss": 2.1605,
+      "step": 158
+    },
+    {
+      "epoch": 0.0523241464417935,
+      "grad_norm": 17.9171142578125,
+      "learning_rate": 0.00019510565162951537,
+      "loss": 2.2012,
+      "step": 159
+    },
+    {
+      "epoch": 0.05265322912381736,
+      "grad_norm": 18.63538360595703,
+      "learning_rate": 0.00019504028554572864,
+      "loss": 2.2715,
+      "step": 160
+    },
+    {
+      "epoch": 0.05298231180584122,
+      "grad_norm": 17.931528091430664,
+      "learning_rate": 0.00019497449695409408,
+      "loss": 2.2195,
+      "step": 161
+    },
+    {
+      "epoch": 0.05331139448786507,
+      "grad_norm": 18.84501075744629,
+      "learning_rate": 0.00019490828614707916,
+      "loss": 2.2326,
+      "step": 162
+    },
+    {
+      "epoch": 0.053640477169888935,
+      "grad_norm": 18.82663345336914,
+      "learning_rate": 0.00019484165341902845,
+      "loss": 2.3262,
+      "step": 163
+    },
+    {
+      "epoch": 0.05396955985191279,
+      "grad_norm": 20.771976470947266,
+      "learning_rate": 0.00019477459906616206,
+      "loss": 2.3659,
+      "step": 164
+    },
+    {
+      "epoch": 0.05429864253393665,
+      "grad_norm": 19.50181007385254,
+      "learning_rate": 0.00019470712338657458,
+      "loss": 2.2192,
+      "step": 165
+    },
+    {
+      "epoch": 0.05462772521596051,
+      "grad_norm": 18.81004524230957,
+      "learning_rate": 0.0001946392266802336,
+      "loss": 2.1428,
+      "step": 166
+    },
+    {
+      "epoch": 0.05495680789798437,
+      "grad_norm": 21.053260803222656,
+      "learning_rate": 0.0001945709092489783,
+      "loss": 2.2203,
+      "step": 167
+    },
+    {
+      "epoch": 0.055285890580008226,
+      "grad_norm": 25.28620147705078,
+      "learning_rate": 0.00019450217139651844,
+      "loss": 2.1879,
+      "step": 168
+    },
+    {
+      "epoch": 0.05561497326203209,
+      "grad_norm": 22.09459686279297,
+      "learning_rate": 0.0001944330134284326,
+      "loss": 2.1769,
+      "step": 169
+    },
+    {
+      "epoch": 0.055944055944055944,
+      "grad_norm": 24.473697662353516,
+      "learning_rate": 0.00019436343565216711,
+      "loss": 2.415,
+      "step": 170
+    },
+    {
+      "epoch": 0.0562731386260798,
+      "grad_norm": 25.860061645507812,
+      "learning_rate": 0.00019429343837703455,
+      "loss": 2.299,
+      "step": 171
+    },
+    {
+      "epoch": 0.05660222130810366,
+      "grad_norm": 25.009765625,
+      "learning_rate": 0.0001942230219142124,
+      "loss": 2.1542,
+      "step": 172
+    },
+    {
+      "epoch": 0.05693130399012752,
+      "grad_norm": 28.018394470214844,
+      "learning_rate": 0.0001941521865767417,
+      "loss": 2.4432,
+      "step": 173
+    },
+    {
+      "epoch": 0.05726038667215138,
+      "grad_norm": 31.617511749267578,
+      "learning_rate": 0.0001940809326795256,
+      "loss": 2.3726,
+      "step": 174
+    },
+    {
+      "epoch": 0.057589469354175235,
+      "grad_norm": 26.330232620239258,
+      "learning_rate": 0.000194009260539328,
+      "loss": 2.2941,
+      "step": 175
+    },
+    {
+      "epoch": 0.0579185520361991,
+      "grad_norm": 28.39286994934082,
+      "learning_rate": 0.0001939371704747721,
+      "loss": 2.368,
+      "step": 176
+    },
+    {
+      "epoch": 0.05824763471822295,
+      "grad_norm": 29.393531799316406,
+      "learning_rate": 0.00019386466280633906,
+      "loss": 2.3252,
+      "step": 177
+    },
+    {
+      "epoch": 0.058576717400246815,
+      "grad_norm": 27.360153198242188,
+      "learning_rate": 0.00019379173785636646,
+      "loss": 2.1943,
+      "step": 178
+    },
+    {
+      "epoch": 0.05890580008227067,
+      "grad_norm": 29.285520553588867,
+      "learning_rate": 0.000193718395949047,
+      "loss": 2.4499,
+      "step": 179
+    },
+    {
+      "epoch": 0.059234882764294526,
+      "grad_norm": 29.243824005126953,
+      "learning_rate": 0.00019364463741042694,
+      "loss": 2.3499,
+      "step": 180
+    },
+    {
+      "epoch": 0.05956396544631839,
+      "grad_norm": 32.861839294433594,
+      "learning_rate": 0.00019357046256840473,
+      "loss": 2.3157,
+      "step": 181
+    },
+    {
+      "epoch": 0.059893048128342244,
+      "grad_norm": 40.80834197998047,
+      "learning_rate": 0.00019349587175272948,
+      "loss": 2.2473,
+      "step": 182
+    },
+    {
+      "epoch": 0.060222130810366106,
+      "grad_norm": 33.760169982910156,
+      "learning_rate": 0.0001934208652949996,
+      "loss": 2.4274,
+      "step": 183
+    },
+    {
+      "epoch": 0.06055121349238996,
+      "grad_norm": 34.87383270263672,
+      "learning_rate": 0.00019334544352866127,
+      "loss": 2.5144,
+      "step": 184
+    },
+    {
+      "epoch": 0.060880296174413824,
+      "grad_norm": 36.6989631652832,
+      "learning_rate": 0.00019326960678900688,
+      "loss": 2.2844,
+      "step": 185
+    },
+    {
+      "epoch": 0.06120937885643768,
+      "grad_norm": 36.20161056518555,
+      "learning_rate": 0.00019319335541317361,
+      "loss": 2.5463,
+      "step": 186
+    },
+    {
+      "epoch": 0.06153846153846154,
+      "grad_norm": 37.2934455871582,
+      "learning_rate": 0.00019311668974014208,
+      "loss": 2.3696,
+      "step": 187
+    },
+    {
+      "epoch": 0.0618675442204854,
+      "grad_norm": 39.52557373046875,
+      "learning_rate": 0.00019303961011073447,
+      "loss": 2.4879,
+      "step": 188
+    },
+    {
+      "epoch": 0.06219662690250925,
+      "grad_norm": 38.334991455078125,
+      "learning_rate": 0.00019296211686761346,
+      "loss": 2.5505,
+      "step": 189
+    },
+    {
+      "epoch": 0.06252570958453312,
+      "grad_norm": 43.470672607421875,
+      "learning_rate": 0.00019288421035528028,
+      "loss": 2.4267,
+      "step": 190
+    },
+    {
+      "epoch": 0.06285479226655698,
+      "grad_norm": 38.25648498535156,
+      "learning_rate": 0.00019280589092007352,
+      "loss": 2.4197,
+      "step": 191
+    },
+    {
+      "epoch": 0.06318387494858083,
+      "grad_norm": 45.45105743408203,
+      "learning_rate": 0.00019272715891016735,
+      "loss": 2.5399,
+      "step": 192
+    },
+    {
+      "epoch": 0.06351295763060469,
+      "grad_norm": 46.69264221191406,
+      "learning_rate": 0.00019264801467557007,
+      "loss": 2.5085,
+      "step": 193
+    },
+    {
+      "epoch": 0.06384204031262855,
+      "grad_norm": 51.463844299316406,
+      "learning_rate": 0.00019256845856812266,
+      "loss": 2.602,
+      "step": 194
+    },
+    {
+      "epoch": 0.06417112299465241,
+      "grad_norm": 53.442283630371094,
+      "learning_rate": 0.000192488490941497,
+      "loss": 2.6754,
+      "step": 195
+    },
+    {
+      "epoch": 0.06450020567667626,
+      "grad_norm": 64.25224304199219,
+      "learning_rate": 0.00019240811215119448,
+      "loss": 2.766,
+      "step": 196
+    },
+    {
+      "epoch": 0.06482928835870012,
+      "grad_norm": 46.239723205566406,
+      "learning_rate": 0.00019232732255454422,
+      "loss": 2.4271,
+      "step": 197
+    },
+    {
+      "epoch": 0.06515837104072399,
+      "grad_norm": 70.70040130615234,
+      "learning_rate": 0.00019224612251070175,
+      "loss": 2.6559,
+      "step": 198
+    },
+    {
+      "epoch": 0.06548745372274783,
+      "grad_norm": 69.72952270507812,
+      "learning_rate": 0.0001921645123806472,
+      "loss": 2.8281,
+      "step": 199
+    },
+    {
+      "epoch": 0.0658165364047717,
+      "grad_norm": 81.83056640625,
+      "learning_rate": 0.0001920824925271838,
+      "loss": 3.1637,
+      "step": 200
+    },
+    {
+      "epoch": 0.06614561908679556,
+      "grad_norm": 26.758020401000977,
+      "learning_rate": 0.0001920000633149362,
+      "loss": 2.2422,
+      "step": 201
+    },
+    {
+      "epoch": 0.06647470176881942,
+      "grad_norm": 24.241539001464844,
+      "learning_rate": 0.00019191722511034884,
+      "loss": 2.2236,
+      "step": 202
+    },
+    {
+      "epoch": 0.06680378445084327,
+      "grad_norm": 19.917999267578125,
+      "learning_rate": 0.00019183397828168448,
+      "loss": 2.2469,
+      "step": 203
+    },
+    {
+      "epoch": 0.06713286713286713,
+      "grad_norm": 14.072540283203125,
+      "learning_rate": 0.00019175032319902234,
+      "loss": 2.0199,
+      "step": 204
+    },
+    {
+      "epoch": 0.067461949814891,
+      "grad_norm": 15.998994827270508,
+      "learning_rate": 0.00019166626023425662,
+      "loss": 2.1876,
+      "step": 205
+    },
+    {
+      "epoch": 0.06779103249691484,
+      "grad_norm": 31.02007293701172,
+      "learning_rate": 0.00019158178976109476,
+      "loss": 2.0833,
+      "step": 206
+    },
+    {
+      "epoch": 0.0681201151789387,
+      "grad_norm": 17.03249168395996,
+      "learning_rate": 0.0001914969121550558,
+      "loss": 2.1668,
+      "step": 207
+    },
+    {
+      "epoch": 0.06844919786096257,
+      "grad_norm": 15.970086097717285,
+      "learning_rate": 0.00019141162779346874,
+      "loss": 2.0027,
+      "step": 208
+    },
+    {
+      "epoch": 0.06877828054298643,
+      "grad_norm": 17.21071434020996,
+      "learning_rate": 0.00019132593705547082,
+      "loss": 2.1795,
+      "step": 209
+    },
+    {
+      "epoch": 0.06910736322501028,
+      "grad_norm": 16.515851974487305,
+      "learning_rate": 0.00019123984032200586,
+      "loss": 2.1902,
+      "step": 210
+    },
+    {
+      "epoch": 0.06943644590703414,
+      "grad_norm": 16.371320724487305,
+      "learning_rate": 0.00019115333797582254,
+      "loss": 2.2563,
+      "step": 211
+    },
+    {
+      "epoch": 0.069765528589058,
+      "grad_norm": 17.641443252563477,
+      "learning_rate": 0.00019106643040147278,
+      "loss": 2.1812,
+      "step": 212
+    },
+    {
+      "epoch": 0.07009461127108185,
+      "grad_norm": 34.028690338134766,
+      "learning_rate": 0.00019097911798530987,
+      "loss": 2.0955,
+      "step": 213
+    },
+    {
+      "epoch": 0.07042369395310571,
+      "grad_norm": 18.238664627075195,
+      "learning_rate": 0.00019089140111548696,
+      "loss": 2.2354,
+      "step": 214
+    },
+    {
+      "epoch": 0.07075277663512958,
+      "grad_norm": 19.597766876220703,
+      "learning_rate": 0.00019080328018195513,
+      "loss": 2.2604,
+      "step": 215
+    },
+    {
+      "epoch": 0.07108185931715344,
+      "grad_norm": 22.564088821411133,
+      "learning_rate": 0.0001907147555764618,
+      "loss": 2.2941,
+      "step": 216
+    },
+    {
+      "epoch": 0.07141094199917729,
+      "grad_norm": 19.086936950683594,
+      "learning_rate": 0.00019062582769254895,
+      "loss": 2.162,
+      "step": 217
+    },
+    {
+      "epoch": 0.07174002468120115,
+      "grad_norm": 21.065017700195312,
+      "learning_rate": 0.00019053649692555135,
+      "loss": 1.9859,
+      "step": 218
+    },
+    {
+      "epoch": 0.07206910736322501,
+      "grad_norm": 22.767284393310547,
+      "learning_rate": 0.00019044676367259476,
+      "loss": 2.358,
+      "step": 219
+    },
+    {
+      "epoch": 0.07239819004524888,
+      "grad_norm": 22.147249221801758,
+      "learning_rate": 0.00019035662833259432,
+      "loss": 2.1264,
+      "step": 220
+    },
+    {
+      "epoch": 0.07272727272727272,
+      "grad_norm": 21.24959945678711,
+      "learning_rate": 0.00019026609130625257,
+      "loss": 2.1611,
+      "step": 221
+    },
+    {
+      "epoch": 0.07305635540929659,
+      "grad_norm": 24.62726593017578,
+      "learning_rate": 0.00019017515299605788,
+      "loss": 2.2199,
+      "step": 222
+    },
+    {
+      "epoch": 0.07338543809132045,
+      "grad_norm": 22.732820510864258,
+      "learning_rate": 0.00019008381380628247,
+      "loss": 2.2954,
+      "step": 223
+    },
+    {
+      "epoch": 0.0737145207733443,
+      "grad_norm": 22.863624572753906,
+      "learning_rate": 0.00018999207414298067,
+      "loss": 2.2531,
+      "step": 224
+    },
+    {
+      "epoch": 0.07404360345536816,
+      "grad_norm": 22.743608474731445,
+      "learning_rate": 0.00018989993441398726,
+      "loss": 2.1744,
+      "step": 225
+    },
+    {
+      "epoch": 0.07437268613739202,
+      "grad_norm": 25.53584861755371,
+      "learning_rate": 0.00018980739502891546,
+      "loss": 2.2578,
+      "step": 226
+    },
+    {
+      "epoch": 0.07470176881941588,
+      "grad_norm": 24.606985092163086,
+      "learning_rate": 0.0001897144563991552,
+      "loss": 2.3099,
+      "step": 227
+    },
+    {
+      "epoch": 0.07503085150143973,
+      "grad_norm": 28.77580451965332,
+      "learning_rate": 0.00018962111893787128,
+      "loss": 2.4734,
+      "step": 228
+    },
+    {
+      "epoch": 0.0753599341834636,
+      "grad_norm": 24.38824462890625,
+      "learning_rate": 0.00018952738306000151,
+      "loss": 2.2832,
+      "step": 229
+    },
+    {
+      "epoch": 0.07568901686548746,
+      "grad_norm": 26.79664421081543,
+      "learning_rate": 0.00018943324918225494,
+      "loss": 2.2934,
+      "step": 230
+    },
+    {
+      "epoch": 0.0760180995475113,
+      "grad_norm": 26.263587951660156,
+      "learning_rate": 0.0001893387177231099,
+      "loss": 2.3581,
+      "step": 231
+    },
+    {
+      "epoch": 0.07634718222953517,
+      "grad_norm": 31.044546127319336,
+      "learning_rate": 0.0001892437891028122,
+      "loss": 2.2172,
+      "step": 232
+    },
+    {
+      "epoch": 0.07667626491155903,
+      "grad_norm": 30.85577392578125,
+      "learning_rate": 0.0001891484637433733,
+      "loss": 2.3933,
+      "step": 233
+    },
+    {
+      "epoch": 0.07700534759358289,
+      "grad_norm": 30.781057357788086,
+      "learning_rate": 0.00018905274206856837,
+      "loss": 2.2013,
+      "step": 234
+    },
+    {
+      "epoch": 0.07733443027560674,
+      "grad_norm": 27.969951629638672,
+      "learning_rate": 0.00018895662450393438,
+      "loss": 2.3257,
+      "step": 235
+    },
+    {
+      "epoch": 0.0776635129576306,
+      "grad_norm": 34.26099395751953,
+      "learning_rate": 0.00018886011147676833,
+      "loss": 2.2869,
+      "step": 236
+    },
+    {
+      "epoch": 0.07799259563965447,
+      "grad_norm": 30.937467575073242,
+      "learning_rate": 0.00018876320341612522,
+      "loss": 2.5343,
+      "step": 237
+    },
+    {
+      "epoch": 0.07832167832167833,
+      "grad_norm": 35.388099670410156,
+      "learning_rate": 0.00018866590075281624,
+      "loss": 2.4132,
+      "step": 238
+    },
+    {
+      "epoch": 0.07865076100370218,
+      "grad_norm": 32.884273529052734,
+      "learning_rate": 0.00018856820391940674,
+      "loss": 2.366,
+      "step": 239
+    },
+    {
+      "epoch": 0.07897984368572604,
+      "grad_norm": 34.471805572509766,
+      "learning_rate": 0.00018847011335021449,
+      "loss": 2.4882,
+      "step": 240
+    },
+    {
+      "epoch": 0.0793089263677499,
+      "grad_norm": 38.46910095214844,
+      "learning_rate": 0.00018837162948130752,
+      "loss": 2.468,
+      "step": 241
+    },
+    {
+      "epoch": 0.07963800904977375,
+      "grad_norm": 45.747642517089844,
+      "learning_rate": 0.00018827275275050233,
+      "loss": 2.5533,
+      "step": 242
+    },
+    {
+      "epoch": 0.07996709173179761,
+      "grad_norm": 42.68091583251953,
+      "learning_rate": 0.00018817348359736203,
+      "loss": 2.6073,
+      "step": 243
+    },
+    {
+      "epoch": 0.08029617441382148,
+      "grad_norm": 42.13290786743164,
+      "learning_rate": 0.00018807382246319412,
+      "loss": 2.5101,
+      "step": 244
+    },
+    {
+      "epoch": 0.08062525709584534,
+      "grad_norm": 44.64775466918945,
+      "learning_rate": 0.00018797376979104872,
+      "loss": 2.4074,
+      "step": 245
+    },
+    {
+      "epoch": 0.08095433977786919,
+      "grad_norm": 44.45621871948242,
+      "learning_rate": 0.00018787332602571662,
+      "loss": 2.6026,
+      "step": 246
+    },
+    {
+      "epoch": 0.08128342245989305,
+      "grad_norm": 46.53767013549805,
+      "learning_rate": 0.00018777249161372713,
+      "loss": 2.7316,
+      "step": 247
+    },
+    {
+      "epoch": 0.08161250514191691,
+      "grad_norm": 66.93853759765625,
+      "learning_rate": 0.00018767126700334634,
+      "loss": 3.0533,
+      "step": 248
+    },
+    {
+      "epoch": 0.08194158782394076,
+      "grad_norm": 70.93820190429688,
+      "learning_rate": 0.0001875696526445749,
+      "loss": 3.0685,
+      "step": 249
+    },
+    {
+      "epoch": 0.08227067050596462,
+      "grad_norm": 79.4852294921875,
+      "learning_rate": 0.0001874676489891461,
+      "loss": 3.0647,
+      "step": 250
+    },
+    {
+      "epoch": 0.08259975318798848,
+      "grad_norm": 32.315250396728516,
+      "learning_rate": 0.00018736525649052394,
+      "loss": 2.2311,
+      "step": 251
+    },
+    {
+      "epoch": 0.08292883587001235,
+      "grad_norm": 30.23067855834961,
+      "learning_rate": 0.00018726247560390099,
+      "loss": 2.0774,
+      "step": 252
+    },
+    {
+      "epoch": 0.0832579185520362,
+      "grad_norm": 26.982328414916992,
+      "learning_rate": 0.00018715930678619644,
+      "loss": 2.122,
+      "step": 253
+    },
+    {
+      "epoch": 0.08358700123406006,
+      "grad_norm": 18.85164451599121,
+      "learning_rate": 0.00018705575049605413,
+      "loss": 2.2208,
+      "step": 254
+    },
+    {
+      "epoch": 0.08391608391608392,
+      "grad_norm": 13.671011924743652,
+      "learning_rate": 0.00018695180719384029,
+      "loss": 2.0684,
+      "step": 255
+    },
+    {
+      "epoch": 0.08424516659810777,
+      "grad_norm": 14.02846908569336,
+      "learning_rate": 0.00018684747734164177,
+      "loss": 1.9996,
+      "step": 256
+    },
+    {
+      "epoch": 0.08457424928013163,
+      "grad_norm": 13.098374366760254,
+      "learning_rate": 0.00018674276140326376,
+      "loss": 2.0488,
+      "step": 257
+    },
+    {
+      "epoch": 0.08490333196215549,
+      "grad_norm": 14.445398330688477,
+      "learning_rate": 0.00018663765984422786,
+      "loss": 2.1794,
+      "step": 258
+    },
+    {
+      "epoch": 0.08523241464417936,
+      "grad_norm": 14.592230796813965,
+      "learning_rate": 0.00018653217313177004,
+      "loss": 2.0188,
+      "step": 259
+    },
+    {
+      "epoch": 0.0855614973262032,
+      "grad_norm": 16.81397247314453,
+      "learning_rate": 0.00018642630173483832,
+      "loss": 2.1347,
+      "step": 260
+    },
+    {
+      "epoch": 0.08589058000822707,
+      "grad_norm": 15.40119743347168,
+      "learning_rate": 0.00018632004612409103,
+      "loss": 2.1071,
+      "step": 261
+    },
+    {
+      "epoch": 0.08621966269025093,
+      "grad_norm": 16.573291778564453,
+      "learning_rate": 0.00018621340677189453,
+      "loss": 2.0809,
+      "step": 262
+    },
+    {
+      "epoch": 0.08654874537227479,
+      "grad_norm": 16.650859832763672,
+      "learning_rate": 0.00018610638415232097,
+      "loss": 2.0691,
+      "step": 263
+    },
+    {
+      "epoch": 0.08687782805429864,
+      "grad_norm": 18.518003463745117,
+      "learning_rate": 0.00018599897874114652,
+      "loss": 2.1923,
+      "step": 264
+    },
+    {
+      "epoch": 0.0872069107363225,
+      "grad_norm": 18.871150970458984,
+      "learning_rate": 0.00018589119101584898,
+      "loss": 2.1555,
+      "step": 265
+    },
+    {
+      "epoch": 0.08753599341834636,
+      "grad_norm": 19.301851272583008,
+      "learning_rate": 0.00018578302145560584,
+      "loss": 2.1732,
+      "step": 266
+    },
+    {
+      "epoch": 0.08786507610037021,
+      "grad_norm": 18.156076431274414,
+      "learning_rate": 0.00018567447054129195,
+      "loss": 2.1245,
+      "step": 267
+    },
+    {
+      "epoch": 0.08819415878239407,
+      "grad_norm": 19.837614059448242,
+      "learning_rate": 0.00018556553875547754,
+      "loss": 2.2374,
+      "step": 268
+    },
+    {
+      "epoch": 0.08852324146441794,
+      "grad_norm": 18.92027473449707,
+      "learning_rate": 0.00018545622658242607,
+      "loss": 2.2301,
+      "step": 269
+    },
+    {
+      "epoch": 0.0888523241464418,
+      "grad_norm": 21.270837783813477,
+      "learning_rate": 0.00018534653450809197,
+      "loss": 2.2331,
+      "step": 270
+    },
+    {
+      "epoch": 0.08918140682846565,
+      "grad_norm": 21.555301666259766,
+      "learning_rate": 0.00018523646302011867,
+      "loss": 2.128,
+      "step": 271
+    },
+    {
+      "epoch": 0.08951048951048951,
+      "grad_norm": 23.148181915283203,
+      "learning_rate": 0.00018512601260783606,
+      "loss": 2.2258,
+      "step": 272
+    },
+    {
+      "epoch": 0.08983957219251337,
+      "grad_norm": 20.981473922729492,
+      "learning_rate": 0.00018501518376225887,
+      "loss": 2.2262,
+      "step": 273
+    },
+    {
+      "epoch": 0.09016865487453722,
+      "grad_norm": 23.206968307495117,
+      "learning_rate": 0.00018490397697608395,
+      "loss": 2.2757,
+      "step": 274
+    },
+    {
+      "epoch": 0.09049773755656108,
+      "grad_norm": 22.09346580505371,
+      "learning_rate": 0.0001847923927436884,
+      "loss": 2.205,
+      "step": 275
+    },
+    {
+      "epoch": 0.09082682023858495,
+      "grad_norm": 23.256229400634766,
+      "learning_rate": 0.00018468043156112728,
+      "loss": 2.1677,
+      "step": 276
+    },
+    {
+      "epoch": 0.09115590292060881,
+      "grad_norm": 23.727642059326172,
+      "learning_rate": 0.0001845680939261314,
+      "loss": 2.2491,
+      "step": 277
+    },
+    {
+      "epoch": 0.09148498560263266,
+      "grad_norm": 25.546274185180664,
+      "learning_rate": 0.00018445538033810515,
+      "loss": 2.3562,
+      "step": 278
+    },
+    {
+      "epoch": 0.09181406828465652,
+      "grad_norm": 23.656875610351562,
+      "learning_rate": 0.00018434229129812418,
+      "loss": 2.109,
+      "step": 279
+    },
+    {
+      "epoch": 0.09214315096668038,
+      "grad_norm": 28.220298767089844,
+      "learning_rate": 0.0001842288273089332,
+      "loss": 2.4128,
+      "step": 280
+    },
+    {
+      "epoch": 0.09247223364870423,
+      "grad_norm": 28.209348678588867,
+      "learning_rate": 0.00018411498887494396,
+      "loss": 2.3428,
+      "step": 281
+    },
+    {
+      "epoch": 0.09280131633072809,
+      "grad_norm": 28.554391860961914,
+      "learning_rate": 0.00018400077650223263,
+      "loss": 2.2634,
+      "step": 282
+    },
+    {
+      "epoch": 0.09313039901275195,
+      "grad_norm": 30.214752197265625,
+      "learning_rate": 0.0001838861906985379,
+      "loss": 2.3671,
+      "step": 283
+    },
+    {
+      "epoch": 0.09345948169477582,
+      "grad_norm": 31.313325881958008,
+      "learning_rate": 0.00018377123197325842,
+      "loss": 2.4922,
+      "step": 284
+    },
+    {
+      "epoch": 0.09378856437679967,
+      "grad_norm": 31.089052200317383,
+      "learning_rate": 0.00018365590083745085,
+      "loss": 2.4211,
+      "step": 285
+    },
+    {
+      "epoch": 0.09411764705882353,
+      "grad_norm": 32.58794403076172,
+      "learning_rate": 0.00018354019780382735,
+      "loss": 2.3834,
+      "step": 286
+    },
+    {
+      "epoch": 0.09444672974084739,
+      "grad_norm": 34.73944854736328,
+      "learning_rate": 0.0001834241233867533,
+      "loss": 2.2465,
+      "step": 287
+    },
+    {
+      "epoch": 0.09477581242287125,
+      "grad_norm": 38.76915740966797,
+      "learning_rate": 0.00018330767810224524,
+      "loss": 2.2918,
+      "step": 288
+    },
+    {
+      "epoch": 0.0951048951048951,
+      "grad_norm": 37.93132781982422,
+      "learning_rate": 0.0001831908624679683,
+      "loss": 2.5079,
+      "step": 289
+    },
+    {
+      "epoch": 0.09543397778691896,
+      "grad_norm": 37.57957458496094,
+      "learning_rate": 0.0001830736770032341,
+      "loss": 2.2431,
+      "step": 290
+    },
+    {
+      "epoch": 0.09576306046894283,
+      "grad_norm": 36.72309112548828,
+      "learning_rate": 0.0001829561222289984,
+      "loss": 2.5937,
+      "step": 291
+    },
+    {
+      "epoch": 0.09609214315096667,
+      "grad_norm": 43.910457611083984,
+      "learning_rate": 0.00018283819866785853,
+      "loss": 2.5796,
+      "step": 292
+    },
+    {
+      "epoch": 0.09642122583299054,
+      "grad_norm": 37.60491943359375,
+      "learning_rate": 0.0001827199068440516,
+      "loss": 2.501,
+      "step": 293
+    },
+    {
+      "epoch": 0.0967503085150144,
+      "grad_norm": 41.552066802978516,
+      "learning_rate": 0.00018260124728345162,
+      "loss": 2.5463,
+      "step": 294
+    },
+    {
+      "epoch": 0.09707939119703826,
+      "grad_norm": 42.12718963623047,
+      "learning_rate": 0.00018248222051356754,
+      "loss": 2.5723,
+      "step": 295
+    },
+    {
+      "epoch": 0.09740847387906211,
+      "grad_norm": 44.49871063232422,
+      "learning_rate": 0.00018236282706354063,
+      "loss": 2.6006,
+      "step": 296
+    },
+    {
+      "epoch": 0.09773755656108597,
+      "grad_norm": 46.53413391113281,
+      "learning_rate": 0.00018224306746414238,
+      "loss": 2.5239,
+      "step": 297
+    },
+    {
+      "epoch": 0.09806663924310983,
+      "grad_norm": 71.21157836914062,
+      "learning_rate": 0.00018212294224777197,
+      "loss": 2.8279,
+      "step": 298
+    },
+    {
+      "epoch": 0.09839572192513368,
+      "grad_norm": 76.81084442138672,
+      "learning_rate": 0.00018200245194845399,
+      "loss": 3.0209,
+      "step": 299
+    },
+    {
+      "epoch": 0.09872480460715755,
+      "grad_norm": 75.40888214111328,
+      "learning_rate": 0.00018188159710183594,
+      "loss": 2.9355,
+      "step": 300
+    },
+    {
+      "epoch": 0.09905388728918141,
+      "grad_norm": 25.123001098632812,
+      "learning_rate": 0.000181760378245186,
+      "loss": 2.1246,
+      "step": 301
+    },
+    {
+      "epoch": 0.09938296997120527,
+      "grad_norm": 24.127634048461914,
+      "learning_rate": 0.00018163879591739067,
+      "loss": 2.0098,
+      "step": 302
+    },
+    {
+      "epoch": 0.09971205265322912,
+      "grad_norm": 19.23702049255371,
+      "learning_rate": 0.0001815168506589521,
+      "loss": 2.0683,
+      "step": 303
+    },
+    {
+      "epoch": 0.10004113533525298,
+      "grad_norm": 15.723398208618164,
+      "learning_rate": 0.000181394543011986,
+      "loss": 2.1503,
+      "step": 304
+    },
+    {
+      "epoch": 0.10037021801727684,
+      "grad_norm": 13.344893455505371,
+      "learning_rate": 0.00018127187352021907,
+      "loss": 2.099,
+      "step": 305
+    },
+    {
+      "epoch": 0.1006993006993007,
+      "grad_norm": 12.449121475219727,
+      "learning_rate": 0.0001811488427289866,
+      "loss": 2.12,
+      "step": 306
+    },
+    {
+      "epoch": 0.10102838338132455,
+      "grad_norm": 10.836393356323242,
+      "learning_rate": 0.00018102545118523007,
+      "loss": 1.9312,
+      "step": 307
+    },
+    {
+      "epoch": 0.10135746606334842,
+      "grad_norm": 13.552892684936523,
+      "learning_rate": 0.00018090169943749476,
+      "loss": 2.1161,
+      "step": 308
+    },
+    {
+      "epoch": 0.10168654874537228,
+      "grad_norm": 12.804911613464355,
+      "learning_rate": 0.00018077758803592718,
+      "loss": 1.9294,
+      "step": 309
+    },
+    {
+      "epoch": 0.10201563142739613,
+      "grad_norm": 15.500523567199707,
+      "learning_rate": 0.00018065311753227273,
+      "loss": 2.1699,
+      "step": 310
+    },
+    {
+      "epoch": 0.10234471410941999,
+      "grad_norm": 13.986373901367188,
+      "learning_rate": 0.0001805282884798732,
+      "loss": 2.1869,
+      "step": 311
+    },
+    {
+      "epoch": 0.10267379679144385,
+      "grad_norm": 15.537351608276367,
+      "learning_rate": 0.00018040310143366446,
+      "loss": 2.0441,
+      "step": 312
+    },
+    {
+      "epoch": 0.10300287947346771,
+      "grad_norm": 14.554361343383789,
+      "learning_rate": 0.00018027755695017368,
+      "loss": 2.1316,
+      "step": 313
+    },
+    {
+      "epoch": 0.10333196215549156,
+      "grad_norm": 15.637019157409668,
+      "learning_rate": 0.00018015165558751717,
+      "loss": 2.0265,
+      "step": 314
+    },
+    {
+      "epoch": 0.10366104483751543,
+      "grad_norm": 16.677814483642578,
+      "learning_rate": 0.00018002539790539773,
+      "loss": 2.1547,
+      "step": 315
+    },
+    {
+      "epoch": 0.10399012751953929,
+      "grad_norm": 17.848800659179688,
+      "learning_rate": 0.00017989878446510215,
+      "loss": 2.0712,
+      "step": 316
+    },
+    {
+      "epoch": 0.10431921020156314,
+      "grad_norm": 21.053678512573242,
+      "learning_rate": 0.00017977181582949888,
+      "loss": 2.1301,
+      "step": 317
+    },
+    {
+      "epoch": 0.104648292883587,
+      "grad_norm": 18.58283042907715,
+      "learning_rate": 0.0001796444925630353,
+      "loss": 2.1719,
+      "step": 318
+    },
+    {
+      "epoch": 0.10497737556561086,
+      "grad_norm": 18.665454864501953,
+      "learning_rate": 0.00017951681523173542,
+      "loss": 2.1489,
+      "step": 319
+    },
+    {
+      "epoch": 0.10530645824763472,
+      "grad_norm": 17.518543243408203,
+      "learning_rate": 0.0001793887844031972,
+      "loss": 2.1305,
+      "step": 320
+    },
+    {
+      "epoch": 0.10563554092965857,
+      "grad_norm": 19.706811904907227,
+      "learning_rate": 0.00017926040064659014,
+      "loss": 2.1922,
+      "step": 321
+    },
+    {
+      "epoch": 0.10596462361168243,
+      "grad_norm": 19.07513427734375,
+      "learning_rate": 0.0001791316645326526,
+      "loss": 2.1125,
+      "step": 322
+    },
+    {
+      "epoch": 0.1062937062937063,
+      "grad_norm": 21.25273323059082,
+      "learning_rate": 0.00017900257663368963,
+      "loss": 2.0967,
+      "step": 323
+    },
+    {
+      "epoch": 0.10662278897573015,
+      "grad_norm": 23.022178649902344,
+      "learning_rate": 0.0001788731375235698,
+      "loss": 2.1737,
+      "step": 324
+    },
+    {
+      "epoch": 0.10695187165775401,
+      "grad_norm": 24.29770851135254,
+      "learning_rate": 0.00017874334777772327,
+      "loss": 2.2003,
+      "step": 325
+    },
+    {
+      "epoch": 0.10728095433977787,
+      "grad_norm": 24.43622398376465,
+      "learning_rate": 0.00017861320797313892,
+      "loss": 2.2152,
+      "step": 326
+    },
+    {
+      "epoch": 0.10761003702180173,
+      "grad_norm": 28.540864944458008,
+      "learning_rate": 0.0001784827186883618,
+      "loss": 2.1726,
+      "step": 327
+    },
+    {
+      "epoch": 0.10793911970382558,
+      "grad_norm": 22.597213745117188,
+      "learning_rate": 0.00017835188050349064,
+      "loss": 2.2314,
+      "step": 328
+    },
+    {
+      "epoch": 0.10826820238584944,
+      "grad_norm": 25.8801326751709,
+      "learning_rate": 0.00017822069400017516,
+      "loss": 2.2515,
+      "step": 329
+    },
+    {
+      "epoch": 0.1085972850678733,
+      "grad_norm": 21.48238754272461,
+      "learning_rate": 0.00017808915976161362,
+      "loss": 2.2769,
+      "step": 330
+    },
+    {
+      "epoch": 0.10892636774989717,
+      "grad_norm": 26.163372039794922,
+      "learning_rate": 0.00017795727837255015,
+      "loss": 2.1905,
+      "step": 331
+    },
+    {
+      "epoch": 0.10925545043192102,
+      "grad_norm": 26.860511779785156,
+      "learning_rate": 0.00017782505041927216,
+      "loss": 2.2843,
+      "step": 332
+    },
+    {
+      "epoch": 0.10958453311394488,
+      "grad_norm": 28.667987823486328,
+      "learning_rate": 0.00017769247648960774,
+      "loss": 2.3807,
+      "step": 333
+    },
+    {
+      "epoch": 0.10991361579596874,
+      "grad_norm": 37.908939361572266,
+      "learning_rate": 0.00017755955717292296,
+      "loss": 2.331,
+      "step": 334
+    },
+    {
+      "epoch": 0.11024269847799259,
+      "grad_norm": 30.53937339782715,
+      "learning_rate": 0.00017742629306011944,
+      "loss": 2.4017,
+      "step": 335
+    },
+    {
+      "epoch": 0.11057178116001645,
+      "grad_norm": 31.421762466430664,
+      "learning_rate": 0.00017729268474363154,
+      "loss": 2.421,
+      "step": 336
+    },
+    {
+      "epoch": 0.11090086384204031,
+      "grad_norm": 28.493751525878906,
+      "learning_rate": 0.0001771587328174239,
+      "loss": 2.2773,
+      "step": 337
+    },
+    {
+      "epoch": 0.11122994652406418,
+      "grad_norm": 28.117023468017578,
+      "learning_rate": 0.0001770244378769885,
+      "loss": 2.4702,
+      "step": 338
+    },
+    {
+      "epoch": 0.11155902920608803,
+      "grad_norm": 36.35397720336914,
+      "learning_rate": 0.0001768898005193425,
+      "loss": 2.502,
+      "step": 339
+    },
+    {
+      "epoch": 0.11188811188811189,
+      "grad_norm": 31.640914916992188,
+      "learning_rate": 0.000176754821343025,
+      "loss": 2.4533,
+      "step": 340
+    },
+    {
+      "epoch": 0.11221719457013575,
+      "grad_norm": 33.728538513183594,
+      "learning_rate": 0.0001766195009480949,
+      "loss": 2.4158,
+      "step": 341
+    },
+    {
+      "epoch": 0.1125462772521596,
+      "grad_norm": 30.469196319580078,
+      "learning_rate": 0.0001764838399361279,
+      "loss": 2.4001,
+      "step": 342
+    },
+    {
+      "epoch": 0.11287535993418346,
+      "grad_norm": 39.749664306640625,
+      "learning_rate": 0.00017634783891021393,
+      "loss": 2.3815,
+      "step": 343
+    },
+    {
+      "epoch": 0.11320444261620732,
+      "grad_norm": 41.524436950683594,
+      "learning_rate": 0.00017621149847495458,
+      "loss": 2.4092,
+      "step": 344
+    },
+    {
+      "epoch": 0.11353352529823119,
+      "grad_norm": 55.04652404785156,
+      "learning_rate": 0.00017607481923646016,
+      "loss": 2.6198,
+      "step": 345
+    },
+    {
+      "epoch": 0.11386260798025503,
+      "grad_norm": 41.766761779785156,
+      "learning_rate": 0.0001759378018023473,
+      "loss": 2.4331,
+      "step": 346
+    },
+    {
+      "epoch": 0.1141916906622789,
+      "grad_norm": 73.7178955078125,
+      "learning_rate": 0.00017580044678173592,
+      "loss": 2.6612,
+      "step": 347
+    },
+    {
+      "epoch": 0.11452077334430276,
+      "grad_norm": 58.45720672607422,
+      "learning_rate": 0.00017566275478524693,
+      "loss": 2.7114,
+      "step": 348
+    },
+    {
+      "epoch": 0.11484985602632661,
+      "grad_norm": 62.9808349609375,
+      "learning_rate": 0.0001755247264249991,
+      "loss": 2.6575,
+      "step": 349
+    },
+    {
+      "epoch": 0.11517893870835047,
+      "grad_norm": 65.06092071533203,
+      "learning_rate": 0.0001753863623146066,
+      "loss": 3.1385,
+      "step": 350
+    },
+    {
+      "epoch": 0.11550802139037433,
+      "grad_norm": 21.07193946838379,
+      "learning_rate": 0.00017524766306917618,
+      "loss": 2.1127,
+      "step": 351
+    },
+    {
+      "epoch": 0.1158371040723982,
+      "grad_norm": 22.23618507385254,
+      "learning_rate": 0.0001751086293053045,
+      "loss": 2.1514,
+      "step": 352
+    },
+    {
+      "epoch": 0.11616618675442204,
+      "grad_norm": 20.072818756103516,
+      "learning_rate": 0.0001749692616410753,
+      "loss": 2.0668,
+      "step": 353
+    },
+    {
+      "epoch": 0.1164952694364459,
+      "grad_norm": 16.212791442871094,
+      "learning_rate": 0.00017482956069605668,
+      "loss": 2.0342,
+      "step": 354
+    },
+    {
+      "epoch": 0.11682435211846977,
+      "grad_norm": 12.142776489257812,
+      "learning_rate": 0.00017468952709129846,
+      "loss": 1.9391,
+      "step": 355
+    },
+    {
+      "epoch": 0.11715343480049363,
+      "grad_norm": 13.167658805847168,
+      "learning_rate": 0.00017454916144932922,
+      "loss": 2.1191,
+      "step": 356
+    },
+    {
+      "epoch": 0.11748251748251748,
+      "grad_norm": 12.89388656616211,
+      "learning_rate": 0.0001744084643941536,
+      "loss": 1.9692,
+      "step": 357
+    },
+    {
+      "epoch": 0.11781160016454134,
+      "grad_norm": 12.43535327911377,
+      "learning_rate": 0.00017426743655124974,
+      "loss": 2.1307,
+      "step": 358
+    },
+    {
+      "epoch": 0.1181406828465652,
+      "grad_norm": 13.791735649108887,
+      "learning_rate": 0.0001741260785475661,
+      "loss": 2.1729,
+      "step": 359
+    },
+    {
+      "epoch": 0.11846976552858905,
+      "grad_norm": 13.562893867492676,
+      "learning_rate": 0.00017398439101151905,
+      "loss": 2.0874,
+      "step": 360
+    },
+    {
+      "epoch": 0.11879884821061291,
+      "grad_norm": 14.418350219726562,
+      "learning_rate": 0.00017384237457298987,
+      "loss": 2.1214,
+      "step": 361
+    },
+    {
+      "epoch": 0.11912793089263678,
+      "grad_norm": 14.90912914276123,
+      "learning_rate": 0.00017370002986332193,
+      "loss": 2.15,
+      "step": 362
+    },
+    {
+      "epoch": 0.11945701357466064,
+      "grad_norm": 14.4798583984375,
+      "learning_rate": 0.00017355735751531807,
+      "loss": 2.104,
+      "step": 363
+    },
+    {
+      "epoch": 0.11978609625668449,
+      "grad_norm": 14.9854154586792,
+      "learning_rate": 0.00017341435816323756,
+      "loss": 2.1634,
+      "step": 364
+    },
+    {
+      "epoch": 0.12011517893870835,
+      "grad_norm": 15.484580039978027,
+      "learning_rate": 0.00017327103244279348,
+      "loss": 2.1227,
+      "step": 365
+    },
+    {
+      "epoch": 0.12044426162073221,
+      "grad_norm": 16.974597930908203,
+      "learning_rate": 0.00017312738099114973,
+      "loss": 2.1332,
+      "step": 366
+    },
+    {
+      "epoch": 0.12077334430275606,
+      "grad_norm": 17.421018600463867,
+      "learning_rate": 0.00017298340444691835,
+      "loss": 2.1759,
+      "step": 367
+    },
+    {
+      "epoch": 0.12110242698477992,
+      "grad_norm": 16.27506446838379,
+      "learning_rate": 0.00017283910345015647,
+      "loss": 2.1096,
+      "step": 368
+    },
+    {
+      "epoch": 0.12143150966680379,
+      "grad_norm": 17.663877487182617,
+      "learning_rate": 0.0001726944786423637,
+      "loss": 2.0878,
+      "step": 369
+    },
+    {
+      "epoch": 0.12176059234882765,
+      "grad_norm": 18.855493545532227,
+      "learning_rate": 0.00017254953066647913,
+      "loss": 2.1556,
+      "step": 370
+    },
+    {
+      "epoch": 0.1220896750308515,
+      "grad_norm": 20.37822151184082,
+      "learning_rate": 0.00017240426016687863,
+      "loss": 2.1551,
+      "step": 371
+    },
+    {
+      "epoch": 0.12241875771287536,
+      "grad_norm": 18.425636291503906,
+      "learning_rate": 0.00017225866778937165,
+      "loss": 2.1598,
+      "step": 372
+    },
+    {
+      "epoch": 0.12274784039489922,
+      "grad_norm": 18.747047424316406,
+      "learning_rate": 0.00017211275418119876,
+      "loss": 2.0371,
+      "step": 373
+    },
+    {
+      "epoch": 0.12307692307692308,
+      "grad_norm": 20.014904022216797,
+      "learning_rate": 0.0001719665199910285,
+      "loss": 2.2492,
+      "step": 374
+    },
+    {
+      "epoch": 0.12340600575894693,
+      "grad_norm": 21.430776596069336,
+      "learning_rate": 0.00017181996586895454,
+      "loss": 2.2077,
+      "step": 375
+    },
+    {
+      "epoch": 0.12340600575894693,
+      "eval_loss": 1.9685778617858887,
+      "eval_runtime": 163.1829,
+      "eval_samples_per_second": 31.364,
+      "eval_steps_per_second": 15.682,
+      "step": 375
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 1500,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 375,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 2.1601171198181376e+17,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}

last-checkpoint/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2fbd6bcaa7be704720eab9e48fc7b9b19e0f1a5717bbcf45ac40b30debbe1620
+size 6840

last-checkpoint/vocab.json ADDED

The diff for this file is too large to render.