Upload folder using huggingface_hub

Files changed (15)

.gitattributes +1 -0
README.md +202 -0
adapter_config.json +37 -0
adapter_model.safetensors +3 -0
added_tokens.json +24 -0
merges.txt +0 -0
optimizer.pt +3 -0
rng_state.pth +3 -0
scheduler.pt +3 -0
special_tokens_map.json +31 -0
tokenizer.json +3 -0
tokenizer_config.json +208 -0
trainer_state.json +1377 -0
training_args.bin +3 -0
vocab.json +0 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
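The added rule routes tokenizer.json through Git LFS, which is why that file appears further down as a three-line LFS pointer rather than as JSON. Below is a rough Python sketch of how slash-free patterns select files. It uses only the four patterns visible in this hunk, because lines 1-32 of the file are not shown and are not guessed here. `is_lfs_tracked` is a hypothetical helper that only approximates Git's matching rules: it ignores negation, `**`, and patterns containing a slash.

```python
import fnmatch
import posixpath

# Only the LFS patterns visible in this hunk; earlier rules are not modelled.
LFS_PATTERNS = ["*.zip", "*.zst", "*tfevents*", "tokenizer.json"]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate gitattributes matching for patterns without a slash.

    Such a pattern is compared against the file's basename at any
    directory depth, so we match on the basename only.
    """
    name = posixpath.basename(path)
    return any(fnmatch.fnmatchcase(name, pat) for pat in patterns)


for f in ["tokenizer.json", "vocab.json", "runs/events.out.tfevents.1.host", "tokenizer_config.json"]:
    print(f, is_lfs_tracked(f))
```

Note that `tokenizer.json` is a literal pattern, so `tokenizer_config.json` stays outside LFS under these four rules.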

README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: allura-org/Teleut-7b
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.14.0

adapter_config.json ADDED

	@@ -0,0 +1,37 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "allura-org/Teleut-7b",
+  "bias": "none",
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.25,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "up_proj",
+    "gate_proj",
+    "k_proj",
+    "down_proj",
+    "o_proj",
+    "q_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
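This config describes a standard LoRA adapter (`r` 16, `lora_alpha` 32, `use_rslora` false, no DoRA) on all seven attention and MLP projections, so the low-rank update is scaled by `lora_alpha / r` = 2.0. The back-of-the-envelope parameter count below assumes Teleut-7b keeps the Qwen2.5-7B layer shapes (28 decoder layers, hidden size 3584, MLP size 18944, 4 key/value heads of dimension 128). The Qwen2Tokenizer files in this commit are consistent with a Qwen2.5 base, but the shapes themselves are not stated here. If the assumptions hold, 4 bytes per parameter (fp32) comes to 161,480,704 bytes. That is 52,488 bytes short of the `adapter_model.safetensors` size in its LFS pointer, and the gap would plausibly be the safetensors JSON header.

```python
# Assumed Qwen2.5-7B dimensions (NOT stated in this commit).
NUM_LAYERS = 28
HIDDEN = 3584
INTERMEDIATE = 18944
KV_DIM = 4 * 128  # num_key_value_heads * head_dim

# Values taken from adapter_config.json.
R = 16
LORA_ALPHA = 32
USE_RSLORA = False

# (in_features, out_features) for each module in target_modules.
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params_per_module(in_f: int, out_f: int, r: int) -> int:
    # lora_A is (r x in_f) and lora_B is (out_f x r); lora_bias is false,
    # so there are no extra bias terms.
    return r * in_f + out_f * r


def lora_scaling(alpha: int, r: int, use_rslora: bool) -> float:
    # Standard LoRA scales the update by alpha / r; rsLoRA uses alpha / sqrt(r).
    return alpha / (r ** 0.5) if use_rslora else alpha / r


per_layer = sum(lora_params_per_module(i, o, R) for i, o in MODULE_SHAPES.values())
total = per_layer * NUM_LAYERS

print(f"scaling = {lora_scaling(LORA_ALPHA, R, USE_RSLORA)}")
print(f"LoRA params per layer = {per_layer:,}")
print(f"LoRA params total = {total:,}")
print(f"fp32 bytes = {total * 4:,} (safetensors file: 161,533,192)")
```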

adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4159f11888e366e9681983ab17f8ae529839e5074e7e083edf2897e2a46808e8
+size 161533192
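Like the other binary files in this commit, `adapter_model.safetensors` is stored as a Git LFS pointer: a spec version line, a `sha256` object id, and the object's size in bytes. The following minimal Python sketch parses such a pointer and checks downloaded bytes against it. `parse_lfs_pointer` and `blob_matches_pointer` are hypothetical helpers written for illustration, not part of `huggingface_hub` or the `git-lfs` CLI.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file (one "key value" pair per line)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        raise ValueError("not a Git LFS pointer")
    algo, _, digest = fields["oid"].partition(":")
    return {"version": fields["version"], "algo": algo, "oid": digest, "size": int(fields["size"])}


def blob_matches_pointer(blob: bytes, pointer: dict) -> bool:
    """Check downloaded bytes against the pointer's size and sha256 digest."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported oid algorithm: {pointer['algo']}")
    return len(blob) == pointer["size"] and hashlib.sha256(blob).hexdigest() == pointer["oid"]


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:4159f11888e366e9681983ab17f8ae529839e5074e7e083edf2897e2a46808e8
size 161533192
"""

p = parse_lfs_pointer(POINTER)
print(p["algo"], p["oid"][:12], p["size"])
```

For the real 161 MB file, feeding `hashlib.sha256().update()` with fixed-size chunks from `file.read()` avoids holding the whole object in memory.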

added_tokens.json ADDED

	@@ -0,0 +1,24 @@

+{
+  "</tool_call>": 151658,
+  "<tool_call>": 151657,
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|file_sep|>": 151664,
+  "<|fim_middle|>": 151660,
+  "<|fim_pad|>": 151662,
+  "<|fim_prefix|>": 151659,
+  "<|fim_suffix|>": 151661,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|repo_name|>": 151663,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}

merges.txt ADDED

The diff for this file is too large to render.

optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9cfb31e2d925b0e278cf8f94e5f5776cc885477c222d4f6c8f5e00e5fb4d2223
+size 82463028

rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d9e015bcb8a9000a9133a0f0c119ba11635b59ee6ff17e40841dbce26e2eca31
+size 14244

scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7732e2e30c14cbb62e2c404b7e63ae0794b5ff4ffce10c35af31548620bfeb11
+size 1064

special_tokens_map.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
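`added_tokens.json` and `special_tokens_map.json` should agree with each other. The quick standard-library cross-check below uses values copied from the two files. It confirms that the 22 added IDs form one gap-free range, that every listed special token (plus eos `<|im_end|>` and pad `<|endoftext|>`) has an ID, and which added tokens are not listed as special. Those eight leftovers (tool-call, fill-in-the-middle, repo/file markers) are the entries `tokenizer_config.json` marks with `"special": false`. This is an illustrative sketch, not a check that `transformers` itself performs.

```python
# Copied from added_tokens.json in this commit (token -> id), in id order.
ADDED_TOKENS = {
    "<|endoftext|>": 151643, "<|im_start|>": 151644, "<|im_end|>": 151645,
    "<|object_ref_start|>": 151646, "<|object_ref_end|>": 151647,
    "<|box_start|>": 151648, "<|box_end|>": 151649,
    "<|quad_start|>": 151650, "<|quad_end|>": 151651,
    "<|vision_start|>": 151652, "<|vision_end|>": 151653,
    "<|vision_pad|>": 151654, "<|image_pad|>": 151655, "<|video_pad|>": 151656,
    "<tool_call>": 151657, "</tool_call>": 151658,
    "<|fim_prefix|>": 151659, "<|fim_middle|>": 151660,
    "<|fim_suffix|>": 151661, "<|fim_pad|>": 151662,
    "<|repo_name|>": 151663, "<|file_sep|>": 151664,
}

# Copied from special_tokens_map.json in this commit.
ADDITIONAL_SPECIAL = [
    "<|im_start|>", "<|im_end|>", "<|object_ref_start|>", "<|object_ref_end|>",
    "<|box_start|>", "<|box_end|>", "<|quad_start|>", "<|quad_end|>",
    "<|vision_start|>", "<|vision_end|>", "<|vision_pad|>", "<|image_pad|>",
    "<|video_pad|>",
]
EOS = "<|im_end|>"
PAD = "<|endoftext|>"

ids = sorted(ADDED_TOKENS.values())
# True if the added ids form a single range with no gaps.
contiguous = ids == list(range(ids[0], ids[0] + len(ids)))
# Tokens referenced by the map but missing from added_tokens.json.
missing = [t for t in ADDITIONAL_SPECIAL + [EOS, PAD] if t not in ADDED_TOKENS]
# Added tokens that the map does not list as special (eos/pad excluded).
extra = sorted(set(ADDED_TOKENS) - set(ADDITIONAL_SPECIAL) - {EOS, PAD})

print(contiguous, missing, ADDED_TOKENS[EOS], ADDED_TOKENS[PAD])
print(extra)
```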

tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
+size 11421896

tokenizer_config.json ADDED

	@@ -0,0 +1,208 @@

+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 131072,
+  "pad_token": "<|endoftext|>",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}
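The `chat_template` above is plain ChatML. Each message becomes `<|im_start|>{role}\n{content}<|im_end|>\n`, and `<|im_start|>assistant\n` is appended only when `add_generation_prompt` is true. Unlike some Qwen2.5 templates, it injects no default system prompt, and no BOS token is added (`bos_token` null, `add_bos_token` false). Generation should stop at the eos token `<|im_end|>`. The pure-Python re-implementation below is a sketch for inspecting prompts without loading the tokenizer. `render_chatml` is a hypothetical name; with `transformers` installed, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` should produce the same string.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Mirror of the Jinja chat_template in tokenizer_config.json (ChatML)."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml(
    [
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hi!"},
    ],
    add_generation_prompt=True,
)
print(prompt)
```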

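The `learning_rate` values in the `trainer_state.json` log that follows fit a linear warmup to 6e-5 over 25 steps, followed by cosine decay to zero over 798 total steps. Both step counts are inferred from the log rather than read from `training_args.bin`, which is binary. The epoch counter advances by 1/798 per step, and the logged rates match the shape of the cosine-with-warmup schedule in `transformers`. If the run was a single epoch, this checkpoint (`global_step` 192, epoch ≈ 0.2406) sits roughly a quarter of the way through training. Below is a minimal Python sketch under those assumptions, checked against a few logged values. `lr_at` is a hypothetical helper.

```python
import math

PEAK_LR = 6e-5      # learning_rate logged at step 25
WARMUP_STEPS = 25   # inferred: the linear ramp reaches the peak at step 25
TOTAL_STEPS = 798   # inferred: epoch grows by 1/798 per logged step


def lr_at(step: int) -> float:
    """Linear warmup, then half-cosine decay from PEAK_LR to 0 at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# A few (step -> learning_rate) pairs copied from the log below.
LOGGED = {
    1: 2.4000000000000003e-06,
    25: 6e-05,
    26: 5.99997522398708e-05,
    107: 5.8349420300757393e-05,
}

for step, lr in LOGGED.items():
    print(step, lr_at(step), math.isclose(lr_at(step), lr, rel_tol=1e-6))

print("epoch at global_step 192:", 192 / TOTAL_STEPS)
```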
trainer_state.json ADDED

	@@ -0,0 +1,1377 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.24060150375939848,
+  "eval_steps": 500,
+  "global_step": 192,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.0012531328320802004,
+      "grad_norm": 0.1519206017255783,
+      "learning_rate": 2.4000000000000003e-06,
+      "loss": 1.8379,
+      "step": 1
+    },
+    {
+      "epoch": 0.002506265664160401,
+      "grad_norm": 0.13846151530742645,
+      "learning_rate": 4.800000000000001e-06,
+      "loss": 1.6569,
+      "step": 2
+    },
+    {
+      "epoch": 0.0037593984962406013,
+      "grad_norm": 0.13162349164485931,
+      "learning_rate": 7.2e-06,
+      "loss": 1.6083,
+      "step": 3
+    },
+    {
+      "epoch": 0.005012531328320802,
+      "grad_norm": 0.1345846951007843,
+      "learning_rate": 9.600000000000001e-06,
+      "loss": 1.6603,
+      "step": 4
+    },
+    {
+      "epoch": 0.006265664160401002,
+      "grad_norm": 0.1223267987370491,
+      "learning_rate": 1.2e-05,
+      "loss": 1.6579,
+      "step": 5
+    },
+    {
+      "epoch": 0.007518796992481203,
+      "grad_norm": 0.09618138521909714,
+      "learning_rate": 1.44e-05,
+      "loss": 1.5689,
+      "step": 6
+    },
+    {
+      "epoch": 0.008771929824561403,
+      "grad_norm": 0.09933976829051971,
+      "learning_rate": 1.6800000000000002e-05,
+      "loss": 1.4663,
+      "step": 7
+    },
+    {
+      "epoch": 0.010025062656641603,
+      "grad_norm": 0.1198974996805191,
+      "learning_rate": 1.9200000000000003e-05,
+      "loss": 1.734,
+      "step": 8
+    },
+    {
+      "epoch": 0.011278195488721804,
+      "grad_norm": 0.15424160659313202,
+      "learning_rate": 2.16e-05,
+      "loss": 1.6767,
+      "step": 9
+    },
+    {
+      "epoch": 0.012531328320802004,
+      "grad_norm": 0.1701776534318924,
+      "learning_rate": 2.4e-05,
+      "loss": 1.5974,
+      "step": 10
+    },
+    {
+      "epoch": 0.013784461152882205,
+      "grad_norm": 0.15073753893375397,
+      "learning_rate": 2.64e-05,
+      "loss": 1.5259,
+      "step": 11
+    },
+    {
+      "epoch": 0.015037593984962405,
+      "grad_norm": 0.14940425753593445,
+      "learning_rate": 2.88e-05,
+      "loss": 1.6349,
+      "step": 12
+    },
+    {
+      "epoch": 0.016290726817042606,
+      "grad_norm": 0.12604176998138428,
+      "learning_rate": 3.12e-05,
+      "loss": 1.6443,
+      "step": 13
+    },
+    {
+      "epoch": 0.017543859649122806,
+      "grad_norm": 0.15434400737285614,
+      "learning_rate": 3.3600000000000004e-05,
+      "loss": 1.6378,
+      "step": 14
+    },
+    {
+      "epoch": 0.018796992481203006,
+      "grad_norm": 0.12647251784801483,
+      "learning_rate": 3.6e-05,
+      "loss": 1.542,
+      "step": 15
+    },
+    {
+      "epoch": 0.020050125313283207,
+      "grad_norm": 0.1258278489112854,
+      "learning_rate": 3.8400000000000005e-05,
+      "loss": 1.6091,
+      "step": 16
+    },
+    {
+      "epoch": 0.021303258145363407,
+      "grad_norm": 0.09623159468173981,
+      "learning_rate": 4.08e-05,
+      "loss": 1.5555,
+      "step": 17
+    },
+    {
+      "epoch": 0.022556390977443608,
+      "grad_norm": 0.10304850339889526,
+      "learning_rate": 4.32e-05,
+      "loss": 1.5517,
+      "step": 18
+    },
+    {
+      "epoch": 0.023809523809523808,
+      "grad_norm": 0.09278815984725952,
+      "learning_rate": 4.5600000000000004e-05,
+      "loss": 1.6436,
+      "step": 19
+    },
+    {
+      "epoch": 0.02506265664160401,
+      "grad_norm": 0.08530683070421219,
+      "learning_rate": 4.8e-05,
+      "loss": 1.5664,
+      "step": 20
+    },
+    {
+      "epoch": 0.02631578947368421,
+      "grad_norm": 0.08876050263643265,
+      "learning_rate": 5.04e-05,
+      "loss": 1.5628,
+      "step": 21
+    },
+    {
+      "epoch": 0.02756892230576441,
+      "grad_norm": 0.09901795536279678,
+      "learning_rate": 5.28e-05,
+      "loss": 1.5402,
+      "step": 22
+    },
+    {
+      "epoch": 0.02882205513784461,
+      "grad_norm": 0.09156398475170135,
+      "learning_rate": 5.520000000000001e-05,
+      "loss": 1.606,
+      "step": 23
+    },
+    {
+      "epoch": 0.03007518796992481,
+      "grad_norm": 0.09371288120746613,
+      "learning_rate": 5.76e-05,
+      "loss": 1.566,
+      "step": 24
+    },
+    {
+      "epoch": 0.03132832080200501,
+      "grad_norm": 0.0892362967133522,
+      "learning_rate": 6e-05,
+      "loss": 1.6193,
+      "step": 25
+    },
+    {
+      "epoch": 0.03258145363408521,
+      "grad_norm": 0.08784804493188858,
+      "learning_rate": 5.99997522398708e-05,
+      "loss": 1.5756,
+      "step": 26
+    },
+    {
+      "epoch": 0.03383458646616541,
+      "grad_norm": 0.08121798187494278,
+      "learning_rate": 5.999900896357553e-05,
+      "loss": 1.5085,
+      "step": 27
+    },
+    {
+      "epoch": 0.03508771929824561,
+      "grad_norm": 0.09490139782428741,
+      "learning_rate": 5.999777018339115e-05,
+      "loss": 1.5692,
+      "step": 28
+    },
+    {
+      "epoch": 0.03634085213032581,
+      "grad_norm": 0.1026797667145729,
+      "learning_rate": 5.999603591977901e-05,
+      "loss": 1.5761,
+      "step": 29
+    },
+    {
+      "epoch": 0.03759398496240601,
+      "grad_norm": 0.09289643168449402,
+      "learning_rate": 5.999380620138454e-05,
+      "loss": 1.5612,
+      "step": 30
+    },
+    {
+      "epoch": 0.03884711779448621,
+      "grad_norm": 0.09448801726102829,
+      "learning_rate": 5.9991081065036745e-05,
+      "loss": 1.6029,
+      "step": 31
+    },
+    {
+      "epoch": 0.040100250626566414,
+      "grad_norm": 0.09210563451051712,
+      "learning_rate": 5.998786055574766e-05,
+      "loss": 1.5229,
+      "step": 32
+    },
+    {
+      "epoch": 0.041353383458646614,
+      "grad_norm": 0.09483816474676132,
+      "learning_rate": 5.998414472671151e-05,
+      "loss": 1.5753,
+      "step": 33
+    },
+    {
+      "epoch": 0.042606516290726815,
+      "grad_norm": 0.0975150614976883,
+      "learning_rate": 5.997993363930393e-05,
+      "loss": 1.654,
+      "step": 34
+    },
+    {
+      "epoch": 0.043859649122807015,
+      "grad_norm": 0.10024327784776688,
+      "learning_rate": 5.997522736308089e-05,
+      "loss": 1.5868,
+      "step": 35
+    },
+    {
+      "epoch": 0.045112781954887216,
+      "grad_norm": 0.08656363934278488,
+      "learning_rate": 5.9970025975777576e-05,
+      "loss": 1.5087,
+      "step": 36
+    },
+    {
+      "epoch": 0.046365914786967416,
+      "grad_norm": 0.09871859848499298,
+      "learning_rate": 5.996432956330705e-05,
+      "loss": 1.4932,
+      "step": 37
+    },
+    {
+      "epoch": 0.047619047619047616,
+      "grad_norm": 0.11055205762386322,
+      "learning_rate": 5.9958138219758926e-05,
+      "loss": 1.5467,
+      "step": 38
+    },
+    {
+      "epoch": 0.04887218045112782,
+      "grad_norm": 0.11448398977518082,
+      "learning_rate": 5.995145204739774e-05,
+      "loss": 1.5205,
+      "step": 39
+    },
+    {
+      "epoch": 0.05012531328320802,
+      "grad_norm": 0.09594336152076721,
+      "learning_rate": 5.994427115666128e-05,
+      "loss": 1.4101,
+      "step": 40
+    },
+    {
+      "epoch": 0.05137844611528822,
+      "grad_norm": 0.1022314578294754,
+      "learning_rate": 5.993659566615878e-05,
+      "loss": 1.5726,
+      "step": 41
+    },
+    {
+      "epoch": 0.05263157894736842,
+      "grad_norm": 0.0918896496295929,
+      "learning_rate": 5.9928425702668936e-05,
+      "loss": 1.4416,
+      "step": 42
+    },
+    {
+      "epoch": 0.05388471177944862,
+      "grad_norm": 0.11478869616985321,
+      "learning_rate": 5.9919761401137845e-05,
+      "loss": 1.4871,
+      "step": 43
+    },
+    {
+      "epoch": 0.05513784461152882,
+      "grad_norm": 0.09935441613197327,
+      "learning_rate": 5.991060290467671e-05,
+      "loss": 1.5057,
+      "step": 44
+    },
+    {
+      "epoch": 0.05639097744360902,
+      "grad_norm": 0.1036807969212532,
+      "learning_rate": 5.990095036455958e-05,
+      "loss": 1.6244,
+      "step": 45
+    },
+    {
+      "epoch": 0.05764411027568922,
+      "grad_norm": 0.10617007315158844,
+      "learning_rate": 5.989080394022074e-05,
+      "loss": 1.6287,
+      "step": 46
+    },
+    {
+      "epoch": 0.05889724310776942,
+      "grad_norm": 0.09776347130537033,
+      "learning_rate": 5.988016379925215e-05,
+      "loss": 1.5421,
+      "step": 47
+    },
+    {
+      "epoch": 0.06015037593984962,
+      "grad_norm": 0.10930298268795013,
+      "learning_rate": 5.986903011740067e-05,
+      "loss": 1.6162,
+      "step": 48
+    },
+    {
+      "epoch": 0.06140350877192982,
+      "grad_norm": 0.10414919257164001,
+      "learning_rate": 5.985740307856512e-05,
+      "loss": 1.5381,
+      "step": 49
+    },
+    {
+      "epoch": 0.06265664160401002,
+      "grad_norm": 0.10629701614379883,
+      "learning_rate": 5.984528287479328e-05,
+      "loss": 1.6127,
+      "step": 50
+    },
+    {
+      "epoch": 0.06390977443609022,
+      "grad_norm": 0.11536388099193573,
+      "learning_rate": 5.983266970627869e-05,
+      "loss": 1.5455,
+      "step": 51
+    },
+    {
+      "epoch": 0.06516290726817042,
+      "grad_norm": 0.12118639051914215,
+      "learning_rate": 5.9819563781357385e-05,
+      "loss": 1.5043,
+      "step": 52
+    },
+    {
+      "epoch": 0.06641604010025062,
+      "grad_norm": 0.11645355075597763,
+      "learning_rate": 5.98059653165044e-05,
+      "loss": 1.5473,
+      "step": 53
+    },
+    {
+      "epoch": 0.06766917293233082,
+      "grad_norm": 0.12099536508321762,
+      "learning_rate": 5.9791874536330225e-05,
+      "loss": 1.5482,
+      "step": 54
+    },
+    {
+      "epoch": 0.06892230576441102,
+      "grad_norm": 0.10795404016971588,
+      "learning_rate": 5.97772916735771e-05,
+      "loss": 1.5326,
+      "step": 55
+    },
+    {
+      "epoch": 0.07017543859649122,
+      "grad_norm": 0.09979701787233353,
+      "learning_rate": 5.9762216969115154e-05,
+      "loss": 1.4133,
+      "step": 56
+    },
+    {
+      "epoch": 0.07142857142857142,
+      "grad_norm": 0.12733447551727295,
+      "learning_rate": 5.974665067193844e-05,
+      "loss": 1.5839,
+      "step": 57
+    },
+    {
+      "epoch": 0.07268170426065163,
+      "grad_norm": 0.11343546956777573,
+      "learning_rate": 5.97305930391608e-05,
+      "loss": 1.524,
+      "step": 58
+    },
+    {
+      "epoch": 0.07393483709273183,
+      "grad_norm": 0.11789534986019135,
+      "learning_rate": 5.971404433601165e-05,
+      "loss": 1.4677,
+      "step": 59
+    },
+    {
+      "epoch": 0.07518796992481203,
+      "grad_norm": 0.10624619573354721,
+      "learning_rate": 5.969700483583159e-05,
+      "loss": 1.4893,
+      "step": 60
+    },
+    {
+      "epoch": 0.07644110275689223,
+      "grad_norm": 0.10631600022315979,
+      "learning_rate": 5.967947482006786e-05,
+      "loss": 1.573,
+      "step": 61
+    },
+    {
+      "epoch": 0.07769423558897243,
+      "grad_norm": 0.12548910081386566,
+      "learning_rate": 5.9661454578269724e-05,
+      "loss": 1.5625,
+      "step": 62
+    },
+    {
+      "epoch": 0.07894736842105263,
+      "grad_norm": 0.1474038064479828,
+      "learning_rate": 5.964294440808368e-05,
+      "loss": 1.6886,
+      "step": 63
+    },
+    {
+      "epoch": 0.08020050125313283,
+      "grad_norm": 0.12192469835281372,
+      "learning_rate": 5.962394461524854e-05,
+      "loss": 1.5252,
+      "step": 64
+    },
+    {
+      "epoch": 0.08145363408521303,
+      "grad_norm": 0.11869639158248901,
+      "learning_rate": 5.960445551359037e-05,
+      "loss": 1.5262,
+      "step": 65
+    },
+    {
+      "epoch": 0.08270676691729323,
+      "grad_norm": 0.12624335289001465,
+      "learning_rate": 5.958447742501735e-05,
+      "loss": 1.5645,
+      "step": 66
+    },
+    {
+      "epoch": 0.08395989974937343,
+      "grad_norm": 0.11716689169406891,
+      "learning_rate": 5.9564010679514376e-05,
+      "loss": 1.47,
+      "step": 67
+    },
+    {
+      "epoch": 0.08521303258145363,
+      "grad_norm": 0.12144263088703156,
+      "learning_rate": 5.954305561513769e-05,
+      "loss": 1.6181,
+      "step": 68
+    },
+    {
+      "epoch": 0.08646616541353383,
+      "grad_norm": 0.12713098526000977,
+      "learning_rate": 5.9521612578009255e-05,
+      "loss": 1.6227,
+      "step": 69
+    },
+    {
+      "epoch": 0.08771929824561403,
+      "grad_norm": 0.1043514758348465,
+      "learning_rate": 5.9499681922311046e-05,
+      "loss": 1.4522,
+      "step": 70
+    },
+    {
+      "epoch": 0.08897243107769423,
+      "grad_norm": 0.11064239591360092,
+      "learning_rate": 5.947726401027921e-05,
+      "loss": 1.4685,
+      "step": 71
+    },
+    {
+      "epoch": 0.09022556390977443,
+      "grad_norm": 0.11946432292461395,
+      "learning_rate": 5.945435921219806e-05,
+      "loss": 1.4165,
+      "step": 72
+    },
+    {
+      "epoch": 0.09147869674185463,
+      "grad_norm": 0.11144128441810608,
+      "learning_rate": 5.943096790639398e-05,
+      "loss": 1.5081,
+      "step": 73
+    },
+    {
+      "epoch": 0.09273182957393483,
+      "grad_norm": 0.11953038722276688,
+      "learning_rate": 5.9407090479229166e-05,
+      "loss": 1.5773,
+      "step": 74
+    },
+    {
+      "epoch": 0.09398496240601503,
+      "grad_norm": 0.12432811409235,
+      "learning_rate": 5.938272732509525e-05,
+      "loss": 1.528,
+      "step": 75
+    },
+    {
+      "epoch": 0.09523809523809523,
+      "grad_norm": 0.13081413507461548,
+      "learning_rate": 5.9357878846406776e-05,
+      "loss": 1.5024,
+      "step": 76
+    },
+    {
+      "epoch": 0.09649122807017543,
+      "grad_norm": 0.12342929095029831,
+      "learning_rate": 5.933254545359456e-05,
+      "loss": 1.478,
+      "step": 77
+    },
+    {
+      "epoch": 0.09774436090225563,
+      "grad_norm": 0.12129031121730804,
+      "learning_rate": 5.9306727565098925e-05,
+      "loss": 1.531,
+      "step": 78
+    },
+    {
+      "epoch": 0.09899749373433583,
+      "grad_norm": 0.12284113466739655,
+      "learning_rate": 5.928042560736275e-05,
+      "loss": 1.6204,
+      "step": 79
+    },
+    {
+      "epoch": 0.10025062656641603,
+      "grad_norm": 0.14579783380031586,
+      "learning_rate": 5.9253640014824466e-05,
+      "loss": 1.4382,
+      "step": 80
+    },
+    {
+      "epoch": 0.10150375939849623,
+      "grad_norm": 0.13060207664966583,
+      "learning_rate": 5.9226371229910885e-05,
+      "loss": 1.5713,
+      "step": 81
+    },
+    {
+      "epoch": 0.10275689223057644,
+      "grad_norm": 0.12609316408634186,
+      "learning_rate": 5.919861970302982e-05,
+      "loss": 1.5312,
+      "step": 82
+    },
+    {
+      "epoch": 0.10401002506265664,
+      "grad_norm": 0.12477698922157288,
+      "learning_rate": 5.9170385892562755e-05,
+      "loss": 1.5538,
+      "step": 83
+    },
+    {
+      "epoch": 0.10526315789473684,
+      "grad_norm": 0.14683856070041656,
+      "learning_rate": 5.914167026485719e-05,
+      "loss": 1.5318,
+      "step": 84
+    },
+    {
+      "epoch": 0.10651629072681704,
+      "grad_norm": 0.12768588960170746,
+      "learning_rate": 5.9112473294218954e-05,
+      "loss": 1.5587,
+      "step": 85
+    },
+    {
+      "epoch": 0.10776942355889724,
+      "grad_norm": 0.1492781639099121,
+      "learning_rate": 5.9082795462904396e-05,
+      "loss": 1.5623,
+      "step": 86
+    },
+    {
+      "epoch": 0.10902255639097744,
+      "grad_norm": 0.12561501562595367,
+      "learning_rate": 5.905263726111241e-05,
+      "loss": 1.5232,
+      "step": 87
+    },
+    {
+      "epoch": 0.11027568922305764,
+      "grad_norm": 0.11631227284669876,
+      "learning_rate": 5.902199918697634e-05,
+      "loss": 1.505,
+      "step": 88
+    },
+    {
+      "epoch": 0.11152882205513784,
+      "grad_norm": 0.126926988363266,
+      "learning_rate": 5.899088174655571e-05,
+      "loss": 1.5001,
+      "step": 89
+    },
+    {
+      "epoch": 0.11278195488721804,
+      "grad_norm": 0.13932526111602783,
+      "learning_rate": 5.8959285453827936e-05,
+      "loss": 1.4141,
+      "step": 90
+    },
+    {
+      "epoch": 0.11403508771929824,
+      "grad_norm": 0.13378114998340607,
+      "learning_rate": 5.8927210830679785e-05,
+      "loss": 1.5173,
+      "step": 91
+    },
+    {
+      "epoch": 0.11528822055137844,
+      "grad_norm": 0.12687304615974426,
+      "learning_rate": 5.889465840689878e-05,
+      "loss": 1.3759,
+      "step": 92
+    },
+    {
+      "epoch": 0.11654135338345864,
+      "grad_norm": 0.12497436255216599,
+      "learning_rate": 5.886162872016442e-05,
+      "loss": 1.6235,
+      "step": 93
+    },
+    {
+      "epoch": 0.11779448621553884,
+      "grad_norm": 0.12597259879112244,
+      "learning_rate": 5.882812231603937e-05,
+      "loss": 1.4533,
+      "step": 94
+    },
+    {
+      "epoch": 0.11904761904761904,
+      "grad_norm": 0.14128872752189636,
+      "learning_rate": 5.879413974796033e-05,
+      "loss": 1.4717,
+      "step": 95
+    },
+    {
+      "epoch": 0.12030075187969924,
+      "grad_norm": 0.13640277087688446,
+      "learning_rate": 5.8759681577229014e-05,
+      "loss": 1.4978,
+      "step": 96
+    },
+    {
+      "epoch": 0.12155388471177944,
+      "grad_norm": 0.11883015185594559,
+      "learning_rate": 5.8724748373002805e-05,
+      "loss": 1.5333,
+      "step": 97
+    },
+    {
+      "epoch": 0.12280701754385964,
+      "grad_norm": 0.14055690169334412,
+      "learning_rate": 5.868934071228539e-05,
+      "loss": 1.5967,
+      "step": 98
+    },
+    {
+      "epoch": 0.12406015037593984,
+      "grad_norm": 0.1388673484325409,
+      "learning_rate": 5.8653459179917196e-05,
+      "loss": 1.4698,
+      "step": 99
+    },
+    {
+      "epoch": 0.12531328320802004,
+      "grad_norm": 0.14991509914398193,
+      "learning_rate": 5.861710436856577e-05,
+      "loss": 1.565,
+      "step": 100
+    },
+    {
+      "epoch": 0.12656641604010024,
+      "grad_norm": 0.15015804767608643,
+      "learning_rate": 5.8580276878715964e-05,
+      "loss": 1.5374,
+      "step": 101
+    },
+    {
+      "epoch": 0.12781954887218044,
+      "grad_norm": 0.14958204329013824,
+      "learning_rate": 5.854297731866002e-05,
+      "loss": 1.3789,
+      "step": 102
+    },
+    {
+      "epoch": 0.12907268170426064,
+      "grad_norm": 0.1282397210597992,
+      "learning_rate": 5.850520630448752e-05,
+      "loss": 1.5317,
+      "step": 103
+    },
+    {
+      "epoch": 0.13032581453634084,
+      "grad_norm": 0.12608109414577484,
+      "learning_rate": 5.8466964460075225e-05,
+      "loss": 1.4828,
+      "step": 104
+    },
+    {
+      "epoch": 0.13157894736842105,
+      "grad_norm": 0.17321082949638367,
+      "learning_rate": 5.8428252417076766e-05,
+      "loss": 1.4825,
+      "step": 105
+    },
+    {
+      "epoch": 0.13283208020050125,
+      "grad_norm": 0.135195791721344,
+      "learning_rate": 5.838907081491219e-05,
+      "loss": 1.5631,
+      "step": 106
+    },
+    {
+      "epoch": 0.13408521303258145,
+      "grad_norm": 0.12379708886146545,
+      "learning_rate": 5.8349420300757393e-05,
+      "loss": 1.5213,
+      "step": 107
+    },
+    {
+      "epoch": 0.13533834586466165,
+      "grad_norm": 0.13356727361679077,
+      "learning_rate": 5.830930152953351e-05,
+      "loss": 1.5834,
+      "step": 108
+    },
+    {
+      "epoch": 0.13659147869674185,
+      "grad_norm": 0.13438117504119873,
+      "learning_rate": 5.8268715163895984e-05,
+      "loss": 1.5039,
+      "step": 109
+    },
+    {
+      "epoch": 0.13784461152882205,
+      "grad_norm": 0.14895889163017273,
+      "learning_rate": 5.82276618742237e-05,
+      "loss": 1.5213,
+      "step": 110
+    },
+    {
+      "epoch": 0.13909774436090225,
+      "grad_norm": 0.13232296705245972,
+      "learning_rate": 5.818614233860789e-05,
+      "loss": 1.4048,
+      "step": 111
+    },
+    {
+      "epoch": 0.14035087719298245,
+      "grad_norm": 0.13256017863750458,
+      "learning_rate": 5.8144157242840904e-05,
+      "loss": 1.5077,
+      "step": 112
+    },
+    {
+      "epoch": 0.14160401002506265,
+      "grad_norm": 0.13263653218746185,
+      "learning_rate": 5.810170728040494e-05,
+      "loss": 1.3955,
+      "step": 113
+    },
+    {
+      "epoch": 0.14285714285714285,
+      "grad_norm": 0.12680189311504364,
+      "learning_rate": 5.8058793152460524e-05,
+      "loss": 1.4194,
+      "step": 114
+    },
+    {
+      "epoch": 0.14411027568922305,
+      "grad_norm": 0.13666822016239166,
+      "learning_rate": 5.801541556783501e-05,
+      "loss": 1.4234,
+      "step": 115
+    },
+    {
+      "epoch": 0.14536340852130325,
+      "grad_norm": 0.14599356055259705,
+      "learning_rate": 5.7971575243010775e-05,
+      "loss": 1.5034,
+      "step": 116
+    },
+    {
+      "epoch": 0.14661654135338345,
+      "grad_norm": 0.14143531024456024,
+      "learning_rate": 5.792727290211347e-05,
+      "loss": 1.4257,
+      "step": 117
+    },
+    {
+      "epoch": 0.14786967418546365,
+      "grad_norm": 0.14613430202007294,
+      "learning_rate": 5.7882509276899995e-05,
+      "loss": 1.5734,
+      "step": 118
+    },
+    {
+      "epoch": 0.14912280701754385,
+      "grad_norm": 0.13908414542675018,
+      "learning_rate": 5.7837285106746455e-05,
+      "loss": 1.4259,
+      "step": 119
+    },
+    {
+      "epoch": 0.15037593984962405,
+      "grad_norm": 0.14299297332763672,
+      "learning_rate": 5.779160113863594e-05,
+      "loss": 1.6024,
+      "step": 120
+    },
+    {
+      "epoch": 0.15162907268170425,
+      "grad_norm": 0.1389327496290207,
+      "learning_rate": 5.774545812714617e-05,
+      "loss": 1.5475,
+      "step": 121
+    },
+    {
+      "epoch": 0.15288220551378445,
+      "grad_norm": 0.12398528307676315,
+      "learning_rate": 5.769885683443704e-05,
+      "loss": 1.4205,
+      "step": 122
+    },
+    {
+      "epoch": 0.15413533834586465,
+      "grad_norm": 0.1524042785167694,
+      "learning_rate": 5.765179803023805e-05,
+      "loss": 1.5462,
+      "step": 123
+    },
+    {
+      "epoch": 0.15538847117794485,
+      "grad_norm": 0.13744120299816132,
+      "learning_rate": 5.7604282491835546e-05,
+      "loss": 1.396,
+      "step": 124
+    },
+    {
+      "epoch": 0.15664160401002505,
+      "grad_norm": 0.1439579576253891,
+      "learning_rate": 5.755631100405994e-05,
+      "loss": 1.3751,
+      "step": 125
+    },
+    {
+      "epoch": 0.15789473684210525,
+      "grad_norm": 0.1428079754114151,
+      "learning_rate": 5.750788435927268e-05,
+      "loss": 1.4661,
+      "step": 126
+    },
+    {
+      "epoch": 0.15914786967418545,
+      "grad_norm": 0.15680895745754242,
+      "learning_rate": 5.7459003357353214e-05,
+      "loss": 1.5036,
+      "step": 127
+    },
+    {
+      "epoch": 0.16040100250626566,
+      "grad_norm": 0.1669396311044693,
+      "learning_rate": 5.740966880568579e-05,
+      "loss": 1.5688,
+      "step": 128
+    },
+    {
+      "epoch": 0.16165413533834586,
+      "grad_norm": 0.14206421375274658,
+      "learning_rate": 5.735988151914606e-05,
+      "loss": 1.4673,
+      "step": 129
+    },
+    {
+      "epoch": 0.16290726817042606,
+      "grad_norm": 0.14038872718811035,
+      "learning_rate": 5.730964232008765e-05,
+      "loss": 1.4081,
+      "step": 130
+    },
+    {
+      "epoch": 0.16416040100250626,
+      "grad_norm": 0.14589321613311768,
+      "learning_rate": 5.72589520383286e-05,
+      "loss": 1.4524,
+      "step": 131
+    },
+    {
+      "epoch": 0.16541353383458646,
+      "grad_norm": 0.12820565700531006,
+      "learning_rate": 5.72078115111376e-05,
+      "loss": 1.513,
+      "step": 132
+    },
+    {
+      "epoch": 0.16666666666666666,
+      "grad_norm": 0.14205814898014069,
+      "learning_rate": 5.715622158322027e-05,
+      "loss": 1.4528,
+      "step": 133
+    },
+    {
+      "epoch": 0.16791979949874686,
+      "grad_norm": 0.14166466891765594,
+      "learning_rate": 5.7104183106705065e-05,
+      "loss": 1.4185,
+      "step": 134
+    },
+    {
+      "epoch": 0.16917293233082706,
+      "grad_norm": 0.1772213876247406,
+      "learning_rate": 5.705169694112929e-05,
+      "loss": 1.6193,
+      "step": 135
+    },
+    {
+      "epoch": 0.17042606516290726,
+      "grad_norm": 0.14686815440654755,
+      "learning_rate": 5.6998763953424906e-05,
+      "loss": 1.4533,
+      "step": 136
+    },
+    {
+      "epoch": 0.17167919799498746,
+      "grad_norm": 0.1334281861782074,
+      "learning_rate": 5.694538501790417e-05,
+      "loss": 1.4905,
+      "step": 137
+    },
+    {
+      "epoch": 0.17293233082706766,
+      "grad_norm": 0.1628011018037796,
+      "learning_rate": 5.689156101624519e-05,
+      "loss": 1.4211,
+      "step": 138
+    },
+    {
+      "epoch": 0.17418546365914786,
+      "grad_norm": 0.15260601043701172,
+      "learning_rate": 5.683729283747743e-05,
+      "loss": 1.487,
+      "step": 139
+    },
+    {
+      "epoch": 0.17543859649122806,
+      "grad_norm": 0.15220171213150024,
+      "learning_rate": 5.6782581377966954e-05,
+      "loss": 1.4547,
+      "step": 140
+    },
+    {
+      "epoch": 0.17669172932330826,
+      "grad_norm": 0.13987644016742706,
+      "learning_rate": 5.672742754140162e-05,
+      "loss": 1.4544,
+      "step": 141
+    },
+    {
+      "epoch": 0.17794486215538846,
+      "grad_norm": 0.1494978666305542,
+      "learning_rate": 5.6671832238776246e-05,
+      "loss": 1.3525,
+      "step": 142
+    },
+    {
+      "epoch": 0.17919799498746866,
+      "grad_norm": 0.15626025199890137,
+      "learning_rate": 5.661579638837744e-05,
+      "loss": 1.4859,
+      "step": 143
+    },
+    {
+      "epoch": 0.18045112781954886,
+      "grad_norm": 0.14824387431144714,
+      "learning_rate": 5.655932091576849e-05,
+      "loss": 1.487,
+      "step": 144
+    },
+    {
+      "epoch": 0.18170426065162906,
+      "grad_norm": 0.15267407894134521,
+      "learning_rate": 5.6502406753774104e-05,
+      "loss": 1.507,
+      "step": 145
+    },
+    {
+      "epoch": 0.18295739348370926,
+      "grad_norm": 0.14779289066791534,
+      "learning_rate": 5.644505484246495e-05,
+      "loss": 1.4844,
+      "step": 146
+    },
+    {
+      "epoch": 0.18421052631578946,
+      "grad_norm": 0.153567373752594,
+      "learning_rate": 5.638726612914217e-05,
+      "loss": 1.5149,
+      "step": 147
+    },
+    {
+      "epoch": 0.18546365914786966,
+      "grad_norm": 0.13506165146827698,
+      "learning_rate": 5.632904156832169e-05,
+      "loss": 1.4841,
+      "step": 148
+    },
+    {
+      "epoch": 0.18671679197994986,
+      "grad_norm": 0.153153657913208,
+      "learning_rate": 5.62703821217185e-05,
+      "loss": 1.5633,
+      "step": 149
+    },
+    {
+      "epoch": 0.18796992481203006,
+      "grad_norm": 0.14939852058887482,
+      "learning_rate": 5.621128875823073e-05,
+      "loss": 1.3896,
+      "step": 150
+    },
+    {
+      "epoch": 0.18922305764411027,
+      "grad_norm": 0.1525796800851822,
+      "learning_rate": 5.615176245392367e-05,
+      "loss": 1.5226,
+      "step": 151
+    },
+    {
+      "epoch": 0.19047619047619047,
+      "grad_norm": 0.15289181470870972,
+      "learning_rate": 5.609180419201366e-05,
+      "loss": 1.5143,
+      "step": 152
+    },
+    {
+      "epoch": 0.19172932330827067,
+      "grad_norm": 0.15484128892421722,
+      "learning_rate": 5.603141496285179e-05,
+      "loss": 1.3956,
+      "step": 153
+    },
+    {
+      "epoch": 0.19298245614035087,
+      "grad_norm": 0.1458120048046112,
+      "learning_rate": 5.597059576390762e-05,
+      "loss": 1.453,
+      "step": 154
+    },
+    {
+      "epoch": 0.19423558897243107,
+      "grad_norm": 0.14109253883361816,
+      "learning_rate": 5.590934759975267e-05,
+      "loss": 1.4137,
+      "step": 155
+    },
+    {
+      "epoch": 0.19548872180451127,
+      "grad_norm": 0.14820784330368042,
+      "learning_rate": 5.584767148204379e-05,
+      "loss": 1.4878,
+      "step": 156
+    },
+    {
+      "epoch": 0.19674185463659147,
+      "grad_norm": 0.16532014310359955,
+      "learning_rate": 5.578556842950651e-05,
+      "loss": 1.4942,
+      "step": 157
+    },
+    {
+      "epoch": 0.19799498746867167,
+      "grad_norm": 0.15438464283943176,
+      "learning_rate": 5.572303946791819e-05,
+      "loss": 1.353,
+      "step": 158
+    },
+    {
+      "epoch": 0.19924812030075187,
+      "grad_norm": 0.150332972407341,
+      "learning_rate": 5.566008563009107e-05,
+      "loss": 1.469,
+      "step": 159
+    },
+    {
+      "epoch": 0.20050125313283207,
+      "grad_norm": 0.1449950933456421,
+      "learning_rate": 5.5596707955855215e-05,
+      "loss": 1.4347,
+      "step": 160
+    },
+    {
+      "epoch": 0.20175438596491227,
+      "grad_norm": 0.15944981575012207,
+      "learning_rate": 5.553290749204134e-05,
+      "loss": 1.512,
+      "step": 161
+    },
+    {
+      "epoch": 0.20300751879699247,
+      "grad_norm": 0.14288978278636932,
+      "learning_rate": 5.546868529246352e-05,
+      "loss": 1.4669,
+      "step": 162
+    },
+    {
+      "epoch": 0.20426065162907267,
+      "grad_norm": 0.16595055162906647,
+      "learning_rate": 5.54040424179018e-05,
+      "loss": 1.6102,
+      "step": 163
+    },
+    {
+      "epoch": 0.20551378446115287,
+      "grad_norm": 0.1445576548576355,
+      "learning_rate": 5.533897993608463e-05,
+      "loss": 1.3255,
+      "step": 164
+    },
+    {
+      "epoch": 0.20676691729323307,
+      "grad_norm": 0.15171661972999573,
+      "learning_rate": 5.527349892167127e-05,
+      "loss": 1.4527,
+      "step": 165
+    },
+    {
+      "epoch": 0.20802005012531327,
+      "grad_norm": 0.1350892335176468,
+      "learning_rate": 5.520760045623403e-05,
+      "loss": 1.524,
+      "step": 166
+    },
+    {
+      "epoch": 0.20927318295739347,
+      "grad_norm": 0.15125590562820435,
+      "learning_rate": 5.514128562824039e-05,
+      "loss": 1.5476,
+      "step": 167
+    },
+    {
+      "epoch": 0.21052631578947367,
+      "grad_norm": 0.18517212569713593,
+      "learning_rate": 5.507455553303506e-05,
+      "loss": 1.4797,
+      "step": 168
+    },
+    {
+      "epoch": 0.21177944862155387,
+      "grad_norm": 0.16194948554039001,
+      "learning_rate": 5.5007411272821826e-05,
+      "loss": 1.4256,
+      "step": 169
+    },
+    {
+      "epoch": 0.21303258145363407,
+      "grad_norm": 0.16133227944374084,
+      "learning_rate": 5.493985395664539e-05,
+      "loss": 1.5882,
+      "step": 170
+    },
+    {
+      "epoch": 0.21428571428571427,
+      "grad_norm": 0.15564045310020447,
+      "learning_rate": 5.487188470037305e-05,
+      "loss": 1.5051,
+      "step": 171
+    },
+    {
+      "epoch": 0.21553884711779447,
+      "grad_norm": 0.1621345579624176,
+      "learning_rate": 5.480350462667625e-05,
+      "loss": 1.4765,
+      "step": 172
+    },
+    {
+      "epoch": 0.21679197994987467,
+      "grad_norm": 0.16004936397075653,
+      "learning_rate": 5.473471486501206e-05,
+      "loss": 1.4995,
+      "step": 173
+    },
+    {
+      "epoch": 0.21804511278195488,
+      "grad_norm": 0.1602533906698227,
+      "learning_rate": 5.466551655160449e-05,
+      "loss": 1.5091,
+      "step": 174
+    },
+    {
+      "epoch": 0.21929824561403508,
+      "grad_norm": 0.15978209674358368,
+      "learning_rate": 5.459591082942574e-05,
+      "loss": 1.5351,
+      "step": 175
+    },
+    {
+      "epoch": 0.22055137844611528,
+      "grad_norm": 0.15208299458026886,
+      "learning_rate": 5.452589884817733e-05,
+      "loss": 1.5023,
+      "step": 176
+    },
+    {
+      "epoch": 0.22180451127819548,
+      "grad_norm": 0.1487240493297577,
+      "learning_rate": 5.445548176427108e-05,
+      "loss": 1.4458,
+      "step": 177
+    },
+    {
+      "epoch": 0.22305764411027568,
+      "grad_norm": 0.14227648079395294,
+      "learning_rate": 5.4384660740810074e-05,
+      "loss": 1.4342,
+      "step": 178
+    },
+    {
+      "epoch": 0.22431077694235588,
+      "grad_norm": 0.13798528909683228,
+      "learning_rate": 5.431343694756935e-05,
+      "loss": 1.3625,
+      "step": 179
+    },
+    {
+      "epoch": 0.22556390977443608,
+      "grad_norm": 0.16410118341445923,
+      "learning_rate": 5.424181156097666e-05,
+      "loss": 1.4706,
+      "step": 180
+    },
+    {
+      "epoch": 0.22681704260651628,
+      "grad_norm": 0.15224584937095642,
+      "learning_rate": 5.416978576409301e-05,
+      "loss": 1.5443,
+      "step": 181
+    },
+    {
+      "epoch": 0.22807017543859648,
+      "grad_norm": 0.15167997777462006,
+      "learning_rate": 5.409736074659311e-05,
+      "loss": 1.5039,
+      "step": 182
+    },
+    {
+      "epoch": 0.22932330827067668,
+      "grad_norm": 0.16224287450313568,
+      "learning_rate": 5.402453770474575e-05,
+      "loss": 1.4773,
+      "step": 183
+    },
+    {
+      "epoch": 0.23057644110275688,
+      "grad_norm": 0.16024534404277802,
+      "learning_rate": 5.395131784139401e-05,
+      "loss": 1.5209,
+      "step": 184
+    },
+    {
+      "epoch": 0.23182957393483708,
+      "grad_norm": 0.15539222955703735,
+      "learning_rate": 5.3877702365935404e-05,
+      "loss": 1.4137,
+      "step": 185
+    },
+    {
+      "epoch": 0.23308270676691728,
+      "grad_norm": 0.146916463971138,
+      "learning_rate": 5.380369249430191e-05,
+      "loss": 1.4557,
+      "step": 186
+    },
+    {
+      "epoch": 0.23433583959899748,
+      "grad_norm": 0.16157691180706024,
+      "learning_rate": 5.37292894489399e-05,
+      "loss": 1.5541,
+      "step": 187
+    },
+    {
+      "epoch": 0.23558897243107768,
+      "grad_norm": 0.15176048874855042,
+      "learning_rate": 5.36544944587899e-05,
+      "loss": 1.5486,
+      "step": 188
+    },
+    {
+      "epoch": 0.23684210526315788,
+      "grad_norm": 0.17215217649936676,
+      "learning_rate": 5.357930875926636e-05,
+      "loss": 1.5725,
+      "step": 189
+    },
+    {
+      "epoch": 0.23809523809523808,
+      "grad_norm": 0.16552142798900604,
+      "learning_rate": 5.3503733592237174e-05,
+      "loss": 1.5372,
+      "step": 190
+    },
+    {
+      "epoch": 0.23934837092731828,
+      "grad_norm": 0.1510804295539856,
+      "learning_rate": 5.342777020600321e-05,
+      "loss": 1.3789,
+      "step": 191
+    },
+    {
+      "epoch": 0.24060150375939848,
+      "grad_norm": 0.1481272578239441,
+      "learning_rate": 5.335141985527771e-05,
+      "loss": 1.7376,
+      "step": 192
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 798,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 32,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 8.052925747125289e+17,
+  "train_batch_size": 12,
+  "trial_name": null,
+  "trial_params": null
+}
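The `epoch` and `learning_rate` fields in the log follow simple closed forms, so the log can be sanity-checked.

The sketch below assumes two things:
- **Epoch:** this is a single-epoch run, so `epoch = step / max_steps`.
- **Learning rate:** the schedule is a linear warmup followed by a half-cosine decay to zero. This is the closed form used by `transformers`' `get_cosine_schedule_with_warmup` with its default `num_cycles=0.5`.

The peak learning rate (6e-5) and warmup length (25 steps) are not stated in this file. They were inferred by fitting the logged values, so treat them as assumptions until checked against `training_args.bin`.

```python
import math

# Copied from trainer_state.json above.
MAX_STEPS = 798
SAVE_STEPS = 32

# ASSUMPTIONS (inferred from the logged learning rates, not read from
# training_args.bin): peak LR and number of linear warmup steps.
PEAK_LR = 6e-5
WARMUP_STEPS = 25


def epoch_at(step: int) -> float:
    """Fractional epoch for a single-epoch run: optimizer steps done / max_steps."""
    return step / MAX_STEPS


def cosine_lr(step: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay to 0 at MAX_STEPS."""
    if step < WARMUP_STEPS:
        # Warmup branch is assumed; the log excerpt here starts after warmup.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# (step, epoch, learning_rate) triples copied from the log above.
logged = [
    (86, 0.10776942355889724, 5.9082795462904396e-05),
    (100, 0.12531328320802004, 5.861710436856577e-05),
    (192, 0.24060150375939848, 5.335141985527771e-05),
]

for step, epoch, lr in logged:
    assert math.isclose(epoch_at(step), epoch, rel_tol=1e-12), step
    assert math.isclose(cosine_lr(step), lr, rel_tol=1e-9), step

# The last logged step is a multiple of save_steps (192 = 6 * 32), which is
# consistent with this upload being a periodic checkpoint.
assert 192 % SAVE_STEPS == 0
print("log consistent with assumed schedule")
```

If the checks fail, the run probably used a different scheduler, peak learning rate, or warmup length than assumed here.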

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:84f6f65f5435bf9ae11288dbf2ea4e72b2aa90052557edfb882ee4e255779d8e
+size 6712

vocab.json ADDED

The diff for this file is too large to render.