taesiri committed on
Commit bfc2553
1 Parent(s): 2399d94

Delete single_dataset/img2json/bugsBunny-Llama-3-8B-V-img2json_dataset_5000_epochs_1_lora

single_dataset/img2json/bugsBunny-Llama-3-8B-V-img2json_dataset_5000_epochs_1_lora/README.md DELETED
@@ -1,202 +0,0 @@
- ---
- library_name: peft
- base_model: ./weights/Bunny-Llama-3-8B-V
- ---
-
- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->
-
-
-
- ## Model Details
-
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
-
-
- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
-
- ## Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ### Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
-
- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- [More Information Needed]
-
- ## Training Details
-
- ### Training Data
-
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
-
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]
-
-
- #### Training Hyperparameters
-
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
-
-
-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
- ## Environmental Impact
-
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]
-
- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]
-
- ## Model Card Contact
-
- [More Information Needed]
- ### Framework versions
-
- - PEFT 0.11.1
 
single_dataset/img2json/bugsBunny-Llama-3-8B-V-img2json_dataset_5000_epochs_1_lora/adapter_config.json DELETED
@@ -1,34 +0,0 @@
- {
- "alpha_pattern": {},
- "auto_mapping": null,
- "base_model_name_or_path": "./weights/Bunny-Llama-3-8B-V",
- "bias": "none",
- "fan_in_fan_out": false,
- "inference_mode": true,
- "init_lora_weights": true,
- "layer_replication": null,
- "layers_pattern": null,
- "layers_to_transform": null,
- "loftq_config": {},
- "lora_alpha": 256,
- "lora_dropout": 0.1,
- "megatron_config": null,
- "megatron_core": "megatron.core",
- "modules_to_save": null,
- "peft_type": "LORA",
- "r": 128,
- "rank_pattern": {},
- "revision": null,
- "target_modules": [
- "up_proj",
- "gate_proj",
- "v_proj",
- "down_proj",
- "q_proj",
- "o_proj",
- "k_proj"
- ],
- "task_type": "CAUSAL_LM",
- "use_dora": false,
- "use_rslora": false
- }
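For context, a LoRA adapter with the configuration above is normally re-attached to its base model through PEFT. The sketch below is illustrative only: the paths are assumptions taken from `base_model_name_or_path` and the deleted adapter directory name, and it uses plain `transformers`/`peft` calls rather than the Bunny repository's own loading code.

```python
# Minimal sketch (not part of this commit): loading a LoRA adapter that uses the
# configuration shown above. Paths are assumptions based on the deleted files.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_path = "./weights/Bunny-Llama-3-8B-V"  # value of base_model_name_or_path
adapter_path = "single_dataset/img2json/bugsBunny-Llama-3-8B-V-img2json_dataset_5000_epochs_1_lora"

# config.json maps the architecture to custom BunnyLlama classes (see auto_map),
# so loading the base model requires trusting that custom code.
base = AutoModelForCausalLM.from_pretrained(
    base_path,
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(base_path)

# Apply the LoRA weights on top of the base model (r=128, lora_alpha=256,
# dropout 0.1, targeting the attention and MLP projections in target_modules).
model = PeftModel.from_pretrained(base, adapter_path)
model.eval()
```

Note that the adapter directory also held `non_lora_trainables.bin` (weights trained outside the LoRA modules); a plain `PeftModel.from_pretrained` call restores only the LoRA weights, so full multimodal use would still go through the Bunny project's own loading utilities.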
 
single_dataset/img2json/bugsBunny-Llama-3-8B-V-img2json_dataset_5000_epochs_1_lora/adapter_model.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:65ff6fb22dfaec71c00b932805e5dee92f9b92d22dc3cd5f6f58dcdcafd3d949
- size 671150064
 
single_dataset/img2json/bugsBunny-Llama-3-8B-V-img2json_dataset_5000_epochs_1_lora/config.json DELETED
@@ -1,45 +0,0 @@
- {
- "_name_or_path": "./weights/Bunny-Llama-3-8B-V",
- "architectures": [
- "BunnyLlamaForCausalLM"
- ],
- "attention_bias": false,
- "attention_dropout": 0.0,
- "auto_map": {
- "AutoConfig": "configuration_bunny_llama.BunnyLlamaConfig",
- "AutoModelForCausalLM": "modeling_bunny_llama.BunnyLlamaForCausalLM"
- },
- "bos_token_id": 128000,
- "continuous_training": false,
- "eos_token_id": 128001,
- "freeze_mm_mlp_adapter": false,
- "hidden_act": "silu",
- "hidden_size": 4096,
- "image_aspect_ratio": "pad",
- "initializer_range": 0.02,
- "intermediate_size": 14336,
- "max_position_embeddings": 8192,
- "mm_hidden_size": 1152,
- "mm_projector_lr": null,
- "mm_projector_type": "mlp2x_gelu",
- "mm_vision_tower": "./weights/siglip-so400m-patch14-384",
- "model_type": "bunny-llama",
- "num_attention_heads": 32,
- "num_hidden_layers": 32,
- "num_key_value_heads": 8,
- "pretraining_tp": 1,
- "rms_norm_eps": 1e-05,
- "rope_scaling": null,
- "rope_theta": 500000.0,
- "tie_word_embeddings": false,
- "tokenizer_model_max_length": 2048,
- "tokenizer_padding_side": "right",
- "torch_dtype": "float16",
- "transformers_version": "4.41.2",
- "tune_mm_mlp_adapter": false,
- "unfreeze_vision_tower": true,
- "use_cache": true,
- "use_mm_proj": true,
- "use_s2": false,
- "vocab_size": 128256
- }
 
single_dataset/img2json/bugsBunny-Llama-3-8B-V-img2json_dataset_5000_epochs_1_lora/non_lora_trainables.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:2d7005b204a93bd5f662d33d61974e83298a58f3195ae20f0ace5eb2d7251256
- size 899633034
 
single_dataset/img2json/bugsBunny-Llama-3-8B-V-img2json_dataset_5000_epochs_1_lora/trainer_state.json DELETED
@@ -1,2226 +0,0 @@
1
- {
2
- "best_metric": null,
3
- "best_model_checkpoint": null,
4
- "epoch": 0.9984,
5
- "eval_steps": 500,
6
- "global_step": 312,
7
- "is_hyper_param_search": false,
8
- "is_local_process_zero": true,
9
- "is_world_process_zero": true,
10
- "log_history": [
11
- {
12
- "epoch": 0.0032,
13
- "grad_norm": 0.9456431410164449,
14
- "learning_rate": 2e-05,
15
- "loss": 1.4684,
16
- "step": 1
17
- },
18
- {
19
- "epoch": 0.0064,
20
- "grad_norm": 0.9595554488188345,
21
- "learning_rate": 4e-05,
22
- "loss": 1.3749,
23
- "step": 2
24
- },
25
- {
26
- "epoch": 0.0096,
27
- "grad_norm": 0.7819824872221672,
28
- "learning_rate": 6e-05,
29
- "loss": 1.3507,
30
- "step": 3
31
- },
32
- {
33
- "epoch": 0.0128,
34
- "grad_norm": 0.9167023815526777,
35
- "learning_rate": 8e-05,
36
- "loss": 1.2165,
37
- "step": 4
38
- },
39
- {
40
- "epoch": 0.016,
41
- "grad_norm": 0.847495342554959,
42
- "learning_rate": 0.0001,
43
- "loss": 1.1304,
44
- "step": 5
45
- },
46
- {
47
- "epoch": 0.0192,
48
- "grad_norm": 0.8818427732884414,
49
- "learning_rate": 0.00012,
50
- "loss": 0.9715,
51
- "step": 6
52
- },
53
- {
54
- "epoch": 0.0224,
55
- "grad_norm": 0.7726751380491141,
56
- "learning_rate": 0.00014,
57
- "loss": 0.9336,
58
- "step": 7
59
- },
60
- {
61
- "epoch": 0.0256,
62
- "grad_norm": 0.6483561690537337,
63
- "learning_rate": 0.00016,
64
- "loss": 0.8903,
65
- "step": 8
66
- },
67
- {
68
- "epoch": 0.0288,
69
- "grad_norm": 0.5687921838840364,
70
- "learning_rate": 0.00018,
71
- "loss": 0.7977,
72
- "step": 9
73
- },
74
- {
75
- "epoch": 0.032,
76
- "grad_norm": 0.4217197943554311,
77
- "learning_rate": 0.0002,
78
- "loss": 0.8058,
79
- "step": 10
80
- },
81
- {
82
- "epoch": 0.0352,
83
- "grad_norm": 0.39301658901144887,
84
- "learning_rate": 0.00019999458931878073,
85
- "loss": 0.8494,
86
- "step": 11
87
- },
88
- {
89
- "epoch": 0.0384,
90
- "grad_norm": 0.4590183081730723,
91
- "learning_rate": 0.0001999783578606323,
92
- "loss": 0.8636,
93
- "step": 12
94
- },
95
- {
96
- "epoch": 0.0416,
97
- "grad_norm": 0.41874721337638304,
98
- "learning_rate": 0.00019995130738201966,
99
- "loss": 0.8045,
100
- "step": 13
101
- },
102
- {
103
- "epoch": 0.0448,
104
- "grad_norm": 0.36156401408882166,
105
- "learning_rate": 0.0001999134408101731,
106
- "loss": 0.7129,
107
- "step": 14
108
- },
109
- {
110
- "epoch": 0.048,
111
- "grad_norm": 0.43359389552236977,
112
- "learning_rate": 0.00019986476224277165,
113
- "loss": 0.737,
114
- "step": 15
115
- },
116
- {
117
- "epoch": 0.0512,
118
- "grad_norm": 0.3732292263108806,
119
- "learning_rate": 0.00019980527694749952,
120
- "loss": 0.721,
121
- "step": 16
122
- },
123
- {
124
- "epoch": 0.0544,
125
- "grad_norm": 0.3672574865848022,
126
- "learning_rate": 0.00019973499136147606,
127
- "loss": 0.768,
128
- "step": 17
129
- },
130
- {
131
- "epoch": 0.0576,
132
- "grad_norm": 0.3788969898154259,
133
- "learning_rate": 0.0001996539130905593,
134
- "loss": 0.6994,
135
- "step": 18
136
- },
137
- {
138
- "epoch": 0.0608,
139
- "grad_norm": 0.3591018825770351,
140
- "learning_rate": 0.0001995620509085228,
141
- "loss": 0.7546,
142
- "step": 19
143
- },
144
- {
145
- "epoch": 0.064,
146
- "grad_norm": 0.3347980700973235,
147
- "learning_rate": 0.00019945941475610623,
148
- "loss": 0.7799,
149
- "step": 20
150
- },
151
- {
152
- "epoch": 0.0672,
153
- "grad_norm": 0.3176137835338356,
154
- "learning_rate": 0.0001993460157399396,
155
- "loss": 0.718,
156
- "step": 21
157
- },
158
- {
159
- "epoch": 0.0704,
160
- "grad_norm": 0.3110224161054545,
161
- "learning_rate": 0.0001992218661313415,
162
- "loss": 0.6712,
163
- "step": 22
164
- },
165
- {
166
- "epoch": 0.0736,
167
- "grad_norm": 0.3176740651877855,
168
- "learning_rate": 0.00019908697936499103,
169
- "loss": 0.7517,
170
- "step": 23
171
- },
172
- {
173
- "epoch": 0.0768,
174
- "grad_norm": 0.31311011452703175,
175
- "learning_rate": 0.00019894137003747403,
176
- "loss": 0.7431,
177
- "step": 24
178
- },
179
- {
180
- "epoch": 0.08,
181
- "grad_norm": 0.31035422091689385,
182
- "learning_rate": 0.00019878505390570362,
183
- "loss": 0.7121,
184
- "step": 25
185
- },
186
- {
187
- "epoch": 0.0832,
188
- "grad_norm": 0.2986181957125755,
189
- "learning_rate": 0.00019861804788521493,
190
- "loss": 0.6987,
191
- "step": 26
192
- },
193
- {
194
- "epoch": 0.0864,
195
- "grad_norm": 0.32611739837441994,
196
- "learning_rate": 0.00019844037004833473,
197
- "loss": 0.7732,
198
- "step": 27
199
- },
200
- {
201
- "epoch": 0.0896,
202
- "grad_norm": 0.290949601128612,
203
- "learning_rate": 0.00019825203962222572,
204
- "loss": 0.6783,
205
- "step": 28
206
- },
207
- {
208
- "epoch": 0.0928,
209
- "grad_norm": 0.30951784533907173,
210
- "learning_rate": 0.0001980530769868059,
211
- "loss": 0.7126,
212
- "step": 29
213
- },
214
- {
215
- "epoch": 0.096,
216
- "grad_norm": 0.30642887130649027,
217
- "learning_rate": 0.00019784350367254322,
218
- "loss": 0.6716,
219
- "step": 30
220
- },
221
- {
222
- "epoch": 0.0992,
223
- "grad_norm": 0.3253752205567512,
224
- "learning_rate": 0.0001976233423581255,
225
- "loss": 0.7618,
226
- "step": 31
227
- },
228
- {
229
- "epoch": 0.1024,
230
- "grad_norm": 0.3074017490574547,
231
- "learning_rate": 0.0001973926168680066,
232
- "loss": 0.6863,
233
- "step": 32
234
- },
235
- {
236
- "epoch": 0.1056,
237
- "grad_norm": 0.2991620525987948,
238
- "learning_rate": 0.00019715135216982798,
239
- "loss": 0.7061,
240
- "step": 33
241
- },
242
- {
243
- "epoch": 0.1088,
244
- "grad_norm": 0.3226295279455581,
245
- "learning_rate": 0.0001968995743717171,
246
- "loss": 0.7164,
247
- "step": 34
248
- },
249
- {
250
- "epoch": 0.112,
251
- "grad_norm": 0.2942063456789797,
252
- "learning_rate": 0.00019663731071946206,
253
- "loss": 0.6732,
254
- "step": 35
255
- },
256
- {
257
- "epoch": 0.1152,
258
- "grad_norm": 0.30145830255706224,
259
- "learning_rate": 0.00019636458959356316,
260
- "loss": 0.6917,
261
- "step": 36
262
- },
263
- {
264
- "epoch": 0.1184,
265
- "grad_norm": 0.2976180783817878,
266
- "learning_rate": 0.0001960814405061619,
267
- "loss": 0.6688,
268
- "step": 37
269
- },
270
- {
271
- "epoch": 0.1216,
272
- "grad_norm": 0.28814304785348366,
273
- "learning_rate": 0.00019578789409784727,
274
- "loss": 0.6852,
275
- "step": 38
276
- },
277
- {
278
- "epoch": 0.1248,
279
- "grad_norm": 0.2968046184063759,
280
- "learning_rate": 0.00019548398213434007,
281
- "loss": 0.6827,
282
- "step": 39
283
- },
284
- {
285
- "epoch": 0.128,
286
- "grad_norm": 0.2848652475342455,
287
- "learning_rate": 0.00019516973750305532,
288
- "loss": 0.6905,
289
- "step": 40
290
- },
291
- {
292
- "epoch": 0.1312,
293
- "grad_norm": 0.29541131745521493,
294
- "learning_rate": 0.00019484519420954354,
295
- "loss": 0.7409,
296
- "step": 41
297
- },
298
- {
299
- "epoch": 0.1344,
300
- "grad_norm": 0.30472745054693,
301
- "learning_rate": 0.00019451038737381077,
302
- "loss": 0.6976,
303
- "step": 42
304
- },
305
- {
306
- "epoch": 0.1376,
307
- "grad_norm": 0.287978936057206,
308
- "learning_rate": 0.00019416535322651818,
309
- "loss": 0.6889,
310
- "step": 43
311
- },
312
- {
313
- "epoch": 0.1408,
314
- "grad_norm": 0.2917374421163095,
315
- "learning_rate": 0.00019381012910506146,
316
- "loss": 0.7072,
317
- "step": 44
318
- },
319
- {
320
- "epoch": 0.144,
321
- "grad_norm": 0.3014118067717161,
322
- "learning_rate": 0.00019344475344953012,
323
- "loss": 0.736,
324
- "step": 45
325
- },
326
- {
327
- "epoch": 0.1472,
328
- "grad_norm": 0.27940318160556293,
329
- "learning_rate": 0.00019306926579854821,
330
- "loss": 0.6561,
331
- "step": 46
332
- },
333
- {
334
- "epoch": 0.1504,
335
- "grad_norm": 0.2764033621334266,
336
- "learning_rate": 0.00019268370678499533,
337
- "loss": 0.6425,
338
- "step": 47
339
- },
340
- {
341
- "epoch": 0.1536,
342
- "grad_norm": 0.286603211032781,
343
- "learning_rate": 0.0001922881181316097,
344
- "loss": 0.6984,
345
- "step": 48
346
- },
347
- {
348
- "epoch": 0.1568,
349
- "grad_norm": 0.29296348821168333,
350
- "learning_rate": 0.00019188254264647337,
351
- "loss": 0.6898,
352
- "step": 49
353
- },
354
- {
355
- "epoch": 0.16,
356
- "grad_norm": 0.28097584337963444,
357
- "learning_rate": 0.0001914670242183795,
358
- "loss": 0.6858,
359
- "step": 50
360
- },
361
- {
362
- "epoch": 0.1632,
363
- "grad_norm": 0.30154798580564207,
364
- "learning_rate": 0.0001910416078120832,
365
- "loss": 0.7145,
366
- "step": 51
367
- },
368
- {
369
- "epoch": 0.1664,
370
- "grad_norm": 0.30795671266773783,
371
- "learning_rate": 0.0001906063394634356,
372
- "loss": 0.7126,
373
- "step": 52
374
- },
375
- {
376
- "epoch": 0.1696,
377
- "grad_norm": 0.28661331080986213,
378
- "learning_rate": 0.00019016126627440237,
379
- "loss": 0.6722,
380
- "step": 53
381
- },
382
- {
383
- "epoch": 0.1728,
384
- "grad_norm": 0.27995432204171594,
385
- "learning_rate": 0.00018970643640796642,
386
- "loss": 0.6533,
387
- "step": 54
388
- },
389
- {
390
- "epoch": 0.176,
391
- "grad_norm": 0.2881701437861025,
392
- "learning_rate": 0.000189241899082916,
393
- "loss": 0.6864,
394
- "step": 55
395
- },
396
- {
397
- "epoch": 0.1792,
398
- "grad_norm": 0.2863440297333805,
399
- "learning_rate": 0.00018876770456851877,
400
- "loss": 0.6545,
401
- "step": 56
402
- },
403
- {
404
- "epoch": 0.1824,
405
- "grad_norm": 0.2880700843682891,
406
- "learning_rate": 0.0001882839041790818,
407
- "loss": 0.6694,
408
- "step": 57
409
- },
410
- {
411
- "epoch": 0.1856,
412
- "grad_norm": 0.2774792629615745,
413
- "learning_rate": 0.00018779055026839868,
414
- "loss": 0.6505,
415
- "step": 58
416
- },
417
- {
418
- "epoch": 0.1888,
419
- "grad_norm": 0.2737154598370057,
420
- "learning_rate": 0.00018728769622408423,
421
- "loss": 0.6388,
422
- "step": 59
423
- },
424
- {
425
- "epoch": 0.192,
426
- "grad_norm": 0.27952240531043854,
427
- "learning_rate": 0.00018677539646179707,
428
- "loss": 0.6802,
429
- "step": 60
430
- },
431
- {
432
- "epoch": 0.1952,
433
- "grad_norm": 0.28225508446141556,
434
- "learning_rate": 0.00018625370641935129,
435
- "loss": 0.6638,
436
- "step": 61
437
- },
438
- {
439
- "epoch": 0.1984,
440
- "grad_norm": 0.28445391301141404,
441
- "learning_rate": 0.00018572268255071718,
442
- "loss": 0.6997,
443
- "step": 62
444
- },
445
- {
446
- "epoch": 0.2016,
447
- "grad_norm": 0.27249304099350813,
448
- "learning_rate": 0.00018518238231991218,
449
- "loss": 0.665,
450
- "step": 63
451
- },
452
- {
453
- "epoch": 0.2048,
454
- "grad_norm": 0.27827800737681285,
455
- "learning_rate": 0.00018463286419478255,
456
- "loss": 0.6747,
457
- "step": 64
458
- },
459
- {
460
- "epoch": 0.208,
461
- "grad_norm": 0.28236239197955565,
462
- "learning_rate": 0.00018407418764067627,
463
- "loss": 0.6804,
464
- "step": 65
465
- },
466
- {
467
- "epoch": 0.2112,
468
- "grad_norm": 0.27711713330173016,
469
- "learning_rate": 0.00018350641311400812,
470
- "loss": 0.6825,
471
- "step": 66
472
- },
473
- {
474
- "epoch": 0.2144,
475
- "grad_norm": 0.27559978427300996,
476
- "learning_rate": 0.0001829296020557174,
477
- "loss": 0.6893,
478
- "step": 67
479
- },
480
- {
481
- "epoch": 0.2176,
482
- "grad_norm": 0.28871955197128163,
483
- "learning_rate": 0.00018234381688461942,
484
- "loss": 0.6574,
485
- "step": 68
486
- },
487
- {
488
- "epoch": 0.2208,
489
- "grad_norm": 0.2840675164080103,
490
- "learning_rate": 0.0001817491209906506,
491
- "loss": 0.7286,
492
- "step": 69
493
- },
494
- {
495
- "epoch": 0.224,
496
- "grad_norm": 0.3083968957018995,
497
- "learning_rate": 0.00018114557872800905,
498
- "loss": 0.6578,
499
- "step": 70
500
- },
501
- {
502
- "epoch": 0.2272,
503
- "grad_norm": 0.2843871593471055,
504
- "learning_rate": 0.00018053325540819045,
505
- "loss": 0.6792,
506
- "step": 71
507
- },
508
- {
509
- "epoch": 0.2304,
510
- "grad_norm": 0.2791531418567351,
511
- "learning_rate": 0.0001799122172929206,
512
- "loss": 0.6779,
513
- "step": 72
514
- },
515
- {
516
- "epoch": 0.2336,
517
- "grad_norm": 0.28436440881242775,
518
- "learning_rate": 0.00017928253158698473,
519
- "loss": 0.6461,
520
- "step": 73
521
- },
522
- {
523
- "epoch": 0.2368,
524
- "grad_norm": 0.28210657946106504,
525
- "learning_rate": 0.0001786442664309554,
526
- "loss": 0.6376,
527
- "step": 74
528
- },
529
- {
530
- "epoch": 0.24,
531
- "grad_norm": 0.27471101989657526,
532
- "learning_rate": 0.0001779974908938184,
533
- "loss": 0.6589,
534
- "step": 75
535
- },
536
- {
537
- "epoch": 0.2432,
538
- "grad_norm": 0.27255559553885733,
539
- "learning_rate": 0.0001773422749654988,
540
- "loss": 0.6624,
541
- "step": 76
542
- },
543
- {
544
- "epoch": 0.2464,
545
- "grad_norm": 0.28238069396965426,
546
- "learning_rate": 0.00017667868954928694,
547
- "loss": 0.665,
548
- "step": 77
549
- },
550
- {
551
- "epoch": 0.2496,
552
- "grad_norm": 0.28091292280312435,
553
- "learning_rate": 0.00017600680645416583,
554
- "loss": 0.6582,
555
- "step": 78
556
- },
557
- {
558
- "epoch": 0.2528,
559
- "grad_norm": 0.2819117395356436,
560
- "learning_rate": 0.00017532669838704035,
561
- "loss": 0.681,
562
- "step": 79
563
- },
564
- {
565
- "epoch": 0.256,
566
- "grad_norm": 0.2808167623131368,
567
- "learning_rate": 0.00017463843894486937,
568
- "loss": 0.6635,
569
- "step": 80
570
- },
571
- {
572
- "epoch": 0.2592,
573
- "grad_norm": 0.2727980766577742,
574
- "learning_rate": 0.0001739421026067017,
575
- "loss": 0.6387,
576
- "step": 81
577
- },
578
- {
579
- "epoch": 0.2624,
580
- "grad_norm": 0.27946610199292726,
581
- "learning_rate": 0.00017323776472561627,
582
- "loss": 0.6636,
583
- "step": 82
584
- },
585
- {
586
- "epoch": 0.2656,
587
- "grad_norm": 0.2776930025762807,
588
- "learning_rate": 0.00017252550152056795,
589
- "loss": 0.6629,
590
- "step": 83
591
- },
592
- {
593
- "epoch": 0.2688,
594
- "grad_norm": 0.29547305795196493,
595
- "learning_rate": 0.0001718053900681397,
596
- "loss": 0.7105,
597
- "step": 84
598
- },
599
- {
600
- "epoch": 0.272,
601
- "grad_norm": 0.27992559053351135,
602
- "learning_rate": 0.00017107750829420176,
603
- "loss": 0.6217,
604
- "step": 85
605
- },
606
- {
607
- "epoch": 0.2752,
608
- "grad_norm": 0.2836978378018261,
609
- "learning_rate": 0.00017034193496547902,
610
- "loss": 0.6364,
611
- "step": 86
612
- },
613
- {
614
- "epoch": 0.2784,
615
- "grad_norm": 0.2859793795996647,
616
- "learning_rate": 0.00016959874968102735,
617
- "loss": 0.6611,
618
- "step": 87
619
- },
620
- {
621
- "epoch": 0.2816,
622
- "grad_norm": 0.27670508997854193,
623
- "learning_rate": 0.00016884803286362,
624
- "loss": 0.6321,
625
- "step": 88
626
- },
627
- {
628
- "epoch": 0.2848,
629
- "grad_norm": 0.2804552643700331,
630
- "learning_rate": 0.00016808986575104465,
631
- "loss": 0.654,
632
- "step": 89
633
- },
634
- {
635
- "epoch": 0.288,
636
- "grad_norm": 0.2988662193267764,
637
- "learning_rate": 0.00016732433038731242,
638
- "loss": 0.6631,
639
- "step": 90
640
- },
641
- {
642
- "epoch": 0.2912,
643
- "grad_norm": 0.2850945559743476,
644
- "learning_rate": 0.0001665515096137797,
645
- "loss": 0.6546,
646
- "step": 91
647
- },
648
- {
649
- "epoch": 0.2944,
650
- "grad_norm": 0.34508854455612903,
651
- "learning_rate": 0.00016577148706018328,
652
- "loss": 0.6181,
653
- "step": 92
654
- },
655
- {
656
- "epoch": 0.2976,
657
- "grad_norm": 0.29772302651197313,
658
- "learning_rate": 0.00016498434713559088,
659
- "loss": 0.6953,
660
- "step": 93
661
- },
662
- {
663
- "epoch": 0.3008,
664
- "grad_norm": 0.2814403372374441,
665
- "learning_rate": 0.00016419017501926656,
666
- "loss": 0.6711,
667
- "step": 94
668
- },
669
- {
670
- "epoch": 0.304,
671
- "grad_norm": 0.2666134049420953,
672
- "learning_rate": 0.0001633890566514535,
673
- "loss": 0.6384,
674
- "step": 95
675
- },
676
- {
677
- "epoch": 0.3072,
678
- "grad_norm": 0.2759217127531887,
679
- "learning_rate": 0.00016258107872407375,
680
- "loss": 0.5927,
681
- "step": 96
682
- },
683
- {
684
- "epoch": 0.3104,
685
- "grad_norm": 0.27234087211717334,
686
- "learning_rate": 0.0001617663286713474,
687
- "loss": 0.6047,
688
- "step": 97
689
- },
690
- {
691
- "epoch": 0.3136,
692
- "grad_norm": 0.28929394588800267,
693
- "learning_rate": 0.00016094489466033043,
694
- "loss": 0.709,
695
- "step": 98
696
- },
697
- {
698
- "epoch": 0.3168,
699
- "grad_norm": 0.27285248521549677,
700
- "learning_rate": 0.00016011686558137448,
701
- "loss": 0.6275,
702
- "step": 99
703
- },
704
- {
705
- "epoch": 0.32,
706
- "grad_norm": 0.27087214179135655,
707
- "learning_rate": 0.0001592823310385073,
708
- "loss": 0.6288,
709
- "step": 100
710
- },
711
- {
712
- "epoch": 0.3232,
713
- "grad_norm": 0.2759835535783692,
714
- "learning_rate": 0.0001584413813397364,
715
- "loss": 0.6529,
716
- "step": 101
717
- },
718
- {
719
- "epoch": 0.3264,
720
- "grad_norm": 0.2732937178984582,
721
- "learning_rate": 0.00015759410748727662,
722
- "loss": 0.6443,
723
- "step": 102
724
- },
725
- {
726
- "epoch": 0.3296,
727
- "grad_norm": 0.27150370023008635,
728
- "learning_rate": 0.00015674060116770236,
729
- "loss": 0.6401,
730
- "step": 103
731
- },
732
- {
733
- "epoch": 0.3328,
734
- "grad_norm": 0.288480165883741,
735
- "learning_rate": 0.00015588095474202595,
736
- "loss": 0.661,
737
- "step": 104
738
- },
739
- {
740
- "epoch": 0.336,
741
- "grad_norm": 0.2715185186712244,
742
- "learning_rate": 0.00015501526123570277,
743
- "loss": 0.5861,
744
- "step": 105
745
- },
746
- {
747
- "epoch": 0.3392,
748
- "grad_norm": 0.2890522618399928,
749
- "learning_rate": 0.00015414361432856475,
750
- "loss": 0.6547,
751
- "step": 106
752
- },
753
- {
754
- "epoch": 0.3424,
755
- "grad_norm": 0.2809942944669504,
756
- "learning_rate": 0.0001532661083446829,
757
- "loss": 0.6502,
758
- "step": 107
759
- },
760
- {
761
- "epoch": 0.3456,
762
- "grad_norm": 0.2770621559504025,
763
- "learning_rate": 0.00015238283824216015,
764
- "loss": 0.651,
765
- "step": 108
766
- },
767
- {
768
- "epoch": 0.3488,
769
- "grad_norm": 0.28643434801423096,
770
- "learning_rate": 0.00015149389960285558,
771
- "loss": 0.6717,
772
- "step": 109
773
- },
774
- {
775
- "epoch": 0.352,
776
- "grad_norm": 0.2838267721875221,
777
- "learning_rate": 0.00015059938862204127,
778
- "loss": 0.6666,
779
- "step": 110
780
- },
781
- {
782
- "epoch": 0.3552,
783
- "grad_norm": 0.2904515511293253,
784
- "learning_rate": 0.00014969940209799248,
785
- "loss": 0.6788,
786
- "step": 111
787
- },
788
- {
789
- "epoch": 0.3584,
790
- "grad_norm": 0.27521059578174806,
791
- "learning_rate": 0.00014879403742151283,
792
- "loss": 0.6421,
793
- "step": 112
794
- },
795
- {
796
- "epoch": 0.3616,
797
- "grad_norm": 0.28435477767209977,
798
- "learning_rate": 0.00014788339256539544,
799
- "loss": 0.6806,
800
- "step": 113
801
- },
802
- {
803
- "epoch": 0.3648,
804
- "grad_norm": 0.2836689862948711,
805
- "learning_rate": 0.0001469675660738206,
806
- "loss": 0.6633,
807
- "step": 114
808
- },
809
- {
810
- "epoch": 0.368,
811
- "grad_norm": 0.27993720267562205,
812
- "learning_rate": 0.00014604665705169237,
813
- "loss": 0.6031,
814
- "step": 115
815
- },
816
- {
817
- "epoch": 0.3712,
818
- "grad_norm": 0.26823803322012507,
819
- "learning_rate": 0.00014512076515391375,
820
- "loss": 0.6258,
821
- "step": 116
822
- },
823
- {
824
- "epoch": 0.3744,
825
- "grad_norm": 0.2755753433763202,
826
- "learning_rate": 0.00014418999057460276,
827
- "loss": 0.6238,
828
- "step": 117
829
- },
830
- {
831
- "epoch": 0.3776,
832
- "grad_norm": 0.26378813575701704,
833
- "learning_rate": 0.0001432544340362501,
834
- "loss": 0.6105,
835
- "step": 118
836
- },
837
- {
838
- "epoch": 0.3808,
839
- "grad_norm": 0.27807647013178804,
840
- "learning_rate": 0.00014231419677881966,
841
- "loss": 0.634,
842
- "step": 119
843
- },
844
- {
845
- "epoch": 0.384,
846
- "grad_norm": 0.2850302043584264,
847
- "learning_rate": 0.00014136938054879283,
848
- "loss": 0.6473,
849
- "step": 120
850
- },
851
- {
852
- "epoch": 0.3872,
853
- "grad_norm": 0.2842021607325469,
854
- "learning_rate": 0.00014042008758815818,
855
- "loss": 0.6666,
856
- "step": 121
857
- },
858
- {
859
- "epoch": 0.3904,
860
- "grad_norm": 0.2646241757350502,
861
- "learning_rate": 0.00013946642062334766,
862
- "loss": 0.5882,
863
- "step": 122
864
- },
865
- {
866
- "epoch": 0.3936,
867
- "grad_norm": 0.27553697595802684,
868
- "learning_rate": 0.00013850848285411994,
869
- "loss": 0.6553,
870
- "step": 123
871
- },
872
- {
873
- "epoch": 0.3968,
874
- "grad_norm": 0.266869763753527,
875
- "learning_rate": 0.000137546377942393,
876
- "loss": 0.6334,
877
- "step": 124
878
- },
879
- {
880
- "epoch": 0.4,
881
- "grad_norm": 0.2891373420552467,
882
- "learning_rate": 0.00013658021000102636,
883
- "loss": 0.6232,
884
- "step": 125
885
- },
886
- {
887
- "epoch": 0.4032,
888
- "grad_norm": 0.29054049719448544,
889
- "learning_rate": 0.00013561008358255468,
890
- "loss": 0.6685,
891
- "step": 126
892
- },
893
- {
894
- "epoch": 0.4064,
895
- "grad_norm": 0.26991808658939265,
896
- "learning_rate": 0.00013463610366787392,
897
- "loss": 0.6451,
898
- "step": 127
899
- },
900
- {
901
- "epoch": 0.4096,
902
- "grad_norm": 0.2826019428481463,
903
- "learning_rate": 0.00013365837565488064,
904
- "loss": 0.6748,
905
- "step": 128
906
- },
907
- {
908
- "epoch": 0.4128,
909
- "grad_norm": 0.26989180549288744,
910
- "learning_rate": 0.0001326770053470668,
911
- "loss": 0.6366,
912
- "step": 129
913
- },
914
- {
915
- "epoch": 0.416,
916
- "grad_norm": 0.2693716385431619,
917
- "learning_rate": 0.0001316920989420703,
918
- "loss": 0.6365,
919
- "step": 130
920
- },
921
- {
922
- "epoch": 0.4192,
923
- "grad_norm": 0.2587716538530605,
924
- "learning_rate": 0.00013070376302018287,
925
- "loss": 0.5821,
926
- "step": 131
927
- },
928
- {
929
- "epoch": 0.4224,
930
- "grad_norm": 0.2733425696050926,
931
- "learning_rate": 0.00012971210453281674,
932
- "loss": 0.6601,
933
- "step": 132
934
- },
935
- {
936
- "epoch": 0.4256,
937
- "grad_norm": 0.28260898546044794,
938
- "learning_rate": 0.000128717230790931,
939
- "loss": 0.6598,
940
- "step": 133
941
- },
942
- {
943
- "epoch": 0.4288,
944
- "grad_norm": 0.26073598361962874,
945
- "learning_rate": 0.00012771924945341906,
946
- "loss": 0.6062,
947
- "step": 134
948
- },
949
- {
950
- "epoch": 0.432,
951
- "grad_norm": 0.26672014153172025,
952
- "learning_rate": 0.00012671826851545851,
953
- "loss": 0.6664,
954
- "step": 135
955
- },
956
- {
957
- "epoch": 0.4352,
958
- "grad_norm": 0.27064468874961567,
959
- "learning_rate": 0.0001257143962968246,
960
- "loss": 0.6409,
961
- "step": 136
962
- },
963
- {
964
- "epoch": 0.4384,
965
- "grad_norm": 0.2678777921010367,
966
- "learning_rate": 0.00012470774143016853,
967
- "loss": 0.6146,
968
- "step": 137
969
- },
970
- {
971
- "epoch": 0.4416,
972
- "grad_norm": 0.2831646298939026,
973
- "learning_rate": 0.00012369841284926188,
974
- "loss": 0.6641,
975
- "step": 138
976
- },
977
- {
978
- "epoch": 0.4448,
979
- "grad_norm": 0.2863253592057525,
980
- "learning_rate": 0.00012268651977720866,
981
- "loss": 0.6653,
982
- "step": 139
983
- },
984
- {
985
- "epoch": 0.448,
986
- "grad_norm": 0.26496566477700495,
987
- "learning_rate": 0.00012167217171462566,
988
- "loss": 0.6061,
989
- "step": 140
990
- },
991
- {
992
- "epoch": 0.4512,
993
- "grad_norm": 0.2741590428796881,
994
- "learning_rate": 0.0001206554784277931,
995
- "loss": 0.683,
996
- "step": 141
997
- },
998
- {
999
- "epoch": 0.4544,
1000
- "grad_norm": 0.274967394883945,
1001
- "learning_rate": 0.00011963654993677645,
1002
- "loss": 0.6738,
1003
- "step": 142
1004
- },
1005
- {
1006
- "epoch": 0.4576,
1007
- "grad_norm": 0.26748184911275685,
1008
- "learning_rate": 0.00011861549650352069,
1009
- "loss": 0.6259,
1010
- "step": 143
1011
- },
1012
- {
1013
- "epoch": 0.4608,
1014
- "grad_norm": 0.27571463833666354,
1015
- "learning_rate": 0.00011759242861991855,
1016
- "loss": 0.6824,
1017
- "step": 144
1018
- },
1019
- {
1020
- "epoch": 0.464,
1021
- "grad_norm": 0.28204383254583004,
1022
- "learning_rate": 0.00011656745699585371,
1023
- "loss": 0.654,
1024
- "step": 145
1025
- },
1026
- {
1027
- "epoch": 0.4672,
1028
- "grad_norm": 0.2737273526998308,
1029
- "learning_rate": 0.00011554069254722051,
1030
- "loss": 0.6383,
1031
- "step": 146
1032
- },
1033
- {
1034
- "epoch": 0.4704,
1035
- "grad_norm": 0.26718453878335136,
1036
- "learning_rate": 0.00011451224638392129,
1037
- "loss": 0.6336,
1038
- "step": 147
1039
- },
1040
- {
1041
- "epoch": 0.4736,
1042
- "grad_norm": 0.28928334642647074,
1043
- "learning_rate": 0.00011348222979784289,
1044
- "loss": 0.6328,
1045
- "step": 148
1046
- },
1047
- {
1048
- "epoch": 0.4768,
1049
- "grad_norm": 0.27269344035826515,
1050
- "learning_rate": 0.00011245075425081328,
1051
- "loss": 0.6261,
1052
- "step": 149
1053
- },
1054
- {
1055
- "epoch": 0.48,
1056
- "grad_norm": 0.27361132880394723,
1057
- "learning_rate": 0.00011141793136253986,
1058
- "loss": 0.6423,
1059
- "step": 150
1060
- },
1061
- {
1062
- "epoch": 0.4832,
1063
- "grad_norm": 0.2666674720378228,
1064
- "learning_rate": 0.0001103838728985307,
1065
- "loss": 0.6397,
1066
- "step": 151
1067
- },
1068
- {
1069
- "epoch": 0.4864,
1070
- "grad_norm": 0.26339311415617495,
1071
- "learning_rate": 0.000109348690758,
1072
- "loss": 0.6184,
1073
- "step": 152
1074
- },
1075
- {
1076
- "epoch": 0.4896,
1077
- "grad_norm": 0.30511330160542777,
1078
- "learning_rate": 0.00010831249696175918,
1079
- "loss": 0.631,
1080
- "step": 153
1081
- },
1082
- {
1083
- "epoch": 0.4928,
1084
- "grad_norm": 0.27393530074277894,
1085
- "learning_rate": 0.0001072754036400944,
1086
- "loss": 0.64,
1087
- "step": 154
1088
- },
1089
- {
1090
- "epoch": 0.496,
1091
- "grad_norm": 0.2630950654462822,
1092
- "learning_rate": 0.00010623752302063283,
1093
- "loss": 0.6256,
1094
- "step": 155
1095
- },
1096
- {
1097
- "epoch": 0.4992,
1098
- "grad_norm": 0.26558776558003316,
1099
- "learning_rate": 0.00010519896741619803,
1100
- "loss": 0.6329,
1101
- "step": 156
1102
- },
1103
- {
1104
- "epoch": 0.5024,
1105
- "grad_norm": 0.27737305767773224,
1106
- "learning_rate": 0.00010415984921265609,
1107
- "loss": 0.6808,
1108
- "step": 157
1109
- },
1110
- {
1111
- "epoch": 0.5056,
1112
- "grad_norm": 0.34438277093807235,
1113
- "learning_rate": 0.00010312028085675391,
1114
- "loss": 0.628,
1115
- "step": 158
1116
- },
1117
- {
1118
- "epoch": 0.5088,
1119
- "grad_norm": 0.2716378054950986,
1120
- "learning_rate": 0.00010208037484395114,
1121
- "loss": 0.6197,
1122
- "step": 159
1123
- },
1124
- {
1125
- "epoch": 0.512,
1126
- "grad_norm": 0.25713458936107125,
1127
- "learning_rate": 0.00010104024370624644,
1128
- "loss": 0.6055,
1129
- "step": 160
1130
- },
1131
- {
1132
- "epoch": 0.5152,
1133
- "grad_norm": 0.2556414802851472,
1134
- "learning_rate": 0.0001,
1135
- "loss": 0.6088,
1136
- "step": 161
1137
- },
1138
- {
1139
- "epoch": 0.5184,
1140
- "grad_norm": 0.9117970631393475,
1141
- "learning_rate": 9.895975629375359e-05,
1142
- "loss": 0.6301,
1143
- "step": 162
1144
- },
1145
- {
1146
- "epoch": 0.5216,
1147
- "grad_norm": 0.26669052461677767,
1148
- "learning_rate": 9.791962515604887e-05,
1149
- "loss": 0.6326,
1150
- "step": 163
1151
- },
1152
- {
1153
- "epoch": 0.5248,
1154
- "grad_norm": 0.27925439381355327,
1155
- "learning_rate": 9.687971914324607e-05,
1156
- "loss": 0.6681,
1157
- "step": 164
1158
- },
1159
- {
1160
- "epoch": 0.528,
1161
- "grad_norm": 0.25468990353229803,
1162
- "learning_rate": 9.584015078734395e-05,
1163
- "loss": 0.5743,
1164
- "step": 165
1165
- },
1166
- {
1167
- "epoch": 0.5312,
1168
- "grad_norm": 0.2629677375763853,
1169
- "learning_rate": 9.480103258380198e-05,
1170
- "loss": 0.6055,
1171
- "step": 166
1172
- },
1173
- {
1174
- "epoch": 0.5344,
1175
- "grad_norm": 0.26370207518366157,
1176
- "learning_rate": 9.376247697936719e-05,
1177
- "loss": 0.6102,
1178
- "step": 167
1179
- },
1180
- {
1181
- "epoch": 0.5376,
1182
- "grad_norm": 0.26441992972242784,
1183
- "learning_rate": 9.272459635990562e-05,
1184
- "loss": 0.6238,
1185
- "step": 168
1186
- },
1187
- {
1188
- "epoch": 0.5408,
1189
- "grad_norm": 0.2798821979607459,
1190
- "learning_rate": 9.168750303824084e-05,
1191
- "loss": 0.6568,
1192
- "step": 169
1193
- },
1194
- {
1195
- "epoch": 0.544,
1196
- "grad_norm": 0.2748187095456481,
1197
- "learning_rate": 9.065130924199998e-05,
1198
- "loss": 0.6068,
1199
- "step": 170
1200
- },
1201
- {
1202
- "epoch": 0.5472,
1203
- "grad_norm": 0.26830989542228445,
1204
- "learning_rate": 8.961612710146934e-05,
1205
- "loss": 0.6217,
1206
- "step": 171
1207
- },
1208
- {
1209
- "epoch": 0.5504,
1210
- "grad_norm": 0.2624748827277877,
1211
- "learning_rate": 8.858206863746018e-05,
1212
- "loss": 0.6203,
1213
- "step": 172
1214
- },
1215
- {
1216
- "epoch": 0.5536,
1217
- "grad_norm": 0.2552606780949764,
1218
- "learning_rate": 8.754924574918675e-05,
1219
- "loss": 0.5663,
1220
- "step": 173
1221
- },
1222
- {
1223
- "epoch": 0.5568,
1224
- "grad_norm": 0.2557632710956968,
1225
- "learning_rate": 8.651777020215712e-05,
1226
- "loss": 0.5918,
1227
- "step": 174
1228
- },
1229
- {
1230
- "epoch": 0.56,
1231
- "grad_norm": 0.2666361550200854,
1232
- "learning_rate": 8.548775361607872e-05,
1233
- "loss": 0.625,
1234
- "step": 175
1235
- },
1236
- {
1237
- "epoch": 0.5632,
1238
- "grad_norm": 0.2560883378063079,
1239
- "learning_rate": 8.445930745277953e-05,
1240
- "loss": 0.6032,
1241
- "step": 176
1242
- },
1243
- {
1244
- "epoch": 0.5664,
1245
- "grad_norm": 0.2583406590237059,
1246
- "learning_rate": 8.343254300414628e-05,
1247
- "loss": 0.6075,
1248
- "step": 177
1249
- },
1250
- {
1251
- "epoch": 0.5696,
1252
- "grad_norm": 0.27012077672072304,
1253
- "learning_rate": 8.240757138008149e-05,
1254
- "loss": 0.645,
1255
- "step": 178
1256
- },
1257
- {
1258
- "epoch": 0.5728,
1259
- "grad_norm": 0.2697472104569865,
1260
- "learning_rate": 8.138450349647936e-05,
1261
- "loss": 0.6357,
1262
- "step": 179
1263
- },
1264
- {
1265
- "epoch": 0.576,
1266
- "grad_norm": 0.2704698732844997,
1267
- "learning_rate": 8.036345006322359e-05,
1268
- "loss": 0.6278,
1269
- "step": 180
1270
- },
1271
- {
1272
- "epoch": 0.5792,
1273
- "grad_norm": 0.27596503294944713,
1274
- "learning_rate": 7.934452157220694e-05,
1275
- "loss": 0.655,
1276
- "step": 181
1277
- },
1278
- {
1279
- "epoch": 0.5824,
1280
- "grad_norm": 0.26527906505737747,
1281
- "learning_rate": 7.832782828537437e-05,
1282
- "loss": 0.568,
1283
- "step": 182
1284
- },
1285
- {
1286
- "epoch": 0.5856,
1287
- "grad_norm": 0.2697094402719354,
1288
- "learning_rate": 7.731348022279134e-05,
1289
- "loss": 0.6172,
1290
- "step": 183
1291
- },
1292
- {
1293
- "epoch": 0.5888,
1294
- "grad_norm": 0.2738859813879391,
1295
- "learning_rate": 7.630158715073813e-05,
1296
- "loss": 0.6275,
1297
- "step": 184
1298
- },
1299
- {
1300
- "epoch": 0.592,
1301
- "grad_norm": 0.25939271181980195,
1302
- "learning_rate": 7.52922585698315e-05,
1303
- "loss": 0.5639,
1304
- "step": 185
1305
- },
1306
- {
1307
- "epoch": 0.5952,
1308
- "grad_norm": 0.2671651992817328,
1309
- "learning_rate": 7.428560370317542e-05,
1310
- "loss": 0.5919,
1311
- "step": 186
1312
- },
1313
- {
1314
- "epoch": 0.5984,
1315
- "grad_norm": 0.2688565475090851,
1316
- "learning_rate": 7.328173148454151e-05,
1317
- "loss": 0.6312,
1318
- "step": 187
1319
- },
1320
- {
1321
- "epoch": 0.6016,
1322
- "grad_norm": 0.2784248260332949,
1323
- "learning_rate": 7.228075054658096e-05,
1324
- "loss": 0.6124,
1325
- "step": 188
1326
- },
1327
- {
1328
- "epoch": 0.6048,
1329
- "grad_norm": 0.26997057247712153,
1330
- "learning_rate": 7.1282769209069e-05,
1331
- "loss": 0.648,
1332
- "step": 189
1333
- },
1334
- {
1335
- "epoch": 0.608,
1336
- "grad_norm": 0.2707092694992218,
1337
- "learning_rate": 7.028789546718326e-05,
1338
- "loss": 0.6108,
1339
- "step": 190
1340
- },
1341
- {
1342
- "epoch": 0.6112,
1343
- "grad_norm": 0.2694317279781858,
1344
- "learning_rate": 6.929623697981718e-05,
1345
- "loss": 0.6399,
1346
- "step": 191
1347
- },
1348
- {
1349
- "epoch": 0.6144,
1350
- "grad_norm": 0.26041601563873007,
1351
- "learning_rate": 6.830790105792973e-05,
1352
- "loss": 0.5889,
1353
- "step": 192
1354
- },
1355
- {
1356
- "epoch": 0.6176,
1357
- "grad_norm": 0.25291835995919404,
1358
- "learning_rate": 6.732299465293322e-05,
1359
- "loss": 0.5911,
1360
- "step": 193
1361
- },
1362
- {
1363
- "epoch": 0.6208,
1364
- "grad_norm": 0.26551700863248473,
1365
- "learning_rate": 6.63416243451194e-05,
1366
- "loss": 0.6089,
1367
- "step": 194
1368
- },
1369
- {
1370
- "epoch": 0.624,
1371
- "grad_norm": 0.25882948329790545,
1372
- "learning_rate": 6.536389633212609e-05,
1373
- "loss": 0.6298,
1374
- "step": 195
1375
- },
1376
- {
1377
- "epoch": 0.6272,
1378
- "grad_norm": 0.2529947012716399,
1379
- "learning_rate": 6.43899164174453e-05,
1380
- "loss": 0.5914,
1381
- "step": 196
1382
- },
1383
- {
1384
- "epoch": 0.6304,
1385
- "grad_norm": 0.32497365798332417,
1386
- "learning_rate": 6.341978999897365e-05,
1387
- "loss": 0.638,
1388
- "step": 197
1389
- },
1390
- {
1391
- "epoch": 0.6336,
1392
- "grad_norm": 0.26296553022742203,
1393
- "learning_rate": 6.245362205760704e-05,
1394
- "loss": 0.6258,
1395
- "step": 198
1396
- },
1397
- {
1398
- "epoch": 0.6368,
1399
- "grad_norm": 0.2657709305402464,
1400
- "learning_rate": 6.149151714588009e-05,
1401
- "loss": 0.6495,
1402
- "step": 199
1403
- },
1404
- {
1405
- "epoch": 0.64,
1406
- "grad_norm": 0.25801033104181925,
1407
- "learning_rate": 6.053357937665237e-05,
1408
- "loss": 0.6019,
1409
- "step": 200
1410
- },
1411
- {
1412
- "epoch": 0.6432,
1413
- "grad_norm": 0.25805367525126394,
1414
- "learning_rate": 5.957991241184184e-05,
1415
- "loss": 0.5931,
1416
- "step": 201
1417
- },
1418
- {
1419
- "epoch": 0.6464,
1420
- "grad_norm": 0.26487929148998474,
1421
- "learning_rate": 5.863061945120719e-05,
1422
- "loss": 0.6711,
1423
- "step": 202
1424
- },
1425
- {
1426
- "epoch": 0.6496,
1427
- "grad_norm": 0.25255797052501494,
1428
- "learning_rate": 5.768580322118034e-05,
1429
- "loss": 0.6088,
1430
- "step": 203
1431
- },
1432
- {
1433
- "epoch": 0.6528,
1434
- "grad_norm": 0.25317945487768007,
1435
- "learning_rate": 5.6745565963749925e-05,
1436
- "loss": 0.5703,
1437
- "step": 204
1438
- },
1439
- {
1440
- "epoch": 0.656,
1441
- "grad_norm": 0.2545495908249795,
1442
- "learning_rate": 5.5810009425397294e-05,
1443
- "loss": 0.5878,
1444
- "step": 205
1445
- },
1446
- {
1447
- "epoch": 0.6592,
1448
- "grad_norm": 0.32318784276902335,
1449
- "learning_rate": 5.487923484608629e-05,
1450
- "loss": 0.6273,
1451
- "step": 206
1452
- },
1453
- {
1454
- "epoch": 0.6624,
1455
- "grad_norm": 0.25499189403717754,
1456
- "learning_rate": 5.395334294830765e-05,
1457
- "loss": 0.576,
1458
- "step": 207
1459
- },
1460
- {
1461
- "epoch": 0.6656,
1462
- "grad_norm": 0.26227335079319297,
1463
- "learning_rate": 5.3032433926179395e-05,
1464
- "loss": 0.5812,
1465
- "step": 208
1466
- },
1467
- {
1468
- "epoch": 0.6688,
1469
- "grad_norm": 0.2514697436323961,
1470
- "learning_rate": 5.211660743460458e-05,
1471
- "loss": 0.5734,
1472
- "step": 209
1473
- },
1474
- {
1475
- "epoch": 0.672,
1476
- "grad_norm": 0.27670627140391063,
1477
- "learning_rate": 5.1205962578487155e-05,
1478
- "loss": 0.6589,
1479
- "step": 210
1480
- },
1481
- {
1482
- "epoch": 0.6752,
1483
- "grad_norm": 0.26233287953524004,
1484
- "learning_rate": 5.030059790200756e-05,
1485
- "loss": 0.5956,
1486
- "step": 211
1487
- },
1488
- {
1489
- "epoch": 0.6784,
1490
- "grad_norm": 0.25899185821758536,
1491
- "learning_rate": 4.940061137795876e-05,
1492
- "loss": 0.5981,
1493
- "step": 212
1494
- },
1495
- {
1496
- "epoch": 0.6816,
1497
- "grad_norm": 0.2695534934217965,
1498
- "learning_rate": 4.850610039714444e-05,
1499
- "loss": 0.5788,
1500
- "step": 213
1501
- },
1502
- {
1503
- "epoch": 0.6848,
1504
- "grad_norm": 0.2622202021562537,
1505
- "learning_rate": 4.761716175783989e-05,
1506
- "loss": 0.6263,
1507
- "step": 214
1508
- },
1509
- {
1510
- "epoch": 0.688,
1511
- "grad_norm": 0.26561851569401834,
1512
- "learning_rate": 4.673389165531714e-05,
1513
- "loss": 0.6423,
1514
- "step": 215
1515
- },
1516
- {
1517
- "epoch": 0.6912,
1518
- "grad_norm": 0.2519941661268092,
1519
- "learning_rate": 4.585638567143529e-05,
1520
- "loss": 0.602,
1521
- "step": 216
1522
- },
1523
- {
1524
- "epoch": 0.6944,
1525
- "grad_norm": 0.2585815299560701,
1526
- "learning_rate": 4.498473876429726e-05,
1527
- "loss": 0.6187,
1528
- "step": 217
1529
- },
1530
- {
1531
- "epoch": 0.6976,
1532
- "grad_norm": 0.2579931313485816,
1533
- "learning_rate": 4.411904525797408e-05,
1534
- "loss": 0.6252,
1535
- "step": 218
1536
- },
1537
- {
1538
- "epoch": 0.7008,
1539
- "grad_norm": 0.26138279560269373,
1540
- "learning_rate": 4.325939883229766e-05,
1541
- "loss": 0.6191,
1542
- "step": 219
1543
- },
1544
- {
1545
- "epoch": 0.704,
1546
- "grad_norm": 0.2652328876562207,
1547
- "learning_rate": 4.240589251272342e-05,
1548
- "loss": 0.6341,
1549
- "step": 220
1550
- },
1551
- {
1552
- "epoch": 0.7072,
1553
- "grad_norm": 0.26814443178673353,
1554
- "learning_rate": 4.155861866026364e-05,
1555
- "loss": 0.6069,
1556
- "step": 221
1557
- },
1558
- {
1559
- "epoch": 0.7104,
1560
- "grad_norm": 0.25716580395548105,
1561
- "learning_rate": 4.071766896149273e-05,
1562
- "loss": 0.5923,
1563
- "step": 222
1564
- },
1565
- {
1566
- "epoch": 0.7136,
1567
- "grad_norm": 0.25262697851315913,
1568
- "learning_rate": 3.988313441862553e-05,
1569
- "loss": 0.598,
1570
- "step": 223
1571
- },
1572
- {
1573
- "epoch": 0.7168,
1574
- "grad_norm": 0.25536363817600527,
1575
- "learning_rate": 3.9055105339669595e-05,
1576
- "loss": 0.5786,
1577
- "step": 224
1578
- },
1579
- {
1580
- "epoch": 0.72,
1581
- "grad_norm": 0.2515414352273644,
1582
- "learning_rate": 3.823367132865265e-05,
1583
- "loss": 0.5973,
1584
- "step": 225
1585
- },
1586
- {
1587
- "epoch": 0.7232,
1588
- "grad_norm": 0.3082923451611295,
1589
- "learning_rate": 3.741892127592625e-05,
1590
- "loss": 0.5738,
1591
- "step": 226
1592
- },
1593
- {
1594
- "epoch": 0.7264,
1595
- "grad_norm": 0.2665831077382451,
1596
- "learning_rate": 3.6610943348546526e-05,
1597
- "loss": 0.6079,
1598
- "step": 227
1599
- },
1600
- {
1601
- "epoch": 0.7296,
1602
- "grad_norm": 0.2570054423385765,
1603
- "learning_rate": 3.580982498073344e-05,
1604
- "loss": 0.5884,
1605
- "step": 228
1606
- },
1607
- {
1608
- "epoch": 0.7328,
1609
- "grad_norm": 0.2764118800977991,
1610
- "learning_rate": 3.501565286440914e-05,
1611
- "loss": 0.5664,
1612
- "step": 229
1613
- },
1614
- {
1615
- "epoch": 0.736,
1616
- "grad_norm": 0.2659015981177506,
1617
- "learning_rate": 3.422851293981676e-05,
1618
- "loss": 0.5945,
1619
- "step": 230
1620
- },
1621
- {
1622
- "epoch": 0.7392,
1623
- "grad_norm": 0.25185649916492603,
1624
- "learning_rate": 3.3448490386220355e-05,
1625
- "loss": 0.5709,
1626
- "step": 231
1627
- },
1628
- {
1629
- "epoch": 0.7424,
1630
- "grad_norm": 0.26390095033023064,
1631
- "learning_rate": 3.2675669612687565e-05,
1632
- "loss": 0.5989,
1633
- "step": 232
1634
- },
1635
- {
1636
- "epoch": 0.7456,
1637
- "grad_norm": 0.2530850728794403,
1638
- "learning_rate": 3.191013424895536e-05,
1639
- "loss": 0.593,
1640
- "step": 233
1641
- },
1642
- {
1643
- "epoch": 0.7488,
1644
- "grad_norm": 0.2565488992653013,
1645
- "learning_rate": 3.115196713638e-05,
1646
- "loss": 0.5769,
1647
- "step": 234
1648
- },
1649
- {
1650
- "epoch": 0.752,
1651
- "grad_norm": 0.2611020063864013,
1652
- "learning_rate": 3.040125031897264e-05,
1653
- "loss": 0.6358,
1654
- "step": 235
1655
- },
1656
- {
1657
- "epoch": 0.7552,
1658
- "grad_norm": 0.2464939099364865,
1659
- "learning_rate": 2.9658065034520978e-05,
1660
- "loss": 0.5717,
1661
- "step": 236
1662
- },
1663
- {
1664
- "epoch": 0.7584,
1665
- "grad_norm": 0.25190580400469725,
1666
- "learning_rate": 2.892249170579826e-05,
1667
- "loss": 0.6154,
1668
- "step": 237
1669
- },
1670
- {
1671
- "epoch": 0.7616,
1672
- "grad_norm": 0.2624308866231954,
1673
- "learning_rate": 2.8194609931860316e-05,
1674
- "loss": 0.6333,
1675
- "step": 238
1676
- },
1677
- {
1678
- "epoch": 0.7648,
1679
- "grad_norm": 0.2542363438650353,
1680
- "learning_rate": 2.7474498479432087e-05,
1681
- "loss": 0.6209,
1682
- "step": 239
1683
- },
1684
- {
1685
- "epoch": 0.768,
1686
- "grad_norm": 0.25002614059735057,
1687
- "learning_rate": 2.6762235274383772e-05,
1688
- "loss": 0.5733,
1689
- "step": 240
1690
- },
1691
- {
1692
- "epoch": 0.7712,
1693
- "grad_norm": 0.2522774493843005,
1694
- "learning_rate": 2.6057897393298324e-05,
1695
- "loss": 0.5709,
1696
- "step": 241
1697
- },
1698
- {
1699
- "epoch": 0.7744,
1700
- "grad_norm": 0.2592098056140366,
1701
- "learning_rate": 2.536156105513062e-05,
1702
- "loss": 0.6038,
1703
- "step": 242
1704
- },
1705
- {
1706
- "epoch": 0.7776,
1707
- "grad_norm": 0.2545528496860185,
1708
- "learning_rate": 2.4673301612959654e-05,
1709
- "loss": 0.6015,
1710
- "step": 243
1711
- },
1712
- {
1713
- "epoch": 0.7808,
1714
- "grad_norm": 0.26574395128852707,
1715
- "learning_rate": 2.399319354583418e-05,
1716
- "loss": 0.6239,
1717
- "step": 244
1718
- },
1719
- {
1720
- "epoch": 0.784,
1721
- "grad_norm": 0.2624335087446576,
1722
- "learning_rate": 2.3321310450713062e-05,
1723
- "loss": 0.6181,
1724
- "step": 245
1725
- },
1726
- {
1727
- "epoch": 0.7872,
1728
- "grad_norm": 0.2586614704436218,
1729
- "learning_rate": 2.265772503450122e-05,
1730
- "loss": 0.6334,
1731
- "step": 246
1732
- },
1733
- {
1734
- "epoch": 0.7904,
1735
- "grad_norm": 0.31003500089156544,
1736
- "learning_rate": 2.2002509106181624e-05,
1737
- "loss": 0.6078,
1738
- "step": 247
1739
- },
1740
- {
1741
- "epoch": 0.7936,
1742
- "grad_norm": 0.24237023956212553,
1743
- "learning_rate": 2.1355733569044635e-05,
1744
- "loss": 0.5494,
1745
- "step": 248
1746
- },
1747
- {
1748
- "epoch": 0.7968,
1749
- "grad_norm": 0.2625073020569576,
1750
- "learning_rate": 2.0717468413015283e-05,
1751
- "loss": 0.6057,
1752
- "step": 249
1753
- },
1754
- {
1755
- "epoch": 0.8,
1756
- "grad_norm": 0.2539419606355253,
1757
- "learning_rate": 2.008778270707944e-05,
1758
- "loss": 0.6047,
1759
- "step": 250
1760
- },
1761
- {
1762
- "epoch": 0.8032,
1763
- "grad_norm": 0.25884298262354777,
1764
- "learning_rate": 1.946674459180955e-05,
1765
- "loss": 0.5951,
1766
- "step": 251
1767
- },
1768
- {
1769
- "epoch": 0.8064,
1770
- "grad_norm": 0.2660197122315272,
1771
- "learning_rate": 1.8854421271990964e-05,
1772
- "loss": 0.6259,
1773
- "step": 252
1774
- },
1775
- {
1776
- "epoch": 0.8096,
1777
- "grad_norm": 0.25655988921277334,
1778
- "learning_rate": 1.8250879009349398e-05,
1779
- "loss": 0.568,
1780
- "step": 253
1781
- },
1782
- {
1783
- "epoch": 0.8128,
1784
- "grad_norm": 0.2457259860690368,
1785
- "learning_rate": 1.7656183115380577e-05,
1786
- "loss": 0.5566,
1787
- "step": 254
1788
- },
1789
- {
1790
- "epoch": 0.816,
1791
- "grad_norm": 0.2647745370183451,
1792
- "learning_rate": 1.707039794428259e-05,
1793
- "loss": 0.6137,
1794
- "step": 255
1795
- },
1796
- {
1797
- "epoch": 0.8192,
1798
- "grad_norm": 0.2526743886541512,
1799
- "learning_rate": 1.649358688599191e-05,
1800
- "loss": 0.5973,
1801
- "step": 256
1802
- },
1803
- {
1804
- "epoch": 0.8224,
1805
- "grad_norm": 0.26287332654405005,
1806
- "learning_rate": 1.5925812359323745e-05,
1807
- "loss": 0.5934,
1808
- "step": 257
1809
- },
1810
- {
1811
- "epoch": 0.8256,
1812
- "grad_norm": 0.2614410753690634,
1813
- "learning_rate": 1.5367135805217458e-05,
1814
- "loss": 0.6114,
1815
- "step": 258
1816
- },
1817
- {
1818
- "epoch": 0.8288,
1819
- "grad_norm": 0.259783723422659,
1820
- "learning_rate": 1.4817617680087825e-05,
1821
- "loss": 0.5961,
1822
- "step": 259
1823
- },
1824
- {
1825
- "epoch": 0.832,
1826
- "grad_norm": 0.24965207576444098,
1827
- "learning_rate": 1.4277317449282834e-05,
1828
- "loss": 0.5706,
1829
- "step": 260
1830
- },
1831
- {
1832
- "epoch": 0.8352,
1833
- "grad_norm": 0.2958426590398693,
1834
- "learning_rate": 1.3746293580648717e-05,
1835
- "loss": 0.6272,
1836
- "step": 261
1837
- },
1838
- {
1839
- "epoch": 0.8384,
1840
- "grad_norm": 0.25258297001548335,
1841
- "learning_rate": 1.3224603538202929e-05,
1842
- "loss": 0.6006,
1843
- "step": 262
1844
- },
1845
- {
1846
- "epoch": 0.8416,
1847
- "grad_norm": 0.2611674989165653,
1848
- "learning_rate": 1.2712303775915802e-05,
1849
- "loss": 0.6272,
1850
- "step": 263
1851
- },
1852
- {
1853
- "epoch": 0.8448,
1854
- "grad_norm": 0.25524659220422524,
1855
- "learning_rate": 1.220944973160133e-05,
1856
- "loss": 0.6174,
1857
- "step": 264
1858
- },
1859
- {
1860
- "epoch": 0.848,
1861
- "grad_norm": 0.2575177709932516,
1862
- "learning_rate": 1.1716095820918216e-05,
1863
- "loss": 0.5956,
1864
- "step": 265
1865
- },
1866
- {
1867
- "epoch": 0.8512,
1868
- "grad_norm": 0.2561395979203794,
1869
- "learning_rate": 1.1232295431481222e-05,
1870
- "loss": 0.5667,
1871
- "step": 266
1872
- },
1873
- {
1874
- "epoch": 0.8544,
1875
- "grad_norm": 0.25141728122486434,
1876
- "learning_rate": 1.0758100917083991e-05,
1877
- "loss": 0.572,
1878
- "step": 267
1879
- },
1880
- {
1881
- "epoch": 0.8576,
1882
- "grad_norm": 0.25964673762588375,
1883
- "learning_rate": 1.0293563592033595e-05,
1884
- "loss": 0.6372,
1885
- "step": 268
1886
- },
1887
- {
1888
- "epoch": 0.8608,
1889
- "grad_norm": 0.256690467072949,
1890
- "learning_rate": 9.838733725597615e-06,
1891
- "loss": 0.5884,
1892
- "step": 269
1893
- },
1894
- {
1895
- "epoch": 0.864,
1896
- "grad_norm": 0.2605600488715863,
1897
- "learning_rate": 9.393660536564408e-06,
1898
- "loss": 0.6151,
1899
- "step": 270
1900
- },
1901
- {
1902
- "epoch": 0.8672,
1903
- "grad_norm": 0.2520719685895574,
1904
- "learning_rate": 8.958392187916841e-06,
1905
- "loss": 0.586,
1906
- "step": 271
1907
- },
1908
- {
1909
- "epoch": 0.8704,
1910
- "grad_norm": 0.2544316858533973,
1911
- "learning_rate": 8.532975781620512e-06,
1912
- "loss": 0.5852,
1913
- "step": 272
1914
- },
1915
- {
1916
- "epoch": 0.8736,
1917
- "grad_norm": 0.25086193401636275,
1918
- "learning_rate": 8.117457353526625e-06,
1919
- "loss": 0.545,
1920
- "step": 273
1921
- },
1922
- {
1923
- "epoch": 0.8768,
1924
- "grad_norm": 0.2717178693247369,
1925
- "learning_rate": 7.711881868390291e-06,
1926
- "loss": 0.6135,
1927
- "step": 274
1928
- },
1929
- {
1930
- "epoch": 0.88,
1931
- "grad_norm": 0.26446663858778996,
1932
- "learning_rate": 7.3162932150046885e-06,
1933
- "loss": 0.654,
1934
- "step": 275
1935
- },
1936
- {
1937
- "epoch": 0.8832,
1938
- "grad_norm": 0.25710288834015854,
1939
- "learning_rate": 6.930734201451816e-06,
1940
- "loss": 0.6077,
1941
- "step": 276
1942
- },
1943
- {
1944
- "epoch": 0.8864,
1945
- "grad_norm": 0.2614053987061522,
1946
- "learning_rate": 6.555246550469907e-06,
1947
- "loss": 0.5887,
1948
- "step": 277
1949
- },
1950
- {
1951
- "epoch": 0.8896,
1952
- "grad_norm": 0.25803580837193296,
1953
- "learning_rate": 6.189870894938587e-06,
1954
- "loss": 0.5922,
1955
- "step": 278
1956
- },
1957
- {
1958
- "epoch": 0.8928,
1959
- "grad_norm": 0.25650163916574076,
1960
- "learning_rate": 5.834646773481811e-06,
1961
- "loss": 0.583,
1962
- "step": 279
1963
- },
1964
- {
1965
- "epoch": 0.896,
1966
- "grad_norm": 0.25422924226179444,
1967
- "learning_rate": 5.489612626189245e-06,
1968
- "loss": 0.5697,
1969
- "step": 280
1970
- },
1971
- {
1972
- "epoch": 0.8992,
1973
- "grad_norm": 0.24957575142600866,
1974
- "learning_rate": 5.154805790456485e-06,
1975
- "loss": 0.5912,
1976
- "step": 281
1977
- },
1978
- {
1979
- "epoch": 0.9024,
1980
- "grad_norm": 0.2515411538551669,
1981
- "learning_rate": 4.830262496944693e-06,
1982
- "loss": 0.5846,
1983
- "step": 282
1984
- },
1985
- {
1986
- "epoch": 0.9056,
1987
- "grad_norm": 0.2601071856992811,
1988
- "learning_rate": 4.516017865659949e-06,
1989
- "loss": 0.5616,
1990
- "step": 283
1991
- },
1992
- {
1993
- "epoch": 0.9088,
1994
- "grad_norm": 0.2639229721318898,
1995
- "learning_rate": 4.21210590215273e-06,
1996
- "loss": 0.6368,
1997
- "step": 284
1998
- },
1999
- {
2000
- "epoch": 0.912,
2001
- "grad_norm": 0.24857817483163056,
2002
- "learning_rate": 3.918559493838114e-06,
2003
- "loss": 0.5593,
2004
- "step": 285
2005
- },
2006
- {
2007
- "epoch": 0.9152,
2008
- "grad_norm": 0.2810784592988802,
2009
- "learning_rate": 3.6354104064368566e-06,
2010
- "loss": 0.7009,
2011
- "step": 286
2012
- },
2013
- {
2014
- "epoch": 0.9184,
2015
- "grad_norm": 0.25502077384196564,
2016
- "learning_rate": 3.3626892805379562e-06,
2017
- "loss": 0.5788,
2018
- "step": 287
2019
- },
2020
- {
2021
- "epoch": 0.9216,
2022
- "grad_norm": 0.24704174727298683,
2023
- "learning_rate": 3.100425628282899e-06,
2024
- "loss": 0.5657,
2025
- "step": 288
2026
- },
2027
- {
2028
- "epoch": 0.9248,
2029
- "grad_norm": 0.26397141045569056,
2030
- "learning_rate": 2.848647830172024e-06,
2031
- "loss": 0.6363,
2032
- "step": 289
2033
- },
2034
- {
2035
- "epoch": 0.928,
2036
- "grad_norm": 0.2524305059499819,
2037
- "learning_rate": 2.607383131993424e-06,
2038
- "loss": 0.5921,
2039
- "step": 290
2040
- },
2041
- {
2042
- "epoch": 0.9312,
2043
- "grad_norm": 0.2697223617632766,
2044
- "learning_rate": 2.3766576418745022e-06,
2045
- "loss": 0.6327,
2046
- "step": 291
2047
- },
2048
- {
2049
- "epoch": 0.9344,
2050
- "grad_norm": 0.26555534553448457,
2051
- "learning_rate": 2.1564963274568027e-06,
2052
- "loss": 0.608,
2053
- "step": 292
2054
- },
2055
- {
2056
- "epoch": 0.9376,
2057
- "grad_norm": 0.25409868011892073,
2058
- "learning_rate": 1.9469230131940907e-06,
2059
- "loss": 0.5747,
2060
- "step": 293
2061
- },
2062
- {
2063
- "epoch": 0.9408,
2064
- "grad_norm": 0.252487846427678,
2065
- "learning_rate": 1.7479603777742938e-06,
2066
- "loss": 0.5735,
2067
- "step": 294
2068
- },
2069
- {
2070
- "epoch": 0.944,
2071
- "grad_norm": 0.2571510334874647,
2072
- "learning_rate": 1.559629951665298e-06,
2073
- "loss": 0.5865,
2074
- "step": 295
2075
- },
2076
- {
2077
- "epoch": 0.9472,
2078
- "grad_norm": 0.4760901811369371,
2079
- "learning_rate": 1.3819521147851123e-06,
2080
- "loss": 0.542,
2081
- "step": 296
2082
- },
2083
- {
2084
- "epoch": 0.9504,
2085
- "grad_norm": 0.26372826176481556,
2086
- "learning_rate": 1.2149460942964098e-06,
2087
- "loss": 0.5884,
2088
- "step": 297
2089
- },
2090
- {
2091
- "epoch": 0.9536,
2092
- "grad_norm": 0.26175507028814576,
2093
- "learning_rate": 1.05862996252597e-06,
2094
- "loss": 0.6167,
2095
- "step": 298
2096
- },
2097
- {
2098
- "epoch": 0.9568,
2099
- "grad_norm": 0.26047362323447143,
2100
- "learning_rate": 9.130206350089765e-07,
2101
- "loss": 0.6128,
2102
- "step": 299
2103
- },
2104
- {
2105
- "epoch": 0.96,
2106
- "grad_norm": 0.2454540274824027,
2107
- "learning_rate": 7.781338686584927e-07,
2108
- "loss": 0.5388,
2109
- "step": 300
2110
- },
2111
- {
2112
- "epoch": 0.9632,
2113
- "grad_norm": 0.2476784963430826,
2114
- "learning_rate": 6.539842600603918e-07,
2115
- "loss": 0.5393,
2116
- "step": 301
2117
- },
2118
- {
2119
- "epoch": 0.9664,
2120
- "grad_norm": 0.24671440407493223,
2121
- "learning_rate": 5.405852438937764e-07,
2122
- "loss": 0.5966,
2123
- "step": 302
2124
- },
2125
- {
2126
- "epoch": 0.9696,
2127
- "grad_norm": 0.2571397406086264,
2128
- "learning_rate": 4.3794909147720773e-07,
2129
- "loss": 0.6121,
2130
- "step": 303
2131
- },
2132
- {
2133
- "epoch": 0.9728,
2134
- "grad_norm": 0.2895940448351629,
2135
- "learning_rate": 3.4608690944071263e-07,
2136
- "loss": 0.5805,
2137
- "step": 304
2138
- },
2139
- {
2140
- "epoch": 0.976,
2141
- "grad_norm": 0.27720052526167127,
2142
- "learning_rate": 2.6500863852395584e-07,
2143
- "loss": 0.6128,
2144
- "step": 305
2145
- },
2146
- {
2147
- "epoch": 0.9792,
2148
- "grad_norm": 0.2602744150458922,
2149
- "learning_rate": 1.947230525005006e-07,
2150
- "loss": 0.6157,
2151
- "step": 306
2152
- },
2153
- {
2154
- "epoch": 0.9824,
2155
- "grad_norm": 0.2565662022949563,
2156
- "learning_rate": 1.3523775722834587e-07,
2157
- "loss": 0.616,
2158
- "step": 307
2159
- },
2160
- {
2161
- "epoch": 0.9856,
2162
- "grad_norm": 0.2562137901406146,
2163
- "learning_rate": 8.655918982689581e-08,
2164
- "loss": 0.6134,
2165
- "step": 308
2166
- },
2167
- {
2168
- "epoch": 0.9888,
2169
- "grad_norm": 0.2525816370267746,
2170
- "learning_rate": 4.8692617980350406e-08,
2171
- "loss": 0.5834,
2172
- "step": 309
2173
- },
2174
- {
2175
- "epoch": 0.992,
2176
- "grad_norm": 0.2528167501756759,
2177
- "learning_rate": 2.164213936770576e-08,
2178
- "loss": 0.5784,
2179
- "step": 310
2180
- },
2181
- {
2182
- "epoch": 0.9952,
2183
- "grad_norm": 0.2557747985616951,
2184
- "learning_rate": 5.410681219286673e-09,
2185
- "loss": 0.6049,
2186
- "step": 311
2187
- },
2188
- {
2189
- "epoch": 0.9984,
2190
- "grad_norm": 0.2608436955663202,
2191
- "learning_rate": 0.0,
2192
- "loss": 0.5618,
2193
- "step": 312
2194
- },
2195
- {
2196
- "epoch": 0.9984,
2197
- "step": 312,
2198
- "total_flos": 185984439222272.0,
2199
- "train_loss": 0.6505028414420593,
2200
- "train_runtime": 4096.0965,
2201
- "train_samples_per_second": 1.221,
2202
- "train_steps_per_second": 0.076
2203
- }
2204
- ],
2205
- "logging_steps": 1.0,
2206
- "max_steps": 312,
2207
- "num_input_tokens_seen": 0,
2208
- "num_train_epochs": 1,
2209
- "save_steps": 500,
2210
- "stateful_callbacks": {
2211
- "TrainerControl": {
2212
- "args": {
2213
- "should_epoch_stop": false,
2214
- "should_evaluate": false,
2215
- "should_log": false,
2216
- "should_save": false,
2217
- "should_training_stop": false
2218
- },
2219
- "attributes": {}
2220
- }
2221
- },
2222
- "total_flos": 185984439222272.0,
2223
- "train_batch_size": 8,
2224
- "trial_name": null,
2225
- "trial_params": null
2226
- }