Chat-Error committed on Feb 11, 2024

Commit

c2f3a04

verified ·

1 Parent(s): 2197429

Upload folder using huggingface_hub


Files changed (39)

.ipynb_checkpoints/Untitled-checkpoint.ipynb +6 -0
README.md +204 -0
Untitled.ipynb +58 -0
adapter_config.json +4 -4
adapter_model.bin +3 -0
checkpoint-1488/README.md +204 -0
checkpoint-1488/adapter_config.json +31 -0
checkpoint-1488/adapter_model.safetensors +3 -0
checkpoint-1488/optimizer.pt +3 -0
checkpoint-1488/rng_state.pth +3 -0
checkpoint-1488/scheduler.pt +3 -0
checkpoint-1488/trainer_state.json +0 -0
checkpoint-1488/training_args.bin +3 -0
checkpoint-1984/README.md +204 -0
checkpoint-1984/adapter_config.json +31 -0
checkpoint-1984/adapter_model.safetensors +3 -0
checkpoint-1984/optimizer.pt +3 -0
checkpoint-1984/rng_state.pth +3 -0
checkpoint-1984/scheduler.pt +3 -0
checkpoint-1984/trainer_state.json +0 -0
checkpoint-1984/training_args.bin +3 -0
checkpoint-496/README.md +204 -0
checkpoint-496/adapter_config.json +31 -0
checkpoint-496/adapter_model.safetensors +3 -0
checkpoint-496/optimizer.pt +3 -0
checkpoint-496/rng_state.pth +3 -0
checkpoint-496/scheduler.pt +3 -0
checkpoint-496/trainer_state.json +3021 -0
checkpoint-496/training_args.bin +3 -0
checkpoint-992/README.md +204 -0
checkpoint-992/adapter_config.json +31 -0
checkpoint-992/adapter_model.safetensors +3 -0
checkpoint-992/optimizer.pt +3 -0
checkpoint-992/rng_state.pth +3 -0
checkpoint-992/scheduler.pt +3 -0
checkpoint-992/trainer_state.json +0 -0
checkpoint-992/training_args.bin +3 -0
config.json +8 -8
tmp-checkpoint-516/README.md +204 -0

.ipynb_checkpoints/Untitled-checkpoint.ipynb ADDED

	@@ -0,0 +1,6 @@

+{
+ "cells": [],
+ "metadata": {},
+ "nbformat": 4,
+ "nbformat_minor": 5
+}

README.md ADDED

	@@ -0,0 +1,204 @@

+---
+library_name: peft
+base_model: Chat-Error/Mistral-Kimiko-CSFT
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.7.1

Untitled.ipynb ADDED

	@@ -0,0 +1,58 @@

+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "6cde784e-3949-4eb5-925e-9009da201ebd",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from huggingface_hub import HfApi\n",
+    "\n",
+    "api = HfApi()\n",
+    "\n",
+    "# Upload all the content from the local folder to the remote model repo.\n",
+    "# By default, files are uploaded at the root of the repo\n",
+    "def upload():\n",
+    "    api.upload_folder(\n",
+    "    \n",
+    "        folder_path=\"/workspace/axolotl/qlora-out\",\n",
+    "    \n",
+    "        repo_id=\"Chat-Error/Claude-Kimiko\",\n",
+    "    \n",
+    "        repo_type=\"model\",\n",
+    "    \n",
+    "    )"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e125b798-01f6-43d3-894a-fb0147059085",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.13"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
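
The notebook above uploads the whole `qlora-out` folder, which is why intermediate checkpoints and `.ipynb_checkpoints` end up in this commit. Below is a minimal sketch of a more selective upload, assuming only the final adapter files are wanted. `ignore_patterns` is an existing parameter of `HfApi.upload_folder`; the pattern list and the `is_ignored` preview helper are hypothetical choices made for illustration, not something the author used.

```python
from fnmatch import fnmatch

# ASSUMPTION: which paths to skip is an illustrative choice, not the author's.
IGNORE_PATTERNS = [
    "checkpoint-*/*",
    "tmp-checkpoint-*/*",
    ".ipynb_checkpoints/*",
    "*.ipynb",
]


def is_ignored(path: str) -> bool:
    """Local preview of which relative paths the patterns above would skip."""
    return any(fnmatch(path, pattern) for pattern in IGNORE_PATTERNS)


def upload(folder_path: str = "/workspace/axolotl/qlora-out",
           repo_id: str = "Chat-Error/Claude-Kimiko") -> None:
    # Imported lazily so the preview helper works without huggingface_hub installed.
    from huggingface_hub import HfApi

    HfApi().upload_folder(
        folder_path=folder_path,
        repo_id=repo_id,
        repo_type="model",
        ignore_patterns=IGNORE_PATTERNS,
    )
```

Calling `is_ignored("checkpoint-496/optimizer.pt")` before `upload()` gives a quick check of what would be left out; the preview approximates the glob-style matching the Hub client applies.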

adapter_config.json CHANGED

@@ -1,7 +1,7 @@
 {
   "alpha_pattern": {},
   "auto_mapping": null,
-  "base_model_name_or_path": "Chat-Error/IWasDointCrystalMethOnTheKitchenButThenMomWalkedIn-NeuralHermesStripedCapybara-Mistral-11B-SLERP",
+  "base_model_name_or_path": "Chat-Error/Mistral-Kimiko-CSFT",
   "bias": "none",
   "fan_in_fan_out": null,
   "inference_mode": false,
@@ -10,7 +10,7 @@
   "layers_to_transform": null,
   "loftq_config": {},
   "lora_alpha": 64,
-  "lora_dropout": 0.05,
+  "lora_dropout": 0.0,
   "megatron_config": null,
   "megatron_core": "megatron.core",
   "modules_to_save": null,
@@ -20,10 +20,10 @@
   "revision": null,
   "target_modules": [
     "up_proj",
-    "gate_proj",
-    "k_proj",
     "q_proj",
     "v_proj",
+    "k_proj",
+    "gate_proj",
     "o_proj",
     "down_proj"
   ],

adapter_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d2e2127f72e5d9fae314e092e828185d63396d7a76ecf8f468ab746ed7c60990
+size 335706186
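
The three added lines above are a Git LFS pointer, not the weights themselves: a spec version, a SHA-256 object id, and the size in bytes of the real file. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper written for this example, and it strips the leading `+` only because the text is copied from a diff.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer; tolerates the leading '+' of added diff lines."""
    fields = {}
    for raw in text.splitlines():
        line = raw.lstrip("+").strip()
        if not line:
            continue
        # Each pointer line is "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(
    "+version https://git-lfs.github.com/spec/v1\n"
    "+oid sha256:d2e2127f72e5d9fae314e092e828185d63396d7a76ecf8f468ab746ed7c60990\n"
    "+size 335706186\n"
)
print(f"{pointer['size_bytes'] / 1e6:.1f} MB")
```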

checkpoint-1488/README.md ADDED

	@@ -0,0 +1,204 @@

+---
+library_name: peft
+base_model: Chat-Error/Mistral-Kimiko-CSFT
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.7.1

checkpoint-1488/adapter_config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "Chat-Error/Mistral-Kimiko-CSFT",
+  "bias": "none",
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 64,
+  "lora_dropout": 0.0,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "up_proj",
+    "q_proj",
+    "v_proj",
+    "k_proj",
+    "gate_proj",
+    "o_proj",
+    "down_proj"
+  ],
+  "task_type": "CAUSAL_LM"
+}
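
In standard LoRA the low-rank update is multiplied by `lora_alpha / r` before it is added to the frozen weight, so the two values in this config are best read together. The sketch below computes that factor from a subset of the fields shown above and groups the target modules; the attention/MLP grouping is an assumption based on the usual Mistral/Llama module names, not something the config states.

```python
import json

# Subset of the fields from checkpoint-1488/adapter_config.json shown above.
config = json.loads("""
{
  "lora_alpha": 64,
  "lora_dropout": 0.0,
  "r": 32,
  "target_modules": ["up_proj", "q_proj", "v_proj", "k_proj",
                     "gate_proj", "o_proj", "down_proj"]
}
""")

# Scaling applied to the low-rank update B @ A in standard LoRA.
scaling = config["lora_alpha"] / config["r"]

# ASSUMPTION: naming convention of Mistral/Llama-style decoder blocks.
ATTENTION = {"q_proj", "k_proj", "v_proj", "o_proj"}
attention_targets = sorted(m for m in config["target_modules"] if m in ATTENTION)
mlp_targets = sorted(m for m in config["target_modules"] if m not in ATTENTION)

print(scaling, attention_targets, mlp_targets)
```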

checkpoint-1488/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:df2a2f84496bb47a50f895a1d1647c8fedbaa3f83a2f3992e5776f64601d095b
+size 335604696
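
As a rough consistency check, the pointer size above can be compared with the number of LoRA parameters implied by `r = 32` and the seven target modules. The sketch below assumes Mistral-7B layer shapes (hidden size 4096, MLP size 14336, key/value width 1024, 32 layers) and fp32 storage; neither is confirmed anywhere in this diff, so treat the result as an estimate under those assumptions.

```python
# ASSUMPTION: Mistral-7B shapes; the base model's config is not shown in this diff.
HIDDEN, INTERMEDIATE, KV_DIM, NUM_LAYERS = 4096, 14336, 1024, 32
RANK = 32  # "r" in adapter_config.json

# (in_features, out_features) of each targeted linear layer.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    # A has shape (rank, in_features); B has shape (out_features, rank).
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in SHAPES.values())
total_params = per_layer * NUM_LAYERS
fp32_bytes = total_params * 4  # ASSUMPTION: weights stored as 4-byte floats

pointer_size = 335604696  # "size" from the LFS pointer above
print(total_params, fp32_bytes, pointer_size - fp32_bytes)
```

Under these assumptions the estimate falls within about 60 KB of the pointer size, which would be plausible for file header and metadata, but the diff alone cannot confirm the dtype or the base architecture.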

checkpoint-1488/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1f4e52830095c066f3209956679e0810fcc648f287de5fe7b9a73372627ead10
+size 168625172

checkpoint-1488/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e3e5d946241df2516b06d7074d8779088eae7607173ad780df56583910a9589b
+size 14244

checkpoint-1488/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5e15f7d9560f5fd87dfc12320a528f806c640656f865ca08c9dc5788eef9631a
+size 1064

checkpoint-1488/trainer_state.json ADDED

The diff for this file is too large to render. See raw diff

checkpoint-1488/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b8358787deffea50a57e590c449c424422609cef40af4a0de5a5b2512c3bc98e
+size 5304

checkpoint-1984/README.md ADDED

	@@ -0,0 +1,204 @@

+---
+library_name: peft
+base_model: Chat-Error/Mistral-Kimiko-CSFT
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.7.1

checkpoint-1984/adapter_config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "Chat-Error/Mistral-Kimiko-CSFT",
+  "bias": "none",
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 64,
+  "lora_dropout": 0.0,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "up_proj",
+    "q_proj",
+    "v_proj",
+    "k_proj",
+    "gate_proj",
+    "o_proj",
+    "down_proj"
+  ],
+  "task_type": "CAUSAL_LM"
+}

checkpoint-1984/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:67ea9f328055d8f5f80ba2ed1e96c97d251ef25e9982dc350f79d20acc2a11ac
+size 335604696

checkpoint-1984/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:18f6c5d65397b5d2d3340eb0ee236f258b58d123b516d0f645f0b5d9fdd75c94
+size 168625172

checkpoint-1984/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8d3b7102895eb0637b0cab516bd672f216b2bf79078a83eb301011a90444f44c
+size 14244

checkpoint-1984/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:160dd51274d70b1c5165e721d0b137039fee2c3504ba87637895635fbc710971
+size 1064

checkpoint-1984/trainer_state.json ADDED

The diff for this file is too large to render. See raw diff

checkpoint-1984/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b8358787deffea50a57e590c449c424422609cef40af4a0de5a5b2512c3bc98e
+size 5304

checkpoint-496/README.md ADDED

	@@ -0,0 +1,204 @@

+---
+library_name: peft
+base_model: Chat-Error/Mistral-Kimiko-CSFT
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.7.1

checkpoint-496/adapter_config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "Chat-Error/Mistral-Kimiko-CSFT",
+  "bias": "none",
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 64,
+  "lora_dropout": 0.0,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "up_proj",
+    "q_proj",
+    "v_proj",
+    "k_proj",
+    "gate_proj",
+    "o_proj",
+    "down_proj"
+  ],
+  "task_type": "CAUSAL_LM"
+}

checkpoint-496/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:61a8a8dc18214e72466cd9166a6872ea4d0f3b266a78aa831cac2cabdfd09555
+size 335604696

checkpoint-496/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c572160284c19840045d2ff7937bcd6e45cedcdfa9e54f96714ef3bb0f77f402
+size 168625172

checkpoint-496/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9899ccda7f0d8d9511991180b93aab508ce6e8489de708c88ad1188e7e1d90d6
+size 14244

checkpoint-496/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a802d8544e410ceb6c8c895daa2323a6fe5d9ac7469d1e1c7b73920560abe1e0
+size 1064
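
The saved scheduler state is opaque here, but the `learning_rate` values logged in `checkpoint-496/trainer_state.json` look consistent with a linear warmup over 10 steps to a peak of 2e-5, followed by cosine decay. The sketch below reproduces the first logged values under that reading. `TOTAL_STEPS = 3966` is an assumption obtained by fitting the logged numbers; it is not stated anywhere in the diff, and `learning_rate` is a hypothetical helper rather than the trainer's own code.

```python
import math

PEAK_LR = 2e-5      # value logged at step 10
WARMUP_STEPS = 10   # logged values rise by 2e-6 per step up to step 10
TOTAL_STEPS = 3966  # ASSUMPTION: fitted to the logged values, not given in the source


def learning_rate(step: int) -> float:
    """Linear warmup followed by cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (1, 10, 11, 12):
    print(step, learning_rate(step))
```

If the fit is right, the values printed for steps 11 and 12 should agree with the logged `1.9999996846759028e-05` and `1.99999873870381e-05` to well below the logged precision.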

checkpoint-496/trainer_state.json ADDED

	@@ -0,0 +1,3021 @@

+{
+  "best_metric": 1.436846375465393,
+  "best_model_checkpoint": "./qlora-out/checkpoint-496",
+  "epoch": 0.25012607160867373,
+  "eval_steps": 248,
+  "global_step": 496,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.0,
+      "learning_rate": 2.0000000000000003e-06,
+      "loss": 1.3547,
+      "step": 1
+    },
+    {
+      "epoch": 0.0,
+      "eval_loss": 1.5788122415542603,
+      "eval_runtime": 99.5564,
+      "eval_samples_per_second": 1.165,
+      "eval_steps_per_second": 1.165,
+      "step": 1
+    },
+    {
+      "epoch": 0.0,
+      "learning_rate": 4.000000000000001e-06,
+      "loss": 1.5678,
+      "step": 2
+    },
+    {
+      "epoch": 0.0,
+      "learning_rate": 6e-06,
+      "loss": 1.8406,
+      "step": 3
+    },
+    {
+      "epoch": 0.0,
+      "learning_rate": 8.000000000000001e-06,
+      "loss": 2.2337,
+      "step": 4
+    },
+    {
+      "epoch": 0.0,
+      "learning_rate": 1e-05,
+      "loss": 1.6725,
+      "step": 5
+    },
+    {
+      "epoch": 0.0,
+      "learning_rate": 1.2e-05,
+      "loss": 1.4016,
+      "step": 6
+    },
+    {
+      "epoch": 0.0,
+      "learning_rate": 1.4e-05,
+      "loss": 1.3171,
+      "step": 7
+    },
+    {
+      "epoch": 0.0,
+      "learning_rate": 1.6000000000000003e-05,
+      "loss": 1.4367,
+      "step": 8
+    },
+    {
+      "epoch": 0.0,
+      "learning_rate": 1.8e-05,
+      "loss": 1.4082,
+      "step": 9
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 2e-05,
+      "loss": 1.5169,
+      "step": 10
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 1.9999996846759028e-05,
+      "loss": 1.6735,
+      "step": 11
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 1.99999873870381e-05,
+      "loss": 1.3341,
+      "step": 12
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 1.9999971620843182e-05,
+      "loss": 1.7104,
+      "step": 13
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 1.9999949548184215e-05,
+      "loss": 1.2838,
+      "step": 14
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 1.9999921169075117e-05,
+      "loss": 1.4659,
+      "step": 15
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 1.9999886483533792e-05,
+      "loss": 1.504,
+      "step": 16
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 1.999984549158211e-05,
+      "loss": 1.7891,
+      "step": 17
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 1.9999798193245924e-05,
+      "loss": 1.7854,
+      "step": 18
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 1.9999744588555065e-05,
+      "loss": 1.7957,
+      "step": 19
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 1.9999684677543332e-05,
+      "loss": 1.5033,
+      "step": 20
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 1.9999618460248515e-05,
+      "loss": 1.5491,
+      "step": 21
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 1.9999545936712364e-05,
+      "loss": 1.5846,
+      "step": 22
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 1.9999467106980627e-05,
+      "loss": 1.6081,
+      "step": 23
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 1.9999381971103015e-05,
+      "loss": 1.0283,
+      "step": 24
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 1.9999290529133215e-05,
+      "loss": 2.0389,
+      "step": 25
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 1.9999192781128893e-05,
+      "loss": 1.3088,
+      "step": 26
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 1.99990887271517e-05,
+      "loss": 1.6174,
+      "step": 27
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 1.9998978367267258e-05,
+      "loss": 1.4197,
+      "step": 28
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 1.9998861701545155e-05,
+      "loss": 1.2337,
+      "step": 29
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 1.9998738730058974e-05,
+      "loss": 1.3482,
+      "step": 30
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 1.999860945288627e-05,
+      "loss": 1.6648,
+      "step": 31
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 1.9998473870108565e-05,
+      "loss": 1.4751,
+      "step": 32
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 1.999833198181137e-05,
+      "loss": 2.0115,
+      "step": 33
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 1.9998183788084155e-05,
+      "loss": 1.1591,
+      "step": 34
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 1.9998029289020388e-05,
+      "loss": 1.4557,
+      "step": 35
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 1.9997868484717504e-05,
+      "loss": 1.8469,
+      "step": 36
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 1.999770137527691e-05,
+      "loss": 1.7395,
+      "step": 37
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 1.9997527960803994e-05,
+      "loss": 1.1644,
+      "step": 38
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 1.999734824140812e-05,
+      "loss": 1.3629,
+      "step": 39
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 1.999716221720263e-05,
+      "loss": 1.1913,
+      "step": 40
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 1.9996969888304835e-05,
+      "loss": 1.9801,
+      "step": 41
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 1.999677125483603e-05,
+      "loss": 1.1722,
+      "step": 42
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 1.9996566316921485e-05,
+      "loss": 1.4999,
+      "step": 43
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 1.9996355074690438e-05,
+      "loss": 1.5593,
+      "step": 44
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 1.999613752827611e-05,
+      "loss": 1.7799,
+      "step": 45
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 1.9995913677815705e-05,
+      "loss": 1.3714,
+      "step": 46
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 1.9995683523450382e-05,
+      "loss": 1.428,
+      "step": 47
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 1.9995447065325292e-05,
+      "loss": 1.4206,
+      "step": 48
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 1.9995204303589557e-05,
+      "loss": 1.6583,
+      "step": 49
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 1.9994955238396276e-05,
+      "loss": 1.4349,
+      "step": 50
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 1.9994699869902516e-05,
+      "loss": 1.1203,
+      "step": 51
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 1.999443819826933e-05,
+      "loss": 1.2595,
+      "step": 52
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 1.999417022366174e-05,
+      "loss": 1.7085,
+      "step": 53
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 1.9993895946248744e-05,
+      "loss": 1.4112,
+      "step": 54
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 1.9993615366203313e-05,
+      "loss": 1.1461,
+      "step": 55
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 1.9993328483702393e-05,
+      "loss": 1.3644,
+      "step": 56
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 1.999303529892691e-05,
+      "loss": 1.2273,
+      "step": 57
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 1.9992735812061757e-05,
+      "loss": 1.2104,
+      "step": 58
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 1.999243002329581e-05,
+      "loss": 1.6421,
+      "step": 59
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 1.9992117932821906e-05,
+      "loss": 1.3875,
+      "step": 60
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 1.9991799540836867e-05,
+      "loss": 1.4965,
+      "step": 61
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 1.999147484754149e-05,
+      "loss": 1.304,
+      "step": 62
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 1.9991143853140543e-05,
+      "loss": 1.5476,
+      "step": 63
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 1.9990806557842758e-05,
+      "loss": 1.609,
+      "step": 64
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 1.999046296186086e-05,
+      "loss": 1.5958,
+      "step": 65
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 1.9990113065411532e-05,
+      "loss": 1.6518,
+      "step": 66
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 1.9989756868715435e-05,
+      "loss": 1.6508,
+      "step": 67
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 1.9989394371997205e-05,
+      "loss": 1.6813,
+      "step": 68
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 1.9989025575485453e-05,
+      "loss": 1.5673,
+      "step": 69
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 1.9988650479412754e-05,
+      "loss": 1.2511,
+      "step": 70
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 1.9988269084015668e-05,
+      "loss": 1.6433,
+      "step": 71
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 1.9987881389534715e-05,
+      "loss": 1.7642,
+      "step": 72
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 1.99874873962144e-05,
+      "loss": 1.2926,
+      "step": 73
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 1.9987087104303188e-05,
+      "loss": 1.1941,
+      "step": 74
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 1.9986680514053526e-05,
+      "loss": 1.6356,
+      "step": 75
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 1.998626762572183e-05,
+      "loss": 1.6645,
+      "step": 76
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 1.9985848439568486e-05,
+      "loss": 1.0501,
+      "step": 77
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 1.998542295585785e-05,
+      "loss": 1.1477,
+      "step": 78
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 1.998499117485826e-05,
+      "loss": 1.1104,
+      "step": 79
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 1.998455309684201e-05,
+      "loss": 1.2144,
+      "step": 80
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 1.9984108722085378e-05,
+      "loss": 1.6791,
+      "step": 81
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 1.998365805086861e-05,
+      "loss": 1.2276,
+      "step": 82
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 1.998320108347591e-05,
+      "loss": 1.6222,
+      "step": 83
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 1.998273782019548e-05,
+      "loss": 1.6216,
+      "step": 84
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 1.9982268261319462e-05,
+      "loss": 1.332,
+      "step": 85
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 1.9981792407143988e-05,
+      "loss": 1.3808,
+      "step": 86
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 1.9981310257969158e-05,
+      "loss": 1.4833,
+      "step": 87
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 1.9980821814099033e-05,
+      "loss": 1.5134,
+      "step": 88
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 1.998032707584165e-05,
+      "loss": 1.6795,
+      "step": 89
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 1.997982604350902e-05,
+      "loss": 1.1978,
+      "step": 90
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 1.9979318717417112e-05,
+      "loss": 1.6426,
+      "step": 91
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 1.997880509788587e-05,
+      "loss": 1.5072,
+      "step": 92
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 1.9978285185239215e-05,
+      "loss": 1.5113,
+      "step": 93
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 1.997775897980502e-05,
+      "loss": 1.6025,
+      "step": 94
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 1.997722648191514e-05,
+      "loss": 1.5437,
+      "step": 95
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 1.9976687691905394e-05,
+      "loss": 0.8807,
+      "step": 96
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 1.9976142610115567e-05,
+      "loss": 1.2017,
+      "step": 97
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 1.9975591236889414e-05,
+      "loss": 1.571,
+      "step": 98
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 1.997503357257466e-05,
+      "loss": 1.2984,
+      "step": 99
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 1.9974469617522992e-05,
+      "loss": 1.6714,
+      "step": 100
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 1.997389937209007e-05,
+      "loss": 1.8847,
+      "step": 101
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 1.9973322836635517e-05,
+      "loss": 1.2481,
+      "step": 102
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 1.9972740011522927e-05,
+      "loss": 1.3737,
+      "step": 103
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 1.997215089711985e-05,
+      "loss": 1.1397,
+      "step": 104
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 1.9971555493797817e-05,
+      "loss": 0.7657,
+      "step": 105
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 1.9970953801932313e-05,
+      "loss": 1.2579,
+      "step": 106
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 1.9970345821902795e-05,
+      "loss": 1.588,
+      "step": 107
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 1.996973155409269e-05,
+      "loss": 1.0317,
+      "step": 108
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 1.996911099888938e-05,
+      "loss": 1.6896,
+      "step": 109
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 1.9968484156684215e-05,
+      "loss": 1.7309,
+      "step": 110
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 1.996785102787252e-05,
+      "loss": 1.0569,
+      "step": 111
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 1.9967211612853566e-05,
+      "loss": 1.8228,
+      "step": 112
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 1.9966565912030607e-05,
+      "loss": 1.4042,
+      "step": 113
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 1.9965913925810847e-05,
+      "loss": 1.2318,
+      "step": 114
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 1.9965255654605466e-05,
+      "loss": 1.6043,
+      "step": 115
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 1.99645910988296e-05,
+      "loss": 1.7256,
+      "step": 116
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 1.9963920258902344e-05,
+      "loss": 1.477,
+      "step": 117
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 1.996324313524677e-05,
+      "loss": 1.4139,
+      "step": 118
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 1.99625597282899e-05,
+      "loss": 1.4781,
+      "step": 119
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 1.9961870038462727e-05,
+      "loss": 1.1951,
+      "step": 120
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 1.99611740662002e-05,
+      "loss": 1.2997,
+      "step": 121
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 1.996047181194123e-05,
+      "loss": 1.2891,
+      "step": 122
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 1.99597632761287e-05,
+      "loss": 2.083,
+      "step": 123
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 1.995904845920944e-05,
+      "loss": 1.3798,
+      "step": 124
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 1.9958327361634248e-05,
+      "loss": 1.3713,
+      "step": 125
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 1.995759998385789e-05,
+      "loss": 1.2818,
+      "step": 126
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 1.9956866326339076e-05,
+      "loss": 2.1837,
+      "step": 127
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 1.9956126389540493e-05,
+      "loss": 1.3562,
+      "step": 128
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 1.9955380173928777e-05,
+      "loss": 1.4653,
+      "step": 129
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 1.995462767997453e-05,
+      "loss": 1.5638,
+      "step": 130
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 1.995386890815231e-05,
+      "loss": 1.3003,
+      "step": 131
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 1.9953103858940633e-05,
+      "loss": 1.676,
+      "step": 132
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 1.995233253282198e-05,
+      "loss": 1.5618,
+      "step": 133
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 1.9951554930282782e-05,
+      "loss": 0.9876,
+      "step": 134
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 1.9950771051813435e-05,
+      "loss": 1.3107,
+      "step": 135
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 1.994998089790829e-05,
+      "loss": 1.3858,
+      "step": 136
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 1.994918446906566e-05,
+      "loss": 1.7285,
+      "step": 137
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 1.9948381765787802e-05,
+      "loss": 1.5864,
+      "step": 138
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 1.994757278858095e-05,
+      "loss": 1.6605,
+      "step": 139
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 1.994675753795528e-05,
+      "loss": 1.3543,
+      "step": 140
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 1.9945936014424924e-05,
+      "loss": 1.481,
+      "step": 141
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 1.9945108218507976e-05,
+      "loss": 1.5271,
+      "step": 142
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 1.994427415072649e-05,
+      "loss": 1.4439,
+      "step": 143
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 1.9943433811606465e-05,
+      "loss": 1.5647,
+      "step": 144
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 1.994258720167786e-05,
+      "loss": 1.2134,
+      "step": 145
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 1.9941734321474586e-05,
+      "loss": 1.3431,
+      "step": 146
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 1.994087517153451e-05,
+      "loss": 1.6574,
+      "step": 147
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 1.9940009752399462e-05,
+      "loss": 1.3856,
+      "step": 148
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 1.9939138064615205e-05,
+      "loss": 1.2281,
+      "step": 149
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 1.9938260108731474e-05,
+      "loss": 1.7524,
+      "step": 150
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 1.9937375885301948e-05,
+      "loss": 1.4849,
+      "step": 151
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 1.9936485394884263e-05,
+      "loss": 1.6183,
+      "step": 152
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 1.9935588638040005e-05,
+      "loss": 1.854,
+      "step": 153
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 1.993468561533471e-05,
+      "loss": 1.6805,
+      "step": 154
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 1.993377632733787e-05,
+      "loss": 1.3257,
+      "step": 155
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 1.993286077462292e-05,
+      "loss": 1.5286,
+      "step": 156
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 1.993193895776726e-05,
+      "loss": 1.7098,
+      "step": 157
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 1.993101087735223e-05,
+      "loss": 2.0118,
+      "step": 158
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 1.9930076533963117e-05,
+      "loss": 1.0722,
+      "step": 159
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 1.992913592818917e-05,
+      "loss": 1.1482,
+      "step": 160
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 1.9928189060623574e-05,
+      "loss": 1.5841,
+      "step": 161
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 1.9927235931863477e-05,
+      "loss": 1.3712,
+      "step": 162
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 1.992627654250996e-05,
+      "loss": 1.563,
+      "step": 163
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 1.992531089316806e-05,
+      "loss": 1.1601,
+      "step": 164
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 1.9924338984446773e-05,
+      "loss": 1.5141,
+      "step": 165
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 1.9923360816959016e-05,
+      "loss": 1.6872,
+      "step": 166
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 1.992237639132168e-05,
+      "loss": 1.6658,
+      "step": 167
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 1.9921385708155588e-05,
+      "loss": 1.5354,
+      "step": 168
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 1.9920388768085513e-05,
+      "loss": 1.457,
+      "step": 169
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 1.9919385571740172e-05,
+      "loss": 1.5823,
+      "step": 170
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 1.991837611975223e-05,
+      "loss": 1.5392,
+      "step": 171
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 1.9917360412758295e-05,
+      "loss": 1.3985,
+      "step": 172
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 1.9916338451398923e-05,
+      "loss": 1.1896,
+      "step": 173
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 1.9915310236318607e-05,
+      "loss": 1.2652,
+      "step": 174
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 1.9914275768165793e-05,
+      "loss": 1.0745,
+      "step": 175
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 1.991323504759287e-05,
+      "loss": 1.8471,
+      "step": 176
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 1.991218807525616e-05,
+      "loss": 1.871,
+      "step": 177
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 1.9911134851815935e-05,
+      "loss": 1.3217,
+      "step": 178
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 1.9910075377936414e-05,
+      "loss": 1.4894,
+      "step": 179
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 1.9909009654285748e-05,
+      "loss": 1.1755,
+      "step": 180
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 1.9907937681536032e-05,
+      "loss": 1.6328,
+      "step": 181
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 1.9906859460363307e-05,
+      "loss": 1.4971,
+      "step": 182
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 1.9905774991447552e-05,
+      "loss": 1.4121,
+      "step": 183
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 1.9904684275472684e-05,
+      "loss": 1.5965,
+      "step": 184
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 1.9903587313126557e-05,
+      "loss": 1.21,
+      "step": 185
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 1.9902484105100974e-05,
+      "loss": 1.8833,
+      "step": 186
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 1.9901374652091666e-05,
+      "loss": 1.0323,
+      "step": 187
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 1.9900258954798315e-05,
+      "loss": 1.5577,
+      "step": 188
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 1.989913701392453e-05,
+      "loss": 1.8889,
+      "step": 189
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 1.9898008830177856e-05,
+      "loss": 1.6154,
+      "step": 190
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 1.9896874404269786e-05,
+      "loss": 1.5606,
+      "step": 191
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 1.989573373691574e-05,
+      "loss": 1.2633,
+      "step": 192
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 1.989458682883508e-05,
+      "loss": 1.2808,
+      "step": 193
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 1.9893433680751105e-05,
+      "loss": 1.5281,
+      "step": 194
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 1.9892274293391035e-05,
+      "loss": 1.1487,
+      "step": 195
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 1.9891108667486047e-05,
+      "loss": 1.1523,
+      "step": 196
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 1.9889936803771237e-05,
+      "loss": 1.3093,
+      "step": 197
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 1.9888758702985637e-05,
+      "loss": 1.249,
+      "step": 198
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 1.9887574365872214e-05,
+      "loss": 1.1357,
+      "step": 199
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 1.988638379317787e-05,
+      "loss": 1.5416,
+      "step": 200
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 1.988518698565344e-05,
+      "loss": 1.63,
+      "step": 201
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 1.9883983944053678e-05,
+      "loss": 1.194,
+      "step": 202
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 1.9882774669137293e-05,
+      "loss": 1.281,
+      "step": 203
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 1.9881559161666905e-05,
+      "loss": 1.1447,
+      "step": 204
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 1.988033742240907e-05,
+      "loss": 1.7913,
+      "step": 205
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 1.9879109452134283e-05,
+      "loss": 1.5167,
+      "step": 206
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 1.9877875251616954e-05,
+      "loss": 1.2428,
+      "step": 207
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 1.9876634821635432e-05,
+      "loss": 1.3816,
+      "step": 208
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 1.9875388162971992e-05,
+      "loss": 1.4307,
+      "step": 209
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 1.9874135276412837e-05,
+      "loss": 1.3444,
+      "step": 210
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 1.98728761627481e-05,
+      "loss": 1.3973,
+      "step": 211
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 1.9871610822771835e-05,
+      "loss": 1.4477,
+      "step": 212
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 1.9870339257282028e-05,
+      "loss": 1.0952,
+      "step": 213
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 1.9869061467080587e-05,
+      "loss": 1.582,
+      "step": 214
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 1.9867777452973352e-05,
+      "loss": 1.4796,
+      "step": 215
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 1.9866487215770084e-05,
+      "loss": 0.996,
+      "step": 216
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 1.9865190756284467e-05,
+      "loss": 1.2355,
+      "step": 217
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 1.9863888075334113e-05,
+      "loss": 1.3907,
+      "step": 218
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 1.986257917374055e-05,
+      "loss": 1.5925,
+      "step": 219
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 1.986126405232924e-05,
+      "loss": 1.3692,
+      "step": 220
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 1.9859942711929557e-05,
+      "loss": 1.2802,
+      "step": 221
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 1.9858615153374808e-05,
+      "loss": 1.3644,
+      "step": 222
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 1.985728137750221e-05,
+      "loss": 1.1744,
+      "step": 223
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 1.985594138515291e-05,
+      "loss": 1.4946,
+      "step": 224
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 1.9854595177171968e-05,
+      "loss": 1.6203,
+      "step": 225
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 1.9853242754408376e-05,
+      "loss": 1.5148,
+      "step": 226
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 1.9851884117715027e-05,
+      "loss": 1.7796,
+      "step": 227
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 1.9850519267948747e-05,
+      "loss": 1.0399,
+      "step": 228
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 1.9849148205970275e-05,
+      "loss": 1.1837,
+      "step": 229
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 1.984777093264427e-05,
+      "loss": 1.3893,
+      "step": 230
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 1.9846387448839308e-05,
+      "loss": 1.2283,
+      "step": 231
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 1.9844997755427875e-05,
+      "loss": 1.0594,
+      "step": 232
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 1.984360185328639e-05,
+      "loss": 1.5023,
+      "step": 233
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 1.9842199743295164e-05,
+      "loss": 1.619,
+      "step": 234
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 1.984079142633844e-05,
+      "loss": 0.9526,
+      "step": 235
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 1.983937690330437e-05,
+      "loss": 1.3427,
+      "step": 236
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 1.983795617508502e-05,
+      "loss": 1.1187,
+      "step": 237
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 1.9836529242576373e-05,
+      "loss": 1.2346,
+      "step": 238
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 1.983509610667832e-05,
+      "loss": 1.8011,
+      "step": 239
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 1.983365676829466e-05,
+      "loss": 1.1902,
+      "step": 240
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 1.983221122833312e-05,
+      "loss": 1.3564,
+      "step": 241
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 1.983075948770532e-05,
+      "loss": 1.2038,
+      "step": 242
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 1.9829301547326794e-05,
+      "loss": 1.3158,
+      "step": 243
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 1.9827837408116996e-05,
+      "loss": 1.6507,
+      "step": 244
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 1.9826367070999284e-05,
+      "loss": 1.8778,
+      "step": 245
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 1.9824890536900917e-05,
+      "loss": 1.3691,
+      "step": 246
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 1.982340780675307e-05,
+      "loss": 1.3956,
+      "step": 247
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 1.982191888149083e-05,
+      "loss": 1.275,
+      "step": 248
+    },
+    {
+      "epoch": 0.13,
+      "eval_loss": 1.4743634462356567,
+      "eval_runtime": 99.6539,
+      "eval_samples_per_second": 1.164,
+      "eval_steps_per_second": 1.164,
+      "step": 248
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 1.9820423762053178e-05,
+      "loss": 1.3582,
+      "step": 249
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 1.981892244938301e-05,
+      "loss": 1.3869,
+      "step": 250
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 1.9817414944427133e-05,
+      "loss": 1.2653,
+      "step": 251
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 1.9815901248136242e-05,
+      "loss": 1.3705,
+      "step": 252
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 1.9814381361464953e-05,
+      "loss": 1.5251,
+      "step": 253
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 1.9812855285371778e-05,
+      "loss": 1.5613,
+      "step": 254
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 1.9811323020819136e-05,
+      "loss": 1.3138,
+      "step": 255
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 1.980978456877334e-05,
+      "loss": 1.0931,
+      "step": 256
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 1.9808239930204625e-05,
+      "loss": 1.4202,
+      "step": 257
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 1.98066891060871e-05,
+      "loss": 1.2566,
+      "step": 258
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 1.98051320973988e-05,
+      "loss": 1.1673,
+      "step": 259
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 1.9803568905121647e-05,
+      "loss": 1.2889,
+      "step": 260
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 1.980199953024146e-05,
+      "loss": 1.3371,
+      "step": 261
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 1.9800423973747972e-05,
+      "loss": 1.4299,
+      "step": 262
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 1.9798842236634797e-05,
+      "loss": 1.4825,
+      "step": 263
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 1.9797254319899453e-05,
+      "loss": 1.2745,
+      "step": 264
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 1.979566022454337e-05,
+      "loss": 1.4379,
+      "step": 265
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 1.9794059951571848e-05,
+      "loss": 1.386,
+      "step": 266
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 1.97924535019941e-05,
+      "loss": 1.2546,
+      "step": 267
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 1.979084087682323e-05,
+      "loss": 1.2028,
+      "step": 268
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 1.978922207707624e-05,
+      "loss": 1.1725,
+      "step": 269
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 1.978759710377402e-05,
+      "loss": 1.6772,
+      "step": 270
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 1.9785965957941362e-05,
+      "loss": 1.555,
+      "step": 271
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 1.978432864060694e-05,
+      "loss": 1.6195,
+      "step": 272
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 1.9782685152803326e-05,
+      "loss": 1.4591,
+      "step": 273
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 1.978103549556698e-05,
+      "loss": 1.2727,
+      "step": 274
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 1.9779379669938265e-05,
+      "loss": 1.3214,
+      "step": 275
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 1.9777717676961412e-05,
+      "loss": 1.6267,
+      "step": 276
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 1.977604951768456e-05,
+      "loss": 1.1487,
+      "step": 277
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 1.977437519315973e-05,
+      "loss": 1.8019,
+      "step": 278
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 1.9772694704442836e-05,
+      "loss": 1.3772,
+      "step": 279
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 1.977100805259367e-05,
+      "loss": 1.1348,
+      "step": 280
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 1.9769315238675916e-05,
+      "loss": 1.209,
+      "step": 281
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 1.9767616263757146e-05,
+      "loss": 1.4235,
+      "step": 282
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 1.9765911128908813e-05,
+      "loss": 1.2202,
+      "step": 283
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 1.976419983520626e-05,
+      "loss": 1.2418,
+      "step": 284
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 1.976248238372871e-05,
+      "loss": 1.3901,
+      "step": 285
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 1.9760758775559275e-05,
+      "loss": 1.1207,
+      "step": 286
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 1.9759029011784936e-05,
+      "loss": 1.0332,
+      "step": 287
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 1.9757293093496573e-05,
+      "loss": 1.4162,
+      "step": 288
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 1.9755551021788934e-05,
+      "loss": 1.5068,
+      "step": 289
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 1.975380279776066e-05,
+      "loss": 1.5863,
+      "step": 290
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 1.9752048422514262e-05,
+      "loss": 1.4527,
+      "step": 291
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 1.9750287897156136e-05,
+      "loss": 1.2066,
+      "step": 292
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 1.974852122279655e-05,
+      "loss": 1.2543,
+      "step": 293
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 1.9746748400549653e-05,
+      "loss": 1.455,
+      "step": 294
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 1.9744969431533474e-05,
+      "loss": 1.4658,
+      "step": 295
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 1.9743184316869924e-05,
+      "loss": 1.2493,
+      "step": 296
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 1.974139305768477e-05,
+      "loss": 1.4861,
+      "step": 297
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 1.9739595655107675e-05,
+      "loss": 1.6473,
+      "step": 298
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 1.9737792110272167e-05,
+      "loss": 1.4982,
+      "step": 299
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 1.9735982424315642e-05,
+      "loss": 1.4031,
+      "step": 300
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 1.973416659837938e-05,
+      "loss": 1.1998,
+      "step": 301
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 1.973234463360853e-05,
+      "loss": 1.0635,
+      "step": 302
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 1.973051653115211e-05,
+      "loss": 1.29,
+      "step": 303
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 1.9728682292163002e-05,
+      "loss": 1.2678,
+      "step": 304
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 1.9726841917797977e-05,
+      "loss": 1.3882,
+      "step": 305
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 1.9724995409217658e-05,
+      "loss": 1.0476,
+      "step": 306
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 1.972314276758654e-05,
+      "loss": 1.2119,
+      "step": 307
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 1.9721283994072995e-05,
+      "loss": 1.7283,
+      "step": 308
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 1.971941908984925e-05,
+      "loss": 1.6008,
+      "step": 309
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 1.97175480560914e-05,
+      "loss": 1.3317,
+      "step": 310
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 1.9715670893979416e-05,
+      "loss": 1.2612,
+      "step": 311
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 1.9713787604697125e-05,
+      "loss": 1.4356,
+      "step": 312
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 1.9711898189432218e-05,
+      "loss": 1.3431,
+      "step": 313
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 1.9710002649376255e-05,
+      "loss": 0.8521,
+      "step": 314
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 1.9708100985724654e-05,
+      "loss": 1.9041,
+      "step": 315
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 1.970619319967669e-05,
+      "loss": 1.6982,
+      "step": 316
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 1.970427929243551e-05,
+      "loss": 1.6813,
+      "step": 317
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 1.9702359265208114e-05,
+      "loss": 1.5686,
+      "step": 318
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 1.9700433119205368e-05,
+      "loss": 1.7669,
+      "step": 319
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 1.9698500855641988e-05,
+      "loss": 0.9086,
+      "step": 320
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 1.9696562475736556e-05,
+      "loss": 1.4198,
+      "step": 321
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 1.9694617980711503e-05,
+      "loss": 1.8693,
+      "step": 322
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 1.9692667371793127e-05,
+      "loss": 0.7734,
+      "step": 323
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 1.9690710650211572e-05,
+      "loss": 1.1205,
+      "step": 324
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 1.968874781720084e-05,
+      "loss": 0.959,
+      "step": 325
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 1.9686778873998792e-05,
+      "loss": 1.9894,
+      "step": 326
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 1.9684803821847137e-05,
+      "loss": 1.6784,
+      "step": 327
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 1.9682822661991435e-05,
+      "loss": 1.4841,
+      "step": 328
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 1.968083539568111e-05,
+      "loss": 1.7689,
+      "step": 329
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 1.9678842024169418e-05,
+      "loss": 1.3004,
+      "step": 330
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 1.9676842548713475e-05,
+      "loss": 1.5111,
+      "step": 331
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 1.9674836970574253e-05,
+      "loss": 1.0877,
+      "step": 332
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 1.9672825291016564e-05,
+      "loss": 1.3979,
+      "step": 333
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 1.967080751130907e-05,
+      "loss": 1.5652,
+      "step": 334
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 1.9668783632724278e-05,
+      "loss": 1.2373,
+      "step": 335
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 1.9666753656538545e-05,
+      "loss": 1.5039,
+      "step": 336
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 1.9664717584032075e-05,
+      "loss": 1.3572,
+      "step": 337
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 1.9662675416488908e-05,
+      "loss": 1.3156,
+      "step": 338
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 1.9660627155196934e-05,
+      "loss": 1.4775,
+      "step": 339
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 1.965857280144789e-05,
+      "loss": 1.5048,
+      "step": 340
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 1.9656512356537343e-05,
+      "loss": 1.0645,
+      "step": 341
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 1.9654445821764717e-05,
+      "loss": 1.4329,
+      "step": 342
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 1.9652373198433265e-05,
+      "loss": 1.3218,
+      "step": 343
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 1.965029448785008e-05,
+      "loss": 1.6661,
+      "step": 344
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 1.9648209691326103e-05,
+      "loss": 1.183,
+      "step": 345
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 1.96461188101761e-05,
+      "loss": 1.2924,
+      "step": 346
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 1.964402184571869e-05,
+      "loss": 1.0152,
+      "step": 347
+    },
+    {
+      "epoch": 0.18,
+      "learning_rate": 1.9641918799276313e-05,
+      "loss": 1.3248,
+      "step": 348
+    },
+    {
+      "epoch": 0.18,
+      "learning_rate": 1.9639809672175253e-05,
+      "loss": 1.2242,
+      "step": 349
+    },
+    {
+      "epoch": 0.18,
+      "learning_rate": 1.963769446574563e-05,
+      "loss": 1.2416,
+      "step": 350
+    },
+    {
+      "epoch": 0.18,
+      "learning_rate": 1.9635573181321394e-05,
+      "loss": 1.4711,
+      "step": 351
+    },
+    {
+      "epoch": 0.18,
+      "learning_rate": 1.9633445820240323e-05,
+      "loss": 1.454,
+      "step": 352
+    },
+    {
+      "epoch": 0.18,
+      "learning_rate": 1.963131238384404e-05,
+      "loss": 1.355,
+      "step": 353
+    },
+    {
+      "epoch": 0.18,
+      "learning_rate": 1.9629172873477995e-05,
+      "loss": 1.3991,
+      "step": 354
+    },
+    {
+      "epoch": 0.18,
+      "learning_rate": 1.962702729049146e-05,
+      "loss": 1.3905,
+      "step": 355
+    },
+    {
+      "epoch": 0.18,
+      "learning_rate": 1.9624875636237547e-05,
+      "loss": 1.5024,
+      "step": 356
+    },
+    {
+      "epoch": 0.18,
+      "learning_rate": 1.9622717912073193e-05,
+      "loss": 1.1932,
+      "step": 357
+    },
+    {
+      "epoch": 0.18,
+      "learning_rate": 1.962055411935916e-05,
+      "loss": 1.4808,
+      "step": 358
+    },
+    {
+      "epoch": 0.18,
+      "learning_rate": 1.961838425946004e-05,
+      "loss": 1.5079,
+      "step": 359
+    },
+    {
+      "epoch": 0.18,
+      "learning_rate": 1.9616208333744255e-05,
+      "loss": 1.1668,
+      "step": 360
+    },
+    {
+      "epoch": 0.18,
+      "learning_rate": 1.9614026343584048e-05,
+      "loss": 0.9482,
+      "step": 361
+    },
+    {
+      "epoch": 0.18,
+      "learning_rate": 1.9611838290355483e-05,
+      "loss": 1.5103,
+      "step": 362
+    },
+    {
+      "epoch": 0.18,
+      "learning_rate": 1.9609644175438457e-05,
+      "loss": 1.4825,
+      "step": 363
+    },
+    {
+      "epoch": 0.18,
+      "learning_rate": 1.9607444000216676e-05,
+      "loss": 1.0542,
+      "step": 364
+    },
+    {
+      "epoch": 0.18,
+      "learning_rate": 1.9605237766077686e-05,
+      "loss": 1.3711,
+      "step": 365
+    },
+    {
+      "epoch": 0.18,
+      "learning_rate": 1.9603025474412844e-05,
+      "loss": 1.8036,
+      "step": 366
+    },
+    {
+      "epoch": 0.19,
+      "learning_rate": 1.960080712661732e-05,
+      "loss": 1.2591,
+      "step": 367
+    },
+    {
+      "epoch": 0.19,
+      "learning_rate": 1.959858272409012e-05,
+      "loss": 1.5749,
+      "step": 368
+    },
+    {
+      "epoch": 0.19,
+      "learning_rate": 1.9596352268234053e-05,
+      "loss": 1.4592,
+      "step": 369
+    },
+    {
+      "epoch": 0.19,
+      "learning_rate": 1.9594115760455755e-05,
+      "loss": 0.8912,
+      "step": 370
+    },
+    {
+      "epoch": 0.19,
+      "learning_rate": 1.9591873202165678e-05,
+      "loss": 0.9196,
+      "step": 371
+    },
+    {
+      "epoch": 0.19,
+      "learning_rate": 1.9589624594778077e-05,
+      "loss": 1.4079,
+      "step": 372
+    },
+    {
+      "epoch": 0.19,
+      "learning_rate": 1.9587369939711044e-05,
+      "loss": 1.3132,
+      "step": 373
+    },
+    {
+      "epoch": 0.19,
+      "learning_rate": 1.958510923838647e-05,
+      "loss": 0.6786,
+      "step": 374
+    },
+    {
+      "epoch": 0.19,
+      "learning_rate": 1.958284249223006e-05,
+      "loss": 1.4426,
+      "step": 375
+    },
+    {
+      "epoch": 0.19,
+      "learning_rate": 1.9580569702671332e-05,
+      "loss": 1.1317,
+      "step": 376
+    },
+    {
+      "epoch": 0.19,
+      "learning_rate": 1.957829087114362e-05,
+      "loss": 1.4542,
+      "step": 377
+    },
+    {
+      "epoch": 0.19,
+      "learning_rate": 1.957600599908406e-05,
+      "loss": 0.807,
+      "step": 378
+    },
+    {
+      "epoch": 0.19,
+      "learning_rate": 1.957371508793361e-05,
+      "loss": 1.5726,
+      "step": 379
+    },
+    {
+      "epoch": 0.19,
+      "learning_rate": 1.9571418139137023e-05,
+      "loss": 1.4531,
+      "step": 380
+    },
+    {
+      "epoch": 0.19,
+      "learning_rate": 1.9569115154142873e-05,
+      "loss": 1.4213,
+      "step": 381
+    },
+    {
+      "epoch": 0.19,
+      "learning_rate": 1.9566806134403526e-05,
+      "loss": 1.3222,
+      "step": 382
+    },
+    {
+      "epoch": 0.19,
+      "learning_rate": 1.9564491081375157e-05,
+      "loss": 0.9039,
+      "step": 383
+    },
+    {
+      "epoch": 0.19,
+      "learning_rate": 1.956216999651776e-05,
+      "loss": 1.507,
+      "step": 384
+    },
+    {
+      "epoch": 0.19,
+      "learning_rate": 1.9559842881295122e-05,
+      "loss": 1.4292,
+      "step": 385
+    },
+    {
+      "epoch": 0.19,
+      "learning_rate": 1.955750973717483e-05,
+      "loss": 1.2868,
+      "step": 386
+    },
+    {
+      "epoch": 0.2,
+      "learning_rate": 1.955517056562828e-05,
+      "loss": 1.5842,
+      "step": 387
+    },
+    {
+      "epoch": 0.2,
+      "learning_rate": 1.955282536813066e-05,
+      "loss": 1.615,
+      "step": 388
+    },
+    {
+      "epoch": 0.2,
+      "learning_rate": 1.955047414616097e-05,
+      "loss": 1.3707,
+      "step": 389
+    },
+    {
+      "epoch": 0.2,
+      "learning_rate": 1.9548116901202006e-05,
+      "loss": 1.4077,
+      "step": 390
+    },
+    {
+      "epoch": 0.2,
+      "learning_rate": 1.9545753634740358e-05,
+      "loss": 1.4326,
+      "step": 391
+    },
+    {
+      "epoch": 0.2,
+      "learning_rate": 1.9543384348266415e-05,
+      "loss": 1.3821,
+      "step": 392
+    },
+    {
+      "epoch": 0.2,
+      "learning_rate": 1.954100904327436e-05,
+      "loss": 1.4803,
+      "step": 393
+    },
+    {
+      "epoch": 0.2,
+      "learning_rate": 1.953862772126218e-05,
+      "loss": 1.5412,
+      "step": 394
+    },
+    {
+      "epoch": 0.2,
+      "learning_rate": 1.953624038373165e-05,
+      "loss": 1.7189,
+      "step": 395
+    },
+    {
+      "epoch": 0.2,
+      "learning_rate": 1.9533847032188337e-05,
+      "loss": 1.1109,
+      "step": 396
+    },
+    {
+      "epoch": 0.2,
+      "learning_rate": 1.953144766814161e-05,
+      "loss": 1.237,
+      "step": 397
+    },
+    {
+      "epoch": 0.2,
+      "learning_rate": 1.952904229310462e-05,
+      "loss": 1.2492,
+      "step": 398
+    },
+    {
+      "epoch": 0.2,
+      "learning_rate": 1.952663090859431e-05,
+      "loss": 1.3605,
+      "step": 399
+    },
+    {
+      "epoch": 0.2,
+      "learning_rate": 1.952421351613142e-05,
+      "loss": 1.4423,
+      "step": 400
+    },
+    {
+      "epoch": 0.2,
+      "learning_rate": 1.9521790117240472e-05,
+      "loss": 1.3342,
+      "step": 401
+    },
+    {
+      "epoch": 0.2,
+      "learning_rate": 1.9519360713449775e-05,
+      "loss": 1.7274,
+      "step": 402
+    },
+    {
+      "epoch": 0.2,
+      "learning_rate": 1.9516925306291435e-05,
+      "loss": 1.6019,
+      "step": 403
+    },
+    {
+      "epoch": 0.2,
+      "learning_rate": 1.951448389730133e-05,
+      "loss": 1.1267,
+      "step": 404
+    },
+    {
+      "epoch": 0.2,
+      "learning_rate": 1.9512036488019138e-05,
+      "loss": 1.1338,
+      "step": 405
+    },
+    {
+      "epoch": 0.2,
+      "learning_rate": 1.9509583079988307e-05,
+      "loss": 1.6472,
+      "step": 406
+    },
+    {
+      "epoch": 0.21,
+      "learning_rate": 1.9507123674756076e-05,
+      "loss": 1.2389,
+      "step": 407
+    },
+    {
+      "epoch": 0.21,
+      "learning_rate": 1.9504658273873465e-05,
+      "loss": 1.5887,
+      "step": 408
+    },
+    {
+      "epoch": 0.21,
+      "learning_rate": 1.9502186878895273e-05,
+      "loss": 1.0987,
+      "step": 409
+    },
+    {
+      "epoch": 0.21,
+      "learning_rate": 1.9499709491380083e-05,
+      "loss": 1.384,
+      "step": 410
+    },
+    {
+      "epoch": 0.21,
+      "learning_rate": 1.9497226112890252e-05,
+      "loss": 1.7325,
+      "step": 411
+    },
+    {
+      "epoch": 0.21,
+      "learning_rate": 1.9494736744991925e-05,
+      "loss": 1.4435,
+      "step": 412
+    },
+    {
+      "epoch": 0.21,
+      "learning_rate": 1.9492241389255006e-05,
+      "loss": 1.9634,
+      "step": 413
+    },
+    {
+      "epoch": 0.21,
+      "learning_rate": 1.9489740047253197e-05,
+      "loss": 1.5567,
+      "step": 414
+    },
+    {
+      "epoch": 0.21,
+      "learning_rate": 1.9487232720563962e-05,
+      "loss": 1.3805,
+      "step": 415
+    },
+    {
+      "epoch": 0.21,
+      "learning_rate": 1.948471941076854e-05,
+      "loss": 1.325,
+      "step": 416
+    },
+    {
+      "epoch": 0.21,
+      "learning_rate": 1.9482200119451945e-05,
+      "loss": 1.3569,
+      "step": 417
+    },
+    {
+      "epoch": 0.21,
+      "learning_rate": 1.947967484820297e-05,
+      "loss": 1.1947,
+      "step": 418
+    },
+    {
+      "epoch": 0.21,
+      "learning_rate": 1.947714359861416e-05,
+      "loss": 1.231,
+      "step": 419
+    },
+    {
+      "epoch": 0.21,
+      "learning_rate": 1.9474606372281854e-05,
+      "loss": 1.1062,
+      "step": 420
+    },
+    {
+      "epoch": 0.21,
+      "learning_rate": 1.9472063170806144e-05,
+      "loss": 1.8761,
+      "step": 421
+    },
+    {
+      "epoch": 0.21,
+      "learning_rate": 1.94695139957909e-05,
+      "loss": 1.3697,
+      "step": 422
+    },
+    {
+      "epoch": 0.21,
+      "learning_rate": 1.9466958848843748e-05,
+      "loss": 1.0142,
+      "step": 423
+    },
+    {
+      "epoch": 0.21,
+      "learning_rate": 1.9464397731576093e-05,
+      "loss": 1.0189,
+      "step": 424
+    },
+    {
+      "epoch": 0.21,
+      "learning_rate": 1.94618306456031e-05,
+      "loss": 1.7268,
+      "step": 425
+    },
+    {
+      "epoch": 0.21,
+      "learning_rate": 1.9459257592543688e-05,
+      "loss": 1.0269,
+      "step": 426
+    },
+    {
+      "epoch": 0.22,
+      "learning_rate": 1.9456678574020557e-05,
+      "loss": 1.2833,
+      "step": 427
+    },
+    {
+      "epoch": 0.22,
+      "learning_rate": 1.9454093591660155e-05,
+      "loss": 1.1967,
+      "step": 428
+    },
+    {
+      "epoch": 0.22,
+      "learning_rate": 1.94515026470927e-05,
+      "loss": 1.5793,
+      "step": 429
+    },
+    {
+      "epoch": 0.22,
+      "learning_rate": 1.9448905741952167e-05,
+      "loss": 1.4184,
+      "step": 430
+    },
+    {
+      "epoch": 0.22,
+      "learning_rate": 1.944630287787629e-05,
+      "loss": 1.2217,
+      "step": 431
+    },
+    {
+      "epoch": 0.22,
+      "learning_rate": 1.9443694056506556e-05,
+      "loss": 1.222,
+      "step": 432
+    },
+    {
+      "epoch": 0.22,
+      "learning_rate": 1.9441079279488213e-05,
+      "loss": 1.2332,
+      "step": 433
+    },
+    {
+      "epoch": 0.22,
+      "learning_rate": 1.9438458548470268e-05,
+      "loss": 1.7458,
+      "step": 434
+    },
+    {
+      "epoch": 0.22,
+      "learning_rate": 1.9435831865105482e-05,
+      "loss": 1.3248,
+      "step": 435
+    },
+    {
+      "epoch": 0.22,
+      "learning_rate": 1.9433199231050367e-05,
+      "loss": 1.9621,
+      "step": 436
+    },
+    {
+      "epoch": 0.22,
+      "learning_rate": 1.9430560647965192e-05,
+      "loss": 0.8943,
+      "step": 437
+    },
+    {
+      "epoch": 0.22,
+      "learning_rate": 1.942791611751397e-05,
+      "loss": 0.6546,
+      "step": 438
+    },
+    {
+      "epoch": 0.22,
+      "learning_rate": 1.9425265641364467e-05,
+      "loss": 1.4626,
+      "step": 439
+    },
+    {
+      "epoch": 0.22,
+      "learning_rate": 1.9422609221188208e-05,
+      "loss": 1.3289,
+      "step": 440
+    },
+    {
+      "epoch": 0.22,
+      "learning_rate": 1.9419946858660452e-05,
+      "loss": 0.8965,
+      "step": 441
+    },
+    {
+      "epoch": 0.22,
+      "learning_rate": 1.9417278555460223e-05,
+      "loss": 1.225,
+      "step": 442
+    },
+    {
+      "epoch": 0.22,
+      "learning_rate": 1.941460431327027e-05,
+      "loss": 1.713,
+      "step": 443
+    },
+    {
+      "epoch": 0.22,
+      "learning_rate": 1.941192413377711e-05,
+      "loss": 1.0535,
+      "step": 444
+    },
+    {
+      "epoch": 0.22,
+      "learning_rate": 1.9409238018670986e-05,
+      "loss": 1.4671,
+      "step": 445
+    },
+    {
+      "epoch": 0.22,
+      "learning_rate": 1.9406545969645894e-05,
+      "loss": 1.5399,
+      "step": 446
+    },
+    {
+      "epoch": 0.23,
+      "learning_rate": 1.940384798839957e-05,
+      "loss": 1.3484,
+      "step": 447
+    },
+    {
+      "epoch": 0.23,
+      "learning_rate": 1.940114407663349e-05,
+      "loss": 1.6319,
+      "step": 448
+    },
+    {
+      "epoch": 0.23,
+      "learning_rate": 1.9398434236052873e-05,
+      "loss": 1.7901,
+      "step": 449
+    },
+    {
+      "epoch": 0.23,
+      "learning_rate": 1.9395718468366672e-05,
+      "loss": 1.4023,
+      "step": 450
+    },
+    {
+      "epoch": 0.23,
+      "learning_rate": 1.9392996775287588e-05,
+      "loss": 1.1306,
+      "step": 451
+    },
+    {
+      "epoch": 0.23,
+      "learning_rate": 1.9390269158532043e-05,
+      "loss": 1.648,
+      "step": 452
+    },
+    {
+      "epoch": 0.23,
+      "learning_rate": 1.9387535619820207e-05,
+      "loss": 1.5907,
+      "step": 453
+    },
+    {
+      "epoch": 0.23,
+      "learning_rate": 1.9384796160875982e-05,
+      "loss": 1.2352,
+      "step": 454
+    },
+    {
+      "epoch": 0.23,
+      "learning_rate": 1.9382050783427e-05,
+      "loss": 1.6986,
+      "step": 455
+    },
+    {
+      "epoch": 0.23,
+      "learning_rate": 1.9379299489204634e-05,
+      "loss": 1.4436,
+      "step": 456
+    },
+    {
+      "epoch": 0.23,
+      "learning_rate": 1.937654227994398e-05,
+      "loss": 1.8155,
+      "step": 457
+    },
+    {
+      "epoch": 0.23,
+      "learning_rate": 1.937377915738386e-05,
+      "loss": 1.4464,
+      "step": 458
+    },
+    {
+      "epoch": 0.23,
+      "learning_rate": 1.937101012326685e-05,
+      "loss": 1.5623,
+      "step": 459
+    },
+    {
+      "epoch": 0.23,
+      "learning_rate": 1.9368235179339217e-05,
+      "loss": 1.6515,
+      "step": 460
+    },
+    {
+      "epoch": 0.23,
+      "learning_rate": 1.9365454327350984e-05,
+      "loss": 1.3808,
+      "step": 461
+    },
+    {
+      "epoch": 0.23,
+      "learning_rate": 1.936266756905589e-05,
+      "loss": 1.708,
+      "step": 462
+    },
+    {
+      "epoch": 0.23,
+      "learning_rate": 1.93598749062114e-05,
+      "loss": 1.3696,
+      "step": 463
+    },
+    {
+      "epoch": 0.23,
+      "learning_rate": 1.9357076340578696e-05,
+      "loss": 1.4497,
+      "step": 464
+    },
+    {
+      "epoch": 0.23,
+      "learning_rate": 1.9354271873922692e-05,
+      "loss": 1.2003,
+      "step": 465
+    },
+    {
+      "epoch": 0.23,
+      "learning_rate": 1.935146150801202e-05,
+      "loss": 1.182,
+      "step": 466
+    },
+    {
+      "epoch": 0.24,
+      "learning_rate": 1.9348645244619035e-05,
+      "loss": 1.9314,
+      "step": 467
+    },
+    {
+      "epoch": 0.24,
+      "learning_rate": 1.9345823085519804e-05,
+      "loss": 1.5496,
+      "step": 468
+    },
+    {
+      "epoch": 0.24,
+      "learning_rate": 1.9342995032494116e-05,
+      "loss": 1.0179,
+      "step": 469
+    },
+    {
+      "epoch": 0.24,
+      "learning_rate": 1.9340161087325483e-05,
+      "loss": 1.3892,
+      "step": 470
+    },
+    {
+      "epoch": 0.24,
+      "learning_rate": 1.9337321251801123e-05,
+      "loss": 1.0485,
+      "step": 471
+    },
+    {
+      "epoch": 0.24,
+      "learning_rate": 1.9334475527711973e-05,
+      "loss": 1.786,
+      "step": 472
+    },
+    {
+      "epoch": 0.24,
+      "learning_rate": 1.9331623916852683e-05,
+      "loss": 1.3026,
+      "step": 473
+    },
+    {
+      "epoch": 0.24,
+      "learning_rate": 1.932876642102162e-05,
+      "loss": 1.2246,
+      "step": 474
+    },
+    {
+      "epoch": 0.24,
+      "learning_rate": 1.9325903042020856e-05,
+      "loss": 1.7167,
+      "step": 475
+    },
+    {
+      "epoch": 0.24,
+      "learning_rate": 1.9323033781656178e-05,
+      "loss": 1.5324,
+      "step": 476
+    },
+    {
+      "epoch": 0.24,
+      "learning_rate": 1.9320158641737077e-05,
+      "loss": 1.51,
+      "step": 477
+    },
+    {
+      "epoch": 0.24,
+      "learning_rate": 1.9317277624076758e-05,
+      "loss": 1.3634,
+      "step": 478
+    },
+    {
+      "epoch": 0.24,
+      "learning_rate": 1.931439073049213e-05,
+      "loss": 1.2313,
+      "step": 479
+    },
+    {
+      "epoch": 0.24,
+      "learning_rate": 1.93114979628038e-05,
+      "loss": 1.624,
+      "step": 480
+    },
+    {
+      "epoch": 0.24,
+      "learning_rate": 1.9308599322836092e-05,
+      "loss": 1.152,
+      "step": 481
+    },
+    {
+      "epoch": 0.24,
+      "learning_rate": 1.930569481241703e-05,
+      "loss": 1.2635,
+      "step": 482
+    },
+    {
+      "epoch": 0.24,
+      "learning_rate": 1.9302784433378333e-05,
+      "loss": 1.921,
+      "step": 483
+    },
+    {
+      "epoch": 0.24,
+      "learning_rate": 1.929986818755543e-05,
+      "loss": 1.2884,
+      "step": 484
+    },
+    {
+      "epoch": 0.24,
+      "learning_rate": 1.9296946076787447e-05,
+      "loss": 1.2914,
+      "step": 485
+    },
+    {
+      "epoch": 0.25,
+      "learning_rate": 1.9294018102917208e-05,
+      "loss": 1.2168,
+      "step": 486
+    },
+    {
+      "epoch": 0.25,
+      "learning_rate": 1.929108426779123e-05,
+      "loss": 1.7837,
+      "step": 487
+    },
+    {
+      "epoch": 0.25,
+      "learning_rate": 1.9288144573259735e-05,
+      "loss": 1.3335,
+      "step": 488
+    },
+    {
+      "epoch": 0.25,
+      "learning_rate": 1.9285199021176634e-05,
+      "loss": 1.2004,
+      "step": 489
+    },
+    {
+      "epoch": 0.25,
+      "learning_rate": 1.9282247613399537e-05,
+      "loss": 1.3216,
+      "step": 490
+    },
+    {
+      "epoch": 0.25,
+      "learning_rate": 1.9279290351789737e-05,
+      "loss": 1.7597,
+      "step": 491
+    },
+    {
+      "epoch": 0.25,
+      "learning_rate": 1.9276327238212232e-05,
+      "loss": 1.3228,
+      "step": 492
+    },
+    {
+      "epoch": 0.25,
+      "learning_rate": 1.9273358274535703e-05,
+      "loss": 1.6335,
+      "step": 493
+    },
+    {
+      "epoch": 0.25,
+      "learning_rate": 1.9270383462632524e-05,
+      "loss": 1.4952,
+      "step": 494
+    },
+    {
+      "epoch": 0.25,
+      "learning_rate": 1.926740280437875e-05,
+      "loss": 1.8452,
+      "step": 495
+    },
+    {
+      "epoch": 0.25,
+      "learning_rate": 1.926441630165413e-05,
+      "loss": 1.1917,
+      "step": 496
+    },
+    {
+      "epoch": 0.25,
+      "eval_loss": 1.436846375465393,
+      "eval_runtime": 99.6318,
+      "eval_samples_per_second": 1.164,
+      "eval_steps_per_second": 1.164,
+      "step": 496
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 3966,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 2,
+  "save_steps": 496,
+  "total_flos": 8.76993307434025e+16,
+  "train_batch_size": 1,
+  "trial_name": null,
+  "trial_params": null
+}
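
The log above interleaves per-step training entries (with `loss` and `learning_rate`) and periodic evaluation entries (with `eval_loss`). Below is a minimal sketch of how such a file might be summarized. It assumes the list of entries lives under a `log_history` key (the opening of the file falls outside this excerpt), and `SAMPLE_STATE`, `split_log`, and `summarize` are hypothetical names introduced only for illustration. The sample embeds four entries and three settings copied from the log rather than reading the real file.

```python
import json

# A few entries copied from the log above. The "log_history" key name is an
# assumption, since the start of the file is not visible in this excerpt.
SAMPLE_STATE = json.loads("""
{
  "log_history": [
    {"epoch": 0.13, "learning_rate": 1.982191888149083e-05, "loss": 1.275, "step": 248},
    {"epoch": 0.13, "eval_loss": 1.4743634462356567, "eval_runtime": 99.6539,
     "eval_samples_per_second": 1.164, "eval_steps_per_second": 1.164, "step": 248},
    {"epoch": 0.25, "learning_rate": 1.926441630165413e-05, "loss": 1.1917, "step": 496},
    {"epoch": 0.25, "eval_loss": 1.436846375465393, "eval_runtime": 99.6318,
     "eval_samples_per_second": 1.164, "eval_steps_per_second": 1.164, "step": 496}
  ],
  "max_steps": 3966,
  "num_train_epochs": 2,
  "save_steps": 496
}
""")


def split_log(state):
    """Separate training entries (key "loss") from evaluation entries (key "eval_loss")."""
    train = [e for e in state["log_history"] if "loss" in e]
    evals = [e for e in state["log_history"] if "eval_loss" in e]
    return train, evals


def summarize(state):
    """Derive a few quantities implied by the logged values (hypothetical helper)."""
    train, evals = split_log(state)
    last_eval = evals[-1]
    return {
        "train_entries": len(train),
        "eval_losses": {e["step"]: round(e["eval_loss"], 4) for e in evals},
        # runtime (s) * samples per second gives the approximate eval set size
        "approx_eval_samples": round(
            last_eval["eval_runtime"] * last_eval["eval_samples_per_second"]
        ),
        "steps_per_epoch": state["max_steps"] / state["num_train_epochs"],
        # multiples of save_steps that fit within max_steps
        "save_step_multiples": list(
            range(state["save_steps"], state["max_steps"] + 1, state["save_steps"])
        ),
    }


if __name__ == "__main__":
    print(summarize(SAMPLE_STATE))
```

To run this against the real file, one would replace `SAMPLE_STATE` with `json.load(open("checkpoint-496/trainer_state.json"))`, provided the key names match. The first multiples of `save_steps` (496, 992, 1488, 1984) line up with the checkpoint directories listed in this commit.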

checkpoint-496/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b8358787deffea50a57e590c449c424422609cef40af4a0de5a5b2512c3bc98e
+size 5304

checkpoint-992/README.md ADDED

	@@ -0,0 +1,204 @@

+---
+library_name: peft
+base_model: Chat-Error/Mistral-Kimiko-CSFT
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.7.1

checkpoint-992/adapter_config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "Chat-Error/Mistral-Kimiko-CSFT",
+  "bias": "none",
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 64,
+  "lora_dropout": 0.0,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "up_proj",
+    "q_proj",
+    "v_proj",
+    "k_proj",
+    "gate_proj",
+    "o_proj",
+    "down_proj"
+  ],
+  "task_type": "CAUSAL_LM"
+}
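
Note on what this adapter config implies: with `lora_alpha` 64 and `r` 32, the default LoRA scaling factor `lora_alpha / r` is 2.0. The sketch below also estimates the number of trainable adapter parameters for the seven `target_modules`. The projection shapes are an assumption (the usual Mistral-7B dimensions: hidden size 4096, MLP size 14336, key/value projection size 1024), since they do not appear in this diff; the layer count of 32 comes from `num_hidden_layers` in `config.json`. The helper names are hypothetical.

```python
import json

# Subset of the adapter_config.json shown above.
adapter_config = json.loads("""
{
  "lora_alpha": 64,
  "r": 32,
  "target_modules": ["up_proj", "q_proj", "v_proj", "k_proj",
                     "gate_proj", "o_proj", "down_proj"]
}
""")

# ASSUMPTION: (in_features, out_features) for standard Mistral-7B layers.
# These shapes are not stated anywhere in this commit.
ASSUMED_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}
NUM_LAYERS = 32  # num_hidden_layers in config.json


def lora_scaling(cfg: dict) -> float:
    """Default LoRA scaling applied to the low-rank update: alpha / r."""
    return cfg["lora_alpha"] / cfg["r"]


def lora_param_count(cfg: dict, shapes: dict, num_layers: int) -> int:
    """Each adapted linear layer adds A (r x in) and B (out x r)."""
    r = cfg["r"]
    per_layer = sum(r * (shapes[m][0] + shapes[m][1]) for m in cfg["target_modules"])
    return per_layer * num_layers


scaling = lora_scaling(adapter_config)
n_params = lora_param_count(adapter_config, ASSUMED_SHAPES, NUM_LAYERS)
print("scaling:", scaling)
print("adapter parameters:", n_params)
print("bytes at 4 bytes per parameter:", n_params * 4)
```

At 4 bytes per parameter the estimate comes to 335,544,320 bytes, close to the 335,604,696-byte `adapter_model.safetensors` pointer in this commit. That is consistent with the assumed shapes and 32-bit storage, but it does not prove either.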

checkpoint-992/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b88b9637e3358f5b8ff473600cce90ebdf16f5d070afb62b1f86c6e44327e30e
+size 335604696

checkpoint-992/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4abd00b5601aeee65035edf8ab1a8b0c75a90406c5963728872944d295dc861f
+size 168625172

checkpoint-992/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d245e05e72192c132e0f2edb6fdcae0c578c890f0fe912f17ec7b0bba2d38cc3
+size 14244

checkpoint-992/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2929294335bfa51c72282b3bac32d7befabaac2c7f9a9cadeba03e0dd0062741
+size 1064

checkpoint-992/trainer_state.json ADDED

The diff for this file is too large to render. See raw diff

checkpoint-992/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b8358787deffea50a57e590c449c424422609cef40af4a0de5a5b2512c3bc98e
+size 5304

config.json CHANGED

@@ -1,5 +1,5 @@
 {
-  "_name_or_path": "Chat-Error/IWasDointCrystalMethOnTheKitchenButThenMomWalkedIn-NeuralHermesStripedCapybara-Mistral-11B-SLERP",
+  "_name_or_path": "Chat-Error/Mistral-Kimiko-CSFT",
   "architectures": [
     "MistralForCausalLM"
   ],
@@ -16,17 +16,17 @@
   "num_hidden_layers": 32,
   "num_key_value_heads": 8,
   "quantization_config": {
-    "_load_in_4bit": false,
-    "_load_in_8bit": true,
-    "bnb_4bit_compute_dtype": "float32",
-    "bnb_4bit_quant_type": "fp4",
-    "bnb_4bit_use_double_quant": false,
+    "_load_in_4bit": true,
+    "_load_in_8bit": false,
+    "bnb_4bit_compute_dtype": "bfloat16",
+    "bnb_4bit_quant_type": "nf4",
+    "bnb_4bit_use_double_quant": true,
     "llm_int8_enable_fp32_cpu_offload": false,
     "llm_int8_has_fp16_weight": false,
     "llm_int8_skip_modules": null,
     "llm_int8_threshold": 6.0,
-    "load_in_4bit": false,
-    "load_in_8bit": true,
+    "load_in_4bit": true,
+    "load_in_8bit": false,
     "quant_method": "bitsandbytes"
   },
   "rms_norm_eps": 1e-05,
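
Note on the quantization change: the new `quantization_config` switches from 8-bit loading to 4-bit NF4 with double quantization and a bfloat16 compute dtype. The sketch below shows how those fields could be mapped onto `transformers.BitsAndBytesConfig` when loading the base model. It is an illustration under assumptions, not the author's training script: the helper names `bnb_kwargs` and `load_quantized` are hypothetical, and `load_quantized` is defined but never called here because it needs `torch`, `transformers`, `bitsandbytes`, `accelerate`, and a supported GPU.

```python
import json

# The 4-bit related fields from the new side of the config.json diff.
NEW_QUANT_CONFIG = json.loads("""
{
  "load_in_4bit": true,
  "load_in_8bit": false,
  "bnb_4bit_compute_dtype": "bfloat16",
  "bnb_4bit_quant_type": "nf4",
  "bnb_4bit_use_double_quant": true
}
""")


def bnb_kwargs(cfg: dict) -> dict:
    """Pick out the fields that BitsAndBytesConfig accepts as keyword arguments."""
    keys = (
        "load_in_4bit",
        "load_in_8bit",
        "bnb_4bit_compute_dtype",
        "bnb_4bit_quant_type",
        "bnb_4bit_use_double_quant",
    )
    return {k: cfg[k] for k in keys}


def load_quantized(model_id: str):
    """Hypothetical loader; requires a GPU and the libraries named above."""
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    kwargs = bnb_kwargs(NEW_QUANT_CONFIG)
    # Convert the dtype name from the JSON into a torch dtype.
    kwargs["bnb_4bit_compute_dtype"] = getattr(torch, kwargs["bnb_4bit_compute_dtype"])
    quant = BitsAndBytesConfig(**kwargs)
    return AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=quant, device_map="auto"
    )


kwargs = bnb_kwargs(NEW_QUANT_CONFIG)
print(kwargs)
```

A call such as `load_quantized("Chat-Error/Mistral-Kimiko-CSFT")` would then load the base model named in `_name_or_path` with these settings, assuming that repository is accessible.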

tmp-checkpoint-516/README.md ADDED

	@@ -0,0 +1,204 @@

+---
+library_name: peft
+base_model: Chat-Error/Mistral-Kimiko-CSFT
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.8.2