Crystalcareai committed on Feb 7

Commit 7a23527 (1 parent: 528e497)

Upload folder using huggingface_hub


This view is limited to 50 files because the commit contains too many changes.

Files changed (50)

README.md +58 -0
adapter_config.json +27 -0
adapter_model.safetensors +3 -0
added_tokens.json +5 -0
all_results.json +7 -0
checkpoint-100/README.md +204 -0
checkpoint-100/adapter_config.json +27 -0
checkpoint-100/adapter_model.safetensors +3 -0
checkpoint-100/added_tokens.json +5 -0
checkpoint-100/merges.txt +0 -0
checkpoint-100/optimizer.pt +3 -0
checkpoint-100/rng_state.pth +3 -0
checkpoint-100/scheduler.pt +3 -0
checkpoint-100/special_tokens_map.json +14 -0
checkpoint-100/tokenizer_config.json +44 -0
checkpoint-100/trainer_state.json +141 -0
checkpoint-100/training_args.bin +3 -0
checkpoint-100/vocab.json +0 -0
checkpoint-1000/README.md +204 -0
checkpoint-1000/adapter_config.json +27 -0
checkpoint-1000/adapter_model.safetensors +3 -0
checkpoint-1000/added_tokens.json +5 -0
checkpoint-1000/merges.txt +0 -0
checkpoint-1000/optimizer.pt +3 -0
checkpoint-1000/rng_state.pth +3 -0
checkpoint-1000/scheduler.pt +3 -0
checkpoint-1000/special_tokens_map.json +14 -0
checkpoint-1000/tokenizer_config.json +44 -0
checkpoint-1000/trainer_state.json +1221 -0
checkpoint-1000/training_args.bin +3 -0
checkpoint-1000/vocab.json +0 -0
checkpoint-1100/README.md +204 -0
checkpoint-1100/adapter_config.json +27 -0
checkpoint-1100/adapter_model.safetensors +3 -0
checkpoint-1100/added_tokens.json +5 -0
checkpoint-1100/merges.txt +0 -0
checkpoint-1100/optimizer.pt +3 -0
checkpoint-1100/rng_state.pth +3 -0
checkpoint-1100/scheduler.pt +3 -0
checkpoint-1100/special_tokens_map.json +14 -0
checkpoint-1100/tokenizer_config.json +44 -0
checkpoint-1100/trainer_state.json +1341 -0
checkpoint-1100/training_args.bin +3 -0
checkpoint-1100/vocab.json +0 -0
checkpoint-1200/README.md +204 -0
checkpoint-1200/adapter_config.json +27 -0
checkpoint-1200/adapter_model.safetensors +3 -0
checkpoint-1200/added_tokens.json +5 -0
checkpoint-1200/merges.txt +0 -0
checkpoint-1200/optimizer.pt +3 -0

README.md ADDED

	@@ -0,0 +1,58 @@

+---
+license: other
+library_name: peft
+tags:
+- llama-factory
+- lora
+- generated_from_trainer
+base_model: Qwen/Qwen1.5-7B
+model-index:
+- name: train_2024-02-07-03-18-19
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# train_2024-02-07-03-18-19
+This model is a fine-tuned version of [Qwen/Qwen1.5-7B](https://huggingface.co/Qwen/Qwen1.5-7B) on the openorca dataset.
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 5e-05
+- train_batch_size: 1
+- eval_batch_size: 8
+- seed: 42
+- gradient_accumulation_steps: 16
+- total_train_batch_size: 16
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine
+- num_epochs: 1.5
+### Training results
+### Framework versions
+- PEFT 0.8.2
+- Transformers 4.37.2
+- Pytorch 2.1.1+cu121
+- Datasets 2.16.1
+- Tokenizers 0.15.1
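
Reviewer note on the hyperparameters above: `total_train_batch_size` is the per-device batch size multiplied by the gradient accumulation steps and the number of devices. The sketch below reproduces the listed value of 16; `num_devices = 1` is an assumption, since the README does not state the device count.

```python
# Values copied from the README hyperparameter list in this diff.
train_batch_size = 1
gradient_accumulation_steps = 16

# Assumption: a single training device (not stated in the README).
num_devices = 1


def effective_batch_size(per_device: int, accumulation: int, devices: int) -> int:
    """Number of samples that contribute to one optimizer step."""
    return per_device * accumulation * devices


total_train_batch_size = effective_batch_size(
    train_batch_size, gradient_accumulation_steps, num_devices
)
print(total_train_batch_size)
```

Under that assumption the result agrees with the `total_train_batch_size: 16` entry in the README.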

adapter_config.json ADDED

	@@ -0,0 +1,27 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen1.5-7B",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "q_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_rslora": false
+}
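
Reviewer note on the adapter config above: with `r = 8`, `lora_alpha = 16`, and `use_rslora = false`, the LoRA update is scaled by `lora_alpha / r`. The sketch below computes that factor and a rough trainable-parameter count. The model dimensions (`hidden_size = 4096`, 32 layers, square `q_proj` and `v_proj`) are assumptions about Qwen1.5-7B and do not appear in this diff.

```python
import math

# Values copied from adapter_config.json in this diff.
r = 8
lora_alpha = 16
use_rslora = False
target_modules = ["v_proj", "q_proj"]

# Assumptions about the base model (not stated in this diff).
hidden_size = 4096
num_layers = 32


def lora_scaling(alpha: float, rank: int, rslora: bool = False) -> float:
    """Scaling applied to the low-rank update B @ A."""
    return alpha / math.sqrt(rank) if rslora else alpha / rank


# Each adapted linear layer adds A (rank x in) and B (out x rank).
params_per_module = r * (hidden_size + hidden_size)
trainable_params = num_layers * len(target_modules) * params_per_module
fp32_bytes = trainable_params * 4

print(lora_scaling(lora_alpha, r, use_rslora))
print(trainable_params, fp32_bytes)
```

Under these assumptions the fp32 weights take 16,777,216 bytes, which is 16,984 bytes less than the `adapter_model.safetensors` size of 16,794,200 shown in this commit. The small gap would be consistent with file metadata, though that is not confirmed here.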

adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7419f49b6d29f1e884f957d0a7683a0ee62f9fe457a8c76cb9012a3d918f7efd
+size 16794200
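
Reviewer note: the three lines above are a Git LFS pointer, not the weights themselves. Each line is a `key value` pair. A minimal sketch of reading one follows; `parse_lfs_pointer` is a hypothetical helper written for this note and is not part of any library.

```python
# Pointer text copied from the adapter_model.safetensors diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:7419f49b6d29f1e884f957d0a7683a0ee62f9fe457a8c76cb9012a3d918f7efd
size 16794200
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its fields (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each pointer line is "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


info = parse_lfs_pointer(POINTER_TEXT)
print(info["hash_algo"], info["size"])
```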

added_tokens.json ADDED

	@@ -0,0 +1,5 @@

+{
+  "<|endoftext|>": 151643,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644
+}

all_results.json ADDED

	@@ -0,0 +1,7 @@

+{
+    "epoch": 1.5,
+    "train_loss": 0.5786039993286133,
+    "train_runtime": 49929.7586,
+    "train_samples_per_second": 3.005,
+    "train_steps_per_second": 0.188
+}
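
Reviewer note on the results above: the throughput figures can be cross-checked against each other. The sketch below derives the wall-clock hours, the samples per optimizer step, and an approximate step count. The logged rates are rounded to three decimals, so the derived numbers are approximate.

```python
# Values copied from all_results.json in this diff.
train_runtime = 49929.7586            # seconds
train_samples_per_second = 3.005
train_steps_per_second = 0.188

# Wall-clock duration of the run in hours.
runtime_hours = train_runtime / 3600

# Samples consumed per optimizer step (should be near the effective batch size).
samples_per_step = train_samples_per_second / train_steps_per_second

# Approximate number of optimizer steps; rounding in the rates makes this inexact.
approx_steps = train_runtime * train_steps_per_second

print(f"{runtime_hours:.2f} h, {samples_per_step:.2f} samples/step, ~{approx_steps:.0f} steps")
```

The roughly 16 samples per step agree with `total_train_batch_size: 16` in the README, and the step estimate lands close to the `max_steps` of 9375 recorded in `trainer_state.json`.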

checkpoint-100/README.md ADDED

	@@ -0,0 +1,204 @@

+---
+library_name: peft
+base_model: Qwen/Qwen1.5-7B
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.8.2

checkpoint-100/adapter_config.json ADDED

	@@ -0,0 +1,27 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen1.5-7B",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "q_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_rslora": false
+}

checkpoint-100/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:434d95a4249f6504296279a5afb6eab6cc6be19784dda3e605ca88d38397055f
+size 16794200

checkpoint-100/added_tokens.json ADDED

	@@ -0,0 +1,5 @@

+{
+  "<|endoftext|>": 151643,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644
+}

checkpoint-100/merges.txt ADDED

The diff for this file is too large to render.

checkpoint-100/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:07ba0565b93c130179680f49954bf9c5e43262001804ed2a6cead76db4398bda
+size 33662074

checkpoint-100/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5d9d73fcab9fded2400eb9f44001395b47a31303928eb60d0757795eafcf11eb
+size 14244

checkpoint-100/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3e768aec4f8b2ab32c9e46a5c7390405181351d84675afd0c7341bdc75075b42
+size 1064

checkpoint-100/special_tokens_map.json ADDED

	@@ -0,0 +1,14 @@

+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>"
+  ],
+  "eos_token": "<|im_end|>",
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-100/tokenizer_config.json ADDED

	@@ -0,0 +1,44 @@

+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>"
+  ],
+  "bos_token": null,
+  "chat_template": "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "model_max_length": 32768,
+  "pad_token": "<|endoftext|>",
+  "padding_side": "right",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}
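
Reviewer note on the `chat_template` above: it wraps every message as `<|im_start|>{role}\n{content}<|im_end|>\n` and optionally appends an open assistant turn. The sketch below is a hand-written plain-Python re-implementation of that Jinja template for illustration; `render_chatml` is a hypothetical name, and real code would normally let the tokenizer apply the template.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Mirror the Jinja chat_template from tokenizer_config.json in plain Python."""
    parts = []
    for message in messages:
        # One block per message: start marker, role, newline, content, end marker, newline.
        parts.append(
            "<|im_start|>" + message["role"] + "\n"
            + message["content"] + "<|im_end|>" + "\n"
        )
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"},
    ],
    add_generation_prompt=True,
)
print(prompt)
```

Because `eos_token` is `<|im_end|>` in this config, generation would be expected to stop at the end of the assistant turn.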

checkpoint-100/trainer_state.json ADDED

	@@ -0,0 +1,141 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.015998240193578706,
+  "eval_steps": 500,
+  "global_step": 100,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.0,
+      "learning_rate": 4.9999964908081455e-05,
+      "loss": 0.7285,
+      "step": 5
+    },
+    {
+      "epoch": 0.0,
+      "learning_rate": 4.999985963242432e-05,
+      "loss": 0.6712,
+      "step": 10
+    },
+    {
+      "epoch": 0.0,
+      "learning_rate": 4.999968417332415e-05,
+      "loss": 0.6081,
+      "step": 15
+    },
+    {
+      "epoch": 0.0,
+      "learning_rate": 4.999943853127351e-05,
+      "loss": 0.6383,
+      "step": 20
+    },
+    {
+      "epoch": 0.0,
+      "learning_rate": 4.999912270696202e-05,
+      "loss": 0.6456,
+      "step": 25
+    },
+    {
+      "epoch": 0.0,
+      "learning_rate": 4.9998736701276295e-05,
+      "loss": 0.6228,
+      "step": 30
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.99982805153e-05,
+      "loss": 0.6134,
+      "step": 35
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.9997754150313815e-05,
+      "loss": 0.5975,
+      "step": 40
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.999715760779541e-05,
+      "loss": 0.6053,
+      "step": 45
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.9996490889419514e-05,
+      "loss": 0.6064,
+      "step": 50
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.999575399705783e-05,
+      "loss": 0.5947,
+      "step": 55
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.999494693277907e-05,
+      "loss": 0.5839,
+      "step": 60
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.999406969884897e-05,
+      "loss": 0.6106,
+      "step": 65
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.999312229773022e-05,
+      "loss": 0.6146,
+      "step": 70
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.99921047320825e-05,
+      "loss": 0.5659,
+      "step": 75
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.9991017004762496e-05,
+      "loss": 0.5682,
+      "step": 80
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.998985911882384e-05,
+      "loss": 0.6352,
+      "step": 85
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.998863107751711e-05,
+      "loss": 0.6018,
+      "step": 90
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 4.998733288428987e-05,
+      "loss": 0.6342,
+      "step": 95
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 4.9985964542786614e-05,
+      "loss": 0.5886,
+      "step": 100
+    }
+  ],
+  "logging_steps": 5,
+  "max_steps": 9375,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 2,
+  "save_steps": 100,
+  "total_flos": 2.3307211022598144e+16,
+  "train_batch_size": 1,
+  "trial_name": null,
+  "trial_params": null
+}
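
Reviewer note on the `log_history` above: the logged learning rates look like a cosine decay from the README's `learning_rate` of 5e-05 over `max_steps` = 9375. The sketch below evaluates a standard half-cosine schedule at two logged steps. Zero warmup steps is an assumption: the diff does not state it, but the logged values are consistent with it.

```python
import math

# Values copied from this diff: README learning_rate and trainer_state max_steps.
BASE_LR = 5e-05
MAX_STEPS = 9375

# Assumption: no warmup (not stated anywhere in the diff).
WARMUP_STEPS = 0


def cosine_lr(step: int) -> float:
    """Half-cosine decay from BASE_LR toward 0 over MAX_STEPS."""
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Learning rates logged in checkpoint-100/trainer_state.json.
logged = {5: 4.9999964908081455e-05, 100: 4.9985964542786614e-05}

for step, lr in logged.items():
    print(step, cosine_lr(step), lr)
```

Both computed values agree with the logged ones to well below 1e-12, which supports the `lr_scheduler_type: cosine` entry in the README.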

checkpoint-100/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4d44690adb91aea51e69289625324f309934686a0562d32d1f72a68a6232ae48
+size 4920

checkpoint-100/vocab.json ADDED

The diff for this file is too large to render.

checkpoint-1000/README.md ADDED

	@@ -0,0 +1,204 @@

+---
+library_name: peft
+base_model: Qwen/Qwen1.5-7B
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.8.2

checkpoint-1000/adapter_config.json ADDED

	@@ -0,0 +1,27 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen1.5-7B",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "q_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_rslora": false
+}

checkpoint-1000/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b30480fc658a0413024f09dafc9aef2936910f19580be05221ecb0ecc7068925
+size 16794200

checkpoint-1000/added_tokens.json ADDED

	@@ -0,0 +1,5 @@

+{
+  "<|endoftext|>": 151643,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644
+}

checkpoint-1000/merges.txt ADDED

The diff for this file is too large to render.

checkpoint-1000/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:64fad9f48cb3be54e0e1012fd1111735c08ddb6843a0c786622017bdee78e513
+size 33662074

checkpoint-1000/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e1af6716805f356b873cfc2836ca30f8e6774dc7ddd366425c736af5e3b37b92
+size 14244

checkpoint-1000/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b9c080de809ec8b0a7559bf9b59e45c72f73a17c81d01c11662264e945fba7d5
+size 1064

checkpoint-1000/special_tokens_map.json ADDED

	@@ -0,0 +1,14 @@

+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>"
+  ],
+  "eos_token": "<|im_end|>",
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-1000/tokenizer_config.json ADDED

	@@ -0,0 +1,44 @@

+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>"
+  ],
+  "bos_token": null,
+  "chat_template": "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "model_max_length": 32768,
+  "pad_token": "<|endoftext|>",
+  "padding_side": "right",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}

checkpoint-1000/trainer_state.json ADDED

	@@ -0,0 +1,1221 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.15998240193578706,
+  "eval_steps": 500,
+  "global_step": 1000,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.0,
+      "learning_rate": 4.9999964908081455e-05,
+      "loss": 0.7285,
+      "step": 5
+    },
+    {
+      "epoch": 0.0,
+      "learning_rate": 4.999985963242432e-05,
+      "loss": 0.6712,
+      "step": 10
+    },
+    {
+      "epoch": 0.0,
+      "learning_rate": 4.999968417332415e-05,
+      "loss": 0.6081,
+      "step": 15
+    },
+    {
+      "epoch": 0.0,
+      "learning_rate": 4.999943853127351e-05,
+      "loss": 0.6383,
+      "step": 20
+    },
+    {
+      "epoch": 0.0,
+      "learning_rate": 4.999912270696202e-05,
+      "loss": 0.6456,
+      "step": 25
+    },
+    {
+      "epoch": 0.0,
+      "learning_rate": 4.9998736701276295e-05,
+      "loss": 0.6228,
+      "step": 30
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.99982805153e-05,
+      "loss": 0.6134,
+      "step": 35
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.9997754150313815e-05,
+      "loss": 0.5975,
+      "step": 40
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.999715760779541e-05,
+      "loss": 0.6053,
+      "step": 45
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.9996490889419514e-05,
+      "loss": 0.6064,
+      "step": 50
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.999575399705783e-05,
+      "loss": 0.5947,
+      "step": 55
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.999494693277907e-05,
+      "loss": 0.5839,
+      "step": 60
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.999406969884897e-05,
+      "loss": 0.6106,
+      "step": 65
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.999312229773022e-05,
+      "loss": 0.6146,
+      "step": 70
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.99921047320825e-05,
+      "loss": 0.5659,
+      "step": 75
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.9991017004762496e-05,
+      "loss": 0.5682,
+      "step": 80
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.998985911882384e-05,
+      "loss": 0.6352,
+      "step": 85
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.998863107751711e-05,
+      "loss": 0.6018,
+      "step": 90
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 4.998733288428987e-05,
+      "loss": 0.6342,
+      "step": 95
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 4.9985964542786614e-05,
+      "loss": 0.5886,
+      "step": 100
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 4.998452605684874e-05,
+      "loss": 0.6097,
+      "step": 105
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 4.998301743051459e-05,
+      "loss": 0.5687,
+      "step": 110
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 4.998143866801942e-05,
+      "loss": 0.5866,
+      "step": 115
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 4.997978977379536e-05,
+      "loss": 0.5612,
+      "step": 120
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 4.997807075247146e-05,
+      "loss": 0.6221,
+      "step": 125
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 4.997628160887361e-05,
+      "loss": 0.5728,
+      "step": 130
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 4.997442234802456e-05,
+      "loss": 0.6105,
+      "step": 135
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 4.997249297514394e-05,
+      "loss": 0.6161,
+      "step": 140
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 4.997049349564814e-05,
+      "loss": 0.6511,
+      "step": 145
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 4.996842391515044e-05,
+      "loss": 0.6108,
+      "step": 150
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 4.996628423946087e-05,
+      "loss": 0.5664,
+      "step": 155
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 4.996407447458626e-05,
+      "loss": 0.6127,
+      "step": 160
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 4.99617946267302e-05,
+      "loss": 0.6213,
+      "step": 165
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 4.995944470229302e-05,
+      "loss": 0.596,
+      "step": 170
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 4.9957024707871806e-05,
+      "loss": 0.5731,
+      "step": 175
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 4.995453465026032e-05,
+      "loss": 0.5704,
+      "step": 180
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 4.995197453644905e-05,
+      "loss": 0.5767,
+      "step": 185
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 4.994934437362513e-05,
+      "loss": 0.6134,
+      "step": 190
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 4.9946644169172355e-05,
+      "loss": 0.5919,
+      "step": 195
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 4.994387393067117e-05,
+      "loss": 0.6031,
+      "step": 200
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 4.994103366589859e-05,
+      "loss": 0.6236,
+      "step": 205
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 4.993812338282826e-05,
+      "loss": 0.6307,
+      "step": 210
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 4.993514308963036e-05,
+      "loss": 0.5618,
+      "step": 215
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 4.993209279467164e-05,
+      "loss": 0.6093,
+      "step": 220
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 4.992897250651535e-05,
+      "loss": 0.5744,
+      "step": 225
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 4.992578223392124e-05,
+      "loss": 0.5844,
+      "step": 230
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 4.992252198584554e-05,
+      "loss": 0.5358,
+      "step": 235
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 4.9919191771440905e-05,
+      "loss": 0.6067,
+      "step": 240
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 4.991579160005644e-05,
+      "loss": 0.6147,
+      "step": 245
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 4.991232148123761e-05,
+      "loss": 0.6185,
+      "step": 250
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 4.990878142472628e-05,
+      "loss": 0.5797,
+      "step": 255
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 4.990517144046064e-05,
+      "loss": 0.58,
+      "step": 260
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 4.9901491538575185e-05,
+      "loss": 0.6051,
+      "step": 265
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 4.9897741729400705e-05,
+      "loss": 0.6199,
+      "step": 270
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 4.9893922023464236e-05,
+      "loss": 0.6173,
+      "step": 275
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 4.989003243148904e-05,
+      "loss": 0.5907,
+      "step": 280
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 4.988607296439458e-05,
+      "loss": 0.5818,
+      "step": 285
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 4.988204363329648e-05,
+      "loss": 0.5767,
+      "step": 290
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 4.987794444950651e-05,
+      "loss": 0.579,
+      "step": 295
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 4.987377542453251e-05,
+      "loss": 0.6454,
+      "step": 300
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 4.986953657007841e-05,
+      "loss": 0.5777,
+      "step": 305
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 4.986522789804417e-05,
+      "loss": 0.532,
+      "step": 310
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 4.9860849420525766e-05,
+      "loss": 0.56,
+      "step": 315
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 4.9856401149815126e-05,
+      "loss": 0.5624,
+      "step": 320
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 4.985188309840012e-05,
+      "loss": 0.6008,
+      "step": 325
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 4.9847295278964514e-05,
+      "loss": 0.6425,
+      "step": 330
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 4.984263770438793e-05,
+      "loss": 0.5907,
+      "step": 335
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 4.9837910387745845e-05,
+      "loss": 0.5926,
+      "step": 340
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 4.98331133423095e-05,
+      "loss": 0.6115,
+      "step": 345
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 4.982824658154589e-05,
+      "loss": 0.5685,
+      "step": 350
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 4.982331011911774e-05,
+      "loss": 0.5412,
+      "step": 355
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 4.981830396888344e-05,
+      "loss": 0.6032,
+      "step": 360
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 4.981322814489703e-05,
+      "loss": 0.568,
+      "step": 365
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 4.980808266140813e-05,
+      "loss": 0.5835,
+      "step": 370
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 4.980286753286195e-05,
+      "loss": 0.5633,
+      "step": 375
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 4.979758277389919e-05,
+      "loss": 0.574,
+      "step": 380
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 4.979222839935602e-05,
+      "loss": 0.5774,
+      "step": 385
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 4.9786804424264085e-05,
+      "loss": 0.5608,
+      "step": 390
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 4.9781310863850405e-05,
+      "loss": 0.5659,
+      "step": 395
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 4.977574773353732e-05,
+      "loss": 0.6167,
+      "step": 400
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 4.977011504894252e-05,
+      "loss": 0.5523,
+      "step": 405
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 4.9764412825878943e-05,
+      "loss": 0.5804,
+      "step": 410
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 4.975864108035474e-05,
+      "loss": 0.5811,
+      "step": 415
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 4.975279982857324e-05,
+      "loss": 0.5832,
+      "step": 420
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 4.9746889086932895e-05,
+      "loss": 0.6035,
+      "step": 425
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 4.974090887202726e-05,
+      "loss": 0.6077,
+      "step": 430
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 4.9734859200644905e-05,
+      "loss": 0.6147,
+      "step": 435
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 4.97287400897694e-05,
+      "loss": 0.6825,
+      "step": 440
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 4.972255155657925e-05,
+      "loss": 0.5573,
+      "step": 445
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 4.971629361844785e-05,
+      "loss": 0.5691,
+      "step": 450
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 4.9709966292943455e-05,
+      "loss": 0.5768,
+      "step": 455
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 4.970356959782909e-05,
+      "loss": 0.5953,
+      "step": 460
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 4.9697103551062556e-05,
+      "loss": 0.6207,
+      "step": 465
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 4.969056817079633e-05,
+      "loss": 0.5845,
+      "step": 470
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 4.968396347537751e-05,
+      "loss": 0.5719,
+      "step": 475
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 4.967728948334784e-05,
+      "loss": 0.5889,
+      "step": 480
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 4.967054621344356e-05,
+      "loss": 0.5574,
+      "step": 485
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 4.966373368459541e-05,
+      "loss": 0.6037,
+      "step": 490
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 4.965685191592859e-05,
+      "loss": 0.5707,
+      "step": 495
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 4.964990092676263e-05,
+      "loss": 0.6586,
+      "step": 500
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 4.964288073661142e-05,
+      "loss": 0.5559,
+      "step": 505
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 4.963579136518312e-05,
+      "loss": 0.5609,
+      "step": 510
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 4.96286328323801e-05,
+      "loss": 0.6081,
+      "step": 515
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 4.96214051582989e-05,
+      "loss": 0.5724,
+      "step": 520
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 4.9614108363230135e-05,
+      "loss": 0.5758,
+      "step": 525
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 4.960674246765851e-05,
+      "loss": 0.6191,
+      "step": 530
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 4.959930749226269e-05,
+      "loss": 0.5638,
+      "step": 535
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 4.959180345791528e-05,
+      "loss": 0.5698,
+      "step": 540
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 4.958423038568274e-05,
+      "loss": 0.5878,
+      "step": 545
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 4.9576588296825386e-05,
+      "loss": 0.5734,
+      "step": 550
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 4.956887721279726e-05,
+      "loss": 0.606,
+      "step": 555
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 4.956109715524608e-05,
+      "loss": 0.5871,
+      "step": 560
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 4.955324814601324e-05,
+      "loss": 0.6306,
+      "step": 565
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 4.9545330207133664e-05,
+      "loss": 0.6231,
+      "step": 570
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 4.953734336083583e-05,
+      "loss": 0.6278,
+      "step": 575
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 4.952928762954161e-05,
+      "loss": 0.5551,
+      "step": 580
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 4.952116303586631e-05,
+      "loss": 0.6441,
+      "step": 585
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 4.951296960261853e-05,
+      "loss": 0.5753,
+      "step": 590
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 4.9504707352800125e-05,
+      "loss": 0.5458,
+      "step": 595
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 4.949637630960617e-05,
+      "loss": 0.5668,
+      "step": 600
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 4.948797649642484e-05,
+      "loss": 0.5727,
+      "step": 605
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 4.9479507936837364e-05,
+      "loss": 0.628,
+      "step": 610
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 4.947097065461801e-05,
+      "loss": 0.5958,
+      "step": 615
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 4.946236467373392e-05,
+      "loss": 0.6173,
+      "step": 620
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 4.9453690018345144e-05,
+      "loss": 0.5473,
+      "step": 625
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 4.9444946712804494e-05,
+      "loss": 0.6012,
+      "step": 630
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 4.943613478165753e-05,
+      "loss": 0.6303,
+      "step": 635
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 4.9427254249642444e-05,
+      "loss": 0.6188,
+      "step": 640
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 4.941830514169004e-05,
+      "loss": 0.6079,
+      "step": 645
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 4.940928748292363e-05,
+      "loss": 0.5589,
+      "step": 650
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 4.940020129865895e-05,
+      "loss": 0.5652,
+      "step": 655
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 4.939104661440415e-05,
+      "loss": 0.5415,
+      "step": 660
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 4.938182345585966e-05,
+      "loss": 0.5657,
+      "step": 665
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 4.9372531848918145e-05,
+      "loss": 0.6403,
+      "step": 670
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 4.9363171819664434e-05,
+      "loss": 0.6198,
+      "step": 675
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 4.935374339437543e-05,
+      "loss": 0.5685,
+      "step": 680
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 4.934424659952006e-05,
+      "loss": 0.5737,
+      "step": 685
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 4.933468146175918e-05,
+      "loss": 0.5975,
+      "step": 690
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 4.9325048007945526e-05,
+      "loss": 0.6033,
+      "step": 695
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 4.9315346265123594e-05,
+      "loss": 0.6587,
+      "step": 700
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 4.9305576260529607e-05,
+      "loss": 0.5903,
+      "step": 705
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 4.929573802159143e-05,
+      "loss": 0.5883,
+      "step": 710
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 4.9285831575928465e-05,
+      "loss": 0.6151,
+      "step": 715
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 4.927585695135162e-05,
+      "loss": 0.5687,
+      "step": 720
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 4.9265814175863186e-05,
+      "loss": 0.6008,
+      "step": 725
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 4.925570327765678e-05,
+      "loss": 0.6045,
+      "step": 730
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 4.9245524285117274e-05,
+      "loss": 0.5439,
+      "step": 735
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 4.9235277226820695e-05,
+      "loss": 0.61,
+      "step": 740
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 4.922496213153416e-05,
+      "loss": 0.5913,
+      "step": 745
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 4.9214579028215776e-05,
+      "loss": 0.6255,
+      "step": 750
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 4.920412794601461e-05,
+      "loss": 0.5834,
+      "step": 755
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 4.9193608914270515e-05,
+      "loss": 0.6102,
+      "step": 760
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 4.918302196251415e-05,
+      "loss": 0.5591,
+      "step": 765
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 4.917236712046682e-05,
+      "loss": 0.556,
+      "step": 770
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 4.916164441804044e-05,
+      "loss": 0.5774,
+      "step": 775
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 4.9150853885337426e-05,
+      "loss": 0.6256,
+      "step": 780
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 4.913999555265062e-05,
+      "loss": 0.6505,
+      "step": 785
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 4.9129069450463186e-05,
+      "loss": 0.6012,
+      "step": 790
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 4.911807560944858e-05,
+      "loss": 0.5913,
+      "step": 795
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 4.910701406047037e-05,
+      "loss": 0.6963,
+      "step": 800
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 4.909588483458225e-05,
+      "loss": 0.5588,
+      "step": 805
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 4.9084687963027894e-05,
+      "loss": 0.6078,
+      "step": 810
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 4.907342347724087e-05,
+      "loss": 0.5816,
+      "step": 815
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 4.906209140884459e-05,
+      "loss": 0.5989,
+      "step": 820
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 4.905069178965215e-05,
+      "loss": 0.5568,
+      "step": 825
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 4.9039224651666325e-05,
+      "loss": 0.6021,
+      "step": 830
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 4.902769002707942e-05,
+      "loss": 0.6009,
+      "step": 835
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 4.90160879482732e-05,
+      "loss": 0.5801,
+      "step": 840
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 4.9004418447818815e-05,
+      "loss": 0.5939,
+      "step": 845
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 4.899268155847667e-05,
+      "loss": 0.6027,
+      "step": 850
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 4.898087731319636e-05,
+      "loss": 0.5416,
+      "step": 855
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 4.896900574511657e-05,
+      "loss": 0.5596,
+      "step": 860
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 4.8957066887565e-05,
+      "loss": 0.6187,
+      "step": 865
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 4.894506077405824e-05,
+      "loss": 0.5733,
+      "step": 870
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 4.893298743830168e-05,
+      "loss": 0.5862,
+      "step": 875
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 4.892084691418947e-05,
+      "loss": 0.6191,
+      "step": 880
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 4.8908639235804324e-05,
+      "loss": 0.6258,
+      "step": 885
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 4.889636443741752e-05,
+      "loss": 0.5504,
+      "step": 890
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 4.888402255348876e-05,
+      "loss": 0.6241,
+      "step": 895
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 4.887161361866608e-05,
+      "loss": 0.5839,
+      "step": 900
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 4.8859137667785735e-05,
+      "loss": 0.5797,
+      "step": 905
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 4.884659473587213e-05,
+      "loss": 0.6189,
+      "step": 910
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 4.8833984858137715e-05,
+      "loss": 0.6194,
+      "step": 915
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 4.8821308069982867e-05,
+      "loss": 0.585,
+      "step": 920
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 4.880856440699582e-05,
+      "loss": 0.582,
+      "step": 925
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 4.8795753904952534e-05,
+      "loss": 0.5789,
+      "step": 930
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 4.878287659981662e-05,
+      "loss": 0.6236,
+      "step": 935
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 4.8769932527739225e-05,
+      "loss": 0.5614,
+      "step": 940
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 4.8756921725058934e-05,
+      "loss": 0.5972,
+      "step": 945
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 4.874384422830167e-05,
+      "loss": 0.5682,
+      "step": 950
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 4.873070007418059e-05,
+      "loss": 0.6327,
+      "step": 955
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 4.871748929959598e-05,
+      "loss": 0.5577,
+      "step": 960
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 4.870421194163515e-05,
+      "loss": 0.56,
+      "step": 965
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 4.8690868037572346e-05,
+      "loss": 0.6324,
+      "step": 970
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 4.867745762486861e-05,
+      "loss": 0.5764,
+      "step": 975
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 4.8663980741171724e-05,
+      "loss": 0.564,
+      "step": 980
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 4.865043742431605e-05,
+      "loss": 0.5534,
+      "step": 985
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 4.863682771232248e-05,
+      "loss": 0.5953,
+      "step": 990
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 4.862315164339829e-05,
+      "loss": 0.4992,
+      "step": 995
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 4.860940925593703e-05,
+      "loss": 0.6061,
+      "step": 1000
+    }
+  ],
+  "logging_steps": 5,
+  "max_steps": 9375,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 2,
+  "save_steps": 100,
+  "total_flos": 2.2933185503035392e+17,
+  "train_batch_size": 1,
+  "trial_name": null,
+  "trial_params": null
+}
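
The `learning_rate` values logged above appear to follow a half-cosine decay with no warmup. The sketch below checks that reading against a few logged entries. The peak rate of `5e-5` is an assumption inferred from the first log entries, not a value stated in this file, and `cosine_lr` is a hypothetical helper name, not the trainer's own function.

```python
import math

BASE_LR = 5e-5      # assumed peak learning rate (inferred from the log, not stated in it)
MAX_STEPS = 9375    # "max_steps" from trainer_state.json


def cosine_lr(step: int, base_lr: float = BASE_LR, max_steps: int = MAX_STEPS) -> float:
    """Half-cosine decay from base_lr at step 0 towards 0 at max_steps, no warmup."""
    progress = step / max_steps
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# (step, learning_rate) pairs copied from log_history in checkpoint-1000/trainer_state.json.
logged = {
    200: 4.994387393067117e-05,
    500: 4.964990092676263e-05,
    1000: 4.860940925593703e-05,
}

for step, lr in logged.items():
    # Print the recomputed value next to the logged one for comparison.
    print(step, cosine_lr(step), lr)
```

If the recomputed and logged columns agree, the schedule is consistent with a cosine decay over `max_steps`; a mismatch would suggest warmup or a different scheduler.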

checkpoint-1000/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4d44690adb91aea51e69289625324f309934686a0562d32d1f72a68a6232ae48
+size 4920
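
The three added lines are a Git LFS pointer, not the binary itself: a spec version, a SHA-256 object id, and the size in bytes. As a minimal sketch, a pointer like this can be split into fields as follows; `parse_lfs_pointer` is a hypothetical helper written for illustration, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its version, hash algorithm, digest and size."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from checkpoint-1000/training_args.bin above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:4d44690adb91aea51e69289625324f309934686a0562d32d1f72a68a6232ae48
size 4920
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```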

checkpoint-1000/vocab.json ADDED

The diff for this file is too large to render. See raw diff

checkpoint-1100/README.md ADDED

	@@ -0,0 +1,204 @@

+---
+library_name: peft
+base_model: Qwen/Qwen1.5-7B
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.8.2

checkpoint-1100/adapter_config.json ADDED

	@@ -0,0 +1,27 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen1.5-7B",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "q_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_rslora": false
+}
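
To make the config concrete, the sketch below estimates how many trainable parameters this LoRA setup adds. The values `r`, `lora_alpha`, and the two target modules come from the file above. The hidden size of 4096, the 32 decoder layers, and square `q_proj`/`v_proj` projections are assumptions about `Qwen/Qwen1.5-7B` that are not stated anywhere in this diff.

```python
def lora_param_count(r: int, in_features: int, out_features: int,
                     modules_per_layer: int, num_layers: int) -> int:
    """Each adapted linear layer gains A (r x in) and B (out x r) matrices."""
    per_module = r * (in_features + out_features)
    return per_module * modules_per_layer * num_layers


R = 8               # "r" in adapter_config.json
LORA_ALPHA = 16     # "lora_alpha" in adapter_config.json
TARGETS = 2         # "v_proj" and "q_proj"
HIDDEN_SIZE = 4096  # assumption about the base model, not from this diff
NUM_LAYERS = 32     # assumption about the base model, not from this diff

params = lora_param_count(R, HIDDEN_SIZE, HIDDEN_SIZE, TARGETS, NUM_LAYERS)
scaling = LORA_ALPHA / R   # standard LoRA scaling, since "use_rslora" is false
fp32_bytes = params * 4    # size if the adapter weights are stored as float32

print(params, scaling, fp32_bytes)
```

Under these assumptions the float32 estimate lands close to the 16794200-byte `adapter_model.safetensors` pointer in the same checkpoint, with the small remainder plausibly being file header and metadata.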

checkpoint-1100/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:19934a9511ecf62cfcdf35652e5259e1cd42fc1d49aeab8cd4516d133e8eb25a
+size 16794200

checkpoint-1100/added_tokens.json ADDED

	@@ -0,0 +1,5 @@

+{
+  "<|endoftext|>": 151643,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644
+}

checkpoint-1100/merges.txt ADDED

The diff for this file is too large to render. See raw diff

checkpoint-1100/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a85cb04d96917f5fc80b13194fcf4a2f4c8693f31b6871813008dba903430ec8
+size 33662074

checkpoint-1100/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8c424384b626fd13e20f3948d7c6ef387ec7a14238d77ce4204ad41af1b60f0b
+size 14244

checkpoint-1100/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9e67631f49d5befdfdeca5817dfc9485cfee75a3c901cce81dfb12e0114fa127
+size 1064

checkpoint-1100/special_tokens_map.json ADDED

	@@ -0,0 +1,14 @@

+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>"
+  ],
+  "eos_token": "<|im_end|>",
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-1100/tokenizer_config.json ADDED

	@@ -0,0 +1,44 @@

+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>"
+  ],
+  "bos_token": null,
+  "chat_template": "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "model_max_length": 32768,
+  "pad_token": "<|endoftext|>",
+  "padding_side": "right",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}
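
The `chat_template` entry is a Jinja template that wraps each message in `<|im_start|>` and `<|im_end|>` markers. The plain-Python sketch below mirrors that template so its output is easy to inspect. `render_chatml` is a hypothetical helper for illustration only; in practice the tokenizer would render the template itself.

```python
def render_chatml(messages: list[dict], add_generation_prompt: bool = False) -> str:
    """Mirror the chat_template: one <|im_start|>role\\ncontent<|im_end|>\\n block per message."""
    out = ""
    for message in messages:
        out += "<|im_start|>" + message["role"] + "\n" + message["content"] + "<|im_end|>" + "\n"
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        out += "<|im_start|>assistant\n"
    return out


# Example conversation (illustrative content, not from the training data).
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
]

prompt = render_chatml(messages, add_generation_prompt=True)
print(prompt)
```

Because `eos_token` is `<|im_end|>`, generation would normally stop at the end of the assistant turn that the final line opens.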

checkpoint-1100/trainer_state.json ADDED

	@@ -0,0 +1,1341 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.17598064212936576,
+  "eval_steps": 500,
+  "global_step": 1100,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.0,
+      "learning_rate": 4.9999964908081455e-05,
+      "loss": 0.7285,
+      "step": 5
+    },
+    {
+      "epoch": 0.0,
+      "learning_rate": 4.999985963242432e-05,
+      "loss": 0.6712,
+      "step": 10
+    },
+    {
+      "epoch": 0.0,
+      "learning_rate": 4.999968417332415e-05,
+      "loss": 0.6081,
+      "step": 15
+    },
+    {
+      "epoch": 0.0,
+      "learning_rate": 4.999943853127351e-05,
+      "loss": 0.6383,
+      "step": 20
+    },
+    {
+      "epoch": 0.0,
+      "learning_rate": 4.999912270696202e-05,
+      "loss": 0.6456,
+      "step": 25
+    },
+    {
+      "epoch": 0.0,
+      "learning_rate": 4.9998736701276295e-05,
+      "loss": 0.6228,
+      "step": 30
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.99982805153e-05,
+      "loss": 0.6134,
+      "step": 35
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.9997754150313815e-05,
+      "loss": 0.5975,
+      "step": 40
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.999715760779541e-05,
+      "loss": 0.6053,
+      "step": 45
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.9996490889419514e-05,
+      "loss": 0.6064,
+      "step": 50
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.999575399705783e-05,
+      "loss": 0.5947,
+      "step": 55
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.999494693277907e-05,
+      "loss": 0.5839,
+      "step": 60
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.999406969884897e-05,
+      "loss": 0.6106,
+      "step": 65
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.999312229773022e-05,
+      "loss": 0.6146,
+      "step": 70
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.99921047320825e-05,
+      "loss": 0.5659,
+      "step": 75
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.9991017004762496e-05,
+      "loss": 0.5682,
+      "step": 80
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.998985911882384e-05,
+      "loss": 0.6352,
+      "step": 85
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.998863107751711e-05,
+      "loss": 0.6018,
+      "step": 90
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 4.998733288428987e-05,
+      "loss": 0.6342,
+      "step": 95
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 4.9985964542786614e-05,
+      "loss": 0.5886,
+      "step": 100
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 4.998452605684874e-05,
+      "loss": 0.6097,
+      "step": 105
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 4.998301743051459e-05,
+      "loss": 0.5687,
+      "step": 110
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 4.998143866801942e-05,
+      "loss": 0.5866,
+      "step": 115
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 4.997978977379536e-05,
+      "loss": 0.5612,
+      "step": 120
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 4.997807075247146e-05,
+      "loss": 0.6221,
+      "step": 125
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 4.997628160887361e-05,
+      "loss": 0.5728,
+      "step": 130
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 4.997442234802456e-05,
+      "loss": 0.6105,
+      "step": 135
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 4.997249297514394e-05,
+      "loss": 0.6161,
+      "step": 140
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 4.997049349564814e-05,
+      "loss": 0.6511,
+      "step": 145
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 4.996842391515044e-05,
+      "loss": 0.6108,
+      "step": 150
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 4.996628423946087e-05,
+      "loss": 0.5664,
+      "step": 155
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 4.996407447458626e-05,
+      "loss": 0.6127,
+      "step": 160
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 4.99617946267302e-05,
+      "loss": 0.6213,
+      "step": 165
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 4.995944470229302e-05,
+      "loss": 0.596,
+      "step": 170
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 4.9957024707871806e-05,
+      "loss": 0.5731,
+      "step": 175
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 4.995453465026032e-05,
+      "loss": 0.5704,
+      "step": 180
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 4.995197453644905e-05,
+      "loss": 0.5767,
+      "step": 185
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 4.994934437362513e-05,
+      "loss": 0.6134,
+      "step": 190
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 4.9946644169172355e-05,
+      "loss": 0.5919,
+      "step": 195
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 4.994387393067117e-05,
+      "loss": 0.6031,
+      "step": 200
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 4.994103366589859e-05,
+      "loss": 0.6236,
+      "step": 205
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 4.993812338282826e-05,
+      "loss": 0.6307,
+      "step": 210
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 4.993514308963036e-05,
+      "loss": 0.5618,
+      "step": 215
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 4.993209279467164e-05,
+      "loss": 0.6093,
+      "step": 220
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 4.992897250651535e-05,
+      "loss": 0.5744,
+      "step": 225
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 4.992578223392124e-05,
+      "loss": 0.5844,
+      "step": 230
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 4.992252198584554e-05,
+      "loss": 0.5358,
+      "step": 235
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 4.9919191771440905e-05,
+      "loss": 0.6067,
+      "step": 240
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 4.991579160005644e-05,
+      "loss": 0.6147,
+      "step": 245
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 4.991232148123761e-05,
+      "loss": 0.6185,
+      "step": 250
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 4.990878142472628e-05,
+      "loss": 0.5797,
+      "step": 255
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 4.990517144046064e-05,
+      "loss": 0.58,
+      "step": 260
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 4.9901491538575185e-05,
+      "loss": 0.6051,
+      "step": 265
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 4.9897741729400705e-05,
+      "loss": 0.6199,
+      "step": 270
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 4.9893922023464236e-05,
+      "loss": 0.6173,
+      "step": 275
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 4.989003243148904e-05,
+      "loss": 0.5907,
+      "step": 280
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 4.988607296439458e-05,
+      "loss": 0.5818,
+      "step": 285
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 4.988204363329648e-05,
+      "loss": 0.5767,
+      "step": 290
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 4.987794444950651e-05,
+      "loss": 0.579,
+      "step": 295
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 4.987377542453251e-05,
+      "loss": 0.6454,
+      "step": 300
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 4.986953657007841e-05,
+      "loss": 0.5777,
+      "step": 305
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 4.986522789804417e-05,
+      "loss": 0.532,
+      "step": 310
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 4.9860849420525766e-05,
+      "loss": 0.56,
+      "step": 315
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 4.9856401149815126e-05,
+      "loss": 0.5624,
+      "step": 320
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 4.985188309840012e-05,
+      "loss": 0.6008,
+      "step": 325
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 4.9847295278964514e-05,
+      "loss": 0.6425,
+      "step": 330
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 4.984263770438793e-05,
+      "loss": 0.5907,
+      "step": 335
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 4.9837910387745845e-05,
+      "loss": 0.5926,
+      "step": 340
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 4.98331133423095e-05,
+      "loss": 0.6115,
+      "step": 345
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 4.982824658154589e-05,
+      "loss": 0.5685,
+      "step": 350
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 4.982331011911774e-05,
+      "loss": 0.5412,
+      "step": 355
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 4.981830396888344e-05,
+      "loss": 0.6032,
+      "step": 360
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 4.981322814489703e-05,
+      "loss": 0.568,
+      "step": 365
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 4.980808266140813e-05,
+      "loss": 0.5835,
+      "step": 370
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 4.980286753286195e-05,
+      "loss": 0.5633,
+      "step": 375
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 4.979758277389919e-05,
+      "loss": 0.574,
+      "step": 380
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 4.979222839935602e-05,
+      "loss": 0.5774,
+      "step": 385
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 4.9786804424264085e-05,
+      "loss": 0.5608,
+      "step": 390
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 4.9781310863850405e-05,
+      "loss": 0.5659,
+      "step": 395
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 4.977574773353732e-05,
+      "loss": 0.6167,
+      "step": 400
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 4.977011504894252e-05,
+      "loss": 0.5523,
+      "step": 405
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 4.9764412825878943e-05,
+      "loss": 0.5804,
+      "step": 410
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 4.975864108035474e-05,
+      "loss": 0.5811,
+      "step": 415
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 4.975279982857324e-05,
+      "loss": 0.5832,
+      "step": 420
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 4.9746889086932895e-05,
+      "loss": 0.6035,
+      "step": 425
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 4.974090887202726e-05,
+      "loss": 0.6077,
+      "step": 430
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 4.9734859200644905e-05,
+      "loss": 0.6147,
+      "step": 435
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 4.97287400897694e-05,
+      "loss": 0.6825,
+      "step": 440
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 4.972255155657925e-05,
+      "loss": 0.5573,
+      "step": 445
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 4.971629361844785e-05,
+      "loss": 0.5691,
+      "step": 450
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 4.9709966292943455e-05,
+      "loss": 0.5768,
+      "step": 455
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 4.970356959782909e-05,
+      "loss": 0.5953,
+      "step": 460
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 4.9697103551062556e-05,
+      "loss": 0.6207,
+      "step": 465
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 4.969056817079633e-05,
+      "loss": 0.5845,
+      "step": 470
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 4.968396347537751e-05,
+      "loss": 0.5719,
+      "step": 475
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 4.967728948334784e-05,
+      "loss": 0.5889,
+      "step": 480
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 4.967054621344356e-05,
+      "loss": 0.5574,
+      "step": 485
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 4.966373368459541e-05,
+      "loss": 0.6037,
+      "step": 490
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 4.965685191592859e-05,
+      "loss": 0.5707,
+      "step": 495
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 4.964990092676263e-05,
+      "loss": 0.6586,
+      "step": 500
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 4.964288073661142e-05,
+      "loss": 0.5559,
+      "step": 505
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 4.963579136518312e-05,
+      "loss": 0.5609,
+      "step": 510
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 4.96286328323801e-05,
+      "loss": 0.6081,
+      "step": 515
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 4.96214051582989e-05,
+      "loss": 0.5724,
+      "step": 520
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 4.9614108363230135e-05,
+      "loss": 0.5758,
+      "step": 525
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 4.960674246765851e-05,
+      "loss": 0.6191,
+      "step": 530
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 4.959930749226269e-05,
+      "loss": 0.5638,
+      "step": 535
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 4.959180345791528e-05,
+      "loss": 0.5698,
+      "step": 540
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 4.958423038568274e-05,
+      "loss": 0.5878,
+      "step": 545
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 4.9576588296825386e-05,
+      "loss": 0.5734,
+      "step": 550
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 4.956887721279726e-05,
+      "loss": 0.606,
+      "step": 555
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 4.956109715524608e-05,
+      "loss": 0.5871,
+      "step": 560
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 4.955324814601324e-05,
+      "loss": 0.6306,
+      "step": 565
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 4.9545330207133664e-05,
+      "loss": 0.6231,
+      "step": 570
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 4.953734336083583e-05,
+      "loss": 0.6278,
+      "step": 575
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 4.952928762954161e-05,
+      "loss": 0.5551,
+      "step": 580
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 4.952116303586631e-05,
+      "loss": 0.6441,
+      "step": 585
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 4.951296960261853e-05,
+      "loss": 0.5753,
+      "step": 590
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 4.9504707352800125e-05,
+      "loss": 0.5458,
+      "step": 595
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 4.949637630960617e-05,
+      "loss": 0.5668,
+      "step": 600
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 4.948797649642484e-05,
+      "loss": 0.5727,
+      "step": 605
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 4.9479507936837364e-05,
+      "loss": 0.628,
+      "step": 610
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 4.947097065461801e-05,
+      "loss": 0.5958,
+      "step": 615
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 4.946236467373392e-05,
+      "loss": 0.6173,
+      "step": 620
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 4.9453690018345144e-05,
+      "loss": 0.5473,
+      "step": 625
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 4.9444946712804494e-05,
+      "loss": 0.6012,
+      "step": 630
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 4.943613478165753e-05,
+      "loss": 0.6303,
+      "step": 635
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 4.9427254249642444e-05,
+      "loss": 0.6188,
+      "step": 640
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 4.941830514169004e-05,
+      "loss": 0.6079,
+      "step": 645
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 4.940928748292363e-05,
+      "loss": 0.5589,
+      "step": 650
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 4.940020129865895e-05,
+      "loss": 0.5652,
+      "step": 655
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 4.939104661440415e-05,
+      "loss": 0.5415,
+      "step": 660
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 4.938182345585966e-05,
+      "loss": 0.5657,
+      "step": 665
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 4.9372531848918145e-05,
+      "loss": 0.6403,
+      "step": 670
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 4.9363171819664434e-05,
+      "loss": 0.6198,
+      "step": 675
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 4.935374339437543e-05,
+      "loss": 0.5685,
+      "step": 680
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 4.934424659952006e-05,
+      "loss": 0.5737,
+      "step": 685
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 4.933468146175918e-05,
+      "loss": 0.5975,
+      "step": 690
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 4.9325048007945526e-05,
+      "loss": 0.6033,
+      "step": 695
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 4.9315346265123594e-05,
+      "loss": 0.6587,
+      "step": 700
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 4.9305576260529607e-05,
+      "loss": 0.5903,
+      "step": 705
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 4.929573802159143e-05,
+      "loss": 0.5883,
+      "step": 710
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 4.9285831575928465e-05,
+      "loss": 0.6151,
+      "step": 715
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 4.927585695135162e-05,
+      "loss": 0.5687,
+      "step": 720
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 4.9265814175863186e-05,
+      "loss": 0.6008,
+      "step": 725
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 4.925570327765678e-05,
+      "loss": 0.6045,
+      "step": 730
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 4.9245524285117274e-05,
+      "loss": 0.5439,
+      "step": 735
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 4.9235277226820695e-05,
+      "loss": 0.61,
+      "step": 740
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 4.922496213153416e-05,
+      "loss": 0.5913,
+      "step": 745
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 4.9214579028215776e-05,
+      "loss": 0.6255,
+      "step": 750
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 4.920412794601461e-05,
+      "loss": 0.5834,
+      "step": 755
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 4.9193608914270515e-05,
+      "loss": 0.6102,
+      "step": 760
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 4.918302196251415e-05,
+      "loss": 0.5591,
+      "step": 765
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 4.917236712046682e-05,
+      "loss": 0.556,
+      "step": 770
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 4.916164441804044e-05,
+      "loss": 0.5774,
+      "step": 775
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 4.9150853885337426e-05,
+      "loss": 0.6256,
+      "step": 780
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 4.913999555265062e-05,
+      "loss": 0.6505,
+      "step": 785
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 4.9129069450463186e-05,
+      "loss": 0.6012,
+      "step": 790
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 4.911807560944858e-05,
+      "loss": 0.5913,
+      "step": 795
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 4.910701406047037e-05,
+      "loss": 0.6963,
+      "step": 800
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 4.909588483458225e-05,
+      "loss": 0.5588,
+      "step": 805
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 4.9084687963027894e-05,
+      "loss": 0.6078,
+      "step": 810
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 4.907342347724087e-05,
+      "loss": 0.5816,
+      "step": 815
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 4.906209140884459e-05,
+      "loss": 0.5989,
+      "step": 820
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 4.905069178965215e-05,
+      "loss": 0.5568,
+      "step": 825
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 4.9039224651666325e-05,
+      "loss": 0.6021,
+      "step": 830
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 4.902769002707942e-05,
+      "loss": 0.6009,
+      "step": 835
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 4.90160879482732e-05,
+      "loss": 0.5801,
+      "step": 840
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 4.9004418447818815e-05,
+      "loss": 0.5939,
+      "step": 845
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 4.899268155847667e-05,
+      "loss": 0.6027,
+      "step": 850
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 4.898087731319636e-05,
+      "loss": 0.5416,
+      "step": 855
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 4.896900574511657e-05,
+      "loss": 0.5596,
+      "step": 860
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 4.8957066887565e-05,
+      "loss": 0.6187,
+      "step": 865
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 4.894506077405824e-05,
+      "loss": 0.5733,
+      "step": 870
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 4.893298743830168e-05,
+      "loss": 0.5862,
+      "step": 875
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 4.892084691418947e-05,
+      "loss": 0.6191,
+      "step": 880
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 4.8908639235804324e-05,
+      "loss": 0.6258,
+      "step": 885
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 4.889636443741752e-05,
+      "loss": 0.5504,
+      "step": 890
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 4.888402255348876e-05,
+      "loss": 0.6241,
+      "step": 895
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 4.887161361866608e-05,
+      "loss": 0.5839,
+      "step": 900
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 4.8859137667785735e-05,
+      "loss": 0.5797,
+      "step": 905
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 4.884659473587213e-05,
+      "loss": 0.6189,
+      "step": 910
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 4.8833984858137715e-05,
+      "loss": 0.6194,
+      "step": 915
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 4.8821308069982867e-05,
+      "loss": 0.585,
+      "step": 920
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 4.880856440699582e-05,
+      "loss": 0.582,
+      "step": 925
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 4.8795753904952534e-05,
+      "loss": 0.5789,
+      "step": 930
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 4.878287659981662e-05,
+      "loss": 0.6236,
+      "step": 935
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 4.8769932527739225e-05,
+      "loss": 0.5614,
+      "step": 940
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 4.8756921725058934e-05,
+      "loss": 0.5972,
+      "step": 945
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 4.874384422830167e-05,
+      "loss": 0.5682,
+      "step": 950
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 4.873070007418059e-05,
+      "loss": 0.6327,
+      "step": 955
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 4.871748929959598e-05,
+      "loss": 0.5577,
+      "step": 960
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 4.870421194163515e-05,
+      "loss": 0.56,
+      "step": 965
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 4.8690868037572346e-05,
+      "loss": 0.6324,
+      "step": 970
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 4.867745762486861e-05,
+      "loss": 0.5764,
+      "step": 975
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 4.8663980741171724e-05,
+      "loss": 0.564,
+      "step": 980
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 4.865043742431605e-05,
+      "loss": 0.5534,
+      "step": 985
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 4.863682771232248e-05,
+      "loss": 0.5953,
+      "step": 990
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 4.862315164339829e-05,
+      "loss": 0.4992,
+      "step": 995
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 4.860940925593703e-05,
+      "loss": 0.6061,
+      "step": 1000
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 4.859560058851844e-05,
+      "loss": 0.5429,
+      "step": 1005
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 4.8581725679908317e-05,
+      "loss": 0.6121,
+      "step": 1010
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 4.856778456905846e-05,
+      "loss": 0.5678,
+      "step": 1015
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 4.855377729510648e-05,
+      "loss": 0.633,
+      "step": 1020
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 4.8539703897375755e-05,
+      "loss": 0.6405,
+      "step": 1025
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 4.852556441537528e-05,
+      "loss": 0.5485,
+      "step": 1030
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 4.851135888879958e-05,
+      "loss": 0.5974,
+      "step": 1035
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 4.849708735752859e-05,
+      "loss": 0.582,
+      "step": 1040
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 4.848274986162754e-05,
+      "loss": 0.5806,
+      "step": 1045
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 4.846834644134686e-05,
+      "loss": 0.6104,
+      "step": 1050
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 4.845387713712203e-05,
+      "loss": 0.6216,
+      "step": 1055
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 4.84393419895735e-05,
+      "loss": 0.612,
+      "step": 1060
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 4.8424741039506575e-05,
+      "loss": 0.6155,
+      "step": 1065
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 4.841007432791129e-05,
+      "loss": 0.5703,
+      "step": 1070
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 4.839534189596228e-05,
+      "loss": 0.6044,
+      "step": 1075
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 4.8380543785018677e-05,
+      "loss": 0.5819,
+      "step": 1080
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 4.8365680036624026e-05,
+      "loss": 0.617,
+      "step": 1085
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 4.835075069250613e-05,
+      "loss": 0.5707,
+      "step": 1090
+    },
+    {
+      "epoch": 0.18,
+      "learning_rate": 4.833575579457691e-05,
+      "loss": 0.6348,
+      "step": 1095
+    },
+    {
+      "epoch": 0.18,
+      "learning_rate": 4.832069538493237e-05,
+      "loss": 0.6339,
+      "step": 1100
+    }
+  ],
+  "logging_steps": 5,
+  "max_steps": 9375,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 2,
+  "save_steps": 100,
+  "total_flos": 2.520154628849664e+17,
+  "train_batch_size": 1,
+  "trial_name": null,
+  "trial_params": null
+}
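
Note on the log above: the logged `learning_rate` values look consistent with a cosine decay with no warmup, falling from a peak of 5e-5 to 0 over `max_steps` = 9375. Neither the schedule type nor the peak value is stated in this file; both are inferred from the numbers. A minimal sketch that recomputes a few logged values under that assumption (`cosine_lr` is a hypothetical helper, not part of the trainer):

```python
import math

PEAK_LR = 5e-5      # assumption: inferred from the logged values, not stated in the file
MAX_STEPS = 9375    # "max_steps" from trainer_state.json


def cosine_lr(step: int, peak_lr: float = PEAK_LR, max_steps: int = MAX_STEPS) -> float:
    """Cosine decay from peak_lr to 0 over max_steps, with no warmup."""
    progress = step / max_steps
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


# (step, learning_rate) pairs copied from the log above
logged = {
    100: 4.9985964542786614e-05,
    500: 4.964990092676263e-05,
    1100: 4.832069538493237e-05,
}

for step, lr in logged.items():
    # Print the logged value next to the recomputed one for comparison
    print(step, lr, cosine_lr(step))
```

If the recomputed values drift away from the logged ones at later steps, the assumed schedule or peak is wrong and the run's `training_args.bin` is the place to check.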

checkpoint-1100/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4d44690adb91aea51e69289625324f309934686a0562d32d1f72a68a6232ae48
+size 4920
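
Note: the three added lines are a Git LFS pointer, not the binary itself. The pointer records the spec version, the SHA-256 object id, and the size in bytes of the real file. A small sketch of reading such a pointer (`parse_lfs_pointer` is a hypothetical helper written for illustration, not a Git LFS API):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into version, hash algorithm, oid, and size."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>"
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid value has the form "<hash-algorithm>:<hex digest>"
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the diff above
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:4d44690adb91aea51e69289625324f309934686a0562d32d1f72a68a6232ae48
size 4920
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```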

checkpoint-1100/vocab.json ADDED

The diff for this file is too large to render. See raw diff

checkpoint-1200/README.md ADDED

	@@ -0,0 +1,204 @@

+---
+library_name: peft
+base_model: Qwen/Qwen1.5-7B
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.8.2

checkpoint-1200/adapter_config.json ADDED

	@@ -0,0 +1,27 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen1.5-7B",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "q_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_rslora": false
+}

checkpoint-1200/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:944f65ad2800e59e75d20353da25153da181f4056f0ef771a8f531a8b69100fd
+size 16794200
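
Note: the adapter size is roughly what the `adapter_config.json` settings predict. The arithmetic below is a back-of-the-envelope sketch, not something stated in the commit. It assumes the base model uses a hidden size of 4096 with 32 decoder layers, that `q_proj` and `v_proj` are both 4096 x 4096, and that the adapter weights are stored as 32-bit floats; only `r`, `lora_alpha`, and the two target modules come from the config. `lora_param_count` is a hypothetical helper.

```python
def lora_param_count(r, in_features, out_features, modules_per_layer, num_layers):
    """Count LoRA parameters: each adapted linear layer adds A (r x in) and B (out x r)."""
    per_module = r * (in_features + out_features)
    return per_module * modules_per_layer * num_layers


HIDDEN_SIZE = 4096   # assumption about the base model, not in the diff
NUM_LAYERS = 32      # assumption about the base model, not in the diff
BYTES_PER_PARAM = 4  # assumption: weights saved as float32

R = 8                # "r" from adapter_config.json
LORA_ALPHA = 16      # "lora_alpha" from adapter_config.json
TARGET_MODULES = 2   # "v_proj" and "q_proj"
FILE_SIZE = 16794200 # size from the LFS pointer for adapter_model.safetensors

params = lora_param_count(R, HIDDEN_SIZE, HIDDEN_SIZE, TARGET_MODULES, NUM_LAYERS)
tensor_bytes = params * BYTES_PER_PARAM
scaling = LORA_ALPHA / R  # standard LoRA scaling, since "use_rslora" is false

print(params, tensor_bytes, FILE_SIZE - tensor_bytes, scaling)
```

Under these assumptions the tensor data accounts for all but a few kilobytes of the file, which would be consistent with a small header and metadata; if the assumed dimensions are wrong, the gap would be much larger.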

checkpoint-1200/added_tokens.json ADDED

	@@ -0,0 +1,5 @@

+{
+  "<|endoftext|>": 151643,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644
+}

checkpoint-1200/merges.txt ADDED

The diff for this file is too large to render. See raw diff

checkpoint-1200/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:aaabef3df9d233ea3552c758102d22c97d8cce4405bc846589623bb9e44d51e8
+size 33662074