Wenzhe committed on Mar 14

Commit cca8ff9

1 Parent(s): 6c51ad9

Upload folder using huggingface_hub

This view is limited to 50 files because it contains too many changes.

Files changed (50)

README.md +59 -0
adapter_config.json +28 -0
adapter_model.safetensors +3 -0
all_results.json +7 -0
checkpoint-100/README.md +202 -0
checkpoint-100/adapter_config.json +28 -0
checkpoint-100/adapter_model.safetensors +3 -0
checkpoint-100/optimizer.pt +3 -0
checkpoint-100/rng_state.pth +3 -0
checkpoint-100/scheduler.pt +3 -0
checkpoint-100/special_tokens_map.json +30 -0
checkpoint-100/tokenizer.model +3 -0
checkpoint-100/tokenizer_config.json +54 -0
checkpoint-100/trainer_state.json +161 -0
checkpoint-100/training_args.bin +3 -0
checkpoint-1000/README.md +202 -0
checkpoint-1000/adapter_config.json +28 -0
checkpoint-1000/adapter_model.safetensors +3 -0
checkpoint-1000/optimizer.pt +3 -0
checkpoint-1000/rng_state.pth +3 -0
checkpoint-1000/scheduler.pt +3 -0
checkpoint-1000/special_tokens_map.json +30 -0
checkpoint-1000/tokenizer.model +3 -0
checkpoint-1000/tokenizer_config.json +54 -0
checkpoint-1000/trainer_state.json +1421 -0
checkpoint-1000/training_args.bin +3 -0
checkpoint-1100/README.md +202 -0
checkpoint-1100/adapter_config.json +28 -0
checkpoint-1100/adapter_model.safetensors +3 -0
checkpoint-1100/optimizer.pt +3 -0
checkpoint-1100/rng_state.pth +3 -0
checkpoint-1100/scheduler.pt +3 -0
checkpoint-1100/special_tokens_map.json +30 -0
checkpoint-1100/tokenizer.model +3 -0
checkpoint-1100/tokenizer_config.json +54 -0
checkpoint-1100/trainer_state.json +1561 -0
checkpoint-1100/training_args.bin +3 -0
checkpoint-1200/README.md +202 -0
checkpoint-1200/adapter_config.json +28 -0
checkpoint-1200/adapter_model.safetensors +3 -0
checkpoint-1200/optimizer.pt +3 -0
checkpoint-1200/rng_state.pth +3 -0
checkpoint-1200/scheduler.pt +3 -0
checkpoint-1200/special_tokens_map.json +30 -0
checkpoint-1200/tokenizer.model +3 -0
checkpoint-1200/tokenizer_config.json +54 -0
checkpoint-1200/trainer_state.json +1701 -0
checkpoint-1200/training_args.bin +3 -0
checkpoint-1300/README.md +202 -0
checkpoint-1300/adapter_config.json +28 -0

README.md ADDED

	@@ -0,0 +1,59 @@

+---
+license: other
+library_name: peft
+tags:
+- llama-factory
+- lora
+- generated_from_trainer
+base_model: hfl/chinese-alpaca-2-1.3b
+model-index:
+- name: train_2024-03-14-05-56-29
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# train_2024-03-14-05-56-29
+This model is a fine-tuned version of [hfl/chinese-alpaca-2-1.3b](https://huggingface.co/hfl/chinese-alpaca-2-1.3b) on the alpaca_zh dataset.
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 5e-05
+- train_batch_size: 2
+- eval_batch_size: 8
+- seed: 42
+- gradient_accumulation_steps: 8
+- total_train_batch_size: 16
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine
+- num_epochs: 1.0
+- mixed_precision_training: Native AMP
+### Training results
+### Framework versions
+- PEFT 0.9.0
+- Transformers 4.38.2
+- Pytorch 2.2.1+cu121
+- Datasets 2.18.0
+- Tokenizers 0.15.2
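
Note on the hyperparameters above: total_train_batch_size is the per-device batch size multiplied by the gradient accumulation steps and the number of devices. A minimal sketch of that arithmetic, assuming a single training process (num_devices = 1 is an assumption; the card does not state the device count):

```python
# Values copied from the model card in this commit.
train_batch_size = 2
gradient_accumulation_steps = 8

# ASSUMPTION: one device; the card does not say how many were used.
num_devices = 1

# Effective number of samples contributing to each optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)
```

With these inputs the product is 16, which agrees with the total_train_batch_size reported in the card.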

adapter_config.json ADDED

	@@ -0,0 +1,28 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "hfl/chinese-alpaca-2-1.3b",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "q_proj",
+    "v_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
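
Note on the adapter config above: with use_rslora set to false, PEFT scales the low-rank update by lora_alpha / r. The sketch below computes that factor and a rough adapter size. The hidden size, the layer count, and the square shape of q_proj and v_proj are assumptions about the base model that this diff does not state.

```python
# Values copied from adapter_config.json.
r = 8
lora_alpha = 16
target_modules = ["q_proj", "v_proj"]

# With use_rslora false, the LoRA update is scaled by lora_alpha / r.
scaling = lora_alpha / r

# ASSUMPTIONS (not stated in this diff): base model dimensions,
# and that both target projections are hidden_size x hidden_size.
hidden_size = 4096
num_layers = 4

# Each adapted projection adds A (r x hidden) and B (hidden x r).
params_per_module = r * (hidden_size + hidden_size)
lora_params = params_per_module * len(target_modules) * num_layers

# Size of the adapter tensors if stored as 32-bit floats.
fp32_bytes = lora_params * 4

print(scaling, lora_params, fp32_bytes)
```

Under these assumptions the tensor payload comes to 2,097,152 bytes, close to the 2,099,272-byte adapter_model.safetensors recorded in this commit; the small remainder would be consistent with file header overhead.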

adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:462bcac6c9586774f4979f614b6ec3f18bbc2f0febcdd1c766687b7f2056c66a
+size 2099272

all_results.json ADDED

	@@ -0,0 +1,7 @@

+{
+    "epoch": 0.48,
+    "train_loss": 2.0602192145127516,
+    "train_runtime": 801.4891,
+    "train_samples_per_second": 64.198,
+    "train_steps_per_second": 4.011
+}
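
Note on all_results.json: the two throughput figures can be cross-checked against the batch settings. A small sketch, using only the values in the file above, that divides samples per second by steps per second to recover the number of samples per optimizer step:

```python
import json

# Contents of all_results.json as added in this commit.
all_results = json.loads("""
{
    "epoch": 0.48,
    "train_loss": 2.0602192145127516,
    "train_runtime": 801.4891,
    "train_samples_per_second": 64.198,
    "train_steps_per_second": 4.011
}
""")

# Samples consumed per optimizer step, implied by the two rates.
samples_per_step = (
    all_results["train_samples_per_second"] / all_results["train_steps_per_second"]
)
print(round(samples_per_step))
```

The ratio rounds to 16, matching the total_train_batch_size in the model card.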

checkpoint-100/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+library_name: peft
+base_model: hfl/chinese-alpaca-2-1.3b
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.9.0

checkpoint-100/adapter_config.json ADDED

	@@ -0,0 +1,28 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "hfl/chinese-alpaca-2-1.3b",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "q_proj",
+    "v_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}

checkpoint-100/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6d36ee8ac772f766edf9dbbfd9624d8e73a0cfd65f5df48211d324cc57a4e5c6
+size 2099272

checkpoint-100/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3f41177184e0c283aa638b8ba6cde4ccf932b0a8e13fad8e15ae916a4cb18c62
+size 4208302

checkpoint-100/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b5b99b025074fe1142a3334c2107e39b98b9d36c2981cb810283beb4adc7ac94
+size 14244

checkpoint-100/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e2197384a954a4fcf1ccbd8df7831a0d78a90198a2460b1cc6c71d1497ca1586
+size 1064

checkpoint-100/special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-100/tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a3b8844863b200dfcca971db228e96ce388290dfcf72c15d7a9d2f604bac787c
+size 844403

checkpoint-100/tokenizer_config.json ADDED

	@@ -0,0 +1,54 @@

+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "add_prefix_space": true,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32000": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "chat_template": "{% set system_message = 'You are a helpful assistant. 你是一个乐于助人的助手。' %}{% if messages[0]['role'] == 'system' %}{% set system_message = messages[0]['content'] %}{% endif %}{% for message in messages %}{% set content = message['content'] %}{% if loop.index0 == 0 and system_message is defined %}{% set content = '<<SYS>>\\n' + system_message + '\\n<</SYS>>\\n\\n' + message['content'] %}{% endif %}{% if message['role'] == 'user' %}{{ '<s>' + '[INST] ' + content + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ content + '</s>' }}{% endif %}{% endfor %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</s>",
+  "legacy": true,
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "<pad>",
+  "padding_side": "right",
+  "sp_model_kwargs": {},
+  "spaces_between_special_tokens": false,
+  "split_special_tokens": false,
+  "tokenizer_class": "LlamaTokenizer",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": false,
+  "use_fast": false
+}
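
Note on the chat_template above: the Jinja template prefixes the first message with a `<<SYS>>` block and wraps user turns in `[INST] ... [/INST]`. The sketch below mirrors that logic in plain Python to show the resulting prompt string. `render_chat` and `DEFAULT_SYSTEM` are hypothetical names introduced for illustration; in practice the tokenizer renders the template itself.

```python
# Default system prompt taken from the chat_template string.
DEFAULT_SYSTEM = "You are a helpful assistant. 你是一个乐于助人的助手。"


def render_chat(messages):
    """Hypothetical helper mirroring the chat_template logic in plain Python."""
    system_message = DEFAULT_SYSTEM
    if messages[0]["role"] == "system":
        system_message = messages[0]["content"]

    parts = []
    for index, message in enumerate(messages):
        content = message["content"]
        if index == 0:
            # The first message is prefixed with the system block.
            content = "<<SYS>>\n" + system_message + "\n<</SYS>>\n\n" + content
        if message["role"] == "user":
            parts.append("<s>[INST] " + content + " [/INST]")
        elif message["role"] == "assistant":
            parts.append(content + "</s>")
        # Any other role produces no output, as in the template.
    return "".join(parts)


prompt = render_chat([{"role": "user", "content": "你好"}])
print(prompt)
```

For a conversation that starts with a user turn, the default system prompt is embedded in the first `[INST]` block, and each assistant reply is closed with `</s>`.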

checkpoint-100/trainer_state.json ADDED

	@@ -0,0 +1,161 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.031095735997201383,
+  "eval_steps": 500,
+  "global_step": 100,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.0,
+      "grad_norm": 0.47774845361709595,
+      "learning_rate": 4.999970160815579e-05,
+      "loss": 2.0765,
+      "step": 5
+    },
+    {
+      "epoch": 0.0,
+      "grad_norm": 0.6051416397094727,
+      "learning_rate": 4.999880643974619e-05,
+      "loss": 2.2297,
+      "step": 10
+    },
+    {
+      "epoch": 0.0,
+      "grad_norm": 0.6161717772483826,
+      "learning_rate": 4.9997314516140056e-05,
+      "loss": 2.1103,
+      "step": 15
+    },
+    {
+      "epoch": 0.01,
+      "grad_norm": 0.4686434268951416,
+      "learning_rate": 4.999522587295162e-05,
+      "loss": 2.0057,
+      "step": 20
+    },
+    {
+      "epoch": 0.01,
+      "grad_norm": 0.8412289023399353,
+      "learning_rate": 4.999254056003963e-05,
+      "loss": 2.1778,
+      "step": 25
+    },
+    {
+      "epoch": 0.01,
+      "grad_norm": 0.5333625078201294,
+      "learning_rate": 4.99892586415061e-05,
+      "loss": 2.2399,
+      "step": 30
+    },
+    {
+      "epoch": 0.01,
+      "grad_norm": 0.821148157119751,
+      "learning_rate": 4.9985380195694856e-05,
+      "loss": 2.3215,
+      "step": 35
+    },
+    {
+      "epoch": 0.01,
+      "grad_norm": 0.8403909206390381,
+      "learning_rate": 4.998090531518962e-05,
+      "loss": 1.8295,
+      "step": 40
+    },
+    {
+      "epoch": 0.01,
+      "grad_norm": 0.6633398532867432,
+      "learning_rate": 4.9975834106811834e-05,
+      "loss": 2.0195,
+      "step": 45
+    },
+    {
+      "epoch": 0.02,
+      "grad_norm": 0.6386868357658386,
+      "learning_rate": 4.997016669161806e-05,
+      "loss": 2.1257,
+      "step": 50
+    },
+    {
+      "epoch": 0.02,
+      "grad_norm": 0.7762248516082764,
+      "learning_rate": 4.996390320489715e-05,
+      "loss": 2.057,
+      "step": 55
+    },
+    {
+      "epoch": 0.02,
+      "grad_norm": 1.3192856311798096,
+      "learning_rate": 4.9957043796166966e-05,
+      "loss": 2.0753,
+      "step": 60
+    },
+    {
+      "epoch": 0.02,
+      "grad_norm": 0.9797518849372864,
+      "learning_rate": 4.994958862917083e-05,
+      "loss": 1.9736,
+      "step": 65
+    },
+    {
+      "epoch": 0.02,
+      "grad_norm": 1.000693440437317,
+      "learning_rate": 4.994153788187363e-05,
+      "loss": 2.1572,
+      "step": 70
+    },
+    {
+      "epoch": 0.02,
+      "grad_norm": 0.6852813959121704,
+      "learning_rate": 4.993289174645757e-05,
+      "loss": 2.1491,
+      "step": 75
+    },
+    {
+      "epoch": 0.02,
+      "grad_norm": 1.0075691938400269,
+      "learning_rate": 4.992365042931752e-05,
+      "loss": 1.945,
+      "step": 80
+    },
+    {
+      "epoch": 0.03,
+      "grad_norm": 1.1973133087158203,
+      "learning_rate": 4.991381415105619e-05,
+      "loss": 2.0811,
+      "step": 85
+    },
+    {
+      "epoch": 0.03,
+      "grad_norm": 0.9927239418029785,
+      "learning_rate": 4.990338314647881e-05,
+      "loss": 1.961,
+      "step": 90
+    },
+    {
+      "epoch": 0.03,
+      "grad_norm": 0.9499759674072266,
+      "learning_rate": 4.98923576645875e-05,
+      "loss": 2.0653,
+      "step": 95
+    },
+    {
+      "epoch": 0.03,
+      "grad_norm": 0.7233040928840637,
+      "learning_rate": 4.9880737968575365e-05,
+      "loss": 1.9999,
+      "step": 100
+    }
+  ],
+  "logging_steps": 5,
+  "max_steps": 3215,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 100,
+  "total_flos": 1342876749004800.0,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}
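
Note on the log_history above: the logged learning rates are consistent with a cosine decay from 5e-05 over max_steps = 3215. A minimal sketch of that schedule, assuming no warmup steps (the warmup setting is not shown in this diff), compared against two logged values:

```python
import math


def cosine_lr(step, base_lr=5e-05, max_steps=3215):
    """Cosine decay from base_lr to 0 over max_steps, with no warmup (assumed)."""
    progress = step / max_steps
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


# Learning rates copied from the log_history entries at steps 5 and 100.
logged = {5: 4.999970160815579e-05, 100: 4.9880737968575365e-05}

for step, value in logged.items():
    print(step, cosine_lr(step), value)
```

The computed and logged values agree to many significant digits, which supports the lr_scheduler_type: cosine entry in the model card.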

checkpoint-100/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:79e80b13ff00898b4493d440a3c1a1eb234c0ae541cbca8a8b1befef97a354c9
+size 5112

checkpoint-1000/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+library_name: peft
+base_model: hfl/chinese-alpaca-2-1.3b
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.9.0

checkpoint-1000/adapter_config.json ADDED

	@@ -0,0 +1,28 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "hfl/chinese-alpaca-2-1.3b",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "q_proj",
+    "v_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}

checkpoint-1000/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:48a2d9599d49f21f6da2043f811c7fc36667538dcd0694bf5ab45264e604eb6a
+size 2099272

checkpoint-1000/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3f006162ce9a8ae562e525479e6cbcc5e65fcf10721c1c878a1eba1f3248032d
+size 4208302

checkpoint-1000/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bab00d452a5c7da709ec7f6117cea515f8bedf68f40e39d014d87811d017f294
+size 14244

checkpoint-1000/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b0997e59d7add886c3367f611ab22a16eaa2f60d42a1ebdb93b4ad46ac309297
+size 1064

checkpoint-1000/special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-1000/tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a3b8844863b200dfcca971db228e96ce388290dfcf72c15d7a9d2f604bac787c
+size 844403

checkpoint-1000/tokenizer_config.json ADDED

	@@ -0,0 +1,54 @@

+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "add_prefix_space": true,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32000": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "chat_template": "{% set system_message = 'You are a helpful assistant. 你是一个乐于助人的助手。' %}{% if messages[0]['role'] == 'system' %}{% set system_message = messages[0]['content'] %}{% endif %}{% for message in messages %}{% set content = message['content'] %}{% if loop.index0 == 0 and system_message is defined %}{% set content = '<<SYS>>\\n' + system_message + '\\n<</SYS>>\\n\\n' + message['content'] %}{% endif %}{% if message['role'] == 'user' %}{{ '<s>' + '[INST] ' + content + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ content + '</s>' }}{% endif %}{% endfor %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</s>",
+  "legacy": true,
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "<pad>",
+  "padding_side": "right",
+  "sp_model_kwargs": {},
+  "spaces_between_special_tokens": false,
+  "split_special_tokens": false,
+  "tokenizer_class": "LlamaTokenizer",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": false,
+  "use_fast": false
+}

checkpoint-1000/trainer_state.json ADDED

	@@ -0,0 +1,1421 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.31095735997201385,
+  "eval_steps": 500,
+  "global_step": 1000,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.0,
+      "grad_norm": 0.47774845361709595,
+      "learning_rate": 4.999970160815579e-05,
+      "loss": 2.0765,
+      "step": 5
+    },
+    {
+      "epoch": 0.0,
+      "grad_norm": 0.6051416397094727,
+      "learning_rate": 4.999880643974619e-05,
+      "loss": 2.2297,
+      "step": 10
+    },
+    {
+      "epoch": 0.0,
+      "grad_norm": 0.6161717772483826,
+      "learning_rate": 4.9997314516140056e-05,
+      "loss": 2.1103,
+      "step": 15
+    },
+    {
+      "epoch": 0.01,
+      "grad_norm": 0.4686434268951416,
+      "learning_rate": 4.999522587295162e-05,
+      "loss": 2.0057,
+      "step": 20
+    },
+    {
+      "epoch": 0.01,
+      "grad_norm": 0.8412289023399353,
+      "learning_rate": 4.999254056003963e-05,
+      "loss": 2.1778,
+      "step": 25
+    },
+    {
+      "epoch": 0.01,
+      "grad_norm": 0.5333625078201294,
+      "learning_rate": 4.99892586415061e-05,
+      "loss": 2.2399,
+      "step": 30
+    },
+    {
+      "epoch": 0.01,
+      "grad_norm": 0.821148157119751,
+      "learning_rate": 4.9985380195694856e-05,
+      "loss": 2.3215,
+      "step": 35
+    },
+    {
+      "epoch": 0.01,
+      "grad_norm": 0.8403909206390381,
+      "learning_rate": 4.998090531518962e-05,
+      "loss": 1.8295,
+      "step": 40
+    },
+    {
+      "epoch": 0.01,
+      "grad_norm": 0.6633398532867432,
+      "learning_rate": 4.9975834106811834e-05,
+      "loss": 2.0195,
+      "step": 45
+    },
+    {
+      "epoch": 0.02,
+      "grad_norm": 0.6386868357658386,
+      "learning_rate": 4.997016669161806e-05,
+      "loss": 2.1257,
+      "step": 50
+    },
+    {
+      "epoch": 0.02,
+      "grad_norm": 0.7762248516082764,
+      "learning_rate": 4.996390320489715e-05,
+      "loss": 2.057,
+      "step": 55
+    },
+    {
+      "epoch": 0.02,
+      "grad_norm": 1.3192856311798096,
+      "learning_rate": 4.9957043796166966e-05,
+      "loss": 2.0753,
+      "step": 60
+    },
+    {
+      "epoch": 0.02,
+      "grad_norm": 0.9797518849372864,
+      "learning_rate": 4.994958862917083e-05,
+      "loss": 1.9736,
+      "step": 65
+    },
+    {
+      "epoch": 0.02,
+      "grad_norm": 1.000693440437317,
+      "learning_rate": 4.994153788187363e-05,
+      "loss": 2.1572,
+      "step": 70
+    },
+    {
+      "epoch": 0.02,
+      "grad_norm": 0.6852813959121704,
+      "learning_rate": 4.993289174645757e-05,
+      "loss": 2.1491,
+      "step": 75
+    },
+    {
+      "epoch": 0.02,
+      "grad_norm": 1.0075691938400269,
+      "learning_rate": 4.992365042931752e-05,
+      "loss": 1.945,
+      "step": 80
+    },
+    {
+      "epoch": 0.03,
+      "grad_norm": 1.1973133087158203,
+      "learning_rate": 4.991381415105619e-05,
+      "loss": 2.0811,
+      "step": 85
+    },
+    {
+      "epoch": 0.03,
+      "grad_norm": 0.9927239418029785,
+      "learning_rate": 4.990338314647881e-05,
+      "loss": 1.961,
+      "step": 90
+    },
+    {
+      "epoch": 0.03,
+      "grad_norm": 0.9499759674072266,
+      "learning_rate": 4.98923576645875e-05,
+      "loss": 2.0653,
+      "step": 95
+    },
+    {
+      "epoch": 0.03,
+      "grad_norm": 0.7233040928840637,
+      "learning_rate": 4.9880737968575365e-05,
+      "loss": 1.9999,
+      "step": 100
+    },
+    {
+      "epoch": 0.03,
+      "grad_norm": 1.55235755443573,
+      "learning_rate": 4.986852433582022e-05,
+      "loss": 2.2258,
+      "step": 105
+    },
+    {
+      "epoch": 0.03,
+      "grad_norm": 0.9007890820503235,
+      "learning_rate": 4.985571705787793e-05,
+      "loss": 2.1034,
+      "step": 110
+    },
+    {
+      "epoch": 0.04,
+      "grad_norm": 0.6774860620498657,
+      "learning_rate": 4.9842316440475475e-05,
+      "loss": 2.1753,
+      "step": 115
+    },
+    {
+      "epoch": 0.04,
+      "grad_norm": 0.7676737308502197,
+      "learning_rate": 4.9828322803503665e-05,
+      "loss": 2.1384,
+      "step": 120
+    },
+    {
+      "epoch": 0.04,
+      "grad_norm": 0.9624544978141785,
+      "learning_rate": 4.981373648100946e-05,
+      "loss": 2.0521,
+      "step": 125
+    },
+    {
+      "epoch": 0.04,
+      "grad_norm": 0.9315722584724426,
+      "learning_rate": 4.979855782118802e-05,
+      "loss": 1.9256,
+      "step": 130
+    },
+    {
+      "epoch": 0.04,
+      "grad_norm": 0.9035864472389221,
+      "learning_rate": 4.978278718637443e-05,
+      "loss": 2.0882,
+      "step": 135
+    },
+    {
+      "epoch": 0.04,
+      "grad_norm": 0.7997236251831055,
+      "learning_rate": 4.9766424953035e-05,
+      "loss": 2.0724,
+      "step": 140
+    },
+    {
+      "epoch": 0.05,
+      "grad_norm": 1.0692921876907349,
+      "learning_rate": 4.974947151175826e-05,
+      "loss": 2.1329,
+      "step": 145
+    },
+    {
+      "epoch": 0.05,
+      "grad_norm": 0.9506180286407471,
+      "learning_rate": 4.973192726724572e-05,
+      "loss": 2.082,
+      "step": 150
+    },
+    {
+      "epoch": 0.05,
+      "grad_norm": 0.8647387027740479,
+      "learning_rate": 4.9713792638302145e-05,
+      "loss": 2.0366,
+      "step": 155
+    },
+    {
+      "epoch": 0.05,
+      "grad_norm": 1.105302095413208,
+      "learning_rate": 4.969506805782555e-05,
+      "loss": 2.1481,
+      "step": 160
+    },
+    {
+      "epoch": 0.05,
+      "grad_norm": 0.7593303918838501,
+      "learning_rate": 4.967575397279689e-05,
+      "loss": 2.032,
+      "step": 165
+    },
+    {
+      "epoch": 0.05,
+      "grad_norm": 0.7521979808807373,
+      "learning_rate": 4.965585084426943e-05,
+      "loss": 2.0379,
+      "step": 170
+    },
+    {
+      "epoch": 0.05,
+      "grad_norm": 0.947120726108551,
+      "learning_rate": 4.9635359147357655e-05,
+      "loss": 2.1444,
+      "step": 175
+    },
+    {
+      "epoch": 0.06,
+      "grad_norm": 1.2184454202651978,
+      "learning_rate": 4.961427937122598e-05,
+      "loss": 1.9164,
+      "step": 180
+    },
+    {
+      "epoch": 0.06,
+      "grad_norm": 1.221663475036621,
+      "learning_rate": 4.959261201907707e-05,
+      "loss": 2.0084,
+      "step": 185
+    },
+    {
+      "epoch": 0.06,
+      "grad_norm": 1.0457361936569214,
+      "learning_rate": 4.957035760813982e-05,
+      "loss": 2.2032,
+      "step": 190
+    },
+    {
+      "epoch": 0.06,
+      "grad_norm": 0.8834909200668335,
+      "learning_rate": 4.954751666965701e-05,
+      "loss": 2.2101,
+      "step": 195
+    },
+    {
+      "epoch": 0.06,
+      "grad_norm": 0.791902482509613,
+      "learning_rate": 4.9524089748872615e-05,
+      "loss": 2.0472,
+      "step": 200
+    },
+    {
+      "epoch": 0.06,
+      "grad_norm": 1.2905739545822144,
+      "learning_rate": 4.9500077405018807e-05,
+      "loss": 2.0987,
+      "step": 205
+    },
+    {
+      "epoch": 0.07,
+      "grad_norm": 0.8612006306648254,
+      "learning_rate": 4.9475480211302583e-05,
+      "loss": 2.1765,
+      "step": 210
+    },
+    {
+      "epoch": 0.07,
+      "grad_norm": 1.3128459453582764,
+      "learning_rate": 4.945029875489212e-05,
+      "loss": 1.9926,
+      "step": 215
+    },
+    {
+      "epoch": 0.07,
+      "grad_norm": 0.9610918164253235,
+      "learning_rate": 4.94245336369027e-05,
+      "loss": 2.0124,
+      "step": 220
+    },
+    {
+      "epoch": 0.07,
+      "grad_norm": 0.873160183429718,
+      "learning_rate": 4.939818547238241e-05,
+      "loss": 2.2229,
+      "step": 225
+    },
+    {
+      "epoch": 0.07,
+      "grad_norm": 1.5535285472869873,
+      "learning_rate": 4.9371254890297446e-05,
+      "loss": 2.2013,
+      "step": 230
+    },
+    {
+      "epoch": 0.07,
+      "grad_norm": 1.1951836347579956,
+      "learning_rate": 4.93437425335171e-05,
+      "loss": 2.014,
+      "step": 235
+    },
+    {
+      "epoch": 0.07,
+      "grad_norm": 0.7874170541763306,
+      "learning_rate": 4.9315649058798384e-05,
+      "loss": 2.1701,
+      "step": 240
+    },
+    {
+      "epoch": 0.08,
+      "grad_norm": 1.3503323793411255,
+      "learning_rate": 4.928697513677042e-05,
+      "loss": 2.1681,
+      "step": 245
+    },
+    {
+      "epoch": 0.08,
+      "grad_norm": 1.3091179132461548,
+      "learning_rate": 4.925772145191834e-05,
+      "loss": 2.1224,
+      "step": 250
+    },
+    {
+      "epoch": 0.08,
+      "grad_norm": 1.4428555965423584,
+      "learning_rate": 4.9227888702567044e-05,
+      "loss": 2.0512,
+      "step": 255
+    },
+    {
+      "epoch": 0.08,
+      "grad_norm": 0.8234395980834961,
+      "learning_rate": 4.9197477600864446e-05,
+      "loss": 2.1067,
+      "step": 260
+    },
+    {
+      "epoch": 0.08,
+      "grad_norm": 1.9094969034194946,
+      "learning_rate": 4.9166488872764526e-05,
+      "loss": 1.8884,
+      "step": 265
+    },
+    {
+      "epoch": 0.08,
+      "grad_norm": 1.0074087381362915,
+      "learning_rate": 4.913492325800999e-05,
+      "loss": 1.9345,
+      "step": 270
+    },
+    {
+      "epoch": 0.09,
+      "grad_norm": 1.0867297649383545,
+      "learning_rate": 4.910278151011458e-05,
+      "loss": 2.1928,
+      "step": 275
+    },
+    {
+      "epoch": 0.09,
+      "grad_norm": 0.6842357516288757,
+      "learning_rate": 4.907006439634516e-05,
+      "loss": 2.0407,
+      "step": 280
+    },
+    {
+      "epoch": 0.09,
+      "grad_norm": 0.8409023284912109,
+      "learning_rate": 4.903677269770329e-05,
+      "loss": 2.2344,
+      "step": 285
+    },
+    {
+      "epoch": 0.09,
+      "grad_norm": 0.8119503259658813,
+      "learning_rate": 4.900290720890671e-05,
+      "loss": 2.1296,
+      "step": 290
+    },
+    {
+      "epoch": 0.09,
+      "grad_norm": 0.9938147068023682,
+      "learning_rate": 4.8968468738370244e-05,
+      "loss": 2.152,
+      "step": 295
+    },
+    {
+      "epoch": 0.09,
+      "grad_norm": 0.9865244030952454,
+      "learning_rate": 4.8933458108186606e-05,
+      "loss": 1.9623,
+      "step": 300
+    },
+    {
+      "epoch": 0.09,
+      "grad_norm": 1.3944802284240723,
+      "learning_rate": 4.889787615410672e-05,
+      "loss": 1.915,
+      "step": 305
+    },
+    {
+      "epoch": 0.1,
+      "grad_norm": 1.3749767541885376,
+      "learning_rate": 4.886172372551977e-05,
+      "loss": 1.9934,
+      "step": 310
+    },
+    {
+      "epoch": 0.1,
+      "grad_norm": 0.9024938941001892,
+      "learning_rate": 4.882500168543294e-05,
+      "loss": 2.1541,
+      "step": 315
+    },
+    {
+      "epoch": 0.1,
+      "grad_norm": 1.1978263854980469,
+      "learning_rate": 4.878771091045082e-05,
+      "loss": 2.1688,
+      "step": 320
+    },
+    {
+      "epoch": 0.1,
+      "grad_norm": 0.8360010981559753,
+      "learning_rate": 4.874985229075446e-05,
+      "loss": 2.1387,
+      "step": 325
+    },
+    {
+      "epoch": 0.1,
+      "grad_norm": 0.7683364152908325,
+      "learning_rate": 4.871142673008012e-05,
+      "loss": 2.0215,
+      "step": 330
+    },
+    {
+      "epoch": 0.1,
+      "grad_norm": 1.4230670928955078,
+      "learning_rate": 4.867243514569772e-05,
+      "loss": 1.9491,
+      "step": 335
+    },
+    {
+      "epoch": 0.11,
+      "grad_norm": 0.8198773860931396,
+      "learning_rate": 4.863287846838891e-05,
+      "loss": 2.0151,
+      "step": 340
+    },
+    {
+      "epoch": 0.11,
+      "grad_norm": 1.467207908630371,
+      "learning_rate": 4.85927576424249e-05,
+      "loss": 1.8906,
+      "step": 345
+    },
+    {
+      "epoch": 0.11,
+      "grad_norm": 0.9537095427513123,
+      "learning_rate": 4.855207362554385e-05,
+      "loss": 2.1844,
+      "step": 350
+    },
+    {
+      "epoch": 0.11,
+      "grad_norm": 1.0757155418395996,
+      "learning_rate": 4.851082738892809e-05,
+      "loss": 2.048,
+      "step": 355
+    },
+    {
+      "epoch": 0.11,
+      "grad_norm": 1.6884938478469849,
+      "learning_rate": 4.8469019917180846e-05,
+      "loss": 1.9537,
+      "step": 360
+    },
+    {
+      "epoch": 0.11,
+      "grad_norm": 1.4680182933807373,
+      "learning_rate": 4.8426652208302814e-05,
+      "loss": 1.9731,
+      "step": 365
+    },
+    {
+      "epoch": 0.12,
+      "grad_norm": 1.1778632402420044,
+      "learning_rate": 4.83837252736683e-05,
+      "loss": 2.1395,
+      "step": 370
+    },
+    {
+      "epoch": 0.12,
+      "grad_norm": 1.2865056991577148,
+      "learning_rate": 4.834024013800108e-05,
+      "loss": 2.0016,
+      "step": 375
+    },
+    {
+      "epoch": 0.12,
+      "grad_norm": 1.055177092552185,
+      "learning_rate": 4.8296197839349944e-05,
+      "loss": 1.9632,
+      "step": 380
+    },
+    {
+      "epoch": 0.12,
+      "grad_norm": 1.0041871070861816,
+      "learning_rate": 4.825159942906389e-05,
+      "loss": 2.3302,
+      "step": 385
+    },
+    {
+      "epoch": 0.12,
+      "grad_norm": 1.0026438236236572,
+      "learning_rate": 4.820644597176709e-05,
+      "loss": 2.1517,
+      "step": 390
+    },
+    {
+      "epoch": 0.12,
+      "grad_norm": 1.3532180786132812,
+      "learning_rate": 4.81607385453334e-05,
+      "loss": 2.1229,
+      "step": 395
+    },
+    {
+      "epoch": 0.12,
+      "grad_norm": 0.7670988440513611,
+      "learning_rate": 4.81144782408607e-05,
+      "loss": 2.1382,
+      "step": 400
+    },
+    {
+      "epoch": 0.13,
+      "grad_norm": 1.0405700206756592,
+      "learning_rate": 4.8067666162644774e-05,
+      "loss": 1.9614,
+      "step": 405
+    },
+    {
+      "epoch": 0.13,
+      "grad_norm": 1.2252662181854248,
+      "learning_rate": 4.802030342815304e-05,
+      "loss": 2.1399,
+      "step": 410
+    },
+    {
+      "epoch": 0.13,
+      "grad_norm": 1.237946629524231,
+      "learning_rate": 4.7972391167997754e-05,
+      "loss": 1.9034,
+      "step": 415
+    },
+    {
+      "epoch": 0.13,
+      "grad_norm": 0.8064705729484558,
+      "learning_rate": 4.7923930525909156e-05,
+      "loss": 2.0075,
+      "step": 420
+    },
+    {
+      "epoch": 0.13,
+      "grad_norm": 0.8717565536499023,
+      "learning_rate": 4.7874922658708065e-05,
+      "loss": 2.0105,
+      "step": 425
+    },
+    {
+      "epoch": 0.13,
+      "grad_norm": 1.6693098545074463,
+      "learning_rate": 4.782536873627832e-05,
+      "loss": 2.0242,
+      "step": 430
+    },
+    {
+      "epoch": 0.14,
+      "grad_norm": 0.82447350025177,
+      "learning_rate": 4.777526994153882e-05,
+      "loss": 2.0267,
+      "step": 435
+    },
+    {
+      "epoch": 0.14,
+      "grad_norm": 0.9926588535308838,
+      "learning_rate": 4.7724627470415307e-05,
+      "loss": 1.9119,
+      "step": 440
+    },
+    {
+      "epoch": 0.14,
+      "grad_norm": 1.0924450159072876,
+      "learning_rate": 4.7673442531811796e-05,
+      "loss": 2.2653,
+      "step": 445
+    },
+    {
+      "epoch": 0.14,
+      "grad_norm": 1.1592103242874146,
+      "learning_rate": 4.762171634758177e-05,
+      "loss": 2.0017,
+      "step": 450
+    },
+    {
+      "epoch": 0.14,
+      "grad_norm": 0.9172110557556152,
+      "learning_rate": 4.7569450152498927e-05,
+      "loss": 2.1408,
+      "step": 455
+    },
+    {
+      "epoch": 0.14,
+      "grad_norm": 1.1897525787353516,
+      "learning_rate": 4.751664519422778e-05,
+      "loss": 2.0935,
+      "step": 460
+    },
+    {
+      "epoch": 0.14,
+      "grad_norm": 0.8793094158172607,
+      "learning_rate": 4.746330273329386e-05,
+      "loss": 2.1142,
+      "step": 465
+    },
+    {
+      "epoch": 0.15,
+      "grad_norm": 1.4337489604949951,
+      "learning_rate": 4.740942404305356e-05,
+      "loss": 2.1289,
+      "step": 470
+    },
+    {
+      "epoch": 0.15,
+      "grad_norm": 1.0251764059066772,
+      "learning_rate": 4.735501040966383e-05,
+      "loss": 1.9741,
+      "step": 475
+    },
+    {
+      "epoch": 0.15,
+      "grad_norm": 1.2659822702407837,
+      "learning_rate": 4.730006313205143e-05,
+      "loss": 2.088,
+      "step": 480
+    },
+    {
+      "epoch": 0.15,
+      "grad_norm": 0.8884140849113464,
+      "learning_rate": 4.724458352188192e-05,
+      "loss": 2.2079,
+      "step": 485
+    },
+    {
+      "epoch": 0.15,
+      "grad_norm": 1.1937768459320068,
+      "learning_rate": 4.718857290352835e-05,
+      "loss": 2.048,
+      "step": 490
+    },
+    {
+      "epoch": 0.15,
+      "grad_norm": 0.9741552472114563,
+      "learning_rate": 4.713203261403966e-05,
+      "loss": 2.2569,
+      "step": 495
+    },
+    {
+      "epoch": 0.16,
+      "grad_norm": 0.7996780872344971,
+      "learning_rate": 4.707496400310874e-05,
+      "loss": 1.9574,
+      "step": 500
+    },
+    {
+      "epoch": 0.16,
+      "grad_norm": 1.8182051181793213,
+      "learning_rate": 4.701736843304025e-05,
+      "loss": 2.0951,
+      "step": 505
+    },
+    {
+      "epoch": 0.16,
+      "grad_norm": 1.507320761680603,
+      "learning_rate": 4.695924727871805e-05,
+      "loss": 2.0253,
+      "step": 510
+    },
+    {
+      "epoch": 0.16,
+      "grad_norm": 0.759121835231781,
+      "learning_rate": 4.690060192757242e-05,
+      "loss": 2.0602,
+      "step": 515
+    },
+    {
+      "epoch": 0.16,
+      "grad_norm": 1.5943195819854736,
+      "learning_rate": 4.684143377954691e-05,
+      "loss": 2.0386,
+      "step": 520
+    },
+    {
+      "epoch": 0.16,
+      "grad_norm": 0.8568710088729858,
+      "learning_rate": 4.6781744247064955e-05,
+      "loss": 2.073,
+      "step": 525
+    },
+    {
+      "epoch": 0.16,
+      "grad_norm": 1.3352620601654053,
+      "learning_rate": 4.6721534754996125e-05,
+      "loss": 2.1443,
+      "step": 530
+    },
+    {
+      "epoch": 0.17,
+      "grad_norm": 1.3417474031448364,
+      "learning_rate": 4.666080674062213e-05,
+      "loss": 2.0288,
+      "step": 535
+    },
+    {
+      "epoch": 0.17,
+      "grad_norm": 1.5334464311599731,
+      "learning_rate": 4.659956165360251e-05,
+      "loss": 2.0609,
+      "step": 540
+    },
+    {
+      "epoch": 0.17,
+      "grad_norm": 0.9658721089363098,
+      "learning_rate": 4.6537800955940005e-05,
+      "loss": 1.9539,
+      "step": 545
+    },
+    {
+      "epoch": 0.17,
+      "grad_norm": 1.9197947978973389,
+      "learning_rate": 4.647552612194572e-05,
+      "loss": 2.149,
+      "step": 550
+    },
+    {
+      "epoch": 0.17,
+      "grad_norm": 0.8512137532234192,
+      "learning_rate": 4.641273863820383e-05,
+      "loss": 1.9722,
+      "step": 555
+    },
+    {
+      "epoch": 0.17,
+      "grad_norm": 1.827289342880249,
+      "learning_rate": 4.634944000353622e-05,
+      "loss": 2.0729,
+      "step": 560
+    },
+    {
+      "epoch": 0.18,
+      "grad_norm": 1.088416337966919,
+      "learning_rate": 4.628563172896655e-05,
+      "loss": 1.9507,
+      "step": 565
+    },
+    {
+      "epoch": 0.18,
+      "grad_norm": 1.3566908836364746,
+      "learning_rate": 4.6221315337684353e-05,
+      "loss": 2.1643,
+      "step": 570
+    },
+    {
+      "epoch": 0.18,
+      "grad_norm": 1.3541293144226074,
+      "learning_rate": 4.615649236500854e-05,
+      "loss": 2.1839,
+      "step": 575
+    },
+    {
+      "epoch": 0.18,
+      "grad_norm": 0.991269588470459,
+      "learning_rate": 4.609116435835083e-05,
+      "loss": 2.0976,
+      "step": 580
+    },
+    {
+      "epoch": 0.18,
+      "grad_norm": 1.0280535221099854,
+      "learning_rate": 4.602533287717877e-05,
+      "loss": 2.1474,
+      "step": 585
+    },
+    {
+      "epoch": 0.18,
+      "grad_norm": 1.013123631477356,
+      "learning_rate": 4.5958999492978524e-05,
+      "loss": 2.1873,
+      "step": 590
+    },
+    {
+      "epoch": 0.19,
+      "grad_norm": 1.1753040552139282,
+      "learning_rate": 4.589216578921737e-05,
+      "loss": 2.1744,
+      "step": 595
+    },
+    {
+      "epoch": 0.19,
+      "grad_norm": 1.1839090585708618,
+      "learning_rate": 4.582483336130586e-05,
+      "loss": 1.9982,
+      "step": 600
+    },
+    {
+      "epoch": 0.19,
+      "grad_norm": 1.0724798440933228,
+      "learning_rate": 4.575700381655979e-05,
+      "loss": 2.1234,
+      "step": 605
+    },
+    {
+      "epoch": 0.19,
+      "grad_norm": 2.009913682937622,
+      "learning_rate": 4.5688678774161796e-05,
+      "loss": 1.9478,
+      "step": 610
+    },
+    {
+      "epoch": 0.19,
+      "grad_norm": 0.9897060394287109,
+      "learning_rate": 4.561985986512271e-05,
+      "loss": 1.8268,
+      "step": 615
+    },
+    {
+      "epoch": 0.19,
+      "grad_norm": 0.8881808519363403,
+      "learning_rate": 4.555054873224263e-05,
+      "loss": 1.9887,
+      "step": 620
+    },
+    {
+      "epoch": 0.19,
+      "grad_norm": 1.155900001525879,
+      "learning_rate": 4.54807470300717e-05,
+      "loss": 2.0777,
+      "step": 625
+    },
+    {
+      "epoch": 0.2,
+      "grad_norm": 0.8782421350479126,
+      "learning_rate": 4.5410456424870596e-05,
+      "loss": 2.0566,
+      "step": 630
+    },
+    {
+      "epoch": 0.2,
+      "grad_norm": 1.3324674367904663,
+      "learning_rate": 4.5339678594570795e-05,
+      "loss": 2.047,
+      "step": 635
+    },
+    {
+      "epoch": 0.2,
+      "grad_norm": 1.9805939197540283,
+      "learning_rate": 4.526841522873449e-05,
+      "loss": 1.962,
+      "step": 640
+    },
+    {
+      "epoch": 0.2,
+      "grad_norm": 1.4999943971633911,
+      "learning_rate": 4.519666802851422e-05,
+      "loss": 2.0972,
+      "step": 645
+    },
+    {
+      "epoch": 0.2,
+      "grad_norm": 1.4504961967468262,
+      "learning_rate": 4.5124438706612376e-05,
+      "loss": 2.0041,
+      "step": 650
+    },
+    {
+      "epoch": 0.2,
+      "grad_norm": 0.9078169465065002,
+      "learning_rate": 4.505172898724018e-05,
+      "loss": 2.1229,
+      "step": 655
+    },
+    {
+      "epoch": 0.21,
+      "grad_norm": 1.1635804176330566,
+      "learning_rate": 4.497854060607662e-05,
+      "loss": 2.0195,
+      "step": 660
+    },
+    {
+      "epoch": 0.21,
+      "grad_norm": 1.46576726436615,
+      "learning_rate": 4.490487531022699e-05,
+      "loss": 2.0745,
+      "step": 665
+    },
+    {
+      "epoch": 0.21,
+      "grad_norm": 1.2094652652740479,
+      "learning_rate": 4.4830734858181145e-05,
+      "loss": 2.1068,
+      "step": 670
+    },
+    {
+      "epoch": 0.21,
+      "grad_norm": 1.4738895893096924,
+      "learning_rate": 4.47561210197716e-05,
+      "loss": 1.8088,
+      "step": 675
+    },
+    {
+      "epoch": 0.21,
+      "grad_norm": 1.23384690284729,
+      "learning_rate": 4.4681035576131215e-05,
+      "loss": 2.0995,
+      "step": 680
+    },
+    {
+      "epoch": 0.21,
+      "grad_norm": 0.8332946300506592,
+      "learning_rate": 4.46054803196507e-05,
+      "loss": 2.0541,
+      "step": 685
+    },
+    {
+      "epoch": 0.21,
+      "grad_norm": 0.9207485318183899,
+      "learning_rate": 4.452945705393586e-05,
+      "loss": 2.166,
+      "step": 690
+    },
+    {
+      "epoch": 0.22,
+      "grad_norm": 1.292945146560669,
+      "learning_rate": 4.445296759376449e-05,
+      "loss": 2.0784,
+      "step": 695
+    },
+    {
+      "epoch": 0.22,
+      "grad_norm": 0.9874763488769531,
+      "learning_rate": 4.437601376504307e-05,
+      "loss": 2.2087,
+      "step": 700
+    },
+    {
+      "epoch": 0.22,
+      "grad_norm": 0.9427415132522583,
+      "learning_rate": 4.4298597404763186e-05,
+      "loss": 2.1199,
+      "step": 705
+    },
+    {
+      "epoch": 0.22,
+      "grad_norm": 1.7369529008865356,
+      "learning_rate": 4.422072036095768e-05,
+      "loss": 2.0355,
+      "step": 710
+    },
+    {
+      "epoch": 0.22,
+      "grad_norm": 1.2423696517944336,
+      "learning_rate": 4.414238449265654e-05,
+      "loss": 2.0011,
+      "step": 715
+    },
+    {
+      "epoch": 0.22,
+      "grad_norm": 1.2304831743240356,
+      "learning_rate": 4.406359166984249e-05,
+      "loss": 2.0368,
+      "step": 720
+    },
+    {
+      "epoch": 0.23,
+      "grad_norm": 0.9090413451194763,
+      "learning_rate": 4.39843437734064e-05,
+      "loss": 1.9983,
+      "step": 725
+    },
+    {
+      "epoch": 0.23,
+      "grad_norm": 1.2729507684707642,
+      "learning_rate": 4.390464269510233e-05,
+      "loss": 2.021,
+      "step": 730
+    },
+    {
+      "epoch": 0.23,
+      "grad_norm": 1.3009227514266968,
+      "learning_rate": 4.382449033750244e-05,
+      "loss": 1.9743,
+      "step": 735
+    },
+    {
+      "epoch": 0.23,
+      "grad_norm": 1.5456056594848633,
+      "learning_rate": 4.37438886139515e-05,
+      "loss": 2.0689,
+      "step": 740
+    },
+    {
+      "epoch": 0.23,
+      "grad_norm": 1.3235007524490356,
+      "learning_rate": 4.3662839448521264e-05,
+      "loss": 2.0838,
+      "step": 745
+    },
+    {
+      "epoch": 0.23,
+      "grad_norm": 2.2074007987976074,
+      "learning_rate": 4.358134477596454e-05,
+      "loss": 2.0835,
+      "step": 750
+    },
+    {
+      "epoch": 0.23,
+      "grad_norm": 1.403738021850586,
+      "learning_rate": 4.3499406541668966e-05,
+      "loss": 2.0916,
+      "step": 755
+    },
+    {
+      "epoch": 0.24,
+      "grad_norm": 1.0940325260162354,
+      "learning_rate": 4.3417026701610616e-05,
+      "loss": 1.972,
+      "step": 760
+    },
+    {
+      "epoch": 0.24,
+      "grad_norm": 1.666353702545166,
+      "learning_rate": 4.3334207222307275e-05,
+      "loss": 1.927,
+      "step": 765
+    },
+    {
+      "epoch": 0.24,
+      "grad_norm": 1.0777515172958374,
+      "learning_rate": 4.325095008077154e-05,
+      "loss": 2.1192,
+      "step": 770
+    },
+    {
+      "epoch": 0.24,
+      "grad_norm": 1.7218186855316162,
+      "learning_rate": 4.316725726446353e-05,
+      "loss": 2.0774,
+      "step": 775
+    },
+    {
+      "epoch": 0.24,
+      "grad_norm": 1.356753945350647,
+      "learning_rate": 4.3083130771243586e-05,
+      "loss": 2.0847,
+      "step": 780
+    },
+    {
+      "epoch": 0.24,
+      "grad_norm": 0.9967429637908936,
+      "learning_rate": 4.299857260932445e-05,
+      "loss": 2.0485,
+      "step": 785
+    },
+    {
+      "epoch": 0.25,
+      "grad_norm": 1.6216442584991455,
+      "learning_rate": 4.2913584797223397e-05,
+      "loss": 2.1008,
+      "step": 790
+    },
+    {
+      "epoch": 0.25,
+      "grad_norm": 1.2556742429733276,
+      "learning_rate": 4.2828169363714016e-05,
+      "loss": 1.9209,
+      "step": 795
+    },
+    {
+      "epoch": 0.25,
+      "grad_norm": 1.1800439357757568,
+      "learning_rate": 4.274232834777782e-05,
+      "loss": 1.9722,
+      "step": 800
+    },
+    {
+      "epoch": 0.25,
+      "grad_norm": 1.1313499212265015,
+      "learning_rate": 4.2656063798555515e-05,
+      "loss": 1.9176,
+      "step": 805
+    },
+    {
+      "epoch": 0.25,
+      "grad_norm": 1.137534737586975,
+      "learning_rate": 4.256937777529815e-05,
+      "loss": 1.9929,
+      "step": 810
+    },
+    {
+      "epoch": 0.25,
+      "grad_norm": 1.0575093030929565,
+      "learning_rate": 4.2482272347317906e-05,
+      "loss": 2.166,
+      "step": 815
+    },
+    {
+      "epoch": 0.25,
+      "grad_norm": 1.5939594507217407,
+      "learning_rate": 4.2394749593938733e-05,
+      "loss": 2.1334,
+      "step": 820
+    },
+    {
+      "epoch": 0.26,
+      "grad_norm": 1.1045507192611694,
+      "learning_rate": 4.230681160444669e-05,
+      "loss": 2.0853,
+      "step": 825
+    },
+    {
+      "epoch": 0.26,
+      "grad_norm": 1.3480136394500732,
+      "learning_rate": 4.221846047804009e-05,
+      "loss": 2.1802,
+      "step": 830
+    },
+    {
+      "epoch": 0.26,
+      "grad_norm": 1.1822657585144043,
+      "learning_rate": 4.2129698323779366e-05,
+      "loss": 2.0739,
+      "step": 835
+    },
+    {
+      "epoch": 0.26,
+      "grad_norm": 1.1771117448806763,
+      "learning_rate": 4.204052726053676e-05,
+      "loss": 2.0238,
+      "step": 840
+    },
+    {
+      "epoch": 0.26,
+      "grad_norm": 1.4757814407348633,
+      "learning_rate": 4.195094941694571e-05,
+      "loss": 2.1557,
+      "step": 845
+    },
+    {
+      "epoch": 0.26,
+      "grad_norm": 0.9095075726509094,
+      "learning_rate": 4.1860966931350054e-05,
+      "loss": 2.1666,
+      "step": 850
+    },
+    {
+      "epoch": 0.27,
+      "grad_norm": 1.1039543151855469,
+      "learning_rate": 4.1770581951752976e-05,
+      "loss": 2.105,
+      "step": 855
+    },
+    {
+      "epoch": 0.27,
+      "grad_norm": 0.8517205119132996,
+      "learning_rate": 4.1679796635765735e-05,
+      "loss": 1.9656,
+      "step": 860
+    },
+    {
+      "epoch": 0.27,
+      "grad_norm": 1.239492654800415,
+      "learning_rate": 4.158861315055617e-05,
+      "loss": 2.0166,
+      "step": 865
+    },
+    {
+      "epoch": 0.27,
+      "grad_norm": 1.1358321905136108,
+      "learning_rate": 4.1497033672796924e-05,
+      "loss": 2.0076,
+      "step": 870
+    },
+    {
+      "epoch": 0.27,
+      "grad_norm": 1.6215249300003052,
+      "learning_rate": 4.140506038861356e-05,
+      "loss": 2.1594,
+      "step": 875
+    },
+    {
+      "epoch": 0.27,
+      "grad_norm": 1.0528080463409424,
+      "learning_rate": 4.131269549353229e-05,
+      "loss": 2.1416,
+      "step": 880
+    },
+    {
+      "epoch": 0.28,
+      "grad_norm": 0.8976901769638062,
+      "learning_rate": 4.1219941192427644e-05,
+      "loss": 2.1242,
+      "step": 885
+    },
+    {
+      "epoch": 0.28,
+      "grad_norm": 1.263594388961792,
+      "learning_rate": 4.112679969946977e-05,
+      "loss": 2.02,
+      "step": 890
+    },
+    {
+      "epoch": 0.28,
+      "grad_norm": 1.4173017740249634,
+      "learning_rate": 4.103327323807162e-05,
+      "loss": 2.0438,
+      "step": 895
+    },
+    {
+      "epoch": 0.28,
+      "grad_norm": 1.876170039176941,
+      "learning_rate": 4.093936404083585e-05,
+      "loss": 1.9806,
+      "step": 900
+    },
+    {
+      "epoch": 0.28,
+      "grad_norm": 1.4649231433868408,
+      "learning_rate": 4.0845074349501544e-05,
+      "loss": 2.1476,
+      "step": 905
+    },
+    {
+      "epoch": 0.28,
+      "grad_norm": 1.0446043014526367,
+      "learning_rate": 4.0750406414890695e-05,
+      "loss": 1.9672,
+      "step": 910
+    },
+    {
+      "epoch": 0.28,
+      "grad_norm": 1.0225305557250977,
+      "learning_rate": 4.065536249685448e-05,
+      "loss": 1.9984,
+      "step": 915
+    },
+    {
+      "epoch": 0.29,
+      "grad_norm": 1.0120617151260376,
+      "learning_rate": 4.055994486421929e-05,
+      "loss": 2.1162,
+      "step": 920
+    },
+    {
+      "epoch": 0.29,
+      "grad_norm": 1.0469881296157837,
+      "learning_rate": 4.04641557947326e-05,
+      "loss": 2.0435,
+      "step": 925
+    },
+    {
+      "epoch": 0.29,
+      "grad_norm": 1.2435941696166992,
+      "learning_rate": 4.036799757500856e-05,
+      "loss": 2.0431,
+      "step": 930
+    },
+    {
+      "epoch": 0.29,
+      "grad_norm": 1.0055103302001953,
+      "learning_rate": 4.027147250047348e-05,
+      "loss": 2.2021,
+      "step": 935
+    },
+    {
+      "epoch": 0.29,
+      "grad_norm": 1.1212949752807617,
+      "learning_rate": 4.017458287531094e-05,
+      "loss": 1.997,
+      "step": 940
+    },
+    {
+      "epoch": 0.29,
+      "grad_norm": 1.1048357486724854,
+      "learning_rate": 4.007733101240685e-05,
+      "loss": 1.946,
+      "step": 945
+    },
+    {
+      "epoch": 0.3,
+      "grad_norm": 1.4721689224243164,
+      "learning_rate": 3.997971923329426e-05,
+      "loss": 2.0723,
+      "step": 950
+    },
+    {
+      "epoch": 0.3,
+      "grad_norm": 1.3793156147003174,
+      "learning_rate": 3.988174986809783e-05,
+      "loss": 2.034,
+      "step": 955
+    },
+    {
+      "epoch": 0.3,
+      "grad_norm": 0.9013482928276062,
+      "learning_rate": 3.9783425255478355e-05,
+      "loss": 1.9736,
+      "step": 960
+    },
+    {
+      "epoch": 0.3,
+      "grad_norm": 0.9192422032356262,
+      "learning_rate": 3.968474774257682e-05,
+      "loss": 1.9878,
+      "step": 965
+    },
+    {
+      "epoch": 0.3,
+      "grad_norm": 1.9304206371307373,
+      "learning_rate": 3.9585719684958446e-05,
+      "loss": 2.117,
+      "step": 970
+    },
+    {
+      "epoch": 0.3,
+      "grad_norm": 1.0435137748718262,
+      "learning_rate": 3.948634344655639e-05,
+      "loss": 2.0585,
+      "step": 975
+    },
+    {
+      "epoch": 0.3,
+      "grad_norm": 1.4636590480804443,
+      "learning_rate": 3.938662139961538e-05,
+      "loss": 2.0409,
+      "step": 980
+    },
+    {
+      "epoch": 0.31,
+      "grad_norm": 1.8014529943466187,
+      "learning_rate": 3.928655592463508e-05,
+      "loss": 2.0369,
+      "step": 985
+    },
+    {
+      "epoch": 0.31,
+      "grad_norm": 1.2412620782852173,
+      "learning_rate": 3.918614941031319e-05,
+      "loss": 1.967,
+      "step": 990
+    },
+    {
+      "epoch": 0.31,
+      "grad_norm": 1.3581103086471558,
+      "learning_rate": 3.908540425348852e-05,
+      "loss": 2.0037,
+      "step": 995
+    },
+    {
+      "epoch": 0.31,
+      "grad_norm": 1.2377780675888062,
+      "learning_rate": 3.8984322859083725e-05,
+      "loss": 1.9991,
+      "step": 1000
+    }
+  ],
+  "logging_steps": 5,
+  "max_steps": 3215,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 100,
+  "total_flos": 1.343065816498176e+16,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}
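
Note on the logged values above: the `learning_rate` entries look consistent with a cosine decay to zero over `max_steps` (3215) with no warmup. The diff does not state the scheduler type or the peak rate, so both are assumptions here: `PEAK_LR = 5e-5` is inferred from the logged numbers, and `cosine_lr` is a hypothetical helper, not a function from the training code. A minimal sketch that checks a few logged points against that assumption:

```python
import math

# Assumption: peak learning rate inferred from the logged values, not stated in the diff.
PEAK_LR = 5e-5
# "max_steps" as recorded in checkpoint-1000/trainer_state.json.
MAX_STEPS = 3215


def cosine_lr(step, peak_lr=PEAK_LR, max_steps=MAX_STEPS):
    """Cosine decay from peak_lr to 0 over max_steps, with no warmup (assumed)."""
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * step / max_steps))


# (step -> learning_rate) pairs copied from the log history in the diff.
logged = {
    100: 4.9880737968575365e-05,
    500: 4.707496400310874e-05,
    1000: 3.8984322859083725e-05,
}

for step, lr in logged.items():
    # Print the step, the rate under the assumed schedule, the logged rate,
    # and the fraction of the single epoch completed at that step.
    print(step, cosine_lr(step), lr, round(step / MAX_STEPS, 2))
```

If the assumption holds, the computed and logged rates agree closely, and `step / max_steps` reproduces the logged `epoch` field (for example, step 500 gives 0.16), which fits `num_train_epochs` being 1.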

checkpoint-1000/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:79e80b13ff00898b4493d440a3c1a1eb234c0ae541cbca8a8b1befef97a354c9
+size 5112

checkpoint-1100/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+library_name: peft
+base_model: hfl/chinese-alpaca-2-1.3b
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.9.0

checkpoint-1100/adapter_config.json ADDED

	@@ -0,0 +1,28 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "hfl/chinese-alpaca-2-1.3b",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "q_proj",
+    "v_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
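
A note on reading this adapter config: with `use_rslora` set to false, PEFT scales the low-rank update by `lora_alpha / r`, which is 16 / 8 here. The sketch below works that out and estimates how many values the adapter holds. The layer count and projection width are assumptions about the base model, not something this diff states; they were picked because they are consistent with the 2,099,272-byte `adapter_model.safetensors` pointer in this checkpoint.

```python
import math

# Values copied from checkpoint-1100/adapter_config.json.
config = {
    "r": 8,
    "lora_alpha": 16,
    "use_rslora": False,
    "target_modules": ["q_proj", "v_proj"],
}


def lora_scaling(cfg):
    """Factor applied to the low-rank update B @ A."""
    if cfg["use_rslora"]:
        # Rank-stabilised variant divides by sqrt(r) instead of r.
        return cfg["lora_alpha"] / math.sqrt(cfg["r"])
    return cfg["lora_alpha"] / cfg["r"]


def lora_param_count(cfg, num_layers, d_in, d_out):
    """A is (r x d_in) and B is (d_out x r) for every adapted module."""
    per_module = cfg["r"] * (d_in + d_out)
    return per_module * len(cfg["target_modules"]) * num_layers


# ASSUMPTION (hypothetical, not in this diff): 4 decoder layers whose
# q_proj and v_proj are both 4096 x 4096.
NUM_LAYERS, HIDDEN = 4, 4096

scaling = lora_scaling(config)
n_params = lora_param_count(config, NUM_LAYERS, HIDDEN, HIDDEN)
fp32_bytes = n_params * 4

print(f"scaling={scaling}, params={n_params}, fp32 bytes={fp32_bytes}")
```

Under those assumed dimensions the raw tensor data comes to 2,097,152 bytes, leaving about 2 KB for the safetensors header. If the base model has different dimensions, only the last two arguments need to change.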

checkpoint-1100/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5a683217eb65ff427c775be7b4f2b17fa3e4d2a1f4ec2c6f932ead2131164d58
+size 2099272
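
The three-line entries for the binary files in this commit are Git LFS pointers, not the files themselves: each records a spec version, a content hash, and the byte size of the real object. A minimal sketch of reading one follows; `parse_lfs_pointer` is a hypothetical helper written for illustration and is not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer text copied from checkpoint-1100/adapter_model.safetensors,
# without the leading '+' diff markers.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:5a683217eb65ff427c775be7b4f2b17fa3e4d2a1f4ec2c6f932ead2131164d58
size 2099272
"""

pointer = parse_lfs_pointer(pointer_text)
print(pointer["hash_algo"], pointer["size"])
```

After downloading the real file, its SHA-256 digest and length can be compared against `digest` and `size` to confirm the transfer.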

checkpoint-1100/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f09f84a51333688d2c1ffa008ee924f88b13e5b05cf3e96c960de16b6a3ec732
+size 4208302

checkpoint-1100/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ea2be7046a8cfae98823a1a5937a6b641a96662a439d14f80e764a9be0f430b4
+size 14244

checkpoint-1100/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c9ebce50ac9027c187ef9430639e84c374e26350a5f18c89c2fee60ddec9bbbf
+size 1064

checkpoint-1100/special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-1100/tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a3b8844863b200dfcca971db228e96ce388290dfcf72c15d7a9d2f604bac787c
+size 844403

checkpoint-1100/tokenizer_config.json ADDED

	@@ -0,0 +1,54 @@

+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "add_prefix_space": true,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32000": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "chat_template": "{% set system_message = 'You are a helpful assistant. 你是一个乐于助人的助手。' %}{% if messages[0]['role'] == 'system' %}{% set system_message = messages[0]['content'] %}{% endif %}{% for message in messages %}{% set content = message['content'] %}{% if loop.index0 == 0 and system_message is defined %}{% set content = '<<SYS>>\\n' + system_message + '\\n<</SYS>>\\n\\n' + message['content'] %}{% endif %}{% if message['role'] == 'user' %}{{ '<s>' + '[INST] ' + content + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ content + '</s>' }}{% endif %}{% endfor %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</s>",
+  "legacy": true,
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "<pad>",
+  "padding_side": "right",
+  "sp_model_kwargs": {},
+  "spaces_between_special_tokens": false,
+  "split_special_tokens": false,
+  "tokenizer_class": "LlamaTokenizer",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": false,
+  "use_fast": false
+}
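
The `chat_template` string in this tokenizer config is dense, so here is a plain-Python mirror of its branches to show what prompt it produces. This is a reading aid under the assumption that the template is transcribed faithfully above; `render_chat` is a hypothetical name, and the tokenizer's own `apply_chat_template` remains the authority.

```python
DEFAULT_SYSTEM = "You are a helpful assistant. 你是一个乐于助人的助手。"


def render_chat(messages):
    """Mirror the Jinja chat_template branch by branch."""
    system_message = DEFAULT_SYSTEM
    if messages[0]["role"] == "system":
        system_message = messages[0]["content"]

    pieces = []
    for index, message in enumerate(messages):
        content = message["content"]
        if index == 0:
            # The system block is prepended to the first message only.
            content = "<<SYS>>\n" + system_message + "\n<</SYS>>\n\n" + content
        if message["role"] == "user":
            pieces.append("<s>" + "[INST] " + content + " [/INST]")
        elif message["role"] == "assistant":
            pieces.append(content + "</s>")
        # Any other role, including "system", emits nothing.
    return "".join(pieces)


prompt = render_chat([{"role": "user", "content": "你好"}])
print(prompt)
```

Read literally, a leading `system` message is wrapped at index 0 but never emitted, so a custom system prompt would not reach the rendered text. That observation comes only from tracing the template as shown here and is worth confirming against the real tokenizer before relying on it.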

checkpoint-1100/trainer_state.json ADDED
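
On reading the `log_history` entries in this file: the logged `learning_rate` values appear to follow a cosine decay with no warmup. The sketch below checks that guess against a few logged points. The peak rate of 5e-5 and the total of 3215 steps are inferred from the numbers, not stated anywhere in this commit, so treat both as assumptions.

```python
import math

PEAK_LR = 5e-5      # ASSUMPTION: inferred from the first logged values
TOTAL_STEPS = 3215  # ASSUMPTION: inferred by fitting the logged values


def cosine_lr(step, peak=PEAK_LR, total=TOTAL_STEPS):
    """Cosine decay from `peak` at step 0 to 0 at step `total`."""
    return 0.5 * peak * (1.0 + math.cos(math.pi * step / total))


# (step, learning_rate) pairs copied from log_history.
logged = {
    5: 4.999970160815579e-05,
    100: 4.9880737968575365e-05,
    800: 4.274232834777782e-05,
}

for step, lr in logged.items():
    print(step, lr, cosine_lr(step))

# The top-level "epoch" and "global_step" fields give a rough cross-check
# on the step count per epoch.
steps_per_epoch = 1100 / 0.34205309596921524
print(round(steps_per_epoch, 1))
```

The per-epoch estimate lands between 3215 and 3216, which would fit a single-epoch run, though the training arguments themselves are stored in `training_args.bin` and are not visible in this diff.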

	@@ -0,0 +1,1561 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.34205309596921524,
+  "eval_steps": 500,
+  "global_step": 1100,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.0,
+      "grad_norm": 0.47774845361709595,
+      "learning_rate": 4.999970160815579e-05,
+      "loss": 2.0765,
+      "step": 5
+    },
+    {
+      "epoch": 0.0,
+      "grad_norm": 0.6051416397094727,
+      "learning_rate": 4.999880643974619e-05,
+      "loss": 2.2297,
+      "step": 10
+    },
+    {
+      "epoch": 0.0,
+      "grad_norm": 0.6161717772483826,
+      "learning_rate": 4.9997314516140056e-05,
+      "loss": 2.1103,
+      "step": 15
+    },
+    {
+      "epoch": 0.01,
+      "grad_norm": 0.4686434268951416,
+      "learning_rate": 4.999522587295162e-05,
+      "loss": 2.0057,
+      "step": 20
+    },
+    {
+      "epoch": 0.01,
+      "grad_norm": 0.8412289023399353,
+      "learning_rate": 4.999254056003963e-05,
+      "loss": 2.1778,
+      "step": 25
+    },
+    {
+      "epoch": 0.01,
+      "grad_norm": 0.5333625078201294,
+      "learning_rate": 4.99892586415061e-05,
+      "loss": 2.2399,
+      "step": 30
+    },
+    {
+      "epoch": 0.01,
+      "grad_norm": 0.821148157119751,
+      "learning_rate": 4.9985380195694856e-05,
+      "loss": 2.3215,
+      "step": 35
+    },
+    {
+      "epoch": 0.01,
+      "grad_norm": 0.8403909206390381,
+      "learning_rate": 4.998090531518962e-05,
+      "loss": 1.8295,
+      "step": 40
+    },
+    {
+      "epoch": 0.01,
+      "grad_norm": 0.6633398532867432,
+      "learning_rate": 4.9975834106811834e-05,
+      "loss": 2.0195,
+      "step": 45
+    },
+    {
+      "epoch": 0.02,
+      "grad_norm": 0.6386868357658386,
+      "learning_rate": 4.997016669161806e-05,
+      "loss": 2.1257,
+      "step": 50
+    },
+    {
+      "epoch": 0.02,
+      "grad_norm": 0.7762248516082764,
+      "learning_rate": 4.996390320489715e-05,
+      "loss": 2.057,
+      "step": 55
+    },
+    {
+      "epoch": 0.02,
+      "grad_norm": 1.3192856311798096,
+      "learning_rate": 4.9957043796166966e-05,
+      "loss": 2.0753,
+      "step": 60
+    },
+    {
+      "epoch": 0.02,
+      "grad_norm": 0.9797518849372864,
+      "learning_rate": 4.994958862917083e-05,
+      "loss": 1.9736,
+      "step": 65
+    },
+    {
+      "epoch": 0.02,
+      "grad_norm": 1.000693440437317,
+      "learning_rate": 4.994153788187363e-05,
+      "loss": 2.1572,
+      "step": 70
+    },
+    {
+      "epoch": 0.02,
+      "grad_norm": 0.6852813959121704,
+      "learning_rate": 4.993289174645757e-05,
+      "loss": 2.1491,
+      "step": 75
+    },
+    {
+      "epoch": 0.02,
+      "grad_norm": 1.0075691938400269,
+      "learning_rate": 4.992365042931752e-05,
+      "loss": 1.945,
+      "step": 80
+    },
+    {
+      "epoch": 0.03,
+      "grad_norm": 1.1973133087158203,
+      "learning_rate": 4.991381415105619e-05,
+      "loss": 2.0811,
+      "step": 85
+    },
+    {
+      "epoch": 0.03,
+      "grad_norm": 0.9927239418029785,
+      "learning_rate": 4.990338314647881e-05,
+      "loss": 1.961,
+      "step": 90
+    },
+    {
+      "epoch": 0.03,
+      "grad_norm": 0.9499759674072266,
+      "learning_rate": 4.98923576645875e-05,
+      "loss": 2.0653,
+      "step": 95
+    },
+    {
+      "epoch": 0.03,
+      "grad_norm": 0.7233040928840637,
+      "learning_rate": 4.9880737968575365e-05,
+      "loss": 1.9999,
+      "step": 100
+    },
+    {
+      "epoch": 0.03,
+      "grad_norm": 1.55235755443573,
+      "learning_rate": 4.986852433582022e-05,
+      "loss": 2.2258,
+      "step": 105
+    },
+    {
+      "epoch": 0.03,
+      "grad_norm": 0.9007890820503235,
+      "learning_rate": 4.985571705787793e-05,
+      "loss": 2.1034,
+      "step": 110
+    },
+    {
+      "epoch": 0.04,
+      "grad_norm": 0.6774860620498657,
+      "learning_rate": 4.9842316440475475e-05,
+      "loss": 2.1753,
+      "step": 115
+    },
+    {
+      "epoch": 0.04,
+      "grad_norm": 0.7676737308502197,
+      "learning_rate": 4.9828322803503665e-05,
+      "loss": 2.1384,
+      "step": 120
+    },
+    {
+      "epoch": 0.04,
+      "grad_norm": 0.9624544978141785,
+      "learning_rate": 4.981373648100946e-05,
+      "loss": 2.0521,
+      "step": 125
+    },
+    {
+      "epoch": 0.04,
+      "grad_norm": 0.9315722584724426,
+      "learning_rate": 4.979855782118802e-05,
+      "loss": 1.9256,
+      "step": 130
+    },
+    {
+      "epoch": 0.04,
+      "grad_norm": 0.9035864472389221,
+      "learning_rate": 4.978278718637443e-05,
+      "loss": 2.0882,
+      "step": 135
+    },
+    {
+      "epoch": 0.04,
+      "grad_norm": 0.7997236251831055,
+      "learning_rate": 4.9766424953035e-05,
+      "loss": 2.0724,
+      "step": 140
+    },
+    {
+      "epoch": 0.05,
+      "grad_norm": 1.0692921876907349,
+      "learning_rate": 4.974947151175826e-05,
+      "loss": 2.1329,
+      "step": 145
+    },
+    {
+      "epoch": 0.05,
+      "grad_norm": 0.9506180286407471,
+      "learning_rate": 4.973192726724572e-05,
+      "loss": 2.082,
+      "step": 150
+    },
+    {
+      "epoch": 0.05,
+      "grad_norm": 0.8647387027740479,
+      "learning_rate": 4.9713792638302145e-05,
+      "loss": 2.0366,
+      "step": 155
+    },
+    {
+      "epoch": 0.05,
+      "grad_norm": 1.105302095413208,
+      "learning_rate": 4.969506805782555e-05,
+      "loss": 2.1481,
+      "step": 160
+    },
+    {
+      "epoch": 0.05,
+      "grad_norm": 0.7593303918838501,
+      "learning_rate": 4.967575397279689e-05,
+      "loss": 2.032,
+      "step": 165
+    },
+    {
+      "epoch": 0.05,
+      "grad_norm": 0.7521979808807373,
+      "learning_rate": 4.965585084426943e-05,
+      "loss": 2.0379,
+      "step": 170
+    },
+    {
+      "epoch": 0.05,
+      "grad_norm": 0.947120726108551,
+      "learning_rate": 4.9635359147357655e-05,
+      "loss": 2.1444,
+      "step": 175
+    },
+    {
+      "epoch": 0.06,
+      "grad_norm": 1.2184454202651978,
+      "learning_rate": 4.961427937122598e-05,
+      "loss": 1.9164,
+      "step": 180
+    },
+    {
+      "epoch": 0.06,
+      "grad_norm": 1.221663475036621,
+      "learning_rate": 4.959261201907707e-05,
+      "loss": 2.0084,
+      "step": 185
+    },
+    {
+      "epoch": 0.06,
+      "grad_norm": 1.0457361936569214,
+      "learning_rate": 4.957035760813982e-05,
+      "loss": 2.2032,
+      "step": 190
+    },
+    {
+      "epoch": 0.06,
+      "grad_norm": 0.8834909200668335,
+      "learning_rate": 4.954751666965701e-05,
+      "loss": 2.2101,
+      "step": 195
+    },
+    {
+      "epoch": 0.06,
+      "grad_norm": 0.791902482509613,
+      "learning_rate": 4.9524089748872615e-05,
+      "loss": 2.0472,
+      "step": 200
+    },
+    {
+      "epoch": 0.06,
+      "grad_norm": 1.2905739545822144,
+      "learning_rate": 4.9500077405018807e-05,
+      "loss": 2.0987,
+      "step": 205
+    },
+    {
+      "epoch": 0.07,
+      "grad_norm": 0.8612006306648254,
+      "learning_rate": 4.9475480211302583e-05,
+      "loss": 2.1765,
+      "step": 210
+    },
+    {
+      "epoch": 0.07,
+      "grad_norm": 1.3128459453582764,
+      "learning_rate": 4.945029875489212e-05,
+      "loss": 1.9926,
+      "step": 215
+    },
+    {
+      "epoch": 0.07,
+      "grad_norm": 0.9610918164253235,
+      "learning_rate": 4.94245336369027e-05,
+      "loss": 2.0124,
+      "step": 220
+    },
+    {
+      "epoch": 0.07,
+      "grad_norm": 0.873160183429718,
+      "learning_rate": 4.939818547238241e-05,
+      "loss": 2.2229,
+      "step": 225
+    },
+    {
+      "epoch": 0.07,
+      "grad_norm": 1.5535285472869873,
+      "learning_rate": 4.9371254890297446e-05,
+      "loss": 2.2013,
+      "step": 230
+    },
+    {
+      "epoch": 0.07,
+      "grad_norm": 1.1951836347579956,
+      "learning_rate": 4.93437425335171e-05,
+      "loss": 2.014,
+      "step": 235
+    },
+    {
+      "epoch": 0.07,
+      "grad_norm": 0.7874170541763306,
+      "learning_rate": 4.9315649058798384e-05,
+      "loss": 2.1701,
+      "step": 240
+    },
+    {
+      "epoch": 0.08,
+      "grad_norm": 1.3503323793411255,
+      "learning_rate": 4.928697513677042e-05,
+      "loss": 2.1681,
+      "step": 245
+    },
+    {
+      "epoch": 0.08,
+      "grad_norm": 1.3091179132461548,
+      "learning_rate": 4.925772145191834e-05,
+      "loss": 2.1224,
+      "step": 250
+    },
+    {
+      "epoch": 0.08,
+      "grad_norm": 1.4428555965423584,
+      "learning_rate": 4.9227888702567044e-05,
+      "loss": 2.0512,
+      "step": 255
+    },
+    {
+      "epoch": 0.08,
+      "grad_norm": 0.8234395980834961,
+      "learning_rate": 4.9197477600864446e-05,
+      "loss": 2.1067,
+      "step": 260
+    },
+    {
+      "epoch": 0.08,
+      "grad_norm": 1.9094969034194946,
+      "learning_rate": 4.9166488872764526e-05,
+      "loss": 1.8884,
+      "step": 265
+    },
+    {
+      "epoch": 0.08,
+      "grad_norm": 1.0074087381362915,
+      "learning_rate": 4.913492325800999e-05,
+      "loss": 1.9345,
+      "step": 270
+    },
+    {
+      "epoch": 0.09,
+      "grad_norm": 1.0867297649383545,
+      "learning_rate": 4.910278151011458e-05,
+      "loss": 2.1928,
+      "step": 275
+    },
+    {
+      "epoch": 0.09,
+      "grad_norm": 0.6842357516288757,
+      "learning_rate": 4.907006439634516e-05,
+      "loss": 2.0407,
+      "step": 280
+    },
+    {
+      "epoch": 0.09,
+      "grad_norm": 0.8409023284912109,
+      "learning_rate": 4.903677269770329e-05,
+      "loss": 2.2344,
+      "step": 285
+    },
+    {
+      "epoch": 0.09,
+      "grad_norm": 0.8119503259658813,
+      "learning_rate": 4.900290720890671e-05,
+      "loss": 2.1296,
+      "step": 290
+    },
+    {
+      "epoch": 0.09,
+      "grad_norm": 0.9938147068023682,
+      "learning_rate": 4.8968468738370244e-05,
+      "loss": 2.152,
+      "step": 295
+    },
+    {
+      "epoch": 0.09,
+      "grad_norm": 0.9865244030952454,
+      "learning_rate": 4.8933458108186606e-05,
+      "loss": 1.9623,
+      "step": 300
+    },
+    {
+      "epoch": 0.09,
+      "grad_norm": 1.3944802284240723,
+      "learning_rate": 4.889787615410672e-05,
+      "loss": 1.915,
+      "step": 305
+    },
+    {
+      "epoch": 0.1,
+      "grad_norm": 1.3749767541885376,
+      "learning_rate": 4.886172372551977e-05,
+      "loss": 1.9934,
+      "step": 310
+    },
+    {
+      "epoch": 0.1,
+      "grad_norm": 0.9024938941001892,
+      "learning_rate": 4.882500168543294e-05,
+      "loss": 2.1541,
+      "step": 315
+    },
+    {
+      "epoch": 0.1,
+      "grad_norm": 1.1978263854980469,
+      "learning_rate": 4.878771091045082e-05,
+      "loss": 2.1688,
+      "step": 320
+    },
+    {
+      "epoch": 0.1,
+      "grad_norm": 0.8360010981559753,
+      "learning_rate": 4.874985229075446e-05,
+      "loss": 2.1387,
+      "step": 325
+    },
+    {
+      "epoch": 0.1,
+      "grad_norm": 0.7683364152908325,
+      "learning_rate": 4.871142673008012e-05,
+      "loss": 2.0215,
+      "step": 330
+    },
+    {
+      "epoch": 0.1,
+      "grad_norm": 1.4230670928955078,
+      "learning_rate": 4.867243514569772e-05,
+      "loss": 1.9491,
+      "step": 335
+    },
+    {
+      "epoch": 0.11,
+      "grad_norm": 0.8198773860931396,
+      "learning_rate": 4.863287846838891e-05,
+      "loss": 2.0151,
+      "step": 340
+    },
+    {
+      "epoch": 0.11,
+      "grad_norm": 1.467207908630371,
+      "learning_rate": 4.85927576424249e-05,
+      "loss": 1.8906,
+      "step": 345
+    },
+    {
+      "epoch": 0.11,
+      "grad_norm": 0.9537095427513123,
+      "learning_rate": 4.855207362554385e-05,
+      "loss": 2.1844,
+      "step": 350
+    },
+    {
+      "epoch": 0.11,
+      "grad_norm": 1.0757155418395996,
+      "learning_rate": 4.851082738892809e-05,
+      "loss": 2.048,
+      "step": 355
+    },
+    {
+      "epoch": 0.11,
+      "grad_norm": 1.6884938478469849,
+      "learning_rate": 4.8469019917180846e-05,
+      "loss": 1.9537,
+      "step": 360
+    },
+    {
+      "epoch": 0.11,
+      "grad_norm": 1.4680182933807373,
+      "learning_rate": 4.8426652208302814e-05,
+      "loss": 1.9731,
+      "step": 365
+    },
+    {
+      "epoch": 0.12,
+      "grad_norm": 1.1778632402420044,
+      "learning_rate": 4.83837252736683e-05,
+      "loss": 2.1395,
+      "step": 370
+    },
+    {
+      "epoch": 0.12,
+      "grad_norm": 1.2865056991577148,
+      "learning_rate": 4.834024013800108e-05,
+      "loss": 2.0016,
+      "step": 375
+    },
+    {
+      "epoch": 0.12,
+      "grad_norm": 1.055177092552185,
+      "learning_rate": 4.8296197839349944e-05,
+      "loss": 1.9632,
+      "step": 380
+    },
+    {
+      "epoch": 0.12,
+      "grad_norm": 1.0041871070861816,
+      "learning_rate": 4.825159942906389e-05,
+      "loss": 2.3302,
+      "step": 385
+    },
+    {
+      "epoch": 0.12,
+      "grad_norm": 1.0026438236236572,
+      "learning_rate": 4.820644597176709e-05,
+      "loss": 2.1517,
+      "step": 390
+    },
+    {
+      "epoch": 0.12,
+      "grad_norm": 1.3532180786132812,
+      "learning_rate": 4.81607385453334e-05,
+      "loss": 2.1229,
+      "step": 395
+    },
+    {
+      "epoch": 0.12,
+      "grad_norm": 0.7670988440513611,
+      "learning_rate": 4.81144782408607e-05,
+      "loss": 2.1382,
+      "step": 400
+    },
+    {
+      "epoch": 0.13,
+      "grad_norm": 1.0405700206756592,
+      "learning_rate": 4.8067666162644774e-05,
+      "loss": 1.9614,
+      "step": 405
+    },
+    {
+      "epoch": 0.13,
+      "grad_norm": 1.2252662181854248,
+      "learning_rate": 4.802030342815304e-05,
+      "loss": 2.1399,
+      "step": 410
+    },
+    {
+      "epoch": 0.13,
+      "grad_norm": 1.237946629524231,
+      "learning_rate": 4.7972391167997754e-05,
+      "loss": 1.9034,
+      "step": 415
+    },
+    {
+      "epoch": 0.13,
+      "grad_norm": 0.8064705729484558,
+      "learning_rate": 4.7923930525909156e-05,
+      "loss": 2.0075,
+      "step": 420
+    },
+    {
+      "epoch": 0.13,
+      "grad_norm": 0.8717565536499023,
+      "learning_rate": 4.7874922658708065e-05,
+      "loss": 2.0105,
+      "step": 425
+    },
+    {
+      "epoch": 0.13,
+      "grad_norm": 1.6693098545074463,
+      "learning_rate": 4.782536873627832e-05,
+      "loss": 2.0242,
+      "step": 430
+    },
+    {
+      "epoch": 0.14,
+      "grad_norm": 0.82447350025177,
+      "learning_rate": 4.777526994153882e-05,
+      "loss": 2.0267,
+      "step": 435
+    },
+    {
+      "epoch": 0.14,
+      "grad_norm": 0.9926588535308838,
+      "learning_rate": 4.7724627470415307e-05,
+      "loss": 1.9119,
+      "step": 440
+    },
+    {
+      "epoch": 0.14,
+      "grad_norm": 1.0924450159072876,
+      "learning_rate": 4.7673442531811796e-05,
+      "loss": 2.2653,
+      "step": 445
+    },
+    {
+      "epoch": 0.14,
+      "grad_norm": 1.1592103242874146,
+      "learning_rate": 4.762171634758177e-05,
+      "loss": 2.0017,
+      "step": 450
+    },
+    {
+      "epoch": 0.14,
+      "grad_norm": 0.9172110557556152,
+      "learning_rate": 4.7569450152498927e-05,
+      "loss": 2.1408,
+      "step": 455
+    },
+    {
+      "epoch": 0.14,
+      "grad_norm": 1.1897525787353516,
+      "learning_rate": 4.751664519422778e-05,
+      "loss": 2.0935,
+      "step": 460
+    },
+    {
+      "epoch": 0.14,
+      "grad_norm": 0.8793094158172607,
+      "learning_rate": 4.746330273329386e-05,
+      "loss": 2.1142,
+      "step": 465
+    },
+    {
+      "epoch": 0.15,
+      "grad_norm": 1.4337489604949951,
+      "learning_rate": 4.740942404305356e-05,
+      "loss": 2.1289,
+      "step": 470
+    },
+    {
+      "epoch": 0.15,
+      "grad_norm": 1.0251764059066772,
+      "learning_rate": 4.735501040966383e-05,
+      "loss": 1.9741,
+      "step": 475
+    },
+    {
+      "epoch": 0.15,
+      "grad_norm": 1.2659822702407837,
+      "learning_rate": 4.730006313205143e-05,
+      "loss": 2.088,
+      "step": 480
+    },
+    {
+      "epoch": 0.15,
+      "grad_norm": 0.8884140849113464,
+      "learning_rate": 4.724458352188192e-05,
+      "loss": 2.2079,
+      "step": 485
+    },
+    {
+      "epoch": 0.15,
+      "grad_norm": 1.1937768459320068,
+      "learning_rate": 4.718857290352835e-05,
+      "loss": 2.048,
+      "step": 490
+    },
+    {
+      "epoch": 0.15,
+      "grad_norm": 0.9741552472114563,
+      "learning_rate": 4.713203261403966e-05,
+      "loss": 2.2569,
+      "step": 495
+    },
+    {
+      "epoch": 0.16,
+      "grad_norm": 0.7996780872344971,
+      "learning_rate": 4.707496400310874e-05,
+      "loss": 1.9574,
+      "step": 500
+    },
+    {
+      "epoch": 0.16,
+      "grad_norm": 1.8182051181793213,
+      "learning_rate": 4.701736843304025e-05,
+      "loss": 2.0951,
+      "step": 505
+    },
+    {
+      "epoch": 0.16,
+      "grad_norm": 1.507320761680603,
+      "learning_rate": 4.695924727871805e-05,
+      "loss": 2.0253,
+      "step": 510
+    },
+    {
+      "epoch": 0.16,
+      "grad_norm": 0.759121835231781,
+      "learning_rate": 4.690060192757242e-05,
+      "loss": 2.0602,
+      "step": 515
+    },
+    {
+      "epoch": 0.16,
+      "grad_norm": 1.5943195819854736,
+      "learning_rate": 4.684143377954691e-05,
+      "loss": 2.0386,
+      "step": 520
+    },
+    {
+      "epoch": 0.16,
+      "grad_norm": 0.8568710088729858,
+      "learning_rate": 4.6781744247064955e-05,
+      "loss": 2.073,
+      "step": 525
+    },
+    {
+      "epoch": 0.16,
+      "grad_norm": 1.3352620601654053,
+      "learning_rate": 4.6721534754996125e-05,
+      "loss": 2.1443,
+      "step": 530
+    },
+    {
+      "epoch": 0.17,
+      "grad_norm": 1.3417474031448364,
+      "learning_rate": 4.666080674062213e-05,
+      "loss": 2.0288,
+      "step": 535
+    },
+    {
+      "epoch": 0.17,
+      "grad_norm": 1.5334464311599731,
+      "learning_rate": 4.659956165360251e-05,
+      "loss": 2.0609,
+      "step": 540
+    },
+    {
+      "epoch": 0.17,
+      "grad_norm": 0.9658721089363098,
+      "learning_rate": 4.6537800955940005e-05,
+      "loss": 1.9539,
+      "step": 545
+    },
+    {
+      "epoch": 0.17,
+      "grad_norm": 1.9197947978973389,
+      "learning_rate": 4.647552612194572e-05,
+      "loss": 2.149,
+      "step": 550
+    },
+    {
+      "epoch": 0.17,
+      "grad_norm": 0.8512137532234192,
+      "learning_rate": 4.641273863820383e-05,
+      "loss": 1.9722,
+      "step": 555
+    },
+    {
+      "epoch": 0.17,
+      "grad_norm": 1.827289342880249,
+      "learning_rate": 4.634944000353622e-05,
+      "loss": 2.0729,
+      "step": 560
+    },
+    {
+      "epoch": 0.18,
+      "grad_norm": 1.088416337966919,
+      "learning_rate": 4.628563172896655e-05,
+      "loss": 1.9507,
+      "step": 565
+    },
+    {
+      "epoch": 0.18,
+      "grad_norm": 1.3566908836364746,
+      "learning_rate": 4.6221315337684353e-05,
+      "loss": 2.1643,
+      "step": 570
+    },
+    {
+      "epoch": 0.18,
+      "grad_norm": 1.3541293144226074,
+      "learning_rate": 4.615649236500854e-05,
+      "loss": 2.1839,
+      "step": 575
+    },
+    {
+      "epoch": 0.18,
+      "grad_norm": 0.991269588470459,
+      "learning_rate": 4.609116435835083e-05,
+      "loss": 2.0976,
+      "step": 580
+    },
+    {
+      "epoch": 0.18,
+      "grad_norm": 1.0280535221099854,
+      "learning_rate": 4.602533287717877e-05,
+      "loss": 2.1474,
+      "step": 585
+    },
+    {
+      "epoch": 0.18,
+      "grad_norm": 1.013123631477356,
+      "learning_rate": 4.5958999492978524e-05,
+      "loss": 2.1873,
+      "step": 590
+    },
+    {
+      "epoch": 0.19,
+      "grad_norm": 1.1753040552139282,
+      "learning_rate": 4.589216578921737e-05,
+      "loss": 2.1744,
+      "step": 595
+    },
+    {
+      "epoch": 0.19,
+      "grad_norm": 1.1839090585708618,
+      "learning_rate": 4.582483336130586e-05,
+      "loss": 1.9982,
+      "step": 600
+    },
+    {
+      "epoch": 0.19,
+      "grad_norm": 1.0724798440933228,
+      "learning_rate": 4.575700381655979e-05,
+      "loss": 2.1234,
+      "step": 605
+    },
+    {
+      "epoch": 0.19,
+      "grad_norm": 2.009913682937622,
+      "learning_rate": 4.5688678774161796e-05,
+      "loss": 1.9478,
+      "step": 610
+    },
+    {
+      "epoch": 0.19,
+      "grad_norm": 0.9897060394287109,
+      "learning_rate": 4.561985986512271e-05,
+      "loss": 1.8268,
+      "step": 615
+    },
+    {
+      "epoch": 0.19,
+      "grad_norm": 0.8881808519363403,
+      "learning_rate": 4.555054873224263e-05,
+      "loss": 1.9887,
+      "step": 620
+    },
+    {
+      "epoch": 0.19,
+      "grad_norm": 1.155900001525879,
+      "learning_rate": 4.54807470300717e-05,
+      "loss": 2.0777,
+      "step": 625
+    },
+    {
+      "epoch": 0.2,
+      "grad_norm": 0.8782421350479126,
+      "learning_rate": 4.5410456424870596e-05,
+      "loss": 2.0566,
+      "step": 630
+    },
+    {
+      "epoch": 0.2,
+      "grad_norm": 1.3324674367904663,
+      "learning_rate": 4.5339678594570795e-05,
+      "loss": 2.047,
+      "step": 635
+    },
+    {
+      "epoch": 0.2,
+      "grad_norm": 1.9805939197540283,
+      "learning_rate": 4.526841522873449e-05,
+      "loss": 1.962,
+      "step": 640
+    },
+    {
+      "epoch": 0.2,
+      "grad_norm": 1.4999943971633911,
+      "learning_rate": 4.519666802851422e-05,
+      "loss": 2.0972,
+      "step": 645
+    },
+    {
+      "epoch": 0.2,
+      "grad_norm": 1.4504961967468262,
+      "learning_rate": 4.5124438706612376e-05,
+      "loss": 2.0041,
+      "step": 650
+    },
+    {
+      "epoch": 0.2,
+      "grad_norm": 0.9078169465065002,
+      "learning_rate": 4.505172898724018e-05,
+      "loss": 2.1229,
+      "step": 655
+    },
+    {
+      "epoch": 0.21,
+      "grad_norm": 1.1635804176330566,
+      "learning_rate": 4.497854060607662e-05,
+      "loss": 2.0195,
+      "step": 660
+    },
+    {
+      "epoch": 0.21,
+      "grad_norm": 1.46576726436615,
+      "learning_rate": 4.490487531022699e-05,
+      "loss": 2.0745,
+      "step": 665
+    },
+    {
+      "epoch": 0.21,
+      "grad_norm": 1.2094652652740479,
+      "learning_rate": 4.4830734858181145e-05,
+      "loss": 2.1068,
+      "step": 670
+    },
+    {
+      "epoch": 0.21,
+      "grad_norm": 1.4738895893096924,
+      "learning_rate": 4.47561210197716e-05,
+      "loss": 1.8088,
+      "step": 675
+    },
+    {
+      "epoch": 0.21,
+      "grad_norm": 1.23384690284729,
+      "learning_rate": 4.4681035576131215e-05,
+      "loss": 2.0995,
+      "step": 680
+    },
+    {
+      "epoch": 0.21,
+      "grad_norm": 0.8332946300506592,
+      "learning_rate": 4.46054803196507e-05,
+      "loss": 2.0541,
+      "step": 685
+    },
+    {
+      "epoch": 0.21,
+      "grad_norm": 0.9207485318183899,
+      "learning_rate": 4.452945705393586e-05,
+      "loss": 2.166,
+      "step": 690
+    },
+    {
+      "epoch": 0.22,
+      "grad_norm": 1.292945146560669,
+      "learning_rate": 4.445296759376449e-05,
+      "loss": 2.0784,
+      "step": 695
+    },
+    {
+      "epoch": 0.22,
+      "grad_norm": 0.9874763488769531,
+      "learning_rate": 4.437601376504307e-05,
+      "loss": 2.2087,
+      "step": 700
+    },
+    {
+      "epoch": 0.22,
+      "grad_norm": 0.9427415132522583,
+      "learning_rate": 4.4298597404763186e-05,
+      "loss": 2.1199,
+      "step": 705
+    },
+    {
+      "epoch": 0.22,
+      "grad_norm": 1.7369529008865356,
+      "learning_rate": 4.422072036095768e-05,
+      "loss": 2.0355,
+      "step": 710
+    },
+    {
+      "epoch": 0.22,
+      "grad_norm": 1.2423696517944336,
+      "learning_rate": 4.414238449265654e-05,
+      "loss": 2.0011,
+      "step": 715
+    },
+    {
+      "epoch": 0.22,
+      "grad_norm": 1.2304831743240356,
+      "learning_rate": 4.406359166984249e-05,
+      "loss": 2.0368,
+      "step": 720
+    },
+    {
+      "epoch": 0.23,
+      "grad_norm": 0.9090413451194763,
+      "learning_rate": 4.39843437734064e-05,
+      "loss": 1.9983,
+      "step": 725
+    },
+    {
+      "epoch": 0.23,
+      "grad_norm": 1.2729507684707642,
+      "learning_rate": 4.390464269510233e-05,
+      "loss": 2.021,
+      "step": 730
+    },
+    {
+      "epoch": 0.23,
+      "grad_norm": 1.3009227514266968,
+      "learning_rate": 4.382449033750244e-05,
+      "loss": 1.9743,
+      "step": 735
+    },
+    {
+      "epoch": 0.23,
+      "grad_norm": 1.5456056594848633,
+      "learning_rate": 4.37438886139515e-05,
+      "loss": 2.0689,
+      "step": 740
+    },
+    {
+      "epoch": 0.23,
+      "grad_norm": 1.3235007524490356,
+      "learning_rate": 4.3662839448521264e-05,
+      "loss": 2.0838,
+      "step": 745
+    },
+    {
+      "epoch": 0.23,
+      "grad_norm": 2.2074007987976074,
+      "learning_rate": 4.358134477596454e-05,
+      "loss": 2.0835,
+      "step": 750
+    },
+    {
+      "epoch": 0.23,
+      "grad_norm": 1.403738021850586,
+      "learning_rate": 4.3499406541668966e-05,
+      "loss": 2.0916,
+      "step": 755
+    },
+    {
+      "epoch": 0.24,
+      "grad_norm": 1.0940325260162354,
+      "learning_rate": 4.3417026701610616e-05,
+      "loss": 1.972,
+      "step": 760
+    },
+    {
+      "epoch": 0.24,
+      "grad_norm": 1.666353702545166,
+      "learning_rate": 4.3334207222307275e-05,
+      "loss": 1.927,
+      "step": 765
+    },
+    {
+      "epoch": 0.24,
+      "grad_norm": 1.0777515172958374,
+      "learning_rate": 4.325095008077154e-05,
+      "loss": 2.1192,
+      "step": 770
+    },
+    {
+      "epoch": 0.24,
+      "grad_norm": 1.7218186855316162,
+      "learning_rate": 4.316725726446353e-05,
+      "loss": 2.0774,
+      "step": 775
+    },
+    {
+      "epoch": 0.24,
+      "grad_norm": 1.356753945350647,
+      "learning_rate": 4.3083130771243586e-05,
+      "loss": 2.0847,
+      "step": 780
+    },
+    {
+      "epoch": 0.24,
+      "grad_norm": 0.9967429637908936,
+      "learning_rate": 4.299857260932445e-05,
+      "loss": 2.0485,
+      "step": 785
+    },
+    {
+      "epoch": 0.25,
+      "grad_norm": 1.6216442584991455,
+      "learning_rate": 4.2913584797223397e-05,
+      "loss": 2.1008,
+      "step": 790
+    },
+    {
+      "epoch": 0.25,
+      "grad_norm": 1.2556742429733276,
+      "learning_rate": 4.2828169363714016e-05,
+      "loss": 1.9209,
+      "step": 795
+    },
+    {
+      "epoch": 0.25,
+      "grad_norm": 1.1800439357757568,
+      "learning_rate": 4.274232834777782e-05,
+      "loss": 1.9722,
+      "step": 800
+    },
+    {
+      "epoch": 0.25,
+      "grad_norm": 1.1313499212265015,
+      "learning_rate": 4.2656063798555515e-05,
+      "loss": 1.9176,
+      "step": 805
+    },
+    {
+      "epoch": 0.25,
+      "grad_norm": 1.137534737586975,
+      "learning_rate": 4.256937777529815e-05,
+      "loss": 1.9929,
+      "step": 810
+    },
+    {
+      "epoch": 0.25,
+      "grad_norm": 1.0575093030929565,
+      "learning_rate": 4.2482272347317906e-05,
+      "loss": 2.166,
+      "step": 815
+    },
+    {
+      "epoch": 0.25,
+      "grad_norm": 1.5939594507217407,
+      "learning_rate": 4.2394749593938733e-05,
+      "loss": 2.1334,
+      "step": 820
+    },
+    {
+      "epoch": 0.26,
+      "grad_norm": 1.1045507192611694,
+      "learning_rate": 4.230681160444669e-05,
+      "loss": 2.0853,
+      "step": 825
+    },
+    {
+      "epoch": 0.26,
+      "grad_norm": 1.3480136394500732,
+      "learning_rate": 4.221846047804009e-05,
+      "loss": 2.1802,
+      "step": 830
+    },
+    {
+      "epoch": 0.26,
+      "grad_norm": 1.1822657585144043,
+      "learning_rate": 4.2129698323779366e-05,
+      "loss": 2.0739,
+      "step": 835
+    },
+    {
+      "epoch": 0.26,
+      "grad_norm": 1.1771117448806763,
+      "learning_rate": 4.204052726053676e-05,
+      "loss": 2.0238,
+      "step": 840
+    },
+    {
+      "epoch": 0.26,
+      "grad_norm": 1.4757814407348633,
+      "learning_rate": 4.195094941694571e-05,
+      "loss": 2.1557,
+      "step": 845
+    },
+    {
+      "epoch": 0.26,
+      "grad_norm": 0.9095075726509094,
+      "learning_rate": 4.1860966931350054e-05,
+      "loss": 2.1666,
+      "step": 850
+    },
+    {
+      "epoch": 0.27,
+      "grad_norm": 1.1039543151855469,
+      "learning_rate": 4.1770581951752976e-05,
+      "loss": 2.105,
+      "step": 855
+    },
+    {
+      "epoch": 0.27,
+      "grad_norm": 0.8517205119132996,
+      "learning_rate": 4.1679796635765735e-05,
+      "loss": 1.9656,
+      "step": 860
+    },
+    {
+      "epoch": 0.27,
+      "grad_norm": 1.239492654800415,
+      "learning_rate": 4.158861315055617e-05,
+      "loss": 2.0166,
+      "step": 865
+    },
+    {
+      "epoch": 0.27,
+      "grad_norm": 1.1358321905136108,
+      "learning_rate": 4.1497033672796924e-05,
+      "loss": 2.0076,
+      "step": 870
+    },
+    {
+      "epoch": 0.27,
+      "grad_norm": 1.6215249300003052,
+      "learning_rate": 4.140506038861356e-05,
+      "loss": 2.1594,
+      "step": 875
+    },
+    {
+      "epoch": 0.27,
+      "grad_norm": 1.0528080463409424,
+      "learning_rate": 4.131269549353229e-05,
+      "loss": 2.1416,
+      "step": 880
+    },
+    {
+      "epoch": 0.28,
+      "grad_norm": 0.8976901769638062,
+      "learning_rate": 4.1219941192427644e-05,
+      "loss": 2.1242,
+      "step": 885
+    },
+    {
+      "epoch": 0.28,
+      "grad_norm": 1.263594388961792,
+      "learning_rate": 4.112679969946977e-05,
+      "loss": 2.02,
+      "step": 890
+    },
+    {
+      "epoch": 0.28,
+      "grad_norm": 1.4173017740249634,
+      "learning_rate": 4.103327323807162e-05,
+      "loss": 2.0438,
+      "step": 895
+    },
+    {
+      "epoch": 0.28,
+      "grad_norm": 1.876170039176941,
+      "learning_rate": 4.093936404083585e-05,
+      "loss": 1.9806,
+      "step": 900
+    },
+    {
+      "epoch": 0.28,
+      "grad_norm": 1.4649231433868408,
+      "learning_rate": 4.0845074349501544e-05,
+      "loss": 2.1476,
+      "step": 905
+    },
+    {
+      "epoch": 0.28,
+      "grad_norm": 1.0446043014526367,
+      "learning_rate": 4.0750406414890695e-05,
+      "loss": 1.9672,
+      "step": 910
+    },
+    {
+      "epoch": 0.28,
+      "grad_norm": 1.0225305557250977,
+      "learning_rate": 4.065536249685448e-05,
+      "loss": 1.9984,
+      "step": 915
+    },
+    {
+      "epoch": 0.29,
+      "grad_norm": 1.0120617151260376,
+      "learning_rate": 4.055994486421929e-05,
+      "loss": 2.1162,
+      "step": 920
+    },
+    {
+      "epoch": 0.29,
+      "grad_norm": 1.0469881296157837,
+      "learning_rate": 4.04641557947326e-05,
+      "loss": 2.0435,
+      "step": 925
+    },
+    {
+      "epoch": 0.29,
+      "grad_norm": 1.2435941696166992,
+      "learning_rate": 4.036799757500856e-05,
+      "loss": 2.0431,
+      "step": 930
+    },
+    {
+      "epoch": 0.29,
+      "grad_norm": 1.0055103302001953,
+      "learning_rate": 4.027147250047348e-05,
+      "loss": 2.2021,
+      "step": 935
+    },
+    {
+      "epoch": 0.29,
+      "grad_norm": 1.1212949752807617,
+      "learning_rate": 4.017458287531094e-05,
+      "loss": 1.997,
+      "step": 940
+    },
+    {
+      "epoch": 0.29,
+      "grad_norm": 1.1048357486724854,
+      "learning_rate": 4.007733101240685e-05,
+      "loss": 1.946,
+      "step": 945
+    },
+    {
+      "epoch": 0.3,
+      "grad_norm": 1.4721689224243164,
+      "learning_rate": 3.997971923329426e-05,
+      "loss": 2.0723,
+      "step": 950
+    },
+    {
+      "epoch": 0.3,
+      "grad_norm": 1.3793156147003174,
+      "learning_rate": 3.988174986809783e-05,
+      "loss": 2.034,
+      "step": 955
+    },
+    {
+      "epoch": 0.3,
+      "grad_norm": 0.9013482928276062,
+      "learning_rate": 3.9783425255478355e-05,
+      "loss": 1.9736,
+      "step": 960
+    },
+    {
+      "epoch": 0.3,
+      "grad_norm": 0.9192422032356262,
+      "learning_rate": 3.968474774257682e-05,
+      "loss": 1.9878,
+      "step": 965
+    },
+    {
+      "epoch": 0.3,
+      "grad_norm": 1.9304206371307373,
+      "learning_rate": 3.9585719684958446e-05,
+      "loss": 2.117,
+      "step": 970
+    },
+    {
+      "epoch": 0.3,
+      "grad_norm": 1.0435137748718262,
+      "learning_rate": 3.948634344655639e-05,
+      "loss": 2.0585,
+      "step": 975
+    },
+    {
+      "epoch": 0.3,
+      "grad_norm": 1.4636590480804443,
+      "learning_rate": 3.938662139961538e-05,
+      "loss": 2.0409,
+      "step": 980
+    },
+    {
+      "epoch": 0.31,
+      "grad_norm": 1.8014529943466187,
+      "learning_rate": 3.928655592463508e-05,
+      "loss": 2.0369,
+      "step": 985
+    },
+    {
+      "epoch": 0.31,
+      "grad_norm": 1.2412620782852173,
+      "learning_rate": 3.918614941031319e-05,
+      "loss": 1.967,
+      "step": 990
+    },
+    {
+      "epoch": 0.31,
+      "grad_norm": 1.3581103086471558,
+      "learning_rate": 3.908540425348852e-05,
+      "loss": 2.0037,
+      "step": 995
+    },
+    {
+      "epoch": 0.31,
+      "grad_norm": 1.2377780675888062,
+      "learning_rate": 3.8984322859083725e-05,
+      "loss": 1.9991,
+      "step": 1000
+    },
+    {
+      "epoch": 0.31,
+      "grad_norm": 0.9209259748458862,
+      "learning_rate": 3.8882907640047896e-05,
+      "loss": 2.0448,
+      "step": 1005
+    },
+    {
+      "epoch": 0.31,
+      "grad_norm": 1.0150959491729736,
+      "learning_rate": 3.878116101729897e-05,
+      "loss": 2.0791,
+      "step": 1010
+    },
+    {
+      "epoch": 0.32,
+      "grad_norm": 1.5959141254425049,
+      "learning_rate": 3.867908541966594e-05,
+      "loss": 1.9997,
+      "step": 1015
+    },
+    {
+      "epoch": 0.32,
+      "grad_norm": 1.3945012092590332,
+      "learning_rate": 3.857668328383088e-05,
+      "loss": 2.0481,
+      "step": 1020
+    },
+    {
+      "epoch": 0.32,
+      "grad_norm": 1.2361671924591064,
+      "learning_rate": 3.847395705427075e-05,
+      "loss": 2.2664,
+      "step": 1025
+    },
+    {
+      "epoch": 0.32,
+      "grad_norm": 1.9661719799041748,
+      "learning_rate": 3.837090918319909e-05,
+      "loss": 1.9752,
+      "step": 1030
+    },
+    {
+      "epoch": 0.32,
+      "grad_norm": 1.6995949745178223,
+      "learning_rate": 3.8267542130507436e-05,
+      "loss": 2.1332,
+      "step": 1035
+    },
+    {
+      "epoch": 0.32,
+      "grad_norm": 1.1248412132263184,
+      "learning_rate": 3.816385836370663e-05,
+      "loss": 2.0432,
+      "step": 1040
+    },
+    {
+      "epoch": 0.32,
+      "grad_norm": 0.8734235763549805,
+      "learning_rate": 3.805986035786789e-05,
+      "loss": 1.9618,
+      "step": 1045
+    },
+    {
+      "epoch": 0.33,
+      "grad_norm": 1.322766661643982,
+      "learning_rate": 3.795555059556378e-05,
+      "loss": 2.0267,
+      "step": 1050
+    },
+    {
+      "epoch": 0.33,
+      "grad_norm": 1.0396028757095337,
+      "learning_rate": 3.7850931566808866e-05,
+      "loss": 2.1075,
+      "step": 1055
+    },
+    {
+      "epoch": 0.33,
+      "grad_norm": 0.9574625492095947,
+      "learning_rate": 3.7746005769000363e-05,
+      "loss": 2.156,
+      "step": 1060
+    },
+    {
+      "epoch": 0.33,
+      "grad_norm": 1.4480133056640625,
+      "learning_rate": 3.764077570685844e-05,
+      "loss": 1.9615,
+      "step": 1065
+    },
+    {
+      "epoch": 0.33,
+      "grad_norm": 1.5908560752868652,
+      "learning_rate": 3.753524389236648e-05,
+      "loss": 2.0928,
+      "step": 1070
+    },
+    {
+      "epoch": 0.33,
+      "grad_norm": 1.2628813982009888,
+      "learning_rate": 3.742941284471111e-05,
+      "loss": 2.1074,
+      "step": 1075
+    },
+    {
+      "epoch": 0.34,
+      "grad_norm": 1.2687503099441528,
+      "learning_rate": 3.7323285090222054e-05,
+      "loss": 1.9666,
+      "step": 1080
+    },
+    {
+      "epoch": 0.34,
+      "grad_norm": 1.2571731805801392,
+      "learning_rate": 3.721686316231181e-05,
+      "loss": 2.0468,
+      "step": 1085
+    },
+    {
+      "epoch": 0.34,
+      "grad_norm": 1.007453441619873,
+      "learning_rate": 3.7110149601415215e-05,
+      "loss": 2.0624,
+      "step": 1090
+    },
+    {
+      "epoch": 0.34,
+      "grad_norm": 1.2390377521514893,
+      "learning_rate": 3.700314695492876e-05,
+      "loss": 1.9888,
+      "step": 1095
+    },
+    {
+      "epoch": 0.34,
+      "grad_norm": 1.0878371000289917,
+      "learning_rate": 3.6895857777149825e-05,
+      "loss": 2.1013,
+      "step": 1100
+    }
+  ],
+  "logging_steps": 5,
+  "max_steps": 3215,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 100,
+  "total_flos": 1.478467994517504e+16,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}
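
Note on the logged `learning_rate` values: they appear consistent with a cosine decay, without warmup, from a peak of 5e-5 down to 0 over `max_steps` = 3215. Neither the peak value nor the schedule type is stated in this diff; both are inferred from the numbers, so treat the sketch below as a sanity check under that assumption rather than the actual training configuration.

```python
import math

PEAK_LR = 5e-5    # assumption: inferred from the logged values, not stated in the diff
MAX_STEPS = 3215  # taken from "max_steps" in trainer_state.json


def cosine_lr(step, peak_lr=PEAK_LR, max_steps=MAX_STEPS):
    """Cosine decay from peak_lr to 0 over max_steps, with no warmup (assumed)."""
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * step / max_steps))


# A few (step, learning_rate) pairs copied from log_history above.
logged = {
    700: 4.437601376504307e-05,
    1000: 3.8984322859083725e-05,
    1100: 3.6895857777149825e-05,
}

for step, lr in logged.items():
    # Print the reconstructed value next to the logged one for comparison.
    print(step, cosine_lr(step), lr)
```

If the reconstructed and logged values agree to several significant digits, the cosine reading is plausible; a mismatch would point to warmup or a different scheduler.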

checkpoint-1100/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:79e80b13ff00898b4493d440a3c1a1eb234c0ae541cbca8a8b1befef97a354c9
+size 5112

checkpoint-1200/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+library_name: peft
+base_model: hfl/chinese-alpaca-2-1.3b
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.9.0

checkpoint-1200/adapter_config.json ADDED

	@@ -0,0 +1,28 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "hfl/chinese-alpaca-2-1.3b",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "q_proj",
+    "v_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
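
Note on the adapter settings: with `use_rslora` set to false, the LoRA update is scaled by `lora_alpha / r`, which here is 16 / 8 = 2. A minimal sketch of that arithmetic follows. It copies only a few fields from the config above and uses a hypothetical helper name, `lora_scaling`; it does not load the adapter or call PEFT.

```python
import json
import math

# Subset of the fields from adapter_config.json above, copied verbatim.
ADAPTER_CONFIG = json.loads("""
{
  "lora_alpha": 16,
  "lora_dropout": 0.1,
  "r": 8,
  "target_modules": ["q_proj", "v_proj"],
  "use_rslora": false
}
""")


def lora_scaling(cfg):
    """Return the factor applied to the low-rank update B @ A.

    Standard LoRA uses lora_alpha / r; rank-stabilized LoRA (use_rslora)
    uses lora_alpha / sqrt(r) instead.
    """
    alpha, rank = cfg["lora_alpha"], cfg["r"]
    if cfg.get("use_rslora", False):
        return alpha / math.sqrt(rank)
    return alpha / rank


print(lora_scaling(ADAPTER_CONFIG))
```

Only the attention query and value projections (`q_proj`, `v_proj`) receive adapters in this configuration, so every other weight of the base model stays frozen.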

checkpoint-1200/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c7ac7f86b3d698764177b292c0945fc14ade25e4053b2ea32433e2ec468c1c68
+size 2099272

checkpoint-1200/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f46418bdb2847edff424887e74f54e939ccb878883a90f7033fb72d289847b08
+size 4208302

checkpoint-1200/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:76802f226aa39edc0b86081075bc5ce21c5a32a4f1656a577b0f88858dbbf174
+size 14244

checkpoint-1200/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ca63f72cf59858dda6b2859e21cee9d57c26194ed3023a7e6e3eb27a883baab6
+size 1064

checkpoint-1200/special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-1200/tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a3b8844863b200dfcca971db228e96ce388290dfcf72c15d7a9d2f604bac787c
+size 844403

checkpoint-1200/tokenizer_config.json ADDED

	@@ -0,0 +1,54 @@

+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "add_prefix_space": true,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32000": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "chat_template": "{% set system_message = 'You are a helpful assistant. 你是一个乐于助人的助手。' %}{% if messages[0]['role'] == 'system' %}{% set system_message = messages[0]['content'] %}{% endif %}{% for message in messages %}{% set content = message['content'] %}{% if loop.index0 == 0 and system_message is defined %}{% set content = '<<SYS>>\\n' + system_message + '\\n<</SYS>>\\n\\n' + message['content'] %}{% endif %}{% if message['role'] == 'user' %}{{ '<s>' + '[INST] ' + content + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ content + '</s>' }}{% endif %}{% endfor %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</s>",
+  "legacy": true,
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "<pad>",
+  "padding_side": "right",
+  "sp_model_kwargs": {},
+  "spaces_between_special_tokens": false,
+  "split_special_tokens": false,
+  "tokenizer_class": "LlamaTokenizer",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": false,
+  "use_fast": false
+}
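
Note on `chat_template`: the Jinja template above produces a Llama-2-style prompt, with the system message wrapped in `<<SYS>>` markers and each user turn wrapped in `[INST] ... [/INST]`. The sketch below is a hand-written plain-Python mirror of that template, meant only to make the resulting string easier to see. The function name `render_chat` is hypothetical, and the code does not go through the tokenizer or any Transformers API.

```python
# Default system message, copied from the chat_template above.
DEFAULT_SYSTEM = "You are a helpful assistant. 你是一个乐于助人的助手。"


def render_chat(messages):
    """Mirror the chat_template logic for a list of {"role", "content"} dicts."""
    system_message = DEFAULT_SYSTEM
    if messages and messages[0]["role"] == "system":
        system_message = messages[0]["content"]

    parts = []
    for index, message in enumerate(messages):
        content = message["content"]
        if index == 0:
            # The template attaches the system block to the first message only.
            content = "<<SYS>>\n" + system_message + "\n<</SYS>>\n\n" + message["content"]
        if message["role"] == "user":
            parts.append("<s>[INST] " + content + " [/INST]")
        elif message["role"] == "assistant":
            parts.append(content + "</s>")
        # Any other role (including "system") emits nothing, as in the template.
    return "".join(parts)


print(render_chat([{"role": "user", "content": "Hi"}]))
```

One thing this mirror makes visible: as the template is written, the system block is attached to whichever message comes first. If that first message has the role `system`, it emits nothing, so a custom system prompt seems to be dropped from the rendered prompt. This is worth checking against the tokenizer's actual output before relying on it.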

checkpoint-1200/trainer_state.json ADDED

	@@ -0,0 +1,1701 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.3731488319664166,
+  "eval_steps": 500,
+  "global_step": 1200,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.0,
+      "grad_norm": 0.47774845361709595,
+      "learning_rate": 4.999970160815579e-05,
+      "loss": 2.0765,
+      "step": 5
+    },
+    {
+      "epoch": 0.0,
+      "grad_norm": 0.6051416397094727,
+      "learning_rate": 4.999880643974619e-05,
+      "loss": 2.2297,
+      "step": 10
+    },
+    {
+      "epoch": 0.0,
+      "grad_norm": 0.6161717772483826,
+      "learning_rate": 4.9997314516140056e-05,
+      "loss": 2.1103,
+      "step": 15
+    },
+    {
+      "epoch": 0.01,
+      "grad_norm": 0.4686434268951416,
+      "learning_rate": 4.999522587295162e-05,
+      "loss": 2.0057,
+      "step": 20
+    },
+    {
+      "epoch": 0.01,
+      "grad_norm": 0.8412289023399353,
+      "learning_rate": 4.999254056003963e-05,
+      "loss": 2.1778,
+      "step": 25
+    },
+    {
+      "epoch": 0.01,
+      "grad_norm": 0.5333625078201294,
+      "learning_rate": 4.99892586415061e-05,
+      "loss": 2.2399,
+      "step": 30
+    },
+    {
+      "epoch": 0.01,
+      "grad_norm": 0.821148157119751,
+      "learning_rate": 4.9985380195694856e-05,
+      "loss": 2.3215,
+      "step": 35
+    },
+    {
+      "epoch": 0.01,
+      "grad_norm": 0.8403909206390381,
+      "learning_rate": 4.998090531518962e-05,
+      "loss": 1.8295,
+      "step": 40
+    },
+    {
+      "epoch": 0.01,
+      "grad_norm": 0.6633398532867432,
+      "learning_rate": 4.9975834106811834e-05,
+      "loss": 2.0195,
+      "step": 45
+    },
+    {
+      "epoch": 0.02,
+      "grad_norm": 0.6386868357658386,
+      "learning_rate": 4.997016669161806e-05,
+      "loss": 2.1257,
+      "step": 50
+    },
+    {
+      "epoch": 0.02,
+      "grad_norm": 0.7762248516082764,
+      "learning_rate": 4.996390320489715e-05,
+      "loss": 2.057,
+      "step": 55
+    },
+    {
+      "epoch": 0.02,
+      "grad_norm": 1.3192856311798096,
+      "learning_rate": 4.9957043796166966e-05,
+      "loss": 2.0753,
+      "step": 60
+    },
+    {
+      "epoch": 0.02,
+      "grad_norm": 0.9797518849372864,
+      "learning_rate": 4.994958862917083e-05,
+      "loss": 1.9736,
+      "step": 65
+    },
+    {
+      "epoch": 0.02,
+      "grad_norm": 1.000693440437317,
+      "learning_rate": 4.994153788187363e-05,
+      "loss": 2.1572,
+      "step": 70
+    },
+    {
+      "epoch": 0.02,
+      "grad_norm": 0.6852813959121704,
+      "learning_rate": 4.993289174645757e-05,
+      "loss": 2.1491,
+      "step": 75
+    },
+    {
+      "epoch": 0.02,
+      "grad_norm": 1.0075691938400269,
+      "learning_rate": 4.992365042931752e-05,
+      "loss": 1.945,
+      "step": 80
+    },
+    {
+      "epoch": 0.03,
+      "grad_norm": 1.1973133087158203,
+      "learning_rate": 4.991381415105619e-05,
+      "loss": 2.0811,
+      "step": 85
+    },
+    {
+      "epoch": 0.03,
+      "grad_norm": 0.9927239418029785,
+      "learning_rate": 4.990338314647881e-05,
+      "loss": 1.961,
+      "step": 90
+    },
+    {
+      "epoch": 0.03,
+      "grad_norm": 0.9499759674072266,
+      "learning_rate": 4.98923576645875e-05,
+      "loss": 2.0653,
+      "step": 95
+    },
+    {
+      "epoch": 0.03,
+      "grad_norm": 0.7233040928840637,
+      "learning_rate": 4.9880737968575365e-05,
+      "loss": 1.9999,
+      "step": 100
+    },
+    {
+      "epoch": 0.03,
+      "grad_norm": 1.55235755443573,
+      "learning_rate": 4.986852433582022e-05,
+      "loss": 2.2258,
+      "step": 105
+    },
+    {
+      "epoch": 0.03,
+      "grad_norm": 0.9007890820503235,
+      "learning_rate": 4.985571705787793e-05,
+      "loss": 2.1034,
+      "step": 110
+    },
+    {
+      "epoch": 0.04,
+      "grad_norm": 0.6774860620498657,
+      "learning_rate": 4.9842316440475475e-05,
+      "loss": 2.1753,
+      "step": 115
+    },
+    {
+      "epoch": 0.04,
+      "grad_norm": 0.7676737308502197,
+      "learning_rate": 4.9828322803503665e-05,
+      "loss": 2.1384,
+      "step": 120
+    },
+    {
+      "epoch": 0.04,
+      "grad_norm": 0.9624544978141785,
+      "learning_rate": 4.981373648100946e-05,
+      "loss": 2.0521,
+      "step": 125
+    },
+    {
+      "epoch": 0.04,
+      "grad_norm": 0.9315722584724426,
+      "learning_rate": 4.979855782118802e-05,
+      "loss": 1.9256,
+      "step": 130
+    },
+    {
+      "epoch": 0.04,
+      "grad_norm": 0.9035864472389221,
+      "learning_rate": 4.978278718637443e-05,
+      "loss": 2.0882,
+      "step": 135
+    },
+    {
+      "epoch": 0.04,
+      "grad_norm": 0.7997236251831055,
+      "learning_rate": 4.9766424953035e-05,
+      "loss": 2.0724,
+      "step": 140
+    },
+    {
+      "epoch": 0.05,
+      "grad_norm": 1.0692921876907349,
+      "learning_rate": 4.974947151175826e-05,
+      "loss": 2.1329,
+      "step": 145
+    },
+    {
+      "epoch": 0.05,
+      "grad_norm": 0.9506180286407471,
+      "learning_rate": 4.973192726724572e-05,
+      "loss": 2.082,
+      "step": 150
+    },
+    {
+      "epoch": 0.05,
+      "grad_norm": 0.8647387027740479,
+      "learning_rate": 4.9713792638302145e-05,
+      "loss": 2.0366,
+      "step": 155
+    },
+    {
+      "epoch": 0.05,
+      "grad_norm": 1.105302095413208,
+      "learning_rate": 4.969506805782555e-05,
+      "loss": 2.1481,
+      "step": 160
+    },
+    {
+      "epoch": 0.05,
+      "grad_norm": 0.7593303918838501,
+      "learning_rate": 4.967575397279689e-05,
+      "loss": 2.032,
+      "step": 165
+    },
+    {
+      "epoch": 0.05,
+      "grad_norm": 0.7521979808807373,
+      "learning_rate": 4.965585084426943e-05,
+      "loss": 2.0379,
+      "step": 170
+    },
+    {
+      "epoch": 0.05,
+      "grad_norm": 0.947120726108551,
+      "learning_rate": 4.9635359147357655e-05,
+      "loss": 2.1444,
+      "step": 175
+    },
+    {
+      "epoch": 0.06,
+      "grad_norm": 1.2184454202651978,
+      "learning_rate": 4.961427937122598e-05,
+      "loss": 1.9164,
+      "step": 180
+    },
+    {
+      "epoch": 0.06,
+      "grad_norm": 1.221663475036621,
+      "learning_rate": 4.959261201907707e-05,
+      "loss": 2.0084,
+      "step": 185
+    },
+    {
+      "epoch": 0.06,
+      "grad_norm": 1.0457361936569214,
+      "learning_rate": 4.957035760813982e-05,
+      "loss": 2.2032,
+      "step": 190
+    },
+    {
+      "epoch": 0.06,
+      "grad_norm": 0.8834909200668335,
+      "learning_rate": 4.954751666965701e-05,
+      "loss": 2.2101,
+      "step": 195
+    },
+    {
+      "epoch": 0.06,
+      "grad_norm": 0.791902482509613,
+      "learning_rate": 4.9524089748872615e-05,
+      "loss": 2.0472,
+      "step": 200
+    },
+    {
+      "epoch": 0.06,
+      "grad_norm": 1.2905739545822144,
+      "learning_rate": 4.9500077405018807e-05,
+      "loss": 2.0987,
+      "step": 205
+    },
+    {
+      "epoch": 0.07,
+      "grad_norm": 0.8612006306648254,
+      "learning_rate": 4.9475480211302583e-05,
+      "loss": 2.1765,
+      "step": 210
+    },
+    {
+      "epoch": 0.07,
+      "grad_norm": 1.3128459453582764,
+      "learning_rate": 4.945029875489212e-05,
+      "loss": 1.9926,
+      "step": 215
+    },
+    {
+      "epoch": 0.07,
+      "grad_norm": 0.9610918164253235,
+      "learning_rate": 4.94245336369027e-05,
+      "loss": 2.0124,
+      "step": 220
+    },
+    {
+      "epoch": 0.07,
+      "grad_norm": 0.873160183429718,
+      "learning_rate": 4.939818547238241e-05,
+      "loss": 2.2229,
+      "step": 225
+    },
+    {
+      "epoch": 0.07,
+      "grad_norm": 1.5535285472869873,
+      "learning_rate": 4.9371254890297446e-05,
+      "loss": 2.2013,
+      "step": 230
+    },
+    {
+      "epoch": 0.07,
+      "grad_norm": 1.1951836347579956,
+      "learning_rate": 4.93437425335171e-05,
+      "loss": 2.014,
+      "step": 235
+    },
+    {
+      "epoch": 0.07,
+      "grad_norm": 0.7874170541763306,
+      "learning_rate": 4.9315649058798384e-05,
+      "loss": 2.1701,
+      "step": 240
+    },
+    {
+      "epoch": 0.08,
+      "grad_norm": 1.3503323793411255,
+      "learning_rate": 4.928697513677042e-05,
+      "loss": 2.1681,
+      "step": 245
+    },
+    {
+      "epoch": 0.08,
+      "grad_norm": 1.3091179132461548,
+      "learning_rate": 4.925772145191834e-05,
+      "loss": 2.1224,
+      "step": 250
+    },
+    {
+      "epoch": 0.08,
+      "grad_norm": 1.4428555965423584,
+      "learning_rate": 4.9227888702567044e-05,
+      "loss": 2.0512,
+      "step": 255
+    },
+    {
+      "epoch": 0.08,
+      "grad_norm": 0.8234395980834961,
+      "learning_rate": 4.9197477600864446e-05,
+      "loss": 2.1067,
+      "step": 260
+    },
+    {
+      "epoch": 0.08,
+      "grad_norm": 1.9094969034194946,
+      "learning_rate": 4.9166488872764526e-05,
+      "loss": 1.8884,
+      "step": 265
+    },
+    {
+      "epoch": 0.08,
+      "grad_norm": 1.0074087381362915,
+      "learning_rate": 4.913492325800999e-05,
+      "loss": 1.9345,
+      "step": 270
+    },
+    {
+      "epoch": 0.09,
+      "grad_norm": 1.0867297649383545,
+      "learning_rate": 4.910278151011458e-05,
+      "loss": 2.1928,
+      "step": 275
+    },
+    {
+      "epoch": 0.09,
+      "grad_norm": 0.6842357516288757,
+      "learning_rate": 4.907006439634516e-05,
+      "loss": 2.0407,
+      "step": 280
+    },
+    {
+      "epoch": 0.09,
+      "grad_norm": 0.8409023284912109,
+      "learning_rate": 4.903677269770329e-05,
+      "loss": 2.2344,
+      "step": 285
+    },
+    {
+      "epoch": 0.09,
+      "grad_norm": 0.8119503259658813,
+      "learning_rate": 4.900290720890671e-05,
+      "loss": 2.1296,
+      "step": 290
+    },
+    {
+      "epoch": 0.09,
+      "grad_norm": 0.9938147068023682,
+      "learning_rate": 4.8968468738370244e-05,
+      "loss": 2.152,
+      "step": 295
+    },
+    {
+      "epoch": 0.09,
+      "grad_norm": 0.9865244030952454,
+      "learning_rate": 4.8933458108186606e-05,
+      "loss": 1.9623,
+      "step": 300
+    },
+    {
+      "epoch": 0.09,
+      "grad_norm": 1.3944802284240723,
+      "learning_rate": 4.889787615410672e-05,
+      "loss": 1.915,
+      "step": 305
+    },
+    {
+      "epoch": 0.1,
+      "grad_norm": 1.3749767541885376,
+      "learning_rate": 4.886172372551977e-05,
+      "loss": 1.9934,
+      "step": 310
+    },
+    {
+      "epoch": 0.1,
+      "grad_norm": 0.9024938941001892,
+      "learning_rate": 4.882500168543294e-05,
+      "loss": 2.1541,
+      "step": 315
+    },
+    {
+      "epoch": 0.1,
+      "grad_norm": 1.1978263854980469,
+      "learning_rate": 4.878771091045082e-05,
+      "loss": 2.1688,
+      "step": 320
+    },
+    {
+      "epoch": 0.1,
+      "grad_norm": 0.8360010981559753,
+      "learning_rate": 4.874985229075446e-05,
+      "loss": 2.1387,
+      "step": 325
+    },
+    {
+      "epoch": 0.1,
+      "grad_norm": 0.7683364152908325,
+      "learning_rate": 4.871142673008012e-05,
+      "loss": 2.0215,
+      "step": 330
+    },
+    {
+      "epoch": 0.1,
+      "grad_norm": 1.4230670928955078,
+      "learning_rate": 4.867243514569772e-05,
+      "loss": 1.9491,
+      "step": 335
+    },
+    {
+      "epoch": 0.11,
+      "grad_norm": 0.8198773860931396,
+      "learning_rate": 4.863287846838891e-05,
+      "loss": 2.0151,
+      "step": 340
+    },
+    {
+      "epoch": 0.11,
+      "grad_norm": 1.467207908630371,
+      "learning_rate": 4.85927576424249e-05,
+      "loss": 1.8906,
+      "step": 345
+    },
+    {
+      "epoch": 0.11,
+      "grad_norm": 0.9537095427513123,
+      "learning_rate": 4.855207362554385e-05,
+      "loss": 2.1844,
+      "step": 350
+    },
+    {
+      "epoch": 0.11,
+      "grad_norm": 1.0757155418395996,
+      "learning_rate": 4.851082738892809e-05,
+      "loss": 2.048,
+      "step": 355
+    },
+    {
+      "epoch": 0.11,
+      "grad_norm": 1.6884938478469849,
+      "learning_rate": 4.8469019917180846e-05,
+      "loss": 1.9537,
+      "step": 360
+    },
+    {
+      "epoch": 0.11,
+      "grad_norm": 1.4680182933807373,
+      "learning_rate": 4.8426652208302814e-05,
+      "loss": 1.9731,
+      "step": 365
+    },
+    {
+      "epoch": 0.12,
+      "grad_norm": 1.1778632402420044,
+      "learning_rate": 4.83837252736683e-05,
+      "loss": 2.1395,
+      "step": 370
+    },
+    {
+      "epoch": 0.12,
+      "grad_norm": 1.2865056991577148,
+      "learning_rate": 4.834024013800108e-05,
+      "loss": 2.0016,
+      "step": 375
+    },
+    {
+      "epoch": 0.12,
+      "grad_norm": 1.055177092552185,
+      "learning_rate": 4.8296197839349944e-05,
+      "loss": 1.9632,
+      "step": 380
+    },
+    {
+      "epoch": 0.12,
+      "grad_norm": 1.0041871070861816,
+      "learning_rate": 4.825159942906389e-05,
+      "loss": 2.3302,
+      "step": 385
+    },
+    {
+      "epoch": 0.12,
+      "grad_norm": 1.0026438236236572,
+      "learning_rate": 4.820644597176709e-05,
+      "loss": 2.1517,
+      "step": 390
+    },
+    {
+      "epoch": 0.12,
+      "grad_norm": 1.3532180786132812,
+      "learning_rate": 4.81607385453334e-05,
+      "loss": 2.1229,
+      "step": 395
+    },
+    {
+      "epoch": 0.12,
+      "grad_norm": 0.7670988440513611,
+      "learning_rate": 4.81144782408607e-05,
+      "loss": 2.1382,
+      "step": 400
+    },
+    {
+      "epoch": 0.13,
+      "grad_norm": 1.0405700206756592,
+      "learning_rate": 4.8067666162644774e-05,
+      "loss": 1.9614,
+      "step": 405
+    },
+    {
+      "epoch": 0.13,
+      "grad_norm": 1.2252662181854248,
+      "learning_rate": 4.802030342815304e-05,
+      "loss": 2.1399,
+      "step": 410
+    },
+    {
+      "epoch": 0.13,
+      "grad_norm": 1.237946629524231,
+      "learning_rate": 4.7972391167997754e-05,
+      "loss": 1.9034,
+      "step": 415
+    },
+    {
+      "epoch": 0.13,
+      "grad_norm": 0.8064705729484558,
+      "learning_rate": 4.7923930525909156e-05,
+      "loss": 2.0075,
+      "step": 420
+    },
+    {
+      "epoch": 0.13,
+      "grad_norm": 0.8717565536499023,
+      "learning_rate": 4.7874922658708065e-05,
+      "loss": 2.0105,
+      "step": 425
+    },
+    {
+      "epoch": 0.13,
+      "grad_norm": 1.6693098545074463,
+      "learning_rate": 4.782536873627832e-05,
+      "loss": 2.0242,
+      "step": 430
+    },
+    {
+      "epoch": 0.14,
+      "grad_norm": 0.82447350025177,
+      "learning_rate": 4.777526994153882e-05,
+      "loss": 2.0267,
+      "step": 435
+    },
+    {
+      "epoch": 0.14,
+      "grad_norm": 0.9926588535308838,
+      "learning_rate": 4.7724627470415307e-05,
+      "loss": 1.9119,
+      "step": 440
+    },
+    {
+      "epoch": 0.14,
+      "grad_norm": 1.0924450159072876,
+      "learning_rate": 4.7673442531811796e-05,
+      "loss": 2.2653,
+      "step": 445
+    },
+    {
+      "epoch": 0.14,
+      "grad_norm": 1.1592103242874146,
+      "learning_rate": 4.762171634758177e-05,
+      "loss": 2.0017,
+      "step": 450
+    },
+    {
+      "epoch": 0.14,
+      "grad_norm": 0.9172110557556152,
+      "learning_rate": 4.7569450152498927e-05,
+      "loss": 2.1408,
+      "step": 455
+    },
+    {
+      "epoch": 0.14,
+      "grad_norm": 1.1897525787353516,
+      "learning_rate": 4.751664519422778e-05,
+      "loss": 2.0935,
+      "step": 460
+    },
+    {
+      "epoch": 0.14,
+      "grad_norm": 0.8793094158172607,
+      "learning_rate": 4.746330273329386e-05,
+      "loss": 2.1142,
+      "step": 465
+    },
+    {
+      "epoch": 0.15,
+      "grad_norm": 1.4337489604949951,
+      "learning_rate": 4.740942404305356e-05,
+      "loss": 2.1289,
+      "step": 470
+    },
+    {
+      "epoch": 0.15,
+      "grad_norm": 1.0251764059066772,
+      "learning_rate": 4.735501040966383e-05,
+      "loss": 1.9741,
+      "step": 475
+    },
+    {
+      "epoch": 0.15,
+      "grad_norm": 1.2659822702407837,
+      "learning_rate": 4.730006313205143e-05,
+      "loss": 2.088,
+      "step": 480
+    },
+    {
+      "epoch": 0.15,
+      "grad_norm": 0.8884140849113464,
+      "learning_rate": 4.724458352188192e-05,
+      "loss": 2.2079,
+      "step": 485
+    },
+    {
+      "epoch": 0.15,
+      "grad_norm": 1.1937768459320068,
+      "learning_rate": 4.718857290352835e-05,
+      "loss": 2.048,
+      "step": 490
+    },
+    {
+      "epoch": 0.15,
+      "grad_norm": 0.9741552472114563,
+      "learning_rate": 4.713203261403966e-05,
+      "loss": 2.2569,
+      "step": 495
+    },
+    {
+      "epoch": 0.16,
+      "grad_norm": 0.7996780872344971,
+      "learning_rate": 4.707496400310874e-05,
+      "loss": 1.9574,
+      "step": 500
+    },
+    {
+      "epoch": 0.16,
+      "grad_norm": 1.8182051181793213,
+      "learning_rate": 4.701736843304025e-05,
+      "loss": 2.0951,
+      "step": 505
+    },
+    {
+      "epoch": 0.16,
+      "grad_norm": 1.507320761680603,
+      "learning_rate": 4.695924727871805e-05,
+      "loss": 2.0253,
+      "step": 510
+    },
+    {
+      "epoch": 0.16,
+      "grad_norm": 0.759121835231781,
+      "learning_rate": 4.690060192757242e-05,
+      "loss": 2.0602,
+      "step": 515
+    },
+    {
+      "epoch": 0.16,
+      "grad_norm": 1.5943195819854736,
+      "learning_rate": 4.684143377954691e-05,
+      "loss": 2.0386,
+      "step": 520
+    },
+    {
+      "epoch": 0.16,
+      "grad_norm": 0.8568710088729858,
+      "learning_rate": 4.6781744247064955e-05,
+      "loss": 2.073,
+      "step": 525
+    },
+    {
+      "epoch": 0.16,
+      "grad_norm": 1.3352620601654053,
+      "learning_rate": 4.6721534754996125e-05,
+      "loss": 2.1443,
+      "step": 530
+    },
+    {
+      "epoch": 0.17,
+      "grad_norm": 1.3417474031448364,
+      "learning_rate": 4.666080674062213e-05,
+      "loss": 2.0288,
+      "step": 535
+    },
+    {
+      "epoch": 0.17,
+      "grad_norm": 1.5334464311599731,
+      "learning_rate": 4.659956165360251e-05,
+      "loss": 2.0609,
+      "step": 540
+    },
+    {
+      "epoch": 0.17,
+      "grad_norm": 0.9658721089363098,
+      "learning_rate": 4.6537800955940005e-05,
+      "loss": 1.9539,
+      "step": 545
+    },
+    {
+      "epoch": 0.17,
+      "grad_norm": 1.9197947978973389,
+      "learning_rate": 4.647552612194572e-05,
+      "loss": 2.149,
+      "step": 550
+    },
+    {
+      "epoch": 0.17,
+      "grad_norm": 0.8512137532234192,
+      "learning_rate": 4.641273863820383e-05,
+      "loss": 1.9722,
+      "step": 555
+    },
+    {
+      "epoch": 0.17,
+      "grad_norm": 1.827289342880249,
+      "learning_rate": 4.634944000353622e-05,
+      "loss": 2.0729,
+      "step": 560
+    },
+    {
+      "epoch": 0.18,
+      "grad_norm": 1.088416337966919,
+      "learning_rate": 4.628563172896655e-05,
+      "loss": 1.9507,
+      "step": 565
+    },
+    {
+      "epoch": 0.18,
+      "grad_norm": 1.3566908836364746,
+      "learning_rate": 4.6221315337684353e-05,
+      "loss": 2.1643,
+      "step": 570
+    },
+    {
+      "epoch": 0.18,
+      "grad_norm": 1.3541293144226074,
+      "learning_rate": 4.615649236500854e-05,
+      "loss": 2.1839,
+      "step": 575
+    },
+    {
+      "epoch": 0.18,
+      "grad_norm": 0.991269588470459,
+      "learning_rate": 4.609116435835083e-05,
+      "loss": 2.0976,
+      "step": 580
+    },
+    {
+      "epoch": 0.18,
+      "grad_norm": 1.0280535221099854,
+      "learning_rate": 4.602533287717877e-05,
+      "loss": 2.1474,
+      "step": 585
+    },
+    {
+      "epoch": 0.18,
+      "grad_norm": 1.013123631477356,
+      "learning_rate": 4.5958999492978524e-05,
+      "loss": 2.1873,
+      "step": 590
+    },
+    {
+      "epoch": 0.19,
+      "grad_norm": 1.1753040552139282,
+      "learning_rate": 4.589216578921737e-05,
+      "loss": 2.1744,
+      "step": 595
+    },
+    {
+      "epoch": 0.19,
+      "grad_norm": 1.1839090585708618,
+      "learning_rate": 4.582483336130586e-05,
+      "loss": 1.9982,
+      "step": 600
+    },
+    {
+      "epoch": 0.19,
+      "grad_norm": 1.0724798440933228,
+      "learning_rate": 4.575700381655979e-05,
+      "loss": 2.1234,
+      "step": 605
+    },
+    {
+      "epoch": 0.19,
+      "grad_norm": 2.009913682937622,
+      "learning_rate": 4.5688678774161796e-05,
+      "loss": 1.9478,
+      "step": 610
+    },
+    {
+      "epoch": 0.19,
+      "grad_norm": 0.9897060394287109,
+      "learning_rate": 4.561985986512271e-05,
+      "loss": 1.8268,
+      "step": 615
+    },
+    {
+      "epoch": 0.19,
+      "grad_norm": 0.8881808519363403,
+      "learning_rate": 4.555054873224263e-05,
+      "loss": 1.9887,
+      "step": 620
+    },
+    {
+      "epoch": 0.19,
+      "grad_norm": 1.155900001525879,
+      "learning_rate": 4.54807470300717e-05,
+      "loss": 2.0777,
+      "step": 625
+    },
+    {
+      "epoch": 0.2,
+      "grad_norm": 0.8782421350479126,
+      "learning_rate": 4.5410456424870596e-05,
+      "loss": 2.0566,
+      "step": 630
+    },
+    {
+      "epoch": 0.2,
+      "grad_norm": 1.3324674367904663,
+      "learning_rate": 4.5339678594570795e-05,
+      "loss": 2.047,
+      "step": 635
+    },
+    {
+      "epoch": 0.2,
+      "grad_norm": 1.9805939197540283,
+      "learning_rate": 4.526841522873449e-05,
+      "loss": 1.962,
+      "step": 640
+    },
+    {
+      "epoch": 0.2,
+      "grad_norm": 1.4999943971633911,
+      "learning_rate": 4.519666802851422e-05,
+      "loss": 2.0972,
+      "step": 645
+    },
+    {
+      "epoch": 0.2,
+      "grad_norm": 1.4504961967468262,
+      "learning_rate": 4.5124438706612376e-05,
+      "loss": 2.0041,
+      "step": 650
+    },
+    {
+      "epoch": 0.2,
+      "grad_norm": 0.9078169465065002,
+      "learning_rate": 4.505172898724018e-05,
+      "loss": 2.1229,
+      "step": 655
+    },
+    {
+      "epoch": 0.21,
+      "grad_norm": 1.1635804176330566,
+      "learning_rate": 4.497854060607662e-05,
+      "loss": 2.0195,
+      "step": 660
+    },
+    {
+      "epoch": 0.21,
+      "grad_norm": 1.46576726436615,
+      "learning_rate": 4.490487531022699e-05,
+      "loss": 2.0745,
+      "step": 665
+    },
+    {
+      "epoch": 0.21,
+      "grad_norm": 1.2094652652740479,
+      "learning_rate": 4.4830734858181145e-05,
+      "loss": 2.1068,
+      "step": 670
+    },
+    {
+      "epoch": 0.21,
+      "grad_norm": 1.4738895893096924,
+      "learning_rate": 4.47561210197716e-05,
+      "loss": 1.8088,
+      "step": 675
+    },
+    {
+      "epoch": 0.21,
+      "grad_norm": 1.23384690284729,
+      "learning_rate": 4.4681035576131215e-05,
+      "loss": 2.0995,
+      "step": 680
+    },
+    {
+      "epoch": 0.21,
+      "grad_norm": 0.8332946300506592,
+      "learning_rate": 4.46054803196507e-05,
+      "loss": 2.0541,
+      "step": 685
+    },
+    {
+      "epoch": 0.21,
+      "grad_norm": 0.9207485318183899,
+      "learning_rate": 4.452945705393586e-05,
+      "loss": 2.166,
+      "step": 690
+    },
+    {
+      "epoch": 0.22,
+      "grad_norm": 1.292945146560669,
+      "learning_rate": 4.445296759376449e-05,
+      "loss": 2.0784,
+      "step": 695
+    },
+    {
+      "epoch": 0.22,
+      "grad_norm": 0.9874763488769531,
+      "learning_rate": 4.437601376504307e-05,
+      "loss": 2.2087,
+      "step": 700
+    },
+    {
+      "epoch": 0.22,
+      "grad_norm": 0.9427415132522583,
+      "learning_rate": 4.4298597404763186e-05,
+      "loss": 2.1199,
+      "step": 705
+    },
+    {
+      "epoch": 0.22,
+      "grad_norm": 1.7369529008865356,
+      "learning_rate": 4.422072036095768e-05,
+      "loss": 2.0355,
+      "step": 710
+    },
+    {
+      "epoch": 0.22,
+      "grad_norm": 1.2423696517944336,
+      "learning_rate": 4.414238449265654e-05,
+      "loss": 2.0011,
+      "step": 715
+    },
+    {
+      "epoch": 0.22,
+      "grad_norm": 1.2304831743240356,
+      "learning_rate": 4.406359166984249e-05,
+      "loss": 2.0368,
+      "step": 720
+    },
+    {
+      "epoch": 0.23,
+      "grad_norm": 0.9090413451194763,
+      "learning_rate": 4.39843437734064e-05,
+      "loss": 1.9983,
+      "step": 725
+    },
+    {
+      "epoch": 0.23,
+      "grad_norm": 1.2729507684707642,
+      "learning_rate": 4.390464269510233e-05,
+      "loss": 2.021,
+      "step": 730
+    },
+    {
+      "epoch": 0.23,
+      "grad_norm": 1.3009227514266968,
+      "learning_rate": 4.382449033750244e-05,
+      "loss": 1.9743,
+      "step": 735
+    },
+    {
+      "epoch": 0.23,
+      "grad_norm": 1.5456056594848633,
+      "learning_rate": 4.37438886139515e-05,
+      "loss": 2.0689,
+      "step": 740
+    },
+    {
+      "epoch": 0.23,
+      "grad_norm": 1.3235007524490356,
+      "learning_rate": 4.3662839448521264e-05,
+      "loss": 2.0838,
+      "step": 745
+    },
+    {
+      "epoch": 0.23,
+      "grad_norm": 2.2074007987976074,
+      "learning_rate": 4.358134477596454e-05,
+      "loss": 2.0835,
+      "step": 750
+    },
+    {
+      "epoch": 0.23,
+      "grad_norm": 1.403738021850586,
+      "learning_rate": 4.3499406541668966e-05,
+      "loss": 2.0916,
+      "step": 755
+    },
+    {
+      "epoch": 0.24,
+      "grad_norm": 1.0940325260162354,
+      "learning_rate": 4.3417026701610616e-05,
+      "loss": 1.972,
+      "step": 760
+    },
+    {
+      "epoch": 0.24,
+      "grad_norm": 1.666353702545166,
+      "learning_rate": 4.3334207222307275e-05,
+      "loss": 1.927,
+      "step": 765
+    },
+    {
+      "epoch": 0.24,
+      "grad_norm": 1.0777515172958374,
+      "learning_rate": 4.325095008077154e-05,
+      "loss": 2.1192,
+      "step": 770
+    },
+    {
+      "epoch": 0.24,
+      "grad_norm": 1.7218186855316162,
+      "learning_rate": 4.316725726446353e-05,
+      "loss": 2.0774,
+      "step": 775
+    },
+    {
+      "epoch": 0.24,
+      "grad_norm": 1.356753945350647,
+      "learning_rate": 4.3083130771243586e-05,
+      "loss": 2.0847,
+      "step": 780
+    },
+    {
+      "epoch": 0.24,
+      "grad_norm": 0.9967429637908936,
+      "learning_rate": 4.299857260932445e-05,
+      "loss": 2.0485,
+      "step": 785
+    },
+    {
+      "epoch": 0.25,
+      "grad_norm": 1.6216442584991455,
+      "learning_rate": 4.2913584797223397e-05,
+      "loss": 2.1008,
+      "step": 790
+    },
+    {
+      "epoch": 0.25,
+      "grad_norm": 1.2556742429733276,
+      "learning_rate": 4.2828169363714016e-05,
+      "loss": 1.9209,
+      "step": 795
+    },
+    {
+      "epoch": 0.25,
+      "grad_norm": 1.1800439357757568,
+      "learning_rate": 4.274232834777782e-05,
+      "loss": 1.9722,
+      "step": 800
+    },
+    {
+      "epoch": 0.25,
+      "grad_norm": 1.1313499212265015,
+      "learning_rate": 4.2656063798555515e-05,
+      "loss": 1.9176,
+      "step": 805
+    },
+    {
+      "epoch": 0.25,
+      "grad_norm": 1.137534737586975,
+      "learning_rate": 4.256937777529815e-05,
+      "loss": 1.9929,
+      "step": 810
+    },
+    {
+      "epoch": 0.25,
+      "grad_norm": 1.0575093030929565,
+      "learning_rate": 4.2482272347317906e-05,
+      "loss": 2.166,
+      "step": 815
+    },
+    {
+      "epoch": 0.25,
+      "grad_norm": 1.5939594507217407,
+      "learning_rate": 4.2394749593938733e-05,
+      "loss": 2.1334,
+      "step": 820
+    },
+    {
+      "epoch": 0.26,
+      "grad_norm": 1.1045507192611694,
+      "learning_rate": 4.230681160444669e-05,
+      "loss": 2.0853,
+      "step": 825
+    },
+    {
+      "epoch": 0.26,
+      "grad_norm": 1.3480136394500732,
+      "learning_rate": 4.221846047804009e-05,
+      "loss": 2.1802,
+      "step": 830
+    },
+    {
+      "epoch": 0.26,
+      "grad_norm": 1.1822657585144043,
+      "learning_rate": 4.2129698323779366e-05,
+      "loss": 2.0739,
+      "step": 835
+    },
+    {
+      "epoch": 0.26,
+      "grad_norm": 1.1771117448806763,
+      "learning_rate": 4.204052726053676e-05,
+      "loss": 2.0238,
+      "step": 840
+    },
+    {
+      "epoch": 0.26,
+      "grad_norm": 1.4757814407348633,
+      "learning_rate": 4.195094941694571e-05,
+      "loss": 2.1557,
+      "step": 845
+    },
+    {
+      "epoch": 0.26,
+      "grad_norm": 0.9095075726509094,
+      "learning_rate": 4.1860966931350054e-05,
+      "loss": 2.1666,
+      "step": 850
+    },
+    {
+      "epoch": 0.27,
+      "grad_norm": 1.1039543151855469,
+      "learning_rate": 4.1770581951752976e-05,
+      "loss": 2.105,
+      "step": 855
+    },
+    {
+      "epoch": 0.27,
+      "grad_norm": 0.8517205119132996,
+      "learning_rate": 4.1679796635765735e-05,
+      "loss": 1.9656,
+      "step": 860
+    },
+    {
+      "epoch": 0.27,
+      "grad_norm": 1.239492654800415,
+      "learning_rate": 4.158861315055617e-05,
+      "loss": 2.0166,
+      "step": 865
+    },
+    {
+      "epoch": 0.27,
+      "grad_norm": 1.1358321905136108,
+      "learning_rate": 4.1497033672796924e-05,
+      "loss": 2.0076,
+      "step": 870
+    },
+    {
+      "epoch": 0.27,
+      "grad_norm": 1.6215249300003052,
+      "learning_rate": 4.140506038861356e-05,
+      "loss": 2.1594,
+      "step": 875
+    },
+    {
+      "epoch": 0.27,
+      "grad_norm": 1.0528080463409424,
+      "learning_rate": 4.131269549353229e-05,
+      "loss": 2.1416,
+      "step": 880
+    },
+    {
+      "epoch": 0.28,
+      "grad_norm": 0.8976901769638062,
+      "learning_rate": 4.1219941192427644e-05,
+      "loss": 2.1242,
+      "step": 885
+    },
+    {
+      "epoch": 0.28,
+      "grad_norm": 1.263594388961792,
+      "learning_rate": 4.112679969946977e-05,
+      "loss": 2.02,
+      "step": 890
+    },
+    {
+      "epoch": 0.28,
+      "grad_norm": 1.4173017740249634,
+      "learning_rate": 4.103327323807162e-05,
+      "loss": 2.0438,
+      "step": 895
+    },
+    {
+      "epoch": 0.28,
+      "grad_norm": 1.876170039176941,
+      "learning_rate": 4.093936404083585e-05,
+      "loss": 1.9806,
+      "step": 900
+    },
+    {
+      "epoch": 0.28,
+      "grad_norm": 1.4649231433868408,
+      "learning_rate": 4.0845074349501544e-05,
+      "loss": 2.1476,
+      "step": 905
+    },
+    {
+      "epoch": 0.28,
+      "grad_norm": 1.0446043014526367,
+      "learning_rate": 4.0750406414890695e-05,
+      "loss": 1.9672,
+      "step": 910
+    },
+    {
+      "epoch": 0.28,
+      "grad_norm": 1.0225305557250977,
+      "learning_rate": 4.065536249685448e-05,
+      "loss": 1.9984,
+      "step": 915
+    },
+    {
+      "epoch": 0.29,
+      "grad_norm": 1.0120617151260376,
+      "learning_rate": 4.055994486421929e-05,
+      "loss": 2.1162,
+      "step": 920
+    },
+    {
+      "epoch": 0.29,
+      "grad_norm": 1.0469881296157837,
+      "learning_rate": 4.04641557947326e-05,
+      "loss": 2.0435,
+      "step": 925
+    },
+    {
+      "epoch": 0.29,
+      "grad_norm": 1.2435941696166992,
+      "learning_rate": 4.036799757500856e-05,
+      "loss": 2.0431,
+      "step": 930
+    },
+    {
+      "epoch": 0.29,
+      "grad_norm": 1.0055103302001953,
+      "learning_rate": 4.027147250047348e-05,
+      "loss": 2.2021,
+      "step": 935
+    },
+    {
+      "epoch": 0.29,
+      "grad_norm": 1.1212949752807617,
+      "learning_rate": 4.017458287531094e-05,
+      "loss": 1.997,
+      "step": 940
+    },
+    {
+      "epoch": 0.29,
+      "grad_norm": 1.1048357486724854,
+      "learning_rate": 4.007733101240685e-05,
+      "loss": 1.946,
+      "step": 945
+    },
+    {
+      "epoch": 0.3,
+      "grad_norm": 1.4721689224243164,
+      "learning_rate": 3.997971923329426e-05,
+      "loss": 2.0723,
+      "step": 950
+    },
+    {
+      "epoch": 0.3,
+      "grad_norm": 1.3793156147003174,
+      "learning_rate": 3.988174986809783e-05,
+      "loss": 2.034,
+      "step": 955
+    },
+    {
+      "epoch": 0.3,
+      "grad_norm": 0.9013482928276062,
+      "learning_rate": 3.9783425255478355e-05,
+      "loss": 1.9736,
+      "step": 960
+    },
+    {
+      "epoch": 0.3,
+      "grad_norm": 0.9192422032356262,
+      "learning_rate": 3.968474774257682e-05,
+      "loss": 1.9878,
+      "step": 965
+    },
+    {
+      "epoch": 0.3,
+      "grad_norm": 1.9304206371307373,
+      "learning_rate": 3.9585719684958446e-05,
+      "loss": 2.117,
+      "step": 970
+    },
+    {
+      "epoch": 0.3,
+      "grad_norm": 1.0435137748718262,
+      "learning_rate": 3.948634344655639e-05,
+      "loss": 2.0585,
+      "step": 975
+    },
+    {
+      "epoch": 0.3,
+      "grad_norm": 1.4636590480804443,
+      "learning_rate": 3.938662139961538e-05,
+      "loss": 2.0409,
+      "step": 980
+    },
+    {
+      "epoch": 0.31,
+      "grad_norm": 1.8014529943466187,
+      "learning_rate": 3.928655592463508e-05,
+      "loss": 2.0369,
+      "step": 985
+    },
+    {
+      "epoch": 0.31,
+      "grad_norm": 1.2412620782852173,
+      "learning_rate": 3.918614941031319e-05,
+      "loss": 1.967,
+      "step": 990
+    },
+    {
+      "epoch": 0.31,
+      "grad_norm": 1.3581103086471558,
+      "learning_rate": 3.908540425348852e-05,
+      "loss": 2.0037,
+      "step": 995
+    },
+    {
+      "epoch": 0.31,
+      "grad_norm": 1.2377780675888062,
+      "learning_rate": 3.8984322859083725e-05,
+      "loss": 1.9991,
+      "step": 1000
+    },
+    {
+      "epoch": 0.31,
+      "grad_norm": 0.9209259748458862,
+      "learning_rate": 3.8882907640047896e-05,
+      "loss": 2.0448,
+      "step": 1005
+    },
+    {
+      "epoch": 0.31,
+      "grad_norm": 1.0150959491729736,
+      "learning_rate": 3.878116101729897e-05,
+      "loss": 2.0791,
+      "step": 1010
+    },
+    {
+      "epoch": 0.32,
+      "grad_norm": 1.5959141254425049,
+      "learning_rate": 3.867908541966594e-05,
+      "loss": 1.9997,
+      "step": 1015
+    },
+    {
+      "epoch": 0.32,
+      "grad_norm": 1.3945012092590332,
+      "learning_rate": 3.857668328383088e-05,
+      "loss": 2.0481,
+      "step": 1020
+    },
+    {
+      "epoch": 0.32,
+      "grad_norm": 1.2361671924591064,
+      "learning_rate": 3.847395705427075e-05,
+      "loss": 2.2664,
+      "step": 1025
+    },
+    {
+      "epoch": 0.32,
+      "grad_norm": 1.9661719799041748,
+      "learning_rate": 3.837090918319909e-05,
+      "loss": 1.9752,
+      "step": 1030
+    },
+    {
+      "epoch": 0.32,
+      "grad_norm": 1.6995949745178223,
+      "learning_rate": 3.8267542130507436e-05,
+      "loss": 2.1332,
+      "step": 1035
+    },
+    {
+      "epoch": 0.32,
+      "grad_norm": 1.1248412132263184,
+      "learning_rate": 3.816385836370663e-05,
+      "loss": 2.0432,
+      "step": 1040
+    },
+    {
+      "epoch": 0.32,
+      "grad_norm": 0.8734235763549805,
+      "learning_rate": 3.805986035786789e-05,
+      "loss": 1.9618,
+      "step": 1045
+    },
+    {
+      "epoch": 0.33,
+      "grad_norm": 1.322766661643982,
+      "learning_rate": 3.795555059556378e-05,
+      "loss": 2.0267,
+      "step": 1050
+    },
+    {
+      "epoch": 0.33,
+      "grad_norm": 1.0396028757095337,
+      "learning_rate": 3.7850931566808866e-05,
+      "loss": 2.1075,
+      "step": 1055
+    },
+    {
+      "epoch": 0.33,
+      "grad_norm": 0.9574625492095947,
+      "learning_rate": 3.7746005769000363e-05,
+      "loss": 2.156,
+      "step": 1060
+    },
+    {
+      "epoch": 0.33,
+      "grad_norm": 1.4480133056640625,
+      "learning_rate": 3.764077570685844e-05,
+      "loss": 1.9615,
+      "step": 1065
+    },
+    {
+      "epoch": 0.33,
+      "grad_norm": 1.5908560752868652,
+      "learning_rate": 3.753524389236648e-05,
+      "loss": 2.0928,
+      "step": 1070
+    },
+    {
+      "epoch": 0.33,
+      "grad_norm": 1.2628813982009888,
+      "learning_rate": 3.742941284471111e-05,
+      "loss": 2.1074,
+      "step": 1075
+    },
+    {
+      "epoch": 0.34,
+      "grad_norm": 1.2687503099441528,
+      "learning_rate": 3.7323285090222054e-05,
+      "loss": 1.9666,
+      "step": 1080
+    },
+    {
+      "epoch": 0.34,
+      "grad_norm": 1.2571731805801392,
+      "learning_rate": 3.721686316231181e-05,
+      "loss": 2.0468,
+      "step": 1085
+    },
+    {
+      "epoch": 0.34,
+      "grad_norm": 1.007453441619873,
+      "learning_rate": 3.7110149601415215e-05,
+      "loss": 2.0624,
+      "step": 1090
+    },
+    {
+      "epoch": 0.34,
+      "grad_norm": 1.2390377521514893,
+      "learning_rate": 3.700314695492876e-05,
+      "loss": 1.9888,
+      "step": 1095
+    },
+    {
+      "epoch": 0.34,
+      "grad_norm": 1.0878371000289917,
+      "learning_rate": 3.6895857777149825e-05,
+      "loss": 2.1013,
+      "step": 1100
+    },
+    {
+      "epoch": 0.34,
+      "grad_norm": 0.8759217262268066,
+      "learning_rate": 3.6788284629215624e-05,
+      "loss": 1.875,
+      "step": 1105
+    },
+    {
+      "epoch": 0.35,
+      "grad_norm": 1.1345970630645752,
+      "learning_rate": 3.668043007904219e-05,
+      "loss": 1.9096,
+      "step": 1110
+    },
+    {
+      "epoch": 0.35,
+      "grad_norm": 1.253629446029663,
+      "learning_rate": 3.6572296701262966e-05,
+      "loss": 2.1859,
+      "step": 1115
+    },
+    {
+      "epoch": 0.35,
+      "grad_norm": 0.9796190857887268,
+      "learning_rate": 3.646388707716738e-05,
+      "loss": 2.2092,
+      "step": 1120
+    },
+    {
+      "epoch": 0.35,
+      "grad_norm": 1.3893767595291138,
+      "learning_rate": 3.635520379463926e-05,
+      "loss": 2.0026,
+      "step": 1125
+    },
+    {
+      "epoch": 0.35,
+      "grad_norm": 0.8778309226036072,
+      "learning_rate": 3.6246249448095004e-05,
+      "loss": 2.2112,
+      "step": 1130
+    },
+    {
+      "epoch": 0.35,
+      "grad_norm": 1.2479698657989502,
+      "learning_rate": 3.6137026638421696e-05,
+      "loss": 2.0221,
+      "step": 1135
+    },
+    {
+      "epoch": 0.35,
+      "grad_norm": 1.3813824653625488,
+      "learning_rate": 3.6027537972914974e-05,
+      "loss": 1.9106,
+      "step": 1140
+    },
+    {
+      "epoch": 0.36,
+      "grad_norm": 1.2043218612670898,
+      "learning_rate": 3.5917786065216826e-05,
+      "loss": 2.0673,
+      "step": 1145
+    },
+    {
+      "epoch": 0.36,
+      "grad_norm": 1.5337340831756592,
+      "learning_rate": 3.580777353525318e-05,
+      "loss": 2.1463,
+      "step": 1150
+    },
+    {
+      "epoch": 0.36,
+      "grad_norm": 1.155813455581665,
+      "learning_rate": 3.5697503009171385e-05,
+      "loss": 2.0255,
+      "step": 1155
+    },
+    {
+      "epoch": 0.36,
+      "grad_norm": 1.034644365310669,
+      "learning_rate": 3.558697711927748e-05,
+      "loss": 2.1348,
+      "step": 1160
+    },
+    {
+      "epoch": 0.36,
+      "grad_norm": 1.0959795713424683,
+      "learning_rate": 3.54761985039734e-05,
+      "loss": 2.1457,
+      "step": 1165
+    },
+    {
+      "epoch": 0.36,
+      "grad_norm": 1.1938838958740234,
+      "learning_rate": 3.5365169807693966e-05,
+      "loss": 2.1256,
+      "step": 1170
+    },
+    {
+      "epoch": 0.37,
+      "grad_norm": 0.8162047863006592,
+      "learning_rate": 3.525389368084379e-05,
+      "loss": 1.9587,
+      "step": 1175
+    },
+    {
+      "epoch": 0.37,
+      "grad_norm": 0.9358930587768555,
+      "learning_rate": 3.514237277973393e-05,
+      "loss": 1.8965,
+      "step": 1180
+    },
+    {
+      "epoch": 0.37,
+      "grad_norm": 0.9210988879203796,
+      "learning_rate": 3.503060976651862e-05,
+      "loss": 1.9669,
+      "step": 1185
+    },
+    {
+      "epoch": 0.37,
+      "grad_norm": 1.4641343355178833,
+      "learning_rate": 3.491860730913156e-05,
+      "loss": 2.003,
+      "step": 1190
+    },
+    {
+      "epoch": 0.37,
+      "grad_norm": 1.2458257675170898,
+      "learning_rate": 3.480636808122235e-05,
+      "loss": 2.1487,
+      "step": 1195
+    },
+    {
+      "epoch": 0.37,
+      "grad_norm": 1.6770122051239014,
+      "learning_rate": 3.469389476209259e-05,
+      "loss": 2.0686,
+      "step": 1200
+    }
+  ],
+  "logging_steps": 5,
+  "max_steps": 3215,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 100,
+  "total_flos": 1.613880123457536e+16,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}
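
The `learning_rate` values logged above appear consistent with a cosine decay over `max_steps` = 3215 with no warmup. The sketch below checks that reading against two logged entries. The peak rate of 5e-5 and the cosine shape are assumptions inferred from the numbers, not settings stated in this file, and `cosine_lr` is a hypothetical helper name.

```python
import math

PEAK_LR = 5e-5    # assumption: inferred from the logged values, not stated in the log
MAX_STEPS = 3215  # "max_steps" from the trainer state above


def cosine_lr(step, peak_lr=PEAK_LR, max_steps=MAX_STEPS):
    """Cosine decay from peak_lr to 0 over max_steps, assuming no warmup."""
    progress = step / max_steps
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# (step, learning_rate) pairs copied from the log entries above
logged = {
    250: 4.925772145191834e-05,
    1200: 3.469389476209259e-05,
}

for step, lr in logged.items():
    predicted = cosine_lr(step)
    # Print the logged value, the predicted value, and their absolute difference
    print(step, lr, predicted, abs(lr - predicted))
```

If the differences are negligible, the assumed schedule is a reasonable description of this run. The actual scheduler type would be recorded in `training_args.bin`, which is not readable from this diff.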

checkpoint-1200/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:79e80b13ff00898b4493d440a3c1a1eb234c0ae541cbca8a8b1befef97a354c9
+size 5112

checkpoint-1300/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+library_name: peft
+base_model: hfl/chinese-alpaca-2-1.3b
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.9.0

checkpoint-1300/adapter_config.json ADDED

	@@ -0,0 +1,28 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "hfl/chinese-alpaca-2-1.3b",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "q_proj",
+    "v_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
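
To make the LoRA settings in this config easier to interpret, here is a rough sketch that derives two quantities from it: the scaling factor applied to the low-rank update, and the number of adapter parameters added per transformer layer. It uses only the values visible in the diff (`r`, `lora_alpha`, `target_modules`, `use_rslora`). The 4096 projection width in the usage line is a hypothetical value chosen for illustration and is not taken from the base model's configuration; the function names are likewise illustrative.

```python
import json

# Subset of the adapter_config.json fields shown in the diff.
ADAPTER_CONFIG = json.loads(
    """
    {
      "lora_alpha": 16,
      "lora_dropout": 0.1,
      "r": 8,
      "target_modules": ["q_proj", "v_proj"],
      "use_rslora": false
    }
    """
)


def lora_scaling(cfg: dict) -> float:
    """Scaling applied to the low-rank update B @ A.

    Standard LoRA uses lora_alpha / r; rank-stabilized LoRA
    (use_rslora=true) uses lora_alpha / sqrt(r) instead.
    """
    if cfg.get("use_rslora"):
        return cfg["lora_alpha"] / (cfg["r"] ** 0.5)
    return cfg["lora_alpha"] / cfg["r"]


def lora_params_per_layer(cfg: dict, in_features: int, out_features: int) -> int:
    """Adapter parameters added to one layer.

    Each targeted linear module gets A (r x in_features) and
    B (out_features x r); all targets are assumed to share one shape.
    """
    per_module = cfg["r"] * (in_features + out_features)
    return per_module * len(cfg["target_modules"])


scaling = lora_scaling(ADAPTER_CONFIG)
# 4096 is a hypothetical projection width used only for illustration.
params = lora_params_per_layer(ADAPTER_CONFIG, 4096, 4096)
print(scaling, params)
```

With `r = 8` and `lora_alpha = 16`, the update is scaled by 2, and only the query and value projections carry adapter weights, which keeps the adapter file small relative to the base model.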