ukk0708 committed on
Commit 3d54587
1 parent: 2e7b74c

Upload 12 files

Yi-1.5-9B-Chat-LoRA/README.md ADDED
@@ -0,0 +1,202 @@
+ ---
+ library_name: peft
+ base_model: ../../model/Yi-1.5-9B-Chat
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.10.0
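The card's Evaluation and Results sections are still placeholders, while the `trainer_state.json` added in this same commit logs an `eval_loss` every 200 steps (`eval_steps`). A minimal Python sketch of pulling that curve out, assuming the standard Hugging Face `Trainer` state layout seen in this commit; the embedded excerpt copies only the first five evaluations (plus one training-loss entry) and is not the full log. The helper names `eval_curve` and `best_eval` are illustrative:

```python
import json

# Excerpt of trainer_state.json from this commit: first five evaluations and
# one training-loss entry. The full file continues to global_step 2800.
TRAINER_STATE = """
{
  "eval_steps": 200,
  "log_history": [
    {"epoch": 0.3, "eval_loss": 2.391483783721924, "step": 200},
    {"epoch": 0.59, "eval_loss": 2.3469619750976562, "step": 400},
    {"epoch": 0.89, "eval_loss": 2.327061414718628, "step": 600},
    {"epoch": 1.19, "eval_loss": 2.317145824432373, "step": 800},
    {"epoch": 1.48, "eval_loss": 2.308347463607788, "step": 1000},
    {"epoch": 1.5, "loss": 2.5501, "learning_rate": 4.004277880120113e-05, "step": 1010}
  ]
}
"""


def eval_curve(state: dict) -> list:
    """Return (step, eval_loss) pairs, skipping training-loss log entries."""
    return [(e["step"], e["eval_loss"]) for e in state["log_history"] if "eval_loss" in e]


def best_eval(state: dict) -> tuple:
    """Pick the evaluation with the lowest loss (lower is better for eval_loss)."""
    return min(eval_curve(state), key=lambda pair: pair[1])


state = json.loads(TRAINER_STATE)
for step, loss in eval_curve(state):
    print(f"step {step:>5}: eval_loss {loss:.4f}")
print("best in excerpt:", best_eval(state))
```

Run against the full file, the minimum can be cross-checked against the top-level `best_metric` (2.2857470512390137) and `best_model_checkpoint` (`checkpoint-2800`) fields, assuming `eval_loss` was the metric used to select the best checkpoint.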
Yi-1.5-9B-Chat-LoRA/adapter_config.json ADDED
@@ -0,0 +1,29 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "../../model/Yi-1.5-9B-Chat",
+ "bias": "none",
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 16,
+ "lora_dropout": 0.0,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "r": 8,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "v_proj",
+ "q_proj"
+ ],
+ "task_type": "CAUSAL_LM",
+ "use_dora": false,
+ "use_rslora": false
+ }
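With `use_rslora` set to false, LoRA scales each low-rank update `B @ A` by `lora_alpha / r`, which for this config is 16 / 8 = 2. The sketch below computes that factor and estimates how many trainable parameters the adapter adds. The layer count and projection shapes are assumptions about Yi-1.5-9B (48 decoder layers, hidden size 4096, grouped-query attention with 4 key/value heads of size 128); they are not recorded anywhere in this commit:

```python
import math

# Values copied from adapter_config.json in this commit.
ADAPTER_CONFIG = {
    "r": 8,
    "lora_alpha": 16,
    "use_rslora": False,
    "target_modules": ["v_proj", "q_proj"],
}

# ASSUMED Yi-1.5-9B shapes (not part of this commit): 48 decoder layers,
# hidden size 4096, grouped-query attention with 4 KV heads of size 128.
NUM_LAYERS = 48
HIDDEN_SIZE = 4096
NUM_KV_HEADS = 4
HEAD_DIM = 128

# (in_features, out_features) of each attention projection LoRA may wrap.
PROJ_SHAPES = {
    "q_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
    "v_proj": (HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM),
}


def lora_scaling(cfg: dict) -> float:
    """Factor applied to B @ A: alpha / r, or alpha / sqrt(r) with rsLoRA."""
    if cfg["use_rslora"]:
        return cfg["lora_alpha"] / math.sqrt(cfg["r"])
    return cfg["lora_alpha"] / cfg["r"]


def lora_param_count(cfg: dict) -> int:
    """Each wrapped Linear(in, out) gains A (r x in) and B (out x r); bias is "none"."""
    r = cfg["r"]
    per_layer = sum(r * sum(PROJ_SHAPES[name]) for name in cfg["target_modules"])
    return NUM_LAYERS * per_layer


params = lora_param_count(ADAPTER_CONFIG)
print("scaling:", lora_scaling(ADAPTER_CONFIG))
print("trainable params:", params)
print("fp32 bytes:", params * 4)
```

Under those assumed shapes the adapter has 4,915,200 parameters, or 19,660,800 bytes in fp32, just under the size recorded for `adapter_model.safetensors` in this commit (19686264 bytes). The small remainder would be consistent with the safetensors header, but that is an inference rather than something the commit states.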
Yi-1.5-9B-Chat-LoRA/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bb59628946cf46f6da675e0a3c8cf755403db49a54e85e3af20ac31bc729241f
+ size 19686264
Yi-1.5-9B-Chat-LoRA/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4d17f5c1baca039db59602a5ca935d32ffa8a4a88bd76291f4635d8fda23c713
+ size 39482810
Yi-1.5-9B-Chat-LoRA/rng_state_0.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:55f6ef5fd81f87dc8a69f1b9a751ba37cc49c37318322e45ba4733ff23a92208
+ size 14512
Yi-1.5-9B-Chat-LoRA/rng_state_1.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f510db96e40d7f66609c96cf485c13417fc4eaf253603d2b6591466c3fb5f63a
+ size 14512
Yi-1.5-9B-Chat-LoRA/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e147b5d4cbad0fb2ad49d465222adf18aab8e828f7d056a3874291f92779ae3d
+ size 1064
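The `.safetensors`, `.pt` and `.pth` entries in this commit are Git LFS pointer files rather than the binaries themselves: each records the spec version, a SHA-256 `oid`, and the object size in bytes. A minimal sketch, assuming the three-line pointer layout shown here, of parsing a pointer and checking a downloaded blob against it; `parse_lfs_pointer` and `blob_matches` are illustrative names, not part of any Git LFS tooling:

```python
import hashlib

LFS_V1 = "https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS v1 pointer into oid and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if fields.get("version") != LFS_V1:
        raise ValueError("not a Git LFS v1 pointer")
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256" or len(digest) != 64:
        raise ValueError("expected a sha256 oid")
    return {"oid": digest, "size": int(fields["size"])}


def blob_matches(data: bytes, pointer: dict) -> bool:
    """True if the bytes have the recorded size and SHA-256 digest."""
    return len(data) == pointer["size"] and hashlib.sha256(data).hexdigest() == pointer["oid"]


# Pointer for adapter_model.safetensors, copied from this commit.
ADAPTER_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:bb59628946cf46f6da675e0a3c8cf755403db49a54e85e3af20ac31bc729241f
size 19686264
"""

ptr = parse_lfs_pointer(ADAPTER_POINTER)
print(ptr["size"], ptr["oid"][:12])
```

Comparing the size first is cheap and catches truncated downloads before hashing a large file.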
Yi-1.5-9B-Chat-LoRA/special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
+ {
+ "bos_token": {
+ "content": "<|startoftext|>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
Yi-1.5-9B-Chat-LoRA/tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:386c49cf943d71aa110361135338c50e38beeff0a66593480421f37b319e1a39
+ size 1033105
Yi-1.5-9B-Chat-LoRA/tokenizer_config.json ADDED
@@ -0,0 +1,53 @@
+ {
+ "add_bos_token": false,
+ "add_eos_token": false,
+ "add_prefix_space": true,
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<|startoftext|>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "7": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "<|startoftext|>",
+ "chat_template": "{% if messages[0]['role'] == 'system' %}{% set system_message = messages[0]['content'] %}{% endif %}{% if system_message is defined %}{{ system_message }}{% endif %}{% for message in messages %}{% set content = message['content'] %}{% if message['role'] == 'user' %}{{ '<|im_start|>user\\n' + content + '<|im_end|>\\n<|im_start|>assistant\\n' }}{% elif message['role'] == 'assistant' %}{{ content + '<|im_end|>' + '\\n' }}{% endif %}{% endfor %}",
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|im_end|>",
+ "legacy": true,
+ "model_max_length": 4096,
+ "pad_token": "<unk>",
+ "padding_side": "right",
+ "sp_model_kwargs": {},
+ "spaces_between_special_tokens": false,
+ "split_special_tokens": false,
+ "tokenizer_class": "LlamaTokenizer",
+ "unk_token": "<unk>",
+ "use_default_system_prompt": false
+ }
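The `chat_template` in this tokenizer config is Jinja. It emits a leading system message verbatim (no `<|im_start|>system` wrapper), wraps each user turn in `<|im_start|>user` and `<|im_end|>` followed by an already-open assistant header, and closes each assistant turn with `<|im_end|>`, which is also the configured `eos_token` (id 7). Below is a hand-written Python mirror of that template, offered as an illustration rather than a replacement for `tokenizer.apply_chat_template`, and assuming Jinja decodes `\n` inside string literals to a newline, as Jinja2 does. `render_chat` is a hypothetical name:

```python
def render_chat(messages: list) -> str:
    """Mirror of the tokenizer_config.json chat_template, without Jinja."""
    parts = []
    # A leading system message is emitted as plain text, with no role markers.
    if messages and messages[0]["role"] == "system":
        parts.append(messages[0]["content"])
    for message in messages:
        content = message["content"]
        if message["role"] == "user":
            # Each user turn also opens the assistant turn, so the rendered
            # prompt already ends in a generation prompt.
            parts.append("<|im_start|>user\n" + content + "<|im_end|>\n<|im_start|>assistant\n")
        elif message["role"] == "assistant":
            parts.append(content + "<|im_end|>\n")
        # Other roles, including the system message handled above, are skipped.
    return "".join(parts)


prompt = render_chat([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi"},
])
print(repr(prompt))
```

The system text runs straight into the first `<|im_start|>` with no separator, exactly as the template writes it; anyone adding a system prompt may want to include their own trailing newline.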
Yi-1.5-9B-Chat-LoRA/trainer_state.json ADDED
@@ -0,0 +1,2093 @@
+ {
+ "best_metric": 2.2857470512390137,
+ "best_model_checkpoint": "../../saves/Yi-1.5-9B-Chat/lora/sft/checkpoint-2800",
+ "epoch": 4.148148148148148,
+ "eval_steps": 200,
+ "global_step": 2800,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.01,
+ "grad_norm": 11.507218360900879,
+ "learning_rate": 2e-05,
+ "loss": 3.995,
+ "step": 10
+ },
+ {
+ "epoch": 0.03,
+ "grad_norm": 7.0516510009765625,
+ "learning_rate": 4.5e-05,
+ "loss": 3.8373,
+ "step": 20
+ },
+ {
+ "epoch": 0.04,
+ "grad_norm": 5.1662492752075195,
+ "learning_rate": 4.999929854041747e-05,
+ "loss": 3.4376,
+ "step": 30
+ },
+ {
+ "epoch": 0.06,
+ "grad_norm": 2.265901565551758,
+ "learning_rate": 4.999644892832738e-05,
+ "loss": 3.0327,
+ "step": 40
+ },
+ {
+ "epoch": 0.07,
+ "grad_norm": 4.507016658782959,
+ "learning_rate": 4.999140757217391e-05,
+ "loss": 2.7494,
+ "step": 50
+ },
+ {
+ "epoch": 0.09,
+ "grad_norm": 3.204484462738037,
+ "learning_rate": 4.9984174913994355e-05,
+ "loss": 2.6219,
+ "step": 60
+ },
+ {
+ "epoch": 0.1,
+ "grad_norm": 1.585574746131897,
+ "learning_rate": 4.9974751587964214e-05,
+ "loss": 2.6297,
+ "step": 70
+ },
+ {
+ "epoch": 0.12,
+ "grad_norm": 2.005038261413574,
+ "learning_rate": 4.9963138420341604e-05,
+ "loss": 2.6139,
+ "step": 80
+ },
+ {
+ "epoch": 0.13,
+ "grad_norm": 2.9263103008270264,
+ "learning_rate": 4.994933642939482e-05,
+ "loss": 2.6403,
+ "step": 90
+ },
+ {
+ "epoch": 0.15,
+ "grad_norm": 2.398559331893921,
+ "learning_rate": 4.993334682531302e-05,
+ "loss": 2.7009,
+ "step": 100
+ },
+ {
+ "epoch": 0.16,
+ "grad_norm": 2.3030171394348145,
+ "learning_rate": 4.991517101010015e-05,
+ "loss": 2.9349,
+ "step": 110
+ },
+ {
+ "epoch": 0.18,
+ "grad_norm": 2.3386640548706055,
+ "learning_rate": 4.9894810577451975e-05,
+ "loss": 2.2748,
+ "step": 120
+ },
+ {
+ "epoch": 0.19,
+ "grad_norm": 1.8978828191757202,
+ "learning_rate": 4.9872267312616384e-05,
+ "loss": 2.3982,
+ "step": 130
+ },
+ {
+ "epoch": 0.21,
+ "grad_norm": 2.644758939743042,
+ "learning_rate": 4.9847543192236815e-05,
+ "loss": 2.4243,
+ "step": 140
+ },
+ {
+ "epoch": 0.22,
+ "grad_norm": 1.887523889541626,
+ "learning_rate": 4.9820640384178954e-05,
+ "loss": 2.5096,
+ "step": 150
+ },
+ {
+ "epoch": 0.24,
+ "grad_norm": 1.8633583784103394,
+ "learning_rate": 4.9791561247340674e-05,
+ "loss": 2.3775,
+ "step": 160
+ },
+ {
+ "epoch": 0.25,
+ "grad_norm": 2.8847458362579346,
+ "learning_rate": 4.976030833144516e-05,
+ "loss": 2.2988,
+ "step": 170
+ },
+ {
+ "epoch": 0.27,
+ "grad_norm": 1.756564974784851,
+ "learning_rate": 4.972688437681736e-05,
+ "loss": 2.3573,
+ "step": 180
+ },
+ {
+ "epoch": 0.28,
+ "grad_norm": 4.001959800720215,
+ "learning_rate": 4.969129231414374e-05,
+ "loss": 2.5404,
+ "step": 190
+ },
+ {
+ "epoch": 0.3,
+ "grad_norm": 2.489682674407959,
+ "learning_rate": 4.9653535264215256e-05,
+ "loss": 2.5707,
+ "step": 200
+ },
+ {
+ "epoch": 0.3,
+ "eval_loss": 2.391483783721924,
+ "eval_runtime": 96.186,
+ "eval_samples_per_second": 6.238,
+ "eval_steps_per_second": 3.119,
+ "step": 200
+ },
+ {
+ "epoch": 0.31,
+ "grad_norm": 2.316711187362671,
+ "learning_rate": 4.961361653765377e-05,
+ "loss": 2.5629,
+ "step": 210
+ },
+ {
+ "epoch": 0.33,
+ "grad_norm": 2.2436294555664062,
+ "learning_rate": 4.957153963462172e-05,
+ "loss": 2.3062,
+ "step": 220
+ },
+ {
+ "epoch": 0.34,
+ "grad_norm": 1.6833086013793945,
+ "learning_rate": 4.952730824451527e-05,
+ "loss": 2.4841,
+ "step": 230
+ },
+ {
+ "epoch": 0.36,
+ "grad_norm": 2.4797868728637695,
+ "learning_rate": 4.9480926245640754e-05,
+ "loss": 2.4149,
+ "step": 240
+ },
+ {
+ "epoch": 0.37,
+ "grad_norm": 2.539104700088501,
+ "learning_rate": 4.943239770487469e-05,
+ "loss": 2.5375,
+ "step": 250
+ },
+ {
+ "epoch": 0.39,
+ "grad_norm": 2.656684637069702,
+ "learning_rate": 4.9381726877307124e-05,
+ "loss": 2.58,
+ "step": 260
+ },
+ {
+ "epoch": 0.4,
+ "grad_norm": 2.8903274536132812,
+ "learning_rate": 4.9328918205868556e-05,
+ "loss": 2.36,
+ "step": 270
+ },
+ {
+ "epoch": 0.41,
+ "grad_norm": 3.5389297008514404,
+ "learning_rate": 4.927397632094039e-05,
+ "loss": 2.4487,
+ "step": 280
+ },
+ {
+ "epoch": 0.43,
+ "grad_norm": 2.3600199222564697,
+ "learning_rate": 4.9216906039948896e-05,
+ "loss": 2.5569,
+ "step": 290
+ },
+ {
+ "epoch": 0.44,
+ "grad_norm": 2.797351837158203,
+ "learning_rate": 4.915771236694286e-05,
+ "loss": 2.5081,
+ "step": 300
+ },
+ {
+ "epoch": 0.46,
+ "grad_norm": 2.2043402194976807,
+ "learning_rate": 4.909640049215478e-05,
+ "loss": 2.6873,
+ "step": 310
+ },
+ {
+ "epoch": 0.47,
+ "grad_norm": 2.070732831954956,
+ "learning_rate": 4.903297579154577e-05,
+ "loss": 2.4935,
+ "step": 320
+ },
+ {
+ "epoch": 0.49,
+ "grad_norm": 2.1852307319641113,
+ "learning_rate": 4.896744382633419e-05,
+ "loss": 2.4865,
+ "step": 330
+ },
+ {
+ "epoch": 0.5,
+ "grad_norm": 1.5772058963775635,
+ "learning_rate": 4.889981034250807e-05,
+ "loss": 2.3605,
+ "step": 340
+ },
+ {
+ "epoch": 0.52,
+ "grad_norm": 2.855281114578247,
+ "learning_rate": 4.883008127032121e-05,
+ "loss": 2.6443,
+ "step": 350
+ },
+ {
+ "epoch": 0.53,
+ "grad_norm": 2.7813808917999268,
+ "learning_rate": 4.8758262723773255e-05,
+ "loss": 2.2597,
+ "step": 360
+ },
+ {
+ "epoch": 0.55,
+ "grad_norm": 3.336205244064331,
+ "learning_rate": 4.86843610000736e-05,
+ "loss": 2.4404,
+ "step": 370
+ },
+ {
+ "epoch": 0.56,
+ "grad_norm": 1.562402367591858,
+ "learning_rate": 4.860838257908925e-05,
+ "loss": 2.3216,
+ "step": 380
+ },
+ {
+ "epoch": 0.58,
+ "grad_norm": 1.9673439264297485,
+ "learning_rate": 4.85303341227766e-05,
+ "loss": 2.3308,
+ "step": 390
+ },
+ {
+ "epoch": 0.59,
+ "grad_norm": 2.689194440841675,
+ "learning_rate": 4.845022247459736e-05,
+ "loss": 2.5978,
+ "step": 400
+ },
+ {
+ "epoch": 0.59,
+ "eval_loss": 2.3469619750976562,
+ "eval_runtime": 96.7403,
+ "eval_samples_per_second": 6.202,
+ "eval_steps_per_second": 3.101,
+ "step": 400
+ },
+ {
+ "epoch": 0.61,
+ "grad_norm": 2.1179988384246826,
+ "learning_rate": 4.836805465891844e-05,
+ "loss": 2.679,
+ "step": 410
+ },
+ {
+ "epoch": 0.62,
+ "grad_norm": 2.770677089691162,
+ "learning_rate": 4.828383788039611e-05,
+ "loss": 2.3037,
+ "step": 420
+ },
+ {
+ "epoch": 0.64,
+ "grad_norm": 2.4875175952911377,
+ "learning_rate": 4.819757952334425e-05,
+ "loss": 2.1605,
+ "step": 430
+ },
+ {
+ "epoch": 0.65,
+ "grad_norm": 2.146296501159668,
+ "learning_rate": 4.810928715108683e-05,
+ "loss": 2.4998,
+ "step": 440
+ },
+ {
+ "epoch": 0.67,
+ "grad_norm": 1.9566316604614258,
+ "learning_rate": 4.801896850529482e-05,
+ "loss": 2.6092,
+ "step": 450
+ },
+ {
+ "epoch": 0.68,
+ "grad_norm": 2.384901285171509,
+ "learning_rate": 4.792663150530733e-05,
+ "loss": 2.5815,
+ "step": 460
+ },
+ {
+ "epoch": 0.7,
+ "grad_norm": 2.225851535797119,
+ "learning_rate": 4.783228424743726e-05,
+ "loss": 2.3677,
+ "step": 470
+ },
+ {
+ "epoch": 0.71,
+ "grad_norm": 2.222594976425171,
+ "learning_rate": 4.773593500426134e-05,
+ "loss": 2.3782,
+ "step": 480
+ },
+ {
+ "epoch": 0.73,
+ "grad_norm": 1.6287986040115356,
+ "learning_rate": 4.763759222389487e-05,
+ "loss": 2.4165,
+ "step": 490
+ },
+ {
+ "epoch": 0.74,
+ "grad_norm": 2.4728448390960693,
+ "learning_rate": 4.7537264529250835e-05,
+ "loss": 2.3643,
+ "step": 500
+ },
+ {
+ "epoch": 0.76,
+ "grad_norm": 1.7345590591430664,
+ "learning_rate": 4.743496071728396e-05,
+ "loss": 2.4526,
+ "step": 510
+ },
+ {
+ "epoch": 0.77,
+ "grad_norm": 1.9642170667648315,
+ "learning_rate": 4.7330689758219314e-05,
+ "loss": 2.3306,
+ "step": 520
+ },
+ {
+ "epoch": 0.79,
+ "grad_norm": 2.757434368133545,
+ "learning_rate": 4.722446079476576e-05,
+ "loss": 2.5495,
+ "step": 530
+ },
+ {
+ "epoch": 0.8,
+ "grad_norm": 2.5214667320251465,
+ "learning_rate": 4.711628314131436e-05,
+ "loss": 2.5145,
+ "step": 540
+ },
+ {
+ "epoch": 0.81,
+ "grad_norm": 2.977623462677002,
+ "learning_rate": 4.700616628312158e-05,
+ "loss": 2.4552,
+ "step": 550
+ },
+ {
+ "epoch": 0.83,
+ "grad_norm": 3.109473466873169,
+ "learning_rate": 4.689411987547773e-05,
+ "loss": 2.4047,
+ "step": 560
+ },
+ {
+ "epoch": 0.84,
+ "grad_norm": 1.7021738290786743,
+ "learning_rate": 4.678015374286025e-05,
+ "loss": 2.5649,
+ "step": 570
+ },
+ {
+ "epoch": 0.86,
+ "grad_norm": 2.258920669555664,
+ "learning_rate": 4.666427787807232e-05,
+ "loss": 2.5556,
+ "step": 580
+ },
+ {
+ "epoch": 0.87,
+ "grad_norm": 2.1758129596710205,
+ "learning_rate": 4.654650244136669e-05,
+ "loss": 2.4234,
+ "step": 590
+ },
+ {
+ "epoch": 0.89,
+ "grad_norm": 2.581289529800415,
+ "learning_rate": 4.642683775955476e-05,
+ "loss": 2.5284,
+ "step": 600
+ },
+ {
+ "epoch": 0.89,
+ "eval_loss": 2.327061414718628,
+ "eval_runtime": 96.1253,
+ "eval_samples_per_second": 6.242,
+ "eval_steps_per_second": 3.121,
+ "step": 600
+ },
+ {
+ "epoch": 0.9,
+ "grad_norm": 3.0182411670684814,
+ "learning_rate": 4.630529432510118e-05,
+ "loss": 2.3928,
+ "step": 610
+ },
+ {
+ "epoch": 0.92,
+ "grad_norm": 2.1703760623931885,
+ "learning_rate": 4.618188279520374e-05,
+ "loss": 2.675,
+ "step": 620
+ },
+ {
+ "epoch": 0.93,
+ "grad_norm": 2.2590174674987793,
+ "learning_rate": 4.6056613990859024e-05,
+ "loss": 2.4192,
+ "step": 630
+ },
+ {
+ "epoch": 0.95,
+ "grad_norm": 3.697880744934082,
+ "learning_rate": 4.5929498895913514e-05,
+ "loss": 2.1851,
+ "step": 640
+ },
+ {
+ "epoch": 0.96,
+ "grad_norm": 1.6290298700332642,
+ "learning_rate": 4.580054865610059e-05,
+ "loss": 2.452,
+ "step": 650
+ },
+ {
+ "epoch": 0.98,
+ "grad_norm": 2.1037967205047607,
+ "learning_rate": 4.5669774578063174e-05,
+ "loss": 2.368,
+ "step": 660
+ },
+ {
+ "epoch": 0.99,
+ "grad_norm": 3.8899028301239014,
+ "learning_rate": 4.5537188128362384e-05,
+ "loss": 2.4681,
+ "step": 670
+ },
+ {
+ "epoch": 1.01,
+ "grad_norm": 2.6862452030181885,
+ "learning_rate": 4.54028009324721e-05,
+ "loss": 2.5741,
+ "step": 680
+ },
+ {
+ "epoch": 1.02,
+ "grad_norm": 2.2980988025665283,
+ "learning_rate": 4.52666247737596e-05,
+ "loss": 2.3131,
+ "step": 690
+ },
+ {
+ "epoch": 1.04,
+ "grad_norm": 2.9786365032196045,
+ "learning_rate": 4.512867159245242e-05,
+ "loss": 2.4059,
+ "step": 700
+ },
+ {
+ "epoch": 1.05,
+ "grad_norm": 2.39225435256958,
+ "learning_rate": 4.498895348459135e-05,
+ "loss": 2.3781,
+ "step": 710
+ },
+ {
+ "epoch": 1.07,
+ "grad_norm": 1.9918076992034912,
+ "learning_rate": 4.484748270096988e-05,
+ "loss": 2.399,
+ "step": 720
+ },
+ {
+ "epoch": 1.08,
+ "grad_norm": 2.9783575534820557,
+ "learning_rate": 4.470427164605997e-05,
+ "loss": 2.4341,
+ "step": 730
+ },
+ {
+ "epoch": 1.1,
+ "grad_norm": 1.9173182249069214,
+ "learning_rate": 4.455933287692444e-05,
+ "loss": 2.3917,
+ "step": 740
+ },
+ {
+ "epoch": 1.11,
+ "grad_norm": 5.648810863494873,
+ "learning_rate": 4.441267910211594e-05,
+ "loss": 2.6513,
+ "step": 750
+ },
+ {
+ "epoch": 1.13,
+ "grad_norm": 4.045050144195557,
+ "learning_rate": 4.4264323180562574e-05,
+ "loss": 2.5065,
+ "step": 760
+ },
+ {
+ "epoch": 1.14,
+ "grad_norm": 3.8237059116363525,
+ "learning_rate": 4.411427812044049e-05,
+ "loss": 2.3481,
+ "step": 770
+ },
+ {
+ "epoch": 1.16,
+ "grad_norm": 2.632697582244873,
+ "learning_rate": 4.396255707803323e-05,
+ "loss": 2.445,
+ "step": 780
+ },
+ {
+ "epoch": 1.17,
+ "grad_norm": 3.1144275665283203,
+ "learning_rate": 4.3809173356578184e-05,
+ "loss": 2.3096,
+ "step": 790
+ },
+ {
+ "epoch": 1.19,
+ "grad_norm": 1.7161847352981567,
+ "learning_rate": 4.3654140405100116e-05,
+ "loss": 2.4712,
+ "step": 800
+ },
+ {
+ "epoch": 1.19,
+ "eval_loss": 2.317145824432373,
+ "eval_runtime": 96.3893,
+ "eval_samples_per_second": 6.225,
+ "eval_steps_per_second": 3.112,
+ "step": 800
+ },
+ {
+ "epoch": 1.2,
+ "grad_norm": 2.709351062774658,
+ "learning_rate": 4.349747181723197e-05,
+ "loss": 2.4706,
+ "step": 810
+ },
+ {
+ "epoch": 1.21,
+ "grad_norm": 2.301166534423828,
+ "learning_rate": 4.3339181330022876e-05,
+ "loss": 2.5085,
+ "step": 820
+ },
+ {
+ "epoch": 1.23,
+ "grad_norm": 2.3112149238586426,
+ "learning_rate": 4.3179282822733706e-05,
+ "loss": 2.3204,
+ "step": 830
+ },
+ {
+ "epoch": 1.24,
+ "grad_norm": 1.850696325302124,
+ "learning_rate": 4.301779031562011e-05,
+ "loss": 2.4174,
+ "step": 840
+ },
+ {
+ "epoch": 1.26,
+ "grad_norm": 2.511995315551758,
+ "learning_rate": 4.285471796870316e-05,
+ "loss": 2.3967,
+ "step": 850
+ },
+ {
+ "epoch": 1.27,
+ "grad_norm": 3.4540021419525146,
+ "learning_rate": 4.26900800805278e-05,
+ "loss": 2.2189,
+ "step": 860
+ },
+ {
+ "epoch": 1.29,
+ "grad_norm": 3.0399599075317383,
+ "learning_rate": 4.252389108690909e-05,
+ "loss": 2.4208,
+ "step": 870
+ },
+ {
+ "epoch": 1.3,
+ "grad_norm": 2.1426591873168945,
+ "learning_rate": 4.235616555966645e-05,
+ "loss": 2.479,
+ "step": 880
+ },
+ {
+ "epoch": 1.32,
+ "grad_norm": 2.262714147567749,
+ "learning_rate": 4.218691820534601e-05,
+ "loss": 2.5144,
+ "step": 890
+ },
+ {
+ "epoch": 1.33,
+ "grad_norm": 2.40321683883667,
+ "learning_rate": 4.201616386393102e-05,
+ "loss": 2.332,
+ "step": 900
+ },
+ {
+ "epoch": 1.35,
+ "grad_norm": 2.4343059062957764,
+ "learning_rate": 4.184391750754075e-05,
+ "loss": 2.4799,
+ "step": 910
+ },
+ {
+ "epoch": 1.36,
+ "grad_norm": 2.6487956047058105,
+ "learning_rate": 4.167019423911761e-05,
+ "loss": 2.4492,
+ "step": 920
+ },
+ {
+ "epoch": 1.38,
+ "grad_norm": 3.0731077194213867,
+ "learning_rate": 4.149500929110295e-05,
+ "loss": 2.4789,
+ "step": 930
+ },
+ {
+ "epoch": 1.39,
+ "grad_norm": 2.791496515274048,
+ "learning_rate": 4.1318378024101435e-05,
+ "loss": 2.2895,
+ "step": 940
+ },
+ {
+ "epoch": 1.41,
+ "grad_norm": 2.860171318054199,
+ "learning_rate": 4.114031592553417e-05,
+ "loss": 2.3098,
+ "step": 950
+ },
+ {
+ "epoch": 1.42,
+ "grad_norm": 2.6719272136688232,
+ "learning_rate": 4.096083860828076e-05,
+ "loss": 2.2381,
+ "step": 960
+ },
+ {
+ "epoch": 1.44,
+ "grad_norm": 3.2551610469818115,
+ "learning_rate": 4.07799618093103e-05,
+ "loss": 2.4553,
+ "step": 970
+ },
+ {
+ "epoch": 1.45,
+ "grad_norm": 2.201517105102539,
+ "learning_rate": 4.059770138830157e-05,
+ "loss": 2.4248,
+ "step": 980
+ },
+ {
+ "epoch": 1.47,
+ "grad_norm": 3.5828166007995605,
+ "learning_rate": 4.041407332625238e-05,
+ "loss": 2.3741,
+ "step": 990
+ },
+ {
+ "epoch": 1.48,
+ "grad_norm": 2.608720064163208,
+ "learning_rate": 4.022909372407835e-05,
+ "loss": 2.4542,
+ "step": 1000
+ },
+ {
+ "epoch": 1.48,
+ "eval_loss": 2.308347463607788,
+ "eval_runtime": 96.2879,
+ "eval_samples_per_second": 6.231,
+ "eval_steps_per_second": 3.116,
+ "step": 1000
+ },
+ {
+ "epoch": 1.5,
+ "grad_norm": 2.607658624649048,
+ "learning_rate": 4.004277880120113e-05,
+ "loss": 2.5501,
+ "step": 1010
+ },
+ {
+ "epoch": 1.51,
+ "grad_norm": 2.599700450897217,
+ "learning_rate": 3.9855144894126235e-05,
+ "loss": 2.2606,
+ "step": 1020
+ },
+ {
+ "epoch": 1.53,
+ "grad_norm": 2.6854465007781982,
+ "learning_rate": 3.966620845501067e-05,
+ "loss": 2.3407,
+ "step": 1030
+ },
+ {
+ "epoch": 1.54,
+ "grad_norm": 2.488729476928711,
+ "learning_rate": 3.9475986050220314e-05,
+ "loss": 2.4184,
+ "step": 1040
+ },
+ {
+ "epoch": 1.56,
+ "grad_norm": 2.6692395210266113,
+ "learning_rate": 3.928449435887737e-05,
+ "loss": 2.4879,
+ "step": 1050
+ },
+ {
+ "epoch": 1.57,
+ "grad_norm": 2.208466053009033,
+ "learning_rate": 3.909175017139791e-05,
790
+ "loss": 2.2039,
791
+ "step": 1060
792
+ },
793
+ {
794
+ "epoch": 1.59,
795
+ "grad_norm": 2.5494725704193115,
796
+ "learning_rate": 3.889777038801964e-05,
797
+ "loss": 2.3029,
798
+ "step": 1070
799
+ },
800
+ {
801
+ "epoch": 1.6,
802
+ "grad_norm": 2.0070173740386963,
803
+ "learning_rate": 3.870257201732005e-05,
804
+ "loss": 2.3363,
805
+ "step": 1080
806
+ },
807
+ {
808
+ "epoch": 1.61,
809
+ "grad_norm": 2.75435209274292,
810
+ "learning_rate": 3.8506172174725066e-05,
811
+ "loss": 2.2523,
812
+ "step": 1090
813
+ },
814
+ {
815
+ "epoch": 1.63,
816
+ "grad_norm": 2.6911637783050537,
817
+ "learning_rate": 3.830858808100834e-05,
818
+ "loss": 2.4057,
819
+ "step": 1100
820
+ },
821
+ {
822
+ "epoch": 1.64,
823
+ "grad_norm": 3.0497798919677734,
824
+ "learning_rate": 3.810983706078131e-05,
825
+ "loss": 2.2635,
826
+ "step": 1110
827
+ },
828
+ {
829
+ "epoch": 1.66,
830
+ "grad_norm": 3.2239983081817627,
831
+ "learning_rate": 3.790993654097405e-05,
832
+ "loss": 2.3918,
833
+ "step": 1120
834
+ },
835
+ {
836
+ "epoch": 1.67,
837
+ "grad_norm": 2.4736838340759277,
838
+ "learning_rate": 3.770890404930738e-05,
839
+ "loss": 2.3823,
840
+ "step": 1130
841
+ },
842
+ {
843
+ "epoch": 1.69,
844
+ "grad_norm": 2.585200548171997,
845
+ "learning_rate": 3.7506757212755886e-05,
846
+ "loss": 2.3349,
847
+ "step": 1140
848
+ },
849
+ {
850
+ "epoch": 1.7,
851
+ "grad_norm": 2.8950488567352295,
852
+ "learning_rate": 3.730351375600239e-05,
853
+ "loss": 2.2586,
854
+ "step": 1150
855
+ },
856
+ {
857
+ "epoch": 1.72,
858
+ "grad_norm": 2.7123405933380127,
859
+ "learning_rate": 3.7099191499883806e-05,
860
+ "loss": 2.309,
861
+ "step": 1160
862
+ },
863
+ {
864
+ "epoch": 1.73,
865
+ "grad_norm": 2.049273729324341,
866
+ "learning_rate": 3.6893808359828565e-05,
867
+ "loss": 2.3608,
868
+ "step": 1170
869
+ },
870
+ {
871
+ "epoch": 1.75,
872
+ "grad_norm": 2.6950619220733643,
873
+ "learning_rate": 3.668738234428575e-05,
874
+ "loss": 2.4085,
875
+ "step": 1180
876
+ },
877
+ {
878
+ "epoch": 1.76,
879
+ "grad_norm": 3.231593370437622,
880
+ "learning_rate": 3.64799315531461e-05,
881
+ "loss": 2.2365,
882
+ "step": 1190
883
+ },
884
+ {
885
+ "epoch": 1.78,
886
+ "grad_norm": 3.310612201690674,
887
+ "learning_rate": 3.627147417615493e-05,
888
+ "loss": 2.3518,
889
+ "step": 1200
890
+ },
891
+ {
892
+ "epoch": 1.78,
893
+ "eval_loss": 2.2971346378326416,
894
+ "eval_runtime": 96.9315,
895
+ "eval_samples_per_second": 6.19,
896
+ "eval_steps_per_second": 3.095,
897
+ "step": 1200
898
+ },
899
+ {
900
+ "epoch": 1.79,
901
+ "grad_norm": 2.5581092834472656,
902
+ "learning_rate": 3.606202849131723e-05,
903
+ "loss": 2.2343,
904
+ "step": 1210
905
+ },
906
+ {
907
+ "epoch": 1.81,
908
+ "grad_norm": 2.7495291233062744,
909
+ "learning_rate": 3.585161286329503e-05,
910
+ "loss": 2.3144,
911
+ "step": 1220
912
+ },
913
+ {
914
+ "epoch": 1.82,
915
+ "grad_norm": 1.433355689048767,
916
+ "learning_rate": 3.564024574179713e-05,
917
+ "loss": 2.4354,
918
+ "step": 1230
919
+ },
920
+ {
921
+ "epoch": 1.84,
922
+ "grad_norm": 1.7245852947235107,
923
+ "learning_rate": 3.542794565996137e-05,
924
+ "loss": 2.405,
925
+ "step": 1240
926
+ },
927
+ {
928
+ "epoch": 1.85,
929
+ "grad_norm": 2.598426103591919,
930
+ "learning_rate": 3.5214731232729626e-05,
931
+ "loss": 2.4057,
932
+ "step": 1250
933
+ },
934
+ {
935
+ "epoch": 1.87,
936
+ "grad_norm": 2.4231202602386475,
937
+ "learning_rate": 3.500062115521562e-05,
938
+ "loss": 2.233,
939
+ "step": 1260
940
+ },
941
+ {
942
+ "epoch": 1.88,
943
+ "grad_norm": 2.9336726665496826,
944
+ "learning_rate": 3.478563420106565e-05,
945
+ "loss": 2.5745,
946
+ "step": 1270
947
+ },
948
+ {
949
+ "epoch": 1.9,
950
+ "grad_norm": 2.057365655899048,
951
+ "learning_rate": 3.4569789220812544e-05,
952
+ "loss": 2.4635,
953
+ "step": 1280
954
+ },
955
+ {
956
+ "epoch": 1.91,
957
+ "grad_norm": 1.8743510246276855,
958
+ "learning_rate": 3.435310514022272e-05,
959
+ "loss": 2.3892,
960
+ "step": 1290
961
+ },
962
+ {
963
+ "epoch": 1.93,
964
+ "grad_norm": 2.422725200653076,
965
+ "learning_rate": 3.4135600958636794e-05,
966
+ "loss": 2.4463,
967
+ "step": 1300
968
+ },
969
+ {
970
+ "epoch": 1.94,
971
+ "grad_norm": 2.9806418418884277,
972
+ "learning_rate": 3.391729574730365e-05,
973
+ "loss": 2.2907,
974
+ "step": 1310
975
+ },
976
+ {
977
+ "epoch": 1.96,
978
+ "grad_norm": 2.656452178955078,
979
+ "learning_rate": 3.369820864770822e-05,
980
+ "loss": 2.55,
981
+ "step": 1320
982
+ },
983
+ {
984
+ "epoch": 1.97,
985
+ "grad_norm": 1.4007813930511475,
986
+ "learning_rate": 3.347835886989318e-05,
987
+ "loss": 2.4001,
988
+ "step": 1330
989
+ },
990
+ {
991
+ "epoch": 1.99,
992
+ "grad_norm": 2.9661433696746826,
993
+ "learning_rate": 3.3257765690774474e-05,
994
+ "loss": 2.2728,
995
+ "step": 1340
996
+ },
997
+ {
998
+ "epoch": 2.0,
999
+ "grad_norm": 2.8605289459228516,
1000
+ "learning_rate": 3.303644845245114e-05,
1001
+ "loss": 2.4102,
1002
+ "step": 1350
1003
+ },
1004
+ {
1005
+ "epoch": 2.01,
1006
+ "grad_norm": 2.4378559589385986,
1007
+ "learning_rate": 3.2814426560509335e-05,
1008
+ "loss": 2.3268,
1009
+ "step": 1360
1010
+ },
1011
+ {
1012
+ "epoch": 2.03,
1013
+ "grad_norm": 2.231828212738037,
1014
+ "learning_rate": 3.259171948232081e-05,
1015
+ "loss": 2.265,
1016
+ "step": 1370
1017
+ },
1018
+ {
1019
+ "epoch": 2.04,
1020
+ "grad_norm": 3.6883370876312256,
1021
+ "learning_rate": 3.236834674533595e-05,
1022
+ "loss": 2.3077,
1023
+ "step": 1380
1024
+ },
1025
+ {
1026
+ "epoch": 2.06,
1027
+ "grad_norm": 2.531064510345459,
1028
+ "learning_rate": 3.214432793537159e-05,
1029
+ "loss": 2.2186,
1030
+ "step": 1390
1031
+ },
1032
+ {
1033
+ "epoch": 2.07,
1034
+ "grad_norm": 3.7311625480651855,
1035
+ "learning_rate": 3.1919682694893676e-05,
1036
+ "loss": 2.3739,
1037
+ "step": 1400
1038
+ },
1039
+ {
1040
+ "epoch": 2.07,
1041
+ "eval_loss": 2.2983055114746094,
1042
+ "eval_runtime": 97.4749,
1043
+ "eval_samples_per_second": 6.155,
1044
+ "eval_steps_per_second": 3.078,
1045
+ "step": 1400
1046
+ },
1047
+ {
1048
+ "epoch": 2.09,
1049
+ "grad_norm": 2.1773197650909424,
1050
+ "learning_rate": 3.169443072129498e-05,
1051
+ "loss": 2.3585,
1052
+ "step": 1410
1053
+ },
1054
+ {
1055
+ "epoch": 2.1,
1056
+ "grad_norm": 3.2135908603668213,
1057
+ "learning_rate": 3.146859176516795e-05,
1058
+ "loss": 2.4114,
1059
+ "step": 1420
1060
+ },
1061
+ {
1062
+ "epoch": 2.12,
1063
+ "grad_norm": 2.469650983810425,
1064
+ "learning_rate": 3.1242185628573e-05,
1065
+ "loss": 2.4616,
1066
+ "step": 1430
1067
+ },
1068
+ {
1069
+ "epoch": 2.13,
1070
+ "grad_norm": 3.1853158473968506,
1071
+ "learning_rate": 3.101523216330216e-05,
1072
+ "loss": 2.4351,
1073
+ "step": 1440
1074
+ },
1075
+ {
1076
+ "epoch": 2.15,
1077
+ "grad_norm": 3.484740972518921,
1078
+ "learning_rate": 3.0787751269138454e-05,
1079
+ "loss": 2.4084,
1080
+ "step": 1450
1081
+ },
1082
+ {
1083
+ "epoch": 2.16,
1084
+ "grad_norm": 2.925419330596924,
1085
+ "learning_rate": 3.055976289211105e-05,
1086
+ "loss": 2.3629,
1087
+ "step": 1460
1088
+ },
1089
+ {
1090
+ "epoch": 2.18,
1091
+ "grad_norm": 2.416266918182373,
1092
+ "learning_rate": 3.033128702274634e-05,
1093
+ "loss": 2.3339,
1094
+ "step": 1470
1095
+ },
1096
+ {
1097
+ "epoch": 2.19,
1098
+ "grad_norm": 2.828092575073242,
1099
+ "learning_rate": 3.010234369431511e-05,
1100
+ "loss": 2.2583,
1101
+ "step": 1480
1102
+ },
1103
+ {
1104
+ "epoch": 2.21,
1105
+ "grad_norm": 2.0409436225891113,
1106
+ "learning_rate": 2.9872952981076008e-05,
1107
+ "loss": 2.0624,
1108
+ "step": 1490
1109
+ },
1110
+ {
1111
+ "epoch": 2.22,
1112
+ "grad_norm": 2.849675416946411,
1113
+ "learning_rate": 2.9643134996515364e-05,
1114
+ "loss": 2.3726,
1115
+ "step": 1500
1116
+ },
1117
+ {
1118
+ "epoch": 2.24,
1119
+ "grad_norm": 4.13971471786499,
1120
+ "learning_rate": 2.9412909891583613e-05,
1121
+ "loss": 2.2965,
1122
+ "step": 1510
1123
+ },
1124
+ {
1125
+ "epoch": 2.25,
1126
+ "grad_norm": 3.702918529510498,
1127
+ "learning_rate": 2.9182297852928407e-05,
1128
+ "loss": 2.4658,
1129
+ "step": 1520
1130
+ },
1131
+ {
1132
+ "epoch": 2.27,
1133
+ "grad_norm": 3.2200419902801514,
1134
+ "learning_rate": 2.8951319101124598e-05,
1135
+ "loss": 2.4594,
1136
+ "step": 1530
1137
+ },
1138
+ {
1139
+ "epoch": 2.28,
1140
+ "grad_norm": 2.465409517288208,
1141
+ "learning_rate": 2.8719993888901258e-05,
1142
+ "loss": 2.4301,
1143
+ "step": 1540
1144
+ },
1145
+ {
1146
+ "epoch": 2.3,
1147
+ "grad_norm": 2.5337352752685547,
1148
+ "learning_rate": 2.848834249936589e-05,
1149
+ "loss": 2.3253,
1150
+ "step": 1550
1151
+ },
1152
+ {
1153
+ "epoch": 2.31,
1154
+ "grad_norm": 3.3071987628936768,
1155
+ "learning_rate": 2.8256385244225926e-05,
1156
+ "loss": 2.6393,
1157
+ "step": 1560
1158
+ },
1159
+ {
1160
+ "epoch": 2.33,
1161
+ "grad_norm": 2.4905800819396973,
1162
+ "learning_rate": 2.802414246200781e-05,
1163
+ "loss": 2.1755,
1164
+ "step": 1570
1165
+ },
1166
+ {
1167
+ "epoch": 2.34,
1168
+ "grad_norm": 2.8511528968811035,
1169
+ "learning_rate": 2.7791634516273574e-05,
1170
+ "loss": 2.2376,
1171
+ "step": 1580
1172
+ },
1173
+ {
1174
+ "epoch": 2.36,
1175
+ "grad_norm": 2.7542080879211426,
1176
+ "learning_rate": 2.755888179383543e-05,
1177
+ "loss": 2.3509,
1178
+ "step": 1590
1179
+ },
1180
+ {
1181
+ "epoch": 2.37,
1182
+ "grad_norm": 3.257232189178467,
1183
+ "learning_rate": 2.7325904702968137e-05,
1184
+ "loss": 2.2619,
1185
+ "step": 1600
1186
+ },
1187
+ {
1188
+ "epoch": 2.37,
1189
+ "eval_loss": 2.294617176055908,
1190
+ "eval_runtime": 98.0772,
1191
+ "eval_samples_per_second": 6.118,
1192
+ "eval_steps_per_second": 3.059,
1193
+ "step": 1600
1194
+ },
1195
+ {
1196
+ "epoch": 2.39,
1197
+ "grad_norm": 2.707037925720215,
1198
+ "learning_rate": 2.7092723671619565e-05,
1199
+ "loss": 2.4258,
1200
+ "step": 1610
1201
+ },
1202
+ {
1203
+ "epoch": 2.4,
1204
+ "grad_norm": 3.6930806636810303,
1205
+ "learning_rate": 2.685935914561954e-05,
1206
+ "loss": 2.3555,
1207
+ "step": 1620
1208
+ },
1209
+ {
1210
+ "epoch": 2.41,
1211
+ "grad_norm": 1.9949381351470947,
1212
+ "learning_rate": 2.6625831586887116e-05,
1213
+ "loss": 2.3908,
1214
+ "step": 1630
1215
+ },
1216
+ {
1217
+ "epoch": 2.43,
1218
+ "grad_norm": 2.457606554031372,
1219
+ "learning_rate": 2.6392161471636413e-05,
1220
+ "loss": 2.2989,
1221
+ "step": 1640
1222
+ },
1223
+ {
1224
+ "epoch": 2.44,
1225
+ "grad_norm": 2.2386317253112793,
1226
+ "learning_rate": 2.615836928858122e-05,
1227
+ "loss": 2.6807,
1228
+ "step": 1650
1229
+ },
1230
+ {
1231
+ "epoch": 2.46,
1232
+ "grad_norm": 2.672177791595459,
1233
+ "learning_rate": 2.5924475537138497e-05,
1234
+ "loss": 2.1579,
1235
+ "step": 1660
1236
+ },
1237
+ {
1238
+ "epoch": 2.47,
1239
+ "grad_norm": 4.241297721862793,
1240
+ "learning_rate": 2.569050072563097e-05,
1241
+ "loss": 2.0706,
1242
+ "step": 1670
1243
+ },
1244
+ {
1245
+ "epoch": 2.49,
1246
+ "grad_norm": 2.5108397006988525,
1247
+ "learning_rate": 2.5456465369488864e-05,
1248
+ "loss": 2.4219,
1249
+ "step": 1680
1250
+ },
1251
+ {
1252
+ "epoch": 2.5,
1253
+ "grad_norm": 2.7684569358825684,
1254
+ "learning_rate": 2.5222389989451096e-05,
1255
+ "loss": 2.2234,
1256
+ "step": 1690
1257
+ },
1258
+ {
1259
+ "epoch": 2.52,
1260
+ "grad_norm": 3.104278087615967,
1261
+ "learning_rate": 2.4988295109765972e-05,
1262
+ "loss": 2.3018,
1263
+ "step": 1700
1264
+ },
1265
+ {
1266
+ "epoch": 2.53,
1267
+ "grad_norm": 3.235226631164551,
1268
+ "learning_rate": 2.4754201256391585e-05,
1269
+ "loss": 2.364,
1270
+ "step": 1710
1271
+ },
1272
+ {
1273
+ "epoch": 2.55,
1274
+ "grad_norm": 2.1415085792541504,
1275
+ "learning_rate": 2.4520128955196008e-05,
1276
+ "loss": 2.3683,
1277
+ "step": 1720
1278
+ },
1279
+ {
1280
+ "epoch": 2.56,
1281
+ "grad_norm": 2.5002896785736084,
1282
+ "learning_rate": 2.42860987301576e-05,
1283
+ "loss": 2.4247,
1284
+ "step": 1730
1285
+ },
1286
+ {
1287
+ "epoch": 2.58,
1288
+ "grad_norm": 2.8159451484680176,
1289
+ "learning_rate": 2.4052131101565364e-05,
1290
+ "loss": 2.3574,
1291
+ "step": 1740
1292
+ },
1293
+ {
1294
+ "epoch": 2.59,
1295
+ "grad_norm": 3.0876357555389404,
1296
+ "learning_rate": 2.3818246584219726e-05,
1297
+ "loss": 2.2649,
1298
+ "step": 1750
1299
+ },
1300
+ {
1301
+ "epoch": 2.61,
1302
+ "grad_norm": 2.8891026973724365,
1303
+ "learning_rate": 2.3584465685633738e-05,
1304
+ "loss": 2.4012,
1305
+ "step": 1760
1306
+ },
1307
+ {
1308
+ "epoch": 2.62,
1309
+ "grad_norm": 2.3504886627197266,
1310
+ "learning_rate": 2.335080890423491e-05,
1311
+ "loss": 2.3263,
1312
+ "step": 1770
1313
+ },
1314
+ {
1315
+ "epoch": 2.64,
1316
+ "grad_norm": 2.84779691696167,
1317
+ "learning_rate": 2.3117296727567897e-05,
1318
+ "loss": 2.4177,
1319
+ "step": 1780
1320
+ },
1321
+ {
1322
+ "epoch": 2.65,
1323
+ "grad_norm": 2.4880871772766113,
1324
+ "learning_rate": 2.288394963049807e-05,
1325
+ "loss": 2.3029,
1326
+ "step": 1790
1327
+ },
1328
+ {
1329
+ "epoch": 2.67,
1330
+ "grad_norm": 2.4965240955352783,
1331
+ "learning_rate": 2.2650788073416293e-05,
1332
+ "loss": 2.2876,
1333
+ "step": 1800
1334
+ },
1335
+ {
1336
+ "epoch": 2.67,
1337
+ "eval_loss": 2.287304401397705,
1338
+ "eval_runtime": 97.8793,
1339
+ "eval_samples_per_second": 6.13,
1340
+ "eval_steps_per_second": 3.065,
1341
+ "step": 1800
1342
+ },
1343
+ {
1344
+ "epoch": 2.68,
1345
+ "grad_norm": 3.6506803035736084,
1346
+ "learning_rate": 2.2417832500444827e-05,
1347
+ "loss": 2.2686,
1348
+ "step": 1810
1349
+ },
1350
+ {
1351
+ "epoch": 2.7,
1352
+ "grad_norm": 2.156888008117676,
1353
+ "learning_rate": 2.2185103337644833e-05,
1354
+ "loss": 2.4572,
1355
+ "step": 1820
1356
+ },
1357
+ {
1358
+ "epoch": 2.71,
1359
+ "grad_norm": 2.9049007892608643,
1360
+ "learning_rate": 2.1952620991225285e-05,
1361
+ "loss": 2.4824,
1362
+ "step": 1830
1363
+ },
1364
+ {
1365
+ "epoch": 2.73,
1366
+ "grad_norm": 3.4357845783233643,
1367
+ "learning_rate": 2.1720405845753792e-05,
1368
+ "loss": 2.3334,
1369
+ "step": 1840
1370
+ },
1371
+ {
1372
+ "epoch": 2.74,
1373
+ "grad_norm": 2.405451774597168,
1374
+ "learning_rate": 2.148847826236914e-05,
1375
+ "loss": 2.4271,
1376
+ "step": 1850
1377
+ },
1378
+ {
1379
+ "epoch": 2.76,
1380
+ "grad_norm": 2.0909016132354736,
1381
+ "learning_rate": 2.125685857699609e-05,
1382
+ "loss": 2.3499,
1383
+ "step": 1860
1384
+ },
1385
+ {
1386
+ "epoch": 2.77,
1387
+ "grad_norm": 3.3600564002990723,
1388
+ "learning_rate": 2.1025567098562177e-05,
1389
+ "loss": 2.2665,
1390
+ "step": 1870
1391
+ },
1392
+ {
1393
+ "epoch": 2.79,
1394
+ "grad_norm": 3.0894439220428467,
1395
+ "learning_rate": 2.0794624107217056e-05,
1396
+ "loss": 2.3211,
1397
+ "step": 1880
1398
+ },
1399
+ {
1400
+ "epoch": 2.8,
1401
+ "grad_norm": 2.564870834350586,
1402
+ "learning_rate": 2.056404985255424e-05,
1403
+ "loss": 2.3905,
1404
+ "step": 1890
1405
+ },
1406
+ {
1407
+ "epoch": 2.81,
1408
+ "grad_norm": 2.177769422531128,
1409
+ "learning_rate": 2.0333864551835602e-05,
1410
+ "loss": 2.4703,
1411
+ "step": 1900
1412
+ },
1413
+ {
1414
+ "epoch": 2.83,
1415
+ "grad_norm": 2.499175548553467,
1416
+ "learning_rate": 2.010408838821866e-05,
1417
+ "loss": 2.3287,
1418
+ "step": 1910
1419
+ },
1420
+ {
1421
+ "epoch": 2.84,
1422
+ "grad_norm": 3.076934337615967,
1423
+ "learning_rate": 1.987474150898691e-05,
1424
+ "loss": 2.3857,
1425
+ "step": 1920
1426
+ },
1427
+ {
1428
+ "epoch": 2.86,
1429
+ "grad_norm": 5.456985950469971,
1430
+ "learning_rate": 1.9645844023783206e-05,
1431
+ "loss": 2.3238,
1432
+ "step": 1930
1433
+ },
1434
+ {
1435
+ "epoch": 2.87,
1436
+ "grad_norm": 2.502319574356079,
1437
+ "learning_rate": 1.941741600284656e-05,
1438
+ "loss": 2.3027,
1439
+ "step": 1940
1440
+ },
1441
+ {
1442
+ "epoch": 2.89,
1443
+ "grad_norm": 3.1800990104675293,
1444
+ "learning_rate": 1.918947747525232e-05,
1445
+ "loss": 2.086,
1446
+ "step": 1950
1447
+ },
1448
+ {
1449
+ "epoch": 2.9,
1450
+ "grad_norm": 3.400146245956421,
1451
+ "learning_rate": 1.896204842715596e-05,
1452
+ "loss": 2.5469,
1453
+ "step": 1960
1454
+ },
1455
+ {
1456
+ "epoch": 2.92,
1457
+ "grad_norm": 2.4540586471557617,
1458
+ "learning_rate": 1.873514880004065e-05,
1459
+ "loss": 2.2501,
1460
+ "step": 1970
1461
+ },
1462
+ {
1463
+ "epoch": 2.93,
1464
+ "grad_norm": 3.094639778137207,
1465
+ "learning_rate": 1.8508798488968803e-05,
1466
+ "loss": 2.3037,
1467
+ "step": 1980
1468
+ },
1469
+ {
1470
+ "epoch": 2.95,
1471
+ "grad_norm": 2.7142815589904785,
1472
+ "learning_rate": 1.8283017340837517e-05,
1473
+ "loss": 2.2974,
1474
+ "step": 1990
1475
+ },
1476
+ {
1477
+ "epoch": 2.96,
1478
+ "grad_norm": 2.40388560295105,
1479
+ "learning_rate": 1.8057825152638478e-05,
1480
+ "loss": 2.2484,
1481
+ "step": 2000
1482
+ },
1483
+ {
1484
+ "epoch": 2.96,
1485
+ "eval_loss": 2.2887699604034424,
1486
+ "eval_runtime": 100.5569,
1487
+ "eval_samples_per_second": 5.967,
1488
+ "eval_steps_per_second": 2.983,
1489
+ "step": 2000
1490
+ },
1491
+ {
1492
+ "epoch": 2.98,
1493
+ "grad_norm": 3.5946638584136963,
1494
+ "learning_rate": 1.7833241669722015e-05,
1495
+ "loss": 2.2191,
1496
+ "step": 2010
1497
+ },
1498
+ {
1499
+ "epoch": 2.99,
1500
+ "grad_norm": 3.1359758377075195,
1501
+ "learning_rate": 1.760928658406587e-05,
1502
+ "loss": 2.4429,
1503
+ "step": 2020
1504
+ },
1505
+ {
1506
+ "epoch": 3.01,
1507
+ "grad_norm": 3.105003833770752,
1508
+ "learning_rate": 1.738597953254848e-05,
1509
+ "loss": 2.3241,
1510
+ "step": 2030
1511
+ },
1512
+ {
1513
+ "epoch": 3.02,
1514
+ "grad_norm": 2.3050458431243896,
1515
+ "learning_rate": 1.716334009522726e-05,
1516
+ "loss": 2.3608,
1517
+ "step": 2040
1518
+ },
1519
+ {
1520
+ "epoch": 3.04,
1521
+ "grad_norm": 2.2272346019744873,
1522
+ "learning_rate": 1.6941387793621673e-05,
1523
+ "loss": 2.3107,
1524
+ "step": 2050
1525
+ },
1526
+ {
1527
+ "epoch": 3.05,
1528
+ "grad_norm": 2.0482161045074463,
1529
+ "learning_rate": 1.672014208900165e-05,
1530
+ "loss": 2.1823,
1531
+ "step": 2060
1532
+ },
1533
+ {
1534
+ "epoch": 3.07,
1535
+ "grad_norm": 2.0835390090942383,
1536
+ "learning_rate": 1.6499622380681096e-05,
1537
+ "loss": 2.1622,
1538
+ "step": 2070
1539
+ },
1540
+ {
1541
+ "epoch": 3.08,
1542
+ "grad_norm": 3.687225103378296,
1543
+ "learning_rate": 1.6279848004316972e-05,
1544
+ "loss": 2.3643,
1545
+ "step": 2080
1546
+ },
1547
+ {
1548
+ "epoch": 3.1,
1549
+ "grad_norm": 3.2139699459075928,
1550
+ "learning_rate": 1.6060838230213883e-05,
1551
+ "loss": 2.2241,
1552
+ "step": 2090
1553
+ },
1554
+ {
1555
+ "epoch": 3.11,
1556
+ "grad_norm": 3.513046979904175,
1557
+ "learning_rate": 1.5842612261634392e-05,
1558
+ "loss": 2.311,
1559
+ "step": 2100
1560
+ },
1561
+ {
1562
+ "epoch": 3.13,
1563
+ "grad_norm": 2.698282241821289,
1564
+ "learning_rate": 1.5625189233115282e-05,
1565
+ "loss": 2.4009,
1566
+ "step": 2110
1567
+ },
1568
+ {
1569
+ "epoch": 3.14,
1570
+ "grad_norm": 2.889256238937378,
1571
+ "learning_rate": 1.5408588208789733e-05,
1572
+ "loss": 2.2708,
1573
+ "step": 2120
1574
+ },
1575
+ {
1576
+ "epoch": 3.16,
1577
+ "grad_norm": 2.4953372478485107,
1578
+ "learning_rate": 1.5192828180715824e-05,
1579
+ "loss": 2.2726,
1580
+ "step": 2130
1581
+ },
1582
+ {
1583
+ "epoch": 3.17,
1584
+ "grad_norm": 3.368839740753174,
1585
+ "learning_rate": 1.4977928067211178e-05,
1586
+ "loss": 2.0851,
1587
+ "step": 2140
1588
+ },
1589
+ {
1590
+ "epoch": 3.19,
1591
+ "grad_norm": 3.3648414611816406,
1592
+ "learning_rate": 1.4763906711194229e-05,
1593
+ "loss": 2.11,
1594
+ "step": 2150
1595
+ },
1596
+ {
1597
+ "epoch": 3.2,
1598
+ "grad_norm": 2.7509853839874268,
1599
+ "learning_rate": 1.4550782878531972e-05,
1600
+ "loss": 2.3487,
1601
+ "step": 2160
1602
+ },
1603
+ {
1604
+ "epoch": 3.21,
1605
+ "grad_norm": 3.071002721786499,
1606
+ "learning_rate": 1.4338575256394612e-05,
1607
+ "loss": 2.2536,
1608
+ "step": 2170
1609
+ },
1610
+ {
1611
+ "epoch": 3.23,
1612
+ "grad_norm": 2.7192609310150146,
1613
+ "learning_rate": 1.4127302451616936e-05,
1614
+ "loss": 2.2367,
1615
+ "step": 2180
1616
+ },
1617
+ {
1618
+ "epoch": 3.24,
1619
+ "grad_norm": 5.182852268218994,
1620
+ "learning_rate": 1.3916982989066915e-05,
1621
+ "loss": 2.0933,
1622
+ "step": 2190
1623
+ },
1624
+ {
1625
+ "epoch": 3.26,
1626
+ "grad_norm": 2.816575527191162,
1627
+ "learning_rate": 1.370763531002132e-05,
1628
+ "loss": 2.4534,
1629
+ "step": 2200
1630
+ },
1631
+ {
1632
+ "epoch": 3.26,
1633
+ "eval_loss": 2.2867271900177,
1634
+ "eval_runtime": 98.3879,
1635
+ "eval_samples_per_second": 6.098,
1636
+ "eval_steps_per_second": 3.049,
1637
+ "step": 2200
1638
+ },
1639
+ {
1640
+ "epoch": 3.27,
1641
+ "grad_norm": 2.238353967666626,
1642
+ "learning_rate": 1.3499277770548823e-05,
1643
+ "loss": 2.3927,
1644
+ "step": 2210
1645
+ },
1646
+ {
1647
+ "epoch": 3.29,
1648
+ "grad_norm": 3.0616304874420166,
1649
+ "learning_rate": 1.3291928639900436e-05,
1650
+ "loss": 2.3978,
1651
+ "step": 2220
1652
+ },
1653
+ {
1654
+ "epoch": 3.3,
1655
+ "grad_norm": 3.807537317276001,
1656
+ "learning_rate": 1.3085606098907682e-05,
1657
+ "loss": 2.1303,
1658
+ "step": 2230
1659
+ },
1660
+ {
1661
+ "epoch": 3.32,
1662
+ "grad_norm": 3.6472954750061035,
1663
+ "learning_rate": 1.2880328238388393e-05,
1664
+ "loss": 2.3277,
1665
+ "step": 2240
1666
+ },
1667
+ {
1668
+ "epoch": 3.33,
1669
+ "grad_norm": 3.634000301361084,
1670
+ "learning_rate": 1.2676113057560515e-05,
1671
+ "loss": 2.358,
1672
+ "step": 2250
1673
+ },
1674
+ {
1675
+ "epoch": 3.35,
1676
+ "grad_norm": 5.468724727630615,
1677
+ "learning_rate": 1.2472978462463874e-05,
1678
+ "loss": 2.4583,
1679
+ "step": 2260
1680
+ },
1681
+ {
1682
+ "epoch": 3.36,
1683
+ "grad_norm": 3.201179265975952,
1684
+ "learning_rate": 1.2270942264390174e-05,
1685
+ "loss": 2.2543,
1686
+ "step": 2270
1687
+ },
1688
+ {
1689
+ "epoch": 3.38,
1690
+ "grad_norm": 2.4082183837890625,
1691
+ "learning_rate": 1.2070022178321186e-05,
1692
+ "loss": 2.2401,
1693
+ "step": 2280
1694
+ },
1695
+ {
1696
+ "epoch": 3.39,
1697
+ "grad_norm": 3.068176031112671,
1698
+ "learning_rate": 1.1870235821375553e-05,
1699
+ "loss": 2.3446,
1700
+ "step": 2290
1701
+ },
1702
+ {
1703
+ "epoch": 3.41,
1704
+ "grad_norm": 2.7859139442443848,
1705
+ "learning_rate": 1.1671600711263991e-05,
1706
+ "loss": 2.3761,
1707
+ "step": 2300
1708
+ },
1709
+ {
1710
+ "epoch": 3.42,
1711
+ "grad_norm": 2.423513650894165,
1712
+ "learning_rate": 1.1474134264753384e-05,
1713
+ "loss": 2.2563,
1714
+ "step": 2310
1715
+ },
1716
+ {
1717
+ "epoch": 3.44,
1718
+ "grad_norm": 3.0830302238464355,
1719
+ "learning_rate": 1.1277853796139554e-05,
1720
+ "loss": 2.2455,
1721
+ "step": 2320
1722
+ },
1723
+ {
1724
+ "epoch": 3.45,
1725
+ "grad_norm": 3.237128734588623,
1726
+ "learning_rate": 1.1082776515729201e-05,
1727
+ "loss": 2.3861,
1728
+ "step": 2330
1729
+ },
1730
+ {
1731
+ "epoch": 3.47,
1732
+ "grad_norm": 2.9493908882141113,
1733
+ "learning_rate": 1.0888919528330777e-05,
1734
+ "loss": 2.0657,
1735
+ "step": 2340
1736
+ },
1737
+ {
1738
+ "epoch": 3.48,
1739
+ "grad_norm": 3.7209978103637695,
1740
+ "learning_rate": 1.0696299831754753e-05,
1741
+ "loss": 2.4492,
1742
+ "step": 2350
1743
+ },
1744
+ {
1745
+ "epoch": 3.5,
1746
+ "grad_norm": 2.332671880722046,
1747
+ "learning_rate": 1.0504934315323181e-05,
1748
+ "loss": 2.3108,
1749
+ "step": 2360
1750
+ },
1751
+ {
1752
+ "epoch": 3.51,
1753
+ "grad_norm": 3.0048203468322754,
1754
+ "learning_rate": 1.0314839758388859e-05,
1755
+ "loss": 2.5104,
1756
+ "step": 2370
1757
+ },
1758
+ {
1759
+ "epoch": 3.53,
1760
+ "grad_norm": 3.380918502807617,
1761
+ "learning_rate": 1.0126032828863982e-05,
1762
+ "loss": 2.3024,
1763
+ "step": 2380
1764
+ },
1765
+ {
1766
+ "epoch": 3.54,
1767
+ "grad_norm": 2.693096876144409,
1768
+ "learning_rate": 9.938530081758764e-06,
1769
+ "loss": 2.3422,
1770
+ "step": 2390
1771
+ },
1772
+ {
1773
+ "epoch": 3.56,
1774
+ "grad_norm": 2.64311146736145,
1775
+ "learning_rate": 9.752347957729804e-06,
1776
+ "loss": 2.3934,
1777
+ "step": 2400
1778
+ },
1779
+ {
1780
+ "epoch": 3.56,
1781
+ "eval_loss": 2.2875092029571533,
1782
+ "eval_runtime": 98.3803,
1783
+ "eval_samples_per_second": 6.099,
1784
+ "eval_steps_per_second": 3.049,
1785
+ "step": 2400
1786
+ },
1787
+ {
1788
+ "epoch": 3.57,
1789
+ "grad_norm": 2.693216562271118,
1790
+ "learning_rate": 9.567502781638516e-06,
1791
+ "loss": 2.3249,
1792
+ "step": 2410
1793
+ },
1794
+ {
1795
+ "epoch": 3.59,
1796
+ "grad_norm": 2.68764066696167,
1797
+ "learning_rate": 9.384010761119787e-06,
1798
+ "loss": 2.2552,
1799
+ "step": 2420
1800
+ },
1801
+ {
1802
+ "epoch": 3.6,
1803
+ "grad_norm": 4.221546649932861,
1804
+ "learning_rate": 9.201887985160804e-06,
1805
+ "loss": 2.3362,
1806
+ "step": 2430
1807
+ },
1808
+ {
1809
+ "epoch": 3.61,
1810
+ "grad_norm": 2.777925729751587,
1811
+ "learning_rate": 9.039161391719244e-06,
1812
+ "loss": 2.3256,
1813
+ "step": 2440
1814
+ },
1815
+ {
1816
+ "epoch": 3.63,
1817
+ "grad_norm": 2.7611911296844482,
1818
+ "learning_rate": 8.859684074465835e-06,
1819
+ "loss": 2.2209,
1820
+ "step": 2450
1821
+ },
1822
+ {
1823
+ "epoch": 3.64,
1824
+ "grad_norm": 2.7354393005371094,
1825
+ "learning_rate": 8.681621975898577e-06,
1826
+ "loss": 2.1957,
1827
+ "step": 2460
1828
+ },
1829
+ {
1830
+ "epoch": 3.66,
1831
+ "grad_norm": 2.901160478591919,
1832
+ "learning_rate": 8.504990708897056e-06,
1833
+ "loss": 2.2935,
1834
+ "step": 2470
1835
+ },
1836
+ {
1837
+ "epoch": 3.67,
1838
+ "grad_norm": 2.4266469478607178,
1839
+ "learning_rate": 8.329805760882403e-06,
1840
+ "loss": 2.307,
1841
+ "step": 2480
1842
+ },
1843
+ {
1844
+ "epoch": 3.69,
1845
+ "grad_norm": 2.3984947204589844,
1846
+ "learning_rate": 8.156082492459257e-06,
1847
+ "loss": 2.3943,
1848
+ "step": 2490
1849
+ },
1850
+ {
1851
+ "epoch": 3.7,
1852
+ "grad_norm": 2.6705055236816406,
1853
+ "learning_rate": 7.983836136068984e-06,
1854
+ "loss": 2.3774,
1855
+ "step": 2500
1856
+ },
1857
+ {
1858
+ "epoch": 3.72,
1859
+ "grad_norm": 3.173973321914673,
1860
+ "learning_rate": 7.813081794653995e-06,
1861
+ "loss": 2.2757,
1862
+ "step": 2510
1863
+ },
1864
+ {
1865
+ "epoch": 3.73,
1866
+ "grad_norm": 3.104217529296875,
1867
+ "learning_rate": 7.643834440333553e-06,
1868
+ "loss": 2.2961,
1869
+ "step": 2520
1870
+ },
1871
+ {
1872
+ "epoch": 3.75,
1873
+ "grad_norm": 3.088330030441284,
1874
+ "learning_rate": 7.476108913090915e-06,
1875
+ "loss": 2.2001,
1876
+ "step": 2530
1877
+ },
1878
+ {
1879
+ "epoch": 3.76,
1880
+ "grad_norm": 3.717886447906494,
1881
+ "learning_rate": 7.309919919472208e-06,
1882
+ "loss": 2.1859,
1883
+ "step": 2540
1884
+ },
1885
+ {
1886
+ "epoch": 3.78,
1887
+ "grad_norm": 3.12146258354187,
1888
+ "learning_rate": 7.145282031296841e-06,
1889
+ "loss": 2.2422,
1890
+ "step": 2550
1891
+ },
1892
+ {
1893
+ "epoch": 3.79,
1894
+ "grad_norm": 3.212069272994995,
1895
+ "learning_rate": 6.982209684379892e-06,
1896
+ "loss": 2.1191,
1897
+ "step": 2560
1898
+ },
1899
+ {
1900
+ "epoch": 3.81,
1901
+ "grad_norm": 3.0071327686309814,
1902
+ "learning_rate": 6.8207171772662976e-06,
1903
+ "loss": 2.1472,
1904
+ "step": 2570
1905
+ },
1906
+ {
1907
+ "epoch": 3.82,
1908
+ "grad_norm": 3.0386757850646973,
1909
+ "learning_rate": 6.660818669977134e-06,
1910
+ "loss": 2.3547,
1911
+ "step": 2580
1912
+ },
1913
+ {
1914
+ "epoch": 3.84,
1915
+ "grad_norm": 2.2448551654815674,
1916
+ "learning_rate": 6.5025281827680335e-06,
1917
+ "loss": 2.2866,
1918
+ "step": 2590
1919
+ },
1920
+ {
1921
+ "epoch": 3.85,
1922
+ "grad_norm": 3.914092779159546,
1923
+ "learning_rate": 6.345859594899886e-06,
1924
+ "loss": 2.3713,
1925
+ "step": 2600
1926
+ },
1927
+ {
1928
+ "epoch": 3.85,
1929
+ "eval_loss": 2.2858927249908447,
1930
+ "eval_runtime": 98.5519,
1931
+ "eval_samples_per_second": 6.088,
1932
+ "eval_steps_per_second": 3.044,
1933
+ "step": 2600
1934
+ },
1935
+ {
1936
+ "epoch": 3.87,
1937
+ "grad_norm": 2.350407600402832,
1938
+ "learning_rate": 6.1908266434218235e-06,
+ "loss": 2.2876,
+ "step": 2610
+ },
+ {
+ "epoch": 3.88,
+ "grad_norm": 2.7468740940093994,
+ "learning_rate": 6.037442921966771e-06,
+ "loss": 2.3253,
+ "step": 2620
+ },
+ {
+ "epoch": 3.9,
+ "grad_norm": 3.026620626449585,
+ "learning_rate": 5.885721879559514e-06,
+ "loss": 2.2033,
+ "step": 2630
+ },
+ {
+ "epoch": 3.91,
+ "grad_norm": 2.2398440837860107,
+ "learning_rate": 5.735676819437425e-06,
+ "loss": 2.317,
+ "step": 2640
+ },
+ {
+ "epoch": 3.93,
+ "grad_norm": 2.4124555587768555,
+ "learning_rate": 5.587320897884066e-06,
+ "loss": 2.284,
+ "step": 2650
+ },
+ {
+ "epoch": 3.94,
+ "grad_norm": 2.4340391159057617,
+ "learning_rate": 5.440667123075558e-06,
+ "loss": 2.3012,
+ "step": 2660
+ },
+ {
+ "epoch": 3.96,
+ "grad_norm": 2.7486612796783447,
+ "learning_rate": 5.295728353940038e-06,
+ "loss": 2.5206,
+ "step": 2670
+ },
+ {
+ "epoch": 3.97,
+ "grad_norm": 2.5270161628723145,
+ "learning_rate": 5.152517299030127e-06,
+ "loss": 2.5541,
+ "step": 2680
+ },
+ {
+ "epoch": 3.99,
+ "grad_norm": 3.589616298675537,
+ "learning_rate": 5.011046515408657e-06,
+ "loss": 2.3475,
+ "step": 2690
+ },
+ {
+ "epoch": 4.0,
+ "grad_norm": 2.27154541015625,
+ "learning_rate": 4.871328407547587e-06,
+ "loss": 2.4889,
+ "step": 2700
+ },
+ {
+ "epoch": 4.01,
+ "grad_norm": 2.9212470054626465,
+ "learning_rate": 4.733375226240408e-06,
+ "loss": 2.4318,
+ "step": 2710
+ },
+ {
+ "epoch": 4.03,
+ "grad_norm": 3.5055928230285645,
+ "learning_rate": 4.597199067527907e-06,
+ "loss": 2.1628,
+ "step": 2720
+ },
+ {
+ "epoch": 4.04,
+ "grad_norm": 2.520705223083496,
+ "learning_rate": 4.462811871637618e-06,
+ "loss": 2.0723,
+ "step": 2730
+ },
+ {
+ "epoch": 4.06,
+ "grad_norm": 3.2816295623779297,
+ "learning_rate": 4.330225421936823e-06,
+ "loss": 2.2386,
+ "step": 2740
+ },
+ {
+ "epoch": 4.07,
+ "grad_norm": 3.64699649810791,
+ "learning_rate": 4.1994513438994156e-06,
+ "loss": 2.1135,
+ "step": 2750
+ },
+ {
+ "epoch": 4.09,
+ "grad_norm": 3.278775930404663,
+ "learning_rate": 4.070501104086488e-06,
+ "loss": 2.2199,
+ "step": 2760
+ },
+ {
+ "epoch": 4.1,
+ "grad_norm": 1.8690662384033203,
+ "learning_rate": 3.943386009140984e-06,
+ "loss": 2.2364,
+ "step": 2770
+ },
+ {
+ "epoch": 4.12,
+ "grad_norm": 3.0856454372406006,
+ "learning_rate": 3.818117204796262e-06,
+ "loss": 2.0439,
+ "step": 2780
+ },
+ {
+ "epoch": 4.13,
+ "grad_norm": 3.822516679763794,
+ "learning_rate": 3.694705674898827e-06,
+ "loss": 2.2703,
+ "step": 2790
+ },
+ {
+ "epoch": 4.15,
+ "grad_norm": 5.145390510559082,
+ "learning_rate": 3.573162240445238e-06,
+ "loss": 2.3365,
+ "step": 2800
+ },
+ {
+ "epoch": 4.15,
+ "eval_loss": 2.2857470512390137,
+ "eval_runtime": 98.4481,
+ "eval_samples_per_second": 6.095,
+ "eval_steps_per_second": 3.047,
+ "step": 2800
+ }
+ ],
+ "logging_steps": 10,
+ "max_steps": 3375,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 5,
+ "save_steps": 200,
+ "total_flos": 7.882780555954094e+17,
+ "train_batch_size": 2,
+ "trial_name": null,
+ "trial_params": null
+ }
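The records above follow the layout of a Hugging Face `Trainer` state file: training records carry `loss`, `grad_norm` and `learning_rate`, while evaluation records carry `eval_*` metrics at the same `step`. The sketch below is one way to sanity-check such a file. It assumes the list is the standard `log_history` field and uses a hand-copied excerpt instead of reading the file from disk; `STATE_EXCERPT` and `summarize` are illustrative names. It shows that `max_steps / num_train_epochs` gives 675 optimizer steps per epoch (so step 2700 lands exactly on epoch 4.0), and that the evaluation throughput fields imply roughly 600 evaluation samples at about 2 samples per evaluation step.

```python
import json

# Excerpt of the trainer_state.json diff above (values copied verbatim);
# a real script would json.load() the complete file instead.
STATE_EXCERPT = """
{
  "log_history": [
    {"epoch": 4.0, "grad_norm": 2.27154541015625,
     "learning_rate": 4.871328407547587e-06, "loss": 2.4889, "step": 2700},
    {"epoch": 4.15, "grad_norm": 5.145390510559082,
     "learning_rate": 3.573162240445238e-06, "loss": 2.3365, "step": 2800},
    {"epoch": 4.15, "eval_loss": 2.2857470512390137, "eval_runtime": 98.4481,
     "eval_samples_per_second": 6.095, "eval_steps_per_second": 3.047,
     "step": 2800}
  ],
  "logging_steps": 10,
  "max_steps": 3375,
  "num_train_epochs": 5,
  "save_steps": 200,
  "train_batch_size": 2
}
"""


def summarize(state: dict) -> dict:
    """Split log_history into train/eval records and derive a few figures."""
    history = state["log_history"]
    # Training records carry "loss"; evaluation records carry "eval_loss".
    train = [r for r in history if "loss" in r]
    evals = [r for r in history if "eval_loss" in r]

    # Optimizer steps per epoch, assuming max_steps was derived from
    # num_train_epochs (3375 / 5 = 675).
    steps_per_epoch = state["max_steps"] / state["num_train_epochs"]

    # The logged "epoch" should equal step / steps_per_epoch rounded to 2 dp.
    epochs_consistent = all(
        round(r["step"] / steps_per_epoch, 2) == r["epoch"] for r in history
    )

    # Eval-set size and per-step batch size implied by the throughput fields.
    last_eval = evals[-1]
    eval_samples = last_eval["eval_runtime"] * last_eval["eval_samples_per_second"]
    eval_batches = last_eval["eval_runtime"] * last_eval["eval_steps_per_second"]

    return {
        "train_records": len(train),
        "eval_records": len(evals),
        "steps_per_epoch": steps_per_epoch,
        "epochs_consistent": epochs_consistent,
        "approx_eval_samples": round(eval_samples),
        "approx_eval_batch_size": round(eval_samples / eval_batches),
        # True when the last logged step is a multiple of save_steps.
        "last_step_on_save_boundary": history[-1]["step"] % state["save_steps"] == 0,
    }


if __name__ == "__main__":
    print(summarize(json.loads(STATE_EXCERPT)))
```

Records are told apart by their keys rather than by position on purpose: evaluation and training entries share one list, and an evaluation record can carry the same `step` as a training record, as at step 2800 here.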
Yi-1.5-9B-Chat-LoRA/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:63f70fe831711cdfd1f28877e724899c1779d7deea9ba996bb7d43de2c0034eb
+ size 5112
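`training_args.bin` is committed through Git LFS, so the diff shows only a three-line pointer file (spec v1: `version`, an `oid` written as `<algorithm>:<hex digest>`, and the object `size` in bytes) rather than the 5112-byte binary itself. A minimal sketch for reading such a pointer and checking a downloaded object against it follows; `LfsPointer`, `parse_lfs_pointer` and `matches` are hypothetical helper names, not part of any Git LFS tooling.

```python
import hashlib
from dataclasses import dataclass

# Pointer text copied from the training_args.bin diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:63f70fe831711cdfd1f28877e724899c1779d7deea9ba996bb7d43de2c0034eb
size 5112
"""


@dataclass(frozen=True)
class LfsPointer:
    version: str    # spec URL identifying the pointer format
    algorithm: str  # hash algorithm named in the oid, e.g. "sha256"
    digest: str     # hex digest of the real object's contents
    size: int       # size of the real object in bytes


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse a Git LFS pointer made of 'key value' lines."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return LfsPointer(
        version=fields["version"],
        algorithm=algorithm,
        digest=digest,
        size=int(fields["size"]),
    )


def matches(pointer: LfsPointer, blob: bytes) -> bool:
    """Return True if blob has the size and hash recorded in the pointer."""
    if len(blob) != pointer.size:
        return False
    return hashlib.new(pointer.algorithm, blob).hexdigest() == pointer.digest


if __name__ == "__main__":
    print(parse_lfs_pointer(POINTER_TEXT))
```

The size check runs first because it is cheap; one common cause of a `False` result is having the small pointer text on disk instead of the fetched binary.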