CreatorPhan committed
Commit e93c715
1 Parent(s): 7801575

Upload folder using huggingface_hub
README.md CHANGED
@@ -1,6 +1,203 @@
1
  ---
2
  library_name: peft
3
  ---
4
  ## Training procedure
5
 
6
 
@@ -15,6 +212,7 @@ The following `bitsandbytes` quantization config was used during training:
15
  - bnb_4bit_quant_type: fp4
16
  - bnb_4bit_use_double_quant: False
17
  - bnb_4bit_compute_dtype: float32
18
  ### Framework versions
19
 
20
 
 
1
  ---
2
  library_name: peft
3
+ base_model: bigscience/bloomz-3b
4
  ---
5
+
6
+ # Model Card for Model ID
7
+
8
+ <!-- Provide a quick summary of what the model is/does. -->
9
+
10
+
11
+
12
+ ## Model Details
13
+
14
+ ### Model Description
15
+
16
+ <!-- Provide a longer summary of what this model is. -->
17
+
18
+
19
+
20
+ - **Developed by:** [More Information Needed]
21
+ - **Shared by [optional]:** [More Information Needed]
22
+ - **Model type:** [More Information Needed]
23
+ - **Language(s) (NLP):** [More Information Needed]
24
+ - **License:** [More Information Needed]
25
+ - **Finetuned from model [optional]:** [More Information Needed]
26
+
27
+ ### Model Sources [optional]
28
+
29
+ <!-- Provide the basic links for the model. -->
30
+
31
+ - **Repository:** [More Information Needed]
32
+ - **Paper [optional]:** [More Information Needed]
33
+ - **Demo [optional]:** [More Information Needed]
34
+
35
+ ## Uses
36
+
37
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
38
+
39
+ ### Direct Use
40
+
41
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
42
+
43
+ [More Information Needed]
44
+
45
+ ### Downstream Use [optional]
46
+
47
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
48
+
49
+ [More Information Needed]
50
+
51
+ ### Out-of-Scope Use
52
+
53
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
54
+
55
+ [More Information Needed]
56
+
57
+ ## Bias, Risks, and Limitations
58
+
59
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
60
+
61
+ [More Information Needed]
62
+
63
+ ### Recommendations
64
+
65
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
66
+
67
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
68
+
69
+ ## How to Get Started with the Model
70
+
71
+ Use the code below to get started with the model.
72
+
73
+ [More Information Needed]
74
+
75
+ ## Training Details
76
+
77
+ ### Training Data
78
+
79
+ <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
80
+
81
+ [More Information Needed]
82
+
83
+ ### Training Procedure
84
+
85
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
86
+
87
+ #### Preprocessing [optional]
88
+
89
+ [More Information Needed]
90
+
91
+
92
+ #### Training Hyperparameters
93
+
94
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
95
+
96
+ #### Speeds, Sizes, Times [optional]
97
+
98
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
99
+
100
+ [More Information Needed]
101
+
102
+ ## Evaluation
103
+
104
+ <!-- This section describes the evaluation protocols and provides the results. -->
105
+
106
+ ### Testing Data, Factors & Metrics
107
+
108
+ #### Testing Data
109
+
110
+ <!-- This should link to a Data Card if possible. -->
111
+
112
+ [More Information Needed]
113
+
114
+ #### Factors
115
+
116
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
117
+
118
+ [More Information Needed]
119
+
120
+ #### Metrics
121
+
122
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
123
+
124
+ [More Information Needed]
125
+
126
+ ### Results
127
+
128
+ [More Information Needed]
129
+
130
+ #### Summary
131
+
132
+
133
+
134
+ ## Model Examination [optional]
135
+
136
+ <!-- Relevant interpretability work for the model goes here -->
137
+
138
+ [More Information Needed]
139
+
140
+ ## Environmental Impact
141
+
142
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
143
+
144
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
145
+
146
+ - **Hardware Type:** [More Information Needed]
147
+ - **Hours used:** [More Information Needed]
148
+ - **Cloud Provider:** [More Information Needed]
149
+ - **Compute Region:** [More Information Needed]
150
+ - **Carbon Emitted:** [More Information Needed]
151
+
152
+ ## Technical Specifications [optional]
153
+
154
+ ### Model Architecture and Objective
155
+
156
+ [More Information Needed]
157
+
158
+ ### Compute Infrastructure
159
+
160
+ [More Information Needed]
161
+
162
+ #### Hardware
163
+
164
+ [More Information Needed]
165
+
166
+ #### Software
167
+
168
+ [More Information Needed]
169
+
170
+ ## Citation [optional]
171
+
172
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
173
+
174
+ **BibTeX:**
175
+
176
+ [More Information Needed]
177
+
178
+ **APA:**
179
+
180
+ [More Information Needed]
181
+
182
+ ## Glossary [optional]
183
+
184
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
185
+
186
+ [More Information Needed]
187
+
188
+ ## More Information [optional]
189
+
190
+ [More Information Needed]
191
+
192
+ ## Model Card Authors [optional]
193
+
194
+ [More Information Needed]
195
+
196
+ ## Model Card Contact
197
+
198
+ [More Information Needed]
199
+
200
+
201
  ## Training procedure
202
 
203
 
 
212
  - bnb_4bit_quant_type: fp4
213
  - bnb_4bit_use_double_quant: False
214
  - bnb_4bit_compute_dtype: float32
215
+
216
  ### Framework versions
217
 
218
 
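The new card pins `base_model: bigscience/bloomz-3b`, so the standard PEFT loading pattern applies. A minimal sketch, assuming the adapter targets that base model with the 8-bit bitsandbytes config recorded in the card; `CreatorPhan/ADAPTER_REPO` is a hypothetical repository id, not one confirmed by this commit:

```python
# Sketch: load the base model in 8-bit (matching load_in_8bit: True in the
# card) and attach the uploaded PEFT adapter. The adapter repo id below is
# a placeholder -- substitute the actual Hub id of this repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "bigscience/bloomz-3b"
adapter_id = "CreatorPhan/ADAPTER_REPO"  # hypothetical id

base = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = PeftModel.from_pretrained(base, adapter_id)  # wraps the base model with the adapter
model.eval()

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
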
adapter_model.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:f054a50c1c97a69a96573ad64c2b580cacf6e043599b8e7a34409e206f02b3e6
3
  size 39409357
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2e40b7b6ec13002db8ba2108993b9af6829adb99a2fddff2d6b009c40f622d55
3
  size 39409357
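
The weights file is stored as a Git LFS pointer; only the `oid` changes between revisions while `size` stays 39409357 bytes. A short sketch, using only the standard library, of checking a downloaded file against the pointer shown above:

```python
# Sketch: verify a downloaded LFS object against its pointer fields
# (oid sha256:<hex> and size <bytes>). Values are taken from the new
# adapter_model.bin pointer in this commit.
import hashlib
from pathlib import Path

def verify_lfs_object(path: str, expected_oid: str, expected_size: int) -> bool:
    data = Path(path).read_bytes()
    return (len(data) == expected_size
            and hashlib.sha256(data).hexdigest() == expected_oid)

print(verify_lfs_object(
    "adapter_model.bin",
    "2e40b7b6ec13002db8ba2108993b9af6829adb99a2fddff2d6b009c40f622d55",
    39409357,
))
```
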
checkpoint-100/README.md CHANGED
@@ -1,6 +1,203 @@
1
  ---
2
  library_name: peft
3
  ---
4
  ## Training procedure
5
 
6
 
@@ -16,6 +213,13 @@ The following `bitsandbytes` quantization config was used during training:
16
  - bnb_4bit_use_double_quant: False
17
  - bnb_4bit_compute_dtype: float32
18
 
19
  The following `bitsandbytes` quantization config was used during training:
20
  - quant_method: bitsandbytes
21
  - load_in_8bit: True
@@ -27,8 +231,8 @@ The following `bitsandbytes` quantization config was used during training:
27
  - bnb_4bit_quant_type: fp4
28
  - bnb_4bit_use_double_quant: False
29
  - bnb_4bit_compute_dtype: float32
30
  ### Framework versions
31
 
32
- - PEFT 0.6.0.dev0
33
 
34
  - PEFT 0.6.0.dev0
 
1
  ---
2
  library_name: peft
3
+ base_model: bigscience/bloomz-3b
4
  ---
5
+
6
+ # Model Card for Model ID
7
+
8
+ <!-- Provide a quick summary of what the model is/does. -->
9
+
10
+
11
+
12
+ ## Model Details
13
+
14
+ ### Model Description
15
+
16
+ <!-- Provide a longer summary of what this model is. -->
17
+
18
+
19
+
20
+ - **Developed by:** [More Information Needed]
21
+ - **Shared by [optional]:** [More Information Needed]
22
+ - **Model type:** [More Information Needed]
23
+ - **Language(s) (NLP):** [More Information Needed]
24
+ - **License:** [More Information Needed]
25
+ - **Finetuned from model [optional]:** [More Information Needed]
26
+
27
+ ### Model Sources [optional]
28
+
29
+ <!-- Provide the basic links for the model. -->
30
+
31
+ - **Repository:** [More Information Needed]
32
+ - **Paper [optional]:** [More Information Needed]
33
+ - **Demo [optional]:** [More Information Needed]
34
+
35
+ ## Uses
36
+
37
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
38
+
39
+ ### Direct Use
40
+
41
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
42
+
43
+ [More Information Needed]
44
+
45
+ ### Downstream Use [optional]
46
+
47
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
48
+
49
+ [More Information Needed]
50
+
51
+ ### Out-of-Scope Use
52
+
53
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
54
+
55
+ [More Information Needed]
56
+
57
+ ## Bias, Risks, and Limitations
58
+
59
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
60
+
61
+ [More Information Needed]
62
+
63
+ ### Recommendations
64
+
65
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
66
+
67
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
68
+
69
+ ## How to Get Started with the Model
70
+
71
+ Use the code below to get started with the model.
72
+
73
+ [More Information Needed]
74
+
75
+ ## Training Details
76
+
77
+ ### Training Data
78
+
79
+ <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
80
+
81
+ [More Information Needed]
82
+
83
+ ### Training Procedure
84
+
85
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
86
+
87
+ #### Preprocessing [optional]
88
+
89
+ [More Information Needed]
90
+
91
+
92
+ #### Training Hyperparameters
93
+
94
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
95
+
96
+ #### Speeds, Sizes, Times [optional]
97
+
98
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
99
+
100
+ [More Information Needed]
101
+
102
+ ## Evaluation
103
+
104
+ <!-- This section describes the evaluation protocols and provides the results. -->
105
+
106
+ ### Testing Data, Factors & Metrics
107
+
108
+ #### Testing Data
109
+
110
+ <!-- This should link to a Data Card if possible. -->
111
+
112
+ [More Information Needed]
113
+
114
+ #### Factors
115
+
116
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
117
+
118
+ [More Information Needed]
119
+
120
+ #### Metrics
121
+
122
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
123
+
124
+ [More Information Needed]
125
+
126
+ ### Results
127
+
128
+ [More Information Needed]
129
+
130
+ #### Summary
131
+
132
+
133
+
134
+ ## Model Examination [optional]
135
+
136
+ <!-- Relevant interpretability work for the model goes here -->
137
+
138
+ [More Information Needed]
139
+
140
+ ## Environmental Impact
141
+
142
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
143
+
144
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
145
+
146
+ - **Hardware Type:** [More Information Needed]
147
+ - **Hours used:** [More Information Needed]
148
+ - **Cloud Provider:** [More Information Needed]
149
+ - **Compute Region:** [More Information Needed]
150
+ - **Carbon Emitted:** [More Information Needed]
151
+
152
+ ## Technical Specifications [optional]
153
+
154
+ ### Model Architecture and Objective
155
+
156
+ [More Information Needed]
157
+
158
+ ### Compute Infrastructure
159
+
160
+ [More Information Needed]
161
+
162
+ #### Hardware
163
+
164
+ [More Information Needed]
165
+
166
+ #### Software
167
+
168
+ [More Information Needed]
169
+
170
+ ## Citation [optional]
171
+
172
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
173
+
174
+ **BibTeX:**
175
+
176
+ [More Information Needed]
177
+
178
+ **APA:**
179
+
180
+ [More Information Needed]
181
+
182
+ ## Glossary [optional]
183
+
184
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
185
+
186
+ [More Information Needed]
187
+
188
+ ## More Information [optional]
189
+
190
+ [More Information Needed]
191
+
192
+ ## Model Card Authors [optional]
193
+
194
+ [More Information Needed]
195
+
196
+ ## Model Card Contact
197
+
198
+ [More Information Needed]
199
+
200
+
201
  ## Training procedure
202
 
203
 
 
213
  - bnb_4bit_use_double_quant: False
214
  - bnb_4bit_compute_dtype: float32
215
 
216
+ ### Framework versions
217
+
218
+
219
+ - PEFT 0.6.0.dev0
220
+ ## Training procedure
221
+
222
+
223
  The following `bitsandbytes` quantization config was used during training:
224
  - quant_method: bitsandbytes
225
  - load_in_8bit: True
 
231
  - bnb_4bit_quant_type: fp4
232
  - bnb_4bit_use_double_quant: False
233
  - bnb_4bit_compute_dtype: float32
234
+
235
  ### Framework versions
236
 
 
237
 
238
  - PEFT 0.6.0.dev0
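
The quantization block that the checkpoint card repeats maps directly onto `transformers`' `BitsAndBytesConfig`. A sketch reconstructing it; note that with `load_in_8bit=True` the `bnb_4bit_*` fields shown are just the library defaults and are inactive:

```python
# Sketch: the bitsandbytes fields listed in the card, expressed as a
# BitsAndBytesConfig. 8-bit loading is what is actually in effect; the
# bnb_4bit_* values are defaults carried along in the serialized config.
from transformers import BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_8bit=True,               # load_in_8bit: True
    bnb_4bit_quant_type="fp4",       # inactive while 8-bit is selected
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype="float32",
)
print(quant_config.to_dict())
```
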
checkpoint-100/adapter_model.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:e10ec585cc785ec6c1ba52b342ed481e0c422b9ac31d8e20a8a23da9b83db472
3
  size 39409357
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c394307bbd8fa249a03539583b0146418f3ec081d9e00bd4d47de6b6362685a1
3
  size 39409357
checkpoint-100/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:aa455e33c4a5d31afb7a6b507415d81b58428f2df2ae11cfc9a5c88741af4575
3
  size 78844421
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b5c435a8d185fca70ecaa721a10760c7ba3f9eab4a917a3864841f3f9d5ee652
3
  size 78844421
checkpoint-100/rng_state.pth CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:bc401e4179bc1e7efa275a89930a0253550e2251df7c7fb11bdb457cda3e88aa
3
  size 14575
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:36fc71bd44bd7f04f2599c5dface64c517de1a7ab7bac3600f3f6470c6c72673
3
  size 14575
checkpoint-100/scheduler.pt CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:f3c6698901762d2f6230a1becbf5ebc118ea9c781a5ef978d76e0b649f4f5e37
3
  size 627
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:623e145e5ab24cb1507f4f210040814726c2c3abec15b64a36227aa6dd37bb5a
3
  size 627
checkpoint-100/tokenizer_config.json CHANGED
@@ -1,5 +1,40 @@
1
  {
2
  "add_prefix_space": false,
3
  "bos_token": "<s>",
4
  "clean_up_tokenization_spaces": false,
5
  "eos_token": "</s>",
 
1
  {
2
  "add_prefix_space": false,
3
+ "added_tokens_decoder": {
4
+ "0": {
5
+ "content": "<unk>",
6
+ "lstrip": false,
7
+ "normalized": false,
8
+ "rstrip": false,
9
+ "single_word": false,
10
+ "special": true
11
+ },
12
+ "1": {
13
+ "content": "<s>",
14
+ "lstrip": false,
15
+ "normalized": false,
16
+ "rstrip": false,
17
+ "single_word": false,
18
+ "special": true
19
+ },
20
+ "2": {
21
+ "content": "</s>",
22
+ "lstrip": false,
23
+ "normalized": false,
24
+ "rstrip": false,
25
+ "single_word": false,
26
+ "special": true
27
+ },
28
+ "3": {
29
+ "content": "<pad>",
30
+ "lstrip": false,
31
+ "normalized": false,
32
+ "rstrip": false,
33
+ "single_word": false,
34
+ "special": true
35
+ }
36
+ },
37
+ "additional_special_tokens": [],
38
  "bos_token": "<s>",
39
  "clean_up_tokenization_spaces": false,
40
  "eos_token": "</s>",
checkpoint-100/trainer_state.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "best_metric": null,
3
  "best_model_checkpoint": null,
4
- "epoch": 32.0,
5
  "eval_steps": 500,
6
  "global_step": 100,
7
  "is_hyper_param_search": false,
@@ -9,611 +9,611 @@
9
  "is_world_process_zero": true,
10
  "log_history": [
11
  {
12
- "epoch": 0.32,
13
- "learning_rate": 0.00019895833333333332,
14
- "loss": 2.436,
15
  "step": 1
16
  },
17
  {
18
- "epoch": 0.64,
19
- "learning_rate": 0.0001979166666666667,
20
- "loss": 2.134,
21
  "step": 2
22
  },
23
  {
24
- "epoch": 0.96,
25
- "learning_rate": 0.000196875,
26
- "loss": 1.8754,
27
  "step": 3
28
  },
29
  {
30
- "epoch": 1.28,
31
- "learning_rate": 0.00019583333333333334,
32
- "loss": 1.7027,
33
  "step": 4
34
  },
35
  {
36
- "epoch": 1.6,
37
- "learning_rate": 0.00019479166666666668,
38
- "loss": 1.5335,
39
  "step": 5
40
  },
41
  {
42
- "epoch": 1.92,
43
- "learning_rate": 0.00019375000000000002,
44
- "loss": 1.4574,
45
  "step": 6
46
  },
47
  {
48
- "epoch": 2.24,
49
- "learning_rate": 0.00019270833333333333,
50
- "loss": 1.3616,
51
  "step": 7
52
  },
53
  {
54
- "epoch": 2.56,
55
- "learning_rate": 0.00019166666666666667,
56
- "loss": 1.2486,
57
  "step": 8
58
  },
59
  {
60
- "epoch": 2.88,
61
- "learning_rate": 0.000190625,
62
- "loss": 1.2004,
63
  "step": 9
64
  },
65
  {
66
- "epoch": 3.2,
67
- "learning_rate": 0.00018958333333333332,
68
- "loss": 1.1875,
69
  "step": 10
70
  },
71
  {
72
- "epoch": 3.52,
73
- "learning_rate": 0.0001885416666666667,
74
- "loss": 1.146,
75
  "step": 11
76
  },
77
  {
78
- "epoch": 3.84,
79
- "learning_rate": 0.0001875,
80
- "loss": 1.1426,
81
  "step": 12
82
  },
83
  {
84
- "epoch": 4.16,
85
- "learning_rate": 0.00018645833333333334,
86
- "loss": 1.081,
87
  "step": 13
88
  },
89
  {
90
- "epoch": 4.48,
91
- "learning_rate": 0.00018541666666666668,
92
- "loss": 1.0904,
93
  "step": 14
94
  },
95
  {
96
- "epoch": 4.8,
97
- "learning_rate": 0.000184375,
98
- "loss": 1.0793,
99
  "step": 15
100
  },
101
  {
102
- "epoch": 5.12,
103
- "learning_rate": 0.00018333333333333334,
104
- "loss": 1.0554,
105
  "step": 16
106
  },
107
  {
108
- "epoch": 5.44,
109
- "learning_rate": 0.00018229166666666667,
110
- "loss": 1.0499,
111
  "step": 17
112
  },
113
  {
114
- "epoch": 5.76,
115
- "learning_rate": 0.00018125000000000001,
116
- "loss": 1.0378,
117
  "step": 18
118
  },
119
  {
120
- "epoch": 6.08,
121
- "learning_rate": 0.00018020833333333333,
122
- "loss": 1.0245,
123
  "step": 19
124
  },
125
  {
126
- "epoch": 6.4,
127
- "learning_rate": 0.0001791666666666667,
128
- "loss": 1.0436,
129
  "step": 20
130
  },
131
  {
132
- "epoch": 6.72,
133
- "learning_rate": 0.000178125,
134
- "loss": 0.9766,
135
  "step": 21
136
  },
137
  {
138
- "epoch": 7.04,
139
- "learning_rate": 0.00017708333333333335,
140
- "loss": 0.9664,
141
  "step": 22
142
  },
143
  {
144
- "epoch": 7.36,
145
- "learning_rate": 0.00017604166666666669,
146
- "loss": 0.9977,
147
  "step": 23
148
  },
149
  {
150
- "epoch": 7.68,
151
- "learning_rate": 0.000175,
152
- "loss": 0.9671,
153
  "step": 24
154
  },
155
  {
156
- "epoch": 8.0,
157
- "learning_rate": 0.00017395833333333334,
158
- "loss": 0.9229,
159
  "step": 25
160
  },
161
  {
162
- "epoch": 8.32,
163
- "learning_rate": 0.00017291666666666668,
164
- "loss": 0.957,
165
  "step": 26
166
  },
167
  {
168
- "epoch": 8.64,
169
- "learning_rate": 0.00017187500000000002,
170
- "loss": 0.9213,
171
  "step": 27
172
  },
173
  {
174
- "epoch": 8.96,
175
- "learning_rate": 0.00017083333333333333,
176
- "loss": 0.9147,
177
  "step": 28
178
  },
179
  {
180
- "epoch": 9.28,
181
- "learning_rate": 0.00016979166666666667,
182
- "loss": 0.8976,
183
  "step": 29
184
  },
185
  {
186
- "epoch": 9.6,
187
- "learning_rate": 0.00016875,
188
- "loss": 0.8961,
189
  "step": 30
190
  },
191
  {
192
- "epoch": 9.92,
193
- "learning_rate": 0.00016770833333333332,
194
- "loss": 0.8934,
195
  "step": 31
196
  },
197
  {
198
- "epoch": 10.24,
199
- "learning_rate": 0.0001666666666666667,
200
- "loss": 0.865,
201
  "step": 32
202
  },
203
  {
204
- "epoch": 10.56,
205
- "learning_rate": 0.000165625,
206
- "loss": 0.8952,
207
  "step": 33
208
  },
209
  {
210
- "epoch": 10.88,
211
- "learning_rate": 0.00016458333333333334,
212
- "loss": 0.832,
213
  "step": 34
214
  },
215
  {
216
- "epoch": 11.2,
217
- "learning_rate": 0.00016354166666666668,
218
- "loss": 0.8241,
219
  "step": 35
220
  },
221
  {
222
- "epoch": 11.52,
223
- "learning_rate": 0.00016250000000000002,
224
- "loss": 0.834,
225
  "step": 36
226
  },
227
  {
228
- "epoch": 11.84,
229
- "learning_rate": 0.00016145833333333333,
230
- "loss": 0.8305,
231
  "step": 37
232
  },
233
  {
234
- "epoch": 12.16,
235
- "learning_rate": 0.00016041666666666667,
236
- "loss": 0.7752,
237
  "step": 38
238
  },
239
  {
240
- "epoch": 12.48,
241
- "learning_rate": 0.000159375,
242
- "loss": 0.8084,
243
  "step": 39
244
  },
245
  {
246
- "epoch": 12.8,
247
- "learning_rate": 0.00015833333333333332,
248
- "loss": 0.7757,
249
  "step": 40
250
  },
251
  {
252
- "epoch": 13.12,
253
- "learning_rate": 0.0001572916666666667,
254
- "loss": 0.7724,
255
  "step": 41
256
  },
257
  {
258
- "epoch": 13.44,
259
- "learning_rate": 0.00015625,
260
- "loss": 0.7478,
261
  "step": 42
262
  },
263
  {
264
- "epoch": 13.76,
265
- "learning_rate": 0.00015520833333333334,
266
- "loss": 0.7291,
267
  "step": 43
268
  },
269
  {
270
- "epoch": 14.08,
271
- "learning_rate": 0.00015416666666666668,
272
- "loss": 0.7444,
273
  "step": 44
274
  },
275
  {
276
- "epoch": 14.4,
277
- "learning_rate": 0.000153125,
278
- "loss": 0.732,
279
  "step": 45
280
  },
281
  {
282
- "epoch": 14.72,
283
- "learning_rate": 0.00015208333333333333,
284
- "loss": 0.6892,
285
  "step": 46
286
  },
287
  {
288
- "epoch": 15.04,
289
- "learning_rate": 0.00015104166666666667,
290
- "loss": 0.6804,
291
  "step": 47
292
  },
293
  {
294
- "epoch": 15.36,
295
- "learning_rate": 0.00015000000000000001,
296
- "loss": 0.668,
297
  "step": 48
298
  },
299
  {
300
- "epoch": 15.68,
301
- "learning_rate": 0.00014895833333333333,
302
- "loss": 0.6568,
303
  "step": 49
304
  },
305
  {
306
- "epoch": 16.0,
307
- "learning_rate": 0.0001479166666666667,
308
- "loss": 0.6475,
309
  "step": 50
310
  },
311
  {
312
- "epoch": 16.32,
313
- "learning_rate": 0.000146875,
314
- "loss": 0.6317,
315
  "step": 51
316
  },
317
  {
318
- "epoch": 16.64,
319
- "learning_rate": 0.00014583333333333335,
320
- "loss": 0.5976,
321
  "step": 52
322
  },
323
  {
324
- "epoch": 16.96,
325
- "learning_rate": 0.00014479166666666669,
326
- "loss": 0.6074,
327
  "step": 53
328
  },
329
  {
330
- "epoch": 17.28,
331
- "learning_rate": 0.00014375,
332
- "loss": 0.5905,
333
  "step": 54
334
  },
335
  {
336
- "epoch": 17.6,
337
- "learning_rate": 0.00014270833333333334,
338
- "loss": 0.5564,
339
  "step": 55
340
  },
341
  {
342
- "epoch": 17.92,
343
- "learning_rate": 0.00014166666666666668,
344
- "loss": 0.5773,
345
  "step": 56
346
  },
347
  {
348
- "epoch": 18.24,
349
- "learning_rate": 0.00014062500000000002,
350
- "loss": 0.5337,
351
  "step": 57
352
  },
353
  {
354
- "epoch": 18.56,
355
- "learning_rate": 0.00013958333333333333,
356
- "loss": 0.5227,
357
  "step": 58
358
  },
359
  {
360
- "epoch": 18.88,
361
- "learning_rate": 0.00013854166666666667,
362
- "loss": 0.5251,
363
  "step": 59
364
  },
365
  {
366
- "epoch": 19.2,
367
- "learning_rate": 0.0001375,
368
- "loss": 0.503,
369
  "step": 60
370
  },
371
  {
372
- "epoch": 19.52,
373
- "learning_rate": 0.00013645833333333332,
374
- "loss": 0.486,
375
  "step": 61
376
  },
377
  {
378
- "epoch": 19.84,
379
- "learning_rate": 0.0001354166666666667,
380
- "loss": 0.4632,
381
  "step": 62
382
  },
383
  {
384
- "epoch": 20.16,
385
- "learning_rate": 0.000134375,
386
- "loss": 0.4734,
387
  "step": 63
388
  },
389
  {
390
- "epoch": 20.48,
391
- "learning_rate": 0.00013333333333333334,
392
- "loss": 0.4212,
393
  "step": 64
394
  },
395
  {
396
- "epoch": 20.8,
397
- "learning_rate": 0.00013229166666666668,
398
- "loss": 0.4255,
399
  "step": 65
400
  },
401
  {
402
- "epoch": 21.12,
403
- "learning_rate": 0.00013125000000000002,
404
- "loss": 0.4231,
405
  "step": 66
406
  },
407
  {
408
- "epoch": 21.44,
409
- "learning_rate": 0.00013020833333333333,
410
- "loss": 0.392,
411
  "step": 67
412
  },
413
  {
414
- "epoch": 21.76,
415
- "learning_rate": 0.00012916666666666667,
416
- "loss": 0.3924,
417
  "step": 68
418
  },
419
  {
420
- "epoch": 22.08,
421
- "learning_rate": 0.000128125,
422
- "loss": 0.3787,
423
  "step": 69
424
  },
425
  {
426
- "epoch": 22.4,
427
- "learning_rate": 0.00012708333333333332,
428
- "loss": 0.3562,
429
  "step": 70
430
  },
431
  {
432
- "epoch": 22.72,
433
- "learning_rate": 0.0001260416666666667,
434
- "loss": 0.3474,
435
  "step": 71
436
  },
437
  {
438
- "epoch": 23.04,
439
- "learning_rate": 0.000125,
440
- "loss": 0.338,
441
  "step": 72
442
  },
443
  {
444
- "epoch": 23.36,
445
- "learning_rate": 0.00012395833333333334,
446
- "loss": 0.326,
447
  "step": 73
448
  },
449
  {
450
- "epoch": 23.68,
451
- "learning_rate": 0.00012291666666666668,
452
- "loss": 0.3049,
453
  "step": 74
454
  },
455
  {
456
- "epoch": 24.0,
457
- "learning_rate": 0.00012187500000000001,
458
- "loss": 0.3032,
459
  "step": 75
460
  },
461
  {
462
- "epoch": 24.32,
463
- "learning_rate": 0.00012083333333333333,
464
- "loss": 0.2957,
465
  "step": 76
466
  },
467
  {
468
- "epoch": 24.64,
469
- "learning_rate": 0.00011979166666666667,
470
- "loss": 0.2771,
471
  "step": 77
472
  },
473
  {
474
- "epoch": 24.96,
475
- "learning_rate": 0.00011875,
476
- "loss": 0.2706,
477
  "step": 78
478
  },
479
  {
480
- "epoch": 25.28,
481
- "learning_rate": 0.00011770833333333333,
482
- "loss": 0.2611,
483
  "step": 79
484
  },
485
  {
486
- "epoch": 25.6,
487
- "learning_rate": 0.00011666666666666668,
488
- "loss": 0.2515,
489
  "step": 80
490
  },
491
  {
492
- "epoch": 25.92,
493
- "learning_rate": 0.000115625,
494
- "loss": 0.2353,
495
  "step": 81
496
  },
497
  {
498
- "epoch": 26.24,
499
- "learning_rate": 0.00011458333333333333,
500
- "loss": 0.2323,
501
  "step": 82
502
  },
503
  {
504
- "epoch": 26.56,
505
- "learning_rate": 0.00011354166666666668,
506
- "loss": 0.2394,
507
  "step": 83
508
  },
509
  {
510
- "epoch": 26.88,
511
- "learning_rate": 0.00011250000000000001,
512
- "loss": 0.2154,
513
  "step": 84
514
  },
515
  {
516
- "epoch": 27.2,
517
- "learning_rate": 0.00011145833333333334,
518
- "loss": 0.2045,
519
  "step": 85
520
  },
521
  {
522
- "epoch": 27.52,
523
- "learning_rate": 0.00011041666666666668,
524
- "loss": 0.2133,
525
  "step": 86
526
  },
527
  {
528
- "epoch": 27.84,
529
- "learning_rate": 0.000109375,
530
- "loss": 0.1995,
531
  "step": 87
532
  },
533
  {
534
- "epoch": 28.16,
535
- "learning_rate": 0.00010833333333333333,
536
- "loss": 0.1938,
537
  "step": 88
538
  },
539
  {
540
- "epoch": 28.48,
541
- "learning_rate": 0.00010729166666666668,
542
- "loss": 0.1881,
543
  "step": 89
544
  },
545
  {
546
- "epoch": 28.8,
547
- "learning_rate": 0.00010625000000000001,
548
- "loss": 0.1814,
549
  "step": 90
550
  },
551
  {
552
- "epoch": 29.12,
553
- "learning_rate": 0.00010520833333333333,
554
- "loss": 0.1707,
555
  "step": 91
556
  },
557
  {
558
- "epoch": 29.44,
559
- "learning_rate": 0.00010416666666666667,
560
- "loss": 0.1716,
561
  "step": 92
562
  },
563
  {
564
- "epoch": 29.76,
565
- "learning_rate": 0.000103125,
566
- "loss": 0.1706,
567
  "step": 93
568
  },
569
  {
570
- "epoch": 30.08,
571
- "learning_rate": 0.00010208333333333333,
572
- "loss": 0.1685,
573
  "step": 94
574
  },
575
  {
576
- "epoch": 30.4,
577
- "learning_rate": 0.00010104166666666668,
578
- "loss": 0.1621,
579
  "step": 95
580
  },
581
  {
582
- "epoch": 30.72,
583
- "learning_rate": 0.0001,
584
- "loss": 0.1548,
585
  "step": 96
586
  },
587
  {
588
- "epoch": 31.04,
589
- "learning_rate": 9.895833333333334e-05,
590
- "loss": 0.1525,
591
  "step": 97
592
  },
593
  {
594
- "epoch": 31.36,
595
- "learning_rate": 9.791666666666667e-05,
596
- "loss": 0.1488,
597
  "step": 98
598
  },
599
  {
600
- "epoch": 31.68,
601
- "learning_rate": 9.687500000000001e-05,
602
- "loss": 0.1423,
603
  "step": 99
604
  },
605
  {
606
- "epoch": 32.0,
607
- "learning_rate": 9.583333333333334e-05,
608
- "loss": 0.1452,
609
  "step": 100
610
  }
611
  ],
612
  "logging_steps": 1,
613
- "max_steps": 192,
614
- "num_train_epochs": 64,
615
  "save_steps": 100,
616
- "total_flos": 2.4750253661952e+16,
617
  "trial_name": null,
618
  "trial_params": null
619
  }
 
1
  {
2
  "best_metric": null,
3
  "best_model_checkpoint": null,
4
+ "epoch": 7.111111111111111,
5
  "eval_steps": 500,
6
  "global_step": 100,
7
  "is_hyper_param_search": false,
 
9
  "is_world_process_zero": true,
10
  "log_history": [
11
  {
12
+ "epoch": 0.07,
13
+ "learning_rate": 0.0001985714285714286,
14
+ "loss": 2.5072,
15
  "step": 1
16
  },
17
  {
18
+ "epoch": 0.14,
19
+ "learning_rate": 0.00019714285714285716,
20
+ "loss": 2.194,
21
  "step": 2
22
  },
23
  {
24
+ "epoch": 0.21,
25
+ "learning_rate": 0.00019571428571428572,
26
+ "loss": 1.9685,
27
  "step": 3
28
  },
29
  {
30
+ "epoch": 0.28,
31
+ "learning_rate": 0.0001942857142857143,
32
+ "loss": 1.7577,
33
  "step": 4
34
  },
35
  {
36
+ "epoch": 0.36,
37
+ "learning_rate": 0.00019285714285714286,
38
+ "loss": 1.6095,
39
  "step": 5
40
  },
41
  {
42
+ "epoch": 0.43,
43
+ "learning_rate": 0.00019142857142857145,
44
+ "loss": 1.5448,
45
  "step": 6
46
  },
47
  {
48
+ "epoch": 0.5,
49
+ "learning_rate": 0.00019,
50
+ "loss": 1.453,
51
  "step": 7
52
  },
53
  {
54
+ "epoch": 0.57,
55
+ "learning_rate": 0.00018857142857142857,
56
+ "loss": 1.41,
57
  "step": 8
58
  },
59
  {
60
+ "epoch": 0.64,
61
+ "learning_rate": 0.00018714285714285716,
62
+ "loss": 1.3054,
63
  "step": 9
64
  },
65
  {
66
+ "epoch": 0.71,
67
+ "learning_rate": 0.00018571428571428572,
68
+ "loss": 1.2634,
69
  "step": 10
70
  },
71
  {
72
+ "epoch": 0.78,
73
+ "learning_rate": 0.00018428571428571428,
74
+ "loss": 1.2269,
75
  "step": 11
76
  },
77
  {
78
+ "epoch": 0.85,
79
+ "learning_rate": 0.00018285714285714286,
80
+ "loss": 1.2405,
81
  "step": 12
82
  },
83
  {
84
+ "epoch": 0.92,
85
+ "learning_rate": 0.00018142857142857142,
86
+ "loss": 1.2436,
87
  "step": 13
88
  },
89
  {
90
+ "epoch": 1.0,
91
+ "learning_rate": 0.00018,
92
+ "loss": 1.2063,
93
  "step": 14
94
  },
95
  {
96
+ "epoch": 1.07,
97
+ "learning_rate": 0.0001785714285714286,
98
+ "loss": 1.1789,
99
  "step": 15
100
  },
101
  {
102
+ "epoch": 1.14,
103
+ "learning_rate": 0.00017714285714285713,
104
+ "loss": 1.2007,
105
  "step": 16
106
  },
107
  {
108
+ "epoch": 1.21,
109
+ "learning_rate": 0.00017571428571428572,
110
+ "loss": 1.1616,
111
  "step": 17
112
  },
113
  {
114
+ "epoch": 1.28,
115
+ "learning_rate": 0.0001742857142857143,
116
+ "loss": 1.157,
117
  "step": 18
118
  },
119
  {
120
+ "epoch": 1.35,
121
+ "learning_rate": 0.00017285714285714287,
122
+ "loss": 1.1555,
123
  "step": 19
124
  },
125
  {
126
+ "epoch": 1.42,
127
+ "learning_rate": 0.00017142857142857143,
128
+ "loss": 1.1559,
129
  "step": 20
130
  },
131
  {
132
+ "epoch": 1.49,
133
+ "learning_rate": 0.00017,
134
+ "loss": 1.1487,
135
  "step": 21
136
  },
137
  {
138
+ "epoch": 1.56,
139
+ "learning_rate": 0.00016857142857142857,
140
+ "loss": 1.1729,
141
  "step": 22
142
  },
143
  {
144
+ "epoch": 1.64,
145
+ "learning_rate": 0.00016714285714285716,
146
+ "loss": 1.1251,
147
  "step": 23
148
  },
149
  {
150
+ "epoch": 1.71,
151
+ "learning_rate": 0.00016571428571428575,
152
+ "loss": 1.1181,
153
  "step": 24
154
  },
155
  {
156
+ "epoch": 1.78,
157
+ "learning_rate": 0.00016428571428571428,
158
+ "loss": 1.1144,
159
  "step": 25
160
  },
161
  {
162
+ "epoch": 1.85,
163
+ "learning_rate": 0.00016285714285714287,
164
+ "loss": 1.1416,
165
  "step": 26
166
  },
167
  {
168
+ "epoch": 1.92,
169
+ "learning_rate": 0.00016142857142857145,
170
+ "loss": 1.0965,
171
  "step": 27
172
  },
173
  {
174
+ "epoch": 1.99,
175
+ "learning_rate": 0.00016,
176
+ "loss": 1.0936,
177
  "step": 28
178
  },
179
  {
180
+ "epoch": 2.06,
181
+ "learning_rate": 0.00015857142857142857,
182
+ "loss": 1.0839,
183
  "step": 29
184
  },
185
  {
186
+ "epoch": 2.13,
187
+ "learning_rate": 0.00015714285714285716,
188
+ "loss": 1.127,
189
  "step": 30
190
  },
191
  {
192
+ "epoch": 2.2,
193
+ "learning_rate": 0.00015571428571428572,
194
+ "loss": 1.0886,
195
  "step": 31
196
  },
197
  {
198
+ "epoch": 2.28,
199
+ "learning_rate": 0.0001542857142857143,
200
+ "loss": 1.0447,
201
  "step": 32
202
  },
203
  {
204
+ "epoch": 2.35,
205
+ "learning_rate": 0.00015285714285714287,
206
+ "loss": 1.0513,
207
  "step": 33
208
  },
209
  {
210
+ "epoch": 2.42,
211
+ "learning_rate": 0.00015142857142857143,
212
+ "loss": 1.098,
213
  "step": 34
214
  },
215
  {
216
+ "epoch": 2.49,
217
+ "learning_rate": 0.00015000000000000001,
218
+ "loss": 1.0628,
219
  "step": 35
220
  },
221
  {
222
+ "epoch": 2.56,
223
+ "learning_rate": 0.00014857142857142857,
224
+ "loss": 1.0814,
225
  "step": 36
226
  },
227
  {
228
+ "epoch": 2.63,
229
+ "learning_rate": 0.00014714285714285716,
230
+ "loss": 1.0638,
231
  "step": 37
232
  },
233
  {
234
+ "epoch": 2.7,
235
+ "learning_rate": 0.00014571428571428572,
236
+ "loss": 1.0652,
237
  "step": 38
238
  },
239
  {
240
+ "epoch": 2.77,
241
+ "learning_rate": 0.00014428571428571428,
242
+ "loss": 1.0463,
243
  "step": 39
244
  },
245
  {
246
+ "epoch": 2.84,
247
+ "learning_rate": 0.00014285714285714287,
248
+ "loss": 1.0349,
249
  "step": 40
250
  },
251
  {
252
+ "epoch": 2.92,
253
+ "learning_rate": 0.00014142857142857145,
254
+ "loss": 1.0165,
255
  "step": 41
256
  },
257
  {
258
+ "epoch": 2.99,
259
+ "learning_rate": 0.00014,
260
+ "loss": 1.0905,
261
  "step": 42
262
  },
263
  {
264
+ "epoch": 3.06,
265
+ "learning_rate": 0.00013857142857142857,
266
+ "loss": 1.0297,
267
  "step": 43
268
  },
269
  {
270
+ "epoch": 3.13,
271
+ "learning_rate": 0.00013714285714285716,
272
+ "loss": 1.0061,
273
  "step": 44
274
  },
275
  {
276
+ "epoch": 3.2,
277
+ "learning_rate": 0.00013571428571428572,
278
+ "loss": 1.0019,
279
  "step": 45
280
  },
281
  {
282
+ "epoch": 3.27,
283
+ "learning_rate": 0.00013428571428571428,
284
+ "loss": 0.9555,
285
  "step": 46
286
  },
287
  {
288
+ "epoch": 3.34,
289
+ "learning_rate": 0.00013285714285714287,
290
+ "loss": 1.038,
291
  "step": 47
292
  },
293
  {
294
+ "epoch": 3.41,
295
+ "learning_rate": 0.00013142857142857143,
296
+ "loss": 0.9932,
297
  "step": 48
298
  },
299
  {
300
+ "epoch": 3.48,
301
+ "learning_rate": 0.00013000000000000002,
302
+ "loss": 1.0451,
303
  "step": 49
304
  },
305
  {
306
+ "epoch": 3.56,
307
+ "learning_rate": 0.00012857142857142858,
308
+ "loss": 1.008,
309
  "step": 50
310
  },
311
  {
312
+ "epoch": 3.63,
313
+ "learning_rate": 0.00012714285714285714,
314
+ "loss": 1.0362,
315
  "step": 51
316
  },
317
  {
318
+ "epoch": 3.7,
319
+ "learning_rate": 0.00012571428571428572,
320
+ "loss": 1.0007,
321
  "step": 52
322
  },
323
  {
324
+ "epoch": 3.77,
325
+ "learning_rate": 0.00012428571428571428,
326
+ "loss": 1.0038,
327
  "step": 53
328
  },
329
  {
330
+ "epoch": 3.84,
331
+ "learning_rate": 0.00012285714285714287,
332
+ "loss": 1.0057,
333
  "step": 54
334
  },
335
  {
336
+ "epoch": 3.91,
337
+ "learning_rate": 0.00012142857142857143,
338
+ "loss": 1.0172,
339
  "step": 55
340
  },
341
  {
342
+ "epoch": 3.98,
343
+ "learning_rate": 0.00012,
344
+ "loss": 0.982,
345
  "step": 56
346
  },
347
  {
348
+ "epoch": 4.05,
349
+ "learning_rate": 0.00011857142857142858,
350
+ "loss": 0.9838,
351
  "step": 57
352
  },
353
  {
354
+ "epoch": 4.12,
355
+ "learning_rate": 0.00011714285714285715,
356
+ "loss": 0.9677,
357
  "step": 58
358
  },
359
  {
360
+ "epoch": 4.2,
361
+ "learning_rate": 0.00011571428571428574,
362
+ "loss": 0.9815,
363
  "step": 59
364
  },
365
  {
366
+ "epoch": 4.27,
367
+ "learning_rate": 0.00011428571428571428,
368
+ "loss": 0.9711,
369
  "step": 60
370
  },
371
  {
372
+ "epoch": 4.34,
373
+ "learning_rate": 0.00011285714285714286,
374
+ "loss": 1.0086,
375
  "step": 61
376
  },
377
  {
378
+ "epoch": 4.41,
379
+ "learning_rate": 0.00011142857142857144,
380
+ "loss": 0.9485,
381
  "step": 62
382
  },
383
  {
384
+ "epoch": 4.48,
385
+ "learning_rate": 0.00011000000000000002,
386
+ "loss": 0.9342,
387
  "step": 63
388
  },
389
  {
390
+ "epoch": 4.55,
391
+ "learning_rate": 0.00010857142857142856,
392
+ "loss": 0.9887,
393
  "step": 64
394
  },
395
  {
396
+ "epoch": 4.62,
397
+ "learning_rate": 0.00010714285714285715,
398
+ "loss": 0.9614,
399
  "step": 65
400
  },
401
  {
402
+ "epoch": 4.69,
403
+ "learning_rate": 0.00010571428571428572,
404
+ "loss": 0.9644,
405
  "step": 66
406
  },
407
  {
408
+ "epoch": 4.76,
409
+ "learning_rate": 0.0001042857142857143,
410
+ "loss": 0.9267,
411
  "step": 67
412
  },
413
  {
414
+ "epoch": 4.84,
415
+ "learning_rate": 0.00010285714285714286,
416
+ "loss": 0.954,
417
  "step": 68
418
  },
419
  {
420
+ "epoch": 4.91,
421
+ "learning_rate": 0.00010142857142857143,
422
+ "loss": 0.919,
423
  "step": 69
424
  },
425
  {
426
+ "epoch": 4.98,
427
+ "learning_rate": 0.0001,
428
+ "loss": 0.9478,
429
  "step": 70
430
  },
431
  {
432
+ "epoch": 5.05,
433
+ "learning_rate": 9.857142857142858e-05,
434
+ "loss": 0.9559,
435
  "step": 71
436
  },
437
  {
438
+ "epoch": 5.12,
439
+ "learning_rate": 9.714285714285715e-05,
440
+ "loss": 0.9596,
441
  "step": 72
442
  },
443
  {
444
+ "epoch": 5.19,
445
+ "learning_rate": 9.571428571428573e-05,
446
+ "loss": 0.9151,
447
  "step": 73
448
  },
449
  {
450
+ "epoch": 5.26,
451
+ "learning_rate": 9.428571428571429e-05,
452
+ "loss": 0.9059,
453
  "step": 74
454
  },
455
  {
456
+ "epoch": 5.33,
457
+ "learning_rate": 9.285714285714286e-05,
458
+ "loss": 0.8717,
459
  "step": 75
460
  },
461
  {
462
+ "epoch": 5.4,
463
+ "learning_rate": 9.142857142857143e-05,
464
+ "loss": 0.8912,
465
  "step": 76
466
  },
467
  {
468
+ "epoch": 5.48,
469
+ "learning_rate": 9e-05,
470
+ "loss": 0.9166,
471
  "step": 77
472
  },
473
  {
474
+ "epoch": 5.55,
475
+ "learning_rate": 8.857142857142857e-05,
476
+ "loss": 0.9362,
477
  "step": 78
478
  },
479
  {
480
+ "epoch": 5.62,
481
+ "learning_rate": 8.714285714285715e-05,
482
+ "loss": 0.8969,
483
  "step": 79
484
  },
485
  {
486
+ "epoch": 5.69,
487
+ "learning_rate": 8.571428571428571e-05,
488
+ "loss": 0.898,
489
  "step": 80
490
  },
491
  {
492
+ "epoch": 5.76,
493
+ "learning_rate": 8.428571428571429e-05,
494
+ "loss": 0.8626,
495
  "step": 81
496
  },
497
  {
498
+ "epoch": 5.83,
499
+ "learning_rate": 8.285714285714287e-05,
500
+ "loss": 0.9353,
501
  "step": 82
502
  },
503
  {
504
+ "epoch": 5.9,
505
+ "learning_rate": 8.142857142857143e-05,
506
+ "loss": 0.9353,
507
  "step": 83
508
  },
509
  {
510
+ "epoch": 5.97,
511
+ "learning_rate": 8e-05,
512
+ "loss": 0.9277,
513
  "step": 84
514
  },
515
  {
516
+ "epoch": 6.04,
517
+ "learning_rate": 7.857142857142858e-05,
518
+ "loss": 0.8856,
519
  "step": 85
520
  },
521
  {
522
+ "epoch": 6.12,
523
+ "learning_rate": 7.714285714285715e-05,
524
+ "loss": 0.8771,
525
  "step": 86
526
  },
527
  {
528
+ "epoch": 6.19,
529
+ "learning_rate": 7.571428571428571e-05,
530
+ "loss": 0.8634,
531
  "step": 87
532
  },
533
  {
534
+ "epoch": 6.26,
535
+ "learning_rate": 7.428571428571429e-05,
536
+ "loss": 0.8655,
537
  "step": 88
538
  },
539
  {
540
+ "epoch": 6.33,
541
+ "learning_rate": 7.285714285714286e-05,
542
+ "loss": 0.856,
543
  "step": 89
544
  },
545
  {
546
+ "epoch": 6.4,
547
+ "learning_rate": 7.142857142857143e-05,
548
+ "loss": 0.8929,
549
  "step": 90
550
  },
551
  {
552
+ "epoch": 6.47,
553
+ "learning_rate": 7e-05,
554
+ "loss": 0.8844,
555
  "step": 91
556
  },
557
  {
558
+ "epoch": 6.54,
559
+ "learning_rate": 6.857142857142858e-05,
560
+ "loss": 0.8951,
561
  "step": 92
562
  },
563
  {
564
+ "epoch": 6.61,
565
+ "learning_rate": 6.714285714285714e-05,
566
+ "loss": 0.8385,
567
  "step": 93
568
  },
569
  {
570
+ "epoch": 6.68,
571
+ "learning_rate": 6.571428571428571e-05,
572
+ "loss": 0.873,
573
  "step": 94
574
  },
575
  {
576
+ "epoch": 6.76,
577
+ "learning_rate": 6.428571428571429e-05,
578
+ "loss": 0.9033,
579
  "step": 95
580
  },
581
  {
582
+ "epoch": 6.83,
583
+ "learning_rate": 6.285714285714286e-05,
584
+ "loss": 0.8643,
585
  "step": 96
586
  },
587
  {
588
+ "epoch": 6.9,
589
+ "learning_rate": 6.142857142857143e-05,
590
+ "loss": 0.8894,
591
  "step": 97
592
  },
593
  {
594
+ "epoch": 6.97,
595
+ "learning_rate": 6e-05,
596
+ "loss": 0.8436,
597
  "step": 98
598
  },
599
  {
600
+ "epoch": 7.04,
601
+ "learning_rate": 5.8571428571428575e-05,
602
+ "loss": 0.8362,
603
  "step": 99
604
  },
605
  {
606
+ "epoch": 7.11,
607
+ "learning_rate": 5.714285714285714e-05,
608
+ "loss": 0.8162,
609
  "step": 100
610
  }
611
  ],
612
  "logging_steps": 1,
613
+ "max_steps": 140,
614
+ "num_train_epochs": 10,
615
  "save_steps": 100,
616
+ "total_flos": 1.837898937498624e+16,
617
  "trial_name": null,
618
  "trial_params": null
619
  }
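
The replaced log records a different run: the old state advanced 0.32 epoch per step (32.0 epochs by step 100), the new one 0.0711 epoch per step (7.11 epochs at step 100, roughly 14 steps per epoch), and `max_steps`/`num_train_epochs` drop from 192/64 to 140/10. A sketch of pulling such figures out of `trainer_state.json`:

```python
# Sketch: summarize the log_history of the trainer state diffed above.
# The relative path is illustrative; point it at the checkpoint directory.
import json

with open("checkpoint-100/trainer_state.json") as f:
    state = json.load(f)

logs = state["log_history"]
last = logs[-1]
print(f"steps/epoch ~ {last['step'] / last['epoch']:.2f}")  # ~14.06 for the new run
print(f"loss {logs[0]['loss']:.3f} (step {logs[0]['step']}) "
      f"-> {last['loss']:.3f} (step {last['step']})")
```
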
checkpoint-100/training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:18fa67295c2f705605ed2e7ff81543bd20a36db35f54e417d9b1ea047663c02f
3
  size 4027
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2f64eef1b40d4774a448e5873470138b7a2cb17cf32f63605c071f25bf135444
3
  size 4027
runs/Oct12_18-12-36_63a985a0dcf5/events.out.tfevents.1697134361.63a985a0dcf5.4074.0 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:12f388aa7a381716fcd82edc36b07c4432ae5880f46a9837eed7f8b93b7b65e3
3
+ size 26628
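
The added file under runs/ is a TensorBoard event log for this training run. A sketch, assuming the `tensorboard` package is installed, of opening it programmatically instead of through the UI:

```python
# Sketch: list what the uploaded event file contains. EventAccumulator is
# TensorBoard's reader for event logs; the directory path comes from this
# commit's runs/ folder.
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

acc = EventAccumulator("runs/Oct12_18-12-36_63a985a0dcf5")
acc.Reload()      # scan the event file(s)
print(acc.Tags())  # e.g. scalar tags such as the training loss
```
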
tokenizer_config.json CHANGED
@@ -1,5 +1,40 @@
1
  {
2
  "add_prefix_space": false,
3
  "bos_token": "<s>",
4
  "clean_up_tokenization_spaces": false,
5
  "eos_token": "</s>",
 
1
  {
2
  "add_prefix_space": false,
3
+ "added_tokens_decoder": {
4
+ "0": {
5
+ "content": "<unk>",
6
+ "lstrip": false,
7
+ "normalized": false,
8
+ "rstrip": false,
9
+ "single_word": false,
10
+ "special": true
11
+ },
12
+ "1": {
13
+ "content": "<s>",
14
+ "lstrip": false,
15
+ "normalized": false,
16
+ "rstrip": false,
17
+ "single_word": false,
18
+ "special": true
19
+ },
20
+ "2": {
21
+ "content": "</s>",
22
+ "lstrip": false,
23
+ "normalized": false,
24
+ "rstrip": false,
25
+ "single_word": false,
26
+ "special": true
27
+ },
28
+ "3": {
29
+ "content": "<pad>",
30
+ "lstrip": false,
31
+ "normalized": false,
32
+ "rstrip": false,
33
+ "single_word": false,
34
+ "special": true
35
+ }
36
+ },
37
+ "additional_special_tokens": [],
38
  "bos_token": "<s>",
39
  "clean_up_tokenization_spaces": false,
40
  "eos_token": "</s>",
training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:18fa67295c2f705605ed2e7ff81543bd20a36db35f54e417d9b1ea047663c02f
3
  size 4027
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2f64eef1b40d4774a448e5873470138b7a2cb17cf32f63605c071f25bf135444
3
  size 4027