jonathanagustin committed
Commit 0365b89
1 Parent(s): d086ecc

Model save

Files changed (5)
  1. README.md +58 -261
  2. config.json +1 -1
  3. metrics.json +6 -6
  4. tokenizer.json +3 -5
  5. trainer_state.json +159 -19
README.md CHANGED
@@ -1,281 +1,78 @@
 ---
- language: en
- license: mit
- model_details: "\n ## Abstract\n This model, 'roberta-finetuned', is\
- \ a question-answering chatbot trained on the SQuAD dataset, demonstrating competency\
- \ in building conversational AI using recent advances in natural language processing.\
- \ It utilizes a BERT model fine-tuned for extractive question answering.\n\n \
- \ ## Data Collection and Preprocessing\n The model was trained on the\
- \ Stanford Question Answering Dataset (SQuAD), which contains over 100,000 question-answer\
- \ pairs based on Wikipedia articles. The data preprocessing involved tokenizing\
- \ context paragraphs and questions, truncating sequences to fit BERT's max length,\
- \ and adding special tokens to mark question and paragraph segments.\n\n \
- \ ## Model Architecture and Training\n The architecture is based on the BERT\
- \ transformer model, which was pretrained on large unlabeled text corpora. For this\
- \ project, the BERT base model was fine-tuned on SQuAD for extractive question answering,\
- \ with additional output layers for predicting the start and end indices of the\
- \ answer span.\n\n ## SQuAD 2.0 Dataset\n SQuAD 2.0 combines the existing\
- \ SQuAD data with over 50,000 unanswerable questions written adversarially by crowdworkers\
- \ to look similar to answerable ones. This version of the dataset challenges models\
- \ to not only produce answers when possible but also determine when no answer is\
- \ supported by the paragraph and abstain from answering.\n "
- intended_use: "\n - Answering questions from the squad_v2 dataset.\n \
- \ - Developing question-answering systems within the scope of the aai520-project.\n\
- \ - Research and experimentation in the NLP question-answering domain.\n\
- \ "
- limitations_and_bias: "\n The model inherits limitations and biases from the\
- \ 'roberta-base' model, as it was trained on the same foundational data. \n \
- \ It may underperform on questions that are ambiguous or too far outside the\
- \ scope of the topics covered in the squad_v2 dataset. \n Additionally, the\
- \ model may reflect societal biases present in its training data.\n "
- ethical_considerations: "\n This model should not be used for making critical\
- \ decisions without human oversight, \n as it can generate incorrect or biased\
- \ answers, especially for topics not covered in the training data. \n Users\
- \ should also consider the ethical implications of using AI in decision-making processes\
- \ and the potential for perpetuating biases.\n "
- evaluation: "\n The model was evaluated on the squad_v2 dataset using various\
- \ metrics. These metrics, along with their corresponding scores, \n are detailed\
- \ in the 'eval_results' section. The evaluation process ensured a comprehensive\
- \ assessment of the model's performance \n in question-answering scenarios.\n\
- \ "
- training: "\n The model was trained over 4 epochs with a learning rate of 2e-05,\
- \ using a batch size of 64. \n The training utilized a cross-entropy loss\
- \ function and the AdamW optimizer, with gradient accumulation over 4 steps.\n \
- \ "
- tips_and_tricks: "\n For optimal performance, questions should be clear, concise,\
- \ and grammatically correct. \n The model performs best on questions related\
- \ to topics covered in the squad_v2 dataset. \n It is advisable to pre-process\
- \ text for consistency in encoding and punctuation, and to manage expectations for\
- \ questions on topics outside the training data.\n "
+ tags:
+ - generated_from_trainer
+ datasets:
+ - squad_v2
 model-index:
- - name: roberta-finetuned
- results:
- - task:
- type: question-answering
- dataset:
- name: SQuAD v2
- type: squad_v2
- metrics:
- - type: Exact
- value: 36.511412448412365
- - type: F1
- value: 39.525181841913
- - type: Total
- value: 11873
- - type: Hasans Exact
- value: 72.19973009446694
- - type: Hasans F1
- value: 78.23591160746174
- - type: Hasans Total
- value: 5928
- - type: Noans Exact
- value: 0.9251471825063078
- - type: Noans F1
- value: 0.9251471825063078
- - type: Noans Total
- value: 5945
- - type: Best Exact
- value: 50.09685841825992
- - type: Best Exact Thresh
- value: 0.0
- - type: Best F1
- value: 50.09685841825992
- - type: Best F1 Thresh
- value: 0.0
+ - name: roberta-finetuned-squad_v2
+ results: []
 ---

- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->
-
- ## Model Details
-
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
- - **Developed by:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** en
- - **License:** mit
- - **Finetuned from model [optional]:** [More Information Needed]
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
-
- ## Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ### Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
-
- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- [More Information Needed]
-
- ## Training Details
-
- ### Training Data
-
- <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
-
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]
-
- #### Training Hyperparameters
-
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Data Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
- ## Environmental Impact
-
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]
-
- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]
-
- ## Model Card Contact
-
- [More Information Needed]
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # roberta-finetuned-squad_v2
+
+ This model was trained from scratch on the squad_v2 dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.8582
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 2e-05
+ - train_batch_size: 64
+ - eval_batch_size: 64
+ - seed: 42
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 256
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - num_epochs: 4
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:-----:|:----:|:---------------:|
+ | 2.9129        | 0.2   | 100  | 1.4700          |
+ | 1.4395        | 0.39  | 200  | 1.2407          |
+ | 1.2356        | 0.59  | 300  | 1.0325          |
+ | 1.1284        | 0.78  | 400  | 0.9750          |
+ | 1.0821        | 0.98  | 500  | 0.9345          |
+ | 0.9978        | 1.18  | 600  | 0.9893          |
+ | 0.9697        | 1.37  | 700  | 0.9300          |
+ | 0.9455        | 1.57  | 800  | 0.9351          |
+ | 0.9322        | 1.76  | 900  | 0.9451          |
+ | 0.9269        | 1.96  | 1000 | 0.9064          |
+ | 0.9105        | 2.16  | 1100 | 0.8837          |
+ | 0.8805        | 2.35  | 1200 | 0.8876          |
+ | 0.8703        | 2.55  | 1300 | 0.9853          |
+ | 0.8699        | 2.75  | 1400 | 0.9235          |
+ | 0.8633        | 2.94  | 1500 | 0.8930          |
+ | 0.828         | 3.14  | 1600 | 0.8582          |
+ | 0.8284        | 3.33  | 1700 | 0.9203          |
+ | 0.8076        | 3.53  | 1800 | 0.8866          |
+ | 0.7805        | 3.73  | 1900 | 0.9099          |
+ | 0.7974        | 3.92  | 2000 | 0.8746          |
+
+ ### Framework versions
+
+ - Transformers 4.34.1
+ - Pytorch 2.1.0+cu118
+ - Datasets 2.14.5
+ - Tokenizers 0.14.1
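
The hyperparameters listed in the README above correspond to a standard transformers Trainer configuration along these lines. This is a minimal sketch, assuming roberta-base as the starting checkpoint and an already-tokenized squad_v2 dataset; neither the checkpoint name nor the output path is stated in this commit.

```python
# Sketch of a Trainer setup matching the hyperparameters in the README above.
from transformers import (
    AutoModelForQuestionAnswering,
    Trainer,
    TrainingArguments,
)

model = AutoModelForQuestionAnswering.from_pretrained("roberta-base")  # assumed base checkpoint

args = TrainingArguments(
    output_dir="roberta-finetuned-squad_v2",  # hypothetical output path
    learning_rate=2e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    gradient_accumulation_steps=4,  # effective batch size: 64 * 4 = 256
    num_train_epochs=4,
    lr_scheduler_type="linear",
    seed=42,
    evaluation_strategy="steps",
    eval_steps=100,   # matches eval_steps in trainer_state.json below
    save_steps=100,
    logging_steps=100,
)

# train_ds / eval_ds would be the tokenized squad_v2 splits (not shown here):
# trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=eval_ds)
# trainer.train()
```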
config.json CHANGED
@@ -1,5 +1,5 @@
  {
-   "_name_or_path": "/content/drive/My Drive/Colab Notebooks/aai520-project/checkpoints/roberta-finetuned/checkpoint-1000",
+   "_name_or_path": "/content/drive/My Drive/Colab Notebooks/aai520-project/checkpoints/roberta-finetuned/checkpoint-2000",
    "architectures": [
      "RobertaForQuestionAnswering"
    ],
metrics.json CHANGED
@@ -1,12 +1,12 @@
  {
-   "exact": 37.40419439063421,
-   "f1": 40.42817816263742,
+   "exact": 36.511412448412365,
+   "f1": 39.525181841913,
    "total": 11873,
-   "HasAns_exact": 72.58771929824562,
-   "HasAns_f1": 78.64435886049151,
+   "HasAns_exact": 72.19973009446694,
+   "HasAns_f1": 78.23591160746174,
    "HasAns_total": 5928,
-   "NoAns_exact": 2.3212783851976453,
-   "NoAns_f1": 2.3212783851976453,
+   "NoAns_exact": 0.9251471825063078,
+   "NoAns_f1": 0.9251471825063078,
    "NoAns_total": 5945,
    "best_exact": 50.09685841825992,
    "best_exact_thresh": 0.0,
tokenizer.json CHANGED
@@ -3,13 +3,11 @@
    "truncation": {
      "direction": "Right",
      "max_length": 512,
-     "strategy": "LongestFirst",
-     "stride": 0
+     "strategy": "OnlySecond",
+     "stride": 128
    },
    "padding": {
-     "strategy": {
-       "Fixed": 512
-     },
+     "strategy": "BatchLongest",
      "direction": "Right",
      "pad_to_multiple_of": null,
      "pad_id": 1,
trainer_state.json CHANGED
@@ -1,9 +1,9 @@
  {
-   "best_metric": 0.9063502550125122,
-   "best_model_checkpoint": "/content/drive/My Drive/Colab Notebooks/aai520-project/checkpoints/roberta-finetuned/checkpoint-1000",
-   "epoch": 1.9607843137254903,
+   "best_metric": 0.8582048416137695,
+   "best_model_checkpoint": "/content/drive/My Drive/Colab Notebooks/aai520-project/checkpoints/roberta-finetuned/checkpoint-1600",
+   "epoch": 4.0,
    "eval_steps": 100,
-   "global_step": 1000,
+   "global_step": 2040,
    "is_hyper_param_search": false,
    "is_local_process_zero": true,
    "is_world_process_zero": true,
@@ -149,28 +149,168 @@
      "step": 1000
    },
    {
-     "epoch": 1.96,
-     "step": 1000,
-     "total_flos": 6.688961805360538e+16,
-     "train_loss": 0.0,
-     "train_runtime": 2.2787,
-     "train_samples_per_second": 229166.609,
-     "train_steps_per_second": 222.933
+     "epoch": 2.16,
+     "learning_rate": 9.284313725490197e-06,
+     "loss": 0.9105,
+     "step": 1100
    },
    {
-     "epoch": 1.96,
-     "eval_loss": 0.9063556790351868,
-     "eval_runtime": 17.1279,
-     "eval_samples_per_second": 697.982,
-     "eval_steps_per_second": 2.744,
-     "step": 1000
+     "epoch": 2.16,
+     "eval_loss": 0.8837365508079529,
+     "eval_runtime": 17.5298,
+     "eval_samples_per_second": 681.981,
+     "eval_steps_per_second": 10.668,
+     "step": 1100
+   },
+   {
+     "epoch": 2.35,
+     "learning_rate": 8.303921568627452e-06,
+     "loss": 0.8805,
+     "step": 1200
+   },
+   {
+     "epoch": 2.35,
+     "eval_loss": 0.8875929713249207,
+     "eval_runtime": 17.5814,
+     "eval_samples_per_second": 679.978,
+     "eval_steps_per_second": 10.636,
+     "step": 1200
+   },
+   {
+     "epoch": 2.55,
+     "learning_rate": 7.333333333333333e-06,
+     "loss": 0.8703,
+     "step": 1300
+   },
+   {
+     "epoch": 2.55,
+     "eval_loss": 0.9852611422538757,
+     "eval_runtime": 17.5854,
+     "eval_samples_per_second": 679.824,
+     "eval_steps_per_second": 10.634,
+     "step": 1300
+   },
+   {
+     "epoch": 2.75,
+     "learning_rate": 6.352941176470589e-06,
+     "loss": 0.8699,
+     "step": 1400
+   },
+   {
+     "epoch": 2.75,
+     "eval_loss": 0.9235011339187622,
+     "eval_runtime": 17.5815,
+     "eval_samples_per_second": 679.975,
+     "eval_steps_per_second": 10.636,
+     "step": 1400
+   },
+   {
+     "epoch": 2.94,
+     "learning_rate": 5.372549019607843e-06,
+     "loss": 0.8633,
+     "step": 1500
+   },
+   {
+     "epoch": 2.94,
+     "eval_loss": 0.8929564356803894,
+     "eval_runtime": 17.5589,
+     "eval_samples_per_second": 680.85,
+     "eval_steps_per_second": 10.65,
+     "step": 1500
+   },
+   {
+     "epoch": 3.14,
+     "learning_rate": 4.392156862745098e-06,
+     "loss": 0.828,
+     "step": 1600
+   },
+   {
+     "epoch": 3.14,
+     "eval_loss": 0.8582048416137695,
+     "eval_runtime": 17.5663,
+     "eval_samples_per_second": 680.564,
+     "eval_steps_per_second": 10.645,
+     "step": 1600
+   },
+   {
+     "epoch": 3.33,
+     "learning_rate": 3.421568627450981e-06,
+     "loss": 0.8284,
+     "step": 1700
+   },
+   {
+     "epoch": 3.33,
+     "eval_loss": 0.920342743396759,
+     "eval_runtime": 17.6216,
+     "eval_samples_per_second": 678.428,
+     "eval_steps_per_second": 10.612,
+     "step": 1700
+   },
+   {
+     "epoch": 3.53,
+     "learning_rate": 2.4411764705882356e-06,
+     "loss": 0.8076,
+     "step": 1800
+   },
+   {
+     "epoch": 3.53,
+     "eval_loss": 0.8865646719932556,
+     "eval_runtime": 17.6165,
+     "eval_samples_per_second": 678.626,
+     "eval_steps_per_second": 10.615,
+     "step": 1800
+   },
+   {
+     "epoch": 3.73,
+     "learning_rate": 1.4607843137254903e-06,
+     "loss": 0.7805,
+     "step": 1900
+   },
+   {
+     "epoch": 3.73,
+     "eval_loss": 0.9098581075668335,
+     "eval_runtime": 17.5589,
+     "eval_samples_per_second": 680.85,
+     "eval_steps_per_second": 10.65,
+     "step": 1900
+   },
+   {
+     "epoch": 3.92,
+     "learning_rate": 4.901960784313725e-07,
+     "loss": 0.7974,
+     "step": 2000
+   },
+   {
+     "epoch": 3.92,
+     "eval_loss": 0.8746156096458435,
+     "eval_runtime": 17.5409,
+     "eval_samples_per_second": 681.548,
+     "eval_steps_per_second": 10.661,
+     "step": 2000
+   },
+   {
+     "epoch": 4.0,
+     "step": 2040,
+     "total_flos": 1.3645021155456614e+17,
+     "train_loss": 0.42923324809354896,
+     "train_runtime": 1714.7653,
+     "train_samples_per_second": 304.534,
+     "train_steps_per_second": 1.19
+   },
+   {
+     "epoch": 4.0,
+     "eval_loss": 0.8582048416137695,
+     "eval_runtime": 17.5753,
+     "eval_samples_per_second": 680.218,
+     "eval_steps_per_second": 10.64,
+     "step": 2040
    }
  ],
  "logging_steps": 100,
- "max_steps": 508,
+ "max_steps": 2040,
  "num_train_epochs": 4,
  "save_steps": 100,
- "total_flos": 6.688961805360538e+16,
+ "total_flos": 1.3645021155456614e+17,
  "trial_name": null,
  "trial_params": null
  }
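
Once saved, a checkpoint like this is typically used through the question-answering pipeline. A minimal sketch, assuming the repository id below (the commit itself does not state where the model is published):

```python
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="jonathanagustin/roberta-finetuned-squad_v2",  # hypothetical repo id
)

result = qa(
    question="What dataset was this model fine-tuned on?",
    context="The model was fine-tuned on the SQuAD v2 dataset.",
    handle_impossible_answer=True,  # allow an empty answer, as SQuAD v2 requires
)
print(result)  # {'score': ..., 'start': ..., 'end': ..., 'answer': ...}
```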