jonathanagustin committed
Commit 9f68fb1
1 Parent(s): 9030417

Model save

Files changed (4):
  1. README.md +58 -261
  2. metrics.json +3 -3
  3. trainer_state.json +17 -17
  4. training_args.bin +1 -1
README.md CHANGED
@@ -1,281 +1,78 @@
  ---
- language: en
- license: mit
- model_details: "\n ## Abstract\n This model, 'roberta-finetuned', is\
- \ a question-answering chatbot trained on the SQuAD dataset, demonstrating competency\
- \ in building conversational AI using recent advances in natural language processing.\
- \ It utilizes a BERT model fine-tuned for extractive question answering.\n\n \
- \ ## Data Collection and Preprocessing\n The model was trained on the\
- \ Stanford Question Answering Dataset (SQuAD), which contains over 100,000 question-answer\
- \ pairs based on Wikipedia articles. The data preprocessing involved tokenizing\
- \ context paragraphs and questions, truncating sequences to fit BERT's max length,\
- \ and adding special tokens to mark question and paragraph segments.\n\n \
- \ ## Model Architecture and Training\n The architecture is based on the BERT\
- \ transformer model, which was pretrained on large unlabeled text corpora. For this\
- \ project, the BERT base model was fine-tuned on SQuAD for extractive question answering,\
- \ with additional output layers for predicting the start and end indices of the\
- \ answer span.\n\n ## SQuAD 2.0 Dataset\n SQuAD 2.0 combines the existing\
- \ SQuAD data with over 50,000 unanswerable questions written adversarially by crowdworkers\
- \ to look similar to answerable ones. This version of the dataset challenges models\
- \ to not only produce answers when possible but also determine when no answer is\
- \ supported by the paragraph and abstain from answering.\n "
- intended_use: "\n - Answering questions from the squad_v2 dataset.\n \
- \ - Developing question-answering systems within the scope of the aai520-project.\n\
- \ - Research and experimentation in the NLP question-answering domain.\n\
- \ "
- limitations_and_bias: "\n The model inherits limitations and biases from the\
- \ 'roberta-base' model, as it was trained on the same foundational data. \n \
- \ It may underperform on questions that are ambiguous or too far outside the\
- \ scope of the topics covered in the squad_v2 dataset. \n Additionally, the\
- \ model may reflect societal biases present in its training data.\n "
- ethical_considerations: "\n This model should not be used for making critical\
- \ decisions without human oversight, \n as it can generate incorrect or biased\
- \ answers, especially for topics not covered in the training data. \n Users\
- \ should also consider the ethical implications of using AI in decision-making processes\
- \ and the potential for perpetuating biases.\n "
- evaluation: "\n The model was evaluated on the squad_v2 dataset using various\
- \ metrics. These metrics, along with their corresponding scores, \n are detailed\
- \ in the 'eval_results' section. The evaluation process ensured a comprehensive\
- \ assessment of the model's performance \n in question-answering scenarios.\n\
- \ "
- training: "\n The model was trained over 4 epochs with a learning rate of 2e-05,\
- \ using a batch size of 128. \n The training utilized a cross-entropy loss\
- \ function and the AdamW optimizer, with gradient accumulation over 4 steps.\n \
- \ "
- tips_and_tricks: "\n For optimal performance, questions should be clear, concise,\
- \ and grammatically correct. \n The model performs best on questions related\
- \ to topics covered in the squad_v2 dataset. \n It is advisable to pre-process\
- \ text for consistency in encoding and punctuation, and to manage expectations for\
- \ questions on topics outside the training data.\n "
+ tags:
+ - generated_from_trainer
+ datasets:
+ - squad_v2
  model-index:
- - name: roberta-finetuned
-   results:
-   - task:
-       type: question-answering
-     dataset:
-       name: SQuAD v2
-       type: squad_v2
-     metrics:
-     - type: Exact
-       value: 100.0
-     - type: F1
-       value: 100.0
-     - type: Total
-       value: 2
-     - type: Hasans Exact
-       value: 100.0
-     - type: Hasans F1
-       value: 100.0
-     - type: Hasans Total
-       value: 2
-     - type: Best Exact
-       value: 100.0
-     - type: Best Exact Thresh
-       value: 0.9603068232536316
-     - type: Best F1
-       value: 100.0
-     - type: Best F1 Thresh
-       value: 0.9603068232536316
-     - type: Total Time In Seconds
-       value: 0.034724613000435056
-     - type: Samples Per Second
-       value: 57.59603425889707
-     - type: Latency In Seconds
-       value: 0.017362306500217528
+ - name: roberta-finetuned-squad_v2
+   results: []
  ---
 
- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->
-
- ## Model Details
-
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
- - **Developed by:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** en
- - **License:** mit
- - **Finetuned from model [optional]:** [More Information Needed]
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
-
- ## Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ### Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
-
- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- [More Information Needed]
-
- ## Training Details
-
- ### Training Data
-
- <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
-
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]
-
- #### Training Hyperparameters
-
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Data Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
- ## Environmental Impact
-
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]
-
- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]
-
- ## Model Card Contact
-
- [More Information Needed]
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # roberta-finetuned-squad_v2
+
+ This model was trained from scratch on the squad_v2 dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.8582
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 2e-05
+ - train_batch_size: 128
+ - eval_batch_size: 128
+ - seed: 42
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 512
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - num_epochs: 4
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:-----:|:----:|:---------------:|
+ | 2.9129 | 0.2 | 100 | 1.4700 |
+ | 1.4395 | 0.39 | 200 | 1.2407 |
+ | 1.2356 | 0.59 | 300 | 1.0325 |
+ | 1.1284 | 0.78 | 400 | 0.9750 |
+ | 1.0821 | 0.98 | 500 | 0.9345 |
+ | 0.9978 | 1.18 | 600 | 0.9893 |
+ | 0.9697 | 1.37 | 700 | 0.9300 |
+ | 0.9455 | 1.57 | 800 | 0.9351 |
+ | 0.9322 | 1.76 | 900 | 0.9451 |
+ | 0.9269 | 1.96 | 1000 | 0.9064 |
+ | 0.9105 | 2.16 | 1100 | 0.8837 |
+ | 0.8805 | 2.35 | 1200 | 0.8876 |
+ | 0.8703 | 2.55 | 1300 | 0.9853 |
+ | 0.8699 | 2.75 | 1400 | 0.9235 |
+ | 0.8633 | 2.94 | 1500 | 0.8930 |
+ | 0.828 | 3.14 | 1600 | 0.8582 |
+ | 0.8284 | 3.33 | 1700 | 0.9203 |
+ | 0.8076 | 3.53 | 1800 | 0.8866 |
+ | 0.7805 | 3.73 | 1900 | 0.9099 |
+ | 0.7974 | 3.92 | 2000 | 0.8746 |
+
+ ### Framework versions
+
+ - Transformers 4.34.1
+ - Pytorch 2.1.0+cu118
+ - Datasets 2.14.5
+ - Tokenizers 0.14.1
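The Exact and F1 values in the removed model-index follow the standard SQuAD v2 scoring scheme (here they were computed over only 2 samples, hence the perfect scores). As a rough sketch of that scheme, not the official squad_v2 evaluation script: answers are normalized (lowercased, punctuation and articles stripped) before comparing, exact match is equality of normalized strings, and F1 is token-overlap F1.

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, reference: str) -> float:
    return float(normalize(prediction) == normalize(reference))

def f1_score(prediction: str, reference: str) -> float:
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Unanswerable questions: both sides must be empty to score 1.0.
        return float(pred_tokens == ref_tokens)
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

The "Best Exact Thresh" / "Best F1 Thresh" fields come from the additional squad_v2 step of thresholding the no-answer probability, which this sketch omits.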
metrics.json CHANGED
@@ -9,7 +9,7 @@
   "best_exact_thresh": 0.9603068232536316,
   "best_f1": 100.0,
   "best_f1_thresh": 0.9603068232536316,
-  "total_time_in_seconds": 0.034005987999989884,
-  "samples_per_second": 58.813171374423675,
-  "latency_in_seconds": 0.017002993999994942
+  "total_time_in_seconds": 0.034724613000435056,
+  "samples_per_second": 57.59603425889707,
+  "latency_in_seconds": 0.017362306500217528
 }
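The updated throughput fields are mutually consistent with the 2-sample evaluation ("total": 2 earlier in metrics.json): samples_per_second is the sample count divided by total time, and latency_in_seconds is the inverse per-sample figure. A quick check:

```python
# Re-derive the throughput fields in metrics.json from total time and sample count.
total_time_in_seconds = 0.034724613000435056
n_samples = 2  # the "total" field in metrics.json

samples_per_second = n_samples / total_time_in_seconds   # ~57.596
latency_in_seconds = total_time_in_seconds / n_samples   # ~0.017362
```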
trainer_state.json CHANGED
@@ -1,9 +1,9 @@
 {
   "best_metric": 0.8582048416137695,
   "best_model_checkpoint": "/content/drive/My Drive/Colab Notebooks/aai520-project/checkpoints/roberta-finetuned/checkpoint-1600",
-  "epoch": 4.0,
+  "epoch": 3.9215686274509802,
   "eval_steps": 100,
-  "global_step": 2040,
+  "global_step": 2000,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
@@ -289,28 +289,28 @@
       "step": 2000
     },
     {
-      "epoch": 4.0,
-      "step": 2040,
-      "total_flos": 1.3645021155456614e+17,
-      "train_loss": 0.015366485072117226,
-      "train_runtime": 57.0805,
-      "train_samples_per_second": 9148.547,
-      "train_steps_per_second": 35.739
+      "epoch": 3.92,
+      "step": 2000,
+      "total_flos": 1.3377688443640013e+17,
+      "train_loss": 0.0,
+      "train_runtime": 0.6732,
+      "train_samples_per_second": 775720.158,
+      "train_steps_per_second": 1515.183
     },
     {
-      "epoch": 4.0,
-      "eval_loss": 0.8582048416137695,
-      "eval_runtime": 17.613,
-      "eval_samples_per_second": 678.761,
-      "eval_steps_per_second": 10.617,
-      "step": 2040
+      "epoch": 3.92,
+      "eval_loss": 0.8582085371017456,
+      "eval_runtime": 17.3486,
+      "eval_samples_per_second": 689.103,
+      "eval_steps_per_second": 5.418,
+      "step": 2000
     }
   ],
   "logging_steps": 100,
-  "max_steps": 2040,
+  "max_steps": 1020,
   "num_train_epochs": 4,
   "save_steps": 100,
-  "total_flos": 1.3377688443640013e+17,
+  "total_flos": 1.3377688443640013e+17,
   "trial_name": null,
   "trial_params": null
 }
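The new epoch value of 3.9215686274509802 at global_step 2000 follows from the run's geometry: the previous state ran 2040 optimizer steps over 4 epochs, i.e. 510 steps per epoch, so step 2000 lands at 2000/510 of an epoch. The 510 steps per epoch are in turn consistent with the README hyperparameters (train_batch_size 128 with gradient accumulation over 4 steps gives an effective batch of 512). A sketch of that arithmetic:

```python
# Reconcile trainer_state.json: global_step 2000 vs. epoch ~3.9216.
total_steps_full_run = 2040   # previous state: 4 full epochs
num_epochs = 4
steps_per_epoch = total_steps_full_run // num_epochs  # 510

global_step = 2000
epoch = global_step / steps_per_epoch  # ~3.92156862745...

# Effective batch size from the README hyperparameters:
train_batch_size = 128
gradient_accumulation_steps = 4
total_train_batch_size = train_batch_size * gradient_accumulation_steps  # 512
```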
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:79779cf4e7d8a5fdc99b0dc402459aacd75bad5cb5b42f73d24b20e7d7034ed4
+oid sha256:6a83ce951e3313774ed7ff687bd51c4324e10cc0ddcf35e61cf92056310d5f13
 size 4664