bert_squad is a transformer-based model trained for context-based question answering.

The model was trained using free computational resources, demonstrating its accessibility for educational and small-scale research purposes.

- **Developed by:** SADAT PARVEJ, RAFIFA BINTE JAHIR
- **Shared by:** SADAT PARVEJ
- **Language(s) (NLP):** English
- **Finetuned from model:** [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased)

### Model Sources [optional]

- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Training Objective

The model predicts the most relevant span of text in a given passage that answers a specific question. Fine-tuning specializes BERT's contextual encoding for this task using supervised data from SQuAD.
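To make the span-extraction objective concrete, here is a minimal sketch of what the model computes at prediction time. It is an illustration only, and assumes the fine-tuned checkpoint is available under the `bert_squad` path used elsewhere in this card:

```python
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

# Assumed checkpoint path; replace with wherever bert_squad is stored.
tokenizer = AutoTokenizer.from_pretrained("bert_squad")
model = AutoModelForQuestionAnswering.from_pretrained("bert_squad")

question = "What does the model predict?"
context = "The model predicts the answer span inside a given passage."

# BERT encodes the pair as: [CLS] question [SEP] context [SEP]
inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# The QA head emits one start logit and one end logit per token;
# the predicted answer is the highest-scoring (start, end) span.
start = int(outputs.start_logits.argmax())
end = int(outputs.end_logits.argmax())
answer = tokenizer.decode(inputs["input_ids"][0][start : end + 1])
print(answer)
```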

### Performance Benchmarks

Final values at training step 2000 (see the full table under Training Details):

- Training loss: 0.477800
- Validation loss: 0.465936
- Exact Match (EM): 87.568590%

## Intended Uses & Limitations

This model is designed for tasks such as:

- Extractive question answering
- Reading-comprehension applications

Known limitations:

- BERT is pretrained as a masked language model (MLM), so this checkpoint is not suited to generative tasks or to queries outside the SQuAD-style extractive QA setup.
- Predictions may be biased toward, or overly reliant on, the training data: SQuAD consists of structured, fact-based question-answer pairs.
## How to Get Started with the Model
Use the code below to get started with the model.

```python
from transformers import pipeline

# Load the fine-tuned QA model ('bert_squad' is this model's checkpoint path or Hub id).
qa_pipeline = pipeline('question-answering', model='bert_squad')

context = "BERT is a transformers model for natural language processing."
question = "What is BERT used for?"

result = qa_pipeline(question=question, context=context)
print(result)  # e.g. {'score': ..., 'start': ..., 'end': ..., 'answer': '...'}
```

## Training Details

| Step | Training Loss | Validation Loss | Exact Match (%) | SQuAD F1 (%) | Start Accuracy | End Accuracy |
|------|---------------|-----------------|-----------------|--------------|----------------|--------------|
| 100  | 0.632200 | 0.811809 | 84.749290 | 84.749290 | 0.847493 | 0.899243 |
| 200  | 0.751500 | 0.627198 | 84.768212 | 84.768212 | 0.847682 | 0.899243 |
| 300  | 0.662600 | 0.557515 | 86.244087 | 86.244087 | 0.862441 | 0.899243 |
| 400  | 0.600400 | 0.567693 | 86.177862 | 86.177862 | 0.861779 | 0.899243 |
| 500  | 0.613200 | 0.523546 | 86.499527 | 86.499527 | 0.864995 | 0.899243 |
| 600  | 0.495200 | 0.539225 | 86.565752 | 86.565752 | 0.865658 | 0.899243 |
| 700  | 0.645300 | 0.552358 | 85.354778 | 85.354778 | 0.853548 | 0.899243 |
| 800  | 0.499100 | 0.562317 | 86.338694 | 86.338694 | 0.863387 | 0.899243 |
| 900  | 0.482800 | 0.499747 | 86.811731 | 86.811731 | 0.868117 | 0.899243 |
| 1000 | 0.372800 | 0.543513 | 86.972564 | 86.972564 | 0.869726 | 0.900000 |
| 1100 | 0.554000 | 0.502747 | 85.969726 | 85.969726 | 0.859697 | 0.894797 |
| 1200 | 0.459800 | 0.484941 | 87.019868 | 87.019868 | 0.870199 | 0.900662 |
| 1300 | 0.463600 | 0.477527 | 87.407758 | 87.407758 | 0.874078 | 0.899905 |
| 1400 | 0.356800 | 0.499119 | 87.549669 | 87.549669 | 0.875497 | 0.901608 |
| 1500 | 0.494200 | 0.485287 | 87.549669 | 87.549669 | 0.875497 | 0.901703 |
| 1600 | 0.521100 | 0.466062 | 87.284768 | 87.284768 | 0.872848 | 0.899243 |
| 1700 | 0.461200 | 0.462704 | 87.540208 | 87.540208 | 0.875402 | 0.901419 |
| 1800 | 0.415700 | 0.474295 | 87.691580 | 87.691580 | 0.876916 | 0.901892 |
| 1900 | 0.622900 | 0.462900 | 87.417219 | 87.417219 | 0.874172 | 0.901987 |
| 2000 | 0.477800 | 0.465936 | 87.568590 | 87.568590 | 0.875686 | 0.901892 |

### Training Data

The model was trained on the [SQuAD](https://huggingface.co/datasets/squad) dataset, a widely used benchmark for context-based question answering. It consists of passages from Wikipedia and corresponding questions, with human-annotated answers.

During training, the dataset was processed to extract contexts, questions, and answers, ensuring compatibility with BERT's question-answering input format. Training relied on free compute resources to minimize cost. A sketch of this preprocessing step follows.
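The exact preprocessing code is not included in this card. The following is a minimal sketch of the standard SQuAD feature-building step it describes, assuming the Hugging Face `datasets` answer format (`text` plus `answer_start` character offsets):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")

def to_features(example):
    # Tokenize the (question, context) pair, truncating only the context
    # and keeping character offsets so the answer can be located in tokens.
    enc = tokenizer(
        example["question"],
        example["context"],
        truncation="only_second",
        max_length=384,
        return_offsets_mapping=True,
    )
    start_char = example["answers"]["answer_start"][0]
    end_char = start_char + len(example["answers"]["text"][0])

    # Map the character span of the answer to token start/end positions,
    # considering only tokens that belong to the context (sequence id 1).
    start_tok = end_tok = 0
    for i, (s, e) in enumerate(enc["offset_mapping"]):
        if enc.sequence_ids()[i] != 1:
            continue
        if s <= start_char < e:
            start_tok = i
        if s < end_char <= e:
            end_tok = i

    enc["start_positions"] = start_tok
    enc["end_positions"] = end_tok
    enc.pop("offset_mapping")  # only needed for the mapping above
    return enc
```

With `datasets`, a function like this is typically applied via `dataset.map(to_features)` before training.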

### Training Procedure

**Training Objective**

The model was trained to perform context-based question answering on the SQuAD dataset. Fine-tuning adapts BERT's MLM-pretrained encoder to span extraction by leveraging its ability to encode the contextual relationships between passage, question, and answer.

**Optimization**

Training used the AdamW optimizer with a linear learning-rate schedule and warm-up steps to keep weight updates stable and limit overfitting. The run lasted 2000 steps, with early stopping driven by validation loss and the Exact Match score. A configuration sketch is shown below.
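The card does not list the remaining hyperparameters. The sketch below reproduces only the stated setup (AdamW, which is the `Trainer` default optimizer; a linear schedule with warm-up; 2000 steps; evaluation every 100 steps as in the table above). The learning rate, batch size, and warm-up length are assumptions, and `train_features`/`eval_features` stand for the tokenized SQuAD splits from the preprocessing sketch:

```python
from transformers import AutoModelForQuestionAnswering, Trainer, TrainingArguments

model = AutoModelForQuestionAnswering.from_pretrained("google-bert/bert-base-uncased")

args = TrainingArguments(
    output_dir="bert_squad",
    max_steps=2000,                  # stated: 2000 training steps
    lr_scheduler_type="linear",      # stated: linear schedule with warm-up
    warmup_steps=100,                # assumption
    learning_rate=3e-5,              # assumption
    per_device_train_batch_size=16,  # assumption
    eval_strategy="steps",           # `evaluation_strategy` in older releases
    eval_steps=100,                  # matches the 100-step evaluation interval
    load_best_model_at_end=True,     # approximates the early-stopping criterion
    metric_for_best_model="eval_loss",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_features,  # tokenized SQuAD train split (see above)
    eval_dataset=eval_features,    # tokenized SQuAD validation split
)
trainer.train()
```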

**Hardware and Resources**

Training was conducted on free resources such as Google Colab or equivalent free GPU services. While this limited scale, batch size and learning rate were adjusted to keep training efficient within these constraints.

**Unique Features**

The fine-tuning procedure emphasizes efficient learning, leveraging BERT's pre-trained knowledge while adapting it specifically to QA tasks in a resource-constrained environment.

#### Metrics

Performance was evaluated using the following metrics (a simplified computation sketch follows the list):

- **Exact Match (EM):** the percentage of predictions that match the ground-truth answers exactly.
- **F1 Score:** token-level overlap between the predicted and true answers, balancing precision and recall.
- **Start and End Accuracy:** how often the model correctly identifies the start and end indices of answers within the context.
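As a reference for the first two metrics, here is a simplified version of the computation (the official SQuAD evaluation additionally strips articles and punctuation when normalizing answers):

```python
from collections import Counter

def exact_match(prediction: str, truth: str) -> float:
    # 1.0 when the normalized strings are identical, else 0.0.
    return float(prediction.strip().lower() == truth.strip().lower())

def token_f1(prediction: str, truth: str) -> float:
    # Token-level overlap between prediction and ground truth:
    # F1 is the harmonic mean of precision and recall.
    pred_tokens = prediction.lower().split()
    true_tokens = truth.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("the Eiffel Tower", "The Eiffel Tower"))              # 1.0
print(round(token_f1("Eiffel Tower in Paris", "the Eiffel Tower"), 2))  # 0.57
```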

### Results

The model trained on the SQuAD dataset achieved the following key performance metrics:

- Exact Match (EM): up to 87.69%
- F1 score: up to 87.69%
- Validation loss: reduced to 0.46
- Start accuracy: peaked at 87.69%
- End accuracy: peaked at 90.19%

#### Summary

The model, **bert_squad**, was fine-tuned for context-based question answering on the SQuAD dataset from Hugging Face. Key metrics include an Exact Match (EM) and F1 score of up to **87.69%**, demonstrating strong accuracy. Loss and accuracy improved consistently over 2000 training steps, with validation loss reaching as low as **0.46**.

The training utilized free resources and leveraged BERT's robust pretraining, although BERT's limitation as a masked language model (MLM) remains a consideration. This work highlights how effective question-answering systems can be built on pre-existing datasets and infrastructure.

### Model Architecture and Objective

The model uses BERT, a pretrained Transformer encoder, fine-tuned for context-based question answering. Given a question and a context passage, it predicts the answer span within that passage.

### Compute Infrastructure

#### Hardware

- GPU: Tesla P100, NVIDIA T4

#### Software

- Framework: Hugging Face Transformers
- Dataset: SQuAD (from Hugging Face)
- Other tools: Python, PyTorch

**BibTeX:**

```bibtex
@misc{bert_squad_finetune,
  title  = {BERT Fine-tuned for SQuAD},
  author = {Your Name or Team Name},
  year   = {2024},
  url    = {https://huggingface.co/your-model-repository}
}
```

## Glossary [optional]

- **Exact Match (EM):** a metric measuring the percentage of predictions that match the ground truth exactly.
- **F1 Score:** the harmonic mean of precision and recall, used to evaluate the quality of predictions.
- **Masked Language Model (MLM):** BERT's pretraining objective of predicting masked words in input sentences.