---
license: mit
datasets:
- squad_v2
- squad
language:
- en
library_name: transformers
pipeline_tag: question-answering
tags:
- question-answering
- squad
- squad_v2
- t5
model-index:
- name: sjrhuschlee/flan-t5-base-squad2
  results:
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squad_v2
      type: squad_v2
      config: squad_v2
      split: validation
    metrics:
    - type: exact_match
      value: 82.203
      name: Exact Match
    - type: f1
      value: 85.283
      name: F1
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squad
      type: squad
      config: plain_text
      split: validation
    metrics:
    - type: exact_match
      value: 86.367
      name: Exact Match
    - type: f1
      value: 92.965
      name: F1
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: adversarial_qa
      type: adversarial_qa
      config: adversarialQA
      split: validation
    metrics:
    - type: exact_match
      value: 34.167
      name: Exact Match
    - type: f1
      value: 46.911
      name: F1
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squad_adversarial
      type: squad_adversarial
      config: AddOneSent
      split: validation
    metrics:
    - type: exact_match
      value: 80.862
      name: Exact Match
    - type: f1
      value: 86.070
      name: F1
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squadshifts amazon
      type: squadshifts
      config: amazon
      split: test
    metrics:
    - type: exact_match
      value: 71.624
      name: Exact Match
    - type: f1
      value: 85.113
      name: F1
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squadshifts new_wiki
      type: squadshifts
      config: new_wiki
      split: test
    metrics:
    - type: exact_match
      value: 82.389
      name: Exact Match
    - type: f1
      value: 91.259
      name: F1
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squadshifts nyt
      type: squadshifts
      config: nyt
      split: test
    metrics:
    - type: exact_match
      value: 83.736
      name: Exact Match
    - type: f1
      value: 91.675
      name: F1
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squadshifts reddit
      type: squadshifts
      config: reddit
      split: test
    metrics:
    - type: exact_match
      value: 72.743
      name: Exact Match
    - type: f1
      value: 84.273
      name: F1
---

# flan-t5-base for Extractive QA

This is the [flan-t5-base](https://huggingface.co/google/flan-t5-base) model, fine-tuned using the [SQuAD2.0](https://huggingface.co/datasets/squad_v2) dataset. It's been trained on question-answer pairs, including unanswerable questions, for the task of Extractive Question Answering.

**UPDATE:** As of transformers version 4.31.0, `trust_remote_code=True` is no longer necessary.

**NOTE:** The `<cls>` token must be manually prepended to the question for this model to work properly.
The model relies on the `<cls>` token to make "no answer" predictions.
The T5 tokenizer does not add this special token automatically, so it must be added by hand.

## Overview
**Language model:** flan-t5-base  
**Language:** English  
**Downstream-task:** Extractive QA  
**Training data:** SQuAD 2.0  
**Eval data:** SQuAD 2.0  
**Infrastructure:** 1x NVIDIA 3070  

## Model Usage
```python
import torch
from transformers import (
  AutoModelForQuestionAnswering,
  AutoTokenizer,
  pipeline,
)
model_name = "sjrhuschlee/flan-t5-base-squad2"

# a) Using pipelines
nlp = pipeline(
  'question-answering',
  model=model_name,
  tokenizer=model_name,
  # trust_remote_code=True,  # Not needed for transformers>=4.31.0
)
qa_input = {
  'question': f'{nlp.tokenizer.cls_token}Where do I live?',  # '<cls>Where do I live?'
  'context': 'My name is Sarah and I live in London',
}
res = nlp(qa_input)
# {'score': 0.980, 'start': 30, 'end': 37, 'answer': ' London'}

# b) Load model & tokenizer
model = AutoModelForQuestionAnswering.from_pretrained(
  model_name,
  # trust_remote_code=True,  # Not needed for transformers>=4.31.0
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

question = f'{tokenizer.cls_token}Where do I live?'  # '<cls>Where do I live?'
context = 'My name is Sarah and I live in London'
encoding = tokenizer(question, context, return_tensors="pt")
output = model(
  encoding["input_ids"],
  attention_mask=encoding["attention_mask"]
)

all_tokens = tokenizer.convert_ids_to_tokens(encoding["input_ids"][0].tolist())
answer_tokens = all_tokens[torch.argmax(output["start_logits"]):torch.argmax(output["end_logits"]) + 1]
answer = tokenizer.decode(tokenizer.convert_tokens_to_ids(answer_tokens))
# 'London'
```

## Metrics

```bash
# Squad v2
{
    "eval_HasAns_exact": 79.97638326585695,
    "eval_HasAns_f1": 86.1444296592862,
    "eval_HasAns_total": 5928,
    "eval_NoAns_exact": 84.42388561816652,
    "eval_NoAns_f1": 84.42388561816652,
    "eval_NoAns_total": 5945,
    "eval_best_exact": 82.2033184536343,
    "eval_best_exact_thresh": 0.0,
    "eval_best_f1": 85.28292588395921,
    "eval_best_f1_thresh": 0.0,
    "eval_exact": 82.2033184536343,
    "eval_f1": 85.28292588395928,
    "eval_runtime": 522.0299,
    "eval_samples": 12001,
    "eval_samples_per_second": 22.989,
    "eval_steps_per_second": 0.96,
    "eval_total": 11873
}

# Squad
{
    "eval_exact_match": 86.3197729422895,
    "eval_f1": 92.94686836210295,
    "eval_runtime": 442.1088,
    "eval_samples": 10657,
    "eval_samples_per_second": 24.105,
    "eval_steps_per_second": 1.007
}
```
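The exact match and F1 numbers above follow the standard SQuAD scoring: exact match checks whether the predicted answer string matches a reference answer exactly, while F1 measures token overlap between prediction and reference. A minimal sketch of the token-level F1 (simplified, without the full SQuAD answer normalization of articles and punctuation):

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer (simplified)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # By SQuAD convention, empty vs. empty counts as a match, otherwise 0
        return float(pred_tokens == ref_tokens)
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("in London", "London"))  # partial overlap earns partial credit
```

The official SQuAD v2 script additionally normalizes answers (lowercasing, stripping punctuation and articles) and takes the maximum score over all reference answers for a question.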

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 6
- total_train_batch_size: 96
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 4.0
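The hyperparameters above map onto a `transformers.TrainingArguments` configuration roughly as follows; this is a sketch for orientation, not the exact training script, and the `output_dir` name is illustrative:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="flan-t5-base-squad2",  # illustrative, not the actual directory used
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=6,  # effective train batch size: 16 * 6 = 96
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=4.0,
)
```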

### Training results



### Framework versions

- Transformers 4.30.0.dev0
- Pytorch 2.0.1+cu117
- Datasets 2.12.0
- Tokenizers 0.13.3