flan-t5-base for Extractive QA

This is the flan-t5-base model, fine-tuned on the SQuAD 2.0 dataset. It has been trained on question-answer pairs, including unanswerable questions, for the task of Extractive Question Answering.

UPDATE: As of transformers version 4.31.0, trust_remote_code=True is no longer necessary.

NOTE: The <cls> token must be manually added to the beginning of the question for this model to work properly. The model uses the <cls> token to make "no answer" predictions, but the T5 tokenizer does not add this special token automatically, so it has to be prepended manually.
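
For example, a minimal sketch of what the tokenizer produces with and without the manual prefix:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sjrhuschlee/flan-t5-base-squad2")

# The T5 tokenizer appends only </s>; it never inserts <cls> on its own
print(tokenizer.convert_ids_to_tokens(tokenizer("Where do I live?")["input_ids"]))

# Prepending tokenizer.cls_token puts <cls> first, as the model expects
print(tokenizer.convert_ids_to_tokens(tokenizer(tokenizer.cls_token + "Where do I live?")["input_ids"]))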

Overview

Language model: flan-t5-base
Language: English
Downstream-task: Extractive QA
Training data: SQuAD 2.0
Eval data: SQuAD 2.0
Infrastructure: 1x NVIDIA 3070

Model Usage

import torch
from transformers import (
  AutoModelForQuestionAnswering,
  AutoTokenizer,
  pipeline
)
model_name = "sjrhuschlee/flan-t5-base-squad2"

# a) Using pipelines
nlp = pipeline(
  'question-answering',
  model=model_name,
  tokenizer=model_name,
  # trust_remote_code=True, # Not needed with transformers >= 4.31.0
)
qa_input = {
  'question': f'{nlp.tokenizer.cls_token}Where do I live?',  # '<cls>Where do I live?'
  'context': 'My name is Sarah and I live in London'
}
res = nlp(qa_input)
# {'score': 0.980, 'start': 30, 'end': 37, 'answer': ' London'}

# b) Load model & tokenizer
model = AutoModelForQuestionAnswering.from_pretrained(
  model_name,
  # trust_remote_code=True # Not needed with transformers >= 4.31.0
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

question = f'{tokenizer.cls_token}Where do I live?'  # '<cls>Where do I live?'
context = 'My name is Sarah and I live in London'
encoding = tokenizer(question, context, return_tensors="pt")
output = model(
  encoding["input_ids"],
  attention_mask=encoding["attention_mask"]
)

all_tokens = tokenizer.convert_ids_to_tokens(encoding["input_ids"][0].tolist())
answer_tokens = all_tokens[torch.argmax(output["start_logits"]):torch.argmax(output["end_logits"]) + 1]
answer = tokenizer.decode(tokenizer.convert_tokens_to_ids(answer_tokens))
# 'London'
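
Because the model is trained on SQuAD 2.0, it can also decline to answer. Below is a minimal sketch (the question and output shown are illustrative) of how the pipeline's handle_impossible_answer option surfaces "no answer" predictions:

# c) "No answer" predictions with the pipeline
qa_input = {
  'question': f'{nlp.tokenizer.cls_token}What is my favorite color?',  # not stated in the context
  'context': 'My name is Sarah and I live in London'
}
res = nlp(qa_input, handle_impossible_answer=True)
# If the model prefers "no answer", the pipeline returns an empty answer, e.g.
# {'score': ..., 'start': 0, 'end': 0, 'answer': ''}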

Metrics

# Squad v2
{
    "eval_HasAns_exact": 79.97638326585695,
    "eval_HasAns_f1": 86.1444296592862,
    "eval_HasAns_total": 5928,
    "eval_NoAns_exact": 84.42388561816652,
    "eval_NoAns_f1": 84.42388561816652,
    "eval_NoAns_total": 5945,
    "eval_best_exact": 82.2033184536343,
    "eval_best_exact_thresh": 0.0,
    "eval_best_f1": 85.28292588395921,
    "eval_best_f1_thresh": 0.0,
    "eval_exact": 82.2033184536343,
    "eval_f1": 85.28292588395928,
    "eval_runtime": 522.0299,
    "eval_samples": 12001,
    "eval_samples_per_second": 22.989,
    "eval_steps_per_second": 0.96,
    "eval_total": 11873
}

# Squad
{
    "eval_exact_match": 86.3197729422895,
    "eval_f1": 92.94686836210295,
    "eval_runtime": 442.1088,
    "eval_samples": 10657,
    "eval_samples_per_second": 24.105,
    "eval_steps_per_second": 1.007
}

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 6
  • total_train_batch_size: 96
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 4.0
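
For reference, a minimal sketch of how these settings map onto Hugging Face TrainingArguments (the training script itself is not included here, and the output directory name is hypothetical; the total train batch size of 96 is the per-device batch size of 16 times 6 gradient-accumulation steps):

from transformers import TrainingArguments

training_args = TrainingArguments(
  output_dir="flan-t5-base-squad2",  # hypothetical output directory
  learning_rate=2e-5,
  per_device_train_batch_size=16,
  per_device_eval_batch_size=8,
  gradient_accumulation_steps=6,  # 16 * 6 = 96 effective train batch size
  lr_scheduler_type="linear",
  warmup_ratio=0.1,
  num_train_epochs=4.0,
  seed=42,
  # Adam betas=(0.9, 0.999) and epsilon=1e-8 are the defaults
)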

Training results

Framework versions

  • Transformers 4.30.0.dev0
  • Pytorch 2.0.1+cu117
  • Datasets 2.12.0
  • Tokenizers 0.13.3