---
language: en
datasets:
  - squad_v2
license: cc-by-4.0
co2_eq_emissions: 360
---

roberta-base for QA

This is the roberta-base model, fine-tuned using the SQuAD2.0 dataset. It's been trained on question-answer pairs, including unanswerable questions, for the task of Question Answering.

Model Details

Model developers: Branden Chan, Timo Möller, Malte Pietsch, Tanay Soni
Model type: Transformer-based language model
Language: English
Downstream task: Extractive QA
Training data: SQuAD 2.0
Evaluation data: SQuAD 2.0
Code: See an example QA pipeline on Haystack
Infrastructure: 4x Tesla V100
Related Models: Users should see the roberta-base model card for information about the base roberta-base model. deepset has also released a distilled version of this model called deepset/tinyroberta-squad2, which has comparable prediction quality and runs at twice the speed of the base model.

How to Use the Model

In Haystack

Haystack is an NLP framework by deepset. You can use this model in a Haystack pipeline to do question answering at scale (over many documents). To load the model in Haystack:

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2")
# or
reader = TransformersReader(model_name_or_path="deepset/roberta-base-squad2", tokenizer="deepset/roberta-base-squad2")

For a complete example of roberta-base-squad2 being used for question answering, check out the tutorials in the Haystack documentation.

In Transformers

from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

model_name = "deepset/roberta-base-squad2"

# a) Get predictions
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)
QA_input = {
    'question': 'Why is model conversion important?',
    'context': 'The option to convert models between FARM and transformers gives freedom to the user and let people easily switch between frameworks.'
}
res = nlp(QA_input)

# b) Load model & tokenizer
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

Using a distilled model instead

Please note that we have also released a distilled version of this model called deepset/tinyroberta-squad2. The distilled model has comparable prediction quality and runs at twice the speed of the base model.

Uses and Limitations

Uses

This model can be used for the task of question answering.
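As an extractive QA model, it does not generate answers: it outputs a start logit and an end logit for every passage token, and the answer is the span maximizing the combined score (with end at or after start). A minimal, framework-free sketch of that decoding step — the function name and the toy logits below are illustrative, not part of the model's API:

```python
def best_span(start_logits, end_logits, max_answer_len=15):
    """Pick (start, end) maximizing start_logits[s] + end_logits[e],
    subject to s <= e < s + max_answer_len."""
    best = (0, 0)
    best_score = float("-inf")
    for s, s_score in enumerate(start_logits):
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best, best_score

# Toy logits over a 6-token passage (made-up values, not model output)
start = [0.1, 2.0, 0.3, 0.2, 0.1, 0.0]
end   = [0.0, 0.5, 1.8, 0.4, 0.2, 0.1]
span, score = best_span(start, end)
# span == (1, 2): tokens 1..2 form the highest-scoring answer span
```

For SQuAD 2.0's unanswerable questions, the best span score is additionally compared against a no-answer score (taken at the [CLS] position); if the no-answer score wins, the model predicts an empty answer.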

Limitations

Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). The roberta-base model card notes that:

The training data used for this model contains a lot of unfiltered content from the internet, which is far from neutral. Therefore, the model can have biased predictions... This bias will also affect all fine-tuned versions of this model.

See the roberta-base model card for demonstrative examples. Note that those examples are not a comprehensive stress-testing of the model. Readers considering using the model should consider whether more rigorous evaluations of the model may be appropriate depending on their use case and context. For discussion of bias in QA systems, see, e.g., Mao et al. (2021).

Training

Training Data

This model is the roberta-base model, fine-tuned on the SQuAD 2.0 dataset. See the SQuAD 2.0 dataset card to learn more about the data. From the training data section of the roberta-base model card:

The RoBERTa model was pretrained on the reunion of five datasets:

  • BookCorpus, a dataset consisting of 11,038 unpublished books;
  • English Wikipedia (excluding lists, tables and headers);
  • CC-News, a dataset containing 63 million English news articles crawled between September 2016 and February 2019;
  • OpenWebText, an open-source recreation of the WebText dataset used to train GPT-2;
  • Stories, a dataset containing a subset of CommonCrawl data filtered to match the story-like style of Winograd schemas.

Together these datasets weigh 160GB of text.

To learn more about these datasets, see some of the associated dataset cards: BookCorpus, CC-News.

Training Procedure

The hyperparameters were:

batch_size = 96
n_epochs = 2
base_LM_model = "roberta-base"
max_seq_len = 386
learning_rate = 3e-5
lr_schedule = LinearWarmup
warmup_proportion = 0.2
doc_stride = 128
max_query_length = 64
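The max_seq_len, doc_stride, and max_query_length values interact: contexts too long for one sequence are split into overlapping windows, and in the original SQuAD preprocessing each new window starts doc_stride tokens after the previous one, so every token gets seen with surrounding context. A rough sketch of that windowing — the assumption that 3 special tokens plus up to 64 question tokens are reserved out of the 386-token budget is illustrative:

```python
def window_starts(n_doc_tokens, window=386 - 64 - 3, stride=128):
    """Start offsets of overlapping context windows over a tokenized document.
    window: tokens left for the context after reserving max_query_length=64
    question tokens and 3 special tokens (an illustrative assumption)."""
    starts = [0]
    while starts[-1] + window < n_doc_tokens:
        starts.append(starts[-1] + stride)
    return starts

# A 500-token context with a 319-token window and stride 128 yields
# windows covering roughly [0:319], [128:447], [256:500]
print(window_starts(500))  # [0, 128, 256]
```

Each window is scored independently at inference time, and the best answer span across all windows is returned.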

Evaluation Results

The model was evaluated on the SQuAD 2.0 dev set with the official eval script.

Evaluation results include:

"exact": 79.87029394424324,
"f1": 82.91251169582613,
"total": 11873,
"HasAns_exact": 77.93522267206478,
"HasAns_f1": 84.02838248389763,
"HasAns_total": 5928,
"NoAns_exact": 81.79983179142137,
"NoAns_f1": 81.79983179142137,
"NoAns_total": 5945
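The overall scores are simply the example-weighted averages of the HasAns and NoAns subset scores, which makes for a quick consistency check on the numbers above:

```python
# Weighted average of subset exact-match scores reproduces the overall score
has_ans = (77.93522267206478, 5928)   # (exact, n) on answerable questions
no_ans  = (81.79983179142137, 5945)   # (exact, n) on unanswerable questions

total = has_ans[1] + no_ans[1]        # 11873 dev-set examples
overall = (has_ans[0] * has_ans[1] + no_ans[0] * no_ans[1]) / total
print(round(overall, 6))  # 79.870294, matching the reported "exact" score
```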

Environmental Impacts

Carbon emissions associated with training the model (fine-tuning the roberta-base model) were estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019), based on the hardware, runtime, cloud provider, and compute region.

  • Hardware Type: 4x V100 GPU (p3.8xlarge)
  • Hours used: 0.5 (30 minutes)
  • Cloud Provider: AWS
  • Compute Region: EU-Ireland
  • Carbon Emitted (power consumption x time x carbon intensity of the power grid's location): 0.36 kg CO2 eq.
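The reported figure follows the calculator's simple formula, power x time x grid carbon intensity. A back-of-the-envelope check — the ~300 W per-GPU draw and the grid intensity value below are illustrative assumptions, not measured inputs to the calculator:

```python
gpus = 4
power_kw_per_gpu = 0.3      # assumed ~300 W draw per V100 (illustrative)
hours = 0.5
grid_kg_per_kwh = 0.6       # illustrative carbon intensity for the region

energy_kwh = gpus * power_kw_per_gpu * hours   # 0.6 kWh total
co2_kg = energy_kwh * grid_kg_per_kwh
print(round(co2_kg, 2))  # 0.36 kg CO2 eq., matching the reported estimate
```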

Authors

Branden Chan: branden.chan@deepset.ai
Timo Möller: timo.moeller@deepset.ai
Malte Pietsch: malte.pietsch@deepset.ai
Tanay Soni: tanay.soni@deepset.ai

About us

deepset is the company behind the open-source NLP framework Haystack, which is designed to help you build production-ready NLP systems for question answering, summarization, ranking, and more.


Get in touch and join the Haystack community

For more info on Haystack, visit our GitHub repo and Documentation.

We also have a Slack community open to everyone!

Twitter | LinkedIn | Slack | GitHub Discussions | Website

By the way: we're hiring!