---
language: en
datasets:
  - c4
  - wikipedia
metrics:
  - f1
---

T5-V1.1-large-rss

This model is T5-v1.1-large fine-tuned with the Recurrent Span Selection (RSS) pretraining scheme. The model was fine-tuned as part of "How Optimal is Greedy Decoding for Extractive Question Answering?" (Castel et al., 2021), while the RSS pretraining method itself was introduced in "Few-Shot Question Answering by Pretraining Span Selection" (Ram et al., 2021).

Model description

The original T5-v1.1-large was pre-trained only on C4, with no supervised training. Our version is further trained with the Recurrent Span Selection (RSS) scheme, using a sample of the dataset used to pretrain Splinter:

  • contexts containing a span that occurs more than once are detected
  • a single instance of the recurring span is masked
  • the model is trained (with teacher forcing) to predict the masked span

This training scheme naturally matches the extractive question answering task.

During training, the masked span is replaced with <extra_id_0> and the labels are formatted as <extra_id_0>span<extra_id_1>. Unlike Splinter, only one span is masked at a time.
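To make the scheme concrete, below is a minimal sketch of how such a training example could be built. This is illustrative only, not the authors' preprocessing code: real RSS detects arbitrary recurring spans, while this toy version, with its assumed make_rss_example helper, only handles single whitespace-delimited tokens.

from collections import Counter

def make_rss_example(context: str):
    # Illustrative: find a whitespace-delimited token that recurs in the context.
    words = context.split()
    counts = Counter(words)
    recurring = next((w for w in words if counts[w] > 1), None)
    if recurring is None:
        return None  # no recurring span in this context
    # Mask a single instance of the recurring span with the first sentinel token.
    i = words.index(recurring)
    source = ' '.join(words[:i] + ['<extra_id_0>'] + words[i + 1:])
    target = f'<extra_id_0>{recurring}<extra_id_1>'
    return source, target

context = 'Paris is the capital of France . Paris is also its largest city .'
print(make_rss_example(context))
# ('<extra_id_0> is the capital of France . Paris is also its largest city .',
#  '<extra_id_0>Paris<extra_id_1>')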

Intended uses & limitations

This model naturally fits tasks where a span from the context is meant to be copied verbatim, such as extractive question answering. This checkpoint is primarily intended for the zero-shot setting: further fine-tuning it on an annotated dataset yields results equal to those of the original T5-v1.1-large.

How to use

You can use this model directly, but it is recommended to format the input to match the training scheme, as a Text/Question/Answer prompt:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained('tau/t5-v1_1-large-rss')
tokenizer = AutoTokenizer.from_pretrained('tau/t5-v1_1-large-rss')

passage = 'Barack Hussein Obama II is an American politician and attorney who served as the 44th president of the United States from 2009 to 2017.'
question = 'When was Obama inaugurated?'
# Mark the answer position with the first sentinel token (<extra_id_0>),
# mirroring the RSS pretraining format.
text = f'Text: {passage}\nQuestion: {question}\nAnswer:{tokenizer.additional_special_tokens[0]}.'
encoded_input = tokenizer(text, return_tensors='pt')
# Stop generation at the second sentinel token (<extra_id_1>), which closes the span.
output_ids = model.generate(input_ids=encoded_input.input_ids,
                            attention_mask=encoded_input.attention_mask,
                            eos_token_id=tokenizer.additional_special_tokens_ids[1],
                            num_beams=1, max_length=512, min_length=3)
print(tokenizer.decode(output_ids[0]))

The generated answer is then "<pad><extra_id_0> 2009<extra_id_1>", while the one generated by the original T5-v1.1-large is "<pad><extra_id_0> On January 20, 2009<extra_id_1>" - a correct yet non-extractive answer.
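To recover the bare answer string from the generated ids, you can let the tokenizer drop the pad and sentinel tokens; this post-processing step is an assumption, not part of the original card:

# skip_special_tokens removes <pad>, <extra_id_0>, and <extra_id_1>.
answer = tokenizer.decode(output_ids[0], skip_special_tokens=True).strip()
print(answer)  # '2009'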

Limitations and bias

Although using the model with greedy decoding tends toward extractive outputs, it may sometimes produce non-extractive ones, be it a different casing or an entirely different string (or substring) that may carry another semantic meaning.
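A simple guard against such cases, reusing the passage and answer variables from the example above, is to check that the generated string actually occurs verbatim in the context. This naive case-sensitive substring check is a suggestion, not part of the original card:

# Naive extractiveness check: accept the answer only if it was
# copied verbatim (case-sensitive) from the passage.
if answer in passage:
    print(f'extractive answer: {answer!r}')
else:
    print(f'warning: {answer!r} does not appear verbatim in the passage')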

Pretraining

The model was fine-tuned on 100,000 RSS examples for 3 epochs, using the Adafactor optimizer with a constant learning rate of 5e-5.
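For reference, a constant-learning-rate Adafactor can be set up with the transformers implementation as sketched below. The scale_parameter, relative_step, and warmup_init flags are assumptions needed to make the learning rate truly constant; batch size and other training details are not specified in this card.

from transformers import Adafactor

# Disable Adafactor's relative-step schedule and parameter scaling
# so that lr=5e-5 stays constant (assumed settings).
optimizer = Adafactor(model.parameters(), lr=5e-5,
                      scale_parameter=False, relative_step=False, warmup_init=False)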

Evaluation results

Evaluated on few-shot QA datasets in the zero-shot setting (no fine-tuning on annotated examples):

| Model \ Dataset | SQuAD | TriviaQA | NaturalQs | NewsQA | SearchQA | HotpotQA | BioASQ | TextbookQA |
|---|---|---|---|---|---|---|---|---|
| T5 | 50.4 | 61.7 | 42.1 | 19.2 | 24.0 | 43.3 | 55.5 | 17.8 |
| T5-rss | 71.4 | 69.3 | 57.2 | 43.2 | 29.7 | 59.0 | 65.5 | 39.0 |

The gap between the two models diminishes as more training examples are introduced; for additional results, see the paper.

BibTeX entry and citation info

@inproceedings{ram-etal-2021-shot,
    title = "Few-Shot Question Answering by Pretraining Span Selection",
    author = "Ram, Ori  and
      Kirstain, Yuval  and
      Berant, Jonathan  and
      Globerson, Amir  and
      Levy, Omer",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-long.239",
    doi = "10.18653/v1/2021.acl-long.239",
    pages = "3066--3079",
}
@misc{castel2021optimal,
      title={How Optimal is Greedy Decoding for Extractive Question Answering?}, 
      author={Or Castel and Ori Ram and Avia Efrat and Omer Levy},
      year={2021},
      eprint={2108.05857},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}