---
language: en
datasets:
- c4
- wikipedia
metrics:
- f1
---

# T5-V1.1-large-rss

This model is [T5-v1.1-large](https://huggingface.co/google/t5-v1_1-large) fine-tuned on the RSS dataset. The model was fine-tuned as part of ["How Optimal is Greedy Decoding for Extractive Question Answering?"](https://arxiv.org/abs/2108.05857), while the RSS pretraining method was introduced in [this paper](https://arxiv.org/pdf/2101.00438.pdf).

## Model description

The original [T5-v1.1-large](https://huggingface.co/google/t5-v1_1-large) was pre-trained only on C4, without any supervised training. Our version is further trained with the Recurrent Span Selection (RSS) scheme, using a sample of the dataset used to pretrain [Splinter](https://huggingface.co/tau/splinter-large):

* contexts containing a span that occurs more than once are detected
* a single instance of the recurring span is masked
* the model is trained (with teacher forcing) to predict the masked span

This training scheme naturally matches the extractive question answering task. During training, the masked span is replaced with `<extra_id_0>` and the labels are formatted as `<extra_id_0> span <extra_id_1>`. Unlike [Splinter](https://huggingface.co/tau/splinter-large), only one span is masked at a time.

## Intended uses & limitations

This model naturally fits tasks where a span from the context should be copied, such as extractive question answering. This checkpoint is primarily intended for use in the zero-shot setting: further fine-tuning it on an annotated dataset yields results equal to those of the original T5-v1.1-large.

### How to use

You can use this model directly, but it is recommended to format the input to match the training scheme, as a text-question-answer prompt:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained('tau/t5-v1_1-large-rss')
tokenizer = AutoTokenizer.from_pretrained('tau/t5-v1_1-large-rss')

passage = 'Barack Hussein Obama II is an American politician and attorney who served as the 44th president of the United States from 2009 to 2017. '
question = 'When was Obama inaugurated?'

# Mark the answer position with the first sentinel token (<extra_id_0>),
# mirroring the RSS training format.
text = f'Text: {passage}.\nQuestion: {question}\nAnswer:{tokenizer.additional_special_tokens[0]}.'
encoded_input = tokenizer(text, return_tensors='pt')

# Greedy decoding (num_beams=1); generation stops at the second sentinel
# token (<extra_id_1>), matching the label format <extra_id_0> span <extra_id_1>.
output_ids = model.generate(input_ids=encoded_input.input_ids,
                            attention_mask=encoded_input.attention_mask,
                            eos_token_id=tokenizer.additional_special_tokens_ids[1],
                            num_beams=1,
                            max_length=512,
                            min_length=3)
tokenizer.decode(output_ids[0])
```

The generated answer is then `" 2009"`, while the one generated by the original [T5-v1.1-large](https://huggingface.co/google/t5-v1_1-large) is `" On January 20, 2009"`, a correct yet non-extractive answer.

### Limitations and bias

Although greedy decoding tends toward extractive outputs, the model may sometimes produce non-extractive ones, whether differing only in casing or being an entirely different string (or substring) that may carry another semantic meaning.

### Pretraining

The model was fine-tuned on 100,000 RSS examples for 3 epochs, using the Adafactor optimizer with a constant learning rate of 5e-5.
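To make the pretraining recipe concrete, below is a minimal, self-contained sketch of a single RSS fine-tuning step with the hyperparameters stated above (Adafactor, constant learning rate of 5e-5). The recurring-span example is hard-coded for illustration; in the actual pipeline such spans are mined automatically from the corpus, and this is not the authors' training code:

```python
# A sketch of one RSS fine-tuning step, under the assumptions stated above.
from transformers import Adafactor, AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained('google/t5-v1_1-large')
tokenizer = AutoTokenizer.from_pretrained('google/t5-v1_1-large')

# Hypothetical context in which the span 'Isaac Newton' recurs; real RSS
# data detects such recurring spans automatically.
span = 'Isaac Newton'
context = ('Isaac Newton was an English mathematician and physicist. '
           'Isaac Newton is best known for formulating the laws of motion.')

# Mask a single occurrence of the recurring span with the first sentinel token.
masked_context = context.replace(span, '<extra_id_0>', 1)
# Labels follow the T5 sentinel format: <extra_id_0> span <extra_id_1>.
target = f'<extra_id_0> {span} <extra_id_1>'

inputs = tokenizer(masked_context, return_tensors='pt')
labels = tokenizer(target, return_tensors='pt').input_ids

# Adafactor with a constant learning rate of 5e-5, as described above.
optimizer = Adafactor(model.parameters(), lr=5e-5, relative_step=False,
                      scale_parameter=False, warmup_init=False)

model.train()
loss = model(**inputs, labels=labels).loss  # teacher forcing on the masked span
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

In the full recipe, a step like this would run over the 100,000 RSS examples for 3 epochs.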
## Evaluation results

F1 scores over few-shot QA benchmarks, evaluated in the zero-shot setting (no fine-tuning on annotated examples):

|Model \ Dataset| SQuAD |TriviaQA | NaturalQs | NewsQA | SearchQA | HotpotQA | BioASQ | TextbookQA|
|:-------------:|:-----:|:-------:|:---------:|:------:|:--------:|:--------:|:------:|:---------:|
|T5             | 50.4  | 61.7    | 42.1      | 19.2   | 24.0     | 43.3     | 55.5   | 17.8      |
|T5-rss         | 71.4  | 69.3    | 57.2      | 43.2   | 29.7     | 59.0     | 65.5   | 39.0      |

The gap between the two models diminishes as more training examples are introduced; for additional results, see the [paper](https://arxiv.org/abs/2108.05857).

### BibTeX entry and citation info

```bibtex
@inproceedings{ram-etal-2021-shot,
    title = "Few-Shot Question Answering by Pretraining Span Selection",
    author = "Ram, Ori and Kirstain, Yuval and Berant, Jonathan and Globerson, Amir and Levy, Omer",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-long.239",
    doi = "10.18653/v1/2021.acl-long.239",
    pages = "3066--3079",
}

@misc{castel2021optimal,
    title={How Optimal is Greedy Decoding for Extractive Question Answering?},
    author={Or Castel and Ori Ram and Avia Efrat and Omer Levy},
    year={2021},
    eprint={2108.05857},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```