---
language: en
datasets:
- c4
- wikipedia
metrics:
- f1
---
# T5-V1.1-large-rss
This model is [T5-v1.1-large](https://huggingface.co/google/t5-v1_1-large) finetuned on the RSS dataset. The model was finetuned as part of
["How Optimal is Greedy Decoding for Extractive Question Answering?"](https://arxiv.org/abs/2108.05857), while the RSS pretraining method was introduced in [this paper](https://arxiv.org/abs/2101.00438).
## Model description
The original [T5-v1.1-large](https://huggingface.co/google/t5-v1_1-large) was pre-trained only on C4, with no supervised training. Our version is further trained with the Recurring Span Selection (RSS) scheme, using a sample from the dataset used to pretrain [Splinter](https://huggingface.co/tau/splinter-large):
* contexts with a span occurring more than once are detected
* a single instance of the recurring span is masked
* the model is trained (teacher forcing) to predict the masked span
This training scheme naturally matches the extractive question answering task.
During training, the masked span is replaced with `<extra_id_0>` and the labels are formatted as `<extra_id_0>span<extra_id_1>`. Unlike [Splinter](https://huggingface.co/tau/splinter-large), only one span is masked at a time.
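For illustration, here is a minimal sketch of how such a training pair might be constructed. The helper and example context are hypothetical, not the released pretraining code, and masking the first occurrence is an arbitrary simplification:
```python
import re

def make_rss_example(context: str, span: str):
    """Mask one occurrence of a recurring span and build the T5-style target."""
    occurrences = [m.start() for m in re.finditer(re.escape(span), context)]
    assert len(occurrences) >= 2, 'RSS requires a span that recurs in the context'
    start = occurrences[0]  # mask the first occurrence for simplicity
    source = context[:start] + '<extra_id_0>' + context[start + len(span):]
    target = f'<extra_id_0>{span}<extra_id_1>'
    return source, target

context = 'Obama was born in Hawaii. As a child, Obama moved often.'
print(make_rss_example(context, 'Obama'))
# ('<extra_id_0> was born in Hawaii. As a child, Obama moved often.',
#  '<extra_id_0>Obama<extra_id_1>')
```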
## Intended uses & limitations
This model naturally fits tasks where a span from a context is intended to be copied, like extractive question answering.
This checkpoint is primarily aimed at zero-shot use; further finetuning it on an annotated dataset gives results equal to those of the original T5-v1.1-large.
### How to use
You can use this model directly, but it is recommended to format the input to match the training scheme, as a `Text:`/`Question:`/`Answer:` prompt ending with the `<extra_id_0>` sentinel:
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
model = AutoModelForSeq2SeqLM.from_pretrained('tau/t5-v1_1-large-rss')
tokenizer = AutoTokenizer.from_pretrained('tau/t5-v1_1-large-rss')
passage = 'Barack Hussein Obama II is an American politician and attorney who served as the 44th president of the United States from 2009 to 2017. '
question = 'When was Obama inaugurated?'
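# Build a prompt matching the pretraining format: the answer slot is
# marked with the first sentinel token, <extra_id_0>.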
text = f'Text: {passage}.\nQuestion: {question}\nAnswer:{tokenizer.additional_special_tokens[0]}.'
encoded_input = tokenizer(text, return_tensors='pt')
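# Decode greedily (num_beams=1), stopping at the second sentinel
# (<extra_id_1>), which closes the extracted span.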
output_ids = model.generate(
    input_ids=encoded_input.input_ids,
    attention_mask=encoded_input.attention_mask,
    eos_token_id=tokenizer.additional_special_tokens_ids[1],
    num_beams=1,
    max_length=512,
    min_length=3,
)
tokenizer.decode(output_ids[0])
```
The generated answer is then `"<pad><extra_id_0> 2009<extra_id_1>"`, while the one generated by the original [T5-v1.1-large](https://huggingface.co/google/t5-v1_1-large) is `"<pad><extra_id_0> On January 20, 2009<extra_id_1>"` - a correct yet non-extractive answer.
### Limitations and bias
Although greedy decoding with this model tends to produce extractive outputs, it may occasionally generate non-extractive ones, whether differing only in casing or being an entirely different string (or substring) that carries another semantic meaning.
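When strict extractiveness is required, a simple post-hoc check can flag such outputs. This is a sketch that reuses `tokenizer`, `output_ids`, and `passage` from the usage example above:
```python
def is_extractive(answer: str, passage: str) -> bool:
    """Return True iff the answer appears verbatim in the passage."""
    return answer.strip() in passage

# Drop <pad> and the sentinel tokens before checking.
decoded = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(is_extractive(decoded, passage))  # True: '2009' appears verbatim in the passage
```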
### Pretraining
The model was finetuned on 100,000 RSS examples for 3 epochs, using the Adafactor optimizer with a constant learning rate of 5e-5.
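For reference, an equivalent optimizer setup with the `transformers` implementation of Adafactor could look as follows. This is a sketch based only on the hyperparameters above; the exact Adafactor flags, batch size, and other details are not given in this card:
```python
from transformers.optimization import Adafactor

# Constant learning rate of 5e-5: disable Adafactor's relative-step schedule
# and parameter-scaled updates so the given lr is used as-is.
optimizer = Adafactor(
    model.parameters(),
    lr=5e-5,
    scale_parameter=False,
    relative_step=False,
    warmup_init=False,
)
```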
## Evaluation results
Evaluated on few-shot QA datasets in a zero-shot setting (no finetuning on annotated examples); all numbers are F1 scores:
|Model \ Dataset| SQuAD |TriviaQA | NaturalQs | NewsQA | SearchQA | HotpotQA | BioASQ | TextbookQA|
|:-------------:|:-----:|:-------:|:---------:|:------:|:--------:|:--------:|:------:|:---------:|
|T5 | 50.4 | 61.7 | 42.1 | 19.2 | 24.0 | 43.3 | 55.5 | 17.8 |
|T5-rss | 71.4 | 69.3 | 57.2 | 43.2 | 29.7 | 59.0 | 65.5 | 39.0 |
The gap between the two models diminishes as more training examples are introduced; for additional results, see the [paper](https://arxiv.org/abs/2108.05857).
### BibTeX entry and citation info
```bibtex
@inproceedings{ram-etal-2021-shot,
title = "Few-Shot Question Answering by Pretraining Span Selection",
author = "Ram, Ori and
Kirstain, Yuval and
Berant, Jonathan and
Globerson, Amir and
Levy, Omer",
booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
month = aug,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.acl-long.239",
doi = "10.18653/v1/2021.acl-long.239",
pages = "3066--3079",
}

@misc{castel2021optimal,
title={How Optimal is Greedy Decoding for Extractive Question Answering?},
author={Or Castel and Ori Ram and Avia Efrat and Omer Levy},
year={2021},
eprint={2108.05857},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```