---
license: mit
datasets:
  - squad_v2
  - squad
language:
  - en
library_name: transformers
tags:
  - question-answering
  - squad
  - squad_v2
  - t5
---

# flan-t5-large for Extractive QA

This is the [flan-t5-large](https://huggingface.co/google/flan-t5-large) model, fine-tuned on the [SQuAD2.0](https://huggingface.co/datasets/squad_v2) dataset. It has been trained on question-answer pairs, including unanswerable questions, for the task of Extractive Question Answering.

This model was trained using LoRA, available through the [PEFT library](https://github.com/huggingface/peft).

**NOTE:** The `<cls>` token must be manually added to the beginning of the question for this model to work properly. The model uses the `<cls>` token to make "no answer" predictions. The T5 tokenizer does not add this special token automatically, which is why it is added manually.

## Overview

**Language model:** flan-t5-large
**Language:** English
**Downstream-task:** Extractive QA
**Training data:** SQuAD 2.0
**Eval data:** SQuAD 2.0
**Infrastructure:** 1x NVIDIA 3070

## Model Usage

### Using Transformers

This uses the merged weights (base model weights + LoRA weights) to allow for simple use in Transformers pipelines. It has the same performance as using the weights separately with the PEFT library.

```python
import torch
from transformers import (
    AutoModelForQuestionAnswering,
    AutoTokenizer,
    pipeline
)

model_name = "sjrhuschlee/flan-t5-large-squad2"

# a) Using pipelines
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)
qa_input = {
    # Prepend the <cls> token manually (see the NOTE above).
    'question': f'{nlp.tokenizer.cls_token}Where do I live?',  # '<cls>Where do I live?'
    'context': 'My name is Sarah and I live in London'
}
res = nlp(qa_input)
# {'score': 0.984, 'start': 30, 'end': 37, 'answer': ' London'}

# b) Load model & tokenizer
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

question = f'{tokenizer.cls_token}Where do I live?'  # '<cls>Where do I live?'
context = 'My name is Sarah and I live in London'
encoding = tokenizer(question, context, return_tensors="pt")
start_scores, end_scores = model(
    encoding["input_ids"],
    attention_mask=encoding["attention_mask"],
    return_dict=False
)

# Select the answer span from the highest-scoring start and end positions.
all_tokens = tokenizer.convert_ids_to_tokens(encoding["input_ids"][0].tolist())
answer_tokens = all_tokens[torch.argmax(start_scores):torch.argmax(end_scores) + 1]
answer = tokenizer.decode(tokenizer.convert_tokens_to_ids(answer_tokens))
# 'London'
```

### Using with PEFT

**NOTE:** This requires the code in PR https://github.com/huggingface/peft/pull/473 for the PEFT library.

```python
#!pip install peft

from peft import LoraConfig, PeftModelForQuestionAnswering
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

model_name = "sjrhuschlee/flan-t5-large-squad2"
```
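The snippet above stops after defining `model_name`. Below is a minimal sketch of how the PEFT pieces might fit together, assuming the PR linked above is available in your PEFT install. The base model id, the `LoraConfig` hyperparameters, and the availability of standalone LoRA adapter weights under `model_name` are assumptions for illustration, not guarantees from this card; the two options shown are alternatives, so use one or the other.

```python
from peft import LoraConfig, PeftModelForQuestionAnswering, TaskType
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

model_name = "sjrhuschlee/flan-t5-large-squad2"

# Base model and tokenizer; the tokenizer from `model_name` includes the <cls> token.
base_model = AutoModelForQuestionAnswering.from_pretrained("google/flan-t5-large")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Option A (assumption): load pre-trained LoRA adapter weights on top of the base
# model. This only works if standalone adapter weights are published under `model_name`.
model = PeftModelForQuestionAnswering.from_pretrained(base_model, model_name)

# Option B (assumption): attach a fresh LoRA adapter for your own fine-tuning run.
# The hyperparameters below are illustrative, not the values used to train this model.
peft_config = LoraConfig(
    task_type=TaskType.QUESTION_ANS,  # task type introduced by the PR linked above
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
)
model = PeftModelForQuestionAnswering(base_model, peft_config)
```

After loading, inference works the same way as in the Transformers example above, including the manual `<cls>` prefix on the question.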