potsawee committed on
Commit
5c0a488
1 Parent(s): 6d61d4a

Create README.md

Files changed (1)
  1. README.md +104 -0
README.md ADDED
@@ -0,0 +1,104 @@
+ ---
+ license: apache-2.0
+ datasets:
+ - race
+ language:
+ - en
+ library_name: transformers
+ pipeline_tag: question-answering
+ ---
+ # longformer-large-4096 fine-tuned on RACE for (Multiple-Choice) Question Answering
+ - Input: `context`, `question`, `options`
+ - Output: logits (or probabilities) over the options
+
+ ## Model Details
+
+ The longformer-large-4096 model is fine-tuned on the RACE dataset, where the input is a concatenation of ```context + question + option```. We follow the architecture and setup described in https://openreview.net/forum?id=HJgJtT4tvB.
+ The output is a logit for each option. This model is the question answering (QA) component of our [MQAG paper](https://arxiv.org/abs/2301.12307);
+ see also the GitHub repo of this project: https://github.com/potsawee/mqag0.
+
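As a rough illustration of the input format described above, the sketch below builds one (context + question, option) text pair per option in plain Python. The helper name `build_choice_pairs` and the default separator string are assumptions made for illustration only; the `prepare_answering_input` function later in this README does the real work with the tokenizer.

```python
from typing import List, Tuple


def build_choice_pairs(
    context: str,
    question: str,
    options: List[str],
    separator: str = "<s>",  # assumption: stands in for tokenizer.bos_token
) -> List[Tuple[str, str]]:
    """Return one (context + question, option) text pair per option."""
    first_segment = f"{context} {separator} {question}"
    # The same first segment is repeated and paired with each candidate answer.
    return [(first_segment, option) for option in options]


pairs = build_choice_pairs(
    context="Tom has a red bike.",
    question="What colour is Tom's bike?",
    options=["Red", "Blue"],
)
for first_segment, option in pairs:
    print(first_segment, "|", option)
```

Each pair would then be tokenized as a two-segment input, so the model scores every option against the same context and question.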
+ ## How to Use the Model
+
+ Use the code below to get started with the model.
+
+ ```python
+ >>> import torch
+ >>> import numpy as np
+ >>> from transformers import LongformerTokenizer, LongformerForMultipleChoice
+
+ >>> tokenizer = LongformerTokenizer.from_pretrained("potsawee/longformer-large-4096-answering-race")
+ >>> model = LongformerForMultipleChoice.from_pretrained("potsawee/longformer-large-4096-answering-race")
+
+ >>> context = r"""Chelsea's mini-revival continued with a third victory in a row as they consigned struggling Leicester City to a fifth consecutive defeat.
+ Buoyed by their Champions League win over Borussia Dortmund, Chelsea started brightly and Ben Chilwell volleyed in from a tight angle against his old club.
+ Chelsea's Joao Felix and Leicester's Kiernan Dewsbury-Hall hit the woodwork in the space of two minutes, then Felix had a goal ruled out by the video assistant referee for offside.
+ Patson Daka rifled home an excellent equaliser after Ricardo Pereira won the ball off the dawdling Felix outside the box.
+ But Kai Havertz pounced six minutes into first-half injury time with an excellent dinked finish from Enzo Fernandez's clever aerial ball.
+ Mykhailo Mudryk thought he had his first goal for the Blues after the break but his effort was disallowed for offside.
+ Mateo Kovacic sealed the win as he volleyed in from Mudryk's header.
+ The sliding Foxes, who ended with 10 men following Wout Faes' late dismissal for a second booking, now just sit one point outside the relegation zone.
+ """.replace('\n', ' ')
+ >>> question = "Who had a goal ruled out for offside?"
+ >>> options = ['Mykhailo Mudryk', 'Ben Chilwell', 'Joao Felix', 'The Foxes']
+
+ >>> inputs = prepare_answering_input(
+         tokenizer=tokenizer, question=question,
+         options=options, context=context,
+     )
+ >>> outputs = model(**inputs)
+ >>> prob = torch.softmax(outputs.logits, dim=-1)[0].tolist()
+ >>> selected_answer = options[np.argmax(prob)]
+
+ >>> print(prob)
+ [0.085958, 0.043270, 0.719262, 0.151508]
+ >>> print(selected_answer)
+ Joao Felix
+ ```
+
+ where the function that prepares the input to the answering model is:
+
+ ```python
+ def prepare_answering_input(
+     tokenizer,  # longformer_tokenizer
+     question,  # str
+     options,  # List[str]
+     context,  # str
+     max_seq_length=4096,
+ ):
+     # Build "context <bos> question" once and pair it with every option
+     c_plus_q = context + ' ' + tokenizer.bos_token + ' ' + question
+     c_plus_q_4 = [c_plus_q] * len(options)
+     tokenized_examples = tokenizer(
+         c_plus_q_4, options,
+         max_length=max_seq_length,
+         padding="longest",
+         truncation=True,
+         return_tensors="pt",
+     )
+     # Add a batch dimension: (num_options, seq_len) -> (1, num_options, seq_len)
+     input_ids = tokenized_examples['input_ids'].unsqueeze(0)
+     attention_mask = tokenized_examples['attention_mask'].unsqueeze(0)
+     example_encoded = {
+         "input_ids": input_ids,
+         "attention_mask": attention_mask,
+     }
+     return example_encoded
+ ```
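
The usage example above turns the per-option logits into probabilities with a softmax and then picks the highest-scoring option. A dependency-free sketch of that step is given below; the helper name `pick_option` is hypothetical and the logit values are made up for illustration, not produced by the model.

```python
import math
from typing import List, Tuple


def pick_option(logits: List[float], options: List[str]) -> Tuple[str, List[float]]:
    """Softmax the per-option logits and return (best option, probabilities)."""
    if len(logits) != len(options):
        raise ValueError("expected one logit per option")
    # Subtract the max logit for numerical stability before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best_index = max(range(len(probs)), key=probs.__getitem__)
    return options[best_index], probs


# Illustrative logits only; these are not outputs of the model.
answer, probs = pick_option([0.1, -0.6, 2.2, 0.7], ["A", "B", "C", "D"])
print(answer)
```

With real model outputs, `outputs.logits[0].tolist()` would take the place of the hand-written list.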
+
+
+ ## Related Models
+ - Question/Answer Generation ```Context ---> Question + Answer```:
+   - https://huggingface.co/potsawee/t5-large-generation-race-QuestionAnswer
+   - https://huggingface.co/potsawee/t5-large-generation-squad-QuestionAnswer
+
+ - Distractor (False options) Generation:
+   - https://huggingface.co/potsawee/t5-large-generation-race-Distractor
+
+ ## Citation
+
+ ```bibtex
+ @article{manakul2023mqag,
+   title={MQAG: Multiple-choice Question Answering and Generation for Assessing Information Consistency in Summarization},
+   author={Manakul, Potsawee and Liusie, Adian and Gales, Mark JF},
+   journal={arXiv preprint arXiv:2301.12307},
+   year={2023}
+ }
+ ```