---
language: en
widget:
- text: Robert Boyle \n In the late 17th century, Robert Boyle proved that air is necessary for combustion.
---

# MixQG (large-sized model)

MixQG is a new question generation model pre-trained on a collection of QA datasets covering a mix of answer types. It was introduced in the paper [MixQG: Neural Question Generation with Mixed Answer Types](https://arxiv.org/abs/2110.08175), and the associated code is released in [this repository](https://github.com/salesforce/QGen).

### How to use
Using the Hugging Face pipeline abstraction:
```python
from transformers import pipeline

nlp = pipeline("text2text-generation", model='Salesforce/mixqg-large', tokenizer='Salesforce/mixqg-large')

CONTEXT = "In the late 17th century, Robert Boyle proved that air is necessary for combustion."
ANSWER = "Robert Boyle"

def format_inputs(context: str, answer: str):
    # MixQG expects the answer and the context joined by a literal "\n" separator
    return f"{answer} \\n {context}"

text = format_inputs(CONTEXT, ANSWER)

nlp(text)
# should output [{'generated_text': 'Who proved that air is necessary for combustion?'}]
```
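
Since the model is trained on a mix of answer types, the same pipeline also accepts longer, non-entity answer spans. A minimal sketch, reusing `nlp`, `CONTEXT`, and `format_inputs` from the snippet above (`long_answer` is an illustrative name, and the exact generated question may vary):
```python
# Hypothetical follow-up: condition generation on a longer answer span
# from the same context; the output shown by the model is not guaranteed verbatim.
long_answer = "air is necessary for combustion"
print(nlp(format_inputs(CONTEXT, long_answer)))
```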
Using the pre-trained model directly:
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained('Salesforce/mixqg-large')
model = AutoModelForSeq2SeqLM.from_pretrained('Salesforce/mixqg-large')

CONTEXT = "In the late 17th century, Robert Boyle proved that air is necessary for combustion."
ANSWER = "Robert Boyle"

def format_inputs(context: str, answer: str):
    # Same input format as above: answer, literal "\n" separator, then context
    return f"{answer} \\n {context}"

text = format_inputs(CONTEXT, ANSWER)

input_ids = tokenizer(text, return_tensors="pt").input_ids
generated_ids = model.generate(input_ids, max_length=32, num_beams=4)
output = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
print(output)
# should output ['Who proved that air is necessary for combustion?']
```
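
Generation also extends naturally to batches. A minimal sketch, assuming the `tokenizer`, `model`, and `format_inputs` defined above (the `pairs` list and its contents are illustrative):
```python
# Hypothetical batch: several (context, answer) pairs formatted for MixQG.
pairs = [
    ("In the late 17th century, Robert Boyle proved that air is necessary for combustion.", "Robert Boyle"),
    ("In the late 17th century, Robert Boyle proved that air is necessary for combustion.", "in the late 17th century"),
]
texts = [format_inputs(context, answer) for context, answer in pairs]

# Pad to the longest input so the batch fits in one tensor, and pass the
# attention mask so padding tokens are ignored during generation.
batch = tokenizer(texts, return_tensors="pt", padding=True)
generated_ids = model.generate(
    batch.input_ids,
    attention_mask=batch.attention_mask,
    max_length=32,
    num_beams=4,
)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True))
```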

### Citation
```bibtex
@misc{murakhovska2021mixqg,
      title={MixQG: Neural Question Generation with Mixed Answer Types},
      author={Lidiya Murakhovs'ka and Chien-Sheng Wu and Tong Niu and Wenhao Liu and Caiming Xiong},
      year={2021},
      eprint={2110.08175},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```