---
language: en
widget:
- text: Robert Boyle \\n In the late 17th century, Robert Boyle proved that air is necessary for combustion.
---
# MixQG (3b-sized model)
MixQG is a question generation model pre-trained on a collection of QA datasets covering a mix of answer types. It was introduced in the paper [MixQG: Neural Question Generation with Mixed Answer Types](https://arxiv.org/abs/2110.08175), and the associated code is released in [this repository](https://github.com/salesforce/QGen).
### How to use
Using the Hugging Face pipeline abstraction:
```python
from transformers import pipeline

nlp = pipeline("text2text-generation", model='Salesforce/mixqg-3b', tokenizer='Salesforce/mixqg-3b')

CONTEXT = "In the late 17th century, Robert Boyle proved that air is necessary for combustion."
ANSWER = "Robert Boyle"

def format_inputs(context: str, answer: str):
    return f"{answer} \\n {context}"

text = format_inputs(CONTEXT, ANSWER)

nlp(text)
# should output [{'generated_text': 'Who proved that air is necessary for combustion?'}]
```
Using the pre-trained model directly:
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained('Salesforce/mixqg-3b')
model = AutoModelForSeq2SeqLM.from_pretrained('Salesforce/mixqg-3b')

CONTEXT = "In the late 17th century, Robert Boyle proved that air is necessary for combustion."
ANSWER = "Robert Boyle"

def format_inputs(context: str, answer: str):
    return f"{answer} \\n {context}"

text = format_inputs(CONTEXT, ANSWER)

input_ids = tokenizer(text, return_tensors="pt").input_ids
generated_ids = model.generate(input_ids, max_length=32, num_beams=4)
output = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
print(output)
# should output "Who proved that air is necessary for combustion?"
```
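Both snippets rely on the same `answer \n context` input convention, which extends naturally to batches. A minimal sketch of preparing several inputs at once (plain Python; the second context/answer pair is a made-up example for illustration):

```python
def format_inputs(context: str, answer: str) -> str:
    # MixQG input convention: target answer, a "\n" separator, then the context.
    return f"{answer} \\n {context}"

# (context, answer) pairs; the second pair is purely illustrative.
pairs = [
    ("In the late 17th century, Robert Boyle proved that air is necessary for combustion.",
     "Robert Boyle"),
    ("Guido van Rossum began working on Python in the late 1980s.",
     "Guido van Rossum"),
]

batch = [format_inputs(context, answer) for context, answer in pairs]
```

The resulting list can be passed directly to the pipeline (`nlp(batch)`), or tokenized with `tokenizer(batch, padding=True, return_tensors="pt")` before calling `model.generate`.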

### Citation
```
@misc{murakhovska2021mixqg,
      title={MixQG: Neural Question Generation with Mixed Answer Types},
      author={Lidiya Murakhovs'ka and Chien-Sheng Wu and Tong Niu and Wenhao Liu and Caiming Xiong},
      year={2021},
      eprint={2110.08175},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```