felipesanma committed b1863ac (1 parent: db66fc9)

update readme model card

Files changed (1): README.md +80 -1
README.md CHANGED
datasets:
- squad
language:
- en
---


# Question Generator

This model generates questions from a given context string and a target answer contained in it.

### Out-of-Scope Use

Only English is supported; text in other languages is out of scope.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer


def question_parser(question: str) -> str:
    # Keep only the text after the "question:" prefix and normalize whitespace.
    return " ".join(question.split(":")[1].split())


def generate_questions_v2(context: str, answer: str, n_questions: int = 1):
    model = T5ForConditionalGeneration.from_pretrained(
        "pipesanma/chasquilla-question-generator"
    )
    tokenizer = T5Tokenizer.from_pretrained("pipesanma/chasquilla-question-generator")

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)
    text = "context: " + context + " " + "answer: " + answer + " </s>"

    encoding = tokenizer.encode_plus(
        text, max_length=512, padding=True, truncation=True, return_tensors="pt"
    )
    input_ids = encoding["input_ids"].to(device)
    attention_mask = encoding["attention_mask"].to(device)

    model.eval()
    beam_outputs = model.generate(
        input_ids=input_ids,
        attention_mask=attention_mask,
        max_length=72,
        early_stopping=True,
        num_beams=5,
        num_return_sequences=n_questions,
    )

    questions = []

    for beam_output in beam_outputs:
        sent = tokenizer.decode(
            beam_output, skip_special_tokens=True, clean_up_tokenization_spaces=True
        )
        print(sent)
        questions.append(question_parser(sent))

    return questions


context = "President Donald Trump said and predicted that some states would reopen this month."
answer = "Donald Trump"

questions = generate_questions_v2(context, answer, 1)
print(questions)
```
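
The decoded beam outputs begin with a `question:` prefix, which `question_parser` strips before the questions are returned. A standalone sketch of that helper's behavior (the example string below is illustrative):

```python
def question_parser(question: str) -> str:
    # Keep only the segment after the first ":" and collapse whitespace.
    return " ".join(question.split(":")[1].split())


raw = "question: Who predicted that some states would reopen this month?"
print(question_parser(raw))
# Who predicted that some states would reopen this month?
```

Note that because the helper splits on every `:` and keeps only the second segment, a generated question that itself contains a colon would be truncated at that colon.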

## Training Details

### Dataset generation

The dataset is `squad` from the `datasets` library.

Check the [utils/dataset_gen.py](utils/dataset_gen.py) file for the dataset generation.
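
The script itself is not reproduced here, but the format it must produce follows from the inference code above: each SQuAD record is flattened into a `context: ... answer: ... </s>` source string and a `question: ... </s>` target. A minimal, hypothetical sketch (`format_squad_record` and the record below are illustrative, not taken from the repository):

```python
def format_squad_record(record: dict) -> dict:
    # SQuAD stores answers as {"text": [...], "answer_start": [...]};
    # use the first reference answer.
    answer = record["answers"]["text"][0]
    return {
        "source_text": f"context: {record['context']} answer: {answer} </s>",
        "target_text": f"question: {record['question']} </s>",
    }


# Illustrative SQuAD-shaped record:
record = {
    "context": "President Donald Trump said and predicted that some states "
               "would reopen this month.",
    "question": "Who predicted that some states would reopen this month?",
    "answers": {"text": ["Donald Trump"], "answer_start": [10]},
}
pair = format_squad_record(record)
print(pair["source_text"])
print(pair["target_text"])
```

On the real data this kind of transform would be applied over `datasets.load_dataset("squad")`, e.g. with `.map(format_squad_record)`.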

### Training model

Check the [utils/t5_train_model.py](utils/t5_train_model.py) file for the training process.