Skolkovo Institute of Science and Technology committed on
Commit
6e32a9f
1 Parent(s): aa9baf7

Update README.md

  1. README.md +50 -2
README.md CHANGED
@@ -32,10 +32,58 @@ I want to stop smoking during driving bicycle . 23:29 A <gerund> does not normal
  ```

  ### Data preprocessing

- We lowercased the text and explicitly pointed out the error in the original text



- ## How to use
  ```

+ Grammar terms are marked with '< ... >' and word examples with '<< ... >>'.
+
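As a rough illustration of the two kinds of marks, the sketch below pulls grammar terms and word examples out of a feedback comment with regular expressions. It assumes the raw, unspaced syntax (`<term>` and `<<word>>`); the names `extract_marks`, `GRAMMAR_TERM`, and `WORD_EXAMPLE` are hypothetical, and the comment string is adapted from the README example rather than copied from the dataset.

```python
import re

# Assumption: raw comments use <term> for grammar terms and <<word>> for word examples.
WORD_EXAMPLE = re.compile(r"<<\s*(.+?)\s*>>")
# The lookarounds keep a single-bracket match from firing inside a << ... >> mark.
GRAMMAR_TERM = re.compile(r"(?<!<)<\s*([^<>]+?)\s*>(?!>)")


def extract_marks(comment):
    """Return the grammar terms and word examples found in a feedback comment."""
    return {
        "grammar_terms": GRAMMAR_TERM.findall(comment),
        "word_examples": WORD_EXAMPLE.findall(comment),
    }


# Illustrative comment, adapted from the README example.
comment = "A <gerund> does not normally follow the <preposition> <<during>> ."
marks = extract_marks(comment)
print(marks)
# {'grammar_terms': ['gerund', 'preposition'], 'word_examples': ['during']}
```

The patterns would need adjusting for the preprocessed form, where the brackets are separated by spaces (`< < during > >`).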
  ### Data preprocessing

+ We lowercased the text, separated all punctuation from the words with spaces (including the task-specific << >> marks), and explicitly marked the error span in the original text with << >>.
+
+ ```
+ the smoke < < flow > > < < my > > face . 10:17 When the < verb > < < flow > > is used as an < intransitive verb > to express '' to move in a stream '', a < preposition > needs to be placed to indicate the direction. ' to ' and ' towards ' are < prepositions > that indicate direction .

+ i want to stop smoking < < during > > driving bicycle . 23:29 a < gerund > does not normally follow the < preposition > < < during > > . think of an expression using the < conjunction > ' while ' instead of a < preposition > .
+ ```
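The span-marking step can be sketched as a small helper. This is a minimal sketch, not the authors' preprocessing script: `mark_error_span` is a hypothetical name, the input is assumed to be tokenized already (punctuation separated by spaces), and the capitalized input sentence is inferred from the first preprocessed example. The offsets are read as character positions in the original, unmarked text, which is how both examples work out.

```python
import re


def mark_error_span(text, span):
    """Wrap each word of the character span 'start:end' in '< < ... > >' and lowercase."""
    start, end = map(int, span.split(":"))
    # Every word inside the span gets its own pair of marks.
    marked = " ".join(f"< < {word} > >" for word in text[start:end].split())
    out = f"{text[:start]} {marked} {text[end:]}"
    # Collapse repeated whitespace and lowercase the result.
    return re.sub(r"\s+", " ", out).strip().lower()


print(mark_error_span("The smoke flow my face .", "10:17"))
# the smoke < < flow > > < < my > > face .
```

A multi-word span such as `10:17` yields one `< < ... > >` pair per word, which matches the first example.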


+ ## How to use
+
+ ```python
+
+ import re
+
+ from transformers import AutoTokenizer, T5ForConditionalGeneration
+
+ text_with_error = 'I want to stop smoking during driving bicycle .'
+ error_span = '23:29'
+
+ # Wrap every word of the error span in "< < ... > >", then collapse whitespace and lowercase.
+ off1, off2 = map(int, error_span.split(":"))
+ text_with_error_pointed = text_with_error[:off1] + "< < " + re.sub(r"\s+", " > > < < ", text_with_error[off1:off2].strip()) + " > > " + text_with_error[off2:]
+ text_with_error_pointed = re.sub(r"\s+", " ", text_with_error_pointed.strip()).lower()
+
+ tokenizer = AutoTokenizer.from_pretrained("SkolkovoInstitute/GenChal_2022_nigula")
+ # .cuda() requires a CUDA-capable GPU; drop it to run on CPU.
+ model = T5ForConditionalGeneration.from_pretrained("SkolkovoInstitute/GenChal_2022_nigula").cuda()
+ model.eval()
+
+ def paraphrase(text, model, temperature=1.0, beams=3):
+     texts = [text] if isinstance(text, str) else text
+     inputs = tokenizer(texts, return_tensors='pt', padding=True)['input_ids'].to(model.device)
+     result = model.generate(
+         inputs,
+         # num_return_sequences=n or 1,
+         do_sample=False,
+         temperature=temperature,  # only has an effect when do_sample=True
+         repetition_penalty=1.1,
+         max_length=int(inputs.shape[1] * 3),
+         # bad_words_ids=[[2]], # unk
+         num_beams=beams,
+     )
+     texts = [tokenizer.decode(r, skip_special_tokens=True) for r in result]
+     if isinstance(text, str):
+         return texts[0]
+     return texts
+
+
+ paraphrase([text_with_error_pointed], model)
+
+ # expected output: ["a gerund > does not normally follow the preposition > during > >. think of an expression using the conjunction >'while'instead of a preposition >."]
+
+
+ ```