Skolkovo Institute of Science and Technology committed
Commit b92b103
Parent(s): abb1785
Update README.md

README.md CHANGED
@@ -14,6 +14,23 @@ This model was trained in terms of [GenChal 2022: Feedback Comment Generation fo

In this task, the model receives a string containing text with an error, together with the exact span of the error, and should return a natural-language comment explaining the nature of the error.

## Model training details

#### Data

@@ -50,48 +67,6 @@ The main feature of our training pipeline was data augmentation. The idea of the

Using both initial and augmented data we fine-tuned [t5-large](https://huggingface.co/t5-large).

-## How to use
-
-```python
-import re
-
-from transformers import T5ForConditionalGeneration, AutoTokenizer
-
-text_with_error = 'I want to stop smoking during driving bicycle .'
-error_span = '23:29'
-
-off1, off2 = list(map(int, error_span.split(":")))
-# wrap every token inside the error span in "< <" / "> >" markers
-text_with_error_pointed = text_with_error[:off1] + "< < " + re.sub(r"\s+", " > > < < ", text_with_error[off1:off2].strip()) + " > > " + text_with_error[off2:]
-text_with_error_pointed = re.sub(r"\s+", " ", text_with_error_pointed.strip()).lower()
-
-tokenizer = AutoTokenizer.from_pretrained("SkolkovoInstitute/GenChal_2022_nigula")
-model = T5ForConditionalGeneration.from_pretrained("SkolkovoInstitute/GenChal_2022_nigula").cuda()
-model.eval()
-
-def paraphrase(text, model, temperature=1.0, beams=3):
-    texts = [text] if isinstance(text, str) else text
-    inputs = tokenizer(texts, return_tensors='pt', padding=True)['input_ids'].to(model.device)
-    result = model.generate(
-        inputs,
-        do_sample=False,
-        temperature=temperature,
-        repetition_penalty=1.1,
-        max_length=int(inputs.shape[1] * 3),
-        num_beams=beams,
-    )
-    texts = [tokenizer.decode(r, skip_special_tokens=True) for r in result]
-    if isinstance(text, str):
-        return texts[0]
-    return texts
-
-paraphrase([text_with_error_pointed], model)
-
-# expected output: ["a gerund > does not normally follow the preposition > during > >. think of an expression using the conjunction >'while'instead of a preposition >."]
-```

## Licensing Information
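The span-marking preprocessing used in the removed snippet can be illustrated on its own. This is a minimal sketch of the preprocessing step only (no model is loaded), assuming, as the snippet suggests, that the span is a pair of character offsets `"start:end"` into the raw string:

```python
import re

text_with_error = 'I want to stop smoking during driving bicycle .'
error_span = '23:29'  # character offsets "start:end" into the raw string

off1, off2 = map(int, error_span.split(":"))
print(text_with_error[off1:off2])  # the erroneous fragment: during

# Wrap every token inside the span in "< <" / "> >" markers,
# then collapse whitespace and lowercase the result.
pointed = (
    text_with_error[:off1]
    + "< < "
    + re.sub(r"\s+", " > > < < ", text_with_error[off1:off2].strip())
    + " > > "
    + text_with_error[off2:]
)
pointed = re.sub(r"\s+", " ", pointed.strip()).lower()
print(pointed)
# i want to stop smoking < < during > > driving bicycle .
```

The markers delimit the error span in the input the T5 model sees, which is why the generated comments can refer to the marked tokens.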

In this task, the model receives a string containing text with an error, together with the exact span of the error, and should return a natural-language comment explaining the nature of the error.

+## How to use
+
+```python
+!pip install feedback_generation_nigula
+from feedback_generation_nigula.generator import FeedbackGenerator
+
+fg = FeedbackGenerator(cuda_index = 0)
+text = "The smoke flow my face ."
+span = (10, 17)
+
+fg.get_feedback([text], [span])
+
+# expected output: ["When the <verb> <<flow>> is used as an <intransitive verb> to express ''to move in a stream'', a <preposition> needs to be placed to indicate the direction"]
+```

## Model training details

#### Data

Using both initial and augmented data we fine-tuned [t5-large](https://huggingface.co/t5-large).

## Licensing Information
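In the added example, the `span` appears to be a `(start, end)` pair of character offsets into the raw text (an assumption read off the example, not documented API behaviour). A quick check of that reading, without loading the generator:

```python
text = "The smoke flow my face ."
span = (10, 17)

# Slicing the raw string with the span recovers the fragment
# that the generated feedback comment refers to.
start, end = span
print(text[start:end])  # flow my
```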