s-nlp/GenChal_2022_nigula

Model overview

This model was trained for the GenChal 2022: Feedback Comment Generation for Writing Learning shared task.

In this task, the model receives a text containing an error together with the exact character span of that error, and should return a natural-language comment that explains the nature of the error.

How to use

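# Notebook syntax; in a shell, drop the leading "!"
!pip install feedback_generation_nigula
from feedback_generation_nigula.generator import FeedbackGenerator

fg = FeedbackGenerator(cuda_index=0)
text_with_error = "The smoke flow my face ."
# Character offsets (start, end) of the error; here they cover "flow my"
error_span = (10, 17)

# Both arguments are lists, one entry per input text
fg.get_feedback([text_with_error], [error_span])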

# expected output ["When the <verb> <<flow>> is used as an <intransitive verb> to express'' to move in a stream'', a <preposition> needs to be placed to indicate the direction"]


Model training details

Data

The data was provided as tab-separated lines in the following format:

input sentence [\t] offset range [\t] feedback comment


Here are some examples:

The smoke flow my face .	10:17	When the <verb> <<flow>> is used as an <intransitive verb> to express ''to move in a stream'', a <preposition> needs to be placed to indicate the direction. 'To' and 'towards' are <prepositions> that indicate direction.

I want to stop smoking during driving bicycle .	23:29	A <gerund> does not normally follow the <preposition> <<during>>. Think of an expression using the <conjunction> 'while' instead of a <preposition>.


Grammar terms are marked with '< ... >' and word examples with '<< ... >>'.
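As a rough illustration of the format above, one line can be split into its three fields like this. The names `Example` and `parse_line` are hypothetical helpers, not part of the released package, and the feedback comment in the sample line is shortened for brevity.

```python
from typing import NamedTuple, Tuple


class Example(NamedTuple):
    sentence: str
    span: Tuple[int, int]  # character offsets (start, end) of the error
    comment: str


def parse_line(line: str) -> Example:
    """Split one tab-separated line into sentence, offset range and comment."""
    sentence, offsets, comment = line.rstrip("\n").split("\t", 2)
    start, end = (int(x) for x in offsets.split(":"))
    return Example(sentence, (start, end), comment)


line = (
    "I want to stop smoking during driving bicycle .\t23:29\t"
    "A <gerund> does not normally follow the <preposition> <<during>>."
)
ex = parse_line(line)
print(ex.sentence[ex.span[0]:ex.span[1]])  # during
```

The resulting `span` tuple has the same shape as the `error_span` passed to `get_feedback` in the usage example.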

Data preprocessing

We lowercased the text, separated all punctuation from the words with spaces (including the task-specific marks << >>), and explicitly marked the error span in the original text with << >>.

the smoke < < flow > > < < my > > face .	10:17 When the < verb > < < flow > > is used as an < intransitive verb > to express '' to move in a stream '', a < preposition > needs to be placed to indicate the direction. ' to ' and ' towards ' are < prepositions > that indicate direction .

i want to stop smoking < < during > > driving bicycle .	23:29	a < gerund > does not normally follow the < preposition > < < during > > . think of an expression using the < conjunction > ' while ' instead of a < preposition > .
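The preprocessing of the input sentence can be sketched roughly as follows. This is a minimal reconstruction from the examples above, not the original preprocessing code: the function names are hypothetical, and the regular expression is an assumption about how punctuation was separated. It covers only the input sentence, since the examples keep '' together in the feedback comment and this sketch would split it.

```python
import re


def mark_error_span(text: str, span: tuple) -> str:
    """Wrap every word inside the error span in << >>."""
    start, end = span
    marked = " ".join(f"<<{word}>>" for word in text[start:end].split())
    return text[:start] + marked + text[end:]


def split_punctuation(text: str) -> str:
    """Lowercase and put spaces around every punctuation character."""
    spaced = re.sub(r"([^\w\s])", r" \1 ", text.lower())
    return " ".join(spaced.split())  # collapse repeated whitespace


def preprocess(text: str, span: tuple) -> str:
    return split_punctuation(mark_error_span(text, span))


print(preprocess("The smoke flow my face .", (10, 17)))
# the smoke < < flow > > < < my > > face .
```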


Data augmentation

The main feature of our training pipeline was data augmentation. The idea is as follows: we cut each text containing an error after the last word that was syntactically connected to the words inside the error span (syntactic dependencies were parsed automatically with spaCy). This truncated text was then used as a prompt for a language model (we used GPT-Neo 1.3B).

We fine-tuned t5-large on both the original and the augmented data.
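The cutting step can be sketched as below. This is only an illustration under stated assumptions, not the original pipeline: "syntactically connected" is taken to mean the direct head or a direct dependent of a token in the error span, the function name is hypothetical, and the dependency parse in the example is written by hand rather than produced by a parser.

```python
from typing import List, Tuple

# One token: (char_start, char_end, index of its syntactic head).
Token = Tuple[int, int, int]


def cut_after_last_connected(text: str, span: Tuple[int, int],
                             tokens: List[Token]) -> str:
    """Cut text after the last token linked to the error span."""
    start, end = span
    # Tokens that overlap the error span.
    in_span = {i for i, (s, e, _) in enumerate(tokens) if s < end and e > start}
    connected = set(in_span)
    for i, (_, _, head) in enumerate(tokens):
        if i in in_span:
            connected.add(head)   # head of a span token
        elif head in in_span:
            connected.add(i)      # direct dependent of a span token
    if not connected:
        return text
    return text[:tokens[max(connected)][1]]


text = "I want to stop smoking during driving bicycle ."
# Hand-written parse for illustration only; the root points to itself.
tokens = [(0, 1, 1), (2, 6, 1), (7, 9, 3), (10, 14, 1), (15, 22, 3),
          (23, 29, 3), (30, 37, 5), (38, 45, 6), (46, 47, 1)]
prompt = cut_after_last_connected(text, (23, 29), tokens)
print(prompt)  # I want to stop smoking during driving
```

With spaCy, the token triples could be built from a parsed document as `[(t.idx, t.idx + len(t), t.head.i) for t in doc]`. The resulting prompt would then be passed to the language model.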