README.md · Unbabel/gec-t5_small at 80b119ab23d4ad305ae656d8362018afee218446

This model is an implementation of the paper A Simple Recipe for Multilingual Grammatical Error Correction from Google where they report the State of the art score in the task of Grammatical Error Correction (GEC). We implement the version with the T5-small with the reported F_0.5 score in the paper (60.70).

In order to use the model, look at the following snippet:

from transformers import T5ForConditionalGeneration, T5Tokenizer

model = T5ForConditionalGeneration.from_pretrained("Unbabel/gec-t5_small")
tokenizer = T5Tokenizer.from_pretrained('t5-small')

sentence = "I like to swimming"
tokenized_sentence = tokenizer('gec: ' + sentence, max_length=128, truncation=True, padding='max_length', return_tensors='pt')
corrected_sentence = tokenizer.decode(
    model.generate(
        input_ids = tokenized_sentence.input_ids,
        attention_mask = tokenized_sentence.attention_mask, 
        max_length=128,
        num_beams=5,
        early_stopping=True,
    )[0],
    skip_special_tokens=True, 
    clean_up_tokenization_spaces=True
)
print(corrected_sentence) # -> I like swimming.