This is a google/byt5-small transformer model trained on Lithuanian text for ~100 hours. It was created as part of the work "Towards Lithuanian Grammatical Error Correction", which was presented at the 11th Computer Science On-line Conference 2022.

The model is still in its infancy (we plan to train it 100x longer in the future). Nevertheless, it already shows what the approach is capable of.


Given the following corrupted text obtained from []:

text = 'Sveiki pardodu tvarkyngą "Audi" firmos automobylį. Kątik iš Amerikės. Viena savininka prižiurietas ir mylietas Automobylis. Dar turu patobulintą „Mersedes“ su automatinia greičių pavara už 4000 evrų (iš Amerikės). Taippat tvarkingas.'

The correction can be obtained by:

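from transformers import pipeline
# Model identifier on the Hugging Face Hub
name = "LukasStankevicius/ByT5-Lithuanian-gec-100h"
# Load the model as a text-to-text generation pipeline (PyTorch backend)
my_pipeline = pipeline(task="text2text-generation", model=name, framework="pt")
# `text` is the corrupted string defined above; keep the first generated result
corrected_text = my_pipeline(text)[0]["generated_text"]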

Output from the above would be:

Sveiki parduodu tvarkingą „Audi“ firmos automobilį. Ką tik iš Amerikės. Viena savininkas prižiūrintas ir mylimas automobilis. Dar turiu patobulintą „Mersedes“ su automatine greičių pavara už 4000 eurų (iš Amerikės). Taip pat tvarkingas.
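
To see which words the model changed, the input and the output can be compared word by word. Below is a minimal sketch that uses only Python's standard difflib module. The helper name changed_words is hypothetical and is not part of the model or its repository, and the short strings are excerpts from the example above rather than fresh model output.

```python
import difflib
from typing import List, Tuple


def changed_words(original: str, corrected: str) -> List[Tuple[str, str]]:
    """Return (before, after) pairs for the word spans that differ."""
    src, dst = original.split(), corrected.split()
    # Compare whitespace-separated words rather than single characters
    matcher = difflib.SequenceMatcher(a=src, b=dst, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((" ".join(src[i1:i2]), " ".join(dst[j1:j2])))
    return changes


# Excerpts taken from the corrupted text and the corrected output above
original = "Kątik iš Amerikės."
corrected = "Ką tik iš Amerikės."
for before, after in changed_words(original, corrected):
    print(f"{before!r} -> {after!r}")
```

Adjacent changed words are reported together as one span, so a run of several corrected words appears as a single pair. In practice, the full `text` and `corrected_text` variables from the snippet above could be passed in place of the excerpts.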

More information can be found in the accompanying GitHub repository:
