metadata

license: cc-by-sa-4.0
datasets:
  - cjvt/cc_gigafida
  - cjvt/solar3
  - cjvt/sloleks
language:
  - sl
tags:
  - word shape correction

T5-slo-word-shape-corrector

This T5 model is designed to identify and correct Slovenian words with incorrect shapes, that is, words whose inflected form does not fit the surrounding context.
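A minimal sketch of how such a model could be called through the Hugging Face `transformers` sequence-to-sequence API. The repository ID below is a hypothetical placeholder (this card does not state the hub path), and the sketch assumes the model takes raw text with no task prefix; neither is confirmed here.

```python
# Hypothetical placeholder: replace with the actual hub repository ID of this model.
MODEL_ID = "your-namespace/T5-slo-word-shape-corrector"


def correct_text(text: str, model_id: str = MODEL_ID, max_new_tokens: int = 128) -> str:
    """Run the corrector on one piece of text and return the generated output.

    Assumption: the model expects plain input text without a task prefix.
    """
    # Imported lazily so the module can be loaded without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `correct_text("...")` downloads the weights on first use, so in practice the tokenizer and model would be loaded once and reused across calls.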

Model Output Example

Imagine we have the following Slovenian text:

Model v besedilu popravljaj besede, ki imeti nepravilno obliko.

The model might return the following text (note: this prediction was chosen for illustration and is not guaranteed to be reproducible):

Model v besedilu popravlja besede, ki imajo nepravilno obliko.

In the input sentence, the words popravljaj and imeti appear in forms that do not fit the context: popravljaj is an imperative and imeti is an infinitive. The model corrects them to popravlja and imajo.
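To see which words a correction changed, the input and output can be compared token by token. The following is a small illustrative helper, not part of the model; the function name `changed_words` is made up for this sketch, and it splits on whitespace only.

```python
import difflib


def changed_words(original: str, corrected: str) -> list[tuple[str, str]]:
    """Return (before, after) pairs for every span that differs between the texts."""
    src = original.split()
    dst = corrected.split()
    matcher = difflib.SequenceMatcher(a=src, b=dst, autojunk=False)

    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Join multi-token spans so insertions and deletions are reported too.
            changes.append((" ".join(src[i1:i2]), " ".join(dst[j1:j2])))
    return changes


before = "Model v besedilu popravljaj besede, ki imeti nepravilno obliko."
after = "Model v besedilu popravlja besede, ki imajo nepravilno obliko."
print(changed_words(before, after))
# [('popravljaj', 'popravlja'), ('imeti', 'imajo')]
```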

More details

Testing the model on generated test sets gives the following results (combining detection and correction of words with incorrect shapes):

Precision: 0.911
Recall: 0.811
F1: 0.858
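As a quick consistency check, F1 is the harmonic mean of precision and recall, so the third figure can be recomputed from the first two. A minimal sketch:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


precision, recall = 0.911, 0.811
print(round(f1_score(precision, recall), 3))  # 0.858
```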

Acknowledgement

The authors acknowledge the financial support from the Slovenian Research and Innovation Agency - research core funding No. P6-0411: Language Resources and Technologies for Slovene and research project No. J7-3159: Empirical foundations for digitally-supported development of writing skills.

Authors

Thanks to Martin Božič, Marko Robnik-Šikonja and Špela Arhar Holdt for developing these models.