---
license: cc-by-sa-4.0
datasets:
- cjvt/cc_gigafida
- cjvt/solar3
- cjvt/sloleks
language:
- sl
tags:
- word spelling correction
---

# T5-incorrect-word-spelling-corrector

This T5 model identifies and corrects misspelled words in Slovenian text.

## Model Output Example

Consider the following Slovenian text:

_Model v besedlu popravi napaake v nepravilno črkovanih besedah._

The model might return the following text (note: predictions chosen for demonstration/explanation, not reproducibility!):

_Model v besedilu popravi napake v nepravilno črkovanih besedah._

In the input sentence, the words `besedlu` and `napaake` are misspelled, so the model corrects them to `besedilu` and `napake`.

## More details

Testing the model on **generated** test sets gives the following results (combining detection and correction of misspelled words):

- `Precision`: 0.986
- `Recall`: 0.935
- `F1`: 0.960

Testing the model in combination with **cjvt/SloBERTa-slo-word-spelling-annotator**, on test sets constructed from the **Šolar Eval** dataset, gives the following results (combining detection and correction of misspelled words):

- `Precision`: 0.823
- `Recall`: 0.796
- `F1`: 0.810

## Acknowledgement

The authors acknowledge the financial support from the Slovenian Research and Innovation Agency - research core funding No. P6-0411: Language Resources and Technologies for Slovene and research project No. J7-3159: Empirical foundations for digitally-supported development of writing skills.

## Authors

Thanks to Martin Božič, Marko Robnik-Šikonja and Špela Arhar Holdt for developing these models.
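
As a cross-check on the scores reported under More details, F1 is conventionally the harmonic mean of precision and recall. The sketch below assumes that standard definition (the card does not state how F1 was computed), and the helper name `f1_score` is chosen here for illustration only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (standard F1)."""
    if precision + recall == 0:
        # Avoid division by zero when both scores are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in this card (already rounded to 3 decimals).
generated = f1_score(0.986, 0.935)
solar_eval = f1_score(0.823, 0.796)

print(f"generated test sets: {generated:.3f}")
print(f"Šolar Eval test sets: {solar_eval:.3f}")
```

Recomputing from the rounded values reproduces 0.960 for the generated test sets and gives roughly 0.809 for Šolar Eval. The small gap from the reported 0.810 is consistent with the published precision and recall having been rounded before publication.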