---
license: cc-by-sa-4.0
datasets:
- cjvt/cc_gigafida
- cjvt/solar3
- cjvt/sloleks
language:
- sl
tags:
- word shape correction
---

# T5-slo-word-shape-corrector

This T5 model identifies and corrects Slovenian words written in an incorrect shape, that is, in an inflected form that does not fit the context.

## Model Output Example

Consider the following Slovenian input:

_Model v besedilu popravljaj besede, ki imeti nepravilno obliko._

The model might return the following text (these predictions were chosen to illustrate the task and are not guaranteed to be reproducible):

_Model v besedilu popravlja besede, ki imajo nepravilno obliko._

(In English: "The model corrects words in the text that have an incorrect shape.")

In the input sentence, the words `popravljaj` and `imeti` appear in forms that do not fit the context: an imperative and an infinitive where present-tense forms are required. The model corrects them to `popravlja` and `imajo`.
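
The comparison above can be sketched in Python. This is a minimal illustration, not the authors' pipeline: `changed_words` and `correct_text` are hypothetical helper names, and `model_id` is a placeholder for the checkpoint's Hub identifier, which this card does not state. The sketch also assumes the model takes raw text without a task prefix; check the checkpoint before relying on that.

```python
import difflib
from typing import List, Tuple


def changed_words(original: str, corrected: str) -> List[Tuple[str, str]]:
    """Return (before, after) pairs for the word spans that differ."""
    src = original.split()
    dst = corrected.split()
    matcher = difflib.SequenceMatcher(a=src, b=dst, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((" ".join(src[i1:i2]), " ".join(dst[j1:j2])))
    return changes


def correct_text(text: str, model_id: str, max_new_tokens: int = 128) -> str:
    """Run a seq2seq checkpoint on `text` (assumption: no task prefix needed)."""
    # Imported lazily so the diff helper works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# The sentence pair from the example above, compared without running the model.
source = "Model v besedilu popravljaj besede, ki imeti nepravilno obliko."
target = "Model v besedilu popravlja besede, ki imajo nepravilno obliko."
print(changed_words(source, target))
# [('popravljaj', 'popravlja'), ('imeti', 'imajo')]
```

With a real checkpoint, `changed_words(source, correct_text(source, model_id))` would list whatever the model actually changed, which may differ from the illustrative pair shown here.
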

## More details

Testing the model on **generated** test sets gives the following results for the combined task of detecting and correcting words with incorrect shapes:

- `Precision`: 0.911
- `Recall`: 0.811
- `F1`: 0.858
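
As a quick consistency check, the reported F1 agrees with the harmonic mean of the reported precision and recall. The sketch below assumes the standard F1 definition, which the card does not spell out; the function name `f1_from_pr` is illustrative.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the list above.
print(round(f1_from_pr(0.911, 0.811), 3))  # 0.858
```
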

## Acknowledgement

The authors acknowledge the financial support from the Slovenian Research and Innovation Agency - research core funding No. P6-0411: Language Resources and Technologies for Slovene and research project No. J7-3159: Empirical foundations for digitally-supported development of writing skills.

## Authors

Thanks to Martin Božič, Marko Robnik-Šikonja and Špela Arhar Holdt for developing these models.