---
license: agpl-3.0
---

This model was developed in support of the University of Belgrade doctoral dissertation "Composite pseudogrammars based on parallel language models of Serbian" by Mihailo Škorić.

It generates syntactically masked sentences for Serbian.

This small GPT-2 model was fine-tuned on several corpora for Serbian, augmented using [Serbian Morphological Dictionaries](http://poincare.matf.bg.ac.rs/~cvetana/biblio/22_Vitas_Krstev.pdf).

The corpora include ["The corpus of Contemporary Serbian"](https://drive.google.com/file/d/1wRgoWer6YULGCXR0zWOl1fVA6VIe1DOR), [SrpELTeC](https://drive.google.com/file/d/1RtBXyw5Cdh6y_cqbJoMlYhSwNFydBRUv) and WikiKorpus by [JeRTeh – Society for Language Resources and Technologies](https://jerteh.rs/).
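
Since the model is a fine-tuned GPT-2, it can presumably be sampled through the Hugging Face `transformers` text-generation pipeline. The following is a minimal sketch under several assumptions: `MODEL_ID` is a placeholder (this card does not state the repository ID), the helper names `build_generation_kwargs` and `generate_samples` are hypothetical, and the expected prompt format for syntactically masked output is not documented here, so any prompt you pass is a guess.

```python
from typing import Dict, List

# Placeholder only: replace with this model's actual repository ID,
# which is not stated in this card.
MODEL_ID = "<namespace>/<model-name>"


def build_generation_kwargs(n: int = 3, max_new_tokens: int = 30) -> Dict[str, object]:
    """Collect sampling settings passed to the pipeline call (illustrative defaults)."""
    return {
        "do_sample": True,            # sample instead of greedy decoding
        "num_return_sequences": n,    # number of sequences to return
        "max_new_tokens": max_new_tokens,
    }


def generate_samples(model_id: str, prompt: str, n: int = 3) -> List[str]:
    """Sample n continuations of `prompt` from a GPT-2 style checkpoint."""
    # Imported lazily so the helpers above can be used without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_id)
    outputs = generator(prompt, **build_generation_kwargs(n=n))
    return [item["generated_text"] for item in outputs]
```

Calling `generate_samples(MODEL_ID, prompt)` with a real repository ID downloads the checkpoint on first use. Keep in mind that the outputs are experimental and should be inspected manually.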

<b style="color:red">This model is purely experimental! For actual models for Serbian see <a href="https://huggingface.co/jerteh/gpt2-orao" style="color:blue;font-weight:bold">GPT2-ORAO</a> and <a style="color:blue;font-weight:bold" href="https://huggingface.co/jerteh/gpt2-vrabac">GPT2-VRABAC</a>.</b>
<br/><b>If you use this model for your research, please cite: [https://doi.org/10.3390/math11224660](https://doi.org/10.3390/math11224660)</b>