---
license: apache-2.0
metrics:
- perplexity
pipeline_tag: fill-mask
language:
- orv
- cu
tags:
- roberta-based
- old church slavonic
- old east slavic
- old russian
- middle russian
- early slavic
widget:
- text: >-
    моли непрестанно о всѣхъ [MASK], честную память твою присно въ пѣснехъ
    почитающихъ
  example_title: Example 1
- text: >-
    да испишеть имѧна ваша. [MASK] возмуть мѣсѧчное свое съли слебное
  example_title: Example 2
---

# BERTislav

Baseline fill-mask model based on ruBERT and fine-tuned on a 10M-word corpus of mixed Old Church Slavonic, (Later) Church Slavonic, Old East Slavic, Middle Russian, and Medieval Serbian texts.

# Overview

- **Model Name:** BERTislav
- **Task:** Fill-mask
- **Base Model:** [ai-forever/ruBert-base](https://huggingface.co/ai-forever/ruBert-base)
- **Languages:** orv (Old East Slavic, Middle Russian), cu (Old Church Slavonic, Church Slavonic)
- **Developed by:** [Nilo Pedrazzini](https://huggingface.co/npedrazzini)

# Input Format

A `str`-type input containing one or more [MASK]ed tokens.

# Output Format

The predicted tokens, each with a confidence score.

# Examples

### Example 1:

COMING SOON

# Uses

The model can be used as a baseline for further fine-tuning on specific downstream tasks (e.g. linguistic annotation).

# Bias, Risks, and Limitations

The model should only be considered a baseline and should **not** be evaluated on its own. Testing is needed to establish how useful it is for improving the performance of language models fine-tuned for specific tasks.

# Training Details

The texts used as training data are from the following sources:

- [Fundamental Digital Library Russian Literature & Folklore](https://feb-web.ru/indexen.htm) (FEB-web)
- Puškinskij Dom's [*Библиотека литературы Древней Руси*](http://lib.pushkinskijdom.ru/Default.aspx?tabid=2070)
- [Cyrillomethodiana](https://histdict.uni-sofia.bg/)
- Parts of the Bdinski Sbornik, as digitized in [Obdurodon](http://bdinski.obdurodon.org/)
- [Tromsø Old Russian and Old Church Slavonic Treebank](https://torottreebank.github.io/) (TOROT)

**NB: The training texts were heavily normalized, and anyone planning to use the model is advised to normalize their input in the same way for the best results. Use the [provided normalization script](https://huggingface.co/npedrazzini/BERTislav/blob/main/normalize.py), customizing it as needed.**

# Model Card Authors

Nilo Pedrazzini

# Model Card Contact

npedrazzini@turing.ac.uk

# How to use the model

COMING SOON
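
Until the official usage instructions are added, here is a minimal sketch of querying the model with the standard `transformers` fill-mask pipeline. The model ID `npedrazzini/BERTislav` is assumed from the repository URL used elsewhere in this card, and the input text is Example 1 from the widget above.

```python
# Minimal sketch: querying BERTislav via the Hugging Face fill-mask pipeline.
# Model ID assumed from this repository's URL (npedrazzini/BERTislav).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="npedrazzini/BERTislav")

# Widget Example 1 from this card; for best results, normalize input
# with the provided normalize.py script before running it through the model.
text = (
    "моли непрестанно о всѣхъ [MASK], честную память твою присно "
    "въ пѣснехъ почитающихъ"
)

# Each prediction is a dict containing the candidate token and its confidence score.
for prediction in fill_mask(text, top_k=5):
    print(f'{prediction["token_str"]}\t{prediction["score"]:.4f}')
```

The mask placeholder is `[MASK]`, as in the widget examples above.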