---
license: apache-2.0
metrics:
- perplexity
pipeline_tag: fill-mask
language:
- orv
- cu
tags:
- roberta-based
- old church slavonic
- old east slavic
- old russian
- middle russian
- early slavic
widget:
- text: >-
    моли непрестанно о всѣхъ [MASK], честную память твою присно въ пѣснехъ почитающихъ
  example_title: Example 1
- text: >-
    да испишеть имѧна ваша. [MASK] возмуть мѣсѧчное свое съли слебное
  example_title: Example 2
---

# BERTislav

A baseline fill-mask model based on ruBERT and fine-tuned on a 10M-word corpus of mixed Old Church Slavonic, (Later) Church Slavonic, Old East Slavic, Middle Russian, and Medieval Serbian texts.

# Overview
- **Model Name:** BERTislav
- **Task**: Fill-mask
- **Base Model:** [ai-forever/ruBert-base](https://huggingface.co/ai-forever/ruBert-base)
- **Languages:** orv (Old East Slavic, Middle Russian), cu (Old Church Slavonic, Church Slavonic)
- **Developed by:** [Nilo Pedrazzini](https://huggingface.co/npedrazzini)

# Input Format
A `str`-type input with [MASK]ed tokens.

# Output Format
The predicted tokens for each masked position, each with a confidence score.
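
With the Hugging Face `fill-mask` pipeline, the output for each masked position is a ranked list of candidate tokens shaped roughly as in the sketch below (the values shown are placeholders, not real model output):

```python
[
    {
        "score": 0.42,       # confidence score (placeholder value)
        "token": 12345,      # vocabulary ID of the predicted token (placeholder)
        "token_str": "...",  # the predicted token as a string
        "sequence": "...",   # the input with [MASK] replaced by this candidate
    },
    # ...further candidates, in descending order of score
]
```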

# Examples

### Example 1:

COMING SOON

# Uses
The model can be used as a baseline for further fine-tuning on specific downstream tasks (e.g. linguistic annotation).

# Bias, Risks, and Limitations
The model should only be considered a baseline and should **not** be evaluated on its own.
Further testing is needed to establish whether it improves the performance of language models fine-tuned for specific tasks.

# Training Details

The texts used as training data are from the following sources:
- [Fundamental Digital Library Russian Literature & Folklore](https://feb-web.ru/indexen.htm) (FEB-web)
- Puškinskij Dom's [*Библиотека литературы Древней Руси*](http://lib.pushkinskijdom.ru/Default.aspx?tabid=2070)
- [Cyrillomethodiana](https://histdict.uni-sofia.bg/)
- Parts of the Bdinski Sbornik, as digitized in [Obdurodon](http://bdinski.obdurodon.org/)
- [Tromsø Old Russian and Old Church Slavonic Treebank](https://torottreebank.github.io/) (TOROT)

**NB: The training texts were heavily normalized; for best results, apply the same normalization to any input text.
Use the [provided normalization script](https://huggingface.co/npedrazzini/BERTislav/blob/main/normalize.py), customizing it as needed.**
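
A hypothetical sketch of the intended workflow is shown below; the `normalize` placeholder stands in for whatever the provided script does and is **not** its actual interface:

```python
def normalize(text: str) -> str:
    # Placeholder: apply the same normalization as normalize.py,
    # customized to your data (see the script for the actual rules).
    return text

# Second widget example from above, normalized before inference.
masked_input = normalize("да испишеть имѧна ваша. [MASK] возмуть мѣсѧчное свое съли слебное")
# Feed `masked_input` to the fill-mask pipeline shown under "How to use the model" below.
```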

# Model Card Authors

Nilo Pedrazzini

# Model Card Contact

npedrazzini@turing.ac.uk

# How to use the model

COMING SOON
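
In the meantime, a minimal sketch with the Hugging Face `transformers` fill-mask pipeline (the model ID `npedrazzini/BERTislav` is assumed to refer to this repository):

```python
from transformers import pipeline

# Load BERTislav as a fill-mask pipeline.
fill_mask = pipeline("fill-mask", model="npedrazzini/BERTislav")

# First widget example from above, with one [MASK]ed token.
text = "моли непрестанно о всѣхъ [MASK], честную память твою присно въ пѣснехъ почитающихъ"

# Print the top candidate tokens and their confidence scores.
for pred in fill_mask(text, top_k=5):
    print(pred["token_str"], round(pred["score"], 4))
```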