
BERTislav

Baseline fill-mask model based on ruBERT and fine-tuned on a 10M-word corpus of mixed Old Church Slavonic, (Later) Church Slavonic, Old East Slavic, Middle Russian, and Medieval Serbian texts.

Overview

  • Model Name: BERTislav
  • Task: Fill-mask
  • Base Model: ai-forever/ruBert-base
  • Languages: orv (Old East Slavic, Middle Russian), cu (Old Church Slavonic, Church Slavonic)
  • Developed by: Nilo Pedrazzini

Input Format

A str-type input containing one or more [MASK] tokens.

Output Format

The predicted tokens for each [MASK] position, each with its confidence score.
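
As a concrete illustration, the following is a minimal sketch assuming the standard transformers fill-mask pipeline and a hypothetical npedrazzini/BERTislav repo id (adjust to the actual checkpoint location):

```python
from transformers import pipeline

# Assumption: the checkpoint is published under this repo id; adjust if different.
fill_mask = pipeline("fill-mask", model="npedrazzini/BERTislav")

# Input: a string containing a [MASK] token (illustrative sentence).
predictions = fill_mask("и рече [MASK] къ нему")

# Output: a list of candidates, each with the predicted token and its confidence score.
for p in predictions:
    print(p["token_str"], round(p["score"], 4))
```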

Examples

Example 1:

COMING SOON

Uses

The model can be used as a baseline for further fine-tuning on specific downstream tasks (e.g. linguistic annotation), as sketched below.
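
For example, the checkpoint can be loaded as the base of a token-classification model (e.g. for morphological or POS annotation). The sketch below is a minimal illustration; the repo id and label set are assumptions, not part of this release:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Assumption: repo id and tag set are placeholders; adapt to your annotation scheme.
model_id = "npedrazzini/BERTislav"
labels = ["NOUN", "VERB", "ADJ", "OTHER"]  # hypothetical tag set

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(
    model_id,
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={label: i for i, label in enumerate(labels)},
)
# `model` now has a freshly initialized classification head and can be
# fine-tuned on annotated data, e.g. with transformers.Trainer.
```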

Bias, Risks, and Limitations

The model should be treated only as a baseline and should not be evaluated on its own. Further testing is needed to determine whether it improves the performance of language models fine-tuned for specific tasks.

Training Details

The texts used as training data are from the following sources:

NB: The training texts were heavily normalized, and anyone planning to use the model is advised to apply the same normalization for the best outcome. Use the provided normalization script, customizing it as needed (an illustrative sketch follows).
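
For illustration only, the sketch below shows the kind of character-level normalization involved. The mappings are hypothetical examples and do not reproduce the provided script, which remains the authoritative reference:

```python
# Hypothetical grapheme mappings for illustration only; NOT the provided script.
NORMALIZATION_MAP = str.maketrans({
    "ѣ": "е",  # yat
    "і": "и",  # decimal i
    "ѡ": "о",  # omega
    "ѳ": "ф",  # fita
    "ѵ": "и",  # izhitsa
})

def normalize(text: str) -> str:
    """Lowercase and map archaic graphemes to modern equivalents (illustrative)."""
    return text.lower().translate(NORMALIZATION_MAP)

print(normalize("Въ лѣто"))  # -> "въ лето"
```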

Model Card Authors

Nilo Pedrazzini

Model Card Contact

npedrazzini@turing.ac.uk

How to use the model

COMING SOON
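
Until the official instructions are added, the following is a minimal sketch of lower-level usage with transformers and PyTorch, again assuming the npedrazzini/BERTislav repo id and an illustrative input sentence:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Assumption: repo id; replace with the actual checkpoint location if different.
model_id = "npedrazzini/BERTislav"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

text = "и рече [MASK] къ нему"  # illustrative sentence with one masked token
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and take the top-5 candidate tokens.
mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
probs = logits[0, mask_index].softmax(dim=-1)
top = torch.topk(probs, k=5)

for score, token_id in zip(top.values[0], top.indices[0]):
    print(tokenizer.decode([int(token_id)]), float(score))
```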

Model Size

178M parameters (safetensors, F32 tensors)