license: apache-2.0
metrics:
  - perplexity
pipeline_tag: fill-mask
language:
  - cu
  - orv
  - chu
tags:
  - roberta-based
  - old church slavonic
  - old east slavic
  - old russian
  - middle russian
  - early slavic
widget:
  - text: >-
      моли непрестанно о всѣхъ [MASK], честную память твою присно въ пѣснехъ
      почитающихъ
    example_title: Example 1
  - text: да испишеть имѧна ваша. [MASK] возмуть мѣсѧчное свое съли слебное
    example_title: Example 2

BERTislav

Baseline fill-mask model based on ruBERT and fine-tuned on a 10M-word corpus of mixed Old Church Slavonic, (Later) Church Slavonic, Old East Slavic, Middle Russian, and Medieval Serbian texts.

Overview

  • Model Name: BERTislav
  • Task: Fill-mask
  • Base Model: ai-forever/ruBert-base
  • Languages: orv (Old East Slavic, Middle Russian), cu (Old Church Slavonic, Church Slavonic)
  • Developed by: Nilo Pedrazzini

Input Format

A string containing one or more [MASK] tokens.

Output Format

The predicted tokens for each masked position, each with a confidence score.
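The input and output formats above can be sketched with the `fill-mask` pipeline from the `transformers` library. This is a minimal illustration, not the author's official usage example: the repository ID `npedrazzini/BERTislav` is an assumption, and the helper names `count_masks` and `predict_mask` are hypothetical.

```python
# Assumed repository ID; adjust it if the model is hosted under another name.
MODEL_ID = "npedrazzini/BERTislav"
MASK_TOKEN = "[MASK]"


def count_masks(text: str) -> int:
    """Return how many [MASK] tokens the input string contains."""
    return text.count(MASK_TOKEN)


def predict_mask(text: str, top_k: int = 5, model_id: str = MODEL_ID):
    """Return (token, score) pairs for a string with exactly one [MASK]."""
    if count_masks(text) != 1:
        raise ValueError("this sketch expects exactly one [MASK] token")
    # Imported lazily so count_masks works without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=model_id)
    # For a single mask, the pipeline returns a list of dicts with keys
    # such as "token_str" (predicted token) and "score" (its confidence).
    return [(p["token_str"], p["score"]) for p in fill_mask(text, top_k=top_k)]
```

Calling `predict_mask("да испишеть имѧна ваша. [MASK] возмуть мѣсѧчное свое съли слебное")` downloads the model on first use and returns up to `top_k` candidate tokens with their scores. The sketch restricts input to a single mask because the pipeline returns a nested list when several masks are present.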

Examples

Example 1:

COMING SOON

Uses

The model can be used as a baseline for further fine-tuning on specific downstream tasks (e.g. linguistic annotation).
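As a rough sketch of what such fine-tuning might start from, the snippet below loads the checkpoint with a token-classification head, as one might for part-of-speech tagging. The repository ID, the label set, and the helper names `build_label_maps` and `load_for_token_classification` are all assumptions made for illustration. The training loop itself is omitted.

```python
# Assumed repository ID; adjust it if the model is hosted under another name.
MODEL_ID = "npedrazzini/BERTislav"


def build_label_maps(labels):
    """Build the id2label / label2id dictionaries expected by the model config."""
    id2label = dict(enumerate(labels))
    label2id = {label: idx for idx, label in id2label.items()}
    return id2label, label2id


def load_for_token_classification(labels, model_id: str = MODEL_ID):
    """Load tokenizer and model with a freshly initialised tagging head."""
    # Imported lazily so build_label_maps works without transformers installed.
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    id2label, label2id = build_label_maps(labels)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForTokenClassification.from_pretrained(
        model_id,
        num_labels=len(labels),
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model
```

For example, `load_for_token_classification(["NOUN", "VERB", "ADJ"])` would return a tokenizer and a model whose classification layer still needs to be trained on annotated data.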

Bias, Risks, and Limitations

The model is intended only as a baseline and should not be evaluated on its own. Whether it improves the performance of models fine-tuned for specific tasks still needs to be tested.

Training Details

The texts used as training data are from the following sources:

NB: Texts were heavily normalized and anyone planning to use the model is advised to do the same for the best outcome. Use the provided normalization script, customizing it as needed.
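The provided normalization script is not reproduced here, and the sketch below is not a substitute for it. It only illustrates the kind of preprocessing that "heavy normalization" might involve, assuming lowercasing, removal of combining marks (such as the titlo), and whitespace collapsing. The function name `normalize_text` and the choice of steps are hypothetical.

```python
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Lowercase, strip combining marks, and collapse whitespace (illustrative only)."""
    # Decompose so that diacritics become separate combining characters.
    decomposed = unicodedata.normalize("NFD", text.lower())
    # Drop nonspacing marks (Unicode category Mn), e.g. the combining titlo.
    # Note: this also turns "й" into "и", because "й" decomposes into "и" + breve.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Recompose whatever remains and squeeze runs of whitespace.
    recomposed = unicodedata.normalize("NFC", stripped)
    return re.sub(r"\s+", " ", recomposed).strip()
```

Whatever normalization is applied at inference time should match what was applied to the training data, so the provided script remains the reference.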

Model Card Authors

Nilo Pedrazzini

Model Card Contact

npedrazzini@turing.ac.uk

How to use the model

COMING SOON