---
datasets:
- oscar
- srwac
- leipzig
language:
- hr
- sr
tags:
- masked-lm
widget:
- text: "Ovo je početak <mask>."
license: apache-2.0
---
# Transformer language model for Croatian and Serbian
Trained for two epochs (500k steps) on 6 GB of Croatian and Serbian text from the Leipzig Corpus, OSCAR and srWac datasets.

| Model | #params | Arch. | Training data |
|---|---|---|---|
| `Andrija/SRoBERTa-L` | 80M | Third | Leipzig Corpus, OSCAR and srWac (10 GB of text) |
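This is a minimal sketch of how the model might be queried as a masked LM. It assumes the Hugging Face `transformers` library and a backend such as PyTorch are installed, and that the Hub id `Andrija/SRoBERTa-L` from the table is reachable. The `<mask>` token comes from the widget example above. The helper names `predict` and `format_predictions` are hypothetical and are not part of the model card.

```python
# Sketch only: assumes `transformers` (plus a backend such as PyTorch) is
# installed and the Hub model id below can be downloaded.
from typing import Dict, List

MODEL_ID = "Andrija/SRoBERTa-L"  # model id from the table above
MASK = "<mask>"  # mask token as used in the widget example


def format_predictions(predictions: List[Dict]) -> List[str]:
    """Hypothetical helper: render fill-mask results as 'token (score)', best first."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [f"{p['token_str'].strip()} ({p['score']:.3f})" for p in ranked]


def predict(text: str, top_k: int = 5) -> List[str]:
    """Hypothetical helper: return the top_k fillers for the masked position."""
    if MASK not in text:
        raise ValueError(f"input must contain the mask token {MASK!r}")
    # Imported lazily so the input check above works without the heavy dependency.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=MODEL_ID)
    # Each result is a dict with 'score', 'token', 'token_str' and 'sequence'.
    return format_predictions(fill_mask(text, top_k=top_k))
```

For example, `predict("Ovo je početak <mask>.")` downloads the checkpoint on first use and returns the five highest-scoring completions as strings. The pipeline already orders its results by score. Sorting again in `format_predictions` only keeps that helper independent of this ordering.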