metadata
license: cc-by-sa-4.0
language:
- hr
- bs
- sr
XLM-R-BERTić
This model was produced by pre-training XLM-Roberta-large for 48k steps on South Slavic languages.
Benchmarking
Three tasks were chosen for model evaluation:
- Named Entity Recognition (NER)
- Sentiment regression
- COPA (Choice of Plausible Alternatives)
In all cases, the model was fine-tuned separately for each downstream task.
NER
(entry to be added soon)
Sentiment regression
The ParlaSent dataset was used to evaluate sentiment regression on Bosnian, Croatian, and Serbian. Results are reported as the coefficient of determination (R²) on the test set. The procedure is explained in greater detail in the dedicated benchmarking repository.
system | train | test | R² |
---|---|---|---|
xlm-r-parlasent | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.615 |
BERTić | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.612 |
XLM-R-SloBERTić | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.607 |
XLM-Roberta-Large | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.605 |
XLM-R-BERTić | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.601 |
crosloengual-bert | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.537 |
XLM-Roberta-Base | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.500 |
dummy (mean) | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | -0.12 |
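The R² column is the coefficient of determination, 1 − SS_res / SS_tot, which is why a trivial baseline can score below zero. The sketch below illustrates this in plain Python. It assumes, without confirmation from the benchmarking repository, that the "dummy (mean)" system predicts the mean training-set score for every test instance, similar to scikit-learn's `DummyRegressor(strategy="mean")` scored with `r2_score`. The helper names `r_squared` and `dummy_mean_predictions` and the toy labels are hypothetical and do not come from ParlaSent.

```python
from typing import List, Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the *test* mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


def dummy_mean_predictions(train_labels: Sequence[float], n_test: int) -> List[float]:
    """Hypothetical mean baseline: predict the training-set mean everywhere."""
    mean_train = sum(train_labels) / len(train_labels)
    return [mean_train] * n_test


if __name__ == "__main__":
    # Toy labels (hypothetical, not ParlaSent data).
    train = [0.0, 1.0, 2.0, 3.0, 4.0]           # training mean = 2.0
    preds = dummy_mean_predictions(train, 3)    # [2.0, 2.0, 2.0]
    print(r_squared([1.0, 2.0, 3.0], preds))    # test mean equals train mean -> 0.0
    print(r_squared([3.0, 4.0, 5.0], preds))    # test mean differs -> -6.0
```

A mean predictor scores exactly 0 only when the training mean matches the test mean. Any shift between the two distributions pushes it below 0, which is consistent with the negative score of the dummy baseline in the table.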
COPA
(to be added soon)
Citation
(to be added soon)