---
license: apache-2.0
datasets:
- tay-yozhik/NaturalText
language:
- ru
---

# NaturalRoBERTa

This is a pre-trained [RoBERTa](https://arxiv.org/abs/1907.11692)-type model. NaturalRoBERTa is built on a dataset collected from open sources: three news sub-corpora of [Taiga](https://github.com/TatianaShavrina/taiga_site) (Lenta.ru, Interfax, N+1) and [Russian Wikipedia texts](https://ru.wikipedia.org/).

# Evaluation

This model was evaluated on the [RussianSuperGLUE](https://russiansuperglue.com/) benchmark:

| Task    | Result        | Metrics                          |
|---------|---------------|----------------------------------|
| LiDiRus | 0.0           | Matthews Correlation Coefficient |
| RCB     | 0.217 / 0.484 | F1 / Accuracy                    |
| PARus   | 0.498         | Accuracy                         |
| TERRa   | 0.487         | Accuracy                         |
| RUSSE   | 0.587         | Accuracy                         |
| RWSD    | 0.669         | Accuracy                         |
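
# How to use

A minimal usage sketch with the Hugging Face `transformers` library is shown below. The repository id `tay-yozhik/NaturalRoBERTa` is an assumption based on the dataset namespace; substitute the actual model id if it differs.

```python
# Minimal sketch: load the checkpoint as a standard RoBERTa masked language model.
# The repo id "tay-yozhik/NaturalRoBERTa" is an assumption and may need adjusting.
from transformers import pipeline

fill_mask = pipeline(
    "fill-mask",
    model="tay-yozhik/NaturalRoBERTa",  # hypothetical repo id
)

# Predict the masked token in a Russian sentence.
for prediction in fill_mask("Столица России — <mask>."):
    print(prediction["token_str"], prediction["score"])
```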