---
license: apache-2.0
language:
- ru
- en
library_name: transformers
---
# RoBERTa-base from deepvk

Pretrained bidirectional encoder for the Russian language.
## Model Details

### Model Description
The model was pretrained with the standard masked language modeling (MLM) objective on a large text corpus, including open social data, books, Wikipedia, web pages, etc. A quick fill-mask probe is shown after the list below.
- Developed by: VK Applied Research Team
- Model type: RoBERTa
- Languages: Mostly Russian and a small fraction of other languages
- License: Apache 2.0
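Because pretraining used the MLM objective, the checkpoint can be probed directly with the fill-mask pipeline. This is a minimal sketch, assuming the uploaded weights include the pretrained LM head:

```python
from transformers import pipeline

# Minimal sketch; assumes the checkpoint ships with the MLM head.
fill_mask = pipeline("fill-mask", model="deepvk/roberta-base")

# Build the prompt with the tokenizer's own mask token ("<mask>" for RoBERTa).
prompt = f"Привет, {fill_mask.tokenizer.mask_token}!"
for candidate in fill_mask(prompt):
    print(candidate["token_str"], round(candidate["score"], 3))
```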
## How to Get Started with the Model
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("deepvk/roberta-base")
model = AutoModel.from_pretrained("deepvk/roberta-base")

text = "Привет, мир!"
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)  # outputs.last_hidden_state: (1, seq_len, 768)
```
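The base encoder returns per-token hidden states rather than task predictions. A common follow-up, shown here as a sketch rather than anything this card prescribes, is attention-mask-aware mean pooling to obtain a fixed-size sentence embedding:

```python
# Mean-pool the token states, ignoring padding positions via the attention mask.
mask = inputs["attention_mask"].unsqueeze(-1)                  # (1, seq_len, 1)
embedding = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(embedding.shape)  # torch.Size([1, 768])
```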
## Training Details

### Training Data
500 GB of raw text in total, a mix of the following sources: Wikipedia, books, Twitter comments, Pikabu, Proza.ru, film subtitles, news websites, and a social corpus.
### Training Procedure

#### Training Hyperparameters
| Argument | Value |
|---|---|
| Training regime | fp16 mixed precision |
| Training framework | Fairseq |
| Optimizer | Adam |
| Adam betas | 0.9, 0.98 |
| Adam eps | 1e-6 |
| Num training steps | 500k |
The model was trained on 8×A100 GPUs for ~22 days.
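For reference, the optimizer settings in the table translate directly into plain PyTorch. This is only a sketch: the learning rate below is a hypothetical placeholder, since the card does not state the LR or schedule.

```python
import torch

# Adam with the betas/eps from the table above; the learning rate is a
# hypothetical placeholder -- this card does not state the LR or schedule.
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-4,              # assumption, not from the card
    betas=(0.9, 0.98),
    eps=1e-6,
)
```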
### Architecture details

Standard RoBERTa-base parameters:
| Argument | Value |
|---|---|
| Activation function | gelu |
| Attention dropout | 0.1 |
| Dropout | 0.1 |
| Encoder attention heads | 12 |
| Encoder embed dim | 768 |
| Encoder ffn embed dim | 3,072 |
| Encoder layers | 12 |
| Max positions | 512 |
| Vocab size | 50,266 |
| Tokenizer type | Byte-level BPE |
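These fairseq-style names map onto Hugging Face config fields, and loading the config is a quick way to double-check them. Note that HF RoBERTa configs report `max_position_embeddings` with a +2 padding offset, i.e. 514 for 512 usable positions:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("deepvk/roberta-base")
print(config.num_hidden_layers)        # 12   (encoder layers)
print(config.num_attention_heads)      # 12   (encoder attention heads)
print(config.hidden_size)              # 768  (encoder embed dim)
print(config.intermediate_size)        # 3072 (encoder ffn embed dim)
print(config.vocab_size)               # 50266
print(config.max_position_embeddings)  # 514 = 512 + 2 (RoBERTa padding offset)
```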
## Evaluation

Results on the Russian SuperGLUE dev set. The best result across base-size models is in bold.
| Model | RCB | PARus | MuSeRC | TERRa | RUSSE | RWSD | DaNetQA | Score |
|---|---|---|---|---|---|---|---|---|
| vk-roberta-base | 0.46 | 0.56 | 0.679 | **0.769** | 0.960 | 0.569 | 0.658 | 0.665 |
| vk-deberta-distill | 0.433 | 0.56 | 0.625 | 0.59 | 0.943 | 0.569 | 0.726 | 0.635 |
| vk-deberta-base | 0.450 | **0.61** | **0.722** | 0.704 | 0.948 | 0.578 | **0.76** | **0.682** |
| vk-bert-base | 0.467 | 0.57 | 0.587 | 0.704 | 0.953 | **0.583** | 0.737 | 0.657 |
| sber-bert-base | **0.491** | **0.61** | 0.663 | **0.769** | **0.962** | 0.574 | 0.678 | 0.678 |
| sber-roberta-large | 0.463 | 0.61 | 0.775 | 0.886 | 0.946 | 0.564 | 0.761 | 0.715 |
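The scores above come from task-specific fine-tuning. As an illustration only, since the card does not document the fine-tuning protocol, a two-class classification head for an entailment-style task such as TERRa can be attached like this; the premise/hypothesis strings are made-up examples:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical fine-tuning setup for a two-class RSG task such as TERRa;
# the actual protocol behind the numbers above is not given in this card.
tokenizer = AutoTokenizer.from_pretrained("deepvk/roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "deepvk/roberta-base", num_labels=2  # entailment / not entailment
)

inputs = tokenizer(
    "Кошка спит на диване.",   # premise (made-up example)
    "Животное отдыхает.",      # hypothesis (made-up example)
    return_tensors="pt",
)
logits = model(**inputs).logits  # (1, 2); the head is randomly initialized
```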