---
license: apache-2.0
language:
  - ru
  - en
library_name: transformers
---

# RoBERTa-base from deepvk

A pretrained bidirectional encoder for the Russian language.

## Model Details

### Model Description

The model was pretrained with the standard masked language modeling (MLM) objective on a large text corpus, including open social data, books, Wikipedia, web pages, etc.

- **Developed by:** VK Applied Research Team
- **Model type:** RoBERTa
- **Languages:** Mostly Russian and a small fraction of other languages
- **License:** Apache 2.0
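
Because the model was pretrained with the MLM objective, it can be queried directly for masked-token predictions. A minimal sketch using the `fill-mask` pipeline (this assumes the checkpoint ships the MLM head and uses the standard RoBERTa `<mask>` token):

```python
from transformers import pipeline

# Usage sketch: masked-token prediction with the pretrained MLM head.
# "<mask>" is the standard RoBERTa mask token; adjust if the tokenizer differs.
fill_mask = pipeline("fill-mask", model="deepvk/roberta-base")
print(fill_mask("Привет, <mask>!"))  # "Hello, <mask>!"
```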

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("deepvk/roberta-base")
model = AutoModel.from_pretrained("deepvk/roberta-base")

text = "Привет, мир!"  # "Hello, world!"

# Tokenize and run the encoder; the output contains the last hidden states.
inputs = tokenizer(text, return_tensors="pt")
predictions = model(**inputs)
```
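
The snippet above returns token-level hidden states. Continuing from it, one common way to collapse them into a single sentence vector is mean pooling over non-padding tokens (a sketch of one heuristic, not a prescribed recipe; the `<s>` token embedding is another option):

```python
# Mean-pool the last hidden state over non-padding tokens.
mask = inputs["attention_mask"].unsqueeze(-1)  # (batch, seq, 1)
embedding = (predictions.last_hidden_state * mask).sum(1) / mask.sum(1)
print(embedding.shape)  # torch.Size([1, 768])
```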

## Training Details

### Training Data

500 GB of raw text in total, a mix of the following sources: Wikipedia, books, Twitter comments, Pikabu, Proza.ru, film subtitles, news websites, and a social corpus.

### Training Procedure

#### Training Hyperparameters

| Argument           | Value                |
|--------------------|----------------------|
| Training regime    | fp16 mixed precision |
| Training framework | Fairseq              |
| Optimizer          | Adam                 |
| Adam betas         | 0.9, 0.98            |
| Adam eps           | 1e-6                 |
| Num training steps | 500k                 |

The model was trained on 8×A100 GPUs for ~22 days.
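
For reference, a roughly equivalent optimizer configuration in plain PyTorch, continuing from the quickstart snippet (a sketch only: training actually used Fairseq's Adam implementation, and the learning rate and schedule are not listed in this card, so `lr` below is a placeholder):

```python
import torch

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-4,            # placeholder: the peak LR is not given in this card
    betas=(0.9, 0.98),  # Adam betas from the table above
    eps=1e-6,           # Adam eps from the table above
)
```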

### Architecture details

Standard RoBERTa-base parameters:

| Argument                | Value          |
|-------------------------|----------------|
| Activation function     | gelu           |
| Attention dropout       | 0.1            |
| Dropout                 | 0.1            |
| Encoder attention heads | 12             |
| Encoder embed dim       | 768            |
| Encoder ffn embed dim   | 3,072          |
| Encoder layers          | 12             |
| Max positions           | 512            |
| Vocab size              | 50,266         |
| Tokenizer type          | Byte-level BPE |
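
These values map onto the Hugging Face config fields; a quick sanity check (the attribute names below are the standard `RobertaConfig` fields):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("deepvk/roberta-base")
print(config.num_hidden_layers)        # encoder layers: 12
print(config.num_attention_heads)      # encoder attention heads: 12
print(config.hidden_size)              # encoder embed dim: 768
print(config.intermediate_size)        # encoder ffn embed dim: 3072
print(config.vocab_size)               # 50266
print(config.max_position_embeddings)  # RoBERTa stores max positions with a +2 padding offset
```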

## Evaluation

Results on the Russian SuperGLUE dev set.

The best result among base-size models is in bold.

| Model              | RCB       | PARus    | MuSeRC    | TERRa     | RUSSE     | RWSD      | DaNetQA  | Score     |
|--------------------|-----------|----------|-----------|-----------|-----------|-----------|----------|-----------|
| vk-roberta-base    | 0.46      | 0.56     | 0.679     | **0.769** | 0.960     | 0.569     | 0.658    | 0.665     |
| vk-deberta-distill | 0.433     | 0.56     | 0.625     | 0.59      | 0.943     | 0.569     | 0.726    | 0.635     |
| vk-deberta-base    | 0.450     | **0.61** | **0.722** | 0.704     | 0.948     | 0.578     | **0.76** | **0.682** |
| vk-bert-base       | 0.467     | 0.57     | 0.587     | 0.704     | 0.953     | **0.583** | 0.737    | 0.657     |
| sber-bert-base     | **0.491** | **0.61** | 0.663     | **0.769** | **0.962** | 0.574     | 0.678    | 0.678     |
| sber-roberta-large | 0.463     | 0.61     | 0.775     | 0.886     | 0.946     | 0.564     | 0.761    | 0.715     |