BERT-base

Pretrained bidirectional encoder for the Russian language. The model was trained with the standard masked language modeling (MLM) objective on large text corpora, including open social data. See the Training Details section for more information.

⚠️ This model contains only the encoder part without any pretrained head.

  • Developed by: deepvk
  • Model type: BERT
  • Languages: Mostly Russian and a small fraction of other languages
  • License: Apache 2.0

How to Get Started with the Model

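from transformers import AutoTokenizer, AutoModel

# Load the tokenizer and the bare encoder (no task-specific head).
tokenizer = AutoTokenizer.from_pretrained("deepvk/bert-base-uncased")
model = AutoModel.from_pretrained("deepvk/bert-base-uncased")

text = "Привет, мир!"

# Tokenize to PyTorch tensors and run the encoder.
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)  # outputs.last_hidden_state holds per-token vectors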
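
Because the checkpoint ships only the encoder, its raw output is one vector per token rather than a task prediction. One common way to turn that into a fixed-size sentence vector is masked mean pooling, sketched below. The helpers `mean_pool` and `embed` are hypothetical names introduced here, and mean pooling is a generic choice, not something the card says the model was trained or tuned for.

```python
import torch


def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors over real (non-padding) positions."""
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)  # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(dim=1)                   # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1.0)                          # avoid division by zero
    return summed / counts


def embed(texts):
    """Encode a list of strings into one vector per string (illustrative helper)."""
    # Imported here so mean_pool can be reused without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("deepvk/bert-base-uncased")
    model = AutoModel.from_pretrained("deepvk/bert-base-uncased")
    model.eval()  # disable dropout for deterministic outputs

    inputs = tokenizer(texts, padding=True, truncation=True,
                       max_length=512, return_tensors="pt")
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    return mean_pool(outputs.last_hidden_state, inputs["attention_mask"])


if __name__ == "__main__":
    vectors = embed(["Привет, мир!", "Как дела?"])
    print(vectors.shape)
```

For classification or tagging, a task head still has to be added and fine-tuned on labeled data; the pooled vectors above are only a starting point.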

Training Details

The model was trained using the NVIDIA source code. See the pretraining documentation for details.

Training Data

The training corpus totals 250 GB of filtered text, drawn from a mix of Wikipedia, books, and a social corpus.

Architecture details

| Argument | Value |
|---|---|
| Encoder layers | 12 |
| Encoder attention heads | 12 |
| Encoder embed dim | 768 |
| Encoder ffn embed dim | 3,072 |
| Activation function | GeLU |
| Attention dropout | 0.1 |
| Dropout | 0.1 |
| Max positions | 512 |
| Vocab size | 36,000 |
| Tokenizer type | BertTokenizer |
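
As a rough sanity check, the architecture values above can be turned into a parameter count. The sketch below assumes the standard BERT layout as implemented in Hugging Face `BertModel`: two token-type embeddings, biases on every linear layer, LayerNorm after the embeddings and after each sublayer, and a pooler layer. None of these details are stated in this card, and `bert_param_count` is a hypothetical helper.

```python
def bert_param_count(vocab_size=36_000, hidden=768, layers=12, ffn=3_072,
                     max_positions=512, type_vocab=2, with_pooler=True):
    """Count weights in a standard BERT encoder (assumed layout, see text above)."""
    layer_norm = 2 * hidden  # scale + bias

    # Word, position, and token-type embeddings, followed by LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm

    # Q, K, V, and output projections; the number of heads does not change the count.
    attention = 4 * (hidden * hidden + hidden) + layer_norm

    # Two linear layers (hidden -> ffn -> hidden), followed by LayerNorm.
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden) + layer_norm

    # Dense layer applied to the first token, present in BertModel by default.
    pooler = hidden * hidden + hidden if with_pooler else 0

    return embeddings + layers * (attention + feed_forward) + pooler


total = bert_param_count()
print(f"{total:,} parameters")  # 113,689,344 parameters
```

Under these assumptions the encoder comes to about 114M parameters, of which roughly 28M sit in the embedding tables.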

Evaluation

We evaluated the model on the Russian SuperGLUE dev set. The best result in each task is marked in bold. All models have the same size except the distilled version of DeBERTa.

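| Model | RCB | PARus | MuSeRC | TERRa | RUSSE | RWSD | DaNetQA | Score |
|---|---|---|---|---|---|---|---|---|
| vk-deberta-distill | 0.433 | 0.56 | 0.625 | 0.59 | 0.943 | 0.569 | 0.726 | 0.635 |
| vk-roberta-base | 0.46 | 0.56 | 0.679 | **0.769** | 0.960 | 0.569 | 0.658 | 0.665 |
| vk-deberta-base | 0.450 | **0.61** | **0.722** | 0.704 | 0.948 | 0.578 | **0.76** | **0.682** |
| vk-bert-base | 0.467 | 0.57 | 0.587 | 0.704 | 0.953 | **0.583** | 0.737 | 0.657 |
| sber-bert-base | **0.491** | **0.61** | 0.663 | **0.769** | **0.962** | 0.574 | 0.678 | 0.678 |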
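
The card does not say how the Score column is computed. The numbers are consistent with an unweighted mean of the seven task scores, rounded to three decimals; this is an inference from the table, not a documented definition. A minimal check, with `mean_score` as a hypothetical helper:

```python
# Per-task dev scores (RCB, PARus, MuSeRC, TERRa, RUSSE, RWSD, DaNetQA)
# and the reported Score, copied from the evaluation table.
results = {
    "vk-deberta-distill": ([0.433, 0.56, 0.625, 0.59, 0.943, 0.569, 0.726], 0.635),
    "vk-roberta-base":    ([0.46, 0.56, 0.679, 0.769, 0.960, 0.569, 0.658], 0.665),
    "vk-deberta-base":    ([0.450, 0.61, 0.722, 0.704, 0.948, 0.578, 0.76], 0.682),
    "vk-bert-base":       ([0.467, 0.57, 0.587, 0.704, 0.953, 0.583, 0.737], 0.657),
    "sber-bert-base":     ([0.491, 0.61, 0.663, 0.769, 0.962, 0.574, 0.678], 0.678),
}


def mean_score(task_scores):
    """Unweighted average over tasks (assumed aggregation)."""
    return sum(task_scores) / len(task_scores)


for name, (tasks, reported) in results.items():
    computed = round(mean_score(tasks), 3)
    print(f"{name}: computed {computed:.3f}, reported {reported:.3f}")
```

For every row the computed mean matches the reported Score to three decimals.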