---
license: apache-2.0
language:
- ru
- en
library_name: transformers
---

# RoBERTa-base from deepvk

<!-- Provide a quick summary of what the model is/does. -->

Pretrained bidirectional encoder for the Russian language.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

The model was pretrained with the standard MLM objective on a large text corpus including open social data, books, Wikipedia, web pages, etc.
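As background on the objective: MLM hides a fraction of the input tokens (15% in the original RoBERTa recipe) and trains the encoder to reconstruct them, with the mask resampled on every pass over the data ("dynamic masking"). A simplified, illustrative sketch of that masking step (the real recipe additionally uses an 80/10/10 mask/random/keep split, omitted here):

```python
import random

def dynamic_mask(token_ids, mask_id, mask_prob=0.15, rng=None):
    """Replace ~mask_prob of the tokens with mask_id, drawing a fresh
    mask on every call, as in RoBERTa's dynamic masking."""
    rng = rng or random.Random()
    masked, labels = [], []
    for tok in token_ids:
        if rng.random() < mask_prob:
            masked.append(mask_id)   # hide the token from the encoder
            labels.append(tok)       # ...but keep it as the prediction target
        else:
            masked.append(tok)
            labels.append(-100)      # -100 = position ignored by the MLM loss
    return masked, labels
```

Each epoch therefore sees a different masking of the same sentence, unlike BERT's static, preprocessed masks.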

- **Developed by:** VK Applied Research Team
- **Model type:** RoBERTa
- **Languages:** Mostly Russian, with a small fraction of other languages
- **License:** Apache 2.0

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("deepvk/roberta-base")
model = AutoModel.from_pretrained("deepvk/roberta-base")

text = "Привет, мир!"

inputs = tokenizer(text, return_tensors='pt')
predictions = model(**inputs)
```

## Training Details

### Training Data

<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

Mix of the following data:

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

#### Speeds, Sizes, Times [optional]

Standard RoBERTa-base size (~125M parameters).

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Data Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Compute Infrastructure

The model was trained on 8×A100 GPUs for ~22 days.