---
license: apache-2.0
language:
- ru
- en
library_name: transformers
---

# RoBERTa-base from deepvk

<!-- Provide a quick summary of what the model is/does. -->

A pretrained bidirectional encoder for the Russian language.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->
The model was pretrained with the standard masked language modelling (MLM) objective on a large text corpus including open social data, books, Wikipedia, web pages, etc.
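
Because the model was pretrained with the MLM objective, the checkpoint can be queried for masked-token prediction. A minimal sketch, assuming the published weights include the language-modelling head (the example sentence is purely illustrative):

```python
from transformers import pipeline

# Assumes the checkpoint ships with its MLM head; otherwise load it via AutoModelForMaskedLM.
fill = pipeline("fill-mask", model="deepvk/roberta-base")

# Use the tokenizer's own mask token so the example does not hard-code vocabulary details.
masked = f"Привет, {fill.tokenizer.mask_token}!"  # "Hello, <mask>!"
for candidate in fill(masked, top_k=3):
    print(candidate["token_str"], round(candidate["score"], 3))
```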

- **Developed by:** VK Applied Research Team
- **Model type:** RoBERTa
- **Languages:** Mostly Russian, with a small fraction of other languages
- **License:** Apache 2.0

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("deepvk/roberta-base")
model = AutoModel.from_pretrained("deepvk/roberta-base")

text = "Привет, мир!"  # "Hello, world!"

# Tokenize and run the encoder; outputs.last_hidden_state holds the token-level representations.
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
```
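
The base encoder returns token-level hidden states rather than a single sentence vector. One common way to obtain a fixed-size embedding is mean pooling over non-padding tokens; a short sketch continuing from the snippet above (the pooling strategy is an illustrative choice, not something prescribed by the model):

```python
import torch

# Mean-pool the last hidden state over real (non-padding) tokens to get one vector per sentence.
with torch.no_grad():
    outputs = model(**inputs)

mask = inputs["attention_mask"].unsqueeze(-1).float()    # (batch, seq_len, 1)
summed = (outputs.last_hidden_state * mask).sum(dim=1)   # sum of token vectors
embedding = summed / mask.sum(dim=1).clamp(min=1e-9)     # (batch, hidden_size)
print(embedding.shape)
```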

## Training Details

### Training Data

<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

Mix of the following data:


### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

[More Information Needed]


#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

#### Speeds, Sizes, Times [optional]

Standard RoBERTa-base size: 12 layers, hidden size 768, and 12 attention heads.
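
A quick way to verify the actual size is to count parameters on the loaded checkpoint; a small sketch reusing the `model` object from the getting-started example:

```python
# Trainable parameter count; for a base-size encoder this is expected to be on the order of 1e8.
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"{n_params / 1e6:.1f}M parameters")
```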

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Data Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary


## Compute Infrastructure

The model was trained on 8×A100 GPUs for ~22 days.