SpirinEgor committed
Commit fdab3b0
1 Parent(s): cfe4d17

Update README.md

Files changed (1)
  1. README.md +21 -28
README.md CHANGED
@@ -4,23 +4,19 @@ language:
  - ru
  - en
  library_name: transformers
+ pipeline_tag: fill-mask
  ---

- # RoBERTa-base from deepvk
+ # RoBERTa-base

  <!-- Provide a quick summary of what the model is/does. -->

- Pretrained bidirectional encoder for russian language.
+ Pretrained bidirectional encoder for russian language.
+ The model was trained using standard MLM objective on large text corpora including open social data.
+ See [`Training Details`](https://huggingface.co/docs/hub/model-cards#training-details) section for more information

- ## Model Details

- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
- Model was pretrained using standard MLM objective on a large text corpora including open social data, books, Wikipedia, webpages etc.
-
-
- - **Developed by:** VK Applied Research Team
+ - **Developed by:** [deepvk](https://vk.com/deepvk)
  - **Model type:** RoBERTa
  - **Languages:** Mostly russian and small fraction of other languages
  - **License:** Apache 2.0
@@ -43,12 +39,10 @@ predictions = model(**inputs)

  ### Training Data

- 500gb of raw texts in total. Mix of the following data: Wikipedia, Books, Twitter comments, Pikabu, Proza.ru, Film subtitles,
- News websites, Social corpus.
-
- ### Training Procedure
+ 500 GB of raw text in total.
+ A mix of the following data: Wikipedia, Books, Twitter comments, Pikabu, Proza.ru, Film subtitles, News websites, and Social corpus.

- #### Training Hyperparameters
+ ### Training Hyperparameters

  | Argument | Value |
  |--------------------|----------------------|
@@ -59,36 +53,35 @@ News websites, Social corpus.
  | Adam eps | 1e-6 |
  | Num training steps | 500k |

- Model was trained using 8xA100 for ~22 days.
+ The model was trained on a machine with 8xA100 for approximately 22 days.

- #### Architecture details
+ ### Architecture details

- Standard RoBERTa-base parameters:

  | Argument | Value |
  |-------------------------|----------------|
- |Activation function | gelu |
- |Attention dropout | 0.1 |
- |Dropout | 0.1 |
+ |Encoder layers | 12 |
  |Encoder attention heads | 12 |
  |Encoder embed dim | 768 |
  |Encoder ffn embed dim | 3,072 |
- |Encoder layers | 12 |
+ |Activation function | GeLU |
+ |Attention dropout | 0.1 |
+ |Dropout | 0.1 |
  |Max positions | 512 |
  |Vocab size | 50266 |
  |Tokenizer type | Byte-level BPE |

  ## Evaluation

- Russian Super Glue dev set.
-
- Best result across base size models in bold.
+ We evaluated the model on [Russian Super Glue](https://russiansuperglue.com/) dev set.
+ The best result in each task is marked in bold.
+ All models have the same size except the distilled version of DeBERTa.

  | Модель | RCB | PARus | MuSeRC | TERRa | RUSSE | RWSD | DaNetQA | Результат |
  |------------------------------------------------------------------------|-----------|--------|---------|-------|---------|---------|---------|-----------|
- | [vk-roberta-base](https://huggingface.co/deepvk/roberta-base) | 0.46 | 0.56 | 0.679 | 0.769 | 0.960 | 0.569 | 0.658 | 0.665 |
  | [vk-deberta-distill](https://huggingface.co/deepvk/deberta-v1-distill) | 0.433 | 0.56 | 0.625 | 0.59 | 0.943 | 0.569 | 0.726 | 0.635 |
+ | | | | | | | | | |
+ | [vk-roberta-base](https://huggingface.co/deepvk/roberta-base) | 0.46 | 0.56 | 0.679 | 0.769 | 0.960 | 0.569 | 0.658 | 0.665 |
  | [vk-deberta-base](https://huggingface.co/deepvk/deberta-v1-base) | 0.450 |**0.61**|**0.722**| 0.704 | 0.948 | 0.578 |**0.76** |**0.682** |
  | [vk-bert-base](https://huggingface.co/deepvk/bert-base-uncased) | 0.467 | 0.57 | 0.587 | 0.704 | 0.953 |**0.583**| 0.737 | 0.657 |
- | [sber-bert-base](https://huggingface.co/ai-forever/ruBert-base) | **0.491** |**0.61**| 0.663 | 0.769 |**0.962**| 0.574 | 0.678 | 0.678 |
- | [sber-roberta-large](https://huggingface.co/ai-forever/ruRoberta-large)| 0.463 | 0.61 | 0.775 | 0.886 | 0.946 | 0.564 | 0.761 | 0.715 |
+ | [sber-bert-base](https://huggingface.co/ai-forever/ruBert-base) | **0.491** |**0.61**| 0.663 | 0.769 |**0.962**| 0.574 | 0.678 | 0.678 |
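The updated card tags the model as `fill-mask` for the `transformers` library, and the context line of the second hunk (`predictions = model(**inputs)`) shows that the card already carries a usage snippet. That snippet is not visible in this diff, so the sketch below only illustrates the generic `transformers` masked-LM API against the `deepvk/roberta-base` repo id taken from the evaluation table; it is not the card's own example.

```python
# Minimal sketch, not the card's own snippet: load the checkpoint with the
# generic `transformers` masked-LM API. The repo id "deepvk/roberta-base"
# comes from the evaluation table above; the example sentence is arbitrary.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("deepvk/roberta-base")
model = AutoModelForMaskedLM.from_pretrained("deepvk/roberta-base")

# Direct forward pass, matching the hunk-header context `predictions = model(**inputs)`.
inputs = tokenizer("Привет, мир!", return_tensors="pt")
with torch.no_grad():
    predictions = model(**inputs)
print(predictions.logits.shape)  # (batch size, sequence length, vocab size)

# The same checkpoint through the pipeline implied by `pipeline_tag: fill-mask`.
fill = pipeline("fill-mask", model="deepvk/roberta-base")
print(fill(f"Москва - столица {fill.tokenizer.mask_token}."))
```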
 
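The `Architecture details` table in the third hunk lists standard RoBERTa-base dimensions. The sketch below maps those rows onto `RobertaConfig` field names purely for illustration; the mapping is an assumption on my part (in particular `Max positions` to `max_position_embeddings`, since RoBERTa implementations often add an offset for padding positions), not the checkpoint's published configuration.

```python
# Sketch only: building a config with the values from the architecture table.
# Field names are the standard RobertaConfig arguments; whether the released
# checkpoint uses exactly these values (e.g. 512 vs 514 position embeddings)
# is an assumption, not something stated in the diff.
from transformers import RobertaConfig, RobertaForMaskedLM

config = RobertaConfig(
    vocab_size=50266,                  # "Vocab size" row
    hidden_size=768,                   # "Encoder embed dim"
    num_hidden_layers=12,              # "Encoder layers"
    num_attention_heads=12,            # "Encoder attention heads"
    intermediate_size=3072,            # "Encoder ffn embed dim"
    hidden_act="gelu",                 # "Activation function"
    hidden_dropout_prob=0.1,           # "Dropout"
    attention_probs_dropout_prob=0.1,  # "Attention dropout"
    max_position_embeddings=512,       # "Max positions" (see caveat above)
)
model = RobertaForMaskedLM(config)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")
```

The actual values can be checked against `AutoConfig.from_pretrained("deepvk/roberta-base")` rather than trusting this sketch.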