zemerov committed on
Commit f91371a
1 Parent(s): eb1bea4

Update README.md

Files changed (1)
  1. README.md +33 -39
README.md CHANGED
@@ -27,7 +27,7 @@ Model was pretrained using standard MLM objective on a large text corpora includ

## How to Get Started with the Model

- ```
+ ```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("deepvk/roberta-base")
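The hunk above cuts the usage snippet off after the tokenizer line; the context line of the next hunk shows that it ends with `predictions = model(**inputs)`. A minimal self-contained sketch of that usage, assuming the encoder is loaded with `AutoModel` and a single sentence is tokenized (the sample text and intermediate variable names are illustrative, not taken from the card):

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Load the tokenizer and encoder published as deepvk/roberta-base
tokenizer = AutoTokenizer.from_pretrained("deepvk/roberta-base")
model = AutoModel.from_pretrained("deepvk/roberta-base")

# Illustrative input sentence (any Russian text works)
text = "Привет, мир!"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    predictions = model(**inputs)

# predictions.last_hidden_state holds per-token embeddings of shape
# (batch_size, sequence_length, 768)
print(predictions.last_hidden_state.shape)
```

`AutoModel` returns hidden states only; for masked-token prediction one would load the checkpoint with `AutoModelForMaskedLM` instead.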
@@ -43,31 +43,21 @@ predictions = model(**inputs)

### Training Data

- <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- Mix of the following data:
- * Wikipedia
- * Books
- * Twitter comments
- * Pikabu
- * Proza.ru
- * Film subtitles
- * News websites
- * Social corpus
-
- ~500gb of raw texts
+ 500 GB of raw texts in total. A mix of the following data: Wikipedia, Books, Twitter comments, Pikabu, Proza.ru, Film subtitles,
+ News websites, Social corpus.

### Training Procedure

#### Training Hyperparameters

- - **Training regime:** fp16 mixed precision
- - **Training framework:** Fairseq
- - **Optimizer:** Adam
- - **Adam betas:** 0.9,0.98
- - **Adam eps:** 1e-6
- - **Num training steps:** 500k
- - **Train batch size:** 4096
+ | Argument           | Value                |
+ |--------------------|----------------------|
+ | Training regime    | fp16 mixed precision |
+ | Training framework | Fairseq              |
+ | Optimizer          | Adam                 |
+ | Adam betas         | 0.9, 0.98            |
+ | Adam eps           | 1e-6                 |
+ | Num training steps | 500k                 |

Model was trained using 8xA100 for ~22 days.
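For readers reproducing this setup outside of Fairseq, the optimizer rows above translate directly to `torch.optim.Adam`; a minimal sketch under that assumption. The learning rate, schedule, and warmup are not given in the card, so the value below is a placeholder, a small linear layer stands in for the encoder, and a CUDA device is assumed for the fp16 part:

```python
import torch

# Stand-in module; the real model is the RoBERTa-base encoder described in the next hunk
model = torch.nn.Linear(768, 768).cuda()

# Adam with the betas and eps from the table; lr is a placeholder, not from the card
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.98), eps=1e-6)

# fp16 mixed precision, as in the "Training regime" row: autocast + gradient scaling
scaler = torch.cuda.amp.GradScaler()
x = torch.randn(8, 768, device="cuda")
with torch.cuda.amp.autocast():
    loss = model(x).pow(2).mean()  # dummy loss, for illustration only
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```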
 
@@ -75,25 +65,29 @@ Model was trained using 8xA100 for ~22 days.

Standard RoBERTa-base parameters:

- - **Activation function:** gelu
- - **Attention dropout:** 0.1
- - **Dropout:** 0.1
- - **Encoder attention heads:** 12
- - **Encoder embed dim:** 768
- - **Encoder ffn embed dim:** 3,072
- - **Encoder layers:** 12
- - **Max positions:** 512
- - **Vocab size:** 50266
+ | Argument                | Value |
+ |-------------------------|-------|
+ | Activation function     | gelu  |
+ | Attention dropout       | 0.1   |
+ | Dropout                 | 0.1   |
+ | Encoder attention heads | 12    |
+ | Encoder embed dim       | 768   |
+ | Encoder ffn embed dim   | 3,072 |
+ | Encoder layers          | 12    |
+ | Max positions           | 512   |
+ | Vocab size              | 50266 |

## Evaluation

- Results on the Russian Super Glue dev set
+ Results on the Russian Super Glue dev set.
+
+ The best result across base-size models is shown in bold.

- | Model               | RCB   | PARus | MuSeRC | TERRa | RUSSE | RWSD  | DaNetQA | Result |
- |---------------------|-------|-------|--------|-------|-------|-------|---------|--------|
- | vk-roberta-base     | 0.46  | 0.56  | 0.679  | 0.769 | 0.960 | 0.569 | 0.658   | 0.665  |
- | vk-deberta-distill  | 0.433 | 0.56  | 0.625  | 0.59  | 0.943 | 0.569 | 0.726   | 0.635  |
- | vk-deberta-base     | 0.450 | 0.61  | 0.722  | 0.704 | 0.948 | 0.578 | 0.76    | 0.682  |
- | vk-bert-base        | 0.467 | 0.57  | 0.587  | 0.704 | 0.953 | 0.583 | 0.737   | 0.657  |
- | sber-roberta-large  | 0.463 | 0.61  | 0.775  | 0.886 | 0.946 | 0.564 | 0.761   | 0.715  |
- | sber-bert-base      | 0.491 | 0.61  | 0.663  | 0.769 | 0.962 | 0.574 | 0.678   | 0.678  |
+ | Model                                                                    | RCB       | PARus    | MuSeRC    | TERRa | RUSSE     | RWSD      | DaNetQA  | Result    |
+ |--------------------------------------------------------------------------|-----------|----------|-----------|-------|-----------|-----------|----------|-----------|
+ | [vk-roberta-base](https://huggingface.co/deepvk/roberta-base)           | 0.46      | 0.56     | 0.679     | 0.769 | 0.960     | 0.569     | 0.658    | 0.665     |
+ | [vk-deberta-distill](https://huggingface.co/deepvk/deberta-v1-distill)  | 0.433     | 0.56     | 0.625     | 0.59  | 0.943     | 0.569     | 0.726    | 0.635     |
+ | [vk-deberta-base](https://huggingface.co/deepvk/deberta-v1-base)        | 0.450     | **0.61** | **0.722** | 0.704 | 0.948     | 0.578     | **0.76** | **0.682** |
+ | [vk-bert-base](https://huggingface.co/deepvk/bert-base-uncased)         | 0.467     | 0.57     | 0.587     | 0.704 | 0.953     | **0.583** | 0.737    | 0.657     |
+ | [sber-bert-base](https://huggingface.co/ai-forever/ruBert-base)         | **0.491** | **0.61** | 0.663     | 0.769 | **0.962** | 0.574     | 0.678    | 0.678     |
+ | [sber-roberta-large](https://huggingface.co/ai-forever/ruRoberta-large) | 0.463     | 0.61     | 0.775     | 0.886 | 0.946     | 0.564     | 0.761    | 0.715     |
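As a cross-check on the architecture table, the same parameters can be written as a `transformers.RobertaConfig`. This is a sketch: the values come from the table, while `max_position_embeddings=514` follows the common Hugging Face RoBERTa convention of max positions plus 2 for the padding offset, which is an assumption here; the checkpoint's own `config.json` is authoritative.

```python
from transformers import RobertaConfig

# Architecture table expressed as a Hugging Face config (sketch, not the shipped config.json)
config = RobertaConfig(
    vocab_size=50266,
    hidden_size=768,                   # Encoder embed dim
    num_hidden_layers=12,              # Encoder layers
    num_attention_heads=12,            # Encoder attention heads
    intermediate_size=3072,            # Encoder ffn embed dim
    hidden_act="gelu",                 # Activation function
    hidden_dropout_prob=0.1,           # Dropout
    attention_probs_dropout_prob=0.1,  # Attention dropout
    max_position_embeddings=514,       # Max positions (512) + 2, assumed convention
)
print(config)
```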