zemerov committed
Commit eb1bea4
1 Parent(s): e457630

Update README.md

Files changed (1):
  1. README.md +38 -41
README.md CHANGED
@@ -39,7 +39,6 @@ inputs = tokenizer(text, return_tensors='pt')
  predictions = model(**inputs)
  ```

-
  ## Training Details

  ### Training Data
@@ -47,56 +46,54 @@ predictions = model(**inputs)
  <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

  Mix of the following data:
+ * Wikipedia
+ * Books
+ * Twitter comments
+ * Pikabu
+ * Proza.ru
+ * Film subtitles
+ * News websites
+ * Social corpus

+ ~500 GB of raw text

  ### Training Procedure

- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]
-
-
  #### Training Hyperparameters

- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+ - **Training regime:** fp16 mixed precision
+ - **Training framework:** Fairseq
+ - **Optimizer:** Adam
+ - **Adam betas:** 0.9, 0.98
+ - **Adam eps:** 1e-6
+ - **Num training steps:** 500k
+ - **Train batch size:** 4096
-
- #### Speeds, Sizes, Times [optional]
-
- Standard RoBERTA-base size;
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->

- ### Testing Data, Factors & Metrics
+ The model was trained on 8xA100 GPUs for ~22 days.

- #### Testing Data
+ #### Architecture details

- <!-- This should link to a Data Card if possible. -->
+ Standard RoBERTa-base parameters:

- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
+ - **Activation function:** gelu
+ - **Attention dropout:** 0.1
+ - **Dropout:** 0.1
+ - **Encoder attention heads:** 12
+ - **Encoder embed dim:** 768
+ - **Encoder ffn embed dim:** 3072
+ - **Encoder layers:** 12
+ - **Max positions:** 512
+ - **Vocab size:** 50266

+ ## Evaluation

- ## Compute Infrastructure
+ Results on the Russian SuperGLUE dev set:

- Model was trained using 8xA100 for ~22 days.
+ | Model              | RCB   | PARus | MuSeRC | TERRa | RUSSE | RWSD  | DaNetQA | Result |
+ |--------------------|-------|-------|--------|-------|-------|-------|---------|--------|
+ | vk-roberta-base    | 0.46  | 0.56  | 0.679  | 0.769 | 0.960 | 0.569 | 0.658   | 0.665  |
+ | vk-deberta-distill | 0.433 | 0.56  | 0.625  | 0.59  | 0.943 | 0.569 | 0.726   | 0.635  |
+ | vk-deberta-base    | 0.450 | 0.61  | 0.722  | 0.704 | 0.948 | 0.578 | 0.76    | 0.682  |
+ | vk-bert-base       | 0.467 | 0.57  | 0.587  | 0.704 | 0.953 | 0.583 | 0.737   | 0.657  |
+ | sber-roberta-large | 0.463 | 0.61  | 0.775  | 0.886 | 0.946 | 0.564 | 0.761   | 0.715  |
+ | sber-bert-base     | 0.491 | 0.61  | 0.663  | 0.769 | 0.962 | 0.574 | 0.678   | 0.678  |
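
As a rough aid to reproduction, the optimizer settings added above translate into PyTorch as in the sketch below. The learning rate and its schedule are not stated in the card, so the `lr` value is a placeholder, and the `Linear` module merely stands in for the actual Fairseq-trained RoBERTa encoder.

```python
import torch

# Stand-in module; the card's real model is a RoBERTa-base encoder
# trained with Fairseq, not this toy layer.
model = torch.nn.Linear(768, 768)

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-4,            # placeholder: the card does not state a learning rate
    betas=(0.9, 0.98),  # Adam betas from the card
    eps=1e-6,           # Adam eps from the card
)
```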
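The architecture list matches a standard RoBERTa-base configuration. A sketch of the equivalent Hugging Face `RobertaConfig` follows; the field names are Hugging Face's rather than Fairseq's, and `max_position_embeddings=514` assumes the usual conversion convention in which HF RoBERTa reserves two extra positions for its padding offset.

```python
from transformers import RobertaConfig, RobertaModel

config = RobertaConfig(
    vocab_size=50266,
    hidden_size=768,                   # encoder embed dim
    num_hidden_layers=12,              # encoder layers
    num_attention_heads=12,            # encoder attention heads
    intermediate_size=3072,            # encoder ffn embed dim
    hidden_act="gelu",                 # activation function
    hidden_dropout_prob=0.1,           # dropout
    attention_probs_dropout_prob=0.1,  # attention dropout
    max_position_embeddings=514,       # 512 usable positions + 2 padding offset
)
model = RobertaModel(config)  # randomly initialized; illustration only
```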
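For reference, the Result column in the table appears to be the unweighted mean of the seven task scores; for vk-roberta-base, (0.46 + 0.56 + 0.679 + 0.769 + 0.960 + 0.569 + 0.658) / 7 ≈ 0.665. A quick check:

```python
# vk-roberta-base per-task scores from the table above
scores = [0.46, 0.56, 0.679, 0.769, 0.960, 0.569, 0.658]
print(round(sum(scores) / len(scores), 3))  # 0.665, matching the Result column
```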