uer committed on
Commit
d441ccd
1 Parent(s): 2a2ad24

Update README.md

Files changed (1)
  1. README.md +45 -25
README.md CHANGED
@@ -26,6 +26,26 @@ You can download the 5 Chinese RoBERTa miniatures either from the [UER-py Github
  | **word-based RoBERTa-Medium** | [**L=8/H=512 (Medium)**][8_512] |
  | **word-based RoBERTa-Base** | [**L=12/H=768 (Base)**][12_768] |

+ Here are the scores on the development sets of six Chinese tasks:
+
+ | Model          | Score | douban | chnsenticorp | lcqmc | tnews(CLUE) | iflytek(CLUE) | ocnli(CLUE) |
+ | -------------- | :---: | :----: | :----------: | :---: | :---------: | :-----------: | :---------: |
+ | RoBERTa-Tiny   | 72.3  | 83.0   | 91.4         | 81.8  | 62.0        | 55.0          | 60.3        |
+ | RoBERTa-Mini   | 75.7  | 84.8   | 93.7         | 86.1  | 63.9        | 58.3          | 67.4        |
+ | RoBERTa-Small  | 76.8  | 86.5   | 93.4         | 86.5  | 65.1        | 59.4          | 69.7        |
+ | RoBERTa-Medium | 77.8  | 87.6   | 94.8         | 88.1  | 65.6        | 59.5          | 71.2        |
+ | RoBERTa-Base   | 79.5  | 89.1   | 95.2         | 89.2  | 67.0        | 60.9          | 75.5        |
+
+ For each task, we selected the best fine-tuning hyperparameters from the lists below and fine-tuned with a sequence length of 128:
+
+ - epochs: 3, 5, 8
+ - batch sizes: 32, 64
+ - learning rates: 3e-5, 1e-4, 3e-4
+
  ## How to use

  You can use this model directly with a pipeline for masked language modeling (take the case of word-based RoBERTa-Medium):
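As a rough illustration of the pipeline usage referred to above, here is a minimal sketch with Hugging Face Transformers; the Hub repository id and the example sentence are assumptions for illustration, not part of this commit:

```python
# Minimal fill-mask sketch. The model id below is an assumed Hub repository name
# for the word-based RoBERTa-Medium checkpoint; replace it with the actual id.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="uer/roberta-medium-word-chinese-cluecorpussmall")
print(unmasker("[MASK]的首都是北京。"))  # prints the top candidates for the masked word
```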
@@ -117,51 +137,51 @@ Taking the case of word-based RoBERTa-Medium
  Stage1:

  ```
- python3 preprocess.py --corpus_path corpora/cluecorpussmall.txt \
- --spm_model_path models/cluecorpussmall_spm.model \
- --dataset_path cluecorpussmall_word_seq128_dataset.pt \
- --processes_num 32 --seq_length 128 \
+ python3 preprocess.py --corpus_path corpora/cluecorpussmall.txt \\
+ --spm_model_path models/cluecorpussmall_spm.model \\
+ --dataset_path cluecorpussmall_word_seq128_dataset.pt \\
+ --processes_num 32 --seq_length 128 \\
  --dynamic_masking --target mlm
  ```

  ```
- python3 pretrain.py --dataset_path cluecorpussmall_word_seq128_dataset.pt \
- --spm_model_path models/cluecorpussmall_spm.model \
- --config_path models/bert/medium_config.json \
- --output_model_path models/cluecorpussmall_word_roberta_medium_seq128_model.bin \
- --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \
- --total_steps 1000000 --save_checkpoint_steps 100000 --report_steps 50000 \
- --learning_rate 1e-4 --batch_size 64 \
+ python3 pretrain.py --dataset_path cluecorpussmall_word_seq128_dataset.pt \\
+ --spm_model_path models/cluecorpussmall_spm.model \\
+ --config_path models/bert/medium_config.json \\
+ --output_model_path models/cluecorpussmall_word_roberta_medium_seq128_model.bin \\
+ --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \\
+ --total_steps 1000000 --save_checkpoint_steps 100000 --report_steps 50000 \\
+ --learning_rate 1e-4 --batch_size 64 \\
  --embedding word_pos_seg --encoder transformer --mask fully_visible --target mlm --tie_weights
  ```

  Stage2:

  ```
- python3 preprocess.py --corpus_path corpora/cluecorpussmall.txt \
- --spm_model_path models/cluecorpussmall_spm.model \
- --dataset_path cluecorpussmall_word_seq512_dataset.pt \
- --processes_num 32 --seq_length 512 \
+ python3 preprocess.py --corpus_path corpora/cluecorpussmall.txt \\
+ --spm_model_path models/cluecorpussmall_spm.model \\
+ --dataset_path cluecorpussmall_word_seq512_dataset.pt \\
+ --processes_num 32 --seq_length 512 \\
  --dynamic_masking --target mlm
  ```

  ```
- python3 pretrain.py --dataset_path cluecorpussmall_word_seq512_dataset.pt \
- --pretrained_model_path models/cluecorpussmall_word_roberta_medium_seq128_model.bin-1000000 \
- --spm_model_path models/cluecorpussmall_spm.model \
- --config_path models/bert/medium_config.json \
- --output_model_path models/cluecorpussmall_word_roberta_medium_seq512_model.bin \
- --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \
- --total_steps 250000 --save_checkpoint_steps 50000 --report_steps 10000 \
- --learning_rate 5e-5 --batch_size 16 \
+ python3 pretrain.py --dataset_path cluecorpussmall_word_seq512_dataset.pt \\
+ --pretrained_model_path models/cluecorpussmall_word_roberta_medium_seq128_model.bin-1000000 \\
+ --spm_model_path models/cluecorpussmall_spm.model \\
+ --config_path models/bert/medium_config.json \\
+ --output_model_path models/cluecorpussmall_word_roberta_medium_seq512_model.bin \\
+ --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \\
+ --total_steps 250000 --save_checkpoint_steps 50000 --report_steps 10000 \\
+ --learning_rate 5e-5 --batch_size 16 \\
  --embedding word_pos_seg --encoder transformer --mask fully_visible --target mlm --tie_weights
  ```

  Finally, we convert the pre-trained model into Huggingface's format:

  ```
- python3 scripts/convert_bert_from_uer_to_huggingface.py --input_model_path models/cluecorpussmall_word_roberta_medium_seq512_model.bin-250000 \
- --output_model_path pytorch_model.bin \
+ python3 scripts/convert_bert_from_uer_to_huggingface.py --input_model_path models/cluecorpussmall_word_roberta_medium_seq512_model.bin-250000 \\
+ --output_model_path pytorch_model.bin \\
  --layers_num 8 --target mlm
  ```
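Once the conversion above has produced `pytorch_model.bin`, a quick sanity check is to load the exported weights with Hugging Face Transformers. This is a minimal sketch under the assumption that the weight file has been placed in a local directory (here `./converted_model`) together with a matching `config.json`; the convert command above only writes the weight file:

```python
# Minimal sanity check for the converted checkpoint.
# "./converted_model" is an assumed local directory holding pytorch_model.bin
# plus a matching BERT config.json (not produced by the convert script above).
from transformers import BertForMaskedLM

model = BertForMaskedLM.from_pretrained("./converted_model")
print(model.config.num_hidden_layers)  # expected to be 8 for the Medium (L=8/H=512) configuration
```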