uer committed on
Commit
87d3bef
1 Parent(s): d441ccd

Update README.md

Files changed (1)
  1. README.md +35 -34
README.md CHANGED
@@ -30,15 +30,16 @@ Here are scores on the development set of six Chinese tasks:
 
  | Model | Score | douban | chnsenticorp | lcqmc | tnews(CLUE) | iflytek(CLUE) | ocnli(CLUE) |
  | -------------- | :---: | :----: | :----------: | :---: | :---------: | :-----------: | :---------: |
- | RoBERTa-Tiny | 72.3 | 83.0 | 91.4 | 81.8 | 62.0 | 55.0 | 60.3 |
- | -------------- | :---: | :----: | :----------: | :---: | :---------: | :-----------: | :---------: |
- | RoBERTa-Mini | 75.7 | 84.8 | 93.7 | 86.1 | 63.9 | 58.3 | 67.4 |
- | -------------- | :---: | :----: | :----------: | :---: | :---------: | :-----------: | :---------: |
- | RoBERTa-Small | 76.8 | 86.5 | 93.4 | 86.5 | 65.1 | 59.4 | 69.7 |
- | -------------- | :---: | :----: | :----------: | :---: | :---------: | :-----------: | :---------: |
- | RoBERTa-Medium | 77.8 | 87.6 | 94.8 | 88.1 | 65.6 | 59.5 | 71.2 |
- | -------------- | :---: | :----: | :----------: | :---: | :---------: | :-----------: | :---------: |
- | RoBERTa-Base | 79.5 | 89.1 | 95.2 | 89.2 | 67.0 | 60.9 | 75.5 |
+ | RoBERTa-Tiny (char) | 72.3 | 83.0 | 91.4 | 81.8 | 62.0 | 55.0 | 60.3 |
+ | RoBERTa-Tiny (word) | 72.3 | 83.0 | 91.4 | 81.8 | 62.0 | 55.0 | 60.3 |
+ | RoBERTa-Mini (char) | 75.7 | 84.8 | 93.7 | 86.1 | 63.9 | 58.3 | 67.4 |
+ | RoBERTa-Mini (word) | 75.7 | 84.8 | 93.7 | 86.1 | 63.9 | 58.3 | 67.4 |
+ | RoBERTa-Small (char) | 76.8 | 86.5 | 93.4 | 86.5 | 65.1 | 59.4 | 69.7 |
+ | RoBERTa-Small (word) | 76.8 | 86.5 | 93.4 | 86.5 | 65.1 | 59.4 | 69.7 |
+ | RoBERTa-Medium (char) | 77.8 | 87.6 | 94.8 | 88.1 | 65.6 | 59.5 | 71.2 |
+ | RoBERTa-Medium (word) | 77.8 | 87.6 | 94.8 | 88.1 | 65.6 | 59.5 | 71.2 |
+ | RoBERTa-Base (char) | 79.5 | 89.1 | 95.2 | 89.2 | 67.0 | 60.9 | 75.5 |
+ | RoBERTa-Base (word) | 79.5 | 89.1 | 95.2 | 89.2 | 67.0 | 60.9 | 75.5 |
 
  For each task, we selected the best fine-tuning hyperparameters from the lists below and trained with a sequence length of 128:
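The hyperparameter lists referenced in the sentence above sit further down the README and are not part of this diff. Purely as a sketch of the selection procedure, a per-task grid search over a development set could be driven along the lines below; the candidate values and the `train_and_eval` helper are hypothetical placeholders, not the lists or tooling actually used.

```python
# Hypothetical sketch of per-task hyperparameter selection on a dev set.
# The candidate values and train_and_eval() are illustrative placeholders only.
from itertools import product

candidate_learning_rates = [3e-5, 1e-4, 3e-4]   # placeholder values
candidate_batch_sizes = [32, 64]                # placeholder values
candidate_epochs = [3, 5, 8]                    # placeholder values

def train_and_eval(lr: float, batch_size: int, epochs: int) -> float:
    """Placeholder: fine-tune on the task's training split and return dev accuracy."""
    return 0.0  # replace with a real fine-tuning + evaluation run

best_score, best_config = float("-inf"), None
for lr, bs, ep in product(candidate_learning_rates, candidate_batch_sizes, candidate_epochs):
    score = train_and_eval(lr, bs, ep)
    if score > best_score:
        best_score, best_config = score, (lr, bs, ep)

print("best (learning_rate, batch_size, epochs):", best_config)
```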
 
@@ -137,51 +138,51 @@ Taking the case of word-based RoBERTa-Medium
 Stage1:
 
 ```
- python3 preprocess.py --corpus_path corpora/cluecorpussmall.txt \\
- --spm_model_path models/cluecorpussmall_spm.model \\
- --dataset_path cluecorpussmall_word_seq128_dataset.pt \\
- --processes_num 32 --seq_length 128 \\
+ python3 preprocess.py --corpus_path corpora/cluecorpussmall.txt \\\\
+ --spm_model_path models/cluecorpussmall_spm.model \\\\
+ --dataset_path cluecorpussmall_word_seq128_dataset.pt \\\\
+ --processes_num 32 --seq_length 128 \\\\
  --dynamic_masking --target mlm
 ```
 
 ```
- python3 pretrain.py --dataset_path cluecorpussmall_word_seq128_dataset.pt \\
- --spm_model_path models/cluecorpussmall_spm.model \\
- --config_path models/bert/medium_config.json \\
- --output_model_path models/cluecorpussmall_word_roberta_medium_seq128_model.bin \\
- --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \\
- --total_steps 1000000 --save_checkpoint_steps 100000 --report_steps 50000 \\
- --learning_rate 1e-4 --batch_size 64 \\
+ python3 pretrain.py --dataset_path cluecorpussmall_word_seq128_dataset.pt \\\\
+ --spm_model_path models/cluecorpussmall_spm.model \\\\
+ --config_path models/bert/medium_config.json \\\\
+ --output_model_path models/cluecorpussmall_word_roberta_medium_seq128_model.bin \\\\
+ --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \\\\
+ --total_steps 1000000 --save_checkpoint_steps 100000 --report_steps 50000 \\\\
+ --learning_rate 1e-4 --batch_size 64 \\\\
  --embedding word_pos_seg --encoder transformer --mask fully_visible --target mlm --tie_weights
 ```
 
 Stage2:
 
 ```
- python3 preprocess.py --corpus_path corpora/cluecorpussmall.txt \\
- --spm_model_path models/cluecorpussmall_spm.model \\
- --dataset_path cluecorpussmall_word_seq512_dataset.pt \\
- --processes_num 32 --seq_length 512 \\
+ python3 preprocess.py --corpus_path corpora/cluecorpussmall.txt \\\\
+ --spm_model_path models/cluecorpussmall_spm.model \\\\
+ --dataset_path cluecorpussmall_word_seq512_dataset.pt \\\\
+ --processes_num 32 --seq_length 512 \\\\
  --dynamic_masking --target mlm
 ```
 
 ```
- python3 pretrain.py --dataset_path cluecorpussmall_word_seq512_dataset.pt \\
- --pretrained_model_path models/cluecorpussmall_word_roberta_medium_seq128_model.bin-1000000 \\
- --spm_model_path models/cluecorpussmall_spm.model \\
- --config_path models/bert/medium_config.json \\
- --output_model_path models/cluecorpussmall_word_roberta_medium_seq512_model.bin \\
- --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \\
- --total_steps 250000 --save_checkpoint_steps 50000 --report_steps 10000 \\
- --learning_rate 5e-5 --batch_size 16 \\
+ python3 pretrain.py --dataset_path cluecorpussmall_word_seq512_dataset.pt \\\\
+ --pretrained_model_path models/cluecorpussmall_word_roberta_medium_seq128_model.bin-1000000 \\\\
+ --spm_model_path models/cluecorpussmall_spm.model \\\\
+ --config_path models/bert/medium_config.json \\\\
+ --output_model_path models/cluecorpussmall_word_roberta_medium_seq512_model.bin \\\\
+ --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \\\\
+ --total_steps 250000 --save_checkpoint_steps 50000 --report_steps 10000 \\\\
+ --learning_rate 5e-5 --batch_size 16 \\\\
  --embedding word_pos_seg --encoder transformer --mask fully_visible --target mlm --tie_weights
 ```
 
 Finally, we convert the pre-trained model into Huggingface's format:
 
 ```
- python3 scripts/convert_bert_from_uer_to_huggingface.py --input_model_path models/cluecorpussmall_word_roberta_medium_seq128_model.bin-250000 \\
- --output_model_path pytorch_model.bin \\
+ python3 scripts/convert_bert_from_uer_to_huggingface.py --input_model_path models/cluecorpussmall_word_roberta_medium_seq128_model.bin-250000 \\\\
+ --output_model_path pytorch_model.bin \\\\
 --layers_num 12 --target mlm
 ```
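Once the conversion above has produced `pytorch_model.bin`, the checkpoint can be used from the `transformers` library. A minimal sketch, not part of this commit: it assumes the converted weights sit next to a matching `config.json` in a local directory (here `./converted`, a placeholder name) and that the SentencePiece model built for pretraining can be wrapped with `AlbertTokenizer`; both are assumptions rather than something stated in this diff.

```python
# Minimal sketch (not part of this commit): use the converted checkpoint for fill-mask.
# Assumptions: ./converted/ contains pytorch_model.bin plus a matching config.json, and the
# SentencePiece model produced for pretraining is usable through AlbertTokenizer.
from transformers import AlbertTokenizer, BertForMaskedLM, pipeline

tokenizer = AlbertTokenizer(vocab_file="models/cluecorpussmall_spm.model")  # spm vocab from pretraining
model = BertForMaskedLM.from_pretrained("./converted")                      # directory with converted weights

unmasker = pipeline("fill-mask", model=model, tokenizer=tokenizer)
print(unmasker(f"北京是中国的{tokenizer.mask_token}。"))  # predict the masked word
```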
 
 