uer committed on
Commit b895131
1 parent: 64bd4f9

Update README.md

Files changed (1)
  1. README.md +9 -9
README.md CHANGED
@@ -8,7 +8,7 @@ widget:
---


- # Chinese GPT2-distil Model
+ # Chinese GPT2 Models

## Model description

@@ -112,7 +112,7 @@ deepspeed pretrain.py --deepspeed --deepspeed_config models/deepspeed_config.jso
--dataset_path corpora/cluecorpussmall_lm_seq128_dataset.pt \
--vocab_path models/google_zh_vocab.txt \
--config_path models/gpt2/xlarge_config.json \
- --output_model_path models/cluecorpussmall_gpt2_xlarge_seq128 \
+ --output_model_path models/cluecorpussmall_gpt2_xlarge_seq128_model \
--world_size 8 --batch_size 64 \
--total_steps 1000000 --save_checkpoint_steps 100000 --report_steps 50000 \
--deepspeed_checkpoint_activations --deepspeed_checkpoint_layers_num 24
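
The `models/deepspeed_config.json` file the command points at is not included in this diff. Below is a minimal sketch of what such a config might contain; the ZeRO stage, fp16 setting, and batch values are assumptions, not values taken from this repo:

```
import json

# Hypothetical minimal DeepSpeed config for the stage1 command above.
# The real models/deepspeed_config.json is not shown in this diff.
deepspeed_config = {
    "train_micro_batch_size_per_gpu": 64,  # assumed to match --batch_size 64
    "gradient_accumulation_steps": 1,      # assumption
    "fp16": {"enabled": True},             # assumption
    "zero_optimization": {"stage": 2},     # zero_to_fp32.py below implies ZeRO 2 or 3
}

with open("models/deepspeed_config.json", "w") as f:
    json.dump(deepspeed_config, f, indent=2)
```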
@@ -121,8 +121,8 @@ deepspeed pretrain.py --deepspeed --deepspeed_config models/deepspeed_config.jso
Before stage2, we extract fp32 consolidated weights from the ZeRO stage 2 and 3 DeepSpeed checkpoints:

```
- python3 models/cluecorpussmall_gpt2_xlarge_seq128/zero_to_fp32.py models/cluecorpussmall_gpt2_xlarge_seq128/ \
- models/cluecorpussmall_gpt2_xlarge_seq128.bin
+ python3 models/cluecorpussmall_gpt2_xlarge_seq128_model/zero_to_fp32.py models/cluecorpussmall_gpt2_xlarge_seq128_model/ \
+ models/cluecorpussmall_gpt2_xlarge_seq128_model.bin
```

Stage2:
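
As a quick sanity check (not part of the original README), the consolidated fp32 file should load as a plain PyTorch state dict, which is the usual output of `zero_to_fp32.py`:

```
import torch

# Load the consolidated fp32 checkpoint written by zero_to_fp32.py and
# print a few parameter names/shapes as a sanity check.
state_dict = torch.load(
    "models/cluecorpussmall_gpt2_xlarge_seq128_model.bin", map_location="cpu"
)
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))
```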
@@ -139,8 +139,8 @@ deepspeed pretrain.py --deepspeed --deepspeed_config models/deepspeed_config.jso
--dataset_path corpora/cluecorpussmall_lm_seq1024_dataset.pt \
--vocab_path models/google_zh_vocab.txt \
--config_path models/gpt2/xlarge_config.json \
- --pretrained_model_path models/cluecorpussmall_gpt2_xlarge_seq128.bin \
- --output_model_path models/cluecorpussmall_gpt2_xlarge_seq1024_stage2 \
+ --pretrained_model_path models/cluecorpussmall_gpt2_xlarge_seq128_model.bin \
+ --output_model_path models/cluecorpussmall_gpt2_xlarge_seq1024_model \
--world_size 8 --batch_size 16 --learning_rate 5e-5 \
--total_steps 250000 --save_checkpoint_steps 50000 --report_steps 10000 \
--deepspeed_checkpoint_activations --deepspeed_checkpoint_layers_num 6
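
For a back-of-envelope sense of the stage2 token budget implied by these flags (assuming `--batch_size` is the per-GPU micro-batch and there is no gradient accumulation):

```
# Rough stage2 token budget from the flags above.
world_size, batch_size, seq_len = 8, 16, 1024
tokens_per_step = world_size * batch_size * seq_len   # 131,072
total_tokens = tokens_per_step * 250_000              # --total_steps 250000
print(f"{tokens_per_step:,} tokens/step, ~{total_tokens / 1e9:.1f}B tokens total")
```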
@@ -149,14 +149,14 @@ deepspeed pretrain.py --deepspeed --deepspeed_config models/deepspeed_config.jso
Then, we extract fp32 consolidated weights from the ZeRO stage 2 and 3 DeepSpeed checkpoints:

```
- python3 models/cluecorpussmall_gpt2_xlarge_seq1024_stage2/zero_to_fp32.py models/cluecorpussmall_gpt2_xlarge_seq1024_stage2/ \
- models/cluecorpussmall_gpt2_xlarge_seq1024_stage2.bin
+ python3 models/cluecorpussmall_gpt2_xlarge_seq1024_model/zero_to_fp32.py models/cluecorpussmall_gpt2_xlarge_seq1024_model/ \
+ models/cluecorpussmall_gpt2_xlarge_seq1024_model.bin
```

Finally, we convert the pre-trained model into Hugging Face's format:

```
- python3 scripts/convert_gpt2_from_tencentpretrain_to_huggingface.py --input_model_path models/cluecorpussmall_gpt2_xlarge_seq1024_stage2.bin \
+ python3 scripts/convert_gpt2_from_tencentpretrain_to_huggingface.py --input_model_path models/cluecorpussmall_gpt2_xlarge_seq1024_model.bin \
--output_model_path pytorch_model.bin \
--layers_num 48
```
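
Once converted, the weights should load with Hugging Face Transformers. A minimal usage sketch follows; the local directory name is an assumption, and it must also contain the matching config.json and vocab file. UER's Chinese GPT-2 models pair GPT2LMHeadModel with BertTokenizer:

```
from transformers import BertTokenizer, GPT2LMHeadModel, TextGenerationPipeline

# Illustrative path: a local directory holding the converted
# pytorch_model.bin plus config.json and vocab.txt.
tokenizer = BertTokenizer.from_pretrained("./converted_model")
model = GPT2LMHeadModel.from_pretrained("./converted_model")
text_generator = TextGenerationPipeline(model, tokenizer)
print(text_generator("这是很久之前的事情了", max_length=100, do_sample=True))
```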
 