uer committed on
Commit
6230961
1 Parent(s): f8e559b

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -32,7 +32,7 @@ Training data contains 3,000,000 ancient Chinese which are collected by [daizhig
 
  ## Training procedure
 
- The model is pre-trained by [UER-py](https://github.com/dbiir/UER-py/) on [Tencent Cloud TI-ONE](https://cloud.tencent.com/product/tione/). We pre-train 500,000 steps with a sequence length of 320.
+ The model is pre-trained by [UER-py](https://github.com/dbiir/UER-py/) on [Tencent Cloud TI-ONE](https://cloud.tencent.com/product/tione/). We pre-train 500,000 steps with a sequence length of 320. We use an extended vocabulary to handle out-of-vocabulary words. Every Chinese character that occurs at least 100 times in the ancient Chinese corpus is added to the vocabulary.
 
  ```
  python3 preprocess.py --corpus_path corpora/ancient_chinese.txt \
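
The vocabulary extension described in the added line (characters occurring at least 100 times in the corpus are added to the vocabulary) can be illustrated with a minimal Python sketch. This is not the script used by the authors or by UER-py: the names `is_cjk` and `extend_vocab` are hypothetical, and restricting "Chinese character" to the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) is an assumption made for illustration.

```python
from collections import Counter

MIN_COUNT = 100  # frequency threshold stated in the README change


def is_cjk(ch):
    """Return True for characters in the basic CJK Unified Ideographs block.

    Assumption: the README does not say how "Chinese character" is defined;
    this sketch only checks the range U+4E00..U+9FFF.
    """
    return "\u4e00" <= ch <= "\u9fff"


def extend_vocab(base_vocab, lines, min_count=MIN_COUNT):
    """Append frequent corpus characters that are missing from base_vocab.

    base_vocab: list of existing tokens, in order.
    lines: iterable of text lines from the corpus.
    Returns a new list: the base tokens followed by the added characters,
    sorted by code point so the result is deterministic.
    """
    # Count every CJK character across the whole corpus.
    counts = Counter(ch for line in lines for ch in line if is_cjk(ch))
    known = set(base_vocab)
    # Keep characters that meet the threshold and are not already present.
    new_tokens = sorted(
        ch for ch, n in counts.items() if n >= min_count and ch not in known
    )
    return list(base_vocab) + new_tokens
```

For a real corpus, `lines` could be an open file handle on the corpus text file, so the file is streamed rather than loaded into memory. Appending new tokens after the existing ones keeps the original token indices unchanged, which matters if pre-trained embeddings are reused for the base vocabulary.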