uer committed
Commit
d794daf
1 Parent(s): fa99536

Update README.md

Files changed (1)
  1. README.md +8 -10
README.md CHANGED
@@ -6,21 +6,19 @@ widget:
 
 ---
 
-# Chinese GPT2 Language Model
+# Chinese Poem GPT2 Model
 
 ## Model description
 
-This model is used to generate Chinese ancient poems and is pre-trained by [UER-py](https://www.aclweb.org/anthology/D19-3041.pdf).
+The model is used to generate Chinese ancient poems. You can download the model either from the [GPT2-Chinese Github page](https://github.com/Morizeyao/GPT2-Chinese) or via HuggingFace from the link [gpt2-chinese-poem][poem].
 
-You can download this model via HuggingFace from the link :[gpt2-chinese-poem][poem]
+Since the parameter skip_special_tokens is used in pipelines.py, special tokens such as [SEP] and [UNK] will be deleted, and the output results may not be neat.
 
 ## How to use
 
-Because the parameter ***skip_special_tokens*** is used in the ***pipelines.py*** , special tokens such as [SEP], [UNK] will be deleted, and the output results may not be neat.
+You can use the model directly with a pipeline for text generation:
 
-You can use this model directly with a pipeline for text generation:
-
-When the parameter ***skip_special_tokens*** is True:
+When the parameter skip_special_tokens is True:
 
 ```python
 >>> from transformers import BertTokenizer, GPT2LMHeadModel, TextGenerationPipeline
@@ -32,7 +30,7 @@ When the parameter ***skip_special_tokens*** is True:
 [{'generated_text': '[CLS]梅 山 如 积 翠 , 的 手 堪 捧 。 遥 遥 仙 人 尉 , 盘 盘 故 时 陇 。 丹 泉 清 可 鉴 , 石 乳 甘 于 。 行 将 解 尘 缨 , 于 焉 蹈 高 踵 。 我'}]
 ```
 
-When the parameter ***skip_special_tokens*** is Flase:
+When the parameter skip_special_tokens is False:
 
 ```python
 >>> from transformers import BertTokenizer, GPT2LMHeadModel, TextGenerationPipeline
@@ -46,11 +44,11 @@ When the parameter ***skip_special_tokens*** is Flase:
 
 ## Training data
 
-Contains about 800,000 chinese ancient poems.
+Contains 800,000 Chinese ancient poems collected by the [chinese-poetry](https://github.com/chinese-poetry/chinese-poetry) and [Poetry](https://github.com/Werneror/Poetry) projects.
 
 ## Training procedure
 
-Models are pre-trained by [UER-py](https://github.com/dbiir/UER-py/) on [Tencent Cloud TI-ONE](https://cloud.tencent.com/product/tione/). We pre-train 200,000 steps with a sequence length of 128.
+The model is pre-trained by [UER-py](https://github.com/dbiir/UER-py/) on [Tencent Cloud TI-ONE](https://cloud.tencent.com/product/tione/). We pre-train for 200,000 steps with a sequence length of 128.
 
 ```
 python3 preprocess.py --corpus_path corpora/poem.txt \
 
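Note: the hunks above cut the README's Python snippets off at the diff context boundary, so the middle of each example is not shown. A minimal self-contained sketch of the pipeline usage the card describes might look as follows, assuming the uer/gpt2-chinese-poem checkpoint behind the [poem] link; the prompt and sampling settings are illustrative, not necessarily the card's exact values:

```python
# Sketch of the text-generation pipeline usage described in the README.
# Assumes the uer/gpt2-chinese-poem checkpoint; prompt and max_length are
# illustrative values, not necessarily those used in the model card.
from transformers import BertTokenizer, GPT2LMHeadModel, TextGenerationPipeline

tokenizer = BertTokenizer.from_pretrained("uer/gpt2-chinese-poem")
model = GPT2LMHeadModel.from_pretrained("uer/gpt2-chinese-poem")
text_generator = TextGenerationPipeline(model, tokenizer)

# The pipeline decodes with skip_special_tokens=True, so tokens such as
# [SEP] and [UNK] are dropped from the generated text.
print(text_generator("[CLS]梅 山 如 积 翠 ，", max_length=50, do_sample=True))
```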
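For the skip_special_tokens=False output, the card modifies pipelines.py. An alternative that uses only the standard transformers API (not the card's method) is to call generate and decode manually:

```python
# Alternative to editing pipelines.py: generate and decode manually so the
# special tokens ([CLS], [SEP], [UNK], ...) stay visible in the output.
# Checkpoint name and prompt are the same illustrative values as above.
from transformers import BertTokenizer, GPT2LMHeadModel

tokenizer = BertTokenizer.from_pretrained("uer/gpt2-chinese-poem")
model = GPT2LMHeadModel.from_pretrained("uer/gpt2-chinese-poem")

# add_special_tokens=False keeps the literal [CLS] in the prompt from being
# wrapped in another [CLS] ... [SEP] pair by the BERT tokenizer.
input_ids = tokenizer("[CLS]梅 山 如 积 翠 ，", return_tensors="pt",
                      add_special_tokens=False).input_ids
output_ids = model.generate(input_ids, max_length=50, do_sample=True)

# skip_special_tokens=False preserves [SEP]/[UNK] in the decoded text.
print(tokenizer.decode(output_ids[0], skip_special_tokens=False))
```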