dongxiaoqun committed
Commit e0b383a
1 Parent(s): 6d0226a

Update README.md

Files changed (1):
1. README.md +4 -4
README.md CHANGED
@@ -7,14 +7,14 @@ inference: False
  ---
 
 
- Randeng_egasus_238M_summary model (Chinese),codes has merged into [Fengshenbang-LM](https://github.com/IDEA-CCNL/Fengshenbang-LM)
+ The IDEA-CCNL/Randeng-Pegasus-238M-Chinese model (Chinese); its code has been merged into [Fengshenbang-LM](https://github.com/IDEA-CCNL/Fengshenbang-LM).
 
  The 523M-parameter randeng_pegasus_large model was trained with sampled gap-sentence ratios on 180G of Chinese data, stochastically sampling important sentences. The pretraining task is the same as described in the paper [PEGASUS: Pre-training with Extracted Gap-sentences for
  Abstractive Summarization](https://arxiv.org/pdf/1912.08777.pdf).
 
  Unlike the English version of PEGASUS, and because SentencePiece is unstable on Chinese text, we use jieba together with BertTokenizer as the tokenizer in the Chinese PEGASUS model.
 
- We also pretained a large model , available with [Randeng_Pegasus_523M_Chinese](https://huggingface.co/IDEA-CCNL/Randeng_Pegasus_523M_Chinese)
+ We also pretrained a large model, available as [Randeng-Pegasus-523M-Chinese](https://huggingface.co/IDEA-CCNL/Randeng-Pegasus-523M-Chinese).
 
  Task: Summarization
 
@@ -29,8 +29,8 @@ from transformers import PegasusForConditionalGeneration
  # and then you will see tokenizers_pegasus.py and data_utils.py, which are needed by the pegasus model
  from tokenizers_pegasus import PegasusTokenizer
 
- model = PegasusForConditionalGeneration.from_pretrained("IDEA-CCNL/Randeng_Pegasus_238M_Chinese")
- tokenizer = PegasusTokenizer.from_pretrained("IDEA-CCNL/Randeng_Pegasus_238M_Chinese")
+ model = PegasusForConditionalGeneration.from_pretrained("IDEA-CCNL/Randeng-Pegasus-238M-Chinese")
+ tokenizer = PegasusTokenizer.from_pretrained("IDEA-CCNL/Randeng-Pegasus-238M-Chinese")
 
  text = "据微信公众号“界面”报道,4日上午10点左右,中国发改委反垄断调查小组突击查访奔驰上海办事处,调取数据材料,并对多名奔驰高管进行了约谈。截止昨日晚9点,包括北京梅赛德斯-奔驰销售服务有限公司东区总经理在内的多名管理人员仍留在上海办公室内"  # news excerpt: NDRC anti-monopoly investigators raided Mercedes-Benz's Shanghai office and questioned executives
  inputs = tokenizer(text, max_length=512, return_tensors="pt")
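
The hunk above stops at the `tokenizer(...)` call. For completeness, a minimal sketch of how the example typically continues, using the standard `transformers` generation API; the `max_length` and `num_beams` values are illustrative defaults, not recommendations from the model card:

```python
# Continuation sketch: generate a summary from the tokenized input and decode it.
# NOTE: the generation parameters are illustrative, not from the model card.
summary_ids = model.generate(
    inputs["input_ids"],
    max_length=64,
    num_beams=4,
    early_stopping=True,
)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0])
```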
 
 
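On the tokenizer note in the README: jieba segments the text into words first, and a BERT-style WordPiece vocabulary then covers anything out of vocabulary, which sidesteps the SentencePiece instability on Chinese that the authors mention. A rough sketch of that combination, assuming the `bert-base-chinese` vocabulary as a stand-in; the repo's actual `tokenizers_pegasus.py` is the authoritative implementation:

```python
import jieba
from transformers import BertTokenizer

# Rough sketch of jieba pre-segmentation feeding a BERT-style tokenizer.
# NOTE: bert-base-chinese is an assumed stand-in; its vocab is mostly
# character-level, so this often reduces words back to characters. The
# structure (pre-segment with jieba, then subword-tokenize) is the point.
bert_tok = BertTokenizer.from_pretrained("bert-base-chinese")

def tokenize_zh(text: str) -> list:
    tokens = []
    for word in jieba.cut(text):                # word-level segmentation first
        tokens.extend(bert_tok.tokenize(word))  # WordPiece handles OOV pieces
    return tokens

print(tokenize_zh("中国发改委突击查访奔驰上海办事处"))
```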