Commit e0b383a by dongxiaoqun (1 parent: 6d0226a)

Update README.md

README.md CHANGED
@@ -7,14 +7,14 @@ inference: False
 ---
 
 
-
+IDEA-CCNL/Randeng-Pegasus-238M-Chinese model (Chinese),codes has merged into [Fengshenbang-LM](https://github.com/IDEA-CCNL/Fengshenbang-LM)
 
 The 523M million parameter randeng_pegasus_large model, training with sampled gap sentence ratios on 180G Chinese data, and stochastically sample important sentences. The pretraining task just same as the paper [PEGASUS: Pre-training with Extracted Gap-sentences for
 Abstractive Summarization](https://arxiv.org/pdf/1912.08777.pdf) mentioned.
 
 Different from the English version of pegasus, considering that the Chinese sentence piece is unstable, we use jieba and Bertokenizer as the tokenizer in chinese pegasus model.
 
-We also pretained a large model , available with [
+We also pretained a large model , available with [Randeng-Pegasus-523M-Chinese](https://huggingface.co/IDEA-CCNL/Randeng-Pegasus-523M-Chinese)
 
 Task: Summarization
 
@@ -29,8 +29,8 @@ from transformers import PegasusForConditionalGeneration
 # and then you will see the tokenizers_pegasus.py and data_utils.py which are needed by pegasus model
 from tokenizers_pegasus import PegasusTokenizer
 
-model = PegasusForConditionalGeneration.from_pretrained("IDEA-CCNL/
-tokenizer = PegasusTokenizer.from_pretrained("IDEA-CCNL/
+model = PegasusForConditionalGeneration.from_pretrained("IDEA-CCNL/Randeng-Pegasus-238M-Chinese")
+tokenizer = PegasusTokenizer.from_pretrained("IDEA-CCNL/Randeng-Pegasus-238M-Chinese")
 
 text = "据微信公众号“界面”报道,4日上午10点左右,中国发改委反垄断调查小组突击查访奔驰上海办事处,调取数据材料,并对多名奔驰高管进行了约谈。截止昨日晚9点,包括北京梅赛德斯-奔驰销售服务有限公司东区总经理在内的多名管理人员仍留在上海办公室内"
 inputs = tokenizer(text, max_length=512, return_tensors="pt")
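The README text in this diff says the model was pretrained with sampled gap sentence ratios and stochastic selection of important sentences. The sketch below illustrates that idea in plain Python and is not the Fengshenbang-LM implementation. The names `split_sentences`, `unigram_f1` and `gap_sentence_example`, the `[MASK1]` placeholder, the 15% to 45% ratio range, the 20% score noise, and the character-level overlap score (a crude stand-in for ROUGE-1) are all assumptions chosen for illustration.

```python
import random
import re
from collections import Counter

MASK = "[MASK1]"  # hypothetical placeholder; the real mask token may differ


def split_sentences(text):
    """Split text into sentences, keeping the ending punctuation."""
    return [s.strip() for s in re.findall(r"[^。!?!?]+[。!?!?]?", text) if s.strip()]


def unigram_f1(candidate, reference):
    """Character-unigram F1 between two strings (rough ROUGE-1 stand-in)."""
    c, r = Counter(candidate), Counter(reference)
    overlap = sum((c & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def gap_sentence_example(text, ratio_range=(0.15, 0.45), noise=0.2, rng=None):
    """Build one (masked source, target) pair in the gap-sentence style."""
    rng = rng or random.Random()
    sentences = split_sentences(text)
    if len(sentences) < 2:
        return text, ""  # nothing to mask in a single-sentence document

    # Sample the gap sentence ratio for this document (assumed range).
    ratio = rng.uniform(*ratio_range)
    n_gaps = min(max(1, round(len(sentences) * ratio)), len(sentences) - 1)

    # Score each sentence against the rest of the document, then perturb
    # the score so that the selection is stochastic.
    scores = []
    for i, sentence in enumerate(sentences):
        rest = "".join(sentences[:i] + sentences[i + 1:])
        score = unigram_f1(sentence, rest)
        scores.append(score * (1 + rng.uniform(-noise, noise)))

    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:n_gaps])

    # Masked sentences are replaced in the source and concatenated as the target.
    source = "".join(MASK if i in chosen else s for i, s in enumerate(sentences))
    target = "".join(sentences[i] for i in sorted(chosen))
    return source, target
```

Calling `gap_sentence_example(doc, rng=random.Random(0))` on a multi-sentence string returns the document with the selected sentences replaced by the mask placeholder, plus the selected sentences joined in document order as the generation target.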