wanng committed on
Commit
36501e1
1 Parent(s): 384ba68

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -27,9 +27,9 @@ Good at solving text summarization tasks, Chinese PEGASUS-base.
  
  参考论文 Reference paper: [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/pdf/1912.08777.pdf)
  
- 为了解决中文的自动摘要任务,我们遵循PEGASUS的设计来训练中文的版本。我们使用了悟道语料库(180G版本)作为预训练数据集。此外,考虑到中文sentence piece不稳定,我们在Randeng-PEGASUS中同时使用了结巴分词和BERT分词器。我们也提供large的版本:[IDEA-CCNL/Randeng-Pegasus-523M-Chinese](https://huggingface.co/IDEA-CCNL/Randeng-Pegasus-523M-Chinese)。
+ 为了解决中文的自动摘要任务,我们遵循PEGASUS的设计来训练中文的版本。我们使用了悟道语料库(180G版本)作为预训练数据集。此外,考虑到中文sentence piece不稳定,我们在Randeng-PEGASUS中同时使用了结巴分词和BERT分词器。我们也提供large的版本:[IDEA-CCNL/Randeng-Pegasus-523M-Chinese](https://huggingface.co/IDEA-CCNL/Randeng-Pegasus-523M-Chinese)。以及,我们也提供了在中文摘要数据集上微调的版本:[Randeng-Pegasus-238M-Summary-Chinese](https://huggingface.co/IDEA-CCNL/Randeng-Pegasus-238M-Summary-Chinese)
  
- To solve Chinese abstractive summarization tasks, we follow the PEGASUS design to train a Chinese version. We use the WuDao Corpora (180 GB version) as the pre-training dataset. In addition, because SentencePiece is unstable on Chinese text, we use both jieba and the BERT tokenizer in Randeng-PEGASUS. We also provide a large version: [IDEA-CCNL/Randeng-Pegasus-523M-Chinese](https://huggingface.co/IDEA-CCNL/Randeng-Pegasus-523M-Chinese)
+ To solve Chinese abstractive summarization tasks, we follow the PEGASUS design to train a Chinese version. We use the WuDao Corpora (180 GB version) as the pre-training dataset. In addition, because SentencePiece is unstable on Chinese text, we use both jieba and the BERT tokenizer in Randeng-PEGASUS. We also provide a large version: [IDEA-CCNL/Randeng-Pegasus-523M-Chinese](https://huggingface.co/IDEA-CCNL/Randeng-Pegasus-523M-Chinese). We also provide a version fine-tuned on Chinese text summarization datasets: [Randeng-Pegasus-238M-Summary-Chinese](https://huggingface.co/IDEA-CCNL/Randeng-Pegasus-238M-Summary-Chinese).
  
  ## 使用 Usage
  