Update README.md
README.md
@@ -27,9 +27,9 @@ Good at solving text summarization tasks, Chinese PEGASUS-base.
Reference paper: [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/pdf/1912.08777.pdf)
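-To address Chinese automatic summarization, we follow the PEGASUS design to train a Chinese version. We use the WuDao Corpora (180 GB version) as the pre-training dataset. In addition, because SentencePiece tokenization is unstable for Chinese, Randeng-PEGASUS uses both jieba word segmentation and the BERT tokenizer. We also provide a large version: [IDEA-CCNL/Randeng-Pegasus-523M-Chinese](https://huggingface.co/IDEA-CCNL/Randeng-Pegasus-523M-Chinese).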
-To solve Chinese abstractive summarization tasks, we follow the PEGASUS guidelines. We use the WuDao Corpora (180 GB version) as the pre-training dataset. In addition, because SentencePiece tokenization is unstable for Chinese, we use both jieba and the BERT tokenizer in Randeng-PEGASUS. We also provide a large version, available at [IDEA-CCNL/Randeng-Pegasus-523M-Chinese](https://huggingface.co/IDEA-CCNL/Randeng-Pegasus-523M-Chinese).
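+To address Chinese automatic summarization, we follow the PEGASUS design to train a Chinese version. We use the WuDao Corpora (180 GB version) as the pre-training dataset. In addition, because SentencePiece tokenization is unstable for Chinese, Randeng-PEGASUS uses both jieba word segmentation and the BERT tokenizer. We also provide a large version: [IDEA-CCNL/Randeng-Pegasus-523M-Chinese](https://huggingface.co/IDEA-CCNL/Randeng-Pegasus-523M-Chinese). In addition, we provide a version fine-tuned on Chinese summarization datasets: [Randeng-Pegasus-238M-Summary-Chinese](https://huggingface.co/IDEA-CCNL/Randeng-Pegasus-238M-Summary-Chinese).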
+To solve Chinese abstractive summarization tasks, we follow the PEGASUS guidelines. We use the WuDao Corpora (180 GB version) as the pre-training dataset. In addition, because SentencePiece tokenization is unstable for Chinese, we use both jieba and the BERT tokenizer in Randeng-PEGASUS. We also provide a large version, available at [IDEA-CCNL/Randeng-Pegasus-523M-Chinese](https://huggingface.co/IDEA-CCNL/Randeng-Pegasus-523M-Chinese). We also provide a version fine-tuned on Chinese text summarization datasets: [Randeng-Pegasus-238M-Summary-Chinese](https://huggingface.co/IDEA-CCNL/Randeng-Pegasus-238M-Summary-Chinese).
## Usage