Update README.md
README.md CHANGED
@@ -21,15 +21,15 @@ Good at solving text summarization tasks, after fine-tuning on multiple Chinese
 
 | 需求 Demand | 任务 Task | 系列 Series | 模型 Model | 参数 Parameter | 额外 Extra |
 | :----: | :----: | :----: | :----: | :----: | :----: |
-| 通用 General | 自然语言转换 NLT | 燃灯 Randeng | PEGASUS |
+| 通用 General | 自然语言转换 NLT | 燃灯 Randeng | PEGASUS | 523M | 文本摘要任务-中文 Summary-Chinese |
 
 ## 模型信息 Model Information
 
 参考论文:[PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/pdf/1912.08777.pdf)
 
-基于[Randeng-Pegasus-523M-Chinese](https://huggingface.co/IDEA-CCNL/Randeng-Pegasus-523M-Chinese),我们在收集的7个中文领域的文本摘要数据集(约4M个样本)上微调得到了文本摘要版本(summary)。这7个数据集为:education, new2016zh, nlpcc, shence, sohu, thucnews和weibo。
+基于[Randeng-Pegasus-523M-Chinese](https://huggingface.co/IDEA-CCNL/Randeng-Pegasus-523M-Chinese),我们在收集的7个中文领域的文本摘要数据集(约4M个样本)的基础上,使用实体过滤后的数据集(约1.8M个样本)重新微调,在不损伤下游指标的情况下提升了摘要对原文的忠实度,得到了summary-v1版本。这7个数据集为:education, new2016zh, nlpcc, shence, sohu, thucnews和weibo。
 
-Based on [Randeng-Pegasus-523M-Chinese](https://huggingface.co/IDEA-CCNL/Randeng-Pegasus-523M-Chinese), we fine-tuned a text summarization version (summary) on 7 Chinese text summarization datasets totaling around 4M samples. The datasets include: education, new2016zh, nlpcc, shence, sohu, thucnews and weibo.
+Based on [Randeng-Pegasus-523M-Chinese](https://huggingface.co/IDEA-CCNL/Randeng-Pegasus-523M-Chinese), we fine-tuned a text summarization version (summary-v1) on an entity-filtered subset (about 1.8M samples) of the 7 Chinese text summarization datasets we collected (about 4M samples in total). This filtering improves the faithfulness of the generated summaries to the source text without hurting downstream metrics (e.g., ROUGE-L on LCSTS). The datasets include: education, new2016zh, nlpcc, shence, sohu, thucnews and weibo.
 
 
 ## 使用 Usage
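
The hunk ends at the `## 使用 Usage` heading, so the usage snippet itself is outside this diff. For orientation, inference with a Randeng-Pegasus checkpoint would look roughly like the sketch below; the tokenizer class and generation settings are assumptions rather than content from this commit, and the repo ID shown is the base model linked in the card (substitute the summary-v1 checkpoint's own ID).

```python
# A minimal usage sketch, assuming the standard Hugging Face `transformers`
# Pegasus classes. Note: Fengshenbang-LM Pegasus models may require a custom
# tokenizer script shipped in the model repo, so treat PegasusTokenizer here
# as an assumption and check the model card first.
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

model_id = "IDEA-CCNL/Randeng-Pegasus-523M-Chinese"  # base model; swap in the summary-v1 repo ID

tokenizer = PegasusTokenizer.from_pretrained(model_id)
model = PegasusForConditionalGeneration.from_pretrained(model_id)

text = "请在此处填入待摘要的中文新闻原文。"  # placeholder input document
inputs = tokenizer(text, max_length=1024, truncation=True, return_tensors="pt")

# Beam-search settings here are illustrative, not the authors' settings.
summary_ids = model.generate(inputs["input_ids"], max_length=64, num_beams=4)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0])
```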
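
Separately, the new Model Information paragraphs describe filtering the roughly 4M training pairs down to about 1.8M using entities, but name no tool or rule. Below is a hedged sketch of one plausible reading, with jieba's POS tags standing in for whatever NER step the authors actually used.

```python
# Hypothetical entity-based filtering sketch. ENTITY_FLAGS uses jieba's POS
# tags (nr = person, ns = place, nt = organization) as a stand-in NER; the
# keep rule drops any (document, summary) pair whose summary mentions an
# entity that never appears in the source document.
import jieba.posseg as pseg

ENTITY_FLAGS = {"nr", "ns", "nt"}

def entities(text: str) -> set:
    """Entity-like tokens in `text`, via jieba POS tagging (approximate)."""
    return {token.word for token in pseg.cut(text) if token.flag in ENTITY_FLAGS}

def is_faithful(document: str, summary: str) -> bool:
    """True when every entity in the summary also occurs in the document."""
    return all(ent in document for ent in entities(summary))

# Toy example: the second pair would likely be dropped, since "李四"
# never appears in its source document.
pairs = [
    ("新华社报道,张三昨日在北京出席会议并发言。", "张三在北京出席会议"),
    ("公司今日发布财报,营收同比增长两成。", "李四宣布营收增长两成"),
]
filtered = [(doc, summ) for doc, summ in pairs if is_faithful(doc, summ)]
print(len(filtered), "pair(s) kept")
```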