Jiaqi7777 committed on
Commit
dca5f77
1 Parent(s): d252247

Upload 9 files

README.md CHANGED
@@ -1,3 +1,75 @@
  ---
- license: bigscience-openrail-m
+ language:
+ - zh
+ license: apache-2.0
+
+ inference: true
+
+ widget:
+ - text: 'summary: 在北京冬奥会自由式滑雪女子坡面障碍技巧决赛中,中国选手谷爱凌夺得银牌。祝贺谷爱凌!今天上午,自由式滑雪女子坡面障碍技巧决赛举行。决赛分三轮进行,取选手最佳成绩排名决出奖牌。第一跳,中国选手谷爱凌获得69.90分。在12位选手中排名第三。完成动作后,谷爱凌又扮了个鬼脸,甚是可爱。第二轮中,谷爱凌在道具区第三个障碍处失误,落地时摔倒。获得16.98分。网友:摔倒了也没关系,继续加油!在第二跳失误摔倒的情况下,谷爱凌顶住压力,第三跳稳稳发挥,流畅落地!获得86.23分!此轮比赛,共12位选手参赛,谷爱凌第10位出场。网友:看比赛时我比谷爱凌紧张,加油!'
  ---
+
+ # Randeng-BART-139M-SUMMARY
+
+ - Github: [Fengshenbang-LM](https://github.com/IDEA-CCNL/Fengshenbang-LM)
+ - Docs: [Fengshenbang-Docs](https://fengshenbang-doc.readthedocs.io/)
+
+ ## 简介 Brief Introduction
+
+ 善于处理摘要任务,在一个中文摘要数据集上微调后的,中文版的BART-base。
+
+ This is the Chinese BART-base fine-tuned on a Chinese text summarization dataset; it is good at text summarization tasks.
+
+ ## 模型分类 Model Taxonomy
+
+ | 需求 Demand | 任务 Task | 系列 Series | 模型 Model | 参数 Parameter | 额外 Extra |
+ | :----: | :----: | :----: | :----: | :----: | :----: |
+ | 通用 General | 自然语言转换 NLT | 燃灯 Randeng | BART | 139M | 中文-文本摘要任务 Chinese-Summary |
+
+ ## 模型信息 Model Information
+
+ 基于[Randeng-BART-139M](https://huggingface.co/IDEA-CCNL/Randeng-BART-139M),我们在收集的1个中文领域的文本摘要数据集(LCSTS)上微调了它,得到了summary版本。
+
+ Based on [Randeng-BART-139M](https://huggingface.co/IDEA-CCNL/Randeng-BART-139M), we fine-tuned it on a Chinese text summarization dataset (LCSTS) to obtain this summary version.
+
+ ## 使用 Usage
+
+ ```python
+ from transformers import BartForConditionalGeneration, AutoTokenizer, Text2TextGenerationPipeline
+
+ # Load the fine-tuned summarization checkpoint and its tokenizer
+ tokenizer = AutoTokenizer.from_pretrained('IDEA-CCNL/Randeng-BART-139M-SUMMARY')
+ model = BartForConditionalGeneration.from_pretrained('IDEA-CCNL/Randeng-BART-139M-SUMMARY')
+
+ # The input is the 'summary:' prefix followed by the article to summarize
+ text = 'summary:在北京冬奥会自由式滑雪女子坡面障碍技巧决赛中,中国选手谷爱凌夺得银牌。祝贺谷爱凌!今天上午,自由式滑雪女子坡面障碍技巧决赛举行。决赛分三轮进行,取选手最佳成绩排名决出奖牌。第一跳,中国选手谷爱凌获得69.90分。在12位选手中排名第三。完成动作后,谷爱凌又扮了个鬼脸,甚是可爱。第二轮中,谷爱凌在道具区第三个障碍处失误,落地时摔倒。获得16.98分。网友:摔倒了也没关系,继续加油!在第二跳失误摔倒的情况下,谷爱凌顶住压力,第三跳稳稳发挥,流畅落地!获得86.23分!此轮比赛,共12位选手参赛,谷爱凌第10位出场。网友:看比赛时我比谷爱凌紧张,加油!'
+
+ # Greedy decoding, capped at 50 generated tokens
+ text2text_generator = Text2TextGenerationPipeline(model, tokenizer)
+ print(text2text_generator(text, max_length=50, do_sample=False))
+ ```
+
+ ## 引用 Citation
+
+ 如果您在您的工作中使用了我们的模型,可以引用我们的[论文](https://arxiv.org/abs/2209.02970):
+
+ If you use this resource in your work, please cite our [paper](https://arxiv.org/abs/2209.02970):
+
+ ```text
+ @article{fengshenbang,
+   author = {Junjie Wang and Yuxiang Zhang and Lin Zhang and Ping Yang and Xinyu Gao and Ziwei Wu and Xiaoqun Dong and Junqing He and Jianheng Zhuo and Qi Yang and Yongfeng Huang and Xiayu Li and Yanghan Wu and Junyu Lu and Xinyu Zhu and Weifeng Chen and Ting Han and Kunhao Pan and Rui Wang and Hao Wang and Xiaojun Wu and Zhongshen Zeng and Chongpei Chen and Ruyi Gan and Jiaxing Zhang},
+   title = {Fengshenbang 1.0: Being the Foundation of Chinese Cognitive Intelligence},
+   journal = {CoRR},
+   volume = {abs/2209.02970},
+   year = {2022}
+ }
+ ```
+
+ 也可以引用我们的[网站](https://github.com/IDEA-CCNL/Fengshenbang-LM/):
+
+ You can also cite our [website](https://github.com/IDEA-CCNL/Fengshenbang-LM/):
+
+ ```text
+ @misc{Fengshenbang-LM,
+   title={Fengshenbang-LM},
+   author={IDEA-CCNL},
+   year={2021},
+   howpublished={\url{https://github.com/IDEA-CCNL/Fengshenbang-LM}},
+ }
+ ```
added_tokens.json ADDED
@@ -0,0 +1 @@
+ {"<mask>": 40001, "<pad>": 40000}
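The file above registers two extra entries on top of the base vocabulary: `<pad>` at id 40000 and `<mask>` at id 40001. A minimal sanity-check sketch, assuming the standard `transformers` AutoTokenizer API and this repository id (the check itself is not part of the commit):

```python
from transformers import AutoTokenizer

# Assumption: this repo id resolves to the tokenizer files committed here
tok = AutoTokenizer.from_pretrained("IDEA-CCNL/Randeng-BART-139M-SUMMARY")

# Per added_tokens.json, these should map to 40000 and 40001
print(tok.convert_tokens_to_ids(["<pad>", "<mask>"]))
```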
config.json ADDED
@@ -0,0 +1,52 @@
+ {
+   "_name_or_path": "bart-base",
+   "activation_dropout": 0.1,
+   "activation_function": "gelu",
+   "add_bias_logits": false,
+   "add_final_layer_norm": false,
+   "architectures": [
+     "BartForConditionalGeneration"
+   ],
+   "attention_dropout": 0.1,
+   "bos_token_id": 0,
+   "classif_dropout": 0.1,
+   "classifier_dropout": 0.0,
+   "d_model": 768,
+   "decoder_attention_heads": 12,
+   "decoder_ffn_dim": 3072,
+   "decoder_layerdrop": 0.0,
+   "decoder_layers": 6,
+   "decoder_start_token_id": 2,
+   "dropout": 0.1,
+   "encoder_attention_heads": 12,
+   "encoder_ffn_dim": 3072,
+   "encoder_layerdrop": 0.0,
+   "encoder_layers": 6,
+   "eos_token_id": 2,
+   "forced_eos_token_id": 2,
+   "id2label": {
+     "0": "LABEL_0",
+     "1": "LABEL_1",
+     "2": "LABEL_2"
+   },
+   "init_std": 0.02,
+   "is_encoder_decoder": true,
+   "label2id": {
+     "LABEL_0": 0,
+     "LABEL_1": 1,
+     "LABEL_2": 2
+   },
+   "max_position_embeddings": 1024,
+   "model_type": "bart",
+   "no_repeat_ngram_size": 3,
+   "normalize_before": false,
+   "normalize_embedding": true,
+   "num_beams": 4,
+   "num_hidden_layers": 6,
+   "pad_token_id": 1,
+   "scale_embedding": false,
+   "torch_dtype": "float16",
+   "transformers_version": "4.16.0.dev0",
+   "use_cache": true,
+   "vocab_size": 50265
+ }
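This config describes a BART-base-sized encoder-decoder: 6 encoder and 6 decoder layers, hidden size 768, 12 attention heads, and a 50,265-entry vocabulary stored in float16. A minimal sketch, assuming the `transformers` AutoConfig API, for reading these values programmatically:

```python
from transformers import AutoConfig

# Assumption: the hub copy of config.json matches the file shown above
cfg = AutoConfig.from_pretrained("IDEA-CCNL/Randeng-BART-139M-SUMMARY")

# Expected per config.json: 768, 6, 6, 12, 50265
print(cfg.d_model, cfg.encoder_layers, cfg.decoder_layers,
      cfg.encoder_attention_heads, cfg.vocab_size)
```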
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:26802edda43f57980a2e2edcc6f8d3907eb9ce30457b886006d323dfd2cbd2b2
+ size 134
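This entry is a Git LFS pointer, so only the pointer metadata (oid and size) appears in the diff; the binary itself lives in LFS storage. A minimal sketch, assuming the `huggingface_hub` client and this repository id, for fetching the resolved file rather than the pointer:

```python
from huggingface_hub import hf_hub_download

# Assumption: the repository id matches the model card in this commit
path = hf_hub_download(
    repo_id="IDEA-CCNL/Randeng-BART-139M-SUMMARY",
    filename="pytorch_model.bin",
)
print(path)  # local cache path of the downloaded weights
```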
special_tokens_map.json ADDED
@@ -0,0 +1 @@
+ {"bos_token": "<s>", "eos_token": "</s>", "unk_token": "<unk>", "pad_token": "<pad>", "mask_token": "<mask>", "additional_special_tokens": ["<s>", "<mask>"]}
spiece.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5db8098aad8adeabab08f5b1adfe296b21f67b3e0f1cc2a1edfff35a24561433
+ size 131
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "eos_token": "</s>",
+   "unk_token": "<unk>",
+   "pad_token": "<pad>",
+   "extra_ids": 0,
+   "additional_special_tokens": [
+     "<s>",
+     "<mask>"
+   ],
+   "sp_model_kwargs": {},
+   "name_or_path": "/cognitive_comp/gaoxinyu/hf_hub/Randeng-BART-139M",
+   "special_tokens_map_file": "/cognitive_comp/gaoxinyu/hf_hub/Randeng-BART-139M/special_tokens_map.json",
+   "tokenizer_class": "T5Tokenizer"
+ }
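Note that `tokenizer_class` is `T5Tokenizer`: the tokenizer is SentencePiece-based (backed by `spiece.model`) even though the model architecture is BART. A minimal sketch, assuming the `transformers` AutoTokenizer API, to confirm which class is actually instantiated:

```python
from transformers import AutoTokenizer

# Assumption: AutoTokenizer honours tokenizer_config.json and returns a T5-style tokenizer
tok = AutoTokenizer.from_pretrained("IDEA-CCNL/Randeng-BART-139M-SUMMARY")
print(type(tok).__name__)  # expected: T5Tokenizer (or T5TokenizerFast)
```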
vocab.txt ADDED
The diff for this file is too large to render. See raw diff