shibing624 committed

Commit 8919838 (1 parent: 48be009)

Update README.md

Files changed (1): README.md (+87 -1)

Removed line: `license: apache-2.0`

Updated README.md:

---
language:
- zh
tags:
- t5
- pytorch
- zh
- text-generation
license: "apache-2.0"
widget:
- text: "丹枫江冷人初去"
---

# T5 for Chinese Couplet (t5-chinese-couplet) Model

A T5 model for Chinese couplet generation.
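
Evaluation of `t5-chinese-couplet` on the couplet **test** data. `input_text` is the first line of the couplet, `target_text` the reference second line, and `pred` the model's prediction:

|prefix|input_text|target_text|pred|
|:--|:--|:--|:--|
|对联:|春回大地,对对黄莺鸣暖树|日照神州,群群紫燕衔新泥|福至人间,家家紫燕舞和风|

On the Couplet test set, the generated lines meet the requirements of equal character count, part-of-speech alignment, lexical alignment, and similarity of form, but they do not yet achieve semantically balanced antithesis or conformity to the tonal pattern (平仄).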
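
Some of the surface constraints above can be checked mechanically. The sketch below is not part of textgen or of the model's evaluation; `check_couplet_form` is a hypothetical helper that only tests three formal properties: equal character count, punctuation at the same positions, and reduplicated characters at the same positions (such as 对对 / 家家 in the table). Part-of-speech alignment, semantic antithesis and tonal patterns would need a word segmenter and a tone dictionary, so they are not covered here.

```python
import unicodedata
from typing import Dict, List


def _is_punct(ch: str) -> bool:
    """True for Unicode punctuation, e.g. an ASCII or full-width comma."""
    return unicodedata.category(ch).startswith("P")


def _repeat_pattern(line: str) -> List[int]:
    """Map each position to the first index of its character: "对对" -> [0, 0]."""
    first_seen: Dict[str, int] = {}
    return [first_seen.setdefault(ch, i) for i, ch in enumerate(line)]


def check_couplet_form(first: str, second: str) -> Dict[str, bool]:
    """Check surface-level couplet constraints (hypothetical helper)."""
    same_length = len(first) == len(second)
    # Punctuation must appear at the same positions in both lines.
    punct_aligned = same_length and all(
        _is_punct(a) == _is_punct(b) for a, b in zip(first, second)
    )
    # Repeated characters (e.g. 对对 / 家家) must repeat at the same positions.
    repeats_aligned = same_length and _repeat_pattern(first) == _repeat_pattern(second)
    return {
        "same_length": same_length,
        "punct_aligned": punct_aligned,
        "repeats_aligned": repeats_aligned,
    }


if __name__ == "__main__":
    # Input line and model prediction from the evaluation table
    print(check_couplet_form("春回大地,对对黄莺鸣暖树", "福至人间,家家紫燕舞和风"))
```

For the table row, all three checks pass, which is consistent with the form-level claims above; the checks say nothing about meaning or tones.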

T5 network architecture (original T5):

![arch](t5.png)

## Usage

This model is open-sourced in the text generation project [textgen](https://github.com/shibing624/textgen), which supports T5 models. It can be called as follows:

```python
from textgen import T5Model
model = T5Model("t5", "shibing624/t5-chinese-couplet")
# Input = task prefix "对联:" + first line of the couplet
r = model.predict(["对联:丹枫江冷人初去"])
print(r)
```
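
Both the evaluation table and the call above put the task prefix `对联:` in front of the first line. A minimal sketch of preparing a batch that way follows; `to_model_inputs` and `COUPLET_PREFIX` are hypothetical names, and it assumes the model expects exactly this prefix. The textgen call is left in comments so the snippet runs without textgen installed.

```python
from typing import Iterable, List

# Assumption: the task prefix shown in the evaluation table and usage example.
COUPLET_PREFIX = "对联:"


def to_model_inputs(first_lines: Iterable[str], prefix: str = COUPLET_PREFIX) -> List[str]:
    """Build model inputs as prefix + first line (hypothetical helper).

    Whitespace is removed, so a space-separated dataset line such as
    "晚 风 摇 树 树 还 挺" becomes "晚风摇树树还挺". Lines that already
    start with the prefix are not prefixed twice.
    """
    inputs = []
    for line in first_lines:
        text = "".join(line.split())
        if not text.startswith(prefix):
            text = prefix + text
        inputs.append(text)
    return inputs


if __name__ == "__main__":
    batch = to_model_inputs(["丹枫江冷人初去", "晚 风 摇 树 树 还 挺"])
    print(batch)
    # With textgen installed, the batch could then be passed to the model:
    # from textgen import T5Model
    # model = T5Model("t5", "shibing624/t5-chinese-couplet")
    # print(model.predict(batch))
```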

Model file layout:

```
t5-chinese-couplet
├── config.json
├── model_args.json
├── pytorch_model.bin
├── special_tokens_map.json
├── tokenizer_config.json
├── spiece.model
└── vocab.txt
```
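
Before loading a local copy of the model, it can help to confirm that the files listed above are present. A minimal sketch with a hypothetical `missing_model_files` helper; the local directory name `t5-chinese-couplet` is an assumption taken from the root of the tree.

```python
from pathlib import Path
from typing import List

# File names as listed in the model file layout above.
EXPECTED_FILES = (
    "config.json",
    "model_args.json",
    "pytorch_model.bin",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "spiece.model",
    "vocab.txt",
)


def missing_model_files(model_dir: str) -> List[str]:
    """Return the expected files that are absent from model_dir (hypothetical helper)."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Assumed local directory; adjust to wherever the model was downloaded.
    missing = missing_model_files("t5-chinese-couplet")
    print("missing:", missing or "none")
```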


### Training Dataset

#### Chinese Couplet Dataset

- Data: [couplet-dataset on GitHub](https://github.com/wb14123/couplet-dataset), [cleaned couplet dataset on GitHub](https://github.com/v-zich/couplet-clean-dataset)
- Related resources:
  - [Huggingface](https://huggingface.co/)
  - Langboat's Chinese [Mengzi T5 pretrained model](https://huggingface.co/Langboat/mengzi-t5-base) and [paper](https://arxiv.org/pdf/2110.06696.pdf)
  - [textgen](https://github.com/shibing624/textgen)


Data format (`in.txt` holds the first lines and `out.txt` the matching second lines, one couplet per line, with characters separated by spaces):

```text
==> .//couplet_files/couplet/train/in.txt <==
晚 风 摇 树 树 还 挺

==> .//couplet_files/couplet/train/out.txt <==
晨 露 润 花 花 更 红
```
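
The two files are read in parallel, line by line. The sketch below turns them into rows with the same `prefix`, `input_text` and `target_text` columns as the evaluation table; `load_couplet_pairs` and `PREFIX` are hypothetical names, and whether textgen's training script expects exactly this row layout is an assumption, so check textgen's own training documentation before relying on it.

```python
from pathlib import Path
from typing import List, Tuple

# Assumption: the same task prefix as in the usage example.
PREFIX = "对联:"


def load_couplet_pairs(
    in_path: str, out_path: str, prefix: str = PREFIX
) -> List[Tuple[str, str, str]]:
    """Read line-aligned in.txt / out.txt into (prefix, input_text, target_text) rows.

    Spaces between characters are removed, pairs where either side is blank
    are skipped, and a line-count mismatch raises ValueError.
    """
    ins = Path(in_path).read_text(encoding="utf-8").splitlines()
    outs = Path(out_path).read_text(encoding="utf-8").splitlines()
    if len(ins) != len(outs):
        raise ValueError(f"line count mismatch: {len(ins)} vs {len(outs)}")
    rows = []
    for first, second in zip(ins, outs):
        src, tgt = "".join(first.split()), "".join(second.split())
        if src and tgt:
            rows.append((prefix, src, tgt))
    return rows


if __name__ == "__main__":
    # Hypothetical local path following the layout shown above
    train_dir = Path("couplet_files/couplet/train")
    if (train_dir / "in.txt").is_file() and (train_dir / "out.txt").is_file():
        rows = load_couplet_pairs(str(train_dir / "in.txt"), str(train_dir / "out.txt"))
        print(len(rows), rows[:1])
```

Raising on a line-count mismatch, rather than silently truncating with `zip`, guards against the two files drifting out of alignment.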


To train the T5 model yourself, see the textgen document [Couplet generation model comparison](https://github.com/shibing624/textgen/blob/main/docs/%E5%AF%B9%E8%81%94%E7%94%9F%E6%88%90%E6%A8%A1%E5%9E%8B%E5%AF%B9%E6%AF%94.md).


## Citation

```bibtex
@software{textgen,
  author = {Xu Ming},
  title = {textgen: Implementation of Text Generation models},
  year = {2022},
  url = {https://github.com/shibing624/textgen},
}
```