kimsan0622 committed on
Commit
9854a90
1 Parent(s): 3a2efa3

Create README.md

Files changed (1)
  1. README.md +45 -0
README.md ADDED
@@ -0,0 +1,45 @@
+ ---
+ license: apache-2.0
+ language: [en, ko]
+ tags:
+ - Korean
+ - English
+ - t5
+ eos_token: "</s>"
+ widget:
+ - text: 아버지가 방에 들어가신다.</s>  # "Father is entering the room."
+ ---
+
+ # ke-t5 base
+
+ T5 model pretrained on Korean and English text. See the [GitHub repository](https://github.com/AIRC-KETI/ke-t5), the [paper](https://aclanthology.org/2021.findings-emnlp.33/), and the [Korean paper](https://koreascience.kr/article/CFKO202130060717834.pdf) for more details.
+
+ ## How to use
+
+ ```python
+ from transformers import AutoModel, AutoTokenizer
+
+ model = AutoModel.from_pretrained("KETI-AIR/ke-t5-base")
+ tokenizer = AutoTokenizer.from_pretrained("KETI-AIR/ke-t5-base")
+ ```
+
+ ## BibTeX entry and citation info
+
+ ```bibtex
+ @inproceedings{kim-etal-2021-model-cross,
+     title = "A Model of Cross-Lingual Knowledge-Grounded Response Generation for Open-Domain Dialogue Systems",
+     author = "Kim, San  and
+       Jang, Jin Yea  and
+       Jung, Minyoung  and
+       Shin, Saim",
+     booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
+     month = nov,
+     year = "2021",
+     address = "Punta Cana, Dominican Republic",
+     publisher = "Association for Computational Linguistics",
+     url = "https://aclanthology.org/2021.findings-emnlp.33",
+     doi = "10.18653/v1/2021.findings-emnlp.33",
+     pages = "352--365",
+     abstract = "Research on open-domain dialogue systems that allow free topics is challenging in the field of natural language processing (NLP). The performance of the dialogue system has been improved recently by the method utilizing dialogue-related knowledge; however, non-English dialogue systems suffer from reproducing the performance of English dialogue systems because securing knowledge in the same language with the dialogue system is relatively difficult. Through experiments with a Korean dialogue system, this paper proves that the performance of a non-English dialogue system can be improved by utilizing English knowledge, highlighting the system uses cross-lingual knowledge. For the experiments, we 1) constructed a Korean version of the Wizard of Wikipedia dataset, 2) built Korean-English T5 (KE-T5), a language model pre-trained with Korean and English corpus, and 3) developed a knowledge-grounded Korean dialogue model based on KE-T5. We observed the performance improvement in the open-domain Korean dialogue model even only English knowledge was given. The experimental results showed that the knowledge inherent in cross-lingual language models can be helpful for generating responses in open dialogue systems.",
+ }
+ ```
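The "How to use" snippet loads the checkpoint with `AutoModel`, which for a T5 configuration resolves to the bare encoder-decoder `T5Model`. That class has no language-modeling head, so its forward pass expects `decoder_input_ids` in addition to `input_ids`. The following is a minimal sketch of that call pattern on a tiny, randomly initialized model built with `AutoModel.from_config`, so it runs without downloading any weights. The configuration sizes and token ids are hypothetical placeholders, not KE-T5's actual hyperparameters or vocabulary.

```python
import torch
from transformers import AutoModel, T5Config

# Hypothetical tiny configuration so the sketch runs offline.
# These sizes are placeholders and NOT the real KE-T5 base hyperparameters.
config = T5Config(
    vocab_size=128,
    d_model=32,
    d_kv=8,
    d_ff=64,
    num_layers=2,
    num_heads=4,
    decoder_start_token_id=0,  # T5 checkpoints conventionally start decoding from the pad token (id 0)
)

# AutoModel maps a T5Config to the bare T5Model (no language-modeling head).
model = AutoModel.from_config(config)
model.eval()

# Arbitrary ids standing in for tokenizer output; id 1 is T5Config's default
# eos_token_id and plays the role of "</s>".
input_ids = torch.tensor([[5, 17, 42, 1]])

# The bare T5Model needs an explicit decoder input; here a single start token.
decoder_input_ids = torch.tensor([[config.decoder_start_token_id]])

with torch.no_grad():
    outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)

encoder_states = outputs.encoder_last_hidden_state  # (batch, source_len, d_model)
decoder_states = outputs.last_hidden_state  # (batch, target_len, d_model)
print(type(model).__name__, tuple(encoder_states.shape), tuple(decoder_states.shape))
```

With the published weights, you would instead use the model and tokenizer loaded in the snippet above and pass `tokenizer(text, return_tensors="pt").input_ids` in place of the hand-written ids. If you need text output rather than hidden states, `AutoModelForSeq2SeqLM` loads the T5 variant that includes the language-modeling head (`T5ForConditionalGeneration`), which supports `generate()`.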