Quasimodo-GenT LM-based Alignment
This model is trained to translate an open triple (initially from Quasimodo) into a closed triple that uses relationships from ConceptNet.
Model Details
Model Description
- **Developed by:** Julien Romero
- **Model type:** GPT2
- **Language(s) (NLP):** English
- **Finetuned from model:** gpt2-large
Model Sources
- **Repository:** https://github.com/Aunsiels/GenT
- **Paper:** https://arxiv.org/pdf/2306.12766.pdf
Uses
We observed good results with beam search decoding. Other decoding methods may be less well suited.
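A minimal sketch of beam search decoding with the Hugging Face `transformers` library is given here. The helper names `beam_search_kwargs` and `generate_with_beams`, the default beam width of 5, and the 32-token limit are illustrative assumptions and do not come from the original work. The model identifier is left as a parameter that the caller must supply.

```python
def beam_search_kwargs(num_beams=5, max_new_tokens=32):
    """Build keyword arguments for `model.generate` that select beam search.

    The default values are assumptions chosen for illustration only.
    """
    return {
        "num_beams": num_beams,             # beam width
        "num_return_sequences": num_beams,  # return every beam
        "do_sample": False,                 # deterministic search, no sampling
        "early_stopping": True,
        "max_new_tokens": max_new_tokens,
    }


def generate_with_beams(prompt, model_name, num_beams=5):
    """Generate candidate continuations of `prompt` with beam search.

    `model_name` is whatever identifier or local path holds this model.
    """
    # Imported lazily so that the helper above works without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token
        **beam_search_kwargs(num_beams),
    )
    return [tokenizer.decode(o, skip_special_tokens=False) for o in outputs]
```

Returning all beams rather than only the best one makes it possible to inspect several candidate closed triples for the same input.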
Direct Use
You must give the open triple as subject, predicate, and object, separated by tab characters and followed by [SEP]. Examples:
fish lives in ocean[SEP]
elephant be killed in africa[SEP]
doctor write prescription[SEP]
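The input format can be built programmatically. The sketch below assumes the field order shown in the examples (subject, then predicate, then object); the function name `format_open_triple` is hypothetical.

```python
def format_open_triple(subject, predicate, obj):
    """Join the three fields with tabs and append the [SEP] marker."""
    fields = [subject.strip(), predicate.strip(), obj.strip()]
    return "\t".join(fields) + "[SEP]"


prompt = format_open_triple("fish", "lives in", "ocean")
print(repr(prompt))
```

Printing with `repr` makes the tab characters visible, which helps when checking that the fields were not joined with spaces by mistake.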
From Subject/Subject-Predicate
It is also possible to give only a subject, or a subject and a predicate, and let the model generate the rest of the triple, which produces a knowledge base directly. In this case, the generated output must be parsed to recover the triples. Examples:
fish
elephant capable of
doctor at location
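A possible way to build these partial prompts and to parse what comes back is sketched below. Both helper names are hypothetical. The parser assumes that each generated triple consists of three tab-separated fields and that triples are delimited by [SEP]; the exact output layout is not documented here, so this assumption should be checked against real generations. The sample string passed to the parser is made up for illustration.

```python
def format_prefix(subject, predicate=None):
    """Build a subject-only or subject-predicate prompt (tab-separated)."""
    if predicate is None:
        return subject.strip()
    return subject.strip() + "\t" + predicate.strip()


def parse_generated(text):
    """Split generated text into (subject, predicate, object) tuples.

    Assumes tab-separated fields and [SEP]-delimited triples; chunks that
    do not contain exactly three non-empty fields are skipped.
    """
    triples = []
    for chunk in text.split("[SEP]"):
        fields = [field.strip() for field in chunk.split("\t")]
        if len(fields) == 3 and all(fields):
            triples.append(tuple(fields))
    return triples


# Made-up generation used only to exercise the parser.
sample = "fish\tAtLocation\tocean[SEP]fish\tbroken[SEP]"
print(parse_generated(sample))
```

Skipping malformed chunks instead of raising an error keeps a single bad generation from discarding the rest of the batch.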
From Text
When used with text as input, this model can behave like a relation extractor, although it was not trained on this task. Examples:
Some air pollutants fall to earth in the form of acid rain.[SEP]
Elon Musk Races to Secure Financing for Twitter Bid.[SEP]
Citation
BibTeX:
@InProceedings{10.1007/978-3-031-47240-4_20,
  author="Romero, Julien and Razniewski, Simon",
  editor="Payne, Terry R. and Presutti, Valentina and Qi, Guilin and Poveda-Villal{\'o}n, Mar{\'i}a and Stoilos, Giorgos and Hollink, Laura and Kaoudi, Zoi and Cheng, Gong and Li, Juanzi",
  title="Mapping and Cleaning Open Commonsense Knowledge Bases with Generative Translation",
  booktitle="The Semantic Web -- ISWC 2023",
  year="2023",
  publisher="Springer Nature Switzerland",
  address="Cham",
  pages="368--387",
  abstract="Structured knowledge bases (KBs) are the backbone of many knowledge-intensive applications, and their automated construction has received considerable attention. In particular, open information extraction (OpenIE) is often used to induce structure from a text. However, although it allows high recall, the extracted knowledge tends to inherit noise from the sources and the OpenIE algorithm. Besides, OpenIE tuples contain an open-ended, non-canonicalized set of relations, making the extracted knowledge's downstream exploitation harder. In this paper, we study the problem of mapping an open KB into the fixed schema of an existing KB, specifically for the case of commonsense knowledge. We propose approaching the problem by generative translation, i.e., by training a language model to generate fixed-schema assertions from open ones. Experiments show that this approach occupies a sweet spot between traditional manual, rule-based, or classification-based canonicalization and purely generative KB construction like COMET. Moreover, it produces higher mapping accuracy than the former while avoiding the association-based noise of the latter. Code and data are available. (https://github.com/Aunsiels/GenT, julienromero.fr/data/GenT)",
  isbn="978-3-031-47240-4"
}