Quasimodo-GenT LM-based Alignment

This model is trained to translate an open triple (initially from Quasimodo) into a closed triple whose relation comes from the fixed set of ConceptNet relations.

Model Details

Model Description

**Developed by:** Julien Romero
**Model type:** GPT2
**Language(s) (NLP):** English
**Finetuned from model:** gpt2-large

Model Sources

**Repository:** https://github.com/Aunsiels/GenT
**Paper:** https://arxiv.org/pdf/2306.12766.pdf

Uses

We observed good results with beam search decoding; other decoding methods may be less suitable.
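One way to run beam search is with the Hugging Face `transformers` library. This is a minimal sketch, assuming the checkpoint loads with `AutoModelForCausalLM` and `AutoTokenizer`. The model identifier is not given in this card, so `model_name_or_path` is a placeholder for the Hub id or local directory of the fine-tuned checkpoint, and `generate_closed_triples` is a hypothetical helper name. The beam settings are illustrative defaults, not the values used by the authors.

```python
def generate_closed_triples(prompt, model_name_or_path, num_beams=5,
                            num_return_sequences=3, max_new_tokens=32):
    """Decode continuations of `prompt` with beam search.

    `model_name_or_path` is a placeholder: pass the Hub id or local
    directory of the fine-tuned GPT-2 checkpoint (not specified here).
    Returns only the generated continuations, without the prompt.
    """
    # Beam search can return at most as many sequences as it keeps beams.
    if num_return_sequences > num_beams:
        raise ValueError("num_return_sequences must be <= num_beams")

    # Imported lazily so the helper can be defined without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
    model = AutoModelForCausalLM.from_pretrained(model_name_or_path)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        num_beams=num_beams,
        num_return_sequences=num_return_sequences,
        max_new_tokens=max_new_tokens,
        early_stopping=True,
        do_sample=False,
        # GPT-2 has no pad token; reuse EOS to avoid the padding warning.
        pad_token_id=tokenizer.eos_token_id,
    )
    # Drop the prompt tokens so only the newly generated text is decoded.
    prompt_len = inputs["input_ids"].shape[1]
    return [
        tokenizer.decode(seq[prompt_len:], skip_special_tokens=True).strip()
        for seq in outputs
    ]
```

For example, `generate_closed_triples("fish\tlives in\tocean[SEP]", "path/to/checkpoint")` would return the top beams as strings, best first.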

Direct Use

Give the open triple as subject, predicate, and object separated by tab characters, followed by [SEP]. Examples:

fish	lives in	ocean[SEP]
elephant	be killed in	africa[SEP]
doctor	write	prescription[SEP]
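A small helper can produce this input format. This is a minimal sketch: `format_open_triple` is a hypothetical name, not part of the GenT repository, and rejecting fields that contain a tab is our own choice, since an embedded tab would shift the field boundaries the model expects.

```python
SEP = "[SEP]"


def format_open_triple(subject, predicate, obj):
    """Join subject, predicate, and object with tabs and append [SEP]."""
    fields = [subject, predicate, obj]
    for field in fields:
        # A tab inside a field would be read as an extra field separator.
        if "\t" in field:
            raise ValueError(f"field contains a tab: {field!r}")
    # Surrounding spaces are stripped from each field before joining.
    return "\t".join(f.strip() for f in fields) + SEP
```

For instance, `format_open_triple("fish", "lives in", "ocean")` returns `"fish\tlives in\tocean[SEP]"`, matching the first example above.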

From Subject/Subject-Predicate

It is also possible to give only a subject, or a subject and a predicate, to generate knowledge base assertions directly. In this case, the generated output must be parsed to recover the triples. Examples:

fish	
elephant	capable of	
doctor	at location	
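A minimal sketch of building these prefix prompts and parsing the result. It assumes, without confirmation from this card, that the generated text follows the training layout: the completed open triple, then [SEP], then the tab-separated closed triple. `format_prefix` and `parse_closed_triple` are hypothetical helpers.

```python
SEP = "[SEP]"


def format_prefix(subject, predicate=None):
    """Prompt from a subject alone or a subject-predicate pair, ending in a tab."""
    if predicate is None:
        return subject + "\t"
    return subject + "\t" + predicate + "\t"


def parse_closed_triple(generated):
    """Return (subject, relation, object) from the text after the last [SEP].

    Assumes the closed triple is tab-separated; returns None if [SEP] is
    missing or the segment does not have exactly three non-empty fields.
    """
    if SEP not in generated:
        return None  # generation stopped before reaching the closed triple
    segment = generated.split(SEP)[-1]
    segment = segment.split("\n")[0]  # ignore anything after a line break
    fields = [f.strip() for f in segment.split("\t")]
    if len(fields) != 3 or not all(fields):
        return None
    return tuple(fields)
```

On an invented generation such as `"fish\tlive in\tocean[SEP]fish\tat location\tocean"`, `parse_closed_triple` returns `("fish", "at location", "ocean")`; incomplete generations yield `None` and can be discarded.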

From Text

When given raw text as input, this model can behave like a relation extractor, although it was not trained for this task. Examples:

Some air pollutants fall to earth in the form of acid rain.[SEP]
Elon Musk Races to Secure Financing for Twitter Bid.[SEP]
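Raw text needs only the [SEP] marker appended. The sketch below, using the hypothetical helper `format_text_prompt`, also collapses tabs and line breaks into single spaces, on the assumption that a tab in the input would be read as a field separator, and it avoids adding a second [SEP] if one is already present.

```python
SEP = "[SEP]"


def format_text_prompt(text):
    """Normalize whitespace in raw text and append a single [SEP]."""
    # str.split() with no argument splits on any whitespace run (tabs, newlines).
    cleaned = " ".join(text.split())
    if cleaned.endswith(SEP):
        cleaned = cleaned[: -len(SEP)].rstrip()
    if not cleaned:
        raise ValueError("empty input text")
    return cleaned + SEP
```

For example, `format_text_prompt("Some air pollutants fall to earth\tin the form of acid rain.")` returns `"Some air pollutants fall to earth in the form of acid rain.[SEP]"`.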

Citation

BibTeX:

@InProceedings{10.1007/978-3-031-47240-4_20,
  author="Romero, Julien and Razniewski, Simon",
  editor="Payne, Terry R. and Presutti, Valentina and Qi, Guilin and Poveda-Villal{\'o}n, Mar{\'i}a and Stoilos, Giorgos and Hollink, Laura and Kaoudi, Zoi and Cheng, Gong and Li, Juanzi",
  title="Mapping and Cleaning Open Commonsense Knowledge Bases with Generative Translation",
  booktitle="The Semantic Web -- ISWC 2023",
  year="2023",
  publisher="Springer Nature Switzerland",
  address="Cham",
  pages="368--387",
  abstract="Structured knowledge bases (KBs) are the backbone of many knowledge-intensive applications, and their automated construction has received considerable attention. In particular, open information extraction (OpenIE) is often used to induce structure from a text. However, although it allows high recall, the extracted knowledge tends to inherit noise from the sources and the OpenIE algorithm. Besides, OpenIE tuples contain an open-ended, non-canonicalized set of relations, making the extracted knowledge's downstream exploitation harder. In this paper, we study the problem of mapping an open KB into the fixed schema of an existing KB, specifically for the case of commonsense knowledge. We propose approaching the problem by generative translation, i.e., by training a language model to generate fixed-schema assertions from open ones. Experiments show that this approach occupies a sweet spot between traditional manual, rule-based, or classification-based canonicalization and purely generative KB construction like COMET. Moreover, it produces higher mapping accuracy than the former while avoiding the association-based noise of the latter. Code and data are available. (https://github.com/Aunsiels/GenT, julienromero.fr/data/GenT)",
  isbn="978-3-031-47240-4"
}