Using LUKE to Perform Entity Disambiguation

#1
by christian-siagian - opened

Hi, I am trying to perform entity disambiguation given a sentence and a Wikipedia link.
Here is an example (taken from Fig. 5 on page 11 of a survey paper, https://content.iospress.com/download/semantic-web/sw222986?id=semantic-web%2Fsw222986):
Sentence: "Scott Young played for the Cleveland Browns"; Wikipedia link: https://en.wikipedia.org/wiki/Scott_Young_(American_football), i.e. the title "Scott Young (American football)".

I am using the following example code:
from transformers import LukeTokenizer, LukeModel

# Checkpoint assumed here; the reply below uses the same one.
tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-base")
model = LukeModel.from_pretrained("studio-ousia/luke-base")

text = "Scott Young played for the Cleveland Browns"
entity_spans = [(0, 11)]  # character span of the mention "Scott Young"
entities = ["Scott Young (American football)"]  # entity given by its Wikipedia title
inputs = tokenizer(text, entities=entities, entity_spans=entity_spans, add_prefix_space=True, return_tensors="pt")
outputs = model(**inputs)
word_last_hidden_state = outputs.last_hidden_state
entity_last_hidden_state = outputs.entity_last_hidden_state

Is there an easy way to check whether the disambiguation is correct? For example, the matching score should drop if I pass in the title "Scott Young (English footballer)" instead.

Thank you in advance.
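One quick sanity check for the question above, sketched under assumptions: the model sees an entity only through its ID in `tokenizer.entity_vocab`, and titles missing from that vocabulary are assumed to be mapped to the `[UNK]` ID (the reply below notes this title seems to be missing). If both candidate titles end up with the same ID, the entity inputs are identical and no score can tell them apart. The helpers `check_candidates` and `distinguishable` are hypothetical, and `toy_vocab` is only a stand-in for `tokenizer.entity_vocab`; its contents and IDs are made up.

```python
from typing import Dict, List, Tuple

UNK = "[UNK]"


def check_candidates(titles: List[str], entity_vocab: Dict[str, int]) -> List[Tuple[str, int, bool]]:
    """Return (title, entity_id, in_vocab) for each candidate title.

    Titles missing from the vocabulary fall back to the [UNK] id, which is
    assumed to mirror how the tokenizer treats out-of-vocabulary entities.
    """
    unk_id = entity_vocab[UNK]
    return [(t, entity_vocab.get(t, unk_id), t in entity_vocab) for t in titles]


def distinguishable(titles: List[str], entity_vocab: Dict[str, int]) -> bool:
    """True only if every candidate title maps to a different entity id."""
    ids = [entity_id for _, entity_id, _ in check_candidates(titles, entity_vocab)]
    return len(set(ids)) == len(ids)


# Toy stand-in for tokenizer.entity_vocab (real contents and IDs differ).
toy_vocab = {"[PAD]": 0, "[UNK]": 1, "[MASK]": 2, "Vince Young": 3}
candidates = ["Scott Young (American football)", "Scott Young (English footballer)"]

for title, entity_id, in_vocab in check_candidates(candidates, toy_vocab):
    print(f"{title!r}: id={entity_id}, in_vocab={in_vocab}")
print("distinguishable:", distinguishable(candidates, toy_vocab))
```

With the real model you would pass `tokenizer.entity_vocab` instead of `toy_vocab`; if `distinguishable` returns `False`, comparing embeddings or scores for those titles cannot reflect the difference between them.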

Studio Ousia org

Although LUKE is not designed for entity disambiguation, maybe you could use masked entity prediction for that purpose.

from transformers import LukeTokenizer, LukeForMaskedLM
import torch

tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-base")
model = LukeForMaskedLM.from_pretrained("studio-ousia/luke-base")


text = "Scott Young played for the Cleveland Browns"
entity_spans = [(0, 11)]  # character span of the mention "Scott Young"
entities = ["[MASK]"]  # mask the entity so the model has to predict it
inputs = tokenizer(text, entities=entities, entity_spans=entity_spans, add_prefix_space=True, return_tensors="pt")
outputs = model(**inputs)


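# Map entity IDs back to Wikipedia titles, then print the 10 highest-scoring
# entities for the masked span (first batch item, first entity).
idx2entity = {idx: entity for entity, idx in tokenizer.entity_vocab.items()}
for predicted_entity_index in torch.topk(outputs.entity_logits[0][0], k=10).indices:
    print(idx2entity[int(predicted_entity_index)])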

output
============
[UNK]
Scott Mitchell (quarterback)
Vince Young
Scott Sharp
Scott Armstrong (wrestler)
Scott Stadium
Mark Aitchison Young
Scott Peterson
Scott Dixon
Scott Air Force Base
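The top prediction above is `[UNK]`, which is not a real entity. A small variation, sketched here in plain Python, skips special entries before keeping the top k. `top_k_entities` is a hypothetical helper, and the toy values only stand in for `outputs.entity_logits[0][0].tolist()` and `idx2entity` from the snippet above; the set of special entries is an assumption.

```python
from typing import Dict, List, Sequence

# Special entries assumed to exist in the entity vocabulary.
SPECIAL_ENTITIES = {"[PAD]", "[UNK]", "[MASK]"}


def top_k_entities(logits: Sequence[float], idx2entity: Dict[int, str], k: int = 10) -> List[str]:
    """Return up to k highest-scoring entity titles, skipping special entries."""
    # Sort all indices by descending logit, then drop the special entries.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    titles = [idx2entity[i] for i in ranked if idx2entity[i] not in SPECIAL_ENTITIES]
    return titles[:k]


# Toy stand-ins; with the real model use outputs.entity_logits[0][0].tolist().
toy_idx2entity = {0: "[PAD]", 1: "[UNK]", 2: "[MASK]", 3: "Vince Young", 4: "Scott Sharp"}
toy_logits = [0.0, 5.0, -1.0, 3.0, 4.0]
print(top_k_entities(toy_logits, toy_idx2entity, k=2))  # ['Scott Sharp', 'Vince Young']
```

Sorting the full entity vocabulary in Python is slow; with the real model, calling `torch.topk` with `k + len(SPECIAL_ENTITIES)` and filtering the result afterwards gives the same list much faster.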

(It seems the entity vocabulary does not cover the entity you gave, though.)
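For the original question (does the score drop for a wrong title?), one option is to compare the masked-entity logits of only the candidate titles, assuming those titles exist in the entity vocabulary. The sketch below applies a softmax over the candidates' logits. `score_candidates` is a hypothetical helper, and `toy_vocab` / `toy_logits` are made-up stand-ins for `tokenizer.entity_vocab` and `outputs.entity_logits[0][0].tolist()`. Candidates missing from the vocabulary get `None` instead of a score, since they would all collapse to `[UNK]`.

```python
import math
from typing import Dict, List, Optional, Sequence


def score_candidates(
    logits: Sequence[float], entity_vocab: Dict[str, int], candidates: List[str]
) -> Dict[str, Optional[float]]:
    """Softmax over the logits of in-vocabulary candidates; None for the rest."""
    scores: Dict[str, Optional[float]] = {c: None for c in candidates}
    known = [c for c in candidates if c in entity_vocab]
    if not known:
        return scores
    raw = [logits[entity_vocab[c]] for c in known]
    # Subtract the max before exponentiating for numerical stability.
    m = max(raw)
    exps = [math.exp(x - m) for x in raw]
    total = sum(exps)
    for c, e in zip(known, exps):
        scores[c] = e / total
    return scores


# Toy stand-ins; note that [UNK] has the highest logit but is not a candidate.
toy_vocab = {"[PAD]": 0, "[UNK]": 1, "[MASK]": 2, "Vince Young": 3, "Scott Sharp": 4}
toy_logits = [0.0, 5.0, -1.0, 3.0, 1.0]
scores = score_candidates(
    toy_logits, toy_vocab, ["Vince Young", "Scott Sharp", "Scott Young (American football)"]
)
print(scores)
```

A higher probability for the correct title than for a competing one would be the behaviour described in the question, but only when every candidate is in the vocabulary. The probabilities are relative to the candidate set, not a calibrated confidence.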
Hope it helps!

Thanks, ryo064. Good to know.
For the above example, it would be hard, since many players have played for the Cleveland Browns.

However, I saw that the original LUKE repository includes an entity disambiguation example (https://github.com/studio-ousia/luke/tree/master/examples/entity_disambiguation).
I will start from there.

christian-siagian changed discussion status to closed
