How do I load the model's tokenizer? What is the input format for each model?

by spiesg - opened

Hi, I want to test the original model on the dataset, but I don't know the expected input format. Here are my questions:

  1. What is the correspondence between entities and the ENTITY_ID tokens in add_tokens.json, and likewise between relations and the RELATION_ID tokens?
  2. Is the input for each model "[CLS][ENTITY_i][SEP][RELATION_i][SEP][MASK][SEP]" or "[CLS][ENTITY_i][RELATION_i][MASK][SEP]", with [MASK] used as the query?
  3. When [MASK] is used for prediction, how many candidate categories are there: only the number of entities, or the number of entities plus BERT's original vocabulary size?
  1. The ENTITY_ID and RELATION_ID tokens are assigned IDs in the order they are read from relations.txt and entity2textlong.txt.
  2. Yes, we use the [MASK] token for prediction.
  3. Our candidate set includes only entities, not BERT's original vocabulary. Specifically, we use the entity portion of the expanded vocabulary as the candidate set.
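The answers above can be sketched in plain Python. This is a minimal illustration, not the repository's actual code: the base vocabulary size, the token spellings, and the assumption that relations are appended before entities are all mine; only the file-order ID assignment, the input layout, and the entity-only candidate set come from the answers.

```python
# Assumed original vocabulary size of bert-base-uncased.
BASE_VOCAB = 30522

def build_added_token_ids(num_relations, num_entities, base=BASE_VOCAB):
    """Assign IDs to the added tokens in the order they are read from
    relations.txt and entity2textlong.txt, appended after BERT's vocabulary.
    (Relations-before-entities ordering is an assumption here.)"""
    token_to_id = {}
    next_id = base
    for i in range(num_relations):
        token_to_id[f"[RELATION_{i}]"] = next_id
        next_id += 1
    for i in range(num_entities):
        token_to_id[f"[ENTITY_{i}]"] = next_id
        next_id += 1
    return token_to_id

token_to_id = build_added_token_ids(num_relations=2, num_entities=3)

# Input form from answer 2, with [MASK] as the query position:
query = ["[CLS]", "[ENTITY_0]", "[SEP]", "[RELATION_1]", "[SEP]", "[MASK]", "[SEP]"]

# Answer 3: the candidate set at the [MASK] position is the entity portion of
# the expanded vocabulary only, never BERT's original word pieces.
candidate_ids = sorted(
    tid for tok, tid in token_to_id.items() if tok.startswith("[ENTITY_")
)
print(candidate_ids)  # the only IDs that would be scored at [MASK]
```

In practice you would load the tokenizer with `BertTokenizer.from_pretrained` on the released checkpoint (which already contains the added tokens) and slice the masked-LM logits at the [MASK] position down to `candidate_ids` before taking the argmax.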
