#4 by ArthurParkerhouse - opened

Hello,

Is there any way to use this on text that is longer than 512 words? I'm attempting to use it in a Google Colab, but it only seems to be reading a small portion of the text file I'm giving it. I'm trying to get all of the character names, locations, and orgs from a somewhat short novel, which I split into 10,000 words per text file.

Hello @ArthurParkerhouse,

No, there is no way to use this model on more than 512 tokens. This is due to the architecture of the RoBERTa model, which is defined that way. You might find models that accept more tokens, but there will always be a limit. I would recommend splitting your text into shorter pieces. I usually split my text by sentence before sending it to the model. You can keep track of the position of each sentence if you need the position of each entity in the original text. I hope that helps.
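
A minimal sketch of that sentence-splitting approach (the model id, the file name, and the naive period-based splitter below are only placeholders; a proper sentence splitter such as nltk's sent_tokenize would be more robust):

```python
from transformers import pipeline

# Placeholder model id; swap in whichever NER checkpoint you are using.
ner = pipeline("ner", model="Jean-Baptiste/roberta-large-ner-english",
               aggregation_strategy="simple")

def ner_long_text(text):
    """Run NER sentence by sentence and map offsets back to the full text."""
    entities = []
    start = 0
    # Naive split on ". "; each piece stays well under the 512-token limit
    # for normal prose. Replace with nltk.sent_tokenize for better accuracy.
    for sentence in text.split(". "):
        offset = text.find(sentence, start)   # where this sentence sits in the original text
        for ent in ner(sentence):
            ent["start"] += offset            # shift spans back to full-text coordinates
            ent["end"] += offset
            entities.append(ent)
        start = offset + len(sentence)
    return entities

with open("chapter_01.txt") as f:             # hypothetical input file
    results = ner_long_text(f.read())

for ent in results:
    print(ent["entity_group"], ent["word"], ent["start"], ent["end"])
```

This way each call stays within the model's limit, and the stored start/end values still point into the original, unsplit text.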

Thanks,
Jean-Baptiste
