Error when encoding large sentences

#9
by GuillaumeGrosjean - opened

Encoding large texts throws an error. When executing the following code:

from sentence_transformers import SentenceTransformer
model = SentenceTransformer("dangvantuan/sentence-camembert-large")
model.encode(":" * 1000)

It throws the error:

IndexError: index out of range in self

It seems that model.max_seq_length defaults to 514.
When I explicitly set model.max_seq_length = 512, everything works fine.
Long texts seem to be truncated to 514 tokens by default, but I think they should be truncated to 512: CamemBERT is RoBERTa-based, and RoBERTa-style models reserve two of their 514 position embeddings for special offsets, leaving only 512 usable token positions.
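For reference, a minimal sketch of the workaround described above; it simply caps max_seq_length before encoding (the 512 value reflects the usable positions, as noted above):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("dangvantuan/sentence-camembert-large")
# The config reports max_position_embeddings=514, but two positions are
# reserved by the RoBERTa-style position scheme, so cap at 512.
model.max_seq_length = 512
embedding = model.encode(":" * 1000)  # long input is now truncated safely
print(embedding.shape)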
