Maximum context length actually used by the model

#5
by lthamm - opened

I hope this isn't a question with a very obvious answer, but how much context does this model actually use / was it trained on?
RoBERTa has a maximum context length of 512 tokens (minus some reserved tokens), and when I load the model and check model.max_seq_length, it is indeed 512 tokens.

However, in sentence_bert_config.json I find:

{
  "max_seq_length": 128
}

Thank you for open-sourcing this great model!
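
For reference, the value in question can be checked directly. A minimal sketch, assuming sentence-transformers is installed; the model id below is an assumption (the thread does not name the repository), so substitute the actual one:

# Limit applied by the sentence-transformers pipeline (read from sentence_bert_config.json)
# vs. the upper bound of the underlying tokenizer/checkpoint.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("T-Systems-onsite/cross-en-de-roberta-sentence-transformer")
print("max_seq_length:", model.max_seq_length)
print("tokenizer model_max_length:", model.tokenizer.model_max_length)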

T-Systems on site services GmbH org

Yes. It is not the full 512. It was 128.

T-Systems on site services GmbH org

Does this answer your question?
If yes, please close the discussion.

Many thanks
Philip

This helps a lot!

Just to be clear: what exactly happens when I pass in an input longer than 128 tokens?
Since model.max_seq_length says 512, will it still process the input, just with worse quality?
Or will it actually truncate the input?

T-Systems on site services GmbH org

I think it will not crash, and as far as I know it will not truncate either.
My guess is that the quality is just degraded.
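
One way to see whether a given input even reaches the 128-token limit is to tokenize it first. A rough sketch under the same assumptions as above (the model id is not stated in the thread):

# Count tokens before encoding to see whether an input exceeds max_seq_length.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("T-Systems-onsite/cross-en-de-roberta-sentence-transformer")
text = "A longer paragraph whose token count you want to check ..."
n_tokens = len(model.tokenizer(text)["input_ids"])  # includes special tokens
print(n_tokens, "tokens vs. max_seq_length =", model.max_seq_length)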

Thank you!
In case anyone else comes across this: while inputs between 128 and 512 tokens may or may not be truncated, everything above 512 tokens definitely will be
(https://github.com/UKPLab/sentence-transformers/issues/181).
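
For anyone who wants the full 512-token window, the pipeline limit can be raised after loading. A hedged sketch under the same assumptions as above; whether inputs longer than the 128 tokens used in training actually embed well is a separate question, as discussed here:

# Raise the pipeline limit to the checkpoint's 512-token window.
# Anything beyond that is still cut off by the tokenizer.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("T-Systems-onsite/cross-en-de-roberta-sentence-transformer")
model.max_seq_length = 512
embeddings = model.encode(["a very long document ..."])
print(embeddings.shape)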

GOATransformers 🐐

lthamm changed discussion status to closed
