"Token indices sequence length is longer than the specified maximum sequence length" when using HHEM-2.1-open

#13

by ytangccc - opened Aug 2

Aug 2

I see the message "Token indices sequence length is longer than the specified maximum sequence length for this model (624 > 512). Running this sequence through the model will result in indexing errors" when using HHEM-2.1-open. The run still completed, but I'm wondering why the message appears, since HHEM-2.1 is supposed to have an unlimited context length?

forrest-vectara

Vectara org Aug 2

Don't worry about it. This is a notification inherited from the foundation model, T5-base.

maverick84

Aug 19

Is there a restriction on using a different tokenizer?

forrest-vectara

Vectara org Aug 19

edited Aug 19

You have to use the same tokenizer that Google T5 uses. Otherwise, tokens will be mapped to different integer indices, which will then be mapped to the wrong token embeddings. This is a limitation of any Transformer-based model, or of any model that relies on an embedding layer.
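To make the point about mismatched indices concrete, here is a minimal toy sketch in plain Python. It does not use the real T5 tokenizer or HHEM weights; the two vocabularies (`vocab_a`, `vocab_b`), the tiny `embedding_table`, and the helpers `encode` and `embed` are all hypothetical names made up for illustration. It only shows that when an embedding table was built for one token-to-ID mapping, feeding it IDs from another mapping silently returns the wrong vectors.

```python
# Two toy vocabularies with the same tokens but different integer IDs.
# vocab_a stands in for the tokenizer the model was trained with;
# vocab_b stands in for some other tokenizer. Both are made up.
vocab_a = {"the": 0, "cat": 1, "sat": 2}
vocab_b = {"sat": 0, "the": 1, "cat": 2}

# Toy embedding table "trained" with vocab_a: row i is the vector for ID i.
embedding_table = [
    [1.0, 0.0],  # row 0: "the" under vocab_a
    [0.0, 1.0],  # row 1: "cat" under vocab_a
    [1.0, 1.0],  # row 2: "sat" under vocab_a
]


def encode(tokens, vocab):
    """Map each token to its integer ID under the given vocabulary."""
    return [vocab[t] for t in tokens]


def embed(ids):
    """Look up each ID's row in the embedding table."""
    return [embedding_table[i] for i in ids]


tokens = ["the", "cat", "sat"]

ids_a = encode(tokens, vocab_a)  # IDs from the matching vocabulary
ids_b = encode(tokens, vocab_b)  # IDs from the mismatched vocabulary

right = embed(ids_a)  # each token gets its own vector
wrong = embed(ids_b)  # no error is raised, but the vectors are shuffled

print("matching tokenizer:  ", ids_a, right)
print("mismatched tokenizer:", ids_b, wrong)
```

Note that the mismatched lookup does not fail: "the" simply receives the vector that belongs to "cat". With a real model the effect would be similar in kind, which is why the output would be meaningless rather than an obvious crash.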
