hkunlp/instructor-xl


Technical Information

by gsaivinay - opened May 3, 2023

Discussion

gsaivinay

May 3, 2023

Hello, thanks for this model.

Could you please provide information about this model?

What is the maximum input sequence length for generating embeddings? Will this model work well for document text with a sequence length of 1024 tokens?
What is the dimensionality of the output embedding vectors?

multi-train

NLP Group of The University of Hong Kong org May 16, 2023

Hi, thanks a lot for your interest in the INSTRUCTOR model!

By default, the maximum input length is 512 tokens, but the model should also be compatible with documents that have a sequence length of 1024.
The embedding vectors have 768 dimensions.
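For readers who want to confirm these two numbers on their own installation, here is a minimal sketch. It assumes the `InstructorEmbedding` package is installed and that the loaded model exposes the sentence-transformers style `max_seq_length` attribute and `get_sentence_embedding_dimension()` method. The helper names `describe_limits` and `load_instructor_xl` are hypothetical and not part of the library.

```python
from typing import Any, Tuple


def describe_limits(model: Any) -> Tuple[int, int]:
    """Return (maximum input length in tokens, embedding dimension).

    Assumption: `model` follows the sentence-transformers interface, i.e. it
    has a `max_seq_length` attribute and a
    `get_sentence_embedding_dimension()` method.
    """
    return model.max_seq_length, model.get_sentence_embedding_dimension()


def load_instructor_xl() -> Any:
    """Load the model discussed in this thread.

    Assumption: the InstructorEmbedding package is installed and the weights
    can be downloaded. The import is done lazily so that `describe_limits`
    can be used without the package.
    """
    from InstructorEmbedding import INSTRUCTOR

    return INSTRUCTOR("hkunlp/instructor-xl")
```

Calling `describe_limits(load_instructor_xl())` should report the default maximum length and the embedding dimension quoted in the reply above, provided the library behaves as assumed here.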

Feel free to add any further questions or comments.

gsaivinay

May 16, 2023

Thank you very much for your reply.

I have a few thousand documents, and some of them can be as long as 2500+ tokens. If I split those longer documents into 512-token chunks, will this model be effective at retrieving them?

multi-train

NLP Group of The University of Hong Kong org May 17, 2023

Yes. Because the model was trained with a maximum length of 512 tokens, it is expected to work better if long documents are split into shorter chunks.
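The splitting step could look roughly like the sketch below. It works on a list of tokens that has already been produced by whatever tokenizer you use. The function name `chunk_tokens` and the optional `overlap` parameter are illustrative assumptions, not something the model authors prescribe.

```python
from typing import List, Sequence


def chunk_tokens(
    tokens: Sequence[str], max_len: int = 512, overlap: int = 0
) -> List[List[str]]:
    """Split a token sequence into consecutive chunks of at most `max_len`.

    `overlap` is the number of tokens repeated at the start of each following
    chunk (an optional assumption, set to 0 for plain splitting).
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")

    step = max_len - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + max_len]))
        # Stop once the current chunk already reaches the end of the input.
        if start + max_len >= len(tokens):
            break
    return chunks


# A 2500-token document becomes five chunks: four of 512 tokens and one of 452.
document = ["tok"] * 2500
pieces = chunk_tokens(document, max_len=512)
print([len(p) for p in pieces])
```

Token counts should come from the model's own tokenizer rather than from whitespace splitting, since the 512 limit is measured in model tokens.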

Feel free to add any further questions or comments!

j3cordeiro

Jun 22, 2023

Is the text of the instruction counted towards the maximum number of tokens? For example, if the instruction has 12 tokens, is the maximum number of tokens in the text then 500?

multi-train

NLP Group of The University of Hong Kong org Jul 2, 2023

Yes, the instruction is included in the maximum length calculation.
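The arithmetic from the question can be sketched as a small helper. The name `text_token_budget` is hypothetical, and the `reserved_tokens` parameter is an assumption that leaves room for any special tokens the tokenizer may add. It defaults to 0 so that the example matches the numbers in the question.

```python
def text_token_budget(
    max_len: int = 512, instruction_tokens: int = 0, reserved_tokens: int = 0
) -> int:
    """Return how many tokens remain for the text after the instruction.

    `reserved_tokens` is an assumption: space set aside for special tokens
    that a tokenizer might append. It is not a value given in the thread.
    """
    if min(max_len, instruction_tokens, reserved_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    # Never report a negative budget when the instruction alone is too long.
    return max(0, max_len - instruction_tokens - reserved_tokens)


# The case from the question: a 12-token instruction leaves 500 tokens.
print(text_token_budget(max_len=512, instruction_tokens=12))  # 500
```

When chunking long documents, this budget rather than the full 512 would be the chunk size to aim for, since every chunk is encoded together with its instruction.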
