config.json for pooling is incorrect

#4
by HectorL - opened

The config.json for pooling includes arguments that are not valid for the Pooling function.
The following keys are in the config (not in this order):

"word_embedding_dimension": 768,
"pooling_mode_cls_token": false,
"pooling_mode_mean_tokens": true,
"pooling_mode_max_tokens": false,
"pooling_mode_mean_sqrt_len_tokens": false,

"pooling_mode_weightedmean_tokens": false,
"pooling_mode_lasttoken": false

The Pooling function only accepts the top 5 arguments.
The model will not instantiate unless the bottom two keys are removed from the config.

I'm cloning the repo and using:
model = SentenceTransformer("local_path")

NLP Group of The University of Hong Kong org

Hi, thanks a lot for your interest in INSTRUCTOR!

Because we have overridden several classes of the sentence-transformers library, you may need to install the InstructorEmbedding package by following the instructions at https://github.com/HKUNLP/instructor-embedding#installation.

After that, you can use our INSTRUCTOR model as follows:

from InstructorEmbedding import INSTRUCTOR
model = INSTRUCTOR('hkunlp/instructor-large')

Feel free to add any further questions or comments!

No issues using your recommended method. I was also able to get the cloning method to work by removing the unaccepted keys. Are there any negative consequences to removing the following keys from the config?

"pooling_mode_weightedmean_tokens": false,
"pooling_mode_lasttoken": false

It's working great for my embedding task. Just curious about this.
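
For anyone repeating the cloning workaround, here is a minimal sketch of stripping the two keys from the pooling config with a script instead of by hand. The helper names (`strip_unsupported_keys`, `clean_pooling_config`) are made up for illustration, and the config location is an assumption based on the usual sentence-transformers layout, so adjust it to wherever the pooling config.json sits in your clone. It only edits the JSON file and says nothing about how the model behaves afterwards.

```python
import json
from pathlib import Path

# The two keys reported in this thread as rejected by the installed Pooling class.
UNSUPPORTED_KEYS = ("pooling_mode_weightedmean_tokens", "pooling_mode_lasttoken")


def strip_unsupported_keys(config: dict, unsupported=UNSUPPORTED_KEYS) -> dict:
    """Return a copy of `config` without the keys listed in `unsupported`."""
    return {key: value for key, value in config.items() if key not in unsupported}


def clean_pooling_config(config_path: str) -> dict:
    """Rewrite the pooling config file in place and return the cleaned dict."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    cleaned = strip_unsupported_keys(config)
    path.write_text(json.dumps(cleaned, indent=2) + "\n", encoding="utf-8")
    return cleaned
```

It could be called as `clean_pooling_config("local_path/1_Pooling/config.json")`, where the `1_Pooling` subfolder is an assumption and may differ in your clone. Keep a backup of the original file, since the rewrite happens in place.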

NLP Group of The University of Hong Kong org

Hi, thanks a lot for your comments!

If you remove the extra keys and load the model with the plain SentenceTransformer class, it seems that you will not be able to add instructions when computing embeddings.

HectorL changed discussion status to closed
