config.json for pooling is incorrect
The config JSON for pooling includes arguments that the Pooling function does not accept.
The following are in the config (not in this order):
"word_embedding_dimension": 768,
"pooling_mode_cls_token": false,
"pooling_mode_mean_tokens": true,
"pooling_mode_max_tokens": false,
"pooling_mode_mean_sqrt_len_tokens": false,
"pooling_mode_weightedmean_tokens": false,
"pooling_mode_lasttoken": false
The Pooling function accepts only the first five arguments listed above.
The model will not instantiate until the last two keys are removed from the config.
I'm cloning the repo and using:
model = SentenceTransformer("local_path")
Hi, thanks a lot for your interest in INSTRUCTOR!
Because we have overridden several classes of the sentence-transformers library, you may need to install the InstructorEmbedding package by following the instructions at https://github.com/HKUNLP/instructor-embedding#installation.
After that, you can use our INSTRUCTOR model as follows:
from InstructorEmbedding import INSTRUCTOR
model = INSTRUCTOR('hkunlp/instructor-large')
Feel free to add any further questions or comments!
No issues using your recommended method. I was also able to get the cloning method to work by removing the unaccepted keys. Are there any negative consequences to removing the following keys from the config?
"pooling_mode_weightedmean_tokens": false,
"pooling_mode_lasttoken": false
It's working great for my embedding task. Just curious about this.
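For anyone following the same workaround, here is a minimal sketch of removing the unaccepted keys programmatically instead of editing the file by hand. The `pooling_stub` function is a hypothetical stand-in that accepts only the five arguments reported as valid in this thread. It is not the library's actual Pooling class, and `drop_unsupported_keys` is a helper name made up for this example.

```python
import inspect

# Pooling config as quoted in this thread.
CONFIG = {
    "word_embedding_dimension": 768,
    "pooling_mode_cls_token": False,
    "pooling_mode_mean_tokens": True,
    "pooling_mode_max_tokens": False,
    "pooling_mode_mean_sqrt_len_tokens": False,
    "pooling_mode_weightedmean_tokens": False,
    "pooling_mode_lasttoken": False,
}


def pooling_stub(
    word_embedding_dimension,
    pooling_mode_cls_token=False,
    pooling_mode_mean_tokens=True,
    pooling_mode_max_tokens=False,
    pooling_mode_mean_sqrt_len_tokens=False,
):
    """Hypothetical stand-in for a Pooling constructor with five arguments."""
    return {
        "word_embedding_dimension": word_embedding_dimension,
        "pooling_mode_cls_token": pooling_mode_cls_token,
        "pooling_mode_mean_tokens": pooling_mode_mean_tokens,
        "pooling_mode_max_tokens": pooling_mode_max_tokens,
        "pooling_mode_mean_sqrt_len_tokens": pooling_mode_mean_sqrt_len_tokens,
    }


def drop_unsupported_keys(config, target):
    """Split config into keys that target accepts and keys it would reject."""
    params = inspect.signature(target).parameters
    # A target that takes **kwargs accepts every key, so nothing is dropped.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(config), []
    accepted = set(params)
    kept = {k: v for k, v in config.items() if k in accepted}
    dropped = sorted(k for k in config if k not in accepted)
    return kept, dropped


kept, dropped = drop_unsupported_keys(CONFIG, pooling_stub)
print("dropped:", dropped)
module = pooling_stub(**kept)  # succeeds once the extra keys are gone
```

Passing a class as `target` inspects its constructor, so the same helper could in principle be pointed at the Pooling class from whichever library version is installed. That part is an assumption and has not been checked against any specific release.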
Hi, thanks a lot for your comments!
If you remove the unnecessary keys and load the model with the SentenceTransformer class, it seems that you will not be able to add instructions when calculating embeddings.
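To illustrate what adding instructions looks like with the InstructorEmbedding package, here is a minimal sketch. It assumes the package is installed and that `encode` takes a list of `[instruction, sentence]` pairs, as in the project's README. The helper names `build_pairs` and `embed_with_instruction`, and the instruction text in the usage comment, are made up for this example.

```python
def build_pairs(instruction, sentences):
    """Pair one instruction with every sentence: [[instruction, sentence], ...]."""
    return [[instruction, sentence] for sentence in sentences]


def embed_with_instruction(instruction, sentences,
                           model_name="hkunlp/instructor-large"):
    """Encode sentences with an instruction (downloads the model on first use)."""
    # Imported here so build_pairs can be used without the package installed.
    from InstructorEmbedding import INSTRUCTOR

    model = INSTRUCTOR(model_name)
    return model.encode(build_pairs(instruction, sentences))


# Example call (not executed here because it needs the model weights):
# embeddings = embed_with_instruction(
#     "Represent the Science title:",
#     ["3D ActionSLAM: wearable person tracking in multi-floor environments"],
# )
```

Loading the same checkpoint through the plain SentenceTransformer class would skip this instruction pairing, which is the limitation described above.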