Issues when using in Colab & Sentence Transformers

#6
by Vvkishere - opened

I get a
TypeError: __init__() got an unexpected keyword argument 'pooling_mode_weightedmean_tokens'

when trying to load the model using sentence-transformers in a Google Colab Pro notebook. I am not sure how to resolve the issue.

NLP Group of The University of Hong Kong org

Thanks for your question!

You may try installing the customized sentence-transformers here: https://github.com/HKUNLP/instructor-embedding/tree/main/sentence-transformers, and using transformers 4.20.0.

Feel free to leave further questions!

How does one go about doing that specifically? What is the command one must run?

NLP Group of The University of Hong Kong org

Thanks for the question!

Specifically, you may first clone the repository:

git clone https://github.com/HKUNLP/instructor-embedding

Then go to the sentence-transformers folder:

cd instructor-embedding/sentence-transformers

Finally, you will be able to install the customized package:

pip install -e .

Feel free to leave your further questions here.

Has this been integrated into the main Hugging Face sentence-transformers yet?

NLP Group of The University of Hong Kong org

No, because we have overwritten several classes in the sentence-transformers library to incorporate instructions.

Hi, I cannot find the path to sentence_transformers

NLP Group of The University of Hong Kong org

Hi, you may want to install sentence-transformers via pip install sentence-transformers.

There aren't many results on Google related to this issue except this thread. I have sentence-transformers installed and I'm still getting the error from the original post.

TypeError: Pooling.__init__() got an unexpected keyword argument 'pooling_mode_weightedmean_tokens'

Here's the script that I've used:

from langchain.document_loaders import PyPDFLoader
from langchain.vectorstores import Chroma
from langchain.embeddings import HuggingFaceEmbeddings
from InstructorEmbedding import INSTRUCTOR

# Load the INSTRUCTOR model
model = INSTRUCTOR('hkunlp/instructor-xl')

# Load the PDF and split it into pages
pdf_path = "./document.pdf"
loader = PyPDFLoader(pdf_path)
pages = loader.load_and_split()

# Embeddings for the vector store
embeddings = HuggingFaceEmbeddings(model_name="hkunlp/instructor-xl")

# Build and persist a Chroma vector store from the pages
db = Chroma.from_documents(documents=pages, embedding=embeddings, persist_directory="./chroma_db")
db.persist()

Any idea how to solve this? Thanks!

EDIT: It works fine with "sentence-transformers/all-MiniLM-L6-v2" model for example.
EDIT 2: This seems to work https://github.com/Muennighoff/sgpt/issues/14#issuecomment-1405205453

NLP Group of The University of Hong Kong org

You may try to install sentence-transformers 2.2.2.


Despite trying that, I still get the exact same error.

Here's my hack to solve it (until there's an official fix):

Let INSTR stand for your INSTRUCTOR model (instructor-xl, instructor-large, or instructor-base):

  1. Edit the pooling config file at models/hkunlp/INSTR/1_Pooling/config.json

  2. Remove the offending lines with "pooling_mode_weightedmean_tokens" and "pooling_mode_lasttoken":
    change this:
    {
    "word_embedding_dimension": 768,
    "pooling_mode_cls_token": false,
    "pooling_mode_mean_tokens": true,
    "pooling_mode_max_tokens": false,
    "pooling_mode_mean_sqrt_len_tokens": false,
    "pooling_mode_weightedmean_tokens": false,
    "pooling_mode_lasttoken": false
    }
    to this:
    {
    "word_embedding_dimension": 768,
    "pooling_mode_cls_token": false,
    "pooling_mode_mean_tokens": true,
    "pooling_mode_max_tokens": false,
    "pooling_mode_mean_sqrt_len_tokens": false
    }

  3. Remember to also remove the "," at the end of the line above:
    change "pooling_mode_mean_sqrt_len_tokens": false,
    to "pooling_mode_mean_sqrt_len_tokens": false
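If you'd rather script this edit than do it by hand, the same cleanup can be done programmatically. This is a minimal sketch; the config path and the two key names come from the steps above, and the function name is just illustrative:

```python
import json

# Keys that the older Pooling.__init__() rejects (per the error in this thread)
UNSUPPORTED_KEYS = {"pooling_mode_weightedmean_tokens", "pooling_mode_lasttoken"}

def strip_unsupported_pooling_keys(config_path):
    """Remove pooling-config keys that older sentence-transformers versions reject."""
    with open(config_path) as f:
        config = json.load(f)
    cleaned = {k: v for k, v in config.items() if k not in UNSUPPORTED_KEYS}
    with open(config_path, "w") as f:
        json.dump(cleaned, f, indent=2)
    return cleaned

# Example (path depends on where the model was downloaded):
# strip_unsupported_pooling_keys("models/hkunlp/instructor-xl/1_Pooling/config.json")
```

Because json.dump rewrites the whole file, you don't need to worry about the trailing comma from step 3; the output is always valid JSON.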

Hope this works for you!
Or you could edit the Pooling.py in your installed version of sentence-transformers, as the original author suggested: https://github.com/Muennighoff/sgpt/issues/14#issuecomment-1405205453
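That linked workaround patches sentence-transformers itself so the constructor ignores keys it doesn't recognize. The general idea, filtering a loaded config down to the parameters a constructor actually accepts, can be sketched without sentence-transformers installed; the Pooling class below is a hypothetical stand-in for the real one, not its actual code:

```python
import inspect

class Pooling:
    """Hypothetical stand-in for an older Pooling class that lacks the
    newer pooling-mode keyword arguments."""
    def __init__(self, word_embedding_dimension,
                 pooling_mode_cls_token=False,
                 pooling_mode_mean_tokens=True,
                 pooling_mode_max_tokens=False,
                 pooling_mode_mean_sqrt_len_tokens=False):
        self.word_embedding_dimension = word_embedding_dimension
        self.pooling_mode_mean_tokens = pooling_mode_mean_tokens

def construct_ignoring_unknown(cls, config):
    """Build cls from a config dict, dropping keys its __init__ doesn't accept."""
    accepted = set(inspect.signature(cls.__init__).parameters) - {"self"}
    return cls(**{k: v for k, v in config.items() if k in accepted})

config = {
    "word_embedding_dimension": 768,
    "pooling_mode_mean_tokens": True,
    "pooling_mode_weightedmean_tokens": False,  # unknown to this Pooling version
    "pooling_mode_lasttoken": False,            # unknown to this Pooling version
}
pooling = construct_ignoring_unknown(Pooling, config)  # no TypeError
```

Calling Pooling(**config) directly would raise the same TypeError as in the original post; the filter is what makes newer config files loadable by an older class.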

Thank you, it worked for me.

This issue can be solved by updating sentence-transformers.
