hkunlp/instructor-large · Sentence transformer gives loading error

Oct 18, 2023

I tried to run the sentence transformer example but it encountered loading error as follows:

TypeError                                 Traceback (most recent call last)
/home/21zz42/temp/example.ipynb Cell 3 line 1
     10 repos = ['hkunlp/instructor-large', 'intfloat/e5-large']
     12 for repo in repos:
---> 13     model = SentenceTransformer(repo)
     14     model_time = 0
     15     for _ in range(times):

File ~/temp/.venv/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py:95, in SentenceTransformer.__init__(self, model_name_or_path, modules, device, cache_folder, use_auth_token)
     87         snapshot_download(model_name_or_path,
     88                             cache_dir=cache_folder,
     89                             library_name='sentence-transformers',
     90                             library_version=__version__,
     91                             ignore_files=['flax_model.msgpack', 'rust_model.ot', 'tf_model.h5'],
     92                             use_auth_token=use_auth_token)
     94 if os.path.exists(os.path.join(model_path, 'modules.json')):    #Load as SentenceTransformer model
---> 95     modules = self._load_sbert_model(model_path)
     96 else:   #Load with AutoModel
     97     modules = self._load_auto_model(model_path)

File ~/temp/.venv/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py:840, in SentenceTransformer._load_sbert_model(self, model_path)
    838 for module_config in modules_config:
    839     module_class = import_from_string(module_config['type'])
--> 840     module = module_class.load(os.path.join(model_path, module_config['path']))
...
    117 with open(os.path.join(input_path, 'config.json')) as fIn:
    118     config = json.load(fIn)
--> 120 return Pooling(**config)

TypeError: Pooling.__init__() got an unexpected keyword argument 'pooling_mode_weightedmean_tokens'

Here are the original code:

from sentence_transformers import SentenceTransformer
input_texts = [
    'query: how much protein should a female eat',
    'query: summit define',
    "passage: As a general guideline, the CDC's average requirement of protein for women ages 19 to 70 is 46 grams per day. But, as you can see from this chart, you'll need to increase that if you're expecting or training for a marathon. Check out the chart below to see how much protein you should be eating each day.",
    "passage: Definition of summit for English Language Learners. : 1  the highest point of a mountain : the top of a mountain. : 2  the highest level. : 3  a meeting or series of meetings between the leaders of two or more governments."
]

times = 1
repos = ['hkunlp/instructor-large']

for repo in repos:
    model = SentenceTransformer(repo)
    model_time = 0
    for _ in range(times):
        time_start = time.time()
        embeddings = model.encode(input_texts, normalize_embeddings=True)
        time_taken = time.time() - time_start
        model_time += time_taken
    print(f'{repo} time: {model_time}')

I am using Python 3.10.12 on Linux OS.

multi-train

NLP Group of The University of Hong Kong org Oct 18, 2023

Hi, what is your sentence-transformer version? FYI, I install the version 2.2.2

zhiminy

Oct 18, 2023

Hi, what is your sentence-transformer version? FYI, I install the version 2.2.2

Same, I am using 2.2.2 as well

multi-train

NLP Group of The University of Hong Kong org Oct 18, 2023

Oh, you should use INSTRUCTOR to load the model:

from InstructorEmbedding import INSTRUCTOR
model = INSTRUCTOR('hkunlp/instructor-large')

For details, you may refer to https://github.com/xlang-ai/instructor-embedding#getting-started.

zhiminy changed discussion status to closed Nov 27, 2023

inumulaisk

Jan 30, 2024

ers\SentenceTransformer.py", line 194, in init
modules = self._load_sbert_model(
^^^^^^^^^^^^^^^^^^^^^^^
TypeError: INSTRUCTOR._load_sbert_model() got an unexpected keyword argument 'token'

mruthyunkb

Jan 30, 2024

Can someone help. getting below error.

TypeError: _load_sbert_model() got an unexpected keyword argument 'token'

TypeError Traceback (most recent call last)
in

----> 2 model = INSTRUCTOR('/instructor-large/')
3 device = 'cuda'
4 model.to(device)
5 model.eval()

/local_disk0/.ephemeral_nfs/envs/pythonEnv-8f7450cd-10dc-4b27-9ec2-64ba748b19f9/lib/python3.8/site-packages/sentence_transformers/SentenceTransformer.py in init(self, model_name_or_path, modules, device, cache_folder, trust_remote_code, revision, token, use_auth_token)
192
193 if is_sentence_transformer_model(model_name_or_path, token, cache_folder=cache_folder, revision=revision):
--> 194 modules = self._load_sbert_model(
195 model_name_or_path,
196 token=token,

TypeError: _load_sbert_model() got an unexpected keyword argument 'token'

utkarshkrc2

Mar 2, 2024

•

edited Mar 2, 2024

To solve this problem, use Sentence Transformer Module separately in your program..

import streamlit as st
from pypdf import PdfReader
from dotenv import load_dotenv
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.embeddings.huggingface import HuggingFaceInstructEmbeddings
from langchain.vectorstores.faiss import FAISS
import torch

from sentence_transformers import SentenceTransformer # Use SentenceTransformer module to use Hugging face Model

def embedding_store(chunked_text):
# embeddings = OpenAIEmbeddings() # Creating object of class OPenAIEmbeddings

model = SentenceTransformer('hkunlp/instructor-xl')
model_kwargs = {'device': 'cpu'} 
encode_kwargs = {'normalize_embeddings': True}

embeddings = HuggingFaceInstructEmbeddings(model_name=model,model_kwargs=model_kwargs,encode_kwargs=encode_kwargs)

vectore_store = FAISS.from_texts(embedding=embeddings,texts=chunked_text)

return vectore_store

DQData

Mar 8, 2024

@utkarshkrc2 I got the following:

ValidationError: 1 validation error for HuggingFaceInstructEmbeddings
model_name
str type expected (type=type_error.str)

Which version of langchain are you using? I tested two versions (0.1.9 and the latest 0.1.11) but got the same problem with both. They don't accept SentenceTransformer as an argument.

Nishag

Mar 15, 2024

•

edited Mar 15, 2024

@utkarshkrc2 , Even I am getting the same error:

ValidationError: 1 validation error for HuggingFaceInstructEmbeddings model_name str type expected (type=type_error.str)

Below is the code I am using:

def get_vectorstore(text_chunks):
model = SentenceTransformer('hkunlp/instructor-xl') #choosing different models does not make a different, tried with thenlper/gte-base and hkunlp/instructor-large
model_kwargs = {'device': 'cpu'} #changing the device to gpu didn't make any difference
encode_kwargs = {'normalize_embeddings': True}

embeddings = HuggingFaceInstructEmbeddings(model_name=model,model_kwargs=model_kwargs,encode_kwargs=encode_kwargs) 
#embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-xl") 
vectorstore = FAISS.from_texts(texts=text_chunks, embedding=embeddings)
return vectorstore

Following is the version of langchain and sentence-transformers I am using.
Name: sentence-transformers
Version: 2.5.1
Name: langchain
Version: 0.1.12

This error comes after it processes the pdf to and finishes generating some of the embeddings, you can see that from the below logs:

modules.json: 100%|████████████████████████████████████████████████████████████████████████████████| 461/461 [00:00<00:00, 647kB/s]
config_sentence_transformers.json: 100%|███████████████████████████████████████████████████████████| 122/122 [00:00<00:00, 338kB/s]
README.md: 100%|██████████████████████████████████████████████████████████████████████████████| 66.3k/66.3k [00:00<00:00, 7.72MB/s]
sentence_bert_config.json: 100%|█████████████████████████████████████████████████████████████████| 53.0/53.0 [00:00<00:00, 206kB/s]
config.json: 100%|████████████████████████████████████████████████████████████████████████████| 1.53k/1.53k [00:00<00:00, 6.43MB/s]
pytorch_model.bin: 16%|███████████ pytorch_model.bin: 16%|███████████▋ pytorch_model.bin: 17%|████████████▏ pytorch_model.bin: 30%|█████████████████████ pytorch_model.bin: 31%|█████████████████████▋ pytorch_model.bin: 31%|██████████████████████▏ pytorch_model.bin: 32%|██████████████████████▊ pytorch_model.bin: 100%|██████████████████████████████████████████████████████████████████████| 1.34G/1.34G [02:01<00:00, 11.0MB/s]
tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████████| 2.41k/2.41k [00:00<00:00, 5.16MB/s]
spiece.model: 100%|██████████████████████████████████████████████████████████████████████████████| 792k/792k [00:01<00:00, 771kB/s]
tokenizer.json: 100%|█████████████████████████████████████████████████████████████████████████| 2.42M/2.42M [00:00<00:00, 5.06MB/s]
special_tokens_map.json: 100%|████████████████████████████████████████████████████████████████| 2.20k/2.20k [00:00<00:00, 25.0MB/s]
1_Pooling/config.json: 100%|███████████████████████████████████████████████████████████████████████| 270/270 [00:00<00:00, 885kB/s]
2_Dense/config.json: 100%|█████████████████████████████████████████████████████████████████████████| 116/116 [00:00<00:00, 459kB/s]
2_Dense/pytorch_model.bin: 100%|██████████████████████████████████████████████████████████████| 3.15M/3.15M [00:00<00:00, 8.11MB/s]
2024-03-15 22:19:04.739 Uncaught app exception
Traceback (most recent call last):

File "/test/LangchainApp.py", line 51, in get_vectorstore
embeddings = HuggingFaceInstructEmbeddings(model_name=model,model_kwargs=model_kwargs,encode_kwargs=encode_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/test/.venv/lib/python3.12/site-packages/langchain_community/embeddings/huggingface.py", line 149, in init
super().init(**kwargs)
File "/test/.venv/lib/python3.12/site-packages/pydantic/v1/main.py", line 341, in init
raise validation_error
pydantic.v1.error_wrappers.ValidationError: 1 validation error for HuggingFaceInstructEmbeddings
model_name
str type expected (type=type_error.str)

Would be really helpful if someone has any idea why I get this error.

utkarshkrc2

Mar 15, 2024

Hi, Everyone
Please Use the required Version of Hugging Face Transformer as I used.
Also, For Validation Error Please install python "Setuptools" Package.

Please Ping me, if it is working or not in your case. I also got the same error while executing my program.

Nishag

Mar 16, 2024

@utkarshkrc2 , thanks for replying. I did try to downgrade the sentence-transformers version to 2.2.2 as I did notice some threads with similar issues and it helped to use this version, however there are other dependencies I am using which didn't work with this.

I will try to downgrade the other dependencies also and test it with that.

Do you know if its a bug with newer version of sentence-transformers?

Nishag

Mar 18, 2024

@utkarshkrc2 , so I fixed the dependencies and got the version of sentence-transformers version to 2.2.2, it seems to have passed through the previous error but now I get this error:

File "/test/LangchainApp.py", line 48, in get_vectorstore
model = SentenceTransformer('hkunlp/instructor-xl')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/test/.venv/lib/python3.12/site-packages/sentence_transformers/SentenceTransformer.py", line 95, in init
modules = self._load_sbert_model(model_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/test/.venv/lib/python3.12/site-packages/sentence_transformers/SentenceTransformer.py", line 840, in _load_sbert_model
module = module_class.load(os.path.join(model_path, module_config['path']))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/test/.venv/lib/python3.12/site-packages/sentence_transformers/models/Pooling.py", line 120, in load
return Pooling(**config)
^^^^^^^^^^^^^^^^^

Please let me if know if you can help, I can ping you (do let me know how).

Nishag

Mar 21, 2024

•

edited Mar 22, 2024

@utkarshkrc2 , FYI, the code you showed in your comment does not work. Even with sentence-transformers version 2.2.2 and langchain version of 0.1.2, we get the "TypeError: Pooling.init() got an unexpected keyword argument 'pooling_mode_weightedmean_tokens'" error.

If we change the config.json used by Pooling.py file and remove 'pooling_mode_weightedmean_tokens' and 'pooling_mode_lasttoken' as per https://huggingface.co/hkunlp/instructor-base/discussions/6, we are back to the original error of 'ValidationError: 1 validation error for HuggingFaceInstructEmbeddings model_name str type expected (type=type_error.str)'.

Still trying to figure out how to solve this.

Edit: It worked for me after changing the HuggingFaceInstructEmbeddings constructor with only the model name and no other arguments. Also I didn't need to downgrade langchain version, I am still using 0.1.13.

HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-xl")

itzyash01

Jun 17, 2024

Hi, I am getting
TypeError: INSTRUCTOR._load_sbert_model() got an unexpected keyword argument 'token'
I am using langachain==0.1.2 and sentence-transformers==2.2.2

zhiminy changed discussion status to open Jun 17, 2024

ergito

Jul 9, 2024

•

edited Jul 10, 2024

langchain_huggingface
tesnsorboard
tensorflow
setuptools
transformer_sentence 2.2.2
instructor
embeddinginstructor
embeddings
transformer
this are the one need to be installed on my case

alice86

Sep 3, 2024

•

edited Sep 3, 2024

any update with new version?

not the transformer_sentence 2.2.2

I already update, it is still exist problem.
InstructorEmbedding==1.0.1
sentence-transformers==3.0.1

nbpe97

Sep 22, 2024

Works for me.
Python3.11, pip3.11 installed.
pip3.11 list
|Package| Version|

attrs 24.2.0|
cattrs 24.1.1|
certifi 2024.8.30|
charset-normalizer 3.3.2
click 8.1.7
coremltools 8.0
filelock 3.16.1
fsspec 2024.9.0
huggingface-hub 0.25.0
idna 3.10
InstructorEmbedding 1.0.1
Jinja2 3.1.4
joblib 1.4.2
MarkupSafe 2.1.5
mpmath 1.3.0
networkx 3.3
nltk 3.9.1
numpy 2.1.1
onnx 1.16.2
onnx-coreml 1.3
packaging 24.1
pillow 10.4.0
pip 24.2
protobuf 5.28.1
pyaml 24.7.0
pybind11 2.13.5
PyYAML 6.0.2
regex 2024.9.11
requests 2.32.3
safetensors 0.4.5
scikit-learn 1.5.2
scipy 1.14.1
sentence-transformers 2.2.2
sentencepiece 0.2.0
setuptools 74.1.2
sympy 1.13.3
threadpoolctl 3.5.0
tokenizers 0.19.1
torch 2.4.1
torchvision 0.19.1
tqdm 4.66.5
transformers 4.44.2
typing 3.7.4.3
typing_extensions 4.12.2
urllib3 2.2.3
wheel 0.44.0

robbebras

Jan 4

I want to use a newer version of sentence-transformers but this issue is still not fixed. :( Anyone has a workaround?

nbpe97

Jan 4

Yep, I had to download the pip library because it was not being kept up to date, and none of the PRs or commits would work for the latest version of sentence-transformers.

To use with the latest version of sentence-transformers (3.3.1), install this modified version:

pip install git+https://github.com/NoahBPeterson/instructor-embedding.git@54076ec450d9825cf84f1ed6e54a5748f6877070

robbebras

Jan 5

Thank you!!