How to load the jina-embeddings model locally

#29
by zyznull - opened

I want to run the embedding model in a local environment, but it seems that the jinaai implementation files can only be obtained through trust_remote_code. Can I download these implementation files locally?

I have the same problem. Do you know where to look to see what code is injected by "trust_remote_code=True"?

Jina AI org

hi @zyznull and @makram93, the moment you download the model using AutoModel, it is cached in your local folder ~/.cache/huggingface. The next time you load the model, it will load from the cache by default, not from the remote.

The implementation files are automatically linked and downloaded as well. Besides, they are also open source here: https://huggingface.co/jinaai/jina-bert-implementation
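
For anyone who wants to see that in code, here is a minimal sketch of the default behavior (standard transformers usage; nothing here is specific beyond the model id):

from transformers import AutoModel

# The first call downloads the weights plus the linked implementation files
# and caches everything under ~/.cache/huggingface.
model = AutoModel.from_pretrained(
    "jinaai/jina-embeddings-v2-base-en",
    trust_remote_code=True,
)
# Any later call with the same arguments resolves from that cache, not the remote.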

Hello @bwang0911, but what's the command to manually save and load the model without getting this warning?

I think in any case you need to set the trust_remote_code flag, because the code is not part of the transformers package. This is independent of whether the model is loaded from a local folder or from Hugging Face.
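
To illustrate the question above about saving manually, here is a hedged sketch using save_pretrained (the target path is just an example). Note the flag is still required on reload, and depending on the transformers version the implementation files may still be resolved from the Hub; the fully offline recipe comes later in this thread.

from transformers import AutoModel

model = AutoModel.from_pretrained("jinaai/jina-embeddings-v2-base-en", trust_remote_code=True)
model.save_pretrained("./jina-embeddings-v2-base-en")  # example local path

# Reloading from disk still needs trust_remote_code, because the modeling
# code is not part of the transformers package.
model = AutoModel.from_pretrained("./jina-embeddings-v2-base-en", trust_remote_code=True)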

Hi @zyznull, did you manage to load the jina-embeddings model locally and run it? Thank you
2023-12-20: Solved

Hello @JoannaSapiecha, were you able to load it locally?

Hi, @A-Issa-1999 yes, I did.

@JoannaSapiecha How? Without using trust_remote_code? Because I tried to remove it and the model won't behave the same.

@A-Issa-1999 -

  1. I used trust_remote_code: model = AutoModel.from_pretrained('jinaai/jina-embeddings-v2-base-en', trust_remote_code=True).
  2. I upgraded the Transformers package to the newest available version (4.36.2), as advised by @robertsatya here: https://huggingface.co/jinaai/jina-embeddings-v2-base-en/discussions/35
  3. Before that, I had removed the cached version (all files from all sub-folders).

I also did some other things, like using langchain.embeddings, but I suggest starting with the steps above. Maybe nothing more is needed.
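
As a quick sanity check after those steps, something like this should work (encode() is provided by the remote implementation, per the model card; the output shape in the comment is an assumption for the base model):

from transformers import AutoModel

model = AutoModel.from_pretrained("jinaai/jina-embeddings-v2-base-en", trust_remote_code=True)

# encode() comes from the remote code, not from transformers itself.
embeddings = model.encode(["How is the weather today?", "What is the current weather like today?"])
print(embeddings.shape)  # expected: (2, 768) for the base model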

@JoannaSapiecha so you were able to store and save the model and all its dependencies inside a specific folder, or are they all included in the cache?

@A-Issa-1999 , do you mean A) downloading the files manually from Hugging Face (HF) and creating the folders manually, or B) allowing HF to create the folders and download all the files?
Files = the model and all its dependencies (*.json and other files).

@JoannaSapiecha I was referring to 'A'. I used git clone to obtain the model and placed it in a designated directory. However, when using trust_remote_code, it appears that the model fetches files from locations beyond the model repo itself, and I encountered difficulty manually storing them in a particular folder. Hence, I am curious if there is a prescribed method for achieving this.

@A-Issa-1999 , OK, understood.
My experience: the model fetches files from https://huggingface.co/jinaai/jina-embeddings-v2-base-en/tree/main and from https://huggingface.co/jinaai/jina-bert-implementation/tree/main.
I applied option B) first: HF itself downloaded the files and created the necessary folders/sub-folders.

Then I started to apply option A) in an environment where, for security reasons, I'm not allowed to apply option B), yet I can apply option A). For all the other HF models (mainly SBERT and T5), option A) worked in this environment. For https://huggingface.co/jinaai/jina-embeddings-v2-base-en I struggle: I'm not sure where to place configuration_bert.py, as the folder name (created when applying option B) includes a string like \snapshots\c41...8431712f4b -> it seems to be a parameter, and probably you can change it. Maybe I should just use this folder name.

To avoid spending too much time on the installation of the model, I switched to the environment where option B) works (with the newest version of Transformers). I am testing the model on some LLM tasks and comparing its performance with BERT models/fine-tuned ones.

Let me know if you are able to apply option A). Thanks
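
One thing that may help with option A): huggingface_hub's snapshot_download can pull a whole repo into a directory you choose. A sketch (assuming a recent huggingface_hub; the target folders are examples), covering both repos since the implementation files live in the second one:

from huggingface_hub import snapshot_download

# Model weights, config, tokenizer files:
snapshot_download(repo_id="jinaai/jina-embeddings-v2-base-en", local_dir="./jina-model")
# configuration_bert.py and modeling_bert.py:
snapshot_download(repo_id="jinaai/jina-bert-implementation", local_dir="./jina-code")

The two .py files can then be copied next to the model artifacts, which is exactly the recipe described in the next reply.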

@JoannaSapiecha yes, sure, I will try it and let you know.

For completely offline loading of the model, you can place the two files from jina-bert-implementation (configuration_bert.py and modeling_bert.py) in the same folder as the model artifacts in your local folder.

You then need to edit the config.json to reflect the new location. You can modify it as follows:

{
  "_name_or_path": ".",  ## --> Changed
  "model_max_length": 8192,
  "architectures": [
    "JinaBertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.0,
  "auto_map": {
    "AutoConfig": "configuration_bert.JinaBertConfig",  ## --> Changed
    "AutoModelForMaskedLM": "modeling_bert.JinaBertForMaskedLM",  ## --> Changed
    "AutoModel": "modeling_bert.JinaBertModel",  ## --> Changed
    "AutoModelForSequenceClassification": "modeling_bert.JinaBertForSequenceClassification"  ## --> Changed
  },
. . . 
. . .
} 

This tells the module to look for these additional files in the same directory. You can then test it by setting local_files_only=True in your AutoModel.from_pretrained call.
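
Putting that together, a sketch of the fully offline load (the folder path is an example; it should contain the weights, the tokenizer files, the edited config.json, and the two .py files):

from transformers import AutoModel

model = AutoModel.from_pretrained(
    "./jina-embeddings-v2-base-en",  # example local folder with everything in it
    trust_remote_code=True,          # still needed so the local .py files are executed
    local_files_only=True,           # fail instead of touching the network
)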

Jina AI org

Hi all, I'll close this issue. Let us know if you have further questions.

bwang0911 changed discussion status to closed
