How to load the jina-embeddings model locally

#29
by zyznull - opened

I want to run the embedding model in a local environment, but it seems that the jinaai implementation files can only be obtained through trust_remote_code. Can I download these implementation files locally?

I have the same problem. Do you know where to look to see what code is injected by "trust_remote_code=True"?

Jina AI org

hi @zyznull and @makram93, the moment you download the model using AutoModel, it is cached in your local folder ~/.cache/huggingface. The next time you load the model, it will load from the cache by default, not from the remote.

The implementation files are automatically linked and downloaded as well. Besides, they are also open source here: https://huggingface.co/jinaai/jina-bert-implementation
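
For anyone who wants to see that in code, here is a minimal sketch of the default behavior (standard transformers usage; nothing here is specific beyond the model id):

from transformers import AutoModel

# The first call downloads the weights plus the linked implementation files
# and caches everything under ~/.cache/huggingface.
model = AutoModel.from_pretrained(
    "jinaai/jina-embeddings-v2-base-en",
    trust_remote_code=True,
)
# Any later call with the same arguments resolves from that cache, not the remote.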

Hello @bwang0911, but what's the command to manually save and load the model without getting this warning?

I think in any case you need to set the trust_remote_code flag, because the code is not part of the transformers package. This is independent of whether the model is loaded from a local folder or from Hugging Face.
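
To illustrate the question above about saving manually, here is a hedged sketch using save_pretrained (the target path is just an example). Note the flag is still required on reload, and depending on the transformers version the implementation files may still be resolved from the Hub; the fully offline recipe comes later in this thread.

from transformers import AutoModel

model = AutoModel.from_pretrained("jinaai/jina-embeddings-v2-base-en", trust_remote_code=True)
model.save_pretrained("./jina-embeddings-v2-base-en")  # example local path

# Reloading from disk still needs trust_remote_code, because the modeling
# code is not part of the transformers package.
model = AutoModel.from_pretrained("./jina-embeddings-v2-base-en", trust_remote_code=True)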

Hi @zyznull, did you manage to load the jina-embeddings model locally and run it? Thank you
2023-12-20: Solved

Hello @JoannaSapiecha, were you able to load it locally?

Hi, @A-Issa-1999 yes, I did.

@JoannaSapiecha How? Without using trust_remote_code? Because I tried to remove it and the model won't behave the same.

@A-Issa-1999 -

  1. I used trust_remote_code: model = AutoModel.from_pretrained('jinaai/jina-embeddings-v2-base-en', trust_remote_code=True).
  2. I upgraded the Transformers package to the newest available version (4.36.2), as advised by @robertsatya here: https://huggingface.co/jinaai/jina-embeddings-v2-base-en/discussions/35
  3. Before that, I had removed the cached version (all files from all sub-folders).

I also did some other things, like using langchain.embeddings, but I suggest starting with the steps above. Maybe nothing more is needed.
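
As a quick sanity check after those steps, something like this should work (encode() is provided by the remote implementation, per the model card; the output shape in the comment is an assumption for the base model):

from transformers import AutoModel

model = AutoModel.from_pretrained("jinaai/jina-embeddings-v2-base-en", trust_remote_code=True)

# encode() comes from the remote code, not from transformers itself.
embeddings = model.encode(["How is the weather today?", "What is the current weather like today?"])
print(embeddings.shape)  # expected: (2, 768) for the base model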

@JoannaSapiecha so you were able to store and save the model and all its dependencies inside a specific folder, or are they all included in the cache?

@A-Issa-1999 , do you mean A) downloading the files manually from Hugging Face (HF) and creating the folders manually, or B) allowing HF to create the folders and download all the files?
Files = the model and all its dependencies (*.json and other files).

@JoannaSapiecha I was referring to 'A'. I used git clone to obtain the model and placed it in a designated directory. However, when using trust_remote_code, it appears that the model fetches files from locations beyond the model repo itself, and I encountered difficulty manually storing them in a particular folder. Hence, I am curious if there is a prescribed method for achieving this.

@A-Issa-1999 , OK, understood.
My experience: the model fetches files from https://huggingface.co/jinaai/jina-embeddings-v2-base-en/tree/main and from https://huggingface.co/jinaai/jina-bert-implementation/tree/main.
I applied option B) first: HF itself downloaded the files and created the necessary folders/sub-folders.

Then I started to apply option A) in an environment where, for security reasons, I'm not allowed to apply option B), yet I can apply option A). For all the other HF models (mainly SBERT and T5), option A) worked in this environment. For https://huggingface.co/jinaai/jina-embeddings-v2-base-en I struggle: I'm not sure where to place configuration_bert.py, as the folder name (created when applying option B) includes a string like \snapshots\c41...8431712f4b -> it seems to be a parameter, and probably you can change it. Maybe I should just use this folder name.

To avoid spending too much time on the installation of the model, I switched to the environment where option B) works (with the newest version of Transformers). I am testing the model on some LLM tasks and comparing its performance with BERT models/fine-tuned ones.

Let me know if you are able to apply option A). Thanks
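
One thing that may help with option A): huggingface_hub's snapshot_download can pull a whole repo into a directory you choose. A sketch (assuming a recent huggingface_hub; the target folders are examples), covering both repos since the implementation files live in the second one:

from huggingface_hub import snapshot_download

# Model weights, config, tokenizer files:
snapshot_download(repo_id="jinaai/jina-embeddings-v2-base-en", local_dir="./jina-model")
# configuration_bert.py and modeling_bert.py:
snapshot_download(repo_id="jinaai/jina-bert-implementation", local_dir="./jina-code")

The two .py files can then be copied next to the model artifacts, which is exactly the recipe described in the next reply.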

@JoannaSapiecha yes, sure, I will try it and let you know.

For completely offline loading of the model, you can place the two files from jina-bert-implementation (configuration_bert.py and modeling_bert.py) in the same folder as the model artifacts in your local folder.

You then need to edit the config.json to reflect the new location. You can modify it as follows:

{
  "_name_or_path": ".",  ## --> Changed
  "model_max_length": 8192,
  "architectures": [
    "JinaBertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.0,
  "auto_map": {
    "AutoConfig": "configuration_bert.JinaBertConfig",  ## --> Changed
    "AutoModelForMaskedLM": "modeling_bert.JinaBertForMaskedLM",  ## --> Changed
    "AutoModel": "modeling_bert.JinaBertModel",  ## --> Changed
    "AutoModelForSequenceClassification": "modeling_bert.JinaBertForSequenceClassification"  ## --> Changed
  },
. . . 
. . .
} 

This tells the module to look for these additional files in the same directory. You can then test it by setting local_files_only=True in your AutoModel.from_pretrained call.
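
Putting that together, a sketch of the fully offline load (the folder path is an example; it should contain the weights, the tokenizer files, the edited config.json, and the two .py files):

from transformers import AutoModel

model = AutoModel.from_pretrained(
    "./jina-embeddings-v2-base-en",  # example local folder with everything in it
    trust_remote_code=True,          # still needed so the local .py files are executed
    local_files_only=True,           # fail instead of touching the network
)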

Jina AI org

Hi all, I'll close this issue. Let us know if you have further questions.

bwang0911 changed discussion status to closed
