michael-guenther committed
Commit
6aa793d
1 Parent(s): 0f35a10

Update README.md

Files changed (1): README.md +62 -1
README.md CHANGED
 
  </p>
  </details>

+ You can use Jina Embedding models directly from the `transformers` package.
+
+ First, make sure that you are logged in to Hugging Face. You can either use the `huggingface-cli` tool (installed together with the `transformers` package) and pass your [Hugging Face access token](https://huggingface.co/docs/hub/security-tokens):
+ ```bash
+ huggingface-cli login
+ ```
+ Alternatively, you can provide the access token as an environment variable in the shell:
+ ```bash
+ export HF_TOKEN="<your token here>"
+ ```
+ or in Python:
+ ```python
+ import os
+
+ os.environ['HF_TOKEN'] = "<your token here>"
+ ```
+
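As a quick sanity check (a minimal sketch of our own; the token value below is a placeholder, not a real credential), you can verify that `HF_TOKEN` is actually visible to your Python process before loading the model:

```python
import os

# Placeholder value for demonstration only; use your real access token.
os.environ['HF_TOKEN'] = "hf_example_token"

# transformers and huggingface_hub pick up HF_TOKEN from the environment,
# so the minimal check is that it is present and non-empty.
token = os.environ.get('HF_TOKEN', '')
if not token:
    raise RuntimeError("HF_TOKEN is not set; export it or run `huggingface-cli login`")
print("HF_TOKEN is set")
```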
+ Then, you can load and use the model via the `AutoModel` class:
  ```python
  !pip install transformers
  from transformers import AutoModel
 
  )
  ```

+ As of its latest release (v2.3.0), sentence-transformers also supports Jina embeddings (make sure that you are logged in to Hugging Face as well):
+
+ ```python
+ !pip install -U sentence-transformers
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.util import cos_sim
+
+ model = SentenceTransformer(
+     "jinaai/jina-embeddings-v2-small-en", # switch to en/zh for English or Chinese
+     trust_remote_code=True
+ )
+
+ # control your input sequence length up to 8192
+ model.max_seq_length = 1024
+
+ embeddings = model.encode([
+     'How is the weather today?',
+     'Wie ist das Wetter heute?'
+ ])
+ print(cos_sim(embeddings[0], embeddings[1]))
+ ```
+
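For reference, `cos_sim` computes the cosine of the angle between the two embedding vectors, i.e. their dot product after L2 normalization. A minimal standalone sketch with toy vectors (not real embeddings):

```python
import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (||a|| * ||b||), always in [-1, 1]
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Identical directions give 1.0; orthogonal vectors give 0.0.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```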
  ## Alternatives to Using Transformers Package

  1. _Managed SaaS_: Get started with a free key on Jina AI's [Embedding API](https://jina.ai/embeddings/).

  2. Multimodal embedding models enable Multimodal RAG applications.
  3. High-performance rerankers.

+ ## Troubleshooting
+
+ **Loading of model code failed**
+
+ If you forget to pass the `trust_remote_code=True` flag when calling `AutoModel.from_pretrained` or when initializing the model via the `SentenceTransformer` class, you will receive an error that the model weights could not be initialized.
+ This is caused by `transformers` falling back to creating a default BERT model instead of a Jina embedding model:
+
+ ```bash
+ Some weights of the model checkpoint at jinaai/jina-embeddings-v2-base-en were not used when initializing BertModel: ['encoder.layer.2.mlp.layernorm.weight', 'encoder.layer.3.mlp.layernorm.weight', 'encoder.layer.10.mlp.wo.bias', 'encoder.layer.5.mlp.wo.bias', 'encoder.layer.2.mlp.layernorm.bias', 'encoder.layer.1.mlp.gated_layers.weight', 'encoder.layer.5.mlp.gated_layers.weight', 'encoder.layer.8.mlp.layernorm.bias', ...
+ ```
+
+ **User is not logged in to Hugging Face**
+
+ The model is only available under [gated access](https://huggingface.co/docs/hub/models-gated).
+ This means you need to be logged in to Hugging Face to load it.
+ If you receive the following error, you need to provide an access token, either by using the `huggingface-cli` tool or by providing the token via an environment variable as described above:
+
+ ```bash
+ OSError: jinaai/jina-embeddings-v2-base-en is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
+ If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.
+ ```
+
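If you want to surface this fix programmatically, you can wrap the loading call and re-raise the error with a pointer to the remedy. This is a sketch of our own, not a library API; `fake_loader` stands in for `AutoModel.from_pretrained` so the example runs without network access:

```python
def load_with_hint(load_fn, *args, **kwargs):
    # Wraps a loader (e.g. AutoModel.from_pretrained) and appends a hint
    # when the failure matches the gated-access error quoted above.
    try:
        return load_fn(*args, **kwargs)
    except OSError as err:
        if "is not a local folder" in str(err):
            raise OSError(
                f"{err}\nHint: this model is gated; run `huggingface-cli login` "
                "or set HF_TOKEN, then retry."
            ) from err
        raise

# Toy stand-in for AutoModel.from_pretrained to demonstrate the behavior:
def fake_loader(name):
    raise OSError(f"{name} is not a local folder and is not a valid model identifier")

try:
    load_with_hint(fake_loader, "jinaai/jina-embeddings-v2-base-en")
except OSError as err:
    print("Hint:" in str(err))  # True
```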
  ## Contact

  Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas.