bwang0911 committed on
Commit 2625d83
1 Parent(s): 2e6d12b

Update README.md

Files changed (1)
  1. README.md +9 -9
README.md CHANGED
@@ -2621,9 +2621,9 @@ model-index:
 
  ## Intended Usage & Model Info
 
- `jina-embedding-b-en-v2` is an English, monolingual **embedding model** supporting **8192 sequence length**.
+ `jina-embeddings-v2-base-en` is an English, monolingual **embedding model** supporting **8192 sequence length**.
  It is based on a Bert architecture (JinaBert) that supports the symmetric bidirectional variant of [ALiBi](https://arxiv.org/abs/2108.12409) to allow longer sequence length.
- The backbone `jina-bert-b-en-v2` is pretrained on the C4 dataset.
+ The backbone `jina-bert-v2-base-en` is pretrained on the C4 dataset.
  The model is further trained on Jina AI's collection of more than 400 millions of sentence pairs and hard negatives.
  These pairs were obtained from various domains and were carefully selected through a thorough cleaning process.
 
@@ -2635,15 +2635,15 @@ Additionally, we provide the following embedding models:
 
  ### V1 (Based on T5, 512 Seq)
 
- - [`jina-embedding-s-en-v1`](https://huggingface.co/jinaai/jina-embedding-s-en-v1): 35 million parameters.
- - [`jina-embedding-b-en-v1`](https://huggingface.co/jinaai/jina-embedding-b-en-v1): 110 million parameters.
- - [`jina-embedding-l-en-v1`](https://huggingface.co/jinaai/jina-embedding-l-en-v1): 330 million parameters.
+ - [`jina-embeddings-v1-small-en`](https://huggingface.co/jinaai/jina-embedding-s-en-v1): 35 million parameters.
+ - [`jina-embeddings-v1-base-en`](https://huggingface.co/jinaai/jina-embedding-b-en-v1): 110 million parameters.
+ - [`jina-embeddings-v2-large-en`](https://huggingface.co/jinaai/jina-embedding-l-en-v1): 330 million parameters.
 
  ### V2 (Based on JinaBert, 8k Seq)
 
- - [`jina-embedding-s-en-v2`](https://huggingface.co/jinaai/jina-embedding-s-en-v2): 33 million parameters **(you are here)**.
- - [`jina-embedding-b-en-v2`](https://huggingface.co/jinaai/jina-embedding-b-en-v2): 137 million parameters.
- - [`jina-embedding-l-en-v2`]: 435 million parameters (releasing soon).
+ - [`jina-embeddings-v2-small-en`](https://huggingface.co/jinaai/jina-embeddings-v2-small-en): 33 million parameters **(you are here)**.
+ - [`jina-embeddings-v2-base-en`](https://huggingface.co/jinaai/jina-embeddings-v2-base-en): 137 million parameters.
+ - [`jina-embeddings-v2-large-en`](): 435 million parameters (releasing soon).
 
  ## Data & Parameters
 
@@ -2660,7 +2660,7 @@ from transformers import AutoModel
  from numpy.linalg import norm
 
  cos_sim = lambda a,b: (a @ b.T) / (norm(a)*norm(b))
- model = AutoModel.from_pretrained('jinaai/jina-embedding-b-en-v2', trust_remote_code=True) # trust_remote_code is needed to use the encode method
+ model = AutoModel.from_pretrained('jinaai/jina-embeddings-v2-base-en', trust_remote_code=True) # trust_remote_code is needed to use the encode method
  embeddings = model.encode(['How is the weather today?', 'What is the current weather like today?'])
  print(cos_sim(embeddings[0], embeddings[1]))
  ```
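
For orientation (not part of the commit itself), the renamed usage snippet above can also be reproduced without the repository's custom `encode` helper. The sketch below is an assumption-laden illustration: it mean-pools the last hidden state, which may not match exactly what `encode` does internally, and it passes `max_length=8192` to the tokenizer to exercise the long-sequence support described in the README.

```python
# Minimal sketch: load the model under its new repository id and compare two sentences.
# Assumption: mean pooling over non-padding tokens approximates the model's own pooling.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = 'jinaai/jina-embeddings-v2-base-en'  # new name introduced by this commit
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

sentences = ['How is the weather today?', 'What is the current weather like today?']

# Tokenize; max_length can go up to 8192 thanks to the ALiBi-based JinaBert backbone.
batch = tokenizer(sentences, padding=True, truncation=True, max_length=8192, return_tensors='pt')

with torch.no_grad():
    last_hidden = model(**batch).last_hidden_state  # shape: (batch, seq_len, hidden)

# Mean pooling over non-padding tokens.
mask = batch['attention_mask'].unsqueeze(-1).float()
embeddings = (last_hidden * mask).sum(dim=1) / mask.sum(dim=1)

# Cosine similarity between the two sentence embeddings.
embeddings = torch.nn.functional.normalize(embeddings, dim=-1)
print((embeddings[0] @ embeddings[1]).item())
```

In practice the README's `model.encode(...)` call is the simpler path; the sketch only makes the pooling and truncation behaviour explicit.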