sionic committed
Commit 532895e
1 Parent(s): 97beb11

Update python examples to compute cosine similarity

Files changed (1): README.md +14 -12
README.md CHANGED
@@ -2615,7 +2615,7 @@ Anyone can easily train and control AI.

## How to get embeddings

- Currently, we open the beta version of embedding API v1 and v2.
+ Currently, we offer the beta versions of our embedding APIs.
To get embeddings, you should call the API endpoint with your text.
You can send either a single sentence or multiple sentences.
The embeddings that correspond to the inputs will be returned.
@@ -2681,14 +2681,15 @@ inputs1 = ["first query", "second query"]
inputs2 = ["third query", "fourth query"]
embedding1 = get_embedding(inputs1, url=url)
embedding2 = get_embedding(inputs2, url=url)
- similarity = embedding1 @ embedding2.T
- print(similarity)
+ cos_similarity = (embedding1 / np.linalg.norm(embedding1, axis=1, keepdims=True)) @ (embedding2 / np.linalg.norm(embedding2, axis=1, keepdims=True)).T
+ print(cos_similarity)
```

Using the pre-defined [SionicEmbeddingModel](https://huggingface.co/sionic-ai/sionic-ai-v1/blob/main/model_api.py) to obtain embeddings:

```python
from model_api import SionicEmbeddingModel
+ import numpy as np

inputs1 = ["first query", "second query"]
inputs2 = ["third query", "fourth query"]
@@ -2696,15 +2697,16 @@ model = SionicEmbeddingModel(url="https://api.sionic.ai/v1/embedding",
                             dimension=2048)
embedding1 = model.encode(inputs1)
embedding2 = model.encode(inputs2)
- similarity = embedding1 @ embedding2.T
- print(similarity)
+ cos_similarity = (embedding1 / np.linalg.norm(embedding1, axis=1, keepdims=True)) @ (embedding2 / np.linalg.norm(embedding2, axis=1, keepdims=True)).T
+ print(cos_similarity)
```
We apply an instruction when encoding short queries for retrieval tasks.
- By using `encode_queries()`, you can use instruction to encode queries which is prefixed to each query.
- The instruction to use for both v1 and v2 models is `"query: "`.
+ By using `encode_queries()`, you can encode queries with an instruction that is prefixed to each query, as in the following example.
+ The recommended instruction for both v1 and v2 models is `"query: "`.

```python
from model_api import SionicEmbeddingModel
+ import numpy as np

query = ["first query", "second query"]
passage = ["This is a passage related to the first query", "This is a passage related to the second query"]
@@ -2713,19 +2715,19 @@ model = SionicEmbeddingModel(url="https://api.sionic.ai/v1/embedding",
                             dimension=2048)
query_embedding = model.encode_queries(query)
passage_embedding = model.encode_corpus(passage)
- similarity = query_embedding @ passage_embedding.T
- print(similarity)
+ cos_similarity = (query_embedding / np.linalg.norm(query_embedding, axis=1, keepdims=True)) @ (passage_embedding / np.linalg.norm(passage_embedding, axis=1, keepdims=True)).T
+ print(cos_similarity)
```

## Massive Text Embedding Benchmark (MTEB) Evaluation

Both versions of Sionic AI's embeddings show state-of-the-art performance on the MTEB!
- You can find a code to evaluate MTEB datasets using v1 embedding [here](https://huggingface.co/sionic-ai/sionic-ai-v1/blob/main/mteb_evaluate.py).
+ You can find the code to evaluate MTEB datasets [here](https://huggingface.co/sionic-ai/sionic-ai-v1/blob/main/mteb_evaluate.py).

| Model Name | Dimension | Sequence Length | Average (56) |
|:-----------------------------------------------------------------------:|:---------:|:---:|:------------:|
- | [sionic-ai/sionic-ai-v2](https://huggingface.co/sionic-ai/sionic-ai-v2) | 3072 | 512 | **65.23** |
- | [sionic-ai/sionic-ai-v1](https://huggingface.co/sionic-ai/sionic-ai-v1) | 2048 | 512 | 64.92 |
+ | [sionic-ai/sionic-ai-v2](https://huggingface.co/sionic-ai/sionic-ai-v2) | 3072 | 512 | 65.23 |
+ | [sionic-ai/sionic-ai-v1](https://huggingface.co/sionic-ai/sionic-ai-v1) | 2048 | 512 | **64.92** |
| [bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) | 1024 | 512 | 64.23 |
| [gte-large-en](https://huggingface.co/barisaydin/gte-large) | 1024 | 512 | 63.13 |
| [text-embedding-ada-002](https://platform.openai.com/docs/guides/embeddings/types-of-embedding-models) | 1536 | 8191 | 60.99 |
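For reference, here is a minimal self-contained sketch of the row-wise cosine similarity the updated examples compute. It assumes each input's embedding is one row of a NumPy array; the `cosine_similarity` helper below is illustrative and not part of `model_api.py`:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between every row of `a` and every row of `b`.

    The norms must be taken per row (axis=1, keepdims=True); dividing by
    np.linalg.norm(a) alone would normalize by the global matrix norm and
    would not produce cosine similarities.
    """
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# Sanity check: a row compared with itself scores 1.0.
x = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(cosine_similarity(x, x))  # diagonal entries are 1.0
```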
 
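The instruction prefixing that `encode_queries()` performs can be pictured with a short, hypothetical sketch; the actual implementation lives in [model_api.py](https://huggingface.co/sionic-ai/sionic-ai-v1/blob/main/model_api.py), and `add_instruction` below is our illustrative name, not part of that module:

```python
# Hypothetical sketch of the instruction prefixing described above; the
# real implementation is in model_api.py.
from typing import List

INSTRUCTION = "query: "  # recommended instruction for the v1 and v2 models

def add_instruction(queries: List[str], instruction: str = INSTRUCTION) -> List[str]:
    # Prefix the instruction to each short query before embedding it;
    # passages (encode_corpus) are embedded without the prefix.
    return [instruction + q for q in queries]

print(add_instruction(["first query", "second query"]))
# ['query: first query', 'query: second query']
```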